diff mbox series

[ovs-dev,dpdk-hwol,v1] netdev-dpdk: support port representors

Message ID 1544571266-24777-1-git-send-email-ophirmu@mellanox.com
State Superseded
Headers show
Series [ovs-dev,dpdk-hwol,v1] netdev-dpdk: support port representors | expand

Commit Message

Ophir Munk Dec. 11, 2018, 11:34 p.m. UTC
Dpdk port representors were introduced in dpdk 18.xx.
Prior to representors there was a one-to-one relationship
between an rte device (e.g. PCI bus) and an eth device (a dpdk
port id). With representors the relationship becomes one-to-many
rte device to eth devices.
For example in [3] there are two devices (representors) using the same
PCI physical address 0000:08:00.0: "0000:08:00.0,representor=[3]" and
"0000:08:00.0,representor=[5]".
This commit handles the new one-to-many relationship. For example,
when one of the device representors in [3] is closed - the PCI bus
cannot be detached until the other device representor is closed as
well. OVS remains backward compatible by supporting dpdk legacy PCI
ports which do not include representors.
Dpdk representors related commits are listed in [1]. dpdk representors
documentation appears in [2]. A sample configuration which uses two
representors ports (the output of "ovs-vsctl show" command) is
shown in [3].

[1]
e0cb96204b71 ("net/i40e: add support for representor ports")
cf80ba6e2038 ("net/ixgbe: add support for representor ports")
26c08b979d26 ("net/mlx5: add port representor awareness")

[2]
doc/guides/prog_guide/switch_representation.rst

[3]
Bridge "ovs_br0"
    Port "ovs_br0"
        Interface "ovs_br0"
            type: internal
    Port "port-rep3"
        Interface "port-rep3"
            type: dpdk
            options: {dpdk-devargs="0000:08:00.0,representor=[3]"}
    Port "port-rep5"
        Interface "port-rep5"
            type: dpdk
            options: {dpdk-devargs="0000:08:00.0,representor=[5]"}
    ovs_version: "2.10.90"

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
---
v1:
1. rebase on top of Kevin's patch
dpdk: Update to use DPDK 18.11.[ovs-dev,v7,dpdk-latest,1/1] dpdk: Update to use DPDK 18.11.
https://patchwork.ozlabs.org/patch/1005535/
2. skipping count of sibling ports in case the sibling port state is RTE_ETH_DEV_UNUSED 

 lib/netdev-dpdk.c | 112 ++++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 83 insertions(+), 29 deletions(-)

Comments

Stokes, Ian Dec. 12, 2018, 3:40 p.m. UTC | #1
On 12/11/2018 11:34 PM, Ophir Munk wrote:
> Dpdk port representors were introduced in dpdk 18.xx.
> Prior to representors there was a one-to-one relationship
> between an rte device (e.g. PCI bus) and an eth device (a dpdk
> port id). With representors the relationship becomes one-to-many
> rte device to eth devices.
> For example in [3] there are two devices (representors) using the same
> PCI physical address 0000:08:00.0: "0000:08:00.0,representor=[3]" and
> "0000:08:00.0,representor=[5]".
> This commit handles the new one-to-many relationship. For example,
> when one of the device representors in [3] is closed - the PCI bus
> cannot be detached until the other device representor is closed as
> well. OVS remains backward compatible by supporting dpdk legacy PCI
> ports which do not include representors.
> Dpdk representors related commits are listed in [1]. dpdk representors
> documentation appears in [2]. A sample configuration which uses two
> representors ports (the output of "ovs-vsctl show" command) is
> shown in [3].

Thanks for this Ophir.

I need to complete some more testing on various devices before a full 
review can be completed but here are some first pass comments in the 
mean time that could be addressed for a v2.

Also with support for 18.11 being upstreamed to master (by tomorrow if 
there are no other comments), will this patch be targeting OVS 2.11? If 
so you could target the v2 at master once 18.11 is there rather than 
dpdk-hwol but I'll leave it to you.

Ian
> 
> [1]
> e0cb96204b71 ("net/i40e: add support for representor ports")
> cf80ba6e2038 ("net/ixgbe: add support for representor ports")
> 26c08b979d26 ("net/mlx5: add port representor awareness")
> 
> [2]
> doc/guides/prog_guide/switch_representation.rst
> 
> [3]
> Bridge "ovs_br0"
>      Port "ovs_br0"
>          Interface "ovs_br0"
>              type: internal
>      Port "port-rep3"
>          Interface "port-rep3"
>              type: dpdk
>              options: {dpdk-devargs="0000:08:00.0,representor=[3]"}
>      Port "port-rep5"
>          Interface "port-rep5"
>              type: dpdk
>              options: {dpdk-devargs="0000:08:00.0,representor=[5]"}
>      ovs_version: "2.10.90"
> 
> Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
> ---
> v1:
> 1. rebase on top of Kevin's patch
> dpdk: Update to use DPDK 18.11.[ovs-dev,v7,dpdk-latest,1/1] dpdk: Update to use DPDK 18.11.
> https://patchwork.ozlabs.org/patch/1005535/
> 2. skipping count of sibling ports in case the sibling port state is RTE_ETH_DEV_UNUSED
> 
>   lib/netdev-dpdk.c | 112 ++++++++++++++++++++++++++++++++++++++++--------------
>   1 file changed, 83 insertions(+), 29 deletions(-)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 6b8e05e..30af043 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -1216,6 +1216,25 @@ dpdk_dev_parse_name(const char dev_name[], const char prefix[],
>       }
>   }
>   
> +/* get the number of OVS interfaces which have the same DPDK

Minor, can you capitalize the first letter of the comment.

> + * rte device (e.g. same pci bus address). */
> +static int
> +netdev_dpdk_get_num_ports(struct rte_device *device)
> +    OVS_REQUIRES(dpdk_mutex)
> +{
> +    struct netdev_dpdk *dev;
> +    int count;
> +
You could initialize count to 0 above where it is declared to save the 
line below.

> +    count = 0;
> +    LIST_FOR_EACH (dev, list_node, &dpdk_list) {
> +        if (rte_eth_devices[dev->port_id].device == device
> +        && rte_eth_devices[dev->port_id].state != RTE_ETH_DEV_UNUSED) {
> +            count++;
> +        }
> +    }
> +    return count;
> +}
> +
>   static int
>   vhost_common_construct(struct netdev *netdev)
>       OVS_REQUIRES(dpdk_mutex)
> @@ -1351,20 +1370,23 @@ static void
>   netdev_dpdk_destruct(struct netdev *netdev)
>   {
>       struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> -    struct rte_eth_dev_info dev_info;
> +    struct rte_device *rte_dev;
>   
>       ovs_mutex_lock(&dpdk_mutex);
>   
>       rte_eth_dev_stop(dev->port_id);
>       dev->started = false;
> -
>       if (dev->attached) {
> +        /* Remove the port eth device */
>           rte_eth_dev_close(dev->port_id);
> -        rte_eth_dev_info_get(dev->port_id, &dev_info);
> -        if (dev_info.device && !rte_dev_remove(dev_info.device)) {
> -            VLOG_INFO("Device '%s' has been detached", dev->devargs);
> -        } else {
> -            VLOG_ERR("Device '%s' can not be detached", dev->devargs);
> +        VLOG_INFO("Device '%s' has been removed", dev->devargs);
> +        /* if this is the last port_id using this rte device

Minor: capitalize first letter of comment above, period at end of the 
comment.

> +         * remove this rte device and all its eth devices */
> +        rte_dev = rte_eth_devices[dev->port_id].device;
> +        if (netdev_dpdk_get_num_ports(rte_dev) == 1) {
> +            if (rte_dev_remove(rte_dev) < 0) {
> +                VLOG_ERR("Device '%s' can not be detached", dev->devargs);
> +            }
>           }
>       }
>   
> @@ -1630,8 +1652,26 @@ netdev_dpdk_get_port_by_mac(const char *mac_str)
>       return DPDK_ETH_PORT_ID_INVALID;
>   }
>   
> +/* return the first DPDK port_id matching the devargs pattern */
Minor: Capitalize first word and add a period to the end of the comment.

> +static dpdk_port_t
> +netdev_dpdk_get_port_by_devargs(const char *devargs)
> +{
> +    struct rte_dev_iterator iterator;
> +    dpdk_port_t port_id;
> +
> +    if (rte_dev_probe(devargs)) {
> +        port_id = DPDK_ETH_PORT_ID_INVALID;
> +    } else {
> +        RTE_ETH_FOREACH_MATCHING_DEV(port_id, devargs, &iterator) {
> +            break;
Reading the comments for RTE_ETH_FOREACH_MATCHING_DEV

If a break is done before the end of the loop, the function 
rte_eth_iterator_cleanup() must be called.

That doesn't seem to be handled here and I would think needs to be checked?
> +        }
> +    }
> +    return port_id;
> +}
> +
>   /*
> - * Normally, a PCI id is enough for identifying a specific DPDK port.
> + * Normally, a PCI id (optionally followed by a representor number)
> + * is enough for identifying a specific DPDK port.
>    * However, for some NICs having multiple ports sharing the same PCI
>    * id, using PCI id won't work then.
>    *
> @@ -1644,29 +1684,32 @@ static dpdk_port_t
>   netdev_dpdk_process_devargs(struct netdev_dpdk *dev,
>                               const char *devargs, char **errp)
>   {
> -    char *name;
>       dpdk_port_t new_port_id = DPDK_ETH_PORT_ID_INVALID;
>   
>       if (strncmp(devargs, "class=eth,mac=", 14) == 0) {
>           new_port_id = netdev_dpdk_get_port_by_mac(&devargs[14]);
>       } else {
> -        name = xmemdup0(devargs, strcspn(devargs, ","));
> -        if (rte_eth_dev_get_port_by_name(name, &new_port_id)
> -                || !rte_eth_dev_is_valid_port(new_port_id)) {
> -            /* Device not found in DPDK, attempt to attach it */
> -            if (!rte_dev_probe(devargs)
> -                && !rte_eth_dev_get_port_by_name(name, &new_port_id)) {
> -                /* Attach successful */
> -                dev->attached = true;
> -                VLOG_INFO("Device '%s' attached to DPDK", devargs);
> -            } else {
> -                /* Attach unsuccessful */
> +        new_port_id = netdev_dpdk_get_port_by_devargs(devargs);
> +        if (!rte_eth_dev_is_valid_port(new_port_id)) {
> +            new_port_id = DPDK_ETH_PORT_ID_INVALID;
> +        } else {
> +            struct netdev_dpdk *dup_dev;
> +
> +            dup_dev = netdev_dpdk_lookup_by_port_id(new_port_id);

Above, netdev_dpdk_lookup_by_port_id(new_port_id) requires holding of 
dpdk_mutex.

> +            if (dup_dev) {
> +                VLOG_WARN_BUF(errp, "'%s' is trying to use device '%s' "
> +                               "which is already in use by '%s'",
> +                          netdev_get_name(&dev->up), devargs,
> +                          netdev_get_name(&dup_dev->up));

The alignment above here seems a bit odd, could you vertically align them?
>                   new_port_id = DPDK_ETH_PORT_ID_INVALID;
> +            } else {
> +                /* device successfully found */
Minor: Capital at beginning and period at end of comment.

> +                dev->attached = true;
> +                VLOG_INFO("Device '%s' attached to DPDK port %d",
> +                          devargs, new_port_id);
>               }
>           }
> -        free(name);
>       }
> -
>       if (new_port_id == DPDK_ETH_PORT_ID_INVALID) {
>           VLOG_WARN_BUF(errp, "Error attaching device '%s' to DPDK", devargs);
>       }
> @@ -3234,11 +3277,15 @@ netdev_dpdk_detach(struct unixctl_conn *conn, int argc OVS_UNUSED,
>       char *response;
>       dpdk_port_t port_id;
>       struct netdev_dpdk *dev;
> -    struct rte_eth_dev_info dev_info;
> +    struct rte_device *rte_dev;
> +    struct rte_dev_iterator iterator;
>   
>       ovs_mutex_lock(&dpdk_mutex);
>   
> -    if (rte_eth_dev_get_port_by_name(argv[1], &port_id)) {
> +    RTE_ETH_FOREACH_MATCHING_DEV(port_id, argv[1], &iterator) {
> +        break;
Same comment as above, need to handle break with iterator clean up if 
you do not reach the end of the loop.

> +    }
> +    if (port_id == DPDK_ETH_PORT_ID_INVALID) {
>           response = xasprintf("Device '%s' not found in DPDK", argv[1]);
>           goto error;
>       }
> @@ -3251,15 +3298,22 @@ netdev_dpdk_detach(struct unixctl_conn *conn, int argc OVS_UNUSED,
>           goto error;
>       }
>   
> -    rte_eth_dev_close(port_id);
> +    rte_dev = rte_eth_devices[port_id].device;
> +    if (netdev_dpdk_get_num_ports(rte_dev)) {
> +        response = xasprintf("Device '%s' is being shared with other "
> +                             "interfaces. Remove them before detaching.",
> +                             argv[1]);
> +        goto error;
> +    }
>   
> -    rte_eth_dev_info_get(port_id, &dev_info);
> -    if (!dev_info.device || rte_dev_remove(dev_info.device)) {
> -        response = xasprintf("Device '%s' can not be detached", argv[1]);
> +    rte_eth_dev_close(port_id);
> +    if (rte_dev_remove(rte_dev) < 0) {
> +        response = xasprintf("Device '%s' can not be removed", argv[1]);
>           goto error;
>       }
>   
> -    response = xasprintf("Device '%s' has been detached", argv[1]);
> +    response = xasprintf("All devices shared with device '%s' "
> +                          "have been detached", argv[1]);
>   
>       ovs_mutex_unlock(&dpdk_mutex);
>       unixctl_command_reply(conn, response);
>
Ilya Maximets Dec. 12, 2018, 3:55 p.m. UTC | #2
Not a full review. Just one comment inline.

Best regards, Ilya Maximets.

On 12.12.2018 2:34, Ophir Munk wrote:
> Dpdk port representors were introduced in dpdk 18.xx.
> Prior to representors there was a one-to-one relationship
> between an rte device (e.g. PCI bus) and an eth device (a dpdk
> port id). With representors the relationship becomes one-to-many
> rte device to eth devices.
> For example in [3] there are two devices (representors) using the same
> PCI physical address 0000:08:00.0: "0000:08:00.0,representor=[3]" and
> "0000:08:00.0,representor=[5]".
> This commit handles the new one-to-many relationship. For example,
> when one of the device representors in [3] is closed - the PCI bus
> cannot be detached until the other device representor is closed as
> well. OVS remains backward compatible by supporting dpdk legacy PCI
> ports which do not include representors.
> Dpdk representors related commits are listed in [1]. dpdk representors
> documentation appears in [2]. A sample configuration which uses two
> representors ports (the output of "ovs-vsctl show" command) is
> shown in [3].
> 
> [1]
> e0cb96204b71 ("net/i40e: add support for representor ports")
> cf80ba6e2038 ("net/ixgbe: add support for representor ports")
> 26c08b979d26 ("net/mlx5: add port representor awareness")
> 
> [2]
> doc/guides/prog_guide/switch_representation.rst
> 
> [3]
> Bridge "ovs_br0"
>     Port "ovs_br0"
>         Interface "ovs_br0"
>             type: internal
>     Port "port-rep3"
>         Interface "port-rep3"
>             type: dpdk
>             options: {dpdk-devargs="0000:08:00.0,representor=[3]"}
>     Port "port-rep5"
>         Interface "port-rep5"
>             type: dpdk
>             options: {dpdk-devargs="0000:08:00.0,representor=[5]"}
>     ovs_version: "2.10.90"
> 
> Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
> ---
> v1:
> 1. rebase on top of Kevin's patch
> dpdk: Update to use DPDK 18.11.[ovs-dev,v7,dpdk-latest,1/1] dpdk: Update to use DPDK 18.11.
> https://patchwork.ozlabs.org/patch/1005535/
> 2. skipping count of sibling ports in case the sibling port state is RTE_ETH_DEV_UNUSED 
> 
>  lib/netdev-dpdk.c | 112 ++++++++++++++++++++++++++++++++++++++++--------------
>  1 file changed, 83 insertions(+), 29 deletions(-)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 6b8e05e..30af043 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -1216,6 +1216,25 @@ dpdk_dev_parse_name(const char dev_name[], const char prefix[],
>      }
>  }
>  
> +/* get the number of OVS interfaces which have the same DPDK
> + * rte device (e.g. same pci bus address). */
> +static int
> +netdev_dpdk_get_num_ports(struct rte_device *device)
> +    OVS_REQUIRES(dpdk_mutex)
> +{
> +    struct netdev_dpdk *dev;
> +    int count;
> +
> +    count = 0;
> +    LIST_FOR_EACH (dev, list_node, &dpdk_list) {
> +        if (rte_eth_devices[dev->port_id].device == device
> +        && rte_eth_devices[dev->port_id].state != RTE_ETH_DEV_UNUSED) {
> +            count++;
> +        }
> +    }
> +    return count;
> +}
> +
>  static int
>  vhost_common_construct(struct netdev *netdev)
>      OVS_REQUIRES(dpdk_mutex)
> @@ -1351,20 +1370,23 @@ static void
>  netdev_dpdk_destruct(struct netdev *netdev)
>  {
>      struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> -    struct rte_eth_dev_info dev_info;
> +    struct rte_device *rte_dev;
>  
>      ovs_mutex_lock(&dpdk_mutex);
>  
>      rte_eth_dev_stop(dev->port_id);
>      dev->started = false;
> -
>      if (dev->attached) {
> +        /* Remove the port eth device */
>          rte_eth_dev_close(dev->port_id);
> -        rte_eth_dev_info_get(dev->port_id, &dev_info);
> -        if (dev_info.device && !rte_dev_remove(dev_info.device)) {
> -            VLOG_INFO("Device '%s' has been detached", dev->devargs);
> -        } else {
> -            VLOG_ERR("Device '%s' can not be detached", dev->devargs);
> +        VLOG_INFO("Device '%s' has been removed", dev->devargs);
> +        /* if this is the last port_id using this rte device
> +         * remove this rte device and all its eth devices */
> +        rte_dev = rte_eth_devices[dev->port_id].device;
> +        if (netdev_dpdk_get_num_ports(rte_dev) == 1) {
> +            if (rte_dev_remove(rte_dev) < 0) {
> +                VLOG_ERR("Device '%s' can not be detached", dev->devargs);
> +            }
>          }
>      }
>  
> @@ -1630,8 +1652,26 @@ netdev_dpdk_get_port_by_mac(const char *mac_str)
>      return DPDK_ETH_PORT_ID_INVALID;
>  }
>  
> +/* return the first DPDK port_id matching the devargs pattern */
> +static dpdk_port_t
> +netdev_dpdk_get_port_by_devargs(const char *devargs)
> +{
> +    struct rte_dev_iterator iterator;
> +    dpdk_port_t port_id;
> +
> +    if (rte_dev_probe(devargs)) {
> +        port_id = DPDK_ETH_PORT_ID_INVALID;
> +    } else {
> +        RTE_ETH_FOREACH_MATCHING_DEV(port_id, devargs, &iterator) {
> +            break;
> +        }
> +    }
> +    return port_id;
> +}
> +
>  /*
> - * Normally, a PCI id is enough for identifying a specific DPDK port.
> + * Normally, a PCI id (optionally followed by a representor number)
> + * is enough for identifying a specific DPDK port.
>   * However, for some NICs having multiple ports sharing the same PCI
>   * id, using PCI id won't work then.
>   *
> @@ -1644,29 +1684,32 @@ static dpdk_port_t
>  netdev_dpdk_process_devargs(struct netdev_dpdk *dev,
>                              const char *devargs, char **errp)
>  {
> -    char *name;
>      dpdk_port_t new_port_id = DPDK_ETH_PORT_ID_INVALID;
>  
>      if (strncmp(devargs, "class=eth,mac=", 14) == 0) {
>          new_port_id = netdev_dpdk_get_port_by_mac(&devargs[14]);
>      } else {
> -        name = xmemdup0(devargs, strcspn(devargs, ","));
> -        if (rte_eth_dev_get_port_by_name(name, &new_port_id)
> -                || !rte_eth_dev_is_valid_port(new_port_id)) {
> -            /* Device not found in DPDK, attempt to attach it */
> -            if (!rte_dev_probe(devargs)
> -                && !rte_eth_dev_get_port_by_name(name, &new_port_id)) {
> -                /* Attach successful */
> -                dev->attached = true;
> -                VLOG_INFO("Device '%s' attached to DPDK", devargs);
> -            } else {
> -                /* Attach unsuccessful */
> +        new_port_id = netdev_dpdk_get_port_by_devargs(devargs);
> +        if (!rte_eth_dev_is_valid_port(new_port_id)) {
> +            new_port_id = DPDK_ETH_PORT_ID_INVALID;
> +        } else {
> +            struct netdev_dpdk *dup_dev;
> +
> +            dup_dev = netdev_dpdk_lookup_by_port_id(new_port_id);
> +            if (dup_dev) {
> +                VLOG_WARN_BUF(errp, "'%s' is trying to use device '%s' "
> +                               "which is already in use by '%s'",
> +                          netdev_get_name(&dev->up), devargs,
> +                          netdev_get_name(&dup_dev->up));
>                  new_port_id = DPDK_ETH_PORT_ID_INVALID;
> +            } else {
> +                /* device successfully found */
> +                dev->attached = true;
> +                VLOG_INFO("Device '%s' attached to DPDK port %d",
> +                          devargs, new_port_id);
>              }
>          }
> -        free(name);
>      }
> -
>      if (new_port_id == DPDK_ETH_PORT_ID_INVALID) {
>          VLOG_WARN_BUF(errp, "Error attaching device '%s' to DPDK", devargs);
>      }
> @@ -3234,11 +3277,15 @@ netdev_dpdk_detach(struct unixctl_conn *conn, int argc OVS_UNUSED,
>      char *response;
>      dpdk_port_t port_id;
>      struct netdev_dpdk *dev;
> -    struct rte_eth_dev_info dev_info;
> +    struct rte_device *rte_dev;
> +    struct rte_dev_iterator iterator;
>  
>      ovs_mutex_lock(&dpdk_mutex);
>  
> -    if (rte_eth_dev_get_port_by_name(argv[1], &port_id)) {
> +    RTE_ETH_FOREACH_MATCHING_DEV(port_id, argv[1], &iterator) {
> +        break;
> +    }
> +    if (port_id == DPDK_ETH_PORT_ID_INVALID) {
>          response = xasprintf("Device '%s' not found in DPDK", argv[1]);
>          goto error;
>      }
> @@ -3251,15 +3298,22 @@ netdev_dpdk_detach(struct unixctl_conn *conn, int argc OVS_UNUSED,
>          goto error;
>      }
>  
> -    rte_eth_dev_close(port_id);
> +    rte_dev = rte_eth_devices[port_id].device;


We should not use 'rte_eth_devices' directly because it's internal to DPDK.
See the discussion here:
   http://mails.dpdk.org/archives/dev/2018-November/119198.html


> +    if (netdev_dpdk_get_num_ports(rte_dev)) {
> +        response = xasprintf("Device '%s' is being shared with other "
> +                             "interfaces. Remove them before detaching.",
> +                             argv[1]);
> +        goto error;
> +    }
>  
> -    rte_eth_dev_info_get(port_id, &dev_info);
> -    if (!dev_info.device || rte_dev_remove(dev_info.device)) {
> -        response = xasprintf("Device '%s' can not be detached", argv[1]);
> +    rte_eth_dev_close(port_id);
> +    if (rte_dev_remove(rte_dev) < 0) {
> +        response = xasprintf("Device '%s' can not be removed", argv[1]);
>          goto error;
>      }
>  
> -    response = xasprintf("Device '%s' has been detached", argv[1]);
> +    response = xasprintf("All devices shared with device '%s' "
> +                          "have been detached", argv[1]);
>  
>      ovs_mutex_unlock(&dpdk_mutex);
>      unixctl_command_reply(conn, response);
>
Aaron Conole Dec. 15, 2018, 10:11 p.m. UTC | #3
Ophir Munk <ophirmu@mellanox.com> writes:

> Dpdk port representors were introduced in dpdk 18.xx.
> Prior to representors there was a one-to-one relationship
> between an rte device (e.g. PCI bus) and an eth device (a dpdk
> port id). With representors the relationship becomes one-to-many
> rte device to eth devices.
> For example in [3] there are two devices (representors) using the same
> PCI physical address 0000:08:00.0: "0000:08:00.0,representor=[3]" and
> "0000:08:00.0,representor=[5]".
> This commit handles the new one-to-many relationship. For example,
> when one of the device representors in [3] is closed - the PCI bus
> cannot be detached until the other device representor is closed as
> well. OVS remains backward compatible by supporting dpdk legacy PCI
> ports which do not include representors.
> Dpdk representors related commits are listed in [1]. dpdk representors
> documentation appears in [2]. A sample configuration which uses two
> representors ports (the output of "ovs-vsctl show" command) is
> shown in [3].
>
> [1]
> e0cb96204b71 ("net/i40e: add support for representor ports")
> cf80ba6e2038 ("net/ixgbe: add support for representor ports")
> 26c08b979d26 ("net/mlx5: add port representor awareness")
>
> [2]
> doc/guides/prog_guide/switch_representation.rst
>
> [3]
> Bridge "ovs_br0"
>     Port "ovs_br0"
>         Interface "ovs_br0"
>             type: internal
>     Port "port-rep3"
>         Interface "port-rep3"
>             type: dpdk
>             options: {dpdk-devargs="0000:08:00.0,representor=[3]"}
>     Port "port-rep5"
>         Interface "port-rep5"
>             type: dpdk
>             options: {dpdk-devargs="0000:08:00.0,representor=[5]"}
>     ovs_version: "2.10.90"
>
> Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
> ---

FYI, the following clang error seems to be spit out by this patch:

libtool: compile:  clang -DHAVE_CONFIG_H -I. -I ./include -I ./include -I ./lib -I ./lib -Wstrict-prototypes -Wall -Wextra -Wno-sign-compare -Wpointer-arith -Wformat -Wformat-security -Wswitch-enum -Wunused-parameter -Wbad-function-cast -Wcast-align -Wstrict-prototypes -Wold-style-definition -Wmissing-prototypes -Wmissing-field-initializers -Wthread-safety -fno-strict-aliasing -Wswitch-bool -Wlogical-not-parentheses -Wsizeof-array-argument -Wshift-negative-value -Qunused-arguments -Wshadow -Warray-bounds-pointer-arithmetic -mssse3 -I./dpdk-18.11/build/include -D_FILE_OFFSET_BITS=64 -Werror -Wno-cast-align -Wno-error=unused-command-line-argument -MT lib/stream-ssl.lo -MD -MP -MF lib/.deps/stream-ssl.Tpo -c lib/stream-ssl.c -o lib/stream-ssl.o
lib/netdev-dpdk.c:1700:23: error: calling function
      'netdev_dpdk_lookup_by_port_id' requires holding mutex 'dpdk_mutex'
      exclusively [-Werror,-Wthread-safety-analysis]
            dup_dev = netdev_dpdk_lookup_by_port_id(new_port_id);

> v1:
> 1. rebase on top of Kevin's patch
> dpdk: Update to use DPDK 18.11.[ovs-dev,v7,dpdk-latest,1/1] dpdk: Update to use DPDK 18.11.
> https://patchwork.ozlabs.org/patch/1005535/
> 2. skipping count of sibling ports in case the sibling port state is RTE_ETH_DEV_UNUSED 
>
>  lib/netdev-dpdk.c | 112 ++++++++++++++++++++++++++++++++++++++++--------------
>  1 file changed, 83 insertions(+), 29 deletions(-)
>
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 6b8e05e..30af043 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -1216,6 +1216,25 @@ dpdk_dev_parse_name(const char dev_name[], const char prefix[],
>      }
>  }
>  
> +/* get the number of OVS interfaces which have the same DPDK
> + * rte device (e.g. same pci bus address). */
> +static int
> +netdev_dpdk_get_num_ports(struct rte_device *device)
> +    OVS_REQUIRES(dpdk_mutex)
> +{
> +    struct netdev_dpdk *dev;
> +    int count;
> +
> +    count = 0;
> +    LIST_FOR_EACH (dev, list_node, &dpdk_list) {
> +        if (rte_eth_devices[dev->port_id].device == device
> +        && rte_eth_devices[dev->port_id].state != RTE_ETH_DEV_UNUSED) {
> +            count++;
> +        }
> +    }
> +    return count;
> +}
> +
>  static int
>  vhost_common_construct(struct netdev *netdev)
>      OVS_REQUIRES(dpdk_mutex)
> @@ -1351,20 +1370,23 @@ static void
>  netdev_dpdk_destruct(struct netdev *netdev)
>  {
>      struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> -    struct rte_eth_dev_info dev_info;
> +    struct rte_device *rte_dev;
>  
>      ovs_mutex_lock(&dpdk_mutex);
>  
>      rte_eth_dev_stop(dev->port_id);
>      dev->started = false;
> -
>      if (dev->attached) {
> +        /* Remove the port eth device */
>          rte_eth_dev_close(dev->port_id);
> -        rte_eth_dev_info_get(dev->port_id, &dev_info);
> -        if (dev_info.device && !rte_dev_remove(dev_info.device)) {
> -            VLOG_INFO("Device '%s' has been detached", dev->devargs);
> -        } else {
> -            VLOG_ERR("Device '%s' can not be detached", dev->devargs);
> +        VLOG_INFO("Device '%s' has been removed", dev->devargs);
> +        /* if this is the last port_id using this rte device
> +         * remove this rte device and all its eth devices */
> +        rte_dev = rte_eth_devices[dev->port_id].device;
> +        if (netdev_dpdk_get_num_ports(rte_dev) == 1) {
> +            if (rte_dev_remove(rte_dev) < 0) {
> +                VLOG_ERR("Device '%s' can not be detached", dev->devargs);
> +            }
>          }
>      }
>  
> @@ -1630,8 +1652,26 @@ netdev_dpdk_get_port_by_mac(const char *mac_str)
>      return DPDK_ETH_PORT_ID_INVALID;
>  }
>  
> +/* return the first DPDK port_id matching the devargs pattern */
> +static dpdk_port_t
> +netdev_dpdk_get_port_by_devargs(const char *devargs)
> +{
> +    struct rte_dev_iterator iterator;
> +    dpdk_port_t port_id;
> +
> +    if (rte_dev_probe(devargs)) {
> +        port_id = DPDK_ETH_PORT_ID_INVALID;
> +    } else {
> +        RTE_ETH_FOREACH_MATCHING_DEV(port_id, devargs, &iterator) {
> +            break;
> +        }
> +    }
> +    return port_id;
> +}
> +
>  /*
> - * Normally, a PCI id is enough for identifying a specific DPDK port.
> + * Normally, a PCI id (optionally followed by a representor number)
> + * is enough for identifying a specific DPDK port.
>   * However, for some NICs having multiple ports sharing the same PCI
>   * id, using PCI id won't work then.
>   *
> @@ -1644,29 +1684,32 @@ static dpdk_port_t
>  netdev_dpdk_process_devargs(struct netdev_dpdk *dev,
>                              const char *devargs, char **errp)
>  {
> -    char *name;
>      dpdk_port_t new_port_id = DPDK_ETH_PORT_ID_INVALID;
>  
>      if (strncmp(devargs, "class=eth,mac=", 14) == 0) {
>          new_port_id = netdev_dpdk_get_port_by_mac(&devargs[14]);
>      } else {
> -        name = xmemdup0(devargs, strcspn(devargs, ","));
> -        if (rte_eth_dev_get_port_by_name(name, &new_port_id)
> -                || !rte_eth_dev_is_valid_port(new_port_id)) {
> -            /* Device not found in DPDK, attempt to attach it */
> -            if (!rte_dev_probe(devargs)
> -                && !rte_eth_dev_get_port_by_name(name, &new_port_id)) {
> -                /* Attach successful */
> -                dev->attached = true;
> -                VLOG_INFO("Device '%s' attached to DPDK", devargs);
> -            } else {
> -                /* Attach unsuccessful */
> +        new_port_id = netdev_dpdk_get_port_by_devargs(devargs);
> +        if (!rte_eth_dev_is_valid_port(new_port_id)) {
> +            new_port_id = DPDK_ETH_PORT_ID_INVALID;
> +        } else {
> +            struct netdev_dpdk *dup_dev;
> +
> +            dup_dev = netdev_dpdk_lookup_by_port_id(new_port_id);
> +            if (dup_dev) {
> +                VLOG_WARN_BUF(errp, "'%s' is trying to use device '%s' "
> +                               "which is already in use by '%s'",
> +                          netdev_get_name(&dev->up), devargs,
> +                          netdev_get_name(&dup_dev->up));
>                  new_port_id = DPDK_ETH_PORT_ID_INVALID;
> +            } else {
> +                /* device successfully found */
> +                dev->attached = true;
> +                VLOG_INFO("Device '%s' attached to DPDK port %d",
> +                          devargs, new_port_id);
>              }
>          }
> -        free(name);
>      }
> -
>      if (new_port_id == DPDK_ETH_PORT_ID_INVALID) {
>          VLOG_WARN_BUF(errp, "Error attaching device '%s' to DPDK", devargs);
>      }
> @@ -3234,11 +3277,15 @@ netdev_dpdk_detach(struct unixctl_conn *conn, int argc OVS_UNUSED,
>      char *response;
>      dpdk_port_t port_id;
>      struct netdev_dpdk *dev;
> -    struct rte_eth_dev_info dev_info;
> +    struct rte_device *rte_dev;
> +    struct rte_dev_iterator iterator;
>  
>      ovs_mutex_lock(&dpdk_mutex);
>  
> -    if (rte_eth_dev_get_port_by_name(argv[1], &port_id)) {
> +    RTE_ETH_FOREACH_MATCHING_DEV(port_id, argv[1], &iterator) {
> +        break;
> +    }
> +    if (port_id == DPDK_ETH_PORT_ID_INVALID) {
>          response = xasprintf("Device '%s' not found in DPDK", argv[1]);
>          goto error;
>      }
> @@ -3251,15 +3298,22 @@ netdev_dpdk_detach(struct unixctl_conn *conn, int argc OVS_UNUSED,
>          goto error;
>      }
>  
> -    rte_eth_dev_close(port_id);
> +    rte_dev = rte_eth_devices[port_id].device;
> +    if (netdev_dpdk_get_num_ports(rte_dev)) {
> +        response = xasprintf("Device '%s' is being shared with other "
> +                             "interfaces. Remove them before detaching.",
> +                             argv[1]);
> +        goto error;
> +    }
>  
> -    rte_eth_dev_info_get(port_id, &dev_info);
> -    if (!dev_info.device || rte_dev_remove(dev_info.device)) {
> -        response = xasprintf("Device '%s' can not be detached", argv[1]);
> +    rte_eth_dev_close(port_id);
> +    if (rte_dev_remove(rte_dev) < 0) {
> +        response = xasprintf("Device '%s' can not be removed", argv[1]);
>          goto error;
>      }
>  
> -    response = xasprintf("Device '%s' has been detached", argv[1]);
> +    response = xasprintf("All devices shared with device '%s' "
> +                          "have been detached", argv[1]);
>  
>      ovs_mutex_unlock(&dpdk_mutex);
>      unixctl_command_reply(conn, response);
Ophir Munk Dec. 15, 2018, 11:28 p.m. UTC | #4
Hi Aaron,
It seems like a false negative, nevertheless this error must be eliminated.
Please note that line 700:
dup_dev = netdev_dpdk_lookup_by_port_id(new_port_id);
is called inside static function netdev_dpdk_process_devargs()
which is called only once by another static function netdev_dpdk_set_config() where the dpdk_mutex is taken.
So why does clang issue an error? 
I guess that since function netdev_dpdk_process_devargs() did not take dpdk_mutes explicitly inside its body - clang cannot detect the transitivity from netdev_dpdk_set_config to netdev_dpdk_process_devrags of taking the dpdk_mutex
I suggest adding OVS_REQUIRES(dpdk_mutex) explicitly in netdev_dpdk_process_devrags() in a hope to fix this error. 
I will add it in v2.

Is there a way to verify the clang errors independently before sending patches? Can you please advise on this?

Regards,
Ophir

  1686 static dpdk_port_t
  1687 netdev_dpdk_process_devargs(struct netdev_dpdk *dev,
  1688                             const char *devargs, char **errp)
  1689     OVS_REQUIRES(dpdk_mutex)
  1690 {
	  1691     dpdk_port_t new_port_id = DPDK_ETH_PORT_ID_INVALID;
	  1692
	  1693     if (strncmp(devargs, "class=eth,mac=", 14) == 0) {
		  1694         new_port_id = netdev_dpdk_get_port_by_mac(&devargs[14]);
		  1695     } else {
			  1696         new_port_id = netdev_dpdk_get_port_by_devargs(devargs);
			  1697         if (!rte_eth_dev_is_valid_port(new_port_id)) {
				  1698             new_port_id = DPDK_ETH_PORT_ID_INVALID;
				  1699         } else {
					  1700             struct netdev_dpdk *dup_dev;
					  1701
					  1702             dup_dev = netdev_dpdk_lookup_by_port_id(new_port_id);
					  1703             if (dup_dev) {
						  1704                 VLOG_WARN_BUF(errp, "'%s' is trying to use d


static int
1755 netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args,
1756                        char **errp)
1757 {


> -----Original Message-----
> From: Aaron Conole <aconole@bytheb.org>
> Sent: Sunday, December 16, 2018 12:12 AM
> To: Ophir Munk <ophirmu@mellanox.com>
> Cc: ovs-dev@openvswitch.org; Shahaf Shuler <shahafs@mellanox.com>;
> Thomas Monjalon <thomas@monjalon.net>
> Subject: Re: [ovs-dev] [dpdk-hwol PATCH v1] netdev-dpdk: support port
> representors
> 
> Ophir Munk <ophirmu@mellanox.com> writes:
> 
> > Dpdk port representors were introduced in dpdk 18.xx.
> > Prior to representors there was a one-to-one relationship between an
> > rte device (e.g. PCI bus) and an eth device (a dpdk port id). With
> > representors the relationship becomes one-to-many rte device to eth
> > devices.
> > For example in [3] there are two devices (representors) using the same
> > PCI physical address 0000:08:00.0: "0000:08:00.0,representor=[3]" and
> > "0000:08:00.0,representor=[5]".
> > This commit handles the new one-to-many relationship. For example,
> > when one of the device representors in [3] is closed - the PCI bus
> > cannot be detached until the other device representor is closed as
> > well. OVS remains backward compatible by supporting dpdk legacy PCI
> > ports which do not include representors.
> > Dpdk representors related commits are listed in [1]. dpdk representors
> > documentation appears in [2]. A sample configuration which uses two
> > representors ports (the output of "ovs-vsctl show" command) is shown
> > in [3].
> >
> > [1]
> > e0cb96204b71 ("net/i40e: add support for representor ports")
> > cf80ba6e2038 ("net/ixgbe: add support for representor ports")
> > 26c08b979d26 ("net/mlx5: add port representor awareness")
> >
> > [2]
> > doc/guides/prog_guide/switch_representation.rst
> >
> > [3]
> > Bridge "ovs_br0"
> >     Port "ovs_br0"
> >         Interface "ovs_br0"
> >             type: internal
> >     Port "port-rep3"
> >         Interface "port-rep3"
> >             type: dpdk
> >             options: {dpdk-devargs="0000:08:00.0,representor=[3]"}
> >     Port "port-rep5"
> >         Interface "port-rep5"
> >             type: dpdk
> >             options: {dpdk-devargs="0000:08:00.0,representor=[5]"}
> >     ovs_version: "2.10.90"
> >
> > Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
> > ---
> 
> FYI, the following clang error seems to be spit out by this patch:
> 
> libtool: compile:  clang -DHAVE_CONFIG_H -I. -I ./include -I ./include -I ./lib -I
> ./lib -Wstrict-prototypes -Wall -Wextra -Wno-sign-compare -Wpointer-arith -
> Wformat -Wformat-security -Wswitch-enum -Wunused-parameter -Wbad-
> function-cast -Wcast-align -Wstrict-prototypes -Wold-style-definition -
> Wmissing-prototypes -Wmissing-field-initializers -Wthread-safety -fno-strict-
> aliasing -Wswitch-bool -Wlogical-not-parentheses -Wsizeof-array-argument -
> Wshift-negative-value -Qunused-arguments -Wshadow -Warray-bounds-
> pointer-arithmetic -mssse3 -I./dpdk-18.11/build/include -
> D_FILE_OFFSET_BITS=64 -Werror -Wno-cast-align -Wno-error=unused-
> command-line-argument -MT lib/stream-ssl.lo -MD -MP -MF
> lib/.deps/stream-ssl.Tpo -c lib/stream-ssl.c -o lib/stream-ssl.o
> lib/netdev-dpdk.c:1700:23: error: calling function
>       'netdev_dpdk_lookup_by_port_id' requires holding mutex 'dpdk_mutex'
>       exclusively [-Werror,-Wthread-safety-analysis]
>             dup_dev = netdev_dpdk_lookup_by_port_id(new_port_id);
> 
> > v1:
> > 1. rebase on top of Kevin's patch
> > dpdk: Update to use DPDK 18.11.[ovs-dev,v7,dpdk-latest,1/1] dpdk:
> Update to use DPDK 18.11.
> >
> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpat
> >
> chwork.ozlabs.org%2Fpatch%2F1005535%2F&amp;data=02%7C01%7Cophirm
> u%40me
> >
> llanox.com%7Cf6a20b11663f4ab1a39608d662da4f79%7Ca652971c7d2e4d9ba6
> a4d1
> >
> 49256f461b%7C0%7C0%7C636805087113130623&amp;sdata=A%2BiFB3q%2B
> HNqRPua0
> > qyCA06ZwfapeUNKNG2%2F0XsbNcRE%3D&amp;reserved=0
> > 2. skipping count of sibling ports in case the sibling port state is
> > RTE_ETH_DEV_UNUSED
> >
> >  lib/netdev-dpdk.c | 112
> > ++++++++++++++++++++++++++++++++++++++++--------------
> >  1 file changed, 83 insertions(+), 29 deletions(-)
> >
> > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index
> > 6b8e05e..30af043 100644
> > --- a/lib/netdev-dpdk.c
> > +++ b/lib/netdev-dpdk.c
> > @@ -1216,6 +1216,25 @@ dpdk_dev_parse_name(const char dev_name[],
> const char prefix[],
> >      }
> >  }
> >
> > +/* get the number of OVS interfaces which have the same DPDK
> > + * rte device (e.g. same pci bus address). */ static int
> > +netdev_dpdk_get_num_ports(struct rte_device *device)
> > +    OVS_REQUIRES(dpdk_mutex)
> > +{
> > +    struct netdev_dpdk *dev;
> > +    int count;
> > +
> > +    count = 0;
> > +    LIST_FOR_EACH (dev, list_node, &dpdk_list) {
> > +        if (rte_eth_devices[dev->port_id].device == device
> > +        && rte_eth_devices[dev->port_id].state != RTE_ETH_DEV_UNUSED)
> {
> > +            count++;
> > +        }
> > +    }
> > +    return count;
> > +}
> > +
> >  static int
> >  vhost_common_construct(struct netdev *netdev)
> >      OVS_REQUIRES(dpdk_mutex)
> > @@ -1351,20 +1370,23 @@ static void
> >  netdev_dpdk_destruct(struct netdev *netdev)  {
> >      struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> > -    struct rte_eth_dev_info dev_info;
> > +    struct rte_device *rte_dev;
> >
> >      ovs_mutex_lock(&dpdk_mutex);
> >
> >      rte_eth_dev_stop(dev->port_id);
> >      dev->started = false;
> > -
> >      if (dev->attached) {
> > +        /* Remove the port eth device */
> >          rte_eth_dev_close(dev->port_id);
> > -        rte_eth_dev_info_get(dev->port_id, &dev_info);
> > -        if (dev_info.device && !rte_dev_remove(dev_info.device)) {
> > -            VLOG_INFO("Device '%s' has been detached", dev->devargs);
> > -        } else {
> > -            VLOG_ERR("Device '%s' can not be detached", dev->devargs);
> > +        VLOG_INFO("Device '%s' has been removed", dev->devargs);
> > +        /* if this is the last port_id using this rte device
> > +         * remove this rte device and all its eth devices */
> > +        rte_dev = rte_eth_devices[dev->port_id].device;
> > +        if (netdev_dpdk_get_num_ports(rte_dev) == 1) {
> > +            if (rte_dev_remove(rte_dev) < 0) {
> > +                VLOG_ERR("Device '%s' can not be detached", dev->devargs);
> > +            }
> >          }
> >      }
> >
> > @@ -1630,8 +1652,26 @@ netdev_dpdk_get_port_by_mac(const char
> *mac_str)
> >      return DPDK_ETH_PORT_ID_INVALID;
> >  }
> >
> > +/* return the first DPDK port_id matching the devargs pattern */
> > +static dpdk_port_t netdev_dpdk_get_port_by_devargs(const char
> > +*devargs) {
> > +    struct rte_dev_iterator iterator;
> > +    dpdk_port_t port_id;
> > +
> > +    if (rte_dev_probe(devargs)) {
> > +        port_id = DPDK_ETH_PORT_ID_INVALID;
> > +    } else {
> > +        RTE_ETH_FOREACH_MATCHING_DEV(port_id, devargs, &iterator) {
> > +            break;
> > +        }
> > +    }
> > +    return port_id;
> > +}
> > +
> >  /*
> > - * Normally, a PCI id is enough for identifying a specific DPDK port.
> > + * Normally, a PCI id (optionally followed by a representor number)
> > + * is enough for identifying a specific DPDK port.
> >   * However, for some NICs having multiple ports sharing the same PCI
> >   * id, using PCI id won't work then.
> >   *
> > @@ -1644,29 +1684,32 @@ static dpdk_port_t
> > netdev_dpdk_process_devargs(struct netdev_dpdk *dev,
> >                              const char *devargs, char **errp)  {
> > -    char *name;
> >      dpdk_port_t new_port_id = DPDK_ETH_PORT_ID_INVALID;
> >
> >      if (strncmp(devargs, "class=eth,mac=", 14) == 0) {
> >          new_port_id = netdev_dpdk_get_port_by_mac(&devargs[14]);
> >      } else {
> > -        name = xmemdup0(devargs, strcspn(devargs, ","));
> > -        if (rte_eth_dev_get_port_by_name(name, &new_port_id)
> > -                || !rte_eth_dev_is_valid_port(new_port_id)) {
> > -            /* Device not found in DPDK, attempt to attach it */
> > -            if (!rte_dev_probe(devargs)
> > -                && !rte_eth_dev_get_port_by_name(name, &new_port_id)) {
> > -                /* Attach successful */
> > -                dev->attached = true;
> > -                VLOG_INFO("Device '%s' attached to DPDK", devargs);
> > -            } else {
> > -                /* Attach unsuccessful */
> > +        new_port_id = netdev_dpdk_get_port_by_devargs(devargs);
> > +        if (!rte_eth_dev_is_valid_port(new_port_id)) {
> > +            new_port_id = DPDK_ETH_PORT_ID_INVALID;
> > +        } else {
> > +            struct netdev_dpdk *dup_dev;
> > +
> > +            dup_dev = netdev_dpdk_lookup_by_port_id(new_port_id);
> > +            if (dup_dev) {
> > +                VLOG_WARN_BUF(errp, "'%s' is trying to use device '%s' "
> > +                               "which is already in use by '%s'",
> > +                          netdev_get_name(&dev->up), devargs,
> > +                          netdev_get_name(&dup_dev->up));
> >                  new_port_id = DPDK_ETH_PORT_ID_INVALID;
> > +            } else {
> > +                /* device successfully found */
> > +                dev->attached = true;
> > +                VLOG_INFO("Device '%s' attached to DPDK port %d",
> > +                          devargs, new_port_id);
> >              }
> >          }
> > -        free(name);
> >      }
> > -
> >      if (new_port_id == DPDK_ETH_PORT_ID_INVALID) {
> >          VLOG_WARN_BUF(errp, "Error attaching device '%s' to DPDK",
> devargs);
> >      }
> > @@ -3234,11 +3277,15 @@ netdev_dpdk_detach(struct unixctl_conn
> *conn, int argc OVS_UNUSED,
> >      char *response;
> >      dpdk_port_t port_id;
> >      struct netdev_dpdk *dev;
> > -    struct rte_eth_dev_info dev_info;
> > +    struct rte_device *rte_dev;
> > +    struct rte_dev_iterator iterator;
> >
> >      ovs_mutex_lock(&dpdk_mutex);
> >
> > -    if (rte_eth_dev_get_port_by_name(argv[1], &port_id)) {
> > +    RTE_ETH_FOREACH_MATCHING_DEV(port_id, argv[1], &iterator) {
> > +        break;
> > +    }
> > +    if (port_id == DPDK_ETH_PORT_ID_INVALID) {
> >          response = xasprintf("Device '%s' not found in DPDK", argv[1]);
> >          goto error;
> >      }
> > @@ -3251,15 +3298,22 @@ netdev_dpdk_detach(struct unixctl_conn
> *conn, int argc OVS_UNUSED,
> >          goto error;
> >      }
> >
> > -    rte_eth_dev_close(port_id);
> > +    rte_dev = rte_eth_devices[port_id].device;
> > +    if (netdev_dpdk_get_num_ports(rte_dev)) {
> > +        response = xasprintf("Device '%s' is being shared with other "
> > +                             "interfaces. Remove them before detaching.",
> > +                             argv[1]);
> > +        goto error;
> > +    }
> >
> > -    rte_eth_dev_info_get(port_id, &dev_info);
> > -    if (!dev_info.device || rte_dev_remove(dev_info.device)) {
> > -        response = xasprintf("Device '%s' can not be detached", argv[1]);
> > +    rte_eth_dev_close(port_id);
> > +    if (rte_dev_remove(rte_dev) < 0) {
> > +        response = xasprintf("Device '%s' can not be removed",
> > + argv[1]);
> >          goto error;
> >      }
> >
> > -    response = xasprintf("Device '%s' has been detached", argv[1]);
> > +    response = xasprintf("All devices shared with device '%s' "
> > +                          "have been detached", argv[1]);
> >
> >      ovs_mutex_unlock(&dpdk_mutex);
> >      unixctl_command_reply(conn, response);
Ophir Munk Dec. 16, 2018, 12:02 p.m. UTC | #5
Hi Ilya,
Please find comments inline.

> -----Original Message-----
> From: Ilya Maximets <i.maximets@samsung.com>
> Sent: Wednesday, December 12, 2018 5:56 PM
> To: Ophir Munk <ophirmu@mellanox.com>; ovs-dev@openvswitch.org
> Cc: Shahaf Shuler <shahafs@mellanox.com>; Thomas Monjalon
> <thomas@monjalon.net>; Stokes, Ian <ian.stokes@intel.com>
> Subject: Re: [ovs-dev,dpdk-hwol,v1] netdev-dpdk: support port
> representors
> 
> Not a full review. Just one comment inline.
> 
> Best regards, Ilya Maximets.
> 
> On 12.12.2018 2:34, Ophir Munk wrote:
> > Dpdk port representors were introduced in dpdk 18.xx.
> 
.....
> >
> > -    rte_eth_dev_close(port_id);
> > +    rte_dev = rte_eth_devices[port_id].device;
> 
> 
> We should not use 'rte_eth_devices' directly because it's internal to DPDK.
> See the discussion here:
>  http://mails.dpdk.org/archives/dev/2018-November/119198.html

The discussions mentioned in the link raised pros and cons for using a direct access to rte_eth_devices[] but was not conclusive.
I am in favor of keeping the direct access to rte_eth_devices array for the following reasons:
1. Using rte_eth_dev_info_get() API uses the same pointer to the internal rte_eth_devices[] array. So actually it is as dangerous as accessing the array directly.
2. Theoretically there is logic in using the API as it may have a safer implementation in the future. However, I talked with Thomans Monjalon - the maintainer of this API - and he recommended using the direct array access. Thomas mentioned that this API will be dropped. 

Thomas - can you please share your view on this topic?

Regards,
Ophir
Ophir Munk Dec. 16, 2018, 12:21 p.m. UTC | #6
Hi Ian,
Please find comments inline.

> -----Original Message-----
> From: Ian Stokes <ian.stokes@intel.com>
> Sent: Wednesday, December 12, 2018 5:41 PM
> To: Ophir Munk <ophirmu@mellanox.com>; ovs-dev@openvswitch.org
> Cc: Asaf Penso <asafp@mellanox.com>; Shahaf Shuler
> <shahafs@mellanox.com>; Kevin Traynor <ktraynor@redhat.com>; Thomas
> Monjalon <thomas@monjalon.net>; Olga Shern <olgas@mellanox.com>
> Subject: Re: [dpdk-hwol PATCH v1] netdev-dpdk: support port representors
> 
> On 12/11/2018 11:34 PM, Ophir Munk wrote:
> > Dpdk port representors were introduced in dpdk 18.xx.
...
> 
> Thanks for this Ophir.
> 
> I need to complete some more testing on various devices before a full
> review can be completed but here are some first pass comments in the
> mean time that could be addressed for a v2.
> 
> Also with support for 18.11 being upstreamed to master (by tomorrow if
> there are no other comments), will this patch be targeting OVS 2.11? If so
> you could target the v2 at master once 18.11 is there rather than dpdk-hwol
> but I'll leave it to you.
> 

Yes, this patch is targeting OVS 2.11.
All your review comments are accepted. 
I will send v2 on master with updates following your comments.

Regards,
Ophir
Thomas Monjalon Dec. 17, 2018, 9:49 a.m. UTC | #7
16/12/2018 13:02, Ophir Munk:
> Hi Ilya,
> Please find comments inline.
> 
> From: Ilya Maximets <i.maximets@samsung.com>
> > 
> > Not a full review. Just one comment inline.
> > 
> > Best regards, Ilya Maximets.
> > 
> > On 12.12.2018 2:34, Ophir Munk wrote:
> > > Dpdk port representors were introduced in dpdk 18.xx.
> > 
> .....
> > >
> > > -    rte_eth_dev_close(port_id);
> > > +    rte_dev = rte_eth_devices[port_id].device;
> > 
> > 
> > We should not use 'rte_eth_devices' directly because it's internal to DPDK.
> > See the discussion here:
> >  http://mails.dpdk.org/archives/dev/2018-November/119198.html
> 
> The discussions mentioned in the link raised pros and cons for using a direct access to rte_eth_devices[] but was not conclusive.
> I am in favor of keeping the direct access to rte_eth_devices array for the following reasons:
> 1. Using rte_eth_dev_info_get() API uses the same pointer to the internal rte_eth_devices[] array. So actually it is as dangerous as accessing the array directly.
> 2. Theoretically there is logic in using the API as it may have a safer implementation in the future. However, I talked with Thomans Monjalon - the maintainer of this API - and he recommended using the direct array access. Thomas mentioned that this API will be dropped. 
> 
> Thomas - can you please share your view on this topic?

I agree with the idea of avoiding direct access to this structure.
That's why I wrote a new function specifically for this use case:
	https://patches.dpdk.org/patch/48429/

About workarounding the direct access with rte_eth_dev_info_get(),
I am not sure it is a great idea.
In my opinion, it is best to keep the direct access to the array
and add a comment about the need to replace it with a proper API.
Ilya Maximets Dec. 18, 2018, 12:11 p.m. UTC | #8
On 17.12.2018 12:49, Thomas Monjalon wrote:
> 16/12/2018 13:02, Ophir Munk:
>> Hi Ilya,
>> Please find comments inline.
>>
>> From: Ilya Maximets <i.maximets@samsung.com>
>>>
>>> Not a full review. Just one comment inline.
>>>
>>> Best regards, Ilya Maximets.
>>>
>>> On 12.12.2018 2:34, Ophir Munk wrote:
>>>> Dpdk port representors were introduced in dpdk 18.xx.
>>>
>> .....
>>>>
>>>> -    rte_eth_dev_close(port_id);
>>>> +    rte_dev = rte_eth_devices[port_id].device;
>>>
>>>
>>> We should not use 'rte_eth_devices' directly because it's internal to DPDK.
>>> See the discussion here:
>>>  http://mails.dpdk.org/archives/dev/2018-November/119198.html
>>
>> The discussions mentioned in the link raised pros and cons for using a direct access to rte_eth_devices[] but was not conclusive.
>> I am in favor of keeping the direct access to rte_eth_devices array for the following reasons:
>> 1. Using rte_eth_dev_info_get() API uses the same pointer to the internal rte_eth_devices[] array. So actually it is as dangerous as accessing the array directly.
>> 2. Theoretically there is logic in using the API as it may have a safer implementation in the future. However, I talked with Thomans Monjalon - the maintainer of this API - and he recommended using the direct array access. Thomas mentioned that this API will be dropped. 
>>
>> Thomas - can you please share your view on this topic?
> 
> I agree with the idea of avoiding direct access to this structure.
> That's why I wrote a new function specifically for this use case:
> 	https://patches.dpdk.org/patch/48429/
> 
> About workarounding the direct access with rte_eth_dev_info_get(),
> I am not sure it is a great idea.
> In my opinion, it is best to keep the direct access to the array
> and add a comment about the need to replace it with a proper API.

OK. Thank you, Thomas. That sounds reasonable.
Ophir, please, add a "FIXME" comment that describes the situation.

Best regards, Ilya Maximets.
Ophir Munk Dec. 20, 2018, 11:34 a.m. UTC | #9
> -----Original Message-----
> From: Ilya Maximets <i.maximets@samsung.com>
> Sent: Tuesday, December 18, 2018 2:12 PM
> To: Thomas Monjalon <thomas@monjalon.net>; Ophir Munk
> <ophirmu@mellanox.com>
> Cc: ovs-dev@openvswitch.org; Shahaf Shuler <shahafs@mellanox.com>;
> Stokes, Ian <ian.stokes@intel.com>; Olga Shern <olgas@mellanox.com>;
> Asaf Penso <asafp@mellanox.com>; Kevin Traynor <ktraynor@redhat.com>
> Subject: Re: [ovs-dev,dpdk-hwol,v1] netdev-dpdk: support port
> representors
> 
> On 17.12.2018 12:49, Thomas Monjalon wrote:
> > 16/12/2018 13:02, Ophir Munk:
...
> >>>
> >>> On 12.12.2018 2:34, Ophir Munk wrote:
> >>>> Dpdk port representors were introduced in dpdk 18.xx.
> >>>
> >> .....
> Ophir, please, add a "FIXME" comment that describes the situation.
> 

I added the FIXME comment and sent v3
Aaron Conole Dec. 20, 2018, 6:32 p.m. UTC | #10
Ophir Munk <ophirmu@mellanox.com> writes:

> Hi Aaron,
> It seems like a false negative, nevertheless this error must be eliminated.
> Please note that line 700:
> dup_dev = netdev_dpdk_lookup_by_port_id(new_port_id);
> is called inside static function netdev_dpdk_process_devargs()
> which is called only once by another static function netdev_dpdk_set_config() where the dpdk_mutex is taken.
> So why does clang issue an error?

It's not clear to me why the compiler's flow-path analysis didn't
properly detect this chain.  A brief inspection makes me think that it
might need some robustness improvement from the compiler side - it may
be worth filing a bug with the clang/llvm folks if it really is an error
on the compiler's part.

> I guess that since function netdev_dpdk_process_devargs() did not take dpdk_mutes explicitly inside its body - clang cannot detect the transitivity from netdev_dpdk_set_config to netdev_dpdk_process_devrags of taking the dpdk_mutex
> I suggest adding OVS_REQUIRES(dpdk_mutex) explicitly in netdev_dpdk_process_devrags() in a hope to fix this error. 
> I will add it in v2.

Sounds plausible.

> Is there a way to verify the clang errors independently before sending patches? Can you please advise on this?

You can always double check by using travis with a github account, or
clang locally (you'll want to set CC and CXX to your clang compiler,
but I don't know if you might need to adjust any other variables).

> Regards,
> Ophir
>
>   1686 static dpdk_port_t
>   1687 netdev_dpdk_process_devargs(struct netdev_dpdk *dev,
>   1688                             const char *devargs, char **errp)
>   1689     OVS_REQUIRES(dpdk_mutex)
>   1690 {
> 	  1691     dpdk_port_t new_port_id = DPDK_ETH_PORT_ID_INVALID;
> 	  1692
> 	  1693     if (strncmp(devargs, "class=eth,mac=", 14) == 0) {
> 		  1694         new_port_id = netdev_dpdk_get_port_by_mac(&devargs[14]);
> 		  1695     } else {
> 			  1696         new_port_id = netdev_dpdk_get_port_by_devargs(devargs);
> 			  1697         if (!rte_eth_dev_is_valid_port(new_port_id)) {
> 				  1698             new_port_id = DPDK_ETH_PORT_ID_INVALID;
> 				  1699         } else {
> 					  1700             struct netdev_dpdk *dup_dev;
> 					  1701
> 					  1702             dup_dev = netdev_dpdk_lookup_by_port_id(new_port_id);
> 					  1703             if (dup_dev) {
> 						  1704                 VLOG_WARN_BUF(errp, "'%s' is trying to use d
>
>
> static int
> 1755 netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args,
> 1756                        char **errp)
> 1757 {
>
>
>> -----Original Message-----
>> From: Aaron Conole <aconole@bytheb.org>
>> Sent: Sunday, December 16, 2018 12:12 AM
>> To: Ophir Munk <ophirmu@mellanox.com>
>> Cc: ovs-dev@openvswitch.org; Shahaf Shuler <shahafs@mellanox.com>;
>> Thomas Monjalon <thomas@monjalon.net>
>> Subject: Re: [ovs-dev] [dpdk-hwol PATCH v1] netdev-dpdk: support port
>> representors
>> 
>> Ophir Munk <ophirmu@mellanox.com> writes:
>> 
>> > Dpdk port representors were introduced in dpdk 18.xx.
>> > Prior to representors there was a one-to-one relationship between an
>> > rte device (e.g. PCI bus) and an eth device (a dpdk port id). With
>> > representors the relationship becomes one-to-many rte device to eth
>> > devices.
>> > For example in [3] there are two devices (representors) using the same
>> > PCI physical address 0000:08:00.0: "0000:08:00.0,representor=[3]" and
>> > "0000:08:00.0,representor=[5]".
>> > This commit handles the new one-to-many relationship. For example,
>> > when one of the device representors in [3] is closed - the PCI bus
>> > cannot be detached until the other device representor is closed as
>> > well. OVS remains backward compatible by supporting dpdk legacy PCI
>> > ports which do not include representors.
>> > Dpdk representors related commits are listed in [1]. dpdk representors
>> > documentation appears in [2]. A sample configuration which uses two
>> > representors ports (the output of "ovs-vsctl show" command) is shown
>> > in [3].
>> >
>> > [1]
>> > e0cb96204b71 ("net/i40e: add support for representor ports")
>> > cf80ba6e2038 ("net/ixgbe: add support for representor ports")
>> > 26c08b979d26 ("net/mlx5: add port representor awareness")
>> >
>> > [2]
>> > doc/guides/prog_guide/switch_representation.rst
>> >
>> > [3]
>> > Bridge "ovs_br0"
>> >     Port "ovs_br0"
>> >         Interface "ovs_br0"
>> >             type: internal
>> >     Port "port-rep3"
>> >         Interface "port-rep3"
>> >             type: dpdk
>> >             options: {dpdk-devargs="0000:08:00.0,representor=[3]"}
>> >     Port "port-rep5"
>> >         Interface "port-rep5"
>> >             type: dpdk
>> >             options: {dpdk-devargs="0000:08:00.0,representor=[5]"}
>> >     ovs_version: "2.10.90"
>> >
>> > Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
>> > ---
>> 
>> FYI, the following clang error seems to be spit out by this patch:
>> 
>> libtool: compile:  clang -DHAVE_CONFIG_H -I. -I ./include -I ./include -I ./lib -I
>> ./lib -Wstrict-prototypes -Wall -Wextra -Wno-sign-compare -Wpointer-arith -
>> Wformat -Wformat-security -Wswitch-enum -Wunused-parameter -Wbad-
>> function-cast -Wcast-align -Wstrict-prototypes -Wold-style-definition -
>> Wmissing-prototypes -Wmissing-field-initializers -Wthread-safety -fno-strict-
>> aliasing -Wswitch-bool -Wlogical-not-parentheses -Wsizeof-array-argument -
>> Wshift-negative-value -Qunused-arguments -Wshadow -Warray-bounds-
>> pointer-arithmetic -mssse3 -I./dpdk-18.11/build/include -
>> D_FILE_OFFSET_BITS=64 -Werror -Wno-cast-align -Wno-error=unused-
>> command-line-argument -MT lib/stream-ssl.lo -MD -MP -MF
>> lib/.deps/stream-ssl.Tpo -c lib/stream-ssl.c -o lib/stream-ssl.o
>> lib/netdev-dpdk.c:1700:23: error: calling function
>>       'netdev_dpdk_lookup_by_port_id' requires holding mutex 'dpdk_mutex'
>>       exclusively [-Werror,-Wthread-safety-analysis]
>>             dup_dev = netdev_dpdk_lookup_by_port_id(new_port_id);
>> 
>> > v1:
>> > 1. rebase on top of Kevin's patch
>> > dpdk: Update to use DPDK 18.11.[ovs-dev,v7,dpdk-latest,1/1] dpdk:
>> Update to use DPDK 18.11.
>> >
>> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpat
>> >
>> chwork.ozlabs.org%2Fpatch%2F1005535%2F&amp;data=02%7C01%7Cophirm
>> u%40me
>> >
>> llanox.com%7Cf6a20b11663f4ab1a39608d662da4f79%7Ca652971c7d2e4d9ba6
>> a4d1
>> >
>> 49256f461b%7C0%7C0%7C636805087113130623&amp;sdata=A%2BiFB3q%2B
>> HNqRPua0
>> > qyCA06ZwfapeUNKNG2%2F0XsbNcRE%3D&amp;reserved=0
>> > 2. skipping count of sibling ports in case the sibling port state is
>> > RTE_ETH_DEV_UNUSED
>> >
>> >  lib/netdev-dpdk.c | 112
>> > ++++++++++++++++++++++++++++++++++++++++--------------
>> >  1 file changed, 83 insertions(+), 29 deletions(-)
>> >
>> > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index
>> > 6b8e05e..30af043 100644
>> > --- a/lib/netdev-dpdk.c
>> > +++ b/lib/netdev-dpdk.c
>> > @@ -1216,6 +1216,25 @@ dpdk_dev_parse_name(const char dev_name[],
>> const char prefix[],
>> >      }
>> >  }
>> >
>> > +/* get the number of OVS interfaces which have the same DPDK
>> > + * rte device (e.g. same pci bus address). */ static int
>> > +netdev_dpdk_get_num_ports(struct rte_device *device)
>> > +    OVS_REQUIRES(dpdk_mutex)
>> > +{
>> > +    struct netdev_dpdk *dev;
>> > +    int count;
>> > +
>> > +    count = 0;
>> > +    LIST_FOR_EACH (dev, list_node, &dpdk_list) {
>> > +        if (rte_eth_devices[dev->port_id].device == device
>> > +        && rte_eth_devices[dev->port_id].state != RTE_ETH_DEV_UNUSED)
>> {
>> > +            count++;
>> > +        }
>> > +    }
>> > +    return count;
>> > +}
>> > +
>> >  static int
>> >  vhost_common_construct(struct netdev *netdev)
>> >      OVS_REQUIRES(dpdk_mutex)
>> > @@ -1351,20 +1370,23 @@ static void
>> >  netdev_dpdk_destruct(struct netdev *netdev)  {
>> >      struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>> > -    struct rte_eth_dev_info dev_info;
>> > +    struct rte_device *rte_dev;
>> >
>> >      ovs_mutex_lock(&dpdk_mutex);
>> >
>> >      rte_eth_dev_stop(dev->port_id);
>> >      dev->started = false;
>> > -
>> >      if (dev->attached) {
>> > +        /* Remove the port eth device */
>> >          rte_eth_dev_close(dev->port_id);
>> > -        rte_eth_dev_info_get(dev->port_id, &dev_info);
>> > -        if (dev_info.device && !rte_dev_remove(dev_info.device)) {
>> > -            VLOG_INFO("Device '%s' has been detached", dev->devargs);
>> > -        } else {
>> > -            VLOG_ERR("Device '%s' can not be detached", dev->devargs);
>> > +        VLOG_INFO("Device '%s' has been removed", dev->devargs);
>> > +        /* if this is the last port_id using this rte device
>> > +         * remove this rte device and all its eth devices */
>> > +        rte_dev = rte_eth_devices[dev->port_id].device;
>> > +        if (netdev_dpdk_get_num_ports(rte_dev) == 1) {
>> > +            if (rte_dev_remove(rte_dev) < 0) {
>> > +                VLOG_ERR("Device '%s' can not be detached", dev->devargs);
>> > +            }
>> >          }
>> >      }
>> >
>> > @@ -1630,8 +1652,26 @@ netdev_dpdk_get_port_by_mac(const char
>> *mac_str)
>> >      return DPDK_ETH_PORT_ID_INVALID;
>> >  }
>> >
>> > +/* return the first DPDK port_id matching the devargs pattern */
>> > +static dpdk_port_t netdev_dpdk_get_port_by_devargs(const char
>> > +*devargs) {
>> > +    struct rte_dev_iterator iterator;
>> > +    dpdk_port_t port_id;
>> > +
>> > +    if (rte_dev_probe(devargs)) {
>> > +        port_id = DPDK_ETH_PORT_ID_INVALID;
>> > +    } else {
>> > +        RTE_ETH_FOREACH_MATCHING_DEV(port_id, devargs, &iterator) {
>> > +            break;
>> > +        }
>> > +    }
>> > +    return port_id;
>> > +}
>> > +
>> >  /*
>> > - * Normally, a PCI id is enough for identifying a specific DPDK port.
>> > + * Normally, a PCI id (optionally followed by a representor number)
>> > + * is enough for identifying a specific DPDK port.
>> >   * However, for some NICs having multiple ports sharing the same PCI
>> >   * id, using PCI id won't work then.
>> >   *
>> > @@ -1644,29 +1684,32 @@ static dpdk_port_t
>> > netdev_dpdk_process_devargs(struct netdev_dpdk *dev,
>> >                              const char *devargs, char **errp)  {
>> > -    char *name;
>> >      dpdk_port_t new_port_id = DPDK_ETH_PORT_ID_INVALID;
>> >
>> >      if (strncmp(devargs, "class=eth,mac=", 14) == 0) {
>> >          new_port_id = netdev_dpdk_get_port_by_mac(&devargs[14]);
>> >      } else {
>> > -        name = xmemdup0(devargs, strcspn(devargs, ","));
>> > -        if (rte_eth_dev_get_port_by_name(name, &new_port_id)
>> > -                || !rte_eth_dev_is_valid_port(new_port_id)) {
>> > -            /* Device not found in DPDK, attempt to attach it */
>> > -            if (!rte_dev_probe(devargs)
>> > -                && !rte_eth_dev_get_port_by_name(name, &new_port_id)) {
>> > -                /* Attach successful */
>> > -                dev->attached = true;
>> > -                VLOG_INFO("Device '%s' attached to DPDK", devargs);
>> > -            } else {
>> > -                /* Attach unsuccessful */
>> > +        new_port_id = netdev_dpdk_get_port_by_devargs(devargs);
>> > +        if (!rte_eth_dev_is_valid_port(new_port_id)) {
>> > +            new_port_id = DPDK_ETH_PORT_ID_INVALID;
>> > +        } else {
>> > +            struct netdev_dpdk *dup_dev;
>> > +
>> > +            dup_dev = netdev_dpdk_lookup_by_port_id(new_port_id);
>> > +            if (dup_dev) {
>> > +                VLOG_WARN_BUF(errp, "'%s' is trying to use device '%s' "
>> > +                               "which is already in use by '%s'",
>> > +                          netdev_get_name(&dev->up), devargs,
>> > +                          netdev_get_name(&dup_dev->up));
>> >                  new_port_id = DPDK_ETH_PORT_ID_INVALID;
>> > +            } else {
>> > +                /* device successfully found */
>> > +                dev->attached = true;
>> > +                VLOG_INFO("Device '%s' attached to DPDK port %d",
>> > +                          devargs, new_port_id);
>> >              }
>> >          }
>> > -        free(name);
>> >      }
>> > -
>> >      if (new_port_id == DPDK_ETH_PORT_ID_INVALID) {
>> >          VLOG_WARN_BUF(errp, "Error attaching device '%s' to DPDK",
>> devargs);
>> >      }
>> > @@ -3234,11 +3277,15 @@ netdev_dpdk_detach(struct unixctl_conn
>> *conn, int argc OVS_UNUSED,
>> >      char *response;
>> >      dpdk_port_t port_id;
>> >      struct netdev_dpdk *dev;
>> > -    struct rte_eth_dev_info dev_info;
>> > +    struct rte_device *rte_dev;
>> > +    struct rte_dev_iterator iterator;
>> >
>> >      ovs_mutex_lock(&dpdk_mutex);
>> >
>> > -    if (rte_eth_dev_get_port_by_name(argv[1], &port_id)) {
>> > +    RTE_ETH_FOREACH_MATCHING_DEV(port_id, argv[1], &iterator) {
>> > +        break;
>> > +    }
>> > +    if (port_id == DPDK_ETH_PORT_ID_INVALID) {
>> >          response = xasprintf("Device '%s' not found in DPDK", argv[1]);
>> >          goto error;
>> >      }
>> > @@ -3251,15 +3298,22 @@ netdev_dpdk_detach(struct unixctl_conn
>> *conn, int argc OVS_UNUSED,
>> >          goto error;
>> >      }
>> >
>> > -    rte_eth_dev_close(port_id);
>> > +    rte_dev = rte_eth_devices[port_id].device;
>> > +    if (netdev_dpdk_get_num_ports(rte_dev)) {
>> > +        response = xasprintf("Device '%s' is being shared with other "
>> > +                             "interfaces. Remove them before detaching.",
>> > +                             argv[1]);
>> > +        goto error;
>> > +    }
>> >
>> > -    rte_eth_dev_info_get(port_id, &dev_info);
>> > -    if (!dev_info.device || rte_dev_remove(dev_info.device)) {
>> > -        response = xasprintf("Device '%s' can not be detached", argv[1]);
>> > +    rte_eth_dev_close(port_id);
>> > +    if (rte_dev_remove(rte_dev) < 0) {
>> > +        response = xasprintf("Device '%s' can not be removed",
>> > + argv[1]);
>> >          goto error;
>> >      }
>> >
>> > -    response = xasprintf("Device '%s' has been detached", argv[1]);
>> > +    response = xasprintf("All devices shared with device '%s' "
>> > +                          "have been detached", argv[1]);
>> >
>> >      ovs_mutex_unlock(&dpdk_mutex);
>> >      unixctl_command_reply(conn, response);
diff mbox series

Patch

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 6b8e05e..30af043 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -1216,6 +1216,25 @@  dpdk_dev_parse_name(const char dev_name[], const char prefix[],
     }
 }
 
+/* get the number of OVS interfaces which have the same DPDK
+ * rte device (e.g. same pci bus address). */
+static int
+netdev_dpdk_get_num_ports(struct rte_device *device)
+    OVS_REQUIRES(dpdk_mutex)
+{
+    struct netdev_dpdk *dev;
+    int count;
+
+    count = 0;
+    LIST_FOR_EACH (dev, list_node, &dpdk_list) {
+        if (rte_eth_devices[dev->port_id].device == device
+        && rte_eth_devices[dev->port_id].state != RTE_ETH_DEV_UNUSED) {
+            count++;
+        }
+    }
+    return count;
+}
+
 static int
 vhost_common_construct(struct netdev *netdev)
     OVS_REQUIRES(dpdk_mutex)
@@ -1351,20 +1370,23 @@  static void
 netdev_dpdk_destruct(struct netdev *netdev)
 {
     struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
-    struct rte_eth_dev_info dev_info;
+    struct rte_device *rte_dev;
 
     ovs_mutex_lock(&dpdk_mutex);
 
     rte_eth_dev_stop(dev->port_id);
     dev->started = false;
-
     if (dev->attached) {
+        /* Remove the port eth device */
         rte_eth_dev_close(dev->port_id);
-        rte_eth_dev_info_get(dev->port_id, &dev_info);
-        if (dev_info.device && !rte_dev_remove(dev_info.device)) {
-            VLOG_INFO("Device '%s' has been detached", dev->devargs);
-        } else {
-            VLOG_ERR("Device '%s' can not be detached", dev->devargs);
+        VLOG_INFO("Device '%s' has been removed", dev->devargs);
+        /* if this is the last port_id using this rte device
+         * remove this rte device and all its eth devices */
+        rte_dev = rte_eth_devices[dev->port_id].device;
+        if (netdev_dpdk_get_num_ports(rte_dev) == 1) {
+            if (rte_dev_remove(rte_dev) < 0) {
+                VLOG_ERR("Device '%s' can not be detached", dev->devargs);
+            }
         }
     }
 
@@ -1630,8 +1652,26 @@  netdev_dpdk_get_port_by_mac(const char *mac_str)
     return DPDK_ETH_PORT_ID_INVALID;
 }
 
+/* return the first DPDK port_id matching the devargs pattern */
+static dpdk_port_t
+netdev_dpdk_get_port_by_devargs(const char *devargs)
+{
+    struct rte_dev_iterator iterator;
+    dpdk_port_t port_id;
+
+    if (rte_dev_probe(devargs)) {
+        port_id = DPDK_ETH_PORT_ID_INVALID;
+    } else {
+        RTE_ETH_FOREACH_MATCHING_DEV(port_id, devargs, &iterator) {
+            break;
+        }
+    }
+    return port_id;
+}
+
 /*
- * Normally, a PCI id is enough for identifying a specific DPDK port.
+ * Normally, a PCI id (optionally followed by a representor number)
+ * is enough for identifying a specific DPDK port.
  * However, for some NICs having multiple ports sharing the same PCI
  * id, using PCI id won't work then.
  *
@@ -1644,29 +1684,32 @@  static dpdk_port_t
 netdev_dpdk_process_devargs(struct netdev_dpdk *dev,
                             const char *devargs, char **errp)
 {
-    char *name;
     dpdk_port_t new_port_id = DPDK_ETH_PORT_ID_INVALID;
 
     if (strncmp(devargs, "class=eth,mac=", 14) == 0) {
         new_port_id = netdev_dpdk_get_port_by_mac(&devargs[14]);
     } else {
-        name = xmemdup0(devargs, strcspn(devargs, ","));
-        if (rte_eth_dev_get_port_by_name(name, &new_port_id)
-                || !rte_eth_dev_is_valid_port(new_port_id)) {
-            /* Device not found in DPDK, attempt to attach it */
-            if (!rte_dev_probe(devargs)
-                && !rte_eth_dev_get_port_by_name(name, &new_port_id)) {
-                /* Attach successful */
-                dev->attached = true;
-                VLOG_INFO("Device '%s' attached to DPDK", devargs);
-            } else {
-                /* Attach unsuccessful */
+        new_port_id = netdev_dpdk_get_port_by_devargs(devargs);
+        if (!rte_eth_dev_is_valid_port(new_port_id)) {
+            new_port_id = DPDK_ETH_PORT_ID_INVALID;
+        } else {
+            struct netdev_dpdk *dup_dev;
+
+            dup_dev = netdev_dpdk_lookup_by_port_id(new_port_id);
+            if (dup_dev) {
+                VLOG_WARN_BUF(errp, "'%s' is trying to use device '%s' "
+                               "which is already in use by '%s'",
+                          netdev_get_name(&dev->up), devargs,
+                          netdev_get_name(&dup_dev->up));
                 new_port_id = DPDK_ETH_PORT_ID_INVALID;
+            } else {
+                /* device successfully found */
+                dev->attached = true;
+                VLOG_INFO("Device '%s' attached to DPDK port %d",
+                          devargs, new_port_id);
             }
         }
-        free(name);
     }
-
     if (new_port_id == DPDK_ETH_PORT_ID_INVALID) {
         VLOG_WARN_BUF(errp, "Error attaching device '%s' to DPDK", devargs);
     }
@@ -3234,11 +3277,15 @@  netdev_dpdk_detach(struct unixctl_conn *conn, int argc OVS_UNUSED,
     char *response;
     dpdk_port_t port_id;
     struct netdev_dpdk *dev;
-    struct rte_eth_dev_info dev_info;
+    struct rte_device *rte_dev;
+    struct rte_dev_iterator iterator;
 
     ovs_mutex_lock(&dpdk_mutex);
 
-    if (rte_eth_dev_get_port_by_name(argv[1], &port_id)) {
+    RTE_ETH_FOREACH_MATCHING_DEV(port_id, argv[1], &iterator) {
+        break;
+    }
+    if (port_id == DPDK_ETH_PORT_ID_INVALID) {
         response = xasprintf("Device '%s' not found in DPDK", argv[1]);
         goto error;
     }
@@ -3251,15 +3298,22 @@  netdev_dpdk_detach(struct unixctl_conn *conn, int argc OVS_UNUSED,
         goto error;
     }
 
-    rte_eth_dev_close(port_id);
+    rte_dev = rte_eth_devices[port_id].device;
+    if (netdev_dpdk_get_num_ports(rte_dev)) {
+        response = xasprintf("Device '%s' is being shared with other "
+                             "interfaces. Remove them before detaching.",
+                             argv[1]);
+        goto error;
+    }
 
-    rte_eth_dev_info_get(port_id, &dev_info);
-    if (!dev_info.device || rte_dev_remove(dev_info.device)) {
-        response = xasprintf("Device '%s' can not be detached", argv[1]);
+    rte_eth_dev_close(port_id);
+    if (rte_dev_remove(rte_dev) < 0) {
+        response = xasprintf("Device '%s' can not be removed", argv[1]);
         goto error;
     }
 
-    response = xasprintf("Device '%s' has been detached", argv[1]);
+    response = xasprintf("All devices shared with device '%s' "
+                          "have been detached", argv[1]);
 
     ovs_mutex_unlock(&dpdk_mutex);
     unixctl_command_reply(conn, response);