diff mbox series

[ovs-dev,v4] netdev-dpdk: support port representors

Message ID 1546939877-22076-1-git-send-email-ophirmu@mellanox.com
State Superseded
Delegated to: Ian Stokes
Headers show
Series [ovs-dev,v4] netdev-dpdk: support port representors | expand

Commit Message

Ophir Munk Jan. 8, 2019, 9:31 a.m. UTC
Dpdk port representors were introduced in dpdk versions 18.xx.
Prior to port representors there was a one-to-one relationship
between an rte device (e.g. PCI bus) and an eth device (referenced as
dpdk port id in OVS). With port representors the relationship becomes
one-to-many rte device to eth devices.
For example in [3] there are two devices (representors) using the same
PCI physical address 0000:08:00.0: "0000:08:00.0,representor=[3]" and
"0000:08:00.0,representor=[5]".
This commit handles the new one-to-many relationship. For example,
when one of the device port representors in [3] is closed - the PCI bus
cannot be detached until the other device port representor is closed as
well. OVS remains backward compatible by supporting dpdk legacy PCI
ports which do not include port representors.
Dpdk port representors related commits are listed in [1]. Dpdk port
representors documentation appears in [2]. A sample configuration
which uses two representors ports (the output of "ovs-vsctl show"
command) is shown in [3].

[1]
e0cb96204b71 ("net/i40e: add support for representor ports")
cf80ba6e2038 ("net/ixgbe: add support for representor ports")
26c08b979d26 ("net/mlx5: add port representor awareness")

[2]
doc/guides/prog_guide/switch_representation.rst

[3]
Bridge "ovs_br0"
    Port "ovs_br0"
        Interface "ovs_br0"
            type: internal
    Port "port-rep3"
        Interface "port-rep3"
            type: dpdk
            options: {dpdk-devargs="0000:08:00.0,representor=[3]"}
    Port "port-rep5"
        Interface "port-rep5"
            type: dpdk
            options: {dpdk-devargs="0000:08:00.0,representor=[5]"}
    ovs_version: "2.10.90"

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
---
v1 (hwol branch):
1. rebase on top of Kevin's patch
dpdk: Update to use DPDK 18.11.[ovs-dev,v7,dpdk-latest,1/1] dpdk: Update to use DPDK 18.11.
https://patchwork.ozlabs.org/patch/1005535/
2. skipping count of sibling ports in case the sibling port state is RTE_ETH_DEV_UNUSED 

v2 (switching to master branch):
1. Update based on review comments: https://patchwork.ozlabs.org/patch/1011457/#2051417
2. Send patch on master branch

v3:
Add a FIXME comment regarding the direct access to DPDK rte_eth_devices array

v4:
1. Add FIXME comment to every direct access to DPDK rte_eth_devices array
2. Do not probe unconditionally during add-port. Instead see if a device has already
been probed, and if so skip the rte_dev_probe(devargs)



 lib/netdev-dpdk.c | 136 ++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 106 insertions(+), 30 deletions(-)

Comments

Stokes, Ian Jan. 9, 2019, 3:09 p.m. UTC | #1
On 1/8/2019 9:31 AM, Ophir Munk wrote:
> Dpdk port representors were introduced in dpdk versions 18.xx.
> Prior to port representors there was a one-to-one relationship
> between an rte device (e.g. PCI bus) and an eth device (referenced as
> dpdk port id in OVS). With port representors the relationship becomes
> one-to-many rte device to eth devices.
> For example in [3] there are two devices (representors) using the same
> PCI physical address 0000:08:00.0: "0000:08:00.0,representor=[3]" and
> "0000:08:00.0,representor=[5]".
> This commit handles the new one-to-many relationship. For example,
> when one of the device port representors in [3] is closed - the PCI bus
> cannot be detached until the other device port representor is closed as
> well. OVS remains backward compatible by supporting dpdk legacy PCI
> ports which do not include port representors.
> Dpdk port representors related commits are listed in [1]. Dpdk port
> representors documentation appears in [2]. A sample configuration
> which uses two representors ports (the output of "ovs-vsctl show"
> command) is shown in [3].
> 

Thanks for the v4 Ophir, this addresses my issues and tests ok. I have 
no issues with this.

I'll give people a chance for any final comments after today's community 
meeting and if there are none I'll look to push this to master and the 
2.11 release.

Ian
Ilya Maximets Jan. 10, 2019, 11:32 a.m. UTC | #2
On 08.01.2019 12:31, Ophir Munk wrote:
> Dpdk port representors were introduced in dpdk versions 18.xx.
> Prior to port representors there was a one-to-one relationship
> between an rte device (e.g. PCI bus) and an eth device (referenced as
> dpdk port id in OVS). With port representors the relationship becomes
> one-to-many rte device to eth devices.
> For example in [3] there are two devices (representors) using the same
> PCI physical address 0000:08:00.0: "0000:08:00.0,representor=[3]" and
> "0000:08:00.0,representor=[5]".
> This commit handles the new one-to-many relationship. For example,
> when one of the device port representors in [3] is closed - the PCI bus
> cannot be detached until the other device port representor is closed as
> well. OVS remains backward compatible by supporting dpdk legacy PCI
> ports which do not include port representors.
> Dpdk port representors related commits are listed in [1]. Dpdk port
> representors documentation appears in [2]. A sample configuration
> which uses two representors ports (the output of "ovs-vsctl show"
> command) is shown in [3].
> 
> [1]
> e0cb96204b71 ("net/i40e: add support for representor ports")
> cf80ba6e2038 ("net/ixgbe: add support for representor ports")
> 26c08b979d26 ("net/mlx5: add port representor awareness")
> 
> [2]
> doc/guides/prog_guide/switch_representation.rst
> 
> [3]
> Bridge "ovs_br0"
>     Port "ovs_br0"
>         Interface "ovs_br0"
>             type: internal
>     Port "port-rep3"
>         Interface "port-rep3"
>             type: dpdk
>             options: {dpdk-devargs="0000:08:00.0,representor=[3]"}
>     Port "port-rep5"
>         Interface "port-rep5"
>             type: dpdk
>             options: {dpdk-devargs="0000:08:00.0,representor=[5]"}
>     ovs_version: "2.10.90"
> 
> Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
> ---
> v1 (hwol branch):
> 1. rebase on top of Kevin's patch
> dpdk: Update to use DPDK 18.11.[ovs-dev,v7,dpdk-latest,1/1] dpdk: Update to use DPDK 18.11.
> https://patchwork.ozlabs.org/patch/1005535/
> 2. skipping count of sibling ports in case the sibling port state is RTE_ETH_DEV_UNUSED 
> 
> v2 (switching to master branch):
> 1. Update based on review comments: https://patchwork.ozlabs.org/patch/1011457/#2051417
> 2. Send patch on master branch
> 
> v3:
> Add a FIXME comment regarding the direct access to DPDK rte_eth_devices array
> 
> v4:
> 1. Add FIXME comment to every direct access to DPDK rte_eth_devices array
> 2. Do not probe unconditionally during add-port. Instead see if a device has already
> been probed, and if so skip the rte_dev_probe(devargs)
> 
> 
> 
>  lib/netdev-dpdk.c | 136 ++++++++++++++++++++++++++++++++++++++++++------------
>  1 file changed, 106 insertions(+), 30 deletions(-)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 320422b..b6e903f 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -1218,6 +1218,26 @@ dpdk_dev_parse_name(const char dev_name[], const char prefix[],
>      }
>  }
>  
> +/* Get the number of OVS interfaces which have the same DPDK
> + * rte device (e.g. same pci bus address).
> + * FIXME: avoid direct access to DPDK internal array rte_eth_devices.
> + */
> +static int
> +netdev_dpdk_get_num_ports(struct rte_device *device)
> +    OVS_REQUIRES(dpdk_mutex)
> +{
> +    struct netdev_dpdk *dev;
> +    int count = 0;
> +
> +    LIST_FOR_EACH (dev, list_node, &dpdk_list) {
> +        if (rte_eth_devices[dev->port_id].device == device
> +        && rte_eth_devices[dev->port_id].state != RTE_ETH_DEV_UNUSED) {
> +            count++;
> +        }
> +    }
> +    return count;
> +}
> +
>  static int
>  vhost_common_construct(struct netdev *netdev)
>      OVS_REQUIRES(dpdk_mutex)
> @@ -1353,20 +1373,25 @@ static void
>  netdev_dpdk_destruct(struct netdev *netdev)
>  {
>      struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> -    struct rte_eth_dev_info dev_info;
> +    struct rte_device *rte_dev;
>  
>      ovs_mutex_lock(&dpdk_mutex);
>  
>      rte_eth_dev_stop(dev->port_id);
>      dev->started = false;
> -

Please, keep the empty line.

>      if (dev->attached) {
> +        /* Remove the port eth device. */
>          rte_eth_dev_close(dev->port_id);
> -        rte_eth_dev_info_get(dev->port_id, &dev_info);
> -        if (dev_info.device && !rte_dev_remove(dev_info.device)) {
> -            VLOG_INFO("Device '%s' has been detached", dev->devargs);

Please keep this log message for the successful case.

> -        } else {
> -            VLOG_ERR("Device '%s' can not be detached", dev->devargs);
> +        VLOG_INFO("Device '%s' has been removed", dev->devargs);
> +        /* If this is the last port_id using this rte device
> +         * remove this rte device and all its eth devices.
> +         * FIXME: avoid direct access to DPDK internal array rte_eth_devices.
> +         */
> +        rte_dev = rte_eth_devices[dev->port_id].device;
> +        if (netdev_dpdk_get_num_ports(rte_dev) == 1) {

rte_eth_dev_close() sets the RTE_ETH_DEV_UNUSED state for PMDs with
RTE_ETH_DEV_CLOSE_REMOVE flag.
The function netdev_dpdk_get_num_ports() skips unused ports.
The result should not be equal to 1. We need the other way to detect
the siblings or close the port after calculating them.
In this case we'll probably do not need to have check for UNUSED state
inside the _get_num_ports().

> +            if (rte_dev_remove(rte_dev) < 0) {
> +                VLOG_ERR("Device '%s' can not be detached", dev->devargs);
> +            }
>          }
>      }
>  
> @@ -1632,8 +1657,37 @@ netdev_dpdk_get_port_by_mac(const char *mac_str)
>      return DPDK_ETH_PORT_ID_INVALID;
>  }
>  
> +/* Return the first DPDK port_id matching the devargs pattern. */
> +static dpdk_port_t
> +netdev_dpdk_get_port_by_devargs(const char *devargs)

OVS_REQUIRES(dpdk_mutex)

I'm not sure if 'netdev_dpdk_get_port_by_devargs' is the correct name
because it not only finds (gets) the port, but probes (attaches) it.

> +{
> +    struct rte_dev_iterator iterator;
> +    dpdk_port_t port_id;
> +
> +    RTE_ETH_FOREACH_MATCHING_DEV(port_id, devargs, &iterator) {

Invalid spacing around the FOREACH loop. Same in all the other places.

> +        /* If a break is done - must call rte_eth_iterator_cleanup. */
> +        rte_eth_iterator_cleanup(&iterator);
> +        break;
> +    }
> +    /* The port may be invalid if it was not probed */
> +    if (port_id == DPDK_ETH_PORT_ID_INVALID) {
> +        if (rte_dev_probe(devargs)) {
> +            port_id = DPDK_ETH_PORT_ID_INVALID;
> +        } else {
> +            RTE_ETH_FOREACH_MATCHING_DEV(port_id, devargs, &iterator) {
> +                /* If a break is done - must call rte_eth_iterator_cleanup. */
> +                rte_eth_iterator_cleanup(&iterator);
> +                break;
> +            }
> +        }
> +    }
> +
> +    return port_id;
> +}
> +
>  /*
> - * Normally, a PCI id is enough for identifying a specific DPDK port.
> + * Normally, a PCI id (optionally followed by a representor number)
> + * is enough for identifying a specific DPDK port.
>   * However, for some NICs having multiple ports sharing the same PCI
>   * id, using PCI id won't work then.
>   *
> @@ -1641,34 +1695,42 @@ netdev_dpdk_get_port_by_mac(const char *mac_str)
>   *
>   * Note that the compatibility is fully kept: user can still use the
>   * PCI id for adding ports (when it's enough for them).
> + *
> + * In order to avoid clang errors add OVS_REQUIRES(dpdk_mutex) to this function
> + * since it is required by netdev_dpdk_lookup_by_port_id() while the dpdk_mutex
> + * is taken outside of this function.
>   */
>  static dpdk_port_t
>  netdev_dpdk_process_devargs(struct netdev_dpdk *dev,
>                              const char *devargs, char **errp)
> +    OVS_REQUIRES(dpdk_mutex)
>  {
> -    char *name;
>      dpdk_port_t new_port_id = DPDK_ETH_PORT_ID_INVALID;
>  
>      if (strncmp(devargs, "class=eth,mac=", 14) == 0) {
>          new_port_id = netdev_dpdk_get_port_by_mac(&devargs[14]);
>      } else {
> -        name = xmemdup0(devargs, strcspn(devargs, ","));
> -        if (rte_eth_dev_get_port_by_name(name, &new_port_id)
> -                || !rte_eth_dev_is_valid_port(new_port_id)) {
> -            /* Device not found in DPDK, attempt to attach it */
> -            if (!rte_dev_probe(devargs)
> -                && !rte_eth_dev_get_port_by_name(name, &new_port_id)) {
> -                /* Attach successful */
> -                dev->attached = true;
> -                VLOG_INFO("Device '%s' attached to DPDK", devargs);
> -            } else {
> -                /* Attach unsuccessful */
> +        new_port_id = netdev_dpdk_get_port_by_devargs(devargs);
> +        if (!rte_eth_dev_is_valid_port(new_port_id)) {
> +            new_port_id = DPDK_ETH_PORT_ID_INVALID;
> +        } else {
> +            struct netdev_dpdk *dup_dev;
> +
> +            dup_dev = netdev_dpdk_lookup_by_port_id(new_port_id);
> +            if (dup_dev) {
> +                VLOG_WARN_BUF(errp, "'%s' is trying to use device '%s' "
> +                               "which is already in use by '%s'",
> +                          netdev_get_name(&dev->up), devargs,
> +                          netdev_get_name(&dup_dev->up));

Why you duplicating this part of code ?
Can we just replace 'rte_eth_dev_get_port_by_name' with
'netdev_dpdk_get_port_by_devargs' and keep all the other logic ?

netdev_dpdk_get_port_by_devargs should look like this in this case:

/* Return the first DPDK port_id matching the devargs pattern. */
static bool
netdev_dpdk_get_port_by_devargs(const char *devargs, dpdk_port_t *port_id)
    OVS_REQUIRES(dpdk_mutex)
{
    struct rte_dev_iterator iterator;

    RTE_ETH_FOREACH_MATCHING_DEV (*port_id, devargs, &iterator) {
        /* If a break is done - must call rte_eth_iterator_cleanup. */
        rte_eth_iterator_cleanup(&iterator);
        break;
    }
    return *port_id != DPDK_ETH_PORT_ID_INVALID;
}

netdev_dpdk_process_devargs:

        if (netdev_dpdk_get_port_by_devargs(devagrs, &new_port_id)
                || !rte_eth_dev_is_valid_port(new_port_id)) {
            /* Device not found in DPDK, attempt to attach it */
            if (!rte_dev_probe(devargs)
                && !netdev_dpdk_get_port_by_devargs(devargs, &new_port_id)) {
                /* Attach successful */
                dev->attached = true;
                VLOG_INFO("Device '%s' attached to DPDK", devargs);
            } else {
                /* Attach unsuccessful */
                new_port_id = DPDK_ETH_PORT_ID_INVALID;
            }
        }

If it works, I'd prefer this variant.

Also, do we still need netdev_dpdk_get_port_by_mac() if we're using
devargs iterator ?

>                  new_port_id = DPDK_ETH_PORT_ID_INVALID;
> +            } else {
> +                /* Device successfully found. */
> +                dev->attached = true;
> +                VLOG_INFO("Device '%s' attached to DPDK port %d",
> +                          devargs, new_port_id);
>              }
>          }
> -        free(name);
>      }
> -

Please, keep the empty line.

>      if (new_port_id == DPDK_ETH_PORT_ID_INVALID) {
>          VLOG_WARN_BUF(errp, "Error attaching device '%s' to DPDK", devargs);
>      }
> @@ -1761,7 +1823,7 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args,
>                  dup_dev = netdev_dpdk_lookup_by_port_id(new_port_id);
>                  if (dup_dev) {
>                      VLOG_WARN_BUF(errp, "'%s' is trying to use device '%s' "
> -                                        "which is already in use by '%s'",
> +                                  "which is already in use by '%s'",
>                                    netdev_get_name(netdev), new_devargs,
>                                    netdev_get_name(&dup_dev->up));
>                      err = EADDRINUSE;
> @@ -3236,11 +3298,17 @@ netdev_dpdk_detach(struct unixctl_conn *conn, int argc OVS_UNUSED,
>      char *response;
>      dpdk_port_t port_id;
>      struct netdev_dpdk *dev;
> -    struct rte_eth_dev_info dev_info;
> +    struct rte_device *rte_dev;
> +    struct rte_dev_iterator iterator;
>  
>      ovs_mutex_lock(&dpdk_mutex);
>  
> -    if (rte_eth_dev_get_port_by_name(argv[1], &port_id)) {
> +    RTE_ETH_FOREACH_MATCHING_DEV(port_id, argv[1], &iterator) {
> +        /* If a break is done - must call rte_eth_iterator_cleanup. */
> +        rte_eth_iterator_cleanup(&iterator);
> +        break;
> +    }
> +    if (port_id == DPDK_ETH_PORT_ID_INVALID) {
>          response = xasprintf("Device '%s' not found in DPDK", argv[1]);
>          goto error;
>      }
> @@ -3253,15 +3321,23 @@ netdev_dpdk_detach(struct unixctl_conn *conn, int argc OVS_UNUSED,
>          goto error;
>      }
>  
> -    rte_eth_dev_close(port_id);
> +    /* FIXME: avoid direct access to DPDK internal array rte_eth_devices. */
> +    rte_dev = rte_eth_devices[port_id].device;
> +    if (netdev_dpdk_get_num_ports(rte_dev)) {

Probably same here.
DPDK does not set RTE_ETH_DEV_UNUSED state for PMDs without RTE_ETH_DEV_CLOSE_REMOVE flag.

> +        response = xasprintf("Device '%s' is being shared with other "
> +                             "interfaces. Remove them before detaching.",
> +                             argv[1]);
> +        goto error;
> +    }
>  
> -    rte_eth_dev_info_get(port_id, &dev_info);
> -    if (!dev_info.device || rte_dev_remove(dev_info.device)) {
> -        response = xasprintf("Device '%s' can not be detached", argv[1]);
> +    rte_eth_dev_close(port_id);
> +    if (rte_dev_remove(rte_dev) < 0) {
> +        response = xasprintf("Device '%s' can not be removed", argv[1]);

/removed/detached/
You're using different words for equal situations. This is misleading.

>          goto error;
>      }
>  
> -    response = xasprintf("Device '%s' has been detached", argv[1]);
> +    response = xasprintf("All devices shared with device '%s' "
> +                          "have been detached", argv[1]);
>  
>      ovs_mutex_unlock(&dpdk_mutex);
>      unixctl_command_reply(conn, response);
>
Thomas Monjalon Jan. 10, 2019, 1:42 p.m. UTC | #3
Hi,

10/01/2019 12:32, Ilya Maximets:
> On 08.01.2019 12:31, Ophir Munk wrote:
> >      if (dev->attached) {
> > +        /* Remove the port eth device. */
> >          rte_eth_dev_close(dev->port_id);
> > -        rte_eth_dev_info_get(dev->port_id, &dev_info);
> > -        if (dev_info.device && !rte_dev_remove(dev_info.device)) {
> > -            VLOG_INFO("Device '%s' has been detached", dev->devargs);
> 
> Please keep this log message for the successful case.
> 
> > -        } else {
> > -            VLOG_ERR("Device '%s' can not be detached", dev->devargs);
> > +        VLOG_INFO("Device '%s' has been removed", dev->devargs);
> > +        /* If this is the last port_id using this rte device
> > +         * remove this rte device and all its eth devices.
> > +         * FIXME: avoid direct access to DPDK internal array rte_eth_devices.
> > +         */
> > +        rte_dev = rte_eth_devices[dev->port_id].device;
> > +        if (netdev_dpdk_get_num_ports(rte_dev) == 1) {
> 
> rte_eth_dev_close() sets the RTE_ETH_DEV_UNUSED state for PMDs with
> RTE_ETH_DEV_CLOSE_REMOVE flag.
> The function netdev_dpdk_get_num_ports() skips unused ports.
> The result should not be equal to 1. We need the other way to detect
> the siblings or close the port after calculating them.
> In this case we'll probably do not need to have check for UNUSED state
> inside the _get_num_ports().

Maybe I don't understand correctly.
Why do you want to count closed sibling ports?

> > +            if (rte_dev_remove(rte_dev) < 0) {
> > +                VLOG_ERR("Device '%s' can not be detached", dev->devargs);
> > +            }
> >          }
> >      }

[...]
> Also, do we still need netdev_dpdk_get_port_by_mac() if we're using
> devargs iterator ?

The devargs iterator manages "mac=" property in DPDK 18.11.

[...]
> > -    rte_eth_dev_close(port_id);
> > +    /* FIXME: avoid direct access to DPDK internal array rte_eth_devices. */
> > +    rte_dev = rte_eth_devices[port_id].device;
> > +    if (netdev_dpdk_get_num_ports(rte_dev)) {
> 
> Probably same here.
> DPDK does not set RTE_ETH_DEV_UNUSED state for PMDs without RTE_ETH_DEV_CLOSE_REMOVE flag.

If RTE_ETH_DEV_CLOSE_REMOVE is not set, you should call rte_dev_remove()
just after rte_eth_dev_close().

> > +        response = xasprintf("Device '%s' is being shared with other "
> > +                             "interfaces. Remove them before detaching.",
> > +                             argv[1]);
> > +        goto error;
> > +    }
> >  
> > -    rte_eth_dev_info_get(port_id, &dev_info);
> > -    if (!dev_info.device || rte_dev_remove(dev_info.device)) {
> > -        response = xasprintf("Device '%s' can not be detached", argv[1]);
> > +    rte_eth_dev_close(port_id);
> > +    if (rte_dev_remove(rte_dev) < 0) {
> > +        response = xasprintf("Device '%s' can not be removed", argv[1]);
Ilya Maximets Jan. 10, 2019, 2:23 p.m. UTC | #4
On 10.01.2019 16:42, Thomas Monjalon wrote:
> Hi,
> 
> 10/01/2019 12:32, Ilya Maximets:
>> On 08.01.2019 12:31, Ophir Munk wrote:
>>>      if (dev->attached) {
>>> +        /* Remove the port eth device. */
>>>          rte_eth_dev_close(dev->port_id);
>>> -        rte_eth_dev_info_get(dev->port_id, &dev_info);
>>> -        if (dev_info.device && !rte_dev_remove(dev_info.device)) {
>>> -            VLOG_INFO("Device '%s' has been detached", dev->devargs);
>>
>> Please keep this log message for the successful case.
>>
>>> -        } else {
>>> -            VLOG_ERR("Device '%s' can not be detached", dev->devargs);
>>> +        VLOG_INFO("Device '%s' has been removed", dev->devargs);
>>> +        /* If this is the last port_id using this rte device
>>> +         * remove this rte device and all its eth devices.
>>> +         * FIXME: avoid direct access to DPDK internal array rte_eth_devices.
>>> +         */
>>> +        rte_dev = rte_eth_devices[dev->port_id].device;
>>> +        if (netdev_dpdk_get_num_ports(rte_dev) == 1) {
>>
>> rte_eth_dev_close() sets the RTE_ETH_DEV_UNUSED state for PMDs with
>> RTE_ETH_DEV_CLOSE_REMOVE flag.
>> The function netdev_dpdk_get_num_ports() skips unused ports.
>> The result should not be equal to 1. We need the other way to detect
>> the siblings or close the port after calculating them.
>> In this case we'll probably do not need to have check for UNUSED state
>> inside the _get_num_ports().
> 
> Maybe I don't understand correctly.
> Why do you want to count closed sibling ports?

Proposed version of 'netdev_dpdk_get_num_ports' iterates over
'dpdk_list' which contains only ports that currently opened by OVS.
So, the only port that could have UNUSED state in that list is the
port that we're just closed here. If we'll count number of siblings
before closing the port where should be no UNUSED ports in that list.

The issue I tried to raise here is that rte_eth_dev_close() changed the
state only for PMDs with RTE_ETH_DEV_CLOSE_REMOVE. So, the result
of 'netdev_dpdk_get_num_ports' will be different for different PMDs.

> 
>>> +            if (rte_dev_remove(rte_dev) < 0) {
>>> +                VLOG_ERR("Device '%s' can not be detached", dev->devargs);
>>> +            }
>>>          }
>>>      }
> 
> [...]
>> Also, do we still need netdev_dpdk_get_port_by_mac() if we're using
>> devargs iterator ?
> 
> The devargs iterator manages "mac=" property in DPDK 18.11.

Good.

> 
> [...]
>>> -    rte_eth_dev_close(port_id);
>>> +    /* FIXME: avoid direct access to DPDK internal array rte_eth_devices. */
>>> +    rte_dev = rte_eth_devices[port_id].device;
>>> +    if (netdev_dpdk_get_num_ports(rte_dev)) {
>>
>> Probably same here.
>> DPDK does not set RTE_ETH_DEV_UNUSED state for PMDs without RTE_ETH_DEV_CLOSE_REMOVE flag.
> 
> If RTE_ETH_DEV_CLOSE_REMOVE is not set, you should call rte_dev_remove()
> just after rte_eth_dev_close().

And what if we call rte_dev_remove() after rte_eth_dev_close() for
PMD with RTE_ETH_DEV_CLOSE_REMOVE ?

> 
>>> +        response = xasprintf("Device '%s' is being shared with other "
>>> +                             "interfaces. Remove them before detaching.",
>>> +                             argv[1]);
>>> +        goto error;
>>> +    }
>>>  
>>> -    rte_eth_dev_info_get(port_id, &dev_info);
>>> -    if (!dev_info.device || rte_dev_remove(dev_info.device)) {
>>> -        response = xasprintf("Device '%s' can not be detached", argv[1]);
>>> +    rte_eth_dev_close(port_id);
>>> +    if (rte_dev_remove(rte_dev) < 0) {
>>> +        response = xasprintf("Device '%s' can not be removed", argv[1]);
> 
> 
> 
> 
>
Thomas Monjalon Jan. 10, 2019, 2:50 p.m. UTC | #5
10/01/2019 15:23, Ilya Maximets:
> On 10.01.2019 16:42, Thomas Monjalon wrote:
> > Hi,
> > 
> > 10/01/2019 12:32, Ilya Maximets:
> >> On 08.01.2019 12:31, Ophir Munk wrote:
> >>>      if (dev->attached) {
> >>> +        /* Remove the port eth device. */
> >>>          rte_eth_dev_close(dev->port_id);
> >>> -        rte_eth_dev_info_get(dev->port_id, &dev_info);
> >>> -        if (dev_info.device && !rte_dev_remove(dev_info.device)) {
> >>> -            VLOG_INFO("Device '%s' has been detached", dev->devargs);
> >>
> >> Please keep this log message for the successful case.
> >>
> >>> -        } else {
> >>> -            VLOG_ERR("Device '%s' can not be detached", dev->devargs);
> >>> +        VLOG_INFO("Device '%s' has been removed", dev->devargs);
> >>> +        /* If this is the last port_id using this rte device
> >>> +         * remove this rte device and all its eth devices.
> >>> +         * FIXME: avoid direct access to DPDK internal array rte_eth_devices.
> >>> +         */
> >>> +        rte_dev = rte_eth_devices[dev->port_id].device;
> >>> +        if (netdev_dpdk_get_num_ports(rte_dev) == 1) {
> >>
> >> rte_eth_dev_close() sets the RTE_ETH_DEV_UNUSED state for PMDs with
> >> RTE_ETH_DEV_CLOSE_REMOVE flag.
> >> The function netdev_dpdk_get_num_ports() skips unused ports.
> >> The result should not be equal to 1. We need the other way to detect
> >> the siblings or close the port after calculating them.
> >> In this case we'll probably do not need to have check for UNUSED state
> >> inside the _get_num_ports().
> > 
> > Maybe I don't understand correctly.
> > Why do you want to count closed sibling ports?
> 
> Proposed version of 'netdev_dpdk_get_num_ports' iterates over
> 'dpdk_list' which contains only ports that currently opened by OVS.
> So, the only port that could have UNUSED state in that list is the
> port that we're just closed here. If we'll count number of siblings
> before closing the port where should be no UNUSED ports in that list.
> 
> The issue I tried to raise here is that rte_eth_dev_close() changed the
> state only for PMDs with RTE_ETH_DEV_CLOSE_REMOVE. So, the result
> of 'netdev_dpdk_get_num_ports' will be different for different PMDs.

Yes, that's true.
You can manage it two ways:
	- you rely on OVS list of open ports
	- you remove the device completely (as before)
	  for PMDs not supporting RTE_ETH_DEV_CLOSE_REMOVE

> >>> +            if (rte_dev_remove(rte_dev) < 0) {
> >>> +                VLOG_ERR("Device '%s' can not be detached", dev->devargs);
> >>> +            }
> >>>          }
> >>>      }
> > 
> > [...]
> >> Also, do we still need netdev_dpdk_get_port_by_mac() if we're using
> >> devargs iterator ?
> > 
> > The devargs iterator manages "mac=" property in DPDK 18.11.
> 
> Good.
> 
> > 
> > [...]
> >>> -    rte_eth_dev_close(port_id);
> >>> +    /* FIXME: avoid direct access to DPDK internal array rte_eth_devices. */
> >>> +    rte_dev = rte_eth_devices[port_id].device;
> >>> +    if (netdev_dpdk_get_num_ports(rte_dev)) {
> >>
> >> Probably same here.
> >> DPDK does not set RTE_ETH_DEV_UNUSED state for PMDs without RTE_ETH_DEV_CLOSE_REMOVE flag.
> > 
> > If RTE_ETH_DEV_CLOSE_REMOVE is not set, you should call rte_dev_remove()
> > just after rte_eth_dev_close().
> 
> And what if we call rte_dev_remove() after rte_eth_dev_close() for
> PMD with RTE_ETH_DEV_CLOSE_REMOVE ?

Then you release the full device, as before with the old "detach" API.
It makes no difference for single-port device.
But it is bad for multi-port devices (which should already implement
RTE_ETH_DEV_CLOSE_REMOVE).

> >>> +        response = xasprintf("Device '%s' is being shared with other "
> >>> +                             "interfaces. Remove them before detaching.",
> >>> +                             argv[1]);
> >>> +        goto error;
> >>> +    }
> >>>  
> >>> -    rte_eth_dev_info_get(port_id, &dev_info);
> >>> -    if (!dev_info.device || rte_dev_remove(dev_info.device)) {
> >>> -        response = xasprintf("Device '%s' can not be detached", argv[1]);
> >>> +    rte_eth_dev_close(port_id);
> >>> +    if (rte_dev_remove(rte_dev) < 0) {
> >>> +        response = xasprintf("Device '%s' can not be removed", argv[1]);
Ophir Munk Jan. 13, 2019, 3:13 p.m. UTC | #6
Hi Ilya,
Thank you for your review.
Please find comments inline.

> -----Original Message-----
> From: Ilya Maximets <i.maximets@samsung.com>
> Sent: Thursday, January 10, 2019 1:32 PM
> To: Ophir Munk <ophirmu@mellanox.com>; ovs-dev@openvswitch.org
> Cc: Asaf Penso <asafp@mellanox.com>; Ian Stokes <ian.stokes@intel.com>;
> Shahaf Shuler <shahafs@mellanox.com>; Thomas Monjalon
> <thomas@monjalon.net>; Olga Shern <olgas@mellanox.com>; Kevin
> Traynor <ktraynor@redhat.com>
> Subject: Re: [PATCH v4] netdev-dpdk: support port representors
> 
> On 08.01.2019 12:31, Ophir Munk wrote:
> > Dpdk port representors were introduced in dpdk versions 18.xx.
> > Prior to port representors there was a one-to-one relationship between
> > an rte device (e.g. PCI bus) and an eth device (referenced as dpdk
> > port id in OVS). With port representors the relationship becomes
> > one-to-many rte device to eth devices.
> > For example in [3] there are two devices (representors) using the same
> > PCI physical address 0000:08:00.0: "0000:08:00.0,representor=[3]" and
> > "0000:08:00.0,representor=[5]".
> > This commit handles the new one-to-many relationship. For example,
> > when one of the device port representors in [3] is closed - the PCI
> > bus cannot be detached until the other device port representor is
> > closed as well. OVS remains backward compatible by supporting dpdk
> > legacy PCI ports which do not include port representors.
> > Dpdk port representors related commits are listed in [1]. Dpdk port
> > representors documentation appears in [2]. A sample configuration
> > which uses two representors ports (the output of "ovs-vsctl show"
> > command) is shown in [3].
> > +
> >  static int
> >  vhost_common_construct(struct netdev *netdev)
> >      OVS_REQUIRES(dpdk_mutex)
> > @@ -1353,20 +1373,25 @@ static void
> >  netdev_dpdk_destruct(struct netdev *netdev)  {
> >      struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> > -    struct rte_eth_dev_info dev_info;
> > +    struct rte_device *rte_dev;
> >
> >      ovs_mutex_lock(&dpdk_mutex);
> >
> >      rte_eth_dev_stop(dev->port_id);
> >      dev->started = false;
> > -
> 
> Please, keep the empty line.
> 

I will update in v5.

> >      if (dev->attached) {
> > +        /* Remove the port eth device. */
> >          rte_eth_dev_close(dev->port_id);
> > -        rte_eth_dev_info_get(dev->port_id, &dev_info);
> > -        if (dev_info.device && !rte_dev_remove(dev_info.device)) {
> > -            VLOG_INFO("Device '%s' has been detached", dev->devargs);
> 
> Please keep this log message for the successful case.

Please note the line below: 
   VLOG_INFO("Device '%s' has been removed", dev->devargs);

This is the log message for the successful case. With the introduction of representors several eth devices can use one PCI device.
If you remove the first representor - its eth device is closed - but you do not detach the PCI device till the second representor eth device is closed as well.
Would you suggest replacing "removed" with "closed" or "detached"?

> 
> > -        } else {
> > -            VLOG_ERR("Device '%s' can not be detached", dev->devargs);
> > +        VLOG_INFO("Device '%s' has been removed", dev->devargs);
> > +        /* If this is the last port_id using this rte device
> > +         * remove this rte device and all its eth devices.
> > +         * FIXME: avoid direct access to DPDK internal array rte_eth_devices.
> > +         */
> > +        rte_dev = rte_eth_devices[dev->port_id].device;
> > +        if (netdev_dpdk_get_num_ports(rte_dev) == 1) {
> 
> rte_eth_dev_close() sets the RTE_ETH_DEV_UNUSED state for PMDs with
> RTE_ETH_DEV_CLOSE_REMOVE flag.
> The function netdev_dpdk_get_num_ports() skips unused ports.
> The result should not be equal to 1. We need the other way to detect the
> siblings or close the port after calculating them.
> In this case we'll probably do not need to have check for UNUSED state inside
> the _get_num_ports().
> 

I will reply to this in the exchange of emails that followed with Thomas.

> > +            if (rte_dev_remove(rte_dev) < 0) {
> > +                VLOG_ERR("Device '%s' can not be detached", dev->devargs);
> > +            }
> >          }
> >      }
> >

> > @@ -1632,8 +1657,37 @@ netdev_dpdk_get_port_by_mac(const char
> *mac_str)
> >      return DPDK_ETH_PORT_ID_INVALID;
> >  }
> >
> > +/* Return the first DPDK port_id matching the devargs pattern. */
> > +static dpdk_port_t netdev_dpdk_get_port_by_devargs(const char
> > +*devargs)
> 
> OVS_REQUIRES(dpdk_mutex)

I will update in v5

> 
> I'm not sure if 'netdev_dpdk_get_port_by_devargs' is the correct name
> because it not only finds (gets) the port, but probes (attaches) it.
> 

I think that the function main purpose is to find the port. Probing is optional for that purpose.
I am good with the current name but I am open to any new name you would like to suggest.
Can you please suggest?

> > +{
> > +    struct rte_dev_iterator iterator;
> > +    dpdk_port_t port_id;
> > +
> > +    RTE_ETH_FOREACH_MATCHING_DEV(port_id, devargs, &iterator) {
> 

> Invalid spacing around the FOREACH loop. 
> Same in all the other places.

I do not understand this comment. Can you please explain or demo the fix? 

> > +        /* If a break is done - must call rte_eth_iterator_cleanup. */
> > +        rte_eth_iterator_cleanup(&iterator);
> > +        break;
> > +    }
> > +    /* The port may be invalid if it was not probed */
> > +    if (port_id == DPDK_ETH_PORT_ID_INVALID) {
> > +        if (rte_dev_probe(devargs)) {
> > +            port_id = DPDK_ETH_PORT_ID_INVALID;
> > +        } else {
> > +            RTE_ETH_FOREACH_MATCHING_DEV(port_id, devargs, &iterator) {
> > +                /* If a break is done - must call rte_eth_iterator_cleanup. */
> > +                rte_eth_iterator_cleanup(&iterator);
> > +                break;
> > +            }
> > +        }
> > +    }
> > +
> > +    return port_id;
> > +}
> > +
> >  /*
> > - * Normally, a PCI id is enough for identifying a specific DPDK port.
> > + * Normally, a PCI id (optionally followed by a representor number)
> > + * is enough for identifying a specific DPDK port.
> >   * However, for some NICs having multiple ports sharing the same PCI
> >   * id, using PCI id won't work then.
> >   *
> > @@ -1641,34 +1695,42 @@ netdev_dpdk_get_port_by_mac(const char
> *mac_str)
> >   *
> >   * Note that the compatibility is fully kept: user can still use the
> >   * PCI id for adding ports (when it's enough for them).
> > + *
> > + * In order to avoid clang errors add OVS_REQUIRES(dpdk_mutex) to
> > + this function
> > + * since it is required by netdev_dpdk_lookup_by_port_id() while the
> > + dpdk_mutex
> > + * is taken outside of this function.
> >   */
> >  static dpdk_port_t
> >  netdev_dpdk_process_devargs(struct netdev_dpdk *dev,
> >                              const char *devargs, char **errp)
> > +    OVS_REQUIRES(dpdk_mutex)
> >  {
> > -    char *name;
> >      dpdk_port_t new_port_id = DPDK_ETH_PORT_ID_INVALID;
> >
> >      if (strncmp(devargs, "class=eth,mac=", 14) == 0) {
> >          new_port_id = netdev_dpdk_get_port_by_mac(&devargs[14]);
> >      } else {
> > -        name = xmemdup0(devargs, strcspn(devargs, ","));
> > -        if (rte_eth_dev_get_port_by_name(name, &new_port_id)
> > -                || !rte_eth_dev_is_valid_port(new_port_id)) {
> > -            /* Device not found in DPDK, attempt to attach it */
> > -            if (!rte_dev_probe(devargs)
> > -                && !rte_eth_dev_get_port_by_name(name, &new_port_id)) {
> > -                /* Attach successful */
> > -                dev->attached = true;
> > -                VLOG_INFO("Device '%s' attached to DPDK", devargs);
> > -            } else {
> > -                /* Attach unsuccessful */
> > +        new_port_id = netdev_dpdk_get_port_by_devargs(devargs);
> > +        if (!rte_eth_dev_is_valid_port(new_port_id)) {
> > +            new_port_id = DPDK_ETH_PORT_ID_INVALID;
> > +        } else {
> > +            struct netdev_dpdk *dup_dev;
> > +
> > +            dup_dev = netdev_dpdk_lookup_by_port_id(new_port_id);
> > +            if (dup_dev) {
> > +                VLOG_WARN_BUF(errp, "'%s' is trying to use device '%s' "
> > +                               "which is already in use by '%s'",
> > +                          netdev_get_name(&dev->up), devargs,
> > +                          netdev_get_name(&dup_dev->up));
> 
> Why you duplicating this part of code ?
> Can we just replace 'rte_eth_dev_get_port_by_name' with
> 'netdev_dpdk_get_port_by_devargs' and keep all the other logic ?
> 
> netdev_dpdk_get_port_by_devargs should look like this in this case:
> 
> /* Return the first DPDK port_id matching the devargs pattern. */ static bool
> netdev_dpdk_get_port_by_devargs(const char *devargs, dpdk_port_t
> *port_id)
>     OVS_REQUIRES(dpdk_mutex)
> {
>     struct rte_dev_iterator iterator;
> 
>     RTE_ETH_FOREACH_MATCHING_DEV (*port_id, devargs, &iterator) {
>         /* If a break is done - must call rte_eth_iterator_cleanup. */
>         rte_eth_iterator_cleanup(&iterator);
>         break;
>     }
>     return *port_id != DPDK_ETH_PORT_ID_INVALID; }
> 
> netdev_dpdk_process_devargs:
> 
>         if (netdev_dpdk_get_port_by_devargs(devagrs, &new_port_id)
>                 || !rte_eth_dev_is_valid_port(new_port_id)) {
>             /* Device not found in DPDK, attempt to attach it */
>             if (!rte_dev_probe(devargs)
>                 && !netdev_dpdk_get_port_by_devargs(devargs, &new_port_id)) {
>                 /* Attach successful */
>                 dev->attached = true;
>                 VLOG_INFO("Device '%s' attached to DPDK", devargs);
>             } else {
>                 /* Attach unsuccessful */
>                 new_port_id = DPDK_ETH_PORT_ID_INVALID;
>             }
>         }
> 
> If it works, I'd prefer this variant.
> 

I don't think it works in case you add two different ports with the same representor devargs (or the same PCI address with a PMD supporting representors).
The extra code is required to avoid this misconfiguration (using the same device for two different ports). 
For example, the following configuration must fail:

ovs-vsctl --no-wait add-port br1 port-one -- set Interface port-one type=dpdk options:dpdk-devargs=0000:08:00.0,representor=[1]
ovs-vsctl --no-wait add-port br1 port-two -- set Interface port-two type=dpdk options:dpdk-devargs=0000:08:00.0,representor=[1]

In such misconfiguration please note the waring you get:
                VLOG_WARN_BUF(errp, "'%s' is trying to use device '%s' "
                               "which is already in use by '%s'",
                          netdev_get_name(&dev->up), devargs,
                          netdev_get_name(&dup_dev->up));

An interesting question is why didn't we need this extra code in previous OVS versions.
The reason is that if we called rte_dev_probe() a second time for the same PCI device - we got an error. So we were immune against this misconfiguration.
This was the behavior before the introduction of representors. With representor - calling rte_dev_probe() twice for the same devargs will return successfully, hence the need for an additional check (extra code).

> Also, do we still need netdev_dpdk_get_port_by_mac() if we're using
> devargs iterator ?
> 

I will reply to this in the exchange of emails that followed with Thomas.
In short - the devargs iterator seems broken in parsing mac addresses.

> >                  new_port_id = DPDK_ETH_PORT_ID_INVALID;
> > +            } else {
> > +                /* Device successfully found. */
> > +                dev->attached = true;
> > +                VLOG_INFO("Device '%s' attached to DPDK port %d",
> > +                          devargs, new_port_id);
> >              }
> >          }
> > -        free(name);
> >      }
> > -
> 
> Please, keep the empty line.

I will update in v5

> 
> >      if (new_port_id == DPDK_ETH_PORT_ID_INVALID) {
> >          VLOG_WARN_BUF(errp, "Error attaching device '%s' to DPDK",
> devargs);
> >      }
> > @@ -1761,7 +1823,7 @@ netdev_dpdk_set_config(struct netdev *netdev,
> const struct smap *args,
> >                  dup_dev = netdev_dpdk_lookup_by_port_id(new_port_id);
> >                  if (dup_dev) {
> >                      VLOG_WARN_BUF(errp, "'%s' is trying to use device '%s' "
> > -                                        "which is already in use by '%s'",
> > +                                  "which is already in use by '%s'",
> >                                    netdev_get_name(netdev), new_devargs,
> >                                    netdev_get_name(&dup_dev->up));
> >                      err = EADDRINUSE; @@ -3236,11 +3298,17 @@
> > netdev_dpdk_detach(struct unixctl_conn *conn, int argc OVS_UNUSED,
> >      char *response;
> >      dpdk_port_t port_id;
> >      struct netdev_dpdk *dev;
> > -    struct rte_eth_dev_info dev_info;
> > +    struct rte_device *rte_dev;
> > +    struct rte_dev_iterator iterator;
> >
> >      ovs_mutex_lock(&dpdk_mutex);
> >
> > -    if (rte_eth_dev_get_port_by_name(argv[1], &port_id)) {
> > +    RTE_ETH_FOREACH_MATCHING_DEV(port_id, argv[1], &iterator) {
> > +        /* If a break is done - must call rte_eth_iterator_cleanup. */
> > +        rte_eth_iterator_cleanup(&iterator);
> > +        break;
> > +    }
> > +    if (port_id == DPDK_ETH_PORT_ID_INVALID) {
> >          response = xasprintf("Device '%s' not found in DPDK", argv[1]);
> >          goto error;
> >      }
> > @@ -3253,15 +3321,23 @@ netdev_dpdk_detach(struct unixctl_conn
> *conn, int argc OVS_UNUSED,
> >          goto error;
> >      }
> >
> > -    rte_eth_dev_close(port_id);
> > +    /* FIXME: avoid direct access to DPDK internal array rte_eth_devices. */
> > +    rte_dev = rte_eth_devices[port_id].device;
> > +    if (netdev_dpdk_get_num_ports(rte_dev)) {
> 
> Probably same here.
> DPDK does not set RTE_ETH_DEV_UNUSED state for PMDs without
> RTE_ETH_DEV_CLOSE_REMOVE flag.

Same here: I will reply to this in the exchange of emails that followed with Thomas.

> 
> > +        response = xasprintf("Device '%s' is being shared with other "
> > +                             "interfaces. Remove them before detaching.",
> > +                             argv[1]);
> > +        goto error;
> > +    }
> >
> > -    rte_eth_dev_info_get(port_id, &dev_info);
> > -    if (!dev_info.device || rte_dev_remove(dev_info.device)) {
> > -        response = xasprintf("Device '%s' can not be detached", argv[1]);
> > +    rte_eth_dev_close(port_id);
> > +    if (rte_dev_remove(rte_dev) < 0) {
> > +        response = xasprintf("Device '%s' can not be removed",
> > + argv[1]);
> 
> /removed/detached/
> You're using different words for equal situations. This is misleading.
> 

I can change the wording to "detached'. However I think it is misleading in a different way:
Assuming there are two representors using the same PCI bus - then "detaching" one representor will only
close the eth device, but will not detach the PCI device until the second representor is closed as well.
When all eth devices are closed - please note the message below:
   response = xasprintf("All devices shared with device '%s' "
                         "have been detached", argv[1]);

Having said that - do you still suggest replacing "removed" with "detached"?

> >
> > -    response = xasprintf("Device '%s' has been detached", argv[1]);
> > +    response = xasprintf("All devices shared with device '%s' "
> > +                          "have been detached", argv[1]);
> >
> >      ovs_mutex_unlock(&dpdk_mutex);
> >      unixctl_command_reply(conn, response);
> >
Ophir Munk Jan. 14, 2019, 9:23 a.m. UTC | #7
Hi Thomas and Ilya.
Thank you for your reviews. 
Please find comments inline.

Regards,
Ophir

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Thursday, January 10, 2019 4:50 PM
> To: Ilya Maximets <i.maximets@samsung.com>
> Cc: Ophir Munk <ophirmu@mellanox.com>; ovs-dev@openvswitch.org; Asaf
> Penso <asafp@mellanox.com>; Ian Stokes <ian.stokes@intel.com>; Shahaf
> Shuler <shahafs@mellanox.com>; Olga Shern <olgas@mellanox.com>; Kevin
> Traynor <ktraynor@redhat.com>
> Subject: Re: [PATCH v4] netdev-dpdk: support port representors
> 
> 10/01/2019 15:23, Ilya Maximets:
> > On 10.01.2019 16:42, Thomas Monjalon wrote:
> > > Hi,
> > >
> > > 10/01/2019 12:32, Ilya Maximets:
> > >> On 08.01.2019 12:31, Ophir Munk wrote:
> > >>>      if (dev->attached) {
> > >>> +        /* Remove the port eth device. */
> > >>>          rte_eth_dev_close(dev->port_id);
> > >>> -        rte_eth_dev_info_get(dev->port_id, &dev_info);
> > >>> -        if (dev_info.device && !rte_dev_remove(dev_info.device)) {
> > >>> -            VLOG_INFO("Device '%s' has been detached", dev->devargs);
> > >>
> > >> Please keep this log message for the successful case.
> > >>
> > >>> -        } else {
> > >>> -            VLOG_ERR("Device '%s' can not be detached", dev->devargs);
> > >>> +        VLOG_INFO("Device '%s' has been removed", dev->devargs);
> > >>> +        /* If this is the last port_id using this rte device
> > >>> +         * remove this rte device and all its eth devices.
> > >>> +         * FIXME: avoid direct access to DPDK internal array
> rte_eth_devices.
> > >>> +         */
> > >>> +        rte_dev = rte_eth_devices[dev->port_id].device;
> > >>> +        if (netdev_dpdk_get_num_ports(rte_dev) == 1) {
> > >>
> > >> rte_eth_dev_close() sets the RTE_ETH_DEV_UNUSED state for PMDs
> with
> > >> RTE_ETH_DEV_CLOSE_REMOVE flag.
> > >> The function netdev_dpdk_get_num_ports() skips unused ports.
> > >> The result should not be equal to 1. We need the other way to
> > >> detect the siblings or close the port after calculating them.
> > >> In this case we'll probably do not need to have check for UNUSED
> > >> state inside the _get_num_ports().
> > >
> > > Maybe I don't understand correctly.
> > > Why do you want to count closed sibling ports?
> >
> > Proposed version of 'netdev_dpdk_get_num_ports' iterates over
> > 'dpdk_list' which contains only ports that currently opened by OVS.
> > So, the only port that could have UNUSED state in that list is the
> > port that we're just closed here. If we'll count number of siblings
> > before closing the port where should be no UNUSED ports in that list.
> >
> > The issue I tried to raise here is that rte_eth_dev_close() changed
> > the state only for PMDs with RTE_ETH_DEV_CLOSE_REMOVE. So, the result
> > of 'netdev_dpdk_get_num_ports' will be different for different PMDs.
> 
> Yes, that's true.
> You can manage it two ways:
> 	- you rely on OVS list of open ports
> 	- you remove the device completely (as before)
> 	  for PMDs not supporting RTE_ETH_DEV_CLOSE_REMOVE
> 

In v5 I will implemented this last option: when RTE_ETH_DEV_CLOSE_REMOVE is not supported the code will be as before representors.

> > >>> +            if (rte_dev_remove(rte_dev) < 0) {
> > >>> +                VLOG_ERR("Device '%s' can not be detached", dev-
> >devargs);
> > >>> +            }
> > >>>          }
> > >>>      }
> > >
> > > [...]
> > >> Also, do we still need netdev_dpdk_get_port_by_mac() if we're using
> > >> devargs iterator ?
> > >
> > > The devargs iterator manages "mac=" property in DPDK 18.11.
> >
> > Good.
> >

In PATCH v1 I have already tried to remove the get_port_by_mac() call, but the devargs iterator failed to correctly parse the mac address. 
Re-testing it now I get the same failure.
I will remove the get_port_by_mac() call as soon as there is a fix in devargs iterator.

> > >
> > > [...]
> > >>> -    rte_eth_dev_close(port_id);
> > >>> +    /* FIXME: avoid direct access to DPDK internal array
> rte_eth_devices. */
> > >>> +    rte_dev = rte_eth_devices[port_id].device;
> > >>> +    if (netdev_dpdk_get_num_ports(rte_dev)) {
> > >>
> > >> Probably same here.
> > >> DPDK does not set RTE_ETH_DEV_UNUSED state for PMDs without
> RTE_ETH_DEV_CLOSE_REMOVE flag.
> > >
> > > If RTE_ETH_DEV_CLOSE_REMOVE is not set, you should call
> > > rte_dev_remove() just after rte_eth_dev_close().
> >
> > And what if we call rte_dev_remove() after rte_eth_dev_close() for PMD
> > with RTE_ETH_DEV_CLOSE_REMOVE ?
> 
> Then you release the full device, as before with the old "detach" API.
> It makes no difference for single-port device.
> But it is bad for multi-port devices (which should already implement
> RTE_ETH_DEV_CLOSE_REMOVE).

In v5 I will implemented this last option: when RTE_ETH_DEV_CLOSE_REMOVE is not supported the code will be as before representors.

> 
> > >>> +        response = xasprintf("Device '%s' is being shared with other "
> > >>> +                             "interfaces. Remove them before detaching.",
> > >>> +                             argv[1]);
> > >>> +        goto error;
> > >>> +    }
> > >>>
> > >>> -    rte_eth_dev_info_get(port_id, &dev_info);
> > >>> -    if (!dev_info.device || rte_dev_remove(dev_info.device)) {
> > >>> -        response = xasprintf("Device '%s' can not be detached", argv[1]);
> > >>> +    rte_eth_dev_close(port_id);
> > >>> +    if (rte_dev_remove(rte_dev) < 0) {
> > >>> +        response = xasprintf("Device '%s' can not be removed",
> > >>> + argv[1]);
> 
>
Ilya Maximets Jan. 14, 2019, 10:10 a.m. UTC | #8
On 13.01.2019 18:13, Ophir Munk wrote:
> Hi Ilya,
> Thank you for your review.
> Please find comments inline.
> 
>> -----Original Message-----
>> From: Ilya Maximets <i.maximets@samsung.com>
>> Sent: Thursday, January 10, 2019 1:32 PM
>> To: Ophir Munk <ophirmu@mellanox.com>; ovs-dev@openvswitch.org
>> Cc: Asaf Penso <asafp@mellanox.com>; Ian Stokes <ian.stokes@intel.com>;
>> Shahaf Shuler <shahafs@mellanox.com>; Thomas Monjalon
>> <thomas@monjalon.net>; Olga Shern <olgas@mellanox.com>; Kevin
>> Traynor <ktraynor@redhat.com>
>> Subject: Re: [PATCH v4] netdev-dpdk: support port representors
>>
>> On 08.01.2019 12:31, Ophir Munk wrote:
>>> Dpdk port representors were introduced in dpdk versions 18.xx.
>>> Prior to port representors there was a one-to-one relationship between
>>> an rte device (e.g. PCI bus) and an eth device (referenced as dpdk
>>> port id in OVS). With port representors the relationship becomes
>>> one-to-many rte device to eth devices.
>>> For example in [3] there are two devices (representors) using the same
>>> PCI physical address 0000:08:00.0: "0000:08:00.0,representor=[3]" and
>>> "0000:08:00.0,representor=[5]".
>>> This commit handles the new one-to-many relationship. For example,
>>> when one of the device port representors in [3] is closed - the PCI
>>> bus cannot be detached until the other device port representor is
>>> closed as well. OVS remains backward compatible by supporting dpdk
>>> legacy PCI ports which do not include port representors.
>>> Dpdk port representors related commits are listed in [1]. Dpdk port
>>> representors documentation appears in [2]. A sample configuration
>>> which uses two representors ports (the output of "ovs-vsctl show"
>>> command) is shown in [3].
>>> +
>>>  static int
>>>  vhost_common_construct(struct netdev *netdev)
>>>      OVS_REQUIRES(dpdk_mutex)
>>> @@ -1353,20 +1373,25 @@ static void
>>>  netdev_dpdk_destruct(struct netdev *netdev)  {
>>>      struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>>> -    struct rte_eth_dev_info dev_info;
>>> +    struct rte_device *rte_dev;
>>>
>>>      ovs_mutex_lock(&dpdk_mutex);
>>>
>>>      rte_eth_dev_stop(dev->port_id);
>>>      dev->started = false;
>>> -
>>
>> Please, keep the empty line.
>>
> 
> I will update in v5.
> 
>>>      if (dev->attached) {
>>> +        /* Remove the port eth device. */
>>>          rte_eth_dev_close(dev->port_id);
>>> -        rte_eth_dev_info_get(dev->port_id, &dev_info);
>>> -        if (dev_info.device && !rte_dev_remove(dev_info.device)) {
>>> -            VLOG_INFO("Device '%s' has been detached", dev->devargs);
>>
>> Please keep this log message for the successful case.
> 
> Please note the line below: 
>    VLOG_INFO("Device '%s' has been removed", dev->devargs);
> 
> This is the log message for the successful case. With the introduction of representors several eth devices can use one PCI device.
> If you remove the first representor - its eth device is closed - but you do not detach the PCI device till the second representor eth device is closed as well.
> Would you suggest replacing "removed" with "closed" or "detached"?

I understand what you're trying to do here. What I'm trying to say
is that in current version, message "has been removed" states that
device was closed, but we do not know if we tried to detach it.
We now have two levels of getting port out of OVS. First is "removing"
(rte_eth_dev_close) and the second is "detaching" (rte_dev_remove,
kind of confusing).
So, in current code we do not know if we detached the port or we skipped
detaching because of existence of other ports sharing the same device.

> 
>>
>>> -        } else {
>>> -            VLOG_ERR("Device '%s' can not be detached", dev->devargs);
>>> +        VLOG_INFO("Device '%s' has been removed", dev->devargs);
>>> +        /* If this is the last port_id using this rte device
>>> +         * remove this rte device and all its eth devices.
>>> +         * FIXME: avoid direct access to DPDK internal array rte_eth_devices.
>>> +         */
>>> +        rte_dev = rte_eth_devices[dev->port_id].device;
>>> +        if (netdev_dpdk_get_num_ports(rte_dev) == 1) {
>>
>> rte_eth_dev_close() sets the RTE_ETH_DEV_UNUSED state for PMDs with
>> RTE_ETH_DEV_CLOSE_REMOVE flag.
>> The function netdev_dpdk_get_num_ports() skips unused ports.
>> The result should not be equal to 1. We need the other way to detect the
>> siblings or close the port after calculating them.
>> In this case we'll probably do not need to have check for UNUSED state inside
>> the _get_num_ports().
>>
> 
> I will reply to this in the exchange of emails that followed with Thomas.
> 
>>> +            if (rte_dev_remove(rte_dev) < 0) {
>>> +                VLOG_ERR("Device '%s' can not be detached", dev->devargs);
>>> +            }
>>>          }
>>>      }
>>>
> 
>>> @@ -1632,8 +1657,37 @@ netdev_dpdk_get_port_by_mac(const char
>> *mac_str)
>>>      return DPDK_ETH_PORT_ID_INVALID;
>>>  }
>>>
>>> +/* Return the first DPDK port_id matching the devargs pattern. */
>>> +static dpdk_port_t netdev_dpdk_get_port_by_devargs(const char
>>> +*devargs)
>>
>> OVS_REQUIRES(dpdk_mutex)
> 
> I will update in v5
> 
>>
>> I'm not sure if 'netdev_dpdk_get_port_by_devargs' is the correct name
>> because it not only finds (gets) the port, but probes (attaches) it.
>>
> 
> I think that the function main purpose is to find the port. Probing is optional for that purpose.
> I am good with the current name but I am open to any new name you would like to suggest.
> Can you please suggest?

IMHO, the main purpose of this function is to probe the port because
searching will succeed only if we're trying to add duplicated port.
'netdev_dpdk_probe_port_by_devargs' should be better, IMHO.

> 
>>> +{
>>> +    struct rte_dev_iterator iterator;
>>> +    dpdk_port_t port_id;
>>> +
>>> +    RTE_ETH_FOREACH_MATCHING_DEV(port_id, devargs, &iterator) {
>>
> 
>> Invalid spacing around the FOREACH loop. 
>> Same in all the other places.
> 
> I do not understand this comment. Can you please explain or demo the fix?

You just need an extra space before the '(' like this:
	'RTE_ETH_FOREACH_MATCHING_DEV ('

I'll send a patch for checkpatch script to catch this. Right now it
supports only 'FOR_EACH' loops, not the 'FOREACH'.

> 
>>> +        /* If a break is done - must call rte_eth_iterator_cleanup. */
>>> +        rte_eth_iterator_cleanup(&iterator);
>>> +        break;
>>> +    }
>>> +    /* The port may be invalid if it was not probed */
>>> +    if (port_id == DPDK_ETH_PORT_ID_INVALID) {
>>> +        if (rte_dev_probe(devargs)) {
>>> +            port_id = DPDK_ETH_PORT_ID_INVALID;
>>> +        } else {
>>> +            RTE_ETH_FOREACH_MATCHING_DEV(port_id, devargs, &iterator) {
>>> +                /* If a break is done - must call rte_eth_iterator_cleanup. */
>>> +                rte_eth_iterator_cleanup(&iterator);
>>> +                break;
>>> +            }
>>> +        }
>>> +    }
>>> +
>>> +    return port_id;
>>> +}
>>> +
>>>  /*
>>> - * Normally, a PCI id is enough for identifying a specific DPDK port.
>>> + * Normally, a PCI id (optionally followed by a representor number)
>>> + * is enough for identifying a specific DPDK port.
>>>   * However, for some NICs having multiple ports sharing the same PCI
>>>   * id, using PCI id won't work then.
>>>   *
>>> @@ -1641,34 +1695,42 @@ netdev_dpdk_get_port_by_mac(const char
>> *mac_str)
>>>   *
>>>   * Note that the compatibility is fully kept: user can still use the
>>>   * PCI id for adding ports (when it's enough for them).
>>> + *
>>> + * In order to avoid clang errors add OVS_REQUIRES(dpdk_mutex) to
>>> + this function
>>> + * since it is required by netdev_dpdk_lookup_by_port_id() while the
>>> + dpdk_mutex
>>> + * is taken outside of this function.
>>>   */
>>>  static dpdk_port_t
>>>  netdev_dpdk_process_devargs(struct netdev_dpdk *dev,
>>>                              const char *devargs, char **errp)
>>> +    OVS_REQUIRES(dpdk_mutex)
>>>  {
>>> -    char *name;
>>>      dpdk_port_t new_port_id = DPDK_ETH_PORT_ID_INVALID;
>>>
>>>      if (strncmp(devargs, "class=eth,mac=", 14) == 0) {
>>>          new_port_id = netdev_dpdk_get_port_by_mac(&devargs[14]);
>>>      } else {
>>> -        name = xmemdup0(devargs, strcspn(devargs, ","));
>>> -        if (rte_eth_dev_get_port_by_name(name, &new_port_id)
>>> -                || !rte_eth_dev_is_valid_port(new_port_id)) {
>>> -            /* Device not found in DPDK, attempt to attach it */
>>> -            if (!rte_dev_probe(devargs)
>>> -                && !rte_eth_dev_get_port_by_name(name, &new_port_id)) {
>>> -                /* Attach successful */
>>> -                dev->attached = true;
>>> -                VLOG_INFO("Device '%s' attached to DPDK", devargs);
>>> -            } else {
>>> -                /* Attach unsuccessful */
>>> +        new_port_id = netdev_dpdk_get_port_by_devargs(devargs);
>>> +        if (!rte_eth_dev_is_valid_port(new_port_id)) {
>>> +            new_port_id = DPDK_ETH_PORT_ID_INVALID;
>>> +        } else {
>>> +            struct netdev_dpdk *dup_dev;
>>> +
>>> +            dup_dev = netdev_dpdk_lookup_by_port_id(new_port_id);
>>> +            if (dup_dev) {
>>> +                VLOG_WARN_BUF(errp, "'%s' is trying to use device '%s' "
>>> +                               "which is already in use by '%s'",
>>> +                          netdev_get_name(&dev->up), devargs,
>>> +                          netdev_get_name(&dup_dev->up));
>>
>> Why you duplicating this part of code ?
>> Can we just replace 'rte_eth_dev_get_port_by_name' with
>> 'netdev_dpdk_get_port_by_devargs' and keep all the other logic ?
>>
>> netdev_dpdk_get_port_by_devargs should look like this in this case:
>>
>> /* Return the first DPDK port_id matching the devargs pattern. */ static bool
>> netdev_dpdk_get_port_by_devargs(const char *devargs, dpdk_port_t
>> *port_id)
>>     OVS_REQUIRES(dpdk_mutex)
>> {
>>     struct rte_dev_iterator iterator;
>>
>>     RTE_ETH_FOREACH_MATCHING_DEV (*port_id, devargs, &iterator) {
>>         /* If a break is done - must call rte_eth_iterator_cleanup. */
>>         rte_eth_iterator_cleanup(&iterator);
>>         break;
>>     }
>>     return *port_id != DPDK_ETH_PORT_ID_INVALID; }
>>
>> netdev_dpdk_process_devargs:
>>
>>         if (netdev_dpdk_get_port_by_devargs(devagrs, &new_port_id)
>>                 || !rte_eth_dev_is_valid_port(new_port_id)) {
>>             /* Device not found in DPDK, attempt to attach it */
>>             if (!rte_dev_probe(devargs)
>>                 && !netdev_dpdk_get_port_by_devargs(devargs, &new_port_id)) {
>>                 /* Attach successful */
>>                 dev->attached = true;
>>                 VLOG_INFO("Device '%s' attached to DPDK", devargs);
>>             } else {
>>                 /* Attach unsuccessful */
>>                 new_port_id = DPDK_ETH_PORT_ID_INVALID;
>>             }
>>         }
>>
>> If it works, I'd prefer this variant.
>>
> 
> I don't think it works in case you add two different ports with the same representor devargs (or the same PCI address with a PMD supporting representors).
> The extra code is required to avoid this misconfiguration (using the same device for two different ports). 
> For example, the following configuration must fail:
> 
> ovs-vsctl --no-wait add-port br1 port-one -- set Interface port-one type=dpdk options:dpdk-devargs=0000:08:00.0,representor=[1]
> ovs-vsctl --no-wait add-port br1 port-two -- set Interface port-two type=dpdk options:dpdk-devargs=0000:08:00.0,representor=[1]
> 
> In such misconfiguration please note the waring you get:
>                 VLOG_WARN_BUF(errp, "'%s' is trying to use device '%s' "
>                                "which is already in use by '%s'",
>                           netdev_get_name(&dev->up), devargs,
>                           netdev_get_name(&dup_dev->up));
> 
> An interesting question is why didn't we need this extra code in previous OVS versions.
> The reason is that if we called rte_dev_probe() a second time for the same PCI device - we got an error. So we were immune against this misconfiguration.
> This was the behavior before the introduction of representors. With representor - calling rte_dev_probe() twice for the same devargs will return successfully, hence the need for an additional check (extra code).

But we have the same check on the upper level. If you will not
check here, the new_port_id will go to netdev_dpdk_set_config(), where
we have equal check for duplicated ports. Why this doesn't work for you?

> 
>> Also, do we still need netdev_dpdk_get_port_by_mac() if we're using
>> devargs iterator ?
>>
> 
> I will reply to this in the exchange of emails that followed with Thomas.
> In short - the devargs iterator seems broken in parsing mac addresses.
> 
>>>                  new_port_id = DPDK_ETH_PORT_ID_INVALID;
>>> +            } else {
>>> +                /* Device successfully found. */
>>> +                dev->attached = true;
>>> +                VLOG_INFO("Device '%s' attached to DPDK port %d",
>>> +                          devargs, new_port_id);
>>>              }
>>>          }
>>> -        free(name);
>>>      }
>>> -
>>
>> Please, keep the empty line.
> 
> I will update in v5
> 
>>
>>>      if (new_port_id == DPDK_ETH_PORT_ID_INVALID) {
>>>          VLOG_WARN_BUF(errp, "Error attaching device '%s' to DPDK",
>> devargs);
>>>      }
>>> @@ -1761,7 +1823,7 @@ netdev_dpdk_set_config(struct netdev *netdev,
>> const struct smap *args,
>>>                  dup_dev = netdev_dpdk_lookup_by_port_id(new_port_id);
>>>                  if (dup_dev) {
>>>                      VLOG_WARN_BUF(errp, "'%s' is trying to use device '%s' "
>>> -                                        "which is already in use by '%s'",
>>> +                                  "which is already in use by '%s'",
>>>                                    netdev_get_name(netdev), new_devargs,
>>>                                    netdev_get_name(&dup_dev->up));
>>>                      err = EADDRINUSE; @@ -3236,11 +3298,17 @@
>>> netdev_dpdk_detach(struct unixctl_conn *conn, int argc OVS_UNUSED,
>>>      char *response;
>>>      dpdk_port_t port_id;
>>>      struct netdev_dpdk *dev;
>>> -    struct rte_eth_dev_info dev_info;
>>> +    struct rte_device *rte_dev;
>>> +    struct rte_dev_iterator iterator;
>>>
>>>      ovs_mutex_lock(&dpdk_mutex);
>>>
>>> -    if (rte_eth_dev_get_port_by_name(argv[1], &port_id)) {
>>> +    RTE_ETH_FOREACH_MATCHING_DEV(port_id, argv[1], &iterator) {
>>> +        /* If a break is done - must call rte_eth_iterator_cleanup. */
>>> +        rte_eth_iterator_cleanup(&iterator);
>>> +        break;
>>> +    }
>>> +    if (port_id == DPDK_ETH_PORT_ID_INVALID) {
>>>          response = xasprintf("Device '%s' not found in DPDK", argv[1]);
>>>          goto error;
>>>      }
>>> @@ -3253,15 +3321,23 @@ netdev_dpdk_detach(struct unixctl_conn
>> *conn, int argc OVS_UNUSED,
>>>          goto error;
>>>      }
>>>
>>> -    rte_eth_dev_close(port_id);
>>> +    /* FIXME: avoid direct access to DPDK internal array rte_eth_devices. */
>>> +    rte_dev = rte_eth_devices[port_id].device;
>>> +    if (netdev_dpdk_get_num_ports(rte_dev)) {
>>
>> Probably same here.
>> DPDK does not set RTE_ETH_DEV_UNUSED state for PMDs without
>> RTE_ETH_DEV_CLOSE_REMOVE flag.
> 
> Same here: I will reply to this in the exchange of emails that followed with Thomas.
> 
>>
>>> +        response = xasprintf("Device '%s' is being shared with other "
>>> +                             "interfaces. Remove them before detaching.",
>>> +                             argv[1]);
>>> +        goto error;
>>> +    }
>>>
>>> -    rte_eth_dev_info_get(port_id, &dev_info);
>>> -    if (!dev_info.device || rte_dev_remove(dev_info.device)) {
>>> -        response = xasprintf("Device '%s' can not be detached", argv[1]);
>>> +    rte_eth_dev_close(port_id);
>>> +    if (rte_dev_remove(rte_dev) < 0) {
>>> +        response = xasprintf("Device '%s' can not be removed",
>>> + argv[1]);
>>
>> /removed/detached/
>> You're using different words for equal situations. This is misleading.
>>
> 
> I can change the wording to "detached'. However I think it is misleading in a different way:
> Assuming there are two representors using the same PCI bus - then "detaching" one representor will only
> close the eth device, but will not detach the PCI device until the second representor is closed as well.
> When all eth devices are closed - please note the message below:
>    response = xasprintf("All devices shared with device '%s' "
>                          "have been detached", argv[1]);
> 
> Having said that - do you still suggest replacing "removed" with "detached"?

I'm suggesting to separate two cases:

 1. rte_eth_dev_close() executed.
 2. rte_dev_remove() executed.

The first one we could call "Port '%s' closed", the second "All devices shared
with device '%s' detached". And as we have two places where this could happen,
in both places we should have equal messages for the equal events. You may vary
the wording, but, IMHO, it should be clear from the log which of above events
occurred. And we should log every our action (1 and 2) for this purpose.
Otherwise it'll be hard to tell what devices are still attached to OVS.

> 
>>>
>>> -    response = xasprintf("Device '%s' has been detached", argv[1]);
>>> +    response = xasprintf("All devices shared with device '%s' "
>>> +                          "have been detached", argv[1]);
>>>
>>>      ovs_mutex_unlock(&dpdk_mutex);
>>>      unixctl_command_reply(conn, response);
>>>
Ophir Munk Jan. 15, 2019, 8:39 a.m. UTC | #9
Hi Ilya,
Please find comments inline

> -----Original Message-----
> From: Ilya Maximets <i.maximets@samsung.com>
> Sent: Monday, January 14, 2019 12:11 PM
> To: Ophir Munk <ophirmu@mellanox.com>; ovs-dev@openvswitch.org
> Cc: Asaf Penso <asafp@mellanox.com>; Ian Stokes <ian.stokes@intel.com>;
> Shahaf Shuler <shahafs@mellanox.com>; Thomas Monjalon
> <thomas@monjalon.net>; Olga Shern <olgas@mellanox.com>; Kevin
> Traynor <ktraynor@redhat.com>
> Subject: Re: [PATCH v4] netdev-dpdk: support port representors
> 
> On 13.01.2019 18:13, Ophir Munk wrote:
> > Hi Ilya,
> > Thank you for your review.
> > Please find comments inline.
> >
> >> -----Original Message-----
> >> From: Ilya Maximets <i.maximets@samsung.com>
> >> Sent: Thursday, January 10, 2019 1:32 PM
> >> To: Ophir Munk <ophirmu@mellanox.com>; ovs-dev@openvswitch.org
> >> Cc: Asaf Penso <asafp@mellanox.com>; Ian Stokes
> >> <ian.stokes@intel.com>; Shahaf Shuler <shahafs@mellanox.com>;
> Thomas
> >> Monjalon <thomas@monjalon.net>; Olga Shern <olgas@mellanox.com>;
> >> Kevin Traynor <ktraynor@redhat.com>
> >> Subject: Re: [PATCH v4] netdev-dpdk: support port representors
> >>
> >> On 08.01.2019 12:31, Ophir Munk wrote:
> >>> Dpdk port representors were introduced in dpdk versions 18.xx.
> >>> Prior to port representors there was a one-to-one relationship
> >>> between an rte device (e.g. PCI bus) and an eth device (referenced
> >>> as dpdk port id in OVS). With port representors the relationship
> >>> becomes one-to-many rte device to eth devices.
> >>> For example in [3] there are two devices (representors) using the
> >>> same PCI physical address 0000:08:00.0:
> >>> "0000:08:00.0,representor=[3]" and "0000:08:00.0,representor=[5]".
> >>> This commit handles the new one-to-many relationship. For example,
> >>> when one of the device port representors in [3] is closed - the PCI
> >>> bus cannot be detached until the other device port representor is
> >>> closed as well. OVS remains backward compatible by supporting dpdk
> >>> legacy PCI ports which do not include port representors.
> >>> Dpdk port representors related commits are listed in [1]. Dpdk port
> >>> representors documentation appears in [2]. A sample configuration
> >>> which uses two representors ports (the output of "ovs-vsctl show"
> >>> command) is shown in [3].
> >>> +
> >>>  static int
> >>>  vhost_common_construct(struct netdev *netdev)
> >>>      OVS_REQUIRES(dpdk_mutex)
> >>> @@ -1353,20 +1373,25 @@ static void
> >>>  netdev_dpdk_destruct(struct netdev *netdev)  {
> >>>      struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> >>> -    struct rte_eth_dev_info dev_info;
> >>> +    struct rte_device *rte_dev;
> >>>
> >>>      ovs_mutex_lock(&dpdk_mutex);
> >>>
> >>>      rte_eth_dev_stop(dev->port_id);
> >>>      dev->started = false;
> >>> -
> >>
> >> Please, keep the empty line.
> >>
> >
> > I will update in v5.
> >
> >>>      if (dev->attached) {
> >>> +        /* Remove the port eth device. */
> >>>          rte_eth_dev_close(dev->port_id);
> >>> -        rte_eth_dev_info_get(dev->port_id, &dev_info);
> >>> -        if (dev_info.device && !rte_dev_remove(dev_info.device)) {
> >>> -            VLOG_INFO("Device '%s' has been detached", dev->devargs);
> >>
> >> Please keep this log message for the successful case.
> >
> > Please note the line below:
> >    VLOG_INFO("Device '%s' has been removed", dev->devargs);
> >
> > This is the log message for the successful case. With the introduction of
> representors several eth devices can use one PCI device.
> > If you remove the first representor - its eth device is closed - but you do
> not detach the PCI device till the second representor eth device is closed as
> well.
> > Would you suggest replacing "removed" with "closed" or "detached"?
> 
> I understand what you're trying to do here. What I'm trying to say is that in
> current version, message "has been removed" states that device was closed,
> but we do not know if we tried to detach it.
> We now have two levels of getting port out of OVS. First is "removing"
> (rte_eth_dev_close) and the second is "detaching" (rte_dev_remove, kind
> of confusing).
> So, in current code we do not know if we detached the port or we skipped
> detaching because of existence of other ports sharing the same device.
> 

In v5 I will inform if a device was removed (e.g. closed) or detached.

> >>
> >> I'm not sure if 'netdev_dpdk_get_port_by_devargs' is the correct name
> >> because it not only finds (gets) the port, but probes (attaches) it.
> >>
> >
> > I think that the function main purpose is to find the port. Probing is optional
> for that purpose.
> > I am good with the current name but I am open to any new name you
> would like to suggest.
> > Can you please suggest?
> 
> IMHO, the main purpose of this function is to probe the port because
> searching will succeed only if we're trying to add duplicated port.
> 'netdev_dpdk_probe_port_by_devargs' should be better, IMHO.
> 

In v5 I will update the function name per your suggestion.

> >
> >>> +{
> >>> +    struct rte_dev_iterator iterator;
> >>> +    dpdk_port_t port_id;
> >>> +
> >>> +    RTE_ETH_FOREACH_MATCHING_DEV(port_id, devargs, &iterator) {
> >>
> >
> >> Invalid spacing around the FOREACH loop.
> >> Same in all the other places.
> >
> > I do not understand this comment. Can you please explain or demo the fix?
> 
> You just need an extra space before the '(' like this:
> 	'RTE_ETH_FOREACH_MATCHING_DEV ('
> 

I will update in v5

> I'll send a patch for checkpatch script to catch this. Right now it supports only
> 'FOR_EACH' loops, not the 'FOREACH'.
> 
> >
> >>>      dpdk_port_t new_port_id = DPDK_ETH_PORT_ID_INVALID;
> >>>
> >>>      if (strncmp(devargs, "class=eth,mac=", 14) == 0) {
> >>>          new_port_id = netdev_dpdk_get_port_by_mac(&devargs[14]);
> >>>      } else {
> >>> -        name = xmemdup0(devargs, strcspn(devargs, ","));
> >>> -        if (rte_eth_dev_get_port_by_name(name, &new_port_id)
> >>> -                || !rte_eth_dev_is_valid_port(new_port_id)) {
> >>> -            /* Device not found in DPDK, attempt to attach it */
> >>> -            if (!rte_dev_probe(devargs)
> >>> -                && !rte_eth_dev_get_port_by_name(name, &new_port_id)) {
> >>> -                /* Attach successful */
> >>> -                dev->attached = true;
> >>> -                VLOG_INFO("Device '%s' attached to DPDK", devargs);
> >>> -            } else {
> >>> -                /* Attach unsuccessful */
> >>> +        new_port_id = netdev_dpdk_get_port_by_devargs(devargs);
> >>> +        if (!rte_eth_dev_is_valid_port(new_port_id)) {
> >>> +            new_port_id = DPDK_ETH_PORT_ID_INVALID;
> >>> +        } else {
> >>> +            struct netdev_dpdk *dup_dev;
> >>> +
> >>> +            dup_dev = netdev_dpdk_lookup_by_port_id(new_port_id);
> >>> +            if (dup_dev) {
> >>> +                VLOG_WARN_BUF(errp, "'%s' is trying to use device '%s' "
> >>> +                               "which is already in use by '%s'",
> >>> +                          netdev_get_name(&dev->up), devargs,
> >>> +                          netdev_get_name(&dup_dev->up));
> >>
> >> Why you duplicating this part of code ?
> >> Can we just replace 'rte_eth_dev_get_port_by_name' with
> >> 'netdev_dpdk_get_port_by_devargs' and keep all the other logic ?
> >>
> >> netdev_dpdk_get_port_by_devargs should look like this in this case:
> >>
> >> /* Return the first DPDK port_id matching the devargs pattern. */
> >> static bool netdev_dpdk_get_port_by_devargs(const char *devargs,
> >> dpdk_port_t
> >> *port_id)
> >>     OVS_REQUIRES(dpdk_mutex)
> >> {
> >>     struct rte_dev_iterator iterator;
> >>
> >>     RTE_ETH_FOREACH_MATCHING_DEV (*port_id, devargs, &iterator) {
> >>         /* If a break is done - must call rte_eth_iterator_cleanup. */
> >>         rte_eth_iterator_cleanup(&iterator);
> >>         break;
> >>     }
> >>     return *port_id != DPDK_ETH_PORT_ID_INVALID; }
> >>
> >> netdev_dpdk_process_devargs:
> >>
> >>         if (netdev_dpdk_get_port_by_devargs(devagrs, &new_port_id)
> >>                 || !rte_eth_dev_is_valid_port(new_port_id)) {
> >>             /* Device not found in DPDK, attempt to attach it */
> >>             if (!rte_dev_probe(devargs)
> >>                 && !netdev_dpdk_get_port_by_devargs(devargs,
> &new_port_id)) {
> >>                 /* Attach successful */
> >>                 dev->attached = true;
> >>                 VLOG_INFO("Device '%s' attached to DPDK", devargs);
> >>             } else {
> >>                 /* Attach unsuccessful */
> >>                 new_port_id = DPDK_ETH_PORT_ID_INVALID;
> >>             }
> >>         }
> >>
> >> If it works, I'd prefer this variant.
> >>
> >
> > I don't think it works in case you add two different ports with the same
> representor devargs (or the same PCI address with a PMD supporting
> representors).
> > The extra code is required to avoid this misconfiguration (using the same
> device for two different ports).
> > For example, the following configuration must fail:
> >
> > ovs-vsctl --no-wait add-port br1 port-one -- set Interface port-one
> > type=dpdk options:dpdk-devargs=0000:08:00.0,representor=[1]
> > ovs-vsctl --no-wait add-port br1 port-two -- set Interface port-two
> > type=dpdk options:dpdk-devargs=0000:08:00.0,representor=[1]
> >
> > In such misconfiguration please note the waring you get:
> >                 VLOG_WARN_BUF(errp, "'%s' is trying to use device '%s' "
> >                                "which is already in use by '%s'",
> >                           netdev_get_name(&dev->up), devargs,
> >                           netdev_get_name(&dup_dev->up));
> >
> > An interesting question is why didn't we need this extra code in previous
> OVS versions.
> > The reason is that if we called rte_dev_probe() a second time for the same
> PCI device - we got an error. So we were immune against this
> misconfiguration.
> > This was the behavior before the introduction of representors. With
> representor - calling rte_dev_probe() twice for the same devargs will return
> successfully, hence the need for an additional check (extra code).
> 
> But we have the same check on the upper level. If you will not check here,
> the new_port_id will go to netdev_dpdk_set_config(), where we have equal
> check for duplicated ports. Why this doesn't work for you?
> 

I will explain why this does not work. Please note the following code snippet from 
netdev_dpdk_set_config():

1    dpdk_port_t new_port_id = netdev_dpdk_process_devargs(dev, new_devargs, errp)
2   if (!rte_eth_dev_is_valid_port(new_port_id)) {
3       err = EINVAL;
4   } else if (new_port_id == dev->port_id) {
5       /* Already configured, do not reconfigure again */
6       err = 0;
7   } else {
8       err = netdev_dpdk_check_dup_dev(new_port_id, netdev,
9                 new_devargs, errp);
10       if (!err) {
11          int sid = rte_eth_dev_socket_id(new_port_id);
12
13          dev->requested_socket_id = sid < 0 ? SOCKET0 : sid;
14          dev->devargs = xstrdup(new_devargs);
15          dev->port_id = new_port_id;
16          netdev_request_reconfigure(&dev->up);
17          netdev_dpdk_clear_xstats(dev);
18      }
19 }

Line 1: Before representors were introduced the check for a duplicated device was implicitly done inside netdev_dpdk_process_devarags which returned an error (reason: rte_dev_probe() was called twice on the same devargs and failed).
In such a case err is set to EINVAL in lines 2-3 and we do not reach lines 4-19. With the introduction of representors netdev_dpdk_process_devarags() may return the same 'new_port_id' if called again with the same devargs, or may fail as before for PMDs which do not support representors.
In case netdev_dpdk_process_devarags() is successful for the same devargs we end with no errors (err = 0) in lines 4-6 .... hence we ignore the dup dev case which results in accepting the same device for 2 different ports .... which ends in seg fault.
In order to prevent a change in flow logic (dup dev should fail in netdev_dpdk_process_devarags() as always has been) I moved the check of dup_dev inside the call of netdev_dpdk_process_devarags(). 
The check for dup dev in line 8 is redundant (probably has always been so).

> >
> >>
> >>> +        response = xasprintf("Device '%s' is being shared with other "
> >>> +                             "interfaces. Remove them before detaching.",
> >>> +                             argv[1]);
> >>> +        goto error;
> >>> +    }
> >>>
> >>> -    rte_eth_dev_info_get(port_id, &dev_info);
> >>> -    if (!dev_info.device || rte_dev_remove(dev_info.device)) {
> >>> -        response = xasprintf("Device '%s' can not be detached", argv[1]);
> >>> +    rte_eth_dev_close(port_id);
> >>> +    if (rte_dev_remove(rte_dev) < 0) {
> >>> +        response = xasprintf("Device '%s' can not be removed",
> >>> + argv[1]);
> >>
> >> /removed/detached/
> >> You're using different words for equal situations. This is misleading.
> >>
> >
> > I can change the wording to "detached'. However I think it is misleading in a
> different way:
> > Assuming there are two representors using the same PCI bus - then
> > "detaching" one representor will only close the eth device, but will not
> detach the PCI device until the second representor is closed as well.
> > When all eth devices are closed - please note the message below:
> >    response = xasprintf("All devices shared with device '%s' "
> >                          "have been detached", argv[1]);
> >
> > Having said that - do you still suggest replacing "removed" with
> "detached"?
> 
> I'm suggesting to separate two cases:
> 
>  1. rte_eth_dev_close() executed.
>  2. rte_dev_remove() executed.
> 
> The first one we could call "Port '%s' closed", the second "All devices shared
> with device '%s' detached". And as we have two places where this could
> happen, in both places we should have equal messages for the equal events.
> You may vary the wording, but, IMHO, it should be clear from the log which
> of above events occurred. And we should log every our action (1 and 2) for
> this purpose.
> Otherwise it'll be hard to tell what devices are still attached to OVS.
> 
> >

In function netdev_dpdk_detach() you have one of 2 cases:

1. If you try to detach a device whose PCI is shared with other devides you return an error
response = xasprintf("Device '%s' is being shared with other "
                             "interfaces. Remove them before detaching.",
                             argv[1]);

2. If you detach the last device using the PCI bus - you can really detach. In this case you execute both 
rte_eth_dev_close() and rte_dev_remove() at once and there is no separation between the two.

In this case you  inform.
    response = xasprintf("All devices shared with device '%s' "
                          "have been detached", argv[1]);

It seems IMHO the correct way of informing.
Ilya Maximets Jan. 15, 2019, 9:15 a.m. UTC | #10
On 15.01.2019 11:39, Ophir Munk wrote:
> Hi Ilya,
> Please find comments inline
> 
>> -----Original Message-----
>> From: Ilya Maximets <i.maximets@samsung.com>
>> Sent: Monday, January 14, 2019 12:11 PM
>> To: Ophir Munk <ophirmu@mellanox.com>; ovs-dev@openvswitch.org
>> Cc: Asaf Penso <asafp@mellanox.com>; Ian Stokes <ian.stokes@intel.com>;
>> Shahaf Shuler <shahafs@mellanox.com>; Thomas Monjalon
>> <thomas@monjalon.net>; Olga Shern <olgas@mellanox.com>; Kevin
>> Traynor <ktraynor@redhat.com>
>> Subject: Re: [PATCH v4] netdev-dpdk: support port representors
>>
>> On 13.01.2019 18:13, Ophir Munk wrote:
>>> Hi Ilya,
>>> Thank you for your review.
>>> Please find comments inline.
>>>
>>>> -----Original Message-----
>>>> From: Ilya Maximets <i.maximets@samsung.com>
>>>> Sent: Thursday, January 10, 2019 1:32 PM
>>>> To: Ophir Munk <ophirmu@mellanox.com>; ovs-dev@openvswitch.org
>>>> Cc: Asaf Penso <asafp@mellanox.com>; Ian Stokes
>>>> <ian.stokes@intel.com>; Shahaf Shuler <shahafs@mellanox.com>;
>> Thomas
>>>> Monjalon <thomas@monjalon.net>; Olga Shern <olgas@mellanox.com>;
>>>> Kevin Traynor <ktraynor@redhat.com>
>>>> Subject: Re: [PATCH v4] netdev-dpdk: support port representors
>>>>
>>>> On 08.01.2019 12:31, Ophir Munk wrote:
>>>>> Dpdk port representors were introduced in dpdk versions 18.xx.
>>>>> Prior to port representors there was a one-to-one relationship
>>>>> between an rte device (e.g. PCI bus) and an eth device (referenced
>>>>> as dpdk port id in OVS). With port representors the relationship
>>>>> becomes one-to-many rte device to eth devices.
>>>>> For example in [3] there are two devices (representors) using the
>>>>> same PCI physical address 0000:08:00.0:
>>>>> "0000:08:00.0,representor=[3]" and "0000:08:00.0,representor=[5]".
>>>>> This commit handles the new one-to-many relationship. For example,
>>>>> when one of the device port representors in [3] is closed - the PCI
>>>>> bus cannot be detached until the other device port representor is
>>>>> closed as well. OVS remains backward compatible by supporting dpdk
>>>>> legacy PCI ports which do not include port representors.
>>>>> Dpdk port representors related commits are listed in [1]. Dpdk port
>>>>> representors documentation appears in [2]. A sample configuration
>>>>> which uses two representors ports (the output of "ovs-vsctl show"
>>>>> command) is shown in [3].
>>>>> +
>>>>>  static int
>>>>>  vhost_common_construct(struct netdev *netdev)
>>>>>      OVS_REQUIRES(dpdk_mutex)
>>>>> @@ -1353,20 +1373,25 @@ static void
>>>>>  netdev_dpdk_destruct(struct netdev *netdev)  {
>>>>>      struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>>>>> -    struct rte_eth_dev_info dev_info;
>>>>> +    struct rte_device *rte_dev;
>>>>>
>>>>>      ovs_mutex_lock(&dpdk_mutex);
>>>>>
>>>>>      rte_eth_dev_stop(dev->port_id);
>>>>>      dev->started = false;
>>>>> -
>>>>
>>>> Please, keep the empty line.
>>>>
>>>
>>> I will update in v5.
>>>
>>>>>      if (dev->attached) {
>>>>> +        /* Remove the port eth device. */
>>>>>          rte_eth_dev_close(dev->port_id);
>>>>> -        rte_eth_dev_info_get(dev->port_id, &dev_info);
>>>>> -        if (dev_info.device && !rte_dev_remove(dev_info.device)) {
>>>>> -            VLOG_INFO("Device '%s' has been detached", dev->devargs);
>>>>
>>>> Please keep this log message for the successful case.
>>>
>>> Please note the line below:
>>>    VLOG_INFO("Device '%s' has been removed", dev->devargs);
>>>
>>> This is the log message for the successful case. With the introduction of
>> representors several eth devices can use one PCI device.
>>> If you remove the first representor - its eth device is closed - but you do
>> not detach the PCI device till the second representor eth device is closed as
>> well.
>>> Would you suggest replacing "removed" with "closed" or "detached"?
>>
>> I understand what you're trying to do here. What I'm trying to say is that in
>> current version, message "has been removed" states that device was closed,
>> but we do not know if we tried to detach it.
>> We now have two levels of getting port out of OVS. First is "removing"
>> (rte_eth_dev_close) and the second is "detaching" (rte_dev_remove, kind
>> of confusing).
>> So, in current code we do not know if we detached the port or we skipped
>> detaching because of existence of other ports sharing the same device.
>>
> 
> In v5 I will inform if a device was removed (e.g. closed) or detached.
> 
>>>>
>>>> I'm not sure if 'netdev_dpdk_get_port_by_devargs' is the correct name
>>>> because it not only finds (gets) the port, but probes (attaches) it.
>>>>
>>>
>>> I think that the function main purpose is to find the port. Probing is optional
>> for that purpose.
>>> I am good with the current name but I am open to any new name you
>> would like to suggest.
>>> Can you please suggest?
>>
>> IMHO, the main purpose of this function is to probe the port because
>> searching will succeed only if we're trying to add duplicated port.
>> 'netdev_dpdk_probe_port_by_devargs' should be better, IMHO.
>>
> 
> In v5 I will update the function name per your suggestion.
> 
>>>
>>>>> +{
>>>>> +    struct rte_dev_iterator iterator;
>>>>> +    dpdk_port_t port_id;
>>>>> +
>>>>> +    RTE_ETH_FOREACH_MATCHING_DEV(port_id, devargs, &iterator) {
>>>>
>>>
>>>> Invalid spacing around the FOREACH loop.
>>>> Same in all the other places.
>>>
>>> I do not understand this comment. Can you please explain or demo the fix?
>>
>> You just need an extra space before the '(' like this:
>> 	'RTE_ETH_FOREACH_MATCHING_DEV ('
>>
> 
> I will update in v5
> 
>> I'll send a patch for checkpatch script to catch this. Right now it supports only
>> 'FOR_EACH' loops, not the 'FOREACH'.
>>
>>>
>>>>>      dpdk_port_t new_port_id = DPDK_ETH_PORT_ID_INVALID;
>>>>>
>>>>>      if (strncmp(devargs, "class=eth,mac=", 14) == 0) {
>>>>>          new_port_id = netdev_dpdk_get_port_by_mac(&devargs[14]);
>>>>>      } else {
>>>>> -        name = xmemdup0(devargs, strcspn(devargs, ","));
>>>>> -        if (rte_eth_dev_get_port_by_name(name, &new_port_id)
>>>>> -                || !rte_eth_dev_is_valid_port(new_port_id)) {
>>>>> -            /* Device not found in DPDK, attempt to attach it */
>>>>> -            if (!rte_dev_probe(devargs)
>>>>> -                && !rte_eth_dev_get_port_by_name(name, &new_port_id)) {
>>>>> -                /* Attach successful */
>>>>> -                dev->attached = true;
>>>>> -                VLOG_INFO("Device '%s' attached to DPDK", devargs);
>>>>> -            } else {
>>>>> -                /* Attach unsuccessful */
>>>>> +        new_port_id = netdev_dpdk_get_port_by_devargs(devargs);
>>>>> +        if (!rte_eth_dev_is_valid_port(new_port_id)) {
>>>>> +            new_port_id = DPDK_ETH_PORT_ID_INVALID;
>>>>> +        } else {
>>>>> +            struct netdev_dpdk *dup_dev;
>>>>> +
>>>>> +            dup_dev = netdev_dpdk_lookup_by_port_id(new_port_id);
>>>>> +            if (dup_dev) {
>>>>> +                VLOG_WARN_BUF(errp, "'%s' is trying to use device '%s' "
>>>>> +                               "which is already in use by '%s'",
>>>>> +                          netdev_get_name(&dev->up), devargs,
>>>>> +                          netdev_get_name(&dup_dev->up));
>>>>
>>>> Why you duplicating this part of code ?
>>>> Can we just replace 'rte_eth_dev_get_port_by_name' with
>>>> 'netdev_dpdk_get_port_by_devargs' and keep all the other logic ?
>>>>
>>>> netdev_dpdk_get_port_by_devargs should look like this in this case:
>>>>
>>>> /* Return the first DPDK port_id matching the devargs pattern. */
>>>> static bool netdev_dpdk_get_port_by_devargs(const char *devargs,
>>>> dpdk_port_t
>>>> *port_id)
>>>>     OVS_REQUIRES(dpdk_mutex)
>>>> {
>>>>     struct rte_dev_iterator iterator;
>>>>
>>>>     RTE_ETH_FOREACH_MATCHING_DEV (*port_id, devargs, &iterator) {
>>>>         /* If a break is done - must call rte_eth_iterator_cleanup. */
>>>>         rte_eth_iterator_cleanup(&iterator);
>>>>         break;
>>>>     }
>>>>     return *port_id != DPDK_ETH_PORT_ID_INVALID; }
>>>>
>>>> netdev_dpdk_process_devargs:
>>>>
>>>>         if (netdev_dpdk_get_port_by_devargs(devagrs, &new_port_id)
>>>>                 || !rte_eth_dev_is_valid_port(new_port_id)) {
>>>>             /* Device not found in DPDK, attempt to attach it */
>>>>             if (!rte_dev_probe(devargs)
>>>>                 && !netdev_dpdk_get_port_by_devargs(devargs,
>> &new_port_id)) {
>>>>                 /* Attach successful */
>>>>                 dev->attached = true;
>>>>                 VLOG_INFO("Device '%s' attached to DPDK", devargs);
>>>>             } else {
>>>>                 /* Attach unsuccessful */
>>>>                 new_port_id = DPDK_ETH_PORT_ID_INVALID;
>>>>             }
>>>>         }
>>>>
>>>> If it works, I'd prefer this variant.
>>>>
>>>
>>> I don't think it works in case you add two different ports with the same
>> representor devargs (or the same PCI address with a PMD supporting
>> representors).
>>> The extra code is required to avoid this misconfiguration (using the same
>> device for two different ports).
>>> For example, the following configuration must fail:
>>>
>>> ovs-vsctl --no-wait add-port br1 port-one -- set Interface port-one
>>> type=dpdk options:dpdk-devargs=0000:08:00.0,representor=[1]
>>> ovs-vsctl --no-wait add-port br1 port-two -- set Interface port-two
>>> type=dpdk options:dpdk-devargs=0000:08:00.0,representor=[1]
>>>
>>> In such misconfiguration please note the waring you get:
>>>                 VLOG_WARN_BUF(errp, "'%s' is trying to use device '%s' "
>>>                                "which is already in use by '%s'",
>>>                           netdev_get_name(&dev->up), devargs,
>>>                           netdev_get_name(&dup_dev->up));
>>>
>>> An interesting question is why didn't we need this extra code in previous
>> OVS versions.
>>> The reason is that if we called rte_dev_probe() a second time for the same
>> PCI device - we got an error. So we were immune against this
>> misconfiguration.
>>> This was the behavior before the introduction of representors. With
>> representor - calling rte_dev_probe() twice for the same devargs will return
>> successfully, hence the need for an additional check (extra code).
>>
>> But we have the same check on the upper level. If you will not check here,
>> the new_port_id will go to netdev_dpdk_set_config(), where we have equal
>> check for duplicated ports. Why this doesn't work for you?
>>
> 
> I will explain why this does not work. Please note the following code snippet from 
> netdev_dpdk_set_config():
> 
> 1    dpdk_port_t new_port_id = netdev_dpdk_process_devargs(dev, new_devargs, errp)
> 2   if (!rte_eth_dev_is_valid_port(new_port_id)) {
> 3       err = EINVAL;
> 4   } else if (new_port_id == dev->port_id) {
> 5       /* Already configured, do not reconfigure again */
> 6       err = 0;
> 7   } else {
> 8       err = netdev_dpdk_check_dup_dev(new_port_id, netdev,
> 9                 new_devargs, errp);
> 10       if (!err) {
> 11          int sid = rte_eth_dev_socket_id(new_port_id);
> 12
> 13          dev->requested_socket_id = sid < 0 ? SOCKET0 : sid;
> 14          dev->devargs = xstrdup(new_devargs);
> 15          dev->port_id = new_port_id;
> 16          netdev_request_reconfigure(&dev->up);
> 17          netdev_dpdk_clear_xstats(dev);
> 18      }
> 19 }
> 
> Line 1: Before representors were introduced the check for a duplicated device was implicitly done inside netdev_dpdk_process_devarags which returned an error (reason: rte_dev_probe() was called twice on the same devargs and failed).
> In such a case err is set to EINVAL in lines 2-3 and we do not reach lines 4-19. With the introduction of representors netdev_dpdk_process_devarags() may return the same 'new_port_id' if called again with the same devargs, or may fail as before for PMDs which do not support representors.
> In case netdev_dpdk_process_devarags() is successful for the same devargs we end with no errors (err = 0) in lines 4-6 .... hence we ignore the dup dev case which results in accepting the same device for 2 different ports .... which ends in seg fault.

But 'dev->port_id' should not be initialized yet at this point.
And the (new_port_id == dev->port_id) should not be true and we
should execute code in 'else' branch (8-18).

> In order to prevent a change in flow logic (dup dev should fail in netdev_dpdk_process_devarags() as always has been) I moved the check of dup_dev inside the call of netdev_dpdk_process_devarags(). 
> The check for dup dev in line 8 is redundant (probably has always been so).
> 
>>>
>>>>
>>>>> +        response = xasprintf("Device '%s' is being shared with other "
>>>>> +                             "interfaces. Remove them before detaching.",
>>>>> +                             argv[1]);
>>>>> +        goto error;
>>>>> +    }
>>>>>
>>>>> -    rte_eth_dev_info_get(port_id, &dev_info);
>>>>> -    if (!dev_info.device || rte_dev_remove(dev_info.device)) {
>>>>> -        response = xasprintf("Device '%s' can not be detached", argv[1]);
>>>>> +    rte_eth_dev_close(port_id);
>>>>> +    if (rte_dev_remove(rte_dev) < 0) {
>>>>> +        response = xasprintf("Device '%s' can not be removed",
>>>>> + argv[1]);
>>>>
>>>> /removed/detached/
>>>> You're using different words for equal situations. This is misleading.
>>>>
>>>
>>> I can change the wording to "detached'. However I think it is misleading in a
>> different way:
>>> Assuming there are two representors using the same PCI bus - then
>>> "detaching" one representor will only close the eth device, but will not
>> detach the PCI device until the second representor is closed as well.
>>> When all eth devices are closed - please note the message below:
>>>    response = xasprintf("All devices shared with device '%s' "
>>>                          "have been detached", argv[1]);
>>>
>>> Having said that - do you still suggest replacing "removed" with
>> "detached"?
>>
>> I'm suggesting to separate two cases:
>>
>>  1. rte_eth_dev_close() executed.
>>  2. rte_dev_remove() executed.
>>
>> The first one we could call "Port '%s' closed", the second "All devices shared
>> with device '%s' detached". And as we have two places where this could
>> happen, in both places we should have equal messages for the equal events.
>> You may vary the wording, but, IMHO, it should be clear from the log which
>> of above events occurred. And we should log every our action (1 and 2) for
>> this purpose.
>> Otherwise it'll be hard to tell what devices are still attached to OVS.
>>
>>>
> 
> In function netdev_dpdk_detach() you have one of 2 cases:
> 
> 1. If you try to detach a device whose PCI is shared with other devides you return an error
> response = xasprintf("Device '%s' is being shared with other "
>                              "interfaces. Remove them before detaching.",
>                              argv[1]);
> 
> 2. If you detach the last device using the PCI bus - you can really detach. In this case you execute both 
> rte_eth_dev_close() and rte_dev_remove() at once and there is no separation between the two.
> 
> In this case you  inform.
>     response = xasprintf("All devices shared with device '%s' "
>                           "have been detached", argv[1]);
> 
> It seems IMHO the correct way of informing.
> 
>
Ophir Munk Jan. 16, 2019, 5:36 p.m. UTC | #11
Hi Ilya and thanks for your comments.
Please see inline

> -----Original Message-----
> From: Ilya Maximets <i.maximets@samsung.com>
> Sent: Tuesday, January 15, 2019 11:16 AM
> To: Ophir Munk <ophirmu@mellanox.com>; ovs-dev@openvswitch.org
> Cc: Asaf Penso <asafp@mellanox.com>; Ian Stokes <ian.stokes@intel.com>;
> Shahaf Shuler <shahafs@mellanox.com>; Thomas Monjalon
> <thomas@monjalon.net>; Olga Shern <olgas@mellanox.com>; Kevin Traynor
> <ktraynor@redhat.com>
> Subject: Re: [PATCH v4] netdev-dpdk: support port representors
> 
> >>
> >> But we have the same check on the upper level. If you will not check
> >> here, the new_port_id will go to netdev_dpdk_set_config(), where we
> >> have equal check for duplicated ports. Why this doesn't work for you?
> >>
> >
> > I will explain why this does not work. Please note the following code
> > snippet from
> > netdev_dpdk_set_config():
> >
> > 1    dpdk_port_t new_port_id = netdev_dpdk_process_devargs(dev,
> new_devargs, errp)
> > 2   if (!rte_eth_dev_is_valid_port(new_port_id)) {
> > 3       err = EINVAL;
> > 4   } else if (new_port_id == dev->port_id) {
> > 5       /* Already configured, do not reconfigure again */
> > 6       err = 0;
> > 7   } else {
> > 8       err = netdev_dpdk_check_dup_dev(new_port_id, netdev,
> > 9                 new_devargs, errp);
> > 10       if (!err) {
> > 11          int sid = rte_eth_dev_socket_id(new_port_id);
> > 12
> > 13          dev->requested_socket_id = sid < 0 ? SOCKET0 : sid;
> > 14          dev->devargs = xstrdup(new_devargs);
> > 15          dev->port_id = new_port_id;
> > 16          netdev_request_reconfigure(&dev->up);
> > 17          netdev_dpdk_clear_xstats(dev);
> > 18      }
> > 19 }
> >
> > Line 1: Before representors were introduced the check for a duplicated
> device was implicitly done inside netdev_dpdk_process_devarags which
> returned an error (reason: rte_dev_probe() was called twice on the same
> devargs and failed).
> > In such a case err is set to EINVAL in lines 2-3 and we do not reach lines 4-19.
> With the introduction of representors netdev_dpdk_process_devarags() may
> return the same 'new_port_id' if called again with the same devargs, or may
> fail as before for PMDs which do not support representors.
> > In case netdev_dpdk_process_devarags() is successful for the same devargs
> we end with no errors (err = 0) in lines 4-6 .... hence we ignore the dup dev
> case which results in accepting the same device for 2 different ports .... which
> ends in seg fault.
> 
> But 'dev->port_id' should not be initialized yet at this point.
> And the (new_port_id == dev->port_id) should not be true and we should
> execute code in 'else' branch (8-18).
> 

You are right. We can skip checking dup_dev inside netdev_dpdk_process_devarags() and leave it only in netdev_dpdk_set_config(). 
There is still a new changes needed: do not assign dev->attached=true inside netdev_dpdk_process_devarags(). Delay this assignment only after a successful call to check_dup_dev.

I will add the mentioned changes in v6

Regards,
Ophir
Ilya Maximets Jan. 16, 2019, 5:49 p.m. UTC | #12
On 16.01.2019 20:36, Ophir Munk wrote:
> Hi Ilya and thanks for your comments.
> Please see inline
> 
>> -----Original Message-----
>> From: Ilya Maximets <i.maximets@samsung.com>
>> Sent: Tuesday, January 15, 2019 11:16 AM
>> To: Ophir Munk <ophirmu@mellanox.com>; ovs-dev@openvswitch.org
>> Cc: Asaf Penso <asafp@mellanox.com>; Ian Stokes <ian.stokes@intel.com>;
>> Shahaf Shuler <shahafs@mellanox.com>; Thomas Monjalon
>> <thomas@monjalon.net>; Olga Shern <olgas@mellanox.com>; Kevin Traynor
>> <ktraynor@redhat.com>
>> Subject: Re: [PATCH v4] netdev-dpdk: support port representors
>>
>>>>
>>>> But we have the same check on the upper level. If you will not check
>>>> here, the new_port_id will go to netdev_dpdk_set_config(), where we
>>>> have equal check for duplicated ports. Why this doesn't work for you?
>>>>
>>>
>>> I will explain why this does not work. Please note the following code
>>> snippet from
>>> netdev_dpdk_set_config():
>>>
>>> 1    dpdk_port_t new_port_id = netdev_dpdk_process_devargs(dev,
>> new_devargs, errp)
>>> 2   if (!rte_eth_dev_is_valid_port(new_port_id)) {
>>> 3       err = EINVAL;
>>> 4   } else if (new_port_id == dev->port_id) {
>>> 5       /* Already configured, do not reconfigure again */
>>> 6       err = 0;
>>> 7   } else {
>>> 8       err = netdev_dpdk_check_dup_dev(new_port_id, netdev,
>>> 9                 new_devargs, errp);
>>> 10       if (!err) {
>>> 11          int sid = rte_eth_dev_socket_id(new_port_id);
>>> 12
>>> 13          dev->requested_socket_id = sid < 0 ? SOCKET0 : sid;
>>> 14          dev->devargs = xstrdup(new_devargs);
>>> 15          dev->port_id = new_port_id;
>>> 16          netdev_request_reconfigure(&dev->up);
>>> 17          netdev_dpdk_clear_xstats(dev);
>>> 18      }
>>> 19 }
>>>
>>> Line 1: Before representors were introduced the check for a duplicated
>> device was implicitly done inside netdev_dpdk_process_devarags which
>> returned an error (reason: rte_dev_probe() was called twice on the same
>> devargs and failed).
>>> In such a case err is set to EINVAL in lines 2-3 and we do not reach lines 4-19.
>> With the introduction of representors netdev_dpdk_process_devarags() may
>> return the same 'new_port_id' if called again with the same devargs, or may
>> fail as before for PMDs which do not support representors.
>>> In case netdev_dpdk_process_devarags() is successful for the same devargs
>> we end with no errors (err = 0) in lines 4-6 .... hence we ignore the dup dev
>> case which results in accepting the same device for 2 different ports .... which
>> ends in seg fault.
>>
>> But 'dev->port_id' should not be initialized yet at this point.
>> And the (new_port_id == dev->port_id) should not be true and we should
>> execute code in 'else' branch (8-18).
>>
> 
> You are right. We can skip checking dup_dev inside netdev_dpdk_process_devarags() and leave it only in netdev_dpdk_set_config(). 
> There is still a new changes needed: do not assign dev->attached=true inside netdev_dpdk_process_devarags(). Delay this assignment only after a successful call to check_dup_dev.

You should not move assignment to the upper layer because it should be
set to 'true' only if device was probed by the 'rte_dev_probe'.
In case the device was already probed before (by the whitelist) the
value of 'dev->attached' must be 'false'.
You need to move the probing logic out of 'netdev_dpdk_probe_port_by_devargs'
instead. And make it 'netdev_dpdk_get_port_by_devargs'.

It looks like the logic around 'attached' variable is messed up with this
patch set. We need to recheck it carefully.

> 
> I will add the mentioned changes in v6
> 
> Regards,
> Ophir
>
Ophir Munk Jan. 17, 2019, 1:45 p.m. UTC | #13
> -----Original Message-----
> From: Ilya Maximets <i.maximets@samsung.com>
> Sent: Wednesday, January 16, 2019 7:50 PM
> To: Ophir Munk <ophirmu@mellanox.com>; ovs-dev@openvswitch.org
> Cc: Asaf Penso <asafp@mellanox.com>; Ian Stokes <ian.stokes@intel.com>;
> Shahaf Shuler <shahafs@mellanox.com>; Thomas Monjalon
> <thomas@monjalon.net>; Olga Shern <olgas@mellanox.com>; Kevin Traynor
> <ktraynor@redhat.com>
> Subject: Re: [PATCH v4] netdev-dpdk: support port representors
> 
> On 16.01.2019 20:36, Ophir Munk wrote:
> > Hi Ilya and thanks for your comments.
> > Please see inline
> >
> >> -----Original Message-----
> >> From: Ilya Maximets <i.maximets@samsung.com>
> >> Sent: Tuesday, January 15, 2019 11:16 AM
> >> To: Ophir Munk <ophirmu@mellanox.com>; ovs-dev@openvswitch.org
> >> Cc: Asaf Penso <asafp@mellanox.com>; Ian Stokes
> >> <ian.stokes@intel.com>; Shahaf Shuler <shahafs@mellanox.com>; Thomas
> >> Monjalon <thomas@monjalon.net>; Olga Shern <olgas@mellanox.com>;
> >> Kevin Traynor <ktraynor@redhat.com>
> >> Subject: Re: [PATCH v4] netdev-dpdk: support port representors
> >>
> >>>>
> >>>> But we have the same check on the upper level. If you will not
> >>>> check here, the new_port_id will go to netdev_dpdk_set_config(),
> >>>> where we have equal check for duplicated ports. Why this doesn't work
> for you?
> >>>>
> >>>
> >>> I will explain why this does not work. Please note the following
> >>> code snippet from
> >>> netdev_dpdk_set_config():
> >>>
> >>> 1    dpdk_port_t new_port_id = netdev_dpdk_process_devargs(dev,
> >> new_devargs, errp)
> >>> 2   if (!rte_eth_dev_is_valid_port(new_port_id)) {
> >>> 3       err = EINVAL;
> >>> 4   } else if (new_port_id == dev->port_id) {
> >>> 5       /* Already configured, do not reconfigure again */
> >>> 6       err = 0;
> >>> 7   } else {
> >>> 8       err = netdev_dpdk_check_dup_dev(new_port_id, netdev,
> >>> 9                 new_devargs, errp);
> >>> 10       if (!err) {
> >>> 11          int sid = rte_eth_dev_socket_id(new_port_id);
> >>> 12
> >>> 13          dev->requested_socket_id = sid < 0 ? SOCKET0 : sid;
> >>> 14          dev->devargs = xstrdup(new_devargs);
> >>> 15          dev->port_id = new_port_id;
> >>> 16          netdev_request_reconfigure(&dev->up);
> >>> 17          netdev_dpdk_clear_xstats(dev);
> >>> 18      }
> >>> 19 }
> >>>
> >>> Line 1: Before representors were introduced the check for a
> >>> duplicated
> >> device was implicitly done inside netdev_dpdk_process_devarags which
> >> returned an error (reason: rte_dev_probe() was called twice on the
> >> same devargs and failed).
> >>> In such a case err is set to EINVAL in lines 2-3 and we do not reach lines 4-
> 19.
> >> With the introduction of representors netdev_dpdk_process_devarags()
> >> may return the same 'new_port_id' if called again with the same
> >> devargs, or may fail as before for PMDs which do not support representors.
> >>> In case netdev_dpdk_process_devarags() is successful for the same
> >>> devargs
> >> we end with no errors (err = 0) in lines 4-6 .... hence we ignore the
> >> dup dev case which results in accepting the same device for 2
> >> different ports .... which ends in seg fault.
> >>
> >> But 'dev->port_id' should not be initialized yet at this point.
> >> And the (new_port_id == dev->port_id) should not be true and we
> >> should execute code in 'else' branch (8-18).
> >>
> >
> > You are right. We can skip checking dup_dev inside
> netdev_dpdk_process_devarags() and leave it only in
> netdev_dpdk_set_config().
> > There is still a new changes needed: do not assign dev->attached=true inside
> netdev_dpdk_process_devarags(). Delay this assignment only after a successful
> call to check_dup_dev.
> 
> You should not move assignment to the upper layer because it should be set to
> 'true' only if device was probed by the 'rte_dev_probe'.
> In case the device was already probed before (by the whitelist) the value of
> 'dev->attached' must be 'false'.
> You need to move the probing logic out of
> 'netdev_dpdk_probe_port_by_devargs'
> instead. And make it 'netdev_dpdk_get_port_by_devargs'.
> 
> It looks like the logic around 'attached' variable is messed up with this patch
> set. We need to recheck it carefully.
> 

Thanks for highlighting it. In v6 the 'attached' variable is assigned as required.

> >
> > I will add the mentioned changes in v6
> >
> > Regards,
> > Ophir
> >
diff mbox series

Patch

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 320422b..b6e903f 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -1218,6 +1218,26 @@  dpdk_dev_parse_name(const char dev_name[], const char prefix[],
     }
 }
 
+/* Get the number of OVS interfaces which have the same DPDK
+ * rte device (e.g. same pci bus address).
+ * FIXME: avoid direct access to DPDK internal array rte_eth_devices.
+ */
+static int
+netdev_dpdk_get_num_ports(struct rte_device *device)
+    OVS_REQUIRES(dpdk_mutex)
+{
+    struct netdev_dpdk *dev;
+    int count = 0;
+
+    LIST_FOR_EACH (dev, list_node, &dpdk_list) {
+        if (rte_eth_devices[dev->port_id].device == device
+        && rte_eth_devices[dev->port_id].state != RTE_ETH_DEV_UNUSED) {
+            count++;
+        }
+    }
+    return count;
+}
+
 static int
 vhost_common_construct(struct netdev *netdev)
     OVS_REQUIRES(dpdk_mutex)
@@ -1353,20 +1373,25 @@  static void
 netdev_dpdk_destruct(struct netdev *netdev)
 {
     struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
-    struct rte_eth_dev_info dev_info;
+    struct rte_device *rte_dev;
 
     ovs_mutex_lock(&dpdk_mutex);
 
     rte_eth_dev_stop(dev->port_id);
     dev->started = false;
-
     if (dev->attached) {
+        /* Remove the port eth device. */
         rte_eth_dev_close(dev->port_id);
-        rte_eth_dev_info_get(dev->port_id, &dev_info);
-        if (dev_info.device && !rte_dev_remove(dev_info.device)) {
-            VLOG_INFO("Device '%s' has been detached", dev->devargs);
-        } else {
-            VLOG_ERR("Device '%s' can not be detached", dev->devargs);
+        VLOG_INFO("Device '%s' has been removed", dev->devargs);
+        /* If this is the last port_id using this rte device
+         * remove this rte device and all its eth devices.
+         * FIXME: avoid direct access to DPDK internal array rte_eth_devices.
+         */
+        rte_dev = rte_eth_devices[dev->port_id].device;
+        if (netdev_dpdk_get_num_ports(rte_dev) == 1) {
+            if (rte_dev_remove(rte_dev) < 0) {
+                VLOG_ERR("Device '%s' can not be detached", dev->devargs);
+            }
         }
     }
 
@@ -1632,8 +1657,37 @@  netdev_dpdk_get_port_by_mac(const char *mac_str)
     return DPDK_ETH_PORT_ID_INVALID;
 }
 
+/* Return the first DPDK port_id matching the devargs pattern. */
+static dpdk_port_t
+netdev_dpdk_get_port_by_devargs(const char *devargs)
+{
+    struct rte_dev_iterator iterator;
+    dpdk_port_t port_id;
+
+    RTE_ETH_FOREACH_MATCHING_DEV(port_id, devargs, &iterator) {
+        /* If a break is done - must call rte_eth_iterator_cleanup. */
+        rte_eth_iterator_cleanup(&iterator);
+        break;
+    }
+    /* The port may be invalid if it was not probed */
+    if (port_id == DPDK_ETH_PORT_ID_INVALID) {
+        if (rte_dev_probe(devargs)) {
+            port_id = DPDK_ETH_PORT_ID_INVALID;
+        } else {
+            RTE_ETH_FOREACH_MATCHING_DEV(port_id, devargs, &iterator) {
+                /* If a break is done - must call rte_eth_iterator_cleanup. */
+                rte_eth_iterator_cleanup(&iterator);
+                break;
+            }
+        }
+    }
+
+    return port_id;
+}
+
 /*
- * Normally, a PCI id is enough for identifying a specific DPDK port.
+ * Normally, a PCI id (optionally followed by a representor number)
+ * is enough for identifying a specific DPDK port.
  * However, for some NICs having multiple ports sharing the same PCI
  * id, using PCI id won't work then.
  *
@@ -1641,34 +1695,42 @@  netdev_dpdk_get_port_by_mac(const char *mac_str)
  *
  * Note that the compatibility is fully kept: user can still use the
  * PCI id for adding ports (when it's enough for them).
+ *
+ * In order to avoid clang errors add OVS_REQUIRES(dpdk_mutex) to this function
+ * since it is required by netdev_dpdk_lookup_by_port_id() while the dpdk_mutex
+ * is taken outside of this function.
  */
 static dpdk_port_t
 netdev_dpdk_process_devargs(struct netdev_dpdk *dev,
                             const char *devargs, char **errp)
+    OVS_REQUIRES(dpdk_mutex)
 {
-    char *name;
     dpdk_port_t new_port_id = DPDK_ETH_PORT_ID_INVALID;
 
     if (strncmp(devargs, "class=eth,mac=", 14) == 0) {
         new_port_id = netdev_dpdk_get_port_by_mac(&devargs[14]);
     } else {
-        name = xmemdup0(devargs, strcspn(devargs, ","));
-        if (rte_eth_dev_get_port_by_name(name, &new_port_id)
-                || !rte_eth_dev_is_valid_port(new_port_id)) {
-            /* Device not found in DPDK, attempt to attach it */
-            if (!rte_dev_probe(devargs)
-                && !rte_eth_dev_get_port_by_name(name, &new_port_id)) {
-                /* Attach successful */
-                dev->attached = true;
-                VLOG_INFO("Device '%s' attached to DPDK", devargs);
-            } else {
-                /* Attach unsuccessful */
+        new_port_id = netdev_dpdk_get_port_by_devargs(devargs);
+        if (!rte_eth_dev_is_valid_port(new_port_id)) {
+            new_port_id = DPDK_ETH_PORT_ID_INVALID;
+        } else {
+            struct netdev_dpdk *dup_dev;
+
+            dup_dev = netdev_dpdk_lookup_by_port_id(new_port_id);
+            if (dup_dev) {
+                VLOG_WARN_BUF(errp, "'%s' is trying to use device '%s' "
+                               "which is already in use by '%s'",
+                          netdev_get_name(&dev->up), devargs,
+                          netdev_get_name(&dup_dev->up));
                 new_port_id = DPDK_ETH_PORT_ID_INVALID;
+            } else {
+                /* Device successfully found. */
+                dev->attached = true;
+                VLOG_INFO("Device '%s' attached to DPDK port %d",
+                          devargs, new_port_id);
             }
         }
-        free(name);
     }
-
     if (new_port_id == DPDK_ETH_PORT_ID_INVALID) {
         VLOG_WARN_BUF(errp, "Error attaching device '%s' to DPDK", devargs);
     }
@@ -1761,7 +1823,7 @@  netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args,
                 dup_dev = netdev_dpdk_lookup_by_port_id(new_port_id);
                 if (dup_dev) {
                     VLOG_WARN_BUF(errp, "'%s' is trying to use device '%s' "
-                                        "which is already in use by '%s'",
+                                  "which is already in use by '%s'",
                                   netdev_get_name(netdev), new_devargs,
                                   netdev_get_name(&dup_dev->up));
                     err = EADDRINUSE;
@@ -3236,11 +3298,17 @@  netdev_dpdk_detach(struct unixctl_conn *conn, int argc OVS_UNUSED,
     char *response;
     dpdk_port_t port_id;
     struct netdev_dpdk *dev;
-    struct rte_eth_dev_info dev_info;
+    struct rte_device *rte_dev;
+    struct rte_dev_iterator iterator;
 
     ovs_mutex_lock(&dpdk_mutex);
 
-    if (rte_eth_dev_get_port_by_name(argv[1], &port_id)) {
+    RTE_ETH_FOREACH_MATCHING_DEV(port_id, argv[1], &iterator) {
+        /* If a break is done - must call rte_eth_iterator_cleanup. */
+        rte_eth_iterator_cleanup(&iterator);
+        break;
+    }
+    if (port_id == DPDK_ETH_PORT_ID_INVALID) {
         response = xasprintf("Device '%s' not found in DPDK", argv[1]);
         goto error;
     }
@@ -3253,15 +3321,23 @@  netdev_dpdk_detach(struct unixctl_conn *conn, int argc OVS_UNUSED,
         goto error;
     }
 
-    rte_eth_dev_close(port_id);
+    /* FIXME: avoid direct access to DPDK internal array rte_eth_devices. */
+    rte_dev = rte_eth_devices[port_id].device;
+    if (netdev_dpdk_get_num_ports(rte_dev)) {
+        response = xasprintf("Device '%s' is being shared with other "
+                             "interfaces. Remove them before detaching.",
+                             argv[1]);
+        goto error;
+    }
 
-    rte_eth_dev_info_get(port_id, &dev_info);
-    if (!dev_info.device || rte_dev_remove(dev_info.device)) {
-        response = xasprintf("Device '%s' can not be detached", argv[1]);
+    rte_eth_dev_close(port_id);
+    if (rte_dev_remove(rte_dev) < 0) {
+        response = xasprintf("Device '%s' can not be removed", argv[1]);
         goto error;
     }
 
-    response = xasprintf("Device '%s' has been detached", argv[1]);
+    response = xasprintf("All devices shared with device '%s' "
+                          "have been detached", argv[1]);
 
     ovs_mutex_unlock(&dpdk_mutex);
     unixctl_command_reply(conn, response);