diff mbox series

[ovs-dev,v3,2/2] netdev-dpdk: Add option to configure VF MAC address.

Message ID fc45649e343b68a34dff4daef9141f0c426f2c5a.1600269209.git.grive@u256.net
State Changes Requested
Headers show
Series netdev-dpdk: support changing VF MAC | expand

Commit Message

Gaetan Rivet Sept. 16, 2020, 3:17 p.m. UTC
In some cloud topologies, using DPDK VF representors in guest requires
configuring a VF before it is assigned to the guest.

A first basic option for such configuration is setting the VF MAC
address. Add a key 'dpdk-vf-mac' to the 'options' column of the Interface
table.

This option can be used as such:

   $ ovs-vsctl add-port br0 dpdk-rep0 -- set Interface dpdk-rep0 type=dpdk \
      options:dpdk-vf-mac=00:11:22:33:44:55

Signed-off-by: Gaetan Rivet <grive@u256.net>
Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Eli Britstein <elibr@nvidia.com>
---
 Documentation/topics/dpdk/phy.rst | 22 ++++++++++++
 NEWS                              |  2 ++
 lib/netdev-dpdk.c                 | 60 +++++++++++++++++++++++++++++++
 vswitchd/vswitch.xml              | 13 +++++++
 4 files changed, 97 insertions(+)

Comments

Kevin Traynor Oct. 20, 2020, 11:15 a.m. UTC | #1
On 16/09/2020 16:17, Gaetan Rivet wrote:
> In some cloud topologies, using DPDK VF representors in guest requires
> configuring a VF before it is assigned to the guest.
> 
> A first basic option for such configuration is setting the VF MAC
> address. Add a key 'dpdk-vf-mac' to the 'options' column of the Interface
> table.
> 
> This option can be used as such:
> 
>    $ ovs-vsctl add-port br0 dpdk-rep0 -- set Interface dpdk-rep0 type=dpdk \
>       options:dpdk-vf-mac=00:11:22:33:44:55
> 

If I got it right, it doesn't seem like it solves any issues by limiting
to representor, but is just an attempt to limit some of the edge cases
around bifurcated devices to the devices where the requested
functionality is really needed now.

This limitation has some costs...

We will now have, mac, mac_in_use, dpdk-vf-mac, configured_hwaddr,
requested_hwaddr for the user to understand. I think it can be confusing.

'ovs-vsctl show' will show the dpdk-vf-mac, which is the main one we
document, but it's not clear if it was successful or not. (yes there is
a log entry, but it is a separate place).

As it is specific for representors, if we ever want to allow setting of
mac on any non representor DPDK ports, we have an awkward user
interface. We would need to make another 'dpdk-mac' type option, or we
decide then to allow use mac in interface table but that it doesn't work
for representors, or there are two ways to set for representors etc.
None of this seems great.

My feeling is that it is making things complicated for the user with
this many knobs, where we could just have mac and mac_in_use, live with
the edge cases (as Gaetan pointed out, we already do for MTU) and we
know as well if we ever need to extend further, the user interface will
stay simple. Just my 2C, maybe there's a good argument why we can't do
it like this.

Few comments on the code as it is now below.


> Signed-off-by: Gaetan Rivet <grive@u256.net>
> Suggested-by: Ilya Maximets <i.maximets@ovn.org>
> Acked-by: Eli Britstein <elibr@nvidia.com>
> ---
>  Documentation/topics/dpdk/phy.rst | 22 ++++++++++++
>  NEWS                              |  2 ++
>  lib/netdev-dpdk.c                 | 60 +++++++++++++++++++++++++++++++
>  vswitchd/vswitch.xml              | 13 +++++++
>  4 files changed, 97 insertions(+)
> 
> diff --git a/Documentation/topics/dpdk/phy.rst b/Documentation/topics/dpdk/phy.rst
> index 38e52c8de..73dcb7c28 100644
> --- a/Documentation/topics/dpdk/phy.rst
> +++ b/Documentation/topics/dpdk/phy.rst
> @@ -379,6 +379,28 @@ an eth device whose mac address is ``00:11:22:33:44:55``::
>      $ ovs-vsctl add-port br0 dpdk-mac -- set Interface dpdk-mac type=dpdk \
>         options:dpdk-devargs="class=eth,mac=00:11:22:33:44:55"
>  
> +Representor specific configuration
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +In some topologies, a VF must be configured before being assigned to a
> +guest (VM) machine.  This configuration is done through VF-specific fields
> +in the ``options`` column of the Interface table.
> +
> +.. important::
> +
> +   Some DPDK port use `bifurcated drivers <bifurcated-drivers>`__,
> +   which means that a kernel netdevice remains when Open vSwitch is stopped.
> +
> +   In such case, any configuration applied to a VF would remain set on the
> +   kernel netdevice, and be inherited from it when Open vSwitch is restarted,
> +   even if the options described in this section are unset from Open vSwitch.
> +
> +.. _bifurcated-drivers: http://doc.dpdk.org/guides/linux_gsg/linux_drivers.html#bifurcated-driver
> +
> +- Configure the VF MAC address::
> +
> +    $ ovs-vsctl set Interface dpdk-rep0 options:dpdk-vf-mac=00:11:22:33:44:55
> +

As mentioned above, if for some reason setting the mac fails there will
be warning in the logs, but 'ovs-vsctl show' will still show the
dpdk-vf-mac that was not applied and it's quite prominent. Also, there
are mac, mac_in_use, dpdk-vf-mac, configured_hwaddr, requested_hwaddr.

Suggest to add a small summary here for each one as it relates
representors and make clear to users,
1. how to find the mac the user requested
2. how to find the current mac

>  Jumbo Frames
>  ------------
>  
> diff --git a/NEWS b/NEWS
> index 2f67d5047..5ece29474 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -24,6 +24,8 @@ v2.14.0 - 17 Aug 2020

2.15....hopefully :-)

>         to list and change log levels in DPDK components.
>       * Vhost-user Dequeue zero-copy support is deprecated and will be removed
>         in the next release.
> +     * New 'options:dpdk-vf-mac' field for DPDK interface of VF ports,
> +       that allows configuring the MAC address of a VF representor.
>     - Linux datapath:
>       * Support for kernel versions up to 5.5.x.
>     - AF_XDP:
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 4ef7f78a6..060348369 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -523,6 +523,9 @@ struct netdev_dpdk {
>           * otherwise interrupt mode is used. */
>          bool requested_lsc_interrupt_mode;
>          bool lsc_interrupt_mode;
> +
> +        /* VF configuration. */
> +        struct eth_addr requested_hwaddr;
>      );
>  
>      PADDED_MEMBERS(CACHE_LINE_SIZE,
> @@ -1685,6 +1688,16 @@ out:
>      return ret;
>  }
>  
> +static bool
> +dpdk_port_is_representor(struct netdev_dpdk *dev)
> +    OVS_REQUIRES(dev->mutex)
> +{
> +    struct rte_eth_dev_info dev_info;
> +
> +    rte_eth_dev_info_get(dev->port_id, &dev_info);
> +    return (*dev_info.dev_flags) & RTE_ETH_DEV_REPRESENTOR;
> +}
> +
>  static int
>  netdev_dpdk_get_config(const struct netdev *netdev, struct smap *args)
>  {
> @@ -1719,6 +1732,13 @@ netdev_dpdk_get_config(const struct netdev *netdev, struct smap *args)
>          }
>          smap_add(args, "lsc_interrupt_mode",
>                   dev->lsc_interrupt_mode ? "true" : "false");
> +
> +        if (dpdk_port_is_representor(dev)) {
> +            smap_add_format(args, "requested_hwaddr", ETH_ADDR_FMT,
> +                            ETH_ADDR_ARGS(dev->requested_hwaddr));
> +            smap_add_format(args, "configured_hwaddr", ETH_ADDR_FMT,
> +                            ETH_ADDR_ARGS(dev->hwaddr));

'*mac*' is used elsewhere (mac, mac_in_use, dpdk-vf-mac), think it's
better to keep consistent

> +        }
>      }
>      ovs_mutex_unlock(&dev->mutex);
>  
> @@ -1898,6 +1918,7 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args,
>          {RTE_FC_RX_PAUSE, RTE_FC_FULL    }
>      };
>      const char *new_devargs;
> +    const char *vf_mac;
>      int err = 0;
>  
>      ovs_mutex_lock(&dpdk_mutex);
> @@ -1968,6 +1989,36 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args,
>          goto out;
>      }
>  
> +    vf_mac = smap_get(args, "dpdk-vf-mac");
> +    if (vf_mac) {
> +        struct eth_addr mac;
> +
> +        err = EINVAL;
> +
> +        if (!dpdk_port_is_representor(dev)) {
> +            VLOG_ERR_BUF(errp, "'%s' is trying to set the VF MAC '%s' "
> +                         "but 'options:dpdk-vf-mac' is only supported for "
> +                         "VF representors.",
> +                         netdev_get_name(netdev), vf_mac);

Warnings here seems more in line with the other configs that cannot be
applied.

> +            goto out;
> +        }
> +        if (!eth_addr_from_string(vf_mac, &mac)) {
> +            VLOG_ERR_BUF(errp, "interface '%s': cannot parse VF MAC '%s'",
> +                         netdev_get_name(netdev), vf_mac);

as above

> +            goto out;
> +        }
> +        if (eth_addr_is_multicast(mac)) {
> +            VLOG_ERR_BUF(errp,
> +                         "interface '%s': cannot set VF MAC to multicast "
> +                         "address", netdev_get_name(netdev));

as above

> +            goto out;
> +        }
> +        if (!eth_addr_equals(dev->requested_hwaddr, mac)) {
> +            dev->requested_hwaddr = mac;
> +            netdev_request_reconfigure(netdev);
> +        }
> +    }
> +
>      lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", false);
>      if (dev->requested_lsc_interrupt_mode != lsc_interrupt_mode) {
>          dev->requested_lsc_interrupt_mode = lsc_interrupt_mode;
> @@ -4938,6 +4989,7 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>          && dev->lsc_interrupt_mode == dev->requested_lsc_interrupt_mode
>          && dev->rxq_size == dev->requested_rxq_size
>          && dev->txq_size == dev->requested_txq_size
> +        && eth_addr_equals(dev->hwaddr, dev->requested_hwaddr)
>          && dev->socket_id == dev->requested_socket_id
>          && dev->started && !dev->reset_needed) {
>          /* Reconfiguration is unnecessary */
> @@ -4969,6 +5021,14 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>      dev->txq_size = dev->requested_txq_size;
>  
>      rte_free(dev->tx_q);
> +
> +    if (!eth_addr_equals(dev->hwaddr, dev->requested_hwaddr)) {

Looks like if dpdk-vf-mac is not set, after initialization there will be
a configured hwaddr, and requested_hwaddr will be all zeros.

If this function is called because of some other config change, then
hwaddr !=  requested_hwaddr, and it will attempt to set the mac to the
requested_hwaddr of all zeros. This may also fail causing the
reconfigure to fail for other configs items.

(I admit, I hacked a phy port dev_info to say it was a representor, when
playing with this but the issue seems valid)


> +        err = netdev_dpdk_set_etheraddr__(dev, dev->requested_hwaddr);
> +        if (err) {
> +            goto out;
> +        }
> +    }
> +
>      err = dpdk_eth_dev_init(dev);
>      if (dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD) {
>          netdev->ol_flags |= NETDEV_TX_OFFLOAD_TCP_TSO;
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index 81c84927f..c0bf03c95 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -3280,6 +3280,19 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
>            descriptors will be used by default.
>          </p>
>        </column>
> +
> +      <column name="options" key="dpdk-vf-mac">
> +        <p>
> +          Ethernet address to set for this VF interface.  If unset then the
> +          default MAC address is used:
> +        </p>
> +        <ul>
> +          <li>For most drivers, the default MAC address assigned by
> +          their hardware.</li><li>For bifurcated drivers, the MAC currently
> +          used by the kernel netdevice.</li>

nit: the formatting is a little different compared to the other
<li></li> usage, newline/indent etc.

thanks,
Kevin.

> +        </ul>
> +        <p>This option may only be used with dpdk VF representors.</p>
> +      </column>
>      </group>
>  
>      <group title="EMC (Exact Match Cache) Configuration">
>
Ilya Maximets Oct. 27, 2020, 2:03 p.m. UTC | #2
On 10/20/20 1:15 PM, Kevin Traynor wrote:
> On 16/09/2020 16:17, Gaetan Rivet wrote:
>> In some cloud topologies, using DPDK VF representors in guest requires
>> configuring a VF before it is assigned to the guest.
>>
>> A first basic option for such configuration is setting the VF MAC
>> address. Add a key 'dpdk-vf-mac' to the 'options' column of the Interface
>> table.
>>
>> This option can be used as such:
>>
>>    $ ovs-vsctl add-port br0 dpdk-rep0 -- set Interface dpdk-rep0 type=dpdk \
>>       options:dpdk-vf-mac=00:11:22:33:44:55
>>
> 
> If I got it right, it doesn't seem like it solves any issues by limiting
> to representor, but is just an attempt to limit some of the edge cases
> around bifurcated devices to the devices where the requested
> functionality is really needed now.
> 
> This limitation has some costs...
> 
> We will now have, mac, mac_in_use, dpdk-vf-mac, configured_hwaddr,
> requested_hwaddr for the user to understand. I think it can be confusing.

To be clear, 'configured_hwaddr/requested_hwaddr' type of report should not
exist at least in the output of 'get_config()' function.  Same applies
to all existing '{configured,requested}_something' reports.
'get_config()' intended to report things that could be copied and passed directly
to database.  OTOH, 'get_status()' should report what is an actual device
status, i.e. 'dpdk-vf-mac' should be in 'get_config()' and 'configured_hwaddr'
should be in 'get_status()', but it should be named differently, e.g. just
'dpdk-vf-mac' and the callback it placed in already says that it's actually
'configured'.  In short:
  - 'dpdk-vf-mac' from the 'get_config()' should report dev->requested_hwaddr.
  - 'dpdk-vf-mac' from the 'get_status()' should report dev->hwaddr.
Of course, all this should be reported only if device is a representor.


> 
> 'ovs-vsctl show' will show the dpdk-vf-mac, which is the main one we
> document, but it's not clear if it was successful or not. (yes there is
> a log entry, but it is a separate place).
> 
> As it is specific for representors, if we ever want to allow setting of
> mac on any non representor DPDK ports, we have an awkward user
> interface. We would need to make another 'dpdk-mac' type option, or we
> decide then to allow use mac in interface table but that it doesn't work
> for representors, or there are two ways to set for representors etc.
> None of this seems great.
> 
> My feeling is that it is making things complicated for the user with
> this many knobs, where we could just have mac and mac_in_use, live with
> the edge cases (as Gaetan pointed out, we already do for MTU) and we
> know as well if we ever need to extend further, the user interface will
> stay simple. Just my 2C, maybe there's a good argument why we can't do
> it like this.

Setting mac for physical device has lots of split-brain issues and unclear
ways how and when undo changes as we do not know what was original mac or
what to do if there was no mac at all at the start.  Moreover, some devices
will not allow setting mac to it's original value (e.g. all zeroes).
One more point is that vf and representor itself are different devices and
these devices could have different mac addresses.  For now we agreed that
DPDK api works for representors in a way that it always changes parameters
of a virtual function, not representors', but that is not a very clear thing
for all users (e.g. for DPDK developers), so 'vf' in 'dpdk-vf-mac' specifically
points to that difference.

After a long discussion we decided to make this option as specific as possible,
so users will not try to use it to configure anything else beside mac of a
virtual function.  All restrictions and downsides of configuration of vf mac
was/should be documented.


> 
> Few comments on the code as it is now below.
> 
> 
>> Signed-off-by: Gaetan Rivet <grive@u256.net>
>> Suggested-by: Ilya Maximets <i.maximets@ovn.org>
>> Acked-by: Eli Britstein <elibr@nvidia.com>
>> ---
>>  Documentation/topics/dpdk/phy.rst | 22 ++++++++++++
>>  NEWS                              |  2 ++
>>  lib/netdev-dpdk.c                 | 60 +++++++++++++++++++++++++++++++
>>  vswitchd/vswitch.xml              | 13 +++++++
>>  4 files changed, 97 insertions(+)
>>
>> diff --git a/Documentation/topics/dpdk/phy.rst b/Documentation/topics/dpdk/phy.rst
>> index 38e52c8de..73dcb7c28 100644
>> --- a/Documentation/topics/dpdk/phy.rst
>> +++ b/Documentation/topics/dpdk/phy.rst
>> @@ -379,6 +379,28 @@ an eth device whose mac address is ``00:11:22:33:44:55``::
>>      $ ovs-vsctl add-port br0 dpdk-mac -- set Interface dpdk-mac type=dpdk \
>>         options:dpdk-devargs="class=eth,mac=00:11:22:33:44:55"
>>  
>> +Representor specific configuration
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +
>> +In some topologies, a VF must be configured before being assigned to a
>> +guest (VM) machine.  This configuration is done through VF-specific fields
>> +in the ``options`` column of the Interface table.
>> +
>> +.. important::
>> +
>> +   Some DPDK port use `bifurcated drivers <bifurcated-drivers>`__,
>> +   which means that a kernel netdevice remains when Open vSwitch is stopped.
>> +
>> +   In such case, any configuration applied to a VF would remain set on the
>> +   kernel netdevice, and be inherited from it when Open vSwitch is restarted,
>> +   even if the options described in this section are unset from Open vSwitch.
>> +
>> +.. _bifurcated-drivers: http://doc.dpdk.org/guides/linux_gsg/linux_drivers.html#bifurcated-driver
>> +
>> +- Configure the VF MAC address::
>> +
>> +    $ ovs-vsctl set Interface dpdk-rep0 options:dpdk-vf-mac=00:11:22:33:44:55
>> +
> 
> As mentioned above, if for some reason setting the mac fails there will
> be warning in the logs, but 'ovs-vsctl show' will still show the
> dpdk-vf-mac that was not applied and it's quite prominent. Also, there
> are mac, mac_in_use, dpdk-vf-mac, configured_hwaddr, requested_hwaddr.
> 
> Suggest to add a small summary here for each one as it relates
> representors and make clear to users,
> 1. how to find the mac the user requested
> 2. how to find the current mac
> 
>>  Jumbo Frames
>>  ------------
>>  
>> diff --git a/NEWS b/NEWS
>> index 2f67d5047..5ece29474 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -24,6 +24,8 @@ v2.14.0 - 17 Aug 2020
> 
> 2.15....hopefully :-)
> 
>>         to list and change log levels in DPDK components.
>>       * Vhost-user Dequeue zero-copy support is deprecated and will be removed
>>         in the next release.
>> +     * New 'options:dpdk-vf-mac' field for DPDK interface of VF ports,
>> +       that allows configuring the MAC address of a VF representor.
>>     - Linux datapath:
>>       * Support for kernel versions up to 5.5.x.
>>     - AF_XDP:
>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>> index 4ef7f78a6..060348369 100644
>> --- a/lib/netdev-dpdk.c
>> +++ b/lib/netdev-dpdk.c
>> @@ -523,6 +523,9 @@ struct netdev_dpdk {
>>           * otherwise interrupt mode is used. */
>>          bool requested_lsc_interrupt_mode;
>>          bool lsc_interrupt_mode;
>> +
>> +        /* VF configuration. */
>> +        struct eth_addr requested_hwaddr;
>>      );
>>  
>>      PADDED_MEMBERS(CACHE_LINE_SIZE,
>> @@ -1685,6 +1688,16 @@ out:
>>      return ret;
>>  }
>>  
>> +static bool
>> +dpdk_port_is_representor(struct netdev_dpdk *dev)
>> +    OVS_REQUIRES(dev->mutex)
>> +{
>> +    struct rte_eth_dev_info dev_info;
>> +
>> +    rte_eth_dev_info_get(dev->port_id, &dev_info);
>> +    return (*dev_info.dev_flags) & RTE_ETH_DEV_REPRESENTOR;
>> +}
>> +
>>  static int
>>  netdev_dpdk_get_config(const struct netdev *netdev, struct smap *args)
>>  {
>> @@ -1719,6 +1732,13 @@ netdev_dpdk_get_config(const struct netdev *netdev, struct smap *args)
>>          }
>>          smap_add(args, "lsc_interrupt_mode",
>>                   dev->lsc_interrupt_mode ? "true" : "false");
>> +
>> +        if (dpdk_port_is_representor(dev)) {
>> +            smap_add_format(args, "requested_hwaddr", ETH_ADDR_FMT,
>> +                            ETH_ADDR_ARGS(dev->requested_hwaddr));
>> +            smap_add_format(args, "configured_hwaddr", ETH_ADDR_FMT,
>> +                            ETH_ADDR_ARGS(dev->hwaddr));
> 
> '*mac*' is used elsewhere (mac, mac_in_use, dpdk-vf-mac), think it's
> better to keep consistent

Since this is 'config' it should report things in a way they could be passed
to the datbase.  So, it should be 'dpdk-vf-mac' and it should report the
value that was requested, i.e. dev->requested_hwaddr.

> 
>> +        }
>>      }
>>      ovs_mutex_unlock(&dev->mutex);
>>  
>> @@ -1898,6 +1918,7 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args,
>>          {RTE_FC_RX_PAUSE, RTE_FC_FULL    }
>>      };
>>      const char *new_devargs;
>> +    const char *vf_mac;
>>      int err = 0;
>>  
>>      ovs_mutex_lock(&dpdk_mutex);
>> @@ -1968,6 +1989,36 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args,
>>          goto out;
>>      }
>>  
>> +    vf_mac = smap_get(args, "dpdk-vf-mac");
>> +    if (vf_mac) {
>> +        struct eth_addr mac;
>> +
>> +        err = EINVAL;
>> +
>> +        if (!dpdk_port_is_representor(dev)) {
>> +            VLOG_ERR_BUF(errp, "'%s' is trying to set the VF MAC '%s' "
>> +                         "but 'options:dpdk-vf-mac' is only supported for "
>> +                         "VF representors.",
>> +                         netdev_get_name(netdev), vf_mac);
> 
> Warnings here seems more in line with the other configs that cannot be
> applied.

I agree, if it's not a representor we could just issue a warning and continue.

> 
>> +            goto out;
>> +        }
>> +        if (!eth_addr_from_string(vf_mac, &mac)) {
>> +            VLOG_ERR_BUF(errp, "interface '%s': cannot parse VF MAC '%s'",
>> +                         netdev_get_name(netdev), vf_mac);
> 
> as above
> 
>> +            goto out;
>> +        }
>> +        if (eth_addr_is_multicast(mac)) {
>> +            VLOG_ERR_BUF(errp,
>> +                         "interface '%s': cannot set VF MAC to multicast "
>> +                         "address", netdev_get_name(netdev));
> 
> as above
> 
>> +            goto out;
>> +        }
>> +        if (!eth_addr_equals(dev->requested_hwaddr, mac)) {
>> +            dev->requested_hwaddr = mac;
>> +            netdev_request_reconfigure(netdev);
>> +        }
>> +    }
>> +
>>      lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", false);
>>      if (dev->requested_lsc_interrupt_mode != lsc_interrupt_mode) {
>>          dev->requested_lsc_interrupt_mode = lsc_interrupt_mode;
>> @@ -4938,6 +4989,7 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>>          && dev->lsc_interrupt_mode == dev->requested_lsc_interrupt_mode
>>          && dev->rxq_size == dev->requested_rxq_size
>>          && dev->txq_size == dev->requested_txq_size
>> +        && eth_addr_equals(dev->hwaddr, dev->requested_hwaddr)
>>          && dev->socket_id == dev->requested_socket_id
>>          && dev->started && !dev->reset_needed) {
>>          /* Reconfiguration is unnecessary */
>> @@ -4969,6 +5021,14 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>>      dev->txq_size = dev->requested_txq_size;
>>  
>>      rte_free(dev->tx_q);
>> +
>> +    if (!eth_addr_equals(dev->hwaddr, dev->requested_hwaddr)) {
> 
> Looks like if dpdk-vf-mac is not set, after initialization there will be
> a configured hwaddr, and requested_hwaddr will be all zeros.
> 
> If this function is called because of some other config change, then
> hwaddr !=  requested_hwaddr, and it will attempt to set the mac to the
> requested_hwaddr of all zeros. This may also fail causing the
> reconfigure to fail for other configs items.
> 
> (I admit, I hacked a phy port dev_info to say it was a representor, when
> playing with this but the issue seems valid)
> 
> 
>> +        err = netdev_dpdk_set_etheraddr__(dev, dev->requested_hwaddr);
>> +        if (err) {
>> +            goto out;
>> +        }
>> +    }
>> +
>>      err = dpdk_eth_dev_init(dev);
>>      if (dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD) {
>>          netdev->ol_flags |= NETDEV_TX_OFFLOAD_TCP_TSO;
>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
>> index 81c84927f..c0bf03c95 100644
>> --- a/vswitchd/vswitch.xml
>> +++ b/vswitchd/vswitch.xml
>> @@ -3280,6 +3280,19 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
>>            descriptors will be used by default.
>>          </p>
>>        </column>
>> +
>> +      <column name="options" key="dpdk-vf-mac">
>> +        <p>
>> +          Ethernet address to set for this VF interface.  If unset then the
>> +          default MAC address is used:
>> +        </p>
>> +        <ul>
>> +          <li>For most drivers, the default MAC address assigned by
>> +          their hardware.</li><li>For bifurcated drivers, the MAC currently
>> +          used by the kernel netdevice.</li>
> 
> nit: the formatting is a little different compared to the other
> <li></li> usage, newline/indent etc.
> 
> thanks,
> Kevin.
> 
>> +        </ul>
>> +        <p>This option may only be used with dpdk VF representors.</p>
>> +      </column>
>>      </group>
>>  
>>      <group title="EMC (Exact Match Cache) Configuration">
>>
> 
> 
> 
> 
>
Gaetan Rivet Oct. 28, 2020, 1:58 a.m. UTC | #3
Hi Ilya, Kevin,

Thanks for the remarks, I will fix them.
However I have two questions inline.

On 27/10/20 15:03 +0100, Ilya Maximets wrote:
> On 10/20/20 1:15 PM, Kevin Traynor wrote:
> > On 16/09/2020 16:17, Gaetan Rivet wrote:
> >> In some cloud topologies, using DPDK VF representors in guest requires
> >> configuring a VF before it is assigned to the guest.
> >>
> >> A first basic option for such configuration is setting the VF MAC
> >> address. Add a key 'dpdk-vf-mac' to the 'options' column of the Interface
> >> table.
> >>
> >> This option can be used as such:
> >>
> >>    $ ovs-vsctl add-port br0 dpdk-rep0 -- set Interface dpdk-rep0 type=dpdk \
> >>       options:dpdk-vf-mac=00:11:22:33:44:55
> >>
> > 
> > If I got it right, it doesn't seem like it solves any issues by limiting
> > to representor, but is just an attempt to limit some of the edge cases
> > around bifurcated devices to the devices where the requested
> > functionality is really needed now.
> > 
> > This limitation has some costs...
> > 
> > We will now have, mac, mac_in_use, dpdk-vf-mac, configured_hwaddr,
> > requested_hwaddr for the user to understand. I think it can be confusing.
> 
> To be clear, 'configured_hwaddr/requested_hwaddr' type of report should not
> exist at least in the output of 'get_config()' function.  Same applies
> to all existing '{configured,requested}_something' reports.
> 'get_config()' intended to report things that could be copied and passed directly
> to database.  OTOH, 'get_status()' should report what is an actual device
> status, i.e. 'dpdk-vf-mac' should be in 'get_config()' and 'configured_hwaddr'
> should be in 'get_status()', but it should be named differently, e.g. just
> 'dpdk-vf-mac' and the callback it placed in already says that it's actually
> 'configured'.  In short:
>   - 'dpdk-vf-mac' from the 'get_config()' should report dev->requested_hwaddr.
>   - 'dpdk-vf-mac' from the 'get_status()' should report dev->hwaddr.
> Of course, all this should be reported only if device is a representor.
> 
> 

Indeed I had followed the other '{configured|requested}_' fields.

This is not related to this patchset, however I'd like to know how to
proceed to fix those other fields? They were exposed to users possibly,
should there be a deprecation process to warn of the change, or should
they just be fixed directly as the get_config() API was not respected?

> > 
> > 'ovs-vsctl show' will show the dpdk-vf-mac, which is the main one we
> > document, but it's not clear if it was successful or not. (yes there is
> > a log entry, but it is a separate place).
> > 
> > As it is specific for representors, if we ever want to allow setting of
> > mac on any non representor DPDK ports, we have an awkward user
> > interface. We would need to make another 'dpdk-mac' type option, or we
> > decide then to allow use mac in interface table but that it doesn't work
> > for representors, or there are two ways to set for representors etc.
> > None of this seems great.
> > 
> > My feeling is that it is making things complicated for the user with
> > this many knobs, where we could just have mac and mac_in_use, live with
> > the edge cases (as Gaetan pointed out, we already do for MTU) and we
> > know as well if we ever need to extend further, the user interface will
> > stay simple. Just my 2C, maybe there's a good argument why we can't do
> > it like this.
> 
> Setting mac for physical device has lots of split-brain issues and unclear
> ways how and when undo changes as we do not know what was original mac or
> what to do if there was no mac at all at the start.  Moreover, some devices
> will not allow setting mac to it's original value (e.g. all zeroes).
> One more point is that vf and representor itself are different devices and
> these devices could have different mac addresses.  For now we agreed that
> DPDK api works for representors in a way that it always changes parameters
> of a virtual function, not representors', but that is not a very clear thing
> for all users (e.g. for DPDK developers), so 'vf' in 'dpdk-vf-mac' specifically
> points to that difference.
> 
> After a long discussion we decided to make this option as specific as possible,
> so users will not try to use it to configure anything else beside mac of a
> virtual function.  All restrictions and downsides of configuration of vf mac
> was/should be documented.
> 
> 
> > 
> > Few comments on the code as it is now below.
> > 
> > 
> >> Signed-off-by: Gaetan Rivet <grive@u256.net>
> >> Suggested-by: Ilya Maximets <i.maximets@ovn.org>
> >> Acked-by: Eli Britstein <elibr@nvidia.com>
> >> ---
> >>  Documentation/topics/dpdk/phy.rst | 22 ++++++++++++
> >>  NEWS                              |  2 ++
> >>  lib/netdev-dpdk.c                 | 60 +++++++++++++++++++++++++++++++
> >>  vswitchd/vswitch.xml              | 13 +++++++
> >>  4 files changed, 97 insertions(+)
> >>
> >> diff --git a/Documentation/topics/dpdk/phy.rst b/Documentation/topics/dpdk/phy.rst
> >> index 38e52c8de..73dcb7c28 100644
> >> --- a/Documentation/topics/dpdk/phy.rst
> >> +++ b/Documentation/topics/dpdk/phy.rst
> >> @@ -379,6 +379,28 @@ an eth device whose mac address is ``00:11:22:33:44:55``::
> >>      $ ovs-vsctl add-port br0 dpdk-mac -- set Interface dpdk-mac type=dpdk \
> >>         options:dpdk-devargs="class=eth,mac=00:11:22:33:44:55"
> >>  
> >> +Representor specific configuration
> >> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >> +
> >> +In some topologies, a VF must be configured before being assigned to a
> >> +guest (VM) machine.  This configuration is done through VF-specific fields
> >> +in the ``options`` column of the Interface table.
> >> +
> >> +.. important::
> >> +
> >> +   Some DPDK port use `bifurcated drivers <bifurcated-drivers>`__,
> >> +   which means that a kernel netdevice remains when Open vSwitch is stopped.
> >> +
> >> +   In such case, any configuration applied to a VF would remain set on the
> >> +   kernel netdevice, and be inherited from it when Open vSwitch is restarted,
> >> +   even if the options described in this section are unset from Open vSwitch.
> >> +
> >> +.. _bifurcated-drivers: http://doc.dpdk.org/guides/linux_gsg/linux_drivers.html#bifurcated-driver
> >> +
> >> +- Configure the VF MAC address::
> >> +
> >> +    $ ovs-vsctl set Interface dpdk-rep0 options:dpdk-vf-mac=00:11:22:33:44:55
> >> +
> > 
> > As mentioned above, if for some reason setting the mac fails there will
> > be warning in the logs, but 'ovs-vsctl show' will still show the
> > dpdk-vf-mac that was not applied and it's quite prominent. Also, there
> > are mac, mac_in_use, dpdk-vf-mac, configured_hwaddr, requested_hwaddr.
> > 
> > Suggest to add a small summary here for each one as it relates
> > representors and make clear to users,
> > 1. how to find the mac the user requested
> > 2. how to find the current mac
> > 
> >>  Jumbo Frames
> >>  ------------
> >>  
> >> diff --git a/NEWS b/NEWS
> >> index 2f67d5047..5ece29474 100644
> >> --- a/NEWS
> >> +++ b/NEWS
> >> @@ -24,6 +24,8 @@ v2.14.0 - 17 Aug 2020
> > 
> > 2.15....hopefully :-)
> > 
> >>         to list and change log levels in DPDK components.
> >>       * Vhost-user Dequeue zero-copy support is deprecated and will be removed
> >>         in the next release.
> >> +     * New 'options:dpdk-vf-mac' field for DPDK interface of VF ports,
> >> +       that allows configuring the MAC address of a VF representor.
> >>     - Linux datapath:
> >>       * Support for kernel versions up to 5.5.x.
> >>     - AF_XDP:
> >> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> >> index 4ef7f78a6..060348369 100644
> >> --- a/lib/netdev-dpdk.c
> >> +++ b/lib/netdev-dpdk.c
> >> @@ -523,6 +523,9 @@ struct netdev_dpdk {
> >>           * otherwise interrupt mode is used. */
> >>          bool requested_lsc_interrupt_mode;
> >>          bool lsc_interrupt_mode;
> >> +
> >> +        /* VF configuration. */
> >> +        struct eth_addr requested_hwaddr;
> >>      );
> >>  
> >>      PADDED_MEMBERS(CACHE_LINE_SIZE,
> >> @@ -1685,6 +1688,16 @@ out:
> >>      return ret;
> >>  }
> >>  
> >> +static bool
> >> +dpdk_port_is_representor(struct netdev_dpdk *dev)
> >> +    OVS_REQUIRES(dev->mutex)
> >> +{
> >> +    struct rte_eth_dev_info dev_info;
> >> +
> >> +    rte_eth_dev_info_get(dev->port_id, &dev_info);
> >> +    return (*dev_info.dev_flags) & RTE_ETH_DEV_REPRESENTOR;
> >> +}
> >> +
> >>  static int
> >>  netdev_dpdk_get_config(const struct netdev *netdev, struct smap *args)
> >>  {
> >> @@ -1719,6 +1732,13 @@ netdev_dpdk_get_config(const struct netdev *netdev, struct smap *args)
> >>          }
> >>          smap_add(args, "lsc_interrupt_mode",
> >>                   dev->lsc_interrupt_mode ? "true" : "false");
> >> +
> >> +        if (dpdk_port_is_representor(dev)) {
> >> +            smap_add_format(args, "requested_hwaddr", ETH_ADDR_FMT,
> >> +                            ETH_ADDR_ARGS(dev->requested_hwaddr));
> >> +            smap_add_format(args, "configured_hwaddr", ETH_ADDR_FMT,
> >> +                            ETH_ADDR_ARGS(dev->hwaddr));
> > 
> > '*mac*' is used elsewhere (mac, mac_in_use, dpdk-vf-mac), think it's
> > better to keep consistent
> 
> Since this is 'config' it should report things in a way they could be passed
> to the datbase.  So, it should be 'dpdk-vf-mac' and it should report the
> value that was requested, i.e. dev->requested_hwaddr.
> 
> > 
> >> +        }
> >>      }
> >>      ovs_mutex_unlock(&dev->mutex);
> >>  
> >> @@ -1898,6 +1918,7 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args,
> >>          {RTE_FC_RX_PAUSE, RTE_FC_FULL    }
> >>      };
> >>      const char *new_devargs;
> >> +    const char *vf_mac;
> >>      int err = 0;
> >>  
> >>      ovs_mutex_lock(&dpdk_mutex);
> >> @@ -1968,6 +1989,36 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args,
> >>          goto out;
> >>      }
> >>  
> >> +    vf_mac = smap_get(args, "dpdk-vf-mac");
> >> +    if (vf_mac) {
> >> +        struct eth_addr mac;
> >> +
> >> +        err = EINVAL;
> >> +
> >> +        if (!dpdk_port_is_representor(dev)) {
> >> +            VLOG_ERR_BUF(errp, "'%s' is trying to set the VF MAC '%s' "
> >> +                         "but 'options:dpdk-vf-mac' is only supported for "
> >> +                         "VF representors.",
> >> +                         netdev_get_name(netdev), vf_mac);
> > 
> > Warnings here seems more in line with the other configs that cannot be
> > applied.
> 
> I agree, if it's not a representor we could just issue a warning and continue.
> 

Your phrasing seems to limit the (err -> warn) change to the
representor filtering part. Should the MAC sanitizing messages below remain
errors?

> > 
> >> +            goto out;
> >> +        }
> >> +        if (!eth_addr_from_string(vf_mac, &mac)) {
> >> +            VLOG_ERR_BUF(errp, "interface '%s': cannot parse VF MAC '%s'",
> >> +                         netdev_get_name(netdev), vf_mac);
> > 
> > as above
> > 
> >> +            goto out;
> >> +        }
> >> +        if (eth_addr_is_multicast(mac)) {
> >> +            VLOG_ERR_BUF(errp,
> >> +                         "interface '%s': cannot set VF MAC to multicast "
> >> +                         "address", netdev_get_name(netdev));
> > 
> > as above
> > 
> >> +            goto out;
> >> +        }
> >> +        if (!eth_addr_equals(dev->requested_hwaddr, mac)) {
> >> +            dev->requested_hwaddr = mac;
> >> +            netdev_request_reconfigure(netdev);
> >> +        }
> >> +    }
> >> +
> >>      lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", false);
> >>      if (dev->requested_lsc_interrupt_mode != lsc_interrupt_mode) {
> >>          dev->requested_lsc_interrupt_mode = lsc_interrupt_mode;
> >> @@ -4938,6 +4989,7 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
> >>          && dev->lsc_interrupt_mode == dev->requested_lsc_interrupt_mode
> >>          && dev->rxq_size == dev->requested_rxq_size
> >>          && dev->txq_size == dev->requested_txq_size
> >> +        && eth_addr_equals(dev->hwaddr, dev->requested_hwaddr)
> >>          && dev->socket_id == dev->requested_socket_id
> >>          && dev->started && !dev->reset_needed) {
> >>          /* Reconfiguration is unnecessary */
> >> @@ -4969,6 +5021,14 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
> >>      dev->txq_size = dev->requested_txq_size;
> >>  
> >>      rte_free(dev->tx_q);
> >> +
> >> +    if (!eth_addr_equals(dev->hwaddr, dev->requested_hwaddr)) {
> > 
> > Looks like if dpdk-vf-mac is not set, after initialization there will be
> > a configured hwaddr, and requested_hwaddr will be all zeros.
> > 
> > If this function is called because of some other config change, then
> > hwaddr !=  requested_hwaddr, and it will attempt to set the mac to the
> > requested_hwaddr of all zeros. This may also fail causing the
> > reconfigure to fail for other configs items.
> > 
> > (I admit, I hacked a phy port dev_info to say it was a representor, when
> > playing with this but the issue seems valid)
> > 
> > 
> >> +        err = netdev_dpdk_set_etheraddr__(dev, dev->requested_hwaddr);
> >> +        if (err) {
> >> +            goto out;
> >> +        }
> >> +    }
> >> +
> >>      err = dpdk_eth_dev_init(dev);
> >>      if (dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD) {
> >>          netdev->ol_flags |= NETDEV_TX_OFFLOAD_TCP_TSO;
> >> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> >> index 81c84927f..c0bf03c95 100644
> >> --- a/vswitchd/vswitch.xml
> >> +++ b/vswitchd/vswitch.xml
> >> @@ -3280,6 +3280,19 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
> >>            descriptors will be used by default.
> >>          </p>
> >>        </column>
> >> +
> >> +      <column name="options" key="dpdk-vf-mac">
> >> +        <p>
> >> +          Ethernet address to set for this VF interface.  If unset then the
> >> +          default MAC address is used:
> >> +        </p>
> >> +        <ul>
> >> +          <li>For most drivers, the default MAC address assigned by
> >> +          their hardware.</li><li>For bifurcated drivers, the MAC currently
> >> +          used by the kernel netdevice.</li>
> > 
> > nit: the formatting is a little different compared to the other
> > <li></li> usage, newline/indent etc.
> > 
> > thanks,
> > Kevin.
> > 
> >> +        </ul>
> >> +        <p>This option may only be used with dpdk VF representors.</p>
> >> +      </column>
> >>      </group>
> >>  
> >>      <group title="EMC (Exact Match Cache) Configuration">
> >>
> > 
> > 
> > 
> > 
> > 
>
Gaetan Rivet Oct. 30, 2020, 4 p.m. UTC | #4
On 28/10/20 02:58 +0100, Gaƫtan Rivet wrote:

[...]

> > > 
> > >> +        }
> > >>      }
> > >>      ovs_mutex_unlock(&dev->mutex);
> > >>  
> > >> @@ -1898,6 +1918,7 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args,
> > >>          {RTE_FC_RX_PAUSE, RTE_FC_FULL    }
> > >>      };
> > >>      const char *new_devargs;
> > >> +    const char *vf_mac;
> > >>      int err = 0;
> > >>  
> > >>      ovs_mutex_lock(&dpdk_mutex);
> > >> @@ -1968,6 +1989,36 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args,
> > >>          goto out;
> > >>      }
> > >>  
> > >> +    vf_mac = smap_get(args, "dpdk-vf-mac");
> > >> +    if (vf_mac) {
> > >> +        struct eth_addr mac;
> > >> +
> > >> +        err = EINVAL;
> > >> +
> > >> +        if (!dpdk_port_is_representor(dev)) {
> > >> +            VLOG_ERR_BUF(errp, "'%s' is trying to set the VF MAC '%s' "
> > >> +                         "but 'options:dpdk-vf-mac' is only supported for "
> > >> +                         "VF representors.",
> > >> +                         netdev_get_name(netdev), vf_mac);
> > > 
> > > Warnings here seems more in line with the other configs that cannot be
> > > applied.
> > 
> > I agree, if it's not a representor we could just issue a warning and continue.
> > 
> 
> Your phrasing seems to limit the (err -> warn) change to the
> representor filtering part. Should the MAC sanitizing messages below remain
> errors?
> 

I've sent a v4 that followed Kevin's suggestions, let me know if you'd
prefer if I did as I outlined above.

Cheers,
Kevin Traynor Nov. 3, 2020, 1:41 p.m. UTC | #5
On 27/10/2020 14:03, Ilya Maximets wrote:
> On 10/20/20 1:15 PM, Kevin Traynor wrote:
>> On 16/09/2020 16:17, Gaetan Rivet wrote:
>>> In some cloud topologies, using DPDK VF representors in guest requires
>>> configuring a VF before it is assigned to the guest.
>>>
>>> A first basic option for such configuration is setting the VF MAC
>>> address. Add a key 'dpdk-vf-mac' to the 'options' column of the Interface
>>> table.
>>>
>>> This option can be used as such:
>>>
>>>    $ ovs-vsctl add-port br0 dpdk-rep0 -- set Interface dpdk-rep0 type=dpdk \
>>>       options:dpdk-vf-mac=00:11:22:33:44:55
>>>
>>
>> If I got it right, it doesn't seem like it solves any issues by limiting
>> to representor, but is just an attempt to limit some of the edge cases
>> around bifurcated devices to the devices where the requested
>> functionality is really needed now.
>>
>> This limitation has some costs...
>>
>> We will now have, mac, mac_in_use, dpdk-vf-mac, configured_hwaddr,
>> requested_hwaddr for the user to understand. I think it can be confusing.
> 
> To be clear, 'configured_hwaddr/requested_hwaddr' type of report should not
> exist at least in the output of 'get_config()' function.  Same applies
> to all existing '{configured,requested}_something' reports.
> 'get_config()' intended to report things that could be copied and passed directly
> to database.  OTOH, 'get_status()' should report what is an actual device
> status, i.e. 'dpdk-vf-mac' should be in 'get_config()' and 'configured_hwaddr'
> should be in 'get_status()', but it should be named differently, e.g. just
> 'dpdk-vf-mac' and the callback it placed in already says that it's actually
> 'configured'.  In short:
>   - 'dpdk-vf-mac' from the 'get_config()' should report dev->requested_hwaddr.
>   - 'dpdk-vf-mac' from the 'get_status()' should report dev->hwaddr.

If an invalid mac is attempted or it fails to be set, this is what the
user will see for dpdk-vf-mac:

ovs-appctl dpctl/show -> dpdk-vf-mac=00:11:22:33:44:55
ovs-vsctl show -> dpdk-vf-mac=33:11:22:33:44:55
ovs-vsctl get Interface myport status -> dpdk-vf-mac=00:11:22:33:44:55

For completeness, i'll add the failed attempt is logged clearly.

I suppose we can tell the user to check the 'mac_in_use', but multiple
dpdk-vf-mac entries for the same device with different values is confusing.

requested_dpdk-vf-mac configured_dpdk-vf-mac from 'ovs-appctl
dpctl/show' would be explicit. At a minimum we should document which one
is requested and which one is configured.

> Of course, all this should be reported only if device is a representor.
> 
> 
>>
>> 'ovs-vsctl show' will show the dpdk-vf-mac, which is the main one we
>> document, but it's not clear if it was successful or not. (yes there is
>> a log entry, but it is a separate place).
>>
>> As it is specific for representors, if we ever want to allow setting of
>> mac on any non representor DPDK ports, we have an awkward user
>> interface. We would need to make another 'dpdk-mac' type option, or we
>> decide then to allow use mac in interface table but that it doesn't work
>> for representors, or there are two ways to set for representors etc.
>> None of this seems great.
>>
>> My feeling is that it is making things complicated for the user with
>> this many knobs, where we could just have mac and mac_in_use, live with
>> the edge cases (as Gaetan pointed out, we already do for MTU) and we
>> know as well if we ever need to extend further, the user interface will
>> stay simple. Just my 2C, maybe there's a good argument why we can't do
>> it like this.
> 
> Setting mac for physical device has lots of split-brain issues and unclear
> ways how and when undo changes as we do not know what was original mac or
> what to do if there was no mac at all at the start.  Moreover, some devices
> will not allow setting mac to it's original value (e.g. all zeroes).
> One more point is that vf and representor itself are different devices and
> these devices could have different mac addresses.  For now we agreed that

> DPDK api works for representors in a way that it always changes parameters
> of a virtual function, not representors', but that is not a very clear thing
> for all users (e.g. for DPDK developers), so 'vf' in 'dpdk-vf-mac' specifically
> points to that difference.
> 

That's a good point, it does distinguish and iirc some people made
the point that this could be important.

> After a long discussion we decided to make this option as specific as possible,
> so users will not try to use it to configure anything else beside mac of a
> virtual function.  All restrictions and downsides of configuration of vf mac
> was/should be documented.
> 

ok, I can see the pros of this type of design too so I won't object, but
agree the downsides like the additional fields and ability for user to
understand clearly the output from commands above should be documented.

> 
>>
>> Few comments on the code as it is now below.
>>
>>
>>> Signed-off-by: Gaetan Rivet <grive@u256.net>
>>> Suggested-by: Ilya Maximets <i.maximets@ovn.org>
>>> Acked-by: Eli Britstein <elibr@nvidia.com>
>>> ---
>>>  Documentation/topics/dpdk/phy.rst | 22 ++++++++++++
>>>  NEWS                              |  2 ++
>>>  lib/netdev-dpdk.c                 | 60 +++++++++++++++++++++++++++++++
>>>  vswitchd/vswitch.xml              | 13 +++++++
>>>  4 files changed, 97 insertions(+)
>>>
>>> diff --git a/Documentation/topics/dpdk/phy.rst b/Documentation/topics/dpdk/phy.rst
>>> index 38e52c8de..73dcb7c28 100644
>>> --- a/Documentation/topics/dpdk/phy.rst
>>> +++ b/Documentation/topics/dpdk/phy.rst
>>> @@ -379,6 +379,28 @@ an eth device whose mac address is ``00:11:22:33:44:55``::
>>>      $ ovs-vsctl add-port br0 dpdk-mac -- set Interface dpdk-mac type=dpdk \
>>>         options:dpdk-devargs="class=eth,mac=00:11:22:33:44:55"
>>>  
>>> +Representor specific configuration
>>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> +
>>> +In some topologies, a VF must be configured before being assigned to a
>>> +guest (VM) machine.  This configuration is done through VF-specific fields
>>> +in the ``options`` column of the Interface table.
>>> +
>>> +.. important::
>>> +
>>> +   Some DPDK port use `bifurcated drivers <bifurcated-drivers>`__,
>>> +   which means that a kernel netdevice remains when Open vSwitch is stopped.
>>> +
>>> +   In such case, any configuration applied to a VF would remain set on the
>>> +   kernel netdevice, and be inherited from it when Open vSwitch is restarted,
>>> +   even if the options described in this section are unset from Open vSwitch.
>>> +
>>> +.. _bifurcated-drivers: http://doc.dpdk.org/guides/linux_gsg/linux_drivers.html#bifurcated-driver
>>> +
>>> +- Configure the VF MAC address::
>>> +
>>> +    $ ovs-vsctl set Interface dpdk-rep0 options:dpdk-vf-mac=00:11:22:33:44:55
>>> +
>>
>> As mentioned above, if for some reason setting the mac fails there will
>> be warning in the logs, but 'ovs-vsctl show' will still show the
>> dpdk-vf-mac that was not applied and it's quite prominent. Also, there
>> are mac, mac_in_use, dpdk-vf-mac, configured_hwaddr, requested_hwaddr.
>>
>> Suggest to add a small summary here for each one as it relates
>> representors and make clear to users,
>> 1. how to find the mac the user requested
>> 2. how to find the current mac
>>
>>>  Jumbo Frames
>>>  ------------
>>>  
>>> diff --git a/NEWS b/NEWS
>>> index 2f67d5047..5ece29474 100644
>>> --- a/NEWS
>>> +++ b/NEWS
>>> @@ -24,6 +24,8 @@ v2.14.0 - 17 Aug 2020
>>
>> 2.15....hopefully :-)
>>
>>>         to list and change log levels in DPDK components.
>>>       * Vhost-user Dequeue zero-copy support is deprecated and will be removed
>>>         in the next release.
>>> +     * New 'options:dpdk-vf-mac' field for DPDK interface of VF ports,
>>> +       that allows configuring the MAC address of a VF representor.
>>>     - Linux datapath:
>>>       * Support for kernel versions up to 5.5.x.
>>>     - AF_XDP:
>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>> index 4ef7f78a6..060348369 100644
>>> --- a/lib/netdev-dpdk.c
>>> +++ b/lib/netdev-dpdk.c
>>> @@ -523,6 +523,9 @@ struct netdev_dpdk {
>>>           * otherwise interrupt mode is used. */
>>>          bool requested_lsc_interrupt_mode;
>>>          bool lsc_interrupt_mode;
>>> +
>>> +        /* VF configuration. */
>>> +        struct eth_addr requested_hwaddr;
>>>      );
>>>  
>>>      PADDED_MEMBERS(CACHE_LINE_SIZE,
>>> @@ -1685,6 +1688,16 @@ out:
>>>      return ret;
>>>  }
>>>  
>>> +static bool
>>> +dpdk_port_is_representor(struct netdev_dpdk *dev)
>>> +    OVS_REQUIRES(dev->mutex)
>>> +{
>>> +    struct rte_eth_dev_info dev_info;
>>> +
>>> +    rte_eth_dev_info_get(dev->port_id, &dev_info);
>>> +    return (*dev_info.dev_flags) & RTE_ETH_DEV_REPRESENTOR;
>>> +}
>>> +
>>>  static int
>>>  netdev_dpdk_get_config(const struct netdev *netdev, struct smap *args)
>>>  {
>>> @@ -1719,6 +1732,13 @@ netdev_dpdk_get_config(const struct netdev *netdev, struct smap *args)
>>>          }
>>>          smap_add(args, "lsc_interrupt_mode",
>>>                   dev->lsc_interrupt_mode ? "true" : "false");
>>> +
>>> +        if (dpdk_port_is_representor(dev)) {
>>> +            smap_add_format(args, "requested_hwaddr", ETH_ADDR_FMT,
>>> +                            ETH_ADDR_ARGS(dev->requested_hwaddr));
>>> +            smap_add_format(args, "configured_hwaddr", ETH_ADDR_FMT,
>>> +                            ETH_ADDR_ARGS(dev->hwaddr));
>>
>> '*mac*' is used elsewhere (mac, mac_in_use, dpdk-vf-mac), think it's
>> better to keep consistent
> 
> Since this is 'config' it should report things in a way they could be passed
> to the datbase.  So, it should be 'dpdk-vf-mac' and it should report the
> value that was requested, i.e. dev->requested_hwaddr.
> 
>>
>>> +        }
>>>      }
>>>      ovs_mutex_unlock(&dev->mutex);
>>>  
>>> @@ -1898,6 +1918,7 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args,
>>>          {RTE_FC_RX_PAUSE, RTE_FC_FULL    }
>>>      };
>>>      const char *new_devargs;
>>> +    const char *vf_mac;
>>>      int err = 0;
>>>  
>>>      ovs_mutex_lock(&dpdk_mutex);
>>> @@ -1968,6 +1989,36 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args,
>>>          goto out;
>>>      }
>>>  
>>> +    vf_mac = smap_get(args, "dpdk-vf-mac");
>>> +    if (vf_mac) {
>>> +        struct eth_addr mac;
>>> +
>>> +        err = EINVAL;
>>> +
>>> +        if (!dpdk_port_is_representor(dev)) {
>>> +            VLOG_ERR_BUF(errp, "'%s' is trying to set the VF MAC '%s' "
>>> +                         "but 'options:dpdk-vf-mac' is only supported for "
>>> +                         "VF representors.",
>>> +                         netdev_get_name(netdev), vf_mac);
>>
>> Warnings here seems more in line with the other configs that cannot be
>> applied.
> 
> I agree, if it's not a representor we could just issue a warning and continue.
> 
>>
>>> +            goto out;
>>> +        }
>>> +        if (!eth_addr_from_string(vf_mac, &mac)) {
>>> +            VLOG_ERR_BUF(errp, "interface '%s': cannot parse VF MAC '%s'",
>>> +                         netdev_get_name(netdev), vf_mac);
>>
>> as above
>>
>>> +            goto out;
>>> +        }
>>> +        if (eth_addr_is_multicast(mac)) {
>>> +            VLOG_ERR_BUF(errp,
>>> +                         "interface '%s': cannot set VF MAC to multicast "
>>> +                         "address", netdev_get_name(netdev));
>>
>> as above
>>
>>> +            goto out;
>>> +        }
>>> +        if (!eth_addr_equals(dev->requested_hwaddr, mac)) {
>>> +            dev->requested_hwaddr = mac;
>>> +            netdev_request_reconfigure(netdev);
>>> +        }
>>> +    }
>>> +
>>>      lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", false);
>>>      if (dev->requested_lsc_interrupt_mode != lsc_interrupt_mode) {
>>>          dev->requested_lsc_interrupt_mode = lsc_interrupt_mode;
>>> @@ -4938,6 +4989,7 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>>>          && dev->lsc_interrupt_mode == dev->requested_lsc_interrupt_mode
>>>          && dev->rxq_size == dev->requested_rxq_size
>>>          && dev->txq_size == dev->requested_txq_size
>>> +        && eth_addr_equals(dev->hwaddr, dev->requested_hwaddr)
>>>          && dev->socket_id == dev->requested_socket_id
>>>          && dev->started && !dev->reset_needed) {
>>>          /* Reconfiguration is unnecessary */
>>> @@ -4969,6 +5021,14 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>>>      dev->txq_size = dev->requested_txq_size;
>>>  
>>>      rte_free(dev->tx_q);
>>> +
>>> +    if (!eth_addr_equals(dev->hwaddr, dev->requested_hwaddr)) {
>>
>> Looks like if dpdk-vf-mac is not set, after initialization there will be
>> a configured hwaddr, and requested_hwaddr will be all zeros.
>>
>> If this function is called because of some other config change, then
>> hwaddr !=  requested_hwaddr, and it will attempt to set the mac to the
>> requested_hwaddr of all zeros. This may also fail causing the
>> reconfigure to fail for other configs items.
>>
>> (I admit, I hacked a phy port dev_info to say it was a representor, when
>> playing with this but the issue seems valid)
>>
>>
>>> +        err = netdev_dpdk_set_etheraddr__(dev, dev->requested_hwaddr);
>>> +        if (err) {
>>> +            goto out;
>>> +        }
>>> +    }
>>> +
>>>      err = dpdk_eth_dev_init(dev);
>>>      if (dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD) {
>>>          netdev->ol_flags |= NETDEV_TX_OFFLOAD_TCP_TSO;
>>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
>>> index 81c84927f..c0bf03c95 100644
>>> --- a/vswitchd/vswitch.xml
>>> +++ b/vswitchd/vswitch.xml
>>> @@ -3280,6 +3280,19 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
>>>            descriptors will be used by default.
>>>          </p>
>>>        </column>
>>> +
>>> +      <column name="options" key="dpdk-vf-mac">
>>> +        <p>
>>> +          Ethernet address to set for this VF interface.  If unset then the
>>> +          default MAC address is used:
>>> +        </p>
>>> +        <ul>
>>> +          <li>For most drivers, the default MAC address assigned by
>>> +          their hardware.</li><li>For bifurcated drivers, the MAC currently
>>> +          used by the kernel netdevice.</li>
>>
>> nit: the formatting is a little different compared to the other
>> <li></li> usage, newline/indent etc.
>>
>> thanks,
>> Kevin.
>>
>>> +        </ul>
>>> +        <p>This option may only be used with dpdk VF representors.</p>
>>> +      </column>
>>>      </group>
>>>  
>>>      <group title="EMC (Exact Match Cache) Configuration">
>>>
>>
>>
>>
>>
>>
>
diff mbox series

Patch

diff --git a/Documentation/topics/dpdk/phy.rst b/Documentation/topics/dpdk/phy.rst
index 38e52c8de..73dcb7c28 100644
--- a/Documentation/topics/dpdk/phy.rst
+++ b/Documentation/topics/dpdk/phy.rst
@@ -379,6 +379,28 @@  an eth device whose mac address is ``00:11:22:33:44:55``::
     $ ovs-vsctl add-port br0 dpdk-mac -- set Interface dpdk-mac type=dpdk \
        options:dpdk-devargs="class=eth,mac=00:11:22:33:44:55"
 
+Representor specific configuration
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In some topologies, a VF must be configured before being assigned to a
+guest (VM) machine.  This configuration is done through VF-specific fields
+in the ``options`` column of the Interface table.
+
+.. important::
+
+   Some DPDK port use `bifurcated drivers <bifurcated-drivers>`__,
+   which means that a kernel netdevice remains when Open vSwitch is stopped.
+
+   In such case, any configuration applied to a VF would remain set on the
+   kernel netdevice, and be inherited from it when Open vSwitch is restarted,
+   even if the options described in this section are unset from Open vSwitch.
+
+.. _bifurcated-drivers: http://doc.dpdk.org/guides/linux_gsg/linux_drivers.html#bifurcated-driver
+
+- Configure the VF MAC address::
+
+    $ ovs-vsctl set Interface dpdk-rep0 options:dpdk-vf-mac=00:11:22:33:44:55
+
 Jumbo Frames
 ------------
 
diff --git a/NEWS b/NEWS
index 2f67d5047..5ece29474 100644
--- a/NEWS
+++ b/NEWS
@@ -24,6 +24,8 @@  v2.14.0 - 17 Aug 2020
        to list and change log levels in DPDK components.
      * Vhost-user Dequeue zero-copy support is deprecated and will be removed
        in the next release.
+     * New 'options:dpdk-vf-mac' field for DPDK interface of VF ports,
+       that allows configuring the MAC address of a VF representor.
    - Linux datapath:
      * Support for kernel versions up to 5.5.x.
    - AF_XDP:
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 4ef7f78a6..060348369 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -523,6 +523,9 @@  struct netdev_dpdk {
          * otherwise interrupt mode is used. */
         bool requested_lsc_interrupt_mode;
         bool lsc_interrupt_mode;
+
+        /* VF configuration. */
+        struct eth_addr requested_hwaddr;
     );
 
     PADDED_MEMBERS(CACHE_LINE_SIZE,
@@ -1685,6 +1688,16 @@  out:
     return ret;
 }
 
+static bool
+dpdk_port_is_representor(struct netdev_dpdk *dev)
+    OVS_REQUIRES(dev->mutex)
+{
+    struct rte_eth_dev_info dev_info;
+
+    rte_eth_dev_info_get(dev->port_id, &dev_info);
+    return (*dev_info.dev_flags) & RTE_ETH_DEV_REPRESENTOR;
+}
+
 static int
 netdev_dpdk_get_config(const struct netdev *netdev, struct smap *args)
 {
@@ -1719,6 +1732,13 @@  netdev_dpdk_get_config(const struct netdev *netdev, struct smap *args)
         }
         smap_add(args, "lsc_interrupt_mode",
                  dev->lsc_interrupt_mode ? "true" : "false");
+
+        if (dpdk_port_is_representor(dev)) {
+            smap_add_format(args, "requested_hwaddr", ETH_ADDR_FMT,
+                            ETH_ADDR_ARGS(dev->requested_hwaddr));
+            smap_add_format(args, "configured_hwaddr", ETH_ADDR_FMT,
+                            ETH_ADDR_ARGS(dev->hwaddr));
+        }
     }
     ovs_mutex_unlock(&dev->mutex);
 
@@ -1898,6 +1918,7 @@  netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args,
         {RTE_FC_RX_PAUSE, RTE_FC_FULL    }
     };
     const char *new_devargs;
+    const char *vf_mac;
     int err = 0;
 
     ovs_mutex_lock(&dpdk_mutex);
@@ -1968,6 +1989,36 @@  netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args,
         goto out;
     }
 
+    vf_mac = smap_get(args, "dpdk-vf-mac");
+    if (vf_mac) {
+        struct eth_addr mac;
+
+        err = EINVAL;
+
+        if (!dpdk_port_is_representor(dev)) {
+            VLOG_ERR_BUF(errp, "'%s' is trying to set the VF MAC '%s' "
+                         "but 'options:dpdk-vf-mac' is only supported for "
+                         "VF representors.",
+                         netdev_get_name(netdev), vf_mac);
+            goto out;
+        }
+        if (!eth_addr_from_string(vf_mac, &mac)) {
+            VLOG_ERR_BUF(errp, "interface '%s': cannot parse VF MAC '%s'",
+                         netdev_get_name(netdev), vf_mac);
+            goto out;
+        }
+        if (eth_addr_is_multicast(mac)) {
+            VLOG_ERR_BUF(errp,
+                         "interface '%s': cannot set VF MAC to multicast "
+                         "address", netdev_get_name(netdev));
+            goto out;
+        }
+        if (!eth_addr_equals(dev->requested_hwaddr, mac)) {
+            dev->requested_hwaddr = mac;
+            netdev_request_reconfigure(netdev);
+        }
+    }
+
     lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", false);
     if (dev->requested_lsc_interrupt_mode != lsc_interrupt_mode) {
         dev->requested_lsc_interrupt_mode = lsc_interrupt_mode;
@@ -4938,6 +4989,7 @@  netdev_dpdk_reconfigure(struct netdev *netdev)
         && dev->lsc_interrupt_mode == dev->requested_lsc_interrupt_mode
         && dev->rxq_size == dev->requested_rxq_size
         && dev->txq_size == dev->requested_txq_size
+        && eth_addr_equals(dev->hwaddr, dev->requested_hwaddr)
         && dev->socket_id == dev->requested_socket_id
         && dev->started && !dev->reset_needed) {
         /* Reconfiguration is unnecessary */
@@ -4969,6 +5021,14 @@  netdev_dpdk_reconfigure(struct netdev *netdev)
     dev->txq_size = dev->requested_txq_size;
 
     rte_free(dev->tx_q);
+
+    if (!eth_addr_equals(dev->hwaddr, dev->requested_hwaddr)) {
+        err = netdev_dpdk_set_etheraddr__(dev, dev->requested_hwaddr);
+        if (err) {
+            goto out;
+        }
+    }
+
     err = dpdk_eth_dev_init(dev);
     if (dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD) {
         netdev->ol_flags |= NETDEV_TX_OFFLOAD_TCP_TSO;
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 81c84927f..c0bf03c95 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -3280,6 +3280,19 @@  ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
           descriptors will be used by default.
         </p>
       </column>
+
+      <column name="options" key="dpdk-vf-mac">
+        <p>
+          Ethernet address to set for this VF interface.  If unset then the
+          default MAC address is used:
+        </p>
+        <ul>
+          <li>For most drivers, the default MAC address assigned by
+          their hardware.</li><li>For bifurcated drivers, the MAC currently
+          used by the kernel netdevice.</li>
+        </ul>
+        <p>This option may only be used with dpdk VF representors.</p>
+      </column>
     </group>
 
     <group title="EMC (Exact Match Cache) Configuration">