diff mbox series

[ovs-dev,v3] netdev-dpdk: add support for the RTE_ETH_EVENT_INTR_RESET event

Message ID 20190911131907.180147.7202.stgit@netdev64
State Superseded
Headers show
Series [ovs-dev,v3] netdev-dpdk: add support for the RTE_ETH_EVENT_INTR_RESET event | expand

Commit Message

Eelco Chaudron Sept. 11, 2019, 1:20 p.m. UTC
Currently, OVS does not register and therefore not handle the
interface reset event from the DPDK framework. This would cause a
problem in cases where a VF is used as an interface, and its
configuration changes.

As an example in the following scenario the MAC change is not
detected/acted upon until OVS is restarted without the patch applied:

  $ echo 1 > /sys/bus/pci/devices/0000:05:00.1/sriov_numvfs
  $ ovs-vsctl add-port ovs_pvp_br0 dpdk0 -- \
            set Interface dpdk0 type=dpdk -- \
            set Interface dpdk0 options:dpdk-devargs=0000:05:0a.0

  $ ip link set p5p2 vf 0 mac 52:54:00:92:d3:33

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
---
v2 -> v3:
v1 -> v2:
  Fixed Ilya's comments

 lib/netdev-dpdk.c |   53 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 51 insertions(+), 2 deletions(-)

Comments

Ilya Maximets Sept. 12, 2019, 8:39 a.m. UTC | #1
On 11.09.2019 16:20, Eelco Chaudron wrote:
> Currently, OVS does not register and therefore not handle the
> interface reset event from the DPDK framework. This would cause a
> problem in cases where a VF is used as an interface, and its
> configuration changes.
> 
> As an example in the following scenario the MAC change is not
> detected/acted upon until OVS is restarted without the patch applied:
> 
>   $ echo 1 > /sys/bus/pci/devices/0000:05:00.1/sriov_numvfs
>   $ ovs-vsctl add-port ovs_pvp_br0 dpdk0 -- \
>             set Interface dpdk0 type=dpdk -- \
>             set Interface dpdk0 options:dpdk-devargs=0000:05:0a.0
> 
>   $ ip link set p5p2 vf 0 mac 52:54:00:92:d3:33
> 
> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
> ---

<snip>

> @@ -4180,13 +4223,19 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>          && dev->rxq_size == dev->requested_rxq_size
>          && dev->txq_size == dev->requested_txq_size
>          && dev->socket_id == dev->requested_socket_id
> -        && dev->started) {
> +        && dev->started && !dev->reset_needed) {
>          /* Reconfiguration is unnecessary */
>  
>          goto out;
>      }
>  
> -    rte_eth_dev_stop(dev->port_id);
> +    if (dev->reset_needed) {
> +        rte_eth_dev_reset(dev->port_id);

Thinking more about the change and looking at flow control configuration,
it seems that on reset we'll lost configurations done in set_config().
Device reset should return it to initial state, i.e. all the default settings,
but set_config() will not be called after that.
I know, that VFs commonly doesn't support flow control, but if we'll add like
real configuration of MAC address, it will be lost too.

What do you think?

BTW, there is a discussion about flow control configuration:
https://patchwork.ozlabs.org/patch/1159689/

> +        dev->reset_needed = false;
> +    } else {
> +        rte_eth_dev_stop(dev->port_id);
> +    }
> +
>      dev->started = false;
>  
>      err = netdev_dpdk_mempool_configure(dev);
Eelco Chaudron Sept. 12, 2019, 10:07 a.m. UTC | #2
On 12 Sep 2019, at 10:39, Ilya Maximets wrote:

> On 11.09.2019 16:20, Eelco Chaudron wrote:
>> Currently, OVS does not register and therefore not handle the
>> interface reset event from the DPDK framework. This would cause a
>> problem in cases where a VF is used as an interface, and its
>> configuration changes.
>>
>> As an example in the following scenario the MAC change is not
>> detected/acted upon until OVS is restarted without the patch applied:
>>
>>   $ echo 1 > /sys/bus/pci/devices/0000:05:00.1/sriov_numvfs
>>   $ ovs-vsctl add-port ovs_pvp_br0 dpdk0 -- \
>>             set Interface dpdk0 type=dpdk -- \
>>             set Interface dpdk0 options:dpdk-devargs=0000:05:0a.0
>>
>>   $ ip link set p5p2 vf 0 mac 52:54:00:92:d3:33
>>
>> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
>> ---
>
> <snip>
>
>> @@ -4180,13 +4223,19 @@ netdev_dpdk_reconfigure(struct netdev 
>> *netdev)
>>          && dev->rxq_size == dev->requested_rxq_size
>>          && dev->txq_size == dev->requested_txq_size
>>          && dev->socket_id == dev->requested_socket_id
>> -        && dev->started) {
>> +        && dev->started && !dev->reset_needed) {
>>          /* Reconfiguration is unnecessary */
>>
>>          goto out;
>>      }
>>
>> -    rte_eth_dev_stop(dev->port_id);
>> +    if (dev->reset_needed) {
>> +        rte_eth_dev_reset(dev->port_id);
>
> Thinking more about the change and looking at flow control 
> configuration,
> it seems that on reset we'll lost configurations done in set_config().
> Device reset should return it to initial state, i.e. all the default 
> settings,
> but set_config() will not be called after that.
> I know, that VFs commonly doesn't support flow control, but if we'll 
> add like
> real configuration of MAC address, it will be lost too.
>
> What do you think?

Doesn’t the full bridge run sequence take care of this?

So in callback we do netdev_request_reconfigure() which triggers the 
following sequence…

bridge_run__()
  ...
    dpif_netdev_run()
      ...
        reconfigure_datapath()
          ...
            netdev_dpdk_reconfigure()

But than also it will call in the next run

bridge_run()
  ...
    bridge_delete_or_reconfigure_ports()
      ...
         netdev_set_config()
           netdev_dpdk_set_config()


Or do I miss something?

>
> BTW, there is a discussion about flow control configuration:
> https://patchwork.ozlabs.org/patch/1159689/
>
>> +        dev->reset_needed = false;
>> +    } else {
>> +        rte_eth_dev_stop(dev->port_id);
>> +    }
>> +
>>      dev->started = false;
>>
>>      err = netdev_dpdk_mempool_configure(dev);
Ilya Maximets Sept. 12, 2019, 10:19 a.m. UTC | #3
On 12.09.2019 13:07, Eelco Chaudron wrote:
> 
> 
> On 12 Sep 2019, at 10:39, Ilya Maximets wrote:
> 
>> On 11.09.2019 16:20, Eelco Chaudron wrote:
>>> Currently, OVS does not register and therefore not handle the
>>> interface reset event from the DPDK framework. This would cause a
>>> problem in cases where a VF is used as an interface, and its
>>> configuration changes.
>>>
>>> As an example in the following scenario the MAC change is not
>>> detected/acted upon until OVS is restarted without the patch applied:
>>>
>>>   $ echo 1 > /sys/bus/pci/devices/0000:05:00.1/sriov_numvfs
>>>   $ ovs-vsctl add-port ovs_pvp_br0 dpdk0 -- \
>>>             set Interface dpdk0 type=dpdk -- \
>>>             set Interface dpdk0 options:dpdk-devargs=0000:05:0a.0
>>>
>>>   $ ip link set p5p2 vf 0 mac 52:54:00:92:d3:33
>>>
>>> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
>>> ---
>>
>> <snip>
>>
>>> @@ -4180,13 +4223,19 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>>>          && dev->rxq_size == dev->requested_rxq_size
>>>          && dev->txq_size == dev->requested_txq_size
>>>          && dev->socket_id == dev->requested_socket_id
>>> -        && dev->started) {
>>> +        && dev->started && !dev->reset_needed) {
>>>          /* Reconfiguration is unnecessary */
>>>
>>>          goto out;
>>>      }
>>>
>>> -    rte_eth_dev_stop(dev->port_id);
>>> +    if (dev->reset_needed) {
>>> +        rte_eth_dev_reset(dev->port_id);
>>
>> Thinking more about the change and looking at flow control configuration,
>> it seems that on reset we'll lost configurations done in set_config().
>> Device reset should return it to initial state, i.e. all the default settings,
>> but set_config() will not be called after that.
>> I know, that VFs commonly doesn't support flow control, but if we'll add like
>> real configuration of MAC address, it will be lost too.
>>
>> What do you think?
> 
> Doesn’t the full bridge run sequence take care of this?
> 
> So in callback we do netdev_request_reconfigure() which triggers the following sequence…
> 
> bridge_run__()
>  ...
>    dpif_netdev_run()
>      ...
>        reconfigure_datapath()
>          ...
>            netdev_dpdk_reconfigure()
> 
> But than also it will call in the next run
> 
> bridge_run()
>  ...
>    bridge_delete_or_reconfigure_ports()
>      ...
>         netdev_set_config()
>           netdev_dpdk_set_config()
> 
> 
> Or do I miss something?

dpif_netdev_run() is called from ofproto_type_run() which is called
unconditionally from the bridge_run().
But bridge_delete_or_reconfigure_ports() only called from bridge_reconfigure(),
which is protected by the following condition:

    if (ovsdb_idl_get_seqno(idl) != idl_seqno ||
        if_notifier_changed(ifnotifier)) {

i.e. if there will be no DB updates or there will be no netlink notifications,
set_config() will not be called. I understand that in your scenario there
might be netlink notification for interface changes since PF is controlled
by the kernel, but it might be not the case if PF attached as a DPDK port
to OVS or some other application.

> 
>>
>> BTW, there is a discussion about flow control configuration:
>> https://patchwork.ozlabs.org/patch/1159689/
>>
>>> +        dev->reset_needed = false;
>>> +    } else {
>>> +        rte_eth_dev_stop(dev->port_id);
>>> +    }
>>> +
>>>      dev->started = false;
>>>
>>>      err = netdev_dpdk_mempool_configure(dev);
> 
>
Ilya Maximets Sept. 12, 2019, 10:24 a.m. UTC | #4
On 12.09.2019 13:19, Ilya Maximets wrote:
> On 12.09.2019 13:07, Eelco Chaudron wrote:
>>
>>
>> On 12 Sep 2019, at 10:39, Ilya Maximets wrote:
>>
>>> On 11.09.2019 16:20, Eelco Chaudron wrote:
>>>> Currently, OVS does not register and therefore not handle the
>>>> interface reset event from the DPDK framework. This would cause a
>>>> problem in cases where a VF is used as an interface, and its
>>>> configuration changes.
>>>>
>>>> As an example in the following scenario the MAC change is not
>>>> detected/acted upon until OVS is restarted without the patch applied:
>>>>
>>>>   $ echo 1 > /sys/bus/pci/devices/0000:05:00.1/sriov_numvfs
>>>>   $ ovs-vsctl add-port ovs_pvp_br0 dpdk0 -- \
>>>>             set Interface dpdk0 type=dpdk -- \
>>>>             set Interface dpdk0 options:dpdk-devargs=0000:05:0a.0
>>>>
>>>>   $ ip link set p5p2 vf 0 mac 52:54:00:92:d3:33
>>>>
>>>> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
>>>> ---
>>>
>>> <snip>
>>>
>>>> @@ -4180,13 +4223,19 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>>>>          && dev->rxq_size == dev->requested_rxq_size
>>>>          && dev->txq_size == dev->requested_txq_size
>>>>          && dev->socket_id == dev->requested_socket_id
>>>> -        && dev->started) {
>>>> +        && dev->started && !dev->reset_needed) {
>>>>          /* Reconfiguration is unnecessary */
>>>>
>>>>          goto out;
>>>>      }
>>>>
>>>> -    rte_eth_dev_stop(dev->port_id);
>>>> +    if (dev->reset_needed) {
>>>> +        rte_eth_dev_reset(dev->port_id);
>>>
>>> Thinking more about the change and looking at flow control configuration,
>>> it seems that on reset we'll lost configurations done in set_config().
>>> Device reset should return it to initial state, i.e. all the default settings,
>>> but set_config() will not be called after that.
>>> I know, that VFs commonly doesn't support flow control, but if we'll add like
>>> real configuration of MAC address, it will be lost too.
>>>
>>> What do you think?
>>
>> Doesn’t the full bridge run sequence take care of this?
>>
>> So in callback we do netdev_request_reconfigure() which triggers the following sequence…
>>
>> bridge_run__()
>>  ...
>>    dpif_netdev_run()
>>      ...
>>        reconfigure_datapath()
>>          ...
>>            netdev_dpdk_reconfigure()
>>
>> But than also it will call in the next run
>>
>> bridge_run()
>>  ...
>>    bridge_delete_or_reconfigure_ports()
>>      ...
>>         netdev_set_config()
>>           netdev_dpdk_set_config()
>>
>>
>> Or do I miss something?
> 
> dpif_netdev_run() is called from ofproto_type_run() which is called
> unconditionally from the bridge_run().
> But bridge_delete_or_reconfigure_ports() only called from bridge_reconfigure(),
> which is protected by the following condition:
> 
>     if (ovsdb_idl_get_seqno(idl) != idl_seqno ||
>         if_notifier_changed(ifnotifier)) {
> 
> i.e. if there will be no DB updates or there will be no netlink notifications,
> set_config() will not be called. I understand that in your scenario there
> might be netlink notification for interface changes since PF is controlled
> by the kernel, but it might be not the case if PF attached as a DPDK port
> to OVS or some other application.

Hmm. Basically, dpdk_eth_event_callback() is an analogue of the
if_notifier, but for DPDK application.

Maybe we can add it as another trigger for above 'if' condition?

> 
>>
>>>
>>> BTW, there is a discussion about flow control configuration:
>>> https://patchwork.ozlabs.org/patch/1159689/
>>>
>>>> +        dev->reset_needed = false;
>>>> +    } else {
>>>> +        rte_eth_dev_stop(dev->port_id);
>>>> +    }
>>>> +
>>>>      dev->started = false;
>>>>
>>>>      err = netdev_dpdk_mempool_configure(dev);
>>
>>
> 
>
Eelco Chaudron Sept. 24, 2019, 1:15 p.m. UTC | #5
On 12 Sep 2019, at 12:24, Ilya Maximets wrote:

> On 12.09.2019 13:19, Ilya Maximets wrote:
>> On 12.09.2019 13:07, Eelco Chaudron wrote:
>>>
>>>
>>> On 12 Sep 2019, at 10:39, Ilya Maximets wrote:
>>>
>>>> On 11.09.2019 16:20, Eelco Chaudron wrote:
>>>>> Currently, OVS does not register and therefore not handle the
>>>>> interface reset event from the DPDK framework. This would cause a
>>>>> problem in cases where a VF is used as an interface, and its
>>>>> configuration changes.
>>>>>
>>>>> As an example in the following scenario the MAC change is not
>>>>> detected/acted upon until OVS is restarted without the patch 
>>>>> applied:
>>>>>
>>>>>   $ echo 1 > /sys/bus/pci/devices/0000:05:00.1/sriov_numvfs
>>>>>   $ ovs-vsctl add-port ovs_pvp_br0 dpdk0 -- \
>>>>>             set Interface dpdk0 type=dpdk -- \
>>>>>             set Interface dpdk0 
>>>>> options:dpdk-devargs=0000:05:0a.0
>>>>>
>>>>>   $ ip link set p5p2 vf 0 mac 52:54:00:92:d3:33
>>>>>
>>>>> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
>>>>> ---
>>>>
>>>> <snip>
>>>>
>>>>> @@ -4180,13 +4223,19 @@ netdev_dpdk_reconfigure(struct netdev 
>>>>> *netdev)
>>>>>          && dev->rxq_size == dev->requested_rxq_size
>>>>>          && dev->txq_size == dev->requested_txq_size
>>>>>          && dev->socket_id == dev->requested_socket_id
>>>>> -        && dev->started) {
>>>>> +        && dev->started && !dev->reset_needed) {
>>>>>          /* Reconfiguration is unnecessary */
>>>>>
>>>>>          goto out;
>>>>>      }
>>>>>
>>>>> -    rte_eth_dev_stop(dev->port_id);
>>>>> +    if (dev->reset_needed) {
>>>>> +        rte_eth_dev_reset(dev->port_id);
>>>>
>>>> Thinking more about the change and looking at flow control 
>>>> configuration,
>>>> it seems that on reset we'll lost configurations done in 
>>>> set_config().
>>>> Device reset should return it to initial state, i.e. all the 
>>>> default settings,
>>>> but set_config() will not be called after that.
>>>> I know, that VFs commonly doesn't support flow control, but if 
>>>> we'll add like
>>>> real configuration of MAC address, it will be lost too.
>>>>
>>>> What do you think?
>>>
>>> Doesn’t the full bridge run sequence take care of this?
>>>
>>> So in callback we do netdev_request_reconfigure() which triggers the 
>>> following sequence…
>>>
>>> bridge_run__()
>>>  ...
>>>    dpif_netdev_run()
>>>      ...
>>>        reconfigure_datapath()
>>>          ...
>>>            netdev_dpdk_reconfigure()
>>>
>>> But than also it will call in the next run
>>>
>>> bridge_run()
>>>  ...
>>>    bridge_delete_or_reconfigure_ports()
>>>      ...
>>>         netdev_set_config()
>>>           netdev_dpdk_set_config()
>>>
>>>
>>> Or do I miss something?
>>
>> dpif_netdev_run() is called from ofproto_type_run() which is called
>> unconditionally from the bridge_run().
>> But bridge_delete_or_reconfigure_ports() only called from 
>> bridge_reconfigure(),
>> which is protected by the following condition:
>>
>>     if (ovsdb_idl_get_seqno(idl) != idl_seqno ||
>>         if_notifier_changed(ifnotifier)) {
>>
>> i.e. if there will be no DB updates or there will be no netlink 
>> notifications,
>> set_config() will not be called. I understand that in your scenario 
>> there
>> might be netlink notification for interface changes since PF is 
>> controlled
>> by the kernel, but it might be not the case if PF attached as a DPDK 
>> port
>> to OVS or some other application.
>
> Hmm. Basically, dpdk_eth_event_callback() is an analogue of the
> if_notifier, but for DPDK application.
>
> Maybe we can add it as another trigger for above 'if' condition?

It sounds like a good idea, however, I see no clean way of doing this as 
the if_notifier_changed()/if_notifier_create() is not data path 
abstracted.
Any ideas for a nice clean solution here? Guess we could add a 
dpdk_notifier_create() in if-notifier.c

Thoughts?
Ilya Maximets Sept. 26, 2019, 10:30 a.m. UTC | #6
On 24.09.2019 16:15, Eelco Chaudron wrote:
> 
> 
> On 12 Sep 2019, at 12:24, Ilya Maximets wrote:
> 
>> On 12.09.2019 13:19, Ilya Maximets wrote:
>>> On 12.09.2019 13:07, Eelco Chaudron wrote:
>>>>
>>>>
>>>> On 12 Sep 2019, at 10:39, Ilya Maximets wrote:
>>>>
>>>>> On 11.09.2019 16:20, Eelco Chaudron wrote:
>>>>>> Currently, OVS does not register and therefore not handle the
>>>>>> interface reset event from the DPDK framework. This would cause a
>>>>>> problem in cases where a VF is used as an interface, and its
>>>>>> configuration changes.
>>>>>>
>>>>>> As an example in the following scenario the MAC change is not
>>>>>> detected/acted upon until OVS is restarted without the patch applied:
>>>>>>
>>>>>>   $ echo 1 > /sys/bus/pci/devices/0000:05:00.1/sriov_numvfs
>>>>>>   $ ovs-vsctl add-port ovs_pvp_br0 dpdk0 -- \
>>>>>>             set Interface dpdk0 type=dpdk -- \
>>>>>>             set Interface dpdk0 options:dpdk-devargs=0000:05:0a.0
>>>>>>
>>>>>>   $ ip link set p5p2 vf 0 mac 52:54:00:92:d3:33
>>>>>>
>>>>>> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
>>>>>> ---
>>>>>
>>>>> <snip>
>>>>>
>>>>>> @@ -4180,13 +4223,19 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>>>>>>          && dev->rxq_size == dev->requested_rxq_size
>>>>>>          && dev->txq_size == dev->requested_txq_size
>>>>>>          && dev->socket_id == dev->requested_socket_id
>>>>>> -        && dev->started) {
>>>>>> +        && dev->started && !dev->reset_needed) {
>>>>>>          /* Reconfiguration is unnecessary */
>>>>>>
>>>>>>          goto out;
>>>>>>      }
>>>>>>
>>>>>> -    rte_eth_dev_stop(dev->port_id);
>>>>>> +    if (dev->reset_needed) {
>>>>>> +        rte_eth_dev_reset(dev->port_id);
>>>>>
>>>>> Thinking more about the change and looking at flow control configuration,
>>>>> it seems that on reset we'll lost configurations done in set_config().
>>>>> Device reset should return it to initial state, i.e. all the default settings,
>>>>> but set_config() will not be called after that.
>>>>> I know, that VFs commonly doesn't support flow control, but if we'll add like
>>>>> real configuration of MAC address, it will be lost too.
>>>>>
>>>>> What do you think?
>>>>
>>>> Doesn’t the full bridge run sequence take care of this?
>>>>
>>>> So in callback we do netdev_request_reconfigure() which triggers the following sequence…
>>>>
>>>> bridge_run__()
>>>>  ...
>>>>    dpif_netdev_run()
>>>>      ...
>>>>        reconfigure_datapath()
>>>>          ...
>>>>            netdev_dpdk_reconfigure()
>>>>
>>>> But than also it will call in the next run
>>>>
>>>> bridge_run()
>>>>  ...
>>>>    bridge_delete_or_reconfigure_ports()
>>>>      ...
>>>>         netdev_set_config()
>>>>           netdev_dpdk_set_config()
>>>>
>>>>
>>>> Or do I miss something?
>>>
>>> dpif_netdev_run() is called from ofproto_type_run() which is called
>>> unconditionally from the bridge_run().
>>> But bridge_delete_or_reconfigure_ports() only called from bridge_reconfigure(),
>>> which is protected by the following condition:
>>>
>>>     if (ovsdb_idl_get_seqno(idl) != idl_seqno ||
>>>         if_notifier_changed(ifnotifier)) {
>>>
>>> i.e. if there will be no DB updates or there will be no netlink notifications,
>>> set_config() will not be called. I understand that in your scenario there
>>> might be netlink notification for interface changes since PF is controlled
>>> by the kernel, but it might be not the case if PF attached as a DPDK port
>>> to OVS or some other application.
>>
>> Hmm. Basically, dpdk_eth_event_callback() is an analogue of the
>> if_notifier, but for DPDK application.
>>
>> Maybe we can add it as another trigger for above 'if' condition?
> 
> It sounds like a good idea, however, I see no clean way of doing this as the> if_notifier_changed()/if_notifier_create() is not data path abstracted.
> Any ideas for a nice clean solution here?
> Guess we could add a dpdk_notifier_create() in if-notifier.c
> 
> Thoughts?

I'm not sure how the clean solution could look like, but I'd like to
rise another issue closely related to this change.

The issue is an infinite re-addition of the failed ports.
This happens if the device in userspace datapath has a linux network
interface and it's not able to be configured. For example, if the first
reconfiguration fails because of misconfiguration.
In current code victims are afxdp ports and the mellanox NIC ports
opened by the DPDK due to their bifurcated drivers (It's unlikely for
usual netdev-linux ports to fail).

The root cause: Every change in the state of the network interface
of a linux kernel device generates if-notifier event and if-notifier
event triggers the OVS code to re-apply the configuration of ports,
i.e. add broken ports back. The most obvious part is that dpif-netdev
changes the device flags before trying to configure it:

    1. add_port()
    2. set_flags() --> if-notifier event
    3. reconfigure() --> port removal from the datapath due to
                         misconfiguration or any other issue in
                         the underlying device.
    4. setting flags back --> another if-notifier event.
    5. There was new if-notifier event?
       yes --> re-apply all settings. --> goto step 1.

We could move applying the netdev flags after the first successful
reconfiguration and this might fix part of the problem, but many
changes that we're applying on the reconfiguration phase could
generate if-notifier events too and we can't easily avoid that.

I think, if we'll add a dpdk_notifier, we'll have same issue for
the rest of DPDK port types.

Any thoughts?

BTW, I'm not against the dpdk_notifier and I think that it's a
right direction to move, but it's the right place and time to
highlight the issue we have.

Best regards, Ilya Maximets.
Eelco Chaudron Oct. 31, 2019, 10:16 a.m. UTC | #7
On 26 Sep 2019, at 12:30, Ilya Maximets wrote:

> On 24.09.2019 16:15, Eelco Chaudron wrote:
>>
>>
>> On 12 Sep 2019, at 12:24, Ilya Maximets wrote:
>>
>>> On 12.09.2019 13:19, Ilya Maximets wrote:
>>>> On 12.09.2019 13:07, Eelco Chaudron wrote:
>>>>>
>>>>>
>>>>> On 12 Sep 2019, at 10:39, Ilya Maximets wrote:
>>>>>
>>>>>> On 11.09.2019 16:20, Eelco Chaudron wrote:
>>>>>>> Currently, OVS does not register and therefore not handle the
>>>>>>> interface reset event from the DPDK framework. This would cause 
>>>>>>> a
>>>>>>> problem in cases where a VF is used as an interface, and its
>>>>>>> configuration changes.
>>>>>>>
>>>>>>> As an example in the following scenario the MAC change is not
>>>>>>> detected/acted upon until OVS is restarted without the patch 
>>>>>>> applied:
>>>>>>>
>>>>>>>   $ echo 1 > /sys/bus/pci/devices/0000:05:00.1/sriov_numvfs
>>>>>>>   $ ovs-vsctl add-port ovs_pvp_br0 dpdk0 -- \
>>>>>>>             set Interface dpdk0 type=dpdk -- \
>>>>>>>             set Interface dpdk0 
>>>>>>> options:dpdk-devargs=0000:05:0a.0
>>>>>>>
>>>>>>>   $ ip link set p5p2 vf 0 mac 52:54:00:92:d3:33
>>>>>>>
>>>>>>> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
>>>>>>> ---
>>>>>>
>>>>>> <snip>
>>>>>>
>>>>>>> @@ -4180,13 +4223,19 @@ netdev_dpdk_reconfigure(struct netdev 
>>>>>>> *netdev)
>>>>>>>          && dev->rxq_size == dev->requested_rxq_size
>>>>>>>          && dev->txq_size == dev->requested_txq_size
>>>>>>>          && dev->socket_id == dev->requested_socket_id
>>>>>>> -        && dev->started) {
>>>>>>> +        && dev->started && !dev->reset_needed) {
>>>>>>>          /* Reconfiguration is unnecessary */
>>>>>>>
>>>>>>>          goto out;
>>>>>>>      }
>>>>>>>
>>>>>>> -    rte_eth_dev_stop(dev->port_id);
>>>>>>> +    if (dev->reset_needed) {
>>>>>>> +        rte_eth_dev_reset(dev->port_id);
>>>>>>
>>>>>> Thinking more about the change and looking at flow control 
>>>>>> configuration,
>>>>>> it seems that on reset we'll lost configurations done in 
>>>>>> set_config().
>>>>>> Device reset should return it to initial state, i.e. all the 
>>>>>> default settings,
>>>>>> but set_config() will not be called after that.
>>>>>> I know, that VFs commonly doesn't support flow control, but if 
>>>>>> we'll add like
>>>>>> real configuration of MAC address, it will be lost too.
>>>>>>
>>>>>> What do you think?
>>>>>
>>>>> Doesn’t the full bridge run sequence take care of this?
>>>>>
>>>>> So in callback we do netdev_request_reconfigure() which triggers 
>>>>> the following sequence…
>>>>>
>>>>> bridge_run__()
>>>>>  ...
>>>>>    dpif_netdev_run()
>>>>>      ...
>>>>>        reconfigure_datapath()
>>>>>          ...
>>>>>            netdev_dpdk_reconfigure()
>>>>>
>>>>> But than also it will call in the next run
>>>>>
>>>>> bridge_run()
>>>>>  ...
>>>>>    bridge_delete_or_reconfigure_ports()
>>>>>      ...
>>>>>         netdev_set_config()
>>>>>           netdev_dpdk_set_config()
>>>>>
>>>>>
>>>>> Or do I miss something?
>>>>
>>>> dpif_netdev_run() is called from ofproto_type_run() which is called
>>>> unconditionally from the bridge_run().
>>>> But bridge_delete_or_reconfigure_ports() only called from 
>>>> bridge_reconfigure(),
>>>> which is protected by the following condition:
>>>>
>>>>     if (ovsdb_idl_get_seqno(idl) != idl_seqno ||
>>>>         if_notifier_changed(ifnotifier)) {
>>>>
>>>> i.e. if there will be no DB updates or there will be no netlink 
>>>> notifications,
>>>> set_config() will not be called. I understand that in your scenario 
>>>> there
>>>> might be netlink notification for interface changes since PF is 
>>>> controlled
>>>> by the kernel, but it might be not the case if PF attached as a 
>>>> DPDK port
>>>> to OVS or some other application.
>>>
>>> Hmm. Basically, dpdk_eth_event_callback() is an analogue of the
>>> if_notifier, but for DPDK application.
>>>
>>> Maybe we can add it as another trigger for above 'if' condition?
>>
>> It sounds like a good idea, however, I see no clean way of doing this 
>> as the> if_notifier_changed()/if_notifier_create() is not data path 
>> abstracted.
>> Any ideas for a nice clean solution here?
>> Guess we could add a dpdk_notifier_create() in if-notifier.c
>>
>> Thoughts?
>
> I'm not sure how the clean solution could look like, but I'd like to
> rise another issue closely related to this change.

I’ll send out a patch later today that will include a DPDK notifier as 
part of the if-notifier.c to take care of the set_config() part.

> The issue is an infinite re-addition of the failed ports.
> This happens if the device in userspace datapath has a linux network
> interface and it's not able to be configured. For example, if the 
> first
> reconfiguration fails because of misconfiguration.
> In current code victims are afxdp ports and the mellanox NIC ports
> opened by the DPDK due to their bifurcated drivers (It's unlikely for
> usual netdev-linux ports to fail).
>
> The root cause: Every change in the state of the network interface
> of a linux kernel device generates if-notifier event and if-notifier
> event triggers the OVS code to re-apply the configuration of ports,
> i.e. add broken ports back. The most obvious part is that dpif-netdev
> changes the device flags before trying to configure it:
>
>    1. add_port()
>    2. set_flags() --> if-notifier event
>    3. reconfigure() --> port removal from the datapath due to
>                         misconfiguration or any other issue in
>                         the underlying device.
>    4. setting flags back --> another if-notifier event.
>    5. There was new if-notifier event?
>       yes --> re-apply all settings. --> goto step 1.
>
> We could move applying the netdev flags after the first successful
> reconfiguration and this might fix part of the problem, but many
> changes that we're applying on the reconfiguration phase could
> generate if-notifier events too and we can't easily avoid that.
>
> I think, if we'll add a dpdk_notifier, we'll have same issue for
> the rest of DPDK port types.
>
> Any thoughts?


I’ve been thinking about this, but see no fix-all solution right now, 
it needs a bit more thinking.

As this is not something introduced by this patch, I’ll work on it 
separately. I’ve not been able to trigger it in the DPDK case, 
however, I can easily in the AF_XDP case.

//Eelco
Ilya Maximets Oct. 31, 2019, 10:49 a.m. UTC | #8
On 31.10.2019 11:16, Eelco Chaudron wrote:
> 
> 
> On 26 Sep 2019, at 12:30, Ilya Maximets wrote:
> 
>> On 24.09.2019 16:15, Eelco Chaudron wrote:
>>>
>>>
>>> On 12 Sep 2019, at 12:24, Ilya Maximets wrote:
>>>
>>>> On 12.09.2019 13:19, Ilya Maximets wrote:
>>>>> On 12.09.2019 13:07, Eelco Chaudron wrote:
>>>>>>
>>>>>>
>>>>>> On 12 Sep 2019, at 10:39, Ilya Maximets wrote:
>>>>>>
>>>>>>> On 11.09.2019 16:20, Eelco Chaudron wrote:
>>>>>>>> Currently, OVS does not register and therefore not handle the
>>>>>>>> interface reset event from the DPDK framework. This would cause a
>>>>>>>> problem in cases where a VF is used as an interface, and its
>>>>>>>> configuration changes.
>>>>>>>>
>>>>>>>> As an example in the following scenario the MAC change is not
>>>>>>>> detected/acted upon until OVS is restarted without the patch applied:
>>>>>>>>
>>>>>>>>   $ echo 1 > /sys/bus/pci/devices/0000:05:00.1/sriov_numvfs
>>>>>>>>   $ ovs-vsctl add-port ovs_pvp_br0 dpdk0 -- \
>>>>>>>>             set Interface dpdk0 type=dpdk -- \
>>>>>>>>             set Interface dpdk0 options:dpdk-devargs=0000:05:0a.0
>>>>>>>>
>>>>>>>>   $ ip link set p5p2 vf 0 mac 52:54:00:92:d3:33
>>>>>>>>
>>>>>>>> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
>>>>>>>> ---
>>>>>>>
>>>>>>> <snip>
>>>>>>>
>>>>>>>> @@ -4180,13 +4223,19 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>>>>>>>>          && dev->rxq_size == dev->requested_rxq_size
>>>>>>>>          && dev->txq_size == dev->requested_txq_size
>>>>>>>>          && dev->socket_id == dev->requested_socket_id
>>>>>>>> -        && dev->started) {
>>>>>>>> +        && dev->started && !dev->reset_needed) {
>>>>>>>>          /* Reconfiguration is unnecessary */
>>>>>>>>
>>>>>>>>          goto out;
>>>>>>>>      }
>>>>>>>>
>>>>>>>> -    rte_eth_dev_stop(dev->port_id);
>>>>>>>> +    if (dev->reset_needed) {
>>>>>>>> +        rte_eth_dev_reset(dev->port_id);
>>>>>>>
>>>>>>> Thinking more about the change and looking at flow control configuration,
>>>>>>> it seems that on reset we'll lost configurations done in set_config().
>>>>>>> Device reset should return it to initial state, i.e. all the default settings,
>>>>>>> but set_config() will not be called after that.
>>>>>>> I know, that VFs commonly doesn't support flow control, but if we'll add like
>>>>>>> real configuration of MAC address, it will be lost too.
>>>>>>>
>>>>>>> What do you think?
>>>>>>
>>>>>> Doesn’t the full bridge run sequence take care of this?
>>>>>>
>>>>>> So in callback we do netdev_request_reconfigure() which triggers the following sequence…
>>>>>>
>>>>>> bridge_run__()
>>>>>>  ...
>>>>>>    dpif_netdev_run()
>>>>>>      ...
>>>>>>        reconfigure_datapath()
>>>>>>          ...
>>>>>>            netdev_dpdk_reconfigure()
>>>>>>
>>>>>> But than also it will call in the next run
>>>>>>
>>>>>> bridge_run()
>>>>>>  ...
>>>>>>    bridge_delete_or_reconfigure_ports()
>>>>>>      ...
>>>>>>         netdev_set_config()
>>>>>>           netdev_dpdk_set_config()
>>>>>>
>>>>>>
>>>>>> Or do I miss something?
>>>>>
>>>>> dpif_netdev_run() is called from ofproto_type_run() which is called
>>>>> unconditionally from the bridge_run().
>>>>> But bridge_delete_or_reconfigure_ports() only called from bridge_reconfigure(),
>>>>> which is protected by the following condition:
>>>>>
>>>>>     if (ovsdb_idl_get_seqno(idl) != idl_seqno ||
>>>>>         if_notifier_changed(ifnotifier)) {
>>>>>
>>>>> i.e. if there will be no DB updates or there will be no netlink notifications,
>>>>> set_config() will not be called. I understand that in your scenario there
>>>>> might be netlink notification for interface changes since PF is controlled
>>>>> by the kernel, but it might be not the case if PF attached as a DPDK port
>>>>> to OVS or some other application.
>>>>
>>>> Hmm. Basically, dpdk_eth_event_callback() is an analogue of the
>>>> if_notifier, but for DPDK application.
>>>>
>>>> Maybe we can add it as another trigger for above 'if' condition?
>>>
>>> It sounds like a good idea, however, I see no clean way of doing this as the> if_notifier_changed()/if_notifier_create() is not data path abstracted.
>>> Any ideas for a nice clean solution here?
>>> Guess we could add a dpdk_notifier_create() in if-notifier.c
>>>
>>> Thoughts?
>>
>> I'm not sure how the clean solution could look like, but I'd like to
>> rise another issue closely related to this change.
> 
> I’ll send out a patch later today that will include a DPDK notifier as part of the if-notifier.c to take care of the set_config() part.
> 
>> The issue is an infinite re-addition of the failed ports.
>> This happens if the device in userspace datapath has a linux network
>> interface and it's not able to be configured. For example, if the first
>> reconfiguration fails because of misconfiguration.
>> In current code victims are afxdp ports and the mellanox NIC ports
>> opened by the DPDK due to their bifurcated drivers (It's unlikely for
>> usual netdev-linux ports to fail).
>>
>> The root cause: Every change in the state of the network interface
>> of a linux kernel device generates if-notifier event and if-notifier
>> event triggers the OVS code to re-apply the configuration of ports,
>> i.e. add broken ports back. The most obvious part is that dpif-netdev
>> changes the device flags before trying to configure it:
>>
>>    1. add_port()
>>    2. set_flags() --> if-notifier event
>>    3. reconfigure() --> port removal from the datapath due to
>>                         misconfiguration or any other issue in
>>                         the underlying device.
>>    4. setting flags back --> another if-notifier event.
>>    5. There was new if-notifier event?
>>       yes --> re-apply all settings. --> goto step 1.
>>
>> We could move applying the netdev flags after the first successful
>> reconfiguration and this might fix part of the problem, but many
>> changes that we're applying on the reconfiguration phase could
>> generate if-notifier events too and we can't easily avoid that.
>>
>> I think, if we'll add a dpdk_notifier, we'll have same issue for
>> the rest of DPDK port types.
>>
>> Any thoughts?
> 
> 
> I’ve been thinking about this, but see no fix-all solution right now, it needs a bit more thinking.
> 
> As this is not something introduced by this patch, I’ll work on it separately.
> I’ve not been able to trigger it in the DPDK case, however, I can easily in the AF_XDP case.

netdev-dpdk is more robust.  It validates most of the config options
and re-tries the configuration.  Configuration failures usually happens
because of the incorrect devargs or broken device/missing libs.

I'm working on making netdev-afxdp more error-proof, e.g. best-effort
xdp-mode choosing. Start from "drv+zc" and try just "drv" and subsequently
"skb" on failures. And we also need validation of n_rxq option to not
try to configure more than available.

Moving of flags' updating after the first reconfiguration solves
most of the obvious cases  with afxdp misconfigurations. I have
a patch for this. Will finalize it and send sometime soon.

Best regards, Ilya Maximets.
William Tu Oct. 31, 2019, 2:01 p.m. UTC | #9
> > The root cause: Every change in the state of the network interface
> > of a linux kernel device generates if-notifier event and if-notifier
> > event triggers the OVS code to re-apply the configuration of ports,
> > i.e. add broken ports back. The most obvious part is that dpif-netdev
> > changes the device flags before trying to configure it:
> >


> >    1. add_port()
> >    2. set_flags() --> if-notifier event
> >    3. reconfigure() --> port removal from the datapath due to
> >                         misconfiguration or any other issue in
> >                         the underlying device.
> >    4. setting flags back --> another if-notifier event.
> >    5. There was new if-notifier event?
> >       yes --> re-apply all settings. --> goto step 1.
> >

Hi Eelco and Ilya,

Is it better to save the current flags before step2, and
after setting flags back, at step 5, we know it's the same
as before, so we can skip re-applying all settings.

so
    1. add_port()
    2. save old_flags, set_flags() --> if-notifier event
    3. reconfigure() --> port removal from the datapath due to
                         misconfiguration or any other issue in
                         the underlying device.
    4. setting flags back as cur_flags --> another if-notifier event.
    5. There was new if-notifier event?
       yes && old_flags != cur_flags --> re-apply all settings. --> goto step 1.

Regards,
William
Ilya Maximets Oct. 31, 2019, 2:11 p.m. UTC | #10
On 31.10.2019 15:01, William Tu wrote:
>>> The root cause: Every change in the state of the network interface
>>> of a linux kernel device generates if-notifier event and if-notifier
>>> event triggers the OVS code to re-apply the configuration of ports,
>>> i.e. add broken ports back. The most obvious part is that dpif-netdev
>>> changes the device flags before trying to configure it:
>>>
> 
> 
>>>     1. add_port()
>>>     2. set_flags() --> if-notifier event
>>>     3. reconfigure() --> port removal from the datapath due to
>>>                          misconfiguration or any other issue in
>>>                          the underlying device.
>>>     4. setting flags back --> another if-notifier event.
>>>     5. There was new if-notifier event?
>>>        yes --> re-apply all settings. --> goto step 1.
>>>
> 
> Hi Eelco and Ilya,
> 
> Is it better to save the current flags before step2, and
> after setting flags back, at step 5, we know it's the same
> as before, so we can skip re-applying all settings.

It already works like this, i.e. flags are saved and restored,
but each netlink notification increments sequence number in
bridge.c and bridge.c decides to re-apply configs because sequence
number changed. Datapath doesn't have control over that.

The best we can do is to postpone flags update until we know that
reconfguration passed.

Best regards, Ilya Maximets.
diff mbox series

Patch

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 4805783..9747def 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -396,6 +396,8 @@  struct netdev_dpdk {
         bool attached;
         /* If true, rte_eth_dev_start() was successfully called */
         bool started;
+        bool reset_needed;
+        /* 1 pad byte here. */
         struct eth_addr hwaddr;
         int mtu;
         int socket_id;
@@ -1173,6 +1175,8 @@  common_construct(struct netdev *netdev, dpdk_port_t port_no,
     ovsrcu_index_init(&dev->vid, -1);
     dev->vhost_reconfigured = false;
     dev->attached = false;
+    dev->started = false;
+    dev->reset_needed = false;
 
     ovsrcu_init(&dev->qos_conf, NULL);
 
@@ -1769,6 +1773,34 @@  netdev_dpdk_process_devargs(struct netdev_dpdk *dev,
     return new_port_id;
 }
 
+static int
+dpdk_eth_event_callback(dpdk_port_t port_id, enum rte_eth_event_type type,
+                        void *param OVS_UNUSED, void *ret_param OVS_UNUSED)
+{
+    struct netdev_dpdk *dev;
+
+    switch ((int) type) {
+    case RTE_ETH_EVENT_INTR_RESET:
+        ovs_mutex_lock(&dpdk_mutex);
+        dev = netdev_dpdk_lookup_by_port_id(port_id);
+        if (dev) {
+            ovs_mutex_lock(&dev->mutex);
+            dev->reset_needed = true;
+            netdev_request_reconfigure(&dev->up);
+            VLOG_DBG_RL(&rl, "%s: Device reset requested.",
+                        netdev_get_name(&dev->up));
+            ovs_mutex_unlock(&dev->mutex);
+        }
+        ovs_mutex_unlock(&dpdk_mutex);
+        break;
+
+    default:
+        /* Ignore all other types. */
+        break;
+   }
+   return 0;
+}
+
 static void
 dpdk_set_rxq_config(struct netdev_dpdk *dev, const struct smap *args)
     OVS_REQUIRES(dev->mutex)
@@ -3807,6 +3839,8 @@  netdev_dpdk_class_init(void)
     /* This function can be called for different classes.  The initialization
      * needs to be done only once */
     if (ovsthread_once_start(&once)) {
+        int ret;
+
         ovs_thread_create("dpdk_watchdog", dpdk_watchdog, NULL);
         unixctl_command_register("netdev-dpdk/set-admin-state",
                                  "[netdev] up|down", 1, 2,
@@ -3820,6 +3854,15 @@  netdev_dpdk_class_init(void)
                                  "[netdev]", 0, 1,
                                  netdev_dpdk_get_mempool_info, NULL);
 
+        ret = rte_eth_dev_callback_register(RTE_ETH_ALL,
+                                            RTE_ETH_EVENT_INTR_RESET,
+                                            dpdk_eth_event_callback, NULL);
+
+        if (ret != 0) {
+            VLOG_ERR("Ethernet device callback register error: %s",
+                     rte_strerror(-ret));
+        }
+
         ovsthread_once_done(&once);
     }
 
@@ -4180,13 +4223,19 @@  netdev_dpdk_reconfigure(struct netdev *netdev)
         && dev->rxq_size == dev->requested_rxq_size
         && dev->txq_size == dev->requested_txq_size
         && dev->socket_id == dev->requested_socket_id
-        && dev->started) {
+        && dev->started && !dev->reset_needed) {
         /* Reconfiguration is unnecessary */
 
         goto out;
     }
 
-    rte_eth_dev_stop(dev->port_id);
+    if (dev->reset_needed) {
+        rte_eth_dev_reset(dev->port_id);
+        dev->reset_needed = false;
+    } else {
+        rte_eth_dev_stop(dev->port_id);
+    }
+
     dev->started = false;
 
     err = netdev_dpdk_mempool_configure(dev);