[ovs-dev,v6] Configurable Link State Change (LSC) detection mode

Message ID HE1PR07MB121289B1519335088EBFAA4DEDDA0@HE1PR07MB1212.eurprd07.prod.outlook.com
State Changes Requested
Delegated to: Ian Stokes
Headers show
Series
  • [ovs-dev,v6] Configurable Link State Change (LSC) detection mode
Related show

Commit Message

Róbert Mulik March 5, 2018, 3:16 p.m.
It is possible to change LSC detection mode to polling or interrupt mode
for DPDK interfaces. The default is polling mode. To set interrupt mode,
option dpdk-lsc-interrupt has to be set to true.

In polling mode more processor time is needed, since the OVS repeatedly reads
the link state with a short period. It can lead to packet loss for certain
systems.

In interrupt mode the hardware itself triggers an interrupt when link state
change happens, so less processing time needs for the OVS.

For detailed description and usage see the dpdk install documentation.

Signed-off-by: Robert Mulik <robert.mulik@ericsson.com>
---
v5 -> v6:
- DPDK install documentation updated.
- Status of lsc_interrupt_mode of DPDK interfaces can be read by command
  ovs-appctl dpif/show.
- It was suggested to check if the HW supports interrupt mode, but it is not
  possible to do without DPDK code change, so it is skipped from this patch.
---
 Documentation/intro/install/dpdk.rst | 33 +++++++++++++++++++++++++++++++++
 lib/netdev-dpdk.c                    | 24 ++++++++++++++++++++++--
 vswitchd/vswitch.xml                 | 17 +++++++++++++++++
 3 files changed, 72 insertions(+), 2 deletions(-)

--
1.9.1

Comments

Stokes, Ian March 27, 2018, 10:19 a.m. | #1
> It is possible to change LSC detection mode to polling or interrupt mode
> for DPDK interfaces. The default is polling mode. To set interrupt mode,
> option dpdk-lsc-interrupt has to be set to true.
> 
> In polling mode more processor time is needed, since the OVS repeatedly
> reads the link state with a short period. It can lead to packet loss for
> certain systems.
> 
> In interrupt mode the hardware itself triggers an interrupt when link
> state change happens, so less processing time needs for the OVS.
> 
> For detailed description and usage see the dpdk install documentation.

Thanks for working on this Robert.

I've completed some testing including the case where LSC is not supported, in which case the port will remain in a down state and fail rx/tx traffic. This behavior conforms to the netdev_reconfigure expectations in the fail case so that's ok.

I'm a bit late to the thread but I have a few other comments below.

I'd like to get this patch in the next pull request if possible so I'd appreciate if others can give any comments on the patch also.

Thanks
Ian

> 
> Signed-off-by: Robert Mulik <robert.mulik@ericsson.com>
> ---
> v5 -> v6:
> - DPDK install documentation updated.
> - Status of lsc_interrupt_mode of DPDK interfaces can be read by command
>   ovs-appctl dpif/show.
> - It was suggested to check if the HW supports interrupt mode, but it is
> not
>   possible to do without DPDK code change, so it is skipped from this
> patch.
> ---
>  Documentation/intro/install/dpdk.rst | 33
> +++++++++++++++++++++++++++++++++
>  lib/netdev-dpdk.c                    | 24 ++++++++++++++++++++++--
>  vswitchd/vswitch.xml                 | 17 +++++++++++++++++
>  3 files changed, 72 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/intro/install/dpdk.rst
> b/Documentation/intro/install/dpdk.rst
> index ed358d5..eb1bc7b 100644
> --- a/Documentation/intro/install/dpdk.rst
> +++ b/Documentation/intro/install/dpdk.rst
> @@ -628,6 +628,39 @@ The average number of packets per output batch can be
> checked in PMD stats::
> 
>      $ ovs-appctl dpif-netdev/pmd-stats-show
> 
> +Link State Change (LSC) detection configuration
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +There are two methods to get the information when Link State Change
> +(LSC) happens on a network interface: by polling or interrupt.
> +
> +With the polling method, the main process checks the link state on a
> +fixed interval. This fixed interval determines how fast a link change
> +is detected.
> +
> +If interrupts are used to get LSC information, the hardware itself
> +triggers an interrupt when link state change happens, the interrupt
> +thread wakes up from sleep, updates the information, and goes back to
> +sleep mode. When no link state change happens (most of the time), the
> +thread remains in sleep mode and doesn`t use processor time at all. The
> +disadvantage of this method is that when an interrupt happens, the
> +processor has to handle it immediately, so it puts the currently
> +running process to background, handles the interrupt, and takes the
> background process back.
> +
> +Note that not all PMD drivers support LSC interrupts.
> +
> +The default configuration is polling mode. To set interrupt mode,
> +option ``dpdk-lsc-interrupt`` has to be set to ``true``.
> +
> +Command to set interrupt mode for a specific interface::
> +    $ ovs-vsctl set interface <iface_name>
> +options:dpdk-lsc-interrupt=true
> +
> +Command to set polling mode for a specific interface::
> +    $ ovs-vsctl set interface <iface_name>
> +options:dpdk-lsc-interrupt=false
> +
> +Command to remove ``dpdk-lsc-interrupt`` option::
> +    $ ovs-vsctl remove interface <iface_name> options
> +dpdk-lsc-interrupt

Just a query, why do we need the above option to remove lsc, is setting lsc true or false with the previous commands not enough?

> +
>  Limitations
>  ------------
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 94fb163..e2794e8
> 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -433,6 +433,12 @@ struct netdev_dpdk {
>          /* DPDK-ETH hardware offload features,
>           * from the enum set 'dpdk_hw_ol_features' */
>          uint32_t hw_ol_features;
> +
> +        /* Properties for link state change detection mode.
> +         * If lsc_interrupt_mode is set to false, poll mode is used,
> +         * otherwise interrupt mode is used. */
> +        bool requested_lsc_interrupt_mode;
> +        bool lsc_interrupt_mode;
>      );
> 
>      PADDED_MEMBERS(CACHE_LINE_SIZE,
> @@ -686,12 +692,14 @@ dpdk_watchdog(void *dummy OVS_UNUSED)  }
> 
>  static int
> -dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
> +dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int n_rxq, int n_txq)
>  {
>      int diag = 0;
>      int i;
>      struct rte_eth_conf conf = port_conf;
> 
> +    conf.intr_conf.lsc = dev->lsc_interrupt_mode;

Should above be dev->requested_lsc_interrupt_mode? Similar to how we request MTUs, we first have to validate if it's supported.
Since we have no way of knowing if the interrupt mode is supported at the moment we should set conf.intr_conf.lsc = dev->requested_lsc_interrupt_mode and once configuration succeeds set dev->lsc_interrupt_mode = conf.intr_conf.lsc.

If we don't then I'm not sure of the purpose of having requested_lsc_interrupt_mode and a separate lsc_interrupt_mode for dev.

Thoughts?

> +
>      /* For some NICs (e.g. Niantic), scatter_rx mode needs to be
> explicitly
>       * enabled. */
>      if (dev->mtu > ETHER_MTU) {
> @@ -801,7 +809,7 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev)
>      n_rxq = MIN(info.max_rx_queues, dev->up.n_rxq);
>      n_txq = MIN(info.max_tx_queues, dev->up.n_txq);
> 
> -    diag = dpdk_eth_dev_queue_setup(dev, n_rxq, n_txq);
> +    diag = dpdk_eth_dev_port_config(dev, n_rxq, n_txq);
>      if (diag) {
>          VLOG_ERR("Interface %s(rxq:%d txq:%d) configure error: %s",
>                   dev->up.name, n_rxq, n_txq, rte_strerror(-diag));

With the change from dpdk_eth_dev_queue_setup to dpdk_eth_dev_port_config I think we need to improve the error message also logged above.
Instead of only reporting the rxq and txq we should include the requested lsc mode also as that could cause a failure if not supported.

 @@ -
> 897,6 +905,7 @@ common_construct(struct netdev *netdev, dpdk_port_t
> port_no,
>      dev->flags = 0;
>      dev->requested_mtu = ETHER_MTU;
>      dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
> +    dev->requested_lsc_interrupt_mode = 0;
>      ovsrcu_index_init(&dev->vid, -1);
>      dev->vhost_reconfigured = false;
>      dev->attached = false;
> @@ -1320,6 +1329,8 @@ netdev_dpdk_get_config(const struct netdev *netdev,
> struct smap *args)
>          } else {
>              smap_add(args, "rx_csum_offload", "false");
>          }
> +        smap_add(args, "lsc_interrupt_mode",
> +                 dev->lsc_interrupt_mode?"true":"false");

Need space before/after operators '?' and ':' above.

>      }
>      ovs_mutex_unlock(&dev->mutex);
> 
> @@ -1520,6 +1531,12 @@ netdev_dpdk_set_config(struct netdev *netdev, const
> struct smap *args,
>          goto out;
>      }
> 
> +    bool lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt",

This comes down to style preference but can you declare this bool at the beginning of the function? There are other bool variable in the function declared there already so it keeps with the existing style of the function.

> false);
> +    if (dev->requested_lsc_interrupt_mode != lsc_interrupt_mode) {
> +        dev->requested_lsc_interrupt_mode = lsc_interrupt_mode;
> +        netdev_request_reconfigure(netdev);
> +    }
> +
>      rx_fc_en = smap_get_bool(args, "rx-flow-ctrl", false);
>      tx_fc_en = smap_get_bool(args, "tx-flow-ctrl", false);
>      autoneg = smap_get_bool(args, "flow-ctrl-autoneg", false); @@ -3546,6
> +3563,7 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>      if (netdev->n_txq == dev->requested_n_txq
>          && netdev->n_rxq == dev->requested_n_rxq
>          && dev->mtu == dev->requested_mtu
> +        && dev->lsc_interrupt_mode == dev->requested_lsc_interrupt_mode
>          && dev->rxq_size == dev->requested_rxq_size
>          && dev->txq_size == dev->requested_txq_size
>          && dev->socket_id == dev->requested_socket_id) { @@ -3561,6
> +3579,8 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>          goto out;
>      }
> 
> +    dev->lsc_interrupt_mode = dev->requested_lsc_interrupt_mode;

I think this would be removed from here and only set once rte_eth_dev_configure succeeds in dpdk_eth_dev_port_config as mentioned earlier?

> +
>      netdev->n_txq = dev->requested_n_txq;
>      netdev->n_rxq = dev->requested_n_rxq;
> 

It could be useful to report the LSC mode configured for the port also as part of the port status, something like

@@ -2781,6 +2783,8 @@ netdev_dpdk_get_status(const struct netdev *netdev, struct smap *args)
                            dev_info.max_hash_mac_addrs);
     smap_add_format(args, "max_vfs", "%u", dev_info.max_vfs);
     smap_add_format(args, "max_vmdq_pools", "%u", dev_info.max_vmdq_pools);
+    smap_add_format(args, "lsc_mode", "%s",
+                           dev->lsc_interrupt_mode ? "Interrupt" : "PMD");

Thoughts?

> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index
> 0c6a43d..3c9e637 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -3631,6 +3631,23 @@ ovs-vsctl add-port br0 p0 -- set Interface p0
> type=patch options:peer=p1 \
>        </column>
>      </group>
> 
> +    <group title="Link State Change detection mode">
> +      <column name="options" key="dpdk-lsc-interrupt"
> +              type='{"type": "boolean"}'>
> +        <p>
> +          Set this value to <code>true</code> to configure interrupt mode
> for
> +          Link State Change (LSC) detection instead of poll mode for the
> DPDK
> +          interface.
> +        </p>
> +        <p>
> +          If this value is not set, poll mode is configured.
> +        </p>
> +        <p>
> +          This parameter has an effect only on netdev dpdk interfaces.
> +        </p>
> +      </column>
> +    </group>
> +
>      <group title="Common Columns">
>        The overall purpose of these columns is described under
> <code>Common
>        Columns</code> at the beginning of this document.
> --
> 1.9.1
> 
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Ilya Maximets March 27, 2018, 11:16 a.m. | #2
On 27.03.2018 13:19, Stokes, Ian wrote:
>> It is possible to change LSC detection mode to polling or interrupt mode
>> for DPDK interfaces. The default is polling mode. To set interrupt mode,
>> option dpdk-lsc-interrupt has to be set to true.
>>
>> In polling mode more processor time is needed, since the OVS repeatedly
>> reads the link state with a short period. It can lead to packet loss for
>> certain systems.
>>
>> In interrupt mode the hardware itself triggers an interrupt when link
>> state change happens, so less processing time needs for the OVS.
>>
>> For detailed description and usage see the dpdk install documentation.

Could you, please, better describe why we need this change?
Because we're not removing the polling thread. OVS will still
poll the link states periodically. This config option has
no effect on that side. Also, link state polling in OVS uses
'rte_eth_link_get_nowait()' function which will be called in both
cases and should not wait for hardware reply in any implementation.

There was recent bug fix for intel NICs that fixes waiting of an
admin queue on link state requests despite of 'no_wait' flag:
    http://dpdk.org/ml/archives/dev/2018-March/092156.html
Will this fix your target case?

So, the difference of execution time of 'rte_eth_link_get_nowait()'
with enabled and disabled interrupts should be not so significant.
Do you have performance measurements? Measurement with above fix applied?


> 
> Thanks for working on this Robert.
> 
> I've completed some testing including the case where LSC is not supported, in which case the port will remain in a down state and fail rx/tx traffic. This behavior conforms to the netdev_reconfigure expectations in the fail case so that's ok.

I'm not sure if this is acceptable. For example, we're not failing
reconfiguration in case of issues with number of queues. We're trying
different numbers until we have working configuration.
Maybe we need the same fall-back mechanism in case of not supported LSC
interrupts? (MTU setup errors are really uncommon unlike LSC interrupts'
support in PMDs).

> 
> I'm a bit late to the thread but I have a few other comments below.
> 
> I'd like to get this patch in the next pull request if possible so I'd appreciate if others can give any comments on the patch also.
> 
> Thanks
> Ian
> 
>>
>> Signed-off-by: Robert Mulik <robert.mulik@ericsson.com>
>> ---
>> v5 -> v6:
>> - DPDK install documentation updated.
>> - Status of lsc_interrupt_mode of DPDK interfaces can be read by command
>>   ovs-appctl dpif/show.
>> - It was suggested to check if the HW supports interrupt mode, but it is
>> not
>>   possible to do without DPDK code change, so it is skipped from this
>> patch.
>> ---
>>  Documentation/intro/install/dpdk.rst | 33
>> +++++++++++++++++++++++++++++++++
>>  lib/netdev-dpdk.c                    | 24 ++++++++++++++++++++++--
>>  vswitchd/vswitch.xml                 | 17 +++++++++++++++++
>>  3 files changed, 72 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/intro/install/dpdk.rst
>> b/Documentation/intro/install/dpdk.rst
>> index ed358d5..eb1bc7b 100644
>> --- a/Documentation/intro/install/dpdk.rst
>> +++ b/Documentation/intro/install/dpdk.rst
>> @@ -628,6 +628,39 @@ The average number of packets per output batch can be
>> checked in PMD stats::
>>
>>      $ ovs-appctl dpif-netdev/pmd-stats-show
>>
>> +Link State Change (LSC) detection configuration
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +
>> +There are two methods to get the information when Link State Change
>> +(LSC) happens on a network interface: by polling or interrupt.
>> +
>> +With the polling method, the main process checks the link state on a
>> +fixed interval. This fixed interval determines how fast a link change
>> +is detected.
>> +
>> +If interrupts are used to get LSC information, the hardware itself
>> +triggers an interrupt when link state change happens, the interrupt
>> +thread wakes up from sleep, updates the information, and goes back to
>> +sleep mode. When no link state change happens (most of the time), the
>> +thread remains in sleep mode and doesn`t use processor time at all. The
>> +disadvantage of this method is that when an interrupt happens, the
>> +processor has to handle it immediately, so it puts the currently
>> +running process to background, handles the interrupt, and takes the
>> background process back.
>> +
>> +Note that not all PMD drivers support LSC interrupts.
>> +
>> +The default configuration is polling mode. To set interrupt mode,
>> +option ``dpdk-lsc-interrupt`` has to be set to ``true``.
>> +
>> +Command to set interrupt mode for a specific interface::
>> +    $ ovs-vsctl set interface <iface_name>
>> +options:dpdk-lsc-interrupt=true
>> +
>> +Command to set polling mode for a specific interface::
>> +    $ ovs-vsctl set interface <iface_name>
>> +options:dpdk-lsc-interrupt=false
>> +
>> +Command to remove ``dpdk-lsc-interrupt`` option::
>> +    $ ovs-vsctl remove interface <iface_name> options
>> +dpdk-lsc-interrupt
> 
> Just a query, why do we need the above option to remove lsc, is setting lsc true or false with the previous commands not enough?
> 
>> +
>>  Limitations
>>  ------------
>>
>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 94fb163..e2794e8
>> 100644
>> --- a/lib/netdev-dpdk.c
>> +++ b/lib/netdev-dpdk.c
>> @@ -433,6 +433,12 @@ struct netdev_dpdk {
>>          /* DPDK-ETH hardware offload features,
>>           * from the enum set 'dpdk_hw_ol_features' */
>>          uint32_t hw_ol_features;
>> +
>> +        /* Properties for link state change detection mode.
>> +         * If lsc_interrupt_mode is set to false, poll mode is used,
>> +         * otherwise interrupt mode is used. */
>> +        bool requested_lsc_interrupt_mode;
>> +        bool lsc_interrupt_mode;
>>      );
>>
>>      PADDED_MEMBERS(CACHE_LINE_SIZE,
>> @@ -686,12 +692,14 @@ dpdk_watchdog(void *dummy OVS_UNUSED)  }
>>
>>  static int
>> -dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
>> +dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int n_rxq, int n_txq)
>>  {
>>      int diag = 0;
>>      int i;
>>      struct rte_eth_conf conf = port_conf;
>>
>> +    conf.intr_conf.lsc = dev->lsc_interrupt_mode;
> 
> Should above be dev->requested_lsc_interrupt_mode? Similar to how we request MTUs, we first have to validate if it's supported.
> Since we have no way of knowing if the interrupt mode is supported at the moment we should set conf.intr_conf.lsc = dev->requested_lsc_interrupt_mode and once configuration succeeds set dev->lsc_interrupt_mode = conf.intr_conf.lsc.
> 
> If we don't then I'm not sure of the purpose of having requested_lsc_interrupt_mode and a separate lsc_interrupt_mode for dev.
> 
> Thoughts?
> 
>> +
>>      /* For some NICs (e.g. Niantic), scatter_rx mode needs to be
>> explicitly
>>       * enabled. */
>>      if (dev->mtu > ETHER_MTU) {
>> @@ -801,7 +809,7 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev)
>>      n_rxq = MIN(info.max_rx_queues, dev->up.n_rxq);
>>      n_txq = MIN(info.max_tx_queues, dev->up.n_txq);
>>
>> -    diag = dpdk_eth_dev_queue_setup(dev, n_rxq, n_txq);
>> +    diag = dpdk_eth_dev_port_config(dev, n_rxq, n_txq);
>>      if (diag) {
>>          VLOG_ERR("Interface %s(rxq:%d txq:%d) configure error: %s",
>>                   dev->up.name, n_rxq, n_txq, rte_strerror(-diag));
> 
> With the change from dpdk_eth_dev_queue_setup to dpdk_eth_dev_port_config I think we need to improve the error message also logged above.
> Instead of only reporting the rxq and txq we should include the requested lsc mode also as that could cause a failure if not supported.
> 
>  @@ -
>> 897,6 +905,7 @@ common_construct(struct netdev *netdev, dpdk_port_t
>> port_no,
>>      dev->flags = 0;
>>      dev->requested_mtu = ETHER_MTU;
>>      dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
>> +    dev->requested_lsc_interrupt_mode = 0;
>>      ovsrcu_index_init(&dev->vid, -1);
>>      dev->vhost_reconfigured = false;
>>      dev->attached = false;
>> @@ -1320,6 +1329,8 @@ netdev_dpdk_get_config(const struct netdev *netdev,
>> struct smap *args)
>>          } else {
>>              smap_add(args, "rx_csum_offload", "false");
>>          }
>> +        smap_add(args, "lsc_interrupt_mode",
>> +                 dev->lsc_interrupt_mode?"true":"false");
> 
> Need space before/after operators '?' and ':' above.
> 
>>      }
>>      ovs_mutex_unlock(&dev->mutex);
>>
>> @@ -1520,6 +1531,12 @@ netdev_dpdk_set_config(struct netdev *netdev, const
>> struct smap *args,
>>          goto out;
>>      }
>>
>> +    bool lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt",
> 
> This comes down to style preference but can you declare this bool at the beginning of the function? There are other bool variable in the function declared there already so it keeps with the existing style of the function.

+1

> 
>> false);
>> +    if (dev->requested_lsc_interrupt_mode != lsc_interrupt_mode) {
>> +        dev->requested_lsc_interrupt_mode = lsc_interrupt_mode;
>> +        netdev_request_reconfigure(netdev);
>> +    }
>> +
>>      rx_fc_en = smap_get_bool(args, "rx-flow-ctrl", false);
>>      tx_fc_en = smap_get_bool(args, "tx-flow-ctrl", false);
>>      autoneg = smap_get_bool(args, "flow-ctrl-autoneg", false); @@ -3546,6
>> +3563,7 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>>      if (netdev->n_txq == dev->requested_n_txq
>>          && netdev->n_rxq == dev->requested_n_rxq
>>          && dev->mtu == dev->requested_mtu
>> +        && dev->lsc_interrupt_mode == dev->requested_lsc_interrupt_mode
>>          && dev->rxq_size == dev->requested_rxq_size
>>          && dev->txq_size == dev->requested_txq_size
>>          && dev->socket_id == dev->requested_socket_id) { @@ -3561,6
>> +3579,8 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>>          goto out;
>>      }
>>
>> +    dev->lsc_interrupt_mode = dev->requested_lsc_interrupt_mode;
> 
> I think this would be removed from here and only set once rte_eth_dev_configure succeeds in dpdk_eth_dev_port_config as mentioned earlier?
> 
>> +
>>      netdev->n_txq = dev->requested_n_txq;
>>      netdev->n_rxq = dev->requested_n_rxq;
>>
> 
> It could be useful to report the LSC mode configured for the port also as part of the port status, something like
> 
> @@ -2781,6 +2783,8 @@ netdev_dpdk_get_status(const struct netdev *netdev, struct smap *args)
>                             dev_info.max_hash_mac_addrs);
>      smap_add_format(args, "max_vfs", "%u", dev_info.max_vfs);
>      smap_add_format(args, "max_vmdq_pools", "%u", dev_info.max_vmdq_pools);
> +    smap_add_format(args, "lsc_mode", "%s",
> +                           dev->lsc_interrupt_mode ? "Interrupt" : "PMD");
> 
> Thoughts?

We have this in 'netdev_dpdk_get_config()'. Do you think we need this
in two places? Also, 'netdev_dpdk_get_status()' is more like HW
specific invariants. (Not sure why we're placing these values in 'status',
it's a bit strange.)

> 
>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index
>> 0c6a43d..3c9e637 100644
>> --- a/vswitchd/vswitch.xml
>> +++ b/vswitchd/vswitch.xml
>> @@ -3631,6 +3631,23 @@ ovs-vsctl add-port br0 p0 -- set Interface p0
>> type=patch options:peer=p1 \
>>        </column>
>>      </group>
>>
>> +    <group title="Link State Change detection mode">
>> +      <column name="options" key="dpdk-lsc-interrupt"
>> +              type='{"type": "boolean"}'>
>> +        <p>
>> +          Set this value to <code>true</code> to configure interrupt mode
>> for
>> +          Link State Change (LSC) detection instead of poll mode for the
>> DPDK
>> +          interface.
>> +        </p>
>> +        <p>
>> +          If this value is not set, poll mode is configured.
>> +        </p>
>> +        <p>
>> +          This parameter has an effect only on netdev dpdk interfaces.
>> +        </p>
>> +      </column>
>> +    </group>
>> +
>>      <group title="Common Columns">
>>        The overall purpose of these columns is described under
>> <code>Common
>>        Columns</code> at the beginning of this document.
>> --
>> 1.9.1
>>
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> 
> 
>
Stokes, Ian March 27, 2018, 2:07 p.m. | #3
> On 27.03.2018 13:19, Stokes, Ian wrote:
> >> It is possible to change LSC detection mode to polling or interrupt
> >> mode for DPDK interfaces. The default is polling mode. To set
> >> interrupt mode, option dpdk-lsc-interrupt has to be set to true.
> >>
> >> In polling mode more processor time is needed, since the OVS
> >> repeatedly reads the link state with a short period. It can lead to
> >> packet loss for certain systems.
> >>
> >> In interrupt mode the hardware itself triggers an interrupt when link
> >> state change happens, so less processing time needs for the OVS.
> >>
> >> For detailed description and usage see the dpdk install documentation.
> 
> Could you, please, better describe why we need this change?
> Because we're not removing the polling thread. OVS will still poll the
> link states periodically. This config option has no effect on that side.
> Also, link state polling in OVS uses 'rte_eth_link_get_nowait()' function
> which will be called in both cases and should not wait for hardware reply
> in any implementation.

I believe it was related to a case where bonded mode in active back was causing packet drops due to the frequency that the LSC was being polled. Using interrupt based approach alleviated the issue. (I'm open to correction on this :))

@Robert/Eelco You may be able to provide some more light here and whether the patches below in DPDK resolve the issue?

> 
> There was recent bug fix for intel NICs that fixes waiting of an admin
> queue on link state requests despite of 'no_wait' flag:
>     http://dpdk.org/ml/archives/dev/2018-March/092156.html
> Will this fix your target case?
> 
> So, the difference of execution time of 'rte_eth_link_get_nowait()'
> with enabled and disabled interrupts should be not so significant.
> Do you have performance measurements? Measurement with above fix applied?
> 
> 
> >
> > Thanks for working on this Robert.
> >
> > I've completed some testing including the case where LSC is not
> supported, in which case the port will remain in a down state and fail
> rx/tx traffic. This behavior conforms to the netdev_reconfigure
> expectations in the fail case so that's ok.
> 
> I'm not sure if this is acceptable. For example, we're not failing
> reconfiguration in case of issues with number of queues. We're trying
> different numbers until we have working configuration.
> Maybe we need the same fall-back mechanism in case of not supported LSC
> interrupts? (MTU setup errors are really uncommon unlike LSC interrupts'
> support in PMDs).

Thanks for raising this Ilya.

I thought of this as well. I'd like to see a fall back to the PMD but didn’t see how it could be done in a clean way.

Unfortunately rte_eth_dev_configure() returns -EINVAL when lsc mode is requested but not supported.

It doesn't give us a clue if the error is related to lsc mode as it could also relate to a number of other configure issues such as nb_rxq/nb_txq/portid etc.

It would be better if we could query the device via ethdev api to see if it supports lsc interrupt mode but that’s not available currently.

The only way I have seen lsc support queries in DPDK is by querying the device data itself which doesn't look great, this was discussed in an earlier thread I think for this patch.

The alternative could be to introduce a generic retry for rte_eth_dev_configure() when configure fails but with lsc configured to PMD instead but I'd prefer to see an indication that the lsc mode was the cause.

If we have a cleaner way to fall back to PMD that’s ok by me but I think we have to allow for a possible failure in during a reconfigure and follow the prescribed behavior as per netdev_reconfigure() which this code currently does.

> 
> >
> > I'm a bit late to the thread but I have a few other comments below.
> >
> > I'd like to get this patch in the next pull request if possible so I'd
> appreciate if others can give any comments on the patch also.
> >
> > Thanks
> > Ian
> >
> >>
> >> Signed-off-by: Robert Mulik <robert.mulik@ericsson.com>
> >> ---
> >> v5 -> v6:
> >> - DPDK install documentation updated.
> >> - Status of lsc_interrupt_mode of DPDK interfaces can be read by
> command
> >>   ovs-appctl dpif/show.
> >> - It was suggested to check if the HW supports interrupt mode, but it
> >> is not
> >>   possible to do without DPDK code change, so it is skipped from this
> >> patch.
> >> ---
> >>  Documentation/intro/install/dpdk.rst | 33
> >> +++++++++++++++++++++++++++++++++
> >>  lib/netdev-dpdk.c                    | 24 ++++++++++++++++++++++--
> >>  vswitchd/vswitch.xml                 | 17 +++++++++++++++++
> >>  3 files changed, 72 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/Documentation/intro/install/dpdk.rst
> >> b/Documentation/intro/install/dpdk.rst
> >> index ed358d5..eb1bc7b 100644
> >> --- a/Documentation/intro/install/dpdk.rst
> >> +++ b/Documentation/intro/install/dpdk.rst
> >> @@ -628,6 +628,39 @@ The average number of packets per output batch
> >> can be checked in PMD stats::
> >>
> >>      $ ovs-appctl dpif-netdev/pmd-stats-show
> >>
> >> +Link State Change (LSC) detection configuration
> >> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >> +
> >> +There are two methods to get the information when Link State Change
> >> +(LSC) happens on a network interface: by polling or interrupt.
> >> +
> >> +With the polling method, the main process checks the link state on a
> >> +fixed interval. This fixed interval determines how fast a link
> >> +change is detected.
> >> +
> >> +If interrupts are used to get LSC information, the hardware itself
> >> +triggers an interrupt when link state change happens, the interrupt
> >> +thread wakes up from sleep, updates the information, and goes back
> >> +to sleep mode. When no link state change happens (most of the time),
> >> +the thread remains in sleep mode and doesn`t use processor time at
> >> +all. The disadvantage of this method is that when an interrupt
> >> +happens, the processor has to handle it immediately, so it puts the
> >> +currently running process to background, handles the interrupt, and
> >> +takes the
> >> background process back.
> >> +
> >> +Note that not all PMD drivers support LSC interrupts.
> >> +
> >> +The default configuration is polling mode. To set interrupt mode,
> >> +option ``dpdk-lsc-interrupt`` has to be set to ``true``.
> >> +
> >> +Command to set interrupt mode for a specific interface::
> >> +    $ ovs-vsctl set interface <iface_name>
> >> +options:dpdk-lsc-interrupt=true
> >> +
> >> +Command to set polling mode for a specific interface::
> >> +    $ ovs-vsctl set interface <iface_name>
> >> +options:dpdk-lsc-interrupt=false
> >> +
> >> +Command to remove ``dpdk-lsc-interrupt`` option::
> >> +    $ ovs-vsctl remove interface <iface_name> options
> >> +dpdk-lsc-interrupt
> >
> > Just a query, why do we need the above option to remove lsc, is setting
> lsc true or false with the previous commands not enough?
> >
> >> +
> >>  Limitations
> >>  ------------
> >>
> >> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index
> >> 94fb163..e2794e8
> >> 100644
> >> --- a/lib/netdev-dpdk.c
> >> +++ b/lib/netdev-dpdk.c
> >> @@ -433,6 +433,12 @@ struct netdev_dpdk {
> >>          /* DPDK-ETH hardware offload features,
> >>           * from the enum set 'dpdk_hw_ol_features' */
> >>          uint32_t hw_ol_features;
> >> +
> >> +        /* Properties for link state change detection mode.
> >> +         * If lsc_interrupt_mode is set to false, poll mode is used,
> >> +         * otherwise interrupt mode is used. */
> >> +        bool requested_lsc_interrupt_mode;
> >> +        bool lsc_interrupt_mode;
> >>      );
> >>
> >>      PADDED_MEMBERS(CACHE_LINE_SIZE,
> >> @@ -686,12 +692,14 @@ dpdk_watchdog(void *dummy OVS_UNUSED)  }
> >>
> >>  static int
> >> -dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int
> >> n_txq)
> >> +dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int n_rxq, int
> >> +n_txq)
> >>  {
> >>      int diag = 0;
> >>      int i;
> >>      struct rte_eth_conf conf = port_conf;
> >>
> >> +    conf.intr_conf.lsc = dev->lsc_interrupt_mode;
> >
> > Should above be dev->requested_lsc_interrupt_mode? Similar to how we
> request MTUs, we first have to validate if it's supported.
> > Since we have no way of knowing if the interrupt mode is supported at
> the moment we should set conf.intr_conf.lsc = dev-
> >requested_lsc_interrupt_mode and once configuration succeeds set dev-
> >lsc_interrupt_mode = conf.intr_conf.lsc.
> >
> > If we don't then I'm not sure of the purpose of having
> requested_lsc_interrupt_mode and a separate lsc_interrupt_mode for dev.
> >
> > Thoughts?
> >
> >> +
> >>      /* For some NICs (e.g. Niantic), scatter_rx mode needs to be
> >> explicitly
> >>       * enabled. */
> >>      if (dev->mtu > ETHER_MTU) {
> >> @@ -801,7 +809,7 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev)
> >>      n_rxq = MIN(info.max_rx_queues, dev->up.n_rxq);
> >>      n_txq = MIN(info.max_tx_queues, dev->up.n_txq);
> >>
> >> -    diag = dpdk_eth_dev_queue_setup(dev, n_rxq, n_txq);
> >> +    diag = dpdk_eth_dev_port_config(dev, n_rxq, n_txq);
> >>      if (diag) {
> >>          VLOG_ERR("Interface %s(rxq:%d txq:%d) configure error: %s",
> >>                   dev->up.name, n_rxq, n_txq, rte_strerror(-diag));
> >
> > With the change from dpdk_eth_dev_queue_setup to
> dpdk_eth_dev_port_config I think we need to improve the error message also
> logged above.
> > Instead of only reporting the rxq and txq we should include the
> requested lsc mode also as that could cause a failure if not supported.
> >
> >  @@ -
> >> 897,6 +905,7 @@ common_construct(struct netdev *netdev, dpdk_port_t
> >> port_no,
> >>      dev->flags = 0;
> >>      dev->requested_mtu = ETHER_MTU;
> >>      dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
> >> +    dev->requested_lsc_interrupt_mode = 0;
> >>      ovsrcu_index_init(&dev->vid, -1);
> >>      dev->vhost_reconfigured = false;
> >>      dev->attached = false;
> >> @@ -1320,6 +1329,8 @@ netdev_dpdk_get_config(const struct netdev
> >> *netdev, struct smap *args)
> >>          } else {
> >>              smap_add(args, "rx_csum_offload", "false");
> >>          }
> >> +        smap_add(args, "lsc_interrupt_mode",
> >> +                 dev->lsc_interrupt_mode?"true":"false");
> >
> > Need space before/after operators '?' and ':' above.
> >
> >>      }
> >>      ovs_mutex_unlock(&dev->mutex);
> >>
> >> @@ -1520,6 +1531,12 @@ netdev_dpdk_set_config(struct netdev *netdev,
> >> const struct smap *args,
> >>          goto out;
> >>      }
> >>
> >> +    bool lsc_interrupt_mode = smap_get_bool(args,
> >> + "dpdk-lsc-interrupt",
> >
> > This comes down to style preference but can you declare this bool at the
> beginning of the function? There are other bool variable in the function
> declared there already so it keeps with the existing style of the
> function.
> 
> +1
> 
> >
> >> false);
> >> +    if (dev->requested_lsc_interrupt_mode != lsc_interrupt_mode) {
> >> +        dev->requested_lsc_interrupt_mode = lsc_interrupt_mode;
> >> +        netdev_request_reconfigure(netdev);
> >> +    }
> >> +
> >>      rx_fc_en = smap_get_bool(args, "rx-flow-ctrl", false);
> >>      tx_fc_en = smap_get_bool(args, "tx-flow-ctrl", false);
> >>      autoneg = smap_get_bool(args, "flow-ctrl-autoneg", false); @@
> >> -3546,6
> >> +3563,7 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
> >>      if (netdev->n_txq == dev->requested_n_txq
> >>          && netdev->n_rxq == dev->requested_n_rxq
> >>          && dev->mtu == dev->requested_mtu
> >> +        && dev->lsc_interrupt_mode ==
> >> + dev->requested_lsc_interrupt_mode
> >>          && dev->rxq_size == dev->requested_rxq_size
> >>          && dev->txq_size == dev->requested_txq_size
> >>          && dev->socket_id == dev->requested_socket_id) { @@ -3561,6
> >> +3579,8 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
> >>          goto out;
> >>      }
> >>
> >> +    dev->lsc_interrupt_mode = dev->requested_lsc_interrupt_mode;
> >
> > I think this would be removed from here and only set once
> rte_eth_dev_configure succeeds in dpdk_eth_dev_port_config as mentioned
> earlier?
> >
> >> +
> >>      netdev->n_txq = dev->requested_n_txq;
> >>      netdev->n_rxq = dev->requested_n_rxq;
> >>
> >
> > It could be useful to report the LSC mode configured for the port also
> > as part of the port status, something like
> >
> > @@ -2781,6 +2783,8 @@ netdev_dpdk_get_status(const struct netdev
> *netdev, struct smap *args)
> >                             dev_info.max_hash_mac_addrs);
> >      smap_add_format(args, "max_vfs", "%u", dev_info.max_vfs);
> >      smap_add_format(args, "max_vmdq_pools", "%u",
> > dev_info.max_vmdq_pools);
> > +    smap_add_format(args, "lsc_mode", "%s",
> > +                           dev->lsc_interrupt_mode ? "Interrupt" :
> > + "PMD");
> >
> > Thoughts?
> 
> We have this in 'netdev_dpdk_get_config()'. Do you think we need this in
> two places? Also, 'netdev_dpdk_get_status()' is more like HW specific
> invariants. (Not sure why we're placing these values in 'status', it's a
> bit strange.)

Ya, I wasn't sure here so thought I'd suggest it. 

You have values in get_status like max_rx_packet_length that change based on user configuration, so for completeness really I suggested it be added here.

I don’t feel particularly strongly about it though, wanted to raise it for discussion as it may be something the user expects.

Ian
> 
> >
> >> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index
> >> 0c6a43d..3c9e637 100644
> >> --- a/vswitchd/vswitch.xml
> >> +++ b/vswitchd/vswitch.xml
> >> @@ -3631,6 +3631,23 @@ ovs-vsctl add-port br0 p0 -- set Interface p0
> >> type=patch options:peer=p1 \
> >>        </column>
> >>      </group>
> >>
> >> +    <group title="Link State Change detection mode">
> >> +      <column name="options" key="dpdk-lsc-interrupt"
> >> +              type='{"type": "boolean"}'>
> >> +        <p>
> >> +          Set this value to <code>true</code> to configure interrupt
> >> + mode
> >> for
> >> +          Link State Change (LSC) detection instead of poll mode for
> >> + the
> >> DPDK
> >> +          interface.
> >> +        </p>
> >> +        <p>
> >> +          If this value is not set, poll mode is configured.
> >> +        </p>
> >> +        <p>
> >> +          This parameter has an effect only on netdev dpdk interfaces.
> >> +        </p>
> >> +      </column>
> >> +    </group>
> >> +
> >>      <group title="Common Columns">
> >>        The overall purpose of these columns is described under
> >> <code>Common
> >>        Columns</code> at the beginning of this document.
> >> --
> >> 1.9.1
> >>
> >> _______________________________________________
> >> dev mailing list
> >> dev@openvswitch.org
> >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >
> >
> >
Jan Scheurich March 27, 2018, 2:09 p.m. | #4
Hi Ilya,

This patch is the upstream version of a fix we implemented downstream a year ago to fix the issue with massive packet drop of OVS-DPDK on Fortville NICs.

The root cause of this packet drop was the extended blocking of the ovs-vswitchd by the i40e PMD during the rte_eth_link_get_nowait() function, which caused the PMDs to hang for up to 40ms during upcalls.

At the time switching to LSC interrupt was the only viable solution. OVS still polls the links state from DPDK with rte_eth_link_get_nowait() but DPDK returns the locally buffered link state, updated through LSC interrupt, instead of going through the FVL's admin queue.

Now, the new i40e PMD fix in DPDK bypassing the admin queue should also solve the problem with Fortville NICs. It would have to be backported to older DPDK releases to be useful as fix for OVS 2.6, 2.7, 2.8 and 2.9, whereas the LSC interrupt solution in OVS would work as back-port for all OVS versions since 2.6.

That's why we still think there is value in pursuing the LSC interrupt track.

BR, Jan

> -----Original Message-----
> From: ovs-dev-bounces@openvswitch.org [mailto:ovs-dev-bounces@openvswitch.org] On Behalf Of Ilya Maximets
> Sent: Tuesday, 27 March, 2018 13:17
> To: Stokes, Ian <ian.stokes@intel.com>; Róbert Mulik <robert.mulik@ericsson.com>; dev@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH v6] Configurable Link State Change (LSC) detection mode
> 
> On 27.03.2018 13:19, Stokes, Ian wrote:
> >> It is possible to change LSC detection mode to polling or interrupt mode
> >> for DPDK interfaces. The default is polling mode. To set interrupt mode,
> >> option dpdk-lsc-interrupt has to be set to true.
> >>
> >> In polling mode more processor time is needed, since the OVS repeatedly
> >> reads the link state with a short period. It can lead to packet loss for
> >> certain systems.
> >>
> >> In interrupt mode the hardware itself triggers an interrupt when link
> >> state change happens, so less processing time needs for the OVS.
> >>
> >> For detailed description and usage see the dpdk install documentation.
> 
> Could you, please, better describe why we need this change?
> Because we're not removing the polling thread. OVS will still
> poll the link states periodically. This config option has
> no effect on that side. Also, link state polling in OVS uses
> 'rte_eth_link_get_nowait()' function which will be called in both
> cases and should not wait for hardware reply in any implementation.
> 
> There was recent bug fix for intel NICs that fixes waiting of an
> admin queue on link state requests despite of 'no_wait' flag:
>     http://dpdk.org/ml/archives/dev/2018-March/092156.html
> Will this fix your target case?
> 
> So, the difference of execution time of 'rte_eth_link_get_nowait()'
> with enabled and disabled interrupts should be not so significant.
> Do you have performance measurements? Measurement with above fix applied?
> 
> 
> >
> > Thanks for working on this Robert.
> >
> > I've completed some testing including the case where LSC is not supported, in which case the port will remain in a down state and
> fail rx/tx traffic. This behavior conforms to the netdev_reconfigure expectations in the fail case so that's ok.
> 
> I'm not sure if this is acceptable. For example, we're not failing
> reconfiguration in case of issues with number of queues. We're trying
> different numbers until we have working configuration.
> Maybe we need the same fall-back mechanism in case of not supported LSC
> interrupts? (MTU setup errors are really uncommon unlike LSC interrupts'
> support in PMDs).
> 
> >
> > I'm a bit late to the thread but I have a few other comments below.
> >
> > I'd like to get this patch in the next pull request if possible so I'd appreciate if others can give any comments on the patch also.
> >
> > Thanks
> > Ian
> >
> >>
> >> Signed-off-by: Robert Mulik <robert.mulik@ericsson.com>
> >> ---
> >> v5 -> v6:
> >> - DPDK install documentation updated.
> >> - Status of lsc_interrupt_mode of DPDK interfaces can be read by command
> >>   ovs-appctl dpif/show.
> >> - It was suggested to check if the HW supports interrupt mode, but it is
> >> not
> >>   possible to do without DPDK code change, so it is skipped from this
> >> patch.
> >> ---
> >>  Documentation/intro/install/dpdk.rst | 33
> >> +++++++++++++++++++++++++++++++++
> >>  lib/netdev-dpdk.c                    | 24 ++++++++++++++++++++++--
> >>  vswitchd/vswitch.xml                 | 17 +++++++++++++++++
> >>  3 files changed, 72 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/Documentation/intro/install/dpdk.rst
> >> b/Documentation/intro/install/dpdk.rst
> >> index ed358d5..eb1bc7b 100644
> >> --- a/Documentation/intro/install/dpdk.rst
> >> +++ b/Documentation/intro/install/dpdk.rst
> >> @@ -628,6 +628,39 @@ The average number of packets per output batch can be
> >> checked in PMD stats::
> >>
> >>      $ ovs-appctl dpif-netdev/pmd-stats-show
> >>
> >> +Link State Change (LSC) detection configuration
> >> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >> +
> >> +There are two methods to get the information when Link State Change
> >> +(LSC) happens on a network interface: by polling or interrupt.
> >> +
> >> +With the polling method, the main process checks the link state on a
> >> +fixed interval. This fixed interval determines how fast a link change
> >> +is detected.
> >> +
> >> +If interrupts are used to get LSC information, the hardware itself
> >> +triggers an interrupt when link state change happens, the interrupt
> >> +thread wakes up from sleep, updates the information, and goes back to
> >> +sleep mode. When no link state change happens (most of the time), the
> >> +thread remains in sleep mode and doesn`t use processor time at all. The
> >> +disadvantage of this method is that when an interrupt happens, the
> >> +processor has to handle it immediately, so it puts the currently
> >> +running process to background, handles the interrupt, and takes the
> >> background process back.
> >> +
> >> +Note that not all PMD drivers support LSC interrupts.
> >> +
> >> +The default configuration is polling mode. To set interrupt mode,
> >> +option ``dpdk-lsc-interrupt`` has to be set to ``true``.
> >> +
> >> +Command to set interrupt mode for a specific interface::
> >> +    $ ovs-vsctl set interface <iface_name>
> >> +options:dpdk-lsc-interrupt=true
> >> +
> >> +Command to set polling mode for a specific interface::
> >> +    $ ovs-vsctl set interface <iface_name>
> >> +options:dpdk-lsc-interrupt=false
> >> +
> >> +Command to remove ``dpdk-lsc-interrupt`` option::
> >> +    $ ovs-vsctl remove interface <iface_name> options
> >> +dpdk-lsc-interrupt
> >
> > Just a query, why do we need the above option to remove lsc, is setting lsc true or false with the previous commands not enough?
> >
> >> +
> >>  Limitations
> >>  ------------
> >>
> >> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 94fb163..e2794e8
> >> 100644
> >> --- a/lib/netdev-dpdk.c
> >> +++ b/lib/netdev-dpdk.c
> >> @@ -433,6 +433,12 @@ struct netdev_dpdk {
> >>          /* DPDK-ETH hardware offload features,
> >>           * from the enum set 'dpdk_hw_ol_features' */
> >>          uint32_t hw_ol_features;
> >> +
> >> +        /* Properties for link state change detection mode.
> >> +         * If lsc_interrupt_mode is set to false, poll mode is used,
> >> +         * otherwise interrupt mode is used. */
> >> +        bool requested_lsc_interrupt_mode;
> >> +        bool lsc_interrupt_mode;
> >>      );
> >>
> >>      PADDED_MEMBERS(CACHE_LINE_SIZE,
> >> @@ -686,12 +692,14 @@ dpdk_watchdog(void *dummy OVS_UNUSED)  }
> >>
> >>  static int
> >> -dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
> >> +dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int n_rxq, int n_txq)
> >>  {
> >>      int diag = 0;
> >>      int i;
> >>      struct rte_eth_conf conf = port_conf;
> >>
> >> +    conf.intr_conf.lsc = dev->lsc_interrupt_mode;
> >
> > Should above be dev->requested_lsc_interrupt_mode? Similar to how we request MTUs, we first have to validate if it's supported.
> > Since we have no way of knowing if the interrupt mode is supported at the moment we should set conf.intr_conf.lsc = dev-
> >requested_lsc_interrupt_mode and once configuration succeeds set dev->lsc_interrupt_mode = conf.intr_conf.lsc.
> >
> > If we don't then I'm not sure of the purpose of having requested_lsc_interrupt_mode and a separate lsc_interrupt_mode for dev.
> >
> > Thoughts?
> >
> >> +
> >>      /* For some NICs (e.g. Niantic), scatter_rx mode needs to be
> >> explicitly
> >>       * enabled. */
> >>      if (dev->mtu > ETHER_MTU) {
> >> @@ -801,7 +809,7 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev)
> >>      n_rxq = MIN(info.max_rx_queues, dev->up.n_rxq);
> >>      n_txq = MIN(info.max_tx_queues, dev->up.n_txq);
> >>
> >> -    diag = dpdk_eth_dev_queue_setup(dev, n_rxq, n_txq);
> >> +    diag = dpdk_eth_dev_port_config(dev, n_rxq, n_txq);
> >>      if (diag) {
> >>          VLOG_ERR("Interface %s(rxq:%d txq:%d) configure error: %s",
> >>                   dev->up.name, n_rxq, n_txq, rte_strerror(-diag));
> >
> > With the change from dpdk_eth_dev_queue_setup to dpdk_eth_dev_port_config I think we need to improve the error message
> also logged above.
> > Instead of only reporting the rxq and txq we should include the requested lsc mode also as that could cause a failure if not
> supported.
> >
> >  @@ -
> >> 897,6 +905,7 @@ common_construct(struct netdev *netdev, dpdk_port_t
> >> port_no,
> >>      dev->flags = 0;
> >>      dev->requested_mtu = ETHER_MTU;
> >>      dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
> >> +    dev->requested_lsc_interrupt_mode = 0;
> >>      ovsrcu_index_init(&dev->vid, -1);
> >>      dev->vhost_reconfigured = false;
> >>      dev->attached = false;
> >> @@ -1320,6 +1329,8 @@ netdev_dpdk_get_config(const struct netdev *netdev,
> >> struct smap *args)
> >>          } else {
> >>              smap_add(args, "rx_csum_offload", "false");
> >>          }
> >> +        smap_add(args, "lsc_interrupt_mode",
> >> +                 dev->lsc_interrupt_mode?"true":"false");
> >
> > Need space before/after operators '?' and ':' above.
> >
> >>      }
> >>      ovs_mutex_unlock(&dev->mutex);
> >>
> >> @@ -1520,6 +1531,12 @@ netdev_dpdk_set_config(struct netdev *netdev, const
> >> struct smap *args,
> >>          goto out;
> >>      }
> >>
> >> +    bool lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt",
> >
> > This comes down to style preference but can you declare this bool at the beginning of the function? There are other bool variable in
> the function declared there already so it keeps with the existing style of the function.
> 
> +1
> 
> >
> >> false);
> >> +    if (dev->requested_lsc_interrupt_mode != lsc_interrupt_mode) {
> >> +        dev->requested_lsc_interrupt_mode = lsc_interrupt_mode;
> >> +        netdev_request_reconfigure(netdev);
> >> +    }
> >> +
> >>      rx_fc_en = smap_get_bool(args, "rx-flow-ctrl", false);
> >>      tx_fc_en = smap_get_bool(args, "tx-flow-ctrl", false);
> >>      autoneg = smap_get_bool(args, "flow-ctrl-autoneg", false); @@ -3546,6
> >> +3563,7 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
> >>      if (netdev->n_txq == dev->requested_n_txq
> >>          && netdev->n_rxq == dev->requested_n_rxq
> >>          && dev->mtu == dev->requested_mtu
> >> +        && dev->lsc_interrupt_mode == dev->requested_lsc_interrupt_mode
> >>          && dev->rxq_size == dev->requested_rxq_size
> >>          && dev->txq_size == dev->requested_txq_size
> >>          && dev->socket_id == dev->requested_socket_id) { @@ -3561,6
> >> +3579,8 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
> >>          goto out;
> >>      }
> >>
> >> +    dev->lsc_interrupt_mode = dev->requested_lsc_interrupt_mode;
> >
> > I think this would be removed from here and only set once rte_eth_dev_configure succeeds in dpdk_eth_dev_port_config as
> mentioned earlier?
> >
> >> +
> >>      netdev->n_txq = dev->requested_n_txq;
> >>      netdev->n_rxq = dev->requested_n_rxq;
> >>
> >
> > It could be useful to report the LSC mode configured for the port also as part of the port status, something like
> >
> > @@ -2781,6 +2783,8 @@ netdev_dpdk_get_status(const struct netdev *netdev, struct smap *args)
> >                             dev_info.max_hash_mac_addrs);
> >      smap_add_format(args, "max_vfs", "%u", dev_info.max_vfs);
> >      smap_add_format(args, "max_vmdq_pools", "%u", dev_info.max_vmdq_pools);
> > +    smap_add_format(args, "lsc_mode", "%s",
> > +                           dev->lsc_interrupt_mode ? "Interrupt" : "PMD");
> >
> > Thoughts?
> 
> We have this in 'netdev_dpdk_get_config()'. Do you think we need this
> in two places? Also, 'netdev_dpdk_get_status()' is more like HW
> specific invariants. (Not sure why we're placing these values in 'status',
> it's a bit strange.)
> 
> >
> >> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index
> >> 0c6a43d..3c9e637 100644
> >> --- a/vswitchd/vswitch.xml
> >> +++ b/vswitchd/vswitch.xml
> >> @@ -3631,6 +3631,23 @@ ovs-vsctl add-port br0 p0 -- set Interface p0
> >> type=patch options:peer=p1 \
> >>        </column>
> >>      </group>
> >>
> >> +    <group title="Link State Change detection mode">
> >> +      <column name="options" key="dpdk-lsc-interrupt"
> >> +              type='{"type": "boolean"}'>
> >> +        <p>
> >> +          Set this value to <code>true</code> to configure interrupt mode
> >> for
> >> +          Link State Change (LSC) detection instead of poll mode for the
> >> DPDK
> >> +          interface.
> >> +        </p>
> >> +        <p>
> >> +          If this value is not set, poll mode is configured.
> >> +        </p>
> >> +        <p>
> >> +          This parameter has an effect only on netdev dpdk interfaces.
> >> +        </p>
> >> +      </column>
> >> +    </group>
> >> +
> >>      <group title="Common Columns">
> >>        The overall purpose of these columns is described under
> >> <code>Common
> >>        Columns</code> at the beginning of this document.
> >> --
> >> 1.9.1
> >>
> >> _______________________________________________
> >> dev mailing list
> >> dev@openvswitch.org
> >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >
> >
> >
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Eelco Chaudron March 28, 2018, 4:28 p.m. | #5
On 03/27/2018 04:07 PM, Stokes, Ian wrote:
>> On 27.03.2018 13:19, Stokes, Ian wrote:
>>>> It is possible to change LSC detection mode to polling or interrupt
>>>> mode for DPDK interfaces. The default is polling mode. To set
>>>> interrupt mode, option dpdk-lsc-interrupt has to be set to true.
>>>>
>>>> In polling mode more processor time is needed, since the OVS
>>>> repeatedly reads the link state with a short period. It can lead to
>>>> packet loss for certain systems.
>>>>
>>>> In interrupt mode the hardware itself triggers an interrupt when link
>>>> state change happens, so less processing time needs for the OVS.
>>>>
>>>> For detailed description and usage see the dpdk install documentation.
>> Could you, please, better describe why we need this change?
>> Because we're not removing the polling thread. OVS will still poll the
>> link states periodically. This config option has no effect on that side.
>> Also, link state polling in OVS uses 'rte_eth_link_get_nowait()' function
>> which will be called in both cases and should not wait for hardware reply
>> in any implementation.

rte_eth_link_get_nowait() on Intel XL710 could take an excessive time to 
respond. The following patch, 
https://dpdk.org/ml/archives/dev/2018-March/092156.html is taking care 
of it from a DPDK side.

There might be other drivers that also take a long time, hence this 
patch might still be useful in the future.

> I believe it was related to a case where bonded mode in active back was causing packet drops due to the frequency that the LSC was being polled. Using interrupt based approach alleviated the issue. (I'm open to correction on this :))
>
> @Robert/Eelco You may be able to provide some more light here and whether the patches below in DPDK resolve the issue?
>
This long delay can be an issue in bonding mode, as the links checks for 
bonding interfaces is holding the RW lock in bond_run(). This same lock 
is taken in the PMD thread when calling the bond_check_admissibility() 
for upcall traffic.
>> There was recent bug fix for intel NICs that fixes waiting of an admin
>> queue on link state requests despite of 'no_wait' flag:
>>      http://dpdk.org/ml/archives/dev/2018-March/092156.html
>> Will this fix your target case?
>>
>> So, the difference of execution time of 'rte_eth_link_get_nowait()'
>> with enabled and disabled interrupts should be not so significant.
>> Do you have performance measurements? Measurement with above fix applied?
I do not have delay numbers but I know that we were no longer seeing 
dropped traffic compared to other NICs under the same load with upcall 
traffic present.
>>> Thanks for working on this Robert.
>>>
>>> I've completed some testing including the case where LSC is not
>> supported, in which case the port will remain in a down state and fail
>> rx/tx traffic. This behavior conforms to the netdev_reconfigure
>> expectations in the fail case so that's ok.
>>
>> I'm not sure if this is acceptable. For example, we're not failing
>> reconfiguration in case of issues with number of queues. We're trying
>> different numbers until we have working configuration.
>> Maybe we need the same fall-back mechanism in case of not supported LSC
>> interrupts? (MTU setup errors are really uncommon unlike LSC interrupts'
>> support in PMDs).
> Thanks for raising this Ilya.
>
> I thought of this as well. I'd like to see a fall back to the PMD but didn’t see how it could be done in a clean way.
>
> Unfortunately rte_eth_dev_configure() returns -EINVAL when lsc mode is requested but not supported.
>
> It doesn't give us a clue if the error is related to lsc mode as it could also relate to a number of other configure issues such as nb_rxq/nb_txq/portid etc.
>
> It would be better if we could query the device via ethdev api to see if it supports lsc interrupt mode but that’s not available currently.
Maybe a DPDK patch before we continue?
> The only way I have seen lsc support queries in DPDK is by querying the device data itself which doesn't look great, this was discussed in an earlier thread I think for this patch.
>
> The alternative could be to introduce a generic retry for rte_eth_dev_configure() when configure fails but with lsc configured to PMD instead but I'd prefer to see an indication that the lsc mode was the cause.
>
> If we have a cleaner way to fall back to PMD that’s ok by me but I think we have to allow for a possible failure in during a reconfigure and follow the prescribed behavior as per netdev_reconfigure() which this code currently does.
>
>>> I'm a bit late to the thread but I have a few other comments below.
>>>
>>> I'd like to get this patch in the next pull request if possible so I'd
>> appreciate if others can give any comments on the patch also.
>>> Thanks
>>> Ian
>>>
>>>> Signed-off-by: Robert Mulik <robert.mulik@ericsson.com>
>>>> ---
>>>> v5 -> v6:
>>>> - DPDK install documentation updated.
>>>> - Status of lsc_interrupt_mode of DPDK interfaces can be read by
>> command
>>>>    ovs-appctl dpif/show.
>>>> - It was suggested to check if the HW supports interrupt mode, but it
>>>> is not
>>>>    possible to do without DPDK code change, so it is skipped from this
>>>> patch.
>>>> ---
>>>>   Documentation/intro/install/dpdk.rst | 33
>>>> +++++++++++++++++++++++++++++++++
>>>>   lib/netdev-dpdk.c                    | 24 ++++++++++++++++++++++--
>>>>   vswitchd/vswitch.xml                 | 17 +++++++++++++++++
>>>>   3 files changed, 72 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/Documentation/intro/install/dpdk.rst
>>>> b/Documentation/intro/install/dpdk.rst
>>>> index ed358d5..eb1bc7b 100644
>>>> --- a/Documentation/intro/install/dpdk.rst
>>>> +++ b/Documentation/intro/install/dpdk.rst
>>>> @@ -628,6 +628,39 @@ The average number of packets per output batch
>>>> can be checked in PMD stats::
>>>>
>>>>       $ ovs-appctl dpif-netdev/pmd-stats-show
>>>>
>>>> +Link State Change (LSC) detection configuration
>>>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>> +
>>>> +There are two methods to get the information when Link State Change
>>>> +(LSC) happens on a network interface: by polling or interrupt.
>>>> +
>>>> +With the polling method, the main process checks the link state on a
>>>> +fixed interval. This fixed interval determines how fast a link
>>>> +change is detected.
>>>> +
>>>> +If interrupts are used to get LSC information, the hardware itself
>>>> +triggers an interrupt when link state change happens, the interrupt
>>>> +thread wakes up from sleep, updates the information, and goes back
>>>> +to sleep mode. When no link state change happens (most of the time),
>>>> +the thread remains in sleep mode and doesn`t use processor time at
>>>> +all. The disadvantage of this method is that when an interrupt
>>>> +happens, the processor has to handle it immediately, so it puts the
>>>> +currently running process to background, handles the interrupt, and
>>>> +takes the
>>>> background process back.
>>>> +
>>>> +Note that not all PMD drivers support LSC interrupts.
>>>> +
>>>> +The default configuration is polling mode. To set interrupt mode,
>>>> +option ``dpdk-lsc-interrupt`` has to be set to ``true``.
>>>> +
>>>> +Command to set interrupt mode for a specific interface::
>>>> +    $ ovs-vsctl set interface <iface_name>
>>>> +options:dpdk-lsc-interrupt=true
>>>> +
>>>> +Command to set polling mode for a specific interface::
>>>> +    $ ovs-vsctl set interface <iface_name>
>>>> +options:dpdk-lsc-interrupt=false
>>>> +
>>>> +Command to remove ``dpdk-lsc-interrupt`` option::
>>>> +    $ ovs-vsctl remove interface <iface_name> options
>>>> +dpdk-lsc-interrupt
>>> Just a query, why do we need the above option to remove lsc, is setting
>> lsc true or false with the previous commands not enough?
>>>> +
>>>>   Limitations
>>>>   ------------
>>>>
>>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index
>>>> 94fb163..e2794e8
>>>> 100644
>>>> --- a/lib/netdev-dpdk.c
>>>> +++ b/lib/netdev-dpdk.c
>>>> @@ -433,6 +433,12 @@ struct netdev_dpdk {
>>>>           /* DPDK-ETH hardware offload features,
>>>>            * from the enum set 'dpdk_hw_ol_features' */
>>>>           uint32_t hw_ol_features;
>>>> +
>>>> +        /* Properties for link state change detection mode.
>>>> +         * If lsc_interrupt_mode is set to false, poll mode is used,
>>>> +         * otherwise interrupt mode is used. */
>>>> +        bool requested_lsc_interrupt_mode;
>>>> +        bool lsc_interrupt_mode;
>>>>       );
>>>>
>>>>       PADDED_MEMBERS(CACHE_LINE_SIZE,
>>>> @@ -686,12 +692,14 @@ dpdk_watchdog(void *dummy OVS_UNUSED)  }
>>>>
>>>>   static int
>>>> -dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int
>>>> n_txq)
>>>> +dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int n_rxq, int
>>>> +n_txq)
>>>>   {
>>>>       int diag = 0;
>>>>       int i;
>>>>       struct rte_eth_conf conf = port_conf;
>>>>
>>>> +    conf.intr_conf.lsc = dev->lsc_interrupt_mode;
>>> Should above be dev->requested_lsc_interrupt_mode? Similar to how we
>> request MTUs, we first have to validate if it's supported.
>>> Since we have no way of knowing if the interrupt mode is supported at
>> the moment we should set conf.intr_conf.lsc = dev-
>>> requested_lsc_interrupt_mode and once configuration succeeds set dev-
>>> lsc_interrupt_mode = conf.intr_conf.lsc.
>>>
>>> If we don't then I'm not sure of the purpose of having
>> requested_lsc_interrupt_mode and a separate lsc_interrupt_mode for dev.
>>> Thoughts?
>>>
>>>> +
>>>>       /* For some NICs (e.g. Niantic), scatter_rx mode needs to be
>>>> explicitly
>>>>        * enabled. */
>>>>       if (dev->mtu > ETHER_MTU) {
>>>> @@ -801,7 +809,7 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev)
>>>>       n_rxq = MIN(info.max_rx_queues, dev->up.n_rxq);
>>>>       n_txq = MIN(info.max_tx_queues, dev->up.n_txq);
>>>>
>>>> -    diag = dpdk_eth_dev_queue_setup(dev, n_rxq, n_txq);
>>>> +    diag = dpdk_eth_dev_port_config(dev, n_rxq, n_txq);
>>>>       if (diag) {
>>>>           VLOG_ERR("Interface %s(rxq:%d txq:%d) configure error: %s",
>>>>                    dev->up.name, n_rxq, n_txq, rte_strerror(-diag));
>>> With the change from dpdk_eth_dev_queue_setup to
>> dpdk_eth_dev_port_config I think we need to improve the error message also
>> logged above.
>>> Instead of only reporting the rxq and txq we should include the
>> requested lsc mode also as that could cause a failure if not supported.
>>>   @@ -
>>>> 897,6 +905,7 @@ common_construct(struct netdev *netdev, dpdk_port_t
>>>> port_no,
>>>>       dev->flags = 0;
>>>>       dev->requested_mtu = ETHER_MTU;
>>>>       dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
>>>> +    dev->requested_lsc_interrupt_mode = 0;
>>>>       ovsrcu_index_init(&dev->vid, -1);
>>>>       dev->vhost_reconfigured = false;
>>>>       dev->attached = false;
>>>> @@ -1320,6 +1329,8 @@ netdev_dpdk_get_config(const struct netdev
>>>> *netdev, struct smap *args)
>>>>           } else {
>>>>               smap_add(args, "rx_csum_offload", "false");
>>>>           }
>>>> +        smap_add(args, "lsc_interrupt_mode",
>>>> +                 dev->lsc_interrupt_mode?"true":"false");
>>> Need space before/after operators '?' and ':' above.
>>>
>>>>       }
>>>>       ovs_mutex_unlock(&dev->mutex);
>>>>
>>>> @@ -1520,6 +1531,12 @@ netdev_dpdk_set_config(struct netdev *netdev,
>>>> const struct smap *args,
>>>>           goto out;
>>>>       }
>>>>
>>>> +    bool lsc_interrupt_mode = smap_get_bool(args,
>>>> + "dpdk-lsc-interrupt",
>>> This comes down to style preference but can you declare this bool at the
>> beginning of the function? There are other bool variable in the function
>> declared there already so it keeps with the existing style of the
>> function.
>>
>> +1
>>
>>>> false);
>>>> +    if (dev->requested_lsc_interrupt_mode != lsc_interrupt_mode) {
>>>> +        dev->requested_lsc_interrupt_mode = lsc_interrupt_mode;
>>>> +        netdev_request_reconfigure(netdev);
>>>> +    }
>>>> +
>>>>       rx_fc_en = smap_get_bool(args, "rx-flow-ctrl", false);
>>>>       tx_fc_en = smap_get_bool(args, "tx-flow-ctrl", false);
>>>>       autoneg = smap_get_bool(args, "flow-ctrl-autoneg", false); @@
>>>> -3546,6
>>>> +3563,7 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>>>>       if (netdev->n_txq == dev->requested_n_txq
>>>>           && netdev->n_rxq == dev->requested_n_rxq
>>>>           && dev->mtu == dev->requested_mtu
>>>> +        && dev->lsc_interrupt_mode ==
>>>> + dev->requested_lsc_interrupt_mode
>>>>           && dev->rxq_size == dev->requested_rxq_size
>>>>           && dev->txq_size == dev->requested_txq_size
>>>>           && dev->socket_id == dev->requested_socket_id) { @@ -3561,6
>>>> +3579,8 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>>>>           goto out;
>>>>       }
>>>>
>>>> +    dev->lsc_interrupt_mode = dev->requested_lsc_interrupt_mode;
>>> I think this would be removed from here and only set once
>> rte_eth_dev_configure succeeds in dpdk_eth_dev_port_config as mentioned
>> earlier?
>>>> +
>>>>       netdev->n_txq = dev->requested_n_txq;
>>>>       netdev->n_rxq = dev->requested_n_rxq;
>>>>
>>> It could be useful to report the LSC mode configured for the port also
>>> as part of the port status, something like
>>>
>>> @@ -2781,6 +2783,8 @@ netdev_dpdk_get_status(const struct netdev
>> *netdev, struct smap *args)
>>>                              dev_info.max_hash_mac_addrs);
>>>       smap_add_format(args, "max_vfs", "%u", dev_info.max_vfs);
>>>       smap_add_format(args, "max_vmdq_pools", "%u",
>>> dev_info.max_vmdq_pools);
>>> +    smap_add_format(args, "lsc_mode", "%s",
>>> +                           dev->lsc_interrupt_mode ? "Interrupt" :
>>> + "PMD");
>>>
>>> Thoughts?
>> We have this in 'netdev_dpdk_get_config()'. Do you think we need this in
>> two places? Also, 'netdev_dpdk_get_status()' is more like HW specific
>> invariants. (Not sure why we're placing these values in 'status', it's a
>> bit strange.)
> Ya, I wasn't sure here so thought I'd suggest it.
>
> You have values in get_status like max_rx_packet_length that change based on user configuration, so for completeness really I suggested it be added here.
>
> I don’t feel particularly strongly about it though, wanted to raise it for discussion as it may be something the user expects.
>
> Ian
>>>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index
>>>> 0c6a43d..3c9e637 100644
>>>> --- a/vswitchd/vswitch.xml
>>>> +++ b/vswitchd/vswitch.xml
>>>> @@ -3631,6 +3631,23 @@ ovs-vsctl add-port br0 p0 -- set Interface p0
>>>> type=patch options:peer=p1 \
>>>>         </column>
>>>>       </group>
>>>>
>>>> +    <group title="Link State Change detection mode">
>>>> +      <column name="options" key="dpdk-lsc-interrupt"
>>>> +              type='{"type": "boolean"}'>
>>>> +        <p>
>>>> +          Set this value to <code>true</code> to configure interrupt
>>>> + mode
>>>> for
>>>> +          Link State Change (LSC) detection instead of poll mode for
>>>> + the
>>>> DPDK
>>>> +          interface.
>>>> +        </p>
>>>> +        <p>
>>>> +          If this value is not set, poll mode is configured.
>>>> +        </p>
>>>> +        <p>
>>>> +          This parameter has an effect only on netdev dpdk interfaces.
>>>> +        </p>
>>>> +      </column>
>>>> +    </group>
>>>> +
>>>>       <group title="Common Columns">
>>>>         The overall purpose of these columns is described under
>>>> <code>Common
>>>>         Columns</code> at the beginning of this document.
>>>> --
>>>> 1.9.1
>>>>
>>>> _______________________________________________
>>>> dev mailing list
>>>> dev@openvswitch.org
>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>
>>>
Stokes, Ian April 5, 2018, 1:17 p.m. | #6
> On 03/27/2018 04:07 PM, Stokes, Ian wrote:
> >> On 27.03.2018 13:19, Stokes, Ian wrote:
> >>>> It is possible to change LSC detection mode to polling or interrupt
> >>>> mode for DPDK interfaces. The default is polling mode. To set
> >>>> interrupt mode, option dpdk-lsc-interrupt has to be set to true.
> >>>>
> >>>> In polling mode more processor time is needed, since the OVS
> >>>> repeatedly reads the link state with a short period. It can lead to
> >>>> packet loss for certain systems.
> >>>>
> >>>> In interrupt mode the hardware itself triggers an interrupt when
> >>>> link state change happens, so less processing time needs for the OVS.
> >>>>
> >>>> For detailed description and usage see the dpdk install
> documentation.
> >> Could you, please, better describe why we need this change?
> >> Because we're not removing the polling thread. OVS will still poll
> >> the link states periodically. This config option has no effect on that
> side.
> >> Also, link state polling in OVS uses 'rte_eth_link_get_nowait()'
> >> function which will be called in both cases and should not wait for
> >> hardware reply in any implementation.
> 
> rte_eth_link_get_nowait() on Intel XL710 could take an excessive time to
> respond. The following patch, https://dpdk.org/ml/archives/dev/2018-
> March/092156.html is taking care of it from a DPDK side.
> 
> There might be other drivers that also take a long time, hence this patch
> might still be useful in the future.
> 
> > I believe it was related to a case where bonded mode in active back
> > was causing packet drops due to the frequency that the LSC was being
> > polled. Using interrupt based approach alleviated the issue. (I'm open
> > to correction on this :))
> >
> > @Robert/Eelco You may be able to provide some more light here and
> whether the patches below in DPDK resolve the issue?
> >
> This long delay can be an issue in bonding mode, as the links checks for
> bonding interfaces is holding the RW lock in bond_run(). This same lock is
> taken in the PMD thread when calling the bond_check_admissibility() for
> upcall traffic.
> >> There was recent bug fix for intel NICs that fixes waiting of an
> >> admin queue on link state requests despite of 'no_wait' flag:
> >>      http://dpdk.org/ml/archives/dev/2018-March/092156.html
> >> Will this fix your target case?
> >>
> >> So, the difference of execution time of 'rte_eth_link_get_nowait()'
> >> with enabled and disabled interrupts should be not so significant.
> >> Do you have performance measurements? Measurement with above fix
> applied?
> I do not have delay numbers but I know that we were no longer seeing
> dropped traffic compared to other NICs under the same load with upcall
> traffic present.
> >>> Thanks for working on this Robert.
> >>>
> >>> I've completed some testing including the case where LSC is not
> >> supported, in which case the port will remain in a down state and
> >> fail rx/tx traffic. This behavior conforms to the netdev_reconfigure
> >> expectations in the fail case so that's ok.
> >>
> >> I'm not sure if this is acceptable. For example, we're not failing
> >> reconfiguration in case of issues with number of queues. We're trying
> >> different numbers until we have working configuration.
> >> Maybe we need the same fall-back mechanism in case of not supported
> >> LSC interrupts? (MTU setup errors are really uncommon unlike LSC
> interrupts'
> >> support in PMDs).
> > Thanks for raising this Ilya.
> >
> > I thought of this as well. I'd like to see a fall back to the PMD but
> didn’t see how it could be done in a clean way.
> >
> > Unfortunately rte_eth_dev_configure() returns -EINVAL when lsc mode is
> requested but not supported.
> >
> > It doesn't give us a clue if the error is related to lsc mode as it
> could also relate to a number of other configure issues such as
> nb_rxq/nb_txq/portid etc.
> >
> > It would be better if we could query the device via ethdev api to see if
> it supports lsc interrupt mode but that’s not available currently.

> Maybe a DPDK patch before we continue?

It's hard to say. As part of the call to rte_eth_dev_configure() there is a check specifically to see if lsc interrupt is supported with the following.

	/* Check that the device supports requested interrupts */
	if ((dev_conf->intr_conf.lsc == 1) &&
		(!(dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC))) {
			RTE_PMD_DEBUG_TRACE("driver %s does not support lsc\n",
					dev->device->driver->name);
			return -EINVAL;
	}

Even if a patch was submitted to extend the API in DPDK to allow this check specifically, I feel it will be the same code as above, just in a new function name. The check would remain in rte_eth_dev_configure() anyway.

We could do something similar in OVS for now to allow PMD fallback where it's not supported.

Ian

> > The only way I have seen lsc support queries in DPDK is by querying the
> device data itself which doesn't look great, this was discussed in an
> earlier thread I think for this patch.
> >
> > The alternative could be to introduce a generic retry for
> rte_eth_dev_configure() when configure fails but with lsc configured to
> PMD instead but I'd prefer to see an indication that the lsc mode was the
> cause.
> >
> > If we have a cleaner way to fall back to PMD that’s ok by me but I think
> we have to allow for a possible failure in during a reconfigure and follow
> the prescribed behavior as per netdev_reconfigure() which this code
> currently does.
> >
> >>> I'm a bit late to the thread but I have a few other comments below.
> >>>
> >>> I'd like to get this patch in the next pull request if possible so
> >>> I'd
> >> appreciate if others can give any comments on the patch also.
> >>> Thanks
> >>> Ian
> >>>
> >>>> Signed-off-by: Robert Mulik <robert.mulik@ericsson.com>
> >>>> ---
> >>>> v5 -> v6:
> >>>> - DPDK install documentation updated.
> >>>> - Status of lsc_interrupt_mode of DPDK interfaces can be read by
> >> command
> >>>>    ovs-appctl dpif/show.
> >>>> - It was suggested to check if the HW supports interrupt mode, but
> >>>> it is not
> >>>>    possible to do without DPDK code change, so it is skipped from
> >>>> this patch.
> >>>> ---
> >>>>   Documentation/intro/install/dpdk.rst | 33
> >>>> +++++++++++++++++++++++++++++++++
> >>>>   lib/netdev-dpdk.c                    | 24 ++++++++++++++++++++++--
> >>>>   vswitchd/vswitch.xml                 | 17 +++++++++++++++++
> >>>>   3 files changed, 72 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/Documentation/intro/install/dpdk.rst
> >>>> b/Documentation/intro/install/dpdk.rst
> >>>> index ed358d5..eb1bc7b 100644
> >>>> --- a/Documentation/intro/install/dpdk.rst
> >>>> +++ b/Documentation/intro/install/dpdk.rst
> >>>> @@ -628,6 +628,39 @@ The average number of packets per output batch
> >>>> can be checked in PMD stats::
> >>>>
> >>>>       $ ovs-appctl dpif-netdev/pmd-stats-show
> >>>>
> >>>> +Link State Change (LSC) detection configuration
> >>>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >>>> +
> >>>> +There are two methods to get the information when Link State
> >>>> +Change
> >>>> +(LSC) happens on a network interface: by polling or interrupt.
> >>>> +
> >>>> +With the polling method, the main process checks the link state on
> >>>> +a fixed interval. This fixed interval determines how fast a link
> >>>> +change is detected.
> >>>> +
> >>>> +If interrupts are used to get LSC information, the hardware itself
> >>>> +triggers an interrupt when link state change happens, the
> >>>> +interrupt thread wakes up from sleep, updates the information, and
> >>>> +goes back to sleep mode. When no link state change happens (most
> >>>> +of the time), the thread remains in sleep mode and doesn`t use
> >>>> +processor time at all. The disadvantage of this method is that
> >>>> +when an interrupt happens, the processor has to handle it
> >>>> +immediately, so it puts the currently running process to
> >>>> +background, handles the interrupt, and takes the
> >>>> background process back.
> >>>> +
> >>>> +Note that not all PMD drivers support LSC interrupts.
> >>>> +
> >>>> +The default configuration is polling mode. To set interrupt mode,
> >>>> +option ``dpdk-lsc-interrupt`` has to be set to ``true``.
> >>>> +
> >>>> +Command to set interrupt mode for a specific interface::
> >>>> +    $ ovs-vsctl set interface <iface_name>
> >>>> +options:dpdk-lsc-interrupt=true
> >>>> +
> >>>> +Command to set polling mode for a specific interface::
> >>>> +    $ ovs-vsctl set interface <iface_name>
> >>>> +options:dpdk-lsc-interrupt=false
> >>>> +
> >>>> +Command to remove ``dpdk-lsc-interrupt`` option::
> >>>> +    $ ovs-vsctl remove interface <iface_name> options
> >>>> +dpdk-lsc-interrupt
> >>> Just a query, why do we need the above option to remove lsc, is
> >>> setting
> >> lsc true or false with the previous commands not enough?
> >>>> +
> >>>>   Limitations
> >>>>   ------------
> >>>>
> >>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index
> >>>> 94fb163..e2794e8
> >>>> 100644
> >>>> --- a/lib/netdev-dpdk.c
> >>>> +++ b/lib/netdev-dpdk.c
> >>>> @@ -433,6 +433,12 @@ struct netdev_dpdk {
> >>>>           /* DPDK-ETH hardware offload features,
> >>>>            * from the enum set 'dpdk_hw_ol_features' */
> >>>>           uint32_t hw_ol_features;
> >>>> +
> >>>> +        /* Properties for link state change detection mode.
> >>>> +         * If lsc_interrupt_mode is set to false, poll mode is used,
> >>>> +         * otherwise interrupt mode is used. */
> >>>> +        bool requested_lsc_interrupt_mode;
> >>>> +        bool lsc_interrupt_mode;
> >>>>       );
> >>>>
> >>>>       PADDED_MEMBERS(CACHE_LINE_SIZE, @@ -686,12 +692,14 @@
> >>>> dpdk_watchdog(void *dummy OVS_UNUSED)  }
> >>>>
> >>>>   static int
> >>>> -dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int
> >>>> n_txq)
> >>>> +dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int n_rxq, int
> >>>> +n_txq)
> >>>>   {
> >>>>       int diag = 0;
> >>>>       int i;
> >>>>       struct rte_eth_conf conf = port_conf;
> >>>>
> >>>> +    conf.intr_conf.lsc = dev->lsc_interrupt_mode;
> >>> Should above be dev->requested_lsc_interrupt_mode? Similar to how we
> >> request MTUs, we first have to validate if it's supported.
> >>> Since we have no way of knowing if the interrupt mode is supported
> >>> at
> >> the moment we should set conf.intr_conf.lsc = dev-
> >>> requested_lsc_interrupt_mode and once configuration succeeds set
> >>> dev- lsc_interrupt_mode = conf.intr_conf.lsc.
> >>>
> >>> If we don't then I'm not sure of the purpose of having
> >> requested_lsc_interrupt_mode and a separate lsc_interrupt_mode for dev.
> >>> Thoughts?
> >>>
> >>>> +
> >>>>       /* For some NICs (e.g. Niantic), scatter_rx mode needs to be
> >>>> explicitly
> >>>>        * enabled. */
> >>>>       if (dev->mtu > ETHER_MTU) {
> >>>> @@ -801,7 +809,7 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev)
> >>>>       n_rxq = MIN(info.max_rx_queues, dev->up.n_rxq);
> >>>>       n_txq = MIN(info.max_tx_queues, dev->up.n_txq);
> >>>>
> >>>> -    diag = dpdk_eth_dev_queue_setup(dev, n_rxq, n_txq);
> >>>> +    diag = dpdk_eth_dev_port_config(dev, n_rxq, n_txq);
> >>>>       if (diag) {
> >>>>           VLOG_ERR("Interface %s(rxq:%d txq:%d) configure error: %s",
> >>>>                    dev->up.name, n_rxq, n_txq,
> >>>> rte_strerror(-diag));
> >>> With the change from dpdk_eth_dev_queue_setup to
> >> dpdk_eth_dev_port_config I think we need to improve the error message
> >> also logged above.
> >>> Instead of only reporting the rxq and txq we should include the
> >> requested lsc mode also as that could cause a failure if not supported.
> >>>   @@ -
> >>>> 897,6 +905,7 @@ common_construct(struct netdev *netdev, dpdk_port_t
> >>>> port_no,
> >>>>       dev->flags = 0;
> >>>>       dev->requested_mtu = ETHER_MTU;
> >>>>       dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
> >>>> +    dev->requested_lsc_interrupt_mode = 0;
> >>>>       ovsrcu_index_init(&dev->vid, -1);
> >>>>       dev->vhost_reconfigured = false;
> >>>>       dev->attached = false;
> >>>> @@ -1320,6 +1329,8 @@ netdev_dpdk_get_config(const struct netdev
> >>>> *netdev, struct smap *args)
> >>>>           } else {
> >>>>               smap_add(args, "rx_csum_offload", "false");
> >>>>           }
> >>>> +        smap_add(args, "lsc_interrupt_mode",
> >>>> +                 dev->lsc_interrupt_mode?"true":"false");
> >>> Need space before/after operators '?' and ':' above.
> >>>
> >>>>       }
> >>>>       ovs_mutex_unlock(&dev->mutex);
> >>>>
> >>>> @@ -1520,6 +1531,12 @@ netdev_dpdk_set_config(struct netdev
> >>>> *netdev, const struct smap *args,
> >>>>           goto out;
> >>>>       }
> >>>>
> >>>> +    bool lsc_interrupt_mode = smap_get_bool(args,
> >>>> + "dpdk-lsc-interrupt",
> >>> This comes down to style preference but can you declare this bool at
> >>> the
> >> beginning of the function? There are other bool variable in the
> >> function declared there already so it keeps with the existing style
> >> of the function.
> >>
> >> +1
> >>
> >>>> false);
> >>>> +    if (dev->requested_lsc_interrupt_mode != lsc_interrupt_mode) {
> >>>> +        dev->requested_lsc_interrupt_mode = lsc_interrupt_mode;
> >>>> +        netdev_request_reconfigure(netdev);
> >>>> +    }
> >>>> +
> >>>>       rx_fc_en = smap_get_bool(args, "rx-flow-ctrl", false);
> >>>>       tx_fc_en = smap_get_bool(args, "tx-flow-ctrl", false);
> >>>>       autoneg = smap_get_bool(args, "flow-ctrl-autoneg", false); @@
> >>>> -3546,6
> >>>> +3563,7 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
> >>>>       if (netdev->n_txq == dev->requested_n_txq
> >>>>           && netdev->n_rxq == dev->requested_n_rxq
> >>>>           && dev->mtu == dev->requested_mtu
> >>>> +        && dev->lsc_interrupt_mode ==
> >>>> + dev->requested_lsc_interrupt_mode
> >>>>           && dev->rxq_size == dev->requested_rxq_size
> >>>>           && dev->txq_size == dev->requested_txq_size
> >>>>           && dev->socket_id == dev->requested_socket_id) { @@
> >>>> -3561,6
> >>>> +3579,8 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
> >>>>           goto out;
> >>>>       }
> >>>>
> >>>> +    dev->lsc_interrupt_mode = dev->requested_lsc_interrupt_mode;
> >>> I think this would be removed from here and only set once
> >> rte_eth_dev_configure succeeds in dpdk_eth_dev_port_config as
> >> mentioned earlier?
> >>>> +
> >>>>       netdev->n_txq = dev->requested_n_txq;
> >>>>       netdev->n_rxq = dev->requested_n_rxq;
> >>>>
> >>> It could be useful to report the LSC mode configured for the port
> >>> also as part of the port status, something like
> >>>
> >>> @@ -2781,6 +2783,8 @@ netdev_dpdk_get_status(const struct netdev
> >> *netdev, struct smap *args)
> >>>                              dev_info.max_hash_mac_addrs);
> >>>       smap_add_format(args, "max_vfs", "%u", dev_info.max_vfs);
> >>>       smap_add_format(args, "max_vmdq_pools", "%u",
> >>> dev_info.max_vmdq_pools);
> >>> +    smap_add_format(args, "lsc_mode", "%s",
> >>> +                           dev->lsc_interrupt_mode ? "Interrupt" :
> >>> + "PMD");
> >>>
> >>> Thoughts?
> >> We have this in 'netdev_dpdk_get_config()'. Do you think we need this
> >> in two places? Also, 'netdev_dpdk_get_status()' is more like HW
> >> specific invariants. (Not sure why we're placing these values in
> >> 'status', it's a bit strange.)
> > Ya, I wasn't sure here so thought I'd suggest it.
> >
> > You have values in get_status like max_rx_packet_length that change
> based on user configuration, so for completeness really I suggested it be
> added here.
> >
> > I don’t feel particularly strongly about it though, wanted to raise it
> for discussion as it may be something the user expects.
> >
> > Ian
> >>>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index
> >>>> 0c6a43d..3c9e637 100644
> >>>> --- a/vswitchd/vswitch.xml
> >>>> +++ b/vswitchd/vswitch.xml
> >>>> @@ -3631,6 +3631,23 @@ ovs-vsctl add-port br0 p0 -- set Interface
> >>>> p0 type=patch options:peer=p1 \
> >>>>         </column>
> >>>>       </group>
> >>>>
> >>>> +    <group title="Link State Change detection mode">
> >>>> +      <column name="options" key="dpdk-lsc-interrupt"
> >>>> +              type='{"type": "boolean"}'>
> >>>> +        <p>
> >>>> +          Set this value to <code>true</code> to configure
> >>>> + interrupt mode
> >>>> for
> >>>> +          Link State Change (LSC) detection instead of poll mode
> >>>> + for the
> >>>> DPDK
> >>>> +          interface.
> >>>> +        </p>
> >>>> +        <p>
> >>>> +          If this value is not set, poll mode is configured.
> >>>> +        </p>
> >>>> +        <p>
> >>>> +          This parameter has an effect only on netdev dpdk
> interfaces.
> >>>> +        </p>
> >>>> +      </column>
> >>>> +    </group>
> >>>> +
> >>>>       <group title="Common Columns">
> >>>>         The overall purpose of these columns is described under
> >>>> <code>Common
> >>>>         Columns</code> at the beginning of this document.
> >>>> --
> >>>> 1.9.1
> >>>>
> >>>> _______________________________________________
> >>>> dev mailing list
> >>>> dev@openvswitch.org
> >>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >>>
> >>>
Eelco Chaudron April 9, 2018, 7:53 a.m. | #7
On 05/04/18 15:17, Stokes, Ian wrote:
>> On 03/27/2018 04:07 PM, Stokes, Ian wrote:
>>>> On 27.03.2018 13:19, Stokes, Ian wrote:
>>>>>> It is possible to change LSC detection mode to polling or interrupt
>>>>>> mode for DPDK interfaces. The default is polling mode. To set
>>>>>> interrupt mode, option dpdk-lsc-interrupt has to be set to true.
>>>>>>
>>>>>> In polling mode more processor time is needed, since the OVS
>>>>>> repeatedly reads the link state with a short period. It can lead to
>>>>>> packet loss for certain systems.
>>>>>>
>>>>>> In interrupt mode the hardware itself triggers an interrupt when
>>>>>> link state change happens, so less processing time needs for the OVS.
>>>>>>
>>>>>> For detailed description and usage see the dpdk install
>> documentation.
>>>> Could you, please, better describe why we need this change?
>>>> Because we're not removing the polling thread. OVS will still poll
>>>> the link states periodically. This config option has no effect on that
>> side.
>>>> Also, link state polling in OVS uses 'rte_eth_link_get_nowait()'
>>>> function which will be called in both cases and should not wait for
>>>> hardware reply in any implementation.
>> rte_eth_link_get_nowait() on Intel XL710 could take an excessive time to
>> respond. The following patch, https://dpdk.org/ml/archives/dev/2018-
>> March/092156.html is taking care of it from a DPDK side.
>>
>> There might be other drivers that also take a long time, hence this patch
>> might still be useful in the future.
>>
>>> I believe it was related to a case where bonded mode in active back
>>> was causing packet drops due to the frequency that the LSC was being
>>> polled. Using interrupt based approach alleviated the issue. (I'm open
>>> to correction on this :))
>>>
>>> @Robert/Eelco You may be able to provide some more light here and
>> whether the patches below in DPDK resolve the issue?
>> This long delay can be an issue in bonding mode, as the links checks for
>> bonding interfaces is holding the RW lock in bond_run(). This same lock is
>> taken in the PMD thread when calling the bond_check_admissibility() for
>> upcall traffic.
>>>> There was recent bug fix for intel NICs that fixes waiting of an
>>>> admin queue on link state requests despite of 'no_wait' flag:
>>>>       http://dpdk.org/ml/archives/dev/2018-March/092156.html
>>>> Will this fix your target case?
>>>>
>>>> So, the difference of execution time of 'rte_eth_link_get_nowait()'
>>>> with enabled and disabled interrupts should be not so significant.
>>>> Do you have performance measurements? Measurement with above fix
>> applied?
>> I do not have delay numbers but I know that we were no longer seeing
>> dropped traffic compared to other NICs under the same load with upcall
>> traffic present.
>>>>> Thanks for working on this Robert.
>>>>>
>>>>> I've completed some testing including the case where LSC is not
>>>> supported, in which case the port will remain in a down state and
>>>> fail rx/tx traffic. This behavior conforms to the netdev_reconfigure
>>>> expectations in the fail case so that's ok.
>>>>
>>>> I'm not sure if this is acceptable. For example, we're not failing
>>>> reconfiguration in case of issues with number of queues. We're trying
>>>> different numbers until we have working configuration.
>>>> Maybe we need the same fall-back mechanism in case of not supported
>>>> LSC interrupts? (MTU setup errors are really uncommon unlike LSC
>> interrupts'
>>>> support in PMDs).
>>> Thanks for raising this Ilya.
>>>
>>> I thought of this as well. I'd like to see a fall back to the PMD but
>> didn’t see how it could be done in a clean way.
>>> Unfortunately rte_eth_dev_configure() returns -EINVAL when lsc mode is
>> requested but not supported.
>>> It doesn't give us a clue if the error is related to lsc mode as it
>> could also relate to a number of other configure issues such as
>> nb_rxq/nb_txq/portid etc.
>>> It would be better if we could query the device via ethdev api to see if
>> it supports lsc interrupt mode but that’s not available currently.
>> Maybe a DPDK patch before we continue?
> It's hard to say. As part of the call to rte_eth_dev_configure() there is a check specifically to see if lsc interrupt is supported with the following.
>
> 	/* Check that the device supports requested interrupts */
> 	if ((dev_conf->intr_conf.lsc == 1) &&
> 		(!(dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC))) {
> 			RTE_PMD_DEBUG_TRACE("driver %s does not support lsc\n",
> 					dev->device->driver->name);
> 			return -EINVAL;
> 	}
>
> Even if a patch was submitted to extend the API in DPDK to allow this check specifically, I feel it will be the same code as above, just in a new function name. The check would remain in rte_eth_dev_configure() anyway.
I was more hinting to the fact that the only way to get this device 
information is trough some global data set, 
rte_eth_devices[port_id].data->dev_flags, would be nice if a clean API 
existed. But if we are ok, we can access it directly from OVS.

> We could do something similar in OVS for now to allow PMD fallback where it's not supported.
>
> Ian
>
<SNIP>

Patch

diff --git a/Documentation/intro/install/dpdk.rst b/Documentation/intro/install/dpdk.rst
index ed358d5..eb1bc7b 100644
--- a/Documentation/intro/install/dpdk.rst
+++ b/Documentation/intro/install/dpdk.rst
@@ -628,6 +628,39 @@  The average number of packets per output batch can be checked in PMD stats::

     $ ovs-appctl dpif-netdev/pmd-stats-show

+Link State Change (LSC) detection configuration
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+There are two methods to get the information when Link State Change (LSC)
+happens on a network interface: by polling or interrupt.
+
+With the polling method, the main process checks the link state on a
+fixed interval. This fixed interval determines how fast a link change is
+detected.
+
+If interrupts are used to get LSC information, the hardware itself triggers
+an interrupt when link state change happens, the interrupt thread wakes up
+from sleep, updates the information, and goes back to sleep mode. When no link
+state change happens (most of the time), the thread remains in sleep mode and
+doesn`t use processor time at all. The disadvantage of this method is that
+when an interrupt happens, the processor has to handle it immediately, so it
+puts the currently running process to background, handles the interrupt, and
+takes the background process back.
+
+Note that not all PMD drivers support LSC interrupts.
+
+The default configuration is polling mode. To set interrupt mode, option
+``dpdk-lsc-interrupt`` has to be set to ``true``.
+
+Command to set interrupt mode for a specific interface::
+    $ ovs-vsctl set interface <iface_name> options:dpdk-lsc-interrupt=true
+
+Command to set polling mode for a specific interface::
+    $ ovs-vsctl set interface <iface_name> options:dpdk-lsc-interrupt=false
+
+Command to remove ``dpdk-lsc-interrupt`` option::
+    $ ovs-vsctl remove interface <iface_name> options dpdk-lsc-interrupt
+
 Limitations
 ------------

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 94fb163..e2794e8 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -433,6 +433,12 @@  struct netdev_dpdk {
         /* DPDK-ETH hardware offload features,
          * from the enum set 'dpdk_hw_ol_features' */
         uint32_t hw_ol_features;
+
+        /* Properties for link state change detection mode.
+         * If lsc_interrupt_mode is set to false, poll mode is used,
+         * otherwise interrupt mode is used. */
+        bool requested_lsc_interrupt_mode;
+        bool lsc_interrupt_mode;
     );

     PADDED_MEMBERS(CACHE_LINE_SIZE,
@@ -686,12 +692,14 @@  dpdk_watchdog(void *dummy OVS_UNUSED)
 }

 static int
-dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
+dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int n_rxq, int n_txq)
 {
     int diag = 0;
     int i;
     struct rte_eth_conf conf = port_conf;

+    conf.intr_conf.lsc = dev->lsc_interrupt_mode;
+
     /* For some NICs (e.g. Niantic), scatter_rx mode needs to be explicitly
      * enabled. */
     if (dev->mtu > ETHER_MTU) {
@@ -801,7 +809,7 @@  dpdk_eth_dev_init(struct netdev_dpdk *dev)
     n_rxq = MIN(info.max_rx_queues, dev->up.n_rxq);
     n_txq = MIN(info.max_tx_queues, dev->up.n_txq);

-    diag = dpdk_eth_dev_queue_setup(dev, n_rxq, n_txq);
+    diag = dpdk_eth_dev_port_config(dev, n_rxq, n_txq);
     if (diag) {
         VLOG_ERR("Interface %s(rxq:%d txq:%d) configure error: %s",
                  dev->up.name, n_rxq, n_txq, rte_strerror(-diag));
@@ -897,6 +905,7 @@  common_construct(struct netdev *netdev, dpdk_port_t port_no,
     dev->flags = 0;
     dev->requested_mtu = ETHER_MTU;
     dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
+    dev->requested_lsc_interrupt_mode = 0;
     ovsrcu_index_init(&dev->vid, -1);
     dev->vhost_reconfigured = false;
     dev->attached = false;
@@ -1320,6 +1329,8 @@  netdev_dpdk_get_config(const struct netdev *netdev, struct smap *args)
         } else {
             smap_add(args, "rx_csum_offload", "false");
         }
+        smap_add(args, "lsc_interrupt_mode",
+                 dev->lsc_interrupt_mode?"true":"false");
     }
     ovs_mutex_unlock(&dev->mutex);

@@ -1520,6 +1531,12 @@  netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args,
         goto out;
     }

+    bool lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", false);
+    if (dev->requested_lsc_interrupt_mode != lsc_interrupt_mode) {
+        dev->requested_lsc_interrupt_mode = lsc_interrupt_mode;
+        netdev_request_reconfigure(netdev);
+    }
+
     rx_fc_en = smap_get_bool(args, "rx-flow-ctrl", false);
     tx_fc_en = smap_get_bool(args, "tx-flow-ctrl", false);
     autoneg = smap_get_bool(args, "flow-ctrl-autoneg", false);
@@ -3546,6 +3563,7 @@  netdev_dpdk_reconfigure(struct netdev *netdev)
     if (netdev->n_txq == dev->requested_n_txq
         && netdev->n_rxq == dev->requested_n_rxq
         && dev->mtu == dev->requested_mtu
+        && dev->lsc_interrupt_mode == dev->requested_lsc_interrupt_mode
         && dev->rxq_size == dev->requested_rxq_size
         && dev->txq_size == dev->requested_txq_size
         && dev->socket_id == dev->requested_socket_id) {
@@ -3561,6 +3579,8 @@  netdev_dpdk_reconfigure(struct netdev *netdev)
         goto out;
     }

+    dev->lsc_interrupt_mode = dev->requested_lsc_interrupt_mode;
+
     netdev->n_txq = dev->requested_n_txq;
     netdev->n_rxq = dev->requested_n_rxq;

diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 0c6a43d..3c9e637 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -3631,6 +3631,23 @@  ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
       </column>
     </group>

+    <group title="Link State Change detection mode">
+      <column name="options" key="dpdk-lsc-interrupt"
+              type='{"type": "boolean"}'>
+        <p>
+          Set this value to <code>true</code> to configure interrupt mode for
+          Link State Change (LSC) detection instead of poll mode for the DPDK
+          interface.
+        </p>
+        <p>
+          If this value is not set, poll mode is configured.
+        </p>
+        <p>
+          This parameter has an effect only on netdev dpdk interfaces.
+        </p>
+      </column>
+    </group>
+
     <group title="Common Columns">
       The overall purpose of these columns is described under <code>Common
       Columns</code> at the beginning of this document.