diff mbox series

[ovs-dev,ovs-dev,v4] netdev-dpdk: Configurable Link State Change (LSC) detection mode

Message ID HE1PR07MB12120ED57113E8739D8426CFEDF90@HE1PR07MB1212.eurprd07.prod.outlook.com
State Superseded
Delegated to: Ian Stokes
Headers show
Series [ovs-dev,ovs-dev,v4] netdev-dpdk: Configurable Link State Change (LSC) detection mode | expand

Commit Message

Róbert Mulik Feb. 2, 2018, 2:05 p.m. UTC
It is possible to change LSC detection mode to polling or interrupt mode
for DPDK interfaces. The default is polling mode. To set interrupt mode,
option dpdk-lsc-interrupt has to be set to true.

In polling mode more processor time is needed, since the OVS repeatedly reads
the link state with a short period. It can lead to packet loss for certain
systems.

In interrupt mode the hardware itself triggers an interrupt when link state
change happens, so less processing time needs for the OVS. It is not possible
to enable the interrupt mode on all hardware.

For detailed description and usage see the dpdk install documentation.

Signed-off-by: Robert Mulik <robert.mulik@ericsson.com>
Reviewed-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Ian Stokes <ian.stokes@intel.com>
Reviewed-by: Eelco Chaudron <echaudro@redhat.com>
---
 Documentation/intro/install/dpdk.rst | 48 ++++++++++++++++++++++++++++++++++++
 lib/netdev-dpdk.c                    | 42 ++++++++++++++++++++++++++++---
 lib/netdev-dpdk.h                    |  2 ++
 vswitchd/bridge.c                    |  8 ++++++
 vswitchd/vswitch.xml                 | 45 +++++++++++++++++++++++++++++++++
 5 files changed, 142 insertions(+), 3 deletions(-)

--
1.9.1

Comments

Ilya Maximets Feb. 2, 2018, 4:17 p.m. UTC | #1
On 02.02.2018 17:05, Róbert Mulik wrote:
> It is possible to change LSC detection mode to polling or interrupt mode
> for DPDK interfaces. The default is polling mode. To set interrupt mode,
> option dpdk-lsc-interrupt has to be set to true.
> 
> In polling mode more processor time is needed, since the OVS repeatedly reads
> the link state with a short period. It can lead to packet loss for certain
> systems.
> 
> In interrupt mode the hardware itself triggers an interrupt when link state
> change happens, so less processing time needs for the OVS. It is not possible
> to enable the interrupt mode on all hardware.
> 
> For detailed description and usage see the dpdk install documentation.
> 
> Signed-off-by: Robert Mulik <robert.mulik@ericsson.com>
> Reviewed-by: Ilya Maximets <i.maximets@samsung.com>

I'm sorry, but this is not true. I even stated that in my reply to v3:
"Not a full review.".

Also, we didn't decide what to do with some of my concerns.

As OVS doesn't usually use 'Reviewed-by' tag, I'll point you to the kernel docs:
    https://www.kernel.org/doc/html/v4.12/process/submitting-patches.html#using-reported-by-tested-by-reviewed-by-suggested-by-and-fixes

> Reviewed-by: Ian Stokes <ian.stokes@intel.com>
> Reviewed-by: Eelco Chaudron <echaudro@redhat.com>
> ---

It's a good common practice to put the changes between patch versions here.
It's really hard to track all the changes.

>  Documentation/intro/install/dpdk.rst | 48 ++++++++++++++++++++++++++++++++++++
>  lib/netdev-dpdk.c                    | 42 ++++++++++++++++++++++++++++---
>  lib/netdev-dpdk.h                    |  2 ++
>  vswitchd/bridge.c                    |  8 ++++++
>  vswitchd/vswitch.xml                 | 45 +++++++++++++++++++++++++++++++++
>  5 files changed, 142 insertions(+), 3 deletions(-)
> 
> --
> 1.9.1
> 
> diff --git a/Documentation/intro/install/dpdk.rst b/Documentation/intro/install/dpdk.rst
> index ed358d5..14a6684 100644
> --- a/Documentation/intro/install/dpdk.rst
> +++ b/Documentation/intro/install/dpdk.rst
> @@ -628,6 +628,54 @@ The average number of packets per output batch can be checked in PMD stats::
> 
>      $ ovs-appctl dpif-netdev/pmd-stats-show
> 

It's still not a full review.
But I wanted to say that below documentation is just text and definitely
not a reStructuredText. Please re-format it using rST markup features.
Look at the docs around for examples.

> +Link State Change (LSC) detection configuration
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +There are two methods to get the information when Link State Change (LSC)
> +happens on a network interface: by polling or interrupt.
> +
> +With polling method, a process is running in the background and repeatedly
> +reads the link state with a short period. It continuously needs processor time
> +and between 2 reading periods it can`t see the link state change, therefore
> +the reaction time depends on the polling period. With higher rate, more
> +processor time is needed. Another problem with the poll mode is that on some
> +hardware a polling cycle takes too much time, which (in the end) leads to
> +packet loss for certain systems.
> +
> +If interrupts are used to get LSC information, the hardware itself triggers
> +an interrupt when link state change happens, the thread wakes up from sleep,
> +updates the information, and goes back to sleep mode. When no link state
> +change happens (most of the time), the thread remains in sleep mode and
> +doesn`t use processor time at all. The disadvantage of this method is that
> +when interrupt happens, the processor has to handle it immediately, so it
> +puts the currently running process to background, handles the interrupt, and
> +takes the background process back. Another disadvantage is that some hardware
> +can`t be configured to generate LSC interrupts.
> +
> +The default configuration is polling mode. To set interrupt mode, option
> +dpdk-lsc-interrupt has to be set to true.
> +
> +Global settings
> +
> +Command to set interrupt mode for all interfaces:
> +ovs-vsctl set Open_vSwitch . other_config:dpdk-lsc-interrupt=true
> +
> +Command to set polling mode for all interfaces:
> +ovs-vsctl set Open_vSwitch . other_config:dpdk-lsc-interrupt=false
> +or:
> +ovs-vsctl remove Open_vSwitch . other_config dpdk-lsc-interrupt
> +
> +Interface specific settings (override global settings)
> +
> +Command to set interrupt mode for a specific interface:
> +ovs-vsctl set interface <interface_name> options:dpdk-lsc-interrupt=true
> +
> +Command to set polling mode for a specific interface:
> +ovs-vsctl set interface <interface_name> options:dpdk-lsc-interrupt=false
> +
> +Command to reset to globally defined mode for a specific interface:
> +ovs-vsctl remove interface <interface_name> options dpdk-lsc-interrupt
> +
>  Limitations
>  ------------
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 94fb163..73d0d4b 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -148,7 +148,7 @@ typedef uint16_t dpdk_port_t;
>  #define VHOST_ENQ_RETRY_NUM 8
>  #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
> 
> -static const struct rte_eth_conf port_conf = {
> +static struct rte_eth_conf port_conf = {
>      .rxmode = {
>          .mq_mode = ETH_MQ_RX_RSS,
>          .split_hdr_size = 0,
> @@ -167,6 +167,10 @@ static const struct rte_eth_conf port_conf = {
>      .txmode = {
>          .mq_mode = ETH_MQ_TX_NONE,
>      },
> +    .intr_conf = {
> +        /* LSC interrupt mode disabled, polling mode used. */
> +        .lsc = 0,
> +    },
>  };
> 
>  /*
> @@ -433,6 +437,12 @@ struct netdev_dpdk {
>          /* DPDK-ETH hardware offload features,
>           * from the enum set 'dpdk_hw_ol_features' */
>          uint32_t hw_ol_features;
> +
> +        /* Properties for link state change detection mode.
> +         * If lsc_interrupt_mode is set to false, poll mode is used,
> +         * otherwise interrupt mode is used. */
> +        bool requested_lsc_interrupt_mode;
> +        bool lsc_interrupt_mode;
>      );
> 
>      PADDED_MEMBERS(CACHE_LINE_SIZE,
> @@ -459,6 +469,18 @@ int netdev_dpdk_get_vid(const struct netdev_dpdk *dev);
>  struct ingress_policer *
>  netdev_dpdk_get_ingress_policer(const struct netdev_dpdk *dev);
> 
> +void
> +netdev_dpdk_set_default_lsc_detect_mode(bool enabled)
> +{
> +    port_conf.intr_conf.lsc = enabled;
> +}
> +
> +bool
> +netdev_dpdk_get_def_lsc_int_mode_enabled(void)
> +{
> +    return port_conf.intr_conf.lsc;
> +}
> +
>  static bool
>  is_dpdk_class(const struct netdev_class *class)
>  {
> @@ -686,12 +708,14 @@ dpdk_watchdog(void *dummy OVS_UNUSED)
>  }
> 
>  static int
> -dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
> +dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int n_rxq, int n_txq)
>  {
>      int diag = 0;
>      int i;
>      struct rte_eth_conf conf = port_conf;
> 
> +    conf.intr_conf.lsc = dev->lsc_interrupt_mode;
> +
>      /* For some NICs (e.g. Niantic), scatter_rx mode needs to be explicitly
>       * enabled. */
>      if (dev->mtu > ETHER_MTU) {
> @@ -801,7 +825,7 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev)
>      n_rxq = MIN(info.max_rx_queues, dev->up.n_rxq);
>      n_txq = MIN(info.max_tx_queues, dev->up.n_txq);
> 
> -    diag = dpdk_eth_dev_queue_setup(dev, n_rxq, n_txq);
> +    diag = dpdk_eth_dev_port_config(dev, n_rxq, n_txq);
>      if (diag) {
>          VLOG_ERR("Interface %s(rxq:%d txq:%d) configure error: %s",
>                   dev->up.name, n_rxq, n_txq, rte_strerror(-diag));
> @@ -897,6 +921,8 @@ common_construct(struct netdev *netdev, dpdk_port_t port_no,
>      dev->flags = 0;
>      dev->requested_mtu = ETHER_MTU;
>      dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
> +    dev->requested_lsc_interrupt_mode =
> +        netdev_dpdk_get_def_lsc_int_mode_enabled();
>      ovsrcu_index_init(&dev->vid, -1);
>      dev->vhost_reconfigured = false;
>      dev->attached = false;
> @@ -1520,6 +1546,13 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args,
>          goto out;
>      }
> 
> +    bool lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt",
> +                              netdev_dpdk_get_def_lsc_int_mode_enabled());
> +    if (dev->requested_lsc_interrupt_mode != lsc_interrupt_mode) {
> +        dev->requested_lsc_interrupt_mode = lsc_interrupt_mode;
> +        netdev_request_reconfigure(netdev);
> +    }
> +
>      rx_fc_en = smap_get_bool(args, "rx-flow-ctrl", false);
>      tx_fc_en = smap_get_bool(args, "tx-flow-ctrl", false);
>      autoneg = smap_get_bool(args, "flow-ctrl-autoneg", false);
> @@ -3546,6 +3579,7 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>      if (netdev->n_txq == dev->requested_n_txq
>          && netdev->n_rxq == dev->requested_n_rxq
>          && dev->mtu == dev->requested_mtu
> +        && dev->lsc_interrupt_mode == dev->requested_lsc_interrupt_mode
>          && dev->rxq_size == dev->requested_rxq_size
>          && dev->txq_size == dev->requested_txq_size
>          && dev->socket_id == dev->requested_socket_id) {
> @@ -3561,6 +3595,8 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>          goto out;
>      }
> 
> +    dev->lsc_interrupt_mode = dev->requested_lsc_interrupt_mode;
> +
>      netdev->n_txq = dev->requested_n_txq;
>      netdev->n_rxq = dev->requested_n_rxq;
> 
> diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h
> index b7d02a7..8eb51bc 100644
> --- a/lib/netdev-dpdk.h
> +++ b/lib/netdev-dpdk.h
> @@ -27,6 +27,8 @@ struct dp_packet;
> 
>  void netdev_dpdk_register(void);
>  void free_dpdk_buf(struct dp_packet *);
> +void netdev_dpdk_set_def_lsc_int_mode_enabled(bool enabled);
> +bool netdev_dpdk_get_def_lsc_int_mode_enabled(void);
> 
>  #else
> 
> diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
> index f44f950..87608a5 100644
> --- a/vswitchd/bridge.c
> +++ b/vswitchd/bridge.c
> @@ -68,6 +68,9 @@
>  #include "lib/vswitch-idl.h"
>  #include "xenserver.h"
>  #include "vlan-bitmap.h"
> +#ifdef DPDK_NETDEV
> +#include "./lib/netdev-provider.h"
> +#endif
> 
>  VLOG_DEFINE_THIS_MODULE(bridge);
> 
> @@ -601,6 +604,11 @@ bridge_reconfigure(const struct ovsrec_open_vswitch *ovs_cfg)
>      ofproto_set_vlan_limit(smap_get_int(&ovs_cfg->other_config, "vlan-limit",
>                                         LEGACY_MAX_VLAN_HEADERS));
> 
> +#ifdef DPDK_NETDEV
> +    netdev_dpdk_set_def_lsc_int_mode_enabled(
> +        smap_get_bool(&ovs_cfg->other_config, "dpdk-lsc-interrupt", false));
> +#endif
> +

I'm still thinking that this is not necessary.
Eelco, why do you think we need to change the default global value in run-time?
IMHO, this only complicates the code. (I didn't check, but it looks like current
implementation will not work anyway.)

>      ofproto_set_threads(
>          smap_get_int(&ovs_cfg->other_config, "n-handler-threads", 0),
>          smap_get_int(&ovs_cfg->other_config, "n-revalidator-threads", 0));
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index 0c6a43d..91a9003 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -320,6 +320,29 @@
>          </p>
>        </column>
> 
> +      <column name="other_config" key="dpdk-lsc-interrupt"
> +              type='{"type": "boolean"}'>
> +        <p>
> +          Set this value to <code>true</code> to configure interrupt mode for
> +          Link State Change (LSC) detection instead of poll mode for DPDK
> +          interfaces.
> +        </p>
> +        <p>
> +          The default value is <code>false</code>.
> +        </p>
> +        <p>
> +          If this value is <code>false</code> at startup, poll mode is used for
> +          all netdev dpdk interfaces.
> +        </p>
> +        <p>
> +          This value can be overridden for a specific interface in the options
> +          section of that interface.
> +        </p>
> +        <p>
> +          This parameter has an effect only on netdev dpdk interfaces.
> +        </p>
> +      </column>
> +
>        <column name="other_config" key="dpdk-extra"
>                type='{"type": "string"}'>
>          <p>
> @@ -3631,6 +3654,28 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
>        </column>
>      </group>
> 
> +    <group title="Link State Change detection mode">
> +      <column name="options" key="dpdk-lsc-interrupt"
> +              type='{"type": "boolean"}'>
> +        <p>
> +          Set this value to <code>true</code> to configure interrupt mode for
> +          Link State Change (LSC) detection instead of poll mode for the DPDK
> +          interface.
> +        </p>
> +        <p>
> +          If this value is not set, the value is taken from the global
> +          settings.
> +        </p>
> +        <p>
> +          If this value is set, the global LSC interrupt settings are
> +          overridden.
> +        </p>
> +        <p>
> +          This parameter has an effect only on netdev dpdk interfaces.
> +        </p>
> +      </column>
> +    </group>
> +
>      <group title="Common Columns">
>        The overall purpose of these columns is described under <code>Common
>        Columns</code> at the beginning of this document.
>
Róbert Mulik Feb. 5, 2018, 2:18 p.m. UTC | #2
Hi Ilya,

As I see you are concerned if the code is working, also Eelco mentioned that the configuration change didn't take effect for him, so I retested the patch.

I added 2 different types of NICs to OVS on the same server:

1
product: 82599ES 10-Gigabit SFI/SFP+ Network Connection
It uses ixgbe kernel driver by default

2
product: Ethernet Controller XL710 for 40GbE QSFP+
It uses i40e kernel driver by default

In my setup I used a Dell 630 with kernel 4.4.0-111-generic, and used vfio-pci kernel driver for the NICs.

I monitored the link_state of the dpdk interfaces and the interrupt counters of /proc/interrupts.

When I changed either the other_config:dpdk-lsc-interrupt part of Open_vSwitch, or the options:dpdk-lsc-interrupt part of the NIC by set or remove,
the behavior was correct: the link_state always changed when reinitialization was triggered, however there were problems with the counters.

1.
This NIC worked fine, link_state was OK, interrupt counters were incremented as expected.

2.
The link_state was OK, but there were problems with the counters.

With the X710 series there are a few problems related to the link_state and interrupts known by intel too. The faulty behavior can be caused by the combination of
the FW of the NIC, the kernel version, the dpdk version, and the i40e kernel driver version (there can be interference between the kernel driver and dpdk driver).


After the verification, as I see the code works fine (the events triggered when the options are changed), but the XL710 card has problems with the interrupt configuration.

Eelco, I know that you are working with X710 series cards. Did you test the patch on one of them, or on a different type?
When (if) you verify the patch, could you please check it on different NICs too?

Thank you!

Regards,
Robert
Ilya Maximets Feb. 5, 2018, 2:31 p.m. UTC | #3
On 05.02.2018 17:18, Róbert Mulik wrote:
> Hi Ilya,
> 
> As I see you are concerned if the code is working, also Eelco mentioned that the configuration change didn't take effect for him, so I retested the patch.
> 
> I added 2 different types of NICs to OVS on the same server:
> 
> 1
> product: 82599ES 10-Gigabit SFI/SFP+ Network Connection
> It uses ixgbe kernel driver by default
> 
> 2
> product: Ethernet Controller XL710 for 40GbE QSFP+
> It uses i40e kernel driver by default
> 
> In my setup I used a Dell 630 with kernel 4.4.0-111-generic, and used vfio-pci kernel driver for the NICs.
> 
> I monitored the link_state of the dpdk interfaces and the interrupt counters of /proc/interrupts.
> 
> When I changed either the other_config:dpdk-lsc-interrupt part of Open_vSwitch, or the options:dpdk-lsc-interrupt part of the NIC by set or remove,
> the behavior was correct: the link_state always changed when reinitialization was triggered, 


That is the key point. When the reinitialization was actually triggered?
Was it triggered right after changing the other_config:dpdk-lsc-interrupt ?
Or you need some additional manipulations to trigger the actual reconfiguration?

> however there were problems with the counters.
> 
> 1.
> This NIC worked fine, link_state was OK, interrupt counters were incremented as expected.
> 
> 2.
> The link_state was OK, but there were problems with the counters.
> 
> With the X710 series there are a few problems related to the link_state and interrupts known by intel too. The faulty behavior can be caused by the combination of
> the FW of the NIC, the kernel version, the dpdk version, and the i40e kernel driver version (there can be interference between the kernel driver and dpdk driver).
> 
> 
> After the verification, as I see the code works fine (the events triggered when the options are changed), but the XL710 card has problems with the interrupt configuration.
> 
> Eelco, I know that you are working with X710 series cards. Did you test the patch on one of them, or on a different type?
> When (if) you verify the patch, could you please check it on different NICs too?
> 
> Thank you!
> 
> Regards,
> Robert
>
Róbert Mulik Feb. 5, 2018, 2:34 p.m. UTC | #4
It was triggered right after the change. This was my ovs setup:

ovs-vsctl show
57390393-2b8b-4aba-b5dc-aaf822138e66
    Bridge br-prv
        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
                options: {dpdk-devargs="0000:83:00.1", n_rxq="2"}
        Port br-prv
            Interface br-prv
                type: internal
        Port "dpdk1"
            Interface "dpdk1"
                type: dpdk
                options: {dpdk-devargs="0000:01:00.0", n_rxq="2"}
    ovs_version: "2.9.90"


-----Original Message-----
From: Ilya Maximets [mailto:i.maximets@samsung.com] 

Sent: Monday, February 05, 2018 3:31 PM
To: Róbert Mulik <robert.mulik@ericsson.com>; dev@openvswitch.org
Cc: Stokes, Ian <ian.stokes@intel.com>; Eelco Chaudron <echaudro@redhat.com>
Subject: Re: [ovs-dev, ovs-dev, v4] netdev-dpdk: Configurable Link State Change (LSC) detection mode

On 05.02.2018 17:18, Róbert Mulik wrote:
> Hi Ilya,

> 

> As I see you are concerned if the code is working, also Eelco mentioned that the configuration change didn't take effect for him, so I retested the patch.

> 

> I added 2 different types of NICs to OVS on the same server:

> 

> 1

> product: 82599ES 10-Gigabit SFI/SFP+ Network Connection It uses ixgbe 

> kernel driver by default

> 

> 2

> product: Ethernet Controller XL710 for 40GbE QSFP+ It uses i40e kernel 

> driver by default

> 

> In my setup I used a Dell 630 with kernel 4.4.0-111-generic, and used vfio-pci kernel driver for the NICs.

> 

> I monitored the link_state of the dpdk interfaces and the interrupt counters of /proc/interrupts.

> 

> When I changed either the other_config:dpdk-lsc-interrupt part of 

> Open_vSwitch, or the options:dpdk-lsc-interrupt part of the NIC by set 

> or remove, the behavior was correct: the link_state always changed 

> when reinitialization was triggered,



That is the key point. When the reinitialization was actually triggered?
Was it triggered right after changing the other_config:dpdk-lsc-interrupt ?
Or you need some additional manipulations to trigger the actual reconfiguration?

> however there were problems with the counters.

> 

> 1.

> This NIC worked fine, link_state was OK, interrupt counters were incremented as expected.

> 

> 2.

> The link_state was OK, but there were problems with the counters.

> 

> With the X710 series there are a few problems related to the 

> link_state and interrupts known by intel too. The faulty behavior can be caused by the combination of the FW of the NIC, the kernel version, the dpdk version, and the i40e kernel driver version (there can be interference between the kernel driver and dpdk driver).

> 

> 

> After the verification, as I see the code works fine (the events triggered when the options are changed), but the XL710 card has problems with the interrupt configuration.

> 

> Eelco, I know that you are working with X710 series cards. Did you test the patch on one of them, or on a different type?

> When (if) you verify the patch, could you please check it on different NICs too?

> 

> Thank you!

> 

> Regards,

> Robert

>
Eelco Chaudron Feb. 5, 2018, 4:19 p.m. UTC | #5
See some minor comments inline below...

On 02/02/18 15:05, Róbert Mulik wrote:
> It is possible to change LSC detection mode to polling or interrupt mode
> for DPDK interfaces. The default is polling mode. To set interrupt mode,
> option dpdk-lsc-interrupt has to be set to true.
>
> In polling mode more processor time is needed, since the OVS repeatedly reads
> the link state with a short period. It can lead to packet loss for certain
> systems.
>
> In interrupt mode the hardware itself triggers an interrupt when link state
> change happens, so less processing time needs for the OVS. It is not possible
> to enable the interrupt mode on all hardware.
>
Did you try what happens if the PMD does not support LSC and its enabled 
globally/per interface?
> For detailed description and usage see the dpdk install documentation.
>
> Signed-off-by: Robert Mulik <robert.mulik@ericsson.com>
> Reviewed-by: Ilya Maximets <i.maximets@samsung.com>
> Reviewed-by: Ian Stokes <ian.stokes@intel.com>
> Reviewed-by: Eelco Chaudron <echaudro@redhat.com>
Guess same comment as Ilya, I reviewed your V2, not this one.
> ---
>   Documentation/intro/install/dpdk.rst | 48 ++++++++++++++++++++++++++++++++++++
>   lib/netdev-dpdk.c                    | 42 ++++++++++++++++++++++++++++---
>   lib/netdev-dpdk.h                    |  2 ++
>   vswitchd/bridge.c                    |  8 ++++++
>   vswitchd/vswitch.xml                 | 45 +++++++++++++++++++++++++++++++++
>   5 files changed, 142 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/intro/install/dpdk.rst b/Documentation/intro/install/dpdk.rst
> index ed358d5..14a6684 100644
> --- a/Documentation/intro/install/dpdk.rst
> +++ b/Documentation/intro/install/dpdk.rst
> @@ -628,6 +628,54 @@ The average number of packets per output batch can be checked in PMD stats::
>
>       $ ovs-appctl dpif-netdev/pmd-stats-show
I'm not a native speaker, but some suggestion below:
> +Link State Change (LSC) detection configuration
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +There are two methods to get the information when Link State Change (LSC)
> +happens on a network interface: by polling or interrupt.
> +
> +With polling method, a process is running in the background and repeatedly
> +reads the link state with a short period.

With the polling method, the main process checks the link state on a 
fixed interval.

> It continuously needs processor time
> +and between 2 reading periods it can`t see the link state change, therefore
> +the reaction time depends on the polling period.

This looks a bit confusing to me, maybe it needs something simple as:

     This fixed interval determines how fast a link change is detected.

> With higher rate, more
> +processor time is needed.
Not sure what you try to explain here?
> Another problem with the poll mode is that on some
> +hardware a polling cycle takes too much time, which (in the end) leads to
> +packet loss for certain systems.
> +
Maybe you need something in the lines of:

     Depending on the NIC hardware, a decent amount of time can be spent 
on gathering link state information. When waiting for this the CPU can 
not be used for other tasks.

> +If interrupts are used to get LSC information, the hardware itself triggers
> +an interrupt when link state change happens, the thread wakes up from sleep,
the INTERRUPT thread wakes...
> +updates the information, and goes back to sleep mode. When no link state
> +change happens (most of the time), the thread remains in sleep mode and
> +doesn`t use processor time at all. The disadvantage of this method is that
> +when interrupt happens, the processor has to handle it immediately, so it
when AN interrupt
> +puts the currently running process to background, handles the interrupt, and
> +takes the background process back.

> Another disadvantage is that some hardware
> +can`t be configured to generate LSC interrupts.
> +
Maybe as a general note instead: Note that not all PMD drivers support 
LSC interrupts.
> +The default configuration is polling mode. To set interrupt mode, option
> +dpdk-lsc-interrupt has to be set to true.
> +
> +Global settings
> +
> +Command to set interrupt mode for all interfaces:
> +ovs-vsctl set Open_vSwitch . other_config:dpdk-lsc-interrupt=true
> +
> +Command to set polling mode for all interfaces:
> +ovs-vsctl set Open_vSwitch . other_config:dpdk-lsc-interrupt=false
> +or:
> +ovs-vsctl remove Open_vSwitch . other_config dpdk-lsc-interrupt
> +
> +Interface specific settings (override global settings)
> +
> +Command to set interrupt mode for a specific interface:
> +ovs-vsctl set interface <interface_name> options:dpdk-lsc-interrupt=true
> +
> +Command to set polling mode for a specific interface:
> +ovs-vsctl set interface <interface_name> options:dpdk-lsc-interrupt=false
> +
> +Command to reset to globally defined mode for a specific interface:
> +ovs-vsctl remove interface <interface_name> options dpdk-lsc-interrupt
> +
>   Limitations
>   ------------
>
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 94fb163..73d0d4b 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -148,7 +148,7 @@ typedef uint16_t dpdk_port_t;
>   #define VHOST_ENQ_RETRY_NUM 8
>   #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
>
> -static const struct rte_eth_conf port_conf = {
> +static struct rte_eth_conf port_conf = {
>       .rxmode = {
>           .mq_mode = ETH_MQ_RX_RSS,
>           .split_hdr_size = 0,
> @@ -167,6 +167,10 @@ static const struct rte_eth_conf port_conf = {
>       .txmode = {
>           .mq_mode = ETH_MQ_TX_NONE,
>       },
> +    .intr_conf = {
> +        /* LSC interrupt mode disabled, polling mode used. */
> +        .lsc = 0,
> +    },
>   };
>
>   /*
> @@ -433,6 +437,12 @@ struct netdev_dpdk {
>           /* DPDK-ETH hardware offload features,
>            * from the enum set 'dpdk_hw_ol_features' */
>           uint32_t hw_ol_features;
> +
> +        /* Properties for link state change detection mode.
> +         * If lsc_interrupt_mode is set to false, poll mode is used,
> +         * otherwise interrupt mode is used. */
> +        bool requested_lsc_interrupt_mode;
> +        bool lsc_interrupt_mode;
>       );
>
>       PADDED_MEMBERS(CACHE_LINE_SIZE,
> @@ -459,6 +469,18 @@ int netdev_dpdk_get_vid(const struct netdev_dpdk *dev);
>   struct ingress_policer *
>   netdev_dpdk_get_ingress_policer(const struct netdev_dpdk *dev);
>
> +void
> +netdev_dpdk_set_default_lsc_detect_mode(bool enabled)
> +{
> +    port_conf.intr_conf.lsc = enabled;
> +}
> +
> +bool
> +netdev_dpdk_get_def_lsc_int_mode_enabled(void)
> +{
> +    return port_conf.intr_conf.lsc;
> +}
> +
Why not align the getter and setter function? I know Ilya commented on 
this also,
but than you had different types, now its all bool.

netdev_dpdk_set_default_lsc_irq_mode()
netdev_dpdk_get_default_lsc_irq_mode()

>   static bool
>   is_dpdk_class(const struct netdev_class *class)
>   {
> @@ -686,12 +708,14 @@ dpdk_watchdog(void *dummy OVS_UNUSED)
>   }
>
>   static int
> -dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
> +dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int n_rxq, int n_txq)
>   {
>       int diag = 0;
>       int i;
>       struct rte_eth_conf conf = port_conf;
>
> +    conf.intr_conf.lsc = dev->lsc_interrupt_mode;
> +
>       /* For some NICs (e.g. Niantic), scatter_rx mode needs to be explicitly
>        * enabled. */
>       if (dev->mtu > ETHER_MTU) {
> @@ -801,7 +825,7 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev)
>       n_rxq = MIN(info.max_rx_queues, dev->up.n_rxq);
>       n_txq = MIN(info.max_tx_queues, dev->up.n_txq);
>
> -    diag = dpdk_eth_dev_queue_setup(dev, n_rxq, n_txq);
> +    diag = dpdk_eth_dev_port_config(dev, n_rxq, n_txq);
>       if (diag) {
>           VLOG_ERR("Interface %s(rxq:%d txq:%d) configure error: %s",
>                    dev->up.name, n_rxq, n_txq, rte_strerror(-diag));
> @@ -897,6 +921,8 @@ common_construct(struct netdev *netdev, dpdk_port_t port_no,
>       dev->flags = 0;
>       dev->requested_mtu = ETHER_MTU;
>       dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
> +    dev->requested_lsc_interrupt_mode =
> +        netdev_dpdk_get_def_lsc_int_mode_enabled();
>       ovsrcu_index_init(&dev->vid, -1);
>       dev->vhost_reconfigured = false;
>       dev->attached = false;
> @@ -1520,6 +1546,13 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args,
>           goto out;
>       }
>
> +    bool lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt",
> +                              netdev_dpdk_get_def_lsc_int_mode_enabled());
> +    if (dev->requested_lsc_interrupt_mode != lsc_interrupt_mode) {
> +        dev->requested_lsc_interrupt_mode = lsc_interrupt_mode;
> +        netdev_request_reconfigure(netdev);
> +    }
> +
>       rx_fc_en = smap_get_bool(args, "rx-flow-ctrl", false);
>       tx_fc_en = smap_get_bool(args, "tx-flow-ctrl", false);
>       autoneg = smap_get_bool(args, "flow-ctrl-autoneg", false);
> @@ -3546,6 +3579,7 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>       if (netdev->n_txq == dev->requested_n_txq
>           && netdev->n_rxq == dev->requested_n_rxq
>           && dev->mtu == dev->requested_mtu
> +        && dev->lsc_interrupt_mode == dev->requested_lsc_interrupt_mode
>           && dev->rxq_size == dev->requested_rxq_size
>           && dev->txq_size == dev->requested_txq_size
>           && dev->socket_id == dev->requested_socket_id) {
> @@ -3561,6 +3595,8 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>           goto out;
>       }
>
> +    dev->lsc_interrupt_mode = dev->requested_lsc_interrupt_mode;
> +
>       netdev->n_txq = dev->requested_n_txq;
>       netdev->n_rxq = dev->requested_n_rxq;
>
> diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h
> index b7d02a7..8eb51bc 100644
> --- a/lib/netdev-dpdk.h
> +++ b/lib/netdev-dpdk.h
> @@ -27,6 +27,8 @@ struct dp_packet;
>
>   void netdev_dpdk_register(void);
>   void free_dpdk_buf(struct dp_packet *);
> +void netdev_dpdk_set_def_lsc_int_mode_enabled(bool enabled);
> +bool netdev_dpdk_get_def_lsc_int_mode_enabled(void);
See above, and below on naming...
>   #else
>
> diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
> index f44f950..87608a5 100644
> --- a/vswitchd/bridge.c
> +++ b/vswitchd/bridge.c
> @@ -68,6 +68,9 @@
>   #include "lib/vswitch-idl.h"
>   #include "xenserver.h"
>   #include "vlan-bitmap.h"
> +#ifdef DPDK_NETDEV
> +#include "./lib/netdev-provider.h"
> +#endif
>
>   VLOG_DEFINE_THIS_MODULE(bridge);
>
> @@ -601,6 +604,11 @@ bridge_reconfigure(const struct ovsrec_open_vswitch *ovs_cfg)
>       ofproto_set_vlan_limit(smap_get_int(&ovs_cfg->other_config, "vlan-limit",
>                                          LEGACY_MAX_VLAN_HEADERS));
>
> +#ifdef DPDK_NETDEV
> +    netdev_dpdk_set_def_lsc_int_mode_enabled(
> +        smap_get_bool(&ovs_cfg->other_config, "dpdk-lsc-interrupt", false));
> +#endif
> +
Did you tried to compile your patch? Guess it should be 
netdev_dpdk_set_default_lsc_detect_mode()?

>       ofproto_set_threads(
>           smap_get_int(&ovs_cfg->other_config, "n-handler-threads", 0),
>           smap_get_int(&ovs_cfg->other_config, "n-revalidator-threads", 0));
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index 0c6a43d..91a9003 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -320,6 +320,29 @@
>           </p>
>         </column>
>
> +      <column name="other_config" key="dpdk-lsc-interrupt"
> +              type='{"type": "boolean"}'>
> +        <p>
> +          Set this value to <code>true</code> to configure interrupt mode for
> +          Link State Change (LSC) detection instead of poll mode for DPDK
> +          interfaces.
> +        </p>
> +        <p>
> +          The default value is <code>false</code>.
> +        </p>
> +        <p>
> +          If this value is <code>false</code> at startup, poll mode is used for
> +          all netdev dpdk interfaces.
> +        </p>
> +        <p>
> +          This value can be overridden for a specific interface in the options
> +          section of that interface.
> +        </p>
> +        <p>
> +          This parameter has an effect only on netdev dpdk interfaces.
> +        </p>
> +      </column>

In my previous review I mentioned the execution below did not correctly 
trigger the change,
however with this version applied I do not see it (must have been a 
screw up from my side).

ovs-vsctl set interface dpdk0 options:dpdk-lsc-interrupt=true
ovs-vsctl remove interface dpdk0 options dpdk-lsc-interrupt

I also tested all global/local config combinations and seem to work fine!
> +
>         <column name="other_config" key="dpdk-extra"
>                 type='{"type": "string"}'>
>           <p>
> @@ -3631,6 +3654,28 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
>         </column>
>       </group>
>
> +    <group title="Link State Change detection mode">
> +      <column name="options" key="dpdk-lsc-interrupt"
> +              type='{"type": "boolean"}'>
> +        <p>
> +          Set this value to <code>true</code> to configure interrupt mode for
> +          Link State Change (LSC) detection instead of poll mode for the DPDK
> +          interface.
> +        </p>
> +        <p>
> +          If this value is not set, the value is taken from the global
> +          settings.
> +        </p>
> +        <p>
> +          If this value is set, the global LSC interrupt settings are
> +          overridden.
> +        </p>
> +        <p>
> +          This parameter has an effect only on netdev dpdk interfaces.
> +        </p>
> +      </column>
> +    </group>
> +
>       <group title="Common Columns">
>         The overall purpose of these columns is described under <code>Common
>         Columns</code> at the beginning of this document.
> --
> 1.9.1
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Eelco Chaudron Feb. 5, 2018, 4:29 p.m. UTC | #6
On 02/02/18 17:17, Ilya Maximets wrote:
> On 02.02.2018 17:05, Róbert Mulik wrote:
>> It is possible to change LSC detection mode to polling or interrupt mode
>> for DPDK interfaces. The default is polling mode. To set interrupt mode,
>> option dpdk-lsc-interrupt has to be set to true.
>>
>> In polling mode more processor time is needed, since the OVS repeatedly reads
>> the link state with a short period. It can lead to packet loss for certain
>> systems.
>>
>> In interrupt mode the hardware itself triggers an interrupt when link state
>> change happens, so less processing time needs for the OVS. It is not possible
>> to enable the interrupt mode on all hardware.
>>
>> For detailed description and usage see the dpdk install documentation.
>>
>> Signed-off-by: Robert Mulik <robert.mulik@ericsson.com>
>> Reviewed-by: Ilya Maximets <i.maximets@samsung.com>
> 8<---- SNIP ---->8
>> diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
>> index f44f950..87608a5 100644
>> --- a/vswitchd/bridge.c
>> +++ b/vswitchd/bridge.c
>> @@ -68,6 +68,9 @@
>>   #include "lib/vswitch-idl.h"
>>   #include "xenserver.h"
>>   #include "vlan-bitmap.h"
>> +#ifdef DPDK_NETDEV
>> +#include "./lib/netdev-provider.h"
>> +#endif
>>
>>   VLOG_DEFINE_THIS_MODULE(bridge);
>>
>> @@ -601,6 +604,11 @@ bridge_reconfigure(const struct ovsrec_open_vswitch *ovs_cfg)
>>       ofproto_set_vlan_limit(smap_get_int(&ovs_cfg->other_config, "vlan-limit",
>>                                          LEGACY_MAX_VLAN_HEADERS));
>>
>> +#ifdef DPDK_NETDEV
>> +    netdev_dpdk_set_def_lsc_int_mode_enabled(
>> +        smap_get_bool(&ovs_cfg->other_config, "dpdk-lsc-interrupt", false));
>> +#endif
>> +
> I'm still thinking that this is not necessary.
> Eelco, why do you think we need to change the default global value in run-time?
> IMHO, this only complicates the code. (I didn't check, but it looks like current
> implementation will not work anyway.)
>
I do feel like the global option should be applied directly for the 
following reasons:

- There is no way to see what is the running configuration
- If not done this way the LSC mode might change for some devices, for 
example if you decide to change another port related configuration.
Eelco Chaudron Feb. 5, 2018, 4:48 p.m. UTC | #7
Just finished my review and some basic testing (see other email), so let 
me reply to these inline.


On 05/02/18 15:31, Ilya Maximets wrote:
> On 05.02.2018 17:18, Róbert Mulik wrote:
>> Hi Ilya,
>>
>> As I see you are concerned if the code is working, also Eelco mentioned that the configuration change didn't take effect for him, so I retested the patch.
>>
>> I added 2 different types of NICs to OVS on the same server:
>>
>> 1
>> product: 82599ES 10-Gigabit SFI/SFP+ Network Connection
>> It uses ixgbe kernel driver by default
>>
>> 2
>> product: Ethernet Controller XL710 for 40GbE QSFP+
>> It uses i40e kernel driver by default
>>
>> In my setup I used a Dell 630 with kernel 4.4.0-111-generic, and used vfio-pci kernel driver for the NICs.
>>
>> I monitored the link_state of the dpdk interfaces and the interrupt counters of /proc/interrupts.
>>
>> When I changed either the other_config:dpdk-lsc-interrupt part of Open_vSwitch, or the options:dpdk-lsc-interrupt part of the NIC by set or remove,
>> the behavior was correct: the link_state always changed when reinitialization was triggered,
>
> That is the key point. When the reinitialization was actually triggered?
> Was it triggered right after changing the other_config:dpdk-lsc-interrupt ?
> Or you need some additional manipulations to trigger the actual reconfiguration?
Only the configuration change will trigger configuration change. I 
tested this with a stable link.
I was using GDB to get the triggers, so added a quick bt:

Breakpoint 4, dpdk_eth_dev_port_config (dev=dev@entry=0x7f1bbaa19e80, 
n_rxq=n_rxq@entry=2, n_txq=n_txq@entry=3) at lib/netdev-dpdk.c:712
712    {
#0  dpdk_eth_dev_port_config (dev=dev@entry=0x7f1bbaa19e80, 
n_rxq=n_rxq@entry=2, n_txq=n_txq@entry=3) at lib/netdev-dpdk.c:712
#1  0x00000000007f01dd in dpdk_eth_dev_init 
(dev=dev@entry=0x7f1bbaa19e80) at lib/netdev-dpdk.c:828
#2  0x00000000007f1974 in netdev_dpdk_reconfigure 
(netdev=0x7f1bbaa1af40) at lib/netdev-dpdk.c:3607
#3  0x0000000000752acc in netdev_reconfigure 
(netdev=netdev@entry=0x7f1bbaa1af40) at lib/netdev.c:2073
#4  0x0000000000727ba1 in port_reconfigure (port=port@entry=0x2365a50) 
at lib/dpif-netdev.c:3341
#5  0x000000000072b959 in reconfigure_datapath 
(dp=dp@entry=0x7f2167283010) at lib/dpif-netdev.c:3822
#6  0x000000000072f76c in dpif_netdev_run (dpif=<optimized out>) at 
lib/dpif-netdev.c:3963
#7  0x0000000000731cca in dpif_run (dpif=<optimized out>) at lib/dpif.c:466
#8  0x00000000006f298b in type_run (type=<optimized out>) at 
ofproto/ofproto-dpif.c:344
#9  0x00000000006df916 in ofproto_type_run 
(datapath_type=datapath_type@entry=0x234b560 "netdev") at 
ofproto/ofproto.c:1703
#10 0x00000000006cd1df in bridge_run__ () at vswitchd/bridge.c:2941
#11 0x00000000006d5678 in bridge_reconfigure (ovs_cfg=<optimized out>) 
at vswitchd/bridge.c:727
#12 0x00000000006d5f21 in bridge_run () at vswitchd/bridge.c:3026
#13 0x00000000006d6880 in main (argc=10, argv=0x7ffe6455edb8) at 
vswitchd/ovs-vswitchd.c:120

>> however there were problems with the counters.
>>
>> 1.
>> This NIC worked fine, link_state was OK, interrupt counters were incremented as expected.
>>
>> 2.
>> The link_state was OK, but there were problems with the counters.
Can you explain what you mean with counter problems?
>> With the X710 series there are a few problems related to the link_state and interrupts known by intel too. The faulty behavior can be caused by the combination of
>> the FW of the NIC, the kernel version, the dpdk version, and the i40e kernel driver version (there can be interference between the kernel driver and dpdk driver).
>>
>>
>> After the verification, as I see the code works fine (the events triggered when the options are changed), but the XL710 card has problems with the interrupt configuration.
>>
>> Eelco, I know that you are working with X710 series cards. Did you test the patch on one of them, or on a different type?
>> When (if) you verify the patch, could you please check it on different NICs too?
I have an XL710 card, but for the test I did not check the interrupt 
generation part. I was using it before with a hard coded version, so 
assumed it would work, are you telling me setting it dynamically (i.e. 
changing it multiple times) does not work?

>> Thank you!
>>
>> Regards,
>> Robert
>>
Ilya Maximets Feb. 6, 2018, 12:12 p.m. UTC | #8
On 05.02.2018 19:29, Eelco Chaudron wrote:
> On 02/02/18 17:17, Ilya Maximets wrote:
>> On 02.02.2018 17:05, Róbert Mulik wrote:
>>> It is possible to change LSC detection mode to polling or interrupt mode
>>> for DPDK interfaces. The default is polling mode. To set interrupt mode,
>>> option dpdk-lsc-interrupt has to be set to true.
>>>
>>> In polling mode more processor time is needed, since the OVS repeatedly reads
>>> the link state with a short period. It can lead to packet loss for certain
>>> systems.
>>>
>>> In interrupt mode the hardware itself triggers an interrupt when link state
>>> change happens, so less processing time needs for the OVS. It is not possible
>>> to enable the interrupt mode on all hardware.
>>>
>>> For detailed description and usage see the dpdk install documentation.
>>>
>>> Signed-off-by: Robert Mulik <robert.mulik@ericsson.com>
>>> Reviewed-by: Ilya Maximets <i.maximets@samsung.com>
>> 8<---- SNIP ---->8
>>> diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
>>> index f44f950..87608a5 100644
>>> --- a/vswitchd/bridge.c
>>> +++ b/vswitchd/bridge.c
>>> @@ -68,6 +68,9 @@
>>>   #include "lib/vswitch-idl.h"
>>>   #include "xenserver.h"
>>>   #include "vlan-bitmap.h"
>>> +#ifdef DPDK_NETDEV
>>> +#include "./lib/netdev-provider.h"
>>> +#endif
>>>
>>>   VLOG_DEFINE_THIS_MODULE(bridge);
>>>
>>> @@ -601,6 +604,11 @@ bridge_reconfigure(const struct ovsrec_open_vswitch *ovs_cfg)
>>>       ofproto_set_vlan_limit(smap_get_int(&ovs_cfg->other_config, "vlan-limit",
>>>                                          LEGACY_MAX_VLAN_HEADERS));
>>>
>>> +#ifdef DPDK_NETDEV
>>> +    netdev_dpdk_set_def_lsc_int_mode_enabled(
>>> +        smap_get_bool(&ovs_cfg->other_config, "dpdk-lsc-interrupt", false));
>>> +#endif
>>> +
>> I'm still thinking that this is not necessary.
>> Eelco, why do you think we need to change the default global value in run-time?
>> IMHO, this only complicates the code. (I didn't check, but it looks like current
>> implementation will not work anyway.)
>>
> I do feel like the global option should be applied directly for the following reasons:
> 
> - There is no way to see what is the running configuration
> - If not done this way the LSC mode might change for some devices, for example if you decide to change another port related configuration.
> 


I feel that we have some kind of misunderstanding.
I'll try to explain my position:

1. I'm not against the global config option. I just want it to be static.
   i.e. OVS restart will be required to change it.

2. Per-port option will allow overriding of the global default configuration.

3. You're saying that the running configuration is not visible, but It'll be
   visible through global default value in other_config and per-port value in
   options of the particular interface. If we'll not set any of these configs,
   the value will not be visible in any case.

   Suggestion: We definitely need to export the current configuration in
               netdev_dpdk_get_config(). And it'll be always visible.

   Maybe you're saying that we can update global default in database and this
   will hide real default value? But it's the default behaviour for many other
   'restart required' options like iommu support of vhost_sock_dir.
   We need to inform about the global default value in VLOG message while
   initializing inside dpdk_init() like it's done for all other options.

4. I didn't understand your second reason. There is no way to change the current
   port by changing config of the another one. DPDK initialized only once, so
   default will be fixed on boot and never changed. Any database changes for a
   global other_config after the initialization phase will be ignored.

Best regards, Ilya Maximets.
Eelco Chaudron Feb. 6, 2018, 1:47 p.m. UTC | #9
On 06/02/18 13:12, Ilya Maximets wrote:
> On 05.02.2018 19:29, Eelco Chaudron wrote:
>> On 02/02/18 17:17, Ilya Maximets wrote:
>>> On 02.02.2018 17:05, Róbert Mulik wrote:
>>>> It is possible to change LSC detection mode to polling or interrupt mode
>>>> for DPDK interfaces. The default is polling mode. To set interrupt mode,
>>>> option dpdk-lsc-interrupt has to be set to true.
>>>>
>>>> In polling mode more processor time is needed, since the OVS repeatedly reads
>>>> the link state with a short period. It can lead to packet loss for certain
>>>> systems.
>>>>
>>>> In interrupt mode the hardware itself triggers an interrupt when link state
>>>> change happens, so less processing time needs for the OVS. It is not possible
>>>> to enable the interrupt mode on all hardware.
>>>>
>>>> For detailed description and usage see the dpdk install documentation.
>>>>
>>>> Signed-off-by: Robert Mulik <robert.mulik@ericsson.com>
>>>> Reviewed-by: Ilya Maximets <i.maximets@samsung.com>
>>> 8<---- SNIP ---->8
>>>> diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
>>>> index f44f950..87608a5 100644
>>>> --- a/vswitchd/bridge.c
>>>> +++ b/vswitchd/bridge.c
>>>> @@ -68,6 +68,9 @@
>>>>    #include "lib/vswitch-idl.h"
>>>>    #include "xenserver.h"
>>>>    #include "vlan-bitmap.h"
>>>> +#ifdef DPDK_NETDEV
>>>> +#include "./lib/netdev-provider.h"
>>>> +#endif
>>>>
>>>>    VLOG_DEFINE_THIS_MODULE(bridge);
>>>>
>>>> @@ -601,6 +604,11 @@ bridge_reconfigure(const struct ovsrec_open_vswitch *ovs_cfg)
>>>>        ofproto_set_vlan_limit(smap_get_int(&ovs_cfg->other_config, "vlan-limit",
>>>>                                           LEGACY_MAX_VLAN_HEADERS));
>>>>
>>>> +#ifdef DPDK_NETDEV
>>>> +    netdev_dpdk_set_def_lsc_int_mode_enabled(
>>>> +        smap_get_bool(&ovs_cfg->other_config, "dpdk-lsc-interrupt", false));
>>>> +#endif
>>>> +
>>> I'm still thinking that this is not necessary.
>>> Eelco, why do you think we need to change the default global value in run-time?
>>> IMHO, this only complicates the code. (I didn't check, but it looks like current
>>> implementation will not work anyway.)
>>>
>> I do feel like the global option should be applied directly for the following reasons:
>>
>> - There is no way to see what is the running configuration
>> - If not done this way the LSC mode might change for some devices, for example if you decide to change another port related configuration.
>>
>
> I feel that we have some kind of misunderstanding.
> I'll try to explain my position:
>
> 1. I'm not against the global config option. I just want it to be static.
>     i.e. OVS restart will be required to change it.
Got your point, but why do you want it to be static? I'm arguing against it
as other options, like other_config:pmd-cpu-mask, are also not static.
If it can be done dynamic, I prefer doing it that way.

However if we do go with a static option, it should not effect any existing
NIC configuration, see 4.
> 2. Per-port option will allow overriding of the global default configuration.
No problem here... However it's not clear to me how we would handle PMDs 
not
supporting RTE_ETH_DEV_INTR_LSC?For individual interfaces, we might want to
check this flag, and report an error at configuration time, rather than wait
for re-initialization and add something to the log file, EINVAL?

But how do we deal with the global values and unsupported devices? 
Currently
looking at the code it will fail initialization after the restart with no
sensible error message.
> 3. You're saying that the running configuration is not visible, but It'll be
>     visible through global default value in other_config and per-port value in
>     options of the particular interface. If we'll not set any of these configs,
>     the value will not be visible in any case.
>
>     Suggestion: We definitely need to export the current configuration in
>                 netdev_dpdk_get_config(). And it'll be always visible.
I agree we need some way to show the current config because if you set 
the global
value (and not restart) you will have no way of knowing the current 
operational
state.The best way would be to have some ovs-appctl command to do this, 
as logs
might rotate...

Also currently all port settings are on the port, there are no global 
defaults.
So people need to be aware that doing an "ovs-vsctl show" does not show 
all the
actual port configuration.
>
>     Maybe you're saying that we can update global default in database and this
>     will hide real default value? But it's the default behaviour for many other
>     'restart required' options like iommu support of vhost_sock_dir.
>     We need to inform about the global default value in VLOG message while
>     initializing inside dpdk_init() like it's done for all other options.
Yes, and this makes it some times hard to trouble shoot.
>
> 4. I didn't understand your second reason. There is no way to change the current
>     port by changing config of the another one. DPDK initialized only once, so
>     default will be fixed on boot and never changed. Any database changes for a
>     global other_config after the initialization phase will be ignored.
Let me explain, this is with the current implementation if we would have 
a global
static option (i.e. v2 of this patch).Assume you configure the following:

ovs-vsctl set Open_vSwitch . other_config:dpdk-lsc-interrupt=true
ovs-vsctl set interface dpdk1 options:n_rxq=2

The last command will trigger a re-configure of dpdk1, and causes now 
interrupts
to be enabled.

Just want to indicate its a problem with the patch as implemented, if we 
want to
go with a static global configuration it needs to be fixed.

> Best regards, Ilya Maximets.
Ilya Maximets Feb. 6, 2018, 3:01 p.m. UTC | #10
On 06.02.2018 16:47, Eelco Chaudron wrote:
> On 06/02/18 13:12, Ilya Maximets wrote:
>> On 05.02.2018 19:29, Eelco Chaudron wrote:
>>> On 02/02/18 17:17, Ilya Maximets wrote:
>>>> On 02.02.2018 17:05, Róbert Mulik wrote:
>>>>> It is possible to change LSC detection mode to polling or interrupt mode
>>>>> for DPDK interfaces. The default is polling mode. To set interrupt mode,
>>>>> option dpdk-lsc-interrupt has to be set to true.
>>>>>
>>>>> In polling mode more processor time is needed, since the OVS repeatedly reads
>>>>> the link state with a short period. It can lead to packet loss for certain
>>>>> systems.
>>>>>
>>>>> In interrupt mode the hardware itself triggers an interrupt when link state
>>>>> change happens, so less processing time needs for the OVS. It is not possible
>>>>> to enable the interrupt mode on all hardware.
>>>>>
>>>>> For detailed description and usage see the dpdk install documentation.
>>>>>
>>>>> Signed-off-by: Robert Mulik <robert.mulik@ericsson.com>
>>>>> Reviewed-by: Ilya Maximets <i.maximets@samsung.com>
>>>> 8<---- SNIP ---->8
>>>>> diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
>>>>> index f44f950..87608a5 100644
>>>>> --- a/vswitchd/bridge.c
>>>>> +++ b/vswitchd/bridge.c
>>>>> @@ -68,6 +68,9 @@
>>>>>    #include "lib/vswitch-idl.h"
>>>>>    #include "xenserver.h"
>>>>>    #include "vlan-bitmap.h"
>>>>> +#ifdef DPDK_NETDEV
>>>>> +#include "./lib/netdev-provider.h"
>>>>> +#endif
>>>>>
>>>>>    VLOG_DEFINE_THIS_MODULE(bridge);
>>>>>
>>>>> @@ -601,6 +604,11 @@ bridge_reconfigure(const struct ovsrec_open_vswitch *ovs_cfg)
>>>>>        ofproto_set_vlan_limit(smap_get_int(&ovs_cfg->other_config, "vlan-limit",
>>>>>                                           LEGACY_MAX_VLAN_HEADERS));
>>>>>
>>>>> +#ifdef DPDK_NETDEV
>>>>> +    netdev_dpdk_set_def_lsc_int_mode_enabled(
>>>>> +        smap_get_bool(&ovs_cfg->other_config, "dpdk-lsc-interrupt", false));
>>>>> +#endif
>>>>> +
>>>> I'm still thinking that this is not necessary.
>>>> Eelco, why do you think we need to change the default global value in run-time?
>>>> IMHO, this only complicates the code. (I didn't check, but it looks like current
>>>> implementation will not work anyway.)
>>>>
>>> I do feel like the global option should be applied directly for the following reasons:
>>>
>>> - There is no way to see what is the running configuration
>>> - If not done this way the LSC mode might change for some devices, for example if you decide to change another port related configuration.
>>>
>>
>> I feel that we have some kind of misunderstanding.
>> I'll try to explain my position:
>>
>> 1. I'm not against the global config option. I just want it to be static.
>>     i.e. OVS restart will be required to change it.
> Got your point, but why do you want it to be static? I'm arguing against it
> as other options, like other_config:pmd-cpu-mask, are also not static.
> If it can be done dynamic, I prefer doing it that way.

'pmd-cpu-mask' change is the only way to configure the number of PMD threads,
and this is important to have this dynamic for run-time scaling. Also, this is
the datapath parameter, not the parameter of any ports. We, actually, have no
dynamic global parameters for ports. All the ports related parameters like
iommu-support, vhost_sock_dir are static or hardcoded.

other_config:dpdk-lsc-interrupt is just the default value that could be
overridden by the dynamic per-port configuration. We're not frequently
adding new physical NICs to servers. At this point user likely knows, which
devices supports LSC interrupts and which are not, and it's possible to decide
which value for default other_config:dpdk-lsc-interrupt is preferable.

I personally prefer to not have global option. Only per-port configurable
with 'false' as default value. This will additionally help with NICs that
doesn't support LSC interrupts, and with mismatch between output of ovs-vsctl
and real configured values. Someday in the future we could change default
to 'true' when most NICs will support this, and if this feature will be
in demand.

> 
> However if we do go with a static option, it should not effect any existing
> NIC configuration, see 4.
>> 2. Per-port option will allow overriding of the global default configuration.
> No problem here... However it's not clear to me how we would handle PMDs not
> supporting RTE_ETH_DEV_INTR_LSC?For individual interfaces, we might want to
> check this flag, and report an error at configuration time, rather than wait
> for re-initialization and add something to the log file, EINVAL?
> 
> But how do we deal with the global values and unsupported devices? Currently
> looking at the code it will fail initialization after the restart with no
> sensible error message.
>> 3. You're saying that the running configuration is not visible, but It'll be
>>     visible through global default value in other_config and per-port value in
>>     options of the particular interface. If we'll not set any of these configs,
>>     the value will not be visible in any case.
>>
>>     Suggestion: We definitely need to export the current configuration in
>>                 netdev_dpdk_get_config(). And it'll be always visible.
> I agree we need some way to show the current config because if you set the global
> value (and not restart) you will have no way of knowing the current operational
> state.The best way would be to have some ovs-appctl command to do this, as logs
> might rotate...

Current configuration could be checked by 'ovs-appctl dpif/show'. We just need
to add new value like 'lsc_interrupts_enabled' to netdev_dpdk_get_config().

> 
> Also currently all port settings are on the port, there are no global defaults.
> So people need to be aware that doing an "ovs-vsctl show" does not show all the
> actual port configuration.

Yes. That is not so elegant. But users should know that ovs-vsctl reports only the
status of the database, not the status of the daemon. 'dpif/show' and other
'dp'-commands are intended to check the actual status of the working switch.
It's the historical behaviour of the OVS.

>>
>>     Maybe you're saying that we can update global default in database and this
>>     will hide real default value? But it's the default behaviour for many other
>>     'restart required' options like iommu support of vhost_sock_dir.
>>     We need to inform about the global default value in VLOG message while
>>     initializing inside dpdk_init() like it's done for all other options.
> Yes, and this makes it some times hard to trouble shoot.
>>
>> 4. I didn't understand your second reason. There is no way to change the current
>>     port by changing config of the another one. DPDK initialized only once, so
>>     default will be fixed on boot and never changed. Any database changes for a
>>     global other_config after the initialization phase will be ignored.
> Let me explain, this is with the current implementation if we would have a global
> static option (i.e. v2 of this patch).Assume you configure the following:
> 
> ovs-vsctl set Open_vSwitch . other_config:dpdk-lsc-interrupt=true
> ovs-vsctl set interface dpdk1 options:n_rxq=2
> 
> The last command will trigger a re-configure of dpdk1, and causes now interrupts
> to be enabled.
> 
> Just want to indicate its a problem with the patch as implemented, if we want to
> go with a static global configuration it needs to be fixed.

With v2 of this series first command will update only database. Configuration
inside daemon will not be updated. After the second command dpdk1 will be
reconfigured with the old value of global dpdk-lsc-interrupt, i.e. 'false'.
(I didn't test real v2, but it should work this way if implemented right).
There is no issues here. The only issue is that value in database will mismatch
with actual value used by daemon. After restart of ovs-vswitch, global
dpdk-lsc-interrupt will be updated from the database and dpdk1 will be configured
with enabled interrupts.
Eelco Chaudron Feb. 6, 2018, 4:30 p.m. UTC | #11
On 06/02/18 16:01, Ilya Maximets wrote:
> On 06.02.2018 16:47, Eelco Chaudron wrote:
>> On 06/02/18 13:12, Ilya Maximets wrote:
>>> On 05.02.2018 19:29, Eelco Chaudron wrote:
>>>> On 02/02/18 17:17, Ilya Maximets wrote:
>>>>> On 02.02.2018 17:05, Róbert Mulik wrote:
>>>>>> It is possible to change LSC detection mode to polling or interrupt mode
>>>>>> for DPDK interfaces. The default is polling mode. To set interrupt mode,
>>>>>> option dpdk-lsc-interrupt has to be set to true.
>>>>>>
>>>>>> In polling mode more processor time is needed, since the OVS repeatedly reads
>>>>>> the link state with a short period. It can lead to packet loss for certain
>>>>>> systems.
>>>>>>
>>>>>> In interrupt mode the hardware itself triggers an interrupt when link state
>>>>>> change happens, so less processing time needs for the OVS. It is not possible
>>>>>> to enable the interrupt mode on all hardware.
>>>>>>
>>>>>> For detailed description and usage see the dpdk install documentation.
>>>>>>
>>>>>> Signed-off-by: Robert Mulik <robert.mulik@ericsson.com>
>>>>>> Reviewed-by: Ilya Maximets <i.maximets@samsung.com>
>>>>> 8<---- SNIP ---->8
>>>>>> diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
>>>>>> index f44f950..87608a5 100644
>>>>>> --- a/vswitchd/bridge.c
>>>>>> +++ b/vswitchd/bridge.c
>>>>>> @@ -68,6 +68,9 @@
>>>>>>     #include "lib/vswitch-idl.h"
>>>>>>     #include "xenserver.h"
>>>>>>     #include "vlan-bitmap.h"
>>>>>> +#ifdef DPDK_NETDEV
>>>>>> +#include "./lib/netdev-provider.h"
>>>>>> +#endif
>>>>>>
>>>>>>     VLOG_DEFINE_THIS_MODULE(bridge);
>>>>>>
>>>>>> @@ -601,6 +604,11 @@ bridge_reconfigure(const struct ovsrec_open_vswitch *ovs_cfg)
>>>>>>         ofproto_set_vlan_limit(smap_get_int(&ovs_cfg->other_config, "vlan-limit",
>>>>>>                                            LEGACY_MAX_VLAN_HEADERS));
>>>>>>
>>>>>> +#ifdef DPDK_NETDEV
>>>>>> +    netdev_dpdk_set_def_lsc_int_mode_enabled(
>>>>>> +        smap_get_bool(&ovs_cfg->other_config, "dpdk-lsc-interrupt", false));
>>>>>> +#endif
>>>>>> +
>>>>> I'm still thinking that this is not necessary.
>>>>> Eelco, why do you think we need to change the default global value in run-time?
>>>>> IMHO, this only complicates the code. (I didn't check, but it looks like current
>>>>> implementation will not work anyway.)
>>>>>
>>>> I do feel like the global option should be applied directly for the following reasons:
>>>>
>>>> - There is no way to see what is the running configuration
>>>> - If not done this way the LSC mode might change for some devices, for example if you decide to change another port related configuration.
>>>>
>>> I feel that we have some kind of misunderstanding.
>>> I'll try to explain my position:
>>>
>>> 1. I'm not against the global config option. I just want it to be static.
>>>      i.e. OVS restart will be required to change it.
>> Got your point, but why do you want it to be static? I'm arguing against it
>> as other options, like other_config:pmd-cpu-mask, are also not static.
>> If it can be done dynamic, I prefer doing it that way.
> 'pmd-cpu-mask' change is the only way to configure the number of PMD threads,
> and this is important to have this dynamic for run-time scaling. Also, this is
> the datapath parameter, not the parameter of any ports. We, actually, have no
> dynamic global parameters for ports. All the ports related parameters like
> iommu-support, vhost_sock_dir are static or hardcoded.
>
> other_config:dpdk-lsc-interrupt is just the default value that could be
> overridden by the dynamic per-port configuration. We're not frequently
> adding new physical NICs to servers. At this point user likely knows, which
> devices supports LSC interrupts and which are not, and it's possible to decide
> which value for default other_config:dpdk-lsc-interrupt is preferable.
>
> I personally prefer to not have global option. Only per-port configurable
> with 'false' as default value. This will additionally help with NICs that
> doesn't support LSC interrupts, and with mismatch between output of ovs-vsctl
> and real configured values. Someday in the future we could change default
> to 'true' when most NICs will support this, and if this feature will be
> in demand.
Having all this complexity, I would not mind also removing the global 
setting.
Any others in favor of keeping it? If not Robert might be able to remove 
it in V5?

>> However if we do go with a static option, it should not effect any existing
>> NIC configuration, see 4.
>>> 2. Per-port option will allow overriding of the global default configuration.
>> No problem here... However it's not clear to me how we would handle PMDs not
>> supporting RTE_ETH_DEV_INTR_LSC?For individual interfaces, we might want to
>> check this flag, and report an error at configuration time, rather than wait
>> for re-initialization and add something to the log file, EINVAL?
Robert, can we add this in V5?
>> But how do we deal with the global values and unsupported devices? Currently
>> looking at the code it will fail initialization after the restart with no
>> sensible error message.
>>> 3. You're saying that the running configuration is not visible, but It'll be
>>>      visible through global default value in other_config and per-port value in
>>>      options of the particular interface. If we'll not set any of these configs,
>>>      the value will not be visible in any case.
>>>
>>>      Suggestion: We definitely need to export the current configuration in
>>>                  netdev_dpdk_get_config(). And it'll be always visible.
>> I agree we need some way to show the current config because if you set the global
>> value (and not restart) you will have no way of knowing the current operational
>> state.The best way would be to have some ovs-appctl command to do this, as logs
>> might rotate...
> Current configuration could be checked by 'ovs-appctl dpif/show'. We just need
> to add new value like 'lsc_interrupts_enabled' to netdev_dpdk_get_config().
>
Robert can you update this function in V5?

See also my general review of V4 before you submit a v5.
>> Also currently all port settings are on the port, there are no global defaults.
>> So people need to be aware that doing an "ovs-vsctl show" does not show all the
>> actual port configuration.
> Yes. That is not so elegant. But users should know that ovs-vsctl reports only the
> status of the database, not the status of the daemon. 'dpif/show' and other
> 'dp'-commands are intended to check the actual status of the working switch.
> It's the historical behaviour of the OVS.
>
>>>      Maybe you're saying that we can update global default in database and this
>>>      will hide real default value? But it's the default behaviour for many other
>>>      'restart required' options like iommu support of vhost_sock_dir.
>>>      We need to inform about the global default value in VLOG message while
>>>      initializing inside dpdk_init() like it's done for all other options.
>> Yes, and this makes it some times hard to trouble shoot.
>>> 4. I didn't understand your second reason. There is no way to change the current
>>>      port by changing config of the another one. DPDK initialized only once, so
>>>      default will be fixed on boot and never changed. Any database changes for a
>>>      global other_config after the initialization phase will be ignored.
>> Let me explain, this is with the current implementation if we would have a global
>> static option (i.e. v2 of this patch).Assume you configure the following:
>>
>> ovs-vsctl set Open_vSwitch . other_config:dpdk-lsc-interrupt=true
>> ovs-vsctl set interface dpdk1 options:n_rxq=2
>>
>> The last command will trigger a re-configure of dpdk1, and causes now interrupts
>> to be enabled.
>>
>> Just want to indicate its a problem with the patch as implemented, if we want to
>> go with a static global configuration it needs to be fixed.
> With v2 of this series first command will update only database. Configuration
> inside daemon will not be updated. After the second command dpdk1 will be
> reconfigured with the old value of global dpdk-lsc-interrupt, i.e. 'false'.
> (I didn't test real v2, but it should work this way if implemented right).
> There is no issues here. The only issue is that value in database will mismatch
> with actual value used by daemon. After restart of ovs-vswitch, global
> dpdk-lsc-interrupt will be updated from the database and dpdk1 will be configured
> with enabled interrupts.
Jan Scheurich Feb. 7, 2018, noon UTC | #12
I support the idea to remove the global default configuration option. It causes more confusion that it helps. 
If at all, new physical ports are typically added to OVS manually and LSC interrupt mode can be configured as required.

Configurable defaults can make sense for vhostuser ports as they are created automatically from OpenStack.  The operator wants to control default settings per port in the absence of specific configuration coming from OpenStack.

Regards, Jan

> -----Original Message-----

> From: ovs-dev-bounces@openvswitch.org [mailto:ovs-dev-bounces@openvswitch.org] On Behalf Of Eelco Chaudron

> Sent: Tuesday, 06 February, 2018 17:30

> To: Ilya Maximets <i.maximets@samsung.com>; Róbert Mulik <robert.mulik@ericsson.com>; dev@openvswitch.org

> Subject: Re: [ovs-dev] [ovs-dev, ovs-dev, v4] netdev-dpdk: Configurable Link State Change (LSC) detection mode

> 

> On 06/02/18 16:01, Ilya Maximets wrote:

> > On 06.02.2018 16:47, Eelco Chaudron wrote:

> >> On 06/02/18 13:12, Ilya Maximets wrote:

> >>> On 05.02.2018 19:29, Eelco Chaudron wrote:

> >>>> On 02/02/18 17:17, Ilya Maximets wrote:

> >>>>> On 02.02.2018 17:05, Róbert Mulik wrote:

> >>>>>> It is possible to change LSC detection mode to polling or interrupt mode

> >>>>>> for DPDK interfaces. The default is polling mode. To set interrupt mode,

> >>>>>> option dpdk-lsc-interrupt has to be set to true.

> >>>>>>

> >>>>>> In polling mode more processor time is needed, since the OVS repeatedly reads

> >>>>>> the link state with a short period. It can lead to packet loss for certain

> >>>>>> systems.

> >>>>>>

> >>>>>> In interrupt mode the hardware itself triggers an interrupt when link state

> >>>>>> change happens, so less processing time needs for the OVS. It is not possible

> >>>>>> to enable the interrupt mode on all hardware.

> >>>>>>

> >>>>>> For detailed description and usage see the dpdk install documentation.

> >>>>>>

> >>>>>> Signed-off-by: Robert Mulik <robert.mulik@ericsson.com>

> >>>>>> Reviewed-by: Ilya Maximets <i.maximets@samsung.com>

> >>>>> 8<---- SNIP ---->8

> >>>>>> diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c

> >>>>>> index f44f950..87608a5 100644

> >>>>>> --- a/vswitchd/bridge.c

> >>>>>> +++ b/vswitchd/bridge.c

> >>>>>> @@ -68,6 +68,9 @@

> >>>>>>     #include "lib/vswitch-idl.h"

> >>>>>>     #include "xenserver.h"

> >>>>>>     #include "vlan-bitmap.h"

> >>>>>> +#ifdef DPDK_NETDEV

> >>>>>> +#include "./lib/netdev-provider.h"

> >>>>>> +#endif

> >>>>>>

> >>>>>>     VLOG_DEFINE_THIS_MODULE(bridge);

> >>>>>>

> >>>>>> @@ -601,6 +604,11 @@ bridge_reconfigure(const struct ovsrec_open_vswitch *ovs_cfg)

> >>>>>>         ofproto_set_vlan_limit(smap_get_int(&ovs_cfg->other_config, "vlan-limit",

> >>>>>>                                            LEGACY_MAX_VLAN_HEADERS));

> >>>>>>

> >>>>>> +#ifdef DPDK_NETDEV

> >>>>>> +    netdev_dpdk_set_def_lsc_int_mode_enabled(

> >>>>>> +        smap_get_bool(&ovs_cfg->other_config, "dpdk-lsc-interrupt", false));

> >>>>>> +#endif

> >>>>>> +

> >>>>> I'm still thinking that this is not necessary.

> >>>>> Eelco, why do you think we need to change the default global value in run-time?

> >>>>> IMHO, this only complicates the code. (I didn't check, but it looks like current

> >>>>> implementation will not work anyway.)

> >>>>>

> >>>> I do feel like the global option should be applied directly for the following reasons:

> >>>>

> >>>> - There is no way to see what is the running configuration

> >>>> - If not done this way the LSC mode might change for some devices, for example if you decide to change another port related

> configuration.

> >>>>

> >>> I feel that we have some kind of misunderstanding.

> >>> I'll try to explain my position:

> >>>

> >>> 1. I'm not against the global config option. I just want it to be static.

> >>>      i.e. OVS restart will be required to change it.

> >> Got your point, but why do you want it to be static? I'm arguing against it

> >> as other options, like other_config:pmd-cpu-mask, are also not static.

> >> If it can be done dynamic, I prefer doing it that way.

> > 'pmd-cpu-mask' change is the only way to configure the number of PMD threads,

> > and this is important to have this dynamic for run-time scaling. Also, this is

> > the datapath parameter, not the parameter of any ports. We, actually, have no

> > dynamic global parameters for ports. All the ports related parameters like

> > iommu-support, vhost_sock_dir are static or hardcoded.

> >

> > other_config:dpdk-lsc-interrupt is just the default value that could be

> > overridden by the dynamic per-port configuration. We're not frequently

> > adding new physical NICs to servers. At this point user likely knows, which

> > devices supports LSC interrupts and which are not, and it's possible to decide

> > which value for default other_config:dpdk-lsc-interrupt is preferable.

> >

> > I personally prefer to not have global option. Only per-port configurable

> > with 'false' as default value. This will additionally help with NICs that

> > doesn't support LSC interrupts, and with mismatch between output of ovs-vsctl

> > and real configured values. Someday in the future we could change default

> > to 'true' when most NICs will support this, and if this feature will be

> > in demand.

> Having all this complexity, I would not mind also removing the global

> setting.

> Any others in favor of keeping it? If not Robert might be able to remove

> it in V5?

> 

> >> However if we do go with a static option, it should not effect any existing

> >> NIC configuration, see 4.

> >>> 2. Per-port option will allow overriding of the global default configuration.

> >> No problem here... However it's not clear to me how we would handle PMDs not

> >> supporting RTE_ETH_DEV_INTR_LSC?For individual interfaces, we might want to

> >> check this flag, and report an error at configuration time, rather than wait

> >> for re-initialization and add something to the log file, EINVAL?

> Robert, can we add this in V5?

> >> But how do we deal with the global values and unsupported devices? Currently

> >> looking at the code it will fail initialization after the restart with no

> >> sensible error message.

> >>> 3. You're saying that the running configuration is not visible, but It'll be

> >>>      visible through global default value in other_config and per-port value in

> >>>      options of the particular interface. If we'll not set any of these configs,

> >>>      the value will not be visible in any case.

> >>>

> >>>      Suggestion: We definitely need to export the current configuration in

> >>>                  netdev_dpdk_get_config(). And it'll be always visible.

> >> I agree we need some way to show the current config because if you set the global

> >> value (and not restart) you will have no way of knowing the current operational

> >> state.The best way would be to have some ovs-appctl command to do this, as logs

> >> might rotate...

> > Current configuration could be checked by 'ovs-appctl dpif/show'. We just need

> > to add new value like 'lsc_interrupts_enabled' to netdev_dpdk_get_config().

> >

> Robert can you update this function in V5?

> 

> See also my general review of V4 before you submit a v5.

> >> Also currently all port settings are on the port, there are no global defaults.

> >> So people need to be aware that doing an "ovs-vsctl show" does not show all the

> >> actual port configuration.

> > Yes. That is not so elegant. But users should know that ovs-vsctl reports only the

> > status of the database, not the status of the daemon. 'dpif/show' and other

> > 'dp'-commands are intended to check the actual status of the working switch.

> > It's the historical behaviour of the OVS.

> >

> >>>      Maybe you're saying that we can update global default in database and this

> >>>      will hide real default value? But it's the default behaviour for many other

> >>>      'restart required' options like iommu support of vhost_sock_dir.

> >>>      We need to inform about the global default value in VLOG message while

> >>>      initializing inside dpdk_init() like it's done for all other options.

> >> Yes, and this makes it some times hard to trouble shoot.

> >>> 4. I didn't understand your second reason. There is no way to change the current

> >>>      port by changing config of the another one. DPDK initialized only once, so

> >>>      default will be fixed on boot and never changed. Any database changes for a

> >>>      global other_config after the initialization phase will be ignored.

> >> Let me explain, this is with the current implementation if we would have a global

> >> static option (i.e. v2 of this patch).Assume you configure the following:

> >>

> >> ovs-vsctl set Open_vSwitch . other_config:dpdk-lsc-interrupt=true

> >> ovs-vsctl set interface dpdk1 options:n_rxq=2

> >>

> >> The last command will trigger a re-configure of dpdk1, and causes now interrupts

> >> to be enabled.

> >>

> >> Just want to indicate its a problem with the patch as implemented, if we want to

> >> go with a static global configuration it needs to be fixed.

> > With v2 of this series first command will update only database. Configuration

> > inside daemon will not be updated. After the second command dpdk1 will be

> > reconfigured with the old value of global dpdk-lsc-interrupt, i.e. 'false'.

> > (I didn't test real v2, but it should work this way if implemented right).

> > There is no issues here. The only issue is that value in database will mismatch

> > with actual value used by daemon. After restart of ovs-vswitch, global

> > dpdk-lsc-interrupt will be updated from the database and dpdk1 will be configured

> > with enabled interrupts.

> 

> 

> _______________________________________________

> dev mailing list

> dev@openvswitch.org

> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
diff mbox series

Patch

diff --git a/Documentation/intro/install/dpdk.rst b/Documentation/intro/install/dpdk.rst
index ed358d5..14a6684 100644
--- a/Documentation/intro/install/dpdk.rst
+++ b/Documentation/intro/install/dpdk.rst
@@ -628,6 +628,54 @@  The average number of packets per output batch can be checked in PMD stats::

     $ ovs-appctl dpif-netdev/pmd-stats-show

+Link State Change (LSC) detection configuration
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+There are two methods to get the information when Link State Change (LSC)
+happens on a network interface: by polling or interrupt.
+
+With polling method, a process is running in the background and repeatedly
+reads the link state with a short period. It continuously needs processor time
+and between 2 reading periods it can`t see the link state change, therefore
+the reaction time depends on the polling period. With higher rate, more
+processor time is needed. Another problem with the poll mode is that on some
+hardware a polling cycle takes too much time, which (in the end) leads to
+packet loss for certain systems.
+
+If interrupts are used to get LSC information, the hardware itself triggers
+an interrupt when link state change happens, the thread wakes up from sleep,
+updates the information, and goes back to sleep mode. When no link state
+change happens (most of the time), the thread remains in sleep mode and
+doesn`t use processor time at all. The disadvantage of this method is that
+when interrupt happens, the processor has to handle it immediately, so it
+puts the currently running process to background, handles the interrupt, and
+takes the background process back. Another disadvantage is that some hardware
+can`t be configured to generate LSC interrupts.
+
+The default configuration is polling mode. To set interrupt mode, option
+dpdk-lsc-interrupt has to be set to true.
+
+Global settings
+
+Command to set interrupt mode for all interfaces:
+ovs-vsctl set Open_vSwitch . other_config:dpdk-lsc-interrupt=true
+
+Command to set polling mode for all interfaces:
+ovs-vsctl set Open_vSwitch . other_config:dpdk-lsc-interrupt=false
+or:
+ovs-vsctl remove Open_vSwitch . other_config dpdk-lsc-interrupt
+
+Interface specific settings (override global settings)
+
+Command to set interrupt mode for a specific interface:
+ovs-vsctl set interface <interface_name> options:dpdk-lsc-interrupt=true
+
+Command to set polling mode for a specific interface:
+ovs-vsctl set interface <interface_name> options:dpdk-lsc-interrupt=false
+
+Command to reset to globally defined mode for a specific interface:
+ovs-vsctl remove interface <interface_name> options dpdk-lsc-interrupt
+
 Limitations
 ------------

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 94fb163..73d0d4b 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -148,7 +148,7 @@  typedef uint16_t dpdk_port_t;
 #define VHOST_ENQ_RETRY_NUM 8
 #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)

-static const struct rte_eth_conf port_conf = {
+static struct rte_eth_conf port_conf = {
     .rxmode = {
         .mq_mode = ETH_MQ_RX_RSS,
         .split_hdr_size = 0,
@@ -167,6 +167,10 @@  static const struct rte_eth_conf port_conf = {
     .txmode = {
         .mq_mode = ETH_MQ_TX_NONE,
     },
+    .intr_conf = {
+        /* LSC interrupt mode disabled, polling mode used. */
+        .lsc = 0,
+    },
 };

 /*
@@ -433,6 +437,12 @@  struct netdev_dpdk {
         /* DPDK-ETH hardware offload features,
          * from the enum set 'dpdk_hw_ol_features' */
         uint32_t hw_ol_features;
+
+        /* Properties for link state change detection mode.
+         * If lsc_interrupt_mode is set to false, poll mode is used,
+         * otherwise interrupt mode is used. */
+        bool requested_lsc_interrupt_mode;
+        bool lsc_interrupt_mode;
     );

     PADDED_MEMBERS(CACHE_LINE_SIZE,
@@ -459,6 +469,18 @@  int netdev_dpdk_get_vid(const struct netdev_dpdk *dev);
 struct ingress_policer *
 netdev_dpdk_get_ingress_policer(const struct netdev_dpdk *dev);

+void
+netdev_dpdk_set_default_lsc_detect_mode(bool enabled)
+{
+    port_conf.intr_conf.lsc = enabled;
+}
+
+bool
+netdev_dpdk_get_def_lsc_int_mode_enabled(void)
+{
+    return port_conf.intr_conf.lsc;
+}
+
 static bool
 is_dpdk_class(const struct netdev_class *class)
 {
@@ -686,12 +708,14 @@  dpdk_watchdog(void *dummy OVS_UNUSED)
 }

 static int
-dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
+dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int n_rxq, int n_txq)
 {
     int diag = 0;
     int i;
     struct rte_eth_conf conf = port_conf;

+    conf.intr_conf.lsc = dev->lsc_interrupt_mode;
+
     /* For some NICs (e.g. Niantic), scatter_rx mode needs to be explicitly
      * enabled. */
     if (dev->mtu > ETHER_MTU) {
@@ -801,7 +825,7 @@  dpdk_eth_dev_init(struct netdev_dpdk *dev)
     n_rxq = MIN(info.max_rx_queues, dev->up.n_rxq);
     n_txq = MIN(info.max_tx_queues, dev->up.n_txq);

-    diag = dpdk_eth_dev_queue_setup(dev, n_rxq, n_txq);
+    diag = dpdk_eth_dev_port_config(dev, n_rxq, n_txq);
     if (diag) {
         VLOG_ERR("Interface %s(rxq:%d txq:%d) configure error: %s",
                  dev->up.name, n_rxq, n_txq, rte_strerror(-diag));
@@ -897,6 +921,8 @@  common_construct(struct netdev *netdev, dpdk_port_t port_no,
     dev->flags = 0;
     dev->requested_mtu = ETHER_MTU;
     dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
+    dev->requested_lsc_interrupt_mode =
+        netdev_dpdk_get_def_lsc_int_mode_enabled();
     ovsrcu_index_init(&dev->vid, -1);
     dev->vhost_reconfigured = false;
     dev->attached = false;
@@ -1520,6 +1546,13 @@  netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args,
         goto out;
     }

+    bool lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt",
+                              netdev_dpdk_get_def_lsc_int_mode_enabled());
+    if (dev->requested_lsc_interrupt_mode != lsc_interrupt_mode) {
+        dev->requested_lsc_interrupt_mode = lsc_interrupt_mode;
+        netdev_request_reconfigure(netdev);
+    }
+
     rx_fc_en = smap_get_bool(args, "rx-flow-ctrl", false);
     tx_fc_en = smap_get_bool(args, "tx-flow-ctrl", false);
     autoneg = smap_get_bool(args, "flow-ctrl-autoneg", false);
@@ -3546,6 +3579,7 @@  netdev_dpdk_reconfigure(struct netdev *netdev)
     if (netdev->n_txq == dev->requested_n_txq
         && netdev->n_rxq == dev->requested_n_rxq
         && dev->mtu == dev->requested_mtu
+        && dev->lsc_interrupt_mode == dev->requested_lsc_interrupt_mode
         && dev->rxq_size == dev->requested_rxq_size
         && dev->txq_size == dev->requested_txq_size
         && dev->socket_id == dev->requested_socket_id) {
@@ -3561,6 +3595,8 @@  netdev_dpdk_reconfigure(struct netdev *netdev)
         goto out;
     }

+    dev->lsc_interrupt_mode = dev->requested_lsc_interrupt_mode;
+
     netdev->n_txq = dev->requested_n_txq;
     netdev->n_rxq = dev->requested_n_rxq;

diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h
index b7d02a7..8eb51bc 100644
--- a/lib/netdev-dpdk.h
+++ b/lib/netdev-dpdk.h
@@ -27,6 +27,8 @@  struct dp_packet;

 void netdev_dpdk_register(void);
 void free_dpdk_buf(struct dp_packet *);
+void netdev_dpdk_set_def_lsc_int_mode_enabled(bool enabled);
+bool netdev_dpdk_get_def_lsc_int_mode_enabled(void);

 #else

diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
index f44f950..87608a5 100644
--- a/vswitchd/bridge.c
+++ b/vswitchd/bridge.c
@@ -68,6 +68,9 @@ 
 #include "lib/vswitch-idl.h"
 #include "xenserver.h"
 #include "vlan-bitmap.h"
+#ifdef DPDK_NETDEV
+#include "./lib/netdev-provider.h"
+#endif

 VLOG_DEFINE_THIS_MODULE(bridge);

@@ -601,6 +604,11 @@  bridge_reconfigure(const struct ovsrec_open_vswitch *ovs_cfg)
     ofproto_set_vlan_limit(smap_get_int(&ovs_cfg->other_config, "vlan-limit",
                                        LEGACY_MAX_VLAN_HEADERS));

+#ifdef DPDK_NETDEV
+    netdev_dpdk_set_def_lsc_int_mode_enabled(
+        smap_get_bool(&ovs_cfg->other_config, "dpdk-lsc-interrupt", false));
+#endif
+
     ofproto_set_threads(
         smap_get_int(&ovs_cfg->other_config, "n-handler-threads", 0),
         smap_get_int(&ovs_cfg->other_config, "n-revalidator-threads", 0));
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 0c6a43d..91a9003 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -320,6 +320,29 @@ 
         </p>
       </column>

+      <column name="other_config" key="dpdk-lsc-interrupt"
+              type='{"type": "boolean"}'>
+        <p>
+          Set this value to <code>true</code> to configure interrupt mode for
+          Link State Change (LSC) detection instead of poll mode for DPDK
+          interfaces.
+        </p>
+        <p>
+          The default value is <code>false</code>.
+        </p>
+        <p>
+          If this value is <code>false</code> at startup, poll mode is used for
+          all netdev dpdk interfaces.
+        </p>
+        <p>
+          This value can be overridden for a specific interface in the options
+          section of that interface.
+        </p>
+        <p>
+          This parameter has an effect only on netdev dpdk interfaces.
+        </p>
+      </column>
+
       <column name="other_config" key="dpdk-extra"
               type='{"type": "string"}'>
         <p>
@@ -3631,6 +3654,28 @@  ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
       </column>
     </group>

+    <group title="Link State Change detection mode">
+      <column name="options" key="dpdk-lsc-interrupt"
+              type='{"type": "boolean"}'>
+        <p>
+          Set this value to <code>true</code> to configure interrupt mode for
+          Link State Change (LSC) detection instead of poll mode for the DPDK
+          interface.
+        </p>
+        <p>
+          If this value is not set, the value is taken from the global
+          settings.
+        </p>
+        <p>
+          If this value is set, the global LSC interrupt settings are
+          overridden.
+        </p>
+        <p>
+          This parameter has an effect only on netdev dpdk interfaces.
+        </p>
+      </column>
+    </group>
+
     <group title="Common Columns">
       The overall purpose of these columns is described under <code>Common
       Columns</code> at the beginning of this document.