[ovs-dev,v4,2/2] netdev-dpdk: Enable optional dequeue zero copy for vHost User

Message ID 1509614611-4233-3-git-send-email-ciara.loftus@intel.com
State New
Delegated to: Ian Stokes
Headers show
Series
  • vHost Dequeue Zero Copy
Related show

Commit Message

Loftus, Ciara Nov. 2, 2017, 9:23 a.m.
Enabled per port like so:
ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=true

The feature is disabled by default and can only be enabled/disabled when
a vHost port is down.

When packets from a vHost device with zero copy enabled are destined for
a 'dpdk' port, the number of tx descriptors on that 'dpdk' port must be
set to a smaller value. 128 is recommended. This can be achieved like
so:

ovs-vsctl set Interface dpdkport options:n_txq_desc=128

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
---
v4:
* Rebase

 Documentation/howto/dpdk.rst             | 33 ++++++++++++
 Documentation/topics/dpdk/vhost-user.rst | 58 +++++++++++++++++++++
 NEWS                                     |  3 ++
 lib/netdev-dpdk.c                        | 89 +++++++++++++++++++++++++++++++-
 vswitchd/vswitch.xml                     | 11 ++++
 5 files changed, 193 insertions(+), 1 deletion(-)

Patch

diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
index d123819..3e1b8f8 100644
--- a/Documentation/howto/dpdk.rst
+++ b/Documentation/howto/dpdk.rst
@@ -709,3 +709,36 @@  devices to bridge ``br0``. Once complete, follow the below steps:
    Check traffic on multiple queues::
 
        $ cat /proc/interrupts | grep virtio
+
+PHY-VM-PHY (vHost Dequeue Zero Copy)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+vHost dequeue zero copy functionality can  be validated using the
+PHY-VM-PHY configuration. To begin, follow the steps described in
+:ref:`dpdk-phy-phy` to create and initialize the database, start
+ovs-vswitchd and add ``dpdk``-type and ``dpdkvhostuser``-type devices
+and flows to bridge ``br0``. Once complete, follow the below steps:
+
+1. Enable dequeue zero copy on the vHost devices.
+
+       $ ovs-vsctl set Interface dpdkvhostuser0 options:dq-zero-copy=true
+       $ ovs-vsctl set Interface dpdkvhostuser1 options:dq-zero-copy=true
+
+The following log should be observed for each device:
+
+       netdev_dpdk|INFO|Zero copy enabled for vHost socket <name>
+
+2. Reduce the number of txq descriptors on the phy ports.
+
+       $ ovs-vsctl set Interface phy0 options:n_txq_desc=128
+       $ ovs-vsctl set Interface phy1 options:n_txq_desc=128
+
+3. Proceed with the test by launching the VM and configuring guest
+forwarding, be it via the vHost loopback method or kernel forwarding
+method, and sending traffic. The following log should be oberved for
+each device as it becomes active during VM boot:
+
+       VHOST_CONFIG: dequeue zero copy is enabled
+
+It is essential that step 1 is performed before booting the VM, otherwise
+the feature will not be enabled.
diff --git a/Documentation/topics/dpdk/vhost-user.rst b/Documentation/topics/dpdk/vhost-user.rst
index 74ac06e..2636d5a 100644
--- a/Documentation/topics/dpdk/vhost-user.rst
+++ b/Documentation/topics/dpdk/vhost-user.rst
@@ -408,3 +408,61 @@  Sample XML
     </domain>
 
 .. _QEMU documentation: http://git.qemu-project.org/?p=qemu.git;a=blob;f=docs/specs/vhost-user.txt;h=7890d7169;hb=HEAD
+
+vhost-user Dequeue Zero Copy
+-------------------------------------
+
+Normally when dequeuing a packet from a vHost User device, a memcpy operation
+must be used to copy that packet from guest address space to host address
+space. This memcpy can be removed by enabling dequeue zero-copy like so:
+
+    $ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=true
+
+With this feature enabled, a reference (pointer) to the packet is passed to
+the host, instead of a copy of the packet. Removing this memcpy can give a
+performance improvement for some use cases, for example switching large packets
+between different VMs.
+
+Note that the feature is disabled by default and must be explicitly enabled
+by using the command above.
+
+The feature cannot be enabled when the device is active (ie. VM booted). If
+you wish to enable the feature after the VM has booted, you must shutdown
+the VM and bring it back up.
+
+The same logic applies for disabling the feature - it must be disabled when
+the device is inactive, for example before VM boot. To disable the feature:
+
+    $ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=false
+
+The feature is available to both dpdkvhostuser and dpdkvhostuserclient port
+types.
+
+A limitation exists whereby if packets from a vHost port with dq-zero-copy=true
+are destined for a 'dpdk' type port, the number of tx descriptors (n_txq_desc)
+for that port must be reduced to a smaller number, 128 being the recommended
+value. This can be achieved by issuing the following command:
+
+    $ ovs-vsctl set Interface dpdkport options:n_txq_desc=128
+
+More information on the n_txq_desc option can be found in the "DPDK Physical
+Port Queue Sizes" section of the  `intro/install/dpdk.rst` guide.
+
+The reason for this limitation is due to how the zero copy functionality is
+implemented. The vHost device's 'tx used vring', a virtio structure used for
+tracking used ie. sent descriptors, will only be updated when the NIC frees
+the corresponding mbuf. If we don't free the mbufs frequently enough, that
+vring will be starved and packets will no longer be processed. One way to
+ensure we don't encounter this scenario, is to configure n_txq_desc to a small
+enough number such that the 'mbuf free threshold' for the NIC will be hit more
+often and thus free mbufs more frequently. The value of 128 is suggested, but
+values of 64 and 256 have been tested and verified to work too, with differing
+performance characteristics.
+
+Further information can be found in the
+`DPDK documentation
+<http://dpdk.readthedocs.io/en/v17.05/prog_guide/vhost_lib.html>`__
+
+Further information can be found in the
+`DPDK documentation
+<http://dpdk.readthedocs.io/en/v17.05/prog_guide/vhost_lib.html>`__
diff --git a/NEWS b/NEWS
index 1325d31..0f6dfa1 100644
--- a/NEWS
+++ b/NEWS
@@ -5,6 +5,9 @@  Post-v2.8.0
        chassis "hostname" in addition to a chassis "name".
    - Linux kernel 4.13
      * Add support for compiling OVS with the latest Linux 4.13 kernel
+   - DPDK:
+     * Optional dequeue zero copy feature for vHost ports enabled per port
+       via the boolean 'dq-zero-copy' option.
 
 v2.8.0 - 31 Aug 2017
 --------------------
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 4643f6f..db813f3 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -379,6 +379,9 @@  struct netdev_dpdk {
     /* True if vHost device is 'up' and has been reconfigured at least once */
     bool vhost_reconfigured;
 
+    /* True if dq-zero-copy feature has successfully been enabled */
+    bool dq_zc_enabled;
+
     /* Identifier used to distinguish vhost devices from each other. */
     char vhost_id[PATH_MAX];
 
@@ -923,6 +926,7 @@  common_construct(struct netdev *netdev, dpdk_port_t port_no,
     dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
     ovsrcu_index_init(&dev->vid, -1);
     dev->vhost_reconfigured = false;
+    dev->dq_zc_enabled = false;
     dev->attached = false;
 
     ovsrcu_init(&dev->qos_conf, NULL);
@@ -1431,6 +1435,29 @@  netdev_dpdk_ring_set_config(struct netdev *netdev, const struct smap *args,
     return 0;
 }
 
+static void
+dpdk_vhost_set_config_helper(struct netdev_dpdk *dev,
+                             const struct smap *args)
+{
+    bool needs_reconfigure = false;
+    bool zc_requested = smap_get_bool(args, "dq-zero-copy", false);
+
+    if (zc_requested &&
+            !(dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)) {
+        dev->vhost_driver_flags |= RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
+        needs_reconfigure = true;
+    } else if (!zc_requested &&
+            (dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)) {
+        dev->vhost_driver_flags &= ~RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
+        needs_reconfigure = true;
+    }
+
+    /* Only try to change ZC mode when device is down */
+    if (needs_reconfigure && (netdev_dpdk_get_vid(dev) == -1)) {
+        netdev_request_reconfigure(&dev->up);
+    }
+}
+
 static int
 netdev_dpdk_vhost_client_set_config(struct netdev *netdev,
                                     const struct smap *args,
@@ -1447,6 +1474,23 @@  netdev_dpdk_vhost_client_set_config(struct netdev *netdev,
             netdev_request_reconfigure(netdev);
         }
     }
+
+    dpdk_vhost_set_config_helper(dev, args);
+
+    ovs_mutex_unlock(&dev->mutex);
+
+    return 0;
+}
+
+static int
+netdev_dpdk_vhost_set_config(struct netdev *netdev,
+                             const struct smap *args,
+                             char **errp OVS_UNUSED)
+{
+    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+
+    ovs_mutex_lock(&dev->mutex);
+    dpdk_vhost_set_config_helper(dev, args);
     ovs_mutex_unlock(&dev->mutex);
 
     return 0;
@@ -2771,6 +2815,46 @@  netdev_dpdk_txq_map_clear(struct netdev_dpdk *dev)
     }
 }
 
+static void
+vhost_change_zero_copy_mode(struct netdev_dpdk *dev, bool client_mode,
+                            bool enable)
+{
+    int err = rte_vhost_driver_unregister(dev->vhost_id);
+
+    if (err) {
+        VLOG_ERR("Error unregistering vHost socket %s; can't change zero copy "
+                "mode", dev->vhost_id);
+    } else {
+        err = dpdk_setup_vhost_device(dev, client_mode);
+        if (err) {
+            VLOG_ERR("Error changing zero copy mode for vHost socket %s",
+                    dev->vhost_id);
+        } else if (enable) {
+            dev->dq_zc_enabled = true;
+            VLOG_INFO("Zero copy enabled for vHost socket %s", dev->vhost_id);
+        } else {
+            dev->dq_zc_enabled = false;
+            VLOG_INFO("Zero copy disabled for vHost socket %s", dev->vhost_id);
+        }
+    }
+}
+
+static void
+vhost_check_zero_copy_status(struct netdev_dpdk *dev)
+{
+    bool mode = dev->vhost_driver_flags & RTE_VHOST_USER_CLIENT;
+
+    if ((dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)
+                && !dev->dq_zc_enabled) {
+        /* ZC disabled but requested to be enabled, enable it. */
+        vhost_change_zero_copy_mode(dev, mode, true);
+    } else if (!(dev->vhost_driver_flags &
+            RTE_VHOST_USER_DEQUEUE_ZERO_COPY) && dev->dq_zc_enabled) {
+        /* ZC enabled but requested to be disabled, disable it. */
+        vhost_change_zero_copy_mode(dev, mode, false);
+    }
+}
+
 /*
  * Remove a virtio-net device from the specific vhost port.  Use dev->remove
  * flag to stop any more packets from being sent or received to/from a VM and
@@ -2816,6 +2900,7 @@  destroy_device(int vid)
          */
         ovsrcu_quiesce_start();
         VLOG_INFO("vHost Device '%s' has been removed", ifname);
+        netdev_request_reconfigure(&dev->up);
     } else {
         VLOG_INFO("vHost Device '%s' not found", ifname);
     }
@@ -3307,6 +3392,8 @@  dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev)
             /* Carrier status may need updating. */
             netdev_change_seq_changed(&dev->up);
         }
+    } else {
+        vhost_check_zero_copy_status(dev);
     }
 
     return 0;
@@ -3468,7 +3555,7 @@  static const struct netdev_class dpdk_vhost_class =
         NULL,
         netdev_dpdk_vhost_construct,
         netdev_dpdk_vhost_destruct,
-        NULL,
+        netdev_dpdk_vhost_set_config,
         NULL,
         netdev_dpdk_vhost_send,
         netdev_dpdk_vhost_get_carrier,
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index d7f6839..55add4c 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -2649,6 +2649,17 @@  ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
         </p>
       </column>
 
+      <column name="options" key="dq-zero-copy"
+              type='{"type": "boolean"}'>
+        <p>
+          The value specifies whether or not to enable dequeue zero copy on
+          the given interface.
+          The port must be in an inactive state in order to enable or disable
+          this feature.
+          Only supported by dpdkvhostuserclient and dpdkvhostuser interfaces.
+        </p>
+      </column>
+
       <column name="options" key="n_rxq_desc"
               type='{"type": "integer", "minInteger": 1, "maxInteger": 4096}'>
         <p>