{"id":833265,"url":"http://patchwork.ozlabs.org/api/1.2/patches/833265/?format=json","web_url":"http://patchwork.ozlabs.org/project/openvswitch/patch/1509614611-4233-3-git-send-email-ciara.loftus@intel.com/","project":{"id":47,"url":"http://patchwork.ozlabs.org/api/1.2/projects/47/?format=json","name":"Open vSwitch","link_name":"openvswitch","list_id":"ovs-dev.openvswitch.org","list_email":"ovs-dev@openvswitch.org","web_url":"http://openvswitch.org/","scm_url":"git@github.com:openvswitch/ovs.git","webscm_url":"https://github.com/openvswitch/ovs","list_archive_url":"","list_archive_url_format":"","commit_url_format":""},"msgid":"<1509614611-4233-3-git-send-email-ciara.loftus@intel.com>","list_archive_url":null,"date":"2017-11-02T09:23:31","name":"[ovs-dev,v4,2/2] netdev-dpdk: Enable optional dequeue zero copy for vHost User","commit_ref":null,"pull_url":null,"state":"superseded","archived":false,"hash":"f3303f87d75ca96e8ef33825c8ce5472bb96ec91","submitter":{"id":67255,"url":"http://patchwork.ozlabs.org/api/1.2/people/67255/?format=json","name":"Ciara Loftus","email":"ciara.loftus@intel.com"},"delegate":{"id":70734,"url":"http://patchwork.ozlabs.org/api/1.2/users/70734/?format=json","username":"istokes","first_name":"Ian","last_name":"Stokes","email":"ian.stokes@intel.com"},"mbox":"http://patchwork.ozlabs.org/project/openvswitch/patch/1509614611-4233-3-git-send-email-ciara.loftus@intel.com/mbox/","series":[{"id":11459,"url":"http://patchwork.ozlabs.org/api/1.2/series/11459/?format=json","web_url":"http://patchwork.ozlabs.org/project/openvswitch/list/?series=11459","date":"2017-11-02T09:23:29","name":"vHost Dequeue Zero Copy","version":4,"mbox":"http://patchwork.ozlabs.org/series/11459/mbox/"}],"comments":"http://patchwork.ozlabs.org/api/patches/833265/comments/","check":"pending","checks":"http://patchwork.ozlabs.org/api/patches/833265/checks/","tags":{},"related":[],"headers":{"Return-Path":"<ovs-dev-bounces@openvswitch.org>","X-Original-To":["incoming@patchwork.ozlabs.org","dev@openvswitch.org"],"Delivered-To":["patchwork-incoming@bilbo.ozlabs.org","ovs-dev@mail.linuxfoundation.org"],"Authentication-Results":"ozlabs.org;\n\tspf=pass (mailfrom) smtp.mailfrom=openvswitch.org\n\t(client-ip=140.211.169.12; helo=mail.linuxfoundation.org;\n\tenvelope-from=ovs-dev-bounces@openvswitch.org;\n\treceiver=<UNKNOWN>)","Received":["from mail.linuxfoundation.org (mail.linuxfoundation.org\n\t[140.211.169.12])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256\n\tbits)) (No client certificate requested)\n\tby ozlabs.org (Postfix) with ESMTPS id 3ySKRt0hNBz9sNx\n\tfor <incoming@patchwork.ozlabs.org>;\n\tThu,  2 Nov 2017 20:24:58 +1100 (AEDT)","from mail.linux-foundation.org (localhost [127.0.0.1])\n\tby mail.linuxfoundation.org (Postfix) with ESMTP id 410DAACA;\n\tThu,  2 Nov 2017 09:23:47 +0000 (UTC)","from smtp1.linuxfoundation.org (smtp1.linux-foundation.org\n\t[172.17.192.35])\n\tby mail.linuxfoundation.org (Postfix) with ESMTPS id 22C97AA5\n\tfor <dev@openvswitch.org>; Thu,  2 Nov 2017 09:23:46 +0000 (UTC)","from mga01.intel.com (mga01.intel.com [192.55.52.88])\n\tby smtp1.linuxfoundation.org (Postfix) with ESMTPS id 1F8544CE\n\tfor <dev@openvswitch.org>; Thu,  2 Nov 2017 09:23:45 +0000 (UTC)","from orsmga004.jf.intel.com ([10.7.209.38])\n\tby fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;\n\t02 Nov 2017 02:23:44 -0700","from sivswdev01.ir.intel.com (HELO localhost.localdomain)\n\t([10.237.217.45])\n\tby orsmga004.jf.intel.com with ESMTP; 02 Nov 2017 02:23:43 -0700"],"X-Greylist":"domain auto-whitelisted by SQLgrey-1.7.6","X-ExtLoop1":"1","X-IronPort-AV":"E=Sophos;i=\"5.44,333,1505804400\"; d=\"scan'208\";a=\"145057681\"","From":"Ciara Loftus <ciara.loftus@intel.com>","To":"dev@openvswitch.org","Date":"Thu,  2 Nov 2017 09:23:31 +0000","Message-Id":"<1509614611-4233-3-git-send-email-ciara.loftus@intel.com>","X-Mailer":"git-send-email 1.7.0.7","In-Reply-To":"<1509614611-4233-1-git-send-email-ciara.loftus@intel.com>","References":"<1509614611-4233-1-git-send-email-ciara.loftus@intel.com>","X-Spam-Status":"No, score=-5.0 required=5.0 tests=RCVD_IN_DNSWL_HI,\n\tRP_MATCHES_RCVD autolearn=disabled version=3.3.1","X-Spam-Checker-Version":"SpamAssassin 3.3.1 (2010-03-16) on\n\tsmtp1.linux-foundation.org","Subject":"[ovs-dev] [PATCH v4 2/2] netdev-dpdk: Enable optional dequeue zero\n\tcopy for vHost User","X-BeenThere":"ovs-dev@openvswitch.org","X-Mailman-Version":"2.1.12","Precedence":"list","List-Id":"<ovs-dev.openvswitch.org>","List-Unsubscribe":"<https://mail.openvswitch.org/mailman/options/ovs-dev>,\n\t<mailto:ovs-dev-request@openvswitch.org?subject=unsubscribe>","List-Archive":"<http://mail.openvswitch.org/pipermail/ovs-dev/>","List-Post":"<mailto:ovs-dev@openvswitch.org>","List-Help":"<mailto:ovs-dev-request@openvswitch.org?subject=help>","List-Subscribe":"<https://mail.openvswitch.org/mailman/listinfo/ovs-dev>,\n\t<mailto:ovs-dev-request@openvswitch.org?subject=subscribe>","MIME-Version":"1.0","Content-Type":"text/plain; charset=\"us-ascii\"","Content-Transfer-Encoding":"7bit","Sender":"ovs-dev-bounces@openvswitch.org","Errors-To":"ovs-dev-bounces@openvswitch.org"},"content":"Enabled per port like so:\novs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=true\n\nThe feature is disabled by default and can only be enabled/disabled when\na vHost port is down.\n\nWhen packets from a vHost device with zero copy enabled are destined for\na 'dpdk' port, the number of tx descriptors on that 'dpdk' port must be\nset to a smaller value. 128 is recommended. This can be achieved like\nso:\n\novs-vsctl set Interface dpdkport options:n_txq_desc=128\n\nSigned-off-by: Ciara Loftus <ciara.loftus@intel.com>\n---\nv4:\n* Rebase\n\n Documentation/howto/dpdk.rst             | 33 ++++++++++++\n Documentation/topics/dpdk/vhost-user.rst | 58 +++++++++++++++++++++\n NEWS                                     |  3 ++\n lib/netdev-dpdk.c                        | 89 +++++++++++++++++++++++++++++++-\n vswitchd/vswitch.xml                     | 11 ++++\n 5 files changed, 193 insertions(+), 1 deletion(-)","diff":"diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst\nindex d123819..3e1b8f8 100644\n--- a/Documentation/howto/dpdk.rst\n+++ b/Documentation/howto/dpdk.rst\n@@ -709,3 +709,36 @@ devices to bridge ``br0``. Once complete, follow the below steps:\n    Check traffic on multiple queues::\n \n        $ cat /proc/interrupts | grep virtio\n+\n+PHY-VM-PHY (vHost Dequeue Zero Copy)\n+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n+\n+vHost dequeue zero copy functionality can  be validated using the\n+PHY-VM-PHY configuration. To begin, follow the steps described in\n+:ref:`dpdk-phy-phy` to create and initialize the database, start\n+ovs-vswitchd and add ``dpdk``-type and ``dpdkvhostuser``-type devices\n+and flows to bridge ``br0``. Once complete, follow the below steps:\n+\n+1. Enable dequeue zero copy on the vHost devices.\n+\n+       $ ovs-vsctl set Interface dpdkvhostuser0 options:dq-zero-copy=true\n+       $ ovs-vsctl set Interface dpdkvhostuser1 options:dq-zero-copy=true\n+\n+The following log should be observed for each device:\n+\n+       netdev_dpdk|INFO|Zero copy enabled for vHost socket <name>\n+\n+2. Reduce the number of txq descriptors on the phy ports.\n+\n+       $ ovs-vsctl set Interface phy0 options:n_txq_desc=128\n+       $ ovs-vsctl set Interface phy1 options:n_txq_desc=128\n+\n+3. Proceed with the test by launching the VM and configuring guest\n+forwarding, be it via the vHost loopback method or kernel forwarding\n+method, and sending traffic. The following log should be oberved for\n+each device as it becomes active during VM boot:\n+\n+       VHOST_CONFIG: dequeue zero copy is enabled\n+\n+It is essential that step 1 is performed before booting the VM, otherwise\n+the feature will not be enabled.\ndiff --git a/Documentation/topics/dpdk/vhost-user.rst b/Documentation/topics/dpdk/vhost-user.rst\nindex 74ac06e..2636d5a 100644\n--- a/Documentation/topics/dpdk/vhost-user.rst\n+++ b/Documentation/topics/dpdk/vhost-user.rst\n@@ -408,3 +408,61 @@ Sample XML\n     </domain>\n \n .. _QEMU documentation: http://git.qemu-project.org/?p=qemu.git;a=blob;f=docs/specs/vhost-user.txt;h=7890d7169;hb=HEAD\n+\n+vhost-user Dequeue Zero Copy\n+-------------------------------------\n+\n+Normally when dequeuing a packet from a vHost User device, a memcpy operation\n+must be used to copy that packet from guest address space to host address\n+space. This memcpy can be removed by enabling dequeue zero-copy like so:\n+\n+    $ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=true\n+\n+With this feature enabled, a reference (pointer) to the packet is passed to\n+the host, instead of a copy of the packet. Removing this memcpy can give a\n+performance improvement for some use cases, for example switching large packets\n+between different VMs.\n+\n+Note that the feature is disabled by default and must be explicitly enabled\n+by using the command above.\n+\n+The feature cannot be enabled when the device is active (ie. VM booted). If\n+you wish to enable the feature after the VM has booted, you must shutdown\n+the VM and bring it back up.\n+\n+The same logic applies for disabling the feature - it must be disabled when\n+the device is inactive, for example before VM boot. To disable the feature:\n+\n+    $ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=false\n+\n+The feature is available to both dpdkvhostuser and dpdkvhostuserclient port\n+types.\n+\n+A limitation exists whereby if packets from a vHost port with dq-zero-copy=true\n+are destined for a 'dpdk' type port, the number of tx descriptors (n_txq_desc)\n+for that port must be reduced to a smaller number, 128 being the recommended\n+value. This can be achieved by issuing the following command:\n+\n+    $ ovs-vsctl set Interface dpdkport options:n_txq_desc=128\n+\n+More information on the n_txq_desc option can be found in the \"DPDK Physical\n+Port Queue Sizes\" section of the  `intro/install/dpdk.rst` guide.\n+\n+The reason for this limitation is due to how the zero copy functionality is\n+implemented. The vHost device's 'tx used vring', a virtio structure used for\n+tracking used ie. sent descriptors, will only be updated when the NIC frees\n+the corresponding mbuf. If we don't free the mbufs frequently enough, that\n+vring will be starved and packets will no longer be processed. One way to\n+ensure we don't encounter this scenario, is to configure n_txq_desc to a small\n+enough number such that the 'mbuf free threshold' for the NIC will be hit more\n+often and thus free mbufs more frequently. The value of 128 is suggested, but\n+values of 64 and 256 have been tested and verified to work too, with differing\n+performance characteristics.\n+\n+Further information can be found in the\n+`DPDK documentation\n+<http://dpdk.readthedocs.io/en/v17.05/prog_guide/vhost_lib.html>`__\n+\n+Further information can be found in the\n+`DPDK documentation\n+<http://dpdk.readthedocs.io/en/v17.05/prog_guide/vhost_lib.html>`__\ndiff --git a/NEWS b/NEWS\nindex 1325d31..0f6dfa1 100644\n--- a/NEWS\n+++ b/NEWS\n@@ -5,6 +5,9 @@ Post-v2.8.0\n        chassis \"hostname\" in addition to a chassis \"name\".\n    - Linux kernel 4.13\n      * Add support for compiling OVS with the latest Linux 4.13 kernel\n+   - DPDK:\n+     * Optional dequeue zero copy feature for vHost ports enabled per port\n+       via the boolean 'dq-zero-copy' option.\n \n v2.8.0 - 31 Aug 2017\n --------------------\ndiff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c\nindex 4643f6f..db813f3 100644\n--- a/lib/netdev-dpdk.c\n+++ b/lib/netdev-dpdk.c\n@@ -379,6 +379,9 @@ struct netdev_dpdk {\n     /* True if vHost device is 'up' and has been reconfigured at least once */\n     bool vhost_reconfigured;\n \n+    /* True if dq-zero-copy feature has successfully been enabled */\n+    bool dq_zc_enabled;\n+\n     /* Identifier used to distinguish vhost devices from each other. */\n     char vhost_id[PATH_MAX];\n \n@@ -923,6 +926,7 @@ common_construct(struct netdev *netdev, dpdk_port_t port_no,\n     dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);\n     ovsrcu_index_init(&dev->vid, -1);\n     dev->vhost_reconfigured = false;\n+    dev->dq_zc_enabled = false;\n     dev->attached = false;\n \n     ovsrcu_init(&dev->qos_conf, NULL);\n@@ -1431,6 +1435,29 @@ netdev_dpdk_ring_set_config(struct netdev *netdev, const struct smap *args,\n     return 0;\n }\n \n+static void\n+dpdk_vhost_set_config_helper(struct netdev_dpdk *dev,\n+                             const struct smap *args)\n+{\n+    bool needs_reconfigure = false;\n+    bool zc_requested = smap_get_bool(args, \"dq-zero-copy\", false);\n+\n+    if (zc_requested &&\n+            !(dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)) {\n+        dev->vhost_driver_flags |= RTE_VHOST_USER_DEQUEUE_ZERO_COPY;\n+        needs_reconfigure = true;\n+    } else if (!zc_requested &&\n+            (dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)) {\n+        dev->vhost_driver_flags &= ~RTE_VHOST_USER_DEQUEUE_ZERO_COPY;\n+        needs_reconfigure = true;\n+    }\n+\n+    /* Only try to change ZC mode when device is down */\n+    if (needs_reconfigure && (netdev_dpdk_get_vid(dev) == -1)) {\n+        netdev_request_reconfigure(&dev->up);\n+    }\n+}\n+\n static int\n netdev_dpdk_vhost_client_set_config(struct netdev *netdev,\n                                     const struct smap *args,\n@@ -1447,6 +1474,23 @@ netdev_dpdk_vhost_client_set_config(struct netdev *netdev,\n             netdev_request_reconfigure(netdev);\n         }\n     }\n+\n+    dpdk_vhost_set_config_helper(dev, args);\n+\n+    ovs_mutex_unlock(&dev->mutex);\n+\n+    return 0;\n+}\n+\n+static int\n+netdev_dpdk_vhost_set_config(struct netdev *netdev,\n+                             const struct smap *args,\n+                             char **errp OVS_UNUSED)\n+{\n+    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);\n+\n+    ovs_mutex_lock(&dev->mutex);\n+    dpdk_vhost_set_config_helper(dev, args);\n     ovs_mutex_unlock(&dev->mutex);\n \n     return 0;\n@@ -2771,6 +2815,46 @@ netdev_dpdk_txq_map_clear(struct netdev_dpdk *dev)\n     }\n }\n \n+static void\n+vhost_change_zero_copy_mode(struct netdev_dpdk *dev, bool client_mode,\n+                            bool enable)\n+{\n+    int err = rte_vhost_driver_unregister(dev->vhost_id);\n+\n+    if (err) {\n+        VLOG_ERR(\"Error unregistering vHost socket %s; can't change zero copy \"\n+                \"mode\", dev->vhost_id);\n+    } else {\n+        err = dpdk_setup_vhost_device(dev, client_mode);\n+        if (err) {\n+            VLOG_ERR(\"Error changing zero copy mode for vHost socket %s\",\n+                    dev->vhost_id);\n+        } else if (enable) {\n+            dev->dq_zc_enabled = true;\n+            VLOG_INFO(\"Zero copy enabled for vHost socket %s\", dev->vhost_id);\n+        } else {\n+            dev->dq_zc_enabled = false;\n+            VLOG_INFO(\"Zero copy disabled for vHost socket %s\", dev->vhost_id);\n+        }\n+    }\n+}\n+\n+static void\n+vhost_check_zero_copy_status(struct netdev_dpdk *dev)\n+{\n+    bool mode = dev->vhost_driver_flags & RTE_VHOST_USER_CLIENT;\n+\n+    if ((dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)\n+                && !dev->dq_zc_enabled) {\n+        /* ZC disabled but requested to be enabled, enable it. */\n+        vhost_change_zero_copy_mode(dev, mode, true);\n+    } else if (!(dev->vhost_driver_flags &\n+            RTE_VHOST_USER_DEQUEUE_ZERO_COPY) && dev->dq_zc_enabled) {\n+        /* ZC enabled but requested to be disabled, disable it. */\n+        vhost_change_zero_copy_mode(dev, mode, false);\n+    }\n+}\n+\n /*\n  * Remove a virtio-net device from the specific vhost port.  Use dev->remove\n  * flag to stop any more packets from being sent or received to/from a VM and\n@@ -2816,6 +2900,7 @@ destroy_device(int vid)\n          */\n         ovsrcu_quiesce_start();\n         VLOG_INFO(\"vHost Device '%s' has been removed\", ifname);\n+        netdev_request_reconfigure(&dev->up);\n     } else {\n         VLOG_INFO(\"vHost Device '%s' not found\", ifname);\n     }\n@@ -3307,6 +3392,8 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev)\n             /* Carrier status may need updating. */\n             netdev_change_seq_changed(&dev->up);\n         }\n+    } else {\n+        vhost_check_zero_copy_status(dev);\n     }\n \n     return 0;\n@@ -3468,7 +3555,7 @@ static const struct netdev_class dpdk_vhost_class =\n         NULL,\n         netdev_dpdk_vhost_construct,\n         netdev_dpdk_vhost_destruct,\n-        NULL,\n+        netdev_dpdk_vhost_set_config,\n         NULL,\n         netdev_dpdk_vhost_send,\n         netdev_dpdk_vhost_get_carrier,\ndiff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml\nindex d7f6839..55add4c 100644\n--- a/vswitchd/vswitch.xml\n+++ b/vswitchd/vswitch.xml\n@@ -2649,6 +2649,17 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \\\n         </p>\n       </column>\n \n+      <column name=\"options\" key=\"dq-zero-copy\"\n+              type='{\"type\": \"boolean\"}'>\n+        <p>\n+          The value specifies whether or not to enable dequeue zero copy on\n+          the given interface.\n+          The port must be in an inactive state in order to enable or disable\n+          this feature.\n+          Only supported by dpdkvhostuserclient and dpdkvhostuser interfaces.\n+        </p>\n+      </column>\n+\n       <column name=\"options\" key=\"n_rxq_desc\"\n               type='{\"type\": \"integer\", \"minInteger\": 1, \"maxInteger\": 4096}'>\n         <p>\n","prefixes":["ovs-dev","v4","2/2"]}