From patchwork Wed Jan 24 08:30:19 2018
X-Patchwork-Submitter: Ciara Loftus
X-Patchwork-Id: 865245
X-Patchwork-Delegate: ian.stokes@intel.com
From: Ciara Loftus
To: dev@openvswitch.org
Cc: i.maximets@samsung.com
Date: Wed, 24 Jan 2018 08:30:19 +0000
Message-Id: <1516782619-3089-1-git-send-email-ciara.loftus@intel.com>
Subject: [ovs-dev] [PATCH v11] netdev-dpdk: Add support for vHost dequeue zero copy (experimental)

Zero copy is disabled by default. To enable it, set the 'dq-zero-copy' option
to 'true' when configuring the Interface:

    ovs-vsctl set Interface dpdkvhostuserclient0 \
        options:vhost-server-path=/tmp/dpdkvhostuserclient0 \
        options:dq-zero-copy=true

When packets from a vHost device with zero copy enabled are destined for a
single 'dpdk' port, the number of tx descriptors on that 'dpdk' port must be
set to a smaller value. 128 is recommended. This can be achieved like so:

    ovs-vsctl set Interface dpdkport options:n_txq_desc=128

Note: The sum of the tx descriptors of all 'dpdk' ports the VM will send to
should not exceed 128. Due to this requirement, the feature is considered
'experimental'.

Testing of the patch showed a 15% improvement when switching 512B packets
between vHost devices on different VMs on the same host when zero copy
was enabled on the transmitting device.

Signed-off-by: Ciara Loftus
---
v11:
* Rebase
* Fix mutex

 Documentation/intro/install/dpdk.rst     |  2 +
 Documentation/topics/dpdk/vhost-user.rst | 73 ++++++++++++++++++++++++++++++++
 NEWS                                     |  1 +
 lib/netdev-dpdk.c                        | 17 ++++++++
 vswitchd/vswitch.xml                     | 11 +++++
 5 files changed, 104 insertions(+)

diff --git a/Documentation/intro/install/dpdk.rst b/Documentation/intro/install/dpdk.rst
index 040e62e..93411aa 100644
--- a/Documentation/intro/install/dpdk.rst
+++ b/Documentation/intro/install/dpdk.rst
@@ -518,6 +518,8 @@ The above command sets the number of rx queues for DPDK physical interface.
 The rx queues are assigned to pmd threads on the same NUMA node in a
 round-robin fashion.

+.. _dpdk-queues-sizes:
+
 DPDK Physical Port Queue Sizes
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

diff --git a/Documentation/topics/dpdk/vhost-user.rst b/Documentation/topics/dpdk/vhost-user.rst
index 8447e2d..95517a6 100644
--- a/Documentation/topics/dpdk/vhost-user.rst
+++ b/Documentation/topics/dpdk/vhost-user.rst
@@ -458,3 +458,76 @@ Sample XML

 .. _QEMU documentation: http://git.qemu-project.org/?p=qemu.git;a=blob;f=docs/specs/vhost-user.txt;h=7890d7169;hb=HEAD
+
+vhost-user Dequeue Zero Copy (experimental)
+-------------------------------------------
+
+Normally when dequeuing a packet from a vHost User device, a memcpy operation
+must be used to copy that packet from guest address space to host address
+space. This memcpy can be removed by enabling dequeue zero-copy like so::
+
+    $ ovs-vsctl add-port br0 dpdkvhostuserclient0 -- set Interface \
+        dpdkvhostuserclient0 type=dpdkvhostuserclient \
+        options:vhost-server-path=/tmp/dpdkvhostclient0 \
+        options:dq-zero-copy=true
+
+With this feature enabled, a reference (pointer) to the packet is passed to
+the host, instead of a copy of the packet. Removing this memcpy can give a
+performance improvement for some use cases, for example switching large packets
+between different VMs. However additional packet loss may be observed.
+
+Note that the feature is disabled by default and must be explicitly enabled
+by setting the ``dq-zero-copy`` option to ``true`` while specifying the
+``vhost-server-path`` option as above. If you wish to split out the command
+into multiple commands as below, ensure ``dq-zero-copy`` is set before
+``vhost-server-path``::
+
+    $ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=true
+    $ ovs-vsctl set Interface dpdkvhostuserclient0 \
+        options:vhost-server-path=/tmp/dpdkvhostclient0
+
+The feature is only available to ``dpdkvhostuserclient`` port types.
+
+A limitation exists whereby if packets from a vHost port with
+``dq-zero-copy=true`` are destined for a ``dpdk`` type port, the number of tx
+descriptors (``n_txq_desc``) for that port must be reduced to a smaller number,
+128 being the recommended value. This can be achieved by issuing the following
+command::
+
+    $ ovs-vsctl set Interface dpdkport options:n_txq_desc=128
+
+Note: The sum of the tx descriptors of all ``dpdk`` ports the VM will send to
+should not exceed 128. For example, in case of a bond over two physical ports
+in balance-tcp mode, one must divide 128 by the number of links in the bond.
+
+Refer to :ref:`dpdk-queues-sizes` for more information.
+
+The reason for this limitation is due to how the zero copy functionality is
+implemented. The vHost device's 'tx used vring', a virtio structure used for
+tracking used ie. sent descriptors, will only be updated when the NIC frees
+the corresponding mbuf. If we don't free the mbufs frequently enough, that
+vring will be starved and packets will no longer be processed. One way to
+ensure we don't encounter this scenario, is to configure ``n_txq_desc`` to a
+small enough number such that the 'mbuf free threshold' for the NIC will be hit
+more often and thus free mbufs more frequently. The value of 128 is suggested,
+but values of 64 and 256 have been tested and verified to work too, with
+differing performance characteristics. A value of 512 can be used too, if the
+virtio queue size in the guest is increased to 1024 (available to configure in
+QEMU versions v2.10 and greater). This value can be set like so::
+
+    $ qemu-system-x86_64 ... -chardev socket,id=char1,path=,server
+      -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
+      -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,
+      tx_queue_size=1024
+
+Because of this limitation, this feature is considered 'experimental'.
+
+The feature currently does not fully work with QEMU >= v2.7 due to a bug in
+DPDK which will be addressed in an upcoming release. The patch to fix this
+issue can be found on
+`Patchwork
+`__
+
+Further information can be found in the
+`DPDK documentation
+`__
diff --git a/NEWS b/NEWS
index d7c83c2..32fd6a6 100644
--- a/NEWS
+++ b/NEWS
@@ -54,6 +54,7 @@ v2.9.0 - xx xxx xxxx
      * New appctl command 'dpif-netdev/pmd-rxq-rebalance' to rebalance rxq to
        pmd assignments.
      * Add rxq utilization of pmd to appctl 'dpif-netdev/pmd-rxq-show'.
+     * Add support for vHost dequeue zero copy (experimental)
    - Userspace datapath:
      * Output packet batching support.
    - vswitchd:
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index ac2e38e..d598f63 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -1519,6 +1519,12 @@ netdev_dpdk_vhost_client_set_config(struct netdev *netdev,
         path = smap_get(args, "vhost-server-path");
         if (path && strcmp(path, dev->vhost_id)) {
             strcpy(dev->vhost_id, path);
+            /* check zero copy configuration */
+            if (smap_get_bool(args, "dq-zero-copy", false)) {
+                dev->vhost_driver_flags |= RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
+            } else {
+                dev->vhost_driver_flags &= ~RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
+            }
             netdev_request_reconfigure(netdev);
         }
     }
@@ -3569,8 +3575,10 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev)
     struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
     int err;
     uint64_t vhost_flags = 0;
+    bool zc_enabled;

     ovs_mutex_lock(&dev->mutex);
+    zc_enabled = dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY;

     /* Configure vHost client mode if requested and if the following criteria
      * are met:
@@ -3586,6 +3594,12 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev)
         if (dpdk_vhost_iommu_enabled()) {
             vhost_flags |= RTE_VHOST_USER_IOMMU_SUPPORT;
         }
+
+        /* Enable zero copy flag, if requested */
+        if (zc_enabled) {
+            vhost_flags |= RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
+        }
+
         err = rte_vhost_driver_register(dev->vhost_id, vhost_flags);
         if (err) {
             VLOG_ERR("vhost-user device setup failure for device %s\n",
@@ -3597,6 +3611,9 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev)
             VLOG_INFO("vHost User device '%s' created in 'client' mode, "
                       "using client socket '%s'",
                       dev->up.name, dev->vhost_id);
+            if (zc_enabled) {
+                VLOG_INFO("Zero copy enabled for vHost port %s", dev->up.name);
+            }
         }

         err = rte_vhost_driver_callback_register(dev->vhost_id,
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 7e89325..0c6a43d 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -2698,6 +2698,17 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \

+ +

+          The value specifies whether or not to enable dequeue zero copy on
+          the given interface.
+          Must be set before vhost-server-path is specified.
+          Only supported by dpdkvhostuserclient interfaces.
+          The feature is considered experimental.
+

+
+
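
The vhost-user.rst hunk above says the tx descriptors of all 'dpdk' ports the
VM sends to should sum to no more than 128, and that for a bond the 128 must be
divided by the number of links. A minimal sketch of that arithmetic follows.
The helper name per_port_txq_desc and the interface names dpdk0 and dpdk1 are
hypothetical and not part of this patch. Rounding the share down to a power of
two is an assumption made here, on the understanding that OVS expects
power-of-two queue sizes; check the n_txq_desc description in vswitch.xml
before relying on it.

```shell
#!/bin/sh
# Hypothetical helper, not part of OVS: split a total tx descriptor budget
# evenly across the 'dpdk' ports a VM sends to. The share is rounded down to
# a power of two (an assumption about what n_txq_desc accepts).
per_port_txq_desc() {
    budget=$1
    links=$2
    if [ "$links" -lt 1 ]; then
        links=1
    fi
    share=$((budget / links))
    size=1
    # Double 'size' while the next power of two still fits in the share.
    while [ $((size * 2)) -le "$share" ]; do
        size=$((size * 2))
    done
    echo "$size"
}

# Print (rather than run) the commands for a two-link bond. The interface
# names are placeholders for the bond members.
for port in dpdk0 dpdk1; do
    echo "ovs-vsctl set Interface $port options:n_txq_desc=$(per_port_txq_desc 128 2)"
done
```

The commands are only printed so they can be reviewed first. For a three-link
bond the even share of 128 is not a power of two, so this sketch rounds it
down, which keeps the sum under the 128 limit.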