From: Ciara Loftus
To: dev@openvswitch.org
Date: Thu, 2 Nov 2017 09:23:31 +0000
Subject: [ovs-dev] [PATCH v4 2/2] netdev-dpdk: Enable optional dequeue zero copy for vHost User

Dequeue zero copy is enabled per port like so:

    ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=true

The feature is disabled by default and can only be enabled/disabled when
a vHost port is down.

When packets from a vHost device with zero copy enabled are destined for
a 'dpdk' port, the number of tx descriptors on that 'dpdk' port must be
set to a smaller value. 128 is recommended. This can be achieved like
so:

    ovs-vsctl set Interface dpdkport options:n_txq_desc=128

Signed-off-by: Ciara Loftus
---
v4:
* Rebase

 Documentation/howto/dpdk.rst             | 33 ++++++++++++
 Documentation/topics/dpdk/vhost-user.rst | 58 +++++++++++++++++++++
 NEWS                                     |  3 ++
 lib/netdev-dpdk.c                        | 89 +++++++++++++++++++++++++++++++-
 vswitchd/vswitch.xml                     | 11 ++++
 5 files changed, 193 insertions(+), 1 deletion(-)

diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
index d123819..3e1b8f8 100644
--- a/Documentation/howto/dpdk.rst
+++ b/Documentation/howto/dpdk.rst
@@ -709,3 +709,36 @@ devices to bridge ``br0``. Once complete, follow the below steps:
 Check traffic on multiple queues::
 
     $ cat /proc/interrupts | grep virtio
+
+PHY-VM-PHY (vHost Dequeue Zero Copy)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+vHost dequeue zero copy functionality can be validated using the
+PHY-VM-PHY configuration. To begin, follow the steps described in
+:ref:`dpdk-phy-phy` to create and initialize the database, start
+ovs-vswitchd and add ``dpdk``-type and ``dpdkvhostuser``-type devices
+and flows to bridge ``br0``. Once complete, follow the below steps:
+
+1. Enable dequeue zero copy on the vHost devices::
+
+       $ ovs-vsctl set Interface dpdkvhostuser0 options:dq-zero-copy=true
+       $ ovs-vsctl set Interface dpdkvhostuser1 options:dq-zero-copy=true
+
+The following log should be observed for each device::
+
+    netdev_dpdk|INFO|Zero copy enabled for vHost socket
+
+2. Reduce the number of txq descriptors on the phy ports::
+
+       $ ovs-vsctl set Interface phy0 options:n_txq_desc=128
+       $ ovs-vsctl set Interface phy1 options:n_txq_desc=128
+
+3. Proceed with the test by launching the VM and configuring guest
+   forwarding, be it via the vHost loopback method or kernel forwarding
+   method, and sending traffic. The following log should be observed for
+   each device as it becomes active during VM boot::
+
+       VHOST_CONFIG: dequeue zero copy is enabled
+
+It is essential that step 1 is performed before booting the VM; otherwise
+the feature will not be enabled.
diff --git a/Documentation/topics/dpdk/vhost-user.rst b/Documentation/topics/dpdk/vhost-user.rst
index 74ac06e..2636d5a 100644
--- a/Documentation/topics/dpdk/vhost-user.rst
+++ b/Documentation/topics/dpdk/vhost-user.rst
@@ -408,3 +408,61 @@ Sample XML
 
 .. _QEMU documentation: http://git.qemu-project.org/?p=qemu.git;a=blob;f=docs/specs/vhost-user.txt;h=7890d7169;hb=HEAD
+
+vhost-user Dequeue Zero Copy
+-------------------------------------
+
+Normally when dequeuing a packet from a vHost User device, a memcpy operation
+must be used to copy that packet from guest address space to host address
+space. This memcpy can be removed by enabling dequeue zero-copy like so::
+
+    $ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=true
+
+With this feature enabled, a reference (pointer) to the packet is passed to
+the host instead of a copy of the packet. Removing this memcpy can give a
+performance improvement for some use cases, for example switching large packets
+between different VMs.
+
+Note that the feature is disabled by default and must be explicitly enabled
+by using the command above.
+
+The feature cannot be enabled while the device is active (i.e. VM booted). If
+you wish to enable the feature after the VM has booted, you must shut down
+the VM and bring it back up.
+
+The same logic applies to disabling the feature: it must be disabled while
+the device is inactive, for example before VM boot. To disable the feature::
+
+    $ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=false
+
+The feature is available to both dpdkvhostuser and dpdkvhostuserclient port
+types.
+
+A limitation exists whereby if packets from a vHost port with dq-zero-copy=true
+are destined for a 'dpdk' type port, the number of tx descriptors (n_txq_desc)
+for that port must be reduced to a smaller number, 128 being the recommended
+value. This can be achieved by issuing the following command::
+
+    $ ovs-vsctl set Interface dpdkport options:n_txq_desc=128
+
+More information on the n_txq_desc option can be found in the "DPDK Physical
+Port Queue Sizes" section of the `intro/install/dpdk.rst` guide.
+
+This limitation is due to how the zero copy functionality is
+implemented. The vHost device's 'tx used vring', a virtio structure used for
+tracking used (i.e. sent) descriptors, will only be updated when the NIC frees
+the corresponding mbuf. If we don't free the mbufs frequently enough, that
+vring will be starved and packets will no longer be processed. One way to
+ensure we don't encounter this scenario is to configure n_txq_desc to a small
+enough number such that the 'mbuf free threshold' for the NIC will be hit more
+often and thus free mbufs more frequently. The value of 128 is suggested, but
+values of 64 and 256 have been tested and verified to work too, with differing
+performance characteristics.
+
+Further information can be found in the
+`DPDK documentation
+`__
+
+Further information can be found in the
+`DPDK documentation
+`__
diff --git a/NEWS b/NEWS
index 1325d31..0f6dfa1 100644
--- a/NEWS
+++ b/NEWS
@@ -5,6 +5,9 @@ Post-v2.8.0
        chassis "hostname" in addition to a chassis "name".
    - Linux kernel 4.13
      * Add support for compiling OVS with the latest Linux 4.13 kernel
+   - DPDK:
+     * Optional dequeue zero copy feature for vHost ports enabled per port
+       via the boolean 'dq-zero-copy' option.
 
 v2.8.0 - 31 Aug 2017
 --------------------
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 4643f6f..db813f3 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -379,6 +379,9 @@ struct netdev_dpdk {
     /* True if vHost device is 'up' and has been reconfigured at least once */
     bool vhost_reconfigured;
 
+    /* True if dq-zero-copy feature has successfully been enabled */
+    bool dq_zc_enabled;
+
     /* Identifier used to distinguish vhost devices from each other. */
     char vhost_id[PATH_MAX];
 
@@ -923,6 +926,7 @@ common_construct(struct netdev *netdev, dpdk_port_t port_no,
     dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
     ovsrcu_index_init(&dev->vid, -1);
     dev->vhost_reconfigured = false;
+    dev->dq_zc_enabled = false;
     dev->attached = false;
 
     ovsrcu_init(&dev->qos_conf, NULL);
@@ -1431,6 +1435,29 @@ netdev_dpdk_ring_set_config(struct netdev *netdev, const struct smap *args,
     return 0;
 }
 
+static void
+dpdk_vhost_set_config_helper(struct netdev_dpdk *dev,
+                             const struct smap *args)
+{
+    bool needs_reconfigure = false;
+    bool zc_requested = smap_get_bool(args, "dq-zero-copy", false);
+
+    if (zc_requested &&
+            !(dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)) {
+        dev->vhost_driver_flags |= RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
+        needs_reconfigure = true;
+    } else if (!zc_requested &&
+            (dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)) {
+        dev->vhost_driver_flags &= ~RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
+        needs_reconfigure = true;
+    }
+
+    /* Only try to change ZC mode when device is down */
+    if (needs_reconfigure && (netdev_dpdk_get_vid(dev) == -1)) {
+        netdev_request_reconfigure(&dev->up);
+    }
+}
+
 static int
 netdev_dpdk_vhost_client_set_config(struct netdev *netdev,
                                     const struct smap *args,
@@ -1447,6 +1474,23 @@ netdev_dpdk_vhost_client_set_config(struct netdev *netdev,
             netdev_request_reconfigure(netdev);
         }
     }
+
+    dpdk_vhost_set_config_helper(dev, args);
+
+    ovs_mutex_unlock(&dev->mutex);
+
+    return 0;
+}
+
+static int
+netdev_dpdk_vhost_set_config(struct netdev *netdev,
+                             const struct smap *args,
+                             char **errp OVS_UNUSED)
+{
+    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+
+    ovs_mutex_lock(&dev->mutex);
+    dpdk_vhost_set_config_helper(dev, args);
     ovs_mutex_unlock(&dev->mutex);
 
     return 0;
@@ -2771,6 +2815,46 @@ netdev_dpdk_txq_map_clear(struct netdev_dpdk *dev)
     }
 }
 
+static void
+vhost_change_zero_copy_mode(struct netdev_dpdk *dev, bool client_mode,
+                            bool enable)
+{
+    int err = rte_vhost_driver_unregister(dev->vhost_id);
+
+    if (err) {
+        VLOG_ERR("Error unregistering vHost socket %s; can't change zero copy "
+                 "mode", dev->vhost_id);
+    } else {
+        err = dpdk_setup_vhost_device(dev, client_mode);
+        if (err) {
+            VLOG_ERR("Error changing zero copy mode for vHost socket %s",
+                     dev->vhost_id);
+        } else if (enable) {
+            dev->dq_zc_enabled = true;
+            VLOG_INFO("Zero copy enabled for vHost socket %s", dev->vhost_id);
+        } else {
+            dev->dq_zc_enabled = false;
+            VLOG_INFO("Zero copy disabled for vHost socket %s", dev->vhost_id);
+        }
+    }
+}
+
+static void
+vhost_check_zero_copy_status(struct netdev_dpdk *dev)
+{
+    bool mode = dev->vhost_driver_flags & RTE_VHOST_USER_CLIENT;
+
+    if ((dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)
+                && !dev->dq_zc_enabled) {
+        /* ZC disabled but requested to be enabled, enable it. */
+        vhost_change_zero_copy_mode(dev, mode, true);
+    } else if (!(dev->vhost_driver_flags &
+            RTE_VHOST_USER_DEQUEUE_ZERO_COPY) && dev->dq_zc_enabled) {
+        /* ZC enabled but requested to be disabled, disable it. */
+        vhost_change_zero_copy_mode(dev, mode, false);
+    }
+}
+
 /*
  * Remove a virtio-net device from the specific vhost port. Use dev->remove
  * flag to stop any more packets from being sent or received to/from a VM and
@@ -2816,6 +2900,7 @@ destroy_device(int vid)
          */
         ovsrcu_quiesce_start();
         VLOG_INFO("vHost Device '%s' has been removed", ifname);
+        netdev_request_reconfigure(&dev->up);
     } else {
         VLOG_INFO("vHost Device '%s' not found", ifname);
     }
@@ -3307,6 +3392,8 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev)
             /* Carrier status may need updating. */
             netdev_change_seq_changed(&dev->up);
         }
+    } else {
+        vhost_check_zero_copy_status(dev);
     }
 
     return 0;
@@ -3468,7 +3555,7 @@ static const struct netdev_class dpdk_vhost_class =
         NULL,
         netdev_dpdk_vhost_construct,
         netdev_dpdk_vhost_destruct,
-        NULL,
+        netdev_dpdk_vhost_set_config,
         NULL,
         netdev_dpdk_vhost_send,
         netdev_dpdk_vhost_get_carrier,
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index d7f6839..55add4c 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -2649,6 +2649,17 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \

+ +

+          The value specifies whether or not to enable dequeue zero copy on
+          the given interface.
+          The port must be in an inactive state in order to enable or disable
+          this feature.
+          Only supported by dpdkvhostuserclient and dpdkvhostuser interfaces.

+
+
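The n_txq_desc limitation described in vhost-user.rst can be reasoned about with a deliberately simplified model. It assumes, rather than reproduces DPDK's exact driver logic, that the NIC driver reclaims transmitted mbufs only once fewer than free_thresh TX descriptors remain free. It also assumes that with zero copy each mbuf sitting on the NIC ring pins one descriptor of a guest vring holding vring_size entries. If the guest can run out of vring descriptors before the reclaim point is reached, no mbuf is ever freed and the vring starves. The function zc_starvation_possible() and the values 32 and 256 used below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified, hypothetical model: the driver reclaims mbufs only when fewer
 * than free_thresh of the n_txq_desc TX descriptors are free, i.e. once more
 * than (n_txq_desc - free_thresh) descriptors are in use. With dequeue zero
 * copy, every in-flight mbuf pins one guest descriptor, so the guest can
 * keep at most vring_size packets in flight. */
static bool
zc_starvation_possible(int n_txq_desc, int free_thresh, int vring_size)
{
    int max_unreclaimed;

    assert(n_txq_desc > 0 && free_thresh >= 0 && vring_size > 0);
    /* Largest number of in-use descriptors that never triggers a reclaim. */
    max_unreclaimed = n_txq_desc - free_thresh;
    /* If the guest cannot exceed that count, reclaim never happens. */
    return vring_size <= max_unreclaimed;
}
```

Under these assumed values, zc_starvation_possible(2048, 32, 256) is true, whereas rings of 64, 128 and 256 descriptors all leave the reclaim point within reach of a 256-entry vring. This is consistent with the values the documentation reports as working.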