Message ID | 1517395494-13904-1-git-send-email-ciara.loftus@intel.com |
---|---|
State | Accepted |
Delegated to: | Ian Stokes |
Headers | show |
Series | [ovs-dev,v13] netdev-dpdk: Add support for vHost dequeue zero copy (experimental) | expand |
> -----Original Message----- > From: Loftus, Ciara > Sent: Wednesday, January 31, 2018 10:45 AM > To: dev@openvswitch.org > Cc: Loftus, Ciara <ciara.loftus@intel.com>; Stokes, Ian > <ian.stokes@intel.com> > Subject: [PATCH v13] netdev-dpdk: Add support for vHost dequeue zero copy > (experimental) > > Zero copy is disabled by default. To enable it, set the 'dq-zero-copy' > option to 'true' when configuring the Interface: > > ovs-vsctl set Interface dpdkvhostuserclient0 > options:vhost-server-path=/tmp/dpdkvhostuserclient0 > options:dq-zero-copy=true > > When packets from a vHost device with zero copy enabled are destined for a > single 'dpdk' port, the number of tx descriptors on that 'dpdk' port must > be set to a smaller value. 128 is recommended. This can be achieved like > so: > > ovs-vsctl set Interface dpdkport options:n_txq_desc=128 > > Note: The sum of the tx descriptors of all 'dpdk' ports the VM will send > to should not exceed 128. Due to this requirement, the feature is > considered 'experimental'. > > Testing of the patch showed a ~8% improvement when switching 512B packets > between vHost devices on different VMs on the same host when zero copy was > enabled on the transmitting device. > > Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> > Acked-by: Ilya Maximets <i.maximets@samsung.com> Thanks all for the reviews and work on this. I pushed this to DPDK_MERGE and DPDK_MERGE_2_9. Thanks Ian > --- > v13: > * Rebase > * Update commit message with recent performance measurement > > Documentation/intro/install/dpdk.rst | 2 + > Documentation/topics/dpdk/vhost-user.rst | 73 > ++++++++++++++++++++++++++++++++ > NEWS | 1 + > lib/netdev-dpdk.c | 18 ++++++++ > vswitchd/vswitch.xml | 11 +++++ > 5 files changed, 105 insertions(+) > > diff --git a/Documentation/intro/install/dpdk.rst > b/Documentation/intro/install/dpdk.rst > index 8d5ee56..ed358d5 100644 > --- a/Documentation/intro/install/dpdk.rst > +++ b/Documentation/intro/install/dpdk.rst > @@ -518,6 +518,8 @@ The above command sets the number of rx queues for > DPDK physical interface. > The rx queues are assigned to pmd threads on the same NUMA node in a > round-robin fashion. > > +.. _dpdk-queues-sizes: > + > DPDK Physical Port Queue Sizes > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > diff --git a/Documentation/topics/dpdk/vhost-user.rst > b/Documentation/topics/dpdk/vhost-user.rst > index 8447e2d..95517a6 100644 > --- a/Documentation/topics/dpdk/vhost-user.rst > +++ b/Documentation/topics/dpdk/vhost-user.rst > @@ -458,3 +458,76 @@ Sample XML > </domain> > > .. _QEMU documentation: http://git.qemu- > project.org/?p=qemu.git;a=blob;f=docs/specs/vhost- > user.txt;h=7890d7169;hb=HEAD > + > +vhost-user Dequeue Zero Copy (experimental) > +------------------------------------------- > + > +Normally when dequeuing a packet from a vHost User device, a memcpy > +operation must be used to copy that packet from guest address space to > +host address space. This memcpy can be removed by enabling dequeue zero- > copy like so:: > + > + $ ovs-vsctl add-port br0 dpdkvhostuserclient0 -- set Interface \ > + dpdkvhostuserclient0 type=dpdkvhostuserclient \ > + options:vhost-server-path=/tmp/dpdkvhostclient0 \ > + options:dq-zero-copy=true > + > +With this feature enabled, a reference (pointer) to the packet is > +passed to the host, instead of a copy of the packet. Removing this > +memcpy can give a performance improvement for some use cases, for > +example switching large packets between different VMs. However additional > packet loss may be observed. > + > +Note that the feature is disabled by default and must be explicitly > +enabled by setting the ``dq-zero-copy`` option to ``true`` while > +specifying the ``vhost-server-path`` option as above. If you wish to > +split out the command into multiple commands as below, ensure > +``dq-zero-copy`` is set before > +``vhost-server-path``:: > + > + $ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero- > copy=true > + $ ovs-vsctl set Interface dpdkvhostuserclient0 \ > + options:vhost-server-path=/tmp/dpdkvhostclient0 > + > +The feature is only available to ``dpdkvhostuserclient`` port types. > + > +A limitation exists whereby if packets from a vHost port with > +``dq-zero-copy=true`` are destined for a ``dpdk`` type port, the number > +of tx descriptors (``n_txq_desc``) for that port must be reduced to a > +smaller number, > +128 being the recommended value. This can be achieved by issuing the > +following > +command:: > + > + $ ovs-vsctl set Interface dpdkport options:n_txq_desc=128 > + > +Note: The sum of the tx descriptors of all ``dpdk`` ports the VM will > +send to should not exceed 128. For example, in case of a bond over two > +physical ports in balance-tcp mode, one must divide 128 by the number of > links in the bond. > + > +Refer to :ref:`dpdk-queues-sizes` for more information. > + > +The reason for this limitation is due to how the zero copy > +functionality is implemented. The vHost device's 'tx used vring', a > +virtio structure used for tracking used ie. sent descriptors, will only > +be updated when the NIC frees the corresponding mbuf. If we don't free > +the mbufs frequently enough, that vring will be starved and packets > +will no longer be processed. One way to ensure we don't encounter this > +scenario, is to configure ``n_txq_desc`` to a small enough number such > +that the 'mbuf free threshold' for the NIC will be hit more often and > +thus free mbufs more frequently. The value of 128 is suggested, but > +values of 64 and 256 have been tested and verified to work too, with > +differing performance characteristics. A value of 512 can be used too, > +if the virtio queue size in the guest is increased to 1024 (available to > configure in QEMU versions v2.10 and greater). This value can be set like > so:: > + > + $ qemu-system-x86_64 ... -chardev > socket,id=char1,path=<sockpath>,server > + -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce > + -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1, > + tx_queue_size=1024 > + > +Because of this limitation, this feature is considered 'experimental'. > + > +The feature currently does not fully work with QEMU >= v2.7 due to a > +bug in DPDK which will be addressed in an upcoming release. The patch > +to fix this issue can be found on `Patchwork > +<http://dpdk.org/dev/patchwork/patch/32198/>`__ > + > +Further information can be found in the `DPDK documentation > +<http://dpdk.readthedocs.io/en/v17.05/prog_guide/vhost_lib.html>`__ > diff --git a/NEWS b/NEWS > index 726589c..2023b25 100644 > --- a/NEWS > +++ b/NEWS > @@ -56,6 +56,7 @@ v2.9.0 - xx xxx xxxx > * New appctl command 'dpif-netdev/pmd-rxq-rebalance' to rebalance > rxq to > pmd assignments. > * Add rxq utilization of pmd to appctl 'dpif-netdev/pmd-rxq-show'. > + * Add support for vHost dequeue zero copy (experimental) > - Userspace datapath: > * Output packet batching support. > - vswitchd: > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index d41f5aa..94fb163 > 100644 > --- a/lib/netdev-dpdk.c > +++ b/lib/netdev-dpdk.c > @@ -1564,6 +1564,12 @@ netdev_dpdk_vhost_client_set_config(struct netdev > *netdev, > path = smap_get(args, "vhost-server-path"); > if (path && strcmp(path, dev->vhost_id)) { > strcpy(dev->vhost_id, path); > + /* check zero copy configuration */ > + if (smap_get_bool(args, "dq-zero-copy", false)) { > + dev->vhost_driver_flags |= > RTE_VHOST_USER_DEQUEUE_ZERO_COPY; > + } else { > + dev->vhost_driver_flags &= > ~RTE_VHOST_USER_DEQUEUE_ZERO_COPY; > + } > netdev_request_reconfigure(netdev); > } > } > @@ -3627,6 +3633,7 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev > *netdev) > struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); > int err; > uint64_t vhost_flags = 0; > + bool zc_enabled; > > ovs_mutex_lock(&dev->mutex); > > @@ -3644,6 +3651,14 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev > *netdev) > if (dpdk_vhost_iommu_enabled()) { > vhost_flags |= RTE_VHOST_USER_IOMMU_SUPPORT; > } > + > + zc_enabled = dev->vhost_driver_flags > + & RTE_VHOST_USER_DEQUEUE_ZERO_COPY; > + /* Enable zero copy flag, if requested */ > + if (zc_enabled) { > + vhost_flags |= RTE_VHOST_USER_DEQUEUE_ZERO_COPY; > + } > + > err = rte_vhost_driver_register(dev->vhost_id, vhost_flags); > if (err) { > VLOG_ERR("vhost-user device setup failure for device %s\n", > @@ -3655,6 +3670,9 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev > *netdev) > VLOG_INFO("vHost User device '%s' created in 'client' mode, " > "using client socket '%s'", > dev->up.name, dev->vhost_id); > + if (zc_enabled) { > + VLOG_INFO("Zero copy enabled for vHost port %s", dev- > >up.name); > + } > } > > err = rte_vhost_driver_callback_register(dev->vhost_id, > diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index > 7e89325..0c6a43d 100644 > --- a/vswitchd/vswitch.xml > +++ b/vswitchd/vswitch.xml > @@ -2698,6 +2698,17 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 > type=patch options:peer=p1 \ > </p> > </column> > > + <column name="options" key="dq-zero-copy" > + type='{"type": "boolean"}'> > + <p> > + The value specifies whether or not to enable dequeue zero copy > on > + the given interface. > + Must be set before vhost-server-path is specified. > + Only supported by dpdkvhostuserclient interfaces. > + The feature is considered experimental. > + </p> > + </column> > + > <column name="options" key="n_rxq_desc" > type='{"type": "integer", "minInteger": 1, "maxInteger": > 4096}'> > <p> > -- > 2.7.5
diff --git a/Documentation/intro/install/dpdk.rst b/Documentation/intro/install/dpdk.rst index 8d5ee56..ed358d5 100644 --- a/Documentation/intro/install/dpdk.rst +++ b/Documentation/intro/install/dpdk.rst @@ -518,6 +518,8 @@ The above command sets the number of rx queues for DPDK physical interface. The rx queues are assigned to pmd threads on the same NUMA node in a round-robin fashion. +.. _dpdk-queues-sizes: + DPDK Physical Port Queue Sizes ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/Documentation/topics/dpdk/vhost-user.rst b/Documentation/topics/dpdk/vhost-user.rst index 8447e2d..95517a6 100644 --- a/Documentation/topics/dpdk/vhost-user.rst +++ b/Documentation/topics/dpdk/vhost-user.rst @@ -458,3 +458,76 @@ Sample XML </domain> .. _QEMU documentation: http://git.qemu-project.org/?p=qemu.git;a=blob;f=docs/specs/vhost-user.txt;h=7890d7169;hb=HEAD + +vhost-user Dequeue Zero Copy (experimental) +------------------------------------------- + +Normally when dequeuing a packet from a vHost User device, a memcpy operation +must be used to copy that packet from guest address space to host address +space. This memcpy can be removed by enabling dequeue zero-copy like so:: + + $ ovs-vsctl add-port br0 dpdkvhostuserclient0 -- set Interface \ + dpdkvhostuserclient0 type=dpdkvhostuserclient \ + options:vhost-server-path=/tmp/dpdkvhostclient0 \ + options:dq-zero-copy=true + +With this feature enabled, a reference (pointer) to the packet is passed to +the host, instead of a copy of the packet. Removing this memcpy can give a +performance improvement for some use cases, for example switching large packets +between different VMs. However additional packet loss may be observed. + +Note that the feature is disabled by default and must be explicitly enabled +by setting the ``dq-zero-copy`` option to ``true`` while specifying the +``vhost-server-path`` option as above. If you wish to split out the command +into multiple commands as below, ensure ``dq-zero-copy`` is set before +``vhost-server-path``:: + + $ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=true + $ ovs-vsctl set Interface dpdkvhostuserclient0 \ + options:vhost-server-path=/tmp/dpdkvhostclient0 + +The feature is only available to ``dpdkvhostuserclient`` port types. + +A limitation exists whereby if packets from a vHost port with +``dq-zero-copy=true`` are destined for a ``dpdk`` type port, the number of tx +descriptors (``n_txq_desc``) for that port must be reduced to a smaller number, +128 being the recommended value. This can be achieved by issuing the following +command:: + + $ ovs-vsctl set Interface dpdkport options:n_txq_desc=128 + +Note: The sum of the tx descriptors of all ``dpdk`` ports the VM will send to +should not exceed 128. For example, in case of a bond over two physical ports +in balance-tcp mode, one must divide 128 by the number of links in the bond. + +Refer to :ref:`dpdk-queues-sizes` for more information. + +The reason for this limitation is due to how the zero copy functionality is +implemented. The vHost device's 'tx used vring', a virtio structure used for +tracking used ie. sent descriptors, will only be updated when the NIC frees +the corresponding mbuf. If we don't free the mbufs frequently enough, that +vring will be starved and packets will no longer be processed. One way to +ensure we don't encounter this scenario, is to configure ``n_txq_desc`` to a +small enough number such that the 'mbuf free threshold' for the NIC will be hit +more often and thus free mbufs more frequently. The value of 128 is suggested, +but values of 64 and 256 have been tested and verified to work too, with +differing performance characteristics. A value of 512 can be used too, if the +virtio queue size in the guest is increased to 1024 (available to configure in +QEMU versions v2.10 and greater). This value can be set like so:: + + $ qemu-system-x86_64 ... -chardev socket,id=char1,path=<sockpath>,server + -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce + -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1, + tx_queue_size=1024 + +Because of this limitation, this feature is considered 'experimental'. + +The feature currently does not fully work with QEMU >= v2.7 due to a bug in +DPDK which will be addressed in an upcoming release. The patch to fix this +issue can be found on +`Patchwork +<http://dpdk.org/dev/patchwork/patch/32198/>`__ + +Further information can be found in the +`DPDK documentation +<http://dpdk.readthedocs.io/en/v17.05/prog_guide/vhost_lib.html>`__ diff --git a/NEWS b/NEWS index 726589c..2023b25 100644 --- a/NEWS +++ b/NEWS @@ -56,6 +56,7 @@ v2.9.0 - xx xxx xxxx * New appctl command 'dpif-netdev/pmd-rxq-rebalance' to rebalance rxq to pmd assignments. * Add rxq utilization of pmd to appctl 'dpif-netdev/pmd-rxq-show'. + * Add support for vHost dequeue zero copy (experimental) - Userspace datapath: * Output packet batching support. - vswitchd: diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index d41f5aa..94fb163 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -1564,6 +1564,12 @@ netdev_dpdk_vhost_client_set_config(struct netdev *netdev, path = smap_get(args, "vhost-server-path"); if (path && strcmp(path, dev->vhost_id)) { strcpy(dev->vhost_id, path); + /* check zero copy configuration */ + if (smap_get_bool(args, "dq-zero-copy", false)) { + dev->vhost_driver_flags |= RTE_VHOST_USER_DEQUEUE_ZERO_COPY; + } else { + dev->vhost_driver_flags &= ~RTE_VHOST_USER_DEQUEUE_ZERO_COPY; + } netdev_request_reconfigure(netdev); } } @@ -3627,6 +3633,7 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev) struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); int err; uint64_t vhost_flags = 0; + bool zc_enabled; ovs_mutex_lock(&dev->mutex); @@ -3644,6 +3651,14 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev) if (dpdk_vhost_iommu_enabled()) { vhost_flags |= RTE_VHOST_USER_IOMMU_SUPPORT; } + + zc_enabled = dev->vhost_driver_flags + & RTE_VHOST_USER_DEQUEUE_ZERO_COPY; + /* Enable zero copy flag, if requested */ + if (zc_enabled) { + vhost_flags |= RTE_VHOST_USER_DEQUEUE_ZERO_COPY; + } + err = rte_vhost_driver_register(dev->vhost_id, vhost_flags); if (err) { VLOG_ERR("vhost-user device setup failure for device %s\n", @@ -3655,6 +3670,9 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev) VLOG_INFO("vHost User device '%s' created in 'client' mode, " "using client socket '%s'", dev->up.name, dev->vhost_id); + if (zc_enabled) { + VLOG_INFO("Zero copy enabled for vHost port %s", dev->up.name); + } } err = rte_vhost_driver_callback_register(dev->vhost_id, diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index 7e89325..0c6a43d 100644 --- a/vswitchd/vswitch.xml +++ b/vswitchd/vswitch.xml @@ -2698,6 +2698,17 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \ </p> </column> + <column name="options" key="dq-zero-copy" + type='{"type": "boolean"}'> + <p> + The value specifies whether or not to enable dequeue zero copy on + the given interface. + Must be set before vhost-server-path is specified. + Only supported by dpdkvhostuserclient interfaces. + The feature is considered experimental. + </p> + </column> + <column name="options" key="n_rxq_desc" type='{"type": "integer", "minInteger": 1, "maxInteger": 4096}'> <p>