Message ID: 1513701857-8137-1-git-send-email-ciara.loftus@intel.com
State: Superseded
Delegated to: Ian Stokes
Series: [ovs-dev,v7] netdev-dpdk: Add support for vHost dequeue zero copy (experimental)
On 19.12.2017 19:44, Ciara Loftus wrote:
> Enabled per port like so:
> ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=true
>
> If the device is already active at the point of enabling/disabling the
> feature via ovs-vsctl, one must reload the guest driver or reboot the VM
> in order for the feature to be successfully enabled/disabled.
>
> When packets from a vHost device with zero copy enabled are destined for
> a single 'dpdk' port, the number of tx descriptors on that 'dpdk' port
> must be set to a smaller value. 128 is recommended. This can be achieved
> like so:
>
> ovs-vsctl set Interface dpdkport options:n_txq_desc=128
>
> Note: The sum of the tx descriptors of all 'dpdk' ports the VM will send
> to should not exceed 128. Due to this requirement, the feature is
> considered 'experimental'.
>
> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
> ---
> v7:
> * Remove support for zc for dpdkvhostuser ports & patch 1 in the series.
> * Remove entry from howto Documentation as the document in its current
>   state is only relevant to dpdkvhostuser ports which do not support the
>   feature.
> * Use ref: link to queue size documentation.
> * Elaborate on details of NIC descriptors limitation eg. cases where
>   packets are destined to >1 NICs.
> * Fix comment about padding in netdev-dpdk struct.
>
>  Documentation/intro/install/dpdk.rst     |  2 +
>  Documentation/topics/dpdk/vhost-user.rst | 71 +++++++++++++++++++++++
>  NEWS                                     |  1 +
>  lib/netdev-dpdk.c                        | 97 ++++++++++++++++++++++++++++----
>  vswitchd/vswitch.xml                     | 13 +++++
>  5 files changed, 174 insertions(+), 10 deletions(-)
>
> diff --git a/Documentation/intro/install/dpdk.rst b/Documentation/intro/install/dpdk.rst
> index 3fecb5c..087eb88 100644
> --- a/Documentation/intro/install/dpdk.rst
> +++ b/Documentation/intro/install/dpdk.rst
> @@ -518,6 +518,8 @@ The above command sets the number of rx queues for DPDK physical interface.
>  The rx queues are assigned to pmd threads on the same NUMA node in a
>  round-robin fashion.
>
> +.. _dpdk-queues-sizes:
> +
>  DPDK Physical Port Queue Sizes
>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> diff --git a/Documentation/topics/dpdk/vhost-user.rst b/Documentation/topics/dpdk/vhost-user.rst
> index 8447e2d..d82ca40 100644
> --- a/Documentation/topics/dpdk/vhost-user.rst
> +++ b/Documentation/topics/dpdk/vhost-user.rst
> @@ -458,3 +458,74 @@ Sample XML
>  </domain>
>
>  .. _QEMU documentation: http://git.qemu-project.org/?p=qemu.git;a=blob;f=docs/specs/vhost-user.txt;h=7890d7169;hb=HEAD
> +
> +vhost-user Dequeue Zero Copy (experimental)
> +-------------------------------------------
> +
> +Normally when dequeuing a packet from a vHost User device, a memcpy operation
> +must be used to copy that packet from guest address space to host address
> +space. This memcpy can be removed by enabling dequeue zero-copy like so::
> +
> +    $ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=true
> +
> +With this feature enabled, a reference (pointer) to the packet is passed to
> +the host, instead of a copy of the packet. Removing this memcpy can give a
> +performance improvement for some use cases, for example switching large packets
> +between different VMs. However, additional packet loss may be observed.
> +
> +Note that the feature is disabled by default and must be explicitly enabled
> +by using the command above.
> +
> +The feature cannot be enabled when the device is active. If the device is
> +active when enabling the feature via ovs-vsctl, a driver reload or reboot
> +is required for the changes to take effect.
> +
> +The same logic applies for disabling the feature - it must be disabled when
> +the device is inactive. To disable the feature::
> +
> +    $ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=false
> +
> +The feature is only available to dpdkvhostuserclient port types.
> +
> +A limitation exists whereby if packets from a vHost port with dq-zero-copy=true
> +are destined for a 'dpdk' type port, the number of tx descriptors (n_txq_desc)
> +for that port must be reduced to a smaller number, 128 being the recommended
> +value. This can be achieved by issuing the following command::
> +
> +    $ ovs-vsctl set Interface dpdkport options:n_txq_desc=128
> +
> +Note: The sum of the tx descriptors of all 'dpdk' ports the VM will send to
> +should not exceed 128. For example, in case of a bond over two physical ports
> +in balance-tcp mode, one must divide 128 by the number of links in the bond.
> +
> +Refer to :ref:`dpdk-queues-sizes` for more information.
> +
> +The reason for this limitation is due to how the zero copy functionality is
> +implemented. The vHost device's 'tx used vring', a virtio structure used for
> +tracking used, i.e. sent, descriptors, will only be updated when the NIC frees
> +the corresponding mbuf. If we don't free the mbufs frequently enough, that
> +vring will be starved and packets will no longer be processed. One way to
> +ensure we don't encounter this scenario is to configure n_txq_desc to a small
> +enough number such that the 'mbuf free threshold' for the NIC will be hit more
> +often and thus free mbufs more frequently. The value of 128 is suggested, but
> +values of 64 and 256 have been tested and verified to work too, with differing
> +performance characteristics. A value of 512 can be used too, if the virtio
> +queue size in the guest is increased to 1024 (available to configure in QEMU
> +versions v2.10 and greater). This value can be set like so::
> +
> +    $ qemu-system-x86_64 ... -chardev socket,id=char1,path=<sockpath>,server
> +      -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
> +      -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,
> +      tx_queue_size=1024
> +
> +Because of this limitation, this feature is considered 'experimental'.
> +
> +The feature currently does not fully work with QEMU >= v2.7 due to a bug in
> +DPDK which will be addressed in an upcoming release. The patch to fix this
> +issue can be found on
> +`Patchwork
> +<http://dpdk.org/dev/patchwork/patch/32198/>`__
> +
> +Further information can be found in the
> +`DPDK documentation
> +<http://dpdk.readthedocs.io/en/v17.05/prog_guide/vhost_lib.html>`__
> diff --git a/NEWS b/NEWS
> index 752e98f..1eaa32c 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -24,6 +24,7 @@ Post-v2.8.0
>     - DPDK:
>       * Add support for DPDK v17.11
>       * Add support for vHost IOMMU
> +     * Add support for vHost dequeue zero copy (experimental)
>
>  v2.8.0 - 31 Aug 2017
>  --------------------
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 8f22264..4b82a06 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -378,7 +378,10 @@ struct netdev_dpdk {
>
>      /* True if vHost device is 'up' and has been reconfigured at least once */
>      bool vhost_reconfigured;
> -    /* 3 pad bytes here. */
> +
> +    /* True if dq-zero-copy feature has successfully been enabled */
> +    bool dq_zc_enabled;
> +    /* 2 pad bytes here. */
>  );
>
>  PADDED_MEMBERS(CACHE_LINE_SIZE,
> @@ -889,6 +892,7 @@ common_construct(struct netdev *netdev, dpdk_port_t port_no,
>      dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
>      ovsrcu_index_init(&dev->vid, -1);
>      dev->vhost_reconfigured = false;
> +    dev->dq_zc_enabled = false;
>      dev->attached = false;
>
>      ovsrcu_init(&dev->qos_conf, NULL);
> @@ -1371,6 +1375,27 @@ netdev_dpdk_ring_set_config(struct netdev *netdev, const struct smap *args,
>      return 0;
>  }
>
> +static bool
> +dpdk_vhost_check_zero_copy_config(struct netdev_dpdk *dev,
> +                                  const struct smap *args)
> +{
> +    bool needs_reconfigure = false;
> +    bool zc_requested = smap_get_bool(args, "dq-zero-copy", false);
> +
> +    if (zc_requested &&
> +        !(dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)) {
> +        dev->vhost_driver_flags |= RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
> +        needs_reconfigure = true;
> +    } else if (!zc_requested &&
> +               (dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)) {
> +        dev->vhost_driver_flags &= ~RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
> +        needs_reconfigure = true;
> +    }
> +
> +    /* Only try to change ZC mode when device is down */
> +    return needs_reconfigure && (netdev_dpdk_get_vid(dev) == -1);
> +}
> +
>  static int
>  netdev_dpdk_vhost_client_set_config(struct netdev *netdev,
>                                      const struct smap *args,
> @@ -1378,15 +1403,23 @@ netdev_dpdk_vhost_client_set_config(struct netdev *netdev,
>  {
>      struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>      const char *path;
> +    bool needs_reconfigure = false;
>
>      ovs_mutex_lock(&dev->mutex);
>      if (!(dev->vhost_driver_flags & RTE_VHOST_USER_CLIENT)) {
>          path = smap_get(args, "vhost-server-path");
>          if (path && strcmp(path, dev->vhost_id)) {
>              strcpy(dev->vhost_id, path);
> -            netdev_request_reconfigure(netdev);
> +            needs_reconfigure = true;
>          }
>      }
> +
> +    needs_reconfigure |= dpdk_vhost_check_zero_copy_config(dev, args);
> +
> +    if (needs_reconfigure) {
> +        netdev_request_reconfigure(netdev);
> +    }
> +
>      ovs_mutex_unlock(&dev->mutex);
>
>      return 0;
> @@ -2719,6 +2752,22 @@ netdev_dpdk_txq_map_clear(struct netdev_dpdk *dev)
>      }
>  }
>
> +static bool
> +vhost_zero_copy_status_changed(struct netdev_dpdk *dev, bool *enable_zc)
> +{
> +    if ((dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)
> +        && !dev->dq_zc_enabled) {
> +        *enable_zc = true;
> +        return true;
> +    } else if (!(dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)
> +               && dev->dq_zc_enabled) {
> +        *enable_zc = false;
> +        return true;
> +    }
> +
> +    return false;
> +}
> +
>  /*
>   * Remove a virtio-net device from the specific vhost port. Use dev->remove
>   * flag to stop any more packets from being sent or received to/from a VM and
> @@ -2764,6 +2813,7 @@ destroy_device(int vid)
>           */
>          ovsrcu_quiesce_start();
>          VLOG_INFO("vHost Device '%s' has been removed", ifname);
> +        netdev_request_reconfigure(&dev->up);
>      } else {
>          VLOG_INFO("vHost Device '%s' not found", ifname);
>      }
> @@ -3278,24 +3328,39 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev)
>  {
>      struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>      int err;
> -    uint64_t vhost_flags = 0;
> +    uint64_t vhost_flags = dev->vhost_driver_flags;
> +    bool enable_zc = false;
> +    bool zc_status_changed = false;
>
>      ovs_mutex_lock(&dev->mutex);
>
> -    /* Configure vHost client mode if requested and if the following criteria
> -     * are met:
> -     * 1. Device hasn't been registered yet.
> -     * 2. A path has been specified.
> +    zc_status_changed = vhost_zero_copy_status_changed(dev, &enable_zc);
> +
> +    /* Reconfigure the device if either:
> +     * 1. Device hasn't been registered and a path has been specified.
> +     * 2. Requested zero copy mode has changed from current configuration.
>       */
> -    if (!(dev->vhost_driver_flags & RTE_VHOST_USER_CLIENT)
> -        && strlen(dev->vhost_id)) {
> -        /* Register client-mode device. */
> +    if ((!(dev->vhost_driver_flags & RTE_VHOST_USER_CLIENT)
> +         && strlen(dev->vhost_id))
> +        || zc_status_changed) {
> +
> +        vhost_flags |= RTE_VHOST_USER_CLIENT;
>
>          /* Enable IOMMU support, if explicitly requested. */
>          if (dpdk_vhost_iommu_enabled()) {
>              vhost_flags |= RTE_VHOST_USER_IOMMU_SUPPORT;
>          }
> +
> +        /* Device has already been registered. Need to unregister */
> +        if (dev->vhost_driver_flags & RTE_VHOST_USER_CLIENT) {
> +            err = rte_vhost_driver_unregister(dev->vhost_id);

This causes a deadlock if a VM is started between setting "dq-zero-copy=true"
and the actual reconfiguration of the netdev, but before the "new_device"
callback. It can be easily reproduced by inserting a sizeable sleep before
taking the mutex in 'netdev_dpdk_vhost_client_reconfigure' (for example,
sleep(10)). Running the following command with this change will cause an
abort on double locking of the 'dev->mutex':

    ovs-vsctl --no-wait set interface vhost1 options:dq-zero-copy=true && \
        sleep 2 && ./run_vm.sh

    2017-12-26T07:36:43Z|00169|netdev_dpdk|INFO|vHost Device './vhost1' has been added on numa node 0
    2017-12-26T07:36:43Z|00170|dpdk|INFO|VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
    2017-12-26T07:36:43Z|00171|dpdk|INFO|VHOST_CONFIG: vring call idx:7 file:89
    <...>
    2017-12-26T07:36:47Z|00221|dpdk|INFO|VHOST_CONFIG: free connfd = 67 for device './vhost1'
    ovs-vswitchd: lib/netdev-dpdk.c:2790: pthread_mutex_lock failed (Resource deadlock avoided)

    Program received signal SIGABRT, Aborted.
    0x0000007fb7a79340 in raise () from /lib64/libc.so.6
    (gdb) bt
    #0  raise () from /lib64/libc.so.6
    #1  abort () from /lib64/libc.so.6
    #2  ovs_abort_valist (err_no=0, format=0xbe81, args=...) at lib/util.c:347
    #3  ovs_abort (err_no=0, format=0xbe81) at lib/util.c:339
    #4  ovs_mutex_lock_at (l_=<optimized out>, where=0x91154a "lib/netdev-dpdk.c:2790") at lib/ovs-thread.c:76
    #5  destroy_device (vid=<optimized out>) at lib/netdev-dpdk.c:2790
    #6  vhost_destroy_device (vid=0) at lib/librte_vhost/vhost.c:335
    #7  rte_vhost_driver_unregister (path=0x7f4a41cf80 "./vhost1") at lib/librte_vhost/socket.c:771
    #8  netdev_dpdk_vhost_client_reconfigure (netdev=<optimized out>) at lib/netdev-dpdk.c:3362
    #9  port_reconfigure (port=<optimized out>) at lib/dpif-netdev.c:3276
    #10 reconfigure_datapath (dp=0xd6e420) at lib/dpif-netdev.c:3757
    #11 dpif_netdev_run (dpif=<optimized out>) at lib/dpif-netdev.c:3895
    #12 type_run (type=<optimized out>) at ofproto/ofproto-dpif.c:335
    #13 ofproto_type_run (datapath_type=0xd37580 "netdev") at ofproto/ofproto.c:1702
    #14 bridge_run__ () at vswitchd/bridge.c:2913
    #15 bridge_reconfigure (ovs_cfg=<optimized out>) at vswitchd/bridge.c:719
    #16 bridge_run () at vswitchd/bridge.c:2998
    #17 main (argc=<optimized out>, argv=<optimized out>) at vswitchd/ovs-vswitchd.c:119

There are a few more complex deadlocks possible due to the race condition
between VM start and unregistering of the vhost device. Right now I see no
way to safely call 'rte_vhost_driver_unregister' under the lock. See commit
3f891bbea61d ("netdev-dpdk: Fix deadlock in destroy_device().") for a
similar issue with 'netdev_dpdk_vhost_destruct()'.

Best regards, Ilya Maximets.
> +
> +            if (err) {
> +                VLOG_ERR("Error unregistering vHost socket %s", dev->vhost_id);
> +                goto unlock;
> +            }
> +        }
> +
>          err = rte_vhost_driver_register(dev->vhost_id, vhost_flags);
>          if (err) {
>              VLOG_ERR("vhost-user device setup failure for device %s\n",
> @@ -3333,6 +3398,18 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev)
>                       "client port: %s\n", dev->up.name);
>              goto unlock;
>          }
> +
> +        if (zc_status_changed) {
> +            if (enable_zc) {
> +                dev->dq_zc_enabled = true;
> +                VLOG_INFO("Zero copy enabled for vHost port %s",
> +                          dev->up.name);
> +            } else {
> +                dev->dq_zc_enabled = false;
> +                VLOG_INFO("Zero copy disabled for vHost port %s",
> +                          dev->up.name);
> +            }
> +        }
>      }
>
>      err = dpdk_vhost_reconfigure_helper(dev);
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index 018d644..e51795c 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -2669,6 +2669,19 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
>        </p>
>      </column>
>
> +    <column name="options" key="dq-zero-copy"
> +            type='{"type": "boolean"}'>
> +      <p>
> +        The value specifies whether or not to enable dequeue zero copy on
> +        the given interface.
> +        The port must be in an inactive state in order to enable or disable
> +        this feature. One must reload the guest driver or reboot the VM if
> +        they wish to enable/disable the feature when the device is active.
> +        Only supported by dpdkvhostuserclient interfaces.
> +        The feature is considered experimental.
> +      </p>
> +    </column>
> +
>      <column name="options" key="n_rxq_desc"
>              type='{"type": "integer", "minInteger": 1, "maxInteger": 4096}'>
>        <p>