[ovs-dev,v2,2/2] netdev-dpdk: Enable optional dequeue zero copy for vHost User

Message ID 1507731713-9302-3-git-send-email-ciara.loftus@intel.com
State New
Headers show
Series
  • vHost Dequeue Zero Copy
Related show

Commit Message

Loftus, Ciara Oct. 11, 2017, 2:21 p.m.
Enabled per port like so:
ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=true

The feature is disabled by default and can only be enabled/disabled when
a vHost port is down.

When packets from a vHost device with zero copy enabled are destined for
a 'dpdk' port, the number of tx descriptors on that 'dpdk' port must be
set to a smaller value. 128 is recommended. This can be achieved like
so:

ovs-vsctl set Interface dpdkport options:n_txq_desc=128

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
---
 Documentation/howto/dpdk.rst             | 29 +++++++++++
 Documentation/topics/dpdk/vhost-user.rst | 35 +++++++++++++
 NEWS                                     |  3 ++
 lib/netdev-dpdk.c                        | 89 +++++++++++++++++++++++++++++++-
 vswitchd/vswitch.xml                     | 11 ++++
 5 files changed, 166 insertions(+), 1 deletion(-)

Comments

Stokes, Ian Oct. 15, 2017, 1:52 p.m. | #1
Thanks for the v2 Ciara. Comments inline.


> Enabled per port like so:
> ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=true
> 
> The feature is disabled by default and can only be enabled/disabled when a
> vHost port is down.
> 
> When packets from a vHost device with zero copy enabled are destined for a
> 'dpdk' port, the number of tx descriptors on that 'dpdk' port must be set
> to a smaller value. 128 is recommended. This can be achieved like
> so:
> 
> ovs-vsctl set Interface dpdkport options:n_txq_desc=128
> 
> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
> ---
>  Documentation/howto/dpdk.rst             | 29 +++++++++++
>  Documentation/topics/dpdk/vhost-user.rst | 35 +++++++++++++
>  NEWS                                     |  3 ++
>  lib/netdev-dpdk.c                        | 89
> +++++++++++++++++++++++++++++++-
>  vswitchd/vswitch.xml                     | 11 ++++
>  5 files changed, 166 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> index d123819..d149a8e 100644
> --- a/Documentation/howto/dpdk.rst
> +++ b/Documentation/howto/dpdk.rst
> @@ -709,3 +709,32 @@ devices to bridge ``br0``. Once complete, follow the
> below steps:
>     Check traffic on multiple queues::
> 
>         $ cat /proc/interrupts | grep virtio
> +
> +PHY-VM-PHY (vHost Dequeue Zero Copy)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Title underline is too short here, will cause an error on compilation.

> +
> +vHost dequeue zero copy functionality can  be validated using the
> +PHY-VM-PHY configuration. To begin, follow the steps described in
> +:ref:`dpdk-phy-phy` to create and initialize the database, start
> +ovs-vswitchd and add ``dpdk``-type and ``dpdkvhostuser``-type devices
> +and flows to bridge ``br0``. Once complete, follow the below steps:
> +
> +1. Enable dequeue zero copy on the vHost devices.
> +
> +       $ ovs-vsctl set Interface dpdkvhostuser0 options:dq-zero-copy=true
> +       $ ovs-vsctl set Interface dpdkvhostuser1
> + options:dq-zero-copy=true
> +
> +2. Reduce the number of txq descriptors on the phy ports.
> +
> +       $ ovs-vsctl set Interface phy0 options:n_txq_desc=128
> +       $ ovs-vsctl set Interface phy1 options:n_txq_desc=128
> +
> +3. Proceed with the test by launching the VM and configuring guest
> +forwarding, be it via the vHost loopback method or kernel forwarding
> +method, and sending traffic.
> +
> +It is essential that step 1 is performed before booting the VM,
> +otherwise the feature will not be enabled.
> +
> +Check for the log "VHOST_CONFIG: dequeue zero copy is enabled" to
> +ensure the successful enablement of the feature.

Minor nit but the info message I saw in the log during testing differed from what's above, it was as follows:

netdev_dpdk|INFO|Zero copy enabled for vHost socket /usr/local/var/run/openvswitch/vhost-user0

Consider changing the message in the docs to reflect this.

> diff --git a/Documentation/topics/dpdk/vhost-user.rst
> b/Documentation/topics/dpdk/vhost-user.rst
> index 74ac06e..267ccaf 100644
> --- a/Documentation/topics/dpdk/vhost-user.rst
> +++ b/Documentation/topics/dpdk/vhost-user.rst
> @@ -408,3 +408,38 @@ Sample XML
>      </domain>
> 
>  .. _QEMU documentation: http://git.qemu-
> project.org/?p=qemu.git;a=blob;f=docs/specs/vhost-
> user.txt;h=7890d7169;hb=HEAD
> +
> +vhost-user Dequeue Zero Copy
> +-------------------------------------
> +
> +Normally when dequeuing a packet from a vHost User device, a memcpy
> +operation must be used to copy that packet from guest address space to
> +host address space. This memcpy can be removed by enabling dequeue zero-
> copy like so:
> +
> +    $ ovs-vsctl set Interface dpdkvhostuserclient0
> + options:dq-zero-copy=true
> +
> +With this feature enabled, a reference (pointer) to the packet is
> +passed to the host, instead of a copy of the packet. Removing this
> +memcpy can give a performance improvement for some use cases, for
> +example switching large packets between different VMs.
> +
> +Note that the feature is disabled by default and must be explicitly
> +enabled by using the command above.
> +
> +The feature cannot be enabled when the device is active (ie. VM
> +booted). If you wish to enable the feature after the VM has booted, you
> +must shutdown the VM and bring it back up.

Does it make sense to explain how the feature can be disabled on a port that it was previously enabled on?

I don't see a command or an example of something like $ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=false.

Disabling the feature and the steps involved would be good for completeness of the documentation.

> +
> +The feature is available to both dpdkvhostuser and dpdkvhostuserclient
> +port types.
> +
> +A limitation exists whereby if packets from a vHost port with
> +dq-zero-copy=true are destined for a 'dpdk' type port, the number of tx
> +descriptors (n_txq_desc) for that port must be reduced to a smaller
> +number, 128 being the recommended value. This can be achieved by issuing
> the following command:
> +
> +    $ ovs-vsctl set Interface dpdkport options:n_txq_desc=128
> +

I'd like to see this limitation expanded upon.

In testing a VM -> Phy use case with testpmd in txmode in the VM, if you don't reduce the descriptors (i.e. use the default settings on the phy port) I saw that no traffic was being transmitted from the phy port. Experimenting with different values for the num descriptors yields difference performance levels i.e. 64 desc gave higher throughput in this simple case, and increasing the number of descriptors (128,256 etc.) reduced the throughput level.

I can see a user experimenting with this but from the docs it's not clear what's going on under the hood and why 128 should be selected with zero copy enabled, what factors impact on this selection?

A quick high level explanation would be good here with maybe a link to the DPDK docs for descriptors if its available.

> +Further information can be found in the `DPDK documentation
> +<http://dpdk.readthedocs.io/en/v17.05/prog_guide/vhost_lib.html>`__
> diff --git a/NEWS b/NEWS
> index a3c1a02..832a348 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -5,6 +5,9 @@ Post-v2.8.0
>         chassis "hostname" in addition to a chassis "name".
>     - Linux kernel 4.13
>       * Add support for compiling OVS with the latest Linux 4.13 kernel
> +   - DPDK:
> +     * Optional dequeue zero copy feature for vHost ports enabled per
> port
> +       via the boolean 'dq-zero-copy' option.
> 
>  v2.8.0 - xx xxx xxxx
>  ---------------------
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 0b910cd..930b20e
> 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -379,6 +379,9 @@ struct netdev_dpdk {
>      /* True if vHost device is 'up' and has been reconfigured at least
> once */
>      bool vhost_reconfigured;
> 
> +    /* True if dq-zero-copy feature has successfully been enabled */
> +    bool dq_zc_enabled;
> +
>      /* Identifier used to distinguish vhost devices from each other. */
>      char vhost_id[PATH_MAX];
> 
> @@ -905,6 +908,7 @@ common_construct(struct netdev *netdev, dpdk_port_t
> port_no,
>      dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
>      ovsrcu_index_init(&dev->vid, -1);
>      dev->vhost_reconfigured = false;
> +    dev->dq_zc_enabled = false;
>      dev->attached = false;
> 
>      ovsrcu_init(&dev->qos_conf, NULL);
> @@ -1413,6 +1417,29 @@ netdev_dpdk_ring_set_config(struct netdev *netdev,
> const struct smap *args,
>      return 0;
>  }
> 
> +static void
> +dpdk_vhost_set_config_helper(struct netdev_dpdk *dev,
> +                             const struct smap *args) {
> +    bool needs_reconfigure = false;
> +    bool zc_requested = smap_get_bool(args, "dq-zero-copy", false);
> +
> +    if (zc_requested &&
> +            !(dev->vhost_driver_flags &
> RTE_VHOST_USER_DEQUEUE_ZERO_COPY)) {
> +        dev->vhost_driver_flags |= RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
> +        needs_reconfigure = true;
> +    } else if (!zc_requested &&
> +            (dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY))
> {
> +        dev->vhost_driver_flags &= ~RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
> +        needs_reconfigure = true;
> +    }
> +
> +    /* Only try to change ZC mode when device is down */
> +    if (needs_reconfigure && (netdev_dpdk_get_vid(dev) == -1)) {
> +        netdev_request_reconfigure(&dev->up);
> +    }
> +}
> +
>  static int
>  netdev_dpdk_vhost_client_set_config(struct netdev *netdev,
>                                      const struct smap *args, @@ -1429,6
> +1456,23 @@ netdev_dpdk_vhost_client_set_config(struct netdev *netdev,
>              netdev_request_reconfigure(netdev);
>          }
>      }
> +
> +    dpdk_vhost_set_config_helper(dev, args);
> +
> +    ovs_mutex_unlock(&dev->mutex);
> +
> +    return 0;
> +}
> +
> +static int
> +netdev_dpdk_vhost_set_config(struct netdev *netdev,
> +                             const struct smap *args,
> +                             char **errp OVS_UNUSED) {
> +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> +
> +    ovs_mutex_lock(&dev->mutex);
> +    dpdk_vhost_set_config_helper(dev, args);
>      ovs_mutex_unlock(&dev->mutex);
> 
>      return 0;
> @@ -2753,6 +2797,46 @@ netdev_dpdk_txq_map_clear(struct netdev_dpdk *dev)
>      }
>  }
> 
> +static void
> +vhost_change_zero_copy_mode(struct netdev_dpdk *dev, bool client_mode,
> +                            bool enable) {
> +    int err = rte_vhost_driver_unregister(dev->vhost_id);
> +
> +    if (err) {
> +        VLOG_ERR("Error unregistering vHost socket %s; can't change zero
> copy "
> +                "mode", dev->vhost_id);
> +    } else {
> +        err = dpdk_setup_vhost_device(dev, client_mode);
> +        if (err) {
> +            VLOG_ERR("Error changing zero copy mode for vHost socket %s",
> +                    dev->vhost_id);
> +        } else if (enable) {
> +            dev->dq_zc_enabled = true;
> +            VLOG_INFO("Zero copy enabled for vHost socket %s", dev-
> >vhost_id);
> +        } else {
> +            dev->dq_zc_enabled = false;
> +            VLOG_INFO("Zero copy disabled for vHost socket %s", dev-
> >vhost_id);
> +        }
> +    }
> +}
> +
> +static void
> +vhost_check_zero_copy_status(struct netdev_dpdk *dev) {
> +    bool mode = dev->vhost_driver_flags & RTE_VHOST_USER_CLIENT;
> +
> +    if ((dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)
> +                && !dev->dq_zc_enabled) {
> +        /* ZC disabled but requested to be enabled, enable it. */
> +        vhost_change_zero_copy_mode(dev, mode, true);
> +    } else if (!(dev->vhost_driver_flags &
> +            RTE_VHOST_USER_DEQUEUE_ZERO_COPY) && dev->dq_zc_enabled) {
> +        /* ZC enabled but requested to be disabled, disable it. */
> +        vhost_change_zero_copy_mode(dev, mode, false);
> +    }
> +}
> +
>  /*
>   * Remove a virtio-net device from the specific vhost port.  Use dev-
> >remove
>   * flag to stop any more packets from being sent or received to/from a VM
> and @@ -2798,6 +2882,7 @@ destroy_device(int vid)
>           */
>          ovsrcu_quiesce_start();
>          VLOG_INFO("vHost Device '%s' has been removed", ifname);
> +        netdev_request_reconfigure(&dev->up);
>      } else {
>          VLOG_INFO("vHost Device '%s' not found", ifname);
>      }
> @@ -3289,6 +3374,8 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk
> *dev)
>              /* Carrier status may need updating. */
>              netdev_change_seq_changed(&dev->up);
>          }
> +    } else {
> +        vhost_check_zero_copy_status(dev);
>      }
> 
>      return 0;
> @@ -3450,7 +3537,7 @@ static const struct netdev_class dpdk_vhost_class =
>          NULL,
>          netdev_dpdk_vhost_construct,
>          netdev_dpdk_vhost_destruct,
> -        NULL,
> +        netdev_dpdk_vhost_set_config,
>          NULL,
>          netdev_dpdk_vhost_send,
>          netdev_dpdk_vhost_get_carrier,
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index
> 074535b..660b46f 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -2621,6 +2621,17 @@
>          </p>
>        </column>
> 
> +      <column name="options" key="dq-zero-copy"
> +              type='{"type": "boolean"}'>
> +        <p>
> +          The value specifies whether or not to enable dequeue zero copy
> on
> +          the given interface.
> +          The port must be in an inactive state in order to enable or
> disable
> +          this feature.
> +          Only supported by dpdkvhostuserclient and dpdkvhostuser
> interfaces.
> +        </p>
> +      </column>
> +
>        <column name="options" key="n_rxq_desc"
>                type='{"type": "integer", "minInteger": 1, "maxInteger":
> 4096}'>
>          <p>
> --
> 2.7.5
Loftus, Ciara Oct. 16, 2017, 11:11 a.m. | #2
> 
> Thanks for the v2 Ciara. Comments inline.

Thanks for your review Ian. Hope to send a v3 soon. Responses inline.

Thanks,
Ciara

> 
> 
> > Enabled per port like so:
> > ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=true
> >
> > The feature is disabled by default and can only be enabled/disabled when a
> > vHost port is down.
> >
> > When packets from a vHost device with zero copy enabled are destined for
> a
> > 'dpdk' port, the number of tx descriptors on that 'dpdk' port must be set
> > to a smaller value. 128 is recommended. This can be achieved like
> > so:
> >
> > ovs-vsctl set Interface dpdkport options:n_txq_desc=128
> >
> > Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
> > ---
> >  Documentation/howto/dpdk.rst             | 29 +++++++++++
> >  Documentation/topics/dpdk/vhost-user.rst | 35 +++++++++++++
> >  NEWS                                     |  3 ++
> >  lib/netdev-dpdk.c                        | 89
> > +++++++++++++++++++++++++++++++-
> >  vswitchd/vswitch.xml                     | 11 ++++
> >  5 files changed, 166 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/howto/dpdk.rst
> b/Documentation/howto/dpdk.rst
> > index d123819..d149a8e 100644
> > --- a/Documentation/howto/dpdk.rst
> > +++ b/Documentation/howto/dpdk.rst
> > @@ -709,3 +709,32 @@ devices to bridge ``br0``. Once complete, follow
> the
> > below steps:
> >     Check traffic on multiple queues::
> >
> >         $ cat /proc/interrupts | grep virtio
> > +
> > +PHY-VM-PHY (vHost Dequeue Zero Copy)
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> Title underline is too short here, will cause an error on compilation.

Ok.

> 
> > +
> > +vHost dequeue zero copy functionality can  be validated using the
> > +PHY-VM-PHY configuration. To begin, follow the steps described in
> > +:ref:`dpdk-phy-phy` to create and initialize the database, start
> > +ovs-vswitchd and add ``dpdk``-type and ``dpdkvhostuser``-type devices
> > +and flows to bridge ``br0``. Once complete, follow the below steps:
> > +
> > +1. Enable dequeue zero copy on the vHost devices.
> > +
> > +       $ ovs-vsctl set Interface dpdkvhostuser0 options:dq-zero-copy=true
> > +       $ ovs-vsctl set Interface dpdkvhostuser1
> > + options:dq-zero-copy=true
> > +
> > +2. Reduce the number of txq descriptors on the phy ports.
> > +
> > +       $ ovs-vsctl set Interface phy0 options:n_txq_desc=128
> > +       $ ovs-vsctl set Interface phy1 options:n_txq_desc=128
> > +
> > +3. Proceed with the test by launching the VM and configuring guest
> > +forwarding, be it via the vHost loopback method or kernel forwarding
> > +method, and sending traffic.
> > +
> > +It is essential that step 1 is performed before booting the VM,
> > +otherwise the feature will not be enabled.
> > +
> > +Check for the log "VHOST_CONFIG: dequeue zero copy is enabled" to
> > +ensure the successful enablement of the feature.
> 
> Minor nit but the info message I saw in the log during testing differed from
> what's above, it was as follows:
> 
> netdev_dpdk|INFO|Zero copy enabled for vHost socket
> /usr/local/var/run/openvswitch/vhost-user0
> 
> Consider changing the message in the docs to reflect this.

There are two logs. The one you mentioned happens when you enable the option in the OVSDB. The second one that's in the documentation is a DPDK log that is reported when the feature is enabled on the device, which is the one you need to look out for. I'll update the documentation to reflect this point as it could be confusing.

> 
> > diff --git a/Documentation/topics/dpdk/vhost-user.rst
> > b/Documentation/topics/dpdk/vhost-user.rst
> > index 74ac06e..267ccaf 100644
> > --- a/Documentation/topics/dpdk/vhost-user.rst
> > +++ b/Documentation/topics/dpdk/vhost-user.rst
> > @@ -408,3 +408,38 @@ Sample XML
> >      </domain>
> >
> >  .. _QEMU documentation: http://git.qemu-
> > project.org/?p=qemu.git;a=blob;f=docs/specs/vhost-
> > user.txt;h=7890d7169;hb=HEAD
> > +
> > +vhost-user Dequeue Zero Copy
> > +-------------------------------------
> > +
> > +Normally when dequeuing a packet from a vHost User device, a memcpy
> > +operation must be used to copy that packet from guest address space to
> > +host address space. This memcpy can be removed by enabling dequeue
> zero-
> > copy like so:
> > +
> > +    $ ovs-vsctl set Interface dpdkvhostuserclient0
> > + options:dq-zero-copy=true
> > +
> > +With this feature enabled, a reference (pointer) to the packet is
> > +passed to the host, instead of a copy of the packet. Removing this
> > +memcpy can give a performance improvement for some use cases, for
> > +example switching large packets between different VMs.
> > +
> > +Note that the feature is disabled by default and must be explicitly
> > +enabled by using the command above.
> > +
> > +The feature cannot be enabled when the device is active (ie. VM
> > +booted). If you wish to enable the feature after the VM has booted, you
> > +must shutdown the VM and bring it back up.
> 
> Does it make sense to explain how the feature can be disabled on a port that
> it was previously enabled on?
> 
> I don't see a command or an example of something like $ ovs-vsctl set
> Interface dpdkvhostuserclient0 options:dq-zero-copy=false.
> 
> Disabling the feature and the steps involved would be good for
> completeness of the documentation.

Ok.

> 
> > +
> > +The feature is available to both dpdkvhostuser and dpdkvhostuserclient
> > +port types.
> > +
> > +A limitation exists whereby if packets from a vHost port with
> > +dq-zero-copy=true are destined for a 'dpdk' type port, the number of tx
> > +descriptors (n_txq_desc) for that port must be reduced to a smaller
> > +number, 128 being the recommended value. This can be achieved by
> issuing
> > the following command:
> > +
> > +    $ ovs-vsctl set Interface dpdkport options:n_txq_desc=128
> > +
> 
> I'd like to see this limitation expanded upon.
> 
> In testing a VM -> Phy use case with testpmd in txmode in the VM, if you
> don't reduce the descriptors (i.e. use the default settings on the phy port) I
> saw that no traffic was being transmitted from the phy port. Experimenting
> with different values for the num descriptors yields difference performance
> levels i.e. 64 desc gave higher throughput in this simple case, and increasing
> the number of descriptors (128,256 etc.) reduced the throughput level.
> 
> I can see a user experimenting with this but from the docs it's not clear
> what's going on under the hood and why 128 should be selected with zero
> copy enabled, what factors impact on this selection?
> 
> A quick high level explanation would be good here with maybe a link to the
> DPDK docs for descriptors if its available.

Thanks for this testing Ian. I will update the documentation to include a reason why this limitation exists, and mention that the values you tested are valid too.

> 
> > +Further information can be found in the `DPDK documentation
> > +<http://dpdk.readthedocs.io/en/v17.05/prog_guide/vhost_lib.html>`__
> > diff --git a/NEWS b/NEWS
> > index a3c1a02..832a348 100644
> > --- a/NEWS
> > +++ b/NEWS
> > @@ -5,6 +5,9 @@ Post-v2.8.0
> >         chassis "hostname" in addition to a chassis "name".
> >     - Linux kernel 4.13
> >       * Add support for compiling OVS with the latest Linux 4.13 kernel
> > +   - DPDK:
> > +     * Optional dequeue zero copy feature for vHost ports enabled per
> > port
> > +       via the boolean 'dq-zero-copy' option.
> >
> >  v2.8.0 - xx xxx xxxx
> >  ---------------------
> > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 0b910cd..930b20e
> > 100644
> > --- a/lib/netdev-dpdk.c
> > +++ b/lib/netdev-dpdk.c
> > @@ -379,6 +379,9 @@ struct netdev_dpdk {
> >      /* True if vHost device is 'up' and has been reconfigured at least
> > once */
> >      bool vhost_reconfigured;
> >
> > +    /* True if dq-zero-copy feature has successfully been enabled */
> > +    bool dq_zc_enabled;
> > +
> >      /* Identifier used to distinguish vhost devices from each other. */
> >      char vhost_id[PATH_MAX];
> >
> > @@ -905,6 +908,7 @@ common_construct(struct netdev *netdev,
> dpdk_port_t
> > port_no,
> >      dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
> >      ovsrcu_index_init(&dev->vid, -1);
> >      dev->vhost_reconfigured = false;
> > +    dev->dq_zc_enabled = false;
> >      dev->attached = false;
> >
> >      ovsrcu_init(&dev->qos_conf, NULL);
> > @@ -1413,6 +1417,29 @@ netdev_dpdk_ring_set_config(struct netdev
> *netdev,
> > const struct smap *args,
> >      return 0;
> >  }
> >
> > +static void
> > +dpdk_vhost_set_config_helper(struct netdev_dpdk *dev,
> > +                             const struct smap *args) {
> > +    bool needs_reconfigure = false;
> > +    bool zc_requested = smap_get_bool(args, "dq-zero-copy", false);
> > +
> > +    if (zc_requested &&
> > +            !(dev->vhost_driver_flags &
> > RTE_VHOST_USER_DEQUEUE_ZERO_COPY)) {
> > +        dev->vhost_driver_flags |=
> RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
> > +        needs_reconfigure = true;
> > +    } else if (!zc_requested &&
> > +            (dev->vhost_driver_flags &
> RTE_VHOST_USER_DEQUEUE_ZERO_COPY))
> > {
> > +        dev->vhost_driver_flags &=
> ~RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
> > +        needs_reconfigure = true;
> > +    }
> > +
> > +    /* Only try to change ZC mode when device is down */
> > +    if (needs_reconfigure && (netdev_dpdk_get_vid(dev) == -1)) {
> > +        netdev_request_reconfigure(&dev->up);
> > +    }
> > +}
> > +
> >  static int
> >  netdev_dpdk_vhost_client_set_config(struct netdev *netdev,
> >                                      const struct smap *args, @@ -1429,6
> > +1456,23 @@ netdev_dpdk_vhost_client_set_config(struct netdev
> *netdev,
> >              netdev_request_reconfigure(netdev);
> >          }
> >      }
> > +
> > +    dpdk_vhost_set_config_helper(dev, args);
> > +
> > +    ovs_mutex_unlock(&dev->mutex);
> > +
> > +    return 0;
> > +}
> > +
> > +static int
> > +netdev_dpdk_vhost_set_config(struct netdev *netdev,
> > +                             const struct smap *args,
> > +                             char **errp OVS_UNUSED) {
> > +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> > +
> > +    ovs_mutex_lock(&dev->mutex);
> > +    dpdk_vhost_set_config_helper(dev, args);
> >      ovs_mutex_unlock(&dev->mutex);
> >
> >      return 0;
> > @@ -2753,6 +2797,46 @@ netdev_dpdk_txq_map_clear(struct
> netdev_dpdk *dev)
> >      }
> >  }
> >
> > +static void
> > +vhost_change_zero_copy_mode(struct netdev_dpdk *dev, bool
> client_mode,
> > +                            bool enable) {
> > +    int err = rte_vhost_driver_unregister(dev->vhost_id);
> > +
> > +    if (err) {
> > +        VLOG_ERR("Error unregistering vHost socket %s; can't change zero
> > copy "
> > +                "mode", dev->vhost_id);
> > +    } else {
> > +        err = dpdk_setup_vhost_device(dev, client_mode);
> > +        if (err) {
> > +            VLOG_ERR("Error changing zero copy mode for vHost socket %s",
> > +                    dev->vhost_id);
> > +        } else if (enable) {
> > +            dev->dq_zc_enabled = true;
> > +            VLOG_INFO("Zero copy enabled for vHost socket %s", dev-
> > >vhost_id);
> > +        } else {
> > +            dev->dq_zc_enabled = false;
> > +            VLOG_INFO("Zero copy disabled for vHost socket %s", dev-
> > >vhost_id);
> > +        }
> > +    }
> > +}
> > +
> > +static void
> > +vhost_check_zero_copy_status(struct netdev_dpdk *dev) {
> > +    bool mode = dev->vhost_driver_flags & RTE_VHOST_USER_CLIENT;
> > +
> > +    if ((dev->vhost_driver_flags &
> RTE_VHOST_USER_DEQUEUE_ZERO_COPY)
> > +                && !dev->dq_zc_enabled) {
> > +        /* ZC disabled but requested to be enabled, enable it. */
> > +        vhost_change_zero_copy_mode(dev, mode, true);
> > +    } else if (!(dev->vhost_driver_flags &
> > +            RTE_VHOST_USER_DEQUEUE_ZERO_COPY) && dev-
> >dq_zc_enabled) {
> > +        /* ZC enabled but requested to be disabled, disable it. */
> > +        vhost_change_zero_copy_mode(dev, mode, false);
> > +    }
> > +}
> > +
> >  /*
> >   * Remove a virtio-net device from the specific vhost port.  Use dev-
> > >remove
> >   * flag to stop any more packets from being sent or received to/from a VM
> > and @@ -2798,6 +2882,7 @@ destroy_device(int vid)
> >           */
> >          ovsrcu_quiesce_start();
> >          VLOG_INFO("vHost Device '%s' has been removed", ifname);
> > +        netdev_request_reconfigure(&dev->up);
> >      } else {
> >          VLOG_INFO("vHost Device '%s' not found", ifname);
> >      }
> > @@ -3289,6 +3374,8 @@ dpdk_vhost_reconfigure_helper(struct
> netdev_dpdk
> > *dev)
> >              /* Carrier status may need updating. */
> >              netdev_change_seq_changed(&dev->up);
> >          }
> > +    } else {
> > +        vhost_check_zero_copy_status(dev);
> >      }
> >
> >      return 0;
> > @@ -3450,7 +3537,7 @@ static const struct netdev_class dpdk_vhost_class
> =
> >          NULL,
> >          netdev_dpdk_vhost_construct,
> >          netdev_dpdk_vhost_destruct,
> > -        NULL,
> > +        netdev_dpdk_vhost_set_config,
> >          NULL,
> >          netdev_dpdk_vhost_send,
> >          netdev_dpdk_vhost_get_carrier,
> > diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index
> > 074535b..660b46f 100644
> > --- a/vswitchd/vswitch.xml
> > +++ b/vswitchd/vswitch.xml
> > @@ -2621,6 +2621,17 @@
> >          </p>
> >        </column>
> >
> > +      <column name="options" key="dq-zero-copy"
> > +              type='{"type": "boolean"}'>
> > +        <p>
> > +          The value specifies whether or not to enable dequeue zero copy
> > on
> > +          the given interface.
> > +          The port must be in an inactive state in order to enable or
> > disable
> > +          this feature.
> > +          Only supported by dpdkvhostuserclient and dpdkvhostuser
> > interfaces.
> > +        </p>
> > +      </column>
> > +
> >        <column name="options" key="n_rxq_desc"
> >                type='{"type": "integer", "minInteger": 1, "maxInteger":
> > 4096}'>
> >          <p>
> > --
> > 2.7.5

Patch

diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
index d123819..d149a8e 100644
--- a/Documentation/howto/dpdk.rst
+++ b/Documentation/howto/dpdk.rst
@@ -709,3 +709,32 @@  devices to bridge ``br0``. Once complete, follow the below steps:
    Check traffic on multiple queues::
 
        $ cat /proc/interrupts | grep virtio
+
+PHY-VM-PHY (vHost Dequeue Zero Copy)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+vHost dequeue zero copy functionality can  be validated using the
+PHY-VM-PHY configuration. To begin, follow the steps described in
+:ref:`dpdk-phy-phy` to create and initialize the database, start
+ovs-vswitchd and add ``dpdk``-type and ``dpdkvhostuser``-type devices
+and flows to bridge ``br0``. Once complete, follow the below steps:
+
+1. Enable dequeue zero copy on the vHost devices.
+
+       $ ovs-vsctl set Interface dpdkvhostuser0 options:dq-zero-copy=true
+       $ ovs-vsctl set Interface dpdkvhostuser1 options:dq-zero-copy=true
+
+2. Reduce the number of txq descriptors on the phy ports.
+
+       $ ovs-vsctl set Interface phy0 options:n_txq_desc=128
+       $ ovs-vsctl set Interface phy1 options:n_txq_desc=128
+
+3. Proceed with the test by launching the VM and configuring guest
+forwarding, be it via the vHost loopback method or kernel forwarding
+method, and sending traffic.
+
+It is essential that step 1 is performed before booting the VM, otherwise
+the feature will not be enabled.
+
+Check for the log "VHOST_CONFIG: dequeue zero copy is enabled" to ensure
+the successful enablement of the feature.
diff --git a/Documentation/topics/dpdk/vhost-user.rst b/Documentation/topics/dpdk/vhost-user.rst
index 74ac06e..267ccaf 100644
--- a/Documentation/topics/dpdk/vhost-user.rst
+++ b/Documentation/topics/dpdk/vhost-user.rst
@@ -408,3 +408,38 @@  Sample XML
     </domain>
 
 .. _QEMU documentation: http://git.qemu-project.org/?p=qemu.git;a=blob;f=docs/specs/vhost-user.txt;h=7890d7169;hb=HEAD
+
+vhost-user Dequeue Zero Copy
+-------------------------------------
+
+Normally when dequeuing a packet from a vHost User device, a memcpy operation
+must be used to copy that packet from guest address space to host address
+space. This memcpy can be removed by enabling dequeue zero-copy like so:
+
+    $ ovs-vsctl set Interface dpdkvhostuserclient0 options:dq-zero-copy=true
+
+With this feature enabled, a reference (pointer) to the packet is passed to
+the host, instead of a copy of the packet. Removing this memcpy can give a
+performance improvement for some use cases, for example switching large packets
+between different VMs.
+
+Note that the feature is disabled by default and must be explicitly enabled
+by using the command above.
+
+The feature cannot be enabled when the device is active (ie. VM booted). If
+you wish to enable the feature after the VM has booted, you must shutdown
+the VM and bring it back up.
+
+The feature is available to both dpdkvhostuser and dpdkvhostuserclient port
+types.
+
+A limitation exists whereby if packets from a vHost port with dq-zero-copy=true
+are destined for a 'dpdk' type port, the number of tx descriptors (n_txq_desc)
+for that port must be reduced to a smaller number, 128 being the recommended
+value. This can be achieved by issuing the following command:
+
+    $ ovs-vsctl set Interface dpdkport options:n_txq_desc=128
+
+Further information can be found in the
+`DPDK documentation
+<http://dpdk.readthedocs.io/en/v17.05/prog_guide/vhost_lib.html>`__
diff --git a/NEWS b/NEWS
index a3c1a02..832a348 100644
--- a/NEWS
+++ b/NEWS
@@ -5,6 +5,9 @@  Post-v2.8.0
        chassis "hostname" in addition to a chassis "name".
    - Linux kernel 4.13
      * Add support for compiling OVS with the latest Linux 4.13 kernel
+   - DPDK:
+     * Optional dequeue zero copy feature for vHost ports enabled per port
+       via the boolean 'dq-zero-copy' option.
 
 v2.8.0 - xx xxx xxxx
 ---------------------
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 0b910cd..930b20e 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -379,6 +379,9 @@  struct netdev_dpdk {
     /* True if vHost device is 'up' and has been reconfigured at least once */
     bool vhost_reconfigured;
 
+    /* True if dq-zero-copy feature has successfully been enabled */
+    bool dq_zc_enabled;
+
     /* Identifier used to distinguish vhost devices from each other. */
     char vhost_id[PATH_MAX];
 
@@ -905,6 +908,7 @@  common_construct(struct netdev *netdev, dpdk_port_t port_no,
     dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
     ovsrcu_index_init(&dev->vid, -1);
     dev->vhost_reconfigured = false;
+    dev->dq_zc_enabled = false;
     dev->attached = false;
 
     ovsrcu_init(&dev->qos_conf, NULL);
@@ -1413,6 +1417,29 @@  netdev_dpdk_ring_set_config(struct netdev *netdev, const struct smap *args,
     return 0;
 }
 
+static void
+dpdk_vhost_set_config_helper(struct netdev_dpdk *dev,
+                             const struct smap *args)
+{
+    bool needs_reconfigure = false;
+    bool zc_requested = smap_get_bool(args, "dq-zero-copy", false);
+
+    if (zc_requested &&
+            !(dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)) {
+        dev->vhost_driver_flags |= RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
+        needs_reconfigure = true;
+    } else if (!zc_requested &&
+            (dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)) {
+        dev->vhost_driver_flags &= ~RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
+        needs_reconfigure = true;
+    }
+
+    /* Only try to change ZC mode when device is down */
+    if (needs_reconfigure && (netdev_dpdk_get_vid(dev) == -1)) {
+        netdev_request_reconfigure(&dev->up);
+    }
+}
+
 static int
 netdev_dpdk_vhost_client_set_config(struct netdev *netdev,
                                     const struct smap *args,
@@ -1429,6 +1456,23 @@  netdev_dpdk_vhost_client_set_config(struct netdev *netdev,
             netdev_request_reconfigure(netdev);
         }
     }
+
+    dpdk_vhost_set_config_helper(dev, args);
+
+    ovs_mutex_unlock(&dev->mutex);
+
+    return 0;
+}
+
+static int
+netdev_dpdk_vhost_set_config(struct netdev *netdev,
+                             const struct smap *args,
+                             char **errp OVS_UNUSED)
+{
+    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+
+    ovs_mutex_lock(&dev->mutex);
+    dpdk_vhost_set_config_helper(dev, args);
     ovs_mutex_unlock(&dev->mutex);
 
     return 0;
@@ -2753,6 +2797,46 @@  netdev_dpdk_txq_map_clear(struct netdev_dpdk *dev)
     }
 }
 
+static void
+vhost_change_zero_copy_mode(struct netdev_dpdk *dev, bool client_mode,
+                            bool enable)
+{
+    int err = rte_vhost_driver_unregister(dev->vhost_id);
+
+    if (err) {
+        VLOG_ERR("Error unregistering vHost socket %s; can't change zero copy "
+                "mode", dev->vhost_id);
+    } else {
+        err = dpdk_setup_vhost_device(dev, client_mode);
+        if (err) {
+            VLOG_ERR("Error changing zero copy mode for vHost socket %s",
+                    dev->vhost_id);
+        } else if (enable) {
+            dev->dq_zc_enabled = true;
+            VLOG_INFO("Zero copy enabled for vHost socket %s", dev->vhost_id);
+        } else {
+            dev->dq_zc_enabled = false;
+            VLOG_INFO("Zero copy disabled for vHost socket %s", dev->vhost_id);
+        }
+    }
+}
+
+static void
+vhost_check_zero_copy_status(struct netdev_dpdk *dev)
+{
+    bool mode = dev->vhost_driver_flags & RTE_VHOST_USER_CLIENT;
+
+    if ((dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY)
+                && !dev->dq_zc_enabled) {
+        /* ZC disabled but requested to be enabled, enable it. */
+        vhost_change_zero_copy_mode(dev, mode, true);
+    } else if (!(dev->vhost_driver_flags &
+            RTE_VHOST_USER_DEQUEUE_ZERO_COPY) && dev->dq_zc_enabled) {
+        /* ZC enabled but requested to be disabled, disable it. */
+        vhost_change_zero_copy_mode(dev, mode, false);
+    }
+}
+
 /*
  * Remove a virtio-net device from the specific vhost port.  Use dev->remove
  * flag to stop any more packets from being sent or received to/from a VM and
@@ -2798,6 +2882,7 @@  destroy_device(int vid)
          */
         ovsrcu_quiesce_start();
         VLOG_INFO("vHost Device '%s' has been removed", ifname);
+        netdev_request_reconfigure(&dev->up);
     } else {
         VLOG_INFO("vHost Device '%s' not found", ifname);
     }
@@ -3289,6 +3374,8 @@  dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev)
             /* Carrier status may need updating. */
             netdev_change_seq_changed(&dev->up);
         }
+    } else {
+        vhost_check_zero_copy_status(dev);
     }
 
     return 0;
@@ -3450,7 +3537,7 @@  static const struct netdev_class dpdk_vhost_class =
         NULL,
         netdev_dpdk_vhost_construct,
         netdev_dpdk_vhost_destruct,
-        NULL,
+        netdev_dpdk_vhost_set_config,
         NULL,
         netdev_dpdk_vhost_send,
         netdev_dpdk_vhost_get_carrier,
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 074535b..660b46f 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -2621,6 +2621,17 @@ 
         </p>
       </column>
 
+      <column name="options" key="dq-zero-copy"
+              type='{"type": "boolean"}'>
+        <p>
+          The value specifies whether or not to enable dequeue zero copy on
+          the given interface.
+          The port must be in an inactive state in order to enable or disable
+          this feature.
+          Only supported by dpdkvhostuserclient and dpdkvhostuser interfaces.
+        </p>
+      </column>
+
       <column name="options" key="n_rxq_desc"
               type='{"type": "integer", "minInteger": 1, "maxInteger": 4096}'>
         <p>