
[ovs-dev,v4,0/2] vHost Dequeue Zero Copy

Message ID 1509614611-4233-1-git-send-email-ciara.loftus@intel.com

Message

Ciara Loftus Nov. 2, 2017, 9:23 a.m. UTC
This series enables optional dequeue zero copy for vHost ports, which
gives a performance increase for some use cases. I'm using the cover
letter to report my results.
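
For reference, enabling it on a port looks roughly like the command below.
The option key dq-zc-enabled is just my shorthand here, based on the
dq_zc_enabled rename noted in the v2 changes; see the documentation added
in patch 2 for the exact name. The feature is disabled by default.

    ovs-vsctl set Interface dpdkvhostuser0 options:dq-zc-enabled=true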

vhost (vm1) -> vhost (vm2)
Using testpmd to source (txonly) in vm1 and sink (rxonly) in vm2.
4C1Q 64B packets: 5.05Mpps -> 5.52Mpps = 9.2% improvement
256B packets:     4.69Mpps -> 5.42Mpps = ~16% improvement
512B packets:     4.04Mpps -> 4.90Mpps = ~21% improvement
1518B packets:    2.51Mpps -> 3.05Mpps = ~22% improvement

vhost (virtio_user backend 1) -> vhost (virtio_user backend 2)
Using 2 instances of testpmd, each with a virtio_user backend
connected to one of the two vhost ports created in OVS.
2C1Q 1518B packets: 2.59Mpps -> 3.09Mpps = 19.3% improvement

vhost -> phy
Using testpmd in the VM to source (txonly), with the NIC as the sink.
1C1Q 64B packets: 6.81Mpps -> 7.76Mpps = 13.9% improvement

phy -> vhost -> phy
No improvement measured

v4:
* Rebase

v3:
* Documentation updates:
** Style fixes
** Elaborate on expected logs
** Describe how to disable the feature
** Describe NIC descriptors limitation in more detail

v2:
* Mention feature is disabled by default in the documentation
* Add PHY-VM-PHY with vHost dequeue zero copy documentation guide
* Line wrap link to DPDK documentation
* Rename zc_enabled to dq_zc_enabled for future-proofing
* Mention feature is available for both vHost port types in the docs
* In practice, rebooting the VM doesn't always enable the feature if it is
enabled post-boot, so update the documentation to suggest a shutdown
rather than a reboot. Rebooting probably fails because the downtime during
a reboot isn't long enough for the vhost device to be unregistered and
re-registered with the new feature, so when the VM starts again it doesn't
pick up the new device as it hasn't been re-registered in time.

Ciara Loftus (2):
  netdev-dpdk: Helper function for vHost device setup
  netdev-dpdk: Enable optional dequeue zero copy for vHost User

 Documentation/howto/dpdk.rst             |  33 +++++
 Documentation/topics/dpdk/vhost-user.rst |  58 +++++++++
 NEWS                                     |   3 +
 lib/netdev-dpdk.c                        | 202 +++++++++++++++++++++----------
 vswitchd/vswitch.xml                     |  11 ++
 5 files changed, 245 insertions(+), 62 deletions(-)
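
Not part of the diff above, but for readers unfamiliar with the mechanism,
here is a rough C sketch of how a vhost device setup helper could pass the
zero copy request down to DPDK. The helper name and its parameters are
hypothetical; only rte_vhost_driver_register() and the
RTE_VHOST_USER_DEQUEUE_ZERO_COPY flag come from the DPDK vhost API.

    /* Sketch only -- not the code from this series. */
    #include <stdbool.h>
    #include <rte_vhost.h>

    static int
    vhost_setup_device(const char *sock_path, bool client_mode,
                       bool zc_enabled)
    {
        uint64_t flags = 0;

        if (client_mode) {
            flags |= RTE_VHOST_USER_CLIENT;
        }
        if (zc_enabled) {
            /* Ask the vhost library to hand out guest buffers on dequeue
             * instead of copying them into host mbufs. */
            flags |= RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
        }

        /* Registers the vhost-user socket with the requested flags. */
        return rte_vhost_driver_register(sock_path, flags);
    }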

Comments

Jan Scheurich Nov. 27, 2017, 2:47 p.m. UTC | #1
Hi Ciara,

> Thanks for your feedback. The limitation is only placed on phy port queues on the VP (vhost -> phy) path. VV path and PV path are not
> affected.

Yes, you are right. VM to VM traffic is copied on transmit to the second VM.

> > I would much rather put a requirement on tenants that their virtio drivers
> > need to allocate enough virtio packet buffers if they want their VM to use
> > zero-copy vhostuser ports. Or is the critical resource  owned and managed by
> > Qemu and we'd need a patch on Qemu to overcome this limit?

Can you comment on that? Can a user also reduce the problem by configuring
a) a larger virtio Tx queue size (up to 1K) in Qemu, or
b) a larger mempool for packets in Tx direction inside the guest (driver?) 

> >
> > And what about increased packet drop risk due to shortened tx queues?
> 
> I guess this could be an issue. If I had some data to back this up I would include it in the documentation and mention the risk.
> If the risk is unacceptable to the user they may choose to not enable the feature. It's disabled by default so shouldn't introduce an issue for
> the standard case.

Yes, but it would be good to understand the potential drawback for a better judgement of the trade-off between better raw throughput and higher loss risk.

Regards, Jan
Ciara Loftus Nov. 28, 2017, 3:37 p.m. UTC | #2
> Hi Ciara,
>
> > Thanks for your feedback. The limitation is only placed on phy port queues
> > on the VP (vhost -> phy) path. VV path and PV path are not affected.
>
> Yes, you are right. VM to VM traffic is copied on transmit to the second VM.
>
> > > I would much rather put a requirement on tenants that their virtio drivers
> > > need to allocate enough virtio packet buffers if they want their VM to use
> > > zero-copy vhostuser ports. Or is the critical resource owned and managed
> > > by Qemu and we'd need a patch on Qemu to overcome this limit?
>
> Can you comment on that? Can a user also reduce the problem by configuring
> a) a larger virtio Tx queue size (up to 1K) in Qemu, or

Is this possible right now without modifying QEMU src? I think the size is
hardcoded to 256 at the moment although it may become configurable in the
future. If/when it does, we can test and update the docs if it does solve
the problem. I don’t think we should suggest modifying the QEMU src as a
workaround now.

> b) a larger mempool for packets in Tx direction inside the guest (driver?)

Using the DPDK driver in the guest & generating traffic via testpmd I
modified the number of descriptors given to the virtio device from
512 (default) to 2048 & 4096 but it didn't resolve the issue unfortunately.

> > > And what about increased packet drop risk due to shortened tx queues?
> >
> > I guess this could be an issue. If I had some data to back this up I would
> > include it in the documentation and mention the risk.
> > If the risk is unacceptable to the user they may choose to not enable the
> > feature. It's disabled by default so shouldn't introduce an issue for
> > the standard case.
>
> Yes, but it would be good to understand the potential drawback for a better
> judgement of the trade-off between better raw throughput and higher loss
> risk.


I ran RFC2544 0% packet loss tests for ZC on & off (64B PVP) and observed the following:

Max rate (pps) with 0% loss
ZC Off 2599518
ZC On  1678758

As you suspected, there is a trade-off. I can mention this in the docs.

Thanks,
Ciara

> Regards, Jan
Jan Scheurich Nov. 28, 2017, 5:04 p.m. UTC | #3
> > Can you comment on that? Can a user also reduce the problem by configuring
> > a) a larger virtio Tx queue size (up to 1K) in Qemu, or
>
> Is this possible right now without modifying QEMU src? I think the size is
> hardcoded to 256 at the moment although it may become configurable in the
> future. If/when it does, we can test and update the docs if it does solve
> the problem. I don’t think we should suggest modifying the QEMU src as a
> workaround now.

The possibility to configure the tx queue size has been upstreamed in Qemu 2.10:

commit 9b02e1618cf26aa52cf786f215d757506dda14f8
Author: Wei Wang <wei.w.wang@intel.com>
Date:   Wed Jun 28 10:37:59 2017 +0800

    virtio-net: enable configurable tx queue size

    This patch enables the virtio-net tx queue size to be configurable
    between 256 (the default queue size) and 1024 by the user when the
    vhost-user backend is used....

So you should be able to test larger tx queue sizes with Qemu 2.10.
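
For example, with a vhost-user backend the queue size can be set directly
on the virtio-net-pci device, along these lines (the socket path is just a
placeholder, and the usual shared-memory setup for vhost-user is omitted):

    -chardev socket,id=char0,path=/tmp/dpdkvhostuser0
    -netdev type=vhost-user,id=net0,chardev=char0,vhostforce
    -device virtio-net-pci,netdev=net0,tx_queue_size=1024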

> > b) a larger mempool for packets in Tx direction inside the guest (driver?)
>
> Using the DPDK driver in the guest & generating traffic via testpmd I
> modified the number of descriptors given to the virtio device from
> 512 (default) to 2048 & 4096 but it didn't resolve the issue unfortunately.


I re-read the virtio 1.0 spec and it states that the total number of virtio descriptors per virtqueue equals the size of the virtqueue. Descriptors just point to guest mbufs. The mempool the guest driver uses for mbufs is irrelevant. OVS as virtio device needs to return the virtio descriptors to the guest driver. That means the virtio queue size sets the limit on the packets in flight in OVS and physical NICs.

I would like to add a statement in the documentation that explains this dependency between Qemu Tx queue size and maximum physical NIC Tx queue size when using the vhost zero copy feature on a port.

> > > > And what about increased packet drop risk due to shortened tx queues?
> > >
> > > I guess this could be an issue. If I had some data to back this up I would
> > > include it in the documentation and mention the risk.
> > > If the risk is unacceptable to the user they may choose to not enable the
> > > feature. It's disabled by default so shouldn't introduce an issue for
> > > the standard case.
> >
> > Yes, but it would be good to understand the potential drawback for a better
> > judgement of the trade-off between better raw throughput and higher loss
> > risk.
>
> I ran RFC2544 0% packet loss tests for ZC on & off (64B PVP) and observed
> the following:
>
> Max rate (pps) with 0% loss
> ZC Off 2599518
> ZC On  1678758
>
> As you suspected, there is a trade-off. I can mention this in the docs.


That degradation looks severe.
It would be cool if you could re-run the test with a 1K queue size configured
in Qemu 2.10 and in the NIC.

Regards, 
Jan
Ciara Loftus Dec. 8, 2017, 12:59 p.m. UTC | #4
> > > Can you comment on that? Can a user also reduce the problem by
> > > configuring
> > > a) a larger virtio Tx queue size (up to 1K) in Qemu, or
> >
> > Is this possible right now without modifying QEMU src? I think the size is
> > hardcoded to 256 at the moment although it may become configurable in the
> > future. If/when it does, we can test and update the docs if it does solve
> > the problem. I don’t think we should suggest modifying the QEMU src as a
> > workaround now.
>
> The possibility to configure the tx queue size has been upstreamed in Qemu
> 2.10:
>
> commit 9b02e1618cf26aa52cf786f215d757506dda14f8
> Author: Wei Wang <wei.w.wang@intel.com>
> Date:   Wed Jun 28 10:37:59 2017 +0800
>
>     virtio-net: enable configurable tx queue size
>
>     This patch enables the virtio-net tx queue size to be configurable
>     between 256 (the default queue size) and 1024 by the user when the
>     vhost-user backend is used....
>
> So you should be able to test larger tx queue sizes with Qemu 2.10.


That's good news, thanks for sharing the details.
I tested with tx_queue_size=1024 and it didn't resolve the issue completely, but allowed for a greater number of txq descriptors for the NIC:
For default QEMU VQ size = 256, max n_txq_desc value = 256
For QEMU VQ size = 1024, max n_txq_desc value = 512
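
For anyone reproducing this: the NIC descriptor limit is applied per phy
port via the n_txq_desc option mentioned above, e.g. (dpdk0 is just an
example port name):

    ovs-vsctl set Interface dpdk0 options:n_txq_desc=256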

> > > b) a larger mempool for packets in Tx direction inside the guest (driver?)
> >
> > Using the DPDK driver in the guest & generating traffic via testpmd I
> > modified the number of descriptors given to the virtio device from
> > 512 (default) to 2048 & 4096 but it didn't resolve the issue unfortunately.
>
> I re-read the virtio 1.0 spec and it states that the total number of virtio
> descriptors per virtqueue equals the size of the virtqueue. Descriptors just
> point to guest mbufs. The mempool the guest driver uses for mbufs is
> irrelevant. OVS as virtio device needs to return the virtio descriptors to
> the guest driver. That means the virtio queue size sets the limit on the
> packets in flight in OVS and physical NICs.
>
> I would like to add a statement in the documentation that explains this
> dependency between Qemu Tx queue size and maximum physical NIC Tx
> queue size when using the vhost zero copy feature on a port.


I will put my findings above in the documentation.

> > > > > And what about increased packet drop risk due to shortened tx queues?
> > > >
> > > > I guess this could be an issue. If I had some data to back this up I would
> > > > include it in the documentation and mention the risk.
> > > > If the risk is unacceptable to the user they may choose to not enable the
> > > > feature. It's disabled by default so shouldn't introduce an issue for
> > > > the standard case.
> > >
> > > Yes, but it would be good to understand the potential drawback for a better
> > > judgement of the trade-off between better raw throughput and higher loss
> > > risk.
> >
> > I ran RFC2544 0% packet loss tests for ZC on & off (64B PVP) and observed
> > the following:
> >
> > Max rate (pps) with 0% loss
> > ZC Off 2599518
> > ZC On  1678758
> >
> > As you suspected, there is a trade-off. I can mention this in the docs.
>
> That degradation looks severe.
> It would be cool if you could re-run the test with a 1K queue size configured
> in Qemu 2.10 and in the NIC.


I ran a couple of configurations, again 64B RFC2544 PVP:

NIC-TXD  Virtio-TXD  ZC   Mpps
2048     256         off  2.105    # default case
128      256         off  2.162    # checking effect of modifying NIC TXD (positive)
2048     1024        off  2.455    # checking effect of modifying Virtio TXD (positive)
128      256         on   1.587    # default zero copy case
512      1024        on   0.321    # checking effect of modifying NIC & Virtio TXD (negative)

For the default non-zero copy case, it seems increasing the virtio queue size
in the guest has a positive effect wrt packet loss, but it has the opposite
effect in the zero copy case.
It looks like the zero copy feature may increase the likelihood of packet
loss, which I guess is a trade-off for the increased pps you get with the
feature.

Thanks,
Ciara


> Regards,
> Jan