[ovs-dev,v3,0/3] Add support for TSO with DPDK

Message ID 20200109144457.2489481-1-fbl@sysclose.org

Message

Flavio Leitner Jan. 9, 2020, 2:44 p.m. UTC
TCP Segmentation Offload (TSO) is a feature which enables the network
stack to delegate TCP segmentation to the NIC, reducing the per-packet
CPU overhead.

A guest using a vhost-user interface with TSO enabled can send TCP
packets much bigger than the MTU, which saves the CPU cycles normally
spent breaking the packets down to MTU size and calculating checksums.
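
For context, what lets the guest hand over such oversized packets is the
per-packet virtio-net header carried on the vhost-user/virtio queue: the
guest records the deferred segmentation and checksum work there instead
of doing it itself, and the backend reads it back. The layout below
follows the virtio spec (linux/virtio_net.h) and is shown only for
illustration; it is not code from this series:

    #include <stdint.h>

    /* Per-packet virtio-net header (virtio spec / linux/virtio_net.h).
     * The kernel header uses __virtio16 for the 16-bit fields; plain
     * uint16_t is used here for illustration. */
    struct virtio_net_hdr {
        uint8_t  flags;       /* e.g. VIRTIO_NET_HDR_F_NEEDS_CSUM. */
        uint8_t  gso_type;    /* e.g. VIRTIO_NET_HDR_GSO_TCPV4. */
        uint16_t hdr_len;     /* Ethernet + IP + TCP header length. */
        uint16_t gso_size;    /* MSS: max TCP payload per resulting segment. */
        uint16_t csum_start;  /* Offset where checksumming starts. */
        uint16_t csum_offset; /* Offset from csum_start to store the checksum. */
    };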

It also saves the CPU cycles used to parse multiple packets/headers
during packet processing inside the virtual switch.

If the destination of the packet is another guest on the same host, the
same big packet can be sent through a vhost-user interface, skipping
segmentation completely. However, if the destination is not local, the
NIC hardware is instructed to do the TCP segmentation and checksum
calculation.
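
For the non-local case, the rough idea (this is a sketch, not the exact
code in patch 3) is that the large packet stays in a single DPDK mbuf
and is marked with the TX offload flags, so the PMD/NIC segments and
checksums it on transmit. A minimal sketch, assuming IPv4/TCP without
options and the mbuf flag names current in DPDK at the time;
mark_mbuf_for_tso() is a hypothetical helper, not a function from this
series:

    #include <rte_ether.h>
    #include <rte_ip.h>
    #include <rte_mbuf.h>
    #include <rte_tcp.h>

    /* Illustrative only: ask the NIC to segment and checksum an
     * oversized IPv4/TCP packet held in a single mbuf. */
    static void
    mark_mbuf_for_tso(struct rte_mbuf *m, uint16_t mss)
    {
        m->l2_len = sizeof(struct rte_ether_hdr); /* Ethernet header length. */
        m->l3_len = sizeof(struct rte_ipv4_hdr);  /* IPv4 header, no options. */
        m->l4_len = sizeof(struct rte_tcp_hdr);   /* TCP header, no options. */
        m->tso_segsz = mss;                       /* Max TCP payload per segment. */
        m->ol_flags |= PKT_TX_IPV4
                       | PKT_TX_IP_CKSUM          /* NIC fills the IP checksum. */
                       | PKT_TX_TCP_CKSUM         /* NIC fills the TCP checksum. */
                       | PKT_TX_TCP_SEG;          /* NIC performs the segmentation. */
        /* Most drivers also expect the TCP pseudo-header checksum to be
         * pre-filled in the packet, e.g. with rte_ipv4_phdr_cksum(). */
    }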

The first two patches are not strictly part of TSO support, but they
are required to make sure everything works.

There are good improvements when sending to or receiving from veth
pairs or tap devices as well. See the iperf3 results below:

[*] veth measured with ethtool TX offloads disabled (tx off).

VM sending to:          Default           Enabled
   Local BR           859 Mbits/sec     9.23 Gbits/sec
   Net NS (veth)      965 Mbits/sec[*]  9.74 Gbits/sec
   VM (same host)    2.54 Gbits/sec     22.4 Gbits/sec
   Ext Host          10.3 Gbits/sec     35.0 Gbits/sec
   Ext Host (vxlan)  8.77 Gbits/sec     (not supported)

  Using VLAN:
   Local BR           877 Mbits/sec     9.49 Gbits/sec
   VM (same host)    2.35 Gbits/sec     23.3 Gbits/sec
   Ext Host          5.84 Gbits/sec     34.6 Gbits/sec

  Using IPv6:
   Net NS (veth)      937 Mbits/sec[*]  9.32 Gbits/sec
   VM (same host)    2.53 Gbits/sec     21.1 Gbits/sec
   Ext Host          8.66 Gbits/sec     37.7 Gbits/sec

  Conntrack:
   No packet changes: 1.41 Gbits/sec    33.1 Gbits/sec

VM receiving from:
   Local BR           221 Mbits/sec     220 Mbits/sec
   Net NS (veth)      221 Mbits/sec[*]  5.91 Gbits/sec
   VM (same host)    4.79 Gbits/sec     22.2 Gbits/sec
   Ext Host          10.6 Gbits/sec     10.7 Gbits/sec
   Ext Host (vxlan)  5.82 Gbits/sec     (not supported)

  Using VLAN:
   Local BR           223 Mbits/sec     219 Mbits/sec
   VM (same host)    4.21 Gbits/sec     24.1 Gbits/sec
   Ext Host          10.3 Gbits/sec     10.2 Gbits/sec

  Using IPv6:
   Net NS (veth)      217 Mbits/sec[*]  9.32 Gbits/sec
   VM (same host)    4.26 Gbits/sec     23.3 Gbits/sec
   Ext Host          9.99 Gbits/sec     10.1 Gbits/sec

iperf3 -u was used to test UDP traffic limited to the default
1 Mbit/sec, and no change was noticed except for tunneled packets
(not supported).

Travis, AppVeyor, and Cirrus CI passed.

Flavio Leitner (3):
  dp-packet: preserve headroom when cloning a pkt batch
  vhost: Disable multi-segmented buffers
  netdev-dpdk: Add TCP Segmentation Offload support

 Documentation/automake.mk           |   1 +
 Documentation/topics/dpdk/index.rst |   1 +
 Documentation/topics/dpdk/tso.rst   |  96 +++++++++
 NEWS                                |   1 +
 lib/automake.mk                     |   2 +
 lib/conntrack.c                     |  29 ++-
 lib/dp-packet.h                     | 158 +++++++++++++-
 lib/ipf.c                           |  32 +--
 lib/netdev-dpdk.c                   | 318 ++++++++++++++++++++++++----
 lib/netdev-linux-private.h          |   4 +
 lib/netdev-linux.c                  | 296 +++++++++++++++++++++++---
 lib/netdev-provider.h               |  10 +
 lib/netdev.c                        |  66 +++++-
 lib/tso.c                           |  54 +++++
 lib/tso.h                           |  23 ++
 vswitchd/bridge.c                   |   2 +
 vswitchd/vswitch.xml                |  12 ++
 17 files changed, 1013 insertions(+), 92 deletions(-)
 create mode 100644 Documentation/topics/dpdk/tso.rst
 create mode 100644 lib/tso.c
 create mode 100644 lib/tso.h

Comments

Ciara Loftus Jan. 10, 2020, 1:55 p.m. UTC | #1
> -----Original Message-----
> From: Flavio Leitner <fbl@sysclose.org>
> Sent: Thursday 9 January 2020 14:45
> To: dev@openvswitch.org
> Cc: Stokes, Ian <ian.stokes@intel.com>; Loftus, Ciara
> <ciara.loftus@intel.com>; Ilya Maximets <i.maximets@ovn.org>;
> yangyi01@inspur.com; Flavio Leitner <fbl@sysclose.org>
> Subject: [PATCH v3 0/3] Add support for TSO with DPDK
> 
> Abbreviated as TSO, TCP Segmentation Offload is a feature which enables
> the network stack to delegate the TCP segmentation to the NIC reducing
> the per packet CPU overhead.
> 
> A guest using vhost-user interface with TSO enabled can send TCP packets
> much bigger than the MTU, which saves CPU cycles normally used to break
> the packets down to MTU size and to calculate checksums.
> 
> It also saves CPU cycles used to parse multiple packets/headers during
> the packet processing inside virtual switch.
> 
> If the destination of the packet is another guest in the same host, then
> the same big packet can be sent through a vhost-user interface skipping
> the segmentation completely. However, if the destination is not local,
> the NIC hardware is instructed to do the TCP segmentation and checksum
> calculation.
> 
> The first 2 patches are not really part of TSO support, but they are
> required to make sure everything works.
> 
> There are good improvements sending to or receiving from veth pairs or
> tap devices as well. See the iperf3 results below:
> 
> [*] veth with ethtool tx off.
> 
> VM sending to:          Default           Enabled
>    Local BR           859 Mbits/sec     9.23 Gbits/sec
>    Net NS (veth)      965 Mbits/sec[*]  9.74 Gbits/sec
>    VM (same host)    2.54 Gbits/sec     22.4 Gbits/sec
>    Ext Host          10.3 Gbits/sec     35.0 Gbits/sec

I performed some similar tests. I recorded the following improvements in throughput:
VM -> VM (same host): +5.4x
VM -> Ext Host: +3.8x

I tested VM -> Ext Host with both MT27800 (ConnectX-5) and XL710
(Fortville) NICs. The result above was measured on the XL710.

Two things to note when testing with an i40e NIC:
1. The following patch is required for DPDK, which fixes an issue on the TSO path:
http://git.dpdk.org/next/dpdk-next-net/commit/?id=b2a4dc260139409c539fb8e7f1b9d0a5182cfd2b
2. For optimal performance, ensure the IRQs of the queue used by the iperf server are pinned to their own core rather than to the same core as the server process, which appears to be the default behavior.

I intend to submit a follow-up patch which documents the above.

Tested-by: Ciara Loftus <ciara.loftus@intel.com>

Thanks,
Ciara
