[ovs-dev,0/4] Add support for TSO with DPDK

Message ID: 20191202134426.2220333-1-fbl@sysclose.org

Message

Flavio Leitner Dec. 2, 2019, 1:44 p.m. UTC
TCP Segmentation Offload (TSO) is a feature which enables the network
stack to delegate TCP segmentation to the NIC, reducing the per-packet
CPU overhead.

A guest using a vhost-user interface with TSO enabled can send TCP
packets much larger than the MTU, which saves the CPU cycles normally
used to break the packets down to MTU size and to calculate checksums.

It also saves the CPU cycles used to parse multiple packets/headers
during packet processing inside the virtual switch.

If the destination of the packet is another guest on the same host,
then the same big packet can be sent through a vhost-user interface,
skipping segmentation completely. However, if the destination is not
local, the NIC hardware is instructed to do the TCP segmentation and
checksum calculation.
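
For reference, instructing the NIC boils down to setting the usual
mbuf offload fields before transmit. A minimal sketch (IPv4, no TCP
options; the helper name is illustrative, not code from this series):

    #include <rte_ether.h>
    #include <rte_ip.h>
    #include <rte_mbuf.h>
    #include <rte_tcp.h>

    /* Mark a large IPv4/TCP mbuf so that the NIC segments it and
     * fills in the checksums. */
    static void
    mark_for_tso(struct rte_mbuf *m, uint16_t mss)
    {
        m->l2_len = sizeof(struct rte_ether_hdr);
        m->l3_len = sizeof(struct rte_ipv4_hdr);
        m->l4_len = sizeof(struct rte_tcp_hdr);
        m->tso_segsz = mss;   /* Max TCP payload bytes per segment. */
        /* PKT_TX_TCP_SEG implies the TCP checksum offload as well. */
        m->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM | PKT_TX_TCP_SEG;
    }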

The first three patches are not really part of TSO support, but they
are required to make sure everything works.

TSO is only enabled on vhost-user ports in client mode, which means
DPDK is required. The other port types (dpdk or linux) can receive
those packets.
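
On the DPDK side this corresponds to registering the vhost-user socket
in client mode with linear buffers requested. A rough sketch using the
DPDK 19.11 flags (error handling elided; not the literal patch code):

    #include <rte_vhost.h>

    /* Client mode plus linear (single-segment) buffers; without
     * LINEARBUF the guest could hand us multi-segment mbufs. */
    static int
    register_vhost_client(const char *sock_path)
    {
        uint64_t flags = RTE_VHOST_USER_CLIENT
                         | RTE_VHOST_USER_LINEARBUF_SUPPORT;

        return rte_vhost_driver_register(sock_path, flags);
    }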

This patchset is based on the dpdk-latest branch (DPDK v19.11 required).

The numbers look good and I have noticed no regressions so far.
However, my environment is tuned but not finely tuned, so consider
these ballpark numbers only.

This is VM-to-external host (Gbits/sec)
               | No patch  |  TSO off  |  TSO on
              ---------------------------------------------
TCPv4          |    10     |    10     |   38  (line rate)
-----------------------------------------------------------
TCPv6          |     9     |     9     |   38  (line rate)
-----------------------------------------------------------
V4 Conntrack   |     1     |     1     |   33


This is VM-to-VM in the same host (Gbits/sec)
               | No patch  |  TSO off  |  TSO on
              ---------------------------------------------
TCPv4          |     9.4   |    9.4    |   31
-----------------------------------------------------------
TCPv6          |     8     |    8      |   30
-----------------------------------------------------------

There are good improvements when sending to veth pairs or tap devices too.

Flavio Leitner (4):
  dp-packet: preserve headroom when cloning a pkt batch
  vhost: Disable multi-segmented buffers
  dp-packet: handle new dpdk buffer flags
  netdev-dpdk: Add TCP Segmentation Offload support

 Documentation/automake.mk           |   1 +
 Documentation/topics/dpdk/index.rst |   1 +
 Documentation/topics/dpdk/tso.rst   |  83 +++++++++++++++++++++
 NEWS                                |   1 +
 lib/automake.mk                     |   2 +
 lib/dp-packet.c                     |  50 ++++++++++++-
 lib/dp-packet.h                     |  38 ++++++++--
 lib/netdev-bsd.c                    |   7 ++
 lib/netdev-dpdk.c                   | 112 +++++++++++++++++++++++-----
 lib/netdev-dummy.c                  |   7 ++
 lib/netdev-linux.c                  |  71 ++++++++++++++++--
 lib/tso.c                           |  54 ++++++++++++++
 lib/tso.h                           |  23 ++++++
 vswitchd/bridge.c                   |   2 +
 vswitchd/vswitch.xml                |  14 ++++
 15 files changed, 430 insertions(+), 36 deletions(-)
 create mode 100644 Documentation/topics/dpdk/tso.rst
 create mode 100644 lib/tso.c
 create mode 100644 lib/tso.h

Comments

Ilya Maximets Dec. 3, 2019, 12:15 p.m. UTC | #1
On 02.12.2019 14:44, Flavio Leitner wrote:
> [... cover letter and diffstat snipped ...]


Hi Flavio,

Thanks for working on this.
I didn't read the code carefully, just a couple of first-look points:

* Patch-set needs to have 'dpdk-latest' in a subject prefix.

* The first patch seems to be OK even without TSO; batch_clone usually
  happens on a clone action that implies further packet modifications,
  so it might make sense to have headroom for that.

* Maybe we need to enable LINEARBUF for usual vhost-user too just to
  avoid receiving multi-segment mbufs?

* The third patch is not needed.  A similar patch was already merged recently.

* I don't see the call to rte_eth_tx_prepare().  It is required to prepare
  packets for TSO for Intel NICs that need the pseudo-header checksum
  precalculated.

* The guest is allowed to disable offloading.  In this case we're not
  allowed to send incomplete packets (no checksum, oversized); they
  will just be dropped by the guest.

* Some NICs may not support TSO.  I see that you're moving this issue
  to the user by forcing them to make sure that TSO is supported.  Most
  HW NICs nowadays should support TSO, so it might not be a big deal,
  but what about the software NICs that DPDK implements?  I'm not sure.

* Don't include virtio headers in dp-packet code.  This will break non-Linux
  builds.  In fact, construction of the virtio-net header is only needed for
  netdev-linux, so it should be implemented there.

* netdev-afxdp and netdev-windows are not handled by the patches.

* I'm not sure if we need separate tso.{c,h} files.

* Docs and log messages are very confusing because they give the impression that
  the system datapath doesn't support TSO.  Also, why can't we have TSO without DPDK?
  You have a virtio header for netdev-linux, so we can use it.

* Is it safe to just drop the virtio header on packet receive?

* Not sure if it's fully safe to say that a checksum is valid while
  it's not calculated.  Have you checked this from the ipf and
  conntrack points of view?

Best regards, Ilya Maximets.
Stokes, Ian Dec. 3, 2019, 12:59 p.m. UTC | #2
On 12/3/2019 12:15 PM, Ilya Maximets wrote:
> On 02.12.2019 14:44, Flavio Leitner wrote:
>> [... cover letter and diffstat snipped ...]
> 
> 
> Hi Flavio,
> 
> Thanks for working on this.
> I didn't read the code carefully,  just a couple of first look points:
> 
> * Patch-set needs to have 'dpdk-latest' in a subject prefix.
> 

FYI, I'm creating a patch today to move master to 19.11; just
validating at the moment. Will send this evening.

Ian

> [... remaining review points snipped ...]
Flavio Leitner Dec. 3, 2019, 1:12 p.m. UTC | #3
On Tue, Dec 03, 2019 at 01:15:23PM +0100, Ilya Maximets wrote:
> On 02.12.2019 14:44, Flavio Leitner wrote:
> > [... cover letter and diffstat snipped ...]
> 
> Hi Flavio,
> 
> Thanks for working on this.
> I didn't read the code carefully,  just a couple of first look points:
> 
> * Patch-set needs to have 'dpdk-latest' in a subject prefix.

Ok.

> * First patch seems to be OK even without TSO, batch_clone usually
>   happens on clone action that implies further packet modifications,
>   so it might make sense to have a headroom for that.

As noted in the cover letter, the first three patches are somewhat
unrelated, but needed to make sure everything works.

Note also that it preserves the original headroom rather than the
default headroom used for new packets.

> * Maybe we need to enable LINEARBUF for usual vhost-user too just to
>   avoid receiving multi-segment mbufs?

That's what the patch does, right? The flag is always passed
regardless of TSO being enabled or disabled.

> * Third patch is not needed.  Similar patch was already merged recently.

Cool, but that patch wasn't in dpdk-latest, so I can't drop that yet.

> * I don't see the call to rte_eth_tx_prepare().  It is required to prepare
>   packets for TSO for Intel NICs that need the pseudo-header checksum
>   precalculated.

Good catch, will fix in v2.
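
The fix should look roughly like this (a sketch; drop accounting and
error handling elided):

    #include <rte_ethdev.h>

    /* Run the burst through rte_eth_tx_prepare() so the PMD can fill
     * in the pseudo-header checksum (and reject bad packets) first. */
    static uint16_t
    send_burst(uint16_t port_id, uint16_t queue_id,
               struct rte_mbuf **pkts, uint16_t n)
    {
        uint16_t nb_prep = rte_eth_tx_prepare(port_id, queue_id, pkts, n);

        /* If nb_prep < n, pkts[nb_prep] failed preparation and
         * rte_errno says why; real code would drop and count it. */
        return rte_eth_tx_burst(port_id, queue_id, pkts, nb_prep);
    }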

> * Guest is allowed to disable offloading.  In this case we're not allowed
>   to send incomplete packets (no-checksum, oversized), they will be just
>   dropped by the guest.

Good point, will fix in v2.
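
The idea would be to complete the checksum in software before handing
the packet to such a guest. A sketch for linear IPv4 mbufs (the helper
name is illustrative):

    #include <rte_ip.h>
    #include <rte_mbuf.h>
    #include <rte_tcp.h>

    /* Fill in the TCP checksum of an mbuf whose checksum offload was
     * requested but cannot be honored by the receiver. */
    static void
    sw_tcp_csum(struct rte_mbuf *m)
    {
        struct rte_ipv4_hdr *ip
            = rte_pktmbuf_mtod_offset(m, struct rte_ipv4_hdr *, m->l2_len);
        struct rte_tcp_hdr *tcp
            = (struct rte_tcp_hdr *) ((char *) ip + m->l3_len);

        tcp->cksum = 0;
        tcp->cksum = rte_ipv4_udptcp_cksum(ip, tcp);
        m->ol_flags &= ~PKT_TX_TCP_CKSUM;  /* Checksum is now complete. */
    }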

> * Some NICs may not support TSO.  I see that you're moving this issue to the
>   user by forcing them to make sure that TSO is supported.  Most HW NICs
>   nowadays should support TSO, so it might not be a big deal, but
>   what about the software NICs that DPDK implements?  I'm not sure.

What kind of interface do you have in mind? Since I will need to
handle VMs with TSO disabled, most probably the same solution
applies to those software devices.
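
For the probe itself, something like the sketch below should be enough
to detect a port that cannot do TSO (return value checking elided):

    #include <stdbool.h>
    #include <rte_ethdev.h>

    /* True if the port can do TCP segmentation in hardware. */
    static bool
    port_supports_tso(uint16_t port_id)
    {
        struct rte_eth_dev_info info;

        rte_eth_dev_info_get(port_id, &info);
        return (info.tx_offload_capa & DEV_TX_OFFLOAD_TCP_TSO) != 0;
    }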

> * Don't include virtio headers in dp-packet code.  This will break non-Linux
>   builds.  In fact, construction of virtio-net header is only needed for
>   netdev-linux, so it should be implemented there.

Isn't virtio available in non-Linux envs? I initially had it there,
but it breaks because it has DPDK dependencies as well.
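
For reference, the netdev-linux side boils down to filling the header
along these lines. A sketch assuming IPv4 TCP (real code would also
have to handle IPv6 and the checksum-only case):

    #include <stddef.h>
    #include <linux/tcp.h>
    #include <linux/virtio_net.h>

    /* Fill the virtio-net header prepended to a TSO'd IPv4/TCP packet;
     * l4_ofs is the offset of the TCP header from the packet start. */
    static void
    fill_vnet_hdr(struct virtio_net_hdr *h, uint16_t l4_ofs, uint16_t mss)
    {
        h->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM; /* Checksum still pending. */
        h->gso_type = VIRTIO_NET_HDR_GSO_TCPV4; /* Segment as IPv4 TCP. */
        h->gso_size = mss;                      /* Payload bytes per segment. */
        h->csum_start = l4_ofs;                 /* Checksum from here... */
        h->csum_offset = offsetof(struct tcphdr, check); /* ...stored here. */
        h->hdr_len = l4_ofs + sizeof(struct tcphdr);
    }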

> * netdev-afxdp and netdev-windows are not handled by the patches.

That's correct.

> * I'm not sure if we need separate tso.{c,h} files.

The tso_init() is called from vswitchd/bridge.c, so I am open to
ideas.
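
Right now the module is little more than a process-wide flag. A
hypothetical shape (not the literal patch code):

    #include <stdbool.h>

    static bool tso_support_enabled = false;

    /* Called once from vswitchd/bridge.c based on the configuration. */
    void
    tso_init(bool enable)
    {
        tso_support_enabled = enable;
    }

    /* Queried by the netdev implementations. */
    bool
    tso_enabled(void)
    {
        return tso_support_enabled;
    }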

> * Docs and log messages are very confusing because they give the impression that
>   the system datapath doesn't support TSO.  Also, why can't we have TSO without DPDK?
>   You have a virtio header for netdev-linux, so we can use it.

OK, the doc is under the dpdk/ subdirectory, but the vswitch.xml
entries and the other log messages could be clearer.

> * Is it safe to just drop virtio header on packet receive?

We don't support TSO from other devices yet, so we drop the header
and let the usual checks drop the packets if they are missing a csum
or are oversized.

> * Not sure if it's fully safe to say that a checksum is valid while it's not
>   calculated.  Have you checked this from the ipf and conntrack points of
>   view?

Could you elaborate? I didn't get the question.

Thanks for the review!
Flavio Leitner Dec. 3, 2019, 3:48 p.m. UTC | #4
On Tue, Dec 03, 2019 at 12:59:24PM +0000, Stokes, Ian wrote:
> 
> 
> On 12/3/2019 12:15 PM, Ilya Maximets wrote:
> > On 02.12.2019 14:44, Flavio Leitner wrote:
> > > Abbreviated as TSO, TCP Segmentation Offload is a feature which enables
> > > the network stack to delegate the TCP segmentation to the NIC reducing
> > > the per packet CPU overhead.
> > 
> > Hi Flavio,
> > 
> > Thanks for working on this.
> > I didn't read the code carefully,  just a couple of first look points:
> > 
> > * Patch-set needs to have 'dpdk-latest' in a subject prefix.
> > 
> 
> FYI, I'm creating a patch to move to 19.11 today for master, just validating
> at the moment. Will send this evening.

Great! I will rebase on top of that before pushing TSO v2.
fbl

> [... remainder of quoted message snipped ...]
Ilya Maximets Dec. 3, 2019, 4:26 p.m. UTC | #5
On 03.12.2019 14:12, Flavio Leitner wrote:
> On Tue, Dec 03, 2019 at 01:15:23PM +0100, Ilya Maximets wrote:
>> On 02.12.2019 14:44, Flavio Leitner wrote:
>>> [... cover letter and diffstat snipped ...]
>>
>>
>> Hi Flavio,
>>
>> Thanks for working on this.
>> I didn't read the code carefully,  just a couple of first look points:
>>
>> * Patch-set needs to have 'dpdk-latest' in a subject prefix.
> 
> Ok.
> 
>> * First patch seems to be OK even without TSO, batch_clone usually
>>   happens on clone action that implies further packet modifications,
>>   so it might make sense to have a headroom for that.
> 
> It's in the cover letter that the first three patches are somewhat
> unrelated, but needed to make sure everything works ok.
> 
> Also that it preserves the original headroom and not the default
> headroom used in the new packets.
> 
>> * Maybe we need to enable LINEARBUF for usual vhost-user too just to
>>   avoid receiving multi-segment mbufs?
> 
> That's what the patch does, right? The flag is always passed
> regardless of TSO being enabled or disabled.

I meant dpdkvhostuser vs dpdkvhostuserclient.

> 
>> * Third patch is not needed.  Similar patch was already merged recently.
> 
> Cool, but that patch wasn't in dpdk-latest, so I can't drop that yet.
> 
>> * I don't see the call to rte_eth_tx_prepare().  It is required to prepare
>>   packets for TSO for Intel NICs that need the pseudo-header checksum
>>   precalculated.
> 
> Good catch, will fix in v2.
> 
>> * Guest is allowed to disable offloading.  In this case we're not allowed
>>   to send incomplete packets (no-checksum, oversized), they will be just
>>   dropped by the guest.
> 
> Good point, will fix in v2.
> 
>> * Some NICs may not support TSO.  I see that you're moving this issue to the
>>   user by forcing them to make sure that TSO is supported.  Most HW NICs
>>   nowadays should support TSO, so it might not be a big deal, but
>>   what about the software NICs that DPDK implements?  I'm not sure.
> 
> What kind of interface do you have in mind? Since I will need to
> handle VMs with TSO disabled, most probably the same solution
> applies to those software devices.

Looking at the feature support table, there are even a few HW NICs
that don't support TSO: https://doc.dpdk.org/guides/nics/overview.html

For example, axgbe and the new hns3.

Among the virtual ports, the most obvious is afxdp.

> 
>> * Don't include virtio headers in dp-packet code.  This will break non-Linux
>>   builds.  In fact, construction of virtio-net header is only needed for
>>   netdev-linux, so it should be implemented there.
> 
> Isn't virtio available in non-Linux envs? I initially had it there,
> but it breaks because it has DPDK dependencies as well.

I believe that "linux/virtio_net.h" is not available outside of Linux.
BTW, you could always check your patch-set on CirrusCI with FreeBSD
or on Windows with AppVeyor.

We'll have to move this code if someday we want to use DPDK on BSD
or Windows.  Right now netdev-dpdk depends on Linux, but mostly
because of the vhost-user support.

> 
>> * netdev-afxdp and netdev-windows are not handled by the patches.
> 
> That's correct.
> 
>> * I'm not sure if we need separate tso.{c,h} files.
> 
> The tso_init() is called from vswtchd/bridge.c, so I am open to
> ideas.

Need to think about this.

> 
>> * Docs and log messages are very confusing because they give the impression that
>>   the system datapath doesn't support TSO.  Also, why can't we have TSO without DPDK?
>>   You have a virtio header for netdev-linux, so we can use it.
> 
> OK, the doc is under the subdirectory dpdk/, but the vswitchd.xml
> and the other log messages could be more clear.
> 
>> * Is it safe to just drop virtio header on packet receive?
> 
> We don't support TSO from other devices yet, so we drop that and
> let the usual checking drop the packets if they miss csum or are
> oversized.
> 
>> * Not sure if it's fully safe to say that a checksum is valid while it's not
>>   calculated.  Have you checked this from the ipf and conntrack points of
>>   view?
> 
> Could you elaborate? I didn't get the question.

I meant that ipf and conntrack in userspace use the checksum_valid()
functions, and it might be bad if they think that a checksum is correct
while it's not.  Just wanted to be sure that you've checked the callers.

One more point:

* I see that you're claiming that TSO is not supported with tunneling, so
  it might make sense to fail the encapsulation for packets with wrong
  checksums if we're not going to have a software TSO implementation.


BTW, something strange happens to your mail account:
"""
An error occurred while sending mail. The mail server responded:
550 5.1.2 <fbl@sysclose.com>: Recipient address rejected: Domain not found.
Please check the message recipient "fbl@sysclose.com" and try again.
"""
So, I changed it back to the .org one.

Best regards, Ilya Maximets.
Flavio Leitner Dec. 3, 2019, 7:19 p.m. UTC | #6
On Tue, Dec 03, 2019 at 05:26:51PM +0100, Ilya Maximets wrote:
> On 03.12.2019 14:12, Flavio Leitner wrote:
> > On Tue, Dec 03, 2019 at 01:15:23PM +0100, Ilya Maximets wrote:
> >> On 02.12.2019 14:44, Flavio Leitner wrote:
> >>> [... cover letter and diffstat snipped ...]
> >>
> >>
> >> Hi Flavio,
> >>
> >> Thanks for working on this.
> >> I didn't read the code carefully,  just a couple of first look points:
> >>
> >> * Patch-set needs to have 'dpdk-latest' in a subject prefix.
> > 
> > Ok.
> > 
> >> * First patch seems to be OK even without TSO, batch_clone usually
> >>   happens on clone action that implies further packet modifications,
> >>   so it might make sense to have a headroom for that.
> > 
> > It's in the cover letter that the first three patches are somewhat
> > unrelated, but needed to make sure everything works ok.
> > 
> > Also that it preserves the original headroom and not the default
> > headroom used in the new packets.
> > 
> >> * Maybe we need to enable LINEARBUF for usual vhost-user too just to
> >>   avoid receiving multi-segment mbufs?
> > 
> > That's what the patch does, right? The flag is always passed
> > regardless of TSO being enabled or disabled.
> 
> I meant dpdkvhostuser vs dpdkvhostuserclient.

OK, I thought dpdkvhostuser ports have been deprecated since 2017 and
are going to be removed (next release, perhaps?).

If not, then I agree I missed a warning for the case where someone
tries to use them today.

> >> * Third patch is not needed.  Similar patch was already merged recently.
> > 
> > Cool, but that patch wasn't in dpdk-latest, so I can't drop that yet.
> > 
> >> * I don't see the call to rte_eth_tx_prepare().  It is required to prepare
> >>   packets for TSO for Intel NICs that need the pseudo-header checksum
> >>   precalculated.
> > 
> > Good catch, will fix in v2.
> > 
> >> * Guest is allowed to disable offloading.  In this case we're not allowed
> >>   to send incomplete packets (no-checksum, oversized), they will be just
> >>   dropped by the guest.
> > 
> > Good point, will fix in v2.
> > 
> >> * Some NICs may not support TSO.  I see that you're moving this issue to the
> >>   user by forcing them to make sure that TSO is supported.  Most HW NICs
> >>   nowadays should support TSO, so it might not be a big deal, but
> >>   what about the software NICs that DPDK implements?  I'm not sure.
> > 
> > What kind of interface do you have in mind? Since I will need to
> > handle VMs with TSO disabled, most probably the same solution
> > applies to those software devices.
> 
>> Looking at the feature support table, there are even a few HW NICs
>> that don't support TSO: https://doc.dpdk.org/guides/nics/overview.html
>>
>> For example, axgbe and the new hns3.
>>
>> Among the virtual ports, the most obvious is afxdp.

Do you have a use case in mind that would use TSO and not have
support in the NIC?

> >> * Don't include virtio headers in dp-packet code.  This will break non-Linux
> >>   builds.  In fact, construction of virtio-net header is only needed for
> >>   netdev-linux, so it should be implemented there.
> > 
> > Isn't virtio available in non-Linux envs? I initially had it there,
> > but it breaks because it has DPDK dependencies as well.
> 
> I believe that "linux/virtio_net.h" is not available outside of linux.

Sure, I was talking about the virtio structures; then it would be a
matter of selecting the right one.  If we want to mess with csum and
offloading, most probably we will need those fields anyway, or will
need to translate them.

> BTW, You could always try to check your patch-set on CirrusCI with FreeBSD
> or on Windows with AppVeyor.

I will try that, thanks.

> We'll have to move this code if someday we'll want to use DPDK on BSD
> or Windows.  Right now netdev-dpdk depends on Linux, but mostly because
> of the vhost-user support.
> 
> > 
> >> * netdev-afxdp and netdev-windows are not handled by the patches.
> > 
> > That's correct.
> > 
> >> * I'm not sure if we need separate tso.{c,h} files.
> > 
> > The tso_init() is called from vswtchd/bridge.c, so I am open to
> > ideas.
> 
> Need to think about this.
> 
> > 
> >> * Docs and log messages are very confusing because they give the impression that
> >>   the system datapath doesn't support TSO.  Also, why can't we have TSO without DPDK?
> >>   You have a virtio header for netdev-linux, so we can use it.
> > 
> > OK, the doc is under the subdirectory dpdk/, but the vswitchd.xml
> > and the other log messages could be more clear.
> > 
> >> * Is it safe to just drop virtio header on packet receive?
> > 
> > We don't support TSO from other devices yet, so we drop that and
> > let the usual checking drop the packets if they miss csum or are
> > oversized.
> > 
> >> * Not sure if it's fully safe to say that a checksum is valid while it's not
> >>   calculated.  Have you checked this from the ipf and conntrack points of
> >>   view?
> > 
> > Could you elaborate? I didn't get the question.
> 
> I meant that ipf and conntrack in userspace are using the checksum_valid()
> functions and it might be bad if they think that checksum is correct while
> it's not.  Just wanted to be sure that you've checked the callers.

For that, yes: the csum is considered valid either if the HW RX side
indicates that, or if the TX side requests csum calculation offloading.
It trusts that while the packet is within the host there is no
corruption.

But there are places where the csum is recalculated using the old
value, which should also be okay, but I haven't tested that scenario
yet.
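
In mbuf terms, the rule being applied is roughly the sketch below
(flag names from DPDK 19.11; the helper name is illustrative):

    #include <stdbool.h>
    #include <rte_mbuf.h>

    /* Treat the L4 checksum as trustworthy if the RX side verified it
     * or if completion has been delegated to the TX path. */
    static bool
    l4_csum_trusted(const struct rte_mbuf *m)
    {
        if ((m->ol_flags & PKT_RX_L4_CKSUM_MASK) == PKT_RX_L4_CKSUM_GOOD) {
            return true;  /* NIC validated the checksum on receive. */
        }
        /* Explicit L4 csum offload request, or TSO (which implies it). */
        return (m->ol_flags & (PKT_TX_L4_MASK | PKT_TX_TCP_SEG)) != 0;
    }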


> One more point:
> 
> * I see that you're claiming that TSO is not supported with tunneling, so
>   it might make sense to fail the encapsulation for packets with wrong checksums
>   if we're not going to have software TSO implementation.

I will add a log warning if that's the case.
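
Something along these lines, using OVS's rate-limited logging (module
name and message text are illustrative):

    #include "openvswitch/vlog.h"

    VLOG_DEFINE_THIS_MODULE(tso_tunnel_example);

    static struct vlog_rate_limit tnl_rl = VLOG_RATE_LIMIT_INIT(5, 20);

    /* Warn when a packet with pending TSO/checksum offload reaches a
     * tunnel port and no software fallback is available. */
    static void
    warn_tso_over_tunnel(const char *port_name)
    {
        VLOG_WARN_RL(&tnl_rl, "%s: cannot encapsulate packet with pending "
                     "TSO/checksum offload; packet may be dropped.",
                     port_name);
    }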

> BTW, something strange happens to your mail account:
> """
> An error occurred while sending mail. The mail server responded:
> 550 5.1.2 <fbl@sysclose.com>: Recipient address rejected: Domain not found.
> Please check the message recipient "fbl@sysclose.com" and try again.
> """
> So, I changed it back to .org one.

Oops, I fixed here, thanks!
Ciara Loftus Dec. 9, 2019, 4:18 p.m. UTC | #7
> 
> [... cover letter snipped ...]
>
> The numbers look good and I have noticed no regressions so far. However,
> my environment is tuned but not finely tuned, so consider these
> ballpark numbers only.
> 
> This is VM-to-external host (Gbits/sec)
>                | No patch  |  TSO off  |  TSO on
>               ---------------------------------------------
> TCPv4          |    10     |    10     |   38  (line rate)
> -----------------------------------------------------------
> TCPv6          |     9     |     9     |   38  (line rate)
> -----------------------------------------------------------
> V4 Conntrack   |     1     |     1     |   33
> 
> 
> This is VM-to-VM in the same host (Gbits/sec)
>                | No patch  |  TSO off  |  TSO on
>               ---------------------------------------------
> TCPv4          |     9.4   |    9.4    |   31
> -----------------------------------------------------------
> TCPv6          |     8     |    8      |   30
> -----------------------------------------------------------

Hi Flavio,

Thanks for the patch.
Similarly, my environment is not tuned but I'm measuring a decent improvement too.
(These results include your fixes on https://github.com/fleitner/ovs/tree/tso-v2-devel)
VM to external host: +71%
VM to VM: +154%

Note: TSO appears to be broken for some cases on i40e: https://bugs.dpdk.org/show_bug.cgi?id=373
So I've applied this patch to DPDK for my tests: https://github.com/Mic92/dpdk/commit/2dc9b8dd2c8e2eb71f216b5b9222a4deb57482c9

I've also tested on 82599ES; however, since we hit near line rate with TSO off, no significant improvement can be measured.

Thanks,
Ciara

> [... remainder of quoted message snipped ...]
Stokes, Ian Dec. 11, 2019, 6:35 p.m. UTC | #8
On 12/3/2019 2:19 PM, Flavio Leitner wrote:
> On Tue, Dec 03, 2019 at 05:26:51PM +0100, Ilya Maximets wrote:
>> On 03.12.2019 14:12, Flavio Leitner wrote:
>>> On Tue, Dec 03, 2019 at 01:15:23PM +0100, Ilya Maximets wrote:
>>>> On 02.12.2019 14:44, Flavio Leitner wrote:
>>>>>> [... cover letter and diffstat snipped ...]
>>>>
>>>>
>>>> Hi Flavio,
>>>>
>>>> Thanks for working on this.
>>>> I didn't read the code carefully,  just a couple of first look points:
>>>>
>>>> * Patch-set needs to have 'dpdk-latest' in a subject prefix.
>>>
>>> Ok.
>>>
>>>> * First patch seems to be OK even without TSO, batch_clone usually
>>>>    happens on clone action that implies further packet modifications,
>>>>    so it might make sense to have a headroom for that.
>>>
>>> It's in the cover letter that the first three patches are somewhat
>>> unrelated, but needed to make sure everything works ok.
>>>
>>> Also that it preserves the original headroom and not the default
>>> headroom used in the new packets.
>>>
>>>> * Maybe we need to enable LINEARBUF for usual vhost-user too just to
>>>>    avoid receiving multi-segment mbufs?
>>>
>>> That's what the patch does, right? The flag is always passed
>>> regardless of TSO being enabled or disabled.
>>
>> I meant dpdkvhostuser vs dpdkvhostuserclient.
> 
> OK, I thought dpdkvhostuser ports are deprecated since 2017 and they
> are going to be removed (next release perhaps?).
> 

I've raised its removal a few times, but there was always feedback
that some in the community are still using it or have to support it.
Maybe we can check again and see if it's acceptable now.

Aside: if we do intend to remove it, I think we should flag it in
NEWS for one release, stating that it will be removed completely by
'OVS 2.X'.

> If not, then I agree I missed a warning for the case where someone
> tries to use them today.
>
This is OK with me if it's being kept for the moment; we could always
remove it later if/when dpdkvhostuser is removed.

>>>> * Third patch is not needed.  Similar patch was already merged recently.
>>>
>>> Cool, but that patch wasn't in dpdk-latest, so I can't drop that yet.
>>>
>>>> * I don't see the call to rte_eth_tx_prepare().  It is required to prepare
>>>>    packets for TSO for Intel NICs that need the pseudo-header checksum
>>>>    precalculated.
>>>
>>> Good catch, will fix in v2.
>>>
>>>> * Guest is allowed to disable offloading.  In this case we're not allowed
>>>>    to send incomplete packets (no-checksum, oversized), they will be just
>>>>    dropped by the guest.
>>>
>>> Good point, will fix in v2.
>>>
>>>> * Some NICs may not support TSO.  I see that you're moving this issue to the
>>>>    user by forcing them to make sure that TSO is supported.  Most HW NICs
>>>>    nowadays should support TSO, so it might not be a big deal, but
>>>>    what about the software NICs that DPDK implements?  I'm not sure.
>>>
>>> What kind of interface do you have in mind? Since I will need to
>>> handle VMs with TSO disabled, most probably the same solution
>>> applies to those software devices.
>>
>> Looking at the feature support table, there are even a few HW NICs
>> that don't support TSO: https://doc.dpdk.org/guides/nics/overview.html
>>
>> For example, axgbe and the new hns3.

@Ilya, I understand the concern here, but as TSO would be
experimental, I think it would be OK to flag to the user that ports
deployed for use with TSO must support the feature (similar to what
was done with Partial HWOL).

It may well be supported for these in the future (or maybe never, in
some cases), but documenting it should be OK initially, I would hope.

>>
>> Among the virtual ports, the most obvious is afxdp.
> 
> Do you have an use-case in mind that would use TSO and not have
> support in the NIC?
> 
>>>> * Don't include virtio headers in dp-packet code.  This will break non-Linux
>>>>    builds.  In fact, construction of virtio-net header is only needed for
>>>>    netdev-linux, so it should be implemented there.
>>>
>>> Isn't virtio available in non-Linux envs? I initially had it there,
>>> but it breaks because it has DPDK dependencies as well.
>>
>> I believe that "linux/virtio_net.h" is not available outside of linux.
> 
> Sure, I was talking about virtio structures, then it would be a
> matter of selecting the right one. Because if we want to mess with
> csum and offloading, most probably we will need those fields anyway
> or translate them.
> 
>> BTW, You could always try to check your patch-set on CirrusCI with FreeBSD
>> or on Windows with AppVeyor.
> 
> I will try that, thanks.
> 
>> We'll have to move this code if someday we'll want to use DPDK on BSD
>> or Windows.  Right now netdev-dpdk depends on Linux, but mostly because
>> of the vhost-user support.
>>

@Ilya, just to clarify: is the concern that we should not introduce
this until DPDK is supported in OVS for Windows and BSD?

I guess my concern here is that I'm not sure of the effort required to 
enable these (is there a requirement to do so?), especially in the 2.13 
timeline.

As DPDK is only supported for Linux with OVS, I don't think this
should block.

If, in time, efforts to support Windows and BSD with DPDK come about,
then sure, we should revisit it. I just haven't heard a request for
this as of yet.

>>>
>>>> * netdev-afxdp and netdev-windows are not handled by the patches.
>>>
>>> That's correct.
>>>
>>>> * I'm not sure if we need separate tso.{c,h} files.
>>>
>>> The tso_init() is called from vswtchd/bridge.c, so I am open to
>>> ideas.
>>
>> Need to think about this.
>>
>>>
>>>> * Docs and log messages are very confusing because they give the impression that
>>>>    the system datapath doesn't support TSO.  Also, why can't we have TSO without DPDK?
>>>>    You have a virtio header for netdev-linux, so we can use it.
>>>
>>> OK, the doc is under the subdirectory dpdk/, but the vswitchd.xml
>>> and the other log messages could be clearer.
>>>
>>>> * Is it safe to just drop the virtio header on packet receive?
>>>
>>> We don't support TSO from other devices yet, so we drop it and
>>> let the usual checks drop the packets if they are missing a csum or
>>> are oversized.
>>>
>>>> * Not sure if it's fully safe to say that the checksum is valid while it's
>>>>    not calculated.  Have you checked this from the ipf and conntrack points
>>>>    of view?
>>>
>>> Could you elaborate? I didn't get the question.
>>
>> I meant that ipf and conntrack in userspace use the checksum_valid()
>> functions, and it might be bad if they think that the checksum is correct
>> while it's not.  Just wanted to be sure that you've checked the callers.
> 
> For that, yes: the csum is considered valid either if HW RX indicates so
> or if the TX side requests csum calculation offloading. It trusts that
> there is no corruption while the packet is within the host.
> 
> But there are places where the csum is recalculated using the old
> value, which should also be okay, but I haven't tested that scenario yet.
> 
+1
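To spell out the semantics in question, a sketch of such a validity check;
the flag combination shown is illustrative rather than the patch's actual
code:

/* Sketch only: the checksum is reported good both when RX offload
 * verified it and when TX offload was merely requested (it will be
 * filled in before the packet leaves the host).  The patches' actual
 * flag handling may differ. */
static inline bool
dp_packet_l4_checksum_valid(const struct dp_packet *p)
{
    return (p->mbuf.ol_flags & PKT_RX_L4_CKSUM_GOOD)
           || (p->mbuf.ol_flags & PKT_TX_L4_MASK);
}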
> 
>> One more point:
>>
>> * I see that you're claiming that TSO is not supported with tunneling, so
>>    it might make sense to fail the encapsulation for packets with wrong
>>    checksums if we're not going to have a software TSO implementation.
> 
> I will add a log warning if that's the case.
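Something along these lines, perhaps; the predicate and the message are
illustrative only:

/* Sketch only: refuse to encapsulate a packet that still needs TSO while
 * TSO over tunnels is unsupported. */
if (dp_packet_size(packet) > mtu) {
    static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);

    VLOG_WARN_RL(&rl, "%s: dropped packet needing TSO: TSO is not "
                 "supported with tunneling", netdev_get_name(netdev));
    dp_packet_delete(packet);
    return;
}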

Probably good to document as a known limitation. It's something that can 
be worked on as we seek to remove the experimental tag.

Thanks
Ian
> 
>> BTW, something strange happens to your mail account:
>> """
>> An error occurred while sending mail. The mail server responded:
>> 550 5.1.2 <fbl@sysclose.com>: Recipient address rejected: Domain not found.
>> Please check the message recipient "fbl@sysclose.com" and try again.
>> """
>> So, I changed it back to .org one.
> 
> Oops, I fixed it here, thanks!
>
Ilya Maximets Dec. 11, 2019, 7:03 p.m. UTC | #9
On 11.12.2019 19:35, Stokes, Ian wrote:
> 
> 
> On 12/3/2019 2:19 PM, Flavio Leitner wrote:
>> On Tue, Dec 03, 2019 at 05:26:51PM +0100, Ilya Maximets wrote:
>>> On 03.12.2019 14:12, Flavio Leitner wrote:
>>>> On Tue, Dec 03, 2019 at 01:15:23PM +0100, Ilya Maximets wrote:
>>>>> On 02.12.2019 14:44, Flavio Leitner wrote:
>>>>>> [cover letter snipped]
>>>>>
>>>>>
>>>>> Hi Flavio,
>>>>>
>>>>> Thanks for working on this.
>>>>> I didn't read the code carefully, just a couple of first-look points:
>>>>>
>>>>> * The patch set needs to have 'dpdk-latest' in the subject prefix.
>>>>
>>>> Ok.
>>>>
>>>>> * The first patch seems to be OK even without TSO; batch_clone usually
>>>>>    happens on a clone action, which implies further packet modifications,
>>>>>    so it might make sense to have headroom for that.
>>>>
>>>> The cover letter notes that the first three patches are somewhat
>>>> unrelated, but needed to make sure everything works OK.
>>>>
>>>> It also notes that the patch preserves the original headroom and not
>>>> the default headroom used in the new packets.
>>>>
>>>>> * Maybe we need to enable LINEARBUF for the usual vhost-user too, just to
>>>>>    avoid receiving multi-segment mbufs?
>>>>
>>>> That's what the patch does, right? The flag is always passed
>>>> regardless of TSO being enabled or disabled.
>>>
>>> I meant dpdkvhostuser vs dpdkvhostuserclient.
>>
>> OK, I thought dpdkvhostuser ports have been deprecated since 2017 and
>> are going to be removed (next release, perhaps?).
>>
> 
> I've raised its removal a few times, but there was always feedback that some in the community are still using it/have to support it. Maybe we can check again and see if it is acceptable now.

It might still be in use for container networking with virtio-user on the
container side.  The basic case is that it's easier to mount a socket file
inside the container than to ask the application to create a socket with a
specific name so that OVS will be able to find it.  The replacement could
be the DPDK memif vdev or the vhost vdev, but at least the vhost vdev
requires some event handlers on the OVS side to track LSC and such.  So,
I'm not sure.
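If dpdkvhostuser does stay around, extending the linear-buffer restriction
to it looks like a small change; a sketch against the DPDK 19.11 API, with
the flag plumbing illustrative:

/* Sketch only: DPDK 19.11 lets the vhost backend demand linear mbufs at
 * registration time, in server and client mode alike. */
uint64_t flags = RTE_VHOST_USER_LINEARBUF_SUPPORT;

if (client_mode) {
    flags |= RTE_VHOST_USER_CLIENT;
}
if (rte_vhost_driver_register(sock_path, flags)) {
    VLOG_ERR("vhost-user socket registration failed: %s", sock_path);
}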

> 
> Aside: if we do intend to remove it, I think we should flag it in NEWS for one release, stating that it will be removed completely by 'OVS 2.X'.
> 
>> If not, then I agree that I missed a warning in case someone tries
>> to use it today.
>>
> This is OK with me if it's being kept for the moment; we could always remove it later if/when dpdkvhostuser is removed.
> 
>>>>> * The third patch is not needed.  A similar patch was already merged recently.
>>>>
>>>> Cool, but that patch wasn't in dpdk-latest, so I can't drop that yet.
>>>>
>>>>> * I don't see the call to rte_eth_tx_prepare().  It is required to prepare
>>>>>    packets for TSO on Intel NICs, which need the pseudo-header checksum
>>>>>    precalculated.
>>>>
>>>> Good catch, will fix in v2.
>>>>
>>>>> * The guest is allowed to disable offloading.  In this case we're not
>>>>>    allowed to send incomplete packets (no checksum, oversized); they will
>>>>>    just be dropped by the guest.
>>>>
>>>> Good point, will fix in v2.
>>>>
>>>>> * NICs might not support TSO.  I see that you're moving this issue to the
>>>>>    user by forcing them to make sure that TSO is supported.  Most HW NICs
>>>>>    nowadays should support TSO, so it might not be a big deal, but
>>>>>    what about the software NICs that DPDK implements?  I'm not sure.
>>>>
>>>> What kind of interface do you have in mind? Since I will need to
>>>> handle VMs with TSO disabled, most probably the same solution
>>>> applies to those software devices.
>>>
>>> Looking at the feature support table, there are even a few HW NICs that
>>> don't support TSO: https://doc.dpdk.org/guides/nics/overview.html
>>>
>>> For example, axgbe and new hns3.
> 
> @Ilya, I understand the concern here, but as TSO would be experimental, I think it would be OK to flag to the user that ports deployed for use with TSO must support the feature (similar to what was done with Partial HWOL).
> 
> It may well be supported for these in the future (or maybe never in some cases), but initially I would hope that documenting it is acceptable.

Yes, sure.

> 
>>>
>>> Among virtual ports, the most obvious is afxdp.
>>
>> Do you have a use case in mind that would use TSO and not have
>> support in the NIC?

I thought about use cases a bit and it seems that they are kind of specific.
One possible case is VM chaining, where the first VM re-assembles TCP frames
and then passes them to VMs on the same host.  But, yes, we could just
document HW support as a requirement for now, as it's an experimental feature.

>>
>>>>> * Don't include virtio headers in dp-packet code.  This will break non-Linux
>>>>>    builds.  In fact, construction of the virtio-net header is only needed
>>>>>    by netdev-linux, so it should be implemented there.
>>>>
>>>> Isn't virtio available in non-Linux environments? I initially had it
>>>> there, but it breaks because it has DPDK dependencies as well.
>>>
>>> I believe that "linux/virtio_net.h" is not available outside of Linux.
>>
>> Sure, I was talking about the virtio structures; then it would be a
>> matter of selecting the right one. If we want to mess with
>> csum and offloading, most probably we will need those fields anyway,
>> or will need to translate them.
>>
>>> BTW, you could always try to check your patch set on CirrusCI with FreeBSD
>>> or on Windows with AppVeyor.
>>
>> I will try that, thanks.
>>
>>> We'll have to move this code if someday we want to use DPDK on BSD
>>> or Windows.  Right now netdev-dpdk depends on Linux, but mostly because
>>> of the vhost-user support.
>>>
> 
> @Ilya, just to clarify: is the concern that we should not introduce this until DPDK is supported in OVS for Windows and BSD?

No, I just want this code to be moved to netdev-linux, i.e. to the only user.

> 
> I guess my concern here is that I'm not sure of the effort required to enable these (is there a requirement to do so?), especially in the 2.13 timeline.

There was a request from a person who tried to build OVS with DPDK on FreeBSD,
but I think it wasn't a hard requirement for anything specific.
So, there is no hurry here.

As for the effort, there shouldn't be much trouble for FreeBSD support.  We
need to compile out vhost-user support from netdev-dpdk, as it depends on
Linux, and we need to implement a few things in the ovs-numa module.  But I
don't think anybody has time for that now.

> 
> As DPDK is only supported with OVS on Linux, I don't think this should block.

I believe that '#include <linux/virtio_net.h>' will break compilation on
non-Linux systems.  The vhost library in DPDK depends on Linux, so it might
not make much sense to enable TSO on other operating systems anyway.  So,
moving the code to netdev-linux seems to be a better solution than guarding
code that doesn't depend on DPDK with '#ifdef DPDK_NETDEV'.

> 
> If, in time, efforts to support Windows and BSD with DPDK come about, then sure, we should revisit it. I just haven't heard a request for this as of yet.
> 
>>>>
>>>>> * netdev-afxdp and netdev-windows are not handled by the patches.
>>>>
>>>> That's correct.
>>>>
>>>>> * I'm not sure if we need separate tso.{c,h} files.
>>>>
>>>> The tso_init() is called from vswitchd/bridge.c, so I am open to
>>>> ideas.
>>>
>>> Need to think about this.
>>>
>>>>
>>>>> * Docs and log messages are very confusing because they give the impression
>>>>>    that the system datapath doesn't support TSO.  Also, why can't we have TSO
>>>>>    without DPDK?  You have a virtio header for netdev-linux, so we can use it.
>>>>
>>>> OK, the doc is under the subdirectory dpdk/, but the vswitchd.xml
>>>> and the other log messages could be clearer.
>>>>
>>>>> * Is it safe to just drop the virtio header on packet receive?
>>>>
>>>> We don't support TSO from other devices yet, so we drop it and
>>>> let the usual checks drop the packets if they are missing a csum or
>>>> are oversized.
>>>>
>>>>> * Not sure if it's fully safe to say that the checksum is valid while it's
>>>>>    not calculated.  Have you checked this from the ipf and conntrack points
>>>>>    of view?
>>>>
>>>> Could you elaborate? I didn't get the question.
>>>
>>> I meant that ipf and conntrack in userspace use the checksum_valid()
>>> functions, and it might be bad if they think that the checksum is correct
>>> while it's not.  Just wanted to be sure that you've checked the callers.
>>
>> For that, yes: the csum is considered valid either if HW RX indicates so
>> or if the TX side requests csum calculation offloading. It trusts that
>> there is no corruption while the packet is within the host.
>>
>> But there are places where the csum is recalculated using the old
>> value, which should also be okay, but I haven't tested that scenario yet.
>>
> +1
>>
>>> One more point:
>>>
>>> * I see that you're claiming that TSO is not supported with tunneling, so
>>>    it might make sense to fail the encapsulation for packets with wrong
>>>    checksums if we're not going to have a software TSO implementation.
>>
>> I will add a log warning if that's the case.
> 
> Probably good to document as a known limitation. It's something that can be worked on as we seek to remove the experimental tag.
> 
> Thanks
> Ian
>>
>>> BTW, something strange happens to your mail account:
>>> """
>>> An error occurred while sending mail. The mail server responded:
>>> 550 5.1.2 <fbl@sysclose.com>: Recipient address rejected: Domain not found.
>>> Please check the message recipient "fbl@sysclose.com" and try again.
>>> """
>>> So, I changed it back to .org one.
>>
>> Oops, I fixed it here, thanks!
>>