[ovs-dev,v4,0/3] Add support for TSO with DPDK

Message ID 20200116170035.261803-1-fbl@sysclose.org

Message

Flavio Leitner Jan. 16, 2020, 5 p.m. UTC
TCP Segmentation Offload, abbreviated as TSO, is a feature which enables
the network stack to delegate TCP segmentation to the NIC, reducing the
per-packet CPU overhead.

A guest using a vhost-user interface with TSO enabled can send TCP packets
much bigger than the MTU, which saves the CPU cycles normally spent breaking
packets down to MTU size and calculating checksums.

It also saves the CPU cycles used to parse multiple packets/headers during
packet processing inside the virtual switch.

If the destination of the packet is another guest on the same host, the
same big packet can be sent through a vhost-user interface, skipping
segmentation completely. However, if the destination is not local, the
NIC hardware is instructed to do the TCP segmentation and checksum
calculation.
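In DPDK terms, "instructing the NIC" means marking the mbuf with the TSO
and checksum offload requests instead of segmenting in software. A minimal
sketch (field values are illustrative, not the exact netdev-dpdk code):

  /* Ask the NIC to segment this oversized TCP packet and to fill in
   * the IPv4 and TCP checksums of every resulting segment. */
  mbuf->l2_len    = sizeof(struct rte_ether_hdr);
  mbuf->l3_len    = ip_hdr_len;    /* e.g. 20 for IPv4 without options */
  mbuf->l4_len    = tcp_hdr_len;   /* e.g. 20 for TCP without options  */
  mbuf->tso_segsz = mtu - mbuf->l3_len - mbuf->l4_len;
  mbuf->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM
                  | PKT_TX_TCP_CKSUM | PKT_TX_TCP_SEG;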

The first two patches are not really part of TSO support, but they are
required to make sure everything works.

There are good improvements sending to or receiving from veth pairs or
tap devices as well. See the iperf3 results below:

[*] veth with ethtool tx off.

VM sending to:          Default           Enabled     Enabled/Default
   Local BR             3 Gbits/sec     23 Gbits/sec      7x
   Net NS (veth)        3 Gbits/sec[*]  22 Gbits/sec      7x
   VM (same host)     2.5 Gbits/sec     24 Gbits/sec      9x
   Ext Host            10 Gbits/sec     35 Gbits/sec      3x
   Ext Host (vxlan)   8.8 Gbits/sec     (not supported) 

  Using VLAN:
   Local BR             3 Gbits/sec     23 Gbits/sec      7x
   VM (same host)     2.5 Gbits/sec     21 Gbits/sec      8x
   Ext Host           6.4 Gbits/sec     34 Gbits/sec      5x

  Using IPv6:
   Net NS (veth)      2.7 Gbits/sec[*]  22 Gbits/sec      8x
   VM (same host)     2.6 Gbits/sec     21 Gbits/sec      8x
   Ext Host           8.7 Gbits/sec     34 Gbits/sec      4x

  Conntrack:
   No packet changes: 1.41 Gbits/sec    33 Gbits/sec      23x

VM receiving from:
   Local BR           2.5 Gbits/sec     2.4 Gbits/sec     1x
   Net NS (veth)      2.5 Gbits/sec[*]  9.3 Gbits/sec     3x
   VM (same host)     4.9 Gbits/sec      25 Gbits/sec     5x
   Ext Host           9.7 Gbits/sec     9.4 Gbits/sec     1x
   Ext Host (vxlan)   5.5 Gbits/sec     (not supported)

  Using VLAN:
   Local BR           2.4 Gbits/sec     2.4 Gbits/sec     1x
   VM (same host)     3.8 Gbits/sec      24 Gbits/sec     8x
   Ext Host           9.5 Gbits/sec     9.5 Gbits/sec     1x

  Using IPv6:
   Net NS (veth)      2.2 Gbits/sec[*]   9 Gbits/sec      4x
   VM (same host)     4.5 Gbits/sec     24 Gbits/sec      5x
   Ext Host           8.9 Gbits/sec    8.9 Gbits/sec      1x

I used iperf3 -u to test UDP traffic limited to the default 1 Mbits/sec
and noticed no change, with the exception of tunneled packets (not
supported).
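For reference, the measurements boil down to iperf3 invocations like the
following (addresses are examples):

  iperf3 -s                  # on the receiver
  iperf3 -c 10.0.0.2         # TCP test
  iperf3 -u -c 10.0.0.2      # UDP test at the default 1 Mbits/sec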

Travis, AppVeyor, and Cirrus-ci passed.

Flavio Leitner (3):
  dp-packet: preserve headroom when cloning a pkt batch
  vhost: Disable multi-segmented buffers
  netdev-dpdk: Add TCP Segmentation Offload support

 Documentation/automake.mk              |   1 +
 Documentation/topics/index.rst         |   1 +
 Documentation/topics/userspace-tso.rst |  98 +++++++
 NEWS                                   |   1 +
 lib/automake.mk                        |   2 +
 lib/conntrack.c                        |  29 +-
 lib/dp-packet.h                        | 192 +++++++++++-
 lib/ipf.c                              |  32 +-
 lib/netdev-dpdk.c                      | 355 ++++++++++++++++++++---
 lib/netdev-linux-private.h             |   5 +
 lib/netdev-linux.c                     | 386 ++++++++++++++++++++++---
 lib/netdev-provider.h                  |   9 +
 lib/netdev.c                           |  78 ++++-
 lib/userspace-tso.c                    |  48 +++
 lib/userspace-tso.h                    |  23 ++
 vswitchd/bridge.c                      |   2 +
 vswitchd/vswitch.xml                   |  17 ++
 17 files changed, 1154 insertions(+), 125 deletions(-)
 create mode 100644 Documentation/topics/userspace-tso.rst
 create mode 100644 lib/userspace-tso.c
 create mode 100644 lib/userspace-tso.h

Comments

Ciara Loftus Jan. 17, 2020, 9:18 a.m. UTC | #1
> -----Original Message-----
> From: Flavio Leitner <fbl@sysclose.org>
> Sent: Thursday 16 January 2020 17:01
> To: dev@openvswitch.org
> Cc: Stokes, Ian <ian.stokes@intel.com>; Loftus, Ciara
> <ciara.loftus@intel.com>; Ilya Maximets <i.maximets@ovn.org>;
> yangyi01@inspur.com; txfh2007 <txfh2007@aliyun.com>; Flavio Leitner
> <fbl@sysclose.org>
> Subject: [PATCH v4 0/3] Add support for TSO with DPDK
> 
> [...]
> 
> [*] veth with ethtool tx off.
> 
> VM sending to:          Default           Enabled     Enabled/Default
>    Local BR             3 Gbits/sec     23 Gbits/sec      7x
>    Net NS (veth)        3 Gbits/sec[*]  22 Gbits/sec      7x
>    VM (same host)     2.5 Gbits/sec     24 Gbits/sec      9x
>    Ext Host            10 Gbits/sec     35 Gbits/sec      3x

I re-ran my tests and observed similar (slightly better, actually) improvements to those I reported for the v3:
VM -> VM (same host): +5.5x
VM -> Ext Host: +4.1x

Tested-by: Ciara Loftus <ciara.loftus@intel.com>

Thanks,
Ciara

> [...]
William Tu Jan. 21, 2020, 6:39 p.m. UTC | #2
On Thu, Jan 16, 2020 at 9:01 AM Flavio Leitner <fbl@sysclose.org> wrote:
>
> [...]
> [*] veth with ethtool tx off.
>

Hi Flavio,

I want to test namespace-to-namespace performance using veth, hoping to
see TSO packets, with the setup below:
  iperf -c (ns0) -> veth peer -> OVS -> veth peer -> iperf -s (ns1)

With current master I'm not able to see large packets being sent.
I compiled OVS with --with-dpdk and:
$ ovs-vsctl set Open_vSwitch . other_config:userspace-tso-enable=true

At ns0 and ns1, I enabled TSO via ethtool ('sg on' and 'tso on').
The veth driver shows
# ip netns exec at_ns0 ethtool -k p0
Features for p0:
Cannot get device udp-fragmentation-offload settings: Operation not supported
rx-checksumming: on
tx-checksumming: off
    tx-checksum-ipv4: off [fixed]
    tx-checksum-ip-generic: off
    tx-checksum-ipv6: off [fixed]
    tx-checksum-fcoe-crc: off [fixed]
    tx-checksum-sctp: off
scatter-gather: on
    tx-scatter-gather: on
    tx-scatter-gather-fraglist: on
tcp-segmentation-offload: off
    tx-tcp-segmentation: off [requested on]
    tx-tcp-ecn-segmentation: off [requested on]
    tx-tcp-mangleid-segmentation: off [requested on]
    tx-tcp6-segmentation: off [requested on]

But I'm still seeing 1500-byte packets, and about 1.3 Gbps throughput.
Is there anything I'm missing?

Thanks
William
Flavio Leitner Jan. 22, 2020, 8:26 a.m. UTC | #3
On Tue, Jan 21, 2020 at 10:39:14AM -0800, William Tu wrote:
> On Thu, Jan 16, 2020 at 9:01 AM Flavio Leitner <fbl@sysclose.org> wrote:
> >
> > [...]
> 
> Hi Flavio,
> 
> I want to test performance of namespace to namespace using veth, hoping to
> see TSO packets. Using below setup:
>   iperf -c (ns0) -> veth peer -> OVS -> veth peer -> iperf -s (ns1)
> 
> With current master I'm not able to see large packet size being sent.
> I compile ovs with --with-dpdk and,
> $ ovs-vsctl set Open_vSwitch . other_config:userspace-tso-enable=true
> 
> At ns0 and ns1, enable tso by ethtool sg and tso on
> The veth driver shows
> # ip netns exec at_ns0 ethtool -k p0
> Features for p0:
> Cannot get device udp-fragmentation-offload settings: Operation not supported
> rx-checksumming: on
> tx-checksumming: off
^^^^^^^^^^^^^^^^^^^^^

That disables TSO. Before TSO was supported we had to do '[*] veth
with ethtool tx off' exactly to avoid the large packets and still be
able to run the iperf3 tests, etc...

You must leave tx on (default) to enable TSO in veth pairs.
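For reference, a minimal sketch of that veth/namespace topology (names
and addresses are examples; br0 is assumed to be an existing userspace
bridge, i.e. datapath_type=netdev):

  ip netns add ns0
  ip link add p0 type veth peer name ovs-p0
  ip link set p0 netns ns0
  ip link set ovs-p0 up
  ovs-vsctl add-port br0 ovs-p0
  ip netns exec ns0 ip addr add 10.0.0.1/24 dev p0
  ip netns exec ns0 ip link set p0 up
  # repeat for ns1/p1/ovs-p1 with 10.0.0.2/24; leave tx checksumming
  # on (the default) so the veth can emit TSO frames
  ip netns exec ns1 iperf3 -s &
  ip netns exec ns0 iperf3 -c 10.0.0.2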

HTH,
fbl
William Tu Jan. 22, 2020, 5:32 p.m. UTC | #4
On Wed, Jan 22, 2020 at 12:26 AM Flavio Leitner <fbl@sysclose.org> wrote:
>
> On Tue, Jan 21, 2020 at 10:39:14AM -0800, William Tu wrote:
> > On Thu, Jan 16, 2020 at 9:01 AM Flavio Leitner <fbl@sysclose.org> wrote:
> > > [...]
> >
> > Hi Flavio,
> >
> > I want to test performance of namespace to namespace using veth, hoping to
> > see TSO packets. Using below setup:
> >   iperf -c (ns0) -> veth peer -> OVS -> veth peer -> iperf -s (ns1)
> >
> > With current master I'm not able to see large packet size being sent.
> > I compile ovs with --with-dpdk and,
> > $ ovs-vsctl set Open_vSwitch . other_config:userspace-tso-enable=true
> >
> > At ns0 and ns1, enable tso by ethtool sg and tso on
> > The veth driver shows
> > # ip netns exec at_ns0 ethtool -k p0
> > Features for p0:
> > Cannot get device udp-fragmentation-offload settings: Operation not supported
> > rx-checksumming: on
> > tx-checksumming: off
> ^^^^^^^^^^^^^^^^^^^^^
>
> That disables TSO. We had to do '[*] veth with ethtool tx off'
> before TSO is supported exactly to avoid the large packets to
> be able to run iperf3 tests, etc...
>
> You must leave tx on (default) to enable TSO in veth pairs.

Hi Flavio,

Thanks! With this setup:
    iperf3 -c (ns0) -> veth peer -> OVS -> veth peer -> iperf3 -s (ns1)
I got it working now. I can see TCP packets with 64k length.

without TSO: 1.3Gbps
with TSO: 6Gbps

Do you know whether, for af_packet (netdev-linux.c), switching to
packet mmap [1] instead of recvmmsg would improve performance?

[1] https://www.mjmwired.net/kernel/Documentation/networking/packet_mmap.txt
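For context, the rx-ring setup described in [1] boils down to roughly
the following (TPACKET_V2, error handling omitted; a sketch, not a
proposed netdev-linux change):

  #include <arpa/inet.h>
  #include <linux/if_ether.h>
  #include <linux/if_packet.h>
  #include <sys/mman.h>
  #include <sys/socket.h>

  int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
  int ver = TPACKET_V2;
  setsockopt(fd, SOL_PACKET, PACKET_VERSION, &ver, sizeof ver);

  struct tpacket_req req = {
      .tp_block_size = 4096,
      .tp_block_nr   = 64,
      .tp_frame_size = 2048,
      .tp_frame_nr   = (4096 / 2048) * 64,   /* frames/block * blocks */
  };
  setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &req, sizeof req);

  /* Frames are then consumed in place by polling tp_status of each
   * struct tpacket2_hdr in the ring, instead of copying every packet
   * through recvmmsg(). */
  void *ring = mmap(NULL, (size_t)req.tp_block_size * req.tp_block_nr,
                    PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);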

Regards,
William
Flavio Leitner Jan. 24, 2020, 2:42 p.m. UTC | #5
On Wed, Jan 22, 2020 at 09:32:31AM -0800, William Tu wrote:
[...]
> Hi Flavio,
[...]
> Do you know that for af_packet (netdev-linux.c), if we switch to use
> packet mmap[1]
> instead of recvmmsg, will it improve performance?
> 
> [1] https://www.mjmwired.net/kernel/Documentation/networking/packet_mmap.txt

I have not tried that, so I don't know.
Yifeng Sun Jan. 27, 2020, 5:24 p.m. UTC | #6
Hi Flavio,

I am testing your patch using iperf between 2 VMs on the same host,
but it seems that a TCP connection can't be established between these 2 VMs.
When inspecting further, I found that TCP packets have invalid checksums.
This might be the reason.

I am wondering if I missed something in the setup? Thanks a lot.

Best,
Yifeng

On Thu, Jan 16, 2020 at 9:01 AM Flavio Leitner <fbl@sysclose.org> wrote:
> [...]
Ilya Maximets Jan. 27, 2020, 8:09 p.m. UTC | #7
On 27.01.2020 18:24, Yifeng Sun wrote:
> Hi Flavio,
> 
> I am testing your patch using iperf between 2 VMs on the same host.
> But it seems that TCP connection can't be created between these 2 VMs.
> When inspecting further, I found that TCP packets have invalid checksums.
> This might be the reason.
> 
> I am wondering if I missed something in the setup? Thanks a lot.

I didn't test it myself, but according to the current design, checksum
offloading (rx and tx) should be enabled in both VMs.  Otherwise all the
packets will be dropped by the guest kernel.

Best regards, Ilya Maximets.
Yifeng Sun Jan. 28, 2020, 1:17 a.m. UTC | #8
Hi Ilya,

Thanks for your reply.

The thing is, if checksum offloading is enabled in both VMs, then the
sender VM will send a packet with an invalid TCP checksum, and later
OVS will send this packet to the receiver VM directly without
calculating a valid checksum. As a result, the receiver VM will drop
this packet because it contains an invalid checksum. This is what
happened when I tried this patch.

Best,
Yifeng

On Mon, Jan 27, 2020 at 12:09 PM Ilya Maximets <i.maximets@ovn.org> wrote:
> [...]
Flavio Leitner Jan. 28, 2020, noon UTC | #9
On Mon, Jan 27, 2020 at 05:17:01PM -0800, Yifeng Sun wrote:
> Hi Ilya,
> 
> Thanks for your reply.
> 
> The thing is, if checksum offloading is enabled in both VMs, then
> sender VM will send
> a packet with invalid TCP checksum, and later OVS will send this
> packet to receiver
> VM directly without calculating a valid checksum. As a result,
> receiver VM will drop
> this packet because it contains invalid checksum. This is what
> happened when I tried
> this patch.
> 

When TSO is enabled, TX checksum offloading is required, so you will
see invalid checksums. This is well documented here:

https://github.com/openvswitch/ovs/blob/master/Documentation/topics/userspace-tso.rst#userspace-datapath---tso

"Additionally, if the traffic is headed to a VM within the same host
further optimization can be expected. As the traffic never leaves
the machine, no MTU needs to be accounted for, and thus no
segmentation and checksum calculations are required, which saves yet
more cycles."

Therefore, it's expected to see bad csum in the traffic dumps.

To use the feature, you need a few steps: enable the feature in OvS,
enable it in qemu and inside the VM. The Linux guest usually enables
the feature by default if qemu offers it.
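For reference, on the qemu command line those per-port offloads look
roughly like this (chardev id and socket path are examples; libvirt
generates the equivalent from its domain XML):

  -chardev socket,id=char0,path=/tmp/vhost-user0,server
  -netdev type=vhost-user,id=net0,chardev=char0
  -device virtio-net-pci,netdev=net0,csum=on,guest_csum=on,host_tso4=on,host_tso6=on,guest_tso4=on,guest_tso6=on,mrg_rxbuf=on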

HTH,
fbl


> [...]
Yifeng Sun Jan. 28, 2020, 10:21 p.m. UTC | #10
Hi Flavio,

Thanks for the explanation. I followed the steps in the document but
the TCP connection still failed to establish between the 2 VMs.

I finally modified the VM's kernel directly to disable TCP checksum
validation to get it working properly. I got 30.0 Gbps with 'iperf'
between the 2 VMs.

Best,
Yifeng


On Tue, Jan 28, 2020 at 4:00 AM Flavio Leitner <fbl@sysclose.org> wrote:
> [...]
Flavio Leitner Jan. 28, 2020, 10:52 p.m. UTC | #11
On Tue, Jan 28, 2020 at 02:21:30PM -0800, Yifeng Sun wrote:
> Hi Flavio,
> 
> Thanks for the explanation. I followed the steps in the document but
> TCP connection still failed to build between 2 VMs.
> 
> I finally modified VM's kernel directly to disable TCP checksum validation
> to get it working properly. I got 30.0Gbps for 'iperf' between 2 VMs.

Could you provide more details on how you did that? What's running
inside the VM?

I don't change anything inside of the VMs (Linux) in my testbed.

fbl


> [...]
Yifeng Sun Jan. 28, 2020, 11:23 p.m. UTC | #12
Sure.

Firstly, make sure userspace-tso-enable is true
# ovs-vsctl get Open_vSwitch . other_config
{dpdk-init="true", enable-statistics="true", pmd-cpu-mask="0xf",
userspace-tso-enable="true"}

Next, create 2 VMs with vhostuser-type interface on the same KVM host:
    <interface type='vhostuser'>
      <mac address='88:69:00:00:00:11'/>
      <source type='unix' path='/tmp/041afca0-6e11-4eab-a62f-1ccf5cd318fd' mode='server'/>
      <model type='virtio'/>
      <driver queues='2' rx_queue_size='512'>
        <host csum='on' tso4='on' tso6='on'/>
        <guest csum='on' tso4='on' tso6='on'/>
      </driver>
      <alias name='net2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </interface>

When the VM boots up, turn on tx, tso and sg:
# ethtool -K ens6 tx on
# ethtool -K ens6 tso on
# ethtool -K ens6 sg on

Then run 'iperf -s' on one VM and 'iperf -c xx.xx.xx.xx' on the other VM.
Iperf doesn't work if there is no change to the VM's kernel. `tcpdump`
shows that the iperf server received packets with invalid TCP checksums.
`nstat -a` shows that the TcpInCsumErr counter is accumulating.
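For reference, one way to observe this from inside the receiving VM
(interface name and port are examples):

  tcpdump -i ens6 -vv tcp port 5001   # prints 'cksum ... (incorrect)'
  nstat -az TcpInCsumErr              # counter grows while iperf runs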

After making the changes below to the VM's kernel, iperf works properly.
in tcp_v4_rcv()
      - if (skb_checksum_init(skb, IPPROTO_TCP, inet_compute_pseudo))
      + if (skb_checksum_init(skb, IPPROTO_TCP, inet_compute_pseudo))

static inline bool tcp_checksum_complete(struct sk_buff *skb)
{
        return 0;
}



Best,
Yifeng

On Tue, Jan 28, 2020 at 2:52 PM Flavio Leitner <fbl@sysclose.org> wrote:
> [...]
Flavio Leitner Jan. 29, 2020, 11:25 a.m. UTC | #13
On Tue, Jan 28, 2020 at 03:23:02PM -0800, Yifeng Sun wrote:
> Sure.
> 
> Firstly, make sure userspace-tso-enable is true
> # ovs-vsctl get Open_vSwitch . other_config
> {dpdk-init="true", enable-statistics="true", pmd-cpu-mask="0xf",
> userspace-tso-enable="true"}
> 
> Next, create 2 VMs with vhostuser-type interface on the same KVM host:
>     <interface type='vhostuser'>
>       <mac address='88:69:00:00:00:11'/>
>       <source type='unix'
> path='/tmp/041afca0-6e11-4eab-a62f-1ccf5cd318fd' mode='server'/>
>       <model type='virtio'/>
>       <driver queues='2' rx_queue_size='512'>
>         <host csum='on' tso4='on' tso6='on'/>
>         <guest csum='on' tso4='on' tso6='on'/>

I have other options set, but I don't think they are related:
       <host csum='on' gso='off' tso4='on' tso6='on' ecn='off'
ufo='off' mrg_rxbuf='on'/>
       <guest csum='on' tso4='on' tso6='on' ecn='off' ufo='off'/>


>       </driver>
>       <alias name='net2'/>
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x06'
> function='0x0'/>
>     </interface>
> 
> When VM boots up, turn on tx, tso and sg
> # ethtool -K ens6 tx on
> # ethtool -K ens6 tso on
> # ethtool -K ens6 sg on

All the needed offloading features are turned on by default,
so I don't change anything in my testbed.

> Then run 'iperf -s' on one VM and 'iperf -c xx.xx.xx.xx' on another VM.
> > Iperf doesn't work if there is no change to the VM's kernel. `tcpdump` shows
> that iperf server received packets with invalid TCP checksum.
> `nstat -a` shows that TcpInCsumErr number is accumulating.
> 
> After adding changes to VM's kernel as below, iperf works properly.
> in tcp_v4_rcv()
>       - if (skb_checksum_init(skb, IPPROTO_TCP, inet_compute_pseudo))
>       + if (skb_checksum_init(skb, IPPROTO_TCP, inet_compute_pseudo))
> 
> static inline bool tcp_checksum_complete(struct sk_buff *skb)
> {
>         return 0;
> }

That's odd. Which kernel is that? Maybe I can try the same version.
I am using 5.2.14-200.fc30.x86_64.

Looks like somehow the packet lost its offloading flags, so the kernel
has to check the csum, and since it wasn't calculated before, it's
just random garbage.

fbl


> [...]
Ilya Maximets Jan. 29, 2020, 12:07 p.m. UTC | #14
On 29.01.2020 12:25, Flavio Leitner wrote:
> On Tue, Jan 28, 2020 at 03:23:02PM -0800, Yifeng Sun wrote:
>> Sure.
>>
>> Firstly, make sure userspace-tso-enable is true
>> # ovs-vsctl get Open_vSwitch . other_config
>> {dpdk-init="true", enable-statistics="true", pmd-cpu-mask="0xf",
>> userspace-tso-enable="true"}
>>
>> Next, create 2 VMs with vhostuser-type interface on the same KVM host:
>>     <interface type='vhostuser'>
>>       <mac address='88:69:00:00:00:11'/>
>>       <source type='unix'
>> path='/tmp/041afca0-6e11-4eab-a62f-1ccf5cd318fd' mode='server'/>
>>       <model type='virtio'/>
>>       <driver queues='2' rx_queue_size='512'>
>>         <host csum='on' tso4='on' tso6='on'/>
>>         <guest csum='on' tso4='on' tso6='on'/>
> 
> I have other options set, but I don't think they are related:
>        <host csum='on' gso='off' tso4='on' tso6='on' ecn='off'
> ufo='off' mrg_rxbuf='on'/>
>        <guest csum='on' tso4='on' tso6='on' ecn='off' ufo='off'/>
> 
> 
>>       </driver>
>>       <alias name='net2'/>
>>       <address type='pci' domain='0x0000' bus='0x00' slot='0x06'
>> function='0x0'/>
>>     </interface>
>>
>> When VM boots up, turn on tx, tso and sg
>> # ethtool -K ens6 tx on
>> # ethtool -K ens6 tso on
>> # ethtool -K ens6 sg on

Could you, please, provide the output of 'ethtool -k ens6'?
If for some reason rx offloading is not enabled by default, you need to
enable it too.

> [...]
Yifeng Sun Jan. 29, 2020, 5:04 p.m. UTC | #15
Hi Flavio,

Sorry, in my last email one change was incorrect. It should be:
in tcp_v4_rcv()
      - if (skb_checksum_init(skb, IPPROTO_TCP, inet_compute_pseudo))
      + if (0)

The kernel version I am using is Ubuntu 18.04's default kernel:
$ uname -r
4.15.0-76-generic

Thanks,
Yifeng

On Wed, Jan 29, 2020 at 3:25 AM Flavio Leitner <fbl@sysclose.org> wrote:
> [...]
Yifeng Sun Jan. 29, 2020, 5:06 p.m. UTC | #16
Hi Ilya,

The whole output of 'ethtool -k ens6' is here:

$ ethtool -k ens6
Features for ens6:
rx-checksumming: on [fixed]
tx-checksumming: on
        tx-checksum-ipv4: off [fixed]
        tx-checksum-ip-generic: on
        tx-checksum-ipv6: off [fixed]
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off [fixed]
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: on
        tx-tcp-mangleid-segmentation: on
        tx-tcp6-segmentation: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: on [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
yfs@ubuntu:~$ ethtool -k ens6 | grep rx
rx-checksumming: on [fixed]
rx-vlan-offload: off [fixed]
rx-vlan-filter: on [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]

Thanks,
Yifeng

On Wed, Jan 29, 2020 at 4:07 AM Ilya Maximets <i.maximets@ovn.org> wrote:
>
> On 29.01.2020 12:25, Flavio Leitner wrote:
> > On Tue, Jan 28, 2020 at 03:23:02PM -0800, Yifeng Sun wrote:
> >> Sure.
> >>
> >> Firstly, make sure userspace-tso-enable is true
> >> # ovs-vsctl get Open_vSwitch . other_config
> >> {dpdk-init="true", enable-statistics="true", pmd-cpu-mask="0xf",
> >> userspace-tso-enable="true"}
> >>
> >> Next, create 2 VMs with vhostuser-type interface on the same KVM host:
> >>     <interface type='vhostuser'>
> >>       <mac address='88:69:00:00:00:11'/>
> >>       <source type='unix'
> >> path='/tmp/041afca0-6e11-4eab-a62f-1ccf5cd318fd' mode='server'/>
> >>       <model type='virtio'/>
> >>       <driver queues='2' rx_queue_size='512'>
> >>         <host csum='on' tso4='on' tso6='on'/>
> >>         <guest csum='on' tso4='on' tso6='on'/>
> >
> > I have other options set, but I don't think they are related:
> >        <host csum='on' gso='off' tso4='on' tso6='on' ecn='off'
> > ufo='off' mrg_rxbuf='on'/>
> >        <guest csum='on' tso4='on' tso6='on' ecn='off' ufo='off'/>
> >
> >
> >>       </driver>
> >>       <alias name='net2'/>
> >>       <address type='pci' domain='0x0000' bus='0x00' slot='0x06'
> >> function='0x0'/>
> >>     </interface>
> >>
> >> When VM boots up, turn on tx, tso and sg
> >> # ethtool -K ens6 tx on
> >> # ethtool -K ens6 tso on
> >> # ethtool -K ens6 sg on
>
> Could you, please, provide the output of 'ethtool -k ens6'?
> If for some reason rx offloading is not enabled by default, you need to
> enable it too.
>
> >
> > All the needed offloading features are turned on by default,
> > so I don't change anything in my testbed.
> >
> >> Then run 'iperf -s' on one VM and 'iperf -c xx.xx.xx.xx' on another VM.
> >> Iperf doesn't work if there is no change to the VM's kernel. `tcpdump` shows
> >> that iperf server received packets with invalid TCP checksum.
> >> `nstat -a` shows that TcpInCsumErr number is accumulating.
> >>
> >> After adding changes to VM's kernel as below, iperf works properly.
> >> in tcp_v4_rcv()
> >>       - if (skb_checksum_init(skb, IPPROTO_TCP, inet_compute_pseudo))
> >>       + if (skb_checksum_init(skb, IPPROTO_TCP, inet_compute_pseudo))
> >>
> >> static inline bool tcp_checksum_complete(struct sk_buff *skb)
> >> {
> >>         return 0;
> >> }
> >
> > That's odd. Which kernel is that? Maybe I can try the same version.
> > I am using 5.2.14-200.fc30.x86_64.
> >
> > Looks like somehow the packet lost its offloading flags, so the kernel
> > has to check the csum, and since it wasn't calculated before, it's
> > just random garbage.
> >
> > fbl
> >
> >
> >>
> >>
> >>
> >> Best,
> >> Yifeng
> >>
> >> On Tue, Jan 28, 2020 at 2:52 PM Flavio Leitner <fbl@sysclose.org> wrote:
> >>>
> >>> On Tue, Jan 28, 2020 at 02:21:30PM -0800, Yifeng Sun wrote:
> >>>> Hi Flavio,
> >>>>
> >>>> Thanks for the explanation. I followed the steps in the document but
> >>>> the TCP connection still could not be established between the 2 VMs.
> >>>>
> >>>> I finally modified VM's kernel directly to disable TCP checksum validation
> >>>> to get it working properly. I got 30.0Gbps for 'iperf' between 2 VMs.
> >>>
> >>> Could you provide more details on how you did that? What's running
> >>> inside the VM?
> >>>
> >>> I don't change anything inside of the VMs (Linux) in my testbed.
> >>>
> >>> fbl
> >>>
> >>>
> >>>>
> >>>> Best,
> >>>> Yifeng
> >>>>
> >>>>
> >>>> On Tue, Jan 28, 2020 at 4:00 AM Flavio Leitner <fbl@sysclose.org> wrote:
> >>>>>
> >>>>> On Mon, Jan 27, 2020 at 05:17:01PM -0800, Yifeng Sun wrote:
> >>>>>> Hi Ilya,
> >>>>>>
> >>>>>> Thanks for your reply.
> >>>>>>
> >>>>>> The thing is, if checksum offloading is enabled in both VMs, then
> >>>>>> sender VM will send
> >>>>>> a packet with invalid TCP checksum, and later OVS will send this
> >>>>>> packet to receiver
> >>>>>> VM directly without calculating a valid checksum. As a result,
> >>>>>> receiver VM will drop
> >>>>>> this packet because it contains invalid checksum. This is what
> >>>>>> happened when I tried
> >>>>>> this patch.
> >>>>>>
> >>>>>
> >>>>> When TSO is enabled, the TX checksum offloading is required, so
> >>>>> you will see invalid checksums. This is well documented here:
> >>>>>
> >>>>> https://github.com/openvswitch/ovs/blob/master/Documentation/topics/userspace-tso.rst#userspace-datapath---tso
> >>>>>
> >>>>> "Additionally, if the traffic is headed to a VM within the same host
> >>>>> further optimization can be expected. As the traffic never leaves
> >>>>> the machine, no MTU needs to be accounted for, and thus no
> >>>>> segmentation and checksum calculations are required, which saves yet
> >>>>> more cycles."
> >>>>>
> >>>>> Therefore, it's expected to see bad csum in the traffic dumps.
> >>>>>
> >>>>> To use the feature, you need a few steps: enable the feature in OvS,
> >>>>> in qemu, and inside the VM. The Linux guest usually enables
> >>>>> the feature by default if qemu offers it.
> >>>>>
> >>>>> HTH,
> >>>>> fbl
> >>>>>
> >>>>>
> >>>>>> Best,
> >>>>>> Yifeng
> >>>>>>
> >>>>>> On Mon, Jan 27, 2020 at 12:09 PM Ilya Maximets <i.maximets@ovn.org> wrote:
> >>>>>>>
> >>>>>>> On 27.01.2020 18:24, Yifeng Sun wrote:
> >>>>>>>> Hi Flavio,
> >>>>>>>>
> >>>>>>>> I am testing your patch using iperf between 2 VMs on the same host.
> >>>>>>>> But it seems that TCP connection can't be created between these 2 VMs.
> >>>>>>>> When inspecting further, I found that TCP packets have invalid checksums.
> >>>>>>>> This might be the reason.
> >>>>>>>>
> >>>>>>>> I am wondering if I missed something in the setup? Thanks a lot.
> >>>>>>>
> >>>>>>> I didn't test myself, but according to current design, checksum offloading
> >>>>>>> (rx and tx) shuld be enabled in both VMs.  Otherwise all the packets will
> >>>>>>> be dropped by the guest kernel.
> >>>>>>>
> >>>>>>> Best regards, Ilya Maximets.
> >>>>>
> >>>>> --
> >>>>> fbl
> >>>
> >>> --
> >>> fbl
> >
William Tu Jan. 29, 2020, 7:19 p.m. UTC | #17
On Wed, Jan 29, 2020 at 3:25 AM Flavio Leitner <fbl@sysclose.org> wrote:
>
> On Tue, Jan 28, 2020 at 03:23:02PM -0800, Yifeng Sun wrote:
> > Sure.
> >
> > Firstly, make sure userspace-tso-enable is true
> > # ovs-vsctl get Open_vSwitch . other_config
> > {dpdk-init="true", enable-statistics="true", pmd-cpu-mask="0xf",
> > userspace-tso-enable="true"}
> >
> > Next, create 2 VMs with vhostuser-type interface on the same KVM host:
> >     <interface type='vhostuser'>
> >       <mac address='88:69:00:00:00:11'/>
> >       <source type='unix'
> > path='/tmp/041afca0-6e11-4eab-a62f-1ccf5cd318fd' mode='server'/>
> >       <model type='virtio'/>
> >       <driver queues='2' rx_queue_size='512'>
> >         <host csum='on' tso4='on' tso6='on'/>
> >         <guest csum='on' tso4='on' tso6='on'/>
>
> I have other options set, but I don't think they are related:
>        <host csum='on' gso='off' tso4='on' tso6='on' ecn='off'
> ufo='off' mrg_rxbuf='on'/>
>        <guest csum='on' tso4='on' tso6='on' ecn='off' ufo='off'/>

Is mrg_rxbuf required to be on?
I saw that when enabling userspace tso, we are setting the external buffer
flag RTE_VHOST_USER_EXTBUF_SUPPORT

Is this the same thing?
William
Flavio Leitner Jan. 29, 2020, 9:21 p.m. UTC | #18
On Wed, Jan 29, 2020 at 11:19:47AM -0800, William Tu wrote:
> On Wed, Jan 29, 2020 at 3:25 AM Flavio Leitner <fbl@sysclose.org> wrote:
> >
> > On Tue, Jan 28, 2020 at 03:23:02PM -0800, Yifeng Sun wrote:
> > > Sure.
> > >
> > > Firstly, make sure userspace-tso-enable is true
> > > # ovs-vsctl get Open_vSwitch . other_config
> > > {dpdk-init="true", enable-statistics="true", pmd-cpu-mask="0xf",
> > > userspace-tso-enable="true"}
> > >
> > > Next, create 2 VMs with vhostuser-type interface on the same KVM host:
> > >     <interface type='vhostuser'>
> > >       <mac address='88:69:00:00:00:11'/>
> > >       <source type='unix'
> > > path='/tmp/041afca0-6e11-4eab-a62f-1ccf5cd318fd' mode='server'/>
> > >       <model type='virtio'/>
> > >       <driver queues='2' rx_queue_size='512'>
> > >         <host csum='on' tso4='on' tso6='on'/>
> > >         <guest csum='on' tso4='on' tso6='on'/>
> >
> > I have other options set, but I don't think they are related:
> >        <host csum='on' gso='off' tso4='on' tso6='on' ecn='off'
> > ufo='off' mrg_rxbuf='on'/>
> >        <guest csum='on' tso4='on' tso6='on' ecn='off' ufo='off'/>
> 
> Is mrg_rxbuf required to be on?

No.

> I saw that when enabling userspace tso, we are setting the external buffer
> flag RTE_VHOST_USER_EXTBUF_SUPPORT

Yes.

> Is this the same thing?

No.

mrg_rxbuf says that we want the virtio ring to support chained ring
entries. If that is disabled, the virtio ring will be populated with
entries of maximum buffer length. If that is enabled, a packet will
use one or more chained entries in the virtio ring, so each entry can
be of a smaller length. That is not visible to OvS.

The RTE_VHOST_USER_EXTBUF_SUPPORT flag tells how a packet is provided
to OvS after it has been pulled out of the virtio rings. We have three
options currently:

1) LINEARBUF
It supports data length up to the packet provided (~MTU size).

2) EXTBUF
If the packet is too big for #1, allocate a buffer large enough
to fit the data. We get a big packet, but instead of data being
along with the packet's metadata, it's in an external buffer.

<packet [packet metadata] [ unused buffer ]>
           +---> [ big buffer]

Well, actually we make partial use of unused buffer to store
struct rte_mbuf_ext_shared_info.

3) If neither LINEARBUF nor EXTBUF is provided (default),
vhost lib can provide large packets as a chain of mbufs, which
OvS doesn't support today.
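
As a rough sketch of how #1 and #2 map onto the DPDK vhost API
(illustrative only, not the exact OvS code; register_vhost_port,
sock_path and tso_enabled are made-up names for the example):

    #include <stdbool.h>
    #include <stdint.h>
    #include <rte_vhost.h>

    static int
    register_vhost_port(const char *sock_path, bool tso_enabled)
    {
        uint64_t flags = RTE_VHOST_USER_CLIENT;

        if (tso_enabled) {
            /* Request linear mbufs (#1), with an external buffer (#2)
             * as the fallback for packets larger than the mbuf data
             * room.  Without these flags the vhost lib may hand back
             * a chain of mbufs (#3), which OvS doesn't handle. */
            flags |= RTE_VHOST_USER_LINEARBUF_SUPPORT
                     | RTE_VHOST_USER_EXTBUF_SUPPORT;
        }

        return rte_vhost_driver_register(sock_path, flags);
    }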

HTH,
Yifeng Sun Jan. 29, 2020, 10:42 p.m. UTC | #19
Hi Flavio,

I found this piece of code in kernel's drivers/net/virtio_net.c and
its function receive_buf():
    if (hdr->hdr.flags & VIRTIO_NET_HDR_F_DATA_VALID)
                skb->ip_summed = CHECKSUM_UNNECESSARY;
My understanding is that vhost_user needs to set flag
VIRTIO_NET_HDR_F_DATA_VALID so that
guest's kernel will skip packet's checksum validation.

Then I looked through dpdk's source code but didn't find any place
that sets this flag. So I made
some changes as below, and TCP starts working between 2 VMs without
any kernel change.

diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 73bf98bd9..5e45db655 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -437,6 +437,7 @@ virtio_enqueue_offload(struct rte_mbuf *m_buf,
struct virtio_net_hdr *net_hdr)
                ASSIGN_UNLESS_EQUAL(net_hdr->csum_start, 0);
                ASSIGN_UNLESS_EQUAL(net_hdr->csum_offset, 0);
-               ASSIGN_UNLESS_EQUAL(net_hdr->flags, 0);
+               net_hdr->flags = VIRTIO_NET_HDR_F_DATA_VALID;
        }

        /* IP cksum verification cannot be bypassed, then calculate here */


Any comments will be appreciated!

Thanks a lot,
Yifeng

On Wed, Jan 29, 2020 at 1:21 PM Flavio Leitner <fbl@sysclose.org> wrote:
>
> On Wed, Jan 29, 2020 at 11:19:47AM -0800, William Tu wrote:
> > On Wed, Jan 29, 2020 at 3:25 AM Flavio Leitner <fbl@sysclose.org> wrote:
> > >
> > > On Tue, Jan 28, 2020 at 03:23:02PM -0800, Yifeng Sun wrote:
> > > > Sure.
> > > >
> > > > Firstly, make sure userspace-tso-enable is true
> > > > # ovs-vsctl get Open_vSwitch . other_config
> > > > {dpdk-init="true", enable-statistics="true", pmd-cpu-mask="0xf",
> > > > userspace-tso-enable="true"}
> > > >
> > > > Next, create 2 VMs with vhostuser-type interface on the same KVM host:
> > > >     <interface type='vhostuser'>
> > > >       <mac address='88:69:00:00:00:11'/>
> > > >       <source type='unix'
> > > > path='/tmp/041afca0-6e11-4eab-a62f-1ccf5cd318fd' mode='server'/>
> > > >       <model type='virtio'/>
> > > >       <driver queues='2' rx_queue_size='512'>
> > > >         <host csum='on' tso4='on' tso6='on'/>
> > > >         <guest csum='on' tso4='on' tso6='on'/>
> > >
> > > I have other options set, but I don't think they are related:
> > >        <host csum='on' gso='off' tso4='on' tso6='on' ecn='off'
> > > ufo='off' mrg_rxbuf='on'/>
> > >        <guest csum='on' tso4='on' tso6='on' ecn='off' ufo='off'/>
> >
> > Is mrg_rxbuf required to be on?
>
> No.
>
> > I saw that when enabling userspace tso, we are setting the external buffer
> > flag RTE_VHOST_USER_EXTBUF_SUPPORT
>
> Yes.
>
> > Is this the same thing?
>
> No.
>
> mrg_rxbuf says that we want the virtio ring to support chained ring
> entries. If that is disabled, the virtio ring will be populated with
> entries of maximum buffer length. If that is enabled, a packet will
> use one or more chained entries in the virtio ring, so each entry can
> be of a smaller length. That is not visible to OvS.
>
> The RTE_VHOST_USER_EXTBUF_SUPPORT flag tells how a packet is provided
> to OvS after it has been pulled out of the virtio rings. We have three
> options currently:
>
> 1) LINEARBUF
> It supports data length up to the packet provided (~MTU size).
>
> 2) EXTBUF
> If the packet is too big for #1, allocate a buffer large enough
> to fit the data. We get a big packet, but instead of data being
> along with the packet's metadata, it's in an external buffer.
>
> <packet [packet metadata] [ unused buffer ]>
>            +---> [ big buffer]
>
> Well, actually we make partial use of unused buffer to store
> struct rte_mbuf_ext_shared_info.
>
> 3) If neither LINEARBUF nor EXTBUF is provided (default),
> vhost lib can provide large packets as a chain of mbufs, which
> OvS doesn't support today.
>
> HTH,
> --
> fbl
Flavio Leitner Jan. 29, 2020, 11:04 p.m. UTC | #20
On Wed, Jan 29, 2020 at 02:42:27PM -0800, Yifeng Sun wrote:
> Hi Flavio,

Hi Yifeng, thanks for looking into this.

> I found this piece of code in kernel's drivers/net/virtio_net.c and
> its function receive_buf():
>     if (hdr->hdr.flags & VIRTIO_NET_HDR_F_DATA_VALID)
>                 skb->ip_summed = CHECKSUM_UNNECESSARY;
> My understanding is that vhost_user needs to set flag
> VIRTIO_NET_HDR_F_DATA_VALID so that
> guest's kernel will skip packet's checksum validation.
> 
> Then I looked through dpdk's source code but didn't find any place
> that sets this flag. So I made
> some changes as below, and TCP starts working between 2 VMs without
> any kernel change.
> 
> diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
> index 73bf98bd9..5e45db655 100644
> --- a/lib/librte_vhost/virtio_net.c
> +++ b/lib/librte_vhost/virtio_net.c
> @@ -437,6 +437,7 @@ virtio_enqueue_offload(struct rte_mbuf *m_buf,
> struct virtio_net_hdr *net_hdr)
>                 ASSIGN_UNLESS_EQUAL(net_hdr->csum_start, 0);
>                 ASSIGN_UNLESS_EQUAL(net_hdr->csum_offset, 0);
> -               ASSIGN_UNLESS_EQUAL(net_hdr->flags, 0);
> +               net_hdr->flags = VIRTIO_NET_HDR_F_DATA_VALID;
>         }
> 
>         /* IP cksum verification cannot be bypassed, then calculate here */

No, it actually uses ->flags to pass VIRTIO_NET_HDR_F_NEEDS_CSUM and 
then we pass the start and offset.
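
To make that concrete, here is a minimal sketch of what the start and
offset mean in terms of struct virtio_net_hdr (assuming an untagged
IPv4/TCP packet; fill_net_hdr, l2_len and l3_len are made-up names,
not the actual dpdk code):

    #include <stdint.h>
    #include <stddef.h>
    #include <linux/tcp.h>
    #include <linux/virtio_net.h>

    static void
    fill_net_hdr(struct virtio_net_hdr *hdr, uint16_t l2_len, uint16_t l3_len)
    {
        /* Ask the receiver to finish the checksum itself... */
        hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
        /* ...starting at the beginning of the TCP header... */
        hdr->csum_start = l2_len + l3_len;
        /* ...and to store the result at the TCP checksum field. */
        hdr->csum_offset = offsetof(struct tcphdr, check);
    }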

HTH,
fbl

> 
> 
> Any comments will be appreciated!
> 
> Thanks a lot,
> Yifeng
> 
> On Wed, Jan 29, 2020 at 1:21 PM Flavio Leitner <fbl@sysclose.org> wrote:
> >
> > On Wed, Jan 29, 2020 at 11:19:47AM -0800, William Tu wrote:
> > > On Wed, Jan 29, 2020 at 3:25 AM Flavio Leitner <fbl@sysclose.org> wrote:
> > > >
> > > > On Tue, Jan 28, 2020 at 03:23:02PM -0800, Yifeng Sun wrote:
> > > > > Sure.
> > > > >
> > > > > Firstly, make sure userspace-tso-enable is true
> > > > > # ovs-vsctl get Open_vSwitch . other_config
> > > > > {dpdk-init="true", enable-statistics="true", pmd-cpu-mask="0xf",
> > > > > userspace-tso-enable="true"}
> > > > >
> > > > > Next, create 2 VMs with vhostuser-type interface on the same KVM host:
> > > > >     <interface type='vhostuser'>
> > > > >       <mac address='88:69:00:00:00:11'/>
> > > > >       <source type='unix'
> > > > > path='/tmp/041afca0-6e11-4eab-a62f-1ccf5cd318fd' mode='server'/>
> > > > >       <model type='virtio'/>
> > > > >       <driver queues='2' rx_queue_size='512'>
> > > > >         <host csum='on' tso4='on' tso6='on'/>
> > > > >         <guest csum='on' tso4='on' tso6='on'/>
> > > >
> > > > I have other options set, but I don't think they are related:
> > > >        <host csum='on' gso='off' tso4='on' tso6='on' ecn='off'
> > > > ufo='off' mrg_rxbuf='on'/>
> > > >        <guest csum='on' tso4='on' tso6='on' ecn='off' ufo='off'/>
> > >
> > > Is mrg_rxbuf required to be on?
> >
> > No.
> >
> > > I saw that when enabling userspace tso, we are setting the external buffer
> > > flag RTE_VHOST_USER_EXTBUF_SUPPORT
> >
> > Yes.
> >
> > > Is this the same thing?
> >
> > No.
> >
> > mrg_rxbuf says that we want the virtio ring to support chained ring
> > entries. If that is disabled, the virtio ring will be populated with
> > entries of maximum buffer length. If that is enabled, a packet will
> > use one or more chained entries in the virtio ring, so each entry can
> > be of a smaller length. That is not visible to OvS.
> >
> > The RTE_VHOST_USER_EXTBUF_SUPPORT flag tells how a packet is provided
> > to OvS after it has been pulled out of the virtio rings. We have three
> > options currently:
> >
> > 1) LINEARBUF
> > It supports data length up to the packet provided (~MTU size).
> >
> > 2) EXTBUF
> > If the packet is too big for #1, allocate a buffer large enough
> > to fit the data. We get a big packet, but instead of data being
> > along with the packet's metadata, it's in an external buffer.
> >
> > <packet [packet metadata] [ unused buffer ]>
> >            +---> [ big buffer]
> >
> > Well, actually we make partial use of unused buffer to store
> > struct rte_mbuf_ext_shared_info.
> >
> > 3) If neither LINEARBUF nor EXTBUF is provided (default),
> > vhost lib can provide large packets as a chain of mbufs, which
> > OvS doesn't support today.
> >
> > HTH,
> > --
> > fbl
Yifeng Sun Jan. 29, 2020, 11:12 p.m. UTC | #21
Got it. Thanks.
Yifeng

On Wed, Jan 29, 2020 at 3:04 PM Flavio Leitner <fbl@sysclose.org> wrote:
>
> On Wed, Jan 29, 2020 at 02:42:27PM -0800, Yifeng Sun wrote:
> > Hi Flavio,
>
> Hi Yifend, thanks for looking into this.
>
> > I found this piece of code in kernel's drivers/net/virtio_net.c and
> > its function receive_buf():
> >     if (hdr->hdr.flags & VIRTIO_NET_HDR_F_DATA_VALID)
> >                 skb->ip_summed = CHECKSUM_UNNECESSARY;
> > My understanding is that vhost_user needs to set flag
> > VIRTIO_NET_HDR_F_DATA_VALID so that
> > guest's kernel will skip packet's checksum validation.
> >
> > Then I looked through dpdk's source code but didn't find any place
> > that sets this flag. So I made
> > some changes as below, and TCP starts working between 2 VMs without
> > any kernel change.
> >
> > diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
> > index 73bf98bd9..5e45db655 100644
> > --- a/lib/librte_vhost/virtio_net.c
> > +++ b/lib/librte_vhost/virtio_net.c
> > @@ -437,6 +437,7 @@ virtio_enqueue_offload(struct rte_mbuf *m_buf,
> > struct virtio_net_hdr *net_hdr)
> >                 ASSIGN_UNLESS_EQUAL(net_hdr->csum_start, 0);
> >                 ASSIGN_UNLESS_EQUAL(net_hdr->csum_offset, 0);
> > -               ASSIGN_UNLESS_EQUAL(net_hdr->flags, 0);
> > +               net_hdr->flags = VIRTIO_NET_HDR_F_DATA_VALID;
> >         }
> >
> >         /* IP cksum verification cannot be bypassed, then calculate here */
>
> No, it actually uses ->flags to pass VIRTIO_NET_HDR_F_NEEDS_CSUM and
> then we pass the start and offset.
>
> HTH,
> fbl
>
> >
> >
> > Any comments will be appreciated!
> >
> > Thanks a lot,
> > Yifeng
> >
> > On Wed, Jan 29, 2020 at 1:21 PM Flavio Leitner <fbl@sysclose.org> wrote:
> > >
> > > On Wed, Jan 29, 2020 at 11:19:47AM -0800, William Tu wrote:
> > > > On Wed, Jan 29, 2020 at 3:25 AM Flavio Leitner <fbl@sysclose.org> wrote:
> > > > >
> > > > > On Tue, Jan 28, 2020 at 03:23:02PM -0800, Yifeng Sun wrote:
> > > > > > Sure.
> > > > > >
> > > > > > Firstly, make sure userspace-tso-enable is true
> > > > > > # ovs-vsctl get Open_vSwitch . other_config
> > > > > > {dpdk-init="true", enable-statistics="true", pmd-cpu-mask="0xf",
> > > > > > userspace-tso-enable="true"}
> > > > > >
> > > > > > Next, create 2 VMs with vhostuser-type interface on the same KVM host:
> > > > > >     <interface type='vhostuser'>
> > > > > >       <mac address='88:69:00:00:00:11'/>
> > > > > >       <source type='unix'
> > > > > > path='/tmp/041afca0-6e11-4eab-a62f-1ccf5cd318fd' mode='server'/>
> > > > > >       <model type='virtio'/>
> > > > > >       <driver queues='2' rx_queue_size='512'>
> > > > > >         <host csum='on' tso4='on' tso6='on'/>
> > > > > >         <guest csum='on' tso4='on' tso6='on'/>
> > > > >
> > > > > I have other options set, but I don't think they are related:
> > > > >        <host csum='on' gso='off' tso4='on' tso6='on' ecn='off'
> > > > > ufo='off' mrg_rxbuf='on'/>
> > > > >        <guest csum='on' tso4='on' tso6='on' ecn='off' ufo='off'/>
> > > >
> > > > Is mrg_rxbuf required to be on?
> > >
> > > No.
> > >
> > > > I saw that when enabling userspace tso, we are setting the external buffer
> > > > flag RTE_VHOST_USER_EXTBUF_SUPPORT
> > >
> > > Yes.
> > >
> > > > Is this the same thing?
> > >
> > > No.
> > >
> > > mrg_rxbuf says that we want the virtio ring to support chained ring
> > > entries. If that is disabled, the virtio ring will be populated with
> > > entries of maximum buffer length. If that is enabled, a packet will
> > > use one or more chained entries in the virtio ring, so each entry can
> > > be of a smaller length. That is not visible to OvS.
> > >
> > > The RTE_VHOST_USER_EXTBUF_SUPPORT flag tells how a packet is provided
> > > to OvS after it has been pulled out of the virtio rings. We have three
> > > options currently:
> > >
> > > 1) LINEARBUF
> > > It supports data length up to the packet provided (~MTU size).
> > >
> > > 2) EXTBUF
> > > If the packet is too big for #1, allocate a buffer large enough
> > > to fit the data. We get a big packet, but instead of data being
> > > along with the packet's metadata, it's in an external buffer.
> > >
> > > <packet [packet metadata] [ unused buffer ]>
> > >            +---> [ big buffer]
> > >
> > > Well, actually we make partial use of unused buffer to store
> > > struct rte_mbuf_ext_shared_info.
> > >
> > > 3) If neither LINEARBUF nor EXTBUF is provided (default),
> > > vhost lib can provide large packets as a chain of mbufs, which
> > > OvS doesn't support today.
> > >
> > > HTH,
> > > --
> > > fbl
>
> --
> fbl
Flavio Leitner Feb. 13, 2020, 8:05 p.m. UTC | #22
Hi Yifeng,

Sorry for the late response.

On Wed, Jan 29, 2020 at 09:04:39AM -0800, Yifeng Sun wrote:
> Hi Flavio,
> 
> Sorry, in my last email one change was incorrect. It should be:
> in tcp_v4_rcv()
>       - if (skb_checksum_init(skb, IPPROTO_TCP, inet_compute_pseudo))
>       + if (0)
> 
> The kernel version I am using is ubuntu 18.04's default kernel:
> $ uname -r
> 4.15.0-76-generic

I deployed a VM with 4.15.0 from upstream and I can ssh, scp (back
and forth), iperf3 (direct, reverse, with TCP or UDP) between that
VM and another VM, veth, bridge and another host without issues.

Any chance for you to try with the same upstream kernel version?

Thanks,
fbl

> 
> Thanks,
> Yifeng
> 
> On Wed, Jan 29, 2020 at 3:25 AM Flavio Leitner <fbl@sysclose.org> wrote:
> >
> > On Tue, Jan 28, 2020 at 03:23:02PM -0800, Yifeng Sun wrote:
> > > Sure.
> > >
> > > Firstly, make sure userspace-tso-enable is true
> > > # ovs-vsctl get Open_vSwitch . other_config
> > > {dpdk-init="true", enable-statistics="true", pmd-cpu-mask="0xf",
> > > userspace-tso-enable="true"}
> > >
> > > Next, create 2 VMs with vhostuser-type interface on the same KVM host:
> > >     <interface type='vhostuser'>
> > >       <mac address='88:69:00:00:00:11'/>
> > >       <source type='unix'
> > > path='/tmp/041afca0-6e11-4eab-a62f-1ccf5cd318fd' mode='server'/>
> > >       <model type='virtio'/>
> > >       <driver queues='2' rx_queue_size='512'>
> > >         <host csum='on' tso4='on' tso6='on'/>
> > >         <guest csum='on' tso4='on' tso6='on'/>
> >
> > I have other options set, but I don't think they are related:
> >        <host csum='on' gso='off' tso4='on' tso6='on' ecn='off'
> > ufo='off' mrg_rxbuf='on'/>
> >        <guest csum='on' tso4='on' tso6='on' ecn='off' ufo='off'/>
> >
> >
> > >       </driver>
> > >       <alias name='net2'/>
> > >       <address type='pci' domain='0x0000' bus='0x00' slot='0x06'
> > > function='0x0'/>
> > >     </interface>
> > >
> > > When VM boots up, turn on tx, tso and sg
> > > # ethtool -K ens6 tx on
> > > # ethtool -K ens6 tso on
> > > # ethtool -K ens6 sg on
> >
> > All the needed offloading features are turned on by default,
> > so I don't change anything in my testbed.
> >
> > > Then run 'iperf -s' on one VM and 'iperf -c xx.xx.xx.xx' on another VM.
> > > Iperf doesn't work if there is no change to the VM's kernel. `tcpdump` shows
> > > that iperf server received packets with invalid TCP checksum.
> > > `nstat -a` shows that TcpInCsumErr number is accumulating.
> > >
> > > After adding changes to VM's kernel as below, iperf works properly.
> > > in tcp_v4_rcv()
> > >       - if (skb_checksum_init(skb, IPPROTO_TCP, inet_compute_pseudo))
> > >       + if (skb_checksum_init(skb, IPPROTO_TCP, inet_compute_pseudo))
> > >
> > > static inline bool tcp_checksum_complete(struct sk_buff *skb)
> > > {
> > >         return 0;
> > > }
> >
> > That's odd. Which kernel is that? Maybe I can try the same version.
> > I am using 5.2.14-200.fc30.x86_64.
> >
> > Looks like somehow the packet lost its offloading flags, so the kernel
> > has to check the csum, and since it wasn't calculated before, it's
> > just random garbage.
> >
> > fbl
> >
> >
> > >
> > >
> > >
> > > Best,
> > > Yifeng
> > >
> > > On Tue, Jan 28, 2020 at 2:52 PM Flavio Leitner <fbl@sysclose.org> wrote:
> > > >
> > > > On Tue, Jan 28, 2020 at 02:21:30PM -0800, Yifeng Sun wrote:
> > > > > Hi Flavio,
> > > > >
> > > > > Thanks for the explanation. I followed the steps in the document but
> > > > > the TCP connection still could not be established between the 2 VMs.
> > > > >
> > > > > I finally modified VM's kernel directly to disable TCP checksum validation
> > > > > to get it working properly. I got 30.0Gbps for 'iperf' between 2 VMs.
> > > >
> > > > Could you provide more details on how you did that? What's running
> > > > inside the VM?
> > > >
> > > > I don't change anything inside of the VMs (Linux) in my testbed.
> > > >
> > > > fbl
> > > >
> > > >
> > > > >
> > > > > Best,
> > > > > Yifeng
> > > > >
> > > > >
> > > > > On Tue, Jan 28, 2020 at 4:00 AM Flavio Leitner <fbl@sysclose.org> wrote:
> > > > > >
> > > > > > On Mon, Jan 27, 2020 at 05:17:01PM -0800, Yifeng Sun wrote:
> > > > > > > Hi Ilya,
> > > > > > >
> > > > > > > Thanks for your reply.
> > > > > > >
> > > > > > > The thing is, if checksum offloading is enabled in both VMs, then
> > > > > > > sender VM will send
> > > > > > > a packet with invalid TCP checksum, and later OVS will send this
> > > > > > > packet to receiver
> > > > > > > VM directly without calculating a valid checksum. As a result,
> > > > > > > receiver VM will drop
> > > > > > > this packet because it contains invalid checksum. This is what
> > > > > > > happened when I tried
> > > > > > > this patch.
> > > > > > >
> > > > > >
> > > > > > When TSO is enabled, the TX checksum offloading is required, so
> > > > > > you will see invalid checksums. This is well documented here:
> > > > > >
> > > > > > https://github.com/openvswitch/ovs/blob/master/Documentation/topics/userspace-tso.rst#userspace-datapath---tso
> > > > > >
> > > > > > "Additionally, if the traffic is headed to a VM within the same host
> > > > > > further optimization can be expected. As the traffic never leaves
> > > > > > the machine, no MTU needs to be accounted for, and thus no
> > > > > > segmentation and checksum calculations are required, which saves yet
> > > > > > more cycles."
> > > > > >
> > > > > > Therefore, it's expected to see bad csum in the traffic dumps.
> > > > > >
> > > > > > To use the feature, you need a few steps: enable the feature in OvS,
> > > > > > in qemu, and inside the VM. The Linux guest usually enables
> > > > > > the feature by default if qemu offers it.
> > > > > >
> > > > > > HTH,
> > > > > > fbl
> > > > > >
> > > > > >
> > > > > > > Best,
> > > > > > > Yifeng
> > > > > > >
> > > > > > > On Mon, Jan 27, 2020 at 12:09 PM Ilya Maximets <i.maximets@ovn.org> wrote:
> > > > > > > >
> > > > > > > > On 27.01.2020 18:24, Yifeng Sun wrote:
> > > > > > > > > Hi Flavio,
> > > > > > > > >
> > > > > > > > > I am testing your patch using iperf between 2 VMs on the same host.
> > > > > > > > > But it seems that TCP connection can't be created between these 2 VMs.
> > > > > > > > > When inspecting further, I found that TCP packets have invalid checksums.
> > > > > > > > > This might be the reason.
> > > > > > > > >
> > > > > > > > > I am wondering if I missed something in the setup? Thanks a lot.
> > > > > > > >
> > > > > > > > I didn't test myself, but according to current design, checksum offloading
> > > > > > > > (rx and tx) should be enabled in both VMs.  Otherwise all the packets will
> > > > > > > > be dropped by the guest kernel.
> > > > > > > >
> > > > > > > > Best regards, Ilya Maximets.
> > > > > >
> > > > > > --
> > > > > > fbl
> > > >
> > > > --
> > > > fbl
> >
> > --
> > fbl
Yifeng Sun Feb. 14, 2020, 5:44 p.m. UTC | #23
Hi Flavio,

Can you please confirm the kernel versions you are using?

Host KVM: 5.2.14-200.fc30.x86_64.
VM: 4.15.0 from upstream ubuntu.

Thanks,
Yifeng

On Thu, Feb 13, 2020 at 12:05 PM Flavio Leitner <fbl@sysclose.org> wrote:
>
>
> Hi Yifeng,
>
> Sorry for the late response.
>
> On Wed, Jan 29, 2020 at 09:04:39AM -0800, Yifeng Sun wrote:
> > Hi Flavio,
> >
> > Sorry, in my last email one change was incorrect. It should be:
> > in tcp_v4_rcv()
> >       - if (skb_checksum_init(skb, IPPROTO_TCP, inet_compute_pseudo))
> >       + if (0)
> >
> > The kernel version I am using is ubuntu 18.04's default kernel:
> > $ uname -r
> > 4.15.0-76-generic
>
> I deployed a VM with 4.15.0 from upstream and I can ssh, scp (back
> and forth), iperf3 (direct, reverse, with TCP or UDP) between that
> VM and another VM, veth, bridge and another host without issues.
>
> Any chance for you to try with the same upstream kernel version?
>
> Thanks,
> fbl
>
> >
> > Thanks,
> > Yifeng
> >
> > On Wed, Jan 29, 2020 at 3:25 AM Flavio Leitner <fbl@sysclose.org> wrote:
> > >
> > > On Tue, Jan 28, 2020 at 03:23:02PM -0800, Yifeng Sun wrote:
> > > > Sure.
> > > >
> > > > Firstly, make sure userspace-tso-enable is true
> > > > # ovs-vsctl get Open_vSwitch . other_config
> > > > {dpdk-init="true", enable-statistics="true", pmd-cpu-mask="0xf",
> > > > userspace-tso-enable="true"}
> > > >
> > > > Next, create 2 VMs with vhostuser-type interface on the same KVM host:
> > > >     <interface type='vhostuser'>
> > > >       <mac address='88:69:00:00:00:11'/>
> > > >       <source type='unix'
> > > > path='/tmp/041afca0-6e11-4eab-a62f-1ccf5cd318fd' mode='server'/>
> > > >       <model type='virtio'/>
> > > >       <driver queues='2' rx_queue_size='512'>
> > > >         <host csum='on' tso4='on' tso6='on'/>
> > > >         <guest csum='on' tso4='on' tso6='on'/>
> > >
> > > I have other options set, but I don't think they are related:
> > >        <host csum='on' gso='off' tso4='on' tso6='on' ecn='off'
> > > ufo='off' mrg_rxbuf='on'/>
> > >        <guest csum='on' tso4='on' tso6='on' ecn='off' ufo='off'/>
> > >
> > >
> > > >       </driver>
> > > >       <alias name='net2'/>
> > > >       <address type='pci' domain='0x0000' bus='0x00' slot='0x06'
> > > > function='0x0'/>
> > > >     </interface>
> > > >
> > > > When VM boots up, turn on tx, tso and sg
> > > > # ethtool -K ens6 tx on
> > > > # ethtool -K ens6 tso on
> > > > # ethtool -K ens6 sg on
> > >
> > > All the needed offloading features are turned on by default,
> > > so I don't change anything in my testbed.
> > >
> > > > Then run 'iperf -s' on one VM and 'iperf -c xx.xx.xx.xx' on another VM.
> > > > Iperf doesn't work if there is no change to the VM's kernel. `tcpdump` shows
> > > > that iperf server received packets with invalid TCP checksum.
> > > > `nstat -a` shows that TcpInCsumErr number is accumulating.
> > > >
> > > > After adding changes to VM's kernel as below, iperf works properly.
> > > > in tcp_v4_rcv()
> > > >       - if (skb_checksum_init(skb, IPPROTO_TCP, inet_compute_pseudo))
> > > >       + if (skb_checksum_init(skb, IPPROTO_TCP, inet_compute_pseudo))
> > > >
> > > > static inline bool tcp_checksum_complete(struct sk_buff *skb)
> > > > {
> > > >         return 0;
> > > > }
> > >
> > > That's odd. Which kernel is that? Maybe I can try the same version.
> > > I am using 5.2.14-200.fc30.x86_64.
> > >
> > > Looks like somehow the packet lost its offloading flags, so the kernel
> > > has to check the csum, and since it wasn't calculated before, it's
> > > just random garbage.
> > >
> > > fbl
> > >
> > >
> > > >
> > > >
> > > >
> > > > Best,
> > > > Yifeng
> > > >
> > > > On Tue, Jan 28, 2020 at 2:52 PM Flavio Leitner <fbl@sysclose.org> wrote:
> > > > >
> > > > > On Tue, Jan 28, 2020 at 02:21:30PM -0800, Yifeng Sun wrote:
> > > > > > Hi Flavio,
> > > > > >
> > > > > > Thanks for the explanation. I followed the steps in the document but
> > > > > > the TCP connection still could not be established between the 2 VMs.
> > > > > >
> > > > > > I finally modified VM's kernel directly to disable TCP checksum validation
> > > > > > to get it working properly. I got 30.0Gbps for 'iperf' between 2 VMs.
> > > > >
> > > > > Could you provide more details on how you did that? What's running
> > > > > inside the VM?
> > > > >
> > > > > I don't change anything inside of the VMs (Linux) in my testbed.
> > > > >
> > > > > fbl
> > > > >
> > > > >
> > > > > >
> > > > > > Best,
> > > > > > Yifeng
> > > > > >
> > > > > >
> > > > > > On Tue, Jan 28, 2020 at 4:00 AM Flavio Leitner <fbl@sysclose.org> wrote:
> > > > > > >
> > > > > > > On Mon, Jan 27, 2020 at 05:17:01PM -0800, Yifeng Sun wrote:
> > > > > > > > Hi Ilya,
> > > > > > > >
> > > > > > > > Thanks for your reply.
> > > > > > > >
> > > > > > > > The thing is, if checksum offloading is enabled in both VMs, then
> > > > > > > > sender VM will send
> > > > > > > > a packet with invalid TCP checksum, and later OVS will send this
> > > > > > > > packet to receiver
> > > > > > > > VM directly without calculating a valid checksum. As a result,
> > > > > > > > receiver VM will drop
> > > > > > > > this packet because it contains invalid checksum. This is what
> > > > > > > > happened when I tried
> > > > > > > > this patch.
> > > > > > > >
> > > > > > >
> > > > > > > When TSO is enabled, the TX checksum offloading is required, so
> > > > > > > you will see invalid checksums. This is well documented here:
> > > > > > >
> > > > > > > https://github.com/openvswitch/ovs/blob/master/Documentation/topics/userspace-tso.rst#userspace-datapath---tso
> > > > > > >
> > > > > > > "Additionally, if the traffic is headed to a VM within the same host
> > > > > > > further optimization can be expected. As the traffic never leaves
> > > > > > > the machine, no MTU needs to be accounted for, and thus no
> > > > > > > segmentation and checksum calculations are required, which saves yet
> > > > > > > more cycles."
> > > > > > >
> > > > > > > Therefore, it's expected to see bad csum in the traffic dumps.
> > > > > > >
> > > > > > > To use the feature, you need a few steps: enable the feature in OvS,
> > > > > > > in qemu, and inside the VM. The Linux guest usually enables
> > > > > > > the feature by default if qemu offers it.
> > > > > > >
> > > > > > > HTH,
> > > > > > > fbl
> > > > > > >
> > > > > > >
> > > > > > > > Best,
> > > > > > > > Yifeng
> > > > > > > >
> > > > > > > > On Mon, Jan 27, 2020 at 12:09 PM Ilya Maximets <i.maximets@ovn.org> wrote:
> > > > > > > > >
> > > > > > > > > On 27.01.2020 18:24, Yifeng Sun wrote:
> > > > > > > > > > Hi Flavio,
> > > > > > > > > >
> > > > > > > > > > I am testing your patch using iperf between 2 VMs on the same host.
> > > > > > > > > > But it seems that TCP connection can't be created between these 2 VMs.
> > > > > > > > > > When inspecting further, I found that TCP packets have invalid checksums.
> > > > > > > > > > This might be the reason.
> > > > > > > > > >
> > > > > > > > > > I am wondering if I missed something in the setup? Thanks a lot.
> > > > > > > > >
> > > > > > > > > I didn't test myself, but according to current design, checksum offloading
> > > > > > > > > (rx and tx) should be enabled in both VMs.  Otherwise all the packets will
> > > > > > > > > be dropped by the guest kernel.
> > > > > > > > >
> > > > > > > > > Best regards, Ilya Maximets.
> > > > > > >
> > > > > > > --
> > > > > > > fbl
> > > > >
> > > > > --
> > > > > fbl
> > >
> > > --
> > > fbl
>
> --
> fbl
Flavio Leitner Feb. 14, 2020, 7:29 p.m. UTC | #24
On Fri, Feb 14, 2020 at 09:44:52AM -0800, Yifeng Sun wrote:
> Hi Flavio,
> 
> Can you please confirm the kernel versions you are using?
> 
> Host KVM: 5.2.14-200.fc30.x86_64.

Host KVM: 5.5.0+

> VM: 4.15.0 from upstream ubuntu.

VM: 4.15.0 from Linus git tree.

fbl

> 
> Thanks,
> Yifeng
> 
> On Thu, Feb 13, 2020 at 12:05 PM Flavio Leitner <fbl@sysclose.org> wrote:
> >
> >
> > Hi Yifeng,
> >
> > Sorry for the late response.
> >
> > On Wed, Jan 29, 2020 at 09:04:39AM -0800, Yifeng Sun wrote:
> > > Hi Flavio,
> > >
> > > Sorry, in my last email one change was incorrect. It should be:
> > > in tcp_v4_rcv()
> > >       - if (skb_checksum_init(skb, IPPROTO_TCP, inet_compute_pseudo))
> > >       + if (0)
> > >
> > > The kernel version I am using is ubuntu 18.04's default kernel:
> > > $ uname -r
> > > 4.15.0-76-generic
> >
> > I deployed a VM with 4.15.0 from upstream and I can ssh, scp (back
> > and forth), iperf3 (direct, reverse, with TCP or UDP) between that
> > VM and another VM, veth, bridge and another host without issues.
> >
> > Any chance for you to try with the same upstream kernel version?
> >
> > Thanks,
> > fbl
> >
> > >
> > > Thanks,
> > > Yifeng
> > >
> > > On Wed, Jan 29, 2020 at 3:25 AM Flavio Leitner <fbl@sysclose.org> wrote:
> > > >
> > > > On Tue, Jan 28, 2020 at 03:23:02PM -0800, Yifeng Sun wrote:
> > > > > Sure.
> > > > >
> > > > > Firstly, make sure userspace-tso-enable is true
> > > > > # ovs-vsctl get Open_vSwitch . other_config
> > > > > {dpdk-init="true", enable-statistics="true", pmd-cpu-mask="0xf",
> > > > > userspace-tso-enable="true"}
> > > > >
> > > > > Next, create 2 VMs with vhostuser-type interface on the same KVM host:
> > > > >     <interface type='vhostuser'>
> > > > >       <mac address='88:69:00:00:00:11'/>
> > > > >       <source type='unix'
> > > > > path='/tmp/041afca0-6e11-4eab-a62f-1ccf5cd318fd' mode='server'/>
> > > > >       <model type='virtio'/>
> > > > >       <driver queues='2' rx_queue_size='512'>
> > > > >         <host csum='on' tso4='on' tso6='on'/>
> > > > >         <guest csum='on' tso4='on' tso6='on'/>
> > > >
> > > > I have other options set, but I don't think they are related:
> > > >        <host csum='on' gso='off' tso4='on' tso6='on' ecn='off'
> > > > ufo='off' mrg_rxbuf='on'/>
> > > >        <guest csum='on' tso4='on' tso6='on' ecn='off' ufo='off'/>
> > > >
> > > >
> > > > >       </driver>
> > > > >       <alias name='net2'/>
> > > > >       <address type='pci' domain='0x0000' bus='0x00' slot='0x06'
> > > > > function='0x0'/>
> > > > >     </interface>
> > > > >
> > > > > When VM boots up, turn on tx, tso and sg
> > > > > # ethtool -K ens6 tx on
> > > > > # ethtool -K ens6 tso on
> > > > > # ethtool -K ens6 sg on
> > > >
> > > > All the needed offloading features are turned on by default,
> > > > so I don't change anything in my testbed.
> > > >
> > > > > Then run 'iperf -s' on one VM and 'iperf -c xx.xx.xx.xx' on another VM.
> > > > > Iperf doesn't work if there is no change to the VM's kernel. `tcpdump` shows
> > > > > that iperf server received packets with invalid TCP checksum.
> > > > > `nstat -a` shows that TcpInCsumErr number is accumulating.
> > > > >
> > > > > After adding changes to VM's kernel as below, iperf works properly.
> > > > > in tcp_v4_rcv()
> > > > >       - if (skb_checksum_init(skb, IPPROTO_TCP, inet_compute_pseudo))
> > > > >       + if (skb_checksum_init(skb, IPPROTO_TCP, inet_compute_pseudo))
> > > > >
> > > > > static inline bool tcp_checksum_complete(struct sk_buff *skb)
> > > > > {
> > > > >         return 0;
> > > > > }
> > > >
> > > > That's odd. Which kernel is that? Maybe I can try the same version.
> > > > I am using 5.2.14-200.fc30.x86_64.
> > > >
> > > > Looks like somehow the packet lost its offloading flags, so the kernel
> > > > has to check the csum, and since it wasn't calculated before, it's
> > > > just random garbage.
> > > >
> > > > fbl
> > > >
> > > >
> > > > >
> > > > >
> > > > >
> > > > > Best,
> > > > > Yifeng
> > > > >
> > > > > On Tue, Jan 28, 2020 at 2:52 PM Flavio Leitner <fbl@sysclose.org> wrote:
> > > > > >
> > > > > > On Tue, Jan 28, 2020 at 02:21:30PM -0800, Yifeng Sun wrote:
> > > > > > > Hi Flavio,
> > > > > > >
> > > > > > > Thanks for the explanation. I followed the steps in the document but
> > > > > > > the TCP connection still could not be established between the 2 VMs.
> > > > > > >
> > > > > > > I finally modified VM's kernel directly to disable TCP checksum validation
> > > > > > > to get it working properly. I got 30.0Gbps for 'iperf' between 2 VMs.
> > > > > >
> > > > > > Could you provide more details on how you did that? What's running
> > > > > > inside the VM?
> > > > > >
> > > > > > I don't change anything inside of the VMs (Linux) in my testbed.
> > > > > >
> > > > > > fbl
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > Best,
> > > > > > > Yifeng
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Jan 28, 2020 at 4:00 AM Flavio Leitner <fbl@sysclose.org> wrote:
> > > > > > > >
> > > > > > > > On Mon, Jan 27, 2020 at 05:17:01PM -0800, Yifeng Sun wrote:
> > > > > > > > > Hi Ilya,
> > > > > > > > >
> > > > > > > > > Thanks for your reply.
> > > > > > > > >
> > > > > > > > > The thing is, if checksum offloading is enabled in both VMs, then
> > > > > > > > > sender VM will send
> > > > > > > > > a packet with invalid TCP checksum, and later OVS will send this
> > > > > > > > > packet to receiver
> > > > > > > > > VM directly without calculating a valid checksum. As a result,
> > > > > > > > > receiver VM will drop
> > > > > > > > > this packet because it contains invalid checksum. This is what
> > > > > > > > > happened when I tried
> > > > > > > > > this patch.
> > > > > > > > >
> > > > > > > >
> > > > > > > > When TSO is enabled, the TX checksum offloading is required, so
> > > > > > > > you will see invalid checksums. This is well documented here:
> > > > > > > >
> > > > > > > > https://github.com/openvswitch/ovs/blob/master/Documentation/topics/userspace-tso.rst#userspace-datapath---tso
> > > > > > > >
> > > > > > > > "Additionally, if the traffic is headed to a VM within the same host
> > > > > > > > further optimization can be expected. As the traffic never leaves
> > > > > > > > the machine, no MTU needs to be accounted for, and thus no
> > > > > > > > segmentation and checksum calculations are required, which saves yet
> > > > > > > > more cycles."
> > > > > > > >
> > > > > > > > Therefore, it's expected to see bad csum in the traffic dumps.
> > > > > > > >
> > > > > > > > To use the feature, you need a few steps: enable the feature in OvS,
> > > > > > > > in qemu, and inside the VM. The Linux guest usually enables
> > > > > > > > the feature by default if qemu offers it.
> > > > > > > >
> > > > > > > > HTH,
> > > > > > > > fbl
> > > > > > > >
> > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Yifeng
> > > > > > > > >
> > > > > > > > > On Mon, Jan 27, 2020 at 12:09 PM Ilya Maximets <i.maximets@ovn.org> wrote:
> > > > > > > > > >
> > > > > > > > > > On 27.01.2020 18:24, Yifeng Sun wrote:
> > > > > > > > > > > Hi Flavio,
> > > > > > > > > > >
> > > > > > > > > > > I am testing your patch using iperf between 2 VMs on the same host.
> > > > > > > > > > > But it seems that TCP connection can't be created between these 2 VMs.
> > > > > > > > > > > When inspecting further, I found that TCP packets have invalid checksums.
> > > > > > > > > > > This might be the reason.
> > > > > > > > > > >
> > > > > > > > > > > I am wondering if I missed something in the setup? Thanks a lot.
> > > > > > > > > >
> > > > > > > > > > I didn't test myself, but according to current design, checksum offloading
> > > > > > > > > > (rx and tx) should be enabled in both VMs.  Otherwise all the packets will
> > > > > > > > > > be dropped by the guest kernel.
> > > > > > > > > >
> > > > > > > > > > Best regards, Ilya Maximets.
> > > > > > > >
> > > > > > > > --
> > > > > > > > fbl
> > > > > >
> > > > > > --
> > > > > > fbl
> > > >
> > > > --
> > > > fbl
> >
> > --
> > fbl
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Yifeng Sun Feb. 14, 2020, 9:35 p.m. UTC | #25
Got it, thanks!

Yifeng

On Fri, Feb 14, 2020 at 11:29 AM Flavio Leitner <fbl@sysclose.org> wrote:
>
> On Fri, Feb 14, 2020 at 09:44:52AM -0800, Yifeng Sun wrote:
> > Hi Flavio,
> >
> > Can you please confirm the kernel versions you are using?
> >
> > Host KVM: 5.2.14-200.fc30.x86_64.
>
> Host KVM: 5.5.0+
>
> > VM: 4.15.0 from upstream ubuntu.
>
> VM: 4.15.0 from Linus git tree.
>
> fbl
>
> >
> > Thanks,
> > Yifeng
> >
> > On Thu, Feb 13, 2020 at 12:05 PM Flavio Leitner <fbl@sysclose.org> wrote:
> > >
> > >
> > > Hi Yifeng,
> > >
> > > Sorry for the late response.
> > >
> > > On Wed, Jan 29, 2020 at 09:04:39AM -0800, Yifeng Sun wrote:
> > > > Hi Flavio,
> > > >
> > > > Sorry, in my last email one change was incorrect. It should be:
> > > > in tcp_v4_rcv()
> > > >       - if (skb_checksum_init(skb, IPPROTO_TCP, inet_compute_pseudo))
> > > >       + if (0)
> > > >
> > > > The kernel version I am using is ubuntu 18.04's default kernel:
> > > > $ uname -r
> > > > 4.15.0-76-generic
> > >
> > > I deployed a VM with 4.15.0 from upstream and I can ssh, scp (back
> > > and forth), iperf3 (direct, reverse, with TCP or UDP) between that
> > > VM and another VM, veth, bridge and another host without issues.
> > >
> > > Any chance for you to try with the same upstream kernel version?
> > >
> > > Thanks,
> > > fbl
> > >
> > > >
> > > > Thanks,
> > > > Yifeng
> > > >
> > > > On Wed, Jan 29, 2020 at 3:25 AM Flavio Leitner <fbl@sysclose.org> wrote:
> > > > >
> > > > > On Tue, Jan 28, 2020 at 03:23:02PM -0800, Yifeng Sun wrote:
> > > > > > Sure.
> > > > > >
> > > > > > Firstly, make sure userspace-tso-enable is true
> > > > > > # ovs-vsctl get Open_vSwitch . other_config
> > > > > > {dpdk-init="true", enable-statistics="true", pmd-cpu-mask="0xf",
> > > > > > userspace-tso-enable="true"}
> > > > > >
> > > > > > Next, create 2 VMs with vhostuser-type interface on the same KVM host:
> > > > > >     <interface type='vhostuser'>
> > > > > >       <mac address='88:69:00:00:00:11'/>
> > > > > >       <source type='unix'
> > > > > > path='/tmp/041afca0-6e11-4eab-a62f-1ccf5cd318fd' mode='server'/>
> > > > > >       <model type='virtio'/>
> > > > > >       <driver queues='2' rx_queue_size='512'>
> > > > > >         <host csum='on' tso4='on' tso6='on'/>
> > > > > >         <guest csum='on' tso4='on' tso6='on'/>
> > > > >
> > > > > I have other options set, but I don't think they are related:
> > > > >        <host csum='on' gso='off' tso4='on' tso6='on' ecn='off'
> > > > > ufo='off' mrg_rxbuf='on'/>
> > > > >        <guest csum='on' tso4='on' tso6='on' ecn='off' ufo='off'/>
> > > > >
> > > > >
> > > > > >       </driver>
> > > > > >       <alias name='net2'/>
> > > > > >       <address type='pci' domain='0x0000' bus='0x00' slot='0x06'
> > > > > > function='0x0'/>
> > > > > >     </interface>
> > > > > >
> > > > > > When VM boots up, turn on tx, tso and sg
> > > > > > # ethtool -K ens6 tx on
> > > > > > # ethtool -K ens6 tso on
> > > > > > # ethtool -K ens6 sg on
> > > > >
> > > > > All the needed offloading features are turned on by default,
> > > > > so I don't change anything in my testbed.
> > > > >
> > > > > > Then run 'iperf -s' on one VM and 'iperf -c xx.xx.xx.xx' on another VM.
> > > > > > Iperf doesn't work if there is no change to the VM's kernel. `tcpdump` shows
> > > > > > that iperf server received packets with invalid TCP checksum.
> > > > > > `nstat -a` shows that TcpInCsumErr number is accumulating.
> > > > > >
> > > > > > After adding changes to VM's kernel as below, iperf works properly.
> > > > > > in tcp_v4_rcv()
> > > > > >       - if (skb_checksum_init(skb, IPPROTO_TCP, inet_compute_pseudo))
> > > > > >       + if (skb_checksum_init(skb, IPPROTO_TCP, inet_compute_pseudo))
> > > > > >
> > > > > > static inline bool tcp_checksum_complete(struct sk_buff *skb)
> > > > > > {
> > > > > >         return 0;
> > > > > > }
> > > > >
> > > > > That's odd. Which kernel is that? Maybe I can try the same version.
> > > > > I am using 5.2.14-200.fc30.x86_64.
> > > > >
> > > > > Looks like somehow the packet lost its offloading flags, so the kernel
> > > > > has to check the csum, and since it wasn't calculated before, it's
> > > > > just random garbage.
> > > > >
> > > > > fbl
> > > > >
> > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > Best,
> > > > > > Yifeng
> > > > > >
> > > > > > On Tue, Jan 28, 2020 at 2:52 PM Flavio Leitner <fbl@sysclose.org> wrote:
> > > > > > >
> > > > > > > On Tue, Jan 28, 2020 at 02:21:30PM -0800, Yifeng Sun wrote:
> > > > > > > > Hi Flavio,
> > > > > > > >
> > > > > > > > Thanks for the explanation. I followed the steps in the document but
> > > > > > > > the TCP connection still could not be established between the 2 VMs.
> > > > > > > >
> > > > > > > > I finally modified VM's kernel directly to disable TCP checksum validation
> > > > > > > > to get it working properly. I got 30.0Gbps for 'iperf' between 2 VMs.
> > > > > > >
> > > > > > > Could you provide more details on how you did that? What's running
> > > > > > > inside the VM?
> > > > > > >
> > > > > > > I don't change anything inside of the VMs (Linux) in my testbed.
> > > > > > >
> > > > > > > fbl
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Yifeng
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Jan 28, 2020 at 4:00 AM Flavio Leitner <fbl@sysclose.org> wrote:
> > > > > > > > >
> > > > > > > > > On Mon, Jan 27, 2020 at 05:17:01PM -0800, Yifeng Sun wrote:
> > > > > > > > > > Hi Ilya,
> > > > > > > > > >
> > > > > > > > > > Thanks for your reply.
> > > > > > > > > >
> > > > > > > > > > The thing is, if checksum offloading is enabled in both VMs, then
> > > > > > > > > > sender VM will send
> > > > > > > > > > a packet with invalid TCP checksum, and later OVS will send this
> > > > > > > > > > packet to receiver
> > > > > > > > > > VM directly without calculating a valid checksum. As a result,
> > > > > > > > > > receiver VM will drop
> > > > > > > > > > this packet because it contains invalid checksum. This is what
> > > > > > > > > > happened when I tried
> > > > > > > > > > this patch.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > When TSO is enabled, the TX checksum offloading is required, so
> > > > > > > > > you will see invalid checksums. This is well documented here:
> > > > > > > > >
> > > > > > > > > https://github.com/openvswitch/ovs/blob/master/Documentation/topics/userspace-tso.rst#userspace-datapath---tso
> > > > > > > > >
> > > > > > > > > "Additionally, if the traffic is headed to a VM within the same host
> > > > > > > > > further optimization can be expected. As the traffic never leaves
> > > > > > > > > the machine, no MTU needs to be accounted for, and thus no
> > > > > > > > > segmentation and checksum calculations are required, which saves yet
> > > > > > > > > more cycles."
> > > > > > > > >
> > > > > > > > > Therefore, it's expected to see bad csum in the traffic dumps.
> > > > > > > > >
> > > > > > > > > To use the feature, you need a few steps: enable the feature in OvS,
> > > > > > > > > in qemu, and inside the VM. The Linux guest usually enables
> > > > > > > > > the feature by default if qemu offers it.
> > > > > > > > >
> > > > > > > > > HTH,
> > > > > > > > > fbl
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Yifeng
> > > > > > > > > >
> > > > > > > > > > On Mon, Jan 27, 2020 at 12:09 PM Ilya Maximets <i.maximets@ovn.org> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On 27.01.2020 18:24, Yifeng Sun wrote:
> > > > > > > > > > > > Hi Flavio,
> > > > > > > > > > > >
> > > > > > > > > > > > I am testing your patch using iperf between 2 VMs on the same host.
> > > > > > > > > > > > But it seems that TCP connection can't be created between these 2 VMs.
> > > > > > > > > > > > When inspecting further, I found that TCP packets have invalid checksums.
> > > > > > > > > > > > This might be the reason.
> > > > > > > > > > > >
> > > > > > > > > > > > I am wondering if I missed something in the setup? Thanks a lot.
> > > > > > > > > > >
> > > > > > > > > > > I didn't test myself, but according to current design, checksum offloading
> > > > > > > > > > > (rx and tx) should be enabled in both VMs.  Otherwise all the packets will
> > > > > > > > > > > be dropped by the guest kernel.
> > > > > > > > > > >
> > > > > > > > > > > Best regards, Ilya Maximets.
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > fbl
> > > > > > >
> > > > > > > --
> > > > > > > fbl
> > > > >
> > > > > --
> > > > > fbl
> > >
> > > --
> > > fbl
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
> --
> fbl