[ovs-dev,RFC,0/2] dpdk: Add support for TSO

Message ID 1533742768-204340-1-git-send-email-tiago.lam@intel.com

Message

Lam, Tiago Aug. 8, 2018, 3:39 p.m. UTC
Enabling TSO offload allows a host stack to delegate the segmentation of
oversized TCP packets to the underlying physical NIC, if supported. In the case
of a VM this means that the segmentation of the packets is not performed by the
guest kernel, but by the host NIC itself. In turn, since the TSO calculations
and checksums are being performed in hardware, this alleviates the CPU load on
the host system. In inter VM communication this might account for significant
savings, and higher throughput, even more so if the VMs are running on the same
host.
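
(For illustration only, and not code from this series: a minimal sketch of
how a DPDK 17.11 application asks the PMD/NIC to segment an already-built
TCP/IPv4 mbuf. The header sizes and the 'mss' parameter are placeholder
assumptions.)

    #include <rte_ether.h>
    #include <rte_ip.h>
    #include <rte_mbuf.h>
    #include <rte_tcp.h>

    /* Sketch: mark an already-built TCP/IPv4 mbuf so that the PMD/NIC
     * performs the segmentation.  Header lengths assume no options. */
    static void
    mark_mbuf_for_tso(struct rte_mbuf *m, uint16_t mss)
    {
        m->l2_len = sizeof(struct ether_hdr);  /* Ethernet header. */
        m->l3_len = sizeof(struct ipv4_hdr);   /* IPv4 header. */
        m->l4_len = sizeof(struct tcp_hdr);    /* TCP header. */
        m->tso_segsz = mss;                    /* Payload bytes per segment. */
        m->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM
                       | PKT_TX_TCP_CKSUM | PKT_TX_TCP_SEG;
    }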

Thus, although inter VM communication is already possible as is, it comes at a
cost in CPU usage, which may affect the overall throughput.

This series proposes to add support for TSO in OvS-DPDK, by making use of the
TSO offloading feature already supported by the DPDK vhost backend, having the
following scenarios in mind:
- Inter VM communication on the same host;
- Inter VM communication on different hosts;
- The same two use cases above, but on a VLAN network.

The work is based on [1]; it has been rebased to run on top of the
multi-segment mbufs work (v7) [2] and re-worked to use the new Tx offload API
as per [3].

[1] https://patchwork.ozlabs.org/patch/749564/
[2] https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/350081.html
[3] http://dpdk.readthedocs.io/en/v17.11/rel_notes/deprecation.html

Considerations:
- This series depends on the multi-segment mbuf series (v7) and can't be
  applied on master as is;
- Right now TSO is enabled by default when using multi-segment mbufs;
  otherwise TSO is disabled and can't be set on its own. I'm open to opinions
  on this, though. My main reasons for keeping TSO behind multi-segment mbufs
  are (see also the sketch after this list):
  - The performance enhancements provided by DPDK PMDs, such as vectorization,
    that are disabled when using multi-segment mbufs are also disabled when
    using some of the offload features;
  - With the multi-segment mbuf flag enabled, mbufs have a default size of
    2048B. Without multi-segment mbufs this might be higher, since the mbuf
    size is adjusted to the MTU (such as 9000B). This might lead to a higher
    waste of memory when appending mbufs to each other, as we would be
    incrementing by larger amounts.
- There's some initial documentation on patch 1/2 (which came from [1]), but
  it needs improving.
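
(As a rough sketch of the gating described in the second consideration above,
and not code from this series, a port setup path could request TSO only when
multi-segment mbufs are enabled and the device advertises the capability. The
'multi_seg_enabled' flag stands in for the assumed multi-segment mbuf
configuration knob.)

    #include <stdbool.h>
    #include <rte_ethdev.h>

    /* Sketch: request TSO on a DPDK port only when multi-segment mbufs are
     * in use and the device reports support for it (DPDK 17.11 offload API).
     * 'multi_seg_enabled' is a stand-in for the real configuration knob. */
    static void
    maybe_request_tso(uint16_t port_id, struct rte_eth_conf *conf,
                      bool multi_seg_enabled)
    {
        struct rte_eth_dev_info info;

        rte_eth_dev_info_get(port_id, &info);

        if (multi_seg_enabled
            && (info.tx_offload_capa & DEV_TX_OFFLOAD_TCP_TSO)) {
            conf->txmode.offloads |= DEV_TX_OFFLOAD_TCP_TSO
                                     | DEV_TX_OFFLOAD_MULTI_SEGS;
        }
    }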

Tiago Lam (2):
  netdev-dpdk: Consider packets marked for TSO.
  netdev-dpdk: Enable TSO when using multi-seg mbufs

 Documentation/topics/dpdk/phy.rst |  64 ++++++++++++++
 lib/dp-packet.c                   |   5 +-
 lib/netdev-dpdk.c                 | 172 ++++++++++++++++++++++++++++++--------
 3 files changed, 204 insertions(+), 37 deletions(-)

Comments

Ilya Maximets Aug. 23, 2018, 2:36 p.m. UTC | #1
> Enabling TSO offload allows a host stack to delegate the segmentation of
> oversized TCP packets to the underlying physical NIC, if supported. In the case
> of a VM this means that the segmentation of the packets is not performed by the
> guest kernel, but by the host NIC itself. In turn, since the TSO calculations
> and checksums are being performed in hardware, this alleviates the CPU load on
> the host system. In inter VM communication this might account for significant
> savings, and higher throughput, even more so if the VMs are running on the same
> host.
> 
> Thus, although inter VM communication is already possible as is, it comes at a
> cost in CPU usage, which may affect the overall throughput.
> 
> This series proposes to add support for TSO in OvS-DPDK, by making use of the
> TSO offloading feature already supported by the DPDK vhost backend, having the
> following scenarios in mind:
> - Inter VM communication on the same host;
> - Inter VM communication on different hosts;
> - The same two use cases above, but on a VLAN network.
> 
> The work is based on [1]; it has been rebased to run on top of the
> multi-segment mbufs work (v7) [2] and re-worked to use the new Tx offload API
> as per [3].
> 
> [1] https://patchwork.ozlabs.org/patch/749564/
> [2] https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/350081.html
> [3] http://dpdk.readthedocs.io/en/v17.11/rel_notes/deprecation.html
> 
> Considerations:
> - This series depends on the multi-segment mbuf series (v7) and can't be
>   applied on master as is;
> - Right now TSO is enabled by default when using multi-segment mbufs;
>   otherwise TSO is disabled and can't be set on its own. I'm open to opinions
>   on this, though. My main reasons for keeping TSO behind multi-segment mbufs
>   are:
>   - The performance enhancements provided by DPDK PMDs, such as vectorization,
>     that are disabled when using multi-segment mbufs are also disabled when
>     using some of the offload features;
>   - With the multi-segment mbuf flag enabled, mbufs have a default size of
>     2048B. Without multi-segment mbufs this might be higher, since the mbuf
>     size is adjusted to the MTU (such as 9000B). This might lead to a higher
>     waste of memory when appending mbufs to each other, as we would be
>     incrementing by larger amounts.
> - There's some initial documentation on patch 1/2 (which came from [1]), but
>   it needs improving.

I don't want to make a full review right now. Here is the list of issues
that I see at first glance:

1. No support for devices that don't support TSO.
   * netdev-linux/bsd/etc.
     You need to implement software TSO and split packets before sending
     them to interfaces that don't support TSO. For usual Linux taps/sockets
     it's possible to enable TSO support by using virtio-net headers as
     an option (see the tap sketch after this list).
   * Support tunneling.
     At least, you need to recalculate checksums before encapsulating.
   * netdev-dpdk physical devices that have no TSO support.
     Not sure if there are such NICs, but we have to support this, because
     it's possible. SoftNIC?
   * netdev-dpdk ring, aka dpdkr.
     Not sure which approach to use here. Software TSO?
   * vhost-user devices that did not negotiate offloading.
     The commit message of the first patch mentions this case but does not
     implement it. The feature set can change when the guest reloads its
     driver. The configuration should be applied smoothly, without an OVS
     reconfiguration, otherwise the guest will be able to force OVS to
     reconfigure indefinitely.

2. netdev_dpdk_prep_tso_packet issues.
   * It calculates pseudo-header checksums, which are only needed for Intel
     NICs. You have to use the 'rte_eth_tx_prepare' API instead, to avoid
     issues with other NICs (see the sketch below).
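
(A minimal sketch of the virtio-net header approach mentioned for Linux taps
in point 1; this is not from the series, and the tap fd is assumed to have
been opened with IFF_VNET_HDR set in its TUNSETIFF flags.)

    #include <sys/ioctl.h>
    #include <linux/if_tun.h>

    /* Sketch: allow oversized TCP packets on a tap fd by enabling the TSO
     * and checksum offload bits.  With IFF_VNET_HDR enabled on the fd, each
     * packet written to it must be prefixed by a struct virtio_net_hdr
     * (from <linux/virtio_net.h>) describing the requested offloads. */
    static int
    tap_enable_tso(int tap_fd)
    {
        unsigned int offload = TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6;

        return ioctl(tap_fd, TUNSETOFFLOAD, offload);
    }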
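
(And a sketch of the rte_eth_tx_prepare() usage suggested in point 2, so that
driver-specific requirements such as the pseudo-header checksum are handled by
the PMD instead of being open-coded; the drop policy here is just an example.)

    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    /* Sketch: let the PMD fix up TSO/checksum metadata before transmit.
     * rte_eth_tx_prepare() returns how many leading packets are ready; the
     * rest are simply dropped here for brevity. */
    static uint16_t
    tx_burst_with_prepare(uint16_t port_id, uint16_t queue_id,
                          struct rte_mbuf **pkts, uint16_t nb_pkts)
    {
        uint16_t nb_prep = rte_eth_tx_prepare(port_id, queue_id, pkts, nb_pkts);
        uint16_t i;

        for (i = nb_prep; i < nb_pkts; i++) {
            rte_pktmbuf_free(pkts[i]);
        }

        return rte_eth_tx_burst(port_id, queue_id, pkts, nb_prep);
    }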

> Tiago Lam (2):
>   netdev-dpdk: Consider packets marked for TSO.
>   netdev-dpdk: Enable TSO when using multi-seg mbufs
> 
>  Documentation/topics/dpdk/phy.rst |  64 ++++++++++++++
>  lib/dp-packet.c                   |   5 +-
>  lib/netdev-dpdk.c                 | 172 ++++++++++++++++++++++++++++++--------
>  3 files changed, 204 insertions(+), 37 deletions(-)
> 
> -- 
> 2.7.4