[ovs-dev] RFC: netdev-afxdp: Support for XDP metadata HW hints.

One big problem of netdev-afxdp is that there is no metadata support
from the hardware at all.  For example, OVS netdev-afxdp has to do rxhash,
or TCP checksum in software, resulting in high performance overhead.

A generic meta data type for XDP frame using BTF is proposed[1] and
there is sample implementation[2][3].  This patch experiments enabling 
the XDP metadata, or called HW hints, and shows the potential performance
improvement.  The patch uses only the rxhash value provided from HW,
so avoiding at the calculation of hash at lib/dpif-netdev.c:
    if (!dp_packet_rss_valid(execute->packet)) {
        dp_packet_set_rss_hash(execute->packet,
                               flow_hash_5tuple(execute->flow, 0));
    }

Using '$ ovs-appctl dpif-netdev/pmd-stats-show', the 'avg processing
cycles per packet' drops from 402 to 272.  More details below

Reference:
----------
[1] https://www.kernel.org/doc/html/latest/bpf/btf.html
[2] https://netdevconf.info/0x14/pub/slides/54/[1]%20XDP%20meta%20data%20acceleration.pdf
[3] https://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git/log/?h=topic/xdp_metadata4

Testbed:
--------
Two Xeon E5-2620 v3 2.4GHz connected back-to-back using Mellanox
ConnectX-6Dx 25GbE. Before starting OVS, enable the MD by:
$ bpftool net xdp show
xdp:
enp2s0f0np0(4) md_btf_id(1) md_btf_enabled(0)
enp2s0f1np1(5) md_btf_id(2) md_btf_enabled(0)
$ bpftool net xdp set dev enp2s0f0np0 md_btf on
$ bpftool net xdp
xdp:
enp2s0f0np0(4) md_btf_id(1) md_btf_enabled(1)

Limitations/TODO:
-----------------
1. Support only AF_XDP native mode, not zero-copy mode.
2. Currently only three fields: vlan, hash, and flow_mark, and only receive
   side supports XDP metadata.
3. Control plane, how to enable and probe the structure, not upstream yet.

OVS rxdrop without HW hints:
---------------------------
Drop rate: 4.8Mpps

pmd thread numa_id 0 core_id 3:
  packets received: 196592006
  packet recirculations: 0
  avg. datapath passes per packet: 1.00
  emc hits: 196592006
  smc hits: 0
  megaflow hits: 0
  avg. subtable lookups per megaflow hit: 0.00
  miss with success upcall: 0
  miss with failed upcall: 0
  avg. packets per output batch: 0.00
  idle cycles: 56009063835 (41.43%)
  processing cycles: 79164971931 (58.57%)
  avg cycles per packet: 687.59 (135174035766/196592006)
  avg processing cycles per packet: 402.69 (79164971931/196592006)

pmd thread numa_id 0 core_id 3:
  Iterations:           339607649  (0.23 us/it)
  - Used TSC cycles: 188620512777  ( 99.9 % of total cycles)
  - idle iterations:    330697002  ( 40.3 % of used cycles)
  - busy iterations:      8910647  ( 59.7 % of used cycles)
  Rx packets:           285140031  (3624 Kpps, 395 cycles/pkt)
  Datapath passes:      285140031  (1.00 passes/pkt)
  - EMC hits:           285139999  (100.0 %)
  - SMC hits:                   0  (  0.0 %)
  - Megaflow hits:              0  (  0.0 %, 0.00 subtbl lookups/hit)
  - Upcalls:                    0  (  0.0 %, 0.0 us/upcall)
  - Lost upcalls:               0  (  0.0 %)
  Tx packets:                   0

Perf report:
  17.56%  pmd-c03/id:11  ovs-vswitchd        [.] netdev_afxdp_rxq_recv
  14.39%  pmd-c03/id:11  ovs-vswitchd        [.] dp_netdev_process_rxq_port
  14.17%  pmd-c03/id:11  ovs-vswitchd        [.] pmd_thread_main
  10.86%  pmd-c03/id:11  [vdso]              [.] __vdso_clock_gettime
  10.19%  pmd-c03/id:11  ovs-vswitchd        [.] pmd_perf_end_iteration
   7.71%  pmd-c03/id:11  ovs-vswitchd        [.] time_timespec__
   5.64%  pmd-c03/id:11  ovs-vswitchd        [.] time_usec
   3.88%  pmd-c03/id:11  ovs-vswitchd        [.] netdev_get_class
   2.95%  pmd-c03/id:11  ovs-vswitchd        [.] netdev_rxq_recv
   2.78%  pmd-c03/id:11  libbpf.so.0.2.0     [.] xsk_socket__fd
   2.74%  pmd-c03/id:11  ovs-vswitchd        [.] pmd_perf_start_iteration
   2.11%  pmd-c03/id:11  libc-2.27.so        [.] __clock_gettime
   1.32%  pmd-c03/id:11  ovs-vswitchd        [.] xsk_socket__fd@plt

OVS rxdrop with HW hints:
-------------------------
rxdrop rate: 4.73Mpps

pmd thread numa_id 0 core_id 7:
  packets received: 13686880
  packet recirculations: 0
  avg. datapath passes per packet: 1.00
  emc hits: 13686880
  smc hits: 0
  megaflow hits: 0
  avg. subtable lookups per megaflow hit: 0.00
  miss with success upcall: 0
  miss with failed upcall: 0
  avg. packets per output batch: 0.00
  idle cycles: 3182105544 (46.02%)
  processing cycles: 3732023844 (53.98%)
  avg cycles per packet: 505.16 (6914129388/13686880)
  avg processing cycles per packet: 272.67 (3732023844/13686880)

pmd thread numa_id 0 core_id 7:

  Iterations:           392909539  (0.18 us/it)
  - Used TSC cycles: 167697342678  ( 99.9 % of total cycles)
  - idle iterations:    382539861  ( 46.0 % of used cycles)
  - busy iterations:     10369678  ( 54.0 % of used cycles)
  Rx packets:           331829656  (4743 Kpps, 273 cycles/pkt)
  Datapath passes:      331829656  (1.00 passes/pkt)
  - EMC hits:           331829656  (100.0 %)
  - SMC hits:                   0  (  0.0 %)
  - Megaflow hits:              0  (  0.0 %, 0.00 subtbl lookups/hit)
  - Upcalls:                    0  (  0.0 %, 0.0 us/upcall)
  - Lost upcalls:               0  (  0.0 %)
  Tx packets:                   0

Perf record/report:
  22.96%  pmd-c07/id:8  ovs-vswitchd        [.] netdev_afxdp_rxq_recv
  10.43%  pmd-c07/id:8  ovs-vswitchd        [.] miniflow_extract
   7.20%  pmd-c07/id:8  ovs-vswitchd        [.] dp_packet_init__
   7.00%  pmd-c07/id:8  ovs-vswitchd        [.] dp_netdev_input__
   6.79%  pmd-c07/id:8  ovs-vswitchd        [.] dp_netdev_process_rxq_port
   6.62%  pmd-c07/id:8  ovs-vswitchd        [.] pmd_thread_main
   5.65%  pmd-c07/id:8  ovs-vswitchd        [.] pmd_perf_end_iteration
   5.04%  pmd-c07/id:8  [vdso]              [.] __vdso_clock_gettime
   3.60%  pmd-c07/id:8  ovs-vswitchd        [.] time_timespec__
   3.10%  pmd-c07/id:8  ovs-vswitchd        [.] umem_elem_push
   2.74%  pmd-c07/id:8  libc-2.27.so        [.] __memcmp_avx2_movbe
   2.62%  pmd-c07/id:8  ovs-vswitchd        [.] time_usec
   2.14%  pmd-c07/id:8  ovs-vswitchd        [.] dp_packet_use_afxdp
   1.58%  pmd-c07/id:8  ovs-vswitchd        [.] netdev_rxq_recv
   1.47%  pmd-c07/id:8  ovs-vswitchd        [.] netdev_get_class
   1.34%  pmd-c07/id:8  ovs-vswitchd        [.] pmd_perf_start_iteration

Signed-off-by: William Tu <u9012063@gmail.com>
---
 lib/netdev-afxdp.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

Message ID	1614882425-52800-1-git-send-email-u9012063@gmail.com
State	RFC
Headers	show Return-Path: <ovs-dev-bounces@openvswitch.org> From: William Tu <u9012063@gmail.com> To: dev@openvswitch.org Date: Thu, 4 Mar 2021 10:27:05 -0800 Message-Id: <1614882425-52800-1-git-send-email-u9012063@gmail.com> Cc: brouer@redhat.com, i.maximets@ovn.org, vedang.patel@intel.com, dsahern@gmail.com, bjorn.topel@intel.com, saeedm@nvidia.com, magnus.karlsson@intel.com Subject: [ovs-dev] [PATCH] RFC: netdev-afxdp: Support for XDP metadata HW hints. Precedence: list MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" <ovs-dev-bounces@openvswitch.org>
Series	[ovs-dev] RFC: netdev-afxdp: Support for XDP metadata HW hints. \| expand [ovs-dev] RFC: netdev-afxdp: Support for XDP metadata HW hints.

[ovs-dev] RFC: netdev-afxdp: Support for XDP metadata HW hints.

Commit Message

Comments

Patch