From patchwork Fri Jul 31 02:55:09 2020
X-Patchwork-Submitter: Toshiaki Makita
X-Patchwork-Id: 1339204
From: Toshiaki Makita
To: ovs-dev@openvswitch.org
Cc: Simon Horman, Ilya Maximets, Tim Rozet, Eli Britstein
Date: Fri, 31 Jul 2020 11:55:09 +0900
Message-Id: <20200731025514.1669061-1-toshiaki.makita1@gmail.com>
Subject: [ovs-dev] [PATCH v4 0/5] XDP offload using flow API provider

This patch set adds an XDP-based flow cache using the OVS netdev-offload
flow API provider. When XDP offload is enabled on an OVS device, packets
are first processed by the XDP flow cache (with parsing and table lookup
implemented in eBPF), and on a hit the actions are also executed in XDP
context, which has minimal overhead.

This provider is built on top of William's recently posted patch for
loading custom XDP programs. When a custom XDP program is loaded, the
provider detects whether the program supports the classifier, and if so
it starts offloading flows to the XDP program.

The patches are derived from xdp_flow[1], a similar mechanism implemented
in the kernel.
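The sketch below is purely illustrative (it is not code from these
patches; the map, struct, and program names are made up) and only shows
what the fast path described above amounts to: parse the packet into a
key, look it up in a BPF flow table, execute the cached action on a hit
without leaving XDP, and fall back to the regular datapath on a miss.

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    struct flow_key {            /* parsed packet fields (illustrative) */
        __u32 src_ip;
        __u32 dst_ip;
    };

    struct flow_actions {        /* pre-computed actions (illustrative) */
        __u32 out_port;          /* index into the devmap below */
    };

    struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __uint(max_entries, 4096);
        __type(key, struct flow_key);
        __type(value, struct flow_actions);
    } flow_table SEC(".maps");

    struct {
        __uint(type, BPF_MAP_TYPE_DEVMAP);
        __uint(max_entries, 64);
        __type(key, __u32);
        __type(value, __u32);
    } output_map SEC(".maps");

    SEC("xdp")
    int flow_cache(struct xdp_md *ctx)
    {
        struct flow_key key = { 0 };

        /* ... bounds-checked parsing of ctx->data into key elided ... */

        struct flow_actions *act = bpf_map_lookup_elem(&flow_table, &key);
        if (!act) {
            return XDP_PASS;     /* miss: fall back to the OVS datapath */
        }
        /* hit: execute the cached action without leaving XDP */
        return bpf_redirect_map(&output_map, act->out_port, 0);
    }

    char _license[] SEC("license") = "GPL";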
* Motivation

While the userspace datapath using netdev-afxdp or netdev-dpdk shows good
performance, there are use cases where packets are better processed in the
kernel, for example TCP/IP connections or container-to-container
connections. The current solution is to use a tap device or af_packet,
with extra kernel-to/from-userspace overhead. With XDP, a better solution
is to steer packets earlier, in the XDP program, and decide whether to
send them to the userspace datapath or keep them in the kernel.

One problem with the current netdev-afxdp is that it forwards all packets
to userspace. The first patch from William (netdev-afxdp: Enable loading
XDP program.) only provides the interface to load an XDP program; however,
users usually don't know how to write their own XDP program.

XDP also supports HW-offload, so it may become possible to offload flows
to hardware through this provider in the future, although it is not
possible currently. The reason is that map-in-map is required for our
program to support a classifier with subtables in XDP, and map-in-map is
not offloadable. If map-in-map becomes offloadable, HW-offload of our
program may also become possible.
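For readers unfamiliar with the netdev-offload flow API mentioned above,
the following is a rough sketch (not taken from the patches) of how an
offload driver like this one could plug in. The struct and registration
call follow OVS's lib/netdev-offload-provider.h and lib/netdev-offload.c;
the callback comments and the register helper are hypothetical, and the
real callbacks live in lib/netdev-offload-xdp.c.

    #include "netdev-offload-provider.h"

    static const struct netdev_flow_api netdev_offload_xdp = {
        .type = "linux_xdp",
        /* .init_flow_api, .flow_put, .flow_get, .flow_del, .flow_flush
         * would point at implementations that verify the loaded XDP
         * program and translate flows into BPF map entries. */
    };

    /* Hypothetical helper: makes the provider selectable via
     * other_config:offload-driver=linux_xdp (see step 5 below). */
    void
    netdev_xdp_offload_register(void)
    {
        netdev_register_flow_api_provider(&netdev_offload_xdp);
    }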
* How to use

1. Install clang/llvm >= 9, libbpf >= 0.0.6 (included in kernel 5.5), and
   kernel >= 5.3.

2. make with --enable-afxdp --enable-xdp-offload
   This generates the XDP program "bpf/flowtable_afxdp.o". Note that the
   BPF object will not be installed anywhere by "make install" at this
   point.

3. Load the custom XDP program
   E.g.
   $ ovs-vsctl add-port ovsbr0 veth0 -- set int veth0 options:xdp-mode=native \
       options:xdp-obj="/path/to/ovs/bpf/flowtable_afxdp.o"
   $ ovs-vsctl add-port ovsbr0 veth1 -- set int veth1 options:xdp-mode=native \
       options:xdp-obj="/path/to/ovs/bpf/flowtable_afxdp.o"

4. Enable XDP_REDIRECT
   If you use veth devices, make sure to load some (possibly dummy) XDP
   programs on the peers of the veth devices. This patch set includes a
   program which does nothing but return XDP_PASS (a minimal sketch of it
   is shown after this list). You can use it for the veth peer like this:
   $ ip link set veth1 xdpdrv object /path/to/ovs/bpf/xdp_noop.o section xdp

   Some HW NIC drivers require as many queues as there are cores on the
   system. Tweak the number of queues using "ethtool -L".

5. Enable hw-offload
   $ ovs-vsctl set Open_vSwitch . other_config:offload-driver=linux_xdp
   $ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
   This will start offloading flows to the XDP program.

   You should be able to see some maps installed, including "debug_stats":
   $ bpftool map

   If packets are successfully redirected by the XDP program,
   debug_stats[2] will be counted:
   $ bpftool map dump id

Currently only very limited keys and output actions are supported. For
example, NORMAL action entries and IP-based matching work with the current
key support. VLAN actions used by port tags/trunks are also supported.
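The dummy program referenced in step 4 is essentially a no-op. A minimal
sketch of such a program follows (the actual bpf/xdp_noop.c may differ in
details); it only returns XDP_PASS, which is enough for XDP_REDIRECT into
the veth peer to work:

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    /* Attachable via: ip link set <dev> xdpdrv object xdp_noop.o section xdp */
    SEC("xdp")
    int xdp_noop(struct xdp_md *ctx)
    {
        return XDP_PASS;
    }

    char _license[] SEC("license") = "GPL";

If you need to build such an object by hand, something like
"clang -O2 -g -target bpf -c xdp_noop.c -o xdp_noop.o" should work.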
* Performance

Tested 2 cases: 1) i40e to veth, 2) i40e to i40e.

Test 1 measured the drop rate at the veth interface, with a redirect
action from the physical interface (i40e 25G NIC, XXV710) to veth. The
CPU is a Xeon Silver 4114 (2.20 GHz).

                                                              XDP_DROP
                   +------+                      +-------+    +-------+
pktgen -- wire --> | eth0 | -- NORMAL ACTION --> | veth0 |----| veth2 |
                   +------+                      +-------+    +-------+

Test 2 uses i40e instead of veth, and measured the tx packet rate at the
output device.

Single-flow performance test results:

1) i40e-veth
   a) no-zerocopy in i40e
      - xdp:   3.7 Mpps
      - afxdp: 980 Kpps
   b) zerocopy in i40e (veth does not have zc)
      - xdp:   1.9 Mpps
      - afxdp: 980 Kpps

2) i40e-i40e
   a) no-zerocopy
      - xdp:   3.5 Mpps
      - afxdp: 1.5 Mpps
   b) zerocopy
      - xdp:   2.0 Mpps
      - afxdp: 4.4 Mpps

** xdp is better when zerocopy is disabled. The reason for the poor
performance with zerocopy is that XDP_REDIRECT to another device requires
packet memory allocation and a memcpy for the xdp_frame conversion when
zerocopy is enabled.

** afxdp with zerocopy is better than xdp without zerocopy, but afxdp uses
two cores in this case: one for the pmd and one for softirq processing.
When the pmd and softirq were running on the same core, performance was
extremely poor, as the pmd consumes the CPU. I also tested afxdp-nonpmd to
run softirq and userspace processing on the same core, but the result was
lower than half of the pmd result. With nonpmd, xdp performance was the
same as xdp with pmd, which means xdp uses only one core (for softirq
only). Even with pmd, only one pmd is needed for xdp, even when we want to
use more cores for multi-flow workloads.

This patch set is based on top of commit e8bf77748 ("odp-util: Fix
clearing match mask if set action is partially unnecessary.").

To make review easier I left pre-squashed commits from v3 here:
https://github.com/tmakita/ovs/compare/xdp_offload_v3...tmakita:xdp_offload_v4_history?expand=1

[1] https://lwn.net/Articles/802653/

v4:
- Fix checkpatch errors.
- Fix duplicate flow api register.
- Don't call unnecessary flow api init callbacks when the default flow api
  provider can be used.
- Fix typo in comments.
- Improve bpf Makefile.am to support automatic dependencies.
- Add a dummy XDP program for veth peers.
- Rename netdev_info to netdev_xdp_info.
- Use id-pool for free subtable entry management and devmap indexes.
- Rename --enable-bpf to --enable-xdp-offload.
- Compile xdp flow api provider only with --enable-xdp-offload.
- Tested again and updated performance numbers in the cover letter (got
  slightly better numbers).

v3:
- Use ".ovs_meta" section to inform vswitchd of metadata like supported
  keys.
- Rewrite action loop logic in bpf to support multiple actions.
- Add missing linux/types.h in acinclude.m4, as per William Tu.
- Fix infinite reconfiguration loop when xsks_map is missing.
- Add vlan-related actions in bpf program.
- Fix CI build error.
- Fix inability to delete subtable entries.

v2:
- Add uninit callback of netdev-offload-xdp.
- Introduce "offload-driver" other_config to specify the offload driver.
- Add --enable-bpf (HAVE_BPF) config option to build bpf programs.
- Work around incorrect UINTPTR_MAX in x64 clang bpf build.
- Fix boot.sh autoconf warning.

Toshiaki Makita (4):
  netdev-offload: Add "offload-driver" other_config to specify offload driver
  netdev-offload: Add xdp flow api provider
  bpf: Add reference XDP program implementation for netdev-offload-xdp
  bpf: Add dummy program for veth devices

William Tu (1):
  netdev-afxdp: Enable loading XDP program.

 .travis.yml                           |    2 +-
 Documentation/intro/install/afxdp.rst |   59 ++
 Makefile.am                           |    9 +-
 NEWS                                  |    2 +
 acinclude.m4                          |   60 ++
 bpf/.gitignore                        |    4 +
 bpf/Makefile.am                       |   83 ++
 bpf/bpf_compiler.h                    |   25 +
 bpf/bpf_miniflow.h                    |  179 ++++
 bpf/bpf_netlink.h                     |   63 ++
 bpf/bpf_workaround.h                  |   28 +
 bpf/flowtable_afxdp.c                 |  585 ++++++++++++
 bpf/xdp_noop.c                        |   31 +
 configure.ac                          |    2 +
 lib/automake.mk                       |    8 +
 lib/bpf-util.c                        |   38 +
 lib/bpf-util.h                        |   22 +
 lib/netdev-afxdp.c                    |  373 +++++++-
 lib/netdev-afxdp.h                    |    3 +
 lib/netdev-linux-private.h            |    5 +
 lib/netdev-offload-provider.h         |    8 +-
 lib/netdev-offload-xdp.c              | 1213 +++++++++++++++++++++++++
 lib/netdev-offload-xdp.h              |   49 +
 lib/netdev-offload.c                  |   42 +
 24 files changed, 2881 insertions(+), 12 deletions(-)
 create mode 100644 bpf/.gitignore
 create mode 100644 bpf/Makefile.am
 create mode 100644 bpf/bpf_compiler.h
 create mode 100644 bpf/bpf_miniflow.h
 create mode 100644 bpf/bpf_netlink.h
 create mode 100644 bpf/bpf_workaround.h
 create mode 100644 bpf/flowtable_afxdp.c
 create mode 100644 bpf/xdp_noop.c
 create mode 100644 lib/bpf-util.c
 create mode 100644 lib/bpf-util.h
 create mode 100644 lib/netdev-offload-xdp.c
 create mode 100644 lib/netdev-offload-xdp.h