From patchwork Mon Nov 2 12:48:17 2020
X-Patchwork-Submitter: William Breathitt Gray
X-Patchwork-Id: 1392217
From: William Breathitt Gray
To: kernel-team@lists.ubuntu.com
Subject: [SRU][B:linux-azure-4.15][PATCH 01/40] hv_netvsc: pass netvsc_device to receive callback
Date: Mon, 2 Nov 2020 07:48:17 -0500
Message-Id: <20201102124856.4659-2-william.gray@canonical.com>
In-Reply-To: <20201102124856.4659-1-william.gray@canonical.com>
References: <20201102124856.4659-1-william.gray@canonical.com>

From: Stephen Hemminger

BugLink: https://bugs.launchpad.net/bugs/1877654

The netvsc_receive_callback function was using RCU to find the
appropriate underlying netvsc_device. Since the calling function
already had that pointer, this was unnecessary.
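As an illustration of what this removes, the before/after shape is
roughly the following (a minimal sketch with abbreviated bodies, not
the driver code itself):

/* Before: every received packet re-derived the device under RCU. */
int recv_cb_old(struct net_device *net, struct vmbus_channel *chan)
{
        struct net_device_context *ctx = netdev_priv(net);
        struct netvsc_device *nvdev;

        rcu_read_lock();
        nvdev = rcu_dereference(ctx->nvdev);    /* per-packet lookup */
        if (unlikely(!nvdev)) {
                rcu_read_unlock();
                return NVSP_STAT_FAIL;
        }
        /* ... deliver packet using nvdev ... */
        rcu_read_unlock();
        return NVSP_STAT_SUCCESS;
}

/* After: the caller (rndis_filter_receive_data) already holds the
 * pointer, so it is passed down and the per-packet lookup disappears.
 */
int recv_cb_new(struct net_device *net, struct netvsc_device *nvdev,
                struct vmbus_channel *chan)
{
        /* ... deliver packet using nvdev directly ... */
        return NVSP_STAT_SUCCESS;
}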
Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller
(backported from commit 345ac08990b8365294f9756da806f357c239d758)
[ vilhelmgray: context adjustments ]
Signed-off-by: William Breathitt Gray
---
 drivers/net/hyperv/hyperv_net.h   |  1 +
 drivers/net/hyperv/netvsc_drv.c   | 13 ++-----------
 drivers/net/hyperv/rndis_filter.c |  3 ++-
 3 files changed, 5 insertions(+), 12 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 83e040359037..5d1c91500df0 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -202,6 +202,7 @@ int netvsc_send(struct net_device *net,
 void netvsc_linkstatus_callback(struct net_device *net,
                                 struct rndis_message *resp);
 int netvsc_recv_callback(struct net_device *net,
+                         struct netvsc_device *nvdev,
                          struct vmbus_channel *channel,
                          void *data, u32 len,
                          const struct ndis_tcp_ip_checksum_info *csum_info,
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 74e00120850c..3f1b3e6ae4aa 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -831,33 +831,25 @@ static struct sk_buff *netvsc_alloc_recv_skb(struct net_device *net,
  * "wire" on the specified device.
  */
 int netvsc_recv_callback(struct net_device *net,
+                         struct netvsc_device *net_device,
                          struct vmbus_channel *channel,
                          void *data, u32 len,
                          const struct ndis_tcp_ip_checksum_info *csum_info,
                          const struct ndis_pkt_8021q_info *vlan)
 {
         struct net_device_context *net_device_ctx = netdev_priv(net);
-        struct netvsc_device *net_device;
         u16 q_idx = channel->offermsg.offer.sub_channel_index;
-        struct netvsc_channel *nvchan;
+        struct netvsc_channel *nvchan = &net_device->chan_table[q_idx];
         struct sk_buff *skb;
         struct netvsc_stats *rx_stats;

         if (net->reg_state != NETREG_REGISTERED)
                 return NVSP_STAT_FAIL;

-        rcu_read_lock();
-        net_device = rcu_dereference(net_device_ctx->nvdev);
-        if (unlikely(!net_device))
-                goto drop;
-
-        nvchan = &net_device->chan_table[q_idx];
-
         /* Allocate a skb - TODO direct I/O to pages? */
         skb = netvsc_alloc_recv_skb(net, &nvchan->napi,
                                     csum_info, vlan, data, len);
         if (unlikely(!skb)) {
-drop:
                 ++net->stats.rx_dropped;
                 rcu_read_unlock();
                 return NVSP_STAT_FAIL;
@@ -882,7 +874,6 @@ int netvsc_recv_callback(struct net_device *net,
         u64_stats_update_end(&rx_stats->syncp);

         napi_gro_receive(&nvchan->napi, skb);
-        rcu_read_unlock();

         return NVSP_STAT_SUCCESS;
 }
diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c
index 29e8741e1891..4d2eceb3a694 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -397,7 +397,8 @@ static int rndis_filter_receive_data(struct net_device *ndev,
          */
         data = (void *)((unsigned long)data + data_offset);
         csum_info = rndis_get_ppi(rndis_pkt, TCPIP_CHKSUM_PKTINFO);
-        return netvsc_recv_callback(ndev, channel,
+
+        return netvsc_recv_callback(ndev, nvdev, channel,
                                     data, rndis_pkt->data_len,
                                     csum_info, vlan);
 }

From patchwork Mon Nov 2 12:48:18 2020
X-Patchwork-Submitter: William Breathitt Gray
X-Patchwork-Id: 1392216
From: William Breathitt Gray
To: kernel-team@lists.ubuntu.com
Subject: [SRU][B:linux-azure-4.15][PATCH 02/40] xdp: base API for new XDP rx-queue info concept
Date: Mon, 2 Nov 2020 07:48:18 -0500
Message-Id: <20201102124856.4659-3-william.gray@canonical.com>
In-Reply-To: <20201102124856.4659-1-william.gray@canonical.com>
References: <20201102124856.4659-1-william.gray@canonical.com>

From: Jesper Dangaard Brouer

BugLink: https://bugs.launchpad.net/bugs/1877654

This patch only introduces the core data structures and API
functions. All XDP-enabled drivers must use the API before this info
can be used.

There is a need for XDP to know more about the RX-queue a given XDP
frame has arrived on, for both the XDP bpf-prog and the kernel side.
Instead of extending xdp_buff each time new info is needed, this
patch creates a separate read-mostly struct xdp_rxq_info that
contains this info. We stress that this data/cache-line is for
read-only info. It is NOT for dynamic per-packet info; use data_meta
for such use-cases.

The performance advantage is that this info can be set up at RX-ring
init time, instead of updating N members in xdp_buff per packet. A
possible (driver-level) micro-optimization is that the xdp_buff->rxq
assignment could be done once per XDP/NAPI loop. The extra pointer
deref only happens for programs needing access to this info (thus, no
slowdown to existing use-cases).
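The intended driver usage is a register/unregister pair around the
RX-ring lifetime. A minimal sketch against the API added below
(my_ring and the setup/teardown function names are illustrative, not
part of the patch):

struct my_ring {
        struct xdp_rxq_info xdp_rxq;
        /* ... other ring state ... */
};

static int my_ring_setup(struct my_ring *ring, struct net_device *dev,
                         u32 queue_index)
{
        /* Register once at RX-ring init time, not per packet. */
        return xdp_rxq_info_reg(&ring->xdp_rxq, dev, queue_index);
}

static void my_ring_teardown(struct my_ring *ring)
{
        /* Mandatory before the ring memory is purged/reallocated,
         * and on driver shutdown/unload.
         */
        xdp_rxq_info_unreg(&ring->xdp_rxq);
}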
Signed-off-by: Jesper Dangaard Brouer
Signed-off-by: Alexei Starovoitov
(backported from commit aecd67b60722dd24353b0bc50e78a55b30707dcd)
[ vilhelmgray: context adjustment ]
Signed-off-by: William Breathitt Gray
---
 include/linux/filter.h |  2 ++
 include/net/xdp.h      | 47 +++++++++++++++++++++++++++++
 net/core/Makefile      |  2 +-
 net/core/xdp.c         | 67 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 117 insertions(+), 1 deletion(-)
 create mode 100644 include/net/xdp.h
 create mode 100644 net/core/xdp.c

diff --git a/include/linux/filter.h b/include/linux/filter.h
index baec2c269602..158fb795cba7 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -19,6 +19,7 @@
 #include <linux/set_memory.h>

 #include <net/sch_generic.h>
+#include <net/xdp.h>

 #include <asm/byteorder.h>
 #include <uapi/linux/filter.h>
@@ -493,6 +494,7 @@ struct xdp_buff {
         void *data_end;
         void *data_meta;
         void *data_hard_start;
+        struct xdp_rxq_info *rxq;
 };

 /* Compute the linear packet data range [data, data_end) which
diff --git a/include/net/xdp.h b/include/net/xdp.h
new file mode 100644
index 000000000000..86c41631a908
--- /dev/null
+++ b/include/net/xdp.h
@@ -0,0 +1,47 @@
+/* include/net/xdp.h
+ *
+ * Copyright (c) 2017 Jesper Dangaard Brouer, Red Hat Inc.
+ * Released under terms in GPL version 2. See COPYING.
+ */
+#ifndef __LINUX_NET_XDP_H__
+#define __LINUX_NET_XDP_H__
+
+/**
+ * DOC: XDP RX-queue information
+ *
+ * The XDP RX-queue info (xdp_rxq_info) is associated with the driver
+ * level RX-ring queues. It is information that is specific to how
+ * the driver have configured a given RX-ring queue.
+ *
+ * Each xdp_buff frame received in the driver carry a (pointer)
+ * reference to this xdp_rxq_info structure. This provides the XDP
+ * data-path read-access to RX-info for both kernel and bpf-side
+ * (limited subset).
+ *
+ * For now, direct access is only safe while running in NAPI/softirq
+ * context. Contents is read-mostly and must not be updated during
+ * driver NAPI/softirq poll.
+ *
+ * The driver usage API is a register and unregister API.
+ *
+ * The struct is not directly tied to the XDP prog. A new XDP prog
+ * can be attached as long as it doesn't change the underlying
+ * RX-ring. If the RX-ring does change significantly, the NIC driver
+ * naturally need to stop the RX-ring before purging and reallocating
+ * memory. In that process the driver MUST call unregistor (which
+ * also apply for driver shutdown and unload). The register API is
+ * also mandatory during RX-ring setup.
+ */
+
+struct xdp_rxq_info {
+        struct net_device *dev;
+        u32 queue_index;
+        u32 reg_state;
+} ____cacheline_aligned; /* perf critical, avoid false-sharing */
+
+int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
+                     struct net_device *dev, u32 queue_index);
+void xdp_rxq_info_unreg(struct xdp_rxq_info *xdp_rxq);
+void xdp_rxq_info_unused(struct xdp_rxq_info *xdp_rxq);
+
+#endif /* __LINUX_NET_XDP_H__ */
diff --git a/net/core/Makefile b/net/core/Makefile
index 1fd0a9c88b1b..6dbbba8c57ae 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -11,7 +11,7 @@ obj-$(CONFIG_SYSCTL) += sysctl_net_core.o
 obj-y                += dev.o ethtool.o dev_addr_lists.o dst.o netevent.o \
                         neighbour.o rtnetlink.o utils.o link_watch.o filter.o \
                         sock_diag.o dev_ioctl.o tso.o sock_reuseport.o \
-                        fib_notifier.o
+                        fib_notifier.o xdp.o

 obj-y += net-sysfs.o
 obj-$(CONFIG_PROC_FS) += net-procfs.o
diff --git a/net/core/xdp.c b/net/core/xdp.c
new file mode 100644
index 000000000000..229bc5a0ee04
--- /dev/null
+++ b/net/core/xdp.c
@@ -0,0 +1,67 @@
+/* net/core/xdp.c
+ *
+ * Copyright (c) 2017 Jesper Dangaard Brouer, Red Hat Inc.
+ * Released under terms in GPL version 2. See COPYING.
+ */
+#include <linux/types.h>
+#include <linux/mm.h>
+
+#include <net/xdp.h>
+
+#define REG_STATE_NEW           0x0
+#define REG_STATE_REGISTERED    0x1
+#define REG_STATE_UNREGISTERED  0x2
+#define REG_STATE_UNUSED        0x3
+
+void xdp_rxq_info_unreg(struct xdp_rxq_info *xdp_rxq)
+{
+        /* Simplify driver cleanup code paths, allow unreg "unused" */
+        if (xdp_rxq->reg_state == REG_STATE_UNUSED)
+                return;
+
+        WARN(!(xdp_rxq->reg_state == REG_STATE_REGISTERED), "Driver BUG");
+
+        xdp_rxq->reg_state = REG_STATE_UNREGISTERED;
+        xdp_rxq->dev = NULL;
+}
+EXPORT_SYMBOL_GPL(xdp_rxq_info_unreg);
+
+static void xdp_rxq_info_init(struct xdp_rxq_info *xdp_rxq)
+{
+        memset(xdp_rxq, 0, sizeof(*xdp_rxq));
+}
+
+/* Returns 0 on success, negative on failure */
+int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
+                     struct net_device *dev, u32 queue_index)
+{
+        if (xdp_rxq->reg_state == REG_STATE_UNUSED) {
+                WARN(1, "Driver promised not to register this");
+                return -EINVAL;
+        }
+
+        if (xdp_rxq->reg_state == REG_STATE_REGISTERED) {
+                WARN(1, "Missing unregister, handled but fix driver");
+                xdp_rxq_info_unreg(xdp_rxq);
+        }
+
+        if (!dev) {
+                WARN(1, "Missing net_device from driver");
+                return -ENODEV;
+        }
+
+        /* State either UNREGISTERED or NEW */
+        xdp_rxq_info_init(xdp_rxq);
+        xdp_rxq->dev = dev;
+        xdp_rxq->queue_index = queue_index;
+
+        xdp_rxq->reg_state = REG_STATE_REGISTERED;
+        return 0;
+}
+EXPORT_SYMBOL_GPL(xdp_rxq_info_reg);
+
+void xdp_rxq_info_unused(struct xdp_rxq_info *xdp_rxq)
+{
+        xdp_rxq->reg_state = REG_STATE_UNUSED;
+}
+EXPORT_SYMBOL_GPL(xdp_rxq_info_unused);
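The micro-optimization mentioned in the commit message above --
assigning xdp_buff->rxq once per NAPI loop rather than per packet --
would look roughly like this in a hypothetical driver poll function
(a sketch only; my_ring and my_napi_poll are illustrative):

static int my_napi_poll(struct my_ring *ring, int budget)
{
        struct xdp_buff xdp;
        int done = 0;

        xdp.rxq = &ring->xdp_rxq;       /* once per poll, not per packet */

        while (done < budget) {
                /* ... set xdp.data/data_end for the next frame, run
                 * bpf_prog_run_xdp(prog, &xdp), act on the verdict ...
                 */
                done++;
        }
        return done;
}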
From patchwork Mon Nov 2 12:48:19 2020
X-Patchwork-Submitter: William Breathitt Gray
X-Patchwork-Id: 1392215
From: William Breathitt Gray
To: kernel-team@lists.ubuntu.com
Subject: [SRU][B:linux-azure-4.15][PATCH 03/40] ixgbe: setup xdp_rxq_info
Date: Mon, 2 Nov 2020 07:48:19 -0500
Message-Id: <20201102124856.4659-4-william.gray@canonical.com>
In-Reply-To: <20201102124856.4659-1-william.gray@canonical.com>
References: <20201102124856.4659-1-william.gray@canonical.com>

From: Jesper Dangaard Brouer

BugLink: https://bugs.launchpad.net/bugs/1877654

Driver hook points for xdp_rxq_info:
 * reg  : ixgbe_setup_rx_resources()
 * unreg: ixgbe_free_rx_resources()

Tested on actual hardware.

V2: Fix ixgbe_set_ringparam, clear xdp_rxq_info in temp_ring
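The V2 note matters because ixgbe_set_ringparam() builds its
temporary rings by memcpy() from live rings; without clearing the
copied state, re-registering would trip the "Missing unregister"
WARN in xdp_rxq_info_reg(). In outline (condensed from the diff
below, not additional code):

/* temp_ring[i] now holds a copied xdp_rxq in registered state;
 * clear it before setup registers the temporary ring afresh.
 */
memcpy(&temp_ring[i], adapter->rx_ring[i], sizeof(struct ixgbe_ring));
memset(&temp_ring[i].xdp_rxq, 0, sizeof(temp_ring[i].xdp_rxq));
err = ixgbe_setup_rx_resources(adapter, &temp_ring[i]);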
Cc: intel-wired-lan@lists.osuosl.org
Cc: Jeff Kirsher
Cc: Alexander Duyck
Signed-off-by: Jesper Dangaard Brouer
Acked-by: John Fastabend
Signed-off-by: Alexei Starovoitov
(backported from commit 99ffc5ade4e8703c3bc56fa6bb8e25437da09ee9)
[ vilhelmgray: context adjustment ]
Signed-off-by: William Breathitt Gray
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h         |  2 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |  4 ++++
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    | 10 +++++++++-
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 468c3555a629..8611763d6129 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -53,6 +53,7 @@
 #include <linux/dca.h>
 #endif

+#include <net/xdp.h>
 #include <net/busy_poll.h>

 /* common prefix used by pr_<> macros */
@@ -371,6 +372,7 @@ struct ixgbe_ring {
                 struct ixgbe_tx_queue_stats tx_stats;
                 struct ixgbe_rx_queue_stats rx_stats;
         };
+        struct xdp_rxq_info xdp_rxq;
 } ____cacheline_internodealigned_in_smp;

 enum ixgbe_ring_f_enum {
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
index 372835dc144c..3decd446524c 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
@@ -1156,6 +1156,10 @@ static int ixgbe_set_ringparam(struct net_device *netdev,
                         memcpy(&temp_ring[i], adapter->rx_ring[i],
                                sizeof(struct ixgbe_ring));

+                        /* Clear copied XDP RX-queue info */
+                        memset(&temp_ring[i].xdp_rxq, 0,
+                               sizeof(temp_ring[i].xdp_rxq));
+
                         temp_ring[i].count = new_rx_count;
                         err = ixgbe_setup_rx_resources(adapter, &temp_ring[i]);
                         if (err) {
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index b2826f7e945c..49028d005eae 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2330,12 +2330,14 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 #endif /* IXGBE_FCOE */
         u16 cleaned_count = ixgbe_desc_unused(rx_ring);
         unsigned int xdp_xmit = 0;
+        struct xdp_buff xdp;
+
+        xdp.rxq = &rx_ring->xdp_rxq;

         while (likely(total_rx_packets < budget)) {
                 union ixgbe_adv_rx_desc *rx_desc;
                 struct ixgbe_rx_buffer *rx_buffer;
                 struct sk_buff *skb;
-                struct xdp_buff xdp;
                 unsigned int size;

                 /* return some buffers to hardware, one at a time is too slow */
@@ -6499,6 +6501,11 @@ int ixgbe_setup_rx_resources(struct ixgbe_adapter *adapter,
         rx_ring->next_to_clean = 0;
         rx_ring->next_to_use = 0;

+        /* XDP RX-queue info */
+        if (xdp_rxq_info_reg(&rx_ring->xdp_rxq, adapter->netdev,
+                             rx_ring->queue_index) < 0)
+                goto err;
+
         rx_ring->xdp_prog = adapter->xdp_prog;

         return 0;
@@ -6596,6 +6603,7 @@ void ixgbe_free_rx_resources(struct ixgbe_ring *rx_ring)
         ixgbe_clean_rx_ring(rx_ring);

         rx_ring->xdp_prog = NULL;
+        xdp_rxq_info_unreg(&rx_ring->xdp_rxq);
         vfree(rx_ring->rx_buffer_info);
         rx_ring->rx_buffer_info = NULL;

From patchwork Mon Nov 2 12:48:20 2020
X-Patchwork-Submitter: William Breathitt Gray
X-Patchwork-Id: 1392212
From: William Breathitt Gray
To: kernel-team@lists.ubuntu.com
Subject: [SRU][B:linux-azure-4.15][PATCH 04/40] xdp/qede: setup xdp_rxq_info and intro xdp_rxq_info_is_reg
Date: Mon, 2 Nov 2020 07:48:20 -0500
Message-Id: <20201102124856.4659-5-william.gray@canonical.com>
In-Reply-To: <20201102124856.4659-1-william.gray@canonical.com>
References: <20201102124856.4659-1-william.gray@canonical.com>

From: Jesper Dangaard Brouer

BugLink: https://bugs.launchpad.net/bugs/1877654

The driver code qede_free_fp_array() depends on kfree() accepting a
NULL pointer.
This stems from the qede_alloc_fp_array() function, which
(kz)allocs memory for either fp->txq or fp->rxq. This also simplifies
error handling in case of memory allocation failures, but
xdp_rxq_info_unreg needs to know the difference.

Introduce xdp_rxq_info_is_reg() to handle the case where a memory
allocation fails: the failure path is detected by seeing that the
xdp_rxq_info was not registered yet, which first happens after
successful allocation in qede_init_fp().

Driver hook points for xdp_rxq_info:
 * reg  : qede_init_fp
 * unreg: qede_free_fp_array

Tested on actual hardware with samples/bpf program.

V2: The driver has no proper error path for a failed XDP RX-queue
info reg, as qede_init_fp() is a void function.
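The cleanup idiom this enables, in isolation (a sketch with an
abbreviated fastpath struct, not the qede code verbatim):

static void free_rx_path(struct my_fp *fp)
{
        /* Allocation may have failed before qede_init_fp() ever
         * registered the xdp_rxq_info, so only unregister when the
         * registration actually happened.
         */
        if (fp->rxq && xdp_rxq_info_is_reg(&fp->rxq->xdp_rxq))
                xdp_rxq_info_unreg(&fp->rxq->xdp_rxq);
        kfree(fp->rxq);         /* safe on NULL, as the message notes */
}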
Cc: everest-linux-l2@cavium.com
Cc: Ariel Elior
Signed-off-by: Jesper Dangaard Brouer
Signed-off-by: Alexei Starovoitov
(cherry picked from commit c0124f327e5cabd844a10d7e1fc5aa2a81e796a9)
Signed-off-by: William Breathitt Gray
---
 drivers/net/ethernet/qlogic/qede/qede.h      |  2 ++
 drivers/net/ethernet/qlogic/qede/qede_fp.c   |  1 +
 drivers/net/ethernet/qlogic/qede/qede_main.c | 10 ++++++++++
 include/net/xdp.h                            |  1 +
 net/core/xdp.c                               |  6 ++++++
 5 files changed, 20 insertions(+)

diff --git a/drivers/net/ethernet/qlogic/qede/qede.h b/drivers/net/ethernet/qlogic/qede/qede.h
index 2a8535f1e966..93981490cc84 100644
--- a/drivers/net/ethernet/qlogic/qede/qede.h
+++ b/drivers/net/ethernet/qlogic/qede/qede.h
@@ -40,6 +40,7 @@
 #include <linux/kernel.h>
 #include <linux/mutex.h>
 #include <linux/bpf.h>
+#include <net/xdp.h>
 #include <linux/qed/qede_rdma.h>
 #include <linux/io.h>
 #ifdef CONFIG_RFS_ACCEL
@@ -349,6 +350,7 @@ struct qede_rx_queue {
         u64 xdp_no_pass;

         void *handle;
+        struct xdp_rxq_info xdp_rxq;
 };

 union db_prod {
diff --git a/drivers/net/ethernet/qlogic/qede/qede_fp.c b/drivers/net/ethernet/qlogic/qede/qede_fp.c
index da80852b5ce0..14941303189d 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_fp.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_fp.c
@@ -1004,6 +1004,7 @@ static bool qede_rx_xdp(struct qede_dev *edev,
         xdp.data = xdp.data_hard_start + *data_offset;
         xdp_set_data_meta_invalid(&xdp);
         xdp.data_end = xdp.data + *len;
+        xdp.rxq = &rxq->xdp_rxq;

         /* Queues always have a full reset currently, so for the time
          * being until there's atomic program replace just mark read
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 60d59b057b9b..43ed50cb0030 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -770,6 +770,12 @@ static void qede_free_fp_array(struct qede_dev *edev)
                         fp = &edev->fp_array[i];

                         kfree(fp->sb_info);
+                        /* Handle mem alloc failure case where qede_init_fp
+                         * didn't register xdp_rxq_info yet.
+                         * Implicit only (fp->type & QEDE_FASTPATH_RX)
+                         */
+                        if (fp->rxq && xdp_rxq_info_is_reg(&fp->rxq->xdp_rxq))
+                                xdp_rxq_info_unreg(&fp->rxq->xdp_rxq);
                         kfree(fp->rxq);
                         kfree(fp->xdp_tx);
                         kfree(fp->txq);
@@ -1518,6 +1524,10 @@ static void qede_init_fp(struct qede_dev *edev)
                         else
                                 fp->rxq->data_direction = DMA_FROM_DEVICE;
                         fp->rxq->dev = &edev->pdev->dev;
+
+                        /* Driver have no error path from here */
+                        WARN_ON(xdp_rxq_info_reg(&fp->rxq->xdp_rxq, edev->ndev,
+                                                 fp->rxq->rxq_id) < 0);
                 }

                 if (fp->type & QEDE_FASTPATH_TX) {
diff --git a/include/net/xdp.h b/include/net/xdp.h
index 86c41631a908..b2362ddfa694 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -43,5 +43,6 @@ int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
                      struct net_device *dev, u32 queue_index);
 void xdp_rxq_info_unreg(struct xdp_rxq_info *xdp_rxq);
 void xdp_rxq_info_unused(struct xdp_rxq_info *xdp_rxq);
+bool xdp_rxq_info_is_reg(struct xdp_rxq_info *xdp_rxq);

 #endif /* __LINUX_NET_XDP_H__ */
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 229bc5a0ee04..097a0f74e004 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -65,3 +65,9 @@ void xdp_rxq_info_unused(struct xdp_rxq_info *xdp_rxq)
         xdp_rxq->reg_state = REG_STATE_UNUSED;
 }
 EXPORT_SYMBOL_GPL(xdp_rxq_info_unused);
+
+bool xdp_rxq_info_is_reg(struct xdp_rxq_info *xdp_rxq)
+{
+        return (xdp_rxq->reg_state == REG_STATE_REGISTERED);
+}
+EXPORT_SYMBOL_GPL(xdp_rxq_info_is_reg);

From patchwork Mon Nov 2 12:48:21 2020
X-Patchwork-Submitter: William Breathitt Gray
X-Patchwork-Id: 1392213
From: William Breathitt Gray
To: kernel-team@lists.ubuntu.com
Subject: [SRU][B:linux-azure-4.15][PATCH 05/40] tun: setup xdp_rxq_info
Date: Mon, 2 Nov 2020 07:48:21 -0500
Message-Id: <20201102124856.4659-6-william.gray@canonical.com>
In-Reply-To: <20201102124856.4659-1-william.gray@canonical.com>
References: <20201102124856.4659-1-william.gray@canonical.com>

From: Jesper Dangaard Brouer

BugLink: https://bugs.launchpad.net/bugs/1877654

Driver hook points for xdp_rxq_info:
 * reg  : tun_attach
 * unreg: __tun_detach

I've done some manual testing of this tun driver, but I would
appreciate good review and someone else running their use-case tests,
as I'm not 100% sure I understand the tfile->detached semantics.

V2: Removed the skb_array_cleanup() call from V1 by request from
Jason Wang.
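The tfile->detached handling reduces to the following: a detached
queue keeps its registered xdp_rxq_info, so re-attach only has to fix
up the queue index, while a fresh attach registers from scratch
(condensed from the diff below, not additional code):

if (tfile->detached) {
        /* Still registered from the original attach. */
        WARN_ON(!xdp_rxq_info_is_reg(&tfile->xdp_rxq));
        if (tfile->xdp_rxq.queue_index != tfile->queue_index)
                tfile->xdp_rxq.queue_index = tfile->queue_index;
} else {
        err = xdp_rxq_info_reg(&tfile->xdp_rxq, tun->dev,
                               tfile->queue_index);
        if (err < 0)
                goto out;
}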
Tsirkin" Cc: Willem de Bruijn Signed-off-by: Jesper Dangaard Brouer Reviewed-by: Jason Wang Signed-off-by: Alexei Starovoitov (backported from commit 8bf5c4ee1889308ccd396fdfd40ac94129ee419f) [ vilhelmgray: context adjustments ] Signed-off-by: William Breathitt Gray --- drivers/net/tun.c | 25 ++++++++++++++++++++++++- 1 file changed, 24 insertions(+), 1 deletion(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index e9c7317618fa..2434b2bd902f 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -181,6 +181,7 @@ struct tun_file { struct list_head next; struct tun_struct *detached; struct skb_array tx_array; + struct xdp_rxq_info xdp_rxq; }; struct tun_flow_entry { @@ -658,7 +659,10 @@ static void __tun_detach(struct tun_file *tfile, bool clean) tun->dev->reg_state == NETREG_REGISTERED) unregister_netdevice(tun->dev); } - skb_array_cleanup(&tfile->tx_array); + if (tun) { + skb_array_cleanup(&tfile->tx_array); + xdp_rxq_info_unreg(&tfile->xdp_rxq); + } sock_put(&tfile->sk); } } @@ -699,11 +703,13 @@ static void tun_detach_all(struct net_device *dev) tun_napi_del(tfile); /* Drop read queue */ tun_queue_purge(tfile); + xdp_rxq_info_unreg(&tfile->xdp_rxq); sock_put(&tfile->sk); } list_for_each_entry_safe(tfile, tmp, &tun->disabled, next) { tun_enable_queue(tfile); tun_queue_purge(tfile); + xdp_rxq_info_unreg(&tfile->xdp_rxq); sock_put(&tfile->sk); } BUG_ON(tun->numdisabled != 0); @@ -759,6 +765,22 @@ static int tun_attach(struct tun_struct *tun, struct file *file, tfile->queue_index = tun->numqueues; tfile->socket.sk->sk_shutdown &= ~RCV_SHUTDOWN; + + if (tfile->detached) { + /* Re-attach detached tfile, updating XDP queue_index */ + WARN_ON(!xdp_rxq_info_is_reg(&tfile->xdp_rxq)); + + if (tfile->xdp_rxq.queue_index != tfile->queue_index) + tfile->xdp_rxq.queue_index = tfile->queue_index; + } else { + /* Setup XDP RX-queue info, for new tfile getting attached */ + err = xdp_rxq_info_reg(&tfile->xdp_rxq, + tun->dev, tfile->queue_index); + if (err < 0) + goto out; + err = 0; + } + if (tfile->detached) { tun_enable_queue(tfile); } else { @@ -1498,6 +1520,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, xdp.data = buf + pad; xdp_set_data_meta_invalid(&xdp); xdp.data_end = xdp.data + len; + xdp.rxq = &tfile->xdp_rxq; orig_data = xdp.data; act = bpf_prog_run_xdp(xdp_prog, &xdp); From patchwork Mon Nov 2 12:48:22 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: William Breathitt Gray X-Patchwork-Id: 1392219 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=canonical.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4CPt5l6D4wz9sWM; Mon, 2 Nov 2020 23:49:15 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1kZZGn-0005yW-6X; Mon, 02 Nov 2020 12:49:09 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps 
From: William Breathitt Gray
To: kernel-team@lists.ubuntu.com
Subject: [SRU][B:linux-azure-4.15][PATCH 06/40] virtio_net: setup xdp_rxq_info
Date: Mon, 2 Nov 2020 07:48:22 -0500
Message-Id: <20201102124856.4659-7-william.gray@canonical.com>
In-Reply-To: <20201102124856.4659-1-william.gray@canonical.com>
References: <20201102124856.4659-1-william.gray@canonical.com>

From: Jesper Dangaard Brouer

BugLink: https://bugs.launchpad.net/bugs/1877654

The virtio_net driver doesn't dynamically change the RX-ring queue
layout and backing pages, but instead rejects XDP setup if all the
conditions for XDP are not met. Thus, the xdp_rxq_info also remains
fairly static. This allows us to simply add the reg/unreg to the
net_device open/close functions.

Driver hook points for xdp_rxq_info:
 * reg  : virtnet_open
 * unreg: virtnet_close

V3:
 - bugfix, also setup xdp.rxq in receive_mergeable()
 - Tested bpf-sample prog inside guest on a virtio_net device
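For a driver whose RX rings stay fixed while XDP is attached, the
hook points reduce to a symmetric pair in ndo_open/ndo_close. A
minimal sketch (my_priv and my_rq are illustrative stand-ins for the
virtio_net structures, and the error handling mirrors the diff below):

struct my_rq {
        struct xdp_rxq_info xdp_rxq;
};

struct my_priv {
        struct my_rq *rq;
        int max_queue_pairs;
};

static int my_open(struct net_device *dev)
{
        struct my_priv *vi = netdev_priv(dev);
        int i, err;

        for (i = 0; i < vi->max_queue_pairs; i++) {
                err = xdp_rxq_info_reg(&vi->rq[i].xdp_rxq, dev, i);
                if (err < 0)
                        return err;
        }
        return 0;
}

static int my_close(struct net_device *dev)
{
        struct my_priv *vi = netdev_priv(dev);
        int i;

        for (i = 0; i < vi->max_queue_pairs; i++)
                xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq);
        return 0;
}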
Tsirkin" Cc: Jason Wang Cc: virtualization@lists.linux-foundation.org Signed-off-by: Jesper Dangaard Brouer Reviewed-by: Jason Wang Signed-off-by: Alexei Starovoitov (backported from commit 754b8a21a96d5f11712245aef907149606b323ae) Signed-off-by: William Breathitt Gray --- drivers/net/virtio_net.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 2b6916c012d2..76a1eb622a04 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -31,6 +31,7 @@ #include #include #include +#include static int napi_weight = NAPI_POLL_WEIGHT; module_param(napi_weight, int, 0444); @@ -116,6 +117,8 @@ struct receive_queue { /* Name of this receive queue: input.$index */ char name[40]; + + struct xdp_rxq_info xdp_rxq; }; /* Control VQ buffers: protected by the rtnl lock */ @@ -565,6 +568,7 @@ static struct sk_buff *receive_small(struct net_device *dev, xdp.data = xdp.data_hard_start + xdp_headroom; xdp_set_data_meta_invalid(&xdp); xdp.data_end = xdp.data + len; + xdp.rxq = &rq->xdp_rxq; orig_data = xdp.data; act = bpf_prog_run_xdp(xdp_prog, &xdp); @@ -702,6 +706,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, xdp.data = data + vi->hdr_len; xdp_set_data_meta_invalid(&xdp); xdp.data_end = xdp.data + (len - vi->hdr_len); + xdp.rxq = &rq->xdp_rxq; + act = bpf_prog_run_xdp(xdp_prog, &xdp); if (act != XDP_PASS) @@ -1257,13 +1263,18 @@ static int virtnet_poll(struct napi_struct *napi, int budget) static int virtnet_open(struct net_device *dev) { struct virtnet_info *vi = netdev_priv(dev); - int i; + int i, err; for (i = 0; i < vi->max_queue_pairs; i++) { if (i < vi->curr_queue_pairs) /* Make sure we have some buffers: if oom use wq. */ if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL)) schedule_delayed_work(&vi->refill, 0); + + err = xdp_rxq_info_reg(&vi->rq[i].xdp_rxq, dev, i); + if (err < 0) + return err; + virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi); virtnet_napi_tx_enable(vi, vi->sq[i].vq, &vi->sq[i].napi); } @@ -1601,6 +1612,7 @@ static int virtnet_close(struct net_device *dev) cancel_delayed_work_sync(&vi->refill); for (i = 0; i < vi->max_queue_pairs; i++) { + xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq); napi_disable(&vi->rq[i].napi); virtnet_napi_tx_disable(&vi->sq[i].napi); } From patchwork Mon Nov 2 12:48:23 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: William Breathitt Gray X-Patchwork-Id: 1392218 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=canonical.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4CPt5k50pTz9sWL; Mon, 2 Nov 2020 23:49:14 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1kZZGo-0005zr-KC; Mon, 02 Nov 2020 12:49:10 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 
From: William Breathitt Gray
To: kernel-team@lists.ubuntu.com
Subject: [SRU][B:linux-azure-4.15][PATCH 07/40] xdp: generic XDP handling of xdp_rxq_info
Date: Mon, 2 Nov 2020 07:48:23 -0500
Message-Id: <20201102124856.4659-8-william.gray@canonical.com>
In-Reply-To: <20201102124856.4659-1-william.gray@canonical.com>
References: <20201102124856.4659-1-william.gray@canonical.com>

From: Jesper Dangaard Brouer

BugLink: https://bugs.launchpad.net/bugs/1877654

Hook points for xdp_rxq_info:
 * reg  : netif_alloc_rx_queues
 * unreg: netif_free_rx_queues

The net_device has some members (num_rx_queues + real_num_rx_queues)
and a data area (dev->_rx with struct netdev_rx_queue's) that were
primarily used for exporting information about RPS (CONFIG_RPS)
queues to sysfs (CONFIG_SYSFS).

For generic XDP, extend struct netdev_rx_queue with the xdp_rxq_info,
and remove some of the CONFIG_SYSFS ifdefs.
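Registration can fail partway through the queue array, so
netif_alloc_rx_queues() unwinds only what succeeded. The idiom in
isolation (a sketch simplified from the diff below):

static int my_alloc_rx_queues(struct net_device *dev,
                              struct netdev_rx_queue *rx,
                              unsigned int count)
{
        unsigned int i;
        int err;

        for (i = 0; i < count; i++) {
                err = xdp_rxq_info_reg(&rx[i].xdp_rxq, dev, i);
                if (err < 0)
                        goto err_rxq_info;
        }
        return 0;

err_rxq_info:
        while (i--)     /* unwind only the successfully registered queues */
                xdp_rxq_info_unreg(&rx[i].xdp_rxq);
        return err;
}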
Signed-off-by: Jesper Dangaard Brouer
Signed-off-by: Alexei Starovoitov
(cherry picked from commit e817f85652c14d78f170b18797e4c477c78949e0)
Signed-off-by: William Breathitt Gray
---
 include/linux/netdevice.h |  2 ++
 net/core/dev.c            | 69 +++++++++++++++++++++++++++++++++------
 2 files changed, 61 insertions(+), 10 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ebe9a5bc648a..0482941b1647 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -44,6 +44,7 @@
 #include <net/dcbnl.h>
 #endif
 #include <net/netprio_cgroup.h>
+#include <net/xdp.h>

 #include <linux/netdev_features.h>
 #include <linux/neighbour.h>
@@ -686,6 +687,7 @@ struct netdev_rx_queue {
 #endif
         struct kobject                  kobj;
         struct net_device               *dev;
+        struct xdp_rxq_info             xdp_rxq;
 } ____cacheline_aligned_in_smp;

 /*
diff --git a/net/core/dev.c b/net/core/dev.c
index 0e37c4da15b5..d84ba4b0587a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3931,9 +3931,33 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
         return NET_RX_DROP;
 }

+static struct netdev_rx_queue *netif_get_rxqueue(struct sk_buff *skb)
+{
+        struct net_device *dev = skb->dev;
+        struct netdev_rx_queue *rxqueue;
+
+        rxqueue = dev->_rx;
+
+        if (skb_rx_queue_recorded(skb)) {
+                u16 index = skb_get_rx_queue(skb);
+
+                if (unlikely(index >= dev->real_num_rx_queues)) {
+                        WARN_ONCE(dev->real_num_rx_queues > 1,
+                                  "%s received packet on queue %u, but number "
+                                  "of RX queues is %u\n",
+                                  dev->name, index, dev->real_num_rx_queues);
+
+                        return rxqueue; /* Return first rxqueue */
+                }
+                rxqueue += index;
+        }
+        return rxqueue;
+}
+
 static u32 netif_receive_generic_xdp(struct sk_buff *skb,
                                      struct bpf_prog *xdp_prog)
 {
+        struct netdev_rx_queue *rxqueue;
         u32 metalen, act = XDP_DROP;
         struct xdp_buff xdp;
         void *orig_data;
@@ -3977,6 +4001,9 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
         xdp.data_hard_start = skb->data - skb_headroom(skb);
         orig_data = xdp.data;

+        rxqueue = netif_get_rxqueue(skb);
+        xdp.rxq = &rxqueue->xdp_rxq;
+
         act = bpf_prog_run_xdp(xdp_prog, &xdp);

         off = xdp.data - orig_data;
@@ -7752,12 +7779,12 @@ void netif_stacked_transfer_operstate(const struct net_device *rootdev,
 }
 EXPORT_SYMBOL(netif_stacked_transfer_operstate);

-#ifdef CONFIG_SYSFS
 static int netif_alloc_rx_queues(struct net_device *dev)
 {
         unsigned int i, count = dev->num_rx_queues;
         struct netdev_rx_queue *rx;
         size_t sz = count * sizeof(*rx);
+        int err = 0;

         BUG_ON(count < 1);

@@ -7767,11 +7794,39 @@ static int netif_alloc_rx_queues(struct net_device *dev)

         dev->_rx = rx;

-        for (i = 0; i < count; i++)
+        for (i = 0; i < count; i++) {
                 rx[i].dev = dev;
+
+                /* XDP RX-queue setup */
+                err = xdp_rxq_info_reg(&rx[i].xdp_rxq, dev, i);
+                if (err < 0)
+                        goto err_rxq_info;
+        }
         return 0;
+
+err_rxq_info:
+        /* Rollback successful reg's and free other resources */
+        while (i--)
+                xdp_rxq_info_unreg(&rx[i].xdp_rxq);
+        kfree(dev->_rx);
+        dev->_rx = NULL;
+        return err;
+}
+
+static void netif_free_rx_queues(struct net_device *dev)
+{
+        unsigned int i, count = dev->num_rx_queues;
+        struct netdev_rx_queue *rx;
+
+        /* netif_alloc_rx_queues alloc failed, resources have been unreg'ed */
+        if (!dev->_rx)
+                return;
+
+        rx = dev->_rx;
+
+        for (i = 0; i < count; i++)
+                xdp_rxq_info_unreg(&rx[i].xdp_rxq);
 }
-#endif

 static void netdev_init_one_queue(struct net_device *dev,
                                   struct netdev_queue *queue, void *_unused)
@@ -8363,12 +8418,10 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
                 return NULL;
         }

-#ifdef CONFIG_SYSFS
         if (rxqs < 1) {
                 pr_err("alloc_netdev: Unable to allocate device with zero RX queues\n");
                 return NULL;
         }
-#endif

         alloc_size = sizeof(struct net_device);
         if (sizeof_priv) {
@@ -8427,12 +8480,10 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
         if (netif_alloc_netdev_queues(dev))
                 goto free_all;

-#ifdef CONFIG_SYSFS
         dev->num_rx_queues = rxqs;
         dev->real_num_rx_queues = rxqs;
         if (netif_alloc_rx_queues(dev))
                 goto free_all;
-#endif

         strcpy(dev->name, name);
         dev->name_assign_type = name_assign_type;
@@ -8472,9 +8523,7 @@ void free_netdev(struct net_device *dev)
         might_sleep();
         netif_free_tx_queues(dev);
-#ifdef CONFIG_SYSFS
-        kvfree(dev->_rx);
-#endif
+        netif_free_rx_queues(dev);

         kfree(rcu_dereference_protected(dev->ingress_queue, 1));
From patchwork Mon Nov 2 12:48:24 2020
X-Patchwork-Submitter: William Breathitt Gray
X-Patchwork-Id: 1392220
From: William Breathitt Gray
To: kernel-team@lists.ubuntu.com
Subject: [SRU][B:linux-azure-4.15][PATCH 08/40] tun/tap: use ptr_ring instead of skb_array
Date: Mon, 2 Nov 2020 07:48:24 -0500
Message-Id: <20201102124856.4659-9-william.gray@canonical.com>
In-Reply-To: <20201102124856.4659-1-william.gray@canonical.com>
References: <20201102124856.4659-1-william.gray@canonical.com>

From: Jason Wang

BugLink: https://bugs.launchpad.net/bugs/1877654

This patch switches to use ptr_ring instead of skb_array. This will
be used to enqueue different types of pointers by encoding the type
into the lower bits.

Signed-off-by: Jason Wang
Signed-off-by: David S. Miller
(backported from commit 5990a30510ed1c37a769d3a035ad2d030b843528)
[ vilhelmgray: context adjustments ]
[ vilhelmgray: use ptr_ring_resize() in tun_attach() ]
[ vilhelmgray: use ptr_ring_init() in tun_chr_open() ]
Signed-off-by: William Breathitt Gray
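The "encoding the type into the lower bits" motivation relies on
queued pointers being aligned, which leaves their low bit(s) free for
a tag. A self-contained illustration of the trick (not the tun code,
which introduces its tagging separately):

#include <assert.h>
#include <stdint.h>

#define PTR_TAG_MASK 1UL  /* low bit is free on 2-byte+ aligned pointers */

static inline void *tag_ptr(void *ptr, unsigned long tag)
{
        return (void *)((uintptr_t)ptr | (tag & PTR_TAG_MASK));
}

static inline unsigned long ptr_tag(void *ptr)
{
        return (uintptr_t)ptr & PTR_TAG_MASK;
}

static inline void *untag_ptr(void *ptr)
{
        return (void *)((uintptr_t)ptr & ~PTR_TAG_MASK);
}

int main(void)
{
        int frame;                        /* stand-in for a queued object */
        void *slot = tag_ptr(&frame, 1);  /* e.g. 1 = "XDP frame", 0 = "skb" */

        assert(ptr_tag(slot) == 1);
        assert(untag_ptr(slot) == (void *)&frame);
        return 0;
}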
__skb_array_destroy_skb); } static int tap_open(struct inode *inode, struct file *file) @@ -514,7 +517,7 @@ static int tap_open(struct inode *inode, struct file *file) &tap_proto, 0); if (!q) goto err; - if (skb_array_init(&q->skb_array, tap->dev->tx_queue_len, GFP_KERNEL)) { + if (ptr_ring_init(&q->ring, tap->dev->tx_queue_len, GFP_KERNEL)) { sk_free(&q->sk); goto err; } @@ -543,7 +546,7 @@ static int tap_open(struct inode *inode, struct file *file) err = tap_set_queue(tap, file, q); if (err) { - /* tap_sock_destruct() will take care of freeing skb_array */ + /* tap_sock_destruct() will take care of freeing ptr_ring */ goto err_put; } @@ -580,7 +583,7 @@ static unsigned int tap_poll(struct file *file, poll_table *wait) mask = 0; poll_wait(file, &q->wq.wait, wait); - if (!skb_array_empty(&q->skb_array)) + if (!ptr_ring_empty(&q->ring)) mask |= POLLIN | POLLRDNORM; if (sock_writeable(&q->sk) || @@ -844,7 +847,7 @@ static ssize_t tap_do_read(struct tap_queue *q, TASK_INTERRUPTIBLE); /* Read frames from the queue */ - skb = skb_array_consume(&q->skb_array); + skb = ptr_ring_consume(&q->ring); if (skb) break; if (noblock) { @@ -1176,7 +1179,7 @@ static int tap_peek_len(struct socket *sock) { struct tap_queue *q = container_of(sock, struct tap_queue, sock); - return skb_array_peek_len(&q->skb_array); + return PTR_RING_PEEK_CALL(&q->ring, __skb_array_len_with_tag); } /* Ops structure to mimic raw sockets with tun */ @@ -1202,7 +1205,7 @@ struct socket *tap_get_socket(struct file *file) } EXPORT_SYMBOL_GPL(tap_get_socket); -struct skb_array *tap_get_skb_array(struct file *file) +struct ptr_ring *tap_get_ptr_ring(struct file *file) { struct tap_queue *q; @@ -1211,29 +1214,30 @@ struct skb_array *tap_get_skb_array(struct file *file) q = file->private_data; if (!q) return ERR_PTR(-EBADFD); - return &q->skb_array; + return &q->ring; } -EXPORT_SYMBOL_GPL(tap_get_skb_array); +EXPORT_SYMBOL_GPL(tap_get_ptr_ring); int tap_queue_resize(struct tap_dev *tap) { struct net_device *dev = tap->dev; struct tap_queue *q; - struct skb_array **arrays; + struct ptr_ring **rings; int n = tap->numqueues; int ret, i = 0; - arrays = kmalloc_array(n, sizeof(*arrays), GFP_KERNEL); - if (!arrays) + rings = kmalloc_array(n, sizeof(*rings), GFP_KERNEL); + if (!rings) return -ENOMEM; list_for_each_entry(q, &tap->queue_list, next) - arrays[i++] = &q->skb_array; + rings[i++] = &q->ring; - ret = skb_array_resize_multiple(arrays, n, - dev->tx_queue_len, GFP_KERNEL); + ret = ptr_ring_resize_multiple(rings, n, + dev->tx_queue_len, GFP_KERNEL, + __skb_array_destroy_skb); - kfree(arrays); + kfree(rings); return ret; } EXPORT_SYMBOL_GPL(tap_queue_resize); diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 2434b2bd902f..d1f2f25a748f 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -180,7 +180,7 @@ struct tun_file { struct mutex napi_mutex; /* Protects access to the above napi */ struct list_head next; struct tun_struct *detached; - struct skb_array tx_array; + struct ptr_ring tx_ring; struct xdp_rxq_info xdp_rxq; }; @@ -606,7 +606,7 @@ static void tun_queue_purge(struct tun_file *tfile) { struct sk_buff *skb; - while ((skb = skb_array_consume(&tfile->tx_array)) != NULL) + while ((skb = ptr_ring_consume(&tfile->tx_ring)) != NULL) kfree_skb(skb); skb_queue_purge(&tfile->sk.sk_write_queue); @@ -660,7 +660,8 @@ static void __tun_detach(struct tun_file *tfile, bool clean) unregister_netdevice(tun->dev); } if (tun) { - skb_array_cleanup(&tfile->tx_array); + ptr_ring_cleanup(&tfile->tx_ring, + __skb_array_destroy_skb); 
xdp_rxq_info_unreg(&tfile->xdp_rxq); } sock_put(&tfile->sk); @@ -758,7 +759,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file, } if (!tfile->detached && - skb_array_resize(&tfile->tx_array, dev->tx_queue_len, GFP_KERNEL)) { + ptr_ring_resize(&tfile->tx_ring, dev->tx_queue_len, GFP_KERNEL, __skb_array_destroy_skb)) { err = -ENOMEM; goto out; } @@ -1012,7 +1013,7 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev) nf_reset(skb); - if (skb_array_produce(&tfile->tx_array, skb)) + if (ptr_ring_produce(&tfile->tx_ring, skb)) goto drop; /* Notify and wake up reader process */ @@ -1302,7 +1303,7 @@ static unsigned int tun_chr_poll(struct file *file, poll_table *wait) poll_wait(file, sk_sleep(sk), wait); - if (!skb_array_empty(&tfile->tx_array)) + if (!ptr_ring_empty(&tfile->tx_ring)) mask |= POLLIN | POLLRDNORM; /* Make sure SOCKWQ_ASYNC_NOSPACE is set if not writable to @@ -1981,7 +1982,7 @@ static struct sk_buff *tun_ring_recv(struct tun_file *tfile, int noblock, struct sk_buff *skb = NULL; int error = 0; - skb = skb_array_consume(&tfile->tx_array); + skb = ptr_ring_consume(&tfile->tx_ring); if (skb) goto out; if (noblock) { @@ -1993,7 +1994,7 @@ static struct sk_buff *tun_ring_recv(struct tun_file *tfile, int noblock, while (1) { set_current_state(TASK_INTERRUPTIBLE); - skb = skb_array_consume(&tfile->tx_array); + skb = ptr_ring_consume(&tfile->tx_ring); if (skb) break; if (signal_pending(current)) { @@ -2191,7 +2192,7 @@ static int tun_peek_len(struct socket *sock) if (!tun) return 0; - ret = skb_array_peek_len(&tfile->tx_array); + ret = PTR_RING_PEEK_CALL(&tfile->tx_ring, __skb_array_len_with_tag); tun_put(tun); return ret; @@ -2921,7 +2922,7 @@ static int tun_chr_open(struct inode *inode, struct file * file) &tun_proto, 0); if (!tfile) return -ENOMEM; - if (skb_array_init(&tfile->tx_array, 0, GFP_KERNEL)) { + if (ptr_ring_init(&tfile->tx_ring, 0, GFP_KERNEL)) { sk_free(&tfile->sk); return -ENOMEM; } @@ -3094,25 +3095,26 @@ static int tun_queue_resize(struct tun_struct *tun) { struct net_device *dev = tun->dev; struct tun_file *tfile; - struct skb_array **arrays; + struct ptr_ring **rings; int n = tun->numqueues + tun->numdisabled; int ret, i; - arrays = kmalloc_array(n, sizeof(*arrays), GFP_KERNEL); - if (!arrays) + rings = kmalloc_array(n, sizeof(*rings), GFP_KERNEL); + if (!rings) return -ENOMEM; for (i = 0; i < tun->numqueues; i++) { tfile = rtnl_dereference(tun->tfiles[i]); - arrays[i] = &tfile->tx_array; + rings[i] = &tfile->tx_ring; } list_for_each_entry(tfile, &tun->disabled, next) - arrays[i++] = &tfile->tx_array; + rings[i++] = &tfile->tx_ring; - ret = skb_array_resize_multiple(arrays, n, - dev->tx_queue_len, GFP_KERNEL); + ret = ptr_ring_resize_multiple(rings, n, + dev->tx_queue_len, GFP_KERNEL, + __skb_array_destroy_skb); - kfree(arrays); + kfree(rings); return ret; } @@ -3207,7 +3209,7 @@ struct socket *tun_get_socket(struct file *file) } EXPORT_SYMBOL_GPL(tun_get_socket); -struct skb_array *tun_get_skb_array(struct file *file) +struct ptr_ring *tun_get_tx_ring(struct file *file) { struct tun_file *tfile; @@ -3216,9 +3218,9 @@ struct skb_array *tun_get_skb_array(struct file *file) tfile = file->private_data; if (!tfile) return ERR_PTR(-EBADFD); - return &tfile->tx_array; + return &tfile->tx_ring; } -EXPORT_SYMBOL_GPL(tun_get_skb_array); +EXPORT_SYMBOL_GPL(tun_get_tx_ring); module_init(tun_init); module_exit(tun_cleanup); diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 17606d9688c7..0f1e31ee29b3 100644 --- 
a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -95,7 +95,7 @@ struct vhost_net_ubuf_ref { #define VHOST_RX_BATCH 64 struct vhost_net_buf { - struct sk_buff **queue; + void **queue; int tail; int head; }; @@ -114,7 +114,7 @@ struct vhost_net_virtqueue { /* Reference counting for outstanding ubufs. * Protected by vq mutex. Writers must also take device mutex. */ struct vhost_net_ubuf_ref *ubufs; - struct skb_array *rx_array; + struct ptr_ring *rx_ring; struct vhost_net_buf rxq; }; @@ -164,7 +164,7 @@ static int vhost_net_buf_produce(struct vhost_net_virtqueue *nvq) struct vhost_net_buf *rxq = &nvq->rxq; rxq->head = 0; - rxq->tail = skb_array_consume_batched(nvq->rx_array, rxq->queue, + rxq->tail = ptr_ring_consume_batched(nvq->rx_ring, rxq->queue, VHOST_RX_BATCH); return rxq->tail; } @@ -173,9 +173,10 @@ static void vhost_net_buf_unproduce(struct vhost_net_virtqueue *nvq) { struct vhost_net_buf *rxq = &nvq->rxq; - if (nvq->rx_array && !vhost_net_buf_is_empty(rxq)) { - skb_array_unconsume(nvq->rx_array, rxq->queue + rxq->head, - vhost_net_buf_get_size(rxq)); + if (nvq->rx_ring && !vhost_net_buf_is_empty(rxq)) { + ptr_ring_unconsume(nvq->rx_ring, rxq->queue + rxq->head, + vhost_net_buf_get_size(rxq), + __skb_array_destroy_skb); rxq->head = rxq->tail = 0; } } @@ -593,7 +594,7 @@ static int peek_head_len(struct vhost_net_virtqueue *rvq, struct sock *sk) int len = 0; unsigned long flags; - if (rvq->rx_array) + if (rvq->rx_ring) return vhost_net_buf_peek(rvq); spin_lock_irqsave(&sk->sk_receive_queue.lock, flags); @@ -806,7 +807,7 @@ static void handle_rx(struct vhost_net *net) * they refilled. */ goto out; } - if (nvq->rx_array) + if (nvq->rx_ring) msg.msg_control = vhost_net_buf_consume(&nvq->rxq); /* On overrun, truncate and discard */ if (unlikely(headcount > UIO_MAXIOV)) { @@ -910,7 +911,7 @@ static int vhost_net_open(struct inode *inode, struct file *f) struct vhost_net *n; struct vhost_dev *dev; struct vhost_virtqueue **vqs; - struct sk_buff **queue; + void **queue; int i; n = kvmalloc(sizeof *n, GFP_KERNEL | __GFP_RETRY_MAYFAIL); @@ -922,7 +923,7 @@ static int vhost_net_open(struct inode *inode, struct file *f) return -ENOMEM; } - queue = kmalloc_array(VHOST_RX_BATCH, sizeof(struct sk_buff *), + queue = kmalloc_array(VHOST_RX_BATCH, sizeof(void *), GFP_KERNEL); if (!queue) { kfree(vqs); @@ -1052,23 +1053,23 @@ static struct socket *get_raw_socket(int fd) return ERR_PTR(r); } -static struct skb_array *get_tap_skb_array(int fd) +static struct ptr_ring *get_tap_ptr_ring(int fd) { - struct skb_array *array; + struct ptr_ring *ring; struct file *file = fget(fd); if (!file) return NULL; - array = tun_get_skb_array(file); - if (!IS_ERR(array)) + ring = tun_get_tx_ring(file); + if (!IS_ERR(ring)) goto out; - array = tap_get_skb_array(file); - if (!IS_ERR(array)) + ring = tap_get_ptr_ring(file); + if (!IS_ERR(ring)) goto out; - array = NULL; + ring = NULL; out: fput(file); - return array; + return ring; } static struct socket *get_tap_socket(int fd) @@ -1149,7 +1150,7 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd) vq->private_data = sock; vhost_net_buf_unproduce(nvq); if (index == VHOST_NET_VQ_RX) - nvq->rx_array = get_tap_skb_array(fd); + nvq->rx_ring = get_tap_ptr_ring(fd); r = vhost_vq_init_access(vq); if (r) goto err_used; diff --git a/include/linux/if_tap.h b/include/linux/if_tap.h index 3ecef57c31e3..8e66866c11be 100644 --- a/include/linux/if_tap.h +++ b/include/linux/if_tap.h @@ -4,7 +4,7 @@ #if IS_ENABLED(CONFIG_TAP) struct socket *tap_get_socket(struct 
file *); -struct skb_array *tap_get_skb_array(struct file *file); +struct ptr_ring *tap_get_ptr_ring(struct file *file); #else #include #include @@ -14,7 +14,7 @@ static inline struct socket *tap_get_socket(struct file *f) { return ERR_PTR(-EINVAL); } -static inline struct skb_array *tap_get_skb_array(struct file *f) +static inline struct ptr_ring *tap_get_ptr_ring(struct file *f) { return ERR_PTR(-EINVAL); } @@ -70,7 +70,7 @@ struct tap_queue { u16 queue_index; bool enabled; struct list_head next; - struct skb_array skb_array; + struct ptr_ring ring; }; rx_handler_result_t tap_handle_frame(struct sk_buff **pskb); diff --git a/include/linux/if_tun.h b/include/linux/if_tun.h index bf9bdf42d577..bdee9b83baf6 100644 --- a/include/linux/if_tun.h +++ b/include/linux/if_tun.h @@ -19,7 +19,7 @@ #if defined(CONFIG_TUN) || defined(CONFIG_TUN_MODULE) struct socket *tun_get_socket(struct file *); -struct skb_array *tun_get_skb_array(struct file *file); +struct ptr_ring *tun_get_tx_ring(struct file *file); #else #include #include @@ -29,7 +29,7 @@ static inline struct socket *tun_get_socket(struct file *f) { return ERR_PTR(-EINVAL); } -static inline struct skb_array *tun_get_skb_array(struct file *f) +static inline struct ptr_ring *tun_get_tx_ring(struct file *f) { return ERR_PTR(-EINVAL); }
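The ptr_ring calls seen throughout this diff follow a small produce/consume contract: ptr_ring_produce() returns non-zero on a full ring, ptr_ring_consume() returns NULL on an empty one, and ptr_ring_cleanup() hands leftover entries to a destructor. A minimal sketch of that pattern, assuming the same kernel API the patch uses; the example_* wrappers are invented for illustration:

/* Sketch of the ptr_ring produce/consume pattern used above.
 * Assumes <linux/ptr_ring.h>; example_* names are illustrative.
 */
#include <linux/ptr_ring.h>
#include <linux/skbuff.h>
#include <linux/skb_array.h>	/* for __skb_array_destroy_skb() */

static int example_ring_setup(struct ptr_ring *ring, unsigned int len)
{
	/* Size the ring like tap_open() does: one slot per queued packet */
	return ptr_ring_init(ring, len, GFP_KERNEL);
}

static int example_enqueue(struct ptr_ring *ring, struct sk_buff *skb)
{
	/* Non-zero return means the ring is full; the caller drops the skb */
	return ptr_ring_produce(ring, skb);
}

static struct sk_buff *example_dequeue(struct ptr_ring *ring)
{
	/* NULL when the ring is empty */
	return ptr_ring_consume(ring);
}

static void example_ring_teardown(struct ptr_ring *ring)
{
	/* Remaining entries are freed through the destructor callback */
	ptr_ring_cleanup(ring, __skb_array_destroy_skb);
}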
From patchwork Mon Nov 2 12:48:25 2020
From: William Breathitt Gray
To: kernel-team@lists.ubuntu.com
Subject: [SRU][B:linux-azure-4.15][PATCH 09/40] tuntap: XDP transmission
Date: Mon, 2 Nov 2020 07:48:25 -0500
Message-Id: <20201102124856.4659-10-william.gray@canonical.com>

From: Jason Wang

BugLink: https://bugs.launchpad.net/bugs/1877654

This patch implements XDP transmission for TAP. Since we can't create new queues for TAP during XDP set, the existing ptr_ring was reused for queuing XDP buffers. To distinguish an xdp_buff from an sk_buff, TUN_XDP_FLAG (0x1UL) is encoded into the lowest bit of the xdp_buff pointer during ptr_ring_produce, and decoded during consumption. XDP metadata is stored in the headroom of the packet, which should work in most cases since drivers usually reserve enough headroom. Very minor changes were needed for vhost_net: it just needs to peek the length depending on the type of pointer.

Tests were done on two Intel E5-2630 2.40GHz machines connected back to back through two 82599ES. Traffic was generated/received through MoonGen/testpmd(rxonly). It reports ~20% improvement when xdp_redirect_map is doing redirection from ixgbe to TAP (from 2.50Mpps to 3.05Mpps).

Cc: Jesper Dangaard Brouer
Signed-off-by: Jason Wang
Signed-off-by: David S.
Miller (backported from commit fc72d1d54dd9ffe2552c76b17e9129803ca7b255) [ vilhelmgray: context adjustments ] Signed-off-by: William Breathitt Gray --- drivers/net/tun.c | 211 ++++++++++++++++++++++++++++++++++------- drivers/vhost/net.c | 13 ++- include/linux/if_tun.h | 17 ++++ 3 files changed, 208 insertions(+), 33 deletions(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index d1f2f25a748f..6d827c0ef8cc 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -236,6 +236,24 @@ struct tun_struct { struct bpf_prog __rcu *xdp_prog; }; +bool tun_is_xdp_buff(void *ptr) +{ + return (unsigned long)ptr & TUN_XDP_FLAG; +} +EXPORT_SYMBOL(tun_is_xdp_buff); + +void *tun_xdp_to_ptr(void *ptr) +{ + return (void *)((unsigned long)ptr | TUN_XDP_FLAG); +} +EXPORT_SYMBOL(tun_xdp_to_ptr); + +void *tun_ptr_to_xdp(void *ptr) +{ + return (void *)((unsigned long)ptr & ~TUN_XDP_FLAG); +} +EXPORT_SYMBOL(tun_ptr_to_xdp); + static int tun_napi_receive(struct napi_struct *napi, int budget) { struct tun_file *tfile = container_of(napi, struct tun_file, napi); @@ -602,12 +620,25 @@ static struct tun_struct *tun_enable_queue(struct tun_file *tfile) return tun; } +static void tun_ptr_free(void *ptr) +{ + if (!ptr) + return; + if (tun_is_xdp_buff(ptr)) { + struct xdp_buff *xdp = tun_ptr_to_xdp(ptr); + + put_page(virt_to_head_page(xdp->data)); + } else { + __skb_array_destroy_skb(ptr); + } +} + static void tun_queue_purge(struct tun_file *tfile) { - struct sk_buff *skb; + void *ptr; - while ((skb = ptr_ring_consume(&tfile->tx_ring)) != NULL) - kfree_skb(skb); + while ((ptr = ptr_ring_consume(&tfile->tx_ring)) != NULL) + tun_ptr_free(ptr); skb_queue_purge(&tfile->sk.sk_write_queue); skb_queue_purge(&tfile->sk.sk_error_queue); @@ -660,8 +691,7 @@ static void __tun_detach(struct tun_file *tfile, bool clean) unregister_netdevice(tun->dev); } if (tun) { - ptr_ring_cleanup(&tfile->tx_ring, - __skb_array_destroy_skb); + ptr_ring_cleanup(&tfile->tx_ring, tun_ptr_free); xdp_rxq_info_unreg(&tfile->xdp_rxq); } sock_put(&tfile->sk); @@ -1200,6 +1230,67 @@ static const struct net_device_ops tun_netdev_ops = { .ndo_change_carrier = tun_net_change_carrier, }; +static int tun_xdp_xmit(struct net_device *dev, struct xdp_buff *xdp) +{ + struct tun_struct *tun = netdev_priv(dev); + struct xdp_buff *buff = xdp->data_hard_start; + int headroom = xdp->data - xdp->data_hard_start; + struct tun_file *tfile; + u32 numqueues; + int ret = 0; + + /* Assure headroom is available and buff is properly aligned */ + if (unlikely(headroom < sizeof(*xdp) || tun_is_xdp_buff(xdp))) + return -ENOSPC; + + *buff = *xdp; + + rcu_read_lock(); + + numqueues = READ_ONCE(tun->numqueues); + if (!numqueues) { + ret = -ENOSPC; + goto out; + } + + tfile = rcu_dereference(tun->tfiles[smp_processor_id() % + numqueues]); + /* Encode the XDP flag into lowest bit for consumer to differ + * XDP buffer from sk_buff. 
+ */ + if (ptr_ring_produce(&tfile->tx_ring, tun_xdp_to_ptr(buff))) { + this_cpu_inc(tun->pcpu_stats->tx_dropped); + ret = -ENOSPC; + } + +out: + rcu_read_unlock(); + return ret; +} + +static void tun_xdp_flush(struct net_device *dev) +{ + struct tun_struct *tun = netdev_priv(dev); + struct tun_file *tfile; + u32 numqueues; + + rcu_read_lock(); + + numqueues = READ_ONCE(tun->numqueues); + if (!numqueues) + goto out; + + tfile = rcu_dereference(tun->tfiles[smp_processor_id() % + numqueues]); + /* Notify and wake up reader process */ + if (tfile->flags & TUN_FASYNC) + kill_fasync(&tfile->fasync, SIGIO, POLL_IN); + tfile->socket.sk->sk_data_ready(tfile->socket.sk); + +out: + rcu_read_unlock(); +} + static const struct net_device_ops tap_netdev_ops = { .ndo_uninit = tun_net_uninit, .ndo_open = tun_net_open, @@ -1217,6 +1308,8 @@ static const struct net_device_ops tap_netdev_ops = { .ndo_set_rx_headroom = tun_set_headroom, .ndo_get_stats64 = tun_net_get_stats64, .ndo_bpf = tun_xdp, + .ndo_xdp_xmit = tun_xdp_xmit, + .ndo_xdp_flush = tun_xdp_flush, .ndo_change_carrier = tun_net_change_carrier, }; @@ -1877,6 +1970,40 @@ static ssize_t tun_chr_write_iter(struct kiocb *iocb, struct iov_iter *from) return result; } +static ssize_t tun_put_user_xdp(struct tun_struct *tun, + struct tun_file *tfile, + struct xdp_buff *xdp, + struct iov_iter *iter) +{ + int vnet_hdr_sz = 0; + size_t size = xdp->data_end - xdp->data; + struct tun_pcpu_stats *stats; + size_t ret; + + if (tun->flags & IFF_VNET_HDR) { + struct virtio_net_hdr gso = { 0 }; + + vnet_hdr_sz = READ_ONCE(tun->vnet_hdr_sz); + if (unlikely(iov_iter_count(iter) < vnet_hdr_sz)) + return -EINVAL; + if (unlikely(copy_to_iter(&gso, sizeof(gso), iter) != + sizeof(gso))) + return -EFAULT; + iov_iter_advance(iter, vnet_hdr_sz - sizeof(gso)); + } + + ret = copy_to_iter(xdp->data, size, iter) + vnet_hdr_sz; + + stats = get_cpu_ptr(tun->pcpu_stats); + u64_stats_update_begin(&stats->syncp); + stats->tx_packets++; + stats->tx_bytes += ret; + u64_stats_update_end(&stats->syncp); + put_cpu_ptr(tun->pcpu_stats); + + return ret; +} + /* Put packet to the user space buffer */ static ssize_t tun_put_user(struct tun_struct *tun, struct tun_file *tfile, @@ -1975,15 +2102,14 @@ static ssize_t tun_put_user(struct tun_struct *tun, return total; } -static struct sk_buff *tun_ring_recv(struct tun_file *tfile, int noblock, - int *err) +static void *tun_ring_recv(struct tun_file *tfile, int noblock, int *err) { DECLARE_WAITQUEUE(wait, current); - struct sk_buff *skb = NULL; + void *ptr = NULL; int error = 0; - skb = ptr_ring_consume(&tfile->tx_ring); - if (skb) + ptr = ptr_ring_consume(&tfile->tx_ring); + if (ptr) goto out; if (noblock) { error = -EAGAIN; @@ -1994,8 +2120,8 @@ static struct sk_buff *tun_ring_recv(struct tun_file *tfile, int noblock, while (1) { set_current_state(TASK_INTERRUPTIBLE); - skb = ptr_ring_consume(&tfile->tx_ring); - if (skb) + ptr = ptr_ring_consume(&tfile->tx_ring); + if (ptr) break; if (signal_pending(current)) { error = -ERESTARTSYS; @@ -2014,12 +2140,12 @@ static struct sk_buff *tun_ring_recv(struct tun_file *tfile, int noblock, out: *err = error; - return skb; + return ptr; } static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile, struct iov_iter *to, - int noblock, struct sk_buff *skb) + int noblock, void *ptr) { ssize_t ret; int err; @@ -2027,23 +2153,31 @@ static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile, tun_debug(KERN_INFO, tun, "tun_do_read\n"); if (!iov_iter_count(to)) { - if (skb) - 
kfree_skb(skb); + tun_ptr_free(ptr); return 0; } - if (!skb) { + if (!ptr) { /* Read frames from ring */ - skb = tun_ring_recv(tfile, noblock, &err); - if (!skb) + ptr = tun_ring_recv(tfile, noblock, &err); + if (!ptr) return err; } - ret = tun_put_user(tun, tfile, skb, to); - if (unlikely(ret < 0)) - kfree_skb(skb); - else - consume_skb(skb); + if (tun_is_xdp_buff(ptr)) { + struct xdp_buff *xdp = tun_ptr_to_xdp(ptr); + + ret = tun_put_user_xdp(tun, tfile, xdp, to); + put_page(virt_to_head_page(xdp->data)); + } else { + struct sk_buff *skb = ptr; + + ret = tun_put_user(tun, tfile, skb, to); + if (unlikely(ret < 0)) + kfree_skb(skb); + else + consume_skb(skb); + } return ret; } @@ -2148,12 +2282,12 @@ static int tun_recvmsg(struct socket *sock, struct msghdr *m, size_t total_len, { struct tun_file *tfile = container_of(sock, struct tun_file, socket); struct tun_struct *tun = tun_get(tfile); - struct sk_buff *skb = m->msg_control; + void *ptr = m->msg_control; int ret; if (!tun) { ret = -EBADFD; - goto out_free_skb; + goto out_free; } if (flags & ~(MSG_DONTWAIT|MSG_TRUNC|MSG_ERRQUEUE)) { @@ -2165,7 +2299,7 @@ static int tun_recvmsg(struct socket *sock, struct msghdr *m, size_t total_len, SOL_PACKET, TUN_TX_TIMESTAMP); goto out; } - ret = tun_do_read(tun, tfile, &m->msg_iter, flags & MSG_DONTWAIT, skb); + ret = tun_do_read(tun, tfile, &m->msg_iter, flags & MSG_DONTWAIT, ptr); if (ret > (ssize_t)total_len) { m->msg_flags |= MSG_TRUNC; ret = flags & MSG_TRUNC ? ret : total_len; @@ -2176,12 +2310,25 @@ static int tun_recvmsg(struct socket *sock, struct msghdr *m, size_t total_len, out_put_tun: tun_put(tun); -out_free_skb: - if (skb) - kfree_skb(skb); +out_free: + tun_ptr_free(ptr); return ret; } +static int tun_ptr_peek_len(void *ptr) +{ + if (likely(ptr)) { + if (tun_is_xdp_buff(ptr)) { + struct xdp_buff *xdp = tun_ptr_to_xdp(ptr); + + return xdp->data_end - xdp->data; + } + return __skb_array_len_with_tag(ptr); + } else { + return 0; + } +} + static int tun_peek_len(struct socket *sock) { struct tun_file *tfile = container_of(sock, struct tun_file, socket); @@ -2192,7 +2339,7 @@ static int tun_peek_len(struct socket *sock) if (!tun) return 0; - ret = PTR_RING_PEEK_CALL(&tfile->tx_ring, __skb_array_len_with_tag); + ret = PTR_RING_PEEK_CALL(&tfile->tx_ring, tun_ptr_peek_len); tun_put(tun); return ret; @@ -3112,7 +3259,7 @@ static int tun_queue_resize(struct tun_struct *tun) ret = ptr_ring_resize_multiple(rings, n, dev->tx_queue_len, GFP_KERNEL, - __skb_array_destroy_skb); + tun_ptr_free); kfree(rings); return ret; diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 0f1e31ee29b3..2f3d5298d630 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -181,6 +181,17 @@ static void vhost_net_buf_unproduce(struct vhost_net_virtqueue *nvq) } } +static int vhost_net_buf_peek_len(void *ptr) +{ + if (tun_is_xdp_buff(ptr)) { + struct xdp_buff *xdp = tun_ptr_to_xdp(ptr); + + return xdp->data_end - xdp->data; + } + + return __skb_array_len_with_tag(ptr); +} + static int vhost_net_buf_peek(struct vhost_net_virtqueue *nvq) { struct vhost_net_buf *rxq = &nvq->rxq; @@ -192,7 +203,7 @@ static int vhost_net_buf_peek(struct vhost_net_virtqueue *nvq) return 0; out: - return __skb_array_len_with_tag(vhost_net_buf_get_ptr(rxq)); + return vhost_net_buf_peek_len(vhost_net_buf_get_ptr(rxq)); } static void vhost_net_buf_init(struct vhost_net_buf *rxq) diff --git a/include/linux/if_tun.h b/include/linux/if_tun.h index bdee9b83baf6..08e66827ad8e 100644 --- a/include/linux/if_tun.h +++ b/include/linux/if_tun.h 
@@ -17,9 +17,14 @@ #include +#define TUN_XDP_FLAG 0x1UL + #if defined(CONFIG_TUN) || defined(CONFIG_TUN_MODULE) struct socket *tun_get_socket(struct file *); struct ptr_ring *tun_get_tx_ring(struct file *file); +bool tun_is_xdp_buff(void *ptr); +void *tun_xdp_to_ptr(void *ptr); +void *tun_ptr_to_xdp(void *ptr); #else #include #include @@ -33,5 +38,17 @@ static inline struct ptr_ring *tun_get_tx_ring(struct file *f) { return ERR_PTR(-EINVAL); } +static inline bool tun_is_xdp_buff(void *ptr) +{ + return false; +} +static inline void *tun_xdp_to_ptr(void *ptr) +{ + return NULL; +} +static inline void *tun_ptr_to_xdp(void *ptr) +{ + return NULL; +} #endif /* CONFIG_TUN */ #endif /* __IF_TUN_H__ */
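The low-bit tag is safe because both sk_buffs and the XDP buffers queued here come from word-aligned allocations, so bit 0 of a genuine pointer is always clear. A hedged sketch of the consumer-side dispatch enabled by the helpers this patch adds (the example_handle_* functions are invented stand-ins for real processing):

/* Illustrative consumer for a ring that mixes tagged xdp_buff pointers
 * with plain sk_buff pointers, using the helpers introduced above.
 */
static void example_drain(struct ptr_ring *ring)
{
	void *ptr;

	while ((ptr = ptr_ring_consume(ring)) != NULL) {
		if (tun_is_xdp_buff(ptr)) {
			/* Strip TUN_XDP_FLAG before dereferencing */
			struct xdp_buff *xdp = tun_ptr_to_xdp(ptr);

			example_handle_xdp(xdp);
		} else {
			/* Untagged entries are ordinary sk_buffs */
			example_handle_skb((struct sk_buff *)ptr);
		}
	}
}

The producer side mirrors it: an xdp_buff is pushed as ptr_ring_produce(ring, tun_xdp_to_ptr(buff)), exactly as tun_xdp_xmit() does in the diff above.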
From patchwork Mon Nov 2 12:48:26 2020
From: William Breathitt Gray
To: kernel-team@lists.ubuntu.com
Subject: [SRU][B:linux-azure-4.15][PATCH 10/40] net: avoid including xdp.h in filter.h
Date: Mon, 2 Nov 2020 07:48:26 -0500
Message-Id: <20201102124856.4659-11-william.gray@canonical.com>

From: Jesper Dangaard Brouer

BugLink: https://bugs.launchpad.net/bugs/1877654

A forward declaration of struct xdp_rxq_info is sufficient in linux/filter.h, which avoids including net/xdp.h. This was originally suggested by John Fastabend during the review phase, but wasn't included in the final patchset revision. Thus, this followup.

Suggested-by: John Fastabend
Signed-off-by: Jesper Dangaard Brouer
Signed-off-by: Alexei Starovoitov
(backported from commit 297dd12cb104151797fd649433a2157b585f1718)
[ vilhelmgray: context adjustment ]
Signed-off-by: William Breathitt Gray
---
include/linux/filter.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/filter.h b/include/linux/filter.h index 158fb795cba7..449091818f43 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -19,7 +19,6 @@ #include #include -#include #include #include @@ -29,6 +28,7 @@ struct sk_buff; struct sock; struct seccomp_data; struct bpf_prog_aux; +struct xdp_rxq_info; /* ArgX, context and stack frame pointer register positions.
Note, * Arg1, Arg2, Arg3, etc are used as argument mappings of function
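The technique generalizes to any header that handles a type only through pointers: a forward declaration keeps the include graph flat and makes recompiles cheaper. A small illustration with invented names:

/* example.h -- struct foo is used only as a pointer type here, so a
 * forward declaration replaces including foo's defining header.
 */
struct foo;				/* forward declaration, no #include */

struct example_ctx {
	struct foo *foo;		/* pointer to incomplete type is fine */
};

void example_attach(struct example_ctx *ctx, struct foo *foo);

/* Only example.c, which dereferences foo's members, needs the real header. */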
From patchwork Mon Nov 2 12:48:27 2020
From: William Breathitt Gray
To: kernel-team@lists.ubuntu.com
Subject: [SRU][B:linux-azure-4.15][PATCH 11/40] virtio_net: fix ndo_xdp_xmit crash towards dev not ready for XDP
Date: Mon, 2 Nov 2020 07:48:27 -0500
Message-Id: <20201102124856.4659-12-william.gray@canonical.com>

From: Jesper Dangaard Brouer

BugLink: https://bugs.launchpad.net/bugs/1877654

When a driver implements the ndo_xdp_xmit() function, there is (currently) no generic way to determine whether it is safe to call. It is e.g. unsafe to call a driver's ndo_xdp_xmit if it has not allocated the needed XDP TX queues yet. This is the case for virtio_net, which first allocates the XDP TX queues once an XDP/bpf prog is attached (in virtnet_xdp_set()).

Thus, a crash will occur for virtio_net when redirecting to another virtio_net device's ndo_xdp_xmit, which has not attached an XDP prog. The sample xdp_redirect_map tries to attach a dummy XDP prog to take this into account, but it can also easily fail if the virtio_net (or actually the underlying vhost driver) has not allocated enough extra queues for the device. Allocating more queues is currently a manual config. Hint for libvirt XML add:

The solution in this patch is to check that the device has loaded an XDP/bpf prog before proceeding. This is similar to the check performed in driver ixgbe.

Signed-off-by: Jesper Dangaard Brouer
Acked-by: John Fastabend
Signed-off-by: David S. Miller
(cherry picked from commit 8dcc5b0ab0ec9a2efb3362d380272546b8b2ee26)
Signed-off-by: William Breathitt Gray
---
drivers/net/virtio_net.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 76a1eb622a04..80aad47c8c97 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -435,8 +435,18 @@ static bool __virtnet_xdp_xmit(struct virtnet_info *vi, static int virtnet_xdp_xmit(struct net_device *dev, struct xdp_buff *xdp) { struct virtnet_info *vi = netdev_priv(dev); - bool sent = __virtnet_xdp_xmit(vi, xdp); + struct receive_queue *rq = vi->rq; + struct bpf_prog *xdp_prog; + bool sent; + + /* Only allow ndo_xdp_xmit if XDP is loaded on dev, as this + * indicate XDP resources have been successfully allocated.
+ */ + xdp_prog = rcu_dereference(rq->xdp_prog); + if (!xdp_prog) + return -ENXIO; + sent = __virtnet_xdp_xmit(vi, xdp); if (!sent) return -ENOSPC; return 0;
From patchwork Mon Nov 2 12:48:28 2020
From: William Breathitt Gray
To: kernel-team@lists.ubuntu.com
Subject: [SRU][B:linux-azure-4.15][PATCH 12/40] tuntap: XDP_TX can use native XDP
Date: Mon, 2 Nov 2020 07:48:28 -0500
Message-Id: <20201102124856.4659-13-william.gray@canonical.com>

From: Jason Wang

BugLink: https://bugs.launchpad.net/bugs/1877654

Now we have ndo_xdp_xmit, switch to use it instead of the slow generic XDP TX routine. XDP_TX on TAP gets ~20% improvements from ~1.5Mpps to ~1.8Mpps on 2.60GHz Core(TM) i7-5600U.

Signed-off-by: Jason Wang
Acked-by: Michael S. Tsirkin
Signed-off-by: David S. Miller
(backported from commit 59655a5b6c837e392e873629591069c898585592)
[ vilhelmgray: context adjustments ]
[ vilhelmgray: use local_bh_enable() instead of preempt_enable() ]
Signed-off-by: William Breathitt Gray
---
drivers/net/tun.c | 19 ++++++++----------- 1 file changed, 8 insertions(+), 11 deletions(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 6d827c0ef8cc..1d7f953d83e4 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1572,7 +1572,6 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, unsigned int delta = 0; char *buf; size_t copied; - bool xdp_xmit = false; int err, pad = TUN_RX_PAD; rcu_read_lock(); @@ -1630,8 +1629,14 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, local_bh_enable(); return NULL; case XDP_TX: - xdp_xmit = true; - /* fall through */ + get_page(alloc_frag->page); + alloc_frag->offset += buflen; + if (tun_xdp_xmit(tun->dev, &xdp)) + goto err_redirect; + tun_xdp_flush(tun->dev); + rcu_read_unlock(); + local_bh_enable(); + return NULL; case XDP_PASS: delta = orig_data - xdp.data; break; @@ -1659,14 +1664,6 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, get_page(alloc_frag->page); alloc_frag->offset += buflen; - if (xdp_xmit) { - skb->dev = tun->dev; - generic_xdp_tx(skb, xdp_prog); - rcu_read_unlock(); - local_bh_enable(); - return NULL; - } - rcu_read_unlock(); local_bh_enable();
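For reference, XDP_TX is selected by the attached BPF program, not by the driver; the classic program exercising this path reflects frames back out the receiving device. A hedged sketch in the style of the era's samples/bpf (header location and section naming vary by build setup; not part of this series):

/* Minimal XDP program returning XDP_TX: swap MACs and bounce the frame. */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include "bpf_helpers.h"	/* SEC() macro, from samples/bpf or libbpf */

SEC("xdp")
int xdp_reflect(struct xdp_md *ctx)
{
	void *data = (void *)(long)ctx->data;
	void *data_end = (void *)(long)ctx->data_end;
	struct ethhdr *eth = data;
	unsigned char tmp[ETH_ALEN];

	/* Bounds check required by the verifier */
	if ((void *)(eth + 1) > data_end)
		return XDP_DROP;

	__builtin_memcpy(tmp, eth->h_source, ETH_ALEN);
	__builtin_memcpy(eth->h_source, eth->h_dest, ETH_ALEN);
	__builtin_memcpy(eth->h_dest, tmp, ETH_ALEN);

	return XDP_TX;	/* takes the native path added by this patch */
}

char _license[] SEC("license") = "GPL";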
From patchwork Mon Nov 2 12:48:29 2020
From: William Breathitt Gray
To: kernel-team@lists.ubuntu.com
Subject: [SRU][B:linux-azure-4.15][PATCH 13/40] i40e: add support for XDP_REDIRECT
Date: Mon, 2 Nov 2020 07:48:29 -0500
Message-Id: <20201102124856.4659-14-william.gray@canonical.com>

From: Björn Töpel

BugLink: https://bugs.launchpad.net/bugs/1877654

The driver now acts upon the XDP_REDIRECT return action. Two new ndos are implemented, ndo_xdp_xmit and ndo_xdp_flush. The XDP_REDIRECT action enables an XDP program to redirect frames to other netdevs.
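On the BPF side, the redirect described here is requested through a device map with the bpf_redirect_map() helper. A hedged sketch in the pre-BTF map style of the era (the map sizing and fixed egress index are invented):

/* Minimal XDP program driving XDP_REDIRECT via a DEVMAP. Illustrative
 * only; real programs pick the egress slot from packet or CPU state.
 */
#include <linux/bpf.h>
#include "bpf_helpers.h"	/* SEC(), bpf_redirect_map(), bpf_map_def */

struct bpf_map_def SEC("maps") tx_port = {
	.type = BPF_MAP_TYPE_DEVMAP,
	.key_size = sizeof(int),
	.value_size = sizeof(int),	/* slot holds an egress ifindex */
	.max_entries = 64,
};

SEC("xdp_redirect_map")
int xdp_redirect_map_prog(struct xdp_md *ctx)
{
	int index = 0;	/* fixed slot for the sketch */

	/* Returns XDP_REDIRECT on success; the driver then runs
	 * xdp_do_redirect()/ndo_xdp_xmit() as in the diff below.
	 */
	return bpf_redirect_map(&tx_port, index, 0);
}

char _license[] SEC("license") = "GPL";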
Signed-off-by: Björn Töpel Tested-by: Andrew Bowers Signed-off-by: Jeff Kirsher (backported from commit d9314c474d4fc1985e836b92fba4c40dd84885a7) [ vilhelmgray: context adjustment ] Signed-off-by: William Breathitt Gray --- drivers/net/ethernet/intel/i40e/i40e_main.c | 2 + drivers/net/ethernet/intel/i40e/i40e_txrx.c | 74 ++++++++++++++++++--- drivers/net/ethernet/intel/i40e/i40e_txrx.h | 2 + 3 files changed, 68 insertions(+), 10 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index 65799dcabcc9..de222069516c 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -11772,6 +11772,8 @@ static const struct net_device_ops i40e_netdev_ops = { .ndo_bridge_getlink = i40e_ndo_bridge_getlink, .ndo_bridge_setlink = i40e_ndo_bridge_setlink, .ndo_bpf = i40e_xdp, + .ndo_xdp_xmit = i40e_xdp_xmit, + .ndo_xdp_flush = i40e_xdp_flush, }; /** diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c index 5bc2748ac468..9bc0edac43b5 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c @@ -1996,7 +1996,7 @@ static int i40e_xmit_xdp_ring(struct xdp_buff *xdp, static struct sk_buff *i40e_run_xdp(struct i40e_ring *rx_ring, struct xdp_buff *xdp) { - int result = I40E_XDP_PASS; + int err, result = I40E_XDP_PASS; struct i40e_ring *xdp_ring; struct bpf_prog *xdp_prog; u32 act; @@ -2015,6 +2015,10 @@ static struct sk_buff *i40e_run_xdp(struct i40e_ring *rx_ring, xdp_ring = rx_ring->vsi->xdp_rings[rx_ring->queue_index]; result = i40e_xmit_xdp_ring(xdp, xdp_ring); break; + case XDP_REDIRECT: + err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog); + result = !err ? I40E_XDP_TX : I40E_XDP_CONSUMED; + break; default: bpf_warn_invalid_xdp_action(act); case XDP_ABORTED: @@ -2050,6 +2054,15 @@ static void i40e_rx_buffer_flip(struct i40e_ring *rx_ring, #endif } +static inline void i40e_xdp_ring_update_tail(struct i40e_ring *xdp_ring) +{ + /* Force memory writes to complete before letting h/w + * know there are new descriptors to fetch. + */ + wmb(); + writel_relaxed(xdp_ring->next_to_use, xdp_ring->tail); +} + /** * i40e_clean_rx_irq - Clean completed descriptors from Rx ring - bounce buf * @rx_ring: rx descriptor ring to transact packets on @@ -2182,16 +2195,11 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget) } if (xdp_xmit) { - struct i40e_ring *xdp_ring; - - xdp_ring = rx_ring->vsi->xdp_rings[rx_ring->queue_index]; - - /* Force memory writes to complete before letting h/w - * know there are new descriptors to fetch. 
- */ - wmb(); + struct i40e_ring *xdp_ring = + rx_ring->vsi->xdp_rings[rx_ring->queue_index]; - writel(xdp_ring->next_to_use, xdp_ring->tail); + i40e_xdp_ring_update_tail(xdp_ring); + xdp_do_flush_map(); } rx_ring->skb = skb; @@ -3445,3 +3453,49 @@ netdev_tx_t i40e_lan_xmit_frame(struct sk_buff *skb, struct net_device *netdev) return i40e_xmit_frame_ring(skb, tx_ring); } + +/** + * i40e_xdp_xmit - Implements ndo_xdp_xmit + * @dev: netdev + * @xdp: XDP buffer + * + * Returns Zero if sent, else an error code + **/ +int i40e_xdp_xmit(struct net_device *dev, struct xdp_buff *xdp) +{ + struct i40e_netdev_priv *np = netdev_priv(dev); + unsigned int queue_index = smp_processor_id(); + struct i40e_vsi *vsi = np->vsi; + int err; + + if (test_bit(__I40E_VSI_DOWN, vsi->state)) + return -ENETDOWN; + + if (!i40e_enabled_xdp_vsi(vsi) || queue_index >= vsi->num_queue_pairs) + return -ENXIO; + + err = i40e_xmit_xdp_ring(xdp, vsi->xdp_rings[queue_index]); + if (err != I40E_XDP_TX) + return -ENOSPC; + + return 0; +} + +/** + * i40e_xdp_flush - Implements ndo_xdp_flush + * @dev: netdev + **/ +void i40e_xdp_flush(struct net_device *dev) +{ + struct i40e_netdev_priv *np = netdev_priv(dev); + unsigned int queue_index = smp_processor_id(); + struct i40e_vsi *vsi = np->vsi; + + if (test_bit(__I40E_VSI_DOWN, vsi->state)) + return; + + if (!i40e_enabled_xdp_vsi(vsi) || queue_index >= vsi->num_queue_pairs) + return; + + i40e_xdp_ring_update_tail(vsi->xdp_rings[queue_index]); +} diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h index fbae1182e2ea..c0fe46404a0c 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h @@ -500,6 +500,8 @@ void i40e_force_wb(struct i40e_vsi *vsi, struct i40e_q_vector *q_vector); u32 i40e_get_tx_pending(struct i40e_ring *ring); int __i40e_maybe_stop_tx(struct i40e_ring *tx_ring, int size); bool __i40e_chk_linearize(struct sk_buff *skb); +int i40e_xdp_xmit(struct net_device *dev, struct xdp_buff *xdp); +void i40e_xdp_flush(struct net_device *dev); /** * i40e_get_head - Retrieve head from head writeback
From patchwork Mon Nov 2 12:48:30 2020
From: William Breathitt Gray
To: kernel-team@lists.ubuntu.com
Subject: [SRU][B:linux-azure-4.15][PATCH 14/40] xdp: introduce xdp_return_frame API and use in cpumap
Date: Mon, 2 Nov 2020 07:48:30 -0500
Message-Id: <20201102124856.4659-15-william.gray@canonical.com>

From: Jesper Dangaard Brouer

BugLink: https://bugs.launchpad.net/bugs/1877654

Introduce an xdp_return_frame API, and convert over cpumap as the first user, given it has a queued XDP frame structure to leverage.

V3: Cleanup and remove C99 style comments, pointed out by Alex Duyck. V6: Remove comment that id will be added later (Req by Alex Duyck) V8: Rename enum mem_type to xdp_mem_type (found by kbuild test robot)

Signed-off-by: Jesper Dangaard Brouer
Signed-off-by: David S. Miller
(backported from commit 5ab073ffd326480a6185d096e9703f62ef92b86c)
[ vilhelmgray: context adjustment ]
Signed-off-by: William Breathitt Gray
---
include/net/xdp.h | 27 ++++++++++++++++++++ kernel/bpf/cpumap.c | 60 +++++++++++++++++++++++++++------------------ net/core/xdp.c | 18 ++++++++++++ 3 files changed, 81 insertions(+), 24 deletions(-) diff --git a/include/net/xdp.h b/include/net/xdp.h index b2362ddfa694..e4207699c410 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -33,16 +33,43 @@ * also mandatory during RX-ring setup.
*/ +enum xdp_mem_type { + MEM_TYPE_PAGE_SHARED = 0, /* Split-page refcnt based model */ + MEM_TYPE_PAGE_ORDER0, /* Orig XDP full page model */ + MEM_TYPE_MAX, +}; + +struct xdp_mem_info { + u32 type; /* enum xdp_mem_type, but known size type */ +}; + struct xdp_rxq_info { struct net_device *dev; u32 queue_index; u32 reg_state; + struct xdp_mem_info mem; } ____cacheline_aligned; /* perf critical, avoid false-sharing */ + +static inline +void xdp_return_frame(void *data, struct xdp_mem_info *mem) +{ + if (mem->type == MEM_TYPE_PAGE_SHARED) + page_frag_free(data); + + if (mem->type == MEM_TYPE_PAGE_ORDER0) { + struct page *page = virt_to_page(data); /* Assumes order0 page*/ + + put_page(page); + } +} + int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq, struct net_device *dev, u32 queue_index); void xdp_rxq_info_unreg(struct xdp_rxq_info *xdp_rxq); void xdp_rxq_info_unused(struct xdp_rxq_info *xdp_rxq); bool xdp_rxq_info_is_reg(struct xdp_rxq_info *xdp_rxq); +int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq, + enum xdp_mem_type type, void *allocator); #endif /* __LINUX_NET_XDP_H__ */ diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c index e4ef747915a0..1507e305ecfc 100644 --- a/kernel/bpf/cpumap.c +++ b/kernel/bpf/cpumap.c @@ -19,6 +19,7 @@ #include #include #include +#include #include #include @@ -143,27 +144,6 @@ static struct bpf_map *cpu_map_alloc(union bpf_attr *attr) return ERR_PTR(err); } -void __cpu_map_queue_destructor(void *ptr) -{ - /* The tear-down procedure should have made sure that queue is - * empty. See __cpu_map_entry_replace() and work-queue - * invoked cpu_map_kthread_stop(). Catch any broken behaviour - * gracefully and warn once. - */ - if (WARN_ON_ONCE(ptr)) - page_frag_free(ptr); -} - -static void put_cpu_map_entry(struct bpf_cpu_map_entry *rcpu) -{ - if (atomic_dec_and_test(&rcpu->refcnt)) { - /* The queue should be empty at this point */ - ptr_ring_cleanup(rcpu->queue, __cpu_map_queue_destructor); - kfree(rcpu->queue); - kfree(rcpu); - } -} - static void get_cpu_map_entry(struct bpf_cpu_map_entry *rcpu) { atomic_inc(&rcpu->refcnt); @@ -194,6 +174,10 @@ struct xdp_pkt { u16 len; u16 headroom; u16 metasize; + /* Lifetime of xdp_rxq_info is limited to NAPI/enqueue time, + * while mem info is valid on remote CPU. + */ + struct xdp_mem_info mem; struct net_device *dev_rx; }; @@ -219,6 +203,9 @@ static struct xdp_pkt *convert_to_xdp_pkt(struct xdp_buff *xdp) xdp_pkt->headroom = headroom - sizeof(*xdp_pkt); xdp_pkt->metasize = metasize; + /* rxq only valid until napi_schedule ends, convert to xdp_mem_info */ + xdp_pkt->mem = xdp->rxq->mem; + return xdp_pkt; } @@ -271,6 +258,31 @@ struct sk_buff *cpu_map_build_skb(struct bpf_cpu_map_entry *rcpu, return skb; } +static void __cpu_map_ring_cleanup(struct ptr_ring *ring) +{ + /* The tear-down procedure should have made sure that queue is + * empty. See __cpu_map_entry_replace() and work-queue + * invoked cpu_map_kthread_stop(). Catch any broken behaviour + * gracefully and warn once. 
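 * (Ed. note: because each queued frame now carries its own xdp_mem_info, this cleanup can free entries correctly via xdp_return_frame() rather than assuming page_frag_free(), which is why the old __cpu_map_queue_destructor above is dropped.)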
+ */ + struct xdp_pkt *xdp_pkt; + + while ((xdp_pkt = ptr_ring_consume(ring))) + if (WARN_ON_ONCE(xdp_pkt)) + xdp_return_frame(xdp_pkt, &xdp_pkt->mem); +} + +static void put_cpu_map_entry(struct bpf_cpu_map_entry *rcpu) +{ + if (atomic_dec_and_test(&rcpu->refcnt)) { + /* The queue should be empty at this point */ + __cpu_map_ring_cleanup(rcpu->queue); + ptr_ring_cleanup(rcpu->queue, NULL); + kfree(rcpu->queue); + kfree(rcpu); + } +} + static int cpu_map_kthread_run(void *data) { struct bpf_cpu_map_entry *rcpu = data; @@ -313,7 +325,7 @@ static int cpu_map_kthread_run(void *data) skb = cpu_map_build_skb(rcpu, xdp_pkt); if (!skb) { - page_frag_free(xdp_pkt); + xdp_return_frame(xdp_pkt, &xdp_pkt->mem); continue; } @@ -609,13 +621,13 @@ static int bq_flush_to_queue(struct bpf_cpu_map_entry *rcpu, spin_lock(&q->producer_lock); for (i = 0; i < bq->count; i++) { - void *xdp_pkt = bq->q[i]; + struct xdp_pkt *xdp_pkt = bq->q[i]; int err; err = __ptr_ring_produce(q, xdp_pkt); if (err) { drops++; - page_frag_free(xdp_pkt); /* Free xdp_pkt */ + xdp_return_frame(xdp_pkt->data, &xdp_pkt->mem); } processed++; } diff --git a/net/core/xdp.c b/net/core/xdp.c index 097a0f74e004..7e6b3545277d 100644 --- a/net/core/xdp.c +++ b/net/core/xdp.c @@ -71,3 +71,21 @@ bool xdp_rxq_info_is_reg(struct xdp_rxq_info *xdp_rxq) return (xdp_rxq->reg_state == REG_STATE_REGISTERED); } EXPORT_SYMBOL_GPL(xdp_rxq_info_is_reg); + +int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq, + enum xdp_mem_type type, void *allocator) +{ + if (type >= MEM_TYPE_MAX) + return -EINVAL; + + xdp_rxq->mem.type = type; + + if (allocator) + return -EOPNOTSUPP; + + /* TODO: Allocate an ID that maps to allocator pointer + * See: https://www.kernel.org/doc/html/latest/core-api/idr.html + */ + return 0; +} +EXPORT_SYMBOL_GPL(xdp_rxq_info_reg_mem_model); From patchwork Mon Nov 2 12:48:31 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: William Breathitt Gray X-Patchwork-Id: 1392232 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=canonical.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4CPt6l3KxSz9sRK; Mon, 2 Nov 2020 23:50:07 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1kZZHX-0006We-M0; Mon, 02 Nov 2020 12:49:55 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1kZZGw-00065L-H7 for kernel-team@lists.ubuntu.com; Mon, 02 Nov 2020 12:49:18 +0000 Received: from mail-qk1-f199.google.com ([209.85.222.199]) by youngberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1kZZGv-000455-02 for kernel-team@lists.ubuntu.com; Mon, 02 Nov 2020 12:49:17 +0000 Received: by mail-qk1-f199.google.com with SMTP id x5so8641014qkn.2 for ; Mon, 02 Nov 2020 04:49:16 -0800 (PST) 
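[Ed. note: a minimal sketch of the driver-side pattern the xdp_return_frame API from the previous patch enables. The function names my_rx_ring_setup and my_tx_clean are hypothetical, not from the patch; the calls are the ones quoted above (xdp_rxq_info_reg(), xdp_rxq_info_reg_mem_model(), xdp_return_frame()).]

static int my_rx_ring_setup(struct xdp_rxq_info *xdp_rxq,
                            struct net_device *dev, u32 queue_index)
{
        int err;

        err = xdp_rxq_info_reg(xdp_rxq, dev, queue_index);
        if (err)
                return err;

        /* Page-fragment (refcnt based) recycling; no private allocator. */
        return xdp_rxq_info_reg_mem_model(xdp_rxq, MEM_TYPE_PAGE_SHARED,
                                          NULL);
}

static void my_tx_clean(void *data, struct xdp_mem_info *mem)
{
        /* Frees via page_frag_free() or put_page(), depending on mem->type. */
        xdp_return_frame(data, mem);
}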
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=oD+ZulzuAMpAwKokZ5Dx+tYpLnAvNh3kdFntFDB/E6g=; b=YQvvZo5XDIJpSlG1g5ndPw9Qjdpt9qX5Ay/5pTwyWApRXPFFqvIk9yl/6h5KCcSQJr pbYC7yZQ7G1Rt4LhoYiVg4KzBxGAn+n/DxzaR7l+j8ms7SMKNvcj3i72elQAArIG+yX4 6v1kqxDYPAxexxRYy58YeG3xab94lZ74Vcq1MSuGuqCkGktY0Fb5JWMSUPXzGBWn6mfS j8OhTLf/UVnyi9OsX84aAaUIWGxMcNOK0C/GyWqNWSs4Vfqy5Etb3TREHmZc518QW9pM 3h1msjRDZTkZHy7lZ+KtQS3TNL63R6+AMysufqUbQZ26FNcKLKGFrdX8REGvf8vxxJl6 gYTw== X-Gm-Message-State: AOAM532DndnEXFwS1W2BnQw8bfuaHrJBY2narv1gAcEFT3K2eoDVls6C OlyL0B49H1L24Vd3B3dLof/6s1l+JKUWmekZMtgsdzI05kiQEiIPU+UUe/BR5EWIzIOPSM8UBnQ KCuB77srOJ/YMy6UHF6xl/B5vZDcyRKGL4ZVi03k8Ag== X-Received: by 2002:ac8:4713:: with SMTP id f19mr14148086qtp.233.1604321355715; Mon, 02 Nov 2020 04:49:15 -0800 (PST) X-Google-Smtp-Source: ABdhPJysdKEatJesfxZHBCCtU5LwYEkbKXaoQ7SOzrt2g1yObOZk2w+CNzPwMg46nj+DlQ9OxzGDyQ== X-Received: by 2002:ac8:4713:: with SMTP id f19mr14148052qtp.233.1604321355480; Mon, 02 Nov 2020 04:49:15 -0800 (PST) Received: from localhost.localdomain (072-189-064-225.res.spectrum.com. [72.189.64.225]) by smtp.gmail.com with ESMTPSA id q7sm7666201qtd.49.2020.11.02.04.49.14 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 02 Nov 2020 04:49:14 -0800 (PST) From: William Breathitt Gray To: kernel-team@lists.ubuntu.com Subject: [SRU][B:linux-azure-4.15][PATCH 15/40] ixgbe: use xdp_return_frame API Date: Mon, 2 Nov 2020 07:48:31 -0500 Message-Id: <20201102124856.4659-16-william.gray@canonical.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20201102124856.4659-1-william.gray@canonical.com> References: <20201102124856.4659-1-william.gray@canonical.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Jesper Dangaard Brouer BugLink: https://bugs.launchpad.net/bugs/1877654 Extend struct ixgbe_tx_buffer to store the xdp_mem_info. Notice that this could be optimized further by putting this into a union in the struct ixgbe_tx_buffer, but this patchset works towards removing this again. Thus, this is not done. Signed-off-by: Jesper Dangaard Brouer Signed-off-by: David S. 
Miller (cherry picked from commit 189ead81a83eba5f5c5ce56c45620e51abcb5cb8) Signed-off-by: William Breathitt Gray --- drivers/net/ethernet/intel/ixgbe/ixgbe.h | 1 + drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 6 ++++-- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h index 8611763d6129..94a77ef2bffd 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h @@ -247,6 +247,7 @@ struct ixgbe_tx_buffer { DEFINE_DMA_UNMAP_ADDR(dma); DEFINE_DMA_UNMAP_LEN(len); u32 tx_flags; + struct xdp_mem_info xdp_mem; }; struct ixgbe_rx_buffer { diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c index 49028d005eae..25ebb81ee1bb 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c @@ -1207,7 +1207,7 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector, /* free the skb */ if (ring_is_xdp(tx_ring)) - page_frag_free(tx_buffer->data); + xdp_return_frame(tx_buffer->data, &tx_buffer->xdp_mem); else napi_consume_skb(tx_buffer->skb, napi_budget); @@ -5880,7 +5880,7 @@ static void ixgbe_clean_tx_ring(struct ixgbe_ring *tx_ring) /* Free all the Tx ring sk_buffs */ if (ring_is_xdp(tx_ring)) - page_frag_free(tx_buffer->data); + xdp_return_frame(tx_buffer->data, &tx_buffer->xdp_mem); else dev_kfree_skb_any(tx_buffer->skb); @@ -8476,6 +8476,8 @@ static int ixgbe_xmit_xdp_ring(struct ixgbe_adapter *adapter, dma_unmap_len_set(tx_buffer, len, len); dma_unmap_addr_set(tx_buffer, dma, dma); tx_buffer->data = xdp->data; + tx_buffer->xdp_mem = xdp->rxq->mem; + tx_desc->read.buffer_addr = cpu_to_le64(dma); /* put descriptor type bits */ From patchwork Mon Nov 2 12:48:32 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: William Breathitt Gray X-Patchwork-Id: 1392226 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=canonical.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4CPt655stjz9sVN; Mon, 2 Nov 2020 23:49:33 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1kZZH7-0006E4-0N; Mon, 02 Nov 2020 12:49:29 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1kZZGw-00066A-Vf for kernel-team@lists.ubuntu.com; Mon, 02 Nov 2020 12:49:19 +0000 Received: from mail-qv1-f71.google.com ([209.85.219.71]) by youngberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1kZZGw-00045I-7W for kernel-team@lists.ubuntu.com; Mon, 02 Nov 2020 12:49:18 +0000 Received: by mail-qv1-f71.google.com with SMTP id w1so8146871qvv.0 for ; Mon, 02 Nov 2020 04:49:18 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=6oRHOy62LOL8gkFxJiMjKgZBlF3dvrbryQtSJJJ7yws=; b=BevwLdLazwX2nKGtpDHyFGNh/dCTjFe11/A1XYD5eWGkjOrua942eeQL9+/eSa8CFq HlONdfWndyK0A1NiDZQXaKKRKSNMMpNOBVrc9xOueQiD6JRxkM+tXAjfpidN3YqhIuV9 X+uq2lg8B+YDAQ4VuLjyW4HFdap7+bhu0sW7JG2xpNzhrncJfrqmJHmuKoI2fNtDcelD gG4EEaKGzUDu9WbnTTMw+oYXIIbXnFSgRnySZGzvC5Qyo3ZDcO7qJydXeXUy1j1f55n9 Ow1w/LGyC60WKDtBXPDiPsvNIBD7gUbUb1KmzPGZ+ZRpOuRQAHHJ9r+J7zm1Z4m2TJTn /Q8w== X-Gm-Message-State: AOAM533mzLokAgwYvFLk+VdKAj5ui9ab2GosvIdR1Zpd1dKCGX3/ep2x kszRxfpX8fWSkWCiYM9UK2QLbGjY/M1duSMlrTyRwxsgt5ZXdoNRvnNiYXJx+axWiBRp/svTa+E kAG4h+QkFofYGCzZyKyVQ4X0/K1iAi49NSHankhrRJg== X-Received: by 2002:ac8:6b92:: with SMTP id z18mr2997070qts.30.1604321357060; Mon, 02 Nov 2020 04:49:17 -0800 (PST) X-Google-Smtp-Source: ABdhPJx/02HdwkzJjt3hGC7+9Ha6AgEqHNbEGMR+czqY6B35DoC9RYCdBza2MmNOpJqLa9OX4mKPBw== X-Received: by 2002:ac8:6b92:: with SMTP id z18mr2997056qts.30.1604321356859; Mon, 02 Nov 2020 04:49:16 -0800 (PST) Received: from localhost.localdomain (072-189-064-225.res.spectrum.com. [72.189.64.225]) by smtp.gmail.com with ESMTPSA id q7sm7666201qtd.49.2020.11.02.04.49.15 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 02 Nov 2020 04:49:15 -0800 (PST) From: William Breathitt Gray To: kernel-team@lists.ubuntu.com Subject: [SRU][B:linux-azure-4.15][PATCH 16/40] xdp: move struct xdp_buff from filter.h to xdp.h Date: Mon, 2 Nov 2020 07:48:32 -0500 Message-Id: <20201102124856.4659-17-william.gray@canonical.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20201102124856.4659-1-william.gray@canonical.com> References: <20201102124856.4659-1-william.gray@canonical.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Jesper Dangaard Brouer BugLink: https://bugs.launchpad.net/bugs/1877654 This is done to prepare for the next patch, and it is also nice to move this XDP related struct out of filter.h. Signed-off-by: Jesper Dangaard Brouer Signed-off-by: David S. Miller (backported from commit 106ca27f2922e8de820d1bd3d79b1cbdf2d78eea) [ vilhelmgray: context adjustment ] Signed-off-by: William Breathitt Gray --- include/linux/filter.h | 24 +----------------------- include/net/xdp.h | 22 ++++++++++++++++++++++ 2 files changed, 23 insertions(+), 23 deletions(-) diff --git a/include/linux/filter.h b/include/linux/filter.h index 449091818f43..ef3df706b417 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -29,6 +29,7 @@ struct sock; struct seccomp_data; struct bpf_prog_aux; struct xdp_rxq_info; +struct xdp_buff; /* ArgX, context and stack frame pointer register positions. Note, * Arg1, Arg2, Arg3, etc are used as argument mappings of function @@ -489,14 +490,6 @@ struct bpf_skb_data_end { void *data_end; }; -struct xdp_buff { - void *data; - void *data_end; - void *data_meta; - void *data_hard_start; - struct xdp_rxq_info *rxq; -}; - /* Compute the linear packet data range [data, data_end) which * will be accessed by various program types (cls_bpf, act_bpf, * lwt, ...). Subsystems allowing direct data access must (!) 
@@ -731,21 +724,6 @@ int xdp_do_redirect(struct net_device *dev, struct bpf_prog *prog); void xdp_do_flush_map(void); -/* Drivers not supporting XDP metadata can use this helper, which - * rejects any room expansion for metadata as a result. - */ -static __always_inline void -xdp_set_data_meta_invalid(struct xdp_buff *xdp) -{ - xdp->data_meta = xdp->data + 1; -} - -static __always_inline bool -xdp_data_meta_unsupported(const struct xdp_buff *xdp) -{ - return unlikely(xdp->data_meta > xdp->data); -} - void bpf_warn_invalid_xdp_action(u32 act); struct sock *do_sk_redirect_map(struct sk_buff *skb); diff --git a/include/net/xdp.h b/include/net/xdp.h index e4207699c410..15f8ade008b5 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -50,6 +50,13 @@ struct xdp_rxq_info { struct xdp_mem_info mem; } ____cacheline_aligned; /* perf critical, avoid false-sharing */ +struct xdp_buff { + void *data; + void *data_end; + void *data_meta; + void *data_hard_start; + struct xdp_rxq_info *rxq; +}; static inline void xdp_return_frame(void *data, struct xdp_mem_info *mem) @@ -72,4 +79,19 @@ bool xdp_rxq_info_is_reg(struct xdp_rxq_info *xdp_rxq); int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq, enum xdp_mem_type type, void *allocator); +/* Drivers not supporting XDP metadata can use this helper, which + * rejects any room expansion for metadata as a result. + */ +static __always_inline void +xdp_set_data_meta_invalid(struct xdp_buff *xdp) +{ + xdp->data_meta = xdp->data + 1; +} + +static __always_inline bool +xdp_data_meta_unsupported(const struct xdp_buff *xdp) +{ + return unlikely(xdp->data_meta > xdp->data); +} + #endif /* __LINUX_NET_XDP_H__ */ From patchwork Mon Nov 2 12:48:33 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: William Breathitt Gray X-Patchwork-Id: 1392233 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=canonical.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4CPt6t5mjwz9sWD; Mon, 2 Nov 2020 23:50:14 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1kZZHh-0006cp-PJ; Mon, 02 Nov 2020 12:50:05 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1kZZGy-000679-GL for kernel-team@lists.ubuntu.com; Mon, 02 Nov 2020 12:49:20 +0000 Received: from mail-qv1-f69.google.com ([209.85.219.69]) by youngberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1kZZGx-00045a-Bu for kernel-team@lists.ubuntu.com; Mon, 02 Nov 2020 12:49:19 +0000 Received: by mail-qv1-f69.google.com with SMTP id b13so8147212qvz.2 for ; Mon, 02 Nov 2020 04:49:19 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to 
:references:mime-version:content-transfer-encoding; bh=Uh/pEcNX4ofO6z1Iq7Sc9CZ6DYGzkz92f8QGyA3Mwh0=; b=UMBis9OYT06bpkyKGJpgTcZJZ2KfOGRCAv2WHIJqTn9uLCxnI3qN/KJ9H/iUFWcruP F9LJufH6JvNZH/IhWm5BawZ2lFLLagpEVNwew5NV9eQckcX6va+dBUtlxwr7x8pr5m1P URK0FdYurtmVpabOn9jXWvL4yVjpCFi1YuKTwbWLVV3FgoLSKUL9cGjXL0lv68eNLNQ+ g8nAcykDCW8g5fKjsipLZb14/aHh2+pwxzaUqukGaxsYrOHl1LdJrNRmG78eb4wuUgFI x51e9Jep7y3fX9XDAYWo4IdsH1fvD8y8H2i/fqx1T7UQ2fNQJiPPFkyVG6/1BYcadpLU 4eLA== X-Gm-Message-State: AOAM532QT5otq150D+J3m9xr5JKkEC4LdrMiaFaRiEryf9cWgVHcea1Z aNa+WS7kmTB26xxN9BGHUYS6jnvrSnAqHqwqDFNCut3EBaUG/vAdOoHLREGY6xICP6Ssz+Cb+t/ 4u/g9qKv9Y/G6qC3rsbpTl6rH4mHoCLA0GfJZ0uFfZA== X-Received: by 2002:a37:6107:: with SMTP id v7mr13956963qkb.43.1604321358234; Mon, 02 Nov 2020 04:49:18 -0800 (PST) X-Google-Smtp-Source: ABdhPJzBmEqcFAFaozYaWBE7wzpSKy4aVpK8mGLHvt6DnYzIIWqy9onrhqwy+daZMzwblbQ+obAD+g== X-Received: by 2002:a37:6107:: with SMTP id v7mr13956952qkb.43.1604321358016; Mon, 02 Nov 2020 04:49:18 -0800 (PST) Received: from localhost.localdomain (072-189-064-225.res.spectrum.com. [72.189.64.225]) by smtp.gmail.com with ESMTPSA id q7sm7666201qtd.49.2020.11.02.04.49.16 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 02 Nov 2020 04:49:17 -0800 (PST) From: William Breathitt Gray To: kernel-team@lists.ubuntu.com Subject: [SRU][B:linux-azure-4.15][PATCH 17/40] xdp: introduce a new xdp_frame type Date: Mon, 2 Nov 2020 07:48:33 -0500 Message-Id: <20201102124856.4659-18-william.gray@canonical.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20201102124856.4659-1-william.gray@canonical.com> References: <20201102124856.4659-1-william.gray@canonical.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Jesper Dangaard Brouer BugLink: https://bugs.launchpad.net/bugs/1877654 This is needed to convert drivers tuntap and virtio_net. This is a generalization of what is done inside cpumap, which will be converted later. Signed-off-by: Jesper Dangaard Brouer Signed-off-by: David S. Miller (cherry picked from commit c0048cff8abb69c956ce1277d17a3f7a14e41522) Signed-off-by: William Breathitt Gray --- include/net/xdp.h | 40 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 40 insertions(+) diff --git a/include/net/xdp.h b/include/net/xdp.h index 15f8ade008b5..756c42811e78 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -58,6 +58,46 @@ struct xdp_buff { struct xdp_rxq_info *rxq; }; +struct xdp_frame { + void *data; + u16 len; + u16 headroom; + u16 metasize; + /* Lifetime of xdp_rxq_info is limited to NAPI/enqueue time, + * while mem info is valid on remote CPU. + */ + struct xdp_mem_info mem; +}; + +/* Convert xdp_buff to xdp_frame */ +static inline +struct xdp_frame *convert_to_xdp_frame(struct xdp_buff *xdp) +{ + struct xdp_frame *xdp_frame; + int metasize; + int headroom; + + /* Assure headroom is available for storing info */ + headroom = xdp->data - xdp->data_hard_start; + metasize = xdp->data - xdp->data_meta; + metasize = metasize > 0 ? 
metasize : 0; + if (unlikely((headroom - metasize) < sizeof(*xdp_frame))) + return NULL; + + /* Store info in top of packet */ + xdp_frame = xdp->data_hard_start; + + xdp_frame->data = xdp->data; + xdp_frame->len = xdp->data_end - xdp->data; + xdp_frame->headroom = headroom - sizeof(*xdp_frame); + xdp_frame->metasize = metasize; + + /* rxq only valid until napi_schedule ends, convert to xdp_mem_info */ + xdp_frame->mem = xdp->rxq->mem; + + return xdp_frame; +} + static inline void xdp_return_frame(void *data, struct xdp_mem_info *mem) { From patchwork Mon Nov 2 12:48:34 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: William Breathitt Gray X-Patchwork-Id: 1392228 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=canonical.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4CPt680zVNz9sWn; Mon, 2 Nov 2020 23:49:36 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1kZZH7-0006En-Lc; Mon, 02 Nov 2020 12:49:29 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1kZZH0-00068J-9L for kernel-team@lists.ubuntu.com; Mon, 02 Nov 2020 12:49:22 +0000 Received: from mail-qt1-f198.google.com ([209.85.160.198]) by youngberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1kZZGy-00045y-NT for kernel-team@lists.ubuntu.com; Mon, 02 Nov 2020 12:49:20 +0000 Received: by mail-qt1-f198.google.com with SMTP id i39so7992902qtb.1 for ; Mon, 02 Nov 2020 04:49:20 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=dU0kWNfg+L77ey6s5EcZIOn/uMh+riS7kM7c2EzFzN4=; b=LyGnKeXlfwCwR1PZJcUN6TcAAEe/xH4IhVC/iEAxDKGrrVQNeKPEbnM9CdWJNtFlCc w/ecv3ORTzy43pp/agqLkPoOug9CJ+4Xi6a2DexdbYqrPGnRhsbXH/PWWTTM9vyc71ee +yApRh3yCGs4GeNdXnZwRQt0Gf9tN1JaU5CO70E50/L1uoVZvrxaOl/dMOGtmyiPVm4R sTekbVaYhoJ0dLGwvJCq5jJ4IjrXpFPv2eWpsD5g0KUQ6WfJReO7Nngpzo3gfkucAcd8 2IJthEeKMYPXpot+RY0hkQki67u2pALAJ6ENIVxPjuWqYn88UAuB/j2VZT+loBVL0HN0 kxxw== X-Gm-Message-State: AOAM530/r/sQ6zm6+hQUFrpmb46XJaboF9WZvdzbfuzRB7Ev3QOgNrAp 0xC7PFuHmBJQFOhzF+FhjpetEoTernOOAhfWvH4hX2sPy0tCxRcpxT+uqoBhySs6NPDX6Lq3eGo 6RAJPnJXudKRcNcBhVtSpFZ5+TgSYy+sIq0b0Z7NW8g== X-Received: by 2002:a37:a857:: with SMTP id r84mr15347212qke.11.1604321359368; Mon, 02 Nov 2020 04:49:19 -0800 (PST) X-Google-Smtp-Source: ABdhPJz5Z5y7Yxwl9xA0oxUHDZ7hjlxW1DvB9VVj0KybTtsviQIYC0IDjcQMA/zTL+BfkW/dliFysA== X-Received: by 2002:a37:a857:: with SMTP id r84mr15347199qke.11.1604321359137; Mon, 02 Nov 2020 04:49:19 -0800 (PST) Received: from localhost.localdomain (072-189-064-225.res.spectrum.com. 
[72.189.64.225]) by smtp.gmail.com with ESMTPSA id q7sm7666201qtd.49.2020.11.02.04.49.18 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 02 Nov 2020 04:49:18 -0800 (PST)
From: William Breathitt Gray To: kernel-team@lists.ubuntu.com Subject: [SRU][B:linux-azure-4.15][PATCH 18/40] tun: convert to use generic xdp_frame and xdp_return_frame API Date: Mon, 2 Nov 2020 07:48:34 -0500 Message-Id: <20201102124856.4659-19-william.gray@canonical.com> In-Reply-To: <20201102124856.4659-1-william.gray@canonical.com> References: <20201102124856.4659-1-william.gray@canonical.com>
From: Jesper Dangaard Brouer BugLink: https://bugs.launchpad.net/bugs/1877654 The tuntap driver invented its own driver-specific way of queueing XDP packets, by storing the xdp_buff information at the top of the XDP frame data. Convert it over to use the more generic xdp_frame structure. The main problem with the in-driver method is that the xdp_rxq_info pointer cannot be trusted/used when dequeueing the frame. V3: Remove check based on feedback from Jason Signed-off-by: Jesper Dangaard Brouer Signed-off-by: David S. Miller (backported from commit 1ffcbc8537d0bc32aaca7000cb9c904ec4b6300f) [ vilhelmgray: context adjustments ] Signed-off-by: William Breathitt Gray --- drivers/net/tun.c | 43 ++++++++++++++++++++---------------------- drivers/vhost/net.c | 7 ++++--- include/linux/if_tun.h | 4 ++-- 3 files changed, 26 insertions(+), 28 deletions(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 1d7f953d83e4..25962a060574 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -236,11 +236,11 @@ struct tun_struct { struct bpf_prog __rcu *xdp_prog; }; -bool tun_is_xdp_buff(void *ptr) +bool tun_is_xdp_frame(void *ptr) { return (unsigned long)ptr & TUN_XDP_FLAG; } -EXPORT_SYMBOL(tun_is_xdp_buff); +EXPORT_SYMBOL(tun_is_xdp_frame); void *tun_xdp_to_ptr(void *ptr) { @@ -624,10 +624,10 @@ static void tun_ptr_free(void *ptr) { if (!ptr) return; - if (tun_is_xdp_buff(ptr)) { - struct xdp_buff *xdp = tun_ptr_to_xdp(ptr); + if (tun_is_xdp_frame(ptr)) { + struct xdp_frame *xdpf = tun_ptr_to_xdp(ptr); - put_page(virt_to_head_page(xdp->data)); + xdp_return_frame(xdpf->data, &xdpf->mem); } else { __skb_array_destroy_skb(ptr); } @@ -1233,17 +1233,14 @@ static const struct net_device_ops tun_netdev_ops = { static int tun_xdp_xmit(struct net_device *dev, struct xdp_buff *xdp) { struct tun_struct *tun = netdev_priv(dev); - struct xdp_buff *buff = xdp->data_hard_start; - int headroom = xdp->data - xdp->data_hard_start; + struct xdp_frame *frame; struct tun_file *tfile; u32 numqueues; int ret = 0; - /* Assure headroom is available and buff is properly aligned */ - if (unlikely(headroom < sizeof(*xdp) || tun_is_xdp_buff(xdp))) - return -ENOSPC; - - *buff = *xdp; + frame = convert_to_xdp_frame(xdp); + if (unlikely(!frame)) + return -EOVERFLOW; rcu_read_lock(); @@ -1258,7 +1255,7 @@ static int tun_xdp_xmit(struct net_device *dev, struct xdp_buff *xdp) /* Encode the XDP flag into lowest bit for consumer to differ * XDP buffer from sk_buff.
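 * (Ed. note: the tagging pair quoted at the top of this patch is effectively tun_xdp_to_ptr(p) == (void *)((unsigned long)p | TUN_XDP_FLAG) and tun_is_xdp_frame(p) == ((unsigned long)p & TUN_XDP_FLAG); this is safe because an xdp_frame sits at the aligned start of its buffer, so a genuine pointer always has the low bit clear.)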
*/ - if (ptr_ring_produce(&tfile->tx_ring, tun_xdp_to_ptr(buff))) { + if (ptr_ring_produce(&tfile->tx_ring, tun_xdp_to_ptr(frame))) { this_cpu_inc(tun->pcpu_stats->tx_dropped); ret = -ENOSPC; } @@ -1969,11 +1966,11 @@ static ssize_t tun_chr_write_iter(struct kiocb *iocb, struct iov_iter *from) static ssize_t tun_put_user_xdp(struct tun_struct *tun, struct tun_file *tfile, - struct xdp_buff *xdp, + struct xdp_frame *xdp_frame, struct iov_iter *iter) { int vnet_hdr_sz = 0; - size_t size = xdp->data_end - xdp->data; + size_t size = xdp_frame->len; struct tun_pcpu_stats *stats; size_t ret; @@ -1989,7 +1986,7 @@ static ssize_t tun_put_user_xdp(struct tun_struct *tun, iov_iter_advance(iter, vnet_hdr_sz - sizeof(gso)); } - ret = copy_to_iter(xdp->data, size, iter) + vnet_hdr_sz; + ret = copy_to_iter(xdp_frame->data, size, iter) + vnet_hdr_sz; stats = get_cpu_ptr(tun->pcpu_stats); u64_stats_update_begin(&stats->syncp); @@ -2161,11 +2158,11 @@ static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile, return err; } - if (tun_is_xdp_buff(ptr)) { - struct xdp_buff *xdp = tun_ptr_to_xdp(ptr); + if (tun_is_xdp_frame(ptr)) { + struct xdp_frame *xdpf = tun_ptr_to_xdp(ptr); - ret = tun_put_user_xdp(tun, tfile, xdp, to); - put_page(virt_to_head_page(xdp->data)); + ret = tun_put_user_xdp(tun, tfile, xdpf, to); + xdp_return_frame(xdpf->data, &xdpf->mem); } else { struct sk_buff *skb = ptr; @@ -2315,10 +2312,10 @@ static int tun_recvmsg(struct socket *sock, struct msghdr *m, size_t total_len, static int tun_ptr_peek_len(void *ptr) { if (likely(ptr)) { - if (tun_is_xdp_buff(ptr)) { - struct xdp_buff *xdp = tun_ptr_to_xdp(ptr); + if (tun_is_xdp_frame(ptr)) { + struct xdp_frame *xdpf = tun_ptr_to_xdp(ptr); - return xdp->data_end - xdp->data; + return xdpf->len; } return __skb_array_len_with_tag(ptr); } else { diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 2f3d5298d630..07ca1c0e6fc4 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -32,6 +32,7 @@ #include #include +#include #include "vhost.h" @@ -183,10 +184,10 @@ static void vhost_net_buf_unproduce(struct vhost_net_virtqueue *nvq) static int vhost_net_buf_peek_len(void *ptr) { - if (tun_is_xdp_buff(ptr)) { - struct xdp_buff *xdp = tun_ptr_to_xdp(ptr); + if (tun_is_xdp_frame(ptr)) { + struct xdp_frame *xdpf = tun_ptr_to_xdp(ptr); - return xdp->data_end - xdp->data; + return xdpf->len; } return __skb_array_len_with_tag(ptr); diff --git a/include/linux/if_tun.h b/include/linux/if_tun.h index 08e66827ad8e..0ab590e41913 100644 --- a/include/linux/if_tun.h +++ b/include/linux/if_tun.h @@ -22,7 +22,7 @@ #if defined(CONFIG_TUN) || defined(CONFIG_TUN_MODULE) struct socket *tun_get_socket(struct file *); struct ptr_ring *tun_get_tx_ring(struct file *file); -bool tun_is_xdp_buff(void *ptr); +bool tun_is_xdp_frame(void *ptr); void *tun_xdp_to_ptr(void *ptr); void *tun_ptr_to_xdp(void *ptr); #else @@ -38,7 +38,7 @@ static inline struct ptr_ring *tun_get_tx_ring(struct file *f) { return ERR_PTR(-EINVAL); } -static inline bool tun_is_xdp_buff(void *ptr) +static inline bool tun_is_xdp_frame(void *ptr) { return false; } From patchwork Mon Nov 2 12:48:35 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: William Breathitt Gray X-Patchwork-Id: 1392229 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; 
helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=canonical.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4CPt6D0C6Kz9sWr; Mon, 2 Nov 2020 23:49:40 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1kZZHC-0006ID-D5; Mon, 02 Nov 2020 12:49:34 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1kZZH1-00068y-77 for kernel-team@lists.ubuntu.com; Mon, 02 Nov 2020 12:49:23 +0000 Received: from mail-qv1-f71.google.com ([209.85.219.71]) by youngberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1kZZGz-000466-Jb for kernel-team@lists.ubuntu.com; Mon, 02 Nov 2020 12:49:21 +0000 Received: by mail-qv1-f71.google.com with SMTP id d18so822232qvp.15 for ; Mon, 02 Nov 2020 04:49:21 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=o3KYhDnNhqdWKsxBm1aDpXOzDjMJvPjeRzijLb/R0KM=; b=coTzORlEGbz3gA+6yG2e2RiROaCg/svIroLTEzyFQoYYuPHngXT1psMCuVJD3pTDhp kHSBV0V+Dl5zUJaUEym/15xoQvdRO8/cghzt1m89bgt+XEJVkOLOYMH9Em6jUy1QAmOx aFws63faUZF9jN+Z5viSwk+Q5MVDLiQk44P6IU/5m0LUNFRU5+mqJaGlh88lhpw3kKZN 4ia/DxRj9SBgCAXbgUnWzwJtf1LZQUQGH6K3d+a1Lgl4tNYbx4l2rvtSMgYleUbCcuww K21TPrcdrch1bhwiZO35sLPryD4zV11FTcQ3V7Dtp0oP+SjJYv6hsjnNo6pdnFjP2xDa jeMw== X-Gm-Message-State: AOAM53389Vjtw4jcBvz224HfgFa+0EDBBbP24HaEMxiaDLgdzpFDYLqA thej3lmw/5C5lpN4ecThS2ZnoeC9BqanRsVlDEZWVMke4q2/QfJqmmI6+yTceo9IhkRQUSjd9R1 VsuZMjg3nUMjpqqEsNt9lbI+7cmwd5mKFSab7bJTN8w== X-Received: by 2002:a37:27c9:: with SMTP id n192mr13934894qkn.71.1604321360363; Mon, 02 Nov 2020 04:49:20 -0800 (PST) X-Google-Smtp-Source: ABdhPJwc+mUuMLtZFbJ5fM5VuFaod7pwVxdjVXHf91zju3vbIW1PHuXvMBXTXdHSHBSj+jyztoO5MQ== X-Received: by 2002:a37:27c9:: with SMTP id n192mr13934878qkn.71.1604321360078; Mon, 02 Nov 2020 04:49:20 -0800 (PST) Received: from localhost.localdomain (072-189-064-225.res.spectrum.com. 
[72.189.64.225]) by smtp.gmail.com with ESMTPSA id q7sm7666201qtd.49.2020.11.02.04.49.19 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 02 Nov 2020 04:49:19 -0800 (PST)
From: William Breathitt Gray To: kernel-team@lists.ubuntu.com Subject: [SRU][B:linux-azure-4.15][PATCH 19/40] virtio_net: convert to use generic xdp_frame and xdp_return_frame API Date: Mon, 2 Nov 2020 07:48:35 -0500 Message-Id: <20201102124856.4659-20-william.gray@canonical.com> In-Reply-To: <20201102124856.4659-1-william.gray@canonical.com> References: <20201102124856.4659-1-william.gray@canonical.com>
From: Jesper Dangaard Brouer BugLink: https://bugs.launchpad.net/bugs/1877654 The virtio_net driver assumes XDP frames are always released based on page refcnt (via put_page). Thus, it only queues the XDP data pointer address and uses virt_to_head_page() to retrieve the struct page. Use the XDP return API to get away from such assumptions. Instead, queue an xdp_frame, which allows us to use the xdp_return_frame API when releasing the frame. V8: Avoid endianness issues (found by kbuild test robot) V9: Change __virtnet_xdp_xmit from bool to int return value (found by Dan Carpenter) Signed-off-by: Jesper Dangaard Brouer Signed-off-by: David S. Miller (backported from commit cac320c850efb25480cd0f71383b84ec61c0e138) [ vilhelmgray: context adjustment ] Signed-off-by: William Breathitt Gray --- drivers/net/virtio_net.c | 54 +++++++++++++++++++++------------------- 1 file changed, 29 insertions(+), 25 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 80aad47c8c97..280bab31a2d3 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -398,38 +398,48 @@ static void virtnet_xdp_flush(struct net_device *dev) virtqueue_kick(sq->vq); } -static bool __virtnet_xdp_xmit(struct virtnet_info *vi, - struct xdp_buff *xdp) +static int __virtnet_xdp_xmit(struct virtnet_info *vi, + struct xdp_buff *xdp) { struct virtio_net_hdr_mrg_rxbuf *hdr; - unsigned int len; + struct xdp_frame *xdpf, *xdpf_sent; struct send_queue *sq; + unsigned int len; unsigned int qp; - void *xdp_sent; int err; qp = vi->curr_queue_pairs - vi->xdp_queue_pairs + smp_processor_id(); sq = &vi->sq[qp]; /* Free up any pending old buffers before queueing new ones. */ - while ((xdp_sent = virtqueue_get_buf(sq->vq, &len)) != NULL) { - struct page *sent_page = virt_to_head_page(xdp_sent); + while ((xdpf_sent = virtqueue_get_buf(sq->vq, &len)) != NULL) + xdp_return_frame(xdpf_sent->data, &xdpf_sent->mem); - put_page(sent_page); - } + xdpf = convert_to_xdp_frame(xdp); + if (unlikely(!xdpf)) + return -EOVERFLOW; + + /* virtqueue want to use data area in-front of packet */ + if (unlikely(xdpf->metasize > 0)) + return -EOPNOTSUPP; - xdp->data -= vi->hdr_len; + if (unlikely(xdpf->headroom < vi->hdr_len)) + return -EOVERFLOW; + + /* Make room for virtqueue hdr (also change xdpf->headroom?)
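 * (Ed. note: just below, xdpf->data is moved back by vi->hdr_len and xdpf->len grows by the same amount, so the virtio-net header travels in the frame's headroom; xdpf->headroom itself is left unadjusted, hence the open question in this comment.)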
*/ + xdpf->data -= vi->hdr_len; /* Zero header and leave csum up to XDP layers */ - hdr = xdp->data; + hdr = xdpf->data; memset(hdr, 0, vi->hdr_len); + xdpf->len += vi->hdr_len; - sg_init_one(sq->sg, xdp->data, xdp->data_end - xdp->data); + sg_init_one(sq->sg, xdpf->data, xdpf->len); - err = virtqueue_add_outbuf(sq->vq, sq->sg, 1, xdp->data, GFP_ATOMIC); + err = virtqueue_add_outbuf(sq->vq, sq->sg, 1, xdpf, GFP_ATOMIC); if (unlikely(err)) - return false; /* Caller handle free/refcnt */ + return -ENOSPC; /* Caller handle free/refcnt */ - return true; + return 0; } static int virtnet_xdp_xmit(struct net_device *dev, struct xdp_buff *xdp) @@ -437,7 +447,6 @@ static int virtnet_xdp_xmit(struct net_device *dev, struct xdp_buff *xdp) struct virtnet_info *vi = netdev_priv(dev); struct receive_queue *rq = vi->rq; struct bpf_prog *xdp_prog; - bool sent; /* Only allow ndo_xdp_xmit if XDP is loaded on dev, as this * indicate XDP resources have been successfully allocated. @@ -446,10 +455,7 @@ static int virtnet_xdp_xmit(struct net_device *dev, struct xdp_buff *xdp) if (!xdp_prog) return -ENXIO; - sent = __virtnet_xdp_xmit(vi, xdp); - if (!sent) - return -ENOSPC; - return 0; + return __virtnet_xdp_xmit(vi, xdp); } static unsigned int virtnet_get_headroom(struct virtnet_info *vi) @@ -537,7 +543,6 @@ static struct sk_buff *receive_small(struct net_device *dev, struct page *page = virt_to_head_page(buf); unsigned int delta = 0; struct page *xdp_page; - bool sent; int err; len -= vi->hdr_len; @@ -588,8 +593,8 @@ static struct sk_buff *receive_small(struct net_device *dev, delta = orig_data - xdp.data; break; case XDP_TX: - sent = __virtnet_xdp_xmit(vi, &xdp); - if (unlikely(!sent)) { + err = __virtnet_xdp_xmit(vi, &xdp); + if (unlikely(err)) { trace_xdp_exception(vi->dev, xdp_prog, act); goto err_xdp; } @@ -674,7 +679,6 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, unsigned int truesize; unsigned int headroom = mergeable_ctx_to_headroom(ctx); int err; - bool sent; head_skb = NULL; @@ -743,8 +747,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, } break; case XDP_TX: - sent = __virtnet_xdp_xmit(vi, &xdp); - if (unlikely(!sent)) { + err = __virtnet_xdp_xmit(vi, &xdp); + if (unlikely(err)) { trace_xdp_exception(vi->dev, xdp_prog, act); if (unlikely(xdp_page != page)) put_page(xdp_page); From patchwork Mon Nov 2 12:48:36 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: William Breathitt Gray X-Patchwork-Id: 1392230 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=canonical.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4CPt6N4P09z9sWH; Mon, 2 Nov 2020 23:49:48 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1kZZHG-0006LT-Ve; Mon, 02 Nov 2020 12:49:39 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps 
(TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1kZZH1-00069u-Uv for kernel-team@lists.ubuntu.com; Mon, 02 Nov 2020 12:49:24 +0000 Received: from mail-qt1-f200.google.com ([209.85.160.200]) by youngberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1kZZH0-00046H-MF for kernel-team@lists.ubuntu.com; Mon, 02 Nov 2020 12:49:22 +0000 Received: by mail-qt1-f200.google.com with SMTP id f10so7926355qtv.6 for ; Mon, 02 Nov 2020 04:49:22 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=2eZ5t85CuFKNoXWGlMRTRAYAKr6Xv2qJ8WvgKEBA5wk=; b=AF6hCdl5fUrNxXsWBhygkXuUq5CUMX9x7l9bW8bGLR9kLdrKJuWZfPA1lrcIRSsnlP Ym71XRgASHUCr2/HahmotBKWUQ9GPwTxpGybx8XEHSw4NRtrtaV04Z+Hwogl+gL08zib Lh0DGu+NWo5ni/XWOuPbsw8IwyxpXyPj0Fg2wLr8N7LgnIBA/DPr8TAMehfiIJ1349ip +8i0ZOrtsnjeafEWDN1vyiw7z2VtbofIBIuQPW1sTf2FfSg3m0AZr7Hlif0hR7jZcdTS /Z1pL2bmvVrB2JGPkiRsHf6+iiTEld0jy2SglqOaagxv7k4nmj1RWY5nIOGANM9ww28v uvHg== X-Gm-Message-State: AOAM532KuxZW4QBuL1MGUXIm/MS6k4VaMk7U5bxeC+lOSEZwKpT7Aw4q oOM7T/hjQUHIjQj0dJ7AZT/wvw4nfsk5GCmCgRuXL6HZxH8brfS5twJYvOx8k19Zblo/wlG/KDg t60sXTKJhGGCSUFSFTSzG9wLY1uj73+z1LIsLNHI3Cg== X-Received: by 2002:a37:ba42:: with SMTP id k63mr14371141qkf.22.1604321361229; Mon, 02 Nov 2020 04:49:21 -0800 (PST) X-Google-Smtp-Source: ABdhPJyqaBWduP4++T0VBKyamUoHQZ/+EJvNGuaqBaahAvzR6lkJ5DkfGnL4pfOJu9ZbzzACr/Rf0g== X-Received: by 2002:a37:ba42:: with SMTP id k63mr14371120qkf.22.1604321360933; Mon, 02 Nov 2020 04:49:20 -0800 (PST) Received: from localhost.localdomain (072-189-064-225.res.spectrum.com. [72.189.64.225]) by smtp.gmail.com with ESMTPSA id q7sm7666201qtd.49.2020.11.02.04.49.20 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 02 Nov 2020 04:49:20 -0800 (PST) From: William Breathitt Gray To: kernel-team@lists.ubuntu.com Subject: [SRU][B:linux-azure-4.15][PATCH 20/40] bpf: cpumap convert to use generic xdp_frame Date: Mon, 2 Nov 2020 07:48:36 -0500 Message-Id: <20201102124856.4659-21-william.gray@canonical.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20201102124856.4659-1-william.gray@canonical.com> References: <20201102124856.4659-1-william.gray@canonical.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Jesper Dangaard Brouer BugLink: https://bugs.launchpad.net/bugs/1877654 The generic xdp_frame format, was inspired by the cpumap own internal xdp_pkt format. It is now time to convert it over to the generic xdp_frame format. The cpumap needs one extra field dev_rx. Signed-off-by: Jesper Dangaard Brouer Signed-off-by: David S. Miller (backported from commit 70280ed91cb8acb43e8fd7a8094840846c172ac5) [ vilhelmgray: context adjustment ] Signed-off-by: William Breathitt Gray --- include/net/xdp.h | 1 + kernel/bpf/cpumap.c | 100 +++++++++++++------------------------------- 2 files changed, 29 insertions(+), 72 deletions(-) diff --git a/include/net/xdp.h b/include/net/xdp.h index 756c42811e78..ea3773f94f65 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -67,6 +67,7 @@ struct xdp_frame { * while mem info is valid on remote CPU. 
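 * (Ed. note: dev_rx records the receiving net_device so that the remote CPU can later set skb->protocol via eth_type_trans(skb, xdpf->dev_rx) in cpu_map_build_skb(), further down in this patch.)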
*/ struct xdp_mem_info mem; + struct net_device *dev_rx; /* used by cpumap */ }; /* Convert xdp_buff to xdp_frame */ diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c index 1507e305ecfc..924b3cd75da9 100644 --- a/kernel/bpf/cpumap.c +++ b/kernel/bpf/cpumap.c @@ -165,52 +165,8 @@ static void cpu_map_kthread_stop(struct work_struct *work) kthread_stop(rcpu->kthread); } -/* For now, xdp_pkt is a cpumap internal data structure, with info - * carried between enqueue to dequeue. It is mapped into the top - * headroom of the packet, to avoid allocating separate mem. - */ -struct xdp_pkt { - void *data; - u16 len; - u16 headroom; - u16 metasize; - /* Lifetime of xdp_rxq_info is limited to NAPI/enqueue time, - * while mem info is valid on remote CPU. - */ - struct xdp_mem_info mem; - struct net_device *dev_rx; -}; - -/* Convert xdp_buff to xdp_pkt */ -static struct xdp_pkt *convert_to_xdp_pkt(struct xdp_buff *xdp) -{ - struct xdp_pkt *xdp_pkt; - int metasize; - int headroom; - - /* Assure headroom is available for storing info */ - headroom = xdp->data - xdp->data_hard_start; - metasize = xdp->data - xdp->data_meta; - metasize = metasize > 0 ? metasize : 0; - if (unlikely((headroom - metasize) < sizeof(*xdp_pkt))) - return NULL; - - /* Store info in top of packet */ - xdp_pkt = xdp->data_hard_start; - - xdp_pkt->data = xdp->data; - xdp_pkt->len = xdp->data_end - xdp->data; - xdp_pkt->headroom = headroom - sizeof(*xdp_pkt); - xdp_pkt->metasize = metasize; - - /* rxq only valid until napi_schedule ends, convert to xdp_mem_info */ - xdp_pkt->mem = xdp->rxq->mem; - - return xdp_pkt; -} - struct sk_buff *cpu_map_build_skb(struct bpf_cpu_map_entry *rcpu, - struct xdp_pkt *xdp_pkt) + struct xdp_frame *xdpf) { unsigned int frame_size; void *pkt_data_start; @@ -225,7 +181,7 @@ struct sk_buff *cpu_map_build_skb(struct bpf_cpu_map_entry *rcpu, * would be preferred to set frame_size to 2048 or 4096 * depending on the driver. * frame_size = 2048; - * frame_len = frame_size - sizeof(*xdp_pkt); + * frame_len = frame_size - sizeof(*xdp_frame); * * Instead, with info avail, skb_shared_info in placed after * packet len. This, unfortunately fakes the truesize. @@ -233,21 +189,21 @@ struct sk_buff *cpu_map_build_skb(struct bpf_cpu_map_entry *rcpu, * is not at a fixed memory location, with mixed length * packets, which is bad for cache-line hotness. */ - frame_size = SKB_DATA_ALIGN(xdp_pkt->len) + xdp_pkt->headroom + + frame_size = SKB_DATA_ALIGN(xdpf->len) + xdpf->headroom + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); - pkt_data_start = xdp_pkt->data - xdp_pkt->headroom; + pkt_data_start = xdpf->data - xdpf->headroom; skb = build_skb(pkt_data_start, frame_size); if (!skb) return NULL; - skb_reserve(skb, xdp_pkt->headroom); - __skb_put(skb, xdp_pkt->len); - if (xdp_pkt->metasize) - skb_metadata_set(skb, xdp_pkt->metasize); + skb_reserve(skb, xdpf->headroom); + __skb_put(skb, xdpf->len); + if (xdpf->metasize) + skb_metadata_set(skb, xdpf->metasize); /* Essential SKB info: protocol and skb->dev */ - skb->protocol = eth_type_trans(skb, xdp_pkt->dev_rx); + skb->protocol = eth_type_trans(skb, xdpf->dev_rx); /* Optional SKB info, currently missing: * - HW checksum info (skb->ip_summed) @@ -265,11 +221,11 @@ static void __cpu_map_ring_cleanup(struct ptr_ring *ring) * invoked cpu_map_kthread_stop(). Catch any broken behaviour * gracefully and warn once. 
*/ - struct xdp_pkt *xdp_pkt; + struct xdp_frame *xdpf; - while ((xdp_pkt = ptr_ring_consume(ring))) - if (WARN_ON_ONCE(xdp_pkt)) - xdp_return_frame(xdp_pkt, &xdp_pkt->mem); + while ((xdpf = ptr_ring_consume(ring))) + if (WARN_ON_ONCE(xdpf)) + xdp_return_frame(xdpf->data, &xdpf->mem); } static void put_cpu_map_entry(struct bpf_cpu_map_entry *rcpu) @@ -296,7 +252,7 @@ static int cpu_map_kthread_run(void *data) */ while (!kthread_should_stop() || !__ptr_ring_empty(rcpu->queue)) { unsigned int processed = 0, drops = 0, sched = 0; - struct xdp_pkt *xdp_pkt; + struct xdp_frame *xdpf; /* Release CPU reschedule checks */ if (__ptr_ring_empty(rcpu->queue)) { @@ -319,13 +275,13 @@ static int cpu_map_kthread_run(void *data) * kthread CPU pinned. Lockless access to ptr_ring * consume side valid as no-resize allowed of queue. */ - while ((xdp_pkt = __ptr_ring_consume(rcpu->queue))) { + while ((xdpf = __ptr_ring_consume(rcpu->queue))) { struct sk_buff *skb; int ret; - skb = cpu_map_build_skb(rcpu, xdp_pkt); + skb = cpu_map_build_skb(rcpu, xdpf); if (!skb) { - xdp_return_frame(xdp_pkt, &xdp_pkt->mem); + xdp_return_frame(xdpf->data, &xdpf->mem); continue; } @@ -621,13 +577,13 @@ static int bq_flush_to_queue(struct bpf_cpu_map_entry *rcpu, spin_lock(&q->producer_lock); for (i = 0; i < bq->count; i++) { - struct xdp_pkt *xdp_pkt = bq->q[i]; + struct xdp_frame *xdpf = bq->q[i]; int err; - err = __ptr_ring_produce(q, xdp_pkt); + err = __ptr_ring_produce(q, xdpf); if (err) { drops++; - xdp_return_frame(xdp_pkt->data, &xdp_pkt->mem); + xdp_return_frame(xdpf->data, &xdpf->mem); } processed++; } @@ -642,7 +598,7 @@ static int bq_flush_to_queue(struct bpf_cpu_map_entry *rcpu, /* Runs under RCU-read-side, plus in softirq under NAPI protection. * Thus, safe percpu variable access. */ -static int bq_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_pkt *xdp_pkt) +static int bq_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_frame *xdpf) { struct xdp_bulk_queue *bq = this_cpu_ptr(rcpu->bulkq); @@ -653,28 +609,28 @@ static int bq_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_pkt *xdp_pkt) * driver to code invoking us to finished, due to driver * (e.g. ixgbe) recycle tricks based on page-refcnt. * - * Thus, incoming xdp_pkt is always queued here (else we race + * Thus, incoming xdp_frame is always queued here (else we race * with another CPU on page-refcnt and remaining driver code). * Queue time is very short, as driver will invoke flush * operation, when completing napi->poll call. 
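 * (Ed. note: bq here is the per-CPU bulk queue obtained with this_cpu_ptr(rcpu->bulkq); bq_flush_to_queue() above drains it into the remote CPU's ptr_ring under the producer lock when the driver flushes at the end of its napi->poll.)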
*/ - bq->q[bq->count++] = xdp_pkt; + bq->q[bq->count++] = xdpf; return 0; } int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_buff *xdp, struct net_device *dev_rx) { - struct xdp_pkt *xdp_pkt; + struct xdp_frame *xdpf; - xdp_pkt = convert_to_xdp_pkt(xdp); - if (unlikely(!xdp_pkt)) + xdpf = convert_to_xdp_frame(xdp); + if (unlikely(!xdpf)) return -EOVERFLOW; /* Info needed when constructing SKB on remote CPU */ - xdp_pkt->dev_rx = dev_rx; + xdpf->dev_rx = dev_rx; - bq_enqueue(rcpu, xdp_pkt); + bq_enqueue(rcpu, xdpf); return 0; } From patchwork Mon Nov 2 12:48:37 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: William Breathitt Gray X-Patchwork-Id: 1392234 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=canonical.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4CPt6y6Yf6z9sVK; Mon, 2 Nov 2020 23:50:18 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1kZZHo-0006j0-Lj; Mon, 02 Nov 2020 12:50:12 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1kZZH2-0006AG-NZ for kernel-team@lists.ubuntu.com; Mon, 02 Nov 2020 12:49:24 +0000 Received: from mail-qt1-f200.google.com ([209.85.160.200]) by youngberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1kZZH1-00046P-EQ for kernel-team@lists.ubuntu.com; Mon, 02 Nov 2020 12:49:23 +0000 Received: by mail-qt1-f200.google.com with SMTP id h16so7916248qtr.8 for ; Mon, 02 Nov 2020 04:49:23 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=Uktc1Q46Eoz5Ygx7PBQWflqz2glC2C4I91TeuRpAm2U=; b=F31LHfilvczZxm1lQ7EM8qTaiyx5vEB4wo9OvSCV5bqbs5+jWaJ+7SMwKO5Xdnv3PJ hLj5Iq6e8i1O7YmRWNLzyES6tjRWU4NZiAHqf+Td7zv3FK7M+73qy2knEAQ0GCZSF1NB XZHfqWpkwPNsZysdAxKEyMpJ1CKemPWUiYCnTiPx/1TJH1qPe8v3VGBbVRvV5zXWponr IUAUiVUlwmWidLLlHgnIe5trRi7OqkVA2PZls22TXszPg3KHHDn6J/k1eByiV4XbcV8f QoULRBDAmXusu+DRFyY1sA+RkVmwyK22YbURPrQRggs6tWvMiBR2RN+XgoOm+4UoonmQ /0Mg== X-Gm-Message-State: AOAM533CBq8/u2ph/gwR+hgnJRRfzuXT4v/ktaw0BqLXI6ZEov0GF9GV 9jfNFFSi2AvcvMDSyxTo0mZBUZ7ZplnyTXZHmFR5I6DAfxL39xU86OG3FWcgtY/NpxAVKnPYSiN a4HGA6xiJC4SVxD/4IfsYFvunXGRAhL28vymC3F9ATg== X-Received: by 2002:a05:620a:100e:: with SMTP id z14mr13910077qkj.278.1604321362320; Mon, 02 Nov 2020 04:49:22 -0800 (PST) X-Google-Smtp-Source: ABdhPJyccHdIHSZfBFHBgehYnWj2iiS5HvRaXPffKqvl3oLMoFOSIbwNIGijx4LCrOR8ZaI6mS4RNA== X-Received: by 2002:a05:620a:100e:: with SMTP id z14mr13910060qkj.278.1604321362043; Mon, 02 Nov 2020 04:49:22 -0800 (PST) Received: from localhost.localdomain (072-189-064-225.res.spectrum.com. 
[72.189.64.225]) by smtp.gmail.com with ESMTPSA id q7sm7666201qtd.49.2020.11.02.04.49.21 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 02 Nov 2020 04:49:21 -0800 (PST) From: William Breathitt Gray To: kernel-team@lists.ubuntu.com Subject: [SRU][B:linux-azure-4.15][PATCH 21/40] i40e: convert to use generic xdp_frame and xdp_return_frame API Date: Mon, 2 Nov 2020 07:48:37 -0500 Message-Id: <20201102124856.4659-22-william.gray@canonical.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20201102124856.4659-1-william.gray@canonical.com> References: <20201102124856.4659-1-william.gray@canonical.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Jesper Dangaard Brouer BugLink: https://bugs.launchpad.net/bugs/1877654 Also convert driver i40e, which very recently got XDP_REDIRECT support in commit d9314c474d4f ("i40e: add support for XDP_REDIRECT"). V7: This patch got added in V7 of this patchset. Signed-off-by: Jesper Dangaard Brouer Signed-off-by: David S. Miller (cherry picked from commit b411ef11020d012790839a5414040283d9335386) Signed-off-by: William Breathitt Gray --- drivers/net/ethernet/intel/i40e/i40e_txrx.c | 20 +++++++++++++++----- drivers/net/ethernet/intel/i40e/i40e_txrx.h | 1 + 2 files changed, 16 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c index 9bc0edac43b5..c674d0a566a8 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c @@ -631,7 +631,8 @@ static void i40e_unmap_and_free_tx_resource(struct i40e_ring *ring, if (tx_buffer->tx_flags & I40E_TX_FLAGS_FD_SB) kfree(tx_buffer->raw_buf); else if (ring_is_xdp(ring)) - page_frag_free(tx_buffer->raw_buf); + xdp_return_frame(tx_buffer->xdpf->data, + &tx_buffer->xdpf->mem); else dev_kfree_skb_any(tx_buffer->skb); if (dma_unmap_len(tx_buffer, len)) @@ -775,7 +776,7 @@ static bool i40e_clean_tx_irq(struct i40e_vsi *vsi, /* free the skb/XDP data */ if (ring_is_xdp(tx_ring)) - page_frag_free(tx_buf->raw_buf); + xdp_return_frame(tx_buf->xdpf->data, &tx_buf->xdpf->mem); else napi_consume_skb(tx_buf->skb, napi_budget); @@ -2007,6 +2008,8 @@ static struct sk_buff *i40e_run_xdp(struct i40e_ring *rx_ring, if (!xdp_prog) goto xdp_out; + prefetchw(xdp->data_hard_start); /* xdp_frame write */ + act = bpf_prog_run_xdp(xdp_prog, xdp); switch (act) { case XDP_PASS: @@ -3267,25 +3270,32 @@ static inline int i40e_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb, static int i40e_xmit_xdp_ring(struct xdp_buff *xdp, struct i40e_ring *xdp_ring) { - u32 size = xdp->data_end - xdp->data; u16 i = xdp_ring->next_to_use; struct i40e_tx_buffer *tx_bi; struct i40e_tx_desc *tx_desc; + struct xdp_frame *xdpf; dma_addr_t dma; + u32 size; + + xdpf = convert_to_xdp_frame(xdp); + if (unlikely(!xdpf)) + return I40E_XDP_CONSUMED; + + size = xdpf->len; if (!unlikely(I40E_DESC_UNUSED(xdp_ring))) { xdp_ring->tx_stats.tx_busy++; return I40E_XDP_CONSUMED; } - dma = dma_map_single(xdp_ring->dev, xdp->data, size, DMA_TO_DEVICE); + dma = dma_map_single(xdp_ring->dev, xdpf->data, size, DMA_TO_DEVICE); if (dma_mapping_error(xdp_ring->dev, dma)) return I40E_XDP_CONSUMED; tx_bi = &xdp_ring->tx_bi[i]; tx_bi->bytecount = size; tx_bi->gso_segs = 1; - tx_bi->raw_buf = xdp->data; + 
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h index c0fe46404a0c..b06e4cf86f60 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h @@ -298,6 +298,7 @@ static inline unsigned int i40e_txd_use_count(unsigned int size) struct i40e_tx_buffer { struct i40e_tx_desc *next_to_watch; union { + struct xdp_frame *xdpf; struct sk_buff *skb; void *raw_buf; };
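A note on the union added above: a given TX slot carries exactly one of the three object kinds at a time, so overlaying the pointers is safe, and the free path can pick the right release call from ring type and flags alone. Restated as a hedged sketch condensed from this patch's free hunks (no new driver logic; the sketch_ name is hypothetical):

/* Assumed invariant: exactly one union member is live per TX slot */
static void sketch_release_tx_slot(struct i40e_ring *ring,
				   struct i40e_tx_buffer *tx_buffer)
{
	if (tx_buffer->tx_flags & I40E_TX_FLAGS_FD_SB)
		kfree(tx_buffer->raw_buf);		/* flow-director raw buffer */
	else if (ring_is_xdp(ring))
		xdp_return_frame(tx_buffer->xdpf->data, &tx_buffer->xdpf->mem);
	else
		dev_kfree_skb_any(tx_buffer->skb);	/* regular skb */
}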
From patchwork Mon Nov 2 12:48:38 2020
From: William Breathitt Gray
To: kernel-team@lists.ubuntu.com
Subject: [SRU][B:linux-azure-4.15][PATCH 22/40] xdp: rhashtable with allocator ID to pointer mapping
Date: Mon, 2 Nov 2020 07:48:38 -0500
Message-Id: <20201102124856.4659-23-william.gray@canonical.com>

From: Jesper Dangaard Brouer

BugLink: https://bugs.launchpad.net/bugs/1877654

Use the IDA infrastructure to get a cyclically increasing ID number, which is used to keep track of each registered allocator per RX-queue xdp_rxq_info. Instead of using the IDR infrastructure, which uses a radix tree, use a dynamic rhashtable to create the ID-to-pointer lookup table, because this is faster.

The problem being solved here is that the xdp_rxq_info pointer (stored in xdp_buff) cannot be used directly, as its guaranteed lifetime is too short. The info is needed on a (potentially) remote CPU during DMA-TX completion time. The xdp_mem_info is stored in the xdp_frame when it is converted from an xdp_buff, which is sufficient for the simple page-refcnt based recycle schemes.

For more advanced allocators there is a need to store a pointer to the registered allocator, and thus a need to guard the lifetime and validity of that allocator pointer, which is done through this rhashtable mapping of IDs to pointers. The removal and validity of the allocator and of the helper struct xdp_mem_allocator are guarded by RCU. The allocator will be created by the driver, and registered with xdp_rxq_info_reg_mem_model(). It is up for debate who is responsible for freeing the allocator pointer or invoking the allocator destructor function; in any case, this must happen via RCU freeing.

V4: Per request of Jason Wang - Use xdp_rxq_info_reg_mem_model() in all drivers implementing XDP_REDIRECT, even though it's not strictly necessary when allocator==NULL for type MEM_TYPE_PAGE_SHARED (given it's zero).

V6: Per request of Alex Duyck - Introduce the rhashtable_lookup() call in a later patch

V8: Address sparse "should be static" warnings (from kbuild test robot)

Signed-off-by: Jesper Dangaard Brouer Signed-off-by: David S.
Miller (cherry picked from commit 8d5d88527587516bd58ff0f3810f07c38e65e2be) Signed-off-by: William Breathitt Gray --- drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 9 +- drivers/net/tun.c | 6 + drivers/net/virtio_net.c | 7 + include/net/xdp.h | 14 +- net/core/xdp.c | 223 +++++++++++++++++- 5 files changed, 241 insertions(+), 18 deletions(-) diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c index 25ebb81ee1bb..3040ec9362c8 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c @@ -6469,7 +6469,7 @@ int ixgbe_setup_rx_resources(struct ixgbe_adapter *adapter, struct device *dev = rx_ring->dev; int orig_node = dev_to_node(dev); int ring_node = -1; - int size; + int size, err; size = sizeof(struct ixgbe_rx_buffer) * rx_ring->count; @@ -6506,6 +6506,13 @@ int ixgbe_setup_rx_resources(struct ixgbe_adapter *adapter, rx_ring->queue_index) < 0) goto err; + err = xdp_rxq_info_reg_mem_model(&rx_ring->xdp_rxq, + MEM_TYPE_PAGE_SHARED, NULL); + if (err) { + xdp_rxq_info_unreg(&rx_ring->xdp_rxq); + goto err; + } + rx_ring->xdp_prog = adapter->xdp_prog; return 0; diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 25962a060574..72664d59abc0 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -809,6 +809,12 @@ static int tun_attach(struct tun_struct *tun, struct file *file, tun->dev, tfile->queue_index); if (err < 0) goto out; + err = xdp_rxq_info_reg_mem_model(&tfile->xdp_rxq, + MEM_TYPE_PAGE_SHARED, NULL); + if (err < 0) { + xdp_rxq_info_unreg(&tfile->xdp_rxq); + goto out; + } err = 0; } diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 280bab31a2d3..14a59b12a96b 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -1289,6 +1289,13 @@ static int virtnet_open(struct net_device *dev) if (err < 0) return err; + err = xdp_rxq_info_reg_mem_model(&vi->rq[i].xdp_rxq, + MEM_TYPE_PAGE_SHARED, NULL); + if (err < 0) { + xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq); + return err; + } + virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi); virtnet_napi_tx_enable(vi, vi->sq[i].vq, &vi->sq[i].napi); } diff --git a/include/net/xdp.h b/include/net/xdp.h index ea3773f94f65..5f67c62540aa 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -41,6 +41,7 @@ enum xdp_mem_type { struct xdp_mem_info { u32 type; /* enum xdp_mem_type, but known size type */ + u32 id; }; struct xdp_rxq_info { @@ -99,18 +100,7 @@ struct xdp_frame *convert_to_xdp_frame(struct xdp_buff *xdp) return xdp_frame; } -static inline -void xdp_return_frame(void *data, struct xdp_mem_info *mem) -{ - if (mem->type == MEM_TYPE_PAGE_SHARED) - page_frag_free(data); - - if (mem->type == MEM_TYPE_PAGE_ORDER0) { - struct page *page = virt_to_page(data); /* Assumes order0 page*/ - - put_page(page); - } -} +void xdp_return_frame(void *data, struct xdp_mem_info *mem); int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq, struct net_device *dev, u32 queue_index); diff --git a/net/core/xdp.c b/net/core/xdp.c index 7e6b3545277d..8b2cb79b5de0 100644 --- a/net/core/xdp.c +++ b/net/core/xdp.c @@ -5,6 +5,9 @@ */ #include #include +#include +#include +#include #include @@ -13,6 +16,99 @@ #define REG_STATE_UNREGISTERED 0x2 #define REG_STATE_UNUSED 0x3 +static DEFINE_IDA(mem_id_pool); +static DEFINE_MUTEX(mem_id_lock); +#define MEM_ID_MAX 0xFFFE +#define MEM_ID_MIN 1 +static int mem_id_next = MEM_ID_MIN; + +static bool mem_id_init; /* false */ +static struct rhashtable *mem_id_ht; + +struct xdp_mem_allocator { + struct 
xdp_mem_info mem; + void *allocator; + struct rhash_head node; + struct rcu_head rcu; +}; + +static u32 xdp_mem_id_hashfn(const void *data, u32 len, u32 seed) +{ + const u32 *k = data; + const u32 key = *k; + + BUILD_BUG_ON(FIELD_SIZEOF(struct xdp_mem_allocator, mem.id) + != sizeof(u32)); + + /* Use cyclic increasing ID as direct hash key, see rht_bucket_index */ + return key << RHT_HASH_RESERVED_SPACE; +} + +static int xdp_mem_id_cmp(struct rhashtable_compare_arg *arg, + const void *ptr) +{ + const struct xdp_mem_allocator *xa = ptr; + u32 mem_id = *(u32 *)arg->key; + + return xa->mem.id != mem_id; +} + +static const struct rhashtable_params mem_id_rht_params = { + .nelem_hint = 64, + .head_offset = offsetof(struct xdp_mem_allocator, node), + .key_offset = offsetof(struct xdp_mem_allocator, mem.id), + .key_len = FIELD_SIZEOF(struct xdp_mem_allocator, mem.id), + .max_size = MEM_ID_MAX, + .min_size = 8, + .automatic_shrinking = true, + .hashfn = xdp_mem_id_hashfn, + .obj_cmpfn = xdp_mem_id_cmp, +}; + +static void __xdp_mem_allocator_rcu_free(struct rcu_head *rcu) +{ + struct xdp_mem_allocator *xa; + + xa = container_of(rcu, struct xdp_mem_allocator, rcu); + + /* Allow this ID to be reused */ + ida_simple_remove(&mem_id_pool, xa->mem.id); + + /* TODO: Depending on allocator type/pointer free resources */ + + /* Poison memory */ + xa->mem.id = 0xFFFF; + xa->mem.type = 0xF0F0; + xa->allocator = (void *)0xDEAD9001; + + kfree(xa); +} + +static void __xdp_rxq_info_unreg_mem_model(struct xdp_rxq_info *xdp_rxq) +{ + struct xdp_mem_allocator *xa; + int id = xdp_rxq->mem.id; + int err; + + if (id == 0) + return; + + mutex_lock(&mem_id_lock); + + xa = rhashtable_lookup(mem_id_ht, &id, mem_id_rht_params); + if (!xa) { + mutex_unlock(&mem_id_lock); + return; + } + + err = rhashtable_remove_fast(mem_id_ht, &xa->node, mem_id_rht_params); + WARN_ON(err); + + call_rcu(&xa->rcu, __xdp_mem_allocator_rcu_free); + + mutex_unlock(&mem_id_lock); +} + void xdp_rxq_info_unreg(struct xdp_rxq_info *xdp_rxq) { /* Simplify driver cleanup code paths, allow unreg "unused" */ @@ -21,8 +117,14 @@ void xdp_rxq_info_unreg(struct xdp_rxq_info *xdp_rxq) WARN(!(xdp_rxq->reg_state == REG_STATE_REGISTERED), "Driver BUG"); + __xdp_rxq_info_unreg_mem_model(xdp_rxq); + xdp_rxq->reg_state = REG_STATE_UNREGISTERED; xdp_rxq->dev = NULL; + + /* Reset mem info to defaults */ + xdp_rxq->mem.id = 0; + xdp_rxq->mem.type = 0; } EXPORT_SYMBOL_GPL(xdp_rxq_info_unreg); @@ -72,20 +174,131 @@ bool xdp_rxq_info_is_reg(struct xdp_rxq_info *xdp_rxq) } EXPORT_SYMBOL_GPL(xdp_rxq_info_is_reg); +static int __mem_id_init_hash_table(void) +{ + struct rhashtable *rht; + int ret; + + if (unlikely(mem_id_init)) + return 0; + + rht = kzalloc(sizeof(*rht), GFP_KERNEL); + if (!rht) + return -ENOMEM; + + ret = rhashtable_init(rht, &mem_id_rht_params); + if (ret < 0) { + kfree(rht); + return ret; + } + mem_id_ht = rht; + smp_mb(); /* mutex lock should provide enough pairing */ + mem_id_init = true; + + return 0; +} + +/* Allocate a cyclic ID that maps to allocator pointer. + * See: https://www.kernel.org/doc/html/latest/core-api/idr.html + * + * Caller must lock mem_id_lock. 
+ */ +static int __mem_id_cyclic_get(gfp_t gfp) +{ + int retries = 1; + int id; + +again: + id = ida_simple_get(&mem_id_pool, mem_id_next, MEM_ID_MAX, gfp); + if (id < 0) { + if (id == -ENOSPC) { + /* Cyclic allocator, reset next id */ + if (retries--) { + mem_id_next = MEM_ID_MIN; + goto again; + } + } + return id; /* errno */ + } + mem_id_next = id + 1; + + return id; +} + int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq, enum xdp_mem_type type, void *allocator) { + struct xdp_mem_allocator *xdp_alloc; + gfp_t gfp = GFP_KERNEL; + int id, errno, ret; + void *ptr; + + if (xdp_rxq->reg_state != REG_STATE_REGISTERED) { + WARN(1, "Missing register, driver bug"); + return -EFAULT; + } + if (type >= MEM_TYPE_MAX) return -EINVAL; xdp_rxq->mem.type = type; - if (allocator) - return -EOPNOTSUPP; + if (!allocator) + return 0; + + /* Delay init of rhashtable to save memory if feature isn't used */ + if (!mem_id_init) { + mutex_lock(&mem_id_lock); + ret = __mem_id_init_hash_table(); + mutex_unlock(&mem_id_lock); + if (ret < 0) { + WARN_ON(1); + return ret; + } + } + + xdp_alloc = kzalloc(sizeof(*xdp_alloc), gfp); + if (!xdp_alloc) + return -ENOMEM; + + mutex_lock(&mem_id_lock); + id = __mem_id_cyclic_get(gfp); + if (id < 0) { + errno = id; + goto err; + } + xdp_rxq->mem.id = id; + xdp_alloc->mem = xdp_rxq->mem; + xdp_alloc->allocator = allocator; + + /* Insert allocator into ID lookup table */ + ptr = rhashtable_insert_slow(mem_id_ht, &id, &xdp_alloc->node); + if (IS_ERR(ptr)) { + errno = PTR_ERR(ptr); + goto err; + } + + mutex_unlock(&mem_id_lock); - /* TODO: Allocate an ID that maps to allocator pointer - * See: https://www.kernel.org/doc/html/latest/core-api/idr.html - */ return 0; +err: + mutex_unlock(&mem_id_lock); + kfree(xdp_alloc); + return errno; } EXPORT_SYMBOL_GPL(xdp_rxq_info_reg_mem_model); + +void xdp_return_frame(void *data, struct xdp_mem_info *mem) +{ + if (mem->type == MEM_TYPE_PAGE_SHARED) { + page_frag_free(data); + return; + } + + if (mem->type == MEM_TYPE_PAGE_ORDER0) { + struct page *page = virt_to_page(data); /* Assumes order0 page*/ + + put_page(page); + } +} +EXPORT_SYMBOL_GPL(xdp_return_frame);
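To make the new registration contract concrete, here is a hedged sketch of the driver-side sequence that the ixgbe, tun and virtio_net hunks above all follow; the sketch_ name is hypothetical and error handling mirrors the tun hunk:

static int sketch_setup_rx_queue(struct xdp_rxq_info *xdp_rxq,
				 struct net_device *dev, u32 queue_index)
{
	int err;

	err = xdp_rxq_info_reg(xdp_rxq, dev, queue_index);
	if (err < 0)
		return err;

	/* allocator == NULL keeps the plain page-refcnt model; no ID
	 * is allocated in that case and xdp_rxq->mem.id stays 0.
	 */
	err = xdp_rxq_info_reg_mem_model(xdp_rxq, MEM_TYPE_PAGE_SHARED, NULL);
	if (err < 0)
		xdp_rxq_info_unreg(xdp_rxq);

	return err;
}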
From patchwork Mon Nov 2 12:48:39 2020
From: William Breathitt Gray
To: kernel-team@lists.ubuntu.com
Subject: [SRU][B:linux-azure-4.15][PATCH 23/40] page_pool: refurbish version of page_pool code
Date: Mon, 2 Nov 2020 07:48:39 -0500
Message-Id: <20201102124856.4659-24-william.gray@canonical.com>

From: Jesper Dangaard Brouer

BugLink: https://bugs.launchpad.net/bugs/1877654

The ndo_xdp_xmit API needs a fast page-recycle mechanism for returning pages at DMA-TX completion time, with good cross-CPU performance, given that DMA-TX completion can happen on a remote CPU.

Refurbish my page_pool code, which was presented[1] at MM-summit 2016. The page_pool code has been adapted to not depend on the page allocator or on integration into struct page. The DMA mapping feature is kept, even though it will not be activated/used in this patchset.

[1] http://people.netfilter.org/hawk/presentations/MM-summit2016/generic_page_pool_mm_summit2016.pdf

V2: Adjustments requested by Tariq - Changed page_pool_create() return codes: don't return NULL, only ERR_PTR, as this simplifies error handling in drivers.

V4: Many small improvements and cleanups - Add a DOC comment section that can be used by kernel-doc - Improve fallback mode to work better with refcnt-based recycling, e.g. remove a WARN as pointed out by Tariq, e.g. quicker fallback if ptr_ring is empty.
V5: Fixed SPDX license as pointed out by Alexei

V6: Adjustments requested by Eric Dumazet - Adjust ____cacheline_aligned_in_smp usage/placement - Move rcu_head in struct page_pool - Free pages quicker on destroy, to minimize the resources held over an RCU period - Remove code for the forward/backward compat ABI interface

V8: Issues found by kbuild test robot - Address sparse "should be static" warnings - Only compile+link when a driver uses/selects page_pool; mlx5 selects CONFIG_PAGE_POOL, although it's first used two patches later

Signed-off-by: Jesper Dangaard Brouer Signed-off-by: David S. Miller (cherry picked from commit ff7d6b27f894f1469dc51ccb828b7363ccd9799f) Signed-off-by: William Breathitt Gray --- .../net/ethernet/mellanox/mlx5/core/Kconfig | 1 + include/net/page_pool.h | 129 +++++++ net/Kconfig | 3 + net/core/Makefile | 1 + net/core/page_pool.c | 317 ++++++++++++++++++ 5 files changed, 451 insertions(+) create mode 100644 include/net/page_pool.h create mode 100644 net/core/page_pool.c diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig index c032319f1cb9..12257034131e 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig +++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig @@ -30,6 +30,7 @@ config MLX5_CORE_EN bool "Mellanox Technologies ConnectX-4 Ethernet support" depends on NETDEVICES && ETHERNET && INET && PCI && MLX5_CORE depends on IPV6=y || IPV6=n || MLX5_CORE=m + select PAGE_POOL default n ---help--- Ethernet support in Mellanox Technologies ConnectX-4 NIC. diff --git a/include/net/page_pool.h b/include/net/page_pool.h new file mode 100644 index 000000000000..1fe77db59518 --- /dev/null +++ b/include/net/page_pool.h @@ -0,0 +1,129 @@ +/* SPDX-License-Identifier: GPL-2.0 + * + * page_pool.h + * Author: Jesper Dangaard Brouer + * Copyright (C) 2016 Red Hat, Inc. + */ + +/** + * DOC: page_pool allocator + * + * This page_pool allocator is optimized for the XDP mode that + * uses one-frame-per-page, but have fallbacks that act like the + * regular page allocator APIs. + * + * Basic use involve replacing alloc_pages() calls with the + * page_pool_alloc_pages() call. Drivers should likely use + * page_pool_dev_alloc_pages() replacing dev_alloc_pages(). + * + * If page_pool handles DMA mapping (use page->private), then API user + * is responsible for invoking page_pool_put_page() once. In-case of + * elevated refcnt, the DMA state is released, assuming other users of + * the page will eventually call put_page(). + * + * If no DMA mapping is done, then it can act as shim-layer that + * fall-through to alloc_page. As no state is kept on the page, the + * regular put_page() call is sufficient. + */ +#ifndef _NET_PAGE_POOL_H +#define _NET_PAGE_POOL_H + +#include /* Needed by ptr_ring */ +#include +#include + +#define PP_FLAG_DMA_MAP 1 /* Should page_pool do the DMA map/unmap */ +#define PP_FLAG_ALL PP_FLAG_DMA_MAP + +/* + * Fast allocation side cache array/stack + * + * The cache size and refill watermark is related to the network + * use-case. The NAPI budget is 64 packets. After a NAPI poll the RX + * ring is usually refilled and the max consumed elements will be 64, + * thus a natural max size of objects needed in the cache. + * + * Keeping room for more objects, is due to XDP_DROP use-case. As + * XDP_DROP allows the opportunity to recycle objects directly into + * this array, as it shares the same softirq/NAPI protection.
If + * cache is already full (or partly full) then the XDP_DROP recycles + * would have to take a slower code path. + */ +#define PP_ALLOC_CACHE_SIZE 128 +#define PP_ALLOC_CACHE_REFILL 64 +struct pp_alloc_cache { + u32 count; + void *cache[PP_ALLOC_CACHE_SIZE]; +}; + +struct page_pool_params { + unsigned int flags; + unsigned int order; + unsigned int pool_size; + int nid; /* Numa node id to allocate from pages from */ + struct device *dev; /* device, for DMA pre-mapping purposes */ + enum dma_data_direction dma_dir; /* DMA mapping direction */ +}; + +struct page_pool { + struct rcu_head rcu; + struct page_pool_params p; + + /* + * Data structure for allocation side + * + * Drivers allocation side usually already perform some kind + * of resource protection. Piggyback on this protection, and + * require driver to protect allocation side. + * + * For NIC drivers this means, allocate a page_pool per + * RX-queue. As the RX-queue is already protected by + * Softirq/BH scheduling and napi_schedule. NAPI schedule + * guarantee that a single napi_struct will only be scheduled + * on a single CPU (see napi_schedule). + */ + struct pp_alloc_cache alloc ____cacheline_aligned_in_smp; + + /* Data structure for storing recycled pages. + * + * Returning/freeing pages is more complicated synchronization + * wise, because free's can happen on remote CPUs, with no + * association with allocation resource. + * + * Use ptr_ring, as it separates consumer and producer + * effeciently, it a way that doesn't bounce cache-lines. + * + * TODO: Implement bulk return pages into this structure. + */ + struct ptr_ring ring; +}; + +struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp); + +static inline struct page *page_pool_dev_alloc_pages(struct page_pool *pool) +{ + gfp_t gfp = (GFP_ATOMIC | __GFP_NOWARN); + + return page_pool_alloc_pages(pool, gfp); +} + +struct page_pool *page_pool_create(const struct page_pool_params *params); + +void page_pool_destroy(struct page_pool *pool); + +/* Never call this directly, use helpers below */ +void __page_pool_put_page(struct page_pool *pool, + struct page *page, bool allow_direct); + +static inline void page_pool_put_page(struct page_pool *pool, struct page *page) +{ + __page_pool_put_page(pool, page, false); +} +/* Very limited use-cases allow recycle direct */ +static inline void page_pool_recycle_direct(struct page_pool *pool, + struct page *page) +{ + __page_pool_put_page(pool, page, true); +} + +#endif /* _NET_PAGE_POOL_H */ diff --git a/net/Kconfig b/net/Kconfig index 9dba2715919d..64ba53faa206 100644 --- a/net/Kconfig +++ b/net/Kconfig @@ -440,6 +440,9 @@ config MAY_USE_DEVLINK on MAY_USE_DEVLINK to ensure they do not cause link errors when devlink is a loadable module and the driver using it is built-in. +config PAGE_POOL + bool + endif # if NET # Used by archs to tell that they support BPF JIT compiler plus which flavour. 
diff --git a/net/core/Makefile b/net/core/Makefile index 6dbbba8c57ae..7080417f8bc8 100644 --- a/net/core/Makefile +++ b/net/core/Makefile @@ -14,6 +14,7 @@ obj-y += dev.o ethtool.o dev_addr_lists.o dst.o netevent.o \ fib_notifier.o xdp.o obj-y += net-sysfs.o +obj-$(CONFIG_PAGE_POOL) += page_pool.o obj-$(CONFIG_PROC_FS) += net-procfs.o obj-$(CONFIG_NET_PKTGEN) += pktgen.o obj-$(CONFIG_NETPOLL) += netpoll.o diff --git a/net/core/page_pool.c b/net/core/page_pool.c new file mode 100644 index 000000000000..68bf07206744 --- /dev/null +++ b/net/core/page_pool.c @@ -0,0 +1,317 @@ +/* SPDX-License-Identifier: GPL-2.0 + * + * page_pool.c + * Author: Jesper Dangaard Brouer + * Copyright (C) 2016 Red Hat, Inc. + */ +#include +#include +#include + +#include +#include +#include +#include +#include /* for __put_page() */ + +static int page_pool_init(struct page_pool *pool, + const struct page_pool_params *params) +{ + unsigned int ring_qsize = 1024; /* Default */ + + memcpy(&pool->p, params, sizeof(pool->p)); + + /* Validate only known flags were used */ + if (pool->p.flags & ~(PP_FLAG_ALL)) + return -EINVAL; + + if (pool->p.pool_size) + ring_qsize = pool->p.pool_size; + + /* Sanity limit mem that can be pinned down */ + if (ring_qsize > 32768) + return -E2BIG; + + /* DMA direction is either DMA_FROM_DEVICE or DMA_BIDIRECTIONAL. + * DMA_BIDIRECTIONAL is for allowing page used for DMA sending, + * which is the XDP_TX use-case. + */ + if ((pool->p.dma_dir != DMA_FROM_DEVICE) && + (pool->p.dma_dir != DMA_BIDIRECTIONAL)) + return -EINVAL; + + if (ptr_ring_init(&pool->ring, ring_qsize, GFP_KERNEL) < 0) + return -ENOMEM; + + return 0; +} + +struct page_pool *page_pool_create(const struct page_pool_params *params) +{ + struct page_pool *pool; + int err = 0; + + pool = kzalloc_node(sizeof(*pool), GFP_KERNEL, params->nid); + if (!pool) + return ERR_PTR(-ENOMEM); + + err = page_pool_init(pool, params); + if (err < 0) { + pr_warn("%s() gave up with errno %d\n", __func__, err); + kfree(pool); + return ERR_PTR(err); + } + return pool; +} +EXPORT_SYMBOL(page_pool_create); + +/* fast path */ +static struct page *__page_pool_get_cached(struct page_pool *pool) +{ + struct ptr_ring *r = &pool->ring; + struct page *page; + + /* Quicker fallback, avoid locks when ring is empty */ + if (__ptr_ring_empty(r)) + return NULL; + + /* Test for safe-context, caller should provide this guarantee */ + if (likely(in_serving_softirq())) { + if (likely(pool->alloc.count)) { + /* Fast-path */ + page = pool->alloc.cache[--pool->alloc.count]; + return page; + } + /* Slower-path: Alloc array empty, time to refill + * + * Open-coded bulk ptr_ring consumer. + * + * Discussion: the ring consumer lock is not really + * needed due to the softirq/NAPI protection, but + * later need the ability to reclaim pages on the + * ring. Thus, keeping the locks. + */ + spin_lock(&r->consumer_lock); + while ((page = __ptr_ring_consume(r))) { + if (pool->alloc.count == PP_ALLOC_CACHE_REFILL) + break; + pool->alloc.cache[pool->alloc.count++] = page; + } + spin_unlock(&r->consumer_lock); + return page; + } + + /* Slow-path: Get page from locked ring queue */ + page = ptr_ring_consume(&pool->ring); + return page; +} + +/* slow path */ +noinline +static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool, + gfp_t _gfp) +{ + struct page *page; + gfp_t gfp = _gfp; + dma_addr_t dma; + + /* We could always set __GFP_COMP, and avoid this branch, as + * prep_new_page() can handle order-0 with __GFP_COMP. 
+ */ + if (pool->p.order) + gfp |= __GFP_COMP; + + /* FUTURE development: + * + * Current slow-path essentially falls back to single page + * allocations, which doesn't improve performance. This code + * need bulk allocation support from the page allocator code. + */ + + /* Cache was empty, do real allocation */ + page = alloc_pages_node(pool->p.nid, gfp, pool->p.order); + if (!page) + return NULL; + + if (!(pool->p.flags & PP_FLAG_DMA_MAP)) + goto skip_dma_map; + + /* Setup DMA mapping: use page->private for DMA-addr + * This mapping is kept for lifetime of page, until leaving pool. + */ + dma = dma_map_page(pool->p.dev, page, 0, + (PAGE_SIZE << pool->p.order), + pool->p.dma_dir); + if (dma_mapping_error(pool->p.dev, dma)) { + put_page(page); + return NULL; + } + set_page_private(page, dma); /* page->private = dma; */ + +skip_dma_map: + /* When page just alloc'ed is should/must have refcnt 1. */ + return page; +} + +/* For using page_pool replace: alloc_pages() API calls, but provide + * synchronization guarantee for allocation side. + */ +struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp) +{ + struct page *page; + + /* Fast-path: Get a page from cache */ + page = __page_pool_get_cached(pool); + if (page) + return page; + + /* Slow-path: cache empty, do real allocation */ + page = __page_pool_alloc_pages_slow(pool, gfp); + return page; +} +EXPORT_SYMBOL(page_pool_alloc_pages); + +/* Cleanup page_pool state from page */ +static void __page_pool_clean_page(struct page_pool *pool, + struct page *page) +{ + if (!(pool->p.flags & PP_FLAG_DMA_MAP)) + return; + + /* DMA unmap */ + dma_unmap_page(pool->p.dev, page_private(page), + PAGE_SIZE << pool->p.order, pool->p.dma_dir); + set_page_private(page, 0); +} + +/* Return a page to the page allocator, cleaning up our state */ +static void __page_pool_return_page(struct page_pool *pool, struct page *page) +{ + __page_pool_clean_page(pool, page); + put_page(page); + /* An optimization would be to call __free_pages(page, pool->p.order) + * knowing page is not part of page-cache (thus avoiding a + * __page_cache_release() call). + */ +} + +static bool __page_pool_recycle_into_ring(struct page_pool *pool, + struct page *page) +{ + int ret; + /* BH protection not needed if current is serving softirq */ + if (in_serving_softirq()) + ret = ptr_ring_produce(&pool->ring, page); + else + ret = ptr_ring_produce_bh(&pool->ring, page); + + return (ret == 0) ? true : false; +} + +/* Only allow direct recycling in special circumstances, into the + * alloc side cache. E.g. during RX-NAPI processing for XDP_DROP use-case. + * + * Caller must provide appropriate safe context. + */ +static bool __page_pool_recycle_direct(struct page *page, + struct page_pool *pool) +{ + if (unlikely(pool->alloc.count == PP_ALLOC_CACHE_SIZE)) + return false; + + /* Caller MUST have verified/know (page_ref_count(page) == 1) */ + pool->alloc.cache[pool->alloc.count++] = page; + return true; +} + +void __page_pool_put_page(struct page_pool *pool, + struct page *page, bool allow_direct) +{ + /* This allocator is optimized for the XDP mode that uses + * one-frame-per-page, but have fallbacks that act like the + * regular page allocator APIs. + * + * refcnt == 1 means page_pool owns page, and can recycle it. 
+ */ + if (likely(page_ref_count(page) == 1)) { + /* Read barrier done in page_ref_count / READ_ONCE */ + + if (allow_direct && in_serving_softirq()) + if (__page_pool_recycle_direct(page, pool)) + return; + + if (!__page_pool_recycle_into_ring(pool, page)) { + /* Cache full, fallback to free pages */ + __page_pool_return_page(pool, page); + } + return; + } + /* Fallback/non-XDP mode: API user have elevated refcnt. + * + * Many drivers split up the page into fragments, and some + * want to keep doing this to save memory and do refcnt based + * recycling. Support this use case too, to ease drivers + * switching between XDP/non-XDP. + * + * In-case page_pool maintains the DMA mapping, API user must + * call page_pool_put_page once. In this elevated refcnt + * case, the DMA is unmapped/released, as driver is likely + * doing refcnt based recycle tricks, meaning another process + * will be invoking put_page. + */ + __page_pool_clean_page(pool, page); + put_page(page); +} +EXPORT_SYMBOL(__page_pool_put_page); + +static void __page_pool_empty_ring(struct page_pool *pool) +{ + struct page *page; + + /* Empty recycle ring */ + while ((page = ptr_ring_consume(&pool->ring))) { + /* Verify the refcnt invariant of cached pages */ + if (!(page_ref_count(page) == 1)) + pr_crit("%s() page_pool refcnt %d violation\n", + __func__, page_ref_count(page)); + + __page_pool_return_page(pool, page); + } +} + +static void __page_pool_destroy_rcu(struct rcu_head *rcu) +{ + struct page_pool *pool; + + pool = container_of(rcu, struct page_pool, rcu); + + WARN(pool->alloc.count, "API usage violation"); + + __page_pool_empty_ring(pool); + ptr_ring_cleanup(&pool->ring, NULL); + kfree(pool); +} + +/* Cleanup and release resources */ +void page_pool_destroy(struct page_pool *pool) +{ + struct page *page; + + /* Empty alloc cache, assume caller made sure this is + * no-longer in use, and page_pool_alloc_pages() cannot be + * call concurrently. + */ + while (pool->alloc.count) { + page = pool->alloc.cache[--pool->alloc.count]; + __page_pool_return_page(pool, page); + } + + /* No more consumers should exist, but producers could still + * be in-flight. 
+ */ + __page_pool_empty_ring(pool); + + /* An xdp_mem_allocator can still ref page_pool pointer */ + call_rcu(&pool->rcu, __page_pool_destroy_rcu); +} +EXPORT_SYMBOL(page_pool_destroy);
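Summing up the return paths in the implementation above, a caller has two choices, and picking the wrong one is only a performance bug, not a correctness bug. A hedged sketch of the decision (sketch_ names are hypothetical; the API calls are the ones defined above):

/* NAPI/XDP_DROP path: direct recycle into the alloc-side cache.
 * Caller must be in softirq context and know page_ref_count() == 1.
 */
static void sketch_rx_drop(struct page_pool *pool, struct page *page)
{
	page_pool_recycle_direct(pool, page);
}

/* Any other context (e.g. remote-CPU TX completion): goes through
 * the ptr_ring, falling back to the page allocator if the ring is
 * full or the refcnt is elevated.
 */
static void sketch_return_page(struct page_pool *pool, struct page *page)
{
	page_pool_put_page(pool, page);
}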
From patchwork Mon Nov 2 12:48:40 2020
From: William Breathitt Gray
To: kernel-team@lists.ubuntu.com
Subject: [SRU][B:linux-azure-4.15][PATCH 24/40] xdp: allow page_pool as an allocator type in xdp_return_frame
Date: Mon, 2 Nov 2020 07:48:40 -0500
Message-Id: <20201102124856.4659-25-william.gray@canonical.com>

From: Jesper Dangaard Brouer

BugLink: https://bugs.launchpad.net/bugs/1877654

New allocator type MEM_TYPE_PAGE_POOL for page_pool usage. The registered allocator page_pool pointer is not available directly from xdp_rxq_info, but it could be (if needed). For now, the driver should keep separate track of the page_pool pointer, which it should use for RX-ring page allocation.

As suggested by Saeed, to maintain a symmetric API it is the driver's responsibility to allocate/create and free/destroy the page_pool. Thus, after the driver has called xdp_rxq_info_unreg(), it is the driver's responsibility to free the page_pool, but via an RCU free call. This is done easily via the page_pool helper page_pool_destroy() (which avoids touching any driver code during the RCU callback, which could happen after the driver has been unloaded).

V8: Address issues found by kbuild test robot - Address sparse "should be static" warnings - Allow xdp.o to be compiled without page_pool.o

V9: Remove inline from .c file, compiler knows best

Signed-off-by: Jesper Dangaard Brouer Signed-off-by: David S. Miller (cherry picked from commit 57d0a1c1ac9e6a836bbab4698ba2a2e03f64bf1b) Signed-off-by: William Breathitt Gray --- include/net/page_pool.h | 14 ++++++++++ include/net/xdp.h | 3 +++ net/core/xdp.c | 60 ++++++++++++++++++++++++++++++++--------- 3 files changed, 65 insertions(+), 12 deletions(-) diff --git a/include/net/page_pool.h b/include/net/page_pool.h index 1fe77db59518..c79087153148 100644 --- a/include/net/page_pool.h +++ b/include/net/page_pool.h @@ -117,7 +117,12 @@ void __page_pool_put_page(struct page_pool *pool, static inline void page_pool_put_page(struct page_pool *pool, struct page *page) { + /* When page_pool isn't compiled-in, net/core/xdp.c doesn't + * allow registering MEM_TYPE_PAGE_POOL, but shield linker.
+ */ +#ifdef CONFIG_PAGE_POOL __page_pool_put_page(pool, page, false); +#endif } /* Very limited use-cases allow recycle direct */ static inline void page_pool_recycle_direct(struct page_pool *pool, @@ -126,4 +131,13 @@ static inline void page_pool_recycle_direct(struct page_pool *pool, __page_pool_put_page(pool, page, true); } +static inline bool is_page_pool_compiled_in(void) +{ +#ifdef CONFIG_PAGE_POOL + return true; +#else + return false; +#endif +} + #endif /* _NET_PAGE_POOL_H */ diff --git a/include/net/xdp.h b/include/net/xdp.h index 5f67c62540aa..d0ee437753dc 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -36,6 +36,7 @@ enum xdp_mem_type { MEM_TYPE_PAGE_SHARED = 0, /* Split-page refcnt based model */ MEM_TYPE_PAGE_ORDER0, /* Orig XDP full page model */ + MEM_TYPE_PAGE_POOL, MEM_TYPE_MAX, }; @@ -44,6 +45,8 @@ struct xdp_mem_info { u32 id; }; +struct page_pool; + struct xdp_rxq_info { struct net_device *dev; u32 queue_index; diff --git a/net/core/xdp.c b/net/core/xdp.c index 8b2cb79b5de0..33e382afbd95 100644 --- a/net/core/xdp.c +++ b/net/core/xdp.c @@ -8,6 +8,7 @@ #include #include #include +#include #include @@ -27,7 +28,10 @@ static struct rhashtable *mem_id_ht; struct xdp_mem_allocator { struct xdp_mem_info mem; - void *allocator; + union { + void *allocator; + struct page_pool *page_pool; + }; struct rhash_head node; struct rcu_head rcu; }; @@ -74,7 +78,9 @@ static void __xdp_mem_allocator_rcu_free(struct rcu_head *rcu) /* Allow this ID to be reused */ ida_simple_remove(&mem_id_pool, xa->mem.id); - /* TODO: Depending on allocator type/pointer free resources */ + /* Notice, driver is expected to free the *allocator, + * e.g. page_pool, and MUST also use RCU free. + */ /* Poison memory */ xa->mem.id = 0xFFFF; @@ -225,6 +231,17 @@ static int __mem_id_cyclic_get(gfp_t gfp) return id; } +static bool __is_supported_mem_type(enum xdp_mem_type type) +{ + if (type == MEM_TYPE_PAGE_POOL) + return is_page_pool_compiled_in(); + + if (type >= MEM_TYPE_MAX) + return false; + + return true; +} + int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq, enum xdp_mem_type type, void *allocator) { @@ -238,13 +255,16 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq, return -EFAULT; } - if (type >= MEM_TYPE_MAX) - return -EINVAL; + if (!__is_supported_mem_type(type)) + return -EOPNOTSUPP; xdp_rxq->mem.type = type; - if (!allocator) + if (!allocator) { + if (type == MEM_TYPE_PAGE_POOL) + return -EINVAL; /* Setup time check page_pool req */ return 0; + } /* Delay init of rhashtable to save memory if feature isn't used */ if (!mem_id_init) { @@ -290,15 +310,31 @@ EXPORT_SYMBOL_GPL(xdp_rxq_info_reg_mem_model); void xdp_return_frame(void *data, struct xdp_mem_info *mem) { - if (mem->type == MEM_TYPE_PAGE_SHARED) { + struct xdp_mem_allocator *xa; + struct page *page; + + switch (mem->type) { + case MEM_TYPE_PAGE_POOL: + rcu_read_lock(); + /* mem->id is valid, checked in xdp_rxq_info_reg_mem_model() */ + xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params); + page = virt_to_head_page(data); + if (xa) + page_pool_put_page(xa->page_pool, page); + else + put_page(page); + rcu_read_unlock(); + break; + case MEM_TYPE_PAGE_SHARED: page_frag_free(data); - return; - } - - if (mem->type == MEM_TYPE_PAGE_ORDER0) { - struct page *page = virt_to_page(data); /* Assumes order0 page*/ - + break; + case MEM_TYPE_PAGE_ORDER0: + page = virt_to_page(data); /* Assumes order0 page*/ put_page(page); + break; + default: + /* Not possible, checked in xdp_rxq_info_reg_mem_model() */ + break; 
} } EXPORT_SYMBOL_GPL(xdp_return_frame);
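Putting this patch's pieces together: a driver that adopts the page_pool allocator registers it per RX queue, and tears down in the reverse order, leaving the final free to RCU. A hedged sketch of both halves (sketch_ names are hypothetical; the ordering requirements are the ones stated in the commit message above):

static int sketch_attach_pool(struct xdp_rxq_info *xdp_rxq,
			      struct page_pool *pool)
{
	/* A NULL allocator is rejected for MEM_TYPE_PAGE_POOL (-EINVAL) */
	return xdp_rxq_info_reg_mem_model(xdp_rxq, MEM_TYPE_PAGE_POOL, pool);
}

static void sketch_detach_pool(struct xdp_rxq_info *xdp_rxq,
			       struct page_pool *pool)
{
	xdp_rxq_info_unreg(xdp_rxq);	/* unpublishes mem.id first */
	page_pool_destroy(pool);	/* RCU-deferred free of the pool */
}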
From patchwork Mon Nov 2 12:48:41 2020
From: William Breathitt Gray
To: kernel-team@lists.ubuntu.com
Subject: [SRU][B:linux-azure-4.15][PATCH 25/40] xdp: transition into using xdp_frame for return API
Date: Mon, 2 Nov 2020 07:48:41 -0500
Message-Id: <20201102124856.4659-26-william.gray@canonical.com>

From: Jesper Dangaard Brouer

BugLink: https://bugs.launchpad.net/bugs/1877654

Changing the xdp_return_frame() API to take a struct xdp_frame as its argument seems like a natural choice, but there are some subtle performance details here that need extra care, which is why this is a deliberate change.

When the xdp_frame is dereferenced on a remote CPU during DMA-TX completion, its cache line changes to the "Shared" state. Later, when the page is reused for RX, this xdp_frame cache line is written, which changes the state to "Modified". This situation already happens (naturally) for virtio_net, tun and cpumap, as the xdp_frame pointer is the queued object. In tun and cpumap, the ptr_ring is used for efficiently transferring cache lines (with pointers) between CPUs. Thus, the only option is to dereference the xdp_frame.

Only the ixgbe driver had an optimization that could avoid dereferencing the xdp_frame: the driver already has a TX-ring queue, which (in case of remote DMA-TX completion) has to be transferred between CPUs anyhow. In this data area we stored a struct xdp_mem_info and a data pointer, which allowed us to avoid dereferencing the xdp_frame. To compensate for losing that, a prefetchw is used to tell the cache-coherency protocol about our access pattern. My benchmarks show that this prefetchw is enough to compensate the ixgbe driver.

V7: Adjust for commit d9314c474d4f ("i40e: add support for XDP_REDIRECT") V8: Adjust for commit bd658dda4237 ("net/mlx5e: Separate dma base address and offset in dma_sync call")

Signed-off-by: Jesper Dangaard Brouer Signed-off-by: David S.
Miller (backported from commit 039930945a72d9af5ff04ae9b9e60658a52e0770) [ vilhelmgray: context adjustment ] Signed-off-by: William Breathitt Gray --- drivers/net/ethernet/intel/i40e/i40e_txrx.c | 5 ++--- drivers/net/ethernet/intel/ixgbe/ixgbe.h | 4 +--- drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 17 +++++++++++------ drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 1 + drivers/net/tun.c | 4 ++-- drivers/net/virtio_net.c | 2 +- include/net/xdp.h | 2 +- kernel/bpf/cpumap.c | 6 +++--- net/core/xdp.c | 4 +++- 9 files changed, 25 insertions(+), 20 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c index c674d0a566a8..2d4d3bfc79ec 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c @@ -631,8 +631,7 @@ static void i40e_unmap_and_free_tx_resource(struct i40e_ring *ring, if (tx_buffer->tx_flags & I40E_TX_FLAGS_FD_SB) kfree(tx_buffer->raw_buf); else if (ring_is_xdp(ring)) - xdp_return_frame(tx_buffer->xdpf->data, - &tx_buffer->xdpf->mem); + xdp_return_frame(tx_buffer->xdpf); else dev_kfree_skb_any(tx_buffer->skb); if (dma_unmap_len(tx_buffer, len)) @@ -776,7 +775,7 @@ static bool i40e_clean_tx_irq(struct i40e_vsi *vsi, /* free the skb/XDP data */ if (ring_is_xdp(tx_ring)) - xdp_return_frame(tx_buf->xdpf->data, &tx_buf->xdpf->mem); + xdp_return_frame(tx_buf->xdpf); else napi_consume_skb(tx_buf->skb, napi_budget); diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h index 94a77ef2bffd..6dbe71870ce8 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h @@ -238,8 +238,7 @@ struct ixgbe_tx_buffer { unsigned long time_stamp; union { struct sk_buff *skb; - /* XDP uses address ptr on irq_clean */ - void *data; + struct xdp_frame *xdpf; }; unsigned int bytecount; unsigned short gso_segs; @@ -247,7 +246,6 @@ struct ixgbe_tx_buffer { DEFINE_DMA_UNMAP_ADDR(dma); DEFINE_DMA_UNMAP_LEN(len); u32 tx_flags; - struct xdp_mem_info xdp_mem; }; struct ixgbe_rx_buffer { diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c index 3040ec9362c8..68427243f706 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c @@ -1207,7 +1207,7 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector, /* free the skb */ if (ring_is_xdp(tx_ring)) - xdp_return_frame(tx_buffer->data, &tx_buffer->xdp_mem); + xdp_return_frame(tx_buffer->xdpf); else napi_consume_skb(tx_buffer->skb, napi_budget); @@ -2367,6 +2367,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector, xdp.data_hard_start = xdp.data - ixgbe_rx_offset(rx_ring); xdp.data_end = xdp.data + size; + prefetchw(xdp.data_hard_start); /* xdp_frame write */ skb = ixgbe_run_xdp(adapter, rx_ring, &xdp); } @@ -5880,7 +5881,7 @@ static void ixgbe_clean_tx_ring(struct ixgbe_ring *tx_ring) /* Free all the Tx ring sk_buffs */ if (ring_is_xdp(tx_ring)) - xdp_return_frame(tx_buffer->data, &tx_buffer->xdp_mem); + xdp_return_frame(tx_buffer->xdpf); else dev_kfree_skb_any(tx_buffer->skb); @@ -8458,16 +8459,21 @@ static int ixgbe_xmit_xdp_ring(struct ixgbe_adapter *adapter, struct ixgbe_ring *ring = adapter->xdp_ring[smp_processor_id()]; struct ixgbe_tx_buffer *tx_buffer; union ixgbe_adv_tx_desc *tx_desc; + struct xdp_frame *xdpf; u32 len, cmd_type; dma_addr_t dma; u16 i; - len = xdp->data_end - xdp->data; + xdpf = convert_to_xdp_frame(xdp); + if 
(unlikely(!xdpf)) + return -EOVERFLOW; + + len = xdpf->len; if (unlikely(!ixgbe_desc_unused(ring))) return IXGBE_XDP_CONSUMED; - dma = dma_map_single(ring->dev, xdp->data, len, DMA_TO_DEVICE); + dma = dma_map_single(ring->dev, xdpf->data, len, DMA_TO_DEVICE); if (dma_mapping_error(ring->dev, dma)) return IXGBE_XDP_CONSUMED; @@ -8482,8 +8488,7 @@ static int ixgbe_xmit_xdp_ring(struct ixgbe_adapter *adapter, dma_unmap_len_set(tx_buffer, len, len); dma_unmap_addr_set(tx_buffer, dma, dma); - tx_buffer->data = xdp->data; - tx_buffer->xdp_mem = xdp->rxq->mem; + tx_buffer->xdpf = xdpf; tx_desc->read.buffer_addr = cpu_to_le64(dma); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c index b1bc4bdf9ff4..078a098d4d90 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c @@ -1002,6 +1002,7 @@ struct sk_buff *skb_from_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe, di->addr + wi->offset, 0, frag_size, DMA_FROM_DEVICE); + prefetchw(va); /* xdp_frame data area */ prefetch(data); wi->offset += frag_size; diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 72664d59abc0..6263852b2ed0 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -627,7 +627,7 @@ static void tun_ptr_free(void *ptr) if (tun_is_xdp_frame(ptr)) { struct xdp_frame *xdpf = tun_ptr_to_xdp(ptr); - xdp_return_frame(xdpf->data, &xdpf->mem); + xdp_return_frame(xdpf); } else { __skb_array_destroy_skb(ptr); } @@ -2168,7 +2168,7 @@ static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile, struct xdp_frame *xdpf = tun_ptr_to_xdp(ptr); ret = tun_put_user_xdp(tun, tfile, xdpf, to); - xdp_return_frame(xdpf->data, &xdpf->mem); + xdp_return_frame(xdpf); } else { struct sk_buff *skb = ptr; diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 14a59b12a96b..b1fda3c79bdb 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -413,7 +413,7 @@ static int __virtnet_xdp_xmit(struct virtnet_info *vi, /* Free up any pending old buffers before queueing new ones. 
*/ while ((xdpf_sent = virtqueue_get_buf(sq->vq, &len)) != NULL) - xdp_return_frame(xdpf_sent->data, &xdpf_sent->mem); + xdp_return_frame(xdpf_sent); xdpf = convert_to_xdp_frame(xdp); if (unlikely(!xdpf)) diff --git a/include/net/xdp.h b/include/net/xdp.h index d0ee437753dc..137ad5f9f40f 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -103,7 +103,7 @@ struct xdp_frame *convert_to_xdp_frame(struct xdp_buff *xdp) return xdp_frame; } -void xdp_return_frame(void *data, struct xdp_mem_info *mem); +void xdp_return_frame(struct xdp_frame *xdpf); int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq, struct net_device *dev, u32 queue_index); diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c index 924b3cd75da9..d96e3f114118 100644 --- a/kernel/bpf/cpumap.c +++ b/kernel/bpf/cpumap.c @@ -225,7 +225,7 @@ static void __cpu_map_ring_cleanup(struct ptr_ring *ring) while ((xdpf = ptr_ring_consume(ring))) if (WARN_ON_ONCE(xdpf)) - xdp_return_frame(xdpf->data, &xdpf->mem); + xdp_return_frame(xdpf); } static void put_cpu_map_entry(struct bpf_cpu_map_entry *rcpu) @@ -281,7 +281,7 @@ static int cpu_map_kthread_run(void *data) skb = cpu_map_build_skb(rcpu, xdpf); if (!skb) { - xdp_return_frame(xdpf->data, &xdpf->mem); + xdp_return_frame(xdpf); continue; } @@ -583,7 +583,7 @@ static int bq_flush_to_queue(struct bpf_cpu_map_entry *rcpu, err = __ptr_ring_produce(q, xdpf); if (err) { drops++; - xdp_return_frame(xdpf->data, &xdpf->mem); + xdp_return_frame(xdpf); } processed++; } diff --git a/net/core/xdp.c b/net/core/xdp.c index 33e382afbd95..0c86b53a3a63 100644 --- a/net/core/xdp.c +++ b/net/core/xdp.c @@ -308,9 +308,11 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq, } EXPORT_SYMBOL_GPL(xdp_rxq_info_reg_mem_model); -void xdp_return_frame(void *data, struct xdp_mem_info *mem) +void xdp_return_frame(struct xdp_frame *xdpf) { + struct xdp_mem_info *mem = &xdpf->mem; struct xdp_mem_allocator *xa; + void *data = xdpf->data; struct page *page; switch (mem->type) {
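With the one-argument API in place, queue-carrying code no longer has to keep (data, mem) pairs side by side: whatever holds the xdp_frame pointer can return it. A hedged sketch of the resulting TX-completion idiom, mirroring the tun and cpumap hunks above (the sketch_ name is hypothetical):

static void sketch_drain_xdp_ring(struct ptr_ring *ring)
{
	struct xdp_frame *xdpf;

	/* The frame carries its own xdp_mem_info; xdp_return_frame()
	 * dispatches on xdpf->mem.type internally.
	 */
	while ((xdpf = ptr_ring_consume(ring)) != NULL)
		xdp_return_frame(xdpf);
}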
kernel-team@lists.ubuntu.com; Mon, 02 Nov 2020 12:49:29 +0000 Received: by mail-qk1-f199.google.com with SMTP id n23so7520867qkn.1 for ; Mon, 02 Nov 2020 04:49:29 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=MiTpdF8vlBMbn5dpsTvq5KuQurHy82Ms89L6klHvgbM=; b=sHwmkSz1qSjbPBu31mt2Hb5I/k8MhfN+6zug+n5h/hdBfOmmAXDN2V1FIEpWvcC5S9 +A3OxwYmRJY7qSU64So2FU2JzVZaNzz8yU/UyJECipkdtP+70Xn3TPHxspwHrAYgz1ji qjSI5X6EmkR7pkgWwwCHT63Z2fzoNrhB5qMAwE8AAlfijM6c3h9Sch8a5HFvG15TX4tB lsffAHav0VfbajbNgkqE9MVJoT4nKHxpE1CzlLUw1yoLicLdNAd2XyW34I0n0D3EQML5 POUNUrOdeb/L5Cbua7onWt4RzyD3NrQz/VdBCNY/38gK0QR57hNZecZ9XW4cA8RXkdiA dDOQ== X-Gm-Message-State: AOAM532ZBkRVXkvA8AYwrOW5UU0HaL9H1tB8VZU6uTGTkLA61DP47D4y SmeNhvTcnc6GWmTqs+AgqOU166olviF3L2Vy8FGTkFl/5PMgjcYqiqY8c+bGQ02Mn03GFYtd6sJ KwpFr1tumse/t4pW3g+t5VLrMrvsdSJ5yNGfJFidMwg== X-Received: by 2002:ac8:7391:: with SMTP id t17mr13464447qtp.289.1604321367699; Mon, 02 Nov 2020 04:49:27 -0800 (PST) X-Google-Smtp-Source: ABdhPJypba5Kfh8hmHms4slRWvLnU5tiZqWRXGe/n9IhWESSvgW28pHmj/PeNrikXUWH63XYYojpyQ== X-Received: by 2002:ac8:7391:: with SMTP id t17mr13464424qtp.289.1604321367394; Mon, 02 Nov 2020 04:49:27 -0800 (PST) Received: from localhost.localdomain (072-189-064-225.res.spectrum.com. [72.189.64.225]) by smtp.gmail.com with ESMTPSA id q7sm7666201qtd.49.2020.11.02.04.49.26 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 02 Nov 2020 04:49:26 -0800 (PST) From: William Breathitt Gray To: kernel-team@lists.ubuntu.com Subject: [SRU][B:linux-azure-4.15][PATCH 26/40] xdp: transition into using xdp_frame for ndo_xdp_xmit Date: Mon, 2 Nov 2020 07:48:42 -0500 Message-Id: <20201102124856.4659-27-william.gray@canonical.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20201102124856.4659-1-william.gray@canonical.com> References: <20201102124856.4659-1-william.gray@canonical.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Jesper Dangaard Brouer BugLink: https://bugs.launchpad.net/bugs/1877654 Change the ndo_xdp_xmit API to take a struct xdp_frame instead of a struct xdp_buff. This brings xdp_return_frame and ndo_xdp_xmit in sync. This builds towards changing the API further to become a bulk API, because xdp_buff is not a queue-able object while xdp_frame is. V4: Adjust for commit 59655a5b6c83 ("tuntap: XDP_TX can use native XDP") V7: Adjust for commit d9314c474d4f ("i40e: add support for XDP_REDIRECT") Signed-off-by: Jesper Dangaard Brouer Signed-off-by: David S.
Miller (backported from commit 44fa2dbd475996ddc8f3a0e6113dee983e0ee3aa) [ vilhelmgray: context adjustments ] Signed-off-by: William Breathitt Gray --- drivers/net/ethernet/intel/i40e/i40e_txrx.c | 30 +++++++++++-------- drivers/net/ethernet/intel/i40e/i40e_txrx.h | 2 +- drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 24 ++++++++------- drivers/net/tun.c | 19 +++++++----- drivers/net/virtio_net.c | 24 ++++++++------- include/linux/netdevice.h | 4 +-- net/core/filter.c | 17 +++++++++-- 7 files changed, 74 insertions(+), 46 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c index 2d4d3bfc79ec..f131ccd25187 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c @@ -1985,9 +1985,20 @@ static bool i40e_is_non_eop(struct i40e_ring *rx_ring, #define I40E_XDP_CONSUMED 1 #define I40E_XDP_TX 2 -static int i40e_xmit_xdp_ring(struct xdp_buff *xdp, +static int i40e_xmit_xdp_ring(struct xdp_frame *xdpf, struct i40e_ring *xdp_ring); +static int i40e_xmit_xdp_tx_ring(struct xdp_buff *xdp, + struct i40e_ring *xdp_ring) +{ + struct xdp_frame *xdpf = convert_to_xdp_frame(xdp); + + if (unlikely(!xdpf)) + return I40E_XDP_CONSUMED; + + return i40e_xmit_xdp_ring(xdpf, xdp_ring); +} + /** * i40e_run_xdp - run an XDP program * @rx_ring: Rx ring being processed @@ -2015,7 +2026,7 @@ static struct sk_buff *i40e_run_xdp(struct i40e_ring *rx_ring, break; case XDP_TX: xdp_ring = rx_ring->vsi->xdp_rings[rx_ring->queue_index]; - result = i40e_xmit_xdp_ring(xdp, xdp_ring); + result = i40e_xmit_xdp_tx_ring(xdp, xdp_ring); break; case XDP_REDIRECT: err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog); @@ -3266,21 +3277,14 @@ static inline int i40e_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb, * @xdp: data to transmit * @xdp_ring: XDP Tx ring **/ -static int i40e_xmit_xdp_ring(struct xdp_buff *xdp, +static int i40e_xmit_xdp_ring(struct xdp_frame *xdpf, struct i40e_ring *xdp_ring) { u16 i = xdp_ring->next_to_use; struct i40e_tx_buffer *tx_bi; struct i40e_tx_desc *tx_desc; - struct xdp_frame *xdpf; + u32 size = xdpf->len; dma_addr_t dma; - u32 size; - - xdpf = convert_to_xdp_frame(xdp); - if (unlikely(!xdpf)) - return I40E_XDP_CONSUMED; - - size = xdpf->len; if (!unlikely(I40E_DESC_UNUSED(xdp_ring))) { xdp_ring->tx_stats.tx_busy++; @@ -3470,7 +3474,7 @@ netdev_tx_t i40e_lan_xmit_frame(struct sk_buff *skb, struct net_device *netdev) * * Returns Zero if sent, else an error code **/ -int i40e_xdp_xmit(struct net_device *dev, struct xdp_buff *xdp) +int i40e_xdp_xmit(struct net_device *dev, struct xdp_frame *xdpf) { struct i40e_netdev_priv *np = netdev_priv(dev); unsigned int queue_index = smp_processor_id(); @@ -3483,7 +3487,7 @@ int i40e_xdp_xmit(struct net_device *dev, struct xdp_buff *xdp) if (!i40e_enabled_xdp_vsi(vsi) || queue_index >= vsi->num_queue_pairs) return -ENXIO; - err = i40e_xmit_xdp_ring(xdp, vsi->xdp_rings[queue_index]); + err = i40e_xmit_xdp_ring(xdpf, vsi->xdp_rings[queue_index]); if (err != I40E_XDP_TX) return -ENOSPC; diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h index b06e4cf86f60..95df83f35706 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h @@ -501,7 +501,7 @@ void i40e_force_wb(struct i40e_vsi *vsi, struct i40e_q_vector *q_vector); u32 i40e_get_tx_pending(struct i40e_ring *ring); int __i40e_maybe_stop_tx(struct i40e_ring *tx_ring, int size); bool 
__i40e_chk_linearize(struct sk_buff *skb); -int i40e_xdp_xmit(struct net_device *dev, struct xdp_buff *xdp); +int i40e_xdp_xmit(struct net_device *dev, struct xdp_frame *xdpf); void i40e_xdp_flush(struct net_device *dev); /** diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c index 68427243f706..86a94105a489 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c @@ -2242,7 +2242,7 @@ static struct sk_buff *ixgbe_build_skb(struct ixgbe_ring *rx_ring, #define IXGBE_XDP_REDIR BIT(2) static int ixgbe_xmit_xdp_ring(struct ixgbe_adapter *adapter, - struct xdp_buff *xdp); + struct xdp_frame *xdpf); static struct sk_buff *ixgbe_run_xdp(struct ixgbe_adapter *adapter, struct ixgbe_ring *rx_ring, @@ -2250,6 +2250,7 @@ static struct sk_buff *ixgbe_run_xdp(struct ixgbe_adapter *adapter, { int err, result = IXGBE_XDP_PASS; struct bpf_prog *xdp_prog; + struct xdp_frame *xdpf; u32 act; rcu_read_lock(); @@ -2258,12 +2259,19 @@ static struct sk_buff *ixgbe_run_xdp(struct ixgbe_adapter *adapter, if (!xdp_prog) goto xdp_out; + prefetchw(xdp->data_hard_start); /* xdp_frame write */ + act = bpf_prog_run_xdp(xdp_prog, xdp); switch (act) { case XDP_PASS: break; case XDP_TX: - result = ixgbe_xmit_xdp_ring(adapter, xdp); + xdpf = convert_to_xdp_frame(xdp); + if (unlikely(!xdpf)) { + result = IXGBE_XDP_CONSUMED; + break; + } + result = ixgbe_xmit_xdp_ring(adapter, xdpf); break; case XDP_REDIRECT: err = xdp_do_redirect(adapter->netdev, xdp, xdp_prog); @@ -2367,7 +2375,6 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector, xdp.data_hard_start = xdp.data - ixgbe_rx_offset(rx_ring); xdp.data_end = xdp.data + size; - prefetchw(xdp.data_hard_start); /* xdp_frame write */ skb = ixgbe_run_xdp(adapter, rx_ring, &xdp); } @@ -8454,20 +8461,15 @@ static u16 ixgbe_select_queue(struct net_device *dev, struct sk_buff *skb, } static int ixgbe_xmit_xdp_ring(struct ixgbe_adapter *adapter, - struct xdp_buff *xdp) + struct xdp_frame *xdpf) { struct ixgbe_ring *ring = adapter->xdp_ring[smp_processor_id()]; struct ixgbe_tx_buffer *tx_buffer; union ixgbe_adv_tx_desc *tx_desc; - struct xdp_frame *xdpf; u32 len, cmd_type; dma_addr_t dma; u16 i; - xdpf = convert_to_xdp_frame(xdp); - if (unlikely(!xdpf)) - return -EOVERFLOW; - len = xdpf->len; if (unlikely(!ixgbe_desc_unused(ring))) @@ -10108,7 +10110,7 @@ static int ixgbe_xdp(struct net_device *dev, struct netdev_bpf *xdp) } } -static int ixgbe_xdp_xmit(struct net_device *dev, struct xdp_buff *xdp) +static int ixgbe_xdp_xmit(struct net_device *dev, struct xdp_frame *xdpf) { struct ixgbe_adapter *adapter = netdev_priv(dev); struct ixgbe_ring *ring; @@ -10124,7 +10126,7 @@ static int ixgbe_xdp_xmit(struct net_device *dev, struct xdp_buff *xdp) if (unlikely(!ring)) return -ENXIO; - err = ixgbe_xmit_xdp_ring(adapter, xdp); + err = ixgbe_xmit_xdp_ring(adapter, xdpf); if (err != IXGBE_XDP_TX) return -ENOSPC; diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 6263852b2ed0..5aeb6a2c82ba 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1236,18 +1236,13 @@ static const struct net_device_ops tun_netdev_ops = { .ndo_change_carrier = tun_net_change_carrier, }; -static int tun_xdp_xmit(struct net_device *dev, struct xdp_buff *xdp) +static int tun_xdp_xmit(struct net_device *dev, struct xdp_frame *frame) { struct tun_struct *tun = netdev_priv(dev); - struct xdp_frame *frame; struct tun_file *tfile; u32 numqueues; int ret = 0; - frame = convert_to_xdp_frame(xdp); - if 
(unlikely(!frame)) - return -EOVERFLOW; - rcu_read_lock(); numqueues = READ_ONCE(tun->numqueues); @@ -1271,6 +1266,16 @@ static int tun_xdp_xmit(struct net_device *dev, struct xdp_buff *xdp) return ret; } +static int tun_xdp_tx(struct net_device *dev, struct xdp_buff *xdp) +{ + struct xdp_frame *frame = convert_to_xdp_frame(xdp); + + if (unlikely(!frame)) + return -EOVERFLOW; + + return tun_xdp_xmit(dev, frame); +} + static void tun_xdp_flush(struct net_device *dev) { struct tun_struct *tun = netdev_priv(dev); @@ -1634,7 +1639,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, case XDP_TX: get_page(alloc_frag->page); alloc_frag->offset += buflen; - if (tun_xdp_xmit(tun->dev, &xdp)) + if (tun_xdp_tx(tun->dev, &xdp)) goto err_redirect; tun_xdp_flush(tun->dev); rcu_read_unlock(); diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index b1fda3c79bdb..9e2363f0f8fe 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -399,10 +399,10 @@ static void virtnet_xdp_flush(struct net_device *dev) } static int __virtnet_xdp_xmit(struct virtnet_info *vi, - struct xdp_buff *xdp) + struct xdp_frame *xdpf) { struct virtio_net_hdr_mrg_rxbuf *hdr; - struct xdp_frame *xdpf, *xdpf_sent; + struct xdp_frame *xdpf_sent; struct send_queue *sq; unsigned int len; unsigned int qp; @@ -415,10 +415,6 @@ static int __virtnet_xdp_xmit(struct virtnet_info *vi, while ((xdpf_sent = virtqueue_get_buf(sq->vq, &len)) != NULL) xdp_return_frame(xdpf_sent); - xdpf = convert_to_xdp_frame(xdp); - if (unlikely(!xdpf)) - return -EOVERFLOW; - /* virtqueue want to use data area in-front of packet */ if (unlikely(xdpf->metasize > 0)) return -EOPNOTSUPP; @@ -442,7 +438,7 @@ static int __virtnet_xdp_xmit(struct virtnet_info *vi, return 0; } -static int virtnet_xdp_xmit(struct net_device *dev, struct xdp_buff *xdp) +static int virtnet_xdp_xmit(struct net_device *dev, struct xdp_frame *xdpf) { struct virtnet_info *vi = netdev_priv(dev); struct receive_queue *rq = vi->rq; @@ -455,7 +451,7 @@ static int virtnet_xdp_xmit(struct net_device *dev, struct xdp_buff *xdp) if (!xdp_prog) return -ENXIO; - return __virtnet_xdp_xmit(vi, xdp); + return __virtnet_xdp_xmit(vi, xdpf); } static unsigned int virtnet_get_headroom(struct virtnet_info *vi) @@ -551,6 +547,7 @@ static struct sk_buff *receive_small(struct net_device *dev, xdp_prog = rcu_dereference(rq->xdp_prog); if (xdp_prog) { struct virtio_net_hdr_mrg_rxbuf *hdr = buf + header_offset; + struct xdp_frame *xdpf; struct xdp_buff xdp; void *orig_data; u32 act; @@ -593,7 +590,10 @@ static struct sk_buff *receive_small(struct net_device *dev, delta = orig_data - xdp.data; break; case XDP_TX: - err = __virtnet_xdp_xmit(vi, &xdp); + xdpf = convert_to_xdp_frame(&xdp); + if (unlikely(!xdpf)) + goto err_xdp; + err = __virtnet_xdp_xmit(vi, xdpf); if (unlikely(err)) { trace_xdp_exception(vi->dev, xdp_prog, act); goto err_xdp; @@ -685,6 +685,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, rcu_read_lock(); xdp_prog = rcu_dereference(rq->xdp_prog); if (xdp_prog) { + struct xdp_frame *xdpf; struct page *xdp_page; struct xdp_buff xdp; void *data; @@ -747,7 +748,10 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, } break; case XDP_TX: - err = __virtnet_xdp_xmit(vi, &xdp); + xdpf = convert_to_xdp_frame(&xdp); + if (unlikely(!xdpf)) + goto err_xdp; + err = __virtnet_xdp_xmit(vi, xdpf); if (unlikely(err)) { trace_xdp_exception(vi->dev, xdp_prog, act); if (unlikely(xdp_page != page)) diff --git a/include/linux/netdevice.h 
b/include/linux/netdevice.h index 0482941b1647..510f27c156bf 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1145,7 +1145,7 @@ struct dev_ifalias { * This function is used to set or query state related to XDP on the * netdevice and manage BPF offload. See definition of * enum bpf_netdev_command for details. - * int (*ndo_xdp_xmit)(struct net_device *dev, struct xdp_buff *xdp); + * int (*ndo_xdp_xmit)(struct net_device *dev, struct xdp_frame *xdp); * This function is used to submit a XDP packet for transmit on a * netdevice. * void (*ndo_xdp_flush)(struct net_device *dev); @@ -1336,7 +1336,7 @@ struct net_device_ops { int (*ndo_bpf)(struct net_device *dev, struct netdev_bpf *bpf); int (*ndo_xdp_xmit)(struct net_device *dev, - struct xdp_buff *xdp); + struct xdp_frame *xdp); void (*ndo_xdp_flush)(struct net_device *dev); }; diff --git a/net/core/filter.c b/net/core/filter.c index c7754263f1b9..24766051dfd6 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -2524,13 +2524,18 @@ static int __bpf_tx_xdp(struct net_device *dev, struct xdp_buff *xdp, u32 index) { + struct xdp_frame *xdpf; int err; if (!dev->netdev_ops->ndo_xdp_xmit) { return -EOPNOTSUPP; } - err = dev->netdev_ops->ndo_xdp_xmit(dev, xdp); + xdpf = convert_to_xdp_frame(xdp); + if (unlikely(!xdpf)) + return -EOVERFLOW; + + err = dev->netdev_ops->ndo_xdp_xmit(dev, xdpf); if (err) return err; dev->netdev_ops->ndo_xdp_flush(dev); @@ -2546,11 +2551,19 @@ static int __bpf_tx_xdp_map(struct net_device *dev_rx, void *fwd, if (map->map_type == BPF_MAP_TYPE_DEVMAP) { struct net_device *dev = fwd; + struct xdp_frame *xdpf; if (!dev->netdev_ops->ndo_xdp_xmit) return -EOPNOTSUPP; - err = dev->netdev_ops->ndo_xdp_xmit(dev, xdp); + xdpf = convert_to_xdp_frame(xdp); + if (unlikely(!xdpf)) + return -EOVERFLOW; + + /* TODO: move to inside map code instead, for bulk support + * err = dev_map_enqueue(dev, xdp); + */ + err = dev->netdev_ops->ndo_xdp_xmit(dev, xdpf); if (err) return err; __dev_map_insert_ctx(map, index); From patchwork Mon Nov 2 12:48:43 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: William Breathitt Gray X-Patchwork-Id: 1392242 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=canonical.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4CPt784Jhzz9sWs; Mon, 2 Nov 2020 23:50:27 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1kZZHy-0006u0-Vo; Mon, 02 Nov 2020 12:50:22 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1kZZHB-0006Fy-I3 for kernel-team@lists.ubuntu.com; Mon, 02 Nov 2020 12:49:33 +0000 Received: from mail-qk1-f198.google.com ([209.85.222.198]) by youngberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 
1kZZH8-00047U-QS for kernel-team@lists.ubuntu.com; Mon, 02 Nov 2020 12:49:31 +0000 Received: by mail-qk1-f198.google.com with SMTP id n23so7520906qkn.1 for ; Mon, 02 Nov 2020 04:49:30 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=IE2+KBEz6RGUYHtG/KprEdEqoJPqvolJ0esluAqhLeQ=; b=TJyUh5Np1On/JlZHLJNyQMJLExZCROoRRxNvw3zxVVQh5IPnhz7lExi4UZBzf0ZKk6 58iH/9EQbDIqUKaSCtieZ21SGbDp70sPDHlKxe09Kpvc6GdkhLOSDTvAfKl+PEzySAcJ FljJ9Xk1VwikfE9UdGiBytxoSddCJuN4wS6zRUQWnfhiuoVrXQCc448TG0nSuC5t9ZJd FlyA5Q9jg/PicVIChwC2Yr5sx3KcQnFufGDaf2Wrok5vrMhU6rXye9ZcK+mhlsoDqvoQ hDxNjj6Wmdpwb1xw1AxbkPN0AMOlUmxVGyRDtcVeMIJpVE4cXQulGsKV7UFXbHPQu6uX N7Fw== X-Gm-Message-State: AOAM530/ZM3AIQMKnU95gxohgu4yokQD0AG6judDn170REj9p4O8ORZC 5q5v6foCKF/Vhq7mYn8qwoqEJnS6efIdzF8JlUGY/VBeX00F5SiBOPjzVDks0iGxswX0ISAjZ5i n6jAkCmY7BAxMC9nUvOO0XUVTDrbKUoduUh79wa3jaQ== X-Received: by 2002:ac8:3612:: with SMTP id m18mr6035071qtb.346.1604321369293; Mon, 02 Nov 2020 04:49:29 -0800 (PST) X-Google-Smtp-Source: ABdhPJwKjIs/oeloIXROZQNk058rX2nFZOikRi3mwBW+EWkH209yf2yBfk08xJnrRdPbwzV761Ktyw== X-Received: by 2002:ac8:3612:: with SMTP id m18mr6035009qtb.346.1604321368347; Mon, 02 Nov 2020 04:49:28 -0800 (PST) Received: from localhost.localdomain (072-189-064-225.res.spectrum.com. [72.189.64.225]) by smtp.gmail.com with ESMTPSA id q7sm7666201qtd.49.2020.11.02.04.49.27 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 02 Nov 2020 04:49:27 -0800 (PST) From: William Breathitt Gray To: kernel-team@lists.ubuntu.com Subject: [SRU][B:linux-azure-4.15][PATCH 27/40] xsk: add user memory registration support sockopt Date: Mon, 2 Nov 2020 07:48:43 -0500 Message-Id: <20201102124856.4659-28-william.gray@canonical.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20201102124856.4659-1-william.gray@canonical.com> References: <20201102124856.4659-1-william.gray@canonical.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Björn Töpel BugLink: https://bugs.launchpad.net/bugs/1877654 In this commit the base structure of the AF_XDP address family is set up. Further, we introduce the ability to register a window of user memory to the kernel via the XDP_UMEM_REG setsockopt syscall. The memory window is viewed by an AF_XDP socket as a set of equally large frames. After a user memory registration all frames are "owned" by the user application, and not the kernel. v2: More robust checks on umem creation and unaccount on error. Call set_page_dirty_lock on cleanup. Simplified xdp_umem_reg.
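
To make the new interface concrete, a minimal user-space sketch of the registration flow follows. It is illustrative only and not part of the patch: the AF_XDP and SOL_XDP fallback values below are assumptions (both constants are introduced elsewhere in this series), and the program needs CAP_NET_RAW plus a sufficient RLIMIT_MEMLOCK to succeed.

/* Hypothetical example: register a 64 MiB, page-aligned user memory
 * area as an AF_XDP umem, split into 2 KiB frames.
 */
#include <linux/if_xdp.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef AF_XDP
#define AF_XDP 44	/* assumed value; defined elsewhere in this series */
#endif
#ifndef SOL_XDP
#define SOL_XDP 283	/* assumed value; defined elsewhere in this series */
#endif

int main(void)
{
	const size_t len = 64UL << 20;
	struct xdp_umem_reg mr;
	void *bufs;
	int fd;

	/* xsk_create() only accepts SOCK_RAW with protocol 0 and
	 * requires CAP_NET_RAW.
	 */
	fd = socket(AF_XDP, SOCK_RAW, 0);
	if (fd < 0) {
		perror("socket");
		return 1;
	}

	/* xdp_umem_reg() rejects areas that are not page aligned */
	if (posix_memalign(&bufs, (size_t)getpagesize(), len)) {
		close(fd);
		return 1;
	}

	memset(&mr, 0, sizeof(mr));
	mr.addr = (__u64)(unsigned long)bufs;
	mr.len = len;
	mr.frame_size = 2048;	/* power of 2, >= XDP_UMEM_MIN_FRAME_SIZE */
	mr.frame_headroom = 0;

	/* Pins and accounts the pages; may fail with -ENOBUFS if
	 * RLIMIT_MEMLOCK is too low and CAP_IPC_LOCK is not held.
	 */
	if (setsockopt(fd, SOL_XDP, XDP_UMEM_REG, &mr, sizeof(mr)))
		perror("setsockopt(XDP_UMEM_REG)");

	close(fd);
	free(bufs);
	return 0;
}

After this call the kernel views the area as 32768 frames of 2 KiB each, all owned by the application until later patches add a way to hand them to the kernel.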
Co-authored-by: Magnus Karlsson Signed-off-by: Magnus Karlsson Signed-off-by: Björn Töpel Signed-off-by: Alexei Starovoitov (cherry picked from commit c0c77d8fb787cfe0c3fca689c2a30d1dad4eaba7) Signed-off-by: William Breathitt Gray --- include/net/xdp_sock.h | 31 +++++ include/uapi/linux/if_xdp.h | 34 +++++ net/Makefile | 1 + net/xdp/Makefile | 2 + net/xdp/xdp_umem.c | 245 ++++++++++++++++++++++++++++++++++++ net/xdp/xdp_umem.h | 45 +++++++ net/xdp/xdp_umem_props.h | 23 ++++ net/xdp/xsk.c | 215 +++++++++++++++++++++++++++++++ 8 files changed, 596 insertions(+) create mode 100644 include/net/xdp_sock.h create mode 100644 include/uapi/linux/if_xdp.h create mode 100644 net/xdp/Makefile create mode 100644 net/xdp/xdp_umem.c create mode 100644 net/xdp/xdp_umem.h create mode 100644 net/xdp/xdp_umem_props.h create mode 100644 net/xdp/xsk.c diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h new file mode 100644 index 000000000000..94785f5db13e --- /dev/null +++ b/include/net/xdp_sock.h @@ -0,0 +1,31 @@ +/* SPDX-License-Identifier: GPL-2.0 + * AF_XDP internal functions + * Copyright(c) 2018 Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + */ + +#ifndef _LINUX_XDP_SOCK_H +#define _LINUX_XDP_SOCK_H + +#include +#include + +struct xdp_umem; + +struct xdp_sock { + /* struct sock must be the first member of struct xdp_sock */ + struct sock sk; + struct xdp_umem *umem; + /* Protects multiple processes in the control path */ + struct mutex mutex; +}; + +#endif /* _LINUX_XDP_SOCK_H */ diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h new file mode 100644 index 000000000000..41252135a0fe --- /dev/null +++ b/include/uapi/linux/if_xdp.h @@ -0,0 +1,34 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note + * + * if_xdp: XDP socket user-space interface + * Copyright(c) 2018 Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. 
+ * + * Author(s): Björn Töpel + * Magnus Karlsson + */ + +#ifndef _LINUX_IF_XDP_H +#define _LINUX_IF_XDP_H + +#include + +/* XDP socket options */ +#define XDP_UMEM_REG 3 + +struct xdp_umem_reg { + __u64 addr; /* Start of packet data area */ + __u64 len; /* Length of packet data area */ + __u32 frame_size; /* Frame size */ + __u32 frame_headroom; /* Frame head room */ +}; + +#endif /* _LINUX_IF_XDP_H */ diff --git a/net/Makefile b/net/Makefile index 14fede520840..9df8e6f827f8 100644 --- a/net/Makefile +++ b/net/Makefile @@ -86,3 +86,4 @@ obj-y += l3mdev/ endif obj-$(CONFIG_QRTR) += qrtr/ obj-$(CONFIG_NET_NCSI) += ncsi/ +obj-$(CONFIG_XDP_SOCKETS) += xdp/ diff --git a/net/xdp/Makefile b/net/xdp/Makefile new file mode 100644 index 000000000000..a5d736640a0f --- /dev/null +++ b/net/xdp/Makefile @@ -0,0 +1,2 @@ +obj-$(CONFIG_XDP_SOCKETS) += xsk.o xdp_umem.o + diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c new file mode 100644 index 000000000000..ec8b3552be44 --- /dev/null +++ b/net/xdp/xdp_umem.c @@ -0,0 +1,245 @@ +// SPDX-License-Identifier: GPL-2.0 +/* XDP user-space packet buffer + * Copyright(c) 2018 Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include "xdp_umem.h" + +#define XDP_UMEM_MIN_FRAME_SIZE 2048 + +int xdp_umem_create(struct xdp_umem **umem) +{ + *umem = kzalloc(sizeof(**umem), GFP_KERNEL); + + if (!(*umem)) + return -ENOMEM; + + return 0; +} + +static void xdp_umem_unpin_pages(struct xdp_umem *umem) +{ + unsigned int i; + + if (umem->pgs) { + for (i = 0; i < umem->npgs; i++) { + struct page *page = umem->pgs[i]; + + set_page_dirty_lock(page); + put_page(page); + } + + kfree(umem->pgs); + umem->pgs = NULL; + } +} + +static void xdp_umem_unaccount_pages(struct xdp_umem *umem) +{ + if (umem->user) { + atomic_long_sub(umem->npgs, &umem->user->locked_vm); + free_uid(umem->user); + } +} + +static void xdp_umem_release(struct xdp_umem *umem) +{ + struct task_struct *task; + struct mm_struct *mm; + + if (umem->pgs) { + xdp_umem_unpin_pages(umem); + + task = get_pid_task(umem->pid, PIDTYPE_PID); + put_pid(umem->pid); + if (!task) + goto out; + mm = get_task_mm(task); + put_task_struct(task); + if (!mm) + goto out; + + mmput(mm); + umem->pgs = NULL; + } + + xdp_umem_unaccount_pages(umem); +out: + kfree(umem); +} + +static void xdp_umem_release_deferred(struct work_struct *work) +{ + struct xdp_umem *umem = container_of(work, struct xdp_umem, work); + + xdp_umem_release(umem); +} + +void xdp_get_umem(struct xdp_umem *umem) +{ + atomic_inc(&umem->users); +} + +void xdp_put_umem(struct xdp_umem *umem) +{ + if (!umem) + return; + + if (atomic_dec_and_test(&umem->users)) { + INIT_WORK(&umem->work, xdp_umem_release_deferred); + schedule_work(&umem->work); + } +} + +static int xdp_umem_pin_pages(struct xdp_umem *umem) +{ + unsigned int gup_flags = FOLL_WRITE; + long npgs; + int err; + + umem->pgs = kcalloc(umem->npgs, sizeof(*umem->pgs), GFP_KERNEL); + if (!umem->pgs) + return -ENOMEM; + + down_write(¤t->mm->mmap_sem); + npgs = get_user_pages(umem->address, umem->npgs, + 
gup_flags, &umem->pgs[0], NULL); + up_write(¤t->mm->mmap_sem); + + if (npgs != umem->npgs) { + if (npgs >= 0) { + umem->npgs = npgs; + err = -ENOMEM; + goto out_pin; + } + err = npgs; + goto out_pgs; + } + return 0; + +out_pin: + xdp_umem_unpin_pages(umem); +out_pgs: + kfree(umem->pgs); + umem->pgs = NULL; + return err; +} + +static int xdp_umem_account_pages(struct xdp_umem *umem) +{ + unsigned long lock_limit, new_npgs, old_npgs; + + if (capable(CAP_IPC_LOCK)) + return 0; + + lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT; + umem->user = get_uid(current_user()); + + do { + old_npgs = atomic_long_read(&umem->user->locked_vm); + new_npgs = old_npgs + umem->npgs; + if (new_npgs > lock_limit) { + free_uid(umem->user); + umem->user = NULL; + return -ENOBUFS; + } + } while (atomic_long_cmpxchg(&umem->user->locked_vm, old_npgs, + new_npgs) != old_npgs); + return 0; +} + +int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr) +{ + u32 frame_size = mr->frame_size, frame_headroom = mr->frame_headroom; + u64 addr = mr->addr, size = mr->len; + unsigned int nframes, nfpp; + int size_chk, err; + + if (!umem) + return -EINVAL; + + if (frame_size < XDP_UMEM_MIN_FRAME_SIZE || frame_size > PAGE_SIZE) { + /* Strictly speaking we could support this, if: + * - huge pages, or* + * - using an IOMMU, or + * - making sure the memory area is consecutive + * but for now, we simply say "computer says no". + */ + return -EINVAL; + } + + if (!is_power_of_2(frame_size)) + return -EINVAL; + + if (!PAGE_ALIGNED(addr)) { + /* Memory area has to be page size aligned. For + * simplicity, this might change. + */ + return -EINVAL; + } + + if ((addr + size) < addr) + return -EINVAL; + + nframes = size / frame_size; + if (nframes == 0 || nframes > UINT_MAX) + return -EINVAL; + + nfpp = PAGE_SIZE / frame_size; + if (nframes < nfpp || nframes % nfpp) + return -EINVAL; + + frame_headroom = ALIGN(frame_headroom, 64); + + size_chk = frame_size - frame_headroom - XDP_PACKET_HEADROOM; + if (size_chk < 0) + return -EINVAL; + + umem->pid = get_task_pid(current, PIDTYPE_PID); + umem->size = (size_t)size; + umem->address = (unsigned long)addr; + umem->props.frame_size = frame_size; + umem->props.nframes = nframes; + umem->frame_headroom = frame_headroom; + umem->npgs = size / PAGE_SIZE; + umem->pgs = NULL; + umem->user = NULL; + + umem->frame_size_log2 = ilog2(frame_size); + umem->nfpp_mask = nfpp - 1; + umem->nfpplog2 = ilog2(nfpp); + atomic_set(&umem->users, 1); + + err = xdp_umem_account_pages(umem); + if (err) + goto out; + + err = xdp_umem_pin_pages(umem); + if (err) + goto out_account; + return 0; + +out_account: + xdp_umem_unaccount_pages(umem); +out: + put_pid(umem->pid); + return err; +} diff --git a/net/xdp/xdp_umem.h b/net/xdp/xdp_umem.h new file mode 100644 index 000000000000..4597ae81a221 --- /dev/null +++ b/net/xdp/xdp_umem.h @@ -0,0 +1,45 @@ +/* SPDX-License-Identifier: GPL-2.0 + * XDP user-space packet buffer + * Copyright(c) 2018 Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. 
+ */ + +#ifndef XDP_UMEM_H_ +#define XDP_UMEM_H_ + +#include +#include +#include + +#include "xdp_umem_props.h" + +struct xdp_umem { + struct page **pgs; + struct xdp_umem_props props; + u32 npgs; + u32 frame_headroom; + u32 nfpp_mask; + u32 nfpplog2; + u32 frame_size_log2; + struct user_struct *user; + struct pid *pid; + unsigned long address; + size_t size; + atomic_t users; + struct work_struct work; +}; + +int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr); +void xdp_get_umem(struct xdp_umem *umem); +void xdp_put_umem(struct xdp_umem *umem); +int xdp_umem_create(struct xdp_umem **umem); + +#endif /* XDP_UMEM_H_ */ diff --git a/net/xdp/xdp_umem_props.h b/net/xdp/xdp_umem_props.h new file mode 100644 index 000000000000..77fb5daf29f3 --- /dev/null +++ b/net/xdp/xdp_umem_props.h @@ -0,0 +1,23 @@ +/* SPDX-License-Identifier: GPL-2.0 + * XDP user-space packet buffer + * Copyright(c) 2018 Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + */ + +#ifndef XDP_UMEM_PROPS_H_ +#define XDP_UMEM_PROPS_H_ + +struct xdp_umem_props { + u32 frame_size; + u32 nframes; +}; + +#endif /* XDP_UMEM_PROPS_H_ */ diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c new file mode 100644 index 000000000000..84e0e867febb --- /dev/null +++ b/net/xdp/xsk.c @@ -0,0 +1,215 @@ +// SPDX-License-Identifier: GPL-2.0 +/* XDP sockets + * + * AF_XDP sockets allows a channel between XDP programs and userspace + * applications. + * Copyright(c) 2018 Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. 
+ * + * Author(s): Björn Töpel + * Magnus Karlsson + */ + +#define pr_fmt(fmt) "AF_XDP: %s: " fmt, __func__ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "xdp_umem.h" + +static struct xdp_sock *xdp_sk(struct sock *sk) +{ + return (struct xdp_sock *)sk; +} + +static int xsk_release(struct socket *sock) +{ + struct sock *sk = sock->sk; + struct net *net; + + if (!sk) + return 0; + + net = sock_net(sk); + + local_bh_disable(); + sock_prot_inuse_add(net, sk->sk_prot, -1); + local_bh_enable(); + + sock_orphan(sk); + sock->sk = NULL; + + sk_refcnt_debug_release(sk); + sock_put(sk); + + return 0; +} + +static int xsk_setsockopt(struct socket *sock, int level, int optname, + char __user *optval, unsigned int optlen) +{ + struct sock *sk = sock->sk; + struct xdp_sock *xs = xdp_sk(sk); + int err; + + if (level != SOL_XDP) + return -ENOPROTOOPT; + + switch (optname) { + case XDP_UMEM_REG: + { + struct xdp_umem_reg mr; + struct xdp_umem *umem; + + if (xs->umem) + return -EBUSY; + + if (copy_from_user(&mr, optval, sizeof(mr))) + return -EFAULT; + + mutex_lock(&xs->mutex); + err = xdp_umem_create(&umem); + + err = xdp_umem_reg(umem, &mr); + if (err) { + kfree(umem); + mutex_unlock(&xs->mutex); + return err; + } + + /* Make sure umem is ready before it can be seen by others */ + smp_wmb(); + + xs->umem = umem; + mutex_unlock(&xs->mutex); + return 0; + } + default: + break; + } + + return -ENOPROTOOPT; +} + +static struct proto xsk_proto = { + .name = "XDP", + .owner = THIS_MODULE, + .obj_size = sizeof(struct xdp_sock), +}; + +static const struct proto_ops xsk_proto_ops = { + .family = PF_XDP, + .owner = THIS_MODULE, + .release = xsk_release, + .bind = sock_no_bind, + .connect = sock_no_connect, + .socketpair = sock_no_socketpair, + .accept = sock_no_accept, + .getname = sock_no_getname, + .poll = sock_no_poll, + .ioctl = sock_no_ioctl, + .listen = sock_no_listen, + .shutdown = sock_no_shutdown, + .setsockopt = xsk_setsockopt, + .getsockopt = sock_no_getsockopt, + .sendmsg = sock_no_sendmsg, + .recvmsg = sock_no_recvmsg, + .mmap = sock_no_mmap, + .sendpage = sock_no_sendpage, +}; + +static void xsk_destruct(struct sock *sk) +{ + struct xdp_sock *xs = xdp_sk(sk); + + if (!sock_flag(sk, SOCK_DEAD)) + return; + + xdp_put_umem(xs->umem); + + sk_refcnt_debug_dec(sk); +} + +static int xsk_create(struct net *net, struct socket *sock, int protocol, + int kern) +{ + struct sock *sk; + struct xdp_sock *xs; + + if (!ns_capable(net->user_ns, CAP_NET_RAW)) + return -EPERM; + if (sock->type != SOCK_RAW) + return -ESOCKTNOSUPPORT; + + if (protocol) + return -EPROTONOSUPPORT; + + sock->state = SS_UNCONNECTED; + + sk = sk_alloc(net, PF_XDP, GFP_KERNEL, &xsk_proto, kern); + if (!sk) + return -ENOBUFS; + + sock->ops = &xsk_proto_ops; + + sock_init_data(sock, sk); + + sk->sk_family = PF_XDP; + + sk->sk_destruct = xsk_destruct; + sk_refcnt_debug_inc(sk); + + xs = xdp_sk(sk); + mutex_init(&xs->mutex); + + local_bh_disable(); + sock_prot_inuse_add(net, &xsk_proto, 1); + local_bh_enable(); + + return 0; +} + +static const struct net_proto_family xsk_family_ops = { + .family = PF_XDP, + .create = xsk_create, + .owner = THIS_MODULE, +}; + +static int __init xsk_init(void) +{ + int err; + + err = proto_register(&xsk_proto, 0 /* no slab */); + if (err) + goto out; + + err = sock_register(&xsk_family_ops); + if (err) + goto out_proto; + + return 0; + +out_proto: + proto_unregister(&xsk_proto); +out: + return err; +} + +fs_initcall(xsk_init); From patchwork 
Mon Nov 2 12:48:44 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: William Breathitt Gray X-Patchwork-Id: 1392239 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=canonical.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4CPt7641mYz9sWj; Mon, 2 Nov 2020 23:50:26 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1kZZHx-0006sq-Pw; Mon, 02 Nov 2020 12:50:21 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1kZZHB-0006G3-Lt for kernel-team@lists.ubuntu.com; Mon, 02 Nov 2020 12:49:33 +0000 Received: from mail-qk1-f200.google.com ([209.85.222.200]) by youngberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1kZZH8-00047W-Vw for kernel-team@lists.ubuntu.com; Mon, 02 Nov 2020 12:49:31 +0000 Received: by mail-qk1-f200.google.com with SMTP id k12so8536548qkj.18 for ; Mon, 02 Nov 2020 04:49:30 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=YLI9FuecYP2JBWSusZOhn2Z0/9n7i3ECeYXDqFMp7dQ=; b=RF8WGXO1Oruz4hXd72eu7jsb2ORJ1WplgsD0HAcEhbxPBxn13Z0hi7EuaVLkwZRCyT O2VByqdkK96WPwBLnSjp0z+9M4j7q1iXvnCnX87KhlWlwY17QvCBaxXSEAbMCSln9TIu ldZcqZ1X+SgeEYKM7qkMFyvoVwig3O7WDmvOCxiE7HviE9dqnLVeeFMlFsbr8q4wJpeI SBDj1RgjGzueLkRweZ5aygAb09q6JPv3aUHd8xnzpVB/GM068FTX1xhyqxU7WIcmA//J rFvsRCPpGDxzifQsg82VUMdrASny2teSGFdvLR3V8O/O12R3qJxgr5Pzuu+KGcu4CMHq Y2rw== X-Gm-Message-State: AOAM532LMlsLYMiKuUMb3ghlwd+FbGF4Ym3iL468aaEX330kRNolk9sP Ik34DGfsEQ+pKN5CLDydGEUmAL0eWMbuOfR3FScvON7JCF49Xv+fagVOBSzTIUwx7Ihw38itwXk wJQ+IIZsX6f7xUprC02YggFySIynOmzCKdAFGymB0Zg== X-Received: by 2002:a37:4a44:: with SMTP id x65mr14599861qka.140.1604321369573; Mon, 02 Nov 2020 04:49:29 -0800 (PST) X-Google-Smtp-Source: ABdhPJxI4+90fJBeOjJSeNjPFgETVrCrw6TkOcA68rR/2utWv04IDUylS3OHsM+B1r95TijKMNuBqQ== X-Received: by 2002:a37:4a44:: with SMTP id x65mr14599840qka.140.1604321369315; Mon, 02 Nov 2020 04:49:29 -0800 (PST) Received: from localhost.localdomain (072-189-064-225.res.spectrum.com. 
[72.189.64.225]) by smtp.gmail.com with ESMTPSA id q7sm7666201qtd.49.2020.11.02.04.49.28 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 02 Nov 2020 04:49:28 -0800 (PST) From: William Breathitt Gray To: kernel-team@lists.ubuntu.com Subject: [SRU][B:linux-azure-4.15][PATCH 28/40] xsk: add umem fill queue support and mmap Date: Mon, 2 Nov 2020 07:48:44 -0500 Message-Id: <20201102124856.4659-29-william.gray@canonical.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20201102124856.4659-1-william.gray@canonical.com> References: <20201102124856.4659-1-william.gray@canonical.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Magnus Karlsson BugLink: https://bugs.launchpad.net/bugs/1877654 Here, we add another setsockopt for registered user memory (umem) called XDP_UMEM_FILL_RING. Using this socket option, the process can ask the kernel to allocate a queue (ring buffer) and also mmap it (XDP_UMEM_PGOFF_FILL_RING) into the process. The queue is used to explicitly pass ownership of umem frames from the user process to the kernel. These frames will in a later patch be filled in with Rx packet data by the kernel. v2: Fixed potential crash in xsk_mmap. Signed-off-by: Magnus Karlsson Signed-off-by: Alexei Starovoitov (cherry picked from commit 423f38329d267969130fb6f2c685f73d72687558) Signed-off-by: William Breathitt Gray --- include/uapi/linux/if_xdp.h | 15 +++++++++ net/xdp/Makefile | 2 +- net/xdp/xdp_umem.c | 5 +++ net/xdp/xdp_umem.h | 2 ++ net/xdp/xsk.c | 65 ++++++++++++++++++++++++++++++++++++- net/xdp/xsk_queue.c | 58 +++++++++++++++++++++++++++++++++ net/xdp/xsk_queue.h | 38 ++++++++++++++++++++++ 7 files changed, 183 insertions(+), 2 deletions(-) create mode 100644 net/xdp/xsk_queue.c create mode 100644 net/xdp/xsk_queue.h diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h index 41252135a0fe..975661e1baca 100644 --- a/include/uapi/linux/if_xdp.h +++ b/include/uapi/linux/if_xdp.h @@ -23,6 +23,7 @@ /* XDP socket options */ #define XDP_UMEM_REG 3 +#define XDP_UMEM_FILL_RING 4 struct xdp_umem_reg { __u64 addr; /* Start of packet data area */ @@ -31,4 +32,18 @@ struct xdp_umem_reg { __u32 frame_headroom; /* Frame head room */ }; +/* Pgoff for mmaping the rings */ +#define XDP_UMEM_PGOFF_FILL_RING 0x100000000 + +struct xdp_ring { + __u32 producer __attribute__((aligned(64))); + __u32 consumer __attribute__((aligned(64))); +}; + +/* Used for the fill and completion queues for buffers */ +struct xdp_umem_ring { + struct xdp_ring ptrs; + __u32 desc[0] __attribute__((aligned(64))); +}; + #endif /* _LINUX_IF_XDP_H */ diff --git a/net/xdp/Makefile b/net/xdp/Makefile index a5d736640a0f..074fb2b2d51c 100644 --- a/net/xdp/Makefile +++ b/net/xdp/Makefile @@ -1,2 +1,2 @@ -obj-$(CONFIG_XDP_SOCKETS) += xsk.o xdp_umem.o +obj-$(CONFIG_XDP_SOCKETS) += xsk.o xdp_umem.o xsk_queue.o diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c index ec8b3552be44..e1f627d0cc1c 100644 --- a/net/xdp/xdp_umem.c +++ b/net/xdp/xdp_umem.c @@ -65,6 +65,11 @@ static void xdp_umem_release(struct xdp_umem *umem) struct task_struct *task; struct mm_struct *mm; + if (umem->fq) { + xskq_destroy(umem->fq); + umem->fq = NULL; + } + if (umem->pgs) { xdp_umem_unpin_pages(umem); diff --git a/net/xdp/xdp_umem.h b/net/xdp/xdp_umem.h index 4597ae81a221..25634b8a5c6f
100644 --- a/net/xdp/xdp_umem.h +++ b/net/xdp/xdp_umem.h @@ -19,9 +19,11 @@ #include #include +#include "xsk_queue.h" #include "xdp_umem_props.h" struct xdp_umem { + struct xsk_queue *fq; struct page **pgs; struct xdp_umem_props props; u32 npgs; diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 84e0e867febb..da67a3c5c1c9 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -32,6 +32,7 @@ #include #include +#include "xsk_queue.h" #include "xdp_umem.h" static struct xdp_sock *xdp_sk(struct sock *sk) @@ -39,6 +40,21 @@ static struct xdp_sock *xdp_sk(struct sock *sk) return (struct xdp_sock *)sk; } +static int xsk_init_queue(u32 entries, struct xsk_queue **queue) +{ + struct xsk_queue *q; + + if (entries == 0 || *queue || !is_power_of_2(entries)) + return -EINVAL; + + q = xskq_create(entries); + if (!q) + return -ENOMEM; + + *queue = q; + return 0; +} + static int xsk_release(struct socket *sock) { struct sock *sk = sock->sk; @@ -101,6 +117,23 @@ static int xsk_setsockopt(struct socket *sock, int level, int optname, mutex_unlock(&xs->mutex); return 0; } + case XDP_UMEM_FILL_RING: + { + struct xsk_queue **q; + int entries; + + if (!xs->umem) + return -EINVAL; + + if (copy_from_user(&entries, optval, sizeof(entries))) + return -EFAULT; + + mutex_lock(&xs->mutex); + q = &xs->umem->fq; + err = xsk_init_queue(entries, q); + mutex_unlock(&xs->mutex); + return err; + } default: break; } @@ -108,6 +141,36 @@ static int xsk_setsockopt(struct socket *sock, int level, int optname, return -ENOPROTOOPT; } +static int xsk_mmap(struct file *file, struct socket *sock, + struct vm_area_struct *vma) +{ + unsigned long offset = vma->vm_pgoff << PAGE_SHIFT; + unsigned long size = vma->vm_end - vma->vm_start; + struct xdp_sock *xs = xdp_sk(sock->sk); + struct xsk_queue *q = NULL; + unsigned long pfn; + struct page *qpg; + + if (!xs->umem) + return -EINVAL; + + if (offset == XDP_UMEM_PGOFF_FILL_RING) + q = xs->umem->fq; + else + return -EINVAL; + + if (!q) + return -EINVAL; + + qpg = virt_to_head_page(q->ring); + if (size > (PAGE_SIZE << compound_order(qpg))) + return -EINVAL; + + pfn = virt_to_phys(q->ring) >> PAGE_SHIFT; + return remap_pfn_range(vma, vma->vm_start, pfn, + size, vma->vm_page_prot); +} + static struct proto xsk_proto = { .name = "XDP", .owner = THIS_MODULE, @@ -131,7 +194,7 @@ static const struct proto_ops xsk_proto_ops = { .getsockopt = sock_no_getsockopt, .sendmsg = sock_no_sendmsg, .recvmsg = sock_no_recvmsg, - .mmap = sock_no_mmap, + .mmap = xsk_mmap, .sendpage = sock_no_sendpage, }; diff --git a/net/xdp/xsk_queue.c b/net/xdp/xsk_queue.c new file mode 100644 index 000000000000..23da4f29d3fb --- /dev/null +++ b/net/xdp/xsk_queue.c @@ -0,0 +1,58 @@ +// SPDX-License-Identifier: GPL-2.0 +/* XDP user-space ring structure + * Copyright(c) 2018 Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. 
+ */ + +#include + +#include "xsk_queue.h" + +static u32 xskq_umem_get_ring_size(struct xsk_queue *q) +{ + return sizeof(struct xdp_umem_ring) + q->nentries * sizeof(u32); +} + +struct xsk_queue *xskq_create(u32 nentries) +{ + struct xsk_queue *q; + gfp_t gfp_flags; + size_t size; + + q = kzalloc(sizeof(*q), GFP_KERNEL); + if (!q) + return NULL; + + q->nentries = nentries; + q->ring_mask = nentries - 1; + + gfp_flags = GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN | + __GFP_COMP | __GFP_NORETRY; + size = xskq_umem_get_ring_size(q); + + q->ring = (struct xdp_ring *)__get_free_pages(gfp_flags, + get_order(size)); + if (!q->ring) { + kfree(q); + return NULL; + } + + return q; +} + +void xskq_destroy(struct xsk_queue *q) +{ + if (!q) + return; + + page_frag_free(q->ring); + kfree(q); +} diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h new file mode 100644 index 000000000000..7eb556bf73be --- /dev/null +++ b/net/xdp/xsk_queue.h @@ -0,0 +1,38 @@ +/* SPDX-License-Identifier: GPL-2.0 + * XDP user-space ring structure + * Copyright(c) 2018 Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + */ + +#ifndef _LINUX_XSK_QUEUE_H +#define _LINUX_XSK_QUEUE_H + +#include +#include + +#include "xdp_umem_props.h" + +struct xsk_queue { + struct xdp_umem_props umem_props; + u32 ring_mask; + u32 nentries; + u32 prod_head; + u32 prod_tail; + u32 cons_head; + u32 cons_tail; + struct xdp_ring *ring; + u64 invalid_descs; +}; + +struct xsk_queue *xskq_create(u32 nentries); +void xskq_destroy(struct xsk_queue *q); + +#endif /* _LINUX_XSK_QUEUE_H */ From patchwork Mon Nov 2 12:48:45 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: William Breathitt Gray X-Patchwork-Id: 1392241 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=canonical.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4CPt792GFcz9sWv; Mon, 2 Nov 2020 23:50:28 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1kZZHz-0006v1-Sy; Mon, 02 Nov 2020 12:50:23 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1kZZHC-0006H1-9a for kernel-team@lists.ubuntu.com; Mon, 02 Nov 2020 12:49:34 +0000 Received: from mail-qt1-f200.google.com ([209.85.160.200]) by youngberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1kZZH9-00047d-Mo for kernel-team@lists.ubuntu.com; Mon, 02 
Nov 2020 12:49:31 +0000 Received: by mail-qt1-f200.google.com with SMTP id f10so7926562qtv.6 for ; Mon, 02 Nov 2020 04:49:31 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=oaCp/1CJR4DXgmYyW2GXKI2GrRv/+VvsAv/nyhTM/Hw=; b=KtbaXFanJrKZY388h9QvqXOcZyKM1Nfo8FvSMI/w2DubPwaSDWzxL2MMBa3ILMLM45 bxWca+mGdD8LHufCZAwYfyAAjEuJ7D+EkyEbHnHSBuh7u6bWSEhqj6FKiF6dtcbrY/VX 4cK25/ZuJ7KyeB8ytRlb1YRZQY+xiPjRYB+85AxsHdEowhUgUsMa1kljWs3XV3cghuVb x9hAQf0vToosel6jN2XDWTllxPyg2iPMCMUDoM62xrQf64mbxupZdelmrILtnHP4ej1S 7D1xe0LXbgPtp8Rr3lwOAtu4sbgUb2pVOOYEcAvayd3SmSp+63R6W1/q+BjWp1V/t1Bb x49Q== X-Gm-Message-State: AOAM5314OpRzdoDhsWf27B7uGc26QJx9BLmODELxXbV1mo1vl9QW2p80 AkgtpO/rwQIyYf4t7zlJ0oWX75FVjU5jogY1YO5i7L3sZ2yYtLp5PFD8FnBSR11RpPL6BVQ6Dr9 ovU6Skf5goo1Ka6ogkM3uVMFQ+kjPsq8783W+pIHq0w== X-Received: by 2002:a37:3d5:: with SMTP id 204mr15331110qkd.373.1604321370423; Mon, 02 Nov 2020 04:49:30 -0800 (PST) X-Google-Smtp-Source: ABdhPJzChc8HqvGnWEDLsioYWx+0pVpBoJ2d50wrPmETQY4r1blGHOWsZbafooqOtnAp8mk8kod3NA== X-Received: by 2002:a37:3d5:: with SMTP id 204mr15331094qkd.373.1604321370168; Mon, 02 Nov 2020 04:49:30 -0800 (PST) Received: from localhost.localdomain (072-189-064-225.res.spectrum.com. [72.189.64.225]) by smtp.gmail.com with ESMTPSA id q7sm7666201qtd.49.2020.11.02.04.49.29 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 02 Nov 2020 04:49:29 -0800 (PST) From: William Breathitt Gray To: kernel-team@lists.ubuntu.com Subject: [SRU][B:linux-azure-4.15][PATCH 29/40] xsk: add Rx queue setup and mmap support Date: Mon, 2 Nov 2020 07:48:45 -0500 Message-Id: <20201102124856.4659-30-william.gray@canonical.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20201102124856.4659-1-william.gray@canonical.com> References: <20201102124856.4659-1-william.gray@canonical.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Björn Töpel BugLink: https://bugs.launchpad.net/bugs/1877654 Another setsockopt (XDP_RX_RING) is added to let the process allocate a queue, which the kernel uses to pass completed Rx frames to the user process. The mmapping of the queue is done using the XDP_PGOFF_RX_RING offset.
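
Taken together with the previous patch, the user-space ring setup now looks roughly like the sketch below. It is illustrative only and not part of the patch: the SOL_XDP fallback value is an assumption (the constant is defined elsewhere in this series), the ring sizes mirror xskq_umem_get_ring_size() and xskq_rxtx_get_ring_size() in net/xdp/xsk_queue.c, and the fill ring requires a umem already registered with XDP_UMEM_REG.

/* Hypothetical example: create and mmap the umem fill ring (previous
 * patch) and the new Rx ring on an AF_XDP socket "fd".
 * Note: XDP_UMEM_PGOFF_FILL_RING exceeds 32 bits, so off_t must be
 * 64-bit (build 32-bit programs with -D_FILE_OFFSET_BITS=64).
 */
#include <linux/if_xdp.h>
#include <sys/mman.h>
#include <sys/socket.h>

#ifndef SOL_XDP
#define SOL_XDP 283	/* assumed value; defined elsewhere in this series */
#endif

#define NUM_DESCS 1024	/* must be a non-zero power of two */

static void *xsk_map_ring(int fd, int sockopt, unsigned long long pgoff,
			  size_t size)
{
	int entries = NUM_DESCS;

	/* xsk_init_queue() rejects zero or non-power-of-two sizes */
	if (setsockopt(fd, SOL_XDP, sockopt, &entries, sizeof(entries)))
		return MAP_FAILED;

	return mmap(NULL, size, PROT_READ | PROT_WRITE,
		    MAP_SHARED | MAP_POPULATE, fd, (off_t)pgoff);
}

static int xsk_setup_rings(int fd)
{
	void *fq, *rx;

	/* Fill ring: __u32 frame indices (struct xdp_umem_ring); the
	 * size mirrors xskq_umem_get_ring_size().
	 */
	fq = xsk_map_ring(fd, XDP_UMEM_FILL_RING, XDP_UMEM_PGOFF_FILL_RING,
			  sizeof(struct xdp_umem_ring) +
			  NUM_DESCS * sizeof(__u32));

	/* Rx ring: struct xdp_desc entries (struct xdp_rxtx_ring); the
	 * size mirrors xskq_rxtx_get_ring_size().
	 */
	rx = xsk_map_ring(fd, XDP_RX_RING, XDP_PGOFF_RX_RING,
			  sizeof(struct xdp_rxtx_ring) +
			  NUM_DESCS * sizeof(struct xdp_desc));

	return (fq == MAP_FAILED || rx == MAP_FAILED) ? -1 : 0;
}

Each mapping starts with the producer/consumer pair from struct xdp_ring; the fill ring is written by user space and consumed by the kernel, while the Rx ring flows the other way.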
Signed-off-by: Björn Töpel Signed-off-by: Alexei Starovoitov (cherry picked from commit b9b6b68e8abd101be6eb5330e4999218c696d1e8) Signed-off-by: William Breathitt Gray --- include/net/xdp_sock.h | 4 ++++ include/uapi/linux/if_xdp.h | 16 +++++++++++++++ net/xdp/xsk.c | 41 +++++++++++++++++++++++++++++-------- net/xdp/xsk_queue.c | 11 ++++++++-- net/xdp/xsk_queue.h | 2 +- 5 files changed, 62 insertions(+), 12 deletions(-) diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h index 94785f5db13e..db9a321de087 100644 --- a/include/net/xdp_sock.h +++ b/include/net/xdp_sock.h @@ -18,11 +18,15 @@ #include #include +struct net_device; +struct xsk_queue; struct xdp_umem; struct xdp_sock { /* struct sock must be the first member of struct xdp_sock */ struct sock sk; + struct xsk_queue *rx; + struct net_device *dev; struct xdp_umem *umem; /* Protects multiple processes in the control path */ struct mutex mutex; diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h index 975661e1baca..65324558829d 100644 --- a/include/uapi/linux/if_xdp.h +++ b/include/uapi/linux/if_xdp.h @@ -22,6 +22,7 @@ #include /* XDP socket options */ +#define XDP_RX_RING 1 #define XDP_UMEM_REG 3 #define XDP_UMEM_FILL_RING 4 @@ -33,13 +34,28 @@ struct xdp_umem_reg { }; /* Pgoff for mmaping the rings */ +#define XDP_PGOFF_RX_RING 0 #define XDP_UMEM_PGOFF_FILL_RING 0x100000000 +struct xdp_desc { + __u32 idx; + __u32 len; + __u16 offset; + __u8 flags; + __u8 padding[5]; +}; + struct xdp_ring { __u32 producer __attribute__((aligned(64))); __u32 consumer __attribute__((aligned(64))); }; +/* Used for the RX and TX queues for packets */ +struct xdp_rxtx_ring { + struct xdp_ring ptrs; + struct xdp_desc desc[0] __attribute__((aligned(64))); +}; + /* Used for the fill and completion queues for buffers */ struct xdp_umem_ring { struct xdp_ring ptrs; diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index da67a3c5c1c9..92bd9b7e548f 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -31,6 +31,7 @@ #include #include #include +#include #include "xsk_queue.h" #include "xdp_umem.h" @@ -40,14 +41,15 @@ static struct xdp_sock *xdp_sk(struct sock *sk) return (struct xdp_sock *)sk; } -static int xsk_init_queue(u32 entries, struct xsk_queue **queue) +static int xsk_init_queue(u32 entries, struct xsk_queue **queue, + bool umem_queue) { struct xsk_queue *q; if (entries == 0 || *queue || !is_power_of_2(entries)) return -EINVAL; - q = xskq_create(entries); + q = xskq_create(entries, umem_queue); if (!q) return -ENOMEM; @@ -89,6 +91,22 @@ static int xsk_setsockopt(struct socket *sock, int level, int optname, return -ENOPROTOOPT; switch (optname) { + case XDP_RX_RING: + { + struct xsk_queue **q; + int entries; + + if (optlen < sizeof(entries)) + return -EINVAL; + if (copy_from_user(&entries, optval, sizeof(entries))) + return -EFAULT; + + mutex_lock(&xs->mutex); + q = &xs->rx; + err = xsk_init_queue(entries, q, false); + mutex_unlock(&xs->mutex); + return err; + } case XDP_UMEM_REG: { struct xdp_umem_reg mr; @@ -130,7 +148,7 @@ static int xsk_setsockopt(struct socket *sock, int level, int optname, mutex_lock(&xs->mutex); q = &xs->umem->fq; - err = xsk_init_queue(entries, q); + err = xsk_init_queue(entries, q, true); mutex_unlock(&xs->mutex); return err; } @@ -151,13 +169,17 @@ static int xsk_mmap(struct file *file, struct socket *sock, unsigned long pfn; struct page *qpg; - if (!xs->umem) - return -EINVAL; + if (offset == XDP_PGOFF_RX_RING) { + q = xs->rx; + } else { + if (!xs->umem) + return -EINVAL; - if (offset == XDP_UMEM_PGOFF_FILL_RING) 
-		q = xs->umem->fq;
-	else
-		return -EINVAL;
+		if (offset == XDP_UMEM_PGOFF_FILL_RING)
+			q = xs->umem->fq;
+		else
+			return -EINVAL;
+	}
 
 	if (!q)
 		return -EINVAL;
@@ -205,6 +227,7 @@ static void xsk_destruct(struct sock *sk)
 	if (!sock_flag(sk, SOCK_DEAD))
 		return;
 
+	xskq_destroy(xs->rx);
 	xdp_put_umem(xs->umem);
 
 	sk_refcnt_debug_dec(sk);
diff --git a/net/xdp/xsk_queue.c b/net/xdp/xsk_queue.c
index 23da4f29d3fb..894f9f89afc7 100644
--- a/net/xdp/xsk_queue.c
+++ b/net/xdp/xsk_queue.c
@@ -21,7 +21,13 @@ static u32 xskq_umem_get_ring_size(struct xsk_queue *q)
 	return sizeof(struct xdp_umem_ring) + q->nentries * sizeof(u32);
 }
 
-struct xsk_queue *xskq_create(u32 nentries)
+static u32 xskq_rxtx_get_ring_size(struct xsk_queue *q)
+{
+	return (sizeof(struct xdp_ring) +
+		q->nentries * sizeof(struct xdp_desc));
+}
+
+struct xsk_queue *xskq_create(u32 nentries, bool umem_queue)
 {
 	struct xsk_queue *q;
 	gfp_t gfp_flags;
@@ -36,7 +42,8 @@ struct xsk_queue *xskq_create(u32 nentries)
 	gfp_flags = GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN |
 		    __GFP_COMP | __GFP_NORETRY;
-	size = xskq_umem_get_ring_size(q);
+	size = umem_queue ? xskq_umem_get_ring_size(q) :
+	       xskq_rxtx_get_ring_size(q);
 
 	q->ring = (struct xdp_ring *)__get_free_pages(gfp_flags,
 						      get_order(size));
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index 7eb556bf73be..5439fa381763 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -32,7 +32,7 @@ struct xsk_queue {
 	u64 invalid_descs;
 };
 
-struct xsk_queue *xskq_create(u32 nentries);
+struct xsk_queue *xskq_create(u32 nentries, bool umem_queue);
 void xskq_destroy(struct xsk_queue *q);
 
 #endif /* _LINUX_XSK_QUEUE_H */

From patchwork Mon Nov 2 12:48:46 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: William Breathitt Gray
X-Patchwork-Id: 1392240
From: William Breathitt Gray
To: kernel-team@lists.ubuntu.com
Subject: [SRU][B:linux-azure-4.15][PATCH 30/40] xsk: add support for bind for Rx
Date: Mon, 2 Nov 2020 07:48:46 -0500
Message-Id: <20201102124856.4659-31-william.gray@canonical.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20201102124856.4659-1-william.gray@canonical.com>
References: <20201102124856.4659-1-william.gray@canonical.com>
List-Id: Kernel team discussions
Sender: "kernel-team"

From: Magnus Karlsson

BugLink: https://bugs.launchpad.net/bugs/1877654

Here, the bind syscall is added. Binding an AF_XDP socket means
associating the socket with an umem, a netdev, and a queue index. This
can be done in two ways.

The first way is to create a socket from scratch: create the umem with
the XDP_UMEM_REG setsockopt and an associated fill ring with
XDP_UMEM_FILL_RING, create the Rx ring with the XDP_RX_RING setsockopt,
and then call bind, passing an ifindex and a queue index ("channel" in
ethtool speak).

The second way to bind a socket is to skip the umem/netdev/queue-index
setup entirely and instead pass another, already set up, AF_XDP socket.
The new socket then inherits the umem/netdev/queue index of the parent,
so the two share the same umem. For this, the flags field in the socket
address must be set to XDP_SHARED_UMEM.

v2: Use PTR_ERR instead of passing error variable explicitly.
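[ editor's note: the two bind flows described above look roughly like
this from userspace. This is an illustrative sketch, not part of the
patch: AF_XDP is provided by an earlier patch in the series (the value
44 below is assumed here for self-containment), the ifindex/queue
values are placeholders, and per the xsk_bind() hunk below an Rx ring
must have been configured via XDP_RX_RING before bind succeeds. ]

/* Sketch: the two ways to bind an AF_XDP socket added by this patch. */
#include <string.h>
#include <sys/socket.h>
#include <linux/if_xdp.h>

#ifndef AF_XDP
#define AF_XDP 44	/* assumed; defined earlier in this series */
#endif

/* Flow 1: a socket set up "from scratch" binds directly to a queue. */
static int bind_to_queue(int fd, unsigned int ifindex, unsigned int queue)
{
	struct sockaddr_xdp sxdp;

	memset(&sxdp, 0, sizeof(sxdp));
	sxdp.sxdp_family = AF_XDP;
	sxdp.sxdp_ifindex = ifindex;
	sxdp.sxdp_queue_id = queue;

	return bind(fd, (struct sockaddr *)&sxdp, sizeof(sxdp));
}

/* Flow 2: inherit umem/netdev/queue from an already bound socket. */
static int bind_shared(int fd, unsigned int ifindex, unsigned int queue,
		       int parent_fd)
{
	struct sockaddr_xdp sxdp;

	memset(&sxdp, 0, sizeof(sxdp));
	sxdp.sxdp_family = AF_XDP;
	sxdp.sxdp_ifindex = ifindex;	/* must match the parent socket */
	sxdp.sxdp_queue_id = queue;	/* must match the parent socket */
	sxdp.sxdp_flags = XDP_SHARED_UMEM;
	sxdp.sxdp_shared_umem_fd = parent_fd;

	return bind(fd, (struct sockaddr *)&sxdp, sizeof(sxdp));
}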
Signed-off-by: Magnus Karlsson Signed-off-by: Alexei Starovoitov (cherry picked from commit 965a990984432cd01a9eb3514c64d86f56704295) Signed-off-by: William Breathitt Gray --- include/net/xdp_sock.h | 1 + include/uapi/linux/if_xdp.h | 11 ++++ net/xdp/xdp_umem.c | 5 ++ net/xdp/xdp_umem.h | 1 + net/xdp/xsk.c | 124 +++++++++++++++++++++++++++++++++++- net/xdp/xsk_queue.c | 8 +++ net/xdp/xsk_queue.h | 1 + 7 files changed, 150 insertions(+), 1 deletion(-) diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h index db9a321de087..85d02512f59b 100644 --- a/include/net/xdp_sock.h +++ b/include/net/xdp_sock.h @@ -28,6 +28,7 @@ struct xdp_sock { struct xsk_queue *rx; struct net_device *dev; struct xdp_umem *umem; + u16 queue_id; /* Protects multiple processes in the control path */ struct mutex mutex; }; diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h index 65324558829d..e5091881f776 100644 --- a/include/uapi/linux/if_xdp.h +++ b/include/uapi/linux/if_xdp.h @@ -21,6 +21,17 @@ #include +/* Options for the sxdp_flags field */ +#define XDP_SHARED_UMEM 1 + +struct sockaddr_xdp { + __u16 sxdp_family; + __u32 sxdp_ifindex; + __u32 sxdp_queue_id; + __u32 sxdp_shared_umem_fd; + __u16 sxdp_flags; +}; + /* XDP socket options */ #define XDP_RX_RING 1 #define XDP_UMEM_REG 3 diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c index e1f627d0cc1c..9bac1ad570fa 100644 --- a/net/xdp/xdp_umem.c +++ b/net/xdp/xdp_umem.c @@ -248,3 +248,8 @@ int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr) put_pid(umem->pid); return err; } + +bool xdp_umem_validate_queues(struct xdp_umem *umem) +{ + return umem->fq; +} diff --git a/net/xdp/xdp_umem.h b/net/xdp/xdp_umem.h index 25634b8a5c6f..b13133e9c501 100644 --- a/net/xdp/xdp_umem.h +++ b/net/xdp/xdp_umem.h @@ -39,6 +39,7 @@ struct xdp_umem { struct work_struct work; }; +bool xdp_umem_validate_queues(struct xdp_umem *umem); int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr); void xdp_get_umem(struct xdp_umem *umem); void xdp_put_umem(struct xdp_umem *umem); diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 92bd9b7e548f..bf2c97b87992 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -57,9 +57,18 @@ static int xsk_init_queue(u32 entries, struct xsk_queue **queue, return 0; } +static void __xsk_release(struct xdp_sock *xs) +{ + /* Wait for driver to stop using the xdp socket. 
*/ + synchronize_net(); + + dev_put(xs->dev); +} + static int xsk_release(struct socket *sock) { struct sock *sk = sock->sk; + struct xdp_sock *xs = xdp_sk(sk); struct net *net; if (!sk) @@ -71,6 +80,11 @@ static int xsk_release(struct socket *sock) sock_prot_inuse_add(net, sk->sk_prot, -1); local_bh_enable(); + if (xs->dev) { + __xsk_release(xs); + xs->dev = NULL; + } + sock_orphan(sk); sock->sk = NULL; @@ -80,6 +94,114 @@ static int xsk_release(struct socket *sock) return 0; } +static struct socket *xsk_lookup_xsk_from_fd(int fd) +{ + struct socket *sock; + int err; + + sock = sockfd_lookup(fd, &err); + if (!sock) + return ERR_PTR(-ENOTSOCK); + + if (sock->sk->sk_family != PF_XDP) { + sockfd_put(sock); + return ERR_PTR(-ENOPROTOOPT); + } + + return sock; +} + +static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len) +{ + struct sockaddr_xdp *sxdp = (struct sockaddr_xdp *)addr; + struct sock *sk = sock->sk; + struct net_device *dev, *dev_curr; + struct xdp_sock *xs = xdp_sk(sk); + struct xdp_umem *old_umem = NULL; + int err = 0; + + if (addr_len < sizeof(struct sockaddr_xdp)) + return -EINVAL; + if (sxdp->sxdp_family != AF_XDP) + return -EINVAL; + + mutex_lock(&xs->mutex); + dev_curr = xs->dev; + dev = dev_get_by_index(sock_net(sk), sxdp->sxdp_ifindex); + if (!dev) { + err = -ENODEV; + goto out_release; + } + + if (!xs->rx) { + err = -EINVAL; + goto out_unlock; + } + + if (sxdp->sxdp_queue_id >= dev->num_rx_queues) { + err = -EINVAL; + goto out_unlock; + } + + if (sxdp->sxdp_flags & XDP_SHARED_UMEM) { + struct xdp_sock *umem_xs; + struct socket *sock; + + if (xs->umem) { + /* We have already our own. */ + err = -EINVAL; + goto out_unlock; + } + + sock = xsk_lookup_xsk_from_fd(sxdp->sxdp_shared_umem_fd); + if (IS_ERR(sock)) { + err = PTR_ERR(sock); + goto out_unlock; + } + + umem_xs = xdp_sk(sock->sk); + if (!umem_xs->umem) { + /* No umem to inherit. */ + err = -EBADF; + sockfd_put(sock); + goto out_unlock; + } else if (umem_xs->dev != dev || + umem_xs->queue_id != sxdp->sxdp_queue_id) { + err = -EINVAL; + sockfd_put(sock); + goto out_unlock; + } + + xdp_get_umem(umem_xs->umem); + old_umem = xs->umem; + xs->umem = umem_xs->umem; + sockfd_put(sock); + } else if (!xs->umem || !xdp_umem_validate_queues(xs->umem)) { + err = -EINVAL; + goto out_unlock; + } + + /* Rebind? 
 */
+	if (dev_curr && (dev_curr != dev ||
+			 xs->queue_id != sxdp->sxdp_queue_id)) {
+		__xsk_release(xs);
+		if (old_umem)
+			xdp_put_umem(old_umem);
+	}
+
+	xs->dev = dev;
+	xs->queue_id = sxdp->sxdp_queue_id;
+
+	xskq_set_umem(xs->rx, &xs->umem->props);
+
+out_unlock:
+	if (err)
+		dev_put(dev);
+out_release:
+	mutex_unlock(&xs->mutex);
+	return err;
+}
+
 static int xsk_setsockopt(struct socket *sock, int level, int optname,
 			  char __user *optval, unsigned int optlen)
 {
@@ -203,7 +325,7 @@ static const struct proto_ops xsk_proto_ops = {
 	.family		= PF_XDP,
 	.owner		= THIS_MODULE,
 	.release	= xsk_release,
-	.bind		= sock_no_bind,
+	.bind		= xsk_bind,
 	.connect	= sock_no_connect,
 	.socketpair	= sock_no_socketpair,
 	.accept		= sock_no_accept,
diff --git a/net/xdp/xsk_queue.c b/net/xdp/xsk_queue.c
index 894f9f89afc7..d012e5e23591 100644
--- a/net/xdp/xsk_queue.c
+++ b/net/xdp/xsk_queue.c
@@ -16,6 +16,14 @@
 
 #include "xsk_queue.h"
 
+void xskq_set_umem(struct xsk_queue *q, struct xdp_umem_props *umem_props)
+{
+	if (!q)
+		return;
+
+	q->umem_props = *umem_props;
+}
+
 static u32 xskq_umem_get_ring_size(struct xsk_queue *q)
 {
 	return sizeof(struct xdp_umem_ring) + q->nentries * sizeof(u32);
 }
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index 5439fa381763..9ddd2ee07a84 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -32,6 +32,7 @@ struct xsk_queue {
 	u64 invalid_descs;
 };
 
+void xskq_set_umem(struct xsk_queue *q, struct xdp_umem_props *umem_props);
 struct xsk_queue *xskq_create(u32 nentries, bool umem_queue);
 void xskq_destroy(struct xsk_queue *q);

From patchwork Mon Nov 2 12:48:47 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: William Breathitt Gray
X-Patchwork-Id: 1392243
From: William Breathitt Gray
To: kernel-team@lists.ubuntu.com
Subject: [SRU][B:linux-azure-4.15][PATCH 31/40] xsk: add Rx receive functions and poll support
Date: Mon, 2 Nov 2020 07:48:47 -0500
Message-Id: <20201102124856.4659-32-william.gray@canonical.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20201102124856.4659-1-william.gray@canonical.com>
References: <20201102124856.4659-1-william.gray@canonical.com>
List-Id: Kernel team discussions
Sender: "kernel-team"

From: Björn Töpel

BugLink: https://bugs.launchpad.net/bugs/1877654

Here the actual receive functions of AF_XDP are implemented; in a later
commit they will be called from the XDP layers. There is one set of
functions for the XDP_DRV side and another for XDP_SKB (generic).

A new XDP API, xdp_return_buff, is also introduced. It is analogous to
xdp_return_frame, but acts upon a struct xdp_buff; the API will be used
by AF_XDP in future commits.

Support for the poll syscall is also implemented.

v2: xskq_validate_id did not update cons_tail.
    The entries variable was calculated twice in xskq_nb_avail.
    Squashed the xdp_return_buff commit.
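[ editor's note: for orientation, this is roughly how a userspace
consumer drains the new Rx ring once poll() reports POLLIN. It is an
illustrative sketch, not part of the patch: handle_frame() is a
placeholder, nentries must be the power-of-two ring size, and real code
additionally needs acquire/release ordering on the producer/consumer
words to pair with the kernel's smp_rmb()/smp_wmb() in the hunks
below. ]

/* Sketch: consume Rx descriptors produced by xskq_produce_flush_desc(). */
#include <poll.h>
#include <linux/if_xdp.h>

extern void handle_frame(__u32 idx, __u16 offset, __u32 len); /* placeholder */

static __u32 cached_cons;	/* local copy of the ring's consumer index */

static void rx_drain(int xsk_fd, struct xdp_rxtx_ring *rx, __u32 nentries)
{
	struct pollfd pfd = { .fd = xsk_fd, .events = POLLIN };

	if (poll(&pfd, 1, -1) <= 0)
		return;

	/* the kernel advances ptrs.producer when it flushes descriptors */
	while (cached_cons != rx->ptrs.producer) {
		struct xdp_desc *d = &rx->desc[cached_cons & (nentries - 1)];

		/* frame payload is in the umem frame d->idx at d->offset */
		handle_frame(d->idx, d->offset, d->len);
		cached_cons++;
	}
	rx->ptrs.consumer = cached_cons;	/* release slots for reuse */
}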
Signed-off-by: Björn Töpel Signed-off-by: Alexei Starovoitov (cherry picked from commit c497176cb2e478f0a5713b0e05f242276e3194b5) Signed-off-by: William Breathitt Gray --- include/net/xdp.h | 1 + include/net/xdp_sock.h | 22 ++++++++ net/core/xdp.c | 15 ++++-- net/xdp/xdp_umem.h | 18 +++++++ net/xdp/xsk.c | 73 +++++++++++++++++++++++++- net/xdp/xsk_queue.h | 114 ++++++++++++++++++++++++++++++++++++++++- 6 files changed, 238 insertions(+), 5 deletions(-) diff --git a/include/net/xdp.h b/include/net/xdp.h index 137ad5f9f40f..0b689cf561c7 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -104,6 +104,7 @@ struct xdp_frame *convert_to_xdp_frame(struct xdp_buff *xdp) } void xdp_return_frame(struct xdp_frame *xdpf); +void xdp_return_buff(struct xdp_buff *xdp); int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq, struct net_device *dev, u32 queue_index); diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h index 85d02512f59b..a0342dff6a4d 100644 --- a/include/net/xdp_sock.h +++ b/include/net/xdp_sock.h @@ -31,6 +31,28 @@ struct xdp_sock { u16 queue_id; /* Protects multiple processes in the control path */ struct mutex mutex; + u64 rx_dropped; }; +struct xdp_buff; +#ifdef CONFIG_XDP_SOCKETS +int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp); +int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp); +void xsk_flush(struct xdp_sock *xs); +#else +static inline int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) +{ + return -ENOTSUPP; +} + +static inline int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) +{ + return -ENOTSUPP; +} + +static inline void xsk_flush(struct xdp_sock *xs) +{ +} +#endif /* CONFIG_XDP_SOCKETS */ + #endif /* _LINUX_XDP_SOCK_H */ diff --git a/net/core/xdp.c b/net/core/xdp.c index 0c86b53a3a63..bf6758f74339 100644 --- a/net/core/xdp.c +++ b/net/core/xdp.c @@ -308,11 +308,9 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq, } EXPORT_SYMBOL_GPL(xdp_rxq_info_reg_mem_model); -void xdp_return_frame(struct xdp_frame *xdpf) +static void xdp_return(void *data, struct xdp_mem_info *mem) { - struct xdp_mem_info *mem = &xdpf->mem; struct xdp_mem_allocator *xa; - void *data = xdpf->data; struct page *page; switch (mem->type) { @@ -339,4 +337,15 @@ void xdp_return_frame(struct xdp_frame *xdpf) break; } } + +void xdp_return_frame(struct xdp_frame *xdpf) +{ + xdp_return(xdpf->data, &xdpf->mem); +} EXPORT_SYMBOL_GPL(xdp_return_frame); + +void xdp_return_buff(struct xdp_buff *xdp) +{ + xdp_return(xdp->data, &xdp->rxq->mem); +} +EXPORT_SYMBOL_GPL(xdp_return_buff); diff --git a/net/xdp/xdp_umem.h b/net/xdp/xdp_umem.h index b13133e9c501..c7378a11721f 100644 --- a/net/xdp/xdp_umem.h +++ b/net/xdp/xdp_umem.h @@ -39,6 +39,24 @@ struct xdp_umem { struct work_struct work; }; +static inline char *xdp_umem_get_data(struct xdp_umem *umem, u32 idx) +{ + u64 pg, off; + char *data; + + pg = idx >> umem->nfpplog2; + off = (idx & umem->nfpp_mask) << umem->frame_size_log2; + + data = page_address(umem->pgs[pg]); + return data + off; +} + +static inline char *xdp_umem_get_data_with_headroom(struct xdp_umem *umem, + u32 idx) +{ + return xdp_umem_get_data(umem, idx) + umem->frame_headroom; +} + bool xdp_umem_validate_queues(struct xdp_umem *umem); int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr); void xdp_get_umem(struct xdp_umem *umem); diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index bf2c97b87992..4e1e6c581e1d 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -41,6 +41,74 @@ static struct xdp_sock *xdp_sk(struct sock *sk) return (struct xdp_sock 
*)sk; } +static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) +{ + u32 *id, len = xdp->data_end - xdp->data; + void *buffer; + int err = 0; + + if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index) + return -EINVAL; + + id = xskq_peek_id(xs->umem->fq); + if (!id) + return -ENOSPC; + + buffer = xdp_umem_get_data_with_headroom(xs->umem, *id); + memcpy(buffer, xdp->data, len); + err = xskq_produce_batch_desc(xs->rx, *id, len, + xs->umem->frame_headroom); + if (!err) + xskq_discard_id(xs->umem->fq); + + return err; +} + +int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) +{ + int err; + + err = __xsk_rcv(xs, xdp); + if (likely(!err)) + xdp_return_buff(xdp); + else + xs->rx_dropped++; + + return err; +} + +void xsk_flush(struct xdp_sock *xs) +{ + xskq_produce_flush_desc(xs->rx); + xs->sk.sk_data_ready(&xs->sk); +} + +int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp) +{ + int err; + + err = __xsk_rcv(xs, xdp); + if (!err) + xsk_flush(xs); + else + xs->rx_dropped++; + + return err; +} + +static unsigned int xsk_poll(struct file *file, struct socket *sock, + struct poll_table_struct *wait) +{ + unsigned int mask = datagram_poll(file, sock, wait); + struct sock *sk = sock->sk; + struct xdp_sock *xs = xdp_sk(sk); + + if (xs->rx && !xskq_empty_desc(xs->rx)) + mask |= POLLIN | POLLRDNORM; + + return mask; +} + static int xsk_init_queue(u32 entries, struct xsk_queue **queue, bool umem_queue) { @@ -179,6 +247,9 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len) } else if (!xs->umem || !xdp_umem_validate_queues(xs->umem)) { err = -EINVAL; goto out_unlock; + } else { + /* This xsk has its own umem. */ + xskq_set_umem(xs->umem->fq, &xs->umem->props); } /* Rebind? */ @@ -330,7 +401,7 @@ static const struct proto_ops xsk_proto_ops = { .socketpair = sock_no_socketpair, .accept = sock_no_accept, .getname = sock_no_getname, - .poll = sock_no_poll, + .poll = xsk_poll, .ioctl = sock_no_ioctl, .listen = sock_no_listen, .shutdown = sock_no_shutdown, diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h index 9ddd2ee07a84..0a9b92b4f93a 100644 --- a/net/xdp/xsk_queue.h +++ b/net/xdp/xsk_queue.h @@ -20,6 +20,8 @@ #include "xdp_umem_props.h" +#define RX_BATCH_SIZE 16 + struct xsk_queue { struct xdp_umem_props umem_props; u32 ring_mask; @@ -32,8 +34,118 @@ struct xsk_queue { u64 invalid_descs; }; +/* Common functions operating for both RXTX and umem queues */ + +static inline u32 xskq_nb_avail(struct xsk_queue *q, u32 dcnt) +{ + u32 entries = q->prod_tail - q->cons_tail; + + if (entries == 0) { + /* Refresh the local pointer */ + q->prod_tail = READ_ONCE(q->ring->producer); + entries = q->prod_tail - q->cons_tail; + } + + return (entries > dcnt) ? 
dcnt : entries;
+}
+
+static inline u32 xskq_nb_free(struct xsk_queue *q, u32 producer, u32 dcnt)
+{
+	u32 free_entries = q->nentries - (producer - q->cons_tail);
+
+	if (free_entries >= dcnt)
+		return free_entries;
+
+	/* Refresh the local tail pointer */
+	q->cons_tail = READ_ONCE(q->ring->consumer);
+	return q->nentries - (producer - q->cons_tail);
+}
+
+/* UMEM queue */
+
+static inline bool xskq_is_valid_id(struct xsk_queue *q, u32 idx)
+{
+	if (unlikely(idx >= q->umem_props.nframes)) {
+		q->invalid_descs++;
+		return false;
+	}
+	return true;
+}
+
+static inline u32 *xskq_validate_id(struct xsk_queue *q)
+{
+	while (q->cons_tail != q->cons_head) {
+		struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
+		unsigned int idx = q->cons_tail & q->ring_mask;
+
+		if (xskq_is_valid_id(q, ring->desc[idx]))
+			return &ring->desc[idx];
+
+		q->cons_tail++;
+	}
+
+	return NULL;
+}
+
+static inline u32 *xskq_peek_id(struct xsk_queue *q)
+{
+	struct xdp_umem_ring *ring;
+
+	if (q->cons_tail == q->cons_head) {
+		WRITE_ONCE(q->ring->consumer, q->cons_tail);
+		q->cons_head = q->cons_tail + xskq_nb_avail(q, RX_BATCH_SIZE);
+
+		/* Order consumer and data */
+		smp_rmb();
+
+		return xskq_validate_id(q);
+	}
+
+	ring = (struct xdp_umem_ring *)q->ring;
+	return &ring->desc[q->cons_tail & q->ring_mask];
+}
+
+static inline void xskq_discard_id(struct xsk_queue *q)
+{
+	q->cons_tail++;
+	(void)xskq_validate_id(q);
+}
+
+/* Rx queue */
+
+static inline int xskq_produce_batch_desc(struct xsk_queue *q,
+					  u32 id, u32 len, u16 offset)
+{
+	struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
+	unsigned int idx;
+
+	if (xskq_nb_free(q, q->prod_head, 1) == 0)
+		return -ENOSPC;
+
+	idx = (q->prod_head++) & q->ring_mask;
+	ring->desc[idx].idx = id;
+	ring->desc[idx].len = len;
+	ring->desc[idx].offset = offset;
+
+	return 0;
+}
+
+static inline void xskq_produce_flush_desc(struct xsk_queue *q)
+{
+	/* Order producer and data */
+	smp_wmb();
+
+	q->prod_tail = q->prod_head,
+	WRITE_ONCE(q->ring->producer, q->prod_tail);
+}
+
+static inline bool xskq_empty_desc(struct xsk_queue *q)
+{
+	return (xskq_nb_free(q, q->prod_tail, 1) == q->nentries);
+}
+
 void xskq_set_umem(struct xsk_queue *q, struct xdp_umem_props *umem_props);
 struct xsk_queue *xskq_create(u32 nentries, bool umem_queue);
-void xskq_destroy(struct xsk_queue *q);
+void xskq_destroy(struct xsk_queue *q_ops);
 
 #endif /* _LINUX_XSK_QUEUE_H */

From patchwork Mon Nov 2 12:48:48 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: William Breathitt Gray
X-Patchwork-Id: 1392244
From: William Breathitt Gray
To: kernel-team@lists.ubuntu.com
Subject: [SRU][B:linux-azure-4.15][PATCH 32/40] bpf: devmap introduce dev_map_enqueue
Date: Mon, 2 Nov 2020 07:48:48 -0500
Message-Id: <20201102124856.4659-33-william.gray@canonical.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20201102124856.4659-1-william.gray@canonical.com>
References: <20201102124856.4659-1-william.gray@canonical.com>
List-Id: Kernel team discussions
Sender: "kernel-team"

From: Jesper Dangaard Brouer

BugLink: https://bugs.launchpad.net/bugs/1877654

Functionality is the same, but the ndo_xdp_xmit call is now simply
invoked from inside the devmap.c code.
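[ editor's note: as an illustration (not part of the patch), the path
that ends in dev_map_enqueue() is driven by an XDP program redirecting
into a DEVMAP. A minimal such program is sketched below; map and
section conventions follow the samples/bpf style of this era, and the
helper header name varies by tree. ]

/* Sketch: redirect every packet through slot 0 of a DEVMAP. */
#include <linux/bpf.h>
#include "bpf_helpers.h"	/* header name assumed; tree-dependent */

struct bpf_map_def SEC("maps") tx_port = {
	.type = BPF_MAP_TYPE_DEVMAP,
	.key_size = sizeof(int),
	.value_size = sizeof(int),
	.max_entries = 64,
};

SEC("xdp_redirect_map")
int xdp_redirect_map_prog(struct xdp_md *ctx)
{
	/* slot 0 is expected to hold an egress ifindex, set from userspace;
	 * the redirect ultimately funnels the frame into dev_map_enqueue()
	 */
	return bpf_redirect_map(&tx_port, 0, 0);
}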
V2: Fix compile issue reported by kbuild test robot V5: Cleanups requested by Daniel - Newlines before func definition - Use BUILD_BUG_ON checks - Remove unnecessary use return value store in dev_map_enqueue Signed-off-by: Jesper Dangaard Brouer Signed-off-by: Alexei Starovoitov (backported commit 67f29e07e131ffa13ea158c259a513f474c7df27) [ vilhelmgray: context adjustments ] Signed-off-by: William Breathitt Gray --- include/linux/bpf.h | 16 +++++++++++++--- include/trace/events/xdp.h | 9 ++++++++- kernel/bpf/devmap.c | 34 ++++++++++++++++++++++++++++------ net/core/filter.c | 15 ++------------- 4 files changed, 51 insertions(+), 23 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index e5b17218e70d..788f881a5d99 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -413,14 +413,16 @@ static inline void bpf_long_memcpy(void *dst, const void *src, u32 size) int bpf_check(struct bpf_prog **fp, union bpf_attr *attr); /* Map specifics */ -struct net_device *__dev_map_lookup_elem(struct bpf_map *map, u32 key); +struct xdp_buff; + +struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key); void __dev_map_insert_ctx(struct bpf_map *map, u32 index); void __dev_map_flush(struct bpf_map *map); +int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp); struct bpf_cpu_map_entry *__cpu_map_lookup_elem(struct bpf_map *map, u32 key); void __cpu_map_insert_ctx(struct bpf_map *map, u32 index); void __cpu_map_flush(struct bpf_map *map); -struct xdp_buff; int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_buff *xdp, struct net_device *dev_rx); @@ -499,6 +501,15 @@ static inline void __dev_map_flush(struct bpf_map *map) { } +struct xdp_buff; +struct bpf_dtab_netdev; + +static inline +int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp) +{ + return 0; +} + static inline struct bpf_cpu_map_entry *__cpu_map_lookup_elem(struct bpf_map *map, u32 key) { @@ -513,7 +524,6 @@ static inline void __cpu_map_flush(struct bpf_map *map) { } -struct xdp_buff; static inline int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_buff *xdp, struct net_device *dev_rx) diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h index 8989a92c571a..96104610d40e 100644 --- a/include/trace/events/xdp.h +++ b/include/trace/events/xdp.h @@ -138,11 +138,18 @@ DEFINE_EVENT_PRINT(xdp_redirect_template, xdp_redirect_map_err, __entry->map_id, __entry->map_index) ); +#ifndef __DEVMAP_OBJ_TYPE +#define __DEVMAP_OBJ_TYPE +struct _bpf_dtab_netdev { + struct net_device *dev; +}; +#endif /* __DEVMAP_OBJ_TYPE */ + #define devmap_ifindex(fwd, map) \ (!fwd ? 0 : \ (!map ? 0 : \ ((map->map_type == BPF_MAP_TYPE_DEVMAP) ? \ - ((struct net_device *)fwd)->ifindex : 0))) + ((struct _bpf_dtab_netdev *)fwd)->dev->ifindex : 0))) #define _trace_xdp_redirect_map(dev, xdp, fwd, map, idx) \ trace_xdp_redirect_map(dev, xdp, devmap_ifindex(fwd, map), \ diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c index 91b1fbe18774..99240806ef71 100644 --- a/kernel/bpf/devmap.c +++ b/kernel/bpf/devmap.c @@ -48,13 +48,15 @@ * calls will fail at this point. 
 */
 #include
+#include
 #include
+#include
 
 #define DEV_CREATE_FLAG_MASK \
 	(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
 
 struct bpf_dtab_netdev {
-	struct net_device *dev;
+	struct net_device *dev; /* must be first member, due to tracepoint */
 	struct bpf_dtab *dtab;
 	unsigned int bit;
 	struct rcu_head rcu;
@@ -250,21 +252,38 @@ void __dev_map_flush(struct bpf_map *map)
 * update happens in parallel here a dev_put wont happen until after reading the
 * ifindex.
 */
-struct net_device *__dev_map_lookup_elem(struct bpf_map *map, u32 key)
+struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key)
 {
 	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
-	struct bpf_dtab_netdev *dev;
+	struct bpf_dtab_netdev *obj;
 
 	if (key >= map->max_entries)
 		return NULL;
 
-	dev = READ_ONCE(dtab->netdev_map[key]);
-	return dev ? dev->dev : NULL;
+	obj = READ_ONCE(dtab->netdev_map[key]);
+	return obj;
+}
+
+int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp)
+{
+	struct net_device *dev = dst->dev;
+	struct xdp_frame *xdpf;
+
+	if (!dev->netdev_ops->ndo_xdp_xmit)
+		return -EOPNOTSUPP;
+
+	xdpf = convert_to_xdp_frame(xdp);
+	if (unlikely(!xdpf))
+		return -EOVERFLOW;
+
+	/* TODO: implement a bulking/enqueue step later */
+	return dev->netdev_ops->ndo_xdp_xmit(dev, xdpf);
 }
 
 static void *dev_map_lookup_elem(struct bpf_map *map, void *key)
 {
-	struct net_device *dev = __dev_map_lookup_elem(map, *(u32 *)key);
+	struct bpf_dtab_netdev *obj = __dev_map_lookup_elem(map, *(u32 *)key);
+	struct net_device *dev = dev = obj ? obj->dev : NULL;
 
 	return dev ? &dev->ifindex : NULL;
 }
@@ -414,6 +433,9 @@ static struct notifier_block dev_map_notifier = {
 
 static int __init dev_map_init(void)
 {
+	/* Assure tracepoint shadow struct _bpf_dtab_netdev is in sync */
+	BUILD_BUG_ON(offsetof(struct bpf_dtab_netdev, dev) !=
+		     offsetof(struct _bpf_dtab_netdev, dev));
 	register_netdevice_notifier(&dev_map_notifier);
 	return 0;
 }
diff --git a/net/core/filter.c b/net/core/filter.c
index 24766051dfd6..c660f84b8067 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2550,20 +2550,9 @@ static int __bpf_tx_xdp_map(struct net_device *dev_rx, void *fwd,
 	int err;
 
 	if (map->map_type == BPF_MAP_TYPE_DEVMAP) {
-		struct net_device *dev = fwd;
-		struct xdp_frame *xdpf;
+		struct bpf_dtab_netdev *dst = fwd;
 
-		if (!dev->netdev_ops->ndo_xdp_xmit)
-			return -EOPNOTSUPP;
-
-		xdpf = convert_to_xdp_frame(xdp);
-		if (unlikely(!xdpf))
-			return -EOVERFLOW;
-
-		/* TODO: move to inside map code instead, for bulk support
-		 * err = dev_map_enqueue(dev, xdp);
-		 */
-		err = dev->netdev_ops->ndo_xdp_xmit(dev, xdpf);
+		err = dev_map_enqueue(dst, xdp);
 		if (err)
 			return err;
 		__dev_map_insert_ctx(map, index);

From patchwork Mon Nov 2 12:48:49 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: William Breathitt Gray
X-Patchwork-Id: 1392246
From: William Breathitt Gray
To: kernel-team@lists.ubuntu.com
Subject: [SRU][B:linux-azure-4.15][PATCH 33/40] bpf: devmap prepare xdp frames for bulking
Date: Mon, 2 Nov 2020 07:48:49 -0500
Message-Id: <20201102124856.4659-34-william.gray@canonical.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20201102124856.4659-1-william.gray@canonical.com>
References: <20201102124856.4659-1-william.gray@canonical.com>
List-Id: Kernel team discussions
Sender: "kernel-team"

From: Jesper Dangaard Brouer

BugLink: https://bugs.launchpad.net/bugs/1877654

Like cpumap, create a queue for xdp frames that will be bulked. For
now, this patch simply invokes ndo_xdp_xmit for each frame. This
happens either when the map flush operation is invoked, or when the
limit DEV_MAP_BULK_SIZE is reached.
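[ editor's note: a minimal, self-contained model of the bulking scheme
described above, not the patch code itself. Frames accumulate in a
small per-CPU array and are pushed out in one burst, either when the
array fills or when the map flush operation runs; the xmit callback
stands in for ndo_xdp_xmit. ]

#define BULK_SIZE 16			/* mirrors DEV_MAP_BULK_SIZE */

struct bulk_queue {
	void *q[BULK_SIZE];
	unsigned int count;
};

static void bq_flush(struct bulk_queue *bq, int (*xmit)(void *frame))
{
	unsigned int i;

	for (i = 0; i < bq->count; i++)
		xmit(bq->q[i]);		/* the real code frees on error */
	bq->count = 0;
}

static void bq_add(struct bulk_queue *bq, void *frame,
		   int (*xmit)(void *frame))
{
	if (bq->count == BULK_SIZE)
		bq_flush(bq, xmit);	/* batch full: drain before queueing */
	bq->q[bq->count++] = frame;
}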
V5: Avoid memleak on error path in dev_map_update_elem() Signed-off-by: Jesper Dangaard Brouer Signed-off-by: Alexei Starovoitov (backported from commit 5d053f9da4311a86bc58be8588bb5660fb3f0724) [ vilhelmgray: context adjustment ] Signed-off-by: William Breathitt Gray --- kernel/bpf/devmap.c | 73 ++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 69 insertions(+), 4 deletions(-) diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c index 99240806ef71..7606c542b643 100644 --- a/kernel/bpf/devmap.c +++ b/kernel/bpf/devmap.c @@ -55,10 +55,17 @@ #define DEV_CREATE_FLAG_MASK \ (BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY) +#define DEV_MAP_BULK_SIZE 16 +struct xdp_bulk_queue { + struct xdp_frame *q[DEV_MAP_BULK_SIZE]; + unsigned int count; +}; + struct bpf_dtab_netdev { struct net_device *dev; /* must be first member, due to tracepoint */ struct bpf_dtab *dtab; unsigned int bit; + struct xdp_bulk_queue __percpu *bulkq; struct rcu_head rcu; }; @@ -217,6 +224,34 @@ void __dev_map_insert_ctx(struct bpf_map *map, u32 bit) __set_bit(bit, bitmap); } +static int bq_xmit_all(struct bpf_dtab_netdev *obj, + struct xdp_bulk_queue *bq) +{ + struct net_device *dev = obj->dev; + int i; + + if (unlikely(!bq->count)) + return 0; + + for (i = 0; i < bq->count; i++) { + struct xdp_frame *xdpf = bq->q[i]; + + prefetch(xdpf); + } + + for (i = 0; i < bq->count; i++) { + struct xdp_frame *xdpf = bq->q[i]; + int err; + + err = dev->netdev_ops->ndo_xdp_xmit(dev, xdpf); + if (err) + xdp_return_frame(xdpf); + } + bq->count = 0; + + return 0; +} + /* __dev_map_flush is called from xdp_do_flush_map() which _must_ be signaled * from the driver before returning from its napi->poll() routine. The poll() * routine is called either from busy_poll context or net_rx_action signaled @@ -232,6 +267,7 @@ void __dev_map_flush(struct bpf_map *map) for_each_set_bit(bit, bitmap, map->max_entries) { struct bpf_dtab_netdev *dev = READ_ONCE(dtab->netdev_map[bit]); + struct xdp_bulk_queue *bq; struct net_device *netdev; /* This is possible if the dev entry is removed by user space @@ -240,6 +276,8 @@ void __dev_map_flush(struct bpf_map *map) if (unlikely(!dev)) continue; + bq = this_cpu_ptr(dev->bulkq); + bq_xmit_all(dev, bq); netdev = dev->dev; if (likely(netdev->netdev_ops->ndo_xdp_flush)) netdev->netdev_ops->ndo_xdp_flush(netdev); @@ -264,6 +302,20 @@ struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key) return obj; } +/* Runs under RCU-read-side, plus in softirq under NAPI protection. + * Thus, safe percpu variable access. 
+ */
+static int bq_enqueue(struct bpf_dtab_netdev *obj, struct xdp_frame *xdpf)
+{
+	struct xdp_bulk_queue *bq = this_cpu_ptr(obj->bulkq);
+
+	if (unlikely(bq->count == DEV_MAP_BULK_SIZE))
+		bq_xmit_all(obj, bq);
+
+	bq->q[bq->count++] = xdpf;
+	return 0;
+}
+
 int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp)
 {
 	struct net_device *dev = dst->dev;
@@ -276,8 +328,7 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp)
 	if (unlikely(!xdpf))
 		return -EOVERFLOW;
 
-	/* TODO: implement a bulking/enqueue step later */
-	return dev->netdev_ops->ndo_xdp_xmit(dev, xdpf);
+	return bq_enqueue(dst, xdpf);
 }
 
 static void *dev_map_lookup_elem(struct bpf_map *map, void *key)
@@ -292,13 +343,18 @@ static void dev_map_flush_old(struct bpf_dtab_netdev *dev)
 {
 	if (dev->dev->netdev_ops->ndo_xdp_flush) {
 		struct net_device *fl = dev->dev;
+		struct xdp_bulk_queue *bq;
 		unsigned long *bitmap;
+		int cpu;
 
 		for_each_online_cpu(cpu) {
 			bitmap = per_cpu_ptr(dev->dtab->flush_needed, cpu);
 			__clear_bit(dev->bit, bitmap);
 
+			bq = per_cpu_ptr(dev->bulkq, cpu);
+			bq_xmit_all(dev, bq);
+
 			fl->netdev_ops->ndo_xdp_flush(dev->dev);
 		}
 	}
@@ -310,6 +366,7 @@ static void __dev_map_entry_free(struct rcu_head *rcu)
 
 	dev = container_of(rcu, struct bpf_dtab_netdev, rcu);
 	dev_map_flush_old(dev);
+	free_percpu(dev->bulkq);
 	dev_put(dev->dev);
 	kfree(dev);
 }
@@ -342,6 +399,7 @@ static int dev_map_update_elem(struct bpf_map *map, void *key, void *value,
 {
 	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
 	struct net *net = current->nsproxy->net_ns;
+	gfp_t gfp = GFP_ATOMIC | __GFP_NOWARN;
 	struct bpf_dtab_netdev *dev, *old_dev;
 	u32 i = *(u32 *)key;
 	u32 ifindex = *(u32 *)value;
@@ -356,13 +414,20 @@ static int dev_map_update_elem(struct bpf_map *map, void *key, void *value,
 	if (!ifindex) {
 		dev = NULL;
 	} else {
-		dev = kmalloc_node(sizeof(*dev), GFP_ATOMIC | __GFP_NOWARN,
-				   map->numa_node);
+		dev = kmalloc_node(sizeof(*dev), gfp, map->numa_node);
 		if (!dev)
 			return -ENOMEM;
 
+		dev->bulkq = __alloc_percpu_gfp(sizeof(*dev->bulkq),
+						sizeof(void *), gfp);
+		if (!dev->bulkq) {
+			kfree(dev);
+			return -ENOMEM;
+		}
+
 		dev->dev = dev_get_by_index(net, ifindex);
 		if (!dev->dev) {
+			free_percpu(dev->bulkq);
 			kfree(dev);
 			return -EINVAL;
 		}

From patchwork Mon Nov 2 12:48:50 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: William Breathitt Gray
X-Patchwork-Id: 1392245
From: William Breathitt Gray
To: kernel-team@lists.ubuntu.com
Subject: [SRU][B:linux-azure-4.15][PATCH 34/40] xdp: introduce xdp_return_frame_rx_napi
Date: Mon, 2 Nov 2020 07:48:50 -0500
Message-Id: <20201102124856.4659-35-william.gray@canonical.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20201102124856.4659-1-william.gray@canonical.com>
References: <20201102124856.4659-1-william.gray@canonical.com>
List-Id: Kernel team discussions
Sender: "kernel-team"

From: Jesper Dangaard Brouer

BugLink: https://bugs.launchpad.net/bugs/1877654

When sending an xdp_frame through an xdp_do_redirect call, error cases
can occur where the xdp_frame needs to be dropped, and returning an
-errno code is no longer sufficient or possible (e.g. in the cpumap
case). This is already fully supported by simply calling
xdp_return_frame.

This patch is an optimization: it provides xdp_return_frame_rx_napi, a
faster variant for these error cases. It takes advantage of the
protection provided by XDP RX running under NAPI.

This change is mostly relevant for drivers using the page_pool
allocator, as they can take advantage of it. (Tested with mlx5).
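[ editor's note: a typical call site looks like the following. This is
an illustrative sketch, not taken from the patch; the surrounding
driver helper is hypothetical, but it uses only the single-frame
ndo_xdp_xmit signature current at this point in the series. ]

/* Hypothetical driver helper running in NAPI poll context. */
#include <linux/netdevice.h>
#include <net/xdp.h>

static void xmit_or_recycle(struct net_device *dev, struct xdp_frame *xdpf)
{
	/* On failure we are still under NAPI protection, so the _rx_napi
	 * variant may recycle the page straight into the page_pool's
	 * direct cache instead of taking the slower shared path.
	 */
	if (dev->netdev_ops->ndo_xdp_xmit(dev, xdpf))
		xdp_return_frame_rx_napi(xdpf);
}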
Signed-off-by: Jesper Dangaard Brouer Signed-off-by: Alexei Starovoitov (backported from commit 389ab7f01af988c2a1ec5617eb0c7e220df1ef1c) [ vilhelmgray: context adjustment ] Signed-off-by: William Breathitt Gray --- include/net/page_pool.h | 5 +++-- include/net/xdp.h | 1 + kernel/bpf/cpumap.c | 2 +- kernel/bpf/devmap.c | 2 +- net/core/xdp.c | 20 ++++++++++++++++---- 5 files changed, 22 insertions(+), 8 deletions(-) diff --git a/include/net/page_pool.h b/include/net/page_pool.h index c79087153148..694d055e01ef 100644 --- a/include/net/page_pool.h +++ b/include/net/page_pool.h @@ -115,13 +115,14 @@ void page_pool_destroy(struct page_pool *pool); void __page_pool_put_page(struct page_pool *pool, struct page *page, bool allow_direct); -static inline void page_pool_put_page(struct page_pool *pool, struct page *page) +static inline void page_pool_put_page(struct page_pool *pool, + struct page *page, bool allow_direct) { /* When page_pool isn't compiled-in, net/core/xdp.c doesn't * allow registering MEM_TYPE_PAGE_POOL, but shield linker. */ #ifdef CONFIG_PAGE_POOL - __page_pool_put_page(pool, page, false); + __page_pool_put_page(pool, page, allow_direct); #endif } /* Very limited use-cases allow recycle direct */ diff --git a/include/net/xdp.h b/include/net/xdp.h index 0b689cf561c7..7ad779237ae8 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -104,6 +104,7 @@ struct xdp_frame *convert_to_xdp_frame(struct xdp_buff *xdp) } void xdp_return_frame(struct xdp_frame *xdpf); +void xdp_return_frame_rx_napi(struct xdp_frame *xdpf); void xdp_return_buff(struct xdp_buff *xdp); int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq, diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c index d96e3f114118..95d6c95b9b45 100644 --- a/kernel/bpf/cpumap.c +++ b/kernel/bpf/cpumap.c @@ -583,7 +583,7 @@ static int bq_flush_to_queue(struct bpf_cpu_map_entry *rcpu, err = __ptr_ring_produce(q, xdpf); if (err) { drops++; - xdp_return_frame(xdpf); + xdp_return_frame_rx_napi(xdpf); } processed++; } diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c index 7606c542b643..50ebd188fdb0 100644 --- a/kernel/bpf/devmap.c +++ b/kernel/bpf/devmap.c @@ -245,7 +245,7 @@ static int bq_xmit_all(struct bpf_dtab_netdev *obj, err = dev->netdev_ops->ndo_xdp_xmit(dev, xdpf); if (err) - xdp_return_frame(xdpf); + xdp_return_frame_rx_napi(xdpf); } bq->count = 0; diff --git a/net/core/xdp.c b/net/core/xdp.c index bf6758f74339..cb8c4e061a5a 100644 --- a/net/core/xdp.c +++ b/net/core/xdp.c @@ -308,7 +308,13 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq, } EXPORT_SYMBOL_GPL(xdp_rxq_info_reg_mem_model); -static void xdp_return(void *data, struct xdp_mem_info *mem) +/* XDP RX runs under NAPI protection, and in different delivery error + * scenarios (e.g. queue full), it is possible to return the xdp_frame + * while still leveraging this protection. The @napi_direct boolian + * is used for those calls sites. Thus, allowing for faster recycling + * of xdp_frames/pages in those cases. 
+ */
+static void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct)
 {
 	struct xdp_mem_allocator *xa;
 	struct page *page;
 
@@ -320,7 +326,7 @@ static void xdp_return(void *data, struct xdp_mem_info *mem)
 		xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params);
 		page = virt_to_head_page(data);
 		if (xa)
-			page_pool_put_page(xa->page_pool, page);
+			page_pool_put_page(xa->page_pool, page, napi_direct);
 		else
 			put_page(page);
 		rcu_read_unlock();
@@ -340,12 +346,18 @@ static void xdp_return(void *data, struct xdp_mem_info *mem)
 
 void xdp_return_frame(struct xdp_frame *xdpf)
 {
-	xdp_return(xdpf->data, &xdpf->mem);
+	__xdp_return(xdpf->data, &xdpf->mem, false);
 }
 EXPORT_SYMBOL_GPL(xdp_return_frame);
 
+void xdp_return_frame_rx_napi(struct xdp_frame *xdpf)
+{
+	__xdp_return(xdpf->data, &xdpf->mem, true);
+}
+EXPORT_SYMBOL_GPL(xdp_return_frame_rx_napi);
+
 void xdp_return_buff(struct xdp_buff *xdp)
 {
-	xdp_return(xdp->data, &xdp->rxq->mem);
+	__xdp_return(xdp->data, &xdp->rxq->mem, true);
 }
 EXPORT_SYMBOL_GPL(xdp_return_buff);

From patchwork Mon Nov 2 12:48:51 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: William Breathitt Gray
X-Patchwork-Id: 1392248
From: William Breathitt Gray
To: kernel-team@lists.ubuntu.com
Subject: [SRU][B:linux-azure-4.15][PATCH 35/40] xdp: add MEM_TYPE_ZERO_COPY
Date: Mon, 2 Nov 2020 07:48:51 -0500
Message-Id: <20201102124856.4659-36-william.gray@canonical.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20201102124856.4659-1-william.gray@canonical.com>
References: <20201102124856.4659-1-william.gray@canonical.com>
List-Id: Kernel team discussions
Sender: "kernel-team"

From: Björn Töpel

BugLink: https://bugs.launchpad.net/bugs/1877654

Here, a new type of allocator support is added to the XDP return API. A
zero-copy allocated xdp_buff cannot be converted to an xdp_frame.
Instead, the buff has to be copied; that is not supported at all in
this commit.

Also, an opaque "handle" is added to xdp_buff. This can be used as a
context for the zero-copy allocator implementation.

Signed-off-by: Björn Töpel
Signed-off-by: Daniel Borkmann
(cherry picked from commit 02b55e5657c3a569fc681ba851e464cfa6b90d4f)
Signed-off-by: William Breathitt Gray
---
 include/net/xdp.h | 10 ++++++++++
 net/core/xdp.c    | 19 ++++++++++++++-----
 2 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index 7ad779237ae8..715d82f0ee86 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -37,6 +37,7 @@ enum xdp_mem_type {
 	MEM_TYPE_PAGE_SHARED = 0, /* Split-page refcnt based model */
 	MEM_TYPE_PAGE_ORDER0,     /* Orig XDP full page model */
 	MEM_TYPE_PAGE_POOL,
+	MEM_TYPE_ZERO_COPY,
 	MEM_TYPE_MAX,
 };
 
@@ -47,6 +48,10 @@ struct xdp_mem_info {
 
 struct page_pool;
 
+struct zero_copy_allocator {
+	void (*free)(struct zero_copy_allocator *zca, unsigned long handle);
+};
+
 struct xdp_rxq_info {
 	struct net_device *dev;
 	u32 queue_index;
@@ -59,6 +64,7 @@ struct xdp_buff {
 	void *data_end;
 	void *data_meta;
 	void *data_hard_start;
+	unsigned long handle;
 	struct xdp_rxq_info *rxq;
 };
 
@@ -82,6 +88,10 @@ struct xdp_frame *convert_to_xdp_frame(struct xdp_buff *xdp)
 	int metasize;
 	int headroom;
 
+	/* TODO: implement clone, copy, use "native" MEM_TYPE */
+	if (xdp->rxq->mem.type == MEM_TYPE_ZERO_COPY)
+		return NULL;
+
 	/* Assure headroom is available for storing info */
 	headroom = xdp->data - xdp->data_hard_start;
 	metasize = xdp->data - xdp->data_meta;
diff --git a/net/core/xdp.c b/net/core/xdp.c
index cb8c4e061a5a..9d1f22072d5d 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -31,6 +31,7 @@ struct xdp_mem_allocator {
 	union {
 		void *allocator;
 		struct page_pool *page_pool;
+		struct zero_copy_allocator *zc_alloc;
 	};
 	struct rhash_head node;
 	struct rcu_head rcu;
@@ -261,7 +262,7 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
 	xdp_rxq->mem.type = type;
 
(!allocator) { - if (type == MEM_TYPE_PAGE_POOL) + if (type == MEM_TYPE_PAGE_POOL || type == MEM_TYPE_ZERO_COPY) return -EINVAL; /* Setup time check page_pool req */ return 0; } @@ -314,7 +315,8 @@ EXPORT_SYMBOL_GPL(xdp_rxq_info_reg_mem_model); * is used for those calls sites. Thus, allowing for faster recycling * of xdp_frames/pages in those cases. */ -static void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct) +static void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct, + unsigned long handle) { struct xdp_mem_allocator *xa; struct page *page; @@ -338,6 +340,13 @@ static void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct) page = virt_to_page(data); /* Assumes order0 page*/ put_page(page); break; + case MEM_TYPE_ZERO_COPY: + /* NB! Only valid from an xdp_buff! */ + rcu_read_lock(); + /* mem->id is valid, checked in xdp_rxq_info_reg_mem_model() */ + xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params); + xa->zc_alloc->free(xa->zc_alloc, handle); + rcu_read_unlock(); default: /* Not possible, checked in xdp_rxq_info_reg_mem_model() */ break; @@ -346,18 +355,18 @@ static void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct) void xdp_return_frame(struct xdp_frame *xdpf) { - __xdp_return(xdpf->data, &xdpf->mem, false); + __xdp_return(xdpf->data, &xdpf->mem, false, 0); } EXPORT_SYMBOL_GPL(xdp_return_frame); void xdp_return_frame_rx_napi(struct xdp_frame *xdpf) { - __xdp_return(xdpf->data, &xdpf->mem, true); + __xdp_return(xdpf->data, &xdpf->mem, true, 0); } EXPORT_SYMBOL_GPL(xdp_return_frame_rx_napi); void xdp_return_buff(struct xdp_buff *xdp) { - __xdp_return(xdp->data, &xdp->rxq->mem, true); + __xdp_return(xdp->data, &xdp->rxq->mem, true, xdp->handle); } EXPORT_SYMBOL_GPL(xdp_return_buff); From patchwork Mon Nov 2 12:48:52 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: William Breathitt Gray X-Patchwork-Id: 1392249 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=canonical.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4CPt7b41v9z9sWR; Mon, 2 Nov 2020 23:50:51 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1kZZIK-0007JV-3H; Mon, 02 Nov 2020 12:50:44 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1kZZHL-0006Ms-Ht for kernel-team@lists.ubuntu.com; Mon, 02 Nov 2020 12:49:43 +0000 Received: from mail-qv1-f72.google.com ([209.85.219.72]) by youngberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1kZZHH-00048n-IC for kernel-team@lists.ubuntu.com; Mon, 02 Nov 2020 12:49:39 +0000 Received: by mail-qv1-f72.google.com with SMTP id z9so8100087qvo.20 for ; Mon, 02 Nov 2020 04:49:39 -0800 (PST) X-Google-DKIM-Signature: v=1; 
From patchwork Mon Nov 2 12:48:52 2020
X-Patchwork-Submitter: William Breathitt Gray
X-Patchwork-Id: 1392249
From: William Breathitt Gray
To: kernel-team@lists.ubuntu.com
Subject: [SRU][B:linux-azure-4.15][PATCH 36/40] hv_netvsc: Add support for LRO/RSC in the vSwitch
Date: Mon, 2 Nov 2020 07:48:52 -0500
Message-Id: <20201102124856.4659-37-william.gray@canonical.com>
In-Reply-To: <20201102124856.4659-1-william.gray@canonical.com>
References: <20201102124856.4659-1-william.gray@canonical.com>

From: Haiyang Zhang

BugLink: https://bugs.launchpad.net/bugs/1877654

LRO/RSC in the vSwitch is a feature available in Windows Server 2019 hosts and later. It reduces the per packet processing overhead by coalescing multiple TCP segments when possible. This patch adds netvsc driver support for this feature.

Signed-off-by: Haiyang Zhang Signed-off-by: David S.
Miller (backported from commit c8e4eff4675f22ad1110141ed9e62102d4d77e1c) [ vilhelmgray: context adjustments ] [ vilhelmgray: remove drop incomplete packet from netvsc_receive() ] Signed-off-by: William Breathitt Gray --- drivers/net/hyperv/hyperv_net.h | 47 +++++++++++++--- drivers/net/hyperv/netvsc.c | 17 ++++-- drivers/net/hyperv/netvsc_drv.c | 28 +++++----- drivers/net/hyperv/rndis_filter.c | 93 ++++++++++++++++++++++++++----- 4 files changed, 146 insertions(+), 39 deletions(-) diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h index 5d1c91500df0..df27f4437b1e 100644 --- a/drivers/net/hyperv/hyperv_net.h +++ b/drivers/net/hyperv/hyperv_net.h @@ -188,6 +188,7 @@ struct rndis_device { /* Interface */ struct rndis_message; struct netvsc_device; +struct netvsc_channel; struct net_device_context; struct netvsc_device *netvsc_device_add(struct hv_device *device, @@ -203,10 +204,7 @@ void netvsc_linkstatus_callback(struct net_device *net, struct rndis_message *resp); int netvsc_recv_callback(struct net_device *net, struct netvsc_device *nvdev, - struct vmbus_channel *channel, - void *data, u32 len, - const struct ndis_tcp_ip_checksum_info *csum_info, - const struct ndis_pkt_8021q_info *vlan); + struct netvsc_channel *nvchan); void netvsc_channel_cb(void *context); int netvsc_poll(struct napi_struct *napi, int budget); @@ -222,7 +220,7 @@ int rndis_filter_set_rss_param(struct rndis_device *rdev, const u8 *key); int rndis_filter_receive(struct net_device *ndev, struct netvsc_device *net_dev, - struct vmbus_channel *channel, + struct netvsc_channel *nvchan, void *data, u32 buflen); int rndis_filter_set_device_mac(struct netvsc_device *ndev, @@ -524,6 +522,8 @@ struct nvsp_2_vsc_capability { u64 ieee8021q:1; u64 correlation_id:1; u64 teaming:1; + u64 vsubnetid:1; + u64 rsc:1; }; }; } __packed; @@ -827,7 +827,7 @@ struct nvsp_message { #define NETVSC_SUPPORTED_HW_FEATURES (NETIF_F_RXCSUM | NETIF_F_IP_CSUM | \ NETIF_F_TSO | NETIF_F_IPV6_CSUM | \ - NETIF_F_TSO6) + NETIF_F_TSO6 | NETIF_F_LRO) #define VRSS_SEND_TAB_SIZE 16 /* must be power of 2 */ #define VRSS_CHANNEL_MAX 64 @@ -853,6 +853,18 @@ struct multi_recv_comp { u32 next; /* next entry for writing */ }; +#define NVSP_RSC_MAX 562 /* Max #RSC frags in a vmbus xfer page pkt */ + +struct nvsc_rsc { + const struct ndis_pkt_8021q_info *vlan; + const struct ndis_tcp_ip_checksum_info *csum_info; + u8 is_last; /* last RNDIS msg in a vmtransfer_page */ + u32 cnt; /* #fragments in an RSC packet */ + u32 pktlen; /* Full packet length */ + void *data[NVSP_RSC_MAX]; + u32 len[NVSP_RSC_MAX]; +}; + struct netvsc_stats { u64 packets; u64 bytes; @@ -946,6 +958,7 @@ struct netvsc_channel { struct multi_send_data msd; struct multi_recv_comp mrc; atomic_t queue_sends; + struct nvsc_rsc rsc; struct netvsc_stats tx_stats; struct netvsc_stats rx_stats; @@ -1131,7 +1144,8 @@ struct rndis_oobd { /* Packet extension field contents associated with a Data message. 
*/ struct rndis_per_packet_info { u32 size; - u32 type; + u32 type:31; + u32 internal:1; u32 ppi_offset; }; @@ -1152,6 +1166,25 @@ enum ndis_per_pkt_info_type { MAX_PER_PKT_INFO }; +enum rndis_per_pkt_info_interal_type { + RNDIS_PKTINFO_ID = 1, + /* Add more memebers here */ + + RNDIS_PKTINFO_MAX +}; + +#define RNDIS_PKTINFO_SUBALLOC BIT(0) +#define RNDIS_PKTINFO_1ST_FRAG BIT(1) +#define RNDIS_PKTINFO_LAST_FRAG BIT(2) + +#define RNDIS_PKTINFO_ID_V1 1 + +struct rndis_pktinfo_id { + u8 ver; + u8 flag; + u16 pkt_id; +}; + struct ndis_pkt_8021q_info { union { struct { diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c index c8380964a04d..70a3484e7e38 100644 --- a/drivers/net/hyperv/netvsc.c +++ b/drivers/net/hyperv/netvsc.c @@ -529,6 +529,9 @@ static int negotiate_nvsp_ver(struct hv_device *device, init_packet->msg.v2_msg.send_ndis_config.capability.teaming = 1; } + if (nvsp_ver >= NVSP_PROTOCOL_VERSION_61) + init_packet->msg.v2_msg.send_ndis_config.capability.rsc = 1; + ret = vmbus_sendpacket(device->channel, init_packet, sizeof(struct nvsp_message), (unsigned long)init_packet, @@ -1113,11 +1116,12 @@ static void enq_receive_complete(struct net_device *ndev, static int netvsc_receive(struct net_device *ndev, struct netvsc_device *net_device, - struct vmbus_channel *channel, + struct netvsc_channel *nvchan, const struct vmpacket_descriptor *desc, const struct nvsp_message *nvsp) { struct net_device_context *net_device_ctx = netdev_priv(ndev); + struct vmbus_channel *channel = nvchan->channel; const struct vmtransfer_page_packet_header *vmxferpage_packet = container_of(desc, const struct vmtransfer_page_packet_header, d); u16 q_idx = channel->offermsg.offer.sub_channel_index; @@ -1151,9 +1155,11 @@ static int netvsc_receive(struct net_device *ndev, u32 buflen = vmxferpage_packet->ranges[i].byte_count; int ret; + nvchan->rsc.is_last = (i == count - 1); + /* Pass it to the upper layer */ ret = rndis_filter_receive(ndev, net_device, - channel, data, buflen); + nvchan, data, buflen); if (unlikely(ret != NVSP_STAT_SUCCESS)) status = NVSP_STAT_FAIL; @@ -1230,12 +1236,13 @@ static void netvsc_receive_inband(struct net_device *ndev, } static int netvsc_process_raw_pkt(struct hv_device *device, - struct vmbus_channel *channel, + struct netvsc_channel *nvchan, struct netvsc_device *net_device, struct net_device *ndev, const struct vmpacket_descriptor *desc, int budget) { + struct vmbus_channel *channel = nvchan->channel; const struct nvsp_message *nvmsg = hv_pkt_data(desc); u32 msglen = hv_pkt_datalen(desc); @@ -1246,7 +1253,7 @@ static int netvsc_process_raw_pkt(struct hv_device *device, break; case VM_PKT_DATA_USING_XFER_PAGES: - return netvsc_receive(ndev, net_device, channel, + return netvsc_receive(ndev, net_device, nvchan, desc, nvmsg); break; @@ -1290,7 +1297,7 @@ int netvsc_poll(struct napi_struct *napi, int budget) nvchan->desc = hv_pkt_iter_first(channel); while (nvchan->desc && work_done < budget) { - work_done += netvsc_process_raw_pkt(device, channel, net_device, + work_done += netvsc_process_raw_pkt(device, nvchan, net_device, ndev, nvchan->desc, budget); nvchan->desc = hv_pkt_iter_next(channel, nvchan->desc); } diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c index 3f1b3e6ae4aa..2c815614ce16 100644 --- a/drivers/net/hyperv/netvsc_drv.c +++ b/drivers/net/hyperv/netvsc_drv.c @@ -777,14 +777,16 @@ static void netvsc_comp_ipcsum(struct sk_buff *skb) } static struct sk_buff *netvsc_alloc_recv_skb(struct net_device *net, - struct napi_struct 
*napi, - const struct ndis_tcp_ip_checksum_info *csum_info, - const struct ndis_pkt_8021q_info *vlan, - void *data, u32 buflen) + struct netvsc_channel *nvchan) { + struct napi_struct *napi = &nvchan->napi; + const struct ndis_pkt_8021q_info *vlan = nvchan->rsc.vlan; + const struct ndis_tcp_ip_checksum_info *csum_info = + nvchan->rsc.csum_info; struct sk_buff *skb; + int i; - skb = napi_alloc_skb(napi, buflen); + skb = napi_alloc_skb(napi, nvchan->rsc.pktlen); if (!skb) return skb; @@ -792,7 +794,8 @@ static struct sk_buff *netvsc_alloc_recv_skb(struct net_device *net, * Copy to skb. This copy is needed here since the memory pointed by * hv_netvsc_packet cannot be deallocated */ - skb_put_data(skb, data, buflen); + for (i = 0; i < nvchan->rsc.cnt; i++) + skb_put_data(skb, nvchan->rsc.data[i], nvchan->rsc.len[i]); skb->protocol = eth_type_trans(skb, net); @@ -832,14 +835,11 @@ static struct sk_buff *netvsc_alloc_recv_skb(struct net_device *net, */ int netvsc_recv_callback(struct net_device *net, struct netvsc_device *net_device, - struct vmbus_channel *channel, - void *data, u32 len, - const struct ndis_tcp_ip_checksum_info *csum_info, - const struct ndis_pkt_8021q_info *vlan) + struct netvsc_channel *nvchan) { struct net_device_context *net_device_ctx = netdev_priv(net); + struct vmbus_channel *channel = nvchan->channel; u16 q_idx = channel->offermsg.offer.sub_channel_index; - struct netvsc_channel *nvchan = &net_device->chan_table[q_idx]; struct sk_buff *skb; struct netvsc_stats *rx_stats; @@ -847,8 +847,8 @@ int netvsc_recv_callback(struct net_device *net, return NVSP_STAT_FAIL; /* Allocate a skb - TODO direct I/O to pages? */ - skb = netvsc_alloc_recv_skb(net, &nvchan->napi, - csum_info, vlan, data, len); + skb = netvsc_alloc_recv_skb(net, nvchan); + if (unlikely(!skb)) { ++net->stats.rx_dropped; rcu_read_unlock(); @@ -865,7 +865,7 @@ int netvsc_recv_callback(struct net_device *net, rx_stats = &nvchan->rx_stats; u64_stats_update_begin(&rx_stats->syncp); rx_stats->packets++; - rx_stats->bytes += len; + rx_stats->bytes += nvchan->rsc.pktlen; if (skb->pkt_type == PACKET_BROADCAST) ++rx_stats->broadcast; diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c index 4d2eceb3a694..9489600e766b 100644 --- a/drivers/net/hyperv/rndis_filter.c +++ b/drivers/net/hyperv/rndis_filter.c @@ -338,7 +338,8 @@ static void rndis_filter_receive_response(struct net_device *ndev, * Get the Per-Packet-Info with the specified type * return NULL if not found. 
*/ -static inline void *rndis_get_ppi(struct rndis_packet *rpkt, u32 type) +static inline void *rndis_get_ppi(struct rndis_packet *rpkt, + u32 type, u8 internal) { struct rndis_per_packet_info *ppi; int len; @@ -351,7 +352,7 @@ static inline void *rndis_get_ppi(struct rndis_packet *rpkt, u32 type) len = rpkt->per_pkt_info_len; while (len > 0) { - if (ppi->type == type) + if (ppi->type == type && ppi->internal == internal) return (void *)((ulong)ppi + ppi->ppi_offset); len -= ppi->size; ppi = (struct rndis_per_packet_info *)((ulong)ppi + ppi->size); @@ -360,16 +361,40 @@ static inline void *rndis_get_ppi(struct rndis_packet *rpkt, u32 type) return NULL; } +static inline +void rsc_add_data(struct netvsc_channel *nvchan, + const struct ndis_pkt_8021q_info *vlan, + const struct ndis_tcp_ip_checksum_info *csum_info, + void *data, u32 len) +{ + u32 cnt = nvchan->rsc.cnt; + + if (cnt) { + nvchan->rsc.pktlen += len; + } else { + nvchan->rsc.vlan = vlan; + nvchan->rsc.csum_info = csum_info; + nvchan->rsc.pktlen = len; + } + + nvchan->rsc.data[cnt] = data; + nvchan->rsc.len[cnt] = len; + nvchan->rsc.cnt++; +} + static int rndis_filter_receive_data(struct net_device *ndev, struct netvsc_device *nvdev, struct rndis_message *msg, - struct vmbus_channel *channel, + struct netvsc_channel *nvchan, void *data, u32 data_buflen) { struct rndis_packet *rndis_pkt = &msg->msg.pkt; const struct ndis_tcp_ip_checksum_info *csum_info; const struct ndis_pkt_8021q_info *vlan; + const struct rndis_pktinfo_id *pktinfo_id; u32 data_offset; + bool rsc_more = false; + int ret; /* Remove the rndis header and pass it back up the stack */ data_offset = RNDIS_HEADER_SIZE + rndis_pkt->data_offset; @@ -388,24 +413,59 @@ static int rndis_filter_receive_data(struct net_device *ndev, return NVSP_STAT_FAIL; } - vlan = rndis_get_ppi(rndis_pkt, IEEE_8021Q_INFO); + vlan = rndis_get_ppi(rndis_pkt, IEEE_8021Q_INFO, 0); - /* - * Remove the rndis trailer padding from rndis packet message + csum_info = rndis_get_ppi(rndis_pkt, TCPIP_CHKSUM_PKTINFO, 0); + + pktinfo_id = rndis_get_ppi(rndis_pkt, RNDIS_PKTINFO_ID, 1); + + data = (void *)msg + data_offset; + + /* Identify RSC frags, drop erroneous packets */ + if (pktinfo_id && (pktinfo_id->flag & RNDIS_PKTINFO_SUBALLOC)) { + if (pktinfo_id->flag & RNDIS_PKTINFO_1ST_FRAG) + nvchan->rsc.cnt = 0; + else if (nvchan->rsc.cnt == 0) + goto drop; + + rsc_more = true; + + if (pktinfo_id->flag & RNDIS_PKTINFO_LAST_FRAG) + rsc_more = false; + + if (rsc_more && nvchan->rsc.is_last) + goto drop; + } else { + nvchan->rsc.cnt = 0; + } + + if (unlikely(nvchan->rsc.cnt >= NVSP_RSC_MAX)) + goto drop; + + /* Put data into per channel structure. 
+ * Also, remove the rndis trailer padding from rndis packet message * rndis_pkt->data_len tell us the real data length, we only copy * the data packet to the stack, without the rndis trailer padding */ - data = (void *)((unsigned long)data + data_offset); - csum_info = rndis_get_ppi(rndis_pkt, TCPIP_CHKSUM_PKTINFO); + rsc_add_data(nvchan, vlan, csum_info, data, rndis_pkt->data_len); + + if (rsc_more) + return NVSP_STAT_SUCCESS; - return netvsc_recv_callback(ndev, nvdev, channel, - data, rndis_pkt->data_len, - csum_info, vlan); + ret = netvsc_recv_callback(ndev, nvdev, nvchan); + nvchan->rsc.cnt = 0; + + return ret; + +drop: + /* Drop incomplete packet */ + nvchan->rsc.cnt = 0; + return NVSP_STAT_FAIL; } int rndis_filter_receive(struct net_device *ndev, struct netvsc_device *net_dev, - struct vmbus_channel *channel, + struct netvsc_channel *nvchan, void *data, u32 buflen) { struct net_device_context *net_device_ctx = netdev_priv(ndev); @@ -417,7 +477,7 @@ int rndis_filter_receive(struct net_device *ndev, switch (rndis_msg->ndis_msg_type) { case RNDIS_MSG_PACKET: return rndis_filter_receive_data(ndev, net_dev, rndis_msg, - channel, data, buflen); + nvchan, data, buflen); case RNDIS_MSG_INIT_C: case RNDIS_MSG_QUERY_C: case RNDIS_MSG_SET_C: @@ -1187,6 +1247,13 @@ static int rndis_netdev_set_hwcaps(struct rndis_device *rndis_device, } } + if (hwcaps.rsc.ip4 && hwcaps.rsc.ip6) { + net->hw_features |= NETIF_F_LRO; + + offloads.rsc_ip_v4 = NDIS_OFFLOAD_PARAMETERS_RSC_ENABLED; + offloads.rsc_ip_v6 = NDIS_OFFLOAD_PARAMETERS_RSC_ENABLED; + } + /* In case some hw_features disappeared we need to remove them from * net->features list as they're no longer supported. */
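The fragment accounting this patch adds is easiest to follow in isolation. The following is a compact user-space model of the RNDIS_PKTINFO_* flag handling in rndis_filter_receive_data(); it is a sketch for illustration only (the data, vlan and csum bookkeeping is omitted), not driver code.

#include <stdbool.h>

/* Mirror the flag bits and limit from the patch. */
#define PKTINFO_SUBALLOC   (1u << 0)   /* RNDIS_PKTINFO_SUBALLOC */
#define PKTINFO_1ST_FRAG   (1u << 1)   /* RNDIS_PKTINFO_1ST_FRAG */
#define PKTINFO_LAST_FRAG  (1u << 2)   /* RNDIS_PKTINFO_LAST_FRAG */
#define RSC_MAX 562                    /* NVSP_RSC_MAX */

struct rsc_state {
	unsigned int cnt;    /* fragments accumulated so far */
	unsigned int pktlen; /* total bytes across fragments */
	bool is_last;        /* last RNDIS msg in the vmbus transfer page */
};

/* Returns true when a complete packet is ready for delivery; false while
 * more fragments are expected or when an erroneous sequence was dropped. */
static bool rsc_receive(struct rsc_state *st, unsigned int flags,
			unsigned int len)
{
	bool more = false;

	if (flags & PKTINFO_SUBALLOC) {
		if (flags & PKTINFO_1ST_FRAG)
			st->cnt = 0;          /* start of a new RSC packet */
		else if (st->cnt == 0)
			goto drop;            /* continuation without a start */

		more = !(flags & PKTINFO_LAST_FRAG);

		/* An RSC packet must not span vmbus transfer pages. */
		if (more && st->is_last)
			goto drop;
	} else {
		st->cnt = 0;                  /* ordinary single-frag packet */
	}

	if (st->cnt >= RSC_MAX)
		goto drop;

	/* rsc_add_data(): accumulate the length, count the fragment. */
	st->pktlen = st->cnt ? st->pktlen + len : len;
	st->cnt++;

	if (more)
		return false;                 /* wait for the next fragment */

	st->cnt = 0;                          /* complete: deliver and reset */
	return true;

drop:
	st->cnt = 0;
	return false;
}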
From patchwork Mon Nov 2 12:48:53 2020
X-Patchwork-Submitter: William Breathitt Gray
X-Patchwork-Id: 1392251
From: William Breathitt Gray
To: kernel-team@lists.ubuntu.com
Subject: [SRU][B:linux-azure-4.15][PATCH 37/40] hv_netvsc: Refactor assignments of struct netvsc_device_info
Date: Mon, 2 Nov 2020 07:48:53 -0500
Message-Id: <20201102124856.4659-38-william.gray@canonical.com>
In-Reply-To: <20201102124856.4659-1-william.gray@canonical.com>
References: <20201102124856.4659-1-william.gray@canonical.com>

From: Haiyang Zhang

BugLink: https://bugs.launchpad.net/bugs/1877654

These assignments occur in multiple places. This patch refactors them into a helper function for simplicity. It also moves the struct to the heap to allow for future expansion.

Signed-off-by: Haiyang Zhang Reviewed-by: Michael Kelley [sl: fix up subject line] Signed-off-by: Sasha Levin (backported from commit 7c9f335a3ff20557a92584199f3d35c7e992bbe5) [ vilhelmgray: context adjustments and storing ring_size ] Signed-off-by: William Breathitt Gray --- drivers/net/hyperv/netvsc_drv.c | 142 ++++++++++++++++++++------------ 1 file changed, 90 insertions(+), 52 deletions(-) diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c index 2c815614ce16..1f588da45dca 100644 --- a/drivers/net/hyperv/netvsc_drv.c +++ b/drivers/net/hyperv/netvsc_drv.c @@ -897,6 +897,36 @@ static void netvsc_get_channels(struct net_device *net, } } +/* Alloc struct netvsc_device_info, and initialize it from either existing + * struct netvsc_device, or from default values.
+ */ +static struct netvsc_device_info *netvsc_devinfo_get + (struct netvsc_device *nvdev) +{ + struct netvsc_device_info *dev_info; + + dev_info = kzalloc(sizeof(*dev_info), GFP_ATOMIC); + + if (!dev_info) + return NULL; + + if (nvdev) { + dev_info->num_chn = nvdev->num_chn; + dev_info->send_sections = nvdev->send_section_cnt; + dev_info->send_section_size = nvdev->send_section_size; + dev_info->recv_sections = nvdev->recv_section_cnt; + dev_info->recv_section_size = nvdev->recv_section_size; + } else { + dev_info->num_chn = VRSS_CHANNEL_DEFAULT; + dev_info->send_sections = NETVSC_DEFAULT_TX; + dev_info->send_section_size = NETVSC_SEND_SECTION_SIZE; + dev_info->recv_sections = NETVSC_DEFAULT_RX; + dev_info->recv_section_size = NETVSC_RECV_SECTION_SIZE; + } + + return dev_info; +} + static int netvsc_detach(struct net_device *ndev, struct netvsc_device *nvdev) { @@ -990,7 +1020,7 @@ static int netvsc_set_channels(struct net_device *net, struct net_device_context *net_device_ctx = netdev_priv(net); struct netvsc_device *nvdev = rtnl_dereference(net_device_ctx->nvdev); unsigned int orig, count = channels->combined_count; - struct netvsc_device_info device_info; + struct netvsc_device_info *device_info; int ret; /* We do not support separate count for rx, tx, or other */ @@ -1009,25 +1039,27 @@ static int netvsc_set_channels(struct net_device *net, orig = nvdev->num_chn; - memset(&device_info, 0, sizeof(device_info)); - device_info.num_chn = count; - device_info.ring_size = ring_size; - device_info.send_sections = nvdev->send_section_cnt; - device_info.send_section_size = nvdev->send_section_size; - device_info.recv_sections = nvdev->recv_section_cnt; - device_info.recv_section_size = nvdev->recv_section_size; + device_info = netvsc_devinfo_get(nvdev); + + if (!device_info) + return -ENOMEM; + + device_info->num_chn = count; + device_info->ring_size = ring_size; ret = netvsc_detach(net, nvdev); if (ret) - return ret; + goto out; - ret = netvsc_attach(net, &device_info); + ret = netvsc_attach(net, device_info); if (ret) { - device_info.num_chn = orig; - if (netvsc_attach(net, &device_info)) + device_info->num_chn = orig; + if (netvsc_attach(net, device_info)) netdev_err(net, "restoring channel setting failed\n"); } +out: + kfree(device_info); return ret; } @@ -1094,26 +1126,25 @@ static int netvsc_change_mtu(struct net_device *ndev, int mtu) struct net_device *vf_netdev = rtnl_dereference(ndevctx->vf_netdev); struct netvsc_device *nvdev = rtnl_dereference(ndevctx->nvdev); int orig_mtu = ndev->mtu; - struct netvsc_device_info device_info; + struct netvsc_device_info *device_info; int ret = 0; if (!nvdev || nvdev->destroy) return -ENODEV; + device_info = netvsc_devinfo_get(nvdev); + + if (!device_info) + return -ENOMEM; + /* Change MTU of underlying VF netdev first. 
*/ if (vf_netdev) { ret = dev_set_mtu(vf_netdev, mtu); if (ret) - return ret; + goto out; } - memset(&device_info, 0, sizeof(device_info)); - device_info.ring_size = ring_size; - device_info.num_chn = nvdev->num_chn; - device_info.send_sections = nvdev->send_section_cnt; - device_info.send_section_size = nvdev->send_section_size; - device_info.recv_sections = nvdev->recv_section_cnt; - device_info.recv_section_size = nvdev->recv_section_size; + device_info->ring_size = ring_size; ret = netvsc_detach(ndev, nvdev); if (ret) @@ -1121,22 +1152,21 @@ static int netvsc_change_mtu(struct net_device *ndev, int mtu) ndev->mtu = mtu; - ret = netvsc_attach(ndev, &device_info); - if (ret) - goto rollback; - - return 0; + ret = netvsc_attach(ndev, device_info); + if (!ret) + goto out; -rollback: /* Attempt rollback to original MTU */ ndev->mtu = orig_mtu; - if (netvsc_attach(ndev, &device_info)) + if (netvsc_attach(ndev, device_info)) netdev_err(ndev, "restoring mtu failed\n"); rollback_vf: if (vf_netdev) dev_set_mtu(vf_netdev, orig_mtu); +out: + kfree(device_info); return ret; } @@ -1645,7 +1675,7 @@ static int netvsc_set_ringparam(struct net_device *ndev, { struct net_device_context *ndevctx = netdev_priv(ndev); struct netvsc_device *nvdev = rtnl_dereference(ndevctx->nvdev); - struct netvsc_device_info device_info; + struct netvsc_device_info *device_info; struct ethtool_ringparam orig; u32 new_tx, new_rx; int ret = 0; @@ -1665,27 +1695,30 @@ static int netvsc_set_ringparam(struct net_device *ndev, new_rx == orig.rx_pending) return 0; /* no change */ - memset(&device_info, 0, sizeof(device_info)); - device_info.num_chn = nvdev->num_chn; - device_info.ring_size = ring_size; - device_info.send_sections = new_tx; - device_info.send_section_size = nvdev->send_section_size; - device_info.recv_sections = new_rx; - device_info.recv_section_size = nvdev->recv_section_size; + device_info = netvsc_devinfo_get(nvdev); + + if (!device_info) + return -ENOMEM; + + device_info->ring_size = ring_size; + device_info->send_sections = new_tx; + device_info->recv_sections = new_rx; ret = netvsc_detach(ndev, nvdev); if (ret) - return ret; + goto out; - ret = netvsc_attach(ndev, &device_info); + ret = netvsc_attach(ndev, device_info); if (ret) { - device_info.send_sections = orig.tx_pending; - device_info.recv_sections = orig.rx_pending; + device_info->send_sections = orig.tx_pending; + device_info->recv_sections = orig.rx_pending; - if (netvsc_attach(ndev, &device_info)) + if (netvsc_attach(ndev, device_info)) netdev_err(ndev, "restoring ringparam failed"); } +out: + kfree(device_info); return ret; } @@ -2070,7 +2103,7 @@ static int netvsc_probe(struct hv_device *dev, { struct net_device *net = NULL; struct net_device_context *net_device_ctx; - struct netvsc_device_info device_info; + struct netvsc_device_info *device_info = NULL; struct netvsc_device *nvdev; int ret = -ENOMEM; @@ -2117,22 +2150,23 @@ static int netvsc_probe(struct hv_device *dev, netif_set_real_num_rx_queues(net, 1); /* Notify the netvsc driver of the new device */ - memset(&device_info, 0, sizeof(device_info)); - device_info.ring_size = ring_size; - device_info.num_chn = VRSS_CHANNEL_DEFAULT; - device_info.send_sections = NETVSC_DEFAULT_TX; - device_info.send_section_size = NETVSC_SEND_SECTION_SIZE; - device_info.recv_sections = NETVSC_DEFAULT_RX; - device_info.recv_section_size = NETVSC_RECV_SECTION_SIZE; - - nvdev = rndis_filter_device_add(dev, &device_info); + device_info = netvsc_devinfo_get(NULL); + + if (!device_info) { + ret = -ENOMEM; + goto 
devinfo_failed; + } + + device_info->ring_size = ring_size; + + nvdev = rndis_filter_device_add(dev, device_info); if (IS_ERR(nvdev)) { ret = PTR_ERR(nvdev); netdev_err(net, "unable to add netvsc device (ret %d)\n", ret); goto rndis_failed; } - memcpy(net->dev_addr, device_info.mac_adr, ETH_ALEN); + memcpy(net->dev_addr, device_info->mac_adr, ETH_ALEN); /* We must get rtnl lock before scheduling nvdev->subchan_work, * otherwise netvsc_subchan_work() can get rtnl lock first and wait @@ -2172,12 +2206,16 @@ static int netvsc_probe(struct hv_device *dev, list_add(&net_device_ctx->list, &netvsc_dev_list); rtnl_unlock(); + + kfree(device_info); return 0; register_failed: rtnl_unlock(); rndis_filter_device_remove(dev, nvdev); rndis_failed: + kfree(device_info); +devinfo_failed: free_percpu(net_device_ctx->vf_stats); no_stats: hv_set_drvdata(dev, NULL);
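All of the reconfiguration paths touched above (channels, MTU, ring parameters) now share one shape. Condensed from those hunks, a sketch of that pattern; the function name is hypothetical and the rollback paths are trimmed, so see the patch itself for the real versions.

/* Condensed sketch of the shared reconfigure pattern that
 * netvsc_devinfo_get() enables; error/rollback handling trimmed. */
static int netvsc_reconfigure_sketch(struct net_device *ndev,
				     struct netvsc_device *nvdev,
				     u32 new_num_chn)
{
	struct netvsc_device_info *device_info;
	int ret;

	/* Snapshot the current configuration into a heap copy. */
	device_info = netvsc_devinfo_get(nvdev);
	if (!device_info)
		return -ENOMEM;

	device_info->num_chn = new_num_chn;   /* apply the requested change */

	ret = netvsc_detach(ndev, nvdev);
	if (!ret)
		ret = netvsc_attach(ndev, device_info);

	kfree(device_info);                   /* heap copy is always freed */
	return ret;
}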
From patchwork Mon Nov 2 12:48:54 2020
X-Patchwork-Submitter: William Breathitt Gray
X-Patchwork-Id: 1392250
From: William Breathitt Gray
To: kernel-team@lists.ubuntu.com
Subject: [SRU][B:linux-azure-4.15][PATCH 38/40] hv_netvsc: Add XDP support
Date: Mon, 2 Nov 2020 07:48:54 -0500
Message-Id: <20201102124856.4659-39-william.gray@canonical.com>
In-Reply-To: <20201102124856.4659-1-william.gray@canonical.com>
References: <20201102124856.4659-1-william.gray@canonical.com>

From: Haiyang Zhang

BugLink: https://bugs.launchpad.net/bugs/1877654

This patch adds support for XDP in native mode to the hv_netvsc driver, and transparently sets the XDP program on the associated VF NIC as well.

Setting / unsetting an XDP program on the synthetic NIC (netvsc) propagates to the VF NIC automatically. Setting / unsetting an XDP program on the VF NIC directly is not recommended; it is also not propagated to the synthetic NIC, and may be overwritten by a setting on the synthetic NIC.

The Azure/Hyper-V synthetic NIC receive buffer doesn't provide headroom for XDP. We thought about re-using the RNDIS header space, but it's too small. So we decided to copy the packets to a page buffer for XDP. Also, most of our VMs on Azure have Accelerated Networking (SRIOV) enabled, so most packets run on the VF NIC. The synthetic NIC is considered a fallback data path, so the data copy on netvsc won't impact performance significantly.

An XDP program cannot run with LRO (RSC) enabled, so LRO needs to be disabled before running XDP: ethtool -K eth0 lro off

XDP actions not yet supported: XDP_REDIRECT

Signed-off-by: Haiyang Zhang Signed-off-by: David S. Miller (backported from commit 351e1581395fcc7fb952bbd7dda01238f69968fd) [ vilhelmgray: context adjustments ] [ vilhelmgray: drop changes for netvsc_resume() ] Signed-off-by: William Breathitt Gray --- drivers/net/hyperv/Makefile | 2 +- drivers/net/hyperv/hyperv_net.h | 22 +++- drivers/net/hyperv/netvsc.c | 31 ++++- drivers/net/hyperv/netvsc_bpf.c | 209 ++++++++++++++++++++++++++++++ drivers/net/hyperv/netvsc_drv.c | 177 +++++++++++++++++++++---- drivers/net/hyperv/rndis_filter.c | 2 +- 6 files changed, 408 insertions(+), 35 deletions(-) create mode 100644 drivers/net/hyperv/netvsc_bpf.c diff --git a/drivers/net/hyperv/Makefile b/drivers/net/hyperv/Makefile index c8a66827100c..366c564841a3 100644 --- a/drivers/net/hyperv/Makefile +++ b/drivers/net/hyperv/Makefile @@ -1,3 +1,3 @@ obj-$(CONFIG_HYPERV_NET) += hv_netvsc.o -hv_netvsc-y := netvsc_drv.o netvsc.o rndis_filter.o +hv_netvsc-y := netvsc_drv.o netvsc.o rndis_filter.o netvsc_bpf.o diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h index df27f4437b1e..4f7967819dc0 100644 --- a/drivers/net/hyperv/hyperv_net.h +++ b/drivers/net/hyperv/hyperv_net.h @@ -27,6 +27,7 @@ #include #include #include +#include /* Include refcount here to avoid backporting a bunch of kref commits.
*/ #include @@ -155,6 +156,8 @@ struct netvsc_device_info { u32 recv_sections; u32 send_section_size; u32 recv_section_size; + + struct bpf_prog *bprog; }; enum rndis_device_state { @@ -199,7 +202,8 @@ int netvsc_send(struct net_device *net, struct hv_netvsc_packet *packet, struct rndis_message *rndis_msg, struct hv_page_buffer *page_buffer, - struct sk_buff *skb); + struct sk_buff *skb, + bool xdp_tx); void netvsc_linkstatus_callback(struct net_device *net, struct rndis_message *resp); int netvsc_recv_callback(struct net_device *net, @@ -208,6 +212,16 @@ int netvsc_recv_callback(struct net_device *net, void netvsc_channel_cb(void *context); int netvsc_poll(struct napi_struct *napi, int budget); +u32 netvsc_run_xdp(struct net_device *ndev, struct netvsc_channel *nvchan, + struct xdp_buff *xdp); +unsigned int netvsc_xdp_fraglen(unsigned int len); +struct bpf_prog *netvsc_xdp_get(struct netvsc_device *nvdev); +int netvsc_xdp_set(struct net_device *dev, struct bpf_prog *prog, + struct netlink_ext_ack *extack, + struct netvsc_device *nvdev); +int netvsc_vf_setxdp(struct net_device *vf_netdev, struct bpf_prog *prog); +int netvsc_bpf(struct net_device *dev, struct netdev_bpf *bpf); + int rndis_set_subchannel(struct net_device *ndev, struct netvsc_device *nvdev); int rndis_filter_open(struct netvsc_device *nvdev); int rndis_filter_close(struct netvsc_device *nvdev); @@ -836,6 +850,8 @@ struct nvsp_message { #define RNDIS_MAX_PKT_DEFAULT 8 #define RNDIS_PKT_ALIGN_DEFAULT 8 +#define NETVSC_XDP_HDRM 256 + struct multi_send_data { struct sk_buff *skb; /* skb containing the pkt */ struct hv_netvsc_packet *pkt; /* netvsc pkt pending */ @@ -870,6 +886,7 @@ struct netvsc_stats { u64 bytes; u64 broadcast; u64 multicast; + u64 xdp_drop; struct u64_stats_sync syncp; }; @@ -960,6 +977,9 @@ struct netvsc_channel { atomic_t queue_sends; struct nvsc_rsc rsc; + struct bpf_prog __rcu *bpf_prog; + struct xdp_rxq_info xdp_rxq; + struct netvsc_stats tx_stats; struct netvsc_stats rx_stats; }; diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c index 70a3484e7e38..54ed633e82a6 100644 --- a/drivers/net/hyperv/netvsc.c +++ b/drivers/net/hyperv/netvsc.c @@ -130,8 +130,10 @@ static void free_netvsc_device(struct rcu_head *head) vfree(nvdev->send_buf); kfree(nvdev->send_section_map); - for (i = 0; i < VRSS_CHANNEL_MAX; i++) + for (i = 0; i < VRSS_CHANNEL_MAX; i++) { + xdp_rxq_info_unreg(&nvdev->chan_table[i].xdp_rxq); vfree(nvdev->chan_table[i].mrc.slots); + } kfree(nvdev); } @@ -908,7 +910,8 @@ int netvsc_send(struct net_device *ndev, struct hv_netvsc_packet *packet, struct rndis_message *rndis_msg, struct hv_page_buffer *pb, - struct sk_buff *skb) + struct sk_buff *skb, + bool xdp_tx) { struct net_device_context *ndev_ctx = netdev_priv(ndev); struct netvsc_device *net_device @@ -931,10 +934,11 @@ int netvsc_send(struct net_device *ndev, packet->send_buf_index = NETVSC_INVALID_INDEX; packet->cp_partial = false; - /* Send control message directly without accessing msd (Multi-Send - * Data) field which may be changed during data packet processing. + /* Send a control message or XDP packet directly without accessing + * msd (Multi-Send Data) field which may be changed during data packet + * processing. 
*/ - if (!skb) + if (!skb || xdp_tx) return netvsc_send_pkt(device, packet, net_device, pb, skb); /* batch packets in send buffer if possible */ @@ -1384,6 +1388,21 @@ struct netvsc_device *netvsc_device_add(struct hv_device *device, nvchan->net_device = net_device; u64_stats_init(&nvchan->tx_stats.syncp); u64_stats_init(&nvchan->rx_stats.syncp); + + ret = xdp_rxq_info_reg(&nvchan->xdp_rxq, ndev, i); + + if (ret) { + netdev_err(ndev, "xdp_rxq_info_reg fail: %d\n", ret); + goto cleanup2; + } + + ret = xdp_rxq_info_reg_mem_model(&nvchan->xdp_rxq, + MEM_TYPE_PAGE_SHARED, NULL); + + if (ret) { + netdev_err(ndev, "xdp reg_mem_model fail: %d\n", ret); + goto cleanup2; + } } /* Enable NAPI handler before init callbacks */ @@ -1430,6 +1449,8 @@ struct netvsc_device *netvsc_device_add(struct hv_device *device, cleanup: netif_napi_del(&net_device->chan_table[0].napi); + +cleanup2: free_netvsc_device(&net_device->rcu); return ERR_PTR(ret); diff --git a/drivers/net/hyperv/netvsc_bpf.c b/drivers/net/hyperv/netvsc_bpf.c new file mode 100644 index 000000000000..20adfe544294 --- /dev/null +++ b/drivers/net/hyperv/netvsc_bpf.c @@ -0,0 +1,209 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Copyright (c) 2019, Microsoft Corporation. + * + * Author: + * Haiyang Zhang + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +#include "hyperv_net.h" + +u32 netvsc_run_xdp(struct net_device *ndev, struct netvsc_channel *nvchan, + struct xdp_buff *xdp) +{ + void *data = nvchan->rsc.data[0]; + u32 len = nvchan->rsc.len[0]; + struct page *page = NULL; + struct bpf_prog *prog; + u32 act = XDP_PASS; + + xdp->data_hard_start = NULL; + + rcu_read_lock(); + prog = rcu_dereference(nvchan->bpf_prog); + + if (!prog) + goto out; + + /* allocate page buffer for data */ + page = alloc_page(GFP_ATOMIC); + if (!page) { + act = XDP_DROP; + goto out; + } + + xdp->data_hard_start = page_address(page); + xdp->data = xdp->data_hard_start + NETVSC_XDP_HDRM; + xdp_set_data_meta_invalid(xdp); + xdp->data_end = xdp->data + len; + xdp->rxq = &nvchan->xdp_rxq; + xdp->handle = 0; + + memcpy(xdp->data, data, len); + + act = bpf_prog_run_xdp(prog, xdp); + + switch (act) { + case XDP_PASS: + case XDP_TX: + case XDP_DROP: + break; + + case XDP_ABORTED: + trace_xdp_exception(ndev, prog, act); + break; + + default: + bpf_warn_invalid_xdp_action(act); + } + +out: + rcu_read_unlock(); + + if (page && act != XDP_PASS && act != XDP_TX) { + __free_page(page); + xdp->data_hard_start = NULL; + } + + return act; +} + +unsigned int netvsc_xdp_fraglen(unsigned int len) +{ + return SKB_DATA_ALIGN(len) + + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); +} + +struct bpf_prog *netvsc_xdp_get(struct netvsc_device *nvdev) +{ + return rtnl_dereference(nvdev->chan_table[0].bpf_prog); +} + +int netvsc_xdp_set(struct net_device *dev, struct bpf_prog *prog, + struct netlink_ext_ack *extack, + struct netvsc_device *nvdev) +{ + struct bpf_prog *old_prog; + int buf_max, i; + + old_prog = netvsc_xdp_get(nvdev); + + if (!old_prog && !prog) + return 0; + + buf_max = NETVSC_XDP_HDRM + netvsc_xdp_fraglen(dev->mtu + ETH_HLEN); + if (prog && buf_max > PAGE_SIZE) { + netdev_err(dev, "XDP: mtu:%u too large, buf_max:%u\n", + dev->mtu, buf_max); + NL_SET_ERR_MSG_MOD(extack, "XDP: mtu too large"); + + return -EOPNOTSUPP; + } + + if (prog && (dev->features & NETIF_F_LRO)) { + netdev_err(dev, "XDP: not support LRO\n"); + NL_SET_ERR_MSG_MOD(extack, "XDP: not support LRO"); + + return 
-EOPNOTSUPP; + } + + if (prog) + bpf_prog_add(prog, nvdev->num_chn); + + for (i = 0; i < nvdev->num_chn; i++) + rcu_assign_pointer(nvdev->chan_table[i].bpf_prog, prog); + + if (old_prog) + for (i = 0; i < nvdev->num_chn; i++) + bpf_prog_put(old_prog); + + return 0; +} + +int netvsc_vf_setxdp(struct net_device *vf_netdev, struct bpf_prog *prog) +{ + struct netdev_bpf xdp; + bpf_op_t ndo_bpf; + + ASSERT_RTNL(); + + if (!vf_netdev) + return 0; + + ndo_bpf = vf_netdev->netdev_ops->ndo_bpf; + if (!ndo_bpf) + return 0; + + memset(&xdp, 0, sizeof(xdp)); + + xdp.command = XDP_SETUP_PROG; + xdp.prog = prog; + + return ndo_bpf(vf_netdev, &xdp); +} + +static u32 netvsc_xdp_query(struct netvsc_device *nvdev) +{ + struct bpf_prog *prog = netvsc_xdp_get(nvdev); + + if (prog) + return prog->aux->id; + + return 0; +} + +int netvsc_bpf(struct net_device *dev, struct netdev_bpf *bpf) +{ + struct net_device_context *ndevctx = netdev_priv(dev); + struct netvsc_device *nvdev = rtnl_dereference(ndevctx->nvdev); + struct net_device *vf_netdev = rtnl_dereference(ndevctx->vf_netdev); + struct netlink_ext_ack *extack = bpf->extack; + int ret; + + if (!nvdev || nvdev->destroy) { + if (bpf->command == XDP_QUERY_PROG) { + bpf->prog_id = 0; + return 0; /* Query must always succeed */ + } else { + return -ENODEV; + } + } + + switch (bpf->command) { + case XDP_SETUP_PROG: + ret = netvsc_xdp_set(dev, bpf->prog, extack, nvdev); + + if (ret) + return ret; + + ret = netvsc_vf_setxdp(vf_netdev, bpf->prog); + + if (ret) { + netdev_err(dev, "vf_setxdp failed:%d\n", ret); + NL_SET_ERR_MSG_MOD(extack, "vf_setxdp failed"); + + netvsc_xdp_set(dev, NULL, extack, nvdev); + } + + return ret; + + case XDP_QUERY_PROG: + bpf->prog_id = netvsc_xdp_query(nvdev); + return 0; + + default: + return -EINVAL; + } +} diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c index 1f588da45dca..6b904816af5c 100644 --- a/drivers/net/hyperv/netvsc_drv.c +++ b/drivers/net/hyperv/netvsc_drv.c @@ -36,6 +36,7 @@ #include #include #include +#include #include #include @@ -532,7 +533,7 @@ static int netvsc_vf_xmit(struct net_device *net, struct net_device *vf_netdev, return rc; } -static int netvsc_start_xmit(struct sk_buff *skb, struct net_device *net) +static int netvsc_xmit(struct sk_buff *skb, struct net_device *net, bool xdp_tx) { struct net_device_context *net_device_ctx = netdev_priv(net); struct hv_netvsc_packet *packet = NULL; @@ -703,7 +704,7 @@ static int netvsc_start_xmit(struct sk_buff *skb, struct net_device *net) /* timestamp packet in software */ skb_tx_timestamp(skb); - ret = netvsc_send(net, packet, rndis_msg, pb, skb); + ret = netvsc_send(net, packet, rndis_msg, pb, skb, xdp_tx); if (likely(ret == 0)) return NETDEV_TX_OK; @@ -726,6 +727,11 @@ static int netvsc_start_xmit(struct sk_buff *skb, struct net_device *net) goto drop; } +static int netvsc_start_xmit(struct sk_buff *skb, struct net_device *ndev) +{ + return netvsc_xmit(skb, ndev, false); +} + /* * netvsc_linkstatus_callback - Link up/down notification */ @@ -768,6 +774,22 @@ void netvsc_linkstatus_callback(struct net_device *net, schedule_delayed_work(&ndev_ctx->dwork, 0); } +static void netvsc_xdp_xmit(struct sk_buff *skb, struct net_device *ndev) +{ + int rc; + + skb->queue_mapping = skb_get_rx_queue(skb); + __skb_push(skb, ETH_HLEN); + + rc = netvsc_xmit(skb, ndev, true); + + if (dev_xmit_complete(rc)) + return; + + dev_kfree_skb_any(skb); + ndev->stats.tx_dropped++; +} + static void netvsc_comp_ipcsum(struct sk_buff *skb) { struct iphdr *iph = (struct 
iphdr *)skb->data; @@ -777,25 +799,45 @@ static void netvsc_comp_ipcsum(struct sk_buff *skb) } static struct sk_buff *netvsc_alloc_recv_skb(struct net_device *net, - struct netvsc_channel *nvchan) + struct netvsc_channel *nvchan, + struct xdp_buff *xdp) { struct napi_struct *napi = &nvchan->napi; const struct ndis_pkt_8021q_info *vlan = nvchan->rsc.vlan; const struct ndis_tcp_ip_checksum_info *csum_info = nvchan->rsc.csum_info; struct sk_buff *skb; + void *xbuf = xdp->data_hard_start; int i; - skb = napi_alloc_skb(napi, nvchan->rsc.pktlen); - if (!skb) - return skb; + if (xbuf) { + unsigned int hdroom = xdp->data - xdp->data_hard_start; + unsigned int xlen = xdp->data_end - xdp->data; + unsigned int frag_size = netvsc_xdp_fraglen(hdroom + xlen); - /* - * Copy to skb. This copy is needed here since the memory pointed by - * hv_netvsc_packet cannot be deallocated - */ - for (i = 0; i < nvchan->rsc.cnt; i++) - skb_put_data(skb, nvchan->rsc.data[i], nvchan->rsc.len[i]); + skb = build_skb(xbuf, frag_size); + + if (!skb) { + __free_page(virt_to_page(xbuf)); + return NULL; + } + + skb_reserve(skb, hdroom); + skb_put(skb, xlen); + skb->dev = napi->dev; + } else { + skb = napi_alloc_skb(napi, nvchan->rsc.pktlen); + + if (!skb) + return NULL; + + /* Copy to skb. This copy is needed here since the memory + * pointed by hv_netvsc_packet cannot be deallocated. + */ + for (i = 0; i < nvchan->rsc.cnt; i++) + skb_put_data(skb, nvchan->rsc.data[i], + nvchan->rsc.len[i]); + } skb->protocol = eth_type_trans(skb, net); @@ -841,13 +883,25 @@ int netvsc_recv_callback(struct net_device *net, struct vmbus_channel *channel = nvchan->channel; u16 q_idx = channel->offermsg.offer.sub_channel_index; struct sk_buff *skb; - struct netvsc_stats *rx_stats; + struct netvsc_stats *rx_stats = &nvchan->rx_stats; + struct xdp_buff xdp; + u32 act; if (net->reg_state != NETREG_REGISTERED) return NVSP_STAT_FAIL; + act = netvsc_run_xdp(net, nvchan, &xdp); + + if (act != XDP_PASS && act != XDP_TX) { + u64_stats_update_begin(&rx_stats->syncp); + rx_stats->xdp_drop++; + u64_stats_update_end(&rx_stats->syncp); + + return NVSP_STAT_SUCCESS; /* consumed by XDP */ + } + /* Allocate a skb - TODO direct I/O to pages? */ - skb = netvsc_alloc_recv_skb(net, nvchan); + skb = netvsc_alloc_recv_skb(net, nvchan, &xdp); if (unlikely(!skb)) { ++net->stats.rx_dropped; @@ -862,7 +916,6 @@ int netvsc_recv_callback(struct net_device *net, * on the synthetic device because modifying the VF device * statistics will not work correctly. */ - rx_stats = &nvchan->rx_stats; u64_stats_update_begin(&rx_stats->syncp); rx_stats->packets++; rx_stats->bytes += nvchan->rsc.pktlen; @@ -873,6 +926,11 @@ int netvsc_recv_callback(struct net_device *net, ++rx_stats->multicast; u64_stats_update_end(&rx_stats->syncp); + if (act == XDP_TX) { + netvsc_xdp_xmit(skb, net); + return NVSP_STAT_SUCCESS; + } + napi_gro_receive(&nvchan->napi, skb); return NVSP_STAT_SUCCESS; @@ -900,10 +958,11 @@ static void netvsc_get_channels(struct net_device *net, /* Alloc struct netvsc_device_info, and initialize it from either existing * struct netvsc_device, or from default values. 
*/ -static struct netvsc_device_info *netvsc_devinfo_get - (struct netvsc_device *nvdev) +static +struct netvsc_device_info *netvsc_devinfo_get(struct netvsc_device *nvdev) { struct netvsc_device_info *dev_info; + struct bpf_prog *prog; dev_info = kzalloc(sizeof(*dev_info), GFP_ATOMIC); @@ -911,11 +970,19 @@ static struct netvsc_device_info *netvsc_devinfo_get return NULL; if (nvdev) { + ASSERT_RTNL(); + dev_info->num_chn = nvdev->num_chn; dev_info->send_sections = nvdev->send_section_cnt; dev_info->send_section_size = nvdev->send_section_size; dev_info->recv_sections = nvdev->recv_section_cnt; dev_info->recv_section_size = nvdev->recv_section_size; + + prog = netvsc_xdp_get(nvdev); + if (prog) { + bpf_prog_inc(prog); + dev_info->bprog = prog; + } } else { dev_info->num_chn = VRSS_CHANNEL_DEFAULT; dev_info->send_sections = NETVSC_DEFAULT_TX; @@ -927,6 +994,17 @@ static struct netvsc_device_info *netvsc_devinfo_get return dev_info; } +/* Free struct netvsc_device_info */ +static void netvsc_devinfo_put(struct netvsc_device_info *dev_info) +{ + if (dev_info->bprog) { + ASSERT_RTNL(); + bpf_prog_put(dev_info->bprog); + } + + kfree(dev_info); +} + static int netvsc_detach(struct net_device *ndev, struct netvsc_device *nvdev) { @@ -938,6 +1016,8 @@ static int netvsc_detach(struct net_device *ndev, if (cancel_work_sync(&nvdev->subchan_work)) nvdev->num_chn = 1; + netvsc_xdp_set(ndev, NULL, NULL, nvdev); + /* If device was up (receiving) then shutdown */ if (netif_running(ndev)) { netvsc_tx_disable(nvdev, ndev); @@ -971,7 +1051,8 @@ static int netvsc_attach(struct net_device *ndev, struct hv_device *hdev = ndev_ctx->device_ctx; struct netvsc_device *nvdev; struct rndis_device *rdev; - int ret; + struct bpf_prog *prog; + int ret = 0; nvdev = rndis_filter_device_add(hdev, dev_info); if (IS_ERR(nvdev)) @@ -987,6 +1068,13 @@ static int netvsc_attach(struct net_device *ndev, } } + prog = dev_info->bprog; + if (prog) { + ret = netvsc_xdp_set(ndev, prog, NULL, nvdev); + if (ret) + goto err1; + } + /* In any case device is now ready */ nvdev->tx_disable = false; netif_device_attach(ndev); @@ -997,7 +1085,7 @@ static int netvsc_attach(struct net_device *ndev, if (netif_running(ndev)) { ret = rndis_filter_open(nvdev); if (ret) - goto err; + goto err2; rdev = nvdev->extension; if (!rdev->link_state) @@ -1006,9 +1094,10 @@ static int netvsc_attach(struct net_device *ndev, return 0; -err: +err2: netif_device_detach(ndev); +err1: rndis_filter_device_remove(hdev, nvdev); return ret; @@ -1059,7 +1148,7 @@ static int netvsc_set_channels(struct net_device *net, } out: - kfree(device_info); + netvsc_devinfo_put(device_info); return ret; } @@ -1166,7 +1255,7 @@ static int netvsc_change_mtu(struct net_device *ndev, int mtu) dev_set_mtu(vf_netdev, orig_mtu); out: - kfree(device_info); + netvsc_devinfo_put(device_info); return ret; } @@ -1312,8 +1401,8 @@ static const struct { #define NETVSC_GLOBAL_STATS_LEN ARRAY_SIZE(netvsc_stats) #define NETVSC_VF_STATS_LEN ARRAY_SIZE(vf_stats) -/* 4 statistics per queue (rx/tx packets/bytes) */ -#define NETVSC_QUEUE_STATS_LEN(dev) ((dev)->num_chn * 4) +/* 5 statistics per queue (rx/tx packets/bytes, rx xdp_drop) */ +#define NETVSC_QUEUE_STATS_LEN(dev) ((dev)->num_chn * 5) static int netvsc_get_sset_count(struct net_device *dev, int string_set) { @@ -1343,6 +1432,7 @@ static void netvsc_get_ethtool_stats(struct net_device *dev, struct netvsc_vf_pcpu_stats sum; unsigned int start; u64 packets, bytes; + u64 xdp_drop; int i, j; if (!nvdev) @@ -1371,9 +1461,11 @@ static void 
netvsc_get_ethtool_stats(struct net_device *dev, start = u64_stats_fetch_begin_irq(&qstats->syncp); packets = qstats->packets; bytes = qstats->bytes; + xdp_drop = qstats->xdp_drop; } while (u64_stats_fetch_retry_irq(&qstats->syncp, start)); data[i++] = packets; data[i++] = bytes; + data[i++] = xdp_drop; } } @@ -1408,6 +1500,8 @@ static void netvsc_get_strings(struct net_device *dev, u32 stringset, u8 *data) p += ETH_GSTRING_LEN; sprintf(p, "rx_queue_%u_bytes", i); p += ETH_GSTRING_LEN; + sprintf(p, "rx_queue_%u_xdp_drop", i); + p += ETH_GSTRING_LEN; } break; @@ -1719,9 +1813,27 @@ static int netvsc_set_ringparam(struct net_device *ndev, out: kfree(device_info); + netvsc_devinfo_put(device_info); return ret; } +static netdev_features_t netvsc_fix_features(struct net_device *ndev, + netdev_features_t features) +{ + struct net_device_context *ndevctx = netdev_priv(ndev); + struct netvsc_device *nvdev = rtnl_dereference(ndevctx->nvdev); + + if (!nvdev || nvdev->destroy) + return features; + + if ((features & NETIF_F_LRO) && netvsc_xdp_get(nvdev)) { + features ^= NETIF_F_LRO; + netdev_info(ndev, "Skip LRO - unsupported with XDP\n"); + } + + return features; +} + static const struct ethtool_ops ethtool_ops = { .get_drvinfo = netvsc_get_drvinfo, .get_link = ethtool_op_get_link, @@ -1749,11 +1861,13 @@ static const struct net_device_ops device_ops = { .ndo_start_xmit = netvsc_start_xmit, .ndo_change_rx_flags = netvsc_change_rx_flags, .ndo_set_rx_mode = netvsc_set_rx_mode, + .ndo_fix_features = netvsc_fix_features, .ndo_change_mtu = netvsc_change_mtu, .ndo_validate_addr = eth_validate_addr, .ndo_set_mac_address = netvsc_set_mac_addr, .ndo_select_queue = netvsc_select_queue, .ndo_get_stats64 = netvsc_get_stats64, + .ndo_bpf = netvsc_bpf, #ifdef CONFIG_NET_POLL_CONTROLLER .ndo_poll_controller = netvsc_poll_controller, #endif @@ -2021,6 +2135,7 @@ static int netvsc_register_vf(struct net_device *vf_netdev) struct net_device_context *net_device_ctx; struct device *pdev = vf_netdev->dev.parent; struct netvsc_device *netvsc_dev; + struct bpf_prog *prog; if (vf_netdev->addr_len != ETH_ALEN) return NOTIFY_DONE; @@ -2049,6 +2164,10 @@ static int netvsc_register_vf(struct net_device *vf_netdev) dev_hold(vf_netdev); rcu_assign_pointer(net_device_ctx->vf_netdev, vf_netdev); + + prog = netvsc_xdp_get(netvsc_dev); + netvsc_vf_setxdp(vf_netdev, prog); + return NOTIFY_OK; } @@ -2090,6 +2209,8 @@ static int netvsc_unregister_vf(struct net_device *vf_netdev) netdev_info(ndev, "VF unregistering: %s\n", vf_netdev->name); + netvsc_vf_setxdp(vf_netdev, NULL); + netdev_rx_handler_unregister(vf_netdev); netdev_upper_dev_unlink(vf_netdev, ndev); RCU_INIT_POINTER(net_device_ctx->vf_netdev, NULL); @@ -2207,14 +2328,14 @@ static int netvsc_probe(struct hv_device *dev, list_add(&net_device_ctx->list, &netvsc_dev_list); rtnl_unlock(); - kfree(device_info); + netvsc_devinfo_put(device_info); return 0; register_failed: rtnl_unlock(); rndis_filter_device_remove(dev, nvdev); rndis_failed: - kfree(device_info); + netvsc_devinfo_put(device_info); devinfo_failed: free_percpu(net_device_ctx->vf_stats); no_stats: @@ -2242,8 +2363,10 @@ static int netvsc_remove(struct hv_device *dev) rtnl_lock(); nvdev = rtnl_dereference(ndev_ctx->nvdev); - if (nvdev) + if (nvdev) { cancel_work_sync(&nvdev->subchan_work); + netvsc_xdp_set(net, NULL, NULL, nvdev); + } /* * Call to the vsc driver to let it know that the device is being diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c index 9489600e766b..ad51234b499b 100644 
--- a/drivers/net/hyperv/rndis_filter.c +++ b/drivers/net/hyperv/rndis_filter.c @@ -242,7 +242,7 @@ static int rndis_filter_send_request(struct rndis_device *dev, } rcu_read_lock_bh(); - ret = netvsc_send(dev->ndev, packet, NULL, pb, NULL); + ret = netvsc_send(dev->ndev, packet, NULL, pb, NULL, false); rcu_read_unlock_bh(); return ret;
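One practical consequence of the buffer-copy design in this patch is the MTU check in netvsc_xdp_set() earlier in the diff: the XDP headroom, the copied frame, and the skb_shared_info tail must all fit in a single page. A small user-space calculation of that limit follows, as a sketch only; it assumes 4096-byte pages and 64-byte SKB_DATA_ALIGN alignment, and approximates sizeof(struct skb_shared_info) as 320 bytes, so the exact figure varies by architecture and kernel config.

#include <stdio.h>

#define PAGE_SIZE	4096u
#define NETVSC_XDP_HDRM	256u	/* from the patch */
#define ETH_HLEN	14u
#define ALIGN_UP(x, a)	(((x) + (a) - 1) & ~((a) - 1))
#define SHINFO_SIZE	320u	/* sizeof(struct skb_shared_info), approx. */

int main(void)
{
	unsigned int mtu;

	/* Mirrors: buf_max = NETVSC_XDP_HDRM + netvsc_xdp_fraglen(mtu + ETH_HLEN),
	 * rejected in netvsc_xdp_set() when buf_max > PAGE_SIZE. */
	for (mtu = 1500; ; mtu++) {
		unsigned int buf_max = NETVSC_XDP_HDRM +
				       ALIGN_UP(mtu + ETH_HLEN, 64) +
				       ALIGN_UP(SHINFO_SIZE, 64);
		if (buf_max > PAGE_SIZE)
			break;
	}
	printf("largest XDP-compatible MTU (under these assumptions): %u\n",
	       mtu - 1);
	return 0;
}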
From: William Breathitt Gray
To: kernel-team@lists.ubuntu.com
Subject: [SRU][B:linux-azure-4.15][PATCH 39/40] hv_netvsc: Update document for XDP support
Date: Mon, 2 Nov 2020 07:48:55 -0500
Message-Id: <20201102124856.4659-40-william.gray@canonical.com>
In-Reply-To: <20201102124856.4659-1-william.gray@canonical.com>
References: <20201102124856.4659-1-william.gray@canonical.com>

From: Haiyang Zhang

BugLink: https://bugs.launchpad.net/bugs/1877654

Add a new section to the netvsc document describing XDP support in the
hv_netvsc driver.

Signed-off-by: Haiyang Zhang
Signed-off-by: David S. Miller
(backported from commit 12fa74383ed4d1ffa283f77c1e7fe038e8182405)
[ vilhelmgray: context adjustment ]
Signed-off-by: William Breathitt Gray
---
 Documentation/networking/netvsc.txt | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/Documentation/networking/netvsc.txt b/Documentation/networking/netvsc.txt
index 92f5b31392fa..6433ae1d901f 100644
--- a/Documentation/networking/netvsc.txt
+++ b/Documentation/networking/netvsc.txt
@@ -73,3 +73,24 @@ Features
   contain one or more packets. The send buffer is an optimization, the driver
   will use slower method to handle very large packets or if the send buffer
   area is exhausted.
+
+  XDP support
+  -----------
+  XDP (eXpress Data Path) is a feature that runs eBPF bytecode at an early
+  stage, when packets arrive at the NIC. The goal is to increase packet
+  processing performance by avoiding the overhead of SKB allocation and
+  of the upper network layers.
+
+  hv_netvsc supports XDP in native mode, and transparently sets the XDP
+  program on the associated VF NIC as well.
+
+  Setting or unsetting an XDP program on the synthetic NIC (netvsc) is
+  propagated to the VF NIC automatically. Setting or unsetting one directly
+  on the VF NIC is not recommended: it is not propagated to the synthetic
+  NIC and may be overwritten by a later setting on the synthetic NIC.
+
+  An XDP program cannot run with LRO (RSC) enabled, so LRO must be
+  disabled before running XDP:
+      ethtool -K eth0 lro off
+
+  The XDP_REDIRECT action is not yet supported.
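For concreteness, the XDP program the new section refers to is ordinary restricted C compiled to eBPF. The sketch below is an editorial illustration, not part of the patch; the file name xdp_pass.c and the program name are arbitrary. It passes every packet to the regular stack and can be attached to the synthetic NIC (after disabling LRO as described above) with, for example:
	ip link set dev eth0 xdp obj xdp_pass.o sec xdp

	/* xdp_pass.c - minimal XDP program (illustration only).
	 * Build: clang -O2 -g -target bpf -c xdp_pass.c -o xdp_pass.o
	 */
	#include <linux/bpf.h>
	#include <bpf/bpf_helpers.h>

	SEC("xdp")
	int xdp_pass_all(struct xdp_md *ctx)
	{
		/* No inspection or rewriting: hand every frame to the stack. */
		return XDP_PASS;
	}

	char _license[] SEC("license") = "GPL";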
From patchwork Mon Nov 2 12:48:56 2020
X-Patchwork-Submitter: William Breathitt Gray
X-Patchwork-Id: 1392247
From: William Breathitt Gray
To: kernel-team@lists.ubuntu.com
Subject: [SRU][B:linux-azure-4.15][PATCH 40/40] hv_netvsc: Fix XDP refcnt for synthetic and VF NICs
Date: Mon, 2 Nov 2020 07:48:56 -0500
Message-Id: <20201102124856.4659-41-william.gray@canonical.com>
In-Reply-To: <20201102124856.4659-1-william.gray@canonical.com>
References: <20201102124856.4659-1-william.gray@canonical.com>

From: Haiyang Zhang

BugLink: https://bugs.launchpad.net/bugs/1877654

The caller of XDP_SETUP_PROG has already incremented the refcnt in
__bpf_prog_get(), so a driver should only increment the refcnt by
num_queues - 1.

To fix the issue, update netvsc_xdp_set() to add the correct number
to the refcnt, and hold a refcnt in netvsc_xdp_set()'s other caller,
netvsc_attach(). Do the same in netvsc_vf_setxdp(). Otherwise, every
time the VF is removed and re-added on the host side, the refcnt is
decreased by one, which may cause a page fault when the XDP program
is unloaded.

Fixes: 351e1581395f ("hv_netvsc: Add XDP support")
Signed-off-by: Haiyang Zhang
Signed-off-by: David S. Miller
(cherry picked from commit 184367dce4f744bde54377203305ccc8889aa79f)
Signed-off-by: William Breathitt Gray
---
 drivers/net/hyperv/netvsc_bpf.c | 13 +++++++++++--
 drivers/net/hyperv/netvsc_drv.c |  5 ++++-
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_bpf.c b/drivers/net/hyperv/netvsc_bpf.c
index 20adfe544294..b86611041db6 100644
--- a/drivers/net/hyperv/netvsc_bpf.c
+++ b/drivers/net/hyperv/netvsc_bpf.c
@@ -120,7 +120,7 @@ int netvsc_xdp_set(struct net_device *dev, struct bpf_prog *prog,
 	}
 
 	if (prog)
-		bpf_prog_add(prog, nvdev->num_chn);
+		bpf_prog_add(prog, nvdev->num_chn - 1);
 
 	for (i = 0; i < nvdev->num_chn; i++)
 		rcu_assign_pointer(nvdev->chan_table[i].bpf_prog, prog);
@@ -136,6 +136,7 @@ int netvsc_vf_setxdp(struct net_device *vf_netdev, struct bpf_prog *prog)
 {
 	struct netdev_bpf xdp;
 	bpf_op_t ndo_bpf;
+	int ret;
 
 	ASSERT_RTNL();
 
@@ -148,10 +149,18 @@ int netvsc_vf_setxdp(struct net_device *vf_netdev, struct bpf_prog *prog)
 
 	memset(&xdp, 0, sizeof(xdp));
 
+	if (prog)
+		bpf_prog_inc(prog);
+
 	xdp.command = XDP_SETUP_PROG;
 	xdp.prog = prog;
 
-	return ndo_bpf(vf_netdev, &xdp);
+	ret = ndo_bpf(vf_netdev, &xdp);
+
+	if (ret && prog)
+		bpf_prog_put(prog);
+
+	return ret;
 }
 
 static u32 netvsc_xdp_query(struct netvsc_device *nvdev)
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 6b904816af5c..c78e76c304e5 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -1070,9 +1070,12 @@ static int netvsc_attach(struct net_device *ndev,
 
 	prog = dev_info->bprog;
 	if (prog) {
+		bpf_prog_inc(prog);
 		ret = netvsc_xdp_set(ndev, prog, NULL, nvdev);
-		if (ret)
+		if (ret) {
+			bpf_prog_put(prog);
 			goto err1;
+		}
 	}
 
 	/* In any case device is now ready */
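As a closing illustration of the reference-count accounting this fix restores: XDP_SETUP_PROG hands the driver a program that already carries one reference, taken by __bpf_prog_get() on behalf of the caller, so a driver that stores the program in num_chn per-channel slots must add only num_chn - 1 further references. The toy userspace model below is an editorial sketch, not kernel code; prog_add and prog_put are stand-ins for bpf_prog_add() and bpf_prog_put(). It checks that the count matches the number of stored pointers and balances to zero on teardown.

	#include <assert.h>
	#include <stdio.h>

	struct prog {
		int refcnt;
	};

	static void prog_add(struct prog *p, int n) { p->refcnt += n; }
	static void prog_put(struct prog *p) { p->refcnt -= 1; }

	int main(void)
	{
		/* XDP_SETUP_PROG arrives with one reference already held. */
		struct prog p = { .refcnt = 1 };
		int num_chn = 4;
		int i;

		/* The fixed netvsc_xdp_set() adds num_chn - 1, not num_chn,
		 * so held references equal the num_chn stored pointers.
		 */
		prog_add(&p, num_chn - 1);
		assert(p.refcnt == num_chn);

		/* Teardown drops one reference per stored pointer: the count
		 * balances at zero instead of underflowing on a later unload.
		 */
		for (i = 0; i < num_chn; i++)
			prog_put(&p);
		assert(p.refcnt == 0);

		printf("refcount accounting balanced\n");
		return 0;
	}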