From patchwork Sun Jul 22 15:13:01 2018
From: Toshiaki Makita
To: netdev@vger.kernel.org, Alexei Starovoitov , Daniel Borkmann
Cc: Toshiaki Makita , Jesper Dangaard Brouer
Subject: [PATCH v3 bpf-next 1/8] net: Export skb_headers_offset_update
Date: Mon, 23 Jul 2018 00:13:01 +0900

This is needed for veth XDP, which does an skb_copy_expand()-like
operation.

v2:
- Drop the skb_copy_header part because it has already been exported.
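Why the helper is needed: when an skb's data is copied into a new buffer with a different amount of headroom (as veth XDP does when it has to move a packet into a fresh page), every header offset stored relative to the buffer start must shift by the headroom difference. Below is a minimal user-space sketch of that bookkeeping, not kernel code. struct toy_skb, toy_headers_offset_update() and toy_copy_expand() are hypothetical names; the model keeps only a few offsets (the real skb also tracks the inner_* headers shown in the diff) and assumes every header starts at or after data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, much-reduced model of an skb: offsets are relative to
 * head, like the real {transport,network,mac}_header fields. */
struct toy_skb {
	unsigned char *head;
	uint16_t data;              /* offset of the first packet byte */
	bool csum_partial;          /* models ip_summed == CHECKSUM_PARTIAL */
	uint16_t csum_start;
	uint16_t mac_header;
	uint16_t network_header;
	uint16_t transport_header;
};

/* Shift every head-relative offset by off bytes. */
static void toy_headers_offset_update(struct toy_skb *skb, int off)
{
	/* csum_start only holds an offset for CHECKSUM_PARTIAL packets */
	if (skb->csum_partial)
		skb->csum_start += off;
	skb->mac_header += off;
	skb->network_header += off;
	skb->transport_header += off;
}

/* Copy len packet bytes into new_head, leaving new_headroom bytes in
 * front, then fix the header offsets by the change in headroom. */
static void toy_copy_expand(struct toy_skb *dst, const struct toy_skb *src,
			    unsigned char *new_head, uint16_t new_headroom,
			    uint16_t len)
{
	/* Assumption of this toy: no header lies before data */
	assert(src->mac_header >= src->data);

	*dst = *src;
	dst->head = new_head;
	dst->data = new_headroom;
	memcpy(new_head + new_headroom, src->head + src->data, len);
	toy_headers_offset_update(dst, (int)new_headroom - (int)src->data);
}
```

In veth_xdp_rcv_skb() the same shift is computed as skb_headroom(nskb) - skb_headroom(skb) and passed to the exported skb_headers_offset_update().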
Signed-off-by: Toshiaki Makita
---
 include/linux/skbuff.h | 1 +
 net/core/skbuff.c      | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index fd3cb1b247df..f6929688853a 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1035,6 +1035,7 @@ static inline struct sk_buff *alloc_skb_fclone(unsigned int size,
 }
 
 struct sk_buff *skb_morph(struct sk_buff *dst, struct sk_buff *src);
+void skb_headers_offset_update(struct sk_buff *skb, int off);
 int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask);
 struct sk_buff *skb_clone(struct sk_buff *skb, gfp_t priority);
 void skb_copy_header(struct sk_buff *new, const struct sk_buff *old);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 0c1a00672ba9..5366d1660e5b 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1291,7 +1291,7 @@ struct sk_buff *skb_clone(struct sk_buff *skb, gfp_t gfp_mask)
 }
 EXPORT_SYMBOL(skb_clone);
 
-static void skb_headers_offset_update(struct sk_buff *skb, int off)
+void skb_headers_offset_update(struct sk_buff *skb, int off)
 {
 	/* Only adjust this if it actually is csum_start rather than csum */
 	if (skb->ip_summed == CHECKSUM_PARTIAL)
@@ -1305,6 +1305,7 @@ static void skb_headers_offset_update(struct sk_buff *skb, int off)
 	skb->inner_network_header += off;
 	skb->inner_mac_header += off;
 }
+EXPORT_SYMBOL(skb_headers_offset_update);
 
 void skb_copy_header(struct sk_buff *new, const struct sk_buff *old)
 {

From patchwork Sun Jul 22 15:13:02 2018
From: Toshiaki Makita
Subject: [PATCH v3 bpf-next 2/8] veth: Add driver XDP
Date: Mon, 23 Jul 2018 00:13:02 +0900

This is the basic implementation of veth driver XDP.

Incoming packets are sent from the peer veth device in the form of skbs,
so this generally does the same thing as generic XDP.

This by itself is not very useful, but it is a starting point for
implementing other useful veth XDP features such as TX and REDIRECT.

This introduces NAPI when XDP is enabled, because XDP now relies heavily
on the NAPI context. A ptr_ring is used to emulate a NIC ring: the Tx
function enqueues packets to the ring and the peer's NAPI handler drains
it.

Currently only one ring is allocated for each veth device, so this does
not scale in multiqueue environments. This can be resolved later by
allocating rings on a per-queue basis.
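The ring-plus-NAPI scheme can be modelled in user space as a rough, single-threaded sketch. toy_ring, toy_flush() and toy_poll() are hypothetical. The ring mimics the ptr_ring convention that a NULL slot means "empty" and that the producer fails on a full ring (veth_xdp_rx() then drops the skb), and notify_masked plays the role of rx_notify_masked so that a burst of transmits schedules the poller only once. The cross-CPU memory barriers the patch needs (smp_mb(), smp_store_mb()) have no equivalent here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_RING_SIZE 4	/* tiny for the demo; the patch uses 256 */

/* Hypothetical ptr_ring-style queue: a NULL slot means "empty". */
struct toy_ring {
	void *queue[TOY_RING_SIZE];
	size_t producer;
	size_t consumer;
};

/* Returns 0 on success, -1 if the ring is full. */
static int toy_ring_produce(struct toy_ring *r, void *ptr)
{
	assert(ptr != NULL);		/* NULL is reserved for "empty" */
	if (r->queue[r->producer])
		return -1;		/* full: caller drops the packet */
	r->queue[r->producer] = ptr;
	r->producer = (r->producer + 1) % TOY_RING_SIZE;
	return 0;
}

/* Returns the oldest entry, or NULL if the ring is empty. */
static void *toy_ring_consume(struct toy_ring *r)
{
	void *ptr = r->queue[r->consumer];

	if (ptr) {
		r->queue[r->consumer] = NULL;
		r->consumer = (r->consumer + 1) % TOY_RING_SIZE;
	}
	return ptr;
}

struct toy_rx {
	struct toy_ring ring;
	bool notify_masked;	/* role of rx_notify_masked */
	int schedules;		/* counts napi_schedule()-like wakeups */
};

/* Sender side, after enqueueing (compare __veth_xdp_flush()). */
static void toy_flush(struct toy_rx *rx)
{
	if (!rx->notify_masked) {
		rx->notify_masked = true;
		rx->schedules++;
	}
}

/* Receiver side (compare veth_poll()): drain up to budget entries. */
static int toy_poll(struct toy_rx *rx, int budget)
{
	int done = 0;

	/* A real poller would process each consumed entry here. */
	while (done < budget && toy_ring_consume(&rx->ring))
		done++;

	if (done < budget) {
		rx->notify_masked = false;
		/* Re-check for an entry that raced with unmasking. */
		if (rx->ring.queue[rx->ring.consumer])
			toy_flush(rx);
	}
	return done;
}
```

The re-check after clearing the flag mirrors veth_poll(). In this single-threaded model it never fires, but with a concurrent producer it closes the window in which an entry is enqueued while notifications still look masked.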
Note that when XDP is not loaded, NAPI is not used and netif_rx() is
used as before, so this does not change the default behaviour.

v3:
- Fix race on closing the device.
- Add extack messages in ndo_bpf.

v2:
- Squashed with the patch adding NAPI.
- Implement adjust_tail.
- Don't acquire the consumer lock because it is guarded by NAPI.
- Make poll_controller a no-op since it is unnecessary.
- Register rxq_info on enabling XDP rather than on opening the device.

Signed-off-by: Toshiaki Makita
---
 drivers/net/veth.c | 373 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 366 insertions(+), 7 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index a69ad39ee57e..78fa08cb6e24 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -19,10 +19,18 @@
 #include
 #include
 #include
+#include
+#include
+#include
+#include
+#include
 
 #define DRV_NAME "veth"
 #define DRV_VERSION "1.0"
 
+#define VETH_RING_SIZE 256
+#define VETH_XDP_HEADROOM (XDP_PACKET_HEADROOM + NET_IP_ALIGN)
+
 struct pcpu_vstats {
 	u64 packets;
 	u64 bytes;
@@ -30,9 +38,16 @@ struct pcpu_vstats {
 };
 
 struct veth_priv {
+	struct napi_struct xdp_napi;
+	struct net_device *dev;
+	struct bpf_prog __rcu *xdp_prog;
+	struct bpf_prog *_xdp_prog;
 	struct net_device __rcu *peer;
 	atomic64_t dropped;
 	unsigned requested_headroom;
+	bool rx_notify_masked;
+	struct ptr_ring xdp_ring;
+	struct xdp_rxq_info xdp_rxq;
 };
 
 /*
@@ -98,11 +113,43 @@ static const struct ethtool_ops veth_ethtool_ops = {
 	.get_link_ksettings = veth_get_link_ksettings,
 };
 
-static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
+/* general routines */
+
+static void __veth_xdp_flush(struct veth_priv *priv)
+{
+	/* Write ptr_ring before reading rx_notify_masked */
+	smp_mb();
+	if (!priv->rx_notify_masked) {
+		priv->rx_notify_masked = true;
+		napi_schedule(&priv->xdp_napi);
+	}
+}
+
+static int veth_xdp_rx(struct veth_priv *priv, struct sk_buff *skb)
+{
+	if (unlikely(ptr_ring_produce(&priv->xdp_ring, skb))) {
+		dev_kfree_skb_any(skb);
+		return NET_RX_DROP;
+	}
+
+	return NET_RX_SUCCESS;
+}
+
+static int veth_forward_skb(struct net_device *dev, struct sk_buff *skb, bool xdp)
 {
 	struct veth_priv *priv = netdev_priv(dev);
+
+	return __dev_forward_skb(dev, skb) ?: xdp ?
+		veth_xdp_rx(priv, skb) :
+		netif_rx(skb);
+}
+
+static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+	struct veth_priv *rcv_priv, *priv = netdev_priv(dev);
 	struct net_device *rcv;
 	int length = skb->len;
+	bool rcv_xdp = false;
 
 	rcu_read_lock();
 	rcv = rcu_dereference(priv->peer);
@@ -111,7 +158,10 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
 		goto drop;
 	}
 
-	if (likely(dev_forward_skb(rcv, skb) == NET_RX_SUCCESS)) {
+	rcv_priv = netdev_priv(rcv);
+	rcv_xdp = rcu_access_pointer(rcv_priv->xdp_prog);
+
+	if (likely(veth_forward_skb(rcv, skb, rcv_xdp) == NET_RX_SUCCESS)) {
 		struct pcpu_vstats *stats = this_cpu_ptr(dev->vstats);
 
 		u64_stats_update_begin(&stats->syncp);
@@ -122,14 +172,15 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
 drop:
 		atomic64_inc(&priv->dropped);
 	}
+
+	if (rcv_xdp)
+		__veth_xdp_flush(rcv_priv);
+
 	rcu_read_unlock();
+
 	return NETDEV_TX_OK;
 }
 
-/*
- * general routines
- */
-
 static u64 veth_stats_one(struct pcpu_vstats *result, struct net_device *dev)
 {
 	struct veth_priv *priv = netdev_priv(dev);
@@ -179,18 +230,253 @@ static void veth_set_multicast_list(struct net_device *dev)
 {
 }
 
+static struct sk_buff *veth_build_skb(void *head, int headroom, int len,
+				      int buflen)
+{
+	struct sk_buff *skb;
+
+	if (!buflen) {
+		buflen = SKB_DATA_ALIGN(headroom + len) +
+			 SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	}
+	skb = build_skb(head, buflen);
+	if (!skb)
+		return NULL;
+
+	skb_reserve(skb, headroom);
+	skb_put(skb, len);
+
+	return skb;
+}
+
+static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv,
+					struct sk_buff *skb)
+{
+	u32 pktlen, headroom, act, metalen;
+	void *orig_data, *orig_data_end;
+	int size, mac_len, delta, off;
+	struct bpf_prog *xdp_prog;
+	struct xdp_buff xdp;
+
+	rcu_read_lock();
+	xdp_prog = rcu_dereference(priv->xdp_prog);
+	if (unlikely(!xdp_prog)) {
+		rcu_read_unlock();
+		goto out;
+	}
+
+	mac_len = skb->data - skb_mac_header(skb);
+	pktlen = skb->len + mac_len;
+	size = SKB_DATA_ALIGN(VETH_XDP_HEADROOM + pktlen) +
+	       SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	if (size > PAGE_SIZE)
+		goto drop;
+
+	headroom = skb_headroom(skb) - mac_len;
+	if (skb_shared(skb) || skb_head_is_locked(skb) ||
+	    skb_is_nonlinear(skb) || headroom < XDP_PACKET_HEADROOM) {
+		struct sk_buff *nskb;
+		void *head, *start;
+		struct page *page;
+		int head_off;
+
+		page = alloc_page(GFP_ATOMIC);
+		if (!page)
+			goto drop;
+
+		head = page_address(page);
+		start = head + VETH_XDP_HEADROOM;
+		if (skb_copy_bits(skb, -mac_len, start, pktlen)) {
+			page_frag_free(head);
+			goto drop;
+		}
+
+		nskb = veth_build_skb(head,
+				      VETH_XDP_HEADROOM + mac_len, skb->len,
+				      PAGE_SIZE);
+		if (!nskb) {
+			page_frag_free(head);
+			goto drop;
+		}
+
+		skb_copy_header(nskb, skb);
+		head_off = skb_headroom(nskb) - skb_headroom(skb);
+		skb_headers_offset_update(nskb, head_off);
+		if (skb->sk)
+			skb_set_owner_w(nskb, skb->sk);
+		consume_skb(skb);
+		skb = nskb;
+	}
+
+	xdp.data_hard_start = skb->head;
+	xdp.data = skb_mac_header(skb);
+	xdp.data_end = xdp.data + pktlen;
+	xdp.data_meta = xdp.data;
+	xdp.rxq = &priv->xdp_rxq;
+	orig_data = xdp.data;
+	orig_data_end = xdp.data_end;
+
+	act = bpf_prog_run_xdp(xdp_prog, &xdp);
+
+	switch (act) {
+	case XDP_PASS:
+		break;
+	default:
+		bpf_warn_invalid_xdp_action(act);
+	case XDP_ABORTED:
+		trace_xdp_exception(priv->dev, xdp_prog, act);
+	case XDP_DROP:
+		goto drop;
+	}
+	rcu_read_unlock();
+
+	delta = orig_data - xdp.data;
+	off = mac_len + delta;
+	if (off > 0)
+		__skb_push(skb, off);
+	else if (off < 0)
+		__skb_pull(skb, -off);
+	skb->mac_header -= delta;
+	off = xdp.data_end - orig_data_end;
+	if (off != 0)
+		__skb_put(skb, off);
+	skb->protocol = eth_type_trans(skb, priv->dev);
+
+	metalen = xdp.data - xdp.data_meta;
+	if (metalen)
+		skb_metadata_set(skb, metalen);
+out:
+	return skb;
+drop:
+	rcu_read_unlock();
+	kfree_skb(skb);
+	return NULL;
+}
+
+static int veth_xdp_rcv(struct veth_priv *priv, int budget)
+{
+	int i, done = 0;
+
+	for (i = 0; i < budget; i++) {
+		struct sk_buff *skb = __ptr_ring_consume(&priv->xdp_ring);
+
+		if (!skb)
+			break;
+
+		skb = veth_xdp_rcv_skb(priv, skb);
+
+		if (skb)
+			napi_gro_receive(&priv->xdp_napi, skb);
+
+		done++;
+	}
+
+	return done;
+}
+
+static int veth_poll(struct napi_struct *napi, int budget)
+{
+	struct veth_priv *priv =
+		container_of(napi, struct veth_priv, xdp_napi);
+	int done;
+
+	done = veth_xdp_rcv(priv, budget);
+
+	if (done < budget && napi_complete_done(napi, done)) {
+		/* Write rx_notify_masked before reading ptr_ring */
+		smp_store_mb(priv->rx_notify_masked, false);
+		if (unlikely(!__ptr_ring_empty(&priv->xdp_ring))) {
+			priv->rx_notify_masked = true;
+			napi_schedule(&priv->xdp_napi);
+		}
+	}
+
+	return done;
+}
+
+static int veth_napi_add(struct net_device *dev)
+{
+	struct veth_priv *priv = netdev_priv(dev);
+	int err;
+
+	err = ptr_ring_init(&priv->xdp_ring, VETH_RING_SIZE, GFP_KERNEL);
+	if (err)
+		return err;
+
+	netif_napi_add(dev, &priv->xdp_napi, veth_poll, NAPI_POLL_WEIGHT);
+	napi_enable(&priv->xdp_napi);
+
+	return 0;
+}
+
+static void veth_napi_del(struct net_device *dev)
+{
+	struct veth_priv *priv = netdev_priv(dev);
+
+	napi_disable(&priv->xdp_napi);
+	netif_napi_del(&priv->xdp_napi);
+	priv->rx_notify_masked = false;
+	ptr_ring_cleanup(&priv->xdp_ring, __skb_array_destroy_skb);
+}
+
+static int veth_enable_xdp(struct net_device *dev)
+{
+	struct veth_priv *priv = netdev_priv(dev);
+	int err;
+
+	if (!xdp_rxq_info_is_reg(&priv->xdp_rxq)) {
+		err = xdp_rxq_info_reg(&priv->xdp_rxq, dev, 0);
+		if (err < 0)
+			return err;
+
+		err = xdp_rxq_info_reg_mem_model(&priv->xdp_rxq,
+						 MEM_TYPE_PAGE_SHARED, NULL);
+		if (err < 0)
+			goto err;
+
+		err = veth_napi_add(dev);
+		if (err)
+			goto err;
+	}
+
+	rcu_assign_pointer(priv->xdp_prog, priv->_xdp_prog);
+
+	return 0;
+err:
+	xdp_rxq_info_unreg(&priv->xdp_rxq);
+
+	return err;
+}
+
+static void veth_disable_xdp(struct net_device *dev)
+{
+	struct veth_priv *priv = netdev_priv(dev);
+
+	rcu_assign_pointer(priv->xdp_prog, NULL);
+	veth_napi_del(dev);
+	xdp_rxq_info_unreg(&priv->xdp_rxq);
+}
+
 static int veth_open(struct net_device *dev)
 {
 	struct veth_priv *priv = netdev_priv(dev);
 	struct net_device *peer = rtnl_dereference(priv->peer);
+	int err;
 
 	if (!peer)
 		return -ENOTCONN;
 
+	if (priv->_xdp_prog) {
+		err = veth_enable_xdp(dev);
+		if (err)
+			return err;
+	}
+
 	if (peer->flags & IFF_UP) {
 		netif_carrier_on(dev);
 		netif_carrier_on(peer);
 	}
+
 	return 0;
 }
 
@@ -203,6 +489,9 @@ static int veth_close(struct net_device *dev)
 	if (peer)
 		netif_carrier_off(peer);
 
+	if (priv->_xdp_prog)
+		veth_disable_xdp(dev);
+
 	return 0;
 }
 
@@ -228,7 +517,7 @@ static void veth_dev_free(struct net_device *dev)
 static void veth_poll_controller(struct net_device *dev)
 {
 	/* veth only receives frames when its peer sends one
-	 * Since it's a synchronous operation, we are guaranteed
+	 * Since it has nothing to do with disabling irqs, we are guaranteed
 	 * never to have pending data when we poll for it so
 	 * there is nothing to do here.
 	 *
@@ -276,6 +565,72 @@ static void veth_set_rx_headroom(struct net_device *dev, int new_hr)
 	rcu_read_unlock();
 }
 
+static int veth_xdp_set(struct net_device *dev, struct bpf_prog *prog,
+			struct netlink_ext_ack *extack)
+{
+	struct veth_priv *priv = netdev_priv(dev);
+	struct bpf_prog *old_prog;
+	struct net_device *peer;
+	int err;
+
+	old_prog = priv->_xdp_prog;
+	priv->_xdp_prog = prog;
+	peer = rtnl_dereference(priv->peer);
+
+	if (prog) {
+		if (!peer) {
+			NL_SET_ERR_MSG_MOD(extack, "Cannot set XDP when peer is detached");
+			err = -ENOTCONN;
+			goto err;
+		}
+
+		if (dev->flags & IFF_UP) {
+			err = veth_enable_xdp(dev);
+			if (err) {
+				NL_SET_ERR_MSG_MOD(extack, "Setup for XDP failed");
+				goto err;
+			}
+		}
+	}
+
+	if (old_prog) {
+		if (!prog && dev->flags & IFF_UP)
+			veth_disable_xdp(dev);
+		bpf_prog_put(old_prog);
+	}
+
+	return 0;
+err:
+	priv->_xdp_prog = old_prog;
+
+	return err;
+}
+
+static u32 veth_xdp_query(struct net_device *dev)
+{
+	struct veth_priv *priv = netdev_priv(dev);
+	const struct bpf_prog *xdp_prog;
+
+	xdp_prog = priv->_xdp_prog;
+	if (xdp_prog)
+		return xdp_prog->aux->id;
+
+	return 0;
+}
+
+static int veth_xdp(struct net_device *dev, struct netdev_bpf *xdp)
+{
+	switch (xdp->command) {
+	case XDP_SETUP_PROG:
+		return veth_xdp_set(dev, xdp->prog, xdp->extack);
+	case XDP_QUERY_PROG:
+		xdp->prog_id = veth_xdp_query(dev);
+		return 0;
+	default:
+		return -EINVAL;
+	}
+}
+
 static const struct net_device_ops veth_netdev_ops = {
 	.ndo_init = veth_dev_init,
 	.ndo_open = veth_open,
@@ -290,6 +645,7 @@ static const struct net_device_ops veth_netdev_ops = {
 	.ndo_get_iflink = veth_get_iflink,
 	.ndo_features_check = passthru_features_check,
 	.ndo_set_rx_headroom = veth_set_rx_headroom,
+	.ndo_bpf = veth_xdp,
 };
 
 #define VETH_FEATURES (NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_HW_CSUM | \
@@ -451,10 +807,13 @@ static int veth_newlink(struct net *src_net, struct net_device *dev,
 	 */
 
 	priv = netdev_priv(dev);
+	priv->dev = dev;
 	rcu_assign_pointer(priv->peer, peer);
 
 	priv = netdev_priv(peer);
+	priv->dev = peer;
 	rcu_assign_pointer(priv->peer, dev);
+
 	return 0;
 
 err_register_dev:

From patchwork Sun Jul 22 15:13:03 2018
From: Toshiaki Makita
Subject: [PATCH v3 bpf-next 3/8] veth: Avoid drops by oversized packets when XDP is enabled
Date: Mon, 23 Jul 2018 00:13:03 +0900

All oversized packets, including GSO packets, are dropped if XDP is
enabled on the receiver side, so don't send such packets from the peer.
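A rough way to see what "oversized" means and why the MTU cap avoids it: veth_xdp_rcv_skb() drops any frame whose cache-aligned size, XDP headroom and skb_shared_info included, exceeds PAGE_SIZE, and this patch caps the peer's max_mtu using the same terms. The sketch below evaluates both formulas in user space. Every TOY_* constant is an assumed example value (a 4 KiB page, 64-byte cache lines, NET_IP_ALIGN of 2, a 320-byte skb_shared_info), not something taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed example values; the real ones depend on arch and config. */
#define TOY_PAGE_SIZE		4096
#define TOY_CACHE_BYTES		64	/* stands in for SMP_CACHE_BYTES */
#define TOY_XDP_PACKET_HEADROOM	256
#define TOY_NET_IP_ALIGN	2
#define TOY_SHINFO_SIZE		320	/* assumed sizeof(struct skb_shared_info) */
#define TOY_ETH_HLEN		14

/* Round up to a cache line, like SKB_DATA_ALIGN() */
#define TOY_ALIGN(x) (((x) + TOY_CACHE_BYTES - 1) & ~(TOY_CACHE_BYTES - 1))
#define TOY_XDP_HEADROOM (TOY_XDP_PACKET_HEADROOM + TOY_NET_IP_ALIGN)

/* Same shape as the drop check in veth_xdp_rcv_skb(): pktlen includes
 * the MAC header and must fit in one page with headroom and shinfo. */
static bool toy_fits_in_page(unsigned int pktlen)
{
	unsigned int size = TOY_ALIGN(TOY_XDP_HEADROOM + pktlen) +
			    TOY_ALIGN(TOY_SHINFO_SIZE);

	return size <= TOY_PAGE_SIZE;
}

/* Same shape as the peer->max_mtu cap set in veth_xdp_set(). */
static unsigned int toy_max_mtu(unsigned int hard_header_len)
{
	/* Guard against unsigned underflow for absurd header lengths */
	assert(hard_header_len + TOY_XDP_HEADROOM +
	       TOY_ALIGN(TOY_SHINFO_SIZE) < TOY_PAGE_SIZE);

	return TOY_PAGE_SIZE - TOY_XDP_HEADROOM - hard_header_len -
	       TOY_ALIGN(TOY_SHINFO_SIZE);
}
```

With these assumptions the cap comes out at 3504 bytes, and a frame of exactly max_mtu plus the Ethernet header fills the page to the byte, while one byte more fails the check. GSO packets are typically far larger than a page, which is why they have to be segmented before reaching an XDP-enabled peer.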
Drop the TSO and SCTP fragmentation features so that the sending veth
device segments packets itself when XDP is enabled on its peer. Also cap
the MTU accordingly.

Signed-off-by: Toshiaki Makita
---
 drivers/net/veth.c | 41 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 39 insertions(+), 2 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 78fa08cb6e24..f5b72e937d9d 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -542,6 +542,23 @@ static int veth_get_iflink(const struct net_device *dev)
 	return iflink;
 }
 
+static netdev_features_t veth_fix_features(struct net_device *dev,
+					   netdev_features_t features)
+{
+	struct veth_priv *priv = netdev_priv(dev);
+	struct net_device *peer;
+
+	peer = rtnl_dereference(priv->peer);
+	if (peer) {
+		struct veth_priv *peer_priv = netdev_priv(peer);
+
+		if (peer_priv->_xdp_prog)
+			features &= ~NETIF_F_GSO_SOFTWARE;
+	}
+
+	return features;
+}
+
 static void veth_set_rx_headroom(struct net_device *dev, int new_hr)
 {
 	struct veth_priv *peer_priv, *priv = netdev_priv(dev);
@@ -591,14 +608,33 @@ static int veth_xdp_set(struct net_device *dev, struct bpf_prog *prog,
 				goto err;
 			}
 		}
+
+		if (!old_prog) {
+			peer->hw_features &= ~NETIF_F_GSO_SOFTWARE;
+			peer->max_mtu = PAGE_SIZE - VETH_XDP_HEADROOM -
+				peer->hard_header_len -
+				SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+			if (peer->mtu > peer->max_mtu)
+				dev_set_mtu(peer, peer->max_mtu);
+		}
 	}
 
 	if (old_prog) {
-		if (!prog && dev->flags & IFF_UP)
-			veth_disable_xdp(dev);
+		if (!prog) {
+			if (dev->flags & IFF_UP)
+				veth_disable_xdp(dev);
+
+			if (peer) {
+				peer->hw_features |= NETIF_F_GSO_SOFTWARE;
+				peer->max_mtu = ETH_MAX_MTU;
+			}
+		}
 		bpf_prog_put(old_prog);
 	}
 
+	if ((!!old_prog ^ !!prog) && peer)
+		netdev_update_features(peer);
+
 	return 0;
 err:
 	priv->_xdp_prog = old_prog;
@@ -643,6 +679,7 @@ static const struct net_device_ops veth_netdev_ops = {
 	.ndo_poll_controller = veth_poll_controller,
 #endif
 	.ndo_get_iflink = veth_get_iflink,
+	.ndo_fix_features = veth_fix_features,
 	.ndo_features_check = passthru_features_check,
 	.ndo_set_rx_headroom = veth_set_rx_headroom,
 	.ndo_bpf = veth_xdp,

From patchwork Sun Jul 22 15:13:04 2018
From: Toshiaki Makita
Subject: [PATCH v3 bpf-next 4/8] veth: Handle xdp_frames in xdp napi ring
Date: Mon, 23 Jul 2018 00:13:04 +0900

This is preparation for XDP TX and ndo_xdp_xmit.

It allows the NAPI handler to handle xdp_frames, as well as sk_buffs,
through the XDP ring.
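The single-ring approach relies on the fact that ring entries point to objects aligned to at least two bytes, so bit 0 is always clear and can record whether an entry is an xdp_frame or an sk_buff. A minimal user-space sketch of the idea follows, using uintptr_t casts in place of the patch's unsigned long. The toy_* types and functions are hypothetical; in particular toy_xdp_to_ptr() illustrates the producer side, whereas this patch only adds the consumer-side helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same idea as VETH_XDP_FLAG: a one-bit type tag stored in bit 0. */
#define TOY_XDP_FLAG ((uintptr_t)1)

struct toy_skb { int len; };		/* hypothetical stand-ins */
struct toy_xdp_frame { int len; };

/* Producer side (hypothetical): tag a frame pointer before enqueueing. */
static void *toy_xdp_to_ptr(struct toy_xdp_frame *frame)
{
	/* Tagging only works if bit 0 of the real pointer is clear */
	assert(((uintptr_t)frame & TOY_XDP_FLAG) == 0);
	return (void *)((uintptr_t)frame | TOY_XDP_FLAG);
}

/* Compare veth_is_xdp_frame() */
static bool toy_is_xdp_frame(const void *ptr)
{
	return (uintptr_t)ptr & TOY_XDP_FLAG;
}

/* Compare veth_ptr_to_xdp(): strip the tag to recover the pointer. */
static struct toy_xdp_frame *toy_ptr_to_xdp(void *ptr)
{
	return (struct toy_xdp_frame *)((uintptr_t)ptr & ~TOY_XDP_FLAG);
}

/* Consumer side: dispatch on the tag, as veth_xdp_rcv() does. */
static int toy_entry_len(void *ptr)
{
	if (toy_is_xdp_frame(ptr))
		return toy_ptr_to_xdp(ptr)->len;
	return ((struct toy_skb *)ptr)->len;
}
```

Untagged entries are used as-is, so existing skb producers such as veth_xdp_rx() need no change; only xdp_frame producers must set the bit.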
v3:
- Revert the v2 change around rings and use a flag to differentiate skb
  and xdp_frame, since bulk skb xmit makes little performance difference
  for now.

v2:
- Use another ring instead of using a flag to differentiate skb and
  xdp_frame. This approach makes bulk skb transmit possible in veth_xmit
  later.
- Clear xdp_frame fields in skb->head.
- Implement adjust_tail.

Signed-off-by: Toshiaki Makita
---
 drivers/net/veth.c | 87 ++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 82 insertions(+), 5 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index f5b72e937d9d..4be75c58bc6a 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -22,12 +22,12 @@
 #include
 #include
 #include
-#include
 #include
 
 #define DRV_NAME "veth"
 #define DRV_VERSION "1.0"
 
+#define VETH_XDP_FLAG BIT(0)
 #define VETH_RING_SIZE 256
 #define VETH_XDP_HEADROOM (XDP_PACKET_HEADROOM + NET_IP_ALIGN)
 
@@ -115,6 +115,24 @@ static const struct ethtool_ops veth_ethtool_ops = {
 
 /* general routines */
 
+static bool veth_is_xdp_frame(void *ptr)
+{
+	return (unsigned long)ptr & VETH_XDP_FLAG;
+}
+
+static void *veth_ptr_to_xdp(void *ptr)
+{
+	return (void *)((unsigned long)ptr & ~VETH_XDP_FLAG);
+}
+
+static void veth_ptr_free(void *ptr)
+{
+	if (veth_is_xdp_frame(ptr))
+		xdp_return_frame(veth_ptr_to_xdp(ptr));
+	else
+		kfree_skb(ptr);
+}
+
 static void __veth_xdp_flush(struct veth_priv *priv)
 {
 	/* Write ptr_ring before reading rx_notify_masked */
@@ -249,6 +267,61 @@ static struct sk_buff *veth_build_skb(void *head, int headroom, int len,
 	return skb;
 }
 
+static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv,
+					struct xdp_frame *frame)
+{
+	int len = frame->len, delta = 0;
+	struct bpf_prog *xdp_prog;
+	unsigned int headroom;
+	struct sk_buff *skb;
+
+	rcu_read_lock();
+	xdp_prog = rcu_dereference(priv->xdp_prog);
+	if (likely(xdp_prog)) {
+		struct xdp_buff xdp;
+		u32 act;
+
+		xdp.data_hard_start = frame->data - frame->headroom;
+		xdp.data = frame->data;
+		xdp.data_end = frame->data + frame->len;
+		xdp.data_meta = frame->data - frame->metasize;
+		xdp.rxq = &priv->xdp_rxq;
+
+		act = bpf_prog_run_xdp(xdp_prog, &xdp);
+
+		switch (act) {
+		case XDP_PASS:
+			delta = frame->data - xdp.data;
+			len = xdp.data_end - xdp.data;
+			break;
+		default:
+			bpf_warn_invalid_xdp_action(act);
+		case XDP_ABORTED:
+			trace_xdp_exception(priv->dev, xdp_prog, act);
+		case XDP_DROP:
+			goto err_xdp;
+		}
+	}
+	rcu_read_unlock();
+
+	headroom = frame->data - delta - (void *)frame;
+	skb = veth_build_skb(frame, headroom, len, 0);
+	if (!skb) {
+		xdp_return_frame(frame);
+		goto err;
+	}
+
+	memset(frame, 0, sizeof(*frame));
+	skb->protocol = eth_type_trans(skb, priv->dev);
+err:
+	return skb;
+err_xdp:
+	rcu_read_unlock();
+	xdp_return_frame(frame);
+
+	return NULL;
+}
+
 static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv,
 					struct sk_buff *skb)
 {
@@ -358,12 +431,16 @@ static int veth_xdp_rcv(struct veth_priv *priv, int budget)
 	int i, done = 0;
 
 	for (i = 0; i < budget; i++) {
-		struct sk_buff *skb = __ptr_ring_consume(&priv->xdp_ring);
+		void *ptr = __ptr_ring_consume(&priv->xdp_ring);
+		struct sk_buff *skb;
 
-		if (!skb)
+		if (!ptr)
 			break;
 
-		skb = veth_xdp_rcv_skb(priv, skb);
+		if (veth_is_xdp_frame(ptr))
+			skb = veth_xdp_rcv_one(priv, veth_ptr_to_xdp(ptr));
+		else
+			skb = veth_xdp_rcv_skb(priv, ptr);
 
 		if (skb)
 			napi_gro_receive(&priv->xdp_napi, skb);
@@ -416,7 +493,7 @@ static void veth_napi_del(struct net_device *dev)
 	napi_disable(&priv->xdp_napi);
 	netif_napi_del(&priv->xdp_napi);
 	priv->rx_notify_masked = false;
-	ptr_ring_cleanup(&priv->xdp_ring, __skb_array_destroy_skb);
+	ptr_ring_cleanup(&priv->xdp_ring, veth_ptr_free);
 }
 
 static int veth_enable_xdp(struct net_device *dev)

From patchwork Sun Jul 22 15:13:05 2018
patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="AhELqD6y"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 41YSnB5YZrz9s3N for ; Mon, 23 Jul 2018 01:13:34 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729850AbeGVQKb (ORCPT ); Sun, 22 Jul 2018 12:10:31 -0400 Received: from mail-pl0-f65.google.com ([209.85.160.65]:33255 "EHLO mail-pl0-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728394AbeGVQKb (ORCPT ); Sun, 22 Jul 2018 12:10:31 -0400 Received: by mail-pl0-f65.google.com with SMTP id 6-v6so7160717plb.0 for ; Sun, 22 Jul 2018 08:13:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=Fej+FK2Ec5SX1ZXBPw3rC9Kf3wDYZpa1YiWyhnveH7k=; b=AhELqD6yUT88AlFXl5j0FqBuQEvCnXDt3d6ek2phq/6DtOPGp1YCsr91CggS1wMnqG vsyk25S4J+FEJpGQjyj0sDq0ivynQoo6Mp+Y65KvKtmqhMQE6s8dcDe1LC0SR1oedfHv 9baate6LN4YLt2a5G0/GFzidDTsqqd7GZqpaQ3ciaGeZl8dRdmhOCqxXuLIy/Ubi0McI 3iN+yppPek6KT21ZA0vnB5Pmem7ZoTV9iLwbvVbaXDfG3QHkXLO3/jAahQqkqQLTv7+Z 0xh3XpuKZHFZOC3aHmTd65cqthHQaJ0QHVxCWv0HPzLvFej8gpAlXMLBKf3ddoM/pD/a hANw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=Fej+FK2Ec5SX1ZXBPw3rC9Kf3wDYZpa1YiWyhnveH7k=; b=HpdMU9bozsCUlDjrMbCe0SZMTHliN3yG72JP2NXqpaqo9Ehs0q5fVss4NXfDvb2eNs 
p/gEJCubsm7s3eDMz4/wGvI6P0CsjE6FB0K8Jo3tqee7x1Tn/g0a/fc/mSHe22dwb0dC njh6P30Y2HiUYKYEr0Bly6NuaoF3gRJjqbRCC3APU9WAPD6oxmvQBet/1dlLgyQNpDqp Qru8CUEkIz3MqPn4hX2fQRLNCSWvD6G3PIlatyjHP/jxafPzZq4uKBbceiPrFddunlR2 /kCQ0UjM0f5b94CI5F31BpT08/3HWDSDggrlM1o+TG/2ThODtKh3P4pNIxQM+ivzglnL lvWA== X-Gm-Message-State: AOUpUlFePXefnwQYjLZwP2k3M2mYnufdAbZvb8GVqf1vZJrSwdzCb1YM xu+q1Yzxdl3TGcYZgEa6xuuHZYsh X-Google-Smtp-Source: AAOMgpeFQBQ2Wh/cTrS7AfaanIJG7SRWuDKOvwv+61g5Rqa38778d48zMEDcCjceBzc2m3eSD3dwvA== X-Received: by 2002:a17:902:8a87:: with SMTP id p7-v6mr9319172plo.281.1532272410848; Sun, 22 Jul 2018 08:13:30 -0700 (PDT) Received: from localhost.localdomain (i153-145-22-9.s42.a013.ap.plala.or.jp. [153.145.22.9]) by smtp.gmail.com with ESMTPSA id v6-v6sm12092940pfa.28.2018.07.22.08.13.28 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Sun, 22 Jul 2018 08:13:30 -0700 (PDT) From: Toshiaki Makita To: netdev@vger.kernel.org, Alexei Starovoitov , Daniel Borkmann Cc: Toshiaki Makita , Jesper Dangaard Brouer Subject: [PATCH v3 bpf-next 5/8] veth: Add ndo_xdp_xmit Date: Mon, 23 Jul 2018 00:13:05 +0900 Message-Id: <20180722151308.5480-6-toshiaki.makita1@gmail.com> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20180722151308.5480-1-toshiaki.makita1@gmail.com> References: <20180722151308.5480-1-toshiaki.makita1@gmail.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Toshiaki Makita This allows a NIC's XDP program to redirect packets to veth. The destination veth device enqueues redirected packets to the napi ring of its peer, where they are then processed by XDP on the peer veth device. When the peer has driver XDP enabled, this can be thought of as one XDP program calling another via REDIRECT. 
Note that when the peer veth device does not have driver XDP enabled, redirected packets will be dropped because the peer is not ready for NAPI. v2: - Drop the part converting xdp_frame into skb when XDP is not enabled.
- Implement bulk interface of ndo_xdp_xmit. - Implement XDP_XMIT_FLUSH bit and drop ndo_xdp_flush. Signed-off-by: Toshiaki Makita --- drivers/net/veth.c | 45 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 45 insertions(+) diff --git a/drivers/net/veth.c b/drivers/net/veth.c index 4be75c58bc6a..57187e955fea 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -17,6 +17,7 @@ #include #include #include +#include #include #include #include @@ -125,6 +126,11 @@ static void *veth_ptr_to_xdp(void *ptr) return (void *)((unsigned long)ptr & ~VETH_XDP_FLAG); } +static void *veth_xdp_to_ptr(void *ptr) +{ + return (void *)((unsigned long)ptr | VETH_XDP_FLAG); +} + static void veth_ptr_free(void *ptr) { if (veth_is_xdp_frame(ptr)) @@ -267,6 +273,44 @@ static struct sk_buff *veth_build_skb(void *head, int headroom, int len, return skb; } +static int veth_xdp_xmit(struct net_device *dev, int n, + struct xdp_frame **frames, u32 flags) +{ + struct veth_priv *rcv_priv, *priv = netdev_priv(dev); + struct net_device *rcv; + int i, drops = 0; + + if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK)) + return -EINVAL; + + rcv = rcu_dereference(priv->peer); + if (unlikely(!rcv)) + return -ENXIO; + + rcv_priv = netdev_priv(rcv); + /* xdp_ring is initialized on receive side? 
*/ + if (!rcu_access_pointer(rcv_priv->xdp_prog)) + return -ENXIO; + + spin_lock(&rcv_priv->xdp_ring.producer_lock); + for (i = 0; i < n; i++) { + struct xdp_frame *frame = frames[i]; + void *ptr = veth_xdp_to_ptr(frame); + + if (unlikely(xdp_ok_fwd_dev(rcv, frame->len) || + __ptr_ring_produce(&rcv_priv->xdp_ring, ptr))) { + xdp_return_frame_rx_napi(frame); + drops++; + } + } + spin_unlock(&rcv_priv->xdp_ring.producer_lock); + + if (flags & XDP_XMIT_FLUSH) + __veth_xdp_flush(rcv_priv); + + return n - drops; +} + static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv, struct xdp_frame *frame) { @@ -760,6 +804,7 @@ static const struct net_device_ops veth_netdev_ops = { .ndo_features_check = passthru_features_check, .ndo_set_rx_headroom = veth_set_rx_headroom, .ndo_bpf = veth_xdp, + .ndo_xdp_xmit = veth_xdp_xmit, }; #define VETH_FEATURES (NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_HW_CSUM | \ From patchwork Sun Jul 22 15:13:06 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Toshiaki Makita X-Patchwork-Id: 947479 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="RKNpjAmm"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 41YSnC5BBgz9s4Z for ; Mon, 23 Jul 2018 01:13:35 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729862AbeGVQKd (ORCPT ); Sun, 22 Jul 2018 12:10:33 -0400 Received: 
from mail-pl0-f67.google.com ([209.85.160.67]:43278 "EHLO mail-pl0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728394AbeGVQKd (ORCPT ); Sun, 22 Jul 2018 12:10:33 -0400 Received: by mail-pl0-f67.google.com with SMTP id o7-v6so7147468plk.10 for ; Sun, 22 Jul 2018 08:13:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=2nzcyJeVttL2IwH2+ztuTUDWWCC4+d2p170/bZF+Ll4=; b=RKNpjAmm9252ciGjHQ9ZCeGGJpPVq/0/MxvrXtqwRYfrzI4I/3R9WZsmxRzkOpmYyD /t8JBlQBn9hhQrDRIIG9dYFvZE3T+e+1C8hFdYm92P0t4sH7ygPd7DHnHIO6OIU1MIGd 1zRo9F5SBqgk8s6COgXS/b2XJF6VggHx8KOJc3S8QqVV+GuYKRPj0tZ4HKwzT7cA8eUn EquLGjyARlBJ6XfcIAWILZl12JCJ8jk0zqfG/YAfp0j6dvvfEiEAXGRbq0ezjA03QZK0 1qnwDSS0CW/8q/WchFqJBI9imZrUVg4yvG/DI+qJ0S+tRNpecZ5cK9Ip8jQerij4OQLa lsyw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=2nzcyJeVttL2IwH2+ztuTUDWWCC4+d2p170/bZF+Ll4=; b=R+DDNjKjek4mBsQ/hmtRT0aXdLfggrPHnNjvqkS9TvM43NILYgRAbv9L5rlRZN8/Ol WPeD1pMN5so2QFlY2NqalIFBaz544YdMbydLwodTpHdxpBodjQ9UvjNZiHIGNs+8c7qM UOfDjT1igD+ygxbdyRMqQYFFEbFpKYVP3jhI4Jjm+rDfd+32V5Gkob7KW5cTC8tKgKcy YAl4XvHeZdTaYJy11Xxx1AuXRj1n1bXTzbLwpbrJ+5Ff5jvmeoFBLRE9CEKs5njDUyPX Mff/3IroOfTGTiZ6OtjpScNodILDsMPuVYZKE5xO2xSH/oOQRNnqJuu0i1IVOnhnP/zs b6SQ== X-Gm-Message-State: AOUpUlHaWAEKijBCbget1MovWLscJVRgjRtaRa8Om20ruXWL+vSFDuDW drN0kS/9aS7h4wpsQptU0pj/NCfK X-Google-Smtp-Source: AAOMgpcqAlniKKA7O8PXPTQ83Dq7g0637oZYZfnAEn4RqaEg0OV+5yw9ifgWWMHh/XSJ8FIBQHLV+Q== X-Received: by 2002:a17:902:44a4:: with SMTP id l33-v6mr9435801pld.134.1532272413202; Sun, 22 Jul 2018 08:13:33 -0700 (PDT) Received: from localhost.localdomain (i153-145-22-9.s42.a013.ap.plala.or.jp. 
[153.145.22.9]) by smtp.gmail.com with ESMTPSA id v6-v6sm12092940pfa.28.2018.07.22.08.13.31 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Sun, 22 Jul 2018 08:13:32 -0700 (PDT) From: Toshiaki Makita To: netdev@vger.kernel.org, Alexei Starovoitov , Daniel Borkmann Cc: Toshiaki Makita , Jesper Dangaard Brouer Subject: [PATCH v3 bpf-next 6/8] xdp: Add a flag for disabling napi_direct of xdp_return_frame in xdp_mem_info Date: Mon, 23 Jul 2018 00:13:06 +0900 Message-Id: <20180722151308.5480-7-toshiaki.makita1@gmail.com> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20180722151308.5480-1-toshiaki.makita1@gmail.com> References: <20180722151308.5480-1-toshiaki.makita1@gmail.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Toshiaki Makita We need a mechanism to disable napi_direct when xdp_return_frame_rx_napi() is called from certain contexts. Once veth supports XDP_REDIRECT, it will redirect packets that were themselves redirected from other devices. On redirection, veth reuses the xdp_mem_info of the redirection source device so that return_frame works. In this case, however, .ndo_xdp_xmit() called from veth redirection uses an xdp_mem_info that is not guarded by NAPI, because .ndo_xdp_xmit() is not called directly from the rxq that owns the xdp_mem_info. This patch introduces a flag in xdp_mem_info to indicate that napi_direct should be disabled even when the _rx_napi variant is used.
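The core of the change is a single expression in __xdp_return(): `napi_direct &= !(mem->flags & XDP_MEM_RF_NO_DIRECT);`. Below is a userspace sketch of that decision. It assumes a simplified struct mem_info that mirrors only the three u32 fields of xdp_mem_info. borrow_mem_info() is a hypothetical helper showing how a redirecting driver might copy a source device's mem info and set the flag. The type and id values in the usage are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors XDP_MEM_RF_NO_DIRECT, which the patch defines as BIT(0). */
#define MEM_RF_NO_DIRECT (1u << 0)

/* Simplified stand-in for struct xdp_mem_info (type, id, flags). */
struct mem_info {
	uint32_t type;
	uint32_t id;
	uint32_t flags;
};

/* Same expression as the patch: the caller's napi_direct request is
 * honoured only if the mem info does not forbid it. */
static bool effective_napi_direct(const struct mem_info *mem, bool napi_direct)
{
	napi_direct &= !(mem->flags & MEM_RF_NO_DIRECT);
	return napi_direct;
}

/* Hypothetical helper: keep the source's type and id, so frames still go
 * back to the right allocator, but forbid direct (NAPI-only) recycling. */
static struct mem_info borrow_mem_info(const struct mem_info *src)
{
	struct mem_info m = *src;

	m.flags |= MEM_RF_NO_DIRECT;
	return m;
}
```

With the flag set, even the _rx_napi return path passes allow_direct = false to page_pool_put_page(). The page then takes the slower path that is safe outside the owning rxq's NAPI context.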
Signed-off-by: Toshiaki Makita --- include/net/xdp.h | 4 ++++ net/core/xdp.c | 6 ++++-- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/include/net/xdp.h b/include/net/xdp.h index fcb033f51d8c..1d1bc6553ff2 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -41,6 +41,9 @@ enum xdp_mem_type { MEM_TYPE_MAX, }; +/* XDP flags for xdp_mem_info */ +#define XDP_MEM_RF_NO_DIRECT BIT(0) /* don't use napi_direct */ + /* XDP flags for ndo_xdp_xmit */ #define XDP_XMIT_FLUSH (1U << 0) /* doorbell signal consumer */ #define XDP_XMIT_FLAGS_MASK XDP_XMIT_FLUSH @@ -48,6 +51,7 @@ enum xdp_mem_type { struct xdp_mem_info { u32 type; /* enum xdp_mem_type, but known size type */ u32 id; + u32 flags; }; struct page_pool; diff --git a/net/core/xdp.c b/net/core/xdp.c index 57285383ed00..1426c608fd75 100644 --- a/net/core/xdp.c +++ b/net/core/xdp.c @@ -330,10 +330,12 @@ static void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct, /* mem->id is valid, checked in xdp_rxq_info_reg_mem_model() */ xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params); page = virt_to_head_page(data); - if (xa) + if (xa) { + napi_direct &= !(mem->flags & XDP_MEM_RF_NO_DIRECT); page_pool_put_page(xa->page_pool, page, napi_direct); - else + } else { put_page(page); + } rcu_read_unlock(); break; case MEM_TYPE_PAGE_SHARED: From patchwork Sun Jul 22 15:13:07 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Toshiaki Makita X-Patchwork-Id: 947480 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: 
ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="EaFRFugT"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 41YSnG3nZWz9s3N for ; Mon, 23 Jul 2018 01:13:38 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729874AbeGVQKg (ORCPT ); Sun, 22 Jul 2018 12:10:36 -0400 Received: from mail-pl0-f65.google.com ([209.85.160.65]:33259 "EHLO mail-pl0-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728394AbeGVQKf (ORCPT ); Sun, 22 Jul 2018 12:10:35 -0400 Received: by mail-pl0-f65.google.com with SMTP id 6-v6so7160760plb.0 for ; Sun, 22 Jul 2018 08:13:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=/CZGL/TRevkImEXNf0cT9Y6Le2TsY9OwB0VyJqv64pU=; b=EaFRFugTt4Q3TeQ9g5dgwJKOzSytGZ/8ET6EuBHhHBDI7xQDm0iigBZMVNcJrFXaSu uMvGYo0I6UrH4bxXnebZrWiV++01tSVp6IQHX39aLwY/M0G5U5KyBvRPthkCstFS2Ynb fZtDz2iEyrHF5WhfkjzwRF4LvXLdaEr6NF6NxSFfF4eZcTlmxntsjkK6CcM5+x9rtEb8 woay9jDvFTuHk7qKHVSGeiGo7NiOFFSPpHfKaJe2aMu/X7H3mOgfQrgl6CAQcv69j0BG FuCDL8CIv0Jfmj7G6Uzpf9DT+s9/IRr7KiRBySnu23HvC8ZFKND3iUxgBUKg2QIXpdz/ 8gHQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=/CZGL/TRevkImEXNf0cT9Y6Le2TsY9OwB0VyJqv64pU=; b=uldjbHiI5cn/orOYQWND99s1ctUBPrqksmtxk8HjfxDxnznM1OA+8nzm4grA4gI5e+ dtcPfOih88zsPSAHXmDs8DgNbzrTNpF0vend9lf1/xz5EWSsBuVSY6CcA0KMOHylBQUa WK6c0cQPMwMnKnS3KCXsY0kllz0Kd3VWEAJHBJ6iNuMnwFrle5JAu/K2rNdN7o4m1fti a/qhHITsoNxEte7xO5RPQVnHDXWtLx9dqA6lJAckh8j8QvmiRZ71r4LkSk3I3cpAZ2MX ydsRXgaS7kZmSDMFs6G/0S7PnZrBvpQXjZWNSLlFbIN/XJnexkePvxeyHnsM9JRTExye NfSw== X-Gm-Message-State: AOUpUlHgRRMrwNDY4C9LMOaOT9FrVZoPICMTBbZR+j+7G7EhaZNVab7Q kIR8M6PmGQ3O8FTyoEGzB3Pr2xJg X-Google-Smtp-Source: 
AAOMgpdLvcfs+EzmBNbesaAThfjvcM9GLRfxEZ1IpoIRfJ7q6M7m66RyjHDNxTNKpOsRg9vZw9R1tg== X-Received: by 2002:a17:902:8308:: with SMTP id bd8-v6mr9476238plb.329.1532272415592; Sun, 22 Jul 2018 08:13:35 -0700 (PDT) Received: from localhost.localdomain (i153-145-22-9.s42.a013.ap.plala.or.jp. [153.145.22.9]) by smtp.gmail.com with ESMTPSA id v6-v6sm12092940pfa.28.2018.07.22.08.13.33 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Sun, 22 Jul 2018 08:13:34 -0700 (PDT) From: Toshiaki Makita To: netdev@vger.kernel.org, Alexei Starovoitov , Daniel Borkmann Cc: Toshiaki Makita , Jesper Dangaard Brouer Subject: [PATCH v3 bpf-next 7/8] veth: Add XDP TX and REDIRECT Date: Mon, 23 Jul 2018 00:13:07 +0900 Message-Id: <20180722151308.5480-8-toshiaki.makita1@gmail.com> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20180722151308.5480-1-toshiaki.makita1@gmail.com> References: <20180722151308.5480-1-toshiaki.makita1@gmail.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Toshiaki Makita This allows further redirection of xdp_frames like

 NIC   -> veth--veth -> veth--veth
 (XDP)          (XDP)         (XDP)

The intermediate XDP program, which redirects packets from the NIC to the other veth, reuses the NIC's xdp_mem_info so that the NIC's page recycling works with the destination veth's XDP. As a result, return_frame is not fully guarded by NAPI, since another NAPI handler on another CPU may use the same xdp_mem_info concurrently. Thus, disable napi_direct with the XDP_MEM_RF_NO_DIRECT flag. v3: - Fix double free when veth_xdp_tx() returns a positive value. - Convert xdp_xmit and xdp_redir variables into flags.
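The xdp_xmit flags mentioned in the v3 notes let veth_poll() record which kinds of transmission happened during a poll. It then flushes once per kind at the end, instead of once per frame. Below is a minimal userspace model of that batching pattern. The verdict enum and the counters are hypothetical. They stand in for XDP actions and for the calls to veth_xdp_flush() and xdp_do_flush_map().

```c
#include <assert.h>

/* Mirror VETH_XDP_TX and VETH_XDP_REDIR from the patch. */
#define XMIT_TX    (1u << 0)
#define XMIT_REDIR (1u << 1)

/* Hypothetical per-packet verdicts standing in for XDP actions. */
enum verdict { V_PASS, V_TX, V_REDIRECT, V_DROP };

struct flush_counts {
	int tx_flushes;    /* times veth_xdp_flush() would be called */
	int redir_flushes; /* times xdp_do_flush_map() would be called */
};

/* Treat n verdicts as one poll; flush each kind at most once at the end. */
static int poll_once(const enum verdict *v, int n, struct flush_counts *fc)
{
	unsigned int xdp_xmit = 0;
	int done;

	for (done = 0; done < n; done++) {
		if (v[done] == V_TX)
			xdp_xmit |= XMIT_TX;
		else if (v[done] == V_REDIRECT)
			xdp_xmit |= XMIT_REDIR;
	}

	if (xdp_xmit & XMIT_TX)
		fc->tx_flushes++;
	if (xdp_xmit & XMIT_REDIR)
		fc->redir_flushes++;
	return done;
}
```

A single unsigned int replaces the two separate variables from v2. Each action sets only its own bit, and the flush checks stay independent of each other.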
Signed-off-by: Toshiaki Makita --- drivers/net/veth.c | 119 +++++++++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 110 insertions(+), 9 deletions(-) diff --git a/drivers/net/veth.c b/drivers/net/veth.c index 57187e955fea..0323a4ca74e2 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -32,6 +32,10 @@ #define VETH_RING_SIZE 256 #define VETH_XDP_HEADROOM (XDP_PACKET_HEADROOM + NET_IP_ALIGN) +/* Separating two types of XDP xmit */ +#define VETH_XDP_TX BIT(0) +#define VETH_XDP_REDIR BIT(1) + struct pcpu_vstats { u64 packets; u64 bytes; @@ -45,6 +49,7 @@ struct veth_priv { struct bpf_prog *_xdp_prog; struct net_device __rcu *peer; atomic64_t dropped; + struct xdp_mem_info xdp_mem; unsigned requested_headroom; bool rx_notify_masked; struct ptr_ring xdp_ring; @@ -311,10 +316,42 @@ static int veth_xdp_xmit(struct net_device *dev, int n, return n - drops; } +static void veth_xdp_flush(struct net_device *dev) +{ + struct veth_priv *rcv_priv, *priv = netdev_priv(dev); + struct net_device *rcv; + + rcu_read_lock(); + rcv = rcu_dereference(priv->peer); + if (unlikely(!rcv)) + goto out; + + rcv_priv = netdev_priv(rcv); + /* xdp_ring is initialized on receive side? 
*/ + if (unlikely(!rcu_access_pointer(rcv_priv->xdp_prog))) + goto out; + + __veth_xdp_flush(rcv_priv); +out: + rcu_read_unlock(); +} + +static int veth_xdp_tx(struct net_device *dev, struct xdp_buff *xdp) +{ + struct xdp_frame *frame = convert_to_xdp_frame(xdp); + + if (unlikely(!frame)) + return -EOVERFLOW; + + return veth_xdp_xmit(dev, 1, &frame, 0); +} + static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv, - struct xdp_frame *frame) + struct xdp_frame *frame, + unsigned int *xdp_xmit) { int len = frame->len, delta = 0; + struct xdp_frame orig_frame; struct bpf_prog *xdp_prog; unsigned int headroom; struct sk_buff *skb; @@ -338,6 +375,31 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv, delta = frame->data - xdp.data; len = xdp.data_end - xdp.data; break; + case XDP_TX: + orig_frame = *frame; + xdp.data_hard_start = frame; + xdp.rxq->mem = frame->mem; + xdp.rxq->mem.flags |= XDP_MEM_RF_NO_DIRECT; + if (unlikely(veth_xdp_tx(priv->dev, &xdp) < 0)) { + trace_xdp_exception(priv->dev, xdp_prog, act); + frame = &orig_frame; + goto err_xdp; + } + *xdp_xmit |= VETH_XDP_TX; + rcu_read_unlock(); + goto xdp_xmit; + case XDP_REDIRECT: + orig_frame = *frame; + xdp.data_hard_start = frame; + xdp.rxq->mem = frame->mem; + xdp.rxq->mem.flags |= XDP_MEM_RF_NO_DIRECT; + if (xdp_do_redirect(priv->dev, &xdp, xdp_prog)) { + frame = &orig_frame; + goto err_xdp; + } + *xdp_xmit |= VETH_XDP_REDIR; + rcu_read_unlock(); + goto xdp_xmit; default: bpf_warn_invalid_xdp_action(act); case XDP_ABORTED: @@ -362,12 +424,13 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv, err_xdp: rcu_read_unlock(); xdp_return_frame(frame); - +xdp_xmit: return NULL; } static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv, - struct sk_buff *skb) + struct sk_buff *skb, + unsigned int *xdp_xmit) { u32 pktlen, headroom, act, metalen; void *orig_data, *orig_data_end; @@ -438,6 +501,26 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv, switch (act) { 
case XDP_PASS: break; + case XDP_TX: + get_page(virt_to_page(xdp.data)); + consume_skb(skb); + xdp.rxq->mem = priv->xdp_mem; + if (unlikely(veth_xdp_tx(priv->dev, &xdp) < 0)) { + trace_xdp_exception(priv->dev, xdp_prog, act); + goto err_xdp; + } + *xdp_xmit |= VETH_XDP_TX; + rcu_read_unlock(); + goto xdp_xmit; + case XDP_REDIRECT: + get_page(virt_to_page(xdp.data)); + consume_skb(skb); + xdp.rxq->mem = priv->xdp_mem; + if (xdp_do_redirect(priv->dev, &xdp, xdp_prog)) + goto err_xdp; + *xdp_xmit |= VETH_XDP_REDIR; + rcu_read_unlock(); + goto xdp_xmit; default: bpf_warn_invalid_xdp_action(act); case XDP_ABORTED: @@ -468,9 +551,15 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv, rcu_read_unlock(); kfree_skb(skb); return NULL; +err_xdp: + rcu_read_unlock(); + page_frag_free(xdp.data); +xdp_xmit: + return NULL; } -static int veth_xdp_rcv(struct veth_priv *priv, int budget) +static int veth_xdp_rcv(struct veth_priv *priv, int budget, + unsigned int *xdp_xmit) { int i, done = 0; @@ -481,10 +570,12 @@ static int veth_xdp_rcv(struct veth_priv *priv, int budget) if (!ptr) break; - if (veth_is_xdp_frame(ptr)) - skb = veth_xdp_rcv_one(priv, veth_ptr_to_xdp(ptr)); - else - skb = veth_xdp_rcv_skb(priv, ptr); + if (veth_is_xdp_frame(ptr)) { + skb = veth_xdp_rcv_one(priv, veth_ptr_to_xdp(ptr), + xdp_xmit); + } else { + skb = veth_xdp_rcv_skb(priv, ptr, xdp_xmit); + } if (skb) napi_gro_receive(&priv->xdp_napi, skb); @@ -499,9 +590,10 @@ static int veth_poll(struct napi_struct *napi, int budget) { struct veth_priv *priv = container_of(napi, struct veth_priv, xdp_napi); + unsigned int xdp_xmit = 0; int done; - done = veth_xdp_rcv(priv, budget); + done = veth_xdp_rcv(priv, budget, &xdp_xmit); if (done < budget && napi_complete_done(napi, done)) { /* Write rx_notify_masked before reading ptr_ring */ @@ -512,6 +604,11 @@ static int veth_poll(struct napi_struct *napi, int budget) } } + if (xdp_xmit & VETH_XDP_TX) + veth_xdp_flush(priv->dev); + if (xdp_xmit & 
VETH_XDP_REDIR) + xdp_do_flush_map(); + return done; } @@ -558,6 +655,9 @@ static int veth_enable_xdp(struct net_device *dev) err = veth_napi_add(dev); if (err) goto err; + + /* Save original mem info as it can be overwritten */ + priv->xdp_mem = priv->xdp_rxq.mem; } rcu_assign_pointer(priv->xdp_prog, priv->_xdp_prog); @@ -575,6 +675,7 @@ static void veth_disable_xdp(struct net_device *dev) rcu_assign_pointer(priv->xdp_prog, NULL); veth_napi_del(dev); + priv->xdp_rxq.mem = priv->xdp_mem; xdp_rxq_info_unreg(&priv->xdp_rxq); } From patchwork Sun Jul 22 15:13:08 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Toshiaki Makita X-Patchwork-Id: 947481 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="EShrrtq0"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 41YSnK1yWcz9s3N for ; Mon, 23 Jul 2018 01:13:41 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729893AbeGVQKj (ORCPT ); Sun, 22 Jul 2018 12:10:39 -0400 Received: from mail-pg1-f196.google.com ([209.85.215.196]:41419 "EHLO mail-pg1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728394AbeGVQKi (ORCPT ); Sun, 22 Jul 2018 12:10:38 -0400 Received: by mail-pg1-f196.google.com with SMTP id z8-v6so10417908pgu.8 for ; Sun, 22 Jul 2018 08:13:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=r+MAR/LnbflfgpWzEzka02WT94aYZZrM6JxlOazmQeA=; b=EShrrtq0MWB7C9yRZQMDEiwNjqnrd5tDpeDrbrm/NsJlL3hPtNwltlnBpHfj+58l9J u/2vnpRl++qGzwd4ZAmYmWMNOxaY7EVLqKZ1UeyFjNCzgQ3FtvQduaXz1TqwHx3PdjFx fQZHeQSJ/2Bt4H/mqG+CxJsvnvfLVitEgdoYs0iqMuyasK+J/fHgkFiVJrAOZ12r9xrq E64ZuFdPSc1GgbazPvqQNArzAMpeRTTdvXyfF/4O4SJl1852V7Ryqqm0EX/BZpjpVOb7 xh2FA1vI6UKryI2opj4tKSyE0HPjKic2Lo7IqlK9P3KIwzHEzvt8cbu9CrmgQ11yJJ1a DCig== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=r+MAR/LnbflfgpWzEzka02WT94aYZZrM6JxlOazmQeA=; b=F648l/9iZh7HaYTlazlbpoOyns8sNneG6LEQVKSNPjNZoTadHaTZB3UOEBZtc4gAHN C4MYD04uXUtHEbVgo5wc5+n3V9+EhmafGbTXwitnrpo+buyt2T851ywy76WoUIS7GUP2 hrI7RkOJpSOKxQPBLr/4SKh4zqX4J8QmDswST5okKGoB3KIKp253AEzW3+OlrIAr9gvx pjx802JW/Hu3osgRpJ3CqHUK2M4w32z0BGpKYTDpqZ+fRZzKdOoKLxDP/7cFh6G8I164 vrSmkCK8uFwr1qQRsjcBilHWn8R97JXCdD/dwjNCOv1jg2u1KKcj8xH0XUhbKnlOcgn6 fVOQ== X-Gm-Message-State: AOUpUlEHb8p9Zt8iqdoRSkDejt/7ci7XUJ2RU5UK2LvRyJkDXED/Sw2Y yYD3NeLBpn0a7UY0Ccc5CRKGPo50 X-Google-Smtp-Source: AAOMgpfY9C3VAjignama5HGRFQIXEamagpkvSYNZ69qh1Rn6S3AN+MTDA72ZcjSUFloMUP/gvxvByg== X-Received: by 2002:a63:7d48:: with SMTP id m8-v6mr9084469pgn.0.1532272418093; Sun, 22 Jul 2018 08:13:38 -0700 (PDT) Received: from localhost.localdomain (i153-145-22-9.s42.a013.ap.plala.or.jp. 
[153.145.22.9]) by smtp.gmail.com with ESMTPSA id v6-v6sm12092940pfa.28.2018.07.22.08.13.35 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Sun, 22 Jul 2018 08:13:37 -0700 (PDT) From: Toshiaki Makita To: netdev@vger.kernel.org, Alexei Starovoitov , Daniel Borkmann Cc: Toshiaki Makita , Jesper Dangaard Brouer Subject: [PATCH v3 bpf-next 8/8] veth: Support per queue XDP ring Date: Mon, 23 Jul 2018 00:13:08 +0900 Message-Id: <20180722151308.5480-9-toshiaki.makita1@gmail.com> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20180722151308.5480-1-toshiaki.makita1@gmail.com> References: <20180722151308.5480-1-toshiaki.makita1@gmail.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Toshiaki Makita Move the XDP- and napi-related fields in veth_priv to a newly created veth_rq structure. When xdp_frames are enqueued from ndo_xdp_xmit and XDP_TX, the rxq is selected by the current CPU. When skbs are enqueued from the peer device, the rxq is a one-to-one mapping of the peer's txq. This imposes a restriction that the number of rxqs must not be less than the number of peer txqs. However, it leaves open the possibility of bulk skb xmit in the future, because the txq lock would make it possible to remove the rxq ptr_ring lock. v3: - Add extack messages. - Fix array overrun in veth_xmit.
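The two queue-selection rules described above can be sketched as plain functions. This is a simplified model. It assumes the CPU id and queue counts are passed in explicitly; in the driver they come from smp_processor_id(), skb_get_queue_mapping() and rcv->real_num_rx_queues. Both function names are hypothetical.

```c
#include <assert.h>

/* xdp_frames (ndo_xdp_xmit, XDP_TX): pick an rxq from the current CPU,
 * as veth_select_rxq() does with smp_processor_id(). */
static int select_rxq_for_frame(int cpu, int real_num_rx_queues)
{
	return cpu % real_num_rx_queues;
}

/* skbs from veth_xmit(): reuse the sender's txq index when the peer has
 * that many rxqs; otherwise return -1, meaning "no XDP ring". */
static int select_rxq_for_skb(int txq, int real_num_rx_queues)
{
	return txq < real_num_rx_queues ? txq : -1;
}
```

An skb whose queue index has no matching rxq is not an error. In that case veth_xmit() leaves rcv_xdp false and delivers the skb through netif_rx() instead of the XDP ring. This bounds check is what prevents the array overrun mentioned in the v3 notes.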
Signed-off-by: Toshiaki Makita
---
 drivers/net/veth.c | 278 ++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 188 insertions(+), 90 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 0323a4ca74e2..84482d9901ec 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -42,20 +42,24 @@ struct pcpu_vstats {
 	struct u64_stats_sync	syncp;
 };
 
-struct veth_priv {
+struct veth_rq {
 	struct napi_struct	xdp_napi;
 	struct net_device	*dev;
 	struct bpf_prog __rcu	*xdp_prog;
-	struct bpf_prog		*_xdp_prog;
-	struct net_device __rcu	*peer;
-	atomic64_t		dropped;
 	struct xdp_mem_info	xdp_mem;
-	unsigned		requested_headroom;
 	bool			rx_notify_masked;
 	struct ptr_ring		xdp_ring;
 	struct xdp_rxq_info	xdp_rxq;
 };
 
+struct veth_priv {
+	struct net_device __rcu	*peer;
+	atomic64_t		dropped;
+	struct bpf_prog		*_xdp_prog;
+	struct veth_rq		*rq;
+	unsigned int		requested_headroom;
+};
+
 /*
  * ethtool interface
  */
@@ -144,19 +148,19 @@ static void veth_ptr_free(void *ptr)
 		kfree_skb(ptr);
 }
 
-static void __veth_xdp_flush(struct veth_priv *priv)
+static void __veth_xdp_flush(struct veth_rq *rq)
 {
 	/* Write ptr_ring before reading rx_notify_masked */
 	smp_mb();
-	if (!priv->rx_notify_masked) {
-		priv->rx_notify_masked = true;
-		napi_schedule(&priv->xdp_napi);
+	if (!rq->rx_notify_masked) {
+		rq->rx_notify_masked = true;
+		napi_schedule(&rq->xdp_napi);
 	}
 }
 
-static int veth_xdp_rx(struct veth_priv *priv, struct sk_buff *skb)
+static int veth_xdp_rx(struct veth_rq *rq, struct sk_buff *skb)
 {
-	if (unlikely(ptr_ring_produce(&priv->xdp_ring, skb))) {
+	if (unlikely(ptr_ring_produce(&rq->xdp_ring, skb))) {
 		dev_kfree_skb_any(skb);
 		return NET_RX_DROP;
 	}
@@ -164,21 +168,22 @@ static int veth_xdp_rx(struct veth_priv *priv, struct sk_buff *skb)
 	return NET_RX_SUCCESS;
 }
 
-static int veth_forward_skb(struct net_device *dev, struct sk_buff *skb, bool xdp)
+static int veth_forward_skb(struct net_device *dev, struct sk_buff *skb,
+			    struct veth_rq *rq, bool xdp)
 {
-	struct veth_priv *priv = netdev_priv(dev);
-
 	return __dev_forward_skb(dev, skb) ?: xdp ?
-		veth_xdp_rx(priv, skb) :
+		veth_xdp_rx(rq, skb) :
 		netif_rx(skb);
 }
 
 static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct veth_priv *rcv_priv, *priv = netdev_priv(dev);
+	struct veth_rq *rq = NULL;
 	struct net_device *rcv;
 	int length = skb->len;
 	bool rcv_xdp = false;
+	int rxq;
 
 	rcu_read_lock();
 	rcv = rcu_dereference(priv->peer);
@@ -188,9 +193,15 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
 	}
 
 	rcv_priv = netdev_priv(rcv);
-	rcv_xdp = rcu_access_pointer(rcv_priv->xdp_prog);
+	rxq = skb_get_queue_mapping(skb);
+	if (rxq < rcv->real_num_rx_queues) {
+		rq = &rcv_priv->rq[rxq];
+		rcv_xdp = rcu_access_pointer(rq->xdp_prog);
+		if (rcv_xdp)
+			skb_record_rx_queue(skb, rxq);
+	}
 
-	if (likely(veth_forward_skb(rcv, skb, rcv_xdp) == NET_RX_SUCCESS)) {
+	if (likely(veth_forward_skb(rcv, skb, rq, rcv_xdp) == NET_RX_SUCCESS)) {
 		struct pcpu_vstats *stats = this_cpu_ptr(dev->vstats);
 
 		u64_stats_update_begin(&stats->syncp);
@@ -203,7 +214,7 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
 	}
 
 	if (rcv_xdp)
-		__veth_xdp_flush(rcv_priv);
+		__veth_xdp_flush(rq);
 
 	rcu_read_unlock();
 
@@ -278,11 +289,17 @@ static struct sk_buff *veth_build_skb(void *head, int headroom, int len,
 	return skb;
 }
 
+static int veth_select_rxq(struct net_device *dev)
+{
+	return smp_processor_id() % dev->real_num_rx_queues;
+}
+
 static int veth_xdp_xmit(struct net_device *dev, int n,
 			 struct xdp_frame **frames, u32 flags)
 {
 	struct veth_priv *rcv_priv, *priv = netdev_priv(dev);
 	struct net_device *rcv;
+	struct veth_rq *rq;
 	int i, drops = 0;
 
 	if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK))
@@ -293,25 +310,26 @@ static int veth_xdp_xmit(struct net_device *dev, int n,
 		return -ENXIO;
 
 	rcv_priv = netdev_priv(rcv);
+	rq = &rcv_priv->rq[veth_select_rxq(rcv)];
 	/* xdp_ring is initialized on receive side? */
-	if (!rcu_access_pointer(rcv_priv->xdp_prog))
+	if (!rcu_access_pointer(rq->xdp_prog))
 		return -ENXIO;
 
-	spin_lock(&rcv_priv->xdp_ring.producer_lock);
+	spin_lock(&rq->xdp_ring.producer_lock);
 	for (i = 0; i < n; i++) {
 		struct xdp_frame *frame = frames[i];
 		void *ptr = veth_xdp_to_ptr(frame);
 
 		if (unlikely(xdp_ok_fwd_dev(rcv, frame->len) ||
-			     __ptr_ring_produce(&rcv_priv->xdp_ring, ptr))) {
+			     __ptr_ring_produce(&rq->xdp_ring, ptr))) {
 			xdp_return_frame_rx_napi(frame);
 			drops++;
 		}
 	}
-	spin_unlock(&rcv_priv->xdp_ring.producer_lock);
+	spin_unlock(&rq->xdp_ring.producer_lock);
 
 	if (flags & XDP_XMIT_FLUSH)
-		__veth_xdp_flush(rcv_priv);
+		__veth_xdp_flush(rq);
 
 	return n - drops;
 }
@@ -320,6 +338,7 @@ static void veth_xdp_flush(struct net_device *dev)
 {
 	struct veth_priv *rcv_priv, *priv = netdev_priv(dev);
 	struct net_device *rcv;
+	struct veth_rq *rq;
 
 	rcu_read_lock();
 	rcv = rcu_dereference(priv->peer);
@@ -327,11 +346,12 @@ static void veth_xdp_flush(struct net_device *dev)
 		goto out;
 
 	rcv_priv = netdev_priv(rcv);
+	rq = &rcv_priv->rq[veth_select_rxq(rcv)];
 	/* xdp_ring is initialized on receive side? */
-	if (unlikely(!rcu_access_pointer(rcv_priv->xdp_prog)))
+	if (unlikely(!rcu_access_pointer(rq->xdp_prog)))
 		goto out;
 
-	__veth_xdp_flush(rcv_priv);
+	__veth_xdp_flush(rq);
 out:
 	rcu_read_unlock();
 }
@@ -346,7 +366,7 @@ static int veth_xdp_tx(struct net_device *dev, struct xdp_buff *xdp)
 	return veth_xdp_xmit(dev, 1, &frame, 0);
 }
 
-static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv,
+static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq,
 					struct xdp_frame *frame,
 					unsigned int *xdp_xmit)
 {
@@ -357,7 +377,7 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv,
 	struct sk_buff *skb;
 
 	rcu_read_lock();
-	xdp_prog = rcu_dereference(priv->xdp_prog);
+	xdp_prog = rcu_dereference(rq->xdp_prog);
 	if (likely(xdp_prog)) {
 		struct xdp_buff xdp;
 		u32 act;
@@ -366,7 +386,7 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv,
 		xdp.data = frame->data;
 		xdp.data_end = frame->data + frame->len;
 		xdp.data_meta = frame->data - frame->metasize;
-		xdp.rxq = &priv->xdp_rxq;
+		xdp.rxq = &rq->xdp_rxq;
 
 		act = bpf_prog_run_xdp(xdp_prog, &xdp);
 
@@ -380,8 +400,8 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv,
 			xdp.data_hard_start = frame;
 			xdp.rxq->mem = frame->mem;
 			xdp.rxq->mem.flags |= XDP_MEM_RF_NO_DIRECT;
-			if (unlikely(veth_xdp_tx(priv->dev, &xdp) < 0)) {
-				trace_xdp_exception(priv->dev, xdp_prog, act);
+			if (unlikely(veth_xdp_tx(rq->dev, &xdp) < 0)) {
+				trace_xdp_exception(rq->dev, xdp_prog, act);
 				frame = &orig_frame;
 				goto err_xdp;
 			}
@@ -393,7 +413,7 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv,
 			xdp.data_hard_start = frame;
 			xdp.rxq->mem = frame->mem;
 			xdp.rxq->mem.flags |= XDP_MEM_RF_NO_DIRECT;
-			if (xdp_do_redirect(priv->dev, &xdp, xdp_prog)) {
+			if (xdp_do_redirect(rq->dev, &xdp, xdp_prog)) {
 				frame = &orig_frame;
 				goto err_xdp;
 			}
@@ -403,7 +423,7 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv,
 		default:
 			bpf_warn_invalid_xdp_action(act);
 		case XDP_ABORTED:
-			trace_xdp_exception(priv->dev, xdp_prog, act);
+			trace_xdp_exception(rq->dev, xdp_prog, act);
 		case XDP_DROP:
 			goto err_xdp;
 		}
@@ -418,7 +438,7 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv,
 	}
 
 	memset(frame, 0, sizeof(*frame));
-	skb->protocol = eth_type_trans(skb, priv->dev);
+	skb->protocol = eth_type_trans(skb, rq->dev);
 err:
 	return skb;
 err_xdp:
@@ -428,8 +448,7 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv,
 	return NULL;
 }
 
-static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv,
-					struct sk_buff *skb,
+static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq, struct sk_buff *skb,
 					unsigned int *xdp_xmit)
 {
 	u32 pktlen, headroom, act, metalen;
@@ -439,7 +458,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv,
 	struct xdp_buff xdp;
 
 	rcu_read_lock();
-	xdp_prog = rcu_dereference(priv->xdp_prog);
+	xdp_prog = rcu_dereference(rq->xdp_prog);
 	if (unlikely(!xdp_prog)) {
 		rcu_read_unlock();
 		goto out;
@@ -492,7 +511,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv,
 	xdp.data = skb_mac_header(skb);
 	xdp.data_end = xdp.data + pktlen;
 	xdp.data_meta = xdp.data;
-	xdp.rxq = &priv->xdp_rxq;
+	xdp.rxq = &rq->xdp_rxq;
 	orig_data = xdp.data;
 	orig_data_end = xdp.data_end;
 
@@ -504,9 +523,9 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv,
 	case XDP_TX:
 		get_page(virt_to_page(xdp.data));
 		consume_skb(skb);
-		xdp.rxq->mem = priv->xdp_mem;
-		if (unlikely(veth_xdp_tx(priv->dev, &xdp) < 0)) {
-			trace_xdp_exception(priv->dev, xdp_prog, act);
+		xdp.rxq->mem = rq->xdp_mem;
+		if (unlikely(veth_xdp_tx(rq->dev, &xdp) < 0)) {
+			trace_xdp_exception(rq->dev, xdp_prog, act);
 			goto err_xdp;
 		}
 		*xdp_xmit |= VETH_XDP_TX;
@@ -515,8 +534,8 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv,
 	case XDP_REDIRECT:
 		get_page(virt_to_page(xdp.data));
 		consume_skb(skb);
-		xdp.rxq->mem = priv->xdp_mem;
-		if (xdp_do_redirect(priv->dev, &xdp, xdp_prog))
+		xdp.rxq->mem = rq->xdp_mem;
+		if (xdp_do_redirect(rq->dev, &xdp, xdp_prog))
 			goto err_xdp;
 		*xdp_xmit |= VETH_XDP_REDIR;
 		rcu_read_unlock();
@@ -524,7 +543,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv,
 	default:
 		bpf_warn_invalid_xdp_action(act);
 	case XDP_ABORTED:
-		trace_xdp_exception(priv->dev, xdp_prog, act);
+		trace_xdp_exception(rq->dev, xdp_prog, act);
 	case XDP_DROP:
 		goto drop;
 	}
@@ -540,7 +559,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv,
 	off = xdp.data_end - orig_data_end;
 	if (off != 0)
 		__skb_put(skb, off);
-	skb->protocol = eth_type_trans(skb, priv->dev);
+	skb->protocol = eth_type_trans(skb, rq->dev);
 
 	metalen = xdp.data - xdp.data_meta;
 	if (metalen)
@@ -558,27 +577,26 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv,
 	return NULL;
 }
 
-static int veth_xdp_rcv(struct veth_priv *priv, int budget,
-			unsigned int *xdp_xmit)
+static int veth_xdp_rcv(struct veth_rq *rq, int budget, unsigned int *xdp_xmit)
 {
 	int i, done = 0;
 
 	for (i = 0; i < budget; i++) {
-		void *ptr = __ptr_ring_consume(&priv->xdp_ring);
+		void *ptr = __ptr_ring_consume(&rq->xdp_ring);
 		struct sk_buff *skb;
 
 		if (!ptr)
 			break;
 
 		if (veth_is_xdp_frame(ptr)) {
-			skb = veth_xdp_rcv_one(priv, veth_ptr_to_xdp(ptr),
+			skb = veth_xdp_rcv_one(rq, veth_ptr_to_xdp(ptr),
 					       xdp_xmit);
 		} else {
-			skb = veth_xdp_rcv_skb(priv, ptr, xdp_xmit);
+			skb = veth_xdp_rcv_skb(rq, ptr, xdp_xmit);
 		}
 
 		if (skb)
-			napi_gro_receive(&priv->xdp_napi, skb);
+			napi_gro_receive(&rq->xdp_napi, skb);
 
 		done++;
 	}
@@ -588,24 +606,24 @@ static int veth_xdp_rcv(struct veth_priv *priv, int budget,
 
 static int veth_poll(struct napi_struct *napi, int budget)
 {
-	struct veth_priv *priv =
-		container_of(napi, struct veth_priv, xdp_napi);
+	struct veth_rq *rq =
+		container_of(napi, struct veth_rq, xdp_napi);
 	unsigned int xdp_xmit = 0;
 	int done;
 
-	done = veth_xdp_rcv(priv, budget, &xdp_xmit);
+	done = veth_xdp_rcv(rq, budget, &xdp_xmit);
 
 	if (done < budget && napi_complete_done(napi, done)) {
 		/* Write rx_notify_masked before reading ptr_ring */
-		smp_store_mb(priv->rx_notify_masked, false);
-		if (unlikely(!__ptr_ring_empty(&priv->xdp_ring))) {
-			priv->rx_notify_masked = true;
-			napi_schedule(&priv->xdp_napi);
+		smp_store_mb(rq->rx_notify_masked, false);
+		if (unlikely(!__ptr_ring_empty(&rq->xdp_ring))) {
+			rq->rx_notify_masked = true;
+			napi_schedule(&rq->xdp_napi);
 		}
 	}
 
 	if (xdp_xmit & VETH_XDP_TX)
-		veth_xdp_flush(priv->dev);
+		veth_xdp_flush(rq->dev);
 	if (xdp_xmit & VETH_XDP_REDIR)
 		xdp_do_flush_map();
 
@@ -615,56 +633,90 @@ static int veth_poll(struct napi_struct *napi, int budget)
 static int veth_napi_add(struct net_device *dev)
 {
 	struct veth_priv *priv = netdev_priv(dev);
-	int err;
+	int err, i;
 
-	err = ptr_ring_init(&priv->xdp_ring, VETH_RING_SIZE, GFP_KERNEL);
-	if (err)
-		return err;
+	for (i = 0; i < dev->real_num_rx_queues; i++) {
+		struct veth_rq *rq = &priv->rq[i];
+
+		err = ptr_ring_init(&rq->xdp_ring, VETH_RING_SIZE, GFP_KERNEL);
+		if (err)
+			goto err_xdp_ring;
+	}
 
-	netif_napi_add(dev, &priv->xdp_napi, veth_poll, NAPI_POLL_WEIGHT);
-	napi_enable(&priv->xdp_napi);
+	for (i = 0; i < dev->real_num_rx_queues; i++) {
+		struct veth_rq *rq = &priv->rq[i];
+
+		netif_napi_add(dev, &rq->xdp_napi, veth_poll, NAPI_POLL_WEIGHT);
+		napi_enable(&rq->xdp_napi);
+	}
 
 	return 0;
+err_xdp_ring:
+	for (i--; i >= 0; i--)
+		ptr_ring_cleanup(&priv->rq[i].xdp_ring, veth_ptr_free);
+
+	return err;
 }
 
 static void veth_napi_del(struct net_device *dev)
 {
 	struct veth_priv *priv = netdev_priv(dev);
+	int i;
 
-	napi_disable(&priv->xdp_napi);
-	netif_napi_del(&priv->xdp_napi);
-	priv->rx_notify_masked = false;
-	ptr_ring_cleanup(&priv->xdp_ring, veth_ptr_free);
+	for (i = 0; i < dev->real_num_rx_queues; i++) {
+		struct veth_rq *rq = &priv->rq[i];
+
+		napi_disable(&rq->xdp_napi);
+		napi_hash_del(&rq->xdp_napi);
+	}
+	synchronize_net();
+
+	for (i = 0; i < dev->real_num_rx_queues; i++) {
+		struct veth_rq *rq = &priv->rq[i];
+
+		netif_napi_del(&rq->xdp_napi);
+		rq->rx_notify_masked = false;
+		ptr_ring_cleanup(&rq->xdp_ring, veth_ptr_free);
+	}
 }
 
 static int veth_enable_xdp(struct net_device *dev)
 {
 	struct veth_priv *priv = netdev_priv(dev);
-	int err;
+	int err, i;
 
-	if (!xdp_rxq_info_is_reg(&priv->xdp_rxq)) {
-		err = xdp_rxq_info_reg(&priv->xdp_rxq, dev, 0);
-		if (err < 0)
-			return err;
+	if (!xdp_rxq_info_is_reg(&priv->rq[0].xdp_rxq)) {
+		for (i = 0; i < dev->real_num_rx_queues; i++) {
+			struct veth_rq *rq = &priv->rq[i];
 
-		err = xdp_rxq_info_reg_mem_model(&priv->xdp_rxq,
-						 MEM_TYPE_PAGE_SHARED, NULL);
-		if (err < 0)
-			goto err;
+			err = xdp_rxq_info_reg(&rq->xdp_rxq, dev, i);
+			if (err < 0)
+				goto err_rxq_reg;
+
+			err = xdp_rxq_info_reg_mem_model(&rq->xdp_rxq,
+							 MEM_TYPE_PAGE_SHARED,
+							 NULL);
+			if (err < 0)
+				goto err_reg_mem;
+
+			/* Save original mem info as it can be overwritten */
+			rq->xdp_mem = rq->xdp_rxq.mem;
+		}
 
 		err = veth_napi_add(dev);
 		if (err)
-			goto err;
-
-		/* Save original mem info as it can be overwritten */
-		priv->xdp_mem = priv->xdp_rxq.mem;
+			goto err_rxq_reg;
 	}
 
-	rcu_assign_pointer(priv->xdp_prog, priv->_xdp_prog);
+	for (i = 0; i < dev->real_num_rx_queues; i++)
+		rcu_assign_pointer(priv->rq[i].xdp_prog, priv->_xdp_prog);
 
 	return 0;
-err:
-	xdp_rxq_info_unreg(&priv->xdp_rxq);
+err_reg_mem:
+	xdp_rxq_info_unreg(&priv->rq[i].xdp_rxq);
+err_rxq_reg:
+	for (i--; i >= 0; i--)
+		xdp_rxq_info_unreg(&priv->rq[i].xdp_rxq);
 
 	return err;
 }
@@ -672,11 +724,17 @@ static int veth_enable_xdp(struct net_device *dev)
 static void veth_disable_xdp(struct net_device *dev)
 {
 	struct veth_priv *priv = netdev_priv(dev);
+	int i;
 
-	rcu_assign_pointer(priv->xdp_prog, NULL);
+	for (i = 0; i < dev->real_num_rx_queues; i++)
+		rcu_assign_pointer(priv->rq[i].xdp_prog, NULL);
 	veth_napi_del(dev);
-	priv->xdp_rxq.mem = priv->xdp_mem;
-	xdp_rxq_info_unreg(&priv->xdp_rxq);
+	for (i = 0; i < dev->real_num_rx_queues; i++) {
+		struct veth_rq *rq = &priv->rq[i];
+
+		rq->xdp_rxq.mem = rq->xdp_mem;
+		xdp_rxq_info_unreg(&rq->xdp_rxq);
+	}
 }
 
 static int veth_open(struct net_device *dev)
@@ -823,6 +881,12 @@ static int veth_xdp_set(struct net_device *dev, struct bpf_prog *prog,
 			goto err;
 		}
 
+		if (dev->real_num_rx_queues < peer->real_num_tx_queues) {
+			NL_SET_ERR_MSG_MOD(extack, "XDP expects number of rx queues not less than peer tx queues");
+			err = -ENOSPC;
+			goto err;
+		}
+
 		if (dev->flags & IFF_UP) {
 			err = veth_enable_xdp(dev);
 			if (err) {
@@ -961,13 +1025,31 @@ static int veth_validate(struct nlattr *tb[], struct nlattr *data[],
 	return 0;
 }
 
+static int veth_alloc_queues(struct net_device *dev)
+{
+	struct veth_priv *priv = netdev_priv(dev);
+
+	priv->rq = kcalloc(dev->num_rx_queues, sizeof(*priv->rq), GFP_KERNEL);
+	if (!priv->rq)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static void veth_free_queues(struct net_device *dev)
+{
+	struct veth_priv *priv = netdev_priv(dev);
+
+	kfree(priv->rq);
+}
+
 static struct rtnl_link_ops veth_link_ops;
 
 static int veth_newlink(struct net *src_net, struct net_device *dev,
 			struct nlattr *tb[], struct nlattr *data[],
 			struct netlink_ext_ack *extack)
 {
-	int err;
+	int err, i;
 	struct net_device *peer;
 	struct veth_priv *priv;
 	char ifname[IFNAMSIZ];
@@ -1020,6 +1102,12 @@ static int veth_newlink(struct net *src_net, struct net_device *dev,
 		return PTR_ERR(peer);
 	}
 
+	err = veth_alloc_queues(peer);
+	if (err) {
+		put_net(net);
+		goto err_peer_alloc_queues;
+	}
+
 	if (!ifmp || !tbp[IFLA_ADDRESS])
 		eth_hw_addr_random(peer);
 
@@ -1048,6 +1136,10 @@ static int veth_newlink(struct net *src_net, struct net_device *dev,
 	 * should be re-allocated
 	 */
 
+	err = veth_alloc_queues(dev);
+	if (err)
+		goto err_alloc_queues;
+
 	if (tb[IFLA_ADDRESS] == NULL)
 		eth_hw_addr_random(dev);
 
@@ -1067,22 +1159,28 @@ static int veth_newlink(struct net *src_net, struct net_device *dev,
 	 */
 
 	priv = netdev_priv(dev);
-	priv->dev = dev;
+	for (i = 0; i < dev->real_num_rx_queues; i++)
+		priv->rq[i].dev = dev;
 	rcu_assign_pointer(priv->peer, peer);
 
 	priv = netdev_priv(peer);
-	priv->dev = peer;
+	for (i = 0; i < peer->real_num_rx_queues; i++)
+		priv->rq[i].dev = peer;
 	rcu_assign_pointer(priv->peer, dev);
 
 	return 0;
 
 err_register_dev:
+	veth_free_queues(dev);
+err_alloc_queues:
 	/* nothing to do */
 err_configure_peer:
 	unregister_netdevice(peer);
 	return err;
 
 err_register_peer:
+	veth_free_queues(peer);
+err_peer_alloc_queues:
 	free_netdev(peer);
 	return err;
 }
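For readers following the queue-selection logic, here is a small user-space C model of how the patch maps traffic onto the peer's per-queue rings. It is a sketch under stated assumptions, not kernel code: struct model_rq, model_xmit_pick_rq() and model_select_rxq() are hypothetical names, and a bool stands in for the RCU-protected xdp_prog pointer. It mirrors two rules in the diff: veth_xmit() indexes rcv_priv->rq[] by skb_get_queue_mapping() only when that index is below rcv->real_num_rx_queues, and veth_select_rxq() uses smp_processor_id() % real_num_rx_queues for ndo_xdp_xmit, where there is no skb queue mapping to reuse.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct veth_rq; only what this model needs. */
struct model_rq {
	bool xdp_enabled;	/* models rcu_access_pointer(rq->xdp_prog) != NULL */
};

/*
 * Models the veth_xmit() lookup: the skb's queue mapping selects the
 * peer's rx queue, but only if the peer has that many rx queues.
 * Returns NULL when out of range; in the patch that leaves rcv_xdp
 * false, so the skb takes the non-XDP netif_rx() path.
 */
static struct model_rq *model_xmit_pick_rq(struct model_rq *rqs,
					   unsigned int real_num_rx_queues,
					   unsigned int queue_mapping)
{
	if (queue_mapping < real_num_rx_queues)
		return &rqs[queue_mapping];
	return NULL;
}

/*
 * Models veth_select_rxq(): XDP frames have no queue mapping, so the
 * current CPU id is folded onto the available rx queues.
 */
static unsigned int model_select_rxq(unsigned int cpu,
				     unsigned int real_num_rx_queues)
{
	assert(real_num_rx_queues > 0);	/* avoid modulo by zero */
	return cpu % real_num_rx_queues;
}
```

With four rx queues, a queue mapping of 4 yields NULL and CPU 5 maps to queue 1. Because more CPUs than queues can land on the same ring, veth_xdp_xmit() still takes the ring's producer_lock. The -ENOSPC check added to veth_xdp_set() appears intended to rule out the out-of-range case while a program is attached, since every peer tx queue index is then below the local rx queue count.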
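The error paths in veth_napi_add() and veth_enable_xdp() use the common kernel idiom of unwinding a partially completed per-queue loop with for (i--; i >= 0; i--). The user-space sketch below isolates that idiom. All names are hypothetical: struct model_queue, model_ring_init(), model_ring_cleanup() and model_add_all() stand in for the rings, with malloc()/free() in place of ptr_ring_init()/ptr_ring_cleanup(), and the fail_at parameter is an assumption added only to force a failure at a chosen index.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define MODEL_NQ 4

/* Hypothetical per-queue resource standing in for rq->xdp_ring. */
struct model_queue {
	void *ring;
};

static int model_live;	/* number of rings currently allocated */

/* Stand-in for ptr_ring_init(); fails on purpose when idx == fail_at. */
static int model_ring_init(struct model_queue *q, int idx, int fail_at)
{
	if (idx == fail_at)
		return -ENOMEM;
	q->ring = malloc(64);
	if (!q->ring)
		return -ENOMEM;
	model_live++;
	return 0;
}

/* Stand-in for ptr_ring_cleanup(); only ever called on initialized queues. */
static void model_ring_cleanup(struct model_queue *q)
{
	assert(q->ring != NULL);
	free(q->ring);
	q->ring = NULL;
	model_live--;
}

/*
 * Same shape as veth_napi_add(): initialize every queue; if queue i
 * fails, undo queues i-1 down to 0, which are exactly the ones set up.
 */
static int model_add_all(struct model_queue *qs, int n, int fail_at)
{
	int err, i;

	for (i = 0; i < n; i++) {
		err = model_ring_init(&qs[i], i, fail_at);
		if (err)
			goto err_ring;
	}
	return 0;

err_ring:
	for (i--; i >= 0; i--)
		model_ring_cleanup(&qs[i]);
	return err;
}
```

Failing at index 2 returns -ENOMEM and leaves model_live at 0, because only queues 1 and 0 are cleaned up. veth_enable_xdp() adds one refinement: when xdp_rxq_info_reg_mem_model() fails, queue i has already been registered, so the err_reg_mem label unregisters rq[i] before falling through to err_rxq_reg, which handles i-1 down to 0.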