From patchwork Sun Jun 10 16:02:09 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Toshiaki Makita
X-Patchwork-Id: 927385
X-Patchwork-Delegate: bpf@iogearbox.net
From: Toshiaki Makita
To: netdev@vger.kernel.org
Cc: Toshiaki Makita, Jesper Dangaard Brouer, Alexei Starovoitov, Daniel Borkmann
Subject: [PATCH RFC v2 1/9] net: Export skb_headers_offset_update
Date: Mon, 11 Jun 2018 01:02:09 +0900
Message-Id: <20180610160217.3146-2-toshiaki.makita1@gmail.com>
X-Mailer: git-send-email 2.14.3
In-Reply-To: <20180610160217.3146-1-toshiaki.makita1@gmail.com>
References: <20180610160217.3146-1-toshiaki.makita1@gmail.com>
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

From: Toshiaki Makita

v2:
- Drop skb_copy_header part because it has already been exported now.

Signed-off-by: Toshiaki Makita
---
 include/linux/skbuff.h | 1 +
 net/core/skbuff.c      | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 164cdedf6012..2bdba543fda7 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1030,6 +1030,7 @@ static inline struct sk_buff *alloc_skb_fclone(unsigned int size,
 }

 struct sk_buff *skb_morph(struct sk_buff *dst, struct sk_buff *src);
+void skb_headers_offset_update(struct sk_buff *skb, int off);
 int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask);
 struct sk_buff *skb_clone(struct sk_buff *skb, gfp_t priority);
 void skb_copy_header(struct sk_buff *new, const struct sk_buff *old);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index c642304f178c..180ab7d7f84f 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1290,7 +1290,7 @@ struct sk_buff *skb_clone(struct sk_buff *skb, gfp_t gfp_mask)
 }
 EXPORT_SYMBOL(skb_clone);

-static void skb_headers_offset_update(struct sk_buff *skb, int off)
+void skb_headers_offset_update(struct sk_buff *skb, int off)
 {
 	/* Only adjust this if it actually is csum_start rather than csum */
 	if (skb->ip_summed == CHECKSUM_PARTIAL)
@@ -1304,6 +1304,7 @@ static void skb_headers_offset_update(struct sk_buff *skb, int off)
 	skb->inner_network_header += off;
 	skb->inner_mac_header += off;
 }
+EXPORT_SYMBOL(skb_headers_offset_update);

 void skb_copy_header(struct sk_buff *new, const struct sk_buff *old)
 {

From patchwork Sun Jun 10 16:02:10 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Toshiaki Makita
X-Patchwork-Id: 927379
X-Patchwork-Delegate: bpf@iogearbox.net
From: Toshiaki Makita
To: netdev@vger.kernel.org
Cc: Toshiaki Makita, Jesper Dangaard Brouer, Alexei Starovoitov, Daniel Borkmann
Subject: [PATCH RFC v2 2/9] veth: Add driver XDP
Date: Mon, 11 Jun 2018 01:02:10 +0900
Message-Id: <20180610160217.3146-3-toshiaki.makita1@gmail.com>
X-Mailer: git-send-email 2.14.3
In-Reply-To: <20180610160217.3146-1-toshiaki.makita1@gmail.com>
References: <20180610160217.3146-1-toshiaki.makita1@gmail.com>
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

From: Toshiaki Makita

This is a basic implementation of veth driver XDP.

Incoming packets are sent from the peer veth device in the form of skb,
so this is generally doing the same thing as generic XDP.

This by itself is not very useful, but it is a starting point for
implementing other useful veth XDP features like TX and REDIRECT.

This introduces NAPI when XDP is enabled, because XDP now relies
heavily on NAPI context. Use ptr_ring to emulate a NIC ring. The Tx
function enqueues packets to the ring and the peer NAPI handler drains
the ring.

Currently only one ring is allocated for each veth device, so it does
not scale in multiqueue environments. This can be resolved later by
allocating rings on a per-queue basis.

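As a rough illustration of the ring and notification handshake described
above, here is a userspace sketch in plain C. It is not the driver code:
all sketch_* names are hypothetical, the ring is a plain array rather
than ptr_ring, "scheduling NAPI" is just a counter, and everything runs
single-threaded, so the memory barriers the driver needs are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_RING_SIZE 256	/* same depth as VETH_RING_SIZE */

struct sketch_ring {
	void *slot[SKETCH_RING_SIZE];
	size_t head;		/* next slot to produce into */
	size_t tail;		/* next slot to consume from */
	bool notify_masked;	/* stands in for rx_notify_masked */
	int schedule_count;	/* how often the poller was "scheduled" */
};

/* Tx side: returns -1 when the ring is full, in which case the
 * caller would drop the packet.
 */
static int sketch_produce(struct sketch_ring *r, void *pkt)
{
	assert(pkt != NULL);	/* NULL is reserved for "ring empty" */
	if (r->head - r->tail == SKETCH_RING_SIZE)
		return -1;
	r->slot[r->head++ % SKETCH_RING_SIZE] = pkt;
	return 0;
}

static void *sketch_consume(struct sketch_ring *r)
{
	if (r->head == r->tail)
		return NULL;
	return r->slot[r->tail++ % SKETCH_RING_SIZE];
}

/* After enqueueing: schedule the poller only if it is not already
 * pending, so a burst of packets causes a single wakeup.
 */
static void sketch_flush(struct sketch_ring *r)
{
	if (!r->notify_masked) {
		r->notify_masked = true;
		r->schedule_count++;
	}
}

/* Poller: drain up to budget entries. Only when it finishes under
 * budget does it unmask notifications, then it rechecks the ring.
 * In this single-threaded model the recheck never fires; with a
 * concurrent producer it catches entries added just before the
 * unmask.
 */
static int sketch_poll(struct sketch_ring *r, int budget)
{
	int done = 0;

	while (done < budget && sketch_consume(r))
		done++;

	if (done < budget) {
		r->notify_masked = false;
		if (r->head != r->tail) {
			r->notify_masked = true;
			r->schedule_count++;
		}
	}

	return done;
}
```

With this model, producing a burst and calling sketch_flush() after each
packet schedules the poller only once, and the mask is cleared only by a
poll that finishes under its budget.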
Note that NAPI is not used but netif_rx is used when XDP is not loaded,
so this does not change the default behaviour.

v2:
- Squashed with the patch adding NAPI.
- Implement adjust_tail.
- Don't acquire consumer lock because it is guarded by NAPI.
- Make poll_controller noop since it is unnecessary.
- Register rxq_info on enabling XDP rather than on opening the device.

Signed-off-by: Toshiaki Makita
---
 drivers/net/veth.c | 357 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 350 insertions(+), 7 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index a69ad39ee57e..317ec92cf816 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -19,10 +19,18 @@
 #include
 #include
 #include
+#include
+#include
+#include
+#include
+#include

 #define DRV_NAME	"veth"
 #define DRV_VERSION	"1.0"

+#define VETH_RING_SIZE		256
+#define VETH_XDP_HEADROOM	(XDP_PACKET_HEADROOM + NET_IP_ALIGN)
+
 struct pcpu_vstats {
 	u64			packets;
 	u64			bytes;
@@ -30,9 +38,15 @@ struct pcpu_vstats {
 };

 struct veth_priv {
+	struct napi_struct	xdp_napi;
+	struct net_device	*dev;
+	struct bpf_prog __rcu	*xdp_prog;
 	struct net_device __rcu	*peer;
 	atomic64_t		dropped;
 	unsigned		requested_headroom;
+	bool			rx_notify_masked;
+	struct ptr_ring		xdp_ring;
+	struct xdp_rxq_info	xdp_rxq;
 };

 /*
@@ -98,11 +112,43 @@ static const struct ethtool_ops veth_ethtool_ops = {
 	.get_link_ksettings	= veth_get_link_ksettings,
 };

-static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
+/* general routines */
+
+static void __veth_xdp_flush(struct veth_priv *priv)
+{
+	/* Write ptr_ring before reading rx_notify_masked */
+	smp_mb();
+	if (!priv->rx_notify_masked) {
+		priv->rx_notify_masked = true;
+		napi_schedule(&priv->xdp_napi);
+	}
+}
+
+static int veth_xdp_rx(struct veth_priv *priv, struct sk_buff *skb)
+{
+	if (unlikely(ptr_ring_produce(&priv->xdp_ring, skb))) {
+		dev_kfree_skb_any(skb);
+		return NET_RX_DROP;
+	}
+
+	return NET_RX_SUCCESS;
+}
+
+static int veth_forward_skb(struct net_device *dev, struct sk_buff *skb, bool xdp)
 {
 	struct veth_priv *priv = netdev_priv(dev);
+
+	return __dev_forward_skb(dev, skb) ?: xdp ?
+		veth_xdp_rx(priv, skb) :
+		netif_rx(skb);
+}
+
+static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+	struct veth_priv *rcv_priv, *priv = netdev_priv(dev);
 	struct net_device *rcv;
 	int length = skb->len;
+	bool rcv_xdp = false;

 	rcu_read_lock();
 	rcv = rcu_dereference(priv->peer);
@@ -111,7 +157,10 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
 		goto drop;
 	}

-	if (likely(dev_forward_skb(rcv, skb) == NET_RX_SUCCESS)) {
+	rcv_priv = netdev_priv(rcv);
+	rcv_xdp = rcu_access_pointer(rcv_priv->xdp_prog);
+
+	if (likely(veth_forward_skb(rcv, skb, rcv_xdp) == NET_RX_SUCCESS)) {
 		struct pcpu_vstats *stats = this_cpu_ptr(dev->vstats);

 		u64_stats_update_begin(&stats->syncp);
@@ -122,14 +171,15 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
 drop:
 		atomic64_inc(&priv->dropped);
 	}
+
+	if (rcv_xdp)
+		__veth_xdp_flush(rcv_priv);
+
 	rcu_read_unlock();
+
 	return NETDEV_TX_OK;
 }

-/*
- * general routines
- */
-
 static u64 veth_stats_one(struct pcpu_vstats *result, struct net_device *dev)
 {
 	struct veth_priv *priv = netdev_priv(dev);
@@ -179,18 +229,245 @@ static void veth_set_multicast_list(struct net_device *dev)
 {
 }

+static struct sk_buff *veth_build_skb(void *head, int headroom, int len,
+				      int buflen)
+{
+	struct sk_buff *skb;
+
+	if (!buflen) {
+		buflen = SKB_DATA_ALIGN(headroom + len) +
+			 SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	}
+	skb = build_skb(head, buflen);
+	if (!skb)
+		return NULL;
+
+	skb_reserve(skb, headroom);
+	skb_put(skb, len);
+
+	return skb;
+}
+
+static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv,
+					struct sk_buff *skb)
+{
+	u32 pktlen, headroom, act, metalen;
+	void *orig_data, *orig_data_end;
+	int size, mac_len, delta, off;
+	struct bpf_prog *xdp_prog;
+	struct xdp_buff xdp;
+
+	rcu_read_lock();
+	xdp_prog = rcu_dereference(priv->xdp_prog);
+	if (!xdp_prog) {
+		rcu_read_unlock();
+		goto out;
+	}
+
+	mac_len = skb->data - skb_mac_header(skb);
+	pktlen = skb->len + mac_len;
+	size = SKB_DATA_ALIGN(VETH_XDP_HEADROOM + pktlen) +
+	       SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	if (size > PAGE_SIZE)
+		goto drop;
+
+	headroom = skb_headroom(skb) - mac_len;
+	if (skb_shared(skb) || skb_head_is_locked(skb) ||
+	    skb_is_nonlinear(skb) || headroom < XDP_PACKET_HEADROOM) {
+		struct sk_buff *nskb;
+		void *head, *start;
+		struct page *page;
+		int head_off;
+
+		page = alloc_page(GFP_ATOMIC);
+		if (!page)
+			goto drop;
+
+		head = page_address(page);
+		start = head + VETH_XDP_HEADROOM;
+		if (skb_copy_bits(skb, -mac_len, start, pktlen)) {
+			page_frag_free(head);
+			goto drop;
+		}
+
+		nskb = veth_build_skb(head,
+				      VETH_XDP_HEADROOM + mac_len, skb->len,
+				      PAGE_SIZE);
+		if (!nskb) {
+			page_frag_free(head);
+			goto drop;
+		}
+
+		skb_copy_header(nskb, skb);
+		head_off = skb_headroom(nskb) - skb_headroom(skb);
+		skb_headers_offset_update(nskb, head_off);
+		dev_consume_skb_any(skb);
+		skb = nskb;
+	}
+
+	xdp.data_hard_start = skb->head;
+	xdp.data = skb_mac_header(skb);
+	xdp.data_end = xdp.data + pktlen;
+	xdp.data_meta = xdp.data;
+	xdp.rxq = &priv->xdp_rxq;
+	orig_data = xdp.data;
+	orig_data_end = xdp.data_end;
+
+	act = bpf_prog_run_xdp(xdp_prog, &xdp);
+
+	switch (act) {
+	case XDP_PASS:
+		break;
+	default:
+		bpf_warn_invalid_xdp_action(act);
+	case XDP_ABORTED:
+		trace_xdp_exception(priv->dev, xdp_prog, act);
+	case XDP_DROP:
+		goto drop;
+	}
+	rcu_read_unlock();
+
+	delta = orig_data - xdp.data;
+	off = mac_len + delta;
+	if (off > 0)
+		__skb_push(skb, off);
+	else if (off < 0)
+		__skb_pull(skb, -off);
+	skb->mac_header -= delta;
+	off = xdp.data_end - orig_data_end;
+	if (off != 0)
+		__skb_put(skb, off);
+	skb->protocol = eth_type_trans(skb, priv->dev);
+
+	metalen = xdp.data - xdp.data_meta;
+	if (metalen)
+		skb_metadata_set(skb, metalen);
+out:
+	return skb;
+drop:
+	rcu_read_unlock();
+	dev_kfree_skb_any(skb);
+	return NULL;
+}
+
+static int veth_xdp_rcv(struct veth_priv *priv, int budget)
+{
+	int i, done = 0;
+
+	for (i = 0; i < budget; i++) {
+		struct sk_buff *skb = __ptr_ring_consume(&priv->xdp_ring);
+
+		if (!skb)
+			break;
+
+		skb = veth_xdp_rcv_skb(priv, skb);
+
+		if (skb)
+			napi_gro_receive(&priv->xdp_napi, skb);
+
+		done++;
+	}
+
+	return done;
+}
+
+static int veth_poll(struct napi_struct *napi, int budget)
+{
+	struct veth_priv *priv =
+		container_of(napi, struct veth_priv, xdp_napi);
+	int done;
+
+	done = veth_xdp_rcv(priv, budget);
+
+	if (done < budget && napi_complete_done(napi, done)) {
+		/* Write rx_notify_masked before reading ptr_ring */
+		smp_store_mb(priv->rx_notify_masked, false);
+		if (unlikely(!__ptr_ring_empty(&priv->xdp_ring))) {
+			priv->rx_notify_masked = true;
+			napi_schedule(&priv->xdp_napi);
+		}
+	}
+
+	return done;
+}
+
+static int veth_napi_add(struct net_device *dev)
+{
+	struct veth_priv *priv = netdev_priv(dev);
+	int err;
+
+	err = ptr_ring_init(&priv->xdp_ring, VETH_RING_SIZE, GFP_KERNEL);
+	if (err)
+		return err;
+
+	netif_napi_add(dev, &priv->xdp_napi, veth_poll, NAPI_POLL_WEIGHT);
+	napi_enable(&priv->xdp_napi);
+
+	return 0;
+}
+
+static void veth_napi_del(struct net_device *dev)
+{
+	struct veth_priv *priv = netdev_priv(dev);
+
+	napi_disable(&priv->xdp_napi);
+	netif_napi_del(&priv->xdp_napi);
+	ptr_ring_cleanup(&priv->xdp_ring, __skb_array_destroy_skb);
+}
+
+static int veth_enable_xdp(struct net_device *dev)
+{
+	struct veth_priv *priv = netdev_priv(dev);
+	int err;
+
+	err = xdp_rxq_info_reg(&priv->xdp_rxq, dev, 0);
+	if (err < 0)
+		return err;
+
+	err = xdp_rxq_info_reg_mem_model(&priv->xdp_rxq,
+					 MEM_TYPE_PAGE_SHARED, NULL);
+	if (err < 0)
+		goto err;
+
+	err = veth_napi_add(dev);
+	if (err)
+		goto err;
+
+	return 0;
+err:
+	xdp_rxq_info_unreg(&priv->xdp_rxq);
+
+	return err;
+}
+
+static void veth_disable_xdp(struct net_device *dev)
+{
+	struct veth_priv *priv = netdev_priv(dev);
+
+	veth_napi_del(dev);
+	xdp_rxq_info_unreg(&priv->xdp_rxq);
+}
+
 static int veth_open(struct net_device *dev)
 {
 	struct veth_priv *priv = netdev_priv(dev);
 	struct net_device *peer = rtnl_dereference(priv->peer);
+	int err;

 	if (!peer)
 		return -ENOTCONN;

+	if (rtnl_dereference(priv->xdp_prog)) {
+		err = veth_enable_xdp(dev);
+		if (err)
+			return err;
+	}
+
 	if (peer->flags & IFF_UP) {
 		netif_carrier_on(dev);
 		netif_carrier_on(peer);
 	}
+
 	return 0;
 }

@@ -203,6 +480,9 @@ static int veth_close(struct net_device *dev)
 	if (peer)
 		netif_carrier_off(peer);

+	if (rtnl_dereference(priv->xdp_prog))
+		veth_disable_xdp(dev);
+
 	return 0;
 }

@@ -228,7 +508,7 @@ static void veth_dev_free(struct net_device *dev)
 static void veth_poll_controller(struct net_device *dev)
 {
 	/* veth only receives frames when its peer sends one
-	 * Since it's a synchronous operation, we are guaranteed
+	 * Since it has nothing to do with disabling irqs, we are guaranteed
 	 * never to have pending data when we poll for it so
 	 * there is nothing to do here.
 	 *
@@ -276,6 +556,65 @@ static void veth_set_rx_headroom(struct net_device *dev, int new_hr)
 	rcu_read_unlock();
 }

+static int veth_xdp_set(struct net_device *dev, struct bpf_prog *prog,
+			struct netlink_ext_ack *extack)
+{
+	struct veth_priv *priv = netdev_priv(dev);
+	struct bpf_prog *old_prog;
+	struct net_device *peer;
+	int err;
+
+	old_prog = rtnl_dereference(priv->xdp_prog);
+	peer = rtnl_dereference(priv->peer);
+
+	if (prog) {
+		if (!peer)
+			return -ENOTCONN;
+
+		if (!old_prog && dev->flags & IFF_UP) {
+			err = veth_enable_xdp(dev);
+			if (err)
+				return err;
+		}
+	}
+
+	rcu_assign_pointer(priv->xdp_prog, prog);
+
+	if (old_prog) {
+		bpf_prog_put(old_prog);
+		if (!prog && dev->flags & IFF_UP)
+			veth_disable_xdp(dev);
+	}
+
+	return 0;
+}
+
+static u32 veth_xdp_query(struct net_device *dev)
+{
+	struct veth_priv *priv = netdev_priv(dev);
+	const struct bpf_prog *xdp_prog;
+
+	xdp_prog = rtnl_dereference(priv->xdp_prog);
+	if (xdp_prog)
+		return xdp_prog->aux->id;
+
+	return 0;
+}
+
+static int veth_xdp(struct net_device *dev, struct netdev_bpf *xdp)
+{
+	switch (xdp->command) {
+	case XDP_SETUP_PROG:
+		return veth_xdp_set(dev, xdp->prog, xdp->extack);
+	case XDP_QUERY_PROG:
+		xdp->prog_id = veth_xdp_query(dev);
+		xdp->prog_attached = !!xdp->prog_id;
+		return 0;
+	default:
+		return -EINVAL;
+	}
+}
+
 static const struct net_device_ops veth_netdev_ops = {
 	.ndo_init            = veth_dev_init,
 	.ndo_open            = veth_open,
@@ -290,6 +629,7 @@ static const struct net_device_ops veth_netdev_ops = {
 	.ndo_get_iflink		= veth_get_iflink,
 	.ndo_features_check	= passthru_features_check,
 	.ndo_set_rx_headroom	= veth_set_rx_headroom,
+	.ndo_bpf		= veth_xdp,
 };

 #define VETH_FEATURES (NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_HW_CSUM | \
@@ -451,10 +791,13 @@ static int veth_newlink(struct net *src_net, struct net_device *dev,
 	 */

 	priv = netdev_priv(dev);
+	priv->dev = dev;
 	rcu_assign_pointer(priv->peer, peer);

 	priv = netdev_priv(peer);
+	priv->dev = peer;
 	rcu_assign_pointer(priv->peer, dev);
+
 	return 0;

 err_register_dev:

From patchwork Sun Jun 10 16:02:11 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Toshiaki Makita
X-Patchwork-Id: 927386
X-Patchwork-Delegate: bpf@iogearbox.net
From: Toshiaki Makita
To: netdev@vger.kernel.org
Cc: Toshiaki Makita, Jesper Dangaard Brouer, Alexei Starovoitov, Daniel Borkmann
Subject: [PATCH RFC v2 3/9] veth: Avoid drops by oversized packets when XDP is enabled
Date: Mon, 11 Jun 2018 01:02:11 +0900
Message-Id: <20180610160217.3146-4-toshiaki.makita1@gmail.com>
X-Mailer: git-send-email 2.14.3
In-Reply-To: <20180610160217.3146-1-toshiaki.makita1@gmail.com>
References: <20180610160217.3146-1-toshiaki.makita1@gmail.com>
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

From: Toshiaki Makita

All oversized packets including GSO packets are dropped if XDP is
enabled on receiver side, so don't send such packets from peer.

Drop TSO and SCTP fragmentation features so that veth devices themselves
segment packets with XDP enabled.
Also cap MTU accordingly.

Signed-off-by: Toshiaki Makita
---
 drivers/net/veth.c | 49 +++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 43 insertions(+), 6 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 317ec92cf816..88d349da72cc 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -533,6 +533,23 @@ static int veth_get_iflink(const struct net_device *dev)
 	return iflink;
 }

+static netdev_features_t veth_fix_features(struct net_device *dev,
+					   netdev_features_t features)
+{
+	struct veth_priv *priv = netdev_priv(dev);
+	struct net_device *peer;
+
+	peer = rtnl_dereference(priv->peer);
+	if (peer) {
+		struct veth_priv *peer_priv = netdev_priv(peer);
+
+		if (rtnl_dereference(peer_priv->xdp_prog))
+			features &= ~NETIF_F_GSO_SOFTWARE;
+	}
+
+	return features;
+}
+
 static void veth_set_rx_headroom(struct net_device *dev, int new_hr)
 {
 	struct veth_priv *peer_priv, *priv = netdev_priv(dev);
@@ -571,10 +588,19 @@ static int veth_xdp_set(struct net_device *dev, struct bpf_prog *prog,
 		if (!peer)
 			return -ENOTCONN;

-		if (!old_prog && dev->flags & IFF_UP) {
-			err = veth_enable_xdp(dev);
-			if (err)
-				return err;
+		if (!old_prog) {
+			if (dev->flags & IFF_UP) {
+				err = veth_enable_xdp(dev);
+				if (err)
+					return err;
+			}
+
+			peer->hw_features &= ~NETIF_F_GSO_SOFTWARE;
+			peer->max_mtu = PAGE_SIZE - VETH_XDP_HEADROOM -
+				peer->hard_header_len -
+				SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+			if (peer->mtu > peer->max_mtu)
+				dev_set_mtu(peer, peer->max_mtu);
 		}
 	}

@@ -582,10 +608,20 @@ static int veth_xdp_set(struct net_device *dev, struct bpf_prog *prog,

 	if (old_prog) {
 		bpf_prog_put(old_prog);
-		if (!prog && dev->flags & IFF_UP)
-			veth_disable_xdp(dev);
+		if (!prog) {
+			if (dev->flags & IFF_UP)
+				veth_disable_xdp(dev);
+
+			if (peer) {
+				peer->hw_features |= NETIF_F_GSO_SOFTWARE;
+				peer->max_mtu = ETH_MAX_MTU;
+			}
+		}
 	}

+	if ((!!old_prog ^ !!prog) && peer)
+		netdev_update_features(peer);
+
 	return 0;
 }

@@ -627,6 +663,7 @@ static const struct net_device_ops veth_netdev_ops = {
 	.ndo_poll_controller	= veth_poll_controller,
 #endif
 	.ndo_get_iflink		= veth_get_iflink,
+	.ndo_fix_features	= veth_fix_features,
 	.ndo_features_check	= passthru_features_check,
 	.ndo_set_rx_headroom	= veth_set_rx_headroom,
 	.ndo_bpf		= veth_xdp,

From patchwork Sun Jun 10 16:02:12 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Toshiaki Makita
X-Patchwork-Id: 927380
X-Patchwork-Delegate: bpf@iogearbox.net
From: Toshiaki Makita
To: netdev@vger.kernel.org
Cc: Toshiaki Makita, Jesper Dangaard Brouer, Alexei Starovoitov, Daniel Borkmann
Subject: [PATCH RFC v2 4/9] veth: Add another napi ring for ndo_xdp_xmit and handle xdp_frames
Date: Mon, 11 Jun 2018 01:02:12 +0900
Message-Id: <20180610160217.3146-5-toshiaki.makita1@gmail.com>
X-Mailer: git-send-email 2.14.3
In-Reply-To: <20180610160217.3146-1-toshiaki.makita1@gmail.com>
References: <20180610160217.3146-1-toshiaki.makita1@gmail.com>
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

From: Toshiaki Makita

This is preparation for XDP TX and ndo_xdp_xmit.

Add another napi ring and handle redirected xdp_frames through it.

v2:
- Use another ring instead of using a flag to differentiate skb and
  xdp_frame. This approach makes bulk skb transmit possible in
  veth_xmit later.
- Clear xdp_frame fields in skb->head.
- Implement adjust_tail.
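
To picture how one NAPI budget can be shared between the xdp_frame ring
and the skb ring, here is a userspace model in plain C, loosely following
the loop structure of veth_xdp_rcv() in this patch. It is only a sketch
under assumptions: each ring is reduced to a counter of pending entries,
all sketch_* names are hypothetical, and budget is assumed to be at least
2, because in this sketch a per-ring share of budget >> 1 is zero for
budget == 1 and the loop would then make no progress.

```c
#include <assert.h>

struct sketch_counts {
	int frames;	/* entries taken from the xdp_frame ring */
	int skbs;	/* entries taken from the skb ring */
};

static int sketch_min(int a, int b)
{
	return a < b ? a : b;
}

/* Take up to limit entries from a ring modeled as a counter.
 * Clears *more when the ring is found empty before the limit.
 */
static int sketch_drain(int *pending, int limit, int *more)
{
	int i;

	for (i = 0; i < limit; i++) {
		if (*pending == 0) {
			*more = 0;
			break;
		}
		(*pending)--;
	}

	return i;
}

/* Each pass gives every ring at most half of the budget, so neither
 * ring can starve the other; budget left over by an empty ring is
 * handed to the other ring on the next pass. Assumes budget >= 2.
 */
static int sketch_rcv(int *frames_pending, int *skbs_pending, int budget,
		      struct sketch_counts *out)
{
	int done = 0, more;

	out->frames = 0;
	out->skbs = 0;

	do {
		int n, curr_more;

		more = 0;

		curr_more = 1;
		n = sketch_drain(frames_pending,
				 sketch_min(budget - done, budget >> 1),
				 &curr_more);
		out->frames += n;
		done += n;
		more |= curr_more;

		curr_more = 1;
		n = sketch_drain(skbs_pending,
				 sketch_min(budget - done, budget >> 1),
				 &curr_more);
		out->skbs += n;
		done += n;
		more |= curr_more;
	} while (more && done < budget);

	return done;
}
```

For example, with a budget of 64 and both rings holding 100 entries, each
ring gets 32; if the frame ring holds only 5 entries, the skb ring ends
up with the remaining 59.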

Signed-off-by: Toshiaki Makita
---
 drivers/net/veth.c | 125 ++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 114 insertions(+), 11 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 88d349da72cc..cb3fa558fbe0 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -46,6 +46,7 @@ struct veth_priv {
 	unsigned		requested_headroom;
 	bool			rx_notify_masked;
 	struct ptr_ring		xdp_ring;
+	struct ptr_ring		xdp_tx_ring;
 	struct xdp_rxq_info	xdp_rxq;
 };

@@ -114,6 +115,11 @@ static const struct ethtool_ops veth_ethtool_ops = {

 /* general routines */

+static void veth_xdp_free(void *frame)
+{
+	xdp_return_frame(frame);
+}
+
 static void __veth_xdp_flush(struct veth_priv *priv)
 {
 	/* Write ptr_ring before reading rx_notify_masked */
@@ -248,6 +254,61 @@ static struct sk_buff *veth_build_skb(void *head, int headroom, int len,
 	return skb;
 }

+static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv,
+					struct xdp_frame *frame)
+{
+	int len = frame->len, delta = 0;
+	struct bpf_prog *xdp_prog;
+	unsigned int headroom;
+	struct sk_buff *skb;
+
+	rcu_read_lock();
+	xdp_prog = rcu_dereference(priv->xdp_prog);
+	if (xdp_prog) {
+		struct xdp_buff xdp;
+		u32 act;
+
+		xdp.data_hard_start = frame->data - frame->headroom;
+		xdp.data = frame->data;
+		xdp.data_end = frame->data + frame->len;
+		xdp.data_meta = frame->data - frame->metasize;
+		xdp.rxq = &priv->xdp_rxq;
+
+		act = bpf_prog_run_xdp(xdp_prog, &xdp);
+
+		switch (act) {
+		case XDP_PASS:
+			delta = frame->data - xdp.data;
+			len = xdp.data_end - xdp.data;
+			break;
+		default:
+			bpf_warn_invalid_xdp_action(act);
+		case XDP_ABORTED:
+			trace_xdp_exception(priv->dev, xdp_prog, act);
+		case XDP_DROP:
+			goto err_xdp;
+		}
+	}
+	rcu_read_unlock();
+
+	headroom = frame->data - delta - (void *)frame;
+	skb = veth_build_skb(frame, headroom, len, 0);
+	if (!skb) {
+		xdp_return_frame(frame);
+		goto err;
+	}
+
+	memset(frame, 0, sizeof(*frame));
+	skb->protocol = eth_type_trans(skb, priv->dev);
+err:
+	return skb;
+err_xdp:
+	rcu_read_unlock();
+	xdp_return_frame(frame);
+
+	return NULL;
+}
+
 static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv,
 					struct sk_buff *skb)
 {
@@ -352,21 +413,53 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv,

 static int veth_xdp_rcv(struct veth_priv *priv, int budget)
 {
-	int i, done = 0;
+	int done = 0;
+	bool more;

-	for (i = 0; i < budget; i++) {
-		struct sk_buff *skb = __ptr_ring_consume(&priv->xdp_ring);
+	do {
+		int curr_budget, i;
+		bool curr_more;

-		if (!skb)
-			break;
+		more = false;

-		skb = veth_xdp_rcv_skb(priv, skb);
+		curr_more = true;
+		curr_budget = min(budget - done, budget >> 1);
+		for (i = 0; i < curr_budget; i++) {
+			struct xdp_frame *frame;
+			struct sk_buff *skb;

-		if (skb)
-			napi_gro_receive(&priv->xdp_napi, skb);
+			frame = __ptr_ring_consume(&priv->xdp_tx_ring);
+			if (!frame) {
+				curr_more = false;
+				break;
+			}

-		done++;
-	}
+			skb = veth_xdp_rcv_one(priv, frame);
+			if (skb)
+				napi_gro_receive(&priv->xdp_napi, skb);
+
+			done++;
+		}
+		more |= curr_more;
+
+		curr_more = true;
+		curr_budget = min(budget - done, budget >> 1);
+		for (i = 0; i < curr_budget; i++) {
+			struct sk_buff *skb = __ptr_ring_consume(&priv->xdp_ring);
+
+			if (!skb) {
+				curr_more = false;
+				break;
+			}
+
+			skb = veth_xdp_rcv_skb(priv, skb);
+			if (skb)
+				napi_gro_receive(&priv->xdp_napi, skb);
+
+			done++;
+		}
+		more |= curr_more;
+	} while (more && done < budget);

 	return done;
 }
@@ -382,7 +475,8 @@ static int veth_poll(struct napi_struct *napi, int budget)
 	if (done < budget && napi_complete_done(napi, done)) {
 		/* Write rx_notify_masked before reading ptr_ring */
 		smp_store_mb(priv->rx_notify_masked, false);
-		if (unlikely(!__ptr_ring_empty(&priv->xdp_ring))) {
+		if (unlikely(!__ptr_ring_empty(&priv->xdp_tx_ring) ||
+			     !__ptr_ring_empty(&priv->xdp_ring))) {
 			priv->rx_notify_masked = true;
 			napi_schedule(&priv->xdp_napi);
 		}
@@ -400,10 +494,18 @@ static int veth_napi_add(struct net_device *dev)
 	if (err)
 		return err;

+	err = ptr_ring_init(&priv->xdp_tx_ring, VETH_RING_SIZE, GFP_KERNEL);
+	if (err)
+		goto err_xdp_tx_ring;
+
 	netif_napi_add(dev, &priv->xdp_napi, veth_poll, NAPI_POLL_WEIGHT);
 	napi_enable(&priv->xdp_napi);

 	return 0;
+err_xdp_tx_ring:
+	ptr_ring_cleanup(&priv->xdp_ring, __skb_array_destroy_skb);
+
+	return err;
 }

 static void veth_napi_del(struct net_device *dev)
@@ -413,6 +515,7 @@ static void veth_napi_del(struct net_device *dev)
 	napi_disable(&priv->xdp_napi);
 	netif_napi_del(&priv->xdp_napi);
 	ptr_ring_cleanup(&priv->xdp_ring, __skb_array_destroy_skb);
+	ptr_ring_cleanup(&priv->xdp_tx_ring, veth_xdp_free);
 }

 static int veth_enable_xdp(struct net_device *dev)

From patchwork Sun Jun 10 16:02:13 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Toshiaki Makita
X-Patchwork-Id: 927382
X-Patchwork-Delegate: bpf@iogearbox.net
Return-Path:
X-Original-To: patchwork-incoming-netdev@ozlabs.org
Delivered-To: patchwork-incoming-netdev@ozlabs.org
Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=)
Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com
Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="a0ykoZ3H"; dkim-atps=neutral
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 413gsS0jQ7z9rvt for ; Mon, 11 Jun 2018 02:02:52 +1000 (AEST)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932337AbeFJQCo (ORCPT ); Sun, 10 Jun 2018 12:02:44 -0400
Received: from mail-pg0-f65.google.com ([74.125.83.65]:44060 "EHLO mail-pg0-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932269AbeFJQCf (ORCPT ); Sun, 10 Jun 2018 12:02:35 -0400
Received: by mail-pg0-f65.google.com with SMTP id
p21-v6so8565724pgd.11 for ; Sun, 10 Jun 2018 09:02:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=91ZhUkOOhsz9Km0tva/dfjGbO9uymm73oWhcmGvU1Fk=; b=a0ykoZ3HdaXRSOyF3kBPnPwhkl7MptA1YPdCGBLb5hXtnfKpNwo8G8SQRTzC0PsxzP NyXYqOfV04dG31tAloOEqL5pcy+feChCp1uBRxtgHgqV3LZ/DnnCnagpN5cbl0bQe2Lx /aiR6THiT++EuKyzN8h6e5N9rdkqtYFYvqKaFM0ae96brRiu0hk6OFtMHvPpbBwDwanv h4AxHNa6eliCMatAF3l/2auWqy/Kh4tAzrfzYf3RSRIni7EZnkx1gH++6qPb75JD2fDm h+XI7VtbanrF9k2AY88Piz5Z8o4Qf5Y5yM0OBVQOPf8DkBvbja0/vTQTLF8mj6enDz5n EFXA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=91ZhUkOOhsz9Km0tva/dfjGbO9uymm73oWhcmGvU1Fk=; b=qU03DD+2GdigD6KCjmwl6DLOYxsSZooICgNgkodQSJKLE1FnpxDA85kwuyRKDxqhw0 mwx7rhIozj11q0x1GthHOH3p7hnx6zFnaAqvUdh6AmKznumx8kVQArDiO+OUSMGLogvi 7+VRe4u5r4elRj/Zx5+ntaYripIWTvMCjUlgYMUDhn3KZPo+a9npUrZ0bOyEjG7t22pS 7hmFrTLqvIxI/qy8S+PeR9F2njrYeNi0Nmbftq4pvoCz78z2aUwSZTe7TWGda1IjIEYk tDom9rTxoMwnsibfzFs+dzFyFPZrDPPEOtX37NK4wvP5/pwsFTWBwpRYhdB4sb+tROYG fgcQ== X-Gm-Message-State: APt69E2ihYaa0Bk+T3HR5WuYhSoNcYmJRMTPeLIheV+hRT83YLrilWSu HDTs6mJ3H89JLpEHwNT6Z2cg2w== X-Google-Smtp-Source: ADUXVKL8HELnv427zIZuKymQCKCGNpqPtpjenftTKf+DSdZUdoQnWNQJQdWMIllLj5g2HWg9iIE/wg== X-Received: by 2002:a65:6504:: with SMTP id x4-v6mr11989834pgv.131.1528646555265; Sun, 10 Jun 2018 09:02:35 -0700 (PDT) Received: from localhost.localdomain (i153-145-22-9.s42.a013.ap.plala.or.jp. 
[153.145.22.9]) by smtp.gmail.com with ESMTPSA id o87-v6sm56068211pfa.106.2018.06.10.09.02.33 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Sun, 10 Jun 2018 09:02:34 -0700 (PDT) From: Toshiaki Makita To: netdev@vger.kernel.org Cc: Toshiaki Makita , Jesper Dangaard Brouer , Alexei Starovoitov , Daniel Borkmann Subject: [PATCH RFC v2 5/9] veth: Add ndo_xdp_xmit Date: Mon, 11 Jun 2018 01:02:13 +0900 Message-Id: <20180610160217.3146-6-toshiaki.makita1@gmail.com> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20180610160217.3146-1-toshiaki.makita1@gmail.com> References: <20180610160217.3146-1-toshiaki.makita1@gmail.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org

From: Toshiaki Makita

This allows a NIC's XDP program to redirect packets to veth. The
destination veth device enqueues redirected packets to the napi ring of
its peer, and they are then processed by XDP on the peer veth device.
When the peer enables driver XDP, this can be thought of as one XDP
program calling another XDP program via REDIRECT.

Note that when the peer veth device does not have driver XDP set,
redirected packets are dropped because the peer is not ready for NAPI.

v2:
- Drop the part converting xdp_frame into skb when XDP is not enabled.
- Implement bulk interface of ndo_xdp_xmit.
- Implement XDP_XMIT_FLUSH bit and drop ndo_xdp_flush.

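The bulk transmit contract described above (reject unknown flags, drop frames that fail the forwarding check or do not fit in the ring, return the number actually queued) can be modelled in userspace roughly as follows. All toy_* names are hypothetical, and the ring is reduced to a counter. Only the control flow is meant to mirror veth_xdp_xmit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_XMIT_FLUSH      (1U << 0)
#define TOY_XMIT_FLAGS_MASK TOY_XMIT_FLUSH

/* Hypothetical receiving device: link state, length limit, ring occupancy. */
struct toy_dev {
	int up;               /* stands in for IFF_UP */
	unsigned int max_len; /* stands in for mtu + hard_header_len + VLAN_HLEN */
	size_t ring_cap;
	size_t ring_used;
	int flushed;          /* set when the consumer would be woken up */
};

/* Same shape as xdp_ok_fwd_dev(): 0 if the frame may be forwarded. */
static int toy_ok_fwd_dev(const struct toy_dev *fwd, unsigned int pktlen)
{
	if (!fwd->up)
		return -ENETDOWN;
	if (pktlen > fwd->max_len)
		return -EMSGSIZE;
	return 0;
}

/* Non-zero return means the ring is full, as with __ptr_ring_produce(). */
static int toy_ring_produce(struct toy_dev *d)
{
	if (d->ring_used >= d->ring_cap)
		return -ENOSPC;
	d->ring_used++;
	return 0;
}

/* Returns the number of frames queued, or a negative errno on bad flags. */
static int toy_xdp_xmit(struct toy_dev *rcv, int n,
			const unsigned int *frame_lens, unsigned int flags)
{
	int i, drops = 0;

	assert(n >= 0);

	if (flags & ~TOY_XMIT_FLAGS_MASK)
		return -EINVAL;

	for (i = 0; i < n; i++) {
		if (toy_ok_fwd_dev(rcv, frame_lens[i]) || toy_ring_produce(rcv))
			drops++; /* the driver returns the frame to its owner here */
	}

	if (flags & TOY_XMIT_FLUSH)
		rcv->flushed = 1;

	return n - drops;
}
```

Returning n - drops rather than an error for partial failure lets the caller account for dropped frames without retrying the whole batch.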
Signed-off-by: Toshiaki Makita --- drivers/net/veth.c | 39 +++++++++++++++++++++++++++++++++++++++ include/linux/filter.h | 16 ++++++++++++++++ net/core/filter.c | 11 +---------- 3 files changed, 56 insertions(+), 10 deletions(-) diff --git a/drivers/net/veth.c b/drivers/net/veth.c index cb3fa558fbe0..b809d609a642 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -17,6 +17,7 @@ #include #include #include +#include #include #include #include @@ -254,6 +255,43 @@ static struct sk_buff *veth_build_skb(void *head, int headroom, int len, return skb; } +static int veth_xdp_xmit(struct net_device *dev, int n, + struct xdp_frame **frames, u32 flags) +{ + struct veth_priv *rcv_priv, *priv = netdev_priv(dev); + struct net_device *rcv; + int i, drops = 0; + + if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK)) + return -EINVAL; + + rcv = rcu_dereference(priv->peer); + if (unlikely(!rcv)) + return -ENXIO; + + rcv_priv = netdev_priv(rcv); + /* xdp_ring is initialized on receive side? */ + if (!rcu_access_pointer(rcv_priv->xdp_prog)) + return -ENXIO; + + spin_lock(&rcv_priv->xdp_tx_ring.producer_lock); + for (i = 0; i < n; i++) { + struct xdp_frame *frame = frames[i]; + + if (unlikely(xdp_ok_fwd_dev(rcv, frame->len) || + __ptr_ring_produce(&rcv_priv->xdp_tx_ring, frame))) { + xdp_return_frame_rx_napi(frame); + drops++; + } + } + spin_unlock(&rcv_priv->xdp_tx_ring.producer_lock); + + if (flags & XDP_XMIT_FLUSH) + __veth_xdp_flush(rcv_priv); + + return n - drops; +} + static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv, struct xdp_frame *frame) { @@ -770,6 +808,7 @@ static const struct net_device_ops veth_netdev_ops = { .ndo_features_check = passthru_features_check, .ndo_set_rx_headroom = veth_set_rx_headroom, .ndo_bpf = veth_xdp, + .ndo_xdp_xmit = veth_xdp_xmit, }; #define VETH_FEATURES (NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_HW_CSUM | \ diff --git a/include/linux/filter.h b/include/linux/filter.h index 45fc0f5000d8..12777eb70b40 100644 --- a/include/linux/filter.h 
+++ b/include/linux/filter.h @@ -19,6 +19,7 @@ #include #include #include +#include #include @@ -786,6 +787,21 @@ static inline bool bpf_dump_raw_ok(void) struct bpf_prog *bpf_patch_insn_single(struct bpf_prog *prog, u32 off, const struct bpf_insn *patch, u32 len); +static __always_inline int +xdp_ok_fwd_dev(const struct net_device *fwd, unsigned int pktlen) +{ + unsigned int len; + + if (unlikely(!(fwd->flags & IFF_UP))) + return -ENETDOWN; + + len = fwd->mtu + fwd->hard_header_len + VLAN_HLEN; + if (pktlen > len) + return -EMSGSIZE; + + return 0; +} + /* The pair of xdp_do_redirect and xdp_do_flush_map MUST be called in the * same cpu context. Further for best results no more than a single map * for the do_redirect/do_flush pair should be used. This limitation is diff --git a/net/core/filter.c b/net/core/filter.c index 3d9ba7e5965a..05d9e84566a4 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -3216,16 +3216,7 @@ EXPORT_SYMBOL_GPL(xdp_do_redirect); static int __xdp_generic_ok_fwd_dev(struct sk_buff *skb, struct net_device *fwd) { - unsigned int len; - - if (unlikely(!(fwd->flags & IFF_UP))) - return -ENETDOWN; - - len = fwd->mtu + fwd->hard_header_len + VLAN_HLEN; - if (skb->len > len) - return -EMSGSIZE; - - return 0; + return xdp_ok_fwd_dev(fwd, skb->len); } static int xdp_do_generic_redirect_map(struct net_device *dev, From patchwork Sun Jun 10 16:02:14 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Toshiaki Makita X-Patchwork-Id: 927383 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com 
Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="XHigL5rg"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 413gsV1YG6z9rvt for ; Mon, 11 Jun 2018 02:02:54 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932330AbeFJQCn (ORCPT ); Sun, 10 Jun 2018 12:02:43 -0400 Received: from mail-pf0-f193.google.com ([209.85.192.193]:42417 "EHLO mail-pf0-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932295AbeFJQCh (ORCPT ); Sun, 10 Jun 2018 12:02:37 -0400 Received: by mail-pf0-f193.google.com with SMTP id w7-v6so8942738pfn.9 for ; Sun, 10 Jun 2018 09:02:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=yok3wQzu28fT1HxXZy1Q0Rjn9NvF/b0Adxd6S83GuI8=; b=XHigL5rgy1DuO+RzL3Ef5pXw54OjGxA9MW7fD9zYea379xNLG+96k0noZwqW+zxFiB VFfAc2XeF9iLWN7MFofFjVhHWDXGwuLjiQZXMQ/UreTw6oeAMgIzYbZTSl7I/shlpEv6 svdzht16HYFImE+YQ5SJ2RoWBqRj+VgAU10waEQggmRBJDWkUkP/l0wqyOz/h53Anwa/ /b3l+wJbYnrUbK12DswEcHc9H+uZyksD5CFj2iZPOPs4VvtPSjjqAMzG/XgU1YMFEoal NJAy/4J0lqzku2k0doCtqFJ1QzGtDDPYZThKz27Fn5KZEZUhc0szeSHxJqPIi3EDkAeK Oo5w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=yok3wQzu28fT1HxXZy1Q0Rjn9NvF/b0Adxd6S83GuI8=; b=D4vJ65RVOxtDnq8dbt9XvrdQdQcbSy6BvMlVIkeD0UEuBh/Wa/tjvdygvmEw7I4EAC IsDop+kHiQw3t5oNzFWzxfNWlJxfaUjH9ONXhqqtTZFbvxWoU4lTHuQ/b7Bs3u8VzbwE KjVd06GYs8JiNOJkn+uBMU07g0ifOipwI0F2MMNvKrlutbNI8Rc+pzygTkR6BdYdKkQ0 Q8mlX6X7tAd0f7KxCTdGmRdJCBCaxAZVCrAtUYkyW7bkplEvL6fZry+Ev9OxaFSptK57 pk9wkGZg5WYPTsQbuq3kr6OsUcGdw/bCp57IK+U4LXoaDqGL5QiAw62T9/8JK28tb9fX c5nA== X-Gm-Message-State: APt69E38A3PsXcUroZd7eUSfbhr0KzqME5c5rqTPThWc9/eH2hyJIODd +1WeFgCSlKJgXxfssxrdE3siJA== 
X-Google-Smtp-Source: ADUXVKIdwPjIIcSYkJc7BjblUI4VciNbKHCUNkSpvgYinqvd0E5HrM/5jfFffiPZ/ue6ppuIQTQz7Q== X-Received: by 2002:a63:77c9:: with SMTP id s192-v6mr11871766pgc.140.1528646557110; Sun, 10 Jun 2018 09:02:37 -0700 (PDT) Received: from localhost.localdomain (i153-145-22-9.s42.a013.ap.plala.or.jp. [153.145.22.9]) by smtp.gmail.com with ESMTPSA id o87-v6sm56068211pfa.106.2018.06.10.09.02.35 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Sun, 10 Jun 2018 09:02:36 -0700 (PDT) From: Toshiaki Makita To: netdev@vger.kernel.org Cc: Toshiaki Makita , Jesper Dangaard Brouer , Alexei Starovoitov , Daniel Borkmann Subject: [PATCH RFC v2 6/9] xdp: Add a flag for disabling napi_direct of xdp_return_frame in xdp_mem_info Date: Mon, 11 Jun 2018 01:02:14 +0900 Message-Id: <20180610160217.3146-7-toshiaki.makita1@gmail.com> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20180610160217.3146-1-toshiaki.makita1@gmail.com> References: <20180610160217.3146-1-toshiaki.makita1@gmail.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org

From: Toshiaki Makita

We need a mechanism to disable napi_direct when
xdp_return_frame_rx_napi() is called from certain contexts.

When veth gains support for XDP_REDIRECT, it will redirect packets that
were themselves redirected from other devices. On redirection, veth
will reuse the xdp_mem_info of the redirection source device to make
return_frame work. In this case, however, the .ndo_xdp_xmit() called
from veth redirection uses an xdp_mem_info that is not guarded by NAPI,
because .ndo_xdp_xmit() is not called directly from the rxq that owns
the xdp_mem_info.

This approach introduces a flag in xdp_mem_info to indicate that
napi_direct should be disabled even when the _rx_napi variant is used.

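As a rough illustration of the flag's effect, the following userspace sketch mirrors the masking step added to __xdp_return(). The toy_* names are hypothetical, and only the boolean logic is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MEM_RF_NO_DIRECT (1U << 0) /* don't use napi_direct */

/* Hypothetical copy of the xdp_mem_info layout after this patch. */
struct toy_mem_info {
	uint32_t type;
	uint32_t id;
	uint32_t flags;
};

/*
 * Decide whether the direct (NAPI-protected) recycle path may be used.
 * The caller's request is honoured only if the memory model has not been
 * marked as shared with another NAPI context.
 */
static bool toy_effective_napi_direct(const struct toy_mem_info *mem,
				      bool napi_direct)
{
	assert(mem != NULL);

	napi_direct &= !(mem->flags & TOY_MEM_RF_NO_DIRECT);
	return napi_direct;
}
```

Keeping the decision in the memory-model descriptor means callers of the _rx_napi variant do not need to know whether the frame came from their own rxq.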
Signed-off-by: Toshiaki Makita --- include/net/xdp.h | 4 ++++ net/core/xdp.c | 6 ++++-- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/include/net/xdp.h b/include/net/xdp.h index 2deea7166a34..ea0c80f6c8ee 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -41,6 +41,9 @@ enum xdp_mem_type { MEM_TYPE_MAX, }; +/* XDP flags for xdp_mem_info */ +#define XDP_MEM_RF_NO_DIRECT BIT(0) /* don't use napi_direct */ + /* XDP flags for ndo_xdp_xmit */ #define XDP_XMIT_FLUSH (1U << 0) /* doorbell signal consumer */ #define XDP_XMIT_FLAGS_MASK XDP_XMIT_FLUSH @@ -48,6 +51,7 @@ enum xdp_mem_type { struct xdp_mem_info { u32 type; /* enum xdp_mem_type, but known size type */ u32 id; + u32 flags; }; struct page_pool; diff --git a/net/core/xdp.c b/net/core/xdp.c index 9d1f22072d5d..e94f146360b2 100644 --- a/net/core/xdp.c +++ b/net/core/xdp.c @@ -327,10 +327,12 @@ static void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct, /* mem->id is valid, checked in xdp_rxq_info_reg_mem_model() */ xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params); page = virt_to_head_page(data); - if (xa) + if (xa) { + napi_direct &= !(mem->flags & XDP_MEM_RF_NO_DIRECT); page_pool_put_page(xa->page_pool, page, napi_direct); - else + } else { put_page(page); + } rcu_read_unlock(); break; case MEM_TYPE_PAGE_SHARED: From patchwork Sun Jun 10 16:02:15 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Toshiaki Makita X-Patchwork-Id: 927384 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: 
ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="KQiuw52O"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 413gsZ0CK8z9rvt for ; Mon, 11 Jun 2018 02:02:58 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932319AbeFJQCm (ORCPT ); Sun, 10 Jun 2018 12:02:42 -0400 Received: from mail-pg0-f68.google.com ([74.125.83.68]:46424 "EHLO mail-pg0-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932163AbeFJQCj (ORCPT ); Sun, 10 Jun 2018 12:02:39 -0400 Received: by mail-pg0-f68.google.com with SMTP id d2-v6so8567056pga.13 for ; Sun, 10 Jun 2018 09:02:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=Ou7fu2v+iDjcoAq9ZlaBc/Rb/7LRgi59hEuF954coNw=; b=KQiuw52OrxEQCq2DXlDFMPavK0EhGoox1NQhtfCWoRRMWfWX4h6T9gO4IuEa3TTgDT AzWAjULSZHwt9yQYa5Q1SQQ3qm+icq/4exrQchFstugNHDFjkOAHM7YEG5x2iQGPVhVE AFXmRh9ugShPlEn0I2kPo5BJcx4Kqp4XKMzawSvq+vqHn6QJZVxKIU4NeRnyNVJKs7p7 5pzJozlpHEvqJHTsgk+4fmAi4PNxPV/jyNh5OtGVNeAvcTDDucgNdXbWR2zTHaO3BZTy inlytR+WQ9CPBp4hFmehgUeB6Z1Mc8bc/1S+pvrIj2WhcQy3bjp/RLqOkc0Q9P7hqu77 hVKg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=Ou7fu2v+iDjcoAq9ZlaBc/Rb/7LRgi59hEuF954coNw=; b=Z8uewC3j9qZITUQ+5IQYZtYCadF4AdWLpl8+WJj7LGyAGsGGnz1VU0NWTj+2GYcw+K 4bOBBb+cz7mQ7iRGTsYgy8t+qqNpLGIfGyYd7XICiOZ9wHCSfdpFOElHJhaZWHGTA/v7 LtHFSGkEHv8zp1x+eIzZ1t1fC1FlpQoeLWwudSyR7+73w6KWDZAmD8Q/cytDkTOu8Cov ZDH1yF7pfYPAHSCThakV3Mr7uBu7LJTKzIqGNiv9y4dJILJMyojaA5AXwhd2miadqII/ vo207KQMsjiVuR6FtIikjb7ko3+jvqiOSta3ZYJz2w0ubAHMKk8gmeF7lYYf9UHKD4cc SBTw== X-Gm-Message-State: APt69E3L3QtsAWDjk8aN1lrTLGp923NyXzh+xBCNJlSZk7nxUQ2NHw73 BKAchPpOofZWNXB+k77f3HX3RA== X-Google-Smtp-Source: 
ADUXVKJY5Zsf+WrvrG1/dXnOgziNyFss0uRWw6B+TKTaeN4e4byGvrZBB6+N4PZFnq/oo1bRLNwbKg== X-Received: by 2002:a63:b742:: with SMTP id w2-v6mr11674762pgt.200.1528646559019; Sun, 10 Jun 2018 09:02:39 -0700 (PDT) Received: from localhost.localdomain (i153-145-22-9.s42.a013.ap.plala.or.jp. [153.145.22.9]) by smtp.gmail.com with ESMTPSA id o87-v6sm56068211pfa.106.2018.06.10.09.02.37 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Sun, 10 Jun 2018 09:02:38 -0700 (PDT) From: Toshiaki Makita To: netdev@vger.kernel.org Cc: Toshiaki Makita , Jesper Dangaard Brouer , Alexei Starovoitov , Daniel Borkmann Subject: [PATCH RFC v2 7/9] veth: Add XDP TX and REDIRECT Date: Mon, 11 Jun 2018 01:02:15 +0900 Message-Id: <20180610160217.3146-8-toshiaki.makita1@gmail.com> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20180610160217.3146-1-toshiaki.makita1@gmail.com> References: <20180610160217.3146-1-toshiaki.makita1@gmail.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org

From: Toshiaki Makita

This allows further redirection of xdp_frames like

 NIC   -> veth--veth -> veth--veth
 (XDP)          (XDP)         (XDP)

The intermediate XDP, redirecting packets from NIC to the other veth,
reuses xdp_mem_info from NIC so that page recycling of the NIC works on
the destination veth's XDP.
In this way return_frame is not fully guarded by NAPI, since another
NAPI handler on another cpu may use the same xdp_mem_info concurrently.
Thus disable napi_direct with the XDP_MEM_RF_NO_DIRECT flag.

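The handling of the rxq's memory info described above can be sketched in userspace as follows: it is temporarily overwritten with the originating device's info for xdp_frames, and restored from a saved copy for skbs and on disable. The toy_* types and helpers are hypothetical, and the sketch covers only the bookkeeping, not the actual transmit or redirect calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MEM_RF_NO_DIRECT (1U << 0)

struct toy_mem_info {
	uint32_t type;
	uint32_t id;
	uint32_t flags;
};

/* Hypothetical per-device state: live rxq mem info plus the saved original. */
struct toy_rq {
	struct toy_mem_info rxq_mem;
	struct toy_mem_info saved_mem;
};

/* On enable: remember the device's own mem info, as it can be overwritten. */
static void toy_enable_xdp(struct toy_rq *rq)
{
	assert(rq != NULL);
	rq->saved_mem = rq->rxq_mem;
}

/*
 * Before XDP_TX/REDIRECT of an xdp_frame: reuse the frame's mem info so the
 * page goes back to its real owner, and forbid the napi_direct fast path.
 */
static void toy_prepare_frame_xmit(struct toy_rq *rq,
				   const struct toy_mem_info *frame_mem)
{
	assert(rq != NULL && frame_mem != NULL);
	rq->rxq_mem = *frame_mem;
	rq->rxq_mem.flags |= TOY_MEM_RF_NO_DIRECT;
}

/* Before XDP_TX/REDIRECT of an skb: the data belongs to this device again. */
static void toy_prepare_skb_xmit(struct toy_rq *rq)
{
	assert(rq != NULL);
	rq->rxq_mem = rq->saved_mem;
}

/* On disable: put the original mem info back before unregistering. */
static void toy_disable_xdp(struct toy_rq *rq)
{
	assert(rq != NULL);
	rq->rxq_mem = rq->saved_mem;
}
```

The flag is set on the rxq's copy only, so the frame's own descriptor is left unchanged.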
Signed-off-by: Toshiaki Makita --- drivers/net/veth.c | 110 +++++++++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 103 insertions(+), 7 deletions(-) diff --git a/drivers/net/veth.c b/drivers/net/veth.c index b809d609a642..a47e1ba7d7e6 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -44,6 +44,7 @@ struct veth_priv { struct bpf_prog __rcu *xdp_prog; struct net_device __rcu *peer; atomic64_t dropped; + struct xdp_mem_info xdp_mem; unsigned requested_headroom; bool rx_notify_masked; struct ptr_ring xdp_ring; @@ -292,10 +293,42 @@ static int veth_xdp_xmit(struct net_device *dev, int n, return n - drops; } +static void veth_xdp_flush(struct net_device *dev) +{ + struct veth_priv *rcv_priv, *priv = netdev_priv(dev); + struct net_device *rcv; + + rcu_read_lock(); + rcv = rcu_dereference(priv->peer); + if (unlikely(!rcv)) + goto out; + + rcv_priv = netdev_priv(rcv); + /* xdp_ring is initialized on receive side? */ + if (unlikely(!rcu_access_pointer(rcv_priv->xdp_prog))) + goto out; + + __veth_xdp_flush(rcv_priv); +out: + rcu_read_unlock(); +} + +static int veth_xdp_tx(struct net_device *dev, struct xdp_buff *xdp) +{ + struct xdp_frame *frame = convert_to_xdp_frame(xdp); + + if (unlikely(!frame)) + return -EOVERFLOW; + + return veth_xdp_xmit(dev, 1, &frame, 0); +} + static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv, - struct xdp_frame *frame) + struct xdp_frame *frame, bool *xdp_xmit, + bool *xdp_redir) { int len = frame->len, delta = 0; + struct xdp_frame orig_frame; struct bpf_prog *xdp_prog; unsigned int headroom; struct sk_buff *skb; @@ -319,6 +352,31 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv, delta = frame->data - xdp.data; len = xdp.data_end - xdp.data; break; + case XDP_TX: + orig_frame = *frame; + xdp.data_hard_start = frame; + xdp.rxq->mem = frame->mem; + xdp.rxq->mem.flags |= XDP_MEM_RF_NO_DIRECT; + if (unlikely(veth_xdp_tx(priv->dev, &xdp))) { + trace_xdp_exception(priv->dev, xdp_prog, act); + frame = 
&orig_frame; + goto err_xdp; + } + *xdp_xmit = true; + rcu_read_unlock(); + goto xdp_xmit; + case XDP_REDIRECT: + orig_frame = *frame; + xdp.data_hard_start = frame; + xdp.rxq->mem = frame->mem; + xdp.rxq->mem.flags |= XDP_MEM_RF_NO_DIRECT; + if (xdp_do_redirect(priv->dev, &xdp, xdp_prog)) { + frame = &orig_frame; + goto err_xdp; + } + *xdp_redir = true; + rcu_read_unlock(); + goto xdp_xmit; default: bpf_warn_invalid_xdp_action(act); case XDP_ABORTED: @@ -343,12 +401,13 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv, err_xdp: rcu_read_unlock(); xdp_return_frame(frame); - +xdp_xmit: return NULL; } static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv, - struct sk_buff *skb) + struct sk_buff *skb, bool *xdp_xmit, + bool *xdp_redir) { u32 pktlen, headroom, act, metalen; void *orig_data, *orig_data_end; @@ -417,6 +476,26 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv, switch (act) { case XDP_PASS: break; + case XDP_TX: + get_page(virt_to_page(xdp.data)); + dev_consume_skb_any(skb); + xdp.rxq->mem = priv->xdp_mem; + if (unlikely(veth_xdp_tx(priv->dev, &xdp))) { + trace_xdp_exception(priv->dev, xdp_prog, act); + goto err_xdp; + } + *xdp_xmit = true; + rcu_read_unlock(); + goto xdp_xmit; + case XDP_REDIRECT: + get_page(virt_to_page(xdp.data)); + dev_consume_skb_any(skb); + xdp.rxq->mem = priv->xdp_mem; + if (xdp_do_redirect(priv->dev, &xdp, xdp_prog)) + goto err_xdp; + *xdp_redir = true; + rcu_read_unlock(); + goto xdp_xmit; default: bpf_warn_invalid_xdp_action(act); case XDP_ABORTED: @@ -447,9 +526,15 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv, rcu_read_unlock(); dev_kfree_skb_any(skb); return NULL; +err_xdp: + rcu_read_unlock(); + page_frag_free(xdp.data); +xdp_xmit: + return NULL; } -static int veth_xdp_rcv(struct veth_priv *priv, int budget) +static int veth_xdp_rcv(struct veth_priv *priv, int budget, bool *xdp_xmit, + bool *xdp_redir) { int done = 0; bool more; @@ -472,7 +557,7 @@ static int 
veth_xdp_rcv(struct veth_priv *priv, int budget) break; } - skb = veth_xdp_rcv_one(priv, frame); + skb = veth_xdp_rcv_one(priv, frame, xdp_xmit, xdp_redir); if (skb) napi_gro_receive(&priv->xdp_napi, skb); @@ -490,7 +575,7 @@ static int veth_xdp_rcv(struct veth_priv *priv, int budget) break; } - skb = veth_xdp_rcv_skb(priv, skb); + skb = veth_xdp_rcv_skb(priv, skb, xdp_xmit, xdp_redir); if (skb) napi_gro_receive(&priv->xdp_napi, skb); @@ -506,9 +591,11 @@ static int veth_poll(struct napi_struct *napi, int budget) { struct veth_priv *priv = container_of(napi, struct veth_priv, xdp_napi); + bool xdp_xmit = false; + bool xdp_redir = false; int done; - done = veth_xdp_rcv(priv, budget); + done = veth_xdp_rcv(priv, budget, &xdp_xmit, &xdp_redir); if (done < budget && napi_complete_done(napi, done)) { /* Write rx_notify_masked before reading ptr_ring */ @@ -520,6 +607,11 @@ static int veth_poll(struct napi_struct *napi, int budget) } } + if (xdp_xmit) + veth_xdp_flush(priv->dev); + if (xdp_redir) + xdp_do_flush_map(); + return done; } @@ -570,6 +662,9 @@ static int veth_enable_xdp(struct net_device *dev) if (err < 0) goto err; + /* Save original mem info as it can be overwritten */ + priv->xdp_mem = priv->xdp_rxq.mem; + err = veth_napi_add(dev); if (err) goto err; @@ -586,6 +681,7 @@ static void veth_disable_xdp(struct net_device *dev) struct veth_priv *priv = netdev_priv(dev); veth_napi_del(dev); + priv->xdp_rxq.mem = priv->xdp_mem; xdp_rxq_info_unreg(&priv->xdp_rxq); } From patchwork Sun Jun 10 16:02:16 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Toshiaki Makita X-Patchwork-Id: 927387 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; 
envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="YSH6lDQP"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 413gsm5D92z9rvt for ; Mon, 11 Jun 2018 02:03:08 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932346AbeFJQDC (ORCPT ); Sun, 10 Jun 2018 12:03:02 -0400 Received: from mail-pf0-f193.google.com ([209.85.192.193]:41165 "EHLO mail-pf0-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932297AbeFJQCl (ORCPT ); Sun, 10 Jun 2018 12:02:41 -0400 Received: by mail-pf0-f193.google.com with SMTP id a11-v6so8948222pff.8 for ; Sun, 10 Jun 2018 09:02:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=/sEoIpafElfrxrxUV1KBGYbyayBmsh0HzZDRfdWF17I=; b=YSH6lDQPqw1A6vyHE2Yq9gsPSWz/oNHx5o7cPSi7UQV8GlAG1Zi1of+hJyRZt9kqny Zue7N2/md3Uil5WWhw0Yfptvo6ZHsGScEtGqRovqa4MAtrLdMxgRXubvraDjEt6U8rF3 fLU80DZIPYyGOXW+KsCHEs8i+CHjXTgpSHYzs0PJf4VSab0AFhWdBUxUOV7KVmQlEhH6 iZW3Rc/a7sfl3yJnZNryUYytM0cOmpVbytWBJnKlilj/zyyzbhAtFTpkTHJUbJ623wf5 9XElRcO+hvJA5/zPfulAaGwvsZp8vYA4W+5gq5VHamcbdf3PEJ9RRlt2Z3c8P6f4RCWX +4Kw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=/sEoIpafElfrxrxUV1KBGYbyayBmsh0HzZDRfdWF17I=; b=Zc31DMYuYQgcnQ/K6G3pW551dOCC+ZgbahXpFR9Djq10tQ3+7ivD3aKsdezXU4RLf4 ykNIjGdHvsXrQDaZjFVn9c4reIaCj2eWIg5QQzKFVgAlwoo8oaf7yM0ETCzNh47VmlnJ V9MG0vb+P0bRnsIdBkc7Yq4RxisNSTtwqYOYdoKoFpbU5PtyW5hS8/BZmePjJl68CHOa 2+w++Ro8CI7E+hpXtNfLdzNTpWPR6Qt7HwrrZp3xWbN6M+YQ8KRGwK+j9PJG5VsnW80a 
+XAn8vCD2DnNX++ZFCH509UiOudusX1nprBzorPMnPPCSce4sCyqnsPIKOZct3vexbJ2 eOSg== X-Gm-Message-State: APt69E3uRYJEVcbAJBkeNvofAA6J3T1LfyjI20YZNLG86Qa4xhrMfBMW qtjfqv5TrQ25wcE4YywykPCyIw== X-Google-Smtp-Source: ADUXVKLhBsKxFKLIxwGPMpYWsgWKLHbv0qF3mX4hIp8cZylX0VvDy8ns5qhwoZHifFOB+oG1EUDDtA== X-Received: by 2002:a63:b34e:: with SMTP id x14-v6mr11987380pgt.243.1528646561010; Sun, 10 Jun 2018 09:02:41 -0700 (PDT) Received: from localhost.localdomain (i153-145-22-9.s42.a013.ap.plala.or.jp. [153.145.22.9]) by smtp.gmail.com with ESMTPSA id o87-v6sm56068211pfa.106.2018.06.10.09.02.39 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Sun, 10 Jun 2018 09:02:40 -0700 (PDT) From: Toshiaki Makita To: netdev@vger.kernel.org Cc: Toshiaki Makita , Jesper Dangaard Brouer , Alexei Starovoitov , Daniel Borkmann Subject: [PATCH RFC v2 8/9] veth: Support per queue XDP ring Date: Mon, 11 Jun 2018 01:02:16 +0900 Message-Id: <20180610160217.3146-9-toshiaki.makita1@gmail.com> X-Mailer: git-send-email 2.14.3 In-Reply-To: <20180610160217.3146-1-toshiaki.makita1@gmail.com> References: <20180610160217.3146-1-toshiaki.makita1@gmail.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Toshiaki Makita Move XDP and napi related fields in veth_priv to newly created veth_rq structure. When xdp_frames are enqueued from ndo_xdp_xmit and XDP_TX, rxq is selected by current cpu. When skbs are enqueued from the peer device, rxq is determined by its peer's txq. In this way we can implement bulk packet send using skb->xmit_more later. 
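The two queue-selection rules above can be illustrated with a small userspace sketch. The toy_* functions are hypothetical. The modulo rule follows veth_select_rxq() in this patch. The skb rule simply reuses the sender's tx queue index and, as an assumption of this sketch, takes for granted that the peer has at least that many rx queues.

```c
#include <assert.h>

/*
 * xdp_frames (ndo_xdp_xmit and XDP_TX): pick the peer rx queue from the
 * current cpu number, wrapping around the number of real rx queues.
 */
static unsigned int toy_select_rxq_by_cpu(unsigned int cpu,
					  unsigned int real_num_rx_queues)
{
	assert(real_num_rx_queues > 0);
	return cpu % real_num_rx_queues;
}

/*
 * skbs from the peer device: the rx queue index equals the tx queue the
 * skb was mapped to on the sending side.
 */
static unsigned int toy_select_rxq_by_txq(unsigned int txq,
					  unsigned int real_num_rx_queues)
{
	assert(txq < real_num_rx_queues); /* assumption of this sketch */
	return txq;
}
```

Tying xdp_frames to the current cpu means producers on different cpus tend to land on different rings when enough queues are configured.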
Signed-off-by: Toshiaki Makita --- drivers/net/veth.c | 290 +++++++++++++++++++++++++++++++++++------------------ 1 file changed, 191 insertions(+), 99 deletions(-) diff --git a/drivers/net/veth.c b/drivers/net/veth.c index a47e1ba7d7e6..67debd3eafe6 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -38,20 +38,24 @@ struct pcpu_vstats { struct u64_stats_sync syncp; }; -struct veth_priv { +struct veth_rq { struct napi_struct xdp_napi; struct net_device *dev; struct bpf_prog __rcu *xdp_prog; - struct net_device __rcu *peer; - atomic64_t dropped; struct xdp_mem_info xdp_mem; - unsigned requested_headroom; bool rx_notify_masked; struct ptr_ring xdp_ring; struct ptr_ring xdp_tx_ring; struct xdp_rxq_info xdp_rxq; }; +struct veth_priv { + struct net_device __rcu *peer; + atomic64_t dropped; + struct veth_rq *rq; + unsigned int requested_headroom; +}; + /* * ethtool interface */ @@ -122,19 +126,19 @@ static void veth_xdp_free(void *frame) xdp_return_frame(frame); } -static void __veth_xdp_flush(struct veth_priv *priv) +static void __veth_xdp_flush(struct veth_rq *rq) { /* Write ptr_ring before reading rx_notify_masked */ smp_mb(); - if (!priv->rx_notify_masked) { - priv->rx_notify_masked = true; - napi_schedule(&priv->xdp_napi); + if (!rq->rx_notify_masked) { + rq->rx_notify_masked = true; + napi_schedule(&rq->xdp_napi); } } -static int veth_xdp_rx(struct veth_priv *priv, struct sk_buff *skb) +static int veth_xdp_rx(struct veth_rq *rq, struct sk_buff *skb) { - if (unlikely(ptr_ring_produce(&priv->xdp_ring, skb))) { + if (unlikely(ptr_ring_produce(&rq->xdp_ring, skb))) { dev_kfree_skb_any(skb); return NET_RX_DROP; } @@ -142,12 +146,11 @@ static int veth_xdp_rx(struct veth_priv *priv, struct sk_buff *skb) return NET_RX_SUCCESS; } -static int veth_forward_skb(struct net_device *dev, struct sk_buff *skb, bool xdp) +static int veth_forward_skb(struct net_device *dev, struct sk_buff *skb, + struct veth_rq *rq, bool xdp) { - struct veth_priv *priv = netdev_priv(dev); - 
return __dev_forward_skb(dev, skb) ?: xdp ? - veth_xdp_rx(priv, skb) : + veth_xdp_rx(rq, skb) : netif_rx(skb); } @@ -157,6 +160,8 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev) struct net_device *rcv; int length = skb->len; bool rcv_xdp = false; + struct veth_rq *rq; + int rxq; rcu_read_lock(); rcv = rcu_dereference(priv->peer); @@ -166,9 +171,12 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev) } rcv_priv = netdev_priv(rcv); - rcv_xdp = rcu_access_pointer(rcv_priv->xdp_prog); + rxq = skb_get_queue_mapping(skb); + skb_record_rx_queue(skb, rxq); + rq = &rcv_priv->rq[rxq]; + rcv_xdp = rcu_access_pointer(rq->xdp_prog); - if (likely(veth_forward_skb(rcv, skb, rcv_xdp) == NET_RX_SUCCESS)) { + if (likely(veth_forward_skb(rcv, skb, rq, rcv_xdp) == NET_RX_SUCCESS)) { struct pcpu_vstats *stats = this_cpu_ptr(dev->vstats); u64_stats_update_begin(&stats->syncp); @@ -181,7 +189,7 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev) } if (rcv_xdp) - __veth_xdp_flush(rcv_priv); + __veth_xdp_flush(rq); rcu_read_unlock(); @@ -256,11 +264,17 @@ static struct sk_buff *veth_build_skb(void *head, int headroom, int len, return skb; } +static int veth_select_rxq(struct net_device *dev) +{ + return smp_processor_id() % dev->real_num_rx_queues; +} + static int veth_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames, u32 flags) { struct veth_priv *rcv_priv, *priv = netdev_priv(dev); struct net_device *rcv; + struct veth_rq *rq; int i, drops = 0; if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK)) @@ -271,24 +285,25 @@ static int veth_xdp_xmit(struct net_device *dev, int n, return -ENXIO; rcv_priv = netdev_priv(rcv); + rq = &rcv_priv->rq[veth_select_rxq(rcv)]; /* xdp_ring is initialized on receive side? 
*/ - if (!rcu_access_pointer(rcv_priv->xdp_prog)) + if (!rcu_access_pointer(rq->xdp_prog)) return -ENXIO; - spin_lock(&rcv_priv->xdp_tx_ring.producer_lock); + spin_lock(&rq->xdp_tx_ring.producer_lock); for (i = 0; i < n; i++) { struct xdp_frame *frame = frames[i]; if (unlikely(xdp_ok_fwd_dev(rcv, frame->len) || - __ptr_ring_produce(&rcv_priv->xdp_tx_ring, frame))) { + __ptr_ring_produce(&rq->xdp_tx_ring, frame))) { xdp_return_frame_rx_napi(frame); drops++; } } - spin_unlock(&rcv_priv->xdp_tx_ring.producer_lock); + spin_unlock(&rq->xdp_tx_ring.producer_lock); if (flags & XDP_XMIT_FLUSH) - __veth_xdp_flush(rcv_priv); + __veth_xdp_flush(rq); return n - drops; } @@ -297,6 +312,7 @@ static void veth_xdp_flush(struct net_device *dev) { struct veth_priv *rcv_priv, *priv = netdev_priv(dev); struct net_device *rcv; + struct veth_rq *rq; rcu_read_lock(); rcv = rcu_dereference(priv->peer); @@ -304,11 +320,12 @@ static void veth_xdp_flush(struct net_device *dev) goto out; rcv_priv = netdev_priv(rcv); + rq = &rcv_priv->rq[veth_select_rxq(rcv)]; /* xdp_ring is initialized on receive side? 
*/ - if (unlikely(!rcu_access_pointer(rcv_priv->xdp_prog))) + if (unlikely(!rcu_access_pointer(rq->xdp_prog))) goto out; - __veth_xdp_flush(rcv_priv); + __veth_xdp_flush(rq); out: rcu_read_unlock(); } @@ -323,7 +340,7 @@ static int veth_xdp_tx(struct net_device *dev, struct xdp_buff *xdp) return veth_xdp_xmit(dev, 1, &frame, 0); } -static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv, +static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq, struct xdp_frame *frame, bool *xdp_xmit, bool *xdp_redir) { @@ -334,7 +351,7 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv, struct sk_buff *skb; rcu_read_lock(); - xdp_prog = rcu_dereference(priv->xdp_prog); + xdp_prog = rcu_dereference(rq->xdp_prog); if (xdp_prog) { struct xdp_buff xdp; u32 act; @@ -343,7 +360,7 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv, xdp.data = frame->data; xdp.data_end = frame->data + frame->len; xdp.data_meta = frame->data - frame->metasize; - xdp.rxq = &priv->xdp_rxq; + xdp.rxq = &rq->xdp_rxq; act = bpf_prog_run_xdp(xdp_prog, &xdp); @@ -357,8 +374,8 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv, xdp.data_hard_start = frame; xdp.rxq->mem = frame->mem; xdp.rxq->mem.flags |= XDP_MEM_RF_NO_DIRECT; - if (unlikely(veth_xdp_tx(priv->dev, &xdp))) { - trace_xdp_exception(priv->dev, xdp_prog, act); + if (unlikely(veth_xdp_tx(rq->dev, &xdp))) { + trace_xdp_exception(rq->dev, xdp_prog, act); frame = &orig_frame; goto err_xdp; } @@ -370,7 +387,7 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv, xdp.data_hard_start = frame; xdp.rxq->mem = frame->mem; xdp.rxq->mem.flags |= XDP_MEM_RF_NO_DIRECT; - if (xdp_do_redirect(priv->dev, &xdp, xdp_prog)) { + if (xdp_do_redirect(rq->dev, &xdp, xdp_prog)) { frame = &orig_frame; goto err_xdp; } @@ -380,7 +397,7 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv, default: bpf_warn_invalid_xdp_action(act); case XDP_ABORTED: - trace_xdp_exception(priv->dev, xdp_prog, act); + 
trace_xdp_exception(rq->dev, xdp_prog, act); case XDP_DROP: goto err_xdp; } @@ -395,7 +412,7 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv, } memset(frame, 0, sizeof(*frame)); - skb->protocol = eth_type_trans(skb, priv->dev); + skb->protocol = eth_type_trans(skb, rq->dev); err: return skb; err_xdp: @@ -405,9 +422,8 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv, return NULL; } -static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv, - struct sk_buff *skb, bool *xdp_xmit, - bool *xdp_redir) +static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq, struct sk_buff *skb, + bool *xdp_xmit, bool *xdp_redir) { u32 pktlen, headroom, act, metalen; void *orig_data, *orig_data_end; @@ -416,7 +432,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv, struct xdp_buff xdp; rcu_read_lock(); - xdp_prog = rcu_dereference(priv->xdp_prog); + xdp_prog = rcu_dereference(rq->xdp_prog); if (!xdp_prog) { rcu_read_unlock(); goto out; @@ -467,7 +483,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv, xdp.data = skb_mac_header(skb); xdp.data_end = xdp.data + pktlen; xdp.data_meta = xdp.data; - xdp.rxq = &priv->xdp_rxq; + xdp.rxq = &rq->xdp_rxq; orig_data = xdp.data; orig_data_end = xdp.data_end; @@ -479,9 +495,9 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv, case XDP_TX: get_page(virt_to_page(xdp.data)); dev_consume_skb_any(skb); - xdp.rxq->mem = priv->xdp_mem; - if (unlikely(veth_xdp_tx(priv->dev, &xdp))) { - trace_xdp_exception(priv->dev, xdp_prog, act); + xdp.rxq->mem = rq->xdp_mem; + if (unlikely(veth_xdp_tx(rq->dev, &xdp))) { + trace_xdp_exception(rq->dev, xdp_prog, act); goto err_xdp; } *xdp_xmit = true; @@ -490,8 +506,8 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv, case XDP_REDIRECT: get_page(virt_to_page(xdp.data)); dev_consume_skb_any(skb); - xdp.rxq->mem = priv->xdp_mem; - if (xdp_do_redirect(priv->dev, &xdp, xdp_prog)) + xdp.rxq->mem = rq->xdp_mem; + if 
(xdp_do_redirect(rq->dev, &xdp, xdp_prog)) goto err_xdp; *xdp_redir = true; rcu_read_unlock(); @@ -499,7 +515,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv, default: bpf_warn_invalid_xdp_action(act); case XDP_ABORTED: - trace_xdp_exception(priv->dev, xdp_prog, act); + trace_xdp_exception(rq->dev, xdp_prog, act); case XDP_DROP: goto drop; } @@ -515,7 +531,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv, off = xdp.data_end - orig_data_end; if (off != 0) __skb_put(skb, off); - skb->protocol = eth_type_trans(skb, priv->dev); + skb->protocol = eth_type_trans(skb, rq->dev); metalen = xdp.data - xdp.data_meta; if (metalen) @@ -533,7 +549,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv, return NULL; } -static int veth_xdp_rcv(struct veth_priv *priv, int budget, bool *xdp_xmit, +static int veth_xdp_rcv(struct veth_rq *rq, int budget, bool *xdp_xmit, bool *xdp_redir) { int done = 0; @@ -551,15 +567,15 @@ static int veth_xdp_rcv(struct veth_priv *priv, int budget, bool *xdp_xmit, struct xdp_frame *frame; struct sk_buff *skb; - frame = __ptr_ring_consume(&priv->xdp_tx_ring); + frame = __ptr_ring_consume(&rq->xdp_tx_ring); if (!frame) { curr_more = false; break; } - skb = veth_xdp_rcv_one(priv, frame, xdp_xmit, xdp_redir); + skb = veth_xdp_rcv_one(rq, frame, xdp_xmit, xdp_redir); if (skb) - napi_gro_receive(&priv->xdp_napi, skb); + napi_gro_receive(&rq->xdp_napi, skb); done++; } @@ -568,16 +584,16 @@ static int veth_xdp_rcv(struct veth_priv *priv, int budget, bool *xdp_xmit, curr_more = true; curr_budget = min(budget - done, budget >> 1); for (i = 0; i < curr_budget; i++) { - struct sk_buff *skb = __ptr_ring_consume(&priv->xdp_ring); + struct sk_buff *skb = __ptr_ring_consume(&rq->xdp_ring); if (!skb) { curr_more = false; break; } - skb = veth_xdp_rcv_skb(priv, skb, xdp_xmit, xdp_redir); + skb = veth_xdp_rcv_skb(rq, skb, xdp_xmit, xdp_redir); if (skb) - napi_gro_receive(&priv->xdp_napi, skb); + 
napi_gro_receive(&rq->xdp_napi, skb); done++; } @@ -589,26 +605,26 @@ static int veth_xdp_rcv(struct veth_priv *priv, int budget, bool *xdp_xmit, static int veth_poll(struct napi_struct *napi, int budget) { - struct veth_priv *priv = - container_of(napi, struct veth_priv, xdp_napi); + struct veth_rq *rq = + container_of(napi, struct veth_rq, xdp_napi); bool xdp_xmit = false; bool xdp_redir = false; int done; - done = veth_xdp_rcv(priv, budget, &xdp_xmit, &xdp_redir); + done = veth_xdp_rcv(rq, budget, &xdp_xmit, &xdp_redir); if (done < budget && napi_complete_done(napi, done)) { /* Write rx_notify_masked before reading ptr_ring */ - smp_store_mb(priv->rx_notify_masked, false); - if (unlikely(!__ptr_ring_empty(&priv->xdp_tx_ring) || - !__ptr_ring_empty(&priv->xdp_ring))) { - priv->rx_notify_masked = true; - napi_schedule(&priv->xdp_napi); + smp_store_mb(rq->rx_notify_masked, false); + if (unlikely(!__ptr_ring_empty(&rq->xdp_tx_ring) || + !__ptr_ring_empty(&rq->xdp_ring))) { + rq->rx_notify_masked = true; + napi_schedule(&rq->xdp_napi); } } if (xdp_xmit) - veth_xdp_flush(priv->dev); + veth_xdp_flush(rq->dev); if (xdp_redir) xdp_do_flush_map(); @@ -618,22 +634,36 @@ static int veth_poll(struct napi_struct *napi, int budget) static int veth_napi_add(struct net_device *dev) { struct veth_priv *priv = netdev_priv(dev); - int err; + int err, i; - err = ptr_ring_init(&priv->xdp_ring, VETH_RING_SIZE, GFP_KERNEL); - if (err) - return err; + for (i = 0; i < dev->real_num_rx_queues; i++) { + struct veth_rq *rq = &priv->rq[i]; - err = ptr_ring_init(&priv->xdp_tx_ring, VETH_RING_SIZE, GFP_KERNEL); - if (err) - goto err_xdp_tx_ring; + err = ptr_ring_init(&rq->xdp_ring, VETH_RING_SIZE, GFP_KERNEL); + if (err) + goto err_xdp_ring; + + err = ptr_ring_init(&rq->xdp_tx_ring, VETH_RING_SIZE, + GFP_KERNEL); + if (err) + goto err_xdp_tx_ring; + } + + for (i = 0; i < dev->real_num_rx_queues; i++) { + struct veth_rq *rq = &priv->rq[i]; - netif_napi_add(dev, &priv->xdp_napi, veth_poll, 
NAPI_POLL_WEIGHT); - napi_enable(&priv->xdp_napi); + netif_napi_add(dev, &rq->xdp_napi, veth_poll, NAPI_POLL_WEIGHT); + napi_enable(&rq->xdp_napi); + } return 0; err_xdp_tx_ring: - ptr_ring_cleanup(&priv->xdp_ring, __skb_array_destroy_skb); + ptr_ring_cleanup(&priv->rq[i].xdp_ring, __skb_array_destroy_skb); +err_xdp_ring: + for (i--; i >= 0; i--) { + ptr_ring_cleanup(&priv->rq[i].xdp_ring, __skb_array_destroy_skb); + ptr_ring_cleanup(&priv->rq[i].xdp_tx_ring, veth_xdp_free); + } return err; } @@ -641,37 +671,56 @@ static int veth_napi_add(struct net_device *dev) static void veth_napi_del(struct net_device *dev) { struct veth_priv *priv = netdev_priv(dev); + int i; - napi_disable(&priv->xdp_napi); - netif_napi_del(&priv->xdp_napi); - ptr_ring_cleanup(&priv->xdp_ring, __skb_array_destroy_skb); - ptr_ring_cleanup(&priv->xdp_tx_ring, veth_xdp_free); + for (i = 0; i < dev->real_num_rx_queues; i++) { + struct veth_rq *rq = &priv->rq[i]; + + napi_disable(&rq->xdp_napi); + napi_hash_del(&rq->xdp_napi); + } + synchronize_net(); + + for (i = 0; i < dev->real_num_rx_queues; i++) { + struct veth_rq *rq = &priv->rq[i]; + + netif_napi_del(&rq->xdp_napi); + ptr_ring_cleanup(&rq->xdp_ring, __skb_array_destroy_skb); + ptr_ring_cleanup(&rq->xdp_tx_ring, veth_xdp_free); + } } static int veth_enable_xdp(struct net_device *dev) { struct veth_priv *priv = netdev_priv(dev); - int err; + int err, i; - err = xdp_rxq_info_reg(&priv->xdp_rxq, dev, 0); - if (err < 0) - return err; + for (i = 0; i < dev->real_num_rx_queues; i++) { + struct veth_rq *rq = &priv->rq[i]; - err = xdp_rxq_info_reg_mem_model(&priv->xdp_rxq, - MEM_TYPE_PAGE_SHARED, NULL); - if (err < 0) - goto err; + err = xdp_rxq_info_reg(&rq->xdp_rxq, dev, i); + if (err < 0) + goto err_rxq_reg; + + err = xdp_rxq_info_reg_mem_model(&rq->xdp_rxq, + MEM_TYPE_PAGE_SHARED, NULL); + if (err < 0) + goto err_reg_mem; - /* Save original mem info as it can be overwritten */ - priv->xdp_mem = priv->xdp_rxq.mem; + /* Save original mem info as 
it can be overwritten */ + rq->xdp_mem = rq->xdp_rxq.mem; + } err = veth_napi_add(dev); if (err) - goto err; + goto err_rxq_reg; return 0; -err: - xdp_rxq_info_unreg(&priv->xdp_rxq); +err_reg_mem: + xdp_rxq_info_unreg(&priv->rq[i].xdp_rxq); +err_rxq_reg: + for (i--; i >= 0; i--) + xdp_rxq_info_unreg(&priv->rq[i].xdp_rxq); return err; } @@ -679,10 +728,15 @@ static int veth_enable_xdp(struct net_device *dev) static void veth_disable_xdp(struct net_device *dev) { struct veth_priv *priv = netdev_priv(dev); + int i; veth_napi_del(dev); - priv->xdp_rxq.mem = priv->xdp_mem; - xdp_rxq_info_unreg(&priv->xdp_rxq); + for (i = 0; i < dev->real_num_rx_queues; i++) { + struct veth_rq *rq = &priv->rq[i]; + + rq->xdp_rxq.mem = rq->xdp_mem; + xdp_rxq_info_unreg(&rq->xdp_rxq); + } } static int veth_open(struct net_device *dev) @@ -694,7 +748,7 @@ static int veth_open(struct net_device *dev) if (!peer) return -ENOTCONN; - if (rtnl_dereference(priv->xdp_prog)) { + if (rtnl_dereference(priv->rq[0].xdp_prog)) { err = veth_enable_xdp(dev); if (err) return err; @@ -717,7 +771,7 @@ static int veth_close(struct net_device *dev) if (peer) netif_carrier_off(peer); - if (rtnl_dereference(priv->xdp_prog)) + if (rtnl_dereference(priv->rq[0].xdp_prog)) veth_disable_xdp(dev); return 0; @@ -780,7 +834,7 @@ static netdev_features_t veth_fix_features(struct net_device *dev, if (peer) { struct veth_priv *peer_priv = netdev_priv(peer); - if (rtnl_dereference(peer_priv->xdp_prog)) + if (rtnl_dereference(peer_priv->rq[0].xdp_prog)) features &= ~NETIF_F_GSO_SOFTWARE; } @@ -816,9 +870,9 @@ static int veth_xdp_set(struct net_device *dev, struct bpf_prog *prog, struct veth_priv *priv = netdev_priv(dev); struct bpf_prog *old_prog; struct net_device *peer; - int err; + int err, i; - old_prog = rtnl_dereference(priv->xdp_prog); + old_prog = rtnl_dereference(priv->rq[0].xdp_prog); peer = rtnl_dereference(priv->peer); if (prog) { @@ -826,6 +880,9 @@ static int veth_xdp_set(struct net_device *dev, struct bpf_prog 
*prog, return -ENOTCONN; if (!old_prog) { + if (dev->real_num_rx_queues < peer->real_num_tx_queues) + return -ENOSPC; + if (dev->flags & IFF_UP) { err = veth_enable_xdp(dev); if (err) @@ -841,7 +898,8 @@ static int veth_xdp_set(struct net_device *dev, struct bpf_prog *prog, } } - rcu_assign_pointer(priv->xdp_prog, prog); + for (i = 0; i < dev->real_num_rx_queues; i++) + rcu_assign_pointer(priv->rq[i].xdp_prog, prog); if (old_prog) { bpf_prog_put(old_prog); @@ -867,7 +925,7 @@ static u32 veth_xdp_query(struct net_device *dev) struct veth_priv *priv = netdev_priv(dev); const struct bpf_prog *xdp_prog; - xdp_prog = rtnl_dereference(priv->xdp_prog); + xdp_prog = rtnl_dereference(priv->rq[0].xdp_prog); if (xdp_prog) return xdp_prog->aux->id; @@ -960,13 +1018,31 @@ static int veth_validate(struct nlattr *tb[], struct nlattr *data[], return 0; } +static int veth_alloc_queues(struct net_device *dev) +{ + struct veth_priv *priv = netdev_priv(dev); + + priv->rq = kcalloc(dev->num_rx_queues, sizeof(*priv->rq), GFP_KERNEL); + if (!priv->rq) + return -ENOMEM; + + return 0; +} + +static void veth_free_queues(struct net_device *dev) +{ + struct veth_priv *priv = netdev_priv(dev); + + kfree(priv->rq); +} + static struct rtnl_link_ops veth_link_ops; static int veth_newlink(struct net *src_net, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { - int err; + int err, i; struct net_device *peer; struct veth_priv *priv; char ifname[IFNAMSIZ]; @@ -1019,6 +1095,12 @@ static int veth_newlink(struct net *src_net, struct net_device *dev, return PTR_ERR(peer); } + err = veth_alloc_queues(peer); + if (err) { + put_net(net); + goto err_peer_alloc_queues; + } + if (!ifmp || !tbp[IFLA_ADDRESS]) eth_hw_addr_random(peer); @@ -1047,6 +1129,10 @@ static int veth_newlink(struct net *src_net, struct net_device *dev, * should be re-allocated */ + err = veth_alloc_queues(dev); + if (err) + goto err_alloc_queues; + if (tb[IFLA_ADDRESS] == NULL) 
eth_hw_addr_random(dev); @@ -1066,22 +1152,28 @@ static int veth_newlink(struct net *src_net, struct net_device *dev, */ priv = netdev_priv(dev); - priv->dev = dev; + for (i = 0; i < dev->real_num_rx_queues; i++) + priv->rq[i].dev = dev; rcu_assign_pointer(priv->peer, peer); priv = netdev_priv(peer); - priv->dev = peer; + for (i = 0; i < peer->real_num_rx_queues; i++) + priv->rq[i].dev = peer; rcu_assign_pointer(priv->peer, dev); return 0; err_register_dev: + veth_free_queues(dev); +err_alloc_queues: /* nothing to do */ err_configure_peer: unregister_netdevice(peer); return err; err_register_peer: + veth_free_queues(peer); +err_peer_alloc_queues: free_netdev(peer); return err; } From patchwork Sun Jun 10 16:02:17 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Toshiaki Makita X-Patchwork-Id: 927381 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="K0NmpvKJ"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 413gsM0Pxsz9rvt for ; Mon, 11 Jun 2018 02:02:47 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932344AbeFJQCp (ORCPT ); Sun, 10 Jun 2018 12:02:45 -0400 Received: from mail-pg0-f66.google.com ([74.125.83.66]:36239 "EHLO mail-pg0-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932295AbeFJQCn (ORCPT ); Sun, 10 Jun 2018 12:02:43 -0400 Received: by 
mail-pg0-f66.google.com with SMTP id m5-v6so8573834pgd.3 for ; Sun, 10 Jun 2018 09:02:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=drmQAioz3gaW/c7y1LHqoyxYyAoh/Y7K9gClsYtppj4=; b=K0NmpvKJoNSJCcmOox9vvmJ4YhXhoWZ338gb6Yw+RIhD91j33XJtuOl/3ffYrCfvYB deVVXKYJunLn077eOdHGefqndewCIiULsBmSX8iQd19BsVTdmxdCpqE1PJvEbJUfGpjP hi+k6+5+covehgZRdfWecZOm+oKYN4GN7Hc8bvLq5MNbl06OaNgyJY+E/hpCigARjFg3 eEi0TfYNB1Cyx03Fsk4rqi4KA2LWTha/ZIcp2UlC9zU3auBrvHc52DIddJU2mQZZxS9p plxv5Yda2JdN7tB5ercxLof3RWvV/PbcVv3hPi2jwEFpaxvPk+jpjMvlBZf9M6re3FwP ew4g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=drmQAioz3gaW/c7y1LHqoyxYyAoh/Y7K9gClsYtppj4=; b=P+wsiQUvYql4QAfxgg3WU0xZPSdeiq/9ZeL7fE481mcYF5gAEGE1sxH/lFyYSNvZGl U1S/Y3VxN3Nfq+pjbE0e29yKvN3oTIPYZiy4N4JQxiLvS7gqWm2N7w1wGRBTaNu89QIX Hcb36C/xUgrehrUXvQYE0tnIuaAOMysGXw2LBKvfBVIhYHRQP6A2vHLpq+YC78Pnoq+L uUZE/dlJmLXfkY1F4gEosjyEhhH/EF+ep/6QTIIuq+g4xZyvo6tFtO2jgtS6yIwz6qHs qR0N6w9ybjuozQiCf5AyKSO+i4JCcGDMIh5I9M6qaBVAfguGdsBy3DuqsG7HlJTmDbwb UuiA== X-Gm-Message-State: APt69E0RmefOC09izXFHpCZ79vQcdunlYBUHXVwrSDSPBM+OzeJe195D pbOxAmXH76XzvcFZ2baJXkpRhw== X-Google-Smtp-Source: ADUXVKKlEmq+UXVepIAnbd5hdvTuCCyddSteJ5B4bSmsyFrTxZPF1YTMQ/Ny2jvcioqY73zh9LJ1Hw== X-Received: by 2002:a63:7a07:: with SMTP id v7-v6mr11785226pgc.444.1528646562889; Sun, 10 Jun 2018 09:02:42 -0700 (PDT) Received: from localhost.localdomain (i153-145-22-9.s42.a013.ap.plala.or.jp. 
[153.145.22.9]) by smtp.gmail.com with ESMTPSA id o87-v6sm56068211pfa.106.2018.06.10.09.02.41 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Sun, 10 Jun 2018 09:02:42 -0700 (PDT)
From: Toshiaki Makita
To: netdev@vger.kernel.org
Cc: Toshiaki Makita , Jesper Dangaard Brouer , Alexei Starovoitov , Daniel Borkmann
Subject: [PATCH RFC v2 9/9] veth: Bulk skb xmit for XDP path
Date: Mon, 11 Jun 2018 01:02:17 +0900
Message-Id: <20180610160217.3146-10-toshiaki.makita1@gmail.com>
X-Mailer: git-send-email 2.14.3
In-Reply-To: <20180610160217.3146-1-toshiaki.makita1@gmail.com>
References: <20180610160217.3146-1-toshiaki.makita1@gmail.com>
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

From: Toshiaki Makita

Acquire the txq lock instead of the rxq ptr_ring lock so that we avoid
taking a lock per packet when skb->xmit_more is true.
We ensure that the number of rxqs is never less than the number of txqs
and that txqs map one-to-one to rxqs, so the rxq-side lock can be
removed completely.
Because the rxq-side lock is gone, this change does not increase the
number of lock acquisitions even when bulk sending is not possible,
e.g. for non-GSO packets.

Signed-off-by: Toshiaki Makita
---
 drivers/net/veth.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 67debd3eafe6..376d70f983e5 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -138,7 +138,7 @@ static void __veth_xdp_flush(struct veth_rq *rq)
 
 static int veth_xdp_rx(struct veth_rq *rq, struct sk_buff *skb)
 {
-	if (unlikely(ptr_ring_produce(&rq->xdp_ring, skb))) {
+	if (unlikely(__ptr_ring_produce(&rq->xdp_ring, skb))) {
 		dev_kfree_skb_any(skb);
 		return NET_RX_DROP;
 	}
@@ -188,7 +188,7 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
 		atomic64_inc(&priv->dropped);
 	}
 
-	if (rcv_xdp)
+	if (rcv_xdp && !skb->xmit_more)
 		__veth_xdp_flush(rq);
 
 	rcu_read_unlock();
@@ -829,15 +829,21 @@ static netdev_features_t veth_fix_features(struct net_device *dev,
 {
 	struct veth_priv *priv = netdev_priv(dev);
 	struct net_device *peer;
+	bool xdp = false;
 
 	peer = rtnl_dereference(priv->peer);
 	if (peer) {
 		struct veth_priv *peer_priv = netdev_priv(peer);
 
 		if (rtnl_dereference(peer_priv->rq[0].xdp_prog))
-			features &= ~NETIF_F_GSO_SOFTWARE;
+			xdp = true;
 	}
 
+	if (xdp)
+		features &= ~(NETIF_F_GSO_SOFTWARE | NETIF_F_LLTX);
+	else
+		features |= NETIF_F_LLTX;
+
 	return features;
 }
 
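
Note for readers: the lock removal in this patch rests on the queue
mapping set up earlier in the series. The following is only a minimal
userspace sketch of that invariant, not driver code. struct fake_dev,
can_attach_xdp(), rxq_for_txq() and rxq_for_cpu() are hypothetical
names introduced for illustration; they loosely mirror the -ENOSPC
check in veth_xdp_set(), the skb_get_queue_mapping() use in veth_xmit()
and veth_select_rxq() respectively.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the queue counts kept in struct net_device. */
struct fake_dev {
	unsigned int real_num_rx_queues;
	unsigned int real_num_tx_queues;
};

/*
 * Mirrors the check added to veth_xdp_set(): XDP can only be attached
 * when the device has at least as many rxqs as its peer has txqs, so
 * every peer txq gets an rxq of its own.
 */
static int can_attach_xdp(const struct fake_dev *dev,
			  const struct fake_dev *peer)
{
	if (dev->real_num_rx_queues < peer->real_num_tx_queues)
		return -ENOSPC;
	return 0;
}

/*
 * skb path: the sender's tx queue index is reused as the receiver's
 * rx queue index, which gives the one-to-one txq to rxq mapping.
 */
static unsigned int rxq_for_txq(unsigned int txq)
{
	return txq;
}

/*
 * ndo_xdp_xmit path: the rx queue is chosen from the current CPU id
 * modulo the number of rx queues on the receiving device.
 */
static unsigned int rxq_for_cpu(unsigned int cpu, const struct fake_dev *rcv)
{
	assert(rcv->real_num_rx_queues > 0);
	return cpu % rcv->real_num_rx_queues;
}
```

Under this mapping, and assuming the txq lock is held once NETIF_F_LLTX
is cleared, each rxq's xdp_ring should see a single skb producer at a
time, which is presumably why the unlocked __ptr_ring_produce() is
sufficient in veth_xdp_rx().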