From patchwork Sun Jan 12 02:37:21 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: John Fastabend X-Patchwork-Id: 1221690 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: incoming-bpf@patchwork.ozlabs.org Delivered-To: patchwork-incoming-bpf@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=bpf-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256 header.s=20161025 header.b=tAcxWY9h; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 47wLVB37rTz9sRR for ; Sun, 12 Jan 2020 13:37:38 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731986AbgALChh (ORCPT ); Sat, 11 Jan 2020 21:37:37 -0500 Received: from mail-il1-f195.google.com ([209.85.166.195]:41833 "EHLO mail-il1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1731985AbgALChh (ORCPT ); Sat, 11 Jan 2020 21:37:37 -0500 Received: by mail-il1-f195.google.com with SMTP id f10so5004152ils.8; Sat, 11 Jan 2020 18:37:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:from:to:cc:date:message-id:in-reply-to:references :user-agent:mime-version:content-transfer-encoding; bh=Ahgxc1mJ0wFyA7jnb8Fsb+g7dNCEBA2eOuNysg7JNsY=; b=tAcxWY9hyPqEDDLz9vF+5vhnd1jCt75tdFbCKyAQ9LXBFBxhwLLzWeKUOW+8iPqmkn z3HlYRrFELIjEuWyo3Z0rtrnhbB9SJbPfNFKBcFjn4+WnSVwLTzgGR4PULZR+k528s+p vGO/hd5JSooUNIhT5pO7c6/ACg21wazuyMaPOMnqb2fgOKeNcbt2VEw08xy7pScOoUGS VXHNgEr/4xFF/riqalHm5T9DzTLZp+8/bMfpIOJHDmhBSvfoDY6djco8RAGCVQ/98Bll iV0gWqR9qATMwMJ1yxjqdEZFeWimnKqFMkjvLCnOwy5UhjSfPrivK+fLs3GSxC5Fpekp SrUw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:from:to:cc:date:message-id:in-reply-to :references:user-agent:mime-version:content-transfer-encoding; bh=Ahgxc1mJ0wFyA7jnb8Fsb+g7dNCEBA2eOuNysg7JNsY=; b=LdhGCrXhc5ABI2TiLjd16ERQD9coDQXnS7JwvUJ2WGTgjJRzVjb6kY8Gj+UxYWUJNf 9fMDwpnRpXEFgncLsIw6WxNKsrnePyN7EY5djnvxmSDRJc1n5jGMlvjmv6HHCQ3cM16L vxFPcXVL+fRQnpEqDCBw36ACf/9XLFJw7VkPDZRoBUwvdclYOU3cibaZwhdyIZ355xO4 NBrW+nS2hRQsuc6Ei99wSxXNx7RP41pTu9C88HfdYcfx2SlxNwSb///57CNUnIDcIGb0 zCTsZqieeGqdUwA7SqrJdVuQDooRqTStWxv+Kpu3CBHe3hsMNnhgv0XH14PMQiSefOBX BZvQ== X-Gm-Message-State: APjAAAXEzCeSThtujg6g9NspdlDTaE9U896QRaYNVfb5CwVpNPb9cTjq Dka9JBkq+1CAKEF5d+67rto= X-Google-Smtp-Source: APXvYqxqe6ISn4AyimBnrI0ScVN+XMZzpsBEAKcw9RXkKPI2A4Za6MgWBn0qLNLYEPIh0L+2f/3Hmg== X-Received: by 2002:a92:901:: with SMTP id y1mr9135799ilg.274.1578796656687; Sat, 11 Jan 2020 18:37:36 -0800 (PST) Received: from [127.0.1.1] ([184.63.162.180]) by smtp.gmail.com with ESMTPSA id r10sm1713660iot.28.2020.01.11.18.37.27 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sat, 11 Jan 2020 18:37:36 -0800 (PST) Subject: [bpf-next PATCH v2 1/2] bpf: xdp, update devmap comments to reflect napi/rcu usage From: John Fastabend To: bjorn.topel@gmail.com, bpf@vger.kernel.org, toke@redhat.com, toshiaki.makita1@gmail.com Cc: netdev@vger.kernel.org, john.fastabend@gmail.com, ast@kernel.org, daniel@iogearbox.net Date: Sat, 11 Jan 2020 18:37:21 -0800 Message-ID: <157879664156.8200.4955971883120344808.stgit@john-Precision-5820-Tower> In-Reply-To: <157879606461.8200.2816751890292483534.stgit@john-Precision-5820-Tower> References: <157879606461.8200.2816751890292483534.stgit@john-Precision-5820-Tower> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Sender: bpf-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org Now that we rely on synchronize_rcu and call_rcu waiting to exit perempt-disable regions (NAPI) lets update the comments to reflect this. Fixes: 0536b85239b84 ("xdp: Simplify devmap cleanup") Acked-by: Björn Töpel Acked-by: Song Liu Signed-off-by: John Fastabend Acked-by: Toke Høiland-Jørgensen --- kernel/bpf/devmap.c | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-) diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c index da9c832..f0bf525 100644 --- a/kernel/bpf/devmap.c +++ b/kernel/bpf/devmap.c @@ -193,10 +193,12 @@ static void dev_map_free(struct bpf_map *map) /* At this point bpf_prog->aux->refcnt == 0 and this map->refcnt == 0, * so the programs (can be more than one that used this map) were - * disconnected from events. Wait for outstanding critical sections in - * these programs to complete. The rcu critical section only guarantees - * no further reads against netdev_map. It does __not__ ensure pending - * flush operations (if any) are complete. + * disconnected from events. The following synchronize_rcu() guarantees + * both rcu read critical sections complete and waits for + * preempt-disable regions (NAPI being the relavent context here) so we + * are certain there will be no further reads against the netdev_map and + * all flush operations are complete. Flush operations can only be done + * from NAPI context for this reason. */ spin_lock(&dev_map_lock); @@ -498,12 +500,11 @@ static int dev_map_delete_elem(struct bpf_map *map, void *key) return -EINVAL; /* Use call_rcu() here to ensure any rcu critical sections have - * completed, but this does not guarantee a flush has happened - * yet. Because driver side rcu_read_lock/unlock only protects the - * running XDP program. However, for pending flush operations the - * dev and ctx are stored in another per cpu map. And additionally, - * the driver tear down ensures all soft irqs are complete before - * removing the net device in the case of dev_put equals zero. + * completed as well as any flush operations because call_rcu + * will wait for preempt-disable region to complete, NAPI in this + * context. And additionally, the driver tear down ensures all + * soft irqs are complete before removing the net device in the + * case of dev_put equals zero. */ old_dev = xchg(&dtab->netdev_map[k], NULL); if (old_dev) From patchwork Sun Jan 12 02:37:42 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: John Fastabend X-Patchwork-Id: 1221692 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: incoming-bpf@patchwork.ozlabs.org Delivered-To: patchwork-incoming-bpf@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=bpf-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256 header.s=20161025 header.b=Q1wk3Bar; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 47wLVb3c9cz9sRY for ; Sun, 12 Jan 2020 13:37:59 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731993AbgALCh6 (ORCPT ); Sat, 11 Jan 2020 21:37:58 -0500 Received: from mail-io1-f67.google.com ([209.85.166.67]:36646 "EHLO mail-io1-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1731985AbgALCh6 (ORCPT ); Sat, 11 Jan 2020 21:37:58 -0500 Received: by mail-io1-f67.google.com with SMTP id d15so6115174iog.3; Sat, 11 Jan 2020 18:37:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:from:to:cc:date:message-id:in-reply-to:references :user-agent:mime-version:content-transfer-encoding; bh=tMBuKvBK8Si7i2xUAU0++949aUuYSuAcPP+8ON5MMKI=; b=Q1wk3Bar+nVU/mEEC2dawCa8jrTCS63p6rXbCivzlDINWjFCkz1C4I7jEkXd15w3bO KdKSXC/p2c70BW6SGAPwklUMeNVFiJMlWTUmiy5Cp3Ux3csc0mBh4RoTiOI92PL+Ew2p 8MWYw8Gp6UN2s6MQVIZQKz8zLjbVyhiq/DQ6au+KsJCApfc2qd5DJ5Wi/zez4tm++QEI +bKM2/aQ52fyRKWqDV3QgP9RyB4PCbniwor86t2sUU0PJq1Y345k5ynHjqh8PnVo++Ik AVZjOlU3jPv8ZIKjrSseul7WVBfhCnBm4paZ4Ot31Pzkb4e4RtXX2FDbnxMgA2QrD+Tt nR0A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:from:to:cc:date:message-id:in-reply-to :references:user-agent:mime-version:content-transfer-encoding; bh=tMBuKvBK8Si7i2xUAU0++949aUuYSuAcPP+8ON5MMKI=; b=KUtGfjVDF8GvDdSQgsUymG8ninjJ4f8qx4fZIWLD1TgTbwsr1m9Qn3gHm2NzWAeq5p j08cc/UxPNqC42c3AOFJyzlIk1bzgj2fGdZG2JV9t0JImtLRZBhR6+lL8PdeCe93UmSV aRzAHhX4ezB+9yvRGHdmLvj8f3UDCHghUJPcfGv8p9sGOZtoqbs3uMruRc6OSRQ8ap4l nmPNBJkGdaCQKnrDyO3FZ2KQilXZcc1JGY5wsRfDm7C17FEkJKoFLVFAhiAVUORnx/8R j73pun3zA08e1Rd6zCMw4IPuFhlKP/5w/6/rAkTq2xWsfTl8s2/fModeqmK7io2u/k5l ZXTA== X-Gm-Message-State: APjAAAXfZXOetVU4iOzYcplCZuS73/AkDxfSwUe/C7Sq66SsJcOdS4Ki KmNK4eTFtZTjI7nYrsNYBXgkpnym X-Google-Smtp-Source: APXvYqyyjl3TOhK/uirIOrPJbRHdxYLdsg8bY7HrFmFUxtHM3lbZOUH/Vs+yhSo1V91COkR0QSKplg== X-Received: by 2002:a6b:148c:: with SMTP id 134mr8278272iou.178.1578796677332; Sat, 11 Jan 2020 18:37:57 -0800 (PST) Received: from [127.0.1.1] ([184.63.162.180]) by smtp.gmail.com with ESMTPSA id q23sm1680240iob.39.2020.01.11.18.37.48 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sat, 11 Jan 2020 18:37:56 -0800 (PST) Subject: [bpf-next PATCH v2 2/2] bpf: xdp, remove no longer required rcu_read_{un}lock() From: John Fastabend To: bjorn.topel@gmail.com, bpf@vger.kernel.org, toke@redhat.com, toshiaki.makita1@gmail.com Cc: netdev@vger.kernel.org, john.fastabend@gmail.com, ast@kernel.org, daniel@iogearbox.net Date: Sat, 11 Jan 2020 18:37:42 -0800 Message-ID: <157879666276.8200.5529955400195897154.stgit@john-Precision-5820-Tower> In-Reply-To: <157879606461.8200.2816751890292483534.stgit@john-Precision-5820-Tower> References: <157879606461.8200.2816751890292483534.stgit@john-Precision-5820-Tower> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Sender: bpf-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org Now that we depend on rcu_call() and synchronize_rcu() to also wait for preempt_disabled region to complete the rcu read critical section in __dev_map_flush() is no longer required. Except in a few special cases in drivers that need it for other reasons. These originally ensured the map reference was safe while a map was also being free'd. And additionally that bpf program updates via ndo_bpf did not happen while flush updates were in flight. But flush by new rules can only be called from preempt-disabled NAPI context. The synchronize_rcu from the map free path and the rcu_call from the delete path will ensure the reference there is safe. So lets remove the rcu_read_lock and rcu_read_unlock pair to avoid any confusion around how this is being protected. If the rcu_read_lock was required it would mean errors in the above logic and the original patch would also be wrong. Now that we have done above we put the rcu_read_lock in the driver code where it is needed in a driver dependent way. I think this helps readability of the code so we know where and why we are taking read locks. Most drivers will not need rcu_read_locks here and further XDP drivers already have rcu_read_locks in their code paths for reading xdp programs on RX side so this makes it symmetric where we don't have half of rcu critical sections define in driver and the other half in devmap. Fixes: 0536b85239b84 ("xdp: Simplify devmap cleanup") Signed-off-by: John Fastabend Acked-by: Toke Høiland-Jørgensen --- drivers/net/veth.c | 6 +++++- drivers/net/virtio_net.c | 8 ++++++-- kernel/bpf/devmap.c | 5 +++-- 3 files changed, 14 insertions(+), 5 deletions(-) diff --git a/drivers/net/veth.c b/drivers/net/veth.c index a552df3..184e1b4 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -377,6 +377,7 @@ static int veth_xdp_xmit(struct net_device *dev, int n, unsigned int max_len; struct veth_rq *rq; + rcu_read_lock(); if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK)) { ret = -EINVAL; goto drop; @@ -418,11 +419,14 @@ static int veth_xdp_xmit(struct net_device *dev, int n, if (flags & XDP_XMIT_FLUSH) __veth_xdp_flush(rq); - if (likely(!drops)) + if (likely(!drops)) { + rcu_read_unlock(); return n; + } ret = n - drops; drop: + rcu_read_unlock(); atomic64_add(drops, &priv->dropped); return ret; diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 4d7d5434..2c11f82 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -498,12 +498,16 @@ static int virtnet_xdp_xmit(struct net_device *dev, void *ptr; int i; + rcu_read_lock(); + /* Only allow ndo_xdp_xmit if XDP is loaded on dev, as this * indicate XDP resources have been successfully allocated. */ xdp_prog = rcu_dereference(rq->xdp_prog); - if (!xdp_prog) + if (!xdp_prog) { + rcu_read_unlock(); return -ENXIO; + } sq = virtnet_xdp_sq(vi); @@ -552,7 +556,7 @@ static int virtnet_xdp_xmit(struct net_device *dev, sq->stats.xdp_tx_drops += drops; sq->stats.kicks += kicks; u64_stats_update_end(&sq->stats.syncp); - + rcu_read_unlock(); return ret; } diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c index f0bf525..d0ce2e2 100644 --- a/kernel/bpf/devmap.c +++ b/kernel/bpf/devmap.c @@ -372,16 +372,17 @@ static int bq_xmit_all(struct xdp_bulk_queue *bq, u32 flags) * from NET_RX_SOFTIRQ. Either way the poll routine must complete before the * net device can be torn down. On devmap tear down we ensure the flush list * is empty before completing to ensure all flush operations have completed. + * When drivers update the bpf program they may need to ensure any flush ops + * are also complete. Using synchronize_rcu or call_rcu will suffice for this + * because both wait for napi context to exit. */ void __dev_map_flush(void) { struct list_head *flush_list = this_cpu_ptr(&dev_map_flush_list); struct xdp_bulk_queue *bq, *tmp; - rcu_read_lock(); list_for_each_entry_safe(bq, tmp, flush_list, flush_node) bq_xmit_all(bq, XDP_XMIT_FLUSH); - rcu_read_unlock(); } /* rcu_read_lock (from syscall and BPF contexts) ensures that if a delete and/or