From patchwork Mon Sep 16 13:47:52 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taehee Yoo X-Patchwork-Id: 1162823 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="ZLrLQGpx"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 46X6yb3p89z9sN1 for ; Mon, 16 Sep 2019 23:48:23 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2387633AbfIPNsW (ORCPT ); Mon, 16 Sep 2019 09:48:22 -0400 Received: from mail-pf1-f195.google.com ([209.85.210.195]:35029 "EHLO mail-pf1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727806AbfIPNsW (ORCPT ); Mon, 16 Sep 2019 09:48:22 -0400 Received: by mail-pf1-f195.google.com with SMTP id 205so23057308pfw.2 for ; Mon, 16 Sep 2019 06:48:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=E2iTkoNKc433ZlxkZab3V2S2Vj7M1Jee30XLKPAKn98=; b=ZLrLQGpxJixcQR0XhWYjb0y+Yvk1OX7vXxAxw1yh/gkkDGNpw7b6K1erCJguqRCSp+ 2CwAwOHcVZmeaaNivDLZQf39x1mxwYHKJL+njESf4pq2kIisEWbnk5MhbIkwEf9HG5m6 AmU5gkerGMIBaHpR1faNDpfF1Af7bnRdknO5B6APSG8E3iQXgGfnAl0Ukdask/2FpxOD yjK4iS94/4sD3vpRTCk6GEYhjUcWneule6YG+27zQE1ayqf66Qykt72ILhnj5S7M+1yI +ins1CUGPeXR/3TPt1YG7B1bZdZRAriDnWXY+CvPRy8W6z1c2UlkEskvszPHu9+qYO/3 33hA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=E2iTkoNKc433ZlxkZab3V2S2Vj7M1Jee30XLKPAKn98=; b=Qd+uIudtkXwt+bVsDa7TXXucfP3OjyKN0GqBIezlMD5i16hIr9Nn0rBVTyWjmScLH2 oo1JJ232mR1CbcxX1gO5Xp+IYcV0F5no1LFwRmHFzsZekG476m0Cx9aLHldamKcXQKfn 0OXgXnHaYDD/W5MSjfCxyav5EiFjuLIfwPdCmiso71+H21q94QLuSQglnGWBzAyWjO+F EbEbHmyls7LVW1hZ04DO52oK/HofKtuQWss17XzEfu+nZRjBcZTSwHlME+1wwKQgK0u2 uk0Mmyz7vMQ1VaaiLEDMylqZSECedWp5xJzMS1YV9gTpZ/gwwqHOYNyhlnL/g+rbBoRe nd0g== X-Gm-Message-State: APjAAAWArGQ8O4rh8Nj+dWD0+FHnHPv0QBWq5Fiap9IRUjlaIUYnMnDp k0tJ+TgJEpIQE5oI5J26Dq4= X-Google-Smtp-Source: APXvYqwJkvI4Mj1HeZC37DpVyXVo42yUZDuiwa8rn2YWZt/PvvnUuLkGoBg4ISAZRVtGfov0FB7INw== X-Received: by 2002:a63:3c08:: with SMTP id j8mr21520053pga.72.1568641701051; Mon, 16 Sep 2019 06:48:21 -0700 (PDT) Received: from ap-To-be-filled-by-O-E-M.1.1.1.1 ([14.33.120.60]) by smtp.gmail.com with ESMTPSA id z20sm2822266pjn.12.2019.09.16.06.48.15 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 16 Sep 2019 06:48:19 -0700 (PDT) From: Taehee Yoo To: davem@davemloft.net, netdev@vger.kernel.org, j.vosburgh@gmail.com, vfalico@gmail.com, andy@greyhouse.net, jiri@resnulli.us, sd@queasysnail.net, roopa@cumulusnetworks.com, saeedm@mellanox.com, manishc@marvell.com, rahulv@marvell.com, kys@microsoft.com, haiyangz@microsoft.com, stephen@networkplumber.org, sashal@kernel.org, hare@suse.de, varun@chelsio.com, ubraun@linux.ibm.com, kgraul@linux.ibm.com, jay.vosburgh@canonical.com Cc: ap420073@gmail.com Subject: [PATCH net v3 01/11] net: core: limit nested device depth Date: Mon, 16 Sep 2019 22:47:52 +0900 Message-Id: <20190916134802.8252-2-ap420073@gmail.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190916134802.8252-1-ap420073@gmail.com> References: <20190916134802.8252-1-ap420073@gmail.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Current code doesn't limit the number 
of nested devices.
Nested devices are handled recursively, which needs a large amount of
stack memory. So an unlimited nesting depth can overflow the stack.

This patch adds upper_level and lower_level to struct net_device.
They are common variables that represent the maximum upper/lower depth.
When an upper/lower device is attached or detached,
{lower/upper}_level are updated, and if the maximum depth is bigger
than 8, the attach routine fails and returns -EMLINK.

In addition, this patch converts the recursive
netdev_walk_all_{lower/upper} routines into iterative routines.

Test commands:
    ip link add dummy0 type dummy
    ip link add link dummy0 name vlan1 type vlan id 1
    ip link set vlan1 up

    for i in {2..200}
    do
            let A=$i-1

            ip link add vlan$i link vlan$A type vlan id $i
    done
    ip link del vlan1

Splat looks like:
[ 132.396918] Thread overran stack, or stack corrupted
[ 132.397763] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[ 132.398557] CPU: 1 PID: 1299 Comm: ip Not tainted 5.3.0-rc8+ #179
[ 132.399241] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 132.400136] RIP: 0010:stack_depot_fetch+0x10/0x30
[ 132.400683] Code: 00 75 10 48 8b 73 18 48 89 ef 5b 5d e9 79 8f 87 ff 0f 0b e8 c2 6d 9b ff eb e9 89 f8 c1 ef 110
[ 132.402711] RSP: 0000:ffff8880b002eb78 EFLAGS: 00010006
[ 132.404578] RAX: 00000000001fffff RBX: ffff8880b002eec0 RCX: 0000000000000000
[ 132.405305] RDX: 000000000000001d RSI: ffff8880b002eb80 RDI: 0000000000003ff0
[ 132.406022] RBP: ffffea0002c00a00 R08: ffffed101b53df23 R09: ffffed101b53df23
[ 132.406776] R10: 0000000000000001 R11: ffffed101b53df22 R12: ffff8880d38dd900
[ 132.407598] R13: ffff8880b002e600 R14: ffff8880b002eec0 R15: ffff8880b002ed20
[ 132.408365] FS: 00007f5fca3c90c0(0000) GS:ffff8880da800000(0000) knlGS:0000000000000000
[ 132.409213] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 132.409788] CR2: ffffffffa598b098 CR3: 00000000ccb86004 CR4: 00000000000606e0
[ 132.410659] Call Trace:
[ 132.410962] Modules linked in: 8021q garp stp mrp llc dummy veth 
openvswitch nsh nf_conncount nf_nat nf_conntrs
[ 132.412410] CR2: ffffffffa598b098
[ 132.412754] ---[ end trace 7f335fb982ddb2da ]---
[ 132.413293] RIP: 0010:stack_depot_fetch+0x10/0x30
[ 132.413851] Code: 00 75 10 48 8b 73 18 48 89 ef 5b 5d e9 79 8f 87 ff 0f 0b e8 c2 6d 9b ff eb e9 89 f8 c1 ef 110
[ 132.415973] RSP: 0000:ffff8880b002eb78 EFLAGS: 00010006
[ 132.416581] RAX: 00000000001fffff RBX: ffff8880b002eec0 RCX: 0000000000000000
[ 132.417380] RDX: 000000000000001d RSI: ffff8880b002eb80 RDI: 0000000000003ff0
[ 132.418211] RBP: ffffea0002c00a00 R08: ffffed101b53df23 R09: ffffed101b53df23
[ 132.419036] R10: 0000000000000001 R11: ffffed101b53df22 R12: ffff8880d38dd900
[ 132.419815] R13: ffff8880b002e600 R14: ffff8880b002eec0 R15: ffff8880b002ed20
[ 132.420616] FS: 00007f5fca3c90c0(0000) GS:ffff8880da800000(0000) knlGS:0000000000000000
[ 132.421489] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 132.422134] CR2: ffffffffa598b098 CR3: 00000000ccb86004 CR4: 00000000000606e0
[ 132.422912] Kernel panic - not syncing: Fatal exception
[ 132.423441] Kernel Offset: 0x1f000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff)
[ 132.425455] Rebooting in 5 seconds..

Signed-off-by: Taehee Yoo
---

v2 -> v3 :
 - Modify nesting infra code to use iterator instead of recursive

v1 -> v2 :
 - This patch is not changed

 include/linux/netdevice.h |   4 +
 net/core/dev.c            | 286 ++++++++++++++++++++++++++++++++------
 2 files changed, 245 insertions(+), 45 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 88292953aa6f..5bb5756129af 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1624,6 +1624,8 @@ enum netdev_priv_flags {
  * @type: Interface hardware type
  * @hard_header_len: Maximum hardware header length.
  * @min_header_len: Minimum hardware header length
+ * @upper_level: Maximum depth level of upper devices.
+ * @lower_level: Maximum depth level of lower devices.
* * @needed_headroom: Extra headroom the hardware may need, but not in all * cases can this be guaranteed @@ -1854,6 +1856,8 @@ struct net_device { unsigned short type; unsigned short hard_header_len; unsigned char min_header_len; + unsigned char upper_level; + unsigned char lower_level; unsigned short needed_headroom; unsigned short needed_tailroom; diff --git a/net/core/dev.c b/net/core/dev.c index 5156c0edebe8..fa847ea957ee 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -146,6 +146,7 @@ #include "net-sysfs.h" #define MAX_GRO_SKBS 8 +#define MAX_NEST_DEV 8 /* This should be increased if a protocol with a bigger head is added. */ #define GRO_MAX_HEAD (MAX_HEADER + 128) @@ -6602,6 +6603,21 @@ struct net_device *netdev_upper_get_next_dev_rcu(struct net_device *dev, } EXPORT_SYMBOL(netdev_upper_get_next_dev_rcu); +static struct net_device *netdev_next_upper_dev(struct net_device *dev, + struct list_head **iter) +{ + struct netdev_adjacent *upper; + + upper = list_entry((*iter)->next, struct netdev_adjacent, list); + + if (&upper->list == &dev->adj_list.upper) + return NULL; + + *iter = &upper->list; + + return upper->dev; +} + static struct net_device *netdev_next_upper_dev_rcu(struct net_device *dev, struct list_head **iter) { @@ -6619,31 +6635,103 @@ static struct net_device *netdev_next_upper_dev_rcu(struct net_device *dev, return upper->dev; } +int netdev_walk_all_upper_dev(struct net_device *dev, + int (*fn)(struct net_device *dev, + void *data), + void *data) +{ + struct net_device *udev, *next, *now, *dev_stack[MAX_NEST_DEV + 1]; + struct list_head *niter, *iter, *iter_stack[MAX_NEST_DEV + 1]; + int ret, cur = 0; + + now = dev; + iter = &dev->adj_list.upper; + + while (1) { + if (now != dev) { + ret = fn(now, data); + if (ret) + return ret; + } + + next = NULL; + while (1) { + udev = netdev_next_upper_dev(now, &iter); + if (!udev) + break; + + if (!next) { + next = udev; + niter = &udev->adj_list.upper; + } else { + dev_stack[cur] = udev; + 
iter_stack[cur++] = &udev->adj_list.upper; + break; + } + } + + if (!next) { + if (!cur) + return 0; + next = dev_stack[--cur]; + niter = iter_stack[cur]; + } + + now = next; + iter = niter; + } + + return 0; +} + int netdev_walk_all_upper_dev_rcu(struct net_device *dev, int (*fn)(struct net_device *dev, void *data), void *data) { - struct net_device *udev; - struct list_head *iter; - int ret; + struct net_device *udev, *next, *now, *dev_stack[MAX_NEST_DEV + 1]; + struct list_head *niter, *iter, *iter_stack[MAX_NEST_DEV + 1]; + int ret, cur = 0; - for (iter = &dev->adj_list.upper, - udev = netdev_next_upper_dev_rcu(dev, &iter); - udev; - udev = netdev_next_upper_dev_rcu(dev, &iter)) { - /* first is the upper device itself */ - ret = fn(udev, data); - if (ret) - return ret; + now = dev; + iter = &dev->adj_list.upper; - /* then look at all of its upper devices */ - ret = netdev_walk_all_upper_dev_rcu(udev, fn, data); - if (ret) - return ret; + while (1) { + if (now != dev) { + ret = fn(now, data); + if (ret) + return ret; + } + + next = NULL; + while (1) { + udev = netdev_next_upper_dev_rcu(now, &iter); + if (!udev) + break; + + if (!next) { + next = udev; + niter = &udev->adj_list.upper; + } else { + dev_stack[cur] = udev; + iter_stack[cur++] = &udev->adj_list.upper; + break; + } + } + + if (!next) { + if (!cur) + return 0; + next = dev_stack[--cur]; + niter = iter_stack[cur]; + } + + now = next; + iter = niter; } return 0; + } EXPORT_SYMBOL_GPL(netdev_walk_all_upper_dev_rcu); @@ -6748,23 +6836,45 @@ int netdev_walk_all_lower_dev(struct net_device *dev, void *data), void *data) { - struct net_device *ldev; - struct list_head *iter; - int ret; + struct net_device *ldev, *next, *now, *dev_stack[MAX_NEST_DEV + 1]; + struct list_head *niter, *iter, *iter_stack[MAX_NEST_DEV + 1]; + int ret, cur = 0; - for (iter = &dev->adj_list.lower, - ldev = netdev_next_lower_dev(dev, &iter); - ldev; - ldev = netdev_next_lower_dev(dev, &iter)) { - /* first is the lower device itself */ 
- ret = fn(ldev, data); - if (ret) - return ret; + now = dev; + iter = &dev->adj_list.lower; - /* then look at all of its lower devices */ - ret = netdev_walk_all_lower_dev(ldev, fn, data); - if (ret) - return ret; + while (1) { + if (now != dev) { + ret = fn(now, data); + if (ret) + return ret; + } + + next = NULL; + while (1) { + ldev = netdev_next_lower_dev(now, &iter); + if (!ldev) + break; + + if (!next) { + next = ldev; + niter = &ldev->adj_list.lower; + } else { + dev_stack[cur] = ldev; + iter_stack[cur++] = &ldev->adj_list.lower; + break; + } + } + + if (!next) { + if (!cur) + return 0; + next = dev_stack[--cur]; + niter = iter_stack[cur]; + } + + now = next; + iter = niter; } return 0; @@ -6785,31 +6895,100 @@ static struct net_device *netdev_next_lower_dev_rcu(struct net_device *dev, return lower->dev; } -int netdev_walk_all_lower_dev_rcu(struct net_device *dev, - int (*fn)(struct net_device *dev, - void *data), - void *data) +static u8 __netdev_upper_depth(struct net_device *dev) +{ + struct net_device *udev; + struct list_head *iter; + u8 max_depth = 0; + + for (iter = &dev->adj_list.upper, + udev = netdev_next_upper_dev(dev, &iter); + udev; + udev = netdev_next_upper_dev(dev, &iter)) { + if (max_depth < udev->upper_level) + max_depth = udev->upper_level; + } + + return max_depth; +} + +static u8 __netdev_lower_depth(struct net_device *dev) { struct net_device *ldev; struct list_head *iter; - int ret; + u8 max_depth = 0; for (iter = &dev->adj_list.lower, - ldev = netdev_next_lower_dev_rcu(dev, &iter); + ldev = netdev_next_lower_dev(dev, &iter); ldev; - ldev = netdev_next_lower_dev_rcu(dev, &iter)) { - /* first is the lower device itself */ - ret = fn(ldev, data); - if (ret) - return ret; + ldev = netdev_next_lower_dev(dev, &iter)) { + if (max_depth < ldev->lower_level) + max_depth = ldev->lower_level; + } - /* then look at all of its lower devices */ - ret = netdev_walk_all_lower_dev_rcu(ldev, fn, data); - if (ret) - return ret; + return max_depth; +} + 
+static int __netdev_update_upper_level(struct net_device *dev, void *data) +{ + dev->upper_level = __netdev_upper_depth(dev) + 1; + return 0; +} + +static int __netdev_update_lower_level(struct net_device *dev, void *data) +{ + dev->lower_level = __netdev_lower_depth(dev) + 1; + return 0; +} + +int netdev_walk_all_lower_dev_rcu(struct net_device *dev, + int (*fn)(struct net_device *dev, + void *data), + void *data) +{ + struct net_device *ldev, *next, *now, *dev_stack[MAX_NEST_DEV + 1]; + struct list_head *niter, *iter, *iter_stack[MAX_NEST_DEV + 1]; + int ret, cur = 0; + + now = dev; + iter = &dev->adj_list.lower; + + while (1) { + if (now != dev) { + ret = fn(now, data); + if (ret) + return ret; + } + + next = NULL; + while (1) { + ldev = netdev_next_lower_dev_rcu(now, &iter); + if (!ldev) + break; + + if (!next) { + next = ldev; + niter = &ldev->adj_list.lower; + } else { + dev_stack[cur] = ldev; + iter_stack[cur++] = &ldev->adj_list.lower; + break; + } + } + + if (!next) { + if (!cur) + return 0; + next = dev_stack[--cur]; + niter = iter_stack[cur]; + } + + now = next; + iter = niter; } return 0; + } EXPORT_SYMBOL_GPL(netdev_walk_all_lower_dev_rcu); @@ -7063,6 +7242,9 @@ static int __netdev_upper_dev_link(struct net_device *dev, if (netdev_has_upper_dev(upper_dev, dev)) return -EBUSY; + if ((dev->lower_level + upper_dev->upper_level) > MAX_NEST_DEV) + return -EMLINK; + if (!master) { if (netdev_has_upper_dev(dev, upper_dev)) return -EEXIST; @@ -7089,6 +7271,12 @@ static int __netdev_upper_dev_link(struct net_device *dev, if (ret) goto rollback; + __netdev_update_upper_level(dev, NULL); + netdev_walk_all_lower_dev(dev, __netdev_update_upper_level, NULL); + + __netdev_update_lower_level(upper_dev, NULL); + netdev_walk_all_upper_dev(upper_dev, __netdev_update_lower_level, NULL); + return 0; rollback: @@ -7171,6 +7359,12 @@ void netdev_upper_dev_unlink(struct net_device *dev, call_netdevice_notifiers_info(NETDEV_CHANGEUPPER, &changeupper_info.info); + + 
__netdev_update_upper_level(dev, NULL); + netdev_walk_all_lower_dev(dev, __netdev_update_upper_level, NULL); + + __netdev_update_lower_level(upper_dev, NULL); + netdev_walk_all_upper_dev(upper_dev, __netdev_update_lower_level, NULL); } EXPORT_SYMBOL(netdev_upper_dev_unlink); @@ -9159,6 +9353,8 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name, dev->gso_max_size = GSO_MAX_SIZE; dev->gso_max_segs = GSO_MAX_SEGS; + dev->upper_level = 1; + dev->lower_level = 1; INIT_LIST_HEAD(&dev->napi_list); INIT_LIST_HEAD(&dev->unreg_list); From patchwork Mon Sep 16 13:47:53 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taehee Yoo X-Patchwork-Id: 1162824 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="c/pVpbFg"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 46X6yj70nGz9sN1 for ; Mon, 16 Sep 2019 23:48:29 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2387746AbfIPNs3 (ORCPT ); Mon, 16 Sep 2019 09:48:29 -0400 Received: from mail-pl1-f194.google.com ([209.85.214.194]:41346 "EHLO mail-pl1-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727806AbfIPNs2 (ORCPT ); Mon, 16 Sep 2019 09:48:28 -0400 Received: by mail-pl1-f194.google.com with SMTP id t10so1309193plr.8 for ; Mon, 16 Sep 2019 06:48:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=ynW3rd5xcj3F3em15sYI7a+15sdwKfWkfP33xIZ/VSw=; b=c/pVpbFgUkOh9ygY6duZQ0tgiVOxrMwQAQQLuxPAhRFXSKHfQc6WURsPSo9FzvNPBt A6I5slw/btQDu0kc3fOaOrHJazyu9vyIOMcwSPmjD21bGtSShanv1CUDt6mwHAvVQg+l RBodth8GJcy81/Q/59r9tAwaqxCEpEBZ4df6u/aRcXRRZE6gv2ctpOVvbduaPw+APMKN 5sbjQRscmwoM5nVjKxxaO861d4AWIcRC7DmObktNTlO0qArhVKMTwO9BgexjgpUBhVhj mLuoAVnQsrnH4Fky9RelMJggLYrgNChZrtVVGBSBjwR4w352hYQtmS/+uFET9/+GplU2 l0sg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=ynW3rd5xcj3F3em15sYI7a+15sdwKfWkfP33xIZ/VSw=; b=TodyIgxUxbnJrJSu+iH4BQFVcO/5xeSPBU40/YsQR0qfwtcKAvyCcLejmNijDpBxVP UHcebSE+iHqOe4uRwWmInxFPwRiSowzQ3SqHFZGicEihVirHjAR2oOwAQgpiwAUfNg/L bNF4TzAzXLzy1jO6kxNJ7LCMKcRzX0eGqWumLB3iLXAK8FXl5flTJifO8aqja3bLeA70 plzkmpXfsD8PZdQQBY+6nDT11oYLNXjnlfGUxM9YmUwaNl2kCYyoQa3eq0JYsjzQ7iSQ s9OcawRbXzkHAYoi66CrY91bZcZceAW15mzeYRDXe1Z0DtEZC2/v8hyFPmGNgB2AByKV 2t/A== X-Gm-Message-State: APjAAAW24sIJgeDxhrMD1uqUMfQrCFgugJ5eOonJIfY2iRwwh1qcwwyG WR1z2bViJDTNc0BGIaG7vqs= X-Google-Smtp-Source: APXvYqwfgKOcSLN7HflPU/z1jZrYxpnH+Q/4H1iITJgLeTfazEAEPW023WNZXFnk9bma4kXx9KO6kQ== X-Received: by 2002:a17:902:7c13:: with SMTP id x19mr65153129pll.322.1568641707716; Mon, 16 Sep 2019 06:48:27 -0700 (PDT) Received: from ap-To-be-filled-by-O-E-M.1.1.1.1 ([14.33.120.60]) by smtp.gmail.com with ESMTPSA id z20sm2822266pjn.12.2019.09.16.06.48.22 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 16 Sep 2019 06:48:26 -0700 (PDT) From: Taehee Yoo To: davem@davemloft.net, netdev@vger.kernel.org, j.vosburgh@gmail.com, vfalico@gmail.com, andy@greyhouse.net, jiri@resnulli.us, sd@queasysnail.net, roopa@cumulusnetworks.com, saeedm@mellanox.com, manishc@marvell.com, rahulv@marvell.com, kys@microsoft.com, haiyangz@microsoft.com, stephen@networkplumber.org, sashal@kernel.org, hare@suse.de, 
varun@chelsio.com, ubraun@linux.ibm.com, kgraul@linux.ibm.com, jay.vosburgh@canonical.com
Cc: ap420073@gmail.com
Subject: [PATCH net v3 02/11] vlan: use dynamic lockdep key instead of subclass
Date: Mon, 16 Sep 2019 22:47:53 +0900
Message-Id: <20190916134802.8252-3-ap420073@gmail.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20190916134802.8252-1-ap420073@gmail.com>
References: <20190916134802.8252-1-ap420073@gmail.com>
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

All VLAN devices share the same lockdep key, and the subclass is
initialized with nest_level.
But the actual nest_level value can change when a lower device is
attached. At that moment the subclass should be updated, but doing so
seems to be unsafe.
So this patch makes VLAN use a dynamic lockdep key instead of the
subclass.

Test commands:
    ip link add dummy0 type dummy
    ip link set dummy0 up
    ip link add bond0 type bond

    ip link add vlan_dummy1 link dummy0 type vlan id 1
    ip link add vlan_bond1 link bond0 type vlan id 2
    ip link set vlan_dummy1 master bond0

    ip link set bond0 up
    ip link set vlan_dummy1 up
    ip link set vlan_bond1 up

Both vlan_dummy1 and vlan_bond1 have the same subclass, which triggers
an unnecessary deadlock warning.
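
As an illustration only (not part of this patch), the user-space C
sketch below models why a shared key plus an equal subclass looks like
recursive locking, and why a per-device key does not. All toy_* names
are hypothetical, and the same-class check is a deliberate
simplification of what lockdep really does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct lock_class_key: only its address matters. */
struct toy_key { char unused; };

/* A lock class as this toy validator sees it: key address plus subclass. */
struct toy_class {
	const struct toy_key *key;
	unsigned int subclass;
};

#define TOY_MAX_HELD 8

struct toy_task {
	struct toy_class held[TOY_MAX_HELD];
	size_t nheld;
};

/*
 * Record an acquisition. Returns false when the task already holds a
 * lock of the same class (same key and same subclass), which is the
 * condition this sketch treats as "possible recursive locking".
 */
static bool toy_acquire(struct toy_task *t, const struct toy_key *key,
			unsigned int subclass)
{
	size_t i;

	assert(t->nheld < TOY_MAX_HELD);

	for (i = 0; i < t->nheld; i++) {
		if (t->held[i].key == key && t->held[i].subclass == subclass)
			return false;
	}

	t->held[t->nheld].key = key;
	t->held[t->nheld].subclass = subclass;
	t->nheld++;
	return true;
}

/* Old scheme: one static key for every device, subclass from nest_level. */
static bool toy_shared_key_ok(unsigned int nest_a, unsigned int nest_b)
{
	static const struct toy_key shared_key;
	struct toy_task t = { .nheld = 0 };

	return toy_acquire(&t, &shared_key, nest_a) &&
	       toy_acquire(&t, &shared_key, nest_b);
}

/* New scheme: each device carries its own key, so classes never collide. */
static bool toy_per_device_key_ok(void)
{
	struct toy_key key_a = { 0 }, key_b = { 0 };
	struct toy_task t = { .nheld = 0 };

	return toy_acquire(&t, &key_a, 0) && toy_acquire(&t, &key_b, 0);
}
```

In this model toy_shared_key_ok(1, 1) fails, which corresponds to the
two vlan_netdev_addr_lock_key/1 entries reported for the test commands
above, while toy_per_device_key_ok() succeeds regardless of nesting.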
Splat looks like:
[ 66.163164] WARNING: possible recursive locking detected
[ 66.163858] 5.3.0-rc8+ #179 Not tainted
[ 66.165520] --------------------------------------------
[ 66.166110] ip/983 is trying to acquire lock:
[ 66.166603] 000000009b85ba3e (&vlan_netdev_addr_lock_key/1){+...}, at: dev_uc_sync_multiple+0xfa/0x1a0
[ 66.194006]
[ 66.194006] but task is already holding lock:
[ 66.194636] 00000000cc752363 (&vlan_netdev_addr_lock_key/1){+...}, at: dev_set_rx_mode+0x19/0x30
[ 66.205191]
[ 66.205191] other info that might help us debug this:
[ 66.205903] Possible unsafe locking scenario:
[ 66.205903]
[ 66.206504] CPU0
[ 66.206781] ----
[ 66.208737] lock(&vlan_netdev_addr_lock_key/1);
[ 66.257676] lock(&vlan_netdev_addr_lock_key/1);
[ 66.282069]
[ 66.282069] *** DEADLOCK ***
[ 66.282069]
[ 66.283708] May be due to missing lock nesting notation
[ 66.283708]
[ 66.284588] 4 locks held by ip/983:
[ 66.285035] #0: 000000002989e16e (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x466/0x8a0
[ 66.286051] #1: 00000000cc752363 (&vlan_netdev_addr_lock_key/1){+...}, at: dev_set_rx_mode+0x19/0x30
[ 66.287217] #2: 00000000eddac627 (&dev_addr_list_lock_key/3){+...}, at: dev_mc_sync+0xfa/0x1a0
[ 66.288327] #3: 000000001a459ff7 (rcu_read_lock){....}, at: bond_set_rx_mode+0x5/0x3c0 [bonding]
[ 66.289453]
[ 66.289453] stack backtrace:
[ 66.290019] CPU: 1 PID: 983 Comm: ip Not tainted 5.3.0-rc8+ #179
[ 66.290802] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 66.297639] Call Trace:
[ 66.298009] dump_stack+0x7c/0xbb
[ 66.298447] __lock_acquire+0x26a9/0x3de0
[ 66.298965] ? register_lock_class+0x14d0/0x14d0
[ 66.299547] ? register_lock_class+0x14d0/0x14d0
[ 66.300151] lock_acquire+0x164/0x3b0
[ 66.300621] ? dev_uc_sync_multiple+0xfa/0x1a0
[ 66.301198] _raw_spin_lock_nested+0x2e/0x60
[ 66.301743] ? dev_uc_sync_multiple+0xfa/0x1a0
[ 66.302311] dev_uc_sync_multiple+0xfa/0x1a0
[ 66.307650] bond_set_rx_mode+0x269/0x3c0 [bonding]
[ 66.308175] ? 
bond_init+0x6f0/0x6f0 [bonding]
[ 66.308585] dev_mc_sync+0x15a/0x1a0
[ 66.308927] vlan_dev_set_rx_mode+0x37/0x80 [8021q]
[ 66.309375] dev_set_rx_mode+0x21/0x30
[ 66.309727] __dev_open+0x202/0x310
[ 66.310100] ? dev_set_rx_mode+0x30/0x30
[ 66.310513] ? mark_held_locks+0xa5/0xe0
[ 66.310934] ? __local_bh_enable_ip+0xe9/0x1b0
[ 66.311387] __dev_change_flags+0x3c3/0x500
[ 66.311839] ? dev_set_allmulti+0x10/0x10
[ 66.312248] ? kmem_cache_alloc_trace+0x12c/0x320
[ 66.312746] dev_change_flags+0x7a/0x160
[ 66.313161] vlan_device_event+0x846/0x20d0 [8021q]
[ ... ]

Fixes: 0fe1e567d0b4 ("[VLAN]: nested VLAN: fix lockdep's recursive locking warning")
Signed-off-by: Taehee Yoo
---

v2 -> v3 :
 - This patch is not changed

v1 -> v2 :
 - This patch is not changed

 include/linux/if_vlan.h |  3 +++
 net/8021q/vlan_dev.c    | 28 +++++++++++++++-------------
 2 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
index 244278d5c222..1aed9f613e90 100644
--- a/include/linux/if_vlan.h
+++ b/include/linux/if_vlan.h
@@ -183,6 +183,9 @@ struct vlan_dev_priv {
 	struct netpoll *netpoll;
 #endif
 	unsigned int nest_level;
+
+	struct lock_class_key xmit_lock_key;
+	struct lock_class_key addr_lock_key;
 };
 
 static inline struct vlan_dev_priv *vlan_dev_priv(const struct net_device *dev)
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 93eadf179123..12bc80650087 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -494,24 +494,24 @@ static void vlan_dev_set_rx_mode(struct net_device *vlan_dev)
  * "super class" of normal network devices; split their locks off into a
  * separate class since they always nest.
*/ -static struct lock_class_key vlan_netdev_xmit_lock_key; -static struct lock_class_key vlan_netdev_addr_lock_key; - static void vlan_dev_set_lockdep_one(struct net_device *dev, struct netdev_queue *txq, - void *_subclass) + void *_unused) { - lockdep_set_class_and_subclass(&txq->_xmit_lock, - &vlan_netdev_xmit_lock_key, - *(int *)_subclass); + struct vlan_dev_priv *vlan = vlan_dev_priv(dev); + + lockdep_set_class(&txq->_xmit_lock, &vlan->xmit_lock_key); } -static void vlan_dev_set_lockdep_class(struct net_device *dev, int subclass) +static void vlan_dev_set_lockdep_class(struct net_device *dev) { - lockdep_set_class_and_subclass(&dev->addr_list_lock, - &vlan_netdev_addr_lock_key, - subclass); - netdev_for_each_tx_queue(dev, vlan_dev_set_lockdep_one, &subclass); + struct vlan_dev_priv *vlan = vlan_dev_priv(dev); + + lockdep_register_key(&vlan->addr_lock_key); + lockdep_set_class(&dev->addr_list_lock, &vlan->addr_lock_key); + + lockdep_register_key(&vlan->xmit_lock_key); + netdev_for_each_tx_queue(dev, vlan_dev_set_lockdep_one, NULL); } static int vlan_dev_get_lock_subclass(struct net_device *dev) @@ -609,7 +609,7 @@ static int vlan_dev_init(struct net_device *dev) SET_NETDEV_DEVTYPE(dev, &vlan_type); - vlan_dev_set_lockdep_class(dev, vlan_dev_get_lock_subclass(dev)); + vlan_dev_set_lockdep_class(dev); vlan->vlan_pcpu_stats = netdev_alloc_pcpu_stats(struct vlan_pcpu_stats); if (!vlan->vlan_pcpu_stats) @@ -630,6 +630,8 @@ static void vlan_dev_uninit(struct net_device *dev) kfree(pm); } } + lockdep_unregister_key(&vlan->addr_lock_key); + lockdep_unregister_key(&vlan->xmit_lock_key); } static netdev_features_t vlan_dev_fix_features(struct net_device *dev, From patchwork Mon Sep 16 13:47:54 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taehee Yoo X-Patchwork-Id: 1162825 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: 
patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="rhn/tfYR"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 46X6ys3BHBz9sN1 for ; Mon, 16 Sep 2019 23:48:37 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2387784AbfIPNsg (ORCPT ); Mon, 16 Sep 2019 09:48:36 -0400 Received: from mail-pf1-f193.google.com ([209.85.210.193]:38996 "EHLO mail-pf1-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2387751AbfIPNsg (ORCPT ); Mon, 16 Sep 2019 09:48:36 -0400 Received: by mail-pf1-f193.google.com with SMTP id i1so14207922pfa.6 for ; Mon, 16 Sep 2019 06:48:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=OfqP2HOQCtH2vUZrmLUvrlsnzKQOAy+Z/HDWz2BBJp4=; b=rhn/tfYRhgboMVtddKwQBhSZIQnMrZLdBJBWpAuj2zPaLI0IonFuMtb+r1H3e9hiIH 4b3i6F3UYVRcMOE4lXnbUbLPXZJZ4l5DQ11u4zhV8IKF54A5y1iJG7RuTWEgUtghgSgQ E0IUZoz57hmjcZwBBGBCvDa5/+MuLn93w+8JndHTKIqAsVLwkDuVYEAvjb2cWPAiQvkQ 605+qKMZcX/9p+tA95hpAtX7Op84QosqvarZ6VAbHP9UCs9eh3M50j6xwO8G/eNdH9mm NRZ7fUnSi9zNJjldb7eSFbqQ1VwmhvZ6/AJ14RThO5iI8LBPFdP47K75la0+w4ZjsoUo ykBQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=OfqP2HOQCtH2vUZrmLUvrlsnzKQOAy+Z/HDWz2BBJp4=; b=MhIN1L4vk/1dB++Sd0k5l+pin9UOtpyz2BcyS770SrsCQTRc3uuIKkA/0sS4nNx1Un UFjCaJ2woQMM97nU1/k9JvbOxPU7xJ+sMsq2A95r+NcNryP7MmmGrRnpFiYynXcYQmXe 
k5ubGqBhMNPeqd1JzfmsXaNlubWboHHObGrJAuzn7KrK9Q4L9UKvK4UdX+mNA6GRIWJB 53jTJ+RiZ0G2FWMLg7WR3yYtPtiJstj2lCd3eW1ntG2MTS9nCc6YMj/6N52B4q8ErY4c 4Fo/nys/TPkcfw2khAS8U3tQTBQum2a84ZxvYzUfNf3Pxrt4/Km+VjStN5zmdTu0nCY5 smTg==
X-Gm-Message-State: APjAAAXy9ut+YXsv4GOrZs2lhYZ7ngICVHeSjBwueGLqxOD0Z5VMZk2C KZ0Htq/TGpfKexD/ntJzeD8=
X-Google-Smtp-Source: APXvYqzir1XT7bq9ba1I7XeDmu2G0xm+XtkHcDUT8OBBsUd8KfvWyMfbegc9MwFSeuheKLFcJuExdQ==
X-Received: by 2002:a17:90a:17ad:: with SMTP id q42mr130876pja.26.1568641714111; Mon, 16 Sep 2019 06:48:34 -0700 (PDT)
Received: from ap-To-be-filled-by-O-E-M.1.1.1.1 ([14.33.120.60]) by smtp.gmail.com with ESMTPSA id z20sm2822266pjn.12.2019.09.16.06.48.28 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 16 Sep 2019 06:48:33 -0700 (PDT)
From: Taehee Yoo
To: davem@davemloft.net, netdev@vger.kernel.org, j.vosburgh@gmail.com, vfalico@gmail.com, andy@greyhouse.net, jiri@resnulli.us, sd@queasysnail.net, roopa@cumulusnetworks.com, saeedm@mellanox.com, manishc@marvell.com, rahulv@marvell.com, kys@microsoft.com, haiyangz@microsoft.com, stephen@networkplumber.org, sashal@kernel.org, hare@suse.de, varun@chelsio.com, ubraun@linux.ibm.com, kgraul@linux.ibm.com, jay.vosburgh@canonical.com
Cc: ap420073@gmail.com
Subject: [PATCH net v3 03/11] bonding: fix unexpected IFF_BONDING bit unset
Date: Mon, 16 Sep 2019 22:47:54 +0900
Message-Id: <20190916134802.8252-4-ap420073@gmail.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20190916134802.8252-1-ap420073@gmail.com>
References: <20190916134802.8252-1-ap420073@gmail.com>
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

The IFF_BONDING flag marks a device as a bonding master or a bonding
slave.
->ndo_add_slave() sets the IFF_BONDING flag and ->ndo_del_slave() unsets
it.

    bond0<--bond1

Both bond0 and bond1 are bonding devices, and they should keep the
IFF_BONDING flag until they are removed.
But bond1 would lose IFF_BONDING at ->ndo_del_slave() because that
routine does not check whether the slave device is itself a bonding
device.
This patch adds an interface type check before removing the IFF_BONDING
flag.

Test commands:
    ip link add bond0 type bond
    ip link add bond1 type bond
    ip link set bond1 master bond0
    ip link set bond1 nomaster
    ip link del bond1 type bond
    ip link add bond1 type bond

Splat looks like:
[ 58.210981] proc_dir_entry 'bonding/bond1' already registered
[ 58.463875] WARNING: CPU: 0 PID: 955 at fs/proc/generic.c:361 proc_register+0x2a9/0x3e0
[ 58.466423] Modules linked in: bonding veth openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nfs
[ 58.483855] CPU: 0 PID: 955 Comm: ip Not tainted 5.3.0-rc8+ #179
[ 58.484657] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 58.485779] RIP: 0010:proc_register+0x2a9/0x3e0
[ 58.486377] Code: 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 39 01 00 00 48 8b 04 24 48 89 ea 48 c7 c7 60 0f 14 bd 480
[ 58.489003] RSP: 0018:ffff8880cc007078 EFLAGS: 00010282
[ 58.553743] RAX: dffffc0000000008 RBX: ffff8880ce23c0d0 RCX: ffffffffbbd021e2
[ 58.584076] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff8880da5f6b8c
[ 58.584901] RBP: ffff8880ce23c353 R08: ffffed101b4bff91 R09: ffffed101b4bff91
[ 58.585724] R10: 0000000000000001 R11: ffffed101b4bff90 R12: ffff8880ce23c268
[ 58.586508] R13: ffff8880ce23c352 R14: dffffc0000000000 R15: ffffed1019c4786a
[ 58.587296] FS: 00007f52d53b60c0(0000) GS:ffff8880da400000(0000) knlGS:0000000000000000
[ 58.588247] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 58.653694] CR2: 00007f31a9df9320 CR3: 00000000cd4ea006 CR4: 00000000000606f0
[ 58.654591] Call Trace:
[ 58.654895] proc_create_seq_private+0xb3/0xf0
[ 58.655400] bond_create_proc_entry+0x1b3/0x3f0 [bonding]
[ 58.655985] bond_netdev_event+0x433/0x970 [bonding]
[ 58.656545] ? 
__module_text_address+0x13/0x140 [ 58.657038] notifier_call_chain+0x90/0x160 [ 58.657541] register_netdevice+0x9b3/0xd80 [ 58.657999] ? alloc_netdev_mqs+0x854/0xc10 [ 58.658476] ? netdev_change_features+0xa0/0xa0 [ 58.663592] ? rtnl_create_link+0x2ed/0xad0 [ 58.664049] bond_newlink+0x2a/0x60 [bonding] [ 58.664529] __rtnl_newlink+0xb9f/0x11b0 [ 58.665014] ? rtnl_link_unregister+0x230/0x230 [ ... ] Fixes: 0b680e753724 ("[PATCH] bonding: Add priv_flag to avoid event mishandling") Signed-off-by: Taehee Yoo Signed-off-by: Jay Vosburgh --- v2 -> v3 : - This patch is not changed v1 -> v2 : - Do not add a new priv_flag. drivers/net/bonding/bond_main.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 931d9d935686..0db12fcfc953 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -1816,7 +1816,8 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev, slave_disable_netpoll(new_slave); err_close: - slave_dev->priv_flags &= ~IFF_BONDING; + if (!netif_is_bond_master(slave_dev)) + slave_dev->priv_flags &= ~IFF_BONDING; dev_close(slave_dev); err_restore_mac: @@ -2017,7 +2018,8 @@ static int __bond_release_one(struct net_device *bond_dev, else dev_set_mtu(slave_dev, slave->original_mtu); - slave_dev->priv_flags &= ~IFF_BONDING; + if (!netif_is_bond_master(slave_dev)) + slave_dev->priv_flags &= ~IFF_BONDING; bond_free_slave(slave); From patchwork Mon Sep 16 13:47:55 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taehee Yoo X-Patchwork-Id: 1162826 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; 
envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="ry/KIFGy"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 46X6z06bsVz9sN1 for ; Mon, 16 Sep 2019 23:48:44 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2387811AbfIPNso (ORCPT ); Mon, 16 Sep 2019 09:48:44 -0400 Received: from mail-pf1-f193.google.com ([209.85.210.193]:34627 "EHLO mail-pf1-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2387751AbfIPNsn (ORCPT ); Mon, 16 Sep 2019 09:48:43 -0400 Received: by mail-pf1-f193.google.com with SMTP id b128so4150465pfa.1 for ; Mon, 16 Sep 2019 06:48:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=ZPeLMo9MFvVW4yL0ZJQGO2vpPx1mzULdNeE2N0hHoNo=; b=ry/KIFGyTIjV87OjhUll6s4GmkkiI62rg6l24IptjAsB+5RL0LOgDxf8LfRGqttDsQ FBZJxK/x+Pa4raGf5cDiQv6JLY+b1U/8J7vcnh9ukMWv5xJTZPoBySw7EqNlxFolbxgg n/ir+mENlOBMfAhnBcz6nJpL+8qHJ4rNMwqAretfI56uNn4Wb3hm+KcmbXAZP2I1XGrT SSXCrUFozAZ/dvorngbGZ9clFtOWXubX3HvveFQjaGbAMyth9XramzC9osVq0tGs9Njl pBjaFfAlaISPCZcLDXJRyJO7E/tnZvieYayLBlfTgg32DbZau7bVZ4zsMpaR7MEAOgp5 dUpA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=ZPeLMo9MFvVW4yL0ZJQGO2vpPx1mzULdNeE2N0hHoNo=; b=svoWsNsctObLACdIpwHm3Gi4eF2DuG7Zsb692/hPDS3cLumU2uoY37VEv0yOKSdfva NUe8thP5aNgYPwA2Qk6Uf8qARHCX+r3Pbk2r+uPg6Xkpkd349cbWMyUmsuNErmvKzJly uCduSOQzzSnxeJ5u+mf7TMR+kTNOrhOH90vNMmFOFlGS4cZ1s8DJHP5V9QxrRCoLbPeu nIBAWzmI56RQ1ftVNtX/i0Eq9ebUewh/YQe6aJZGoonJAJLVaXDxnBRG2/5tfPoCZVdr 
nwW+MlVXcfr4AQd62pYtlwHoN1vOzJ2zyOWVvm3z3wa3LyvcqEXkrimDOL1W4hAxQuRH nweQ== X-Gm-Message-State: APjAAAVGH6nZOO+PULLHksk2oVYDLa5sivOjj+FSf14XDklFLtQUeGwZ w0Z5No6G+OeP7VMbLBfczmE= X-Google-Smtp-Source: APXvYqzHsT0un85d7+rbFlEzNuR5iCY6NQRKaIWrJRDwBwZgX2oHyfcTqsQLcF4n+gOIE/YsZ9hHGQ== X-Received: by 2002:a63:ab05:: with SMTP id p5mr9639474pgf.414.1568641722741; Mon, 16 Sep 2019 06:48:42 -0700 (PDT) Received: from ap-To-be-filled-by-O-E-M.1.1.1.1 ([14.33.120.60]) by smtp.gmail.com with ESMTPSA id z20sm2822266pjn.12.2019.09.16.06.48.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 16 Sep 2019 06:48:41 -0700 (PDT) From: Taehee Yoo To: davem@davemloft.net, netdev@vger.kernel.org, j.vosburgh@gmail.com, vfalico@gmail.com, andy@greyhouse.net, jiri@resnulli.us, sd@queasysnail.net, roopa@cumulusnetworks.com, saeedm@mellanox.com, manishc@marvell.com, rahulv@marvell.com, kys@microsoft.com, haiyangz@microsoft.com, stephen@networkplumber.org, sashal@kernel.org, hare@suse.de, varun@chelsio.com, ubraun@linux.ibm.com, kgraul@linux.ibm.com, jay.vosburgh@canonical.com Cc: ap420073@gmail.com Subject: [PATCH net v3 04/11] bonding: use dynamic lockdep key instead of subclass Date: Mon, 16 Sep 2019 22:47:55 +0900 Message-Id: <20190916134802.8252-5-ap420073@gmail.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190916134802.8252-1-ap420073@gmail.com> References: <20190916134802.8252-1-ap420073@gmail.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org All bonding devices share the same lockdep key, and the subclass is initialized from nest_level. But the actual nest_level value can change when a lower device is attached. At that point the subclass should be updated, but doing so seems to be unsafe. So this patch makes bonding use dynamic lockdep keys instead of the subclass.
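The difference between "one shared key plus a subclass" and "one key per device" can be sketched in user space. This is only a toy model under stated assumptions, not kernel code and not how lockdep is implemented: every `toy_*` name is hypothetical, and the checker reduces lockdep to a single rule (warn when a lock being taken has the same key and the same subclass as a lock already held). The subclass value 2 mirrors the `rlock#2/2` entries in the splat for this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct lock_class_key: only its address matters. */
struct toy_key { char unused; };

/* A lock as this toy checker sees it: a class key plus a subclass number. */
struct toy_lock {
	const struct toy_key *key;
	unsigned int subclass;
};

#define TOY_MAX_HELD 8

/* Locks currently held by one task, in acquisition order. */
struct toy_held {
	const struct toy_lock *locks[TOY_MAX_HELD];
	size_t count;
};

/*
 * Record `lock` as held. Returns 1 if a lock with the same key and the
 * same subclass is already held (the toy equivalent of "possible
 * recursive locking detected"), 0 otherwise.
 */
static int toy_acquire(struct toy_held *held, const struct toy_lock *lock)
{
	int warn = 0;

	for (size_t i = 0; i < held->count; i++) {
		if (held->locks[i]->key == lock->key &&
		    held->locks[i]->subclass == lock->subclass)
			warn = 1;
	}

	assert(held->count < TOY_MAX_HELD);
	held->locks[held->count++] = lock;
	return warn;
}

/* Old scheme: one static key for all devices, subclass taken from a
 * nest level that went stale, so upper and lower end up identical. */
static int toy_shared_key_case(void)
{
	static struct toy_key shared;
	struct toy_lock upper = { &shared, 2 };
	struct toy_lock lower = { &shared, 2 }; /* stale subclass */
	struct toy_held held = { { NULL }, 0 };
	int warned = toy_acquire(&held, &upper);

	warned |= toy_acquire(&held, &lower);
	return warned;
}

/* New scheme: each device carries its own key, so no subclass is needed. */
static int toy_per_device_key_case(void)
{
	struct toy_key upper_key, lower_key;
	struct toy_lock upper = { &upper_key, 0 };
	struct toy_lock lower = { &lower_key, 0 };
	struct toy_held held = { { NULL }, 0 };
	int warned = toy_acquire(&held, &upper);

	warned |= toy_acquire(&held, &lower);
	return warned;
}
```

In this model `toy_shared_key_case()` reports a warning while `toy_per_device_key_case()` does not, which is the property the patch relies on: with a distinct key per device there is no subclass that has to track the nesting depth.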
Test commands: ip link add bond0 type bond for i in {1..5} do let A=$i-1 ip link add bond$i type bond ip link set bond$i master bond$A done ip link set bond5 master bond0 Splat looks like: [ 53.930344] WARNING: possible recursive locking detected [ 53.931041] 5.3.0-rc8+ #179 Not tainted [ 53.931554] -------------------------------------------- [ 53.932285] ip/984 is trying to acquire lock: [ 53.932854] 00000000e313b280 (&(&bond->stats_lock)->rlock#2/2){+.+.}, at: bond_get_stats+0xb8/0x500 [bonding] [ 53.934144] [ 53.934144] but task is already holding lock: [ 53.934907] 00000000f5a0c2e3 (&(&bond->stats_lock)->rlock#2/2){+.+.}, at: bond_get_stats+0xb8/0x500 [bonding] [ 53.936268] [ 53.936268] other info that might help us debug this: [ 53.937135] Possible unsafe locking scenario: [ 53.937135] [ 53.937910] CPU0 [ 53.938247] ---- [ 53.938577] lock(&(&bond->stats_lock)->rlock#2/2); [ 53.939234] lock(&(&bond->stats_lock)->rlock#2/2); [ 53.939903] [ 53.939903] *** DEADLOCK *** [ 53.939903] [ 53.940745] May be due to missing lock nesting notation [ 53.940745] [ 53.941626] 3 locks held by ip/984: [ 53.942005] #0: 000000009e3df2a0 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x466/0x8a0 [ 53.942942] #1: 00000000f5a0c2e3 (&(&bond->stats_lock)->rlock#2/2){+.+.}, at: bond_get_stats+0xb8/0x500 [bond] [ 53.944194] #2: 000000005b301abc (rcu_read_lock){....}, at: bond_get_stats+0x9f/0x500 [bonding] [ 53.945168] [ 53.945168] stack backtrace: [ 53.945672] CPU: 0 PID: 984 Comm: ip Not tainted 5.3.0-rc8+ #179 [ 53.946606] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 53.947529] Call Trace: [ 53.947795] dump_stack+0x7c/0xbb [ 53.948213] __lock_acquire+0x26a9/0x3de0 [ 53.948666] ? register_lock_class+0x14d0/0x14d0 [ 53.949206] lock_acquire+0x164/0x3b0 [ 53.949647] ? bond_get_stats+0xb8/0x500 [bonding] [ 53.950169] _raw_spin_lock_nested+0x2e/0x60 [ 53.950630] ? 
bond_get_stats+0xb8/0x500 [bonding] [ 53.951132] bond_get_stats+0xb8/0x500 [bonding] [ 53.951655] ? bond_arp_rcv+0xf10/0xf10 [bonding] [ 53.952220] ? register_lock_class+0x14d0/0x14d0 [ 53.952751] ? bond_get_stats+0xb8/0x500 [bonding] [ 53.953277] dev_get_stats+0x1ec/0x270 [ 53.984289] bond_get_stats+0x1d1/0x500 [bonding] [ 53.984803] ? bond_arp_rcv+0xf10/0xf10 [bonding] [ 53.985323] ? dev_get_alias+0xe2/0x190 [ 53.985748] ? nla_put_ifalias+0x71/0x100 [ 53.986213] ? rtnl_phys_switch_id_fill+0x91/0x100 [ 53.986720] dev_get_stats+0x1ec/0x270 [ 53.987100] rtnl_fill_stats+0x44/0xbe0 [ 53.987494] ? nla_put+0xc2/0x140 [ 53.987828] rtnl_fill_ifinfo+0xec7/0x35b0 [ ... ] Fixes: d3fff6c443fe ("net: add netdev_lockdep_set_classes() helper") Signed-off-by: Taehee Yoo --- v2 -> v3 : - This patch is not changed v1 -> v2 : - This patch is not changed drivers/net/bonding/bond_main.c | 61 ++++++++++++++++++++++++++++++--- include/net/bonding.h | 3 ++ 2 files changed, 59 insertions(+), 5 deletions(-) diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 0db12fcfc953..7f574e74ed78 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -1857,6 +1857,32 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev, return res; } +static void bond_dev_set_lockdep_one(struct net_device *dev, + struct netdev_queue *txq, + void *_unused) +{ + struct bonding *bond = netdev_priv(dev); + + lockdep_set_class(&txq->_xmit_lock, &bond->xmit_lock_key); +} + +static void bond_update_lock_key(struct net_device *dev) +{ + struct bonding *bond = netdev_priv(dev); + + lockdep_unregister_key(&bond->stats_lock_key); + lockdep_unregister_key(&bond->addr_lock_key); + lockdep_unregister_key(&bond->xmit_lock_key); + + lockdep_register_key(&bond->stats_lock_key); + lockdep_register_key(&bond->addr_lock_key); + lockdep_register_key(&bond->xmit_lock_key); + + lockdep_set_class(&bond->stats_lock, &bond->stats_lock_key); + 
lockdep_set_class(&dev->addr_list_lock, &bond->addr_lock_key); + netdev_for_each_tx_queue(dev, bond_dev_set_lockdep_one, NULL); +} + /* Try to release the slave device from the bond device * It is legal to access curr_active_slave without a lock because all the function * is RTNL-locked. If "all" is true it means that the function is being called @@ -2022,6 +2048,8 @@ static int __bond_release_one(struct net_device *bond_dev, slave_dev->priv_flags &= ~IFF_BONDING; bond_free_slave(slave); + if (netif_is_bond_master(slave_dev)) + bond_update_lock_key(slave_dev); return 0; } @@ -3459,7 +3487,7 @@ static void bond_get_stats(struct net_device *bond_dev, struct list_head *iter; struct slave *slave; - spin_lock_nested(&bond->stats_lock, bond_get_nest_level(bond_dev)); + spin_lock(&bond->stats_lock); memcpy(stats, &bond->bond_stats, sizeof(*stats)); rcu_read_lock(); @@ -4297,8 +4325,6 @@ void bond_setup(struct net_device *bond_dev) { struct bonding *bond = netdev_priv(bond_dev); - spin_lock_init(&bond->mode_lock); - spin_lock_init(&bond->stats_lock); bond->params = bonding_defaults; /* Initialize pointers */ @@ -4367,6 +4393,9 @@ static void bond_uninit(struct net_device *bond_dev) list_del(&bond->bond_list); + lockdep_unregister_key(&bond->stats_lock_key); + lockdep_unregister_key(&bond->addr_lock_key); + lockdep_unregister_key(&bond->xmit_lock_key); bond_debug_unregister(bond); } @@ -4758,6 +4787,29 @@ static int bond_check_params(struct bond_params *params) return 0; } +static struct lock_class_key qdisc_tx_busylock_key; +static struct lock_class_key qdisc_running_key; + +static void bond_dev_set_lockdep_class(struct net_device *dev) +{ + struct bonding *bond = netdev_priv(dev); + + dev->qdisc_tx_busylock = &qdisc_tx_busylock_key; + dev->qdisc_running_key = &qdisc_running_key; + + spin_lock_init(&bond->mode_lock); + + spin_lock_init(&bond->stats_lock); + lockdep_register_key(&bond->stats_lock_key); + lockdep_set_class(&bond->stats_lock, &bond->stats_lock_key); + + 
lockdep_register_key(&bond->addr_lock_key); + lockdep_set_class(&dev->addr_list_lock, &bond->addr_lock_key); + + lockdep_register_key(&bond->xmit_lock_key); + netdev_for_each_tx_queue(dev, bond_dev_set_lockdep_one, NULL); +} + /* Called from registration process */ static int bond_init(struct net_device *bond_dev) { @@ -4771,8 +4823,7 @@ static int bond_init(struct net_device *bond_dev) return -ENOMEM; bond->nest_level = SINGLE_DEPTH_NESTING; - netdev_lockdep_set_classes(bond_dev); - + bond_dev_set_lockdep_class(bond_dev); list_add_tail(&bond->bond_list, &bn->dev_list); bond_prepare_sysfs_group(bond); diff --git a/include/net/bonding.h b/include/net/bonding.h index f7fe45689142..c39ac7061e41 100644 --- a/include/net/bonding.h +++ b/include/net/bonding.h @@ -239,6 +239,9 @@ struct bonding { struct dentry *debug_dir; #endif /* CONFIG_DEBUG_FS */ struct rtnl_link_stats64 bond_stats; + struct lock_class_key stats_lock_key; + struct lock_class_key xmit_lock_key; + struct lock_class_key addr_lock_key; }; #define bond_slave_get_rcu(dev) \ From patchwork Mon Sep 16 13:47:56 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taehee Yoo X-Patchwork-Id: 1162827 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="g4seRh1i"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 46X6zB0x2Fz9sN1 for ; Mon, 16 Sep 2019 23:48:54 +1000 (AEST) Received: 
(majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2387851AbfIPNsx (ORCPT ); Mon, 16 Sep 2019 09:48:53 -0400 Received: from mail-pl1-f193.google.com ([209.85.214.193]:35754 "EHLO mail-pl1-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2387751AbfIPNsx (ORCPT ); Mon, 16 Sep 2019 09:48:53 -0400 Received: by mail-pl1-f193.google.com with SMTP id s17so12037877plp.2 for ; Mon, 16 Sep 2019 06:48:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=kiGUwG3+6UAbXYJEPg2wTZRIJUJDxXRvDswcnUc5v6U=; b=g4seRh1iShjj5fCZi2jL2CsAx8Edv3dDJ61u+PqqAtCfDt/JrSeu+2QTQ8ljbMgSLR tAbeGUQ10CxEupUaJ4nWGcMNlk/39gGEPisC4zkXvjGDzqlUsIBQJs9dYyO/GWaOMWVd CU0enJJLJ8/bsl13NpbuMDDVcNWE7LVBZEhIZrBfEhugK9uLdRVuazqNpv2U2Gl2HCNA qVlmcCoXNYn2HBUzwNyA+HcsAUHAln6fv2eteiZsZ8237pBL0G7Kvs02etvIyeUSbNx6 jHukQE3NXIvDf/hhVMYN5czAVmfQpp4RdQ7Mgz8u57TzVMNxCtc2YcjCp3cVIY1YslQh nRqA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=kiGUwG3+6UAbXYJEPg2wTZRIJUJDxXRvDswcnUc5v6U=; b=sCDqvWlCFUeItQwEdSlZPgZuiOJNmBy8jXcMb4y4ATF4as0rK+gFNFleYrUpJD1AXV PM6UtRC64rFPMHiykl6FhvIFiZGzaZUrX1biYHZmUkY57qbA894qpLTXwjPojzCYGDcE Q5Mziw+XzKpyAglTcJIvaoNklTEn9+ZeIvJpGpxSK2wZL83BKOw4SIysk+yjs2IyJn9L sIEPrNag+EBFnxWyDiyRdBRK2ffWew8+UOS16hUS0leLB+8CGLgBr1KzuOljHbZNA8qo ZFD7fEsZoERkZW9SmuinavUp6STNKs1UjyFwheBUDRB61tI16iFtjXdpUeDWqXs7FlFi zCOw== X-Gm-Message-State: APjAAAWgsTO0V1McLQ1Nh17jWp8UTCeiVDRjsHDvp/Yq/5X4zieOOPXd x8o+W+mD/ZANwJT9bErgZ+Q= X-Google-Smtp-Source: APXvYqxGw0GDbTMzCSa6XKtAqktAliHx/SqcfDztpl5EfzZ8uVJT1a+jd7fvXS1eTcA56+Q2xvCCuw== X-Received: by 2002:a17:902:7d8b:: with SMTP id a11mr11562101plm.149.1568641730058; Mon, 16 Sep 2019 06:48:50 -0700 (PDT) Received: from ap-To-be-filled-by-O-E-M.1.1.1.1 ([14.33.120.60]) by smtp.gmail.com with ESMTPSA id 
z20sm2822266pjn.12.2019.09.16.06.48.44 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 16 Sep 2019 06:48:49 -0700 (PDT) From: Taehee Yoo To: davem@davemloft.net, netdev@vger.kernel.org, j.vosburgh@gmail.com, vfalico@gmail.com, andy@greyhouse.net, jiri@resnulli.us, sd@queasysnail.net, roopa@cumulusnetworks.com, saeedm@mellanox.com, manishc@marvell.com, rahulv@marvell.com, kys@microsoft.com, haiyangz@microsoft.com, stephen@networkplumber.org, sashal@kernel.org, hare@suse.de, varun@chelsio.com, ubraun@linux.ibm.com, kgraul@linux.ibm.com, jay.vosburgh@canonical.com Cc: ap420073@gmail.com Subject: [PATCH net v3 05/11] team: use dynamic lockdep key instead of static key Date: Mon, 16 Sep 2019 22:47:56 +0900 Message-Id: <20190916134802.8252-6-ap420073@gmail.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190916134802.8252-1-ap420073@gmail.com> References: <20190916134802.8252-1-ap420073@gmail.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org In the current code, all team devices share the same static lockdep key. Because team devices can be nested, this triggers an unnecessary lockdep warning.
Test commands: ip link add team0 type team for i in {1..7} do let A=$i-1 ip link add team$i type team ip link set team$i master team$A done ip link del team0 Splat looks like: [ 52.518420] WARNING: possible recursive locking detected [ 52.518955] 5.3.0-rc8+ #179 Not tainted [ 52.519373] -------------------------------------------- [ 52.519925] ip/1005 is trying to acquire lock: [ 52.520893] 00000000c84e69ac (&dev_addr_list_lock_key/1){+...}, at: dev_uc_sync_multiple+0xfa/0x1a0 [ 52.522194] [ 52.522194] but task is already holding lock: [ 52.523038] 0000000025864f52 (&dev_addr_list_lock_key/1){+...}, at: dev_uc_unsync+0x10c/0x1b0 [ 52.524609] [ 52.524609] other info that might help us debug this: [ 52.525435] Possible unsafe locking scenario: [ 52.525435] [ 52.526154] CPU0 [ 52.526460] ---- [ 52.526766] lock(&dev_addr_list_lock_key/1); [ 52.527311] lock(&dev_addr_list_lock_key/1); [ 52.527854] [ 52.527854] *** DEADLOCK *** [ 52.527854] [ 52.528573] May be due to missing lock nesting notation [ 52.528573] [ 52.529527] 5 locks held by ip/1005: [ 52.529968] #0: 0000000080a68bb2 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x466/0x8a0 [ 52.530940] #1: 0000000035d90450 (&team->lock){+.+.}, at: team_uninit+0x3a/0x1a0 [team] [ 52.531933] #2: 00000000c75d8f70 (&dev_addr_list_lock_key){+...}, at: dev_uc_unsync+0x98/0x1b0 [ 52.532991] #3: 0000000025864f52 (&dev_addr_list_lock_key/1){+...}, at: dev_uc_unsync+0x10c/0x1b0 [ 52.534142] #4: 00000000efa4d642 (rcu_read_lock){....}, at: team_set_rx_mode+0x5/0x1d0 [team] [ 52.535195] [ 52.535195] stack backtrace: [ 52.535727] CPU: 0 PID: 1005 Comm: ip Not tainted 5.3.0-rc8+ #179 [ 52.536376] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 52.537254] Call Trace: [ 52.537549] dump_stack+0x7c/0xbb [ 52.537917] __lock_acquire+0x26a9/0x3de0 [ 52.538340] ? register_lock_class+0x14d0/0x14d0 [ 52.538872] ? register_lock_class+0x14d0/0x14d0 [ 52.539559] lock_acquire+0x164/0x3b0 [ 52.540043] ? 
dev_uc_sync_multiple+0xfa/0x1a0 [ 52.540663] _raw_spin_lock_nested+0x2e/0x60 [ 52.541241] ? dev_uc_sync_multiple+0xfa/0x1a0 [ 52.541867] dev_uc_sync_multiple+0xfa/0x1a0 [ 52.542435] team_set_rx_mode+0xa9/0x1d0 [team] [ 52.543033] dev_uc_unsync+0x151/0x1b0 [ 52.543534] team_port_del+0x304/0x790 [team] [ 52.544110] team_uninit+0xb0/0x1a0 [team] [ 52.544653] rollback_registered_many+0x728/0xda0 [ 52.545271] ? generic_xdp_install+0x310/0x310 [ 52.545865] ? check_chain_key+0x236/0x5d0 [ 52.546425] ? __nla_validate_parse+0x98/0x1ad0 [ 52.547023] unregister_netdevice_many.part.124+0x13/0x1b0 [ 52.547741] rtnl_delete_link+0xbc/0x100 [ 52.548262] ? rtnl_af_register+0xc0/0xc0 [ 52.548793] rtnl_dellink+0x30e/0x8a0 [ ... ] Fixes: 3d249d4ca7d0 ("net: introduce ethernet teaming device") Signed-off-by: Taehee Yoo --- v2 -> v3 : - This patch is not changed v1 -> v2 : - This patch is not changed drivers/net/team/team.c | 61 ++++++++++++++++++++++++++++++++++++++--- include/linux/if_team.h | 5 ++++ 2 files changed, 62 insertions(+), 4 deletions(-) diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c index e8089def5a46..bfcd6ed57493 100644 --- a/drivers/net/team/team.c +++ b/drivers/net/team/team.c @@ -1607,6 +1607,34 @@ static const struct team_option team_options[] = { }, }; +static void team_dev_set_lockdep_one(struct net_device *dev, + struct netdev_queue *txq, + void *_unused) +{ + struct team *team = netdev_priv(dev); + + lockdep_set_class(&txq->_xmit_lock, &team->xmit_lock_key); +} + +static struct lock_class_key qdisc_tx_busylock_key; +static struct lock_class_key qdisc_running_key; + +static void team_dev_set_lockdep_class(struct net_device *dev) +{ + struct team *team = netdev_priv(dev); + + dev->qdisc_tx_busylock = &qdisc_tx_busylock_key; + dev->qdisc_running_key = &qdisc_running_key; + + lockdep_register_key(&team->team_lock_key); + __mutex_init(&team->lock, "team->team_lock_key", &team->team_lock_key); + + lockdep_register_key(&team->addr_lock_key); + 
lockdep_set_class(&dev->addr_list_lock, &team->addr_lock_key); + + lockdep_register_key(&team->xmit_lock_key); + netdev_for_each_tx_queue(dev, team_dev_set_lockdep_one, NULL); +} static int team_init(struct net_device *dev) { @@ -1615,7 +1643,6 @@ static int team_init(struct net_device *dev) int err; team->dev = dev; - mutex_init(&team->lock); team_set_no_mode(team); team->pcpu_stats = netdev_alloc_pcpu_stats(struct team_pcpu_stats); @@ -1642,7 +1669,7 @@ static int team_init(struct net_device *dev) goto err_options_register; netif_carrier_off(dev); - netdev_lockdep_set_classes(dev); + team_dev_set_lockdep_class(dev); return 0; @@ -1673,6 +1700,11 @@ static void team_uninit(struct net_device *dev) team_queue_override_fini(team); mutex_unlock(&team->lock); netdev_change_features(dev); + + lockdep_unregister_key(&team->team_lock_key); + lockdep_unregister_key(&team->addr_lock_key); + lockdep_unregister_key(&team->xmit_lock_key); + } static void team_destructor(struct net_device *dev) @@ -1967,6 +1999,23 @@ static int team_add_slave(struct net_device *dev, struct net_device *port_dev, return err; } +static void team_update_lock_key(struct net_device *dev) +{ + struct team *team = netdev_priv(dev); + + lockdep_unregister_key(&team->team_lock_key); + lockdep_unregister_key(&team->addr_lock_key); + lockdep_unregister_key(&team->xmit_lock_key); + + lockdep_register_key(&team->team_lock_key); + lockdep_register_key(&team->addr_lock_key); + lockdep_register_key(&team->xmit_lock_key); + + lockdep_set_class(&team->lock, &team->team_lock_key); + lockdep_set_class(&dev->addr_list_lock, &team->addr_lock_key); + netdev_for_each_tx_queue(dev, team_dev_set_lockdep_one, NULL); +} + static int team_del_slave(struct net_device *dev, struct net_device *port_dev) { struct team *team = netdev_priv(dev); @@ -1976,8 +2025,12 @@ static int team_del_slave(struct net_device *dev, struct net_device *port_dev) err = team_port_del(team, port_dev); mutex_unlock(&team->lock); - if (!err) - 
netdev_change_features(dev); + if (err) + return err; + + if (netif_is_team_master(port_dev)) + team_update_lock_key(port_dev); + netdev_change_features(dev); return err; } diff --git a/include/linux/if_team.h b/include/linux/if_team.h index 06faa066496f..9c97bb19ed34 100644 --- a/include/linux/if_team.h +++ b/include/linux/if_team.h @@ -223,6 +223,11 @@ struct team { atomic_t count_pending; struct delayed_work dw; } mcast_rejoin; + + struct lock_class_key team_lock_key; + struct lock_class_key xmit_lock_key; + struct lock_class_key addr_lock_key; + long mode_priv[TEAM_MODE_PRIV_LONGS]; }; From patchwork Mon Sep 16 13:47:57 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taehee Yoo X-Patchwork-Id: 1162828 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="F0wIK8up"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 46X6zJ3DyNz9sN1 for ; Mon, 16 Sep 2019 23:49:00 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2387875AbfIPNs7 (ORCPT ); Mon, 16 Sep 2019 09:48:59 -0400 Received: from mail-pf1-f195.google.com ([209.85.210.195]:35094 "EHLO mail-pf1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2387751AbfIPNs7 (ORCPT ); Mon, 16 Sep 2019 09:48:59 -0400 Received: by mail-pf1-f195.google.com with SMTP id 205so23058372pfw.2 for ; Mon, 16 Sep 2019 06:48:57 -0700 (PDT) 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=g8eaDZnnKT+l+rr+X6QJHo4HKbmXnQXglT66ibBbMp4=; b=F0wIK8upbYiB8Y6BJnd+X2TrGZtDw/+FO2bGa6A3/MJZM3ZeCFtqAp7BLrMaYOMMhJ b76GD5DBNOq8uinHPM12MtXNAH7je3nFgxfzoatGg1oZEMVfV9OE2duJnQeqzx2OyxUb pB2yPzNq0N5Tbv+qoTCmNfR4xX18PlIg88g0AWZZfAcOsbhx+hzcizUjIeUzkL+mmuk4 qLt5ONSko6P5seA25MREyrZRGI+DAum2QOB1L6Zm7WM4l6pTqxErVSB2tX65qBGMs/AJ WEct8uzDsI9WTkMI45cQnCapAnEiNprAKZf+YqU6ZPbnhdwodVTmEcW/QS94S2hNVVqy JEcw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=g8eaDZnnKT+l+rr+X6QJHo4HKbmXnQXglT66ibBbMp4=; b=hx2XskO10t1zQ9jDc6V2zpLFVFP1DSRBMPJJM++j8su5rrkCF18400Nz7o0nRk/Tmg 0W2Dc3eOBVdnus0T2xTUJw0suS93f5TG51/wyIkmcrDT+hv/P2SWJy1+p7M473kPnl+B UeEeOr4TrhFwBTou1PsKRMLLC/QOmbomZTUqCZqbxhz5pCAtGJ8whonrIpNRYVoadDCI nc16CF3Nl0vYTVn3cE7tIzaatfqZbagIKOncOTaelOK9QKD6u7Iqhii61wgSiusNpAus aO6TRyoGf1g86RLqihDeBi9aUyougpUO/h1/eCgafjGmh7Yu7MPrXyMHolUoy2qSWyKP E0Cw== X-Gm-Message-State: APjAAAWOoqeuw7k03Q57m3cnGhN1fPuckUhDI3qktYRZXduGvrn2lyV1 9YhLFKGwOju1PFv1XMfDLYM= X-Google-Smtp-Source: APXvYqxhPsokTXyP1AUxL30hai+oOYMOl4RQGPSUc9TRwF1/FBHDv8qV1VoqaX5GdkExKLi6ek8dzg== X-Received: by 2002:aa7:8d4b:: with SMTP id s11mr44701659pfe.132.1568641736874; Mon, 16 Sep 2019 06:48:56 -0700 (PDT) Received: from ap-To-be-filled-by-O-E-M.1.1.1.1 ([14.33.120.60]) by smtp.gmail.com with ESMTPSA id z20sm2822266pjn.12.2019.09.16.06.48.51 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 16 Sep 2019 06:48:55 -0700 (PDT) From: Taehee Yoo To: davem@davemloft.net, netdev@vger.kernel.org, j.vosburgh@gmail.com, vfalico@gmail.com, andy@greyhouse.net, jiri@resnulli.us, sd@queasysnail.net, roopa@cumulusnetworks.com, saeedm@mellanox.com, manishc@marvell.com, rahulv@marvell.com, kys@microsoft.com, haiyangz@microsoft.com, stephen@networkplumber.org, 
sashal@kernel.org, hare@suse.de, varun@chelsio.com, ubraun@linux.ibm.com, kgraul@linux.ibm.com, jay.vosburgh@canonical.com Cc: ap420073@gmail.com Subject: [PATCH net v3 06/11] macsec: use dynamic lockdep key instead of subclass Date: Mon, 16 Sep 2019 22:47:57 +0900 Message-Id: <20190916134802.8252-7-ap420073@gmail.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190916134802.8252-1-ap420073@gmail.com> References: <20190916134802.8252-1-ap420073@gmail.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org All macsec devices share the same lockdep key, and the subclass is initialized from nest_level. But the actual nest_level value can change when a lower device is attached. At that point the subclass should be updated, but doing so seems to be unsafe. So this patch makes macsec use dynamic lockdep keys instead of the subclass. Test commands: ip link add bond0 type bond ip link add dummy0 type dummy ip link add macsec0 link bond0 type macsec ip link add macsec1 link dummy0 type macsec ip link set bond0 mtu 1000 ip link set macsec1 master bond0 ip link set bond0 up ip link set macsec0 up ip link set dummy0 up ip link set macsec1 up Splat looks like: [ 57.903193] WARNING: possible recursive locking detected [ 57.913011] 5.3.0-rc8+ #179 Not tainted [ 57.915008] -------------------------------------------- [ 57.916707] ip/964 is trying to acquire lock: [ 57.917102] 000000007f1963bf (&macsec_netdev_addr_lock_key/1){+...}, at: dev_uc_sync_multiple+0xfa/0x1a0 [ 57.918476] [ 57.918476] but task is already holding lock: [ 57.918996] 000000006f6fbc66 (&macsec_netdev_addr_lock_key/1){+...}, at: dev_set_rx_mode+0x19/0x30 [ 57.919790] [ 57.919790] other info that might help us debug this: [ 57.920367] Possible unsafe locking scenario: [ 57.920367] [ 57.920894] CPU0 [ 57.921119] ---- [ 57.921366] lock(&macsec_netdev_addr_lock_key/1); [ 57.921813] lock(&macsec_netdev_addr_lock_key/1); [ 57.922250] [ 57.922250] *** DEADLOCK *** [ 57.922250] 
[ 57.924742] May be due to missing lock nesting notation [ 57.924742] [ 57.925620] 4 locks held by ip/964: [ 57.926067] #0: 00000000bf0ca196 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x466/0x8a0 [ 57.927070] #1: 000000006f6fbc66 (&macsec_netdev_addr_lock_key/1){+...}, at: dev_set_rx_mode+0x19/0x30 [ 57.928241] #2: 00000000e5b874a1 (&dev_addr_list_lock_key/3){+...}, at: dev_mc_sync+0xfa/0x1a0 [ 57.929851] #3: 000000002c43dd91 (rcu_read_lock){....}, at: bond_set_rx_mode+0x5/0x3c0 [bonding] [ 57.930983] [ 57.930983] stack backtrace: [ 57.931582] CPU: 1 PID: 964 Comm: ip Not tainted 5.3.0-rc8+ #179 [ 57.932439] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 57.935333] Call Trace: [ 57.935659] dump_stack+0x7c/0xbb [ 57.936088] __lock_acquire+0x26a9/0x3de0 [ 57.936590] ? register_lock_class+0x14d0/0x14d0 [ 57.937694] ? register_lock_class+0x14d0/0x14d0 [ 57.938269] lock_acquire+0x164/0x3b0 [ 57.938734] ? dev_uc_sync_multiple+0xfa/0x1a0 [ 57.939291] _raw_spin_lock_nested+0x2e/0x60 [ 57.940112] ? dev_uc_sync_multiple+0xfa/0x1a0 [ 57.940672] dev_uc_sync_multiple+0xfa/0x1a0 [ 57.941730] bond_set_rx_mode+0x269/0x3c0 [bonding] [ 57.942340] ? bond_init+0x6f0/0x6f0 [bonding] [ 57.942895] ? do_raw_spin_trylock+0xa9/0x170 [ 57.943435] dev_mc_sync+0x15a/0x1a0 [ 57.943891] macsec_dev_set_rx_mode+0x3a/0x50 [macsec] [ 57.944560] dev_set_rx_mode+0x21/0x30 [ 57.945051] __dev_open+0x202/0x310 [ ... 
] Fixes: e20038724552 ("macsec: fix lockdep splats when nesting devices") Signed-off-by: Taehee Yoo --- v2 -> v3 : - This patch is not changed v1 -> v2 : - This patch is not changed drivers/net/macsec.c | 37 ++++++++++++++++++++++++++++++++----- 1 file changed, 32 insertions(+), 5 deletions(-) diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c index 8f46aa1ddec0..25a4fc88145d 100644 --- a/drivers/net/macsec.c +++ b/drivers/net/macsec.c @@ -267,6 +267,8 @@ struct macsec_dev { struct pcpu_secy_stats __percpu *stats; struct list_head secys; struct gro_cells gro_cells; + struct lock_class_key xmit_lock_key; + struct lock_class_key addr_lock_key; unsigned int nest_level; }; @@ -2749,7 +2751,32 @@ static netdev_tx_t macsec_start_xmit(struct sk_buff *skb, #define MACSEC_FEATURES \ (NETIF_F_SG | NETIF_F_HIGHDMA | NETIF_F_FRAGLIST) -static struct lock_class_key macsec_netdev_addr_lock_key; + +static void macsec_dev_set_lockdep_one(struct net_device *dev, + struct netdev_queue *txq, + void *_unused) +{ + struct macsec_dev *macsec = macsec_priv(dev); + + lockdep_set_class(&txq->_xmit_lock, &macsec->xmit_lock_key); +} + +static struct lock_class_key qdisc_tx_busylock_key; +static struct lock_class_key qdisc_running_key; + +static void macsec_dev_set_lockdep_class(struct net_device *dev) +{ + struct macsec_dev *macsec = macsec_priv(dev); + + dev->qdisc_tx_busylock = &qdisc_tx_busylock_key; + dev->qdisc_running_key = &qdisc_running_key; + + lockdep_register_key(&macsec->addr_lock_key); + lockdep_set_class(&dev->addr_list_lock, &macsec->addr_lock_key); + + lockdep_register_key(&macsec->xmit_lock_key); + netdev_for_each_tx_queue(dev, macsec_dev_set_lockdep_one, NULL); +} static int macsec_dev_init(struct net_device *dev) { @@ -2780,6 +2807,7 @@ static int macsec_dev_init(struct net_device *dev) if (is_zero_ether_addr(dev->broadcast)) memcpy(dev->broadcast, real_dev->broadcast, dev->addr_len); + macsec_dev_set_lockdep_class(dev); return 0; } @@ -2789,6 +2817,9 @@ static void 
macsec_dev_uninit(struct net_device *dev) gro_cells_destroy(&macsec->gro_cells); free_percpu(dev->tstats); + + lockdep_unregister_key(&macsec->addr_lock_key); + lockdep_unregister_key(&macsec->xmit_lock_key); } static netdev_features_t macsec_fix_features(struct net_device *dev, @@ -3263,10 +3294,6 @@ static int macsec_newlink(struct net *net, struct net_device *dev, dev_hold(real_dev); macsec->nest_level = dev_get_nest_level(real_dev) + 1; - netdev_lockdep_set_classes(dev); - lockdep_set_class_and_subclass(&dev->addr_list_lock, - &macsec_netdev_addr_lock_key, - macsec_get_nest_level(dev)); err = netdev_upper_dev_link(real_dev, dev, extack); if (err < 0) From patchwork Mon Sep 16 13:47:58 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taehee Yoo X-Patchwork-Id: 1162829 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="m+p+l5qc"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 46X6zP3Nv4z9sN1 for ; Mon, 16 Sep 2019 23:49:05 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2387907AbfIPNtE (ORCPT ); Mon, 16 Sep 2019 09:49:04 -0400 Received: from mail-pf1-f196.google.com ([209.85.210.196]:41726 "EHLO mail-pf1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2387751AbfIPNtE (ORCPT ); Mon, 16 Sep 2019 09:49:04 -0400 Received: by mail-pf1-f196.google.com with SMTP 
id q7so1504778pfh.8 for ; Mon, 16 Sep 2019 06:49:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=9ryfwz8+CO4vKa8AgZEqIHnA9yEV1ieQHqj9PJgnOK4=; b=m+p+l5qcicoFxmpSfrz3p7M+6jwa+JOqvg6PHwfsuA+YbdBKIZM4huk5RGctiCEJve 8KrTLSkModmqFyjORDTD7UxHl/QXpeWrdN8WBok+O/C+fupXRtqdE69xXjSXsE0Fzle8 oKsRLdZy2drlDL8XEJWcv/Ptd42nCQ6xnEI3mplmxEiOq0i/m9jLY9ZZkd2nV75PkXHy BvKr0UrkN7jDUui2Nx1nNb+jfAjJUiYmbLn3RCfyaJGcecosknnh+RQItMtFkFsUEJiX AKLoVhHcSy5AeeC7eBzMgizMVWUeCMUimHIJKq1fJyIOdudnf7ZMeGt3NASzJkAEIMXU Enlw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=9ryfwz8+CO4vKa8AgZEqIHnA9yEV1ieQHqj9PJgnOK4=; b=nAMJTWjy++OqYEIS8CEBed566npIPRIw5oYG/U3MCiXBTf073p+Ph/7bex31dhYgcp TVztjo1PwU8a8NGRW5WLMA8n5eOHxSN7RXVorpZUwk0i+59vAGrV5v98eW+QvP47zkRq ZRgfymF2eOftkP+Qa1ZnDICVJhta6O4q/lGqO6ZRPP6y5O10K/WKH0f5FI3HpVdq+L9J dBRDbtGI7OtpjgQ7y1ylIbya9jBpHSygUecwD0lSXz1mFoydH3klP/cA1smL/yBzt+eh U3tR7oBd1jgi8oktU06luFLhyCZdI1VgNlIQ629n7TRuqYWP/8UchV6Rq/RzKcb3xty8 6ZFQ== X-Gm-Message-State: APjAAAWoBK8RFiz/Vn59lzpa5HOeGKwyci2ma+CTYcFJ6Yb0vI1p1XEy Lx+gJUS8BsJAghjyZaC1k+c= X-Google-Smtp-Source: APXvYqxmVVYk/4VHtoMm5AJAF/kRXHQVspJxwcBsT4XM1/xDtF854s+Pq8QVlrkLz+6QIANfI0Y/OQ== X-Received: by 2002:a63:ee04:: with SMTP id e4mr55953382pgi.53.1568641743159; Mon, 16 Sep 2019 06:49:03 -0700 (PDT) Received: from ap-To-be-filled-by-O-E-M.1.1.1.1 ([14.33.120.60]) by smtp.gmail.com with ESMTPSA id z20sm2822266pjn.12.2019.09.16.06.48.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 16 Sep 2019 06:49:02 -0700 (PDT) From: Taehee Yoo To: davem@davemloft.net, netdev@vger.kernel.org, j.vosburgh@gmail.com, vfalico@gmail.com, andy@greyhouse.net, jiri@resnulli.us, sd@queasysnail.net, roopa@cumulusnetworks.com, saeedm@mellanox.com, manishc@marvell.com, rahulv@marvell.com, 
kys@microsoft.com, haiyangz@microsoft.com, stephen@networkplumber.org, sashal@kernel.org, hare@suse.de, varun@chelsio.com, ubraun@linux.ibm.com, kgraul@linux.ibm.com, jay.vosburgh@canonical.com
Cc: ap420073@gmail.com
Subject: [PATCH net v3 07/11] macvlan: use dynamic lockdep key instead of subclass
Date: Mon, 16 Sep 2019 22:47:58 +0900
Message-Id: <20190916134802.8252-8-ap420073@gmail.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20190916134802.8252-1-ap420073@gmail.com>
References: <20190916134802.8252-1-ap420073@gmail.com>
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

All macvlan devices share the same lockdep key, and the subclass is
initialized with nest_level.
But the actual nest_level value can change when a lower device is attached.
At that moment the subclass should be updated, but doing so seems to be
unsafe.
So this patch makes macvlan use dynamic lockdep keys instead of the
subclass.

Test commands:
    ip link add bond0 type bond
    ip link add dummy0 type dummy
    ip link add macvlan0 link bond0 type macvlan mode bridge
    ip link add macvlan1 link dummy0 type macvlan mode bridge
    ip link set bond0 mtu 1000
    ip link set macvlan1 master bond0

    ip link set bond0 up
    ip link set macvlan0 up
    ip link set dummy0 up
    ip link set macvlan1 up

Splat looks like:
[ 55.694677] WARNING: possible recursive locking detected
[ 55.695420] 5.3.0-rc8+ #179 Not tainted
[ 55.709918] --------------------------------------------
[ 55.711814] ip/982 is trying to acquire lock:
[ 55.712387] 0000000023ca93f4 (&macvlan_netdev_addr_lock_key/1){+...}, at: dev_uc_sync_multiple+0xfa/0x1a0
[ 55.713621]
[ 55.713621] but task is already holding lock:
[ 55.714364] 0000000070c93e9d (&macvlan_netdev_addr_lock_key/1){+...}, at: dev_set_rx_mode+0x19/0x30
[ 55.715548]
[ 55.715548] other info that might help us debug this:
[ 55.716428] Possible unsafe locking scenario:
[ 55.716428]
[ 55.717231] CPU0
[ 55.717563] ----
[ 55.717907] lock(&macvlan_netdev_addr_lock_key/1);
[ 55.718574] lock(&macvlan_netdev_addr_lock_key/1);
[ 55.719149]
[ 55.719149] *** DEADLOCK ***
[ 55.719149]
[ 55.719865] May be due to missing lock nesting notation
[ 55.719865]
[ 55.720607] 4 locks held by ip/982:
[ 55.721056] #0: 0000000096ab2afb (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x466/0x8a0
[ 55.722031] #1: 0000000070c93e9d (&macvlan_netdev_addr_lock_key/1){+...}, at: dev_set_rx_mode+0x19/0x30
[ 55.791914] #2: 000000005409683b (&dev_addr_list_lock_key/3){+...}, at: dev_uc_sync+0xfa/0x1a0
[ 55.792718] #3: 0000000085f78eaf (rcu_read_lock){....}, at: bond_set_rx_mode+0x5/0x3c0 [bonding]
[ 55.793533]
[ 55.793533] stack backtrace:
[ 55.793939] CPU: 0 PID: 982 Comm: ip Not tainted 5.3.0-rc8+ #179
[ 55.794489] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 55.795227] Call Trace:
[ 55.795493] dump_stack+0x7c/0xbb
[ 55.795809] __lock_acquire+0x26a9/0x3de0
[ 55.796184] ? register_lock_class+0x14d0/0x14d0
[ 55.797496] ? register_lock_class+0x14d0/0x14d0
[ 55.797971] lock_acquire+0x164/0x3b0
[ 55.838601] ? dev_uc_sync_multiple+0xfa/0x1a0
[ 55.839281] _raw_spin_lock_nested+0x2e/0x60
[ 55.840289] ? dev_uc_sync_multiple+0xfa/0x1a0
[ 55.841308] dev_uc_sync_multiple+0xfa/0x1a0
[ 55.841868] bond_set_rx_mode+0x269/0x3c0 [bonding]
[ 55.842500] ? bond_init+0x6f0/0x6f0 [bonding]
[ 55.843076] dev_uc_sync+0x15a/0x1a0
[ 55.843567] macvlan_set_mac_lists+0x55/0x110 [macvlan]
[ 55.844247] dev_set_rx_mode+0x21/0x30
[ 55.844733] __dev_open+0x202/0x310
[ 55.845186] ? dev_set_rx_mode+0x30/0x30
[ 55.845707] ? mark_held_locks+0xa5/0xe0
[ 55.846219] ? __local_bh_enable_ip+0xe9/0x1b0
[ 55.846805] __dev_change_flags+0x3c3/0x500
[ 55.847376] ? dev_set_allmulti+0x10/0x10
[ 55.847906] ? check_chain_key+0x236/0x5d0
[ 55.848881] dev_change_flags+0x7a/0x160
[ 55.850015] do_setlink+0xa49/0x2f40
[ ...
] Fixes: c674ac30c549 ("macvlan: Fix lockdep warnings with stacked macvlan devices") Signed-off-by: Taehee Yoo --- v2 -> v3 : - This patch is not changed v1 -> v2 : - This patch is not changed drivers/net/macvlan.c | 35 +++++++++++++++++++++++++++-------- include/linux/if_macvlan.h | 2 ++ 2 files changed, 29 insertions(+), 8 deletions(-) diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c index 940192c057b6..dae368a2e8d1 100644 --- a/drivers/net/macvlan.c +++ b/drivers/net/macvlan.c @@ -852,8 +852,6 @@ static int macvlan_do_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd) * "super class" of normal network devices; split their locks off into a * separate class since they always nest. */ -static struct lock_class_key macvlan_netdev_addr_lock_key; - #define ALWAYS_ON_OFFLOADS \ (NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_GSO_SOFTWARE | \ NETIF_F_GSO_ROBUST | NETIF_F_GSO_ENCAP_ALL) @@ -874,12 +872,30 @@ static int macvlan_get_nest_level(struct net_device *dev) return ((struct macvlan_dev *)netdev_priv(dev))->nest_level; } -static void macvlan_set_lockdep_class(struct net_device *dev) +static void macvlan_dev_set_lockdep_one(struct net_device *dev, + struct netdev_queue *txq, + void *_unused) +{ + struct macvlan_dev *macvlan = netdev_priv(dev); + + lockdep_set_class(&txq->_xmit_lock, &macvlan->xmit_lock_key); +} + +static struct lock_class_key qdisc_tx_busylock_key; +static struct lock_class_key qdisc_running_key; + +static void macvlan_dev_set_lockdep_class(struct net_device *dev) { - netdev_lockdep_set_classes(dev); - lockdep_set_class_and_subclass(&dev->addr_list_lock, - &macvlan_netdev_addr_lock_key, - macvlan_get_nest_level(dev)); + struct macvlan_dev *macvlan = netdev_priv(dev); + + dev->qdisc_tx_busylock = &qdisc_tx_busylock_key; + dev->qdisc_running_key = &qdisc_running_key; + + lockdep_register_key(&macvlan->addr_lock_key); + lockdep_set_class(&dev->addr_list_lock, &macvlan->addr_lock_key); + + lockdep_register_key(&macvlan->xmit_lock_key); + 
netdev_for_each_tx_queue(dev, macvlan_dev_set_lockdep_one, NULL); } static int macvlan_init(struct net_device *dev) @@ -900,7 +916,7 @@ static int macvlan_init(struct net_device *dev) dev->gso_max_segs = lowerdev->gso_max_segs; dev->hard_header_len = lowerdev->hard_header_len; - macvlan_set_lockdep_class(dev); + macvlan_dev_set_lockdep_class(dev); vlan->pcpu_stats = netdev_alloc_pcpu_stats(struct vlan_pcpu_stats); if (!vlan->pcpu_stats) @@ -922,6 +938,9 @@ static void macvlan_uninit(struct net_device *dev) port->count -= 1; if (!port->count) macvlan_port_destroy(port->dev); + + lockdep_unregister_key(&vlan->addr_lock_key); + lockdep_unregister_key(&vlan->xmit_lock_key); } static void macvlan_dev_get_stats64(struct net_device *dev, diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h index 2e55e4cdbd8a..ea5b41823287 100644 --- a/include/linux/if_macvlan.h +++ b/include/linux/if_macvlan.h @@ -31,6 +31,8 @@ struct macvlan_dev { u16 flags; int nest_level; unsigned int macaddr_count; + struct lock_class_key xmit_lock_key; + struct lock_class_key addr_lock_key; #ifdef CONFIG_NET_POLL_CONTROLLER struct netpoll *netpoll; #endif From patchwork Mon Sep 16 13:47:59 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taehee Yoo X-Patchwork-Id: 1162830 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="RH2GJlPW"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by 
ozlabs.org (Postfix) with ESMTP id 46X6zY2J0Zz9sQm for ; Mon, 16 Sep 2019 23:49:13 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2387935AbfIPNtM (ORCPT ); Mon, 16 Sep 2019 09:49:12 -0400 Received: from mail-pg1-f195.google.com ([209.85.215.195]:33990 "EHLO mail-pg1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1733128AbfIPNtM (ORCPT ); Mon, 16 Sep 2019 09:49:12 -0400 Received: by mail-pg1-f195.google.com with SMTP id n9so56255pgc.1 for ; Mon, 16 Sep 2019 06:49:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=SJH1U+hbVAPZ8FanGy0kD29PtSPzpTKSJZxpdr+tM+Q=; b=RH2GJlPWH7GeT/OA85HwZn7qcWumE6+i1v26NX2/O5BupKxMGn+DBRaaMegw4yOywy 91frXUMskuZ5ToogqRslEl/B7gq/WBFzMi7hB9YRRPJv98dCU5LvnVSJq56Rfw44tyai 1oZ+N71EP+moVSAPAfmal2NUGGQTujmBUkin9xv0zU6ZL6gc/46Z/cDcJdgM1aASeHvE sV3S4YJeTliF7KW7JY2C6CSWUMDgJQXnqv1ko/zEZzNqgJFMjaJIg2PXOJ+X6xtdOkJ9 mLuLLpo5pDtVJ0OW2r3/BmFsfbDgZ2qzz10Zoz9Dj5VyhU5lD3rv3wLNCEkC8fzMvvGI HA2w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=SJH1U+hbVAPZ8FanGy0kD29PtSPzpTKSJZxpdr+tM+Q=; b=P4SAl4gg6JFlDqWtpOJ55kuDYaMNQpBJK5SKpmxAaZDtCsB6EwqzcnVOX0qstgmL71 324n5w1w78pXgXwyYFYEGQVd2uzcGDN5vpoNy7zqSudObaa+Nr1JTLXNu64YuS8m5vWk er4gdwgyxelMvwU7FUX61gttbYW/KrPAtBwSds1wqs9vL21nIbWvRpSiLTjl8dkzvVji YeWVLdQrPWL9J4BWclMBmZFGxxZlCxubHyCCfSwPIcBQIKPNRaMYKs3mtFnP1Ei1ydXG /G85H1DdtjZ4fF6lYrO68qrENlDHo99UnLitFDWvCKDPP7OlmSG06turHK40cwrxDXJx 6ltQ== X-Gm-Message-State: APjAAAUy1E0FJ2CimLfzBXfPmOphY1alZ51AV5KCU59HfENOy2+7yhWo xBrloz+cmBkru2786UQ+o8I= X-Google-Smtp-Source: APXvYqxWbwu8d/4u/Bx+k/BhYKAdarT9DLtLNnmEiao0JNGZoo85rlZzmaawNJrCtSeyXCKY0fZsaQ== X-Received: by 2002:a17:90a:2464:: with SMTP id h91mr102642pje.9.1568641751355; Mon, 16 Sep 2019 06:49:11 -0700 (PDT) Received: 
from ap-To-be-filled-by-O-E-M.1.1.1.1 ([14.33.120.60]) by smtp.gmail.com with ESMTPSA id z20sm2822266pjn.12.2019.09.16.06.49.05 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 16 Sep 2019 06:49:10 -0700 (PDT)
From: Taehee Yoo
To: davem@davemloft.net, netdev@vger.kernel.org, j.vosburgh@gmail.com, vfalico@gmail.com, andy@greyhouse.net, jiri@resnulli.us, sd@queasysnail.net, roopa@cumulusnetworks.com, saeedm@mellanox.com, manishc@marvell.com, rahulv@marvell.com, kys@microsoft.com, haiyangz@microsoft.com, stephen@networkplumber.org, sashal@kernel.org, hare@suse.de, varun@chelsio.com, ubraun@linux.ibm.com, kgraul@linux.ibm.com, jay.vosburgh@canonical.com
Cc: ap420073@gmail.com
Subject: [PATCH net v3 08/11] macsec: fix refcnt leak in module exit routine
Date: Mon, 16 Sep 2019 22:47:59 +0900
Message-Id: <20190916134802.8252-9-ap420073@gmail.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20190916134802.8252-1-ap420073@gmail.com>
References: <20190916134802.8252-1-ap420073@gmail.com>
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

When a macsec interface is created, it takes a refcnt on the lower
device (real device). When the macsec interface is deleted, the refcnt is
dropped in macsec_free_netdev(), which is the ->priv_destructor() of the
macsec interface.

The problem scenario is this: when nested macsec interfaces are being
removed, the exit routine of the macsec module leaks the refcnt.

Test commands:
    ip link add dummy0 type dummy
    ip link add macsec0 link dummy0 type macsec
    ip link add macsec1 link macsec0 type macsec
    modprobe -rv macsec

[ 208.629433] unregister_netdevice: waiting for macsec0 to become free. Usage count = 1

The steps of the macsec module exit routine are:
1. Calls ->dellink() in __rtnl_link_unregister().
2. Checks the refcnt in netdev_run_todo() and waits until it becomes 0.
3. Calls ->priv_destructor() in netdev_run_todo().

Step 2 checks the refcnt, but it is step 3 that decreases it.
So step 2 waits forever.

This patch makes the macsec module not hold its own refcnt on the lower
device, because netdev_upper_dev_link() already holds a refcnt on the
lower device.

Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Taehee Yoo
---
v2 -> v3 :
 - This patch is not changed
v1 -> v2 :
 - This patch is not changed

 drivers/net/macsec.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c
index 25a4fc88145d..41ec1ed0d545 100644
--- a/drivers/net/macsec.c
+++ b/drivers/net/macsec.c
@@ -3031,12 +3031,10 @@ static const struct nla_policy macsec_rtnl_policy[IFLA_MACSEC_MAX + 1] = {
 static void macsec_free_netdev(struct net_device *dev)
 {
 	struct macsec_dev *macsec = macsec_priv(dev);
-	struct net_device *real_dev = macsec->real_dev;
 
 	free_percpu(macsec->stats);
 	free_percpu(macsec->secy.tx_sc.stats);
 
-	dev_put(real_dev);
 }
 
 static void macsec_setup(struct net_device *dev)
@@ -3291,8 +3289,6 @@ static int macsec_newlink(struct net *net, struct net_device *dev,
 	if (err < 0)
 		return err;
 
-	dev_hold(real_dev);
-
 	macsec->nest_level = dev_get_nest_level(real_dev) + 1;
 
 	err = netdev_upper_dev_link(real_dev, dev, extack);

From patchwork Mon Sep 16 13:48:00 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Taehee Yoo
X-Patchwork-Id: 1162831
X-Patchwork-Delegate: davem@davemloft.net
Return-Path:
X-Original-To: patchwork-incoming-netdev@ozlabs.org
Delivered-To: patchwork-incoming-netdev@ozlabs.org
Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=)
Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com
Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="boF81hql"; dkim-atps=neutral
Received: from
vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 46X6zl0WVwz9sQp for ; Mon, 16 Sep 2019 23:49:23 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2387964AbfIPNtW (ORCPT ); Mon, 16 Sep 2019 09:49:22 -0400 Received: from mail-pl1-f196.google.com ([209.85.214.196]:44044 "EHLO mail-pl1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1733128AbfIPNtV (ORCPT ); Mon, 16 Sep 2019 09:49:21 -0400 Received: by mail-pl1-f196.google.com with SMTP id k1so16955660pls.11 for ; Mon, 16 Sep 2019 06:49:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=McSuU8z/zNJCxnLPEFOANHVpsoxk6pd/gMe7sJiZLVQ=; b=boF81hqlNjLK8LNURAT9GdihfdeVtkz4+893OOV/ID6rMvrusxvpuH6c/3n1P3UsdS zSFhO8fOXaFYgLgVgNGzyInr/wFcpH1ACWxos86MQMxhvsrc8lRztndS0nq/Qj/pPMhQ D2cJZhg9v25kGAPBsvClYPV2net5AYV2GW35nQP37bXNsde38AmZKpVoq0qNCjo3VjRk PmM28C9NnSLZwhwGXS8QackF3y6Lnl5aroY4WN0wvhCODw/eB7Im5YIzRwgp2rbG5GSq +s23fgCorsgTO/FERh3uy+95HFhdII/5fFZv1wKFLgQdKaiz6N1E8IVSWL9VVMrDBOSm Pm4Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=McSuU8z/zNJCxnLPEFOANHVpsoxk6pd/gMe7sJiZLVQ=; b=hed4vI61e+5yEf6tkxIPWM8xP4RSKk+rjNnf9F0U1P8doFBVFMqRkfujnVMpPaT6P+ KB12DbKJ0fSZI+bdQp3ZLoS88jeVwtgLDn1u715mTCm8QXhORMR+8hvepZiFCLtMeBy5 EMiKG36t6vIMa+khbfuEUbElLarjMHRn1bEElajs0Il6ns/Wz7OgBYxgJf1tHlnomhhh jNo9IKVG2kW5elaqaSkD9ic7nJAhpdqomMhH3QebZ4hT4WdHwEQ0w6IrDuDpxXX4m0aG yWqvoTYW93M+5BCbVx+Y6ydHn8tm4/U6v8WXtsGli54b1CJTLEmAwcBn1lJzagrt7AcU ExoQ== X-Gm-Message-State: APjAAAXf5eYC8ZWjs1oq9O5Gdvy6kvPLoReNDGqZY5VoW2PXQ6nlOlW7 iq4GQ15y+FX5INlnPdncgdY= X-Google-Smtp-Source: APXvYqyk/yUBPA0kLidEt4vhGyR1fz0EeUmPT1/SF/C9/m4+L60CktaDnE2uoMV+onegEwbe03Harw== X-Received: by 2002:a17:902:9a46:: with SMTP id 
x6mr2408091plv.12.1568641758864; Mon, 16 Sep 2019 06:49:18 -0700 (PDT)
Received: from ap-To-be-filled-by-O-E-M.1.1.1.1 ([14.33.120.60]) by smtp.gmail.com with ESMTPSA id z20sm2822266pjn.12.2019.09.16.06.49.13 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 16 Sep 2019 06:49:17 -0700 (PDT)
From: Taehee Yoo
To: davem@davemloft.net, netdev@vger.kernel.org, j.vosburgh@gmail.com, vfalico@gmail.com, andy@greyhouse.net, jiri@resnulli.us, sd@queasysnail.net, roopa@cumulusnetworks.com, saeedm@mellanox.com, manishc@marvell.com, rahulv@marvell.com, kys@microsoft.com, haiyangz@microsoft.com, stephen@networkplumber.org, sashal@kernel.org, hare@suse.de, varun@chelsio.com, ubraun@linux.ibm.com, kgraul@linux.ibm.com, jay.vosburgh@canonical.com
Cc: ap420073@gmail.com
Subject: [PATCH net v3 09/11] net: core: add ignore flag to netdev_adjacent structure
Date: Mon, 16 Sep 2019 22:48:00 +0900
Message-Id: <20190916134802.8252-10-ap420073@gmail.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20190916134802.8252-1-ap420073@gmail.com>
References: <20190916134802.8252-1-ap420073@gmail.com>
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

netdev_upper_dev_link() is used to link an adjacent node and
netdev_upper_dev_unlink() is used to unlink one.
The unlink operation cannot fail, but the link operation can.

To exchange adjacent nodes, we have to unlink the old adjacent node
first and then link the new adjacent node.
If that link operation fails, we have to link the old adjacent node
again. But this link operation can fail too, which eventually breaks the
adjacent link relationship.

This patch adds an ignore flag to the netdev_adjacent structure.
While this flag is set, netdev_upper_dev_link() ignores the old adjacent
node, so the unlink operation before the link operation can be skipped.
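The exchange sequence described above can be illustrated with a small
userspace model. This is only a sketch under stated assumptions, not the
kernel implementation: every toy_* name, the fixed-size array and the
fail_next_link switch are hypothetical and exist only to show why hiding
the old link first makes the rollback path unable to fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_UPPER 4
#define TOY_EBUSY  (-1)	/* a visible master already exists */
#define TOY_ENOMEM (-2)	/* stands in for any link failure */

/* Hypothetical, simplified stand-in for struct netdev_adjacent. */
struct toy_adjacent {
	const char *name;	/* name of the upper (master) device */
	bool ignore;		/* lookups skip this entry while set */
};

/* Hypothetical lower device that tracks its master upper devices. */
struct toy_dev {
	size_t n_upper;
	bool fail_next_link;	/* test hook: force the next link to fail */
	struct toy_adjacent upper[TOY_MAX_UPPER];
};

/* Return the first master that is not ignored, or NULL. */
static const char *toy_master_get(const struct toy_dev *dev)
{
	for (size_t i = 0; i < dev->n_upper; i++)
		if (!dev->upper[i].ignore)
			return dev->upper[i].name;
	return NULL;
}

/* Link a master; this is the only operation that is allowed to fail. */
static int toy_link_master(struct toy_dev *dev, const char *name)
{
	if (toy_master_get(dev))
		return TOY_EBUSY;
	if (dev->fail_next_link || dev->n_upper == TOY_MAX_UPPER) {
		dev->fail_next_link = false;
		return TOY_ENOMEM;
	}
	dev->upper[dev->n_upper].name = name;
	dev->upper[dev->n_upper].ignore = false;
	dev->n_upper++;
	return 0;
}

/* Unlink a master by name; like the real unlink, it cannot fail. */
static void toy_unlink(struct toy_dev *dev, const char *name)
{
	for (size_t i = 0; i < dev->n_upper; i++) {
		if (strcmp(dev->upper[i].name, name) != 0)
			continue;
		memmove(&dev->upper[i], &dev->upper[i + 1],
			(dev->n_upper - i - 1) * sizeof(dev->upper[0]));
		dev->n_upper--;
		return;
	}
}

/* Set or clear the ignore flag on every entry called name. */
static void toy_set_ignore(struct toy_dev *dev, const char *name, bool val)
{
	for (size_t i = 0; i < dev->n_upper; i++)
		if (strcmp(dev->upper[i].name, name) == 0)
			dev->upper[i].ignore = val;
}

/* Swap masters without unlinking the old one first. */
static int toy_exchange_master(struct toy_dev *dev, const char *old_name,
			       const char *new_name)
{
	int err;

	toy_set_ignore(dev, old_name, true);	/* hide the old link */
	err = toy_link_master(dev, new_name);
	if (err) {
		/* Rollback only clears a flag, so it cannot fail. */
		toy_set_ignore(dev, old_name, false);
		return err;
	}
	toy_unlink(dev, old_name);	/* commit cannot fail either */
	return 0;
}
```

In this model a failed exchange leaves the old master fully linked,
because the old entry was only hidden and never removed.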
Signed-off-by: Taehee Yoo --- v2 -> v3 : - Modify nesting infra code to use iterator instead of recursive v1 -> v2 : - This patch is not changed include/linux/netdevice.h | 4 + net/core/dev.c | 180 ++++++++++++++++++++++++++++++++++---- 2 files changed, 166 insertions(+), 18 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 5bb5756129af..4506810c301b 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -4319,6 +4319,10 @@ int netdev_master_upper_dev_link(struct net_device *dev, struct netlink_ext_ack *extack); void netdev_upper_dev_unlink(struct net_device *dev, struct net_device *upper_dev); +void netdev_adjacent_dev_disable(struct net_device *upper_dev, + struct net_device *lower_dev); +void netdev_adjacent_dev_enable(struct net_device *upper_dev, + struct net_device *lower_dev); void netdev_adjacent_rename_links(struct net_device *dev, char *oldname); void *netdev_lower_dev_get_private(struct net_device *dev, struct net_device *lower_dev); diff --git a/net/core/dev.c b/net/core/dev.c index fa847ea957ee..12d76b983064 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -6448,6 +6448,9 @@ struct netdev_adjacent { /* upper master flag, there can only be one master device per list */ bool master; + /* lookup ignore flag */ + bool ignore; + /* counter for the number of times this device was added to us */ u16 ref_nr; @@ -6553,6 +6556,22 @@ struct net_device *netdev_master_upper_dev_get(struct net_device *dev) } EXPORT_SYMBOL(netdev_master_upper_dev_get); +struct net_device *netdev_master_upper_dev_get_ignore(struct net_device *dev) +{ + struct netdev_adjacent *upper; + + ASSERT_RTNL(); + + if (list_empty(&dev->adj_list.upper)) + return NULL; + + upper = list_first_entry(&dev->adj_list.upper, + struct netdev_adjacent, list); + if (likely(upper->master) && !upper->ignore) + return upper->dev; + return NULL; +} + /** * netdev_has_any_lower_dev - Check if device is linked to some device * @dev: device @@ -6603,8 
+6622,9 @@ struct net_device *netdev_upper_get_next_dev_rcu(struct net_device *dev, } EXPORT_SYMBOL(netdev_upper_get_next_dev_rcu); -static struct net_device *netdev_next_upper_dev(struct net_device *dev, - struct list_head **iter) +static struct net_device *netdev_next_upper_dev_ignore(struct net_device *dev, + struct list_head **iter, + bool *ignore) { struct netdev_adjacent *upper; @@ -6614,6 +6634,7 @@ static struct net_device *netdev_next_upper_dev(struct net_device *dev, return NULL; *iter = &upper->list; + *ignore = upper->ignore; return upper->dev; } @@ -6635,14 +6656,15 @@ static struct net_device *netdev_next_upper_dev_rcu(struct net_device *dev, return upper->dev; } -int netdev_walk_all_upper_dev(struct net_device *dev, - int (*fn)(struct net_device *dev, - void *data), - void *data) +int netdev_walk_all_upper_dev_ignore(struct net_device *dev, + int (*fn)(struct net_device *dev, + void *data), + void *data) { struct net_device *udev, *next, *now, *dev_stack[MAX_NEST_DEV + 1]; struct list_head *niter, *iter, *iter_stack[MAX_NEST_DEV + 1]; int ret, cur = 0; + bool ignore; now = dev; iter = &dev->adj_list.upper; @@ -6656,9 +6678,12 @@ int netdev_walk_all_upper_dev(struct net_device *dev, next = NULL; while (1) { - udev = netdev_next_upper_dev(now, &iter); + udev = netdev_next_upper_dev_ignore(now, &iter, + &ignore); if (!udev) break; + if (ignore) + continue; if (!next) { next = udev; @@ -6735,6 +6760,15 @@ int netdev_walk_all_upper_dev_rcu(struct net_device *dev, } EXPORT_SYMBOL_GPL(netdev_walk_all_upper_dev_rcu); +bool netdev_has_upper_dev_ignore(struct net_device *dev, + struct net_device *upper_dev) +{ + ASSERT_RTNL(); + + return netdev_walk_all_upper_dev_ignore(dev, __netdev_has_upper_dev, + upper_dev); +} + /** * netdev_lower_get_next_private - Get the next ->private from the * lower neighbour list @@ -6831,6 +6865,23 @@ static struct net_device *netdev_next_lower_dev(struct net_device *dev, return lower->dev; } +static struct net_device 
*netdev_next_lower_dev_ignore(struct net_device *dev, + struct list_head **iter, + bool *ignore) +{ + struct netdev_adjacent *lower; + + lower = list_entry((*iter)->next, struct netdev_adjacent, list); + + if (&lower->list == &dev->adj_list.lower) + return NULL; + + *iter = &lower->list; + *ignore = lower->ignore; + + return lower->dev; +} + int netdev_walk_all_lower_dev(struct net_device *dev, int (*fn)(struct net_device *dev, void *data), @@ -6881,6 +6932,59 @@ int netdev_walk_all_lower_dev(struct net_device *dev, } EXPORT_SYMBOL_GPL(netdev_walk_all_lower_dev); +int netdev_walk_all_lower_dev_ignore(struct net_device *dev, + int (*fn)(struct net_device *dev, + void *data), + void *data) +{ + struct net_device *ldev, *next, *now, *dev_stack[MAX_NEST_DEV + 1]; + struct list_head *niter, *iter, *iter_stack[MAX_NEST_DEV + 1]; + int ret, cur = 0; + bool ignore; + + now = dev; + iter = &dev->adj_list.lower; + + while (1) { + if (now != dev) { + ret = fn(now, data); + if (ret) + return ret; + } + + next = NULL; + while (1) { + ldev = netdev_next_lower_dev_ignore(now, &iter, + &ignore); + if (!ldev) + break; + if (ignore) + continue; + + if (!next) { + next = ldev; + niter = &ldev->adj_list.lower; + } else { + dev_stack[cur] = ldev; + iter_stack[cur++] = &ldev->adj_list.lower; + break; + } + } + + if (!next) { + if (!cur) + return 0; + next = dev_stack[--cur]; + niter = iter_stack[cur]; + } + + now = next; + iter = niter; + } + + return 0; +} + static struct net_device *netdev_next_lower_dev_rcu(struct net_device *dev, struct list_head **iter) { @@ -6900,11 +7004,14 @@ static u8 __netdev_upper_depth(struct net_device *dev) struct net_device *udev; struct list_head *iter; u8 max_depth = 0; + bool ignore; for (iter = &dev->adj_list.upper, - udev = netdev_next_upper_dev(dev, &iter); + udev = netdev_next_upper_dev_ignore(dev, &iter, &ignore); udev; - udev = netdev_next_upper_dev(dev, &iter)) { + udev = netdev_next_upper_dev_ignore(dev, &iter, &ignore)) { + if (ignore) + 
continue; if (max_depth < udev->upper_level) max_depth = udev->upper_level; } @@ -6917,11 +7024,14 @@ static u8 __netdev_lower_depth(struct net_device *dev) struct net_device *ldev; struct list_head *iter; u8 max_depth = 0; + bool ignore; for (iter = &dev->adj_list.lower, - ldev = netdev_next_lower_dev(dev, &iter); + ldev = netdev_next_lower_dev_ignore(dev, &iter, &ignore); ldev; - ldev = netdev_next_lower_dev(dev, &iter)) { + ldev = netdev_next_lower_dev_ignore(dev, &iter, &ignore)) { + if (ignore) + continue; if (max_depth < ldev->lower_level) max_depth = ldev->lower_level; } @@ -7089,6 +7199,7 @@ static int __netdev_adjacent_dev_insert(struct net_device *dev, adj->master = master; adj->ref_nr = 1; adj->private = private; + adj->ignore = false; dev_hold(adj_dev); pr_debug("Insert adjacency: dev %s adj_dev %s adj->ref_nr %d; dev_hold on %s\n", @@ -7239,17 +7350,17 @@ static int __netdev_upper_dev_link(struct net_device *dev, return -EBUSY; /* To prevent loops, check if dev is not upper device to upper_dev. */ - if (netdev_has_upper_dev(upper_dev, dev)) + if (netdev_has_upper_dev_ignore(upper_dev, dev)) return -EBUSY; if ((dev->lower_level + upper_dev->upper_level) > MAX_NEST_DEV) return -EMLINK; if (!master) { - if (netdev_has_upper_dev(dev, upper_dev)) + if (netdev_has_upper_dev_ignore(dev, upper_dev)) return -EEXIST; } else { - master_dev = netdev_master_upper_dev_get(dev); + master_dev = netdev_master_upper_dev_get_ignore(dev); if (master_dev) return master_dev == upper_dev ? 
-EEXIST : -EBUSY; } @@ -7272,10 +7383,12 @@ static int __netdev_upper_dev_link(struct net_device *dev, goto rollback; __netdev_update_upper_level(dev, NULL); - netdev_walk_all_lower_dev(dev, __netdev_update_upper_level, NULL); + netdev_walk_all_lower_dev_ignore(dev, + __netdev_update_upper_level, NULL); __netdev_update_lower_level(upper_dev, NULL); - netdev_walk_all_upper_dev(upper_dev, __netdev_update_lower_level, NULL); + netdev_walk_all_upper_dev_ignore(upper_dev, + __netdev_update_lower_level, NULL); return 0; @@ -7361,13 +7474,44 @@ void netdev_upper_dev_unlink(struct net_device *dev, &changeupper_info.info); __netdev_update_upper_level(dev, NULL); - netdev_walk_all_lower_dev(dev, __netdev_update_upper_level, NULL); + netdev_walk_all_lower_dev_ignore(dev, + __netdev_update_upper_level, NULL); __netdev_update_lower_level(upper_dev, NULL); - netdev_walk_all_upper_dev(upper_dev, __netdev_update_lower_level, NULL); + netdev_walk_all_upper_dev_ignore(upper_dev, + __netdev_update_lower_level, NULL); } EXPORT_SYMBOL(netdev_upper_dev_unlink); +void __netdev_adjacent_dev_set(struct net_device *upper_dev, + struct net_device *lower_dev, + bool val) +{ + struct netdev_adjacent *adj; + + adj = __netdev_find_adj(lower_dev, &upper_dev->adj_list.lower); + if (adj) + adj->ignore = val; + + adj = __netdev_find_adj(upper_dev, &lower_dev->adj_list.upper); + if (adj) + adj->ignore = val; +} + +void netdev_adjacent_dev_disable(struct net_device *upper_dev, + struct net_device *lower_dev) +{ + __netdev_adjacent_dev_set(upper_dev, lower_dev, true); +} +EXPORT_SYMBOL(netdev_adjacent_dev_disable); + +void netdev_adjacent_dev_enable(struct net_device *upper_dev, + struct net_device *lower_dev) +{ + __netdev_adjacent_dev_set(upper_dev, lower_dev, false); +} +EXPORT_SYMBOL(netdev_adjacent_dev_enable); + /** * netdev_bonding_info_change - Dispatch event about slave change * @dev: device From patchwork Mon Sep 16 13:48:01 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 
Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taehee Yoo X-Patchwork-Id: 1162832 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="pq2aFAue"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 46X6zs2k26z9sQm for ; Mon, 16 Sep 2019 23:49:29 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388043AbfIPNt1 (ORCPT ); Mon, 16 Sep 2019 09:49:27 -0400 Received: from mail-pg1-f195.google.com ([209.85.215.195]:33078 "EHLO mail-pg1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1733243AbfIPNt1 (ORCPT ); Mon, 16 Sep 2019 09:49:27 -0400 Received: by mail-pg1-f195.google.com with SMTP id n190so60117pgn.0 for ; Mon, 16 Sep 2019 06:49:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=3SfkBIDlpNYLbPumeyoDpyNIIQicCnWvn44xZZCAm1g=; b=pq2aFAueqr5WyEAy5l0kx3lQhggaVfaSeuNC4TlHTNS6wfIfgTpK9pOQTBCBBJhCcQ 2n9SZcCmFsGeVvrwKtJLVlBCMfdrg8PGBqaFvl/y7ChIH8Ui/Z7HyR4DciVqgJgnn0yd q6OdGqLWH+x08GOeVbPdLHIfT0TlRe5vUK8hXuM/HD9XNKRJQTACRil8CByx6PmM76b1 PU0biayD44TJ7cJEBGsX3S2Ma2Z+bmneoAbf7WSCcJPZWJFHjU+aEID+9wNQxISMc8kf J+x1dsdba7ujiLL3qjeLydDPhWupsVRqxxFR9cMjRihhyEkgCNrMxIizSGQditbj4Z+r 0GZA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to 
:references; bh=3SfkBIDlpNYLbPumeyoDpyNIIQicCnWvn44xZZCAm1g=; b=YDWThp7VzYIVW8VY1OqOCjPBJQR7XLkx6bmkbHd/5hTfYdw1DWXtXAmNkis6sSBCm+ 7zokyjprrbyxOP/D8pIJnXk9WyK6jJ4CSc/qMv5fgio/Rr1tXwjtfurabgpjmqDpjdnO PKQtMkNvvWrLblISJMvB6bUrHqNisXPorrsu6TNZ12hAemxuhIY1HSmT2AKCCxGcVmFV npzqgWvUibKZlm72PRGFcJReYqSMwVpcP/QfALcbgYeWNjozK23jyM/YluLYBRLbPyXX FTmBGQrKg1XooPoORWU5j8eu2yX0iss0VPokBDfVkYH93xsgVuWTGF8AAw5CuwM6VsKy ZoXw== X-Gm-Message-State: APjAAAV24Kr43QnUXqaoxm+t7IQhLSARubqDWOfPSql6DK1AEARlogW9 X2IsmdTsGpsXe3dKmtMicPo= X-Google-Smtp-Source: APXvYqxrh7Qa9OKBUPHhyVksliMjG/6Ax+bFpLa5NPsQjBrLUZaZq6GhwW21WDiQR6krkEOrjtjApA== X-Received: by 2002:a17:90a:e981:: with SMTP id v1mr105825pjy.82.1568641765896; Mon, 16 Sep 2019 06:49:25 -0700 (PDT) Received: from ap-To-be-filled-by-O-E-M.1.1.1.1 ([14.33.120.60]) by smtp.gmail.com with ESMTPSA id z20sm2822266pjn.12.2019.09.16.06.49.20 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 16 Sep 2019 06:49:24 -0700 (PDT) From: Taehee Yoo To: davem@davemloft.net, netdev@vger.kernel.org, j.vosburgh@gmail.com, vfalico@gmail.com, andy@greyhouse.net, jiri@resnulli.us, sd@queasysnail.net, roopa@cumulusnetworks.com, saeedm@mellanox.com, manishc@marvell.com, rahulv@marvell.com, kys@microsoft.com, haiyangz@microsoft.com, stephen@networkplumber.org, sashal@kernel.org, hare@suse.de, varun@chelsio.com, ubraun@linux.ibm.com, kgraul@linux.ibm.com, jay.vosburgh@canonical.com Cc: ap420073@gmail.com Subject: [PATCH net v3 10/11] vxlan: add adjacent link to limit depth level Date: Mon, 16 Sep 2019 22:48:01 +0900 Message-Id: <20190916134802.8252-11-ap420073@gmail.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190916134802.8252-1-ap420073@gmail.com> References: <20190916134802.8252-1-ap420073@gmail.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Current vxlan code doesn't limit the number of nested devices. 
Nested devices are handled recursively, and this routine needs a large amount of stack memory. So, an unlimited number of nested devices can cause a stack overflow. In order to fix this issue, this patch adds adjacent links. The adjacent link APIs internally check the depth level.

Test commands:
    ip link add dummy0 type dummy
    ip link add vxlan0 type vxlan id 0 group 239.1.1.1 dev dummy0 \
            dstport 4789
    for i in {1..100}
    do
            let A=$i-1
            ip link add vxlan$i type vxlan id $i group 239.1.1.1 \
                    dev vxlan$A dstport 4789
    done
    ip link del dummy0

The top upper link is vxlan100 and the lowest link is vxlan0. When vxlan0 is deleted, the upper devices are deleted recursively. This needs a large amount of stack memory, so it causes a stack overflow.

Splat looks like:
[ 229.628477] =============================================================================
[ 229.629785] BUG page->ptl (Not tainted): Padding overwritten. 0x0000000026abf214-0x0000000091f6abb2
[ 229.629785] -----------------------------------------------------------------------------
[ 229.629785]
[ 229.655439] ==================================================================
[ 229.629785] INFO: Slab 0x00000000ff7cfda8 objects=19 used=19 fp=0x00000000fe33776c flags=0x200000000010200
[ 229.655688] BUG: KASAN: stack-out-of-bounds in unmap_single_vma+0x25a/0x2e0
[ 229.655688] Read of size 8 at addr ffff888113076928 by task vlan-network-in/2334
[ 229.655688]
[ 229.629785] Padding 0000000026abf214: 00 80 14 0d 81 88 ff ff 68 91 81 14 81 88 ff ff ........h.......
[ 229.629785] Padding 0000000001e24790: 38 91 81 14 81 88 ff ff 68 91 81 14 81 88 ff ff 8.......h.......
[ 229.629785] Padding 00000000b39397c8: 33 30 62 a7 ff ff ff ff ff eb 60 22 10 f1 ff 1f 30b.......`"....
[ 229.629785] Padding 00000000bc98f53a: 80 60 07 13 81 88 ff ff 00 80 14 0d 81 88 ff ff .`..............
[ 229.629785] Padding 000000002aa8123d: 68 91 81 14 81 88 ff ff f7 21 17 a7 ff ff ff ff h........!......
[ 229.629785] Padding 000000001c8c2369: 08 81 14 0d 81 88 ff ff 03 02 00 00 00 00 00 00 ................ [ 229.629785] Padding 000000004e290c5d: 21 90 a2 21 10 ed ff ff 00 00 00 00 00 fc ff df !..!............ [ 229.629785] Padding 000000000e25d731: 18 60 07 13 81 88 ff ff c0 8b 13 05 81 88 ff ff .`.............. [ 229.629785] Padding 000000007adc7ab3: b3 8a b5 41 00 00 00 00 ...A.... [ 229.629785] FIX page->ptl: Restoring 0x0000000026abf214-0x0000000091f6abb2=0x5a [ ... ] Fixes: acaf4e70997f ("net: vxlan: when lower dev unregisters remove vxlan dev as well") Signed-off-by: Taehee Yoo --- v2 -> v3 : - This patch is not changed v1 -> v2 : - This patch is not changed drivers/net/vxlan.c | 71 ++++++++++++++++++++++++++++++++++++++------- include/net/vxlan.h | 1 + 2 files changed, 62 insertions(+), 10 deletions(-) diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c index 3d9bcc957f7d..0d5c8d22d8a4 100644 --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -3567,6 +3567,8 @@ static int __vxlan_dev_create(struct net *net, struct net_device *dev, struct vxlan_net *vn = net_generic(net, vxlan_net_id); struct vxlan_dev *vxlan = netdev_priv(dev); struct vxlan_fdb *f = NULL; + struct net_device *remote_dev = NULL; + struct vxlan_rdst *dst = &vxlan->default_dst; bool unregister = false; int err; @@ -3577,14 +3579,14 @@ static int __vxlan_dev_create(struct net *net, struct net_device *dev, dev->ethtool_ops = &vxlan_ethtool_ops; /* create an fdb entry for a valid default destination */ - if (!vxlan_addr_any(&vxlan->default_dst.remote_ip)) { + if (!vxlan_addr_any(&dst->remote_ip)) { err = vxlan_fdb_create(vxlan, all_zeros_mac, - &vxlan->default_dst.remote_ip, + &dst->remote_ip, NUD_REACHABLE | NUD_PERMANENT, vxlan->cfg.dst_port, - vxlan->default_dst.remote_vni, - vxlan->default_dst.remote_vni, - vxlan->default_dst.remote_ifindex, + dst->remote_vni, + dst->remote_vni, + dst->remote_ifindex, NTF_SELF, &f); if (err) return err; @@ -3595,26 +3597,43 @@ static int 
__vxlan_dev_create(struct net *net, struct net_device *dev, goto errout; unregister = true; + if (dst->remote_ifindex) { + remote_dev = __dev_get_by_index(net, dst->remote_ifindex); + if (!remote_dev) + goto errout; + + err = netdev_upper_dev_link(remote_dev, dev, extack); + if (err) + goto errout; + } + err = rtnl_configure_link(dev, NULL); if (err) - goto errout; + goto unlink; if (f) { - vxlan_fdb_insert(vxlan, all_zeros_mac, - vxlan->default_dst.remote_vni, f); + vxlan_fdb_insert(vxlan, all_zeros_mac, dst->remote_vni, f); /* notify default fdb entry */ err = vxlan_fdb_notify(vxlan, f, first_remote_rtnl(f), RTM_NEWNEIGH, true, extack); if (err) { vxlan_fdb_destroy(vxlan, f, false, false); + if (remote_dev) + netdev_upper_dev_unlink(remote_dev, dev); goto unregister; } } list_add(&vxlan->next, &vn->vxlan_list); + if (remote_dev) { + dst->remote_dev = remote_dev; + dev_hold(remote_dev); + } return 0; - +unlink: + if (remote_dev) + netdev_upper_dev_unlink(remote_dev, dev); errout: /* unregister_netdevice() destroys the default FDB entry with deletion * notification. 
But the addition notification was not sent yet, so @@ -3936,6 +3955,8 @@ static int vxlan_changelink(struct net_device *dev, struct nlattr *tb[], struct net_device *lowerdev; struct vxlan_config conf; int err; + bool linked = false; + bool disabled = false; err = vxlan_nl2conf(tb, data, dev, &conf, true, extack); if (err) @@ -3946,6 +3967,16 @@ static int vxlan_changelink(struct net_device *dev, struct nlattr *tb[], if (err) return err; + if (lowerdev) { + if (dst->remote_dev && lowerdev != dst->remote_dev) { + netdev_adjacent_dev_disable(dst->remote_dev, dev); + disabled = true; + } + err = netdev_upper_dev_link(lowerdev, dev, extack); + if (err) + goto err; + linked = true; + } /* handle default dst entry */ if (!vxlan_addr_equal(&conf.remote_ip, &dst->remote_ip)) { u32 hash_index = fdb_head_index(vxlan, all_zeros_mac, conf.vni); @@ -3962,7 +3993,7 @@ static int vxlan_changelink(struct net_device *dev, struct nlattr *tb[], NTF_SELF, true, extack); if (err) { spin_unlock_bh(&vxlan->hash_lock[hash_index]); - return err; + goto err; } } if (!vxlan_addr_any(&dst->remote_ip)) @@ -3979,8 +4010,24 @@ static int vxlan_changelink(struct net_device *dev, struct nlattr *tb[], if (conf.age_interval != vxlan->cfg.age_interval) mod_timer(&vxlan->age_timer, jiffies); + if (disabled) { + netdev_adjacent_dev_enable(dst->remote_dev, dev); + netdev_upper_dev_unlink(dst->remote_dev, dev); + dev_put(dst->remote_dev); + } + if (linked) { + dst->remote_dev = lowerdev; + dev_hold(dst->remote_dev); + } + vxlan_config_apply(dev, &conf, lowerdev, vxlan->net, true); return 0; +err: + if (linked) + netdev_upper_dev_unlink(lowerdev, dev); + if (disabled) + netdev_adjacent_dev_enable(dst->remote_dev, dev); + return err; } static void vxlan_dellink(struct net_device *dev, struct list_head *head) @@ -3991,6 +4038,10 @@ static void vxlan_dellink(struct net_device *dev, struct list_head *head) list_del(&vxlan->next); unregister_netdevice_queue(dev, head); + if (vxlan->default_dst.remote_dev) { + 
netdev_upper_dev_unlink(vxlan->default_dst.remote_dev, dev); + dev_put(vxlan->default_dst.remote_dev); + } } static size_t vxlan_get_size(const struct net_device *dev) diff --git a/include/net/vxlan.h b/include/net/vxlan.h index dc1583a1fb8a..08e237d7aa73 100644 --- a/include/net/vxlan.h +++ b/include/net/vxlan.h @@ -197,6 +197,7 @@ struct vxlan_rdst { u8 offloaded:1; __be32 remote_vni; u32 remote_ifindex; + struct net_device *remote_dev; struct list_head list; struct rcu_head rcu; struct dst_cache dst_cache; From patchwork Mon Sep 16 13:48:02 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taehee Yoo X-Patchwork-Id: 1162833 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="YGWvbye6"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 46X7003Ddlz9sQr for ; Mon, 16 Sep 2019 23:49:36 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388094AbfIPNtf (ORCPT ); Mon, 16 Sep 2019 09:49:35 -0400 Received: from mail-pl1-f193.google.com ([209.85.214.193]:46415 "EHLO mail-pl1-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1733085AbfIPNte (ORCPT ); Mon, 16 Sep 2019 09:49:34 -0400 Received: by mail-pl1-f193.google.com with SMTP id q24so485718plr.13 for ; Mon, 16 Sep 2019 06:49:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; 
h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=MaGtkcTYn5438TWZYngSX6f8mx/v5lk6kO2v8T8GHbQ=; b=YGWvbye6rKJVan5txyqdVxAP4B6xKYUUGWQb+kmcJwlJ0n5MGAfss/2TVhcs+hWH+D JomD5LGDsLyINV+sW30oREt2hIyDc1tX9Ycg3ZkXIR+8qhV0Hxgo8PL6d5k6z9RSYPcE rSj3NM9Zva/8oMH0BdyCDYz+XbywNb/w5kD84t3Xql4P0O4A9nzeWmF6vKDIdSBJKeyX Kmp5UZD5NHjdxk4c0GFbpnhDQbxeQkgeGF87fbYzzxAOgZ63a27Ae9PZ6zMd4kmP1H10 buGAufAgQ/pnvizp3C4Q0eYg7hZnE+9Ij+NhwXC1yJUB+BaZ0l8WA6vCxsab9zjE84gi ovmg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=MaGtkcTYn5438TWZYngSX6f8mx/v5lk6kO2v8T8GHbQ=; b=n/shEDxfUzVQOmJUg8l58jCS8enfNrDiP6nM6sRrlUWjM6nf3fctpu9AFzwn2dQOmQ wkczlCuQSOyWXLxOmkqq3lJA37XtFOuLUYKDZTJTgxoCiNPtCLywR+obENLehk0RHE5l DmABOMsl15yI19TZQE7Yhc9ot2zOnWkhlqnt2RiIYIHOHZiBxoXD3L4qWUlko6pH+oiw 3ux7smyh9kOuLHbfSE1UqDNQXb32d/J6VGXTyX9HyrieuXBcxpvtFW+g4h/tlVbtUyoc 0IGmQBwe1bhIGdDL6qTUHlphQp3lXPmssNHMkBpiKOVaqwwyDSTVDQqz7KkcyWLGrIJz kIqw== X-Gm-Message-State: APjAAAWMoD+K/n3qz1hldZJ7J5CnnUhuXuaLqXDS5okWb08/JZi3UrS+ 0XwajrDwn5Nl+rfNDiHsZRU= X-Google-Smtp-Source: APXvYqxeIQNy2Ib+SCJ3K1yUWpCbhSdUxu4myEl9OXIPNv/ATt0VfszjWgkNH6XSoaCGGgc1t67GnQ== X-Received: by 2002:a17:902:7485:: with SMTP id h5mr16804068pll.240.1568641773378; Mon, 16 Sep 2019 06:49:33 -0700 (PDT) Received: from ap-To-be-filled-by-O-E-M.1.1.1.1 ([14.33.120.60]) by smtp.gmail.com with ESMTPSA id z20sm2822266pjn.12.2019.09.16.06.49.27 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 16 Sep 2019 06:49:32 -0700 (PDT) From: Taehee Yoo To: davem@davemloft.net, netdev@vger.kernel.org, j.vosburgh@gmail.com, vfalico@gmail.com, andy@greyhouse.net, jiri@resnulli.us, sd@queasysnail.net, roopa@cumulusnetworks.com, saeedm@mellanox.com, manishc@marvell.com, rahulv@marvell.com, kys@microsoft.com, haiyangz@microsoft.com, stephen@networkplumber.org, sashal@kernel.org, hare@suse.de, varun@chelsio.com, ubraun@linux.ibm.com, 
kgraul@linux.ibm.com, jay.vosburgh@canonical.com Cc: ap420073@gmail.com Subject: [PATCH net v3 11/11] net: remove unnecessary variables and callback Date: Mon, 16 Sep 2019 22:48:02 +0900 Message-Id: <20190916134802.8252-12-ap420073@gmail.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190916134802.8252-1-ap420073@gmail.com> References: <20190916134802.8252-1-ap420073@gmail.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org

This patch removes the variables and the callback that are related to the nested device structure.

Devices that can be nested each have their own nest_level variable that represents the depth of nested devices. A previous patch added the new {lower/upper}_level variables, which replace the old private nest_level variables. So, this patch removes all 'nest_level' variables.

In order to avoid lockdep warnings, ->ndo_get_lock_subclass() was added to get the lockdep subclass value, which is actually the lower nested depth value. But now these devices use dynamic lockdep keys instead of the subclass to avoid lockdep warnings. So, this patch removes the ->ndo_get_lock_subclass() callback.
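
For illustration only, the difference between recomputing the nesting depth recursively (as the removed dev_get_nest_level() does) and reading a cached per-device value can be sketched in userspace C. This is not kernel code: struct toy_dev, toy_get_nest_level() and toy_link() are hypothetical names, the cached field starts at 0 here so that it matches the old helper's numbering (the base value of the kernel's lower_level field may differ), and the sketch does not propagate updates to devices further up the stack.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_LOWER 4

/* Hypothetical stand-in for struct net_device; not the kernel type. */
struct toy_dev {
	struct toy_dev *lower[TOY_MAX_LOWER];	/* directly linked lower devices */
	size_t n_lower;
	unsigned int lower_level;		/* cached depth of the lower chain */
};

/* Old style: walk all lower devices recursively on every query. */
static int toy_get_nest_level(const struct toy_dev *dev)
{
	int max_nest = -1;
	size_t i;

	for (i = 0; i < dev->n_lower; i++) {
		int nest = toy_get_nest_level(dev->lower[i]);

		if (max_nest < nest)
			max_nest = nest;
	}
	return max_nest + 1;
}

/* New style: update the cached depth once, at link time. */
static int toy_link(struct toy_dev *upper, struct toy_dev *lower)
{
	assert(upper && lower);

	if (upper->n_lower == TOY_MAX_LOWER)
		return -1;	/* no room for another lower device */

	upper->lower[upper->n_lower++] = lower;
	if (upper->lower_level < lower->lower_level + 1)
		upper->lower_level = lower->lower_level + 1;
	return 0;
}
```

With a chain built bottom-up through toy_link(), a caller can read upper->lower_level directly instead of calling the recursive helper, which is the kind of substitution the hunks below make.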
Signed-off-by: Taehee Yoo --- v2 -> v3 : - This patch is not changed v1 -> v2 : - This patch is not changed drivers/net/bonding/bond_alb.c | 2 +- drivers/net/bonding/bond_main.c | 14 ------------- .../net/ethernet/mellanox/mlx5/core/en_tc.c | 2 +- drivers/net/macsec.c | 9 --------- drivers/net/macvlan.c | 7 ------- include/linux/if_macvlan.h | 1 - include/linux/if_vlan.h | 12 ----------- include/linux/netdevice.h | 12 ----------- include/net/bonding.h | 1 - net/8021q/vlan.c | 1 - net/8021q/vlan_dev.c | 6 ------ net/core/dev.c | 20 ------------------- net/core/dev_addr_lists.c | 12 +++++------ net/smc/smc_core.c | 2 +- net/smc/smc_pnet.c | 2 +- 15 files changed, 10 insertions(+), 93 deletions(-) diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c index 8c79bad2a9a5..4f2e6910c623 100644 --- a/drivers/net/bonding/bond_alb.c +++ b/drivers/net/bonding/bond_alb.c @@ -952,7 +952,7 @@ static int alb_upper_dev_walk(struct net_device *upper, void *_data) struct bond_vlan_tag *tags; if (is_vlan_dev(upper) && - bond->nest_level == vlan_get_encap_level(upper) - 1) { + bond->dev->lower_level == upper->lower_level - 1) { if (upper->addr_assign_type == NET_ADDR_STOLEN) { alb_send_lp_vid(slave, mac_addr, vlan_dev_vlan_proto(upper), diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 7f574e74ed78..69eb61466fbe 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -1733,8 +1733,6 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev, goto err_upper_unlink; } - bond->nest_level = dev_get_nest_level(bond_dev) + 1; - /* If the mode uses primary, then the following is handled by * bond_change_active_slave(). 
*/ @@ -1983,9 +1981,6 @@ static int __bond_release_one(struct net_device *bond_dev, if (!bond_has_slaves(bond)) { bond_set_carrier(bond); eth_hw_addr_random(bond_dev); - bond->nest_level = SINGLE_DEPTH_NESTING; - } else { - bond->nest_level = dev_get_nest_level(bond_dev) + 1; } unblock_netpoll_tx(); @@ -3472,13 +3467,6 @@ static void bond_fold_stats(struct rtnl_link_stats64 *_res, } } -static int bond_get_nest_level(struct net_device *bond_dev) -{ - struct bonding *bond = netdev_priv(bond_dev); - - return bond->nest_level; -} - static void bond_get_stats(struct net_device *bond_dev, struct rtnl_link_stats64 *stats) { @@ -4298,7 +4286,6 @@ static const struct net_device_ops bond_netdev_ops = { .ndo_neigh_setup = bond_neigh_setup, .ndo_vlan_rx_add_vid = bond_vlan_rx_add_vid, .ndo_vlan_rx_kill_vid = bond_vlan_rx_kill_vid, - .ndo_get_lock_subclass = bond_get_nest_level, #ifdef CONFIG_NET_POLL_CONTROLLER .ndo_netpoll_setup = bond_netpoll_setup, .ndo_netpoll_cleanup = bond_netpoll_cleanup, @@ -4822,7 +4809,6 @@ static int bond_init(struct net_device *bond_dev) if (!bond->wq) return -ENOMEM; - bond->nest_level = SINGLE_DEPTH_NESTING; bond_dev_set_lockdep_class(bond_dev); list_add_tail(&bond->bond_list, &bn->dev_list); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index 00b2d4a86159..e056f9aad8df 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -2797,7 +2797,7 @@ static int add_vlan_pop_action(struct mlx5e_priv *priv, struct mlx5_esw_flow_attr *attr, u32 *action) { - int nest_level = vlan_get_encap_level(attr->parse_attr->filter_dev); + int nest_level = attr->parse_attr->filter_dev->lower_level; struct flow_action_entry vlan_act = { .id = FLOW_ACTION_VLAN_POP, }; diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c index 41ec1ed0d545..c0cb595f2bba 100644 --- a/drivers/net/macsec.c +++ b/drivers/net/macsec.c @@ -269,7 +269,6 @@ struct 
macsec_dev { struct gro_cells gro_cells; struct lock_class_key xmit_lock_key; struct lock_class_key addr_lock_key; - unsigned int nest_level; }; /** @@ -2988,11 +2987,6 @@ static int macsec_get_iflink(const struct net_device *dev) return macsec_priv(dev)->real_dev->ifindex; } -static int macsec_get_nest_level(struct net_device *dev) -{ - return macsec_priv(dev)->nest_level; -} - static const struct net_device_ops macsec_netdev_ops = { .ndo_init = macsec_dev_init, .ndo_uninit = macsec_dev_uninit, @@ -3006,7 +3000,6 @@ static const struct net_device_ops macsec_netdev_ops = { .ndo_start_xmit = macsec_start_xmit, .ndo_get_stats64 = macsec_get_stats64, .ndo_get_iflink = macsec_get_iflink, - .ndo_get_lock_subclass = macsec_get_nest_level, }; static const struct device_type macsec_type = { @@ -3289,8 +3282,6 @@ static int macsec_newlink(struct net *net, struct net_device *dev, if (err < 0) return err; - macsec->nest_level = dev_get_nest_level(real_dev) + 1; - err = netdev_upper_dev_link(real_dev, dev, extack); if (err < 0) goto unregister; diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c index dae368a2e8d1..2c14bc606514 100644 --- a/drivers/net/macvlan.c +++ b/drivers/net/macvlan.c @@ -867,11 +867,6 @@ static int macvlan_do_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd) #define MACVLAN_STATE_MASK \ ((1<<__LINK_STATE_NOCARRIER) | (1<<__LINK_STATE_DORMANT)) -static int macvlan_get_nest_level(struct net_device *dev) -{ - return ((struct macvlan_dev *)netdev_priv(dev))->nest_level; -} - static void macvlan_dev_set_lockdep_one(struct net_device *dev, struct netdev_queue *txq, void *_unused) @@ -1180,7 +1175,6 @@ static const struct net_device_ops macvlan_netdev_ops = { .ndo_fdb_add = macvlan_fdb_add, .ndo_fdb_del = macvlan_fdb_del, .ndo_fdb_dump = ndo_dflt_fdb_dump, - .ndo_get_lock_subclass = macvlan_get_nest_level, #ifdef CONFIG_NET_POLL_CONTROLLER .ndo_poll_controller = macvlan_dev_poll_controller, .ndo_netpoll_setup = macvlan_dev_netpoll_setup, @@ 
-1464,7 +1458,6 @@ int macvlan_common_newlink(struct net *src_net, struct net_device *dev, vlan->dev = dev; vlan->port = port; vlan->set_features = MACVLAN_FEATURES; - vlan->nest_level = dev_get_nest_level(lowerdev) + 1; vlan->mode = MACVLAN_MODE_VEPA; if (data && data[IFLA_MACVLAN_MODE]) diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h index ea5b41823287..e9202edcf101 100644 --- a/include/linux/if_macvlan.h +++ b/include/linux/if_macvlan.h @@ -29,7 +29,6 @@ struct macvlan_dev { netdev_features_t set_features; enum macvlan_mode mode; u16 flags; - int nest_level; unsigned int macaddr_count; struct lock_class_key xmit_lock_key; struct lock_class_key addr_lock_key; diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h index 1aed9f613e90..6f30284a58e5 100644 --- a/include/linux/if_vlan.h +++ b/include/linux/if_vlan.h @@ -182,8 +182,6 @@ struct vlan_dev_priv { #ifdef CONFIG_NET_POLL_CONTROLLER struct netpoll *netpoll; #endif - unsigned int nest_level; - struct lock_class_key xmit_lock_key; struct lock_class_key addr_lock_key; }; @@ -224,11 +222,6 @@ extern void vlan_vids_del_by_dev(struct net_device *dev, extern bool vlan_uses_dev(const struct net_device *dev); -static inline int vlan_get_encap_level(struct net_device *dev) -{ - BUG_ON(!is_vlan_dev(dev)); - return vlan_dev_priv(dev)->nest_level; -} #else static inline struct net_device * __vlan_find_dev_deep_rcu(struct net_device *real_dev, @@ -298,11 +291,6 @@ static inline bool vlan_uses_dev(const struct net_device *dev) { return false; } -static inline int vlan_get_encap_level(struct net_device *dev) -{ - BUG(); - return 0; -} #endif /** diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 4506810c301b..31283b86a1e2 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1408,7 +1408,6 @@ struct net_device_ops { void (*ndo_dfwd_del_station)(struct net_device *pdev, void *priv); - int (*ndo_get_lock_subclass)(struct net_device *dev); int 
(*ndo_set_tx_maxrate)(struct net_device *dev, int queue_index, u32 maxrate); @@ -4047,16 +4046,6 @@ static inline void netif_addr_lock(struct net_device *dev) spin_lock(&dev->addr_list_lock); } -static inline void netif_addr_lock_nested(struct net_device *dev) -{ - int subclass = SINGLE_DEPTH_NESTING; - - if (dev->netdev_ops->ndo_get_lock_subclass) - subclass = dev->netdev_ops->ndo_get_lock_subclass(dev); - - spin_lock_nested(&dev->addr_list_lock, subclass); -} - static inline void netif_addr_lock_bh(struct net_device *dev) { spin_lock_bh(&dev->addr_list_lock); @@ -4334,7 +4323,6 @@ void netdev_lower_state_changed(struct net_device *lower_dev, extern u8 netdev_rss_key[NETDEV_RSS_KEY_LEN] __read_mostly; void netdev_rss_key_fill(void *buffer, size_t len); -int dev_get_nest_level(struct net_device *dev); int skb_checksum_help(struct sk_buff *skb); int skb_crc32c_csum_help(struct sk_buff *skb); int skb_csum_hwoffload_help(struct sk_buff *skb, diff --git a/include/net/bonding.h b/include/net/bonding.h index c39ac7061e41..74f41dd73866 100644 --- a/include/net/bonding.h +++ b/include/net/bonding.h @@ -203,7 +203,6 @@ struct bonding { struct slave __rcu *primary_slave; struct bond_up_slave __rcu *slave_arr; /* Array of usable slaves */ bool force_primary; - u32 nest_level; s32 slave_cnt; /* never change this value outside the attach/detach wrappers */ int (*recv_probe)(const struct sk_buff *, struct bonding *, struct slave *); diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c index 54728d2eda18..d4bcfd8f95bf 100644 --- a/net/8021q/vlan.c +++ b/net/8021q/vlan.c @@ -172,7 +172,6 @@ int register_vlan_dev(struct net_device *dev, struct netlink_ext_ack *extack) if (err < 0) goto out_uninit_mvrp; - vlan->nest_level = dev_get_nest_level(real_dev) + 1; err = register_netdevice(dev); if (err < 0) goto out_uninit_mvrp; diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c index 12bc80650087..e8707827540c 100644 --- a/net/8021q/vlan_dev.c +++ b/net/8021q/vlan_dev.c @@ -514,11 
+514,6 @@ static void vlan_dev_set_lockdep_class(struct net_device *dev) netdev_for_each_tx_queue(dev, vlan_dev_set_lockdep_one, NULL); } -static int vlan_dev_get_lock_subclass(struct net_device *dev) -{ - return vlan_dev_priv(dev)->nest_level; -} - static const struct header_ops vlan_header_ops = { .create = vlan_dev_hard_header, .parse = eth_header_parse, @@ -814,7 +809,6 @@ static const struct net_device_ops vlan_netdev_ops = { .ndo_netpoll_cleanup = vlan_dev_netpoll_cleanup, #endif .ndo_fix_features = vlan_dev_fix_features, - .ndo_get_lock_subclass = vlan_dev_get_lock_subclass, .ndo_get_iflink = vlan_dev_get_iflink, }; diff --git a/net/core/dev.c b/net/core/dev.c index 12d76b983064..5b14a70bea49 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -7624,26 +7624,6 @@ void *netdev_lower_dev_get_private(struct net_device *dev, } EXPORT_SYMBOL(netdev_lower_dev_get_private); - -int dev_get_nest_level(struct net_device *dev) -{ - struct net_device *lower = NULL; - struct list_head *iter; - int max_nest = -1; - int nest; - - ASSERT_RTNL(); - - netdev_for_each_lower_dev(dev, lower, iter) { - nest = dev_get_nest_level(lower); - if (max_nest < nest) - max_nest = nest; - } - - return max_nest + 1; -} -EXPORT_SYMBOL(dev_get_nest_level); - /** * netdev_lower_change - Dispatch event about lower device state change * @lower_dev: device diff --git a/net/core/dev_addr_lists.c b/net/core/dev_addr_lists.c index 6393ba930097..2f949b5a1eb9 100644 --- a/net/core/dev_addr_lists.c +++ b/net/core/dev_addr_lists.c @@ -637,7 +637,7 @@ int dev_uc_sync(struct net_device *to, struct net_device *from) if (to->addr_len != from->addr_len) return -EINVAL; - netif_addr_lock_nested(to); + netif_addr_lock(to); err = __hw_addr_sync(&to->uc, &from->uc, to->addr_len); if (!err) __dev_set_rx_mode(to); @@ -667,7 +667,7 @@ int dev_uc_sync_multiple(struct net_device *to, struct net_device *from) if (to->addr_len != from->addr_len) return -EINVAL; - netif_addr_lock_nested(to); + netif_addr_lock(to); err = 
__hw_addr_sync_multiple(&to->uc, &from->uc, to->addr_len); if (!err) __dev_set_rx_mode(to); @@ -691,7 +691,7 @@ void dev_uc_unsync(struct net_device *to, struct net_device *from) return; netif_addr_lock_bh(from); - netif_addr_lock_nested(to); + netif_addr_lock(to); __hw_addr_unsync(&to->uc, &from->uc, to->addr_len); __dev_set_rx_mode(to); netif_addr_unlock(to); @@ -858,7 +858,7 @@ int dev_mc_sync(struct net_device *to, struct net_device *from) if (to->addr_len != from->addr_len) return -EINVAL; - netif_addr_lock_nested(to); + netif_addr_lock(to); err = __hw_addr_sync(&to->mc, &from->mc, to->addr_len); if (!err) __dev_set_rx_mode(to); @@ -888,7 +888,7 @@ int dev_mc_sync_multiple(struct net_device *to, struct net_device *from) if (to->addr_len != from->addr_len) return -EINVAL; - netif_addr_lock_nested(to); + netif_addr_lock(to); err = __hw_addr_sync_multiple(&to->mc, &from->mc, to->addr_len); if (!err) __dev_set_rx_mode(to); @@ -912,7 +912,7 @@ void dev_mc_unsync(struct net_device *to, struct net_device *from) return; netif_addr_lock_bh(from); - netif_addr_lock_nested(to); + netif_addr_lock(to); __hw_addr_unsync(&to->mc, &from->mc, to->addr_len); __dev_set_rx_mode(to); netif_addr_unlock(to); diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c index 4ca50ddf8d16..a2e91b8d04b3 100644 --- a/net/smc/smc_core.c +++ b/net/smc/smc_core.c @@ -558,7 +558,7 @@ int smc_vlan_by_tcpsk(struct socket *clcsock, struct smc_init_info *ini) } rtnl_lock(); - nest_lvl = dev_get_nest_level(ndev); + nest_lvl = ndev->lower_level; for (i = 0; i < nest_lvl; i++) { struct list_head *lower = &ndev->adj_list.lower; diff --git a/net/smc/smc_pnet.c b/net/smc/smc_pnet.c index bab2da8cf17a..2920b006f65c 100644 --- a/net/smc/smc_pnet.c +++ b/net/smc/smc_pnet.c @@ -718,7 +718,7 @@ static struct net_device *pnet_find_base_ndev(struct net_device *ndev) int i, nest_lvl; rtnl_lock(); - nest_lvl = dev_get_nest_level(ndev); + nest_lvl = ndev->lower_level; for (i = 0; i < nest_lvl; i++) { struct list_head 
*lower = &ndev->adj_list.lower;