From patchwork Mon Nov 30 08:25:27 2009
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Eric W. Biederman"
X-Patchwork-Id: 39788
X-Patchwork-Delegate: davem@davemloft.net
Return-Path:
X-Original-To: patchwork-incoming@ozlabs.org
Delivered-To: patchwork-incoming@ozlabs.org
Received: from vger.kernel.org (vger.kernel.org [209.132.176.167])
 by ozlabs.org (Postfix) with ESMTP id 140EAB6EEC
 for ; Mon, 30 Nov 2009 19:26:10 +1100 (EST)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S1753994AbZK3IZk (ORCPT ); Mon, 30 Nov 2009 03:25:40 -0500
Received: (majordomo@vger.kernel.org) by vger.kernel.org
 id S1753416AbZK3IZi (ORCPT ); Mon, 30 Nov 2009 03:25:38 -0500
Received: from out01.mta.xmission.com ([166.70.13.231]:36904
 "EHLO out01.mta.xmission.com" rhost-flags-OK-OK-OK-OK)
 by vger.kernel.org with ESMTP id S1752563AbZK3IZd (ORCPT );
 Mon, 30 Nov 2009 03:25:33 -0500
Received: from in01.mta.xmission.com ([166.70.13.51])
 by out01.mta.xmission.com with esmtps (TLSv1:AES256-SHA:256) (Exim 4.69)
 (envelope-from ) id 1NF1l8-0006AA-IX; Mon, 30 Nov 2009 01:37:26 -0700
Received: from c-76-21-114-89.hsd1.ca.comcast.net ([76.21.114.89]
 helo=fess.ebiederm.org) by in01.mta.xmission.com with esmtpsa
 (TLSv1:AES256-SHA:256) (Exim 4.69) (envelope-from )
 id 1NF1Ye-0004Ub-Cn; Mon, 30 Nov 2009 01:24:35 -0700
Received: from fess.ebiederm.org (localhost [127.0.0.1])
 by fess.ebiederm.org (8.14.3/8.14.3/Debian-4) with ESMTP id nAU8PYMc030645;
 Mon, 30 Nov 2009 00:25:34 -0800
Received: (from eric@localhost)
 by fess.ebiederm.org (8.14.3/8.14.3/Submit) id nAU8PXPI030644;
 Mon, 30 Nov 2009 00:25:33 -0800
From: "Eric W. Biederman"
To: David Miller
Cc: , jamal , Daniel Lezcano , Alexey Dobriyan , Patrick McHardy ,
 "Eric W. Biederman" , "Eric W. Biederman"
Date: Mon, 30 Nov 2009 00:25:27 -0800
Message-Id: <1259569530-30609-2-git-send-email-ebiederm@xmission.com>
X-Mailer: git-send-email 1.6.5.2.143.g8cc62
In-Reply-To:
References:
X-XM-SPF: eid=; ; ; mid=; ; ; hst=in01.mta.xmission.com; ; ;
 ip=76.21.114.89; ; ; frm=ebiederm@xmission.com; ; ; spf=neutral
X-SA-Exim-Connect-IP: 76.21.114.89
X-SA-Exim-Mail-From: ebiederm@xmission.com
X-Spam-DCC: XMission; sa01 1397; Body=1 Fuz1=1 Fuz2=1
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on sa01.xmission.com
X-Spam-Level:
X-Spam-Status: No, score=-2.5 required=8.0 tests=ALL_TRUSTED,BAYES_00,
 DCC_CHECK_NEGATIVE, FVGT_m_MULTI_ODD, T_TM2_M_HEADER_IN_MSG,
 T_TooManySym_01, UNTRUSTED_Relay, XMNoVowels, XM_SPF_Neutral
 autolearn=disabled version=3.2.5
X-Spam-Combo: ;David Miller
X-Spam-Relay-Country:
X-Spam-Report:
 * -1.8 ALL_TRUSTED Passed through trusted hosts only via SMTP
 *  1.5 XMNoVowels Alpha-numberic number with no vowels
 *  0.0 T_TM2_M_HEADER_IN_MSG BODY: T_TM2_M_HEADER_IN_MSG
 * -3.0 BAYES_00 BODY: Bayesian spam probability is 0 to 1%
 *      [score: 0.0000]
 * -0.0 DCC_CHECK_NEGATIVE Not listed in DCC
 *      [sa01 1397; Body=1 Fuz1=1 Fuz2=1]
 *  0.4 FVGT_m_MULTI_ODD Contains multiple odd letter combinations
 *  0.0 T_TooManySym_01 4+ unique symbols in subject
 *  0.0 XM_SPF_Neutral SPF-Neutral
 *  0.4 UNTRUSTED_Relay Comes from a non-trusted relay
Subject: [PATCH 03/20] net: Batch network namespace destruction.
X-SA-Exim-Version: 4.2.1 (built Thu, 25 Oct 2007 00:26:12 +0000)
X-SA-Exim-Scanned: Yes (on in01.mta.xmission.com)
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

From: Eric W. Biederman

It is fairly common to kill several network namespaces at once, either
because they are nested one inside the other or because they are
cooperating in multi-machine networking experiments.  As the network
stack control logic does not parallelize easily, batch up the cleanup
of multiple network namespaces exiting together.

To get the full benefit of batching, the virtual network devices to be
removed must all be removed in one batch.  For that purpose I have
added a loop, run after the last network device operations have run,
that batches up all remaining network devices and deletes them.

An extra benefit is that the reorganization slightly shrinks the per
network namespace data structures, replacing a work_struct with a
list_head.

In a trivial test with 4K namespaces this change reduced the cost of
destroying 4K namespaces from 7+ minutes (at 12% cpu) to 44 seconds
(at 60% cpu).  The bulk of that 44s was spent in inet_twsk_purge.

Signed-off-by: Eric W. Biederman
---
 include/net/net_namespace.h |    2 +-
 net/core/net_namespace.c    |   66 +++++++++++++++++++++++++++++++++++++-----
 2 files changed, 59 insertions(+), 9 deletions(-)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 0addd45..d69b479 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -42,7 +42,7 @@ struct net {
 						 */
 #endif
 	struct list_head	list;		/* list of network namespaces */
-	struct work_struct	work;		/* work struct for freeing */
+	struct list_head	cleanup_list;	/* namespaces on death row */

 	struct proc_dir_entry	*proc_net;
 	struct proc_dir_entry	*proc_net_stat;
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 86ed7f4..a42caa2 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -8,8 +8,10 @@
 #include
 #include
 #include
+#include
 #include
 #include
+#include

 /*
  *	Our network namespace constructor/destructor lists
@@ -27,6 +29,20 @@ EXPORT_SYMBOL(init_net);

 #define INITIAL_NET_GEN_PTRS	13 /* +1 for len +2 for rcu_head */

+static void unregister_netdevices(struct net *net, struct list_head *list)
+{
+	struct net_device *dev;
+	/* At exit all network devices must be removed from a network
+	 * namespace.  Do this in the reverse order of registration.
+	 */
+	for_each_netdev_reverse(net, dev) {
+		if (dev->rtnl_link_ops)
+			dev->rtnl_link_ops->dellink(dev, list);
+		else
+			unregister_netdevice_queue(dev, list);
+	}
+}
+
 /*
  * setup_net runs the initializers for the network namespace object.
  */
@@ -59,6 +75,13 @@ out_undo:
 	list_for_each_entry_continue_reverse(ops, &pernet_list, list) {
 		if (ops->exit)
 			ops->exit(net);
+		if (&ops->list == first_device) {
+			LIST_HEAD(dev_kill_list);
+			rtnl_lock();
+			unregister_netdevices(net, &dev_kill_list);
+			unregister_netdevice_many(&dev_kill_list);
+			rtnl_unlock();
+		}
 	}

 	rcu_barrier();
@@ -147,18 +170,26 @@ struct net *copy_net_ns(unsigned long flags, struct net *old_net)
 	return net_create();
 }

+static DEFINE_SPINLOCK(cleanup_list_lock);
+static LIST_HEAD(cleanup_list);  /* Must hold cleanup_list_lock to touch */
+
 static void cleanup_net(struct work_struct *work)
 {
 	struct pernet_operations *ops;
-	struct net *net;
+	struct net *net, *tmp;
+	LIST_HEAD(net_kill_list);

-	net = container_of(work, struct net, work);
+	/* Atomically snapshot the list of namespaces to cleanup */
+	spin_lock_irq(&cleanup_list_lock);
+	list_replace_init(&cleanup_list, &net_kill_list);
+	spin_unlock_irq(&cleanup_list_lock);

 	mutex_lock(&net_mutex);

 	/* Don't let anyone else find us. */
 	rtnl_lock();
-	list_del_rcu(&net->list);
+	list_for_each_entry(net, &net_kill_list, cleanup_list)
+		list_del_rcu(&net->list);
 	rtnl_unlock();

 	/*
@@ -170,8 +201,18 @@ static void cleanup_net(struct work_struct *work)

 	/* Run all of the network namespace exit methods */
 	list_for_each_entry_reverse(ops, &pernet_list, list) {
-		if (ops->exit)
-			ops->exit(net);
+		if (ops->exit) {
+			list_for_each_entry(net, &net_kill_list, cleanup_list)
+				ops->exit(net);
+		}
+		if (&ops->list == first_device) {
+			LIST_HEAD(dev_kill_list);
+			rtnl_lock();
+			list_for_each_entry(net, &net_kill_list, cleanup_list)
+				unregister_netdevices(net, &dev_kill_list);
+			unregister_netdevice_many(&dev_kill_list);
+			rtnl_unlock();
+		}
 	}

 	mutex_unlock(&net_mutex);
@@ -182,14 +223,23 @@ static void cleanup_net(struct work_struct *work)
 	rcu_barrier();

 	/* Finally it is safe to free my network namespace structure */
-	net_free(net);
+	list_for_each_entry_safe(net, tmp, &net_kill_list, cleanup_list) {
+		list_del_init(&net->cleanup_list);
+		net_free(net);
+	}
 }
+static DECLARE_WORK(net_cleanup_work, cleanup_net);

 void __put_net(struct net *net)
 {
 	/* Cleanup the network namespace in process context */
-	INIT_WORK(&net->work, cleanup_net);
-	queue_work(netns_wq, &net->work);
+	unsigned long flags;
+
+	spin_lock_irqsave(&cleanup_list_lock, flags);
+	list_add(&net->cleanup_list, &cleanup_list);
+	spin_unlock_irqrestore(&cleanup_list_lock, flags);
+
+	queue_work(netns_wq, &net_cleanup_work);
 }
 EXPORT_SYMBOL_GPL(__put_net);
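
For readers who want to see the batching pattern outside the kernel, here
is a rough userspace sketch.  It is not kernel code and not part of the
patch.  It assumes a pthread mutex in place of the spinlock and a plain
singly linked list in place of list_head, and the names struct ns,
ns_put() and ns_cleanup_batch() are made up for illustration.  The idea
it tries to show is the same: producers only append to a shared pending
list, and the single worker detaches the whole list in one step and then
processes the batch without holding the lock.

```c
#include <assert.h>
#include <stddef.h>
#include <pthread.h>

/* Hypothetical stand-in for struct net: just enough state to batch. */
struct ns {
	int id;
	struct ns *next;	/* link on the pending cleanup list */
	int cleaned;		/* set once the "exit methods" have run */
};

static pthread_mutex_t pending_lock = PTHREAD_MUTEX_INITIALIZER;
static struct ns *pending;	/* must hold pending_lock to touch */

/* Producer side, loosely analogous to __put_net(): queue for cleanup. */
static void ns_put(struct ns *n)
{
	assert(n != NULL);
	pthread_mutex_lock(&pending_lock);
	n->next = pending;
	pending = n;
	pthread_mutex_unlock(&pending_lock);
}

/*
 * Worker side, loosely analogous to cleanup_net(): snapshot the whole
 * pending list under the lock, then walk the private batch unlocked.
 * Returns the number of objects cleaned up in this batch.
 */
static int ns_cleanup_batch(void)
{
	struct ns *batch, *n, *next;
	int count = 0;

	pthread_mutex_lock(&pending_lock);
	batch = pending;
	pending = NULL;
	pthread_mutex_unlock(&pending_lock);

	for (n = batch; n != NULL; n = next) {
		next = n->next;	/* save the link before clearing it */
		n->next = NULL;
		n->cleaned = 1;
		count++;
	}
	return count;
}
```

Calling ns_put() a few times and then ns_cleanup_batch() once handles
everything queued so far in a single pass, and a second call with
nothing queued returns 0.  Any expensive per-batch step only has to be
paid once per batch rather than once per object, which appears to be
where the patch gets its savings.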