From patchwork Fri Feb 20 16:02:57 2009
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Eric W. Biederman"
X-Patchwork-Id: 23492
X-Patchwork-Delegate: davem@davemloft.net
Return-Path:
X-Original-To: patchwork-incoming@ozlabs.org
Delivered-To: patchwork-incoming@ozlabs.org
Received: from vger.kernel.org (vger.kernel.org [209.132.176.167])
	by ozlabs.org (Postfix) with ESMTP id B678ADDE0E
	for ; Sat, 21 Feb 2009 03:02:45 +1100 (EST)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753297AbZBTQCg (ORCPT );
	Fri, 20 Feb 2009 11:02:36 -0500
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1753056AbZBTQCg (ORCPT );
	Fri, 20 Feb 2009 11:02:36 -0500
Received: from out02.mta.xmission.com ([166.70.13.232]:45845 "EHLO
	out02.mta.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752876AbZBTQCf (ORCPT );
	Fri, 20 Feb 2009 11:02:35 -0500
Received: from mx04.mta.xmission.com ([166.70.13.214])
	by out02.mta.xmission.com with esmtp (Exim 4.62)
	(envelope-from ) id 1LaXpi-0004D6-On;
	Fri, 20 Feb 2009 09:02:34 -0700
Received: from c-67-169-126-145.hsd1.ca.comcast.net ([67.169.126.145]
	helo=fess.ebiederm.org) by mx04.mta.xmission.com with esmtpsa
	(TLS-1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.63)
	(envelope-from ) id 1LaXpi-0005y1-1L;
	Fri, 20 Feb 2009 09:02:34 -0700
Received: from fess.ebiederm.org (localhost [127.0.0.1])
	by fess.ebiederm.org (8.14.3/8.14.3/Debian-4) with ESMTP id
	n1KG2w9m007966; Fri, 20 Feb 2009 08:02:58 -0800
Received: (from eric@localhost)
	by fess.ebiederm.org (8.14.3/8.14.3/Submit) id n1KG2vmx007965;
	Fri, 20 Feb 2009 08:02:57 -0800
X-Authentication-Warning: fess.ebiederm.org: eric set sender to
	ebiederm@xmission.com using -f
To: David Miller
Cc: Linux Containers , , Alexey Dobriyan , "Denis V. Lunev"
References:
From: ebiederm@xmission.com (Eric W. Biederman)
Date: Fri, 20 Feb 2009 08:02:57 -0800
In-Reply-To: (Eric W. Biederman's message of "Fri\, 20 Feb 2009 07\:53\:51
	-0800")
Message-ID:
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.2 (gnu/linux)
MIME-Version: 1.0
X-XM-SPF: eid=; ; ; mid=; ; ; hst=mx04.mta.xmission.com; ; ;
	ip=67.169.126.145; ; ; frm=ebiederm@xmission.com; ; ; spf=neutral
X-SA-Exim-Connect-IP: 67.169.126.145
X-SA-Exim-Rcpt-To: davem@davemloft.net, den@openvz.org, adobriyan@gmail.com,
	netdev@vger.kernel.org, containers@lists.osdl.org
X-SA-Exim-Mail-From: ebiederm@xmission.com
X-Spam-DCC: XMission; sa01 1397; Body=1 Fuz1=1 Fuz2=1
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on sa01.xmission.com
X-Spam-Level:
X-Spam-Status: No, score=-3.9 required=8.0 tests=ALL_TRUSTED,BAYES_00,
	DCC_CHECK_NEGATIVE, XM_Body_Dirty_Words,
	XM_SPF_Neutral autolearn=disabled version=3.2.5
X-Spam-Combo: ;David Miller
X-Spam-Relay-Country:
X-Spam-Report:
	* -1.8 ALL_TRUSTED Passed through trusted hosts only via SMTP
	* -2.6 BAYES_00 BODY: Bayesian spam probability is 0 to 1%
	*      [score: 0.0000]
	* -0.0 DCC_CHECK_NEGATIVE Not listed in DCC
	*      [sa01 1397; Body=1 Fuz1=1 Fuz2=1]
	*  0.5 XM_Body_Dirty_Words Contains a dirty word
	*  0.0 XM_SPF_Neutral SPF-Neutral
Subject: [PATCH 3/3] netns: Remove net_alive
X-SA-Exim-Version: 4.2.1 (built Thu, 07 Dec 2006 04:40:56 +0000)
X-SA-Exim-Scanned: Yes (on mx04.mta.xmission.com)
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

It turns out that net_alive is unnecessary, and the original problem
that led to it being added was simply that the icmp code thought
it was a network device and wound up being unable to handle packets
while there were still packets in the network namespace.

Now that icmp and tcp have been fixed to properly register themselves
this problem is no longer present and we have a stronger guarantee
that packets will not arrive in a network namespace than that provided
by net_alive in netif_receive_skb. So remove net_alive, allowing
packet reception to run a little faster.

Additionally document the strong reason why network namespace cleanup
is safe so that if something happens again someone else will have
a chance of figuring it out.

Signed-off-by: Eric W. Biederman
---
 include/net/net_namespace.h |   27 +++++++++++++++++----------
 net/core/dev.c              |    6 ------
 net/core/net_namespace.c    |    3 ---
 3 files changed, 17 insertions(+), 19 deletions(-)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 6fc13d9..ded434b 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -109,11 +109,6 @@ extern struct list_head net_namespace_list;
 #ifdef CONFIG_NET_NS
 extern void __put_net(struct net *net);
 
-static inline int net_alive(struct net *net)
-{
-	return net && atomic_read(&net->count);
-}
-
 static inline struct net *get_net(struct net *net)
 {
 	atomic_inc(&net->count);
@@ -145,11 +140,6 @@ int net_eq(const struct net *net1, const struct net *net2)
 }
 #else
 
-static inline int net_alive(struct net *net)
-{
-	return 1;
-}
-
 static inline struct net *get_net(struct net *net)
 {
 	return net;
@@ -234,6 +224,23 @@ struct pernet_operations {
 	void (*exit)(struct net *net);
 };
 
+/*
+ * Use these carefully. If you implement a network device and it
+ * needs per network namespace operations use device pernet operations,
+ * otherwise use pernet subsys operations.
+ *
+ * This is critically important. Most of the network code cleanup
+ * runs with the assumption that dev_remove_pack has been called so no
+ * new packets will arrive during and after the cleanup functions have
+ * been called. dev_remove_pack is not per namespace so instead the
+ * guarantee of no more packets arriving in a network namespace is
+ * provided by ensuring that all network devices and all sockets have
+ * left the network namespace before the cleanup methods are called.
+ *
+ * For the longest time the ipv4 icmp code was registered as a pernet
+ * device which caused kernel oops, and panics during network
+ * namespace cleanup. So please don't get this wrong.
+ */
 extern int register_pernet_subsys(struct pernet_operations *);
 extern void unregister_pernet_subsys(struct pernet_operations *);
 extern int register_pernet_gen_subsys(int *id, struct pernet_operations *);
diff --git a/net/core/dev.c b/net/core/dev.c
index 5493394..fd1712d 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2267,12 +2267,6 @@ int netif_receive_skb(struct sk_buff *skb)
 
 	rcu_read_lock();
 
-	/* Don't receive packets in an exiting network namespace */
-	if (!net_alive(dev_net(skb->dev))) {
-		kfree_skb(skb);
-		goto out;
-	}
-
 #ifdef CONFIG_NET_CLS_ACT
 	if (skb->tc_verd & TC_NCLS) {
 		skb->tc_verd = CLR_TC_NCLS(skb->tc_verd);
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 55151fa..516c7b1 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -140,9 +140,6 @@ static void cleanup_net(struct work_struct *work)
 	struct pernet_operations *ops;
 	struct net *net;
 
-	/* Be very certain incoming network packets will not find us */
-	rcu_barrier();
-
 	net = container_of(work, struct net, work);
 
 	mutex_lock(&net_mutex);
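
Reviewer's note, not part of the patch: the invariant documented in the
new comment can be illustrated with a small userspace model. This is
only a sketch under stated assumptions. Every name below (model_net,
model_receive, model_cleanup_ordered and so on) is hypothetical and is
not a kernel API, and the model encodes just one idea from the comment:
packets can only reach protocol code through a device that is still in
the namespace, so protocol cleanup is safe only after the devices and
sockets have left.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_net {
	int ndevices;      /* network devices still attached to the namespace */
	int nsockets;      /* sockets still holding the namespace */
	bool cleaned_up;   /* set once protocol (subsys-style) cleanup has run */
	int delivered;     /* packets handed to protocol code */
	int late_packets;  /* packets that reached protocol code after cleanup */
};

/* A packet can only enter the namespace through a device that lives in it. */
bool model_receive(struct model_net *net)
{
	if (net->ndevices == 0)
		return false;          /* no device left, nothing can be delivered */
	if (net->cleaned_up)
		net->late_packets++;   /* the failure the comment warns about */
	net->delivered++;
	return true;
}

/* Device-style exit: every device leaves the namespace. */
void model_device_exit(struct model_net *net)
{
	net->ndevices = 0;
}

/* Subsys-style exit: protocol state (icmp, tcp, ...) is torn down. */
void model_subsys_exit(struct model_net *net)
{
	net->cleaned_up = true;
}

/* Safe order: sockets and devices are gone before protocol cleanup runs. */
void model_cleanup_ordered(struct model_net *net)
{
	net->nsockets = 0;
	model_device_exit(net);
	assert(net->ndevices == 0 && net->nsockets == 0);
	model_subsys_exit(net);
}

/*
 * Unsafe order (assumed here to mimic protocol code registered in the
 * device class): protocol state is torn down while a device can still
 * deliver, so one packet arrives in the gap.
 */
void model_cleanup_misordered(struct model_net *net)
{
	net->nsockets = 0;
	model_subsys_exit(net);
	model_receive(net);        /* packet arriving in the unsafe window */
	model_device_exit(net);
}
```

In this model the ordered cleanup leaves no path for a late packet, so
no per-packet liveness check is needed on the receive side. The
misordered cleanup records exactly one late packet.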