From patchwork Tue Jul 28 23:41:33 2009 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: gregkh@suse.de X-Patchwork-Id: 30326 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@bilbo.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from ozlabs.org (ozlabs.org [203.10.76.45]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "mx.ozlabs.org", Issuer "CA Cert Signing Authority" (verified OK)) by bilbo.ozlabs.org (Postfix) with ESMTPS id C6790B6EDF for ; Wed, 29 Jul 2009 09:52:23 +1000 (EST) Received: by ozlabs.org (Postfix) id B7596DDDA0; Wed, 29 Jul 2009 09:52:23 +1000 (EST) Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.176.167]) by ozlabs.org (Postfix) with ESMTP id 3AC56DDD1C for ; Wed, 29 Jul 2009 09:52:23 +1000 (EST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932144AbZG1Xv5 (ORCPT ); Tue, 28 Jul 2009 19:51:57 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1754280AbZG1Xuf (ORCPT ); Tue, 28 Jul 2009 19:50:35 -0400 Received: from kroah.org ([198.145.64.141]:35970 "EHLO coco.kroah.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754976AbZG1Xuc (ORCPT ); Tue, 28 Jul 2009 19:50:32 -0400 Received: from localhost (c-76-105-230-205.hsd1.or.comcast.net [76.105.230.205]) (using TLSv1 with cipher DHE-RSA-AES128-SHA (128/128 bits)) (No client certificate requested) by coco.kroah.org (Postfix) with ESMTPSA id 3442B49157; Tue, 28 Jul 2009 16:50:32 -0700 (PDT) X-Mailbox-Line: From gregkh@mini.kroah.org Tue Jul 28 16:42:01 2009 Message-Id: <20090728234201.430401674@mini.kroah.org> User-Agent: quilt/0.48-1 Date: Tue, 28 Jul 2009 16:41:33 -0700 From: Greg KH To: linux-kernel@vger.kernel.org, stable@kernel.org Cc: stable-review@kernel.org, torvalds@linux-foundation.org, akpm@linux-foundation.org, alan@lxorguk.ukuu.org.uk, netdev@vger.kernel.org, netfilter-devel@vger.kernel.org, Patrick McHardy , Eric Dumazet , "Paul E. McKenney" Subject: [patch 64/71] nf_conntrack: nf_conntrack_alloc() fixes References: <20090728234029.868717854@mini.kroah.org> Content-Disposition: inline; filename=nf_conntrack-nf_conntrack_alloc-fixes.patch Lines: 102 In-Reply-To: <20090728234756.GA11917@kroah.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org 2.6.30-stable review patch. If anyone has any objections, please let us know. ------------------ From: Eric Dumazet commit 941297f443f871b8c3372feccf27a8733f6ce9e9 upstream. When a slab cache uses SLAB_DESTROY_BY_RCU, we must be careful when allocating objects, since slab allocator could give a freed object still used by lockless readers. In particular, nf_conntrack RCU lookups rely on ct->tuplehash[xxx].hnnode.next being always valid (ie containing a valid 'nulls' value, or a valid pointer to next object in hash chain.) kmem_cache_zalloc() setups object with NULL values, but a NULL value is not valid for ct->tuplehash[xxx].hnnode.next. Fix is to call kmem_cache_alloc() and do the zeroing ourself. As spotted by Patrick, we also need to make sure lookup keys are committed to memory before setting refcount to 1, or a lockless reader could get a reference on the old version of the object. Its key re-check could then pass the barrier. Signed-off-by: Eric Dumazet Signed-off-by: Patrick McHardy Acked-by: Paul E. McKenney Signed-off-by: Greg Kroah-Hartman --- Documentation/RCU/rculist_nulls.txt | 7 ++++++- net/netfilter/nf_conntrack_core.c | 21 ++++++++++++++++++--- 2 files changed, 24 insertions(+), 4 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html --- a/Documentation/RCU/rculist_nulls.txt +++ b/Documentation/RCU/rculist_nulls.txt @@ -83,11 +83,12 @@ not detect it missed following items in obj = kmem_cache_alloc(...); lock_chain(); // typically a spin_lock() obj->key = key; -atomic_inc(&obj->refcnt); /* * we need to make sure obj->key is updated before obj->next + * or obj->refcnt */ smp_wmb(); +atomic_set(&obj->refcnt, 1); hlist_add_head_rcu(&obj->obj_node, list); unlock_chain(); // typically a spin_unlock() @@ -159,6 +160,10 @@ out: obj = kmem_cache_alloc(cachep); lock_chain(); // typically a spin_lock() obj->key = key; +/* + * changes to obj->key must be visible before refcnt one + */ +smp_wmb(); atomic_set(&obj->refcnt, 1); /* * insert obj in RCU way (readers might be traversing chain) --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -525,22 +525,37 @@ struct nf_conn *nf_conntrack_alloc(struc } } - ct = kmem_cache_zalloc(nf_conntrack_cachep, gfp); + /* + * Do not use kmem_cache_zalloc(), as this cache uses + * SLAB_DESTROY_BY_RCU. + */ + ct = kmem_cache_alloc(nf_conntrack_cachep, gfp); if (ct == NULL) { pr_debug("nf_conntrack_alloc: Can't alloc conntrack.\n"); atomic_dec(&net->ct.count); return ERR_PTR(-ENOMEM); } - - atomic_set(&ct->ct_general.use, 1); + /* + * Let ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode.next + * and ct->tuplehash[IP_CT_DIR_REPLY].hnnode.next unchanged. + */ + memset(&ct->tuplehash[IP_CT_DIR_MAX], 0, + sizeof(*ct) - offsetof(struct nf_conn, tuplehash[IP_CT_DIR_MAX])); ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple = *orig; + ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode.pprev = NULL; ct->tuplehash[IP_CT_DIR_REPLY].tuple = *repl; + ct->tuplehash[IP_CT_DIR_REPLY].hnnode.pprev = NULL; /* Don't set timer yet: wait for confirmation */ setup_timer(&ct->timeout, death_by_timeout, (unsigned long)ct); #ifdef CONFIG_NET_NS ct->ct_net = net; #endif + /* + * changes to lookup keys must be done before setting refcnt to 1 + */ + smp_wmb(); + atomic_set(&ct->ct_general.use, 1); return ct; } EXPORT_SYMBOL_GPL(nf_conntrack_alloc);