From patchwork Thu Jul 9 05:36:05 2009 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Eric Dumazet X-Patchwork-Id: 29623 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@bilbo.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from ozlabs.org (ozlabs.org [203.10.76.45]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "mx.ozlabs.org", Issuer "CA Cert Signing Authority" (verified OK)) by bilbo.ozlabs.org (Postfix) with ESMTPS id 2F0D7B707E for ; Thu, 9 Jul 2009 15:38:22 +1000 (EST) Received: by ozlabs.org (Postfix) id 21E72DDDF4; Thu, 9 Jul 2009 15:38:22 +1000 (EST) Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.176.167]) by ozlabs.org (Postfix) with ESMTP id ADE02DDDE7 for ; Thu, 9 Jul 2009 15:38:21 +1000 (EST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751609AbZGIFgi (ORCPT ); Thu, 9 Jul 2009 01:36:38 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1751480AbZGIFgh (ORCPT ); Thu, 9 Jul 2009 01:36:37 -0400 Received: from gw1.cosmosbay.com ([212.99.114.194]:51190 "EHLO gw1.cosmosbay.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750967AbZGIFgh (ORCPT ); Thu, 9 Jul 2009 01:36:37 -0400 Received: from [127.0.0.1] (localhost [127.0.0.1]) by gw1.cosmosbay.com (8.13.7/8.13.7) with ESMTP id n695a6u8022063; Thu, 9 Jul 2009 07:36:07 +0200 Message-ID: <4A5581C5.5070409@gmail.com> Date: Thu, 09 Jul 2009 07:36:05 +0200 From: Eric Dumazet User-Agent: Thunderbird 2.0.0.22 (Windows/20090605) MIME-Version: 1.0 To: David Miller CC: emil.s.tantilov@intel.com, emils.tantilov@gmail.com, netdev@vger.kernel.org, jesse.brandeburg@intel.com, jeffrey.t.kirsher@intel.com, jolsa@redhat.com, Patrick McHardy , "Paul E. McKenney" Subject: Re: [PATCH] net: sk_prot_alloc() should not blindly overwrite memory References: <4A537469.3040207@gmail.com> <4A53CD39.7080407@gmail.com> <20090707.191424.167842005.davem@davemloft.net> <4A5441A0.3050504@gmail.com> In-Reply-To: <4A5441A0.3050504@gmail.com> X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-1.6 (gw1.cosmosbay.com [0.0.0.0]); Thu, 09 Jul 2009 07:36:08 +0200 (CEST) Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Eric Dumazet a écrit : > David Miller a écrit : >> From: Eric Dumazet >> Date: Wed, 08 Jul 2009 00:33:29 +0200 >> >>> [PATCH] net: sk_prot_alloc() should not blindly overwrite memory >>> >>> Some sockets use SLAB_DESTROY_BY_RCU, and our RCU code rely that some >>> fields should not be blindly overwritten, even with null. >>> >>> These fields are sk->sk_refcnt and sk->sk_nulls_node.next >>> >>> Current sk_prot_alloc() implementation doesnt respect this hypothesis, >>> calling kmem_cache_alloc() with __GFP_ZERO and setting sk_refcnt to 1 >>> instead of atomically increment it. >>> >>> Reported-by: Emil S Tantilov >>> Signed-off-by: Eric Dumazet >> I've applied this but will wait for some more testing before >> I push it out for real to kernel.org > > Thanks David > > I forgot to CC Paul and Patrick, so I'll ask them to look at this patch. > > Patrick, a similar fix is needed in conntrack as well, we currently > uses "ct = kmem_cache_zalloc(nf_conntrack_cachep, gfp);" and thus > overwrite struct hlist_nulls_node hnnode; contained > in "struct nf_conntrack_tuple_hash", while lockless readers still > potentialy need them. Setting hnnode.next to NULL is dangerous > since last bit is not set (not a nulls value), a reader could > try to dereference this NULL pointer and trap. > > > Here is the patch again so that Paul & Patrick can comment on it. > > I am not sure about the refcnt thing (blindly setting it to 0 again > should be OK in fact, since no reader should/can to the > atomic_inc_if_not_zero on it), but the nulls.next thing is problematic. Here is an updated and much simpler patch, taking care of sk_node.next being not set to 0 This patch applies to >= 2.6.29 kernels [PATCH] net: sk_prot_alloc() should not blindly overwrite memory Some sockets use SLAB_DESTROY_BY_RCU, and our RCU code correctness depends on sk->sk_nulls_node.next being always valid. A NULL value is not allowed as it might fault a lockless reader. Current sk_prot_alloc() implementation doesnt respect this hypothesis, calling kmem_cache_alloc() with __GFP_ZERO. Just call memset() around the forbidden field. Signed-off-by: Eric Dumazet --- -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/net/core/sock.c b/net/core/sock.c index b0ba569..7b87ec0 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -939,8 +939,23 @@ static struct sock *sk_prot_alloc(struct proto *prot, gfp_t priority, struct kmem_cache *slab; slab = prot->slab; - if (slab != NULL) - sk = kmem_cache_alloc(slab, priority); + if (slab != NULL) { + sk = kmem_cache_alloc(slab, priority & ~__GFP_ZERO); + if (!sk) + return sk; + if (priority & __GFP_ZERO) { + /* + * caches using SLAB_DESTROY_BY_RCU should let + * sk_node.next un-modified. Special care is taken + * when initializing object to zero. + */ + if (offsetof(struct sock, sk_node.next) != 0) + memset(sk, 0, offsetof(struct sock, sk_node.next)); + memset(&sk->sk_node.pprev, 0, + prot->obj_size - offsetof(struct sock, + sk_node.pprev)); + } + } else sk = kmalloc(prot->obj_size, priority);