From patchwork Tue May 19 17:17:03 2009 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Jarek Poplawski X-Patchwork-Id: 27403 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@bilbo.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from ozlabs.org (ozlabs.org [203.10.76.45]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "mx.ozlabs.org", Issuer "CA Cert Signing Authority" (verified OK)) by bilbo.ozlabs.org (Postfix) with ESMTPS id 1C488B7043 for ; Wed, 20 May 2009 03:19:14 +1000 (EST) Received: by ozlabs.org (Postfix) id 0D98CDE071; Wed, 20 May 2009 03:19:14 +1000 (EST) Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.176.167]) by ozlabs.org (Postfix) with ESMTP id 96642DE06F for ; Wed, 20 May 2009 03:19:13 +1000 (EST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753392AbZESRTE (ORCPT ); Tue, 19 May 2009 13:19:04 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1752276AbZESRTB (ORCPT ); Tue, 19 May 2009 13:19:01 -0400 Received: from mail-bw0-f174.google.com ([209.85.218.174]:41492 "EHLO mail-bw0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751177AbZESRTA (ORCPT ); Tue, 19 May 2009 13:19:00 -0400 Received: by bwz22 with SMTP id 22so3964919bwz.37 for ; Tue, 19 May 2009 10:19:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:date:from:to:cc:subject :message-id:references:mime-version:content-type:content-disposition :content-transfer-encoding:in-reply-to:user-agent; bh=00jEG6HA1qzDNmjXbwYnMxJCXVhSlBxR4EDm7Q7ZIdc=; b=ZrhOvn0uz0HdIY5w7pu1b1F3nkKUU/Ti6gY4x4rzf1EKRY03bgQ/pydYcMUUgU0oPZ p5wr8fAfsyWb/LAl1T8NtuB/MG/dqBOqqihsv65wK5PmMEdSzgHbsZYyR3k21KMCelHX hgMoIdcha3TG9O+UM3E8Mg5cO+5vph/MWFWRA= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:content-transfer-encoding :in-reply-to:user-agent; b=wseiw9uYwKtgC02mdw87Z5XwUaHQfHqooXAZJdgEueqBoX3blCRZrJ0vLOmMWxfXWe V6Y1x6GUfQMKnw872U7soJ2h4i2CpCCNR1s7QcikfyWz3aIuLYmuYCFZbV1OYOC1VBKF bkixX3U9OPqRo5jns/nzK7Pe+11HKXe1kvO5k= Received: by 10.204.53.1 with SMTP id k1mr248946bkg.125.1242753540929; Tue, 19 May 2009 10:19:00 -0700 (PDT) Received: from ami.dom.local ([79.162.144.134]) by mx.google.com with ESMTPS id g28sm232601fkg.25.2009.05.19.10.18.57 (version=SSLv3 cipher=RC4-MD5); Tue, 19 May 2009 10:18:59 -0700 (PDT) Date: Tue, 19 May 2009 19:17:03 +0200 From: Jarek Poplawski To: Neil Horman Cc: Eric Dumazet , lav@yar.ru, Stephen Hemminger , netdev@vger.kernel.org Subject: Re: Fw: [Bug 13339] New: rtable leak in ipv4/route.c Message-ID: <20090519171703.GA2749@ami.dom.local> References: <20090519123417.GA7376@ff.dom.local> <4A12D10D.3000504@cosmosbay.com> <20090519162330.GC28034@hmsreliant.think-freely.org> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20090519162330.GC28034@hmsreliant.think-freely.org> User-Agent: Mutt/1.5.18 (2008-05-17) Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org On Tue, May 19, 2009 at 12:23:30PM -0400, Neil Horman wrote: > On Tue, May 19, 2009 at 05:32:29PM +0200, Eric Dumazet wrote: > > Jarek Poplawski a écrit : > > > On 19-05-2009 04:35, Stephen Hemminger wrote: > > >> Begin forwarded message: > > >> > > >> Date: Mon, 18 May 2009 14:10:20 GMT > > >> From: bugzilla-daemon@bugzilla.kernel.org > > >> To: shemminger@linux-foundation.org > > >> Subject: [Bug 13339] New: rtable leak in ipv4/route.c > > >> > > >> > > >> http://bugzilla.kernel.org/show_bug.cgi?id=13339 > > > ... > > >> 2.6.29 patch has introduced flexible route cache rebuilding. Unfortunately the > > >> patch has at least one critical flaw, and another problem. > > >> > > >> rt_intern_hash calculates rthi pointer, which is later used for new entry > > >> insertion. The same loop calculates cand pointer which is used to clean the > > >> list. If the pointers are the same, rtable leak occurs, as first the cand is > > >> removed then the new entry is appended to it. > > >> > > >> This leak leads to unregister_netdevice problem (usage count > 0). > > >> > > >> Another problem of the patch is that it tries to insert the entries in certain > > >> order, to facilitate counting of entries distinct by all but QoS parameters. > > >> Unfortunately, referencing an existing rtable entry moves it to list beginning, > > >> to speed up further lookups, so the carefully built order is destroyed. > > > > We could change rt_check_expire() to be smarter and handle any order in chains. > > > > This would let rt_intern_hash() be simpler. > > > > As its a more performance critical path, all would be good :) > > > > >> > > >> For the first problem the simplest patch it to set rthi=0 when rthi==cand, but > > >> it will also destroy the ordering. > > > > > > I think fixing this bug fast is more important than this > > > ordering or counting. Could you send your patch proposal? > > > > > > > Of course, it helps if I attach the patch :) > > > diff --git a/include/net/dst.h b/include/net/dst.h > index 6be3b08..a39db6d 100644 > --- a/include/net/dst.h > +++ b/include/net/dst.h > @@ -47,6 +47,7 @@ struct dst_entry > #define DST_NOXFRM 2 > #define DST_NOPOLICY 4 > #define DST_NOHASH 8 > +#define DST_GRPLDR 16 > unsigned long expires; > > unsigned short header_len; /* more space at head required */ > diff --git a/net/ipv4/route.c b/net/ipv4/route.c > index c4c60e9..0120f0e 100644 > --- a/net/ipv4/route.c > +++ b/net/ipv4/route.c > @@ -610,6 +610,8 @@ static inline int ip_rt_proc_init(void) > > static inline void rt_free(struct rtable *rt) > { > + if (rt->u.dst.flags & DST_GRPLDR) > + rt->u.dst.rt_next->u.dst.flag |= DST_GRPLDR; > call_rcu_bh(&rt->u.dst.rcu_head, dst_rcu_free); > } > > @@ -1143,8 +1145,11 @@ restart: > * relvant to the hash function together, which we use to adjust > * our chain length > */ > - if (*rthp && compare_hash_inputs(&(*rthp)->fl, &rt->fl)) > + if (!*rthi && *rthp && > + compare_hash_inputs(&(*rthp)->fl, &rt->fl) && > + (cand != rth)) > rthi = rth; Does it really prevent cand == rthi in the next loop? I didn't check Eric's patch yet, but I really don't know what's wrong with something as simple as below for -stable, until "proper" fix is analyzed and tested. Jarek P. --- net/ipv4/route.c | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/net/ipv4/route.c b/net/ipv4/route.c index c4c60e9..f4e6c7a 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -1157,6 +1157,8 @@ restart: if (chain_length > ip_rt_gc_elasticity) { *candp = cand->u.dst.rt_next; rt_free(cand); + if (rthi == cand) + rthi = NULL; } } else { if (chain_length > rt_chain_length_max) {