From patchwork Tue Jul 14 18:33:08 2009
X-Patchwork-Submitter: Jarek Poplawski
X-Patchwork-Id: 29777
X-Patchwork-Delegate: davem@davemloft.net
Date: Tue, 14 Jul 2009 20:33:08 +0200
From: Jarek Poplawski
To: David Miller
Cc: "Paul E. McKenney", Paweł Staszewski, Linux Network Development list, Robert Olsson
Subject: [PATCH net-next] Re: rib_trie / Fix inflate_threshold_root. Now=15 size=11 bits
Message-ID: <20090714183308.GB3090@ami.dom.local>
References: <20090701110407.GC12715@ff.dom.local> <4A4BE06F.3090608@itcare.pl> <20090702053216.GA4954@ff.dom.local> <4A4C48FD.7040002@itcare.pl> <20090702060011.GB4954@ff.dom.local> <4A4FF34E.7080001@itcare.pl> <4A4FF40B.5090003@itcare.pl> <20090705162003.GA19477@ami.dom.local> <20090705173208.GB19477@ami.dom.local> <20090705213232.GG8943@linux.vnet.ibm.com>
In-Reply-To: <20090705213232.GG8943@linux.vnet.ibm.com>
X-Mailing-List: netdev@vger.kernel.org

On Sun, Jul 05, 2009 at 02:32:32PM -0700, Paul E.
McKenney wrote:
> On Sun, Jul 05, 2009 at 07:32:08PM +0200, Jarek Poplawski wrote:
> > On Sun, Jul 05, 2009 at 06:20:03PM +0200, Jarek Poplawski wrote:
> > > On Sun, Jul 05, 2009 at 02:30:03AM +0200, Paweł Staszewski wrote:
> > > > Oh
> > > >
> > > > I forgot - please Jarek give me patch with sync rcu and i will make test
> > > > on preempt kernel
> > >
> > > Probably non-preempt kernel might need something like this more, but
> > > comparing is always interesting. This patch is based on Paul's
> > > suggestion (I hope).
> >
> > Hold on ;-) Here is something even better... Syncing after 128 pages
> > might be still too slow, so here is a higher initial value, 1000, plus
> > you can change this while testing in:
> >
> > /sys/module/fib_trie/parameters/sync_pages
> >
> > It would be interesting to find the lowest acceptable value.
>
> Looks like a promising approach to me!
>
> 							Thanx, Paul

Below is a simpler version of this patch, without the sysfs parameter.
(I left the previous version quoted for comparison.)

Thanks.

> > Jarek P.
> > ---
> > (synchronize take 8; apply on top of the 2.6.29.x with the last
> > all-in-one patch, or net-2.6)
> >
> >  net/ipv4/fib_trie.c |   12 ++++++++++++
> >  1 files changed, 12 insertions(+), 0 deletions(-)
> >
> > diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
> > index 00a54b2..decc8d0 100644
> > --- a/net/ipv4/fib_trie.c
> > +++ b/net/ipv4/fib_trie.c
> > @@ -71,6 +71,7 @@
> >  #include
> >  #include
> >  #include
> > +#include
> >  #include
> >  #include
> >  #include
> > @@ -164,6 +165,10 @@ static struct tnode *inflate(struct trie *t, struct tnode *tn);
> >  static struct tnode *halve(struct trie *t, struct tnode *tn);
> >  /* tnodes to free after resize(); protected by RTNL */
> >  static struct tnode *tnode_free_head;
> > +static size_t tnode_free_size;
> > +
> > +static int sync_pages __read_mostly = 1000;
> > +module_param(sync_pages, int, 0640);
> >
> >  static struct kmem_cache *fn_alias_kmem __read_mostly;
> >  static struct kmem_cache *trie_leaf_kmem __read_mostly;
> > @@ -393,6 +398,8 @@ static void tnode_free_safe(struct tnode *tn)
> >  	BUG_ON(IS_LEAF(tn));
> >  	tn->tnode_free = tnode_free_head;
> >  	tnode_free_head = tn;
> > +	tnode_free_size += sizeof(struct tnode) +
> > +			   (sizeof(struct node *) << tn->bits);
> >  }
> >
> >  static void tnode_free_flush(void)
> > @@ -404,6 +411,11 @@ static void tnode_free_flush(void)
> >  		tn->tnode_free = NULL;
> >  		tnode_free(tn);
> >  	}
> > +
> > +	if (tnode_free_size >= PAGE_SIZE * sync_pages) {
> > +		tnode_free_size = 0;
> > +		synchronize_rcu();
> > +	}
> >  }
> >
> >  static struct leaf *leaf_new(void)
> > --

------------------------>
ipv4: Use synchronize_rcu() during trie_rebalance()

During trie_rebalance() we free memory after resizing with call_rcu(),
but large updates, especially with PREEMPT_NONE configs, can cause
memory stresses, so this patch calls synchronize_rcu() in
tnode_free_flush() after each sync_pages to guarantee such freeing
(especially before resizing the root node).
The value of sync_pages = 128 is based on Pawel Staszewski's tests as
the lowest which doesn't hinder updating times. (For testing purposes
there was a sysfs module parameter to change it on demand, but it's
removed until we're sure it could be really useful.)

The patch is based on suggestions by: Paul E. McKenney

Reported-by: Pawel Staszewski
Tested-by: Pawel Staszewski
Signed-off-by: Jarek Poplawski
---

 net/ipv4/fib_trie.c |   15 +++++++++++++++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 63c2fa7..58ba9f4 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -164,6 +164,14 @@ static struct tnode *inflate(struct trie *t, struct tnode *tn);
 static struct tnode *halve(struct trie *t, struct tnode *tn);
 /* tnodes to free after resize(); protected by RTNL */
 static struct tnode *tnode_free_head;
+static size_t tnode_free_size;
+
+/*
+ * synchronize_rcu after call_rcu for that many pages; it should be especially
+ * useful before resizing the root node with PREEMPT_NONE configs; the value was
+ * obtained experimentally, aiming to avoid visible slowdown.
+ */
+static const int sync_pages = 128;

 static struct kmem_cache *fn_alias_kmem __read_mostly;
 static struct kmem_cache *trie_leaf_kmem __read_mostly;
@@ -393,6 +401,8 @@ static void tnode_free_safe(struct tnode *tn)
 	BUG_ON(IS_LEAF(tn));
 	tn->tnode_free = tnode_free_head;
 	tnode_free_head = tn;
+	tnode_free_size += sizeof(struct tnode) +
+			   (sizeof(struct node *) << tn->bits);
 }

 static void tnode_free_flush(void)
@@ -404,6 +414,11 @@ static void tnode_free_flush(void)
 		tn->tnode_free = NULL;
 		tnode_free(tn);
 	}
+
+	if (tnode_free_size >= PAGE_SIZE * sync_pages) {
+		tnode_free_size = 0;
+		synchronize_rcu();
+	}
 }

 static struct leaf *leaf_new(void)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html