diff mbox

[net-2.6] Re: rib_trie / Fix inflate_threshold_root. Now=15 size=11 bits

Message ID 20090629083315.GA4712@ff.dom.local
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Jarek Poplawski June 29, 2009, 8:33 a.m. UTC
On 28-06-2009 23:36, Jarek Poplawski wrote:
> To David Miller:
> since among patches tested negatively by Pawel are current 2 fixes
> from 2.6.31-rc, I hope they weren't sent to -stable yet. Otherwise,
> please withdraw them until they are tested alone. Thanks.

David, IMHO this fix is needed in net-2.6 even if it doesn't fix the
problem reported by Pawel (there could be still something more).

Pawel, I see you decided to test my previous patch, but try to add
this one on top.

Thanks,
Jarek P.
------------------->
ipv4: Fix fib_trie rebalancing, part 3

Alas current delaying of freeing old tnodes by RCU in trie_rebalance
is still not enough because we can free a top tnode before updating a
t->trie pointer.

Reported-by: Pawel Staszewski <pstaszewski@itcare.pl>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
---

 net/ipv4/fib_trie.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Paweł Staszewski June 29, 2009, 9:51 a.m. UTC | #1
I apply this patch

fib_triestats in attached file :)


Jarek Poplawski pisze:
> On 28-06-2009 23:36, Jarek Poplawski wrote:
>   
>> To David Miller:
>> since among patches tested negatively by Pawel are current 2 fixes
>> from 2.6.31-rc, I hope they weren't sent to -stable yet. Otherwise,
>> please withdraw them until they are tested alone. Thanks.
>>     
>
> David, IMHO this fix is needed in net-2.6 even if it doesn't fix the
> problem reported by Pawel (there could be still something more).
>
> Pawel, I see you decided to test my previous patch, but try to add
> this one on top.
>
> Thanks,
> Jarek P.
> ------------------->
> ipv4: Fix fib_trie rebalancing, part 3
>
> Alas current delaying of freeing old tnodes by RCU in trie_rebalance
> is still not enough because we can free a top tnode before updating a
> t->trie pointer.
>
> Reported-by: Pawel Staszewski <pstaszewski@itcare.pl>
> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
> ---
>
>  net/ipv4/fib_trie.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
> index 012cf5a..00a54b2 100644
> --- a/net/ipv4/fib_trie.c
> +++ b/net/ipv4/fib_trie.c
> @@ -1021,6 +1021,9 @@ static void trie_rebalance(struct trie *t, struct tnode *tn)
>  				      (struct node *)tn, wasfull);
>  
>  		tp = node_parent((struct node *) tn);
> +		if (!tp)
> +			rcu_assign_pointer(t->trie, (struct node *)tn);
> +
>  		tnode_free_flush();
>  		if (!tp)
>  			break;
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
>
cat /proc/net/fib_triestat
Basic info: size of leaf: 20 bytes, size of tnode: 36 bytes.
Main:
        Aver depth:     2.29
        Max depth:      7
        Leaves:         276909
        Prefixes:       290383
        Internal nodes: 66893
          1: 34715  2: 14024  3: 9889  4: 4833  5: 2275  6: 1150  7: 5  9: 1  18: 1
        Pointers: 691662
Null ptrs: 347861
Total size: 18403  kB

Counters:
---------
gets = 2297579
backtracks = 131491
semantic match passed = 2233070
semantic match miss = 42
null node hit= 2016883
skipped node resize = 0

Local:
        Aver depth:     3.75
        Max depth:      5
        Leaves:         12
        Prefixes:       13
        Internal nodes: 10
          1: 9  2: 1
        Pointers: 22
Null ptrs: 1
Total size: 2  kB

Counters:
---------
gets = 2302102
backtracks = 1545197
semantic match passed = 4536
semantic match miss = 0
null node hit= 192664
skipped node resize = 0

----------------------------------------------------------------

cat /proc/net/fib_triestat
Basic info: size of leaf: 20 bytes, size of tnode: 36 bytes.
Main:
        Aver depth:     2.29
        Max depth:      7
        Leaves:         276904
        Prefixes:       290378
        Internal nodes: 66889
          1: 34711  2: 14023  3: 9890  4: 4833  5: 2275  6: 1150  7: 5  9: 1  18: 1
        Pointers: 691658
Null ptrs: 347866
Total size: 18402  kB

Counters:
---------
gets = 3006945
backtracks = 138787
semantic match passed = 2942047
semantic match miss = 85
null node hit= 2826377
skipped node resize = 0

Local:
        Aver depth:     3.75
        Max depth:      5
        Leaves:         12
        Prefixes:       13
        Internal nodes: 10
          1: 9  2: 1
        Pointers: 22
Null ptrs: 1
Total size: 2  kB

Counters:
---------
gets = 3011504
backtracks = 1796587
semantic match passed = 4577
semantic match miss = 0
null node hit= 192747
skipped node resize = 0

--------------------------------------------------------------

cat /proc/net/fib_triestat
Basic info: size of leaf: 20 bytes, size of tnode: 36 bytes.
Main:
        Aver depth:     2.29
        Max depth:      7
        Leaves:         276904
        Prefixes:       290378
        Internal nodes: 66891
          1: 34710  2: 14025  3: 9892  4: 4832  5: 2275  6: 1150  7: 5  9: 1  18: 1
        Pointers: 691664
Null ptrs: 347870
Total size: 18402  kB

Counters:
---------
gets = 3320633
backtracks = 141904
semantic match passed = 3255585
semantic match miss = 99
null node hit= 3177543
skipped node resize = 0

Local:
        Aver depth:     3.75
        Max depth:      5
        Leaves:         12
        Prefixes:       13
        Internal nodes: 10
          1: 9  2: 1
        Pointers: 22
Null ptrs: 1
Total size: 2  kB

Counters:
---------
gets = 3325226
backtracks = 1904022
semantic match passed = 4601
semantic match miss = 0
null node hit= 192782
skipped node resize = 0
Jarek Poplawski June 29, 2009, 10:58 a.m. UTC | #2
On Mon, Jun 29, 2009 at 11:51:52AM +0200, Paweł Staszewski wrote:
> I apply this patch
>
> fib_triestats in attached file :)
>
>> ------------------->
>> ipv4: Fix fib_trie rebalancing, part 3
>>
>> Alas current delaying of freeing old tnodes by RCU in trie_rebalance
>> is still not enough because we can free a top tnode before updating a
>> t->trie pointer.
>>
>> Reported-by: Pawel Staszewski <pstaszewski@itcare.pl>
>> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
>> ---

David, I guess you could add:

Tested-by: Pawel Staszewski <pstaszewski@itcare.pl>

Thanks,
Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller June 30, 2009, 7:48 p.m. UTC | #3
From: Jarek Poplawski <jarkao2@gmail.com>
Date: Mon, 29 Jun 2009 10:58:20 +0000

> On Mon, Jun 29, 2009 at 11:51:52AM +0200, Paweł Staszewski wrote:
>> I apply this patch
>>
>> fib_triestats in attached file :)
>>
>>> ------------------->
>>> ipv4: Fix fib_trie rebalancing, part 3
>>>
>>> Alas current delaying of freeing old tnodes by RCU in trie_rebalance
>>> is still not enough because we can free a top tnode before updating a
>>> t->trie pointer.
>>>
>>> Reported-by: Pawel Staszewski <pstaszewski@itcare.pl>
>>> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
>>> ---
> 
> David, I guess you could add:
> 
> Tested-by: Pawel Staszewski <pstaszewski@itcare.pl>

Done, and applied, thanks Jarek.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jarek Poplawski June 30, 2009, 8:14 p.m. UTC | #4
On Tue, Jun 30, 2009 at 12:48:49PM -0700, David Miller wrote:
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Mon, 29 Jun 2009 10:58:20 +0000
> 
> > On Mon, Jun 29, 2009 at 11:51:52AM +0200, Paweł Staszewski wrote:
> >> I apply this patch
> >>
> >> fib_triestats in attached file :)
> >>
> >>> ------------------->
> >>> ipv4: Fix fib_trie rebalancing, part 3
> >>>
> >>> Alas current delaying of freeing old tnodes by RCU in trie_rebalance
> >>> is still not enough because we can free a top tnode before updating a
> >>> t->trie pointer.
> >>>
> >>> Reported-by: Pawel Staszewski <pstaszewski@itcare.pl>
> >>> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
> >>> ---
> > 
> > David, I guess you could add:
> > 
> > Tested-by: Pawel Staszewski <pstaszewski@itcare.pl>
> 
> Done, and applied, thanks Jarek.

Btw., a little comment: there are still some issues while trying to
reclaim memory after synchronize_rcu, which means the algorithm is
buggy, or RCU use is still buggy, or maybe some timing because of
synchronize_rcu. Anyway, fib_trie still seems to be safe only with
CONFIG_PREEMPT_NONE, so I have no idea how this should be fixed in
-stables (or why people don't report more this BUG in 2.6.30)...

Thanks,
Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
stephen hemminger July 10, 2009, 3:29 p.m. UTC | #5
On Tue, 30 Jun 2009 12:48:49 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Mon, 29 Jun 2009 10:58:20 +0000
> 
> > On Mon, Jun 29, 2009 at 11:51:52AM +0200, Paweł Staszewski wrote:
> >> I apply this patch
> >>
> >> fib_triestats in attached file :)
> >>
> >>> ------------------->
> >>> ipv4: Fix fib_trie rebalancing, part 3
> >>>
> >>> Alas current delaying of freeing old tnodes by RCU in trie_rebalance
> >>> is still not enough because we can free a top tnode before updating a
> >>> t->trie pointer.
> >>>
> >>> Reported-by: Pawel Staszewski <pstaszewski@itcare.pl>
> >>> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
> >>> ---
> > 
> > David, I guess you could add:
> > 
> > Tested-by: Pawel Staszewski <pstaszewski@itcare.pl>
> 
> Done, and applied, thanks Jarek.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

This is probably in kernel bugzilla as well, so someone should
update:
  http://bugzilla.kernel.org/show_bug.cgi?id=6648
diff mbox

Patch

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 012cf5a..00a54b2 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1021,6 +1021,9 @@  static void trie_rebalance(struct trie *t, struct tnode *tn)
 				      (struct node *)tn, wasfull);
 
 		tp = node_parent((struct node *) tn);
+		if (!tp)
+			rcu_assign_pointer(t->trie, (struct node *)tn);
+
 		tnode_free_flush();
 		if (!tp)
 			break;