
Linux Route Cache performance tests

Message ID 1320620915.6506.44.camel@edumazet-laptop
State Not Applicable, archived
Delegated to: David Miller

Commit Message

Eric Dumazet Nov. 6, 2011, 11:08 p.m. UTC
On Sunday, 6 November 2011 at 22:57 +0100, Paweł Staszewski wrote:
> On 2011-11-06 22:26, Eric Dumazet wrote:
> > On Sunday, 6 November 2011 at 21:25 +0100, Paweł Staszewski wrote:
> >> Yes, there is a small problem with this, I think, with kernel 3.1, because:
> >> dmesg | egrep  '(rhash)|(route)'
> >> [    0.000000] Command line: root=/dev/md2 rhash_entries=2097152
> >> [    0.000000] Kernel command line: root=/dev/md2 rhash_entries=2097152
> >> [    4.697294] IP route cache hash table entries: 524288 (order: 10, 4194304 bytes)
> >>
> >>
> > Don't tell me you _still_ use a 32-bit kernel?
> No, it is 64-bit :)
> Linux localhost 3.1.0 #16 SMP Sun Nov 6 18:09:48 CET 2011 x86_64 Intel(R)
> :)
> 
> > If so, you need to tweak alloc_large_system_hash() to use vmalloc,
> > because you hit MAX_ORDER (10) page allocations.
> Funny, then :)
> Maybe I turned off too many kernel features.
> > But considering LOWMEM is about 700 Mbytes, you won't be able to create a
> > lot of route cache entries.
> >
> > Come on, do us a favor and enter the new era of computing.

OK, then your kernel does not have CONFIG_NUMA enabled.

That seems strange, given that you probably have a NUMA machine (24 CPUs).

If so, your choices are:

1) Enable CONFIG_NUMA. This is really a must, given the workload of your
machine.

2) Or: add "hashdist=1" to the boot parameters
   and patch your kernel with the following patch:



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Paweł Staszewski Nov. 7, 2011, 8:36 a.m. UTC | #1
On 2011-11-07 00:08, Eric Dumazet wrote:
> OK, then your kernel does not have CONFIG_NUMA enabled.
>
> That seems strange, given that you probably have a NUMA machine (24 CPUs).
Yes, NUMA was not enabled.
I am running some tests with and without NUMA to compare ixgbe performance
using the Node="" parameter of the ixgbe module.

> If so, your choices are:
>
> 1) Enable CONFIG_NUMA. This is really a must, given the workload of your
> machine.
>
> 2) Or: add "hashdist=1" to the boot parameters
>     and patch your kernel with the following patch:
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 9dd443d..07f86e0 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5362,7 +5362,6 @@ int percpu_pagelist_fraction_sysctl_handler(ctl_table *table, int write,
>
>   int hashdist = HASHDIST_DEFAULT;
>
> -#ifdef CONFIG_NUMA
>   static int __init set_hashdist(char *str)
>   {
>   	if (!str)
> @@ -5371,7 +5370,6 @@ static int __init set_hashdist(char *str)
>   	return 1;
>   }
>   __setup("hashdist=", set_hashdist);
> -#endif
>
>   /*
>    * allocate a large system hash table from bootmem
>
Yes, after enabling NUMA I can change rhash_entries at boot.

What matters most for a big route cache is rhash_entries:
if the route cache size exceeds the hash table size, performance drops 6x to 8x.
So the best settings for the route cache are:
rhash_entries = gc_thresh = max_size

Eric, what are the plans for removing the route cache from the kernel?
As you can see, performance is better with the route cache.
Without the route cache, performance is not as good as with it enabled,
but it is stable in all situations, even under a DDoS with 10kk (10 million)
random IPs.

So in the future, should we prepare for lower kernel IP forwarding
performance because there is no route cache?
Or will removing the route cache save some time in IP stack processing?


Thanks
Pawel


Eric Dumazet Nov. 7, 2011, 9:08 a.m. UTC | #2
On Monday, 7 November 2011 at 09:36 +0100, Paweł Staszewski wrote:

> Yes, after enabling NUMA I can change rhash_entries at boot.
> 
> What matters most for a big route cache is rhash_entries:
> if the route cache size exceeds the hash table size, performance drops 6x to 8x.
> So the best settings for the route cache are:
> rhash_entries = gc_thresh = max_size
> 
> Eric, what are the plans for removing the route cache from the kernel?
> As you can see, performance is better with the route cache.
> Without the route cache, performance is not as good as with it enabled,
> but it is stable in all situations, even under a DDoS with 10kk (10 million)
> random IPs.
> 
> So in the future, should we prepare for lower kernel IP forwarding
> performance because there is no route cache?
> Or will removing the route cache save some time in IP stack processing?
> 

Obviously, cache removal will be possible only when performance without
it is the same.

Work is in progress; it started a long time ago.


Eric Dumazet Nov. 7, 2011, 9:16 a.m. UTC | #3
On Monday, 7 November 2011 at 10:08 +0100, Eric Dumazet wrote:

> Obviously, cache removal will be possible only when performance without
> it is the same.
> 
> Work is in progress; it started a long time ago.
> 

One of the reasons to get rid of this cache is its memory use.

At 256 bytes per entry, that's a lot of memory if you need 2,000,000
entries...



Paweł Staszewski Nov. 7, 2011, 10:12 p.m. UTC | #4
On 2011-11-07 10:16, Eric Dumazet wrote:
> On Monday, 7 November 2011 at 10:08 +0100, Eric Dumazet wrote:
>
>> Obviously, cache removal will be possible only when performance without
>> it is the same.
>>
>> Work is in progress; it started a long time ago.
>>
> One of the reasons to get rid of this cache is its memory use.
>
> At 256 bytes per entry, that's a lot of memory if you need 2,000,000
> entries...
>
Yes, it is a lot for small embedded systems,
but nowadays, when many systems have 12/24/48 GB of memory, it
is not too much.




Patch

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9dd443d..07f86e0 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5362,7 +5362,6 @@ int percpu_pagelist_fraction_sysctl_handler(ctl_table *table, int write,
 
 int hashdist = HASHDIST_DEFAULT;
 
-#ifdef CONFIG_NUMA
 static int __init set_hashdist(char *str)
 {
 	if (!str)
@@ -5371,7 +5370,6 @@ static int __init set_hashdist(char *str)
 	return 1;
 }
 __setup("hashdist=", set_hashdist);
-#endif
 
 /*
  * allocate a large system hash table from bootmem