Message ID | 1320620915.6506.44.camel@edumazet-laptop |
---|---|
State | Not Applicable, archived |
Delegated to: | David Miller |
On 2011-11-07 00:08, Eric Dumazet wrote:
> On Sunday, 6 November 2011 at 22:57 +0100, Paweł Staszewski wrote:
>> On 2011-11-06 22:26, Eric Dumazet wrote:
>>> On Sunday, 6 November 2011 at 21:25 +0100, Paweł Staszewski wrote:
>>>> Yes, I think there is a small problem with this on kernel 3.1, because:
>>>> dmesg | egrep '(rhash)|(route)'
>>>> [ 0.000000] Command line: root=/dev/md2 rhash_entries=2097152
>>>> [ 0.000000] Kernel command line: root=/dev/md2 rhash_entries=2097152
>>>> [ 4.697294] IP route cache hash table entries: 524288 (order: 10,
>>>> 4194304 bytes)
>>>>
>>> Don't tell me you _still_ use a 32-bit kernel?
>> No, it is 64-bit :)
>> Linux localhost 3.1.0 #16 SMP Sun Nov 6 18:09:48 CET 2011 x86_64 Intel(R)
>> :)
>>
>>> If so, you need to tweak alloc_large_system_hash() to use vmalloc,
>>> because you hit MAX_ORDER (10) page allocations.
>> Funny, then :)
>> Maybe I turned off too many kernel features.
>>> But considering LOWMEM is about 700 Mbytes, you won't be able to create a
>>> lot of route cache entries.
>>>
>>> Come on, do us a favor, and enter the new era of computing.
> OK, then your kernel is not CONFIG_NUMA enabled.
>
> It seems strange, given you probably have a NUMA machine (24 CPUs).

Yes, NUMA was not enabled.
I ran some tests with and without NUMA to compare ixgbe performance when using the Node="" parameters of the ixgbe module.

> If so, your choices are:
>
> 1) Enable CONFIG_NUMA. Really, this is a must given the workload of your
> machine.
>
> 2) Or: you need to add "hashdist=1" to the boot params
> and patch your kernel with the following patch:
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 9dd443d..07f86e0 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5362,7 +5362,6 @@ int percpu_pagelist_fraction_sysctl_handler(ctl_table *table, int write,
>  
>  int hashdist = HASHDIST_DEFAULT;
>  
> -#ifdef CONFIG_NUMA
>  static int __init set_hashdist(char *str)
>  {
>  	if (!str)
> @@ -5371,7 +5370,6 @@ static int __init set_hashdist(char *str)
>  	return 1;
>  }
>  __setup("hashdist=", set_hashdist);
> -#endif
>  
>  /*
>   * allocate a large system hash table from bootmem
>

Yes, after enabling NUMA I can change rhash_entries at kernel boot.

The most important setting for a big route cache is rhash_entries:
if the route cache size exceeds the hash size, performance drops 6x to 8x.
So the best settings for the route cache are:
rhash_entries = gc_thresh = max_size

Eric, tell me, what are the plans for removing the route cache from the kernel?
As you can see, performance is better with the route cache.
Without the route cache, performance is not as good, but it is stable in all situations, even a DDoS with 10kk random_ips.

So in the future, do we need to prepare for lower kernel IP forwarding performance because there is no route cache?
Or will removing the route cache save some time in IP stack processing?

Thanks
Pawel

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
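The numbers in the quoted dmesg output can be checked with a little arithmetic. The sketch below is only an illustration, not the logic of alloc_large_system_hash(): the function names are hypothetical, the 4096-byte page size is an assumption (typical for x86_64), and the 8-byte bucket size is inferred from the log line itself (4194304 bytes / 524288 entries).

```c
/* Hypothetical helpers, not kernel code. */

/* Smallest allocation order such that (page_size << order) >= bytes. */
unsigned int order_for_bytes(unsigned long bytes, unsigned long page_size)
{
    unsigned int order = 0;

    while ((page_size << order) < bytes)
        order++;
    return order;
}

/* Number of buckets that fit in one contiguous allocation of the given order. */
unsigned long buckets_at_order(unsigned int order, unsigned long page_size,
                               unsigned long bucket_size)
{
    return (page_size << order) / bucket_size;
}
```

Under these assumptions, order_for_bytes(2097152UL * 8UL, 4096UL) is 12, so the requested rhash_entries=2097152 would need a 16 MiB contiguous allocation, while buckets_at_order(10, 4096UL, 8UL) is 524288, which matches the "entries: 524288 (order: 10, 4194304 bytes)" line in the log.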
On Monday, 7 November 2011 at 09:36 +0100, Paweł Staszewski wrote:
> Yes, after enabling NUMA I can change rhash_entries at kernel boot.
>
> The most important setting for a big route cache is rhash_entries:
> if the route cache size exceeds the hash size, performance drops 6x to 8x.
> So the best settings for the route cache are:
> rhash_entries = gc_thresh = max_size
>
> Eric, tell me, what are the plans for removing the route cache from the kernel?
> As you can see, performance is better with the route cache.
> Without the route cache, performance is not as good, but it is stable
> in all situations, even a DDoS with 10kk random_ips.
>
> So in the future, do we need to prepare for lower kernel IP forwarding
> performance because there is no route cache?
> Or will removing the route cache save some time in IP stack processing?
>

Obviously, cache removal will be possible only when performance without it is the same.

Work is in progress; it started a long time ago.
On Monday, 7 November 2011 at 10:08 +0100, Eric Dumazet wrote:
> Obviously, cache removal will be possible only when performance without
> it is the same.
>
> Work is in progress; it started a long time ago.
>

One of the reasons to get rid of this cache is its memory use.

256 bytes per entry: that's a lot of memory if you need 2,000,000 entries...
On 2011-11-07 10:16, Eric Dumazet wrote:
> On Monday, 7 November 2011 at 10:08 +0100, Eric Dumazet wrote:
>
>> Obviously, cache removal will be possible only when performance without
>> it is the same.
>>
>> Work is in progress; it started a long time ago.
>>
> One of the reasons to get rid of this cache is its memory use.
>
> 256 bytes per entry: that's a lot of memory if you need 2,000,000
> entries...
>

Yes, it is a lot for small embedded systems.
But these days, when many systems have 12 / 24 / 48 GB of memory, it is not too much.
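To put the memory figures from this exchange side by side, here is a minimal sketch of the arithmetic. The function names are hypothetical; the only inputs taken from the thread are 256 bytes per entry and 2,000,000 entries, and the result ignores the hash table itself and any allocator overhead.

```c
/* Hypothetical helpers, not kernel code. */

/* Total bytes used by the cache entries alone. */
unsigned long long route_cache_bytes(unsigned long long entries,
                                     unsigned long long bytes_per_entry)
{
    return entries * bytes_per_entry;
}

/* Convert bytes to whole MiB, rounding down. */
unsigned long long bytes_to_mib(unsigned long long bytes)
{
    return bytes / (1024ULL * 1024ULL);
}
```

route_cache_bytes(2000000ULL, 256ULL) is 512000000 bytes, and bytes_to_mib() of that is 488, so the entries alone take a little under 500 MiB: significant on a small system, but roughly 4% of a 12 GB machine.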
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9dd443d..07f86e0 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5362,7 +5362,6 @@ int percpu_pagelist_fraction_sysctl_handler(ctl_table *table, int write,
 
 int hashdist = HASHDIST_DEFAULT;
 
-#ifdef CONFIG_NUMA
 static int __init set_hashdist(char *str)
 {
 	if (!str)
@@ -5371,7 +5370,6 @@ static int __init set_hashdist(char *str)
 	return 1;
 }
 __setup("hashdist=", set_hashdist);
-#endif
 
 /*
  * allocate a large system hash table from bootmem