
tcp: Fix sysctl_tcp_max_orphans when PAGE_SIZE != 4k

Message ID 20100825.165759.27789477.davem@davemloft.net
State Accepted, archived
Delegated to: David Miller

Commit Message

David Miller Aug. 25, 2010, 11:57 p.m. UTC
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 25 Aug 2010 19:50:34 +0200

> In fact, the existing code makes little sense...
> 
> (tcp_hashinfo.bhash_size * sizeof(struct inet_bind_hashbucket))
> 
> is much bigger if spinlock debugging is on. It's strange to select bigger
> limits in this case (where kernel structures are also bigger).
> 
> bhash_size max is 65536, and we get this value even for small machines. 
> 
> Sizing would probably be better if using ehash_size instead of
> bhash_size

Yes, that would avoid debugging unduly influencing this tunable.
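
For readers following along, here is a rough user-space sketch of the
sizing loop that the patch removes. It is only an illustration: the
function names, the 16-byte and 64-byte bucket sizes, and the
compiled-in default passed to old_max_orphans() are assumptions made up
for this example, not values taken from the kernel.

```c
#include <stddef.h>

/* Mirrors the removed loop: the smallest order such that
 * (1 << order) pages are at least as large as the bind hash table.
 * page_shift, bhash_size and bucket_bytes are passed in so that
 * different (hypothetical) configurations can be compared.
 */
int old_sizing_order(unsigned int page_shift, unsigned long bhash_size,
		     size_t bucket_bytes)
{
	unsigned long table_bytes = bhash_size * bucket_bytes;
	int order = 0;

	while (((1UL << order) << page_shift) < table_bytes)
		order++;
	return order;
}

/* Mirrors how the removed code derived sysctl_tcp_max_orphans from
 * that order. compiled_default stands in for whatever value the
 * variable held before tcp_init() ran (an assumption of this sketch).
 */
int old_max_orphans(int order, int compiled_default)
{
	if (order >= 4)
		return 4096 << (order - 4);
	if (order < 3)
		return compiled_default >> (3 - order);
	return compiled_default;	/* order == 3: left untouched */
}
```

With bhash_size at 65536 and an assumed 16-byte bucket, a 4k page
(page_shift 12) gives order 8 and a limit of 65536, while a 64k page
(page_shift 16) gives order 4 and a limit of only 4096. Growing the
assumed bucket to 64 bytes, as a debugging option might, raises the 4k
result to order 10 and 262144. Both the page size and the bucket size
leak into the tunable, which is what the thread objects to.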

Also, I think the "4096" constant is a reference to the thing we now
call SK_MEM_QUANTUM.  Back when this sizing code was added in:

commit 1f28b683339f74f9664b77532f4a2f1aad512451
Author: davem <davem>
Date:   Sun Jan 16 05:10:52 2000 +0000

    Merge in TCP/UDP optimizations and
    bug fixing from softnet patches.  Softnet patch
    set decreases size by approx. 300k

of the netdev-vger-cvs tree, we used the '4096' constant explicitly.

> Maybe remove the 'order' loop and use ehash_size, already a result of
> the available memory or thash_entries tunable.
> 
> unsigned int ehash_size = tcp_hashinfo.ehash_mask + 1;
> 
> tcp_death_row.sysctl_max_tw_buckets = cnt / 2;
> sysctl_tcp_max_orphans = cnt / 2;
> sysctl_max_syn_backlog = min(128, cnt / 256);

Yeah, something like the following, Anton?

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Anton Blanchard Aug. 26, 2010, 12:38 a.m. UTC | #1
Hi Dave,

> Yeah, something like the following, Anton?

Thanks! I tested on a 512GB box. One thing:

# cat /proc/sys/net/ipv4/tcp_max_syn_backlog 
128

We probably want to use max():

> -	sysctl_max_syn_backlog = min(128, cnt / 256);
> +	sysctl_max_syn_backlog = max(128, cnt / 256);

With that change:

# cat /proc/sys/net/ipv4/tcp_max_orphans 
262144
# cat /proc/sys/net/ipv4/tcp_max_tw_buckets
262144
# cat /proc/sys/net/ipv4/tcp_max_syn_backlog 
2048


Tested-by: Anton Blanchard <anton@samba.org>

Anton
David Miller Aug. 26, 2010, 3:53 a.m. UTC | #2
From: Anton Blanchard <anton@samba.org>
Date: Thu, 26 Aug 2010 10:38:34 +1000

> # cat /proc/sys/net/ipv4/tcp_max_syn_backlog 
> 128
> 
> We probably want to use max():
> 
>> -	sysctl_max_syn_backlog = min(128, cnt / 256);
>> +	sysctl_max_syn_backlog = max(128, cnt / 256);
> 
> With that change:
 ...
> Tested-by: Anton Blanchard <anton@samba.org>

Thanks a lot Anton.

Did you test my percpu orphan fix yet?
Eric Dumazet Aug. 26, 2010, 4:45 a.m. UTC | #3
On Thursday, 26 August 2010 at 10:38 +1000, Anton Blanchard wrote:
> Hi Dave,
> 
> > Yeah, something like the following, Anton?
> 
> Thanks! I tested on a 512GB box. One thing:
> 
> # cat /proc/sys/net/ipv4/tcp_max_syn_backlog 
> 128
> 
> We probably want to use max():
> 
> > -	sysctl_max_syn_backlog = min(128, cnt / 256);
> > +	sysctl_max_syn_backlog = max(128, cnt / 256);
> 

I believe I make this min()/max() error once per week, sorry ;)

In my mind, I say "I want a minimum value of 128", then I write:

val = min(128, val);

instead of 

val = max(128, val);

Oh well...
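
One way to make this slip harder to commit is to name the helper after
the intent rather than the operation. The sketch below is only a
suggestion; at_least() and at_most() are hypothetical names for this
example and are not existing kernel helpers.

```c
/* "I want a minimum value of floor": a lower bound is enforced by
 * taking the larger of the two values, i.e. max().
 */
int at_least(int floor, int val)
{
	return val > floor ? val : floor;
}

/* "I want a maximum value of ceiling": an upper bound is enforced by
 * taking the smaller of the two values, i.e. min().
 */
int at_most(int ceiling, int val)
{
	return val < ceiling ? val : ceiling;
}
```

Written this way, the backlog line would read
at_least(128, cnt / 256), which matches the sentence in my head and
behaves like max(128, cnt / 256).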

> With that change:
> 
> # cat /proc/sys/net/ipv4/tcp_max_orphans 
> 262144
> # cat /proc/sys/net/ipv4/tcp_max_tw_buckets
> 262144
> # cat /proc/sys/net/ipv4/tcp_max_syn_backlog 
> 2048
> 
> 
> Tested-by: Anton Blanchard <anton@samba.org>
> 
> Anton

Seems pretty good to me, thanks !

BTW, we now auto-limit ehash to 512*1024 slots, so you would probably
get the same results on a 128GB or 4096GB machine ;)
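
As a sanity check on those numbers, here is a small user-space sketch of
the new sizing with the max() correction applied. The struct and
function names are invented for this example; only the arithmetic
follows the patch.

```c
/* Hypothetical container for the three defaults the patch derives. */
struct tcp_boot_defaults {
	int max_tw_buckets;
	int max_orphans;
	int max_syn_backlog;
};

/* Same arithmetic as the patch, with max(128, cnt / 256) instead of
 * min(): cnt is the number of established-hash slots.
 */
struct tcp_boot_defaults tcp_defaults_from_ehash(unsigned int ehash_mask)
{
	struct tcp_boot_defaults d;
	int cnt = ehash_mask + 1;

	d.max_tw_buckets = cnt / 2;
	d.max_orphans = cnt / 2;
	d.max_syn_backlog = cnt / 256 > 128 ? cnt / 256 : 128;
	return d;
}
```

With the ehash capped at 512*1024 = 524288 slots, this gives 262144 for
both max_tw_buckets and max_orphans and 2048 for max_syn_backlog, which
matches the values Anton reported. A small table of 4096 slots would
give 2048, 2048 and the 128 floor.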

Anton Blanchard Aug. 26, 2010, 6:36 a.m. UTC | #4
Hi Dave,

> Did you test my percpu orphan fix yet?

The machine is running some tests, but I will either get time on it or
replicate the issue on a similar box.

Anton
Patch

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 197b9b7..403c029 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3209,7 +3209,7 @@  void __init tcp_init(void)
 {
 	struct sk_buff *skb = NULL;
 	unsigned long nr_pages, limit;
-	int order, i, max_share;
+	int order, i, max_share, cnt;
 	unsigned long jiffy = jiffies;
 
 	BUILD_BUG_ON(sizeof(struct tcp_skb_cb) > sizeof(skb->cb));
@@ -3258,22 +3258,11 @@  void __init tcp_init(void)
 		INIT_HLIST_HEAD(&tcp_hashinfo.bhash[i].chain);
 	}
 
-	/* Try to be a bit smarter and adjust defaults depending
-	 * on available memory.
-	 */
-	for (order = 0; ((1 << order) << PAGE_SHIFT) <
-			(tcp_hashinfo.bhash_size * sizeof(struct inet_bind_hashbucket));
-			order++)
-		;
-	if (order >= 4) {
-		tcp_death_row.sysctl_max_tw_buckets = 180000;
-		sysctl_tcp_max_orphans = 4096 << (order - 4);
-		sysctl_max_syn_backlog = 1024;
-	} else if (order < 3) {
-		tcp_death_row.sysctl_max_tw_buckets >>= (3 - order);
-		sysctl_tcp_max_orphans >>= (3 - order);
-		sysctl_max_syn_backlog = 128;
-	}
+	cnt = tcp_hashinfo.ehash_mask + 1;
+
+	tcp_death_row.sysctl_max_tw_buckets = cnt / 2;
+	sysctl_tcp_max_orphans = cnt / 2;
+	sysctl_max_syn_backlog = min(128, cnt / 256);
 
 	/* Set the pressure threshold to be a fraction of global memory that
 	 * is up to 1/2 at 256 MB, decreasing toward zero with the amount of