Message ID | 1271808314.7895.614.camel@edumazet-laptop |
---|---|
State | Superseded, archived |
Delegated to: | David Miller |
2010/4/21 Eric Dumazet <eric.dumazet@gmail.com>:
> On Tuesday 20 April 2010 at 16:49 -0700, Ben Greear wrote:
>> On 04/20/2010 04:35 PM, Gaspar Chilingarov wrote:
>> > sysctl -a | grep local_port_range
>>
>> [root@ct503-10G-09 ~]# sysctl -a | grep local_port_range
>> net.ipv4.ip_local_port_range = 10000 61000
>>
>> I'm explicitly binding to local ports as well as local IPs, btw.
>>
>
> I believe the bsockets 'optimization' is a bug, we should remove it.
>
> This is a stable candidate (2.6.30+)
>
> [PATCH net-next-2.6] tcp: remove bsockets count
>
> Counting number of bound sockets to avoid a loop is buggy, since we cant
> know how many IP addresses are in use. When threshold is reached, we try
> 5 random slots and can fail while there are plenty available ports.
>

Thank you a lot for the patch - I will try it.

In FreeBSD I was able to add about 32 C classes (8192 IPs) on a single
interface (never tried to do that in Linux yet :) - so you really never
know how many IPs are available. Tens and even up to a hundred IPs on a
single machine are not that usual in a hosting environment at all.

/Gaspar
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 21 Apr 2010 02:05:14 +0200

> [PATCH net-next-2.6] tcp: remove bsockets count
>
> Counting number of bound sockets to avoid a loop is buggy, since we cant
> know how many IP addresses are in use. When threshold is reached, we try
> 5 random slots and can fail while there are plenty available ports.
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Hmmm, yes indeed it seems there are improper assumptions made by
this scheme.

This is a tricky area, so I'll wait for some test results from the
reporter and study the code over a few times myself to make sure
we get this right.
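To make the "improper assumptions" concrete, here is a minimal user-space sketch of the arithmetic, not kernel code. threshold_reached() mirrors the comparison removed by the patch (bsockets > (high - low) + 1); bind_capacity() and its num_addrs parameter are hypothetical names introduced only for illustration, assuming every local address can use the full port range.

```c
#include <stdbool.h>

/*
 * Mirrors the comparison removed by the patch: the global count of
 * bound sockets is tested against the size of the local port range.
 */
static bool threshold_reached(long bsockets, int low, int high)
{
	return bsockets > (long)(high - low) + 1;
}

/*
 * Hypothetical helper (not in the kernel): number of distinct
 * (address, port) pairs, assuming each of num_addrs local addresses
 * can use every port in [low, high].
 */
static long bind_capacity(long num_addrs, int low, int high)
{
	return num_addrs * ((long)(high - low) + 1);
}
```

With the range from Ben's sysctl output (10000 to 61000, i.e. 51001 ports) and two local addresses holding 30000 bound sockets each, the threshold test fires at 60000 sockets even though 42002 (address, port) pairs are still free under that assumption.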
On Wed, Apr 21, 2010 at 02:05:14AM +0200, Eric Dumazet (eric.dumazet@gmail.com) wrote:
> I believe the bsockets 'optimization' is a bug, we should remove it.
>
> This is a stable candidate (2.6.30+)
>
> [PATCH net-next-2.6] tcp: remove bsockets count
>
> Counting number of bound sockets to avoid a loop is buggy, since we cant
> know how many IP addresses are in use. When threshold is reached, we try
> 5 random slots and can fail while there are plenty available ports.

To return back to exponential bind() times you need to revert the whole
original patch including magic 5 number, not only bsockets.

But actual problem is not in this digit, but in a deeper logic.
Previously we scanned the whole table, now we have 5 attempts to
find out at least one bucket (without conflict) we will insert
new socket into. Apparently for large number of addresses it is possible
that all 5 times we will randomly select those buckets which conflicts.
As dumb solution we can increase 'attempt' number to infinite one, or
fallback to whole-table-search after several random attempts, which is a
bit more clever I think.
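The fallback idea above can be sketched in user space roughly as follows. This is a simplified model under stated assumptions, not the kernel implementation: struct port_table, pick_port() and the random_attempts parameter are hypothetical names, and a single in-use flag per port stands in for the kernel's bind buckets and conflict checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define PORT_SPACE 65536

/* Hypothetical stand-in for the bind hash: one in-use flag per port. */
struct port_table {
	bool in_use[PORT_SPACE];
};

/*
 * Try a few random ports first; if all of them conflict, fall back to
 * a linear scan of the whole range. Returns the chosen port, or -1
 * only when every port in [low, high] is taken.
 */
static int pick_port(const struct port_table *t, int low, int high,
		     int random_attempts)
{
	int range = high - low + 1;
	int i;

	assert(low >= 0 && high < PORT_SPACE && low <= high);

	for (i = 0; i < random_attempts; i++) {
		int port = low + rand() % range;

		if (!t->in_use[port])
			return port;
	}

	/* Fallback: full scan, so an existing free port is never missed. */
	for (i = low; i <= high; i++) {
		if (!t->in_use[i])
			return i;
	}
	return -1;
}
```

The random attempts keep the common case cheap, while the scan guarantees that the function fails only when the range is genuinely exhausted, which is the property the fixed limit of 5 attempts lacks.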
From: Evgeniy Polyakov <zbr@ioremap.net>
Date: Wed, 21 Apr 2010 04:30:22 +0400

> To return back to exponential bind() times you need to revert the whole
> original patch including magic 5 number, not only bsockets.

Indeed, if we keep that '5' thing there it's just going to still fail
similarly.

> But actual problem is not in this digit, but in a deeper logic.
> Previously we scanned the whole table, now we have 5 attempts to
> find out at least one bucket (without conflict) we will insert
> new socket into. Apparently for large number of addresses it is possible
> that all 5 times we will randomly select those buckets which conflicts.
> As dumb solution we can increase 'attempt' number to infinite one, or
> fallback to whole-table-search after several random attempts, which is a
> bit more clever I think.

If random number generator is not too terrible, just using infinite
limit would be roughly equivalent.
diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index 74358d1..e0f3a05 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -150,8 +150,6 @@ struct inet_hashinfo {
 	 */
 	struct inet_listen_hashbucket listening_hash[INET_LHTABLE_SIZE]
 					____cacheline_aligned_in_smp;
-
-	atomic_t bsockets;
 };
 
 static inline struct inet_ehash_bucket *inet_ehash_bucket(
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 8da6429..0bbfd00 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -119,11 +119,6 @@ again:
 					    (tb->num_owners < smallest_size || smallest_size == -1)) {
 						smallest_size = tb->num_owners;
 						smallest_rover = rover;
-						if (atomic_read(&hashinfo->bsockets) > (high - low) + 1) {
-							spin_unlock(&head->lock);
-							snum = smallest_rover;
-							goto have_snum;
-						}
 					}
 					goto next;
 				}
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 2b79377..4bc921f 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -62,8 +62,6 @@ void inet_bind_hash(struct sock *sk, struct inet_bind_bucket *tb,
 {
 	struct inet_hashinfo *hashinfo = sk->sk_prot->h.hashinfo;
 
-	atomic_inc(&hashinfo->bsockets);
-
 	inet_sk(sk)->inet_num = snum;
 	sk_add_bind_node(sk, &tb->owners);
 	tb->num_owners++;
@@ -81,8 +79,6 @@ static void __inet_put_port(struct sock *sk)
 	struct inet_bind_hashbucket *head = &hashinfo->bhash[bhash];
 	struct inet_bind_bucket *tb;
 
-	atomic_dec(&hashinfo->bsockets);
-
 	spin_lock(&head->lock);
 	tb = inet_csk(sk)->icsk_bind_hash;
 	__sk_del_bind_node(sk);
@@ -551,7 +547,6 @@ void inet_hashinfo_init(struct inet_hashinfo *h)
 {
 	int i;
 
-	atomic_set(&h->bsockets, 0);
 	for (i = 0; i < INET_LHTABLE_SIZE; i++) {
 		spin_lock_init(&h->listening_hash[i].lock);
 		INIT_HLIST_NULLS_HEAD(&h->listening_hash[i].head,