
inet_hash_connect: source port allocation

Message ID 1291057655.3435.1363.camel@edumazet-laptop
State RFC, archived
Delegated to: David Miller

Commit Message

Eric Dumazet Nov. 29, 2010, 7:07 p.m. UTC
On Monday, November 29, 2010 at 19:46 +0100, Eric Dumazet wrote:
> On Monday, November 29, 2010 at 18:29 +0000, John Haxby wrote:
> 
> > Sorry,  I think I phrased my question badly.
> > 
> > inet_csk_get_port() starts its search for a free port with
> > 
> >      smallest_rover = rover = net_random() % remaining + low;
> > 
> > whereas __inet_hash_connect() basically misses out that call to 
> > net_random() so you get a predictable port number.
> > 
> > Is there any good reason why that is the case?
> > 
> 
> It seems random port selection was done at bind() time only, in commit
> 6df716340da3a6f ([TCP/DCCP]: Randomize port selection).
> 
> It probably should be done in autobind too.
> 
> 
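
For context, the starting-point computation John quotes from
inet_csk_get_port() amounts to the user-space sketch below. random() and the
hard-coded 32768..61000 range are stand-ins for net_random() and the
ip_local_port_range sysctl, chosen only for illustration:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Sketch of the randomized starting point in inet_csk_get_port().
 * The kernel reads low/high from the ip_local_port_range sysctl and
 * uses net_random(); plain random() and the common defaults are
 * assumed here.
 */
int main(void)
{
	int low = 32768, high = 61000;
	int remaining = (high - low) + 1;
	int rover;

	srandom(time(NULL));
	/* same shape as: smallest_rover = rover = net_random() % remaining + low; */
	rover = random() % remaining + low;
	printf("port search starts at %d\n", rover);
	return 0;
}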

I'll test the following patch:




Comments

Eric Dumazet Nov. 29, 2010, 7:21 p.m. UTC | #1
On Monday, November 29, 2010 at 20:07 +0100, Eric Dumazet wrote:
> On Monday, November 29, 2010 at 19:46 +0100, Eric Dumazet wrote:
> > On Monday, November 29, 2010 at 18:29 +0000, John Haxby wrote:
> > 
> > > Sorry,  I think I phrased my question badly.
> > > 
> > > inet_csk_get_port() starts its search for a free port with
> > > 
> > >      smallest_rover = rover = net_random() % remaining + low;
> > > 
> > > whereas __inet_hash_connect() basically misses out that call to 
> > > net_random() so you get a predictable port number.
> > > 
> > > Is there any good reason why that is the case?
> > > 
> > 
> > It seems random port selection was done at bind() time only, in commit
> > 6df716340da3a6f ([TCP/DCCP]: Randomize port selection).
> > 
> > It probably should be done in autobind too.
> > 
> > 
> 
> I'll test the following patch:

Oh well, forget this; there is a reason inet_sk_port_offset() uses
secure_ipv4_port_ephemeral().

We want to avoid reusing the same port too fast.

http://www.tcpipguide.com/free/t_TCPIPClientEphemeralPortsandClientServerApplicatio-2.htm

The port is predictable only for the same destination, and only if no other
connections are attempted by other threads.
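
To illustrate why that offset limits predictability to a single destination,
here is a rough user-space sketch. The real inet_sk_port_offset() calls
secure_ipv4_port_ephemeral(), a keyed cryptographic hash of
(saddr, daddr, dport); the toy mixing function below is only an assumption,
used to show that different destinations get unrelated sequences:

#include <stdio.h>
#include <stdint.h>

/* Toy stand-in for inet_sk_port_offset()/secure_ipv4_port_ephemeral().
 * The real function is a keyed cryptographic hash of (saddr, daddr, dport);
 * this mixer is purely illustrative.
 */
static uint32_t toy_port_offset(uint32_t saddr, uint32_t daddr, uint16_t dport)
{
	uint32_t h = saddr ^ (daddr * 2654435761u) ^ (dport * 40503u);

	h ^= h >> 16;
	return h;
}

int main(void)
{
	int low = 32768, high = 61000, remaining = (high - low) + 1;
	uint32_t hint = 0;              /* shared rover, advanced after each connect() */
	uint32_t saddr = 0x0a000001;    /* 10.0.0.1    (example source) */
	uint32_t dst1  = 0xc0a80101;    /* 192.168.1.1 (example destination) */
	uint32_t dst2  = 0xc0a80102;    /* 192.168.1.2 (example destination) */

	/* Same hint, different destinations: unrelated starting ports, so the
	 * sequence is predictable only for a given destination. */
	printf("dest1 start: %u\n",
	       (unsigned)(low + (hint + toy_port_offset(saddr, dst1, 80)) % remaining));
	printf("dest2 start: %u\n",
	       (unsigned)(low + (hint + toy_port_offset(saddr, dst2, 80)) % remaining));
	return 0;
}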



stephen hemminger Nov. 29, 2010, 7:38 p.m. UTC | #2
On Mon, 29 Nov 2010 20:07:35 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Monday, November 29, 2010 at 19:46 +0100, Eric Dumazet wrote:
> > On Monday, November 29, 2010 at 18:29 +0000, John Haxby wrote:
> > 
> > > Sorry,  I think I phrased my question badly.
> > > 
> > > inet_csk_get_port() starts its search for a free port with
> > > 
> > >      smallest_rover = rover = net_random() % remaining + low;
> > > 
> > > whereas __inet_hash_connect() basically misses out that call to 
> > > net_random() so you get a predictable port number.
> > > 
> > > Is there any good reason why that is the case?
> > > 
> > 
> > It seems random port selection was done at bind() time only, in commit
> > 6df716340da3a6f ([TCP/DCCP]: Randomize port selection).
> > 
> > It probably should be done in autobind too.
> > 
> > 
> 
> I'll test the following patch:
> 
> diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
> index 1b344f3..65c3702 100644
> --- a/net/ipv4/inet_hashtables.c
> +++ b/net/ipv4/inet_hashtables.c
> @@ -466,20 +466,18 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
>  	int twrefcnt = 1;
>  
>  	if (!snum) {
> -		int i, remaining, low, high, port;
> -		static u32 hint;
> -		u32 offset = hint + port_offset;
> +		int remaining, low, high, port;
>  		struct hlist_node *node;
>  		struct inet_timewait_sock *tw = NULL;
>  
>  		inet_get_local_port_range(&low, &high);
>  		remaining = (high - low) + 1;
> +		port = net_random() % remaining + low;
>  
>  		local_bh_disable();
> -		for (i = 1; i <= remaining; i++) {
> -			port = low + (i + offset) % remaining;
> +		do {
>  			if (inet_is_reserved_local_port(port))
> -				continue;
> +				goto next_nolock;
>  			head = &hinfo->bhash[inet_bhashfn(net, port,
>  					hinfo->bhash_size)];
>  			spin_lock(&head->lock);
> @@ -510,16 +508,17 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
>  			tb->fastreuse = -1;
>  			goto ok;
>  
> -		next_port:
> +next_port:
>  			spin_unlock(&head->lock);
> -		}
> +next_nolock:
> +			if (++port > high)
> +				port = low;
> +		} while (--remaining > 0);
>  		local_bh_enable();
>  
>  		return -EADDRNOTAVAIL;
>  
>  ok:
> -		hint += i;
> -
>  		/* Head lock still held and bh's disabled */
>  		inet_bind_hash(sk, tb, port);
>  		if (sk_unhashed(sk)) {
> 

The original algorithm works better than yours if the port space is small
and being reused rapidly. Because the hint in the old algorithm is sequential,
ports get used up sequentially.

You should look at the port randomization RFC. The earlier versions of the
RFC were better, before the BSD guys started putting in their non-scalable
algorithms :-)
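
To make the contrast concrete, here is a compressed sketch of the two
allocation patterns (plain C; the static hint and a fixed offset stand in
for the per-socket inet_sk_port_offset(), and every candidate port is assumed
to be free, so this models the allocation order, not the kernel code):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define LOW  32768
#define HIGH 61000

/* Old scheme: a shared static hint plus a per-destination offset.  The hint
 * advances after every successful allocation, so back-to-back connects walk
 * the port space sequentially and a small, rapidly reused range is covered
 * evenly. */
static int alloc_old(unsigned int *hint, unsigned int offset)
{
	int remaining = (HIGH - LOW) + 1;
	int port = LOW + (1 + *hint + offset) % remaining;

	*hint += 1;	/* pretend the first candidate was free */
	return port;
}

/* Patched scheme: every call restarts from a random port. */
static int alloc_random(void)
{
	int remaining = (HIGH - LOW) + 1;

	return random() % remaining + LOW;
}

int main(void)
{
	unsigned int hint = 0, offset = 12345;	/* arbitrary example offset */
	int i;

	srandom(time(NULL));
	for (i = 0; i < 4; i++)
		printf("old: %d   random: %d\n",
		       alloc_old(&hint, offset), alloc_random());
	return 0;
}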

Patch

diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 1b344f3..65c3702 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -466,20 +466,18 @@  int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 	int twrefcnt = 1;
 
 	if (!snum) {
-		int i, remaining, low, high, port;
-		static u32 hint;
-		u32 offset = hint + port_offset;
+		int remaining, low, high, port;
 		struct hlist_node *node;
 		struct inet_timewait_sock *tw = NULL;
 
 		inet_get_local_port_range(&low, &high);
 		remaining = (high - low) + 1;
+		port = net_random() % remaining + low;
 
 		local_bh_disable();
-		for (i = 1; i <= remaining; i++) {
-			port = low + (i + offset) % remaining;
+		do {
 			if (inet_is_reserved_local_port(port))
-				continue;
+				goto next_nolock;
 			head = &hinfo->bhash[inet_bhashfn(net, port,
 					hinfo->bhash_size)];
 			spin_lock(&head->lock);
@@ -510,16 +508,17 @@  int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 			tb->fastreuse = -1;
 			goto ok;
 
-		next_port:
+next_port:
 			spin_unlock(&head->lock);
-		}
+next_nolock:
+			if (++port > high)
+				port = low;
+		} while (--remaining > 0);
 		local_bh_enable();
 
 		return -EADDRNOTAVAIL;
 
 ok:
-		hint += i;
-
 		/* Head lock still held and bh's disabled */
 		inet_bind_hash(sk, tb, port);
 		if (sk_unhashed(sk)) {
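
For readers who want to see the new search order in isolation, a user-space
model of the patched loop follows. port_is_reserved() and port_is_free() are
invented stand-ins for inet_is_reserved_local_port() and the bind-hash lookup
done under head->lock; only the scan order (random start, wrap at high, give
up after remaining candidates) mirrors the patch:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Arbitrary stand-ins for the real checks; only the scan order matters. */
static int port_is_reserved(int port) { return port % 1000 == 0; }
static int port_is_free(int port)     { return port % 7 != 0; }

static int pick_ephemeral_port(int low, int high)
{
	int remaining = (high - low) + 1;
	int port = random() % remaining + low;

	do {
		if (!port_is_reserved(port) && port_is_free(port))
			return port;		/* "goto ok" in the patch */
		if (++port > high)		/* wrap around the range */
			port = low;
	} while (--remaining > 0);

	return -1;	/* -EADDRNOTAVAIL in the kernel */
}

int main(void)
{
	srandom(time(NULL));
	printf("allocated port %d\n", pick_ephemeral_port(32768, 61000));
	return 0;
}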