diff mbox

[net] tcp: fix connect() invalid -EADDRNOTAVAIL error

Message ID 1416379060-15685-1-git-send-email-jmaxwell37@gmail.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Jon Maxwell Nov. 19, 2014, 6:37 a.m. UTC
The connect() routine returns -EADDRNOTAVAIL without doing a 4 
tuple check when the hash buckets were previously allocated by 
bind() and all local ports are used.

The bind() routine creates the local port hash buckets in 
inet_csk_get_port(). Depending on the socket options it sets 
tb->fastreuse and tb->fastreuseport to 0 or 1 in the bucket.

However the __inet_hash_connect() routine initializes the hash 
buckets differently and sets these to -1. The end result is 
that connect() calling into __inet_hash_connect() will 
subsequently ignore the check_established() routine if, here

__inet_hash_connect()
.
.
if (tb->fastreuse >= 0 ||↩
    tb->fastreuseport >= 0)↩
    goto next_port;

and cycle through all local ports until it returns -EADDRNOTAVAIL. 
The 4 tuple check is in check_established() so connect() can fail 
unnecessarily.

Prerequisites for this to happen:
1) The local tcp port range must be exhausted.
2) A process must have called bind() followed by connect() for all 
local ports.
3) A different process calls connect() only which returns -EADDRNOTAVAIL. 
4) The system more than 1 interface configured.

If a system has 2 IP Addresses and all local tcp ports are in use
for connection from IP Address (1). Connecting to the same ports 
via IP Address (2) should work based on the 4 tuple rule. But it 
fails under this condition. 

To fix this make __inet_hash_connect() honour inet_csk_get_port()'s
tb->fastreuse* variables.

Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
---
 net/ipv4/inet_hashtables.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

Comments

Eric Dumazet Nov. 19, 2014, 5:12 p.m. UTC | #1
On Wed, 2014-11-19 at 17:37 +1100, Jon Maxwell wrote:

> Prerequisites for this to happen:
> 1) The local tcp port range must be exhausted.
> 2) A process must have called bind() followed by connect() for all 
> local ports.

How the bind() is done exactly ? How SO_REUSEADDR is used ?

> 3) A different process calls connect() only which returns -EADDRNOTAVAIL. 
> 4) The system more than 1 interface configured.
> 
> If a system has 2 IP Addresses and all local tcp ports are in use
> for connection from IP Address (1). Connecting to the same ports 
> via IP Address (2) should work based on the 4 tuple rule. But it 
> fails under this condition. 

I do not think this is generally true.

If process called bind() to reserve a port, another process should not
be able to use the same port.

Do you have a test program exhibiting the problem ?

Thanks !


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Nov. 20, 2014, 6:33 a.m. UTC | #2
On Thu, 2014-11-20 at 14:44 +1100, Jonathan Maxwell wrote:
> > > Prerequisites for this to happen:
> > > 1) The local tcp port range must be exhausted.
> > > 2) A process must have called bind() followed by connect() for all
> > > local ports.
> > 
> > How the bind() is done exactly ? How SO_REUSEADDR is used ?
> 
> It fails regardless. I tried both with and without for the client and
> server programs.
> 

This is the missing part from the programs.

By not using SO_REUSEADDR, programs basically do not allow another
programs to use same port.

> But removing the bind() and just calling connect() from the initial
> program
> then a subsequent connect() from a separate program succeeds. It seems
> that 
> this is inconsistent behaviour. The proposed fix makes it behave the
> same for
> 
> both cases.

This not consistent behavior is well known and somehow expected by some
applications.

bind() requests the kernel that this socket has a reserved port.

It means the socket can later disconnect from the target, and reconnect
to another, using _same_ source port, or chose to listen() on this port.

That is why kernel is so picky, otherwise the listen() might fail
later...

This is part of BSD socket semantic.

You have to use SO_REUSEADDR on both programs to relax these
constraints.

Your change might break existing programs, really expecting kernel
to behave as requested.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 9111a4e..b39e89e 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -513,8 +513,8 @@  int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 			inet_bind_bucket_for_each(tb, &head->chain) {
 				if (net_eq(ib_net(tb), net) &&
 				    tb->port == port) {
-					if (tb->fastreuse >= 0 ||
-					    tb->fastreuseport >= 0)
+					if (tb->fastreuse > 0 ||
+					    tb->fastreuseport > 0)
 						goto next_port;
 					WARN_ON(hlist_empty(&tb->owners));
 					if (!check_established(death_row, sk,
@@ -530,8 +530,6 @@  int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 				spin_unlock(&head->lock);
 				break;
 			}
-			tb->fastreuse = -1;
-			tb->fastreuseport = -1;
 			goto ok;
 
 		next_port: