Patchwork inet: dont set inet_rcv_saddr in connect()

login
register
mail settings
Submitter Eric Dumazet
Date Sept. 7, 2010, 9:35 p.m.
Message ID <1283895316.2634.248.camel@edumazet-laptop>
Download mbox | patch
Permalink /patch/64073/
State Accepted
Delegated to: David Miller
Headers show

Comments

Eric Dumazet - Sept. 7, 2010, 9:35 p.m.
Le mardi 07 septembre 2010 à 12:59 -0700, David Miller a écrit :

> Eric, please just delete the code block instead of leaving it
> there inside of an #if 0 block.
> 
> If there is information conveyed by the unused code, add that
> information to the nice comment you're adding :-)
> 

Indeed ;)

[PATCH] inet: dont set inet_rcv_saddr in connect()

The following sequence :

socket(PF_INET, SOCK_DGRAM, IPPROTO_IP)

connect(fd, {sa_family=AF_INET, sin_port=htons(xx),
sin_addr=inet_addr("1.2.3.4")}, 28)

1) Does an implicit inet_autobind()
  (using an INADDR_ANY address, and selecting a random port).

2) Does an ip4_datagram_connect() to specify the address/port of
remote end point.

Problem is ip4_datagram_connect() also sets inet->inet_rcv_saddr (from
INADDR_ANY to IP source address, given the current route to remote end
point). Only the first connect() on the socket does this. Following ones
dont change the (possibly wrong) source address.

This breaks the secondary UDP hash, based on (ADDRESS, port), that was
computed by inet_autobind(). If more than 10 sockets are linked in
primary hash chain, we can drop incoming packets because searches are
done in wrong secondary hash chain.

This also potentially breaks multiple connect() to change remote
endpoints, because old source address might be non usable for packets to
new destination.

If route happens to change, then we should automatically change our
source address too, at next sendmsg() call, and UDP code deals with this
just fine.

If an application needs to specify a precise source address, it must use
bind() system call. connect() man page only refers to remote address,
not local one.

Reported-by: Krzysztof Olędzki <ole@ans.pl>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/ipv4/datagram.c |   11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Krzysztof Oledzki - Sept. 7, 2010, 9:52 p.m.
On 2010-09-07 23:35, Eric Dumazet wrote:
> Le mardi 07 septembre 2010 à 12:59 -0700, David Miller a écrit :
>
>> Eric, please just delete the code block instead of leaving it
>> there inside of an #if 0 block.
>>
>> If there is information conveyed by the unused code, add that
>> information to the nice comment you're adding :-)
>>
>
> Indeed ;)
>
> [PATCH] inet: dont set inet_rcv_saddr in connect()
>
> The following sequence :
>
> socket(PF_INET, SOCK_DGRAM, IPPROTO_IP)
>
> connect(fd, {sa_family=AF_INET, sin_port=htons(xx),
> sin_addr=inet_addr("1.2.3.4")}, 28)
>
> 1) Does an implicit inet_autobind()
>    (using an INADDR_ANY address, and selecting a random port).
>
> 2) Does an ip4_datagram_connect() to specify the address/port of
> remote end point.
>
> Problem is ip4_datagram_connect() also sets inet->inet_rcv_saddr (from
> INADDR_ANY to IP source address, given the current route to remote end
> point). Only the first connect() on the socket does this. Following ones
> dont change the (possibly wrong) source address.
>
> This breaks the secondary UDP hash, based on (ADDRESS, port), that was
> computed by inet_autobind(). If more than 10 sockets are linked in
> primary hash chain, we can drop incoming packets because searches are
> done in wrong secondary hash chain.
>
> This also potentially breaks multiple connect() to change remote
> endpoints, because old source address might be non usable for packets to
> new destination.
>
> If route happens to change, then we should automatically change our
> source address too, at next sendmsg() call, and UDP code deals with this
> just fine.
>
> If an application needs to specify a precise source address, it must use
> bind() system call. connect() man page only refers to remote address,
> not local one.
>
> Reported-by: Krzysztof Olędzki <ole@ans.pl>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl>

Best regards,

			Krzysztof Olędzki
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Brian Haley - Sept. 8, 2010, 2:34 a.m.
On 09/07/2010 05:35 PM, Eric Dumazet wrote:
> Problem is ip4_datagram_connect() also sets inet->inet_rcv_saddr (from
> INADDR_ANY to IP source address, given the current route to remote end
> point). Only the first connect() on the socket does this. Following ones
> dont change the (possibly wrong) source address.
> 
> This breaks the secondary UDP hash, based on (ADDRESS, port), that was
> computed by inet_autobind(). If more than 10 sockets are linked in
> primary hash chain, we can drop incoming packets because searches are
> done in wrong secondary hash chain.

I can't confirm this since I don't have 10 sockets in my primary hash
chain at the moment...

> This also potentially breaks multiple connect() to change remote
> endpoints, because old source address might be non usable for packets to
> new destination.
> 
> If route happens to change, then we should automatically change our
> source address too, at next sendmsg() call, and UDP code deals with this
> just fine.
> 
> If an application needs to specify a precise source address, it must use
> bind() system call. connect() man page only refers to remote address,
> not local one.

Is this really the right thing to do?  Linux has been doing this forever,
right?  Just like BSD has done it forever.  The way I've always "cleared"
a local address is to set the address family to AF_UNSPEC on the next
connect() call, as mentioned on the man page.  I just want to make sure
we're not changing something just to work around a broken application,
sendto()/sendmsg() work perfect in this case by not setting the local address.

BTW, it seems as though the reason this might only happen sometimes is
that if the first connect() is to 127.0.0.1, you won't be able to then
try and connect to say, 192.168.1.1.  If you first connect() to a remote
address things will probably just work.

My possibly wrong $.02

-Brian
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - Sept. 8, 2010, 3:34 a.m.
From: Brian Haley <brian.haley@hp.com>
Date: Tue, 07 Sep 2010 22:34:59 -0400

> Is this really the right thing to do?  Linux has been doing this forever,
> right?  Just like BSD has done it forever.

Indeed, I checked for this in Stevens volume 2 when I reviewed
Eric's patch, the logic is identical.

> The way I've always "cleared" a local address is to set the address
> family to AF_UNSPEC on the next connect() call, as mentioned on the
> man page.  I just want to make sure we're not changing something
> just to work around a broken application, sendto()/sendmsg() work
> perfect in this case by not setting the local address.
> 
> BTW, it seems as though the reason this might only happen sometimes is
> that if the first connect() is to 127.0.0.1, you won't be able to then
> try and connect to say, 192.168.1.1.  If you first connect() to a remote
> address things will probably just work.

Actually I have enough doubts about this too.  So I'm going to hold
off on the patch until we sort this out.

As Eric stated, sendmsg() on raw and UDP sockets pick the source
address based upon the bound or looked up route anyways.  But still
I think something still might end up being not-kosher with Eric's
change.

More so, I suspect, the issue is simply that the ordering of events
is simply inconvenient for the secondary hash implementation :-)
The port bind happens before the source address is bound, and that
is what screws everything up.

In this sense leaving the source addresses at INADDR_ANY is just a
workaround for the secondary hash table :-)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet - Sept. 8, 2010, 4:42 a.m.
Le mardi 07 septembre 2010 à 20:34 -0700, David Miller a écrit :
> From: Brian Haley <brian.haley@hp.com>
> Date: Tue, 07 Sep 2010 22:34:59 -0400
> 
> > Is this really the right thing to do?  Linux has been doing this forever,
> > right?  Just like BSD has done it forever.
> 
> Indeed, I checked for this in Stevens volume 2 when I reviewed
> Eric's patch, the logic is identical.
> 
> > The way I've always "cleared" a local address is to set the address
> > family to AF_UNSPEC on the next connect() call, as mentioned on the
> > man page.  I just want to make sure we're not changing something
> > just to work around a broken application, sendto()/sendmsg() work
> > perfect in this case by not setting the local address.
> > 

Problem is AF_UNSPEC always clears the remote address (as stated in
manual), and sometimes local one (as not stated)

if (!(sk->sk_userlocks & SOCK_BINDADDR_LOCK))
	inet_reset_saddr(sk);

This is the workaround that was coded years ago in Linux to undo the
local addr setting ;)


Following program produces this output :

local addr=0x7f000001 sin_port=37877
after connect(AF_UNSPEC) local addr=0x0 sin_port=0
local addr=0x7f000001 sin_port=37877
after connect(AF_UNSPEC) local addr=0x7f000001 sin_port=0



#include <stdlib.h>
#include <unistd.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <string.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
	int fd;
	struct sockaddr_in target, me;
	socklen_t len;

	if ((fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP)) == -1) {
		perror("socket");
		return 1;
	}
	memset(&target, 0, sizeof(target));
	target.sin_family = AF_INET;
	target.sin_port = 123;
	target.sin_addr.s_addr = htonl(0x7f000001);
	if (connect(fd, (const struct sockaddr *)&target, sizeof(target)) == -1) {
		perror("connect");
		close(fd);
		return 2;
	}
	len = sizeof(me);
	if (getsockname(fd, (struct sockaddr *)&me, &len)== -1) {
		perror("getsockname");
		close(fd);
		return 3;
	}
	printf("local addr=0x%x sin_port=%u\n",
		ntohl(me.sin_addr.s_addr), ntohs(me.sin_port));
	memset(&target, 0, sizeof(target));
	target.sin_family = AF_UNSPEC;
	if (connect(fd, (const struct sockaddr *)&target, sizeof(target)) == -1) {
		perror("connect AF_UNSPEC");
		close(fd);
		return 4;
	}	
	len = sizeof(me);
	if (getsockname(fd, (struct sockaddr *)&me, &len)== -1) {
		perror("getsockname 2");
		close(fd);
		return 3;
	}
	printf("after connect(AF_UNSPEC) local addr=0x%x sin_port=%u\n",
		ntohl(me.sin_addr.s_addr), ntohs(me.sin_port));
	memset(&target, 0, sizeof(target));
	target.sin_family = AF_INET;
	target.sin_addr.s_addr = htonl(0x7f000001);
	if (bind(fd, (const struct sockaddr *)&target, sizeof(target)) == -1) {
		perror("bind");
		close(fd);
		return 2;
	}
	len = sizeof(me);
	if (getsockname(fd, (struct sockaddr *)&me, &len)== -1) {
		perror("getsockname");
		close(fd);
		return 3;
	}
	printf("local addr=0x%x sin_port=%u\n",
		ntohl(me.sin_addr.s_addr), ntohs(me.sin_port));

	memset(&target, 0, sizeof(target));
	target.sin_family = AF_UNSPEC;
	if (connect(fd, (const struct sockaddr *)&target, sizeof(target)) == -1) {
		perror("connect AF_UNSPEC");
		close(fd);
		return 4;
	}	

	len = sizeof(me);
	if (getsockname(fd, (struct sockaddr *)&me, &len)== -1) {
		perror("getsockname 2");
		close(fd);
		return 3;
	}
	printf("after connect(AF_UNSPEC) local addr=0x%x sin_port=%u\n",
		ntohl(me.sin_addr.s_addr), ntohs(me.sin_port));
	return 0;
}



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet - Sept. 8, 2010, 4:57 a.m.
Le mardi 07 septembre 2010 à 22:34 -0400, Brian Haley a écrit :

> Is this really the right thing to do?  Linux has been doing this forever,
> right?  Just like BSD has done it forever.  The way I've always "cleared"
> a local address is to set the address family to AF_UNSPEC on the next
> connect() call, as mentioned on the man page.  I just want to make sure
> we're not changing something just to work around a broken application,
> sendto()/sendmsg() work perfect in this case by not setting the local address.
> 
> BTW, it seems as though the reason this might only happen sometimes is
> that if the first connect() is to 127.0.0.1, you won't be able to then
> try and connect to say, 192.168.1.1.  If you first connect() to a remote
> address things will probably just work.


I believe we have the following choice :

1) connect(AF_UNIX) sets the remote address/port
   bind() sets the local port (and optionally address)
   connect(AF_UNSPEC) clears remote addess/port,
                      let local address/port unchanged

2) Correct UDP hashing, when local address changes from 0 to x.y.z.t
   (cost : two locks taken), and a possible packet drop during the
operation.

   Document that connect() also sets local address, and that before
doing a second connect() to change remote address, its mandatory to
first issue a connect(AF_UNSPEC) to clear local address (if not locked
by a prior bind() call)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - Sept. 8, 2010, 5:36 a.m.
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 08 Sep 2010 06:57:37 +0200

>    Document that connect() also sets local address, and that before
> doing a second connect() to change remote address, its mandatory to
> first issue a connect(AF_UNSPEC) to clear local address (if not locked
> by a prior bind() call)

For connectionless sockets, the application may connect() as many
times as it wishes to change the remote address.  The local address
remains set if it were set before such a re-associating connect().

It need only issue a connect(AF_UNSPEC) to make the socket have no
remote association, and as you state this operation will also wipe out
any local address settings not created by a bind() call.

And nicely our man pages are very clear about this :-) as is BSD and
Steven's volume 2.

This has been legal for decades, so we have to keep working this way.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - Sept. 8, 2010, 5:51 a.m.
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 08 Sep 2010 06:42:45 +0200

> Problem is AF_UNSPEC always clears the remote address (as stated in
> manual), and sometimes local one (as not stated)
> 
> if (!(sk->sk_userlocks & SOCK_BINDADDR_LOCK))
> 	inet_reset_saddr(sk);
> 
> This is the workaround that was coded years ago in Linux to undo the
> local addr setting ;)

Ok, I mis-remembered about AF_UNSPEC, I thought BSD did it too, it
does not.

But we really offer the most flexibility here because we offer this.

In BSD case, connect() on an already connect()'d UDP socket
unconditionally zaps the local address setting (because, as Stevens
states, that local address "might" have been created by a previous
connect()).

This forces the application to re-bind() the local address if it wants
to use something other than what connect() is going to choose.

We leave the local settings alone for normal connect() on already
connect()'d sockets, and provide connect(AF_UNSPEC) to explicitly kill
connect() created local and remote associations.

This allows bind() settings to survive through multiple connect()
calls if the user wants, a facility BSD does not provide.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet - Sept. 8, 2010, 5:52 a.m.
Le mardi 07 septembre 2010 à 22:36 -0700, David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 08 Sep 2010 06:57:37 +0200
> 
> >    Document that connect() also sets local address, and that before
> > doing a second connect() to change remote address, its mandatory to
> > first issue a connect(AF_UNSPEC) to clear local address (if not locked
> > by a prior bind() call)
> 
> For connectionless sockets, the application may connect() as many
> times as it wishes to change the remote address.  The local address
> remains set if it were set before such a re-associating connect().
> 
> It need only issue a connect(AF_UNSPEC) to make the socket have no
> remote association, and as you state this operation will also wipe out
> any local address settings not created by a bind() call.
> 
> And nicely our man pages are very clear about this :-) as is BSD and
> Steven's volume 2.
> 
> This has been legal for decades, so we have to keep working this way.

Yes, its also buggy, if 2nd remote address is not reachable on same interface.
Even if we try a connect(AF_UNSPEC), the local address stay as is :

after bind(port 5555) local addr=0x0:5555 
after connect(123) local addr=0x7f000001:5555 remote addr=0x7f000001:123 
Could not connect, errno=22
after connect(AF_UNSPEC) local addr=0x7f000001:5555 
connect: Invalid argument

I'll work on UDP fix anyway.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet - Sept. 8, 2010, 2:27 p.m.
Le mercredi 08 septembre 2010 à 07:52 +0200, Eric Dumazet a écrit :
> Le mardi 07 septembre 2010 à 22:36 -0700, David Miller a écrit :
> > From: Eric Dumazet <eric.dumazet@gmail.com>
> > Date: Wed, 08 Sep 2010 06:57:37 +0200
> > 
> > >    Document that connect() also sets local address, and that before
> > > doing a second connect() to change remote address, its mandatory to
> > > first issue a connect(AF_UNSPEC) to clear local address (if not locked
> > > by a prior bind() call)
> > 
> > For connectionless sockets, the application may connect() as many
> > times as it wishes to change the remote address.  The local address
> > remains set if it were set before such a re-associating connect().
> > 
> > It need only issue a connect(AF_UNSPEC) to make the socket have no
> > remote association, and as you state this operation will also wipe out
> > any local address settings not created by a bind() call.
> > 
> > And nicely our man pages are very clear about this :-) as is BSD and
> > Steven's volume 2.
> > 
> > This has been legal for decades, so we have to keep working this way.
> 
> Yes, its also buggy, if 2nd remote address is not reachable on same interface.
> Even if we try a connect(AF_UNSPEC), the local address stay as is :
> 
> after bind(port 5555) local addr=0x0:5555 
> after connect(123) local addr=0x7f000001:5555 remote addr=0x7f000001:123 
> Could not connect, errno=22
> after connect(AF_UNSPEC) local addr=0x7f000001:5555 
> connect: Invalid argument
> 

I run the program on FreeBSD 8.1, and this OS
does change the source address at connect() time.
It also change it each time connect() is called, not only once.

fd = socket()
connect(fd, "127.0.0.1:ntp")
system("netstat -p udp | grep udp")
connect(fd, "192.168.20.110:ntp")
system("netstat -p udp | grep udp")

->

udp4  0  0 127.0.0.1.35974       127.0.0.1.ntp

udp4  0  0 192.168.20.80.35974   192.168.20.110.ntp

while on Linux we refuse the second connect() -> EINVAL
(Because no route can be found from 127.0.0.1 to 192.168.20.110)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/net/ipv4/datagram.c b/net/ipv4/datagram.c
index f055094..656dc73 100644
--- a/net/ipv4/datagram.c
+++ b/net/ipv4/datagram.c
@@ -60,10 +60,13 @@  int ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 		ip_rt_put(rt);
 		return -EACCES;
 	}
-	if (!inet->inet_saddr)
-		inet->inet_saddr = rt->rt_src;	/* Update source address */
-	if (!inet->inet_rcv_saddr)
-		inet->inet_rcv_saddr = rt->rt_src;
+	/*
+	 * Should connect() change inet_rcv_saddr / inet_saddr ?
+	 * It should not, because we want to specify the peer to which
+	 * datagrams are to be sent, regardless of our source address that
+	 * might change in the future, after a route change.
+	 * To specify our source address, bind() is the right API.
+	 */
 	inet->inet_daddr = rt->rt_dst;
 	inet->inet_dport = usin->sin_port;
 	sk->sk_state = TCP_ESTABLISHED;