Message ID | 1283887569.2634.95.camel@edumazet-laptop |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Tue, 07 Sep 2010 21:26:09 +0200 > } > +/* > + * Should connect() change inet_rcv_saddr ? > + * It should not IMHO, because we want to specify the peer to which > + * datagrams are to be sent, regardless of our source address that might > + * change in the future, after a route change. > + * To specify our source address, bind() is the right API. > + */ > +#if 0 > if (!inet->inet_saddr) > inet->inet_saddr = rt->rt_src; /* Update source address */ > if (!inet->inet_rcv_saddr) > inet->inet_rcv_saddr = rt->rt_src; > +#endif > inet->inet_daddr = rt->rt_dst; > inet->inet_dport = usin->sin_port; > sk->sk_state = TCP_ESTABLISHED; Eric, please just delete the code block instead of leaving it there inside of an #if 0 block. If there is information conveyed by the unused code, add that information to the nice comment you're adding :-) Thanks. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2010-09-07 21:26, Eric Dumazet wrote: > Le mardi 07 septembre 2010 à 18:36 +0200, Eric Dumazet a écrit : > >> Hmm, I have a pretty good idea of what the problem is, and will post a >> fix soon ;) > > David, if you feel this is too invasive for stable, we can make UDP > rehash the socket in case we dont want to change ip4_datagram_connect() > > > [PATCH] inet: Dont set inet_rcv_saddr in connect() > > So the problem is that the sequence : > > socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) > > connect(fd, {sa_family=AF_INET, sin_port=htons(xx), > sin_addr=inet_addr("1.2.3.4")}, 28) > > 1) Does an implicit inet_autobind() > (using an INADDR_ANY address, and selecting a random port). > > 2) Then does an ip4_datagram_connect() to specify the address/port of > remote end point. > > Problem is ip4_datagram_connect() also sets inet->inet_rcv_saddr (from > INADDR_ANY to IP source address, given the current route to remote end > point). Only the first connect() on the socket does this. Following ones > dont change the (possibly wrong) source address. > > This breaks the secondary UDP hash, based on (ADDRESS, port), that was > computed by inet_autobind(). > > This also potentially breaks multiple connect() to change remote > endpoints, because old source address might be non usable for packets to > new destination. > > If route happens to change, then we should automatically change our > source address too, at next sendmsg() call, and UDP code deals with this > just fine. > > If an application needs to specify a precise source address, it must use > bind() system call. connect() man page only refers to remote address, > not local one. > > Reported-by: Krzysztof Olędzki <ole@ans.pl> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> > --- > net/ipv4/datagram.c | 9 +++++++++ > 1 file changed, 9 insertions(+) > > diff --git a/net/ipv4/datagram.c b/net/ipv4/datagram.c > index f055094..8a17241 100644 > --- a/net/ipv4/datagram.c > +++ b/net/ipv4/datagram.c > @@ -60,10 +60,19 @@ int ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len) > ip_rt_put(rt); > return -EACCES; > } > +/* > + * Should connect() change inet_rcv_saddr ? > + * It should not IMHO, because we want to specify the peer to which > + * datagrams are to be sent, regardless of our source address that might > + * change in the future, after a route change. > + * To specify our source address, bind() is the right API. > + */ > +#if 0 > if (!inet->inet_saddr) > inet->inet_saddr = rt->rt_src; /* Update source address */ > if (!inet->inet_rcv_saddr) > inet->inet_rcv_saddr = rt->rt_src; > +#endif > inet->inet_daddr = rt->rt_dst; > inet->inet_dport = usin->sin_port; > sk->sk_state = TCP_ESTABLISHED; With the above patch I'm no longer able to reproduce the problem. Thanks! Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl> BTW: why it takes so long to trigger this bug and it is only possible over a loopback interface? Best regards, Krzysztof Olędzki -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le mardi 07 septembre 2010 à 23:28 +0200, Krzysztof Olędzki a écrit : > With the above patch I'm no longer able to reproduce the problem. Thanks! > > Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl> > Thanks a lot ! > BTW: why it takes so long to trigger this bug and it is only possible > over a loopback interface? Its a bit tricky : You need at least 10 sockets linked in a particular hash chain. To check this, you can : cat /proc/net/udp maybe you have many sockets on port 123 or 53 ? And about loopback, I have no idea... I am pretty sure I can trigger the bug with other interfaces. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2010-09-07 23:39, Eric Dumazet wrote: > Le mardi 07 septembre 2010 à 23:28 +0200, Krzysztof Olędzki a écrit : > >> With the above patch I'm no longer able to reproduce the problem. Thanks! >> >> Tested-by: Krzysztof Piotr Oledzki<ole@ans.pl> >> > > Thanks a lot ! > >> BTW: why it takes so long to trigger this bug and it is only possible >> over a loopback interface? > > Its a bit tricky : You need at least 10 sockets linked in a particular > hash chain. > > To check this, you can : > > cat /proc/net/udp > > maybe you have many sockets on port 123 or 53 ? On one affected host I have 3+7 and on the other, also affacted one, I have 3+6: root@sowa:~# egrep -cw '(53|123):' /proc/net/udp 10 root@sowa:~# egrep -w '(53|123):' /proc/net/udp 53: 3582A8C0:0035 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 6084654 2 ffff8800cc012700 0 53: 0100007F:0035 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 6084652 2 ffff8800cc010900 0 123: D683A8C0:007B 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 4911 2 ffff88012de96400 0 123: 7B85A8C0:007B 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 4910 2 ffff88012de96100 0 123: 8982A8C0:007B 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 4909 2 ffff88012de95e00 0 123: 7B82A8C0:007B 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 4908 2 ffff88012de95b00 0 123: 3582A8C0:007B 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 4907 2 ffff88012de95800 0 123: 1F7EA8C0:007B 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 4906 2 ffff88012de95500 0 123: 0100007F:007B 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 4905 2 ffff88012de95200 0 123: 00000000:007B 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 4899 2 ffff88012de94c00 0 root@cmyk:~# egrep -cw '(53|123):' /proc/net/udp 9 root@cmyk:~# egrep -w '(53|123):' /proc/net/udp 53: A2CD1253:0035 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 3774 2 eeaa9840 0 53: 0100A8C0:0035 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 3772 2 eeaa9a40 0 53: 0100007F:0035 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 3770 2 eeaa9c40 0 123: A6CD1253:007B 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 4693 2 eeacdc80 0 123: A5CD1253:007B 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 4692 2 eeaa9640 0 123: A2CD1253:007B 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 4691 2 ed201700 0 123: 0100A8C0:007B 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 4690 2 eeacd880 0 123: 0100007F:007B 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 4689 2 ec649740 0 123: 00000000:007B 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 4683 2 ed201d00 0 But how 123 is related to 53? > And about loopback, I have no idea... I am pretty sure I can trigger the > bug with other interfaces. OK. Probably it is because my other hosts have only a single IP and only the problematic ones have both DNS server and multiple IP (many sockets). Best regards, Krzysztof Olędzki -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le mardi 07 septembre 2010 à 23:51 +0200, Krzysztof Olędzki a écrit : > On 2010-09-07 23:39, Eric Dumazet wrote: > > Le mardi 07 septembre 2010 à 23:28 +0200, Krzysztof Olędzki a écrit : > > > >> With the above patch I'm no longer able to reproduce the problem. Thanks! > >> > >> Tested-by: Krzysztof Piotr Oledzki<ole@ans.pl> > >> > > > > Thanks a lot ! > > > >> BTW: why it takes so long to trigger this bug and it is only possible > >> over a loopback interface? > > > > Its a bit tricky : You need at least 10 sockets linked in a particular > > hash chain. > > > > To check this, you can : > > > > cat /proc/net/udp > > > > maybe you have many sockets on port 123 or 53 ? > > On one affected host I have 3+7 and on the other, also affacted one, I have 3+6: > > root@sowa:~# egrep -cw '(53|123):' /proc/net/udp > 10 > root@sowa:~# egrep -w '(53|123):' /proc/net/udp > 53: 3582A8C0:0035 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 6084654 2 ffff8800cc012700 0 > 53: 0100007F:0035 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 6084652 2 ffff8800cc010900 0 > 123: D683A8C0:007B 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 4911 2 ffff88012de96400 0 > 123: 7B85A8C0:007B 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 4910 2 ffff88012de96100 0 > 123: 8982A8C0:007B 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 4909 2 ffff88012de95e00 0 > 123: 7B82A8C0:007B 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 4908 2 ffff88012de95b00 0 > 123: 3582A8C0:007B 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 4907 2 ffff88012de95800 0 > 123: 1F7EA8C0:007B 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 4906 2 ffff88012de95500 0 > 123: 0100007F:007B 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 4905 2 ffff88012de95200 0 > 123: 00000000:007B 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 4899 2 ffff88012de94c00 0 > > But how 123 is related to 53? > I was mentioning 123 or 53, as probable suspects :) When a socket is created, and connect() called, autobind() chooses a source port X for this socket. if ((X % udp_hash_size) == 123), socket is inserted in hash chain number 123. Bug then triggers, because when a packet is received for this socket, we find a slot with more than 10 sockets -> Search is done on secondary chain Z2, where we dont find the socket since its rcv_addr changed after we inserted it (in chain Y2). Packet is dropped (as seen in netstat -s) > > And about loopback, I have no idea... I am pretty sure I can trigger the > > bug with other interfaces. > > OK. Probably it is because my other hosts have only a single IP and only > the problematic ones have both DNS server and multiple IP (many sockets). > Yes -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/ipv4/datagram.c b/net/ipv4/datagram.c index f055094..8a17241 100644 --- a/net/ipv4/datagram.c +++ b/net/ipv4/datagram.c @@ -60,10 +60,19 @@ int ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len) ip_rt_put(rt); return -EACCES; } +/* + * Should connect() change inet_rcv_saddr ? + * It should not IMHO, because we want to specify the peer to which + * datagrams are to be sent, regardless of our source address that might + * change in the future, after a route change. + * To specify our source address, bind() is the right API. + */ +#if 0 if (!inet->inet_saddr) inet->inet_saddr = rt->rt_src; /* Update source address */ if (!inet->inet_rcv_saddr) inet->inet_rcv_saddr = rt->rt_src; +#endif inet->inet_daddr = rt->rt_dst; inet->inet_dport = usin->sin_port; sk->sk_state = TCP_ESTABLISHED;