Message ID | 1447859024-1040-1-git-send-email-fw@strlen.de |
---|---|
State | Rejected, archived |
Delegated to: | David Miller |
Headers | show |
On Wed, 2015-11-18 at 16:03 +0100, Florian Westphal wrote: > RFC 1122, 4.2.2.13: > [..] if new data is received after CLOSE is called, its TCP > SHOULD send a RST to show that data was lost. > > When a connection is closed actively, it MUST linger in > TIME-WAIT state [..]. > > We reset a connection, but destroy state immediately. > > After discussing this with Hannes, we decided it was preferable > to also move to TW state to avoid immediate port reuse. > > packetdrill testcase: > > 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 > 0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 > 0.000 bind(3, ..., ...) = 0 > 0.000 listen(3, 1) = 0 > 0.100 < S 0:0(0) win 29200 <mss 1460> > 0.100 > S. 0:0(0) ack 1 <mss 1460> > 0.200 < . 1:1(0) ack 1 win 257 > 0.200 accept(3, ..., ...) = 4 > // close our side. > 0.210 close(4) = 0 > // we should expect to see FIN now, sk moves to FIN_WAIT_1 > 0.210 > F. 1:1(0) ack 1 win 29200 > // receive data, but sk already closed -> Reset > 0.300 < P. 1:1001(1000) ack 1 win 46 > 0.300 > R 1:1(0) win 0 Thanks for using packetdrill ;) This packetdrill test shows nothing special regarding your patch, it should work right now with current kernels ??? lpk51:~# tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes lpk51:~# ./packetdrill Florian.pkt 07:26:49.656548 IP 192.0.2.1.59883 > 192.168.155.177.8080: Flags [S], seq 0, win 29200, options [mss 1460], length 0 07:26:49.656611 IP 192.168.155.177.8080 > 192.0.2.1.59883: Flags [S.], seq 1433468786, ack 1, win 29200, options [mss 1460], length 0 07:26:49.756530 IP 192.0.2.1.59883 > 192.168.155.177.8080: Flags [.], ack 1, win 257, length 0 07:26:49.766529 IP 192.168.155.177.8080 > 192.0.2.1.59883: Flags [F.], seq 1, ack 1, win 29200, length 0 07:26:49.856529 IP 192.0.2.1.59883 > 192.168.155.177.8080: Flags [P.], seq 1:1001, ack 1, win 46, length 1000: HTTP 07:26:49.856560 IP 192.168.155.177.8080 > 192.0.2.1.59883: Flags [R], seq 1433468787, win 0, length 0 07:26:49.856806 IP 192.0.2.1.59883 > 192.168.155.177.8080: Flags [R.], seq 1001, ack 1, win 46, length 0 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Wed, 2015-11-18 at 16:03 +0100, Florian Westphal wrote: > > RFC 1122, 4.2.2.13: > > [..] if new data is received after CLOSE is called, its TCP > > SHOULD send a RST to show that data was lost. > > > > When a connection is closed actively, it MUST linger in > > TIME-WAIT state [..]. > > > > We reset a connection, but destroy state immediately. > > > > After discussing this with Hannes, we decided it was preferable > > to also move to TW state to avoid immediate port reuse. > > > > packetdrill testcase: > > > > 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 > > 0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 > > 0.000 bind(3, ..., ...) = 0 > > 0.000 listen(3, 1) = 0 > > 0.100 < S 0:0(0) win 29200 <mss 1460> > > 0.100 > S. 0:0(0) ack 1 <mss 1460> > > 0.200 < . 1:1(0) ack 1 win 257 > > 0.200 accept(3, ..., ...) = 4 > > // close our side. > > 0.210 close(4) = 0 > > // we should expect to see FIN now, sk moves to FIN_WAIT_1 > > 0.210 > F. 1:1(0) ack 1 win 29200 > > // receive data, but sk already closed -> Reset > > 0.300 < P. 1:1001(1000) ack 1 win 46 > > 0.300 > R 1:1(0) win 0 > This packetdrill test shows nothing special regarding your patch, it > should work right now with current kernels ??? Yes, but we kill the socket. I should have added 0.400 `ss -nito state time-wait` as last line... Before patch: no output after patch: tw socket shown. The on-wire behavior doesn't change unless further packets arrive. Old behaviour: more RST New behaviour: acks+tw timer restart Sorry for the confusion. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2015-11-18 at 16:36 +0100, Florian Westphal wrote: > Yes, but we kill the socket. > > I should have added > > 0.400 `ss -nito state time-wait` > > as last line... > > Before patch: no output > after patch: tw socket shown. > > The on-wire behavior doesn't change unless further packets arrive. > Old behaviour: more RST > New behaviour: acks+tw timer restart Just add few more incoming packets to the packetdrill test then ? Also, is your customer really _not_ using TCP timestamps ? This is kind of a requirement for port reuse anyway. Anyway, having a TIMEWAIT setup after sending a RST makes little sense to me. When a RST packet is sent, the remote peer will forget everything about this previous connection, and another connect() might reuse the tuple and I do not think we should forbid this. Normal PAWS checks were invented for a good reason. RFC 1122, 4.2.2.13 can be interpreted in very different ways. Please show us real issue your customer has. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Nov 18, 2015, at 16:46, Eric Dumazet wrote: > On Wed, 2015-11-18 at 16:36 +0100, Florian Westphal wrote: > > > Yes, but we kill the socket. > > > > I should have added > > > > 0.400 `ss -nito state time-wait` > > > > as last line... > > > > Before patch: no output > > after patch: tw socket shown. > > > > The on-wire behavior doesn't change unless further packets arrive. > > Old behaviour: more RST > > New behaviour: acks+tw timer restart > > Just add few more incoming packets to the packetdrill test then ? > > Also, is your customer really _not_ using TCP timestamps ? Windows mostly does not use TCP timestamps. Also we have cases were security folks tell customers to turn off timestamps as they enable attackers to guess uptime. :( > This is kind of a requirement for port reuse anyway. > > Anyway, having a TIMEWAIT setup after sending a RST makes little sense > to me. > > When a RST packet is sent, the remote peer will forget everything about > this previous connection, and another connect() might reuse the tuple > and I do not think we should forbid this. Normal PAWS checks were > invented for a good reason. Still, the RST packet can be dropped along the way. So the teardown of the socket on the other side might not happen. Bye, Hannes -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2015-11-18 at 16:54 +0100, Hannes Frederic Sowa wrote: > Still, the RST packet can be dropped along the way. So the teardown of > the socket on the other side might not happen. This is why it is better to send RST for every incoming in-excess packet Try following packetdrill test, before and after your patch : 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 0.000 bind(3, ..., ...) = 0 0.000 listen(3, 1) = 0 0.100 < S 0:0(0) win 29200 <mss 1460> 0.100 > S. 0:0(0) ack 1 <mss 1460> 0.200 < . 1:1(0) ack 1 win 257 0.200 accept(3, ..., ...) = 4 // close our side. 0.210 close(4) = 0 // we should expect to see FIN now, sk moves to FIN_WAIT_1 0.210 > F. 1:1(0) ack 1 win 29200 // receive data, but sk already closed -> Reset +.010 < P. 1:1001(1000) ack 1 win 46 +0 > R 1:1(0) win 0 // Are we properly sending a RST like prior packet did ? +.010 < P. 1001:2001(1000) ack 1 win 46 +0 > R 1:1(0) win 0 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Wed, 2015-11-18 at 16:54 +0100, Hannes Frederic Sowa wrote: > > > Still, the RST packet can be dropped along the way. So the teardown of > > the socket on the other side might not happen. > > This is why it is better to send RST for every incoming in-excess packet I'm convinced and marked patch as rejected in patchwork. Thanks Eric! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Nov 18, 2015, at 18:28, Eric Dumazet wrote: > On Wed, 2015-11-18 at 16:54 +0100, Hannes Frederic Sowa wrote: > > > Still, the RST packet can be dropped along the way. So the teardown of > > the socket on the other side might not happen. > > This is why it is better to send RST for every incoming in-excess packet I agree it would be better to send a RST than ACK incoming data. Just thinking out loud, would it make sense to add a sub-state to timewait so we RST all other packets for TIMEWAIT_LEN duration? > Try following packetdrill test, before and after your patch : > > 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 > 0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 > 0.000 bind(3, ..., ...) = 0 > 0.000 listen(3, 1) = 0 > 0.100 < S 0:0(0) win 29200 <mss 1460> > 0.100 > S. 0:0(0) ack 1 <mss 1460> > 0.200 < . 1:1(0) ack 1 win 257 > 0.200 accept(3, ..., ...) = 4 > // close our side. > 0.210 close(4) = 0 > // we should expect to see FIN now, sk moves to FIN_WAIT_1 > 0.210 > F. 1:1(0) ack 1 win 29200 > > // receive data, but sk already closed -> Reset > +.010 < P. 1:1001(1000) ack 1 win 46 > +0 > R 1:1(0) win 0 > > // Are we properly sending a RST like prior packet did ? > +.010 < P. 1001:2001(1000) ack 1 win 46 > +0 > R 1:1(0) win 0 Thanks, Hannes -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2015-11-18 at 18:35 +0100, Hannes Frederic Sowa wrote: > On Wed, Nov 18, 2015, at 18:28, Eric Dumazet wrote: > > On Wed, 2015-11-18 at 16:54 +0100, Hannes Frederic Sowa wrote: > > > > > Still, the RST packet can be dropped along the way. So the teardown of > > > the socket on the other side might not happen. > > > > This is why it is better to send RST for every incoming in-excess packet > > I agree it would be better to send a RST than ACK incoming data. Just > thinking out loud, would it make sense to add a sub-state to timewait so > we RST all other packets for TIMEWAIT_LEN duration? Intuitively this timewait creation looks dangerous to me, maybe from an attack surface perspective. An out of sequence packet sends an advisory RST packet, which is already a hint for building some kinds of attacks. Now if we also create a timewait socket, and its associated timer, we also add quite a risk of memory and cpu consumption. So I am fine you create a timewait, only if you can prove there is no change in security. Thanks ! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Wed, 18 Nov 2015 07:46:31 -0800 > Anyway, having a TIMEWAIT setup after sending a RST makes little > sense to me. > > When a RST packet is sent, the remote peer will forget everything about > this previous connection, and another connect() might reuse the tuple > and I do not think we should forbid this. Normal PAWS checks were > invented for a good reason. > > RFC 1122, 4.2.2.13 can be interpreted in very different ways. I think it is clear that once RST is emitted, that connection ID no longer exists, on both ends. Sender of RST must not have that matching state, and receiver of RST must tear things down. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index fdd88c3..3bcdad1 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -5904,11 +5904,16 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb) break; } - if (tp->linger2 < 0 || - (TCP_SKB_CB(skb)->end_seq != TCP_SKB_CB(skb)->seq && - after(TCP_SKB_CB(skb)->end_seq - th->fin, tp->rcv_nxt))) { + if (tp->linger2 < 0) { + NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPABORTONDATA); tcp_done(sk); + return 1; + } + + if (TCP_SKB_CB(skb)->end_seq != TCP_SKB_CB(skb)->seq && + after(TCP_SKB_CB(skb)->end_seq - th->fin, tp->rcv_nxt)) { NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPABORTONDATA); + tcp_time_wait(sk, TCP_TIME_WAIT, 0); return 1; } @@ -5966,7 +5971,13 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb) if (TCP_SKB_CB(skb)->end_seq != TCP_SKB_CB(skb)->seq && after(TCP_SKB_CB(skb)->end_seq - th->fin, tp->rcv_nxt)) { NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPABORTONDATA); - tcp_reset(sk); + + if (sk->sk_state == TCP_CLOSE_WAIT || + sk->sk_state == TCP_LAST_ACK) + tcp_reset(sk); + else + tcp_time_wait(sk, TCP_TIME_WAIT, 0); + return 1; } }