
[nf] netfilter: conntrack: always store window size un-scaled

Message ID 20190711222905.22000-1-fw@strlen.de
State Accepted
Delegated to: Pablo Neira

Commit Message

Florian Westphal July 11, 2019, 10:29 p.m. UTC
Jakub Jankowski reported the following oddity:

After the 3-way handshake completes, the timeout of the new connection
is set to max_retrans (300s) instead of established (5 days).

Shortened excerpt from the provided pcap:
25.070622 IP (flags [DF], proto TCP (6), length 52)
10.8.5.4.1025 > 10.8.1.2.80: Flags [S], seq 11, win 64240, [wscale 8]
26.070462 IP (flags [DF], proto TCP (6), length 48)
10.8.1.2.80 > 10.8.5.4.1025: Flags [S.], seq 82, ack 12, win 65535, [wscale 3]
27.070449 IP (flags [DF], proto TCP (6), length 40)
10.8.5.4.1025 > 10.8.1.2.80: Flags [.], ack 83, win 512, length 0

It turns out that last_win is of type u16, but we store the scaled value:
512 << 8 (== 0x20000) is truncated to a window of 0.

The Fixes tag is not strictly correct, as the bug has existed forever, but
without that change the worst this could cause is mistaking a
window update (to-nonzero-from-zero) for a retransmit.

Fixes: fbcd253d2448b8 ("netfilter: conntrack: lower timeout to RETRANS seconds if window is 0")
Reported-by: Jakub Jankowski <shasta@toxcorp.com>
Tested-by: Jakub Jankowski <shasta@toxcorp.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nf_conntrack_proto_tcp.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

Comments

Jozsef Kadlecsik July 12, 2019, 10:50 a.m. UTC | #1
On Fri, 12 Jul 2019, Florian Westphal wrote:

> Jakub Jankowski reported following oddity:
> 
> After 3 way handshake completes, timeout of new connection is set to
> max_retrans (300s) instead of established (5 days).
> 
> shortened excerpt from pcap provided:
> 25.070622 IP (flags [DF], proto TCP (6), length 52)
> 10.8.5.4.1025 > 10.8.1.2.80: Flags [S], seq 11, win 64240, [wscale 8]
> 26.070462 IP (flags [DF], proto TCP (6), length 48)
> 10.8.1.2.80 > 10.8.5.4.1025: Flags [S.], seq 82, ack 12, win 65535, [wscale 3]
> 27.070449 IP (flags [DF], proto TCP (6), length 40)
> 10.8.5.4.1025 > 10.8.1.2.80: Flags [.], ack 83, win 512, length 0
> 
> Turns out the last_win is of u16 type, but we store the scaled value:
> 512 << 8 (== 0x20000) becomes 0 window.
> 
> The Fixes tag is not correct, as the bug has existed forever, but
> without that change all that this causes might cause is to mistake a
> window update (to-nonzero-from-zero) for a retransmit.
> 
> Fixes: fbcd253d2448b8 ("netfilter: conntrack: lower timeout to RETRANS seconds if window is 0")
> Reported-by: Jakub Jankowski <shasta@toxcorp.com>
> Tested-by: Jakub Jankowski <shasta@toxcorp.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>

Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>

It was a nice report and catch!

Best regards,
Jozsef

-
E-mail  : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
          H-1525 Budapest 114, POB. 49, Hungary
Pablo Neira Ayuso July 16, 2019, 11:23 a.m. UTC | #2
On Fri, Jul 12, 2019 at 12:50:35PM +0200, Jozsef Kadlecsik wrote:
> On Fri, 12 Jul 2019, Florian Westphal wrote:
> 
> > Jakub Jankowski reported following oddity:
> > 
> > After 3 way handshake completes, timeout of new connection is set to
> > max_retrans (300s) instead of established (5 days).
> > 
> > shortened excerpt from pcap provided:
> > 25.070622 IP (flags [DF], proto TCP (6), length 52)
> > 10.8.5.4.1025 > 10.8.1.2.80: Flags [S], seq 11, win 64240, [wscale 8]
> > 26.070462 IP (flags [DF], proto TCP (6), length 48)
> > 10.8.1.2.80 > 10.8.5.4.1025: Flags [S.], seq 82, ack 12, win 65535, [wscale 3]
> > 27.070449 IP (flags [DF], proto TCP (6), length 40)
> > 10.8.5.4.1025 > 10.8.1.2.80: Flags [.], ack 83, win 512, length 0
> > 
> > Turns out the last_win is of u16 type, but we store the scaled value:
> > 512 << 8 (== 0x20000) becomes 0 window.
> > 
> > The Fixes tag is not correct, as the bug has existed forever, but
> > without that change all that this causes might cause is to mistake a
> > window update (to-nonzero-from-zero) for a retransmit.
> > 
> > Fixes: fbcd253d2448b8 ("netfilter: conntrack: lower timeout to RETRANS seconds if window is 0")
> > Reported-by: Jakub Jankowski <shasta@toxcorp.com>
> > Tested-by: Jakub Jankowski <shasta@toxcorp.com>
> > Signed-off-by: Florian Westphal <fw@strlen.de>
> 
> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>

Applied, thanks for reviewing, Jozsef.
Reindl Harald July 27, 2019, 12:15 p.m. UTC | #3
This seemed to be fixed in 5.1.19, although it was not announced in
https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.1.19. After
updating to 5.1.20 last night, the freezes of an ssh-tunneled VNC session
have started again when "lo" is not excluded, whereas the session had been
stable for 3 days.

See the topic "Connection timeouts due to INVALID state rule".

On 12.07.19 at 00:29, Florian Westphal wrote:
> Jakub Jankowski reported following oddity:
> 
> After 3 way handshake completes, timeout of new connection is set to
> max_retrans (300s) instead of established (5 days).
> 
> shortened excerpt from pcap provided:
> 25.070622 IP (flags [DF], proto TCP (6), length 52)
> 10.8.5.4.1025 > 10.8.1.2.80: Flags [S], seq 11, win 64240, [wscale 8]
> 26.070462 IP (flags [DF], proto TCP (6), length 48)
> 10.8.1.2.80 > 10.8.5.4.1025: Flags [S.], seq 82, ack 12, win 65535, [wscale 3]
> 27.070449 IP (flags [DF], proto TCP (6), length 40)
> 10.8.5.4.1025 > 10.8.1.2.80: Flags [.], ack 83, win 512, length 0
> 
> Turns out the last_win is of u16 type, but we store the scaled value:
> 512 << 8 (== 0x20000) becomes 0 window.
> 
> The Fixes tag is not correct, as the bug has existed forever, but
> without that change all that this causes might cause is to mistake a
> window update (to-nonzero-from-zero) for a retransmit.
> 
> Fixes: fbcd253d2448b8 ("netfilter: conntrack: lower timeout to RETRANS seconds if window is 0")
> Reported-by: Jakub Jankowski <shasta@toxcorp.com>
> Tested-by: Jakub Jankowski <shasta@toxcorp.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>
Thomas Jarosch Aug. 13, 2019, 8:47 a.m. UTC | #4
Hi Florian,

You wrote on Fri, Jul 12, 2019 at 12:29:05AM +0200:
> Jakub Jankowski reported following oddity:
> 
> After 3 way handshake completes, timeout of new connection is set to
> max_retrans (300s) instead of established (5 days).
> 
> shortened excerpt from pcap provided:
> 25.070622 IP (flags [DF], proto TCP (6), length 52)
> 10.8.5.4.1025 > 10.8.1.2.80: Flags [S], seq 11, win 64240, [wscale 8]
> 26.070462 IP (flags [DF], proto TCP (6), length 48)
> 10.8.1.2.80 > 10.8.5.4.1025: Flags [S.], seq 82, ack 12, win 65535, [wscale 3]
> 27.070449 IP (flags [DF], proto TCP (6), length 40)
> 10.8.5.4.1025 > 10.8.1.2.80: Flags [.], ack 83, win 512, length 0
> 
> Turns out the last_win is of u16 type, but we store the scaled value:
> 512 << 8 (== 0x20000) becomes 0 window.
> 
> The Fixes tag is not correct, as the bug has existed forever, but
> without that change all that this causes might cause is to mistake a
> window update (to-nonzero-from-zero) for a retransmit.
> 
> Fixes: fbcd253d2448b8 ("netfilter: conntrack: lower timeout to RETRANS seconds if window is 0")
> Reported-by: Jakub Jankowski <shasta@toxcorp.com>
> Tested-by: Jakub Jankowski <shasta@toxcorp.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>

It seems the patch fixed a kernel bugzilla entry, too:
https://bugzilla.kernel.org/show_bug.cgi?id=202287

I had a feeling it could be related since the reporter bisected it
down to changes in the TCP window scaling defaults.

Cheers,
Thomas

Patch

diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index 7ba01d8ee165..9fe1d5e46249 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -475,6 +475,7 @@  static bool tcp_in_window(const struct nf_conn *ct,
 	struct ip_ct_tcp_state *receiver = &state->seen[!dir];
 	const struct nf_conntrack_tuple *tuple = &ct->tuplehash[dir].tuple;
 	__u32 seq, ack, sack, end, win, swin;
+	u16 win_raw;
 	s32 receiver_offset;
 	bool res, in_recv_win;
 
@@ -483,7 +484,8 @@  static bool tcp_in_window(const struct nf_conn *ct,
 	 */
 	seq = ntohl(tcph->seq);
 	ack = sack = ntohl(tcph->ack_seq);
-	win = ntohs(tcph->window);
+	win_raw = ntohs(tcph->window);
+	win = win_raw;
 	end = segment_seq_plus_len(seq, skb->len, dataoff, tcph);
 
 	if (receiver->flags & IP_CT_TCP_FLAG_SACK_PERM)
@@ -658,14 +660,14 @@  static bool tcp_in_window(const struct nf_conn *ct,
 			    && state->last_seq == seq
 			    && state->last_ack == ack
 			    && state->last_end == end
-			    && state->last_win == win)
+			    && state->last_win == win_raw)
 				state->retrans++;
 			else {
 				state->last_dir = dir;
 				state->last_seq = seq;
 				state->last_ack = ack;
 				state->last_end = end;
-				state->last_win = win;
+				state->last_win = win_raw;
 				state->retrans = 0;
 			}
 		}