Message ID | 1366384390.16391.17.camel@edumazet-glaptop |
---|---|
State | Superseded, archived |
Delegated to: | David Miller |
On Fri, Apr 19, 2013 at 11:13 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> commit bd090dfc634d (tcp: tcp_replace_ts_recent() should not be called
> from tcp_validate_incoming()) introduced a TS ecr bug in slow path
> processing.
>
> 1 A > B P. 1:10001(10000) ack 1 <nop,nop,TS val 1001 ecr 200>
> 2 B < A . 1:1(0) ack 1 win 257 <sack 9001:10001,TS val 300 ecr 1001>
> 3 A > B . 1:1001(1000) ack 1 win 227 <nop,nop,TS val 1002 ecr 200>
> 4 A > B . 1001:2001(1000) ack 1 win 227 <nop,nop,TS val 1002 ecr 200>
>
> (ecr 200 should be ecr 300 in packets 3 & 4)
>
> Problem is tcp_ack() can trigger send of new packets (retransmits),
> reflecting the prior TSval, instead of the TSval contained in the
> currently processed incoming packet.
>
> Fix this by calling tcp_replace_ts_recent() from tcp_ack() after the
> checks, but before the actions.
>
> Reported-by: Yuchung Cheng <ycheng@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Neal Cardwell <ncardwell@google.com>

Acked-by: Neal Cardwell <ncardwell@google.com>

This patch looks good. But AFAICT the other call site for
tcp_replace_ts_recent() has the same bug, which can be fixed in the
same way: tcp_rcv_state_process() seems to fall through the big switch
statement down to its call to tcp_replace_ts_recent() even in some
cases where tcp_ack() already decided the ACK was unacceptable.

neal
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
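The ordering problem described in the changelog can be illustrated with a small user-space sketch. This is not kernel code: `struct toy_sock`, `toy_retransmit()`, `toy_ack_old()` and `toy_ack_new()` are hypothetical names invented for this example, and the model assumes only that a retransmit echoes whatever `ts_recent` holds at the moment it is sent.

```c
#include <stdint.h>

/* Hypothetical, heavily simplified state; not the kernel's struct tcp_sock. */
struct toy_sock {
	uint32_t ts_recent;	/* peer TSval that will be echoed as TSecr */
	uint32_t last_ecr;	/* TSecr carried by the last packet we sent */
};

/* Stand-in for a retransmit triggered while an incoming ACK is processed. */
static void toy_retransmit(struct toy_sock *sk)
{
	sk->last_ecr = sk->ts_recent;
}

/* Old order: act on the ACK first, record the peer's TSval afterwards. */
static uint32_t toy_ack_old(struct toy_sock *sk, uint32_t rcv_tsval)
{
	toy_retransmit(sk);		/* echoes the stale ts_recent */
	sk->ts_recent = rcv_tsval;
	return sk->last_ecr;
}

/* Patched order: record the TSval after the checks, before the actions. */
static uint32_t toy_ack_new(struct toy_sock *sk, uint32_t rcv_tsval)
{
	sk->ts_recent = rcv_tsval;
	toy_retransmit(sk);		/* echoes the TSval just received */
	return sk->last_ecr;
}
```

With `ts_recent` starting at 200 and an incoming TSval of 300 (packet 2 in the trace), `toy_ack_old()` returns 200, which corresponds to the wrong ecr seen in packets 3 and 4, while `toy_ack_new()` returns 300.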
On Fri, 2013-04-19 at 12:33 -0400, Neal Cardwell wrote:

> This patch looks good. But AFAICT the other call site for
> tcp_replace_ts_recent() has the same bug, which can be fixed in the
> same way: tcp_rcv_state_process() seems to fall through the big switch
> statement down to its call to tcp_replace_ts_recent() even in some
> cases where tcp_ack() already decided the ACK was unacceptable.

I was not sure of the second call site, and was willing to discuss this
with you and Yuchung.

Are you comfortable this is net material and not net-next ? (We are
talking of states other than ESTABLISHED)
On Fri, 2013-04-19 at 09:59 -0700, Eric Dumazet wrote:
> On Fri, 2013-04-19 at 12:33 -0400, Neal Cardwell wrote:
>
> > This patch looks good. But AFAICT the other call site for
> > tcp_replace_ts_recent() has the same bug, which can be fixed in the
> > same way: tcp_rcv_state_process() seems to fall through the big switch
> > statement down to its call to tcp_replace_ts_recent() even in some
> > cases where tcp_ack() already decided the ACK was unacceptable.
>
> I was not sure of the second call site, and was willing to discuss this
> with you and Yuchung.
>
> Are you comfortable this is net material and not net-next ? (We are
> talking of states other than ESTABLISHED)

The other concern was about conflict with the prior net-next patch (tcp:
remove one indentation level in tcp_rcv_state_process())
On Fri, Apr 19, 2013 at 1:01 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2013-04-19 at 09:59 -0700, Eric Dumazet wrote:
>> On Fri, 2013-04-19 at 12:33 -0400, Neal Cardwell wrote:
>>
>> > This patch looks good. But AFAICT the other call site for
>> > tcp_replace_ts_recent() has the same bug, which can be fixed in the
>> > same way: tcp_rcv_state_process() seems to fall through the big switch
>> > statement down to its call to tcp_replace_ts_recent() even in some
>> > cases where tcp_ack() already decided the ACK was unacceptable.
>>
>> I was not sure of the second call site, and was willing to discuss this
>> with you and Yuchung.
>>
>> Are you comfortable this is net material and not net-next ? (We are
>> talking of states other than ESTABLISHED)
>
> The other concern was about conflict with the prior net-next patch (tcp:
> remove one indentation level in tcp_rcv_state_process())

Good point. How about fixing both tcp_replace_ts_recent() call sites
in net-next, as a single patch after your "tcp: remove one
indentation level in tcp_rcv_state_process()" net-next patch?

neal
On Fri, 2013-04-19 at 13:14 -0400, Neal Cardwell wrote:

> Good point. How about fixing both tcp_replace_ts_recent() call sites
> in net-next, as a single patch after your "tcp: remove one
> indentation level in tcp_rcv_state_process()" net-next patch?

I think the cleanup can be delayed, I'll resend a v2 of the fix.

Thanks
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 3bd55ba..7e9d37d 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -113,6 +113,7 @@ int sysctl_tcp_early_retrans __read_mostly = 2;
 #define FLAG_DSACKING_ACK	0x800 /* SACK blocks contained D-SACK info */
 #define FLAG_NONHEAD_RETRANS_ACKED	0x1000 /* Non-head rexmitted data was ACKed */
 #define FLAG_SACK_RENEGING	0x2000 /* snd_una advanced to a sacked seq */
+#define FLAG_UPDATE_TS_RECENT	0x4000 /* tcp_replace_ts_recent() */
 
 #define FLAG_ACKED		(FLAG_DATA_ACKED|FLAG_SYN_ACKED)
 #define FLAG_NOT_DUP		(FLAG_DATA|FLAG_WIN_UPDATE|FLAG_ACKED)
@@ -3564,6 +3565,27 @@ static void tcp_send_challenge_ack(struct sock *sk)
 	}
 }
 
+static void tcp_store_ts_recent(struct tcp_sock *tp)
+{
+	tp->rx_opt.ts_recent = tp->rx_opt.rcv_tsval;
+	tp->rx_opt.ts_recent_stamp = get_seconds();
+}
+
+static void tcp_replace_ts_recent(struct tcp_sock *tp, u32 seq)
+{
+	if (tp->rx_opt.saw_tstamp && !after(seq, tp->rcv_wup)) {
+		/* PAWS bug workaround wrt. ACK frames, the PAWS discard
+		 * extra check below makes sure this can only happen
+		 * for pure ACK frames. -DaveM
+		 *
+		 * Not only, also it occurs for expired timestamps.
+		 */
+
+		if (tcp_paws_check(&tp->rx_opt, 0))
+			tcp_store_ts_recent(tp);
+	}
+}
+
 /* This routine deals with incoming acks, but not outgoing ones. */
 static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 {
@@ -3607,6 +3629,12 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 	prior_fackets = tp->fackets_out;
 	prior_in_flight = tcp_packets_in_flight(tp);
 
+	/* ts_recent update must be made after we are sure that the packet
+	 * is in window.
+	 */
+	if (flag & FLAG_UPDATE_TS_RECENT)
+		tcp_replace_ts_recent(tp, TCP_SKB_CB(skb)->seq);
+
 	if (!(flag & FLAG_SLOWPATH) && after(ack, prior_snd_una)) {
 		/* Window is constant, pure forward advance.
 		 * No more checks are required.
@@ -3927,27 +3955,6 @@ const u8 *tcp_parse_md5sig_option(const struct tcphdr *th)
 EXPORT_SYMBOL(tcp_parse_md5sig_option);
 #endif
 
-static inline void tcp_store_ts_recent(struct tcp_sock *tp)
-{
-	tp->rx_opt.ts_recent = tp->rx_opt.rcv_tsval;
-	tp->rx_opt.ts_recent_stamp = get_seconds();
-}
-
-static inline void tcp_replace_ts_recent(struct tcp_sock *tp, u32 seq)
-{
-	if (tp->rx_opt.saw_tstamp && !after(seq, tp->rcv_wup)) {
-		/* PAWS bug workaround wrt. ACK frames, the PAWS discard
-		 * extra check below makes sure this can only happen
-		 * for pure ACK frames. -DaveM
-		 *
-		 * Not only, also it occurs for expired timestamps.
-		 */
-
-		if (tcp_paws_check(&tp->rx_opt, 0))
-			tcp_store_ts_recent(tp);
-	}
-}
-
 /* Sorry, PAWS as specified is broken wrt. pure-ACKs -DaveM
  *
  * It is not fatal. If this ACK does _not_ change critical state (seqs, window)
@@ -5543,14 +5550,9 @@ slow_path:
 		return 0;
 
 step5:
-	if (tcp_ack(sk, skb, FLAG_SLOWPATH) < 0)
+	if (tcp_ack(sk, skb, FLAG_SLOWPATH | FLAG_UPDATE_TS_RECENT) < 0)
 		goto discard;
 
-	/* ts_recent update must be made after we are sure that the packet
-	 * is in window.
-	 */
-	tcp_replace_ts_recent(tp, TCP_SKB_CB(skb)->seq);
-
 	tcp_rcv_rtt_measure_ts(sk, skb);
 
 	/* Process urgent data. */
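For readers following the condition in tcp_replace_ts_recent(), here is a rough user-space sketch of the two tests it combines: a wraparound-safe sequence comparison and a timestamp freshness check. Everything below is an assumption made for illustration. `seq_after()`, `struct toy_rx_opt`, `toy_paws_ok()` and `toy_replace_ts_recent()` are hypothetical names, and `toy_paws_ok()` models only a signed "TSval is not older than ts_recent" comparison, leaving out the other cases the real tcp_paws_check() handles (the comment in the patch mentions expired timestamps, for instance).

```c
#include <stdbool.h>
#include <stdint.h>

/* True if seq1 is later than seq2 in 32-bit sequence space. The signed
 * difference keeps the answer right across wraparound (this relies on
 * two's complement conversion, as on all mainstream compilers).
 */
static bool seq_after(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq2 - seq1) < 0;
}

/* Hypothetical subset of the receive-side timestamp option state. */
struct toy_rx_opt {
	bool saw_tstamp;	/* incoming segment carried a timestamp option */
	uint32_t rcv_tsval;	/* TSval of the segment being processed */
	uint32_t ts_recent;	/* TSval we would echo back as TSecr */
};

/* Simplified freshness test: accept a TSval not older than ts_recent. */
static bool toy_paws_ok(const struct toy_rx_opt *rx)
{
	return (int32_t)(rx->ts_recent - rx->rcv_tsval) <= 0;
}

/* Update ts_recent only for a segment that starts at or before the last
 * window-update point (rcv_wup) and carries an acceptable timestamp.
 * Returns true when ts_recent was replaced.
 */
static bool toy_replace_ts_recent(struct toy_rx_opt *rx, uint32_t seq,
				  uint32_t rcv_wup)
{
	if (rx->saw_tstamp && !seq_after(seq, rcv_wup) && toy_paws_ok(rx)) {
		rx->ts_recent = rx->rcv_tsval;
		return true;
	}
	return false;
}
```

In this model a segment with seq 1, rcv_wup 1, TSval 300 and ts_recent 200 replaces ts_recent, while the same segment with seq 2, or with TSval 100, leaves it untouched.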