tcp: call tcp_replace_ts_recent() from tcp_ack()

Message ID 1366384390.16391.17.camel@edumazet-glaptop
State Superseded, archived
Delegated to: David Miller

Commit Message

Eric Dumazet April 19, 2013, 3:13 p.m. UTC
From: Eric Dumazet <edumazet@google.com>

commit bd090dfc634d (tcp: tcp_replace_ts_recent() should not be called
from tcp_validate_incoming()) introduced a TS ecr bug in slow path
processing.

1 A > B P. 1:10001(10000) ack 1 <nop,nop,TS val 1001 ecr 200>
2 B < A . 1:1(0) ack 1 win 257 <sack 9001:10001,TS val 300 ecr 1001>
3 A > B . 1:1001(1000) ack 1 win 227 <nop,nop,TS val 1002 ecr 200>
4 A > B . 1001:2001(1000) ack 1 win 227 <nop,nop,TS val 1002 ecr 200>

(ecr 200 should be ecr 300 in packets 3 & 4)

The problem is that tcp_ack() can trigger the sending of new packets
(retransmits) that echo the prior TSval instead of the TSval carried
by the incoming packet currently being processed.

Fix this by calling tcp_replace_ts_recent() from tcp_ack() after the
checks, but before the actions.
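
To illustrate the ordering issue, here is a minimal user-space sketch.
It is a toy model and not kernel code: struct toy_conn and the toy_*()
helpers are hypothetical names invented for this example, and the
"retransmit" is reduced to copying ts_recent into the echoed ecr field.

```c
#include <stdint.h>

/* Toy model, not kernel code: every name below is hypothetical. */
struct toy_conn {
	uint32_t ts_recent;	/* last peer TSval; echoed as ecr on send */
	uint32_t last_ecr;	/* ecr carried by the last packet we sent */
};

/* Sending a packet echoes whatever ts_recent holds at that moment. */
static void toy_retransmit(struct toy_conn *c)
{
	c->last_ecr = c->ts_recent;
}

/* Buggy order: act (retransmit) first, record the incoming TSval later. */
static void toy_ack_update_late(struct toy_conn *c, uint32_t rcv_tsval)
{
	toy_retransmit(c);
	c->ts_recent = rcv_tsval;
}

/* Fixed order: record the incoming TSval after the checks, then act. */
static void toy_ack_update_early(struct toy_conn *c, uint32_t rcv_tsval)
{
	c->ts_recent = rcv_tsval;
	toy_retransmit(c);
}
```

Starting from ts_recent = 200 and processing an ACK with TSval 300,
toy_ack_update_late() leaves last_ecr at 200 (as in packets 3 & 4 of
the trace above), while toy_ack_update_early() leaves it at 300.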

Reported-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
---
Google-Bug-Id: 8660415

 net/ipv4/tcp_input.c |   56 +++++++++++++++++++++--------------------
 1 file changed, 29 insertions(+), 27 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Neal Cardwell April 19, 2013, 4:33 p.m. UTC | #1
On Fri, Apr 19, 2013 at 11:13 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> commit bd090dfc634d (tcp: tcp_replace_ts_recent() should not be called
> from tcp_validate_incoming()) introduced a TS ecr bug in slow path
> processing.
>
> 1 A > B P. 1:10001(10000) ack 1 <nop,nop,TS val 1001 ecr 200>
> 2 B < A . 1:1(0) ack 1 win 257 <sack 9001:10001,TS val 300 ecr 1001>
> 3 A > B . 1:1001(1000) ack 1 win 227 <nop,nop,TS val 1002 ecr 200>
> 4 A > B . 1001:2001(1000) ack 1 win 227 <nop,nop,TS val 1002 ecr 200>
>
> (ecr 200 should be ecr 300 in packets 3 & 4)
>
> Problem is tcp_ack() can trigger send of new packets (retransmits),
> reflecting the prior TSval, instead of the TSval contained in the
> currently processed incoming packet.
>
> Fix this by calling tcp_replace_ts_recent() from tcp_ack() after the
> checks, but before the actions.
>
> Reported-by: Yuchung Cheng <ycheng@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Neal Cardwell <ncardwell@google.com>

Acked-by: Neal Cardwell <ncardwell@google.com>

This patch looks good. But AFAICT the other call site for
tcp_replace_ts_recent() has the same bug, which can be fixed in the
same way: tcp_rcv_state_process() seems to fall through the big switch
statement down to its call to tcp_replace_ts_recent() even in some
cases where tcp_ack() already decided the ACK was unacceptable.

neal
Eric Dumazet April 19, 2013, 4:59 p.m. UTC | #2
On Fri, 2013-04-19 at 12:33 -0400, Neal Cardwell wrote:

> This patch looks good. But AFAICT the other call site for
> tcp_replace_ts_recent() has the same bug, which can be fixed in the
> same way: tcp_rcv_state_process() seems to fall through the big switch
> statement down to its call to tcp_replace_ts_recent() even in some
> cases where tcp_ack() already decided the ACK was unacceptable.

I was not sure of the second call site, and was willing to discuss this
with you and Yuchung.

Are you comfortable this is net material and not net-next ? (We are
talking of states other than ESTABLISHED)



Eric Dumazet April 19, 2013, 5:01 p.m. UTC | #3
On Fri, 2013-04-19 at 09:59 -0700, Eric Dumazet wrote:
> On Fri, 2013-04-19 at 12:33 -0400, Neal Cardwell wrote:
> 
> > This patch looks good. But AFAICT the other call site for
> > tcp_replace_ts_recent() has the same bug, which can be fixed in the
> > same way: tcp_rcv_state_process() seems to fall through the big switch
> > statement down to its call to tcp_replace_ts_recent() even in some
> > cases where tcp_ack() already decided the ACK was unacceptable.
> 
> I was not sure of the second call site, and was willing to discuss this
> with you and Yuchung.
> 
> Are you comfortable this is net material and not net-next ? (We are
> talking of states other than ESTABLISHED)

The other concern was about conflict with the prior net-next patch (tcp:
remove one indentation level in tcp_rcv_state_process())



Neal Cardwell April 19, 2013, 5:14 p.m. UTC | #4
On Fri, Apr 19, 2013 at 1:01 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2013-04-19 at 09:59 -0700, Eric Dumazet wrote:
>> On Fri, 2013-04-19 at 12:33 -0400, Neal Cardwell wrote:
>>
>> > This patch looks good. But AFAICT the other call site for
>> > tcp_replace_ts_recent() has the same bug, which can be fixed in the
>> > same way: tcp_rcv_state_process() seems to fall through the big switch
>> > statement down to its call to tcp_replace_ts_recent() even in some
>> > cases where tcp_ack() already decided the ACK was unacceptable.
>>
>> I was not sure of the second call site, and was willing to discuss this
>> with you and Yuchung.
>>
>> Are you comfortable this is net material and not net-next ? (We are
>> talking of states other than ESTABLISHED)
>
> The other concern was about conflict with the prior net-next patch (tcp:
> remove one indentation level in tcp_rcv_state_process())

Good point. How about fixing both tcp_replace_ts_recent() call sites
in net-next, as a single patch after your  "tcp: remove one
indentation level in tcp_rcv_state_process()" net-next patch?

neal
Eric Dumazet April 19, 2013, 5:18 p.m. UTC | #5
On Fri, 2013-04-19 at 13:14 -0400, Neal Cardwell wrote:

> Good point. How about fixing both tcp_replace_ts_recent() call sites
> in net-next, as a single patch after your  "tcp: remove one
> indentation level in tcp_rcv_state_process()" net-next patch?

I think the cleanup can be delayed, I'll resend a v2 of the fix.

Thanks



Patch

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 3bd55ba..7e9d37d 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -113,6 +113,7 @@  int sysctl_tcp_early_retrans __read_mostly = 2;
 #define FLAG_DSACKING_ACK	0x800 /* SACK blocks contained D-SACK info */
 #define FLAG_NONHEAD_RETRANS_ACKED	0x1000 /* Non-head rexmitted data was ACKed */
 #define FLAG_SACK_RENEGING	0x2000 /* snd_una advanced to a sacked seq */
+#define FLAG_UPDATE_TS_RECENT	0x4000 /* tcp_replace_ts_recent() */
 
 #define FLAG_ACKED		(FLAG_DATA_ACKED|FLAG_SYN_ACKED)
 #define FLAG_NOT_DUP		(FLAG_DATA|FLAG_WIN_UPDATE|FLAG_ACKED)
@@ -3564,6 +3565,27 @@  static void tcp_send_challenge_ack(struct sock *sk)
 	}
 }
 
+static void tcp_store_ts_recent(struct tcp_sock *tp)
+{
+	tp->rx_opt.ts_recent = tp->rx_opt.rcv_tsval;
+	tp->rx_opt.ts_recent_stamp = get_seconds();
+}
+
+static void tcp_replace_ts_recent(struct tcp_sock *tp, u32 seq)
+{
+	if (tp->rx_opt.saw_tstamp && !after(seq, tp->rcv_wup)) {
+		/* PAWS bug workaround wrt. ACK frames, the PAWS discard
+		 * extra check below makes sure this can only happen
+		 * for pure ACK frames.  -DaveM
+		 *
+		 * Not only, also it occurs for expired timestamps.
+		 */
+
+		if (tcp_paws_check(&tp->rx_opt, 0))
+			tcp_store_ts_recent(tp);
+	}
+}
+
 /* This routine deals with incoming acks, but not outgoing ones. */
 static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 {
@@ -3607,6 +3629,12 @@  static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 	prior_fackets = tp->fackets_out;
 	prior_in_flight = tcp_packets_in_flight(tp);
 
+	/* ts_recent update must be made after we are sure that the packet
+	 * is in window.
+	 */
+	if (flag & FLAG_UPDATE_TS_RECENT)
+		tcp_replace_ts_recent(tp, TCP_SKB_CB(skb)->seq);
+
 	if (!(flag & FLAG_SLOWPATH) && after(ack, prior_snd_una)) {
 		/* Window is constant, pure forward advance.
 		 * No more checks are required.
@@ -3927,27 +3955,6 @@  const u8 *tcp_parse_md5sig_option(const struct tcphdr *th)
 EXPORT_SYMBOL(tcp_parse_md5sig_option);
 #endif
 
-static inline void tcp_store_ts_recent(struct tcp_sock *tp)
-{
-	tp->rx_opt.ts_recent = tp->rx_opt.rcv_tsval;
-	tp->rx_opt.ts_recent_stamp = get_seconds();
-}
-
-static inline void tcp_replace_ts_recent(struct tcp_sock *tp, u32 seq)
-{
-	if (tp->rx_opt.saw_tstamp && !after(seq, tp->rcv_wup)) {
-		/* PAWS bug workaround wrt. ACK frames, the PAWS discard
-		 * extra check below makes sure this can only happen
-		 * for pure ACK frames.  -DaveM
-		 *
-		 * Not only, also it occurs for expired timestamps.
-		 */
-
-		if (tcp_paws_check(&tp->rx_opt, 0))
-			tcp_store_ts_recent(tp);
-	}
-}
-
 /* Sorry, PAWS as specified is broken wrt. pure-ACKs -DaveM
  *
  * It is not fatal. If this ACK does _not_ change critical state (seqs, window)
@@ -5543,14 +5550,9 @@  slow_path:
 		return 0;
 
 step5:
-	if (tcp_ack(sk, skb, FLAG_SLOWPATH) < 0)
+	if (tcp_ack(sk, skb, FLAG_SLOWPATH | FLAG_UPDATE_TS_RECENT) < 0)
 		goto discard;
 
-	/* ts_recent update must be made after we are sure that the packet
-	 * is in window.
-	 */
-	tcp_replace_ts_recent(tp, TCP_SKB_CB(skb)->seq);
-
 	tcp_rcv_rtt_measure_ts(sk, skb);
 
 	/* Process urgent data. */
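
For readers unfamiliar with the `!after(seq, tp->rcv_wup)` test in
tcp_replace_ts_recent(), the following is a user-space sketch of a
wrap-safe 32-bit sequence comparison in the same style. The names
seq_before(), seq_after() and toy_may_update_ts_recent() are
hypothetical, and the sketch deliberately leaves out the
tcp_paws_check() step that the real function also performs.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if sequence number a precedes b, modulo 2^32. */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* True if sequence number a follows b, modulo 2^32. */
static bool seq_after(uint32_t a, uint32_t b)
{
	return seq_before(b, a);
}

/* Hypothetical simplification of the outer condition only: a timestamp
 * was seen, and the segment does not start beyond rcv_wup.
 * The PAWS check done by the real code is NOT modelled here.
 */
static bool toy_may_update_ts_recent(bool saw_tstamp, uint32_t seq,
				     uint32_t rcv_wup)
{
	return saw_tstamp && !seq_after(seq, rcv_wup);
}
```

The signed cast is what keeps the comparison correct across sequence
number wrap-around: a small value such as 5 still counts as "after"
0xfffffff0.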