diff mbox series

tcp: Fix highest_sack and highest_sack_seq

Message ID 20191227085237.7295-1-cambda@linux.alibaba.com
State Accepted
Delegated to: David Miller
Headers show
Series tcp: Fix highest_sack and highest_sack_seq | expand

Commit Message

Cambda Zhu Dec. 27, 2019, 8:52 a.m. UTC
From commit 50895b9de1d3 ("tcp: highest_sack fix"), the logic about
setting tp->highest_sack to the head of the send queue was removed.
Of course the logic is error prone, but it is logical. Before we
remove the pointer to the highest sack skb and use the seq instead,
we need to set tp->highest_sack to NULL when there is no skb after
the last sack, and then replace NULL with the real skb when new skb
inserted into the rtx queue, because the NULL means the highest sack
seq is tp->snd_nxt. If tp->highest_sack is NULL and new data sent,
the next ACK with sack option will increase tp->reordering unexpectedly.

This patch sets tp->highest_sack to the tail of the rtx queue if
it's NULL and new data is sent. The patch keeps the rule that the
highest_sack can only be maintained by sack processing, except for
this only case.

Fixes: 50895b9de1d3 ("tcp: highest_sack fix")
Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
---
 net/ipv4/tcp_output.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Eric Dumazet Dec. 28, 2019, 12:49 a.m. UTC | #1
On 12/27/19 12:52 AM, Cambda Zhu wrote:
> From commit 50895b9de1d3 ("tcp: highest_sack fix"), the logic about
> setting tp->highest_sack to the head of the send queue was removed.
> Of course the logic is error prone, but it is logical. Before we
> remove the pointer to the highest sack skb and use the seq instead,
> we need to set tp->highest_sack to NULL when there is no skb after
> the last sack, and then replace NULL with the real skb when new skb
> inserted into the rtx queue, because the NULL means the highest sack
> seq is tp->snd_nxt. If tp->highest_sack is NULL and new data sent,
> the next ACK with sack option will increase tp->reordering unexpectedly.
> 
> This patch sets tp->highest_sack to the tail of the rtx queue if
> it's NULL and new data is sent. The patch keeps the rule that the
> highest_sack can only be maintained by sack processing, except for
> this only case.
> 
> Fixes: 50895b9de1d3 ("tcp: highest_sack fix")
> Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
> ---
>  net/ipv4/tcp_output.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 1f7735ca8f22..58c92a7d671c 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -72,6 +72,9 @@ static void tcp_event_new_data_sent(struct sock *sk, struct sk_buff *skb)
>  	__skb_unlink(skb, &sk->sk_write_queue);
>  	tcp_rbtree_insert(&sk->tcp_rtx_queue, skb);
>  
> +	if (tp->highest_sack == NULL)
> +		tp->highest_sack = skb;
> +
>  	tp->packets_out += tcp_skb_pcount(skb);
>  	if (!prior_packets || icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)
>  		tcp_rearm_rto(sk);
> 


This patch seems to keep something in the fast path, even for flows never experiencing
sacks.

Why would we always painfully maintain tp->highest_sack to the left most skb in the rtx queue ?

Given that tcp_highest_sack_seq() has an explicit check about tp->highest_sack being NULL,
there is something I do not quite understand yet.

Why keeping this piece of code ?

    if (tp->highest_sack == NULL)
            return tp->snd_nxt;

Defensive programming should be replaced by better knowledge.

Can you provide more explanations, or maybe a packetdrill test ?

Maybe some other path (in slow path this time) misses a !tp->highest_sack test.

Thanks.
Eric Dumazet Dec. 28, 2019, 3:28 a.m. UTC | #2
On 12/27/19 4:49 PM, Eric Dumazet wrote:
> 
> 
> On 12/27/19 12:52 AM, Cambda Zhu wrote:
>> From commit 50895b9de1d3 ("tcp: highest_sack fix"), the logic about
>> setting tp->highest_sack to the head of the send queue was removed.
>> Of course the logic is error prone, but it is logical. Before we
>> remove the pointer to the highest sack skb and use the seq instead,
>> we need to set tp->highest_sack to NULL when there is no skb after
>> the last sack, and then replace NULL with the real skb when new skb
>> inserted into the rtx queue, because the NULL means the highest sack
>> seq is tp->snd_nxt. If tp->highest_sack is NULL and new data sent,
>> the next ACK with sack option will increase tp->reordering unexpectedly.
>>
>> This patch sets tp->highest_sack to the tail of the rtx queue if
>> it's NULL and new data is sent. The patch keeps the rule that the
>> highest_sack can only be maintained by sack processing, except for
>> this only case.
>>
>> Fixes: 50895b9de1d3 ("tcp: highest_sack fix")
>> Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
>> ---
>>  net/ipv4/tcp_output.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
>> index 1f7735ca8f22..58c92a7d671c 100644
>> --- a/net/ipv4/tcp_output.c
>> +++ b/net/ipv4/tcp_output.c
>> @@ -72,6 +72,9 @@ static void tcp_event_new_data_sent(struct sock *sk, struct sk_buff *skb)
>>  	__skb_unlink(skb, &sk->sk_write_queue);
>>  	tcp_rbtree_insert(&sk->tcp_rtx_queue, skb);
>>  
>> +	if (tp->highest_sack == NULL)
>> +		tp->highest_sack = skb;
>> +
>>  	tp->packets_out += tcp_skb_pcount(skb);
>>  	if (!prior_packets || icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)
>>  		tcp_rearm_rto(sk);
>>
> 
> 
> This patch seems to keep something in the fast path, even for flows never experiencing
> sacks.
> 
> Why would we always painfully maintain tp->highest_sack to the left most skb in the rtx queue ?
> 
> Given that tcp_highest_sack_seq() has an explicit check about tp->highest_sack being NULL,
> there is something I do not quite understand yet.
> 
> Why keeping this piece of code ?
> 
>     if (tp->highest_sack == NULL)
>             return tp->snd_nxt;
> 
> Defensive programming should be replaced by better knowledge.
> 
> Can you provide more explanations, or maybe a packetdrill test ?
> 
> Maybe some other path (in slow path this time) misses a !tp->highest_sack test.
> 
> Thanks.
> 

Or maybe the real bug has been latent for years.

(added in commit 6859d49475d4 "[TCP]: Abstract tp->highest_sack accessing & point to next skb" )


diff --git a/include/net/tcp.h b/include/net/tcp.h
index e460ea7f767ba627972a63a974cae80357808366..32781fb5cf3a7aa1158c98cb87754b59dc922b1f 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1843,12 +1843,9 @@ static inline void tcp_push_pending_frames(struct sock *sk)
  */
 static inline u32 tcp_highest_sack_seq(struct tcp_sock *tp)
 {
-       if (!tp->sacked_out)
+       if (!tp->sacked_out || !tp->highest_sack)
                return tp->snd_una;
 
-       if (tp->highest_sack == NULL)
-               return tp->snd_nxt;
-
        return TCP_SKB_CB(tp->highest_sack)->seq;
 }
Cambda Zhu Dec. 28, 2019, 3:58 a.m. UTC | #3
> On Dec 28, 2019, at 08:49, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> 
> 
> On 12/27/19 12:52 AM, Cambda Zhu wrote:
>> From commit 50895b9de1d3 ("tcp: highest_sack fix"), the logic about
>> setting tp->highest_sack to the head of the send queue was removed.
>> Of course the logic is error prone, but it is logical. Before we
>> remove the pointer to the highest sack skb and use the seq instead,
>> we need to set tp->highest_sack to NULL when there is no skb after
>> the last sack, and then replace NULL with the real skb when new skb
>> inserted into the rtx queue, because the NULL means the highest sack
>> seq is tp->snd_nxt. If tp->highest_sack is NULL and new data sent,
>> the next ACK with sack option will increase tp->reordering unexpectedly.
>> 
>> This patch sets tp->highest_sack to the tail of the rtx queue if
>> it's NULL and new data is sent. The patch keeps the rule that the
>> highest_sack can only be maintained by sack processing, except for
>> this only case.
>> 
>> Fixes: 50895b9de1d3 ("tcp: highest_sack fix")
>> Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
>> ---
>> net/ipv4/tcp_output.c | 3 +++
>> 1 file changed, 3 insertions(+)
>> 
>> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
>> index 1f7735ca8f22..58c92a7d671c 100644
>> --- a/net/ipv4/tcp_output.c
>> +++ b/net/ipv4/tcp_output.c
>> @@ -72,6 +72,9 @@ static void tcp_event_new_data_sent(struct sock *sk, struct sk_buff *skb)
>> 	__skb_unlink(skb, &sk->sk_write_queue);
>> 	tcp_rbtree_insert(&sk->tcp_rtx_queue, skb);
>> 
>> +	if (tp->highest_sack == NULL)
>> +		tp->highest_sack = skb;
>> +
>> 	tp->packets_out += tcp_skb_pcount(skb);
>> 	if (!prior_packets || icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)
>> 		tcp_rearm_rto(sk);
>> 
> 
> 
> This patch seems to keep something in the fast path, even for flows never experiencing
> sacks.
> 
I’ll update the patch to check tcp_is_sack(). It’s a good idea, thank you.

> Why would we always painfully maintain tp->highest_sack to the left most skb in the rtx queue ?
> 
> Given that tcp_highest_sack_seq() has an explicit check about tp->highest_sack being NULL,
> there is something I do not quite understand yet.
> 
> Why keeping this piece of code ?
> 
>    if (tp->highest_sack == NULL)
>            return tp->snd_nxt;
> 
Before we replace tp->highest_sack with tp->highest_sack_seq, the highest_sack_seq is always
gotten from the highest_sack skb. Unfortunately if the skb before tp->snd_nxt is sacked, we
have no choice but to set tp->highest_sack to a special skb or value yet, such as the send head
in the past and the NULL now.

In the past, if there’s no skb after the last sacked skb in the write queue, tp->highest_sack
will be set to NULL and tcp_highest_sack_seq() will return tp->snd_nxt, which is equal to
TCP_SKB_CB(skb_after_last_sack)->seq. And then if new data is sent, the tp->highest_sack must
be replaced by the new skb, otherwise the tcp_highest_sack_seq() will return a higher seq
and result in increasing tp->reordering when the skb which is after the last sack and before
the last sent new skb.

Now we have the same issue. If the skb_rb_last() skb is sacked, tp->highest_sack is set to NULL.
If new data is sent, tcp_highest_sack_seq() will return a higher seq than what we expect.

Here is my UGLY example:
Low								High
#1	#2	#3(last sacked)
			       ^
			       |highest_sack_seq(NULL)
#1	#2	#3(last sacked)		#4		#5	#6
								  ^
								  |highest_sack_seq(NULL)
#1	#2	#3			#4(last sacked)	#5	#6
					{	reordering	  }

> Defensive programming should be replaced by better knowledge.
> 
> Can you provide more explanations, or maybe a packetdrill test ?
> 
> Maybe some other path (in slow path this time) misses a !tp->highest_sack test.
> 
> Thanks.
I don’t know how to use a packetdrill test to reproduce this case, sorry. But I can reproduce
it via network emulator with these parameters:

Server:
rack off, metric no saving, congestion control bbr

Client:
random loss 8%, fixed rtt 20 ms, 60 Mbps bandwidth with 100 packets queue and tail drop.

Thanks.
Cambda Zhu Dec. 28, 2019, 4:26 a.m. UTC | #4
> On Dec 28, 2019, at 11:28, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> 
> 
> On 12/27/19 4:49 PM, Eric Dumazet wrote:
>> 
>> 
>> On 12/27/19 12:52 AM, Cambda Zhu wrote:
>>> From commit 50895b9de1d3 ("tcp: highest_sack fix"), the logic about
>>> setting tp->highest_sack to the head of the send queue was removed.
>>> Of course the logic is error prone, but it is logical. Before we
>>> remove the pointer to the highest sack skb and use the seq instead,
>>> we need to set tp->highest_sack to NULL when there is no skb after
>>> the last sack, and then replace NULL with the real skb when new skb
>>> inserted into the rtx queue, because the NULL means the highest sack
>>> seq is tp->snd_nxt. If tp->highest_sack is NULL and new data sent,
>>> the next ACK with sack option will increase tp->reordering unexpectedly.
>>> 
>>> This patch sets tp->highest_sack to the tail of the rtx queue if
>>> it's NULL and new data is sent. The patch keeps the rule that the
>>> highest_sack can only be maintained by sack processing, except for
>>> this only case.
>>> 
>>> Fixes: 50895b9de1d3 ("tcp: highest_sack fix")
>>> Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
>>> ---
>>> net/ipv4/tcp_output.c | 3 +++
>>> 1 file changed, 3 insertions(+)
>>> 
>>> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
>>> index 1f7735ca8f22..58c92a7d671c 100644
>>> --- a/net/ipv4/tcp_output.c
>>> +++ b/net/ipv4/tcp_output.c
>>> @@ -72,6 +72,9 @@ static void tcp_event_new_data_sent(struct sock *sk, struct sk_buff *skb)
>>> 	__skb_unlink(skb, &sk->sk_write_queue);
>>> 	tcp_rbtree_insert(&sk->tcp_rtx_queue, skb);
>>> 
>>> +	if (tp->highest_sack == NULL)
>>> +		tp->highest_sack = skb;
>>> +
>>> 	tp->packets_out += tcp_skb_pcount(skb);
>>> 	if (!prior_packets || icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)
>>> 		tcp_rearm_rto(sk);
>>> 
>> 
>> 
>> This patch seems to keep something in the fast path, even for flows never experiencing
>> sacks.
>> 
>> Why would we always painfully maintain tp->highest_sack to the left most skb in the rtx queue ?
>> 
>> Given that tcp_highest_sack_seq() has an explicit check about tp->highest_sack being NULL,
>> there is something I do not quite understand yet.
>> 
>> Why keeping this piece of code ?
>> 
>>    if (tp->highest_sack == NULL)
>>            return tp->snd_nxt;
>> 
>> Defensive programming should be replaced by better knowledge.
>> 
>> Can you provide more explanations, or maybe a packetdrill test ?
>> 
>> Maybe some other path (in slow path this time) misses a !tp->highest_sack test.
>> 
>> Thanks.
>> 
> 
> Or maybe the real bug has been latent for years.
> 
> (added in commit 6859d49475d4 "[TCP]: Abstract tp->highest_sack accessing & point to next skb" )
> 
> 
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index e460ea7f767ba627972a63a974cae80357808366..32781fb5cf3a7aa1158c98cb87754b59dc922b1f 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -1843,12 +1843,9 @@ static inline void tcp_push_pending_frames(struct sock *sk)
>  */
> static inline u32 tcp_highest_sack_seq(struct tcp_sock *tp)
> {
> -       if (!tp->sacked_out)
> +       if (!tp->sacked_out || !tp->highest_sack)
>                return tp->snd_una;
> 
> -       if (tp->highest_sack == NULL)
> -               return tp->snd_nxt;
> -
>        return TCP_SKB_CB(tp->highest_sack)->seq;
> }

The key is that we use skb but not seq to save the highest sack yet. I think it should be done in
a better way. If we replace the next skb with the last sacked skb or seq after it, we need to change
all of the sack processing about highest sack, which is worth to do in my opinion.

Thanks.
Eric Dumazet Dec. 29, 2019, 6:11 a.m. UTC | #5
On 12/27/19 12:52 AM, Cambda Zhu wrote:
> From commit 50895b9de1d3 ("tcp: highest_sack fix"), the logic about
> setting tp->highest_sack to the head of the send queue was removed.
> Of course the logic is error prone, but it is logical. Before we
> remove the pointer to the highest sack skb and use the seq instead,
> we need to set tp->highest_sack to NULL when there is no skb after
> the last sack, and then replace NULL with the real skb when new skb
> inserted into the rtx queue, because the NULL means the highest sack
> seq is tp->snd_nxt. If tp->highest_sack is NULL and new data sent,
> the next ACK with sack option will increase tp->reordering unexpectedly.
> 
> This patch sets tp->highest_sack to the tail of the rtx queue if
> it's NULL and new data is sent. The patch keeps the rule that the
> highest_sack can only be maintained by sack processing, except for
> this only case.
> 
> Fixes: 50895b9de1d3 ("tcp: highest_sack fix")
> Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>

Signed-off-by: Eric Dumazet <edumazet@google.com>

Sadly I could not come to an alternative solution.

Thanks !
David Miller Dec. 31, 2019, 4:29 a.m. UTC | #6
From: Cambda Zhu <cambda@linux.alibaba.com>
Date: Fri, 27 Dec 2019 16:52:37 +0800

> From commit 50895b9de1d3 ("tcp: highest_sack fix"), the logic about
> setting tp->highest_sack to the head of the send queue was removed.
> Of course the logic is error prone, but it is logical. Before we
> remove the pointer to the highest sack skb and use the seq instead,
> we need to set tp->highest_sack to NULL when there is no skb after
> the last sack, and then replace NULL with the real skb when new skb
> inserted into the rtx queue, because the NULL means the highest sack
> seq is tp->snd_nxt. If tp->highest_sack is NULL and new data sent,
> the next ACK with sack option will increase tp->reordering unexpectedly.
> 
> This patch sets tp->highest_sack to the tail of the rtx queue if
> it's NULL and new data is sent. The patch keeps the rule that the
> highest_sack can only be maintained by sack processing, except for
> this only case.
> 
> Fixes: 50895b9de1d3 ("tcp: highest_sack fix")
> Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>

Applied and queued up for -stable, thanks.
diff mbox series

Patch

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 1f7735ca8f22..58c92a7d671c 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -72,6 +72,9 @@  static void tcp_event_new_data_sent(struct sock *sk, struct sk_buff *skb)
 	__skb_unlink(skb, &sk->sk_write_queue);
 	tcp_rbtree_insert(&sk->tcp_rtx_queue, skb);
 
+	if (tp->highest_sack == NULL)
+		tp->highest_sack = skb;
+
 	tp->packets_out += tcp_skb_pcount(skb);
 	if (!prior_packets || icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)
 		tcp_rearm_rto(sk);