Message ID | 1578993820-2114-1-git-send-email-yangpc@wangsu.com |
---|---|
State | Accepted |
Delegated to: | David Miller |
Headers | show |
Series | tcp: fix marked lost packets not being retransmitted | expand |
On Tue, Jan 14, 2020 at 4:24 AM Pengcheng Yang <yangpc@wangsu.com> wrote: > > When the packet pointed to by retransmit_skb_hint is unlinked by ACK, > retransmit_skb_hint will be set to NULL in tcp_clean_rtx_queue(). > If packet loss is detected at this time, retransmit_skb_hint will be set > to point to the current packet loss in tcp_verify_retransmit_hint(), > then the packets that were previously marked lost but not retransmitted > due to the restriction of cwnd will be skipped and cannot be > retransmitted. > > To fix this, when retransmit_skb_hint is NULL, retransmit_skb_hint can > be reset only after all marked lost packets are retransmitted > (retrans_out >= lost_out), otherwise we need to traverse from > tcp_rtx_queue_head in tcp_xmit_retransmit_queue(). ... > --- a/net/ipv4/tcp_input.c > +++ b/net/ipv4/tcp_input.c > @@ -915,9 +915,10 @@ static void tcp_check_sack_reordering(struct sock *sk, const u32 low_seq, > /* This must be called before lost_out is incremented */ > static void tcp_verify_retransmit_hint(struct tcp_sock *tp, struct sk_buff *skb) > { > - if (!tp->retransmit_skb_hint || > - before(TCP_SKB_CB(skb)->seq, > - TCP_SKB_CB(tp->retransmit_skb_hint)->seq)) > + if ((!tp->retransmit_skb_hint && tp->retrans_out >= tp->lost_out) || > + (tp->retransmit_skb_hint && > + before(TCP_SKB_CB(skb)->seq, > + TCP_SKB_CB(tp->retransmit_skb_hint)->seq))) > tp->retransmit_skb_hint = skb; > } Thanks for finding and fixing this issue, and for providing the very nice packetdrill test case! The fix looks good to me. I verified that the packetdrill test fails at the line notated "BUG" without the patch applied: fr-retrans-hint-skip-fix.pkt:33: error handling packet: live packet field tcp_seq: expected: 2001 (0x7d1) vs actual: 4001 (0xfa1) script packet: 0.137311 . 2001:3001(1000) ack 1 actual packet: 0.137307 . 4001:5001(1000) ack 1 win 256 Also verified that the test passes with the patch applied, and that our internal SACK and fast recovery tests continue to pass with this patch applied. Acked-by: Neal Cardwell <ncardwell@google.com> Tested-by: Neal Cardwell <ncardwell@google.com> thanks, neal
On Tue, Jan 14, 2020 at 1:24 AM Pengcheng Yang <yangpc@wangsu.com> wrote: > > When the packet pointed to by retransmit_skb_hint is unlinked by ACK, > retransmit_skb_hint will be set to NULL in tcp_clean_rtx_queue(). > If packet loss is detected at this time, retransmit_skb_hint will be set > to point to the current packet loss in tcp_verify_retransmit_hint(), > then the packets that were previously marked lost but not retransmitted > due to the restriction of cwnd will be skipped and cannot be > retransmitted. "cannot be retransmittted" sounds quite alarming. You meant they will eventually be retransmitted, or that the flow is completely frozen at this point ? Thanks for the fix and test ! (Not sure why you CC all these people having little TCP expertise btw) > To fix this, when retransmit_skb_hint is NULL, retransmit_skb_hint can > be reset only after all marked lost packets are retransmitted > (retrans_out >= lost_out), otherwise we need to traverse from > tcp_rtx_queue_head in tcp_xmit_retransmit_queue(). > > Packetdrill to demonstrate: > > // Disable RACK and set max_reordering to keep things simple > 0 `sysctl -q net.ipv4.tcp_recovery=0` > +0 `sysctl -q net.ipv4.tcp_max_reordering=3` > > // Establish a connection > +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 > +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 > +0 bind(3, ..., ...) = 0 > +0 listen(3, 1) = 0 > > +.1 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7> > +0 > S. 0:0(0) ack 1 <...> > +.01 < . 1:1(0) ack 1 win 257 > +0 accept(3, ..., ...) = 4 > > // Send 8 data segments > +0 write(4, ..., 8000) = 8000 > +0 > P. 1:8001(8000) ack 1 > > // Enter recovery and 1:3001 is marked lost > +.01 < . 1:1(0) ack 1 win 257 <sack 3001:4001,nop,nop> > +0 < . 1:1(0) ack 1 win 257 <sack 5001:6001 3001:4001,nop,nop> > +0 < . 1:1(0) ack 1 win 257 <sack 5001:7001 3001:4001,nop,nop> > > // Retransmit 1:1001, now retransmit_skb_hint points to 1001:2001 > +0 > . 1:1001(1000) ack 1 > > // 1001:2001 was ACKed causing retransmit_skb_hint to be set to NULL > +.01 < . 1:1(0) ack 2001 win 257 <sack 5001:8001 3001:4001,nop,nop> > // Now retransmit_skb_hint points to 4001:5001 which is now marked lost > > // BUG: 2001:3001 was not retransmitted > +0 > . 2001:3001(1000) ack 1 > > Signed-off-by: Pengcheng Yang <yangpc@wangsu.com> > --- > net/ipv4/tcp_input.c | 7 ++++--- > 1 file changed, 4 insertions(+), 3 deletions(-) > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c > index 0238b55..5347ab2 100644 > --- a/net/ipv4/tcp_input.c > +++ b/net/ipv4/tcp_input.c > @@ -915,9 +915,10 @@ static void tcp_check_sack_reordering(struct sock *sk, const u32 low_seq, > /* This must be called before lost_out is incremented */ > static void tcp_verify_retransmit_hint(struct tcp_sock *tp, struct sk_buff *skb) > { > - if (!tp->retransmit_skb_hint || > - before(TCP_SKB_CB(skb)->seq, > - TCP_SKB_CB(tp->retransmit_skb_hint)->seq)) > + if ((!tp->retransmit_skb_hint && tp->retrans_out >= tp->lost_out) || > + (tp->retransmit_skb_hint && > + before(TCP_SKB_CB(skb)->seq, > + TCP_SKB_CB(tp->retransmit_skb_hint)->seq))) > tp->retransmit_skb_hint = skb; > } > > -- > 1.8.3.1 >
On Tue, Jan 14, 2020 at 8:06 AM Eric Dumazet <edumazet@google.com> wrote: > > On Tue, Jan 14, 2020 at 1:24 AM Pengcheng Yang <yangpc@wangsu.com> wrote: > > > > When the packet pointed to by retransmit_skb_hint is unlinked by ACK, > > retransmit_skb_hint will be set to NULL in tcp_clean_rtx_queue(). > > If packet loss is detected at this time, retransmit_skb_hint will be set > > to point to the current packet loss in tcp_verify_retransmit_hint(), > > then the packets that were previously marked lost but not retransmitted > > due to the restriction of cwnd will be skipped and cannot be > > retransmitted. > > > "cannot be retransmittted" sounds quite alarming. > > You meant they will eventually be retransmitted, or that the flow is > completely frozen at this point ? He probably means those lost packets will be skipped until a timeout that reset hint pointer. nice fix this would save some RTOs. > > Thanks for the fix and test ! > > (Not sure why you CC all these people having little TCP expertise btw) > > > To fix this, when retransmit_skb_hint is NULL, retransmit_skb_hint can > > be reset only after all marked lost packets are retransmitted > > (retrans_out >= lost_out), otherwise we need to traverse from > > tcp_rtx_queue_head in tcp_xmit_retransmit_queue(). > > > > Packetdrill to demonstrate: > > > > // Disable RACK and set max_reordering to keep things simple > > 0 `sysctl -q net.ipv4.tcp_recovery=0` > > +0 `sysctl -q net.ipv4.tcp_max_reordering=3` > > > > // Establish a connection > > +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 > > +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 > > +0 bind(3, ..., ...) = 0 > > +0 listen(3, 1) = 0 > > > > +.1 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7> > > +0 > S. 0:0(0) ack 1 <...> > > +.01 < . 1:1(0) ack 1 win 257 > > +0 accept(3, ..., ...) = 4 > > > > // Send 8 data segments > > +0 write(4, ..., 8000) = 8000 > > +0 > P. 1:8001(8000) ack 1 > > > > // Enter recovery and 1:3001 is marked lost > > +.01 < . 1:1(0) ack 1 win 257 <sack 3001:4001,nop,nop> > > +0 < . 1:1(0) ack 1 win 257 <sack 5001:6001 3001:4001,nop,nop> > > +0 < . 1:1(0) ack 1 win 257 <sack 5001:7001 3001:4001,nop,nop> > > > > // Retransmit 1:1001, now retransmit_skb_hint points to 1001:2001 > > +0 > . 1:1001(1000) ack 1 > > > > // 1001:2001 was ACKed causing retransmit_skb_hint to be set to NULL > > +.01 < . 1:1(0) ack 2001 win 257 <sack 5001:8001 3001:4001,nop,nop> > > // Now retransmit_skb_hint points to 4001:5001 which is now marked lost > > > > // BUG: 2001:3001 was not retransmitted > > +0 > . 2001:3001(1000) ack 1 > > > > Signed-off-by: Pengcheng Yang <yangpc@wangsu.com> > > --- > > net/ipv4/tcp_input.c | 7 ++++--- > > 1 file changed, 4 insertions(+), 3 deletions(-) > > > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c > > index 0238b55..5347ab2 100644 > > --- a/net/ipv4/tcp_input.c > > +++ b/net/ipv4/tcp_input.c > > @@ -915,9 +915,10 @@ static void tcp_check_sack_reordering(struct sock *sk, const u32 low_seq, > > /* This must be called before lost_out is incremented */ > > static void tcp_verify_retransmit_hint(struct tcp_sock *tp, struct sk_buff *skb) > > { > > - if (!tp->retransmit_skb_hint || > > - before(TCP_SKB_CB(skb)->seq, > > - TCP_SKB_CB(tp->retransmit_skb_hint)->seq)) > > + if ((!tp->retransmit_skb_hint && tp->retrans_out >= tp->lost_out) || > > + (tp->retransmit_skb_hint && > > + before(TCP_SKB_CB(skb)->seq, > > + TCP_SKB_CB(tp->retransmit_skb_hint)->seq))) > > tp->retransmit_skb_hint = skb; > > } > > > > -- > > 1.8.3.1 > >
On Wed, Jan 15, 2020 at 2:40 AM Yuchung Cheng <ycheng@google.com> wrote: > > On Tue, Jan 14, 2020 at 8:06 AM Eric Dumazet <edumazet@google.com> > wrote: > > > > On Tue, Jan 14, 2020 at 1:24 AM Pengcheng Yang <yangpc@wangsu.com> > wrote: > > > > > > When the packet pointed to by retransmit_skb_hint is unlinked by ACK, > > > retransmit_skb_hint will be set to NULL in tcp_clean_rtx_queue(). > > > If packet loss is detected at this time, retransmit_skb_hint will be set > > > to point to the current packet loss in tcp_verify_retransmit_hint(), > > > then the packets that were previously marked lost but not retransmitted > > > due to the restriction of cwnd will be skipped and cannot be > > > retransmitted. > > > > > > "cannot be retransmittted" sounds quite alarming. > > > > You meant they will eventually be retransmitted, or that the flow is > > completely frozen at this point ? > He probably means those lost packets will be skipped until a timeout > that reset hint pointer. nice fix this would save some RTOs. Yes, I mean those packets will not be retransmitted until RTO. I'm sorry I didn't describe it clearly. > > > > > Thanks for the fix and test ! > > > > (Not sure why you CC all these people having little TCP expertise btw) Maybe I should CC fewer people :) > > > > > To fix this, when retransmit_skb_hint is NULL, retransmit_skb_hint can > > > be reset only after all marked lost packets are retransmitted > > > (retrans_out >= lost_out), otherwise we need to traverse from > > > tcp_rtx_queue_head in tcp_xmit_retransmit_queue(). > > > > > > Packetdrill to demonstrate: > > > > > > // Disable RACK and set max_reordering to keep things simple > > > 0 `sysctl -q net.ipv4.tcp_recovery=0` > > > +0 `sysctl -q net.ipv4.tcp_max_reordering=3` > > > > > > // Establish a connection > > > +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 > > > +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 > > > +0 bind(3, ..., ...) = 0 > > > +0 listen(3, 1) = 0 > > > > > > +.1 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7> > > > +0 > S. 0:0(0) ack 1 <...> > > > +.01 < . 1:1(0) ack 1 win 257 > > > +0 accept(3, ..., ...) = 4 > > > > > > // Send 8 data segments > > > +0 write(4, ..., 8000) = 8000 > > > +0 > P. 1:8001(8000) ack 1 > > > > > > // Enter recovery and 1:3001 is marked lost > > > +.01 < . 1:1(0) ack 1 win 257 <sack 3001:4001,nop,nop> > > > +0 < . 1:1(0) ack 1 win 257 <sack 5001:6001 3001:4001,nop,nop> > > > +0 < . 1:1(0) ack 1 win 257 <sack 5001:7001 3001:4001,nop,nop> > > > > > > // Retransmit 1:1001, now retransmit_skb_hint points to 1001:2001 > > > +0 > . 1:1001(1000) ack 1 > > > > > > // 1001:2001 was ACKed causing retransmit_skb_hint to be set to NULL > > > +.01 < . 1:1(0) ack 2001 win 257 <sack 5001:8001 3001:4001,nop,nop> > > > // Now retransmit_skb_hint points to 4001:5001 which is now marked lost > > > > > > // BUG: 2001:3001 was not retransmitted > > > +0 > . 2001:3001(1000) ack 1 > > > > > > Signed-off-by: Pengcheng Yang <yangpc@wangsu.com> > > > --- > > > net/ipv4/tcp_input.c | 7 ++++--- > > > 1 file changed, 4 insertions(+), 3 deletions(-) > > > > > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c > > > index 0238b55..5347ab2 100644 > > > --- a/net/ipv4/tcp_input.c > > > +++ b/net/ipv4/tcp_input.c > > > @@ -915,9 +915,10 @@ static void tcp_check_sack_reordering(struct sock > *sk, const u32 low_seq, > > > /* This must be called before lost_out is incremented */ > > > static void tcp_verify_retransmit_hint(struct tcp_sock *tp, struct sk_buff > *skb) > > > { > > > - if (!tp->retransmit_skb_hint || > > > - before(TCP_SKB_CB(skb)->seq, > > > - TCP_SKB_CB(tp->retransmit_skb_hint)->seq)) > > > + if ((!tp->retransmit_skb_hint && tp->retrans_out >= tp->lost_out) > || > > > + (tp->retransmit_skb_hint && > > > + before(TCP_SKB_CB(skb)->seq, > > > + TCP_SKB_CB(tp->retransmit_skb_hint)->seq))) > > > tp->retransmit_skb_hint = skb; > > > } > > > > > > -- > > > 1.8.3.1 > > >
From: Pengcheng Yang <yangpc@wangsu.com> Date: Tue, 14 Jan 2020 17:23:40 +0800 > When the packet pointed to by retransmit_skb_hint is unlinked by ACK, > retransmit_skb_hint will be set to NULL in tcp_clean_rtx_queue(). > If packet loss is detected at this time, retransmit_skb_hint will be set > to point to the current packet loss in tcp_verify_retransmit_hint(), > then the packets that were previously marked lost but not retransmitted > due to the restriction of cwnd will be skipped and cannot be > retransmitted. > > To fix this, when retransmit_skb_hint is NULL, retransmit_skb_hint can > be reset only after all marked lost packets are retransmitted > (retrans_out >= lost_out), otherwise we need to traverse from > tcp_rtx_queue_head in tcp_xmit_retransmit_queue(). > > Packetdrill to demonstrate: ... > Signed-off-by: Pengcheng Yang <yangpc@wangsu.com> Applied, thank you.
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 0238b55..5347ab2 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -915,9 +915,10 @@ static void tcp_check_sack_reordering(struct sock *sk, const u32 low_seq, /* This must be called before lost_out is incremented */ static void tcp_verify_retransmit_hint(struct tcp_sock *tp, struct sk_buff *skb) { - if (!tp->retransmit_skb_hint || - before(TCP_SKB_CB(skb)->seq, - TCP_SKB_CB(tp->retransmit_skb_hint)->seq)) + if ((!tp->retransmit_skb_hint && tp->retrans_out >= tp->lost_out) || + (tp->retransmit_skb_hint && + before(TCP_SKB_CB(skb)->seq, + TCP_SKB_CB(tp->retransmit_skb_hint)->seq))) tp->retransmit_skb_hint = skb; }
When the packet pointed to by retransmit_skb_hint is unlinked by ACK, retransmit_skb_hint will be set to NULL in tcp_clean_rtx_queue(). If packet loss is detected at this time, retransmit_skb_hint will be set to point to the current packet loss in tcp_verify_retransmit_hint(), then the packets that were previously marked lost but not retransmitted due to the restriction of cwnd will be skipped and cannot be retransmitted. To fix this, when retransmit_skb_hint is NULL, retransmit_skb_hint can be reset only after all marked lost packets are retransmitted (retrans_out >= lost_out), otherwise we need to traverse from tcp_rtx_queue_head in tcp_xmit_retransmit_queue(). Packetdrill to demonstrate: // Disable RACK and set max_reordering to keep things simple 0 `sysctl -q net.ipv4.tcp_recovery=0` +0 `sysctl -q net.ipv4.tcp_max_reordering=3` // Establish a connection +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 +.1 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7> +0 > S. 0:0(0) ack 1 <...> +.01 < . 1:1(0) ack 1 win 257 +0 accept(3, ..., ...) = 4 // Send 8 data segments +0 write(4, ..., 8000) = 8000 +0 > P. 1:8001(8000) ack 1 // Enter recovery and 1:3001 is marked lost +.01 < . 1:1(0) ack 1 win 257 <sack 3001:4001,nop,nop> +0 < . 1:1(0) ack 1 win 257 <sack 5001:6001 3001:4001,nop,nop> +0 < . 1:1(0) ack 1 win 257 <sack 5001:7001 3001:4001,nop,nop> // Retransmit 1:1001, now retransmit_skb_hint points to 1001:2001 +0 > . 1:1001(1000) ack 1 // 1001:2001 was ACKed causing retransmit_skb_hint to be set to NULL +.01 < . 1:1(0) ack 2001 win 257 <sack 5001:8001 3001:4001,nop,nop> // Now retransmit_skb_hint points to 4001:5001 which is now marked lost // BUG: 2001:3001 was not retransmitted +0 > . 2001:3001(1000) ack 1 Signed-off-by: Pengcheng Yang <yangpc@wangsu.com> --- net/ipv4/tcp_input.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)