Message ID | f58c4b02028e0750e583518608891d20742c0f38.1415128437.git.mleitner@redhat.com |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
Headers | show |
On Tue, Nov 4, 2014 at 2:15 PM, Marcelo Ricardo Leitner <mleitner@redhat.com> wrote: ... > Therefore, now we clear retrans_stamp as soon as all data during the > loss window is fully acked. > > Reported-by: Ueki Kohei > Cc: Neal Cardwell <ncardwell@google.com> > Cc: Yuchung Cheng <ycheng@google.com> > Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> > --- > > Notes: > v1->v2: fixed compilation issue noticed by Neal > > net/ipv4/tcp_input.c | 60 +++++++++++++++++++++++++++------------------------- > 1 file changed, 31 insertions(+), 29 deletions(-) Acked-by: Neal Cardwell <ncardwell@google.com> Tested-by: Neal Cardwell <ncardwell@google.com> Code looks fine, and it passes Yuchung's packetdrill test case for this. Thanks for finding and fixing this, Marcelo. neal -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04-11-2014 18:10, Neal Cardwell wrote: > On Tue, Nov 4, 2014 at 2:15 PM, Marcelo Ricardo Leitner > <mleitner@redhat.com> wrote: > ... >> Therefore, now we clear retrans_stamp as soon as all data during the >> loss window is fully acked. >> >> Reported-by: Ueki Kohei >> Cc: Neal Cardwell <ncardwell@google.com> >> Cc: Yuchung Cheng <ycheng@google.com> >> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> >> --- >> >> Notes: >> v1->v2: fixed compilation issue noticed by Neal >> >> net/ipv4/tcp_input.c | 60 +++++++++++++++++++++++++++------------------------- >> 1 file changed, 31 insertions(+), 29 deletions(-) > > Acked-by: Neal Cardwell <ncardwell@google.com> > Tested-by: Neal Cardwell <ncardwell@google.com> > > Code looks fine, and it passes Yuchung's packetdrill test case for this. > > Thanks for finding and fixing this, Marcelo. And thank you guys for all the assistance on it. Btw, would you send me that packetdrill script? I'm curious to see how such corner case could be written on it. Regards, Marcelo -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2014-11-04 at 18:51 -0200, Marcelo Ricardo Leitner wrote: > And thank you guys for all the assistance on it. Btw, would you send me that > packetdrill script? I'm curious to see how such corner case could be written > on it. One of the script I saw was : You might have to adapt preconditions (tcp_rmem[]/tcp_wmem[]) 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 // Set a 10s timeout +.000 setsockopt(3, SOL_TCP, TCP_USER_TIMEOUT, [10000], 4) = 0 +.000 bind(3, ..., ...) = 0 +.000 listen(3, 1) = 0 +.000 < S 0:0(0) win 32792 <mss 1460,nop,wscale 7> +.000 > S. 0:0(0) ack 1 <mss 1460,nop,wscale 6> +.010 < . 1:1(0) ack 1 win 257 +.000 accept(3, ..., ...) = 4 +.000 write(4, ..., 1000) = 1000 +.000 > P. 1:1001(1000) ack 1 +.625 > P. 1:1001(1000) ack 1 +.020 < . 1:1(0) ack 1001 win 257 // Purposely write more after the specified timeout for testing +11.0 write(4, ..., 1000) = 1000 +.000 > P. 1001:2001(1000) ack 1 +1.25 > P. 1001:2001(1000) ack 1 // socket is killed when the 2nd RTO fires at +2.50 w/o this patch // so the next write returns ETIMEOUT +2.60 write(4, ..., 1000) = 1000 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04-11-2014 23:20, Eric Dumazet wrote: > On Tue, 2014-11-04 at 18:51 -0200, Marcelo Ricardo Leitner wrote: > >> And thank you guys for all the assistance on it. Btw, would you send me that >> packetdrill script? I'm curious to see how such corner case could be written >> on it. > > One of the script I saw was : > > You might have to adapt preconditions (tcp_rmem[]/tcp_wmem[]) > > 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 > // Set a 10s timeout > +.000 setsockopt(3, SOL_TCP, TCP_USER_TIMEOUT, [10000], 4) = 0 > +.000 bind(3, ..., ...) = 0 > +.000 listen(3, 1) = 0 > +.000 < S 0:0(0) win 32792 <mss 1460,nop,wscale 7> > +.000 > S. 0:0(0) ack 1 <mss 1460,nop,wscale 6> > +.010 < . 1:1(0) ack 1 win 257 > +.000 accept(3, ..., ...) = 4 > +.000 write(4, ..., 1000) = 1000 > +.000 > P. 1:1001(1000) ack 1 > +.625 > P. 1:1001(1000) ack 1 > +.020 < . 1:1(0) ack 1001 win 257 > // Purposely write more after the specified timeout for testing > +11.0 write(4, ..., 1000) = 1000 > +.000 > P. 1001:2001(1000) ack 1 > +1.25 > P. 1001:2001(1000) ack 1 > // socket is killed when the 2nd RTO fires at +2.50 w/o this patch > // so the next write returns ETIMEOUT > +2.60 write(4, ..., 1000) = 1000 Cool, thanks! Marcelo -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Neal Cardwell <ncardwell@google.com> Date: Tue, 4 Nov 2014 15:10:31 -0500 > On Tue, Nov 4, 2014 at 2:15 PM, Marcelo Ricardo Leitner > <mleitner@redhat.com> wrote: > ... >> Therefore, now we clear retrans_stamp as soon as all data during the >> loss window is fully acked. >> >> Reported-by: Ueki Kohei >> Cc: Neal Cardwell <ncardwell@google.com> >> Cc: Yuchung Cheng <ycheng@google.com> >> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> >> --- >> >> Notes: >> v1->v2: fixed compilation issue noticed by Neal >> >> net/ipv4/tcp_input.c | 60 +++++++++++++++++++++++++++------------------------- >> 1 file changed, 31 insertions(+), 29 deletions(-) > > Acked-by: Neal Cardwell <ncardwell@google.com> > Tested-by: Neal Cardwell <ncardwell@google.com> > > Code looks fine, and it passes Yuchung's packetdrill test case for this. > > Thanks for finding and fixing this, Marcelo. Applied, thanks everyone. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index a12b455928e52211efdc6b471ef54de6218f5df0..88fa2d1606859de25419d0d45c3095f6d410d42b 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -2315,6 +2315,35 @@ static inline bool tcp_packet_delayed(const struct tcp_sock *tp) /* Undo procedures. */ +/* We can clear retrans_stamp when there are no retransmissions in the + * window. It would seem that it is trivially available for us in + * tp->retrans_out, however, that kind of assumptions doesn't consider + * what will happen if errors occur when sending retransmission for the + * second time. ...It could the that such segment has only + * TCPCB_EVER_RETRANS set at the present time. It seems that checking + * the head skb is enough except for some reneging corner cases that + * are not worth the effort. + * + * Main reason for all this complexity is the fact that connection dying + * time now depends on the validity of the retrans_stamp, in particular, + * that successive retransmissions of a segment must not advance + * retrans_stamp under any conditions. + */ +static bool tcp_any_retrans_done(const struct sock *sk) +{ + const struct tcp_sock *tp = tcp_sk(sk); + struct sk_buff *skb; + + if (tp->retrans_out) + return true; + + skb = tcp_write_queue_head(sk); + if (unlikely(skb && TCP_SKB_CB(skb)->sacked & TCPCB_EVER_RETRANS)) + return true; + + return false; +} + #if FASTRETRANS_DEBUG > 1 static void DBGUNDO(struct sock *sk, const char *msg) { @@ -2410,6 +2439,8 @@ static bool tcp_try_undo_recovery(struct sock *sk) * is ACKed. For Reno it is MUST to prevent false * fast retransmits (RFC2582). SACK TCP is safe. */ tcp_moderate_cwnd(tp); + if (!tcp_any_retrans_done(sk)) + tp->retrans_stamp = 0; return true; } tcp_set_ca_state(sk, TCP_CA_Open); @@ -2430,35 +2461,6 @@ static bool tcp_try_undo_dsack(struct sock *sk) return false; } -/* We can clear retrans_stamp when there are no retransmissions in the - * window. It would seem that it is trivially available for us in - * tp->retrans_out, however, that kind of assumptions doesn't consider - * what will happen if errors occur when sending retransmission for the - * second time. ...It could the that such segment has only - * TCPCB_EVER_RETRANS set at the present time. It seems that checking - * the head skb is enough except for some reneging corner cases that - * are not worth the effort. - * - * Main reason for all this complexity is the fact that connection dying - * time now depends on the validity of the retrans_stamp, in particular, - * that successive retransmissions of a segment must not advance - * retrans_stamp under any conditions. - */ -static bool tcp_any_retrans_done(const struct sock *sk) -{ - const struct tcp_sock *tp = tcp_sk(sk); - struct sk_buff *skb; - - if (tp->retrans_out) - return true; - - skb = tcp_write_queue_head(sk); - if (unlikely(skb && TCP_SKB_CB(skb)->sacked & TCPCB_EVER_RETRANS)) - return true; - - return false; -} - /* Undo during loss recovery after partial ACK or using F-RTO. */ static bool tcp_try_undo_loss(struct sock *sk, bool frto_undo) {
Ueki Kohei reported that when we are using NewReno with connections that have a very low traffic, we may timeout the connection too early if a second loss occurs after the first one was successfully acked but no data was transfered later. Below is his description of it: When SACK is disabled, and a socket suffers multiple separate TCP retransmissions, that socket's ETIMEDOUT value is calculated from the time of the *first* retransmission instead of the *latest* retransmission. This happens because the tcp_sock's retrans_stamp is set once then never cleared. Take the following connection: Linux remote-machine | | send#1---->(*1)|--------> data#1 --------->| | | | RTO : : | | | ---(*2)|----> data#1(retrans) ---->| | (*3)|<---------- ACK <----------| | | | | : : | : : | : : 16 minutes (or more) : | : : | : : | : : | | | send#2---->(*4)|--------> data#2 --------->| | | | RTO : : | | | ---(*5)|----> data#2(retrans) ---->| | | | | | | RTO*2 : : | | | | | | ETIMEDOUT<----(*6)| | (*1) One data packet sent. (*2) Because no ACK packet is received, the packet is retransmitted. (*3) The ACK packet is received. The transmitted packet is acknowledged. At this point the first "retransmission event" has passed and been recovered from. Any future retransmission is a completely new "event". (*4) After 16 minutes (to correspond with retries2=15), a new data packet is sent. Note: No data is transmitted between (*3) and (*4). The socket's timeout SHOULD be calculated from this point in time, but instead it's calculated from the prior "event" 16 minutes ago. (*5) Because no ACK packet is received, the packet is retransmitted. (*6) At the time of the 2nd retransmission, the socket returns ETIMEDOUT. Therefore, now we clear retrans_stamp as soon as all data during the loss window is fully acked. Reported-by: Ueki Kohei Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> --- Notes: v1->v2: fixed compilation issue noticed by Neal net/ipv4/tcp_input.c | 60 +++++++++++++++++++++++++++------------------------- 1 file changed, 31 insertions(+), 29 deletions(-)