diff mbox

[PATCHv3,1/2] tcp: Stalling connections: Fix timeout calculation routine

Message ID 4B1D27F7.9000502@tvk.rwth-aachen.de
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Damian Lukowski Dec. 7, 2009, 4:06 p.m. UTC
This patch fixes a problem in the TCP connection timeout calculation.
Currently, timeout decisions are made on the basis of the current
tcp_time_stamp and retrans_stamp, which is usually set at the first
retransmission.
However, if the retransmission fails in tcp_retransmit_skb(),
retrans_stamp is not updated and remains zero. This leads to wrong
decisions in retransmits_timed_out() if tcp_time_stamp is larger than
the specified timeout, which is very likely.
In this case, the TCP connection dies after the first attempted
(and unsuccessful) retransmission.

With this patch, tcp_skb_cb->when is used instead, when retrans_stamp
is not available.

This bug has been introduced together with retransmits_timed_out()
in 2.6.32, as the number of retransmissions has been used for timeout
decisions before. The corresponding commit was
6fa12c85031485dff38ce550c24f10da23b0adaa.

Thanks to Ilpo Järvinen for code suggestions and Frederic Leroy for
testing.

Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de>
---
 include/net/tcp.h |   10 ++++++++--
 1 files changed, 8 insertions(+), 2 deletions(-)

Comments

Ilpo Järvinen Dec. 7, 2009, 6:50 p.m. UTC | #1
On Mon, 7 Dec 2009, Damian Lukowski wrote:

> This patch fixes a problem in the TCP connection timeout calculation.
> Currently, timeout decisions are made on the basis of the current
> tcp_time_stamp and retrans_stamp, which is usually set at the first
> retransmission.
> However, if the retransmission fails in tcp_retransmit_skb(),
> retrans_stamp is not updated and remains zero. This leads to wrong
> decisions in retransmits_timed_out() if tcp_time_stamp is larger than
> the specified timeout, which is very likely.
> In this case, the TCP connection dies after the first attempted
> (and unsuccessful) retransmission.
> 
> With this patch, tcp_skb_cb->when is used instead, when retrans_stamp
> is not available.
> 
> This bug has been introduced together with retransmits_timed_out()
> in 2.6.32, as the number of retransmissions has been used for timeout
> decisions before. The corresponding commit was
> 6fa12c85031485dff38ce550c24f10da23b0adaa.
> 
> Thanks to Ilpo Järvinen for code suggestions and Frederic Leroy for
> testing.
> 
> Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de>
> ---
>  include/net/tcp.h |   10 ++++++++--
>  1 files changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 03a49c7..842ac4d 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -1263,14 +1263,20 @@ static inline struct sk_buff *tcp_write_queue_prev(struct sock *sk, struct sk_bu
>   * TCP connection after "boundary" unsucessful, exponentially backed-off
>   * retransmissions with an initial RTO of TCP_RTO_MIN.
>   */
> -static inline bool retransmits_timed_out(const struct sock *sk,
> +static inline bool retransmits_timed_out(struct sock *sk,
>  					 unsigned int boundary)
>  {
>  	unsigned int timeout, linear_backoff_thresh;
> +	unsigned int start_ts;
>  
>  	if (!inet_csk(sk)->icsk_retransmits)
>  		return false;
>  
> +	if (unlikely(!tcp_sk(sk)->retrans_stamp))
> +		start_ts = TCP_SKB_CB(tcp_write_queue_head(sk))->when;
> +	else
> +		start_ts = tcp_sk(sk)->retrans_stamp;
> +
>  	linear_backoff_thresh = ilog2(TCP_RTO_MAX/TCP_RTO_MIN);
>  
>  	if (boundary <= linear_backoff_thresh)
> @@ -1279,7 +1285,7 @@ static inline bool retransmits_timed_out(const struct sock *sk,
>  		timeout = ((2 << linear_backoff_thresh) - 1) * TCP_RTO_MIN +
>  			  (boundary - linear_backoff_thresh) * TCP_RTO_MAX;
>  
> -	return (tcp_time_stamp - tcp_sk(sk)->retrans_stamp) >= timeout;
> +	return (tcp_time_stamp - start_ts) >= timeout;
>  }
>  
>  static inline struct sk_buff *tcp_send_head(struct sock *sk)

Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>

Also, this should be added:

Reported-by: Frederic Leroy <fredo@starox.org>

...and I think he has already tested this fix a number of times in 
different forms, so I'd already include his Tested-by: too.
Frederic Leroy Dec. 7, 2009, 7:09 p.m. UTC | #2
Le Mon, 7 Dec 2009 20:50:11 +0200 (EET),
"Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> a écrit :

> On Mon, 7 Dec 2009, Damian Lukowski wrote:
> 
> > This patch fixes a problem in the TCP connection timeout
> > calculation. Currently, timeout decisions are made on the basis of
> > 6fa12c85031485dff38ce550c24f10da23b0adaa.
> [...]
>> >  static inline struct sk_buff *tcp_send_head(struct sock *sk) 
> 
> Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
> 
> Also, this should be added:
> 
> Reported-by: Frederic Leroy <fredo@starox.org>
> 
> ...and I think he has already tested this fix a number of times in 
> different forms, so I'd already include his Tested-by: too.

If you need a bad connection to do some test, don't hesitate !
Thank you guys ;)
David Miller Dec. 9, 2009, 4:53 a.m. UTC | #3
From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Mon, 7 Dec 2009 20:50:11 +0200 (EET)

> Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
> 
> Also, this should be added:
> 
> Reported-by: Frederic Leroy <fredo@starox.org>
> 
> ...and I think he has already tested this fix a number of times in 
> different forms, so I'd already include his Tested-by: too.

Done, applied, and I'll get this to -stable as quickly as I can.

Thanks everyone for all of your hard work!
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 03a49c7..842ac4d 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1263,14 +1263,20 @@  static inline struct sk_buff *tcp_write_queue_prev(struct sock *sk, struct sk_bu
  * TCP connection after "boundary" unsucessful, exponentially backed-off
  * retransmissions with an initial RTO of TCP_RTO_MIN.
  */
-static inline bool retransmits_timed_out(const struct sock *sk,
+static inline bool retransmits_timed_out(struct sock *sk,
 					 unsigned int boundary)
 {
 	unsigned int timeout, linear_backoff_thresh;
+	unsigned int start_ts;
 
 	if (!inet_csk(sk)->icsk_retransmits)
 		return false;
 
+	if (unlikely(!tcp_sk(sk)->retrans_stamp))
+		start_ts = TCP_SKB_CB(tcp_write_queue_head(sk))->when;
+	else
+		start_ts = tcp_sk(sk)->retrans_stamp;
+
 	linear_backoff_thresh = ilog2(TCP_RTO_MAX/TCP_RTO_MIN);
 
 	if (boundary <= linear_backoff_thresh)
@@ -1279,7 +1285,7 @@  static inline bool retransmits_timed_out(const struct sock *sk,
 		timeout = ((2 << linear_backoff_thresh) - 1) * TCP_RTO_MIN +
 			  (boundary - linear_backoff_thresh) * TCP_RTO_MAX;
 
-	return (tcp_time_stamp - tcp_sk(sk)->retrans_stamp) >= timeout;
+	return (tcp_time_stamp - start_ts) >= timeout;
 }
 
 static inline struct sk_buff *tcp_send_head(struct sock *sk)