diff mbox

[net,2/3] tcp: enable xmit timer fix by having TLP use time when RTO should fire

Message ID 20170801025814.31206-3-ncardwell@google.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Neal Cardwell Aug. 1, 2017, 2:58 a.m. UTC
Have tcp_schedule_loss_probe() base the TLP scheduling decision based
on when the RTO *should* fire. This is to enable the upcoming xmit
timer fix in this series, where tcp_schedule_loss_probe() cannot
assume that the last timer installed was an RTO timer (because we are
no longer doing the "rearm RTO, rearm RTO, rearm TLP" dance on every
ACK). So tcp_schedule_loss_probe() must independently figure out when
an RTO would want to fire.

In the new TLP implementation following in this series, we cannot
assume that icsk_timeout was set based on an RTO; after processing a
cumulative ACK the icsk_timeout we see can be from a previous TLP or
RTO. So we need to independently recalculate the RTO time (instead of
reading it out of icsk_timeout). Removing this dependency on the
nature of icsk_timeout makes things a little easier to reason about
anyway.

Note that the old and new code should be equivalent, since they are
both saying: "if the RTO is in the future, but at an earlier time than
the normal TLP time, then set the TLP timer to fire when the RTO would
have fired".

Fixes: 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
---
 net/ipv4/tcp_output.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

Comments

Eric Dumazet Aug. 1, 2017, 7:22 a.m. UTC | #1
On Mon, 2017-07-31 at 22:58 -0400, Neal Cardwell wrote:
> Have tcp_schedule_loss_probe() base the TLP scheduling decision based
> on when the RTO *should* fire. This is to enable the upcoming xmit
> timer fix in this series, where tcp_schedule_loss_probe() cannot
> assume that the last timer installed was an RTO timer (because we are
> no longer doing the "rearm RTO, rearm RTO, rearm TLP" dance on every
> ACK). So tcp_schedule_loss_probe() must independently figure out when
> an RTO would want to fire.
> 
> In the new TLP implementation following in this series, we cannot
> assume that icsk_timeout was set based on an RTO; after processing a
> cumulative ACK the icsk_timeout we see can be from a previous TLP or
> RTO. So we need to independently recalculate the RTO time (instead of
> reading it out of icsk_timeout). Removing this dependency on the
> nature of icsk_timeout makes things a little easier to reason about
> anyway.
> 
> Note that the old and new code should be equivalent, since they are
> both saying: "if the RTO is in the future, but at an earlier time than
> the normal TLP time, then set the TLP timer to fire when the RTO would
> have fired".
> 
> Fixes: 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)")
> Signed-off-by: Neal Cardwell <ncardwell@google.com>
> Signed-off-by: Yuchung Cheng <ycheng@google.com>
> Signed-off-by: Nandita Dukkipati <nanditad@google.com>
> ---
>  net/ipv4/tcp_output.c | 12 ++++--------
>  1 file changed, 4 insertions(+), 8 deletions(-)
> 
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 2f1588bf73da..0ae6b5d176c0 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2377,8 +2377,8 @@ bool tcp_schedule_loss_probe(struct sock *sk)
>  {
>  	struct inet_connection_sock *icsk = inet_csk(sk);
>  	struct tcp_sock *tp = tcp_sk(sk);
> -	u32 timeout, tlp_time_stamp, rto_time_stamp;
>  	u32 rtt = usecs_to_jiffies(tp->srtt_us >> 3);
> +	u32 timeout, rto_delta_us;
>  
>  	/* No consecutive loss probes. */
>  	if (WARN_ON(icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)) {
> @@ -2418,13 +2418,9 @@ bool tcp_schedule_loss_probe(struct sock *sk)
>  	timeout = max_t(u32, timeout, msecs_to_jiffies(10));
>  
>  	/* If RTO is shorter, just schedule TLP in its place. */

I have hard time to read this comment.

We are here trying to arm a timer based on TLP.

If RTO is shorter, we'll arm the timer based on RTO instead of TLP.

Is "If RTO is shorter, just schedule TLP in its place." really correct ?

I suggest we reword the comment or simply get rid of it now the code is
more obvious.

> -	tlp_time_stamp = tcp_jiffies32 + timeout;
> -	rto_time_stamp = (u32)inet_csk(sk)->icsk_timeout;
> -	if ((s32)(tlp_time_stamp - rto_time_stamp) > 0) {
> -		s32 delta = rto_time_stamp - tcp_jiffies32;
> -		if (delta > 0)
> -			timeout = delta;
> -	}
> +	rto_delta_us = tcp_rto_delta_us(sk);  /* How far in future is RTO? */
> +	if (rto_delta_us > 0)
> +		timeout = min_t(u32, timeout, usecs_to_jiffies(rto_delta_us));
>  
>  	inet_csk_reset_xmit_timer(sk, ICSK_TIME_LOSS_PROBE, timeout,
>  				  TCP_RTO_MAX);

Acked-by: Eric Dumazet <edumazet@google.com>
Neal Cardwell Aug. 1, 2017, 2:35 p.m. UTC | #2
On Tue, Aug 1, 2017 at 3:22 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Mon, 2017-07-31 at 22:58 -0400, Neal Cardwell wrote:
>> @@ -2418,13 +2418,9 @@ bool tcp_schedule_loss_probe(struct sock *sk)
>>       timeout = max_t(u32, timeout, msecs_to_jiffies(10));
>>
>>       /* If RTO is shorter, just schedule TLP in its place. */
>
> I have hard time to read this comment.
>
> We are here trying to arm a timer based on TLP.
>
> If RTO is shorter, we'll arm the timer based on RTO instead of TLP.
>
> Is "If RTO is shorter, just schedule TLP in its place." really correct ?
>
> I suggest we reword the comment or simply get rid of it now the code is
> more obvious.

OK, how about:

  /* If the RTO formula yields an earlier time, then use that time. */

We can also add a reference to the RACK/TLP Internet Draft at the top
of tcp_schedule_loss_probe().

Whatever wording we decide on, I am happy to send a patch for net-next
once this fix is merged into net-next.

neal
Eric Dumazet Aug. 1, 2017, 8:40 p.m. UTC | #3
On Tue, 2017-08-01 at 10:35 -0400, Neal Cardwell wrote:
> On Tue, Aug 1, 2017 at 3:22 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Mon, 2017-07-31 at 22:58 -0400, Neal Cardwell wrote:
> >> @@ -2418,13 +2418,9 @@ bool tcp_schedule_loss_probe(struct sock *sk)
> >>       timeout = max_t(u32, timeout, msecs_to_jiffies(10));
> >>
> >>       /* If RTO is shorter, just schedule TLP in its place. */
> >
> > I have hard time to read this comment.
> >
> > We are here trying to arm a timer based on TLP.
> >
> > If RTO is shorter, we'll arm the timer based on RTO instead of TLP.
> >
> > Is "If RTO is shorter, just schedule TLP in its place." really correct ?
> >
> > I suggest we reword the comment or simply get rid of it now the code is
> > more obvious.
> 
> OK, how about:
> 
>   /* If the RTO formula yields an earlier time, then use that time. */
> 

Sounds better :)

> We can also add a reference to the RACK/TLP Internet Draft at the top
> of tcp_schedule_loss_probe().
> 
> Whatever wording we decide on, I am happy to send a patch for net-next
> once this fix is merged into net-next.

Sure.
diff mbox

Patch

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 2f1588bf73da..0ae6b5d176c0 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2377,8 +2377,8 @@  bool tcp_schedule_loss_probe(struct sock *sk)
 {
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	struct tcp_sock *tp = tcp_sk(sk);
-	u32 timeout, tlp_time_stamp, rto_time_stamp;
 	u32 rtt = usecs_to_jiffies(tp->srtt_us >> 3);
+	u32 timeout, rto_delta_us;
 
 	/* No consecutive loss probes. */
 	if (WARN_ON(icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)) {
@@ -2418,13 +2418,9 @@  bool tcp_schedule_loss_probe(struct sock *sk)
 	timeout = max_t(u32, timeout, msecs_to_jiffies(10));
 
 	/* If RTO is shorter, just schedule TLP in its place. */
-	tlp_time_stamp = tcp_jiffies32 + timeout;
-	rto_time_stamp = (u32)inet_csk(sk)->icsk_timeout;
-	if ((s32)(tlp_time_stamp - rto_time_stamp) > 0) {
-		s32 delta = rto_time_stamp - tcp_jiffies32;
-		if (delta > 0)
-			timeout = delta;
-	}
+	rto_delta_us = tcp_rto_delta_us(sk);  /* How far in future is RTO? */
+	if (rto_delta_us > 0)
+		timeout = min_t(u32, timeout, usecs_to_jiffies(rto_delta_us));
 
 	inet_csk_reset_xmit_timer(sk, ICSK_TIME_LOSS_PROBE, timeout,
 				  TCP_RTO_MAX);