diff mbox

tcp: fix tcp_rcv_rtt_update() use of an unscaled RTT sample

Message ID 1334080760-968-1-git-send-email-ncardwell@google.com
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Neal Cardwell April 10, 2012, 5:59 p.m. UTC
Fix a code path in tcp_rcv_rtt_update() that was comparing scaled and
unscaled RTT samples.

The intent in the code was to only use the 'm' measurement if it was a
new minimum.  However, since 'm' had not yet been shifted left 3 bits
but 'new_sample' had, this comparison would nearly always succeed,
leading us to erroneously set our receive-side RTT estimate to the 'm'
sample when that sample could be nearly 8x too high to use.

The overall effect is to often cause the receive-side RTT estimate to
be significantly too large (up to 40% too large for brief periods in
my tests).

Signed-off-by: Neal Cardwell <ncardwell@google.com>
---
 net/ipv4/tcp_input.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

Comments

Eric Dumazet April 10, 2012, 6:45 p.m. UTC | #1
On Tue, Apr 10, 2012 at 7:59 PM, Neal Cardwell <ncardwell@google.com> wrote:
> Fix a code path in tcp_rcv_rtt_update() that was comparing scaled and
> unscaled RTT samples.
>
> The intent in the code was to only use the 'm' measurement if it was a
> new minimum.  However, since 'm' had not yet been shifted left 3 bits
> but 'new_sample' had, this comparison would nearly always succeed,
> leading us to erroneously set our receive-side RTT estimate to the 'm'
> sample when that sample could be nearly 8x too high to use.
>
> The overall effect is to often cause the receive-side RTT estimate to
> be significantly too large (up to 40% too large for brief periods in
> my tests).
>
> Signed-off-by: Neal Cardwell <ncardwell@google.com>
> ---
>  net/ipv4/tcp_input.c |    7 +++++--
>  1 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index e886e2f..e7b54d2 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -474,8 +474,11 @@ static void tcp_rcv_rtt_update(struct tcp_sock *tp, u32 sample, int win_dep)
>                if (!win_dep) {
>                        m -= (new_sample >> 3);
>                        new_sample += m;
> -               } else if (m < new_sample)
> -                       new_sample = m << 3;
> +               } else {
> +                       m <<= 3;
> +                       if (m < new_sample)
> +                               new_sample = m;
> +               }
>        } else {
>                /* No previous measure. */
>                new_sample = m << 3;
> --
> 1.7.7.3
>

Acked-by: Eric Dumazet <edumazet@google.com>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller April 10, 2012, 6:47 p.m. UTC | #2
From: Neal Cardwell <ncardwell@google.com>
Date: Tue, 10 Apr 2012 13:59:20 -0400

> Fix a code path in tcp_rcv_rtt_update() that was comparing scaled and
> unscaled RTT samples.
> 
> The intent in the code was to only use the 'm' measurement if it was a
> new minimum.  However, since 'm' had not yet been shifted left 3 bits
> but 'new_sample' had, this comparison would nearly always succeed,
> leading us to erroneously set our receive-side RTT estimate to the 'm'
> sample when that sample could be nearly 8x too high to use.
> 
> The overall effect is to often cause the receive-side RTT estimate to
> be significantly too large (up to 40% too large for brief periods in
> my tests).
> 
> Signed-off-by: Neal Cardwell <ncardwell@google.com>

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Taht April 10, 2012, 8:46 p.m. UTC | #3
On Tue, Apr 10, 2012 at 11:47 AM, David Miller <davem@davemloft.net> wrote:
> From: Neal Cardwell <ncardwell@google.com>
> Date: Tue, 10 Apr 2012 13:59:20 -0400
>
>> Fix a code path in tcp_rcv_rtt_update() that was comparing scaled and
>> unscaled RTT samples.
>>
>> The intent in the code was to only use the 'm' measurement if it was a
>> new minimum.  However, since 'm' had not yet been shifted left 3 bits
>> but 'new_sample' had, this comparison would nearly always succeed,
>> leading us to erroneously set our receive-side RTT estimate to the 'm'
>> sample when that sample could be nearly 8x too high to use.
>>
>> The overall effect is to often cause the receive-side RTT estimate to
>> be significantly too large (up to 40% too large for brief periods in
>> my tests).
>>
>> Signed-off-by: Neal Cardwell <ncardwell@google.com>
>
> Applied, thanks.
> --


awesome, is this needed in stable?
David Miller April 10, 2012, 9:07 p.m. UTC | #4
From: Dave Taht <dave.taht@gmail.com>
Date: Tue, 10 Apr 2012 13:46:38 -0700

> awesome, is this needed in stable?

I have it queued up there already.

But you didn't need to write your email at all, you could
have checked this yourself:

http://patchwork.ozlabs.org/user/bundle/2566/?state=*
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet April 10, 2012, 9:11 p.m. UTC | #5
On Tue, 2012-04-10 at 17:07 -0400, David Miller wrote:
> From: Dave Taht <dave.taht@gmail.com>
> Date: Tue, 10 Apr 2012 13:46:38 -0700
> 
> > awesome, is this needed in stable?
> 
> I have it queued up there already.
> 
> But you didn't need to write your email at all, you could
> have checked this yourself:
> 
> http://patchwork.ozlabs.org/user/bundle/2566/?state=*

Not sure this URL works (it doesnt for me)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller April 10, 2012, 9:16 p.m. UTC | #6
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 10 Apr 2012 23:11:34 +0200

> On Tue, 2012-04-10 at 17:07 -0400, David Miller wrote:
>> From: Dave Taht <dave.taht@gmail.com>
>> Date: Tue, 10 Apr 2012 13:46:38 -0700
>> 
>> > awesome, is this needed in stable?
>> 
>> I have it queued up there already.
>> 
>> But you didn't need to write your email at all, you could
>> have checked this yourself:
>> 
>> http://patchwork.ozlabs.org/user/bundle/2566/?state=*
> 
> Not sure this URL works (it doesnt for me)

Sorry, that's my private URL and only works if you're logged
in as me :-)  This one is better:

http://patchwork.ozlabs.org/bundle/davem/stable/?state=*
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet April 10, 2012, 9:22 p.m. UTC | #7
On Tue, 2012-04-10 at 17:16 -0400, David Miller wrote:

> Sorry, that's my private URL and only works if you're logged
> in as me :-)  This one is better:
> 
> http://patchwork.ozlabs.org/bundle/davem/stable/?state=*

Thanks, added to my bookmarks ;)


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Taht April 10, 2012, 9:24 p.m. UTC | #8
On Tue, Apr 10, 2012 at 2:22 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2012-04-10 at 17:16 -0400, David Miller wrote:
>
>> Sorry, that's my private URL and only works if you're logged
>> in as me :-)  This one is better:
>>
>> http://patchwork.ozlabs.org/bundle/davem/stable/?state=*
>
> Thanks, added to my bookmarks ;)

Mine too! I had no idea. Useful.

Is there an emacs interface? :)

>
>
diff mbox

Patch

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index e886e2f..e7b54d2 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -474,8 +474,11 @@  static void tcp_rcv_rtt_update(struct tcp_sock *tp, u32 sample, int win_dep)
 		if (!win_dep) {
 			m -= (new_sample >> 3);
 			new_sample += m;
-		} else if (m < new_sample)
-			new_sample = m << 3;
+		} else {
+			m <<= 3;
+			if (m < new_sample)
+				new_sample = m;
+		}
 	} else {
 		/* No previous measure. */
 		new_sample = m << 3;