tcp_cubic: enable TCP timestamps

Submitter Lucas Nussbaum
Date March 8, 2011, 8:09 a.m.
Message ID <20110308080926.GA22641@xanadu.blop.info>
Permalink /patch/85933/
State Superseded
Delegated to: David Miller

Comments

Lucas Nussbaum - March 8, 2011, 8:09 a.m.
The Hystart slow start algorithm requires precise RTT delay measurements
to decide when to leave slow start. However, currently, CUBIC doesn't
enable TCP timestamps. This can cause Hystart to mis-estimate the RTT,
and to leave slow start too early, generating bad performance since
convergence to the optimal cwnd is slower.

Timestamps are already used by TCP Illinois, LP, Vegas, Veno and Yeah.

Signed-off-by: Lucas Nussbaum <lucas.nussbaum@loria.fr>
stephen hemminger - March 8, 2011, 6:42 p.m.
On Tue, 8 Mar 2011 09:09:26 +0100
Lucas Nussbaum <lucas.nussbaum@loria.fr> wrote:

> The Hystart slow start algorithm requires precise RTT delay measurements
> to decide when to leave slow start. However, currently, CUBIC doesn't
> enable TCP timestamps. This can cause Hystart to mis-estimate the RTT,
> and to leave slow start too early, generating bad performance since
> convergence to the optimal cwnd is slower.
> 
> Timestamps are already used by TCP Illinois, LP, Vegas, Veno and Yeah.
> 
> Signed-off-by: Lucas Nussbaum <lucas.nussbaum@loria.fr>

Just to explain what RTT_STAMP does. It causes the tcp receive code
to compute the rtt using high resolution clocks rather than just
jiffies. It requires access to ktime_get_real, which means accessing
the clock source. This is cheap for TSC, somewhat expensive for HPET,
and expensive for PIT. I worry that enabling it may hurt regular users
on old desktops. But without enabling RTT_STAMP, packets that
get acked in less than a jiffie (1 - 10 ms) will 

Also I should have used ktime_get rather than ktime_get_real
because real time is altered by NTP and other actions.
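
To illustrate the ktime_get vs ktime_get_real point from user space, here
is a minimal sketch. It assumes the POSIX clocks CLOCK_MONOTONIC and
CLOCK_REALTIME as rough analogues of the two kernel calls; the helper
names clock_now_us() and monotonic_elapsed_us() are made up for this
example and are not kernel functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Read a POSIX clock and return its value in microseconds. */
static int64_t clock_now_us(clockid_t id)
{
    struct timespec ts;
    int rc = clock_gettime(id, &ts);

    assert(rc == 0);
    (void)rc;
    return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/*
 * Elapsed time since start_us, measured on the monotonic clock.
 * CLOCK_MONOTONIC is not stepped when the wall clock is set, so the
 * result cannot go negative. The same subtraction on CLOCK_REALTIME
 * could, if the wall clock were stepped back between the two reads.
 */
static int64_t monotonic_elapsed_us(int64_t start_us)
{
    return clock_now_us(CLOCK_MONOTONIC) - start_us;
}
```

An RTT sample is an interval, so only the difference between two reads
matters; a clock that can be stepped between the send and the ACK is a
poor fit for that.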
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - March 8, 2011, 6:55 p.m.
From: Stephen Hemminger <shemminger@vyatta.com>
Date: Tue, 8 Mar 2011 10:42:11 -0800

> On Tue, 8 Mar 2011 09:09:26 +0100
> Lucas Nussbaum <lucas.nussbaum@loria.fr> wrote:
> 
>> The Hystart slow start algorithm requires precise RTT delay measurements
>> to decide when to leave slow start. However, currently, CUBIC doesn't
>> enable TCP timestamps. This can cause Hystart to mis-estimate the RTT,
>> and to leave slow start too early, generating bad performance since
>> convergence to the optimal cwnd is slower.
>> 
>> Timestamps are already used by TCP Illinois, LP, Vegas, Veno and Yeah.
>> 
>> Signed-off-by: Lucas Nussbaum <lucas.nussbaum@loria.fr>
> 
> Just to explain what RTT_STAMP does. It causes the tcp receive code
> to compute the rtt using high resolution clocks rather than just
> jiffies. It requires access to ktime_get_real, which means accessing
> the clock source. This is cheap for TSC, somewhat expensive for HPET,
> and expensive for PIT. I worry that enabling it may hurt regular users
> on old desktops. But without enabling RTT_STAMP, packets that
> get acked in less than a jiffie (1 - 10 ms) will 
> 
> Also I should have used ktime_get rather than ktime_get_real
> because real time is altered by NTP and other actions.

This also means that whatever testing was originally done on Hystart
is dependent upon whatever value CONFIG_HZ had in the test kernel.
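
A rough sketch of why the tick rate matters, assuming a simplified model
in which a jiffies-based RTT only counts whole ticks that elapsed. The
helpers jiffy_len_us() and quantize_rtt_us() are hypothetical names for
this example; a real measurement is the difference of two tick-counter
reads and can also land one tick higher.

```c
#include <assert.h>
#include <stdint.h>

/* Length of one jiffy in microseconds for a given tick rate (HZ). */
static uint32_t jiffy_len_us(uint32_t hz)
{
    assert(hz > 0 && hz <= 1000000);
    return 1000000 / hz;
}

/*
 * RTT as seen through a tick counter: only whole jiffies are counted,
 * so the remainder below one jiffy is lost.
 */
static uint32_t quantize_rtt_us(uint32_t rtt_us, uint32_t hz)
{
    uint32_t j = jiffy_len_us(hz);

    return (rtt_us / j) * j;
}
```

Under this model an 11 ms RTT reads as 10 ms at HZ=100, 8 ms at HZ=250
and 11 ms at HZ=1000, and anything below one jiffy reads as 0.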
Sangtae Ha - March 8, 2011, 7:15 p.m.
Yes. I remember that CONFIG_HZ was 1000 at that time and the value of
CONFIG_HZ could affect the algorithm.
I don't think HyStart needs this extra RTT_STAMP since we only need a
rough delay estimate.
Let me check HyStart with the latest git and with different CONFIG_HZ values.

Sangtae

Lucas Nussbaum - March 8, 2011, 7:36 p.m.
On 08/03/11 at 14:15 -0500, Sangtae Ha wrote:
> Yes. I remember that CONFIG_HZ was 1000 at that time and the value of
> CONFIG_HZ could affect the algorithm.
> I don't think HyStart needs this extra RTT_STAMP since we only need a
> rough delay estimate.
> Let me check HyStart with the latest git and with different CONFIG_HZ values.

One of the problems I discovered was that ca->delay_min was completely
off if RTT_STAMP is disabled. For example, on a link with an 11 ms RTT, I
could get delay_min = 4 ms. But even with RTT_STAMP, there's the problem
that the code rounds the computed RTT value to a jiffie (in
bictcp_acked()).
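
A minimal user-space sketch of why one bad sample is enough to throw off
a minimum-delay estimate. It is only loosely modelled on the idea of a
delay_min field; struct delay_tracker and both helpers are hypothetical
names, the round-up conversion is an assumption, and HZ=250 in the note
below is picked purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Minimum delay seen so far, in jiffies; 0 means "no sample yet". */
struct delay_tracker {
    uint32_t delay_min;
};

/* Convert microseconds to jiffies, rounding up to the next whole jiffy. */
static uint32_t us_to_jiffies_round_up(uint32_t us, uint32_t hz)
{
    uint32_t us_per_jiffy;

    assert(hz > 0 && hz <= 1000000);
    us_per_jiffy = 1000000 / hz;
    return (us + us_per_jiffy - 1) / us_per_jiffy;
}

/* Feed one RTT sample; the tracker keeps the smallest delay seen. */
static void delay_tracker_sample(struct delay_tracker *t,
                                 uint32_t rtt_us, uint32_t hz)
{
    uint32_t delay = us_to_jiffies_round_up(rtt_us, hz);

    if (delay == 0)
        delay = 1;              /* never store "no delay" */
    if (t->delay_min == 0 || delay < t->delay_min)
        t->delay_min = delay;   /* a low sample sticks from here on */
}
```

With HZ=250, an 11 ms sample gives 3 jiffies. A single 4 ms
mis-measurement then drops delay_min to 1 jiffy, and later correct 11 ms
samples never raise it again, because a minimum filter only moves down.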

My other patch mitigates the performance problem by making it harder for
Hystart to abort slow start. But after a few days working on this issue,
I'm wondering whether Hystart shouldn't be completely rewritten to work
on the usec values, or disabled by default.
David Miller - March 8, 2011, 7:38 p.m.
From: Lucas Nussbaum <lucas.nussbaum@loria.fr>
Date: Tue, 8 Mar 2011 20:36:02 +0100

> My other patch mitigates the performance problem by making it harder for
> Hystart to abort slow start. But after a few days working on this issue,
> I'm wondering whether Hystart shouldn't be completely rewritten to work
> on the usec values, or disabled by default.

I am of the opinion that Hystart, like TCP VEGAS, fundamentally cannot
work.

It's disabled in several major distributions because of the performance
problems you noticed, and this was on my own personal recommendation.

Patch

diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c
index 71d5f2f..3a73509 100644
--- a/net/ipv4/tcp_cubic.c
+++ b/net/ipv4/tcp_cubic.c
@@ -406,6 +406,7 @@  static void bictcp_acked(struct sock *sk, u32 cnt, s32 rtt_us)
 }
 
 static struct tcp_congestion_ops cubictcp = {
+       .flags          = TCP_CONG_RTT_STAMP,
        .init           = bictcp_init,
        .ssthresh       = bictcp_recalc_ssthresh,
        .cong_avoid     = bictcp_cong_avoid,
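
For readers unfamiliar with what a flag like this gates, here is a
user-space mock of the behaviour described in the thread: with the flag
set, the RTT comes from high-resolution timestamps; without it, from the
jiffies difference. Everything prefixed mock_ or MOCK_ is hypothetical,
including the flag's bit value, and this is not the kernel's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flag bit; the value is not taken from kernel headers. */
#define MOCK_CONG_RTT_STAMP 0x2u

struct mock_cong_ops {
    unsigned int flags;
};

/* Send and ACK times for one segment, in both time bases. */
struct mock_ack_times {
    uint64_t sent_ns, acked_ns;           /* high-resolution clock */
    uint32_t sent_jiffies, acked_jiffies; /* tick counter */
};

/* Pick the RTT source depending on whether the flag is set. */
static int64_t mock_rtt_us(const struct mock_cong_ops *ops,
                           const struct mock_ack_times *t, uint32_t hz)
{
    assert(hz > 0 && hz <= 1000000);
    if (ops->flags & MOCK_CONG_RTT_STAMP)
        return (int64_t)((t->acked_ns - t->sent_ns) / 1000);
    /* Tick-based fallback: resolution is one jiffy. */
    return (int64_t)(t->acked_jiffies - t->sent_jiffies) * (1000000 / hz);
}
```

With an ACK arriving 800 us after the send and inside the same tick, the
mock reports 800 us when the flag is set and 0 when it is not.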