[1/1] TCP: increase default initial receive window.

Message ID 1292642451-892-1-git-send-email-nanditad@google.com
State Rejected, archived
Delegated to: David Miller

Commit Message

Nandita Dukkipati Dec. 18, 2010, 3:20 a.m. UTC
This patch changes the default initial receive window to 10 mss
(defined constant). The default window is limited to the maximum
of 10*1460 and 2*mss (when mss > 1460).

Signed-off-by: Nandita Dukkipati <nanditad@google.com>
---
 include/net/tcp.h     |    3 +++
 net/ipv4/tcp_output.c |   11 ++++++++---
 2 files changed, 11 insertions(+), 3 deletions(-)

Comments

stephen hemminger Dec. 18, 2010, 3:59 a.m. UTC | #1
On Fri, 17 Dec 2010 19:20:51 -0800
Nandita Dukkipati <nanditad@google.com> wrote:

> This patch changes the default initial receive window to 10 mss
> (defined constant). The default window is limited to the maximum
> of 10*1460 and 2*mss (when mss > 1460).
> 
> Signed-off-by: Nandita Dukkipati <nanditad@google.com>

This needs way more discussion because it makes Linux non-RFC compliant.
Breaking the RFC behavior should require an explicit user override.
I also wonder if the magic value 10 should be a sysctl.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Dec. 18, 2010, 5:13 a.m. UTC | #2
From: Nandita Dukkipati <nanditad@google.com>
Date: Fri, 17 Dec 2010 19:20:51 -0800

> This patch changes the default initial receive window to 10 mss
> (defined constant). The default window is limited to the maximum
> of 10*1460 and 2*mss (when mss > 1460).
> 
> Signed-off-by: Nandita Dukkipati <nanditad@google.com>

That's an incredibly terse explanation for a very non-trivial
change with very non-trivial implications.

What analysis have you performed to lead you to decide that this
was a reasonable change to make?  Where can people see that
analysis and look over it to see if they agree with your
assessment of the data?

We can't apply a patch like that without any form of analysis or
reasoning.

You don't say "why" you're doing this, and frankly that really
ticks me off.
David Miller Dec. 18, 2010, 5:14 a.m. UTC | #3
From: Stephen Hemminger <shemminger@vyatta.com>
Date: Fri, 17 Dec 2010 19:59:17 -0800

> On Fri, 17 Dec 2010 19:20:51 -0800
> Nandita Dukkipati <nanditad@google.com> wrote:
> 
>> This patch changes the default initial receive window to 10 mss
>> (defined constant). The default window is limited to the maximum
>> of 10*1460 and 2*mss (when mss > 1460).
>> 
>> Signed-off-by: Nandita Dukkipati <nanditad@google.com>
> 
> This needs way more discussion because it makes Linux non-RFC compliant.
> Breaking the RFC behavior should require an explicit user override.
> I also wonder if the magic value 10 should be a sysctl.

He's changing the default receive window, not the transmit
congestion window, so there is nothing RFC about this.

But yes, he does have to explain why this change is
being made, in exhaustive detail.
Nandita Dukkipati Dec. 18, 2010, 9:08 a.m. UTC | #4
resend in plain text.

On Fri, Dec 17, 2010 at 9:13 PM, David Miller <davem@davemloft.net> wrote:
> From: Nandita Dukkipati <nanditad@google.com>
> Date: Fri, 17 Dec 2010 19:20:51 -0800
>
>> This patch changes the default initial receive window to 10 mss
>> (defined constant). The default window is limited to the maximum
>> of 10*1460 and 2*mss (when mss > 1460).
>>
>> Signed-off-by: Nandita Dukkipati <nanditad@google.com>
>
> That's an incredibly terse explanation for a very non-trivial
> change with very non-trivial implications.
>
> What analysis have you performed to lead you to decide that this
> was a reasonable change to make?  Where can people see that
> analysis and look over it to see if they agree with your
> assessment of the data?

Apologies for the terse comment. Here's a longer explanation.

Background:
A recent proposal to the IETF [Ref: 5] recommends increasing TCP's
initial congestion window to 10 mss or about 15KB. This proposal,
backed with data from several large-scale live experiments as well as
controlled testbed experiments, is under active discussion in the TCPM
working group.

Analysis performed:
Leading up to this proposal were several large-scale Internet
experiments [Ref: 2] with an initial congestion window of 10 mss
(IW10), where we showed that the average latency of HTTP responses
improved by approximately 10%. This was accompanied by a slight
increase in retransmission rate (0.5%), most of which is coming from
applications opening multiple simultaneous connections. To understand
the extreme worst case scenarios, as well as fairness issues with IW10
versus IW3 traffic, we further conducted controlled testbed experiments
(end-hosts are all Linux based). We came away finding minimal negative
impact even under low link bandwidths (dial-ups) and small buffers
[Ref: 3]. These results are extremely encouraging for adopting IW10.

But obviously, an initial congestion window of 10 mss is useless
unless a TCP receiver advertises an initial receive window of at least
10 mss. Fortunately, in the large-scale Internet experiments we found
that most of the operating systems advertised a large enough initial
receive window, allowing us to experiment with various values of
initial congestion windows. Linux systems were among the few
exceptions that advertised a small receive window. This patch intends
to fix that.
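
To illustrate the point above: a sender can keep at most min(cwnd, rwnd)
bytes in flight, so a small advertised receive window caps an IW10 sender
no matter how large its congestion window is. The following is a minimal
sketch, not kernel code. The function name usable_window and the
3-segment receiver window used in the comments are assumptions chosen
for illustration.

```c
#include <stdint.h>

/* Bytes a sender may have in flight: bounded by its own congestion
 * window and by the window the receiver advertised.
 * (Hypothetical helper, not a kernel function.)
 */
static uint32_t usable_window(uint32_t cwnd_bytes, uint32_t rwnd_bytes)
{
	return cwnd_bytes < rwnd_bytes ? cwnd_bytes : rwnd_bytes;
}

/* Example with mss = 1460:
 *   IW10 sender, receiver advertising 3 segments (assumed):
 *     usable_window(10 * 1460, 3 * 1460)  -> 4380 bytes
 *   IW10 sender, receiver advertising 10 segments:
 *     usable_window(10 * 1460, 10 * 1460) -> 14600 bytes
 */
```

In the first case the sender's larger initial congestion window has no
effect during the first round trip.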

References:

1. This site has a comprehensive list of all IW10 references to date.
http://code.google.com/speed/protocols/tcpm-IW10.html

2. Paper describing results from large-scale Internet experiments with IW10.
http://ccr.sigcomm.org/drupal/?q=node/621

3. Controlled testbed experiments with IW10 under worst case scenarios
http://www.ietf.org/proceedings/79/slides/tcpm-0.pdf

4. Raw test data from testbed experiments (Linux senders/receivers)
with initial congestion window and initial receive window of 10 mss.
http://research.csc.ncsu.edu/netsrv/?q=content/iw10

5. Internet-Draft. Increasing TCP's Initial Window.
https://datatracker.ietf.org/doc/draft-ietf-tcpm-initcwnd/
stephen hemminger Dec. 20, 2010, 5:03 p.m. UTC | #5
On Sat, 18 Dec 2010 01:08:33 -0800
Nandita Dukkipati <nanditad@google.com> wrote:

> resend in plain text.
> 
> On Fri, Dec 17, 2010 at 9:13 PM, David Miller <davem@davemloft.net> wrote:
> > From: Nandita Dukkipati <nanditad@google.com>
> > Date: Fri, 17 Dec 2010 19:20:51 -0800
> >
> >> This patch changes the default initial receive window to 10 mss
> >> (defined constant). The default window is limited to the maximum
> >> of 10*1460 and 2*mss (when mss > 1460).
> >>
> >> Signed-off-by: Nandita Dukkipati <nanditad@google.com>
> >
> > That's an incredibly terse explanation for a very non-trivial
> > change with very non-trivial implications.
> >
> > What analysis have you performed to lead you to decide that this
> > was a reasonable change to make?  Where can people see that
> > analysis and look over it to see if they agree with your
> > assessment of the data?
> 
> Apologies for the terse comment. Here's a longer explanation.
> 
> Background:
> A recent proposal to the IETF [Ref: 5] recommends increasing TCP's
> initial congestion window to 10 mss or about 15KB. This proposal,
> backed with data from several large-scale live experiments as well as
> controlled testbed experiments, is under active discussion in the TCPM
> working group.
> 
> Analysis performed:
> Leading up to this proposal were several large-scale Internet
> experiments [Ref: 2] with an initial congestion window of 10 mss
> (IW10), where we showed that the average latency of HTTP responses
> improved by approximately 10%. This was accompanied by a slight
> increase in retransmission rate (0.5%), most of which is coming from
> applications opening multiple simultaneous connections. To understand
> the extreme worst case scenarios, as well as fairness issues with IW10
> versus IW3 traffic, we further conducted controlled testbed experiments
> (end-hosts are all Linux based). We came away finding minimal negative
> impact even under low link bandwidths (dial-ups) and small buffers
> [Ref: 3]. These results are extremely encouraging for adopting IW10.
> 
> But obviously, an initial congestion window of 10 mss is useless
> unless a TCP receiver advertises an initial receive window of at least
> 10 mss. Fortunately, in the large-scale Internet experiments we found
> that most of the operating systems advertised a large enough initial
> receive window, allowing us to experiment with various values of
> initial congestion windows. Linux systems were among the few
> exceptions that advertised a small receive window. This patch intends
> to fix that.
> 
> References:
> 
> 1. This site has a comprehensive list of all IW10 references to date.
> http://code.google.com/speed/protocols/tcpm-IW10.html
> 
> 2. Paper describing results from large-scale Internet experiments with IW10.
> http://ccr.sigcomm.org/drupal/?q=node/621
> 
> 3. Controlled testbed experiments with IW10 under worst case scenarios
> http://www.ietf.org/proceedings/79/slides/tcpm-0.pdf
> 
> 4. Raw test data from testbed experiments (Linux senders/receivers)
> with initial congestion window and initial receive window of 10 mss.
> http://research.csc.ncsu.edu/netsrv/?q=content/iw10
> 
> 5. Internet-Draft. Increasing TCP's Initial Window.
> https://datatracker.ietf.org/doc/draft-ietf-tcpm-initcwnd/

Agree this is a good idea, but some further notes:
  * The control of receive window is a local function not covered by
    RFC.
  * Linux manipulates receive window automatically, unlike some other
    implementations.

But any change to TCP risks breaking other broken implementations
and users need a good way to recover. Therefore I recommend this
be made a sysctl to allow for quick workaround for the user who has
to connect to some Elbonian printer.  Doing it per route is okay,
but for the worst case, it needs to be a sysctl.

The default value of the sysctl should be your new value (10),
and it should fall back to the old RFC behavior if set to zero.
Rick Jones Dec. 20, 2010, 6:26 p.m. UTC | #6
> Agree this is a good idea, but some further notes:
>   * The control of receive window is a local function not covered by
>     RFC.
>   * Linux manipulates receive window automatically, unlike some other
>     implementations.
> 
> But any change to TCP risks breaking other broken implementations
> and users need a good way to recover. 

Always good to be careful, but break in what way?  Many stacks have been 
advertising an initial receive window of well above 10 segments going back 
literally decades.

HP-UX systems have been advertising a default/initial receive window of 32768
bytes since the mid 1990s, Solaris systems have been advertising a default
receive window of 49152 for ages.  I cannot speak to Windows' default advertised
window.  While that sounds a bit like "But MOM! All my friends are doing it." it
does seem to suggest that advertising an initial receive window of 10 segments
is unlikely to uncover anything new.

rick jones
David Miller Dec. 20, 2010, 6:48 p.m. UTC | #7
From: Rick Jones <rick.jones2@hp.com>
Date: Mon, 20 Dec 2010 10:26:20 -0800

>> Agree this is a good idea, but some further notes:
>>   * The control of receive window is a local function not covered by
>>     RFC.
>>   * Linux manipulates receive window automatically, unlike some other
>>     implementations.
>> But any change to TCP risks breaking other broken implementations
>> and users need a good way to recover. 
> 
> Always good to be careful, but break in what way?  Many stacks have
> been advertising an initial receive window of well above 10 segments
> going back literally decades.

Agreed.
John Heffner Dec. 21, 2010, 6:27 p.m. UTC | #8
I know this has already been applied, but one thing to think about:
Linux announces a small initial window to prevent overflowing the
receive buffer when receiving segments smaller than the link MTU.
Increasing this even to 10 segments might have some negative
consequences.  I recall, for instance, some drivers when configured
with a 9000 byte MTU, have a single pool of receive buffers all 16k
(the next highest power of 2).  So each received segment will get 16k
of allocated memory accounted to it, even if the incoming segments are
<=1460 bytes long.  The default initial rcvbuf of 87380 bytes is less
than the 160k of memory that the initial window might consume, so
we're going to start hitting the very slow path of coalescing segments
to get back under memory bounds.
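
The arithmetic behind this concern can be sketched as follows. This is a
simplified model that assumes every received segment is charged one full
16k driver buffer regardless of payload size. Real kernel accounting
includes additional per-buffer overhead, and the helper names are
hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Memory charged to the socket when each segment occupies one
 * fixed-size driver buffer, independent of how many payload bytes
 * the segment actually carries.
 */
static size_t charged_bytes(size_t segments, size_t buf_size)
{
	return segments * buf_size;
}

/* True if the charged memory exceeds the socket's receive buffer. */
static bool exceeds_rcvbuf(size_t segments, size_t buf_size, size_t rcvbuf)
{
	return charged_bytes(segments, buf_size) > rcvbuf;
}

/* With the numbers from the message above:
 *   charged_bytes(10, 16384)         -> 163840 (the "160k")
 *   exceeds_rcvbuf(10, 16384, 87380) -> true
 *   exceeds_rcvbuf(3, 16384, 87380)  -> false (49152 bytes)
 */
```

Under this model the 87380-byte default is exceeded from the sixth
16k buffer onward (6 * 16384 = 98304).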

Some drivers are smarter about having multiple pools of receive
buffers with different sizes, so it might not be so easy to hit this
condition.  I haven't looked at any of them for a while.  Is this
still a real concern?

Thanks,
  -John


On Fri, Dec 17, 2010 at 10:20 PM, Nandita Dukkipati <nanditad@google.com> wrote:
> This patch changes the default initial receive window to 10 mss
> (defined constant). The default window is limited to the maximum
> of 10*1460 and 2*mss (when mss > 1460).
>
> Signed-off-by: Nandita Dukkipati <nanditad@google.com>
> ---
>  include/net/tcp.h     |    3 +++
>  net/ipv4/tcp_output.c |   11 ++++++++---
>  2 files changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 2ab6c9c..6c25ba8 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -60,6 +60,9 @@ extern void tcp_time_wait(struct sock *sk, int state, int timeo);
>  */
>  #define MAX_TCP_WINDOW         32767U
>
> +/* Offer an initial receive window of 10 mss. */
> +#define TCP_DEFAULT_INIT_RCVWND        10
> +
>  /* Minimal accepted MSS. It is (60+60+8) - (20+20). */
>  #define TCP_MIN_MSS            88U
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 2d39066..dc7c096 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -228,10 +228,15 @@ void tcp_select_initial_window(int __space, __u32 mss,
>                }
>        }
>
> -       /* Set initial window to value enough for senders, following RFC5681. */
> +       /* Set initial window to a value enough for senders starting with
> +        * initial congestion window of TCP_DEFAULT_INIT_RCVWND. Place
> +        * a limit on the initial window when mss is larger than 1460.
> +        */
>        if (mss > (1 << *rcv_wscale)) {
> -               int init_cwnd = rfc3390_bytes_to_packets(mss);
> -
> +               int init_cwnd = TCP_DEFAULT_INIT_RCVWND;
> +               if (mss > 1460)
> +                       init_cwnd =
> +                       max_t(u32, (1460 * TCP_DEFAULT_INIT_RCVWND) / mss, 2);
>                /* when initializing use the value from init_rcv_wnd
>                 * rather than the default from above
>                 */
> --
> 1.7.3.1
Eric Dumazet Dec. 21, 2010, 6:49 p.m. UTC | #9
On Tuesday, 21 December 2010 at 13:27 -0500, John Heffner wrote:
> I know this has already been applied, but one thing to think about:
> Linux announces a small initial window to prevent overflowing the
> receive buffer when receiving segments smaller than the link MTU.

Overflowing the receive buffer? Which one? Do you mean the NIC RX ring
buffer?

> Increasing this even to 10 segments might have some negative
> consequences.  I recall, for instance, some drivers when configured
> with a 9000 byte MTU, have a single pool of receive buffers all 16k
> (the next highest power of 2).  So each received segment will get 16k
> of allocated memory accounted to it, even if the incoming segments are
> <=1460 bytes long.  The default initial rcvbuf of 87380 bytes is less
> than the 160k of memory that the initial window might consume, so
> we're going to start hitting the very slow path of coalescing segments
> to get back under memory bounds.

The patch does not allow 87380 bytes, but 10 segments, limited to 14600
bytes. It's very conservative IMHO.

> 
> Some drivers are smarter about having multiple pools of receive
> buffers with different sizes, so it might not be so easy to hit this
> condition.  I haven't looked at any of them for a while.  Is this
> still a real concern?

I don't think so. You would have the problem anyway, since the patch
changes only the _initial_ receive window. After some kbytes of data
have been exchanged, the window is probably larger.
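
As a worked example of the limit discussed in this thread, the sketch
below mirrors the segment-count computation from the patch's
tcp_output.c hunk in user-space C. The helper names
init_rcvwnd_segments and init_rcvwnd_bytes are made up for
illustration, and the sketch ignores the surrounding checks that
tcp_select_initial_window performs before and after this step.

```c
#include <stdint.h>

/* Same constant the patch adds to include/net/tcp.h. */
#define TCP_DEFAULT_INIT_RCVWND 10

/* Initial receive window in segments: 10 by default; for mss > 1460
 * scale down so the window stays near 10 * 1460 bytes, but never go
 * below 2 segments. (Hypothetical helper mirroring the patch logic.)
 */
static uint32_t init_rcvwnd_segments(uint32_t mss)
{
	uint32_t segs = TCP_DEFAULT_INIT_RCVWND;

	if (mss > 1460) {
		segs = (1460 * TCP_DEFAULT_INIT_RCVWND) / mss;
		if (segs < 2)
			segs = 2;
	}
	return segs;
}

/* The same window expressed in bytes. */
static uint32_t init_rcvwnd_bytes(uint32_t mss)
{
	return init_rcvwnd_segments(mss) * mss;
}
```

With mss = 1460 this yields 10 segments (14600 bytes). With a
jumbo-frame mss of 8960 the quotient 14600 / 8960 is 1, so the floor of
2 segments applies and the window is 17920 bytes, which matches the
"maximum of 10*1460 and 2*mss" wording in the commit message.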



Patch

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 2ab6c9c..6c25ba8 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -60,6 +60,9 @@  extern void tcp_time_wait(struct sock *sk, int state, int timeo);
  */
 #define MAX_TCP_WINDOW		32767U
 
+/* Offer an initial receive window of 10 mss. */
+#define TCP_DEFAULT_INIT_RCVWND	10
+
 /* Minimal accepted MSS. It is (60+60+8) - (20+20). */
 #define TCP_MIN_MSS		88U
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 2d39066..dc7c096 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -228,10 +228,15 @@  void tcp_select_initial_window(int __space, __u32 mss,
 		}
 	}
 
-	/* Set initial window to value enough for senders, following RFC5681. */
+	/* Set initial window to a value enough for senders starting with
+	 * initial congestion window of TCP_DEFAULT_INIT_RCVWND. Place
+	 * a limit on the initial window when mss is larger than 1460.
+	 */
 	if (mss > (1 << *rcv_wscale)) {
-		int init_cwnd = rfc3390_bytes_to_packets(mss);
-
+		int init_cwnd = TCP_DEFAULT_INIT_RCVWND;
+		if (mss > 1460)
+			init_cwnd =
+			max_t(u32, (1460 * TCP_DEFAULT_INIT_RCVWND) / mss, 2);
 		/* when initializing use the value from init_rcv_wnd
 		 * rather than the default from above
 		 */