Message ID | 1292642451-892-1-git-send-email-nanditad@google.com |
---|---|
State | Rejected, archived |
Delegated to: | David Miller |

On Fri, 17 Dec 2010 19:20:51 -0800
Nandita Dukkipati <nanditad@google.com> wrote:

> This patch changes the default initial receive window to 10 mss
> (defined constant). The default window is limited to the maximum
> of 10*1460 and 2*mss (when mss > 1460).
>
> Signed-off-by: Nandita Dukkipati <nanditad@google.com>

This needs way more discussion because it makes Linux non-RFC;
breaking the RFC behavior should require explicit user override.
I also wonder if the magic value 10 should be a sysctl.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

From: Nandita Dukkipati <nanditad@google.com>
Date: Fri, 17 Dec 2010 19:20:51 -0800

> This patch changes the default initial receive window to 10 mss
> (defined constant). The default window is limited to the maximum
> of 10*1460 and 2*mss (when mss > 1460).
>
> Signed-off-by: Nandita Dukkipati <nanditad@google.com>

That's an incredibly terse explanation for a very non-trivial
change with very non-trivial implications.

What analysis have you performed to lead you to decide that this
was a reasonable change to make? Where can people see that
analysis and look over it to see if they agree with your
assessment of the data?

We can't apply a patch like that without any form of analysis or
reasoning. You don't say "why" you're doing this, and frankly
that really ticks me off.

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Fri, 17 Dec 2010 19:59:17 -0800

> On Fri, 17 Dec 2010 19:20:51 -0800
> Nandita Dukkipati <nanditad@google.com> wrote:
>
>> This patch changes the default initial receive window to 10 mss
>> (defined constant). The default window is limited to the maximum
>> of 10*1460 and 2*mss (when mss > 1460).
>>
>> Signed-off-by: Nandita Dukkipati <nanditad@google.com>
>
> This needs way more discussion because it makes Linux non-RFC;
> breaking the RFC behavior should require explicit user override.
> I also wonder if the magic value 10 should be a sysctl.

He's changing the default receive window, not the transmit
congestion window, so there is nothing RFC about this.

But yes, he does have to explain why this change is
being made, in exhaustive detail.

resend in plain text.

On Fri, Dec 17, 2010 at 9:13 PM, David Miller <davem@davemloft.net> wrote:
> From: Nandita Dukkipati <nanditad@google.com>
> Date: Fri, 17 Dec 2010 19:20:51 -0800
>
>> This patch changes the default initial receive window to 10 mss
>> (defined constant). The default window is limited to the maximum
>> of 10*1460 and 2*mss (when mss > 1460).
>>
>> Signed-off-by: Nandita Dukkipati <nanditad@google.com>
>
> That's an incredibly terse explanation for a very non-trivial
> change with very non-trivial implications.
>
> What analysis have you performed to lead you to decide that this
> was a reasonable change to make? Where can people see that
> analysis and look over it to see if they agree with your
> assessment of the data?

Apologies for the terse comment. Here's a longer explanation.

Background:
A recent proposal to the IETF [Ref: 5] recommends increasing TCP's
initial congestion window to 10 mss or about 15KB. This proposal,
backed with data from several large-scale live experiments as well as
controlled testbed experiments, is under active discussion in the TCPM
working group.

Analysis performed:
Leading up to this proposal were several large-scale Internet
experiments [Ref: 2] with an initial congestion window of 10 mss
(IW10), where we showed that the average latency of HTTP responses
improved by approximately 10%. This was accompanied by a slight
increase in retransmission rate (0.5%), most of which is coming from
applications opening multiple simultaneous connections. To understand
the extreme worst case scenarios, as well as fairness issues with IW10
versus IW3 traffic, we further conducted controlled testbed experiments
(end-hosts are all Linux based). We came away finding minimal negative
impact even under low link bandwidths (dial-ups) and small buffers
[Ref: 3]. These results are extremely encouraging for adopting IW10.

But obviously, an initial congestion window of 10 mss is useless
unless a TCP receiver advertises an initial receive window of at least
10 mss. Fortunately, in the large-scale Internet experiments we found
that most of the operating systems advertised a large enough initial
receive window, allowing us to experiment with various values of
initial congestion windows. Linux systems were among the few
exceptions that advertised a small receive window. This patch intends
to fix that.

References:

1. This site has a comprehensive list of all IW10 references to date.
http://code.google.com/speed/protocols/tcpm-IW10.html

2. Paper describing results from large-scale Internet experiments with IW10.
http://ccr.sigcomm.org/drupal/?q=node/621

3. Controlled testbed experiments with IW10 under worst case scenarios
http://www.ietf.org/proceedings/79/slides/tcpm-0.pdf

4. Raw test data from testbed experiments (Linux senders/receivers)
with initial congestion window and initial receive window of 10 mss.
http://research.csc.ncsu.edu/netsrv/?q=content/iw10

5. Internet-Draft. Increasing TCP's Initial Window.
https://datatracker.ietf.org/doc/draft-ietf-tcpm-initcwnd/
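
To make the dependency between the two windows concrete, here is a minimal sketch in C. It is not kernel code: the function names are hypothetical, and the only facts it relies on are that a sender can have at most min(cwnd, rwnd) bytes in flight and that RFC 3390 bounds the initial window at min(4*MSS, max(2*MSS, 4380 bytes)). It assumes, for illustration, a receiver whose first advertised window is sized to that RFC 3390 bound.

```c
#include <stdint.h>

/* RFC 3390 upper bound on the initial window, in bytes:
 * min(4*MSS, max(2*MSS, 4380)). */
static uint32_t rfc3390_initial_window(uint32_t mss)
{
    uint32_t lower = 2 * mss > 4380 ? 2 * mss : 4380;

    return 4 * mss < lower ? 4 * mss : lower;
}

/* Bytes a sender may have in flight during the first round trip:
 * the smaller of its own congestion window and the window the
 * peer advertised. (Hypothetical helper, for illustration only.) */
static uint32_t first_flight_bytes(uint32_t init_cwnd_segs, uint32_t mss,
                                   uint32_t advertised_rwnd)
{
    uint32_t cwnd_bytes = init_cwnd_segs * mss;

    return cwnd_bytes < advertised_rwnd ? cwnd_bytes : advertised_rwnd;
}
```

With mss = 1460, `first_flight_bytes(10, 1460, rfc3390_initial_window(1460))` is capped at 4380 bytes despite the IW10 sender, while `first_flight_bytes(10, 1460, 10 * 1460)` allows the full 14600 bytes.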

On Sat, 18 Dec 2010 01:08:33 -0800
Nandita Dukkipati <nanditad@google.com> wrote:

> resend in plain text.
>
> On Fri, Dec 17, 2010 at 9:13 PM, David Miller <davem@davemloft.net> wrote:
> > From: Nandita Dukkipati <nanditad@google.com>
> > Date: Fri, 17 Dec 2010 19:20:51 -0800
> >
> >> This patch changes the default initial receive window to 10 mss
> >> (defined constant). The default window is limited to the maximum
> >> of 10*1460 and 2*mss (when mss > 1460).
> >>
> >> Signed-off-by: Nandita Dukkipati <nanditad@google.com>
> >
> > That's an incredibly terse explanation for a very non-trivial
> > change with very non-trivial implications.
> >
> > What analysis have you performed to lead you to decide that this
> > was a reasonable change to make? Where can people see that
> > analysis and look over it to see if they agree with your
> > assessment of the data?
>
> Apologies for the terse comment. Here's a longer explanation.
>
> Background:
> A recent proposal to the IETF [Ref: 5] recommends increasing TCP's
> initial congestion window to 10 mss or about 15KB. This proposal,
> backed with data from several large-scale live experiments as well as
> controlled testbed experiments, is under active discussion in the TCPM
> working group.
>
> Analysis performed:
> Leading up to this proposal were several large-scale Internet
> experiments [Ref: 2] with an initial congestion window of 10 mss
> (IW10), where we showed that the average latency of HTTP responses
> improved by approximately 10%. This was accompanied by a slight
> increase in retransmission rate (0.5%), most of which is coming from
> applications opening multiple simultaneous connections. To understand
> the extreme
> worst case scenarios, as well as fairness issues with IW10 versus IW3
> traffic, we further conducted controlled testbed experiments
> (end-hosts are all Linux based). We came away finding minimal negative
> impact even under low link bandwidths (dial-ups) and small buffers
> [Ref: 3]. These results are extremely encouraging for adopting IW10.
>
> But obviously, an initial congestion window of 10 mss is useless
> unless a TCP receiver advertises an initial receive window of at least
> 10 mss. Fortunately, in the large-scale Internet experiments we found
> that most of the operating systems advertised a large enough initial
> receive window, allowing us to experiment with various values of
> initial congestion windows. Linux systems were among the few
> exceptions that advertised a small receive window. This patch intends
> to fix that.
>
> References:
>
> 1. This site has a comprehensive list of all IW10 references to date.
> http://code.google.com/speed/protocols/tcpm-IW10.html
>
> 2. Paper describing results from large-scale Internet experiments with IW10.
> http://ccr.sigcomm.org/drupal/?q=node/621
>
> 3. Controlled testbed experiments with IW10 under worst case scenarios
> http://www.ietf.org/proceedings/79/slides/tcpm-0.pdf
>
> 4. Raw test data from testbed experiments (Linux senders/receivers)
> with initial congestion window and initial receive window of 10 mss.
> http://research.csc.ncsu.edu/netsrv/?q=content/iw10
>
> 5. Internet-Draft. Increasing TCP's Initial Window.
> https://datatracker.ietf.org/doc/draft-ietf-tcpm-initcwnd/

Agree this is a good idea, but some further notes:
 * The control of receive window is a local function not covered by
   RFC.
 * Linux manipulates receive window automatically, unlike some other
   implementations.

But any change to TCP risks breaking other broken implementations
and users need a good way to recover. Therefore I recommend this
be made a sysctl to allow for quick workaround for the user who
has to connect to some Elbonian printer. Doing it per route
is okay, but for the worst case, it needs to be a sysctl.

The default value of the sysctl should be your new value (10),
and it should allow the old RFC usage if zero.

> Agree this is a good idea, but some further notes:
> * The control of receive window is a local function not covered by
>   RFC.
> * Linux manipulates receive window automatically, unlike some other
>   implementations.
>
> But any change to TCP risks breaking other broken implementations
> and users need a good way to recover.

Always good to be careful, but break in what way? Many stacks have been
advertising an initial receive window of well above 10 segments going back
literally decades.

HP-UX systems have been advertising a default/initial receive window of 32768
bytes since the mid 1990s; Solaris systems have been advertising a default
receive window of 49152 for ages. I cannot speak to Windows' default advertised
window.

While that sounds a bit like "But MOM! All my friends are doing it." it does
seem to suggest that advertising an initial receive window of 10 segments is
unlikely to uncover anything new.

rick jones

From: Rick Jones <rick.jones2@hp.com>
Date: Mon, 20 Dec 2010 10:26:20 -0800

>> Agree this is a good idea, but some further notes:
>> * The control of receive window is a local function not covered by
>>   RFC.
>> * Linux manipulates receive window automatically, unlike some other
>>   implementations.
>> But any change to TCP risks breaking other broken implementations
>> and users need a good way to recover.
>
> Always good to be careful, but break in what way? Many stacks have
> been advertising an initial receive window of well above 10 segments
> going back literally decades.

Agreed.

I know this has already been applied, but one thing to think about:
Linux announces a small initial window to prevent overflowing the
receive buffer when receiving segments smaller than the link MTU.
Increasing this even to 10 segments might have some negative
consequences. I recall, for instance, some drivers when configured
with a 9000 byte MTU, have a single pool of receive buffers all 16k
(the next highest power of 2). So each received segment will get 16k
of allocated memory accounted to it, even if the incoming segments are
<=1460 bytes long. The default initial rcvbuf of 87380 bytes is less
than the 160k of memory that the initial window might consume, so
we're going to start hitting the very slow path of coalescing segments
to get back under memory bounds.

Some drivers are smarter about having multiple pools of receive
buffers with different sizes, so it might not be so easy to hit this
condition. I haven't looked at any of them for a while. Is this
still a real concern?

Thanks,
  -John

On Fri, Dec 17, 2010 at 10:20 PM, Nandita Dukkipati <nanditad@google.com> wrote:
> This patch changes the default initial receive window to 10 mss
> (defined constant). The default window is limited to the maximum
> of 10*1460 and 2*mss (when mss > 1460).
>
> Signed-off-by: Nandita Dukkipati <nanditad@google.com>
> ---
>  include/net/tcp.h     |    3 +++
>  net/ipv4/tcp_output.c |   11 ++++++++---
>  2 files changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 2ab6c9c..6c25ba8 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -60,6 +60,9 @@ extern void tcp_time_wait(struct sock *sk, int state, int timeo);
>   */
>  #define MAX_TCP_WINDOW		32767U
>
> +/* Offer an initial receive window of 10 mss. */
> +#define TCP_DEFAULT_INIT_RCVWND	10
> +
>  /* Minimal accepted MSS. It is (60+60+8) - (20+20). */
>  #define TCP_MIN_MSS		88U
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 2d39066..dc7c096 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -228,10 +228,15 @@ void tcp_select_initial_window(int __space, __u32 mss,
>  		}
>  	}
>
> -	/* Set initial window to value enough for senders, following RFC5681. */
> +	/* Set initial window to a value enough for senders starting with
> +	 * initial congestion window of TCP_DEFAULT_INIT_RCVWND. Place
> +	 * a limit on the initial window when mss is larger than 1460.
> +	 */
>  	if (mss > (1 << *rcv_wscale)) {
> -		int init_cwnd = rfc3390_bytes_to_packets(mss);
> -
> +		int init_cwnd = TCP_DEFAULT_INIT_RCVWND;
> +		if (mss > 1460)
> +			init_cwnd =
> +			max_t(u32, (1460 * TCP_DEFAULT_INIT_RCVWND) / mss, 2);
>  		/* when initializing use the value from init_rcv_wnd
>  		 * rather than the default from above
>  		 */
> --
> 1.7.3.1
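
The arithmetic behind the concern in the message above can be sketched in a few lines of C. This is a deliberately simplified model, not how the kernel accounts receive memory: the helper names are hypothetical, and it assumes that every received segment is charged one full fixed-size driver buffer regardless of payload size, as in the 16k-per-segment case described.

```c
#include <stdint.h>

/* Worst-case receive memory charged for a window of segments when each
 * segment occupies one fixed-size driver buffer, however small its
 * payload is. (Simplified model; hypothetical helper.) */
static uint32_t charged_bytes(uint32_t segments, uint32_t buffer_size)
{
    return segments * buffer_size;
}

/* Nonzero when the charged memory exceeds the socket's receive buffer. */
static int exceeds_rcvbuf(uint32_t segments, uint32_t buffer_size,
                          uint32_t rcvbuf)
{
    return charged_bytes(segments, buffer_size) > rcvbuf;
}
```

Under these assumptions, 10 segments in 16384-byte buffers are charged 163840 bytes, which exceeds the 87380-byte default rcvbuf mentioned above, whereas a 3-segment window is charged 49152 bytes and fits.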

On Tuesday, 21 December 2010 at 13:27 -0500, John Heffner wrote:
> I know this has already been applied, but one thing to think about:
> Linux announces a small initial window to prevent overflowing the
> receive buffer when receiving segments smaller than the link MTU.

Overflowing receive buffer? Which one? Do you mean the NIC RX ring buffer?

> Increasing this even to 10 segments might have some negative
> consequences. I recall, for instance, some drivers when configured
> with a 9000 byte MTU, have a single pool of receive buffers all 16k
> (the next highest power of 2). So each received segment will get 16k
> of allocated memory accounted to it, even if the incoming segments are
> <=1460 bytes long. The default initial rcvbuf of 87380 bytes is less
> than the 160k of memory that the initial window might consume, so
> we're going to start hitting the very slow path of coalescing segments
> to get back under memory bounds.

Patch is not allowing 87380 bytes, but 10 segments, limited to 14600
bytes. It's very conservative IMHO.

>
> Some drivers are smarter about having multiple pools of receive
> buffers with different sizes, so it might not be so easy to hit this
> condition. I haven't looked at any of them for a while. Is this
> still a real concern?

I don't think so. You would have a problem anyway, since the patch changes
only the _initial_ receive window. After some kbytes of data exchanged,
the window is probably larger.

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 2ab6c9c..6c25ba8 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -60,6 +60,9 @@ extern void tcp_time_wait(struct sock *sk, int state, int timeo);
  */
 #define MAX_TCP_WINDOW		32767U
 
+/* Offer an initial receive window of 10 mss. */
+#define TCP_DEFAULT_INIT_RCVWND	10
+
 /* Minimal accepted MSS. It is (60+60+8) - (20+20). */
 #define TCP_MIN_MSS		88U
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 2d39066..dc7c096 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -228,10 +228,15 @@ void tcp_select_initial_window(int __space, __u32 mss,
 		}
 	}
 
-	/* Set initial window to value enough for senders, following RFC5681. */
+	/* Set initial window to a value enough for senders starting with
+	 * initial congestion window of TCP_DEFAULT_INIT_RCVWND. Place
+	 * a limit on the initial window when mss is larger than 1460.
+	 */
 	if (mss > (1 << *rcv_wscale)) {
-		int init_cwnd = rfc3390_bytes_to_packets(mss);
-
+		int init_cwnd = TCP_DEFAULT_INIT_RCVWND;
+		if (mss > 1460)
+			init_cwnd =
+			max_t(u32, (1460 * TCP_DEFAULT_INIT_RCVWND) / mss, 2);
 		/* when initializing use the value from init_rcv_wnd
 		 * rather than the default from above
 		 */

This patch changes the default initial receive window to 10 mss
(defined constant). The default window is limited to the maximum
of 10*1460 and 2*mss (when mss > 1460).

Signed-off-by: Nandita Dukkipati <nanditad@google.com>
---
 include/net/tcp.h     |    3 +++
 net/ipv4/tcp_output.c |   11 ++++++++---
 2 files changed, 11 insertions(+), 3 deletions(-)
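
As a rough user-space illustration of the limit described above, the following C sketch reproduces only the segment-count selection from the hunk in tcp_output.c. The function names are hypothetical, and the rest of tcp_select_initial_window() (including the init_rcv_wnd override and any clamping against the available space) is not modeled, so the byte values are what this logic alone would yield.

```c
#include <stdint.h>

#define TCP_DEFAULT_INIT_RCVWND 10

/* Segment count chosen for the default initial receive window:
 * 10 segments for mss <= 1460; for larger mss, as many segments as fit
 * in 10*1460 bytes, but never fewer than 2. */
static uint32_t init_rcvwnd_segments(uint32_t mss)
{
    uint32_t segs = TCP_DEFAULT_INIT_RCVWND;

    if (mss > 1460) {
        segs = (1460 * TCP_DEFAULT_INIT_RCVWND) / mss;
        if (segs < 2)
            segs = 2;
    }
    return segs;
}

/* The same limit expressed in bytes. */
static uint32_t init_rcvwnd_bytes(uint32_t mss)
{
    return init_rcvwnd_segments(mss) * mss;
}
```

For mss = 1460 this gives 10 segments (14600 bytes). For a jumbo-frame mss of 8960, 14600 / 8960 rounds down to 1, so the floor of 2 segments applies and the window becomes 17920 bytes, which is the "2*mss" case in the description.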