diff mbox

tcp: use RTAX_CWND for outgoing connections properly

Message ID alpine.LNX.2.00.1012172310400.16569@pobox.suse.cz
State Not Applicable, archived
Delegated to: David Miller
Headers show

Commit Message

Jiri Kosina Dec. 17, 2010, 10:14 p.m. UTC
For outgoing connections, the initial value of cwnd is always set to 2 (in 
tcp_v4_init_sock()) regardless of setting of RTAX_CWND. For incoming 
connections, this is handled properly in tcp_init_metrics().

As a result of this, Linux TCP stack always uses cwnd == 2 at the beginning of
outgoing TCP session (i.e. waits for ACK after 2 packets once the connection
has been established) and grows it in accordance with slow-start algorithm
only after it receives ACK for first two packets.

When slow-start triggers later during the connection (e.g. after idle), 
cwnd is properly re-initialized to RTAX_CWND value (if specified) through 
tcp_cwnd_restart() -> tcp_init_cwnd().

Initialize tp->snd_cwnd properly so that RTAX_CWND value is being used 
also in the slow-start phase for the first packets in the connection.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
---
 net/ipv4/tcp_output.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

Comments

Jiri Kosina Dec. 22, 2010, 10 a.m. UTC | #1
On Fri, 17 Dec 2010, Jiri Kosina wrote:

> For outgoing connections, the initial value of cwnd is always set to 2 (in 
> tcp_v4_init_sock()) regardless of setting of RTAX_CWND. For incoming 
> connections, this is handled properly in tcp_init_metrics().
> 
> As a result of this, Linux TCP stack always uses cwnd == 2 at the beginning of
> outgoing TCP session (i.e. waits for ACK after 2 packets once the connection
> has been established) and grows it in accordance with slow-start algorithm
> only after it receives ACK for first two packets.
> 
> When slow-start triggers later during the connection (e.g. after idle), 
> cwnd is properly re-initialized to RTAX_CWND value (if specified) through 
> tcp_cwnd_restart() -> tcp_init_cwnd().
> 
> Initialize tp->snd_cwnd properly so that RTAX_CWND value is being used 
> also in the slow-start phase for the first packets in the connection.

This should of course read RTAX_INITCWND instead of RTAX_CWND in the whole 
changelog, sorry.

Besides that, any comments on this, please?

Thanks.

> 
> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
> ---
>  net/ipv4/tcp_output.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 61c2463..6dbc55b 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2572,6 +2572,8 @@ static void tcp_connect_init(struct sock *sk)
>  				  &rcv_wscale,
>  				  dst_metric(dst, RTAX_INITRWND));
>  
> +	tp->snd_cwnd = tcp_init_cwnd(tp, dst);
> +
>  	tp->rx_opt.rcv_wscale = rcv_wscale;
>  	tp->rcv_ssthresh = tp->rcv_wnd;
>  
> -- 
> 1.7.3.1
> 
>
Eric Dumazet Dec. 22, 2010, 11:29 a.m. UTC | #2
Le mercredi 22 décembre 2010 à 11:00 +0100, Jiri Kosina a écrit :
> On Fri, 17 Dec 2010, Jiri Kosina wrote:
> 
> > For outgoing connections, the initial value of cwnd is always set to 2 (in 
> > tcp_v4_init_sock()) regardless of setting of RTAX_CWND. For incoming 
> > connections, this is handled properly in tcp_init_metrics().
> > 
> > As a result of this, Linux TCP stack always uses cwnd == 2 at the beginning of
> > outgoing TCP session (i.e. waits for ACK after 2 packets once the connection
> > has been established) and grows it in accordance with slow-start algorithm
> > only after it receives ACK for first two packets.
> > 
> > When slow-start triggers later during the connection (e.g. after idle), 
> > cwnd is properly re-initialized to RTAX_CWND value (if specified) through 
> > tcp_cwnd_restart() -> tcp_init_cwnd().
> > 
> > Initialize tp->snd_cwnd properly so that RTAX_CWND value is being used 
> > also in the slow-start phase for the first packets in the connection.
> 
> This should of course read RTAX_INITCWND instead of RTAX_CWND in the whole 
> changelog, sorry.
> 
> Besides that, any comments on this, please?
> 

Its a bit strange, here is what I have with two net-next-2.6 machines
(without your patch)

192.168.20.0/24 dev eth1  scope link  initcwnd 10

12:13:46.855786 IP 192.168.20.110.39146 > 192.168.20.108.59636: S 1615862982:1615862982(0) win 14600 <mss 1460,sackOK,timestamp 277530 0,nop,wscale 8>
12:13:46.855807 IP 192.168.20.108.59636 > 192.168.20.110.39146: S 1603053412:1603053412(0) ack 1615862983 win 14480 <mss 1460,sackOK,timestamp 744840 277530,nop,wscale 7>
12:13:46.855878 IP 192.168.20.110.39146 > 192.168.20.108.59636: . ack 1 win 58 <nop,nop,timestamp 277530 744840>
12:13:46.856779 IP 192.168.20.110.39146 > 192.168.20.108.59636: P 1:14481(14480) ack 1 win 58 <nop,nop,timestamp 277530 744840>
12:13:46.856794 IP 192.168.20.108.59636 > 192.168.20.110.39146: . ack 14481 win 136 <nop,nop,timestamp 744841 277530>
12:13:46.856901 IP 192.168.20.110.39146 > 192.168.20.108.59636: . 14481:15929(1448) ack 1 win 58 <nop,nop,timestamp 277531 744841>
12:13:46.856912 IP 192.168.20.108.59636 > 192.168.20.110.39146: . ack 15929 win 159 <nop,nop,timestamp 744841 277531>
12:13:46.856930 IP 192.168.20.110.39146 > 192.168.20.108.59636: . 15929:18825(2896) ack 1 win 58 <nop,nop,timestamp 277531 744841>


We can see 192.168.20.110 sends 14480 bytes in its first frame.

Are you sure your patch still needed after commit 356f039822b8d802138f
(TCP: increase default initial receive window.)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Kosina Dec. 22, 2010, 11:38 a.m. UTC | #3
On Wed, 22 Dec 2010, Eric Dumazet wrote:

> > > For outgoing connections, the initial value of cwnd is always set to 2 (in 
> > > tcp_v4_init_sock()) regardless of setting of RTAX_CWND. For incoming 
> > > connections, this is handled properly in tcp_init_metrics().
> > > 
> > > As a result of this, Linux TCP stack always uses cwnd == 2 at the beginning of
> > > outgoing TCP session (i.e. waits for ACK after 2 packets once the connection
> > > has been established) and grows it in accordance with slow-start algorithm
> > > only after it receives ACK for first two packets.
> > > 
> > > When slow-start triggers later during the connection (e.g. after idle), 
> > > cwnd is properly re-initialized to RTAX_CWND value (if specified) through 
> > > tcp_cwnd_restart() -> tcp_init_cwnd().
> > > 
> > > Initialize tp->snd_cwnd properly so that RTAX_CWND value is being used 
> > > also in the slow-start phase for the first packets in the connection.
> > 
> > This should of course read RTAX_INITCWND instead of RTAX_CWND in the whole 
> > changelog, sorry.
> > 
> > Besides that, any comments on this, please?
> > 
> 
> Its a bit strange, here is what I have with two net-next-2.6 machines
> (without your patch)
> 
> 192.168.20.0/24 dev eth1  scope link  initcwnd 10
> 
> 12:13:46.855786 IP 192.168.20.110.39146 > 192.168.20.108.59636: S 1615862982:1615862982(0) win 14600 <mss 1460,sackOK,timestamp 277530 0,nop,wscale 8>
> 12:13:46.855807 IP 192.168.20.108.59636 > 192.168.20.110.39146: S 1603053412:1603053412(0) ack 1615862983 win 14480 <mss 1460,sackOK,timestamp 744840 277530,nop,wscale 7>
> 12:13:46.855878 IP 192.168.20.110.39146 > 192.168.20.108.59636: . ack 1 win 58 <nop,nop,timestamp 277530 744840>
> 12:13:46.856779 IP 192.168.20.110.39146 > 192.168.20.108.59636: P 1:14481(14480) ack 1 win 58 <nop,nop,timestamp 277530 744840>
> 12:13:46.856794 IP 192.168.20.108.59636 > 192.168.20.110.39146: . ack 14481 win 136 <nop,nop,timestamp 744841 277530>
> 12:13:46.856901 IP 192.168.20.110.39146 > 192.168.20.108.59636: . 14481:15929(1448) ack 1 win 58 <nop,nop,timestamp 277531 744841>
> 12:13:46.856912 IP 192.168.20.108.59636 > 192.168.20.110.39146: . ack 15929 win 159 <nop,nop,timestamp 744841 277531>
> 12:13:46.856930 IP 192.168.20.110.39146 > 192.168.20.108.59636: . 15929:18825(2896) ack 1 win 58 <nop,nop,timestamp 277531 744841>
>
> We can see 192.168.20.110 sends 14480 bytes in its first frame.

So is this with 356f039822b8d802138f applied?

I have had a testing environment in which I had forced the 'receiver' to 
advertise large receive window, but still the sender started with a very 
small initial congestion window, waiting for ack after 1 or 2 MSS-sized 
packets.

I must admit that I was reproducing it with slightly older kernel, but 
haven't seen any changes in the code which would influence this behavior. 
Will look how most up-to-date kernel (plus 356f039822b8d802138f) behaves 
in that environment.

Where in the code is the congestion window enlarged to accomodate 14480 
bytes in your case? I don't see where the value of '2' coming from 
tcp_v4_init_sock() is being overriden for "outgoing" connections ...

Thanks,
Jiri Kosina Dec. 22, 2010, 2:07 p.m. UTC | #4
On Wed, 22 Dec 2010, Jiri Kosina wrote:

> > 12:13:46.855786 IP 192.168.20.110.39146 > 192.168.20.108.59636: S 1615862982:1615862982(0) win 14600 <mss 1460,sackOK,timestamp 277530 0,nop,wscale 8>
> > 12:13:46.855807 IP 192.168.20.108.59636 > 192.168.20.110.39146: S 1603053412:1603053412(0) ack 1615862983 win 14480 <mss 1460,sackOK,timestamp 744840 277530,nop,wscale 7>
> > 12:13:46.855878 IP 192.168.20.110.39146 > 192.168.20.108.59636: . ack 1 win 58 <nop,nop,timestamp 277530 744840>
> > 12:13:46.856779 IP 192.168.20.110.39146 > 192.168.20.108.59636: P 1:14481(14480) ack 1 win 58 <nop,nop,timestamp 277530 744840>
> > 12:13:46.856794 IP 192.168.20.108.59636 > 192.168.20.110.39146: . ack 14481 win 136 <nop,nop,timestamp 744841 277530>
> > 12:13:46.856901 IP 192.168.20.110.39146 > 192.168.20.108.59636: . 14481:15929(1448) ack 1 win 58 <nop,nop,timestamp 277531 744841>
> > 12:13:46.856912 IP 192.168.20.108.59636 > 192.168.20.110.39146: . ack 15929 win 159 <nop,nop,timestamp 744841 277531>
> > 12:13:46.856930 IP 192.168.20.110.39146 > 192.168.20.108.59636: . 15929:18825(2896) ack 1 win 58 <nop,nop,timestamp 277531 744841>
> >
> > We can see 192.168.20.110 sends 14480 bytes in its first frame.
> 
> So is this with 356f039822b8d802138f applied?
> 
> I have had a testing environment in which I had forced the 'receiver' to 
> advertise large receive window, but still the sender started with a very 
> small initial congestion window, waiting for ack after 1 or 2 MSS-sized 
> packets.

OK, so current vanilla (not even net-next needed) seems to correctly 
handle initcwnd value and sends as many packets as fit into it, instead of 
waiting after one or two MSS-sized packets. So someone already fixed it 
in some other way apparently. Sorry for the noise.
Eric Dumazet Dec. 22, 2010, 2:37 p.m. UTC | #5
Le mercredi 22 décembre 2010 à 15:07 +0100, Jiri Kosina a écrit :

> 
> OK, so current vanilla (not even net-next needed) seems to correctly 
> handle initcwnd value and sends as many packets as fit into it, instead of 
> waiting after one or two MSS-sized packets. So someone already fixed it 
> in some other way apparently. Sorry for the noise.
> 

Indeed ;)

net-next was necessary at least for the other side, so that its SYNACK
packet advertizes a large enough window ;)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Dec. 22, 2010, 7:57 p.m. UTC | #6
From: Jiri Kosina <jkosina@suse.cz>
Date: Wed, 22 Dec 2010 11:00:39 +0100 (CET)

> Besides that, any comments on this, please?

It's in my backlog, patience is a virtue.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Kosina Dec. 22, 2010, 11:26 p.m. UTC | #7
On Wed, 22 Dec 2010, David Miller wrote:

> > Besides that, any comments on this, please?
> 
> It's in my backlog, patience is a virtue.

Please just disregard it, it's not needed with newer kernels. Thanks,
David Miller Dec. 22, 2010, 11:27 p.m. UTC | #8
From: Jiri Kosina <jkosina@suse.cz>
Date: Thu, 23 Dec 2010 00:26:21 +0100 (CET)

> On Wed, 22 Dec 2010, David Miller wrote:
> 
>> > Besides that, any comments on this, please?
>> 
>> It's in my backlog, patience is a virtue.
> 
> Please just disregard it, it's not needed with newer kernels. Thanks,

Was it indeed fixed by commit 86bcebafc5e7f5?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Kosina Dec. 22, 2010, 11:33 p.m. UTC | #9
On Wed, 22 Dec 2010, David Miller wrote:

> >> > Besides that, any comments on this, please?
> >> 
> >> It's in my backlog, patience is a virtue.
> > 
> > Please just disregard it, it's not needed with newer kernels. Thanks,
> 
> Was it indeed fixed by commit 86bcebafc5e7f5?

Yes, exactly. It just wasn't obvious enough when looking at the code 
(hence my followup patch which hopefully makes the code more readable on 
a first sight).

Thanks and sorry for the noise.
diff mbox

Patch

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 61c2463..6dbc55b 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2572,6 +2572,8 @@  static void tcp_connect_init(struct sock *sk)
 				  &rcv_wscale,
 				  dst_metric(dst, RTAX_INITRWND));
 
+	tp->snd_cwnd = tcp_init_cwnd(tp, dst);
+
 	tp->rx_opt.rcv_wscale = rcv_wscale;
 	tp->rcv_ssthresh = tp->rcv_wnd;