diff mbox

resend: tcp: performance issue with fastopen connections (mss > window)

Message ID 3dbbcda8-ce79-641c-cf3b-21f41c563939@oracle.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Alexey Kodanev Jan. 18, 2017, 5:32 p.m. UTC
Hi Eric,

On 01/13/2017 08:07 PM, Alexey Kodanev wrote:

> Hi Eric,
> On 13.01.2017 18:35, Eric Dumazet wrote:
>
>> I would suggest to clamp MSS to half the initial window, but I guess
>> this is impractical since window in SYN/SYNACK are not scaled.

Looks like max_window not correctly initialized for tfo sockets.
On my test machine it has set to '5592320' in tcp_fastopen_create_child().

This diff fixes the issue, the question: is this the right place to do it?

          * because it's been added to the accept queue directly.


Thanks,
Alexey

Comments

Eric Dumazet Jan. 18, 2017, 5:35 p.m. UTC | #1
On Wed, Jan 18, 2017 at 9:32 AM, Alexey Kodanev
<alexey.kodanev@oracle.com> wrote:
> Hi Eric,
>
> On 01/13/2017 08:07 PM, Alexey Kodanev wrote:
>

> Looks like max_window not correctly initialized for tfo sockets.
> On my test machine it has set to '5592320' in tcp_fastopen_create_child().
>
> This diff fixes the issue, the question: is this the right place to do it?
>
> diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c
> index 4e777a3..33ed508 100644
> --- a/net/ipv4/tcp_fastopen.c
> +++ b/net/ipv4/tcp_fastopen.c
> @@ -206,6 +206,8 @@ static struct sock *tcp_fastopen_create_child(struct
> sock *sk,
>          */
>         tp->snd_wnd = ntohs(tcp_hdr(skb)->window);
>
> +       tp->max_window = tp->snd_wnd;
> +

Excellent catch. Let me test our regression tests with this.

Thanks !
Yuchung Cheng Jan. 18, 2017, 6:13 p.m. UTC | #2
On Wed, Jan 18, 2017 at 9:35 AM, Eric Dumazet <edumazet@google.com> wrote:
>
> On Wed, Jan 18, 2017 at 9:32 AM, Alexey Kodanev
> <alexey.kodanev@oracle.com> wrote:
> > Hi Eric,
> >
> > On 01/13/2017 08:07 PM, Alexey Kodanev wrote:
> >
>
> > Looks like max_window not correctly initialized for tfo sockets.
> > On my test machine it has set to '5592320' in tcp_fastopen_create_child().
> >
> > This diff fixes the issue, the question: is this the right place to do it?
> >
> > diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c
> > index 4e777a3..33ed508 100644
> > --- a/net/ipv4/tcp_fastopen.c
> > +++ b/net/ipv4/tcp_fastopen.c
> > @@ -206,6 +206,8 @@ static struct sock *tcp_fastopen_create_child(struct
> > sock *sk,
> >          */
> >         tp->snd_wnd = ntohs(tcp_hdr(skb)->window);
> >
> > +       tp->max_window = tp->snd_wnd;
> > +
>
> Excellent catch. Let me test our regression tests with this.
Indeed nice catch. Thanks for the investigative work!

>
> Thanks !
Eric Dumazet Jan. 18, 2017, 6:16 p.m. UTC | #3
On Wed, Jan 18, 2017 at 10:13 AM, Yuchung Cheng <ycheng@google.com> wrote:
> On Wed, Jan 18, 2017 at 9:35 AM, Eric Dumazet <edumazet@google.com> wrote:
>>
>> On Wed, Jan 18, 2017 at 9:32 AM, Alexey Kodanev
>> <alexey.kodanev@oracle.com> wrote:
>> > Hi Eric,
>> >
>> > On 01/13/2017 08:07 PM, Alexey Kodanev wrote:
>> >
>>
>> > Looks like max_window not correctly initialized for tfo sockets.
>> > On my test machine it has set to '5592320' in tcp_fastopen_create_child().
>> >
>> > This diff fixes the issue, the question: is this the right place to do it?
>> >
>> > diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c
>> > index 4e777a3..33ed508 100644
>> > --- a/net/ipv4/tcp_fastopen.c
>> > +++ b/net/ipv4/tcp_fastopen.c
>> > @@ -206,6 +206,8 @@ static struct sock *tcp_fastopen_create_child(struct
>> > sock *sk,
>> >          */
>> >         tp->snd_wnd = ntohs(tcp_hdr(skb)->window);
>> >
>> > +       tp->max_window = tp->snd_wnd;
>> > +
>>
>> Excellent catch. Let me test our regression tests with this.
> Indeed nice catch. Thanks for the investigative work!
>

We do have 2 failures, but tests might have depended on undocumented behavior

(For googlers :
Ran 211 tests: 209 passing, 0 flaky 2 failing
Sponge: http://sponge/f1575065-6e1c-4514-bced-9167ce56d2ee
)

Please Alexey submit an official patch, thanks a lot !
Yuchung Cheng Jan. 18, 2017, 6:27 p.m. UTC | #4
Googler-only

Hi Eric: yeah I think the test was just due to the TSO chunking
difference between prod and upstream, which I was able to avoid with
this patch re-suited by Neal in b/34128974. In fact, this patch
enables me to run all recovery tests on upstream kernels for my RACK
patch set.

Neal: can we polish and check that in? super useful.


On Wed, Jan 18, 2017 at 10:16 AM, Eric Dumazet <edumazet@google.com> wrote:
> On Wed, Jan 18, 2017 at 10:13 AM, Yuchung Cheng <ycheng@google.com> wrote:
>> On Wed, Jan 18, 2017 at 9:35 AM, Eric Dumazet <edumazet@google.com> wrote:
>>>
>>> On Wed, Jan 18, 2017 at 9:32 AM, Alexey Kodanev
>>> <alexey.kodanev@oracle.com> wrote:
>>> > Hi Eric,
>>> >
>>> > On 01/13/2017 08:07 PM, Alexey Kodanev wrote:
>>> >
>>>
>>> > Looks like max_window not correctly initialized for tfo sockets.
>>> > On my test machine it has set to '5592320' in tcp_fastopen_create_child().
>>> >
>>> > This diff fixes the issue, the question: is this the right place to do it?
>>> >
>>> > diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c
>>> > index 4e777a3..33ed508 100644
>>> > --- a/net/ipv4/tcp_fastopen.c
>>> > +++ b/net/ipv4/tcp_fastopen.c
>>> > @@ -206,6 +206,8 @@ static struct sock *tcp_fastopen_create_child(struct
>>> > sock *sk,
>>> >          */
>>> >         tp->snd_wnd = ntohs(tcp_hdr(skb)->window);
>>> >
>>> > +       tp->max_window = tp->snd_wnd;
>>> > +
>>>
>>> Excellent catch. Let me test our regression tests with this.
>> Indeed nice catch. Thanks for the investigative work!
>>
>
> We do have 2 failures, but tests might have depended on undocumented behavior
>
> (For googlers :
> Ran 211 tests: 209 passing, 0 flaky 2 failing
> Sponge: http://sponge/f1575065-6e1c-4514-bced-9167ce56d2ee
> )
>
> Please Alexey submit an official patch, thanks a lot !
Neal Cardwell Jan. 18, 2017, 6:33 p.m. UTC | #5
On Wed, Jan 18, 2017 at 1:16 PM, Eric Dumazet <edumazet@google.com> wrote:
>
> On Wed, Jan 18, 2017 at 10:13 AM, Yuchung Cheng <ycheng@google.com> wrote:
> > On Wed, Jan 18, 2017 at 9:35 AM, Eric Dumazet <edumazet@google.com> wrote:
> >>
> >> On Wed, Jan 18, 2017 at 9:32 AM, Alexey Kodanev
> >> <alexey.kodanev@oracle.com> wrote:
> >> > Hi Eric,
> >> >
> >> > On 01/13/2017 08:07 PM, Alexey Kodanev wrote:
> >> >
> >>
> >> > Looks like max_window not correctly initialized for tfo sockets.
> >> > On my test machine it has set to '5592320' in tcp_fastopen_create_child().
> >> >
> >> > This diff fixes the issue, the question: is this the right place to do it?
> >> >
> >> > diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c
> >> > index 4e777a3..33ed508 100644
> >> > --- a/net/ipv4/tcp_fastopen.c
> >> > +++ b/net/ipv4/tcp_fastopen.c
> >> > @@ -206,6 +206,8 @@ static struct sock *tcp_fastopen_create_child(struct
> >> > sock *sk,
> >> >          */
> >> >         tp->snd_wnd = ntohs(tcp_hdr(skb)->window);
> >> >
> >> > +       tp->max_window = tp->snd_wnd;
> >> > +
> >>
> >> Excellent catch. Let me test our regression tests with this.
> > Indeed nice catch. Thanks for the investigative work!
> >
>
> We do have 2 failures, but tests might have depended on undocumented behavior
>
> (For googlers :
> Ran 211 tests: 209 passing, 0 flaky 2 failing
> Sponge: http://sponge/f1575065-6e1c-4514-bced-9167ce56d2ee
> )

Yes, exactly, those test failures look fine. Looks like the test was
implicitly expecting the old buggy behavior (and it wasn't noticed
because the 10*MSS output matches IW10, so it looked fine. But now
with the fix, I believe we are seeing the correct new behavior because
tcp_xmit_size_goal() is calling tcp_bound_to_half_wnd(), and now with
max_window fixed, the outgoing TSO skb is now the correct 5*MSS (half
of the rwin of 10000). So the test results look good to me.

Thanks, Alexey, for tracking this down! :-)

neal
Neal Cardwell Jan. 18, 2017, 6:42 p.m. UTC | #6
On Wed, Jan 18, 2017 at 1:27 PM, Yuchung Cheng <ycheng@google.com> wrote:
> Hi Eric: yeah I think the test was just due to the TSO chunking
> difference between prod and upstream, which I was able to avoid with
> this patch re-suited by Neal in b/34128974. In fact, this patch
> enables me to run all recovery tests on upstream kernels for my RACK
> patch set.
>
> Neal: can we polish and check that in? super useful.

Yes, sounds good. Glad that was useful. I will work on polishing up
and checking in the patch to make packetdrill agnostic to TSO
segmentation by default (for upstream and Google).

neal
diff mbox

Patch

diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c
index 4e777a3..33ed508 100644
--- a/net/ipv4/tcp_fastopen.c
+++ b/net/ipv4/tcp_fastopen.c
@@ -206,6 +206,8 @@  static struct sock *tcp_fastopen_create_child(struct 
sock *sk,
          */
         tp->snd_wnd = ntohs(tcp_hdr(skb)->window);

+       tp->max_window = tp->snd_wnd;
+
         /* Activate the retrans timer so that SYNACK can be retransmitted.
          * The request socket is not added to the ehash