diff mbox series

[net] tcp: fix to update snd_wl1 in bulk receiver fast path

Message ID 20201022143331.1887495-1-ncardwell.kernel@gmail.com
State Accepted
Delegated to: David Miller
Headers show
Series [net] tcp: fix to update snd_wl1 in bulk receiver fast path | expand

Checks

Context Check Description
jkicinski/cover_letter success Link
jkicinski/fixes_present fail Series targets non-next tree, but doesn't contain any Fixes tags
jkicinski/patch_count success Link
jkicinski/tree_selection success Clearly marked for net
jkicinski/subject_prefix success Link
jkicinski/source_inline success Was 0 now: 0
jkicinski/verify_signedoff success Link
jkicinski/module_param success Was 0 now: 0
jkicinski/build_32bit success Errors and warnings before: 2 this patch: 2
jkicinski/kdoc success Errors and warnings before: 0 this patch: 0
jkicinski/verify_fixes success Link
jkicinski/checkpatch success total: 0 errors, 0 warnings, 0 checks, 8 lines checked
jkicinski/build_allmodconfig_warn success Errors and warnings before: 2 this patch: 2
jkicinski/header_inline success Link
jkicinski/stable success Stable not CCed

Commit Message

Neal Cardwell Oct. 22, 2020, 2:33 p.m. UTC
From: Neal Cardwell <ncardwell@google.com>

In the header prediction fast path for a bulk data receiver, if no
data is newly acknowledged then we do not call tcp_ack() and do not
call tcp_ack_update_window(). This means that a bulk receiver that
receives large amounts of data can have the incoming sequence numbers
wrap, so that the check in tcp_may_update_window fails:
   after(ack_seq, tp->snd_wl1)

If the incoming receive windows are zero in this state, and then the
connection that was a bulk data receiver later wants to send data,
that connection can find itself persistently rejecting the window
updates in incoming ACKs. This means the connection can persistently
fail to discover that the receive window has opened, which in turn
means that the connection is unable to send anything, and the
connection's sending process can get permanently "stuck".

The fix is to update snd_wl1 in the header prediction fast path for a
bulk data receiver, so that it keeps up and does not see wrapping
problems.

This fix is based on a very nice and thorough analysis and diagnosis
by Apollon Oikonomopoulos (see link below).

This is a stable candidate but there is no Fixes tag here since the
bug predates current git history. Just for fun: looks like the bug
dates back to when header prediction was added in Linux v2.1.8 in Nov
1996. In that version tcp_rcv_established() was added, and the code
only updates snd_wl1 in tcp_ack(), and in the new "Bulk data transfer:
receiver" code path it does not call tcp_ack(). This fix seems to
apply cleanly at least as far back as v3.2.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reported-by: Apollon Oikonomopoulos <apoikos@dmesg.gr>
Tested-by: Apollon Oikonomopoulos <apoikos@dmesg.gr>
Link: https://www.spinics.net/lists/netdev/msg692430.html
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_input.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Jakub Kicinski Oct. 22, 2020, 4:07 p.m. UTC | #1
On Thu, 22 Oct 2020 10:33:31 -0400 Neal Cardwell wrote:
> From: Neal Cardwell <ncardwell@google.com>
> 
> In the header prediction fast path for a bulk data receiver, if no
> data is newly acknowledged then we do not call tcp_ack() and do not
> call tcp_ack_update_window(). This means that a bulk receiver that
> receives large amounts of data can have the incoming sequence numbers
> wrap, so that the check in tcp_may_update_window fails:
>    after(ack_seq, tp->snd_wl1)
> 
> If the incoming receive windows are zero in this state, and then the
> connection that was a bulk data receiver later wants to send data,
> that connection can find itself persistently rejecting the window
> updates in incoming ACKs. This means the connection can persistently
> fail to discover that the receive window has opened, which in turn
> means that the connection is unable to send anything, and the
> connection's sending process can get permanently "stuck".
> 
> The fix is to update snd_wl1 in the header prediction fast path for a
> bulk data receiver, so that it keeps up and does not see wrapping
> problems.
> 
> This fix is based on a very nice and thorough analysis and diagnosis
> by Apollon Oikonomopoulos (see link below).
> 
> This is a stable candidate but there is no Fixes tag here since the
> bug predates current git history. Just for fun: looks like the bug
> dates back to when header prediction was added in Linux v2.1.8 in Nov
> 1996. In that version tcp_rcv_established() was added, and the code
> only updates snd_wl1 in tcp_ack(), and in the new "Bulk data transfer:
> receiver" code path it does not call tcp_ack(). This fix seems to
> apply cleanly at least as far back as v3.2.

In that case - can I slap:

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")

on it?
Neal Cardwell Oct. 22, 2020, 5:04 p.m. UTC | #2
On Thu, Oct 22, 2020 at 12:08 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 22 Oct 2020 10:33:31 -0400 Neal Cardwell wrote:
> > From: Neal Cardwell <ncardwell@google.com>
> >
> > In the header prediction fast path for a bulk data receiver, if no
> > data is newly acknowledged then we do not call tcp_ack() and do not
> > call tcp_ack_update_window(). This means that a bulk receiver that
> > receives large amounts of data can have the incoming sequence numbers
> > wrap, so that the check in tcp_may_update_window fails:
> >    after(ack_seq, tp->snd_wl1)
> >
> > If the incoming receive windows are zero in this state, and then the
> > connection that was a bulk data receiver later wants to send data,
> > that connection can find itself persistently rejecting the window
> > updates in incoming ACKs. This means the connection can persistently
> > fail to discover that the receive window has opened, which in turn
> > means that the connection is unable to send anything, and the
> > connection's sending process can get permanently "stuck".
> >
> > The fix is to update snd_wl1 in the header prediction fast path for a
> > bulk data receiver, so that it keeps up and does not see wrapping
> > problems.
> >
> > This fix is based on a very nice and thorough analysis and diagnosis
> > by Apollon Oikonomopoulos (see link below).
> >
> > This is a stable candidate but there is no Fixes tag here since the
> > bug predates current git history. Just for fun: looks like the bug
> > dates back to when header prediction was added in Linux v2.1.8 in Nov
> > 1996. In that version tcp_rcv_established() was added, and the code
> > only updates snd_wl1 in tcp_ack(), and in the new "Bulk data transfer:
> > receiver" code path it does not call tcp_ack(). This fix seems to
> > apply cleanly at least as far back as v3.2.
>
> In that case - can I slap:
>
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
>
> on it?

Yes, slapping that Fixes footer on it sounds fine to me. I see
that it does apply cleanly to 1da177e4c3f4.

Or let me know if you would prefer for me to resubmit a v2 with that footer.

thanks,
neal
Jakub Kicinski Oct. 22, 2020, 5:47 p.m. UTC | #3
On Thu, 22 Oct 2020 13:04:04 -0400 Neal Cardwell wrote:
> > In that case - can I slap:
> >
> > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> >
> > on it?  
> 
> Yes, slapping that Fixes footer on it sounds fine to me. I see
> that it does apply cleanly to 1da177e4c3f4.

FWIW even if it didn't - my modus operandi is to have the tag pointing
at the earliest point in the git history where the bug exists. Even if
the implementation was completely rewritten - we want to let the people
who run old kernels know they may experience the issue.

Backporters will see the patch doesn't apply and make the right call.

That said I don't think the process documentation is very clear on
this, so maybe someone more experienced will correct my course :)

> Or let me know if you would prefer for me to resubmit a v2 with that footer.

I'll add it, no worries.
Neal Cardwell Oct. 22, 2020, 5:49 p.m. UTC | #4
On Thu, Oct 22, 2020 at 1:47 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 22 Oct 2020 13:04:04 -0400 Neal Cardwell wrote:
> > > In that case - can I slap:
> > >
> > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > >
> > > on it?
> >
> > Yes, slapping that Fixes footer on it sounds fine to me. I see
> > that it does apply cleanly to 1da177e4c3f4.
>
> FWIW even if it didn't - my modus operandi is to have the tag pointing
> at the earliest point in the git history where the bug exists. Even if
> the implementation was completely rewritten - we want to let the people
> who run old kernels know they may experience the issue.
>
> Backporters will see the patch doesn't apply and make the right call.
>
> That said I don't think the process documentation is very clear on
> this, so maybe someone more experienced will correct my course :)
>
> > Or let me know if you would prefer for me to resubmit a v2 with that footer.
>
> I'll add it, no worries.

Great. Thanks!

neal
Jakub Kicinski Oct. 22, 2020, 7:30 p.m. UTC | #5
On Thu, 22 Oct 2020 10:33:31 -0400 Neal Cardwell wrote:
> From: Neal Cardwell <ncardwell@google.com>
> 
> In the header prediction fast path for a bulk data receiver, if no
> data is newly acknowledged then we do not call tcp_ack() and do not
> call tcp_ack_update_window(). This means that a bulk receiver that
> receives large amounts of data can have the incoming sequence numbers
> wrap, so that the check in tcp_may_update_window fails:
>    after(ack_seq, tp->snd_wl1)
> 
> If the incoming receive windows are zero in this state, and then the
> connection that was a bulk data receiver later wants to send data,
> that connection can find itself persistently rejecting the window
> updates in incoming ACKs. This means the connection can persistently
> fail to discover that the receive window has opened, which in turn
> means that the connection is unable to send anything, and the
> connection's sending process can get permanently "stuck".
> 
> The fix is to update snd_wl1 in the header prediction fast path for a
> bulk data receiver, so that it keeps up and does not see wrapping
> problems.
> 
> This fix is based on a very nice and thorough analysis and diagnosis
> by Apollon Oikonomopoulos (see link below).
> 
> This is a stable candidate but there is no Fixes tag here since the
> bug predates current git history. Just for fun: looks like the bug
> dates back to when header prediction was added in Linux v2.1.8 in Nov
> 1996. In that version tcp_rcv_established() was added, and the code
> only updates snd_wl1 in tcp_ack(), and in the new "Bulk data transfer:
> receiver" code path it does not call tcp_ack(). This fix seems to
> apply cleanly at least as far back as v3.2.
> 
> Signed-off-by: Neal Cardwell <ncardwell@google.com>
> Reported-by: Apollon Oikonomopoulos <apoikos@dmesg.gr>
> Tested-by: Apollon Oikonomopoulos <apoikos@dmesg.gr>
> Link: https://www.spinics.net/lists/netdev/msg692430.html
> Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
> Acked-by: Yuchung Cheng <ycheng@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks!
patchwork-bot+netdevbpf@kernel.org Oct. 22, 2020, 7:40 p.m. UTC | #6
Hello:

This patch was applied to netdev/net.git (refs/heads/master):

On Thu, 22 Oct 2020 10:33:31 -0400 you wrote:
> From: Neal Cardwell <ncardwell@google.com>
> 
> In the header prediction fast path for a bulk data receiver, if no
> data is newly acknowledged then we do not call tcp_ack() and do not
> call tcp_ack_update_window(). This means that a bulk receiver that
> receives large amounts of data can have the incoming sequence numbers
> wrap, so that the check in tcp_may_update_window fails:
>    after(ack_seq, tp->snd_wl1)
> 
> [...]

Here is the summary with links:
  - [net] tcp: fix to update snd_wl1 in bulk receiver fast path
    https://git.kernel.org/netdev/net/c/18ded910b589

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
diff mbox series

Patch

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 67f10d3ec240..fc445833b5e5 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5827,6 +5827,8 @@  void tcp_rcv_established(struct sock *sk, struct sk_buff *skb)
 				tcp_data_snd_check(sk);
 				if (!inet_csk_ack_scheduled(sk))
 					goto no_ack;
+			} else {
+				tcp_update_wl(tp, TCP_SKB_CB(skb)->seq);
 			}
 
 			__tcp_ack_snd_check(sk, 0);