diff mbox series

[net] tcp: fix tcp_mtu_probe() vs highest_sack

Message ID 1509430100.3828.12.camel@edumazet-glaptop3.roam.corp.google.com
State Accepted, archived
Delegated to: David Miller
Headers show
Series [net] tcp: fix tcp_mtu_probe() vs highest_sack | expand

Commit Message

Eric Dumazet Oct. 31, 2017, 6:08 a.m. UTC
From: Eric Dumazet <edumazet@google.com>

Based on SNMP values provided by Roman, Yuchung made the observation
that some crashes in tcp_sacktag_walk() might be caused by MTU probing.

Looking at tcp_mtu_probe(), I found that when a new skb was placed
in front of the write queue, we were not updating tcp highest sack.

If one skb is freed because all its content was copied to the new skb
(for MTU probing), then tp->highest_sack could point to a now freed skb.

Bad things would then happen, including infinite loops.

This patch renames tcp_highest_sack_combine() and uses it
from tcp_mtu_probe() to fix the bug.

Note that I also removed one test against tp->sacked_out,
since we want to replace tp->highest_sack regardless of whatever
condition, since keeping a stale pointer to freed skb is a recipe
for disaster.

Fixes: a47e5a988a57 ("[TCP]: Convert highest_sack to sk_buff to allow direct access")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Reported-by: Roman Gushchin <guro@fb.com>
Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
---
 include/net/tcp.h     |    6 +++---
 net/ipv4/tcp_output.c |    3 ++-
 2 files changed, 5 insertions(+), 4 deletions(-)

Comments

Alexei Starovoitov Oct. 31, 2017, 6:17 a.m. UTC | #1
On Mon, Oct 30, 2017 at 11:08:20PM -0700, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> Based on SNMP values provided by Roman, Yuchung made the observation
> that some crashes in tcp_sacktag_walk() might be caused by MTU probing.
> 
> Looking at tcp_mtu_probe(), I found that when a new skb was placed
> in front of the write queue, we were not updating tcp highest sack.
> 
> If one skb is freed because all its content was copied to the new skb
> (for MTU probing), then tp->highest_sack could point to a now freed skb.
> 
> Bad things would then happen, including infinite loops.
> 
> This patch renames tcp_highest_sack_combine() and uses it
> from tcp_mtu_probe() to fix the bug.
> 
> Note that I also removed one test against tp->sacked_out,
> since we want to replace tp->highest_sack regardless of whatever
> condition, since keeping a stale pointer to freed skb is a recipe
> for disaster.
> 
> Fixes: a47e5a988a57 ("[TCP]: Convert highest_sack to sk_buff to allow direct access")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Reported-by: Roman Gushchin <guro@fb.com>
> Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>

Thanks!

Acked-by: Alexei Starovoitov <ast@kernel.org>

wow. a bug from 2007.
Any idea why it only started to bite us in 4.11 ?

It's not trivial for us to reproduce it, but we will definitely
test the patch as soon as we can.
Do you have packet drill test or something for easy repro?
Eric Dumazet Oct. 31, 2017, 6:21 a.m. UTC | #2
On Mon, 2017-10-30 at 23:17 -0700, Alexei Starovoitov wrote:
> On Mon, Oct 30, 2017 at 11:08:20PM -0700, Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
> > 
> > Based on SNMP values provided by Roman, Yuchung made the observation
> > that some crashes in tcp_sacktag_walk() might be caused by MTU probing.
> > 
> > Looking at tcp_mtu_probe(), I found that when a new skb was placed
> > in front of the write queue, we were not updating tcp highest sack.
> > 
> > If one skb is freed because all its content was copied to the new skb
> > (for MTU probing), then tp->highest_sack could point to a now freed skb.
> > 
> > Bad things would then happen, including infinite loops.
> > 
> > This patch renames tcp_highest_sack_combine() and uses it
> > from tcp_mtu_probe() to fix the bug.
> > 
> > Note that I also removed one test against tp->sacked_out,
> > since we want to replace tp->highest_sack regardless of whatever
> > condition, since keeping a stale pointer to freed skb is a recipe
> > for disaster.
> > 
> > Fixes: a47e5a988a57 ("[TCP]: Convert highest_sack to sk_buff to allow direct access")
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> > Reported-by: Roman Gushchin <guro@fb.com>
> > Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
> 
> Thanks!
> 
> Acked-by: Alexei Starovoitov <ast@kernel.org>
> 
> wow. a bug from 2007.
> Any idea why it only started to bite us in 4.11 ?
> 
> It's not trivial for us to reproduce it, but we will definitely
> test the patch as soon as we can.
> Do you have packet drill test or something for easy repro?

I tried to cook a packetdrill test but could not trigger the issue.

When have you started to enable mtu probing ?

(Linux defaults to not enabling it )
Alexei Starovoitov Oct. 31, 2017, 6:30 a.m. UTC | #3
On Mon, Oct 30, 2017 at 11:21:42PM -0700, Eric Dumazet wrote:
> On Mon, 2017-10-30 at 23:17 -0700, Alexei Starovoitov wrote:
> > On Mon, Oct 30, 2017 at 11:08:20PM -0700, Eric Dumazet wrote:
> > > From: Eric Dumazet <edumazet@google.com>
> > > 
> > > Based on SNMP values provided by Roman, Yuchung made the observation
> > > that some crashes in tcp_sacktag_walk() might be caused by MTU probing.
> > > 
> > > Looking at tcp_mtu_probe(), I found that when a new skb was placed
> > > in front of the write queue, we were not updating tcp highest sack.
> > > 
> > > If one skb is freed because all its content was copied to the new skb
> > > (for MTU probing), then tp->highest_sack could point to a now freed skb.
> > > 
> > > Bad things would then happen, including infinite loops.
> > > 
> > > This patch renames tcp_highest_sack_combine() and uses it
> > > from tcp_mtu_probe() to fix the bug.
> > > 
> > > Note that I also removed one test against tp->sacked_out,
> > > since we want to replace tp->highest_sack regardless of whatever
> > > condition, since keeping a stale pointer to freed skb is a recipe
> > > for disaster.
> > > 
> > > Fixes: a47e5a988a57 ("[TCP]: Convert highest_sack to sk_buff to allow direct access")
> > > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > > Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> > > Reported-by: Roman Gushchin <guro@fb.com>
> > > Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
> > 
> > Thanks!
> > 
> > Acked-by: Alexei Starovoitov <ast@kernel.org>
> > 
> > wow. a bug from 2007.
> > Any idea why it only started to bite us in 4.11 ?
> > 
> > It's not trivial for us to reproduce it, but we will definitely
> > test the patch as soon as we can.
> > Do you have packet drill test or something for easy repro?
> 
> I tried to cook a packetdrill test but could not trigger the issue.
> 
> When have you started to enable mtu probing ?

for some time. somehow 4.6 based kernel didn't trigger it.
May be it's a different bug still...
Neal Cardwell Oct. 31, 2017, 1:51 p.m. UTC | #4
On Tue, Oct 31, 2017 at 2:08 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> Based on SNMP values provided by Roman, Yuchung made the observation
> that some crashes in tcp_sacktag_walk() might be caused by MTU probing.
>
> Looking at tcp_mtu_probe(), I found that when a new skb was placed
> in front of the write queue, we were not updating tcp highest sack.
>
> If one skb is freed because all its content was copied to the new skb
> (for MTU probing), then tp->highest_sack could point to a now freed skb.
>
> Bad things would then happen, including infinite loops.
>
> This patch renames tcp_highest_sack_combine() and uses it
> from tcp_mtu_probe() to fix the bug.
>
> Note that I also removed one test against tp->sacked_out,
> since we want to replace tp->highest_sack regardless of whatever
> condition, since keeping a stale pointer to freed skb is a recipe
> for disaster.
>
> Fixes: a47e5a988a57 ("[TCP]: Convert highest_sack to sk_buff to allow direct access")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Reported-by: Roman Gushchin <guro@fb.com>
> Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
> ---
>  include/net/tcp.h     |    6 +++---
>  net/ipv4/tcp_output.c |    3 ++-
>  2 files changed, 5 insertions(+), 4 deletions(-)

Acked-by: Neal Cardwell <ncardwell@google.com>

Nice catch, indeed. Thanks, Eric!

neal
Yuchung Cheng Nov. 1, 2017, 5:50 a.m. UTC | #5
On Mon, Oct 30, 2017 at 11:17 PM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Oct 30, 2017 at 11:08:20PM -0700, Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
> >
> > Based on SNMP values provided by Roman, Yuchung made the observation
> > that some crashes in tcp_sacktag_walk() might be caused by MTU probing.
> >
> > Looking at tcp_mtu_probe(), I found that when a new skb was placed
> > in front of the write queue, we were not updating tcp highest sack.
> >
> > If one skb is freed because all its content was copied to the new skb
> > (for MTU probing), then tp->highest_sack could point to a now freed skb.
> >
> > Bad things would then happen, including infinite loops.
> >
> > This patch renames tcp_highest_sack_combine() and uses it
> > from tcp_mtu_probe() to fix the bug.
> >
> > Note that I also removed one test against tp->sacked_out,
> > since we want to replace tp->highest_sack regardless of whatever
> > condition, since keeping a stale pointer to freed skb is a recipe
> > for disaster.
> >
> > Fixes: a47e5a988a57 ("[TCP]: Convert highest_sack to sk_buff to allow direct access")
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> > Reported-by: Roman Gushchin <guro@fb.com>
> > Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
>
> Thanks!
>
> Acked-by: Alexei Starovoitov <ast@kernel.org>
>
> wow. a bug from 2007.
> Any idea why it only started to bite us in 4.11 ?
FWIW some random guess:
Since RACK was confirmed to trigger the issue, and RACK enables
detecting lost retransmission w/o limited-transmit in CA_Loss state, I
guess RACK create a new type of "fast retransmit" that caused some
previously impossible SACK during MTU probing.

Acked-by: Yuchung Cheng <ycheng@google.com>


>
> It's not trivial for us to reproduce it, but we will definitely
> test the patch as soon as we can.
> Do you have packet drill test or something for easy repro?
>
David Miller Nov. 1, 2017, 12:20 p.m. UTC | #6
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 30 Oct 2017 23:08:20 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> Based on SNMP values provided by Roman, Yuchung made the observation
> that some crashes in tcp_sacktag_walk() might be caused by MTU probing.
> 
> Looking at tcp_mtu_probe(), I found that when a new skb was placed
> in front of the write queue, we were not updating tcp highest sack.
> 
> If one skb is freed because all its content was copied to the new skb
> (for MTU probing), then tp->highest_sack could point to a now freed skb.
> 
> Bad things would then happen, including infinite loops.
> 
> This patch renames tcp_highest_sack_combine() and uses it
> from tcp_mtu_probe() to fix the bug.
> 
> Note that I also removed one test against tp->sacked_out,
> since we want to replace tp->highest_sack regardless of whatever
> condition, since keeping a stale pointer to freed skb is a recipe
> for disaster.
> 
> Fixes: a47e5a988a57 ("[TCP]: Convert highest_sack to sk_buff to allow direct access")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Reported-by: Roman Gushchin <guro@fb.com>
> Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>

Applied and queued up for -stable.
Oleksandr Natalenko Nov. 3, 2017, 6:22 p.m. UTC | #7
Hi.

Thanks for the fix.

However, tcp_fastretrans_alert() warning case still remains open even with 
this patch. Do I understand correctly that these are 2 different issues?

Currently, I use latest 4.13 stable kernel + this patch and still get:

WARNING: CPU: 1 PID: 736 at net/ipv4/tcp_input.c:2826 tcp_fastretrans_alert
+0x7c8/0x990

Any idea on this?

On úterý 31. října 2017 7:08:20 CET Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> Based on SNMP values provided by Roman, Yuchung made the observation
> that some crashes in tcp_sacktag_walk() might be caused by MTU probing.
> 
> Looking at tcp_mtu_probe(), I found that when a new skb was placed
> in front of the write queue, we were not updating tcp highest sack.
> 
> If one skb is freed because all its content was copied to the new skb
> (for MTU probing), then tp->highest_sack could point to a now freed skb.
> 
> Bad things would then happen, including infinite loops.
> 
> This patch renames tcp_highest_sack_combine() and uses it
> from tcp_mtu_probe() to fix the bug.
> 
> Note that I also removed one test against tp->sacked_out,
> since we want to replace tp->highest_sack regardless of whatever
> condition, since keeping a stale pointer to freed skb is a recipe
> for disaster.
> 
> Fixes: a47e5a988a57 ("[TCP]: Convert highest_sack to sk_buff to allow direct
> access") Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Reported-by: Roman Gushchin <guro@fb.com>
> Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
> ---
>  include/net/tcp.h     |    6 +++---
>  net/ipv4/tcp_output.c |    3 ++-
>  2 files changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index
> 33599d17522d6a19b9d9a316cc1579cd5e71ee32..e6d0002a1b0bc5f28c331a760823c8dc9
> 2f8fe24 100644 --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -1771,12 +1771,12 @@ static inline void tcp_highest_sack_reset(struct
> sock *sk) tcp_sk(sk)->highest_sack = tcp_write_queue_head(sk);
>  }
> 
> -/* Called when old skb is about to be deleted (to be combined with new skb)
> */ -static inline void tcp_highest_sack_combine(struct sock *sk,
> +/* Called when old skb is about to be deleted and replaced by new skb */
> +static inline void tcp_highest_sack_replace(struct sock *sk,
>  					    struct sk_buff *old,
>  					    struct sk_buff *new)
>  {
> -	if (tcp_sk(sk)->sacked_out && (old == tcp_sk(sk)->highest_sack))
> +	if (old == tcp_highest_sack(sk))
>  		tcp_sk(sk)->highest_sack = new;
>  }
> 
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index
> ae60dd3faed0adc71731bc686f878afd4c628d32..823003eef3a21a5cc5c27e0be9f46159a
> fa060df 100644 --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2062,6 +2062,7 @@ static int tcp_mtu_probe(struct sock *sk)
>  	nskb->ip_summed = skb->ip_summed;
> 
>  	tcp_insert_write_queue_before(nskb, skb, sk);
> +	tcp_highest_sack_replace(sk, skb, nskb);
> 
>  	len = 0;
>  	tcp_for_write_queue_from_safe(skb, next, sk) {
> @@ -2665,7 +2666,7 @@ static bool tcp_collapse_retrans(struct sock *sk,
> struct sk_buff *skb) else if (!skb_shift(skb, next_skb, next_skb_size))
>  			return false;
>  	}
> -	tcp_highest_sack_combine(sk, next_skb, skb);
> +	tcp_highest_sack_replace(sk, next_skb, skb);
> 
>  	tcp_unlink_write_queue(next_skb, sk);
Eric Dumazet Nov. 3, 2017, 9:31 p.m. UTC | #8
On Fri, 2017-11-03 at 19:22 +0100, Oleksandr Natalenko wrote:
> Hi.
> 
> Thanks for the fix.
> 
> However, tcp_fastretrans_alert() warning case still remains open even with 
> this patch. Do I understand correctly that these are 2 different issues?
> 
> Currently, I use latest 4.13 stable kernel + this patch and still get:
> 
> WARNING: CPU: 1 PID: 736 at net/ipv4/tcp_input.c:2826 tcp_fastretrans_alert
> +0x7c8/


My patch only fixed the panics that you guys reported.

The warning issue in fastretrans is a separate problem,
we are still working on it, but at least the effects are not
catastrophic.
diff mbox series

Patch

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 33599d17522d6a19b9d9a316cc1579cd5e71ee32..e6d0002a1b0bc5f28c331a760823c8dc92f8fe24 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1771,12 +1771,12 @@  static inline void tcp_highest_sack_reset(struct sock *sk)
 	tcp_sk(sk)->highest_sack = tcp_write_queue_head(sk);
 }
 
-/* Called when old skb is about to be deleted (to be combined with new skb) */
-static inline void tcp_highest_sack_combine(struct sock *sk,
+/* Called when old skb is about to be deleted and replaced by new skb */
+static inline void tcp_highest_sack_replace(struct sock *sk,
 					    struct sk_buff *old,
 					    struct sk_buff *new)
 {
-	if (tcp_sk(sk)->sacked_out && (old == tcp_sk(sk)->highest_sack))
+	if (old == tcp_highest_sack(sk))
 		tcp_sk(sk)->highest_sack = new;
 }
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index ae60dd3faed0adc71731bc686f878afd4c628d32..823003eef3a21a5cc5c27e0be9f46159afa060df 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2062,6 +2062,7 @@  static int tcp_mtu_probe(struct sock *sk)
 	nskb->ip_summed = skb->ip_summed;
 
 	tcp_insert_write_queue_before(nskb, skb, sk);
+	tcp_highest_sack_replace(sk, skb, nskb);
 
 	len = 0;
 	tcp_for_write_queue_from_safe(skb, next, sk) {
@@ -2665,7 +2666,7 @@  static bool tcp_collapse_retrans(struct sock *sk, struct sk_buff *skb)
 		else if (!skb_shift(skb, next_skb, next_skb_size))
 			return false;
 	}
-	tcp_highest_sack_combine(sk, next_skb, skb);
+	tcp_highest_sack_replace(sk, next_skb, skb);
 
 	tcp_unlink_write_queue(next_skb, sk);