diff mbox

BUG: soft lockup - CPU#6 stuck for 22s! [httpd2-event:15597]

Message ID 1345895225.19483.447.camel@edumazet-glaptop
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Eric Dumazet Aug. 25, 2012, 11:47 a.m. UTC
On Sat, 2012-08-25 at 11:14 +0200, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> On Sat, 2012-08-25 at 10:59 +0200, Eric Dumazet wrote:
> > On Fri, 2012-08-24 at 20:50 -0400, Cristian Rodríguez wrote:
> > > Hi, the issue I reported with IPV6 few weeks ago seems to be gone, but
> > > now I am getting the following crash..
> 
> > Oh, I now see the bug, I'll send a patch asap
> 
> Please try the following fix.
> 
> Thanks !

Well, this v2 seems cleaner :

[PATCH v2] tcp: tcp_slow_start() should not decrease snd_cwnd

Cristian Rodríguez reported various lockups in TCP stack,
introduced by commit 9dc274151a548 (tcp: fix ABC in tcp_slow_start())

We could exit tcp_slow_start() with a zeroed snd_cwnd,
and next time we enter tcp_slow_start(), we run an infinite loop.

Reported-by: Cristian Rodríguez <crrodriguez@opensuse.org>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_cong.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Cristian Rodríguez Aug. 25, 2012, 5:45 p.m. UTC | #1
El 25/08/12 07:47, Eric Dumazet escribió:
> On Sat, 2012-08-25 at 11:14 +0200, Eric Dumazet wrote:
>> From: Eric Dumazet <edumazet@google.com>
>>
>> On Sat, 2012-08-25 at 10:59 +0200, Eric Dumazet wrote:
>>> On Fri, 2012-08-24 at 20:50 -0400, Cristian Rodríguez wrote:
>>>> Hi, the issue I reported with IPV6 few weeks ago seems to be gone, but
>>>> now I am getting the following crash..
>>
>>> Oh, I now see the bug, I'll send a patch asap
>>
>> Please try the following fix.
>>
>> Thanks !

Eric, thanks for your work, I have deployed the fix in one machine and
will keep an eye for any other anomaly.

As it happends around twice a day, it would take a while to determine if
this issue has gone away.

Keep up great work :-)


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Aug. 25, 2012, 5:52 p.m. UTC | #2
On Sat, 2012-08-25 at 13:47 +0200, Eric Dumazet wrote:
> On Sat, 2012-08-25 at 11:14 +0200, Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
> > 
> > On Sat, 2012-08-25 at 10:59 +0200, Eric Dumazet wrote:
> > > On Fri, 2012-08-24 at 20:50 -0400, Cristian Rodríguez wrote:
> > > > Hi, the issue I reported with IPV6 few weeks ago seems to be gone, but
> > > > now I am getting the following crash..
> > 
> > > Oh, I now see the bug, I'll send a patch asap
> > 
> > Please try the following fix.
> > 
> > Thanks !
> 
> Well, this v2 seems cleaner :
> 
> [PATCH v2] tcp: tcp_slow_start() should not decrease snd_cwnd
> 
> Cristian Rodríguez reported various lockups in TCP stack,
> introduced by commit 9dc274151a548 (tcp: fix ABC in tcp_slow_start())
> 
> We could exit tcp_slow_start() with a zeroed snd_cwnd,
> and next time we enter tcp_slow_start(), we run an infinite loop.
> 
> Reported-by: Cristian Rodríguez <crrodriguez@opensuse.org>
> Cc: Yuchung Cheng <ycheng@google.com>
> Cc: Neal Cardwell <ncardwell@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
>  net/ipv4/tcp_cong.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
> index 1432cdb..e656c72 100644
> --- a/net/ipv4/tcp_cong.c
> +++ b/net/ipv4/tcp_cong.c
> @@ -337,7 +337,7 @@ void tcp_slow_start(struct tcp_sock *tp)
>  		tp->snd_cwnd_cnt -= tp->snd_cwnd;
>  		delta++;
>  	}
> -	tp->snd_cwnd = min(tp->snd_cwnd + delta, tp->snd_cwnd_clamp);
> +	tp->snd_cwnd = clamp(tp->snd_cwnd + delta, tp->snd_cwnd, tp->snd_cwnd_clamp);
>  }
>  EXPORT_SYMBOL_GPL(tcp_slow_start);
>  
> 

Hmm...

We probably have a bug in tcp_metrics.c, because snd_cwnd_clamp should
not be zero.

With RCU, it seems following code in tcpm_new() is racy :

tm->tcpm_addr = *addr;
...
tcpm_suck_dst(tm, dst);

Coupled with the code in tcpm_suck_dst(tm, dst) 

static void tcpm_suck_dst(struct tcp_metrics_block *tm, struct dst_entry *dst)
{
        u32 val;

        tm->tcpm_stamp = jiffies;

        val = 0;
        if (dst_metric_locked(dst, RTAX_RTT))
                val |= 1 << TCP_METRIC_RTT;
        if (dst_metric_locked(dst, RTAX_RTTVAR))
                val |= 1 << TCP_METRIC_RTTVAR;
        if (dst_metric_locked(dst, RTAX_SSTHRESH))
                val |= 1 << TCP_METRIC_SSTHRESH;
        if (dst_metric_locked(dst, RTAX_CWND))
                val |= 1 << TCP_METRIC_CWND;
        if (dst_metric_locked(dst, RTAX_REORDERING))
                val |= 1 << TCP_METRIC_REORDERING;
        tm->tcpm_lock = val;

// HERE we set tcpm_lock before the tcpm_vals[]

        tm->tcpm_vals[TCP_METRIC_RTT] = dst_metric_raw(dst, RTAX_RTT);
        tm->tcpm_vals[TCP_METRIC_RTTVAR] = dst_metric_raw(dst, RTAX_RTTVAR);
        tm->tcpm_vals[TCP_METRIC_SSTHRESH] = dst_metric_raw(dst, RTAX_SSTHRESH);
        tm->tcpm_vals[TCP_METRIC_CWND] = dst_metric_raw(dst, RTAX_CWND);
        tm->tcpm_vals[TCP_METRIC_REORDERING] = dst_metric_raw(dst, RTAX_REORDERING);
        tm->tcpm_ts = 0;
        tm->tcpm_ts_stamp = 0;
        tm->tcpm_fastopen.mss = 0;
        tm->tcpm_fastopen.syn_loss = 0;
        tm->tcpm_fastopen.cookie.len = 0;
}



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Neal Cardwell Aug. 27, 2012, 4:02 a.m. UTC | #3
On Sat, Aug 25, 2012 at 7:47 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Sat, 2012-08-25 at 11:14 +0200, Eric Dumazet wrote:
>> From: Eric Dumazet <edumazet@google.com>
>>
>> On Sat, 2012-08-25 at 10:59 +0200, Eric Dumazet wrote:
>> > On Fri, 2012-08-24 at 20:50 -0400, Cristian Rodríguez wrote:
>> > > Hi, the issue I reported with IPV6 few weeks ago seems to be gone, but
>> > > now I am getting the following crash..
>>
>> > Oh, I now see the bug, I'll send a patch asap
>>
>> Please try the following fix.
>>
>> Thanks !
>
> Well, this v2 seems cleaner :
>
> [PATCH v2] tcp: tcp_slow_start() should not decrease snd_cwnd
>
> Cristian Rodríguez reported various lockups in TCP stack,
> introduced by commit 9dc274151a548 (tcp: fix ABC in tcp_slow_start())
>
> We could exit tcp_slow_start() with a zeroed snd_cwnd,
> and next time we enter tcp_slow_start(), we run an infinite loop.
>
> Reported-by: Cristian Rodríguez <crrodriguez@opensuse.org>
> Cc: Yuchung Cheng <ycheng@google.com>
> Cc: Neal Cardwell <ncardwell@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
>  net/ipv4/tcp_cong.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
> index 1432cdb..e656c72 100644
> --- a/net/ipv4/tcp_cong.c
> +++ b/net/ipv4/tcp_cong.c
> @@ -337,7 +337,7 @@ void tcp_slow_start(struct tcp_sock *tp)
>                 tp->snd_cwnd_cnt -= tp->snd_cwnd;
>                 delta++;
>         }
> -       tp->snd_cwnd = min(tp->snd_cwnd + delta, tp->snd_cwnd_clamp);
> +       tp->snd_cwnd = clamp(tp->snd_cwnd + delta, tp->snd_cwnd, tp->snd_cwnd_clamp);

AFAICT if tcp_slow_start() is changing snd_cwnd from non-zero to zero
then this is because snd_cwnd_clamp is zero here, as you theorize may
be happening to races somewhere.

However, AFAICT from reading the min() and clamp() macros, this code
with clamp() will still have the same problem as the existing code
that uses min: if snd_cwnd_clamp is 0 then snd_cwnd will end up 0
here. (This is because the clamp() macro implicitly assumes that the
max value is above the min value, and filters agains the max last.)

neal
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Aug. 27, 2012, 4:06 a.m. UTC | #4
On Mon, 2012-08-27 at 00:02 -0400, Neal Cardwell wrote:
> On Sat, Aug 25, 2012 at 7:47 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Sat, 2012-08-25 at 11:14 +0200, Eric Dumazet wrote:
> >> From: Eric Dumazet <edumazet@google.com>
> >>
> >> On Sat, 2012-08-25 at 10:59 +0200, Eric Dumazet wrote:
> >> > On Fri, 2012-08-24 at 20:50 -0400, Cristian Rodríguez wrote:
> >> > > Hi, the issue I reported with IPV6 few weeks ago seems to be gone, but
> >> > > now I am getting the following crash..
> >>
> >> > Oh, I now see the bug, I'll send a patch asap
> >>
> >> Please try the following fix.
> >>
> >> Thanks !
> >
> > Well, this v2 seems cleaner :
> >
> > [PATCH v2] tcp: tcp_slow_start() should not decrease snd_cwnd
> >
> > Cristian Rodríguez reported various lockups in TCP stack,
> > introduced by commit 9dc274151a548 (tcp: fix ABC in tcp_slow_start())
> >
> > We could exit tcp_slow_start() with a zeroed snd_cwnd,
> > and next time we enter tcp_slow_start(), we run an infinite loop.
> >
> > Reported-by: Cristian Rodríguez <crrodriguez@opensuse.org>
> > Cc: Yuchung Cheng <ycheng@google.com>
> > Cc: Neal Cardwell <ncardwell@google.com>
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > ---
> >  net/ipv4/tcp_cong.c |    2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
> > index 1432cdb..e656c72 100644
> > --- a/net/ipv4/tcp_cong.c
> > +++ b/net/ipv4/tcp_cong.c
> > @@ -337,7 +337,7 @@ void tcp_slow_start(struct tcp_sock *tp)
> >                 tp->snd_cwnd_cnt -= tp->snd_cwnd;
> >                 delta++;
> >         }
> > -       tp->snd_cwnd = min(tp->snd_cwnd + delta, tp->snd_cwnd_clamp);
> > +       tp->snd_cwnd = clamp(tp->snd_cwnd + delta, tp->snd_cwnd, tp->snd_cwnd_clamp);
> 
> AFAICT if tcp_slow_start() is changing snd_cwnd from non-zero to zero
> then this is because snd_cwnd_clamp is zero here, as you theorize may
> be happening to races somewhere.
> 
> However, AFAICT from reading the min() and clamp() macros, this code
> with clamp() will still have the same problem as the existing code
> that uses min: if snd_cwnd_clamp is 0 then snd_cwnd will end up 0
> here. (This is because the clamp() macro implicitly assumes that the
> max value is above the min value, and filters agains the max last.)
> 

Indeed, so the first patch was better...

Not sure I can investigate this problem this week, as I attend LKS/LPC
in San Diego.

Could be that snd_cwnd is zero as well so we have this infinite loop...

        while (tp->snd_cwnd_cnt >= tp->snd_cwnd) {
                tp->snd_cwnd_cnt -= tp->snd_cwnd;
                delta++;
        }



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Neal Cardwell Aug. 30, 2012, 1:37 a.m. UTC | #5
On Mon, Aug 27, 2012 at 12:06 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Indeed, so the first patch was better...
>
> Not sure I can investigate this problem this week, as I attend LKS/LPC
> in San Diego.
>
> Could be that snd_cwnd is zero as well so we have this infinite loop...
>
>         while (tp->snd_cwnd_cnt >= tp->snd_cwnd) {
>                 tp->snd_cwnd_cnt -= tp->snd_cwnd;
>                 delta++;
>         }

Yes, it seems that one way or another cwnd is getting to zero and
turning this into an infinite loop.

Two questions for you, Cristian, if you have a second:

1) Are you still seeing this problem in your workload? If so, would
you have time to try another small patch to add instrumentation to
track down the cause?

2) Do you happen to run with the tcp_mtu_probing sysctl enabled? I
have been looking for code paths that could cause the cwnd to go to
zero, and that's the first possibility that I've been able to see (in
tcp_mtup_probe_success() it seems like there are corner cases where
snd_cwnd could become zero).

neal
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Cristian Rodríguez Aug. 30, 2012, 3:05 a.m. UTC | #6
El mié 29 ago 2012 21:37:46 CLT, Neal Cardwell escribió:

Hi neal, thanks for taking a look at this..

> 1) Are you still seeing this problem in your workload? If so, would
> you have time to try another small patch to add instrumentation to
> track down the cause?

After Eric's patch I no longer see this backtrace or hang, however the 
machine does
hang apparently for another reason that I have been unable to capture
no clue at all in the logs and the KVM-over-ip video does not respond 
after hang, needing a reset,
that's strange though, as it is usually able to display kernel messages 
after a crash.

> 2) Do you happen to run with the tcp_mtu_probing sysctl enabled?

   No, sysctl returns : net.ipv4.tcp_mtu_probing = 0


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index 1432cdb..e656c72 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -337,7 +337,7 @@  void tcp_slow_start(struct tcp_sock *tp)
 		tp->snd_cwnd_cnt -= tp->snd_cwnd;
 		delta++;
 	}
-	tp->snd_cwnd = min(tp->snd_cwnd + delta, tp->snd_cwnd_clamp);
+	tp->snd_cwnd = clamp(tp->snd_cwnd + delta, tp->snd_cwnd, tp->snd_cwnd_clamp);
 }
 EXPORT_SYMBOL_GPL(tcp_slow_start);