
inconsistent lock/deadlock crash, vanilla 3.3.4, 32bit, tcp

Message ID 1335689052.2900.86.camel@edumazet-glaptop
State RFC, archived
Delegated to: David Miller

Commit Message

Eric Dumazet April 29, 2012, 8:44 a.m. UTC
On Sun, 2012-04-29 at 10:40 +0200, Eric Dumazet wrote:
> On Sun, 2012-04-29 at 10:27 +0200, Eric Dumazet wrote:
> > On Sun, 2012-04-29 at 10:41 +0300, Denys Fedoryshchenko wrote:
> > > I apologize for the late-night post and for the noise left in the
> > > report. I will clean it up now and resend with the maintainers on CC.
> > > 
> > > The server's job is proxying, with a very high rate of new connections.
> > > At peak time the deadlock can easily be reproduced in 10-15 minutes.
> > > 
> > > The deadlock occurred on almost all 3.3-stable versions (tried 3.3.3 - 
> > > 3.3.4). It is not easy to try older kernels, but if required I can try.
> > > Also, because the SYN rate is very high, I usually see:
> > > [   51.612987] TCP: Possible SYN flooding on port 8080. Sending 
> > > cookies.  Check SNMP counters.
> > > 
> > >   [  762.903868]
> > >   [  762.903880] =================================
> > >   [  762.903890] [ INFO: inconsistent lock state ]
> > >   [  762.903903] 3.3.4-build-0061 #8 Not tainted
> > >   [  762.904133] ---------------------------------
> > >   [  762.904344] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
> > >   [  762.904542] squid/1603 [HC0[0]:SC0[0]:HE1:SE1] takes:
> > >   [  762.904542]  (key#3){+.?...}, at: [<c0232cc4>] 
> > > __percpu_counter_sum+0xd/0x58
> > >   [  762.904542] {IN-SOFTIRQ-W} state was registered at:
> > >   [  762.904542]   [<c0158b84>] __lock_acquire+0x284/0xc26
> > >   [  762.904542]   [<c01598e8>] lock_acquire+0x71/0x85
> > >   [  762.904542]   [<c0349765>] _raw_spin_lock+0x33/0x40
> > >   [  762.904542]   [<c0232c93>] __percpu_counter_add+0x58/0x7c
> > >   [  762.904542]   [<c02cfde1>] sk_clone_lock+0x1e5/0x200
> > >   [  762.904542]   [<c0303ee4>] inet_csk_clone_lock+0xe/0x78
> > >   [  762.904542]   [<c0315778>] tcp_create_openreq_child+0x1b/0x404
> > >   [  762.904542]   [<c031339c>] tcp_v4_syn_recv_sock+0x32/0x1c1
> > >   [  762.904542]   [<c031615a>] tcp_check_req+0x1fd/0x2d7
> > >   [  762.904542]   [<c0313f77>] tcp_v4_do_rcv+0xab/0x194
> > >   [  762.904542]   [<c03153bb>] tcp_v4_rcv+0x3b3/0x5cc
> > >   [  762.904542]   [<c02fc0c4>] ip_local_deliver_finish+0x13a/0x1e9
> > >   [  762.904542]   [<c02fc539>] NF_HOOK.clone.11+0x46/0x4d
> > >   [  762.904542]   [<c02fc652>] ip_local_deliver+0x41/0x45
> > >   [  762.904542]   [<c02fc4d1>] ip_rcv_finish+0x31a/0x33c
> > >   [  762.904542]   [<c02fc539>] NF_HOOK.clone.11+0x46/0x4d
> > >   [  762.904542]   [<c02fc857>] ip_rcv+0x201/0x23e
> > >   [  762.904542]   [<c02daa3a>] __netif_receive_skb+0x319/0x368
> > >   [  762.904542]   [<c02dac07>] netif_receive_skb+0x4e/0x7d
> > >   [  762.904542]   [<c02dacf6>] napi_skb_finish+0x1e/0x34
> > >   [  762.904542]   [<c02db122>] napi_gro_receive+0x20/0x24
> > >   [  762.904542]   [<f85d1743>] e1000_receive_skb+0x3f/0x45 [e1000e]
> > >   [  762.904542]   [<f85d3464>] e1000_clean_rx_irq+0x1f9/0x284 [e1000e]
> > >   [  762.904542]   [<f85d3926>] e1000_clean+0x62/0x1f4 [e1000e]
> > >   [  762.904542]   [<c02db228>] net_rx_action+0x90/0x160
> > >   [  762.904542]   [<c012a445>] __do_softirq+0x7b/0x118
> > >   [  762.904542] irq event stamp: 156915469
> > >   [  762.904542] hardirqs last  enabled at (156915469): [<c019b4f4>] 
> > > __slab_alloc.clone.58.clone.63+0xc4/0x2de
> > >   [  762.904542] hardirqs last disabled at (156915468): [<c019b452>] 
> > > __slab_alloc.clone.58.clone.63+0x22/0x2de
> > >   [  762.904542] softirqs last  enabled at (156915466): [<c02ce677>] 
> > > lock_sock_nested+0x64/0x6c
> > >   [  762.904542] softirqs last disabled at (156915464): [<c0349914>] 
> > > _raw_spin_lock_bh+0xe/0x45
> > >   [  762.904542]
> > >   [  762.904542] other info that might help us debug this:
> > >   [  762.904542]  Possible unsafe locking scenario:
> > >   [  762.904542]
> > >   [  762.904542]        CPU0
> > >   [  762.904542]        ----
> > >   [  762.904542]   lock(key#3);
> > >   [  762.904542]   <Interrupt>
> > >   [  762.904542]     lock(key#3);
> > >   [  762.904542]
> > >   [  762.904542]  *** DEADLOCK ***
> > >   [  762.904542]
> > >   [  762.904542] 1 lock held by squid/1603:
> > >   [  762.904542]  #0:  (sk_lock-AF_INET){+.+.+.}, at: [<c03055c0>] 
> > > lock_sock+0xa/0xc
> > >   [  762.904542]
> > >   [  762.904542] stack backtrace:
> > >   [  762.904542] Pid: 1603, comm: squid Not tainted 3.3.4-build-0061 #8
> > >   [  762.904542] Call Trace:
> > >   [  762.904542]  [<c0347b73>] ? printk+0x18/0x1d
> > >   [  762.904542]  [<c015873a>] valid_state+0x1f6/0x201
> > >   [  762.904542]  [<c0158816>] mark_lock+0xd1/0x1bb
> > >   [  762.904542]  [<c015876b>] ? mark_lock+0x26/0x1bb
> > >   [  762.904542]  [<c015805d>] ? check_usage_forwards+0x77/0x77
> > >   [  762.904542]  [<c0158bf8>] __lock_acquire+0x2f8/0xc26
> > >   [  762.904542]  [<c0159b8e>] ? mark_held_locks+0x5d/0x7b
> > >   [  762.904542]  [<c0159cf6>] ? trace_hardirqs_on+0xb/0xd
> > >   [  762.904542]  [<c0158dd4>] ? __lock_acquire+0x4d4/0xc26
> > >   [  762.904542]  [<c01598e8>] lock_acquire+0x71/0x85
> > >   [  762.904542]  [<c0232cc4>] ? __percpu_counter_sum+0xd/0x58
> > >   [  762.904542]  [<c0349765>] _raw_spin_lock+0x33/0x40
> > >   [  762.904542]  [<c0232cc4>] ? __percpu_counter_sum+0xd/0x58
> > >   [  762.904542]  [<c0232cc4>] __percpu_counter_sum+0xd/0x58
> > >   [  762.904542]  [<c02cebc4>] __sk_mem_schedule+0xdd/0x1c7
> > >   [  762.904542]  [<c02d178d>] ? __alloc_skb+0x76/0x100
> > >   [  762.904542]  [<c0305e8e>] sk_wmem_schedule+0x21/0x2d
> > >   [  762.904542]  [<c0306370>] sk_stream_alloc_skb+0x42/0xaa
> > >   [  762.904542]  [<c0306567>] tcp_sendmsg+0x18f/0x68b
> > >   [  762.904542]  [<c031f3dc>] ? ip_fast_csum+0x30/0x30
> > >   [  762.904542]  [<c0320193>] inet_sendmsg+0x53/0x5a
> > >   [  762.904542]  [<c02cb633>] sock_aio_write+0xd2/0xda
> > >   [  762.904542]  [<c015876b>] ? mark_lock+0x26/0x1bb
> > >   [  762.904542]  [<c01a1017>] do_sync_write+0x9f/0xd9
> > >   [  762.904542]  [<c01a2111>] ? file_free_rcu+0x2f/0x2f
> > >   [  762.904542]  [<c01a17a1>] vfs_write+0x8f/0xab
> > >   [  762.904542]  [<c01a284d>] ? fget_light+0x75/0x7c
> > >   [  762.904542]  [<c01a1900>] sys_write+0x3d/0x5e
> > >   [  762.904542]  [<c0349ec9>] syscall_call+0x7/0xb
> > >   [  762.904542]  [<c0340000>] ? rp_sidt+0x41/0x83
> > > 
> > 
> > 
> > OK, so under memory pressure we can end up calling
> > percpu_counter_sum_positive() with softirqs enabled, and lockdep
> > complains...
> > 
> > This bug was probably added in 2008, in commit 1748376b6626a
> > ("net: Use a percpu_counter for sockets_allocated")
> 
> Hmm, no, that commit was fine.
> 
> The bug was in fact added by Glauber Costa in commit 180d8cd942ce336b2c869
> ("foundations of per-cgroup memory pressure controlling").
> 
> He replaced the safe percpu_counter_read_positive() call with the unsafe
> percpu_counter_sum_positive() in
> sk_sockets_allocated_read_positive().
> 
> But anyway the patch I sent should fix the problem.

Thinking again, I am not sure why Glauber made this change. He probably
made a typo or something.

Maybe we should revert it and use the following patch instead, since we
don't need a precise sum at this point.

This better matches the intent of sk_sockets_allocated_read_positive()
anyway.
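
For reference, here is a simplified sketch of the two accessors
(approximating the 3.3-era include/linux/percpu_counter.h and
lib/percpu_counter.c, not the exact source). The read variant is lockless
and approximate; the sum variant folds every CPU's delta under fbc->lock,
the very lock __percpu_counter_add() takes from the RX softirq via
sk_clone_lock(), which is the inconsistency lockdep reports above:

/* Simplified sketch, not the exact kernel source. */

/* Lockless and approximate: just reads the shared count.  Safe in any
 * context; this is all sk_sockets_allocated_read_positive() needs. */
static inline s64 percpu_counter_read_positive(struct percpu_counter *fbc)
{
	s64 ret = fbc->count;

	barrier();	/* keep the compiler from re-reading fbc->count */
	return ret >= 0 ? ret : 0;
}

/* Exact: folds every CPU's local delta under fbc->lock.  The lock is
 * taken without disabling BHs, yet __percpu_counter_add() grabs the same
 * lock from softirq context (sk_clone_lock() on the RX path), hence the
 * {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} report. */
static s64 percpu_counter_sum_positive(struct percpu_counter *fbc)
{
	s64 ret;
	int cpu;

	raw_spin_lock(&fbc->lock);
	ret = fbc->count;
	for_each_online_cpu(cpu)
		ret += *per_cpu_ptr(fbc->counters, cpu);
	raw_spin_unlock(&fbc->lock);

	return ret >= 0 ? ret : 0;
}

The caller only compares the value against the memory-pressure limits, so
the approximate read is more than enough here.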




Comments

Glauber Costa April 30, 2012, 4:45 a.m. UTC | #1
On 04/29/2012 05:44 AM, Eric Dumazet wrote:
> On Sun, 2012-04-29 at 10:40 +0200, Eric Dumazet wrote:
>> On Sun, 2012-04-29 at 10:27 +0200, Eric Dumazet wrote:
>>> On Sun, 2012-04-29 at 10:41 +0300, Denys Fedoryshchenko wrote:
>>>> [original report and lockdep trace snipped; quoted in full above]
>>>
>>>
>>> OK, so under memory pressure we can end up calling
>>> percpu_counter_sum_positive() with softirqs enabled, and lockdep
>>> complains...
>>>
>>> This bug was probably added in 2008, in commit 1748376b6626a
>>> ("net: Use a percpu_counter for sockets_allocated")
>>
>> Hmm, no, that commit was fine.
>>
>> The bug was in fact added by Glauber Costa in commit 180d8cd942ce336b2c869
>> ("foundations of per-cgroup memory pressure controlling").
>>
>> He replaced the safe percpu_counter_read_positive() call with the unsafe
>> percpu_counter_sum_positive() in
>> sk_sockets_allocated_read_positive().
>>
>> But anyway the patch I sent should fix the problem.
>
> Thinking again, I am not sure why Glauber made this change. He probably
> made a typo or something.
>

Indeed.

It wasn't my intent to change that; it was a mistake.

Very sorry.
David Miller April 30, 2012, 4:49 p.m. UTC | #2
From: Glauber Costa <glommer@parallels.com>
Date: Mon, 30 Apr 2012 01:45:29 -0300

> It wasn't my intent to change that, it was a mistake.

I very much regret succumbing to the unreasonable pressure to apply
your original patches.

We've come to learn that they were full of errors and that they needed
to cook for several more months before being included.

Patch

diff --git a/include/net/sock.h b/include/net/sock.h
index 188532e..5a0a58a 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1129,9 +1129,9 @@  sk_sockets_allocated_read_positive(struct sock *sk)
 	struct proto *prot = sk->sk_prot;
 
 	if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
-		return percpu_counter_sum_positive(sk->sk_cgrp->sockets_allocated);
+		return percpu_counter_read_positive(sk->sk_cgrp->sockets_allocated);
 
-	return percpu_counter_sum_positive(prot->sockets_allocated);
+	return percpu_counter_read_positive(prot->sockets_allocated);
 }
 
 static inline int
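
As a side note beyond this patch, here is a hypothetical illustration of
the general rule, not code from the thread: when a percpu_counter is
updated from softirq context, as sockets_allocated is via sk_clone_lock(),
a process-context reader either uses the lockless approximate read, or has
to keep softirqs out around an exact sum.

/* Hypothetical example; "ctr" stands in for a counter such as
 * prot->sockets_allocated that is also incremented from the RX softirq. */
static s64 read_counter_from_process_context(struct percpu_counter *ctr)
{
	s64 val;

	/* Approximate but lockless: safe regardless of who updates ctr. */
	val = percpu_counter_read_positive(ctr);

	/*
	 * If an exact value were really required, one way would be to keep
	 * softirqs out while the counter lock is held, e.g.:
	 *
	 *	local_bh_disable();
	 *	val = percpu_counter_sum_positive(ctr);
	 *	local_bh_enable();
	 */
	return val;
}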