[4.1.3-rt8,report,cpuhotplug] BUG: spinlock bad magic on CPU#0, sh/137

Message ID 5617CE6D.9060800@ti.com
State RFC, archived
Delegated to: David Miller

Commit Message

Grygorii Strashko Oct. 9, 2015, 2:25 p.m. UTC
Hi All,

I consistently see the error report below with the 4.1 RT kernel on a TI ARM dra7-evm
when I try to unplug cpu1:

[   57.737589] CPU1: shutdown
[   57.767537] BUG: spinlock bad magic on CPU#0, sh/137
[   57.767546]  lock: 0xee994730, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
[   57.767552] CPU: 0 PID: 137 Comm: sh Not tainted 4.1.10-rt8-01700-g2c38702-dirty #55
[   57.767555] Hardware name: Generic DRA74X (Flattened Device Tree)
[   57.767568] [<c001acd0>] (unwind_backtrace) from [<c001534c>] (show_stack+0x20/0x24)
[   57.767579] [<c001534c>] (show_stack) from [<c075560c>] (dump_stack+0x84/0xa0)
[   57.767593] [<c075560c>] (dump_stack) from [<c00aca48>] (spin_dump+0x84/0xac)
[   57.767603] [<c00aca48>] (spin_dump) from [<c00acaa4>] (spin_bug+0x34/0x38)
[   57.767614] [<c00acaa4>] (spin_bug) from [<c00acc10>] (do_raw_spin_lock+0x168/0x1c0)
[   57.767624] [<c00acc10>] (do_raw_spin_lock) from [<c075b4cc>] (_raw_spin_lock+0x4c/0x54)
[   57.767631] [<c075b4cc>] (_raw_spin_lock) from [<c07599fc>] (rt_spin_lock_slowlock+0x5c/0x374)
[   57.767638] [<c07599fc>] (rt_spin_lock_slowlock) from [<c075bcf4>] (rt_spin_lock+0x38/0x70)
[   57.767649] [<c075bcf4>] (rt_spin_lock) from [<c06333c0>] (skb_dequeue+0x28/0x7c)
[   57.767662] [<c06333c0>] (skb_dequeue) from [<c06476ec>] (dev_cpu_callback+0x1b8/0x240)
[   57.767673] [<c06476ec>] (dev_cpu_callback) from [<c007566c>] (notifier_call_chain+0x3c/0xb4)
[   57.767683] [<c007566c>] (notifier_call_chain) from [<c0075708>] (__raw_notifier_call_chain+0x24/0x2c)
[   57.767692] [<c0075708>] (__raw_notifier_call_chain) from [<c004f2a4>] (cpu_notify+0x34/0x50)
[   57.767699] [<c004f2a4>] (cpu_notify) from [<c004f65c>] (cpu_notify_nofail+0x18/0x24)
[   57.767707] [<c004f65c>] (cpu_notify_nofail) from [<c074f304>] (_cpu_down+0x3e8/0x55c)
[   57.767715] [<c074f304>] (_cpu_down) from [<c004ff74>] (disable_nonboot_cpus+0x118/0x5dc)
[   57.767722] [<c004ff74>] (disable_nonboot_cpus) from [<c00b091c>] (suspend_enter+0x2c4/0xd18)
[   57.767730] [<c00b091c>] (suspend_enter) from [<c00b1454>] (suspend_devices_and_enter+0xe4/0x65c)
[   57.767737] [<c00b1454>] (suspend_devices_and_enter) from [<c00b208c>] (enter_state+0x6c0/0x1050)
[   57.767744] [<c00b208c>] (enter_state) from [<c00b2a40>] (pm_suspend+0x24/0x84)
[   57.767751] [<c00b2a40>] (pm_suspend) from [<c00af460>] (state_store+0x74/0xc8)
[   57.767760] [<c00af460>] (state_store) from [<c040a660>] (kobj_attr_store+0x1c/0x28)
[   57.767771] [<c040a660>] (kobj_attr_store) from [<c024563c>] (sysfs_kf_write+0x5c/0x60)
[   57.767781] [<c024563c>] (sysfs_kf_write) from [<c0244720>] (kernfs_fop_write+0xc8/0x1ac)
[   57.767792] [<c0244720>] (kernfs_fop_write) from [<c01c3974>] (__vfs_write+0x38/0xec)
[   57.767801] [<c01c3974>] (__vfs_write) from [<c01c4290>] (vfs_write+0xa0/0x174)
[   57.767811] [<c01c4290>] (vfs_write) from [<c01c4b30>] (SyS_write+0x54/0xb0)
[   57.767822] [<c01c4b30>] (SyS_write) from [<c0010b20>] (ret_fast_syscall+0x0/0x54)
[   57.768224] Powerdomain (l3init_pwrdm) didn't enter target state 1

I'm working with TI RT-kernel:
git://git.ti.com/ti-linux-kernel/ti-linux-kernel.git
branch: ti-rt-linux-4.1.y

It looks like this backtrace was introduced by

commit 91df05da13a6c6c358e71182e80f19f3c48d1615
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Tue Jul 12 15:38:34 2011 +0200

    net: Use skbufhead with raw lock


I see a potential fix for this issue, as below:

index 4969c0d..f8c23de 100644

input_pkt_queue is a per-cpu queue and at this point the cpu is already dead,
so no one else should touch it. But I'm not sure if my assumption is correct.

Comments

Thomas Gleixner Oct. 12, 2015, 8:16 a.m. UTC | #1
On Fri, 9 Oct 2015, Grygorii Strashko wrote:
> I can constantly see below error report with 4.1 RT-kernel on TI ARM dra7-evm 
> if I'm trying to unplug cpu1:
> 
> [   57.737589] CPU1: shutdown
> [   57.767537] BUG: spinlock bad magic on CPU#0, sh/137
> [   57.767546]  lock: 0xee994730, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0

> It looks like this backtrace was introduced by
> 
> commit 91df05da13a6c6c358e71182e80f19f3c48d1615
> net: Use skbufhead with raw lock
>
> I see the potential fix for this issue as below: 
> 
> index 4969c0d..f8c23de 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -7217,7 +7217,7 @@ static int dev_cpu_callback(struct notifier_block *nfb,
>                 netif_rx_ni(skb);
>                 input_queue_head_incr(oldsd);
>         }
> -       while ((skb = skb_dequeue(&oldsd->input_pkt_queue))) {
> +       while ((skb = __skb_dequeue(&oldsd->input_pkt_queue))) {

Your patch is white space damaged ....

>                 netif_rx_ni(skb);
>                 input_queue_head_incr(oldsd);
>         }
> 
> input_pkt_queue is per-cpu queue and at this moment cpu is dead already,
> so no one should touch it. But I'm not sure if my assumption is correct.

It is. Picking it up for the next release

Thanks,

	tglx
Grygorii Strashko Oct. 13, 2015, 6:25 p.m. UTC | #2
On 10/12/2015 11:16 AM, Thomas Gleixner wrote:
> On Fri, 9 Oct 2015, Grygorii Strashko wrote:
>> I can constantly see below error report with 4.1 RT-kernel on TI ARM dra7-evm
>> if I'm trying to unplug cpu1:
>>
>> [   57.737589] CPU1: shutdown
>> [   57.767537] BUG: spinlock bad magic on CPU#0, sh/137
>> [   57.767546]  lock: 0xee994730, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
> 
>> It looks like this backtrace was introduced by
>>
>> commit 91df05da13a6c6c358e71182e80f19f3c48d1615
>> net: Use skbufhead with raw lock
>>
>> I see the potential fix for this issue as below:
>>
>> index 4969c0d..f8c23de 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -7217,7 +7217,7 @@ static int dev_cpu_callback(struct notifier_block *nfb,
>>                  netif_rx_ni(skb);
>>                  input_queue_head_incr(oldsd);
>>          }
>> -       while ((skb = skb_dequeue(&oldsd->input_pkt_queue))) {
>> +       while ((skb = __skb_dequeue(&oldsd->input_pkt_queue))) {
> 
> Your patch is white space damaged ....
> 
>>                  netif_rx_ni(skb);
>>                  input_queue_head_incr(oldsd);
>>          }
>>
>> input_pkt_queue is per-cpu queue and at this moment cpu is dead already,
>> so no one should touch it. But I'm not sure if my assumption is correct.
> 
> It is. Picking it up for the next release
> 

Thanks & sorry. I did not expect this diff/hack to be applied as a patch :(

Patch

--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7217,7 +7217,7 @@  static int dev_cpu_callback(struct notifier_block *nfb,
                netif_rx_ni(skb);
                input_queue_head_incr(oldsd);
        }
-       while ((skb = skb_dequeue(&oldsd->input_pkt_queue))) {
+       while ((skb = __skb_dequeue(&oldsd->input_pkt_queue))) {
                netif_rx_ni(skb);
                input_queue_head_incr(oldsd);
        }
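
The difference between the two dequeue variants in the patch can be sketched
in user space. This is only a toy model, not kernel code: every toy_* name is
hypothetical, and the assumption that the queue's regular lock was never
initialized (because the queue is meant to be protected by a different lock)
is one plausible reading of the ".magic: 00000000" line in the report above.
The locked variant checks the lock's debug magic before taking it, the way a
spinlock-debug build does, while the unlocked variant never looks at the lock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Illustrative debug pattern; an initialized toy lock carries this value. */
#define TOY_LOCK_MAGIC 0xdead4eadu

struct toy_lock {
	unsigned int magic;	/* 0 when the lock was never initialized */
	int held;
};

struct toy_node {
	struct toy_node *next;
	int value;
};

struct toy_queue {
	struct toy_lock lock;
	struct toy_node *head;
	struct toy_node *tail;
	int bad_magic_reports;	/* how often the debug check fired */
};

void toy_lock_init(struct toy_lock *l)
{
	l->magic = TOY_LOCK_MAGIC;
	l->held = 0;
}

/* Set up the list but leave the lock zeroed (assumed scenario). */
void toy_queue_init_nolock(struct toy_queue *q)
{
	memset(q, 0, sizeof(*q));
}

void toy_enqueue(struct toy_queue *q, struct toy_node *n)
{
	n->next = NULL;
	if (q->tail)
		q->tail->next = n;
	else
		q->head = n;
	q->tail = n;
}

/* Like __skb_dequeue(): the caller guarantees exclusive access. */
struct toy_node *toy_dequeue_unlocked(struct toy_queue *q)
{
	struct toy_node *n = q->head;

	if (n) {
		q->head = n->next;
		if (!q->head)
			q->tail = NULL;
		n->next = NULL;
	}
	return n;
}

/* Like skb_dequeue(): takes the queue lock, so the debug check runs. */
struct toy_node *toy_dequeue_locked(struct toy_queue *q)
{
	struct toy_node *n;

	if (q->lock.magic != TOY_LOCK_MAGIC) {
		q->bad_magic_reports++;
		fprintf(stderr, "toy: lock bad magic %08x\n", q->lock.magic);
	}
	q->lock.held = 1;
	n = toy_dequeue_unlocked(q);
	q->lock.held = 0;
	return n;
}
```

Draining the queue with toy_dequeue_unlocked() is only safe under the same
assumption the thread confirms for input_pkt_queue: the owning cpu is dead,
so nothing else can touch the queue concurrently.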