[LEDE-DEV] stall/hang in netifd on LEDE r1318 on Linksys WRT1900AC V1

Message ID: 652b7af0-4199-f797-0df7-95db57a45277@gmail.com
State: Accepted
Delegated to: Felix Fietkau

Commit Message

Josua Mayer Aug. 17, 2016, 3:08 p.m. UTC
Hi Syrone, Pat,

I ran into the same issue on the Clearfog Pro, and my colleague figured
out a way to make it disappear.
Please try out this patch and let me know if it helps on your boards:
https://github.com/Artox/lede-project/commit/db724f8ff1ed4c77668f691ed4d066a8e0f2693e

From db724f8ff1ed4c77668f691ed4d066a8e0f2693e Mon Sep 17 00:00:00 2001
From: Josua Mayer <josua.mayer97@gmail.com>
Date: Wed, 17 Aug 2016 16:42:07 +0200
Subject: [PATCH] mvebu: enable cpu hotplug support in kernel

Enabling CONFIG_HOTPLUG_CPU prevents the RCU stalls in mvneta on armada-38x.

Signed-off-by: Josua Mayer <josua.mayer97@gmail.com>
---
 target/linux/mvebu/config-4.4 | 1 +
 1 file changed, 1 insertion(+)

Comments

Pat Fruth Aug. 17, 2016, 7:16 p.m. UTC | #1
Hello Josua,

To my great satisfaction, your suggestion works beautifully!

Thank you very much for taking time to pass this suggestion along.

Pat



> On Aug 17, 2016, at 9:08 AM, Josua Mayer <josua.mayer97@gmail.com> wrote:
> 
> Hi Syrone, Pat,
> 
> I ran into the same issue on the Clearfog Pro, and my colleague figured
> out a way to make it disappear.
> Please try out this patch and let me know if it helps on your boards:
> https://github.com/Artox/lede-project/commit/db724f8ff1ed4c77668f691ed4d066a8e0f2693e
> 
> From db724f8ff1ed4c77668f691ed4d066a8e0f2693e Mon Sep 17 00:00:00 2001
> From: Josua Mayer <josua.mayer97@gmail.com>
> Date: Wed, 17 Aug 2016 16:42:07 +0200
> Subject: [PATCH] mvebu: enable cpu hotplug support in kernel
> 
> This option prevents the rcu stalls in mvneta on armada-38x.
> 
> Signed-off-by: Josua Mayer <josua.mayer97@gmail.com>
> ---
> target/linux/mvebu/config-4.4 | 1 +
> 1 file changed, 1 insertion(+)
> 
> diff --git a/target/linux/mvebu/config-4.4 b/target/linux/mvebu/config-4.4
> index d0f042e..6c4ff70 100644
> --- a/target/linux/mvebu/config-4.4
> +++ b/target/linux/mvebu/config-4.4
> @@ -209,6 +209,7 @@ CONFIG_HAVE_UID16=y
> CONFIG_HAVE_VIRT_CPU_ACCOUNTING_GEN=y
> CONFIG_HIGHMEM=y
> # CONFIG_HIGHPTE is not set
> +CONFIG_HOTPLUG_CPU=y
> CONFIG_HWBM=y
> CONFIG_HWMON=y
> CONFIG_HZ_FIXED=0
> -- 
> 2.6.6
> 
> 
> On 15.08.2016 at 08:57, Syrone Wong wrote:
>> I have the same issue. Everything works well on
>> https://github.com/lede-project/source/commit/22ef1c83b35cd5633b0c58c9c38a43494a906a6a,
>> but the image I compiled last night from
>> https://github.com/lede-project/source/commit/b9b665ae49469a73d254b1a219a4a7c4e22f27c0
>> hangs at boot.
>> 
>> I didn't attach a TTL serial cable, so I reverted to the older version.
>> 
>> I hope this information helps.
>> 
>> Best Regards,
>> Syrone Wong
>> 
>> 
>> On Mon, Aug 15, 2016 at 2:27 PM, pat <pat@patfruth.com> wrote:
>>> Dear LEDE devs,
>>> 
>>> There doesn’t appear to be a LEDE forum yet, or I’d post this there, so I’m hoping someone on the mailing list has a suggestion.
>>> 
>>> I’ve been running a build of OpenWRT DD R49195 since earlier this year.
>>> I thought I’d try to move to LEDE.
>>> 
>>> To that end, I’ve just built an image based on LEDE r1318 for a Linksys WRT1900AC V1 (aka mamba).
>>> The build completed successfully, and the image appears to have flashed successfully.
>>> Upon booting, the boot process stalls/hangs. I see the following on a serial/tty console:
>>> 
>>> .
>>> …
>>> ….
>>> [   16.641909] device eth0 entered promiscuous mode
>>> [   16.649345] IPv6: ADDRCONF(NETDEV_UP): br-lan: link is not ready
>>> [   76.692269] INFO: rcu_sched self-detected stall on CPU
>>> [   76.697461]  1-...: (6000 ticks this GP) idle=4b7/140000000000001/0 softirq=902/919 fqs=5980
>>> [   76.702321] INFO: rcu_sched detected stalls on CPUs/tasks:
>>> [   76.702341]  1-...: (6000 ticks this GP) idle=4b7/140000000000001/0 softirq=902/919 fqs=5980
>>> [   76.702352]  (detected by 0, t=6002 jiffies, g=111, c=110, q=990)
>>> [   76.702356] Task dump for CPU 1:
>>> [   76.702368] netifd          R running      0  1106      1 0x00000002
>>> [   76.702401] [<c00101ec>] (__schedule) from [<c00da48c>] (SyS_ioctl+0x34/0x5c)
>>> [   76.702418] [<c00da48c>] (SyS_ioctl) from [<c0009c80>] (ret_fast_syscall+0x0/0x3c)
>>> [   76.750523]   (t=6006 jiffies g=111 c=110 q=990)
>>> [   76.755178] Task dump for CPU 1:
>>> [   76.758420] netifd          R running      0  1106      1 0x00000002
>>> [   76.764838] [<c001fa3c>] (unwind_backtrace) from [<c001c3a4>] (show_stack+0x10/0x14)
>>> [   76.772617] [<c001c3a4>] (show_stack) from [<c006b3b8>] (rcu_dump_cpu_stacks+0x78/0xb0)
>>> [   76.780648] [<c006b3b8>] (rcu_dump_cpu_stacks) from [<c006e7c0>] (rcu_check_callbacks+0x28c/0x754)
>>> [   76.789637] [<c006e7c0>] (rcu_check_callbacks) from [<c00708dc>] (update_process_times+0x38/0x64)
>>> [   76.798543] [<c00708dc>] (update_process_times) from [<c007f738>] (tick_sched_timer+0x21c/0x260)
>>> [   76.807358] [<c007f738>] (tick_sched_timer) from [<c0071694>] (__hrtimer_run_queues+0xf8/0x1b8)
>>> [   76.816084] [<c0071694>] (__hrtimer_run_queues) from [<c00718ac>] (hrtimer_interrupt+0xac/0x200)
>>> [   76.824898] [<c00718ac>] (hrtimer_interrupt) from [<c02edac0>] (armada_370_xp_timer_interrupt+0x30/0x38)
>>> [   76.834407] [<c02edac0>] (armada_370_xp_timer_interrupt) from [<c00664f0>] (handle_percpu_devid_irq+0x6c/0x84)
>>> [   76.844447] [<c00664f0>] (handle_percpu_devid_irq) from [<c00623c0>] (generic_handle_irq+0x24/0x34)
>>> [   76.853521] [<c00623c0>] (generic_handle_irq) from [<c0062698>] (__handle_domain_irq+0x98/0xac)
>>> [   76.862247] [<c0062698>] (__handle_domain_irq) from [<c0009428>] (armada_370_xp_handle_irq+0x50/0xb0)
>>> [   76.871496] [<c0009428>] (armada_370_xp_handle_irq) from [<c000a5f4>] (__irq_svc+0x54/0x70)
>>> [   76.879869] Exception stack(0xce1b5de8 to 0xce1b5e30)
>>> [   76.884938] 5de0:                   00000000 cf83fca4 00000000 cf83eca4 c05c6614 cf83fca4
>>> [   76.893141] 5e00: cf83fc80 cf83f830 00000000 00000000 00000000 00000000 cf83eca8 ce1b5e38
>>> [   76.901341] 5e20: c0028e20 c0042198 a0000013 ffffffff
>>> [   76.906417] [<c000a5f4>] (__irq_svc) from [<c0042198>] (raw_notifier_chain_register+0x10/0x40)
>>> [   76.915064] [<c0042198>] (raw_notifier_chain_register) from [<c0028e20>] (register_cpu_notifier+0x28/0x3c)
>>> [   76.924759] [<c0028e20>] (register_cpu_notifier) from [<c028d29c>] (mvneta_open+0xb8/0x170)
>>> [   76.933147] [<c028d29c>] (mvneta_open) from [<c03164a0>] (__dev_open+0x8c/0x108)
>>> [   76.940569] [<c03164a0>] (__dev_open) from [<c0316754>] (__dev_change_flags+0xb0/0x140)
>>> [   76.948597] [<c0316754>] (__dev_change_flags) from [<c03167fc>] (dev_change_flags+0x18/0x48)
>>> [   76.957072] [<c03167fc>] (dev_change_flags) from [<c032b4a8>] (dev_ifsioc+0xd0/0x320)
>>> [   76.964928] [<c032b4a8>] (dev_ifsioc) from [<c032bf60>] (dev_ioctl+0x7f4/0x8c0)
>>> [   76.972266] [<c032bf60>] (dev_ioctl) from [<c00da408>] (do_vfs_ioctl+0x6a4/0x6f4)
>>> [   76.979772] [<c00da408>] (do_vfs_ioctl) from [<c00da48c>] (SyS_ioctl+0x34/0x5c)
>>> [   76.987106] [<c00da48c>] (SyS_ioctl) from [<c0009c80>] (ret_fast_syscall+0x0/0x3c)
>>> 
>>> It seems netifd is hung (maybe in the mvneta driver)?
>>> Has anyone else already seen this?
>>> What is causing this?
>>> How do I fix it?
>>> 
>>> Thanks
>>> 
>>> 
>>> _______________________________________________
>>> Lede-dev mailing list
>>> Lede-dev@lists.infradead.org
>>> http://lists.infradead.org/mailman/listinfo/lede-dev
>>
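
For readers following the backtrace quoted above, here is a minimal Python sketch that pulls out the code path that was running when the timer interrupt fired (the frames listed after __irq_svc). It assumes the ARM backtrace line format seen in this log; the function name interrupted_call_chain and the regular expression are illustrative and not part of any kernel tooling.

```python
import re

# Matches one ARM backtrace line as printed in the log above, e.g.
#   [<c0028e20>] (register_cpu_notifier) from [<c028d29c>] (mvneta_open+0xb8/0x170)
# Group 1 is the callee, group 2 is the caller.
FRAME = re.compile(
    r"\[<[0-9a-f]+>\] \((\w+)\) from \[<[0-9a-f]+>\] \((\w+)\+0x[0-9a-f]+/0x[0-9a-f]+\)"
)


def interrupted_call_chain(log_text):
    """Return the caller names listed from the __irq_svc frame onward.

    These are the functions that were executing when the interrupt that
    reported the stall arrived. Returns an empty list if no __irq_svc
    frame is present.
    """
    frames = [m.groups() for m in map(FRAME.search, log_text.splitlines()) if m]
    for i, (callee, _caller) in enumerate(frames):
        if callee == "__irq_svc":
            return [caller for _callee, caller in frames[i:]]
    return []


# A few lines copied from the log in this thread (quote prefixes are tolerated).
sample_log = """\
>>> [   76.871496] [<c0009428>] (armada_370_xp_handle_irq) from [<c000a5f4>] (__irq_svc+0x54/0x70)
>>> [   76.906417] [<c000a5f4>] (__irq_svc) from [<c0042198>] (raw_notifier_chain_register+0x10/0x40)
>>> [   76.915064] [<c0042198>] (raw_notifier_chain_register) from [<c0028e20>] (register_cpu_notifier+0x28/0x3c)
>>> [   76.924759] [<c0028e20>] (register_cpu_notifier) from [<c028d29c>] (mvneta_open+0xb8/0x170)
>>> [   76.933147] [<c028d29c>] (mvneta_open) from [<c03164a0>] (__dev_open+0x8c/0x108)
"""

for name in interrupted_call_chain(sample_log):
    print(name)
```

On the sample lines, the chain starts at raw_notifier_chain_register, reached from register_cpu_notifier and mvneta_open, which is consistent with the suspicion that netifd is stuck while opening the mvneta device.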
Syrone Wong Aug. 17, 2016, 11:39 p.m. UTC | #2
Hello Josua, Pat,

I haven't tested this yet. Thanks for your effort.

Are you sure this is the root cause? Everything worked well in the past
without this config option enabled.

If so, please send a PR or a patch to the mailing list.

Best Regards,
Syrone Wong


Patch

diff --git a/target/linux/mvebu/config-4.4 b/target/linux/mvebu/config-4.4
index d0f042e..6c4ff70 100644
--- a/target/linux/mvebu/config-4.4
+++ b/target/linux/mvebu/config-4.4
@@ -209,6 +209,7 @@  CONFIG_HAVE_UID16=y
 CONFIG_HAVE_VIRT_CPU_ACCOUNTING_GEN=y
 CONFIG_HIGHMEM=y
 # CONFIG_HIGHPTE is not set
+CONFIG_HOTPLUG_CPU=y
 CONFIG_HWBM=y
 CONFIG_HWMON=y
 CONFIG_HZ_FIXED=0
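
To confirm that the option from this patch actually ended up in a given kernel configuration, something like the following Python sketch could be used. It assumes the usual kernel config text format (OPTION=value lines and "# OPTION is not set" comments); the helper name kconfig_value is hypothetical, and where the config text comes from (the target config file, a build tree's .config, and so on) is left to the reader.

```python
import re


def kconfig_value(config_text, option):
    """Return the value assigned to `option`, or None if it is unset or absent.

    If the option appears more than once, the last occurrence wins in this sketch.
    """
    assigned = re.compile(r"^" + re.escape(option) + r"=(.*)$")
    not_set = "# " + option + " is not set"
    value = None
    for line in config_text.splitlines():
        line = line.strip()
        match = assigned.match(line)
        if match:
            value = match.group(1)
        elif line == not_set:
            value = None
    return value


# Context lines taken from the patch above.
sample_config = """\
CONFIG_HAVE_VIRT_CPU_ACCOUNTING_GEN=y
CONFIG_HIGHMEM=y
# CONFIG_HIGHPTE is not set
CONFIG_HOTPLUG_CPU=y
CONFIG_HWBM=y
"""

print(kconfig_value(sample_config, "CONFIG_HOTPLUG_CPU"))
print(kconfig_value(sample_config, "CONFIG_HIGHPTE"))
```

The exact match on "OPTION=" keeps options that merely share a prefix from being reported as the one requested.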