diff mbox

debug problems on ppc 83xx target due to changed struct task_struct

Message ID 41e00d07-d7ce-0198-acce-ac25db8c9df3@keymile.com (mailing list archive)
State Not Applicable
Headers show

Commit Message

Holger Brunck Aug. 15, 2016, 2:35 p.m. UTC
On 12/08/16 18:09, Dave Hansen wrote:
> On 08/12/2016 08:47 AM, Holger Brunck wrote:
>> On 12/08/16 17:14, Dave Hansen wrote:
>>> On 08/12/2016 07:50 AM, Holger Brunck wrote:
>>>> When I try to debug our multithreaded userspace application with gdb I  get
>>>> stuck when trying to single step code.
>>>
>>> Can you clarify "stuck"?  Like the instructions don't advance?  Have you
>>> been able to find a root cause for this?
>>
>> the behaviour is slightly different on the kernel versions. So my setup is a
>> remote debug session via gdbserver.
>>
>> After connecting to the gdbserver I set a break point and start to run my
>> program. When hitting the breakpoint I try to single step. With stuck I mean
>> that the connection to the gdbserver is broken and I can't control my debug
>> session anymore while the application is not continuing.
> 
> Could you try debugging locally with gdb?  It would be nice to take all
> the stuff involved with remote debugging out of the picture.
> 

I tried this but unfortunately the error only occurs while remote debugging.
Locally with gdb everything works fine. BTW we double-checked with a 85xx ppc
target which is also 32-bit and it ends up with the same behaviour.

I was also investigating where I have to move the line in the struct task_struct
and it turns out to be like this (diff to 4.7 kernel):

because this is highly used in case of gdb debugging.

> Have you tried turning on a bunch of kernel debugging (SLAB/SLUB
> debugging, pagealloc debug, lockdep, etc...)?  If something is getting
> corrupted, those tend to catch it.
>

I switched on some memory debugging features but didn't get suspicious output.
To make the situation even more weird after enabling FTRACE in the kernel to
trace some signal code the error disappeared.

> 
> Is the process still alive at the point that the remote debugger stops
> responding?  What is it doing at that point?
> 

the process is still alive. The state of the process, it's threads and the
gdbserver is like this:

Bad case after a single step:
 73    73 TS       -   0  19   0  0.3 S    sigsuspend     gdbserver
 74    74 TS       -   0  19   0  0.0 tl+  ptrace_stop    infra_pbec83xx_
 74    77 IDL      0   -  19   0  0.0 tl+  ptrace_stop    TR_Task
 74    78 IDL      0   -  19   0  0.0 tl+  ptrace_stop    TR_Timeout
 74    79 TS       -   0  19   0  0.0 tl+  poll_schedule_ timed_msg
 74    80 IDL      0   -  19   0  0.0 tl+  ptrace_stop    stimuli
 74    81 TS       -  -5  24   0  0.0 t<l+ ptrace_stop    timer0Dflt
 74    82 TS       - -19  38   0  0.0 t<l+ futex_wait_que timerUpd0
 74    83 TS       - -19  38   0  0.0 t<l+ timerfd_read   timerClk
 74    84 TS       - -19  38   0  0.0 t<l+ ptrace_stop    b/beatWDogRefr

Good case after a single step:
 76    76 TS       -   0  19   0  4.0 S    poll_schedule_ gdbserver
 77    77 TS       -   0  19   0  0.0 tl   ptrace_stop    infra_pbec83xx_
 77    84 IDL      0   -  19   0  0.0 tl   ptrace_stop    TR_Task
 77    85 IDL      0   -  19   0  0.0 tl   ptrace_stop    TR_Timeout
 77    86 TS       -   0  19   0  0.0 tl   ptrace_stop    timed_msg
 77    87 IDL      0   -  19   0  0.0 tl   ptrace_stop    stimuli
 77    88 TS       -  -5  24   0  0.0 t<l  ptrace_stop    timer0Dflt
 77    89 TS       - -19  38   0  0.0 t<l  ptrace_stop    timerUpd0
 77    90 TS       - -19  38   0  0.0 t<l  ptrace_stop    timerClk
 77    91 TS       - -19  38   0  0.0 t<l  ptrace_stop    b/beatWDogRefr

So in the error case only some threads are at ptrace_stop, while all of them
should be after a single step with the gdb. So it's somewhere in the signal
handling between kernel and gdbserver.

Best regards
Holger Brunck

Comments

Dave Hansen Aug. 15, 2016, 4:19 p.m. UTC | #1
On 08/15/2016 07:35 AM, Holger Brunck wrote:
> I tried this but unfortunately the error only occurs while remote debugging.
> Locally with gdb everything works fine. BTW we double-checked with a 85xx ppc
> target which is also 32-bit and it ends up with the same behaviour.
> 
> I was also investigating where I have to move the line in the struct task_struct
> and it turns out to be like this (diff to 4.7 kernel):
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 253538f..4868874 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1655,7 +1655,9 @@ struct task_struct {
>         struct signal_struct *signal;
>         struct sighand_struct *sighand;
> 
> +       // struct thread_struct thread;   // until here everything is fine
>         sigset_t blocked, real_blocked;
> +       struct thread_struct thread;      // from here it's broken
>         sigset_t saved_sigmask; /* restored if set_restore_sigmask() was used */
>         struct sigpending pending;

Wow, thanks for all the debugging here!

So, we know it has to do with signals, thread_info, and probably only
affects 32-bit powerpc.  Seems awfully weird.  Have you checked with any
of the 64-bit powerpc guys to see if they have any ideas?

I went grepping around for a bit.

Where is the task_struct stored?  Is it on-stack on ppc32 or something?
 The thread_info is, I assume, but I see some THREAD_INFO vs. THREAD
(thread struct) math happening in here, which confuses me:

        .globl  ret_from_debug_exc
ret_from_debug_exc:
        mfspr   r9,SPRN_SPRG_THREAD
        lwz     r10,SAVED_KSP_LIMIT(r1)
        stw     r10,KSP_LIMIT(r9)
        lwz     r9,THREAD_INFO-THREAD(r9)
        CURRENT_THREAD_INFO(r10, r1)
        lwz     r10,TI_PREEMPT(r10)
        stw     r10,TI_PREEMPT(r9)
        RESTORE_xSRR(SRR0,SRR1);
        RESTORE_xSRR(CSRR0,CSRR1);
        RESTORE_MMU_REGS;
        RET_FROM_EXC_LEVEL(SPRN_DSRR0, SPRN_DSRR1, PPC_RFDI)

But, I'm really at a loss to explain this.  It still seems like a deeply
ppc-specific issue.  We can obviously work around it with an #ifdef for
your platform, but that's awfully hackish and hides the real bug,
whatever it is.

My suspicion is that there's a bug in the 32-bit ppc assembly somewhere.
 I don't see any references to 'blocked' or 'real_blocked' in assembly
though.  You could add a bunch of padding instead of moving the
thread_struct and see if that does anything, but that's really a stab in
the dark.
Christophe Leroy Aug. 16, 2016, 5:27 p.m. UTC | #2
Le 15/08/2016 à 18:19, Dave Hansen a écrit :
> On 08/15/2016 07:35 AM, Holger Brunck wrote:
>> I tried this but unfortunately the error only occurs while remote debugging.
>> Locally with gdb everything works fine. BTW we double-checked with a 85xx ppc
>> target which is also 32-bit and it ends up with the same behaviour.
>>
>> I was also investigating where I have to move the line in the struct task_struct
>> and it turns out to be like this (diff to 4.7 kernel):
>>
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index 253538f..4868874 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -1655,7 +1655,9 @@ struct task_struct {
>>         struct signal_struct *signal;
>>         struct sighand_struct *sighand;
>>
>> +       // struct thread_struct thread;   // until here everything is fine
>>         sigset_t blocked, real_blocked;
>> +       struct thread_struct thread;      // from here it's broken
>>         sigset_t saved_sigmask; /* restored if set_restore_sigmask() was used */
>>         struct sigpending pending;
>
> Wow, thanks for all the debugging here!
>
> So, we know it has to do with signals, thread_info, and probably only
> affects 32-bit powerpc.  Seems awfully weird.  Have you checked with any
> of the 64-bit powerpc guys to see if they have any ideas?
>
> I went grepping around for a bit.
>
> Where is the task_struct stored?  Is it on-stack on ppc32 or something?
>  The thread_info is, I assume, but I see some THREAD_INFO vs. THREAD
> (thread struct) math happening in here, which confuses me:
>
>         .globl  ret_from_debug_exc
> ret_from_debug_exc:
>         mfspr   r9,SPRN_SPRG_THREAD
>         lwz     r10,SAVED_KSP_LIMIT(r1)
>         stw     r10,KSP_LIMIT(r9)
>         lwz     r9,THREAD_INFO-THREAD(r9)
>         CURRENT_THREAD_INFO(r10, r1)
>         lwz     r10,TI_PREEMPT(r10)
>         stw     r10,TI_PREEMPT(r9)
>         RESTORE_xSRR(SRR0,SRR1);
>         RESTORE_xSRR(CSRR0,CSRR1);
>         RESTORE_MMU_REGS;
>         RET_FROM_EXC_LEVEL(SPRN_DSRR0, SPRN_DSRR1, PPC_RFDI)
>
> But, I'm really at a loss to explain this.  It still seems like a deeply
> ppc-specific issue.  We can obviously work around it with an #ifdef for
> your platform, but that's awfully hackish and hides the real bug,
> whatever it is.
>
> My suspicion is that there's a bug in the 32-bit ppc assembly somewhere.
>  I don't see any references to 'blocked' or 'real_blocked' in assembly
> though.  You could add a bunch of padding instead of moving the
> thread_struct and see if that does anything, but that's really a stab in
> the dark.
>

Just to let you know, I'm not sure it is the same issue, but I also get 
my 8xx target stuck when I try to use gdbserver.

If I debug a very small app, it gets stuck quickly after the app has 
stopped: indeed, the console seems ok but as soon as I try to execute 
something simple, like a ps or top, it get stuck. The target still 
responds to pings, but nothing else.

If I debug a big app, it gets stuck soon after the start of debug: I set 
a bpoint at main(), do a 'continue', get breaked at main(), do some 
steps with 'next' then it gets stuck.

I have tried moving the struct thread_struct thread but it has no impact.

I have tried to run gdbserver with 'perf top' running on another 
console, and then 'perf top' stops updating itself. The quick keys are 
still working, but the data is not updated anymore and the stats on top 
all goes to 0 (# irq/sec etc...)

Christophe

---
L'absence de virus dans ce courrier électronique a été vérifiée par le logiciel antivirus Avast.
https://www.avast.com/antivirus
Dave Hansen Aug. 16, 2016, 5:36 p.m. UTC | #3
On 08/16/2016 10:27 AM, christophe leroy wrote:
> If I debug a very small app, it gets stuck quickly after the app has
> stopped: indeed, the console seems ok but as soon as I try to execute
> something simple, like a ps or top, it get stuck. The target still
> responds to pings, but nothing else.

This one is a pretty common symptom when the kernel crashes holding a
per-process lock of some kind, probably most commonly mmap_sem.

Do you have a serial console?  Can you do a sysrq-t on it to get stack
dumps of the hung processes and ps?

It's time to turn on all the debugging options we can find.  Can you
build a kernel with all this stuff enabled (=y on all of these):

CONFIG_SLUB_DEBUG
CONFIG_DEBUG_VM
CONFIG_DEBUG_PAGEALLOC
CONFIG_PAGE_POISONING
CONFIG_SLUB_DEBUG_ON
CONFIG_KASAN
CONFIG_*LOCKUP_DETECTOR
CONFIG_DETECT_HUNG_TASK
CONFIG_SCHED_DEBUG
CONFIG_DEBUG_PREEMPT
CONFIG_MAGIC_SYSRQ

And all of the spinlock/lockdep stuff enabled too:

CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y
CONFIG_LOCKDEP=y
CONFIG_DEBUG_ATOMIC_SLEEP=y
CONFIG_STACKTRACE=y
CONFIG_DEBUG_LIST=y

Let's hope that these turn something up.
Benjamin Herrenschmidt Aug. 16, 2016, 10:13 p.m. UTC | #4
On Mon, 2016-08-15 at 09:19 -0700, Dave Hansen wrote:

> Wow, thanks for all the debugging here!

Yup, thanks, that's really odd... I wonder if one of those
structures is accessed beyond it's boundary, either the sigset
or the thread struct, causing corruption of neighbouring fields
in task struct...

Can you try adding a little canary on both sides (make it not-so-little 
maybe a few words) which you initialize to a known pattern and check
every now and then ?

> So, we know it has to do with signals, thread_info, and probably only
> affects 32-bit powerpc.  Seems awfully weird.  Have you checked with
> any
> of the 64-bit powerpc guys to see if they have any ideas?
> 
> I went grepping around for a bit.
> 
> Where is the task_struct stored?  Is it on-stack on ppc32 or
> something?

No it's allocated normally.

>  The thread_info is,

Yes, thread_info is at the bottom of stack

>  I assume, but I see some THREAD_INFO vs. THREAD
> (thread struct) math happening in here, which confuses me:
>  
>         .globl  ret_from_debug_exc
> ret_from_debug_exc:
>         mfspr   r9,SPRN_SPRG_THREAD
>         lwz     r10,SAVED_KSP_LIMIT(r1)
>         stw     r10,KSP_LIMIT(r9)
>         lwz     r9,THREAD_INFO-THREAD(r9)

This calculates the offset between the thread struct and the pointer
to thread info inside task struct and loads that pointer into r9

>         CURRENT_THREAD_INFO(r10, r1)
>         lwz     r10,TI_PREEMPT(r10)
>         stw     r10,TI_PREEMPT(r9)
>         RESTORE_xSRR(SRR0,SRR1);
>         RESTORE_xSRR(CSRR0,CSRR1);
>         RESTORE_MMU_REGS;
>         RET_FROM_EXC_LEVEL(SPRN_DSRR0, SPRN_DSRR1, PPC_RFDI)

Basically the above code transfers TI_PREEMPT from the "current"
thread info which I believe would be on some exception/interrupt
stack into the current task thread info.

> But, I'm really at a loss to explain this.  It still seems like a
> deeply
> ppc-specific issue.  We can obviously work around it with an #ifdef
> for
> your platform, but that's awfully hackish and hides the real bug,
> whatever it is.
> 
> My suspicion is that there's a bug in the 32-bit ppc assembly
> somewhere.
>  I don't see any references to 'blocked' or 'real_blocked' in
> assembly
> though.  You could add a bunch of padding instead of moving the
> thread_struct and see if that does anything, but that's really a stab
> in
> the dark.
Christophe Leroy Aug. 17, 2016, 8:22 a.m. UTC | #5
Le 16/08/2016 à 19:36, Dave Hansen a écrit :
> On 08/16/2016 10:27 AM, christophe leroy wrote:
>> If I debug a very small app, it gets stuck quickly after the app has
>> stopped: indeed, the console seems ok but as soon as I try to execute
>> something simple, like a ps or top, it get stuck. The target still
>> responds to pings, but nothing else.
>
> This one is a pretty common symptom when the kernel crashes holding a
> per-process lock of some kind, probably most commonly mmap_sem.
>
> Do you have a serial console?  Can you do a sysrq-t on it to get stack
> dumps of the hung processes and ps?
>
> It's time to turn on all the debugging options we can find.  Can you
> build a kernel with all this stuff enabled (=y on all of these):
>
> CONFIG_SLUB_DEBUG
> CONFIG_DEBUG_VM
> CONFIG_DEBUG_PAGEALLOC
> CONFIG_PAGE_POISONING
> CONFIG_SLUB_DEBUG_ON
> CONFIG_KASAN
> CONFIG_*LOCKUP_DETECTOR
> CONFIG_DETECT_HUNG_TASK
> CONFIG_SCHED_DEBUG
> CONFIG_DEBUG_PREEMPT
> CONFIG_MAGIC_SYSRQ
>
> And all of the spinlock/lockdep stuff enabled too:
>
> CONFIG_DEBUG_SPINLOCK=y
> CONFIG_DEBUG_MUTEXES=y
> CONFIG_DEBUG_LOCK_ALLOC=y
> CONFIG_PROVE_LOCKING=y
> CONFIG_LOCKDEP=y
> CONFIG_DEBUG_ATOMIC_SLEEP=y
> CONFIG_STACKTRACE=y
> CONFIG_DEBUG_LIST=y
>
> Let's hope that these turn something up.
>

Looks like all those options don't turn anything up.

See below what I get with Sysrq. It looks like the time is freezed:
* Log date remains the same for each line
* Timers dump is identical the two times
* The running task is stuck in udelay()

[  415.307239] sysrq: SysRq : HELP : loglevel(0-9) reboot(b) crash(c) 
show-all-locks(d) terminate-all-tasks(e) memory-full-oom-kill(f) 
kill-all-tasks(i) show-memory-usage(m) nice-all-RT-tasks(n) poweroff(o) 
show-registers(p) show-all-timers(q) sync(s) show-task-states(t) 
unmount(u) show-blocked-tasks(w)

[  415.307239] sysrq: SysRq : Show Regs
[  415.307239] CPU: 0 PID: 281 Comm: sh Tainted: G        W 
4.4.14-s3k-dev-g08fc2be-svn #110
[  415.307239] task: c620d380 ti: c7fee000 task.ti: c6214000
[  415.307239] NIP: c0007e20 LR: c020f8a8 CTR: c020f874
[  415.307239] REGS: c62159c0 TRAP: 0501   Tainted: G        W 
(4.4.14-s3k-dev-g08fc2be-svn)
[  415.307239] MSR: 00009032 <EE,ME,IR,DR,RI>  CR: 24022884  XER: 00000000
[  415.307239]
GPR00: c020f8a0 c6215a70 c620d380 00000008 c601ba18 c020f8a0 00000000 
0000036a
GPR08: c620d7b0 00000000 cc38f55d 40324184 84022882
[  415.307239] NIP [c0007e20] udelay+0x30/0x3c
[  415.307239] LR [c020f8a8] nand_wait+0x34/0x12c
[  415.307239] Call Trace:
[  415.307239] [c6215a70] [c020f8a0] nand_wait+0x2c/0x12c (unreliable)
[  415.307239] [c6215a90] [c020e120] nand_write_page+0xb8/0x1a0
[  415.307239] [c6215ac0] [c020f248] nand_do_write_ops+0x1ec/0x44c
[  415.307239] [c6215b30] [c020f5ec] nand_write+0x68/0x94
[  415.307239] [c6215b70] [c02210e0] ubi_io_write+0x108/0x61c
[  415.307239] [c6215bb0] [c021e10c] ubi_eba_write_leb+0x88/0x7b4
[  415.307239] [c6215c20] [c0163bfc] ubifs_leb_write+0xac/0x144
[  415.307239] [c6215c50] [c0164740] ubifs_wbuf_sync_nolock+0xfc/0x33c
[  415.307239] [c6215c70] [c0158b7c] ubifs_jnl_write_inode+0xb0/0x208
[  415.307239] [c6215cb0] [c015f764] ubifs_write_inode+0xc4/0x14c
[  415.307239] [c6215cd0] [c015bd68] ubifs_writepage+0x1ac/0x22c
[  415.307239] [c6215d00] [c009a428] __writepage+0x24/0x7c
[  415.307239] [c6215d10] [c009c6e4] write_cache_pages+0x1d0/0x380
[  415.307239] [c6215db0] [c008fe6c] __filemap_fdatawrite_range+0x64/0x88
[  415.307239] [c6215de0] [c0090644] filemap_write_and_wait_range+0x38/0x94
[  415.307239] [c6215e00] [c015ac3c] ubifs_fsync+0x50/0xd4
[  415.307239] [c6215e20] [c0093bb8] generic_file_write_iter+0x200/0x2b4
[  415.307239] [c6215e50] [c015c294] ubifs_write_iter+0x104/0x1a8
[  415.307239] [c6215ea0] [c00cba6c] __vfs_write+0xc8/0x14c
[  415.307239] [c6215ef0] [c00cbc80] vfs_write+0xa4/0x178
[  415.307239] [c6215f10] [c00cbe64] SyS_write+0x4c/0xc4
[  415.307239] [c6215f40] [c000bbd4] ret_from_syscall+0x0/0x38
[  415.307239] Instruction dump:
[  415.307239] 3d20c05c 812975ac 7c6349d6 7d4c42e6 7d2c42e6 7d2a4850 
7f834840 409d0018
[  415.307239] 7c210b78 7d2c42e6 7d2a4850 7f834840 <419dfff0> 7c421378 
4e800020 7c0802a6


[  415.307239] sysrq: SysRq : Show Blocked State
[  415.307239]   task                PC stack   pid father
[  415.307239] Sched Debug Version: v0.11, 4.4.14-s3k-dev-g08fc2be-svn #110
[  415.307239] ktime                                   : 415174.352520
[  415.307239] sched_clk                               : 415307.239936
[  415.307239] cpu_clk                                 : 415307.239936
[  415.307239] jiffies                                 : 11517
[  415.307239]
[  415.307239] sysctl_sched
[  415.307239]   .sysctl_sched_latency                    : 6.000000
[  415.307239]   .sysctl_sched_min_granularity            : 0.750000
[  415.307239]   .sysctl_sched_wakeup_granularity         : 1.000000
[  415.307239]   .sysctl_sched_child_runs_first           : 0
[  415.307239]   .sysctl_sched_features                   : 44859
[  415.307239]   .sysctl_sched_tunable_scaling            : 1 (logaritmic)
[  415.307239]
[  415.307239] cpu#0
[  415.307239]   .nr_running                    : 2
[  415.307239]   .load                          : 2048
[  415.307239]   .nr_switches                   : 26656
[  415.307239]   .nr_load_updates               : 0
[  415.307239]   .nr_uninterruptible            : 0
[  415.307239]   .next_balance                  : 0.000000
[  415.307239]   .curr->pid                     : 281
[  415.307239]   .clock                         : 415307.239936
[  415.307239]   .clock_task                    : 415307.239936
[  415.307239]   .cpu_load[0]                   : 0
[  415.307239]   .cpu_load[1]                   : 0
[  415.307239]   .cpu_load[2]                   : 0
[  415.307239]   .cpu_load[3]                   : 0
[  415.307239]   .cpu_load[4]                   : 0
[  415.307239]
[  415.307239] cfs_rq[0]:
[  415.307239]   .exec_clock                    : 0.000000
[  415.307239]   .MIN_vruntime                  : 42490.860493
[  415.307239]   .min_vruntime                  : 42493.860493
[  415.307239]   .max_vruntime                  : 42490.860493
[  415.307239]   .spread                        : 0.000000
[  415.307239]   .spread0                       : 0.000000
[  415.307239]   .nr_spread_over                : 0
[  415.307239]   .nr_running                    : 2
[  415.307239]   .load                          : 2048
[  415.307239]
[  415.307239] rt_rq[0]:
[  415.307239]   .rt_nr_running                 : 0
[  415.307239]   .rt_throttled                  : 0
[  415.307239]   .rt_time                       : 0.000000
[  415.307239]   .rt_runtime                    : 950.000000
[  415.307239]
[  415.307239] dl_rq[0]:
[  415.307239]   .dl_nr_running                 : 0
[  415.307239]
[  415.307239] runnable tasks:
[  415.307239]             task   PID         tree-key  switches  prio 
   wait-time             sum-exec        sum-sleep
[  415.307239] 
----------------------------------------------------------------------------------------------------------
[  415.307239]             init     1     41218.246221       324   120 
       0.000000      7091.870720         0.000000
[  415.307239]         kthreadd     2     10041.075333        53   120 
       0.000000        39.044608         0.000000
[  415.307239]      ksoftirqd/0     3     42471.449421       525   120 
       0.000000       174.042368         0.000000
[  415.307239]      kworker/0:0     4       393.682966        10   120 
       0.000000         3.465472         0.000000
[  415.307239]     kworker/0:0H     5       542.229904         3   100 
       0.000000         0.373504         0.000000
[  415.307239]     kworker/u2:0     6     42490.860493       482   120 
       0.000000       682.979200         0.000000
[  415.307239]      rcu_preempt     7     42471.239885      7355   120 
       0.000000      2497.164544         0.000000
[  415.307239]        rcu_sched     8        16.696155         2   120 
       0.000000         0.232192         0.000000
[  415.307239]           rcu_bh     9        18.788058         2   120 
       0.000000         0.225792         0.000000
[  415.307239]        kdevtmpfs    10      9906.339623        53   120 
       0.000000        49.822080         0.000000
[  415.307239]             perf    11       209.966618         2   100 
       0.000000         0.418688         0.000000
[  415.307239]       khungtaskd    12     42271.067789        43   120 
       0.000000         9.401344         0.000000
[  415.307239]        writeback    13       393.600057         2   100 
       0.000000         0.301184         0.000000
[  415.307239]      kworker/0:1    14     42462.051405      3062   120 
       0.000000      1814.468864         0.000000
[  415.307239]           crypto    15       395.547701         2   100 
       0.000000         0.310016         0.000000
[  415.307239]           rpciod    16       547.595271         3   100 
       0.000000         0.346752         0.000000
[  415.307239]          kswapd0    17      1141.966520         3   120 
       0.000000         0.758656         0.000000
[  415.307239]    fsnotify_mark    18      1289.797239         2   120 
       0.000000         1.068032         0.000000
[  415.307239]           nfsiod    19      1294.974059         2   100 
       0.000000         0.303360         0.000000
[  415.307239]            bond0    37      4873.191411         2   100 
       0.000000         0.320512         0.000000
[  415.307239]    ipv6_addrconf    49      5541.654525         2   100 
       0.000000         0.303232         0.000000
[  415.307239]          deferwq    50      5664.433381         2   100 
       0.000000         0.299264         0.000000
[  415.307239]     kworker/u2:1    51     29504.315231         5   120 
       0.000000         0.803840         0.000000
[  415.307239]         spi32766    52     24174.314766       243   120 
       0.000000        44.892928         0.000000
[  415.307239]        ubi_bgt0d    63     10399.804677        11   120 
       0.000000        25.593344         0.000000
[  415.307239]     ubifs_bgt0_0    65     10052.319044         2   120 
       0.000000         7.036288         0.000000
[  415.307239]           udhcpc   171     19428.265114         1   120 
       0.000000         5.979520         0.000000
[  415.307239]         watchdog   188         0.654209       842     0 
       0.000000      3819.744512         0.000000
[  415.307239]             ntpd   208     42384.913677       419   120 
       0.000000       645.997952         0.000000
[  415.307239]            inetd   221     21775.101204         5   120 
       0.000000        39.310848         0.000000
[  415.307239]            crond   234     40244.762133        14   120 
       0.000000        49.742208         0.000000
[  415.307239]            snmpd   250     42386.978381      1732   120 
       0.000000     12158.752128         0.000000
[  415.307239]            login   263     28214.829541        46   120 
       0.000000       420.666240         0.000000
[  415.307239] R             sh   281     42490.860493       174   120 
       0.000000       538.372864         0.000000
[  415.307239]


[  415.307239] sysrq: SysRq : HELP : loglevel(0-9) reboot(b) crash(c) 
show-all-locks(d) terminate-all-tasks(e) memory-full-oom-kill(f) 
kill-all-tasks(i) show-memory-usage(m) nice-all-RT-tasks(n) poweroff(o) 
show-registers(p) show-all-timers(q) sync(s) show-task-states(t) 
unmount(u) show-blocked-tasks(w)
[  415.307239] sysrq: SysRq : Show clockevent devices & pending hrtimers 
(no others)
[  415.307239] Timer List Version: v0.8
[  415.307239] HRTIMER_MAX_CLOCK_BASES: 4
[  415.307239] now at 415174352520 nsecs
[  415.307239]
[  415.307239] cpu: 0
[  415.307239]  clock 0:
[  415.307239]   .base:       c05a6a00
[  415.307239]   .index:      0
[  415.307239]   .resolution: 1 nsecs
[  415.307239]   .get_time:   ktime_get
[  415.307239]   .offset:     0 nsecs
[  415.307239] active timers:
[  415.307239]  #0: tick_cpu_sched, tick_sched_timer, S:01
[  415.307239]  # expires at 415180000000-415180000000 nsecs [in 5647480 
to 5647480 nsecs]
[  415.307239]  #1: <c6485ea8>, hrtimer_wakeup, S:01
[  415.307239]  # expires at 415316748939-415316748939 nsecs [in 
142396419 to 142396419 nsecs]
[  415.307239]  #2: <c6481a28>, hrtimer_wakeup, S:01
[  415.307239]  # expires at 415325581283-415326561308 nsecs [in 
151228763 to 152208788 nsecs]
[  415.307239]  #3: <c61fa178>, it_real_fn, S:01
[  415.307239]  # expires at 415635620681-415635620681 nsecs [in 
461268161 to 461268161 nsecs]
[  415.307239]  #4: <c622dea8>, hrtimer_wakeup, S:01
[  415.307239]  # expires at 426702410278-426702460278 nsecs [in 
11528057758 to 11528107758 nsecs]
[  415.307239]  #5: <c6487a28>, hrtimer_wakeup, S:01
[  415.307239]  # expires at 129624716464923-129624816464923 nsecs [in 
129209542112403 to 129209642112403 nsecs]
[  415.307239]  clock 1:
[  415.307239]   .base:       c05a6a20
[  415.307239]   .index:      1
[  415.307239]   .resolution: 1 nsecs
[  415.307239]   .get_time:   ktime_get_real
[  415.307239]   .offset:     1471420915047270622 nsecs
[  415.307239] active timers:
[  415.307239]  clock 2:
[  415.307239]   .base:       c05a6a40
[  415.307239]   .index:      2
[  415.307239]   .resolution: 1 nsecs
[  415.307239]   .get_time:   ktime_get_boottime
[  415.307239]   .offset:     0 nsecs
[  415.307239] active timers:
[  415.307239]  clock 3:
[  415.307239]   .base:       c05a6a60
[  415.307239]   .index:      3
[  415.307239]   .resolution: 1 nsecs
[  415.307239]   .get_time:   ktime_get_clocktai
[  415.307239]   .offset:     1471420915047270622 nsecs
[  415.307239] active timers:
[  415.307239]   .expires_next   : 415180000000 nsecs
[  415.307239]   .hres_active    : 1
[  415.307239]   .nr_events      : 43162
[  415.307239]   .nr_retries     : 20
[  415.307239]   .nr_hangs       : 0
[  415.307239]   .max_hang_time  : 0
[  415.307239]   .nohz_mode      : 0
[  415.307239]   .last_tick      : 0 nsecs
[  415.307239]   .tick_stopped   : 0
[  415.307239]   .idle_jiffies   : 0
[  415.307239]   .idle_calls     : 0
[  415.307239]   .idle_sleeps    : 0
[  415.307239]   .idle_entrytime : 0 nsecs
[  415.307239]   .idle_waketime  : 0 nsecs
[  415.307239]   .idle_exittime  : 0 nsecs
[  415.307239]   .idle_sleeptime : 0 nsecs
[  415.307239]   .iowait_sleeptime: 0 nsecs
[  415.307239]   .last_jiffies   : 0
[  415.307239]   .next_timer     : 0
[  415.307239]   .idle_expires   : 0 nsecs
[  415.307239] jiffies: 11517
[  415.307239]
[  415.307239] Tick Device: mode:     1
[  415.307239] Per CPU device: 0
[  415.307239] Clock Event Device: decrementer
[  415.307239]  max_delta_ns:   260301049532
[  415.307239]  min_delta_ns:   1000
[  415.307239]  mult:           35433480
[  415.307239]  shift:          32
[  415.307239]  mode:           3
[  415.307239]  next_event:     415180000000 nsecs
[  415.307239]  set_next_event: decrementer_set_next_event
[  415.307239]  shutdown: decrementer_shutdown
[  415.307239]  resume:   decrementer_shutdown
[  415.307239]  event_handler:  hrtimer_interrupt
[  415.307239]  retries:        0
[  415.307239]


[  415.307239] sysrq: SysRq : HELP : loglevel(0-9) reboot(b) crash(c) 
show-all-locks(d) terminate-all-tasks(e) memory-full-oom-kill(f) 
kill-all-tasks(i) show-memory-usage(m) nice-all-RT-tasks(n) poweroff(o) 
show-registers(p) show-all-timers(q) sync(s) show-task-states(t) 
unmount(u) show-blocked-tasks(w)


[  415.307239] sysrq: SysRq : Show State
[  415.307239]   task                PC stack   pid father
[  415.307239] init            S 0fd6f410  5244     1      0 0x00000000
[  415.307239] Call Trace:
[  415.307239] [c6091dc0] [c05dd3b0] list_entries+0x0/0x120000 (unreliable)
[  415.307239] [c6091e80] [c042b07c] __schedule+0x2cc/0x660
[  415.307239] [c6091eb0] [c042b4a0] schedule+0x2c/0x54
[  415.307239] [c6091ec0] [c0018fb0] do_wait+0x1b0/0x1fc
[  415.307239] [c6091ef0] [c0019cf4] SyS_wait4+0x74/0xd8
[  415.307239] [c6091f40] [c000bbd4] ret_from_syscall+0x0/0x38
[  415.307239] kthreadd        S 00000000  6492     2      0 0x00000800
[  415.307239] Call Trace:
[  415.307239] [c6093e00] [c05dd3b0] list_entries+0x0/0x120000 (unreliable)
[  415.307239] [c6093ec0] [c042b100] __schedule+0x350/0x660
[  415.307239] [c6093ef0] [c042b4a0] schedule+0x2c/0x54
[  415.307239] [c6093f00] [c00354d0] kthreadd+0x138/0x148
[  415.307239] [c6093f40] [c000bd38] ret_from_kernel_thread+0x5c/0x64
[  415.307239] ksoftirqd/0     S 00000000  5644     3      2 0x00000800
[  415.307239] Call Trace:
[  415.307239] [c6095d70] [c0048b8c] trace_hardirqs_on_caller+0x9c/0x1a0 
(unreliable)
[  415.307239] [c6095e30] [c042b100] __schedule+0x350/0x660
[  415.307239] [c6095e60] [c042b4a0] schedule+0x2c/0x54
[  415.307239] [c6095e70] [c00387e8] smpboot_thread_fn+0x284/0x2ec
[  415.307239] [c6095eb0] [c0034de8] kthread+0xc8/0xdc
[  415.307239] [c6095f40] [c000bd38] ret_from_kernel_thread+0x5c/0x64
[  415.307239] kworker/0:0     S 00000000  6748     4      2 0x00000800
[  415.307239] Call Trace:
[  415.307239] [c6097d60] [0000000c] 0xc (unreliable)
[  415.307239] [c6097e20] [c042b100] __schedule+0x350/0x660
[  415.307239] [c6097e50] [c042b4a0] schedule+0x2c/0x54
[  415.307239] [c6097e60] [c002ef60] worker_thread+0x1b0/0x474
[  415.307239] [c6097eb0] [c0034de8] kthread+0xc8/0xdc
[  415.307239] [c6097f40] [c000bd38] ret_from_kernel_thread+0x5c/0x64
[  415.307239] kworker/0:0H    S 00000000  7404     5      2 0x00000800
[  415.307239] Call Trace:
[  415.307239] [c6099d60] [c05dd3b0] list_entries+0x0/0x120000 (unreliable)
[  415.307239] [c6099e20] [c042b100] __schedule+0x350/0x660
[  415.307239] [c6099e50] [c042b4a0] schedule+0x2c/0x54
[  415.307239] [c6099e60] [c002ef60] worker_thread+0x1b0/0x474
[  415.307239] [c6099eb0] [c0034de8] kthread+0xc8/0xdc
[  415.307239] [c6099f40] [c000bd38] ret_from_kernel_thread+0x5c/0x64
[  415.307239] kworker/u2:0    R running   5324     6      2 0x00000800
[  415.307239] Call Trace:
[  415.307239] [c609bd60] [c96e0000] 0xc96e0000 (unreliable)
[  415.307239] [c609be20] [c042b100] __schedule+0x350/0x660
[  415.307239] [c609be50] [c042b4a0] schedule+0x2c/0x54
[  415.307239] [c609be60] [c002ef60] worker_thread+0x1b0/0x474
[  415.307239] [c609beb0] [c0034de8] kthread+0xc8/0xdc
[  415.307239] [c609bf40] [c000bd38] ret_from_kernel_thread+0x5c/0x64
[  415.307239] rcu_preempt     S 00000000  6540     7      2 0x00000800
[  415.307239] Call Trace:
[  415.307239] [c609dd30] [c05dd3b0] list_entries+0x0/0x120000 (unreliable)
[  415.307239] [c609ddf0] [c042b100] __schedule+0x350/0x660
[  415.307239] [c609de20] [c042b4a0] schedule+0x2c/0x54
[  415.307239] [c609de30] [c005df00] rcu_gp_kthread+0xa4/0x8d0
[  415.307239] [c609deb0] [c0034de8] kthread+0xc8/0xdc
[  415.307239] [c609df40] [c000bd38] ret_from_kernel_thread+0x5c/0x64
[  415.307239] rcu_sched       S 00000000  7356     8      2 0x00000800
[  415.307239] Call Trace:
[  415.307239] [c609fd30] [c05dd3b0] list_entries+0x0/0x120000 (unreliable)
[  415.307239] [c609fdf0] [c042b100] __schedule+0x350/0x660
[  415.307239] [c609fe20] [c042b4a0] schedule+0x2c/0x54
[  415.307239] [c609fe30] [c005df00] rcu_gp_kthread+0xa4/0x8d0
[  415.307239] [c609feb0] [c0034de8] kthread+0xc8/0xdc
[  415.307239] [c609ff40] [c000bd38] ret_from_kernel_thread+0x5c/0x64
[  415.307239] rcu_bh          S 00000000  7356     9      2 0x00000800
[  415.307239] Call Trace:
[  415.307239] [c60b1d30] [c05dd3b0] list_entries+0x0/0x120000 (unreliable)
[  415.307239] [c60b1df0] [c042b100] __schedule+0x350/0x660
[  415.307239] [c60b1e20] [c042b4a0] schedule+0x2c/0x54
[  415.307239] [c60b1e30] [c005df00] rcu_gp_kthread+0xa4/0x8d0
[  415.307239] [c60b1eb0] [c0034de8] kthread+0xc8/0xdc
[  415.307239] [c60b1f40] [c000bd38] ret_from_kernel_thread+0x5c/0x64
[  415.307239] kdevtmpfs       S 00000000  6348    10      2 0x00000800
[  415.307239] Call Trace:
[  415.307239] [c60b3d10] [c05dd3b0] list_entries+0x0/0x120000 (unreliable)
[  415.307239] [c60b3dd0] [c042b100] __schedule+0x350/0x660
[  415.307239] [c60b3e00] [c042b4a0] schedule+0x2c/0x54
[  415.307239] [c60b3e10] [c01f4304] devtmpfsd+0x2e0/0x2e4
[  415.307239] [c60b3eb0] [c0034de8] kthread+0xc8/0xdc
[  415.307239] [c60b3f40] [c000bd38] ret_from_kernel_thread+0x5c/0x64
[  415.307239] perf            S 00000000  7388    11      2 0x00000800
[  415.307239] Call Trace:
[  415.307239] [c60b5d50] [c05dd3b0] list_entries+0x0/0x120000 (unreliable)
[  415.307239] [c60b5e10] [c042b100] __schedule+0x350/0x660
[  415.307239] [c60b5e40] [c042b4a0] schedule+0x2c/0x54
[  415.307239] [c60b5e50] [c002f458] rescuer_thread+0x234/0x2c8
[  415.307239] [c60b5eb0] [c0034de8] kthread+0xc8/0xdc
[  415.307239] [c60b5f40] [c000bd38] ret_from_kernel_thread+0x5c/0x64
[  415.307239] khungtaskd      S 00000000  7324    12      2 0x00000800
[  415.307239] Call Trace:
[  415.307239] [c60b9dd0] [c042b100] __schedule+0x350/0x660
[  415.307239] [c60b9e00] [c042b4a0] schedule+0x2c/0x54
[  415.307239] [c60b9e10] [c042f2a0] schedule_timeout+0xfc/0x16c
[  415.307239] [c60b9e60] [c00788ac] watchdog+0x70/0x498
[  415.307239] [c60b9eb0] [c0034de8] kthread+0xc8/0xdc
[  415.307239] [c60b9f40] [c000bd38] ret_from_kernel_thread+0x5c/0x64
[  415.307239] writeback       S 00000000  7388    13      2 0x00000800
[  415.307239] Call Trace:
[  415.307239] [c60bbd50] [c05dd3b0] list_entries+0x0/0x120000 (unreliable)
[  415.307239] [c60bbe10] [c042b100] __schedule+0x350/0x660
[  415.307239] [c60bbe40] [c042b4a0] schedule+0x2c/0x54
[  415.307239] [c60bbe50] [c002f458] rescuer_thread+0x234/0x2c8
[  415.307239] [c60bbeb0] [c0034de8] kthread+0xc8/0xdc
[  415.307239] [c60bbf40] [c000bd38] ret_from_kernel_thread+0x5c/0x64
[  415.307239] kworker/0:1     S 00000000  6332    14      2 0x00000800
[  415.307239] Call Trace:
[  415.307239] [c60bdd60] [c004a590] __lock_acquire+0x314/0x1a30 
(unreliable)
[  415.307239] [c60bde20] [c042b100] __schedule+0x350/0x660
[  415.307239] [c60bde50] [c042b4a0] schedule+0x2c/0x54
[  415.307239] [c60bde60] [c002ef60] worker_thread+0x1b0/0x474
[  415.307239] [c60bdeb0] [c0034de8] kthread+0xc8/0xdc
[  415.307239] [c60bdf40] [c000bd38] ret_from_kernel_thread+0x5c/0x64
[  415.307239] crypto          S 00000000  7388    15      2 0x00000800
[  415.307239] Call Trace:
[  415.307239] [c60bfd50] [c05dd3b0] list_entries+0x0/0x120000 (unreliable)
[  415.307239] [c60bfe10] [c042b100] __schedule+0x350/0x660
[  415.307239] [c60bfe40] [c042b4a0] schedule+0x2c/0x54
[  415.307239] [c60bfe50] [c002f458] rescuer_thread+0x234/0x2c8
[  415.307239] [c60bfeb0] [c0034de8] kthread+0xc8/0xdc
[  415.307239] [c60bff40] [c000bd38] ret_from_kernel_thread+0x5c/0x64
[  415.307239] rpciod          S 00000000  7388    16      2 0x00000800
[  415.307239] Call Trace:
[  415.307239] [c6115d50] [c05dd3b0] list_entries+0x0/0x120000 (unreliable)
[  415.307239] [c6115e10] [c042b100] __schedule+0x350/0x660
[  415.307239] [c6115e40] [c042b4a0] schedule+0x2c/0x54
[  415.307239] [c6115e50] [c002f458] rescuer_thread+0x234/0x2c8
[  415.307239] [c6115eb0] [c0034de8] kthread+0xc8/0xdc
[  415.307239] [c6115f40] [c000bd38] ret_from_kernel_thread+0x5c/0x64
[  415.307239] kswapd0         S 00000000  7068    17      2 0x00000800
[  415.307239] Call Trace:
[  415.307239] [c6117d10] [22000082] 0x22000082 (unreliable)
[  415.307239] [c6117dd0] [c042b100] __schedule+0x350/0x660
[  415.307239] [c6117e00] [c042b4a0] schedule+0x2c/0x54
[  415.307239] [c6117e10] [c00a3eb4] kswapd+0x628/0x638
[  415.307239] [c6117eb0] [c0034de8] kthread+0xc8/0xdc
[  415.307239] [c6117f40] [c000bd38] ret_from_kernel_thread+0x5c/0x64
[  415.307239] fsnotify_mark   S 00000000  6620    18      2 0x00000800
[  415.307239] Call Trace:
[  415.307239] [c6119d60] [00009032] 0x9032 (unreliable)
[  415.307239] [c6119e20] [c042b100] __schedule+0x350/0x660
[  415.307239] [c6119e50] [c042b4a0] schedule+0x2c/0x54
[  415.307239] [c6119e60] [c00ffbb8] fsnotify_mark_destroy+0xfc/0x12c
[  415.307239] [c6119eb0] [c0034de8] kthread+0xc8/0xdc
[  415.307239] [c6119f40] [c000bd38] ret_from_kernel_thread+0x5c/0x64
[  415.307239] nfsiod          S 00000000  7388    19      2 0x00000800
[  415.307239] Call Trace:
[  415.307239] [c611bd50] [c05dd3b0] list_entries+0x0/0x120000 (unreliable)
[  415.307239] [c611be10] [c042b100] __schedule+0x350/0x660
[  415.307239] [c611be40] [c042b4a0] schedule+0x2c/0x54
[  415.307239] [c611be50] [c002f458] rescuer_thread+0x234/0x2c8
[  415.307239] [c611beb0] [c0034de8] kthread+0xc8/0xdc
[  415.307239] [c611bf40] [c000bd38] ret_from_kernel_thread+0x5c/0x64
[  415.307239] bond0           S 00000000  7388    37      2 0x00000800
[  415.307239] Call Trace:
[  415.307239] [c6225d50] [c05dd3b0] list_entries+0x0/0x120000 (unreliable)
[  415.307239] [c6225e10] [c042b100] __schedule+0x350/0x660
[  415.307239] [c6225e40] [c042b4a0] schedule+0x2c/0x54
[  415.307239] [c6225e50] [c002f458] rescuer_thread+0x234/0x2c8
[  415.307239] [c6225eb0] [c0034de8] kthread+0xc8/0xdc
[  415.307239] [c6225f40] [c000bd38] ret_from_kernel_thread+0x5c/0x64
[  415.307239] ipv6_addrconf   S 00000000  7388    49      2 0x00000800
[  415.307239] Call Trace:
[  415.307239] [c6211d50] [c05dd3b0] list_entries+0x0/0x120000 (unreliable)
[  415.307239] [c6211e10] [c042b100] __schedule+0x350/0x660
[  415.307239] [c6211e40] [c042b4a0] schedule+0x2c/0x54
[  415.307239] [c6211e50] [c002f458] rescuer_thread+0x234/0x2c8
[  415.307239] [c6211eb0] [c0034de8] kthread+0xc8/0xdc
[  415.307239] [c6211f40] [c000bd38] ret_from_kernel_thread+0x5c/0x64
[  415.307239] deferwq         S 00000000  7388    50      2 0x00000800
[  415.307239] Call Trace:
[  415.307239] [c621fd50] [c05dd3b0] list_entries+0x0/0x120000 (unreliable)
[  415.307239] [c621fe10] [c042b100] __schedule+0x350/0x660
[  415.307239] [c621fe40] [c042b4a0] schedule+0x2c/0x54
[  415.307239] [c621fe50] [c002f458] rescuer_thread+0x234/0x2c8
[  415.307239] [c621feb0] [c0034de8] kthread+0xc8/0xdc
[  415.307239] [c621ff40] [c000bd38] ret_from_kernel_thread+0x5c/0x64
[  415.307239] kworker/u2:1    S 00000000  7404    51      2 0x00000800
[  415.307239] Call Trace:
[  415.307239] [c6213d60] [c05dd3b0] list_entries+0x0/0x120000 (unreliable)
[  415.307239] [c6213e20] [c042b100] __schedule+0x350/0x660
[  415.307239] [c6213e50] [c042b4a0] schedule+0x2c/0x54
[  415.307239] [c6213e60] [c002ef60] worker_thread+0x1b0/0x474
[  415.307239] [c6213eb0] [c0034de8] kthread+0xc8/0xdc
[  415.307239] [c6213f40] [c000bd38] ret_from_kernel_thread+0x5c/0x64
[  415.307239] spi32766        S 00000000  7404    52      2 0x00000800
[  415.307239] Call Trace:
[  415.307239] [c621bd80] [c05dd3b0] list_entries+0x0/0x120000 (unreliable)
[  415.307239] [c621be40] [c042b100] __schedule+0x350/0x660
[  415.307239] [c621be70] [c042b4a0] schedule+0x2c/0x54
[  415.307239] [c621be80] [c0035618] kthread_worker_fn+0xc8/0x124
[  415.307239] [c621beb0] [c0034de8] kthread+0xc8/0xdc
[  415.307239] [c621bf40] [c000bd38] ret_from_kernel_thread+0x5c/0x64
[  415.307239] ubi_bgt0d       S 00000000  6156    63      2 0x00000800
[  415.307239] Call Trace:
[  415.307239] [c622fd70] [c05dd3b0] list_entries+0x0/0x120000 (unreliable)
[  415.307239] [c622fe30] [c042b100] __schedule+0x350/0x660
[  415.307239] [c622fe60] [c042b4a0] schedule+0x2c/0x54
[  415.307239] [c622fe70] [c0223eac] ubi_thread+0xc8/0x188
[  415.307239] [c622feb0] [c0034de8] kthread+0xc8/0xdc
[  415.307239] [c622ff40] [c000bd38] ret_from_kernel_thread+0x5c/0x64
[  415.307239] ubifs_bgt0_0    S 00000000  7180    65      2 0x00000800
[  415.307239] Call Trace:
[  415.307239] [c622bd80] [c042ff6c] 
_raw_spin_unlock_irqrestore+0x48/0x98 (unreliable)
[  415.307239] [c622be40] [c042b100] __schedule+0x350/0x660
[  415.307239] [c622be70] [c042b4a0] schedule+0x2c/0x54
[  415.307239] [c622be80] [c016fa68] ubifs_bg_thread+0x120/0x19c
[  415.307239] [c622beb0] [c0034de8] kthread+0xc8/0xdc
[  415.307239] [c622bf40] [c000bd38] ret_from_kernel_thread+0x5c/0x64
[  415.307239] udhcpc          S 0fd9efa8  6268   171      1 0x00000000
[  415.307239] Call Trace:
[  415.307239] [c64879e0] [c042b07c] __schedule+0x2cc/0x660
[  415.307239] [c6487a10] [c042b4a0] schedule+0x2c/0x54
[  415.307239] [c6487a20] [c042f5f4] 
schedule_hrtimeout_range_clock+0xc4/0x14c
[  415.307239] [c6487a90] [c00de47c] poll_schedule_timeout+0x44/0x7c
[  415.307239] [c6487aa0] [c00deb7c] do_select+0x5ec/0x710
[  415.307239] [c6487dc0] [c00deecc] core_sys_select+0x22c/0x4cc
[  415.307239] [c6487f10] [c00df214] SyS_select+0xa8/0x124
[  415.307239] [c6487f40] [c000bbd4] ret_from_syscall+0x0/0x38
[  415.307239] watchdog        S 0fedb94c  5532   188      1 0x00000000
[  415.307239] Call Trace:
[  415.307239] [c6485d80] [c05a4140] pidmap_lock+0x0/0x20 (unreliable)
[  415.307239] [c6485e40] [c042b07c] __schedule+0x2cc/0x660
[  415.307239] [c6485e70] [c042b4a0] schedule+0x2c/0x54
[  415.307239] [c6485e80] [c042f424] do_nanosleep+0x74/0xe0
[  415.307239] [c6485ea0] [c0063fbc] hrtimer_nanosleep+0xb0/0x144
[  415.307239] [c6485f10] [c0064104] SyS_nanosleep+0xb4/0xe0
[  415.307239] [c6485f40] [c000bbd4] ret_from_syscall+0x0/0x38
[  415.307239] ntpd            S 0fe32fa8  5452   208      1 0x00000000
[  415.307239] Call Trace:
[  415.307239] [c6223920] [c05dd3b0] list_entries+0x0/0x120000 (unreliable)
[  415.307239] [c62239e0] [c042b07c] __schedule+0x2cc/0x660
[  415.307239] [c6223a10] [c042b4a0] schedule+0x2c/0x54
[  415.307239] [c6223a20] [c042f674] 
schedule_hrtimeout_range_clock+0x144/0x14c
[  415.307239] [c6223a90] [c00de47c] poll_schedule_timeout+0x44/0x7c
[  415.307239] [c6223aa0] [c00deb7c] do_select+0x5ec/0x710
[  415.307239] [c6223dc0] [c00deecc] core_sys_select+0x22c/0x4cc
[  415.307239] [c6223f10] [c00df214] SyS_select+0xa8/0x124
[  415.307239] [c6223f40] [c000bbd4] ret_from_syscall+0x0/0x38
[  415.307239] inetd           S 0fd9efa8  6172   221      1 0x00000000
[  415.307239] Call Trace:
[  415.307239] [c6483920] [c05dd3b0] list_entries+0x0/0x120000 (unreliable)
[  415.307239] [c64839e0] [c042b07c] __schedule+0x2cc/0x660
[  415.307239] [c6483a10] [c042b4a0] schedule+0x2c/0x54
[  415.307239] [c6483a20] [c042f674] 
schedule_hrtimeout_range_clock+0x144/0x14c
[  415.307239] [c6483a90] [c00de47c] poll_schedule_timeout+0x44/0x7c
[  415.307239] [c6483aa0] [c00deb7c] do_select+0x5ec/0x710
[  415.307239] [c6483dc0] [c00deecc] core_sys_select+0x22c/0x4cc
[  415.307239] [c6483f10] [c00df214] SyS_select+0xa8/0x124
[  415.307239] [c6483f40] [c000bbd4] ret_from_syscall+0x0/0x38
[  415.307239] crond           S 0fd6f94c  6156   234      1 0x00000000
[  415.307239] Call Trace:
[  415.307239] [c622dd80] [c004a590] __lock_acquire+0x314/0x1a30 
(unreliable)
[  415.307239] [c622de40] [c042b07c] __schedule+0x2cc/0x660
[  415.307239] [c622de70] [c042b4a0] schedule+0x2c/0x54
[  415.307239] [c622de80] [c042f424] do_nanosleep+0x74/0xe0
[  415.307239] [c622dea0] [c0063fbc] hrtimer_nanosleep+0xb0/0x144
[  415.307239] [c622df10] [c0064104] SyS_nanosleep+0xb4/0xe0
[  415.307239] [c622df40] [c000bbd4] ret_from_syscall+0x0/0x38
[  415.307239] snmpd           S 0fa3afa8  5516   250      1 0x00000000
[  415.307239] Call Trace:
[  415.307239] [c6481920] [c05dd3b0] list_entries+0x0/0x120000 (unreliable)
[  415.307239] [c64819e0] [c042b07c] __schedule+0x2cc/0x660
[  415.307239] [c6481a10] [c042b4a0] schedule+0x2c/0x54
[  415.307239] [c6481a20] [c042f5f4] 
schedule_hrtimeout_range_clock+0xc4/0x14c
[  415.307239] [c6481a90] [c00de47c] poll_schedule_timeout+0x44/0x7c
[  415.307239] [c6481aa0] [c00deb7c] do_select+0x5ec/0x710
[  415.307239] [c6481dc0] [c00deecc] core_sys_select+0x22c/0x4cc
[  415.307239] [c6481f10] [c00df214] SyS_select+0xa8/0x124
[  415.307239] [c6481f40] [c000bbd4] ret_from_syscall+0x0/0x38
[  415.307239] login           S 0fd6f410  5404   263      1 0x00000000
[  415.307239] Call Trace:
[  415.307239] [c611ddc0] [c05dd3b0] list_entries+0x0/0x120000 (unreliable)
[  415.307239] [c611de80] [c042b07c] __schedule+0x2cc/0x660
[  415.307239] [c611deb0] [c042b4a0] schedule+0x2c/0x54
[  415.307239] [c611dec0] [c0018fb0] do_wait+0x1b0/0x1fc
[  415.307239] [c611def0] [c0019cf4] SyS_wait4+0x74/0xd8
[  415.307239] [c611df40] [c000bbd4] ret_from_syscall+0x0/0x38
[  415.307239] sh              R running   5580   281    263 0x00000000
[  415.307239] Call Trace:
[  415.307239] [c7fefe60] [c003da18] show_state_filter+0xc8/0x1c8 
(unreliable)
[  415.307239] [c7fefe90] [c01d901c] sysrq_handle_showstate+0x14/0x24
[  415.307239] [c7fefea0] [c01d94cc] __handle_sysrq+0x180/0x268
[  415.307239] [c7fefed0] [c01e2980] cpm_uart_int_rx+0x1f8/0x33c
[  415.307239] [c7feff20] [c01e2e70] cpm_uart_int+0x104/0x134
[  415.307239] [c7feff40] [c0054598] handle_irq_event_percpu+0x5c/0x158
[  415.307239] [c7feff70] [c00546d8] handle_irq_event+0x44/0x78
[  415.307239] [c7feff90] [c00579b8] handle_fasteoi_irq+0xd4/0x210
[  415.307239] [c7feffb0] [c0053e48] generic_handle_irq+0x30/0x4c
[  415.307239] [c7feffc0] [c00132dc] cpm_cascade+0x28/0x48
[  415.307239] [c7feffd0] [c0053e48] generic_handle_irq+0x30/0x4c
[  415.307239] [c7feffe0] [c0003ea0] __do_irq+0x3c/0x88
[  415.307239] [c7fefff0] [c000b394] call_do_irq+0x24/0x3c
[  415.307239] [c6215980] [c0003f74] do_IRQ+0x88/0xe0
[  415.307239] [c62159b0] [c000c314] ret_from_except+0x0/0x14
[  415.307239] [c6215a70] [c020f8a0] nand_wait+0x2c/0x12c
[  415.307239] [c6215a90] [c020e120] nand_write_page+0xb8/0x1a0
[  415.307239] [c6215ac0] [c020f248] nand_do_write_ops+0x1ec/0x44c
[  415.307239] [c6215b30] [c020f5ec] nand_write+0x68/0x94
[  415.307239] [c6215b70] [c02210e0] ubi_io_write+0x108/0x61c
[  415.307239] [c6215bb0] [c021e10c] ubi_eba_write_leb+0x88/0x7b4
[  415.307239] [c6215c20] [c0163bfc] ubifs_leb_write+0xac/0x144
[  415.307239] [c6215c50] [c0164740] ubifs_wbuf_sync_nolock+0xfc/0x33c
[  415.307239] [c6215c70] [c0158b7c] ubifs_jnl_write_inode+0xb0/0x208
[  415.307239] [c6215cb0] [c015f764] ubifs_write_inode+0xc4/0x14c
[  415.307239] [c6215cd0] [c015bd68] ubifs_writepage+0x1ac/0x22c
[  415.307239] [c6215d00] [c009a428] __writepage+0x24/0x7c
[  415.307239] [c6215d10] [c009c6e4] write_cache_pages+0x1d0/0x380
[  415.307239] [c6215db0] [c008fe6c] __filemap_fdatawrite_range+0x64/0x88
[  415.307239] [c6215de0] [c0090644] filemap_write_and_wait_range+0x38/0x94
[  415.307239] [c6215e00] [c015ac3c] ubifs_fsync+0x50/0xd4
[  415.307239] [c6215e20] [c0093bb8] generic_file_write_iter+0x200/0x2b4
[  415.307239] [c6215e50] [c015c294] ubifs_write_iter+0x104/0x1a8
[  415.307239] [c6215ea0] [c00cba6c] __vfs_write+0xc8/0x14c
[  415.307239] [c6215ef0] [c00cbc80] vfs_write+0xa4/0x178
[  415.307239] [c6215f10] [c00cbe64] SyS_write+0x4c/0xc4
[  415.307239] [c6215f40] [c000bbd4] ret_from_syscall+0x0/0x38
[  415.307239] Sched Debug Version: v0.11, 4.4.14-s3k-dev-g08fc2be-svn #110
[  415.307239] ktime                                   : 415174.352520
[  415.307239] sched_clk                               : 415307.239936
[  415.307239] cpu_clk                                 : 415307.239936
[  415.307239] jiffies                                 : 11517
[  415.307239]
[  415.307239] sysctl_sched
[  415.307239]   .sysctl_sched_latency                    : 6.000000
[  415.307239]   .sysctl_sched_min_granularity            : 0.750000
[  415.307239]   .sysctl_sched_wakeup_granularity         : 1.000000
[  415.307239]   .sysctl_sched_child_runs_first           : 0
[  415.307239]   .sysctl_sched_features                   : 44859
[  415.307239]   .sysctl_sched_tunable_scaling            : 1 (logaritmic)
[  415.307239]
[  415.307239] cpu#0
[  415.307239]   .nr_running                    : 2
[  415.307239]   .load                          : 2048
[  415.307239]   .nr_switches                   : 26656
[  415.307239]   .nr_load_updates               : 0
[  415.307239]   .nr_uninterruptible            : 0
[  415.307239]   .next_balance                  : 0.000000
[  415.307239]   .curr->pid                     : 281
[  415.307239]   .clock                         : 415307.239936
[  415.307239]   .clock_task                    : 415307.239936
[  415.307239]   .cpu_load[0]                   : 0
[  415.307239]   .cpu_load[1]                   : 0
[  415.307239]   .cpu_load[2]                   : 0
[  415.307239]   .cpu_load[3]                   : 0
[  415.307239]   .cpu_load[4]                   : 0
[  415.307239]
[  415.307239] cfs_rq[0]:
[  415.307239]   .exec_clock                    : 0.000000
[  415.307239]   .MIN_vruntime                  : 42490.860493
[  415.307239]   .min_vruntime                  : 42493.860493
[  415.307239]   .max_vruntime                  : 42490.860493
[  415.307239]   .spread                        : 0.000000
[  415.307239]   .spread0                       : 0.000000
[  415.307239]   .nr_spread_over                : 0
[  415.307239]   .nr_running                    : 2
[  415.307239]   .load                          : 2048
[  415.307239]
[  415.307239] rt_rq[0]:
[  415.307239]   .rt_nr_running                 : 0
[  415.307239]   .rt_throttled                  : 0
[  415.307239]   .rt_time                       : 0.000000
[  415.307239]   .rt_runtime                    : 950.000000
[  415.307239]
[  415.307239] dl_rq[0]:
[  415.307239]   .dl_nr_running                 : 0
[  415.307239]
[  415.307239] runnable tasks:
[  415.307239]             task   PID         tree-key  switches  prio 
   wait-time             sum-exec        sum-sleep
[  415.307239] 
----------------------------------------------------------------------------------------------------------
[  415.307239]             init     1     41218.246221       324   120 
       0.000000      7091.870720         0.000000
[  415.307239]         kthreadd     2     10041.075333        53   120 
       0.000000        39.044608         0.000000
[  415.307239]      ksoftirqd/0     3     42471.449421       525   120 
       0.000000       174.042368         0.000000
[  415.307239]      kworker/0:0     4       393.682966        10   120 
       0.000000         3.465472         0.000000
[  415.307239]     kworker/0:0H     5       542.229904         3   100 
       0.000000         0.373504         0.000000
[  415.307239]     kworker/u2:0     6     42490.860493       482   120 
       0.000000       682.979200         0.000000
[  415.307239]      rcu_preempt     7     42471.239885      7355   120 
       0.000000      2497.164544         0.000000
[  415.307239]        rcu_sched     8        16.696155         2   120 
       0.000000         0.232192         0.000000
[  415.307239]           rcu_bh     9        18.788058         2   120 
       0.000000         0.225792         0.000000
[  415.307239]        kdevtmpfs    10      9906.339623        53   120 
       0.000000        49.822080         0.000000
[  415.307239]             perf    11       209.966618         2   100 
       0.000000         0.418688         0.000000
[  415.307239]       khungtaskd    12     42271.067789        43   120 
       0.000000         9.401344         0.000000
[  415.307239]        writeback    13       393.600057         2   100 
       0.000000         0.301184         0.000000
[  415.307239]      kworker/0:1    14     42462.051405      3062   120 
       0.000000      1814.468864         0.000000
[  415.307239]           crypto    15       395.547701         2   100 
       0.000000         0.310016         0.000000
[  415.307239]           rpciod    16       547.595271         3   100 
       0.000000         0.346752         0.000000
[  415.307239]          kswapd0    17      1141.966520         3   120 
       0.000000         0.758656         0.000000
[  415.307239]    fsnotify_mark    18      1289.797239         2   120 
       0.000000         1.068032         0.000000
[  415.307239]           nfsiod    19      1294.974059         2   100 
       0.000000         0.303360         0.000000
[  415.307239]            bond0    37      4873.191411         2   100 
       0.000000         0.320512         0.000000
[  415.307239]    ipv6_addrconf    49      5541.654525         2   100 
       0.000000         0.303232         0.000000
[  415.307239]          deferwq    50      5664.433381         2   100 
       0.000000         0.299264         0.000000
[  415.307239]     kworker/u2:1    51     29504.315231         5   120 
       0.000000         0.803840         0.000000
[  415.307239]         spi32766    52     24174.314766       243   120 
       0.000000        44.892928         0.000000
[  415.307239]        ubi_bgt0d    63     10399.804677        11   120 
       0.000000        25.593344         0.000000
[  415.307239]     ubifs_bgt0_0    65     10052.319044         2   120 
       0.000000         7.036288         0.000000
[  415.307239]           udhcpc   171     19428.265114         1   120 
       0.000000         5.979520         0.000000
[  415.307239]         watchdog   188         0.654209       842     0 
       0.000000      3819.744512         0.000000
[  415.307239]             ntpd   208     42384.913677       419   120 
       0.000000       645.997952         0.000000
[  415.307239]            inetd   221     21775.101204         5   120 
       0.000000        39.310848         0.000000
[  415.307239]            crond   234     40244.762133        14   120 
       0.000000        49.742208         0.000000
[  415.307239]            snmpd   250     42386.978381      1732   120 
       0.000000     12158.752128         0.000000
[  415.307239]            login   263     28214.829541        46   120 
       0.000000       420.666240         0.000000
[  415.307239] R             sh   281     42490.860493       174   120 
       0.000000       538.372864         0.000000
[  415.307239]
[  415.307239]
[  415.307239] Showing all locks held in the system:
[  415.307239] 7 locks held by sh/281:
[  415.307239]  #0:  (sb_writers#5){.+.+.+}, at: [<c00ce934>] 
__sb_start_write+0xac/0xc4
[  415.307239]  #1:  (&ui->ui_mutex){+.+...}, at: [<c015f704>] 
ubifs_write_inode+0x64/0x14c
[  415.307239]  #2:  (&c->commit_sem){++++..}, at: [<c0157bac>] 
make_reservation+0x5c/0x408
[  415.307239]  #3:  (&wbuf->io_mutex/1){+.+...}, at: [<c0157bd4>] 
make_reservation+0x84/0x408
[  415.307239]  #4:  (&le->mutex){++++..}, at: [<c021d8d0>] 
leb_write_lock+0x24/0x38
[  415.307239]  #5:  (rcu_read_lock){......}, at: [<c01d934c>] 
__handle_sysrq+0x0/0x268
[  415.307239]  #6:  (tasklist_lock){.+.?..}, at: [<c004a10c>] 
debug_show_all_locks+0x3c/0x1ac
[  415.307239]
[  415.307239] =============================================
[  415.307239]
[  415.307239] Showing busy workqueues and worker pools:
[  415.307239] workqueue events_unbound: flags=0x2
[  415.307239]   pwq 2: cpus=0 flags=0x4 nice=0 active=1/512
[  415.307239]     pending: flush_to_ldisc


[  415.307239] sysrq: SysRq : Show clockevent devices & pending hrtimers 
(no others)
[  415.307239] Timer List Version: v0.8
[  415.307239] HRTIMER_MAX_CLOCK_BASES: 4
[  415.307239] now at 415174352520 nsecs
[  415.307239]
[  415.307239] cpu: 0
[  415.307239]  clock 0:
[  415.307239]   .base:       c05a6a00
[  415.307239]   .index:      0
[  415.307239]   .resolution: 1 nsecs
[  415.307239]   .get_time:   ktime_get
[  415.307239]   .offset:     0 nsecs
[  415.307239] active timers:
[  415.307239]  #0: tick_cpu_sched, tick_sched_timer, S:01
[  415.307239]  # expires at 415180000000-415180000000 nsecs [in 5647480 
to 5647480 nsecs]
[  415.307239]  #1: <c6485ea8>, hrtimer_wakeup, S:01
[  415.307239]  # expires at 415316748939-415316748939 nsecs [in 
142396419 to 142396419 nsecs]
[  415.307239]  #2: <c6481a28>, hrtimer_wakeup, S:01
[  415.307239]  # expires at 415325581283-415326561308 nsecs [in 
151228763 to 152208788 nsecs]
[  415.307239]  #3: <c61fa178>, it_real_fn, S:01
[  415.307239]  # expires at 415635620681-415635620681 nsecs [in 
461268161 to 461268161 nsecs]
[  415.307239]  #4: <c622dea8>, hrtimer_wakeup, S:01
[  415.307239]  # expires at 426702410278-426702460278 nsecs [in 
11528057758 to 11528107758 nsecs]
[  415.307239]  #5: <c6487a28>, hrtimer_wakeup, S:01
[  415.307239]  # expires at 129624716464923-129624816464923 nsecs [in 
129209542112403 to 129209642112403 nsecs]
[  415.307239]  clock 1:
[  415.307239]   .base:       c05a6a20
[  415.307239]   .index:      1
[  415.307239]   .resolution: 1 nsecs
[  415.307239]   .get_time:   ktime_get_real
[  415.307239]   .offset:     1471420915047270622 nsecs
[  415.307239] active timers:
[  415.307239]  clock 2:
[  415.307239]   .base:       c05a6a40
[  415.307239]   .index:      2
[  415.307239]   .resolution: 1 nsecs
[  415.307239]   .get_time:   ktime_get_boottime
[  415.307239]   .offset:     0 nsecs
[  415.307239] active timers:
[  415.307239]  clock 3:
[  415.307239]   .base:       c05a6a60
[  415.307239]   .index:      3
[  415.307239]   .resolution: 1 nsecs
[  415.307239]   .get_time:   ktime_get_clocktai
[  415.307239]   .offset:     1471420915047270622 nsecs
[  415.307239] active timers:
[  415.307239]   .expires_next   : 415180000000 nsecs
[  415.307239]   .hres_active    : 1
[  415.307239]   .nr_events      : 43162
[  415.307239]   .nr_retries     : 20
[  415.307239]   .nr_hangs       : 0
[  415.307239]   .max_hang_time  : 0
[  415.307239]   .nohz_mode      : 0
[  415.307239]   .last_tick      : 0 nsecs
[  415.307239]   .tick_stopped   : 0
[  415.307239]   .idle_jiffies   : 0
[  415.307239]   .idle_calls     : 0
[  415.307239]   .idle_sleeps    : 0
[  415.307239]   .idle_entrytime : 0 nsecs
[  415.307239]   .idle_waketime  : 0 nsecs
[  415.307239]   .idle_exittime  : 0 nsecs
[  415.307239]   .idle_sleeptime : 0 nsecs
[  415.307239]   .iowait_sleeptime: 0 nsecs
[  415.307239]   .last_jiffies   : 0
[  415.307239]   .next_timer     : 0
[  415.307239]   .idle_expires   : 0 nsecs
[  415.307239] jiffies: 11517
[  415.307239]
[  415.307239] Tick Device: mode:     1
[  415.307239] Per CPU device: 0
[  415.307239] Clock Event Device: decrementer
[  415.307239]  max_delta_ns:   260301049532
[  415.307239]  min_delta_ns:   1000
[  415.307239]  mult:           35433480
[  415.307239]  shift:          32
[  415.307239]  mode:           3
[  415.307239]  next_event:     415180000000 nsecs
[  415.307239]  set_next_event: decrementer_set_next_event
[  415.307239]  shutdown: decrementer_shutdown
[  415.307239]  resume:   decrementer_shutdown
[  415.307239]  event_handler:  hrtimer_interrupt
[  415.307239]  retries:        0
[  415.307239]
Holger Brunck Aug. 17, 2016, 2:59 p.m. UTC | #6
On 15/08/16 18:19, Dave Hansen wrote:
> On 08/15/2016 07:35 AM, Holger Brunck wrote:
>> I tried this but unfortunately the error only occurs while remote debugging.
>> Locally with gdb everything works fine. BTW we double-checked with a 85xx ppc
>> target which is also 32-bit and it ends up with the same behaviour.
>>
>> I was also investigating where I have to move the line in the struct task_struct
>> and it turns out to be like this (diff to 4.7 kernel):
>>
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index 253538f..4868874 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -1655,7 +1655,9 @@ struct task_struct {
>>         struct signal_struct *signal;
>>         struct sighand_struct *sighand;
>>
>> +       // struct thread_struct thread;   // until here everything is fine
>>         sigset_t blocked, real_blocked;
>> +       struct thread_struct thread;      // from here it's broken
>>         sigset_t saved_sigmask; /* restored if set_restore_sigmask() was used */
>>         struct sigpending pending;
> 
> Wow, thanks for all the debugging here!
> 
> So, we know it has to do with signals, thread_info, and probably only
> affects 32-bit powerpc.  Seems awfully weird.  Have you checked with any
> of the 64-bit powerpc guys to see if they have any ideas?
> 
> I went grepping around for a bit.
> 
> Where is the task_struct stored?  Is it on-stack on ppc32 or something?
>  The thread_info is, I assume, but I see some THREAD_INFO vs. THREAD
> (thread struct) math happening in here, which confuses me:
> 
>         .globl  ret_from_debug_exc
> ret_from_debug_exc:
>         mfspr   r9,SPRN_SPRG_THREAD
>         lwz     r10,SAVED_KSP_LIMIT(r1)
>         stw     r10,KSP_LIMIT(r9)
>         lwz     r9,THREAD_INFO-THREAD(r9)
>         CURRENT_THREAD_INFO(r10, r1)
>         lwz     r10,TI_PREEMPT(r10)
>         stw     r10,TI_PREEMPT(r9)
>         RESTORE_xSRR(SRR0,SRR1);
>         RESTORE_xSRR(CSRR0,CSRR1);
>         RESTORE_MMU_REGS;
>         RET_FROM_EXC_LEVEL(SPRN_DSRR0, SPRN_DSRR1, PPC_RFDI)
> 

yeah but here you are in arch/powerpc/kernel/head_booke.h and IIUIC my
architecture uses arch/powerpc/kernel/head_32.S

Some small updates from my side. I was able to simplify the setup. I don't need
my (quite complex) application. I now have a small C - program which starts
three threads simply running a while loop and doing printouts every now and
then. And I still can reproduce the error. So we simply need threads a gdbserver
session, ppc32 and a single step while the threads are already running.

I added some debug prints within the kernel common signal code and the specific
powerpc code and enabled the debug trace from the gdbserver. What I see is that
gdbserver sends for each thread a SIGSTOP to the kernel and waits for a
response. The kernel does receive all the signals but only respond to some of
them in the error case. Which then matches with my "ps" output as I see that
some threads are not in the state pthread_stop and then the gdbserver gets
suspended. I think the interesting part is in arch/powerpc/kernel/signal_32.c
with it's function handle_signal32. For the threads successfully stopped this
function is called once. If the kernel receives two SIGSTOP before calling the
function we end up in the error case. Now my question does anyone know if this
function should handle several pending signals at once if present or will it be
called once per signal?

> But, I'm really at a loss to explain this.  It still seems like a deeply
> ppc-specific issue.  We can obviously work around it with an #ifdef for
> your platform, but that's awfully hackish and hides the real bug,
> whatever it is.
> 

what I could do is to reuse the define CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT and
if it's not set I use the thread_struct at it's old position. I know it would
only mask the error and I guess it's not acceptable for mainline but as my time
is limited I could live with such an OOT patch for my board.

Best regards
Holger
Holger Brunck Aug. 17, 2016, 3:27 p.m. UTC | #7
On 16/08/16 19:27, christophe leroy wrote:
> 
> 
> Le 15/08/2016 à 18:19, Dave Hansen a écrit :
>> On 08/15/2016 07:35 AM, Holger Brunck wrote:
>>> I tried this but unfortunately the error only occurs while remote debugging.
>>> Locally with gdb everything works fine. BTW we double-checked with a 85xx ppc
>>> target which is also 32-bit and it ends up with the same behaviour.
>>>
>>> I was also investigating where I have to move the line in the struct task_struct
>>> and it turns out to be like this (diff to 4.7 kernel):
>>>
>>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>>> index 253538f..4868874 100644
>>> --- a/include/linux/sched.h
>>> +++ b/include/linux/sched.h
>>> @@ -1655,7 +1655,9 @@ struct task_struct {
>>>         struct signal_struct *signal;
>>>         struct sighand_struct *sighand;
>>>
>>> +       // struct thread_struct thread;   // until here everything is fine
>>>         sigset_t blocked, real_blocked;
>>> +       struct thread_struct thread;      // from here it's broken
>>>         sigset_t saved_sigmask; /* restored if set_restore_sigmask() was used */
>>>         struct sigpending pending;
>>
>> Wow, thanks for all the debugging here!
>>
>> So, we know it has to do with signals, thread_info, and probably only
>> affects 32-bit powerpc.  Seems awfully weird.  Have you checked with any
>> of the 64-bit powerpc guys to see if they have any ideas?
>>
>> I went grepping around for a bit.
>>
>> Where is the task_struct stored?  Is it on-stack on ppc32 or something?
>>  The thread_info is, I assume, but I see some THREAD_INFO vs. THREAD
>> (thread struct) math happening in here, which confuses me:
>>
>>         .globl  ret_from_debug_exc
>> ret_from_debug_exc:
>>         mfspr   r9,SPRN_SPRG_THREAD
>>         lwz     r10,SAVED_KSP_LIMIT(r1)
>>         stw     r10,KSP_LIMIT(r9)
>>         lwz     r9,THREAD_INFO-THREAD(r9)
>>         CURRENT_THREAD_INFO(r10, r1)
>>         lwz     r10,TI_PREEMPT(r10)
>>         stw     r10,TI_PREEMPT(r9)
>>         RESTORE_xSRR(SRR0,SRR1);
>>         RESTORE_xSRR(CSRR0,CSRR1);
>>         RESTORE_MMU_REGS;
>>         RET_FROM_EXC_LEVEL(SPRN_DSRR0, SPRN_DSRR1, PPC_RFDI)
>>
>> But, I'm really at a loss to explain this.  It still seems like a deeply
>> ppc-specific issue.  We can obviously work around it with an #ifdef for
>> your platform, but that's awfully hackish and hides the real bug,
>> whatever it is.
>>
>> My suspicion is that there's a bug in the 32-bit ppc assembly somewhere.
>>  I don't see any references to 'blocked' or 'real_blocked' in assembly
>> though.  You could add a bunch of padding instead of moving the
>> thread_struct and see if that does anything, but that's really a stab in
>> the dark.
>>
> 
> Just to let you know, I'm not sure it is the same issue, but I also get 
> my 8xx target stuck when I try to use gdbserver.
> 
> If I debug a very small app, it gets stuck quickly after the app has 
> stopped: indeed, the console seems ok but as soon as I try to execute 
> something simple, like a ps or top, it get stuck. The target still 
> responds to pings, but nothing else.
> 
> If I debug a big app, it gets stuck soon after the start of debug: I set 
> a bpoint at main(), do a 'continue', get breaked at main(), do some 
> steps with 'next' then it gets stuck.
> 
> I have tried moving the struct thread_struct thread but it has no impact.
> 

that sounds a bit different to what I see. Is your program also mutli-threaded?

Maybe you could try with the program I use to reproduce the error:

--- snip -----
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

void * th_1_func()
{
   while (1) {
     sleep(2);
     printf("Hello from thread function 1)\n");
   }
}

int main() {
  int err;
  pthread_t th_1, th_2, th_3;

  err = pthread_create(&th_1, NULL, th_1_func, NULL);
  if (err != 0)
    printf("pthread_create\n");
  err = pthread_create(&th_2, NULL, th_1_func, NULL);
  if (err != 0)
    printf("pthread_create\n");
  err = pthread_create(&th_3, NULL, th_1_func, NULL);
  if (err != 0)
    printf("pthread_create\n");
  while(1) {}
  return 0;
}
--- snap ---

Then copy it to your target and start it with the gdbserver. If you let it run
from your host with gdb and try to stop it e.g in the sleep call and then try to
single step it you might see the error. But as I said in this thread the
behaviour might be different depending on your kernel configuration as I
encountered different behaviour when enabling FTRACE or SCHED_STAT.

Best regards
Holger
Christophe Leroy Aug. 18, 2016, 8:23 a.m. UTC | #8
Le 17/08/2016 à 17:27, Holger Brunck a écrit :
> On 16/08/16 19:27, christophe leroy wrote:
>>
>> Le 15/08/2016 à 18:19, Dave Hansen a écrit :
>>> On 08/15/2016 07:35 AM, Holger Brunck wrote:
>>>> I tried this but unfortunately the error only occurs while remote debugging.
>>>> Locally with gdb everything works fine. BTW we double-checked with a 85xx ppc
>>>> target which is also 32-bit and it ends up with the same behaviour.
>>>>
>>>> I was also investigating where I have to move the line in the struct task_struct
>>>> and it turns out to be like this (diff to 4.7 kernel):
>>>>
>>>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>>>> index 253538f..4868874 100644
>>>> --- a/include/linux/sched.h
>>>> +++ b/include/linux/sched.h
>>>> @@ -1655,7 +1655,9 @@ struct task_struct {
>>>>          struct signal_struct *signal;
>>>>          struct sighand_struct *sighand;
>>>>
>>>> +       // struct thread_struct thread;   // until here everything is fine
>>>>          sigset_t blocked, real_blocked;
>>>> +       struct thread_struct thread;      // from here it's broken
>>>>          sigset_t saved_sigmask; /* restored if set_restore_sigmask() was used */
>>>>          struct sigpending pending;
>>> Wow, thanks for all the debugging here!
>>>
>>> So, we know it has to do with signals, thread_info, and probably only
>>> affects 32-bit powerpc.  Seems awfully weird.  Have you checked with any
>>> of the 64-bit powerpc guys to see if they have any ideas?
>>>
>>> I went grepping around for a bit.
>>>
>>> Where is the task_struct stored?  Is it on-stack on ppc32 or something?
>>>   The thread_info is, I assume, but I see some THREAD_INFO vs. THREAD
>>> (thread struct) math happening in here, which confuses me:
>>>
>>>          .globl  ret_from_debug_exc
>>> ret_from_debug_exc:
>>>          mfspr   r9,SPRN_SPRG_THREAD
>>>          lwz     r10,SAVED_KSP_LIMIT(r1)
>>>          stw     r10,KSP_LIMIT(r9)
>>>          lwz     r9,THREAD_INFO-THREAD(r9)
>>>          CURRENT_THREAD_INFO(r10, r1)
>>>          lwz     r10,TI_PREEMPT(r10)
>>>          stw     r10,TI_PREEMPT(r9)
>>>          RESTORE_xSRR(SRR0,SRR1);
>>>          RESTORE_xSRR(CSRR0,CSRR1);
>>>          RESTORE_MMU_REGS;
>>>          RET_FROM_EXC_LEVEL(SPRN_DSRR0, SPRN_DSRR1, PPC_RFDI)
>>>
>>> But, I'm really at a loss to explain this.  It still seems like a deeply
>>> ppc-specific issue.  We can obviously work around it with an #ifdef for
>>> your platform, but that's awfully hackish and hides the real bug,
>>> whatever it is.
>>>
>>> My suspicion is that there's a bug in the 32-bit ppc assembly somewhere.
>>>   I don't see any references to 'blocked' or 'real_blocked' in assembly
>>> though.  You could add a bunch of padding instead of moving the
>>> thread_struct and see if that does anything, but that's really a stab in
>>> the dark.
>>>
>> Just to let you know, I'm not sure it is the same issue, but I also get
>> my 8xx target stuck when I try to use gdbserver.
>>
>> If I debug a very small app, it gets stuck quickly after the app has
>> stopped: indeed, the console seems ok but as soon as I try to execute
>> something simple, like a ps or top, it get stuck. The target still
>> responds to pings, but nothing else.
>>
>> If I debug a big app, it gets stuck soon after the start of debug: I set
>> a bpoint at main(), do a 'continue', get breaked at main(), do some
>> steps with 'next' then it gets stuck.
>>
>> I have tried moving the struct thread_struct thread but it has no impact.
>>
> that sounds a bit different to what I see. Is your program also mutli-threaded?
>
>

No my program is a simple app doing a few printf("Hello World !"); and 
nothing more.

I have now identified the issue, it is most likely specific to the 8xx: 
when entering single step exception, the 8xx asserts the internal Freeze 
which stops the decrementer and the timebase. In order to clear the 
internal Freeze, the ICR SPR has to be read.

I'll now be able to check with your program and see how it behaves.

Christophe
Christophe Leroy Aug. 19, 2016, 11:03 a.m. UTC | #9
Le 17/08/2016 à 17:27, Holger Brunck a écrit :
> On 16/08/16 19:27, christophe leroy wrote:
>>
>>
>> Le 15/08/2016 à 18:19, Dave Hansen a écrit :
>>> On 08/15/2016 07:35 AM, Holger Brunck wrote:
>>>> I tried this but unfortunately the error only occurs while remote debugging.
>>>> Locally with gdb everything works fine. BTW we double-checked with a 85xx ppc
>>>> target which is also 32-bit and it ends up with the same behaviour.
>>>>
>>>> I was also investigating where I have to move the line in the struct task_struct
>>>> and it turns out to be like this (diff to 4.7 kernel):
>>>>
>>>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>>>> index 253538f..4868874 100644
>>>> --- a/include/linux/sched.h
>>>> +++ b/include/linux/sched.h
>>>> @@ -1655,7 +1655,9 @@ struct task_struct {
>>>>         struct signal_struct *signal;
>>>>         struct sighand_struct *sighand;
>>>>
>>>> +       // struct thread_struct thread;   // until here everything is fine
>>>>         sigset_t blocked, real_blocked;
>>>> +       struct thread_struct thread;      // from here it's broken
>>>>         sigset_t saved_sigmask; /* restored if set_restore_sigmask() was used */
>>>>         struct sigpending pending;
>>>
>>> Wow, thanks for all the debugging here!
>>>
>>> So, we know it has to do with signals, thread_info, and probably only
>>> affects 32-bit powerpc.  Seems awfully weird.  Have you checked with any
>>> of the 64-bit powerpc guys to see if they have any ideas?
>>>
>>> I went grepping around for a bit.
>>>
>>> Where is the task_struct stored?  Is it on-stack on ppc32 or something?
>>>  The thread_info is, I assume, but I see some THREAD_INFO vs. THREAD
>>> (thread struct) math happening in here, which confuses me:
>>>
>>>         .globl  ret_from_debug_exc
>>> ret_from_debug_exc:
>>>         mfspr   r9,SPRN_SPRG_THREAD
>>>         lwz     r10,SAVED_KSP_LIMIT(r1)
>>>         stw     r10,KSP_LIMIT(r9)
>>>         lwz     r9,THREAD_INFO-THREAD(r9)
>>>         CURRENT_THREAD_INFO(r10, r1)
>>>         lwz     r10,TI_PREEMPT(r10)
>>>         stw     r10,TI_PREEMPT(r9)
>>>         RESTORE_xSRR(SRR0,SRR1);
>>>         RESTORE_xSRR(CSRR0,CSRR1);
>>>         RESTORE_MMU_REGS;
>>>         RET_FROM_EXC_LEVEL(SPRN_DSRR0, SPRN_DSRR1, PPC_RFDI)
>>>
>>> But, I'm really at a loss to explain this.  It still seems like a deeply
>>> ppc-specific issue.  We can obviously work around it with an #ifdef for
>>> your platform, but that's awfully hackish and hides the real bug,
>>> whatever it is.
>>>
>>> My suspicion is that there's a bug in the 32-bit ppc assembly somewhere.
>>>  I don't see any references to 'blocked' or 'real_blocked' in assembly
>>> though.  You could add a bunch of padding instead of moving the
>>> thread_struct and see if that does anything, but that's really a stab in
>>> the dark.
>>>
>>
>> Just to let you know, I'm not sure it is the same issue, but I also get
>> my 8xx target stuck when I try to use gdbserver.
>>
>> If I debug a very small app, it gets stuck quickly after the app has
>> stopped: indeed, the console seems ok but as soon as I try to execute
>> something simple, like a ps or top, it get stuck. The target still
>> responds to pings, but nothing else.
>>
>> If I debug a big app, it gets stuck soon after the start of debug: I set
>> a bpoint at main(), do a 'continue', get breaked at main(), do some
>> steps with 'next' then it gets stuck.
>>
>> I have tried moving the struct thread_struct thread but it has no impact.
>>
>
> that sounds a bit different to what I see. Is your program also mutli-threaded?
>
> Maybe you could try with the program I use to reproduce the error:
>
> --- snip -----
> #include <pthread.h>
> #include <stdio.h>
> #include <unistd.h>
>
> void * th_1_func()
> {
>    while (1) {
>      sleep(2);
>      printf("Hello from thread function 1)\n");
>    }
> }
>
> int main() {
>   int err;
>   pthread_t th_1, th_2, th_3;
>
>   err = pthread_create(&th_1, NULL, th_1_func, NULL);
>   if (err != 0)
>     printf("pthread_create\n");
>   err = pthread_create(&th_2, NULL, th_1_func, NULL);
>   if (err != 0)
>     printf("pthread_create\n");
>   err = pthread_create(&th_3, NULL, th_1_func, NULL);
>   if (err != 0)
>     printf("pthread_create\n");
>   while(1) {}
>   return 0;
> }
> --- snap ---
>
> Then copy it to your target and start it with the gdbserver. If you let it run
> from your host with gdb and try to stop it e.g in the sleep call and then try to
> single step it you might see the error. But as I said in this thread the
> behaviour might be different depending on your kernel configuration as I
> encountered different behaviour when enabling FTRACE or SCHED_STAT.
>
> Best regards
> Holger
>

Hi

I just tried it on an 885 and on an 8323, it work properly on both targets.

You can see below the Debug Option that are active on my 8323 target.

Christophe

#
# Kernel hacking
#

#
# printk and dmesg options
#
CONFIG_PRINTK_TIME=y
CONFIG_MESSAGE_LOGLEVEL_DEFAULT=4
# CONFIG_DYNAMIC_DEBUG is not set

#
# Compile-time checks and compiler options
#
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_INFO_REDUCED=y
# CONFIG_DEBUG_INFO_SPLIT is not set
# CONFIG_DEBUG_INFO_DWARF4 is not set
# CONFIG_GDB_SCRIPTS is not set
CONFIG_ENABLE_WARN_DEPRECATED=y
CONFIG_ENABLE_MUST_CHECK=y
CONFIG_FRAME_WARN=1024
# CONFIG_STRIP_ASM_SYMS is not set
# CONFIG_READABLE_ASM is not set
# CONFIG_UNUSED_SYMBOLS is not set
# CONFIG_PAGE_OWNER is not set
CONFIG_DEBUG_FS=y
# CONFIG_HEADERS_CHECK is not set
# CONFIG_DEBUG_SECTION_MISMATCH is not set
CONFIG_SECTION_MISMATCH_WARN_ONLY=y
# CONFIG_DEBUG_FORCE_WEAK_PER_CPU is not set
CONFIG_MAGIC_SYSRQ=y
CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE=0x1
CONFIG_DEBUG_KERNEL=y

#
# Memory Debugging
#
# CONFIG_PAGE_EXTENSION is not set
# CONFIG_DEBUG_PAGEALLOC is not set
# CONFIG_DEBUG_OBJECTS is not set
# CONFIG_SLUB_DEBUG_ON is not set
# CONFIG_SLUB_STATS is not set
CONFIG_HAVE_DEBUG_KMEMLEAK=y
# CONFIG_DEBUG_KMEMLEAK is not set
# CONFIG_DEBUG_STACK_USAGE is not set
# CONFIG_DEBUG_VM is not set
# CONFIG_DEBUG_MEMORY_INIT is not set
CONFIG_HAVE_DEBUG_STACKOVERFLOW=y
# CONFIG_DEBUG_STACKOVERFLOW is not set
# CONFIG_DEBUG_SHIRQ is not set

#
# Debug Lockups and Hangs
#
# CONFIG_LOCKUP_DETECTOR is not set
CONFIG_DETECT_HUNG_TASK=y
CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=120
# CONFIG_BOOTPARAM_HUNG_TASK_PANIC is not set
CONFIG_BOOTPARAM_HUNG_TASK_PANIC_VALUE=0
# CONFIG_PANIC_ON_OOPS is not set
CONFIG_PANIC_ON_OOPS_VALUE=0
CONFIG_SCHED_DEBUG=y
# CONFIG_SCHED_INFO is not set
# CONFIG_SCHEDSTATS is not set
# CONFIG_SCHED_STACK_END_CHECK is not set
# CONFIG_DEBUG_TIMEKEEPING is not set
# CONFIG_TIMER_STATS is not set
CONFIG_DEBUG_PREEMPT=y

#
# Lock Debugging (spinlocks, mutexes, etc...)
#
# CONFIG_DEBUG_RT_MUTEXES is not set
# CONFIG_DEBUG_SPINLOCK is not set
# CONFIG_DEBUG_MUTEXES is not set
# CONFIG_DEBUG_WW_MUTEX_SLOWPATH is not set
# CONFIG_DEBUG_LOCK_ALLOC is not set
# CONFIG_PROVE_LOCKING is not set
# CONFIG_LOCK_STAT is not set
# CONFIG_DEBUG_ATOMIC_SLEEP is not set
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
# CONFIG_LOCK_TORTURE_TEST is not set
# CONFIG_STACKTRACE is not set
# CONFIG_DEBUG_KOBJECT is not set
CONFIG_DEBUG_BUGVERBOSE=y
# CONFIG_DEBUG_LIST is not set
# CONFIG_DEBUG_PI_LIST is not set
# CONFIG_DEBUG_SG is not set
# CONFIG_DEBUG_NOTIFIERS is not set
# CONFIG_DEBUG_CREDENTIALS is not set

#
# RCU Debugging
#
# CONFIG_PROVE_RCU is not set
# CONFIG_SPARSE_RCU_POINTER is not set
# CONFIG_TORTURE_TEST is not set
# CONFIG_RCU_TORTURE_TEST is not set
CONFIG_RCU_CPU_STALL_TIMEOUT=21
# CONFIG_RCU_TRACE is not set
# CONFIG_RCU_EQS_DEBUG is not set
# CONFIG_DEBUG_BLOCK_EXT_DEVT is not set
# CONFIG_NOTIFIER_ERROR_INJECTION is not set
# CONFIG_FAULT_INJECTION is not set
# CONFIG_LATENCYTOP is not set
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
CONFIG_TRACE_CLOCK=y
CONFIG_RING_BUFFER=y
CONFIG_RING_BUFFER_ALLOW_SWAP=y
CONFIG_TRACING_SUPPORT=y
CONFIG_FTRACE=y
# CONFIG_FUNCTION_TRACER is not set
# CONFIG_IRQSOFF_TRACER is not set
# CONFIG_PREEMPT_TRACER is not set
# CONFIG_SCHED_TRACER is not set
# CONFIG_ENABLE_DEFAULT_TRACERS is not set
# CONFIG_FTRACE_SYSCALLS is not set
# CONFIG_TRACER_SNAPSHOT is not set
CONFIG_BRANCH_PROFILE_NONE=y
# CONFIG_PROFILE_ANNOTATED_BRANCHES is not set
# CONFIG_PROFILE_ALL_BRANCHES is not set
# CONFIG_STACK_TRACER is not set
# CONFIG_BLK_DEV_IO_TRACE is not set
# CONFIG_UPROBE_EVENT is not set
# CONFIG_PROBE_EVENTS is not set
# CONFIG_TRACEPOINT_BENCHMARK is not set
# CONFIG_RING_BUFFER_BENCHMARK is not set
# CONFIG_RING_BUFFER_STARTUP_TEST is not set
CONFIG_TRACING_EVENTS_GPIO=y

#
# Runtime Testing
#
# CONFIG_LKDTM is not set
# CONFIG_TEST_LIST_SORT is not set
# CONFIG_BACKTRACE_SELF_TEST is not set
# CONFIG_RBTREE_TEST is not set
# CONFIG_ATOMIC64_SELFTEST is not set
# CONFIG_TEST_HEXDUMP is not set
# CONFIG_TEST_STRING_HELPERS is not set
# CONFIG_TEST_KSTRTOX is not set
# CONFIG_TEST_PRINTF is not set
# CONFIG_TEST_RHASHTABLE is not set
# CONFIG_DMA_API_DEBUG is not set
# CONFIG_TEST_FIRMWARE is not set
# CONFIG_TEST_UDELAY is not set
# CONFIG_MEMTEST is not set
# CONFIG_SAMPLES is not set
CONFIG_HAVE_ARCH_KGDB=y
# CONFIG_KGDB is not set
# CONFIG_PPC_DISABLE_WERROR is not set
CONFIG_PPC_WERROR=y
# CONFIG_STRICT_MM_TYPECHECKS is not set
CONFIG_PRINT_STACK_DEPTH=64
# CONFIG_PPC_EMULATED_STATS is not set
# CONFIG_CODE_PATCHING_SELFTEST is not set
# CONFIG_FTR_FIXUP_SELFTEST is not set
# CONFIG_MSI_BITMAP_SELFTEST is not set
# CONFIG_XMON is not set
CONFIG_BDI_SWITCH=y
# CONFIG_BOOTX_TEXT is not set
# CONFIG_PPC_EARLY_DEBUG is not set
CONFIG_STRICT_DEVMEM=y
Holger Brunck Aug. 19, 2016, 11:14 a.m. UTC | #10
On 19/08/16 13:03, Christophe Leroy wrote:
> 
> 
> Le 17/08/2016 à 17:27, Holger Brunck a écrit :
>> On 16/08/16 19:27, christophe leroy wrote:
>>>
>>>
>>> Le 15/08/2016 à 18:19, Dave Hansen a écrit :
>>>> On 08/15/2016 07:35 AM, Holger Brunck wrote:
>>>>> I tried this but unfortunately the error only occurs while remote debugging.
>>>>> Locally with gdb everything works fine. BTW we double-checked with a 85xx ppc
>>>>> target which is also 32-bit and it ends up with the same behaviour.
>>>>>
>>>>> I was also investigating where I have to move the line in the struct task_struct
>>>>> and it turns out to be like this (diff to 4.7 kernel):
>>>>>
>>>>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>>>>> index 253538f..4868874 100644
>>>>> --- a/include/linux/sched.h
>>>>> +++ b/include/linux/sched.h
>>>>> @@ -1655,7 +1655,9 @@ struct task_struct {
>>>>>         struct signal_struct *signal;
>>>>>         struct sighand_struct *sighand;
>>>>>
>>>>> +       // struct thread_struct thread;   // until here everything is fine
>>>>>         sigset_t blocked, real_blocked;
>>>>> +       struct thread_struct thread;      // from here it's broken
>>>>>         sigset_t saved_sigmask; /* restored if set_restore_sigmask() was used */
>>>>>         struct sigpending pending;
>>>>
>>>> Wow, thanks for all the debugging here!
>>>>
>>>> So, we know it has to do with signals, thread_info, and probably only
>>>> affects 32-bit powerpc.  Seems awfully weird.  Have you checked with any
>>>> of the 64-bit powerpc guys to see if they have any ideas?
>>>>
>>>> I went grepping around for a bit.
>>>>
>>>> Where is the task_struct stored?  Is it on-stack on ppc32 or something?
>>>>  The thread_info is, I assume, but I see some THREAD_INFO vs. THREAD
>>>> (thread struct) math happening in here, which confuses me:
>>>>
>>>>         .globl  ret_from_debug_exc
>>>> ret_from_debug_exc:
>>>>         mfspr   r9,SPRN_SPRG_THREAD
>>>>         lwz     r10,SAVED_KSP_LIMIT(r1)
>>>>         stw     r10,KSP_LIMIT(r9)
>>>>         lwz     r9,THREAD_INFO-THREAD(r9)
>>>>         CURRENT_THREAD_INFO(r10, r1)
>>>>         lwz     r10,TI_PREEMPT(r10)
>>>>         stw     r10,TI_PREEMPT(r9)
>>>>         RESTORE_xSRR(SRR0,SRR1);
>>>>         RESTORE_xSRR(CSRR0,CSRR1);
>>>>         RESTORE_MMU_REGS;
>>>>         RET_FROM_EXC_LEVEL(SPRN_DSRR0, SPRN_DSRR1, PPC_RFDI)
>>>>
>>>> But, I'm really at a loss to explain this.  It still seems like a deeply
>>>> ppc-specific issue.  We can obviously work around it with an #ifdef for
>>>> your platform, but that's awfully hackish and hides the real bug,
>>>> whatever it is.
>>>>
>>>> My suspicion is that there's a bug in the 32-bit ppc assembly somewhere.
>>>>  I don't see any references to 'blocked' or 'real_blocked' in assembly
>>>> though.  You could add a bunch of padding instead of moving the
>>>> thread_struct and see if that does anything, but that's really a stab in
>>>> the dark.
>>>>
>>>
>>> Just to let you know, I'm not sure it is the same issue, but I also get
>>> my 8xx target stuck when I try to use gdbserver.
>>>
>>> If I debug a very small app, it gets stuck quickly after the app has
>>> stopped: indeed, the console seems ok but as soon as I try to execute
>>> something simple, like a ps or top, it get stuck. The target still
>>> responds to pings, but nothing else.
>>>
>>> If I debug a big app, it gets stuck soon after the start of debug: I set
>>> a bpoint at main(), do a 'continue', get breaked at main(), do some
>>> steps with 'next' then it gets stuck.
>>>
>>> I have tried moving the struct thread_struct thread but it has no impact.
>>>
>>
>> that sounds a bit different to what I see. Is your program also mutli-threaded?
>>
>> Maybe you could try with the program I use to reproduce the error:
>>
>> --- snip -----
>> #include <pthread.h>
>> #include <stdio.h>
>> #include <unistd.h>
>>
>> void * th_1_func()
>> {
>>    while (1) {
>>      sleep(2);
>>      printf("Hello from thread function 1)\n");
>>    }
>> }
>>
>> int main() {
>>   int err;
>>   pthread_t th_1, th_2, th_3;
>>
>>   err = pthread_create(&th_1, NULL, th_1_func, NULL);
>>   if (err != 0)
>>     printf("pthread_create\n");
>>   err = pthread_create(&th_2, NULL, th_1_func, NULL);
>>   if (err != 0)
>>     printf("pthread_create\n");
>>   err = pthread_create(&th_3, NULL, th_1_func, NULL);
>>   if (err != 0)
>>     printf("pthread_create\n");
>>   while(1) {}
>>   return 0;
>> }
>> --- snap ---
>>
>> Then copy it to your target and start it with the gdbserver. If you let it run
>> from your host with gdb and try to stop it e.g in the sleep call and then try to
>> single step it you might see the error. But as I said in this thread the
>> behaviour might be different depending on your kernel configuration as I
>> encountered different behaviour when enabling FTRACE or SCHED_STAT.
>>
>> Best regards
>> Holger
>>
> 
> Hi
> 
> I just tried it on an 885 and on an 8323, it work properly on both targets.
> 
> You can see below the Debug Option that are active on my 8323 target.
> 


thanks for trying it.

Could you completely disable FTRACE? As it also works on my side when I have
FTRACE enabled.

Best regards
Holger
Christophe Leroy Aug. 19, 2016, 1:44 p.m. UTC | #11
Le 19/08/2016 à 13:14, Holger Brunck a écrit :
> On 19/08/16 13:03, Christophe Leroy wrote:
>>
>>
>> Le 17/08/2016 à 17:27, Holger Brunck a écrit :
>>> On 16/08/16 19:27, christophe leroy wrote:
>>>>
>>>>
>>>> Le 15/08/2016 à 18:19, Dave Hansen a écrit :
>>>>> On 08/15/2016 07:35 AM, Holger Brunck wrote:
>>>>>> I tried this but unfortunately the error only occurs while remote debugging.
>>>>>> Locally with gdb everything works fine. BTW we double-checked with a 85xx ppc
>>>>>> target which is also 32-bit and it ends up with the same behaviour.
>>>>>>
>>>>>> I was also investigating where I have to move the line in the struct task_struct
>>>>>> and it turns out to be like this (diff to 4.7 kernel):
>>>>>>
>>>>>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>>>>>> index 253538f..4868874 100644
>>>>>> --- a/include/linux/sched.h
>>>>>> +++ b/include/linux/sched.h
>>>>>> @@ -1655,7 +1655,9 @@ struct task_struct {
>>>>>>         struct signal_struct *signal;
>>>>>>         struct sighand_struct *sighand;
>>>>>>
>>>>>> +       // struct thread_struct thread;   // until here everything is fine
>>>>>>         sigset_t blocked, real_blocked;
>>>>>> +       struct thread_struct thread;      // from here it's broken
>>>>>>         sigset_t saved_sigmask; /* restored if set_restore_sigmask() was used */
>>>>>>         struct sigpending pending;
>>>>>
>>>>> Wow, thanks for all the debugging here!
>>>>>
>>>>> So, we know it has to do with signals, thread_info, and probably only
>>>>> affects 32-bit powerpc.  Seems awfully weird.  Have you checked with any
>>>>> of the 64-bit powerpc guys to see if they have any ideas?
>>>>>
>>>>> I went grepping around for a bit.
>>>>>
>>>>> Where is the task_struct stored?  Is it on-stack on ppc32 or something?
>>>>>  The thread_info is, I assume, but I see some THREAD_INFO vs. THREAD
>>>>> (thread struct) math happening in here, which confuses me:
>>>>>
>>>>>         .globl  ret_from_debug_exc
>>>>> ret_from_debug_exc:
>>>>>         mfspr   r9,SPRN_SPRG_THREAD
>>>>>         lwz     r10,SAVED_KSP_LIMIT(r1)
>>>>>         stw     r10,KSP_LIMIT(r9)
>>>>>         lwz     r9,THREAD_INFO-THREAD(r9)
>>>>>         CURRENT_THREAD_INFO(r10, r1)
>>>>>         lwz     r10,TI_PREEMPT(r10)
>>>>>         stw     r10,TI_PREEMPT(r9)
>>>>>         RESTORE_xSRR(SRR0,SRR1);
>>>>>         RESTORE_xSRR(CSRR0,CSRR1);
>>>>>         RESTORE_MMU_REGS;
>>>>>         RET_FROM_EXC_LEVEL(SPRN_DSRR0, SPRN_DSRR1, PPC_RFDI)
>>>>>
>>>>> But, I'm really at a loss to explain this.  It still seems like a deeply
>>>>> ppc-specific issue.  We can obviously work around it with an #ifdef for
>>>>> your platform, but that's awfully hackish and hides the real bug,
>>>>> whatever it is.
>>>>>
>>>>> My suspicion is that there's a bug in the 32-bit ppc assembly somewhere.
>>>>>  I don't see any references to 'blocked' or 'real_blocked' in assembly
>>>>> though.  You could add a bunch of padding instead of moving the
>>>>> thread_struct and see if that does anything, but that's really a stab in
>>>>> the dark.
>>>>>
>>>>
>>>> Just to let you know, I'm not sure it is the same issue, but I also get
>>>> my 8xx target stuck when I try to use gdbserver.
>>>>
>>>> If I debug a very small app, it gets stuck quickly after the app has
>>>> stopped: indeed, the console seems ok but as soon as I try to execute
>>>> something simple, like a ps or top, it get stuck. The target still
>>>> responds to pings, but nothing else.
>>>>
>>>> If I debug a big app, it gets stuck soon after the start of debug: I set
>>>> a bpoint at main(), do a 'continue', get breaked at main(), do some
>>>> steps with 'next' then it gets stuck.
>>>>
>>>> I have tried moving the struct thread_struct thread but it has no impact.
>>>>
>>>
>>> that sounds a bit different to what I see. Is your program also mutli-threaded?
>>>
>>> Maybe you could try with the program I use to reproduce the error:
>>>
>>> --- snip -----
>>> #include <pthread.h>
>>> #include <stdio.h>
>>> #include <unistd.h>
>>>
>>> void * th_1_func()
>>> {
>>>    while (1) {
>>>      sleep(2);
>>>      printf("Hello from thread function 1)\n");
>>>    }
>>> }
>>>
>>> int main() {
>>>   int err;
>>>   pthread_t th_1, th_2, th_3;
>>>
>>>   err = pthread_create(&th_1, NULL, th_1_func, NULL);
>>>   if (err != 0)
>>>     printf("pthread_create\n");
>>>   err = pthread_create(&th_2, NULL, th_1_func, NULL);
>>>   if (err != 0)
>>>     printf("pthread_create\n");
>>>   err = pthread_create(&th_3, NULL, th_1_func, NULL);
>>>   if (err != 0)
>>>     printf("pthread_create\n");
>>>   while(1) {}
>>>   return 0;
>>> }
>>> --- snap ---
>>>
>>> Then copy it to your target and start it with the gdbserver. If you let it run
>>> from your host with gdb and try to stop it e.g in the sleep call and then try to
>>> single step it you might see the error. But as I said in this thread the
>>> behaviour might be different depending on your kernel configuration as I
>>> encountered different behaviour when enabling FTRACE or SCHED_STAT.
>>>
>>> Best regards
>>> Holger
>>>
>>
>> Hi
>>
>> I just tried it on an 885 and on an 8323, it work properly on both targets.
>>
>> You can see below the Debug Option that are active on my 8323 target.
>>
>
>
> thanks for trying it.
>
> Could you completely disable FTRACE? As it also works on my side when I have
> FTRACE enabled.
>
> Best regards
> Holger
>

I have now disabled completly FTRACE, the behaviour is still OK.

Christophe
Holger Brunck Aug. 19, 2016, 4:26 p.m. UTC | #12
On 19/08/16 15:44, Christophe Leroy wrote:
>>>
>>> I just tried it on an 885 and on an 8323, it work properly on both targets.
>>>
>>> You can see below the Debug Option that are active on my 8323 target.
>>>
>>
>>
>> thanks for trying it.
>>
>> Could you completely disable FTRACE? As it also works on my side when I have
>> FTRACE enabled.
>>
>>
> I have now disabled completly FTRACE, the behaviour is still OK.
> 

I double-checked it on my 8321 board and on this board it's also not
reproducible with my test program. Also on a board with  QorIQ P2041 it's not
reproducible so easy, but we see the error with our application code. Easy
reproducible with the mainline kernel for me it's only on my MPC8360 board.

Best regards
Holger
diff mbox

Patch

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 253538f..4868874 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1655,7 +1655,9 @@  struct task_struct {
        struct signal_struct *signal;
        struct sighand_struct *sighand;

+       // struct thread_struct thread;   // until here everything is fine
        sigset_t blocked, real_blocked;
+       struct thread_struct thread;      // from here it's broken
        sigset_t saved_sigmask; /* restored if set_restore_sigmask() was used */
        struct sigpending pending;

@@ -1919,7 +1921,6 @@  struct task_struct {
        struct task_struct *oom_reaper_list;
 #endif
 /* CPU-specific state of this task */
-       struct thread_struct thread;
 /*

So it's in the area where some signal information are stored, which makes sense