Message ID | 6b5327e32549860c1e6c73e5b669528bfb383df2.camel@gmail.com (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
Series | Hitting BUG_ON in do_notify_resume() with gdb and SIGTRAP | expand |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/apply_patch | success | Successfully applied on branch powerpc/merge (858ea4d1f5c22aa2f8a6724c2a65458294f2ddc6) |
snowpatch_ozlabs/build-ppc64le | success | Build succeeded |
snowpatch_ozlabs/build-ppc64be | success | Build succeeded |
snowpatch_ozlabs/build-ppc64e | success | Build succeeded |
snowpatch_ozlabs/build-pmac32 | success | Build succeeded |
snowpatch_ozlabs/checkpatch | warning | total: 1 errors, 3 warnings, 0 checks, 13 lines checked |
snowpatch_ozlabs/needsstable | success | Patch has no Fixes tags |
Le 04/07/2021 à 23:38, Radu Rendec a écrit : > Hi Everyone, > > I'm trying to set up my (virtual) environment to test an old bug in the > PPC32 ptrace() code. I came across a completely different problem, > which seems to make gdb pretty much unusable on PPC32. I'm not sure if > this is a real kernel bug or maybe something wrong with my > configuration. > > I'm running kernel 5.13 in a qemu VM with one e500mc CPU. I am running > native gdb (inside the VM) and setting a breakpoint in main() in a test > "hello world" program. Upon running the test program, I am hitting the > BUG_ON in do_notify_resume() on line 292. The kernel bug log snippet is > included below at the end of the email. > > FWIW, gdb says: > Program terminated with signal SIGTRAP, Trace/breakpoint trap. > The program no longer exists. > > I also added a pr_info() to do_notify_resume() just to see how much > different 'regs' and 'current->thread.regs' are. Surprisingly, they are > just 0x30 apart: regs=c7955f10 cur=c7955f40. Also, 'current' seems to > be OK (pid and comm are consistent with the test program). The TRAP = 0x7d8 is obviously wrong. Need to know which 'TRAP' it is exactly. Could you try to dump what we have at the correct regs ? Something like 'show_regs(current->thread.regs)' should do it. > > If I'm not missing something, the 'regs' pointer that is eventually > passed to do_notify_resume() is calculated in interrupt_return() by > adding STACK_FRAME_OVERHEAD (defined to 112) to the value of r1. That > means all registers are saved on the stack before entering the > interrupt handler and, upon returning, a pointer to the register > structure is calculated from the stack pointer. Either the offset > itself is wrong, or the stack pointer is off by 0x30. > > This is as far as I have gone. Hopefully this rings a bell to someone > who is familiar with that part of the code and has a better > understanding of PPC32 interrupt handling in general. > > Last but not least, my configuration is fairly standard. I'm using the > powerpc-e500mc--glibc--bleeding-edge-2020.08-1 toolchain from Bootlin > to compile everything (kernel and user-space). The qemu version is > 5.2.0 (Fedora 34) and the user-space is a small Busybox based rootfs > that I built using Buildroot 2021.05. The gdb version is 9.2. > > regs=c7955f10 cur=c7955f40 pid=138 comm=test > ------------[ cut here ]------------ > kernel BUG at arch/powerpc/kernel/signal.c:296! > Oops: Exception in kernel mode, sig: 5 [#1] > BE PAGE_SIZE=4K SMP NR_CPUS=24 QEMU e500 > Modules linked in: > CPU: 0 PID: 138 Comm: test Not tainted 5.13.0-dirty #3 > NIP: c0009694 LR: c0009694 CTR: c065f540 > REGS: c7955dc0 TRAP: 0700 Not tainted (5.13.0-dirty) > MSR: 00028002 <CE,EE> CR: 28000282 XER: 20000000 > > GPR00: c0009694 c7955e80 c7145100 0000002c dfbdc3d4 dfbe5d24 00000027 dfbdc3d8 > GPR08: c0ffe988 00000000 00000000 00000000 22000282 00000000 00000000 b7fe17b4 > GPR16: 00000000 b7ffd588 b7ffe8b8 b7ffee10 b7fff214 b7ffdf40 b7fff208 bffff858 > GPR24: bffff970 b7fff130 00000001 bffff960 c7955f10 00000800 c7145100 00000102 > NIP [c0009694] do_notify_resume+0x314/0x320 > LR [c0009694] do_notify_resume+0x314/0x320 > Call Trace: > [c7955e80] [c0009694] do_notify_resume+0x314/0x320 (unreliable) > [c7955ee0] [c0010b94] interrupt_exit_user_prepare+0x94/0xc0 > [c7955f00] [c00151e8] interrupt_return+0x14/0x13c > --- interrupt: 7d8 at 0xb7fc3714 > NIP: b7fc3714 LR: b7fc3714 CTR: 00000003 > REGS: c7955f10 TRAP: 07d8 Not tainted (5.13.0-dirty) > MSR: 1002d002 <CE,EE,PR,ME> CR: 22000284 XER: 00000000 > > GPR00: b7fc3584 bffff850 00000000 00000000 00000000 00000000 000000a0 6474e552 > GPR08: b7fbe0d4 00000001 b7fff230 bffff850 b7fc36d8 00000000 00000000 b7fe17b4 > GPR16: 00000000 b7ffd588 b7ffe8b8 b7ffee10 b7fff214 b7ffdf40 b7fff208 bffff858 > GPR24: bffff970 b7fff130 00000001 bffff960 b7fff2b0 b7ffd5f0 b7ffe8a8 bffff850 > NIP [b7fc3714] 0xb7fc3714 > LR [b7fc3714] 0xb7fc3714 > --- interrupt: 7d8 > Instruction dump: > 4bffff04 7c0802a6 93a10054 90010064 93c10058 48b95369 80c20398 3c60c0dc > 7f84e378 38e204b0 3863ce30 4809d819 <0fe00000> 60000000 60000000 3d20c0ff > ---[ end trace 065671519ba3d526 ]--- > > Note: the BUG() line is slightly different because I had made this > small change to print the pointers: > > diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c > index 9ded046edb0e..57ea6e500a6f 100644 > --- a/arch/powerpc/kernel/signal.c > +++ b/arch/powerpc/kernel/signal.c > @@ -289,7 +289,12 @@ void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags) > klp_update_patch_state(current); > > if (thread_info_flags & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL)) { > - BUG_ON(regs != current->thread.regs); > + if (regs != current->thread.regs) { > + pr_info("regs=%px cur=%px pid=%d comm=%s\n", > + regs, current->thread.regs, > + current->pid, current->comm); > + BUG(); > + } > do_signal(current); > } > >
On Tue, 2021-07-06 at 12:43 +0200, Christophe Leroy wrote: > Le 04/07/2021 à 23:38, Radu Rendec a écrit : > > I'm trying to set up my (virtual) environment to test an old bug in the > > PPC32 ptrace() code. I came across a completely different problem, > > which seems to make gdb pretty much unusable on PPC32. I'm not sure if > > this is a real kernel bug or maybe something wrong with my > > configuration. > > > > I'm running kernel 5.13 in a qemu VM with one e500mc CPU. I am running > > native gdb (inside the VM) and setting a breakpoint in main() in a test > > "hello world" program. Upon running the test program, I am hitting the > > BUG_ON in do_notify_resume() on line 292. The kernel bug log snippet is > > included below at the end of the email. > > > > FWIW, gdb says: > > Program terminated with signal SIGTRAP, Trace/breakpoint trap. > > The program no longer exists. > > > > I also added a pr_info() to do_notify_resume() just to see how much > > different 'regs' and 'current->thread.regs' are. Surprisingly, they are > > just 0x30 apart: regs=c7955f10 cur=c7955f40. Also, 'current' seems to > > be OK (pid and comm are consistent with the test program). > > The TRAP = 0x7d8 is obviously wrong. > > Need to know which 'TRAP' it is exactly. > Could you try to dump what we have at the correct regs ? > Something like 'show_regs(current->thread.regs)' should do it. Sure, please see the output below. It looks to me like the "correct" regs are just garbage. Either they are overwritten or current->thread.regs is wrong. But in any case, r1 = 0 doesn't look good. regs=c7a0ff10 cur=c7a0ff40 pid=139 comm=test CPU: 0 PID: 139 Comm: test Not tainted 5.13.0-dirty #4 NIP: 10000338 LR: 00030000 CTR: 00000003 REGS: c7a0ff40 TRAP: 670000 Not tainted (5.13.0-dirty) MSR: 1002d202 <CE,EE,PR,ME,DE> CR: 10000004 XER: 80670100 GPR00: b7fc36d8 00000000 00000000 b7fe17b4 00000000 b7ffd588 b7ffe8b8 b7ffee10 GPR08: b7fff214 b7ffdf40 b7fff208 bffff858 bffff970 b7fff130 00000001 bffff960 GPR16: b7fff2b0 b7ffd5f0 b7ffe8a8 bffff850 b7fc3714 1002d002 b7fff208 00000003 GPR24: b7fc3714 00000000 22000284 bffff960 000007d8 10000338 08000000 bffff850 NIP [10000338] 0x10000338 LR [00030000] 0x30000 Call Trace: [c7a0fe40] [c0008eac] show_regs+0x4c/0x1b0 (unreliable) [c7a0fe80] [c000969c] do_notify_resume+0x31c/0x320 [c7a0fee0] [c0010b94] interrupt_exit_user_prepare+0x94/0xc0 [c7a0ff00] [c00151e8] interrupt_return+0x14/0x13c --- interrupt: 7d8 at 0xb7fc3714 NIP: b7fc3714 LR: b7fc3714 CTR: 00000003 REGS: c7a0ff10 TRAP: 07d8 Not tainted (5.13.0-dirty) MSR: 1002d002 <CE,EE,PR,ME> CR: 22000284 XER: 00000000 GPR00: b7fc3584 bffff850 00000000 00000000 00000000 00000000 000000a0 6474e552 GPR08: b7fbe0d4 00000001 b7fff230 bffff850 b7fc36d8 00000000 00000000 b7fe17b4 GPR16: 00000000 b7ffd588 b7ffe8b8 b7ffee10 b7fff214 b7ffdf40 b7fff208 bffff858 GPR24: bffff970 b7fff130 00000001 bffff960 b7fff2b0 b7ffd5f0 b7ffe8a8 bffff850 NIP [b7fc3714] 0xb7fc3714 LR [b7fc3714] 0xb7fc3714 --- interrupt: 7d8 ------------[ cut here ]------------ kernel BUG at arch/powerpc/kernel/signal.c:298! Oops: Exception in kernel mode, sig: 5 [#1] BE PAGE_SIZE=4K SMP NR_CPUS=24 QEMU e500 Modules linked in: CPU: 0 PID: 139 Comm: test Not tainted 5.13.0-dirty #4 NIP: c000969c LR: c000969c CTR: c065f540 REGS: c7a0fdc0 TRAP: 0700 Not tainted (5.13.0-dirty) MSR: 00028002 <CE,EE> CR: 28000282 XER: 20000000 GPR00: c000969c c7a0fe80 c70ef500 c70efbd8 c70ef500 00000010 c7a0fc38 00000002 GPR08: 00000001 00000000 00000000 00000000 28000282 00000000 00000000 b7fe17b4 GPR16: 00000000 b7ffd588 b7ffe8b8 b7ffee10 b7fff214 b7ffdf40 b7fff208 bffff858 GPR24: bffff970 b7fff130 00000001 bffff960 c7a0ff10 00000800 c70ef500 00000102 NIP [c000969c] do_notify_resume+0x31c/0x320 LR [c000969c] do_notify_resume+0x31c/0x320 Call Trace: [c7a0fe80] [c000969c] do_notify_resume+0x31c/0x320 (unreliable) [c7a0fee0] [c0010b94] interrupt_exit_user_prepare+0x94/0xc0 [c7a0ff00] [c00151e8] interrupt_return+0x14/0x13c --- interrupt: 7d8 at 0xb7fc3714 NIP: b7fc3714 LR: b7fc3714 CTR: 00000003 REGS: c7a0ff10 TRAP: 07d8 Not tainted (5.13.0-dirty) MSR: 1002d002 <CE,EE,PR,ME> CR: 22000284 XER: 00000000 GPR00: b7fc3584 bffff850 00000000 00000000 00000000 00000000 000000a0 6474e552 GPR08: b7fbe0d4 00000001 b7fff230 bffff850 b7fc36d8 00000000 00000000 b7fe17b4 GPR16: 00000000 b7ffd588 b7ffe8b8 b7ffee10 b7fff214 b7ffdf40 b7fff208 bffff858 GPR24: bffff970 b7fff130 00000001 bffff960 b7fff2b0 b7ffd5f0 b7ffe8a8 bffff850 NIP [b7fc3714] 0xb7fc3714 LR [b7fc3714] 0xb7fc3714 --- interrupt: 7d8 Instruction dump: 93a10054 90010064 93c10058 48b95369 80c20398 3c60c0dc 7f84e378 38e204b0 3863ce30 4809d819 80620704 4bfff7c9 <0fe00000> 3d20c0ff 8129c014 2c090000 ---[ end trace 1cbf27cd19ae39a0 ]--- > > If I'm not missing something, the 'regs' pointer that is eventually > > passed to do_notify_resume() is calculated in interrupt_return() by > > adding STACK_FRAME_OVERHEAD (defined to 112) to the value of r1. That > > means all registers are saved on the stack before entering the > > interrupt handler and, upon returning, a pointer to the register > > structure is calculated from the stack pointer. Either the offset > > itself is wrong, or the stack pointer is off by 0x30. > > > > This is as far as I have gone. Hopefully this rings a bell to someone > > who is familiar with that part of the code and has a better > > understanding of PPC32 interrupt handling in general. > > > > Last but not least, my configuration is fairly standard. I'm using the > > powerpc-e500mc--glibc--bleeding-edge-2020.08-1 toolchain from Bootlin > > to compile everything (kernel and user-space). The qemu version is > > 5.2.0 (Fedora 34) and the user-space is a small Busybox based rootfs > > that I built using Buildroot 2021.05. The gdb version is 9.2.
Le 06/07/2021 à 13:56, Radu Rendec a écrit : > On Tue, 2021-07-06 at 12:43 +0200, Christophe Leroy wrote: >> Le 04/07/2021 à 23:38, Radu Rendec a écrit : >>> I'm trying to set up my (virtual) environment to test an old bug in the >>> PPC32 ptrace() code. I came across a completely different problem, >>> which seems to make gdb pretty much unusable on PPC32. I'm not sure if >>> this is a real kernel bug or maybe something wrong with my >>> configuration. >>> >>> I'm running kernel 5.13 in a qemu VM with one e500mc CPU. I am running >>> native gdb (inside the VM) and setting a breakpoint in main() in a test >>> "hello world" program. Upon running the test program, I am hitting the >>> BUG_ON in do_notify_resume() on line 292. The kernel bug log snippet is >>> included below at the end of the email. >>> >>> FWIW, gdb says: >>> Program terminated with signal SIGTRAP, Trace/breakpoint trap. >>> The program no longer exists. >>> >>> I also added a pr_info() to do_notify_resume() just to see how much >>> different 'regs' and 'current->thread.regs' are. Surprisingly, they are >>> just 0x30 apart: regs=c7955f10 cur=c7955f40. Also, 'current' seems to >>> be OK (pid and comm are consistent with the test program). >> >> The TRAP = 0x7d8 is obviously wrong. >> >> Need to know which 'TRAP' it is exactly. >> Could you try to dump what we have at the correct regs ? >> Something like 'show_regs(current->thread.regs)' should do it. > > Sure, please see the output below. It looks to me like the "correct" > regs are just garbage. Either they are overwritten or current->thread.regs > is wrong. But in any case, r1 = 0 doesn't look good. Yes indeed. I think I identified the problem. For Critical interrupts like DEBUG interrupt, struct exception_regs is added, therefore the frame has 12x4 (0x30) more bytes. That's what you see. Commit https://github.com/linuxppc/linux/commit/db297c3b07af7856fb7c666fbc9792d8e37556be#diff-dd6b952a3980da19df4facccdb4f3dddeb8cef56ee384c7f03d02b23b0c6cb26 Need to find the best solution now to fix that. > > regs=c7a0ff10 cur=c7a0ff40 pid=139 comm=test > CPU: 0 PID: 139 Comm: test Not tainted 5.13.0-dirty #4 > NIP: 10000338 LR: 00030000 CTR: 00000003 > REGS: c7a0ff40 TRAP: 670000 Not tainted (5.13.0-dirty) > MSR: 1002d202 <CE,EE,PR,ME,DE> CR: 10000004 XER: 80670100 > > GPR00: b7fc36d8 00000000 00000000 b7fe17b4 00000000 b7ffd588 b7ffe8b8 b7ffee10 > GPR08: b7fff214 b7ffdf40 b7fff208 bffff858 bffff970 b7fff130 00000001 bffff960 > GPR16: b7fff2b0 b7ffd5f0 b7ffe8a8 bffff850 b7fc3714 1002d002 b7fff208 00000003 > GPR24: b7fc3714 00000000 22000284 bffff960 000007d8 10000338 08000000 bffff850 > NIP [10000338] 0x10000338 > LR [00030000] 0x30000 > Call Trace: > [c7a0fe40] [c0008eac] show_regs+0x4c/0x1b0 (unreliable) > [c7a0fe80] [c000969c] do_notify_resume+0x31c/0x320 > [c7a0fee0] [c0010b94] interrupt_exit_user_prepare+0x94/0xc0 > [c7a0ff00] [c00151e8] interrupt_return+0x14/0x13c > --- interrupt: 7d8 at 0xb7fc3714 > NIP: b7fc3714 LR: b7fc3714 CTR: 00000003 > REGS: c7a0ff10 TRAP: 07d8 Not tainted (5.13.0-dirty) > MSR: 1002d002 <CE,EE,PR,ME> CR: 22000284 XER: 00000000 > > GPR00: b7fc3584 bffff850 00000000 00000000 00000000 00000000 000000a0 6474e552 > GPR08: b7fbe0d4 00000001 b7fff230 bffff850 b7fc36d8 00000000 00000000 b7fe17b4 > GPR16: 00000000 b7ffd588 b7ffe8b8 b7ffee10 b7fff214 b7ffdf40 b7fff208 bffff858 > GPR24: bffff970 b7fff130 00000001 bffff960 b7fff2b0 b7ffd5f0 b7ffe8a8 bffff850 > NIP [b7fc3714] 0xb7fc3714 > LR [b7fc3714] 0xb7fc3714 > --- interrupt: 7d8 > ------------[ cut here ]------------ > kernel BUG at arch/powerpc/kernel/signal.c:298! > Oops: Exception in kernel mode, sig: 5 [#1] > BE PAGE_SIZE=4K SMP NR_CPUS=24 QEMU e500 > Modules linked in: > CPU: 0 PID: 139 Comm: test Not tainted 5.13.0-dirty #4 > NIP: c000969c LR: c000969c CTR: c065f540 > REGS: c7a0fdc0 TRAP: 0700 Not tainted (5.13.0-dirty) > MSR: 00028002 <CE,EE> CR: 28000282 XER: 20000000 > > GPR00: c000969c c7a0fe80 c70ef500 c70efbd8 c70ef500 00000010 c7a0fc38 00000002 > GPR08: 00000001 00000000 00000000 00000000 28000282 00000000 00000000 b7fe17b4 > GPR16: 00000000 b7ffd588 b7ffe8b8 b7ffee10 b7fff214 b7ffdf40 b7fff208 bffff858 > GPR24: bffff970 b7fff130 00000001 bffff960 c7a0ff10 00000800 c70ef500 00000102 > NIP [c000969c] do_notify_resume+0x31c/0x320 > LR [c000969c] do_notify_resume+0x31c/0x320 > Call Trace: > [c7a0fe80] [c000969c] do_notify_resume+0x31c/0x320 (unreliable) > [c7a0fee0] [c0010b94] interrupt_exit_user_prepare+0x94/0xc0 > [c7a0ff00] [c00151e8] interrupt_return+0x14/0x13c > --- interrupt: 7d8 at 0xb7fc3714 > NIP: b7fc3714 LR: b7fc3714 CTR: 00000003 > REGS: c7a0ff10 TRAP: 07d8 Not tainted (5.13.0-dirty) > MSR: 1002d002 <CE,EE,PR,ME> CR: 22000284 XER: 00000000 > > GPR00: b7fc3584 bffff850 00000000 00000000 00000000 00000000 000000a0 6474e552 > GPR08: b7fbe0d4 00000001 b7fff230 bffff850 b7fc36d8 00000000 00000000 b7fe17b4 > GPR16: 00000000 b7ffd588 b7ffe8b8 b7ffee10 b7fff214 b7ffdf40 b7fff208 bffff858 > GPR24: bffff970 b7fff130 00000001 bffff960 b7fff2b0 b7ffd5f0 b7ffe8a8 bffff850 > NIP [b7fc3714] 0xb7fc3714 > LR [b7fc3714] 0xb7fc3714 > --- interrupt: 7d8 > Instruction dump: > 93a10054 90010064 93c10058 48b95369 80c20398 3c60c0dc 7f84e378 38e204b0 > 3863ce30 4809d819 80620704 4bfff7c9 <0fe00000> 3d20c0ff 8129c014 2c090000 > ---[ end trace 1cbf27cd19ae39a0 ]--- > >>> If I'm not missing something, the 'regs' pointer that is eventually >>> passed to do_notify_resume() is calculated in interrupt_return() by >>> adding STACK_FRAME_OVERHEAD (defined to 112) to the value of r1. That >>> means all registers are saved on the stack before entering the >>> interrupt handler and, upon returning, a pointer to the register >>> structure is calculated from the stack pointer. Either the offset >>> itself is wrong, or the stack pointer is off by 0x30. >>> >>> This is as far as I have gone. Hopefully this rings a bell to someone >>> who is familiar with that part of the code and has a better >>> understanding of PPC32 interrupt handling in general. >>> >>> Last but not least, my configuration is fairly standard. I'm using the >>> powerpc-e500mc--glibc--bleeding-edge-2020.08-1 toolchain from Bootlin >>> to compile everything (kernel and user-space). The qemu version is >>> 5.2.0 (Fedora 34) and the user-space is a small Busybox based rootfs >>> that I built using Buildroot 2021.05. The gdb version is 9.2.
On Tue, 2021-07-06 at 15:16 +0200, Christophe Leroy wrote: >Le 06/07/2021 à 13:56, Radu Rendec a écrit : >> On Tue, 2021-07-06 at 12:43 +0200, Christophe Leroy wrote: >>> Le 04/07/2021 à 23:38, Radu Rendec a écrit : >>>> I'm trying to set up my (virtual) environment to test an old bug in the >>>> PPC32 ptrace() code. I came across a completely different problem, >>>> which seems to make gdb pretty much unusable on PPC32. I'm not sure if >>>> this is a real kernel bug or maybe something wrong with my >>>> configuration. >>>> >>>> I'm running kernel 5.13 in a qemu VM with one e500mc CPU. I am running >>>> native gdb (inside the VM) and setting a breakpoint in main() in a test >>>> "hello world" program. Upon running the test program, I am hitting the >>>> BUG_ON in do_notify_resume() on line 292. The kernel bug log snippet is >>>> included below at the end of the email. >>>> >>>> FWIW, gdb says: >>>> Program terminated with signal SIGTRAP, Trace/breakpoint trap. >>>> The program no longer exists. >>>> >>>> I also added a pr_info() to do_notify_resume() just to see how much >>>> different 'regs' and 'current->thread.regs' are. Surprisingly, they are >>>> just 0x30 apart: regs=c7955f10 cur=c7955f40. Also, 'current' seems to >>>> be OK (pid and comm are consistent with the test program). >>> >>> The TRAP = 0x7d8 is obviously wrong. >>> >>> Need to know which 'TRAP' it is exactly. >>> Could you try to dump what we have at the correct regs ? >>> Something like 'show_regs(current->thread.regs)' should do it. >> >> Sure, please see the output below. It looks to me like the "correct" >> regs are just garbage. Either they are overwritten or current->thread.regs >> is wrong. But in any case, r1 = 0 doesn't look good. > >Yes indeed. I think I identified the problem. For Critical interrupts like DEBUG interrupt, struct >exception_regs is added, therefore the frame has 12x4 (0x30) more bytes. That's what you see. > >Commit >https://github.com/linuxppc/linux/commit/db297c3b07af7856fb7c666fbc9792d8e37556be#diff-dd6b952a3980da19df4facccdb4f3dddeb8cef56ee384c7f03d02b23b0c6cb26 > >Need to find the best solution now to fix that. Awesome, happy to see you figured it out so quickly. I'm not sure if it makes any sense, but one thing that comes to mind is to put struct exception_regs before struct pt_regs when the frame is saved. Unless of course other parts of the code expect the opposite. >> regs=c7a0ff10 cur=c7a0ff40 pid=139 comm=test >> CPU: 0 PID: 139 Comm: test Not tainted 5.13.0-dirty #4 >> NIP: 10000338 LR: 00030000 CTR: 00000003 >> REGS: c7a0ff40 TRAP: 670000 Not tainted (5.13.0-dirty) >> MSR: 1002d202 <CE,EE,PR,ME,DE> CR: 10000004 XER: 80670100 >> >> GPR00: b7fc36d8 00000000 00000000 b7fe17b4 00000000 b7ffd588 b7ffe8b8 b7ffee10 >> GPR08: b7fff214 b7ffdf40 b7fff208 bffff858 bffff970 b7fff130 00000001 bffff960 >> GPR16: b7fff2b0 b7ffd5f0 b7ffe8a8 bffff850 b7fc3714 1002d002 b7fff208 00000003 >> GPR24: b7fc3714 00000000 22000284 bffff960 000007d8 10000338 08000000 bffff850 >> NIP [10000338] 0x10000338 >> LR [00030000] 0x30000 >> Call Trace: >> [c7a0fe40] [c0008eac] show_regs+0x4c/0x1b0 (unreliable) >> [c7a0fe80] [c000969c] do_notify_resume+0x31c/0x320 >> [c7a0fee0] [c0010b94] interrupt_exit_user_prepare+0x94/0xc0 >> [c7a0ff00] [c00151e8] interrupt_return+0x14/0x13c >> --- interrupt: 7d8 at 0xb7fc3714 >> NIP: b7fc3714 LR: b7fc3714 CTR: 00000003 >> REGS: c7a0ff10 TRAP: 07d8 Not tainted (5.13.0-dirty) >> MSR: 1002d002 <CE,EE,PR,ME> CR: 22000284 XER: 00000000 >> >> GPR00: b7fc3584 bffff850 00000000 00000000 00000000 00000000 000000a0 6474e552 >> GPR08: b7fbe0d4 00000001 b7fff230 bffff850 b7fc36d8 00000000 00000000 b7fe17b4 >> GPR16: 00000000 b7ffd588 b7ffe8b8 b7ffee10 b7fff214 b7ffdf40 b7fff208 bffff858 >> GPR24: bffff970 b7fff130 00000001 bffff960 b7fff2b0 b7ffd5f0 b7ffe8a8 bffff850 >> NIP [b7fc3714] 0xb7fc3714 >> LR [b7fc3714] 0xb7fc3714 >> --- interrupt: 7d8 >> ------------[ cut here ]------------ >> kernel BUG at arch/powerpc/kernel/signal.c:298! >> Oops: Exception in kernel mode, sig: 5 [#1] >> BE PAGE_SIZE=4K SMP NR_CPUS=24 QEMU e500 >> Modules linked in: >> CPU: 0 PID: 139 Comm: test Not tainted 5.13.0-dirty #4 >> NIP: c000969c LR: c000969c CTR: c065f540 >> REGS: c7a0fdc0 TRAP: 0700 Not tainted (5.13.0-dirty) >> MSR: 00028002 <CE,EE> CR: 28000282 XER: 20000000 >> >> GPR00: c000969c c7a0fe80 c70ef500 c70efbd8 c70ef500 00000010 c7a0fc38 00000002 >> GPR08: 00000001 00000000 00000000 00000000 28000282 00000000 00000000 b7fe17b4 >> GPR16: 00000000 b7ffd588 b7ffe8b8 b7ffee10 b7fff214 b7ffdf40 b7fff208 bffff858 >> GPR24: bffff970 b7fff130 00000001 bffff960 c7a0ff10 00000800 c70ef500 00000102 >> NIP [c000969c] do_notify_resume+0x31c/0x320 >> LR [c000969c] do_notify_resume+0x31c/0x320 >> Call Trace: >> [c7a0fe80] [c000969c] do_notify_resume+0x31c/0x320 (unreliable) >> [c7a0fee0] [c0010b94] interrupt_exit_user_prepare+0x94/0xc0 >> [c7a0ff00] [c00151e8] interrupt_return+0x14/0x13c >> --- interrupt: 7d8 at 0xb7fc3714 >> NIP: b7fc3714 LR: b7fc3714 CTR: 00000003 >> REGS: c7a0ff10 TRAP: 07d8 Not tainted (5.13.0-dirty) >> MSR: 1002d002 <CE,EE,PR,ME> CR: 22000284 XER: 00000000 >> >> GPR00: b7fc3584 bffff850 00000000 00000000 00000000 00000000 000000a0 6474e552 >> GPR08: b7fbe0d4 00000001 b7fff230 bffff850 b7fc36d8 00000000 00000000 b7fe17b4 >> GPR16: 00000000 b7ffd588 b7ffe8b8 b7ffee10 b7fff214 b7ffdf40 b7fff208 bffff858 >> GPR24: bffff970 b7fff130 00000001 bffff960 b7fff2b0 b7ffd5f0 b7ffe8a8 bffff850 >> NIP [b7fc3714] 0xb7fc3714 >> LR [b7fc3714] 0xb7fc3714 >> --- interrupt: 7d8 >> Instruction dump: >> 93a10054 90010064 93c10058 48b95369 80c20398 3c60c0dc 7f84e378 38e204b0 >> 3863ce30 4809d819 80620704 4bfff7c9 <0fe00000> 3d20c0ff 8129c014 2c090000 >> ---[ end trace 1cbf27cd19ae39a0 ]--- >> >>>> If I'm not missing something, the 'regs' pointer that is eventually >>>> passed to do_notify_resume() is calculated in interrupt_return() by >>>> adding STACK_FRAME_OVERHEAD (defined to 112) to the value of r1. That >>>> means all registers are saved on the stack before entering the >>>> interrupt handler and, upon returning, a pointer to the register >>>> structure is calculated from the stack pointer. Either the offset >>>> itself is wrong, or the stack pointer is off by 0x30. >>>> >>>> This is as far as I have gone. Hopefully this rings a bell to someone >>>> who is familiar with that part of the code and has a better >>>> understanding of PPC32 interrupt handling in general. >>>> >>>> Last but not least, my configuration is fairly standard. I'm using the >>>> powerpc-e500mc--glibc--bleeding-edge-2020.08-1 toolchain from Bootlin >>>> to compile everything (kernel and user-space). The qemu version is >>>> 5.2.0 (Fedora 34) and the user-space is a small Busybox based rootfs >>>> that I built using Buildroot 2021.05. The gdb version is 9.2.
Le 06/07/2021 à 15:50, Radu Rendec a écrit : > On Tue, 2021-07-06 at 15:16 +0200, Christophe Leroy wrote: >> Le 06/07/2021 à 13:56, Radu Rendec a écrit : >>> On Tue, 2021-07-06 at 12:43 +0200, Christophe Leroy wrote: >>>> Le 04/07/2021 à 23:38, Radu Rendec a écrit : >>>>> I'm trying to set up my (virtual) environment to test an old bug in the >>>>> PPC32 ptrace() code. I came across a completely different problem, >>>>> which seems to make gdb pretty much unusable on PPC32. I'm not sure if >>>>> this is a real kernel bug or maybe something wrong with my >>>>> configuration. >>>>> >>>>> I'm running kernel 5.13 in a qemu VM with one e500mc CPU. I am running >>>>> native gdb (inside the VM) and setting a breakpoint in main() in a test >>>>> "hello world" program. Upon running the test program, I am hitting the >>>>> BUG_ON in do_notify_resume() on line 292. The kernel bug log snippet is >>>>> included below at the end of the email. >>>>> >>>>> FWIW, gdb says: >>>>> Program terminated with signal SIGTRAP, Trace/breakpoint trap. >>>>> The program no longer exists. >>>>> >>>>> I also added a pr_info() to do_notify_resume() just to see how much >>>>> different 'regs' and 'current->thread.regs' are. Surprisingly, they are >>>>> just 0x30 apart: regs=c7955f10 cur=c7955f40. Also, 'current' seems to >>>>> be OK (pid and comm are consistent with the test program). >>>> >>>> The TRAP = 0x7d8 is obviously wrong. >>>> >>>> Need to know which 'TRAP' it is exactly. >>>> Could you try to dump what we have at the correct regs ? >>>> Something like 'show_regs(current->thread.regs)' should do it. >>> >>> Sure, please see the output below. It looks to me like the "correct" >>> regs are just garbage. Either they are overwritten or current->thread.regs >>> is wrong. But in any case, r1 = 0 doesn't look good. >> >> Yes indeed. I think I identified the problem. For Critical interrupts like DEBUG interrupt, struct >> exception_regs is added, therefore the frame has 12x4 (0x30) more bytes. That's what you see. >> >> Commit >> https://github.com/linuxppc/linux/commit/db297c3b07af7856fb7c666fbc9792d8e37556be#diff-dd6b952a3980da19df4facccdb4f3dddeb8cef56ee384c7f03d02b23b0c6cb26 >> >> Need to find the best solution now to fix that. > > Awesome, happy to see you figured it out so quickly. > > I'm not sure if it makes any sense, but one thing that comes to mind is > to put struct exception_regs before struct pt_regs when the frame is > saved. Unless of course other parts of the code expect the opposite. > Yes I think it is a good idea. I think I won't have time to look at that before summer vacation though.
On Tue, 2021-07-06 at 15:53 +0200, Christophe Leroy wrote: >Le 06/07/2021 à 15:50, Radu Rendec a écrit : >> On Tue, 2021-07-06 at 15:16 +0200, Christophe Leroy wrote: >>> Le 06/07/2021 à 13:56, Radu Rendec a écrit : >>>> On Tue, 2021-07-06 at 12:43 +0200, Christophe Leroy wrote: >>>>> Le 04/07/2021 à 23:38, Radu Rendec a écrit : >>>>>> I'm trying to set up my (virtual) environment to test an old bug in the >>>>>> PPC32 ptrace() code. I came across a completely different problem, >>>>>> which seems to make gdb pretty much unusable on PPC32. I'm not sure if >>>>>> this is a real kernel bug or maybe something wrong with my >>>>>> configuration. >>>>>> >>>>>> I'm running kernel 5.13 in a qemu VM with one e500mc CPU. I am running >>>>>> native gdb (inside the VM) and setting a breakpoint in main() in a test >>>>>> "hello world" program. Upon running the test program, I am hitting the >>>>>> BUG_ON in do_notify_resume() on line 292. The kernel bug log snippet is >>>>>> included below at the end of the email. >>>>>> >>>>>> FWIW, gdb says: >>>>>> Program terminated with signal SIGTRAP, Trace/breakpoint trap. >>>>>> The program no longer exists. >>>>>> >>>>>> I also added a pr_info() to do_notify_resume() just to see how much >>>>>> different 'regs' and 'current->thread.regs' are. Surprisingly, they are >>>>>> just 0x30 apart: regs=c7955f10 cur=c7955f40. Also, 'current' seems to >>>>>> be OK (pid and comm are consistent with the test program). >>>>> >>>>> The TRAP = 0x7d8 is obviously wrong. >>>>> >>>>> Need to know which 'TRAP' it is exactly. >>>>> Could you try to dump what we have at the correct regs ? >>>>> Something like 'show_regs(current->thread.regs)' should do it. >>>> >>>> Sure, please see the output below. It looks to me like the "correct" >>>> regs are just garbage. Either they are overwritten or current->thread.regs >>>> is wrong. But in any case, r1 = 0 doesn't look good. >>> >>> Yes indeed. I think I identified the problem. For Critical interrupts like DEBUG interrupt, struct >>> exception_regs is added, therefore the frame has 12x4 (0x30) more bytes. That's what you see. >>> >>> Commit >>> https://github.com/linuxppc/linux/commit/db297c3b07af7856fb7c666fbc9792d8e37556be#diff-dd6b952a3980da19df4facccdb4f3dddeb8cef56ee384c7f03d02b23b0c6cb26 >>> >>> Need to find the best solution now to fix that. >> >> Awesome, happy to see you figured it out so quickly. >> >> I'm not sure if it makes any sense, but one thing that comes to mind is >> to put struct exception_regs before struct pt_regs when the frame is >> saved. Unless of course other parts of the code expect the opposite. > >Yes I think it is a good idea. I think I won't have time to look at that before summer vacation though. I can take a stab at it. I'm not familiar with that part of the code, but the best way to learn is to get your hands dirty :) In the worst case, I won't fix it.
Le 06/07/2021 à 16:05, Radu Rendec a écrit : > On Tue, 2021-07-06 at 15:53 +0200, Christophe Leroy wrote: >> Le 06/07/2021 à 15:50, Radu Rendec a écrit : >>> On Tue, 2021-07-06 at 15:16 +0200, Christophe Leroy wrote: >>>> Le 06/07/2021 à 13:56, Radu Rendec a écrit : >>>>> On Tue, 2021-07-06 at 12:43 +0200, Christophe Leroy wrote: >>>>>> Le 04/07/2021 à 23:38, Radu Rendec a écrit : >>>>>>> I'm trying to set up my (virtual) environment to test an old bug in the >>>>>>> PPC32 ptrace() code. I came across a completely different problem, >>>>>>> which seems to make gdb pretty much unusable on PPC32. I'm not sure if >>>>>>> this is a real kernel bug or maybe something wrong with my >>>>>>> configuration. >>>>>>> >>>>>>> I'm running kernel 5.13 in a qemu VM with one e500mc CPU. I am running >>>>>>> native gdb (inside the VM) and setting a breakpoint in main() in a test >>>>>>> "hello world" program. Upon running the test program, I am hitting the >>>>>>> BUG_ON in do_notify_resume() on line 292. The kernel bug log snippet is >>>>>>> included below at the end of the email. >>>>>>> >>>>>>> FWIW, gdb says: >>>>>>> Program terminated with signal SIGTRAP, Trace/breakpoint trap. >>>>>>> The program no longer exists. >>>>>>> >>>>>>> I also added a pr_info() to do_notify_resume() just to see how much >>>>>>> different 'regs' and 'current->thread.regs' are. Surprisingly, they are >>>>>>> just 0x30 apart: regs=c7955f10 cur=c7955f40. Also, 'current' seems to >>>>>>> be OK (pid and comm are consistent with the test program). >>>>>> >>>>>> The TRAP = 0x7d8 is obviously wrong. >>>>>> >>>>>> Need to know which 'TRAP' it is exactly. >>>>>> Could you try to dump what we have at the correct regs ? >>>>>> Something like 'show_regs(current->thread.regs)' should do it. >>>>> >>>>> Sure, please see the output below. It looks to me like the "correct" >>>>> regs are just garbage. Either they are overwritten or current->thread.regs >>>>> is wrong. But in any case, r1 = 0 doesn't look good. >>>> >>>> Yes indeed. I think I identified the problem. For Critical interrupts like DEBUG interrupt, struct >>>> exception_regs is added, therefore the frame has 12x4 (0x30) more bytes. That's what you see. >>>> >>>> Commit >>>> https://github.com/linuxppc/linux/commit/db297c3b07af7856fb7c666fbc9792d8e37556be#diff-dd6b952a3980da19df4facccdb4f3dddeb8cef56ee384c7f03d02b23b0c6cb26 >>>> >>>> Need to find the best solution now to fix that. >>> >>> Awesome, happy to see you figured it out so quickly. >>> >>> I'm not sure if it makes any sense, but one thing that comes to mind is >>> to put struct exception_regs before struct pt_regs when the frame is >>> saved. Unless of course other parts of the code expect the opposite. >> >> Yes I think it is a good idea. I think I won't have time to look at that before summer vacation though. > > I can take a stab at it. I'm not familiar with that part of the code, > but the best way to learn is to get your hands dirty :) In the worst > case, I won't fix it. > Not that easy in fact. After looking once more, the best solution I see now would be to move the content of struct exception_regs into the second part of struct pt_regs (the kernel one in asm/ptrace.h). Changes should be limited to head_booke.h and asm-offsets.c struct exception_regs and STACK_EXC_LVL_FRAME_SIZE should go away. Christophe
diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c index 9ded046edb0e..57ea6e500a6f 100644 --- a/arch/powerpc/kernel/signal.c +++ b/arch/powerpc/kernel/signal.c @@ -289,7 +289,12 @@ void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags) klp_update_patch_state(current); if (thread_info_flags & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL)) { - BUG_ON(regs != current->thread.regs); + if (regs != current->thread.regs) { + pr_info("regs=%px cur=%px pid=%d comm=%s\n", + regs, current->thread.regs, + current->pid, current->comm); + BUG(); + } do_signal(current); }