Message ID | 20181007.234832.1963273306260975472.davem@davemloft.net |
---|---|
State | Accepted |
Delegated to: | David Miller |
Series | [GIT] Sparc |
On Sun, Oct 07, 2018 at 11:48:32PM -0700, David Miller wrote:
>
> I've been moving so haven't had access to my Sparc boxes during this
> time. That's been resolved, and now I can get the patches flowing
> again.
>
> 1) Minor fallthru comment tweaks from Gustavo A. R. Silva.
>
> 2) VLA removal from Kees Cook.
>
> 3) Make sparc vdso Makefile match x86, from Masahiro Yamada.
>
> 4) Fix clock divider programming in mach64 driver, from Mikulas
>    Patocka.
>
> Please pull, thanks a lot!
>
> The following changes since commit c1d84a1b42ef70d8ae601df9cadedc7ed4f1beb1:
>
>   Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2018-10-06 02:11:30 -0700)
>
> are available in the Git repository at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc.git

Now merged, thanks.

greg k-h
Hi Dave!

On 10/8/18 8:48 AM, David Miller wrote:
> I've been moving so haven't had access to my Sparc boxes during this
> time. That's been resolved, and now I can get the patches flowing
> again.

Great to hear you're back!

Can I bring this bug [1] back to your attention which allowed a regular
userland application to crash the kernel? Would be interesting to see
whether you're able to reproduce the issue.

The offending binary is attached to the bug report. It's a test program
from the testsuite of the Free Pascal Compiler which was recently ported
to sparc64.

Thanks,
Adrian

> [1] https://bugzilla.kernel.org/show_bug.cgi?id=200625
From: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Date: Mon, 8 Oct 2018 20:42:14 +0200

> Hi Dave!
>
> On 10/8/18 8:48 AM, David Miller wrote:
>> I've been moving so haven't had access to my Sparc boxes during this
>> time. That's been resolved, and now I can get the patches flowing
>> again.
> Great to hear you're back!
>
> Can I bring this bug [1] back to your attention which allowed a regular
> userland application to crash the kernel? Would be interesting to see
> whether you're able to reproduce the issue.
>
> The offending binary is attached to the bug report. It's a test program
> from the testsuite of the Free Pascal Compiler which was recently ported
> to sparc64.
>
> Thanks,
> Adrian
>
>> [1] https://bugzilla.kernel.org/show_bug.cgi?id=200625

I'm looking at this now.

I cannot reproduce the system crash, but I definitely get the illegal
instruction both with and without using strace.

And those system log messages don't show, it must be from some specific
config option? Actually I don't see that "Kernel memory exposure" string
anywhere in the standard upstream kernel.

Running under gdb the process is gone when the SIGILL arrives, which
means that a signal wasn't sent but rather the process was forcefully
exited by the kernel with SIGILL.

Only two things do that:

1) Failing register window faults (arch/sparc/kernel/process_64.c)

2) Invalid signal frames during signal delivery
   (arch/sparc/kernel/signal32.c, arch/sparc/kernel/signal_64.c)

This is a 64-bit ELF binary so my guess is setup_rt_frame().

I put some debugging in and I guessed wrong:

[ 40.366945] [tweaklib2:428]: fault_in_user_windows() copy failed

Something definitely corrupted the stack and/or frame pointer in this
code.
From: David Miller <davem@davemloft.net>
Date: Thu, 25 Oct 2018 16:39:22 -0700 (PDT)

> This is a 64-bit ELF binary so my guess is setup_rt_frame().
>
> I put some debugging in and I guessed wrong:
>
> [ 40.366945] [tweaklib2:428]: fault_in_user_windows() copy failed
>
> Something definitely corrupted the stack and/or frame pointer in this
> code.

What this amounts to is actually a very simple program:

davem@salty:/opt/davem/src/GIT/CRASH/kernel-crash-test$ readelf -h tweaklib2
...
  Entry point address: 0x1003d0

Entry point 0x1003d0, what's that:

davem@salty:/opt/davem/src/GIT/CRASH/kernel-crash-test$ objdump -d tweaklib2 | grep -A 1 1003d0
00000000001003d0 <SI_PRC_$$__FPC_DYNAMIC_PROC_START>:
  1003d0: 81 c3 e0 08 retl
  1003d4: 01 00 00 00 nop

An entry point which simply returns.

So this program does nothing, main never gets called, we just return
to the dynamic linker code that jumps to the program's entry point
which isn't expected.

So the test case is simply:

davem@salty:/opt/davem/src/GIT/CRASH/kernel-crash-test$ cat test.s
	.text
	.align 4
	.globl _start
_start:	retl
	 nop
davem@salty:/opt/davem/src/GIT/CRASH/kernel-crash-test$ gcc -nostdlib -o test test.s
davem@salty:/opt/davem/src/GIT/CRASH/kernel-crash-test$ ./test
Segmentation fault
davem@salty:/opt/davem/src/GIT/CRASH/kernel-crash-test$

and nothing more.

So what happens when you return from _start like this? It ends up
looping, adjusting the stack pointer up by 48 bytes every iteration.
Eventually we push the stack pointer past the end of the stack mmap
area and crash.

The loop is:

davem@salty:/opt/davem/src/GIT/CRASH/kernel-crash-test$ gdb ./test
...
(gdb) b *0x0000010000000290
Breakpoint 1 at 0x10000000290
(gdb) display/i $pc
1: x/i $pc
<error: No registers.>
(gdb) run
Starting program: /opt/davem/src/GIT/CRASH/kernel-crash-test/test

Breakpoint 1, 0x0000010000000290 in _start ()
1: x/i $pc
=> 0x10000000290 <_start+4>: nop
(gdb) stepi
0xfff0000100001550 in ?? ()
1: x/i $pc
=> 0xfff0000100001550: sethi %hi(0x113000), %g1
(gdb)
0xfff0000100001554 in ?? ()
1: x/i $pc
=> 0xfff0000100001554: xor %g1, -572, %g1
(gdb)
0xfff0000100001558 in ?? ()
1: x/i $pc
=> 0xfff0000100001558: add %l7, %g1, %g1
(gdb)
0xfff000010000155c in ?? ()
1: x/i $pc
=> 0xfff000010000155c: jmp %l0
   0xfff0000100001560: add %sp, 0x30, %sp
(gdb)
0xfff0000100001560 in ?? ()
1: x/i $pc
=> 0xfff0000100001560: add %sp, 0x30, %sp
(gdb)
0x000001000000028c in _start ()
1: x/i $pc
=> 0x1000000028c <_start>: retl
   0x10000000290 <_start+4>: nop
(gdb)

And that's how we loop forever.

It should not crash the machine and here with an upstream kernel it does not for me.
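The arithmetic behind "eventually we push the stack pointer past the end" can be sketched in a few lines of C. This is only an illustration of the loop seen in the gdb trace: the 0x30 step comes from the `add %sp, 0x30, %sp` delay-slot instruction, while the function name and the `sp` / `stack_end` values passed in are hypothetical and chosen by the caller. The sketch also ignores the sparc64 stack bias and register window spills.

```c
#include <stdint.h>

/* Step applied on every trip around the loop, taken from the trace:
 * add %sp, 0x30, %sp (48 bytes). */
#define SP_STEP 0x30u

/*
 * Count how many iterations of the retl/jmp loop it takes until the
 * stack pointer is no longer below stack_end.
 *
 * sp        - hypothetical starting stack pointer
 * stack_end - hypothetical first address past the stack mapping
 *
 * Assumes stack_end is far enough below UINT64_MAX that sp cannot wrap.
 */
static uint64_t iterations_until_overflow(uint64_t sp, uint64_t stack_end)
{
    uint64_t n = 0;

    while (sp < stack_end) {   /* still inside the mapping */
        sp += SP_STEP;         /* what the delay slot does each time */
        n++;
    }
    return n;
}
```

For example, starting 8192 bytes below the end of the mapping, `iterations_until_overflow(end - 8192, end)` gives 171, since 170 steps of 48 bytes cover only 8160 bytes. The program therefore dies after a small, bounded number of iterations rather than spinning indefinitely.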
Hi Dave!

Thanks for the explanation!

On 10/26/18 5:29 AM, David Miller wrote:
> (...)
> And that's how we loop forever.
>
> It should not crash the machine and here with an upstream kernel it does not for me.

Interestingly, we couldn't reproduce the issue on the SPARC-T5 anymore either. It
definitely crashes the Sun Fire 2000 used for development of the Free Pascal
compiler though, just re-tested:

root@stadler:~/kernel-crash-test# strace ./tweaklib2
execve("./tweaklib2", ["./tweaklib2"], 0x7feffaafb90 /* 22 vars */) = 0
brk(NULL) = 0x264000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
mmap(NULL, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xffff800100024000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "./tls/v9v/libtweaklib1.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "./tls/libtweaklib1.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "./v9v/libtweaklib1.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "./libtweaklib1.so", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\2\1\0\0\0\0\0\0\0\0\0\0\3\0+\0\0\0\1\0\0\0\0\0\0\310\360"..., 832) = 832
fstat64(3, {st_mode=S_IFREG|0755, st_size=1165312, ...}) = 0
getcwd("/root/kernel-crash-test", 128) = 24
mmap(NULL, 1503776, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xffff800100128000
mprotect(0xffff800100188000, 1040384, PROT_NONE) = 0
mmap(0xffff800100286000, 57344, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x5e000) = 0xffff800100286000
mmap(0xffff800100294000, 12832, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xffff800100294000
close(3) = 0
mprotect(0xffff800100286000, 8192, PROT_READ) = 0
mprotect(0x252000, 8192, PROT_READ) = 0
mprotect(0xffff800100122000, 8192, PROT_READ) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
rt_sigaction(SIGFPE, {sa_handler=0xffff800100180e10, sa_mask=[], sa_flags=SA_SIGINFO}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 0xffff800100134ca8, 8) = 0
rt_sigaction(SIGSEGV, {sa_handler=0xffff800100180e10, sa_mask=[], sa_flags=SA_SIGINFO}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 0xffff800100134ca8, 8) = 0
rt_sigaction(SIGBUS, {sa_handler=0xffff800100180e10, sa_mask=[], sa_flags=SA_SIGINFO}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 0xffff800100134ca8, 8) = 0
rt_sigaction(SIGILL, {sa_handler=0xffff800100180e10, sa_mask=[], sa_flags=SA_SIGINFO}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 0xffff800100134ca8, 8) = 0
ioctl(1, TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(2, TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(1, TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(2, TCGETS, {B38400 opost isig icanon echo ...}) = 0
readlink("/proc/self/exe", "/root/kernel-crash-test/tweaklib"..., 255) = 33
rt_sigaction(SIGFPE, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, NULL, 0xffff800100134ca8, 8) = 0
rt_sigaction(SIGSEGV, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, NULL, 0xffff800100134ca8, 8) = 0
rt_sigaction(SIGBUS, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, NULL, 0xffff800100134ca8, 8) = 0
rt_sigaction(SIGILL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, NULL, 0xffff800100134ca8, 8) = 0

And on the console:

sc> console -f
Warning: User < > currently has write permission to this console and forcibly removing them will terminate any current write actions and all work will be lost. Would you like to continue? [y/n]y
Enter #. to return to ALOM.
[942798.597985] Unable to handle kernel paging request at virtual address 000000000153a000
[942798.598156] tsk->{mm,active_mm}->context = 0000000000000228
[942798.598253] tsk->{mm,active_mm}->pgd = ffff8007f5da8000
[942806.209845] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[942806.210036] rcu: 6-...0: (1 GPs behind) idle=84a/1/0x4000000000000002 softirq=8377262/8377263 fqs=2466
[942806.210137] rcu: (detected by 3, t=5337 jiffies, g=57015461, q=2750)

I will let you know whether your patch helps to address the problem.

Adrian
From: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Date: Fri, 26 Oct 2018 14:41:53 +0200

>> It should not crash the machine and here with an upstream kernel it does not for me.
>
> Interestingly, we couldn't reproduce the issue on the SPARC-T5 anymore either. It
> definitely crashes the Sun Fire 2000 used for development of the Free Pascal
> compiler though, just re-tested:

I tested it on my Ultra45.

Please reproduce with a vanilla upstream kernel. I think some of the
local debian kernel patches might be causing this.

Also, wrt. freepascal, as I described in my email, the link of that
binary is causing the NOP _start function to be set as the entry
point.

That can't be right, whatever ends up as the entry point should end up
calling 'main' so that the program actually runs.
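To check which address a linker actually recorded as the entry point, the value that `readelf -h` prints as "Entry point address" can be read straight from the ELF header. Below is a minimal sketch in C, assuming a 64-bit ELF file whose first 32 bytes are already in memory. The function name `elf64_entry` is hypothetical; the offsets follow the standard ELF-64 header layout (`e_entry` at byte 24, byte order given by `e_ident[EI_DATA]`). It decodes byte by byte so that a big-endian sparc64 binary can be inspected on any host.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offsets and values from the ELF-64 header layout. */
enum {
    EI_CLASS_IDX = 4,   /* e_ident[EI_CLASS] */
    EI_DATA_IDX  = 5,   /* e_ident[EI_DATA] */
    CLASS_64     = 2,   /* ELFCLASS64 */
    DATA_LSB     = 1,   /* ELFDATA2LSB, little-endian */
    DATA_MSB     = 2,   /* ELFDATA2MSB, big-endian (sparc64) */
    ENTRY_OFFSET = 24,  /* offset of e_entry in the ELF-64 header */
    MIN_HDR_LEN  = 32   /* bytes needed to cover e_entry */
};

/*
 * Decode e_entry from the start of a 64-bit ELF file.
 * Returns 0 and stores the address in *entry on success,
 * -1 if the buffer is too short, not ELF, not 64-bit,
 * or has an unknown byte order.
 */
static int elf64_entry(const unsigned char *hdr, size_t len, uint64_t *entry)
{
    uint64_t v = 0;
    int i;

    if (len < MIN_HDR_LEN || memcmp(hdr, "\177ELF", 4) != 0)
        return -1;
    if (hdr[EI_CLASS_IDX] != CLASS_64)
        return -1;

    if (hdr[EI_DATA_IDX] == DATA_MSB) {
        /* Most significant byte first. */
        for (i = 0; i < 8; i++)
            v = (v << 8) | hdr[ENTRY_OFFSET + i];
    } else if (hdr[EI_DATA_IDX] == DATA_LSB) {
        /* Least significant byte first. */
        for (i = 7; i >= 0; i--)
            v = (v << 8) | hdr[ENTRY_OFFSET + i];
    } else {
        return -1;
    }

    *entry = v;
    return 0;
}
```

The decoded address only says where execution starts. Whether the code at that address ever reaches 'main' still has to be checked by disassembling it, as was done with objdump earlier in the thread.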