
[GIT] Sparc

Message ID 20181007.234832.1963273306260975472.davem@davemloft.net
State Accepted
Delegated to: David Miller

Pull-request

git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc.git

Message

David Miller Oct. 8, 2018, 6:48 a.m. UTC
I've been moving so haven't had access to my Sparc boxes during this
time.  That's been resolved, and now I can get the patches flowing
again.

1) Minor fallthru comment tweaks from Gustavo A. R. Silva.

2) VLA removal from Kees Cook.

3) Make sparc vdso Makefile match x86, from Masahiro Yamada.

4) Fix clock divider programming in mach64 driver, from Mikulas
   Patocka.

Please pull, thanks a lot!

The following changes since commit c1d84a1b42ef70d8ae601df9cadedc7ed4f1beb1:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2018-10-06 02:11:30 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc.git 

for you to fetch changes up to b7dc10b64f6190a008f05baf697d4d8fa9b8ed51:

  sparc64: fix fall-through annotation (2018-10-07 22:42:02 -0700)

----------------------------------------------------------------
Colin Ian King (1):
      oradax: remove redundant null check before kfree

Gustavo A. R. Silva (2):
      sparc32: fix fall-through annotation
      sparc64: fix fall-through annotation

Kees Cook (1):
      sparc64: viohs: Remove VLA usage

Masahiro Yamada (1):
      sparc: vdso: clean-up vdso Makefile

Mikulas Patocka (1):
      mach64: detect the dot clock divider correctly on sparc

Rob Herring (2):
      sparc: Convert to using %pOFn instead of device_node.name
      sbus: Use of_get_child_by_name helper

 arch/sparc/kernel/auxio_64.c         |  4 ++--
 arch/sparc/kernel/kgdb_32.c          |  2 +-
 arch/sparc/kernel/kgdb_64.c          |  2 +-
 arch/sparc/kernel/power.c            |  4 ++--
 arch/sparc/kernel/prom_32.c          | 26 +++++++++++++-------------
 arch/sparc/kernel/prom_64.c          | 68 ++++++++++++++++++++++++++++++++++----------------------------------
 arch/sparc/kernel/viohs.c            | 12 +++++++++---
 arch/sparc/vdso/Makefile             |  8 +++-----
 drivers/sbus/char/openprom.c         | 11 +----------
 drivers/sbus/char/oradax.c           |  3 +--
 drivers/video/fbdev/aty/atyfb.h      |  3 ++-
 drivers/video/fbdev/aty/atyfb_base.c |  7 ++++---
 drivers/video/fbdev/aty/mach64_ct.c  | 10 +++++-----
 13 files changed, 78 insertions(+), 82 deletions(-)

Comments

Greg KH Oct. 8, 2018, 2:26 p.m. UTC | #1
On Sun, Oct 07, 2018 at 11:48:32PM -0700, David Miller wrote:
> 
> I've been moving so haven't had access to my Sparc boxes during this
> time.  That's been resolved, and now I can get the patches flowing
> again.
> 
> 1) Minor fallthru comment tweaks from Gustavo A. R. Silva.
> 
> 2) VLA removal from Kees Cook.
> 
> 3) Make sparc vdso Makefile match x86, from Masahiro Yamada.
> 
> 4) Fix clock divider programming in mach64 driver, from Mikulas
>    Patocka.
> 
> Please pull, thanks a lot!
> 
> The following changes since commit c1d84a1b42ef70d8ae601df9cadedc7ed4f1beb1:
> 
>   Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2018-10-06 02:11:30 -0700)
> 
> are available in the Git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc.git 

Now merged, thanks.

greg k-h
John Paul Adrian Glaubitz Oct. 8, 2018, 6:42 p.m. UTC | #2
Hi Dave!

On 10/8/18 8:48 AM, David Miller wrote:
> I've been moving so haven't had access to my Sparc boxes during this
> time.  That's been resolved, and now I can get the patches flowing
> again.
Great to hear you're back!

Can I bring this bug [1] back to your attention which allowed a regular
userland application to crash the kernel? Would be interesting to see
whether you're able to reproduce the issue.

The offending binary is attached to the bug report. It's a test program
from the testsuite of the Free Pascal Compiler which was recently ported
to sparc64.

Thanks,
Adrian

> [1] https://bugzilla.kernel.org/show_bug.cgi?id=200625
David Miller Oct. 25, 2018, 11:39 p.m. UTC | #3
From: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Date: Mon, 8 Oct 2018 20:42:14 +0200

> Hi Dave!
> 
> On 10/8/18 8:48 AM, David Miller wrote:
>> I've been moving so haven't had access to my Sparc boxes during this
>> time.  That's been resolved, and now I can get the patches flowing
>> again.
> Great to hear you're back!
> 
> Can I bring this bug [1] back to your attention which allowed a regular
> userland application to crash the kernel? Would be interesting to see
> whether you're able to reproduce the issue.
> 
> The offending binary is attached to the bug report. It's a test program
> from the testsuite of the Free Pascal Compiler which was recently ported
> to sparc64.
> 
> Thanks,
> Adrian
> 
>> [1] https://bugzilla.kernel.org/show_bug.cgi?id=200625

I'm looking at this now.

I cannot reproduce the system crash, but I definitely get the illegal
instruction both with and without using strace.  And those system log
messages don't show, it must be from some specific config option?
Actually I don't see that "Kernel memory exposure" string anywhere in
the standard upstream kernel.

Running under gdb the process is gone when the SIGILL arrives which
means that a signal wasn't sent but rather the process was forcefully
exited by the kernel with SIGILL.

Only two things do that:

1) Failing register window faults (arch/sparc/kernel/process_64.c)
2) Invalid signal frames during signal delivery (arch/sparc/kernel/signal32.c,
   arch/sparc/kernel/signal_64.c)

This is a 64-bit ELF binary so my guess is setup_rt_frame().

I put some debugging in and I guessed wrong:

[   40.366945] [tweaklib2:428]: fault_in_user_windows() copy failed

Something definitely corrupted the stack and/or frame pointer in this
code.
David Miller Oct. 26, 2018, 3:29 a.m. UTC | #4
From: David Miller <davem@davemloft.net>
Date: Thu, 25 Oct 2018 16:39:22 -0700 (PDT)

> This is a 64-bit ELF binary so my guess is setup_rt_frame().
> 
> I put some debugging in and I guessed wrong:
> 
> [   40.366945] [tweaklib2:428]: fault_in_user_windows() copy failed
> 
> Something definitely corrupted the stack and/or frame pointer in this
> code.

What this amounts to is actually a very simple program:

davem@salty:/opt/davem/src/GIT/CRASH/kernel-crash-test$ readelf -h tweaklib2
 ...
  Entry point address:               0x1003d0

Entry point 0x1003d0, what's that:

davem@salty:/opt/davem/src/GIT/CRASH/kernel-crash-test$ objdump -d tweaklib2 | grep -A 1 1003d0
00000000001003d0 <SI_PRC_$$__FPC_DYNAMIC_PROC_START>:
  1003d0:       81 c3 e0 08     retl
  1003d4:       01 00 00 00     nop

An entry point which simply returns.  So this program does nothing and
main never gets called: we just return to the dynamic linker code that
jumped to the program's entry point, and that code does not expect the
entry point to return.

So the test case is simply:

davem@salty:/opt/davem/src/GIT/CRASH/kernel-crash-test$ cat test.s
        .text
        .align 4
        .globl _start
_start: retl
        nop
davem@salty:/opt/davem/src/GIT/CRASH/kernel-crash-test$ gcc -nostdlib -o test test.s
davem@salty:/opt/davem/src/GIT/CRASH/kernel-crash-test$ ./test
Segmentation fault
davem@salty:/opt/davem/src/GIT/CRASH/kernel-crash-test$ 

and nothing more.

So what happens when you return from _start like this?  It ends up
looping, adjusting the stack pointer up by 48 bytes every iteration.
Eventually we push the stack pointer past the end of the stack mmap
area and crash.

The loop is:

davem@salty:/opt/davem/src/GIT/CRASH/kernel-crash-test$ gdb ./test
 ...
(gdb) b *0x0000010000000290
Breakpoint 1 at 0x10000000290
(gdb) display/i $pc
1: x/i $pc
<error: No registers.>
(gdb) run
Starting program: /opt/davem/src/GIT/CRASH/kernel-crash-test/test

Breakpoint 1, 0x0000010000000290 in _start ()
1: x/i $pc
=> 0x10000000290 <_start+4>:    nop
(gdb) stepi
0xfff0000100001550 in ?? ()
1: x/i $pc
=> 0xfff0000100001550:  sethi  %hi(0x113000), %g1
(gdb) 
0xfff0000100001554 in ?? ()
1: x/i $pc
=> 0xfff0000100001554:  xor  %g1, -572, %g1
(gdb) 
0xfff0000100001558 in ?? ()
1: x/i $pc
=> 0xfff0000100001558:  add  %l7, %g1, %g1
(gdb) 
0xfff000010000155c in ?? ()
1: x/i $pc
=> 0xfff000010000155c:  jmp  %l0
   0xfff0000100001560:  add  %sp, 0x30, %sp
(gdb) 
0xfff0000100001560 in ?? ()
1: x/i $pc
=> 0xfff0000100001560:  add  %sp, 0x30, %sp
(gdb) 
0x000001000000028c in _start () 
1: x/i $pc 
=> 0x1000000028c <_start>:      retl
   0x10000000290 <_start+4>:    nop 
(gdb) 

And that's how we loop forever.

It should not crash the machine and here with an upstream kernel it does not for me.
John Paul Adrian Glaubitz Oct. 26, 2018, 12:41 p.m. UTC | #5
Hi Dave!

Thanks for the explanation!

On 10/26/18 5:29 AM, David Miller wrote:
> (...)
> And that's how we loop forever.
> 
> It should not crash the machine and here with an upstream kernel it does not for me.

Interestingly, we couldn't reproduce the issue on the SPARC-T5 anymore either. It
definitely crashes the Sun Fire 2000 used for development of the Free Pascal
compiler though, just re-tested:

root@stadler:~/kernel-crash-test# strace ./tweaklib2 
execve("./tweaklib2", ["./tweaklib2"], 0x7feffaafb90 /* 22 vars */) = 0
brk(NULL)                               = 0x264000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
mmap(NULL, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xffff800100024000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "./tls/v9v/libtweaklib1.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "./tls/libtweaklib1.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "./v9v/libtweaklib1.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "./libtweaklib1.so", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\2\1\0\0\0\0\0\0\0\0\0\0\3\0+\0\0\0\1\0\0\0\0\0\0\310\360"..., 832) = 832
fstat64(3, {st_mode=S_IFREG|0755, st_size=1165312, ...}) = 0
getcwd("/root/kernel-crash-test", 128)  = 24
mmap(NULL, 1503776, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xffff800100128000
mprotect(0xffff800100188000, 1040384, PROT_NONE) = 0
mmap(0xffff800100286000, 57344, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x5e000) = 0xffff800100286000
mmap(0xffff800100294000, 12832, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xffff800100294000
close(3)                                = 0
mprotect(0xffff800100286000, 8192, PROT_READ) = 0
mprotect(0x252000, 8192, PROT_READ)     = 0
mprotect(0xffff800100122000, 8192, PROT_READ) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
rt_sigaction(SIGFPE, {sa_handler=0xffff800100180e10, sa_mask=[], sa_flags=SA_SIGINFO}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 0xffff800100134ca8, 8) = 0
rt_sigaction(SIGSEGV, {sa_handler=0xffff800100180e10, sa_mask=[], sa_flags=SA_SIGINFO}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 0xffff800100134ca8, 8) = 0
rt_sigaction(SIGBUS, {sa_handler=0xffff800100180e10, sa_mask=[], sa_flags=SA_SIGINFO}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 0xffff800100134ca8, 8) = 0
rt_sigaction(SIGILL, {sa_handler=0xffff800100180e10, sa_mask=[], sa_flags=SA_SIGINFO}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 0xffff800100134ca8, 8) = 0
ioctl(1, TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(2, TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(1, TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(2, TCGETS, {B38400 opost isig icanon echo ...}) = 0
readlink("/proc/self/exe", "/root/kernel-crash-test/tweaklib"..., 255) = 33
rt_sigaction(SIGFPE, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, NULL, 0xffff800100134ca8, 8) = 0
rt_sigaction(SIGSEGV, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, NULL, 0xffff800100134ca8, 8) = 0
rt_sigaction(SIGBUS, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, NULL, 0xffff800100134ca8, 8) = 0
rt_sigaction(SIGILL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, NULL, 0xffff800100134ca8, 8) = 0

And on the console:

sc> console -f
Warning: User < > currently has write permission to this console and forcibly removing them will terminate any current write actions and all work will be lost.  Would you like to continue? [y/n]y
Enter #. to return to ALOM.
[942798.597985] Unable to handle kernel paging request at virtual address 000000000153a000
[942798.598156] tsk->{mm,active_mm}->context = 0000000000000228
[942798.598253] tsk->{mm,active_mm}->pgd = ffff8007f5da8000
[942806.209845] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[942806.210036] rcu:    6-...0: (1 GPs behind) idle=84a/1/0x4000000000000002 softirq=8377262/8377263 fqs=2466 
[942806.210137] rcu:    (detected by 3, t=5337 jiffies, g=57015461, q=2750)

I will let you know whether your patch helps to address the problem.

Adrian
David Miller Oct. 26, 2018, 6:30 p.m. UTC | #6
From: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Date: Fri, 26 Oct 2018 14:41:53 +0200

>> It should not crash the machine and here with an upstream kernel it does not for me.
> 
> Interestingly, we couldn't reproduce the issue on the SPARC-T5 anymore either. It
> definitely crashes the Sun Fire 2000 used for development of the Free Pascal
> compiler though, just re-tested:

I tested it on my Ultra45.

Please reproduce with a vanilla upstream kernel.  I think some of the local
debian kernel patches might be causing this.

Also, wrt. freepascal, as I described in my email, the way that
binary is linked is causing the NOP _start function to be set as the
entry point.

That can't be right, whatever ends up as the entry point should end up
calling 'main' so that the program actually runs.