Patchwork spufs raises two exceptions

Submitter masterzorag
Date March 6, 2012, 9:26 a.m.
Message ID <4F55D84B.7030306@gmail.com>
Permalink /patch/144900/
State Superseded

Comments

masterzorag - March 6, 2012, 9:26 a.m.
My test program uses all available SPUs to compute via OpenCL, on
kernel 3.2.5 on a PS3.
It crashes, and it crashes even when testing the SPUs directly.


=====================================
[ BUG: bad unlock balance detected! ]
-------------------------------------
test/1067 is trying to release lock (&sb->s_type->i_mutex_key) at:
[<d0000000005828a8>] .do_spu_create+0x90/0xd8 [spufs]
but there are no more locks to release!
other info that might help us debug this:
no locks held by test/1067.
stack backtrace:
Call Trace:
[c00000000e9bfa30] [c0000000000110d0] .show_stack+0x6c/0x16c (unreliable)
[c00000000e9bfae0] [c000000000081f90] .print_unlock_inbalance_bug+0xe8/0x110
[c00000000e9bfb70] [c0000000000868cc] .lock_release+0xd8/0x200
[c00000000e9bfc10] [c0000000003efb60] .__mutex_unlock_slowpath+0x11c/0x1d8
[c00000000e9bfcb0] [d0000000005828a8] .do_spu_create+0x90/0xd8 [spufs]
[c00000000e9bfd70] [c0000000000346ac] .sys_spu_create+0x164/0x1c0
[c00000000e9bfe30] [c0000000000097d8] syscall_exit+0x0/0x40
------------[ cut here ]------------
kernel BUG at fs/dcache.c:474!
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=2 NUMA PS3
Modules linked in: spufs dm_mod btusb bluetooth usb_storage ohci_hcd 
snd_ps3 ehci_hcd snd_pcm snd_page_alloc snd_timer sg snd usbcore 
usb_common ps3flash rtc_ps3 soundcore ps3_lpm ps3vram [last unloaded: 
scsi_wait_scan]
NIP: c000000000109f94 LR: c000000000109f84 CTR: c0000000000a029c
REGS: c00000000e9bf930 TRAP: 0700 Not tainted (3.2.5)
MSR: 8000000000028032 <EE,CE,IR,DR> CR: 22004822 XER: 00000000
TASK = c0000000062f0ec0[1067] 'test' THREAD: c00000000e9bc000 CPU: 1
GPR00: 0000000000000001 c00000000e9bfbb0 c0000000006812e8 c00000000543b798
GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000002
GPR08: 0000000000000000 0000000000000000 c000000000109f84 c0000000062f0ec0
GPR12: 0000000082004824 c000000007ffe280 0000000000000004 00000000f7850688
GPR16: 00000000f7830734 00000000f78517a4 00000000f7852008 00000000f78517a8
GPR20: 00000000ff805dc0 000000000fd958a0 0000000000000000 000000000000000d
GPR24: 000000000fd98240 c00000000e101e10 0000000040000010 c00000000616e080
GPR28: c00000000543b738 c00000000543b798 c0000000006149e8 c00000000543b738
NIP [c000000000109f94] .dput+0x48/0x214
LR [c000000000109f84] .dput+0x38/0x214
Call Trace:
[c00000000e9bfbb0] [c000000000109f84] .dput+0x38/0x214 (unreliable)
[c00000000e9bfc50] [c0000000000f1740] .fput+0x24c/0x288
[c00000000e9bfd00] [c0000000000ed708] .filp_close+0xbc/0xe4
[c00000000e9bfd90] [c0000000000ed800] .SyS_close+0xd0/0x128
[c00000000e9bfe30] [c0000000000097d8] syscall_exit+0x0/0x40
Instruction dump:
fb61ffd8 fb81ffe0 fba1ffe8 f821ff61 418201c8 3bbf0060 7fa3eb78 482e7f31
60000000 813f0058 7d200074 7800d182 <0b000000> 2b890001 409d0010 3809ffff
---[ end trace c337aad05d94532f ]---
------------[ cut here ]------------
kernel BUG at fs/dcache.c:474!
Oops: Exception in kernel mode, sig: 5 [#2]
SMP NR_CPUS=2 NUMA PS3
Modules linked in: spufs dm_mod btusb bluetooth usb_storage ohci_hcd 
snd_ps3 ehci_hcd snd_pcm snd_page_alloc snd_timer sg snd usbcore 
usb_common ps3flash rtc_ps3 soundcore ps3_lpm ps3vram [last unloaded: 
scsi_wait_scan]
NIP: c000000000109f94 LR: c000000000109f84 CTR: c0000000000a029c
REGS: c00000000e9bec20 TRAP: 0700 Tainted: G D (3.2.5)
MSR: 8000000000028032 <EE,CE,IR,DR> CR: 22004822 XER: 00000000
TASK = c0000000062f0ec0[1067] 'test' THREAD: c00000000e9bc000 CPU: 1
GPR00: 0000000000000001 c00000000e9beea0 c0000000006812e8 c0000000054361c8
GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000002
GPR08: 0000000000000000 0000000000000000 c000000000109f84 c0000000062f0ec0
GPR12: 0000000042004824 c000000007ffe280 0000000000000004 00000000f7850688
GPR16: 00000000f7830734 00000000f78517a4 00000000f7852008 00000000f78517a8
GPR20: 00000000ff805dc0 000000000fd958a0 0000000000000000 0000000000000001
GPR24: 000000000fd98240 c00000000e9b2390 0000000000000008 c0000000062bd010
GPR28: c000000005436168 c0000000054361c8 c0000000006149e8 c000000005436168
NIP [c000000000109f94] .dput+0x48/0x214
LR [c000000000109f84] .dput+0x38/0x214
Call Trace:
[c00000000e9beea0] [c000000000109f84] .dput+0x38/0x214 (unreliable)
[c00000000e9bef40] [c0000000000f1740] .fput+0x24c/0x288
[c00000000e9beff0] [c0000000000c93a8] .remove_vma+0x68/0xcc
[c00000000e9bf080] [c0000000000c951c] .exit_mmap+0x110/0x14c
[c00000000e9bf1a0] [c00000000004b4c8] .mmput+0x5c/0x13c
[c00000000e9bf230] [d00000000058237c] .spu_forget+0x54/0x7c [spufs]
[c00000000e9bf2c0] [d00000000057c294] .spufs_dir_close+0x8c/0xc8 [spufs]
[c00000000e9bf370] [c0000000000f166c] .fput+0x178/0x288
[c00000000e9bf420] [c0000000000ed708] .filp_close+0xbc/0xe4
[c00000000e9bf4b0] [c000000000050294] .put_files_struct+0xf4/0x1b8
[c00000000e9bf560] [c0000000000520bc] .do_exit+0x23c/0x6f4
[c00000000e9bf660] [c00000000001922c] .die+0x274/0x2a4
[c00000000e9bf700] [c000000000019640] ._exception+0x88/0x17c
[c00000000e9bf8c0] [c000000000005314] program_check_common+0x114/0x180
--- Exception: 700 at .dput+0x48/0x214
LR = .dput+0x38/0x214
[c00000000e9bfc50] [c0000000000f1740] .fput+0x24c/0x288
[c00000000e9bfd00] [c0000000000ed708] .filp_close+0xbc/0xe4
[c00000000e9bfd90] [c0000000000ed800] .SyS_close+0xd0/0x128
[c00000000e9bfe30] [c0000000000097d8] syscall_exit+0x0/0x40
Instruction dump:
fb61ffd8 fb81ffe0 fba1ffe8 f821ff61 418201c8 3bbf0060 7fa3eb78 482e7f31
60000000 813f0058 7d200074 7800d182 <0b000000> 2b890001 409d0010 3809ffff
---[ end trace c337aad05d945330 ]---
Fixing recursive fault but reboot is needed!

The mutex is unlocked twice: the first time in spufs_create_context, then
a second time in do_spu_create.
It also seems that the SPU main directory dentry has an invalid d_count.


This patch fixes both problems: OpenCL runs fine, and the direct SPE test
runs without issues.
Benjamin Herrenschmidt - March 7, 2012, 3:49 a.m.
On Tue, 2012-03-06 at 10:26 +0100, masterzorag wrote:
> I'm running my test program, it uses all available spus to compute via 
> OpenCL
> kernel 3.2.5 on a ps3
> even on testing spu directly, it crashes

I think the patch is not 100% right yet. Looking at the code, we
have a real mess of who gets to clean what up here. This is an
attempt at sorting things by having the mutex and dentry dropped
in spufs_create() always. Can you give it a spin (untested):

Al, I'm not familiar with the VFS; can you take a quick look?

Thanks!

Cheers,
Ben.


> --- arch/powerpc/platforms/cell/spufs/syscalls.c
> +++ arch/powerpc/platforms/cell/spufs/syscalls.c.new
> @@ -70,8 +70,8 @@
>       ret = PTR_ERR(dentry);
>       if (!IS_ERR(dentry)) {
>           ret = spufs_create(&path, dentry, flags, mode, neighbor);
> -        mutex_unlock(&path.dentry->d_inode->i_mutex);
> -        dput(dentry);
> +        if (ret < 0)
> +            dput(dentry);
>           path_put(&path);
>       }

Patch

--- arch/powerpc/platforms/cell/spufs/syscalls.c
+++ arch/powerpc/platforms/cell/spufs/syscalls.c.new
@@ -70,8 +70,8 @@ 
      ret = PTR_ERR(dentry);
      if (!IS_ERR(dentry)) {
          ret = spufs_create(&path, dentry, flags, mode, neighbor);
-        mutex_unlock(&path.dentry->d_inode->i_mutex);
-        dput(dentry);
+        if (ret < 0)
+            dput(dentry);
          path_put(&path);
      }