Message ID | 20210127145648.348135-1-zlang@redhat.com (mailing list archive) |
---|---|
State | Rejected |
Headers | show |
Series | powerpc/fault: fix wrong KUAP fault for IO_URING | expand |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/apply_patch | success | Successfully applied on branch powerpc/merge (41d8cb7ece7c81e4eb897ed7ec7d3c3d72fd0af4) |
snowpatch_ozlabs/build-ppc64le | success | Build succeeded |
snowpatch_ozlabs/build-ppc64be | success | Build succeeded |
snowpatch_ozlabs/build-ppc64e | success | Build succeeded |
snowpatch_ozlabs/build-pmac32 | success | Build succeeded |
snowpatch_ozlabs/checkpatch | warning | total: 0 errors, 1 warnings, 0 checks, 14 lines checked |
snowpatch_ozlabs/needsstable | success | Patch has no Fixes tags |
Le 27/01/2021 à 15:56, Zorro Lang a écrit : > On powerpc, io_uring test hit below KUAP fault on __do_page_fault. > The fail source line is: > > if (unlikely(!is_user && bad_kernel_fault(regs, error_code, address, is_write))) > return SIGSEGV; > > The is_user() is based on user_mod(regs) only. This's not suit for > io_uring, where the helper thread can assume the user app identity > and could perform this fault just fine. So turn to use mm to decide > if this is valid or not. I don't understand why testing is_user would be an issue. KUAP purpose it to block any unallowed access from kernel to user memory (Equivalent to SMAP on x86). So it really must be based on MSR_PR bit, that is what is_user provides. If the kernel access is legitimate, kernel should have opened userspace access then you shouldn't get this "Bug: Read fault blocked by KUAP!". As far as I understand, the fault occurs in iov_iter_fault_in_readable() which calls fault_in_pages_readable() And fault_in_pages_readable() uses __get_user() so it is a legitimate access and you really should get a KUAP fault. So the problem is somewhere else, I think you proposed patch just hides the problem, it doesn't fix it. Can you provide your vmlinux binary together with your .config ? > > [ 556.472666] ------------[ cut here ]------------ > [ 556.472686] Bug: Read fault blocked by KUAP! > [ 556.472697] WARNING: CPU: 1 PID: 101841 at arch/powerpc/mm/fault.c:229 __do_page_fault+0x6b4/0xcd0 > [ 556.472728] Modules linked in: bonding rfkill sunrpc pseries_rng xts uio_pdrv_genirq vmx_crypto uio ip_tables xfs libcrc32c sd_mod t10_pi sg ibmvscsi ibmveth scsi_transport_srp > [ 556.472816] CPU: 1 PID: 101841 Comm: io_wqe_worker-0 Tainted: G W 5.11.0-rc3+ #2 > [ 556.472830] NIP: c00000000009e7e4 LR: c00000000009e7e0 CTR: 0000000000000000 > [ 556.472842] REGS: c000000016367090 TRAP: 0700 Tainted: G W (5.11.0-rc3+) > [ 556.472853] MSR: 8000000000021033 <SF,ME,IR,DR,RI,LE> CR: 48022424 XER: 00000001 > [ 556.472901] CFAR: c0000000001822ac IRQMASK: 1 > GPR00: c00000000009e7e0 c000000016367330 c0000000023fc300 0000000000000020 > GPR04: c000000001e3c2b8 0000000000000001 0000000000000027 c0000007fbcccc90 > GPR08: 0000000000000023 0000000000000000 c000000024ed0900 fffffffffc464a58 > GPR12: 0000000000002000 c00000001ecaf280 c0000000001caee8 c000000014d547c0 > GPR16: 0000000000000000 0000000000000000 0000000000000000 c000000002454018 > GPR20: c000000001336480 bfffffffffffffff 0000000000000000 c00000000b0e5800 > GPR24: a8aaaaaaaaaaaaaa 0000000000000000 0000000000200000 c00000002cc38880 > GPR28: 000001000e3c9310 c0000000013424c0 c0000000163674a0 c000000001e0d2c0 > [ 556.473125] NIP [c00000000009e7e4] __do_page_fault+0x6b4/0xcd0 > [ 556.473139] LR [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 > [ 556.473152] Call Trace: > [ 556.473168] [c000000016367330] [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 (unreliable) > [ 556.473198] [c0000000163673e0] [c00000000009ee3c] do_page_fault+0x3c/0x120 > [ 556.473216] [c000000016367430] [c00000000000c848] handle_page_fault+0x10/0x2c > [ 556.473232] --- interrupt: 300 at iov_iter_fault_in_readable+0x148/0x6f0 > [ 556.473245] NIP: c0000000008e8228 LR: c0000000008e834c CTR: 0000000000000000 > [ 556.473257] REGS: c0000000163674a0 TRAP: 0300 Tainted: G W (5.11.0-rc3+) > [ 556.473268] MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 44008482 XER: 00000001 > [ 556.473339] CFAR: c0000000008e81f0 DAR: 000001000e3c9310 DSISR: 00200000 IRQMASK: 0 > GPR00: c0000000008e834c c000000016367740 c0000000023fc300 0000000000000000 > GPR04: c00000002cc389e0 0000000000000001 00000007fa4b0000 c0000000025bc520 > GPR08: 00000007fa4b0000 0000000000000200 fcffffffffffffff ffffffffffea2ad8 > GPR12: 0000000000008000 c00000001ecaf280 c0000000001caee8 c000000014d547c0 > GPR16: 0000000000000000 0000000000000000 0000000000000000 c000000002454018 > GPR20: c000000001336480 bfffffffffffffff 0000000000000000 c00000000b0e5800 > GPR24: a8aaaaaaaaaaaaaa fcffffffffffffff 00000000000004b1 00000000000004b1 > GPR28: c00000000b0e5888 000001000e3c97c0 0000000000000000 000001000e3c9310 > [ 556.473667] NIP [c0000000008e8228] iov_iter_fault_in_readable+0x148/0x6f0 > [ 556.473688] LR [c0000000008e834c] iov_iter_fault_in_readable+0x26c/0x6f0 > [ 556.473708] --- interrupt: 300 > [ 556.473722] [c0000000163677e0] [c0000000007154a0] iomap_write_actor+0xc0/0x280 > [ 556.473770] [c000000016367880] [c00000000070fc94] iomap_apply+0x1c4/0x780 > [ 556.473804] [c000000016367990] [c000000000710330] iomap_file_buffered_write+0xa0/0x120 > [ 556.473839] [c0000000163679e0] [c00800000040791c] xfs_file_buffered_aio_write+0x314/0x5e0 [xfs] > [ 556.474053] [c000000016367a90] [c0000000006d74bc] io_write+0x10c/0x460 > [ 556.474101] [c000000016367bb0] [c0000000006d80e4] io_issue_sqe+0x8d4/0x1200 > [ 556.474132] [c000000016367c70] [c0000000006d8ad0] io_wq_submit_work+0xc0/0x250 > [ 556.474161] [c000000016367cb0] [c0000000006e2578] io_worker_handle_work+0x498/0x800 > [ 556.474192] [c000000016367d40] [c0000000006e2cdc] io_wqe_worker+0x3fc/0x4f0 > [ 556.474223] [c000000016367da0] [c0000000001cb0a4] kthread+0x1c4/0x1d0 > [ 556.474254] [c000000016367e10] [c00000000000dbf0] ret_from_kernel_thread+0x5c/0x6c > [ 556.474286] Instruction dump: > [ 556.474310] e87e0100 481287f1 60000000 2fa30000 419e01ec 408e0400 3c82fef4 388461d0 > [ 556.474395] 3c62fef4 386362d0 480e3a69 60000000 <0fe00000> 3860000b 4bfffa08 3d220006 > [ 556.474479] irq event stamp: 1280 > [ 556.474505] hardirqs last enabled at (1279): [<c0000000005a0104>] __slab_free+0x3e4/0x570 > [ 556.474540] hardirqs last disabled at (1280): [<c000000000008a04>] data_access_common_virt+0x1a4/0x1c0 > [ 556.474565] softirqs last enabled at (536): [<c00000000107cdfc>] __do_softirq+0x6ac/0x7f4 > [ 556.474590] softirqs last disabled at (437): [<c00000000019179c>] irq_exit+0x2ec/0x320 > [ 556.474615] ---[ end trace 4c1967c400992302 ]--- > > Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=211151 > Suggested-by: Jens Axboe <axboe@kernel.dk> > Signed-off-by: Zorro Lang <zlang@redhat.com> > --- > > Hi, > > Thanks to Jens reviewed this bug report from io_uring side, and then suggest this > fix. But we're not expert of powerpc, so report this bug to powerpc maillist to > get more review. > > I've tested this patch, I can't reproduce this bug after merge this patch. And > can reproduce it after remove this patch. > > Thanks, > Zorro > > arch/powerpc/mm/fault.c | 7 +++++-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c > index 8961b44f350c..5a4d6af04c99 100644 > --- a/arch/powerpc/mm/fault.c > +++ b/arch/powerpc/mm/fault.c > @@ -417,9 +417,12 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address, > /* > * The kernel should never take an execute fault nor should it > * take a page fault to a kernel address or a page fault to a user > - * address outside of dedicated places > + * address outside of dedicated places. Use mm to decide if this is > + * valid or not, it's perfectly legitimate to have a kernel thread > + * take a fault as long as it's performed kthread_use_mm() prior. An > + * example of that would be io_uring helper threads. > */ > - if (unlikely(!is_user && bad_kernel_fault(regs, error_code, address, is_write))) > + if (unlikely(!mm && bad_kernel_fault(regs, error_code, address, is_write))) > return SIGSEGV; > > /* >
On 1/27/21 9:38 AM, Christophe Leroy wrote: > > > Le 27/01/2021 à 15:56, Zorro Lang a écrit : >> On powerpc, io_uring test hit below KUAP fault on __do_page_fault. >> The fail source line is: >> >> if (unlikely(!is_user && bad_kernel_fault(regs, error_code, address, is_write))) >> return SIGSEGV; >> >> The is_user() is based on user_mod(regs) only. This's not suit for >> io_uring, where the helper thread can assume the user app identity >> and could perform this fault just fine. So turn to use mm to decide >> if this is valid or not. > > I don't understand why testing is_user would be an issue. KUAP purpose > it to block any unallowed access from kernel to user memory > (Equivalent to SMAP on x86). So it really must be based on MSR_PR bit, > that is what is_user provides. > > If the kernel access is legitimate, kernel should have opened > userspace access then you shouldn't get this "Bug: Read fault blocked > by KUAP!". > > As far as I understand, the fault occurs in > iov_iter_fault_in_readable() which calls fault_in_pages_readable() And > fault_in_pages_readable() uses __get_user() so it is a legitimate > access and you really should get a KUAP fault. > > So the problem is somewhere else, I think you proposed patch just > hides the problem, it doesn't fix it. If we do kthread_use_mm(), can we agree that the user access is valid? We should be able to copy to/from user space, and including faults, if that's been done and the new mm assigned. Because it really should be. If SMAP was a problem on x86, we would have seen it long ago. I'm assuming this may be breakage related to the recent uaccess changes related to set_fs and friends? Or maybe recent changes on the powerpc side? Zorro, did 5.10 work?
Excerpts from Jens Axboe's message of January 28, 2021 5:29 am: > On 1/27/21 9:38 AM, Christophe Leroy wrote: >> >> >> Le 27/01/2021 à 15:56, Zorro Lang a écrit : >>> On powerpc, io_uring test hit below KUAP fault on __do_page_fault. >>> The fail source line is: >>> >>> if (unlikely(!is_user && bad_kernel_fault(regs, error_code, address, is_write))) >>> return SIGSEGV; >>> >>> The is_user() is based on user_mod(regs) only. This's not suit for >>> io_uring, where the helper thread can assume the user app identity >>> and could perform this fault just fine. So turn to use mm to decide >>> if this is valid or not. >> >> I don't understand why testing is_user would be an issue. KUAP purpose >> it to block any unallowed access from kernel to user memory >> (Equivalent to SMAP on x86). So it really must be based on MSR_PR bit, >> that is what is_user provides. >> >> If the kernel access is legitimate, kernel should have opened >> userspace access then you shouldn't get this "Bug: Read fault blocked >> by KUAP!". >> >> As far as I understand, the fault occurs in >> iov_iter_fault_in_readable() which calls fault_in_pages_readable() And >> fault_in_pages_readable() uses __get_user() so it is a legitimate >> access and you really should get a KUAP fault. >> >> So the problem is somewhere else, I think you proposed patch just >> hides the problem, it doesn't fix it. > > If we do kthread_use_mm(), can we agree that the user access is valid? Yeah the io uring code is fine, provided it uses the uaccess primitives like any other kernel code. It's looking more like a an arch/powerpc bug. > We should be able to copy to/from user space, and including faults, if > that's been done and the new mm assigned. Because it really should be. > If SMAP was a problem on x86, we would have seen it long ago. > > I'm assuming this may be breakage related to the recent uaccess changes > related to set_fs and friends? Or maybe recent changes on the powerpc > side? > > Zorro, did 5.10 work? Would be interesting to know. Thanks, Nick
On 1/27/21 8:13 PM, Zorro Lang wrote: > On Thu, Jan 28, 2021 at 10:18:07AM +1000, Nicholas Piggin wrote: >> Excerpts from Jens Axboe's message of January 28, 2021 5:29 am: >>> On 1/27/21 9:38 AM, Christophe Leroy wrote: >>>> >>>> >>>> Le 27/01/2021 à 15:56, Zorro Lang a écrit : >>>>> On powerpc, io_uring test hit below KUAP fault on __do_page_fault. >>>>> The fail source line is: >>>>> >>>>> if (unlikely(!is_user && bad_kernel_fault(regs, error_code, address, is_write))) >>>>> return SIGSEGV; >>>>> >>>>> The is_user() is based on user_mod(regs) only. This's not suit for >>>>> io_uring, where the helper thread can assume the user app identity >>>>> and could perform this fault just fine. So turn to use mm to decide >>>>> if this is valid or not. >>>> >>>> I don't understand why testing is_user would be an issue. KUAP purpose >>>> it to block any unallowed access from kernel to user memory >>>> (Equivalent to SMAP on x86). So it really must be based on MSR_PR bit, >>>> that is what is_user provides. >>>> >>>> If the kernel access is legitimate, kernel should have opened >>>> userspace access then you shouldn't get this "Bug: Read fault blocked >>>> by KUAP!". >>>> >>>> As far as I understand, the fault occurs in >>>> iov_iter_fault_in_readable() which calls fault_in_pages_readable() And >>>> fault_in_pages_readable() uses __get_user() so it is a legitimate >>>> access and you really should get a KUAP fault. >>>> >>>> So the problem is somewhere else, I think you proposed patch just >>>> hides the problem, it doesn't fix it. >>> >>> If we do kthread_use_mm(), can we agree that the user access is valid? >> >> Yeah the io uring code is fine, provided it uses the uaccess primitives >> like any other kernel code. It's looking more like a an arch/powerpc bug. >> >>> We should be able to copy to/from user space, and including faults, if >>> that's been done and the new mm assigned. Because it really should be. >>> If SMAP was a problem on x86, we would have seen it long ago. >>> >>> I'm assuming this may be breakage related to the recent uaccess changes >>> related to set_fs and friends? Or maybe recent changes on the powerpc >>> side? >>> >>> Zorro, did 5.10 work? >> >> Would be interesting to know. > > Sure Nick and Jens, which 5.10 rc? version do you want to know ? Or any git > commit(be the HEAD) in 5.10 phase? I forget which versions had what series of this, but 5.10 final - and if that fails, then 5.9 final. IIRC, 5.9 was pre any of these changes, and 5.10 definitely has them.
On Thu, Jan 28, 2021 at 10:18:07AM +1000, Nicholas Piggin wrote: > Excerpts from Jens Axboe's message of January 28, 2021 5:29 am: > > On 1/27/21 9:38 AM, Christophe Leroy wrote: > >> > >> > >> Le 27/01/2021 à 15:56, Zorro Lang a écrit : > >>> On powerpc, io_uring test hit below KUAP fault on __do_page_fault. > >>> The fail source line is: > >>> > >>> if (unlikely(!is_user && bad_kernel_fault(regs, error_code, address, is_write))) > >>> return SIGSEGV; > >>> > >>> The is_user() is based on user_mod(regs) only. This's not suit for > >>> io_uring, where the helper thread can assume the user app identity > >>> and could perform this fault just fine. So turn to use mm to decide > >>> if this is valid or not. > >> > >> I don't understand why testing is_user would be an issue. KUAP purpose > >> it to block any unallowed access from kernel to user memory > >> (Equivalent to SMAP on x86). So it really must be based on MSR_PR bit, > >> that is what is_user provides. > >> > >> If the kernel access is legitimate, kernel should have opened > >> userspace access then you shouldn't get this "Bug: Read fault blocked > >> by KUAP!". > >> > >> As far as I understand, the fault occurs in > >> iov_iter_fault_in_readable() which calls fault_in_pages_readable() And > >> fault_in_pages_readable() uses __get_user() so it is a legitimate > >> access and you really should get a KUAP fault. > >> > >> So the problem is somewhere else, I think you proposed patch just > >> hides the problem, it doesn't fix it. > > > > If we do kthread_use_mm(), can we agree that the user access is valid? > > Yeah the io uring code is fine, provided it uses the uaccess primitives > like any other kernel code. It's looking more like a an arch/powerpc bug. > > > We should be able to copy to/from user space, and including faults, if > > that's been done and the new mm assigned. Because it really should be. > > If SMAP was a problem on x86, we would have seen it long ago. > > > > I'm assuming this may be breakage related to the recent uaccess changes > > related to set_fs and friends? Or maybe recent changes on the powerpc > > side? > > > > Zorro, did 5.10 work? > > Would be interesting to know. Sure Nick and Jens, which 5.10 rc? version do you want to know ? Or any git commit(be the HEAD) in 5.10 phase? Thanks, Zorro > > Thanks, > Nick >
On Wed, Jan 27, 2021 at 08:06:37PM -0700, Jens Axboe wrote: > On 1/27/21 8:13 PM, Zorro Lang wrote: > > On Thu, Jan 28, 2021 at 10:18:07AM +1000, Nicholas Piggin wrote: > >> Excerpts from Jens Axboe's message of January 28, 2021 5:29 am: > >>> On 1/27/21 9:38 AM, Christophe Leroy wrote: > >>>> > >>>> > >>>> Le 27/01/2021 à 15:56, Zorro Lang a écrit : > >>>>> On powerpc, io_uring test hit below KUAP fault on __do_page_fault. > >>>>> The fail source line is: > >>>>> > >>>>> if (unlikely(!is_user && bad_kernel_fault(regs, error_code, address, is_write))) > >>>>> return SIGSEGV; > >>>>> > >>>>> The is_user() is based on user_mod(regs) only. This's not suit for > >>>>> io_uring, where the helper thread can assume the user app identity > >>>>> and could perform this fault just fine. So turn to use mm to decide > >>>>> if this is valid or not. > >>>> > >>>> I don't understand why testing is_user would be an issue. KUAP purpose > >>>> it to block any unallowed access from kernel to user memory > >>>> (Equivalent to SMAP on x86). So it really must be based on MSR_PR bit, > >>>> that is what is_user provides. > >>>> > >>>> If the kernel access is legitimate, kernel should have opened > >>>> userspace access then you shouldn't get this "Bug: Read fault blocked > >>>> by KUAP!". > >>>> > >>>> As far as I understand, the fault occurs in > >>>> iov_iter_fault_in_readable() which calls fault_in_pages_readable() And > >>>> fault_in_pages_readable() uses __get_user() so it is a legitimate > >>>> access and you really should get a KUAP fault. > >>>> > >>>> So the problem is somewhere else, I think you proposed patch just > >>>> hides the problem, it doesn't fix it. > >>> > >>> If we do kthread_use_mm(), can we agree that the user access is valid? > >> > >> Yeah the io uring code is fine, provided it uses the uaccess primitives > >> like any other kernel code. It's looking more like a an arch/powerpc bug. > >> > >>> We should be able to copy to/from user space, and including faults, if > >>> that's been done and the new mm assigned. Because it really should be. > >>> If SMAP was a problem on x86, we would have seen it long ago. > >>> > >>> I'm assuming this may be breakage related to the recent uaccess changes > >>> related to set_fs and friends? Or maybe recent changes on the powerpc > >>> side? > >>> > >>> Zorro, did 5.10 work? > >> > >> Would be interesting to know. > > > > Sure Nick and Jens, which 5.10 rc? version do you want to know ? Or any git > > commit(be the HEAD) in 5.10 phase? > > I forget which versions had what series of this, but 5.10 final - and if > that fails, then 5.9 final. IIRC, 5.9 was pre any of these changes, and > 5.10 definitely has them. I justed built linux v5.10 with same .config file, and gave it same test. v5.10 (HEAD=2c85ebc57b Linux 5.10) can't reproduce this bug: # ./check generic/013 generic/051 FSTYP -- xfs (non-debug) PLATFORM -- Linux/ppc64le ibm-p9z-xxx-xxxx 5.10.0 #3 SMP Thu Jan 28 04:12:14 EST 2021 MKFS_OPTIONS -- -f -m crc=1,finobt=1,reflink=1,rmapbt=1,bigtime=1,inobtcount=1 /dev/sda3 MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/sda3 /mnt/xfstests/scratch generic/013 138s ... 77s generic/051 103s ... 143s Ran: generic/013 generic/051 Passed all 2 tests > > -- > Jens Axboe >
On 1/28/21 6:52 AM, Zorro Lang wrote: > On Wed, Jan 27, 2021 at 08:06:37PM -0700, Jens Axboe wrote: >> On 1/27/21 8:13 PM, Zorro Lang wrote: >>> On Thu, Jan 28, 2021 at 10:18:07AM +1000, Nicholas Piggin wrote: >>>> Excerpts from Jens Axboe's message of January 28, 2021 5:29 am: >>>>> On 1/27/21 9:38 AM, Christophe Leroy wrote: >>>>>> >>>>>> >>>>>> Le 27/01/2021 à 15:56, Zorro Lang a écrit : >>>>>>> On powerpc, io_uring test hit below KUAP fault on __do_page_fault. >>>>>>> The fail source line is: >>>>>>> >>>>>>> if (unlikely(!is_user && bad_kernel_fault(regs, error_code, address, is_write))) >>>>>>> return SIGSEGV; >>>>>>> >>>>>>> The is_user() is based on user_mod(regs) only. This's not suit for >>>>>>> io_uring, where the helper thread can assume the user app identity >>>>>>> and could perform this fault just fine. So turn to use mm to decide >>>>>>> if this is valid or not. >>>>>> >>>>>> I don't understand why testing is_user would be an issue. KUAP purpose >>>>>> it to block any unallowed access from kernel to user memory >>>>>> (Equivalent to SMAP on x86). So it really must be based on MSR_PR bit, >>>>>> that is what is_user provides. >>>>>> >>>>>> If the kernel access is legitimate, kernel should have opened >>>>>> userspace access then you shouldn't get this "Bug: Read fault blocked >>>>>> by KUAP!". >>>>>> >>>>>> As far as I understand, the fault occurs in >>>>>> iov_iter_fault_in_readable() which calls fault_in_pages_readable() And >>>>>> fault_in_pages_readable() uses __get_user() so it is a legitimate >>>>>> access and you really should get a KUAP fault. >>>>>> >>>>>> So the problem is somewhere else, I think you proposed patch just >>>>>> hides the problem, it doesn't fix it. >>>>> >>>>> If we do kthread_use_mm(), can we agree that the user access is valid? >>>> >>>> Yeah the io uring code is fine, provided it uses the uaccess primitives >>>> like any other kernel code. It's looking more like a an arch/powerpc bug. >>>> >>>>> We should be able to copy to/from user space, and including faults, if >>>>> that's been done and the new mm assigned. Because it really should be. >>>>> If SMAP was a problem on x86, we would have seen it long ago. >>>>> >>>>> I'm assuming this may be breakage related to the recent uaccess changes >>>>> related to set_fs and friends? Or maybe recent changes on the powerpc >>>>> side? >>>>> >>>>> Zorro, did 5.10 work? >>>> >>>> Would be interesting to know. >>> >>> Sure Nick and Jens, which 5.10 rc? version do you want to know ? Or any git >>> commit(be the HEAD) in 5.10 phase? >> >> I forget which versions had what series of this, but 5.10 final - and if >> that fails, then 5.9 final. IIRC, 5.9 was pre any of these changes, and >> 5.10 definitely has them. > > I justed built linux v5.10 with same .config file, and gave it same test. > v5.10 (HEAD=2c85ebc57b Linux 5.10) can't reproduce this bug: > > # ./check generic/013 generic/051 > FSTYP -- xfs (non-debug) > PLATFORM -- Linux/ppc64le ibm-p9z-xxx-xxxx 5.10.0 #3 SMP Thu Jan 28 04:12:14 EST 2021 > MKFS_OPTIONS -- -f -m crc=1,finobt=1,reflink=1,rmapbt=1,bigtime=1,inobtcount=1 /dev/sda3 > MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/sda3 /mnt/xfstests/scratch > > generic/013 138s ... 77s > generic/051 103s ... 143s > Ran: generic/013 generic/051 > Passed all 2 tests Thanks for testing that, so I think it's safe to conclude that there's a regression in powerpc fault handling for kthreads that use kthread_use_mm in this release. A bisect would definitely find it, but might be pointless if Christophe or Nick already have an idea of what it is.
Le 28/01/2021 à 15:42, Jens Axboe a écrit : > On 1/28/21 6:52 AM, Zorro Lang wrote: >> On Wed, Jan 27, 2021 at 08:06:37PM -0700, Jens Axboe wrote: >>> On 1/27/21 8:13 PM, Zorro Lang wrote: >>>> On Thu, Jan 28, 2021 at 10:18:07AM +1000, Nicholas Piggin wrote: >>>>> Excerpts from Jens Axboe's message of January 28, 2021 5:29 am: >>>>>> On 1/27/21 9:38 AM, Christophe Leroy wrote: >>>>>>> >>>>>>> >>>>>>> Le 27/01/2021 à 15:56, Zorro Lang a écrit : >>>>>>>> On powerpc, io_uring test hit below KUAP fault on __do_page_fault. >>>>>>>> The fail source line is: >>>>>>>> >>>>>>>> if (unlikely(!is_user && bad_kernel_fault(regs, error_code, address, is_write))) >>>>>>>> return SIGSEGV; >>>>>>>> >>>>>>>> The is_user() is based on user_mod(regs) only. This's not suit for >>>>>>>> io_uring, where the helper thread can assume the user app identity >>>>>>>> and could perform this fault just fine. So turn to use mm to decide >>>>>>>> if this is valid or not. >>>>>>> >>>>>>> I don't understand why testing is_user would be an issue. KUAP purpose >>>>>>> it to block any unallowed access from kernel to user memory >>>>>>> (Equivalent to SMAP on x86). So it really must be based on MSR_PR bit, >>>>>>> that is what is_user provides. >>>>>>> >>>>>>> If the kernel access is legitimate, kernel should have opened >>>>>>> userspace access then you shouldn't get this "Bug: Read fault blocked >>>>>>> by KUAP!". >>>>>>> >>>>>>> As far as I understand, the fault occurs in >>>>>>> iov_iter_fault_in_readable() which calls fault_in_pages_readable() And >>>>>>> fault_in_pages_readable() uses __get_user() so it is a legitimate >>>>>>> access and you really should get a KUAP fault. >>>>>>> >>>>>>> So the problem is somewhere else, I think you proposed patch just >>>>>>> hides the problem, it doesn't fix it. >>>>>> >>>>>> If we do kthread_use_mm(), can we agree that the user access is valid? >>>>> >>>>> Yeah the io uring code is fine, provided it uses the uaccess primitives >>>>> like any other kernel code. It's looking more like a an arch/powerpc bug. >>>>> >>>>>> We should be able to copy to/from user space, and including faults, if >>>>>> that's been done and the new mm assigned. Because it really should be. >>>>>> If SMAP was a problem on x86, we would have seen it long ago. >>>>>> >>>>>> I'm assuming this may be breakage related to the recent uaccess changes >>>>>> related to set_fs and friends? Or maybe recent changes on the powerpc >>>>>> side? >>>>>> >>>>>> Zorro, did 5.10 work? >>>>> >>>>> Would be interesting to know. >>>> >>>> Sure Nick and Jens, which 5.10 rc? version do you want to know ? Or any git >>>> commit(be the HEAD) in 5.10 phase? >>> >>> I forget which versions had what series of this, but 5.10 final - and if >>> that fails, then 5.9 final. IIRC, 5.9 was pre any of these changes, and >>> 5.10 definitely has them. >> >> I justed built linux v5.10 with same .config file, and gave it same test. >> v5.10 (HEAD=2c85ebc57b Linux 5.10) can't reproduce this bug: >> >> # ./check generic/013 generic/051 >> FSTYP -- xfs (non-debug) >> PLATFORM -- Linux/ppc64le ibm-p9z-xxx-xxxx 5.10.0 #3 SMP Thu Jan 28 04:12:14 EST 2021 >> MKFS_OPTIONS -- -f -m crc=1,finobt=1,reflink=1,rmapbt=1,bigtime=1,inobtcount=1 /dev/sda3 >> MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/sda3 /mnt/xfstests/scratch >> >> generic/013 138s ... 77s >> generic/051 103s ... 143s >> Ran: generic/013 generic/051 >> Passed all 2 tests > > Thanks for testing that, so I think it's safe to conclude that there's a > regression in powerpc fault handling for kthreads that use > kthread_use_mm in this release. A bisect would definitely find it, but > might be pointless if Christophe or Nick already have an idea of what it > is. > I don't have any idea yet, but I'd be curious to see the vmlinux binary matching the reported Oops. Christophe
On Thu, Jan 28, 2021 at 03:44:21PM +0100, Christophe Leroy wrote: > > > Le 28/01/2021 à 15:42, Jens Axboe a écrit : > > On 1/28/21 6:52 AM, Zorro Lang wrote: > > > On Wed, Jan 27, 2021 at 08:06:37PM -0700, Jens Axboe wrote: > > > > On 1/27/21 8:13 PM, Zorro Lang wrote: > > > > > On Thu, Jan 28, 2021 at 10:18:07AM +1000, Nicholas Piggin wrote: > > > > > > Excerpts from Jens Axboe's message of January 28, 2021 5:29 am: > > > > > > > On 1/27/21 9:38 AM, Christophe Leroy wrote: > > > > > > > > > > > > > > > > > > > > > > > > Le 27/01/2021 à 15:56, Zorro Lang a écrit : > > > > > > > > > On powerpc, io_uring test hit below KUAP fault on __do_page_fault. > > > > > > > > > The fail source line is: > > > > > > > > > > > > > > > > > > if (unlikely(!is_user && bad_kernel_fault(regs, error_code, address, is_write))) > > > > > > > > > return SIGSEGV; > > > > > > > > > > > > > > > > > > The is_user() is based on user_mod(regs) only. This's not suit for > > > > > > > > > io_uring, where the helper thread can assume the user app identity > > > > > > > > > and could perform this fault just fine. So turn to use mm to decide > > > > > > > > > if this is valid or not. > > > > > > > > > > > > > > > > I don't understand why testing is_user would be an issue. KUAP purpose > > > > > > > > it to block any unallowed access from kernel to user memory > > > > > > > > (Equivalent to SMAP on x86). So it really must be based on MSR_PR bit, > > > > > > > > that is what is_user provides. > > > > > > > > > > > > > > > > If the kernel access is legitimate, kernel should have opened > > > > > > > > userspace access then you shouldn't get this "Bug: Read fault blocked > > > > > > > > by KUAP!". > > > > > > > > > > > > > > > > As far as I understand, the fault occurs in > > > > > > > > iov_iter_fault_in_readable() which calls fault_in_pages_readable() And > > > > > > > > fault_in_pages_readable() uses __get_user() so it is a legitimate > > > > > > > > access and you really should get a KUAP fault. > > > > > > > > > > > > > > > > So the problem is somewhere else, I think you proposed patch just > > > > > > > > hides the problem, it doesn't fix it. > > > > > > > > > > > > > > If we do kthread_use_mm(), can we agree that the user access is valid? > > > > > > > > > > > > Yeah the io uring code is fine, provided it uses the uaccess primitives > > > > > > like any other kernel code. It's looking more like a an arch/powerpc bug. > > > > > > > > > > > > > We should be able to copy to/from user space, and including faults, if > > > > > > > that's been done and the new mm assigned. Because it really should be. > > > > > > > If SMAP was a problem on x86, we would have seen it long ago. > > > > > > > > > > > > > > I'm assuming this may be breakage related to the recent uaccess changes > > > > > > > related to set_fs and friends? Or maybe recent changes on the powerpc > > > > > > > side? > > > > > > > > > > > > > > Zorro, did 5.10 work? > > > > > > > > > > > > Would be interesting to know. > > > > > > > > > > Sure Nick and Jens, which 5.10 rc? version do you want to know ? Or any git > > > > > commit(be the HEAD) in 5.10 phase? > > > > > > > > I forget which versions had what series of this, but 5.10 final - and if > > > > that fails, then 5.9 final. IIRC, 5.9 was pre any of these changes, and > > > > 5.10 definitely has them. > > > > > > I justed built linux v5.10 with same .config file, and gave it same test. > > > v5.10 (HEAD=2c85ebc57b Linux 5.10) can't reproduce this bug: > > > > > > # ./check generic/013 generic/051 > > > FSTYP -- xfs (non-debug) > > > PLATFORM -- Linux/ppc64le ibm-p9z-xxx-xxxx 5.10.0 #3 SMP Thu Jan 28 04:12:14 EST 2021 > > > MKFS_OPTIONS -- -f -m crc=1,finobt=1,reflink=1,rmapbt=1,bigtime=1,inobtcount=1 /dev/sda3 > > > MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/sda3 /mnt/xfstests/scratch > > > > > > generic/013 138s ... 77s > > > generic/051 103s ... 143s > > > Ran: generic/013 generic/051 > > > Passed all 2 tests > > > > Thanks for testing that, so I think it's safe to conclude that there's a > > regression in powerpc fault handling for kthreads that use > > kthread_use_mm in this release. A bisect would definitely find it, but > > might be pointless if Christophe or Nick already have an idea of what it > > is. > > > > I don't have any idea yet, but I'd be curious to see the vmlinux binary matching the reported Oops. OK, I don't have the vmlinux matching that bug report now, I can help to prepare a new one, but I need lots of time (about 10+ hours). Thanks, Zorro > > Christophe >
On Thu, Jan 28, 2021 at 03:44:21PM +0100, Christophe Leroy wrote: > > > Le 28/01/2021 à 15:42, Jens Axboe a écrit : > > On 1/28/21 6:52 AM, Zorro Lang wrote: > > > On Wed, Jan 27, 2021 at 08:06:37PM -0700, Jens Axboe wrote: > > > > On 1/27/21 8:13 PM, Zorro Lang wrote: > > > > > On Thu, Jan 28, 2021 at 10:18:07AM +1000, Nicholas Piggin wrote: > > > > > > Excerpts from Jens Axboe's message of January 28, 2021 5:29 am: > > > > > > > On 1/27/21 9:38 AM, Christophe Leroy wrote: > > > > > > > > > > > > > > > > > > > > > > > > Le 27/01/2021 à 15:56, Zorro Lang a écrit : > > > > > > > > > On powerpc, io_uring test hit below KUAP fault on __do_page_fault. > > > > > > > > > The fail source line is: > > > > > > > > > > > > > > > > > > if (unlikely(!is_user && bad_kernel_fault(regs, error_code, address, is_write))) > > > > > > > > > return SIGSEGV; > > > > > > > > > > > > > > > > > > The is_user() is based on user_mod(regs) only. This's not suit for > > > > > > > > > io_uring, where the helper thread can assume the user app identity > > > > > > > > > and could perform this fault just fine. So turn to use mm to decide > > > > > > > > > if this is valid or not. > > > > > > > > > > > > > > > > I don't understand why testing is_user would be an issue. KUAP purpose > > > > > > > > it to block any unallowed access from kernel to user memory > > > > > > > > (Equivalent to SMAP on x86). So it really must be based on MSR_PR bit, > > > > > > > > that is what is_user provides. > > > > > > > > > > > > > > > > If the kernel access is legitimate, kernel should have opened > > > > > > > > userspace access then you shouldn't get this "Bug: Read fault blocked > > > > > > > > by KUAP!". > > > > > > > > > > > > > > > > As far as I understand, the fault occurs in > > > > > > > > iov_iter_fault_in_readable() which calls fault_in_pages_readable() And > > > > > > > > fault_in_pages_readable() uses __get_user() so it is a legitimate > > > > > > > > access and you really should get a KUAP fault. > > > > > > > > > > > > > > > > So the problem is somewhere else, I think you proposed patch just > > > > > > > > hides the problem, it doesn't fix it. > > > > > > > > > > > > > > If we do kthread_use_mm(), can we agree that the user access is valid? > > > > > > > > > > > > Yeah the io uring code is fine, provided it uses the uaccess primitives > > > > > > like any other kernel code. It's looking more like a an arch/powerpc bug. > > > > > > > > > > > > > We should be able to copy to/from user space, and including faults, if > > > > > > > that's been done and the new mm assigned. Because it really should be. > > > > > > > If SMAP was a problem on x86, we would have seen it long ago. > > > > > > > > > > > > > > I'm assuming this may be breakage related to the recent uaccess changes > > > > > > > related to set_fs and friends? Or maybe recent changes on the powerpc > > > > > > > side? > > > > > > > > > > > > > > Zorro, did 5.10 work? > > > > > > > > > > > > Would be interesting to know. > > > > > > > > > > Sure Nick and Jens, which 5.10 rc? version do you want to know ? Or any git > > > > > commit(be the HEAD) in 5.10 phase? > > > > > > > > I forget which versions had what series of this, but 5.10 final - and if > > > > that fails, then 5.9 final. IIRC, 5.9 was pre any of these changes, and > > > > 5.10 definitely has them. > > > > > > I justed built linux v5.10 with same .config file, and gave it same test. > > > v5.10 (HEAD=2c85ebc57b Linux 5.10) can't reproduce this bug: > > > > > > # ./check generic/013 generic/051 > > > FSTYP -- xfs (non-debug) > > > PLATFORM -- Linux/ppc64le ibm-p9z-xxx-xxxx 5.10.0 #3 SMP Thu Jan 28 04:12:14 EST 2021 > > > MKFS_OPTIONS -- -f -m crc=1,finobt=1,reflink=1,rmapbt=1,bigtime=1,inobtcount=1 /dev/sda3 > > > MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/sda3 /mnt/xfstests/scratch > > > > > > generic/013 138s ... 77s > > > generic/051 103s ... 143s > > > Ran: generic/013 generic/051 > > > Passed all 2 tests > > > > Thanks for testing that, so I think it's safe to conclude that there's a > > regression in powerpc fault handling for kthreads that use > > kthread_use_mm in this release. A bisect would definitely find it, but > > might be pointless if Christophe or Nick already have an idea of what it > > is. > > > > I don't have any idea yet, but I'd be curious to see the vmlinux binary matching the reported Oops. I just upload the vmlinux and .config file, the vmlinux it too big, I have to upload it to my google store and share the link as below: config file: https://drive.google.com/file/d/1pMszboxdjbMPqSNXnMH-1UCZC-vtDnI9/view?usp=sharing vmlinux: https://drive.google.com/file/d/1s7g2eBPAFFV61aM2dO9bvVTERGQ9mLYk/view?usp=sharing I used latest upstream mainline linux, HEAD commit is: 76c057c84d (HEAD -> master, origin/master, origin/HEAD) Merge branch 'parisc-5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux The test failed on this kernel as: # dmesg [ 96.200296] ------------[ cut here ]------------ [ 96.200304] Bug: Read fault blocked by KUAP! [ 96.200309] WARNING: CPU: 3 PID: 1876 at arch/powerpc/mm/fault.c:229 bad_kernel_fault+0x180/0x310 [ 96.200323] Modules linked in: bonding rfkill sunrpc pseries_rng uio_pdrv_genirq uio drm fuse drm_panel_orientation_quirks ip_tables xfs libcrc32c sd_mod t10_pi xts ibmvscsi ibmveth scsi_transport_srp vmx_crypto [ 96.200372] CPU: 3 PID: 1876 Comm: io_wqe_worker-0 Tainted: G W 5.11.0-rc5+ #5 [ 96.200380] NIP: c00000000008f8a0 LR: c00000000008f89c CTR: 0000000000000000 [ 96.200386] REGS: c00000000d3aafd0 TRAP: 0700 Tainted: G W (5.11.0-rc5+) [ 96.200393] MSR: 8000000000021033 <SF,ME,IR,DR,RI,LE> CR: 48082204 XER: 00000005 [ 96.200416] CFAR: c00000000015ddac IRQMASK: 1 GPR00: c00000000008f89c c00000000d3ab270 c000000002116900 0000000000000020 GPR04: c000000001bec250 0000000000000001 00000001fbb80000 0000000000000027 GPR08: 0000000000000001 0000000000000000 c000000020fbba00 0000000000000001 GPR12: 0000000000002000 c00000001ecaae00 c00000000019dae8 c000000008d48040 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: c0000000012d9650 fcffffffffffffff c000000002164018 c000000001262da0 GPR24: 0000000000000000 0000000000000000 0000000000000300 c000000010c27b80 GPR28: 0000000000200000 0000000000000000 0000010007ffce60 c00000000d3ab470 [ 96.200521] NIP [c00000000008f8a0] bad_kernel_fault+0x180/0x310 [ 96.200528] LR [c00000000008f89c] bad_kernel_fault+0x17c/0x310 [ 96.200535] Call Trace: [ 96.200539] [c00000000d3ab270] [c00000000008f89c] bad_kernel_fault+0x17c/0x310 (unreliable) [ 96.200551] [c00000000d3ab2f0] [c000000000090494] __do_page_fault+0x5f4/0x900 [ 96.200561] [c00000000d3ab3b0] [c0000000000907dc] do_page_fault+0x3c/0x120 [ 96.200570] [c00000000d3ab400] [c00000000000c748] handle_page_fault+0x10/0x2c [ 96.200579] --- interrupt: 300 at fault_in_pages_readable+0x104/0x350 [ 96.200579] --- interrupt: 300 at fault_in_pages_readable+0x104/0x350 [ 96.200586] NIP: c000000000849424 LR: c00000000084952c CTR: c0000000006984a0 [ 96.200592] REGS: c00000000d3ab470 TRAP: 0300 Tainted: G W (5.11.0-rc5+) [ 96.200598] MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 44082804 XER: 00000001 [ 96.200628] CFAR: c000000000010330 DAR: 0000010007ffce60 DSISR: 00200000 IRQMASK: 0 GPR00: c00000000084952c c00000000d3ab710 c000000002116900 0000000000000000 GPR04: c000000010c27ce0 0000000000000001 00000001fbb80000 0000000000010000 GPR08: 00000000271cd0a4 0000000000000200 0000000000000200 0000000000000000 GPR12: 0000000000002000 c00000001ecaae00 c00000000019dae8 c000000008d48040 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: c0000000012d9650 fcffffffffffffff c000000002164018 c000000001262da0 GPR24: c000000020fbba00 bfffffffffffffff 0000000000000000 a8aaaaaaaaaaaaaa GPR28: 0000010008005d71 c00000000bec3e00 0000000000000000 0000010007ffce60 [ 96.200734] NIP [c000000000849424] fault_in_pages_readable+0x104/0x350 [ 96.200741] LR [c00000000084952c] fault_in_pages_readable+0x20c/0x350 [ 96.200747] --- interrupt: 300 [ 96.200752] [c00000000d3ab7a0] [c0000000008496d0] iov_iter_fault_in_readable+0x60/0x120 [ 96.200761] [c00000000d3ab7e0] [c000000000698558] iomap_write_actor+0xb8/0x270 [ 96.200771] [c00000000d3ab890] [c000000000693554] iomap_apply+0x2b4/0x740 [ 96.200780] [c00000000d3ab9a0] [c000000000693dc0] iomap_file_buffered_write+0xa0/0x120 [ 96.200790] [c00000000d3ab9f0] [c008000001d3efec] xfs_file_buffered_aio_write+0x354/0x590 [xfs] [ 96.200870] [c00000000d3aba90] [c0000000006691e4] io_write+0x104/0x4b0 [ 96.200884] [c00000000d3abbb0] [c00000000066be54] io_issue_sqe+0x3d4/0xf50 [ 96.200897] [c00000000d3abc60] [c00000000066f250] io_wq_submit_work+0xb0/0x2f0 [ 96.200911] [c00000000d3abcb0] [c0000000006738a8] io_worker_handle_work+0x248/0x4a0 [ 96.200925] [c00000000d3abd30] [c000000000673d28] io_wqe_worker+0x228/0x2a0 [ 96.200939] [c00000000d3abda0] [c00000000019dc94] kthread+0x1b4/0x1c0 [ 96.200950] [c00000000d3abe10] [c00000000000daf0] ret_from_kernel_thread+0x5c/0x6c [ 96.200959] Instruction dump: [ 96.200965] e87f0100 4810b155 60000000 2c230000 4182ffa8 409200ac 3c82ff15 38847e38 [ 96.200987] 3c62ff15 38637ed0 480ce4ad 60000000 <0fe00000> e8010090 38210080 38600001 [ 96.201008] irq event stamp: 46 [ 96.201013] hardirqs last enabled at (45): [<c0000000005428c4>] __slab_free+0x414/0x610 [ 96.201021] hardirqs last disabled at (46): [<c000000000008a04>] data_access_common_virt+0x1a4/0x1c0 [ 96.201030] softirqs last enabled at (0): [<c00000000015ae68>] copy_process+0x688/0x1600 [ 96.201038] softirqs last disabled at (0): [<0000000000000000>] 0x0 [ 96.201045] ---[ end trace c2373fad985a304b ]--- # ./scripts/faddr2line vmlinux bad_kernel_fault+0x180/0x310 bad_kernel_fault+0x180/0x310: bad_kernel_fault at arch/powerpc/mm/fault.c:229 (discriminator 6) 217 // Read/write fault blocked by KUAP is bad, it can never succeed. 218 if (bad_kuap_fault(regs, address, is_write)) { 219 pr_crit_ratelimited("Kernel attempted to %s user page (%lx) - exploit attempt? (uid: %d)\n", 220 is_write ? "write" : "read", address, 221 from_kuid(&init_user_ns, current_uid())); 222 223 // Fault on user outside of certain regions (eg. copy_tofrom_user()) is bad 224 if (!search_exception_tables(regs->nip)) 225 return true; 226 227 // Read/write fault in a valid region (the exception table search passed 228 // above), but blocked by KUAP is bad, it can never succeed. 229 return WARN(true, "Bug: %s fault blocked by KUAP!", is_write ? "Write" : "Read"); Thanks, Zorro > > Christophe >
+Aneesh Le 29/01/2021 à 07:52, Zorro Lang a écrit : > On Thu, Jan 28, 2021 at 03:44:21PM +0100, Christophe Leroy wrote: >> >> >> Le 28/01/2021 à 15:42, Jens Axboe a écrit : >>> On 1/28/21 6:52 AM, Zorro Lang wrote: >>>> On Wed, Jan 27, 2021 at 08:06:37PM -0700, Jens Axboe wrote: >>>>> On 1/27/21 8:13 PM, Zorro Lang wrote: >>>>>> On Thu, Jan 28, 2021 at 10:18:07AM +1000, Nicholas Piggin wrote: >>>>>>> Excerpts from Jens Axboe's message of January 28, 2021 5:29 am: >>>>>>>> On 1/27/21 9:38 AM, Christophe Leroy wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> Le 27/01/2021 à 15:56, Zorro Lang a écrit : >>>>>>>>>> On powerpc, io_uring test hit below KUAP fault on __do_page_fault. >>>>>>>>>> The fail source line is: >>>>>>>>>> >>>>>>>>>> if (unlikely(!is_user && bad_kernel_fault(regs, error_code, address, is_write))) >>>>>>>>>> return SIGSEGV; >>>>>>>>>> >>>>>>>>>> The is_user() is based on user_mod(regs) only. This's not suit for >>>>>>>>>> io_uring, where the helper thread can assume the user app identity >>>>>>>>>> and could perform this fault just fine. So turn to use mm to decide >>>>>>>>>> if this is valid or not. >>>>>>>>> >>>>>>>>> I don't understand why testing is_user would be an issue. KUAP purpose >>>>>>>>> it to block any unallowed access from kernel to user memory >>>>>>>>> (Equivalent to SMAP on x86). So it really must be based on MSR_PR bit, >>>>>>>>> that is what is_user provides. >>>>>>>>> >>>>>>>>> If the kernel access is legitimate, kernel should have opened >>>>>>>>> userspace access then you shouldn't get this "Bug: Read fault blocked >>>>>>>>> by KUAP!". >>>>>>>>> >>>>>>>>> As far as I understand, the fault occurs in >>>>>>>>> iov_iter_fault_in_readable() which calls fault_in_pages_readable() And >>>>>>>>> fault_in_pages_readable() uses __get_user() so it is a legitimate >>>>>>>>> access and you really should get a KUAP fault. >>>>>>>>> >>>>>>>>> So the problem is somewhere else, I think you proposed patch just >>>>>>>>> hides the problem, it doesn't fix it. >>>>>>>> >>>>>>>> If we do kthread_use_mm(), can we agree that the user access is valid? >>>>>>> >>>>>>> Yeah the io uring code is fine, provided it uses the uaccess primitives >>>>>>> like any other kernel code. It's looking more like a an arch/powerpc bug. >>>>>>> >>>>>>>> We should be able to copy to/from user space, and including faults, if >>>>>>>> that's been done and the new mm assigned. Because it really should be. >>>>>>>> If SMAP was a problem on x86, we would have seen it long ago. >>>>>>>> >>>>>>>> I'm assuming this may be breakage related to the recent uaccess changes >>>>>>>> related to set_fs and friends? Or maybe recent changes on the powerpc >>>>>>>> side? >>>>>>>> >>>>>>>> Zorro, did 5.10 work? >>>>>>> >>>>>>> Would be interesting to know. >>>>>> >>>>>> Sure Nick and Jens, which 5.10 rc? version do you want to know ? Or any git >>>>>> commit(be the HEAD) in 5.10 phase? >>>>> >>>>> I forget which versions had what series of this, but 5.10 final - and if >>>>> that fails, then 5.9 final. IIRC, 5.9 was pre any of these changes, and >>>>> 5.10 definitely has them. >>>> >>>> I justed built linux v5.10 with same .config file, and gave it same test. >>>> v5.10 (HEAD=2c85ebc57b Linux 5.10) can't reproduce this bug: >>>> >>>> # ./check generic/013 generic/051 >>>> FSTYP -- xfs (non-debug) >>>> PLATFORM -- Linux/ppc64le ibm-p9z-xxx-xxxx 5.10.0 #3 SMP Thu Jan 28 04:12:14 EST 2021 >>>> MKFS_OPTIONS -- -f -m crc=1,finobt=1,reflink=1,rmapbt=1,bigtime=1,inobtcount=1 /dev/sda3 >>>> MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/sda3 /mnt/xfstests/scratch >>>> >>>> generic/013 138s ... 77s >>>> generic/051 103s ... 143s >>>> Ran: generic/013 generic/051 >>>> Passed all 2 tests >>> >>> Thanks for testing that, so I think it's safe to conclude that there's a >>> regression in powerpc fault handling for kthreads that use >>> kthread_use_mm in this release. A bisect would definitely find it, but >>> might be pointless if Christophe or Nick already have an idea of what it >>> is. >>> >> >> I don't have any idea yet, but I'd be curious to see the vmlinux binary matching the reported Oops. > > I just upload the vmlinux and .config file, the vmlinux it too big, I have to > upload it to my google store and share the link as below: > > config file: https://drive.google.com/file/d/1pMszboxdjbMPqSNXnMH-1UCZC-vtDnI9/view?usp=sharing > vmlinux: https://drive.google.com/file/d/1s7g2eBPAFFV61aM2dO9bvVTERGQ9mLYk/view?usp=sharing > > I used latest upstream mainline linux, HEAD commit is: > 76c057c84d (HEAD -> master, origin/master, origin/HEAD) Merge branch 'parisc-5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux > > The test failed on this kernel as: > > # dmesg > [ 96.200296] ------------[ cut here ]------------ > [ 96.200304] Bug: Read fault blocked by KUAP! > [ 96.200309] WARNING: CPU: 3 PID: 1876 at arch/powerpc/mm/fault.c:229 bad_kernel_fault+0x180/0x310 > [ 96.200734] NIP [c000000000849424] fault_in_pages_readable+0x104/0x350 > [ 96.200741] LR [c00000000084952c] fault_in_pages_readable+0x20c/0x350 > [ 96.200747] --- interrupt: 300 Problem happens in a section where userspace access is supposed to be granted, so the patch you proposed is definitely not the right fix. c000000000849408: 2c 01 00 4c isync c00000000084940c: a6 03 3d 7d mtspr 29,r9 <== granting userspace access permission c000000000849410: 2c 01 00 4c isync c000000000849414: 00 00 36 e9 ld r9,0(r22) c000000000849418: 20 00 29 81 lwz r9,32(r9) c00000000084941c: 00 02 29 71 andi. r9,r9,512 c000000000849420: 78 d3 5e 7f mr r30,r26 ==> c000000000849424: 00 00 bf 8b lbz r29,0(r31) <== accessing userspace c000000000849428: 10 00 82 41 beq c000000000849438 <fault_in_pages_readable+0x118> c00000000084942c: 2c 01 00 4c isync c000000000849430: a6 03 bd 7e mtspr 29,r21 <== clearing userspace access permission c000000000849434: 2c 01 00 4c isync My first guess is that the problem is linked to the following function, see the comment /* * For kernel thread that doesn't have thread.regs return * default AMR/IAMR values. */ static inline u64 current_thread_amr(void) { if (current->thread.regs) return current->thread.regs->amr; return AMR_KUAP_BLOCKED; } Above function was introduced by commit 48a8ab4eeb82 ("powerpc/book3s64/pkeys: Don't update SPRN_AMR when in kernel mode") Aneesh, any idea ? Christophe
Christophe Leroy <christophe.leroy@csgroup.eu> writes: > +Aneesh > > Le 29/01/2021 à 07:52, Zorro Lang a écrit : .. >> [ 96.200296] ------------[ cut here ]------------ >> [ 96.200304] Bug: Read fault blocked by KUAP! >> [ 96.200309] WARNING: CPU: 3 PID: 1876 at arch/powerpc/mm/fault.c:229 bad_kernel_fault+0x180/0x310 > >> [ 96.200734] NIP [c000000000849424] fault_in_pages_readable+0x104/0x350 >> [ 96.200741] LR [c00000000084952c] fault_in_pages_readable+0x20c/0x350 >> [ 96.200747] --- interrupt: 300 > > > Problem happens in a section where userspace access is supposed to be granted, so the patch you > proposed is definitely not the right fix. > > c000000000849408: 2c 01 00 4c isync > c00000000084940c: a6 03 3d 7d mtspr 29,r9 <== granting userspace access permission > c000000000849410: 2c 01 00 4c isync > c000000000849414: 00 00 36 e9 ld r9,0(r22) > c000000000849418: 20 00 29 81 lwz r9,32(r9) > c00000000084941c: 00 02 29 71 andi. r9,r9,512 > c000000000849420: 78 d3 5e 7f mr r30,r26 > ==> c000000000849424: 00 00 bf 8b lbz r29,0(r31) <== accessing userspace > c000000000849428: 10 00 82 41 beq c000000000849438 <fault_in_pages_readable+0x118> > c00000000084942c: 2c 01 00 4c isync > c000000000849430: a6 03 bd 7e mtspr 29,r21 <== clearing userspace access permission > c000000000849434: 2c 01 00 4c isync > > My first guess is that the problem is linked to the following function, see the comment > > /* > * For kernel thread that doesn't have thread.regs return > * default AMR/IAMR values. > */ > static inline u64 current_thread_amr(void) > { > if (current->thread.regs) > return current->thread.regs->amr; > return AMR_KUAP_BLOCKED; > } > > Above function was introduced by commit 48a8ab4eeb82 ("powerpc/book3s64/pkeys: Don't update SPRN_AMR > when in kernel mode") Yeah that's a bit of a curly one. At some point io_uring did kthread_use_mm(), which is supposed to mean the kthread can operate on behalf of the original process that submitted the IO. But because KUAP is implemented using memory protection keys, it depends on the value of the AMR register, which is not part of the mm, it's in thread.regs->amr. And what's worse by the time we're in kthread_use_mm() we no longer have access to the thread.regs->amr of the original process that submitted the IO. We also can't simply move the AMR into the mm, precisely because it's per thread, not per mm. So TBH I don't know how we're going to fix this. I guess we could return AMR=unblocked for kernel threads, but that's arguably a bug because it allows a process to circumvent memory keys by asking the kernel to do the access. cheers
Excerpts from Michael Ellerman's message of January 30, 2021 9:22 pm: > Christophe Leroy <christophe.leroy@csgroup.eu> writes: >> +Aneesh >> >> Le 29/01/2021 à 07:52, Zorro Lang a écrit : > .. >>> [ 96.200296] ------------[ cut here ]------------ >>> [ 96.200304] Bug: Read fault blocked by KUAP! >>> [ 96.200309] WARNING: CPU: 3 PID: 1876 at arch/powerpc/mm/fault.c:229 bad_kernel_fault+0x180/0x310 >> >>> [ 96.200734] NIP [c000000000849424] fault_in_pages_readable+0x104/0x350 >>> [ 96.200741] LR [c00000000084952c] fault_in_pages_readable+0x20c/0x350 >>> [ 96.200747] --- interrupt: 300 >> >> >> Problem happens in a section where userspace access is supposed to be granted, so the patch you >> proposed is definitely not the right fix. >> >> c000000000849408: 2c 01 00 4c isync >> c00000000084940c: a6 03 3d 7d mtspr 29,r9 <== granting userspace access permission >> c000000000849410: 2c 01 00 4c isync >> c000000000849414: 00 00 36 e9 ld r9,0(r22) >> c000000000849418: 20 00 29 81 lwz r9,32(r9) >> c00000000084941c: 00 02 29 71 andi. r9,r9,512 >> c000000000849420: 78 d3 5e 7f mr r30,r26 >> ==> c000000000849424: 00 00 bf 8b lbz r29,0(r31) <== accessing userspace >> c000000000849428: 10 00 82 41 beq c000000000849438 <fault_in_pages_readable+0x118> >> c00000000084942c: 2c 01 00 4c isync >> c000000000849430: a6 03 bd 7e mtspr 29,r21 <== clearing userspace access permission >> c000000000849434: 2c 01 00 4c isync >> >> My first guess is that the problem is linked to the following function, see the comment >> >> /* >> * For kernel thread that doesn't have thread.regs return >> * default AMR/IAMR values. >> */ >> static inline u64 current_thread_amr(void) >> { >> if (current->thread.regs) >> return current->thread.regs->amr; >> return AMR_KUAP_BLOCKED; >> } >> >> Above function was introduced by commit 48a8ab4eeb82 ("powerpc/book3s64/pkeys: Don't update SPRN_AMR >> when in kernel mode") > > Yeah that's a bit of a curly one. > > At some point io_uring did kthread_use_mm(), which is supposed to mean > the kthread can operate on behalf of the original process that submitted > the IO. > > But because KUAP is implemented using memory protection keys, it depends > on the value of the AMR register, which is not part of the mm, it's in > thread.regs->amr. > > And what's worse by the time we're in kthread_use_mm() we no longer have > access to the thread.regs->amr of the original process that submitted > the IO. > > We also can't simply move the AMR into the mm, precisely because it's > per thread, not per mm. > > So TBH I don't know how we're going to fix this. > > I guess we could return AMR=unblocked for kernel threads, but that's > arguably a bug because it allows a process to circumvent memory keys by > asking the kernel to do the access. We shouldn't need to inherit AMR should we? We only need it to be locked for kernel threads until it's explicitly unlocked -- nothing mm specific there. I think current_thread_amr could return 0 for kernel threads? Or I would even avoid using that function for allow_user_access and open code the kthread case and remove it from current_thread_amr(). Thanks, Nick
Nicholas Piggin <npiggin@gmail.com> writes: > Excerpts from Michael Ellerman's message of January 30, 2021 9:22 pm: >> Christophe Leroy <christophe.leroy@csgroup.eu> writes: >>> +Aneesh >>> >>> Le 29/01/2021 à 07:52, Zorro Lang a écrit : >> .. >>>> [ 96.200296] ------------[ cut here ]------------ >>>> [ 96.200304] Bug: Read fault blocked by KUAP! >>>> [ 96.200309] WARNING: CPU: 3 PID: 1876 at arch/powerpc/mm/fault.c:229 bad_kernel_fault+0x180/0x310 >>> >>>> [ 96.200734] NIP [c000000000849424] fault_in_pages_readable+0x104/0x350 >>>> [ 96.200741] LR [c00000000084952c] fault_in_pages_readable+0x20c/0x350 >>>> [ 96.200747] --- interrupt: 300 >>> >>> >>> Problem happens in a section where userspace access is supposed to be granted, so the patch you >>> proposed is definitely not the right fix. >>> >>> c000000000849408: 2c 01 00 4c isync >>> c00000000084940c: a6 03 3d 7d mtspr 29,r9 <== granting userspace access permission >>> c000000000849410: 2c 01 00 4c isync >>> c000000000849414: 00 00 36 e9 ld r9,0(r22) >>> c000000000849418: 20 00 29 81 lwz r9,32(r9) >>> c00000000084941c: 00 02 29 71 andi. r9,r9,512 >>> c000000000849420: 78 d3 5e 7f mr r30,r26 >>> ==> c000000000849424: 00 00 bf 8b lbz r29,0(r31) <== accessing userspace >>> c000000000849428: 10 00 82 41 beq c000000000849438 <fault_in_pages_readable+0x118> >>> c00000000084942c: 2c 01 00 4c isync >>> c000000000849430: a6 03 bd 7e mtspr 29,r21 <== clearing userspace access permission >>> c000000000849434: 2c 01 00 4c isync >>> >>> My first guess is that the problem is linked to the following function, see the comment >>> >>> /* >>> * For kernel thread that doesn't have thread.regs return >>> * default AMR/IAMR values. >>> */ >>> static inline u64 current_thread_amr(void) >>> { >>> if (current->thread.regs) >>> return current->thread.regs->amr; >>> return AMR_KUAP_BLOCKED; >>> } >>> >>> Above function was introduced by commit 48a8ab4eeb82 ("powerpc/book3s64/pkeys: Don't update SPRN_AMR >>> when in kernel mode") >> >> Yeah that's a bit of a curly one. >> >> At some point io_uring did kthread_use_mm(), which is supposed to mean >> the kthread can operate on behalf of the original process that submitted >> the IO. >> >> But because KUAP is implemented using memory protection keys, it depends >> on the value of the AMR register, which is not part of the mm, it's in >> thread.regs->amr. >> >> And what's worse by the time we're in kthread_use_mm() we no longer have >> access to the thread.regs->amr of the original process that submitted >> the IO. >> >> We also can't simply move the AMR into the mm, precisely because it's >> per thread, not per mm. >> >> So TBH I don't know how we're going to fix this. >> >> I guess we could return AMR=unblocked for kernel threads, but that's >> arguably a bug because it allows a process to circumvent memory keys by >> asking the kernel to do the access. > > We shouldn't need to inherit AMR should we? We only need it to be locked > for kernel threads until it's explicitly unlocked -- nothing mm specific > there. I think current_thread_amr could return 0 for kernel threads? Or > I would even avoid using that function for allow_user_access and open > code the kthread case and remove it from current_thread_amr(). > > Thanks, > Nick Can we try this? commit 9a193d38b1a0a52364bc70ec953e0685993603b6 Author: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Date: Tue Feb 2 09:23:38 2021 +0530 powerpc/kuap: Allow kernel thread to access userspace after kthread_use_mm This fix the bad fault reported by KUAP when io_wqe_worker access userspace. Bug: Read fault blocked by KUAP! WARNING: CPU: 1 PID: 101841 at arch/powerpc/mm/fault.c:229 __do_page_fault+0x6b4/0xcd0 NIP [c00000000009e7e4] __do_page_fault+0x6b4/0xcd0 LR [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 .......... Call Trace: [c000000016367330] [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 (unreliable) [c0000000163673e0] [c00000000009ee3c] do_page_fault+0x3c/0x120 [c000000016367430] [c00000000000c848] handle_page_fault+0x10/0x2c --- interrupt: 300 at iov_iter_fault_in_readable+0x148/0x6f0 .......... NIP [c0000000008e8228] iov_iter_fault_in_readable+0x148/0x6f0 LR [c0000000008e834c] iov_iter_fault_in_readable+0x26c/0x6f0 interrupt: 300 [c0000000163677e0] [c0000000007154a0] iomap_write_actor+0xc0/0x280 [c000000016367880] [c00000000070fc94] iomap_apply+0x1c4/0x780 [c000000016367990] [c000000000710330] iomap_file_buffered_write+0xa0/0x120 [c0000000163679e0] [c00800000040791c] xfs_file_buffered_aio_write+0x314/0x5e0 [xfs] [c000000016367a90] [c0000000006d74bc] io_write+0x10c/0x460 [c000000016367bb0] [c0000000006d80e4] io_issue_sqe+0x8d4/0x1200 [c000000016367c70] [c0000000006d8ad0] io_wq_submit_work+0xc0/0x250 [c000000016367cb0] [c0000000006e2578] io_worker_handle_work+0x498/0x800 [c000000016367d40] [c0000000006e2cdc] io_wqe_worker+0x3fc/0x4f0 [c000000016367da0] [c0000000001cb0a4] kthread+0x1c4/0x1d0 [c000000016367e10] [c00000000000dbf0] ret_from_kernel_thread+0x5c/0x6c The kernel consider thread AMR value for kernel thread to be AMR_KUAP_BLOCKED. Hence access to userspace is denied. This of course not correct and we should allow userspace access after kthread_use_mm(). To be precise, kthread_use_mm() should inherit the AMR value of the operating address space. But, the AMR value is thread-specific and we inherit the address space and not thread access restrictions. Because of this ignore AMR value when accessing userspace via kernel thread. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> diff --git a/arch/powerpc/include/asm/book3s/64/kup.h b/arch/powerpc/include/asm/book3s/64/kup.h index f50f72e535aa..7457d80ba0bb 100644 --- a/arch/powerpc/include/asm/book3s/64/kup.h +++ b/arch/powerpc/include/asm/book3s/64/kup.h @@ -384,7 +384,13 @@ static __always_inline void allow_user_access(void __user *to, const void __user // This is written so we can resolve to a single case at build time BUILD_BUG_ON(!__builtin_constant_p(dir)); - if (mmu_has_feature(MMU_FTR_PKEY)) + /* + * if it is a kthread that did kthread_use_mm() don't + * use current_thread_amr(). + */ + if (!current->mm && current->active_mm != &init_mm) + thread_amr = 0; + else if (mmu_has_feature(MMU_FTR_PKEY)) thread_amr = current_thread_amr(); if (dir == KUAP_READ)
Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> writes: > Nicholas Piggin <npiggin@gmail.com> writes: > >> Excerpts from Michael Ellerman's message of January 30, 2021 9:22 pm: >>> Christophe Leroy <christophe.leroy@csgroup.eu> writes: >>>> +Aneesh >>>> >>>> Le 29/01/2021 à 07:52, Zorro Lang a écrit : >>> .. >>>>> [ 96.200296] ------------[ cut here ]------------ >>>>> [ 96.200304] Bug: Read fault blocked by KUAP! >>>>> [ 96.200309] WARNING: CPU: 3 PID: 1876 at arch/powerpc/mm/fault.c:229 bad_kernel_fault+0x180/0x310 >>>> >>>>> [ 96.200734] NIP [c000000000849424] fault_in_pages_readable+0x104/0x350 >>>>> [ 96.200741] LR [c00000000084952c] fault_in_pages_readable+0x20c/0x350 >>>>> [ 96.200747] --- interrupt: 300 >>>> >>>> >>>> Problem happens in a section where userspace access is supposed to be granted, so the patch you >>>> proposed is definitely not the right fix. >>>> >>>> c000000000849408: 2c 01 00 4c isync >>>> c00000000084940c: a6 03 3d 7d mtspr 29,r9 <== granting userspace access permission >>>> c000000000849410: 2c 01 00 4c isync >>>> c000000000849414: 00 00 36 e9 ld r9,0(r22) >>>> c000000000849418: 20 00 29 81 lwz r9,32(r9) >>>> c00000000084941c: 00 02 29 71 andi. r9,r9,512 >>>> c000000000849420: 78 d3 5e 7f mr r30,r26 >>>> ==> c000000000849424: 00 00 bf 8b lbz r29,0(r31) <== accessing userspace >>>> c000000000849428: 10 00 82 41 beq c000000000849438 <fault_in_pages_readable+0x118> >>>> c00000000084942c: 2c 01 00 4c isync >>>> c000000000849430: a6 03 bd 7e mtspr 29,r21 <== clearing userspace access permission >>>> c000000000849434: 2c 01 00 4c isync >>>> >>>> My first guess is that the problem is linked to the following function, see the comment >>>> >>>> /* >>>> * For kernel thread that doesn't have thread.regs return >>>> * default AMR/IAMR values. >>>> */ >>>> static inline u64 current_thread_amr(void) >>>> { >>>> if (current->thread.regs) >>>> return current->thread.regs->amr; >>>> return AMR_KUAP_BLOCKED; >>>> } >>>> >>>> Above function was introduced by commit 48a8ab4eeb82 ("powerpc/book3s64/pkeys: Don't update SPRN_AMR >>>> when in kernel mode") >>> >>> Yeah that's a bit of a curly one. >>> >>> At some point io_uring did kthread_use_mm(), which is supposed to mean >>> the kthread can operate on behalf of the original process that submitted >>> the IO. >>> >>> But because KUAP is implemented using memory protection keys, it depends >>> on the value of the AMR register, which is not part of the mm, it's in >>> thread.regs->amr. >>> >>> And what's worse by the time we're in kthread_use_mm() we no longer have >>> access to the thread.regs->amr of the original process that submitted >>> the IO. >>> >>> We also can't simply move the AMR into the mm, precisely because it's >>> per thread, not per mm. >>> >>> So TBH I don't know how we're going to fix this. >>> >>> I guess we could return AMR=unblocked for kernel threads, but that's >>> arguably a bug because it allows a process to circumvent memory keys by >>> asking the kernel to do the access. >> >> We shouldn't need to inherit AMR should we? We only need it to be locked >> for kernel threads until it's explicitly unlocked -- nothing mm specific >> there. I think current_thread_amr could return 0 for kernel threads? Or >> I would even avoid using that function for allow_user_access and open >> code the kthread case and remove it from current_thread_amr(). >> >> Thanks, >> Nick > updated one From 8fdb0680f983940d61f91da8252b13c8d3e8ebee Mon Sep 17 00:00:00 2001 From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Date: Tue, 2 Feb 2021 09:23:38 +0530 Subject: [PATCH v2] powerpc/kuap: Allow kernel thread to access userspace after kthread_use_mm This fix the bad fault reported by KUAP when io_wqe_worker access userspace. Bug: Read fault blocked by KUAP! WARNING: CPU: 1 PID: 101841 at arch/powerpc/mm/fault.c:229 __do_page_fault+0x6b4/0xcd0 NIP [c00000000009e7e4] __do_page_fault+0x6b4/0xcd0 LR [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 .......... Call Trace: [c000000016367330] [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 (unreliable) [c0000000163673e0] [c00000000009ee3c] do_page_fault+0x3c/0x120 [c000000016367430] [c00000000000c848] handle_page_fault+0x10/0x2c --- interrupt: 300 at iov_iter_fault_in_readable+0x148/0x6f0 .......... NIP [c0000000008e8228] iov_iter_fault_in_readable+0x148/0x6f0 LR [c0000000008e834c] iov_iter_fault_in_readable+0x26c/0x6f0 interrupt: 300 [c0000000163677e0] [c0000000007154a0] iomap_write_actor+0xc0/0x280 [c000000016367880] [c00000000070fc94] iomap_apply+0x1c4/0x780 [c000000016367990] [c000000000710330] iomap_file_buffered_write+0xa0/0x120 [c0000000163679e0] [c00800000040791c] xfs_file_buffered_aio_write+0x314/0x5e0 [xfs] [c000000016367a90] [c0000000006d74bc] io_write+0x10c/0x460 [c000000016367bb0] [c0000000006d80e4] io_issue_sqe+0x8d4/0x1200 [c000000016367c70] [c0000000006d8ad0] io_wq_submit_work+0xc0/0x250 [c000000016367cb0] [c0000000006e2578] io_worker_handle_work+0x498/0x800 [c000000016367d40] [c0000000006e2cdc] io_wqe_worker+0x3fc/0x4f0 [c000000016367da0] [c0000000001cb0a4] kthread+0x1c4/0x1d0 [c000000016367e10] [c00000000000dbf0] ret_from_kernel_thread+0x5c/0x6c The kernel consider thread AMR value for kernel thread to be AMR_KUAP_BLOCKED. Hence access to userspace is denied. This of course not correct and we should allow userspace access after kthread_use_mm(). To be precise, kthread_use_mm() should inherit the AMR value of the operating address space. But, the AMR value is thread-specific and we inherit the address space and not thread access restrictions. Because of this ignore AMR value when accessing userspace via kernel thread. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> --- Changes from v1: * Address review feedback from Nick arch/powerpc/include/asm/book3s/64/kup.h | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/book3s/64/kup.h b/arch/powerpc/include/asm/book3s/64/kup.h index f50f72e535aa..95f4df99249e 100644 --- a/arch/powerpc/include/asm/book3s/64/kup.h +++ b/arch/powerpc/include/asm/book3s/64/kup.h @@ -384,7 +384,13 @@ static __always_inline void allow_user_access(void __user *to, const void __user // This is written so we can resolve to a single case at build time BUILD_BUG_ON(!__builtin_constant_p(dir)); - if (mmu_has_feature(MMU_FTR_PKEY)) + /* + * if it is a kthread that did kthread_use_mm() don't + * use current_thread_amr(). + */ + if (current->flags & PF_KTHREAD) + thread_amr = 0; + else if (mmu_has_feature(MMU_FTR_PKEY)) thread_amr = current_thread_amr(); if (dir == KUAP_READ)
Le 02/02/2021 à 06:55, Aneesh Kumar K.V a écrit : > Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> writes: > >> Nicholas Piggin <npiggin@gmail.com> writes: >> >>> Excerpts from Michael Ellerman's message of January 30, 2021 9:22 pm: >>>> Christophe Leroy <christophe.leroy@csgroup.eu> writes: >>>>> +Aneesh >>>>> >>>>> Le 29/01/2021 à 07:52, Zorro Lang a écrit : >>>> .. >>>>>> [ 96.200296] ------------[ cut here ]------------ >>>>>> [ 96.200304] Bug: Read fault blocked by KUAP! >>>>>> [ 96.200309] WARNING: CPU: 3 PID: 1876 at arch/powerpc/mm/fault.c:229 bad_kernel_fault+0x180/0x310 >>>>> >>>>>> [ 96.200734] NIP [c000000000849424] fault_in_pages_readable+0x104/0x350 >>>>>> [ 96.200741] LR [c00000000084952c] fault_in_pages_readable+0x20c/0x350 >>>>>> [ 96.200747] --- interrupt: 300 >>>>> >>>>> >>>>> Problem happens in a section where userspace access is supposed to be granted, so the patch you >>>>> proposed is definitely not the right fix. >>>>> >>>>> c000000000849408: 2c 01 00 4c isync >>>>> c00000000084940c: a6 03 3d 7d mtspr 29,r9 <== granting userspace access permission >>>>> c000000000849410: 2c 01 00 4c isync >>>>> c000000000849414: 00 00 36 e9 ld r9,0(r22) >>>>> c000000000849418: 20 00 29 81 lwz r9,32(r9) >>>>> c00000000084941c: 00 02 29 71 andi. r9,r9,512 >>>>> c000000000849420: 78 d3 5e 7f mr r30,r26 >>>>> ==> c000000000849424: 00 00 bf 8b lbz r29,0(r31) <== accessing userspace >>>>> c000000000849428: 10 00 82 41 beq c000000000849438 <fault_in_pages_readable+0x118> >>>>> c00000000084942c: 2c 01 00 4c isync >>>>> c000000000849430: a6 03 bd 7e mtspr 29,r21 <== clearing userspace access permission >>>>> c000000000849434: 2c 01 00 4c isync >>>>> >>>>> My first guess is that the problem is linked to the following function, see the comment >>>>> >>>>> /* >>>>> * For kernel thread that doesn't have thread.regs return >>>>> * default AMR/IAMR values. >>>>> */ >>>>> static inline u64 current_thread_amr(void) >>>>> { >>>>> if (current->thread.regs) >>>>> return current->thread.regs->amr; >>>>> return AMR_KUAP_BLOCKED; >>>>> } >>>>> >>>>> Above function was introduced by commit 48a8ab4eeb82 ("powerpc/book3s64/pkeys: Don't update SPRN_AMR >>>>> when in kernel mode") >>>> >>>> Yeah that's a bit of a curly one. >>>> >>>> At some point io_uring did kthread_use_mm(), which is supposed to mean >>>> the kthread can operate on behalf of the original process that submitted >>>> the IO. >>>> >>>> But because KUAP is implemented using memory protection keys, it depends >>>> on the value of the AMR register, which is not part of the mm, it's in >>>> thread.regs->amr. >>>> >>>> And what's worse by the time we're in kthread_use_mm() we no longer have >>>> access to the thread.regs->amr of the original process that submitted >>>> the IO. >>>> >>>> We also can't simply move the AMR into the mm, precisely because it's >>>> per thread, not per mm. >>>> >>>> So TBH I don't know how we're going to fix this. >>>> >>>> I guess we could return AMR=unblocked for kernel threads, but that's >>>> arguably a bug because it allows a process to circumvent memory keys by >>>> asking the kernel to do the access. >>> >>> We shouldn't need to inherit AMR should we? We only need it to be locked >>> for kernel threads until it's explicitly unlocked -- nothing mm specific >>> there. I think current_thread_amr could return 0 for kernel threads? Or >>> I would even avoid using that function for allow_user_access and open >>> code the kthread case and remove it from current_thread_amr(). >>> >>> Thanks, >>> Nick >> > > updated one > > From 8fdb0680f983940d61f91da8252b13c8d3e8ebee Mon Sep 17 00:00:00 2001 > From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> > Date: Tue, 2 Feb 2021 09:23:38 +0530 > Subject: [PATCH v2] powerpc/kuap: Allow kernel thread to access userspace > after kthread_use_mm > > This fix the bad fault reported by KUAP when io_wqe_worker access userspace. > > Bug: Read fault blocked by KUAP! > WARNING: CPU: 1 PID: 101841 at arch/powerpc/mm/fault.c:229 __do_page_fault+0x6b4/0xcd0 > NIP [c00000000009e7e4] __do_page_fault+0x6b4/0xcd0 > LR [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 > .......... > Call Trace: > [c000000016367330] [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 (unreliable) > [c0000000163673e0] [c00000000009ee3c] do_page_fault+0x3c/0x120 > [c000000016367430] [c00000000000c848] handle_page_fault+0x10/0x2c > --- interrupt: 300 at iov_iter_fault_in_readable+0x148/0x6f0 > .......... > NIP [c0000000008e8228] iov_iter_fault_in_readable+0x148/0x6f0 > LR [c0000000008e834c] iov_iter_fault_in_readable+0x26c/0x6f0 > interrupt: 300 > [c0000000163677e0] [c0000000007154a0] iomap_write_actor+0xc0/0x280 > [c000000016367880] [c00000000070fc94] iomap_apply+0x1c4/0x780 > [c000000016367990] [c000000000710330] iomap_file_buffered_write+0xa0/0x120 > [c0000000163679e0] [c00800000040791c] xfs_file_buffered_aio_write+0x314/0x5e0 [xfs] > [c000000016367a90] [c0000000006d74bc] io_write+0x10c/0x460 > [c000000016367bb0] [c0000000006d80e4] io_issue_sqe+0x8d4/0x1200 > [c000000016367c70] [c0000000006d8ad0] io_wq_submit_work+0xc0/0x250 > [c000000016367cb0] [c0000000006e2578] io_worker_handle_work+0x498/0x800 > [c000000016367d40] [c0000000006e2cdc] io_wqe_worker+0x3fc/0x4f0 > [c000000016367da0] [c0000000001cb0a4] kthread+0x1c4/0x1d0 > [c000000016367e10] [c00000000000dbf0] ret_from_kernel_thread+0x5c/0x6c > > The kernel consider thread AMR value for kernel thread to be > AMR_KUAP_BLOCKED. Hence access to userspace is denied. This > of course not correct and we should allow userspace access after > kthread_use_mm(). To be precise, kthread_use_mm() should inherit the > AMR value of the operating address space. But, the AMR value is > thread-specific and we inherit the address space and not thread > access restrictions. Because of this ignore AMR value when accessing > userspace via kernel thread. > > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> > --- > Changes from v1: > * Address review feedback from Nick > > arch/powerpc/include/asm/book3s/64/kup.h | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/include/asm/book3s/64/kup.h b/arch/powerpc/include/asm/book3s/64/kup.h > index f50f72e535aa..95f4df99249e 100644 > --- a/arch/powerpc/include/asm/book3s/64/kup.h > +++ b/arch/powerpc/include/asm/book3s/64/kup.h > @@ -384,7 +384,13 @@ static __always_inline void allow_user_access(void __user *to, const void __user > // This is written so we can resolve to a single case at build time > BUILD_BUG_ON(!__builtin_constant_p(dir)); > > - if (mmu_has_feature(MMU_FTR_PKEY)) > + /* > + * if it is a kthread that did kthread_use_mm() don't > + * use current_thread_amr(). According to include/linux/sched.h, PF_KTHREAD means /* I am a kernel thread */ It doesn't seem to be related to kthread_use_mm() > + */ > + if (current->flags & PF_KTHREAD) > + thread_amr = 0; > + else if (mmu_has_feature(MMU_FTR_PKEY)) > thread_amr = current_thread_amr(); > > if (dir == KUAP_READ) > Christophe
On 2/2/21 11:32 AM, Christophe Leroy wrote: > > > Le 02/02/2021 à 06:55, Aneesh Kumar K.V a écrit : >> Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> writes: >> >>> Nicholas Piggin <npiggin@gmail.com> writes: >>> >>>> Excerpts from Michael Ellerman's message of January 30, 2021 9:22 pm: >>>>> Christophe Leroy <christophe.leroy@csgroup.eu> writes: >>>>>> +Aneesh >>>>>> >>>>>> Le 29/01/2021 à 07:52, Zorro Lang a écrit : >>>>> .. >>>>>>> [ 96.200296] ------------[ cut here ]------------ >>>>>>> [ 96.200304] Bug: Read fault blocked by KUAP! >>>>>>> [ 96.200309] WARNING: CPU: 3 PID: 1876 at >>>>>>> arch/powerpc/mm/fault.c:229 bad_kernel_fault+0x180/0x310 >>>>>> >>>>>>> [ 96.200734] NIP [c000000000849424] >>>>>>> fault_in_pages_readable+0x104/0x350 >>>>>>> [ 96.200741] LR [c00000000084952c] >>>>>>> fault_in_pages_readable+0x20c/0x350 >>>>>>> [ 96.200747] --- interrupt: 300 >>>>>> >>>>>> >>>>>> Problem happens in a section where userspace access is supposed to >>>>>> be granted, so the patch you >>>>>> proposed is definitely not the right fix. >>>>>> >>>>>> c000000000849408: 2c 01 00 4c isync >>>>>> c00000000084940c: a6 03 3d 7d mtspr 29,r9 <== granting >>>>>> userspace access permission >>>>>> c000000000849410: 2c 01 00 4c isync >>>>>> c000000000849414: 00 00 36 e9 ld r9,0(r22) >>>>>> c000000000849418: 20 00 29 81 lwz r9,32(r9) >>>>>> c00000000084941c: 00 02 29 71 andi. r9,r9,512 >>>>>> c000000000849420: 78 d3 5e 7f mr r30,r26 >>>>>> ==> c000000000849424: 00 00 bf 8b lbz r29,0(r31) <== >>>>>> accessing userspace >>>>>> c000000000849428: 10 00 82 41 beq c000000000849438 >>>>>> <fault_in_pages_readable+0x118> >>>>>> c00000000084942c: 2c 01 00 4c isync >>>>>> c000000000849430: a6 03 bd 7e mtspr 29,r21 <== clearing >>>>>> userspace access permission >>>>>> c000000000849434: 2c 01 00 4c isync >>>>>> >>>>>> My first guess is that the problem is linked to the following >>>>>> function, see the comment >>>>>> >>>>>> /* >>>>>> * For kernel thread that doesn't have thread.regs return >>>>>> * default AMR/IAMR values. >>>>>> */ >>>>>> static inline u64 current_thread_amr(void) >>>>>> { >>>>>> if (current->thread.regs) >>>>>> return current->thread.regs->amr; >>>>>> return AMR_KUAP_BLOCKED; >>>>>> } >>>>>> >>>>>> Above function was introduced by commit 48a8ab4eeb82 >>>>>> ("powerpc/book3s64/pkeys: Don't update SPRN_AMR >>>>>> when in kernel mode") >>>>> >>>>> Yeah that's a bit of a curly one. >>>>> >>>>> At some point io_uring did kthread_use_mm(), which is supposed to mean >>>>> the kthread can operate on behalf of the original process that >>>>> submitted >>>>> the IO. >>>>> >>>>> But because KUAP is implemented using memory protection keys, it >>>>> depends >>>>> on the value of the AMR register, which is not part of the mm, it's in >>>>> thread.regs->amr. >>>>> >>>>> And what's worse by the time we're in kthread_use_mm() we no longer >>>>> have >>>>> access to the thread.regs->amr of the original process that submitted >>>>> the IO. >>>>> >>>>> We also can't simply move the AMR into the mm, precisely because it's >>>>> per thread, not per mm. >>>>> >>>>> So TBH I don't know how we're going to fix this. >>>>> >>>>> I guess we could return AMR=unblocked for kernel threads, but that's >>>>> arguably a bug because it allows a process to circumvent memory >>>>> keys by >>>>> asking the kernel to do the access. >>>> >>>> We shouldn't need to inherit AMR should we? We only need it to be >>>> locked >>>> for kernel threads until it's explicitly unlocked -- nothing mm >>>> specific >>>> there. I think current_thread_amr could return 0 for kernel threads? Or >>>> I would even avoid using that function for allow_user_access and open >>>> code the kthread case and remove it from current_thread_amr(). >>>> >>>> Thanks, >>>> Nick >>> >> >> updated one >> >> From 8fdb0680f983940d61f91da8252b13c8d3e8ebee Mon Sep 17 00:00:00 2001 >> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> >> Date: Tue, 2 Feb 2021 09:23:38 +0530 >> Subject: [PATCH v2] powerpc/kuap: Allow kernel thread to access userspace >> after kthread_use_mm >> >> This fix the bad fault reported by KUAP when io_wqe_worker access >> userspace. >> >> Bug: Read fault blocked by KUAP! >> WARNING: CPU: 1 PID: 101841 at arch/powerpc/mm/fault.c:229 >> __do_page_fault+0x6b4/0xcd0 >> NIP [c00000000009e7e4] __do_page_fault+0x6b4/0xcd0 >> LR [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 >> .......... >> Call Trace: >> [c000000016367330] [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 >> (unreliable) >> [c0000000163673e0] [c00000000009ee3c] do_page_fault+0x3c/0x120 >> [c000000016367430] [c00000000000c848] handle_page_fault+0x10/0x2c >> --- interrupt: 300 at iov_iter_fault_in_readable+0x148/0x6f0 >> .......... >> NIP [c0000000008e8228] iov_iter_fault_in_readable+0x148/0x6f0 >> LR [c0000000008e834c] iov_iter_fault_in_readable+0x26c/0x6f0 >> interrupt: 300 >> [c0000000163677e0] [c0000000007154a0] iomap_write_actor+0xc0/0x280 >> [c000000016367880] [c00000000070fc94] iomap_apply+0x1c4/0x780 >> [c000000016367990] [c000000000710330] >> iomap_file_buffered_write+0xa0/0x120 >> [c0000000163679e0] [c00800000040791c] >> xfs_file_buffered_aio_write+0x314/0x5e0 [xfs] >> [c000000016367a90] [c0000000006d74bc] io_write+0x10c/0x460 >> [c000000016367bb0] [c0000000006d80e4] io_issue_sqe+0x8d4/0x1200 >> [c000000016367c70] [c0000000006d8ad0] io_wq_submit_work+0xc0/0x250 >> [c000000016367cb0] [c0000000006e2578] io_worker_handle_work+0x498/0x800 >> [c000000016367d40] [c0000000006e2cdc] io_wqe_worker+0x3fc/0x4f0 >> [c000000016367da0] [c0000000001cb0a4] kthread+0x1c4/0x1d0 >> [c000000016367e10] [c00000000000dbf0] ret_from_kernel_thread+0x5c/0x6c >> >> The kernel consider thread AMR value for kernel thread to be >> AMR_KUAP_BLOCKED. Hence access to userspace is denied. This >> of course not correct and we should allow userspace access after >> kthread_use_mm(). To be precise, kthread_use_mm() should inherit the >> AMR value of the operating address space. But, the AMR value is >> thread-specific and we inherit the address space and not thread >> access restrictions. Because of this ignore AMR value when accessing >> userspace via kernel thread. >> >> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> >> --- >> Changes from v1: >> * Address review feedback from Nick >> >> arch/powerpc/include/asm/book3s/64/kup.h | 8 +++++++- >> 1 file changed, 7 insertions(+), 1 deletion(-) >> >> diff --git a/arch/powerpc/include/asm/book3s/64/kup.h >> b/arch/powerpc/include/asm/book3s/64/kup.h >> index f50f72e535aa..95f4df99249e 100644 >> --- a/arch/powerpc/include/asm/book3s/64/kup.h >> +++ b/arch/powerpc/include/asm/book3s/64/kup.h >> @@ -384,7 +384,13 @@ static __always_inline void >> allow_user_access(void __user *to, const void __user >> // This is written so we can resolve to a single case at build time >> BUILD_BUG_ON(!__builtin_constant_p(dir)); >> - if (mmu_has_feature(MMU_FTR_PKEY)) >> + /* >> + * if it is a kthread that did kthread_use_mm() don't >> + * use current_thread_amr(). > > According to include/linux/sched.h, PF_KTHREAD means /* I am a kernel > thread */ > It doesn't seem to be related to kthread_use_mm() That should be a sufficient check here. if we did reach here without calling kthread_user_mm, we will crash on access because we don't have a mm attached to the current process. a kernel thread with kthread_use_mm has current->mm == current->active_mm && current->flags & PF_KTHREAD. The first part is true for every other process too. > >> + */ >> + if (current->flags & PF_KTHREAD) >> + thread_amr = 0; >> + else if (mmu_has_feature(MMU_FTR_PKEY)) >> thread_amr = current_thread_amr(); >> if (dir == KUAP_READ) >> > > Christophe -aneesh
Le 02/02/2021 à 07:16, Aneesh Kumar K.V a écrit : > On 2/2/21 11:32 AM, Christophe Leroy wrote: >> >> >> Le 02/02/2021 à 06:55, Aneesh Kumar K.V a écrit : >>> Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> writes: >>> >>>> Nicholas Piggin <npiggin@gmail.com> writes: >>>> >>>>> Excerpts from Michael Ellerman's message of January 30, 2021 9:22 pm: >>>>>> Christophe Leroy <christophe.leroy@csgroup.eu> writes: >>>>>>> +Aneesh >>>>>>> >>>>>>> Le 29/01/2021 à 07:52, Zorro Lang a écrit : >>>>>> .. >>>>>>>> [ 96.200296] ------------[ cut here ]------------ >>>>>>>> [ 96.200304] Bug: Read fault blocked by KUAP! >>>>>>>> [ 96.200309] WARNING: CPU: 3 PID: 1876 at arch/powerpc/mm/fault.c:229 >>>>>>>> bad_kernel_fault+0x180/0x310 >>>>>>> >>>>>>>> [ 96.200734] NIP [c000000000849424] fault_in_pages_readable+0x104/0x350 >>>>>>>> [ 96.200741] LR [c00000000084952c] fault_in_pages_readable+0x20c/0x350 >>>>>>>> [ 96.200747] --- interrupt: 300 >>>>>>> >>>>>>> >>>>>>> Problem happens in a section where userspace access is supposed to be granted, so the patch you >>>>>>> proposed is definitely not the right fix. >>>>>>> >>>>>>> c000000000849408: 2c 01 00 4c isync >>>>>>> c00000000084940c: a6 03 3d 7d mtspr 29,r9 <== granting userspace access permission >>>>>>> c000000000849410: 2c 01 00 4c isync >>>>>>> c000000000849414: 00 00 36 e9 ld r9,0(r22) >>>>>>> c000000000849418: 20 00 29 81 lwz r9,32(r9) >>>>>>> c00000000084941c: 00 02 29 71 andi. r9,r9,512 >>>>>>> c000000000849420: 78 d3 5e 7f mr r30,r26 >>>>>>> ==> c000000000849424: 00 00 bf 8b lbz r29,0(r31) <== accessing userspace >>>>>>> c000000000849428: 10 00 82 41 beq c000000000849438 <fault_in_pages_readable+0x118> >>>>>>> c00000000084942c: 2c 01 00 4c isync >>>>>>> c000000000849430: a6 03 bd 7e mtspr 29,r21 <== clearing userspace access permission >>>>>>> c000000000849434: 2c 01 00 4c isync >>>>>>> >>>>>>> My first guess is that the problem is linked to the following function, see the comment >>>>>>> >>>>>>> /* >>>>>>> * For kernel thread that doesn't have thread.regs return >>>>>>> * default AMR/IAMR values. >>>>>>> */ >>>>>>> static inline u64 current_thread_amr(void) >>>>>>> { >>>>>>> if (current->thread.regs) >>>>>>> return current->thread.regs->amr; >>>>>>> return AMR_KUAP_BLOCKED; >>>>>>> } >>>>>>> >>>>>>> Above function was introduced by commit 48a8ab4eeb82 ("powerpc/book3s64/pkeys: Don't update >>>>>>> SPRN_AMR >>>>>>> when in kernel mode") >>>>>> >>>>>> Yeah that's a bit of a curly one. >>>>>> >>>>>> At some point io_uring did kthread_use_mm(), which is supposed to mean >>>>>> the kthread can operate on behalf of the original process that submitted >>>>>> the IO. >>>>>> >>>>>> But because KUAP is implemented using memory protection keys, it depends >>>>>> on the value of the AMR register, which is not part of the mm, it's in >>>>>> thread.regs->amr. >>>>>> >>>>>> And what's worse by the time we're in kthread_use_mm() we no longer have >>>>>> access to the thread.regs->amr of the original process that submitted >>>>>> the IO. >>>>>> >>>>>> We also can't simply move the AMR into the mm, precisely because it's >>>>>> per thread, not per mm. >>>>>> >>>>>> So TBH I don't know how we're going to fix this. >>>>>> >>>>>> I guess we could return AMR=unblocked for kernel threads, but that's >>>>>> arguably a bug because it allows a process to circumvent memory keys by >>>>>> asking the kernel to do the access. >>>>> >>>>> We shouldn't need to inherit AMR should we? We only need it to be locked >>>>> for kernel threads until it's explicitly unlocked -- nothing mm specific >>>>> there. I think current_thread_amr could return 0 for kernel threads? Or >>>>> I would even avoid using that function for allow_user_access and open >>>>> code the kthread case and remove it from current_thread_amr(). >>>>> >>>>> Thanks, >>>>> Nick >>>> >>> >>> updated one >>> >>> From 8fdb0680f983940d61f91da8252b13c8d3e8ebee Mon Sep 17 00:00:00 2001 >>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> >>> Date: Tue, 2 Feb 2021 09:23:38 +0530 >>> Subject: [PATCH v2] powerpc/kuap: Allow kernel thread to access userspace >>> after kthread_use_mm >>> >>> This fix the bad fault reported by KUAP when io_wqe_worker access userspace. >>> >>> Bug: Read fault blocked by KUAP! >>> WARNING: CPU: 1 PID: 101841 at arch/powerpc/mm/fault.c:229 __do_page_fault+0x6b4/0xcd0 >>> NIP [c00000000009e7e4] __do_page_fault+0x6b4/0xcd0 >>> LR [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 >>> .......... >>> Call Trace: >>> [c000000016367330] [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 (unreliable) >>> [c0000000163673e0] [c00000000009ee3c] do_page_fault+0x3c/0x120 >>> [c000000016367430] [c00000000000c848] handle_page_fault+0x10/0x2c >>> --- interrupt: 300 at iov_iter_fault_in_readable+0x148/0x6f0 >>> .......... >>> NIP [c0000000008e8228] iov_iter_fault_in_readable+0x148/0x6f0 >>> LR [c0000000008e834c] iov_iter_fault_in_readable+0x26c/0x6f0 >>> interrupt: 300 >>> [c0000000163677e0] [c0000000007154a0] iomap_write_actor+0xc0/0x280 >>> [c000000016367880] [c00000000070fc94] iomap_apply+0x1c4/0x780 >>> [c000000016367990] [c000000000710330] iomap_file_buffered_write+0xa0/0x120 >>> [c0000000163679e0] [c00800000040791c] xfs_file_buffered_aio_write+0x314/0x5e0 [xfs] >>> [c000000016367a90] [c0000000006d74bc] io_write+0x10c/0x460 >>> [c000000016367bb0] [c0000000006d80e4] io_issue_sqe+0x8d4/0x1200 >>> [c000000016367c70] [c0000000006d8ad0] io_wq_submit_work+0xc0/0x250 >>> [c000000016367cb0] [c0000000006e2578] io_worker_handle_work+0x498/0x800 >>> [c000000016367d40] [c0000000006e2cdc] io_wqe_worker+0x3fc/0x4f0 >>> [c000000016367da0] [c0000000001cb0a4] kthread+0x1c4/0x1d0 >>> [c000000016367e10] [c00000000000dbf0] ret_from_kernel_thread+0x5c/0x6c >>> >>> The kernel consider thread AMR value for kernel thread to be >>> AMR_KUAP_BLOCKED. Hence access to userspace is denied. This >>> of course not correct and we should allow userspace access after >>> kthread_use_mm(). To be precise, kthread_use_mm() should inherit the >>> AMR value of the operating address space. But, the AMR value is >>> thread-specific and we inherit the address space and not thread >>> access restrictions. Because of this ignore AMR value when accessing >>> userspace via kernel thread. >>> >>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> >>> --- >>> Changes from v1: >>> * Address review feedback from Nick >>> >>> arch/powerpc/include/asm/book3s/64/kup.h | 8 +++++++- >>> 1 file changed, 7 insertions(+), 1 deletion(-) >>> >>> diff --git a/arch/powerpc/include/asm/book3s/64/kup.h b/arch/powerpc/include/asm/book3s/64/kup.h >>> index f50f72e535aa..95f4df99249e 100644 >>> --- a/arch/powerpc/include/asm/book3s/64/kup.h >>> +++ b/arch/powerpc/include/asm/book3s/64/kup.h >>> @@ -384,7 +384,13 @@ static __always_inline void allow_user_access(void __user *to, const void >>> __user >>> // This is written so we can resolve to a single case at build time >>> BUILD_BUG_ON(!__builtin_constant_p(dir)); >>> - if (mmu_has_feature(MMU_FTR_PKEY)) >>> + /* >>> + * if it is a kthread that did kthread_use_mm() don't >>> + * use current_thread_amr(). >> >> According to include/linux/sched.h, PF_KTHREAD means /* I am a kernel thread */ >> It doesn't seem to be related to kthread_use_mm() > > That should be a sufficient check here. if we did reach here without calling kthread_user_mm, we > will crash on access because we don't have a mm attached to the current process. a kernel thread > with kthread_use_mm has Ok but then the comment doesn't match the check. And also the comment in current_thread_amr() is then misleading. Why not do the current->flags & PF_KTHREAD check in current_thread_amr() and return 0 in that case instead of BLOCKED ? > > current->mm == current->active_mm && current->flags & PF_KTHREAD. > > The first part is true for every other process too. > >> >>> + */ >>> + if (current->flags & PF_KTHREAD) >>> + thread_amr = 0; >>> + else if (mmu_has_feature(MMU_FTR_PKEY)) >>> thread_amr = current_thread_amr(); >>> if (dir == KUAP_READ) >>> >> >> Christophe > > > -aneesh
On 2/2/21 11:50 AM, Christophe Leroy wrote: > > > Le 02/02/2021 à 07:16, Aneesh Kumar K.V a écrit : >> On 2/2/21 11:32 AM, Christophe Leroy wrote: >>> >>> >>> Le 02/02/2021 à 06:55, Aneesh Kumar K.V a écrit : >>>> Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> writes: >>>> >>>>> Nicholas Piggin <npiggin@gmail.com> writes: >>>>> >>>>>> Excerpts from Michael Ellerman's message of January 30, 2021 9:22 pm: >>>>>>> Christophe Leroy <christophe.leroy@csgroup.eu> writes: >>>>>>>> +Aneesh >>>>>>>> >>>>>>>> Le 29/01/2021 à 07:52, Zorro Lang a écrit : >>>>>>> .. >>>>>>>>> [ 96.200296] ------------[ cut here ]------------ >>>>>>>>> [ 96.200304] Bug: Read fault blocked by KUAP! >>>>>>>>> [ 96.200309] WARNING: CPU: 3 PID: 1876 at >>>>>>>>> arch/powerpc/mm/fault.c:229 bad_kernel_fault+0x180/0x310 >>>>>>>> >>>>>>>>> [ 96.200734] NIP [c000000000849424] >>>>>>>>> fault_in_pages_readable+0x104/0x350 >>>>>>>>> [ 96.200741] LR [c00000000084952c] >>>>>>>>> fault_in_pages_readable+0x20c/0x350 >>>>>>>>> [ 96.200747] --- interrupt: 300 >>>>>>>> >>>>>>>> >>>>>>>> Problem happens in a section where userspace access is supposed >>>>>>>> to be granted, so the patch you >>>>>>>> proposed is definitely not the right fix. >>>>>>>> >>>>>>>> c000000000849408: 2c 01 00 4c isync >>>>>>>> c00000000084940c: a6 03 3d 7d mtspr 29,r9 <== granting >>>>>>>> userspace access permission >>>>>>>> c000000000849410: 2c 01 00 4c isync >>>>>>>> c000000000849414: 00 00 36 e9 ld r9,0(r22) >>>>>>>> c000000000849418: 20 00 29 81 lwz r9,32(r9) >>>>>>>> c00000000084941c: 00 02 29 71 andi. r9,r9,512 >>>>>>>> c000000000849420: 78 d3 5e 7f mr r30,r26 >>>>>>>> ==> c000000000849424: 00 00 bf 8b lbz r29,0(r31) <== >>>>>>>> accessing userspace >>>>>>>> c000000000849428: 10 00 82 41 beq c000000000849438 >>>>>>>> <fault_in_pages_readable+0x118> >>>>>>>> c00000000084942c: 2c 01 00 4c isync >>>>>>>> c000000000849430: a6 03 bd 7e mtspr 29,r21 <== >>>>>>>> clearing userspace access permission >>>>>>>> c000000000849434: 2c 01 00 4c isync >>>>>>>> >>>>>>>> My first guess is that the problem is linked to the following >>>>>>>> function, see the comment >>>>>>>> >>>>>>>> /* >>>>>>>> * For kernel thread that doesn't have thread.regs return >>>>>>>> * default AMR/IAMR values. >>>>>>>> */ >>>>>>>> static inline u64 current_thread_amr(void) >>>>>>>> { >>>>>>>> if (current->thread.regs) >>>>>>>> return current->thread.regs->amr; >>>>>>>> return AMR_KUAP_BLOCKED; >>>>>>>> } >>>>>>>> >>>>>>>> Above function was introduced by commit 48a8ab4eeb82 >>>>>>>> ("powerpc/book3s64/pkeys: Don't update SPRN_AMR >>>>>>>> when in kernel mode") >>>>>>> >>>>>>> Yeah that's a bit of a curly one. >>>>>>> >>>>>>> At some point io_uring did kthread_use_mm(), which is supposed to >>>>>>> mean >>>>>>> the kthread can operate on behalf of the original process that >>>>>>> submitted >>>>>>> the IO. >>>>>>> >>>>>>> But because KUAP is implemented using memory protection keys, it >>>>>>> depends >>>>>>> on the value of the AMR register, which is not part of the mm, >>>>>>> it's in >>>>>>> thread.regs->amr. >>>>>>> >>>>>>> And what's worse by the time we're in kthread_use_mm() we no >>>>>>> longer have >>>>>>> access to the thread.regs->amr of the original process that >>>>>>> submitted >>>>>>> the IO. >>>>>>> >>>>>>> We also can't simply move the AMR into the mm, precisely because >>>>>>> it's >>>>>>> per thread, not per mm. >>>>>>> >>>>>>> So TBH I don't know how we're going to fix this. >>>>>>> >>>>>>> I guess we could return AMR=unblocked for kernel threads, but that's >>>>>>> arguably a bug because it allows a process to circumvent memory >>>>>>> keys by >>>>>>> asking the kernel to do the access. >>>>>> >>>>>> We shouldn't need to inherit AMR should we? We only need it to be >>>>>> locked >>>>>> for kernel threads until it's explicitly unlocked -- nothing mm >>>>>> specific >>>>>> there. I think current_thread_amr could return 0 for kernel >>>>>> threads? Or >>>>>> I would even avoid using that function for allow_user_access and open >>>>>> code the kthread case and remove it from current_thread_amr(). >>>>>> >>>>>> Thanks, >>>>>> Nick >>>>> >>>> >>>> updated one >>>> >>>> From 8fdb0680f983940d61f91da8252b13c8d3e8ebee Mon Sep 17 00:00:00 2001 >>>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> >>>> Date: Tue, 2 Feb 2021 09:23:38 +0530 >>>> Subject: [PATCH v2] powerpc/kuap: Allow kernel thread to access >>>> userspace >>>> after kthread_use_mm >>>> >>>> This fix the bad fault reported by KUAP when io_wqe_worker access >>>> userspace. >>>> >>>> Bug: Read fault blocked by KUAP! >>>> WARNING: CPU: 1 PID: 101841 at arch/powerpc/mm/fault.c:229 >>>> __do_page_fault+0x6b4/0xcd0 >>>> NIP [c00000000009e7e4] __do_page_fault+0x6b4/0xcd0 >>>> LR [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 >>>> .......... >>>> Call Trace: >>>> [c000000016367330] [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 >>>> (unreliable) >>>> [c0000000163673e0] [c00000000009ee3c] do_page_fault+0x3c/0x120 >>>> [c000000016367430] [c00000000000c848] handle_page_fault+0x10/0x2c >>>> --- interrupt: 300 at iov_iter_fault_in_readable+0x148/0x6f0 >>>> .......... >>>> NIP [c0000000008e8228] iov_iter_fault_in_readable+0x148/0x6f0 >>>> LR [c0000000008e834c] iov_iter_fault_in_readable+0x26c/0x6f0 >>>> interrupt: 300 >>>> [c0000000163677e0] [c0000000007154a0] iomap_write_actor+0xc0/0x280 >>>> [c000000016367880] [c00000000070fc94] iomap_apply+0x1c4/0x780 >>>> [c000000016367990] [c000000000710330] >>>> iomap_file_buffered_write+0xa0/0x120 >>>> [c0000000163679e0] [c00800000040791c] >>>> xfs_file_buffered_aio_write+0x314/0x5e0 [xfs] >>>> [c000000016367a90] [c0000000006d74bc] io_write+0x10c/0x460 >>>> [c000000016367bb0] [c0000000006d80e4] io_issue_sqe+0x8d4/0x1200 >>>> [c000000016367c70] [c0000000006d8ad0] io_wq_submit_work+0xc0/0x250 >>>> [c000000016367cb0] [c0000000006e2578] >>>> io_worker_handle_work+0x498/0x800 >>>> [c000000016367d40] [c0000000006e2cdc] io_wqe_worker+0x3fc/0x4f0 >>>> [c000000016367da0] [c0000000001cb0a4] kthread+0x1c4/0x1d0 >>>> [c000000016367e10] [c00000000000dbf0] >>>> ret_from_kernel_thread+0x5c/0x6c >>>> >>>> The kernel consider thread AMR value for kernel thread to be >>>> AMR_KUAP_BLOCKED. Hence access to userspace is denied. This >>>> of course not correct and we should allow userspace access after >>>> kthread_use_mm(). To be precise, kthread_use_mm() should inherit the >>>> AMR value of the operating address space. But, the AMR value is >>>> thread-specific and we inherit the address space and not thread >>>> access restrictions. Because of this ignore AMR value when accessing >>>> userspace via kernel thread. >>>> >>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> >>>> --- >>>> Changes from v1: >>>> * Address review feedback from Nick >>>> >>>> arch/powerpc/include/asm/book3s/64/kup.h | 8 +++++++- >>>> 1 file changed, 7 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/arch/powerpc/include/asm/book3s/64/kup.h >>>> b/arch/powerpc/include/asm/book3s/64/kup.h >>>> index f50f72e535aa..95f4df99249e 100644 >>>> --- a/arch/powerpc/include/asm/book3s/64/kup.h >>>> +++ b/arch/powerpc/include/asm/book3s/64/kup.h >>>> @@ -384,7 +384,13 @@ static __always_inline void >>>> allow_user_access(void __user *to, const void __user >>>> // This is written so we can resolve to a single case at build >>>> time >>>> BUILD_BUG_ON(!__builtin_constant_p(dir)); >>>> - if (mmu_has_feature(MMU_FTR_PKEY)) >>>> + /* >>>> + * if it is a kthread that did kthread_use_mm() don't >>>> + * use current_thread_amr(). >>> >>> According to include/linux/sched.h, PF_KTHREAD means /* I am a kernel >>> thread */ >>> It doesn't seem to be related to kthread_use_mm() >> >> That should be a sufficient check here. if we did reach here without >> calling kthread_user_mm, we will crash on access because we don't have >> a mm attached to the current process. a kernel thread with >> kthread_use_mm has > > Ok but then the comment doesn't match the check. I was trying to be explict in the comment that we expect the thread to have done kthread_use_mm(). > > And also the comment in current_thread_amr() is then misleading. > > Why not do the current->flags & PF_KTHREAD check in current_thread_amr() > and return 0 in that case instead of BLOCKED ? In my view currrent_thread_amr() is more generic and we want to be explicit there that a kernel thread AMR is KUAP_BLOCKED. Only when we call allow user access, we relax the AMR value. > >> >> current->mm == current->active_mm && current->flags & PF_KTHREAD. >> >> The first part is true for every other process too. >> >>> >>>> + */ >>>> + if (current->flags & PF_KTHREAD) >>>> + thread_amr = 0; >>>> + else if (mmu_has_feature(MMU_FTR_PKEY)) >>>> thread_amr = current_thread_amr(); >>>> if (dir == KUAP_READ) >>>> >>> >>> Christophe >> >> >> -aneesh
Excerpts from Aneesh Kumar K.V's message of February 2, 2021 4:30 pm: > On 2/2/21 11:50 AM, Christophe Leroy wrote: >> >> >> Le 02/02/2021 à 07:16, Aneesh Kumar K.V a écrit : >>> On 2/2/21 11:32 AM, Christophe Leroy wrote: >>>> >>>> >>>> Le 02/02/2021 à 06:55, Aneesh Kumar K.V a écrit : >>>>> Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> writes: >>>>> >>>>>> Nicholas Piggin <npiggin@gmail.com> writes: >>>>>> >>>>>>> Excerpts from Michael Ellerman's message of January 30, 2021 9:22 pm: >>>>>>>> Christophe Leroy <christophe.leroy@csgroup.eu> writes: >>>>>>>>> +Aneesh >>>>>>>>> >>>>>>>>> Le 29/01/2021 à 07:52, Zorro Lang a écrit : >>>>>>>> .. >>>>>>>>>> [ 96.200296] ------------[ cut here ]------------ >>>>>>>>>> [ 96.200304] Bug: Read fault blocked by KUAP! >>>>>>>>>> [ 96.200309] WARNING: CPU: 3 PID: 1876 at >>>>>>>>>> arch/powerpc/mm/fault.c:229 bad_kernel_fault+0x180/0x310 >>>>>>>>> >>>>>>>>>> [ 96.200734] NIP [c000000000849424] >>>>>>>>>> fault_in_pages_readable+0x104/0x350 >>>>>>>>>> [ 96.200741] LR [c00000000084952c] >>>>>>>>>> fault_in_pages_readable+0x20c/0x350 >>>>>>>>>> [ 96.200747] --- interrupt: 300 >>>>>>>>> >>>>>>>>> >>>>>>>>> Problem happens in a section where userspace access is supposed >>>>>>>>> to be granted, so the patch you >>>>>>>>> proposed is definitely not the right fix. >>>>>>>>> >>>>>>>>> c000000000849408: 2c 01 00 4c isync >>>>>>>>> c00000000084940c: a6 03 3d 7d mtspr 29,r9 <== granting >>>>>>>>> userspace access permission >>>>>>>>> c000000000849410: 2c 01 00 4c isync >>>>>>>>> c000000000849414: 00 00 36 e9 ld r9,0(r22) >>>>>>>>> c000000000849418: 20 00 29 81 lwz r9,32(r9) >>>>>>>>> c00000000084941c: 00 02 29 71 andi. r9,r9,512 >>>>>>>>> c000000000849420: 78 d3 5e 7f mr r30,r26 >>>>>>>>> ==> c000000000849424: 00 00 bf 8b lbz r29,0(r31) <== >>>>>>>>> accessing userspace >>>>>>>>> c000000000849428: 10 00 82 41 beq c000000000849438 >>>>>>>>> <fault_in_pages_readable+0x118> >>>>>>>>> c00000000084942c: 2c 01 00 4c isync >>>>>>>>> c000000000849430: a6 03 bd 7e mtspr 29,r21 <== >>>>>>>>> clearing userspace access permission >>>>>>>>> c000000000849434: 2c 01 00 4c isync >>>>>>>>> >>>>>>>>> My first guess is that the problem is linked to the following >>>>>>>>> function, see the comment >>>>>>>>> >>>>>>>>> /* >>>>>>>>> * For kernel thread that doesn't have thread.regs return >>>>>>>>> * default AMR/IAMR values. >>>>>>>>> */ >>>>>>>>> static inline u64 current_thread_amr(void) >>>>>>>>> { >>>>>>>>> if (current->thread.regs) >>>>>>>>> return current->thread.regs->amr; >>>>>>>>> return AMR_KUAP_BLOCKED; >>>>>>>>> } >>>>>>>>> >>>>>>>>> Above function was introduced by commit 48a8ab4eeb82 >>>>>>>>> ("powerpc/book3s64/pkeys: Don't update SPRN_AMR >>>>>>>>> when in kernel mode") >>>>>>>> >>>>>>>> Yeah that's a bit of a curly one. >>>>>>>> >>>>>>>> At some point io_uring did kthread_use_mm(), which is supposed to >>>>>>>> mean >>>>>>>> the kthread can operate on behalf of the original process that >>>>>>>> submitted >>>>>>>> the IO. >>>>>>>> >>>>>>>> But because KUAP is implemented using memory protection keys, it >>>>>>>> depends >>>>>>>> on the value of the AMR register, which is not part of the mm, >>>>>>>> it's in >>>>>>>> thread.regs->amr. >>>>>>>> >>>>>>>> And what's worse by the time we're in kthread_use_mm() we no >>>>>>>> longer have >>>>>>>> access to the thread.regs->amr of the original process that >>>>>>>> submitted >>>>>>>> the IO. >>>>>>>> >>>>>>>> We also can't simply move the AMR into the mm, precisely because >>>>>>>> it's >>>>>>>> per thread, not per mm. >>>>>>>> >>>>>>>> So TBH I don't know how we're going to fix this. >>>>>>>> >>>>>>>> I guess we could return AMR=unblocked for kernel threads, but that's >>>>>>>> arguably a bug because it allows a process to circumvent memory >>>>>>>> keys by >>>>>>>> asking the kernel to do the access. >>>>>>> >>>>>>> We shouldn't need to inherit AMR should we? We only need it to be >>>>>>> locked >>>>>>> for kernel threads until it's explicitly unlocked -- nothing mm >>>>>>> specific >>>>>>> there. I think current_thread_amr could return 0 for kernel >>>>>>> threads? Or >>>>>>> I would even avoid using that function for allow_user_access and open >>>>>>> code the kthread case and remove it from current_thread_amr(). >>>>>>> >>>>>>> Thanks, >>>>>>> Nick >>>>>> >>>>> >>>>> updated one >>>>> >>>>> From 8fdb0680f983940d61f91da8252b13c8d3e8ebee Mon Sep 17 00:00:00 2001 >>>>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> >>>>> Date: Tue, 2 Feb 2021 09:23:38 +0530 >>>>> Subject: [PATCH v2] powerpc/kuap: Allow kernel thread to access >>>>> userspace >>>>> after kthread_use_mm >>>>> >>>>> This fix the bad fault reported by KUAP when io_wqe_worker access >>>>> userspace. >>>>> >>>>> Bug: Read fault blocked by KUAP! >>>>> WARNING: CPU: 1 PID: 101841 at arch/powerpc/mm/fault.c:229 >>>>> __do_page_fault+0x6b4/0xcd0 >>>>> NIP [c00000000009e7e4] __do_page_fault+0x6b4/0xcd0 >>>>> LR [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 >>>>> .......... >>>>> Call Trace: >>>>> [c000000016367330] [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 >>>>> (unreliable) >>>>> [c0000000163673e0] [c00000000009ee3c] do_page_fault+0x3c/0x120 >>>>> [c000000016367430] [c00000000000c848] handle_page_fault+0x10/0x2c >>>>> --- interrupt: 300 at iov_iter_fault_in_readable+0x148/0x6f0 >>>>> .......... >>>>> NIP [c0000000008e8228] iov_iter_fault_in_readable+0x148/0x6f0 >>>>> LR [c0000000008e834c] iov_iter_fault_in_readable+0x26c/0x6f0 >>>>> interrupt: 300 >>>>> [c0000000163677e0] [c0000000007154a0] iomap_write_actor+0xc0/0x280 >>>>> [c000000016367880] [c00000000070fc94] iomap_apply+0x1c4/0x780 >>>>> [c000000016367990] [c000000000710330] >>>>> iomap_file_buffered_write+0xa0/0x120 >>>>> [c0000000163679e0] [c00800000040791c] >>>>> xfs_file_buffered_aio_write+0x314/0x5e0 [xfs] >>>>> [c000000016367a90] [c0000000006d74bc] io_write+0x10c/0x460 >>>>> [c000000016367bb0] [c0000000006d80e4] io_issue_sqe+0x8d4/0x1200 >>>>> [c000000016367c70] [c0000000006d8ad0] io_wq_submit_work+0xc0/0x250 >>>>> [c000000016367cb0] [c0000000006e2578] >>>>> io_worker_handle_work+0x498/0x800 >>>>> [c000000016367d40] [c0000000006e2cdc] io_wqe_worker+0x3fc/0x4f0 >>>>> [c000000016367da0] [c0000000001cb0a4] kthread+0x1c4/0x1d0 >>>>> [c000000016367e10] [c00000000000dbf0] >>>>> ret_from_kernel_thread+0x5c/0x6c >>>>> >>>>> The kernel consider thread AMR value for kernel thread to be >>>>> AMR_KUAP_BLOCKED. Hence access to userspace is denied. This >>>>> of course not correct and we should allow userspace access after >>>>> kthread_use_mm(). To be precise, kthread_use_mm() should inherit the >>>>> AMR value of the operating address space. But, the AMR value is >>>>> thread-specific and we inherit the address space and not thread >>>>> access restrictions. Because of this ignore AMR value when accessing >>>>> userspace via kernel thread. >>>>> >>>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> >>>>> --- >>>>> Changes from v1: >>>>> * Address review feedback from Nick >>>>> >>>>> arch/powerpc/include/asm/book3s/64/kup.h | 8 +++++++- >>>>> 1 file changed, 7 insertions(+), 1 deletion(-) >>>>> >>>>> diff --git a/arch/powerpc/include/asm/book3s/64/kup.h >>>>> b/arch/powerpc/include/asm/book3s/64/kup.h >>>>> index f50f72e535aa..95f4df99249e 100644 >>>>> --- a/arch/powerpc/include/asm/book3s/64/kup.h >>>>> +++ b/arch/powerpc/include/asm/book3s/64/kup.h >>>>> @@ -384,7 +384,13 @@ static __always_inline void >>>>> allow_user_access(void __user *to, const void __user >>>>> // This is written so we can resolve to a single case at build >>>>> time >>>>> BUILD_BUG_ON(!__builtin_constant_p(dir)); >>>>> - if (mmu_has_feature(MMU_FTR_PKEY)) >>>>> + /* >>>>> + * if it is a kthread that did kthread_use_mm() don't >>>>> + * use current_thread_amr(). >>>> >>>> According to include/linux/sched.h, PF_KTHREAD means /* I am a kernel >>>> thread */ >>>> It doesn't seem to be related to kthread_use_mm() >>> >>> That should be a sufficient check here. if we did reach here without >>> calling kthread_user_mm, we will crash on access because we don't have >>> a mm attached to the current process. a kernel thread with >>> kthread_use_mm has >> >> Ok but then the comment doesn't match the check. > > > I was trying to be explict in the comment that we expect the thread to > have done kthread_use_mm(). I would avoid making it sound conditional. There is no way for a kernel thread to ever access user without having done so. /* * Kernel threads may access user mm with kthread_use_mm() but * can't use current_thread_amr because they have thread.regs==NULL, * but they have no pkeys. */ > >> >> And also the comment in current_thread_amr() is then misleading. >> >> Why not do the current->flags & PF_KTHREAD check in current_thread_amr() >> and return 0 in that case instead of BLOCKED ? > > In my view currrent_thread_amr() is more generic and we want to be > explicit there that a kernel thread AMR is KUAP_BLOCKED. Only when we > call allow user access, we relax the AMR value. current_thread_amr() shouldn't be used by kernel threads ever (after the patch). It's just confusing. The user access or the check could have a test and warning for !current->mm if there is a concern about it (maybe there already is one somehwere). Thanks, Nick
On 2/1/21 11:30 PM, Aneesh Kumar K.V wrote: > On 2/2/21 11:50 AM, Christophe Leroy wrote: >> >> >> Le 02/02/2021 à 07:16, Aneesh Kumar K.V a écrit : >>> On 2/2/21 11:32 AM, Christophe Leroy wrote: >>>> >>>> >>>> Le 02/02/2021 à 06:55, Aneesh Kumar K.V a écrit : >>>>> Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> writes: >>>>> >>>>>> Nicholas Piggin <npiggin@gmail.com> writes: >>>>>> >>>>>>> Excerpts from Michael Ellerman's message of January 30, 2021 9:22 pm: >>>>>>>> Christophe Leroy <christophe.leroy@csgroup.eu> writes: >>>>>>>>> +Aneesh >>>>>>>>> >>>>>>>>> Le 29/01/2021 à 07:52, Zorro Lang a écrit : >>>>>>>> .. >>>>>>>>>> [ 96.200296] ------------[ cut here ]------------ >>>>>>>>>> [ 96.200304] Bug: Read fault blocked by KUAP! >>>>>>>>>> [ 96.200309] WARNING: CPU: 3 PID: 1876 at >>>>>>>>>> arch/powerpc/mm/fault.c:229 bad_kernel_fault+0x180/0x310 >>>>>>>>> >>>>>>>>>> [ 96.200734] NIP [c000000000849424] >>>>>>>>>> fault_in_pages_readable+0x104/0x350 >>>>>>>>>> [ 96.200741] LR [c00000000084952c] >>>>>>>>>> fault_in_pages_readable+0x20c/0x350 >>>>>>>>>> [ 96.200747] --- interrupt: 300 >>>>>>>>> >>>>>>>>> >>>>>>>>> Problem happens in a section where userspace access is supposed >>>>>>>>> to be granted, so the patch you >>>>>>>>> proposed is definitely not the right fix. >>>>>>>>> >>>>>>>>> c000000000849408: 2c 01 00 4c isync >>>>>>>>> c00000000084940c: a6 03 3d 7d mtspr 29,r9 <== granting >>>>>>>>> userspace access permission >>>>>>>>> c000000000849410: 2c 01 00 4c isync >>>>>>>>> c000000000849414: 00 00 36 e9 ld r9,0(r22) >>>>>>>>> c000000000849418: 20 00 29 81 lwz r9,32(r9) >>>>>>>>> c00000000084941c: 00 02 29 71 andi. r9,r9,512 >>>>>>>>> c000000000849420: 78 d3 5e 7f mr r30,r26 >>>>>>>>> ==> c000000000849424: 00 00 bf 8b lbz r29,0(r31) <== >>>>>>>>> accessing userspace >>>>>>>>> c000000000849428: 10 00 82 41 beq c000000000849438 >>>>>>>>> <fault_in_pages_readable+0x118> >>>>>>>>> c00000000084942c: 2c 01 00 4c isync >>>>>>>>> c000000000849430: a6 03 bd 7e mtspr 29,r21 <== >>>>>>>>> clearing userspace access permission >>>>>>>>> c000000000849434: 2c 01 00 4c isync >>>>>>>>> >>>>>>>>> My first guess is that the problem is linked to the following >>>>>>>>> function, see the comment >>>>>>>>> >>>>>>>>> /* >>>>>>>>> * For kernel thread that doesn't have thread.regs return >>>>>>>>> * default AMR/IAMR values. >>>>>>>>> */ >>>>>>>>> static inline u64 current_thread_amr(void) >>>>>>>>> { >>>>>>>>> if (current->thread.regs) >>>>>>>>> return current->thread.regs->amr; >>>>>>>>> return AMR_KUAP_BLOCKED; >>>>>>>>> } >>>>>>>>> >>>>>>>>> Above function was introduced by commit 48a8ab4eeb82 >>>>>>>>> ("powerpc/book3s64/pkeys: Don't update SPRN_AMR >>>>>>>>> when in kernel mode") >>>>>>>> >>>>>>>> Yeah that's a bit of a curly one. >>>>>>>> >>>>>>>> At some point io_uring did kthread_use_mm(), which is supposed to >>>>>>>> mean >>>>>>>> the kthread can operate on behalf of the original process that >>>>>>>> submitted >>>>>>>> the IO. >>>>>>>> >>>>>>>> But because KUAP is implemented using memory protection keys, it >>>>>>>> depends >>>>>>>> on the value of the AMR register, which is not part of the mm, >>>>>>>> it's in >>>>>>>> thread.regs->amr. >>>>>>>> >>>>>>>> And what's worse by the time we're in kthread_use_mm() we no >>>>>>>> longer have >>>>>>>> access to the thread.regs->amr of the original process that >>>>>>>> submitted >>>>>>>> the IO. >>>>>>>> >>>>>>>> We also can't simply move the AMR into the mm, precisely because >>>>>>>> it's >>>>>>>> per thread, not per mm. >>>>>>>> >>>>>>>> So TBH I don't know how we're going to fix this. >>>>>>>> >>>>>>>> I guess we could return AMR=unblocked for kernel threads, but that's >>>>>>>> arguably a bug because it allows a process to circumvent memory >>>>>>>> keys by >>>>>>>> asking the kernel to do the access. >>>>>>> >>>>>>> We shouldn't need to inherit AMR should we? We only need it to be >>>>>>> locked >>>>>>> for kernel threads until it's explicitly unlocked -- nothing mm >>>>>>> specific >>>>>>> there. I think current_thread_amr could return 0 for kernel >>>>>>> threads? Or >>>>>>> I would even avoid using that function for allow_user_access and open >>>>>>> code the kthread case and remove it from current_thread_amr(). >>>>>>> >>>>>>> Thanks, >>>>>>> Nick >>>>>> >>>>> >>>>> updated one >>>>> >>>>> From 8fdb0680f983940d61f91da8252b13c8d3e8ebee Mon Sep 17 00:00:00 2001 >>>>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> >>>>> Date: Tue, 2 Feb 2021 09:23:38 +0530 >>>>> Subject: [PATCH v2] powerpc/kuap: Allow kernel thread to access >>>>> userspace >>>>> after kthread_use_mm >>>>> >>>>> This fix the bad fault reported by KUAP when io_wqe_worker access >>>>> userspace. >>>>> >>>>> Bug: Read fault blocked by KUAP! >>>>> WARNING: CPU: 1 PID: 101841 at arch/powerpc/mm/fault.c:229 >>>>> __do_page_fault+0x6b4/0xcd0 >>>>> NIP [c00000000009e7e4] __do_page_fault+0x6b4/0xcd0 >>>>> LR [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 >>>>> .......... >>>>> Call Trace: >>>>> [c000000016367330] [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 >>>>> (unreliable) >>>>> [c0000000163673e0] [c00000000009ee3c] do_page_fault+0x3c/0x120 >>>>> [c000000016367430] [c00000000000c848] handle_page_fault+0x10/0x2c >>>>> --- interrupt: 300 at iov_iter_fault_in_readable+0x148/0x6f0 >>>>> .......... >>>>> NIP [c0000000008e8228] iov_iter_fault_in_readable+0x148/0x6f0 >>>>> LR [c0000000008e834c] iov_iter_fault_in_readable+0x26c/0x6f0 >>>>> interrupt: 300 >>>>> [c0000000163677e0] [c0000000007154a0] iomap_write_actor+0xc0/0x280 >>>>> [c000000016367880] [c00000000070fc94] iomap_apply+0x1c4/0x780 >>>>> [c000000016367990] [c000000000710330] >>>>> iomap_file_buffered_write+0xa0/0x120 >>>>> [c0000000163679e0] [c00800000040791c] >>>>> xfs_file_buffered_aio_write+0x314/0x5e0 [xfs] >>>>> [c000000016367a90] [c0000000006d74bc] io_write+0x10c/0x460 >>>>> [c000000016367bb0] [c0000000006d80e4] io_issue_sqe+0x8d4/0x1200 >>>>> [c000000016367c70] [c0000000006d8ad0] io_wq_submit_work+0xc0/0x250 >>>>> [c000000016367cb0] [c0000000006e2578] >>>>> io_worker_handle_work+0x498/0x800 >>>>> [c000000016367d40] [c0000000006e2cdc] io_wqe_worker+0x3fc/0x4f0 >>>>> [c000000016367da0] [c0000000001cb0a4] kthread+0x1c4/0x1d0 >>>>> [c000000016367e10] [c00000000000dbf0] >>>>> ret_from_kernel_thread+0x5c/0x6c >>>>> >>>>> The kernel consider thread AMR value for kernel thread to be >>>>> AMR_KUAP_BLOCKED. Hence access to userspace is denied. This >>>>> of course not correct and we should allow userspace access after >>>>> kthread_use_mm(). To be precise, kthread_use_mm() should inherit the >>>>> AMR value of the operating address space. But, the AMR value is >>>>> thread-specific and we inherit the address space and not thread >>>>> access restrictions. Because of this ignore AMR value when accessing >>>>> userspace via kernel thread. >>>>> >>>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> >>>>> --- >>>>> Changes from v1: >>>>> * Address review feedback from Nick >>>>> >>>>> arch/powerpc/include/asm/book3s/64/kup.h | 8 +++++++- >>>>> 1 file changed, 7 insertions(+), 1 deletion(-) >>>>> >>>>> diff --git a/arch/powerpc/include/asm/book3s/64/kup.h >>>>> b/arch/powerpc/include/asm/book3s/64/kup.h >>>>> index f50f72e535aa..95f4df99249e 100644 >>>>> --- a/arch/powerpc/include/asm/book3s/64/kup.h >>>>> +++ b/arch/powerpc/include/asm/book3s/64/kup.h >>>>> @@ -384,7 +384,13 @@ static __always_inline void >>>>> allow_user_access(void __user *to, const void __user >>>>> // This is written so we can resolve to a single case at build >>>>> time >>>>> BUILD_BUG_ON(!__builtin_constant_p(dir)); >>>>> - if (mmu_has_feature(MMU_FTR_PKEY)) >>>>> + /* >>>>> + * if it is a kthread that did kthread_use_mm() don't >>>>> + * use current_thread_amr(). >>>> >>>> According to include/linux/sched.h, PF_KTHREAD means /* I am a kernel >>>> thread */ >>>> It doesn't seem to be related to kthread_use_mm() >>> >>> That should be a sufficient check here. if we did reach here without >>> calling kthread_user_mm, we will crash on access because we don't have >>> a mm attached to the current process. a kernel thread with >>> kthread_use_mm has >> >> Ok but then the comment doesn't match the check. > > > I was trying to be explict in the comment that we expect the thread to > have done kthread_use_mm(). > >> >> And also the comment in current_thread_amr() is then misleading. >> >> Why not do the current->flags & PF_KTHREAD check in current_thread_amr() >> and return 0 in that case instead of BLOCKED ? > > In my view currrent_thread_amr() is more generic and we want to be > explicit there that a kernel thread AMR is KUAP_BLOCKED. Only when we > call allow user access, we relax the AMR value. Just following up on this, as I'd hate to have 5.11 released with this bug in it for powerpc. It'll obviously also affect other cases of a kernel thread faulting after having done kthread_use_mm(), though I'm not sure how widespread that is. In any case, it'll leave io_uring mostly broken on powerpc if this isn't patched for release.
On 2/4/21 10:23 PM, Jens Axboe wrote: > On 2/1/21 11:30 PM, Aneesh Kumar K.V wrote: >> On 2/2/21 11:50 AM, Christophe Leroy wrote: >>> >>> >>> Le 02/02/2021 à 07:16, Aneesh Kumar K.V a écrit : >>>> On 2/2/21 11:32 AM, Christophe Leroy wrote: >>>>> >>>>> >>>>> Le 02/02/2021 à 06:55, Aneesh Kumar K.V a écrit : >>>>>> Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> writes: >>>>>> >>>>>>> Nicholas Piggin <npiggin@gmail.com> writes: >>>>>>> >>>>>>>> Excerpts from Michael Ellerman's message of January 30, 2021 9:22 pm: >>>>>>>>> Christophe Leroy <christophe.leroy@csgroup.eu> writes: >>>>>>>>>> +Aneesh >>>>>>>>>> >>>>>>>>>> Le 29/01/2021 à 07:52, Zorro Lang a écrit : >>>>>>>>> .. >>>>>>>>>>> [ 96.200296] ------------[ cut here ]------------ >>>>>>>>>>> [ 96.200304] Bug: Read fault blocked by KUAP! >>>>>>>>>>> [ 96.200309] WARNING: CPU: 3 PID: 1876 at >>>>>>>>>>> arch/powerpc/mm/fault.c:229 bad_kernel_fault+0x180/0x310 >>>>>>>>>> >>>>>>>>>>> [ 96.200734] NIP [c000000000849424] >>>>>>>>>>> fault_in_pages_readable+0x104/0x350 >>>>>>>>>>> [ 96.200741] LR [c00000000084952c] >>>>>>>>>>> fault_in_pages_readable+0x20c/0x350 >>>>>>>>>>> [ 96.200747] --- interrupt: 300 >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Problem happens in a section where userspace access is supposed >>>>>>>>>> to be granted, so the patch you >>>>>>>>>> proposed is definitely not the right fix. >>>>>>>>>> >>>>>>>>>> c000000000849408: 2c 01 00 4c isync >>>>>>>>>> c00000000084940c: a6 03 3d 7d mtspr 29,r9 <== granting >>>>>>>>>> userspace access permission >>>>>>>>>> c000000000849410: 2c 01 00 4c isync >>>>>>>>>> c000000000849414: 00 00 36 e9 ld r9,0(r22) >>>>>>>>>> c000000000849418: 20 00 29 81 lwz r9,32(r9) >>>>>>>>>> c00000000084941c: 00 02 29 71 andi. r9,r9,512 >>>>>>>>>> c000000000849420: 78 d3 5e 7f mr r30,r26 >>>>>>>>>> ==> c000000000849424: 00 00 bf 8b lbz r29,0(r31) <== >>>>>>>>>> accessing userspace >>>>>>>>>> c000000000849428: 10 00 82 41 beq c000000000849438 >>>>>>>>>> <fault_in_pages_readable+0x118> >>>>>>>>>> c00000000084942c: 2c 01 00 4c isync >>>>>>>>>> c000000000849430: a6 03 bd 7e mtspr 29,r21 <== >>>>>>>>>> clearing userspace access permission >>>>>>>>>> c000000000849434: 2c 01 00 4c isync >>>>>>>>>> >>>>>>>>>> My first guess is that the problem is linked to the following >>>>>>>>>> function, see the comment >>>>>>>>>> >>>>>>>>>> /* >>>>>>>>>> * For kernel thread that doesn't have thread.regs return >>>>>>>>>> * default AMR/IAMR values. >>>>>>>>>> */ >>>>>>>>>> static inline u64 current_thread_amr(void) >>>>>>>>>> { >>>>>>>>>> if (current->thread.regs) >>>>>>>>>> return current->thread.regs->amr; >>>>>>>>>> return AMR_KUAP_BLOCKED; >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> Above function was introduced by commit 48a8ab4eeb82 >>>>>>>>>> ("powerpc/book3s64/pkeys: Don't update SPRN_AMR >>>>>>>>>> when in kernel mode") >>>>>>>>> >>>>>>>>> Yeah that's a bit of a curly one. >>>>>>>>> >>>>>>>>> At some point io_uring did kthread_use_mm(), which is supposed to >>>>>>>>> mean >>>>>>>>> the kthread can operate on behalf of the original process that >>>>>>>>> submitted >>>>>>>>> the IO. >>>>>>>>> >>>>>>>>> But because KUAP is implemented using memory protection keys, it >>>>>>>>> depends >>>>>>>>> on the value of the AMR register, which is not part of the mm, >>>>>>>>> it's in >>>>>>>>> thread.regs->amr. >>>>>>>>> >>>>>>>>> And what's worse by the time we're in kthread_use_mm() we no >>>>>>>>> longer have >>>>>>>>> access to the thread.regs->amr of the original process that >>>>>>>>> submitted >>>>>>>>> the IO. >>>>>>>>> >>>>>>>>> We also can't simply move the AMR into the mm, precisely because >>>>>>>>> it's >>>>>>>>> per thread, not per mm. >>>>>>>>> >>>>>>>>> So TBH I don't know how we're going to fix this. >>>>>>>>> >>>>>>>>> I guess we could return AMR=unblocked for kernel threads, but that's >>>>>>>>> arguably a bug because it allows a process to circumvent memory >>>>>>>>> keys by >>>>>>>>> asking the kernel to do the access. >>>>>>>> >>>>>>>> We shouldn't need to inherit AMR should we? We only need it to be >>>>>>>> locked >>>>>>>> for kernel threads until it's explicitly unlocked -- nothing mm >>>>>>>> specific >>>>>>>> there. I think current_thread_amr could return 0 for kernel >>>>>>>> threads? Or >>>>>>>> I would even avoid using that function for allow_user_access and open >>>>>>>> code the kthread case and remove it from current_thread_amr(). >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Nick >>>>>>> >>>>>> >>>>>> updated one >>>>>> >>>>>> From 8fdb0680f983940d61f91da8252b13c8d3e8ebee Mon Sep 17 00:00:00 2001 >>>>>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> >>>>>> Date: Tue, 2 Feb 2021 09:23:38 +0530 >>>>>> Subject: [PATCH v2] powerpc/kuap: Allow kernel thread to access >>>>>> userspace >>>>>> after kthread_use_mm >>>>>> >>>>>> This fix the bad fault reported by KUAP when io_wqe_worker access >>>>>> userspace. >>>>>> >>>>>> Bug: Read fault blocked by KUAP! >>>>>> WARNING: CPU: 1 PID: 101841 at arch/powerpc/mm/fault.c:229 >>>>>> __do_page_fault+0x6b4/0xcd0 >>>>>> NIP [c00000000009e7e4] __do_page_fault+0x6b4/0xcd0 >>>>>> LR [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 >>>>>> .......... >>>>>> Call Trace: >>>>>> [c000000016367330] [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 >>>>>> (unreliable) >>>>>> [c0000000163673e0] [c00000000009ee3c] do_page_fault+0x3c/0x120 >>>>>> [c000000016367430] [c00000000000c848] handle_page_fault+0x10/0x2c >>>>>> --- interrupt: 300 at iov_iter_fault_in_readable+0x148/0x6f0 >>>>>> .......... >>>>>> NIP [c0000000008e8228] iov_iter_fault_in_readable+0x148/0x6f0 >>>>>> LR [c0000000008e834c] iov_iter_fault_in_readable+0x26c/0x6f0 >>>>>> interrupt: 300 >>>>>> [c0000000163677e0] [c0000000007154a0] iomap_write_actor+0xc0/0x280 >>>>>> [c000000016367880] [c00000000070fc94] iomap_apply+0x1c4/0x780 >>>>>> [c000000016367990] [c000000000710330] >>>>>> iomap_file_buffered_write+0xa0/0x120 >>>>>> [c0000000163679e0] [c00800000040791c] >>>>>> xfs_file_buffered_aio_write+0x314/0x5e0 [xfs] >>>>>> [c000000016367a90] [c0000000006d74bc] io_write+0x10c/0x460 >>>>>> [c000000016367bb0] [c0000000006d80e4] io_issue_sqe+0x8d4/0x1200 >>>>>> [c000000016367c70] [c0000000006d8ad0] io_wq_submit_work+0xc0/0x250 >>>>>> [c000000016367cb0] [c0000000006e2578] >>>>>> io_worker_handle_work+0x498/0x800 >>>>>> [c000000016367d40] [c0000000006e2cdc] io_wqe_worker+0x3fc/0x4f0 >>>>>> [c000000016367da0] [c0000000001cb0a4] kthread+0x1c4/0x1d0 >>>>>> [c000000016367e10] [c00000000000dbf0] >>>>>> ret_from_kernel_thread+0x5c/0x6c >>>>>> >>>>>> The kernel consider thread AMR value for kernel thread to be >>>>>> AMR_KUAP_BLOCKED. Hence access to userspace is denied. This >>>>>> of course not correct and we should allow userspace access after >>>>>> kthread_use_mm(). To be precise, kthread_use_mm() should inherit the >>>>>> AMR value of the operating address space. But, the AMR value is >>>>>> thread-specific and we inherit the address space and not thread >>>>>> access restrictions. Because of this ignore AMR value when accessing >>>>>> userspace via kernel thread. >>>>>> >>>>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> >>>>>> --- >>>>>> Changes from v1: >>>>>> * Address review feedback from Nick >>>>>> >>>>>> arch/powerpc/include/asm/book3s/64/kup.h | 8 +++++++- >>>>>> 1 file changed, 7 insertions(+), 1 deletion(-) >>>>>> >>>>>> diff --git a/arch/powerpc/include/asm/book3s/64/kup.h >>>>>> b/arch/powerpc/include/asm/book3s/64/kup.h >>>>>> index f50f72e535aa..95f4df99249e 100644 >>>>>> --- a/arch/powerpc/include/asm/book3s/64/kup.h >>>>>> +++ b/arch/powerpc/include/asm/book3s/64/kup.h >>>>>> @@ -384,7 +384,13 @@ static __always_inline void >>>>>> allow_user_access(void __user *to, const void __user >>>>>> // This is written so we can resolve to a single case at build >>>>>> time >>>>>> BUILD_BUG_ON(!__builtin_constant_p(dir)); >>>>>> - if (mmu_has_feature(MMU_FTR_PKEY)) >>>>>> + /* >>>>>> + * if it is a kthread that did kthread_use_mm() don't >>>>>> + * use current_thread_amr(). >>>>> >>>>> According to include/linux/sched.h, PF_KTHREAD means /* I am a kernel >>>>> thread */ >>>>> It doesn't seem to be related to kthread_use_mm() >>>> >>>> That should be a sufficient check here. if we did reach here without >>>> calling kthread_user_mm, we will crash on access because we don't have >>>> a mm attached to the current process. a kernel thread with >>>> kthread_use_mm has >>> >>> Ok but then the comment doesn't match the check. >> >> >> I was trying to be explict in the comment that we expect the thread to >> have done kthread_use_mm(). >> >>> >>> And also the comment in current_thread_amr() is then misleading. >>> >>> Why not do the current->flags & PF_KTHREAD check in current_thread_amr() >>> and return 0 in that case instead of BLOCKED ? >> >> In my view currrent_thread_amr() is more generic and we want to be >> explicit there that a kernel thread AMR is KUAP_BLOCKED. Only when we >> call allow user access, we relax the AMR value. > > Just following up on this, as I'd hate to have 5.11 released with this > bug in it for powerpc. It'll obviously also affect other cases of a > kernel thread faulting after having done kthread_use_mm(), though I'm > not sure how widespread that is. In any case, it'll leave io_uring > mostly broken on powerpc if this isn't patched for release. > I am waiting for test feedback on the change I posted earlier. I am also running a regression run myself. Once that is complete i will post the patch as a separate email. -aneesh
On 2/4/21 10:01 AM, Aneesh Kumar K.V wrote: > On 2/4/21 10:23 PM, Jens Axboe wrote: >> On 2/1/21 11:30 PM, Aneesh Kumar K.V wrote: >>> On 2/2/21 11:50 AM, Christophe Leroy wrote: >>>> >>>> >>>> Le 02/02/2021 à 07:16, Aneesh Kumar K.V a écrit : >>>>> On 2/2/21 11:32 AM, Christophe Leroy wrote: >>>>>> >>>>>> >>>>>> Le 02/02/2021 à 06:55, Aneesh Kumar K.V a écrit : >>>>>>> Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> writes: >>>>>>> >>>>>>>> Nicholas Piggin <npiggin@gmail.com> writes: >>>>>>>> >>>>>>>>> Excerpts from Michael Ellerman's message of January 30, 2021 9:22 pm: >>>>>>>>>> Christophe Leroy <christophe.leroy@csgroup.eu> writes: >>>>>>>>>>> +Aneesh >>>>>>>>>>> >>>>>>>>>>> Le 29/01/2021 à 07:52, Zorro Lang a écrit : >>>>>>>>>> .. >>>>>>>>>>>> [ 96.200296] ------------[ cut here ]------------ >>>>>>>>>>>> [ 96.200304] Bug: Read fault blocked by KUAP! >>>>>>>>>>>> [ 96.200309] WARNING: CPU: 3 PID: 1876 at >>>>>>>>>>>> arch/powerpc/mm/fault.c:229 bad_kernel_fault+0x180/0x310 >>>>>>>>>>> >>>>>>>>>>>> [ 96.200734] NIP [c000000000849424] >>>>>>>>>>>> fault_in_pages_readable+0x104/0x350 >>>>>>>>>>>> [ 96.200741] LR [c00000000084952c] >>>>>>>>>>>> fault_in_pages_readable+0x20c/0x350 >>>>>>>>>>>> [ 96.200747] --- interrupt: 300 >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Problem happens in a section where userspace access is supposed >>>>>>>>>>> to be granted, so the patch you >>>>>>>>>>> proposed is definitely not the right fix. >>>>>>>>>>> >>>>>>>>>>> c000000000849408: 2c 01 00 4c isync >>>>>>>>>>> c00000000084940c: a6 03 3d 7d mtspr 29,r9 <== granting >>>>>>>>>>> userspace access permission >>>>>>>>>>> c000000000849410: 2c 01 00 4c isync >>>>>>>>>>> c000000000849414: 00 00 36 e9 ld r9,0(r22) >>>>>>>>>>> c000000000849418: 20 00 29 81 lwz r9,32(r9) >>>>>>>>>>> c00000000084941c: 00 02 29 71 andi. r9,r9,512 >>>>>>>>>>> c000000000849420: 78 d3 5e 7f mr r30,r26 >>>>>>>>>>> ==> c000000000849424: 00 00 bf 8b lbz r29,0(r31) <== >>>>>>>>>>> accessing userspace >>>>>>>>>>> c000000000849428: 10 00 82 41 beq c000000000849438 >>>>>>>>>>> <fault_in_pages_readable+0x118> >>>>>>>>>>> c00000000084942c: 2c 01 00 4c isync >>>>>>>>>>> c000000000849430: a6 03 bd 7e mtspr 29,r21 <== >>>>>>>>>>> clearing userspace access permission >>>>>>>>>>> c000000000849434: 2c 01 00 4c isync >>>>>>>>>>> >>>>>>>>>>> My first guess is that the problem is linked to the following >>>>>>>>>>> function, see the comment >>>>>>>>>>> >>>>>>>>>>> /* >>>>>>>>>>> * For kernel thread that doesn't have thread.regs return >>>>>>>>>>> * default AMR/IAMR values. >>>>>>>>>>> */ >>>>>>>>>>> static inline u64 current_thread_amr(void) >>>>>>>>>>> { >>>>>>>>>>> if (current->thread.regs) >>>>>>>>>>> return current->thread.regs->amr; >>>>>>>>>>> return AMR_KUAP_BLOCKED; >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> Above function was introduced by commit 48a8ab4eeb82 >>>>>>>>>>> ("powerpc/book3s64/pkeys: Don't update SPRN_AMR >>>>>>>>>>> when in kernel mode") >>>>>>>>>> >>>>>>>>>> Yeah that's a bit of a curly one. >>>>>>>>>> >>>>>>>>>> At some point io_uring did kthread_use_mm(), which is supposed to >>>>>>>>>> mean >>>>>>>>>> the kthread can operate on behalf of the original process that >>>>>>>>>> submitted >>>>>>>>>> the IO. >>>>>>>>>> >>>>>>>>>> But because KUAP is implemented using memory protection keys, it >>>>>>>>>> depends >>>>>>>>>> on the value of the AMR register, which is not part of the mm, >>>>>>>>>> it's in >>>>>>>>>> thread.regs->amr. >>>>>>>>>> >>>>>>>>>> And what's worse by the time we're in kthread_use_mm() we no >>>>>>>>>> longer have >>>>>>>>>> access to the thread.regs->amr of the original process that >>>>>>>>>> submitted >>>>>>>>>> the IO. >>>>>>>>>> >>>>>>>>>> We also can't simply move the AMR into the mm, precisely because >>>>>>>>>> it's >>>>>>>>>> per thread, not per mm. >>>>>>>>>> >>>>>>>>>> So TBH I don't know how we're going to fix this. >>>>>>>>>> >>>>>>>>>> I guess we could return AMR=unblocked for kernel threads, but that's >>>>>>>>>> arguably a bug because it allows a process to circumvent memory >>>>>>>>>> keys by >>>>>>>>>> asking the kernel to do the access. >>>>>>>>> >>>>>>>>> We shouldn't need to inherit AMR should we? We only need it to be >>>>>>>>> locked >>>>>>>>> for kernel threads until it's explicitly unlocked -- nothing mm >>>>>>>>> specific >>>>>>>>> there. I think current_thread_amr could return 0 for kernel >>>>>>>>> threads? Or >>>>>>>>> I would even avoid using that function for allow_user_access and open >>>>>>>>> code the kthread case and remove it from current_thread_amr(). >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Nick >>>>>>>> >>>>>>> >>>>>>> updated one >>>>>>> >>>>>>> From 8fdb0680f983940d61f91da8252b13c8d3e8ebee Mon Sep 17 00:00:00 2001 >>>>>>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> >>>>>>> Date: Tue, 2 Feb 2021 09:23:38 +0530 >>>>>>> Subject: [PATCH v2] powerpc/kuap: Allow kernel thread to access >>>>>>> userspace >>>>>>> after kthread_use_mm >>>>>>> >>>>>>> This fix the bad fault reported by KUAP when io_wqe_worker access >>>>>>> userspace. >>>>>>> >>>>>>> Bug: Read fault blocked by KUAP! >>>>>>> WARNING: CPU: 1 PID: 101841 at arch/powerpc/mm/fault.c:229 >>>>>>> __do_page_fault+0x6b4/0xcd0 >>>>>>> NIP [c00000000009e7e4] __do_page_fault+0x6b4/0xcd0 >>>>>>> LR [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 >>>>>>> .......... >>>>>>> Call Trace: >>>>>>> [c000000016367330] [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 >>>>>>> (unreliable) >>>>>>> [c0000000163673e0] [c00000000009ee3c] do_page_fault+0x3c/0x120 >>>>>>> [c000000016367430] [c00000000000c848] handle_page_fault+0x10/0x2c >>>>>>> --- interrupt: 300 at iov_iter_fault_in_readable+0x148/0x6f0 >>>>>>> .......... >>>>>>> NIP [c0000000008e8228] iov_iter_fault_in_readable+0x148/0x6f0 >>>>>>> LR [c0000000008e834c] iov_iter_fault_in_readable+0x26c/0x6f0 >>>>>>> interrupt: 300 >>>>>>> [c0000000163677e0] [c0000000007154a0] iomap_write_actor+0xc0/0x280 >>>>>>> [c000000016367880] [c00000000070fc94] iomap_apply+0x1c4/0x780 >>>>>>> [c000000016367990] [c000000000710330] >>>>>>> iomap_file_buffered_write+0xa0/0x120 >>>>>>> [c0000000163679e0] [c00800000040791c] >>>>>>> xfs_file_buffered_aio_write+0x314/0x5e0 [xfs] >>>>>>> [c000000016367a90] [c0000000006d74bc] io_write+0x10c/0x460 >>>>>>> [c000000016367bb0] [c0000000006d80e4] io_issue_sqe+0x8d4/0x1200 >>>>>>> [c000000016367c70] [c0000000006d8ad0] io_wq_submit_work+0xc0/0x250 >>>>>>> [c000000016367cb0] [c0000000006e2578] >>>>>>> io_worker_handle_work+0x498/0x800 >>>>>>> [c000000016367d40] [c0000000006e2cdc] io_wqe_worker+0x3fc/0x4f0 >>>>>>> [c000000016367da0] [c0000000001cb0a4] kthread+0x1c4/0x1d0 >>>>>>> [c000000016367e10] [c00000000000dbf0] >>>>>>> ret_from_kernel_thread+0x5c/0x6c >>>>>>> >>>>>>> The kernel consider thread AMR value for kernel thread to be >>>>>>> AMR_KUAP_BLOCKED. Hence access to userspace is denied. This >>>>>>> of course not correct and we should allow userspace access after >>>>>>> kthread_use_mm(). To be precise, kthread_use_mm() should inherit the >>>>>>> AMR value of the operating address space. But, the AMR value is >>>>>>> thread-specific and we inherit the address space and not thread >>>>>>> access restrictions. Because of this ignore AMR value when accessing >>>>>>> userspace via kernel thread. >>>>>>> >>>>>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> >>>>>>> --- >>>>>>> Changes from v1: >>>>>>> * Address review feedback from Nick >>>>>>> >>>>>>> arch/powerpc/include/asm/book3s/64/kup.h | 8 +++++++- >>>>>>> 1 file changed, 7 insertions(+), 1 deletion(-) >>>>>>> >>>>>>> diff --git a/arch/powerpc/include/asm/book3s/64/kup.h >>>>>>> b/arch/powerpc/include/asm/book3s/64/kup.h >>>>>>> index f50f72e535aa..95f4df99249e 100644 >>>>>>> --- a/arch/powerpc/include/asm/book3s/64/kup.h >>>>>>> +++ b/arch/powerpc/include/asm/book3s/64/kup.h >>>>>>> @@ -384,7 +384,13 @@ static __always_inline void >>>>>>> allow_user_access(void __user *to, const void __user >>>>>>> // This is written so we can resolve to a single case at build >>>>>>> time >>>>>>> BUILD_BUG_ON(!__builtin_constant_p(dir)); >>>>>>> - if (mmu_has_feature(MMU_FTR_PKEY)) >>>>>>> + /* >>>>>>> + * if it is a kthread that did kthread_use_mm() don't >>>>>>> + * use current_thread_amr(). >>>>>> >>>>>> According to include/linux/sched.h, PF_KTHREAD means /* I am a kernel >>>>>> thread */ >>>>>> It doesn't seem to be related to kthread_use_mm() >>>>> >>>>> That should be a sufficient check here. if we did reach here without >>>>> calling kthread_user_mm, we will crash on access because we don't have >>>>> a mm attached to the current process. a kernel thread with >>>>> kthread_use_mm has >>>> >>>> Ok but then the comment doesn't match the check. >>> >>> >>> I was trying to be explict in the comment that we expect the thread to >>> have done kthread_use_mm(). >>> >>>> >>>> And also the comment in current_thread_amr() is then misleading. >>>> >>>> Why not do the current->flags & PF_KTHREAD check in current_thread_amr() >>>> and return 0 in that case instead of BLOCKED ? >>> >>> In my view currrent_thread_amr() is more generic and we want to be >>> explicit there that a kernel thread AMR is KUAP_BLOCKED. Only when we >>> call allow user access, we relax the AMR value. >> >> Just following up on this, as I'd hate to have 5.11 released with this >> bug in it for powerpc. It'll obviously also affect other cases of a >> kernel thread faulting after having done kthread_use_mm(), though I'm >> not sure how widespread that is. In any case, it'll leave io_uring >> mostly broken on powerpc if this isn't patched for release. >> > > I am waiting for test feedback on the change I posted earlier. I am also > running a regression run myself. Once that is complete i will post the > patch as a separate email. Perfect, sounds good!
On 2/4/21 11:27 PM, Zorro Lang wrote: > On Thu, Feb 04, 2021 at 10:31:59PM +0530, Aneesh Kumar K.V wrote: >> On 2/4/21 10:23 PM, Jens Axboe wrote: >>> On 2/1/21 11:30 PM, Aneesh Kumar K.V wrote: >>>> On 2/2/21 11:50 AM, Christophe Leroy wrote: >>>>> >>>>> >>>>> Le 02/02/2021 à 07:16, Aneesh Kumar K.V a écrit : >>>>>> On 2/2/21 11:32 AM, Christophe Leroy wrote: >>>>>>> >>>>>>> >>>>>>> Le 02/02/2021 à 06:55, Aneesh Kumar K.V a écrit : >>>>>>>> Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> writes: >>>>>>>> >>>>>>>>> Nicholas Piggin <npiggin@gmail.com> writes: >>>>>>>>> >>>>>>>>>> Excerpts from Michael Ellerman's message of January 30, 2021 9:22 pm: >>>>>>>>>>> Christophe Leroy <christophe.leroy@csgroup.eu> writes: >>>>>>>>>>>> +Aneesh >>>>>>>>>>>> >>>>>>>>>>>> Le 29/01/2021 à 07:52, Zorro Lang a écrit : >>>>>>>>>>> .. >>>>>>>>>>>>> [ 96.200296] ------------[ cut here ]------------ >>>>>>>>>>>>> [ 96.200304] Bug: Read fault blocked by KUAP! >>>>>>>>>>>>> [ 96.200309] WARNING: CPU: 3 PID: 1876 at >>>>>>>>>>>>> arch/powerpc/mm/fault.c:229 bad_kernel_fault+0x180/0x310 >>>>>>>>>>>> >>>>>>>>>>>>> [ 96.200734] NIP [c000000000849424] >>>>>>>>>>>>> fault_in_pages_readable+0x104/0x350 >>>>>>>>>>>>> [ 96.200741] LR [c00000000084952c] >>>>>>>>>>>>> fault_in_pages_readable+0x20c/0x350 >>>>>>>>>>>>> [ 96.200747] --- interrupt: 300 >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Problem happens in a section where userspace access is supposed >>>>>>>>>>>> to be granted, so the patch you >>>>>>>>>>>> proposed is definitely not the right fix. >>>>>>>>>>>> >>>>>>>>>>>> c000000000849408: 2c 01 00 4c isync >>>>>>>>>>>> c00000000084940c: a6 03 3d 7d mtspr 29,r9 <== granting >>>>>>>>>>>> userspace access permission >>>>>>>>>>>> c000000000849410: 2c 01 00 4c isync >>>>>>>>>>>> c000000000849414: 00 00 36 e9 ld r9,0(r22) >>>>>>>>>>>> c000000000849418: 20 00 29 81 lwz r9,32(r9) >>>>>>>>>>>> c00000000084941c: 00 02 29 71 andi. r9,r9,512 >>>>>>>>>>>> c000000000849420: 78 d3 5e 7f mr r30,r26 >>>>>>>>>>>> ==> c000000000849424: 00 00 bf 8b lbz r29,0(r31) <== >>>>>>>>>>>> accessing userspace >>>>>>>>>>>> c000000000849428: 10 00 82 41 beq c000000000849438 >>>>>>>>>>>> <fault_in_pages_readable+0x118> >>>>>>>>>>>> c00000000084942c: 2c 01 00 4c isync >>>>>>>>>>>> c000000000849430: a6 03 bd 7e mtspr 29,r21 <== >>>>>>>>>>>> clearing userspace access permission >>>>>>>>>>>> c000000000849434: 2c 01 00 4c isync >>>>>>>>>>>> >>>>>>>>>>>> My first guess is that the problem is linked to the following >>>>>>>>>>>> function, see the comment >>>>>>>>>>>> >>>>>>>>>>>> /* >>>>>>>>>>>> * For kernel thread that doesn't have thread.regs return >>>>>>>>>>>> * default AMR/IAMR values. >>>>>>>>>>>> */ >>>>>>>>>>>> static inline u64 current_thread_amr(void) >>>>>>>>>>>> { >>>>>>>>>>>> if (current->thread.regs) >>>>>>>>>>>> return current->thread.regs->amr; >>>>>>>>>>>> return AMR_KUAP_BLOCKED; >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> Above function was introduced by commit 48a8ab4eeb82 >>>>>>>>>>>> ("powerpc/book3s64/pkeys: Don't update SPRN_AMR >>>>>>>>>>>> when in kernel mode") >>>>>>>>>>> >>>>>>>>>>> Yeah that's a bit of a curly one. >>>>>>>>>>> >>>>>>>>>>> At some point io_uring did kthread_use_mm(), which is supposed to >>>>>>>>>>> mean >>>>>>>>>>> the kthread can operate on behalf of the original process that >>>>>>>>>>> submitted >>>>>>>>>>> the IO. >>>>>>>>>>> >>>>>>>>>>> But because KUAP is implemented using memory protection keys, it >>>>>>>>>>> depends >>>>>>>>>>> on the value of the AMR register, which is not part of the mm, >>>>>>>>>>> it's in >>>>>>>>>>> thread.regs->amr. >>>>>>>>>>> >>>>>>>>>>> And what's worse by the time we're in kthread_use_mm() we no >>>>>>>>>>> longer have >>>>>>>>>>> access to the thread.regs->amr of the original process that >>>>>>>>>>> submitted >>>>>>>>>>> the IO. >>>>>>>>>>> >>>>>>>>>>> We also can't simply move the AMR into the mm, precisely because >>>>>>>>>>> it's >>>>>>>>>>> per thread, not per mm. >>>>>>>>>>> >>>>>>>>>>> So TBH I don't know how we're going to fix this. >>>>>>>>>>> >>>>>>>>>>> I guess we could return AMR=unblocked for kernel threads, but that's >>>>>>>>>>> arguably a bug because it allows a process to circumvent memory >>>>>>>>>>> keys by >>>>>>>>>>> asking the kernel to do the access. >>>>>>>>>> >>>>>>>>>> We shouldn't need to inherit AMR should we? We only need it to be >>>>>>>>>> locked >>>>>>>>>> for kernel threads until it's explicitly unlocked -- nothing mm >>>>>>>>>> specific >>>>>>>>>> there. I think current_thread_amr could return 0 for kernel >>>>>>>>>> threads? Or >>>>>>>>>> I would even avoid using that function for allow_user_access and open >>>>>>>>>> code the kthread case and remove it from current_thread_amr(). >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Nick >>>>>>>>> >>>>>>>> >>>>>>>> updated one >>>>>>>> >>>>>>>> From 8fdb0680f983940d61f91da8252b13c8d3e8ebee Mon Sep 17 00:00:00 2001 >>>>>>>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> >>>>>>>> Date: Tue, 2 Feb 2021 09:23:38 +0530 >>>>>>>> Subject: [PATCH v2] powerpc/kuap: Allow kernel thread to access >>>>>>>> userspace >>>>>>>> after kthread_use_mm >>>>>>>> >>>>>>>> This fix the bad fault reported by KUAP when io_wqe_worker access >>>>>>>> userspace. >>>>>>>> >>>>>>>> Bug: Read fault blocked by KUAP! >>>>>>>> WARNING: CPU: 1 PID: 101841 at arch/powerpc/mm/fault.c:229 >>>>>>>> __do_page_fault+0x6b4/0xcd0 >>>>>>>> NIP [c00000000009e7e4] __do_page_fault+0x6b4/0xcd0 >>>>>>>> LR [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 >>>>>>>> .......... >>>>>>>> Call Trace: >>>>>>>> [c000000016367330] [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 >>>>>>>> (unreliable) >>>>>>>> [c0000000163673e0] [c00000000009ee3c] do_page_fault+0x3c/0x120 >>>>>>>> [c000000016367430] [c00000000000c848] handle_page_fault+0x10/0x2c >>>>>>>> --- interrupt: 300 at iov_iter_fault_in_readable+0x148/0x6f0 >>>>>>>> .......... >>>>>>>> NIP [c0000000008e8228] iov_iter_fault_in_readable+0x148/0x6f0 >>>>>>>> LR [c0000000008e834c] iov_iter_fault_in_readable+0x26c/0x6f0 >>>>>>>> interrupt: 300 >>>>>>>> [c0000000163677e0] [c0000000007154a0] iomap_write_actor+0xc0/0x280 >>>>>>>> [c000000016367880] [c00000000070fc94] iomap_apply+0x1c4/0x780 >>>>>>>> [c000000016367990] [c000000000710330] >>>>>>>> iomap_file_buffered_write+0xa0/0x120 >>>>>>>> [c0000000163679e0] [c00800000040791c] >>>>>>>> xfs_file_buffered_aio_write+0x314/0x5e0 [xfs] >>>>>>>> [c000000016367a90] [c0000000006d74bc] io_write+0x10c/0x460 >>>>>>>> [c000000016367bb0] [c0000000006d80e4] io_issue_sqe+0x8d4/0x1200 >>>>>>>> [c000000016367c70] [c0000000006d8ad0] io_wq_submit_work+0xc0/0x250 >>>>>>>> [c000000016367cb0] [c0000000006e2578] >>>>>>>> io_worker_handle_work+0x498/0x800 >>>>>>>> [c000000016367d40] [c0000000006e2cdc] io_wqe_worker+0x3fc/0x4f0 >>>>>>>> [c000000016367da0] [c0000000001cb0a4] kthread+0x1c4/0x1d0 >>>>>>>> [c000000016367e10] [c00000000000dbf0] >>>>>>>> ret_from_kernel_thread+0x5c/0x6c >>>>>>>> >>>>>>>> The kernel consider thread AMR value for kernel thread to be >>>>>>>> AMR_KUAP_BLOCKED. Hence access to userspace is denied. This >>>>>>>> of course not correct and we should allow userspace access after >>>>>>>> kthread_use_mm(). To be precise, kthread_use_mm() should inherit the >>>>>>>> AMR value of the operating address space. But, the AMR value is >>>>>>>> thread-specific and we inherit the address space and not thread >>>>>>>> access restrictions. Because of this ignore AMR value when accessing >>>>>>>> userspace via kernel thread. >>>>>>>> >>>>>>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> >>>>>>>> --- >>>>>>>> Changes from v1: >>>>>>>> * Address review feedback from Nick >>>>>>>> >>>>>>>> arch/powerpc/include/asm/book3s/64/kup.h | 8 +++++++- >>>>>>>> 1 file changed, 7 insertions(+), 1 deletion(-) >>>>>>>> >>>>>>>> diff --git a/arch/powerpc/include/asm/book3s/64/kup.h >>>>>>>> b/arch/powerpc/include/asm/book3s/64/kup.h >>>>>>>> index f50f72e535aa..95f4df99249e 100644 >>>>>>>> --- a/arch/powerpc/include/asm/book3s/64/kup.h >>>>>>>> +++ b/arch/powerpc/include/asm/book3s/64/kup.h >>>>>>>> @@ -384,7 +384,13 @@ static __always_inline void >>>>>>>> allow_user_access(void __user *to, const void __user >>>>>>>> // This is written so we can resolve to a single case at build >>>>>>>> time >>>>>>>> BUILD_BUG_ON(!__builtin_constant_p(dir)); >>>>>>>> - if (mmu_has_feature(MMU_FTR_PKEY)) >>>>>>>> + /* >>>>>>>> + * if it is a kthread that did kthread_use_mm() don't >>>>>>>> + * use current_thread_amr(). >>>>>>> >>>>>>> According to include/linux/sched.h, PF_KTHREAD means /* I am a kernel >>>>>>> thread */ >>>>>>> It doesn't seem to be related to kthread_use_mm() >>>>>> >>>>>> That should be a sufficient check here. if we did reach here without >>>>>> calling kthread_user_mm, we will crash on access because we don't have >>>>>> a mm attached to the current process. a kernel thread with >>>>>> kthread_use_mm has >>>>> >>>>> Ok but then the comment doesn't match the check. >>>> >>>> >>>> I was trying to be explict in the comment that we expect the thread to >>>> have done kthread_use_mm(). >>>> >>>>> >>>>> And also the comment in current_thread_amr() is then misleading. >>>>> >>>>> Why not do the current->flags & PF_KTHREAD check in current_thread_amr() >>>>> and return 0 in that case instead of BLOCKED ? >>>> >>>> In my view currrent_thread_amr() is more generic and we want to be >>>> explicit there that a kernel thread AMR is KUAP_BLOCKED. Only when we >>>> call allow user access, we relax the AMR value. >>> >>> Just following up on this, as I'd hate to have 5.11 released with this >>> bug in it for powerpc. It'll obviously also affect other cases of a >>> kernel thread faulting after having done kthread_use_mm(), though I'm >>> not sure how widespread that is. In any case, it'll leave io_uring >>> mostly broken on powerpc if this isn't patched for release. >>> >> >> I am waiting for test feedback on the change I posted earlier. I am also >> running a regression run myself. Once that is complete i will post the patch >> as a separate email. > > Are you waiting a test from me? Or someone else who test PPC? Although I'm > the "Reported-by" of this bug, I just can help to verify this bug itself, > I don't have enough test cases to do regression test from PPC side. Do you need > me to verify this bug itself. > > If you can verify this bug, it would be really helpful. -aneesh
On Thu, Feb 04, 2021 at 10:31:59PM +0530, Aneesh Kumar K.V wrote: > On 2/4/21 10:23 PM, Jens Axboe wrote: > > On 2/1/21 11:30 PM, Aneesh Kumar K.V wrote: > > > On 2/2/21 11:50 AM, Christophe Leroy wrote: > > > > > > > > > > > > Le 02/02/2021 à 07:16, Aneesh Kumar K.V a écrit : > > > > > On 2/2/21 11:32 AM, Christophe Leroy wrote: > > > > > > > > > > > > > > > > > > Le 02/02/2021 à 06:55, Aneesh Kumar K.V a écrit : > > > > > > > Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> writes: > > > > > > > > > > > > > > > Nicholas Piggin <npiggin@gmail.com> writes: > > > > > > > > > > > > > > > > > Excerpts from Michael Ellerman's message of January 30, 2021 9:22 pm: > > > > > > > > > > Christophe Leroy <christophe.leroy@csgroup.eu> writes: > > > > > > > > > > > +Aneesh > > > > > > > > > > > > > > > > > > > > > > Le 29/01/2021 à 07:52, Zorro Lang a écrit : > > > > > > > > > > .. > > > > > > > > > > > > [ 96.200296] ------------[ cut here ]------------ > > > > > > > > > > > > [ 96.200304] Bug: Read fault blocked by KUAP! > > > > > > > > > > > > [ 96.200309] WARNING: CPU: 3 PID: 1876 at > > > > > > > > > > > > arch/powerpc/mm/fault.c:229 bad_kernel_fault+0x180/0x310 > > > > > > > > > > > > > > > > > > > > > > > [ 96.200734] NIP [c000000000849424] > > > > > > > > > > > > fault_in_pages_readable+0x104/0x350 > > > > > > > > > > > > [ 96.200741] LR [c00000000084952c] > > > > > > > > > > > > fault_in_pages_readable+0x20c/0x350 > > > > > > > > > > > > [ 96.200747] --- interrupt: 300 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Problem happens in a section where userspace access is supposed > > > > > > > > > > > to be granted, so the patch you > > > > > > > > > > > proposed is definitely not the right fix. > > > > > > > > > > > > > > > > > > > > > > c000000000849408: 2c 01 00 4c isync > > > > > > > > > > > c00000000084940c: a6 03 3d 7d mtspr 29,r9 <== granting > > > > > > > > > > > userspace access permission > > > > > > > > > > > c000000000849410: 2c 01 00 4c isync > > > > > > > > > > > c000000000849414: 00 00 36 e9 ld r9,0(r22) > > > > > > > > > > > c000000000849418: 20 00 29 81 lwz r9,32(r9) > > > > > > > > > > > c00000000084941c: 00 02 29 71 andi. r9,r9,512 > > > > > > > > > > > c000000000849420: 78 d3 5e 7f mr r30,r26 > > > > > > > > > > > ==> c000000000849424: 00 00 bf 8b lbz r29,0(r31) <== > > > > > > > > > > > accessing userspace > > > > > > > > > > > c000000000849428: 10 00 82 41 beq c000000000849438 > > > > > > > > > > > <fault_in_pages_readable+0x118> > > > > > > > > > > > c00000000084942c: 2c 01 00 4c isync > > > > > > > > > > > c000000000849430: a6 03 bd 7e mtspr 29,r21 <== > > > > > > > > > > > clearing userspace access permission > > > > > > > > > > > c000000000849434: 2c 01 00 4c isync > > > > > > > > > > > > > > > > > > > > > > My first guess is that the problem is linked to the following > > > > > > > > > > > function, see the comment > > > > > > > > > > > > > > > > > > > > > > /* > > > > > > > > > > > * For kernel thread that doesn't have thread.regs return > > > > > > > > > > > * default AMR/IAMR values. > > > > > > > > > > > */ > > > > > > > > > > > static inline u64 current_thread_amr(void) > > > > > > > > > > > { > > > > > > > > > > > if (current->thread.regs) > > > > > > > > > > > return current->thread.regs->amr; > > > > > > > > > > > return AMR_KUAP_BLOCKED; > > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > Above function was introduced by commit 48a8ab4eeb82 > > > > > > > > > > > ("powerpc/book3s64/pkeys: Don't update SPRN_AMR > > > > > > > > > > > when in kernel mode") > > > > > > > > > > > > > > > > > > > > Yeah that's a bit of a curly one. > > > > > > > > > > > > > > > > > > > > At some point io_uring did kthread_use_mm(), which is supposed to > > > > > > > > > > mean > > > > > > > > > > the kthread can operate on behalf of the original process that > > > > > > > > > > submitted > > > > > > > > > > the IO. > > > > > > > > > > > > > > > > > > > > But because KUAP is implemented using memory protection keys, it > > > > > > > > > > depends > > > > > > > > > > on the value of the AMR register, which is not part of the mm, > > > > > > > > > > it's in > > > > > > > > > > thread.regs->amr. > > > > > > > > > > > > > > > > > > > > And what's worse by the time we're in kthread_use_mm() we no > > > > > > > > > > longer have > > > > > > > > > > access to the thread.regs->amr of the original process that > > > > > > > > > > submitted > > > > > > > > > > the IO. > > > > > > > > > > > > > > > > > > > > We also can't simply move the AMR into the mm, precisely because > > > > > > > > > > it's > > > > > > > > > > per thread, not per mm. > > > > > > > > > > > > > > > > > > > > So TBH I don't know how we're going to fix this. > > > > > > > > > > > > > > > > > > > > I guess we could return AMR=unblocked for kernel threads, but that's > > > > > > > > > > arguably a bug because it allows a process to circumvent memory > > > > > > > > > > keys by > > > > > > > > > > asking the kernel to do the access. > > > > > > > > > > > > > > > > > > We shouldn't need to inherit AMR should we? We only need it to be > > > > > > > > > locked > > > > > > > > > for kernel threads until it's explicitly unlocked -- nothing mm > > > > > > > > > specific > > > > > > > > > there. I think current_thread_amr could return 0 for kernel > > > > > > > > > threads? Or > > > > > > > > > I would even avoid using that function for allow_user_access and open > > > > > > > > > code the kthread case and remove it from current_thread_amr(). > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > Nick > > > > > > > > > > > > > > > > > > > > > > updated one > > > > > > > > > > > > > > From 8fdb0680f983940d61f91da8252b13c8d3e8ebee Mon Sep 17 00:00:00 2001 > > > > > > > From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> > > > > > > > Date: Tue, 2 Feb 2021 09:23:38 +0530 > > > > > > > Subject: [PATCH v2] powerpc/kuap: Allow kernel thread to access > > > > > > > userspace > > > > > > > after kthread_use_mm > > > > > > > > > > > > > > This fix the bad fault reported by KUAP when io_wqe_worker access > > > > > > > userspace. > > > > > > > > > > > > > > Bug: Read fault blocked by KUAP! > > > > > > > WARNING: CPU: 1 PID: 101841 at arch/powerpc/mm/fault.c:229 > > > > > > > __do_page_fault+0x6b4/0xcd0 > > > > > > > NIP [c00000000009e7e4] __do_page_fault+0x6b4/0xcd0 > > > > > > > LR [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 > > > > > > > .......... > > > > > > > Call Trace: > > > > > > > [c000000016367330] [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 > > > > > > > (unreliable) > > > > > > > [c0000000163673e0] [c00000000009ee3c] do_page_fault+0x3c/0x120 > > > > > > > [c000000016367430] [c00000000000c848] handle_page_fault+0x10/0x2c > > > > > > > --- interrupt: 300 at iov_iter_fault_in_readable+0x148/0x6f0 > > > > > > > .......... > > > > > > > NIP [c0000000008e8228] iov_iter_fault_in_readable+0x148/0x6f0 > > > > > > > LR [c0000000008e834c] iov_iter_fault_in_readable+0x26c/0x6f0 > > > > > > > interrupt: 300 > > > > > > > [c0000000163677e0] [c0000000007154a0] iomap_write_actor+0xc0/0x280 > > > > > > > [c000000016367880] [c00000000070fc94] iomap_apply+0x1c4/0x780 > > > > > > > [c000000016367990] [c000000000710330] > > > > > > > iomap_file_buffered_write+0xa0/0x120 > > > > > > > [c0000000163679e0] [c00800000040791c] > > > > > > > xfs_file_buffered_aio_write+0x314/0x5e0 [xfs] > > > > > > > [c000000016367a90] [c0000000006d74bc] io_write+0x10c/0x460 > > > > > > > [c000000016367bb0] [c0000000006d80e4] io_issue_sqe+0x8d4/0x1200 > > > > > > > [c000000016367c70] [c0000000006d8ad0] io_wq_submit_work+0xc0/0x250 > > > > > > > [c000000016367cb0] [c0000000006e2578] > > > > > > > io_worker_handle_work+0x498/0x800 > > > > > > > [c000000016367d40] [c0000000006e2cdc] io_wqe_worker+0x3fc/0x4f0 > > > > > > > [c000000016367da0] [c0000000001cb0a4] kthread+0x1c4/0x1d0 > > > > > > > [c000000016367e10] [c00000000000dbf0] > > > > > > > ret_from_kernel_thread+0x5c/0x6c > > > > > > > > > > > > > > The kernel consider thread AMR value for kernel thread to be > > > > > > > AMR_KUAP_BLOCKED. Hence access to userspace is denied. This > > > > > > > of course not correct and we should allow userspace access after > > > > > > > kthread_use_mm(). To be precise, kthread_use_mm() should inherit the > > > > > > > AMR value of the operating address space. But, the AMR value is > > > > > > > thread-specific and we inherit the address space and not thread > > > > > > > access restrictions. Because of this ignore AMR value when accessing > > > > > > > userspace via kernel thread. > > > > > > > > > > > > > > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> > > > > > > > --- > > > > > > > Changes from v1: > > > > > > > * Address review feedback from Nick > > > > > > > > > > > > > > arch/powerpc/include/asm/book3s/64/kup.h | 8 +++++++- > > > > > > > 1 file changed, 7 insertions(+), 1 deletion(-) > > > > > > > > > > > > > > diff --git a/arch/powerpc/include/asm/book3s/64/kup.h > > > > > > > b/arch/powerpc/include/asm/book3s/64/kup.h > > > > > > > index f50f72e535aa..95f4df99249e 100644 > > > > > > > --- a/arch/powerpc/include/asm/book3s/64/kup.h > > > > > > > +++ b/arch/powerpc/include/asm/book3s/64/kup.h > > > > > > > @@ -384,7 +384,13 @@ static __always_inline void > > > > > > > allow_user_access(void __user *to, const void __user > > > > > > > // This is written so we can resolve to a single case at build > > > > > > > time > > > > > > > BUILD_BUG_ON(!__builtin_constant_p(dir)); > > > > > > > - if (mmu_has_feature(MMU_FTR_PKEY)) > > > > > > > + /* > > > > > > > + * if it is a kthread that did kthread_use_mm() don't > > > > > > > + * use current_thread_amr(). > > > > > > > > > > > > According to include/linux/sched.h, PF_KTHREAD means /* I am a kernel > > > > > > thread */ > > > > > > It doesn't seem to be related to kthread_use_mm() > > > > > > > > > > That should be a sufficient check here. if we did reach here without > > > > > calling kthread_user_mm, we will crash on access because we don't have > > > > > a mm attached to the current process. a kernel thread with > > > > > kthread_use_mm has > > > > > > > > Ok but then the comment doesn't match the check. > > > > > > > > > I was trying to be explict in the comment that we expect the thread to > > > have done kthread_use_mm(). > > > > > > > > > > > And also the comment in current_thread_amr() is then misleading. > > > > > > > > Why not do the current->flags & PF_KTHREAD check in current_thread_amr() > > > > and return 0 in that case instead of BLOCKED ? > > > > > > In my view currrent_thread_amr() is more generic and we want to be > > > explicit there that a kernel thread AMR is KUAP_BLOCKED. Only when we > > > call allow user access, we relax the AMR value. > > > > Just following up on this, as I'd hate to have 5.11 released with this > > bug in it for powerpc. It'll obviously also affect other cases of a > > kernel thread faulting after having done kthread_use_mm(), though I'm > > not sure how widespread that is. In any case, it'll leave io_uring > > mostly broken on powerpc if this isn't patched for release. > > > > I am waiting for test feedback on the change I posted earlier. I am also > running a regression run myself. Once that is complete i will post the patch > as a separate email. Are you waiting a test from me? Or someone else who test PPC? Although I'm the "Reported-by" of this bug, I just can help to verify this bug itself, I don't have enough test cases to do regression test from PPC side. Do you need me to verify this bug itself. Thanks, Zorro > > -aneesh >
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c index 8961b44f350c..5a4d6af04c99 100644 --- a/arch/powerpc/mm/fault.c +++ b/arch/powerpc/mm/fault.c @@ -417,9 +417,12 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address, /* * The kernel should never take an execute fault nor should it * take a page fault to a kernel address or a page fault to a user - * address outside of dedicated places + * address outside of dedicated places. Use mm to decide if this is + * valid or not, it's perfectly legitimate to have a kernel thread + * take a fault as long as it's performed kthread_use_mm() prior. An + * example of that would be io_uring helper threads. */ - if (unlikely(!is_user && bad_kernel_fault(regs, error_code, address, is_write))) + if (unlikely(!mm && bad_kernel_fault(regs, error_code, address, is_write))) return SIGSEGV; /*
On powerpc, io_uring test hit below KUAP fault on __do_page_fault. The fail source line is: if (unlikely(!is_user && bad_kernel_fault(regs, error_code, address, is_write))) return SIGSEGV; The is_user() is based on user_mod(regs) only. This's not suit for io_uring, where the helper thread can assume the user app identity and could perform this fault just fine. So turn to use mm to decide if this is valid or not. [ 556.472666] ------------[ cut here ]------------ [ 556.472686] Bug: Read fault blocked by KUAP! [ 556.472697] WARNING: CPU: 1 PID: 101841 at arch/powerpc/mm/fault.c:229 __do_page_fault+0x6b4/0xcd0 [ 556.472728] Modules linked in: bonding rfkill sunrpc pseries_rng xts uio_pdrv_genirq vmx_crypto uio ip_tables xfs libcrc32c sd_mod t10_pi sg ibmvscsi ibmveth scsi_transport_srp [ 556.472816] CPU: 1 PID: 101841 Comm: io_wqe_worker-0 Tainted: G W 5.11.0-rc3+ #2 [ 556.472830] NIP: c00000000009e7e4 LR: c00000000009e7e0 CTR: 0000000000000000 [ 556.472842] REGS: c000000016367090 TRAP: 0700 Tainted: G W (5.11.0-rc3+) [ 556.472853] MSR: 8000000000021033 <SF,ME,IR,DR,RI,LE> CR: 48022424 XER: 00000001 [ 556.472901] CFAR: c0000000001822ac IRQMASK: 1 GPR00: c00000000009e7e0 c000000016367330 c0000000023fc300 0000000000000020 GPR04: c000000001e3c2b8 0000000000000001 0000000000000027 c0000007fbcccc90 GPR08: 0000000000000023 0000000000000000 c000000024ed0900 fffffffffc464a58 GPR12: 0000000000002000 c00000001ecaf280 c0000000001caee8 c000000014d547c0 GPR16: 0000000000000000 0000000000000000 0000000000000000 c000000002454018 GPR20: c000000001336480 bfffffffffffffff 0000000000000000 c00000000b0e5800 GPR24: a8aaaaaaaaaaaaaa 0000000000000000 0000000000200000 c00000002cc38880 GPR28: 000001000e3c9310 c0000000013424c0 c0000000163674a0 c000000001e0d2c0 [ 556.473125] NIP [c00000000009e7e4] __do_page_fault+0x6b4/0xcd0 [ 556.473139] LR [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 [ 556.473152] Call Trace: [ 556.473168] [c000000016367330] [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 (unreliable) [ 556.473198] [c0000000163673e0] [c00000000009ee3c] do_page_fault+0x3c/0x120 [ 556.473216] [c000000016367430] [c00000000000c848] handle_page_fault+0x10/0x2c [ 556.473232] --- interrupt: 300 at iov_iter_fault_in_readable+0x148/0x6f0 [ 556.473245] NIP: c0000000008e8228 LR: c0000000008e834c CTR: 0000000000000000 [ 556.473257] REGS: c0000000163674a0 TRAP: 0300 Tainted: G W (5.11.0-rc3+) [ 556.473268] MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 44008482 XER: 00000001 [ 556.473339] CFAR: c0000000008e81f0 DAR: 000001000e3c9310 DSISR: 00200000 IRQMASK: 0 GPR00: c0000000008e834c c000000016367740 c0000000023fc300 0000000000000000 GPR04: c00000002cc389e0 0000000000000001 00000007fa4b0000 c0000000025bc520 GPR08: 00000007fa4b0000 0000000000000200 fcffffffffffffff ffffffffffea2ad8 GPR12: 0000000000008000 c00000001ecaf280 c0000000001caee8 c000000014d547c0 GPR16: 0000000000000000 0000000000000000 0000000000000000 c000000002454018 GPR20: c000000001336480 bfffffffffffffff 0000000000000000 c00000000b0e5800 GPR24: a8aaaaaaaaaaaaaa fcffffffffffffff 00000000000004b1 00000000000004b1 GPR28: c00000000b0e5888 000001000e3c97c0 0000000000000000 000001000e3c9310 [ 556.473667] NIP [c0000000008e8228] iov_iter_fault_in_readable+0x148/0x6f0 [ 556.473688] LR [c0000000008e834c] iov_iter_fault_in_readable+0x26c/0x6f0 [ 556.473708] --- interrupt: 300 [ 556.473722] [c0000000163677e0] [c0000000007154a0] iomap_write_actor+0xc0/0x280 [ 556.473770] [c000000016367880] [c00000000070fc94] iomap_apply+0x1c4/0x780 [ 556.473804] [c000000016367990] [c000000000710330] iomap_file_buffered_write+0xa0/0x120 [ 556.473839] [c0000000163679e0] [c00800000040791c] xfs_file_buffered_aio_write+0x314/0x5e0 [xfs] [ 556.474053] [c000000016367a90] [c0000000006d74bc] io_write+0x10c/0x460 [ 556.474101] [c000000016367bb0] [c0000000006d80e4] io_issue_sqe+0x8d4/0x1200 [ 556.474132] [c000000016367c70] [c0000000006d8ad0] io_wq_submit_work+0xc0/0x250 [ 556.474161] [c000000016367cb0] [c0000000006e2578] io_worker_handle_work+0x498/0x800 [ 556.474192] [c000000016367d40] [c0000000006e2cdc] io_wqe_worker+0x3fc/0x4f0 [ 556.474223] [c000000016367da0] [c0000000001cb0a4] kthread+0x1c4/0x1d0 [ 556.474254] [c000000016367e10] [c00000000000dbf0] ret_from_kernel_thread+0x5c/0x6c [ 556.474286] Instruction dump: [ 556.474310] e87e0100 481287f1 60000000 2fa30000 419e01ec 408e0400 3c82fef4 388461d0 [ 556.474395] 3c62fef4 386362d0 480e3a69 60000000 <0fe00000> 3860000b 4bfffa08 3d220006 [ 556.474479] irq event stamp: 1280 [ 556.474505] hardirqs last enabled at (1279): [<c0000000005a0104>] __slab_free+0x3e4/0x570 [ 556.474540] hardirqs last disabled at (1280): [<c000000000008a04>] data_access_common_virt+0x1a4/0x1c0 [ 556.474565] softirqs last enabled at (536): [<c00000000107cdfc>] __do_softirq+0x6ac/0x7f4 [ 556.474590] softirqs last disabled at (437): [<c00000000019179c>] irq_exit+0x2ec/0x320 [ 556.474615] ---[ end trace 4c1967c400992302 ]--- Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=211151 Suggested-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Zorro Lang <zlang@redhat.com> --- Hi, Thanks to Jens reviewed this bug report from io_uring side, and then suggest this fix. But we're not expert of powerpc, so report this bug to powerpc maillist to get more review. I've tested this patch, I can't reproduce this bug after merge this patch. And can reproduce it after remove this patch. Thanks, Zorro arch/powerpc/mm/fault.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)