diff mbox series

powerpc/fault: fix wrong KUAP fault for IO_URING

Message ID 20210127145648.348135-1-zlang@redhat.com (mailing list archive)
State Rejected
Headers show
Series powerpc/fault: fix wrong KUAP fault for IO_URING | expand

Checks

Context Check Description
snowpatch_ozlabs/apply_patch success Successfully applied on branch powerpc/merge (41d8cb7ece7c81e4eb897ed7ec7d3c3d72fd0af4)
snowpatch_ozlabs/build-ppc64le success Build succeeded
snowpatch_ozlabs/build-ppc64be success Build succeeded
snowpatch_ozlabs/build-ppc64e success Build succeeded
snowpatch_ozlabs/build-pmac32 success Build succeeded
snowpatch_ozlabs/checkpatch warning total: 0 errors, 1 warnings, 0 checks, 14 lines checked
snowpatch_ozlabs/needsstable success Patch has no Fixes tags

Commit Message

Zorro Lang Jan. 27, 2021, 2:56 p.m. UTC
On powerpc, io_uring test hit below KUAP fault on __do_page_fault.
The fail source line is:

  if (unlikely(!is_user && bad_kernel_fault(regs, error_code, address, is_write)))
      return SIGSEGV;

The is_user() is based on user_mod(regs) only. This's not suit for
io_uring, where the helper thread can assume the user app identity
and could perform this fault just fine. So turn to use mm to decide
if this is valid or not.

[  556.472666] ------------[ cut here ]------------
[  556.472686] Bug: Read fault blocked by KUAP!
[  556.472697] WARNING: CPU: 1 PID: 101841 at arch/powerpc/mm/fault.c:229 __do_page_fault+0x6b4/0xcd0
[  556.472728] Modules linked in: bonding rfkill sunrpc pseries_rng xts uio_pdrv_genirq vmx_crypto uio ip_tables xfs libcrc32c sd_mod t10_pi sg ibmvscsi ibmveth scsi_transport_srp
[  556.472816] CPU: 1 PID: 101841 Comm: io_wqe_worker-0 Tainted: G        W         5.11.0-rc3+ #2
[  556.472830] NIP:  c00000000009e7e4 LR: c00000000009e7e0 CTR: 0000000000000000
[  556.472842] REGS: c000000016367090 TRAP: 0700   Tainted: G        W          (5.11.0-rc3+)
[  556.472853] MSR:  8000000000021033 <SF,ME,IR,DR,RI,LE>  CR: 48022424  XER: 00000001
[  556.472901] CFAR: c0000000001822ac IRQMASK: 1
               GPR00: c00000000009e7e0 c000000016367330 c0000000023fc300 0000000000000020
               GPR04: c000000001e3c2b8 0000000000000001 0000000000000027 c0000007fbcccc90
               GPR08: 0000000000000023 0000000000000000 c000000024ed0900 fffffffffc464a58
               GPR12: 0000000000002000 c00000001ecaf280 c0000000001caee8 c000000014d547c0
               GPR16: 0000000000000000 0000000000000000 0000000000000000 c000000002454018
               GPR20: c000000001336480 bfffffffffffffff 0000000000000000 c00000000b0e5800
               GPR24: a8aaaaaaaaaaaaaa 0000000000000000 0000000000200000 c00000002cc38880
               GPR28: 000001000e3c9310 c0000000013424c0 c0000000163674a0 c000000001e0d2c0
[  556.473125] NIP [c00000000009e7e4] __do_page_fault+0x6b4/0xcd0
[  556.473139] LR [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0
[  556.473152] Call Trace:
[  556.473168] [c000000016367330] [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 (unreliable)
[  556.473198] [c0000000163673e0] [c00000000009ee3c] do_page_fault+0x3c/0x120
[  556.473216] [c000000016367430] [c00000000000c848] handle_page_fault+0x10/0x2c
[  556.473232] --- interrupt: 300 at iov_iter_fault_in_readable+0x148/0x6f0
[  556.473245] NIP:  c0000000008e8228 LR: c0000000008e834c CTR: 0000000000000000
[  556.473257] REGS: c0000000163674a0 TRAP: 0300   Tainted: G        W          (5.11.0-rc3+)
[  556.473268] MSR:  800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 44008482  XER: 00000001
[  556.473339] CFAR: c0000000008e81f0 DAR: 000001000e3c9310 DSISR: 00200000 IRQMASK: 0
               GPR00: c0000000008e834c c000000016367740 c0000000023fc300 0000000000000000
               GPR04: c00000002cc389e0 0000000000000001 00000007fa4b0000 c0000000025bc520
               GPR08: 00000007fa4b0000 0000000000000200 fcffffffffffffff ffffffffffea2ad8
               GPR12: 0000000000008000 c00000001ecaf280 c0000000001caee8 c000000014d547c0
               GPR16: 0000000000000000 0000000000000000 0000000000000000 c000000002454018
               GPR20: c000000001336480 bfffffffffffffff 0000000000000000 c00000000b0e5800
               GPR24: a8aaaaaaaaaaaaaa fcffffffffffffff 00000000000004b1 00000000000004b1
               GPR28: c00000000b0e5888 000001000e3c97c0 0000000000000000 000001000e3c9310
[  556.473667] NIP [c0000000008e8228] iov_iter_fault_in_readable+0x148/0x6f0
[  556.473688] LR [c0000000008e834c] iov_iter_fault_in_readable+0x26c/0x6f0
[  556.473708] --- interrupt: 300
[  556.473722] [c0000000163677e0] [c0000000007154a0] iomap_write_actor+0xc0/0x280
[  556.473770] [c000000016367880] [c00000000070fc94] iomap_apply+0x1c4/0x780
[  556.473804] [c000000016367990] [c000000000710330] iomap_file_buffered_write+0xa0/0x120
[  556.473839] [c0000000163679e0] [c00800000040791c] xfs_file_buffered_aio_write+0x314/0x5e0 [xfs]
[  556.474053] [c000000016367a90] [c0000000006d74bc] io_write+0x10c/0x460
[  556.474101] [c000000016367bb0] [c0000000006d80e4] io_issue_sqe+0x8d4/0x1200
[  556.474132] [c000000016367c70] [c0000000006d8ad0] io_wq_submit_work+0xc0/0x250
[  556.474161] [c000000016367cb0] [c0000000006e2578] io_worker_handle_work+0x498/0x800
[  556.474192] [c000000016367d40] [c0000000006e2cdc] io_wqe_worker+0x3fc/0x4f0
[  556.474223] [c000000016367da0] [c0000000001cb0a4] kthread+0x1c4/0x1d0
[  556.474254] [c000000016367e10] [c00000000000dbf0] ret_from_kernel_thread+0x5c/0x6c
[  556.474286] Instruction dump:
[  556.474310] e87e0100 481287f1 60000000 2fa30000 419e01ec 408e0400 3c82fef4 388461d0
[  556.474395] 3c62fef4 386362d0 480e3a69 60000000 <0fe00000> 3860000b 4bfffa08 3d220006
[  556.474479] irq event stamp: 1280
[  556.474505] hardirqs last  enabled at (1279): [<c0000000005a0104>] __slab_free+0x3e4/0x570
[  556.474540] hardirqs last disabled at (1280): [<c000000000008a04>] data_access_common_virt+0x1a4/0x1c0
[  556.474565] softirqs last  enabled at (536): [<c00000000107cdfc>] __do_softirq+0x6ac/0x7f4
[  556.474590] softirqs last disabled at (437): [<c00000000019179c>] irq_exit+0x2ec/0x320
[  556.474615] ---[ end trace 4c1967c400992302 ]---

Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=211151
Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Zorro Lang <zlang@redhat.com>
---

Hi,

Thanks to Jens reviewed this bug report from io_uring side, and then suggest this
fix. But we're not expert of powerpc, so report this bug to powerpc maillist to
get more review.

I've tested this patch, I can't reproduce this bug after merge this patch. And
can reproduce it after remove this patch.

Thanks,
Zorro

 arch/powerpc/mm/fault.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Comments

Christophe Leroy Jan. 27, 2021, 4:38 p.m. UTC | #1
Le 27/01/2021 à 15:56, Zorro Lang a écrit :
> On powerpc, io_uring test hit below KUAP fault on __do_page_fault.
> The fail source line is:
> 
>    if (unlikely(!is_user && bad_kernel_fault(regs, error_code, address, is_write)))
>        return SIGSEGV;
> 
> The is_user() is based on user_mod(regs) only. This's not suit for
> io_uring, where the helper thread can assume the user app identity
> and could perform this fault just fine. So turn to use mm to decide
> if this is valid or not.

I don't understand why testing is_user would be an issue.
KUAP purpose it to block any unallowed access from kernel to user memory (Equivalent to SMAP on 
x86). So it really must be based on MSR_PR bit, that is what is_user provides.

If the kernel access is legitimate, kernel should have opened userspace access then you shouldn't 
get this "Bug: Read fault blocked by KUAP!".

As far as I understand, the fault occurs in iov_iter_fault_in_readable() which calls 
fault_in_pages_readable()
And fault_in_pages_readable() uses __get_user() so it is a legitimate access and you really should 
get a KUAP fault.

So the problem is somewhere else, I think you proposed patch just hides the problem, it doesn't fix it.

Can you provide your vmlinux binary together with your .config ?


> 
> [  556.472666] ------------[ cut here ]------------
> [  556.472686] Bug: Read fault blocked by KUAP!
> [  556.472697] WARNING: CPU: 1 PID: 101841 at arch/powerpc/mm/fault.c:229 __do_page_fault+0x6b4/0xcd0
> [  556.472728] Modules linked in: bonding rfkill sunrpc pseries_rng xts uio_pdrv_genirq vmx_crypto uio ip_tables xfs libcrc32c sd_mod t10_pi sg ibmvscsi ibmveth scsi_transport_srp
> [  556.472816] CPU: 1 PID: 101841 Comm: io_wqe_worker-0 Tainted: G        W         5.11.0-rc3+ #2
> [  556.472830] NIP:  c00000000009e7e4 LR: c00000000009e7e0 CTR: 0000000000000000
> [  556.472842] REGS: c000000016367090 TRAP: 0700   Tainted: G        W          (5.11.0-rc3+)
> [  556.472853] MSR:  8000000000021033 <SF,ME,IR,DR,RI,LE>  CR: 48022424  XER: 00000001
> [  556.472901] CFAR: c0000000001822ac IRQMASK: 1
>                 GPR00: c00000000009e7e0 c000000016367330 c0000000023fc300 0000000000000020
>                 GPR04: c000000001e3c2b8 0000000000000001 0000000000000027 c0000007fbcccc90
>                 GPR08: 0000000000000023 0000000000000000 c000000024ed0900 fffffffffc464a58
>                 GPR12: 0000000000002000 c00000001ecaf280 c0000000001caee8 c000000014d547c0
>                 GPR16: 0000000000000000 0000000000000000 0000000000000000 c000000002454018
>                 GPR20: c000000001336480 bfffffffffffffff 0000000000000000 c00000000b0e5800
>                 GPR24: a8aaaaaaaaaaaaaa 0000000000000000 0000000000200000 c00000002cc38880
>                 GPR28: 000001000e3c9310 c0000000013424c0 c0000000163674a0 c000000001e0d2c0
> [  556.473125] NIP [c00000000009e7e4] __do_page_fault+0x6b4/0xcd0
> [  556.473139] LR [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0
> [  556.473152] Call Trace:
> [  556.473168] [c000000016367330] [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 (unreliable)
> [  556.473198] [c0000000163673e0] [c00000000009ee3c] do_page_fault+0x3c/0x120
> [  556.473216] [c000000016367430] [c00000000000c848] handle_page_fault+0x10/0x2c
> [  556.473232] --- interrupt: 300 at iov_iter_fault_in_readable+0x148/0x6f0
> [  556.473245] NIP:  c0000000008e8228 LR: c0000000008e834c CTR: 0000000000000000
> [  556.473257] REGS: c0000000163674a0 TRAP: 0300   Tainted: G        W          (5.11.0-rc3+)
> [  556.473268] MSR:  800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 44008482  XER: 00000001
> [  556.473339] CFAR: c0000000008e81f0 DAR: 000001000e3c9310 DSISR: 00200000 IRQMASK: 0
>                 GPR00: c0000000008e834c c000000016367740 c0000000023fc300 0000000000000000
>                 GPR04: c00000002cc389e0 0000000000000001 00000007fa4b0000 c0000000025bc520
>                 GPR08: 00000007fa4b0000 0000000000000200 fcffffffffffffff ffffffffffea2ad8
>                 GPR12: 0000000000008000 c00000001ecaf280 c0000000001caee8 c000000014d547c0
>                 GPR16: 0000000000000000 0000000000000000 0000000000000000 c000000002454018
>                 GPR20: c000000001336480 bfffffffffffffff 0000000000000000 c00000000b0e5800
>                 GPR24: a8aaaaaaaaaaaaaa fcffffffffffffff 00000000000004b1 00000000000004b1
>                 GPR28: c00000000b0e5888 000001000e3c97c0 0000000000000000 000001000e3c9310
> [  556.473667] NIP [c0000000008e8228] iov_iter_fault_in_readable+0x148/0x6f0
> [  556.473688] LR [c0000000008e834c] iov_iter_fault_in_readable+0x26c/0x6f0
> [  556.473708] --- interrupt: 300
> [  556.473722] [c0000000163677e0] [c0000000007154a0] iomap_write_actor+0xc0/0x280
> [  556.473770] [c000000016367880] [c00000000070fc94] iomap_apply+0x1c4/0x780
> [  556.473804] [c000000016367990] [c000000000710330] iomap_file_buffered_write+0xa0/0x120
> [  556.473839] [c0000000163679e0] [c00800000040791c] xfs_file_buffered_aio_write+0x314/0x5e0 [xfs]
> [  556.474053] [c000000016367a90] [c0000000006d74bc] io_write+0x10c/0x460
> [  556.474101] [c000000016367bb0] [c0000000006d80e4] io_issue_sqe+0x8d4/0x1200
> [  556.474132] [c000000016367c70] [c0000000006d8ad0] io_wq_submit_work+0xc0/0x250
> [  556.474161] [c000000016367cb0] [c0000000006e2578] io_worker_handle_work+0x498/0x800
> [  556.474192] [c000000016367d40] [c0000000006e2cdc] io_wqe_worker+0x3fc/0x4f0
> [  556.474223] [c000000016367da0] [c0000000001cb0a4] kthread+0x1c4/0x1d0
> [  556.474254] [c000000016367e10] [c00000000000dbf0] ret_from_kernel_thread+0x5c/0x6c
> [  556.474286] Instruction dump:
> [  556.474310] e87e0100 481287f1 60000000 2fa30000 419e01ec 408e0400 3c82fef4 388461d0
> [  556.474395] 3c62fef4 386362d0 480e3a69 60000000 <0fe00000> 3860000b 4bfffa08 3d220006
> [  556.474479] irq event stamp: 1280
> [  556.474505] hardirqs last  enabled at (1279): [<c0000000005a0104>] __slab_free+0x3e4/0x570
> [  556.474540] hardirqs last disabled at (1280): [<c000000000008a04>] data_access_common_virt+0x1a4/0x1c0
> [  556.474565] softirqs last  enabled at (536): [<c00000000107cdfc>] __do_softirq+0x6ac/0x7f4
> [  556.474590] softirqs last disabled at (437): [<c00000000019179c>] irq_exit+0x2ec/0x320
> [  556.474615] ---[ end trace 4c1967c400992302 ]---
> 
> Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=211151
> Suggested-by: Jens Axboe <axboe@kernel.dk>
> Signed-off-by: Zorro Lang <zlang@redhat.com>
> ---
> 
> Hi,
> 
> Thanks to Jens reviewed this bug report from io_uring side, and then suggest this
> fix. But we're not expert of powerpc, so report this bug to powerpc maillist to
> get more review.
> 
> I've tested this patch, I can't reproduce this bug after merge this patch. And
> can reproduce it after remove this patch.
> 
> Thanks,
> Zorro
> 
>   arch/powerpc/mm/fault.c | 7 +++++--
>   1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index 8961b44f350c..5a4d6af04c99 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -417,9 +417,12 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
>   	/*
>   	 * The kernel should never take an execute fault nor should it
>   	 * take a page fault to a kernel address or a page fault to a user
> -	 * address outside of dedicated places
> +	 * address outside of dedicated places. Use mm to decide if this is
> +	 * valid or not, it's perfectly legitimate to have a kernel thread
> +	 * take a fault as long as it's performed kthread_use_mm() prior. An
> +	 * example of that would be io_uring helper threads.
>   	 */
> -	if (unlikely(!is_user && bad_kernel_fault(regs, error_code, address, is_write)))
> +	if (unlikely(!mm && bad_kernel_fault(regs, error_code, address, is_write)))
>   		return SIGSEGV;
>   
>   	/*
>
Jens Axboe Jan. 27, 2021, 7:29 p.m. UTC | #2
On 1/27/21 9:38 AM, Christophe Leroy wrote:
> 
> 
> Le 27/01/2021 à 15:56, Zorro Lang a écrit :
>> On powerpc, io_uring test hit below KUAP fault on __do_page_fault.
>> The fail source line is:
>>
>>    if (unlikely(!is_user && bad_kernel_fault(regs, error_code, address, is_write)))
>>        return SIGSEGV;
>>
>> The is_user() is based on user_mod(regs) only. This's not suit for
>> io_uring, where the helper thread can assume the user app identity
>> and could perform this fault just fine. So turn to use mm to decide
>> if this is valid or not.
> 
> I don't understand why testing is_user would be an issue. KUAP purpose
> it to block any unallowed access from kernel to user memory
> (Equivalent to SMAP on x86). So it really must be based on MSR_PR bit,
> that is what is_user provides.
> 
> If the kernel access is legitimate, kernel should have opened
> userspace access then you shouldn't get this "Bug: Read fault blocked
> by KUAP!".
> 
> As far as I understand, the fault occurs in
> iov_iter_fault_in_readable() which calls fault_in_pages_readable() And
> fault_in_pages_readable() uses __get_user() so it is a legitimate
> access and you really should get a KUAP fault.
> 
> So the problem is somewhere else, I think you proposed patch just
> hides the problem, it doesn't fix it.

If we do kthread_use_mm(), can we agree that the user access is valid?
We should be able to copy to/from user space, and including faults, if
that's been done and the new mm assigned. Because it really should be.
If SMAP was a problem on x86, we would have seen it long ago.

I'm assuming this may be breakage related to the recent uaccess changes
related to set_fs and friends? Or maybe recent changes on the powerpc
side?

Zorro, did 5.10 work?
Nicholas Piggin Jan. 28, 2021, 12:18 a.m. UTC | #3
Excerpts from Jens Axboe's message of January 28, 2021 5:29 am:
> On 1/27/21 9:38 AM, Christophe Leroy wrote:
>> 
>> 
>> Le 27/01/2021 à 15:56, Zorro Lang a écrit :
>>> On powerpc, io_uring test hit below KUAP fault on __do_page_fault.
>>> The fail source line is:
>>>
>>>    if (unlikely(!is_user && bad_kernel_fault(regs, error_code, address, is_write)))
>>>        return SIGSEGV;
>>>
>>> The is_user() is based on user_mod(regs) only. This's not suit for
>>> io_uring, where the helper thread can assume the user app identity
>>> and could perform this fault just fine. So turn to use mm to decide
>>> if this is valid or not.
>> 
>> I don't understand why testing is_user would be an issue. KUAP purpose
>> it to block any unallowed access from kernel to user memory
>> (Equivalent to SMAP on x86). So it really must be based on MSR_PR bit,
>> that is what is_user provides.
>> 
>> If the kernel access is legitimate, kernel should have opened
>> userspace access then you shouldn't get this "Bug: Read fault blocked
>> by KUAP!".
>> 
>> As far as I understand, the fault occurs in
>> iov_iter_fault_in_readable() which calls fault_in_pages_readable() And
>> fault_in_pages_readable() uses __get_user() so it is a legitimate
>> access and you really should get a KUAP fault.
>> 
>> So the problem is somewhere else, I think you proposed patch just
>> hides the problem, it doesn't fix it.
> 
> If we do kthread_use_mm(), can we agree that the user access is valid?

Yeah the io uring code is fine, provided it uses the uaccess primitives 
like any other kernel code. It's looking more like a an arch/powerpc bug.

> We should be able to copy to/from user space, and including faults, if
> that's been done and the new mm assigned. Because it really should be.
> If SMAP was a problem on x86, we would have seen it long ago.
> 
> I'm assuming this may be breakage related to the recent uaccess changes
> related to set_fs and friends? Or maybe recent changes on the powerpc
> side?
> 
> Zorro, did 5.10 work?

Would be interesting to know.

Thanks,
Nick
Jens Axboe Jan. 28, 2021, 3:06 a.m. UTC | #4
On 1/27/21 8:13 PM, Zorro Lang wrote:
> On Thu, Jan 28, 2021 at 10:18:07AM +1000, Nicholas Piggin wrote:
>> Excerpts from Jens Axboe's message of January 28, 2021 5:29 am:
>>> On 1/27/21 9:38 AM, Christophe Leroy wrote:
>>>>
>>>>
>>>> Le 27/01/2021 à 15:56, Zorro Lang a écrit :
>>>>> On powerpc, io_uring test hit below KUAP fault on __do_page_fault.
>>>>> The fail source line is:
>>>>>
>>>>>    if (unlikely(!is_user && bad_kernel_fault(regs, error_code, address, is_write)))
>>>>>        return SIGSEGV;
>>>>>
>>>>> The is_user() is based on user_mod(regs) only. This's not suit for
>>>>> io_uring, where the helper thread can assume the user app identity
>>>>> and could perform this fault just fine. So turn to use mm to decide
>>>>> if this is valid or not.
>>>>
>>>> I don't understand why testing is_user would be an issue. KUAP purpose
>>>> it to block any unallowed access from kernel to user memory
>>>> (Equivalent to SMAP on x86). So it really must be based on MSR_PR bit,
>>>> that is what is_user provides.
>>>>
>>>> If the kernel access is legitimate, kernel should have opened
>>>> userspace access then you shouldn't get this "Bug: Read fault blocked
>>>> by KUAP!".
>>>>
>>>> As far as I understand, the fault occurs in
>>>> iov_iter_fault_in_readable() which calls fault_in_pages_readable() And
>>>> fault_in_pages_readable() uses __get_user() so it is a legitimate
>>>> access and you really should get a KUAP fault.
>>>>
>>>> So the problem is somewhere else, I think you proposed patch just
>>>> hides the problem, it doesn't fix it.
>>>
>>> If we do kthread_use_mm(), can we agree that the user access is valid?
>>
>> Yeah the io uring code is fine, provided it uses the uaccess primitives 
>> like any other kernel code. It's looking more like a an arch/powerpc bug.
>>
>>> We should be able to copy to/from user space, and including faults, if
>>> that's been done and the new mm assigned. Because it really should be.
>>> If SMAP was a problem on x86, we would have seen it long ago.
>>>
>>> I'm assuming this may be breakage related to the recent uaccess changes
>>> related to set_fs and friends? Or maybe recent changes on the powerpc
>>> side?
>>>
>>> Zorro, did 5.10 work?
>>
>> Would be interesting to know.
> 
> Sure Nick and Jens, which 5.10 rc? version do you want to know ? Or any git
> commit(be the HEAD) in 5.10 phase?

I forget which versions had what series of this, but 5.10 final - and if
that fails, then 5.9 final. IIRC, 5.9 was pre any of these changes, and
5.10 definitely has them.
Zorro Lang Jan. 28, 2021, 3:13 a.m. UTC | #5
On Thu, Jan 28, 2021 at 10:18:07AM +1000, Nicholas Piggin wrote:
> Excerpts from Jens Axboe's message of January 28, 2021 5:29 am:
> > On 1/27/21 9:38 AM, Christophe Leroy wrote:
> >> 
> >> 
> >> Le 27/01/2021 à 15:56, Zorro Lang a écrit :
> >>> On powerpc, io_uring test hit below KUAP fault on __do_page_fault.
> >>> The fail source line is:
> >>>
> >>>    if (unlikely(!is_user && bad_kernel_fault(regs, error_code, address, is_write)))
> >>>        return SIGSEGV;
> >>>
> >>> The is_user() is based on user_mod(regs) only. This's not suit for
> >>> io_uring, where the helper thread can assume the user app identity
> >>> and could perform this fault just fine. So turn to use mm to decide
> >>> if this is valid or not.
> >> 
> >> I don't understand why testing is_user would be an issue. KUAP purpose
> >> it to block any unallowed access from kernel to user memory
> >> (Equivalent to SMAP on x86). So it really must be based on MSR_PR bit,
> >> that is what is_user provides.
> >> 
> >> If the kernel access is legitimate, kernel should have opened
> >> userspace access then you shouldn't get this "Bug: Read fault blocked
> >> by KUAP!".
> >> 
> >> As far as I understand, the fault occurs in
> >> iov_iter_fault_in_readable() which calls fault_in_pages_readable() And
> >> fault_in_pages_readable() uses __get_user() so it is a legitimate
> >> access and you really should get a KUAP fault.
> >> 
> >> So the problem is somewhere else, I think you proposed patch just
> >> hides the problem, it doesn't fix it.
> > 
> > If we do kthread_use_mm(), can we agree that the user access is valid?
> 
> Yeah the io uring code is fine, provided it uses the uaccess primitives 
> like any other kernel code. It's looking more like a an arch/powerpc bug.
> 
> > We should be able to copy to/from user space, and including faults, if
> > that's been done and the new mm assigned. Because it really should be.
> > If SMAP was a problem on x86, we would have seen it long ago.
> > 
> > I'm assuming this may be breakage related to the recent uaccess changes
> > related to set_fs and friends? Or maybe recent changes on the powerpc
> > side?
> > 
> > Zorro, did 5.10 work?
> 
> Would be interesting to know.

Sure Nick and Jens, which 5.10 rc? version do you want to know ? Or any git
commit(be the HEAD) in 5.10 phase?

Thanks,
Zorro

> 
> Thanks,
> Nick
>
Zorro Lang Jan. 28, 2021, 1:52 p.m. UTC | #6
On Wed, Jan 27, 2021 at 08:06:37PM -0700, Jens Axboe wrote:
> On 1/27/21 8:13 PM, Zorro Lang wrote:
> > On Thu, Jan 28, 2021 at 10:18:07AM +1000, Nicholas Piggin wrote:
> >> Excerpts from Jens Axboe's message of January 28, 2021 5:29 am:
> >>> On 1/27/21 9:38 AM, Christophe Leroy wrote:
> >>>>
> >>>>
> >>>> Le 27/01/2021 à 15:56, Zorro Lang a écrit :
> >>>>> On powerpc, io_uring test hit below KUAP fault on __do_page_fault.
> >>>>> The fail source line is:
> >>>>>
> >>>>>    if (unlikely(!is_user && bad_kernel_fault(regs, error_code, address, is_write)))
> >>>>>        return SIGSEGV;
> >>>>>
> >>>>> The is_user() is based on user_mod(regs) only. This's not suit for
> >>>>> io_uring, where the helper thread can assume the user app identity
> >>>>> and could perform this fault just fine. So turn to use mm to decide
> >>>>> if this is valid or not.
> >>>>
> >>>> I don't understand why testing is_user would be an issue. KUAP purpose
> >>>> it to block any unallowed access from kernel to user memory
> >>>> (Equivalent to SMAP on x86). So it really must be based on MSR_PR bit,
> >>>> that is what is_user provides.
> >>>>
> >>>> If the kernel access is legitimate, kernel should have opened
> >>>> userspace access then you shouldn't get this "Bug: Read fault blocked
> >>>> by KUAP!".
> >>>>
> >>>> As far as I understand, the fault occurs in
> >>>> iov_iter_fault_in_readable() which calls fault_in_pages_readable() And
> >>>> fault_in_pages_readable() uses __get_user() so it is a legitimate
> >>>> access and you really should get a KUAP fault.
> >>>>
> >>>> So the problem is somewhere else, I think you proposed patch just
> >>>> hides the problem, it doesn't fix it.
> >>>
> >>> If we do kthread_use_mm(), can we agree that the user access is valid?
> >>
> >> Yeah the io uring code is fine, provided it uses the uaccess primitives 
> >> like any other kernel code. It's looking more like a an arch/powerpc bug.
> >>
> >>> We should be able to copy to/from user space, and including faults, if
> >>> that's been done and the new mm assigned. Because it really should be.
> >>> If SMAP was a problem on x86, we would have seen it long ago.
> >>>
> >>> I'm assuming this may be breakage related to the recent uaccess changes
> >>> related to set_fs and friends? Or maybe recent changes on the powerpc
> >>> side?
> >>>
> >>> Zorro, did 5.10 work?
> >>
> >> Would be interesting to know.
> > 
> > Sure Nick and Jens, which 5.10 rc? version do you want to know ? Or any git
> > commit(be the HEAD) in 5.10 phase?
> 
> I forget which versions had what series of this, but 5.10 final - and if
> that fails, then 5.9 final. IIRC, 5.9 was pre any of these changes, and
> 5.10 definitely has them.

I justed built linux v5.10 with same .config file, and gave it same test.
v5.10 (HEAD=2c85ebc57b Linux 5.10) can't reproduce this bug:

# ./check generic/013 generic/051
FSTYP         -- xfs (non-debug)
PLATFORM      -- Linux/ppc64le ibm-p9z-xxx-xxxx 5.10.0 #3 SMP Thu Jan 28 04:12:14 EST 2021
MKFS_OPTIONS  -- -f -m crc=1,finobt=1,reflink=1,rmapbt=1,bigtime=1,inobtcount=1 /dev/sda3
MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/sda3 /mnt/xfstests/scratch

generic/013 138s ...  77s
generic/051 103s ...  143s
Ran: generic/013 generic/051
Passed all 2 tests

> 
> -- 
> Jens Axboe
>
Jens Axboe Jan. 28, 2021, 2:42 p.m. UTC | #7
On 1/28/21 6:52 AM, Zorro Lang wrote:
> On Wed, Jan 27, 2021 at 08:06:37PM -0700, Jens Axboe wrote:
>> On 1/27/21 8:13 PM, Zorro Lang wrote:
>>> On Thu, Jan 28, 2021 at 10:18:07AM +1000, Nicholas Piggin wrote:
>>>> Excerpts from Jens Axboe's message of January 28, 2021 5:29 am:
>>>>> On 1/27/21 9:38 AM, Christophe Leroy wrote:
>>>>>>
>>>>>>
>>>>>> Le 27/01/2021 à 15:56, Zorro Lang a écrit :
>>>>>>> On powerpc, io_uring test hit below KUAP fault on __do_page_fault.
>>>>>>> The fail source line is:
>>>>>>>
>>>>>>>    if (unlikely(!is_user && bad_kernel_fault(regs, error_code, address, is_write)))
>>>>>>>        return SIGSEGV;
>>>>>>>
>>>>>>> The is_user() is based on user_mod(regs) only. This's not suit for
>>>>>>> io_uring, where the helper thread can assume the user app identity
>>>>>>> and could perform this fault just fine. So turn to use mm to decide
>>>>>>> if this is valid or not.
>>>>>>
>>>>>> I don't understand why testing is_user would be an issue. KUAP purpose
>>>>>> it to block any unallowed access from kernel to user memory
>>>>>> (Equivalent to SMAP on x86). So it really must be based on MSR_PR bit,
>>>>>> that is what is_user provides.
>>>>>>
>>>>>> If the kernel access is legitimate, kernel should have opened
>>>>>> userspace access then you shouldn't get this "Bug: Read fault blocked
>>>>>> by KUAP!".
>>>>>>
>>>>>> As far as I understand, the fault occurs in
>>>>>> iov_iter_fault_in_readable() which calls fault_in_pages_readable() And
>>>>>> fault_in_pages_readable() uses __get_user() so it is a legitimate
>>>>>> access and you really should get a KUAP fault.
>>>>>>
>>>>>> So the problem is somewhere else, I think you proposed patch just
>>>>>> hides the problem, it doesn't fix it.
>>>>>
>>>>> If we do kthread_use_mm(), can we agree that the user access is valid?
>>>>
>>>> Yeah the io uring code is fine, provided it uses the uaccess primitives 
>>>> like any other kernel code. It's looking more like a an arch/powerpc bug.
>>>>
>>>>> We should be able to copy to/from user space, and including faults, if
>>>>> that's been done and the new mm assigned. Because it really should be.
>>>>> If SMAP was a problem on x86, we would have seen it long ago.
>>>>>
>>>>> I'm assuming this may be breakage related to the recent uaccess changes
>>>>> related to set_fs and friends? Or maybe recent changes on the powerpc
>>>>> side?
>>>>>
>>>>> Zorro, did 5.10 work?
>>>>
>>>> Would be interesting to know.
>>>
>>> Sure Nick and Jens, which 5.10 rc? version do you want to know ? Or any git
>>> commit(be the HEAD) in 5.10 phase?
>>
>> I forget which versions had what series of this, but 5.10 final - and if
>> that fails, then 5.9 final. IIRC, 5.9 was pre any of these changes, and
>> 5.10 definitely has them.
> 
> I justed built linux v5.10 with same .config file, and gave it same test.
> v5.10 (HEAD=2c85ebc57b Linux 5.10) can't reproduce this bug:
> 
> # ./check generic/013 generic/051
> FSTYP         -- xfs (non-debug)
> PLATFORM      -- Linux/ppc64le ibm-p9z-xxx-xxxx 5.10.0 #3 SMP Thu Jan 28 04:12:14 EST 2021
> MKFS_OPTIONS  -- -f -m crc=1,finobt=1,reflink=1,rmapbt=1,bigtime=1,inobtcount=1 /dev/sda3
> MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/sda3 /mnt/xfstests/scratch
> 
> generic/013 138s ...  77s
> generic/051 103s ...  143s
> Ran: generic/013 generic/051
> Passed all 2 tests

Thanks for testing that, so I think it's safe to conclude that there's a
regression in powerpc fault handling for kthreads that use
kthread_use_mm in this release. A bisect would definitely find it, but
might be pointless if Christophe or Nick already have an idea of what it
is.
Christophe Leroy Jan. 28, 2021, 2:44 p.m. UTC | #8
Le 28/01/2021 à 15:42, Jens Axboe a écrit :
> On 1/28/21 6:52 AM, Zorro Lang wrote:
>> On Wed, Jan 27, 2021 at 08:06:37PM -0700, Jens Axboe wrote:
>>> On 1/27/21 8:13 PM, Zorro Lang wrote:
>>>> On Thu, Jan 28, 2021 at 10:18:07AM +1000, Nicholas Piggin wrote:
>>>>> Excerpts from Jens Axboe's message of January 28, 2021 5:29 am:
>>>>>> On 1/27/21 9:38 AM, Christophe Leroy wrote:
>>>>>>>
>>>>>>>
>>>>>>> Le 27/01/2021 à 15:56, Zorro Lang a écrit :
>>>>>>>> On powerpc, io_uring test hit below KUAP fault on __do_page_fault.
>>>>>>>> The fail source line is:
>>>>>>>>
>>>>>>>>     if (unlikely(!is_user && bad_kernel_fault(regs, error_code, address, is_write)))
>>>>>>>>         return SIGSEGV;
>>>>>>>>
>>>>>>>> The is_user() is based on user_mod(regs) only. This's not suit for
>>>>>>>> io_uring, where the helper thread can assume the user app identity
>>>>>>>> and could perform this fault just fine. So turn to use mm to decide
>>>>>>>> if this is valid or not.
>>>>>>>
>>>>>>> I don't understand why testing is_user would be an issue. KUAP purpose
>>>>>>> it to block any unallowed access from kernel to user memory
>>>>>>> (Equivalent to SMAP on x86). So it really must be based on MSR_PR bit,
>>>>>>> that is what is_user provides.
>>>>>>>
>>>>>>> If the kernel access is legitimate, kernel should have opened
>>>>>>> userspace access then you shouldn't get this "Bug: Read fault blocked
>>>>>>> by KUAP!".
>>>>>>>
>>>>>>> As far as I understand, the fault occurs in
>>>>>>> iov_iter_fault_in_readable() which calls fault_in_pages_readable() And
>>>>>>> fault_in_pages_readable() uses __get_user() so it is a legitimate
>>>>>>> access and you really should get a KUAP fault.
>>>>>>>
>>>>>>> So the problem is somewhere else, I think you proposed patch just
>>>>>>> hides the problem, it doesn't fix it.
>>>>>>
>>>>>> If we do kthread_use_mm(), can we agree that the user access is valid?
>>>>>
>>>>> Yeah the io uring code is fine, provided it uses the uaccess primitives
>>>>> like any other kernel code. It's looking more like a an arch/powerpc bug.
>>>>>
>>>>>> We should be able to copy to/from user space, and including faults, if
>>>>>> that's been done and the new mm assigned. Because it really should be.
>>>>>> If SMAP was a problem on x86, we would have seen it long ago.
>>>>>>
>>>>>> I'm assuming this may be breakage related to the recent uaccess changes
>>>>>> related to set_fs and friends? Or maybe recent changes on the powerpc
>>>>>> side?
>>>>>>
>>>>>> Zorro, did 5.10 work?
>>>>>
>>>>> Would be interesting to know.
>>>>
>>>> Sure Nick and Jens, which 5.10 rc? version do you want to know ? Or any git
>>>> commit(be the HEAD) in 5.10 phase?
>>>
>>> I forget which versions had what series of this, but 5.10 final - and if
>>> that fails, then 5.9 final. IIRC, 5.9 was pre any of these changes, and
>>> 5.10 definitely has them.
>>
>> I justed built linux v5.10 with same .config file, and gave it same test.
>> v5.10 (HEAD=2c85ebc57b Linux 5.10) can't reproduce this bug:
>>
>> # ./check generic/013 generic/051
>> FSTYP         -- xfs (non-debug)
>> PLATFORM      -- Linux/ppc64le ibm-p9z-xxx-xxxx 5.10.0 #3 SMP Thu Jan 28 04:12:14 EST 2021
>> MKFS_OPTIONS  -- -f -m crc=1,finobt=1,reflink=1,rmapbt=1,bigtime=1,inobtcount=1 /dev/sda3
>> MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/sda3 /mnt/xfstests/scratch
>>
>> generic/013 138s ...  77s
>> generic/051 103s ...  143s
>> Ran: generic/013 generic/051
>> Passed all 2 tests
> 
> Thanks for testing that, so I think it's safe to conclude that there's a
> regression in powerpc fault handling for kthreads that use
> kthread_use_mm in this release. A bisect would definitely find it, but
> might be pointless if Christophe or Nick already have an idea of what it
> is.
> 

I don't have any idea yet, but I'd be curious to see the vmlinux binary matching the reported Oops.

Christophe
Zorro Lang Jan. 28, 2021, 3:20 p.m. UTC | #9
On Thu, Jan 28, 2021 at 03:44:21PM +0100, Christophe Leroy wrote:
> 
> 
> Le 28/01/2021 à 15:42, Jens Axboe a écrit :
> > On 1/28/21 6:52 AM, Zorro Lang wrote:
> > > On Wed, Jan 27, 2021 at 08:06:37PM -0700, Jens Axboe wrote:
> > > > On 1/27/21 8:13 PM, Zorro Lang wrote:
> > > > > On Thu, Jan 28, 2021 at 10:18:07AM +1000, Nicholas Piggin wrote:
> > > > > > Excerpts from Jens Axboe's message of January 28, 2021 5:29 am:
> > > > > > > On 1/27/21 9:38 AM, Christophe Leroy wrote:
> > > > > > > > 
> > > > > > > > 
> > > > > > > > Le 27/01/2021 à 15:56, Zorro Lang a écrit :
> > > > > > > > > On powerpc, io_uring test hit below KUAP fault on __do_page_fault.
> > > > > > > > > The fail source line is:
> > > > > > > > > 
> > > > > > > > >     if (unlikely(!is_user && bad_kernel_fault(regs, error_code, address, is_write)))
> > > > > > > > >         return SIGSEGV;
> > > > > > > > > 
> > > > > > > > > The is_user() is based on user_mod(regs) only. This's not suit for
> > > > > > > > > io_uring, where the helper thread can assume the user app identity
> > > > > > > > > and could perform this fault just fine. So turn to use mm to decide
> > > > > > > > > if this is valid or not.
> > > > > > > > 
> > > > > > > > I don't understand why testing is_user would be an issue. KUAP purpose
> > > > > > > > it to block any unallowed access from kernel to user memory
> > > > > > > > (Equivalent to SMAP on x86). So it really must be based on MSR_PR bit,
> > > > > > > > that is what is_user provides.
> > > > > > > > 
> > > > > > > > If the kernel access is legitimate, kernel should have opened
> > > > > > > > userspace access then you shouldn't get this "Bug: Read fault blocked
> > > > > > > > by KUAP!".
> > > > > > > > 
> > > > > > > > As far as I understand, the fault occurs in
> > > > > > > > iov_iter_fault_in_readable() which calls fault_in_pages_readable() And
> > > > > > > > fault_in_pages_readable() uses __get_user() so it is a legitimate
> > > > > > > > access and you really should get a KUAP fault.
> > > > > > > > 
> > > > > > > > So the problem is somewhere else, I think you proposed patch just
> > > > > > > > hides the problem, it doesn't fix it.
> > > > > > > 
> > > > > > > If we do kthread_use_mm(), can we agree that the user access is valid?
> > > > > > 
> > > > > > Yeah the io uring code is fine, provided it uses the uaccess primitives
> > > > > > like any other kernel code. It's looking more like a an arch/powerpc bug.
> > > > > > 
> > > > > > > We should be able to copy to/from user space, and including faults, if
> > > > > > > that's been done and the new mm assigned. Because it really should be.
> > > > > > > If SMAP was a problem on x86, we would have seen it long ago.
> > > > > > > 
> > > > > > > I'm assuming this may be breakage related to the recent uaccess changes
> > > > > > > related to set_fs and friends? Or maybe recent changes on the powerpc
> > > > > > > side?
> > > > > > > 
> > > > > > > Zorro, did 5.10 work?
> > > > > > 
> > > > > > Would be interesting to know.
> > > > > 
> > > > > Sure Nick and Jens, which 5.10 rc? version do you want to know ? Or any git
> > > > > commit(be the HEAD) in 5.10 phase?
> > > > 
> > > > I forget which versions had what series of this, but 5.10 final - and if
> > > > that fails, then 5.9 final. IIRC, 5.9 was pre any of these changes, and
> > > > 5.10 definitely has them.
> > > 
> > > I justed built linux v5.10 with same .config file, and gave it same test.
> > > v5.10 (HEAD=2c85ebc57b Linux 5.10) can't reproduce this bug:
> > > 
> > > # ./check generic/013 generic/051
> > > FSTYP         -- xfs (non-debug)
> > > PLATFORM      -- Linux/ppc64le ibm-p9z-xxx-xxxx 5.10.0 #3 SMP Thu Jan 28 04:12:14 EST 2021
> > > MKFS_OPTIONS  -- -f -m crc=1,finobt=1,reflink=1,rmapbt=1,bigtime=1,inobtcount=1 /dev/sda3
> > > MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/sda3 /mnt/xfstests/scratch
> > > 
> > > generic/013 138s ...  77s
> > > generic/051 103s ...  143s
> > > Ran: generic/013 generic/051
> > > Passed all 2 tests
> > 
> > Thanks for testing that, so I think it's safe to conclude that there's a
> > regression in powerpc fault handling for kthreads that use
> > kthread_use_mm in this release. A bisect would definitely find it, but
> > might be pointless if Christophe or Nick already have an idea of what it
> > is.
> > 
> 
> I don't have any idea yet, but I'd be curious to see the vmlinux binary matching the reported Oops.

OK, I don't have the vmlinux matching that bug report now, I can help to prepare a new one, but
I need lots of time (about 10+ hours).

Thanks,
Zorro

> 
> Christophe
>
Zorro Lang Jan. 29, 2021, 6:52 a.m. UTC | #10
On Thu, Jan 28, 2021 at 03:44:21PM +0100, Christophe Leroy wrote:
> 
> 
> Le 28/01/2021 à 15:42, Jens Axboe a écrit :
> > On 1/28/21 6:52 AM, Zorro Lang wrote:
> > > On Wed, Jan 27, 2021 at 08:06:37PM -0700, Jens Axboe wrote:
> > > > On 1/27/21 8:13 PM, Zorro Lang wrote:
> > > > > On Thu, Jan 28, 2021 at 10:18:07AM +1000, Nicholas Piggin wrote:
> > > > > > Excerpts from Jens Axboe's message of January 28, 2021 5:29 am:
> > > > > > > On 1/27/21 9:38 AM, Christophe Leroy wrote:
> > > > > > > > 
> > > > > > > > 
> > > > > > > > Le 27/01/2021 à 15:56, Zorro Lang a écrit :
> > > > > > > > > On powerpc, io_uring test hit below KUAP fault on __do_page_fault.
> > > > > > > > > The fail source line is:
> > > > > > > > > 
> > > > > > > > >     if (unlikely(!is_user && bad_kernel_fault(regs, error_code, address, is_write)))
> > > > > > > > >         return SIGSEGV;
> > > > > > > > > 
> > > > > > > > > The is_user() is based on user_mod(regs) only. This's not suit for
> > > > > > > > > io_uring, where the helper thread can assume the user app identity
> > > > > > > > > and could perform this fault just fine. So turn to use mm to decide
> > > > > > > > > if this is valid or not.
> > > > > > > > 
> > > > > > > > I don't understand why testing is_user would be an issue. KUAP purpose
> > > > > > > > it to block any unallowed access from kernel to user memory
> > > > > > > > (Equivalent to SMAP on x86). So it really must be based on MSR_PR bit,
> > > > > > > > that is what is_user provides.
> > > > > > > > 
> > > > > > > > If the kernel access is legitimate, kernel should have opened
> > > > > > > > userspace access then you shouldn't get this "Bug: Read fault blocked
> > > > > > > > by KUAP!".
> > > > > > > > 
> > > > > > > > As far as I understand, the fault occurs in
> > > > > > > > iov_iter_fault_in_readable() which calls fault_in_pages_readable() And
> > > > > > > > fault_in_pages_readable() uses __get_user() so it is a legitimate
> > > > > > > > access and you really should get a KUAP fault.
> > > > > > > > 
> > > > > > > > So the problem is somewhere else, I think you proposed patch just
> > > > > > > > hides the problem, it doesn't fix it.
> > > > > > > 
> > > > > > > If we do kthread_use_mm(), can we agree that the user access is valid?
> > > > > > 
> > > > > > Yeah the io uring code is fine, provided it uses the uaccess primitives
> > > > > > like any other kernel code. It's looking more like a an arch/powerpc bug.
> > > > > > 
> > > > > > > We should be able to copy to/from user space, and including faults, if
> > > > > > > that's been done and the new mm assigned. Because it really should be.
> > > > > > > If SMAP was a problem on x86, we would have seen it long ago.
> > > > > > > 
> > > > > > > I'm assuming this may be breakage related to the recent uaccess changes
> > > > > > > related to set_fs and friends? Or maybe recent changes on the powerpc
> > > > > > > side?
> > > > > > > 
> > > > > > > Zorro, did 5.10 work?
> > > > > > 
> > > > > > Would be interesting to know.
> > > > > 
> > > > > Sure Nick and Jens, which 5.10 rc? version do you want to know ? Or any git
> > > > > commit(be the HEAD) in 5.10 phase?
> > > > 
> > > > I forget which versions had what series of this, but 5.10 final - and if
> > > > that fails, then 5.9 final. IIRC, 5.9 was pre any of these changes, and
> > > > 5.10 definitely has them.
> > > 
> > > I justed built linux v5.10 with same .config file, and gave it same test.
> > > v5.10 (HEAD=2c85ebc57b Linux 5.10) can't reproduce this bug:
> > > 
> > > # ./check generic/013 generic/051
> > > FSTYP         -- xfs (non-debug)
> > > PLATFORM      -- Linux/ppc64le ibm-p9z-xxx-xxxx 5.10.0 #3 SMP Thu Jan 28 04:12:14 EST 2021
> > > MKFS_OPTIONS  -- -f -m crc=1,finobt=1,reflink=1,rmapbt=1,bigtime=1,inobtcount=1 /dev/sda3
> > > MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/sda3 /mnt/xfstests/scratch
> > > 
> > > generic/013 138s ...  77s
> > > generic/051 103s ...  143s
> > > Ran: generic/013 generic/051
> > > Passed all 2 tests
> > 
> > Thanks for testing that, so I think it's safe to conclude that there's a
> > regression in powerpc fault handling for kthreads that use
> > kthread_use_mm in this release. A bisect would definitely find it, but
> > might be pointless if Christophe or Nick already have an idea of what it
> > is.
> > 
> 
> I don't have any idea yet, but I'd be curious to see the vmlinux binary matching the reported Oops.

I just upload the vmlinux and .config file, the vmlinux it too big, I have to
upload it to my google store and share the link as below:

config file: https://drive.google.com/file/d/1pMszboxdjbMPqSNXnMH-1UCZC-vtDnI9/view?usp=sharing
vmlinux: https://drive.google.com/file/d/1s7g2eBPAFFV61aM2dO9bvVTERGQ9mLYk/view?usp=sharing

I used latest upstream mainline linux, HEAD commit is:
76c057c84d (HEAD -> master, origin/master, origin/HEAD) Merge branch 'parisc-5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux

The test failed on this kernel as:

# dmesg
[   96.200296] ------------[ cut here ]------------
[   96.200304] Bug: Read fault blocked by KUAP!
[   96.200309] WARNING: CPU: 3 PID: 1876 at arch/powerpc/mm/fault.c:229 bad_kernel_fault+0x180/0x310
[   96.200323] Modules linked in: bonding rfkill sunrpc pseries_rng uio_pdrv_genirq uio drm fuse drm_panel_orientation_quirks ip_tables xfs libcrc32c sd_mod t10_pi xts ibmvscsi ibmveth scsi_transport_srp vmx_crypto
[   96.200372] CPU: 3 PID: 1876 Comm: io_wqe_worker-0 Tainted: G        W         5.11.0-rc5+ #5
[   96.200380] NIP:  c00000000008f8a0 LR: c00000000008f89c CTR: 0000000000000000
[   96.200386] REGS: c00000000d3aafd0 TRAP: 0700   Tainted: G        W          (5.11.0-rc5+)
[   96.200393] MSR:  8000000000021033 <SF,ME,IR,DR,RI,LE>  CR: 48082204  XER: 00000005
[   96.200416] CFAR: c00000000015ddac IRQMASK: 1 
               GPR00: c00000000008f89c c00000000d3ab270 c000000002116900 0000000000000020 
               GPR04: c000000001bec250 0000000000000001 00000001fbb80000 0000000000000027 
               GPR08: 0000000000000001 0000000000000000 c000000020fbba00 0000000000000001 
               GPR12: 0000000000002000 c00000001ecaae00 c00000000019dae8 c000000008d48040 
               GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
               GPR20: c0000000012d9650 fcffffffffffffff c000000002164018 c000000001262da0 
               GPR24: 0000000000000000 0000000000000000 0000000000000300 c000000010c27b80 
               GPR28: 0000000000200000 0000000000000000 0000010007ffce60 c00000000d3ab470 
[   96.200521] NIP [c00000000008f8a0] bad_kernel_fault+0x180/0x310
[   96.200528] LR [c00000000008f89c] bad_kernel_fault+0x17c/0x310
[   96.200535] Call Trace:
[   96.200539] [c00000000d3ab270] [c00000000008f89c] bad_kernel_fault+0x17c/0x310 (unreliable)
[   96.200551] [c00000000d3ab2f0] [c000000000090494] __do_page_fault+0x5f4/0x900
[   96.200561] [c00000000d3ab3b0] [c0000000000907dc] do_page_fault+0x3c/0x120
[   96.200570] [c00000000d3ab400] [c00000000000c748] handle_page_fault+0x10/0x2c
[   96.200579] --- interrupt: 300 at fault_in_pages_readable+0x104/0x350
[   96.200579] --- interrupt: 300 at fault_in_pages_readable+0x104/0x350
[   96.200586] NIP:  c000000000849424 LR: c00000000084952c CTR: c0000000006984a0
[   96.200592] REGS: c00000000d3ab470 TRAP: 0300   Tainted: G        W          (5.11.0-rc5+)
[   96.200598] MSR:  800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 44082804  XER: 00000001
[   96.200628] CFAR: c000000000010330 DAR: 0000010007ffce60 DSISR: 00200000 IRQMASK: 0 
               GPR00: c00000000084952c c00000000d3ab710 c000000002116900 0000000000000000 
               GPR04: c000000010c27ce0 0000000000000001 00000001fbb80000 0000000000010000 
               GPR08: 00000000271cd0a4 0000000000000200 0000000000000200 0000000000000000 
               GPR12: 0000000000002000 c00000001ecaae00 c00000000019dae8 c000000008d48040 
               GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
               GPR20: c0000000012d9650 fcffffffffffffff c000000002164018 c000000001262da0 
               GPR24: c000000020fbba00 bfffffffffffffff 0000000000000000 a8aaaaaaaaaaaaaa 
               GPR28: 0000010008005d71 c00000000bec3e00 0000000000000000 0000010007ffce60 
[   96.200734] NIP [c000000000849424] fault_in_pages_readable+0x104/0x350
[   96.200741] LR [c00000000084952c] fault_in_pages_readable+0x20c/0x350
[   96.200747] --- interrupt: 300
[   96.200752] [c00000000d3ab7a0] [c0000000008496d0] iov_iter_fault_in_readable+0x60/0x120
[   96.200761] [c00000000d3ab7e0] [c000000000698558] iomap_write_actor+0xb8/0x270
[   96.200771] [c00000000d3ab890] [c000000000693554] iomap_apply+0x2b4/0x740
[   96.200780] [c00000000d3ab9a0] [c000000000693dc0] iomap_file_buffered_write+0xa0/0x120
[   96.200790] [c00000000d3ab9f0] [c008000001d3efec] xfs_file_buffered_aio_write+0x354/0x590 [xfs]
[   96.200870] [c00000000d3aba90] [c0000000006691e4] io_write+0x104/0x4b0
[   96.200884] [c00000000d3abbb0] [c00000000066be54] io_issue_sqe+0x3d4/0xf50
[   96.200897] [c00000000d3abc60] [c00000000066f250] io_wq_submit_work+0xb0/0x2f0
[   96.200911] [c00000000d3abcb0] [c0000000006738a8] io_worker_handle_work+0x248/0x4a0
[   96.200925] [c00000000d3abd30] [c000000000673d28] io_wqe_worker+0x228/0x2a0
[   96.200939] [c00000000d3abda0] [c00000000019dc94] kthread+0x1b4/0x1c0
[   96.200950] [c00000000d3abe10] [c00000000000daf0] ret_from_kernel_thread+0x5c/0x6c
[   96.200959] Instruction dump:
[   96.200965] e87f0100 4810b155 60000000 2c230000 4182ffa8 409200ac 3c82ff15 38847e38 
[   96.200987] 3c62ff15 38637ed0 480ce4ad 60000000 <0fe00000> e8010090 38210080 38600001 
[   96.201008] irq event stamp: 46
[   96.201013] hardirqs last  enabled at (45): [<c0000000005428c4>] __slab_free+0x414/0x610
[   96.201021] hardirqs last disabled at (46): [<c000000000008a04>] data_access_common_virt+0x1a4/0x1c0
[   96.201030] softirqs last  enabled at (0): [<c00000000015ae68>] copy_process+0x688/0x1600
[   96.201038] softirqs last disabled at (0): [<0000000000000000>] 0x0
[   96.201045] ---[ end trace c2373fad985a304b ]---

# ./scripts/faddr2line vmlinux bad_kernel_fault+0x180/0x310
bad_kernel_fault+0x180/0x310:
bad_kernel_fault at arch/powerpc/mm/fault.c:229 (discriminator 6)

    217         // Read/write fault blocked by KUAP is bad, it can never succeed.
    218         if (bad_kuap_fault(regs, address, is_write)) {
    219                 pr_crit_ratelimited("Kernel attempted to %s user page (%lx) - exploit attempt? (uid: %d)\n",
    220                                     is_write ? "write" : "read", address,
    221                                     from_kuid(&init_user_ns, current_uid()));
    222 
    223                 // Fault on user outside of certain regions (eg. copy_tofrom_user()) is bad
    224                 if (!search_exception_tables(regs->nip))
    225                         return true;
    226 
    227                 // Read/write fault in a valid region (the exception table search passed
    228                 // above), but blocked by KUAP is bad, it can never succeed.
    229                 return WARN(true, "Bug: %s fault blocked by KUAP!", is_write ? "Write" : "Read");

Thanks,
Zorro

> 
> Christophe
>
Christophe Leroy Jan. 29, 2021, 12:26 p.m. UTC | #11
+Aneesh

Le 29/01/2021 à 07:52, Zorro Lang a écrit :
> On Thu, Jan 28, 2021 at 03:44:21PM +0100, Christophe Leroy wrote:
>>
>>
>> Le 28/01/2021 à 15:42, Jens Axboe a écrit :
>>> On 1/28/21 6:52 AM, Zorro Lang wrote:
>>>> On Wed, Jan 27, 2021 at 08:06:37PM -0700, Jens Axboe wrote:
>>>>> On 1/27/21 8:13 PM, Zorro Lang wrote:
>>>>>> On Thu, Jan 28, 2021 at 10:18:07AM +1000, Nicholas Piggin wrote:
>>>>>>> Excerpts from Jens Axboe's message of January 28, 2021 5:29 am:
>>>>>>>> On 1/27/21 9:38 AM, Christophe Leroy wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Le 27/01/2021 à 15:56, Zorro Lang a écrit :
>>>>>>>>>> On powerpc, io_uring test hit below KUAP fault on __do_page_fault.
>>>>>>>>>> The fail source line is:
>>>>>>>>>>
>>>>>>>>>>      if (unlikely(!is_user && bad_kernel_fault(regs, error_code, address, is_write)))
>>>>>>>>>>          return SIGSEGV;
>>>>>>>>>>
>>>>>>>>>> The is_user() is based on user_mod(regs) only. This's not suit for
>>>>>>>>>> io_uring, where the helper thread can assume the user app identity
>>>>>>>>>> and could perform this fault just fine. So turn to use mm to decide
>>>>>>>>>> if this is valid or not.
>>>>>>>>>
>>>>>>>>> I don't understand why testing is_user would be an issue. KUAP purpose
>>>>>>>>> it to block any unallowed access from kernel to user memory
>>>>>>>>> (Equivalent to SMAP on x86). So it really must be based on MSR_PR bit,
>>>>>>>>> that is what is_user provides.
>>>>>>>>>
>>>>>>>>> If the kernel access is legitimate, kernel should have opened
>>>>>>>>> userspace access then you shouldn't get this "Bug: Read fault blocked
>>>>>>>>> by KUAP!".
>>>>>>>>>
>>>>>>>>> As far as I understand, the fault occurs in
>>>>>>>>> iov_iter_fault_in_readable() which calls fault_in_pages_readable() And
>>>>>>>>> fault_in_pages_readable() uses __get_user() so it is a legitimate
>>>>>>>>> access and you really should get a KUAP fault.
>>>>>>>>>
>>>>>>>>> So the problem is somewhere else, I think you proposed patch just
>>>>>>>>> hides the problem, it doesn't fix it.
>>>>>>>>
>>>>>>>> If we do kthread_use_mm(), can we agree that the user access is valid?
>>>>>>>
>>>>>>> Yeah the io uring code is fine, provided it uses the uaccess primitives
>>>>>>> like any other kernel code. It's looking more like a an arch/powerpc bug.
>>>>>>>
>>>>>>>> We should be able to copy to/from user space, and including faults, if
>>>>>>>> that's been done and the new mm assigned. Because it really should be.
>>>>>>>> If SMAP was a problem on x86, we would have seen it long ago.
>>>>>>>>
>>>>>>>> I'm assuming this may be breakage related to the recent uaccess changes
>>>>>>>> related to set_fs and friends? Or maybe recent changes on the powerpc
>>>>>>>> side?
>>>>>>>>
>>>>>>>> Zorro, did 5.10 work?
>>>>>>>
>>>>>>> Would be interesting to know.
>>>>>>
>>>>>> Sure Nick and Jens, which 5.10 rc? version do you want to know ? Or any git
>>>>>> commit(be the HEAD) in 5.10 phase?
>>>>>
>>>>> I forget which versions had what series of this, but 5.10 final - and if
>>>>> that fails, then 5.9 final. IIRC, 5.9 was pre any of these changes, and
>>>>> 5.10 definitely has them.
>>>>
>>>> I justed built linux v5.10 with same .config file, and gave it same test.
>>>> v5.10 (HEAD=2c85ebc57b Linux 5.10) can't reproduce this bug:
>>>>
>>>> # ./check generic/013 generic/051
>>>> FSTYP         -- xfs (non-debug)
>>>> PLATFORM      -- Linux/ppc64le ibm-p9z-xxx-xxxx 5.10.0 #3 SMP Thu Jan 28 04:12:14 EST 2021
>>>> MKFS_OPTIONS  -- -f -m crc=1,finobt=1,reflink=1,rmapbt=1,bigtime=1,inobtcount=1 /dev/sda3
>>>> MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/sda3 /mnt/xfstests/scratch
>>>>
>>>> generic/013 138s ...  77s
>>>> generic/051 103s ...  143s
>>>> Ran: generic/013 generic/051
>>>> Passed all 2 tests
>>>
>>> Thanks for testing that, so I think it's safe to conclude that there's a
>>> regression in powerpc fault handling for kthreads that use
>>> kthread_use_mm in this release. A bisect would definitely find it, but
>>> might be pointless if Christophe or Nick already have an idea of what it
>>> is.
>>>
>>
>> I don't have any idea yet, but I'd be curious to see the vmlinux binary matching the reported Oops.
> 
> I just upload the vmlinux and .config file, the vmlinux it too big, I have to
> upload it to my google store and share the link as below:
> 
> config file: https://drive.google.com/file/d/1pMszboxdjbMPqSNXnMH-1UCZC-vtDnI9/view?usp=sharing
> vmlinux: https://drive.google.com/file/d/1s7g2eBPAFFV61aM2dO9bvVTERGQ9mLYk/view?usp=sharing
> 
> I used latest upstream mainline linux, HEAD commit is:
> 76c057c84d (HEAD -> master, origin/master, origin/HEAD) Merge branch 'parisc-5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
> 
> The test failed on this kernel as:
> 
> # dmesg
> [   96.200296] ------------[ cut here ]------------
> [   96.200304] Bug: Read fault blocked by KUAP!
> [   96.200309] WARNING: CPU: 3 PID: 1876 at arch/powerpc/mm/fault.c:229 bad_kernel_fault+0x180/0x310

> [   96.200734] NIP [c000000000849424] fault_in_pages_readable+0x104/0x350
> [   96.200741] LR [c00000000084952c] fault_in_pages_readable+0x20c/0x350
> [   96.200747] --- interrupt: 300


Problem happens in a section where userspace access is supposed to be granted, so the patch you 
proposed is definitely not the right fix.

c000000000849408:	2c 01 00 4c 	isync
c00000000084940c:	a6 03 3d 7d 	mtspr   29,r9  <== granting userspace access permission
c000000000849410:	2c 01 00 4c 	isync
c000000000849414:	00 00 36 e9 	ld      r9,0(r22)
c000000000849418:	20 00 29 81 	lwz     r9,32(r9)
c00000000084941c:	00 02 29 71 	andi.   r9,r9,512
c000000000849420:	78 d3 5e 7f 	mr      r30,r26
==> c000000000849424:	00 00 bf 8b 	lbz     r29,0(r31)  <== accessing userspace
c000000000849428:	10 00 82 41 	beq     c000000000849438 <fault_in_pages_readable+0x118>
c00000000084942c:	2c 01 00 4c 	isync
c000000000849430:	a6 03 bd 7e 	mtspr   29,r21  <== clearing userspace access permission
c000000000849434:	2c 01 00 4c 	isync

My first guess is that the problem is linked to the following function, see the comment

/*
  * For kernel thread that doesn't have thread.regs return
  * default AMR/IAMR values.
  */
static inline u64 current_thread_amr(void)
{
	if (current->thread.regs)
		return current->thread.regs->amr;
	return AMR_KUAP_BLOCKED;
}

Above function was introduced by commit 48a8ab4eeb82 ("powerpc/book3s64/pkeys: Don't update SPRN_AMR 
when in kernel mode")

Aneesh, any idea ?

Christophe
Michael Ellerman Jan. 30, 2021, 11:22 a.m. UTC | #12
Christophe Leroy <christophe.leroy@csgroup.eu> writes:
> +Aneesh
>
> Le 29/01/2021 à 07:52, Zorro Lang a écrit :
..
>> [   96.200296] ------------[ cut here ]------------
>> [   96.200304] Bug: Read fault blocked by KUAP!
>> [   96.200309] WARNING: CPU: 3 PID: 1876 at arch/powerpc/mm/fault.c:229 bad_kernel_fault+0x180/0x310
>
>> [   96.200734] NIP [c000000000849424] fault_in_pages_readable+0x104/0x350
>> [   96.200741] LR [c00000000084952c] fault_in_pages_readable+0x20c/0x350
>> [   96.200747] --- interrupt: 300
>
>
> Problem happens in a section where userspace access is supposed to be granted, so the patch you 
> proposed is definitely not the right fix.
>
> c000000000849408:	2c 01 00 4c 	isync
> c00000000084940c:	a6 03 3d 7d 	mtspr   29,r9  <== granting userspace access permission
> c000000000849410:	2c 01 00 4c 	isync
> c000000000849414:	00 00 36 e9 	ld      r9,0(r22)
> c000000000849418:	20 00 29 81 	lwz     r9,32(r9)
> c00000000084941c:	00 02 29 71 	andi.   r9,r9,512
> c000000000849420:	78 d3 5e 7f 	mr      r30,r26
> ==> c000000000849424:	00 00 bf 8b 	lbz     r29,0(r31)  <== accessing userspace
> c000000000849428:	10 00 82 41 	beq     c000000000849438 <fault_in_pages_readable+0x118>
> c00000000084942c:	2c 01 00 4c 	isync
> c000000000849430:	a6 03 bd 7e 	mtspr   29,r21  <== clearing userspace access permission
> c000000000849434:	2c 01 00 4c 	isync
>
> My first guess is that the problem is linked to the following function, see the comment
>
> /*
>   * For kernel thread that doesn't have thread.regs return
>   * default AMR/IAMR values.
>   */
> static inline u64 current_thread_amr(void)
> {
> 	if (current->thread.regs)
> 		return current->thread.regs->amr;
> 	return AMR_KUAP_BLOCKED;
> }
>
> Above function was introduced by commit 48a8ab4eeb82 ("powerpc/book3s64/pkeys: Don't update SPRN_AMR 
> when in kernel mode")

Yeah that's a bit of a curly one.

At some point io_uring did kthread_use_mm(), which is supposed to mean
the kthread can operate on behalf of the original process that submitted
the IO.

But because KUAP is implemented using memory protection keys, it depends
on the value of the AMR register, which is not part of the mm, it's in
thread.regs->amr.

And what's worse by the time we're in kthread_use_mm() we no longer have
access to the thread.regs->amr of the original process that submitted
the IO.

We also can't simply move the AMR into the mm, precisely because it's
per thread, not per mm.

So TBH I don't know how we're going to fix this.

I guess we could return AMR=unblocked for kernel threads, but that's
arguably a bug because it allows a process to circumvent memory keys by
asking the kernel to do the access.


cheers
Nicholas Piggin Jan. 30, 2021, 1:54 p.m. UTC | #13
Excerpts from Michael Ellerman's message of January 30, 2021 9:22 pm:
> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>> +Aneesh
>>
>> Le 29/01/2021 à 07:52, Zorro Lang a écrit :
> ..
>>> [   96.200296] ------------[ cut here ]------------
>>> [   96.200304] Bug: Read fault blocked by KUAP!
>>> [   96.200309] WARNING: CPU: 3 PID: 1876 at arch/powerpc/mm/fault.c:229 bad_kernel_fault+0x180/0x310
>>
>>> [   96.200734] NIP [c000000000849424] fault_in_pages_readable+0x104/0x350
>>> [   96.200741] LR [c00000000084952c] fault_in_pages_readable+0x20c/0x350
>>> [   96.200747] --- interrupt: 300
>>
>>
>> Problem happens in a section where userspace access is supposed to be granted, so the patch you 
>> proposed is definitely not the right fix.
>>
>> c000000000849408:	2c 01 00 4c 	isync
>> c00000000084940c:	a6 03 3d 7d 	mtspr   29,r9  <== granting userspace access permission
>> c000000000849410:	2c 01 00 4c 	isync
>> c000000000849414:	00 00 36 e9 	ld      r9,0(r22)
>> c000000000849418:	20 00 29 81 	lwz     r9,32(r9)
>> c00000000084941c:	00 02 29 71 	andi.   r9,r9,512
>> c000000000849420:	78 d3 5e 7f 	mr      r30,r26
>> ==> c000000000849424:	00 00 bf 8b 	lbz     r29,0(r31)  <== accessing userspace
>> c000000000849428:	10 00 82 41 	beq     c000000000849438 <fault_in_pages_readable+0x118>
>> c00000000084942c:	2c 01 00 4c 	isync
>> c000000000849430:	a6 03 bd 7e 	mtspr   29,r21  <== clearing userspace access permission
>> c000000000849434:	2c 01 00 4c 	isync
>>
>> My first guess is that the problem is linked to the following function, see the comment
>>
>> /*
>>   * For kernel thread that doesn't have thread.regs return
>>   * default AMR/IAMR values.
>>   */
>> static inline u64 current_thread_amr(void)
>> {
>> 	if (current->thread.regs)
>> 		return current->thread.regs->amr;
>> 	return AMR_KUAP_BLOCKED;
>> }
>>
>> Above function was introduced by commit 48a8ab4eeb82 ("powerpc/book3s64/pkeys: Don't update SPRN_AMR 
>> when in kernel mode")
> 
> Yeah that's a bit of a curly one.
> 
> At some point io_uring did kthread_use_mm(), which is supposed to mean
> the kthread can operate on behalf of the original process that submitted
> the IO.
> 
> But because KUAP is implemented using memory protection keys, it depends
> on the value of the AMR register, which is not part of the mm, it's in
> thread.regs->amr.
> 
> And what's worse by the time we're in kthread_use_mm() we no longer have
> access to the thread.regs->amr of the original process that submitted
> the IO.
> 
> We also can't simply move the AMR into the mm, precisely because it's
> per thread, not per mm.
> 
> So TBH I don't know how we're going to fix this.
> 
> I guess we could return AMR=unblocked for kernel threads, but that's
> arguably a bug because it allows a process to circumvent memory keys by
> asking the kernel to do the access.

We shouldn't need to inherit AMR should we? We only need it to be locked 
for kernel threads until it's explicitly unlocked -- nothing mm specific 
there. I think current_thread_amr could return 0 for kernel threads? Or
I would even avoid using that function for allow_user_access and open
code the kthread case and remove it from current_thread_amr().

Thanks,
Nick
Aneesh Kumar K V Feb. 2, 2021, 4:45 a.m. UTC | #14
Nicholas Piggin <npiggin@gmail.com> writes:

> Excerpts from Michael Ellerman's message of January 30, 2021 9:22 pm:
>> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>>> +Aneesh
>>>
>>> Le 29/01/2021 à 07:52, Zorro Lang a écrit :
>> ..
>>>> [   96.200296] ------------[ cut here ]------------
>>>> [   96.200304] Bug: Read fault blocked by KUAP!
>>>> [   96.200309] WARNING: CPU: 3 PID: 1876 at arch/powerpc/mm/fault.c:229 bad_kernel_fault+0x180/0x310
>>>
>>>> [   96.200734] NIP [c000000000849424] fault_in_pages_readable+0x104/0x350
>>>> [   96.200741] LR [c00000000084952c] fault_in_pages_readable+0x20c/0x350
>>>> [   96.200747] --- interrupt: 300
>>>
>>>
>>> Problem happens in a section where userspace access is supposed to be granted, so the patch you 
>>> proposed is definitely not the right fix.
>>>
>>> c000000000849408:	2c 01 00 4c 	isync
>>> c00000000084940c:	a6 03 3d 7d 	mtspr   29,r9  <== granting userspace access permission
>>> c000000000849410:	2c 01 00 4c 	isync
>>> c000000000849414:	00 00 36 e9 	ld      r9,0(r22)
>>> c000000000849418:	20 00 29 81 	lwz     r9,32(r9)
>>> c00000000084941c:	00 02 29 71 	andi.   r9,r9,512
>>> c000000000849420:	78 d3 5e 7f 	mr      r30,r26
>>> ==> c000000000849424:	00 00 bf 8b 	lbz     r29,0(r31)  <== accessing userspace
>>> c000000000849428:	10 00 82 41 	beq     c000000000849438 <fault_in_pages_readable+0x118>
>>> c00000000084942c:	2c 01 00 4c 	isync
>>> c000000000849430:	a6 03 bd 7e 	mtspr   29,r21  <== clearing userspace access permission
>>> c000000000849434:	2c 01 00 4c 	isync
>>>
>>> My first guess is that the problem is linked to the following function, see the comment
>>>
>>> /*
>>>   * For kernel thread that doesn't have thread.regs return
>>>   * default AMR/IAMR values.
>>>   */
>>> static inline u64 current_thread_amr(void)
>>> {
>>> 	if (current->thread.regs)
>>> 		return current->thread.regs->amr;
>>> 	return AMR_KUAP_BLOCKED;
>>> }
>>>
>>> Above function was introduced by commit 48a8ab4eeb82 ("powerpc/book3s64/pkeys: Don't update SPRN_AMR 
>>> when in kernel mode")
>> 
>> Yeah that's a bit of a curly one.
>> 
>> At some point io_uring did kthread_use_mm(), which is supposed to mean
>> the kthread can operate on behalf of the original process that submitted
>> the IO.
>> 
>> But because KUAP is implemented using memory protection keys, it depends
>> on the value of the AMR register, which is not part of the mm, it's in
>> thread.regs->amr.
>> 
>> And what's worse by the time we're in kthread_use_mm() we no longer have
>> access to the thread.regs->amr of the original process that submitted
>> the IO.
>> 
>> We also can't simply move the AMR into the mm, precisely because it's
>> per thread, not per mm.
>> 
>> So TBH I don't know how we're going to fix this.
>> 
>> I guess we could return AMR=unblocked for kernel threads, but that's
>> arguably a bug because it allows a process to circumvent memory keys by
>> asking the kernel to do the access.
>
> We shouldn't need to inherit AMR should we? We only need it to be locked 
> for kernel threads until it's explicitly unlocked -- nothing mm specific 
> there. I think current_thread_amr could return 0 for kernel threads? Or
> I would even avoid using that function for allow_user_access and open
> code the kthread case and remove it from current_thread_amr().
>
> Thanks,
> Nick

Can we try this?

commit 9a193d38b1a0a52364bc70ec953e0685993603b6
Author: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Date:   Tue Feb 2 09:23:38 2021 +0530

    powerpc/kuap: Allow kernel thread to access userspace after kthread_use_mm
    
    This fix the bad fault reported by KUAP when io_wqe_worker access userspace.
    
     Bug: Read fault blocked by KUAP!
     WARNING: CPU: 1 PID: 101841 at arch/powerpc/mm/fault.c:229 __do_page_fault+0x6b4/0xcd0
     NIP [c00000000009e7e4] __do_page_fault+0x6b4/0xcd0
     LR [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0
    ..........
     Call Trace:
     [c000000016367330] [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 (unreliable)
     [c0000000163673e0] [c00000000009ee3c] do_page_fault+0x3c/0x120
     [c000000016367430] [c00000000000c848] handle_page_fault+0x10/0x2c
     --- interrupt: 300 at iov_iter_fault_in_readable+0x148/0x6f0
    ..........
     NIP [c0000000008e8228] iov_iter_fault_in_readable+0x148/0x6f0
     LR [c0000000008e834c] iov_iter_fault_in_readable+0x26c/0x6f0
     interrupt: 300
     [c0000000163677e0] [c0000000007154a0] iomap_write_actor+0xc0/0x280
     [c000000016367880] [c00000000070fc94] iomap_apply+0x1c4/0x780
     [c000000016367990] [c000000000710330] iomap_file_buffered_write+0xa0/0x120
     [c0000000163679e0] [c00800000040791c] xfs_file_buffered_aio_write+0x314/0x5e0 [xfs]
     [c000000016367a90] [c0000000006d74bc] io_write+0x10c/0x460
     [c000000016367bb0] [c0000000006d80e4] io_issue_sqe+0x8d4/0x1200
     [c000000016367c70] [c0000000006d8ad0] io_wq_submit_work+0xc0/0x250
     [c000000016367cb0] [c0000000006e2578] io_worker_handle_work+0x498/0x800
     [c000000016367d40] [c0000000006e2cdc] io_wqe_worker+0x3fc/0x4f0
     [c000000016367da0] [c0000000001cb0a4] kthread+0x1c4/0x1d0
     [c000000016367e10] [c00000000000dbf0] ret_from_kernel_thread+0x5c/0x6c
    
    The kernel consider thread AMR value for kernel thread to be
    AMR_KUAP_BLOCKED. Hence access to userspace is denied. This
    of course not correct and we should allow userspace access after
    kthread_use_mm(). To be precise, kthread_use_mm() should inherit the
    AMR value of the operating address space. But, the AMR value is
    thread-specific and we inherit the address space and not thread
    access restrictions. Because of this ignore AMR value when accessing
    userspace via kernel thread.
    
    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>

diff --git a/arch/powerpc/include/asm/book3s/64/kup.h b/arch/powerpc/include/asm/book3s/64/kup.h
index f50f72e535aa..7457d80ba0bb 100644
--- a/arch/powerpc/include/asm/book3s/64/kup.h
+++ b/arch/powerpc/include/asm/book3s/64/kup.h
@@ -384,7 +384,13 @@ static __always_inline void allow_user_access(void __user *to, const void __user
 	// This is written so we can resolve to a single case at build time
 	BUILD_BUG_ON(!__builtin_constant_p(dir));
 
-	if (mmu_has_feature(MMU_FTR_PKEY))
+	/*
+	 * if it is a kthread that did kthread_use_mm() don't
+	 * use current_thread_amr().
+	 */
+	if (!current->mm && current->active_mm != &init_mm)
+		thread_amr = 0;
+	else if (mmu_has_feature(MMU_FTR_PKEY))
 		thread_amr = current_thread_amr();
 
 	if (dir == KUAP_READ)
Aneesh Kumar K V Feb. 2, 2021, 5:55 a.m. UTC | #15
Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> writes:

> Nicholas Piggin <npiggin@gmail.com> writes:
>
>> Excerpts from Michael Ellerman's message of January 30, 2021 9:22 pm:
>>> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>>>> +Aneesh
>>>>
>>>> Le 29/01/2021 à 07:52, Zorro Lang a écrit :
>>> ..
>>>>> [   96.200296] ------------[ cut here ]------------
>>>>> [   96.200304] Bug: Read fault blocked by KUAP!
>>>>> [   96.200309] WARNING: CPU: 3 PID: 1876 at arch/powerpc/mm/fault.c:229 bad_kernel_fault+0x180/0x310
>>>>
>>>>> [   96.200734] NIP [c000000000849424] fault_in_pages_readable+0x104/0x350
>>>>> [   96.200741] LR [c00000000084952c] fault_in_pages_readable+0x20c/0x350
>>>>> [   96.200747] --- interrupt: 300
>>>>
>>>>
>>>> Problem happens in a section where userspace access is supposed to be granted, so the patch you 
>>>> proposed is definitely not the right fix.
>>>>
>>>> c000000000849408:	2c 01 00 4c 	isync
>>>> c00000000084940c:	a6 03 3d 7d 	mtspr   29,r9  <== granting userspace access permission
>>>> c000000000849410:	2c 01 00 4c 	isync
>>>> c000000000849414:	00 00 36 e9 	ld      r9,0(r22)
>>>> c000000000849418:	20 00 29 81 	lwz     r9,32(r9)
>>>> c00000000084941c:	00 02 29 71 	andi.   r9,r9,512
>>>> c000000000849420:	78 d3 5e 7f 	mr      r30,r26
>>>> ==> c000000000849424:	00 00 bf 8b 	lbz     r29,0(r31)  <== accessing userspace
>>>> c000000000849428:	10 00 82 41 	beq     c000000000849438 <fault_in_pages_readable+0x118>
>>>> c00000000084942c:	2c 01 00 4c 	isync
>>>> c000000000849430:	a6 03 bd 7e 	mtspr   29,r21  <== clearing userspace access permission
>>>> c000000000849434:	2c 01 00 4c 	isync
>>>>
>>>> My first guess is that the problem is linked to the following function, see the comment
>>>>
>>>> /*
>>>>   * For kernel thread that doesn't have thread.regs return
>>>>   * default AMR/IAMR values.
>>>>   */
>>>> static inline u64 current_thread_amr(void)
>>>> {
>>>> 	if (current->thread.regs)
>>>> 		return current->thread.regs->amr;
>>>> 	return AMR_KUAP_BLOCKED;
>>>> }
>>>>
>>>> Above function was introduced by commit 48a8ab4eeb82 ("powerpc/book3s64/pkeys: Don't update SPRN_AMR 
>>>> when in kernel mode")
>>> 
>>> Yeah that's a bit of a curly one.
>>> 
>>> At some point io_uring did kthread_use_mm(), which is supposed to mean
>>> the kthread can operate on behalf of the original process that submitted
>>> the IO.
>>> 
>>> But because KUAP is implemented using memory protection keys, it depends
>>> on the value of the AMR register, which is not part of the mm, it's in
>>> thread.regs->amr.
>>> 
>>> And what's worse by the time we're in kthread_use_mm() we no longer have
>>> access to the thread.regs->amr of the original process that submitted
>>> the IO.
>>> 
>>> We also can't simply move the AMR into the mm, precisely because it's
>>> per thread, not per mm.
>>> 
>>> So TBH I don't know how we're going to fix this.
>>> 
>>> I guess we could return AMR=unblocked for kernel threads, but that's
>>> arguably a bug because it allows a process to circumvent memory keys by
>>> asking the kernel to do the access.
>>
>> We shouldn't need to inherit AMR should we? We only need it to be locked 
>> for kernel threads until it's explicitly unlocked -- nothing mm specific 
>> there. I think current_thread_amr could return 0 for kernel threads? Or
>> I would even avoid using that function for allow_user_access and open
>> code the kthread case and remove it from current_thread_amr().
>>
>> Thanks,
>> Nick
>

updated one

From 8fdb0680f983940d61f91da8252b13c8d3e8ebee Mon Sep 17 00:00:00 2001
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Date: Tue, 2 Feb 2021 09:23:38 +0530
Subject: [PATCH v2] powerpc/kuap: Allow kernel thread to access userspace
 after kthread_use_mm

This fix the bad fault reported by KUAP when io_wqe_worker access userspace.

 Bug: Read fault blocked by KUAP!
 WARNING: CPU: 1 PID: 101841 at arch/powerpc/mm/fault.c:229 __do_page_fault+0x6b4/0xcd0
 NIP [c00000000009e7e4] __do_page_fault+0x6b4/0xcd0
 LR [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0
..........
 Call Trace:
 [c000000016367330] [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 (unreliable)
 [c0000000163673e0] [c00000000009ee3c] do_page_fault+0x3c/0x120
 [c000000016367430] [c00000000000c848] handle_page_fault+0x10/0x2c
 --- interrupt: 300 at iov_iter_fault_in_readable+0x148/0x6f0
..........
 NIP [c0000000008e8228] iov_iter_fault_in_readable+0x148/0x6f0
 LR [c0000000008e834c] iov_iter_fault_in_readable+0x26c/0x6f0
 interrupt: 300
 [c0000000163677e0] [c0000000007154a0] iomap_write_actor+0xc0/0x280
 [c000000016367880] [c00000000070fc94] iomap_apply+0x1c4/0x780
 [c000000016367990] [c000000000710330] iomap_file_buffered_write+0xa0/0x120
 [c0000000163679e0] [c00800000040791c] xfs_file_buffered_aio_write+0x314/0x5e0 [xfs]
 [c000000016367a90] [c0000000006d74bc] io_write+0x10c/0x460
 [c000000016367bb0] [c0000000006d80e4] io_issue_sqe+0x8d4/0x1200
 [c000000016367c70] [c0000000006d8ad0] io_wq_submit_work+0xc0/0x250
 [c000000016367cb0] [c0000000006e2578] io_worker_handle_work+0x498/0x800
 [c000000016367d40] [c0000000006e2cdc] io_wqe_worker+0x3fc/0x4f0
 [c000000016367da0] [c0000000001cb0a4] kthread+0x1c4/0x1d0
 [c000000016367e10] [c00000000000dbf0] ret_from_kernel_thread+0x5c/0x6c

The kernel consider thread AMR value for kernel thread to be
AMR_KUAP_BLOCKED. Hence access to userspace is denied. This
of course not correct and we should allow userspace access after
kthread_use_mm(). To be precise, kthread_use_mm() should inherit the
AMR value of the operating address space. But, the AMR value is
thread-specific and we inherit the address space and not thread
access restrictions. Because of this ignore AMR value when accessing
userspace via kernel thread.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
Changes from v1:
* Address review feedback from Nick 

 arch/powerpc/include/asm/book3s/64/kup.h | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/book3s/64/kup.h b/arch/powerpc/include/asm/book3s/64/kup.h
index f50f72e535aa..95f4df99249e 100644
--- a/arch/powerpc/include/asm/book3s/64/kup.h
+++ b/arch/powerpc/include/asm/book3s/64/kup.h
@@ -384,7 +384,13 @@ static __always_inline void allow_user_access(void __user *to, const void __user
 	// This is written so we can resolve to a single case at build time
 	BUILD_BUG_ON(!__builtin_constant_p(dir));
 
-	if (mmu_has_feature(MMU_FTR_PKEY))
+	/*
+	 * if it is a kthread that did kthread_use_mm() don't
+	 * use current_thread_amr().
+	 */
+	if (current->flags & PF_KTHREAD)
+		thread_amr = 0;
+	else if (mmu_has_feature(MMU_FTR_PKEY))
 		thread_amr = current_thread_amr();
 
 	if (dir == KUAP_READ)
Christophe Leroy Feb. 2, 2021, 6:02 a.m. UTC | #16
Le 02/02/2021 à 06:55, Aneesh Kumar K.V a écrit :
> Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> writes:
> 
>> Nicholas Piggin <npiggin@gmail.com> writes:
>>
>>> Excerpts from Michael Ellerman's message of January 30, 2021 9:22 pm:
>>>> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>>>>> +Aneesh
>>>>>
>>>>> Le 29/01/2021 à 07:52, Zorro Lang a écrit :
>>>> ..
>>>>>> [   96.200296] ------------[ cut here ]------------
>>>>>> [   96.200304] Bug: Read fault blocked by KUAP!
>>>>>> [   96.200309] WARNING: CPU: 3 PID: 1876 at arch/powerpc/mm/fault.c:229 bad_kernel_fault+0x180/0x310
>>>>>
>>>>>> [   96.200734] NIP [c000000000849424] fault_in_pages_readable+0x104/0x350
>>>>>> [   96.200741] LR [c00000000084952c] fault_in_pages_readable+0x20c/0x350
>>>>>> [   96.200747] --- interrupt: 300
>>>>>
>>>>>
>>>>> Problem happens in a section where userspace access is supposed to be granted, so the patch you
>>>>> proposed is definitely not the right fix.
>>>>>
>>>>> c000000000849408:	2c 01 00 4c 	isync
>>>>> c00000000084940c:	a6 03 3d 7d 	mtspr   29,r9  <== granting userspace access permission
>>>>> c000000000849410:	2c 01 00 4c 	isync
>>>>> c000000000849414:	00 00 36 e9 	ld      r9,0(r22)
>>>>> c000000000849418:	20 00 29 81 	lwz     r9,32(r9)
>>>>> c00000000084941c:	00 02 29 71 	andi.   r9,r9,512
>>>>> c000000000849420:	78 d3 5e 7f 	mr      r30,r26
>>>>> ==> c000000000849424:	00 00 bf 8b 	lbz     r29,0(r31)  <== accessing userspace
>>>>> c000000000849428:	10 00 82 41 	beq     c000000000849438 <fault_in_pages_readable+0x118>
>>>>> c00000000084942c:	2c 01 00 4c 	isync
>>>>> c000000000849430:	a6 03 bd 7e 	mtspr   29,r21  <== clearing userspace access permission
>>>>> c000000000849434:	2c 01 00 4c 	isync
>>>>>
>>>>> My first guess is that the problem is linked to the following function, see the comment
>>>>>
>>>>> /*
>>>>>    * For kernel thread that doesn't have thread.regs return
>>>>>    * default AMR/IAMR values.
>>>>>    */
>>>>> static inline u64 current_thread_amr(void)
>>>>> {
>>>>> 	if (current->thread.regs)
>>>>> 		return current->thread.regs->amr;
>>>>> 	return AMR_KUAP_BLOCKED;
>>>>> }
>>>>>
>>>>> Above function was introduced by commit 48a8ab4eeb82 ("powerpc/book3s64/pkeys: Don't update SPRN_AMR
>>>>> when in kernel mode")
>>>>
>>>> Yeah that's a bit of a curly one.
>>>>
>>>> At some point io_uring did kthread_use_mm(), which is supposed to mean
>>>> the kthread can operate on behalf of the original process that submitted
>>>> the IO.
>>>>
>>>> But because KUAP is implemented using memory protection keys, it depends
>>>> on the value of the AMR register, which is not part of the mm, it's in
>>>> thread.regs->amr.
>>>>
>>>> And what's worse by the time we're in kthread_use_mm() we no longer have
>>>> access to the thread.regs->amr of the original process that submitted
>>>> the IO.
>>>>
>>>> We also can't simply move the AMR into the mm, precisely because it's
>>>> per thread, not per mm.
>>>>
>>>> So TBH I don't know how we're going to fix this.
>>>>
>>>> I guess we could return AMR=unblocked for kernel threads, but that's
>>>> arguably a bug because it allows a process to circumvent memory keys by
>>>> asking the kernel to do the access.
>>>
>>> We shouldn't need to inherit AMR should we? We only need it to be locked
>>> for kernel threads until it's explicitly unlocked -- nothing mm specific
>>> there. I think current_thread_amr could return 0 for kernel threads? Or
>>> I would even avoid using that function for allow_user_access and open
>>> code the kthread case and remove it from current_thread_amr().
>>>
>>> Thanks,
>>> Nick
>>
> 
> updated one
> 
>  From 8fdb0680f983940d61f91da8252b13c8d3e8ebee Mon Sep 17 00:00:00 2001
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
> Date: Tue, 2 Feb 2021 09:23:38 +0530
> Subject: [PATCH v2] powerpc/kuap: Allow kernel thread to access userspace
>   after kthread_use_mm
> 
> This fix the bad fault reported by KUAP when io_wqe_worker access userspace.
> 
>   Bug: Read fault blocked by KUAP!
>   WARNING: CPU: 1 PID: 101841 at arch/powerpc/mm/fault.c:229 __do_page_fault+0x6b4/0xcd0
>   NIP [c00000000009e7e4] __do_page_fault+0x6b4/0xcd0
>   LR [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0
> ..........
>   Call Trace:
>   [c000000016367330] [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 (unreliable)
>   [c0000000163673e0] [c00000000009ee3c] do_page_fault+0x3c/0x120
>   [c000000016367430] [c00000000000c848] handle_page_fault+0x10/0x2c
>   --- interrupt: 300 at iov_iter_fault_in_readable+0x148/0x6f0
> ..........
>   NIP [c0000000008e8228] iov_iter_fault_in_readable+0x148/0x6f0
>   LR [c0000000008e834c] iov_iter_fault_in_readable+0x26c/0x6f0
>   interrupt: 300
>   [c0000000163677e0] [c0000000007154a0] iomap_write_actor+0xc0/0x280
>   [c000000016367880] [c00000000070fc94] iomap_apply+0x1c4/0x780
>   [c000000016367990] [c000000000710330] iomap_file_buffered_write+0xa0/0x120
>   [c0000000163679e0] [c00800000040791c] xfs_file_buffered_aio_write+0x314/0x5e0 [xfs]
>   [c000000016367a90] [c0000000006d74bc] io_write+0x10c/0x460
>   [c000000016367bb0] [c0000000006d80e4] io_issue_sqe+0x8d4/0x1200
>   [c000000016367c70] [c0000000006d8ad0] io_wq_submit_work+0xc0/0x250
>   [c000000016367cb0] [c0000000006e2578] io_worker_handle_work+0x498/0x800
>   [c000000016367d40] [c0000000006e2cdc] io_wqe_worker+0x3fc/0x4f0
>   [c000000016367da0] [c0000000001cb0a4] kthread+0x1c4/0x1d0
>   [c000000016367e10] [c00000000000dbf0] ret_from_kernel_thread+0x5c/0x6c
> 
> The kernel consider thread AMR value for kernel thread to be
> AMR_KUAP_BLOCKED. Hence access to userspace is denied. This
> of course not correct and we should allow userspace access after
> kthread_use_mm(). To be precise, kthread_use_mm() should inherit the
> AMR value of the operating address space. But, the AMR value is
> thread-specific and we inherit the address space and not thread
> access restrictions. Because of this ignore AMR value when accessing
> userspace via kernel thread.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
> Changes from v1:
> * Address review feedback from Nick
> 
>   arch/powerpc/include/asm/book3s/64/kup.h | 8 +++++++-
>   1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/kup.h b/arch/powerpc/include/asm/book3s/64/kup.h
> index f50f72e535aa..95f4df99249e 100644
> --- a/arch/powerpc/include/asm/book3s/64/kup.h
> +++ b/arch/powerpc/include/asm/book3s/64/kup.h
> @@ -384,7 +384,13 @@ static __always_inline void allow_user_access(void __user *to, const void __user
>   	// This is written so we can resolve to a single case at build time
>   	BUILD_BUG_ON(!__builtin_constant_p(dir));
>   
> -	if (mmu_has_feature(MMU_FTR_PKEY))
> +	/*
> +	 * if it is a kthread that did kthread_use_mm() don't
> +	 * use current_thread_amr().

According to include/linux/sched.h, PF_KTHREAD means /* I am a kernel thread */
It doesn't seem to be related to kthread_use_mm()

> +	 */
> +	if (current->flags & PF_KTHREAD)
> +		thread_amr = 0;
> +	else if (mmu_has_feature(MMU_FTR_PKEY))
>   		thread_amr = current_thread_amr();
>   
>   	if (dir == KUAP_READ)
> 

Christophe
Aneesh Kumar K V Feb. 2, 2021, 6:16 a.m. UTC | #17
On 2/2/21 11:32 AM, Christophe Leroy wrote:
> 
> 
> Le 02/02/2021 à 06:55, Aneesh Kumar K.V a écrit :
>> Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> writes:
>>
>>> Nicholas Piggin <npiggin@gmail.com> writes:
>>>
>>>> Excerpts from Michael Ellerman's message of January 30, 2021 9:22 pm:
>>>>> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>>>>>> +Aneesh
>>>>>>
>>>>>> Le 29/01/2021 à 07:52, Zorro Lang a écrit :
>>>>> ..
>>>>>>> [   96.200296] ------------[ cut here ]------------
>>>>>>> [   96.200304] Bug: Read fault blocked by KUAP!
>>>>>>> [   96.200309] WARNING: CPU: 3 PID: 1876 at 
>>>>>>> arch/powerpc/mm/fault.c:229 bad_kernel_fault+0x180/0x310
>>>>>>
>>>>>>> [   96.200734] NIP [c000000000849424] 
>>>>>>> fault_in_pages_readable+0x104/0x350
>>>>>>> [   96.200741] LR [c00000000084952c] 
>>>>>>> fault_in_pages_readable+0x20c/0x350
>>>>>>> [   96.200747] --- interrupt: 300
>>>>>>
>>>>>>
>>>>>> Problem happens in a section where userspace access is supposed to 
>>>>>> be granted, so the patch you
>>>>>> proposed is definitely not the right fix.
>>>>>>
>>>>>> c000000000849408:    2c 01 00 4c     isync
>>>>>> c00000000084940c:    a6 03 3d 7d     mtspr   29,r9  <== granting 
>>>>>> userspace access permission
>>>>>> c000000000849410:    2c 01 00 4c     isync
>>>>>> c000000000849414:    00 00 36 e9     ld      r9,0(r22)
>>>>>> c000000000849418:    20 00 29 81     lwz     r9,32(r9)
>>>>>> c00000000084941c:    00 02 29 71     andi.   r9,r9,512
>>>>>> c000000000849420:    78 d3 5e 7f     mr      r30,r26
>>>>>> ==> c000000000849424:    00 00 bf 8b     lbz     r29,0(r31)  <== 
>>>>>> accessing userspace
>>>>>> c000000000849428:    10 00 82 41     beq     c000000000849438 
>>>>>> <fault_in_pages_readable+0x118>
>>>>>> c00000000084942c:    2c 01 00 4c     isync
>>>>>> c000000000849430:    a6 03 bd 7e     mtspr   29,r21  <== clearing 
>>>>>> userspace access permission
>>>>>> c000000000849434:    2c 01 00 4c     isync
>>>>>>
>>>>>> My first guess is that the problem is linked to the following 
>>>>>> function, see the comment
>>>>>>
>>>>>> /*
>>>>>>    * For kernel thread that doesn't have thread.regs return
>>>>>>    * default AMR/IAMR values.
>>>>>>    */
>>>>>> static inline u64 current_thread_amr(void)
>>>>>> {
>>>>>>     if (current->thread.regs)
>>>>>>         return current->thread.regs->amr;
>>>>>>     return AMR_KUAP_BLOCKED;
>>>>>> }
>>>>>>
>>>>>> Above function was introduced by commit 48a8ab4eeb82 
>>>>>> ("powerpc/book3s64/pkeys: Don't update SPRN_AMR
>>>>>> when in kernel mode")
>>>>>
>>>>> Yeah that's a bit of a curly one.
>>>>>
>>>>> At some point io_uring did kthread_use_mm(), which is supposed to mean
>>>>> the kthread can operate on behalf of the original process that 
>>>>> submitted
>>>>> the IO.
>>>>>
>>>>> But because KUAP is implemented using memory protection keys, it 
>>>>> depends
>>>>> on the value of the AMR register, which is not part of the mm, it's in
>>>>> thread.regs->amr.
>>>>>
>>>>> And what's worse by the time we're in kthread_use_mm() we no longer 
>>>>> have
>>>>> access to the thread.regs->amr of the original process that submitted
>>>>> the IO.
>>>>>
>>>>> We also can't simply move the AMR into the mm, precisely because it's
>>>>> per thread, not per mm.
>>>>>
>>>>> So TBH I don't know how we're going to fix this.
>>>>>
>>>>> I guess we could return AMR=unblocked for kernel threads, but that's
>>>>> arguably a bug because it allows a process to circumvent memory 
>>>>> keys by
>>>>> asking the kernel to do the access.
>>>>
>>>> We shouldn't need to inherit AMR should we? We only need it to be 
>>>> locked
>>>> for kernel threads until it's explicitly unlocked -- nothing mm 
>>>> specific
>>>> there. I think current_thread_amr could return 0 for kernel threads? Or
>>>> I would even avoid using that function for allow_user_access and open
>>>> code the kthread case and remove it from current_thread_amr().
>>>>
>>>> Thanks,
>>>> Nick
>>>
>>
>> updated one
>>
>>  From 8fdb0680f983940d61f91da8252b13c8d3e8ebee Mon Sep 17 00:00:00 2001
>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
>> Date: Tue, 2 Feb 2021 09:23:38 +0530
>> Subject: [PATCH v2] powerpc/kuap: Allow kernel thread to access userspace
>>   after kthread_use_mm
>>
>> This fix the bad fault reported by KUAP when io_wqe_worker access 
>> userspace.
>>
>>   Bug: Read fault blocked by KUAP!
>>   WARNING: CPU: 1 PID: 101841 at arch/powerpc/mm/fault.c:229 
>> __do_page_fault+0x6b4/0xcd0
>>   NIP [c00000000009e7e4] __do_page_fault+0x6b4/0xcd0
>>   LR [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0
>> ..........
>>   Call Trace:
>>   [c000000016367330] [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 
>> (unreliable)
>>   [c0000000163673e0] [c00000000009ee3c] do_page_fault+0x3c/0x120
>>   [c000000016367430] [c00000000000c848] handle_page_fault+0x10/0x2c
>>   --- interrupt: 300 at iov_iter_fault_in_readable+0x148/0x6f0
>> ..........
>>   NIP [c0000000008e8228] iov_iter_fault_in_readable+0x148/0x6f0
>>   LR [c0000000008e834c] iov_iter_fault_in_readable+0x26c/0x6f0
>>   interrupt: 300
>>   [c0000000163677e0] [c0000000007154a0] iomap_write_actor+0xc0/0x280
>>   [c000000016367880] [c00000000070fc94] iomap_apply+0x1c4/0x780
>>   [c000000016367990] [c000000000710330] 
>> iomap_file_buffered_write+0xa0/0x120
>>   [c0000000163679e0] [c00800000040791c] 
>> xfs_file_buffered_aio_write+0x314/0x5e0 [xfs]
>>   [c000000016367a90] [c0000000006d74bc] io_write+0x10c/0x460
>>   [c000000016367bb0] [c0000000006d80e4] io_issue_sqe+0x8d4/0x1200
>>   [c000000016367c70] [c0000000006d8ad0] io_wq_submit_work+0xc0/0x250
>>   [c000000016367cb0] [c0000000006e2578] io_worker_handle_work+0x498/0x800
>>   [c000000016367d40] [c0000000006e2cdc] io_wqe_worker+0x3fc/0x4f0
>>   [c000000016367da0] [c0000000001cb0a4] kthread+0x1c4/0x1d0
>>   [c000000016367e10] [c00000000000dbf0] ret_from_kernel_thread+0x5c/0x6c
>>
>> The kernel consider thread AMR value for kernel thread to be
>> AMR_KUAP_BLOCKED. Hence access to userspace is denied. This
>> of course not correct and we should allow userspace access after
>> kthread_use_mm(). To be precise, kthread_use_mm() should inherit the
>> AMR value of the operating address space. But, the AMR value is
>> thread-specific and we inherit the address space and not thread
>> access restrictions. Because of this ignore AMR value when accessing
>> userspace via kernel thread.
>>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> ---
>> Changes from v1:
>> * Address review feedback from Nick
>>
>>   arch/powerpc/include/asm/book3s/64/kup.h | 8 +++++++-
>>   1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/include/asm/book3s/64/kup.h 
>> b/arch/powerpc/include/asm/book3s/64/kup.h
>> index f50f72e535aa..95f4df99249e 100644
>> --- a/arch/powerpc/include/asm/book3s/64/kup.h
>> +++ b/arch/powerpc/include/asm/book3s/64/kup.h
>> @@ -384,7 +384,13 @@ static __always_inline void 
>> allow_user_access(void __user *to, const void __user
>>       // This is written so we can resolve to a single case at build time
>>       BUILD_BUG_ON(!__builtin_constant_p(dir));
>> -    if (mmu_has_feature(MMU_FTR_PKEY))
>> +    /*
>> +     * if it is a kthread that did kthread_use_mm() don't
>> +     * use current_thread_amr().
> 
> According to include/linux/sched.h, PF_KTHREAD means /* I am a kernel 
> thread */
> It doesn't seem to be related to kthread_use_mm()

That should be a sufficient check here. if we did reach here without 
calling kthread_user_mm, we will crash on access because we don't have a 
mm attached to the current process.  a kernel thread with kthread_use_mm has

current->mm == current->active_mm && current->flags & PF_KTHREAD.

The first part is true for every other process too.

> 
>> +     */
>> +    if (current->flags & PF_KTHREAD)
>> +        thread_amr = 0;
>> +    else if (mmu_has_feature(MMU_FTR_PKEY))
>>           thread_amr = current_thread_amr();
>>       if (dir == KUAP_READ)
>>
> 
> Christophe


-aneesh
Christophe Leroy Feb. 2, 2021, 6:20 a.m. UTC | #18
Le 02/02/2021 à 07:16, Aneesh Kumar K.V a écrit :
> On 2/2/21 11:32 AM, Christophe Leroy wrote:
>>
>>
>> Le 02/02/2021 à 06:55, Aneesh Kumar K.V a écrit :
>>> Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> writes:
>>>
>>>> Nicholas Piggin <npiggin@gmail.com> writes:
>>>>
>>>>> Excerpts from Michael Ellerman's message of January 30, 2021 9:22 pm:
>>>>>> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>>>>>>> +Aneesh
>>>>>>>
>>>>>>> Le 29/01/2021 à 07:52, Zorro Lang a écrit :
>>>>>> ..
>>>>>>>> [   96.200296] ------------[ cut here ]------------
>>>>>>>> [   96.200304] Bug: Read fault blocked by KUAP!
>>>>>>>> [   96.200309] WARNING: CPU: 3 PID: 1876 at arch/powerpc/mm/fault.c:229 
>>>>>>>> bad_kernel_fault+0x180/0x310
>>>>>>>
>>>>>>>> [   96.200734] NIP [c000000000849424] fault_in_pages_readable+0x104/0x350
>>>>>>>> [   96.200741] LR [c00000000084952c] fault_in_pages_readable+0x20c/0x350
>>>>>>>> [   96.200747] --- interrupt: 300
>>>>>>>
>>>>>>>
>>>>>>> Problem happens in a section where userspace access is supposed to be granted, so the patch you
>>>>>>> proposed is definitely not the right fix.
>>>>>>>
>>>>>>> c000000000849408:    2c 01 00 4c     isync
>>>>>>> c00000000084940c:    a6 03 3d 7d     mtspr   29,r9  <== granting userspace access permission
>>>>>>> c000000000849410:    2c 01 00 4c     isync
>>>>>>> c000000000849414:    00 00 36 e9     ld      r9,0(r22)
>>>>>>> c000000000849418:    20 00 29 81     lwz     r9,32(r9)
>>>>>>> c00000000084941c:    00 02 29 71     andi.   r9,r9,512
>>>>>>> c000000000849420:    78 d3 5e 7f     mr      r30,r26
>>>>>>> ==> c000000000849424:    00 00 bf 8b     lbz     r29,0(r31)  <== accessing userspace
>>>>>>> c000000000849428:    10 00 82 41     beq     c000000000849438 <fault_in_pages_readable+0x118>
>>>>>>> c00000000084942c:    2c 01 00 4c     isync
>>>>>>> c000000000849430:    a6 03 bd 7e     mtspr   29,r21  <== clearing userspace access permission
>>>>>>> c000000000849434:    2c 01 00 4c     isync
>>>>>>>
>>>>>>> My first guess is that the problem is linked to the following function, see the comment
>>>>>>>
>>>>>>> /*
>>>>>>>    * For kernel thread that doesn't have thread.regs return
>>>>>>>    * default AMR/IAMR values.
>>>>>>>    */
>>>>>>> static inline u64 current_thread_amr(void)
>>>>>>> {
>>>>>>>     if (current->thread.regs)
>>>>>>>         return current->thread.regs->amr;
>>>>>>>     return AMR_KUAP_BLOCKED;
>>>>>>> }
>>>>>>>
>>>>>>> Above function was introduced by commit 48a8ab4eeb82 ("powerpc/book3s64/pkeys: Don't update 
>>>>>>> SPRN_AMR
>>>>>>> when in kernel mode")
>>>>>>
>>>>>> Yeah that's a bit of a curly one.
>>>>>>
>>>>>> At some point io_uring did kthread_use_mm(), which is supposed to mean
>>>>>> the kthread can operate on behalf of the original process that submitted
>>>>>> the IO.
>>>>>>
>>>>>> But because KUAP is implemented using memory protection keys, it depends
>>>>>> on the value of the AMR register, which is not part of the mm, it's in
>>>>>> thread.regs->amr.
>>>>>>
>>>>>> And what's worse by the time we're in kthread_use_mm() we no longer have
>>>>>> access to the thread.regs->amr of the original process that submitted
>>>>>> the IO.
>>>>>>
>>>>>> We also can't simply move the AMR into the mm, precisely because it's
>>>>>> per thread, not per mm.
>>>>>>
>>>>>> So TBH I don't know how we're going to fix this.
>>>>>>
>>>>>> I guess we could return AMR=unblocked for kernel threads, but that's
>>>>>> arguably a bug because it allows a process to circumvent memory keys by
>>>>>> asking the kernel to do the access.
>>>>>
>>>>> We shouldn't need to inherit AMR should we? We only need it to be locked
>>>>> for kernel threads until it's explicitly unlocked -- nothing mm specific
>>>>> there. I think current_thread_amr could return 0 for kernel threads? Or
>>>>> I would even avoid using that function for allow_user_access and open
>>>>> code the kthread case and remove it from current_thread_amr().
>>>>>
>>>>> Thanks,
>>>>> Nick
>>>>
>>>
>>> updated one
>>>
>>>  From 8fdb0680f983940d61f91da8252b13c8d3e8ebee Mon Sep 17 00:00:00 2001
>>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
>>> Date: Tue, 2 Feb 2021 09:23:38 +0530
>>> Subject: [PATCH v2] powerpc/kuap: Allow kernel thread to access userspace
>>>   after kthread_use_mm
>>>
>>> This fix the bad fault reported by KUAP when io_wqe_worker access userspace.
>>>
>>>   Bug: Read fault blocked by KUAP!
>>>   WARNING: CPU: 1 PID: 101841 at arch/powerpc/mm/fault.c:229 __do_page_fault+0x6b4/0xcd0
>>>   NIP [c00000000009e7e4] __do_page_fault+0x6b4/0xcd0
>>>   LR [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0
>>> ..........
>>>   Call Trace:
>>>   [c000000016367330] [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 (unreliable)
>>>   [c0000000163673e0] [c00000000009ee3c] do_page_fault+0x3c/0x120
>>>   [c000000016367430] [c00000000000c848] handle_page_fault+0x10/0x2c
>>>   --- interrupt: 300 at iov_iter_fault_in_readable+0x148/0x6f0
>>> ..........
>>>   NIP [c0000000008e8228] iov_iter_fault_in_readable+0x148/0x6f0
>>>   LR [c0000000008e834c] iov_iter_fault_in_readable+0x26c/0x6f0
>>>   interrupt: 300
>>>   [c0000000163677e0] [c0000000007154a0] iomap_write_actor+0xc0/0x280
>>>   [c000000016367880] [c00000000070fc94] iomap_apply+0x1c4/0x780
>>>   [c000000016367990] [c000000000710330] iomap_file_buffered_write+0xa0/0x120
>>>   [c0000000163679e0] [c00800000040791c] xfs_file_buffered_aio_write+0x314/0x5e0 [xfs]
>>>   [c000000016367a90] [c0000000006d74bc] io_write+0x10c/0x460
>>>   [c000000016367bb0] [c0000000006d80e4] io_issue_sqe+0x8d4/0x1200
>>>   [c000000016367c70] [c0000000006d8ad0] io_wq_submit_work+0xc0/0x250
>>>   [c000000016367cb0] [c0000000006e2578] io_worker_handle_work+0x498/0x800
>>>   [c000000016367d40] [c0000000006e2cdc] io_wqe_worker+0x3fc/0x4f0
>>>   [c000000016367da0] [c0000000001cb0a4] kthread+0x1c4/0x1d0
>>>   [c000000016367e10] [c00000000000dbf0] ret_from_kernel_thread+0x5c/0x6c
>>>
>>> The kernel consider thread AMR value for kernel thread to be
>>> AMR_KUAP_BLOCKED. Hence access to userspace is denied. This
>>> of course not correct and we should allow userspace access after
>>> kthread_use_mm(). To be precise, kthread_use_mm() should inherit the
>>> AMR value of the operating address space. But, the AMR value is
>>> thread-specific and we inherit the address space and not thread
>>> access restrictions. Because of this ignore AMR value when accessing
>>> userspace via kernel thread.
>>>
>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>>> ---
>>> Changes from v1:
>>> * Address review feedback from Nick
>>>
>>>   arch/powerpc/include/asm/book3s/64/kup.h | 8 +++++++-
>>>   1 file changed, 7 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/powerpc/include/asm/book3s/64/kup.h b/arch/powerpc/include/asm/book3s/64/kup.h
>>> index f50f72e535aa..95f4df99249e 100644
>>> --- a/arch/powerpc/include/asm/book3s/64/kup.h
>>> +++ b/arch/powerpc/include/asm/book3s/64/kup.h
>>> @@ -384,7 +384,13 @@ static __always_inline void allow_user_access(void __user *to, const void 
>>> __user
>>>       // This is written so we can resolve to a single case at build time
>>>       BUILD_BUG_ON(!__builtin_constant_p(dir));
>>> -    if (mmu_has_feature(MMU_FTR_PKEY))
>>> +    /*
>>> +     * if it is a kthread that did kthread_use_mm() don't
>>> +     * use current_thread_amr().
>>
>> According to include/linux/sched.h, PF_KTHREAD means /* I am a kernel thread */
>> It doesn't seem to be related to kthread_use_mm()
> 
> That should be a sufficient check here. if we did reach here without calling kthread_user_mm, we 
> will crash on access because we don't have a mm attached to the current process.  a kernel thread 
> with kthread_use_mm has

Ok but then the comment doesn't match the check.

And also the comment in current_thread_amr() is then misleading.

Why not do the current->flags & PF_KTHREAD check in current_thread_amr() and return 0 in that case 
instead of BLOCKED ?

> 
> current->mm == current->active_mm && current->flags & PF_KTHREAD.
> 
> The first part is true for every other process too.
> 
>>
>>> +     */
>>> +    if (current->flags & PF_KTHREAD)
>>> +        thread_amr = 0;
>>> +    else if (mmu_has_feature(MMU_FTR_PKEY))
>>>           thread_amr = current_thread_amr();
>>>       if (dir == KUAP_READ)
>>>
>>
>> Christophe
> 
> 
> -aneesh
Aneesh Kumar K V Feb. 2, 2021, 6:30 a.m. UTC | #19
On 2/2/21 11:50 AM, Christophe Leroy wrote:
> 
> 
> Le 02/02/2021 à 07:16, Aneesh Kumar K.V a écrit :
>> On 2/2/21 11:32 AM, Christophe Leroy wrote:
>>>
>>>
>>> Le 02/02/2021 à 06:55, Aneesh Kumar K.V a écrit :
>>>> Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> writes:
>>>>
>>>>> Nicholas Piggin <npiggin@gmail.com> writes:
>>>>>
>>>>>> Excerpts from Michael Ellerman's message of January 30, 2021 9:22 pm:
>>>>>>> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>>>>>>>> +Aneesh
>>>>>>>>
>>>>>>>> Le 29/01/2021 à 07:52, Zorro Lang a écrit :
>>>>>>> ..
>>>>>>>>> [   96.200296] ------------[ cut here ]------------
>>>>>>>>> [   96.200304] Bug: Read fault blocked by KUAP!
>>>>>>>>> [   96.200309] WARNING: CPU: 3 PID: 1876 at 
>>>>>>>>> arch/powerpc/mm/fault.c:229 bad_kernel_fault+0x180/0x310
>>>>>>>>
>>>>>>>>> [   96.200734] NIP [c000000000849424] 
>>>>>>>>> fault_in_pages_readable+0x104/0x350
>>>>>>>>> [   96.200741] LR [c00000000084952c] 
>>>>>>>>> fault_in_pages_readable+0x20c/0x350
>>>>>>>>> [   96.200747] --- interrupt: 300
>>>>>>>>
>>>>>>>>
>>>>>>>> Problem happens in a section where userspace access is supposed 
>>>>>>>> to be granted, so the patch you
>>>>>>>> proposed is definitely not the right fix.
>>>>>>>>
>>>>>>>> c000000000849408:    2c 01 00 4c     isync
>>>>>>>> c00000000084940c:    a6 03 3d 7d     mtspr   29,r9  <== granting 
>>>>>>>> userspace access permission
>>>>>>>> c000000000849410:    2c 01 00 4c     isync
>>>>>>>> c000000000849414:    00 00 36 e9     ld      r9,0(r22)
>>>>>>>> c000000000849418:    20 00 29 81     lwz     r9,32(r9)
>>>>>>>> c00000000084941c:    00 02 29 71     andi.   r9,r9,512
>>>>>>>> c000000000849420:    78 d3 5e 7f     mr      r30,r26
>>>>>>>> ==> c000000000849424:    00 00 bf 8b     lbz     r29,0(r31)  <== 
>>>>>>>> accessing userspace
>>>>>>>> c000000000849428:    10 00 82 41     beq     c000000000849438 
>>>>>>>> <fault_in_pages_readable+0x118>
>>>>>>>> c00000000084942c:    2c 01 00 4c     isync
>>>>>>>> c000000000849430:    a6 03 bd 7e     mtspr   29,r21  <== 
>>>>>>>> clearing userspace access permission
>>>>>>>> c000000000849434:    2c 01 00 4c     isync
>>>>>>>>
>>>>>>>> My first guess is that the problem is linked to the following 
>>>>>>>> function, see the comment
>>>>>>>>
>>>>>>>> /*
>>>>>>>>    * For kernel thread that doesn't have thread.regs return
>>>>>>>>    * default AMR/IAMR values.
>>>>>>>>    */
>>>>>>>> static inline u64 current_thread_amr(void)
>>>>>>>> {
>>>>>>>>     if (current->thread.regs)
>>>>>>>>         return current->thread.regs->amr;
>>>>>>>>     return AMR_KUAP_BLOCKED;
>>>>>>>> }
>>>>>>>>
>>>>>>>> Above function was introduced by commit 48a8ab4eeb82 
>>>>>>>> ("powerpc/book3s64/pkeys: Don't update SPRN_AMR
>>>>>>>> when in kernel mode")
>>>>>>>
>>>>>>> Yeah that's a bit of a curly one.
>>>>>>>
>>>>>>> At some point io_uring did kthread_use_mm(), which is supposed to 
>>>>>>> mean
>>>>>>> the kthread can operate on behalf of the original process that 
>>>>>>> submitted
>>>>>>> the IO.
>>>>>>>
>>>>>>> But because KUAP is implemented using memory protection keys, it 
>>>>>>> depends
>>>>>>> on the value of the AMR register, which is not part of the mm, 
>>>>>>> it's in
>>>>>>> thread.regs->amr.
>>>>>>>
>>>>>>> And what's worse by the time we're in kthread_use_mm() we no 
>>>>>>> longer have
>>>>>>> access to the thread.regs->amr of the original process that 
>>>>>>> submitted
>>>>>>> the IO.
>>>>>>>
>>>>>>> We also can't simply move the AMR into the mm, precisely because 
>>>>>>> it's
>>>>>>> per thread, not per mm.
>>>>>>>
>>>>>>> So TBH I don't know how we're going to fix this.
>>>>>>>
>>>>>>> I guess we could return AMR=unblocked for kernel threads, but that's
>>>>>>> arguably a bug because it allows a process to circumvent memory 
>>>>>>> keys by
>>>>>>> asking the kernel to do the access.
>>>>>>
>>>>>> We shouldn't need to inherit AMR should we? We only need it to be 
>>>>>> locked
>>>>>> for kernel threads until it's explicitly unlocked -- nothing mm 
>>>>>> specific
>>>>>> there. I think current_thread_amr could return 0 for kernel 
>>>>>> threads? Or
>>>>>> I would even avoid using that function for allow_user_access and open
>>>>>> code the kthread case and remove it from current_thread_amr().
>>>>>>
>>>>>> Thanks,
>>>>>> Nick
>>>>>
>>>>
>>>> updated one
>>>>
>>>>  From 8fdb0680f983940d61f91da8252b13c8d3e8ebee Mon Sep 17 00:00:00 2001
>>>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
>>>> Date: Tue, 2 Feb 2021 09:23:38 +0530
>>>> Subject: [PATCH v2] powerpc/kuap: Allow kernel thread to access 
>>>> userspace
>>>>   after kthread_use_mm
>>>>
>>>> This fix the bad fault reported by KUAP when io_wqe_worker access 
>>>> userspace.
>>>>
>>>>   Bug: Read fault blocked by KUAP!
>>>>   WARNING: CPU: 1 PID: 101841 at arch/powerpc/mm/fault.c:229 
>>>> __do_page_fault+0x6b4/0xcd0
>>>>   NIP [c00000000009e7e4] __do_page_fault+0x6b4/0xcd0
>>>>   LR [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0
>>>> ..........
>>>>   Call Trace:
>>>>   [c000000016367330] [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 
>>>> (unreliable)
>>>>   [c0000000163673e0] [c00000000009ee3c] do_page_fault+0x3c/0x120
>>>>   [c000000016367430] [c00000000000c848] handle_page_fault+0x10/0x2c
>>>>   --- interrupt: 300 at iov_iter_fault_in_readable+0x148/0x6f0
>>>> ..........
>>>>   NIP [c0000000008e8228] iov_iter_fault_in_readable+0x148/0x6f0
>>>>   LR [c0000000008e834c] iov_iter_fault_in_readable+0x26c/0x6f0
>>>>   interrupt: 300
>>>>   [c0000000163677e0] [c0000000007154a0] iomap_write_actor+0xc0/0x280
>>>>   [c000000016367880] [c00000000070fc94] iomap_apply+0x1c4/0x780
>>>>   [c000000016367990] [c000000000710330] 
>>>> iomap_file_buffered_write+0xa0/0x120
>>>>   [c0000000163679e0] [c00800000040791c] 
>>>> xfs_file_buffered_aio_write+0x314/0x5e0 [xfs]
>>>>   [c000000016367a90] [c0000000006d74bc] io_write+0x10c/0x460
>>>>   [c000000016367bb0] [c0000000006d80e4] io_issue_sqe+0x8d4/0x1200
>>>>   [c000000016367c70] [c0000000006d8ad0] io_wq_submit_work+0xc0/0x250
>>>>   [c000000016367cb0] [c0000000006e2578] 
>>>> io_worker_handle_work+0x498/0x800
>>>>   [c000000016367d40] [c0000000006e2cdc] io_wqe_worker+0x3fc/0x4f0
>>>>   [c000000016367da0] [c0000000001cb0a4] kthread+0x1c4/0x1d0
>>>>   [c000000016367e10] [c00000000000dbf0] 
>>>> ret_from_kernel_thread+0x5c/0x6c
>>>>
>>>> The kernel consider thread AMR value for kernel thread to be
>>>> AMR_KUAP_BLOCKED. Hence access to userspace is denied. This
>>>> of course not correct and we should allow userspace access after
>>>> kthread_use_mm(). To be precise, kthread_use_mm() should inherit the
>>>> AMR value of the operating address space. But, the AMR value is
>>>> thread-specific and we inherit the address space and not thread
>>>> access restrictions. Because of this ignore AMR value when accessing
>>>> userspace via kernel thread.
>>>>
>>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>>>> ---
>>>> Changes from v1:
>>>> * Address review feedback from Nick
>>>>
>>>>   arch/powerpc/include/asm/book3s/64/kup.h | 8 +++++++-
>>>>   1 file changed, 7 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/powerpc/include/asm/book3s/64/kup.h 
>>>> b/arch/powerpc/include/asm/book3s/64/kup.h
>>>> index f50f72e535aa..95f4df99249e 100644
>>>> --- a/arch/powerpc/include/asm/book3s/64/kup.h
>>>> +++ b/arch/powerpc/include/asm/book3s/64/kup.h
>>>> @@ -384,7 +384,13 @@ static __always_inline void 
>>>> allow_user_access(void __user *to, const void __user
>>>>       // This is written so we can resolve to a single case at build 
>>>> time
>>>>       BUILD_BUG_ON(!__builtin_constant_p(dir));
>>>> -    if (mmu_has_feature(MMU_FTR_PKEY))
>>>> +    /*
>>>> +     * if it is a kthread that did kthread_use_mm() don't
>>>> +     * use current_thread_amr().
>>>
>>> According to include/linux/sched.h, PF_KTHREAD means /* I am a kernel 
>>> thread */
>>> It doesn't seem to be related to kthread_use_mm()
>>
>> That should be a sufficient check here. if we did reach here without 
>> calling kthread_user_mm, we will crash on access because we don't have 
>> a mm attached to the current process.  a kernel thread with 
>> kthread_use_mm has
> 
> Ok but then the comment doesn't match the check.


I was trying to be explict in the comment that we expect the thread to 
have done kthread_use_mm().

> 
> And also the comment in current_thread_amr() is then misleading.
> 
> Why not do the current->flags & PF_KTHREAD check in current_thread_amr() 
> and return 0 in that case instead of BLOCKED ?

In my view currrent_thread_amr() is more generic and we want to be 
explicit there that a kernel thread AMR is KUAP_BLOCKED. Only when we 
call allow user access, we relax the AMR value.

> 
>>
>> current->mm == current->active_mm && current->flags & PF_KTHREAD.
>>
>> The first part is true for every other process too.
>>
>>>
>>>> +     */
>>>> +    if (current->flags & PF_KTHREAD)
>>>> +        thread_amr = 0;
>>>> +    else if (mmu_has_feature(MMU_FTR_PKEY))
>>>>           thread_amr = current_thread_amr();
>>>>       if (dir == KUAP_READ)
>>>>
>>>
>>> Christophe
>>
>>
>> -aneesh
Nicholas Piggin Feb. 2, 2021, 9:49 a.m. UTC | #20
Excerpts from Aneesh Kumar K.V's message of February 2, 2021 4:30 pm:
> On 2/2/21 11:50 AM, Christophe Leroy wrote:
>> 
>> 
>> Le 02/02/2021 à 07:16, Aneesh Kumar K.V a écrit :
>>> On 2/2/21 11:32 AM, Christophe Leroy wrote:
>>>>
>>>>
>>>> Le 02/02/2021 à 06:55, Aneesh Kumar K.V a écrit :
>>>>> Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> writes:
>>>>>
>>>>>> Nicholas Piggin <npiggin@gmail.com> writes:
>>>>>>
>>>>>>> Excerpts from Michael Ellerman's message of January 30, 2021 9:22 pm:
>>>>>>>> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>>>>>>>>> +Aneesh
>>>>>>>>>
>>>>>>>>> Le 29/01/2021 à 07:52, Zorro Lang a écrit :
>>>>>>>> ..
>>>>>>>>>> [   96.200296] ------------[ cut here ]------------
>>>>>>>>>> [   96.200304] Bug: Read fault blocked by KUAP!
>>>>>>>>>> [   96.200309] WARNING: CPU: 3 PID: 1876 at 
>>>>>>>>>> arch/powerpc/mm/fault.c:229 bad_kernel_fault+0x180/0x310
>>>>>>>>>
>>>>>>>>>> [   96.200734] NIP [c000000000849424] 
>>>>>>>>>> fault_in_pages_readable+0x104/0x350
>>>>>>>>>> [   96.200741] LR [c00000000084952c] 
>>>>>>>>>> fault_in_pages_readable+0x20c/0x350
>>>>>>>>>> [   96.200747] --- interrupt: 300
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Problem happens in a section where userspace access is supposed 
>>>>>>>>> to be granted, so the patch you
>>>>>>>>> proposed is definitely not the right fix.
>>>>>>>>>
>>>>>>>>> c000000000849408:    2c 01 00 4c     isync
>>>>>>>>> c00000000084940c:    a6 03 3d 7d     mtspr   29,r9  <== granting 
>>>>>>>>> userspace access permission
>>>>>>>>> c000000000849410:    2c 01 00 4c     isync
>>>>>>>>> c000000000849414:    00 00 36 e9     ld      r9,0(r22)
>>>>>>>>> c000000000849418:    20 00 29 81     lwz     r9,32(r9)
>>>>>>>>> c00000000084941c:    00 02 29 71     andi.   r9,r9,512
>>>>>>>>> c000000000849420:    78 d3 5e 7f     mr      r30,r26
>>>>>>>>> ==> c000000000849424:    00 00 bf 8b     lbz     r29,0(r31)  <== 
>>>>>>>>> accessing userspace
>>>>>>>>> c000000000849428:    10 00 82 41     beq     c000000000849438 
>>>>>>>>> <fault_in_pages_readable+0x118>
>>>>>>>>> c00000000084942c:    2c 01 00 4c     isync
>>>>>>>>> c000000000849430:    a6 03 bd 7e     mtspr   29,r21  <== 
>>>>>>>>> clearing userspace access permission
>>>>>>>>> c000000000849434:    2c 01 00 4c     isync
>>>>>>>>>
>>>>>>>>> My first guess is that the problem is linked to the following 
>>>>>>>>> function, see the comment
>>>>>>>>>
>>>>>>>>> /*
>>>>>>>>>    * For kernel thread that doesn't have thread.regs return
>>>>>>>>>    * default AMR/IAMR values.
>>>>>>>>>    */
>>>>>>>>> static inline u64 current_thread_amr(void)
>>>>>>>>> {
>>>>>>>>>     if (current->thread.regs)
>>>>>>>>>         return current->thread.regs->amr;
>>>>>>>>>     return AMR_KUAP_BLOCKED;
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> Above function was introduced by commit 48a8ab4eeb82 
>>>>>>>>> ("powerpc/book3s64/pkeys: Don't update SPRN_AMR
>>>>>>>>> when in kernel mode")
>>>>>>>>
>>>>>>>> Yeah that's a bit of a curly one.
>>>>>>>>
>>>>>>>> At some point io_uring did kthread_use_mm(), which is supposed to 
>>>>>>>> mean
>>>>>>>> the kthread can operate on behalf of the original process that 
>>>>>>>> submitted
>>>>>>>> the IO.
>>>>>>>>
>>>>>>>> But because KUAP is implemented using memory protection keys, it 
>>>>>>>> depends
>>>>>>>> on the value of the AMR register, which is not part of the mm, 
>>>>>>>> it's in
>>>>>>>> thread.regs->amr.
>>>>>>>>
>>>>>>>> And what's worse by the time we're in kthread_use_mm() we no 
>>>>>>>> longer have
>>>>>>>> access to the thread.regs->amr of the original process that 
>>>>>>>> submitted
>>>>>>>> the IO.
>>>>>>>>
>>>>>>>> We also can't simply move the AMR into the mm, precisely because 
>>>>>>>> it's
>>>>>>>> per thread, not per mm.
>>>>>>>>
>>>>>>>> So TBH I don't know how we're going to fix this.
>>>>>>>>
>>>>>>>> I guess we could return AMR=unblocked for kernel threads, but that's
>>>>>>>> arguably a bug because it allows a process to circumvent memory 
>>>>>>>> keys by
>>>>>>>> asking the kernel to do the access.
>>>>>>>
>>>>>>> We shouldn't need to inherit AMR should we? We only need it to be 
>>>>>>> locked
>>>>>>> for kernel threads until it's explicitly unlocked -- nothing mm 
>>>>>>> specific
>>>>>>> there. I think current_thread_amr could return 0 for kernel 
>>>>>>> threads? Or
>>>>>>> I would even avoid using that function for allow_user_access and open
>>>>>>> code the kthread case and remove it from current_thread_amr().
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Nick
>>>>>>
>>>>>
>>>>> updated one
>>>>>
>>>>>  From 8fdb0680f983940d61f91da8252b13c8d3e8ebee Mon Sep 17 00:00:00 2001
>>>>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
>>>>> Date: Tue, 2 Feb 2021 09:23:38 +0530
>>>>> Subject: [PATCH v2] powerpc/kuap: Allow kernel thread to access 
>>>>> userspace
>>>>>   after kthread_use_mm
>>>>>
>>>>> This fix the bad fault reported by KUAP when io_wqe_worker access 
>>>>> userspace.
>>>>>
>>>>>   Bug: Read fault blocked by KUAP!
>>>>>   WARNING: CPU: 1 PID: 101841 at arch/powerpc/mm/fault.c:229 
>>>>> __do_page_fault+0x6b4/0xcd0
>>>>>   NIP [c00000000009e7e4] __do_page_fault+0x6b4/0xcd0
>>>>>   LR [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0
>>>>> ..........
>>>>>   Call Trace:
>>>>>   [c000000016367330] [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 
>>>>> (unreliable)
>>>>>   [c0000000163673e0] [c00000000009ee3c] do_page_fault+0x3c/0x120
>>>>>   [c000000016367430] [c00000000000c848] handle_page_fault+0x10/0x2c
>>>>>   --- interrupt: 300 at iov_iter_fault_in_readable+0x148/0x6f0
>>>>> ..........
>>>>>   NIP [c0000000008e8228] iov_iter_fault_in_readable+0x148/0x6f0
>>>>>   LR [c0000000008e834c] iov_iter_fault_in_readable+0x26c/0x6f0
>>>>>   interrupt: 300
>>>>>   [c0000000163677e0] [c0000000007154a0] iomap_write_actor+0xc0/0x280
>>>>>   [c000000016367880] [c00000000070fc94] iomap_apply+0x1c4/0x780
>>>>>   [c000000016367990] [c000000000710330] 
>>>>> iomap_file_buffered_write+0xa0/0x120
>>>>>   [c0000000163679e0] [c00800000040791c] 
>>>>> xfs_file_buffered_aio_write+0x314/0x5e0 [xfs]
>>>>>   [c000000016367a90] [c0000000006d74bc] io_write+0x10c/0x460
>>>>>   [c000000016367bb0] [c0000000006d80e4] io_issue_sqe+0x8d4/0x1200
>>>>>   [c000000016367c70] [c0000000006d8ad0] io_wq_submit_work+0xc0/0x250
>>>>>   [c000000016367cb0] [c0000000006e2578] 
>>>>> io_worker_handle_work+0x498/0x800
>>>>>   [c000000016367d40] [c0000000006e2cdc] io_wqe_worker+0x3fc/0x4f0
>>>>>   [c000000016367da0] [c0000000001cb0a4] kthread+0x1c4/0x1d0
>>>>>   [c000000016367e10] [c00000000000dbf0] 
>>>>> ret_from_kernel_thread+0x5c/0x6c
>>>>>
>>>>> The kernel consider thread AMR value for kernel thread to be
>>>>> AMR_KUAP_BLOCKED. Hence access to userspace is denied. This
>>>>> of course not correct and we should allow userspace access after
>>>>> kthread_use_mm(). To be precise, kthread_use_mm() should inherit the
>>>>> AMR value of the operating address space. But, the AMR value is
>>>>> thread-specific and we inherit the address space and not thread
>>>>> access restrictions. Because of this ignore AMR value when accessing
>>>>> userspace via kernel thread.
>>>>>
>>>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>>>>> ---
>>>>> Changes from v1:
>>>>> * Address review feedback from Nick
>>>>>
>>>>>   arch/powerpc/include/asm/book3s/64/kup.h | 8 +++++++-
>>>>>   1 file changed, 7 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/arch/powerpc/include/asm/book3s/64/kup.h 
>>>>> b/arch/powerpc/include/asm/book3s/64/kup.h
>>>>> index f50f72e535aa..95f4df99249e 100644
>>>>> --- a/arch/powerpc/include/asm/book3s/64/kup.h
>>>>> +++ b/arch/powerpc/include/asm/book3s/64/kup.h
>>>>> @@ -384,7 +384,13 @@ static __always_inline void 
>>>>> allow_user_access(void __user *to, const void __user
>>>>>       // This is written so we can resolve to a single case at build 
>>>>> time
>>>>>       BUILD_BUG_ON(!__builtin_constant_p(dir));
>>>>> -    if (mmu_has_feature(MMU_FTR_PKEY))
>>>>> +    /*
>>>>> +     * if it is a kthread that did kthread_use_mm() don't
>>>>> +     * use current_thread_amr().
>>>>
>>>> According to include/linux/sched.h, PF_KTHREAD means /* I am a kernel 
>>>> thread */
>>>> It doesn't seem to be related to kthread_use_mm()
>>>
>>> That should be a sufficient check here. if we did reach here without 
>>> calling kthread_user_mm, we will crash on access because we don't have 
>>> a mm attached to the current process.  a kernel thread with 
>>> kthread_use_mm has
>> 
>> Ok but then the comment doesn't match the check.
> 
> 
> I was trying to be explict in the comment that we expect the thread to 
> have done kthread_use_mm().

I would avoid making it sound conditional. There is no way for a kernel 
thread to ever access user without having done so.

 /*
  * Kernel threads may access user mm with kthread_use_mm() but
  * can't use current_thread_amr because they have thread.regs==NULL,
  * but they have no pkeys.
  */

> 
>> 
>> And also the comment in current_thread_amr() is then misleading.
>> 
>> Why not do the current->flags & PF_KTHREAD check in current_thread_amr() 
>> and return 0 in that case instead of BLOCKED ?
> 
> In my view currrent_thread_amr() is more generic and we want to be 
> explicit there that a kernel thread AMR is KUAP_BLOCKED. Only when we 
> call allow user access, we relax the AMR value.

current_thread_amr() shouldn't be used by kernel threads ever
(after the patch). It's just confusing. The user access or the
check could have a test and warning for !current->mm if there
is a concern about it (maybe there already is one somehwere).

Thanks,
Nick
Jens Axboe Feb. 4, 2021, 4:53 p.m. UTC | #21
On 2/1/21 11:30 PM, Aneesh Kumar K.V wrote:
> On 2/2/21 11:50 AM, Christophe Leroy wrote:
>>
>>
>> Le 02/02/2021 à 07:16, Aneesh Kumar K.V a écrit :
>>> On 2/2/21 11:32 AM, Christophe Leroy wrote:
>>>>
>>>>
>>>> Le 02/02/2021 à 06:55, Aneesh Kumar K.V a écrit :
>>>>> Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> writes:
>>>>>
>>>>>> Nicholas Piggin <npiggin@gmail.com> writes:
>>>>>>
>>>>>>> Excerpts from Michael Ellerman's message of January 30, 2021 9:22 pm:
>>>>>>>> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>>>>>>>>> +Aneesh
>>>>>>>>>
>>>>>>>>> Le 29/01/2021 à 07:52, Zorro Lang a écrit :
>>>>>>>> ..
>>>>>>>>>> [   96.200296] ------------[ cut here ]------------
>>>>>>>>>> [   96.200304] Bug: Read fault blocked by KUAP!
>>>>>>>>>> [   96.200309] WARNING: CPU: 3 PID: 1876 at 
>>>>>>>>>> arch/powerpc/mm/fault.c:229 bad_kernel_fault+0x180/0x310
>>>>>>>>>
>>>>>>>>>> [   96.200734] NIP [c000000000849424] 
>>>>>>>>>> fault_in_pages_readable+0x104/0x350
>>>>>>>>>> [   96.200741] LR [c00000000084952c] 
>>>>>>>>>> fault_in_pages_readable+0x20c/0x350
>>>>>>>>>> [   96.200747] --- interrupt: 300
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Problem happens in a section where userspace access is supposed 
>>>>>>>>> to be granted, so the patch you
>>>>>>>>> proposed is definitely not the right fix.
>>>>>>>>>
>>>>>>>>> c000000000849408:    2c 01 00 4c     isync
>>>>>>>>> c00000000084940c:    a6 03 3d 7d     mtspr   29,r9  <== granting 
>>>>>>>>> userspace access permission
>>>>>>>>> c000000000849410:    2c 01 00 4c     isync
>>>>>>>>> c000000000849414:    00 00 36 e9     ld      r9,0(r22)
>>>>>>>>> c000000000849418:    20 00 29 81     lwz     r9,32(r9)
>>>>>>>>> c00000000084941c:    00 02 29 71     andi.   r9,r9,512
>>>>>>>>> c000000000849420:    78 d3 5e 7f     mr      r30,r26
>>>>>>>>> ==> c000000000849424:    00 00 bf 8b     lbz     r29,0(r31)  <== 
>>>>>>>>> accessing userspace
>>>>>>>>> c000000000849428:    10 00 82 41     beq     c000000000849438 
>>>>>>>>> <fault_in_pages_readable+0x118>
>>>>>>>>> c00000000084942c:    2c 01 00 4c     isync
>>>>>>>>> c000000000849430:    a6 03 bd 7e     mtspr   29,r21  <== 
>>>>>>>>> clearing userspace access permission
>>>>>>>>> c000000000849434:    2c 01 00 4c     isync
>>>>>>>>>
>>>>>>>>> My first guess is that the problem is linked to the following 
>>>>>>>>> function, see the comment
>>>>>>>>>
>>>>>>>>> /*
>>>>>>>>>    * For kernel thread that doesn't have thread.regs return
>>>>>>>>>    * default AMR/IAMR values.
>>>>>>>>>    */
>>>>>>>>> static inline u64 current_thread_amr(void)
>>>>>>>>> {
>>>>>>>>>     if (current->thread.regs)
>>>>>>>>>         return current->thread.regs->amr;
>>>>>>>>>     return AMR_KUAP_BLOCKED;
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> Above function was introduced by commit 48a8ab4eeb82 
>>>>>>>>> ("powerpc/book3s64/pkeys: Don't update SPRN_AMR
>>>>>>>>> when in kernel mode")
>>>>>>>>
>>>>>>>> Yeah that's a bit of a curly one.
>>>>>>>>
>>>>>>>> At some point io_uring did kthread_use_mm(), which is supposed to 
>>>>>>>> mean
>>>>>>>> the kthread can operate on behalf of the original process that 
>>>>>>>> submitted
>>>>>>>> the IO.
>>>>>>>>
>>>>>>>> But because KUAP is implemented using memory protection keys, it 
>>>>>>>> depends
>>>>>>>> on the value of the AMR register, which is not part of the mm, 
>>>>>>>> it's in
>>>>>>>> thread.regs->amr.
>>>>>>>>
>>>>>>>> And what's worse by the time we're in kthread_use_mm() we no 
>>>>>>>> longer have
>>>>>>>> access to the thread.regs->amr of the original process that 
>>>>>>>> submitted
>>>>>>>> the IO.
>>>>>>>>
>>>>>>>> We also can't simply move the AMR into the mm, precisely because 
>>>>>>>> it's
>>>>>>>> per thread, not per mm.
>>>>>>>>
>>>>>>>> So TBH I don't know how we're going to fix this.
>>>>>>>>
>>>>>>>> I guess we could return AMR=unblocked for kernel threads, but that's
>>>>>>>> arguably a bug because it allows a process to circumvent memory 
>>>>>>>> keys by
>>>>>>>> asking the kernel to do the access.
>>>>>>>
>>>>>>> We shouldn't need to inherit AMR should we? We only need it to be 
>>>>>>> locked
>>>>>>> for kernel threads until it's explicitly unlocked -- nothing mm 
>>>>>>> specific
>>>>>>> there. I think current_thread_amr could return 0 for kernel 
>>>>>>> threads? Or
>>>>>>> I would even avoid using that function for allow_user_access and open
>>>>>>> code the kthread case and remove it from current_thread_amr().
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Nick
>>>>>>
>>>>>
>>>>> updated one
>>>>>
>>>>>  From 8fdb0680f983940d61f91da8252b13c8d3e8ebee Mon Sep 17 00:00:00 2001
>>>>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
>>>>> Date: Tue, 2 Feb 2021 09:23:38 +0530
>>>>> Subject: [PATCH v2] powerpc/kuap: Allow kernel thread to access 
>>>>> userspace
>>>>>   after kthread_use_mm
>>>>>
>>>>> This fix the bad fault reported by KUAP when io_wqe_worker access 
>>>>> userspace.
>>>>>
>>>>>   Bug: Read fault blocked by KUAP!
>>>>>   WARNING: CPU: 1 PID: 101841 at arch/powerpc/mm/fault.c:229 
>>>>> __do_page_fault+0x6b4/0xcd0
>>>>>   NIP [c00000000009e7e4] __do_page_fault+0x6b4/0xcd0
>>>>>   LR [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0
>>>>> ..........
>>>>>   Call Trace:
>>>>>   [c000000016367330] [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 
>>>>> (unreliable)
>>>>>   [c0000000163673e0] [c00000000009ee3c] do_page_fault+0x3c/0x120
>>>>>   [c000000016367430] [c00000000000c848] handle_page_fault+0x10/0x2c
>>>>>   --- interrupt: 300 at iov_iter_fault_in_readable+0x148/0x6f0
>>>>> ..........
>>>>>   NIP [c0000000008e8228] iov_iter_fault_in_readable+0x148/0x6f0
>>>>>   LR [c0000000008e834c] iov_iter_fault_in_readable+0x26c/0x6f0
>>>>>   interrupt: 300
>>>>>   [c0000000163677e0] [c0000000007154a0] iomap_write_actor+0xc0/0x280
>>>>>   [c000000016367880] [c00000000070fc94] iomap_apply+0x1c4/0x780
>>>>>   [c000000016367990] [c000000000710330] 
>>>>> iomap_file_buffered_write+0xa0/0x120
>>>>>   [c0000000163679e0] [c00800000040791c] 
>>>>> xfs_file_buffered_aio_write+0x314/0x5e0 [xfs]
>>>>>   [c000000016367a90] [c0000000006d74bc] io_write+0x10c/0x460
>>>>>   [c000000016367bb0] [c0000000006d80e4] io_issue_sqe+0x8d4/0x1200
>>>>>   [c000000016367c70] [c0000000006d8ad0] io_wq_submit_work+0xc0/0x250
>>>>>   [c000000016367cb0] [c0000000006e2578] 
>>>>> io_worker_handle_work+0x498/0x800
>>>>>   [c000000016367d40] [c0000000006e2cdc] io_wqe_worker+0x3fc/0x4f0
>>>>>   [c000000016367da0] [c0000000001cb0a4] kthread+0x1c4/0x1d0
>>>>>   [c000000016367e10] [c00000000000dbf0] 
>>>>> ret_from_kernel_thread+0x5c/0x6c
>>>>>
>>>>> The kernel consider thread AMR value for kernel thread to be
>>>>> AMR_KUAP_BLOCKED. Hence access to userspace is denied. This
>>>>> of course not correct and we should allow userspace access after
>>>>> kthread_use_mm(). To be precise, kthread_use_mm() should inherit the
>>>>> AMR value of the operating address space. But, the AMR value is
>>>>> thread-specific and we inherit the address space and not thread
>>>>> access restrictions. Because of this ignore AMR value when accessing
>>>>> userspace via kernel thread.
>>>>>
>>>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>>>>> ---
>>>>> Changes from v1:
>>>>> * Address review feedback from Nick
>>>>>
>>>>>   arch/powerpc/include/asm/book3s/64/kup.h | 8 +++++++-
>>>>>   1 file changed, 7 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/arch/powerpc/include/asm/book3s/64/kup.h 
>>>>> b/arch/powerpc/include/asm/book3s/64/kup.h
>>>>> index f50f72e535aa..95f4df99249e 100644
>>>>> --- a/arch/powerpc/include/asm/book3s/64/kup.h
>>>>> +++ b/arch/powerpc/include/asm/book3s/64/kup.h
>>>>> @@ -384,7 +384,13 @@ static __always_inline void 
>>>>> allow_user_access(void __user *to, const void __user
>>>>>       // This is written so we can resolve to a single case at build 
>>>>> time
>>>>>       BUILD_BUG_ON(!__builtin_constant_p(dir));
>>>>> -    if (mmu_has_feature(MMU_FTR_PKEY))
>>>>> +    /*
>>>>> +     * if it is a kthread that did kthread_use_mm() don't
>>>>> +     * use current_thread_amr().
>>>>
>>>> According to include/linux/sched.h, PF_KTHREAD means /* I am a kernel 
>>>> thread */
>>>> It doesn't seem to be related to kthread_use_mm()
>>>
>>> That should be a sufficient check here. if we did reach here without 
>>> calling kthread_user_mm, we will crash on access because we don't have 
>>> a mm attached to the current process.  a kernel thread with 
>>> kthread_use_mm has
>>
>> Ok but then the comment doesn't match the check.
> 
> 
> I was trying to be explict in the comment that we expect the thread to 
> have done kthread_use_mm().
> 
>>
>> And also the comment in current_thread_amr() is then misleading.
>>
>> Why not do the current->flags & PF_KTHREAD check in current_thread_amr() 
>> and return 0 in that case instead of BLOCKED ?
> 
> In my view currrent_thread_amr() is more generic and we want to be 
> explicit there that a kernel thread AMR is KUAP_BLOCKED. Only when we 
> call allow user access, we relax the AMR value.

Just following up on this, as I'd hate to have 5.11 released with this
bug in it for powerpc. It'll obviously also affect other cases of a
kernel thread faulting after having done kthread_use_mm(), though I'm
not sure how widespread that is. In any case, it'll leave io_uring
mostly broken on powerpc if this isn't patched for release.
Aneesh Kumar K V Feb. 4, 2021, 5:01 p.m. UTC | #22
On 2/4/21 10:23 PM, Jens Axboe wrote:
> On 2/1/21 11:30 PM, Aneesh Kumar K.V wrote:
>> On 2/2/21 11:50 AM, Christophe Leroy wrote:
>>>
>>>
>>> Le 02/02/2021 à 07:16, Aneesh Kumar K.V a écrit :
>>>> On 2/2/21 11:32 AM, Christophe Leroy wrote:
>>>>>
>>>>>
>>>>> Le 02/02/2021 à 06:55, Aneesh Kumar K.V a écrit :
>>>>>> Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>
>>>>>>> Nicholas Piggin <npiggin@gmail.com> writes:
>>>>>>>
>>>>>>>> Excerpts from Michael Ellerman's message of January 30, 2021 9:22 pm:
>>>>>>>>> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>>>>>>>>>> +Aneesh
>>>>>>>>>>
>>>>>>>>>> Le 29/01/2021 à 07:52, Zorro Lang a écrit :
>>>>>>>>> ..
>>>>>>>>>>> [   96.200296] ------------[ cut here ]------------
>>>>>>>>>>> [   96.200304] Bug: Read fault blocked by KUAP!
>>>>>>>>>>> [   96.200309] WARNING: CPU: 3 PID: 1876 at
>>>>>>>>>>> arch/powerpc/mm/fault.c:229 bad_kernel_fault+0x180/0x310
>>>>>>>>>>
>>>>>>>>>>> [   96.200734] NIP [c000000000849424]
>>>>>>>>>>> fault_in_pages_readable+0x104/0x350
>>>>>>>>>>> [   96.200741] LR [c00000000084952c]
>>>>>>>>>>> fault_in_pages_readable+0x20c/0x350
>>>>>>>>>>> [   96.200747] --- interrupt: 300
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Problem happens in a section where userspace access is supposed
>>>>>>>>>> to be granted, so the patch you
>>>>>>>>>> proposed is definitely not the right fix.
>>>>>>>>>>
>>>>>>>>>> c000000000849408:    2c 01 00 4c     isync
>>>>>>>>>> c00000000084940c:    a6 03 3d 7d     mtspr   29,r9  <== granting
>>>>>>>>>> userspace access permission
>>>>>>>>>> c000000000849410:    2c 01 00 4c     isync
>>>>>>>>>> c000000000849414:    00 00 36 e9     ld      r9,0(r22)
>>>>>>>>>> c000000000849418:    20 00 29 81     lwz     r9,32(r9)
>>>>>>>>>> c00000000084941c:    00 02 29 71     andi.   r9,r9,512
>>>>>>>>>> c000000000849420:    78 d3 5e 7f     mr      r30,r26
>>>>>>>>>> ==> c000000000849424:    00 00 bf 8b     lbz     r29,0(r31)  <==
>>>>>>>>>> accessing userspace
>>>>>>>>>> c000000000849428:    10 00 82 41     beq     c000000000849438
>>>>>>>>>> <fault_in_pages_readable+0x118>
>>>>>>>>>> c00000000084942c:    2c 01 00 4c     isync
>>>>>>>>>> c000000000849430:    a6 03 bd 7e     mtspr   29,r21  <==
>>>>>>>>>> clearing userspace access permission
>>>>>>>>>> c000000000849434:    2c 01 00 4c     isync
>>>>>>>>>>
>>>>>>>>>> My first guess is that the problem is linked to the following
>>>>>>>>>> function, see the comment
>>>>>>>>>>
>>>>>>>>>> /*
>>>>>>>>>>     * For kernel thread that doesn't have thread.regs return
>>>>>>>>>>     * default AMR/IAMR values.
>>>>>>>>>>     */
>>>>>>>>>> static inline u64 current_thread_amr(void)
>>>>>>>>>> {
>>>>>>>>>>      if (current->thread.regs)
>>>>>>>>>>          return current->thread.regs->amr;
>>>>>>>>>>      return AMR_KUAP_BLOCKED;
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> Above function was introduced by commit 48a8ab4eeb82
>>>>>>>>>> ("powerpc/book3s64/pkeys: Don't update SPRN_AMR
>>>>>>>>>> when in kernel mode")
>>>>>>>>>
>>>>>>>>> Yeah that's a bit of a curly one.
>>>>>>>>>
>>>>>>>>> At some point io_uring did kthread_use_mm(), which is supposed to
>>>>>>>>> mean
>>>>>>>>> the kthread can operate on behalf of the original process that
>>>>>>>>> submitted
>>>>>>>>> the IO.
>>>>>>>>>
>>>>>>>>> But because KUAP is implemented using memory protection keys, it
>>>>>>>>> depends
>>>>>>>>> on the value of the AMR register, which is not part of the mm,
>>>>>>>>> it's in
>>>>>>>>> thread.regs->amr.
>>>>>>>>>
>>>>>>>>> And what's worse by the time we're in kthread_use_mm() we no
>>>>>>>>> longer have
>>>>>>>>> access to the thread.regs->amr of the original process that
>>>>>>>>> submitted
>>>>>>>>> the IO.
>>>>>>>>>
>>>>>>>>> We also can't simply move the AMR into the mm, precisely because
>>>>>>>>> it's
>>>>>>>>> per thread, not per mm.
>>>>>>>>>
>>>>>>>>> So TBH I don't know how we're going to fix this.
>>>>>>>>>
>>>>>>>>> I guess we could return AMR=unblocked for kernel threads, but that's
>>>>>>>>> arguably a bug because it allows a process to circumvent memory
>>>>>>>>> keys by
>>>>>>>>> asking the kernel to do the access.
>>>>>>>>
>>>>>>>> We shouldn't need to inherit AMR should we? We only need it to be
>>>>>>>> locked
>>>>>>>> for kernel threads until it's explicitly unlocked -- nothing mm
>>>>>>>> specific
>>>>>>>> there. I think current_thread_amr could return 0 for kernel
>>>>>>>> threads? Or
>>>>>>>> I would even avoid using that function for allow_user_access and open
>>>>>>>> code the kthread case and remove it from current_thread_amr().
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Nick
>>>>>>>
>>>>>>
>>>>>> updated one
>>>>>>
>>>>>>   From 8fdb0680f983940d61f91da8252b13c8d3e8ebee Mon Sep 17 00:00:00 2001
>>>>>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
>>>>>> Date: Tue, 2 Feb 2021 09:23:38 +0530
>>>>>> Subject: [PATCH v2] powerpc/kuap: Allow kernel thread to access
>>>>>> userspace
>>>>>>    after kthread_use_mm
>>>>>>
>>>>>> This fix the bad fault reported by KUAP when io_wqe_worker access
>>>>>> userspace.
>>>>>>
>>>>>>    Bug: Read fault blocked by KUAP!
>>>>>>    WARNING: CPU: 1 PID: 101841 at arch/powerpc/mm/fault.c:229
>>>>>> __do_page_fault+0x6b4/0xcd0
>>>>>>    NIP [c00000000009e7e4] __do_page_fault+0x6b4/0xcd0
>>>>>>    LR [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0
>>>>>> ..........
>>>>>>    Call Trace:
>>>>>>    [c000000016367330] [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0
>>>>>> (unreliable)
>>>>>>    [c0000000163673e0] [c00000000009ee3c] do_page_fault+0x3c/0x120
>>>>>>    [c000000016367430] [c00000000000c848] handle_page_fault+0x10/0x2c
>>>>>>    --- interrupt: 300 at iov_iter_fault_in_readable+0x148/0x6f0
>>>>>> ..........
>>>>>>    NIP [c0000000008e8228] iov_iter_fault_in_readable+0x148/0x6f0
>>>>>>    LR [c0000000008e834c] iov_iter_fault_in_readable+0x26c/0x6f0
>>>>>>    interrupt: 300
>>>>>>    [c0000000163677e0] [c0000000007154a0] iomap_write_actor+0xc0/0x280
>>>>>>    [c000000016367880] [c00000000070fc94] iomap_apply+0x1c4/0x780
>>>>>>    [c000000016367990] [c000000000710330]
>>>>>> iomap_file_buffered_write+0xa0/0x120
>>>>>>    [c0000000163679e0] [c00800000040791c]
>>>>>> xfs_file_buffered_aio_write+0x314/0x5e0 [xfs]
>>>>>>    [c000000016367a90] [c0000000006d74bc] io_write+0x10c/0x460
>>>>>>    [c000000016367bb0] [c0000000006d80e4] io_issue_sqe+0x8d4/0x1200
>>>>>>    [c000000016367c70] [c0000000006d8ad0] io_wq_submit_work+0xc0/0x250
>>>>>>    [c000000016367cb0] [c0000000006e2578]
>>>>>> io_worker_handle_work+0x498/0x800
>>>>>>    [c000000016367d40] [c0000000006e2cdc] io_wqe_worker+0x3fc/0x4f0
>>>>>>    [c000000016367da0] [c0000000001cb0a4] kthread+0x1c4/0x1d0
>>>>>>    [c000000016367e10] [c00000000000dbf0]
>>>>>> ret_from_kernel_thread+0x5c/0x6c
>>>>>>
>>>>>> The kernel consider thread AMR value for kernel thread to be
>>>>>> AMR_KUAP_BLOCKED. Hence access to userspace is denied. This
>>>>>> of course not correct and we should allow userspace access after
>>>>>> kthread_use_mm(). To be precise, kthread_use_mm() should inherit the
>>>>>> AMR value of the operating address space. But, the AMR value is
>>>>>> thread-specific and we inherit the address space and not thread
>>>>>> access restrictions. Because of this ignore AMR value when accessing
>>>>>> userspace via kernel thread.
>>>>>>
>>>>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>>>>>> ---
>>>>>> Changes from v1:
>>>>>> * Address review feedback from Nick
>>>>>>
>>>>>>    arch/powerpc/include/asm/book3s/64/kup.h | 8 +++++++-
>>>>>>    1 file changed, 7 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/arch/powerpc/include/asm/book3s/64/kup.h
>>>>>> b/arch/powerpc/include/asm/book3s/64/kup.h
>>>>>> index f50f72e535aa..95f4df99249e 100644
>>>>>> --- a/arch/powerpc/include/asm/book3s/64/kup.h
>>>>>> +++ b/arch/powerpc/include/asm/book3s/64/kup.h
>>>>>> @@ -384,7 +384,13 @@ static __always_inline void
>>>>>> allow_user_access(void __user *to, const void __user
>>>>>>        // This is written so we can resolve to a single case at build
>>>>>> time
>>>>>>        BUILD_BUG_ON(!__builtin_constant_p(dir));
>>>>>> -    if (mmu_has_feature(MMU_FTR_PKEY))
>>>>>> +    /*
>>>>>> +     * if it is a kthread that did kthread_use_mm() don't
>>>>>> +     * use current_thread_amr().
>>>>>
>>>>> According to include/linux/sched.h, PF_KTHREAD means /* I am a kernel
>>>>> thread */
>>>>> It doesn't seem to be related to kthread_use_mm()
>>>>
>>>> That should be a sufficient check here. if we did reach here without
>>>> calling kthread_user_mm, we will crash on access because we don't have
>>>> a mm attached to the current process.  a kernel thread with
>>>> kthread_use_mm has
>>>
>>> Ok but then the comment doesn't match the check.
>>
>>
>> I was trying to be explict in the comment that we expect the thread to
>> have done kthread_use_mm().
>>
>>>
>>> And also the comment in current_thread_amr() is then misleading.
>>>
>>> Why not do the current->flags & PF_KTHREAD check in current_thread_amr()
>>> and return 0 in that case instead of BLOCKED ?
>>
>> In my view currrent_thread_amr() is more generic and we want to be
>> explicit there that a kernel thread AMR is KUAP_BLOCKED. Only when we
>> call allow user access, we relax the AMR value.
> 
> Just following up on this, as I'd hate to have 5.11 released with this
> bug in it for powerpc. It'll obviously also affect other cases of a
> kernel thread faulting after having done kthread_use_mm(), though I'm
> not sure how widespread that is. In any case, it'll leave io_uring
> mostly broken on powerpc if this isn't patched for release.
> 

I am waiting for test feedback on the change I posted earlier. I am also 
running a regression run myself. Once that is complete i will post the 
patch as a separate email.

-aneesh
Jens Axboe Feb. 4, 2021, 5:03 p.m. UTC | #23
On 2/4/21 10:01 AM, Aneesh Kumar K.V wrote:
> On 2/4/21 10:23 PM, Jens Axboe wrote:
>> On 2/1/21 11:30 PM, Aneesh Kumar K.V wrote:
>>> On 2/2/21 11:50 AM, Christophe Leroy wrote:
>>>>
>>>>
>>>> Le 02/02/2021 à 07:16, Aneesh Kumar K.V a écrit :
>>>>> On 2/2/21 11:32 AM, Christophe Leroy wrote:
>>>>>>
>>>>>>
>>>>>> Le 02/02/2021 à 06:55, Aneesh Kumar K.V a écrit :
>>>>>>> Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>
>>>>>>>> Nicholas Piggin <npiggin@gmail.com> writes:
>>>>>>>>
>>>>>>>>> Excerpts from Michael Ellerman's message of January 30, 2021 9:22 pm:
>>>>>>>>>> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>>>>>>>>>>> +Aneesh
>>>>>>>>>>>
>>>>>>>>>>> Le 29/01/2021 à 07:52, Zorro Lang a écrit :
>>>>>>>>>> ..
>>>>>>>>>>>> [   96.200296] ------------[ cut here ]------------
>>>>>>>>>>>> [   96.200304] Bug: Read fault blocked by KUAP!
>>>>>>>>>>>> [   96.200309] WARNING: CPU: 3 PID: 1876 at
>>>>>>>>>>>> arch/powerpc/mm/fault.c:229 bad_kernel_fault+0x180/0x310
>>>>>>>>>>>
>>>>>>>>>>>> [   96.200734] NIP [c000000000849424]
>>>>>>>>>>>> fault_in_pages_readable+0x104/0x350
>>>>>>>>>>>> [   96.200741] LR [c00000000084952c]
>>>>>>>>>>>> fault_in_pages_readable+0x20c/0x350
>>>>>>>>>>>> [   96.200747] --- interrupt: 300
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Problem happens in a section where userspace access is supposed
>>>>>>>>>>> to be granted, so the patch you
>>>>>>>>>>> proposed is definitely not the right fix.
>>>>>>>>>>>
>>>>>>>>>>> c000000000849408:    2c 01 00 4c     isync
>>>>>>>>>>> c00000000084940c:    a6 03 3d 7d     mtspr   29,r9  <== granting
>>>>>>>>>>> userspace access permission
>>>>>>>>>>> c000000000849410:    2c 01 00 4c     isync
>>>>>>>>>>> c000000000849414:    00 00 36 e9     ld      r9,0(r22)
>>>>>>>>>>> c000000000849418:    20 00 29 81     lwz     r9,32(r9)
>>>>>>>>>>> c00000000084941c:    00 02 29 71     andi.   r9,r9,512
>>>>>>>>>>> c000000000849420:    78 d3 5e 7f     mr      r30,r26
>>>>>>>>>>> ==> c000000000849424:    00 00 bf 8b     lbz     r29,0(r31)  <==
>>>>>>>>>>> accessing userspace
>>>>>>>>>>> c000000000849428:    10 00 82 41     beq     c000000000849438
>>>>>>>>>>> <fault_in_pages_readable+0x118>
>>>>>>>>>>> c00000000084942c:    2c 01 00 4c     isync
>>>>>>>>>>> c000000000849430:    a6 03 bd 7e     mtspr   29,r21  <==
>>>>>>>>>>> clearing userspace access permission
>>>>>>>>>>> c000000000849434:    2c 01 00 4c     isync
>>>>>>>>>>>
>>>>>>>>>>> My first guess is that the problem is linked to the following
>>>>>>>>>>> function, see the comment
>>>>>>>>>>>
>>>>>>>>>>> /*
>>>>>>>>>>>     * For kernel thread that doesn't have thread.regs return
>>>>>>>>>>>     * default AMR/IAMR values.
>>>>>>>>>>>     */
>>>>>>>>>>> static inline u64 current_thread_amr(void)
>>>>>>>>>>> {
>>>>>>>>>>>      if (current->thread.regs)
>>>>>>>>>>>          return current->thread.regs->amr;
>>>>>>>>>>>      return AMR_KUAP_BLOCKED;
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> Above function was introduced by commit 48a8ab4eeb82
>>>>>>>>>>> ("powerpc/book3s64/pkeys: Don't update SPRN_AMR
>>>>>>>>>>> when in kernel mode")
>>>>>>>>>>
>>>>>>>>>> Yeah that's a bit of a curly one.
>>>>>>>>>>
>>>>>>>>>> At some point io_uring did kthread_use_mm(), which is supposed to
>>>>>>>>>> mean
>>>>>>>>>> the kthread can operate on behalf of the original process that
>>>>>>>>>> submitted
>>>>>>>>>> the IO.
>>>>>>>>>>
>>>>>>>>>> But because KUAP is implemented using memory protection keys, it
>>>>>>>>>> depends
>>>>>>>>>> on the value of the AMR register, which is not part of the mm,
>>>>>>>>>> it's in
>>>>>>>>>> thread.regs->amr.
>>>>>>>>>>
>>>>>>>>>> And what's worse by the time we're in kthread_use_mm() we no
>>>>>>>>>> longer have
>>>>>>>>>> access to the thread.regs->amr of the original process that
>>>>>>>>>> submitted
>>>>>>>>>> the IO.
>>>>>>>>>>
>>>>>>>>>> We also can't simply move the AMR into the mm, precisely because
>>>>>>>>>> it's
>>>>>>>>>> per thread, not per mm.
>>>>>>>>>>
>>>>>>>>>> So TBH I don't know how we're going to fix this.
>>>>>>>>>>
>>>>>>>>>> I guess we could return AMR=unblocked for kernel threads, but that's
>>>>>>>>>> arguably a bug because it allows a process to circumvent memory
>>>>>>>>>> keys by
>>>>>>>>>> asking the kernel to do the access.
>>>>>>>>>
>>>>>>>>> We shouldn't need to inherit AMR should we? We only need it to be
>>>>>>>>> locked
>>>>>>>>> for kernel threads until it's explicitly unlocked -- nothing mm
>>>>>>>>> specific
>>>>>>>>> there. I think current_thread_amr could return 0 for kernel
>>>>>>>>> threads? Or
>>>>>>>>> I would even avoid using that function for allow_user_access and open
>>>>>>>>> code the kthread case and remove it from current_thread_amr().
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Nick
>>>>>>>>
>>>>>>>
>>>>>>> updated one
>>>>>>>
>>>>>>>   From 8fdb0680f983940d61f91da8252b13c8d3e8ebee Mon Sep 17 00:00:00 2001
>>>>>>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
>>>>>>> Date: Tue, 2 Feb 2021 09:23:38 +0530
>>>>>>> Subject: [PATCH v2] powerpc/kuap: Allow kernel thread to access
>>>>>>> userspace
>>>>>>>    after kthread_use_mm
>>>>>>>
>>>>>>> This fix the bad fault reported by KUAP when io_wqe_worker access
>>>>>>> userspace.
>>>>>>>
>>>>>>>    Bug: Read fault blocked by KUAP!
>>>>>>>    WARNING: CPU: 1 PID: 101841 at arch/powerpc/mm/fault.c:229
>>>>>>> __do_page_fault+0x6b4/0xcd0
>>>>>>>    NIP [c00000000009e7e4] __do_page_fault+0x6b4/0xcd0
>>>>>>>    LR [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0
>>>>>>> ..........
>>>>>>>    Call Trace:
>>>>>>>    [c000000016367330] [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0
>>>>>>> (unreliable)
>>>>>>>    [c0000000163673e0] [c00000000009ee3c] do_page_fault+0x3c/0x120
>>>>>>>    [c000000016367430] [c00000000000c848] handle_page_fault+0x10/0x2c
>>>>>>>    --- interrupt: 300 at iov_iter_fault_in_readable+0x148/0x6f0
>>>>>>> ..........
>>>>>>>    NIP [c0000000008e8228] iov_iter_fault_in_readable+0x148/0x6f0
>>>>>>>    LR [c0000000008e834c] iov_iter_fault_in_readable+0x26c/0x6f0
>>>>>>>    interrupt: 300
>>>>>>>    [c0000000163677e0] [c0000000007154a0] iomap_write_actor+0xc0/0x280
>>>>>>>    [c000000016367880] [c00000000070fc94] iomap_apply+0x1c4/0x780
>>>>>>>    [c000000016367990] [c000000000710330]
>>>>>>> iomap_file_buffered_write+0xa0/0x120
>>>>>>>    [c0000000163679e0] [c00800000040791c]
>>>>>>> xfs_file_buffered_aio_write+0x314/0x5e0 [xfs]
>>>>>>>    [c000000016367a90] [c0000000006d74bc] io_write+0x10c/0x460
>>>>>>>    [c000000016367bb0] [c0000000006d80e4] io_issue_sqe+0x8d4/0x1200
>>>>>>>    [c000000016367c70] [c0000000006d8ad0] io_wq_submit_work+0xc0/0x250
>>>>>>>    [c000000016367cb0] [c0000000006e2578]
>>>>>>> io_worker_handle_work+0x498/0x800
>>>>>>>    [c000000016367d40] [c0000000006e2cdc] io_wqe_worker+0x3fc/0x4f0
>>>>>>>    [c000000016367da0] [c0000000001cb0a4] kthread+0x1c4/0x1d0
>>>>>>>    [c000000016367e10] [c00000000000dbf0]
>>>>>>> ret_from_kernel_thread+0x5c/0x6c
>>>>>>>
>>>>>>> The kernel consider thread AMR value for kernel thread to be
>>>>>>> AMR_KUAP_BLOCKED. Hence access to userspace is denied. This
>>>>>>> of course not correct and we should allow userspace access after
>>>>>>> kthread_use_mm(). To be precise, kthread_use_mm() should inherit the
>>>>>>> AMR value of the operating address space. But, the AMR value is
>>>>>>> thread-specific and we inherit the address space and not thread
>>>>>>> access restrictions. Because of this ignore AMR value when accessing
>>>>>>> userspace via kernel thread.
>>>>>>>
>>>>>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>>>>>>> ---
>>>>>>> Changes from v1:
>>>>>>> * Address review feedback from Nick
>>>>>>>
>>>>>>>    arch/powerpc/include/asm/book3s/64/kup.h | 8 +++++++-
>>>>>>>    1 file changed, 7 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/arch/powerpc/include/asm/book3s/64/kup.h
>>>>>>> b/arch/powerpc/include/asm/book3s/64/kup.h
>>>>>>> index f50f72e535aa..95f4df99249e 100644
>>>>>>> --- a/arch/powerpc/include/asm/book3s/64/kup.h
>>>>>>> +++ b/arch/powerpc/include/asm/book3s/64/kup.h
>>>>>>> @@ -384,7 +384,13 @@ static __always_inline void
>>>>>>> allow_user_access(void __user *to, const void __user
>>>>>>>        // This is written so we can resolve to a single case at build
>>>>>>> time
>>>>>>>        BUILD_BUG_ON(!__builtin_constant_p(dir));
>>>>>>> -    if (mmu_has_feature(MMU_FTR_PKEY))
>>>>>>> +    /*
>>>>>>> +     * if it is a kthread that did kthread_use_mm() don't
>>>>>>> +     * use current_thread_amr().
>>>>>>
>>>>>> According to include/linux/sched.h, PF_KTHREAD means /* I am a kernel
>>>>>> thread */
>>>>>> It doesn't seem to be related to kthread_use_mm()
>>>>>
>>>>> That should be a sufficient check here. if we did reach here without
>>>>> calling kthread_user_mm, we will crash on access because we don't have
>>>>> a mm attached to the current process.  a kernel thread with
>>>>> kthread_use_mm has
>>>>
>>>> Ok but then the comment doesn't match the check.
>>>
>>>
>>> I was trying to be explict in the comment that we expect the thread to
>>> have done kthread_use_mm().
>>>
>>>>
>>>> And also the comment in current_thread_amr() is then misleading.
>>>>
>>>> Why not do the current->flags & PF_KTHREAD check in current_thread_amr()
>>>> and return 0 in that case instead of BLOCKED ?
>>>
>>> In my view currrent_thread_amr() is more generic and we want to be
>>> explicit there that a kernel thread AMR is KUAP_BLOCKED. Only when we
>>> call allow user access, we relax the AMR value.
>>
>> Just following up on this, as I'd hate to have 5.11 released with this
>> bug in it for powerpc. It'll obviously also affect other cases of a
>> kernel thread faulting after having done kthread_use_mm(), though I'm
>> not sure how widespread that is. In any case, it'll leave io_uring
>> mostly broken on powerpc if this isn't patched for release.
>>
> 
> I am waiting for test feedback on the change I posted earlier. I am also 
> running a regression run myself. Once that is complete i will post the 
> patch as a separate email.

Perfect, sounds good!
Aneesh Kumar K V Feb. 4, 2021, 5:42 p.m. UTC | #24
On 2/4/21 11:27 PM, Zorro Lang wrote:
> On Thu, Feb 04, 2021 at 10:31:59PM +0530, Aneesh Kumar K.V wrote:
>> On 2/4/21 10:23 PM, Jens Axboe wrote:
>>> On 2/1/21 11:30 PM, Aneesh Kumar K.V wrote:
>>>> On 2/2/21 11:50 AM, Christophe Leroy wrote:
>>>>>
>>>>>
>>>>> Le 02/02/2021 à 07:16, Aneesh Kumar K.V a écrit :
>>>>>> On 2/2/21 11:32 AM, Christophe Leroy wrote:
>>>>>>>
>>>>>>>
>>>>>>> Le 02/02/2021 à 06:55, Aneesh Kumar K.V a écrit :
>>>>>>>> Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> writes:
>>>>>>>>
>>>>>>>>> Nicholas Piggin <npiggin@gmail.com> writes:
>>>>>>>>>
>>>>>>>>>> Excerpts from Michael Ellerman's message of January 30, 2021 9:22 pm:
>>>>>>>>>>> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>>>>>>>>>>>> +Aneesh
>>>>>>>>>>>>
>>>>>>>>>>>> Le 29/01/2021 à 07:52, Zorro Lang a écrit :
>>>>>>>>>>> ..
>>>>>>>>>>>>> [   96.200296] ------------[ cut here ]------------
>>>>>>>>>>>>> [   96.200304] Bug: Read fault blocked by KUAP!
>>>>>>>>>>>>> [   96.200309] WARNING: CPU: 3 PID: 1876 at
>>>>>>>>>>>>> arch/powerpc/mm/fault.c:229 bad_kernel_fault+0x180/0x310
>>>>>>>>>>>>
>>>>>>>>>>>>> [   96.200734] NIP [c000000000849424]
>>>>>>>>>>>>> fault_in_pages_readable+0x104/0x350
>>>>>>>>>>>>> [   96.200741] LR [c00000000084952c]
>>>>>>>>>>>>> fault_in_pages_readable+0x20c/0x350
>>>>>>>>>>>>> [   96.200747] --- interrupt: 300
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Problem happens in a section where userspace access is supposed
>>>>>>>>>>>> to be granted, so the patch you
>>>>>>>>>>>> proposed is definitely not the right fix.
>>>>>>>>>>>>
>>>>>>>>>>>> c000000000849408:    2c 01 00 4c     isync
>>>>>>>>>>>> c00000000084940c:    a6 03 3d 7d     mtspr   29,r9  <== granting
>>>>>>>>>>>> userspace access permission
>>>>>>>>>>>> c000000000849410:    2c 01 00 4c     isync
>>>>>>>>>>>> c000000000849414:    00 00 36 e9     ld      r9,0(r22)
>>>>>>>>>>>> c000000000849418:    20 00 29 81     lwz     r9,32(r9)
>>>>>>>>>>>> c00000000084941c:    00 02 29 71     andi.   r9,r9,512
>>>>>>>>>>>> c000000000849420:    78 d3 5e 7f     mr      r30,r26
>>>>>>>>>>>> ==> c000000000849424:    00 00 bf 8b     lbz     r29,0(r31)  <==
>>>>>>>>>>>> accessing userspace
>>>>>>>>>>>> c000000000849428:    10 00 82 41     beq     c000000000849438
>>>>>>>>>>>> <fault_in_pages_readable+0x118>
>>>>>>>>>>>> c00000000084942c:    2c 01 00 4c     isync
>>>>>>>>>>>> c000000000849430:    a6 03 bd 7e     mtspr   29,r21  <==
>>>>>>>>>>>> clearing userspace access permission
>>>>>>>>>>>> c000000000849434:    2c 01 00 4c     isync
>>>>>>>>>>>>
>>>>>>>>>>>> My first guess is that the problem is linked to the following
>>>>>>>>>>>> function, see the comment
>>>>>>>>>>>>
>>>>>>>>>>>> /*
>>>>>>>>>>>>      * For kernel thread that doesn't have thread.regs return
>>>>>>>>>>>>      * default AMR/IAMR values.
>>>>>>>>>>>>      */
>>>>>>>>>>>> static inline u64 current_thread_amr(void)
>>>>>>>>>>>> {
>>>>>>>>>>>>       if (current->thread.regs)
>>>>>>>>>>>>           return current->thread.regs->amr;
>>>>>>>>>>>>       return AMR_KUAP_BLOCKED;
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> Above function was introduced by commit 48a8ab4eeb82
>>>>>>>>>>>> ("powerpc/book3s64/pkeys: Don't update SPRN_AMR
>>>>>>>>>>>> when in kernel mode")
>>>>>>>>>>>
>>>>>>>>>>> Yeah that's a bit of a curly one.
>>>>>>>>>>>
>>>>>>>>>>> At some point io_uring did kthread_use_mm(), which is supposed to
>>>>>>>>>>> mean
>>>>>>>>>>> the kthread can operate on behalf of the original process that
>>>>>>>>>>> submitted
>>>>>>>>>>> the IO.
>>>>>>>>>>>
>>>>>>>>>>> But because KUAP is implemented using memory protection keys, it
>>>>>>>>>>> depends
>>>>>>>>>>> on the value of the AMR register, which is not part of the mm,
>>>>>>>>>>> it's in
>>>>>>>>>>> thread.regs->amr.
>>>>>>>>>>>
>>>>>>>>>>> And what's worse by the time we're in kthread_use_mm() we no
>>>>>>>>>>> longer have
>>>>>>>>>>> access to the thread.regs->amr of the original process that
>>>>>>>>>>> submitted
>>>>>>>>>>> the IO.
>>>>>>>>>>>
>>>>>>>>>>> We also can't simply move the AMR into the mm, precisely because
>>>>>>>>>>> it's
>>>>>>>>>>> per thread, not per mm.
>>>>>>>>>>>
>>>>>>>>>>> So TBH I don't know how we're going to fix this.
>>>>>>>>>>>
>>>>>>>>>>> I guess we could return AMR=unblocked for kernel threads, but that's
>>>>>>>>>>> arguably a bug because it allows a process to circumvent memory
>>>>>>>>>>> keys by
>>>>>>>>>>> asking the kernel to do the access.
>>>>>>>>>>
>>>>>>>>>> We shouldn't need to inherit AMR should we? We only need it to be
>>>>>>>>>> locked
>>>>>>>>>> for kernel threads until it's explicitly unlocked -- nothing mm
>>>>>>>>>> specific
>>>>>>>>>> there. I think current_thread_amr could return 0 for kernel
>>>>>>>>>> threads? Or
>>>>>>>>>> I would even avoid using that function for allow_user_access and open
>>>>>>>>>> code the kthread case and remove it from current_thread_amr().
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Nick
>>>>>>>>>
>>>>>>>>
>>>>>>>> updated one
>>>>>>>>
>>>>>>>>    From 8fdb0680f983940d61f91da8252b13c8d3e8ebee Mon Sep 17 00:00:00 2001
>>>>>>>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
>>>>>>>> Date: Tue, 2 Feb 2021 09:23:38 +0530
>>>>>>>> Subject: [PATCH v2] powerpc/kuap: Allow kernel thread to access
>>>>>>>> userspace
>>>>>>>>     after kthread_use_mm
>>>>>>>>
>>>>>>>> This fix the bad fault reported by KUAP when io_wqe_worker access
>>>>>>>> userspace.
>>>>>>>>
>>>>>>>>     Bug: Read fault blocked by KUAP!
>>>>>>>>     WARNING: CPU: 1 PID: 101841 at arch/powerpc/mm/fault.c:229
>>>>>>>> __do_page_fault+0x6b4/0xcd0
>>>>>>>>     NIP [c00000000009e7e4] __do_page_fault+0x6b4/0xcd0
>>>>>>>>     LR [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0
>>>>>>>> ..........
>>>>>>>>     Call Trace:
>>>>>>>>     [c000000016367330] [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0
>>>>>>>> (unreliable)
>>>>>>>>     [c0000000163673e0] [c00000000009ee3c] do_page_fault+0x3c/0x120
>>>>>>>>     [c000000016367430] [c00000000000c848] handle_page_fault+0x10/0x2c
>>>>>>>>     --- interrupt: 300 at iov_iter_fault_in_readable+0x148/0x6f0
>>>>>>>> ..........
>>>>>>>>     NIP [c0000000008e8228] iov_iter_fault_in_readable+0x148/0x6f0
>>>>>>>>     LR [c0000000008e834c] iov_iter_fault_in_readable+0x26c/0x6f0
>>>>>>>>     interrupt: 300
>>>>>>>>     [c0000000163677e0] [c0000000007154a0] iomap_write_actor+0xc0/0x280
>>>>>>>>     [c000000016367880] [c00000000070fc94] iomap_apply+0x1c4/0x780
>>>>>>>>     [c000000016367990] [c000000000710330]
>>>>>>>> iomap_file_buffered_write+0xa0/0x120
>>>>>>>>     [c0000000163679e0] [c00800000040791c]
>>>>>>>> xfs_file_buffered_aio_write+0x314/0x5e0 [xfs]
>>>>>>>>     [c000000016367a90] [c0000000006d74bc] io_write+0x10c/0x460
>>>>>>>>     [c000000016367bb0] [c0000000006d80e4] io_issue_sqe+0x8d4/0x1200
>>>>>>>>     [c000000016367c70] [c0000000006d8ad0] io_wq_submit_work+0xc0/0x250
>>>>>>>>     [c000000016367cb0] [c0000000006e2578]
>>>>>>>> io_worker_handle_work+0x498/0x800
>>>>>>>>     [c000000016367d40] [c0000000006e2cdc] io_wqe_worker+0x3fc/0x4f0
>>>>>>>>     [c000000016367da0] [c0000000001cb0a4] kthread+0x1c4/0x1d0
>>>>>>>>     [c000000016367e10] [c00000000000dbf0]
>>>>>>>> ret_from_kernel_thread+0x5c/0x6c
>>>>>>>>
>>>>>>>> The kernel consider thread AMR value for kernel thread to be
>>>>>>>> AMR_KUAP_BLOCKED. Hence access to userspace is denied. This
>>>>>>>> of course not correct and we should allow userspace access after
>>>>>>>> kthread_use_mm(). To be precise, kthread_use_mm() should inherit the
>>>>>>>> AMR value of the operating address space. But, the AMR value is
>>>>>>>> thread-specific and we inherit the address space and not thread
>>>>>>>> access restrictions. Because of this ignore AMR value when accessing
>>>>>>>> userspace via kernel thread.
>>>>>>>>
>>>>>>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>>>>>>>> ---
>>>>>>>> Changes from v1:
>>>>>>>> * Address review feedback from Nick
>>>>>>>>
>>>>>>>>     arch/powerpc/include/asm/book3s/64/kup.h | 8 +++++++-
>>>>>>>>     1 file changed, 7 insertions(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/arch/powerpc/include/asm/book3s/64/kup.h
>>>>>>>> b/arch/powerpc/include/asm/book3s/64/kup.h
>>>>>>>> index f50f72e535aa..95f4df99249e 100644
>>>>>>>> --- a/arch/powerpc/include/asm/book3s/64/kup.h
>>>>>>>> +++ b/arch/powerpc/include/asm/book3s/64/kup.h
>>>>>>>> @@ -384,7 +384,13 @@ static __always_inline void
>>>>>>>> allow_user_access(void __user *to, const void __user
>>>>>>>>         // This is written so we can resolve to a single case at build
>>>>>>>> time
>>>>>>>>         BUILD_BUG_ON(!__builtin_constant_p(dir));
>>>>>>>> -    if (mmu_has_feature(MMU_FTR_PKEY))
>>>>>>>> +    /*
>>>>>>>> +     * if it is a kthread that did kthread_use_mm() don't
>>>>>>>> +     * use current_thread_amr().
>>>>>>>
>>>>>>> According to include/linux/sched.h, PF_KTHREAD means /* I am a kernel
>>>>>>> thread */
>>>>>>> It doesn't seem to be related to kthread_use_mm()
>>>>>>
>>>>>> That should be a sufficient check here. if we did reach here without
>>>>>> calling kthread_user_mm, we will crash on access because we don't have
>>>>>> a mm attached to the current process.  a kernel thread with
>>>>>> kthread_use_mm has
>>>>>
>>>>> Ok but then the comment doesn't match the check.
>>>>
>>>>
>>>> I was trying to be explict in the comment that we expect the thread to
>>>> have done kthread_use_mm().
>>>>
>>>>>
>>>>> And also the comment in current_thread_amr() is then misleading.
>>>>>
>>>>> Why not do the current->flags & PF_KTHREAD check in current_thread_amr()
>>>>> and return 0 in that case instead of BLOCKED ?
>>>>
>>>> In my view currrent_thread_amr() is more generic and we want to be
>>>> explicit there that a kernel thread AMR is KUAP_BLOCKED. Only when we
>>>> call allow user access, we relax the AMR value.
>>>
>>> Just following up on this, as I'd hate to have 5.11 released with this
>>> bug in it for powerpc. It'll obviously also affect other cases of a
>>> kernel thread faulting after having done kthread_use_mm(), though I'm
>>> not sure how widespread that is. In any case, it'll leave io_uring
>>> mostly broken on powerpc if this isn't patched for release.
>>>
>>
>> I am waiting for test feedback on the change I posted earlier. I am also
>> running a regression run myself. Once that is complete i will post the patch
>> as a separate email.
> 
> Are you waiting a test from me? Or someone else who test PPC? Although I'm
> the "Reported-by" of this bug, I just can help to verify this bug itself,
> I don't have enough test cases to do regression test from PPC side. Do you need
> me to verify this bug itself.
> 
>

If you can verify this bug, it would be really helpful.

-aneesh
Zorro Lang Feb. 4, 2021, 5:57 p.m. UTC | #25
On Thu, Feb 04, 2021 at 10:31:59PM +0530, Aneesh Kumar K.V wrote:
> On 2/4/21 10:23 PM, Jens Axboe wrote:
> > On 2/1/21 11:30 PM, Aneesh Kumar K.V wrote:
> > > On 2/2/21 11:50 AM, Christophe Leroy wrote:
> > > > 
> > > > 
> > > > Le 02/02/2021 à 07:16, Aneesh Kumar K.V a écrit :
> > > > > On 2/2/21 11:32 AM, Christophe Leroy wrote:
> > > > > > 
> > > > > > 
> > > > > > Le 02/02/2021 à 06:55, Aneesh Kumar K.V a écrit :
> > > > > > > Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> writes:
> > > > > > > 
> > > > > > > > Nicholas Piggin <npiggin@gmail.com> writes:
> > > > > > > > 
> > > > > > > > > Excerpts from Michael Ellerman's message of January 30, 2021 9:22 pm:
> > > > > > > > > > Christophe Leroy <christophe.leroy@csgroup.eu> writes:
> > > > > > > > > > > +Aneesh
> > > > > > > > > > > 
> > > > > > > > > > > Le 29/01/2021 à 07:52, Zorro Lang a écrit :
> > > > > > > > > > ..
> > > > > > > > > > > > [   96.200296] ------------[ cut here ]------------
> > > > > > > > > > > > [   96.200304] Bug: Read fault blocked by KUAP!
> > > > > > > > > > > > [   96.200309] WARNING: CPU: 3 PID: 1876 at
> > > > > > > > > > > > arch/powerpc/mm/fault.c:229 bad_kernel_fault+0x180/0x310
> > > > > > > > > > > 
> > > > > > > > > > > > [   96.200734] NIP [c000000000849424]
> > > > > > > > > > > > fault_in_pages_readable+0x104/0x350
> > > > > > > > > > > > [   96.200741] LR [c00000000084952c]
> > > > > > > > > > > > fault_in_pages_readable+0x20c/0x350
> > > > > > > > > > > > [   96.200747] --- interrupt: 300
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > Problem happens in a section where userspace access is supposed
> > > > > > > > > > > to be granted, so the patch you
> > > > > > > > > > > proposed is definitely not the right fix.
> > > > > > > > > > > 
> > > > > > > > > > > c000000000849408:    2c 01 00 4c     isync
> > > > > > > > > > > c00000000084940c:    a6 03 3d 7d     mtspr   29,r9  <== granting
> > > > > > > > > > > userspace access permission
> > > > > > > > > > > c000000000849410:    2c 01 00 4c     isync
> > > > > > > > > > > c000000000849414:    00 00 36 e9     ld      r9,0(r22)
> > > > > > > > > > > c000000000849418:    20 00 29 81     lwz     r9,32(r9)
> > > > > > > > > > > c00000000084941c:    00 02 29 71     andi.   r9,r9,512
> > > > > > > > > > > c000000000849420:    78 d3 5e 7f     mr      r30,r26
> > > > > > > > > > > ==> c000000000849424:    00 00 bf 8b     lbz     r29,0(r31)  <==
> > > > > > > > > > > accessing userspace
> > > > > > > > > > > c000000000849428:    10 00 82 41     beq     c000000000849438
> > > > > > > > > > > <fault_in_pages_readable+0x118>
> > > > > > > > > > > c00000000084942c:    2c 01 00 4c     isync
> > > > > > > > > > > c000000000849430:    a6 03 bd 7e     mtspr   29,r21  <==
> > > > > > > > > > > clearing userspace access permission
> > > > > > > > > > > c000000000849434:    2c 01 00 4c     isync
> > > > > > > > > > > 
> > > > > > > > > > > My first guess is that the problem is linked to the following
> > > > > > > > > > > function, see the comment
> > > > > > > > > > > 
> > > > > > > > > > > /*
> > > > > > > > > > >     * For kernel thread that doesn't have thread.regs return
> > > > > > > > > > >     * default AMR/IAMR values.
> > > > > > > > > > >     */
> > > > > > > > > > > static inline u64 current_thread_amr(void)
> > > > > > > > > > > {
> > > > > > > > > > >      if (current->thread.regs)
> > > > > > > > > > >          return current->thread.regs->amr;
> > > > > > > > > > >      return AMR_KUAP_BLOCKED;
> > > > > > > > > > > }
> > > > > > > > > > > 
> > > > > > > > > > > Above function was introduced by commit 48a8ab4eeb82
> > > > > > > > > > > ("powerpc/book3s64/pkeys: Don't update SPRN_AMR
> > > > > > > > > > > when in kernel mode")
> > > > > > > > > > 
> > > > > > > > > > Yeah that's a bit of a curly one.
> > > > > > > > > > 
> > > > > > > > > > At some point io_uring did kthread_use_mm(), which is supposed to
> > > > > > > > > > mean
> > > > > > > > > > the kthread can operate on behalf of the original process that
> > > > > > > > > > submitted
> > > > > > > > > > the IO.
> > > > > > > > > > 
> > > > > > > > > > But because KUAP is implemented using memory protection keys, it
> > > > > > > > > > depends
> > > > > > > > > > on the value of the AMR register, which is not part of the mm,
> > > > > > > > > > it's in
> > > > > > > > > > thread.regs->amr.
> > > > > > > > > > 
> > > > > > > > > > And what's worse by the time we're in kthread_use_mm() we no
> > > > > > > > > > longer have
> > > > > > > > > > access to the thread.regs->amr of the original process that
> > > > > > > > > > submitted
> > > > > > > > > > the IO.
> > > > > > > > > > 
> > > > > > > > > > We also can't simply move the AMR into the mm, precisely because
> > > > > > > > > > it's
> > > > > > > > > > per thread, not per mm.
> > > > > > > > > > 
> > > > > > > > > > So TBH I don't know how we're going to fix this.
> > > > > > > > > > 
> > > > > > > > > > I guess we could return AMR=unblocked for kernel threads, but that's
> > > > > > > > > > arguably a bug because it allows a process to circumvent memory
> > > > > > > > > > keys by
> > > > > > > > > > asking the kernel to do the access.
> > > > > > > > > 
> > > > > > > > > We shouldn't need to inherit AMR should we? We only need it to be
> > > > > > > > > locked
> > > > > > > > > for kernel threads until it's explicitly unlocked -- nothing mm
> > > > > > > > > specific
> > > > > > > > > there. I think current_thread_amr could return 0 for kernel
> > > > > > > > > threads? Or
> > > > > > > > > I would even avoid using that function for allow_user_access and open
> > > > > > > > > code the kthread case and remove it from current_thread_amr().
> > > > > > > > > 
> > > > > > > > > Thanks,
> > > > > > > > > Nick
> > > > > > > > 
> > > > > > > 
> > > > > > > updated one
> > > > > > > 
> > > > > > >   From 8fdb0680f983940d61f91da8252b13c8d3e8ebee Mon Sep 17 00:00:00 2001
> > > > > > > From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
> > > > > > > Date: Tue, 2 Feb 2021 09:23:38 +0530
> > > > > > > Subject: [PATCH v2] powerpc/kuap: Allow kernel thread to access
> > > > > > > userspace
> > > > > > >    after kthread_use_mm
> > > > > > > 
> > > > > > > This fix the bad fault reported by KUAP when io_wqe_worker access
> > > > > > > userspace.
> > > > > > > 
> > > > > > >    Bug: Read fault blocked by KUAP!
> > > > > > >    WARNING: CPU: 1 PID: 101841 at arch/powerpc/mm/fault.c:229
> > > > > > > __do_page_fault+0x6b4/0xcd0
> > > > > > >    NIP [c00000000009e7e4] __do_page_fault+0x6b4/0xcd0
> > > > > > >    LR [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0
> > > > > > > ..........
> > > > > > >    Call Trace:
> > > > > > >    [c000000016367330] [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0
> > > > > > > (unreliable)
> > > > > > >    [c0000000163673e0] [c00000000009ee3c] do_page_fault+0x3c/0x120
> > > > > > >    [c000000016367430] [c00000000000c848] handle_page_fault+0x10/0x2c
> > > > > > >    --- interrupt: 300 at iov_iter_fault_in_readable+0x148/0x6f0
> > > > > > > ..........
> > > > > > >    NIP [c0000000008e8228] iov_iter_fault_in_readable+0x148/0x6f0
> > > > > > >    LR [c0000000008e834c] iov_iter_fault_in_readable+0x26c/0x6f0
> > > > > > >    interrupt: 300
> > > > > > >    [c0000000163677e0] [c0000000007154a0] iomap_write_actor+0xc0/0x280
> > > > > > >    [c000000016367880] [c00000000070fc94] iomap_apply+0x1c4/0x780
> > > > > > >    [c000000016367990] [c000000000710330]
> > > > > > > iomap_file_buffered_write+0xa0/0x120
> > > > > > >    [c0000000163679e0] [c00800000040791c]
> > > > > > > xfs_file_buffered_aio_write+0x314/0x5e0 [xfs]
> > > > > > >    [c000000016367a90] [c0000000006d74bc] io_write+0x10c/0x460
> > > > > > >    [c000000016367bb0] [c0000000006d80e4] io_issue_sqe+0x8d4/0x1200
> > > > > > >    [c000000016367c70] [c0000000006d8ad0] io_wq_submit_work+0xc0/0x250
> > > > > > >    [c000000016367cb0] [c0000000006e2578]
> > > > > > > io_worker_handle_work+0x498/0x800
> > > > > > >    [c000000016367d40] [c0000000006e2cdc] io_wqe_worker+0x3fc/0x4f0
> > > > > > >    [c000000016367da0] [c0000000001cb0a4] kthread+0x1c4/0x1d0
> > > > > > >    [c000000016367e10] [c00000000000dbf0]
> > > > > > > ret_from_kernel_thread+0x5c/0x6c
> > > > > > > 
> > > > > > > The kernel consider thread AMR value for kernel thread to be
> > > > > > > AMR_KUAP_BLOCKED. Hence access to userspace is denied. This
> > > > > > > of course not correct and we should allow userspace access after
> > > > > > > kthread_use_mm(). To be precise, kthread_use_mm() should inherit the
> > > > > > > AMR value of the operating address space. But, the AMR value is
> > > > > > > thread-specific and we inherit the address space and not thread
> > > > > > > access restrictions. Because of this ignore AMR value when accessing
> > > > > > > userspace via kernel thread.
> > > > > > > 
> > > > > > > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> > > > > > > ---
> > > > > > > Changes from v1:
> > > > > > > * Address review feedback from Nick
> > > > > > > 
> > > > > > >    arch/powerpc/include/asm/book3s/64/kup.h | 8 +++++++-
> > > > > > >    1 file changed, 7 insertions(+), 1 deletion(-)
> > > > > > > 
> > > > > > > diff --git a/arch/powerpc/include/asm/book3s/64/kup.h
> > > > > > > b/arch/powerpc/include/asm/book3s/64/kup.h
> > > > > > > index f50f72e535aa..95f4df99249e 100644
> > > > > > > --- a/arch/powerpc/include/asm/book3s/64/kup.h
> > > > > > > +++ b/arch/powerpc/include/asm/book3s/64/kup.h
> > > > > > > @@ -384,7 +384,13 @@ static __always_inline void
> > > > > > > allow_user_access(void __user *to, const void __user
> > > > > > >        // This is written so we can resolve to a single case at build
> > > > > > > time
> > > > > > >        BUILD_BUG_ON(!__builtin_constant_p(dir));
> > > > > > > -    if (mmu_has_feature(MMU_FTR_PKEY))
> > > > > > > +    /*
> > > > > > > +     * if it is a kthread that did kthread_use_mm() don't
> > > > > > > +     * use current_thread_amr().
> > > > > > 
> > > > > > According to include/linux/sched.h, PF_KTHREAD means /* I am a kernel
> > > > > > thread */
> > > > > > It doesn't seem to be related to kthread_use_mm()
> > > > > 
> > > > > That should be a sufficient check here. if we did reach here without
> > > > > calling kthread_user_mm, we will crash on access because we don't have
> > > > > a mm attached to the current process.  a kernel thread with
> > > > > kthread_use_mm has
> > > > 
> > > > Ok but then the comment doesn't match the check.
> > > 
> > > 
> > > I was trying to be explict in the comment that we expect the thread to
> > > have done kthread_use_mm().
> > > 
> > > > 
> > > > And also the comment in current_thread_amr() is then misleading.
> > > > 
> > > > Why not do the current->flags & PF_KTHREAD check in current_thread_amr()
> > > > and return 0 in that case instead of BLOCKED ?
> > > 
> > > In my view currrent_thread_amr() is more generic and we want to be
> > > explicit there that a kernel thread AMR is KUAP_BLOCKED. Only when we
> > > call allow user access, we relax the AMR value.
> > 
> > Just following up on this, as I'd hate to have 5.11 released with this
> > bug in it for powerpc. It'll obviously also affect other cases of a
> > kernel thread faulting after having done kthread_use_mm(), though I'm
> > not sure how widespread that is. In any case, it'll leave io_uring
> > mostly broken on powerpc if this isn't patched for release.
> > 
> 
> I am waiting for test feedback on the change I posted earlier. I am also
> running a regression run myself. Once that is complete i will post the patch
> as a separate email.

Are you waiting a test from me? Or someone else who test PPC? Although I'm
the "Reported-by" of this bug, I just can help to verify this bug itself,
I don't have enough test cases to do regression test from PPC side. Do you need
me to verify this bug itself.

Thanks,
Zorro

> 
> -aneesh
>
diff mbox series

Patch

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 8961b44f350c..5a4d6af04c99 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -417,9 +417,12 @@  static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 	/*
 	 * The kernel should never take an execute fault nor should it
 	 * take a page fault to a kernel address or a page fault to a user
-	 * address outside of dedicated places
+	 * address outside of dedicated places. Use mm to decide if this is
+	 * valid or not, it's perfectly legitimate to have a kernel thread
+	 * take a fault as long as it's performed kthread_use_mm() prior. An
+	 * example of that would be io_uring helper threads.
 	 */
-	if (unlikely(!is_user && bad_kernel_fault(regs, error_code, address, is_write)))
+	if (unlikely(!mm && bad_kernel_fault(regs, error_code, address, is_write)))
 		return SIGSEGV;
 
 	/*