diff mbox

[net/bpf] 3051bf36c2 BUG: unable to handle kernel paging request at 0000a7cf

Message ID 20170302202338.ci6wwb3yzjmdy4n2@wfg-t540p.sh.intel.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

kbuild test robot March 2, 2017, 8:23 p.m. UTC
On Wed, Mar 01, 2017 at 08:54:26PM +0800, Fengguang Wu wrote:
>Hi all,
>
>Is it BPF triggering BUGs all over the places?

It looks so, and here is a fix.

>1e74a2eb1f  Merge tag 'gcc-plugins-v4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
>005c3490e9  Revert "ath10k: Search SMBIOS for OEM board file extension"
>3051bf36c2  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
>+-------------------------------------------------------+------------+------------+------------+
>|                                                       | 1e74a2eb1f | 005c3490e9 | 3051bf36c2 |
>+-------------------------------------------------------+------------+------------+------------+
>| boot_successes                                        | 1223       | 1098       | 242        |
>| boot_failures                                         | 1          | 126        | 72         |
>| BUG:unable_to_handle_kernel                           | 1          | 117        | 69         |
>| Oops                                                  | 1          | 126        | 72         |
>| EIP:perf_callchain_user                               | 1          |            |            |
>| Kernel_panic-not_syncing:Fatal_exception              | 1          | 121        | 67         |
>| EIP:netlink_release                                   | 0          | 20         | 3          |
>| EIP:bpf_prog_free                                     | 0          | 22         | 3          |
>| EIP:filp_close                                        | 0          | 64         | 23         |
>| EIP:netlink_update_listeners                          | 0          | 10         | 9          |
>| EIP:security_inode_getattr                            | 0          | 2          |            |
>| EIP:__lock_acquire                                    | 0          | 1          | 11         |
>| Kernel_panic-not_syncing:Fatal_exception_in_interrupt | 0          | 5          | 4          |
>| EIP:__rcu_process_callbacks                           | 0          | 2          |            |
>| EIP:__fget_light                                      | 0          | 1          |            |
>| EIP:__unix_remove_socket                              | 0          | 0          | 13         |
>| INFO:trying_to_register_non-static_key                | 0          | 0          | 2          |
>| EIP:mnt_want_write_file                               | 0          | 0          | 1          |
>| EIP:skb_dequeue                                       | 0          | 0          | 1          |
>| EIP:strlen                                            | 0          | 0          | 1          |
>| EIP:__netlink_lookup                                  | 0          | 0          | 2          |
>| EIP:vfs_fsync_range                                   | 0          | 0          | 1          |
>| EIP:__unix_find_socket_byname                         | 0          | 0          | 1          |
>| EIP:release_sock                                      | 0          | 0          | 1          |
>+-------------------------------------------------------+------------+------------+------------+

I confirm that the below patch provided by Daniel fixes the above
issues on mainline kernel, too. Where should this patch be sent to?
It'd be very noisy if all these Oops hit the upcoming RC1 kernel.

Daniel thinks there may be deeper problem in i386 set_memory_rw().
However that could take much longer time to debug.

Thanks,
Fengguang
---

Re: [bpf] 9d876e79df:  BUG: unable to handle kernel paging request at 653a8346

> On Tue, Feb 28, 2017 at 04:39:36PM +0100, Daniel Borkmann wrote:

I have a rough feeling what it is, but I didn't have cycles to work on
it yet (due to travel, sorry about that). The issue is likely shut down
by just doing:

---
 arch/x86/Kconfig |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Daniel Borkmann March 2, 2017, 8:40 p.m. UTC | #1
On 03/02/2017 09:23 PM, Fengguang Wu wrote:
[...]
> I confirm that the below patch provided by Daniel fixes the above
> issues on mainline kernel, too. Where should this patch be sent to?

If nobody objects, I could send it to -net tree via Dave due to being
BPF related, but I don't mind sending it elsewhere too (f.e. Linus
directly?) in order to stop your bot from continuing to send such mails.

The issue seems only related to i386 and doesn't trigger each time with
Fengguang's kernel config and qemu image when I try to reproduce it.
set_memory_ro()/set_memory_rw() on i386 seems to work in general, but
when it's used/reproduced, from time to time (perhaps some corner-case?)
it looks like that memory area can have issues much later on after being
fed back to the allocator which then causes a GPF from random locations.
Gut feeling, it might be an issue in set_memory_*() that my commit
uncovered. Still looking into it, but mean-time I could just send the
below, sure.

Thanks,
Daniel

> It'd be very noisy if all these Oops hit the upcoming RC1 kernel.
>
> Daniel thinks there may be deeper problem in i386 set_memory_rw().
> However that could take much longer time to debug.
>
> Thanks,
> Fengguang
> ---
>
> Re: [bpf] 9d876e79df:  BUG: unable to handle kernel paging request at 653a8346
>
>> On Tue, Feb 28, 2017 at 04:39:36PM +0100, Daniel Borkmann wrote:
>
> I have a rough feeling what it is, but I didn't have cycles to work on
> it yet (due to travel, sorry about that). The issue is likely shut down
> by just doing:
>
> ---
> arch/x86/Kconfig |    2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> --- linux.orig/arch/x86/Kconfig    2017-03-03 03:44:35.962022996 +0800
> +++ linux/arch/x86/Kconfig    2017-03-03 03:44:35.962022996 +0800
> @@ -54,7 +54,7 @@ config X86
>      select ARCH_HAS_KCOV            if X86_64
>      select ARCH_HAS_MMIO_FLUSH
>      select ARCH_HAS_PMEM_API        if X86_64
> -    select ARCH_HAS_SET_MEMORY
> +    select ARCH_HAS_SET_MEMORY        if X86_64
>      select ARCH_HAS_SG_CHAIN
>      select ARCH_HAS_STRICT_KERNEL_RWX
>      select ARCH_HAS_STRICT_MODULE_RWX
Linus Torvalds March 8, 2017, 7:25 p.m. UTC | #2
Adding x86 people too, since this seems to be something off about
ARCH_HAS_SET_MEMORY for x86-32.

The code seems to be shared between x86-32 and 64, I'm not seeing why
set_memory_r[ow]() should fail on one but not the other.

Considering that it seems to be flaky even on 32-bit, maybe it's
timing-related, or possibly related to TLB sizes or whatever (ie more
likely hidden by a larger TLB on more modern hardware?)

Anyway, just looking at change_page_attr_set_clr(), I notice that the
page alias checking treats NX specially:

        /* No alias checking for _NX bit modifications */
        checkalias = (pgprot_val(mask_set) | pgprot_val(mask_clr)) != _PAGE_NX;

which seems insane. Why would NX be different from other protection
bits (like _PAGE_RW)?

But that doesn't explain why the bpf code would have issues with this
all only on x86-32.

Maybe somebody else can see why ARCH_HAS_SET_MEMORY would depend on
64-bit only..

               Linus

On Thu, Mar 2, 2017 at 12:40 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> On 03/02/2017 09:23 PM, Fengguang Wu wrote:
> [...]
>>
>> I confirm that the below patch provided by Daniel fixes the above
>> issues on mainline kernel, too. Where should this patch be sent to?
>
>
> If nobody objects, I could send it to -net tree via Dave due to being
> BPF related, but I don't mind sending it elsewhere too (f.e. Linus
> directly?) in order to stop your bot from continuing to send such mails.
>
> The issue seems only related to i386 and doesn't trigger each time with
> Fengguang's kernel config and qemu image when I try to reproduce it.
> set_memory_ro()/set_memory_rw() on i386 seems to work in general, but
> when it's used/reproduced, from time to time (perhaps some corner-case?)
> it looks like that memory area can have issues much later on after being
> fed back to the allocator which then causes a GPF from random locations.
> Gut feeling, it might be an issue in set_memory_*() that my commit
> uncovered. Still looking into it, but mean-time I could just send the
> below, sure.
>
> Thanks,
> Daniel
>
>
>> It'd be very noisy if all these Oops hit the upcoming RC1 kernel.
>>
>> Daniel thinks there may be deeper problem in i386 set_memory_rw().
>> However that could take much longer time to debug.
>>
>> Thanks,
>> Fengguang
>> ---
>>
>> Re: [bpf] 9d876e79df:  BUG: unable to handle kernel paging request at
>> 653a8346
>>
>>> On Tue, Feb 28, 2017 at 04:39:36PM +0100, Daniel Borkmann wrote:
>>
>>
>> I have a rough feeling what it is, but I didn't have cycles to work on
>> it yet (due to travel, sorry about that). The issue is likely shut down
>> by just doing:
>>
>> ---
>> arch/x86/Kconfig |    2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> --- linux.orig/arch/x86/Kconfig    2017-03-03 03:44:35.962022996 +0800
>> +++ linux/arch/x86/Kconfig    2017-03-03 03:44:35.962022996 +0800
>> @@ -54,7 +54,7 @@ config X86
>>      select ARCH_HAS_KCOV            if X86_64
>>      select ARCH_HAS_MMIO_FLUSH
>>      select ARCH_HAS_PMEM_API        if X86_64
>> -    select ARCH_HAS_SET_MEMORY
>> +    select ARCH_HAS_SET_MEMORY        if X86_64
>>      select ARCH_HAS_SG_CHAIN
>>      select ARCH_HAS_STRICT_KERNEL_RWX
>>      select ARCH_HAS_STRICT_MODULE_RWX
>
>
Thomas Gleixner March 9, 2017, 1:09 p.m. UTC | #3
On Wed, 8 Mar 2017, Linus Torvalds wrote:

> Adding x86 people too, since this seems to be something off about
> ARCH_HAS_SET_MEMORY for x86-32.
> 
> The code seems to be shared between x86-32 and 64, I'm not seeing why
> set_memory_r[ow]() should fail on one but not the other.

Indeed.

> Considering that it seems to be flaky even on 32-bit, maybe it's
> timing-related, or possibly related to TLB sizes or whatever (ie more
> likely hidden by a larger TLB on more modern hardware?)

The only difference I can see is the way how __tlb_flush_all() is
happening. We have 3 variants:

   invpcid_flush_all()  - depends on X86_FEATURE_INVPCID and X86_FEATURE_PGE
   cr4 based flush	- depends on X86_FEATURE_PGE
   cr3 based flush

No idea which variant is used in that failure case.

> Anyway, just looking at change_page_attr_set_clr(), I notice that the
> page alias checking treats NX specially:
> 
>         /* No alias checking for _NX bit modifications */
>         checkalias = (pgprot_val(mask_set) | pgprot_val(mask_clr)) != _PAGE_NX;
> 
> which seems insane. Why would NX be different from other protection
> bits (like _PAGE_RW)?

The reason is that the alias mapping should never be executable at all.

Thanks,

	tglx
diff mbox

Patch

--- linux.orig/arch/x86/Kconfig	2017-03-03 03:44:35.962022996 +0800
+++ linux/arch/x86/Kconfig	2017-03-03 03:44:35.962022996 +0800
@@ -54,7 +54,7 @@  config X86
 	select ARCH_HAS_KCOV			if X86_64
 	select ARCH_HAS_MMIO_FLUSH
 	select ARCH_HAS_PMEM_API		if X86_64
-	select ARCH_HAS_SET_MEMORY
+	select ARCH_HAS_SET_MEMORY		if X86_64
 	select ARCH_HAS_SG_CHAIN
 	select ARCH_HAS_STRICT_KERNEL_RWX
 	select ARCH_HAS_STRICT_MODULE_RWX