powerpc/mm: make NULL pointer deferences explicit on bad page faults.

Message ID 42e7e2e9b74af526523fb5a1451fdb29cfe2687a.1539060806.git.christophe.leroy@c-s.fr
State New
Headers show
Series
  • powerpc/mm: make NULL pointer deferences explicit on bad page faults.
Related show

Checks

Context Check Description
snowpatch_ozlabs/build-ppc32 success Test build-ppc32 on branch next
snowpatch_ozlabs/build-ppc64e success Test build-ppc64e on branch next
snowpatch_ozlabs/build-ppc64be success Test build-ppc64be on branch next
snowpatch_ozlabs/build-ppc64le success Test build-ppc64le on branch next
snowpatch_ozlabs/checkpatch success Test checkpatch on branch next
snowpatch_ozlabs/apply_patch success next/apply_patch Successfully applied

Commit Message

Christophe Leroy Oct. 9, 2018, 5:21 a.m.
As several other arches including x86, this patch makes it explicit
that a bad page fault is a NULL pointer dereference when the fault
address is lower than PAGE_SIZE

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/mm/fault.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

Comments

Michael Ellerman Dec. 14, 2018, 12:57 a.m. | #1
Hi Christophe,

You know it's the trivial patches that are going to get lots of review
comments :)

Christophe Leroy <christophe.leroy@c-s.fr> writes:
> As several other arches including x86, this patch makes it explicit
> that a bad page fault is a NULL pointer dereference when the fault
> address is lower than PAGE_SIZE

I'm being pedantic, but it's not necessarily a NULL pointer dereference.
It might just be a direct access to a low address, eg:

 char *p = 0x100;
 *p = 0;

That's not a NULL pointer dereference.

But other arches do print this so I guess it's OK to add, and in most
cases it will be an actual NULL pointer dereference.

I wonder though if we should use 4096 rather than PAGE_SIZE, given
that's the actual value other arches are using. We support 256K pages on
some systems, which is getting quite large.

> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index d51cf5f4e45e..501a1eadb3e9 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -631,13 +631,16 @@ void bad_page_fault(struct pt_regs *regs, unsigned long address, int sig)
>  	switch (TRAP(regs)) {
>  	case 0x300:
>  	case 0x380:
> -		printk(KERN_ALERT "Unable to handle kernel paging request for "
> -			"data at address 0x%08lx\n", regs->dar);
> +		pr_alert("Unable to handle kernel %s for data at address 0x%08lx\n",
> +			 regs->dar < PAGE_SIZE ? "NULL pointer dereference" :
> +						 "paging request",
> +			 regs->dar);

This is now too long I think, with printk time you get:

[ 1096.450711] Unable to handle kernel NULL pointer dereference for data at address 0x00000000

Which is 93 columns. It's true on many systems it doesn't really matter
any more, but it would still be good if it was shorter.

I like that on x86 they prefix it with "BUG:", just to avoid any confusion.

What if we had for the NULL pointer case:

  BUG: Kernel NULL pointer dereference at 0x00000000

And for the normal case:

  BUG: Unable to handle kernel data access at 0x00000000

Note on the very next line we print:
  Faulting instruction address: 0xc000000000795cc8

So there should be no confusion about whether "at" refers to the data
address or the instruction address.

>  	case 0x400:
>  	case 0x480:
> -		printk(KERN_ALERT "Unable to handle kernel paging request for "
> -			"instruction fetch\n");
> +		pr_alert("Unable to handle kernel %s for instruction fetch\n",
> +			 regs->nip < PAGE_SIZE ? "NULL pointer dereference" :
> +						 "paging request");

I don't really like using "NULL pointer dereference" here, that
terminology makes me think of a load/store, I think it confuses things
rather than making it clearer.

What about:

  BUG: Unable to handle kernel instruction fetch at 0x00000000


>  		break;
>  	case 0x600:
>  		printk(KERN_ALERT "Unable to handle kernel paging request for "

It would be good to clean up these other cases as well. They seem to be
trying to use the "page request for" terminology which leads to them
being very wordy. I assume that was done to help people grepping kernel
logs for errors, but I think we should not worry about that if we have
the "BUG:" prefix.

So we have:
	printk(KERN_ALERT "Unable to handle kernel paging request for "
		"unaligned access at address 0x%08lx\n", regs->dar);

What about:

  BUG: Unable to handle kernel unaligned access at 0x00000000

And:
	printk(KERN_ALERT "Unable to handle kernel paging request for "
		"unknown fault\n");

What about:

  BUG: Unable to handle unknown paging fault at 0x00000000


Thoughts?

cheers
Christophe Leroy Dec. 14, 2018, 8:01 a.m. | #2
Hi Michael,

Le 14/12/2018 à 01:57, Michael Ellerman a écrit :
> Hi Christophe,
> 
> You know it's the trivial patches that are going to get lots of review
> comments :)

I'm so happy to get comments.

> 
> Christophe Leroy <christophe.leroy@c-s.fr> writes:
>> As several other arches including x86, this patch makes it explicit
>> that a bad page fault is a NULL pointer dereference when the fault
>> address is lower than PAGE_SIZE
> 
> I'm being pedantic, but it's not necessarily a NULL pointer dereference.
> It might just be a direct access to a low address, eg:
> 
>   char *p = 0x100;
>   *p = 0;
> 
> That's not a NULL pointer dereference.
> 
> But other arches do print this so I guess it's OK to add, and in most
> cases it will be an actual NULL pointer dereference.
> 
> I wonder though if we should use 4096 rather than PAGE_SIZE, given
> that's the actual value other arches are using. We support 256K pages on
> some systems, which is getting quite large.

Those invalid accesses are catched because the first page is marked non 
present or non accessible in the page table, so I thing using PAGE_SIZE 
here is valid regardless of the page size.

Looks like the arches have PAGE_SHIFT ranging from 12 to 16 mainly.

> 
>> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
>> index d51cf5f4e45e..501a1eadb3e9 100644
>> --- a/arch/powerpc/mm/fault.c
>> +++ b/arch/powerpc/mm/fault.c
>> @@ -631,13 +631,16 @@ void bad_page_fault(struct pt_regs *regs, unsigned long address, int sig)
>>   	switch (TRAP(regs)) {
>>   	case 0x300:
>>   	case 0x380:
>> -		printk(KERN_ALERT "Unable to handle kernel paging request for "
>> -			"data at address 0x%08lx\n", regs->dar);
>> +		pr_alert("Unable to handle kernel %s for data at address 0x%08lx\n",
>> +			 regs->dar < PAGE_SIZE ? "NULL pointer dereference" :
>> +						 "paging request",
>> +			 regs->dar);
> 
> This is now too long I think, with printk time you get:
> 
> [ 1096.450711] Unable to handle kernel NULL pointer dereference for data at address 0x00000000
> 
> Which is 93 columns. It's true on many systems it doesn't really matter
> any more, but it would still be good if it was shorter.
> 
> I like that on x86 they prefix it with "BUG:", just to avoid any confusion.
> 
> What if we had for the NULL pointer case:
> 
>    BUG: Kernel NULL pointer dereference at 0x00000000
> 
> And for the normal case:
> 
>    BUG: Unable to handle kernel data access at 0x00000000
> 
> Note on the very next line we print:
>    Faulting instruction address: 0xc000000000795cc8
> 
> So there should be no confusion about whether "at" refers to the data
> address or the instruction address.

Agreed

> 
>>   	case 0x400:
>>   	case 0x480:
>> -		printk(KERN_ALERT "Unable to handle kernel paging request for "
>> -			"instruction fetch\n");
>> +		pr_alert("Unable to handle kernel %s for instruction fetch\n",
>> +			 regs->nip < PAGE_SIZE ? "NULL pointer dereference" :
>> +						 "paging request");
> 
> I don't really like using "NULL pointer dereference" here, that
> terminology makes me think of a load/store, I think it confuses things
> rather than making it clearer.
> 
> What about:
> 
>    BUG: Unable to handle kernel instruction fetch at 0x00000000

I think we still need to make it explicit that we jumped there due to a 
NULL function pointer, allthought I don't have a good text idea yet for 
this.

> 
> 
>>   		break;
>>   	case 0x600:
>>   		printk(KERN_ALERT "Unable to handle kernel paging request for "
> 
> It would be good to clean up these other cases as well. They seem to be
> trying to use the "page request for" terminology which leads to them
> being very wordy. I assume that was done to help people grepping kernel
> logs for errors, but I think we should not worry about that if we have
> the "BUG:" prefix.
> 
> So we have:
> 	printk(KERN_ALERT "Unable to handle kernel paging request for "
> 		"unaligned access at address 0x%08lx\n", regs->dar);
> 
> What about:
> 
>    BUG: Unable to handle kernel unaligned access at 0x00000000
> 
> And:
> 	printk(KERN_ALERT "Unable to handle kernel paging request for "
> 		"unknown fault\n");
> 
> What about:
> 
>    BUG: Unable to handle unknown paging fault at 0x00000000
> 
> 
> Thoughts?

Looks like good ideas I'll carry on.

Christophe

Patch

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index d51cf5f4e45e..501a1eadb3e9 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -631,13 +631,16 @@  void bad_page_fault(struct pt_regs *regs, unsigned long address, int sig)
 	switch (TRAP(regs)) {
 	case 0x300:
 	case 0x380:
-		printk(KERN_ALERT "Unable to handle kernel paging request for "
-			"data at address 0x%08lx\n", regs->dar);
+		pr_alert("Unable to handle kernel %s for data at address 0x%08lx\n",
+			 regs->dar < PAGE_SIZE ? "NULL pointer dereference" :
+						 "paging request",
+			 regs->dar);
 		break;
 	case 0x400:
 	case 0x480:
-		printk(KERN_ALERT "Unable to handle kernel paging request for "
-			"instruction fetch\n");
+		pr_alert("Unable to handle kernel %s for instruction fetch\n",
+			 regs->nip < PAGE_SIZE ? "NULL pointer dereference" :
+						 "paging request");
 		break;
 	case 0x600:
 		printk(KERN_ALERT "Unable to handle kernel paging request for "