Patchwork Fix DEBUG_PAGEALLOC on 603/e300

login
register
mail settings
Submitter Benjamin Herrenschmidt
Date April 28, 2010, 7:22 a.m.
Message ID <1272439375.24542.94.camel@pasglop>
Download mbox | patch
Permalink /patch/51148/
State Accepted, archived
Commit 75c1d539ea13117cbe95e2c343e52af67d735145
Delegated to: Benjamin Herrenschmidt
Headers show

Comments

Benjamin Herrenschmidt - April 28, 2010, 7:22 a.m.
So we tried to speed things up a bit using flush_hash_pages() directly
but that falls over on 603 of course meaning we fail to flush the TLB
properly and we may even end up having it corrupt memory randomly by
accessing a hash table that doesn't exist.

This removes the "optimization" by always going through flush_tlb_page()
for now at least.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---

Somebody with a 603 or e300 core based FSL SoC to try this out for me ?

It's obviously completely untested :-)

Cheers,
Ben.
Xianghua Xiao - April 28, 2010, 7:15 p.m.
On Wed, Apr 28, 2010 at 2:22 AM, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
> So we tried to speed things up a bit using flush_hash_pages() directly
> but that falls over on 603 of course meaning we fail to flush the TLB
> properly and we may even end up having it corrupt memory randomly by
> accessing a hash table that doesn't exist.
>
> This removes the "optimization" by always going through flush_tlb_page()
> for now at least.
>
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---
>
> Somebody with a 603 or e300 core based FSL SoC to try this out for me ?
>
> It's obviously completely untested :-)
>
> Cheers,
> Ben.
>
> diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
> index b9243e7..95774b4 100644
> --- a/arch/powerpc/mm/pgtable_32.c
> +++ b/arch/powerpc/mm/pgtable_32.c
> @@ -385,11 +385,7 @@ static int __change_page_attr(struct page *page, pgprot_t prot)
>                return -EINVAL;
>        __set_pte_at(&init_mm, address, kpte, mk_pte(page, prot), 0);
>        wmb();
> -#ifdef CONFIG_PPC_STD_MMU
> -       flush_hash_pages(0, address, pmd_val(*kpmd), 1);
> -#else
>        flush_tlb_page(NULL, address);
> -#endif
>        pte_unmap(kpte);
>
>        return 0;
>
>
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
>
This change works me on a 834x(e300) platform, tested with lmbench and
a production-ready application with 2.6.33.3.

Xianghua
Kim Phillips - April 28, 2010, 7:46 p.m.
On Wed, 28 Apr 2010 14:15:50 -0500
Xianghua Xiao <xiaoxianghua@gmail.com> wrote:

> On Wed, Apr 28, 2010 at 2:22 AM, Benjamin Herrenschmidt
> <benh@kernel.crashing.org> wrote:
> > So we tried to speed things up a bit using flush_hash_pages() directly
> > but that falls over on 603 of course meaning we fail to flush the TLB
> > properly and we may even end up having it corrupt memory randomly by
> > accessing a hash table that doesn't exist.
> >
> > This removes the "optimization" by always going through flush_tlb_page()
> > for now at least.
> >
> > Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > ---
> > Somebody with a 603 or e300 core based FSL SoC to try this out for me ?
> >
> This change works me on a 834x(e300) platform, tested with lmbench and
> a production-ready application with 2.6.33.3.

moi aussi, Ben: booted this on an 8377 and started a kernel build.
It seems to be progressing normally, I'm just not going to wait for it
to complete before hitting send :).

Kim
Benjamin Herrenschmidt - April 28, 2010, 10:29 p.m.
On Wed, 2010-04-28 at 14:15 -0500, Xianghua Xiao wrote:
> This change works me on a 834x(e300) platform, tested with lmbench and
> a production-ready application with 2.6.33.3.

But have you tested that DEBUG_PAGEALLOC actually works ? :-)

A way to do that is to

	- get_free_pages a page
	- read from it
	- free it
	- write to it

It should oops on the write, and I suspect that without my patch it
doesn't.

Cheers,
Ben.
Maindoor - May 2, 2010, 10:27 a.m.
Any updates on this ? I need something similar.

Thanks,
Maindoor.

--- On Thu, 4/29/10, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Subject: Re: [PATCH] Fix DEBUG_PAGEALLOC on 603/e300
To: "Xianghua Xiao" <xiaoxianghua@gmail.com>
Cc: "linuxppc-dev" <linuxppc-dev@lists.ozlabs.org>
Date: Thursday, April 29, 2010, 3:59 AM

On Wed, 2010-04-28 at 14:15 -0500, Xianghua Xiao wrote:
> This change works me on a 834x(e300) platform, tested with lmbench and
> a production-ready application with 2.6.33.3.

But have you tested that DEBUG_PAGEALLOC actually works ? :-)

A way to do that is to

    - get_free_pages a page
    - read from it
    - free it
    - write to it

It should oops on the write, and I suspect that without my patch it
doesn't.

Cheers,
Ben.
Xianghua Xiao - May 3, 2010, 2:19 p.m.
On Sun, May 2, 2010 at 5:27 AM, Maindoor <sanjeevfiles@yahoo.com> wrote:

> Any updates on this ? I need something similar.
>
> Thanks,
> Maindoor.
>
> --- On *Thu, 4/29/10, Benjamin Herrenschmidt <benh@kernel.crashing.org>*wrote:
>
>
> From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Subject: Re: [PATCH] Fix DEBUG_PAGEALLOC on 603/e300
> To: "Xianghua Xiao" <xiaoxianghua@gmail.com>
> Cc: "linuxppc-dev" <linuxppc-dev@lists.ozlabs.org>
> Date: Thursday, April 29, 2010, 3:59 AM
>
> On Wed, 2010-04-28 at 14:15 -0500, Xianghua Xiao wrote:
> > This change works me on a 834x(e300) platform, tested with lmbench and
> > a production-ready application with 2.6.33.3.
>
> But have you tested that DEBUG_PAGEALLOC actually works ? :-)
>
> A way to do that is to
>
>     - get_free_pages a page
>     - read from it
>     - free it
>     - write to it
>
> It should oops on the write, and I suspect that without my patch it
> doesn't.
>
> Cheers,
> Ben.
>
>
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org<http://mc/compose?to=Linuxppc-dev@lists.ozlabs.org>
> https://lists.ozlabs.org/listinfo/linuxppc-dev
>
>
> Ok here is the test result I did this morning, in short with/without Ben's
patch it will oops if you write to a freed page.

1. With DEBUG_PAGEALLOC on, after applied Ben's patch and wrote to a freed
page, it will oops:
--------------------------------------
Read from allocated page
Write to freed page
Unable to handle kernel paging request for data at address 0xce33c000
Faulting instruction address: 0xd004d07c
Oops: Kernel access of bad area, sig: 11 [#1]
PREEMPT DEBUG_PAGEALLOC 834x SYS
Modules linked in: e300page(+)
NIP: d004d07c LR: d004d074 CTR: 00000001
REGS: cf231e30 TRAP: 0300   Not tainted  (2.6.33.3-rt17)
MSR: 00009032 <EE,ME,IR,DR>  CR: 24000422  XER: 20000000
DAR: ce33c000, DSISR: 22000000
TASK = ce312e10[1395] 'insmod' THREAD: cf230000
GPR00: 00000061 cf231ee0 ce312e10 0000001a 00003b0d ffffffff c047c4d6
00004000
GPR08: c047c8c4 ce33c000 00003fff ce312e10 24000422 100bbc1c 0fffd000
ffffffff
GPR16: 00000001 00000000 007fff00 00000000 00000000 0fffa1a0 00000000
100b90a0
GPR24: 4800e4ac bfd2ff80 c043d388 00000000 d004d000 c0480000 cf230000
d0050000
NIP [d004d07c] e300page_init+0x7c/0xb8 [e300page]
LR [d004d074] e300page_init+0x74/0xb8 [e300page]
Call Trace:
[cf231ee0] [d004d074] e300page_init+0x74/0xb8 [e300page] (unreliable)
[cf231ef0] [c00038b4] do_one_initcall+0x54/0x210
[cf231f20] [c0058ba4] sys_init_module+0x120/0x240
[cf231f40] [c0012afc] ret_from_syscall+0x0/0x38
--- Exception: c01 at 0xfe5baa0
    LR = 0x10027de0
Instruction dump:
4e800020 3c60d005 3863a094 48000031 807fa214 38800000 48000045 3c60d005
3863a0b4 48000019 813fa214 38000061 <98090000> 4bffffb8 00000000 3d60c035
---[ end trace 4b412bd05f6f6d8d ]---

# rmmod e300page
rmmod: e300page: Device or resource busy

2. Without Ben's patch, there is also oops observed:
# insmod ./e300page.ko
Oops: Kernel access of bad area, sig: 11 [#1]
PREEMPT DEBUG_PAGEALLOC 834x SYS
Modules linked in: e300page(+)
NIP: d004d07c LR: d004d074 CTR: 00000001
REGS: ce30be30 TRAP: 0300   Not tainted  (2.6.33.3-rt17)
MSR: 00009032 <EE,ME,IR,DR>  CR: 22000422  XER: 20000000
DAR: ce172000, DSISR: 22000000
TASK = ce306120[1392] 'insmod' THREAD: ce30a000
GPR00: 00000061 ce30bee0 ce306120 0000001a 00003b0d ffffffff c047c4d6
00004000
GPR08: c047c8c4 ce172000 00003fff ce306120 22000422 100bbc1c 0fffd000
ffffffff
GPR16: 00000001 00000000 007fff00 00000000 00000000 0fffa1a0 00000000
100b90a0
GPR24: 4800e4ac bf876f10 c043d388 00000000 d004d000 c0480000 ce30a000
d0050000
NIP [d004d07c] e300page_init+0x7c/0xb8 [e300page]
LR [d004d074] e300page_init+0x74/0xb8 [e300page]
Call Trace:
[ce30bee0] [d004d074] e300page_init+0x74/0xb8 [e300page] (unreliable)
[ce30bef0] [c00038b4] do_one_initcall+0x54/0x210
[ce30bf20] [c0058bb0] sys_init_module+0x120/0x240
[ce30bf40] [c0012afc] ret_from_syscall+0x0/0x38
--- Exception: c01 at 0xfe5baa0
    LR = 0x10027de0
Instruction dump:
4e800020 3c60d005 3863a094 48000031 807fa214 38800000 48000045 3c60d005
3863a0b4 48000019 813fa214 38000061 <98090000> 4bffffb8 00000000 3d60c035
---[ end trace a05c5135ddb62734 ]---
Segmentation fault

Checking dmesg(where you can see the printk msg from the module):
Read from allocated page
Write to freed page
Unable to handle kernel paging request for data at address 0xce172000
Faulting instruction address: 0xd004d07c
Oops: Kernel access of bad area, sig: 11 [#1]
PREEMPT DEBUG_PAGEALLOC 834x SYS
Modules linked in: e300page(+)
NIP: d004d07c LR: d004d074 CTR: 00000001
REGS: ce30be30 TRAP: 0300   Not tainted  (2.6.33.3-rt17)
MSR: 00009032 <EE,ME,IR,DR>  CR: 22000422  XER: 20000000
DAR: ce172000, DSISR: 22000000
TASK = ce306120[1392] 'insmod' THREAD: ce30a000
GPR00: 00000061 ce30bee0 ce306120 0000001a 00003b0d ffffffff c047c4d6
00004000
GPR08: c047c8c4 ce172000 00003fff ce306120 22000422 100bbc1c 0fffd000
ffffffff
GPR16: 00000001 00000000 007fff00 00000000 00000000 0fffa1a0 00000000
100b90a0
GPR24: 4800e4ac bf876f10 c043d388 00000000 d004d000 c0480000 ce30a000
d0050000
NIP [d004d07c] e300page_init+0x7c/0xb8 [e300page]
LR [d004d074] e300page_init+0x74/0xb8 [e300page]
Call Trace:
[ce30bee0] [d004d074] e300page_init+0x74/0xb8 [e300page] (unreliable)
[ce30bef0] [c00038b4] do_one_initcall+0x54/0x210
[ce30bf20] [c0058bb0] sys_init_module+0x120/0x240
[ce30bf40] [c0012afc] ret_from_syscall+0x0/0x38
--- Exception: c01 at 0xfe5baa0
    LR = 0x10027de0
Instruction dump:
4e800020 3c60d005 3863a094 48000031 807fa214 38800000 48000045 3c60d005
3863a0b4 48000019 813fa214 38000061 <98090000> 4bffffb8 00000000 3d60c035
---[ end trace a05c5135ddb62734 ]---

# lsmod
Module                  Size  Used by    Tainted: G
e300page                1549  1
# rmmod e300page
rmmod: e300page: Device or resource busy

Module source is below:
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/init.h>

static int __init e300page_init (void)
{
        static char *kpage;
        char ch;

        if (!(kpage = (char *)__get_free_pages (GFP_KERNEL, 0))) {
                printk (KERN_ERR " __get_free_pages failed\n");
                return 0;
        }

        printk(KERN_INFO "Read from allocated page\n");
        ch = kpage[0];
        free_pages ((unsigned long)kpage, 0);
        printk(KERN_INFO "Write to freed page\n");
        kpage[0]='a';

        return 0;
}
static void __exit e300page_exit (void)
{
        printk (KERN_INFO "e300page exiting\n");
}

module_init (e300page_init);
module_exit (e300page_exit);

MODULE_AUTHOR ("Xianghua Xiao");
MODULE_DESCRIPTION ("Test PageAlloc on e300");
MODULE_LICENSE ("GPL v2");
Benjamin Herrenschmidt - May 4, 2010, 5:13 a.m.
On Mon, 2010-05-03 at 09:19 -0500, Xianghua Xiao wrote:

> Ok here is the test result I did this morning, in short with/without
> Ben's patch it will oops if you write to a freed page.

Probably lucky enough that the free_pages() is pushing the stuff out of
the TLB. At least my patch prevents writing to a non-existing hash table
though, thus avoiding memory corruption.

Cheers,
Ben.
Maindoor - May 6, 2010, 8:23 a.m.
I'm not sure how good of a test this is. I tried the same thing and it seems to workimmediately ie. it gives an oops as soon as "I change the attribute and write to thepage". But when I try to write to it at a later stage as part of an ioctl or after about5 mins, this feature does not seem to work. What if the page is brought in again due to a fault, does the attribute remain the  same ? How can I verify this ? 
RegardsMaindoor.



--- On Mon, 5/3/10, Xianghua Xiao <xiaoxianghua@gmail.com> wrote:

From: Xianghua Xiao <xiaoxianghua@gmail.com>
Subject: Re: [PATCH] Fix DEBUG_PAGEALLOC on 603/e300
To: "Maindoor" <sanjeevfiles@yahoo.com>
Cc: "linuxppc-dev" <linuxppc-dev@lists.ozlabs.org>
Date: Monday, May 3, 2010, 7:49 PM

On Sun, May 2, 2010 at 5:27 AM, Maindoor <sanjeevfiles@yahoo.com> wrote:


Any updates on this ? I need something similar.

Thanks,
Maindoor.

--- On Thu, 4/29/10, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:


From: Benjamin Herrenschmidt <benh@kernel.crashing.org>

Subject: Re: [PATCH] Fix DEBUG_PAGEALLOC on 603/e300
To: "Xianghua Xiao" <xiaoxianghua@gmail.com>
Cc: "linuxppc-dev" <linuxppc-dev@lists.ozlabs.org>

Date: Thursday, April 29, 2010, 3:59 AM

On Wed, 2010-04-28 at 14:15 -0500, Xianghua Xiao wrote:
> This change works me on a 834x(e300) platform, tested with lmbench and
> a production-ready application with 2.6.33.3.


But have you tested that DEBUG_PAGEALLOC actually works ? :-)

A way to do that
 is to

    - get_free_pages a page
    - read from it
    - free it
    - write to it

It should oops on the write, and I suspect that without my patch it
doesn't.

Cheers,
Ben.
Maindoor - May 6, 2010, 1:04 p.m.
Kind of working, I'll confirm this tomorrow.Was discussed earlier on ppcdev. Had to make somechange in fault handler, for this to work. 
Regards,Maindoor.
--- On Thu, 5/6/10, Maindoor <sanjeevfiles@yahoo.com> wrote:

From: Maindoor <sanjeevfiles@yahoo.com>
Subject: Re: [PATCH] Fix DEBUG_PAGEALLOC on 603/e300
To: "Xianghua Xiao" <xiaoxianghua@gmail.com>
Cc: "linuxppc-dev" <linuxppc-dev@lists.ozlabs.org>
Date: Thursday, May 6, 2010, 1:53 PM

I'm not sure how good of a test this is. I tried the same thing and it seems to workimmediately ie. it gives an oops as soon as "I change the attribute and write to thepage". But when I try to write to it at a later stage as part of an ioctl or after about5 mins, this feature does not seem to work. What if the page is brought in again due to a fault, does the attribute remain the  same ? How can I verify this ? 
RegardsMaindoor.



--- On Mon, 5/3/10, Xianghua Xiao <xiaoxianghua@gmail.com> wrote:

From: Xianghua Xiao <xiaoxianghua@gmail.com>
Subject: Re: [PATCH] Fix DEBUG_PAGEALLOC on
 603/e300
To: "Maindoor" <sanjeevfiles@yahoo.com>
Cc: "linuxppc-dev" <linuxppc-dev@lists.ozlabs.org>
Date: Monday, May 3, 2010, 7:49 PM

On Sun, May 2, 2010 at 5:27 AM, Maindoor <sanjeevfiles@yahoo.com> wrote:


Any updates on this ? I need something similar.

Thanks,
Maindoor.

--- On Thu, 4/29/10, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:


From: Benjamin Herrenschmidt <benh@kernel.crashing.org>

Subject: Re: [PATCH] Fix DEBUG_PAGEALLOC on 603/e300
To: "Xianghua Xiao" <xiaoxianghua@gmail.com>
Cc: "linuxppc-dev" <linuxppc-dev@lists.ozlabs.org>

Date: Thursday, April 29, 2010, 3:59 AM

On Wed, 2010-04-28 at 14:15 -0500, Xianghua Xiao wrote:
> This change works me on a 834x(e300) platform, tested with lmbench and
> a production-ready application with 2.6.33.3.


But have you tested that DEBUG_PAGEALLOC actually works ? :-)

A way to do that
 is to

    - get_free_pages a page
    - read from it
    - free it
    - write to it

It should oops on the write, and I suspect that without my patch it
doesn't.

Cheers,
Ben.

Patch

diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index b9243e7..95774b4 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -385,11 +385,7 @@  static int __change_page_attr(struct page *page, pgprot_t prot)
 		return -EINVAL;
 	__set_pte_at(&init_mm, address, kpte, mk_pte(page, prot), 0);
 	wmb();
-#ifdef CONFIG_PPC_STD_MMU
-	flush_hash_pages(0, address, pmd_val(*kpmd), 1);
-#else
 	flush_tlb_page(NULL, address);
-#endif
 	pte_unmap(kpte);
 
 	return 0;