Patchwork 2.6.28-rc9 panics with crashkernel=256M while booting

login
register
mail settings
Submitter Chandru
Date Jan. 19, 2009, 11:30 a.m.
Message ID <200901191700.03580.chandru@in.ibm.com>
Download mbox | patch
Permalink /patch/19276/
State Changes Requested
Delegated to: Benjamin Herrenschmidt
Headers show

Comments

Chandru - Jan. 19, 2009, 11:30 a.m.
In case either physbase or reserve_size are not page aligned and in addition
if the following condition is also true

node_ar.end_pfn = node->node_end_pfn = PFN_DOWN(physbase+reserve_size).

we may hit the BUG_ON(end > bdata->node_low_pfn) in mark_bootmem_node() in
mm/bootmem.c Hence pass the pfn that the physbase is part of and align
reserve_size before calling reserve_bootmem_node().

Signed-off-by: Chandru S <chandru@linux.vnet.ibm.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
---
 arch/powerpc/mm/numa.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
Chandru - Jan. 20, 2009, 8:13 a.m.
Chandru wrote:
> In case either physbase or reserve_size are not page aligned and in addition
> if the following condition is also true
>
> node_ar.end_pfn = node->node_end_pfn = PFN_DOWN(physbase+reserve_size).
>
> we may hit the BUG_ON(end > bdata->node_low_pfn) in mark_bootmem_node() in
> mm/bootmem.c Hence pass the pfn that the physbase is part of and align
> reserve_size before calling reserve_bootmem_node().
>
> Signed-off-by: Chandru S <chandru@linux.vnet.ibm.com>
> Cc: Dave Hansen <dave@linux.vnet.ibm.com>
> ---
>  arch/powerpc/mm/numa.c |    6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> --- linux-2.6.29-rc2/arch/powerpc/mm/numa.c.orig	2009-01-19 16:14:49.000000000 +0530
> +++ linux-2.6.29-rc2/arch/powerpc/mm/numa.c	2009-01-19 16:36:38.000000000 +0530
> @@ -901,7 +901,8 @@ static void mark_reserved_regions_for_ni
>  		get_node_active_region(start_pfn, &node_ar);
>  		while (start_pfn < end_pfn &&
>  			node_ar.start_pfn < node_ar.end_pfn) {
> -			unsigned long reserve_size = size;
> +			unsigned long reserve_size = (size >> PAGE_SHIFT) <<
> +								PAGE_SHIFT;
>  			/*
>  			 * if reserved region extends past active region
>  			 * then trim size to active region
> @@ -917,7 +918,8 @@ static void mark_reserved_regions_for_ni
>  				dbg("reserve_bootmem %lx %lx nid=%d\n",
>  					physbase, reserve_size, node_ar.nid);
>  				reserve_bootmem_node(NODE_DATA(node_ar.nid),
> -						physbase, reserve_size,
> +						(start_pfn << PAGE_SHIFT),
> +						reserve_size,
>  						BOOTMEM_DEFAULT);
>  			}
>  			/*
>   
does this patch look good ?, do you concur with it ?

thanks,
Chandru
Dave Hansen - Jan. 22, 2009, 12:29 a.m.
On Mon, 2009-01-19 at 17:00 +0530, Chandru wrote:
> --- linux-2.6.29-rc2/arch/powerpc/mm/numa.c.orig	2009-01-19 16:14:49.000000000 +0530
> +++ linux-2.6.29-rc2/arch/powerpc/mm/numa.c	2009-01-19 16:36:38.000000000 +0530
> @@ -901,7 +901,8 @@ static void mark_reserved_regions_for_ni
>  		get_node_active_region(start_pfn, &node_ar);
>  		while (start_pfn < end_pfn &&
>  			node_ar.start_pfn < node_ar.end_pfn) {
> -			unsigned long reserve_size = size;
> +			unsigned long reserve_size = (size >> PAGE_SHIFT) <<
> +								PAGE_SHIFT;
>  			/*
>  			 * if reserved region extends past active region
>  			 * then trim size to active region
> @@ -917,7 +918,8 @@ static void mark_reserved_regions_for_ni
>  				dbg("reserve_bootmem %lx %lx nid=%d\n",
>  					physbase, reserve_size, node_ar.nid);
>  				reserve_bootmem_node(NODE_DATA(node_ar.nid),
> -						physbase, reserve_size,
> +						(start_pfn << PAGE_SHIFT),
> +						reserve_size,
>  						BOOTMEM_DEFAULT);
>  			}
>  			/*

Chandru, I don't mean to keep ragging on your patches, but I really
don't think this is right, yet.

Let's take, for instance, a 1-byte reservation.  With this code, you've
suddenly turned that into a 0-byte reservation, and that *can't* be
right.  The same thing happens if you have a reservation that spans two
pages.  If you unconditionally round it down, then you might miss the
part that spans a portion of the second page.

It needs to be rounded down like you are suggesting here, but only in
the case where we've gone over the *CURRENT* node's boundary.  This is
kinda what that "if (end_pfn > node_ar.end_pfn)" check is doing.  But,
it evidently screws it up if the overlap isn't by an entire page or
something.

Please also, for pete's sake, use masks (a la PAGE_MASK) or macros if
you're going to page-align something.  Don't shift down and up like
that.  

-- Dave
Chandru - Jan. 22, 2009, 8:20 a.m.
On Thursday 22 January 2009 05:59:39 Dave Hansen wrote:
> Let's take, for instance, a 1-byte reservation.  With this code, you've
> suddenly turned that into a 0-byte reservation, and that *can't* be
> right.  The same thing happens if you have a reservation that spans two
> pages.  If you unconditionally round it down, then you might miss the
> part that spans a portion of the second page.
> 
> It needs to be rounded down like you are suggesting here, but only in
> the case where we've gone over the *CURRENT* node's boundary.  This is
> kinda what that "if (end_pfn > node_ar.end_pfn)" check is doing.  But,
> it evidently screws it up if the overlap isn't by an entire page or
> something.

I assumed the condition 'while (start_pfn < end_pfn  && .. )' asks for atleast
a PAGE_SIZE difference between them and hence went ahead with that patch.
My guess was a 1-byte , 2-byte or a (PAGE_SIZE -1)-byte reservations may not even
go into that loop.  However we just need a fix for this problem. So if there is a 
better fix that you have please post it to lkml. 

Thanks,
Chandru

Patch

--- linux-2.6.29-rc2/arch/powerpc/mm/numa.c.orig	2009-01-19 16:14:49.000000000 +0530
+++ linux-2.6.29-rc2/arch/powerpc/mm/numa.c	2009-01-19 16:36:38.000000000 +0530
@@ -901,7 +901,8 @@  static void mark_reserved_regions_for_ni
 		get_node_active_region(start_pfn, &node_ar);
 		while (start_pfn < end_pfn &&
 			node_ar.start_pfn < node_ar.end_pfn) {
-			unsigned long reserve_size = size;
+			unsigned long reserve_size = (size >> PAGE_SHIFT) <<
+								PAGE_SHIFT;
 			/*
 			 * if reserved region extends past active region
 			 * then trim size to active region
@@ -917,7 +918,8 @@  static void mark_reserved_regions_for_ni
 				dbg("reserve_bootmem %lx %lx nid=%d\n",
 					physbase, reserve_size, node_ar.nid);
 				reserve_bootmem_node(NODE_DATA(node_ar.nid),
-						physbase, reserve_size,
+						(start_pfn << PAGE_SHIFT),
+						reserve_size,
 						BOOTMEM_DEFAULT);
 			}
 			/*