Patchwork 2.6.28-rc9 panics with crashkernel=256M while booting

login
register
mail settings
Submitter Chandru
Date Jan. 5, 2009, 1:49 p.m.
Message ID <200901051919.52327.chandru@in.ibm.com>
Download mbox | patch
Permalink /patch/16613/
State Not Applicable
Headers show

Comments

Chandru - Jan. 5, 2009, 1:49 p.m.
On Tuesday 30 December 2008 03:06:07 Dave Hansen wrote:
> On Fri, 2008-12-26 at 11:59 +1100, Paul Mackerras wrote:
> > > +     }
> > > +
> > > +     for_each_online_node(nid) {
> > >               /*
> > > -              * Be very careful about moving this around.  Future
> > > -              * calls to careful_allocation() depend on this getting
> > > -              * done correctly.
> > > +              * Be very careful about moving this around.
> > >                */
> > >               mark_reserved_regions_for_nid(nid);
> > >               sparse_memory_present_with_active_regions(nid);
>
> I think this reintroduces one of the bugs that I squashed.  You *have*
> to call mark_reserved_regions_for_nid() right after you do
> free_bootmem_with_active_regions().  Otherwise, someone else can
> bootmem_alloc() a reserved region from that node.

Thanks for the review comments Dave. With the commit:a4c74ddd5ea3db53fc73d29c222b22656a7d05be, I see this has been taken care in mark_reserved_regions_for_nid(). In that case we may only need the change made in reserve_bootmem_node(). 

Hello Andrew, 
Could you please consider the following patch instead of the original patch in this thread. 
Thanks, 


When booted with crashkernel=224M@32M or any memory size less than this, the system boots properly. The system comes up with two nodes (0-256M and 256M-4GB). The crashkernel memory reservation spans across these two nodes. The mark_reserved_regions_for_nid() in arch/powerpc/numa.c resizes the reserved part of the memory within it as... 

	 if (end_pfn > node_ar.end_pfn)
		reserve_size = (node_ar.end_pfn << PAGE_SHIFT)
				- (start_pfn << PAGE_SHIFT);

but the reserve_bootmem_node() in mm/bootmem.c raises the pfn value of end

	end = PFN_UP(physaddr + size);

This causes end to get a value past the last page in the 0-256M node. The following change restricts the value of end if it exceeds the last pfn in a given node.

Signed-off-by: Chandru S <chandru@linux.vnet.ibm.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
---

 mm/bootmem.c |    4 ++++
 1 file changed, 4 insertions(+)
Dave Hansen - Jan. 5, 2009, 4:30 p.m.
On Mon, 2009-01-05 at 19:19 +0530, Chandru wrote:
> On Tuesday 30 December 2008 03:06:07 Dave Hansen wrote:
> When booted with crashkernel=224M@32M or any memory size less than
> this, the system boots properly. The system comes up with two nodes
> (0-256M and 256M-4GB). The crashkernel memory reservation spans across
> these two nodes. The mark_reserved_regions_for_nid() in
> arch/powerpc/numa.c resizes the reserved part of the memory within it
> as... 
> 
> 	 if (end_pfn > node_ar.end_pfn)
> 		reserve_size = (node_ar.end_pfn << PAGE_SHIFT)
> 				- (start_pfn << PAGE_SHIFT);
> 
> but the reserve_bootmem_node() in mm/bootmem.c raises the pfn value of
> end
> 
> 	end = PFN_UP(physaddr + size);
> 
> This causes end to get a value past the last page in the 0-256M node.
> The following change restricts the value of end if it exceeds the last
> pfn in a given node.

OK, I had to think about this for a good, long time.  That's bad. :)

There are two things that we're dealing with here: "active regions" and
the NODE_DATA's.  The if() you've pasted above resizes the reservation
so that it fits into the current active region.  However, as you noted,
we haven't resized it so that it fits into the NODE_DATA() that we're
looking at.  We call into the bootmem code, and BUG_ON().

The thing I don't like about this is that it might hide bugs in other
callers.  This really is a ppc-specific thing and, although what you
wrote will fix the bug on ppc, it will probably cause someone in the
future to call reserve_bootmem_node() with too large a reservation and
get a silent failure (not reserving the requested size) back.  

We really do need to go take a hard look at the whole interaction
between lmb's, node active regions, and the NUMA code some day.  It has
kinda grown to be a bit ungainly.

How about we just consult the NODE_DATA() in
mark_reserved_regions_for_nid() instead of reserve_bootmem_node()?

> --- linux-2.6.28/mm/bootmem.c.orig	2009-01-05 20:42:12.000000000 +0530
> +++ linux-2.6.28/mm/bootmem.c	2009-01-05 20:43:53.000000000 +0530
> @@ -375,10 +375,14 @@ int __init reserve_bootmem_node(pg_data_
>  				 unsigned long size, int flags)
>  {
>  	unsigned long start, end;
> +	bootmem_data_t *bdata = pgdat->bdata;
> 
>  	start = PFN_DOWN(physaddr);
>  	end = PFN_UP(physaddr + size);
> 
> +	if (end > bdata->node_low_pfn)
> +		end = bdata->node_low_pfn;
> +
>  	return mark_bootmem_node(pgdat->bdata, start, end, 1, flags);
>  }
-- Dave
Chandru - Jan. 7, 2009, 12:58 p.m.
On Monday 05 January 2009 22:00:33 Dave Hansen wrote:
> OK, I had to think about this for a good, long time.  That's bad. :)
>
> There are two things that we're dealing with here: "active regions" and
> the NODE_DATA's.  The if() you've pasted above resizes the reservation
> so that it fits into the current active region.  However, as you noted,
> we haven't resized it so that it fits into the NODE_DATA() that we're
> looking at.  We call into the bootmem code, and BUG_ON().
>
> The thing I don't like about this is that it might hide bugs in other
> callers.  This really is a ppc-specific thing and, although what you
> wrote will fix the bug on ppc, it will probably cause someone in the
> future to call reserve_bootmem_node() with too large a reservation and
> get a silent failure (not reserving the requested size) back.
>
> We really do need to go take a hard look at the whole interaction
> between lmb's, node active regions, and the NUMA code some day.  It has
> kinda grown to be a bit ungainly.
>
> How about we just consult the NODE_DATA() in
> mark_reserved_regions_for_nid() instead of reserve_bootmem_node()?

I don't know how you wanted NODE_DATA() to be consulted here. i.e before 
calling reserve_bootmem_node() should we have a condition 

	if (PFN_UP(physbase+reserve_size) > node_end_pfn) 
	then
		resize reserve_size again so that PFN_UP() will equate to node_end_pfn ??
	end 

Also I was wondering if in reserve_bootmem_node()
	end = PFN_DOWN() ; will do.. 

With the recent changes from you that went into 2.6.28 stable 
(commit:a4c74ddd5ea3db53fc73d29c222b22656a7d05be), it worked on the system 
with PFN_DOWN(). 

Thanks,
Chandru
Dave Hansen - Jan. 7, 2009, 5:25 p.m.
On Wed, 2009-01-07 at 18:28 +0530, Chandru wrote:
> I don't know how you wanted NODE_DATA() to be consulted here. i.e before 
> calling reserve_bootmem_node() should we have a condition 
> 
>         if (PFN_UP(physbase+reserve_size) > node_end_pfn) 
>         then
>                 resize reserve_size again so that PFN_UP() will equate to node_end_pfn ??
>         end 

I'm just suggesting making your fix in the ppc code instead of in
mm/bootmem.c.

-- Dave

Patch

--- linux-2.6.28/mm/bootmem.c.orig	2009-01-05 20:42:12.000000000 +0530
+++ linux-2.6.28/mm/bootmem.c	2009-01-05 20:43:53.000000000 +0530
@@ -375,10 +375,14 @@  int __init reserve_bootmem_node(pg_data_
 				 unsigned long size, int flags)
 {
 	unsigned long start, end;
+	bootmem_data_t *bdata = pgdat->bdata;
 
 	start = PFN_DOWN(physaddr);
 	end = PFN_UP(physaddr + size);
 
+	if (end > bdata->node_low_pfn)
+		end = bdata->node_low_pfn;
+
 	return mark_bootmem_node(pgdat->bdata, start, end, 1, flags);
 }