[1/8] fix bootmem reservation on uninitialized node

Message ID 20081209182130.1E4C1438@kernel (mailing list archive)
State Superseded, archived

Commit Message

Dave Hansen Dec. 9, 2008, 6:21 p.m. UTC
careful_allocation() was calling into the bootmem allocator for
nodes which had not been fully initialized, which caused an earlier
bug:  http://patchwork.ozlabs.org/patch/10528/  To fix that, I merged
a few broken-out loops in do_init_bootmem(), which changed the code
ordering.

I think this bug is triggered by having reserved areas for a node
which are spanned by another node's contents.  In the
mark_reserved_regions_for_nid() code, we attempt to reserve the
area for a node before we have allocated the NODE_DATA() for that
nid.  We do this since I reordered that loop.  I suck.

This may only present on some systems that have 16GB pages
reserved.  But, it can probably happen on any system that is
trying to reserve large swaths of memory that happen to span other
nodes' contents.

This patch ensures that we do not touch bootmem for any node which
has not been initialized.

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
---

 linux-2.6.git-dave/arch/powerpc/mm/numa.c |   13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)
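
The guard described above can be illustrated with a small user-space
sketch. This is not kernel code: toy_node, toy_node_data(),
toy_init_node() and toy_reserve_for_nid() are hypothetical stand-ins
for struct pglist_data, NODE_DATA(), node setup and the reservation
step. It assumes a node's data is unavailable (modelled here as NULL)
until that node has been initialized, and that a region skipped during
one node's pass is presumably picked up when the owning node gets its
own pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 4

/* Hypothetical stand-in for struct pglist_data. */
struct toy_node {
	bool initialized;
	unsigned long reserved_bytes;
};

static struct toy_node toy_nodes[MAX_NODES];

/* Toy NODE_DATA(): NULL until the node has been set up (assumption). */
static struct toy_node *toy_node_data(int nid)
{
	assert(nid >= 0 && nid < MAX_NODES);
	return toy_nodes[nid].initialized ? &toy_nodes[nid] : NULL;
}

/* Mark a node as initialized, with nothing reserved yet. */
static void toy_init_node(int nid)
{
	assert(nid >= 0 && nid < MAX_NODES);
	toy_nodes[nid].initialized = true;
	toy_nodes[nid].reserved_bytes = 0;
}

/*
 * Account for `size` reserved bytes that live in node `owner_nid`
 * while walking the reserved regions on behalf of node `nid`.
 * Returns true only if the reservation was actually recorded.
 */
static bool toy_reserve_for_nid(int nid, int owner_nid, unsigned long size)
{
	struct toy_node *node;

	/* Only worry about *this* node; others may not be set up yet. */
	if (owner_nid != nid)
		return false;

	node = toy_node_data(owner_nid);
	if (node == NULL)
		return false;

	node->reserved_bytes += size;
	return true;
}
```

With only node 0 initialized, toy_reserve_for_nid(0, 0, size) records
the reservation, while toy_reserve_for_nid(0, 1, size) is skipped
without ever touching node 1's data.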

Comments

Paul Mackerras Dec. 10, 2008, 10:14 p.m. UTC | #1
Dave Hansen writes:

> This patch ensures that we do not touch bootmem for any node which
> has not been initialized.
> 
> Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>

So, should I be sending this to Linus for 2.6.28?

I notice you have added a dbg() call.  For a 2.6.28 patch I'd somewhat
prefer not to have that in unless necessary.

Jon, does this patch fix the problem on your machine with 16G pages?

Paul.

Jon Tollefson Dec. 10, 2008, 10:30 p.m. UTC | #2
Paul Mackerras wrote:
> Dave Hansen writes:
>
>   
>> This patch ensures that we do not touch bootmem for any node which
>> has not been initialized.
>>
>> Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
>>     
>
> So, should I be sending this to Linus for 2.6.28?
>
> I notice you have added a dbg() call.  For a 2.6.28 patch I'd somewhat
> prefer not to have that in unless necessary.
>
> Jon, does this patch fix the problem on your machine with 16G pages?
>   
It worked on a machine with one 16G page; I am awaiting access to
another with more pages.

> Paul.
>   
Jon

Dave Hansen Dec. 10, 2008, 10:54 p.m. UTC | #3
On Thu, 2008-12-11 at 09:14 +1100, Paul Mackerras wrote:
> Dave Hansen writes:
> > This patch ensures that we do not touch bootmem for any node which
> > has not been initialized.
> > 
> > Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
> 
> So, should I be sending this to Linus for 2.6.28?

Yes, this is 2.6.28 material.

> I notice you have added a dbg() call.  For a 2.6.28 patch I'd somewhat
> prefer not to have that in unless necessary.

It isn't necessary and probably snuck in from the others.  I'll respin
and send this one out again.  If you want the others, I'll send
separately.

-- Dave

Patch

diff -puN arch/powerpc/mm/numa.c~fix-bad-node-reserve arch/powerpc/mm/numa.c
--- linux-2.6.git/arch/powerpc/mm/numa.c~fix-bad-node-reserve	2008-12-09 10:16:04.000000000 -0800
+++ linux-2.6.git-dave/arch/powerpc/mm/numa.c	2008-12-09 10:16:04.000000000 -0800
@@ -870,6 +870,7 @@  static void mark_reserved_regions_for_ni
 	struct pglist_data *node = NODE_DATA(nid);
 	int i;
 
+	dbg("mark_reserved_regions_for_nid(%d) NODE_DATA: %p\n", nid, node);
 	for (i = 0; i < lmb.reserved.cnt; i++) {
 		unsigned long physbase = lmb.reserved.region[i].base;
 		unsigned long size = lmb.reserved.region[i].size;
@@ -901,10 +902,14 @@  static void mark_reserved_regions_for_ni
 			if (end_pfn > node_ar.end_pfn)
 				reserve_size = (node_ar.end_pfn << PAGE_SHIFT)
 					- (start_pfn << PAGE_SHIFT);
-			dbg("reserve_bootmem %lx %lx nid=%d\n", physbase,
-				reserve_size, node_ar.nid);
-			reserve_bootmem_node(NODE_DATA(node_ar.nid), physbase,
-						reserve_size, BOOTMEM_DEFAULT);
+			/*
+			 * Only worry about *this* node, others may not
+			 * yet have valid NODE_DATA().
+			 */
+			if (node_ar.nid == nid)
+				reserve_bootmem_node(NODE_DATA(node_ar.nid),
+						physbase, reserve_size,
+						BOOTMEM_DEFAULT);
 			/*
 			 * if reserved region is contained in the active region
 			 * then done.