Message ID | 20081209182130.1E4C1438@kernel (mailing list archive) |
---|---|
State | Superseded, archived |

Dave Hansen writes:

> This patch ensures that we do not touch bootmem for any node which
> has not been initialized.
>
> Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>

So, should I be sending this to Linus for 2.6.28?

I notice you have added a dbg() call.  For a 2.6.28 patch I'd somewhat
prefer not to have that in unless necessary.

Jon, does this patch fix the problem on your machine with 16G pages?

Paul.

Paul Mackerras wrote:
> Dave Hansen writes:
>
>> This patch ensures that we do not touch bootmem for any node which
>> has not been initialized.
>>
>> Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
>
> So, should I be sending this to Linus for 2.6.28?
>
> I notice you have added a dbg() call.  For a 2.6.28 patch I'd somewhat
> prefer not to have that in unless necessary.
>
> Jon, does this patch fix the problem on your machine with 16G pages?

It worked on a machine with one page, I am awaiting access to another
with more pages.

> Paul.

Jon

On Thu, 2008-12-11 at 09:14 +1100, Paul Mackerras wrote:
> Dave Hansen writes:
> > This patch ensures that we do not touch bootmem for any node which
> > has not been initialized.
> >
> > Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
>
> So, should I be sending this to Linus for 2.6.28?

Yes, this is 2.6.28 material.

> I notice you have added a dbg() call.  For a 2.6.28 patch I'd somewhat
> prefer not to have that in unless necessary.

It isn't necessary and probably snuck in from the others.  I'll respin
and send this one out again.  If you want the others, I'll send
separately.

-- Dave

diff -puN arch/powerpc/mm/numa.c~fix-bad-node-reserve arch/powerpc/mm/numa.c
--- linux-2.6.git/arch/powerpc/mm/numa.c~fix-bad-node-reserve	2008-12-09 10:16:04.000000000 -0800
+++ linux-2.6.git-dave/arch/powerpc/mm/numa.c	2008-12-09 10:16:04.000000000 -0800
@@ -870,6 +870,7 @@ static void mark_reserved_regions_for_ni
 	struct pglist_data *node = NODE_DATA(nid);
 	int i;
 
+	dbg("mark_reserved_regions_for_nid(%d) NODE_DATA: %p\n", nid, node);
 	for (i = 0; i < lmb.reserved.cnt; i++) {
 		unsigned long physbase = lmb.reserved.region[i].base;
 		unsigned long size = lmb.reserved.region[i].size;
@@ -901,10 +902,14 @@ static void mark_reserved_regions_for_ni
 			if (end_pfn > node_ar.end_pfn)
 				reserve_size = (node_ar.end_pfn << PAGE_SHIFT)
 					- (start_pfn << PAGE_SHIFT);
-			dbg("reserve_bootmem %lx %lx nid=%d\n", physbase,
-				reserve_size, node_ar.nid);
-			reserve_bootmem_node(NODE_DATA(node_ar.nid), physbase,
-						reserve_size, BOOTMEM_DEFAULT);
+			/*
+			 * Only worry about *this* node, others may not
+			 * yet have valid NODE_DATA().
+			 */
+			if (node_ar.nid == nid)
+				reserve_bootmem_node(NODE_DATA(node_ar.nid),
+						physbase, reserve_size,
+						BOOTMEM_DEFAULT);
 			/*
 			 * if reserved region is contained in the active region
 			 * then done.

careful_allocation() was calling into the bootmem allocator for nodes
which had not been fully initialized and caused a previous bug.
http://patchwork.ozlabs.org/patch/10528/  So, I merged a few broken
out loops in do_init_bootmem() to fix it.  That changed the code
ordering.

I think this bug is triggered by having reserved areas for a node
which are spanned by another node's contents.  In the
mark_reserved_regions_for_nid() code, we attempt to reserve the
area for a node before we have allocated the NODE_DATA() for that
nid.  We do this since I reordered that loop.  I suck.

This may only present on some systems that have 16GB pages
reserved.  But, it can probably happen on any system that is
trying to reserve large swaths of memory that happen to span other
nodes' contents.

This patch ensures that we do not touch bootmem for any node which
has not been initialized.

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
---

 linux-2.6.git-dave/arch/powerpc/mm/numa.c |   13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)
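
The ordering problem and the guard described above can be modelled outside the kernel. What follows is a minimal user-space sketch, not the kernel code: `node_table`, `init_node()` and `reserve_for_nid()` are hypothetical stand-ins for NODE_DATA(), the per-node setup done in do_init_bootmem(), and the reserve_bootmem_node() call site. It only illustrates why the `node_ar.nid == nid` check keeps the code away from nodes whose data has not been allocated yet.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 4	/* hypothetical node count for this sketch */

/* Hypothetical stand-in for struct pglist_data. */
struct node_data {
	unsigned long reserved_bytes;
};

static struct node_data node_storage[MAX_NODES];

/* Stand-in for NODE_DATA(): an entry stays NULL until the node is set up. */
static struct node_data *node_table[MAX_NODES];

/* Models the point where per-node data gets allocated during boot. */
static void init_node(int nid)
{
	assert(nid >= 0 && nid < MAX_NODES);
	node_storage[nid].reserved_bytes = 0;
	node_table[nid] = &node_storage[nid];
}

/*
 * Reserve `size` bytes lying in an active region owned by `region_nid`,
 * while node `nid` is the one currently being initialized.
 * Returns 1 if the reservation was made, 0 if it was skipped.
 */
static int reserve_for_nid(int nid, int region_nid, unsigned long size)
{
	/* Same idea as the patch: only touch the node being set up now. */
	if (region_nid != nid)
		return 0;

	/* Safe: node `nid` was initialized before this call. */
	node_table[region_nid]->reserved_bytes += size;
	return 1;
}
```

In this model, calling `init_node(0)` and then `reserve_for_nid(0, 1, 4096)` skips the reservation instead of dereferencing the still-NULL entry for node 1. The skipped part is picked up later, once `init_node(1)` has run and `reserve_for_nid(1, 1, ...)` is called for that node.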