Message ID | 20120229152830.22fc72a2.akpm@linux-foundation.org (mailing list archive) |
---|---|
State | Not Applicable |

On 29.02.2012 [15:28:30 -0800], Andrew Morton wrote:
> On Wed, 29 Feb 2012 10:12:33 -0800
> Nishanth Aravamudan <nacc@linux.vnet.ibm.com> wrote:
>
> > While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory
> > Overcommit) on powerpc, we tripped the following:
> >
> > kernel BUG at mm/bootmem.c:483!
> >
> > ...
> >
> > This is
> >
> > BUG_ON(limit && goal + size > limit);
> >
> > and after some debugging, it seems that
> >
> > goal = 0x7ffff000000
> > limit = 0x80000000000
> >
> > and sparse_early_usemaps_alloc_node ->
> > sparse_early_usemaps_alloc_pgdat_section calls
> >
> > return alloc_bootmem_section(usemap_size() * count, section_nr);
> >
> > This is on a system with 8TB available via the AMS pool, and as a quirk
> > of AMS in firmware, all of that memory shows up in node 0. So, we end up
> > with an allocation that will fail the goal/limit constraints. In theory,
> > we could "fall-back" to alloc_bootmem_node() in
> > sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE
> > defined, we'll BUG_ON() instead. A simple solution appears to be to
> > unconditionally remove the limit condition in alloc_bootmem_section,
> > meaning allocations are allowed to cross section boundaries (necessary
> > for systems of this size).
> >
> > Johannes Weiner pointed out that if alloc_bootmem_section() no longer
> > guarantees section-locality, we need check_usemap_section_nr() to print
> > possible cross-dependencies between node descriptors and the usemaps
> > allocated through it. That makes the two loops in
> > sparse_early_usemaps_alloc_node() identical, so re-factor the code a
> > bit.
>
> The patch is a bit scary now, so I think we should merge it into
> 3.4-rc1 and then backport it into 3.3.1 if nothing blows up.

I think that's fair.

> Do you think it should be backported into 3.3.x? Earlier kernels?

3.3.x seems reasonable. If I had to guess, I think this could be hit on
any kernels with this functionality -- that is, sparsemem in general?
Not sure how far back it's worth backporting.

> Also, this?

Urgh, yeah, that's way better.

Acked-by: Nishanth Aravamudan <nacc@us.ibm.com>

> --- a/mm/bootmem.c~bootmem-sparsemem-remove-limit-constraint-in-alloc_bootmem_section-fix
> +++ a/mm/bootmem.c
> @@ -766,14 +766,13 @@ void * __init alloc_bootmem_section(unsi
>  					unsigned long section_nr)
>  {
>  	bootmem_data_t *bdata;
> -	unsigned long pfn, goal, limit;
> +	unsigned long pfn, goal;
>
>  	pfn = section_nr_to_pfn(section_nr);
>  	goal = pfn << PAGE_SHIFT;
> -	limit = 0;
>  	bdata = &bootmem_node_data[early_pfn_to_nid(pfn)];
>
> -	return alloc_bootmem_core(bdata, size, SMP_CACHE_BYTES, goal, limit);
> +	return alloc_bootmem_core(bdata, size, SMP_CACHE_BYTES, goal, 0);
>  }
>  #endif

Thanks for all the feedback!
-Nish

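
To make the arithmetic in the quoted report easier to follow, here is a minimal userspace sketch of the BUG_ON() condition evaluated with the reported goal and limit. Only the condition and the two addresses come from the thread; the helper name would_trip_bug_on() and the use of fixed-width userspace types are assumptions made for illustration, and this is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace model of the check quoted in the report:
 *     BUG_ON(limit && goal + size > limit);
 * Hypothetical helper, not a kernel function. A limit of 0 means
 * "no upper bound", so the check cannot fire in that case.
 */
static inline bool would_trip_bug_on(uint64_t goal, uint64_t size,
				     uint64_t limit)
{
	assert(size > 0);	/* this sketch only models non-empty requests */
	return limit && goal + size > limit;
}
```

With the reported values, the gap between goal and limit is 0x1000000 bytes (16 MiB), so would_trip_bug_on(0x7ffff000000, size, 0x80000000000) is true for any size above 16 MiB. Once the caller passes a limit of 0, as the patch does, the same call is false for every size.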

On 29.02.2012 [15:28:30 -0800], Andrew Morton wrote:
> On Wed, 29 Feb 2012 10:12:33 -0800
> Nishanth Aravamudan <nacc@linux.vnet.ibm.com> wrote:
>
> > While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory
> > Overcommit) on powerpc, we tripped the following:
> >
> > kernel BUG at mm/bootmem.c:483!
> >
> > ...
> >
> > This is
> >
> > BUG_ON(limit && goal + size > limit);
> >
> > and after some debugging, it seems that
> >
> > goal = 0x7ffff000000
> > limit = 0x80000000000
> >
> > and sparse_early_usemaps_alloc_node ->
> > sparse_early_usemaps_alloc_pgdat_section calls
> >
> > return alloc_bootmem_section(usemap_size() * count, section_nr);
> >
> > This is on a system with 8TB available via the AMS pool, and as a quirk
> > of AMS in firmware, all of that memory shows up in node 0. So, we end up
> > with an allocation that will fail the goal/limit constraints. In theory,
> > we could "fall-back" to alloc_bootmem_node() in
> > sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE
> > defined, we'll BUG_ON() instead. A simple solution appears to be to
> > unconditionally remove the limit condition in alloc_bootmem_section,
> > meaning allocations are allowed to cross section boundaries (necessary
> > for systems of this size).
> >
> > Johannes Weiner pointed out that if alloc_bootmem_section() no longer
> > guarantees section-locality, we need check_usemap_section_nr() to print
> > possible cross-dependencies between node descriptors and the usemaps
> > allocated through it. That makes the two loops in
> > sparse_early_usemaps_alloc_node() identical, so re-factor the code a
> > bit.
>
> The patch is a bit scary now, so I think we should merge it into
> 3.4-rc1 and then backport it into 3.3.1 if nothing blows up.
>
> Do you think it should be backported into 3.3.x? Earlier kernels?

Upon review, it would be good if we can get it pushed back to kernels
3.0.x, 3.1.x and 3.2.x.

Thanks,
Nish


--- a/mm/bootmem.c~bootmem-sparsemem-remove-limit-constraint-in-alloc_bootmem_section-fix
+++ a/mm/bootmem.c
@@ -766,14 +766,13 @@ void * __init alloc_bootmem_section(unsi
 					unsigned long section_nr)
 {
 	bootmem_data_t *bdata;
-	unsigned long pfn, goal, limit;
+	unsigned long pfn, goal;

 	pfn = section_nr_to_pfn(section_nr);
 	goal = pfn << PAGE_SHIFT;
-	limit = 0;
 	bdata = &bootmem_node_data[early_pfn_to_nid(pfn)];

-	return alloc_bootmem_core(bdata, size, SMP_CACHE_BYTES, goal, limit);
+	return alloc_bootmem_core(bdata, size, SMP_CACHE_BYTES, goal, 0);
 }
 #endif
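
As a rough illustration of how the goal in the patch relates to the addresses in the report, the sketch below reproduces the two lines `pfn = section_nr_to_pfn(section_nr); goal = pfn << PAGE_SHIFT;` in userspace C. The page shift (16) and section size (24 bits, i.e. 16 MiB) are assumptions picked only because they match the 16 MiB gap between the reported goal and limit; the thread does not state them. All sketch_* names are hypothetical stand-ins, not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants: not stated in the thread, chosen to fit the report. */
#define SKETCH_PAGE_SHIFT		16	/* assumption: 64 KiB pages */
#define SKETCH_SECTION_SIZE_BITS	24	/* assumption: 16 MiB sections */

/* Stand-in for section_nr_to_pfn(): first page frame of a section. */
static inline uint64_t sketch_section_nr_to_pfn(uint64_t section_nr)
{
	return section_nr << (SKETCH_SECTION_SIZE_BITS - SKETCH_PAGE_SHIFT);
}

/* Mirrors "goal = pfn << PAGE_SHIFT": physical start of the section. */
static inline uint64_t sketch_section_goal(uint64_t section_nr)
{
	/* guard against shifting the section number out of 64 bits */
	assert(section_nr <= (UINT64_MAX >> SKETCH_SECTION_SIZE_BITS));
	return sketch_section_nr_to_pfn(section_nr) << SKETCH_PAGE_SHIFT;
}

/* Physical start of the following section under the same assumptions. */
static inline uint64_t sketch_next_section_start(uint64_t section_nr)
{
	return sketch_section_goal(section_nr + 1);
}
```

Under these assumptions, section 0x7ffff starts at 0x7ffff000000 (the reported goal) and the next section would start at 0x80000000000 (the reported limit, which is also 8 TiB). That is consistent with the failing request targeting the last section of the 8TB pool, where nothing larger than one section fits below the limit.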