Patchwork HOLES_IN_ZONE...

Submitter KAMEZAWA Hiroyuki
Date Feb. 5, 2009, 7:43 a.m.
Message ID <20090205164330.83777a4d.kamezawa.hiroyu@jp.fujitsu.com>
Permalink /patch/22068/
State RFC
Delegated to: David Miller

Comments

KAMEZAWA Hiroyuki - Feb. 5, 2009, 7:43 a.m.
On Wed, 04 Feb 2009 22:26:51 -0800 (PST)
David Miller <davem@davemloft.net> wrote:

> 
> So I've been fighting mysterious crashes on my main sparc64 devel
> machine.  What's happening is that the assertion in
> mm/page_alloc.c:move_freepages() is triggering:
> 
> 	BUG_ON(page_zone(start_page) != page_zone(end_page));
> 
> Once I knew this is what was happening, I added some annotations:
> 
> 	if (unlikely(page_zone(start_page) != page_zone(end_page))) {
> 		printk(KERN_ERR "move_freepages: Bogus zones: "
> 		       "start_page[%p] end_page[%p] zone[%p]\n",
> 		       start_page, end_page, zone);
> 		printk(KERN_ERR "move_freepages: "
> 		       "start_zone[%p] end_zone[%p]\n",
> 		       page_zone(start_page), page_zone(end_page));
> 		printk(KERN_ERR "move_freepages: "
> 		       "start_pfn[0x%lx] end_pfn[0x%lx]\n",
> 		       page_to_pfn(start_page), page_to_pfn(end_page));
> 		printk(KERN_ERR "move_freepages: "
> 		       "start_nid[%d] end_nid[%d]\n",
> 		       page_to_nid(start_page), page_to_nid(end_page));
>  ...
> 
> And here's what I got:
> 
> 	move_freepages: Bogus zones: start_page[2207d0000] end_page[2207dffc0] zone[fffff8103effcb00]
> 	move_freepages: start_zone[fffff8103effcb00] end_zone[fffff8003fffeb00]
> 	move_freepages: start_pfn[0x81f600] end_pfn[0x81f7ff]
> 	move_freepages: start_nid[1] end_nid[0]
> 
> My memory layout on this box is:
> 
> [    0.000000] Zone PFN ranges:
> [    0.000000]   Normal   0x00000000 -> 0x0081ff5d
> [    0.000000] Movable zone start PFN for each node
> [    0.000000] early_node_map[8] active PFN ranges
> [    0.000000]     0: 0x00000000 -> 0x00020000
> [    0.000000]     1: 0x00800000 -> 0x0081f7ff
> [    0.000000]     1: 0x0081f800 -> 0x0081fe50
> [    0.000000]     1: 0x0081fed1 -> 0x0081fed8
> [    0.000000]     1: 0x0081feda -> 0x0081fedb
> [    0.000000]     1: 0x0081fedd -> 0x0081fee5
> [    0.000000]     1: 0x0081fee7 -> 0x0081ff51
> [    0.000000]     1: 0x0081ff59 -> 0x0081ff5d
> 

Ah, end_pfn is not a valid page, and its page->flags shows nid 0.
It seems the memmap for end_pfn is not initialized correctly.

First, a few things around here are complicated:

1. pfn_valid() only means "there is a memmap", not "the memory is valid".
2. If "memory is invalid" && it has a memmap, the page should be marked PG_reserved,
   and it will never be put into the buddy allocator.
3. The memmap for non-existent memory can be initialized, but that depends on
   zone->spanned_pages. (See free_area_init_core().)
4. What CONFIG_HOLES_IN_ZONE means is
   "there can be an invalid memmap within the contiguous range of zone->mem_map".
   This comes from VIRTUAL_MEMMAP.
   On usual archs, mem_map is guaranteed to always be contiguous.




> 	move_freepages: start_zone[fffff8103effcb00] end_zone[fffff8003fffeb00]
> 	move_freepages: start_pfn[0x81f600] end_pfn[0x81f7ff]
> 	move_freepages: start_nid[1] end_nid[0]
> [    0.000000]     0: 0x00000000 -> 0x00020000
> [    0.000000]     1: 0x00800000 -> 0x0081f7ff


I think it's strange that end_pfn's nid is 0.

From this log, the mem_map for end_pfn exists (that is, pfn_valid(end_pfn) == true).
So it should be initialized correctly, and it should have nid 1 if it was initialized.

Maybe Node 1's zone->zone_start_pfn and zone->spanned_pages cover 0x81f7ff, and the
zone's range is 0x00800000 -> 0x0081ff5d.

But this check in memmap_init_zone()
==
                if (context == MEMMAP_EARLY) {
                        if (!early_pfn_valid(pfn))
                                continue;
                        if (!early_pfn_in_nid(pfn, nid))
                                continue;
                }
==
allows the init of the mem_map entry for 0x81f7ff to be skipped.
*AND* SetPageReserved() is never called. I think this is a problem.

> It takes a lot of stressing to get that specific chunk of pages to
> attempt to be freed up in a group like that :-/
> 
> As a suggestion, it would have been a lot more pleasant if the code
> validated this requirement (in the !HOLES_IN_ZONE case) at boot time
> instead of after 2 hours of stress testing :-(
> 

Does this patch help? (More careful study may be necessary.)
---
 mm/page_alloc.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)






Patch

Index: mmotm-2.6.29-Feb03/mm/page_alloc.c
===================================================================
--- mmotm-2.6.29-Feb03.orig/mm/page_alloc.c
+++ mmotm-2.6.29-Feb03/mm/page_alloc.c
@@ -2618,6 +2618,7 @@  void __meminit memmap_init_zone(unsigned
 	unsigned long end_pfn = start_pfn + size;
 	unsigned long pfn;
 	struct zone *z;
+	int tmp;
 
 	if (highest_memmap_pfn < end_pfn - 1)
 		highest_memmap_pfn = end_pfn - 1;
@@ -2632,7 +2633,8 @@  void __meminit memmap_init_zone(unsigned
 		if (context == MEMMAP_EARLY) {
 			if (!early_pfn_valid(pfn))
 				continue;
-			if (!early_pfn_in_nid(pfn, nid))
+			tmp = early_pfn_to_nid(pfn);
+			if (tmp > -1 && tmp != nid)
 				continue;
 		}
 		page = pfn_to_page(pfn);
@@ -2999,8 +3001,9 @@  int __meminit early_pfn_to_nid(unsigned 
 			return early_node_map[i].nid;
 	}
 
-	return 0;
+	return -1;
 }
+
 #endif /* CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID */
 
 /* Basic iterator support to walk early_node_map[] */