Message ID | 200901081559.22204.chandru@in.ibm.com (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
On Thu, 2009-01-08 at 15:59 +0530, Chandru wrote: > @@ -898,9 +899,17 @@ static void mark_reserved_regions_for_ni > * if reserved region extends past active region > * then trim size to active region > */ > - if (end_pfn > node_ar.end_pfn) > + if (end_pfn > node_ar.end_pfn) { > reserve_size = (node_ar.end_pfn << PAGE_SHIFT) > - (start_pfn << PAGE_SHIFT); > + /* > + * resize it further if the reservation could > + * cross the last page in this node > + */ > + if (PFN_UP(physbase+reserve_size) > > + node_end_pfn) > + reserve_size -= PAGE_SIZE; > + } > /* Now I'm even more confused. Could you please send a fully changelogged patch that describes the problem, and how this fixes it? This just seems like an off-by-one error, which isn't what I thought we had before at all. I'm also horribly confused why PFN_UP is needed here. Is 'physbase' not page aligned? reserve_size looks like it *has* to be. 'end_pfn' is always (as far as I have ever seen in the kernel) the pfn of the page after the area we are interested in and we treat it as such in that function. In the case of an unaligned physbase, that wouldn't be true. Think of the case where we have a 1-byte reservation. start_pfn will equal end_pfn and we won't go into that while loop at *all* and won't reserve anything. Does 'end_pfn' need fixing? -- Dave
On Friday 09 January 2009 01:33:12 Dave Hansen wrote: > Now I'm even more confused. Could you please send a fully changelogged > patch that describes the problem, and how this fixes it? This just > seems like an off-by-one error, which isn't what I thought we had before > at all. > > I'm also horribly confused why PFN_UP is needed here. Is 'physbase' not > page aligned? reserve_size looks like it *has* to be. 'end_pfn' is > always (as far as I have ever seen in the kernel) the pfn of the page > after the area we are interested in and we treat it as such in that > function. In the case of an unaligned physbase, that wouldn't be true. > > Think of the case where we have a 1-byte reservation. start_pfn will > equal end_pfn and we won't go into that while loop at *all* and won't > reserve anything. > > Does 'end_pfn' need fixing? > Attached is the console log with debug command line parameters enabled and with couple of more debug statements added to the code. Please take a look at it. thanks, Chandru Using 0078209e bytes for initrd buffer Please wait, loading kernel... Allocated 00d00000 bytes for kernel @ 02d00000 Elf64 kernel loaded... Loading ramdisk... ramdisk loaded 0078209e @ 00d00000 OF stdout device is: /vdevice/vty@30000000 Hypertas detected, assuming LPAR ! command line: root=/dev/disk/by-id/scsi-35000cca0030c7be2-part3 xmon=on crashkernel=256M@32M debug bootmem_debug numa=debug memory layout at init: alloc_bottom : 0000000003880000 alloc_top : 0000000010000000 alloc_top_hi : 0000000100000000 rmo_top : 0000000010000000 ram_top : 0000000100000000 Looking for displays instantiating rtas at 0x000000000f4e0000 ... done boot cpu hw idx 0000000000000000 copying OF device tree ... Building dt strings... Building dt structure... Device tree strings 0x0000000003a90000 -> 0x0000000003a917bc Device tree struct 0x0000000003aa0000 -> 0x0000000003ac0000 Calling quiesce ... returning from prom_init Reserving 256MB of memory at 32MB for crashkernel (System RAM: 4096MB) Phyp-dump disabled at boot time Using pSeries machine description Page orders: linear mapping = 24, virtual = 16, io = 12 Using 1TB segments Found initrd at 0xc000000000d00000:0xc00000000148209e console [udbg0] enabled Partition configured for 4 cpus. CPU maps initialized for 2 threads per core (thread shift is 1) Starting Linux PPC64 #7 SMP Fri Jan 9 04:50:05 CST 2009 ----------------------------------------------------- ppc64_pft_size = 0x1a physicalMemorySize = 0x100000000 htab_hash_mask = 0x7ffff ----------------------------------------------------- Initializing cgroup subsys cpuset Initializing cgroup subsys cpu Linux version 2.6.28-1-ppc64 (root@rulerlp10) (gcc version 4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux) ) #7 SMP Fri Jan 9 04:50:05 CST 2009 [boot]0012 Setup Arch NUMA associativity depth for CPU/Memory: 3 Node 0 Memory: Node 5 Memory: 0x0-0x10000000 Node 7 Memory: 0x10000000-0x100000000 adding cpu 0 to node 0 node 0 NODE_DATA() = c0000000fffeda80 node 5 NODE_DATA() = c000000001f78800 start_paddr = 0 end_paddr = 10000000 bootmap_paddr = 1f60000 bootmem::init_bootmem_core nid=5 start=0 map=1f6 end=1000 mapsize=200 bootmem::mark_bootmem_node nid=5 start=0 end=1000 reserve=0 flags=0 bdata->node_min_pfn=0x0 bdata->node_low_pfn=0x1000 bootmem::__free nid=5 start=0 end=1000 reserve_bootmem : physbase :0x0 size:0xb70000 reserve_size=0xb70000 nid=5 start_pfn:0x0 end_pfn:0xb7 node_ar.start_pfn:0x0 node_ar.end_pfn:0x1000 node->node_start_pfn:0x0 node->node_spanned_pages:4096 bootmem::mark_bootmem_node nid=5 start=0 end=b7 reserve=1 flags=0 bdata->node_min_pfn=0x0 bdata->node_low_pfn=0x1000 bootmem::__reserve nid=5 start=0 end=b7 flags=0 reserve_bootmem : physbase :0xd00000 size:0x78209e reserve_size=0x78209e nid=5 start_pfn:0xd0 end_pfn:0x148 node_ar.start_pfn:0x0 node_ar.end_pfn:0x1000 node->node_start_pfn:0x0 node->node_spanned_pages:4096 bootmem::mark_bootmem_node nid=5 start=d0 end=149 reserve=1 flags=0 bdata->node_min_pfn=0x0 bdata->node_low_pfn=0x1000 bootmem::__reserve nid=5 start=d0 end=149 flags=0 reserve_bootmem : physbase :0xd00000 size:0x790000 reserve_size=0x790000 nid=5 start_pfn:0xd0 end_pfn:0x149 node_ar.start_pfn:0x0 node_ar.end_pfn:0x1000 node->node_start_pfn:0x0 node->node_spanned_pages:4096 bootmem::mark_bootmem_node nid=5 start=d0 end=149 reserve=1 flags=0 bdata->node_min_pfn=0x0 bdata->node_low_pfn=0x1000 bootmem::__reserve nid=5 start=d0 end=149 flags=0 bootmem::__reserve silent double reserve of PFN d0 bootmem::__reserve silent double reserve of PFN d1 ... ... ... bootmem::__reserve silent double reserve of PFN 147 bootmem::__reserve silent double reserve of PFN 148 reserve_bootmem : physbase :0x1f60000 size:0x10000 reserve_size=0x10000 nid=5 start_pfn:0x1f6 end_pfn:0x1f7 node_ar.start_pfn:0x0 node_ar.end_pfn:0x1000 node->node_start_pfn:0x0 node->node_spanned_pages:4096 bootmem::mark_bootmem_node nid=5 start=1f6 end=1f7 reserve=1 flags=0 bdata->node_min_pfn=0x0 bdata->node_low_pfn=0x1000 bootmem::__reserve nid=5 start=1f6 end=1f7 flags=0 reserve_bootmem : physbase :0x1f78800 size:0x10087800 reserve_size=0xe090000 nid=5 start_pfn:0x1f7 end_pfn:0x1200 node_ar.start_pfn:0x0 node_ar.end_pfn:0x1000 node->node_start_pfn:0x0 node->node_spanned_pages:4096 bootmem::mark_bootmem_node nid=5 start=1f7 end=1001 reserve=1 flags=0 bdata->node_min_pfn=0x0 bdata->node_low_pfn=0x1000 ------------[ cut here ]------------ kernel BUG at mm/bootmem.c:279! cpu 0x0: Vector: 700 (Program Check) at [c0000000008a39d0] pc: c000000000676ec0: .mark_bootmem_node+0xb4/0x12c lr: c000000000676e98: .mark_bootmem_node+0x8c/0x12c sp: c0000000008a3c50 msr: 8000000000021032 current = 0xc0000000007fa820 paca = 0xc000000000962c00 pid = 0, comm = swapper kernel BUG at mm/bootmem.c:279! enter ? for help [c0000000008a3d00] c000000000661d3c .do_init_bootmem+0xc3c/0xd78 [c0000000008a3e40] c0000000006567f0 .setup_arch+0x1a4/0x21c [c0000000008a3ee0] c000000000650868 .start_kernel+0xdc/0x56c [c0000000008a3f90] c0000000000083b8 .start_here_common+0x1c/0x64 0:mon>
On Friday 09 January 2009 16:37:24 Chandru wrote: > On Friday 09 January 2009 01:33:12 Dave Hansen wrote: > > Now I'm even more confused. Could you please send a fully changelogged > > patch that describes the problem, and how this fixes it? This just > > seems like an off-by-one error, which isn't what I thought we had before > > at all. > > > > I'm also horribly confused why PFN_UP is needed here. Is 'physbase' not > > page aligned? reserve_size looks like it *has* to be. 'end_pfn' is > > always (as far as I have ever seen in the kernel) the pfn of the page > > after the area we are interested in and we treat it as such in that > > function. In the case of an unaligned physbase, that wouldn't be true. > > > > Think of the case where we have a 1-byte reservation. start_pfn will > > equal end_pfn and we won't go into that while loop at *all* and won't > > reserve anything. > > > > Does 'end_pfn' need fixing? > > > > Attached is the console log with debug command line parameters enabled and > with couple of more debug statements added to the code. Please take a look at it. > > thanks, > Chandru > Hello Dave, From the debug console output, if there is anything you can add here, pls let me know. thanks
On Thursday 15 January 2009 13:35:27 Chandru wrote: > Hello Dave, From the debug console output, if there is anything you can add > here, pls let me know. As we can see from the console output here, physbase isn't page aligned when the panic occurs. So we could as well send (start_pfn << PAGE_SHIFT) to reserve_bootmem_node() instead of physbase. your thoughts ?. Also end_pfn in mark_reserved_region_for_nid() is defined as unsigned long end_pfn = ((physbase + size) >> PAGE_SHIFT); Does this refer to the pfn after the area that we are interested in ?. We have atleast two fixes here, 1. Limit start and end to bdata->node_min_pfn and bdata->node_low_pfn in reserve_bootmem_node() and add comments out in there that the caller of the funtion should be aware of how much are they reserving. 2. send (start_pfn << PAGE_SHIFT) to reserve_bootmem_node() instead of physbase. Chandru
On Fri, 2009-01-16 at 17:46 +0530, Chandru wrote: > On Thursday 15 January 2009 13:35:27 Chandru wrote: > > Hello Dave, From the debug console output, if there is anything you can add > > here, pls let me know. > > As we can see from the console output here, physbase isn't page aligned when > the panic occurs. So we could as well send (start_pfn << PAGE_SHIFT) to > reserve_bootmem_node() instead of physbase. your thoughts ?. > > Also end_pfn in mark_reserved_region_for_nid() is defined as > > unsigned long end_pfn = ((physbase + size) >> PAGE_SHIFT); > > Does this refer to the pfn after the area that we are interested in ?. We have > atleast two fixes here, > 1. Limit start and end to bdata->node_min_pfn and bdata->node_low_pfn in > reserve_bootmem_node() and add comments out in there that the caller of the > funtion should be aware of how much are they reserving. > 2. send (start_pfn << PAGE_SHIFT) to reserve_bootmem_node() instead of > physbase. Just looking at it, that calculation is OK. But, there was one in your dmesg that looked a page too long, like page 0x1001 instead of 0x1000. I'd find out how that happened. -- Dave
On Friday 16 January 2009 23:22:57 Dave Hansen wrote: > Just looking at it, that calculation is OK. But, there was one in your > dmesg that looked a page too long, like page 0x1001 instead of 0x1000. > I'd find out how that happened. That is a result of PFN_UP() in reserve_bootmem_node() for which we hit the BUG_ON() eventually. Prior to calling reserve_bootmem_node() we have... node_ar.end_pfn = node->node_end_pfn = PFN_DOWN(physbase+reserve_size). Hence a PFN_UP() will raise the value of 'end'. The kernel has CONFIG_PPC_64K_PAGES enabled in the config. Chandru
--- linux-2.6.28/arch/powerpc/mm/numa.c.orig 2009-01-08 03:20:41.000000000 -0600 +++ linux-2.6.28/arch/powerpc/mm/numa.c 2009-01-08 03:50:41.000000000 -0600 @@ -16,6 +16,7 @@ #include <linux/module.h> #include <linux/nodemask.h> #include <linux/cpu.h> +#include <linux/pfn.h> #include <linux/notifier.h> #include <linux/lmb.h> #include <linux/of.h> @@ -898,9 +899,17 @@ static void mark_reserved_regions_for_ni * if reserved region extends past active region * then trim size to active region */ - if (end_pfn > node_ar.end_pfn) + if (end_pfn > node_ar.end_pfn) { reserve_size = (node_ar.end_pfn << PAGE_SHIFT) - (start_pfn << PAGE_SHIFT); + /* + * resize it further if the reservation could + * cross the last page in this node + */ + if (PFN_UP(physbase+reserve_size) > + node_end_pfn) + reserve_size -= PAGE_SIZE; + } /* * Only worry about *this* node, others may not * yet have valid NODE_DATA().