Message ID | 20120221010537.GA15898@dhcp-172-17-108-109.mtv.corp.google.com |
---|---|
State | Not Applicable |
Delegated to: | David Miller |
> Meelis, can you please apply the following patch before & after the
> offending commit, boot with "memblock=debug" added as kernel param and
> post the boot log? The patch will generate some offset warnings after
> the commit but should work fine.

Before the commit (v3.2-rc3-75-g0ee332c): memblock1.gz (attached)
After the commit (v3.2-rc3-76-g7bd0b0f): memblock2.gz (attached)

In addition, a third type of sparc machine breaks in a third way - V210
and V240 just hang after telling

console [tty0] enabled, bootconsole disabled

and before calibrating the delay loop. Bisect has led to the same commit.
On Mon, Feb 20, 2012 at 05:05:37PM -0800, Tejun Heo wrote:
> Hello,
>
> Meelis, can you please apply the following patch before & after the
> offending commit, boot with "memblock=debug" added as kernel param and
> post the boot log? The patch will generate some offset warnings after
> the commit but should work fine.
>
> Sam, David, as I'm not familiar with the code base, is it possible to
> tell which address is corrupted (zeroed, it seems)? ie. can we add
> "if (XXX == NULL) printk("%p is corrupted\n"...);" somewhere?

No idea - sorry.
I spend most of the time with sparc32 - which I do not even feel
familiar with yet :-(

One thing I noticed while working with memblock for sparc32 (*) is that
allocations are done top-down.
So we may end up allocating memory with a considerably higher address than
we are used to.
This is obviously just a wild guess...

Meelis - do the affected boxes have any special memory configurations?
Could you try to boot with a sensible mem=xxx value to see if limiting the
memory helps.

(*) I have re-done the original patch-set and I have a quite good feeling
about it. HIGHMEM support is outstanding - I got a bit confused when I
looked at x86. But my ss5 crashes the first time I try to use the allocated
memory - so I assume I have some silly issue somewhere. Nothing points at
memblock in this case.

	Sam
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
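To make the top-down remark concrete, here is a minimal sketch of how a top-down allocator picks the highest aligned base that fits in a free range. It is not the actual memblock code: the names `top_down_alloc` and `ALLOC_FAILED`, the 16-byte alignment and the example ranges are all assumptions made for illustration. The 0x90 size is borrowed from the panic message reported in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical failure sentinel for this sketch (not a kernel constant). */
#define ALLOC_FAILED UINT64_MAX

/*
 * Pick the highest base in [start, end) such that [base, base + size)
 * fits and base is a multiple of align (align must be a power of two).
 */
uint64_t top_down_alloc(uint64_t start, uint64_t end,
                        uint64_t size, uint64_t align)
{
    uint64_t base;

    if (size == 0 || align == 0 || (align & (align - 1)) != 0)
        return ALLOC_FAILED;
    if (end < start || end - start < size)
        return ALLOC_FAILED;

    /* Start from the top of the range and round down to the alignment. */
    base = (end - size) & ~(align - 1);
    if (base < start)
        return ALLOC_FAILED;
    return base;
}

/* With 2G of RAM the result sits just below 2G; with mem=256M just below 256M. */
void top_down_example(void)
{
    assert(top_down_alloc(0x0, 0x80000000ULL, 0x90, 0x10) == 0x7fffff70ULL);
    assert(top_down_alloc(0x0, 0x10000000ULL, 0x90, 0x10) == 0x0fffff70ULL);
}
```

Under these assumptions the returned address follows the top of the usable range, which is why limiting memory with mem=xxx would be expected to move early allocations to lower addresses.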
> Meelis - do the affected boxes have any special memory configurations?

Nothing special to me. E3500 has 2G, V100 has 1G, V210 and V240 have 2G
and 1.5G.

> Could you try to boot with a sensible mem=xxx value to see if limiting the memory
> helps.

Like mem=256M? Will try.
On Wed, Feb 22, 2012 at 07:12:06PM +0200, Meelis Roos wrote:
> > Meelis - do the affected boxes have any special memory configurations?
>
> Nothing special to me. E3500 has 2G, V100 has 1G, V210 and V240 have 2G
> and 1.5G.
>
> > Could you try to boot with a sensible mem=xxx value to see if limiting the memory
> > helps.
>
> Like mem=256M? Will try.

Think just a little more - I do not think this will help.
I confused myself with some of the sparc32 issues I have hit.

I have looked a little at the log files you included.
The only thing that looked different was that the faulty version had a
number after "@" which is higher than 1 - where the OK version always has 1.
This is "idx" in memblock_insert_region() - but I did not look closer.

	Sam
> > > Could you try to boot with a sensible mem=xxx value to see if limiting the memory
> > > helps.
> >
> > Like mem=256M? Will try.
>
> Think just a little more - I do not think this will help.

Tried it on the 2G V210. It changes the picture. With 2G RAM, it just
hangs. With mem=256M it produces a crash in strlen and of_alias_scan
like in V100 with 1G. mem=512M results in the same strlen error. mem=1G
results in a stranger error:

[ 0.000000] Kernel panic - not syncing: ERROR: Failed to allocate 0x90 bytes below 0x0.
[ 0.000000]
[ 0.000000] Call Trace:
[ 0.000000] [00000000007a6a28] memblock_alloc_base+0x28/0x38
[ 0.000000] [000000000079ca50] prom_early_alloc+0xc/0x60
[ 0.000000] [00000000007ae090] of_pdt_create_node.part.0+0x4/0xe0
[ 0.000000] [00000000007ae250] of_pdt_build_devicetree+0x30/0xa0
[ 0.000000] [000000000079c4a8] prom_build_devicetree+0x18/0x38
[ 0.000000] [00000000007a03c0] paging_init+0x59c/0x6bc
[ 0.000000] [000000000079be50] setup_arch+0xf8/0x108
[ 0.000000] [000000000079a4e8] start_kernel+0x78/0x30c
[ 0.000000] [00000000006a3e80] tlb_fixup_done+0x98/0xa0
[ 0.000000] [0000000000000000] (null)

The working machines have 512M RAM, 834M RAM and 2G RAM so it's not just
the amount of RAM.
On 22/02/2012 00:36, Meelis Roos wrote:
>> Meelis, can you please apply the following patch before & after the
>> offending commit, boot with "memblock=debug" added as kernel param and
>> post the boot log? The patch will generate some offset warnings after
>> the commit but should work fine.
>
> Before the commit (v3.2-rc3-75-g0ee332c): memblock1.gz (attached)
> After the commit (v3.2-rc3-76-g7bd0b0f): memblock2.gz (attached)
>

It's a long time since I regularly had to worry about SPARC boxes (not)
booting, so this may just be the difference between virtual & physical
addresses, but I notice that some of the addresses in the register dump
have non-zero values in the upper 32 bits but the memblock values have
zero in the upper half.

memblock reserved: ADD [0x0000007fcc0a40-0x0000007fcc0a4e] node 1
memblock reserved: add [0x0000007fcc0a40-000000007fcc0a4e] node 1 @767

But a similar address in the registers has fffff800 in there.

o4: fffff8007fcc0a4d

I know that there are a number of explanations why things would be
different (32 bit accesses etc) but it could explain things plus we would
be talking 64 bit addresses in the kernel.

Just a thought.

Richard

> In addition, a third type of sparc machine breaks in a third way - V210
> and V240 just hang after telling
>
> console [tty0] enabled, bootconsole disabled
>
> and before calibrating the delay loop. Bisect has led to the same commit.
>
From: Richard Mortimer <richm@oldelvet.org.uk>
Date: Wed, 22 Feb 2012 18:22:36 +0000

> memblock reserved: ADD [0x0000007fcc0a40-0x0000007fcc0a4e] node 1
> memblock reserved: add [0x0000007fcc0a40-000000007fcc0a4e] node 1 @767

These are physical addresses.

> But a similar address in the registers has fffff800 in there.
>
> o4: fffff8007fcc0a4d

All of physical memory is mapped linearly starting at 0xfffff80000000000
and this is such a virtual address.
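The relationship described above can be checked with simple arithmetic. The sketch below assumes only what the message states, namely that the linear mapping starts at 0xfffff80000000000. The macro and function names are hypothetical and are not the kernel's own helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Start of the linear mapping as given in the thread; the name is hypothetical. */
#define LINEAR_MAP_BASE 0xfffff80000000000ULL

/* Physical address -> address in the linear mapping. */
uint64_t phys_to_linear_virt(uint64_t phys)
{
    return LINEAR_MAP_BASE + phys;
}

/* Address in the linear mapping -> physical address. */
uint64_t linear_virt_to_phys(uint64_t virt)
{
    return virt - LINEAR_MAP_BASE;
}

/* The %o4 value from the register dump maps back into the reserved range. */
void linear_map_example(void)
{
    uint64_t phys = linear_virt_to_phys(0xfffff8007fcc0a4dULL);

    assert(phys == 0x7fcc0a4dULL);
    assert(phys >= 0x7fcc0a40ULL && phys <= 0x7fcc0a4eULL);
}
```

Under that assumption, the register value fffff8007fcc0a4d corresponds to physical address 0x7fcc0a4d, which lies inside the reserved range 0x7fcc0a40-0x7fcc0a4e from the memblock log.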
diff --git a/mm/memblock.c b/mm/memblock.c
index 1adbef0..dccfced 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -179,9 +179,15 @@ int __init_memblock memblock_reserve_reserved_regions(void)
 
 static void __init_memblock memblock_remove_region(struct memblock_type *type, unsigned long r)
 {
-	type->total_size -= type->regions[r].size;
-	memmove(&type->regions[r], &type->regions[r + 1],
-		(type->cnt - (r + 1)) * sizeof(type->regions[r]));
+	struct memblock_region *rgn = &type->regions[r];
+
+	memblock_dbg(" memblock %s: rm [%#016llx-%#016llx] node %d\n",
+		     memblock_type_name(type),
+		     (unsigned long long)rgn->base,
+		     (unsigned long long)rgn->base + rgn->size, rgn->nid);
+
+	type->total_size -= rgn->size;
+	memmove(rgn, rgn + 1, (type->cnt - (r + 1)) * sizeof(*rgn));
 	type->cnt--;
 
 	/* Special case for empty arrays */
@@ -317,6 +323,9 @@ static void __init_memblock memblock_insert_region(struct memblock_type *type,
 	memblock_set_region_node(rgn, nid);
 	type->cnt++;
 	type->total_size += size;
+	memblock_dbg(" memblock %s: add [%#016llx-%016llx] node %d @%d\n",
+		     memblock_type_name(type), (unsigned long long)base,
+		     (unsigned long long)base + size, nid, idx);
 }
 
 /**
@@ -342,6 +351,10 @@ static int __init_memblock memblock_add_region(struct memblock_type *type,
 	phys_addr_t end = base + memblock_cap_size(base, &size);
 	int i, nr_new;
 
+	memblock_dbg(" memblock %s: ADD [%#016llx-%#016llx] node %d\n",
+		     memblock_type_name(type), (unsigned long long)base,
+		     (unsigned long long)base + size, nid);
+
 	/* special case for empty array */
 	if (type->regions[0].size == 0) {
 		WARN_ON(type->cnt != 1 || type->total_size);
@@ -349,6 +362,8 @@ static int __init_memblock memblock_add_region(struct memblock_type *type,
 		type->regions[0].size = size;
 		memblock_set_region_node(&type->regions[0], nid);
 		type->total_size = size;
+		memblock_dbg(" memblock %s: add first entry\n",
+			     memblock_type_name(type));
 		return 0;
 	}
 repeat:
@@ -494,6 +509,10 @@ static int __init_memblock __memblock_remove(struct memblock_type *type,
 	int start_rgn, end_rgn;
 	int i, ret;
 
+	memblock_dbg(" memblock %s: RM [%#016llx-%016llx]\n",
+		     memblock_type_name(type), (unsigned long long)base,
+		     (unsigned long long)base + size);
+
 	ret = memblock_isolate_range(type, base, size, &start_rgn, &end_rgn);
 	if (ret)
 		return ret;
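
A note for anyone reading the resulting logs: the "add" and "RM" format strings in the patch use `%#016llx` for the start address but `%016llx` for the end, which would explain why the "add" line quoted in the thread has no "0x" on its end address. The sketch below is a userspace illustration only, with hypothetical helper names, showing what the two conversions produce for the values from that log line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* "%#016llx": the "0x" prefix counts toward the 16-character field width. */
void format_with_prefix(char *buf, size_t len, unsigned long long v)
{
    snprintf(buf, len, "%#016llx", v);
}

/* "%016llx": no prefix, so all 16 characters are hex digits. */
void format_without_prefix(char *buf, size_t len, unsigned long long v)
{
    snprintf(buf, len, "%016llx", v);
}

/* Reproduces the two halves of the "add" range seen in the boot log. */
void format_example(void)
{
    char start[32], end[32];

    format_with_prefix(start, sizeof(start), 0x7fcc0a40ULL);
    format_without_prefix(end, sizeof(end), 0x7fcc0a4eULL);
    assert(strcmp(start, "0x0000007fcc0a40") == 0);
    assert(strcmp(end, "000000007fcc0a4e") == 0);
}
```

Both strings are 16 characters wide, so the two forms line up in the log even though only one carries the prefix.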