| Submitter | Tejun Heo |
|---|---|
| Date | Feb. 21, 2012, 1:05 a.m. |
| Message ID | <20120221010537.GA15898@dhcp-172-17-108-109.mtv.corp.google.com> |
| Download | mbox | patch |
| Permalink | /patch/142215/ |
| State | Not Applicable |
| Delegated to: | David Miller |
| Headers | show |
Comments
> Meelis, can you please apply the following patch before & after the > offending commit, boot with "memblock=debug" added as kernel param and > post the boot log? The patch will generate some offset warnings after > the commit but should work fine. Before the commit (v3.2-rc3-75-g0ee332c): memblock1.gz (attached) After the commit (v3.2-rc3-76-g7bd0b0f): memblock2.gz (attached) In addition, a third type of sparc machines breaks in a third way - V210 and V240 just hang after telling console [tty0] enabled, bootconsole disabled and before calibrating the delay loop. Bisect has led to the same commit.
On Mon, Feb 20, 2012 at 05:05:37PM -0800, Tejun Heo wrote: > Hello, > > Meelis, can you please apply the following patch before & after the > offending commit, boot with "memblock=debug" added as kernel param and > post the boot log? The patch will generate some offset warnings after > the commit but should work fine. > > Sam, David, as I'm not familiar with the code base, is it possible to > tell which address is corrupted (zeroed, it seems)? ie. can we add > "if (XXX == NULL) printk("%p is corrputed\n"...);" somewhere? No idea - sorry. I spend most of the time with sparc32 - which I do not even feel familiar with yet :-( One thing I noticed while working with memblock for sparc32 (*) is that allocations are done top-down. So we may end up allocatng memory with a considerably higher address than we are used to. This is obviously just a wild guess... Meelis - do the affected boxes have any special memory configurations? Could you try to boot with a sensible mem=xxx value to see if limiting the memory helps. (*) I have re-done the original patch-set and I have a quite good feeling about it. HIGHMEM support is outstanding - I got a bit confused when I looked at x86. But my ss5 crashes the first time I try to use the allocated memory - so I assume I have some silly issue somewhere. Nothing points at memblock in this case. Sam -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> Meelis - do the affected boxes have any special memory configurations? Nothin special to me. E3500 has 2G, V100 has 1G, V210 and V240 have 2G and 1.5G. > Could you try to boot with a sensible mem=xxx value to see if limiting the memory > helps. Like mem=256M? Will try.
On Wed, Feb 22, 2012 at 07:12:06PM +0200, Meelis Roos wrote: > > Meelis - do the affected boxes have any special memory configurations? > > Nothin special to me. E3500 has 2G, V100 has 1G, V210 and V240 have 2G > and 1.5G. > > > Could you try to boot with a sensible mem=xxx value to see if limiting the memory > > helps. > > Like mem=256M? Will try. Think just a little more - I do not think this will help. I confused myself with some of the sparc32 issues I have hit. I have looked a little at the log files you included. The only thing that looked different was that the faulty version had a number after "@" which is higher than 1 - where the OK always have 1. This is "idx" in memblock_insert_region() - but I did not look closer. Sam -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> > > Could you try to boot with a sensible mem=xxx value to see if limiting the memory > > > helps. > > > > Like mem=256M? Will try. > Think just a little more - I do not think this will help. Tried it on the 2G V210. It changes the picture. With 2G RAM, it just hangs. With mem=256M it produces a crash in strlen and of_alias_scan like in V100 with 1G. mem=512M results in the same strlen error. mem=1G results in a stranger error: [ 0.000000] Kernel panic - not syncing: ERROR: Failed to allocate 0x90 bytes below 0x0. [ 0.000000] [ 0.000000] Call Trace: [ 0.000000] [00000000007a6a28] memblock_alloc_base+0x28/0x38 [ 0.000000] [000000000079ca50] prom_early_alloc+0xc/0x60 [ 0.000000] [00000000007ae090] of_pdt_create_node.part.0+0x4/0xe0 [ 0.000000] [00000000007ae250] of_pdt_build_devicetree+0x30/0xa0 [ 0.000000] [000000000079c4a8] prom_build_devicetree+0x18/0x38 [ 0.000000] [00000000007a03c0] paging_init+0x59c/0x6bc [ 0.000000] [000000000079be50] setup_arch+0xf8/0x108 [ 0.000000] [000000000079a4e8] start_kernel+0x78/0x30c [ 0.000000] [00000000006a3e80] tlb_fixup_done+0x98/0xa0 [ 0.000000] [0000000000000000] (null) The working machines have 512M RAM, 834M RAM and 2G RAM so it's not just the amount of RAM.
On 22/02/2012 00:36, Meelis Roos wrote: >> Meelis, can you please apply the following patch before& after the >> offending commit, boot with "memblock=debug" added as kernel param and >> post the boot log? The patch will generate some offset warnings after >> the commit but should work fine. > > Before the commit (v3.2-rc3-75-g0ee332c): memblock1.gz (attached) > After the commit (v3.2-rc3-76-g7bd0b0f): memblock2.gz (attached) > Its a long time since I regularly had to worry about SPARC boxes (not) booting so may be the difference between virtual & physical addresses but I notice that some of the addresses in the register dump have non-zero values in the upper 32 bits but the memblock values have zero in the upper half. memblock reserved: ADD [0x0000007fcc0a40-0x0000007fcc0a4e] node 1 memblock reserved: add [0x0000007fcc0a40-000000007fcc0a4e] node 1 @767 But a similar address in the registers has fffff800 in there. o4: fffff8007fcc0a4d I know that there are a number of explanations why things would be different (32 bit acesses etc) but it could explain things plus we would be talking 64 bit addresses in the kernel. Just a thought. Richard > In addition, a third type of sparc machines breaks in a third way - V210 > and V240 just hang after telling > > console [tty0] enabled, bootconsole disabled > > and before calibrating the delay loop. Bisect has led to the same commit. > -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Richard Mortimer <richm@oldelvet.org.uk> Date: Wed, 22 Feb 2012 18:22:36 +0000 > memblock reserved: ADD [0x0000007fcc0a40-0x0000007fcc0a4e] node 1 > memblock reserved: add [0x0000007fcc0a40-000000007fcc0a4e] node 1 @767 These are physical addresses. > But a similar address in the registers has fffff800 in there. > > o4: fffff8007fcc0a4d All of physical memory is mapped linearly starting at 0xfffff80000000000 and this is such a virtual address. -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Patch
diff --git a/mm/memblock.c b/mm/memblock.c index 1adbef0..dccfced 100644 --- a/mm/memblock.c +++ b/mm/memblock.c @@ -179,9 +179,15 @@ int __init_memblock memblock_reserve_reserved_regions(void) static void __init_memblock memblock_remove_region(struct memblock_type *type, unsigned long r) { - type->total_size -= type->regions[r].size; - memmove(&type->regions[r], &type->regions[r + 1], - (type->cnt - (r + 1)) * sizeof(type->regions[r])); + struct memblock_region *rgn = &type->regions[r]; + + memblock_dbg(" memblock %s: rm [%#016llx-%#016llx] node %d\n", + memblock_type_name(type), + (unsigned long long)rgn->base, + (unsigned long long)rgn->base + rgn->size, rgn->nid); + + type->total_size -= rgn->size; + memmove(rgn, rgn + 1, (type->cnt - (r + 1)) * sizeof(*rgn)); type->cnt--; /* Special case for empty arrays */ @@ -317,6 +323,9 @@ static void __init_memblock memblock_insert_region(struct memblock_type *type, memblock_set_region_node(rgn, nid); type->cnt++; type->total_size += size; + memblock_dbg(" memblock %s: add [%#016llx-%016llx] node %d @%d\n", + memblock_type_name(type), (unsigned long long)base, + (unsigned long long)base + size, nid, idx); } /** @@ -342,6 +351,10 @@ static int __init_memblock memblock_add_region(struct memblock_type *type, phys_addr_t end = base + memblock_cap_size(base, &size); int i, nr_new; + memblock_dbg(" memblock %s: ADD [%#016llx-%#016llx] node %d\n", + memblock_type_name(type), (unsigned long long)base, + (unsigned long long)base + size, nid); + /* special case for empty array */ if (type->regions[0].size == 0) { WARN_ON(type->cnt != 1 || type->total_size); @@ -349,6 +362,8 @@ static int __init_memblock memblock_add_region(struct memblock_type *type, type->regions[0].size = size; memblock_set_region_node(&type->regions[0], nid); type->total_size = size; + memblock_dbg(" memblock %s: add first entry\n", + memblock_type_name(type)); return 0; } repeat: @@ -494,6 +509,10 @@ static int __init_memblock __memblock_remove(struct memblock_type *type, int start_rgn, end_rgn; int i, ret; + memblock_dbg(" memblock %s: RM [%#016llx-%016llx]\n", + memblock_type_name(type), (unsigned long long)base, + (unsigned long long)base + size); + ret = memblock_isolate_range(type, base, size, &start_rgn, &end_rgn); if (ret) return ret;
Hello, Meelis, can you please apply the following patch before & after the offending commit, boot with "memblock=debug" added as kernel param and post the boot log? The patch will generate some offset warnings after the commit but should work fine. Sam, David, as I'm not familiar with the code base, is it possible to tell which address is corrupted (zeroed, it seems)? ie. can we add "if (XXX == NULL) printk("%p is corrputed\n"...);" somewhere? Thanks. -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html