Patchwork OF-related boot crash in 3.3.0-rc3-00188-g3ec1e88

login
register
mail settings
Submitter Tejun Heo
Date Feb. 22, 2012, 5:48 p.m.
Message ID <20120222174825.GA32694@google.com>
Download mbox | patch
Permalink /patch/142513/
State Not Applicable
Delegated to: David Miller
Headers show

Comments

Tejun Heo - Feb. 22, 2012, 5:48 p.m.
On Wed, Feb 22, 2012 at 02:36:13AM +0200, Meelis Roos wrote:
> > Meelis, can you please apply the following patch before & after the
> > offending commit, boot with "memblock=debug" added as kernel param and
> > post the boot log?  The patch will generate some offset warnings after
> > the commit but should work fine.
> 
> Before the commit (v3.2-rc3-75-g0ee332c): memblock1.gz (attached)
> After the commit (v3.2-rc3-76-g7bd0b0f): memblock2.gz (attached)

Can you please try the following patch?  If it still fails to boot,
please attach the failing log.  Thank you.

--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Meelis Roos - Feb. 22, 2012, 6:25 p.m.
> Can you please try the following patch?  If it still fails to boot,
> please attach the failing log.  Thank you.

It works on E3500! Will try other machines tomorrow.
David Miller - Feb. 22, 2012, 8:44 p.m.
From: Tejun Heo <tj@kernel.org>
Date: Wed, 22 Feb 2012 09:48:25 -0800

> On Wed, Feb 22, 2012 at 02:36:13AM +0200, Meelis Roos wrote:
>> > Meelis, can you please apply the following patch before & after the
>> > offending commit, boot with "memblock=debug" added as kernel param and
>> > post the boot log?  The patch will generate some offset warnings after
>> > the commit but should work fine.
>> 
>> Before the commit (v3.2-rc3-75-g0ee332c): memblock1.gz (attached)
>> After the commit (v3.2-rc3-76-g7bd0b0f): memblock2.gz (attached)
> 
> Can you please try the following patch?  If it still fails to boot,
> please attach the failing log.  Thank you.

Interesting, but two things strike me.

First, this seems like it would only cause problems if the caller
specified a too small size parameter, and then wrote past the 'size'
bytes of the buffer.  And if so, this means we have an improperly
sized allocation somewhere, probably in the OF tree fetching code.

For example, maybe we mis-calculate the size of an OF device node
property before we fetch it from the firmware, therefore allocate
too small a buffer, and the property fetch operation splats all
over the end of the buffer.  Another possibility is that the
property length reported by the firmware is wrong and too small.

BTW, this kind of bug would be easy to catch, simply put a magic
number signature into all unallocated memblock memory then at
allocation time check that signature.  If we signal an error when we
don't see the proper signature and turn on the OF tree building
logging, we can see exactly which operation writes past the end of a
buffer.

Second, you'd need similar handling in other call chains such as
memblock_double_array()'s invocation of memblock_find_in_range().
It seems a bad idea to hide how size is modified, so probably it's
best to pass the address of the size parameter and modify the
caller's value in that way so that the size used in the reserve
matches up.
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tejun Heo - Feb. 22, 2012, 9 p.m.
Hello, David.

On Wed, Feb 22, 2012 at 03:44:17PM -0500, David Miller wrote:
> > Can you please try the following patch?  If it still fails to boot,
> > please attach the failing log.  Thank you.
> 
> Interesting, but two things strike me.
> 
> First, this seems like it would only cause problems if the caller
> specified a too small size parameter, and then wrote past the 'size'
> bytes of the buffer.  And if so, this means we have an improperly
> sized allocation somewhere, probably in the OF tree fetching code.

There's another, less likely, possibility.  It made the allocation
table much larger and the lowest address used ended up lower.
0x0000007fc8fa40 vs 0x0000007fc94000.  Not too much of difference and
just allocating some more memory should rule out or confirm it.

> For example, maybe we mis-calculate the size of an OF device node
> property before we fetch it from the firmware, therefore allocate
> too small a buffer, and the property fetch operation splats all
> over the end of the buffer.  Another possibility is that the
> property length reported by the firmware is wrong and too small.
> 
> BTW, this kind of bug would be easy to catch, simply put a magic
> number signature into all unallocated memblock memory then at
> allocation time check that signature.  If we signal an error when we
> don't see the proper signature and turn on the OF tree building
> logging, we can see exactly which operation writes past the end of a
> buffer.

Yeah, redzonning can definitely help but I'm not sure whether we want
to go full on allocation debugging and all for early allocator.  The
thing doesn't even support freeing.

> Second, you'd need similar handling in other call chains such as
> memblock_double_array()'s invocation of memblock_find_in_range().
> It seems a bad idea to hide how size is modified, so probably it's
> best to pass the address of the size parameter and modify the
> caller's value in that way so that the size used in the reserve
> matches up.

I suspect the size modification was added later to avoid expanding
allocation table early during boot and we can do that only for
memblock_alloc*() calls as they don't have matching free interface.
If we modify explicit reservations, we have to propagate the modified
size to each user and so on.  Given that the allocation table is
discarded after boot completion and there aren't too many explicit
reservations, I don't think we need to expand size aligning to all
find_in_range users.  I guess it all depends on how complete allocator
we want for early boot.

Thanks.
Tejun Heo - Feb. 23, 2012, 6:55 p.m.
Hello,

On Wed, Feb 22, 2012 at 08:25:32PM +0200, Meelis Roos wrote:
> > Can you please try the following patch?  If it still fails to boot,
> > please attach the failing log.  Thank you.
> 
> It works on E3500! Will try other machines tomorrow.

Once confirmed, I'll push the patch through tip.  It just hides the
underlying problem but we should be in no worse shape than before,
it's two line change so reproduing the problem again for proper
diagnosing isn't difficult, and we're getting a bit late in release
cycle already.

Thanks.
David Miller - Feb. 23, 2012, 11:31 p.m.
From: Tejun Heo <tj@kernel.org>
Date: Thu, 23 Feb 2012 10:55:03 -0800

> Hello,
> 
> On Wed, Feb 22, 2012 at 08:25:32PM +0200, Meelis Roos wrote:
>> > Can you please try the following patch?  If it still fails to boot,
>> > please attach the failing log.  Thank you.
>> 
>> It works on E3500! Will try other machines tomorrow.
> 
> Once confirmed, I'll push the patch through tip.  It just hides the
> underlying problem but we should be in no worse shape than before,
> it's two line change so reproduing the problem again for proper
> diagnosing isn't difficult, and we're getting a bit late in release
> cycle already.

Ok.
--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Meelis Roos - Feb. 24, 2012, 9:20 a.m.
> > > Can you please try the following patch?  If it still fails to boot,
> > > please attach the failing log.  Thank you.
> > 
> > It works on E3500! Will try other machines tomorrow.
> 
> Once confirmed, I'll push the patch through tip.  It just hides the
> underlying problem but we should be in no worse shape than before,
> it's two line change so reproduing the problem again for proper
> diagnosing isn't difficult, and we're getting a bit late in release
> cycle already.

It cured the V210 too but I could not test V100 since it's offline until 
monday.
Meelis Roos - Feb. 27, 2012, 5:17 p.m.
> > > > Can you please try the following patch?  If it still fails to boot,
> > > > please attach the failing log.  Thank you.
> > > 
> > > It works on E3500! Will try other machines tomorrow.
> > 
> > Once confirmed, I'll push the patch through tip.  It just hides the
> > underlying problem but we should be in no worse shape than before,
> > it's two line change so reproduing the problem again for proper
> > diagnosing isn't difficult, and we're getting a bit late in release
> > cycle already.
> 
> It cured the V210 too but I could not test V100 since it's offline until 
> monday.

Tested V100 too, success!

Patch

diff --git a/mm/memblock.c b/mm/memblock.c
index 77b5f22..99f2855 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -99,9 +99,6 @@  phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t start,
 	phys_addr_t this_start, this_end, cand;
 	u64 i;
 
-	/* align @size to avoid excessive fragmentation on reserved array */
-	size = round_up(size, align);
-
 	/* pump up @end */
 	if (end == MEMBLOCK_ALLOC_ACCESSIBLE)
 		end = memblock.current_limit;
@@ -731,6 +728,9 @@  static phys_addr_t __init memblock_alloc_base_nid(phys_addr_t size,
 {
 	phys_addr_t found;
 
+	/* align @size to avoid excessive fragmentation on reserved array */
+	size = round_up(size, align);
+
 	found = memblock_find_in_range_node(0, max_addr, size, align, nid);
 	if (found && !memblock_reserve(found, size))
 		return found;