[U-Boot,07/10] sunxi: Fix end of kernel memory alignment for A33

Message ID 1429027621-19252-7-git-send-email-hdegoede@redhat.com
State Superseded
Delegated to: Hans de Goede

Commit Message

Hans de Goede April 14, 2015, 4:06 p.m. UTC
For unknown reasons the A33 needs the end of the memory we report to the
kernel to be aligned to a multiple of 4 MiB. Without this things will hang
when we hand over control to the kernel.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
---
 drivers/video/sunxi_display.c | 9 +++++++++
 1 file changed, 9 insertions(+)
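
The masking this patch applies can be sketched in isolation. The following is
a minimal sketch, not the U-Boot code itself: the helper names align_down()
and reported_dram_size() and the SZ_4M macro are assumptions made for
illustration. Note that the patch masks the size only, so the end address
(start + size) is 4 MiB aligned only if start is itself 4 MiB aligned.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name for the 4 MiB constant the patch spells out inline. */
#define SZ_4M (4u * 1024u * 1024u)

/* Round value down to a multiple of align; align must be a power of two. */
static inline uint64_t align_down(uint64_t value, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return value & ~(align - 1);
}

/*
 * DRAM size to report to the kernel: total DRAM minus the framebuffer
 * carved out at the top, rounded down to a 4 MiB multiple. This mirrors
 * the "size &= ~(4 * 1024 * 1024 - 1)" line in the patch.
 */
static inline uint64_t reported_dram_size(uint64_t dram_size, uint64_t fb_size)
{
	assert(fb_size <= dram_size);
	return align_down(dram_size - fb_size, SZ_4M);
}
```

As an example with made-up numbers: for 512 MiB of DRAM and a 1024x600
framebuffer at 32 bpp (2457600 bytes), reported_dram_size() returns
0x1fc00000 (508 MiB), so the carve-out effectively grows from about 2.3 MiB
to 4 MiB.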

Comments

Ian Campbell April 15, 2015, 7:57 p.m. UTC | #1
On Tue, 2015-04-14 at 18:06 +0200, Hans de Goede wrote:
> For unknown reasons the A33 needs the end of the memory we report to the
> kernel to be aligned to a multiple of 4 MiB.

Do you really mean "the A33 needs" (as in the processor itself) or do
you actually mean "the A33 kernel port"?

If the latter then can't that be investigated/fixed instead of hacked
here? That would be far more preferable.

Ian.
Hans de Goede April 16, 2015, 7:32 a.m. UTC | #2
Hi,

On 15-04-15 21:57, Ian Campbell wrote:
> On Tue, 2015-04-14 at 18:06 +0200, Hans de Goede wrote:
>> For unknown reasons the A33 needs the end of the memory we report to the
>> kernel to be aligned to a multiple of 4 MiB.
>
> Do you really mean "the A33 needs" (as in the processor itself) or do
> you actually mean "the A33 kernel port"?
>
> If the latter then can't that be investigated/fixed instead of hacked
> here? That would be far more preferable.

I mean the former, it seems that the SoC itself cannot handle dram
ranges with different cache policies which are not aligned to 4 MiB,
at least that is my WAG what is going on here.

I've been using an a23 dtb + generic multi-platform kernel for my testing
(as said before the a33 really is almost the same design), and that boots
fine without this alignment hack on an actual A23 device, so this is not
a kernel limitation.

I'm not entirely happy with this semi-magic workaround either, but it
took me a day to find it, and then I tried to find a better solution /
more satisfying answer as to why for another day, so at this point
my view on this is that we will just have to live with it.

Regards,

Hans
Mark Rutland April 16, 2015, 5:35 p.m. UTC | #3
On Thu, Apr 16, 2015 at 08:32:03AM +0100, Hans de Goede wrote:
> Hi,
> 
> On 15-04-15 21:57, Ian Campbell wrote:
> > On Tue, 2015-04-14 at 18:06 +0200, Hans de Goede wrote:
> >> For unknown reasons the A33 needs the end of the memory we report to the
> >> kernel to be aligned to a multiple of 4 MiB.
> >
> > Do you really mean "the A33 needs" (as in the processor itself) or do
> > you actually mean "the A33 kernel port"?
> >
> > If the latter then can't that be investigated/fixed instead of hacked
> > here? That would be far more preferable.
> 
> I mean the former, it seems that the SoC itself cannot handle dram
> ranges with different cache policies which are not aligned to 4 MiB,
> at least that is my WAG what is going on here.

That sounds incredibly suspicious.

What do you mean w.r.t. different cache policies -- what does that have
to do with the end of DRAM? What problem do you see?

It would be worth reporting this on lakml.

> I've been using an a23 dtb + generic multi-platform kernel for my testing
> (as said before the a33 really is almost the same design), and that boots
> fine without this alignment hack on an actual A23 device, so this is not
> a kernel limitation.

Not necessarily. Is RAM at the same location on both SoCs? What about
other devices and carveouts?

It could be that the stars happen to align and we're finally caught out
by some dodgy maths.

Mark.
Hans de Goede April 16, 2015, 7:12 p.m. UTC | #4
Hi,

On 16-04-15 19:35, Mark Rutland wrote:
> On Thu, Apr 16, 2015 at 08:32:03AM +0100, Hans de Goede wrote:
>> Hi,
>>
>> On 15-04-15 21:57, Ian Campbell wrote:
>>> On Tue, 2015-04-14 at 18:06 +0200, Hans de Goede wrote:
>>>> For unknown reasons the A33 needs the end of the memory we report to the
>>>> kernel to be aligned to a multiple of 4 MiB.
>>>
>>> Do you really mean "the A33 needs" (as in the processor itself) or do
>>> you actually mean "the A33 kernel port"?
>>>
>>> If the latter then can't that be investigated/fixed instead of hacked
>>> here? That would be far more preferable.
>>
>> I mean the former, it seems that the SoC itself cannot handle dram
>> ranges with different cache policies which are not aligned to 4 MiB,
>> at least that is my WAG what is going on here.
>
> That sounds incredibly suspicious.
>
> What do you mean w.r.t. different cache policies -- what does that have
> to do with the end of DRAM?

We carve out a framebuffer at the end of DRAM, and then report less
DRAM than we actually have to the kernel. This framebuffer then gets
picked up by the kernel through simplefb, which will map it with a different
cache policy than the normal part of the DRAM has.

> What problem do you see?

Depending on the framebuffer-size the kernel either boots or does not boot,
when it does not boot it does nothing (I've a serial console) earlyprintk
does not help, I was looking into setting up an early console (should be
a matter of just putting in the right parameters) when I found out that if
I modify the framebuffer size that fixes things.

After experimenting more it seems that keeping the last pixel of the
framebuffer at the very end of DRAM is not a problem (so this does not seem
to be a display engine problem), things start to work when I make the carve
out at the end bigger.

On the very similar A23 giving the kernel all of the DRAM except for the
framebuffer (aligned to a multiple of 4k) works just fine.

Sometimes I can get away with just making the carve-out bigger without
aligning it to a multiple of 4 MiB, but an alignment to 4 MiB seems to
always work independent of the framebuffer size.

> It would be worth reporting this on lakml.

If you still think that after the above explanation I'll start a new thread
on lakml with contents more targeted at kernel devs.

>> I've been using an a23 dtb + generic multi-platform kernel for my testing
>> (as said before the a33 really is almost the same design), and that boots
>> fine without this alignment hack on an actual A23 device, so this is not
>> a kernel limitation.
>
> Not necessarily. Is RAM at the same location on both SoCs? What about
> other devices and carveouts?

Everything is the same on both SoCs except that one has 2 Cortex A7
cores and the new one with the problem has 4 Cortex A7 cores, and a
new dram controller / mbus subsystem to keep the 4 cores fed.

> It could be that the stars happen to align and we're finally caught out
> by some dodgy maths.

I don't think that that is the case here.

Regards,

Hans
Mark Rutland April 17, 2015, 10:20 a.m. UTC | #5
On Thu, Apr 16, 2015 at 08:12:31PM +0100, Hans de Goede wrote:
> Hi,
> 
> On 16-04-15 19:35, Mark Rutland wrote:
> > On Thu, Apr 16, 2015 at 08:32:03AM +0100, Hans de Goede wrote:
> >> Hi,
> >>
> >> On 15-04-15 21:57, Ian Campbell wrote:
> >>> On Tue, 2015-04-14 at 18:06 +0200, Hans de Goede wrote:
> >>>> For unknown reasons the A33 needs the end of the memory we report to the
> >>>> kernel to be aligned to a multiple of 4 MiB.
> >>>
> >>> Do you really mean "the A33 needs" (as in the processor itself) or do
> >>> you actually mean "the A33 kernel port"?
> >>>
> >>> If the latter then can't that be investigated/fixed instead of hacked
> >>> here? That would be far more preferable.
> >>
> >> I mean the former, it seems that the SoC itself cannot handle dram
> >> ranges with different cache policies which are not aligned to 4 MiB,
> >> at least that is my WAG what is going on here.
> >
> > That sounds incredibly suspicious.
> >
> > What do you mean w.r.t. different cache policies -- what does that have
> > to do with the end of DRAM?
> 
> We carve out a framebuffer at the end of DRAM, and then report less
> DRAM than we actually have to the kernel. This framebuffer then gets
> picked up by the kernel through simplefb, which will map it with a different
> cache policy than the normal part of the DRAM has.

I see. Thanks for the clarification.

> > What problem do you see?
> 
> Depending on the framebuffer-size the kernel either boots or does not boot,
> when it does not boot it does nothing (I've a serial console) earlyprintk
> does not help, I was looking into setting up an early console (should be
> a matter of just putting in the right parameters) when I found out that if
> I modify the framebuffer size that fixes things.

Ok. So we don't know if the kernel is stuck somewhere or everything is
completely hosed, then?

I take it you can't get JTAG working via the SD card slot?

> After experimenting more it seems that keeping the last pixel of the
> framebuffer at the very end of DRAM is not a problem (so this does not seem
> to be a display engine problem), things start to work when I make the carve
> out at the end bigger.
> 
> On the very similar A23 giving the kernel all of the DRAM except for the
> framebuffer (aligned to a multiple of 4k) works just fine.
> 
> Sometimes I can get away with just making the carve-out bigger without
> aligning it to a multiple of 4 MiB, but an alignment to 4 MiB seems to
> always work independent of the framebuffer size.
> 
> > It would be worth reporting this on lakml.
> 
> If you still think that after the above explanation I'll start a new thread
> on lakml with contents more targeted at kernel devs.

I think it would be worthwhile. This could be one instance of an issue
in the memory system that we might hit elsewhere. Even if we don't come
to another solution, it'll at least make it visible to others.

> >> I've been using an a23 dtb + generic multi-platform kernel for my testing
> >> (as said before the a33 really is almost the same design), and that boots
> >> fine without this alignment hack on an actual A23 device, so this is not
> >> a kernel limitation.
> >
> > Not necessarily. Is RAM at the same location on both SoCs? What about
> > other devices and carveouts?
> 
> Everything is the same on both SoCs except that one has 2 Cortex A7
> cores and the new one with the problem has 4 Cortex A7 cores, and a
> new dram controller / mbus subsystem to keep the 4 cores fed.
> 
> > It could be that the stars happen to align and we're finally caught out
> > by some dodgy maths.
> 
> I don't think that that is the case here.

Yeah. The memory subsystem differences sound like the chief suspects.

Do we know if the A7s in the A23 and A33 are different revisions (and
which bits are set in their aux registers)? It could be that some
memory system feature is enabled on one but not the other, or something
like that.

Mark.
Hans de Goede April 24, 2015, 6:32 p.m. UTC | #6
Hi Mark,

On 17-04-15 12:20, Mark Rutland wrote:
> On Thu, Apr 16, 2015 at 08:12:31PM +0100, Hans de Goede wrote:
>> Hi,
>>
>> On 16-04-15 19:35, Mark Rutland wrote:
>>> On Thu, Apr 16, 2015 at 08:32:03AM +0100, Hans de Goede wrote:
>>>> Hi,
>>>>
>>>> On 15-04-15 21:57, Ian Campbell wrote:
>>>>> On Tue, 2015-04-14 at 18:06 +0200, Hans de Goede wrote:
>>>>>> For unknown reasons the A33 needs the end of the memory we report to the
>>>>>> kernel to be aligned to a multiple of 4 MiB.
>>>>>
>>>>> Do you really mean "the A33 needs" (as in the processor itself) or do
>>>>> you actually mean "the A33 kernel port"?
>>>>>
>>>>> If the latter then can't that be investigated/fixed instead of hacked
>>>>> here? That would be far more preferable.
>>>>
>>>> I mean the former, it seems that the SoC itself cannot handle dram
>>>> ranges with different cache policies which are not aligned to 4 MiB,
>>>> at least that is my WAG what is going on here.
>>>
>>> That sounds incredibly suspicious.
>>>
>>> What do you mean w.r.t. different cache policies -- what does that have
>>> to do with the end of DRAM?
>>
>> We carve out a framebuffer at the end of DRAM, and then report less
>> DRAM than we actually have to the kernel. This framebuffer then gets
>> picked up by the kernel through simplefb, which will map it with a different
>> cache policy than the normal part of the DRAM has.
>
> I see. Thanks for the clarification.
>
>>> What problem do you see?
>>
>> Depending on the framebuffer-size the kernel either boots or does not boot,
>> when it does not boot it does nothing (I've a serial console) earlyprintk
>> does not help, I was looking into setting up an early console (should be
>> a matter of just putting in the right parameters) when I found out that if
>> I modify the framebuffer size that fixes things.
>
> Ok. So we don't know if the kernel is stuck somewhere or everything is
> completely hosed, then?
>
> I take it you can't get JTAG working via the SD card slot?
>
>> After experimenting more it seems that keeping the last pixel of the
>> framebuffer at the very end of DRAM is not a problem (so this does not seem
>> to be a display engine problem), things start to work when I make the carve
>> out at the end bigger.
>>
>> On the very similar A23 giving the kernel all of the DRAM except for the
>> framebuffer (aligned to a multiple of 4k) works just fine.
>>
>> Sometimes I can get away with just making the carve-out bigger without
>> aligning it to a multiple of 4 MiB, but an alignment to 4 MiB seems to
>> always work independent of the framebuffer size.
>>
>>> It would be worth reporting this on lakml.
>>
>> If you still think that after the above explanation I'll start a new thread
>> on lakml with contents more targeted at kernel devs.
>
> I think it would be worthwhile. This could be one instance of an issue
> in the memory system that we might hit elsewhere. Even if we don't come
> to another solution, it'll at least make it visible to others.

So it seems that I'm not the only one seeing this, and I've been wrongly
blaming it on the A33, instead it seems to be a kernel bug, triggered
on my A33 due to the display resolution it has.

For details see:

http://www.spinics.net/lists/arm-kernel/msg413811.html

Regards,

Hans
Mark Rutland April 28, 2015, 9:33 a.m. UTC | #7
Hi Hans,

> So it seems that I'm not the only one seeing this, and I've been wrongly
> blaming it on the A33, instead it seems to be a kernel bug, triggered
> on my A33 due to the display resolution it has.
> 
> For details see:
> 
> http://www.spinics.net/lists/arm-kernel/msg413811.html

That's good news; far less scary than a HW issue.

Would you mind replying on that thread to give it a bit more visibility?

Thanks,
Mark.

Patch

diff --git a/drivers/video/sunxi_display.c b/drivers/video/sunxi_display.c
index e132b75..7a63094 100644
--- a/drivers/video/sunxi_display.c
+++ b/drivers/video/sunxi_display.c
@@ -1275,6 +1275,15 @@  int sunxi_simplefb_setup(void *blob)
 	 */
 	start = gd->bd->bi_dram[0].start;
 	size = gd->bd->bi_dram[0].size - sunxi_display.fb_size;
+	/*
+	 * For unknown reasons the A33 needs the end of the memory we report to
+	 * the kernel to be aligned to a multiple of 4 MiB. Without this things
+	 * will hang when we hand over control to the kernel.
+	 */
+#ifdef CONFIG_MACH_SUN8I_A33
+	size &= ~(4 * 1024 * 1024 - 1);
+#endif
+
 	ret = fdt_fixup_memory_banks(blob, &start, &size, 1);
 	if (ret) {
 		eprintf("Cannot setup simplefb: Error reserving memory\n");