[Maverick,ti-omap4] SRU: A workaround for highmem issue on OMAP4 platform

Message ID AANLkTince6QY0t4gExABG21ZqZ4hMEjGm=mTm7MBfYGo@mail.gmail.com
State Rejected

Commit Message

Bryan Wu Sept. 26, 2010, 3:38 p.m. UTC
On Sun, Sep 26, 2010 at 11:01 PM, Nicolas Pitre
<nicolas.pitre@canonical.com> wrote:
> On Sun, 26 Sep 2010, Ricardo Salveti de Araujo wrote:
>
>> On Fri, Sep 24, 2010 at 03:04:10AM -0300, Ricardo Salveti de Araujo wrote:
>> > Will also test it with only one CPU to see if this could be related to SMP
>> > issues.
>>
>> Ok, tested the same kernel running with only one CPU for 40 hours (which gave me
>> 15 builds), and all went fine, without any errors in either userspace or kernelspace.
>>
>> So it seems that this data abort exception could be related to concurrency and
>> SMP support in our kernel.
>
> Right.  So I'd suggest you keep highmem off, and 2g:2g on (with the
> VMALLOC_END fix), then try to reliably reproduce the issue with that
> configuration and fix it before involving highmem again.  While highmem
> may make the problem more visible, it also brings a set of added
> complexity of its own which would make the tracking of the issue much
> harder.
>
>
> Nicolas
>

I disabled the CONFIG_CACHE_L2X0 L2 cache controller for omap4. So far, with
the SMP kernel and mem=1G, kernel builds are running correctly.
I will test more. It looks like the L2 cache control code has some issue.


From 0743374a52900030b54f643820dc8f8f71e98651 Mon Sep 17 00:00:00 2001
From: Bryan Wu <bryan.wu@canonical.com>
Date: Sun, 26 Sep 2010 20:35:48 +0800
Subject: [PATCH] UBUNTU: [Config] Disable L2 cache for OMAP4

Signed-off-by: Bryan Wu <bryan.wu@canonical.com>
---
 debian.ti-omap4/config/config.common.ubuntu |    7 ++-----
 1 files changed, 2 insertions(+), 5 deletions(-)

Comments

Nicolas Pitre Sept. 26, 2010, 4:31 p.m. UTC | #1
On Sun, 26 Sep 2010, Bryan Wu wrote:

> I disabled the CONFIG_CACHE_L2X0 L2 cache controller for omap4. So far, with
> the SMP kernel and mem=1G, kernel builds are running correctly.
> I will test more. It looks like the L2 cache control code has some issue.

That's with or without highmem involved?


Nicolas
Bryan Wu Sept. 27, 2010, 1:32 a.m. UTC | #2
On Mon, Sep 27, 2010 at 12:31 AM, Nicolas Pitre
<nicolas.pitre@canonical.com> wrote:
> On Sun, 26 Sep 2010, Bryan Wu wrote:
>
>> I disabled the CONFIG_CACHE_L2X0 L2 cache controller for omap4. So far, with
>> the SMP kernel and mem=1G, kernel builds are running correctly.
>> I will test more. It looks like the L2 cache control code has some issue.
>
> That's with or without highmem involved?
>
>

It's with highmem, but it eventually still fails with a message like this:
"Unhandled fault: imprecise external abort (0x1406) at 0x400b0000"
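
For readers who want to see how the 0x1406 in that message maps to "imprecise external abort", here is a minimal Python sketch. It assumes the value is an ARMv7-A data fault status register in the short-descriptor format (FS[3:0] in bits 3:0, FS[4] in bit 10, domain in bits 7:4, WnR in bit 11, ExT in bit 12). The function name `decode_dfsr` and the one-entry name table are made up for illustration and are not taken from the kernel source.

```python
# Sketch only: field layout assumed to be the ARMv7-A short-descriptor DFSR.
# The name table is deliberately incomplete; it lists just the status
# code that appears in this thread.
FS_NAMES = {0b10110: "imprecise (asynchronous) external abort"}


def decode_dfsr(value):
    """Split a fault status value into its individual fields."""
    # The 5-bit fault status is split: low four bits in [3:0], top bit in [10].
    fs = (value & 0xF) | (((value >> 10) & 1) << 4)
    return {
        "fs": fs,
        "domain": (value >> 4) & 0xF,
        "write": bool(value & (1 << 11)),     # WnR bit
        "external": bool(value & (1 << 12)),  # ExT bit
        "name": FS_NAMES.get(fs, "not listed in this sketch"),
    }


if __name__ == "__main__":
    for key, val in decode_dfsr(0x1406).items():
        print(key, "=", val)
```

With 0x1406 the combined status field comes out as 22 (0b10110), which is the code for an asynchronous external abort. Because such aborts are reported some time after the offending access, the address printed in the message is not a reliable pointer to the faulting instruction.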

Thanks,
Ricardo Salveti de Araujo Sept. 27, 2010, 1:18 p.m. UTC | #3
On Mon, 2010-09-27 at 09:32 +0800, Bryan Wu wrote:
> On Mon, Sep 27, 2010 at 12:31 AM, Nicolas Pitre
> <nicolas.pitre@canonical.com> wrote:
> > On Sun, 26 Sep 2010, Bryan Wu wrote:
> >
> >> I disabled the CONFIG_CACHE_L2X0 L2 cache controller for omap4. So far, with
> >> the SMP kernel and mem=1G, kernel builds are running correctly.
> >> I will test more. It looks like the L2 cache control code has some issue.
> >
> > That's with or without highmem involved?
>
> It's with highmem, but it eventually still fails with a message like this:
> "Unhandled fault: imprecise external abort (0x1406) at 0x400b0000"

Without L2, but with highmem and SMP, I can easily reproduce the issue.
Without L2 and without highmem, but with SMP, I was able to run for
20 hours (6 builds) without errors.

So currently I can use 1G when highmem is off and either SMP or L2 is
disabled.

This issue is probably a race condition, but it is hard to trace exactly
where.

Cheers,

Patch

diff --git a/debian.ti-omap4/config/config.common.ubuntu b/debian.ti-omap4/config/config.common.ubuntu
index 8d46b55..8f5b7e9 100644
--- a/debian.ti-omap4/config/config.common.ubuntu
+++ b/debian.ti-omap4/config/config.common.ubuntu
@@ -320,8 +320,7 @@  CONFIG_C2PORT=m
 CONFIG_CACHEFILES=m
 # CONFIG_CACHEFILES_DEBUG is not set
 # CONFIG_CACHEFILES_HISTOGRAM is not set
-CONFIG_CACHE_L2X0=y
-CONFIG_CACHE_PL310=y
+# CONFIG_CACHE_L2X0 is not set
 # CONFIG_CAIF is not set
 CONFIG_CAN=m
 CONFIG_CAN_BCM=m
@@ -928,6 +927,7 @@  CONFIG_HID_WACOM=m
 CONFIG_HID_ZEROPLUS=m
 # CONFIG_HID_ZYDACRON is not set
 CONFIG_HIGHMEM=y
+# CONFIG_HIGHPTE is not set
 CONFIG_HIGH_RES_TIMERS=y
 CONFIG_HOSTAP=m
 # CONFIG_HOSTAP_FIRMWARE is not set
@@ -1928,8 +1928,6 @@  CONFIG_OMFS_FS=m
 CONFIG_OPROFILE=y
 CONFIG_OSF_PARTITION=y
 # CONFIG_OTUS is not set
-CONFIG_OUTER_CACHE=y
-CONFIG_OUTER_CACHE_SYNC=y
 CONFIG_P54_COMMON=m
 CONFIG_P54_LEDS=y
 CONFIG_P54_SPI=m
@@ -1970,7 +1968,6 @@  CONFIG_PHONET=m
 CONFIG_PHYLIB=y
 # CONFIG_PHYS_ADDR_T_64BIT is not set
 CONFIG_PID_NS=y
-# CONFIG_PL310_ERRATA_588369 is not set
 # CONFIG_PLAT_SPEAR is not set
 CONFIG_PLIP=m
 # CONFIG_PM is not set
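
The option states left by the hunks above can be checked mechanically. The following Python sketch is an illustration only: it parses the usual kernel config syntax (`CONFIG_X=value` and `# CONFIG_X is not set`) and reports which of the L2-related options touched by this patch are still enabled. The helper names `parse_config` and `l2_options_enabled`, and the `AFTER_PATCH` sample (copied from the post-patch side of the diff), are hypothetical and not part of any Ubuntu kernel tooling.

```python
import re

# Kernel config files use two line shapes for an option.
_SET = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_[A-Za-z0-9_]+) is not set$")

# L2-related options that the patch removes or disables.
L2_OPTIONS = (
    "CONFIG_CACHE_L2X0",
    "CONFIG_CACHE_PL310",
    "CONFIG_OUTER_CACHE",
    "CONFIG_OUTER_CACHE_SYNC",
)


def parse_config(text):
    """Map option name to its value, or to None for 'is not set'."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = _SET.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            options[match.group(1)] = None
    return options


def l2_options_enabled(options):
    """Return the L2-related options that are still set to y or m."""
    return [name for name in L2_OPTIONS if options.get(name) in ("y", "m")]


# Sample taken from the post-patch side of the hunks (assumption: the
# rest of the file contains no other L2-related settings).
AFTER_PATCH = """\
# CONFIG_CACHE_L2X0 is not set
CONFIG_HIGHMEM=y
# CONFIG_HIGHPTE is not set
"""

if __name__ == "__main__":
    opts = parse_config(AFTER_PATCH)
    print("L2 options still enabled:", l2_options_enabled(opts))
    print("HIGHMEM:", opts.get("CONFIG_HIGHMEM"))
```

An empty list from `l2_options_enabled` together with `CONFIG_HIGHMEM=y` corresponds to the "without L2, with highmem" configuration discussed in the thread.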