[U-Boot] sunxi: mctl_mem_matches: Add missing memory barrier

Message ID 1460653084-32197-1-git-send-email-hdegoede@redhat.com
State Accepted
Commit bfb33f0bc45b9ee92ed2f85107cf20b9bfdf9f8a
Delegated to: Hans de Goede

Commit Message

Hans de Goede April 14, 2016, 4:58 p.m. UTC
We are running with the caches disabled when mctl_mem_matches gets called,
but the cpu's write buffer is still there and can still get in the way,
add a memory barrier to fix this.

This avoids mctl_mem_matches always returning false in some cases, which
was resulting in:

U-Boot SPL 2015.07 (Apr 14 2016 - 18:47:26)
DRAM: 1024 MiB

U-Boot 2015.07 (Apr 14 2016 - 18:47:26 +0200) Allwinner Technology

CPU:   Allwinner A23 (SUN8I)
DRAM:  512 MiB

Where 512 MiB is the right amount, but the DRAM controller would be
initialized for 1024 MiB.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
---
 arch/arm/mach-sunxi/dram_helpers.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Siarhei Siamashka April 15, 2016, 12:46 a.m. UTC | #1
Hello Hans,

On Thu, 14 Apr 2016 18:58:04 +0200
Hans de Goede <hdegoede@redhat.com> wrote:

> We are running with the caches disabled when mctl_mem_matches gets called,
> but the cpu's write buffer is still there and can still get in the way,

This does not make much sense to me. The SPL is running with the MMU
disabled, because disabling the MMU is one of the first things done by
the SPL at the very start. And when the MMU is disabled, all the data
accesses are treated as Strongly-ordered and are not supposed to use
the write buffer. A quote from the ARMv7 Architecture Manual:

   "a write to Strongly-ordered memory can complete only when it
   reaches the peripheral or memory component accessed by the write"

We can even verify whether the write buffer is actually in use by simply
benchmarking something like the memset function. If the write buffer is
working, then the sequential write speed will be around 1 GB/s or more.
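
A rough sketch of the kind of memset benchmark meant here. It is written as
portable host-side C using clock() and is only an illustration: the helper
name memset_mb_per_s is made up, and in the SPL one would have to time the
loop with a hardware timer instead.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper (not part of U-Boot): fill `buf` `rounds` times and
 * return the observed sequential write throughput in MB/s (0.0 if the
 * elapsed time was too short to measure).
 */
double memset_mb_per_s(void *buf, size_t len, unsigned rounds)
{
	clock_t start = clock();

	for (unsigned i = 0; i < rounds; i++) {
		memset(buf, (int)(i & 0xff), len);
		/* Compiler barrier so the fills are not optimized away. */
		__asm__ __volatile__("" : : "r"(buf) : "memory");
	}

	double secs = (double)(clock() - start) / CLOCKS_PER_SEC;

	return secs > 0.0 ? ((double)len * rounds) / secs / 1e6 : 0.0;
}
```

Comparing the figure with and without the suspected buffering in effect would
show whether writes are being merged or posted.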

> add a memory barrier to fix this.
> 
> This avoids mctl_mem_matches always returning false in some cases, which
> was resulting in:
> 
> U-Boot SPL 2015.07 (Apr 14 2016 - 18:47:26)
> DRAM: 1024 MiB
> 
> U-Boot 2015.07 (Apr 14 2016 - 18:47:26 +0200) Allwinner Technology
> 
> CPU:   Allwinner A23 (SUN8I)
> DRAM:  512 MiB
> 
> Where 512 MiB is the right amount, but the DRAM controller would be
> initialized for 1024 MiB.

Is it just a single device or board? Has anybody seen anything like
this on other devices with the same SoC?

I wonder if what you are observing could be possibly explained by just
a usual data corruption problem? Which may be happening when the DRAM
clock speed is set higher than this particular device is able to handle
in a reliable way. Inserting just one or more NOP instructions instead
of the barrier could possibly change some timings too.

If this patch helps, then it's fine. But I wonder if it is not merely
making the problem latent instead of fixing the root cause?

> 
> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
> ---
>  arch/arm/mach-sunxi/dram_helpers.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/arm/mach-sunxi/dram_helpers.c b/arch/arm/mach-sunxi/dram_helpers.c
> index 50318d2..e0c823a 100644
> --- a/arch/arm/mach-sunxi/dram_helpers.c
> +++ b/arch/arm/mach-sunxi/dram_helpers.c
> @@ -7,6 +7,7 @@
>   */
>  
>  #include <common.h>
> +#include <asm/armv7.h>
>  #include <asm/io.h>
>  #include <asm/arch/dram.h>
>  
> @@ -31,6 +32,7 @@ bool mctl_mem_matches(u32 offset)
>  	/* Try to write different values to RAM at two addresses */
>  	writel(0, CONFIG_SYS_SDRAM_BASE);
>  	writel(0xaa55aa55, (ulong)CONFIG_SYS_SDRAM_BASE + offset);
> +	DSB;
>  	/* Check if the same value is actually observed when reading back */
>  	return readl(CONFIG_SYS_SDRAM_BASE) ==
>  	       readl((ulong)CONFIG_SYS_SDRAM_BASE + offset);
Hans de Goede April 15, 2016, 7:34 a.m. UTC | #2
Hi,

On 15-04-16 02:46, Siarhei Siamashka wrote:
> Hello Hans,
>
> On Thu, 14 Apr 2016 18:58:04 +0200
> Hans de Goede <hdegoede@redhat.com> wrote:
>
>> We are running with the caches disabled when mctl_mem_matches gets called,
>> but the cpu's write buffer is still there and can still get in the way,
>
> This does not make much sense to me. The SPL is running with the MMU
> disabled, because disabling the MMU is one of the first things done by
> the SPL at the very start. And when the MMU is disabled, all the data
> accesses are treated as Strongly-ordered and are not supposed to use
> the write buffer. A quote from the ARMv7 Architecture Manual:
>
>     "a write to Strongly-ordered memory can complete only when it
>     reaches the peripheral or memory component accessed by the write"

I was surprised by this myself too, but I noticed that mctl_mem_matches
has started to return always false on one A23 tablet (detecting 1G of mem
instead of 512MB), I then tried it on multiple A23 tablets (3 in total)
and they all showed the same problem.

I also could not remember seeing this before, so I went back through U-Boot
versions all the way to v2015.07, but the problem was still there, which makes
me believe that it is triggered by recent gcc versions. Note that I'm seeing
the problem with U-Boot built with gcc-5.3 as well as with gcc-6.

> We can even verify whether the write buffer is actually in use by simply
> benchmarking something like the memset function. If the write buffer is
> working, then the sequential write speed will be around 1 GB/s or more.
>
>> add a memory barrier to fix this.
>>
>> This avoids mctl_mem_matches always returning false in some cases, which
>> was resulting in:
>>
>> U-Boot SPL 2015.07 (Apr 14 2016 - 18:47:26)
>> DRAM: 1024 MiB
>>
>> U-Boot 2015.07 (Apr 14 2016 - 18:47:26 +0200) Allwinner Technology
>>
>> CPU:   Allwinner A23 (SUN8I)
>> DRAM:  512 MiB
>>
>> Where 512 MiB is the right amount, but the DRAM controller would be
>> initialized for 1024 MiB.
>
> Is it just a single device or board? Has anybody seen anything like
> this on other devices with the same SoC?

3 a23 devices.
>
> I wonder if what you are observing could be possibly explained by just
> a usual data corruption problem? Which may be happening when the DRAM
> clock speed is set higher than this particular device is able to handle
> in a reliable way. Inserting just one or more NOP instructions instead
> of the barrier could possibly change some timings too.
>
> If this patch helps, then it's fine. But I wonder if it is not merely
> making the problem latent instead of fixing the root cause?

I do believe that this patch addresses a real problem and is not hiding
some dram timing issues, I might be wrong about the write-buffer being
the cause, it could simply be that the compiler is doing something bad
(despite the accesses being marked as volatile)  and that the DSB stops
the compiler from optimizing things too much.

Regards,

Hans
Ian Campbell April 22, 2016, 9:32 a.m. UTC | #3
On Fri, 2016-04-15 at 09:34 +0200, Hans de Goede wrote:
> > I wonder if what you are observing could be possibly explained by just
> > a usual data corruption problem? Which may be happening when the DRAM
> > clock speed is set higher than this particular device is able to handle
> > in a reliable way. Inserting just one or more NOP instructions instead
> > of the barrier could possibly change some timings too.
> > 
> > If this patch helps, then it's fine. But I wonder if it is not merely
> > making the problem latent instead of fixing the root cause?
> I do believe that this patch addresses a real problem and is not hiding
> some dram timing issues, I might be wrong about the write-buffer being
> the cause, it could simply be that the compiler is doing something bad
> (despite the accesses being marked as volatile)  and that the DSB stops
> the compiler from optimizing things too much.

I have a _very_ vague memory of seeing something not dissimilar to this
(apparent write buffer interactions with MMU disabled) in the early
days of Xen development, but that was probably on models and so may not
have been representative of the intended behaviour of eventual silicon.

It might be interesting to have a look at the generated assembly and
see if it differs in more or less than the addition of the single
instruction and perhaps experiment with just a compiler barrier.

Andre, do you have any insights on this?

Ian.
Hans de Goede April 22, 2016, 10:48 a.m. UTC | #4
Hi,

On 22-04-16 11:32, Ian Campbell wrote:
> On Fri, 2016-04-15 at 09:34 +0200, Hans de Goede wrote:
>>> I wonder if what you are observing could be possibly explained by just
>>> a usual data corruption problem? Which may be happening when the DRAM
>>> clock speed is set higher than this particular device is able to handle
>>> in a reliable way. Inserting just one or more NOP instructions instead
>>> of the barrier could possibly change some timings too.
>>>
>>> If this patch helps, then it's fine. But I wonder if it is not merely
>>> making the problem latent instead of fixing the root cause?
>> I do believe that this patch addresses a real problem and is not hiding
>> some dram timing issues, I might be wrong about the write-buffer being
>> the cause, it could simply be that the compiler is doing something bad
>> (despite the accesses being marked as volatile)  and that the DSB stops
>> the compiler from optimizing things too much.
>
> I have a _very_ vague memory of seeing something not disimilar to this
> (apparent write buffer interactions with MMU disabled) in the early
> days of Xen development, but that was probably on models and so may not
> have been representative of the intended behaviour of eventual silicon.
>
> It might be interesting to have a look at the generated assembly and
> see if it differs in more or less than the addition of the single
> instruction and perhaps experiment with just a compiler barrier.
>
> Andre, do you have any insights on this?

Andre here is the original mail/patch for reference:

     sunxi: mctl_mem_matches: Add missing memory barrier

     We are running with the caches disabled when mctl_mem_matches gets called,
     but the cpu's write buffer is still there and can still get in the way,
     add a memory barrier to fix this.

     This avoids mctl_mem_matches always returning false in some cases, which
     was resulting in:

<snip>

@@ -31,6 +32,7 @@ bool mctl_mem_matches(u32 offset)
  	/* Try to write different values to RAM at two addresses */
  	writel(0, CONFIG_SYS_SDRAM_BASE);
  	writel(0xaa55aa55, (ulong)CONFIG_SYS_SDRAM_BASE + offset);
+	DSB;
  	/* Check if the same value is actually observed when reading back */
  	return readl(CONFIG_SYS_SDRAM_BASE) ==
  	       readl((ulong)CONFIG_SYS_SDRAM_BASE + offset);


What this code is trying to do is determine the RAM (chip) size by seeing when
writing to RAM wraps around.

This works with the DSB but not without it (without it, it always returns
false). This is on a Cortex-A7 with the MMU (and data caches) disabled.
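
The wrap-around check can be illustrated with a small host-side model. This is
only a sketch under stated assumptions: struct sim_dram and the sim_* functions
are hypothetical and stand in for a chip that decodes only the low address
bits, and the model has no write buffer, so it shows the intended behaviour
rather than the failure.

```c
#include <stdbool.h>
#include <stdint.h>

#define SIM_WORDS 1024u

/* Hypothetical model of a DRAM that only decodes `addr_bits` address bits. */
struct sim_dram {
	uint32_t cells[SIM_WORDS];
	unsigned addr_bits;	/* wired word-address bits, at most 10 */
};

/* Address bits above addr_bits are ignored, which causes the aliasing. */
static uint32_t sim_index(const struct sim_dram *d, uint32_t word_addr)
{
	return word_addr & ((1u << d->addr_bits) - 1u);
}

void sim_writel(struct sim_dram *d, uint32_t val, uint32_t word_addr)
{
	d->cells[sim_index(d, word_addr)] = val;
}

uint32_t sim_readl(const struct sim_dram *d, uint32_t word_addr)
{
	return d->cells[sim_index(d, word_addr)];
}

/* Same sequence as mctl_mem_matches(), run against the model. */
bool sim_mem_matches(struct sim_dram *d, uint32_t offset)
{
	sim_writel(d, 0, 0);
	sim_writel(d, 0xaa55aa55, offset);
	return sim_readl(d, 0) == sim_readl(d, offset);
}
```

With 8 address bits wired, an offset of 256 words aliases back to word 0 and
the check reports a match, while an offset of 128 words does not.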

Ian, I can try using just a compiler barrier, but I've never done so
before, how do I insert one ?

Regards,

Hans
Andre Przywara April 22, 2016, 11:46 a.m. UTC | #5
Hi Hans,

thanks for the information and the heads up!

On 22/04/16 11:48, Hans de Goede wrote:
> Hi,
> 
> On 22-04-16 11:32, Ian Campbell wrote:
>> On Fri, 2016-04-15 at 09:34 +0200, Hans de Goede wrote:
>>>> I wonder if what you are observing could be possibly explained by just
>>>> a usual data corruption problem? Which may be happening when the DRAM
>>>> clock speed is set higher than this particular device is able to handle
>>>> in a reliable way. Inserting just one or more NOP instructions instead
>>>> of the barrier could possibly change some timings too.
>>>>
>>>> If this patch helps, then it's fine. But I wonder if it is not merely
>>>> making the problem latent instead of fixing the root cause?
>>> I do believe that this patch addresses a real problem and is not hiding
>>> some dram timing issues, I might be wrong about the write-buffer being
>>> the cause, it could simply be that the compiler is doing something bad
>>> (despite the accesses being marked as volatile)  and that the DSB stops
>>> the compiler from optimizing things too much.
>>
>> I have a _very_ vague memory of seeing something not disimilar to this
>> (apparent write buffer interactions with MMU disabled) in the early
>> days of Xen development, but that was probably on models and so may not
>> have been representative of the intended behaviour of eventual silicon.
>>
>> It might be interesting to have a look at the generated assembly and
>> see if it differs in more or less than the addition of the single
>> instruction and perhaps experiment with just a compiler barrier.
>>
>> Andre, do you have any insights on this?

Agree on the compiler barrier, frankly I don't see how this should break
with caches on or off unless the actual instruction order is wrong or
the compiler optimized something away.
Regardless of the write buffer the core should make sure the subsequent
reads return the value written before - especially if we are talking UP
here.

> 
> Andre here is the original mail/patch for reference:
> 
>     sunxi: mctl_mem_matches: Add missing memory barrier
> 
>     We are running with the caches disabled when mctl_mem_matches gets
> called,
>     but the cpu's write buffer is still there and can still get in the way,
>     add a memory barrier to fix this.
> 
>     This avoids mctl_mem_matches always returning false in some cases,
> which
>     was resulting in:
> 
> <snip>
> 
> @@ -31,6 +32,7 @@ bool mctl_mem_matches(u32 offset)
>      /* Try to write different values to RAM at two addresses */
>      writel(0, CONFIG_SYS_SDRAM_BASE);
>      writel(0xaa55aa55, (ulong)CONFIG_SYS_SDRAM_BASE + offset);
> +    DSB;
>      /* Check if the same value is actually observed when reading back */
>      return readl(CONFIG_SYS_SDRAM_BASE) ==
>             readl((ulong)CONFIG_SYS_SDRAM_BASE + offset);
> 
> 
> What this code is trying to do is determine RAM (chip) size by seeing when
> writing to RAM wrapsaround.
> 
> This works with the DSB but not without (without it always returns false)
> this is on a Cortex A7 with the mmu (and data caches) disabled.
> 
> Ian, I can try using just a compiler barrier, but I've never done so
> before, how do I insert one ?

barrier();
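
For reference, barrier() is usually the empty-asm idiom with a "memory"
clobber, as in the Linux and U-Boot compiler headers. The sketch below shows
that idiom; probe_with_barrier is a made-up function on plain pointers, used
only to show where the barrier would sit.

```c
#include <stdint.h>

/* GCC/Clang compiler barrier: emits no instruction, only constrains the compiler. */
#define barrier() __asm__ __volatile__("" : : : "memory")

/*
 * Hypothetical probe (not the U-Boot code). Returns 1 if both pointers
 * observe the same value after the two stores, 0 otherwise.
 */
int probe_with_barrier(uint32_t *a, uint32_t *b)
{
	*a = 0;
	*b = 0xaa55aa55;
	/*
	 * The compiler must complete both stores before this point and must
	 * reload *a and *b afterwards instead of reusing known values.
	 */
	barrier();
	return *a == *b;
}
```

Unlike DSB, this has no effect on the CPU or the bus, so if barrier() alone
fixes the problem the compiler is the likely culprit.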

I am busy at the moment, but will take a look later.

Cheers,
Andre.
Hans de Goede April 22, 2016, 12:09 p.m. UTC | #6
Hi,

On 22-04-16 13:46, Andre Przywara wrote:
> Hi Hans,
>
> thanks for the information and the heads up!
>
> On 22/04/16 11:48, Hans de Goede wrote:
>> Hi,
>>
>> On 22-04-16 11:32, Ian Campbell wrote:
>>> On Fri, 2016-04-15 at 09:34 +0200, Hans de Goede wrote:
>>>>> I wonder if what you are observing could be possibly explained by just
>>>>> a usual data corruption problem? Which may be happening when the DRAM
>>>>> clock speed is set higher than this particular device is able to handle
>>>>> in a reliable way. Inserting just one or more NOP instructions instead
>>>>> of the barrier could possibly change some timings too.
>>>>>
>>>>> If this patch helps, then it's fine. But I wonder if it is not merely
>>>>> making the problem latent instead of fixing the root cause?
>>>> I do believe that this patch addresses a real problem and is not hiding
>>>> some dram timing issues, I might be wrong about the write-buffer being
>>>> the cause, it could simply be that the compiler is doing something bad
>>>> (despite the accesses being marked as volatile)  and that the DSB stops
>>>> the compiler from optimizing things too much.
>>>
>>> I have a _very_ vague memory of seeing something not disimilar to this
>>> (apparent write buffer interactions with MMU disabled) in the early
>>> days of Xen development, but that was probably on models and so may not
>>> have been representative of the intended behaviour of eventual silicon.
>>>
>>> It might be interesting to have a look at the generated assembly and
>>> see if it differs in more or less than the addition of the single
>>> instruction and perhaps experiment with just a compiler barrier.
>>>
>>> Andre, do you have any insights on this?
>
> Agree on the compiler barrier, frankly I don't see how this should break
> with caches on or off unless the actual instruction order is wrong or
> the compiler optimized something away.
> Regardless of the write buffer the core should make sure the subsequent
> reads return the value written before - especially if we are talking UP
> here.

"the core should make sure the subsequent reads return the value written before"
that is exactly the problem, we are writing 2 different values
to DRAM_BASE and DRAM_BASE + 512MiB, then read them both back
and compare them, expecting them to be the same (both reads returning
the last written value) if the ramsize is 512MiB (this is used in several places
in the dram controller code to auto-config number of rows, columns, etc.).

But the core seems to just return the last written value,
rather than actually going out to the RAM and reading it from
there, which results in the function always returning false
(i.o.w. it claims no DRAM phys address wraparound is happening
  at 512MiB).

The DSB seems to fix this, but it might very well be the
compiler being too clever (although all accesses are done
through volatile pointers, so it really should not).

I'll try the barrier() fix when I've some time.

Regards,

Hans




>
>>
>> Andre here is the original mail/patch for reference:
>>
>>      sunxi: mctl_mem_matches: Add missing memory barrier
>>
>>      We are running with the caches disabled when mctl_mem_matches gets
>> called,
>>      but the cpu's write buffer is still there and can still get in the way,
>>      add a memory barrier to fix this.
>>
>>      This avoids mctl_mem_matches always returning false in some cases,
>> which
>>      was resulting in:
>>
>> <snip>
>>
>> @@ -31,6 +32,7 @@ bool mctl_mem_matches(u32 offset)
>>       /* Try to write different values to RAM at two addresses */
>>       writel(0, CONFIG_SYS_SDRAM_BASE);
>>       writel(0xaa55aa55, (ulong)CONFIG_SYS_SDRAM_BASE + offset);
>> +    DSB;
>>       /* Check if the same value is actually observed when reading back */
>>       return readl(CONFIG_SYS_SDRAM_BASE) ==
>>              readl((ulong)CONFIG_SYS_SDRAM_BASE + offset);
>>
>>
>> What this code is trying to do is determine RAM (chip) size by seeing when
>> writing to RAM wrapsaround.
>>
>> This works with the DSB but not without (without it always returns false)
>> this is on a Cortex A7 with the mmu (and data caches) disabled.
>>
>> Ian, I can try using just a compiler barrier, but I've never done so
>> before, how do I insert one ?
>
> barrier();
>
> I am busy at the moment, but will take a look later.
>
> Cheers,
> Andre.
>
Andre Przywara April 22, 2016, 1:12 p.m. UTC | #7
Hi,

On 22/04/16 13:09, Hans de Goede wrote:
> Hi,
> 
> On 22-04-16 13:46, Andre Przywara wrote:
>> Hi Hans,
>>
>> thanks for the information and the heads up!
>>
>> On 22/04/16 11:48, Hans de Goede wrote:
>>> Hi,
>>>
>>> On 22-04-16 11:32, Ian Campbell wrote:
>>>> On Fri, 2016-04-15 at 09:34 +0200, Hans de Goede wrote:
>>>>>> I wonder if what you are observing could be possibly explained by
>>>>>> just
>>>>>> a usual data corruption problem? Which may be happening when the DRAM
>>>>>> clock speed is set higher than this particular device is able to
>>>>>> handle
>>>>>> in a reliable way. Inserting just one or more NOP instructions
>>>>>> instead
>>>>>> of the barrier could possibly change some timings too.
>>>>>>
>>>>>> If this patch helps, then it's fine. But I wonder if it is not merely
>>>>>> making the problem latent instead of fixing the root cause?
>>>>> I do believe that this patch addresses a real problem and is not
>>>>> hiding
>>>>> some dram timing issues, I might be wrong about the write-buffer being
>>>>> the cause, it could simply be that the compiler is doing something bad
>>>>> (despite the accesses being marked as volatile)  and that the DSB
>>>>> stops
>>>>> the compiler from optimizing things too much.
>>>>
>>>> I have a _very_ vague memory of seeing something not disimilar to this
>>>> (apparent write buffer interactions with MMU disabled) in the early
>>>> days of Xen development, but that was probably on models and so may not
>>>> have been representative of the intended behaviour of eventual silicon.
>>>>
>>>> It might be interesting to have a look at the generated assembly and
>>>> see if it differs in more or less than the addition of the single
>>>> instruction and perhaps experiment with just a compiler barrier.
>>>>
>>>> Andre, do you have any insights on this?
>>
>> Agree on the compiler barrier, frankly I don't see how this should break
>> with caches on or off unless the actual instruction order is wrong or
>> the compiler optimized something away.
>> Regardless of the write buffer the core should make sure the subsequent
>> reads return the value written before - especially if we are talking UP
>> here.
> 
> "the core should make sure the subsequent reads return the value written
> before"
> that is exactly the problem, we are writing 2 different values
> to so DRAM_BASE and DRAM_BASE + 512MiB, then read them both back
> and compare them, expecting them to be the same (both reads returning
> the last written value) if the ramsize is 512MiB (this is used in
> several places
> in the dram controller code to auto-config number of rows, columns, etc.).
> 
> But the core seems to just return the last written value,
> rather then actually going out to the RAM and reading it from
> there, which results in the function always returning false
> (i.o.w. it claims no DRAM phys address wraparound is happening
>  at 512MiB).

Oh, right, I missed that part, sorry.
So this is about physical aliasing?
The DRAM controller has only n address lines connected, and changing a
line >n shouldn't make a difference, right?
And the write succeeds and does trigger an asynchronous abort?

In this case you would indeed need some kind of "flushing", with caches
on I'd say a DCCIMVAC (Clean and Invalidate data or unified cache line
by MVA to PoC).

So I did a quick poll around the office and people say that "dsb" is the
right thing to do here (with MMU off).
As this is backed by practical experience, I'd just say: good to go!


> The DSB seems to fix this, but it might very well be the
> compiler being to clever (although all accesses are done
> through volatile pointers, so it really should not).

Plus those writel and readl macros already have a compiler barrier,
though on the "wrong" side for our purpose (before the write and after
the read).
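
To make the barrier placement concrete, here is a simplified sketch. The
sketch_* names are hypothetical stand-ins and not the real U-Boot macros; the
only thing taken from the discussion is that the barrier sits before the store
and after the load.

```c
#include <stdint.h>

#define compiler_barrier() __asm__ __volatile__("" : : : "memory")

/* Stand-in for writel(): barrier first, then the volatile store. */
static inline void sketch_writel(uint32_t val, volatile uint32_t *addr)
{
	compiler_barrier();
	*addr = val;
}

/* Stand-in for readl(): volatile load first, then the barrier. */
static inline uint32_t sketch_readl(const volatile uint32_t *addr)
{
	uint32_t val = *addr;

	compiler_barrier();
	return val;
}

/* Same access sequence as mctl_mem_matches(), on caller-supplied addresses. */
int sketch_mem_matches(volatile uint32_t *base, volatile uint32_t *alias)
{
	sketch_writel(0, base);
	sketch_writel(0xaa55aa55, alias);
	/*
	 * Nothing separates the last store from the first load here except
	 * the volatile qualifiers; this is where the patch adds DSB.
	 */
	return sketch_readl(base) == sketch_readl(alias);
}
```

Volatile accesses are still kept in program order relative to each other by
the compiler, which is one more reason to suspect the hardware rather than the
compiler.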

Cheers,
Andre.

> 
> I'll try the barrier() fix when I've some time.
> 
> Regards,
> 
> Hans
> 
> 
> 
> 
>>
>>>
>>> Andre here is the original mail/patch for reference:
>>>
>>>      sunxi: mctl_mem_matches: Add missing memory barrier
>>>
>>>      We are running with the caches disabled when mctl_mem_matches gets
>>> called,
>>>      but the cpu's write buffer is still there and can still get in
>>> the way,
>>>      add a memory barrier to fix this.
>>>
>>>      This avoids mctl_mem_matches always returning false in some cases,
>>> which
>>>      was resulting in:
>>>
>>> <snip>
>>>
>>> @@ -31,6 +32,7 @@ bool mctl_mem_matches(u32 offset)
>>>       /* Try to write different values to RAM at two addresses */
>>>       writel(0, CONFIG_SYS_SDRAM_BASE);
>>>       writel(0xaa55aa55, (ulong)CONFIG_SYS_SDRAM_BASE + offset);
>>> +    DSB;
>>>       /* Check if the same value is actually observed when reading
>>> back */
>>>       return readl(CONFIG_SYS_SDRAM_BASE) ==
>>>              readl((ulong)CONFIG_SYS_SDRAM_BASE + offset);
>>>
>>>
>>> What this code is trying to do is determine RAM (chip) size by seeing
>>> when
>>> writing to RAM wrapsaround.
>>>
>>> This works with the DSB but not without (without it always returns
>>> false)
>>> this is on a Cortex A7 with the mmu (and data caches) disabled.
>>>
>>> Ian, I can try using just a compiler barrier, but I've never done so
>>> before, how do I insert one ?
>>
>> barrier();
>>
>> I am busy at the moment, but will take a look later.
>>
>> Cheers,
>> Andre.
>>
>
Ian Campbell April 22, 2016, 1:20 p.m. UTC | #8
On Fri, 2016-04-22 at 14:12 +0100, Andre Przywara wrote:
> Hi,
> 
> On 22/04/16 13:09, Hans de Goede wrote:
> > 
> > Hi,
> > 
> > On 22-04-16 13:46, Andre Przywara wrote:
> > > 
> > > Hi Hans,
> > > 
> > > thanks for the information and the heads up!
> > > 
> > > On 22/04/16 11:48, Hans de Goede wrote:
> > > > 
> > > > Hi,
> > > > 
> > > > On 22-04-16 11:32, Ian Campbell wrote:
> > > > > 
> > > > > On Fri, 2016-04-15 at 09:34 +0200, Hans de Goede wrote:
> > > > > > 
> > > > > > > 
> > > > > > > I wonder if what you are observing could be possibly
> > > > > > > explained by
> > > > > > > just
> > > > > > > a usual data corruption problem? Which may be happening
> > > > > > > when the DRAM
> > > > > > > clock speed is set higher than this particular device is
> > > > > > > able to
> > > > > > > handle
> > > > > > > in a reliable way. Inserting just one or more NOP
> > > > > > > instructions
> > > > > > > instead
> > > > > > > of the barrier could possibly change some timings too.
> > > > > > > 
> > > > > > > If this patch helps, then it's fine. But I wonder if it
> > > > > > > is not merely
> > > > > > > making the problem latent instead of fixing the root
> > > > > > > cause?
> > > > > > I do believe that this patch addresses a real problem and
> > > > > > is not
> > > > > > hiding
> > > > > > some dram timing issues, I might be wrong about the write-
> > > > > > buffer being
> > > > > > the cause, it could simply be that the compiler is doing
> > > > > > something bad
> > > > > > (despite the accesses being marked as volatile)  and that
> > > > > > the DSB
> > > > > > stops
> > > > > > the compiler from optimizing things too much.
> > > > > I have a _very_ vague memory of seeing something not
> > > > > disimilar to this
> > > > > (apparent write buffer interactions with MMU disabled) in the
> > > > > early
> > > > > days of Xen development, but that was probably on models and
> > > > > so may not
> > > > > have been representative of the intended behaviour of
> > > > > eventual silicon.
> > > > > 
> > > > > It might be interesting to have a look at the generated
> > > > > assembly and
> > > > > see if it differs in more or less than the addition of the
> > > > > single
> > > > > instruction and perhaps experiment with just a compiler
> > > > > barrier.
> > > > > 
> > > > > Andre, do you have any insights on this?
> > > Agree on the compiler barrier, frankly I don't see how this
> > > should break
> > > with caches on or off unless the actual instruction order is
> > > wrong or
> > > the compiler optimized something away.
> > > Regardless of the write buffer the core should make sure the
> > > subsequent
> > > reads return the value written before - especially if we are
> > > talking UP
> > > here.
> > "the core should make sure the subsequent reads return the value
> > written
> > before"
> > that is exactly the problem, we are writing 2 different values
> > to so DRAM_BASE and DRAM_BASE + 512MiB, then read them both back
> > and compare them, expecting them to be the same (both reads
> > returning
> > the last written value) if the ramsize is 512MiB (this is used in
> > several places
> > in the dram controller code to auto-config number of rows, columns,
> > etc.).
> > 
> > But the core seems to just return the last written value,
> > rather then actually going out to the RAM and reading it from
> > there, which results in the function always returning false
> > (i.o.w. it claims no DRAM phys address wraparound is happening
> >  at 512MiB).
> Oh, right, I missed that part, sorry.
> So this is about physical aliasing?
> The DRAM controller has only n address lines connected, and changing a
> line >n shouldn't make a difference, right?

Correct, it's a technique used to try and size the DRAM by determining n
by observation of the aliasing patterns.
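
As a sketch of how such a probe can be turned into a size, assuming a
power-of-two search (the actual sunxi DRAM code uses the check to pick rows,
columns and so on, and may be structured differently). detect_size, probe_fn
and fake_probe are hypothetical names; fake_probe stands in for
mctl_mem_matches() on real hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Probe callback: true if `offset` aliases back to the base address. */
typedef bool (*probe_fn)(uint32_t offset, void *ctx);

/*
 * Return the smallest power-of-two offset in [min, max) at which aliasing
 * is observed, or max if none is seen. `min` must be a nonzero power of two
 * and `max` must not exceed 2^31.
 */
uint32_t detect_size(probe_fn probe, void *ctx, uint32_t min, uint32_t max)
{
	if (min == 0)
		return 0;

	for (uint32_t size = min; size < max; size <<= 1)
		if (probe(size, ctx))
			return size;

	return max;
}

/* Test double: pretends the chip wraps every *ctx units. */
bool fake_probe(uint32_t offset, void *ctx)
{
	uint32_t real_size = *(uint32_t *)ctx;

	return offset % real_size == 0;
}
```

With a fake chip that wraps every 512 units, probing 64, 128 and 256 finds no
alias and 512 is returned as the size.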

> And the write succeeds and does trigger an asynchronous abort?

s/does/doesn't/ here: the write succeeds and does not trigger an
asynchronous abort.

> So I did a quick poll around the office and people say that "dsb" is the
> right thing to do here (with MMU off).
> As this is backed by practical experience, I'd just say: good to go!

Patch therefore:

Acked-by: Ian Campbell <ijc@hellion.org.uk>

Ian.
Patch

diff --git a/arch/arm/mach-sunxi/dram_helpers.c b/arch/arm/mach-sunxi/dram_helpers.c
index 50318d2..e0c823a 100644
--- a/arch/arm/mach-sunxi/dram_helpers.c
+++ b/arch/arm/mach-sunxi/dram_helpers.c
@@ -7,6 +7,7 @@ 
  */
 
 #include <common.h>
+#include <asm/armv7.h>
 #include <asm/io.h>
 #include <asm/arch/dram.h>
 
@@ -31,6 +32,7 @@  bool mctl_mem_matches(u32 offset)
 	/* Try to write different values to RAM at two addresses */
 	writel(0, CONFIG_SYS_SDRAM_BASE);
 	writel(0xaa55aa55, (ulong)CONFIG_SYS_SDRAM_BASE + offset);
+	DSB;
 	/* Check if the same value is actually observed when reading back */
 	return readl(CONFIG_SYS_SDRAM_BASE) ==
 	       readl((ulong)CONFIG_SYS_SDRAM_BASE + offset);