
[U-Boot] ARM926: Add mb to the cache invalidate/flush

Message ID 1349822669-26274-1-git-send-email-marex@denx.de
State Changes Requested
Delegated to: Albert ARIBAUD

Commit Message

Marek Vasut Oct. 9, 2012, 10:44 p.m. UTC
Add memory barrier to cache invalidate and flush calls.

Signed-off-by: Marek Vasut <marex@denx.de>
CC: Albert Aribaud <albert.u.boot@aribaud.net>
Cc: Fabio Estevam <festevam@gmail.com>
Cc: Otavio Salvador <otavio@ossystems.com.br>
Cc: Stefano Babic <sbabic@denx.de>
---
 arch/arm/cpu/arm926ejs/cache.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

Comments

Albert ARIBAUD Oct. 11, 2012, 5:31 a.m. UTC | #1
Hi Marek,

On Wed, 10 Oct 2012 00:44:29 +0200, Marek Vasut <marex@denx.de> wrote:

> Add memory barrier to cache invalidate and flush calls.

Memory barrier...

"You keep using that word. I do not think it means what you think it
means." :)

A memory barrier's effect is only that all of the volatile accesses
placed before it in the source code finish when the barrier executes,
and that none of the volatile accesses placed after it in the source
code starts before the barrier has executed.

Non-volatile accesses are not guaranteed to stay on one side of the
barrier, and the barrier itself is not guaranteed to stay put during
optimizations.
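The construct under debate is GCC's compiler-only barrier: an empty `asm volatile` with a "memory" clobber. The sketch below is a minimal illustration, not code from the patch. The names `shared_buf`, `ready` and `fill_then_mark()` are hypothetical. It follows how the GCC documentation describes the clobber:

- register copies of memory the asm could reach are flushed before it;
- such memory is reloaded after it.

It emits no instruction, so it has no effect on caches or the write buffer.

```c
#include <assert.h>

/* Compiler-only barrier: emits no CPU instruction at all. */
#define compiler_barrier() __asm__ __volatile__("" : : : "memory")

/* Hypothetical buffer and flag; neither is declared volatile. */
int shared_buf[4];
int ready;

void fill_then_mark(void)
{
	for (int i = 0; i < 4; i++)
		shared_buf[i] = i * 10;
	/* GCC must emit the stores to shared_buf before this point and may
	 * not move the store to ready above it. This orders the compiler's
	 * output only; the CPU, caches and write buffer are unaffected. */
	compiler_barrier();
	ready = 1;
}
```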

If what you intended was to ensure that e.g. all writes be finished
when a flush occurs, or that no read happens before an invalidate has
executed, then adding memory clobbers is not an adequate solution.

If you were aiming for something else entirely, please don't hesitate
to develop a description of the problem you wish to solve.

Amicalement,
Marek Vasut Oct. 11, 2012, 12:09 p.m. UTC | #2
Dear Albert ARIBAUD,

> Hi Marek,
> 
> On Wed, 10 Oct 2012 00:44:29 +0200, Marek Vasut <marex@denx.de> wrote:
> > Add memory barrier to cache invalidate and flush calls.
> 
> Memory barrier...
> 
> "You keep using that word. I do not think it means what you think it
> means." :)
> 
> A memory barrier's effect is only that all of the volatile accesses
> placed before it in the source code finish when the barrier executes,
> and that none of the volatile accesses placed after it in the source
> code starts before the barrier has executed.
> 
> Non-volatile accesses are not guaranteed to stay on one side of the
> barrier, and the barrier itself is not guaranteed to stay put during
> optimizations.
> 
> If what you intended was to ensure that e.g. all writes be finished
> when a flush occurs

Yes, that's pretty much it. To ensure that all writes to the flushed memory area 
are finished before the flushing happens.

> or that no read happens before an invalidate has
> executed, then adding memory clobbers is not an adequate solution.

What do you suggest?

> If you were aiming for something else entirely, please don't hesitate
> to develop a description of the problem you wish to solve.
> 
> Amicalement,

Best regards,
Marek Vasut
Albert ARIBAUD Oct. 11, 2012, 8:01 p.m. UTC | #3
Hi Mark,

Thanks for your example.

> My understanding of gcc is that global memory accesses are meant to 
> stay on the correct side of an asm with a "memory" clobber.  The gcc 
> manual states that if you use a memory clobber, the asm should also 
> be volatile.

Not exactly. It states that you need to add volatile if you cannot tell
where in memory your instruction will write; if you can tell (by
specifying "m" as an output of the asm) then volatile is not
needed -- simply because the compiler can tell where in memory the
write will happen, and will thus not eliminate the asm statement as
long as the destination memory is not optimized out.

> I'm not sure if adding the memory clobber is enough, but it's certainly a help.

A memory clobber can help in general, but I don't think it helps here,
and I know it does not help enough in the patch's case.

> Regards,
> 
> Mark M.

Amicalement,
Albert ARIBAUD Oct. 11, 2012, 8:03 p.m. UTC | #4
Hi Scott,

On Thu, 11 Oct 2012 13:03:13 -0500, Scott Wood
<scottwood@freescale.com> wrote:

> On 10/11/2012 12:31:46 AM, Albert ARIBAUD wrote:
> > Hi Marek,
> > 
> > On Wed, 10 Oct 2012 00:44:29 +0200, Marek Vasut <marex@denx.de> wrote:
> > 
> > > Add memory barrier to cache invalidate and flush calls.
> > 
> > Memory barrier...
> > 
> > "You keep using that word. I do not think it means what you think it
> > means." :)
> 
> Could we wait on the condescension until your assertion of what a  
> memory clobber does and does not do is resolved?

Scott, I think you should not mistake as condescension what is just
humo(u)r. What I wrote above is a quotation from a (light, quite
humorous and above all, self-mocking) movie, meant to be read by, but in
no way directed against, Marex. Besides, since you followed the IRC
discussion, you know that the exact meanings of "memory barrier" and
of the "memory" clobber are not that easy to grasp, which makes the
quotation quite appropriate without necessarily being condescending.

> > A memory barrier's effect is only that all of the volatile accesses
> > placed before it in the source code finish when the barrier executes,
> > and that none of the volatile accesses placed after it in the source
> > code starts before the barrier has executed.
> 
> Cite from official GCC documentation please, or example code that shows  
> a problem.

http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Extended-Asm

"If your assembler instructions access memory in an unpredictable
fashion, add `memory' to the list of clobbered registers. This will
cause GCC to not keep memory values cached in registers across the
assembler instruction and not optimize stores or loads to that memory.
You will also want to add the volatile keyword if the memory affected
is not listed in the inputs or outputs of the asm, as the `memory'
clobber does not count as a side-effect of the asm".
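The quoted paragraph describes two ways of telling GCC which memory an asm touches. A minimal sketch of both follows. It uses empty asm templates, so it builds with GCC or Clang on any host, and the function names are hypothetical. It illustrates the documentation's wording only and does not settle the ordering question debated in this thread.

- Naming the memory as an "m" operand.
- Leaving it unnamed and adding the "memory" clobber.

```c
#include <assert.h>

/* Variant 1: the object is named as a "+m" (read-write memory) operand,
 * so GCC knows exactly which memory the asm may read and modify and
 * re-reads x afterwards. Being non-volatile, this asm could be deleted
 * if its output were never used. */
int touch_listed(int x)
{
	__asm__("" : "+m"(x));
	return x;
}

/* Variant 2: the memory is not named; the "memory" clobber covers it.
 * The clobber is not a side effect, so volatile is spelled out. (Current
 * GCC documentation also treats an asm with no outputs as implicitly
 * volatile.) */
int counter;

int touch_clobbered(void)
{
	counter = 5;
	__asm__ __volatile__("" : : : "memory");
	return counter;	/* reloaded from memory after the asm */
}
```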

> We've used memory barriers like this all the time.  It works and is
> standard practice.  If it doesn't work like that it needs to be fixed.

I have used memory barriers too, and I've already seen some weird
things happening because they were used in ways that did not match
their effects. In particular, we did not use memory clobbers on cache
flush or invalidate operations, we used them on actual barrier
operations -- dsb, dmb and their cp15 incarnations.

> That AVR/ARM example you showed on IRC is special because it's calling  
> a libgcc function and GCC knows that the function doesn't access memory  
> (loading constant data for the argument doesn't count).  I couldn't get  
> the same thing to happen with a normal function, even when declared  
> with __attribute__((const)).  Yes, it's a problem for ordering code in  
> general and thus keeping slow stuff out of critical sections, but it  
> shouldn't be a problem for ordering memory accesses.

Can you *guarantee* that no valid C code will ever let a non-volatile
write slip across a memory clobber?

Memory clobbers do not guarantee this, at least not explicitly in their
description, whereas C sequence points do. For instance, the call to a
function is a sequence point, reached only after its arguments have
been evaluated.

> -Scott

Amicalement,
Scott Wood Oct. 11, 2012, 9:09 p.m. UTC | #5
On 10/11/2012 03:01:32 PM, Albert ARIBAUD wrote:
> Hi Mark,
> 
> Thanks for your example.
> 
> > My understanding of gcc is that global memory accesses are meant to
> > stay on the correct side of an asm with a "memory" clobber.  The gcc
> > manual states that if you use a memory clobber, the asm should also
> > be volatile.
> 
> Not exactly. It states that you need to add volatile if you cannot tell
> where in memory your instruction will write; if you can tell (by
> specifying "m" as an output of the asm) then volatile is not
> needed -- simply because the compiler can tell where in memory the
> write will happen, and will thus not eliminate the asm statement as
> long as the destination memory is not optimized out.

You're confusing the part about adding volatile to the asm statement to
keep it from being completely removed with anything to do with
ordering or clobbers.

-Scott
Albert ARIBAUD Oct. 11, 2012, 10:44 p.m. UTC | #6
Hi Scott,

On Thu, 11 Oct 2012 16:09:28 -0500, Scott Wood
<scottwood@freescale.com> wrote:

> On 10/11/2012 03:01:32 PM, Albert ARIBAUD wrote:
> > Hi Mark,
> > 
> > Thanks for your example.
> > 
> > > My understanding of gcc is that global memory accesses are meant to
> > > stay on the correct side of an asm with a "memory" clobber.  The gcc
> > > manual states that if you use a memory clobber, the asm should also
> > > be volatile.
> > 
> > Not exactly. It states that you need to add volatile if you cannot tell
> > where in memory your instruction will write; if you can tell (by
> > specifying "m" as an output of the asm) then volatile is not
> > needed -- simply because the compiler can tell where in memory the
> > write will happen, and will thus not eliminate the asm statement as
> > long as the destination memory is not optimized out.
> 
> You're confusing the part about adding volatile to the asm statement to
> keep it from being completely removed with anything to do with
> ordering or clobbers.

*Please* stop pretending I am saying things I haven't.

All I said in the above answer is that contrary to what Mark said about
the doc, it does not require adding volatile to memory clobbers: 'You
will also want to add the volatile keyword if the memory affected is
not listed in the inputs or outputs of the asm, as the `memory' clobber
does not count as a side-effect of the asm'.

*Nowhere* in my answer above did I state anything related to ordering.

> -Scott

Amicalement,
Albert ARIBAUD Oct. 11, 2012, 11:37 p.m. UTC | #7
Hi Scott,

On Thu, 11 Oct 2012 15:21:28 -0500, Scott Wood
<scottwood@freescale.com> wrote:

> > Scott, I think you should not mistake as condescension what is just
> > humo(u)r. What I wrote above is a quotation from a (light, quite
> > humorous and above all, self-mocking) movie, meant to be read by, but in
> > no way directed against, Marex.
> 
> Sure, it just seemed an odd way to resume a conversation that had  
> already begun on the topic.

It felt odd to you obviously. But does oddity imply condescension?
Seems to me like you're assuming a lot on what or how I think.

> > > > A memory barrier's effect is only that all of the volatile accesses
> > > > placed before it in the source code finish when the barrier executes,
> > > > and that none of the volatile accesses placed after it in the source
> > > > code starts before the barrier has executed.
> > >
> > > Cite from official GCC documentation please, or example code that shows
> > > a problem.
> > > a problem.
> > 
> > http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Extended-Asm
> > 
> > "If your assembler instructions access memory in an unpredictable
> > fashion, add `memory' to the list of clobbered registers. This will
> > cause GCC to not keep memory values cached in registers across the
> > assembler instruction and not optimize stores or loads to that memory.
> > You will also want to add the volatile keyword if the memory affected
> > is not listed in the inputs or outputs of the asm, as the `memory'
> > clobber does not count as a side-effect of the asm".
> 
> "and not optimize stores or loads to that memory".  It's not clear what  
> "that" refers to, since the memory clobber does not refer to specific  
> memory, but given that the purpose is "if your assembler instructions  
> access memory in an unpredictable fashion", I don't see how it could be  
> interpreted as anything other than "any memory which could possibly be  
> modified by the program".  So it excludes constant data, but that's  
> about it.

It does not necessarily include "all memory". Besides, "that" -- to me
-- clearly means "the memory mentioned in the statement", for instance in
the inputs or outputs.

> The only reference to volatile is to tell you to add it to the asm  
> statement (not to other memory accesses) so that the asm statement does  
> not get removed altogether.

The memory clobber definition says no memory values are kept cached
in registers across the instruction. That implies that if a volatile
access was prepared (a memory value was cached in a register), it is
finished before the asm statement executes. Similarly, since the
description says "across" the instruction, volatile reads or writes
located after the instruction won't have been started before the
instruction executes, or they would have needed to cache the value,
which is contrary to the memory clobber definition.
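The register-caching point shows up clearly in the classic polling loop. A minimal sketch follows. It assumes a hypothetical flag `done_flag` that something GCC cannot see changes, such as an interrupt handler or a DMA engine. The "memory" clobber forbids keeping the value in a register across the asm, so the flag is re-read on every iteration even though it is not declared volatile.

```c
#include <assert.h>

#define compiler_barrier() __asm__ __volatile__("" : : : "memory")

/* Hypothetical flag changed behind the compiler's back; NOT volatile. */
int done_flag;

/* Polls done_flag at most max_spins times; returns the iterations run. */
unsigned wait_for_done(unsigned max_spins)
{
	unsigned spins = 0;

	while (!done_flag && spins < max_spins) {
		/* done_flag may not be kept in a register across this point,
		 * so the loop condition reloads it from memory each time. */
		compiler_barrier();
		spins++;
	}
	return spins;
}
```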

This is not to say that only volatiles are affected by the barrier; but
volatiles certainly are.

> > > We've used memory barriers like this all the time.  It works and is
> > > standard practice.  If it doesn't work like that it needs to be fixed.
> > 
> > I have used memory barriers too, and I've already seen some weird
> > things happening because they were used in ways that did not match
> > their effects. Particularly, we did not use memory clobbers on cache
> > flush or invalidate operations, we used them on actual barrier
> > operations -- dsb, dmb and their cp15 incarnations.
> 
> What specifically did you see the compiler do?

I'll have to look through the repositories. I'll keep you posted once I
find the examples again.

> > > That AVR/ARM example you showed on IRC is special because it's calling
> > > a libgcc function and GCC knows that the function doesn't access memory
> > > (loading constant data for the argument doesn't count).  I couldn't get
> > > the same thing to happen with a normal function, even when declared
> > > with __attribute__((const)).  Yes, it's a problem for ordering code in
> > > general and thus keeping slow stuff out of critical sections, but it
> > > shouldn't be a problem for ordering memory accesses.
> > 
> > Can you *guarantee* that no valid C code will ever let a non-volatile
> > write slip across a memory clobber?
> > 
> > Memory clobbers do not guarantee this, at least not explicitly in their
> > description, whereas C sequence points do. For instance, the call to a
> > function is a sequence point, reached only after its arguments have
> > been evaluated.
> 
> I don't know GCC internals, so I personally can't guarantee anything.   
> What I know is that they're used for this purpose all over the place,  
> and if there really is a problem it needs to be fixed.  If the use in  
> this patch is wrong, then so are Linux synchronization primitives, for  
> example.  How would you make a spinlock?  Certainly you can't insist  
> that all the variables protected by the lock be volatile.
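One common answer to the spinlock question is that the lock and unlock operations themselves act as barriers, so the protected data can stay ordinary memory. The sketch below uses GCC's `__atomic_test_and_set()` and `__atomic_clear()` builtins with acquire/release ordering. It does not use Linux's per-architecture assembly and is not how Linux implements spinlocks. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock type (not Linux's arch_spinlock_t). */
typedef struct {
	bool locked;
} sketch_spinlock_t;

static inline void sketch_spin_lock(sketch_spinlock_t *l)
{
	/* Acquire: later loads/stores may not move before this point,
	 * neither in the compiler's output nor on the CPU. */
	while (__atomic_test_and_set(&l->locked, __ATOMIC_ACQUIRE))
		;	/* spin */
}

static inline void sketch_spin_unlock(sketch_spinlock_t *l)
{
	/* Unlocking a lock that is not held is a caller bug. */
	assert(__atomic_load_n(&l->locked, __ATOMIC_RELAXED));
	/* Release: earlier loads/stores may not move after this point. */
	__atomic_clear(&l->locked, __ATOMIC_RELEASE);
}

/* The protected data is plain, non-volatile memory. */
static sketch_spinlock_t counter_lock;
static long counter;

long locked_increment(void)
{
	long v;

	sketch_spin_lock(&counter_lock);
	v = ++counter;
	sketch_spin_unlock(&counter_lock);
	return v;
}
```

The acquire/release builtins constrain both the compiler and the CPU, which is why `counter` needs no volatile qualifier in this sketch.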

My point is that memory clobbers have some effect, but just because a
memory clobber appears somewhere does not mean it is responsible for
all that happens there.

Regarding Linux spinlocks vs these patches: it's not the same
situation. spinlock functions are inlined, as you noted, thus a code
sequence that takes a spinlock, does some accesses, then releases the
spinlock ends up as a long sequence of instructions. On the contrary,
the cache functions (which are not going to be inlined any time soon as
they are strong versions of weak symbols, incompatible with inlining)
contain a single asm statement, thus adding a memory clobber to this
statement won't have any effect for lack of preceding or following
instructions to (not) reorder.

> -Scott

Amicalement,
Albert ARIBAUD Oct. 13, 2012, 9:56 a.m. UTC | #8
Hi Marek,

First, a (long) preamble with some general considerations:

A. This patch does not fix an actual issue; it is a prospective patch,
modifying code which so far has not malfunctioned, or at least has not
been reported to malfunction.

B. My comments on the patch below are based on the general consideration
that the effect of a memory clobber is to constrain the reordering of
statements around the clobbering. For the sake of simplicity -- and
serenity :) -- my comments are also made under the assumption that the
clobber prevents any access (volatile or not, write or read) from
crossing it.

C. Another general comment is that adding a clobber to instructions other
than barriers is IMO not a good thing and isb() should be used instead,
for two reasons:

1) it mixes an implicit secondary purpose into a statement written for
another, explicit purpose; this can drown the implicit purpose into
oblivion, when it should actually be emphasized, which is the goal and
effect of isb();

2) it mixes the ends and the means. The end of your patch is to
put instruction barriers between statements so that their relative
order is preserved during optimization; adding "memory" to the clobber
list of an asm instruction that happens to be one of the statements is
a means, but so is isb() with the added benefit that using isb() allows
architectures to use whatever means (memory clobber, specialized
instruction, other) are best for the arch.

D. My comments on the patch below are based on the current source code.

One could argue that this may change if the function becomes inline.
While this is true, I do not consider this right now because 1) the
function is a strong replacement of a weak symbol, which AFAIU is not
compatible with inlining, and 2) this patch is against the current
tree, not a potential future tree. If/when a patch arrives to make the
function inline, we'll consider the implications and how to solve them
*then*.

There may be another impact on this function, namely LTO; but -- again,
AFAIU -- LTO does not rewrite the inside of functions; it only
optimizes between functions. And again, we'll deal with LTO if/when
patches for LTO get submitted.

... and, there may be future changes not imagined here that will break
things. We'll deal with them then.

On Wed, 10 Oct 2012 00:44:29 +0200, Marek Vasut <marex@denx.de> wrote:

> Add memory barrier to cache invalidate and flush calls.
> 
> Signed-off-by: Marek Vasut <marex@denx.de>
> CC: Albert Aribaud <albert.u.boot@aribaud.net>
> Cc: Fabio Estevam <festevam@gmail.com>
> Cc: Otavio Salvador <otavio@ossystems.com.br>
> Cc: Stefano Babic <sbabic@denx.de>
> ---
>  arch/arm/cpu/arm926ejs/cache.c |   10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm/cpu/arm926ejs/cache.c b/arch/arm/cpu/arm926ejs/cache.c
> index 2740ad7..1c67608 100644
> --- a/arch/arm/cpu/arm926ejs/cache.c
> +++ b/arch/arm/cpu/arm926ejs/cache.c
> @@ -30,7 +30,7 @@
>  
>  void invalidate_dcache_all(void)
>  {
> -	asm volatile("mcr p15, 0, %0, c7, c6, 0\n" : : "r"(0));
> +	asm volatile("mcr p15, 0, %0, c7, c6, 0\n" : : "r"(0) : "memory");
>  }

This one is useless since there are no accesses in the function to be
reordered.

>  void flush_dcache_all(void)
> @@ -67,7 +67,8 @@ void invalidate_dcache_range(unsigned long start, unsigned long stop)
>  		return;
>  
>  	while (start < stop) {
> -		asm volatile("mcr p15, 0, %0, c7, c6, 1\n" : : "r"(start));
> +		asm volatile("mcr p15, 0, %0, c7, c6, 1\n"
> +				: : "r"(start) : "memory");
>  		start += CONFIG_SYS_CACHELINE_SIZE;
>  	}
>  }

This one is useless too, as the only access it could constrain is the
one affecting start, which is also affected by the would-be-clobbered
statement (and the enclosing while's condition, thus already preventing
the compiler from reordering).

> @@ -78,11 +79,12 @@ void flush_dcache_range(unsigned long start, unsigned long stop)
>  		return;
>  
>  	while (start < stop) {
> -		asm volatile("mcr p15, 0, %0, c7, c14, 1\n" : : "r"(start));
> +		asm volatile("mcr p15, 0, %0, c7, c14, 1\n"
> +				: : "r"(start) : "memory");
>  		start += CONFIG_SYS_CACHELINE_SIZE;
>  	}
  
Here again, the only access the clobber could constrain is the one
affecting start, which is also affected by the would-be-clobbered
statement (and the enclosing while's condition, thus already preventing
the compiler from reordering).

> -	asm volatile("mcr p15, 0, %0, c7, c10, 4\n" : : "r"(0));
> +	asm volatile("mcr p15, 0, %0, c7, c10, 4\n" : : "r"(0) : "memory");
>  }

Now this asm statement might potentially move around as it does not have
input or output dependencies that the compiler could possibly use to
assess ordering constraints. I would thus suggest replacing the memory
clobber with an 'isb();' placed on the line before the asm volatile,
for the reasons indicated in part C of my (long) preamble above.
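Applied to flush_dcache_range(), the suggestion amounts roughly to the sketch below. None of the following assumptions comes from the patch:

- The barrier is a compiler-only `order_barrier()` standing in for `isb()`, whose real definition is arch-specific.
- The alignment test stands in for the check elided from the diff context.
- A 32-byte line is assumed when CONFIG_SYS_CACHELINE_SIZE is not set.
- Unless the hypothetical SKETCH_ON_TARGET is defined, the CP15 operations are replaced by host stubs that count calls, so the loop can be exercised off-target.

```c
#include <assert.h>

#ifndef CONFIG_SYS_CACHELINE_SIZE
#define CONFIG_SYS_CACHELINE_SIZE 32	/* assumed line size */
#endif

#ifdef SKETCH_ON_TARGET	/* hypothetical switch for an ARM926 build */
/* Clean and invalidate one D-cache line by address (as in the patch). */
static inline void clean_inval_dline(unsigned long addr)
{
	__asm__ __volatile__("mcr p15, 0, %0, c7, c14, 1\n" : : "r"(addr));
}

/* Drain the write buffer (as in the patch). */
static inline void drain_write_buffer(void)
{
	__asm__ __volatile__("mcr p15, 0, %0, c7, c10, 4\n" : : "r"(0));
}
#else
/* Host stubs: count calls instead of touching CP15. */
static unsigned long lines_flushed, drains;

static void clean_inval_dline(unsigned long addr)
{
	(void)addr;
	lines_flushed++;
}

static void drain_write_buffer(void)
{
	drains++;
}
#endif

/* Compiler-only ordering point standing in for isb(); assumed definition. */
#define order_barrier() __asm__ __volatile__("" : : : "memory")

void flush_dcache_range(unsigned long start, unsigned long stop)
{
	/* Assumed stand-in for the range check elided from the diff. */
	if ((start | stop) & (CONFIG_SYS_CACHELINE_SIZE - 1))
		return;

	while (start < stop) {
		clean_inval_dline(start);
		start += CONFIG_SYS_CACHELINE_SIZE;
	}

	/* The compiler may not move memory accesses across this point. */
	order_barrier();
	drain_write_buffer();
}
```

The barrier only constrains the compiler. On the ARM926EJ-S, completing the outstanding writes is the job of the CP15 drain-write-buffer operation itself.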

>  void flush_cache(unsigned long start, unsigned long size)

Amicalement,

Patch

diff --git a/arch/arm/cpu/arm926ejs/cache.c b/arch/arm/cpu/arm926ejs/cache.c
index 2740ad7..1c67608 100644
--- a/arch/arm/cpu/arm926ejs/cache.c
+++ b/arch/arm/cpu/arm926ejs/cache.c
@@ -30,7 +30,7 @@ 
 
 void invalidate_dcache_all(void)
 {
-	asm volatile("mcr p15, 0, %0, c7, c6, 0\n" : : "r"(0));
+	asm volatile("mcr p15, 0, %0, c7, c6, 0\n" : : "r"(0) : "memory");
 }
 
 void flush_dcache_all(void)
@@ -67,7 +67,8 @@  void invalidate_dcache_range(unsigned long start, unsigned long stop)
 		return;
 
 	while (start < stop) {
-		asm volatile("mcr p15, 0, %0, c7, c6, 1\n" : : "r"(start));
+		asm volatile("mcr p15, 0, %0, c7, c6, 1\n"
+				: : "r"(start) : "memory");
 		start += CONFIG_SYS_CACHELINE_SIZE;
 	}
 }
@@ -78,11 +79,12 @@  void flush_dcache_range(unsigned long start, unsigned long stop)
 		return;
 
 	while (start < stop) {
-		asm volatile("mcr p15, 0, %0, c7, c14, 1\n" : : "r"(start));
+		asm volatile("mcr p15, 0, %0, c7, c14, 1\n"
+				: : "r"(start) : "memory");
 		start += CONFIG_SYS_CACHELINE_SIZE;
 	}
 
-	asm volatile("mcr p15, 0, %0, c7, c10, 4\n" : : "r"(0));
+	asm volatile("mcr p15, 0, %0, c7, c10, 4\n" : : "r"(0) : "memory");
 }
 
 void flush_cache(unsigned long start, unsigned long size)