diff mbox

[U-Boot,RFC] arm: Fix flush_dcache_range() on arm926

Message ID 1373416229-5746-1-git-send-email-marex@denx.de
State RFC
Delegated to: Albert ARIBAUD
Headers show

Commit Message

Marek Vasut July 10, 2013, 12:30 a.m. UTC
The flush_dcache_range() on arm926 did not work as expected on i.MX28.

This can be observed during the operation of the FEC ethernet driver
where the driver did occasionally fail with timeout trying to transmit
a frame. The FEC ethernet driver uses DMA for transmitting the frame in
the following fashion:

0) Set bits in DMA descriptor
1) Write DMA descriptor into aligned DRAM address
2) Flush D-Cache over the descriptor
3) Start DMA
4) Invalidate D-Cache over the descriptor
5) Test if certain bits in DMA descriptor are unset

Very not so often it happened that the bits in the DMA descriptor were
still set even after hardware register -- which is not cached -- indicated
otherwise.

Here I will theoreticise, Albert, can you please correct me if I'm wrong?

This leads me to believe the DMA descriptor was still in the cacheline
after being flushed in step 2) and during the DMA gets evicted into DRAM
therefore corrupting the result of readback in 5) . By reading the ARM926
datasheet DDI0198E_arm926ejs_r0p5_trm.pdf page 50 table 2-17, it is not
clear whether cacheline that is Valid+Clean will be invalidated in the
D-Cache using the "mcr p15, 0, <Rx>, c7, c14, 1" instruction or whether
only Valid+Dirty lines are cleaned+invalidated. The other thing that is
unclear to me is whether a cacheline that is Valid+Clear is written back
into DRAM when it is evicted from cache.

Interestingly enough, running invalidate_dcache_range() after doing the
flush_dcache_range() over the descriptor solved the problem and I see no
occasional timeout anymore. This confirms my opinion that the descriptor
might remain in the cache and be written back during the DMA operation.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Albert ARIBAUD <albert.u.boot@aribaud.net>
Cc: Fabio Estevam <fabio.estevam@freescale.com>
Cc: Tom Rini <trini@ti.com>
---
 arch/arm/cpu/arm926ejs/cache.c |   14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

Comments

Albert ARIBAUD July 10, 2013, 12:58 p.m. UTC | #1
Hi Marek,

On Wed, 10 Jul 2013 02:30:29 +0200, Marek Vasut <marex@denx.de> wrote:

> The flush_dcache_range() on arm926 did not work as expected on i.MX28.
> 
> This can be observed during the operation of the FEC ethernet driver
> where the driver did occasionally fail with timeout trying to transmit
> a frame. The FEC ethernet driver uses DMA for transmitting the frame in
> the following fashion:
> 
> 0) Set bits in DMA descriptor
> 1) Write DMA descriptor into aligned DRAM address
> 2) Flush D-Cache over the descriptor
> 3) Start DMA
> 4) Invalidate D-Cache over the descriptor
> 5) Test if certain bits in DMA descriptor are unset
> 
> Very not so often it happened that the bits in the DMA descriptor were
> still set even after hardware register -- which is not cached -- indicated
> otherwise.
> 
> Here I will theoreticise, Albert, can you please correct me if I'm wrong?
> 
> This leads me to believe the DMA descriptor was still in the cacheline
> after being flushed in step 2) and during the DMA gets evicted into DRAM
> therefore corrupting the result of readback in 5) . By reading the ARM926
> datasheet DDI0198E_arm926ejs_r0p5_trm.pdf page 50 table 2-17, it is not
> clear whether cacheline that is Valid+Clean will be invalidated in the
> D-Cache using the "mcr p15, 0, <Rx>, c7, c14, 1" instruction or whether
> only Valid+Dirty lines are cleaned+invalidated. The other thing that is
> unclear to me is whether a cacheline that is Valid+Clear is written back
> into DRAM when it is evicted from cache.

The way I see table 2-17, the "if valid and dirty" condition only
applies to the "write" part of the description, not to the "marked not
valid" one; this reading makes the most sense and is the most consistent
with the cache functioning as a whole.

As for a valid and clean (rather than "clear", I assume) line being
written if evicted, I think it is not. Eviction is not flushing: the
line is kicked out, plain and brutal. And even if we're talking about
flushing, if the line is clean then it is coherent with the RAM area
it caches, and writing it back there is unneeded.

> Interestingly enough, running invalidate_dcache_range() after doing the
> flush_dcache_range() over the descriptor solved the problem and I see no
> occasional timeout anymore. This confirms my opinion that the descriptor
> might remain in the cache and be written back during the DMA operation.

Can you point me to the source code location(s) where the sequence
above is implemented?

Amicalement,
Marek Vasut July 10, 2013, 1:38 p.m. UTC | #2
Hi Albert,

> Hi Marek,
> 
> On Wed, 10 Jul 2013 02:30:29 +0200, Marek Vasut <marex@denx.de> wrote:
> > The flush_dcache_range() on arm926 did not work as expected on i.MX28.
> > 
> > This can be observed during the operation of the FEC ethernet driver
> > where the driver did occasionally fail with timeout trying to transmit
> > a frame. The FEC ethernet driver uses DMA for transmitting the frame in
> > the following fashion:
> > 
> > 0) Set bits in DMA descriptor
> > 1) Write DMA descriptor into aligned DRAM address
> > 2) Flush D-Cache over the descriptor
> > 3) Start DMA
> > 4) Invalidate D-Cache over the descriptor
> > 5) Test if certain bits in DMA descriptor are unset
> > 
> > Very not so often it happened that the bits in the DMA descriptor were
> > still set even after hardware register -- which is not cached --
> > indicated otherwise.
> > 
> > Here I will theoreticise, Albert, can you please correct me if I'm wrong?
> > 
> > This leads me to believe the DMA descriptor was still in the cacheline
> > after being flushed in step 2) and during the DMA gets evicted into DRAM
> > therefore corrupting the result of readback in 5) . By reading the ARM926
> > datasheet DDI0198E_arm926ejs_r0p5_trm.pdf page 50 table 2-17, it is not
> > clear whether cacheline that is Valid+Clean will be invalidated in the
> > D-Cache using the "mcr p15, 0, <Rx>, c7, c14, 1" instruction or whether
> > only Valid+Dirty lines are cleaned+invalidated. The other thing that is
> > unclear to me is whether a cacheline that is Valid+Clear is written back
> > into DRAM when it is evicted from cache.
> 
> The way I see table 2-17, the "if valid and dirty" condition only
> applies to the "write" part of the description, not to the "marked not
> valid" one; this reading makes the most sense and is the most consistent
> with the cache functioning as a whole.

So basically all Valid+Dirty get cleaned .
Nothing is done on Valid+Clean .
Finally, as they are now all Valid+Clean , they all get Invalidated .

This is how it'd make the most sense, but I'm afraid this is not what I observe.

> As for a valid and clean (rather than "clear", I assume) line being
> written if evicted, I think it is not. Eviction is not flushing: the
> line is kicked out, plain and brutal. And even if we're talking about
> flushing, if the line is clean then it is coherent with the RAM area
> it caches, and writing it back there is unneeded.

Yes, this _does_ make full sense to me.

> > Interestingly enough, running invalidate_dcache_range() after doing the
> > flush_dcache_range() over the descriptor solved the problem and I see no
> > occasional timeout anymore. This confirms my opinion that the descriptor
> > might remain in the cache and be written back during the DMA operation.
> 
> Can you point me to the source code location(s) where the sequence
> above is implemented?

See:

 738         flush_dcache_range(addr, addr + size);

in drivers/net/fec_mxc.c . If I put invalidate_dcache_range(addr, addr + size); 
, all seems to work well. Note it also happens with multiple compilers, Debian 
gcc 4.8.0, ELDK 5.2 and ELDK 5.3 .


Best regards,
Marek Vasut
diff mbox

Patch

diff --git a/arch/arm/cpu/arm926ejs/cache.c b/arch/arm/cpu/arm926ejs/cache.c
index 2740ad7..2a8da84 100644
--- a/arch/arm/cpu/arm926ejs/cache.c
+++ b/arch/arm/cpu/arm926ejs/cache.c
@@ -72,19 +72,29 @@  void invalidate_dcache_range(unsigned long start, unsigned long stop)
 	}
 }
 
-void flush_dcache_range(unsigned long start, unsigned long stop)
+static void clean_dcache_range(unsigned long start, unsigned long stop)
 {
 	if (!check_cache_range(start, stop))
 		return;
 
 	while (start < stop) {
-		asm volatile("mcr p15, 0, %0, c7, c14, 1\n" : : "r"(start));
+		asm volatile("mcr p15, 0, %0, c7, c10, 1\n" : : "r"(start));
 		start += CONFIG_SYS_CACHELINE_SIZE;
 	}
 
 	asm volatile("mcr p15, 0, %0, c7, c10, 4\n" : : "r"(0));
 }
 
+void flush_dcache_range(unsigned long start, unsigned long stop)
+{
+	/* See DDI0198E_arm926ejs_r0p5_trm.pdf p.50 Table 2-17 */
+
+	/* Write data to RAM from cachelines that are marked Valid+Dirty. */
+	clean_dcache_range(start, stop);
+	/* Invalidate all cachelines holding this RAM area (Dirty+Clean). */
+	invalidate_dcache_range(start, stop);
+}
+
 void flush_cache(unsigned long start, unsigned long size)
 {
 	flush_dcache_range(start, start + size);