diff mbox

[U-Boot] mx6: Set shared override bit in PL310 AUX_CTRL register

Message ID 1426104732-26695-1-git-send-email-festevam@gmail.com
State Changes Requested
Delegated to: Stefano Babic
Headers show

Commit Message

Fabio Estevam March 11, 2015, 8:12 p.m. UTC
From: Fabio Estevam <fabio.estevam@freescale.com>

Having bit 22 cleared in the PL310 Auxiliary Control register (shared
attribute override enable) has the side effect of transforming Normal
Shared Non-cacheable reads into Cacheable no-allocate reads.

Coherent DMA buffers in Linux always have a Cacheable alias via the
kernel linear mapping and the processor can speculatively load cache
lines into the PL310 controller. With bit 22 cleared, Non-cacheable
reads would unexpectedly hit such cache lines leading to buffer
corruption.

This was inspired by a patch from Catalin Marinas [1] and also from recent 
discussions in the linux-arm-kernel list [2] where Russell King and Rob Herring 
suggested that bootloaders should initialize the cache. 

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2010-November/031810.html
[2] https://lkml.org/lkml/2015/2/20/199

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
---
 arch/arm/cpu/armv7/mx6/soc.c | 8 ++++++++
 arch/arm/include/asm/pl310.h | 2 ++
 2 files changed, 10 insertions(+)

Comments

Russell King - ARM Linux March 12, 2015, 1:04 a.m. UTC | #1
On Wed, Mar 11, 2015 at 05:12:12PM -0300, Fabio Estevam wrote:
> From: Fabio Estevam <fabio.estevam@freescale.com>
> 
> Having bit 22 cleared in the PL310 Auxiliary Control register (shared
> attribute override enable) has the side effect of transforming Normal
> Shared Non-cacheable reads into Cacheable no-allocate reads.
> 
> Coherent DMA buffers in Linux always have a Cacheable alias via the
> kernel linear mapping and the processor can speculatively load cache
> lines into the PL310 controller.

No, this is wrong.  They do not.  CMA remaps pages to be non-cacheable
rather than the old technique where the above statement was true.

There's some corner cases which make that less effective than it once
was, and as I've already said, those need to be fixed.  The reason
that these were missed is because all the ARM CMA work bypassed me -
CMA on ARM has had zero review from the point of view of the ARM
architecture, so it's not surprising it gets stuff like this wrong.

Once that's fixed, setting bit 22 is not necessary.
Fabio Estevam March 12, 2015, 2:27 a.m. UTC | #2
On Wed, Mar 11, 2015 at 10:04 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:

> No, this is wrong.  They do not.  CMA remaps pages to be non-cacheable
> rather than the old technique where the above statement was true.
>
> There's some corner cases which make that less effective than it once
> was, and as I've already said, those need to be fixed.  The reason
> that these were missed is because all the ARM CMA work bypassed me -
> CMA on ARM has had zero review from the point of view of the ARM
> architecture, so it's not surprising it gets stuff like this wrong.
>
> Once that's fixed, setting bit 22 is not necessary.

Understood. Thanks for the clarification.
Catalin Marinas March 12, 2015, 9:31 a.m. UTC | #3
On Thu, Mar 12, 2015 at 01:04:31AM +0000, Russell King - ARM Linux wrote:
> On Wed, Mar 11, 2015 at 05:12:12PM -0300, Fabio Estevam wrote:
> > From: Fabio Estevam <fabio.estevam@freescale.com>
> > 
> > Having bit 22 cleared in the PL310 Auxiliary Control register (shared
> > attribute override enable) has the side effect of transforming Normal
> > Shared Non-cacheable reads into Cacheable no-allocate reads.
> > 
> > Coherent DMA buffers in Linux always have a Cacheable alias via the
> > kernel linear mapping and the processor can speculatively load cache
> > lines into the PL310 controller.
> 
> No, this is wrong.  They do not.  CMA remaps pages to be non-cacheable
> rather than the old technique where the above statement was true.
> 
> There's some corner cases which make that less effective than it once
> was, and as I've already said, those need to be fixed.  The reason
> that these were missed is because all the ARM CMA work bypassed me -
> CMA on ARM has had zero review from the point of view of the ARM
> architecture, so it's not surprising it gets stuff like this wrong.
> 
> Once that's fixed, setting bit 22 is not necessary.

And I strongly disagree with your statement. Seriously, there are so
many assumption about when this bit will no longer be required like the
platform always using CMA, having fixed the CMA code in Linux. That's a
boot loader patch and even though it's used mostly (only) to load Linux,
we should not make this assumption.

Most importantly, not setting this bit makes your SoC non-compliant with
the ARM ARM clarifications on mismatched aliases. It was a hardware
mistake to have it cleared out of reset but just let firmware or
boot-loaders deal with it.

(there are some very specific use-cases for this bit that the hw guys
had in mind but none of them apply to Linux)
Catalin Marinas March 12, 2015, 9:32 a.m. UTC | #4
On Wed, Mar 11, 2015 at 08:12:12PM +0000, Fabio Estevam wrote:
> From: Fabio Estevam <fabio.estevam@freescale.com>
> 
> Having bit 22 cleared in the PL310 Auxiliary Control register (shared
> attribute override enable) has the side effect of transforming Normal
> Shared Non-cacheable reads into Cacheable no-allocate reads.
> 
> Coherent DMA buffers in Linux always have a Cacheable alias via the
> kernel linear mapping and the processor can speculatively load cache
> lines into the PL310 controller. With bit 22 cleared, Non-cacheable
> reads would unexpectedly hit such cache lines leading to buffer
> corruption.
> 
> This was inspired by a patch from Catalin Marinas [1] and also from recent 
> discussions in the linux-arm-kernel list [2] where Russell King and Rob Herring 
> suggested that bootloaders should initialize the cache. 
> 
> [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2010-November/031810.html
> [2] https://lkml.org/lkml/2015/2/20/199
> 
> Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Tom Rini March 12, 2015, 1:41 p.m. UTC | #5
On Wed, Mar 11, 2015 at 05:12:12PM -0300, Fabio Estevam wrote:

> From: Fabio Estevam <fabio.estevam@freescale.com>
> 
> Having bit 22 cleared in the PL310 Auxiliary Control register (shared
> attribute override enable) has the side effect of transforming Normal
> Shared Non-cacheable reads into Cacheable no-allocate reads.
> 
> Coherent DMA buffers in Linux always have a Cacheable alias via the
> kernel linear mapping and the processor can speculatively load cache
> lines into the PL310 controller. With bit 22 cleared, Non-cacheable
> reads would unexpectedly hit such cache lines leading to buffer
> corruption.
> 
> This was inspired by a patch from Catalin Marinas [1] and also from recent 
> discussions in the linux-arm-kernel list [2] where Russell King and Rob Herring 
> suggested that bootloaders should initialize the cache. 
> 
> [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2010-November/031810.html
> [2] https://lkml.org/lkml/2015/2/20/199
> 
> Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
> ---
>  arch/arm/cpu/armv7/mx6/soc.c | 8 ++++++++
>  arch/arm/include/asm/pl310.h | 2 ++
>  2 files changed, 10 insertions(+)
> 
> diff --git a/arch/arm/cpu/armv7/mx6/soc.c b/arch/arm/cpu/armv7/mx6/soc.c
> index ef02972..5aab305 100644
> --- a/arch/arm/cpu/armv7/mx6/soc.c
> +++ b/arch/arm/cpu/armv7/mx6/soc.c
> @@ -506,6 +506,14 @@ void v7_outer_cache_enable(void)
>  	struct pl310_regs *const pl310 = (struct pl310_regs *)L2_PL310_BASE;
>  	unsigned int val;
>  
> +
> +	/*
> +	 * Set bit 22 in the auxiliary control register. If this bit
> +	 * is cleared, PL310 treats Normal Shared Non-cacheable
> +	 * accesses as Cacheable no-allocate.
> +	 */
> +	setbits_le32(&pl310->pl310_aux_ctrl, L310_SHARED_ATT_OVERRIDE_ENABLE);
> +
>  #if defined CONFIG_MX6SL
>  	struct iomuxc *iomux = (struct iomuxc *)IOMUXC_BASE_ADDR;
>  	val = readl(&iomux->gpr[11]);

We should put this somewhere a bit more common that other A9 cores can
also call into like OMAP4, SoCFPGA and maybe zynq later (based on a
quick git grep pl310).
Fabio Estevam March 12, 2015, 1:57 p.m. UTC | #6
On Thu, Mar 12, 2015 at 10:41 AM, Tom Rini <trini@konsulko.com> wrote:

> We should put this somewhere a bit more common that other A9 cores can
> also call into like OMAP4, SoCFPGA and maybe zynq later (based on a
> quick git grep pl310).

I thought about it as well, but I didn't find a suitable common place
for putting it.

Suggestions? Thanks
Nishanth Menon March 12, 2015, 2:17 p.m. UTC | #7
On 03/11/2015 03:12 PM, Fabio Estevam wrote:
> From: Fabio Estevam <fabio.estevam@freescale.com>
> 
> Having bit 22 cleared in the PL310 Auxiliary Control register (shared
> attribute override enable) has the side effect of transforming Normal
> Shared Non-cacheable reads into Cacheable no-allocate reads.
> 
> Coherent DMA buffers in Linux always have a Cacheable alias via the
> kernel linear mapping and the processor can speculatively load cache
> lines into the PL310 controller. With bit 22 cleared, Non-cacheable
> reads would unexpectedly hit such cache lines leading to buffer
> corruption.
> 
> This was inspired by a patch from Catalin Marinas [1] and also from recent 
> discussions in the linux-arm-kernel list [2] where Russell King and Rob Herring 
> suggested that bootloaders should initialize the cache. 
> 
> [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2010-November/031810.html
> [2] https://lkml.org/lkml/2015/2/20/199
> 
> Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
> ---
>  arch/arm/cpu/armv7/mx6/soc.c | 8 ++++++++
>  arch/arm/include/asm/pl310.h | 2 ++
>  2 files changed, 10 insertions(+)
> 
> diff --git a/arch/arm/cpu/armv7/mx6/soc.c b/arch/arm/cpu/armv7/mx6/soc.c
> index ef02972..5aab305 100644
> --- a/arch/arm/cpu/armv7/mx6/soc.c
> +++ b/arch/arm/cpu/armv7/mx6/soc.c
> @@ -506,6 +506,14 @@ void v7_outer_cache_enable(void)
>  	struct pl310_regs *const pl310 = (struct pl310_regs *)L2_PL310_BASE;
>  	unsigned int val;
>  
> +
> +	/*
> +	 * Set bit 22 in the auxiliary control register. If this bit
> +	 * is cleared, PL310 treats Normal Shared Non-cacheable
> +	 * accesses as Cacheable no-allocate.
> +	 */
> +	setbits_le32(&pl310->pl310_aux_ctrl, L310_SHARED_ATT_OVERRIDE_ENABLE);
> +
>  #if defined CONFIG_MX6SL
>  	struct iomuxc *iomux = (struct iomuxc *)IOMUXC_BASE_ADDR;
>  	val = readl(&iomux->gpr[11]);
> diff --git a/arch/arm/include/asm/pl310.h b/arch/arm/include/asm/pl310.h
> index ddc245b..de7650e 100644
> --- a/arch/arm/include/asm/pl310.h
> +++ b/arch/arm/include/asm/pl310.h
> @@ -16,6 +16,8 @@
>  #define L2X0_STNDBY_MODE_EN			(1 << 0)
>  #define L2X0_CTRL_EN				1
>  
> +#define L310_SHARED_ATT_OVERRIDE_ENABLE		(1 << 22)
> +
>  struct pl310_regs {
>  	u32 pl310_cache_id;
>  	u32 pl310_cache_type;
> 

is it possible for us to centralize the pl310 logic - that'd let us
reuse generic logic cross SoCs without having to duplicate bits like
these over and over. at least A9 based TI SoCs could potentially
benefit out of this.

The only problem was to deal with actual PL310 configuration path
which could be SoC dependent, but then, we could implement weak
functions that allow us to override the same. I tried to do something
like that for CP15 errata[1]

just my 2 cents.

[1] http://article.gmane.org/gmane.comp.boot-loaders.u-boot/214436
Stefano Babic May 15, 2015, 1:35 p.m. UTC | #8
Hi Fabio,

On 11/03/2015 21:12, Fabio Estevam wrote:
> From: Fabio Estevam <fabio.estevam@freescale.com>
> 
> Having bit 22 cleared in the PL310 Auxiliary Control register (shared
> attribute override enable) has the side effect of transforming Normal
> Shared Non-cacheable reads into Cacheable no-allocate reads.
> 
> Coherent DMA buffers in Linux always have a Cacheable alias via the
> kernel linear mapping and the processor can speculatively load cache
> lines into the PL310 controller. With bit 22 cleared, Non-cacheable
> reads would unexpectedly hit such cache lines leading to buffer
> corruption.
> 
> This was inspired by a patch from Catalin Marinas [1] and also from recent 
> discussions in the linux-arm-kernel list [2] where Russell King and Rob Herring 
> suggested that bootloaders should initialize the cache. 
> 
> [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2010-November/031810.html
> [2] https://lkml.org/lkml/2015/2/20/199
> 
> Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
> ---
>  arch/arm/cpu/armv7/mx6/soc.c | 8 ++++++++
>  arch/arm/include/asm/pl310.h | 2 ++
>  2 files changed, 10 insertions(+)
> 
> diff --git a/arch/arm/cpu/armv7/mx6/soc.c b/arch/arm/cpu/armv7/mx6/soc.c
> index ef02972..5aab305 100644
> --- a/arch/arm/cpu/armv7/mx6/soc.c
> +++ b/arch/arm/cpu/armv7/mx6/soc.c
> @@ -506,6 +506,14 @@ void v7_outer_cache_enable(void)
>  	struct pl310_regs *const pl310 = (struct pl310_regs *)L2_PL310_BASE;
>  	unsigned int val;
>  
> +
> +	/*
> +	 * Set bit 22 in the auxiliary control register. If this bit
> +	 * is cleared, PL310 treats Normal Shared Non-cacheable
> +	 * accesses as Cacheable no-allocate.
> +	 */
> +	setbits_le32(&pl310->pl310_aux_ctrl, L310_SHARED_ATT_OVERRIDE_ENABLE);
> +
>  #if defined CONFIG_MX6SL
>  	struct iomuxc *iomux = (struct iomuxc *)IOMUXC_BASE_ADDR;
>  	val = readl(&iomux->gpr[11]);
> diff --git a/arch/arm/include/asm/pl310.h b/arch/arm/include/asm/pl310.h
> index ddc245b..de7650e 100644
> --- a/arch/arm/include/asm/pl310.h
> +++ b/arch/arm/include/asm/pl310.h
> @@ -16,6 +16,8 @@
>  #define L2X0_STNDBY_MODE_EN			(1 << 0)
>  #define L2X0_CTRL_EN				1
>  
> +#define L310_SHARED_ATT_OVERRIDE_ENABLE		(1 << 22)
> +
>  struct pl310_regs {
>  	u32 pl310_cache_id;
>  	u32 pl310_cache_type;
> 

It looks like from the discussion and the following threads that a
general solution cannot be easy found. I agree with you to apply it at
least for i.MX6, and let's see if in the future we can factorize it for
other SOCs.

Best regards,
Stefano Babic
Fabio Estevam May 15, 2015, 1:37 p.m. UTC | #9
Hi Stefano,

On Fri, May 15, 2015 at 10:35 AM, Stefano Babic <sbabic@denx.de> wrote:

> It looks like from the discussion and the following threads that a
> general solution cannot be easy found. I agree with you to apply it at
> least for i.MX6, and let's see if in the future we can factorize it for
> other SOCs.

That sounds good! Thanks
diff mbox

Patch

diff --git a/arch/arm/cpu/armv7/mx6/soc.c b/arch/arm/cpu/armv7/mx6/soc.c
index ef02972..5aab305 100644
--- a/arch/arm/cpu/armv7/mx6/soc.c
+++ b/arch/arm/cpu/armv7/mx6/soc.c
@@ -506,6 +506,14 @@  void v7_outer_cache_enable(void)
 	struct pl310_regs *const pl310 = (struct pl310_regs *)L2_PL310_BASE;
 	unsigned int val;
 
+
+	/*
+	 * Set bit 22 in the auxiliary control register. If this bit
+	 * is cleared, PL310 treats Normal Shared Non-cacheable
+	 * accesses as Cacheable no-allocate.
+	 */
+	setbits_le32(&pl310->pl310_aux_ctrl, L310_SHARED_ATT_OVERRIDE_ENABLE);
+
 #if defined CONFIG_MX6SL
 	struct iomuxc *iomux = (struct iomuxc *)IOMUXC_BASE_ADDR;
 	val = readl(&iomux->gpr[11]);
diff --git a/arch/arm/include/asm/pl310.h b/arch/arm/include/asm/pl310.h
index ddc245b..de7650e 100644
--- a/arch/arm/include/asm/pl310.h
+++ b/arch/arm/include/asm/pl310.h
@@ -16,6 +16,8 @@ 
 #define L2X0_STNDBY_MODE_EN			(1 << 0)
 #define L2X0_CTRL_EN				1
 
+#define L310_SHARED_ATT_OVERRIDE_ENABLE		(1 << 22)
+
 struct pl310_regs {
 	u32 pl310_cache_id;
 	u32 pl310_cache_type;