Message ID | 1426104732-26695-1-git-send-email-festevam@gmail.com |
---|---|
State | Changes Requested |
Delegated to: | Stefano Babic |
Headers | show |
On Wed, Mar 11, 2015 at 05:12:12PM -0300, Fabio Estevam wrote: > From: Fabio Estevam <fabio.estevam@freescale.com> > > Having bit 22 cleared in the PL310 Auxiliary Control register (shared > attribute override enable) has the side effect of transforming Normal > Shared Non-cacheable reads into Cacheable no-allocate reads. > > Coherent DMA buffers in Linux always have a Cacheable alias via the > kernel linear mapping and the processor can speculatively load cache > lines into the PL310 controller. No, this is wrong. They do not. CMA remaps pages to be non-cacheable rather than the old technique where the above statement was true. There's some corner cases which make that less effective than it once was, and as I've already said, those need to be fixed. The reason that these were missed is because all the ARM CMA work bypassed me - CMA on ARM has had zero review from the point of view of the ARM architecture, so it's not surprising it gets stuff like this wrong. Once that's fixed, setting bit 22 is not necessary.
On Wed, Mar 11, 2015 at 10:04 PM, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote: > No, this is wrong. They do not. CMA remaps pages to be non-cacheable > rather than the old technique where the above statement was true. > > There's some corner cases which make that less effective than it once > was, and as I've already said, those need to be fixed. The reason > that these were missed is because all the ARM CMA work bypassed me - > CMA on ARM has had zero review from the point of view of the ARM > architecture, so it's not surprising it gets stuff like this wrong. > > Once that's fixed, setting bit 22 is not necessary. Understood. Thanks for the clarification.
On Thu, Mar 12, 2015 at 01:04:31AM +0000, Russell King - ARM Linux wrote: > On Wed, Mar 11, 2015 at 05:12:12PM -0300, Fabio Estevam wrote: > > From: Fabio Estevam <fabio.estevam@freescale.com> > > > > Having bit 22 cleared in the PL310 Auxiliary Control register (shared > > attribute override enable) has the side effect of transforming Normal > > Shared Non-cacheable reads into Cacheable no-allocate reads. > > > > Coherent DMA buffers in Linux always have a Cacheable alias via the > > kernel linear mapping and the processor can speculatively load cache > > lines into the PL310 controller. > > No, this is wrong. They do not. CMA remaps pages to be non-cacheable > rather than the old technique where the above statement was true. > > There's some corner cases which make that less effective than it once > was, and as I've already said, those need to be fixed. The reason > that these were missed is because all the ARM CMA work bypassed me - > CMA on ARM has had zero review from the point of view of the ARM > architecture, so it's not surprising it gets stuff like this wrong. > > Once that's fixed, setting bit 22 is not necessary. And I strongly disagree with your statement. Seriously, there are so many assumption about when this bit will no longer be required like the platform always using CMA, having fixed the CMA code in Linux. That's a boot loader patch and even though it's used mostly (only) to load Linux, we should not make this assumption. Most importantly, not setting this bit makes your SoC non-compliant with the ARM ARM clarifications on mismatched aliases. It was a hardware mistake to have it cleared out of reset but just let firmware or boot-loaders deal with it. (there are some very specific use-cases for this bit that the hw guys had in mind but none of them apply to Linux)
On Wed, Mar 11, 2015 at 08:12:12PM +0000, Fabio Estevam wrote: > From: Fabio Estevam <fabio.estevam@freescale.com> > > Having bit 22 cleared in the PL310 Auxiliary Control register (shared > attribute override enable) has the side effect of transforming Normal > Shared Non-cacheable reads into Cacheable no-allocate reads. > > Coherent DMA buffers in Linux always have a Cacheable alias via the > kernel linear mapping and the processor can speculatively load cache > lines into the PL310 controller. With bit 22 cleared, Non-cacheable > reads would unexpectedly hit such cache lines leading to buffer > corruption. > > This was inspired by a patch from Catalin Marinas [1] and also from recent > discussions in the linux-arm-kernel list [2] where Russell King and Rob Herring > suggested that bootloaders should initialize the cache. > > [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2010-November/031810.html > [2] https://lkml.org/lkml/2015/2/20/199 > > Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
On Wed, Mar 11, 2015 at 05:12:12PM -0300, Fabio Estevam wrote: > From: Fabio Estevam <fabio.estevam@freescale.com> > > Having bit 22 cleared in the PL310 Auxiliary Control register (shared > attribute override enable) has the side effect of transforming Normal > Shared Non-cacheable reads into Cacheable no-allocate reads. > > Coherent DMA buffers in Linux always have a Cacheable alias via the > kernel linear mapping and the processor can speculatively load cache > lines into the PL310 controller. With bit 22 cleared, Non-cacheable > reads would unexpectedly hit such cache lines leading to buffer > corruption. > > This was inspired by a patch from Catalin Marinas [1] and also from recent > discussions in the linux-arm-kernel list [2] where Russell King and Rob Herring > suggested that bootloaders should initialize the cache. > > [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2010-November/031810.html > [2] https://lkml.org/lkml/2015/2/20/199 > > Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com> > --- > arch/arm/cpu/armv7/mx6/soc.c | 8 ++++++++ > arch/arm/include/asm/pl310.h | 2 ++ > 2 files changed, 10 insertions(+) > > diff --git a/arch/arm/cpu/armv7/mx6/soc.c b/arch/arm/cpu/armv7/mx6/soc.c > index ef02972..5aab305 100644 > --- a/arch/arm/cpu/armv7/mx6/soc.c > +++ b/arch/arm/cpu/armv7/mx6/soc.c > @@ -506,6 +506,14 @@ void v7_outer_cache_enable(void) > struct pl310_regs *const pl310 = (struct pl310_regs *)L2_PL310_BASE; > unsigned int val; > > + > + /* > + * Set bit 22 in the auxiliary control register. If this bit > + * is cleared, PL310 treats Normal Shared Non-cacheable > + * accesses as Cacheable no-allocate. > + */ > + setbits_le32(&pl310->pl310_aux_ctrl, L310_SHARED_ATT_OVERRIDE_ENABLE); > + > #if defined CONFIG_MX6SL > struct iomuxc *iomux = (struct iomuxc *)IOMUXC_BASE_ADDR; > val = readl(&iomux->gpr[11]); We should put this somewhere a bit more common that other A9 cores can also call into like OMAP4, SoCFPGA and maybe zynq later (based on a quick git grep pl310).
On Thu, Mar 12, 2015 at 10:41 AM, Tom Rini <trini@konsulko.com> wrote: > We should put this somewhere a bit more common that other A9 cores can > also call into like OMAP4, SoCFPGA and maybe zynq later (based on a > quick git grep pl310). I thought about it as well, but I didn't find a suitable common place for putting it. Suggestions? Thanks
On 03/11/2015 03:12 PM, Fabio Estevam wrote: > From: Fabio Estevam <fabio.estevam@freescale.com> > > Having bit 22 cleared in the PL310 Auxiliary Control register (shared > attribute override enable) has the side effect of transforming Normal > Shared Non-cacheable reads into Cacheable no-allocate reads. > > Coherent DMA buffers in Linux always have a Cacheable alias via the > kernel linear mapping and the processor can speculatively load cache > lines into the PL310 controller. With bit 22 cleared, Non-cacheable > reads would unexpectedly hit such cache lines leading to buffer > corruption. > > This was inspired by a patch from Catalin Marinas [1] and also from recent > discussions in the linux-arm-kernel list [2] where Russell King and Rob Herring > suggested that bootloaders should initialize the cache. > > [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2010-November/031810.html > [2] https://lkml.org/lkml/2015/2/20/199 > > Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com> > --- > arch/arm/cpu/armv7/mx6/soc.c | 8 ++++++++ > arch/arm/include/asm/pl310.h | 2 ++ > 2 files changed, 10 insertions(+) > > diff --git a/arch/arm/cpu/armv7/mx6/soc.c b/arch/arm/cpu/armv7/mx6/soc.c > index ef02972..5aab305 100644 > --- a/arch/arm/cpu/armv7/mx6/soc.c > +++ b/arch/arm/cpu/armv7/mx6/soc.c > @@ -506,6 +506,14 @@ void v7_outer_cache_enable(void) > struct pl310_regs *const pl310 = (struct pl310_regs *)L2_PL310_BASE; > unsigned int val; > > + > + /* > + * Set bit 22 in the auxiliary control register. If this bit > + * is cleared, PL310 treats Normal Shared Non-cacheable > + * accesses as Cacheable no-allocate. > + */ > + setbits_le32(&pl310->pl310_aux_ctrl, L310_SHARED_ATT_OVERRIDE_ENABLE); > + > #if defined CONFIG_MX6SL > struct iomuxc *iomux = (struct iomuxc *)IOMUXC_BASE_ADDR; > val = readl(&iomux->gpr[11]); > diff --git a/arch/arm/include/asm/pl310.h b/arch/arm/include/asm/pl310.h > index ddc245b..de7650e 100644 > --- a/arch/arm/include/asm/pl310.h > +++ b/arch/arm/include/asm/pl310.h > @@ -16,6 +16,8 @@ > #define L2X0_STNDBY_MODE_EN (1 << 0) > #define L2X0_CTRL_EN 1 > > +#define L310_SHARED_ATT_OVERRIDE_ENABLE (1 << 22) > + > struct pl310_regs { > u32 pl310_cache_id; > u32 pl310_cache_type; > is it possible for us to centralize the pl310 logic - that'd let us reuse generic logic cross SoCs without having to duplicate bits like these over and over. at least A9 based TI SoCs could potentially benefit out of this. The only problem was to deal with actual PL310 configuration path which could be SoC dependent, but then, we could implement weak functions that allow us to override the same. I tried to do something like that for CP15 errata[1] just my 2 cents. [1] http://article.gmane.org/gmane.comp.boot-loaders.u-boot/214436
Hi Fabio, On 11/03/2015 21:12, Fabio Estevam wrote: > From: Fabio Estevam <fabio.estevam@freescale.com> > > Having bit 22 cleared in the PL310 Auxiliary Control register (shared > attribute override enable) has the side effect of transforming Normal > Shared Non-cacheable reads into Cacheable no-allocate reads. > > Coherent DMA buffers in Linux always have a Cacheable alias via the > kernel linear mapping and the processor can speculatively load cache > lines into the PL310 controller. With bit 22 cleared, Non-cacheable > reads would unexpectedly hit such cache lines leading to buffer > corruption. > > This was inspired by a patch from Catalin Marinas [1] and also from recent > discussions in the linux-arm-kernel list [2] where Russell King and Rob Herring > suggested that bootloaders should initialize the cache. > > [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2010-November/031810.html > [2] https://lkml.org/lkml/2015/2/20/199 > > Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com> > --- > arch/arm/cpu/armv7/mx6/soc.c | 8 ++++++++ > arch/arm/include/asm/pl310.h | 2 ++ > 2 files changed, 10 insertions(+) > > diff --git a/arch/arm/cpu/armv7/mx6/soc.c b/arch/arm/cpu/armv7/mx6/soc.c > index ef02972..5aab305 100644 > --- a/arch/arm/cpu/armv7/mx6/soc.c > +++ b/arch/arm/cpu/armv7/mx6/soc.c > @@ -506,6 +506,14 @@ void v7_outer_cache_enable(void) > struct pl310_regs *const pl310 = (struct pl310_regs *)L2_PL310_BASE; > unsigned int val; > > + > + /* > + * Set bit 22 in the auxiliary control register. If this bit > + * is cleared, PL310 treats Normal Shared Non-cacheable > + * accesses as Cacheable no-allocate. > + */ > + setbits_le32(&pl310->pl310_aux_ctrl, L310_SHARED_ATT_OVERRIDE_ENABLE); > + > #if defined CONFIG_MX6SL > struct iomuxc *iomux = (struct iomuxc *)IOMUXC_BASE_ADDR; > val = readl(&iomux->gpr[11]); > diff --git a/arch/arm/include/asm/pl310.h b/arch/arm/include/asm/pl310.h > index ddc245b..de7650e 100644 > --- a/arch/arm/include/asm/pl310.h > +++ b/arch/arm/include/asm/pl310.h > @@ -16,6 +16,8 @@ > #define L2X0_STNDBY_MODE_EN (1 << 0) > #define L2X0_CTRL_EN 1 > > +#define L310_SHARED_ATT_OVERRIDE_ENABLE (1 << 22) > + > struct pl310_regs { > u32 pl310_cache_id; > u32 pl310_cache_type; > It looks like from the discussion and the following threads that a general solution cannot be easy found. I agree with you to apply it at least for i.MX6, and let's see if in the future we can factorize it for other SOCs. Best regards, Stefano Babic
Hi Stefano, On Fri, May 15, 2015 at 10:35 AM, Stefano Babic <sbabic@denx.de> wrote: > It looks like from the discussion and the following threads that a > general solution cannot be easy found. I agree with you to apply it at > least for i.MX6, and let's see if in the future we can factorize it for > other SOCs. That sounds good! Thanks
diff --git a/arch/arm/cpu/armv7/mx6/soc.c b/arch/arm/cpu/armv7/mx6/soc.c index ef02972..5aab305 100644 --- a/arch/arm/cpu/armv7/mx6/soc.c +++ b/arch/arm/cpu/armv7/mx6/soc.c @@ -506,6 +506,14 @@ void v7_outer_cache_enable(void) struct pl310_regs *const pl310 = (struct pl310_regs *)L2_PL310_BASE; unsigned int val; + + /* + * Set bit 22 in the auxiliary control register. If this bit + * is cleared, PL310 treats Normal Shared Non-cacheable + * accesses as Cacheable no-allocate. + */ + setbits_le32(&pl310->pl310_aux_ctrl, L310_SHARED_ATT_OVERRIDE_ENABLE); + #if defined CONFIG_MX6SL struct iomuxc *iomux = (struct iomuxc *)IOMUXC_BASE_ADDR; val = readl(&iomux->gpr[11]); diff --git a/arch/arm/include/asm/pl310.h b/arch/arm/include/asm/pl310.h index ddc245b..de7650e 100644 --- a/arch/arm/include/asm/pl310.h +++ b/arch/arm/include/asm/pl310.h @@ -16,6 +16,8 @@ #define L2X0_STNDBY_MODE_EN (1 << 0) #define L2X0_CTRL_EN 1 +#define L310_SHARED_ATT_OVERRIDE_ENABLE (1 << 22) + struct pl310_regs { u32 pl310_cache_id; u32 pl310_cache_type;