Message ID | 20190111003121.12360-3-andre.przywara@arm.com |
---|---|
State | Superseded |
Delegated to: | Jagannadha Sutradharudu Teki |
Headers | show |
Series | arm: Introduce writel/readl_relaxed accessors | expand |
On 11.01.2019, at 01:31, Andre Przywara <andre.przywara@arm.com> wrote: > > The normal MMIO accessor macros (readX/writeX) guarantee a strong ordering, > even with normal memory accesses [1]. > For some MMIO operations (framebuffers being a prominent example) this is > not needed, and the rather costly barrier inserted on weakly ordered > systems like ARM costs some performance. > To mitigate this issue, Linux introduced readX_relaxed and > writeX_relaxed primitives, which only guarantee ordering between each > other, so are typically faster due to the missing barrier. > > We probably do not care so much about performance in U-Boot, but want to > have those primitives for two other reasons: > - Being able to use the _relaxed macros simplifies porting drivers from > Linux. > - The missing barrier typically allows to generate smaller code, which can > relieve some chronically tight SPL builds. > > Add those macros definitions by using the __raw versions of the > accessors, which matches an earlier (and less complicated) version of > the Linux implementation. > > [1] https://lwn.net/Articles/698014/ No Signed-off-by? Reviewed-by: Philipp Tomsich <philipp.tomsich@theobroma-systems.com> [On my experimental RK3399 after modifying a few drivers:] Tested-by: Philipp Tomsich <philipp.tomsich@theobroma-systems.com>
On 06/02/2019 12:46, Philipp Tomsich wrote: > On 11.01.2019, at 01:31, Andre Przywara <andre.przywara@arm.com> wrote: Hi, >> >> The normal MMIO accessor macros (readX/writeX) guarantee a strong ordering, >> even with normal memory accesses [1]. >> For some MMIO operations (framebuffers being a prominent example) this is >> not needed, and the rather costly barrier inserted on weakly ordered >> systems like ARM costs some performance. >> To mitigate this issue, Linux introduced readX_relaxed and >> writeX_relaxed primitives, which only guarantee ordering between each >> other, so are typically faster due to the missing barrier. >> >> We probably do not care so much about performance in U-Boot, but want to >> have those primitives for two other reasons: >> - Being able to use the _relaxed macros simplifies porting drivers from >> Linux. >> - The missing barrier typically allows to generate smaller code, which can >> relieve some chronically tight SPL builds. >> >> Add those macros definitions by using the __raw versions of the >> accessors, which matches an earlier (and less complicated) version of >> the Linux implementation. >> >> [1] https://lwn.net/Articles/698014/ > > No Signed-off-by? Doh, indeed. Got so excited about my commit message, that I forgot the obvious (and I only think about -s *after* hitting Enter). > Reviewed-by: Philipp Tomsich <philipp.tomsich@theobroma-systems.com> > [On my experimental RK3399 after modifying a few drivers:] > Tested-by: Philipp Tomsich <philipp.tomsich@theobroma-systems.com> Cool, many thanks for that! Out of curiosity, did you really need *_relaxed for some reason? Cheers, Andre.
> On 06.02.2019, at 22:53, André Przywara <andre.przywara@arm.com> wrote: > > On 06/02/2019 12:46, Philipp Tomsich wrote: >> On 11.01.2019, at 01:31, Andre Przywara <andre.przywara@arm.com <mailto:andre.przywara@arm.com>> wrote: > > Hi, > >>> >>> The normal MMIO accessor macros (readX/writeX) guarantee a strong ordering, >>> even with normal memory accesses [1]. >>> For some MMIO operations (framebuffers being a prominent example) this is >>> not needed, and the rather costly barrier inserted on weakly ordered >>> systems like ARM costs some performance. >>> To mitigate this issue, Linux introduced readX_relaxed and >>> writeX_relaxed primitives, which only guarantee ordering between each >>> other, so are typically faster due to the missing barrier. >>> >>> We probably do not care so much about performance in U-Boot, but want to >>> have those primitives for two other reasons: >>> - Being able to use the _relaxed macros simplifies porting drivers from >>> Linux. >>> - The missing barrier typically allows to generate smaller code, which can >>> relieve some chronically tight SPL builds. >>> >>> Add those macros definitions by using the __raw versions of the >>> accessors, which matches an earlier (and less complicated) version of >>> the Linux implementation. >>> >>> [1] https://lwn.net/Articles/698014/ >> >> No Signed-off-by? > > Doh, indeed. Got so excited about my commit message, that I forgot the > obvious (and I only think about -s *after* hitting Enter). > >> Reviewed-by: Philipp Tomsich <philipp.tomsich@theobroma-systems.com <mailto:philipp.tomsich@theobroma-systems.com>> >> [On my experimental RK3399 after modifying a few drivers:] >> Tested-by: Philipp Tomsich <philipp.tomsich@theobroma-systems.com <mailto:philipp.tomsich@theobroma-systems.com>> > > Cool, many thanks for that! Out of curiosity, did you really need > *_relaxed for some reason? Not yet, but possibly soon. I have been working on some changes to the way we auto-configure for different DRAM timings (effectively doing something similar to how DIMMs are detected using SPDs)… and have been fighting code-size growth on that front. As I have to set up timings for 3 frequencies on the 3399, the amount of dmb()s add up from the strongly ordered writel and clrsetbits calls. Cheers, Phil.
diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h index bcbaf0d83c..5b24b88efc 100644 --- a/arch/arm/include/asm/io.h +++ b/arch/arm/include/asm/io.h @@ -98,11 +98,21 @@ static inline void __raw_readsl(unsigned long addr, void *data, int longlen) #define writel(v,c) ({ u32 __v = v; __iowmb(); __arch_putl(__v,c); __v; }) #define writeq(v,c) ({ u64 __v = v; __iowmb(); __arch_putq(__v,c); __v; }) +#define writeb_relaxed(v,c) __raw_writeb(v, c) +#define writew_relaxed(v,c) __raw_writew(v, c) +#define writel_relaxed(v,c) __raw_writel(v, c) +#define writeq_relaxed(v,c) __raw_writeq(v, c) + #define readb(c) ({ u8 __v = __arch_getb(c); __iormb(); __v; }) #define readw(c) ({ u16 __v = __arch_getw(c); __iormb(); __v; }) #define readl(c) ({ u32 __v = __arch_getl(c); __iormb(); __v; }) #define readq(c) ({ u64 __v = __arch_getq(c); __iormb(); __v; }) +#define readb_relaxed(c) __raw_readb(c) +#define readw_relaxed(c) __raw_readw(c) +#define readl_relaxed(c) __raw_readl(c) +#define readq_relaxed(c) __raw_readq(c) + /* * The compiler seems to be incapable of optimising constants * properly. Spell it out to the compiler in some cases.