[U-Boot,2/3] arm: introduce _relaxed MMIO accessors
diff mbox series

Message ID 20190111003121.12360-3-andre.przywara@arm.com
State Superseded
Delegated to: Jagannadha Sutradharudu Teki
Headers show
Series
  • arm: Introduce writel/readl_relaxed accessors
Related show

Commit Message

Andre Przywara Jan. 11, 2019, 12:31 a.m. UTC
The normal MMIO accessor macros (readX/writeX) guarantee a strong ordering,
even with normal memory accesses [1].
For some MMIO operations (framebuffers being a prominent example) this is
not needed, and the rather costly barrier inserted on weakly ordered
systems like ARM costs some performance.
To mitigate this issue, Linux introduced readX_relaxed and
writeX_relaxed primitives, which only guarantee ordering between each
other, so are typically faster due to the missing barrier.

We probably do not care so much about performance in U-Boot, but want to
have those primitives for two other reasons:
- Being able to use the _relaxed macros simplifies porting drivers from
  Linux.
- The missing barrier typically allows to generate smaller code, which can
  relieve some chronically tight SPL builds.

Add those macros definitions by using the __raw versions of the
accessors, which matches an earlier (and less complicated) version of
the Linux implementation.

[1] https://lwn.net/Articles/698014/
---
 arch/arm/include/asm/io.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

Comments

Philipp Tomsich Feb. 6, 2019, 12:46 p.m. UTC | #1
On 11.01.2019, at 01:31, Andre Przywara <andre.przywara@arm.com> wrote:
> 
> The normal MMIO accessor macros (readX/writeX) guarantee a strong ordering,
> even with normal memory accesses [1].
> For some MMIO operations (framebuffers being a prominent example) this is
> not needed, and the rather costly barrier inserted on weakly ordered
> systems like ARM costs some performance.
> To mitigate this issue, Linux introduced readX_relaxed and
> writeX_relaxed primitives, which only guarantee ordering between each
> other, so are typically faster due to the missing barrier.
> 
> We probably do not care so much about performance in U-Boot, but want to
> have those primitives for two other reasons:
> - Being able to use the _relaxed macros simplifies porting drivers from
>  Linux.
> - The missing barrier typically allows to generate smaller code, which can
>  relieve some chronically tight SPL builds.
> 
> Add those macros definitions by using the __raw versions of the
> accessors, which matches an earlier (and less complicated) version of
> the Linux implementation.
> 
> [1] https://lwn.net/Articles/698014/

No Signed-off-by?

Reviewed-by: Philipp Tomsich <philipp.tomsich@theobroma-systems.com>
[On my experimental RK3399 after modifying a few drivers:]
Tested-by: Philipp Tomsich <philipp.tomsich@theobroma-systems.com>
Andre Przywara Feb. 6, 2019, 9:53 p.m. UTC | #2
On 06/02/2019 12:46, Philipp Tomsich wrote:
> On 11.01.2019, at 01:31, Andre Przywara <andre.przywara@arm.com> wrote:

Hi,

>>
>> The normal MMIO accessor macros (readX/writeX) guarantee a strong ordering,
>> even with normal memory accesses [1].
>> For some MMIO operations (framebuffers being a prominent example) this is
>> not needed, and the rather costly barrier inserted on weakly ordered
>> systems like ARM costs some performance.
>> To mitigate this issue, Linux introduced readX_relaxed and
>> writeX_relaxed primitives, which only guarantee ordering between each
>> other, so are typically faster due to the missing barrier.
>>
>> We probably do not care so much about performance in U-Boot, but want to
>> have those primitives for two other reasons:
>> - Being able to use the _relaxed macros simplifies porting drivers from
>>  Linux.
>> - The missing barrier typically allows to generate smaller code, which can
>>  relieve some chronically tight SPL builds.
>>
>> Add those macros definitions by using the __raw versions of the
>> accessors, which matches an earlier (and less complicated) version of
>> the Linux implementation.
>>
>> [1] https://lwn.net/Articles/698014/
> 
> No Signed-off-by?

Doh, indeed. Got so excited about my commit message, that I forgot the
obvious (and I only think about -s *after* hitting Enter).

> Reviewed-by: Philipp Tomsich <philipp.tomsich@theobroma-systems.com>
> [On my experimental RK3399 after modifying a few drivers:]
> Tested-by: Philipp Tomsich <philipp.tomsich@theobroma-systems.com>

Cool, many thanks for that! Out of curiosity, did you really need
*_relaxed for some reason?

Cheers,
Andre.
Philipp Tomsich Feb. 7, 2019, 12:32 a.m. UTC | #3
> On 06.02.2019, at 22:53, André Przywara <andre.przywara@arm.com> wrote:
> 
> On 06/02/2019 12:46, Philipp Tomsich wrote:
>> On 11.01.2019, at 01:31, Andre Przywara <andre.przywara@arm.com <mailto:andre.przywara@arm.com>> wrote:
> 
> Hi,
> 
>>> 
>>> The normal MMIO accessor macros (readX/writeX) guarantee a strong ordering,
>>> even with normal memory accesses [1].
>>> For some MMIO operations (framebuffers being a prominent example) this is
>>> not needed, and the rather costly barrier inserted on weakly ordered
>>> systems like ARM costs some performance.
>>> To mitigate this issue, Linux introduced readX_relaxed and
>>> writeX_relaxed primitives, which only guarantee ordering between each
>>> other, so are typically faster due to the missing barrier.
>>> 
>>> We probably do not care so much about performance in U-Boot, but want to
>>> have those primitives for two other reasons:
>>> - Being able to use the _relaxed macros simplifies porting drivers from
>>> Linux.
>>> - The missing barrier typically allows to generate smaller code, which can
>>> relieve some chronically tight SPL builds.
>>> 
>>> Add those macros definitions by using the __raw versions of the
>>> accessors, which matches an earlier (and less complicated) version of
>>> the Linux implementation.
>>> 
>>> [1] https://lwn.net/Articles/698014/
>> 
>> No Signed-off-by?
> 
> Doh, indeed. Got so excited about my commit message, that I forgot the
> obvious (and I only think about -s *after* hitting Enter).
> 
>> Reviewed-by: Philipp Tomsich <philipp.tomsich@theobroma-systems.com <mailto:philipp.tomsich@theobroma-systems.com>>
>> [On my experimental RK3399 after modifying a few drivers:]
>> Tested-by: Philipp Tomsich <philipp.tomsich@theobroma-systems.com <mailto:philipp.tomsich@theobroma-systems.com>>
> 
> Cool, many thanks for that! Out of curiosity, did you really need
> *_relaxed for some reason?

Not yet, but possibly soon.
I have been working on some changes to the way we auto-configure for
different DRAM timings (effectively doing something similar to how DIMMs
are detected using SPDs)… and have been fighting code-size growth on
that front. As I have to set up timings for 3 frequencies on the 3399, the
amount of dmb()s add up from the strongly ordered writel and clrsetbits
calls.

Cheers,
Phil.

Patch
diff mbox series

diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h
index bcbaf0d83c..5b24b88efc 100644
--- a/arch/arm/include/asm/io.h
+++ b/arch/arm/include/asm/io.h
@@ -98,11 +98,21 @@  static inline void __raw_readsl(unsigned long addr, void *data, int longlen)
 #define writel(v,c)	({ u32 __v = v; __iowmb(); __arch_putl(__v,c); __v; })
 #define writeq(v,c)	({ u64 __v = v; __iowmb(); __arch_putq(__v,c); __v; })
 
+#define writeb_relaxed(v,c)	__raw_writeb(v, c)
+#define writew_relaxed(v,c)	__raw_writew(v, c)
+#define writel_relaxed(v,c)	__raw_writel(v, c)
+#define writeq_relaxed(v,c)	__raw_writeq(v, c)
+
 #define readb(c)	({ u8  __v = __arch_getb(c); __iormb(); __v; })
 #define readw(c)	({ u16 __v = __arch_getw(c); __iormb(); __v; })
 #define readl(c)	({ u32 __v = __arch_getl(c); __iormb(); __v; })
 #define readq(c)	({ u64 __v = __arch_getq(c); __iormb(); __v; })
 
+#define readb_relaxed(c)	__raw_readb(c)
+#define readw_relaxed(c)	__raw_readw(c)
+#define readl_relaxed(c)	__raw_readl(c)
+#define readq_relaxed(c)	__raw_readq(c)
+
 /*
  * The compiler seems to be incapable of optimising constants
  * properly.  Spell it out to the compiler in some cases.