[v5] alpha: fix memory barriers so that they conform to the specification

Message ID alpine.LRH.2.02.2005231134590.10727@file01.intranet.prod.int.rdu2.redhat.com
State Not Applicable
Series [v5] alpha: fix memory barriers so that they conform to the specification

Commit Message

Mikulas Patocka May 23, 2020, 3:37 p.m. UTC
Commits cd0e00c10672 and 92d7223a7423 broke boot on the Alpha Avanti
platform. These commits moved the memory barrier that used to follow a
write so that it precedes the write instead. As a result, an iowrite
followed by an ioread has no barrier between them.

The Alpha architecture allows reordering of accesses to the I/O space,
and the missing barrier between a write and a subsequent read causes a
hang with the serial port and the real-time clock.

This patch makes the barriers conform to the specification.

1. We add mb() before readX_relaxed and writeX_relaxed -
   memory-barriers.txt claims that these functions must be ordered w.r.t.
   each other. Alpha doesn't order them, so we need an explicit barrier.
2. We add mb() before reads from the I/O space - so that if there's a
   write followed by a read, there should be a barrier between them.
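
For illustration, this is the resulting shape of the read accessors (a
condensed sketch of the hunks below; the full patch is at the end of
this page):

/* A relaxed read now issues a barrier before the access, so a prior
   iowrite cannot be reordered past it: */
static inline u8 readb_relaxed(const volatile void __iomem *addr)
{
	mb();			/* order against earlier writes to I/O space */
	return __raw_readb(addr);
}

/* An ordinary read gets a barrier on both sides; the leading one is
   what this patch adds: */
extern inline u8 readb(const volatile void __iomem *addr)
{
	u8 ret;
	mb();			/* added: order against earlier writes */
	ret = __raw_readb(addr);
	mb();			/* pre-existing trailing barrier */
	return ret;
}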

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: cd0e00c10672 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering")
Fixes: 92d7223a7423 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering #2")
Cc: stable@vger.kernel.org      # v4.17+
Acked-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>

---
 arch/alpha/include/asm/io.h |   87 ++++++++++++++++++++++++++++++++++++--------
 arch/alpha/kernel/io.c      |   28 ++++++++++----
 2 files changed, 93 insertions(+), 22 deletions(-)

Comments

Maciej W. Rozycki May 24, 2020, 2:54 p.m. UTC | #1
Hi Mikulas,

> This patch makes the barriers conform to the specification.
> 
> 1. We add mb() before readX_relaxed and writeX_relaxed -
>    memory-barriers.txt claims that these functions must be ordered w.r.t.
>    each other. Alpha doesn't order them, so we need an explicit barrier.
> 2. We add mb() before reads from the I/O space - so that if there's a
>    write followed by a read, there should be a barrier between them.
> 
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> Fixes: cd0e00c10672 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering")
> Fixes: 92d7223a7423 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering #2")
> Cc: stable@vger.kernel.org      # v4.17+
> Acked-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>

 Thank you for your effort to address this regression.  I have looked 
through your code and the context it is to be applied to.  Overall it 
looks good to me; however, I still have one concern as detailed below 
(please accept my apologies if you find it tedious to address all the 
points raised in the course of this review).

> Index: linux-stable/arch/alpha/include/asm/io.h
> ===================================================================
> --- linux-stable.orig/arch/alpha/include/asm/io.h	2020-05-23 10:01:22.000000000 +0200
> +++ linux-stable/arch/alpha/include/asm/io.h	2020-05-23 17:29:22.000000000 +0200
[...]
> @@ -487,16 +501,59 @@ extern inline void writeq(u64 b, volatil
>  #define outb_p		outb
>  #define outw_p		outw
>  #define outl_p		outl
> -#define readb_relaxed(addr)	__raw_readb(addr)
> -#define readw_relaxed(addr)	__raw_readw(addr)
> -#define readl_relaxed(addr)	__raw_readl(addr)
> -#define readq_relaxed(addr)	__raw_readq(addr)
> -#define writeb_relaxed(b, addr)	__raw_writeb(b, addr)
> -#define writew_relaxed(b, addr)	__raw_writew(b, addr)
> -#define writel_relaxed(b, addr)	__raw_writel(b, addr)
> -#define writeq_relaxed(b, addr)	__raw_writeq(b, addr)
>  
>  /*
> + * The _relaxed functions must be ordered w.r.t. each other, but they don't
> + * have to be ordered w.r.t. other memory accesses.
> + */
> +static inline u8 readb_relaxed(const volatile void __iomem *addr)
> +{
> +	mb();
> +	return __raw_readb(addr);
> +}
[etc.]

 Please observe that changing the `*_relaxed' entry points from merely 
aliasing the corresponding `__raw_*' handlers to more elaborate code 
sequences (though only slightly so) makes the situation 
analogous to the one we have with the ordinary MMIO accessor entry points.  
Those regular entry points have been made `extern inline' and wrapped 
into:

#if IO_CONCAT(__IO_PREFIX,trivial_rw_bw) == 1

and:

#if IO_CONCAT(__IO_PREFIX,trivial_rw_lq) == 1

respectively, with corresponding out-of-line entry points available, so 
that there is no extra inline code produced where the call to the relevant 
MMIO accessor is going to end up with an actual function call, as this 
would not help performance in any way and would expand code unnecessarily 
at all call sites.

 Therefore I suggest that your new `static inline' functions follow the 
pattern, perhaps by grouping them with the corresponding ordinary accessor 
functions in arch/alpha/include/asm/io.h within the relevant existing 
#ifdef, and then by making them `extern inline' and providing out-of-line 
implementations in arch/alpha/kernel/io.c, with the individual symbols 
exported.  Within arch/alpha/kernel/io.c the compiler will still inline 
code as it sees fit as it already does, e.g. `__raw_readq' might get 
inlined in `readq' if it turns out cheaper than arranging for an actual 
call, including all the stack frame preparation for `ra' preservation; 
it's less likely with say `writeq' which probably always ends with a tail 
call to `__raw_writeq' as no stack frame is required in that case.
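
 Concretely, the suggested arrangement would look something like this (a
sketch only, following the pattern of the existing accessors):

/* arch/alpha/include/asm/io.h -- inline only where the access is
   trivial, otherwise declare the out-of-line version: */
#if IO_CONCAT(__IO_PREFIX,trivial_rw_bw) == 1
extern inline u8 readb_relaxed(const volatile void __iomem *addr)
{
	mb();
	return __raw_readb(addr);
}
#else
extern u8 readb_relaxed(const volatile void __iomem *addr);
#endif

/* arch/alpha/kernel/io.c -- the out-of-line copy, always available: */
u8 readb_relaxed(const volatile void __iomem *addr)
{
	mb();
	return __raw_readb(addr);
}
EXPORT_SYMBOL(readb_relaxed);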

 That for the read accessors.

> +static inline void writeb_relaxed(u8 b, volatile void __iomem *addr)
> +{
> +	mb();
> +	__raw_writeb(b, addr);
> +}
[etc.]

 Conversely for the write accessors, keeping in mind what I have noted 
above, I suggest that you just redirect the existing aliases to the 
ordinary accessors, as there will be no difference anymore between the 
respective ordinary and relaxed accessors.  That is:

#define writeb_relaxed(b, addr)	writeb(b, addr)

etc.
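
Spelled out for all four widths (matching the aliases being removed),
that is:

#define writeb_relaxed(b, addr)	writeb(b, addr)
#define writew_relaxed(b, addr)	writew(b, addr)
#define writel_relaxed(b, addr)	writel(b, addr)
#define writeq_relaxed(b, addr)	writeq(b, addr)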

 Let me know if you have any further questions or comments.

  Maciej
Mikulas Patocka May 25, 2020, 1:56 p.m. UTC | #2
On Sun, 24 May 2020, Maciej W. Rozycki wrote:

> Hi Mikulas,
> 
> > This patch makes the barriers conform to the specification.
> > 
> > 1. We add mb() before readX_relaxed and writeX_relaxed -
> >    memory-barriers.txt claims that these functions must be ordered w.r.t.
> >    each other. Alpha doesn't order them, so we need an explicit barrier.
> > 2. We add mb() before reads from the I/O space - so that if there's a
> >    write followed by a read, there should be a barrier between them.
> > 
> > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> > Fixes: cd0e00c10672 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering")
> > Fixes: 92d7223a7423 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering #2")
> > Cc: stable@vger.kernel.org      # v4.17+
> > Acked-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
> 
>  Thank you for your effort to address this regression.  I have looked 
> through your code and the context it is to be applied to.  Overall it 
> looks good to me; however, I still have one concern as detailed below 
> (please accept my apologies if you find it tedious to address all the 
> points raised in the course of this review).
> 
> > Index: linux-stable/arch/alpha/include/asm/io.h
> > ===================================================================
> > --- linux-stable.orig/arch/alpha/include/asm/io.h	2020-05-23 10:01:22.000000000 +0200
> > +++ linux-stable/arch/alpha/include/asm/io.h	2020-05-23 17:29:22.000000000 +0200
> [...]
> > @@ -487,16 +501,59 @@ extern inline void writeq(u64 b, volatil
> >  #define outb_p		outb
> >  #define outw_p		outw
> >  #define outl_p		outl
> > -#define readb_relaxed(addr)	__raw_readb(addr)
> > -#define readw_relaxed(addr)	__raw_readw(addr)
> > -#define readl_relaxed(addr)	__raw_readl(addr)
> > -#define readq_relaxed(addr)	__raw_readq(addr)
> > -#define writeb_relaxed(b, addr)	__raw_writeb(b, addr)
> > -#define writew_relaxed(b, addr)	__raw_writew(b, addr)
> > -#define writel_relaxed(b, addr)	__raw_writel(b, addr)
> > -#define writeq_relaxed(b, addr)	__raw_writeq(b, addr)
> >  
> >  /*
> > + * The _relaxed functions must be ordered w.r.t. each other, but they don't
> > + * have to be ordered w.r.t. other memory accesses.
> > + */
> > +static inline u8 readb_relaxed(const volatile void __iomem *addr)
> > +{
> > +	mb();
> > +	return __raw_readb(addr);
> > +}
> [etc.]
> 
>  Please observe that changing the `*_relaxed' entry points from merely 
> aliasing the corresponding `__raw_*' handlers to more elaborate code 
> sequences (though only slightly so) makes the situation 
> analogous to the one we have with the ordinary MMIO accessor entry points.  
> Those regular entry points have been made `extern inline' and wrapped 
> into:
> 
> #if IO_CONCAT(__IO_PREFIX,trivial_rw_bw) == 1
> 
> and:
> 
> #if IO_CONCAT(__IO_PREFIX,trivial_rw_lq) == 1
> 
> respectively, with corresponding out-of-line entry points available, so 
> that there is no extra inline code produced where the call to the relevant 
> MMIO accessor is going to end up with an actual function call, as this 
> would not help performance in any way and would expand code unnecessarily 
> at all call sites.
> 
>  Therefore I suggest that your new `static inline' functions follow the 
> pattern, perhaps by grouping them with the corresponding ordinary accessor 
> functions in arch/alpha/include/asm/io.h within the relevant existing 
> #ifdef, and then by making them `extern inline' and providing out-of-line 
> implementations in arch/alpha/kernel/io.c, with the individual symbols 
> exported.  Within arch/alpha/kernel/io.c the compiler will still inline 
> code as it sees fit as it already does, e.g. `__raw_readq' might get 
> inlined in `readq' if it turns out cheaper than arranging for an actual 
> call, including all the stack frame preparation for `ra' preservation; 
> it's less likely with say `writeq' which probably always ends with a tail 
> call to `__raw_writeq' as no stack frame is required in that case.
> 
>  That for the read accessors.

I think that making the read*_relaxed functions extern inline just causes 
source code bloat with no practical gain - if we make them extern inline, 
we would need two implementations (one in the include file, the other in 
the C file) - and it is not good practice to duplicate code.

The functions __raw_read* are already extern inline, so the compiler will 
inline/noinline them depending on the macros trivial_io_bw and 
trivial_io_lq - so we can just call them from read*_relaxed without 
repeating the extern inline pattern.
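
In other words, with the v5 patch the compiler already sees something
like this (sketch):

static inline u8 readb_relaxed(const volatile void __iomem *addr)
{
	mb();
	return __raw_readb(addr);	/* inlined, or an out-of-line call,
					   depending on trivial_io_bw */
}

so the only code forced inline at every call site is the mb() itself.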

> > +static inline void writeb_relaxed(u8 b, volatile void __iomem *addr)
> > +{
> > +	mb();
> > +	__raw_writeb(b, addr);
> > +}
> [etc.]
> 
>  Conversely for the write accessors, keeping in mind what I have noted 
> above, I suggest that you just redirect the existing aliases to the 
> ordinary accessors, as there will be no difference anymore between the 
> respective ordinary and relaxed accessors.  That is:
> 
> #define writeb_relaxed(b, addr)	writeb(b, addr)

Yes - that's a good point.

> etc.
> 
>  Let me know if you have any further questions or comments.
> 
>   Maciej

Mikulas
Arnd Bergmann May 25, 2020, 2:07 p.m. UTC | #3
On Mon, May 25, 2020 at 3:56 PM Mikulas Patocka <mpatocka@redhat.com> wrote:
> On Sun, 24 May 2020, Maciej W. Rozycki wrote:

> >
> > respectively, with corresponding out-of-line entry points available, so
> > that there is no extra inline code produced where the call to the relevant
> > MMIO accessor is going to end up with an actual function call, as this
> > would not help performance in any way and would expand code unnecessarily
> > at all call sites.
> >
> >  Therefore I suggest that your new `static inline' functions follow the
> > pattern, perhaps by grouping them with the corresponding ordinary accessor
> > functions in arch/alpha/include/asm/io.h within the relevant existing
> > #ifdef, and then by making them `extern inline' and providing out-of-line
> > implementations in arch/alpha/kernel/io.c, with the individual symbols
> > exported.  Within arch/alpha/kernel/io.c the compiler will still inline
> > code as it sees fit as it already does, e.g. `__raw_readq' might get
> > inlined in `readq' if it turns out cheaper than arranging for an actual
> > call, including all the stack frame preparation for `ra' preservation;
> > it's less likely with say `writeq' which probably always ends with a tail
> > call to `__raw_writeq' as no stack frame is required in that case.
> >
> >  That for the read accessors.
>
> I think that making the read*_relaxed functions extern inline just causes
> source code bloat with no practical gain - if we make them extern inline,
> we would need two implementations (one in the include file, the other in
> the C file) - and it is not good practice to duplicate code.
>
> The functions __raw_read* are already extern inline, so the compiler will
> inline/noinline them depending on the macros trivial_io_bw and
> trivial_io_lq - so we can just call them from read*_relaxed without
> repeating the extern inline pattern.

You could consider using the helpers in include/asm-generic/io.h
to provide some of the wrappers and only provide the ones that
don't fit in that scheme already.
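
For reference, the asm-generic/io.h fallbacks have roughly this shape
(sketched from memory, so the exact guards may differ):

#ifndef readb_relaxed
#define readb_relaxed readb_relaxed
static inline u8 readb_relaxed(const volatile void __iomem *addr)
{
	return readb(addr);
}
#endif

so an architecture only needs to provide the accessors whose semantics
differ from the default and can let the generic header supply the rest.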

       Arnd
Maciej W. Rozycki May 25, 2020, 2:45 p.m. UTC | #4
On Mon, 25 May 2020, Mikulas Patocka wrote:

> >  Please observe that changing the `*_relaxed' entry points from merely 
> > aliasing the corresponding `__raw_*' handlers to more elaborate code 
> > sequences (though only slightly so) makes the situation 
> > analogous to the one we have with the ordinary MMIO accessor entry points.  
> > Those regular entry points have been made `extern inline' and wrapped 
> > into:
> > 
> > #if IO_CONCAT(__IO_PREFIX,trivial_rw_bw) == 1
> > 
> > and:
> > 
> > #if IO_CONCAT(__IO_PREFIX,trivial_rw_lq) == 1
> > 
> > respectively, with corresponding out-of-line entry points available, so 
> > that there is no extra inline code produced where the call to the relevant 
> > MMIO accessor is going to end up with an actual function call, as this 
> > would not help performance in any way and would expand code unnecessarily 
> > at all call sites.
> > 
> >  Therefore I suggest that your new `static inline' functions follow the 
> > pattern, perhaps by grouping them with the corresponding ordinary accessor 
> > functions in arch/alpha/include/asm/io.h within the relevant existing 
> > #ifdef, and then by making them `extern inline' and providing out-of-line 
> > implementations in arch/alpha/kernel/io.c, with the individual symbols 
> > exported.  Within arch/alpha/kernel/io.c the compiler will still inline 
> > code as it sees fit as it already does, e.g. `__raw_readq' might get 
> > inlined in `readq' if it turns out cheaper than arranging for an actual 
> > call, including all the stack frame preparation for `ra' preservation; 
> > it's less likely with say `writeq' which probably always ends with a tail 
> > call to `__raw_writeq' as no stack frame is required in that case.
> > 
> >  That for the read accessors.
> 
> I think that making the read*_relaxed functions extern inline just causes 
> source code bloat with no practical gain - if we make them extern inline, 
> we would need two implementations (one in the include file, the other in 
> the C file) - and it is not good practice to duplicate code.

 We do that already with the existing accessors. While I agree the 
duplication is a bit unfortunate, it could be solved with some macro 
hackery so that there is a single version in arch/alpha/include/asm/io.h 
that gets pasted to produce the out-of-line copies in 
arch/alpha/kernel/io.c. That would be good to have across all the 
accessors, but it is not relevant to the regression discussed here, and 
therefore I do not request that you make such a clean-up as a part of 
this series.

> The functions __raw_read* are already extern inline, so the compiler will 
> inline/noinline them depending on the macros trivial_io_bw and 
> trivial_io_lq - so we can just call them from read*_relaxed without 
> repeating the extern inline pattern.

 The whole point of this peculiar arrangement for cooked accessors is to 
avoid having barriers inserted inline around out-of-line calls to raw 
accessors, and therefore I maintain that we need to have the arrangement 
applied consistently across all the cooked accessors.  Since you're 
creating new distinct cooked accessors (by making their names no longer 
alias to existing cooked accessors), they need to follow the pattern.
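
 To make the size argument concrete: on a platform where the raw
accessors are out of line, the two schemes expand at each call site to
roughly (a sketch):

	/* `static inline' readb_relaxed(), out-of-line __raw_readb(): */
	mb();				/* emitted at every call site */
	x = __raw_readb(addr);		/* plus the actual call */

	/* `extern inline' readb_relaxed() keyed with `*trivial_rw_*': */
	x = readb_relaxed(addr);	/* one call; the barrier lives in io.c */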

 NB this arrangement was added back in 2.6.9-rc3, with:

ChangeSet@1.1939.5.8, 2004-09-22 22:40:06-07:00, rth@kanga.twiddle.home
  [ALPHA] Implement new ioread interface.

I believe.  I don't know if any further details or discussion around that 
can be chased, but Richard might be able to chime in.

  Maciej
Mikulas Patocka May 25, 2020, 3:54 p.m. UTC | #5
On Mon, 25 May 2020, Maciej W. Rozycki wrote:

> On Mon, 25 May 2020, Mikulas Patocka wrote:
> 
> > The functions __raw_read* are already extern inline, so the compiler will 
> > inline/noinline them depending on the macros trivial_io_bw and 
> > trivial_io_lq - so we can just call them from read*_relaxed without 
> > repeating the extern inline pattern.
> 
>  The whole point of this peculiar arrangement for cooked accessors is to 
> avoid having barriers inserted inline around out-of-line calls to raw 
> accessors,

I see, but why do we want to avoid that? The Linux kernel has no binary 
compatibility to maintain, so it doesn't matter whether the barriers are 
inlined in the drivers or not.

Anyway, I've sent a next version of the patch that makes read*_relaxed 
extern inline.

Mikulas
Maciej W. Rozycki May 25, 2020, 4:39 p.m. UTC | #6
On Mon, 25 May 2020, Mikulas Patocka wrote:

> > > The functions __raw_read* are already extern inline, so the compiler will 
> > > inline/noinline them depending on the macros trivial_io_bw and 
> > > trivial_io_lq - so we can just call them from read*_relaxed without 
> > > repeating the extern inline pattern.
> > 
> >  The whole point of this peculiar arrangement for cooked accessors is to 
> > avoid having barriers inserted inline around out-of-line calls to raw 
> > accessors,
> 
> I see, but why do we want to avoid that? The Linux kernel has no binary 
> compatibility to maintain, so it doesn't matter whether the barriers are 
> inlined in the drivers or not.

 It does matter as it expands code unnecessarily (at all call sites), as I 
noted in the original review.  This has nothing to do with compatibility.

> Anyway, I've sent a next version of the patch that makes read*_relaxed 
> extern inline.

 Thanks, I'll have a look.  And now that you have this update, please run 
`size' on an ALPHA_GENERIC `vmlinux', preferably monolithic, to see what 
the difference is between having the `read*_relaxed' handlers `static 
inline' and having them keyed with `*trivial_rw_*'.

  Maciej
Mikulas Patocka May 26, 2020, 2:48 p.m. UTC | #7
On Mon, 25 May 2020, Maciej W. Rozycki wrote:

> On Mon, 25 May 2020, Mikulas Patocka wrote:
> 
> > > > The functions __raw_read* are already extern inline, so the compiler will 
> > > > inline/noinline them depending on the macros trivial_io_bw and 
> > > > trivial_io_lq - so we can just call them from read*_relaxed without 
> > > > repeating the extern inline pattern.
> > > 
> > >  The whole point of this peculiar arrangement for cooked accessors is to 
> > > avoid having barriers inserted inline around out-of-line calls to raw 
> > > accessors,
> > 
> > I see, but why do we want to avoid that? The Linux kernel has no binary 
> > compatibility to maintain, so it doesn't matter whether the barriers are 
> > inlined in the drivers or not.
> 
>  It does matter as it expands code unnecessarily (at all call sites), as I 
> noted in the original review.  This has nothing to do with compatibility.
> 
> > Anyway, I've sent a next version of the patch that makes read*_relaxed 
> > extern inline.
> 
>  Thanks, I'll have a look.  And now that you have this update, please run 
> `size' on ALPHA_GENERIC `vmlinux', preferably monolithic, to see what the 
> difference is between `read*_relaxed' handlers `static inline' and keyed 
> with `*trivial_rw_*'.
> 
>   Maciej

The patch with static inline:
   text    data     bss     dec     hex filename
124207762       75953010        5426432 205587204       c410304 vmlinux

The patch with extern inline:
   text    data     bss     dec     hex filename
124184422       75953474        5426432 205564328       c40a9a8 vmlinux

(I've sent version 7 of the patch because I misnamed the "write_relaxed" 
function in version 6).

Mikulas
Maciej W. Rozycki May 27, 2020, 12:23 a.m. UTC | #8
On Tue, 26 May 2020, Mikulas Patocka wrote:

> >  Thanks, I'll have a look.  And now that you have this update, please run 
> > `size' on an ALPHA_GENERIC `vmlinux', preferably monolithic, to see what 
> > the difference is between having the `read*_relaxed' handlers `static 
> > inline' and having them keyed with `*trivial_rw_*'.
> > 
> >   Maciej
> 
> The patch with static inline:
>    text    data     bss     dec     hex filename
> 124207762       75953010        5426432 205587204       c410304 vmlinux
> 
> The patch with extern inline:
>    text    data     bss     dec     hex filename
> 124184422       75953474        5426432 205564328       c40a9a8 vmlinux

 I'm not sure why data grew, but the gain in text size may be small, yet 
it is there.

> (I've sent version 7 of the patch because I misnamed the "write_relaxed" 
> function in version 6).

 I have reviewed v7 now, thanks for your effort!

  Maciej

Patch

Index: linux-stable/arch/alpha/include/asm/io.h
===================================================================
--- linux-stable.orig/arch/alpha/include/asm/io.h	2020-05-23 10:01:22.000000000 +0200
+++ linux-stable/arch/alpha/include/asm/io.h	2020-05-23 17:29:22.000000000 +0200
@@ -310,14 +310,18 @@  static inline int __is_mmio(const volati
 #if IO_CONCAT(__IO_PREFIX,trivial_io_bw)
 extern inline unsigned int ioread8(void __iomem *addr)
 {
-	unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread8)(addr);
+	unsigned int ret;
+	mb();
+	ret = IO_CONCAT(__IO_PREFIX,ioread8)(addr);
 	mb();
 	return ret;
 }
 
 extern inline unsigned int ioread16(void __iomem *addr)
 {
-	unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread16)(addr);
+	unsigned int ret;
+	mb();
+	ret = IO_CONCAT(__IO_PREFIX,ioread16)(addr);
 	mb();
 	return ret;
 }
@@ -358,7 +362,9 @@  extern inline void outw(u16 b, unsigned
 #if IO_CONCAT(__IO_PREFIX,trivial_io_lq)
 extern inline unsigned int ioread32(void __iomem *addr)
 {
-	unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread32)(addr);
+	unsigned int ret;
+	mb();
+	ret = IO_CONCAT(__IO_PREFIX,ioread32)(addr);
 	mb();
 	return ret;
 }
@@ -403,14 +409,18 @@  extern inline void __raw_writew(u16 b, v
 
 extern inline u8 readb(const volatile void __iomem *addr)
 {
-	u8 ret = __raw_readb(addr);
+	u8 ret;
+	mb();
+	ret = __raw_readb(addr);
 	mb();
 	return ret;
 }
 
 extern inline u16 readw(const volatile void __iomem *addr)
 {
-	u16 ret = __raw_readw(addr);
+	u16 ret;
+	mb();
+	ret = __raw_readw(addr);
 	mb();
 	return ret;
 }
@@ -451,14 +461,18 @@  extern inline void __raw_writeq(u64 b, v
 
 extern inline u32 readl(const volatile void __iomem *addr)
 {
-	u32 ret = __raw_readl(addr);
+	u32 ret;
+	mb();
+	ret = __raw_readl(addr);
 	mb();
 	return ret;
 }
 
 extern inline u64 readq(const volatile void __iomem *addr)
 {
-	u64 ret = __raw_readq(addr);
+	u64 ret;
+	mb();
+	ret = __raw_readq(addr);
 	mb();
 	return ret;
 }
@@ -487,16 +501,59 @@  extern inline void writeq(u64 b, volatil
 #define outb_p		outb
 #define outw_p		outw
 #define outl_p		outl
-#define readb_relaxed(addr)	__raw_readb(addr)
-#define readw_relaxed(addr)	__raw_readw(addr)
-#define readl_relaxed(addr)	__raw_readl(addr)
-#define readq_relaxed(addr)	__raw_readq(addr)
-#define writeb_relaxed(b, addr)	__raw_writeb(b, addr)
-#define writew_relaxed(b, addr)	__raw_writew(b, addr)
-#define writel_relaxed(b, addr)	__raw_writel(b, addr)
-#define writeq_relaxed(b, addr)	__raw_writeq(b, addr)
 
 /*
+ * The _relaxed functions must be ordered w.r.t. each other, but they don't
+ * have to be ordered w.r.t. other memory accesses.
+ */
+static inline u8 readb_relaxed(const volatile void __iomem *addr)
+{
+	mb();
+	return __raw_readb(addr);
+}
+
+static inline u16 readw_relaxed(const volatile void __iomem *addr)
+{
+	mb();
+	return __raw_readw(addr);
+}
+
+static inline u32 readl_relaxed(const volatile void __iomem *addr)
+{
+	mb();
+	return __raw_readl(addr);
+}
+
+static inline u64 readq_relaxed(const volatile void __iomem *addr)
+{
+	mb();
+	return __raw_readq(addr);
+}
+
+static inline void writeb_relaxed(u8 b, volatile void __iomem *addr)
+{
+	mb();
+	__raw_writeb(b, addr);
+}
+
+static inline void writew_relaxed(u16 b, volatile void __iomem *addr)
+{
+	mb();
+	__raw_writew(b, addr);
+}
+
+static inline void writel_relaxed(u32 b, volatile void __iomem *addr)
+{
+	mb();
+	__raw_writel(b, addr);
+}
+
+static inline void writeq_relaxed(u64 b, volatile void __iomem *addr)
+{
+	mb();
+	__raw_writeq(b, addr);
+}
+/*
  * String version of IO memory access ops:
  */
 extern void memcpy_fromio(void *, const volatile void __iomem *, long);
Index: linux-stable/arch/alpha/kernel/io.c
===================================================================
--- linux-stable.orig/arch/alpha/kernel/io.c	2020-05-23 10:01:22.000000000 +0200
+++ linux-stable/arch/alpha/kernel/io.c	2020-05-23 10:01:22.000000000 +0200
@@ -16,21 +16,27 @@ 
 unsigned int
 ioread8(void __iomem *addr)
 {
-	unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread8)(addr);
+	unsigned int ret;
+	mb();
+	ret = IO_CONCAT(__IO_PREFIX,ioread8)(addr);
 	mb();
 	return ret;
 }
 
 unsigned int ioread16(void __iomem *addr)
 {
-	unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread16)(addr);
+	unsigned int ret;
+	mb();
+	ret = IO_CONCAT(__IO_PREFIX,ioread16)(addr);
 	mb();
 	return ret;
 }
 
 unsigned int ioread32(void __iomem *addr)
 {
-	unsigned int ret = IO_CONCAT(__IO_PREFIX,ioread32)(addr);
+	unsigned int ret;
+	mb();
+	ret = IO_CONCAT(__IO_PREFIX,ioread32)(addr);
 	mb();
 	return ret;
 }
@@ -148,28 +154,36 @@  EXPORT_SYMBOL(__raw_writeq);
 
 u8 readb(const volatile void __iomem *addr)
 {
-	u8 ret = __raw_readb(addr);
+	u8 ret;
+	mb();
+	ret = __raw_readb(addr);
 	mb();
 	return ret;
 }
 
 u16 readw(const volatile void __iomem *addr)
 {
-	u16 ret = __raw_readw(addr);
+	u16 ret;
+	mb();
+	ret = __raw_readw(addr);
 	mb();
 	return ret;
 }
 
 u32 readl(const volatile void __iomem *addr)
 {
-	u32 ret = __raw_readl(addr);
+	u32 ret;
+	mb();
+	ret = __raw_readl(addr);
 	mb();
 	return ret;
 }
 
 u64 readq(const volatile void __iomem *addr)
 {
-	u64 ret = __raw_readq(addr);
+	u64 ret;
+	mb();
+	ret = __raw_readq(addr);
 	mb();
 	return ret;
 }