
virtio_ring: use smp_store_mb

Message ID 1450347932-16325-1-git-send-email-mst@redhat.com

Commit Message

Michael S. Tsirkin Dec. 17, 2015, 10:32 a.m. UTC
We need a full barrier after writing out the event index; using smp_store_mb
there seems better than open-coding.
As usual, we need a wrapper to account for strong barriers/non-SMP.

It's tempting to use this in vhost as well; for that, we'll
need a variant of smp_store_mb that works on __user pointers.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---

Seems to give a speedup on my box but I'm less sure about this one.  E.g. is
xchg faster than mfence on all/most Intel CPUs? Does anyone have an opinion?

 include/linux/virtio_ring.h  | 14 ++++++++++++++
 drivers/virtio/virtio_ring.c | 15 +++++++++------
 2 files changed, 23 insertions(+), 6 deletions(-)

Comments

Peter Zijlstra Dec. 17, 2015, 10:52 a.m. UTC | #1
On Thu, Dec 17, 2015 at 12:32:53PM +0200, Michael S. Tsirkin wrote:
> +static inline void virtio_store_mb(bool weak_barriers,
> +				   __virtio16 *p, __virtio16 v)
> +{
> +#ifdef CONFIG_SMP
> +	if (weak_barriers)
> +		smp_store_mb(*p, v);
> +	else
> +#endif
> +	{
> +		WRITE_ONCE(*p, v);
> +		mb();
> +	}
> +}

This is a different barrier depending on SMP, that seems wrong.

smp_mb(), as (should be) used by smp_store_mb() does not provide a
barrier against IO. mb() otoh does.

Since this is virtIO I would expect you always want mb().
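[Editor's note: for readers without the kernel sources at hand, the two store-plus-full-barrier shapes being debated can be modeled in userspace C11. This is a sketch only: `store_mb_xchg` and `store_mb_fence` are made-up names, and the kernel's real smp_store_mb()/mb() are per-architecture macros.]

```c
#include <stdatomic.h>

/* Userspace model only -- not kernel code.  On x86-64,
 * atomic_exchange compiles to xchg, which is a store plus an
 * implicit full barrier: the trick smp_store_mb() relies on
 * when CONFIG_SMP is set. */
static inline void store_mb_xchg(_Atomic unsigned short *p,
                                 unsigned short v)
{
	(void)atomic_exchange(p, v);
}

/* The open-coded alternative from the patch: a plain store
 * followed by a full fence (mb() in the kernel, modeled as a
 * seq_cst fence here). */
static inline void store_mb_fence(_Atomic unsigned short *p,
                                  unsigned short v)
{
	atomic_store_explicit(p, v, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
}
```

Either variant stores the value and globally orders it against later reads; the thread is about which instruction sequence is cheaper (xchg vs. mfence) and which memory domains (CPU-only vs. device) the fence must cover.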
Peter Zijlstra Dec. 17, 2015, 11:22 a.m. UTC | #2
On Thu, Dec 17, 2015 at 12:32:53PM +0200, Michael S. Tsirkin wrote:
> Seems to give a speedup on my box but I'm less sure about this one.  E.g. is
> xchg faster than mfence on all/most Intel CPUs? Does anyone have an opinion?

Would help if you Cc people who would actually know this :-)

Yes, we've recently established that xchg is indeed faster than mfence
on at least recent machines, see:

  lkml.kernel.org/r/CA+55aFynbkeuUGs9s-q+fLY6MeRBA6MjEyWWbbe7A5AaqsAknw@mail.gmail.com

> +static inline void virtio_store_mb(bool weak_barriers,
> +				   __virtio16 *p, __virtio16 v)
> +{
> +#ifdef CONFIG_SMP
> +	if (weak_barriers)
> +		smp_store_mb(*p, v);
> +	else
> +#endif
> +	{
> +		WRITE_ONCE(*p, v);
> +		mb();
> +	}
> +}

Note that virtio_mb() is weirdly inconsistent with virtio_[rw]mb() in
that they use dma_* ops for weak_barriers, while virtio_mb() uses
smp_mb().

As previously stated, smp_mb() does not cover the same memory domains as
dma_mb() would.
Michael S. Tsirkin Dec. 17, 2015, 1:16 p.m. UTC | #3
On Thu, Dec 17, 2015 at 11:52:38AM +0100, Peter Zijlstra wrote:
> On Thu, Dec 17, 2015 at 12:32:53PM +0200, Michael S. Tsirkin wrote:
> > +static inline void virtio_store_mb(bool weak_barriers,
> > +				   __virtio16 *p, __virtio16 v)
> > +{
> > +#ifdef CONFIG_SMP
> > +	if (weak_barriers)
> > +		smp_store_mb(*p, v);
> > +	else
> > +#endif
> > +	{
> > +		WRITE_ONCE(*p, v);
> > +		mb();
> > +	}
> > +}
> 
> This is a different barrier depending on SMP, that seems wrong.

Of course it's wrong in the sense that it's
suboptimal on UP. What we would really like is to
have, on UP, exactly the same barrier as on SMP.
This is because a UP guest can run on an SMP host.

But Linux doesn't provide this ability: if CONFIG_SMP is
not defined, it optimizes most barriers out to a
compiler barrier.

Consider for example x86: what we want is xchg (NOT
mfence - see below for why) but if built without CONFIG_SMP
smp_store_mb does not include this.



> smp_mb(), as (should be) used by smp_store_mb() does not provide a
> barrier against IO. mb() otoh does.
> 
> Since this is virtIO I would expect you always want mb().

No, because it's VIRTio, not real IO :) It just switches to hypervisor
mode - kind of like a function call really.
The weak_barriers flag is cleared when it's used
with real devices with real IO.


All this is explained in some detail at the top of
include/linux/virtio.h
Michael S. Tsirkin Dec. 17, 2015, 1:26 p.m. UTC | #4
On Thu, Dec 17, 2015 at 12:22:22PM +0100, Peter Zijlstra wrote:
> On Thu, Dec 17, 2015 at 12:32:53PM +0200, Michael S. Tsirkin wrote:
> > Seems to give a speedup on my box but I'm less sure about this one.  E.g. is
> > xchg faster than mfence on all/most Intel CPUs? Does anyone have an opinion?
> 
> Would help if you Cc people who would actually know this :-)

Good point. Glad you still saw this. Thanks!

> Yes, we've recently established that xchg is indeed faster than mfence
> on at least recent machines, see:
> 
>   lkml.kernel.org/r/CA+55aFynbkeuUGs9s-q+fLY6MeRBA6MjEyWWbbe7A5AaqsAknw@mail.gmail.com
> 
> > +static inline void virtio_store_mb(bool weak_barriers,
> > +				   __virtio16 *p, __virtio16 v)
> > +{
> > +#ifdef CONFIG_SMP
> > +	if (weak_barriers)
> > +		smp_store_mb(*p, v);
> > +	else
> > +#endif
> > +	{
> > +		WRITE_ONCE(*p, v);
> > +		mb();
> > +	}
> > +}
> 
> Note that virtio_mb() is weirdly inconsistent with virtio_[rw]mb() in
> that they use dma_* ops for weak_barriers, while virtio_mb() uses
> smp_mb().

It's a hack really. I think I'll clean it up a bit to
make it more consistent.

To simplify things, you may consider things before
the optimization brought in by
	commit 9e1a27ea42691429e31f158cce6fc61bc79bb2e9
	Author: Alexander Duyck <alexander.h.duyck@redhat.com>
	Date:   Mon Apr 13 21:03:49 2015 +0930

	    virtio_ring: Update weak barriers to use dma_wmb/rmb

> As previously stated, smp_mb() does not cover the same memory domains as
> dma_mb() would.

I know.  We used to consistently do the right thing on SMP,
but on UP Linux does not have good portable APIs for us
to use. So we hack around with what's available which is
typically stronger than what's really needed.

I guess no one cares about UP that much.

Then Alexander came and tried to optimize UP using
dma_wmb/dma_rmb. I guess he did not find dma_mb, so he
left it as is.

Maybe we should make virtio depend on SMP, and be done with it,
but the amount of code to maintain !SMP is small enough
to not be worth the potential pain to users (if any).
Peter Zijlstra Dec. 17, 2015, 1:57 p.m. UTC | #5
On Thu, Dec 17, 2015 at 03:16:20PM +0200, Michael S. Tsirkin wrote:
> On Thu, Dec 17, 2015 at 11:52:38AM +0100, Peter Zijlstra wrote:
> > On Thu, Dec 17, 2015 at 12:32:53PM +0200, Michael S. Tsirkin wrote:
> > > +static inline void virtio_store_mb(bool weak_barriers,
> > > +				   __virtio16 *p, __virtio16 v)
> > > +{
> > > +#ifdef CONFIG_SMP
> > > +	if (weak_barriers)
> > > +		smp_store_mb(*p, v);
> > > +	else
> > > +#endif
> > > +	{
> > > +		WRITE_ONCE(*p, v);
> > > +		mb();
> > > +	}
> > > +}
> > 
> > This is a different barrier depending on SMP, that seems wrong.
> 
> Of course it's wrong in the sense that it's
> suboptimal on UP. What we would really like is to
> have, on UP, exactly the same barrier as on SMP.
> This is because a UP guest can run on an SMP host.
> 
> But Linux doesn't provide this ability: if CONFIG_SMP is
> not defined, it optimizes most barriers out to a
> compiler barrier.
> 
> Consider for example x86: what we want is xchg (NOT
> mfence - see below for why) but if built without CONFIG_SMP
> smp_store_mb does not include this.

You could of course go fix that instead of mutilating things into
sort-of functional state.

> 
> 
> > smp_mb(), as (should be) used by smp_store_mb() does not provide a
> > barrier against IO. mb() otoh does.
> > 
> > Since this is virtIO I would expect you always want mb().
> 
> No, because it's VIRTio, not real IO :) It just switches to hypervisor
> mode - kind of like a function call really.
> The weak_barriers flag is cleared when it's used
> with real devices with real IO.
> 
> 
> All this is explained in some detail at the top of
> include/linux/virtio.h

I did read that, it didn't make any sense wrt the code below it.

For instance it seems to imply weak_barriers is for smp like stuff while
!weak_barriers is for actual devices.

But then you go use dma_*mb() ops, which are specifically for devices,
only for weak_barrier.
Peter Zijlstra Dec. 17, 2015, 2:02 p.m. UTC | #6
On Thu, Dec 17, 2015 at 03:26:29PM +0200, Michael S. Tsirkin wrote:
> > Note that virtio_mb() is weirdly inconsistent with virtio_[rw]mb() in
> > that they use dma_* ops for weak_barriers, while virtio_mb() uses
> > smp_mb().
> 
> It's a hack really. I think I'll clean it up a bit to
> make it more consistent.
> 
> To simplify things, you may consider things before
> the optimization brought in by
> 	commit 9e1a27ea42691429e31f158cce6fc61bc79bb2e9
> 	Author: Alexander Duyck <alexander.h.duyck@redhat.com>
> 	Date:   Mon Apr 13 21:03:49 2015 +0930
> 
> 	    virtio_ring: Update weak barriers to use dma_wmb/rmb

That commit doesn't make any sense. dma_*mb() explicitly does _NOT_
cover the smp_*mb() part.

Again, look at the ARM definitions, the smp_*mb() primitives use the
inner coherence stuff, while the dma_*mb() primitives use the outer
coherent stuff.

the *mb() primitives cover both.
Michael S. Tsirkin Dec. 17, 2015, 2:33 p.m. UTC | #7
On Thu, Dec 17, 2015 at 02:57:26PM +0100, Peter Zijlstra wrote:
> On Thu, Dec 17, 2015 at 03:16:20PM +0200, Michael S. Tsirkin wrote:
> > On Thu, Dec 17, 2015 at 11:52:38AM +0100, Peter Zijlstra wrote:
> > > On Thu, Dec 17, 2015 at 12:32:53PM +0200, Michael S. Tsirkin wrote:
> > > > +static inline void virtio_store_mb(bool weak_barriers,
> > > > +				   __virtio16 *p, __virtio16 v)
> > > > +{
> > > > +#ifdef CONFIG_SMP
> > > > +	if (weak_barriers)
> > > > +		smp_store_mb(*p, v);
> > > > +	else
> > > > +#endif
> > > > +	{
> > > > +		WRITE_ONCE(*p, v);
> > > > +		mb();
> > > > +	}
> > > > +}
> > > 
> > > This is a different barrier depending on SMP, that seems wrong.
> > 
> > Of course it's wrong in the sense that it's
> > suboptimal on UP. What we would really like is to
> > have, on UP, exactly the same barrier as on SMP.
> > This is because a UP guest can run on an SMP host.
> > 
> > But Linux doesn't provide this ability: if CONFIG_SMP is
> > not defined, it optimizes most barriers out to a
> > compiler barrier.
> > 
> > Consider for example x86: what we want is xchg (NOT
> > mfence - see below for why) but if built without CONFIG_SMP
> > smp_store_mb does not include this.
> 
> You could of course go fix that instead of mutilating things into
> sort-of functional state.

Yes, we'd just need to touch all architectures, all for
the sake of UP which almost no one uses.
Basically, we need APIs that explicitly are
for talking to another kernel on a different CPU on
the same SMP system, and implemented identically
between CONFIG_SMP and !CONFIG_SMP on all architectures.

Do you think this is something of general usefulness,
outside virtio?

> > 
> > 
> > > smp_mb(), as (should be) used by smp_store_mb() does not provide a
> > > barrier against IO. mb() otoh does.
> > > 
> > > Since this is virtIO I would expect you always want mb().
> > 
> > No, because it's VIRTio, not real IO :) It just switches to hypervisor
> > mode - kind of like a function call really.
> > The weak_barriers flag is cleared when it's used
> > with real devices with real IO.
> > 
> > 
> > All this is explained in some detail at the top of
> > include/linux/virtio.h
> 
> I did read that, it didn't make any sense wrt the code below it.
> 
> For instance it seems to imply weak_barriers is for smp like stuff while
> !weak_barriers is for actual devices.
> 
> But then you go use dma_*mb() ops, which are specifically for devices,
> only for weak_barrier.

Yes, the dma_*mb thing is a kind of interface misuse.
It's an optimization for UP introduced here:

commit 9e1a27ea42691429e31f158cce6fc61bc79bb2e9
Author: Alexander Duyck <alexander.h.duyck@redhat.com>
Date:   Mon Apr 13 21:03:49 2015 +0930

    virtio_ring: Update weak barriers to use dma_wmb/rmb 
    This change makes it so that instead of using smp_wmb/rmb which varies
    depending on the kernel configuration we can can use dma_wmb/rmb which for
    most architectures should be equal to or slightly more strict than
    smp_wmb/rmb.
    
    The advantage to this is that these barriers are available to uniprocessor
    builds as well so the performance should improve under such a
    configuration.
    
    Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

but given the confusion this causes, I'm inclined to revert
this, and later switch to a cleaner API if one
appears. Cc Alexander (at the new address).
Michael S. Tsirkin Dec. 17, 2015, 2:34 p.m. UTC | #8
On Thu, Dec 17, 2015 at 03:02:12PM +0100, Peter Zijlstra wrote:
> On Thu, Dec 17, 2015 at 03:26:29PM +0200, Michael S. Tsirkin wrote:
> > > Note that virtio_mb() is weirdly inconsistent with virtio_[rw]mb() in
> > > that they use dma_* ops for weak_barriers, while virtio_mb() uses
> > > smp_mb().
> > 
> > It's a hack really. I think I'll clean it up a bit to
> > make it more consistent.
> > 
> > To simplify things, you may consider things before
> > the optimization brought in by
> > 	commit 9e1a27ea42691429e31f158cce6fc61bc79bb2e9
> > 	Author: Alexander Duyck <alexander.h.duyck@redhat.com>
> > 	Date:   Mon Apr 13 21:03:49 2015 +0930
> > 
> > 	    virtio_ring: Update weak barriers to use dma_wmb/rmb
> 
> That commit doesn't make any sense. dma_*mb() explicitly does _NOT_
> cover the smp_*mb() part.
> 
> Again, look at the ARM definitions, the smp_*mb() primitives use the
> inner coherence stuff, while the dma_*mb() primitives use the outer
> coherent stuff.

Does outer coherent imply inner coherent?

> the *mb() primitives cover both.
Peter Zijlstra Dec. 17, 2015, 2:39 p.m. UTC | #9
On Thu, Dec 17, 2015 at 04:33:44PM +0200, Michael S. Tsirkin wrote:
> On Thu, Dec 17, 2015 at 02:57:26PM +0100, Peter Zijlstra wrote:
> > 
> > You could of course go fix that instead of mutilating things into
> > sort-of functional state.
> 
> Yes, we'd just need to touch all architectures, all for
> the sake of UP which almost no one uses.
> Basically, we need APIs that explicitly are
> for talking to another kernel on a different CPU on
> the same SMP system, and implemented identically
> between CONFIG_SMP and !CONFIG_SMP on all architectures.
> 
> Do you think this is something of general usefulness,
> outside virtio?

I'm not aware of any other case, but if there are more parts of virt
that need this then I see no problem adding it.

That is, virt in general is the only use-case that I can think of,
because this really is an artifact of interfacing with an SMP host while
running a UP kernel.

But I'm really not familiar with virt, so I do not know if there are more
sites outside of virtio that could use this.

Touching all archs is a tad tedious, but it's fairly straightforward.
Michael S. Tsirkin Dec. 17, 2015, 2:43 p.m. UTC | #10
On Thu, Dec 17, 2015 at 03:39:10PM +0100, Peter Zijlstra wrote:
> On Thu, Dec 17, 2015 at 04:33:44PM +0200, Michael S. Tsirkin wrote:
> > On Thu, Dec 17, 2015 at 02:57:26PM +0100, Peter Zijlstra wrote:
> > > 
> > > You could of course go fix that instead of mutilating things into
> > > sort-of functional state.
> > 
> > Yes, we'd just need to touch all architectures, all for
> > the sake of UP which almost no one uses.
> > Basically, we need APIs that explicitly are
> > for talking to another kernel on a different CPU on
> > the same SMP system, and implemented identically
> > between CONFIG_SMP and !CONFIG_SMP on all architectures.
> > 
> > Do you think this is something of general usefulness,
> > outside virtio?
> 
> I'm not aware of any other case, but if there are more parts of virt
> that need this then I see no problem adding it.
> 
> That is, virt in general is the only use-case that I can think of,
> because this really is an artifact of interfacing with an SMP host while
> running a UP kernel.
> 
> But I'm really not familiar with virt, so I do not know if there are more
> sites outside of virtio that could use this.
> 
> Touching all archs is a tad tedious, but it's fairly straightforward.

Thanks, will take a stab at this.
Peter Zijlstra Dec. 17, 2015, 3:09 p.m. UTC | #11
On Thu, Dec 17, 2015 at 04:34:57PM +0200, Michael S. Tsirkin wrote:
> On Thu, Dec 17, 2015 at 03:02:12PM +0100, Peter Zijlstra wrote:

> > > 	commit 9e1a27ea42691429e31f158cce6fc61bc79bb2e9
> > > 	Author: Alexander Duyck <alexander.h.duyck@redhat.com>
> > > 	Date:   Mon Apr 13 21:03:49 2015 +0930
> > > 
> > > 	    virtio_ring: Update weak barriers to use dma_wmb/rmb
> > 
> > That commit doesn't make any sense. dma_*mb() explicitly does _NOT_
> > cover the smp_*mb() part.
> > 
> > Again, look at the ARM definitions, the smp_*mb() primitives use the
> > inner coherence stuff, while the dma_*mb() primitives use the outer
> > coherent stuff.
> 
> Does outer coherent imply inner coherent?
> 
> > the *mb() primitives cover both.

I do not think so, but let's add Will, he dreams this stuff.
Will Deacon Dec. 17, 2015, 3:52 p.m. UTC | #12
On Thu, Dec 17, 2015 at 04:09:17PM +0100, Peter Zijlstra wrote:
> On Thu, Dec 17, 2015 at 04:34:57PM +0200, Michael S. Tsirkin wrote:
> > On Thu, Dec 17, 2015 at 03:02:12PM +0100, Peter Zijlstra wrote:
> 
> > > > 	commit 9e1a27ea42691429e31f158cce6fc61bc79bb2e9
> > > > 	Author: Alexander Duyck <alexander.h.duyck@redhat.com>
> > > > 	Date:   Mon Apr 13 21:03:49 2015 +0930
> > > > 
> > > > 	    virtio_ring: Update weak barriers to use dma_wmb/rmb
> > > 
> > > That commit doesn't make any sense. dma_*mb() explicitly does _NOT_
> > > cover the smp_*mb() part.
> > > 
> > > Again, look at the ARM definitions, the smp_*mb() primitives use the
> > > inner coherence stuff, while the dma_*mb() primitives use the outer
> > > coherent stuff.
> > 
> > Does outer coherent imply inner coherent?
> > 
> > > the *mb() primitives cover both.
> 
> I do not think so, but let's add Will, he dreams this stuff.

Right, and I don't sleep well these days.

Anyway, the outer-shareable domain (osh) is a superset of the
inner-shareable domain (ish). The inner-shareable domain contains the
CPUs and any peripherals that you and I would call "cache coherent". The
outer-shareable domain extends this to cover a strange set of "less cache
coherent" devices, which we would just call "not cache coherent" for the
purposes of Linux. Normal, non-cacheable memory (i.e. the memory type we
use for non-coherent DMA buffers) is outer-shareable.

Since the barrier macros don't know if the device is coherent or not, we
use the stronger semantics of outer-shareable.

I've not been following the thread, but I reckon we could add dma_mb()
(as dmb(osh) on arm), then use that to build dma_load_acquire and
dma_store_release accessors. On arm64, we could probably use the
acquire/release instructions directly, since they inherit the shareability
domain of the address (which has the nice property of being inner-shareable
for coherent devices).

The massive pain with adding new accessors is defining the semantics.
memory-barriers.txt is already lacking for the CPU side, and we're
struggling to express the kind of transitivity guarantees we provide
today, let alone with new primitives :(

Will
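[Editor's note: the dma_store_release/dma_load_acquire accessors Will sketches can be modeled in userspace C11 like this. The names come from his mail and did not exist in the kernel at this point; on arm64 they could map to single stlrh/ldarh instructions, as he notes.]

```c
#include <stdatomic.h>

/* Hypothetical accessor pair, modeled with C11 acquire/release
 * semantics.  The release store orders all prior writes before the
 * store itself; the acquire load orders the load before all later
 * reads -- exactly the pairing a producer/consumer ring needs. */
static inline void dma_store_release_u16(_Atomic unsigned short *p,
                                         unsigned short v)
{
	atomic_store_explicit(p, v, memory_order_release);
}

static inline unsigned short dma_load_acquire_u16(_Atomic unsigned short *p)
{
	return atomic_load_explicit(p, memory_order_acquire);
}
```

The attraction of acquire/release over a free-standing dma_mb() is that the ordering attaches to a specific access, which on arm64 inherits the shareability domain of the address being accessed.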
Michael S. Tsirkin Dec. 17, 2015, 7:21 p.m. UTC | #13
On Thu, Dec 17, 2015 at 03:52:24PM +0000, Will Deacon wrote:
> On Thu, Dec 17, 2015 at 04:09:17PM +0100, Peter Zijlstra wrote:
> > On Thu, Dec 17, 2015 at 04:34:57PM +0200, Michael S. Tsirkin wrote:
> > > On Thu, Dec 17, 2015 at 03:02:12PM +0100, Peter Zijlstra wrote:
> > 
> > > > > 	commit 9e1a27ea42691429e31f158cce6fc61bc79bb2e9
> > > > > 	Author: Alexander Duyck <alexander.h.duyck@redhat.com>
> > > > > 	Date:   Mon Apr 13 21:03:49 2015 +0930
> > > > > 
> > > > > 	    virtio_ring: Update weak barriers to use dma_wmb/rmb
> > > > 
> > > > That commit doesn't make any sense. dma_*mb() explicitly does _NOT_
> > > > cover the smp_*mb() part.
> > > > 
> > > > Again, look at the ARM definitions, the smp_*mb() primitives use the
> > > > inner coherence stuff, while the dma_*mb() primitives use the outer
> > > > coherent stuff.
> > > 
> > > Does outer coherent imply inner coherent?
> > > 
> > > > the *mb() primitives cover both.
> > 
> > I do not think so, but let's add Will, he dreams this stuff.
> 
> Right, and I don't sleep well these days.
> 
> Anyway, the outer-shareable domain (osh) is a superset of the
> inner-shareable domain (ish). The inner-shareable domain contains the
> CPUs and any peripherals that you and I would call "cache coherent". The
> outer-shareable domain extends this to cover a strange set of "less cache
> coherent" devices, which we would just call "not cache coherent" for the
> purposes of Linux. Normal, non-cacheable memory (i.e. the memory type we
> use for non-coherent DMA buffers) is outer-shareable.
> 
> Since the barrier macros don't know if the device is coherent or not, we
> use the stronger semantics of outer-shareable.
> 
> I've not been following the thread, but I reckon we could add dma_mb()
> (as dmb(osh) on arm), then use that to build dma_load_acquire and
> dma_store_release accessors. On arm64, we could probably use the
> acquire/release instructions directly, since they inherit the shareability
> domain of the address (which has the nice property of being inner-shareable
> for coherent devices).
> 
> The massive pain with adding new accessors is defining the semantics.
> memory-barriers.txt is already lacking for the CPU side, and we're
> struggling to express the kind of transitivity guarantees we provide
> today, let alone with new primitives :(
> 
> Will

Well, virtio might have wanted dma_mb in the past.

But if we are adding barrier stuff anyway, we really want
pv_ counterparts to smp_ that do the same on CONFIG_SMP
but don't change when CONFIG_SMP is disabled.
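[Editor's note: one possible shape for such barriers, modeled in userspace C11 rather than kernel internals — the key property is that they compile to the SMP barrier even on a UP build, because the peer (the hypervisor) is always effectively another CPU. The `virt_` prefix here is illustrative; barriers along these lines were later added to the kernel.]

```c
#include <stdatomic.h>

/* Always a full CPU-to-CPU barrier, independent of any SMP/UP
 * build-time switch. */
#define virt_mb() \
	atomic_thread_fence(memory_order_seq_cst)

/* Store plus full barrier; atomic_exchange gives both in one
 * operation (xchg on x86), mirroring smp_store_mb() on SMP. */
#define virt_store_mb(var, value) \
	do { (void)atomic_exchange(&(var), (value)); } while (0)
```

With something like this, virtio_store_mb() could drop its #ifdef CONFIG_SMP branch entirely for the weak_barriers case.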

Patch

diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
index 0135c16..8912189 100644
--- a/include/linux/virtio_ring.h
+++ b/include/linux/virtio_ring.h
@@ -47,6 +47,20 @@  static inline void virtio_wmb(bool weak_barriers)
 		wmb();
 }
 
+static inline void virtio_store_mb(bool weak_barriers,
+				   __virtio16 *p, __virtio16 v)
+{
+#ifdef CONFIG_SMP
+	if (weak_barriers)
+		smp_store_mb(*p, v);
+	else
+#endif
+	{
+		WRITE_ONCE(*p, v);
+		mb();
+	}
+}
+
 static inline __virtio16 virtio_load_acquire(bool weak_barriers, __virtio16 *p)
 {
 	if (!weak_barriers) {
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index f822cab..b0aea67 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -517,10 +517,10 @@  void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
 	/* If we expect an interrupt for the next entry, tell host
 	 * by writing event index and flush out the write before
 	 * the read in the next get_buf call. */
-	if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
-		vring_used_event(&vq->vring) = cpu_to_virtio16(_vq->vdev, vq->last_used_idx);
-		virtio_mb(vq->weak_barriers);
-	}
+	if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT))
+		virtio_store_mb(vq->weak_barriers,
+				&vring_used_event(&vq->vring),
+				cpu_to_virtio16(_vq->vdev, vq->last_used_idx));
 
 #ifdef DEBUG
 	vq->last_add_time_valid = false;
@@ -653,8 +653,11 @@  bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
 	}
 	/* TODO: tune this threshold */
 	bufs = (u16)(vq->avail_idx_shadow - vq->last_used_idx) * 3 / 4;
-	vring_used_event(&vq->vring) = cpu_to_virtio16(_vq->vdev, vq->last_used_idx + bufs);
-	virtio_mb(vq->weak_barriers);
+
+	virtio_store_mb(vq->weak_barriers,
+			&vring_used_event(&vq->vring),
+			cpu_to_virtio16(_vq->vdev, vq->last_used_idx + bufs));
+
 	if (unlikely((u16)(virtio16_to_cpu(_vq->vdev, vq->vring.used->idx) - vq->last_used_idx) > bufs)) {
 		END_USE(vq);
 		return false;