Patchwork qemu/virtio: make wmb compiler barrier + comments

login
register
mail settings
Submitter Michael S. Tsirkin
Date Oct. 26, 2009, 1:17 p.m.
Message ID <20091026131715.GA25271@redhat.com>
Download mbox | patch
Permalink /patch/36905/
State New
Headers show

Comments

Michael S. Tsirkin - Oct. 26, 2009, 1:17 p.m.
wmb must be at least a compiler barrier, even without SMP.
Further, we likely need some rmb()/mb() as well:
I have not audited the code but lguest has mb(),
add a comment for now.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/virtio.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)
Paul Brook - Nov. 11, 2009, 1:34 a.m.
On Monday 26 October 2009, Michael S. Tsirkin wrote:
> wmb must be at least a compiler barrier, even without SMP.

Why?

Paul
Michael S. Tsirkin - Nov. 11, 2009, 9:37 a.m.
On Wed, Nov 11, 2009 at 01:34:12AM +0000, Paul Brook wrote:
> On Monday 26 October 2009, Michael S. Tsirkin wrote:
> > wmb must be at least a compiler barrier, even without SMP.
> 
> Why?

Because virtio code might run on a separate thread from guest.
If compiler reorders writes, guest might see inconsistent data.
Paul Brook - Nov. 11, 2009, 1:01 p.m.
On Wednesday 11 November 2009, Michael S. Tsirkin wrote:
> On Wed, Nov 11, 2009 at 01:34:12AM +0000, Paul Brook wrote:
> > On Monday 26 October 2009, Michael S. Tsirkin wrote:
> > > wmb must be at least a compiler barrier, even without SMP.
> >
> > Why?
> 
> Because virtio code might run on a separate thread from guest.
> If compiler reorders writes, guest might see inconsistent data.

If you've got threads running in parallel (which may be running on separate 
CPUs) then you need an actual memory barrier to prevent the hardware 
reordering things behind your back.

If you've already used locking to avoid simultaneous execution then the 
locking routines already include memory barriers.

A "compiler memory barrier" provides absolutely no guarantees in a 
multithreaded environment. They are sometimes useful in a single threaded 
interruptable system (i.e. UNIX signals), but that's definitely not the case 
here.

Paul
Michael S. Tsirkin - Nov. 11, 2009, 1:12 p.m.
On Wed, Nov 11, 2009 at 01:01:03PM +0000, Paul Brook wrote:
> On Wednesday 11 November 2009, Michael S. Tsirkin wrote:
> > On Wed, Nov 11, 2009 at 01:34:12AM +0000, Paul Brook wrote:
> > > On Monday 26 October 2009, Michael S. Tsirkin wrote:
> > > > wmb must be at least a compiler barrier, even without SMP.
> > >
> > > Why?
> > 
> > Because virtio code might run on a separate thread from guest.
> > If compiler reorders writes, guest might see inconsistent data.
> 
> If you've got threads running in parallel (which may be running on separate 
> CPUs)

Yes, but you asked what happens without SMP (single CPU).

> then you need an actual memory barrier to prevent the hardware 
> reordering things behind your back.
> 
> If you've already used locking to avoid simultaneous execution then the 
> locking routines already include memory barriers.

You can not share a lock with guest.

> A "compiler memory barrier" provides absolutely no guarantees in a 
> multithreaded environment. They are sometimes useful in a single threaded 
> interruptable system (i.e. UNIX signals), but that's definitely not the case 
> here.
> 
> Paul

"absolutely no guarantees" is surely wrong.  On intel CPUs, regular
memory writes are never re-ordered by the CPU.  Only compiler can
reorder such writes.  So yes, on this platform a "compiler barrier" does
provide necessary and sufficient guarantees agains write reordering in a
multithreaded environment, both with and without SMP.
Michael S. Tsirkin - Nov. 11, 2009, 1:15 p.m.
On Wed, Nov 11, 2009 at 01:01:03PM +0000, Paul Brook wrote:
> A "compiler memory barrier" provides absolutely no guarantees in a 
> multithreaded environment.

Claims including words "absolutely no" have absolutely no
chance to be correct.
Paul Brook - Nov. 11, 2009, 1:45 p.m.
> > > > > wmb must be at least a compiler barrier, even without SMP.
> > > >
> > > > Why?
> > >
> > > Because virtio code might run on a separate thread from guest.
> > > If compiler reorders writes, guest might see inconsistent data.
> >
> > If you've got threads running in parallel (which may be running on
> > separate CPUs)
> 
> Yes, but you asked what happens without SMP (single CPU).

I assumed you meant guest SMP where guest code can run in parallel with host 
code.  My understanding is that qemu currently does not implement this, and 
everything always runs in lockstep.
 
> > then you need an actual memory barrier to prevent the hardware
> > reordering things behind your back.
> >
> > If you've already used locking to avoid simultaneous execution then the
> > locking routines already include memory barriers.
> 
> You can not share a lock with guest.

No, but you can prevent the guest code running at the same time as host code.

> > A "compiler memory barrier" provides absolutely no guarantees in a
> > multithreaded environment. They are sometimes useful in a single threaded
> > interruptable system (i.e. UNIX signals), but that's definitely not the
> > case here.
> 
> "absolutely no guarantees" is surely wrong.  On intel CPUs, regular
> memory writes are never re-ordered by the CPU.  

Technically true, however reads and writes may be reordered. 
If you don't need real barriers, then why does the kvm code have them? Also, 
there's no indication that your code is specific to Intel CPUs.

Paul
Michael S. Tsirkin - Nov. 11, 2009, 2:08 p.m.
On Wed, Nov 11, 2009 at 01:45:35PM +0000, Paul Brook wrote:
> If you don't need real barriers, then why does the kvm code have them?

We need real barriers but AFAIK kvm does not have them :(
IOW: virtio is currently broken with kvm, and my patch did
not fix this. The comment that I added says as much.
Paul Brook - Nov. 11, 2009, 2:16 p.m.
On Wednesday 11 November 2009, Michael S. Tsirkin wrote:
> On Wed, Nov 11, 2009 at 01:45:35PM +0000, Paul Brook wrote:
> > If you don't need real barriers, then why does the kvm code have them?
> 
> We need real barriers but AFAIK kvm does not have them :(
> IOW: virtio is currently broken with kvm, and my patch did
> not fix this. The comment that I added says as much.

So your code just makes the bug harder to reproduce? Doesn't sound like a good 
thing to me.

Paul
Michael S. Tsirkin - Nov. 11, 2009, 2:34 p.m.
On Wed, Nov 11, 2009 at 02:16:00PM +0000, Paul Brook wrote:
> On Wednesday 11 November 2009, Michael S. Tsirkin wrote:
> > On Wed, Nov 11, 2009 at 01:45:35PM +0000, Paul Brook wrote:
> > > If you don't need real barriers, then why does the kvm code have them?
> > 
> > We need real barriers but AFAIK kvm does not have them :(
> > IOW: virtio is currently broken with kvm, and my patch did
> > not fix this. The comment that I added says as much.
> 
> So your code just makes the bug harder to reproduce? Doesn't sound like a good 
> thing to me.
> 
> Paul

There are multiple bugs, I can't fix all of them,
for all architectures, in one patch.
Paul Brook - Nov. 11, 2009, 4:13 p.m.
On Wednesday 11 November 2009, Michael S. Tsirkin wrote:
> On Wed, Nov 11, 2009 at 02:16:00PM +0000, Paul Brook wrote:
> > On Wednesday 11 November 2009, Michael S. Tsirkin wrote:
> > > On Wed, Nov 11, 2009 at 01:45:35PM +0000, Paul Brook wrote:
> > > > If you don't need real barriers, then why does the kvm code have
> > > > them?
> > >
> > > We need real barriers but AFAIK kvm does not have them :(
> > > IOW: virtio is currently broken with kvm, and my patch did
> > > not fix this. The comment that I added says as much.
> >
> > So your code just makes the bug harder to reproduce? Doesn't sound like a
> > good thing to me.
> >
> > Paul
> 
> There are multiple bugs, I can't fix all of them,
> for all architectures, in one patch.

Maybe not, but the associated comment now appears to be bogus. Either we 
execute everything in lockstep (as the comment says, in which case no barriers 
are required), or we need proper SMP barriers.
Having a half-harted barrier that maybe works some of the time on some hosts, 
and is completely unnecessary in most cases, and doesn't match the comment is 
IMO significantly worse than what we had before.

Paul
Scott Tsai - Nov. 11, 2009, 5:18 p.m.
On Wed, Nov 11, 2009 at 10:08 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Wed, Nov 11, 2009 at 01:45:35PM +0000, Paul Brook wrote:
>> If you don't need real barriers, then why does the kvm code have them?
>
> We need real barriers but AFAIK kvm does not have them :(
> IOW: virtio is currently broken with kvm, and my patch did
> not fix this. The comment that I added says as much.

How about just using GCC's __sync__synchronize atomic builtin (if
detected as available by configure)?
It's a full memory barrier instead of just a write barrier,  for x86,
it generates the same code as the current Linux mb() implementation:
"mfence" on x86_64
"lock orl $0x0,(%esp)" on x86 unless -march is specified to a
processor with "mfence".
PPC could continue to use "eieio" while other architectures could just
default to __sync_synchronize

I do have a newbie question, when exactly would vrtio have to handle
concurrent access from multiple threads?
My current reading of the code suggests:
1. when CONFIG_IOTHREAD is true
2. when CONFIG_KVM is true and the guest machine has multiple CPUs
Michael S. Tsirkin - Nov. 11, 2009, 6:08 p.m.
On Thu, Nov 12, 2009 at 01:18:11AM +0800, Scott Tsai wrote:
> On Wed, Nov 11, 2009 at 10:08 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Wed, Nov 11, 2009 at 01:45:35PM +0000, Paul Brook wrote:
> >> If you don't need real barriers, then why does the kvm code have them?
> >
> > We need real barriers but AFAIK kvm does not have them :(
> > IOW: virtio is currently broken with kvm, and my patch did
> > not fix this. The comment that I added says as much.
> 
> How about just using GCC's __sync__synchronize atomic builtin (if
> detected as available by configure)?
> It's a full memory barrier instead of just a write barrier,  for x86,
> it generates the same code as the current Linux mb() implementation:
> "mfence" on x86_64
> "lock orl $0x0,(%esp)" on x86 unless -march is specified to a
> processor with "mfence".
> PPC could continue to use "eieio" while other architectures could just
> default to __sync_synchronize

Hmm, on x86 that's more expensive than it needs to be...
We can also ifdef platforms that we care about ...

> I do have a newbie question, when exactly would vrtio have to handle
> concurrent access from multiple threads?
> My current reading of the code suggests:
> 1. when CONFIG_IOTHREAD is true
> 2. when CONFIG_KVM is true and the guest machine has multiple CPUs
Michael S. Tsirkin - Nov. 11, 2009, 6:09 p.m.
On Thu, Nov 12, 2009 at 01:18:11AM +0800, Scott Tsai wrote:
> On Wed, Nov 11, 2009 at 10:08 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Wed, Nov 11, 2009 at 01:45:35PM +0000, Paul Brook wrote:
> >> If you don't need real barriers, then why does the kvm code have them?
> >
> > We need real barriers but AFAIK kvm does not have them :(
> > IOW: virtio is currently broken with kvm, and my patch did
> > not fix this. The comment that I added says as much.
> 
> How about just using GCC's __sync__synchronize atomic builtin (if
> detected as available by configure)?
> It's a full memory barrier instead of just a write barrier,  for x86,
> it generates the same code as the current Linux mb() implementation:
> "mfence" on x86_64
> "lock orl $0x0,(%esp)" on x86 unless -march is specified to a
> processor with "mfence".
> PPC could continue to use "eieio" while other architectures could just
> default to __sync_synchronize
> 
> I do have a newbie question, when exactly would vrtio have to handle
> concurrent access from multiple threads?
> My current reading of the code suggests:
> 1. when CONFIG_IOTHREAD is true
> 2. when CONFIG_KVM is true and the guest machine has multiple CPUs

Right. I don't think CONFIG_IOTHREAD can work correctly
without kvm though: how would atomics be handled?
Scott Tsai - Nov. 11, 2009, 6:37 p.m.
On Thu, Nov 12, 2009 at 2:09 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> I do have a newbie question, when exactly would vrtio have to handle
>> concurrent access from multiple threads?
>> My current reading of the code suggests:
>> 1. when CONFIG_IOTHREAD is true
>> 2. when CONFIG_KVM is true and the guest machine has multiple CPUs
>
> Right. I don't think CONFIG_IOTHREAD can work correctly
> without kvm though: how would atomics be handled?

I naively imagined it to work like this:
When CONFIG_IOTHREAD is true and CONFIG_KVM is false,
all the tcg CPUs run in the tcg_cpu_thread and device emulation code
runs in io_thread,
so if the tcg translators generate suitable memory barrier
instructions when it sees a "lfence", "sfence", "mfence" instruction
while emulating a x86 or "sync" while emulating a MIPS everything
should work but a quick look at target-*/translate.c suggests memory
barrier instructions are treated as nops.

So maybe --enable-io-thread while --disable-kvm should not be allowed
at configure time.
Does anyone actually ship qemu with CONFIG_IOTHREAD enabled?
Michael S. Tsirkin - Nov. 11, 2009, 7:56 p.m.
On Thu, Nov 12, 2009 at 02:37:26AM +0800, Scott Tsai wrote:
> On Thu, Nov 12, 2009 at 2:09 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >> I do have a newbie question, when exactly would vrtio have to handle
> >> concurrent access from multiple threads?
> >> My current reading of the code suggests:
> >> 1. when CONFIG_IOTHREAD is true
> >> 2. when CONFIG_KVM is true and the guest machine has multiple CPUs
> >
> > Right. I don't think CONFIG_IOTHREAD can work correctly
> > without kvm though: how would atomics be handled?
> 
> I naively imagined it to work like this:
> When CONFIG_IOTHREAD is true and CONFIG_KVM is false,
> all the tcg CPUs run in the tcg_cpu_thread and device emulation code
> runs in io_thread,
> so if the tcg translators generate suitable memory barrier
> instructions when it sees a "lfence", "sfence", "mfence" instruction
> while emulating a x86 or "sync" while emulating a MIPS everything
> should work

that might not be enough. guest can do e.g. atomics on the same memory
with iothread. In parctice, with virtio it doesn't.

> but a quick look at target-*/translate.c suggests memory
> barrier instructions are treated as nops.
> 
> So maybe --enable-io-thread while --disable-kvm should not be allowed
> at configure time.
> Does anyone actually ship qemu with CONFIG_IOTHREAD enabled?
Jamie Lokier - Nov. 13, 2009, 3 a.m.
Scott Tsai wrote:
> I do have a newbie question, when exactly would vrtio have to handle
> concurrent access from multiple threads?
> My current reading of the code suggests:
> 1. when CONFIG_IOTHREAD is true
> 2. when CONFIG_KVM is true and the guest machine has multiple CPUs

It's enough to have CONFIG_KVM with a single guest CPU, isn't it?

Because the guest CPU can still run concurrently with the I/O thread
on kvm, and the host may have multiple CPUs even though the guest does
not.

-- Jamie

Patch

diff --git a/hw/virtio.c b/hw/virtio.c
index 337ff27..1f92171 100644
--- a/hw/virtio.c
+++ b/hw/virtio.c
@@ -23,8 +23,11 @@ 
 /* QEMU doesn't strictly need write barriers since everything runs in
  * lock-step.  We'll leave the calls to wmb() in though to make it obvious for
  * KVM or if kqemu gets SMP support.
+ * In any case, we must prevent the compiler from reordering the code.
+ * TODO: we likely need some rmb()/mb() as well.
  */
-#define wmb() do { } while (0)
+
+#define wmb() __asm__ __volatile__("": : :"memory")
 
 typedef struct VRingDesc
 {