Patchwork [v6,1/8] atomic: introduce atomic operations

Submitter pingfan liu
Date Nov. 5, 2012, 5:38 a.m.
Message ID <1352093924-17598-2-git-send-email-qemulist@gmail.com>
Permalink /patch/197143/
State New

Comments

pingfan liu - Nov. 5, 2012, 5:38 a.m.
From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

Once outside the global lock, we are exposed to SMP races at the low
level, so we need atomic ops.

This file is a wrapper around the GCC atomic builtins.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 include/qemu/atomic.h |   63 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 63 insertions(+), 0 deletions(-)
 create mode 100644 include/qemu/atomic.h
Paolo Bonzini - Nov. 12, 2012, 9:54 a.m.
Il 05/11/2012 06:38, Liu Ping Fan ha scritto:
> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> 
> If out of global lock, we will be challenged by SMP in low level,
> so need atomic ops.
> 
> This file is a wrapper of GCC atomic builtin.

I still object to this.

I know it enforces type-safety, but it is incomplete.  It provides
neither atomic accesses to pointers nor useful operations such as
exchange.  It won't be used consistently, because in some places you
just do not have an Atomic value (see both current uses of the __sync_*
builtins).

Either make it complete, and prove it by using it where __sync_* is
used now, or just use the gcc builtins directly.

Paolo

pingfan liu - Nov. 13, 2012, 6:48 a.m.
On Mon, Nov 12, 2012 at 5:54 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 05/11/2012 06:38, Liu Ping Fan ha scritto:
>> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>
>> If out of global lock, we will be challenged by SMP in low level,
>> so need atomic ops.
>>
>> This file is a wrapper of GCC atomic builtin.
>
> I still object to this.
>
> I know it enforces type-safety, but it is incomplete.  It doesn't

Although it is incomplete, the remaining cases are rarely used.  Linux
faced the same issue, and the "int" version was enough, so I think we
can borrow from that experience.

> provide neither atomic accesses to pointers, nor useful operations such
> as exchange.  It won't be used consistently, because in some places you
> just do not have an Atomic value (see both current uses of __sync_*
> builtins).
>
> If you can make it complete, and prove it by using it where __sync_* is

In the current code, __sync_* is rarely used; apart from the barriers,
I think only two places use it.  We can substitute it easily.

Regards,
Pingfan
> used now, or just use gcc builtins directly.
>
> Paolo
Paolo Bonzini - Nov. 13, 2012, 10:11 a.m.
> > Il 05/11/2012 06:38, Liu Ping Fan ha scritto:
> > > From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> > >
> > > If out of global lock, we will be challenged by SMP in low level,
> > > so need atomic ops.
> > >
> > > This file is a wrapper of GCC atomic builtin.
> >
> > I still object to this.
> >
> > I know it enforces type-safety, but it is incomplete.  It doesn't
> 
> Although it is incomplete, but the rest cases are rarely used.  Linux
> faces such issue, and the "int" version is enough, so I think we can
> borrow experience from there.

One of the two places that use __sync_* requires 64-bit accesses.  My
RCU prototype required pointer-sized accesses, which you cannot make
type-safe without C++ templates.

> > provide neither atomic accesses to pointers, nor useful operations such
> > as exchange.  It won't be used consistently, because in some places you
> > just do not have an Atomic value (see both current uses of __sync_*
> > builtins).
> >
> > If you can make it complete, and prove it by using it where
> > __sync_* is
> 
> For current code, __sync_* is used rarely, I think except the
> barriers, only two places use it. We can substitute it easily.

No, you cannot.  See above: one doesn't use ints, and the other is
guest state, so migration becomes complicated if you use Atomic.  I'm
happy to be proven wrong, however.

Paolo
pingfan liu - Nov. 14, 2012, 9:38 a.m.
On Tue, Nov 13, 2012 at 6:11 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> > Il 05/11/2012 06:38, Liu Ping Fan ha scritto:
>> > > From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>> > >
>> > > If out of global lock, we will be challenged by SMP in low level,
>> > > so need atomic ops.
>> > >
>> > > This file is a wrapper of GCC atomic builtin.
>> >
>> > I still object to this.
>> >
>> > I know it enforces type-safety, but it is incomplete.  It doesn't
>>
>> Although it is incomplete, but the rest cases are rarely used.  Linux
>> faces such issue, and the "int" version is enough, so I think we can
>> borrow experience from there.
>
> One of the two places that use __sync_* require 64-bit accesses.  My
Yes, these two places are not easy to fix.

> RCU prototype required pointer-sized access, which you cannot make type-
But I think your RCU prototype should rely on the CPU's atomic
instructions, not gcc's atomics.
Otherwise it could be slow (I guess there is something like a spinlock
in there).

Regards,
pingfan

> safe without C++ templates.
Paolo Bonzini - Nov. 14, 2012, 9:47 a.m.
Il 14/11/2012 10:38, liu ping fan ha scritto:
> On Tue, Nov 13, 2012 at 6:11 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>> Il 05/11/2012 06:38, Liu Ping Fan ha scritto:
>>>>> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>>>>
>>>>> If out of global lock, we will be challenged by SMP in low level,
>>>>> so need atomic ops.
>>>>>
>>>>> This file is a wrapper of GCC atomic builtin.
>>>>
>>>> I still object to this.
>>>>
>>>> I know it enforces type-safety, but it is incomplete.  It doesn't
>>>
>>> Although it is incomplete, but the rest cases are rarely used.  Linux
>>> faces such issue, and the "int" version is enough, so I think we can
>>> borrow experience from there.
>>
>> One of the two places that use __sync_* require 64-bit accesses.  My
> Yes, these two places are not easy to fix.

Which shows that Linux's atomic_t is not suited for QEMU, in my opinion.

>> RCU prototype required pointer-sized access, which you cannot make type-
> But I think that your RCU prototype should rely on atomic of CPU, not
> gcc's atomic.

What's the difference?  gcc's atomic produces the same instructions as
hand-written assembly (or should).

> Otherwise, it could be slow (I guess something like spinlock there).

Not sure what you mean.  I'm using two things: 1) write barriers; 2)
atomic_xchg of a pointer for fast, wait-free enqueuing of call_rcu
callbacks.

Paolo
pingfan liu - Nov. 15, 2012, 7:47 a.m.
On Wed, Nov 14, 2012 at 5:47 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 14/11/2012 10:38, liu ping fan ha scritto:
>> On Tue, Nov 13, 2012 at 6:11 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>>> Il 05/11/2012 06:38, Liu Ping Fan ha scritto:
>>>>>> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>>>>>
>>>>>> If out of global lock, we will be challenged by SMP in low level,
>>>>>> so need atomic ops.
>>>>>>
>>>>>> This file is a wrapper of GCC atomic builtin.
>>>>>
>>>>> I still object to this.
>>>>>
>>>>> I know it enforces type-safety, but it is incomplete.  It doesn't
>>>>
>>>> Although it is incomplete, but the rest cases are rarely used.  Linux
>>>> faces such issue, and the "int" version is enough, so I think we can
>>>> borrow experience from there.
>>>
>>> One of the two places that use __sync_* require 64-bit accesses.  My
>> Yes, these two places are not easy to fix.
>
> Which shows that Linux's atomic_t is not suited for QEMU, in my opinion.
>
>>> RCU prototype required pointer-sized access, which you cannot make type-
>> But I think that your RCU prototype should rely on atomic of CPU, not
>> gcc's atomic.
>
> What's the difference?  gcc's atomic produces the same instructions as
> hand-written assembly (or should).
>
Probably I made a mistake here.  In vhost, log =
__sync_fetch_and_and(from, 0) is used to fetch 64 bits atomically in
the case of a 32-bit qemu running on a 64-bit linux, right?  But how
can we do two 32-bit reads atomically?  There seems to be no
instruction like "lock xchg" for this op, so I guess
__sync_fetch_and_and() is based on something like a spinlock.

And I think the breakage is caused by the vhost thread updating the
log while qemu reads it out non-atomically, right?

>> Otherwise, it could be slow (I guess something like spinlock there).
>
> Not sure what you mean.  I'm using two things: 1) write barriers; 2)
> atomic_xchg of a pointer for fast, wait-free enqueuing of call_rcu
> callbacks.
>
I got your main idea and will go through your prototype.  Just one
more question: why is atomic_xchg needed here?  Couldn't the sequence
"read old pointer, set new pointer" satisfy your requirement?

Regards,
Pingfan

> Paolo
Paolo Bonzini - Nov. 15, 2012, 11:24 a.m.
Il 15/11/2012 08:47, liu ping fan ha scritto:
>>>> RCU prototype required pointer-sized access, which you cannot make type-
>>> But I think that your RCU prototype should rely on atomic of CPU, not
>>> gcc's atomic.
>>
>> What's the difference?  gcc's atomic produces the same instructions as
>> hand-written assembly (or should).
>>
> Probably I made a mistake here, in vhost,  log =
> __sync_fetch_and_and(from, 0)  is used to fetch 64bits atomically in
> the case  32bits qemu running on 64bits linux.  Right?   But how can
> we read 32bits twice in atomic?

You can use cmpxchg8b.

>>> Otherwise, it could be slow (I guess something like spinlock there).
>>
>> Not sure what you mean.  I'm using two things: 1) write barriers; 2)
>> atomic_xchg of a pointer for fast, wait-free enqueuing of call_rcu
>> callbacks.
>
> Got your main idea. Will go through your prototype. Just one more
> question, why here atomic_xchg needed?  Could not the sequence "read
> old pointer, set new pointer" satisfy your requirement?

No, it would be racy.  Say you have this:

     wrong                     right
  -----------------------------------------
     ref(new)                  ref(new)
     old = tail                old = new
     tail = new                xchg(&old, &tail)
     old->next = new           old->next = new
     up(&sem)                  up(&sem)

where sem and tail are global, while old and new are local.  There can
be any number of producers as in the above scheme, and one consumer.  In
the consumer, iteration of the list starts from the second element (the
first element is dummy):

    dummy = new Node;
    tail = dummy;
    for(;;) {
        down(&sem);
        node = dummy->next;
        unref(dummy);
        visit node;
        dummy = node;
    }

Then the invariants are:

- the number of elements N in the list (starting from the dummy node and
following ->next) is always >1.  Because of this, you never have to
touch head when adding an element.

- the number of excess signals in the semaphore sem is always <= N-1.
Because of this, it doesn't matter that there is an instant in time
where tail is not reachable from head.  The element is really in the
list (as far as the consumer goes) only after the semaphore is signaled.

In the wrong version, two threads could read the same pointer:

    x = tail                                  |
                         y = tail             |
    tail = new_x                              |
                         tail = new_y         |
    x->next = new_x                           |   old_tail->next = new_x
    up(&sem)                                  |
                         y->next = new_y      |   old_tail->next = new_y
                         up(&sem)             |

No node points to new_x, so you have 2 nodes reachable from the head
(the dummy node, and new_y) with 2 signals on the semaphore,
contradicting the second invariant.

This instead works:

    x = new_x                                 |
                         y = new_y            |
    xchg(&x, &tail)                           |   tail = new_x, x = old_tail
                         xchg(&y, &tail)      |   tail = new_y, y = new_x
    x->next = new_x                           |   old_tail->next = new_x
    up(&sem)                                  |
                         y->next = new_y      |   new_x->next = new_y
                         up(&sem)             |

because you have 3 nodes and 2 signals.

Paolo
Richard Henderson - Nov. 16, 2012, 12:03 a.m.
On 2012-11-14 23:47, liu ping fan wrote:
> Probably I made a mistake here, in vhost,  log =
> __sync_fetch_and_and(from, 0)  is used to fetch 64bits atomically in
> the case  32bits qemu running on 64bits linux.  Right?   But how can
> we read 32bits twice in atomic?  Seem that no instruction like "_lock
> xchg" for this ops.  So I guess _sync_fetch_and_and() based on
> something like spinlock.

... or for gcc 4.7 and later,

  log = __atomic_load_n(from, memory_model)

For i386, we will not perform two 32-bit reads of course.  Paolo suggests
using cmpxchg8b, but that's a tad slow.  Instead we'll perform a 64-bit
read into either the fpu or the sse units, and from there copy the data
wherever it's needed.  Such 64-bit aligned reads are guaranteed to be
atomic for i586 (pentium) and later.

For other 32-bit architectures other possibilities exist.  Recent arm can
use its ldrexd insn.  Many of the 32-bit linux architectures have special
kernel entry points or schemes to perform atomic operations.  These are
generally based on the assumption of a single-processor system, and are
arranged to either disable interrupts or notice that no interrupt occurred,
while executing a code region.

As an ultimate fallback, yes we would use locks.  But none of the host
architectures that QEMU supports needs to do so.


r~
Avi Kivity - Nov. 18, 2012, 10:04 a.m.
On 11/15/2012 09:47 AM, liu ping fan wrote:
> On Wed, Nov 14, 2012 at 5:47 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> Il 14/11/2012 10:38, liu ping fan ha scritto:
>>> On Tue, Nov 13, 2012 at 6:11 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>>>> Il 05/11/2012 06:38, Liu Ping Fan ha scritto:
>>>>>>> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>>>>>>
>>>>>>> If out of global lock, we will be challenged by SMP in low level,
>>>>>>> so need atomic ops.
>>>>>>>
>>>>>>> This file is a wrapper of GCC atomic builtin.
>>>>>>
>>>>>> I still object to this.
>>>>>>
>>>>>> I know it enforces type-safety, but it is incomplete.  It doesn't
>>>>>
>>>>> Although it is incomplete, but the rest cases are rarely used.  Linux
>>>>> faces such issue, and the "int" version is enough, so I think we can
>>>>> borrow experience from there.
>>>>
>>>> One of the two places that use __sync_* require 64-bit accesses.  My
>>> Yes, these two places are not easy to fix.
>>
>> Which shows that Linux's atomic_t is not suited for QEMU, in my opinion.
>>
>>>> RCU prototype required pointer-sized access, which you cannot make type-
>>> But I think that your RCU prototype should rely on atomic of CPU, not
>>> gcc's atomic.
>>
>> What's the difference?  gcc's atomic produces the same instructions as
>> hand-written assembly (or should).
>>
> Probably I made a mistake here, in vhost,  log =
> __sync_fetch_and_and(from, 0)  is used to fetch 64bits atomically in
> the case  32bits qemu running on 64bits linux.  Right?   But how can
> we read 32bits twice in atomic?  Seem that no instruction like "_lock
> xchg" for this ops.  So I guess _sync_fetch_and_and() based on
> something like spinlock.
> 
> And I think the broken issue is caused by vhost thread updates log,
> while qemu read out it not atomicly, Right?

For the log, a 32-bit __sync_fetch_and_and() is sufficient.  We only
need to ensure no bits are lost; we don't need 64-bit atomicity.
pingfan liu - Nov. 21, 2012, 5:57 a.m.
On Sun, Nov 18, 2012 at 6:04 PM, Avi Kivity <avi@redhat.com> wrote:
> On 11/15/2012 09:47 AM, liu ping fan wrote:
>> On Wed, Nov 14, 2012 at 5:47 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>> Il 14/11/2012 10:38, liu ping fan ha scritto:
>>>> On Tue, Nov 13, 2012 at 6:11 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>>>>> Il 05/11/2012 06:38, Liu Ping Fan ha scritto:
>>>>>>>> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>>>>>>>
>>>>>>>> If out of global lock, we will be challenged by SMP in low level,
>>>>>>>> so need atomic ops.
>>>>>>>>
>>>>>>>> This file is a wrapper of GCC atomic builtin.
>>>>>>>
>>>>>>> I still object to this.
>>>>>>>
>>>>>>> I know it enforces type-safety, but it is incomplete.  It doesn't
>>>>>>
>>>>>> Although it is incomplete, but the rest cases are rarely used.  Linux
>>>>>> faces such issue, and the "int" version is enough, so I think we can
>>>>>> borrow experience from there.
>>>>>
>>>>> One of the two places that use __sync_* require 64-bit accesses.  My
>>>> Yes, these two places are not easy to fix.
>>>
>>> Which shows that Linux's atomic_t is not suited for QEMU, in my opinion.
>>>
>>>>> RCU prototype required pointer-sized access, which you cannot make type-
>>>> But I think that your RCU prototype should rely on atomic of CPU, not
>>>> gcc's atomic.
>>>
>>> What's the difference?  gcc's atomic produces the same instructions as
>>> hand-written assembly (or should).
>>>
>> Probably I made a mistake here, in vhost,  log =
>> __sync_fetch_and_and(from, 0)  is used to fetch 64bits atomically in
>> the case  32bits qemu running on 64bits linux.  Right?   But how can
>> we read 32bits twice in atomic?  Seem that no instruction like "_lock
>> xchg" for this ops.  So I guess _sync_fetch_and_and() based on
>> something like spinlock.
>>
>> And I think the broken issue is caused by vhost thread updates log,
>> while qemu read out it not atomicly, Right?
>
> For the log, 32-bit sync_fetch_and_and() is sufficient.  We only need to
> ensure no bits are lost, we don't need 64-bit atomicity.
>
Having read the vhost-related code, I see it is quite different from
the kvm log.  Now I understand it.

Thanks and regards,
Pingfan
pingfan liu - Nov. 21, 2012, 5:58 a.m.
On Fri, Nov 16, 2012 at 8:03 AM, Richard Henderson <rth@twiddle.net> wrote:
> On 2012-11-14 23:47, liu ping fan wrote:
>> Probably I made a mistake here, in vhost,  log =
>> __sync_fetch_and_and(from, 0)  is used to fetch 64bits atomically in
>> the case  32bits qemu running on 64bits linux.  Right?   But how can
>> we read 32bits twice in atomic?  Seem that no instruction like "_lock
>> xchg" for this ops.  So I guess _sync_fetch_and_and() based on
>> something like spinlock.
>
> ... or for gcc 4.7 and later,
>
>   log = __atomic_load_n(from, memory_model)
>
> For i386, we will not perform 2 32-bit reads of course.  Paulo suggests
> using cmpxchg8b, but that's a tad slow.  Instead we'll perform a 64-bit
> read into either the fpu or the sse units, and from there copy the data
> wherever it's needed.  Such 64-bit aligned reads are guaranteed to be
> atomic for i586 (pentium) and later.
>
> For other 32-bit architectures other possibilities exist.  Recent arm can
> use its ldrexd insn.  Many of the 32-bit linux architectures have special
> kernel entry points or schemes to perform atomic operations.  These are
> generally based on the assumption of a single-processor system, and are
> arranged to either disable interrupts or notice that no interrupt occurred,
> while executing a code region.
>
> As an ultimate fallback, yes we would use locks.  But none of the host
> architectures that QEMU supports needs to do so.
>
Thank you very much for such a detailed description.

Regards,
Pingfan

Patch

diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
new file mode 100644
index 0000000..a9e6d35
--- /dev/null
+++ b/include/qemu/atomic.h
@@ -0,0 +1,63 @@ 
+/*
+ * Simple wrapper of gcc Atomic-Builtins
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+#ifndef __QEMU_ATOMIC_H
+#define __QEMU_ATOMIC_H
+
+typedef struct Atomic {
+    volatile int counter;
+} Atomic;
+
+static inline void atomic_set(Atomic *v, int i)
+{
+    v->counter = i;
+}
+
+static inline int atomic_read(Atomic *v)
+{
+    return v->counter;
+}
+
+static inline int atomic_return_and_add(int i, Atomic *v)
+{
+    return __sync_fetch_and_add(&v->counter, i);
+}
+
+static inline int atomic_return_and_sub(int i, Atomic *v)
+{
+    return __sync_fetch_and_sub(&v->counter, i);
+}
+
+/**
+ * atomic_inc - increment atomic variable
+ * @v: pointer of type Atomic
+ *
+ * Atomically increments @v by 1.
+ */
+static inline void atomic_inc(Atomic *v)
+{
+    __sync_fetch_and_add(&v->counter, 1);
+}
+
+/**
+ * atomic_dec - decrement atomic variable
+ * @v: pointer of type Atomic
+ *
+ * Atomically decrements @v by 1.
+ */
+static inline void atomic_dec(Atomic *v)
+{
+    __sync_fetch_and_sub(&v->counter, 1);
+}
+
+#endif