Message ID | 1352093924-17598-2-git-send-email-qemulist@gmail.com |
---|---|
State | New |
Headers | show |
Il 05/11/2012 06:38, Liu Ping Fan ha scritto: > From: Liu Ping Fan <pingfank@linux.vnet.ibm.com> > > If out of global lock, we will be challenged by SMP in low level, > so need atomic ops. > > This file is a wrapper of GCC atomic builtin. I still object to this. I know it enforces type-safety, but it is incomplete. It doesn't provide neither atomic accesses to pointers, nor useful operations such as exchange. It won't be used consistently, because in some places you just do not have an Atomic value (see both current uses of __sync_* builtins). If you can make it complete, and prove it by using it where __sync_* is used now, or just use gcc builtins directly. Paolo > > Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com> > --- > include/qemu/atomic.h | 63 +++++++++++++++++++++++++++++++++++++++++++++++++ > 1 files changed, 63 insertions(+), 0 deletions(-) > create mode 100644 include/qemu/atomic.h > > diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h > new file mode 100644 > index 0000000..a9e6d35 > --- /dev/null > +++ b/include/qemu/atomic.h > @@ -0,0 +1,63 @@ > +/* > + * Simple wrapper of gcc Atomic-Builtins > + * > + * This work is licensed under the terms of the GNU GPL, version 2 or later. > + * See the COPYING file in the top-level directory. > + * > + */ > +#ifndef __QEMU_ATOMIC_H > +#define __QEMU_ATOMIC_H > + > +typedef struct Atomic { > + volatile int counter; > +} Atomic; > + > +static inline void atomic_set(Atomic *v, int i) > +{ > + v->counter = i; > +} > + > +static inline int atomic_read(Atomic *v) > +{ > + return v->counter; > +} > + > +static inline int atomic_return_and_add(int i, Atomic *v) > +{ > + int ret; > + > + ret = __sync_fetch_and_add(&v->counter, i); > + return ret; > +} > + > +static inline int atomic_return_and_sub(int i, Atomic *v) > +{ > + int ret; > + > + ret = __sync_fetch_and_sub(&v->counter, i); > + return ret; > +} > + > +/** > + * * atomic_inc - increment atomic variable > + * * @v: pointer of type Atomic > + * * > + * * Atomically increments @v by 1. > + * */ > +static inline void atomic_inc(Atomic *v) > +{ > + __sync_fetch_and_add(&v->counter, 1); > +} > + > +/** > + * * atomic_dec - decrement atomic variable > + * * @v: pointer of type Atomic > + * * > + * * Atomically decrements @v by 1. > + * */ > +static inline void atomic_dec(Atomic *v) > +{ > + __sync_fetch_and_sub(&v->counter, 1); > +} > + > +#endif >
On Mon, Nov 12, 2012 at 5:54 PM, Paolo Bonzini <pbonzini@redhat.com> wrote: > Il 05/11/2012 06:38, Liu Ping Fan ha scritto: >> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com> >> >> If out of global lock, we will be challenged by SMP in low level, >> so need atomic ops. >> >> This file is a wrapper of GCC atomic builtin. > > I still object to this. > > I know it enforces type-safety, but it is incomplete. It doesn't Although it is incomplete, but the rest cases are rarely used. Linux faces such issue, and the "int" version is enough, so I think we can borrow experience from there. > provide neither atomic accesses to pointers, nor useful operations such > as exchange. It won't be used consistently, because in some places you > just do not have an Atomic value (see both current uses of __sync_* > builtins). > > If you can make it complete, and prove it by using it where __sync_* is For current code, __sync_* is used rarely, I think except the barriers, only two places use it. We can substitute it easily. Regards, Pingfan > used now, or just use gcc builtins directly. > > Paolo > >> >> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com> >> --- >> include/qemu/atomic.h | 63 +++++++++++++++++++++++++++++++++++++++++++++++++ >> 1 files changed, 63 insertions(+), 0 deletions(-) >> create mode 100644 include/qemu/atomic.h >> >> diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h >> new file mode 100644 >> index 0000000..a9e6d35 >> --- /dev/null >> +++ b/include/qemu/atomic.h >> @@ -0,0 +1,63 @@ >> +/* >> + * Simple wrapper of gcc Atomic-Builtins >> + * >> + * This work is licensed under the terms of the GNU GPL, version 2 or later. >> + * See the COPYING file in the top-level directory. >> + * >> + */ >> +#ifndef __QEMU_ATOMIC_H >> +#define __QEMU_ATOMIC_H >> + >> +typedef struct Atomic { >> + volatile int counter; >> +} Atomic; >> + >> +static inline void atomic_set(Atomic *v, int i) >> +{ >> + v->counter = i; >> +} >> + >> +static inline int atomic_read(Atomic *v) >> +{ >> + return v->counter; >> +} >> + >> +static inline int atomic_return_and_add(int i, Atomic *v) >> +{ >> + int ret; >> + >> + ret = __sync_fetch_and_add(&v->counter, i); >> + return ret; >> +} >> + >> +static inline int atomic_return_and_sub(int i, Atomic *v) >> +{ >> + int ret; >> + >> + ret = __sync_fetch_and_sub(&v->counter, i); >> + return ret; >> +} >> + >> +/** >> + * * atomic_inc - increment atomic variable >> + * * @v: pointer of type Atomic >> + * * >> + * * Atomically increments @v by 1. >> + * */ >> +static inline void atomic_inc(Atomic *v) >> +{ >> + __sync_fetch_and_add(&v->counter, 1); >> +} >> + >> +/** >> + * * atomic_dec - decrement atomic variable >> + * * @v: pointer of type Atomic >> + * * >> + * * Atomically decrements @v by 1. >> + * */ >> +static inline void atomic_dec(Atomic *v) >> +{ >> + __sync_fetch_and_sub(&v->counter, 1); >> +} >> + >> +#endif >> >
> > Il 05/11/2012 06:38, Liu Ping Fan ha scritto: > > > From: Liu Ping Fan <pingfank@linux.vnet.ibm.com> > > > > > > If out of global lock, we will be challenged by SMP in low level, > > > so need atomic ops. > > > > > > This file is a wrapper of GCC atomic builtin. > > > > I still object to this. > > > > I know it enforces type-safety, but it is incomplete. It doesn't > > Although it is incomplete, but the rest cases are rarely used. Linux > faces such issue, and the "int" version is enough, so I think we can > borrow experience from there. One of the two places that use __sync_* require 64-bit accesses. My RCU prototype required pointer-sized access, which you cannot make type- safe without C++ templates. > > provide neither atomic accesses to pointers, nor useful operations such > > as exchange. It won't be used consistently, because in some places you > > just do not have an Atomic value (see both current uses of __sync_* > > builtins). > > > > If you can make it complete, and prove it by using it where > > __sync_* is > > For current code, __sync_* is used rarely, I think except the > barriers, only two places use it. We can substitute it easily. No, you cannot. See above, one doesn't use ints and the other is guest state so migration becomes complicated if you use Atomic. I'm happy to be proven wrong, however. Paolo
> > Il 05/11/2012 06:38, Liu Ping Fan ha scritto: > > > From: Liu Ping Fan <pingfank@linux.vnet.ibm.com> > > > > > > If out of global lock, we will be challenged by SMP in low level, > > > so need atomic ops. > > > > > > This file is a wrapper of GCC atomic builtin. > > > > I still object to this. > > > > I know it enforces type-safety, but it is incomplete. It doesn't > > Although it is incomplete, but the rest cases are rarely used. Linux > faces such issue, and the "int" version is enough, so I think we can > borrow experience from there. One of the two places that use __sync_* require 64-bit accesses. My RCU prototype required pointer-sized access, which you cannot make type- safe without C++ templates. > > provide neither atomic accesses to pointers, nor useful operations such > > as exchange. It won't be used consistently, because in some places you > > just do not have an Atomic value (see both current uses of __sync_* > > builtins). > > > > If you can make it complete, and prove it by using it where > > __sync_* is > > For current code, __sync_* is used rarely, I think except the > barriers, only two places use it. We can substitute it easily. No, you cannot. See above, one doesn't use ints and the other is guest state so migration becomes complicated if you use Atomic. I'm happy to be proven wrong, however. Paolo
On Tue, Nov 13, 2012 at 6:11 PM, Paolo Bonzini <pbonzini@redhat.com> wrote: >> > Il 05/11/2012 06:38, Liu Ping Fan ha scritto: >> > > From: Liu Ping Fan <pingfank@linux.vnet.ibm.com> >> > > >> > > If out of global lock, we will be challenged by SMP in low level, >> > > so need atomic ops. >> > > >> > > This file is a wrapper of GCC atomic builtin. >> > >> > I still object to this. >> > >> > I know it enforces type-safety, but it is incomplete. It doesn't >> >> Although it is incomplete, but the rest cases are rarely used. Linux >> faces such issue, and the "int" version is enough, so I think we can >> borrow experience from there. > > One of the two places that use __sync_* require 64-bit accesses. My Yes, these two places are not easy to fix. > RCU prototype required pointer-sized access, which you cannot make type- But I think that your RCU prototype should rely on atomic of CPU, not gcc‘s atomic. Otherwise, it could be slow (I guess something like spinlock there). Regards, pingfan > safe without C++ templates. > >> > provide neither atomic accesses to pointers, nor useful operations such >> > as exchange. It won't be used consistently, because in some places you >> > just do not have an Atomic value (see both current uses of __sync_* >> > builtins). >> > >> > If you can make it complete, and prove it by using it where >> > __sync_* is >> >> For current code, __sync_* is used rarely, I think except the >> barriers, only two places use it. We can substitute it easily. > > No, you cannot. See above, one doesn't use ints and the other is > guest state so migration becomes complicated if you use Atomic. I'm > happy to be proven wrong, however. > > Paolo
Il 14/11/2012 10:38, liu ping fan ha scritto: > On Tue, Nov 13, 2012 at 6:11 PM, Paolo Bonzini <pbonzini@redhat.com> wrote: >>>> Il 05/11/2012 06:38, Liu Ping Fan ha scritto: >>>>> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com> >>>>> >>>>> If out of global lock, we will be challenged by SMP in low level, >>>>> so need atomic ops. >>>>> >>>>> This file is a wrapper of GCC atomic builtin. >>>> >>>> I still object to this. >>>> >>>> I know it enforces type-safety, but it is incomplete. It doesn't >>> >>> Although it is incomplete, but the rest cases are rarely used. Linux >>> faces such issue, and the "int" version is enough, so I think we can >>> borrow experience from there. >> >> One of the two places that use __sync_* require 64-bit accesses. My > Yes, these two places are not easy to fix. Which shows that Linux's atomic_t is not suited for QEMU, in my opinion. >> RCU prototype required pointer-sized access, which you cannot make type- > But I think that your RCU prototype should rely on atomic of CPU, not > gcc‘s atomic. What's the difference? gcc's atomic produces the same instructions as hand-written assembly (or should). > Otherwise, it could be slow (I guess something like spinlock there). Not sure what you mean. I'm using two things: 1) write barriers; 2) atomic_xchg of a pointer for fast, wait-free enqueuing of call_rcu callbacks. Paolo
On Wed, Nov 14, 2012 at 5:47 PM, Paolo Bonzini <pbonzini@redhat.com> wrote: > Il 14/11/2012 10:38, liu ping fan ha scritto: >> On Tue, Nov 13, 2012 at 6:11 PM, Paolo Bonzini <pbonzini@redhat.com> wrote: >>>>> Il 05/11/2012 06:38, Liu Ping Fan ha scritto: >>>>>> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com> >>>>>> >>>>>> If out of global lock, we will be challenged by SMP in low level, >>>>>> so need atomic ops. >>>>>> >>>>>> This file is a wrapper of GCC atomic builtin. >>>>> >>>>> I still object to this. >>>>> >>>>> I know it enforces type-safety, but it is incomplete. It doesn't >>>> >>>> Although it is incomplete, but the rest cases are rarely used. Linux >>>> faces such issue, and the "int" version is enough, so I think we can >>>> borrow experience from there. >>> >>> One of the two places that use __sync_* require 64-bit accesses. My >> Yes, these two places are not easy to fix. > > Which shows that Linux's atomic_t is not suited for QEMU, in my opinion. > >>> RCU prototype required pointer-sized access, which you cannot make type- >> But I think that your RCU prototype should rely on atomic of CPU, not >> gcc‘s atomic. > > What's the difference? gcc's atomic produces the same instructions as > hand-written assembly (or should). > Probably I made a mistake here, in vhost, log = __sync_fetch_and_and(from, 0) is used to fetch 64bits atomically in the case 32bits qemu running on 64bits linux. Right? But how can we read 32bits twice in atomic? Seem that no instruction like "_lock xchg" for this ops. So I guess _sync_fetch_and_and() based on something like spinlock. And I think the broken issue is caused by vhost thread updates log, while qemu read out it not atomicly, Right? >> Otherwise, it could be slow (I guess something like spinlock there). > > Not sure what you mean. I'm using two things: 1) write barriers; 2) > atomic_xchg of a pointer for fast, wait-free enqueuing of call_rcu > callbacks. > Got your main idea. Will go through your prototype. Just one more question, why here atomic_xchg needed? Could not the sequence "read old pointer, set new pointer" satisfy your requirement? Regards, Pingfan > Paolo
Il 15/11/2012 08:47, liu ping fan ha scritto: >>>> RCU prototype required pointer-sized access, which you cannot make type- >>> But I think that your RCU prototype should rely on atomic of CPU, not >>> gcc‘s atomic. >> >> What's the difference? gcc's atomic produces the same instructions as >> hand-written assembly (or should). >> > Probably I made a mistake here, in vhost, log = > __sync_fetch_and_and(from, 0) is used to fetch 64bits atomically in > the case 32bits qemu running on 64bits linux. Right? But how can > we read 32bits twice in atomic? You can use cmpxchg8b. >>> Otherwise, it could be slow (I guess something like spinlock there). >> >> Not sure what you mean. I'm using two things: 1) write barriers; 2) >> atomic_xchg of a pointer for fast, wait-free enqueuing of call_rcu >> callbacks. > > Got your main idea. Will go through your prototype. Just one more > question, why here atomic_xchg needed? Could not the sequence "read > old pointer, set new pointer" satisfy your requirement? No, it would be racy. Say you have this: wrong right ----------------------------------------- ref(new) ref(new) old = tail old = new tail = new xchg(&old, &tail) old->next = new old->next = new up(&sem) up(&sem) where sem and tail are global, while old and new are local. There can be any number of producers as in the above scheme, and one consumer. In the consumer, iteration of the list starts from the second element (the first element is dummy): dummy = new Node; tail = dummy; for(;;) { down(&sem); node = dummy->next; unref(dummy); visit node; dummy = node; } Then the invariants are: - the number of elements N in the list (starting from the dummy node and following ->next) is always >1. Because of this, you never have to touch head when adding an element. - the number of excess signals in the semaphore sem is always <= N-1. Because of this, it doesn't matter that there is an instant in time where tail is not reachable from head. The element is really in the list (as far as the consumer goes) only after the semaphore is signaled. In the wrong version, two threads could read the same pointer: x = tail | y = tail | tail = new_x | tail = new_y | x->next = new_x | old_tail->next = new_x up(&sem) | y->next = new_y | old_tail->next = new_x up(&sem) | No node points to new_x, so you have 2 nodes reachable from the head (the dummy node, and new_y) with 2 signals on the semaphore, contradicting the second invariant. This instead works: x = new_x | y = new_y | xchg(&x, &tail) | tail = x, x = old_tail xchg(&y, &tail) | tail = y, y = x x->next = new_x | old_tail->next = new_x up(&sem) | y->next = new_y | x->next = new_y up(&sem) | because you have 3 nodes and 2 signals. Paolo
On 2012-11-14 23:47, liu ping fan wrote: > Probably I made a mistake here, in vhost, log = > __sync_fetch_and_and(from, 0) is used to fetch 64bits atomically in > the case 32bits qemu running on 64bits linux. Right? But how can > we read 32bits twice in atomic? Seem that no instruction like "_lock > xchg" for this ops. So I guess _sync_fetch_and_and() based on > something like spinlock. ... or for gcc 4.7 and later, log = __atomic_load_n(from, memory_model) For i386, we will not perform 2 32-bit reads of course. Paulo suggests using cmpxchg8b, but that's a tad slow. Instead we'll perform a 64-bit read into either the fpu or the sse units, and from there copy the data wherever it's needed. Such 64-bit aligned reads are guaranteed to be atomic for i586 (pentium) and later. For other 32-bit architectures other possibilities exist. Recent arm can use its ldrexd insn. Many of the 32-bit linux architectures have special kernel entry points or schemes to perform atomic operations. These are generally based on the assumption of a single-processor system, and are arranged to either disable interrupts or notice that no interrupt occurred, while executing a code region. As an ultimate fallback, yes we would use locks. But none of the host architectures that QEMU supports needs to do so. r~
On 11/15/2012 09:47 AM, liu ping fan wrote: > On Wed, Nov 14, 2012 at 5:47 PM, Paolo Bonzini <pbonzini@redhat.com> wrote: >> Il 14/11/2012 10:38, liu ping fan ha scritto: >>> On Tue, Nov 13, 2012 at 6:11 PM, Paolo Bonzini <pbonzini@redhat.com> wrote: >>>>>> Il 05/11/2012 06:38, Liu Ping Fan ha scritto: >>>>>>> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com> >>>>>>> >>>>>>> If out of global lock, we will be challenged by SMP in low level, >>>>>>> so need atomic ops. >>>>>>> >>>>>>> This file is a wrapper of GCC atomic builtin. >>>>>> >>>>>> I still object to this. >>>>>> >>>>>> I know it enforces type-safety, but it is incomplete. It doesn't >>>>> >>>>> Although it is incomplete, but the rest cases are rarely used. Linux >>>>> faces such issue, and the "int" version is enough, so I think we can >>>>> borrow experience from there. >>>> >>>> One of the two places that use __sync_* require 64-bit accesses. My >>> Yes, these two places are not easy to fix. >> >> Which shows that Linux's atomic_t is not suited for QEMU, in my opinion. >> >>>> RCU prototype required pointer-sized access, which you cannot make type- >>> But I think that your RCU prototype should rely on atomic of CPU, not >>> gcc‘s atomic. >> >> What's the difference? gcc's atomic produces the same instructions as >> hand-written assembly (or should). >> > Probably I made a mistake here, in vhost, log = > __sync_fetch_and_and(from, 0) is used to fetch 64bits atomically in > the case 32bits qemu running on 64bits linux. Right? But how can > we read 32bits twice in atomic? Seem that no instruction like "_lock > xchg" for this ops. So I guess _sync_fetch_and_and() based on > something like spinlock. > > And I think the broken issue is caused by vhost thread updates log, > while qemu read out it not atomicly, Right? For the log, 32-bit sync_fetch_and_and() is sufficient. We only need to ensure no bits are lost, we don't need 64-bit atomicity.
On Sun, Nov 18, 2012 at 6:04 PM, Avi Kivity <avi@redhat.com> wrote: > On 11/15/2012 09:47 AM, liu ping fan wrote: >> On Wed, Nov 14, 2012 at 5:47 PM, Paolo Bonzini <pbonzini@redhat.com> wrote: >>> Il 14/11/2012 10:38, liu ping fan ha scritto: >>>> On Tue, Nov 13, 2012 at 6:11 PM, Paolo Bonzini <pbonzini@redhat.com> wrote: >>>>>>> Il 05/11/2012 06:38, Liu Ping Fan ha scritto: >>>>>>>> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com> >>>>>>>> >>>>>>>> If out of global lock, we will be challenged by SMP in low level, >>>>>>>> so need atomic ops. >>>>>>>> >>>>>>>> This file is a wrapper of GCC atomic builtin. >>>>>>> >>>>>>> I still object to this. >>>>>>> >>>>>>> I know it enforces type-safety, but it is incomplete. It doesn't >>>>>> >>>>>> Although it is incomplete, but the rest cases are rarely used. Linux >>>>>> faces such issue, and the "int" version is enough, so I think we can >>>>>> borrow experience from there. >>>>> >>>>> One of the two places that use __sync_* require 64-bit accesses. My >>>> Yes, these two places are not easy to fix. >>> >>> Which shows that Linux's atomic_t is not suited for QEMU, in my opinion. >>> >>>>> RCU prototype required pointer-sized access, which you cannot make type- >>>> But I think that your RCU prototype should rely on atomic of CPU, not >>>> gcc‘s atomic. >>> >>> What's the difference? gcc's atomic produces the same instructions as >>> hand-written assembly (or should). >>> >> Probably I made a mistake here, in vhost, log = >> __sync_fetch_and_and(from, 0) is used to fetch 64bits atomically in >> the case 32bits qemu running on 64bits linux. Right? But how can >> we read 32bits twice in atomic? Seem that no instruction like "_lock >> xchg" for this ops. So I guess _sync_fetch_and_and() based on >> something like spinlock. >> >> And I think the broken issue is caused by vhost thread updates log, >> while qemu read out it not atomicly, Right? > > For the log, 32-bit sync_fetch_and_and() is sufficient. We only need to > ensure no bits are lost, we don't need 64-bit atomicity. > Read vhost related code, just find it is quite different from kvm log. And understand it. Thanks and regards, Pingfan > > > -- > error compiling committee.c: too many arguments to function
On Fri, Nov 16, 2012 at 8:03 AM, Richard Henderson <rth@twiddle.net> wrote: > On 2012-11-14 23:47, liu ping fan wrote: >> Probably I made a mistake here, in vhost, log = >> __sync_fetch_and_and(from, 0) is used to fetch 64bits atomically in >> the case 32bits qemu running on 64bits linux. Right? But how can >> we read 32bits twice in atomic? Seem that no instruction like "_lock >> xchg" for this ops. So I guess _sync_fetch_and_and() based on >> something like spinlock. > > ... or for gcc 4.7 and later, > > log = __atomic_load_n(from, memory_model) > > For i386, we will not perform 2 32-bit reads of course. Paulo suggests > using cmpxchg8b, but that's a tad slow. Instead we'll perform a 64-bit > read into either the fpu or the sse units, and from there copy the data > wherever it's needed. Such 64-bit aligned reads are guaranteed to be > atomic for i586 (pentium) and later. > > For other 32-bit architectures other possibilities exist. Recent arm can > use its ldrexd insn. Many of the 32-bit linux architectures have special > kernel entry points or schemes to perform atomic operations. These are > generally based on the assumption of a single-processor system, and are > arranged to either disable interrupts or notice that no interrupt occurred, > while executing a code region. > > As an ultimate fallback, yes we would use locks. But none of the host > architectures that QEMU supports needs to do so. > Very appreciate for you so detailed description. Regards, Pingfan > > r~
diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h new file mode 100644 index 0000000..a9e6d35 --- /dev/null +++ b/include/qemu/atomic.h @@ -0,0 +1,63 @@ +/* + * Simple wrapper of gcc Atomic-Builtins + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + * + */ +#ifndef __QEMU_ATOMIC_H +#define __QEMU_ATOMIC_H + +typedef struct Atomic { + volatile int counter; +} Atomic; + +static inline void atomic_set(Atomic *v, int i) +{ + v->counter = i; +} + +static inline int atomic_read(Atomic *v) +{ + return v->counter; +} + +static inline int atomic_return_and_add(int i, Atomic *v) +{ + int ret; + + ret = __sync_fetch_and_add(&v->counter, i); + return ret; +} + +static inline int atomic_return_and_sub(int i, Atomic *v) +{ + int ret; + + ret = __sync_fetch_and_sub(&v->counter, i); + return ret; +} + +/** + * * atomic_inc - increment atomic variable + * * @v: pointer of type Atomic + * * + * * Atomically increments @v by 1. + * */ +static inline void atomic_inc(Atomic *v) +{ + __sync_fetch_and_add(&v->counter, 1); +} + +/** + * * atomic_dec - decrement atomic variable + * * @v: pointer of type Atomic + * * + * * Atomically decrements @v by 1. + * */ +static inline void atomic_dec(Atomic *v) +{ + __sync_fetch_and_sub(&v->counter, 1); +} + +#endif