diff mbox series

[RFC] Move to C11 Atomics

Message ID 20200921104107.134323-1-stefanha@redhat.com
State New
Headers show
Series [RFC] Move to C11 Atomics | expand

Commit Message

Stefan Hajnoczi Sept. 21, 2020, 10:41 a.m. UTC
This patch is incomplete but I am looking for feedback on the approach
before fully implementing it (which will involve lots of changes).

QEMU's atomic.h provides atomic operations and is intended to work with
or without <stdatomic.h>. Some of the atomic.h APIs are from C11 Atomics
while others are Linux-inspired or QEMU-specific extensions.

atomic.h works fine with gcc but clang enforces the following:

  atomic_fetch_add() and friends must use C11 Atomic atomic_* types.

  __atomic_fetch_add() and friends must use direct types (int, etc) and
  NOT C11 Atomic types.

The consequences are:

1. atomic_fetch_*() produces compilation errors since QEMU code uses
   direct types and not C11 Atomic types.

2. atomic_fetch_*() cannot be used on the same variables as
   __atomic_fetch_*() because they support different types. This is a
   problem because QEMU's atomic.h builds on __atomic_fetch_*() and code
   expects to use both atomic_fetch_*() and QEMU atomic.h APIs on the
   same variables.

I would like to move QEMU to C11 Atomics, removing QEMU-specific APIs
which have C11 equivalents. The new atomic.h would #include
<stdatomic.h> and define additional APIs on top. It also needs to carry
a <stdatomic.h> fallback implementation for RHEL 7 because gcc does not
have <stdatomic.h> there.

The upshot is that all atomic variables in QEMU need to use C11 Atomic
atomic_* types. This is a big change!

Also, existing atomic.h APIs that use __atomic_fetch_*() need to move to
C11 Atomics instead so that they take atomic_* types.

This patch contains a few examples of this conversion. Things to note:

1. Reimplement everything in terms of atomic_fetch_*() and other C11
   Atomic APIs. For example atomic_fetch_inc() becomes
   atomic_fetch_add(ptr, 1).

2. atomic_*_fetch() is not available in C11 Atomics so emulate it by
   performing the operation twice (once atomic, then again non-atomic
   using the fetched old atomic value). Yuck!

3. Can everything in atomic.h really be converted to C11 Atomics? I'm
   not sure yet :(.

Better ideas?

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/block/aio.h   |  2 +-
 include/qemu/atomic.h | 79 +++++++++++++++++++++++++++++++++----------
 include/qemu/bitops.h |  2 +-
 include/qom/object.h  |  3 +-
 util/aio-posix.h      |  2 +-
 util/async.c          |  2 +-
 meson.build           |  3 ++
 7 files changed, 70 insertions(+), 23 deletions(-)

Comments

Paolo Bonzini Sept. 21, 2020, 11:15 a.m. UTC | #1
On 21/09/20 12:41, Stefan Hajnoczi wrote:
> The upshot is that all atomic variables in QEMU need to use C11 Atomic
> atomic_* types. This is a big change!

The main issue with this is that C11 atomic types do too much magic,
including defaulting to seq-cst operations for loads and stores. As
documented in docs/devel/atomics.rst, seq-cst loads and stores are
almost never what you want (they're only a little below volatile :)):

1) in almost all cases where we do message passing between threads, we
can use store-release/load-acquire

2) when we want a full memory barrier, it's usually better to write the
whole construct by hand, to avoid issues like the ones we had with
->notify_me.

In addition, atomics are complicated enough that you almost always want
a sign that you are using them, even at the cost of heavier-looking
code.  For example "atomic_read" straight ahead tells you "this is
complicated but somebody has thought about it", while a naked access to
a global variable or a struct field tells you "check which mutex is held
when this function is called".  By allowing shortcuts for seq-cst loads
and stores, C11 atomics fool you into thinking that seq-cst accesses are
all you need, while in reality they create more problems than they solve. :(

So if we need to make a big change in the implementation of atomics,
what we should do is to wrap variables that are accessed atomically with
a struct.  This would enforce using atomic_read/atomic_set to access
them.  These structs could be types like stdatomic.h's atomic_char,
though probably we'd name them something like AtomicChar.

It would move us further from C11; however, it would also make it harder
to write buggy code.

If there is a problem of namespace pollution between qemu/atomic.h and
C11 atomics we could also fix that.

> 1. Reimplement everything in terms of atomic_fetch_*() and other C11
>    Atomic APIs. For example atomic_fetch_inc() becomes
>    atomic_fetch_add(ptr, 1).
> 
> 2. atomic_*_fetch() is not available in C11 Atomics so emulate it by
>    performing the operation twice (once atomic, then again non-atomic
>    using the fetched old atomic value). Yuck!

They're hidden in plain sight if you are okay with seq-cst operations
(which we almost always are for RMW operations, unlike loads and
stores): "x += 3" is an atomic_add_fetch(&x, 3).  However, they share
with loads and stores the problem of being too magic; if you see "x +=
3" you don't think of it as something that's thread safe.

Sorry for being so negative.  Unfortunately while the C11 memory model
is a great addition, in my not so humble opinion stdatomic.h was
designed without enough experience with atomics and with the memory
model itself (it even took people like Hans Boehm and Paul McKenney to
convince the committee that non-seq-cst atomics have a place).

Thanks,

Paolo

> 3. Can everything in atomic.h really be converted to C11 Atomics? I'm
>    not sure yet :(.
> 
> Better ideas?
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  include/block/aio.h   |  2 +-
>  include/qemu/atomic.h | 79 +++++++++++++++++++++++++++++++++----------
>  include/qemu/bitops.h |  2 +-
>  include/qom/object.h  |  3 +-
>  util/aio-posix.h      |  2 +-
>  util/async.c          |  2 +-
>  meson.build           |  3 ++
>  7 files changed, 70 insertions(+), 23 deletions(-)
> 
> diff --git a/include/block/aio.h b/include/block/aio.h
> index b2f703fa3f..466c058880 100644
> --- a/include/block/aio.h
> +++ b/include/block/aio.h
> @@ -220,7 +220,7 @@ struct AioContext {
>       */
>      QEMUTimerListGroup tlg;
>  
> -    int external_disable_cnt;
> +    atomic_int external_disable_cnt;
>  
>      /* Number of AioHandlers without .io_poll() */
>      int poll_disable_cnt;
> diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
> index ff72db5115..4fbbd5f362 100644
> --- a/include/qemu/atomic.h
> +++ b/include/qemu/atomic.h
> @@ -15,6 +15,32 @@
>  #ifndef QEMU_ATOMIC_H
>  #define QEMU_ATOMIC_H
>  
> +#include <stdbool.h>
> +#include <stddef.h>
> +
> +/* Use C11 Atomics if possible, otherwise fall back to custom definitions */
> +#ifdef CONFIG_STDATOMIC_H
> +#include <stdatomic.h>
> +#else
> +/* Commonly used types from C11 "7.17.6 Atomic integer types" */
> +typedef bool atomic_bool;
> +typedef char atomic_char;
> +typedef signed char atomic_schar;
> +typedef unsigned char atomic_uchar;
> +typedef short atomic_short;
> +typedef unsigned short atomic_ushort;
> +typedef int atomic_int;
> +typedef unsigned int atomic_uint;
> +typedef long atomic_long;
> +typedef unsigned long atomic_ulong;
> +typedef long long atomic_llong;
> +typedef unsigned long long atomic_ullong;
> +typedef intptr_t atomic_intptr_t;
> +typedef uintptr_t atomic_uintptr_t;
> +typedef size_t atomic_size_t;
> +typedef ptrdiff_t atomic_ptrdiff_t;
> +#endif
> +
>  /* Compiler barrier */
>  #define barrier()   ({ asm volatile("" ::: "memory"); (void)0; })
>  
> @@ -205,10 +231,6 @@
>      atomic_cmpxchg__nocheck(ptr, old, new);                             \
>  })
>  
> -/* Provide shorter names for GCC atomic builtins, return old value */
> -#define atomic_fetch_inc(ptr)  __atomic_fetch_add(ptr, 1, __ATOMIC_SEQ_CST)
> -#define atomic_fetch_dec(ptr)  __atomic_fetch_sub(ptr, 1, __ATOMIC_SEQ_CST)
> -
>  #ifndef atomic_fetch_add
>  #define atomic_fetch_add(ptr, n) __atomic_fetch_add(ptr, n, __ATOMIC_SEQ_CST)
>  #define atomic_fetch_sub(ptr, n) __atomic_fetch_sub(ptr, n, __ATOMIC_SEQ_CST)
> @@ -217,22 +239,41 @@
>  #define atomic_fetch_xor(ptr, n) __atomic_fetch_xor(ptr, n, __ATOMIC_SEQ_CST)
>  #endif
>  
> -#define atomic_inc_fetch(ptr)    __atomic_add_fetch(ptr, 1, __ATOMIC_SEQ_CST)
> -#define atomic_dec_fetch(ptr)    __atomic_sub_fetch(ptr, 1, __ATOMIC_SEQ_CST)
> -#define atomic_add_fetch(ptr, n) __atomic_add_fetch(ptr, n, __ATOMIC_SEQ_CST)
> -#define atomic_sub_fetch(ptr, n) __atomic_sub_fetch(ptr, n, __ATOMIC_SEQ_CST)
> -#define atomic_and_fetch(ptr, n) __atomic_and_fetch(ptr, n, __ATOMIC_SEQ_CST)
> -#define atomic_or_fetch(ptr, n)  __atomic_or_fetch(ptr, n, __ATOMIC_SEQ_CST)
> -#define atomic_xor_fetch(ptr, n) __atomic_xor_fetch(ptr, n, __ATOMIC_SEQ_CST)
> +/* Provide shorter names for GCC atomic builtins, return old value */
> +#define atomic_fetch_inc(ptr)  atomic_fetch_add(ptr, 1)
> +#define atomic_fetch_dec(ptr)  atomic_fetch_sub(ptr, 1)
> +
> +#define atomic_inc_fetch(ptr)    (atomic_fetch_add(ptr, 1) + 1)
> +#define atomic_dec_fetch(ptr)    (atomic_fetch_sub(ptr, 1) - 1)
> +#define atomic_add_fetch(ptr, n) ({ \
> +    typeof(n) _n = n; \
> +    atomic_fetch_add(ptr, _n) + _n; \
> +})
> +#define atomic_sub_fetch(ptr, n) ({ \
> +    typeof(n) _n = n; \
> +    atomic_fetch_sub(ptr, _n) - n; \
> +})
> +#define atomic_and_fetch(ptr, n) ({ \
> +    typeof(n) _n = n; \
> +    atomic_fetch_and(ptr, _n) & _n; \
> +})
> +#define atomic_or_fetch(ptr, n) ({ \
> +    typeof(n) _n = n; \
> +    atomic_fetch_or(ptr, _n) | _n; \
> +})
> +#define atomic_xor_fetch(ptr, n) ({ \
> +    typeof(n) _n = n; \
> +    atomic_fetch_xor(ptr, _n) ^ _n; \
> +})
>  
>  /* And even shorter names that return void.  */
> -#define atomic_inc(ptr)    ((void) __atomic_fetch_add(ptr, 1, __ATOMIC_SEQ_CST))
> -#define atomic_dec(ptr)    ((void) __atomic_fetch_sub(ptr, 1, __ATOMIC_SEQ_CST))
> -#define atomic_add(ptr, n) ((void) __atomic_fetch_add(ptr, n, __ATOMIC_SEQ_CST))
> -#define atomic_sub(ptr, n) ((void) __atomic_fetch_sub(ptr, n, __ATOMIC_SEQ_CST))
> -#define atomic_and(ptr, n) ((void) __atomic_fetch_and(ptr, n, __ATOMIC_SEQ_CST))
> -#define atomic_or(ptr, n)  ((void) __atomic_fetch_or(ptr, n, __ATOMIC_SEQ_CST))
> -#define atomic_xor(ptr, n) ((void) __atomic_fetch_xor(ptr, n, __ATOMIC_SEQ_CST))
> +#define atomic_inc(ptr)    ((void) atomic_fetch_add(ptr, 1))
> +#define atomic_dec(ptr)    ((void) atomic_fetch_sub(ptr, 1))
> +#define atomic_add(ptr, n) ((void) atomic_fetch_add(ptr, n))
> +#define atomic_sub(ptr, n) ((void) atomic_fetch_sub(ptr, n))
> +#define atomic_and(ptr, n) ((void) atomic_fetch_and(ptr, n))
> +#define atomic_or(ptr, n)  ((void) atomic_fetch_or(ptr, n))
> +#define atomic_xor(ptr, n) ((void) atomic_fetch_xor(ptr, n))
>  
>  #else /* __ATOMIC_RELAXED */
>  
> @@ -424,6 +465,8 @@
>  #define atomic_or(ptr, n)      ((void) __sync_fetch_and_or(ptr, n))
>  #define atomic_xor(ptr, n)     ((void) __sync_fetch_and_xor(ptr, n))
>  
> +#error TODO
> +
>  #endif /* __ATOMIC_RELAXED */
>  
>  #ifndef smp_wmb
> diff --git a/include/qemu/bitops.h b/include/qemu/bitops.h
> index f55ce8b320..e9d676d112 100644
> --- a/include/qemu/bitops.h
> +++ b/include/qemu/bitops.h
> @@ -49,7 +49,7 @@ static inline void set_bit(long nr, unsigned long *addr)
>  static inline void set_bit_atomic(long nr, unsigned long *addr)
>  {
>      unsigned long mask = BIT_MASK(nr);
> -    unsigned long *p = addr + BIT_WORD(nr);
> +    atomic_ulong *p = (atomic_ulong *)(addr + BIT_WORD(nr));
>  
>      atomic_or(p, mask);
>  }
> diff --git a/include/qom/object.h b/include/qom/object.h
> index 056f67ab3b..f51244b61f 100644
> --- a/include/qom/object.h
> +++ b/include/qom/object.h
> @@ -15,6 +15,7 @@
>  #define QEMU_OBJECT_H
>  
>  #include "qapi/qapi-builtin-types.h"
> +#include "qemu/atomic.h"
>  #include "qemu/module.h"
>  #include "qom/object.h"
>  
> @@ -550,7 +551,7 @@ struct Object
>      ObjectClass *class;
>      ObjectFree *free;
>      GHashTable *properties;
> -    uint32_t ref;
> +    atomic_uint ref;
>      Object *parent;
>  };
>  
> diff --git a/util/aio-posix.h b/util/aio-posix.h
> index c80c04506a..c5b446f0a1 100644
> --- a/util/aio-posix.h
> +++ b/util/aio-posix.h
> @@ -33,7 +33,7 @@ struct AioHandler {
>      QLIST_ENTRY(AioHandler) node_poll;
>  #ifdef CONFIG_LINUX_IO_URING
>      QSLIST_ENTRY(AioHandler) node_submitted;
> -    unsigned flags; /* see fdmon-io_uring.c */
> +    atomic_uint flags; /* see fdmon-io_uring.c */
>  #endif
>      int64_t poll_idle_timeout; /* when to stop userspace polling */
>      bool is_external;
> diff --git a/util/async.c b/util/async.c
> index 4266745dee..dcf1a32492 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -60,7 +60,7 @@ struct QEMUBH {
>      QEMUBHFunc *cb;
>      void *opaque;
>      QSLIST_ENTRY(QEMUBH) next;
> -    unsigned flags;
> +    atomic_uint flags;
>  };
>  
>  /* Called concurrently from any thread */
> diff --git a/meson.build b/meson.build
> index f4d1ab1096..8d80033d90 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -433,8 +433,11 @@ keyutils = dependency('libkeyutils', required: false,
>  
>  has_gettid = cc.has_function('gettid')
>  
> +has_stdatomic_h = cc.has_header('stdatomic.h')
> +
>  # Create config-host.h
>  
> +config_host_data.set('CONFIG_STDATOMIC_H', has_stdatomic_h)
>  config_host_data.set('CONFIG_SDL', sdl.found())
>  config_host_data.set('CONFIG_SDL_IMAGE', sdl_image.found())
>  config_host_data.set('CONFIG_VNC', vnc.found())
>
Stefan Hajnoczi Sept. 21, 2020, 1:44 p.m. UTC | #2
On Mon, Sep 21, 2020 at 01:15:30PM +0200, Paolo Bonzini wrote:
> On 21/09/20 12:41, Stefan Hajnoczi wrote:
> > The upshot is that all atomic variables in QEMU need to use C11 Atomic
> > atomic_* types. This is a big change!
> 
> The main issue with this is that C11 atomic types do too much magic,
> including defaulting to seq-cst operations for loads and stores. As
> documented in docs/devel/atomics.rst, seq-cst loads and stores are
> almost never what you want (they're only a little below volatile :)):

I can't find where atomics.rst says seq-cst operations are almost never
what you want?

I'm surprised that this isn't documented more prominently. Seq-cst
operations are the first type of atomic operation documented. It would
help to move them to the end of the document if they shouldn't be used.

> 1) in almost all cases where we do message passing between threads, we
> can use store-release/load-acquire

They don't provide atomic arithmetic/logic operations. The only
non-seq-cst ALU operation I see in atomic.h is
atomic_fetch_inc_nonzero(), and it's a cmpxchg() loop (ugly compared to
an atomic ALU instruction).

> 2) when we want a full memory barrier, it's usually better to write the
> whole construct by hand, to avoid issues like the ones we had with
> ->notify_me.
> 
> In addition, atomics are complicated enough that you almost always want
> a sign that you are using them, even at the cost of heavier-looking
> code.  For example "atomic_read" straight ahead tells you "this is
> complicated but somebody has thought about it", while a naked access to
> a global variable or a struct field tells you "check which mutex is held
> when this function is called".  By allowing shortcuts for seq-cst loads
> and stores, C11 atomics fool you into thinking that seq-cst accesses are
> all you need, while in reality they create more problems than they solve. :(

Good point. But it's easy to error out on 'atomic_fetch_add()' and
insist on 'atomic_fetch_add_explicit()' (where the user specifies the
memory order) in checkpatch.pl.

atomic_load_explicit()/atomic_store_explicit() can be used instead of
bare variable accesses. Although here I don't know how to enforce that
except via typedef struct { atomic_int i; } AtomicInt.

Thanks for suggesting the namespace cleanup I'll give that a try.

> > 1. Reimplement everything in terms of atomic_fetch_*() and other C11
> >    Atomic APIs. For example atomic_fetch_inc() becomes
> >    atomic_fetch_add(ptr, 1).
> > 
> > 2. atomic_*_fetch() is not available in C11 Atomics so emulate it by
> >    performing the operation twice (once atomic, then again non-atomic
> >    using the fetched old atomic value). Yuck!
> 
> They're hidden in plain sight if you are okay with seq-cst operations
> (which we almost always are for RMW operations, unlike loads and
> stores): "x += 3" is an atomic_add_fetch(&x, 3).  However, they share
> with loads and stores the problem of being too magic; if you see "x +=
> 3" you don't think of it as something that's thread safe.

Ah, I see!

Stefan
Paolo Bonzini Sept. 21, 2020, 2:16 p.m. UTC | #3
On 21/09/20 15:44, Stefan Hajnoczi wrote:
> On Mon, Sep 21, 2020 at 01:15:30PM +0200, Paolo Bonzini wrote:
>> On 21/09/20 12:41, Stefan Hajnoczi wrote:
>>> The upshot is that all atomic variables in QEMU need to use C11 Atomic
>>> atomic_* types. This is a big change!
>>
>> The main issue with this is that C11 atomic types do too much magic,
>> including defaulting to seq-cst operations for loads and stores. As
>> documented in docs/devel/atomics.rst, seq-cst loads and stores are
>> almost never what you want (they're only a little below volatile :)):
> 
> I can't find where atomics.rst says seq-cst operations are almost never
> what you want?

Note that I'm talking only about loads and stores.  They are so much
never what you want that we don't even provide a wrapper for them in
qemu/atomic.h.

  ``qemu/atomic.h`` also provides loads and stores that cannot be
  reordered with each other::

    typeof(*ptr) atomic_mb_read(ptr)
    void         atomic_mb_set(ptr, val)

  However these do not provide sequential consistency and, in
  particular, they do not participate in the total ordering enforced by
  sequentially-consistent operations.  For this reason they are
  deprecated.  They should instead be replaced with any of the following
  (ordered from easiest to hardest):

  - accesses inside a mutex or spinlock

  - lightweight synchronization primitives such as ``QemuEvent``

  - RCU operations (``atomic_rcu_read``, ``atomic_rcu_set``) when
  publishing or accessing a new version of a data structure

  - other atomic accesses: ``atomic_read`` and ``atomic_load_acquire``
  for loads, ``atomic_set`` and ``atomic_store_release`` for stores,
  ``smp_mb`` to forbid reordering subsequent loads before a store.

where seq-cst loads and stores are again completely missing.  smp_mb is
there to replace them, as we did in commit 5710a3e0 ("async: use
explicit memory barriers").

>> we can use store-release/load-acquire
> 
> They don't provide atomic arithmetic/logic operations. The only
> non-seq-cst ALU operation I see in atomic.h is
> atomic_fetch_inc_nonzero(), and it's a cmpxchg() loop (ugly compared to
> an atomic ALU instruction).

Seq-cst is fine for RMW operations (arithmetic/logic, xchg, cmpxchg),
also because they're usually less performance critical than loads and
stores.  It's only loads and stores that give a false sense of
correctness as in the above commit.

Paolo
Stefan Hajnoczi Sept. 21, 2020, 4:49 p.m. UTC | #4
On Mon, Sep 21, 2020 at 04:16:07PM +0200, Paolo Bonzini wrote:
> On 21/09/20 15:44, Stefan Hajnoczi wrote:
> > On Mon, Sep 21, 2020 at 01:15:30PM +0200, Paolo Bonzini wrote:
> >> On 21/09/20 12:41, Stefan Hajnoczi wrote:
> > They don't provide atomic arithmetic/logic operations. The only
> > non-seq-cst ALU operation I see in atomic.h is
> > atomic_fetch_inc_nonzero(), and it's a cmpxchg() loop (ugly compared to
> > an atomic ALU instruction).
> 
> Seq-cst is fine for RMW operations (arithmetic/logic, xchg, cmpxchg),
> also because they're usually less performance critical than loads and
> stores.  It's only loads and stores that give a false sense of
> correctness as in the above commit.

Okay. I've sent a patch to simply prefix atomic.h atomic_*() functions
with qemu_. That way they don't conflict with <stdatomic.h>.

Stefan
diff mbox series

Patch

diff --git a/include/block/aio.h b/include/block/aio.h
index b2f703fa3f..466c058880 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -220,7 +220,7 @@  struct AioContext {
      */
     QEMUTimerListGroup tlg;
 
-    int external_disable_cnt;
+    atomic_int external_disable_cnt;
 
     /* Number of AioHandlers without .io_poll() */
     int poll_disable_cnt;
diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
index ff72db5115..4fbbd5f362 100644
--- a/include/qemu/atomic.h
+++ b/include/qemu/atomic.h
@@ -15,6 +15,32 @@ 
 #ifndef QEMU_ATOMIC_H
 #define QEMU_ATOMIC_H
 
+#include <stdbool.h>
+#include <stddef.h>
+
+/* Use C11 Atomics if possible, otherwise fall back to custom definitions */
+#ifdef CONFIG_STDATOMIC_H
+#include <stdatomic.h>
+#else
+/* Commonly used types from C11 "7.17.6 Atomic integer types" */
+typedef bool atomic_bool;
+typedef char atomic_char;
+typedef signed char atomic_schar;
+typedef unsigned char atomic_uchar;
+typedef short atomic_short;
+typedef unsigned short atomic_ushort;
+typedef int atomic_int;
+typedef unsigned int atomic_uint;
+typedef long atomic_long;
+typedef unsigned long atomic_ulong;
+typedef long long atomic_llong;
+typedef unsigned long long atomic_ullong;
+typedef intptr_t atomic_intptr_t;
+typedef uintptr_t atomic_uintptr_t;
+typedef size_t atomic_size_t;
+typedef ptrdiff_t atomic_ptrdiff_t;
+#endif
+
 /* Compiler barrier */
 #define barrier()   ({ asm volatile("" ::: "memory"); (void)0; })
 
@@ -205,10 +231,6 @@ 
     atomic_cmpxchg__nocheck(ptr, old, new);                             \
 })
 
-/* Provide shorter names for GCC atomic builtins, return old value */
-#define atomic_fetch_inc(ptr)  __atomic_fetch_add(ptr, 1, __ATOMIC_SEQ_CST)
-#define atomic_fetch_dec(ptr)  __atomic_fetch_sub(ptr, 1, __ATOMIC_SEQ_CST)
-
 #ifndef atomic_fetch_add
 #define atomic_fetch_add(ptr, n) __atomic_fetch_add(ptr, n, __ATOMIC_SEQ_CST)
 #define atomic_fetch_sub(ptr, n) __atomic_fetch_sub(ptr, n, __ATOMIC_SEQ_CST)
@@ -217,22 +239,41 @@ 
 #define atomic_fetch_xor(ptr, n) __atomic_fetch_xor(ptr, n, __ATOMIC_SEQ_CST)
 #endif
 
-#define atomic_inc_fetch(ptr)    __atomic_add_fetch(ptr, 1, __ATOMIC_SEQ_CST)
-#define atomic_dec_fetch(ptr)    __atomic_sub_fetch(ptr, 1, __ATOMIC_SEQ_CST)
-#define atomic_add_fetch(ptr, n) __atomic_add_fetch(ptr, n, __ATOMIC_SEQ_CST)
-#define atomic_sub_fetch(ptr, n) __atomic_sub_fetch(ptr, n, __ATOMIC_SEQ_CST)
-#define atomic_and_fetch(ptr, n) __atomic_and_fetch(ptr, n, __ATOMIC_SEQ_CST)
-#define atomic_or_fetch(ptr, n)  __atomic_or_fetch(ptr, n, __ATOMIC_SEQ_CST)
-#define atomic_xor_fetch(ptr, n) __atomic_xor_fetch(ptr, n, __ATOMIC_SEQ_CST)
+/* Provide shorter names for GCC atomic builtins, return old value */
+#define atomic_fetch_inc(ptr)  atomic_fetch_add(ptr, 1)
+#define atomic_fetch_dec(ptr)  atomic_fetch_sub(ptr, 1)
+
+#define atomic_inc_fetch(ptr)    (atomic_fetch_add(ptr, 1) + 1)
+#define atomic_dec_fetch(ptr)    (atomic_fetch_sub(ptr, 1) - 1)
+#define atomic_add_fetch(ptr, n) ({ \
+    typeof(n) _n = n; \
+    atomic_fetch_add(ptr, _n) + _n; \
+})
+#define atomic_sub_fetch(ptr, n) ({ \
+    typeof(n) _n = n; \
+    atomic_fetch_sub(ptr, _n) - n; \
+})
+#define atomic_and_fetch(ptr, n) ({ \
+    typeof(n) _n = n; \
+    atomic_fetch_and(ptr, _n) & _n; \
+})
+#define atomic_or_fetch(ptr, n) ({ \
+    typeof(n) _n = n; \
+    atomic_fetch_or(ptr, _n) | _n; \
+})
+#define atomic_xor_fetch(ptr, n) ({ \
+    typeof(n) _n = n; \
+    atomic_fetch_xor(ptr, _n) ^ _n; \
+})
 
 /* And even shorter names that return void.  */
-#define atomic_inc(ptr)    ((void) __atomic_fetch_add(ptr, 1, __ATOMIC_SEQ_CST))
-#define atomic_dec(ptr)    ((void) __atomic_fetch_sub(ptr, 1, __ATOMIC_SEQ_CST))
-#define atomic_add(ptr, n) ((void) __atomic_fetch_add(ptr, n, __ATOMIC_SEQ_CST))
-#define atomic_sub(ptr, n) ((void) __atomic_fetch_sub(ptr, n, __ATOMIC_SEQ_CST))
-#define atomic_and(ptr, n) ((void) __atomic_fetch_and(ptr, n, __ATOMIC_SEQ_CST))
-#define atomic_or(ptr, n)  ((void) __atomic_fetch_or(ptr, n, __ATOMIC_SEQ_CST))
-#define atomic_xor(ptr, n) ((void) __atomic_fetch_xor(ptr, n, __ATOMIC_SEQ_CST))
+#define atomic_inc(ptr)    ((void) atomic_fetch_add(ptr, 1))
+#define atomic_dec(ptr)    ((void) atomic_fetch_sub(ptr, 1))
+#define atomic_add(ptr, n) ((void) atomic_fetch_add(ptr, n))
+#define atomic_sub(ptr, n) ((void) atomic_fetch_sub(ptr, n))
+#define atomic_and(ptr, n) ((void) atomic_fetch_and(ptr, n))
+#define atomic_or(ptr, n)  ((void) atomic_fetch_or(ptr, n))
+#define atomic_xor(ptr, n) ((void) atomic_fetch_xor(ptr, n))
 
 #else /* __ATOMIC_RELAXED */
 
@@ -424,6 +465,8 @@ 
 #define atomic_or(ptr, n)      ((void) __sync_fetch_and_or(ptr, n))
 #define atomic_xor(ptr, n)     ((void) __sync_fetch_and_xor(ptr, n))
 
+#error TODO
+
 #endif /* __ATOMIC_RELAXED */
 
 #ifndef smp_wmb
diff --git a/include/qemu/bitops.h b/include/qemu/bitops.h
index f55ce8b320..e9d676d112 100644
--- a/include/qemu/bitops.h
+++ b/include/qemu/bitops.h
@@ -49,7 +49,7 @@  static inline void set_bit(long nr, unsigned long *addr)
 static inline void set_bit_atomic(long nr, unsigned long *addr)
 {
     unsigned long mask = BIT_MASK(nr);
-    unsigned long *p = addr + BIT_WORD(nr);
+    atomic_ulong *p = (atomic_ulong *)(addr + BIT_WORD(nr));
 
     atomic_or(p, mask);
 }
diff --git a/include/qom/object.h b/include/qom/object.h
index 056f67ab3b..f51244b61f 100644
--- a/include/qom/object.h
+++ b/include/qom/object.h
@@ -15,6 +15,7 @@ 
 #define QEMU_OBJECT_H
 
 #include "qapi/qapi-builtin-types.h"
+#include "qemu/atomic.h"
 #include "qemu/module.h"
 #include "qom/object.h"
 
@@ -550,7 +551,7 @@  struct Object
     ObjectClass *class;
     ObjectFree *free;
     GHashTable *properties;
-    uint32_t ref;
+    atomic_uint ref;
     Object *parent;
 };
 
diff --git a/util/aio-posix.h b/util/aio-posix.h
index c80c04506a..c5b446f0a1 100644
--- a/util/aio-posix.h
+++ b/util/aio-posix.h
@@ -33,7 +33,7 @@  struct AioHandler {
     QLIST_ENTRY(AioHandler) node_poll;
 #ifdef CONFIG_LINUX_IO_URING
     QSLIST_ENTRY(AioHandler) node_submitted;
-    unsigned flags; /* see fdmon-io_uring.c */
+    atomic_uint flags; /* see fdmon-io_uring.c */
 #endif
     int64_t poll_idle_timeout; /* when to stop userspace polling */
     bool is_external;
diff --git a/util/async.c b/util/async.c
index 4266745dee..dcf1a32492 100644
--- a/util/async.c
+++ b/util/async.c
@@ -60,7 +60,7 @@  struct QEMUBH {
     QEMUBHFunc *cb;
     void *opaque;
     QSLIST_ENTRY(QEMUBH) next;
-    unsigned flags;
+    atomic_uint flags;
 };
 
 /* Called concurrently from any thread */
diff --git a/meson.build b/meson.build
index f4d1ab1096..8d80033d90 100644
--- a/meson.build
+++ b/meson.build
@@ -433,8 +433,11 @@  keyutils = dependency('libkeyutils', required: false,
 
 has_gettid = cc.has_function('gettid')
 
+has_stdatomic_h = cc.has_header('stdatomic.h')
+
 # Create config-host.h
 
+config_host_data.set('CONFIG_STDATOMIC_H', has_stdatomic_h)
 config_host_data.set('CONFIG_SDL', sdl.found())
 config_host_data.set('CONFIG_SDL_IMAGE', sdl_image.found())
 config_host_data.set('CONFIG_VNC', vnc.found())