Avoid atomic for guard acquire when that is expensive

Message ID AM6PR03MB517068C1808B39346FDC2728E4FD0@AM6PR03MB5170.eurprd03.prod.outlook.com
State New
Series Avoid atomic for guard acquire when that is expensive

Commit Message

Bernd Edlinger Nov. 22, 2020, 8:05 a.m. UTC
Hi,

this avoids the need to use -fno-threadsafe-statics on
arm-none-eabi, or to work around that problem by supplying
a dummy __sync_synchronize function, which might
just lead to silent code failure of the worst kind
(non-reproducible, racy) at runtime, as was pointed out
in previous discussions here.

When the atomic access involves a call to __sync_synchronize,
it is better to call __cxa_guard_acquire unconditionally,
since it handles the atomics too, or is a non-threaded
implementation when there is no gthread support for this target.

This also fixes a bug for the ARM EABI big-endian target:
previously, the wrong bit was checked.
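
For context, here is a rough sketch of the guard protocol involved
(illustrative only -- the names __guard, widget and get_instance are
made up, while __cxa_guard_acquire/__cxa_guard_release and
__atomic_load_n are the real ABI entry points and GCC builtin):

struct widget { };

extern "C" int __cxa_guard_acquire (long long *);
extern "C" void __cxa_guard_release (long long *);

static long long __guard;          // 64-bit Itanium C++ ABI guard
static widget *instance_storage;   // stands in for the static object

widget &
get_instance ()
{
  // Fast path: acquire-load the guard byte and test the
  // "initialized" bit.  On targets where this atomic load expands
  // to a __sync_synchronize call, the patch drops this inline test
  // and calls __cxa_guard_acquire unconditionally.
  if (!(__atomic_load_n ((char *) &__guard, __ATOMIC_ACQUIRE) & 1))
    if (__cxa_guard_acquire (&__guard))    // slow path, may block
      {
        instance_storage = new widget;     // run the initializer
        __cxa_guard_release (&__guard);    // set the guard bit
      }
  return *instance_storage;
}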


Regression tested successfully on arm-none-eabi with newlib-3.3.0.

Is it OK for trunk?


Thanks
Bernd.

Comments

Jason Merrill Nov. 24, 2020, 10:10 p.m. UTC | #1
On 11/22/20 3:05 AM, Bernd Edlinger wrote:
> Hi,
> 
> this avoids the need to use -fno-threadsafe-statics on
> arm-none-eabi, or to work around that problem by supplying
> a dummy __sync_synchronize function, which might
> just lead to silent code failure of the worst kind
> (non-reproducible, racy) at runtime, as was pointed out
> in previous discussions here.
> 
> When the atomic access involves a call to __sync_synchronize,
> it is better to call __cxa_guard_acquire unconditionally,
> since it handles the atomics too, or is a non-threaded
> implementation when there is no gthread support for this target.
> 
> This also fixes a bug for the ARM EABI big-endian target:
> previously, the wrong bit was checked.

Instead of a new target macro, can't you follow 
fold_builtin_atomic_always_lock_free/can_atomic_load_p?

Jason
Bernd Edlinger Nov. 30, 2020, 8:08 p.m. UTC | #2
Hi,

I'd like to ping for this patch:

https://gcc.gnu.org/pipermail/gcc-patches/2020-November/559882.html

Thanks
Bernd.

On 11/22/20 9:05 AM, Bernd Edlinger wrote:
> Hi,
> 
> this avoids the need to use -fno-threadsafe-statics on
> arm-none-eabi, or to work around that problem by supplying
> a dummy __sync_synchronize function, which might
> just lead to silent code failure of the worst kind
> (non-reproducible, racy) at runtime, as was pointed out
> in previous discussions here.
> 
> When the atomic access involves a call to __sync_synchronize,
> it is better to call __cxa_guard_acquire unconditionally,
> since it handles the atomics too, or is a non-threaded
> implementation when there is no gthread support for this target.
> 
> This also fixes a bug for the ARM EABI big-endian target:
> previously, the wrong bit was checked.
> 
> 
> Regression tested successfully on arm-none-eabi with newlib-3.3.0.
> 
> Is it OK for trunk?
> 
> 
> Thanks
> Bernd.
>
Jason Merrill Nov. 30, 2020, 8:54 p.m. UTC | #3
On 11/30/20 3:08 PM, Bernd Edlinger wrote:
> Hi,
> 
> I'd like to ping for this patch:

I reviewed it on the 24th:

https://gcc.gnu.org/pipermail/gcc-patches/2020-November/560118.html

> 
> https://gcc.gnu.org/pipermail/gcc-patches/2020-November/559882.html
> 
> Thanks
> Bernd.
> 
> On 11/22/20 9:05 AM, Bernd Edlinger wrote:
>> Hi,
>>
>> this avoids the need to use -fno-threadsafe-statics on
>> arm-none-eabi, or to work around that problem by supplying
>> a dummy __sync_synchronize function, which might
>> just lead to silent code failure of the worst kind
>> (non-reproducible, racy) at runtime, as was pointed out
>> in previous discussions here.
>>
>> When the atomic access involves a call to __sync_synchronize,
>> it is better to call __cxa_guard_acquire unconditionally,
>> since it handles the atomics too, or is a non-threaded
>> implementation when there is no gthread support for this target.
>>
>> This also fixes a bug for the ARM EABI big-endian target:
>> previously, the wrong bit was checked.
>>
>>
>> Regression tested successfully on arm-none-eabi with newlib-3.3.0.
>>
>> Is it OK for trunk?
>>
>>
>> Thanks
>> Bernd.
>>
>
Bernd Edlinger Dec. 1, 2020, 6:28 p.m. UTC | #4
On 11/24/20 11:10 PM, Jason Merrill wrote:
> On 11/22/20 3:05 AM, Bernd Edlinger wrote:
>> Hi,
>>
>> this avoids the need to use -fno-threadsafe-statics on
>> arm-none-eabi, or to work around that problem by supplying
>> a dummy __sync_synchronize function, which might
>> just lead to silent code failure of the worst kind
>> (non-reproducible, racy) at runtime, as was pointed out
>> in previous discussions here.
>>
>> When the atomic access involves a call to __sync_synchronize,
>> it is better to call __cxa_guard_acquire unconditionally,
>> since it handles the atomics too, or is a non-threaded
>> implementation when there is no gthread support for this target.
>>
>> This also fixes a bug for the ARM EABI big-endian target:
>> previously, the wrong bit was checked.
> 
> Instead of a new target macro, can't you follow fold_builtin_atomic_always_lock_free/can_atomic_load_p?
> 

Yes, thanks, that should work too.
Would you like this better?


Thanks
Bernd.
Jason Merrill Dec. 2, 2020, 6:57 p.m. UTC | #5
On 12/1/20 1:28 PM, Bernd Edlinger wrote:
> On 11/24/20 11:10 PM, Jason Merrill wrote:
>> On 11/22/20 3:05 AM, Bernd Edlinger wrote:
>>> Hi,
>>>
>>> this avoids the need to use -fno-threadsafe-statics on
>>> arm-none-eabi, or to work around that problem by supplying
>>> a dummy __sync_synchronize function, which might
>>> just lead to silent code failure of the worst kind
>>> (non-reproducible, racy) at runtime, as was pointed out
>>> in previous discussions here.
>>>
>>> When the atomic access involves a call to __sync_synchronize,
>>> it is better to call __cxa_guard_acquire unconditionally,
>>> since it handles the atomics too, or is a non-threaded
>>> implementation when there is no gthread support for this target.
>>>
>>> This also fixes a bug for the ARM EABI big-endian target:
>>> previously, the wrong bit was checked.
>>
>> Instead of a new target macro, can't you follow fold_builtin_atomic_always_lock_free/can_atomic_load_p?
> 
> Yes, thanks, that should work too.
> Would you like this better?

> +is_atomic_expensive_p (machine_mode mode)
> +{
> +  if (!flag_inline_atomics)
> +    return false;

Why not true?

> +  if (!can_compare_and_swap_p (mode, false) || !can_atomic_load_p (mode))
> +    return false;

This also seems backwards; I'd think we want to return false if either 
of those tests is true.  Or maybe just can_atomic_load_p, and not 
bother about compare-and-swap.

> +      tree type = targetm.cxx.guard_mask_bit ()
> +		  ? TREE_TYPE (guard) : char_type_node;
> +
> +      if (is_atomic_expensive_p (TYPE_MODE (type)))
> +	guard = integer_zero_node;
> +      else
> +	guard = build_atomic_load_type (guard, MEMMODEL_ACQUIRE, type);

It should still work to load a single byte; it just needs to be the 
least-significant byte.  And this isn't an EABI issue; it looks like the 
non-EABI code is also broken for big-endian targets, both the atomic 
load and the normal load in get_guard_bits.

Jason
Bernd Edlinger Dec. 5, 2020, 12:37 p.m. UTC | #6
On 12/2/20 7:57 PM, Jason Merrill wrote:
> On 12/1/20 1:28 PM, Bernd Edlinger wrote:
>> On 11/24/20 11:10 PM, Jason Merrill wrote:
>>> On 11/22/20 3:05 AM, Bernd Edlinger wrote:
>>>> Hi,
>>>>
>>>> this avoids the need to use -fno-threadsafe-statics on
>>>> arm-none-eabi, or to work around that problem by supplying
>>>> a dummy __sync_synchronize function, which might
>>>> just lead to silent code failure of the worst kind
>>>> (non-reproducible, racy) at runtime, as was pointed out
>>>> in previous discussions here.
>>>>
>>>> When the atomic access involves a call to __sync_synchronize,
>>>> it is better to call __cxa_guard_acquire unconditionally,
>>>> since it handles the atomics too, or is a non-threaded
>>>> implementation when there is no gthread support for this target.
>>>>
>>>> This also fixes a bug for the ARM EABI big-endian target:
>>>> previously, the wrong bit was checked.
>>>
>>> Instead of a new target macro, can't you follow fold_builtin_atomic_always_lock_free/can_atomic_load_p?
>>
>> Yes, thanks, that should work too.
>> Would you like this better?
> 
>> +is_atomic_expensive_p (machine_mode mode)
>> +{
>> +  if (!flag_inline_atomics)
>> +    return false;
> 
> Why not true?
> 

Ooops...
Yes, I ought to return true here.
I must have made a mistake when I tested the last version of this patch,
sorry for the confusion.

>> +  if (!can_compare_and_swap_p (mode, false) || !can_atomic_load_p (mode))
>> +    return false;
> 
> This also seems backwards; I'd think we want to return false if either of those tests is true.  Or maybe just can_atomic_load_p, and not bother about compare-and-swap.
> 


Yes, you are right.
Unfortunately, can_atomic_load_p is too weak, since it does not cover
the memory barrier.

And can_compare_and_swap_p (..., false) is actually a bit too strong,
but if it returns true, we should be able to use any atomic without
the need for a library call.
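
Putting both corrections together, the helper would presumably end up
looking like this (a sketch in GCC's internal style; the attached
patch is authoritative):

/* Return true if an inline atomic load of MODE would be expensive,
   i.e. would involve a library call such as __sync_synchronize, in
   which case the guard check is better left entirely to
   __cxa_guard_acquire.  */

static bool
is_atomic_expensive_p (machine_mode mode)
{
  if (!flag_inline_atomics)
    return true;

  /* can_compare_and_swap_p (mode, false) is a bit stronger than
     strictly necessary, but if it holds, any atomic on MODE should
     work without a library call.  */
  if (!can_compare_and_swap_p (mode, false) || !can_atomic_load_p (mode))
    return true;

  return false;
}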

>> +      tree type = targetm.cxx.guard_mask_bit ()
>> +          ? TREE_TYPE (guard) : char_type_node;
>> +
>> +      if (is_atomic_expensive_p (TYPE_MODE (type)))
>> +    guard = integer_zero_node;
>> +      else
>> +    guard = build_atomic_load_type (guard, MEMMODEL_ACQUIRE, type);
> 
> It should still work to load a single byte; it just needs to be the least-significant byte.  And this isn't an EABI issue; it looks like the non-EABI code is also broken for big-endian targets, both the atomic load and the normal load in get_guard_bits.
> 

I think the non-EABI code always uses bit 0 in the first byte,
via the endian-neutral #define _GLIBCXX_GUARD_BIT __guard_test_bit (0, 1).

Only ARM EABI uses bit 0 in byte 3 if big-endian and bit 0 in byte 0 otherwise.
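
A tiny standalone C program (illustrative only, not part of the
patch) shows why the byte holding the ARM EABI guard bit moves with
endianness:

#include <stdio.h>
#include <string.h>

int
main (void)
{
  /* The ARM EABI guard is a 32-bit word whose least-significant bit
     is the "initialized" flag; which byte holds that bit depends on
     the byte order -- the source of the big-endian bug fixed here.  */
  unsigned int guard = 1;   /* guard word after initialization */
  unsigned char bytes[sizeof guard];
  memcpy (bytes, &guard, sizeof guard);
  for (unsigned int i = 0; i < sizeof guard; i++)
    printf ("byte %u: 0x%02x\n", i, bytes[i]);
  /* little-endian: byte 0 is 0x01; big-endian: byte 3 is 0x01.  */
  return 0;
}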

For all other targets, when _GLIBCXX_USE_FUTEX is defined,
__cxa_guard_XXX accesses the value as an int* while the memory
is a 64-bit long, so I could imagine that is an aliasing violation.


But nothing that needs to be fixed immediately.


Attached is the corrected patch.

Tested again on arm-none-eabi with arm-sim.
Is it OK for trunk?

Thanks
Bernd.

> Jason
>
Jason Merrill Dec. 7, 2020, 3:04 p.m. UTC | #7
On 12/5/20 7:37 AM, Bernd Edlinger wrote:
> On 12/2/20 7:57 PM, Jason Merrill wrote:
>> On 12/1/20 1:28 PM, Bernd Edlinger wrote:
>>> On 11/24/20 11:10 PM, Jason Merrill wrote:
>>>> On 11/22/20 3:05 AM, Bernd Edlinger wrote:
>>>>> Hi,
>>>>>
>>>>> this avoids the need to use -fno-threadsafe-statics on
>>>>> arm-none-eabi, or to work around that problem by supplying
>>>>> a dummy __sync_synchronize function, which might
>>>>> just lead to silent code failure of the worst kind
>>>>> (non-reproducible, racy) at runtime, as was pointed out
>>>>> in previous discussions here.
>>>>>
>>>>> When the atomic access involves a call to __sync_synchronize,
>>>>> it is better to call __cxa_guard_acquire unconditionally,
>>>>> since it handles the atomics too, or is a non-threaded
>>>>> implementation when there is no gthread support for this target.
>>>>>
>>>>> This also fixes a bug for the ARM EABI big-endian target:
>>>>> previously, the wrong bit was checked.
>>>>
>>>> Instead of a new target macro, can't you follow fold_builtin_atomic_always_lock_free/can_atomic_load_p?
>>>
>>> Yes, thanks, that should work too.
>>> Would you like this better?
>>
>>> +is_atomic_expensive_p (machine_mode mode)
>>> +{
>>> +  if (!flag_inline_atomics)
>>> +    return false;
>>
>> Why not true?
> 
> Ooops...
> Yes, I ought to return true here.
> I must have made a mistake when I tested the last version of this patch,
> sorry for the confusion.
> 
>>> +  if (!can_compare_and_swap_p (mode, false) || !can_atomic_load_p (mode))
>>> +    return false;
>>
>> This also seems backwards; I'd think we want to return false if either of those tests is true.  Or maybe just can_atomic_load_p, and not bother about compare-and-swap.
> 
> Yes, you are right.
> Unfortunately, can_atomic_load_p is too weak, since it does not cover
> the memory barrier.
> 
> And can_compare_and_swap_p (..., false) is actually a bit too strong,
> but if it returns true, we should be able to use any atomic without
> the need for a library call.

>>> +      tree type = targetm.cxx.guard_mask_bit ()
>>> +          ? TREE_TYPE (guard) : char_type_node;
>>> +
>>> +      if (is_atomic_expensive_p (TYPE_MODE (type)))
>>> +    guard = integer_zero_node;
>>> +      else
>>> +    guard = build_atomic_load_type (guard, MEMMODEL_ACQUIRE, type);
>>
>> It should still work to load a single byte; it just needs to be the least-significant byte.

I still don't think you need to load the whole word to check the guard bit.

> And this isn't an EABI issue; it looks like the non-EABI code is also broken for big-endian targets, both the atomic load and the normal load in get_guard_bits.
>>
> 
> I think the non-EABI code always uses bit 0 in the first byte,
> via the endian-neutral #define _GLIBCXX_GUARD_BIT __guard_test_bit (0, 1).

Except that set_guard sets the least-significant bit on all targets.

> Only ARM EABI uses bit 0 in byte 3 if big-endian and bit 0 in byte 0 otherwise.
> 
> For all other targets, when _GLIBCXX_USE_FUTEX is defined,
> __cxa_guard_XXX accesses the value as an int* while the memory
> is a 64-bit long, so I could imagine that is an aliasing violation.
> 
> 
> But nothing that needs to be fixed immediately.

Agreed.

> Attached is the corrected patch.
> 
> Tested again on arm-none-eabi with arm-sim.
> Is it OK for trunk?
> 
> Thanks
> Bernd.
> 
>> Jason
>>
Bernd Edlinger Dec. 7, 2020, 4:17 p.m. UTC | #8
On 12/7/20 4:04 PM, Jason Merrill wrote:
> On 12/5/20 7:37 AM, Bernd Edlinger wrote:
>> On 12/2/20 7:57 PM, Jason Merrill wrote:
>>> On 12/1/20 1:28 PM, Bernd Edlinger wrote:
>>>> +      tree type = targetm.cxx.guard_mask_bit ()
>>>> +          ? TREE_TYPE (guard) : char_type_node;
>>>> +
>>>> +      if (is_atomic_expensive_p (TYPE_MODE (type)))
>>>> +    guard = integer_zero_node;
>>>> +      else
>>>> +    guard = build_atomic_load_type (guard, MEMMODEL_ACQUIRE, type);
>>>
>>> It should still work to load a single byte; it just needs to be the least-significant byte.
> 
> I still don't think you need to load the whole word to check the guard bit.
> 

Of course that is also possible.  But I would not expect an
address offset and a byte access to be cheaper than a word access.

I just saw that get_guard_bits does the same thing:
access the first byte if !targetm.cxx.guard_mask_bit ()
and access the whole word otherwise, which is only
there for ARM EABI.

>> And this isn't an EABI issue; it looks like the non-EABI code is also broken for big-endian targets, both the atomic load and the normal load in get_guard_bits.
>>>
>>
>> I think the non-EABI code always uses bit 0 in the first byte,
>> via the endian-neutral #define _GLIBCXX_GUARD_BIT __guard_test_bit (0, 1).
> 
> Except that set_guard sets the least-significant bit on all targets.
> 

Hmm, as I said, get_guard_bits gets the guard as the first byte if !targetm.cxx.guard_mask_bit (),
and as a whole word otherwise.  Of course it could get the third byte,
if targetm.cxx.guard_mask_bit () && BYTES_BIG_ENDIAN, but it would be more complicated
this way, wouldn't it?
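
For reference, a simplified paraphrase (not the verbatim GCC source)
of the get_guard_bits logic being described:

/* Paraphrase of cp/decl2.c:get_guard_bits.  On generic targets only
   the first byte of the guard is significant, so reinterpret the
   guard as its first char; on ARM EABI (guard_mask_bit) hand back
   the whole word and let the caller mask off all but the low bit.  */

static tree
get_guard_bits (tree guard)
{
  if (!targetm.cxx.guard_mask_bit ())
    {
      guard = build1 (ADDR_EXPR,
                      build_pointer_type (TREE_TYPE (guard)), guard);
      guard = build1 (NOP_EXPR,
                      build_pointer_type (char_type_node), guard);
      guard = build1 (INDIRECT_REF, char_type_node, guard);
    }
  return guard;
}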


Bernd.

>> Only ARM EABI uses bit 0 in byte 3 if big-endian and bit 0 in byte 0 otherwise.
>>
>> For all other targets, when _GLIBCXX_USE_FUTEX is defined,
>> __cxa_guard_XXX accesses the value as an int* while the memory
>> is a 64-bit long, so I could imagine that is an aliasing violation.
>>
>>
>> But nothing that needs to be fixed immediately.
> 
> Agreed.
> 
>> Attached is the corrected patch.
>>
>> Tested again on arm-none-eabi with arm-sim.
>> Is it OK for trunk?
>>
>> Thanks
>> Bernd.
>>
>>> Jason
>>>
>
Jason Merrill Dec. 8, 2020, 7:50 p.m. UTC | #9
On 12/7/20 11:17 AM, Bernd Edlinger wrote:
> On 12/7/20 4:04 PM, Jason Merrill wrote:
>> On 12/5/20 7:37 AM, Bernd Edlinger wrote:
>>> On 12/2/20 7:57 PM, Jason Merrill wrote:
>>>> On 12/1/20 1:28 PM, Bernd Edlinger wrote:
>>>>> +      tree type = targetm.cxx.guard_mask_bit ()
>>>>> +          ? TREE_TYPE (guard) : char_type_node;
>>>>> +
>>>>> +      if (is_atomic_expensive_p (TYPE_MODE (type)))
>>>>> +    guard = integer_zero_node;
>>>>> +      else
>>>>> +    guard = build_atomic_load_type (guard, MEMMODEL_ACQUIRE, type);
>>>>
>>>> It should still work to load a single byte; it just needs to be the least-significant byte.
>>
>> I still don't think you need to load the whole word to check the guard bit.
> 
> Of course that is also possible.  But I would not expect an
> address offset and a byte access to be cheaper than a word access.

Fair point.

> I just saw that get_guard_bits does the same thing:
> access the first byte if !targetm.cxx.guard_mask_bit ()
> and access the whole word otherwise, which is only
> there for ARM EABI.

>>> And this isn't an EABI issue; it looks like the non-EABI code is also broken for big-endian targets, both the atomic load and the normal load in get_guard_bits.
>>>
>>> I think the non-EABI code always uses bit 0 in the first byte,
>>> via the endian-neutral #define _GLIBCXX_GUARD_BIT __guard_test_bit (0, 1).
>>
>> Except that set_guard sets the least-significant bit on all targets.
> 
> Hmm, as I said, get_guard_bits gets the guard as the first byte if !targetm.cxx.guard_mask_bit (),
> and as a whole word otherwise.  Of course it could get the third byte,
> if targetm.cxx.guard_mask_bit () && BYTES_BIG_ENDIAN, but it would be more complicated
> this way, wouldn't it?

Ah, yes, I was overlooking that set_guard uses get_guard_bits.

The patch is OK.

>>> Only ARM EABI uses bit 0 in byte 3 if big-endian and bit 0 in byte 0 otherwise.
>>>
>>> For all other targets, when _GLIBCXX_USE_FUTEX is defined,
>>> __cxa_guard_XXX accesses the value as an int* while the memory
>>> is a 64-bit long, so I could imagine that is an aliasing violation.
>>>
>>>
>>> But nothing that needs to be fixed immediately.
>>
>> Agreed.
>>
>>> Attached is the corrected patch.
>>>
>>> Tested again on arm-none-eabi with arm-sim.
>>> Is it OK for trunk?
>>>
>>> Thanks
>>> Bernd.
>>>
>>>> Jason
>>>>
>>
>

Patch

From 9fd070407408cf10789f5e9eb5ddda2fb3798e6f Mon Sep 17 00:00:00 2001
From: Bernd Edlinger <bernd.edlinger@hotmail.de>
Date: Sun, 22 Nov 2020 08:11:14 +0100
Subject: [PATCH] Avoid atomic for guard acquire when that is expensive

When the atomic access involves a call to __sync_synchronize,
it is better to call __cxa_guard_acquire unconditionally,
since it handles the atomics too, or is a non-threaded
implementation when there is no gthread support for this target.

This also fixes a bug for the ARM EABI big-endian target:
previously, the wrong bit was checked.

gcc:
2020-11-22  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* target.def (guard_atomic_expensive): New hook.
	* doc/tm.texi: Document TARGET_CXX_GUARD_ATOMIC_EXPENSIVE.
	* doc/tm.texi.in: Likewise.
	* config/arm/arm.c (arm_cxx_guard_atomic_expensive): New callback.

gcc/cp:
2020-11-22  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* decl2.c (build_atomic_load_byte): Rename to...
	(build_atomic_load_type): ... and add new parameter type.
	(get_guard_cond): Skip the atomic here if that is expensive.
	Use the correct type for the atomic load on certain targets.
---
 gcc/config/arm/arm.c | 13 +++++++++++++
 gcc/cp/decl2.c       | 12 ++++++++----
 gcc/doc/tm.texi      |  7 +++++++
 gcc/doc/tm.texi.in   |  2 ++
 gcc/target.def       | 10 ++++++++++
 5 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 04190b1..04ca1fe 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -235,6 +235,7 @@  static rtx arm_dwarf_register_span (rtx);
 
 static tree arm_cxx_guard_type (void);
 static bool arm_cxx_guard_mask_bit (void);
+static bool arm_cxx_guard_atomic_expensive (void);
 static tree arm_get_cookie_size (tree);
 static bool arm_cookie_has_size (void);
 static bool arm_cxx_cdtor_returns_this (void);
@@ -593,6 +594,9 @@  static const struct attribute_spec arm_attribute_table[] =
 #undef TARGET_CXX_GUARD_MASK_BIT
 #define TARGET_CXX_GUARD_MASK_BIT arm_cxx_guard_mask_bit
 
+#undef TARGET_CXX_GUARD_ATOMIC_EXPENSIVE
+#define TARGET_CXX_GUARD_ATOMIC_EXPENSIVE arm_cxx_guard_atomic_expensive
+
 #undef TARGET_CXX_GET_COOKIE_SIZE
 #define TARGET_CXX_GET_COOKIE_SIZE arm_get_cookie_size
 
@@ -28882,6 +28886,15 @@  arm_cxx_guard_mask_bit (void)
 }
 
 
+/* Return true if atomics involve a call to __sync_synchronize.  */
+
+static bool
+arm_cxx_guard_atomic_expensive (void)
+{
+  return TARGET_AAPCS_BASED && !TARGET_HAVE_DMB && !TARGET_HAVE_DMB_MCR;
+}
+
+
 /* The EABI specifies that all array cookies are 8 bytes long.  */
 
 static tree
diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c
index 1bc7b7e..e2b29a6 100644
--- a/gcc/cp/decl2.c
+++ b/gcc/cp/decl2.c
@@ -3300,15 +3300,15 @@  get_guard (tree decl)
 /* Return an atomic load of src with the appropriate memory model.  */
 
 static tree
-build_atomic_load_byte (tree src, HOST_WIDE_INT model)
+build_atomic_load_type (tree src, HOST_WIDE_INT model, tree type)
 {
-  tree ptr_type = build_pointer_type (char_type_node);
+  tree ptr_type = build_pointer_type (type);
   tree mem_model = build_int_cst (integer_type_node, model);
   tree t, addr, val;
   unsigned int size;
   int fncode;
 
-  size = tree_to_uhwi (TYPE_SIZE_UNIT (char_type_node));
+  size = tree_to_uhwi (TYPE_SIZE_UNIT (type));
 
   fncode = BUILT_IN_ATOMIC_LOAD_N + exact_log2 (size) + 1;
   t = builtin_decl_implicit ((enum built_in_function) fncode);
@@ -3350,8 +3350,12 @@  get_guard_cond (tree guard, bool thread_safe)
 
   if (!thread_safe)
     guard = get_guard_bits (guard);
+  else if (targetm.cxx.guard_atomic_expensive ())
+    guard = integer_zero_node;
   else
-    guard = build_atomic_load_byte (guard, MEMMODEL_ACQUIRE);
+    guard = build_atomic_load_type (guard, MEMMODEL_ACQUIRE,
+				    targetm.cxx.guard_mask_bit ()
+				    ? TREE_TYPE (guard) : char_type_node);
 
   /* Mask off all but the low bit.  */
   if (targetm.cxx.guard_mask_bit ())
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 2b88f78..92215cf 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -10728,6 +10728,13 @@  This hook determines how guard variables are used.  It should return
 @code{true} indicates that only the least significant bit should be used.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_CXX_GUARD_ATOMIC_EXPENSIVE (void)
+This hook determines if the guard atomic is expensive.  It should return
+@code{false} (the default) if the atomic is inexpensive.  A return value of
+@code{true} indicates that the atomic is expensive, i.e., involves a call to
+__sync_synchronize.  In this case, let __cxa_guard_acquire handle the atomics.
+@end deftypefn
+
 @deftypefn {Target Hook} tree TARGET_CXX_GET_COOKIE_SIZE (tree @var{type})
 This hook returns the size of the cookie to use when allocating an array
 whose elements have the indicated @var{type}.  Assumes that it is already
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 897f289..ce1d837 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -7321,6 +7321,8 @@  floating-point support; they are not included in this mechanism.
 
 @hook TARGET_CXX_GUARD_MASK_BIT
 
+@hook TARGET_CXX_GUARD_ATOMIC_EXPENSIVE
+
 @hook TARGET_CXX_GET_COOKIE_SIZE
 
 @hook TARGET_CXX_COOKIE_HAS_SIZE
diff --git a/gcc/target.def b/gcc/target.def
index 810d554..0c02d5c 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -6160,6 +6160,16 @@  DEFHOOK
  bool, (void),
  hook_bool_void_false)
 
+/* Return true if the guard atomic is expensive.  */
+DEFHOOK
+(guard_atomic_expensive,
+ "This hook determines if the guard atomic is expensive.  It should return\n\
+@code{false} (the default) if the atomic is inexpensive.  A return value of\n\
+@code{true} indicates that the atomic is expensive, i.e., involves a call to\n\
+__sync_synchronize.  In this case, let __cxa_guard_acquire handle the atomics.",
+ bool, (void),
+ hook_bool_void_false)
+
 /* Returns the size of the array cookie for an array of type.  */
 DEFHOOK
 (get_cookie_size,
-- 
1.9.1