
[tsan] Instrument atomics

Message ID 20121123140538.GX2315@tucnak.redhat.com
State New

Commit Message

Jakub Jelinek Nov. 23, 2012, 2:05 p.m. UTC
Hi!

This patch attempts to instrument the __atomic_* and __sync_* builtins.
Unfortunately, none of the builtins map 1:1 to the tsan replacements:
tsan uses a weird memory model encoding (instead of values from 0 to 5,
apparently 1 << 0 to 1 << 5, as if one could have more than one memory
model at the same time), so even for the easy cases GCC has to transform
them.  More importantly, there is no way to pass through
__ATOMIC_HLE_ACQUIRE/__ATOMIC_HLE_RELEASE.  Right now the patch just gives
up and doesn't replace anything if e.g.
__atomic_store_n (p, 0, __ATOMIC_HLE_RELEASE | __ATOMIC_RELEASE);
is seen.  Perhaps one could ignore the HLE bits, which are just an
optimization, but there is no way to find out in the generic code
whether such extra bits are merely an optimization or change the behavior
of the builtin.
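
For concreteness, the transformation amounts to something like the following
sketch (assuming the old 1 << n encoding; the helper name is made up and this
is not the patch's actual code):

/* Hypothetical helper, illustration only: map a GCC __ATOMIC_* constant
   (0..5) to the old tsan encoding (1 << 0 .. 1 << 5), and give up when
   HLE or other unknown bits are present.  Returns -1 for "don't
   instrument this call".  */
static int
map_memmodel_to_old_tsan_encoding (int model)
{
  if (model & ~0xffff)
    return -1;        /* __ATOMIC_HLE_* bits set: leave the builtin alone.  */
  if (model < 0 || model > 5)
    return -1;        /* Unknown or non-constant memory model.  */
  return 1 << model;  /* 0 (relaxed) .. 5 (seq_cst)  ->  1<<0 .. 1<<5.  */
}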

Another issue is that libtsan apparently internally uses just the deprecated
__sync_* builtins (or no builtins at all for load/store).  That works in some
cases (because __sync_* is usually equivalent to __atomic_* with the
__ATOMIC_SEQ_CST memory model), but for the load/store cases and for atomic
exchange it doesn't (the former provide no barrier at all, and the latter,
__sync_lock_test_and_set, is only an __ATOMIC_ACQUIRE barrier).
Can libtsan at least conditionally, when built with GCC 4.7 or later,
use the __atomic_* builtins instead of __sync_*?  One probably still has to
use the __ATOMIC_SEQ_CST model anyway, otherwise there would need to be a
switch based on the memory model, as the __atomic_* builtins only honor
anything weaker than sequential consistency when the memory model argument
is a constant.
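
Something like the following is what the suggestion amounts to; a rough sketch
only (the wrapper name is invented and this is not libtsan's actual code):

// Illustrative sketch: prefer the __atomic_* builtins when building with
// GCC 4.7+, falling back to a __sync_* based substitute otherwise.
// Always __ATOMIC_SEQ_CST, as discussed above.
template <typename T>
T tsan_internal_atomic_load(volatile T *a) {
#if defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 7))
  return __atomic_load_n(a, __ATOMIC_SEQ_CST);
#else
  // __sync_* has no plain load; a fetch_add of 0 gives a full-barrier read.
  return __sync_fetch_and_add(a, 0);
#endif
}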

Another issue is that there are no fetch-and-nand (or nand-and-fetch) tsan
entry points.  I could transform those into a loop using CAS, but that looks
like overkill; it would be better to change libtsan instead.
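
For reference, the CAS loop alluded to would look roughly like this (a sketch
only; as said, extending libtsan is the better option):

// Sketch: emulating fetch-and-nand with a compare-and-swap loop.
template <typename T>
T fetch_nand_via_cas(T *p, T v) {
  T old = __atomic_load_n(p, __ATOMIC_RELAXED);
  // On failure, 'old' is refreshed with the current value and we retry.
  while (!__atomic_compare_exchange_n(p, &old, (T) ~(old & v),
                                      /*weak=*/true,
                                      __ATOMIC_SEQ_CST, __ATOMIC_RELAXED))
    ;
  return old;
}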

The most important problem is that __tsan_atomic*_compare_exchange has just
a single memory model argument, while the __atomic_* builtins have two (one
for success, another one for failure).  Right now the patch just ignores
the failure memory model, but that might actually lead to not detecting
a race in a program when it should have been detected (note that the failure
memory model can't be stronger than the success memory model).  Would it be
possible to add another argument to those, and use the failure model instead
of the success model if the atomic operation fails?
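
To illustrate why this matters: on a failed compare-and-exchange only the
(typically weaker) failure ordering applies, so always applying the success
ordering can model an acquire that never happened and thus hide a real race:

// Example: success order is acquire, failure order is relaxed.  A race
// detector that applies the success order to the failing path as well
// would synchronize where the program did not.
bool try_acquire(int *flag, int *expected) {
  return __atomic_compare_exchange_n(flag, expected, 1, /*weak=*/false,
                                     __ATOMIC_ACQUIRE,   // success order
                                     __ATOMIC_RELAXED);  // failure order
}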

Apparently there are also no entry points for op-and-fetch (as opposed to
fetch-and-op), but the patch translates those into the corresponding
fetch-and-op libcalls, applying + val (etc.) to the result.
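
I.e. roughly the following rewrite (a sketch shown with the user-level
builtins for clarity, rather than the __tsan_atomic*_fetch_* entry points):

// Sketch of the op-and-fetch -> fetch-and-op rewrite the patch performs:
// __atomic_add_fetch (p, v, m) becomes the fetch_add call plus v.
int add_fetch_rewritten(int *p, int v) {
  return __atomic_fetch_add(p, v, __ATOMIC_SEQ_CST) + v;
}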

Oh, and there are no 16-byte atomics provided by libtsan; the library
would need to be built with -mcx16 on x86_64 for that and would need
additional entry points for unsigned __int128.  The patch doesn't touch
the 16-byte atomics and leaves them as is.

This patch is on top of the patch from yesterday which introduced use of
BUILT_IN_* functions in asan and builds them if the FE doesn't.

I've used the attached testcase to verify that it compiles, and have quickly
skimmed the changes it makes, but haven't actually tested it beyond that.

As for the local statics also mentioned in the PR, GCC handles those through
the __cxa_guard_{acquire,release,abort} library functions, so either libtsan
would need to intercept those calls, or we'd need to add some instrumentation
before and/or after them (what would be needed?).

And I'm also wondering about the string/memory builtins that asan
instruments: wouldn't it be useful to instrument those similarly for tsan
(say, also just the first byte accessed and the last one)?  I know you are
overriding most of them in libtsan, but some of them are often expanded
inline (e.g. memcpy, strlen, etc.), and then the libtsan replacements aren't
executed.  Or better yet, is there some libtsan entry point to report a
memory access range?
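
The kind of first/last-byte instrumentation meant here would be roughly the
following (illustration only; the declarations mirror the existing one-byte
tsan entry points):

// Sketch: instrument only the first and last byte touched by an
// inline-expanded memcpy, similar to what asan does for such builtins.
extern "C" void __tsan_read1 (void *addr);
extern "C" void __tsan_write1 (void *addr);

static void instrumented_memcpy(void *dst, const void *src, unsigned long n) {
  if (n != 0) {
    char *s = (char *) const_cast<void *> (src);
    char *d = (char *) dst;
    __tsan_read1 (s);           // first byte read
    __tsan_read1 (s + n - 1);   // last byte read
    __tsan_write1 (d);          // first byte written
    __tsan_write1 (d + n - 1);  // last byte written
  }
  __builtin_memcpy (dst, src, n);
}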

2012-11-23  Jakub Jelinek  <jakub@redhat.com>

	PR sanitizer/55439
	* Makefile.in (tsan.o): Depend on tree-ssa-propagate.h.
	* sanitizer.def: Add __tsan_atomic* builtins.
	* asan.c (initialize_sanitizer_builtins): Adjust to also
	initialize __tsan_atomic* builtins.
	* tsan.c: Include tree-ssa-propagate.h.
	(enum tsan_atomic_action): New enum.
	(tsan_atomic_table): New table.
	(instrument_builtin_call): New function.
	(instrument_gimple): Take pointer to gimple_stmt_iterator
	instead of gimple_stmt_iterator.  Call instrument_builtin_call
	on builtin call stmts.
	(instrument_memory_accesses): Adjust instrument_gimple caller.
	* builtin-types.def (BT_FN_BOOL_VPTR_PTR_I1_INT,
	BT_FN_BOOL_VPTR_PTR_I2_INT, BT_FN_BOOL_VPTR_PTR_I4_INT,
	BT_FN_BOOL_VPTR_PTR_I8_INT): New.


	Jakub
extern void bar (void *);

template <typename T>
__attribute__((noinline, noclone)) T
foo (T *p, bool var)
{
  T x = 0, y;
  x += __sync_fetch_and_add (p, 1); bar ((void *) p);
  x += __sync_fetch_and_sub (p, 2); bar ((void *) p);
  x += __sync_fetch_and_and (p, 7); bar ((void *) p);
  x += __sync_fetch_and_or (p, 1); bar ((void *) p);
  x += __sync_fetch_and_xor (p, 2); bar ((void *) p);
  x += __sync_fetch_and_nand (p, 4); bar ((void *) p);
  x += __sync_add_and_fetch (p, 1); bar ((void *) p);
  x += __sync_sub_and_fetch (p, 2); bar ((void *) p);
  x += __sync_and_and_fetch (p, 7); bar ((void *) p);
  x += __sync_or_and_fetch (p, 1); bar ((void *) p);
  x += __sync_xor_and_fetch (p, 2); bar ((void *) p);
  x += __sync_nand_and_fetch (p, 4); bar ((void *) p);
  x += __sync_bool_compare_and_swap (p, 7, 3); bar ((void *) p);
  x += __sync_val_compare_and_swap (p, 7, 3); bar ((void *) p);
  x += __sync_lock_test_and_set (p, 9); bar ((void *) p);
  __sync_lock_release (p); bar ((void *) p);
  __sync_synchronize (); bar ((void *) p);
  x += __atomic_exchange_n (p, 9, __ATOMIC_RELAXED); bar ((void *) p);
  x += __atomic_exchange_n (p, 9, __ATOMIC_SEQ_CST); bar ((void *) p);
  x += __atomic_load_n (p, __ATOMIC_RELAXED); bar ((void *) p);
  x += __atomic_load_n (p, __ATOMIC_SEQ_CST); bar ((void *) p);
  y = 6;
  x += __atomic_compare_exchange_n (p, &y, 8, true, __ATOMIC_RELAXED, __ATOMIC_RELAXED); bar ((void *) p);
  y = 6;
  x += __atomic_compare_exchange_n (p, &y, 8, false, __ATOMIC_RELAXED, __ATOMIC_RELAXED); bar ((void *) p);
  y = 6;
  x += __atomic_compare_exchange_n (p, &y, 8, var, __ATOMIC_RELAXED, __ATOMIC_RELAXED); bar ((void *) p);
  y = 6;
  x += __atomic_compare_exchange_n (p, &y, 8, true, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST); bar ((void *) p);
  y = 6;
  x += __atomic_compare_exchange_n (p, &y, 8, false, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST); bar ((void *) p);
  y = 6;
  x += __atomic_compare_exchange_n (p, &y, 8, var, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST); bar ((void *) p);
  __atomic_store_n (p, 9, __ATOMIC_RELAXED); bar ((void *) p);
  __atomic_store_n (p, 9, __ATOMIC_SEQ_CST); bar ((void *) p);
  x += __atomic_add_fetch (p, 1, __ATOMIC_RELAXED); bar ((void *) p);
  x += __atomic_add_fetch (p, 1, __ATOMIC_SEQ_CST); bar ((void *) p);
  x += __atomic_sub_fetch (p, 2, __ATOMIC_RELAXED); bar ((void *) p);
  x += __atomic_sub_fetch (p, 2, __ATOMIC_SEQ_CST); bar ((void *) p);
  x += __atomic_and_fetch (p, 7, __ATOMIC_RELAXED); bar ((void *) p);
  x += __atomic_and_fetch (p, 7, __ATOMIC_SEQ_CST); bar ((void *) p);
  x += __atomic_or_fetch (p, 1, __ATOMIC_RELAXED); bar ((void *) p);
  x += __atomic_or_fetch (p, 1, __ATOMIC_SEQ_CST); bar ((void *) p);
  x += __atomic_xor_fetch (p, 2, __ATOMIC_RELAXED); bar ((void *) p);
  x += __atomic_xor_fetch (p, 2, __ATOMIC_SEQ_CST); bar ((void *) p);
  x += __atomic_nand_fetch (p, 4, __ATOMIC_RELAXED); bar ((void *) p);
  x += __atomic_nand_fetch (p, 4, __ATOMIC_SEQ_CST); bar ((void *) p);
  x += __atomic_fetch_add (p, 1, __ATOMIC_RELAXED); bar ((void *) p);
  x += __atomic_fetch_add (p, 1, __ATOMIC_SEQ_CST); bar ((void *) p);
  x += __atomic_fetch_sub (p, 2, __ATOMIC_RELAXED); bar ((void *) p);
  x += __atomic_fetch_sub (p, 2, __ATOMIC_SEQ_CST); bar ((void *) p);
  x += __atomic_fetch_and (p, 7, __ATOMIC_RELAXED); bar ((void *) p);
  x += __atomic_fetch_and (p, 7, __ATOMIC_SEQ_CST); bar ((void *) p);
  x += __atomic_fetch_or (p, 1, __ATOMIC_RELAXED); bar ((void *) p);
  x += __atomic_fetch_or (p, 1, __ATOMIC_SEQ_CST); bar ((void *) p);
  x += __atomic_fetch_xor (p, 2, __ATOMIC_RELAXED); bar ((void *) p);
  x += __atomic_fetch_xor (p, 2, __ATOMIC_SEQ_CST); bar ((void *) p);
  x += __atomic_fetch_nand (p, 4, __ATOMIC_RELAXED); bar ((void *) p);
  x += __atomic_fetch_nand (p, 4, __ATOMIC_SEQ_CST); bar ((void *) p);
  __atomic_thread_fence (__ATOMIC_RELAXED); bar ((void *) p);
  __atomic_thread_fence (__ATOMIC_SEQ_CST); bar ((void *) p);
  __atomic_signal_fence (__ATOMIC_RELAXED); bar ((void *) p);
  __atomic_signal_fence (__ATOMIC_SEQ_CST); bar ((void *) p);
  return x;
}

void
baz (bool var)
{
  unsigned char c = 0;
  unsigned short s = 0;
  unsigned int i = 0;
  unsigned long l = 0;
  unsigned __int128 q = 0;
  foo (&c, var);
  foo (&s, var);
  foo (&i, var);
  foo (&l, var);
  foo (&q, var);
}

Comments

Jakub Jelinek Nov. 23, 2012, 4:39 p.m. UTC | #1
On Fri, Nov 23, 2012 at 08:10:39PM +0400, Dmitry Vyukov wrote:
> > This patch attempts to instrument __atomic_* and __sync_* builtins.
> > Unfortunately for none of the builtins there is 1:1 mapping to the tsan
> > replacements, tsan uses weirdo memory model encoding (instead of values
> > from 0 to 5 apparently 1 << 0 to 1 << 5, as if one could have more than
> > one memory model at the same time), so even for the easy cases GCC
> > has to transform them.
> 
> 
> gcc must be using old version of the library. I've switched to ABI
> constants some time ago.

Ah, it was just me looking at llvm compiler-rt tsan checkout from a few days
ago.  Guess I'll need to update the patch.  So, it now expects 0 for relaxed
up to 5 for sequentially consistent?

> > More importantly, there is no way to pass through
> > __ATOMIC_HLE_ACQUIRE/__ATOMIC_HLE_RELEASE.  Right now the patch just gives
> > up and doesn't replace anything if e.g.
> > __atomic_store_n (p, 0, __ATOMIC_HLE_RELEASE | __ATOMIC_RELEASE);
> > is seen, perhaps one could ignore the hle bits which are just an
> > optimization, but there is no way to find out in the generic code
> > whether the extra bits are just an optimization or change the behavior
> > of the builtin.
> >
> 
> Do you mean hardware lock elission? oh, boy

Yes.

> It's not just "an atomic". It should be legal to downgrade them to plain
> atomic ops (however, I am not sure what memory order they must have... is
> it possible to have HLE_ACQUIRE before seq_cst atomic rmw?). And I think
> that's what we need to do.

Perhaps if there weren't the compatibility hack (or whatever that 100500
comparison is), or if it could be tweaked, we could pass the HLE bits through
too and let the library decide what to do with them.

If the HLE bits are set, the low-order bits (model & 65535) contain the
normal memory model, i.e. 0 (relaxed) through 5 (seq_cst), and either 65536
(HLE acquire) or 131072 (HLE release) is ORed with that.
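
So on the library side the argument could be split apart like this (just a
sketch under the encoding described above; the helper name is made up):

// Sketch: decode the combined memory model argument described above.
static void decode_memmodel(int arg, int *model, bool *hle_acq, bool *hle_rel) {
  *model = arg & 65535;            // 0 (relaxed) .. 5 (seq_cst)
  *hle_acq = (arg & 65536) != 0;   // __ATOMIC_HLE_ACQUIRE
  *hle_rel = (arg & 131072) != 0;  // __ATOMIC_HLE_RELEASE
}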

> Well, it's a dirty implementation that relies on x86 memory model (and no
> compiler optimizations, well, apparently there are data races :)).
> I think eventually I will just replace them with mutexes.

I don't see why mutexes would be better than just plain __atomic_* builtins.
With mutexes there wouldn't be any atomicity...

> I've just committed a patch to llvm with failure_memory_order (r168518).

Ok, I'll adjust the patch to pass both memory models through then.

> Yeah, I think it's better to transform them to a more standard ones (llvm
> also has own weird atomic ops and own memory orders).

Ok, no change on the GCC side then needed for that beyond what I posted.

> > Oh, and there are no 16-byte atomics provided by libtsan, the library
> > would need to be built with -mcx16 on x86_64 for that and have the
> > additional entry points for unsigned __int128.  The patch doesn't touch
> > the 16-byte atomics, leaves them as is.
> 
> I think that's fine for now.

Perhaps.  Note that such atomics are used also e.g. for #pragma omp atomic
on long double etc.

> That's what llvm does as well. But it inserts a fast path before
> __cxa_guard_acquire -- acquire-load of the lock word. Doesn't gcc do the
> same?
> tsan intercepts __cxa_guard functions.

Yes, except it isn't __atomic_load_*, but plain memory read.
  _3 = MEM[(char *)&_ZGVZ3foovE1a];
  if (_3 == 0)
    goto <bb 3>;
  else
    goto <bb 8>;

  <bb 8>:
  fast path, whatever;

  <bb 3>:
  _5 = __cxa_guard_acquire (&_ZGVZ3foovE1a);
  ...

So, right now tsan would just instrument it as __tsan_read1 from
&_ZGVZ3foovE1a rather than any atomic load.
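
(For reference, the GIMPLE above comes from nothing more than a function-local
static with a non-trivial constructor; _ZGVZ3foovE1a is the guard variable
for foo()::a.)

struct A { A (); };   // non-trivial constructor forces a guarded static

void
foo ()
{
  static A a;         // guard variable: _ZGVZ3foovE1a
}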

> Well, yes, the compiler module must pass to the runtime all memory
> accesses, whatever form they have in compiler internal representation.
> Yes, I think I need to provide range memory access functions in runtime. I
> already have this issue for Go language, there are a lot of range accesses
> due to arrays and slices.
> I will add them next week.

Ok.  A slight problem then is that where the tsan pass sits right now, there
is no easy way to find out whether a builtin call will be expanded inline or
not, so (similarly for asan), if we instrument builtins in the pass, a call
might be instrumented twice at runtime when the builtin is expanded as a
library call (once by the added instrumentation for the builtin, once in the
intercepted library call).  That isn't wrong, it just might need slightly
more resources than if we ensured we only instrument a builtin when it isn't
expanded inline.

	Jakub
Xinliang David Li Nov. 23, 2012, 5:07 p.m. UTC | #2
On Fri, Nov 23, 2012 at 8:39 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Fri, Nov 23, 2012 at 08:10:39PM +0400, Dmitry Vyukov wrote:
>> > This patch attempts to instrument __atomic_* and __sync_* builtins.
>> > Unfortunately for none of the builtins there is 1:1 mapping to the tsan
>> > replacements, tsan uses weirdo memory model encoding (instead of values
>> > from 0 to 5 apparently 1 << 0 to 1 << 5, as if one could have more than
>> > one memory model at the same time), so even for the easy cases GCC
>> > has to transform them.
>>
>>
>> gcc must be using old version of the library. I've switched to ABI
>> constants some time ago.
>
> Ah, it was just me looking at llvm compiler-rt tsan checkout from a few days
> ago.  Guess I'll need to update the patch.  So, it now expects 0 for relaxed
> up to 5 for sequentially consistent?
>
>> > More importantly, there is no way to pass through
>> > __ATOMIC_HLE_ACQUIRE/__ATOMIC_HLE_RELEASE.  Right now the patch just gives
>> > up and doesn't replace anything if e.g.
>> > __atomic_store_n (p, 0, __ATOMIC_HLE_RELEASE | __ATOMIC_RELEASE);
>> > is seen, perhaps one could ignore the hle bits which are just an
>> > optimization, but there is no way to find out in the generic code
>> > whether the extra bits are just an optimization or change the behavior
>> > of the builtin.
>> >
>>
>> Do you mean hardware lock elission? oh, boy
>
> Yes.
>
>> It's not just "an atomic". It should be legal to downgrade them to plain
>> atomic ops (however, I am not sure what memory order they must have... is
>> it possible to have HLE_ACQUIRE before seq_cst atomic rmw?). And I think
>> that's what we need to do.
>
> Perhaps if there wasn't the compatibility hack or what is that 100500
> comparison, or if it could be tweaked, we could pass through the HLE bits
> too and let the library decide what to do with it.
>
> If HLE bits are set, the low order bits (model & 65535) contains the
> normal memory model, i.e. 0 (relaxed) through 5 (seq_cst), and either 65536
> (hle acquire) or 131072 (hle release) is ored with that.
>
>> Well, it's a dirty implementation that relies on x86 memory model (and no
>> compiler optimizations, well, apparently there are data races :)).
>> I think eventually I will just replace them with mutexes.
>
> I don't see why mutexes would be better than just plain __atomic_* builtins.
> With mutexes there wouldn't be any atomicity...
>
>> I've just committed a patch to llvm with failure_memory_order (r168518).
>
> Ok, I'll adjust the patch to pass both memory models through then.
>
>> Yeah, I think it's better to transform them to a more standard ones (llvm
>> also has own weird atomic ops and own memory orders).
>
> Ok, no change on the GCC side then needed for that beyond what I posted.
>
>> > Oh, and there are no 16-byte atomics provided by libtsan, the library
>> > would need to be built with -mcx16 on x86_64 for that and have the
>> > additional entry points for unsigned __int128.  The patch doesn't touch
>> > the 16-byte atomics, leaves them as is.
>>
>> I think that's fine for now.
>
> Perhaps.  Note that such atomics are used also e.g. for #pragma omp atomic
> on long double etc.
>
>> That's what llvm does as well. But it inserts a fast path before
>> __cxa_guard_acquire -- acquire-load of the lock word. Doesn't gcc do the
>> same?
>> tsan intercepts __cxa_guard functions.
>
> Yes, except it isn't __atomic_load_*, but plain memory read.
>   _3 = MEM[(char *)&_ZGVZ3foovE1a];
>   if (_3 == 0)
>     goto <bb 3>;
>   else
>     goto <bb 8>;
>
>   <bb 8>:
>   fast path, whatever;
>
>   <bb 3>:
>   _5 = __cxa_guard_acquire (&_ZGVZ3foovE1a);
>   ...
>
> So, right now tsan would just instrument it as __tsan_read1 from
> &_ZGVZ3foovE1a rather than any atomic load.
>
>> Well, yes, the compiler module must pass to the runtime all memory
>> accesses, whatever form they have in compiler internal representation.
>> Yes, I think I need to provide range memory access functions in runtime. I
>> already have this issue for Go language, there are a lot of range accesses
>> due to arrays and slices.
>> I will add them next week.
>
> Ok.  A slight problem then is that where the tsan pass sits right now, there
> is no easy way to find out if the builtin call will be expanded inline or
> not, so (similar for asan), if we instrument them in the pass, it might be
> instrumented twice at runtime if the builtin is expanded as a library call
> (once the added instrumentation for the builtin, once in the intercepted
> library call).  That isn't wrong, just might need slightly more resources
> than if we ensured we only instrument the builtin if it isn't expanded
> inline.
>

Should inlining of those functions be disabled, as if -fno-builtin were specified?

David



>         Jakub
Dmitry Vyukov Nov. 24, 2012, 8:58 a.m. UTC | #3
On Fri, Nov 23, 2012 at 9:07 PM, Xinliang David Li <davidxl@google.com> wrote:
> On Fri, Nov 23, 2012 at 8:39 AM, Jakub Jelinek <jakub@redhat.com> wrote:
>> On Fri, Nov 23, 2012 at 08:10:39PM +0400, Dmitry Vyukov wrote:
>>> > This patch attempts to instrument __atomic_* and __sync_* builtins.
>>> > Unfortunately for none of the builtins there is 1:1 mapping to the tsan
>>> > replacements, tsan uses weirdo memory model encoding (instead of values
>>> > from 0 to 5 apparently 1 << 0 to 1 << 5, as if one could have more than
>>> > one memory model at the same time), so even for the easy cases GCC
>>> > has to transform them.
>>>
>>>
>>> gcc must be using old version of the library. I've switched to ABI
>>> constants some time ago.
>>
>> Ah, it was just me looking at llvm compiler-rt tsan checkout from a few days
>> ago.  Guess I'll need to update the patch.  So, it now expects 0 for relaxed
>> up to 5 for sequentially consistent?
>>
>>> > More importantly, there is no way to pass through
>>> > __ATOMIC_HLE_ACQUIRE/__ATOMIC_HLE_RELEASE.  Right now the patch just gives
>>> > up and doesn't replace anything if e.g.
>>> > __atomic_store_n (p, 0, __ATOMIC_HLE_RELEASE | __ATOMIC_RELEASE);
>>> > is seen, perhaps one could ignore the hle bits which are just an
>>> > optimization, but there is no way to find out in the generic code
>>> > whether the extra bits are just an optimization or change the behavior
>>> > of the builtin.
>>> >
>>>
>>> Do you mean hardware lock elission? oh, boy
>>
>> Yes.
>>
>>> It's not just "an atomic". It should be legal to downgrade them to plain
>>> atomic ops (however, I am not sure what memory order they must have... is
>>> it possible to have HLE_ACQUIRE before seq_cst atomic rmw?). And I think
>>> that's what we need to do.
>>
>> Perhaps if there wasn't the compatibility hack or what is that 100500
>> comparison, or if it could be tweaked, we could pass through the HLE bits
>> too and let the library decide what to do with it.
>>
>> If HLE bits are set, the low order bits (model & 65535) contains the
>> normal memory model, i.e. 0 (relaxed) through 5 (seq_cst), and either 65536
>> (hle acquire) or 131072 (hle release) is ored with that.
>>
>>> Well, it's a dirty implementation that relies on x86 memory model (and no
>>> compiler optimizations, well, apparently there are data races :)).
>>> I think eventually I will just replace them with mutexes.
>>
>> I don't see why mutexes would be better than just plain __atomic_* builtins.
>> With mutexes there wouldn't be any atomicity...
>>
>>> I've just committed a patch to llvm with failure_memory_order (r168518).
>>
>> Ok, I'll adjust the patch to pass both memory models through then.
>>
>>> Yeah, I think it's better to transform them to a more standard ones (llvm
>>> also has own weird atomic ops and own memory orders).
>>
>> Ok, no change on the GCC side then needed for that beyond what I posted.
>>
>>> > Oh, and there are no 16-byte atomics provided by libtsan, the library
>>> > would need to be built with -mcx16 on x86_64 for that and have the
>>> > additional entry points for unsigned __int128.  The patch doesn't touch
>>> > the 16-byte atomics, leaves them as is.
>>>
>>> I think that's fine for now.
>>
>> Perhaps.  Note that such atomics are used also e.g. for #pragma omp atomic
>> on long double etc.
>>
>>> That's what llvm does as well. But it inserts a fast path before
>>> __cxa_guard_acquire -- acquire-load of the lock word. Doesn't gcc do the
>>> same?
>>> tsan intercepts __cxa_guard functions.
>>
>> Yes, except it isn't __atomic_load_*, but plain memory read.
>>   _3 = MEM[(char *)&_ZGVZ3foovE1a];
>>   if (_3 == 0)
>>     goto <bb 3>;
>>   else
>>     goto <bb 8>;
>>
>>   <bb 8>:
>>   fast path, whatever;
>>
>>   <bb 3>:
>>   _5 = __cxa_guard_acquire (&_ZGVZ3foovE1a);
>>   ...
>>
>> So, right now tsan would just instrument it as __tsan_read1 from
>> &_ZGVZ3foovE1a rather than any atomic load.
>>
>>> Well, yes, the compiler module must pass to the runtime all memory
>>> accesses, whatever form they have in compiler internal representation.
>>> Yes, I think I need to provide range memory access functions in runtime. I
>>> already have this issue for Go language, there are a lot of range accesses
>>> due to arrays and slices.
>>> I will add them next week.
>>
>> Ok.  A slight problem then is that where the tsan pass sits right now, there
>> is no easy way to find out if the builtin call will be expanded inline or
>> not, so (similar for asan), if we instrument them in the pass, it might be
>> instrumented twice at runtime if the builtin is expanded as a library call
>> (once the added instrumentation for the builtin, once in the intercepted
>> library call).  That isn't wrong, just might need slightly more resources
>> than if we ensured we only instrument the builtin if it isn't expanded
>> inline.
>>
>
> Should inlining of those functions be disabled as if -fno-builtins is specified?

Yes, it sounds reasonable. Performance characteristics under tsan
differ significantly, so most likely we don't care.
Dmitry Vyukov Nov. 24, 2012, 11:06 a.m. UTC | #4
On Fri, Nov 23, 2012 at 8:39 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Fri, Nov 23, 2012 at 08:10:39PM +0400, Dmitry Vyukov wrote:
>> > This patch attempts to instrument __atomic_* and __sync_* builtins.
>> > Unfortunately for none of the builtins there is 1:1 mapping to the tsan
>> > replacements, tsan uses weirdo memory model encoding (instead of values
>> > from 0 to 5 apparently 1 << 0 to 1 << 5, as if one could have more than
>> > one memory model at the same time), so even for the easy cases GCC
>> > has to transform them.
>>
>>
>> gcc must be using old version of the library. I've switched to ABI
>> constants some time ago.
>
> Ah, it was just me looking at llvm compiler-rt tsan checkout from a few days
> ago.  Guess I'll need to update the patch.  So, it now expects 0 for relaxed
> up to 5 for sequentially consistent?

Yes.


>> > More importantly, there is no way to pass through
>> > __ATOMIC_HLE_ACQUIRE/__ATOMIC_HLE_RELEASE.  Right now the patch just gives
>> > up and doesn't replace anything if e.g.
>> > __atomic_store_n (p, 0, __ATOMIC_HLE_RELEASE | __ATOMIC_RELEASE);
>> > is seen, perhaps one could ignore the hle bits which are just an
>> > optimization, but there is no way to find out in the generic code
>> > whether the extra bits are just an optimization or change the behavior
>> > of the builtin.
>> >
>>
>> Do you mean hardware lock elission? oh, boy
>
> Yes.
>
>> It's not just "an atomic". It should be legal to downgrade them to plain
>> atomic ops (however, I am not sure what memory order they must have... is
>> it possible to have HLE_ACQUIRE before seq_cst atomic rmw?). And I think
>> that's what we need to do.
>
> Perhaps if there wasn't the compatibility hack or what is that 100500
> comparison, or if it could be tweaked, we could pass through the HLE bits
> too and let the library decide what to do with it.

The 100500 check was part of a tricky multi-step migration to the new scheme,
because we can't deploy the compiler and the runtime atomically.
I think we just need to drop the HLE bits. I don't know what the runtime
could possibly do with them. And I don't want one more tricky multi-step
migration.



>
> If HLE bits are set, the low order bits (model & 65535) contains the
> normal memory model, i.e. 0 (relaxed) through 5 (seq_cst), and either 65536
> (hle acquire) or 131072 (hle release) is ored with that.
>
>> Well, it's a dirty implementation that relies on x86 memory model (and no
>> compiler optimizations, well, apparently there are data races :)).
>> I think eventually I will just replace them with mutexes.
>
> I don't see why mutexes would be better than just plain __atomic_* builtins.
> With mutexes there wouldn't be any atomicity...

For a race detector, any atomic operation is a heavy operation that is not
really atomic anyway.
Currently I do:

update vector clock
do the atomic operation
update vector clock

where 'update vector clock' is:

lock container mutex
find atomic descriptor
lock atomic descriptor
unlock container mutex
update clock (O(N))
unlock atomic descriptor

It's much wiser to combine the two vector clock updates and do the atomic
operation itself under the atomic descriptor mutex (see the sketch below).
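
Roughly, the combined scheme would look like this (a pseudocode-level sketch
with made-up names, and std::mutex standing in for the runtime's own
primitives; not the actual tsan code):

#include <mutex>

// Illustration only: one locked section updates the clocks once and runs
// the operation itself under the descriptor mutex, so the memory access
// no longer needs to be a real hardware atomic.
struct AtomicDescriptor {    // stand-in for the runtime's descriptor
  std::mutex mtx;
  void update_clocks() {}    // placeholder for the O(N) clock merge
};

template <typename T, typename Op>
T rmw_under_descriptor(AtomicDescriptor &d, T &slot, Op op) {
  std::lock_guard<std::mutex> lock(d.mtx);  // lock atomic descriptor once
  d.update_clocks();                        // single combined clock update
  T old = slot;                             // plain load/store is fine here:
  slot = op(old);                           // every access holds d.mtx
  return old;
}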



>> I've just committed a patch to llvm with failure_memory_order (r168518).
>
> Ok, I'll adjust the patch to pass both memory models through then.
>
>> Yeah, I think it's better to transform them to a more standard ones (llvm
>> also has own weird atomic ops and own memory orders).
>
> Ok, no change on the GCC side then needed for that beyond what I posted.
>
>> > Oh, and there are no 16-byte atomics provided by libtsan, the library
>> > would need to be built with -mcx16 on x86_64 for that and have the
>> > additional entry points for unsigned __int128.  The patch doesn't touch
>> > the 16-byte atomics, leaves them as is.
>>
>> I think that's fine for now.
>
> Perhaps.  Note that such atomics are used also e.g. for #pragma omp atomic
> on long double etc.
>
>> That's what llvm does as well. But it inserts a fast path before
>> __cxa_guard_acquire -- acquire-load of the lock word. Doesn't gcc do the
>> same?
>> tsan intercepts __cxa_guard functions.
>
> Yes, except it isn't __atomic_load_*, but plain memory read.
>   _3 = MEM[(char *)&_ZGVZ3foovE1a];
>   if (_3 == 0)
>     goto <bb 3>;
>   else
>     goto <bb 8>;
>
>   <bb 8>:
>   fast path, whatever;
>
>   <bb 3>:
>   _5 = __cxa_guard_acquire (&_ZGVZ3foovE1a);
>   ...
>
> So, right now tsan would just instrument it as __tsan_read1 from
> &_ZGVZ3foovE1a rather than any atomic load.


Looks like a bug. That needs to be load-acquire with proper compiler
and hardware memory ordering.




>> Well, yes, the compiler module must pass to the runtime all memory
>> accesses, whatever form they have in compiler internal representation.
>> Yes, I think I need to provide range memory access functions in runtime. I
>> already have this issue for Go language, there are a lot of range accesses
>> due to arrays and slices.
>> I will add them next week.
>
> Ok.  A slight problem then is that where the tsan pass sits right now, there
> is no easy way to find out if the builtin call will be expanded inline or
> not, so (similar for asan), if we instrument them in the pass, it might be
> instrumented twice at runtime if the builtin is expanded as a library call
> (once the added instrumentation for the builtin, once in the intercepted
> library call).  That isn't wrong, just might need slightly more resources
> than if we ensured we only instrument the builtin if it isn't expanded
> inline.
>
>         Jakub
Dmitry Vyukov Nov. 26, 2012, 8:17 a.m. UTC | #5
On Sat, Nov 24, 2012 at 12:58 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
>>> Ok.  A slight problem then is that where the tsan pass sits right now, there
>>> is no easy way to find out if the builtin call will be expanded inline or
>>> not, so (similar for asan), if we instrument them in the pass, it might be
>>> instrumented twice at runtime if the builtin is expanded as a library call
>>> (once the added instrumentation for the builtin, once in the intercepted
>>> library call).  That isn't wrong, just might need slightly more resources
>>> than if we ensured we only instrument the builtin if it isn't expanded
>>> inline.
>>>
>>
>> Should inlining of those functions be disabled as if -fno-builtins is specified?
>
> Yes, it sounds reasonable. Performance characteristics under tsan
> differ significantly, so most likely we don't care.


Do we still need range access functions then?
Jakub Jelinek Nov. 26, 2012, 8:35 a.m. UTC | #6
On Mon, Nov 26, 2012 at 12:17:44PM +0400, Dmitry Vyukov wrote:
> On Sat, Nov 24, 2012 at 12:58 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> >>> Ok.  A slight problem then is that where the tsan pass sits right now, there
> >>> is no easy way to find out if the builtin call will be expanded inline or
> >>> not, so (similar for asan), if we instrument them in the pass, it might be
> >>> instrumented twice at runtime if the builtin is expanded as a library call
> >>> (once the added instrumentation for the builtin, once in the intercepted
> >>> library call).  That isn't wrong, just might need slightly more resources
> >>> than if we ensured we only instrument the builtin if it isn't expanded
> >>> inline.
> >>>
> >>
> >> Should inlining of those functions be disabled as if -fno-builtins is specified?
> >
> > Yes, it sounds reasonable. Performance characteristics under tsan
> > differ significantly, so most likely we don't care.
> 
> 
> Do we still need range access functions then?

Yes.  I think whether to instrument builtins inline or not should be a user
decision: -fsanitize=thread and either nothing or -fno-builtin.  Implying
-fno-builtin might penalize some code unnecessarily (remember, builtin
handling is not only about expanding some builtins inline, but also about
folding them when they are used with constant arguments, performing all sorts
of optimizations etc., all of which are ok); we just don't have a knob to
disable inline expansion of selected builtins (and some are certainly ok to
expand inline, e.g. those that don't have a library counterpart).
Also note that many programs use __builtin_* calls directly, and then you
can't disable recognizing those as builtins (see the example below).
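
For example, a call like this stays a builtin no matter what:

// Direct __builtin_* calls are always treated as builtins, regardless of
// -fno-builtin, so they can still be expanded inline.
void copy4(char *dst, const char *src) {
  __builtin_memcpy(dst, src, 4);
}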

	Jakub
Dmitry Vyukov Nov. 26, 2012, 10:20 a.m. UTC | #7
On Sat, Nov 24, 2012 at 3:06 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Fri, Nov 23, 2012 at 8:39 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> > On Fri, Nov 23, 2012 at 08:10:39PM +0400, Dmitry Vyukov wrote:
> >> > This patch attempts to instrument __atomic_* and __sync_* builtins.
> >> > Unfortunately for none of the builtins there is 1:1 mapping to the tsan
> >> > replacements, tsan uses weirdo memory model encoding (instead of values
> >> > from 0 to 5 apparently 1 << 0 to 1 << 5, as if one could have more than
> >> > one memory model at the same time), so even for the easy cases GCC
> >> > has to transform them.
> >>
> >>
> >> gcc must be using old version of the library. I've switched to ABI
> >> constants some time ago.
> >
> > Ah, it was just me looking at llvm compiler-rt tsan checkout from a few days
> > ago.  Guess I'll need to update the patch.  So, it now expects 0 for relaxed
> > up to 5 for sequentially consistent?
>
> Yes.
>
>
> >> > More importantly, there is no way to pass through
> >> > __ATOMIC_HLE_ACQUIRE/__ATOMIC_HLE_RELEASE.  Right now the patch just gives
> >> > up and doesn't replace anything if e.g.
> >> > __atomic_store_n (p, 0, __ATOMIC_HLE_RELEASE | __ATOMIC_RELEASE);
> >> > is seen, perhaps one could ignore the hle bits which are just an
> >> > optimization, but there is no way to find out in the generic code
> >> > whether the extra bits are just an optimization or change the behavior
> >> > of the builtin.
> >> >
> >>
> >> Do you mean hardware lock elission? oh, boy
> >
> > Yes.
> >
> >> It's not just "an atomic". It should be legal to downgrade them to plain
> >> atomic ops (however, I am not sure what memory order they must have... is
> >> it possible to have HLE_ACQUIRE before seq_cst atomic rmw?). And I think
> >> that's what we need to do.
> >
> > Perhaps if there wasn't the compatibility hack or what is that 100500
> > comparison, or if it could be tweaked, we could pass through the HLE bits
> > too and let the library decide what to do with it.
>
> 100500 was a tricky multi-step migration to the new scheme, because we
> can't deploy compiler and runtime atomically.
> I think we just need to drop HLE bits. I don't know what the runtime
> can possibly do with them. And I don't want one more tricky multi-step
> migration.
>
>
>
> >
> > If HLE bits are set, the low order bits (model & 65535) contains the
> > normal memory model, i.e. 0 (relaxed) through 5 (seq_cst), and either 65536
> > (hle acquire) or 131072 (hle release) is ored with that.
> >
> >> Well, it's a dirty implementation that relies on x86 memory model (and no
> >> compiler optimizations, well, apparently there are data races :)).
> >> I think eventually I will just replace them with mutexes.
> >
> > I don't see why mutexes would be better than just plain __atomic_* builtins.
> > With mutexes there wouldn't be any atomicity...
>
> For race detector any atomic operations is a heavy operations, which
> is not atomic anyway.
> Currently I do:
>
> update vector clock
> do the atomic operation
> update vector clock
>
> where 'update vector clock' is
> lock container mutex
> find atomic descriptor
> lock atomic descriptor
> unlock container mutex
> update clock (O(N))
> unlock atomic descriptor
>
> it's much wiser to combine 2 vector clock updates and do the atomic
> operation itself under the atomic descriptor mutex.
>
>
>
> >> I've just committed a patch to llvm with failure_memory_order (r168518).
> >
> > Ok, I'll adjust the patch to pass both memory models through then.
> >
> >> Yeah, I think it's better to transform them to a more standard ones (llvm
> >> also has own weird atomic ops and own memory orders).
> >
> > Ok, no change on the GCC side then needed for that beyond what I posted.
> >
> >> > Oh, and there are no 16-byte atomics provided by libtsan, the library
> >> > would need to be built with -mcx16 on x86_64 for that and have the
> >> > additional entry points for unsigned __int128.  The patch doesn't touch
> >> > the 16-byte atomics, leaves them as is.
> >>
> >> I think that's fine for now.
> >
> > Perhaps.  Note that such atomics are used also e.g. for #pragma omp atomic
> > on long double etc.
> >
> >> That's what llvm does as well. But it inserts a fast path before
> >> __cxa_guard_acquire -- acquire-load of the lock word. Doesn't gcc do the
> >> same?
> >> tsan intercepts __cxa_guard functions.
> >
> > Yes, except it isn't __atomic_load_*, but plain memory read.
> >   _3 = MEM[(char *)&_ZGVZ3foovE1a];
> >   if (_3 == 0)
> >     goto <bb 3>;
> >   else
> >     goto <bb 8>;
> >
> >   <bb 8>:
> >   fast path, whatever;
> >
> >   <bb 3>:
> >   _5 = __cxa_guard_acquire (&_ZGVZ3foovE1a);
> >   ...
> >
> > So, right now tsan would just instrument it as __tsan_read1 from
> > &_ZGVZ3foovE1a rather than any atomic load.
>
>
> Looks like a bug. That needs to be load-acquire with proper compiler
> and hardware memory ordering.


Can anybody confirm whether it's a bug or not?
It can also involve compiler reorderings, especially if the object's
ctor/functions contain accesses to other global objects.
Xinliang David Li Nov. 26, 2012, 5:07 p.m. UTC | #8
On Mon, Nov 26, 2012 at 12:35 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Mon, Nov 26, 2012 at 12:17:44PM +0400, Dmitry Vyukov wrote:
>> On Sat, Nov 24, 2012 at 12:58 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
>> >>> Ok.  A slight problem then is that where the tsan pass sits right now, there
>> >>> is no easy way to find out if the builtin call will be expanded inline or
>> >>> not, so (similar for asan), if we instrument them in the pass, it might be
>> >>> instrumented twice at runtime if the builtin is expanded as a library call
>> >>> (once the added instrumentation for the builtin, once in the intercepted
>> >>> library call).  That isn't wrong, just might need slightly more resources
>> >>> than if we ensured we only instrument the builtin if it isn't expanded
>> >>> inline.
>> >>>
>> >>
>> >> Should inlining of those functions be disabled as if -fno-builtins is specified?
>> >
>> > Yes, it sounds reasonable. Performance characteristics under tsan
>> > differ significantly, so most likely we don't care.
>>
>>
>> Do we still need range access functions then?
>
> Yes.  I think whether to instrument builtins inline or not should be a user
> decision

Why is that? Most users probably aren't aware that builtins can be inlined,
or don't care about it.

>, -fsanitize-thread and either nothinng or -fno-builtin.  Implying
> -fno-builtin might penalize some code unnecessarily (remember, builtin
> handling is not only about expanding some builtins inline, but also about
> folding them if they are used with constant arguments, performing all sorts
> of optimizations etc.,

Yes, in some cases it can incur a little runtime overhead, as in

int i;
memset(&i, 1, 4);

but I think that compared with tsan's runtime overhead this is almost negligible.


> all of which are ok, just we don't have a knob to
> disable inline expansion of some builtins (which, some are certainly ok,
> e.g. those that don't have a library counterpart).

There is an internal interface for this:

disable_builtin_function (...) -- tsan option processing can call it
selectively for the stringop-related builtins.

> Also note that many programs use __builtin_* calls directly and then you
> can't disable recognizing them as builtins.

Those calls won't be affected, I think. The compiler can also generate
builtin calls directly. However, that means the range access instrumentation
is still needed.

David


>
>         Jakub
Dmitry Vyukov Nov. 26, 2012, 6:06 p.m. UTC | #9
On Mon, Nov 26, 2012 at 9:07 PM, Xinliang David Li <davidxl@google.com> wrote:
>>> >>> Ok.  A slight problem then is that where the tsan pass sits right now, there
>>> >>> is no easy way to find out if the builtin call will be expanded inline or
>>> >>> not, so (similar for asan), if we instrument them in the pass, it might be
>>> >>> instrumented twice at runtime if the builtin is expanded as a library call
>>> >>> (once the added instrumentation for the builtin, once in the intercepted
>>> >>> library call).  That isn't wrong, just might need slightly more resources
>>> >>> than if we ensured we only instrument the builtin if it isn't expanded
>>> >>> inline.
>>> >>>
>>> >>
>>> >> Should inlining of those functions be disabled as if -fno-builtins is specified?
>>> >
>>> > Yes, it sounds reasonable. Performance characteristics under tsan
>>> > differ significantly, so most likely we don't care.
>>>
>>>
>>> Do we still need range access functions then?
>>
>> Yes.  I think whether to instrument builtins inline or not should be a user
>> decision
>
> Why is that? Most users probably are not aware builtins can be inlined
> or care about it.


I agree with David. Most likely it has negligible overhead under tsan.
Even if gcc provides the ability to choose between builtins and library
functions, it does not have to provide it under tsan (which is a very
special mode).
On the other hand, I don't know what complexity is involved on the gcc
side. As for the runtime, I can easily implement it; it's just a matter of
writing two thin wrappers around existing functionality.



>>, -fsanitize-thread and either nothinng or -fno-builtin.  Implying
>> -fno-builtin might penalize some code unnecessarily (remember, builtin
>> handling is not only about expanding some builtins inline, but also about
>> folding them if they are used with constant arguments, performing all sorts
>> of optimizations etc.,
>
> Yes, in some cases, it can incur a little runtime overhead as
>
> int i;
> memset(&i, 1, 4);
>
> but I think compared with tsan's runtime overhead, this is almost negligible.
>
>
>> all of which are ok, just we don't have a knob to
>> disable inline expansion of some builtins (which, some are certainly ok,
>> e.g. those that don't have a library counterpart).
>
> There is an internal interface to call:
>
> disable_builtin_function (...) -- tsan option process can call it
> selectively for the stringop related builtins.
>
>> Also note that many programs use __builtin_* calls directly and then you
>> can't disable recognizing them as builtins.
>
> Those calls won't be affected, I think. Compiler can also directly
> generate builtin calls. However that means the range checking is still
> needed.
>
> David
>
>
>>
>>         Jakub
Dmitry Vyukov Nov. 27, 2012, 8:13 a.m. UTC | #10
On Fri, Nov 23, 2012 at 6:05 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> Hi!
>
> This patch attempts to instrument __atomic_* and __sync_* builtins.
> Unfortunately for none of the builtins there is 1:1 mapping to the tsan
> replacements, tsan uses weirdo memory model encoding (instead of values
> from 0 to 5 apparently 1 << 0 to 1 << 5, as if one could have more than
> one memory model at the same time), so even for the easy cases GCC
> has to transform them.  More importantly, there is no way to pass through
> __ATOMIC_HLE_ACQUIRE/__ATOMIC_HLE_RELEASE.  Right now the patch just gives
> up and doesn't replace anything if e.g.
> __atomic_store_n (p, 0, __ATOMIC_HLE_RELEASE | __ATOMIC_RELEASE);
> is seen, perhaps one could ignore the hle bits which are just an
> optimization, but there is no way to find out in the generic code
> whether the extra bits are just an optimization or change the behavior
> of the builtin.
>
> Another issue is that libtsan apparently internally uses just the deprecated
> __sync_* builtins (or no builtins at all for load/store).  That in some
> cases works (because __sync_* is usually equivalent to __atomic_* with
> __ATOMIC_SEQ_CST memmodel), but for the load/store cases and for atomic
> exchange it doesn't (the former don't provide any barrier, the latter
> in __sync_lock_test_and_set is only __ATOMIC_ACQUIRE barrier).
> Can libtsan at least conditionally, when built with GCC 4.7 or later,
> use __atomic_* builtins instead of __sync_*?  One still probably has to
> use __ATOMIC_SEQ_CST model anyway, otherwise there would need to be a switch
> based on the mem model, as  only constant arguments to __atomic_* builtins
> do something other than sequential consistency.
>
> Another issue is that there are no fetch + nand (or nand + fetch) tsan
> entrypoints, I could transform those into a loop using cas, but that looks
> overkill, then it would be better to change libtsan.
>
> The most important problem is that __tsan_atomic*_compare_exchange has just
> a single memory model argument, while the __atomic_* builtins have two (one
> for success, another one for failure).  Right now the patch just ignores
> the failure memory model, but that might actually lead into not detecting
> a race in a program when it should have been detected (note the failure
> memory model can't be stronger than success memory model).  Would it be
> possible to add another argument to those, and use the failure mode instead
> of success mode if the atomic operation fails?
>
> Apparently there are also no entry points for op and fetch (as opposed to
> fetch and op), but the patch translates those into the corresponding
> fetch and op libcalls with + val (etc.) on the result.
>
> Oh, and there are no 16-byte atomics provided by libtsan, the library
> would need to be built with -mcx16 on x86_64 for that and have the
> additional entry points for unsigned __int128.  The patch doesn't touch
> the 16-byte atomics, leaves them as is.


Hi,

I've added 128-bit atomic ops:
http://llvm.org/viewvc/llvm-project?view=rev&revision=168683

Refactored atomic ops so that the atomic operation itself is done
under the mutex:
http://llvm.org/viewvc/llvm-project?view=rev&revision=168682

And added atomic nand operation:
http://llvm.org/viewvc/llvm-project?view=rev&revision=168584
Dmitry Vyukov Nov. 27, 2012, 8:17 a.m. UTC | #11
On Fri, Nov 23, 2012 at 6:05 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> Hi!
>
> This patch attempts to instrument __atomic_* and __sync_* builtins.
> Unfortunately for none of the builtins there is 1:1 mapping to the tsan
> replacements, tsan uses weirdo memory model encoding (instead of values
> from 0 to 5 apparently 1 << 0 to 1 << 5, as if one could have more than
> one memory model at the same time), so even for the easy cases GCC
> has to transform them.  More importantly, there is no way to pass through
> __ATOMIC_HLE_ACQUIRE/__ATOMIC_HLE_RELEASE.  Right now the patch just gives
> up and doesn't replace anything if e.g.
> __atomic_store_n (p, 0, __ATOMIC_HLE_RELEASE | __ATOMIC_RELEASE);
> is seen, perhaps one could ignore the hle bits which are just an
> optimization, but there is no way to find out in the generic code
> whether the extra bits are just an optimization or change the behavior
> of the builtin.
>
> Another issue is that libtsan apparently internally uses just the deprecated
> __sync_* builtins (or no builtins at all for load/store).  That in some
> cases works (because __sync_* is usually equivalent to __atomic_* with
> __ATOMIC_SEQ_CST memmodel), but for the load/store cases and for atomic
> exchange it doesn't (the former don't provide any barrier, the latter
> in __sync_lock_test_and_set is only __ATOMIC_ACQUIRE barrier).
> Can libtsan at least conditionally, when built with GCC 4.7 or later,
> use __atomic_* builtins instead of __sync_*?  One still probably has to
> use __ATOMIC_SEQ_CST model anyway, otherwise there would need to be a switch
> based on the mem model, as  only constant arguments to __atomic_* builtins
> do something other than sequential consistency.
>
> Another issue is that there are no fetch + nand (or nand + fetch) tsan
> entrypoints, I could transform those into a loop using cas, but that looks
> overkill, then it would be better to change libtsan.
>
> The most important problem is that __tsan_atomic*_compare_exchange has just
> a single memory model argument, while the __atomic_* builtins have two (one
> for success, another one for failure).  Right now the patch just ignores
> the failure memory model, but that might actually lead into not detecting
> a race in a program when it should have been detected (note the failure
> memory model can't be stronger than success memory model).  Would it be
> possible to add another argument to those, and use the failure mode instead
> of success mode if the atomic operation fails?
>
> Apparently there are also no entry points for op and fetch (as opposed to
> fetch and op), but the patch translates those into the corresponding
> fetch and op libcalls with + val (etc.) on the result.
>
> Oh, and there are no 16-byte atomics provided by libtsan, the library
> would need to be built with -mcx16 on x86_64 for that and have the
> additional entry points for unsigned __int128.  The patch doesn't touch
> the 16-byte atomics, leaves them as is.
>
> This patch is on top of the patch from yesterday which introduced use of
> BUILT_IN_* functions in asan and builds them if the FE doesn't.
>
> I've used the attached testcase to verify it compiles, and quickly skimmed
> the changes it is doing, but haven't actually tested it more than that.
>
> As for local statics mentioned in the PR too, those are in GCC handled by
> __cxa_guard_{acquire,release,abort} library functions, so either libtsan would
> need to intercept those calls, or we'd need to add some instrumentation
> before and/or after that (what would be needed?).
>
> And, I'm also wondering about string/memory builtins asan is instrumenting,
> wouldn't it be useful to instrument those similarly for tsan (say also
> just the first byte accessed and last one)?  I know you are overriding most
> of them in libtsan, but some of them are often expanded inline (e.g. memcpy,
> strlen, etc.) and thus the libtsan replacements aren't executed.
> Or better yet, is there some libtsan entry point to report memory access
> range?
>
> 2012-11-23  Jakub Jelinek  <jakub@redhat.com>
>
>         PR sanitizer/55439
>         * Makefile.in (tsan.o): Depend on tree-ssa-propagate.h.
>         * sanitizer.def: Add __tsan_atomic* builtins.
>         * asan.c (initialize_sanitizer_builtins): Adjust to also
>         initialize __tsan_atomic* builtins.
>         * tsan.c: Include tree-ssa-propagate.h.
>         (enum tsan_atomic_action): New enum.
>         (tsan_atomic_table): New table.
>         (instrument_builtin_call): New function.
>         (instrument_gimple): Take pointer to gimple_stmt_iterator
>         instead of gimple_stmt_iterator.  Call instrument_builtin_call
>         on builtin call stmts.
>         (instrument_memory_accesses): Adjust instrument_gimple caller.
>         * builtin-types.def (BT_FN_BOOL_VPTR_PTR_I1_INT,
>         BT_FN_BOOL_VPTR_PTR_I2_INT, BT_FN_BOOL_VPTR_PTR_I4_INT,
>         BT_FN_BOOL_VPTR_PTR_I8_INT): New.
>
> --- gcc/Makefile.in.jj  2012-11-23 10:31:37.861377311 +0100
> +++ gcc/Makefile.in     2012-11-23 13:36:00.578761997 +0100
> @@ -2234,7 +2234,8 @@ tsan.o : $(CONFIG_H) $(SYSTEM_H) $(TREE_
>     $(TM_H) coretypes.h $(TREE_DUMP_H) $(TREE_PASS_H) $(CGRAPH_H) $(GGC_H) \
>     $(BASIC_BLOCK_H) $(FLAGS_H) $(FUNCTION_H) \
>     $(TM_P_H) $(TREE_FLOW_H) $(DIAGNOSTIC_CORE_H) $(GIMPLE_H) tree-iterator.h \
> -   intl.h cfghooks.h output.h options.h c-family/c-common.h tsan.h asan.h
> +   intl.h cfghooks.h output.h options.h c-family/c-common.h tsan.h asan.h \
> +   tree-ssa-propagate.h
>  tree-ssa-tail-merge.o: tree-ssa-tail-merge.c \
>     $(SYSTEM_H) $(CONFIG_H) coretypes.h $(TM_H) $(BITMAP_H) \
>     $(FLAGS_H) $(TM_P_H) $(BASIC_BLOCK_H) \
> --- gcc/sanitizer.def.jj        2012-11-23 10:31:37.859377232 +0100
> +++ gcc/sanitizer.def   2012-11-23 13:36:00.576761947 +0100
> @@ -57,3 +57,148 @@ DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_WRIT
>                       BT_FN_VOID_PTR, ATTR_NOTHROW_LEAF_LIST)
>  DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_WRITE16, "__tsan_write16",
>                       BT_FN_VOID_PTR, ATTR_NOTHROW_LEAF_LIST)
> +
> +DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC8_LOAD,
> +                     "__tsan_atomic8_load",
> +                     BT_FN_I1_CONST_VPTR_INT, ATTR_NOTHROW_LEAF_LIST)
> +DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC16_LOAD,
> +                     "__tsan_atomic16_load",
> +                     BT_FN_I2_CONST_VPTR_INT, ATTR_NOTHROW_LEAF_LIST)
> +DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC32_LOAD,
> +                     "__tsan_atomic32_load",
> +                     BT_FN_I4_CONST_VPTR_INT, ATTR_NOTHROW_LEAF_LIST)
> +DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC64_LOAD,
> +                     "__tsan_atomic64_load",
> +                     BT_FN_I8_CONST_VPTR_INT, ATTR_NOTHROW_LEAF_LIST)
> +
> +DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC8_STORE,
> +                     "__tsan_atomic8_store",
> +                     BT_FN_VOID_VPTR_I1_INT, ATTR_NOTHROW_LEAF_LIST)
> +DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC16_STORE,
> +                     "__tsan_atomic16_store",
> +                     BT_FN_VOID_VPTR_I2_INT, ATTR_NOTHROW_LEAF_LIST)
> +DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC32_STORE,
> +                     "__tsan_atomic32_store",
> +                     BT_FN_VOID_VPTR_I4_INT, ATTR_NOTHROW_LEAF_LIST)
> +DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC64_STORE,
> +                     "__tsan_atomic64_store",
> +                     BT_FN_VOID_VPTR_I8_INT, ATTR_NOTHROW_LEAF_LIST)
> +
> +DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC8_EXCHANGE,
> +                     "__tsan_atomic8_exchange",
> +                     BT_FN_I1_VPTR_I1_INT, ATTR_NOTHROW_LEAF_LIST)
> +DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC16_EXCHANGE,
> +                     "__tsan_atomic16_exchange",
> +                     BT_FN_I2_VPTR_I2_INT, ATTR_NOTHROW_LEAF_LIST)
> +DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC32_EXCHANGE,
> +                     "__tsan_atomic32_exchange",
> +                     BT_FN_I4_VPTR_I4_INT, ATTR_NOTHROW_LEAF_LIST)
> +DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC64_EXCHANGE,
> +                     "__tsan_atomic64_exchange",
> +                     BT_FN_I8_VPTR_I8_INT, ATTR_NOTHROW_LEAF_LIST)
> +
> +DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC8_FETCH_ADD,
> +                     "__tsan_atomic8_fetch_add",
> +                     BT_FN_I1_VPTR_I1_INT, ATTR_NOTHROW_LEAF_LIST)
> +DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC16_FETCH_ADD,
> +                     "__tsan_atomic16_fetch_add",
> +                     BT_FN_I2_VPTR_I2_INT, ATTR_NOTHROW_LEAF_LIST)
> +DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC32_FETCH_ADD,
> +                     "__tsan_atomic32_fetch_add",
> +                     BT_FN_I4_VPTR_I4_INT, ATTR_NOTHROW_LEAF_LIST)
> +DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC64_FETCH_ADD,
> +                     "__tsan_atomic64_fetch_add",
> +                     BT_FN_I8_VPTR_I8_INT, ATTR_NOTHROW_LEAF_LIST)
> +
> +DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC8_FETCH_SUB,
> +                     "__tsan_atomic8_fetch_sub",
> +                     BT_FN_I1_VPTR_I1_INT, ATTR_NOTHROW_LEAF_LIST)
> +DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC16_FETCH_SUB,
> +                     "__tsan_atomic16_fetch_sub",
> +                     BT_FN_I2_VPTR_I2_INT, ATTR_NOTHROW_LEAF_LIST)
> +DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC32_FETCH_SUB,
> +                     "__tsan_atomic32_fetch_sub",
> +                     BT_FN_I4_VPTR_I4_INT, ATTR_NOTHROW_LEAF_LIST)
> +DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC64_FETCH_SUB,
> +                     "__tsan_atomic64_fetch_sub",
> +                     BT_FN_I8_VPTR_I8_INT, ATTR_NOTHROW_LEAF_LIST)
> +
> +DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC8_FETCH_AND,
> +                     "__tsan_atomic8_fetch_and",
> +                     BT_FN_I1_VPTR_I1_INT, ATTR_NOTHROW_LEAF_LIST)
> +DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC16_FETCH_AND,
> +                     "__tsan_atomic16_fetch_and",
> +                     BT_FN_I2_VPTR_I2_INT, ATTR_NOTHROW_LEAF_LIST)
> +DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC32_FETCH_AND,
> +                     "__tsan_atomic32_fetch_and",
> +                     BT_FN_I4_VPTR_I4_INT, ATTR_NOTHROW_LEAF_LIST)
> +DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC64_FETCH_AND,
> +                     "__tsan_atomic64_fetch_and",
> +                     BT_FN_I8_VPTR_I8_INT, ATTR_NOTHROW_LEAF_LIST)
> +
> +DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC8_FETCH_OR,
> +                     "__tsan_atomic8_fetch_or",
> +                     BT_FN_I1_VPTR_I1_INT, ATTR_NOTHROW_LEAF_LIST)
> +DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC16_FETCH_OR,
> +                     "__tsan_atomic16_fetch_or",
> +                     BT_FN_I2_VPTR_I2_INT, ATTR_NOTHROW_LEAF_LIST)
> +DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC32_FETCH_OR,
> +                     "__tsan_atomic32_fetch_or",
> +                     BT_FN_I4_VPTR_I4_INT, ATTR_NOTHROW_LEAF_LIST)
> +DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC64_FETCH_OR,
> +                     "__tsan_atomic64_fetch_or",
> +                     BT_FN_I8_VPTR_I8_INT, ATTR_NOTHROW_LEAF_LIST)
> +
> +DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC8_FETCH_XOR,
> +                     "__tsan_atomic8_fetch_xor",
> +                     BT_FN_I1_VPTR_I1_INT, ATTR_NOTHROW_LEAF_LIST)
> +DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC16_FETCH_XOR,
> +                     "__tsan_atomic16_fetch_xor",
> +                     BT_FN_I2_VPTR_I2_INT, ATTR_NOTHROW_LEAF_LIST)
> +DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC32_FETCH_XOR,
> +                     "__tsan_atomic32_fetch_xor",
> +                     BT_FN_I4_VPTR_I4_INT, ATTR_NOTHROW_LEAF_LIST)
> +DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC64_FETCH_XOR,
> +                     "__tsan_atomic64_fetch_xor",
> +                     BT_FN_I8_VPTR_I8_INT, ATTR_NOTHROW_LEAF_LIST)
> +
> +DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC8_COMPARE_EXCHANGE_STRONG,
> +                     "__tsan_atomic8_compare_exchange_strong",
> +                     BT_FN_BOOL_VPTR_PTR_I1_INT,
> +                     ATTR_NOTHROW_LEAF_LIST)
> +DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC16_COMPARE_EXCHANGE_STRONG,
> +                     "__tsan_atomic16_compare_exchange_strong",
> +                     BT_FN_BOOL_VPTR_PTR_I2_INT,
> +                     ATTR_NOTHROW_LEAF_LIST)
> +DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC32_COMPARE_EXCHANGE_STRONG,
> +                     "__tsan_atomic32_compare_exchange_strong",
> +                     BT_FN_BOOL_VPTR_PTR_I4_INT,
> +                     ATTR_NOTHROW_LEAF_LIST)
> +DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC64_COMPARE_EXCHANGE_STRONG,
> +                     "__tsan_atomic64_compare_exchange_strong",
> +                     BT_FN_BOOL_VPTR_PTR_I8_INT,
> +                     ATTR_NOTHROW_LEAF_LIST)
> +
> +DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC8_COMPARE_EXCHANGE_WEAK,
> +                     "__tsan_atomic8_compare_exchange_weak",
> +                     BT_FN_BOOL_VPTR_PTR_I1_INT,
> +                     ATTR_NOTHROW_LEAF_LIST)
> +DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC16_COMPARE_EXCHANGE_WEAK,
> +                     "__tsan_atomic16_compare_exchange_weak",
> +                     BT_FN_BOOL_VPTR_PTR_I2_INT,
> +                     ATTR_NOTHROW_LEAF_LIST)
> +DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC32_COMPARE_EXCHANGE_WEAK,
> +                     "__tsan_atomic32_compare_exchange_weak",
> +                     BT_FN_BOOL_VPTR_PTR_I4_INT,
> +                     ATTR_NOTHROW_LEAF_LIST)
> +DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC64_COMPARE_EXCHANGE_WEAK,
> +                     "__tsan_atomic64_compare_exchange_weak",
> +                     BT_FN_BOOL_VPTR_PTR_I8_INT,
> +                     ATTR_NOTHROW_LEAF_LIST)
> +
> +DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC_THREAD_FENCE,
> +                     "__tsan_atomic_thread_fence",
> +                     BT_FN_VOID_INT, ATTR_NOTHROW_LEAF_LIST)
> +DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC_SIGNAL_FENCE,
> +                     "__tsan_atomic_signal_fence",
> +                     BT_FN_VOID_INT, ATTR_NOTHROW_LEAF_LIST)
> --- gcc/asan.c.jj       2012-11-23 10:31:37.863377378 +0100
> +++ gcc/asan.c  2012-11-23 13:36:00.579762019 +0100
> @@ -1484,6 +1484,53 @@ initialize_sanitizer_builtins (void)
>      = build_function_type_list (void_type_node, ptr_type_node,
>                                 build_nonstandard_integer_type (POINTER_SIZE,
>                                                                 1), NULL_TREE);
> +  tree BT_FN_VOID_INT
> +    = build_function_type_list (void_type_node, integer_type_node, NULL_TREE);
> +  tree BT_FN_BOOL_VPTR_PTR_IX_INT[4];
> +  tree BT_FN_IX_CONST_VPTR_INT[4];
> +  tree BT_FN_IX_VPTR_IX_INT[4];
> +  tree BT_FN_VOID_VPTR_IX_INT[4];
> +  tree vptr
> +    = build_pointer_type (build_qualified_type (void_type_node,
> +                                               TYPE_QUAL_VOLATILE));
> +  tree cvptr
> +    = build_pointer_type (build_qualified_type (void_type_node,
> +                                               TYPE_QUAL_VOLATILE
> +                                               |TYPE_QUAL_CONST));
> +  tree boolt
> +    = lang_hooks.types.type_for_size (BOOL_TYPE_SIZE, 1);
> +  int i;
> +  for (i = 0; i < 4; i++)
> +    {
> +      tree ix = build_nonstandard_integer_type (BITS_PER_UNIT * (1 << i), 1);
> +      BT_FN_BOOL_VPTR_PTR_IX_INT[i]
> +       = build_function_type_list (boolt, vptr, ptr_type_node, ix,
> +                                   integer_type_node, NULL_TREE);
> +      BT_FN_IX_CONST_VPTR_INT[i]
> +       = build_function_type_list (ix, cvptr, integer_type_node, NULL_TREE);
> +      BT_FN_IX_VPTR_IX_INT[i]
> +       = build_function_type_list (ix, vptr, ix, integer_type_node,
> +                                   NULL_TREE);
> +      BT_FN_VOID_VPTR_IX_INT[i]
> +       = build_function_type_list (void_type_node, vptr, ix,
> +                                   integer_type_node, NULL_TREE);
> +    }
> +#define BT_FN_BOOL_VPTR_PTR_I1_INT BT_FN_BOOL_VPTR_PTR_IX_INT[0]
> +#define BT_FN_I1_CONST_VPTR_INT BT_FN_IX_CONST_VPTR_INT[0]
> +#define BT_FN_I1_VPTR_I1_INT BT_FN_IX_VPTR_IX_INT[0]
> +#define BT_FN_VOID_VPTR_I1_INT BT_FN_VOID_VPTR_IX_INT[0]
> +#define BT_FN_BOOL_VPTR_PTR_I2_INT BT_FN_BOOL_VPTR_PTR_IX_INT[1]
> +#define BT_FN_I2_CONST_VPTR_INT BT_FN_IX_CONST_VPTR_INT[1]
> +#define BT_FN_I2_VPTR_I2_INT BT_FN_IX_VPTR_IX_INT[1]
> +#define BT_FN_VOID_VPTR_I2_INT BT_FN_VOID_VPTR_IX_INT[1]
> +#define BT_FN_BOOL_VPTR_PTR_I4_INT BT_FN_BOOL_VPTR_PTR_IX_INT[2]
> +#define BT_FN_I4_CONST_VPTR_INT BT_FN_IX_CONST_VPTR_INT[2]
> +#define BT_FN_I4_VPTR_I4_INT BT_FN_IX_VPTR_IX_INT[2]
> +#define BT_FN_VOID_VPTR_I4_INT BT_FN_VOID_VPTR_IX_INT[2]
> +#define BT_FN_BOOL_VPTR_PTR_I8_INT BT_FN_BOOL_VPTR_PTR_IX_INT[3]
> +#define BT_FN_I8_CONST_VPTR_INT BT_FN_IX_CONST_VPTR_INT[3]
> +#define BT_FN_I8_VPTR_I8_INT BT_FN_IX_VPTR_IX_INT[3]
> +#define BT_FN_VOID_VPTR_I8_INT BT_FN_VOID_VPTR_IX_INT[3]
>  #undef ATTR_NOTHROW_LEAF_LIST
>  #define ATTR_NOTHROW_LEAF_LIST ECF_NOTHROW | ECF_LEAF
>  #undef ATTR_NORETURN_NOTHROW_LEAF_LIST
> --- gcc/tsan.c.jj       2012-11-23 13:35:06.448082211 +0100
> +++ gcc/tsan.c  2012-11-23 13:50:49.750579441 +0100
> @@ -37,6 +37,7 @@ along with GCC; see the file COPYING3.
>  #include "target.h"
>  #include "cgraph.h"
>  #include "diagnostic.h"
> +#include "tree-ssa-propagate.h"
>  #include "tsan.h"
>  #include "asan.h"
>
> @@ -216,33 +217,370 @@ instrument_expr (gimple_stmt_iterator gs
>    return true;
>  }
>
> +/* Actions for sync/atomic builtin transformations.  */
> +enum tsan_atomic_action
> +{
> +  adjust_last, add_seq_cst, add_acquire, weak_cas, strong_cas,
> +  bool_cas, val_cas, lock_release, fetch_op, fetch_op_seq_cst
> +};
> +
> +/* Table how to map sync/atomic builtins to their corresponding
> +   tsan equivalents.  */
> +static struct tsan_map_atomic
> +{
> +  enum built_in_function fcode, tsan_fcode;
> +  enum tsan_atomic_action action;
> +  enum tree_code code;
> +} tsan_atomic_table[] =
> +{
> +#define TRANSFORM(fcode, tsan_fcode, action, code) \
> +  { BUILT_IN_##fcode, BUILT_IN_##tsan_fcode, action, code }
> +#define ADJUST_LAST(fcode, tsan_fcode) \
> +  TRANSFORM (fcode, tsan_fcode, adjust_last, ERROR_MARK)
> +#define ADD_SEQ_CST(fcode, tsan_fcode) \
> +  TRANSFORM (fcode, tsan_fcode, add_seq_cst, ERROR_MARK)
> +#define ADD_ACQUIRE(fcode, tsan_fcode) \
> +  TRANSFORM (fcode, tsan_fcode, add_acquire, ERROR_MARK)
> +#define WEAK_CAS(fcode, tsan_fcode) \
> +  TRANSFORM (fcode, tsan_fcode, weak_cas, ERROR_MARK)
> +#define STRONG_CAS(fcode, tsan_fcode) \
> +  TRANSFORM (fcode, tsan_fcode, strong_cas, ERROR_MARK)
> +#define BOOL_CAS(fcode, tsan_fcode) \
> +  TRANSFORM (fcode, tsan_fcode, bool_cas, ERROR_MARK)
> +#define VAL_CAS(fcode, tsan_fcode) \
> +  TRANSFORM (fcode, tsan_fcode, val_cas, ERROR_MARK)
> +#define LOCK_RELEASE(fcode, tsan_fcode) \
> +  TRANSFORM (fcode, tsan_fcode, lock_release, ERROR_MARK)
> +#define FETCH_OP(fcode, tsan_fcode, code) \
> +  TRANSFORM (fcode, tsan_fcode, fetch_op, code)
> +#define FETCH_OPS(fcode, tsan_fcode, code) \
> +  TRANSFORM (fcode, tsan_fcode, fetch_op_seq_cst, code)
> +
> +  ADJUST_LAST (ATOMIC_LOAD_1, TSAN_ATOMIC8_LOAD),
> +  ADJUST_LAST (ATOMIC_LOAD_2, TSAN_ATOMIC16_LOAD),
> +  ADJUST_LAST (ATOMIC_LOAD_4, TSAN_ATOMIC32_LOAD),
> +  ADJUST_LAST (ATOMIC_LOAD_8, TSAN_ATOMIC64_LOAD),
> +  ADJUST_LAST (ATOMIC_STORE_1, TSAN_ATOMIC8_STORE),
> +  ADJUST_LAST (ATOMIC_STORE_2, TSAN_ATOMIC16_STORE),
> +  ADJUST_LAST (ATOMIC_STORE_4, TSAN_ATOMIC32_STORE),
> +  ADJUST_LAST (ATOMIC_STORE_8, TSAN_ATOMIC64_STORE),
> +  ADJUST_LAST (ATOMIC_EXCHANGE_1, TSAN_ATOMIC8_EXCHANGE),
> +  ADJUST_LAST (ATOMIC_EXCHANGE_2, TSAN_ATOMIC16_EXCHANGE),
> +  ADJUST_LAST (ATOMIC_EXCHANGE_4, TSAN_ATOMIC32_EXCHANGE),
> +  ADJUST_LAST (ATOMIC_EXCHANGE_8, TSAN_ATOMIC64_EXCHANGE),
> +  ADJUST_LAST (ATOMIC_FETCH_ADD_1, TSAN_ATOMIC8_FETCH_ADD),
> +  ADJUST_LAST (ATOMIC_FETCH_ADD_2, TSAN_ATOMIC16_FETCH_ADD),
> +  ADJUST_LAST (ATOMIC_FETCH_ADD_4, TSAN_ATOMIC32_FETCH_ADD),
> +  ADJUST_LAST (ATOMIC_FETCH_ADD_8, TSAN_ATOMIC64_FETCH_ADD),
> +  ADJUST_LAST (ATOMIC_FETCH_SUB_1, TSAN_ATOMIC8_FETCH_SUB),
> +  ADJUST_LAST (ATOMIC_FETCH_SUB_2, TSAN_ATOMIC16_FETCH_SUB),
> +  ADJUST_LAST (ATOMIC_FETCH_SUB_4, TSAN_ATOMIC32_FETCH_SUB),
> +  ADJUST_LAST (ATOMIC_FETCH_SUB_8, TSAN_ATOMIC64_FETCH_SUB),
> +  ADJUST_LAST (ATOMIC_FETCH_AND_1, TSAN_ATOMIC8_FETCH_AND),
> +  ADJUST_LAST (ATOMIC_FETCH_AND_2, TSAN_ATOMIC16_FETCH_AND),
> +  ADJUST_LAST (ATOMIC_FETCH_AND_4, TSAN_ATOMIC32_FETCH_AND),
> +  ADJUST_LAST (ATOMIC_FETCH_AND_8, TSAN_ATOMIC64_FETCH_AND),
> +  ADJUST_LAST (ATOMIC_FETCH_OR_1, TSAN_ATOMIC8_FETCH_OR),
> +  ADJUST_LAST (ATOMIC_FETCH_OR_2, TSAN_ATOMIC16_FETCH_OR),
> +  ADJUST_LAST (ATOMIC_FETCH_OR_4, TSAN_ATOMIC32_FETCH_OR),
> +  ADJUST_LAST (ATOMIC_FETCH_OR_8, TSAN_ATOMIC64_FETCH_OR),
> +  ADJUST_LAST (ATOMIC_FETCH_XOR_1, TSAN_ATOMIC8_FETCH_XOR),
> +  ADJUST_LAST (ATOMIC_FETCH_XOR_2, TSAN_ATOMIC16_FETCH_XOR),
> +  ADJUST_LAST (ATOMIC_FETCH_XOR_4, TSAN_ATOMIC32_FETCH_XOR),
> +  ADJUST_LAST (ATOMIC_FETCH_XOR_8, TSAN_ATOMIC64_FETCH_XOR),
> +
> +  ADJUST_LAST (ATOMIC_THREAD_FENCE, TSAN_ATOMIC_THREAD_FENCE),
> +  ADJUST_LAST (ATOMIC_SIGNAL_FENCE, TSAN_ATOMIC_SIGNAL_FENCE),
> +
> +  FETCH_OP (ATOMIC_ADD_FETCH_1, TSAN_ATOMIC8_FETCH_ADD, PLUS_EXPR),
> +  FETCH_OP (ATOMIC_ADD_FETCH_2, TSAN_ATOMIC16_FETCH_ADD, PLUS_EXPR),
> +  FETCH_OP (ATOMIC_ADD_FETCH_4, TSAN_ATOMIC32_FETCH_ADD, PLUS_EXPR),
> +  FETCH_OP (ATOMIC_ADD_FETCH_8, TSAN_ATOMIC64_FETCH_ADD, PLUS_EXPR),
> +  FETCH_OP (ATOMIC_SUB_FETCH_1, TSAN_ATOMIC8_FETCH_SUB, MINUS_EXPR),
> +  FETCH_OP (ATOMIC_SUB_FETCH_2, TSAN_ATOMIC16_FETCH_SUB, MINUS_EXPR),
> +  FETCH_OP (ATOMIC_SUB_FETCH_4, TSAN_ATOMIC32_FETCH_SUB, MINUS_EXPR),
> +  FETCH_OP (ATOMIC_SUB_FETCH_8, TSAN_ATOMIC64_FETCH_SUB, MINUS_EXPR),
> +  FETCH_OP (ATOMIC_AND_FETCH_1, TSAN_ATOMIC8_FETCH_AND, BIT_AND_EXPR),
> +  FETCH_OP (ATOMIC_AND_FETCH_2, TSAN_ATOMIC16_FETCH_AND, BIT_AND_EXPR),
> +  FETCH_OP (ATOMIC_AND_FETCH_4, TSAN_ATOMIC32_FETCH_AND, BIT_AND_EXPR),
> +  FETCH_OP (ATOMIC_AND_FETCH_8, TSAN_ATOMIC64_FETCH_AND, BIT_AND_EXPR),
> +  FETCH_OP (ATOMIC_OR_FETCH_1, TSAN_ATOMIC8_FETCH_OR, BIT_IOR_EXPR),
> +  FETCH_OP (ATOMIC_OR_FETCH_2, TSAN_ATOMIC16_FETCH_OR, BIT_IOR_EXPR),
> +  FETCH_OP (ATOMIC_OR_FETCH_4, TSAN_ATOMIC32_FETCH_OR, BIT_IOR_EXPR),
> +  FETCH_OP (ATOMIC_OR_FETCH_8, TSAN_ATOMIC64_FETCH_OR, BIT_IOR_EXPR),
> +  FETCH_OP (ATOMIC_XOR_FETCH_1, TSAN_ATOMIC8_FETCH_XOR, BIT_XOR_EXPR),
> +  FETCH_OP (ATOMIC_XOR_FETCH_2, TSAN_ATOMIC16_FETCH_XOR, BIT_XOR_EXPR),
> +  FETCH_OP (ATOMIC_XOR_FETCH_4, TSAN_ATOMIC32_FETCH_XOR, BIT_XOR_EXPR),
> +  FETCH_OP (ATOMIC_XOR_FETCH_8, TSAN_ATOMIC64_FETCH_XOR, BIT_XOR_EXPR),
> +
> +  ADD_ACQUIRE (SYNC_LOCK_TEST_AND_SET_1, TSAN_ATOMIC8_EXCHANGE),
> +  ADD_ACQUIRE (SYNC_LOCK_TEST_AND_SET_2, TSAN_ATOMIC16_EXCHANGE),
> +  ADD_ACQUIRE (SYNC_LOCK_TEST_AND_SET_4, TSAN_ATOMIC32_EXCHANGE),
> +  ADD_ACQUIRE (SYNC_LOCK_TEST_AND_SET_8, TSAN_ATOMIC64_EXCHANGE),
> +
> +  ADD_SEQ_CST (SYNC_FETCH_AND_ADD_1, TSAN_ATOMIC8_FETCH_ADD),
> +  ADD_SEQ_CST (SYNC_FETCH_AND_ADD_2, TSAN_ATOMIC16_FETCH_ADD),
> +  ADD_SEQ_CST (SYNC_FETCH_AND_ADD_4, TSAN_ATOMIC32_FETCH_ADD),
> +  ADD_SEQ_CST (SYNC_FETCH_AND_ADD_8, TSAN_ATOMIC64_FETCH_ADD),
> +  ADD_SEQ_CST (SYNC_FETCH_AND_SUB_1, TSAN_ATOMIC8_FETCH_SUB),
> +  ADD_SEQ_CST (SYNC_FETCH_AND_SUB_2, TSAN_ATOMIC16_FETCH_SUB),
> +  ADD_SEQ_CST (SYNC_FETCH_AND_SUB_4, TSAN_ATOMIC32_FETCH_SUB),
> +  ADD_SEQ_CST (SYNC_FETCH_AND_SUB_8, TSAN_ATOMIC64_FETCH_SUB),
> +  ADD_SEQ_CST (SYNC_FETCH_AND_AND_1, TSAN_ATOMIC8_FETCH_AND),
> +  ADD_SEQ_CST (SYNC_FETCH_AND_AND_2, TSAN_ATOMIC16_FETCH_AND),
> +  ADD_SEQ_CST (SYNC_FETCH_AND_AND_4, TSAN_ATOMIC32_FETCH_AND),
> +  ADD_SEQ_CST (SYNC_FETCH_AND_AND_8, TSAN_ATOMIC64_FETCH_AND),
> +  ADD_SEQ_CST (SYNC_FETCH_AND_OR_1, TSAN_ATOMIC8_FETCH_OR),
> +  ADD_SEQ_CST (SYNC_FETCH_AND_OR_2, TSAN_ATOMIC16_FETCH_OR),
> +  ADD_SEQ_CST (SYNC_FETCH_AND_OR_4, TSAN_ATOMIC32_FETCH_OR),
> +  ADD_SEQ_CST (SYNC_FETCH_AND_OR_8, TSAN_ATOMIC64_FETCH_OR),
> +  ADD_SEQ_CST (SYNC_FETCH_AND_XOR_1, TSAN_ATOMIC8_FETCH_XOR),
> +  ADD_SEQ_CST (SYNC_FETCH_AND_XOR_2, TSAN_ATOMIC16_FETCH_XOR),
> +  ADD_SEQ_CST (SYNC_FETCH_AND_XOR_4, TSAN_ATOMIC32_FETCH_XOR),
> +  ADD_SEQ_CST (SYNC_FETCH_AND_XOR_8, TSAN_ATOMIC64_FETCH_XOR),
> +
> +  ADD_SEQ_CST (SYNC_SYNCHRONIZE, TSAN_ATOMIC_THREAD_FENCE),
> +
> +  FETCH_OPS (SYNC_ADD_AND_FETCH_1, TSAN_ATOMIC8_FETCH_ADD, PLUS_EXPR),
> +  FETCH_OPS (SYNC_ADD_AND_FETCH_2, TSAN_ATOMIC16_FETCH_ADD, PLUS_EXPR),
> +  FETCH_OPS (SYNC_ADD_AND_FETCH_4, TSAN_ATOMIC32_FETCH_ADD, PLUS_EXPR),
> +  FETCH_OPS (SYNC_ADD_AND_FETCH_8, TSAN_ATOMIC64_FETCH_ADD, PLUS_EXPR),
> +  FETCH_OPS (SYNC_SUB_AND_FETCH_1, TSAN_ATOMIC8_FETCH_SUB, MINUS_EXPR),
> +  FETCH_OPS (SYNC_SUB_AND_FETCH_2, TSAN_ATOMIC16_FETCH_SUB, MINUS_EXPR),
> +  FETCH_OPS (SYNC_SUB_AND_FETCH_4, TSAN_ATOMIC32_FETCH_SUB, MINUS_EXPR),
> +  FETCH_OPS (SYNC_SUB_AND_FETCH_8, TSAN_ATOMIC64_FETCH_SUB, MINUS_EXPR),
> +  FETCH_OPS (SYNC_AND_AND_FETCH_1, TSAN_ATOMIC8_FETCH_AND, BIT_AND_EXPR),
> +  FETCH_OPS (SYNC_AND_AND_FETCH_2, TSAN_ATOMIC16_FETCH_AND, BIT_AND_EXPR),
> +  FETCH_OPS (SYNC_AND_AND_FETCH_4, TSAN_ATOMIC32_FETCH_AND, BIT_AND_EXPR),
> +  FETCH_OPS (SYNC_AND_AND_FETCH_8, TSAN_ATOMIC64_FETCH_AND, BIT_AND_EXPR),
> +  FETCH_OPS (SYNC_OR_AND_FETCH_1, TSAN_ATOMIC8_FETCH_OR, BIT_IOR_EXPR),
> +  FETCH_OPS (SYNC_OR_AND_FETCH_2, TSAN_ATOMIC16_FETCH_OR, BIT_IOR_EXPR),
> +  FETCH_OPS (SYNC_OR_AND_FETCH_4, TSAN_ATOMIC32_FETCH_OR, BIT_IOR_EXPR),
> +  FETCH_OPS (SYNC_OR_AND_FETCH_8, TSAN_ATOMIC64_FETCH_OR, BIT_IOR_EXPR),
> +  FETCH_OPS (SYNC_XOR_AND_FETCH_1, TSAN_ATOMIC8_FETCH_XOR, BIT_XOR_EXPR),
> +  FETCH_OPS (SYNC_XOR_AND_FETCH_2, TSAN_ATOMIC16_FETCH_XOR, BIT_XOR_EXPR),
> +  FETCH_OPS (SYNC_XOR_AND_FETCH_4, TSAN_ATOMIC32_FETCH_XOR, BIT_XOR_EXPR),
> +  FETCH_OPS (SYNC_XOR_AND_FETCH_8, TSAN_ATOMIC64_FETCH_XOR, BIT_XOR_EXPR),
> +
> +  WEAK_CAS (ATOMIC_COMPARE_EXCHANGE_1, TSAN_ATOMIC8_COMPARE_EXCHANGE_WEAK),
> +  WEAK_CAS (ATOMIC_COMPARE_EXCHANGE_2, TSAN_ATOMIC16_COMPARE_EXCHANGE_WEAK),
> +  WEAK_CAS (ATOMIC_COMPARE_EXCHANGE_4, TSAN_ATOMIC32_COMPARE_EXCHANGE_WEAK),
> +  WEAK_CAS (ATOMIC_COMPARE_EXCHANGE_8, TSAN_ATOMIC64_COMPARE_EXCHANGE_WEAK),
> +
> +  STRONG_CAS (ATOMIC_COMPARE_EXCHANGE_1, TSAN_ATOMIC8_COMPARE_EXCHANGE_STRONG),
> +  STRONG_CAS (ATOMIC_COMPARE_EXCHANGE_2,
> +             TSAN_ATOMIC16_COMPARE_EXCHANGE_STRONG),
> +  STRONG_CAS (ATOMIC_COMPARE_EXCHANGE_4,
> +             TSAN_ATOMIC32_COMPARE_EXCHANGE_STRONG),
> +  STRONG_CAS (ATOMIC_COMPARE_EXCHANGE_8,
> +             TSAN_ATOMIC64_COMPARE_EXCHANGE_STRONG),
> +
> +  BOOL_CAS (SYNC_BOOL_COMPARE_AND_SWAP_1,
> +           TSAN_ATOMIC8_COMPARE_EXCHANGE_STRONG),
> +  BOOL_CAS (SYNC_BOOL_COMPARE_AND_SWAP_2,
> +           TSAN_ATOMIC16_COMPARE_EXCHANGE_STRONG),
> +  BOOL_CAS (SYNC_BOOL_COMPARE_AND_SWAP_4,
> +           TSAN_ATOMIC32_COMPARE_EXCHANGE_STRONG),
> +  BOOL_CAS (SYNC_BOOL_COMPARE_AND_SWAP_8,
> +           TSAN_ATOMIC64_COMPARE_EXCHANGE_STRONG),
> +
> +  VAL_CAS (SYNC_VAL_COMPARE_AND_SWAP_1, TSAN_ATOMIC8_COMPARE_EXCHANGE_STRONG),
> +  VAL_CAS (SYNC_VAL_COMPARE_AND_SWAP_2, TSAN_ATOMIC16_COMPARE_EXCHANGE_STRONG),
> +  VAL_CAS (SYNC_VAL_COMPARE_AND_SWAP_4, TSAN_ATOMIC32_COMPARE_EXCHANGE_STRONG),
> +  VAL_CAS (SYNC_VAL_COMPARE_AND_SWAP_8, TSAN_ATOMIC64_COMPARE_EXCHANGE_STRONG),
> +
> +  LOCK_RELEASE (SYNC_LOCK_RELEASE_1, TSAN_ATOMIC8_STORE),
> +  LOCK_RELEASE (SYNC_LOCK_RELEASE_2, TSAN_ATOMIC16_STORE),
> +  LOCK_RELEASE (SYNC_LOCK_RELEASE_4, TSAN_ATOMIC32_STORE),
> +  LOCK_RELEASE (SYNC_LOCK_RELEASE_8, TSAN_ATOMIC64_STORE)
> +};
> +
> +/* Instrument an atomic builtin.  */
> +
> +static void
> +instrument_builtin_call (gimple_stmt_iterator *gsi)
> +{
> +  gimple stmt = gsi_stmt (*gsi), g;
> +  tree callee = gimple_call_fndecl (stmt), last_arg, args[6], t, lhs;
> +  enum built_in_function fcode = DECL_FUNCTION_CODE (callee);
> +  unsigned int i, num = gimple_call_num_args (stmt), j;
> +  for (j = 0; j < 6 && j < num; j++)
> +    args[j] = gimple_call_arg (stmt, j);
> +  for (i = 0; i < ARRAY_SIZE (tsan_atomic_table); i++)
> +    if (fcode != tsan_atomic_table[i].fcode)
> +      continue;
> +    else
> +      {
> +       tree decl = builtin_decl_implicit (tsan_atomic_table[i].tsan_fcode);
> +       if (decl == NULL_TREE)
> +         return;
> +       switch (tsan_atomic_table[i].action)
> +         {
> +         case adjust_last:
> +         case fetch_op:
> +           last_arg = gimple_call_arg (stmt, num - 1);
> +           if (!host_integerp (last_arg, 1)
> +               || (unsigned HOST_WIDE_INT) tree_low_cst (last_arg, 1)
> +                  > MEMMODEL_SEQ_CST)
> +             return;
> +           gimple_call_set_fndecl (stmt, decl);
> +           gimple_call_set_arg (stmt, num - 1,
> +                                build_int_cst (NULL_TREE,
> +                                               1 << tree_low_cst (last_arg,
> +                                                                  1)));
> +           update_stmt (stmt);
> +           if (tsan_atomic_table[i].action == fetch_op)
> +             {
> +               args[1] = gimple_call_arg (stmt, 1);
> +               goto adjust_result;
> +             }
> +           return;
> +         case add_seq_cst:
> +         case add_acquire:
> +         case fetch_op_seq_cst:
> +           gcc_assert (num <= 2);
> +           for (j = 0; j < num; j++)
> +             args[j] = gimple_call_arg (stmt, j);
> +           for (; j < 2; j++)
> +             args[j] = NULL_TREE;
> +           args[num] = build_int_cst (NULL_TREE,
> +                                      1 << (tsan_atomic_table[i].action
> +                                            != add_acquire
> +                                            ? MEMMODEL_SEQ_CST
> +                                            : MEMMODEL_ACQUIRE));
> +           update_gimple_call (gsi, decl, num + 1, args[0], args[1], args[2]);
> +           stmt = gsi_stmt (*gsi);
> +           if (tsan_atomic_table[i].action == fetch_op_seq_cst)
> +             {
> +             adjust_result:
> +               lhs = gimple_call_lhs (stmt);
> +               if (lhs == NULL_TREE)
> +                 return;
> +               if (!useless_type_conversion_p (TREE_TYPE (lhs),
> +                                               TREE_TYPE (args[1])))
> +                 {
> +                   tree var = make_ssa_name (TREE_TYPE (lhs), NULL);
> +                   g = gimple_build_assign_with_ops (NOP_EXPR, var,
> +                                                     args[1], NULL_TREE);
> +                   gsi_insert_after (gsi, g, GSI_NEW_STMT);
> +                   args[1] = var;
> +                 }
> +               gimple_call_set_lhs (stmt,
> +                                    make_ssa_name (TREE_TYPE (lhs), NULL));
> +               g = gimple_build_assign_with_ops (tsan_atomic_table[i].code,
> +                                                 lhs, gimple_call_lhs (stmt),
> +                                                 args[1]);
> +               update_stmt (stmt);
> +               gsi_insert_after (gsi, g, GSI_NEW_STMT);
> +             }
> +           return;
> +         case weak_cas:
> +           if (!integer_nonzerop (gimple_call_arg (stmt, 3)))
> +             continue;
> +           /* FALLTHRU */
> +         case strong_cas:
> +           gcc_assert (num == 6);
> +           for (j = 0; j < 6; j++)
> +             args[j] = gimple_call_arg (stmt, j);
> +           if (!host_integerp (args[4], 1)
> +               || (unsigned HOST_WIDE_INT) tree_low_cst (args[4], 1)
> +                  > MEMMODEL_SEQ_CST)
> +             return;
> +           update_gimple_call (gsi, decl, 4, args[0], args[1], args[2],
> +                               build_int_cst (NULL_TREE,
> +                                              1 << tree_low_cst (args[4],
> +                                                                 1)));
> +           return;
> +         case bool_cas:
> +         case val_cas:
> +           gcc_assert (num == 3);
> +           for (j = 0; j < 3; j++)
> +             args[j] = gimple_call_arg (stmt, j);
> +           t = TYPE_ARG_TYPES (TREE_TYPE (decl));
> +           t = TREE_VALUE (TREE_CHAIN (TREE_CHAIN (t)));
> +           t = create_tmp_var (t, NULL);
> +           mark_addressable (t);
> +           if (!useless_type_conversion_p (TREE_TYPE (t),
> +                                           TREE_TYPE (args[1])))
> +             {
> +               g = gimple_build_assign_with_ops (NOP_EXPR,
> +                                                 make_ssa_name (TREE_TYPE (t),
> +                                                                NULL),
> +                                                 args[1], NULL_TREE);
> +               gsi_insert_before (gsi, g, GSI_SAME_STMT);
> +               args[1] = gimple_assign_lhs (g);
> +             }
> +           g = gimple_build_assign (t, args[1]);
> +           gsi_insert_before (gsi, g, GSI_SAME_STMT);
> +           lhs = gimple_call_lhs (stmt);
> +           update_gimple_call (gsi, decl, 4, args[0],
> +                               build_fold_addr_expr (t), args[2],
> +                               build_int_cst (NULL_TREE,
> +                                              1 << MEMMODEL_SEQ_CST));
> +           if (tsan_atomic_table[i].action == val_cas && lhs)
> +             {
> +               tree cond;
> +               stmt = gsi_stmt (*gsi);
> +               g = gimple_build_assign (make_ssa_name (TREE_TYPE (t), NULL),
> +                                        t);
> +               gsi_insert_after (gsi, g, GSI_NEW_STMT);
> +               t = make_ssa_name (TREE_TYPE (TREE_TYPE (decl)), stmt);
> +               cond = build2 (NE_EXPR, boolean_type_node, t,
> +                              build_int_cst (TREE_TYPE (t), 0));
> +               g = gimple_build_assign_with_ops (COND_EXPR, lhs, cond,
> +                                                 args[1],
> +                                                 gimple_assign_lhs (g));
> +               gimple_call_set_lhs (stmt, t);
> +               update_stmt (stmt);
> +               gsi_insert_after (gsi, g, GSI_NEW_STMT);
> +             }
> +           return;
> +         case lock_release:
> +           gcc_assert (num == 1);
> +           t = TYPE_ARG_TYPES (TREE_TYPE (decl));
> +           t = TREE_VALUE (TREE_CHAIN (t));
> +           update_gimple_call (gsi, decl, 3, gimple_call_arg (stmt, 0),
> +                               build_int_cst (t, 0),
> +                               build_int_cst (NULL_TREE,
> +                                              1 << MEMMODEL_RELEASE));
> +           return;
> +         default:
> +           continue;
> +         }
> +      }
> +}
> +
>  /* Instruments the gimple pointed to by GSI. Return
>     true if func entry/exit should be instrumented.  */
>
>  static bool
> -instrument_gimple (gimple_stmt_iterator gsi)
> +instrument_gimple (gimple_stmt_iterator *gsi)
>  {
>    gimple stmt;
>    tree rhs, lhs;
>    bool instrumented = false;
>
> -  stmt = gsi_stmt (gsi);
> +  stmt = gsi_stmt (*gsi);
>    if (is_gimple_call (stmt)
>        && (gimple_call_fndecl (stmt)
>           != builtin_decl_implicit (BUILT_IN_TSAN_INIT)))
> -    return true;
> +    {
> +      if (is_gimple_builtin_call (stmt))
> +       instrument_builtin_call (gsi);
> +      return true;
> +    }
>    else if (is_gimple_assign (stmt)
>            && !gimple_clobber_p (stmt))
>      {
>        if (gimple_store_p (stmt))
>         {
>           lhs = gimple_assign_lhs (stmt);
> -         instrumented = instrument_expr (gsi, lhs, true);
> +         instrumented = instrument_expr (*gsi, lhs, true);
>         }
>        if (gimple_assign_load_p (stmt))
>         {
>           rhs = gimple_assign_rhs1 (stmt);
> -         instrumented = instrument_expr (gsi, rhs, false);
> +         instrumented = instrument_expr (*gsi, rhs, false);
>         }
>      }
>    return instrumented;
> @@ -260,7 +598,7 @@ instrument_memory_accesses (void)
>
>    FOR_EACH_BB (bb)
>      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> -      fentry_exit_instrument |= instrument_gimple (gsi);
> +      fentry_exit_instrument |= instrument_gimple (&gsi);
>    return fentry_exit_instrument;
>  }
>
> --- gcc/builtin-types.def.jj    2012-11-23 10:31:37.866377460 +0100
> +++ gcc/builtin-types.def       2012-11-23 13:36:00.576761947 +0100
> @@ -447,6 +447,14 @@ DEF_FUNCTION_TYPE_4 (BT_FN_VOID_SIZE_VPT
>                      BT_VOLATILE_PTR, BT_PTR, BT_INT)
>  DEF_FUNCTION_TYPE_4 (BT_FN_VOID_SIZE_CONST_VPTR_PTR_INT, BT_VOID, BT_SIZE,
>                      BT_CONST_VOLATILE_PTR, BT_PTR, BT_INT)
> +DEF_FUNCTION_TYPE_4 (BT_FN_BOOL_VPTR_PTR_I1_INT,
> +                    BT_BOOL, BT_VOLATILE_PTR, BT_PTR, BT_I1, BT_INT)
> +DEF_FUNCTION_TYPE_4 (BT_FN_BOOL_VPTR_PTR_I2_INT,
> +                    BT_BOOL, BT_VOLATILE_PTR, BT_PTR, BT_I2, BT_INT)
> +DEF_FUNCTION_TYPE_4 (BT_FN_BOOL_VPTR_PTR_I4_INT,
> +                    BT_BOOL, BT_VOLATILE_PTR, BT_PTR, BT_I4, BT_INT)
> +DEF_FUNCTION_TYPE_4 (BT_FN_BOOL_VPTR_PTR_I8_INT,
> +                    BT_BOOL, BT_VOLATILE_PTR, BT_PTR, BT_I8, BT_INT)
>
>  DEF_FUNCTION_TYPE_5 (BT_FN_INT_STRING_INT_SIZE_CONST_STRING_VALIST_ARG,
>                      BT_INT, BT_STRING, BT_INT, BT_SIZE, BT_CONST_STRING,


Hi Jakub,

I've looked at the patch, but I can't actually say much about the
code, as I know nothing about gcc internals.  If it works, then great.
Rubber stamp LGTM.
Jakub Jelinek Nov. 27, 2012, 8:23 a.m. UTC | #12
On Tue, Nov 27, 2012 at 12:13:42PM +0400, Dmitry Vyukov wrote:
> I've added 128-bit atomic ops:
> http://llvm.org/viewvc/llvm-project?view=rev&revision=168683

Thanks.
> 
> Refactored atomic ops so that the atomic operation itself is done
> under the mutex:
> http://llvm.org/viewvc/llvm-project?view=rev&revision=168682

This is IMHO very wrong.  It is fine if you hold some mutex over the atomic
operation, but you do need to perform the operation atomically, e.g. you can't
rely on all atomic accesses to that memory location being performed from
-fsanitize=thread compiled code (they could be from non-instrumented code
from e.g. other libraries, or from user written assembly, etc.).
By not doing the operation atomically, the other code might observe
inconsistent state, or the tsan non-atomic code might be doing the wrong
thing.  One thing is that the tsan analysis will be right, but another
thing is that the program itself might misbehave.
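As a concrete illustration of that point (just a sketch, the actual libtsan
code differs and the shadow-state handling is elided), the interceptor can
keep its mutex for the bookkeeping, but the user-visible update itself still
has to be a real atomic RMW:

  /* Hypothetical helper, for illustration only.  */
  template<typename T>
  T instrumented_fetch_add (volatile T *a, T v)
  {
    /* ... lock the tsan mutex, update shadow state ... */
    /* Wrong: a plain read-modify-write under the mutex only; code not
       compiled with -fsanitize=thread can still do a real atomic RMW on
       *a concurrently and see a torn or lost update:
         T old = *a; *a = old + v;  */
    /* Right: keep the hardware operation atomic even with the mutex held.  */
    T old = __atomic_fetch_add (a, v, __ATOMIC_SEQ_CST);
    /* ... unlock the tsan mutex ... */
    return old;
  }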

> And added atomic nand operation:
> http://llvm.org/viewvc/llvm-project?view=rev&revision=168584

Thanks.  Can you please also add the memory range read/write functions?
That could be used for builtins (e.g. if the user writes
  __builtin_memcpy (&a, &b, sizeof (a));
in the code, no -fno-builtin or similar actually disables that), for
compiler generated builtins - e.g. structure copies
  struct large a, b;
  ...
  a = b;
and also for accesses of sizes that aren't supported by the library
(consider 32-byte or 64-byte vectors etc.).
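For illustration, assuming range entry points of the shape
__tsan_read_range (addr, size) / __tsan_write_range (addr, size) (the exact
prototypes in the library may differ), the instrumentation of such a
structure copy could conceptually become:

  /* Hypothetical prototypes, just for this sketch.  */
  extern "C" void __tsan_read_range (void *addr, unsigned long size);
  extern "C" void __tsan_write_range (void *addr, unsigned long size);

  struct large { char buf[256]; };

  void
  copy (struct large *a, struct large *b)
  {
    __tsan_read_range (b, sizeof *b);   /* the load side of *a = *b */
    __tsan_write_range (a, sizeof *a);  /* the store side of *a = *b */
    *a = *b;
  }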

	Jakub
Dmitry Vyukov Nov. 27, 2012, 8:47 a.m. UTC | #13
On Tue, Nov 27, 2012 at 12:23 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Tue, Nov 27, 2012 at 12:13:42PM +0400, Dmitry Vyukov wrote:
>> I've added 128-bit atomic ops:
>> http://llvm.org/viewvc/llvm-project?view=rev&revision=168683
>
> Thanks.
>>
>> Refactored atomic ops so that the atomic operation itself is done
>> under the mutex:
>> http://llvm.org/viewvc/llvm-project?view=rev&revision=168682
>
> This is IMHO very wrong.  It is fine if you hold some mutex over the atomic
> operation, but you do need to perform the operation atomicly, e.g. you can't
> rely that all atomic accesses to that memory location are performed from
> -fsanitize=thread compiled code (it could be from non-instrumented code
> from e.g. other libraries, could be from user written assembly, etc.).
> By not doing the operation atomicly, the other code might observe
> inconsistent state, or the tsan non-atomic code might be doing the wrong
> thing.  One thing is that the tsan analysis will be right, but another
> thing is that the program itself might misbehave.


Yes, you are right.
I think I've done them atomically initially because of things like
FUTEX_WAKE_OP. I will fix that.


>> And added atomic nand operation:
>> http://llvm.org/viewvc/llvm-project?view=rev&revision=168584
>
> Thanks.  Can you please also add the memory range read/write functions?
> That could be used both for builtins (e.g. if user writes
>   __builtin_memcpy (&a, &b, sizeof (a));
> in his code, no -fno-builtin or similar actually disables that), compiler
> generated builtins - e.g. structure copies
> struct large a, b;
>   ...
>   a = b;
> and also for accesses of sizes that aren't supported by the library
> (consider 32-byte or 64-byte vectors etc.).

Done:
http://llvm.org/viewvc/llvm-project?view=rev&revision=168692
Dmitry Vyukov Nov. 27, 2012, 8:49 a.m. UTC | #14
On Tue, Nov 27, 2012 at 12:47 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Tue, Nov 27, 2012 at 12:23 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>> On Tue, Nov 27, 2012 at 12:13:42PM +0400, Dmitry Vyukov wrote:
>>> I've added 128-bit atomic ops:
>>> http://llvm.org/viewvc/llvm-project?view=rev&revision=168683
>>
>> Thanks.
>>>
>>> Refactored atomic ops so that the atomic operation itself is done
>>> under the mutex:
>>> http://llvm.org/viewvc/llvm-project?view=rev&revision=168682
>>
>> This is IMHO very wrong.  It is fine if you hold some mutex over the atomic
>> operation, but you do need to perform the operation atomicly, e.g. you can't
>> rely that all atomic accesses to that memory location are performed from
>> -fsanitize=thread compiled code (it could be from non-instrumented code
>> from e.g. other libraries, could be from user written assembly, etc.).
>> By not doing the operation atomicly, the other code might observe
>> inconsistent state, or the tsan non-atomic code might be doing the wrong
>> thing.  One thing is that the tsan analysis will be right, but another
>> thing is that the program itself might misbehave.
>
>
> Yes, you are right.
> I think I've done them atomically initially because of things like
> FUTEX_WAKE_OP. I will fix that.
>
>
>>> And added atomic nand operation:
>>> http://llvm.org/viewvc/llvm-project?view=rev&revision=168584
>>
>> Thanks.  Can you please also add the memory range read/write functions?
>> That could be used both for builtins (e.g. if user writes
>>   __builtin_memcpy (&a, &b, sizeof (a));
>> in his code, no -fno-builtin or similar actually disables that), compiler
>> generated builtins - e.g. structure copies
>> struct large a, b;
>>   ...
>>   a = b;
>> and also for accesses of sizes that aren't supported by the library
>> (consider 32-byte or 64-byte vectors etc.).
>
> Done:
> http://llvm.org/viewvc/llvm-project?view=rev&revision=168692


Kostya is going to do gcc<->llvm library sync today.
Jakub Jelinek Nov. 27, 2012, 12:27 p.m. UTC | #15
On Tue, Nov 27, 2012 at 09:23:30AM +0100, Jakub Jelinek wrote:
> On Tue, Nov 27, 2012 at 12:13:42PM +0400, Dmitry Vyukov wrote:
> > I've added 128-bit atomic ops:
> > http://llvm.org/viewvc/llvm-project?view=rev&revision=168683
> 
> Thanks.

+#if (defined(__clang__) && defined(__clang_major__) \
+      && defined(__clang_minor__) && __clang__ >= 1 && __clang_major__ >= 3 \
+      && __clang_minor__ >= 3) \
+    || (defined(__GNUC__) && defined(__GNUC_MINOR__) \
+      && defined(__GNUC_PATCHLEVEL__) && __GNUC__ >= 4 && __GNUC_MINOR__ >= 6 \
+      && __GNUC_PATCHLEVEL__ >= 3)

is wrong: one thing is that __int128 is available only on a couple of
architectures (i?86/x86_64/ia64 or so), and more importantly, the above
isn't true for, say, GCC 4.7.0, because __GNUC_PATCHLEVEL__ is then < 3.
So, either you want something like
#define GCC_VERSION ((__GNUC__) * 10000 + (__GNUC_MINOR__) * 100 + (__GNUC_PATCHLEVEL__))
and then you can test like #if GCC_VERSION >= 40603
or, for the int128 case, it is much better just to test
defined(__GNUC__) && defined(__SIZEOF_INT128__)
(no idea whether clang defines the same macro; if it does, you could
just test for the presence of the sizeof macro).
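Assembled, the two suggested variants would look roughly like this (a sketch
only; GCC_VERSION is just the conventional helper macro named above, and the
typedef is purely illustrative):

  #define GCC_VERSION \
    (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__)

  /* Generic "GCC 4.6.3 or newer" test.  */
  #if defined(__GNUC__) && GCC_VERSION >= 40603
  /* ... */
  #endif

  /* For __int128 the feature-test macro is preferable, since availability
     depends on the target architecture, not just the compiler version.  */
  #if defined(__GNUC__) && defined(__SIZEOF_INT128__)
  typedef __int128 a128;   /* illustrative only */
  #endif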

	Jakub
Dmitry Vyukov Nov. 27, 2012, 12:35 p.m. UTC | #16
On Tue, Nov 27, 2012 at 4:27 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Tue, Nov 27, 2012 at 09:23:30AM +0100, Jakub Jelinek wrote:
>> On Tue, Nov 27, 2012 at 12:13:42PM +0400, Dmitry Vyukov wrote:
>> > I've added 128-bit atomic ops:
>> > http://llvm.org/viewvc/llvm-project?view=rev&revision=168683
>>
>> Thanks.
>
> +#if (defined(__clang__) && defined(__clang_major__) \
> +      && defined(__clang_minor__) && __clang__ >= 1 && __clang_major__ >= 3 \
> +      && __clang_minor__ >= 3) \
> +    || (defined(__GNUC__) && defined(__GNUC_MINOR__) \
> +      && defined(__GNUC_PATCHLEVEL__) && __GNUC__ >= 4 && __GNUC_MINOR__ >= 6 \
> +      && __GNUC_PATCHLEVEL__ >= 3)
>
> is wrong, one thing is that __int128 is available only on a couple of
> architectures (i?86/x86_64/ia64 or so), and more importantly, the above
> isn't true for say GCC 4.7.0, because __GNUC_PATCHLEVEL__ is then < 3.
> So, either you want something like
> #define GCC_VERSION ((__GNUC__) * 10000 + (__GNUC_MINOR__) * 100 + (__GNUC_PATCHLEVEL__))
> and then you can test like #if GCC_VERSION >= 40603
> or, for the int128 case, much better just to test
> defined(__GNUC__) && defined(__SIZEOF_INT128__)
> (no idea if clang doesn't define the same macro, if it does, you could
> just test for presence of the sizeof macro).

clang does not support the macro.
What about
#if defined(__SIZEOF_INT128__) || defined(__clang__)
?

thanks!
Jakub Jelinek Nov. 27, 2012, 12:39 p.m. UTC | #17
On Tue, Nov 27, 2012 at 04:35:33PM +0400, Dmitry Vyukov wrote:
> On Tue, Nov 27, 2012 at 4:27 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> > On Tue, Nov 27, 2012 at 09:23:30AM +0100, Jakub Jelinek wrote:
> >> On Tue, Nov 27, 2012 at 12:13:42PM +0400, Dmitry Vyukov wrote:
> >> > I've added 128-bit atomic ops:
> >> > http://llvm.org/viewvc/llvm-project?view=rev&revision=168683
> >>
> >> Thanks.
> >
> > +#if (defined(__clang__) && defined(__clang_major__) \
> > +      && defined(__clang_minor__) && __clang__ >= 1 && __clang_major__ >= 3 \
> > +      && __clang_minor__ >= 3) \
> > +    || (defined(__GNUC__) && defined(__GNUC_MINOR__) \
> > +      && defined(__GNUC_PATCHLEVEL__) && __GNUC__ >= 4 && __GNUC_MINOR__ >= 6 \
> > +      && __GNUC_PATCHLEVEL__ >= 3)
> >
> > is wrong, one thing is that __int128 is available only on a couple of
> > architectures (i?86/x86_64/ia64 or so), and more importantly, the above
> > isn't true for say GCC 4.7.0, because __GNUC_PATCHLEVEL__ is then < 3.
> > So, either you want something like
> > #define GCC_VERSION ((__GNUC__) * 10000 + (__GNUC_MINOR__) * 100 + (__GNUC_PATCHLEVEL__))
> > and then you can test like #if GCC_VERSION >= 40603
> > or, for the int128 case, much better just to test
> > defined(__GNUC__) && defined(__SIZEOF_INT128__)
> > (no idea if clang doesn't define the same macro, if it does, you could
> > just test for presence of the sizeof macro).
> 
> clang does not support the macro.
> what about
> #if defined(__SIZEOF_INT128__) || defined(__clang__)
> ?

Then for __clang__ you need to do a version check I guess (and the same
thing I wrote above applies - consider clang 4.0; I don't care about that
though), but for GCC sure, just the #ifdef __SIZEOF_INT128__ is what lots
of tests do.

	Jakub
Dmitry Vyukov Nov. 27, 2012, 12:53 p.m. UTC | #18
On Tue, Nov 27, 2012 at 4:39 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Tue, Nov 27, 2012 at 04:35:33PM +0400, Dmitry Vyukov wrote:
>> On Tue, Nov 27, 2012 at 4:27 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>> > On Tue, Nov 27, 2012 at 09:23:30AM +0100, Jakub Jelinek wrote:
>> >> On Tue, Nov 27, 2012 at 12:13:42PM +0400, Dmitry Vyukov wrote:
>> >> > I've added 128-bit atomic ops:
>> >> > http://llvm.org/viewvc/llvm-project?view=rev&revision=168683
>> >>
>> >> Thanks.
>> >
>> > +#if (defined(__clang__) && defined(__clang_major__) \
>> > +      && defined(__clang_minor__) && __clang__ >= 1 && __clang_major__ >= 3 \
>> > +      && __clang_minor__ >= 3) \
>> > +    || (defined(__GNUC__) && defined(__GNUC_MINOR__) \
>> > +      && defined(__GNUC_PATCHLEVEL__) && __GNUC__ >= 4 && __GNUC_MINOR__ >= 6 \
>> > +      && __GNUC_PATCHLEVEL__ >= 3)
>> >
>> > is wrong, one thing is that __int128 is available only on a couple of
>> > architectures (i?86/x86_64/ia64 or so), and more importantly, the above
>> > isn't true for say GCC 4.7.0, because __GNUC_PATCHLEVEL__ is then < 3.
>> > So, either you want something like
>> > #define GCC_VERSION ((__GNUC__) * 10000 + (__GNUC_MINOR__) * 100 + (__GNUC_PATCHLEVEL__))
>> > and then you can test like #if GCC_VERSION >= 40603
>> > or, for the int128 case, much better just to test
>> > defined(__GNUC__) && defined(__SIZEOF_INT128__)
>> > (no idea if clang doesn't define the same macro, if it does, you could
>> > just test for presence of the sizeof macro).
>>
>> clang does not support the macro.
>> what about
>> #if defined(__SIZEOF_INT128__) || defined(__clang__)
>> ?
>
> Then for __clang__ you need to do a version check I guess (and, the same
> what I wrote applies, consider clang 4.0; don't care about that though),
> but for GCC sure, just the #ifdef __SIZEOF_INT128__ is what lots of tests do.


I've written:

#if defined(__SIZEOF_INT128__) \
    || (__clang_major__ * 100 + __clang_minor__ >= 302)

thanks!
Jakub Jelinek Nov. 30, 2012, 4:38 p.m. UTC | #19
On Tue, Nov 27, 2012 at 12:47:50PM +0400, Dmitry Vyukov wrote:
> Yes, you are right.
> I think I've done them atomically initially because of things like
> FUTEX_WAKE_OP. I will fix that.

Any progress on that?

BTW, the current
template<typename T> T func_nand(T v, T op) {
  return ~v & op;
}
is wrong not just in not being atomic (similarly to the others), but
also because __sync_fetch_and_nand (and __atomic etc.) are defined as
  return ~(v & op);
instead (GCC < 4.4 did it wrongly as ~v & op; though).
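A sketch of both fixes combined (the documented ~(v & op) semantics plus an
actually atomic update via a compare-and-swap loop; not the actual libtsan
code, and the helper name is made up):

  template<typename T>
  T fetch_nand (volatile T *a, T op)
  {
    T cur = __atomic_load_n (a, __ATOMIC_RELAXED);
    /* Retry until the nand result is installed atomically; on failure
       __atomic_compare_exchange_n reloads the current value into cur.  */
    while (!__atomic_compare_exchange_n (a, &cur, (T) ~(cur & op),
                                         /*weak=*/true,
                                         __ATOMIC_SEQ_CST, __ATOMIC_RELAXED))
      ;
    return cur;
  }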

	Jakub
Dmitry Vyukov Nov. 30, 2012, 5 p.m. UTC | #20
On Fri, Nov 30, 2012 at 8:38 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Tue, Nov 27, 2012 at 12:47:50PM +0400, Dmitry Vyukov wrote:
>> Yes, you are right.
>> I think I've done them atomically initially because of things like
>> FUTEX_WAKE_OP. I will fix that.
>
> Any progress on that?
>
> BTW, the current
> template<typename T> T func_nand(T v, T op) {
>   return ~v & op;
> }
> is wrong not just by not being atomic (similarly to others), but
> furthermore because __sync_fetch_and_nand (and __atomic etc.) are
>   return ~(v & op);
> instead (GCC < 4.4 did it wrongly as ~v & op; though).


Hi,

No progress for now.  It's on my todo list, but I am busy with other tasks.
It should not block you on the compiler side, and I think it should work most
of the time, so I will fix it in the near future.
Dmitry Vyukov Nov. 30, 2012, 5 p.m. UTC | #21
and thanks for the nand catch!

On Fri, Nov 30, 2012 at 9:00 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Fri, Nov 30, 2012 at 8:38 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>> On Tue, Nov 27, 2012 at 12:47:50PM +0400, Dmitry Vyukov wrote:
>>> Yes, you are right.
>>> I think I've done them atomically initially because of things like
>>> FUTEX_WAKE_OP. I will fix that.
>>
>> Any progress on that?
>>
>> BTW, the current
>> template<typename T> T func_nand(T v, T op) {
>>   return ~v & op;
>> }
>> is wrong not just by not being atomic (similarly to others), but
>> furthermore because __sync_fetch_and_nand (and __atomic etc.) are
>>   return ~(v & op);
>> instead (GCC < 4.4 did it wrongly as ~v & op; though).
>
>
> Hi,
>
> No progress for now. It's in my todo list, but I am busy with other tasks.
> It should not block you on compiler side, and I think work most of the
> time, so I will fix in near future.
diff mbox

Patch

--- gcc/Makefile.in.jj	2012-11-23 10:31:37.861377311 +0100
+++ gcc/Makefile.in	2012-11-23 13:36:00.578761997 +0100
@@ -2234,7 +2234,8 @@  tsan.o : $(CONFIG_H) $(SYSTEM_H) $(TREE_
    $(TM_H) coretypes.h $(TREE_DUMP_H) $(TREE_PASS_H) $(CGRAPH_H) $(GGC_H) \
    $(BASIC_BLOCK_H) $(FLAGS_H) $(FUNCTION_H) \
    $(TM_P_H) $(TREE_FLOW_H) $(DIAGNOSTIC_CORE_H) $(GIMPLE_H) tree-iterator.h \
-   intl.h cfghooks.h output.h options.h c-family/c-common.h tsan.h asan.h
+   intl.h cfghooks.h output.h options.h c-family/c-common.h tsan.h asan.h \
+   tree-ssa-propagate.h
 tree-ssa-tail-merge.o: tree-ssa-tail-merge.c \
    $(SYSTEM_H) $(CONFIG_H) coretypes.h $(TM_H) $(BITMAP_H) \
    $(FLAGS_H) $(TM_P_H) $(BASIC_BLOCK_H) \
--- gcc/sanitizer.def.jj	2012-11-23 10:31:37.859377232 +0100
+++ gcc/sanitizer.def	2012-11-23 13:36:00.576761947 +0100
@@ -57,3 +57,148 @@  DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_WRIT
 		      BT_FN_VOID_PTR, ATTR_NOTHROW_LEAF_LIST)
 DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_WRITE16, "__tsan_write16",
 		      BT_FN_VOID_PTR, ATTR_NOTHROW_LEAF_LIST)
+
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC8_LOAD,
+		      "__tsan_atomic8_load",
+		      BT_FN_I1_CONST_VPTR_INT, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC16_LOAD,
+		      "__tsan_atomic16_load",
+		      BT_FN_I2_CONST_VPTR_INT, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC32_LOAD,
+		      "__tsan_atomic32_load",
+		      BT_FN_I4_CONST_VPTR_INT, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC64_LOAD,
+		      "__tsan_atomic64_load",
+		      BT_FN_I8_CONST_VPTR_INT, ATTR_NOTHROW_LEAF_LIST)
+
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC8_STORE,
+		      "__tsan_atomic8_store",
+		      BT_FN_VOID_VPTR_I1_INT, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC16_STORE,
+		      "__tsan_atomic16_store",
+		      BT_FN_VOID_VPTR_I2_INT, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC32_STORE,
+		      "__tsan_atomic32_store",
+		      BT_FN_VOID_VPTR_I4_INT, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC64_STORE,
+		      "__tsan_atomic64_store",
+		      BT_FN_VOID_VPTR_I8_INT, ATTR_NOTHROW_LEAF_LIST)
+
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC8_EXCHANGE,
+		      "__tsan_atomic8_exchange",
+		      BT_FN_I1_VPTR_I1_INT, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC16_EXCHANGE,
+		      "__tsan_atomic16_exchange",
+		      BT_FN_I2_VPTR_I2_INT, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC32_EXCHANGE,
+		      "__tsan_atomic32_exchange",
+		      BT_FN_I4_VPTR_I4_INT, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC64_EXCHANGE,
+		      "__tsan_atomic64_exchange",
+		      BT_FN_I8_VPTR_I8_INT, ATTR_NOTHROW_LEAF_LIST)
+
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC8_FETCH_ADD,
+		      "__tsan_atomic8_fetch_add",
+		      BT_FN_I1_VPTR_I1_INT, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC16_FETCH_ADD,
+		      "__tsan_atomic16_fetch_add",
+		      BT_FN_I2_VPTR_I2_INT, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC32_FETCH_ADD,
+		      "__tsan_atomic32_fetch_add",
+		      BT_FN_I4_VPTR_I4_INT, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC64_FETCH_ADD,
+		      "__tsan_atomic64_fetch_add",
+		      BT_FN_I8_VPTR_I8_INT, ATTR_NOTHROW_LEAF_LIST)
+
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC8_FETCH_SUB,
+		      "__tsan_atomic8_fetch_sub",
+		      BT_FN_I1_VPTR_I1_INT, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC16_FETCH_SUB,
+		      "__tsan_atomic16_fetch_sub",
+		      BT_FN_I2_VPTR_I2_INT, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC32_FETCH_SUB,
+		      "__tsan_atomic32_fetch_sub",
+		      BT_FN_I4_VPTR_I4_INT, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC64_FETCH_SUB,
+		      "__tsan_atomic64_fetch_sub",
+		      BT_FN_I8_VPTR_I8_INT, ATTR_NOTHROW_LEAF_LIST)
+
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC8_FETCH_AND,
+		      "__tsan_atomic8_fetch_and",
+		      BT_FN_I1_VPTR_I1_INT, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC16_FETCH_AND,
+		      "__tsan_atomic16_fetch_and",
+		      BT_FN_I2_VPTR_I2_INT, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC32_FETCH_AND,
+		      "__tsan_atomic32_fetch_and",
+		      BT_FN_I4_VPTR_I4_INT, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC64_FETCH_AND,
+		      "__tsan_atomic64_fetch_and",
+		      BT_FN_I8_VPTR_I8_INT, ATTR_NOTHROW_LEAF_LIST)
+
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC8_FETCH_OR,
+		      "__tsan_atomic8_fetch_or",
+		      BT_FN_I1_VPTR_I1_INT, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC16_FETCH_OR,
+		      "__tsan_atomic16_fetch_or",
+		      BT_FN_I2_VPTR_I2_INT, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC32_FETCH_OR,
+		      "__tsan_atomic32_fetch_or",
+		      BT_FN_I4_VPTR_I4_INT, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC64_FETCH_OR,
+		      "__tsan_atomic64_fetch_or",
+		      BT_FN_I8_VPTR_I8_INT, ATTR_NOTHROW_LEAF_LIST)
+
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC8_FETCH_XOR,
+		      "__tsan_atomic8_fetch_xor",
+		      BT_FN_I1_VPTR_I1_INT, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC16_FETCH_XOR,
+		      "__tsan_atomic16_fetch_xor",
+		      BT_FN_I2_VPTR_I2_INT, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC32_FETCH_XOR,
+		      "__tsan_atomic32_fetch_xor",
+		      BT_FN_I4_VPTR_I4_INT, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC64_FETCH_XOR,
+		      "__tsan_atomic64_fetch_xor",
+		      BT_FN_I8_VPTR_I8_INT, ATTR_NOTHROW_LEAF_LIST)
+
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC8_COMPARE_EXCHANGE_STRONG,
+		      "__tsan_atomic8_compare_exchange_strong",
+		      BT_FN_BOOL_VPTR_PTR_I1_INT,
+		      ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC16_COMPARE_EXCHANGE_STRONG,
+		      "__tsan_atomic16_compare_exchange_strong",
+		      BT_FN_BOOL_VPTR_PTR_I2_INT,
+		      ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC32_COMPARE_EXCHANGE_STRONG,
+		      "__tsan_atomic32_compare_exchange_strong",
+		      BT_FN_BOOL_VPTR_PTR_I4_INT,
+		      ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC64_COMPARE_EXCHANGE_STRONG,
+		      "__tsan_atomic64_compare_exchange_strong",
+		      BT_FN_BOOL_VPTR_PTR_I8_INT,
+		      ATTR_NOTHROW_LEAF_LIST)
+
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC8_COMPARE_EXCHANGE_WEAK,
+		      "__tsan_atomic8_compare_exchange_weak",
+		      BT_FN_BOOL_VPTR_PTR_I1_INT,
+		      ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC16_COMPARE_EXCHANGE_WEAK,
+		      "__tsan_atomic16_compare_exchange_weak",
+		      BT_FN_BOOL_VPTR_PTR_I2_INT,
+		      ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC32_COMPARE_EXCHANGE_WEAK,
+		      "__tsan_atomic32_compare_exchange_weak",
+		      BT_FN_BOOL_VPTR_PTR_I4_INT,
+		      ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC64_COMPARE_EXCHANGE_WEAK,
+		      "__tsan_atomic64_compare_exchange_weak",
+		      BT_FN_BOOL_VPTR_PTR_I8_INT,
+		      ATTR_NOTHROW_LEAF_LIST)
+
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC_THREAD_FENCE,
+		      "__tsan_atomic_thread_fence",
+		      BT_FN_VOID_INT, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_ATOMIC_SIGNAL_FENCE,
+		      "__tsan_atomic_signal_fence",
+		      BT_FN_VOID_INT, ATTR_NOTHROW_LEAF_LIST)
--- gcc/asan.c.jj	2012-11-23 10:31:37.863377378 +0100
+++ gcc/asan.c	2012-11-23 13:36:00.579762019 +0100
@@ -1484,6 +1484,53 @@  initialize_sanitizer_builtins (void)
     = build_function_type_list (void_type_node, ptr_type_node,
 				build_nonstandard_integer_type (POINTER_SIZE,
 								1), NULL_TREE);
+  tree BT_FN_VOID_INT
+    = build_function_type_list (void_type_node, integer_type_node, NULL_TREE);
+  tree BT_FN_BOOL_VPTR_PTR_IX_INT[4];
+  tree BT_FN_IX_CONST_VPTR_INT[4];
+  tree BT_FN_IX_VPTR_IX_INT[4];
+  tree BT_FN_VOID_VPTR_IX_INT[4];
+  tree vptr
+    = build_pointer_type (build_qualified_type (void_type_node,
+						TYPE_QUAL_VOLATILE));
+  tree cvptr
+    = build_pointer_type (build_qualified_type (void_type_node,
+						TYPE_QUAL_VOLATILE
+						|TYPE_QUAL_CONST));
+  tree boolt
+    = lang_hooks.types.type_for_size (BOOL_TYPE_SIZE, 1);
+  int i;
+  for (i = 0; i < 4; i++)
+    {
+      tree ix = build_nonstandard_integer_type (BITS_PER_UNIT * (1 << i), 1);
+      BT_FN_BOOL_VPTR_PTR_IX_INT[i]
+	= build_function_type_list (boolt, vptr, ptr_type_node, ix,
+				    integer_type_node, NULL_TREE);
+      BT_FN_IX_CONST_VPTR_INT[i]
+	= build_function_type_list (ix, cvptr, integer_type_node, NULL_TREE);
+      BT_FN_IX_VPTR_IX_INT[i]
+	= build_function_type_list (ix, vptr, ix, integer_type_node,
+				    NULL_TREE);
+      BT_FN_VOID_VPTR_IX_INT[i]
+	= build_function_type_list (void_type_node, vptr, ix,
+				    integer_type_node, NULL_TREE);
+    }
+#define BT_FN_BOOL_VPTR_PTR_I1_INT BT_FN_BOOL_VPTR_PTR_IX_INT[0]
+#define BT_FN_I1_CONST_VPTR_INT BT_FN_IX_CONST_VPTR_INT[0]
+#define BT_FN_I1_VPTR_I1_INT BT_FN_IX_VPTR_IX_INT[0]
+#define BT_FN_VOID_VPTR_I1_INT BT_FN_VOID_VPTR_IX_INT[0]
+#define BT_FN_BOOL_VPTR_PTR_I2_INT BT_FN_BOOL_VPTR_PTR_IX_INT[1]
+#define BT_FN_I2_CONST_VPTR_INT BT_FN_IX_CONST_VPTR_INT[1]
+#define BT_FN_I2_VPTR_I2_INT BT_FN_IX_VPTR_IX_INT[1]
+#define BT_FN_VOID_VPTR_I2_INT BT_FN_VOID_VPTR_IX_INT[1]
+#define BT_FN_BOOL_VPTR_PTR_I4_INT BT_FN_BOOL_VPTR_PTR_IX_INT[2]
+#define BT_FN_I4_CONST_VPTR_INT BT_FN_IX_CONST_VPTR_INT[2]
+#define BT_FN_I4_VPTR_I4_INT BT_FN_IX_VPTR_IX_INT[2]
+#define BT_FN_VOID_VPTR_I4_INT BT_FN_VOID_VPTR_IX_INT[2]
+#define BT_FN_BOOL_VPTR_PTR_I8_INT BT_FN_BOOL_VPTR_PTR_IX_INT[3]
+#define BT_FN_I8_CONST_VPTR_INT BT_FN_IX_CONST_VPTR_INT[3]
+#define BT_FN_I8_VPTR_I8_INT BT_FN_IX_VPTR_IX_INT[3]
+#define BT_FN_VOID_VPTR_I8_INT BT_FN_VOID_VPTR_IX_INT[3]
 #undef ATTR_NOTHROW_LEAF_LIST
 #define ATTR_NOTHROW_LEAF_LIST ECF_NOTHROW | ECF_LEAF
 #undef ATTR_NORETURN_NOTHROW_LEAF_LIST
--- gcc/tsan.c.jj	2012-11-23 13:35:06.448082211 +0100
+++ gcc/tsan.c	2012-11-23 13:50:49.750579441 +0100
@@ -37,6 +37,7 @@  along with GCC; see the file COPYING3.
 #include "target.h"
 #include "cgraph.h"
 #include "diagnostic.h"
+#include "tree-ssa-propagate.h"
 #include "tsan.h"
 #include "asan.h"
 
@@ -216,33 +217,370 @@  instrument_expr (gimple_stmt_iterator gs
   return true;
 }
 
+/* Actions for sync/atomic builtin transformations.  */
+enum tsan_atomic_action
+{
+  adjust_last, add_seq_cst, add_acquire, weak_cas, strong_cas,
+  bool_cas, val_cas, lock_release, fetch_op, fetch_op_seq_cst
+};
+
+/* Table how to map sync/atomic builtins to their corresponding
+   tsan equivalents.  */
+static struct tsan_map_atomic
+{
+  enum built_in_function fcode, tsan_fcode;
+  enum tsan_atomic_action action;
+  enum tree_code code;
+} tsan_atomic_table[] =
+{
+#define TRANSFORM(fcode, tsan_fcode, action, code) \
+  { BUILT_IN_##fcode, BUILT_IN_##tsan_fcode, action, code }
+#define ADJUST_LAST(fcode, tsan_fcode) \
+  TRANSFORM (fcode, tsan_fcode, adjust_last, ERROR_MARK)
+#define ADD_SEQ_CST(fcode, tsan_fcode) \
+  TRANSFORM (fcode, tsan_fcode, add_seq_cst, ERROR_MARK)
+#define ADD_ACQUIRE(fcode, tsan_fcode) \
+  TRANSFORM (fcode, tsan_fcode, add_acquire, ERROR_MARK)
+#define WEAK_CAS(fcode, tsan_fcode) \
+  TRANSFORM (fcode, tsan_fcode, weak_cas, ERROR_MARK)
+#define STRONG_CAS(fcode, tsan_fcode) \
+  TRANSFORM (fcode, tsan_fcode, strong_cas, ERROR_MARK)
+#define BOOL_CAS(fcode, tsan_fcode) \
+  TRANSFORM (fcode, tsan_fcode, bool_cas, ERROR_MARK)
+#define VAL_CAS(fcode, tsan_fcode) \
+  TRANSFORM (fcode, tsan_fcode, val_cas, ERROR_MARK)
+#define LOCK_RELEASE(fcode, tsan_fcode) \
+  TRANSFORM (fcode, tsan_fcode, lock_release, ERROR_MARK)
+#define FETCH_OP(fcode, tsan_fcode, code) \
+  TRANSFORM (fcode, tsan_fcode, fetch_op, code)
+#define FETCH_OPS(fcode, tsan_fcode, code) \
+  TRANSFORM (fcode, tsan_fcode, fetch_op_seq_cst, code)
+
+  ADJUST_LAST (ATOMIC_LOAD_1, TSAN_ATOMIC8_LOAD),
+  ADJUST_LAST (ATOMIC_LOAD_2, TSAN_ATOMIC16_LOAD),
+  ADJUST_LAST (ATOMIC_LOAD_4, TSAN_ATOMIC32_LOAD),
+  ADJUST_LAST (ATOMIC_LOAD_8, TSAN_ATOMIC64_LOAD),
+  ADJUST_LAST (ATOMIC_STORE_1, TSAN_ATOMIC8_STORE),
+  ADJUST_LAST (ATOMIC_STORE_2, TSAN_ATOMIC16_STORE),
+  ADJUST_LAST (ATOMIC_STORE_4, TSAN_ATOMIC32_STORE),
+  ADJUST_LAST (ATOMIC_STORE_8, TSAN_ATOMIC64_STORE),
+  ADJUST_LAST (ATOMIC_EXCHANGE_1, TSAN_ATOMIC8_EXCHANGE),
+  ADJUST_LAST (ATOMIC_EXCHANGE_2, TSAN_ATOMIC16_EXCHANGE),
+  ADJUST_LAST (ATOMIC_EXCHANGE_4, TSAN_ATOMIC32_EXCHANGE),
+  ADJUST_LAST (ATOMIC_EXCHANGE_8, TSAN_ATOMIC64_EXCHANGE),
+  ADJUST_LAST (ATOMIC_FETCH_ADD_1, TSAN_ATOMIC8_FETCH_ADD),
+  ADJUST_LAST (ATOMIC_FETCH_ADD_2, TSAN_ATOMIC16_FETCH_ADD),
+  ADJUST_LAST (ATOMIC_FETCH_ADD_4, TSAN_ATOMIC32_FETCH_ADD),
+  ADJUST_LAST (ATOMIC_FETCH_ADD_8, TSAN_ATOMIC64_FETCH_ADD),
+  ADJUST_LAST (ATOMIC_FETCH_SUB_1, TSAN_ATOMIC8_FETCH_SUB),
+  ADJUST_LAST (ATOMIC_FETCH_SUB_2, TSAN_ATOMIC16_FETCH_SUB),
+  ADJUST_LAST (ATOMIC_FETCH_SUB_4, TSAN_ATOMIC32_FETCH_SUB),
+  ADJUST_LAST (ATOMIC_FETCH_SUB_8, TSAN_ATOMIC64_FETCH_SUB),
+  ADJUST_LAST (ATOMIC_FETCH_AND_1, TSAN_ATOMIC8_FETCH_AND),
+  ADJUST_LAST (ATOMIC_FETCH_AND_2, TSAN_ATOMIC16_FETCH_AND),
+  ADJUST_LAST (ATOMIC_FETCH_AND_4, TSAN_ATOMIC32_FETCH_AND),
+  ADJUST_LAST (ATOMIC_FETCH_AND_8, TSAN_ATOMIC64_FETCH_AND),
+  ADJUST_LAST (ATOMIC_FETCH_OR_1, TSAN_ATOMIC8_FETCH_OR),
+  ADJUST_LAST (ATOMIC_FETCH_OR_2, TSAN_ATOMIC16_FETCH_OR),
+  ADJUST_LAST (ATOMIC_FETCH_OR_4, TSAN_ATOMIC32_FETCH_OR),
+  ADJUST_LAST (ATOMIC_FETCH_OR_8, TSAN_ATOMIC64_FETCH_OR),
+  ADJUST_LAST (ATOMIC_FETCH_XOR_1, TSAN_ATOMIC8_FETCH_XOR),
+  ADJUST_LAST (ATOMIC_FETCH_XOR_2, TSAN_ATOMIC16_FETCH_XOR),
+  ADJUST_LAST (ATOMIC_FETCH_XOR_4, TSAN_ATOMIC32_FETCH_XOR),
+  ADJUST_LAST (ATOMIC_FETCH_XOR_8, TSAN_ATOMIC64_FETCH_XOR),
+
+  ADJUST_LAST (ATOMIC_THREAD_FENCE, TSAN_ATOMIC_THREAD_FENCE),
+  ADJUST_LAST (ATOMIC_SIGNAL_FENCE, TSAN_ATOMIC_SIGNAL_FENCE),
+
+  FETCH_OP (ATOMIC_ADD_FETCH_1, TSAN_ATOMIC8_FETCH_ADD, PLUS_EXPR),
+  FETCH_OP (ATOMIC_ADD_FETCH_2, TSAN_ATOMIC16_FETCH_ADD, PLUS_EXPR),
+  FETCH_OP (ATOMIC_ADD_FETCH_4, TSAN_ATOMIC32_FETCH_ADD, PLUS_EXPR),
+  FETCH_OP (ATOMIC_ADD_FETCH_8, TSAN_ATOMIC64_FETCH_ADD, PLUS_EXPR),
+  FETCH_OP (ATOMIC_SUB_FETCH_1, TSAN_ATOMIC8_FETCH_SUB, MINUS_EXPR),
+  FETCH_OP (ATOMIC_SUB_FETCH_2, TSAN_ATOMIC16_FETCH_SUB, MINUS_EXPR),
+  FETCH_OP (ATOMIC_SUB_FETCH_4, TSAN_ATOMIC32_FETCH_SUB, MINUS_EXPR),
+  FETCH_OP (ATOMIC_SUB_FETCH_8, TSAN_ATOMIC64_FETCH_SUB, MINUS_EXPR),
+  FETCH_OP (ATOMIC_AND_FETCH_1, TSAN_ATOMIC8_FETCH_AND, BIT_AND_EXPR),
+  FETCH_OP (ATOMIC_AND_FETCH_2, TSAN_ATOMIC16_FETCH_AND, BIT_AND_EXPR),
+  FETCH_OP (ATOMIC_AND_FETCH_4, TSAN_ATOMIC32_FETCH_AND, BIT_AND_EXPR),
+  FETCH_OP (ATOMIC_AND_FETCH_8, TSAN_ATOMIC64_FETCH_AND, BIT_AND_EXPR),
+  FETCH_OP (ATOMIC_OR_FETCH_1, TSAN_ATOMIC8_FETCH_OR, BIT_IOR_EXPR),
+  FETCH_OP (ATOMIC_OR_FETCH_2, TSAN_ATOMIC16_FETCH_OR, BIT_IOR_EXPR),
+  FETCH_OP (ATOMIC_OR_FETCH_4, TSAN_ATOMIC32_FETCH_OR, BIT_IOR_EXPR),
+  FETCH_OP (ATOMIC_OR_FETCH_8, TSAN_ATOMIC64_FETCH_OR, BIT_IOR_EXPR),
+  FETCH_OP (ATOMIC_XOR_FETCH_1, TSAN_ATOMIC8_FETCH_XOR, BIT_XOR_EXPR),
+  FETCH_OP (ATOMIC_XOR_FETCH_2, TSAN_ATOMIC16_FETCH_XOR, BIT_XOR_EXPR),
+  FETCH_OP (ATOMIC_XOR_FETCH_4, TSAN_ATOMIC32_FETCH_XOR, BIT_XOR_EXPR),
+  FETCH_OP (ATOMIC_XOR_FETCH_8, TSAN_ATOMIC64_FETCH_XOR, BIT_XOR_EXPR),
+
+  ADD_ACQUIRE (SYNC_LOCK_TEST_AND_SET_1, TSAN_ATOMIC8_EXCHANGE),
+  ADD_ACQUIRE (SYNC_LOCK_TEST_AND_SET_2, TSAN_ATOMIC16_EXCHANGE),
+  ADD_ACQUIRE (SYNC_LOCK_TEST_AND_SET_4, TSAN_ATOMIC32_EXCHANGE),
+  ADD_ACQUIRE (SYNC_LOCK_TEST_AND_SET_8, TSAN_ATOMIC64_EXCHANGE),
+
+  ADD_SEQ_CST (SYNC_FETCH_AND_ADD_1, TSAN_ATOMIC8_FETCH_ADD),
+  ADD_SEQ_CST (SYNC_FETCH_AND_ADD_2, TSAN_ATOMIC16_FETCH_ADD),
+  ADD_SEQ_CST (SYNC_FETCH_AND_ADD_4, TSAN_ATOMIC32_FETCH_ADD),
+  ADD_SEQ_CST (SYNC_FETCH_AND_ADD_8, TSAN_ATOMIC64_FETCH_ADD),
+  ADD_SEQ_CST (SYNC_FETCH_AND_SUB_1, TSAN_ATOMIC8_FETCH_SUB),
+  ADD_SEQ_CST (SYNC_FETCH_AND_SUB_2, TSAN_ATOMIC16_FETCH_SUB),
+  ADD_SEQ_CST (SYNC_FETCH_AND_SUB_4, TSAN_ATOMIC32_FETCH_SUB),
+  ADD_SEQ_CST (SYNC_FETCH_AND_SUB_8, TSAN_ATOMIC64_FETCH_SUB),
+  ADD_SEQ_CST (SYNC_FETCH_AND_AND_1, TSAN_ATOMIC8_FETCH_AND),
+  ADD_SEQ_CST (SYNC_FETCH_AND_AND_2, TSAN_ATOMIC16_FETCH_AND),
+  ADD_SEQ_CST (SYNC_FETCH_AND_AND_4, TSAN_ATOMIC32_FETCH_AND),
+  ADD_SEQ_CST (SYNC_FETCH_AND_AND_8, TSAN_ATOMIC64_FETCH_AND),
+  ADD_SEQ_CST (SYNC_FETCH_AND_OR_1, TSAN_ATOMIC8_FETCH_OR),
+  ADD_SEQ_CST (SYNC_FETCH_AND_OR_2, TSAN_ATOMIC16_FETCH_OR),
+  ADD_SEQ_CST (SYNC_FETCH_AND_OR_4, TSAN_ATOMIC32_FETCH_OR),
+  ADD_SEQ_CST (SYNC_FETCH_AND_OR_8, TSAN_ATOMIC64_FETCH_OR),
+  ADD_SEQ_CST (SYNC_FETCH_AND_XOR_1, TSAN_ATOMIC8_FETCH_XOR),
+  ADD_SEQ_CST (SYNC_FETCH_AND_XOR_2, TSAN_ATOMIC16_FETCH_XOR),
+  ADD_SEQ_CST (SYNC_FETCH_AND_XOR_4, TSAN_ATOMIC32_FETCH_XOR),
+  ADD_SEQ_CST (SYNC_FETCH_AND_XOR_8, TSAN_ATOMIC64_FETCH_XOR),
+
+  ADD_SEQ_CST (SYNC_SYNCHRONIZE, TSAN_ATOMIC_THREAD_FENCE),
+
+  FETCH_OPS (SYNC_ADD_AND_FETCH_1, TSAN_ATOMIC8_FETCH_ADD, PLUS_EXPR),
+  FETCH_OPS (SYNC_ADD_AND_FETCH_2, TSAN_ATOMIC16_FETCH_ADD, PLUS_EXPR),
+  FETCH_OPS (SYNC_ADD_AND_FETCH_4, TSAN_ATOMIC32_FETCH_ADD, PLUS_EXPR),
+  FETCH_OPS (SYNC_ADD_AND_FETCH_8, TSAN_ATOMIC64_FETCH_ADD, PLUS_EXPR),
+  FETCH_OPS (SYNC_SUB_AND_FETCH_1, TSAN_ATOMIC8_FETCH_SUB, MINUS_EXPR),
+  FETCH_OPS (SYNC_SUB_AND_FETCH_2, TSAN_ATOMIC16_FETCH_SUB, MINUS_EXPR),
+  FETCH_OPS (SYNC_SUB_AND_FETCH_4, TSAN_ATOMIC32_FETCH_SUB, MINUS_EXPR),
+  FETCH_OPS (SYNC_SUB_AND_FETCH_8, TSAN_ATOMIC64_FETCH_SUB, MINUS_EXPR),
+  FETCH_OPS (SYNC_AND_AND_FETCH_1, TSAN_ATOMIC8_FETCH_AND, BIT_AND_EXPR),
+  FETCH_OPS (SYNC_AND_AND_FETCH_2, TSAN_ATOMIC16_FETCH_AND, BIT_AND_EXPR),
+  FETCH_OPS (SYNC_AND_AND_FETCH_4, TSAN_ATOMIC32_FETCH_AND, BIT_AND_EXPR),
+  FETCH_OPS (SYNC_AND_AND_FETCH_8, TSAN_ATOMIC64_FETCH_AND, BIT_AND_EXPR),
+  FETCH_OPS (SYNC_OR_AND_FETCH_1, TSAN_ATOMIC8_FETCH_OR, BIT_IOR_EXPR),
+  FETCH_OPS (SYNC_OR_AND_FETCH_2, TSAN_ATOMIC16_FETCH_OR, BIT_IOR_EXPR),
+  FETCH_OPS (SYNC_OR_AND_FETCH_4, TSAN_ATOMIC32_FETCH_OR, BIT_IOR_EXPR),
+  FETCH_OPS (SYNC_OR_AND_FETCH_8, TSAN_ATOMIC64_FETCH_OR, BIT_IOR_EXPR),
+  FETCH_OPS (SYNC_XOR_AND_FETCH_1, TSAN_ATOMIC8_FETCH_XOR, BIT_XOR_EXPR),
+  FETCH_OPS (SYNC_XOR_AND_FETCH_2, TSAN_ATOMIC16_FETCH_XOR, BIT_XOR_EXPR),
+  FETCH_OPS (SYNC_XOR_AND_FETCH_4, TSAN_ATOMIC32_FETCH_XOR, BIT_XOR_EXPR),
+  FETCH_OPS (SYNC_XOR_AND_FETCH_8, TSAN_ATOMIC64_FETCH_XOR, BIT_XOR_EXPR),
+
+  WEAK_CAS (ATOMIC_COMPARE_EXCHANGE_1, TSAN_ATOMIC8_COMPARE_EXCHANGE_WEAK),
+  WEAK_CAS (ATOMIC_COMPARE_EXCHANGE_2, TSAN_ATOMIC16_COMPARE_EXCHANGE_WEAK),
+  WEAK_CAS (ATOMIC_COMPARE_EXCHANGE_4, TSAN_ATOMIC32_COMPARE_EXCHANGE_WEAK),
+  WEAK_CAS (ATOMIC_COMPARE_EXCHANGE_8, TSAN_ATOMIC64_COMPARE_EXCHANGE_WEAK),
+
+  STRONG_CAS (ATOMIC_COMPARE_EXCHANGE_1, TSAN_ATOMIC8_COMPARE_EXCHANGE_STRONG),
+  STRONG_CAS (ATOMIC_COMPARE_EXCHANGE_2,
+	      TSAN_ATOMIC16_COMPARE_EXCHANGE_STRONG),
+  STRONG_CAS (ATOMIC_COMPARE_EXCHANGE_4,
+	      TSAN_ATOMIC32_COMPARE_EXCHANGE_STRONG),
+  STRONG_CAS (ATOMIC_COMPARE_EXCHANGE_8,
+	      TSAN_ATOMIC64_COMPARE_EXCHANGE_STRONG),
+
+  BOOL_CAS (SYNC_BOOL_COMPARE_AND_SWAP_1,
+	    TSAN_ATOMIC8_COMPARE_EXCHANGE_STRONG),
+  BOOL_CAS (SYNC_BOOL_COMPARE_AND_SWAP_2,
+	    TSAN_ATOMIC16_COMPARE_EXCHANGE_STRONG),
+  BOOL_CAS (SYNC_BOOL_COMPARE_AND_SWAP_4,
+	    TSAN_ATOMIC32_COMPARE_EXCHANGE_STRONG),
+  BOOL_CAS (SYNC_BOOL_COMPARE_AND_SWAP_8,
+	    TSAN_ATOMIC64_COMPARE_EXCHANGE_STRONG),
+
+  VAL_CAS (SYNC_VAL_COMPARE_AND_SWAP_1, TSAN_ATOMIC8_COMPARE_EXCHANGE_STRONG),
+  VAL_CAS (SYNC_VAL_COMPARE_AND_SWAP_2, TSAN_ATOMIC16_COMPARE_EXCHANGE_STRONG),
+  VAL_CAS (SYNC_VAL_COMPARE_AND_SWAP_4, TSAN_ATOMIC32_COMPARE_EXCHANGE_STRONG),
+  VAL_CAS (SYNC_VAL_COMPARE_AND_SWAP_8, TSAN_ATOMIC64_COMPARE_EXCHANGE_STRONG),
+
+  LOCK_RELEASE (SYNC_LOCK_RELEASE_1, TSAN_ATOMIC8_STORE),
+  LOCK_RELEASE (SYNC_LOCK_RELEASE_2, TSAN_ATOMIC16_STORE),
+  LOCK_RELEASE (SYNC_LOCK_RELEASE_4, TSAN_ATOMIC32_STORE),
+  LOCK_RELEASE (SYNC_LOCK_RELEASE_8, TSAN_ATOMIC64_STORE)
+};
+
+/* Instrument an atomic builtin: replace the call at GSI with a call to
+   the corresponding __tsan_atomic* function from tsan_atomic_table,
+   adjusting the arguments and the result as needed.  */
+
+static void
+instrument_builtin_call (gimple_stmt_iterator *gsi)
+{
+  gimple stmt = gsi_stmt (*gsi), g;
+  tree callee = gimple_call_fndecl (stmt), last_arg, args[6], t, lhs;
+  enum built_in_function fcode = DECL_FUNCTION_CODE (callee);
+  unsigned int i, num = gimple_call_num_args (stmt), j;
+  for (j = 0; j < 6 && j < num; j++)
+    args[j] = gimple_call_arg (stmt, j);
+  for (i = 0; i < ARRAY_SIZE (tsan_atomic_table); i++)
+    if (fcode != tsan_atomic_table[i].fcode)
+      continue;
+    else
+      {
+	tree decl = builtin_decl_implicit (tsan_atomic_table[i].tsan_fcode);
+	if (decl == NULL_TREE)
+	  return;
+	switch (tsan_atomic_table[i].action)
+	  {
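+	  /* The tsan entry points encode the memory model as 1 << model
+	     rather than the 0 .. 5 values the builtins use, so re-encode
+	     the constant last argument and swap the fndecl.  Give up if
+	     the model is not a plain constant in that range (e.g. when
+	     __ATOMIC_HLE_* bits are set).  */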
+	  case adjust_last:
+	  case fetch_op:
+	    last_arg = gimple_call_arg (stmt, num - 1);
+	    if (!host_integerp (last_arg, 1)
+		|| (unsigned HOST_WIDE_INT) tree_low_cst (last_arg, 1)
+		   > MEMMODEL_SEQ_CST)
+	      return;
+	    gimple_call_set_fndecl (stmt, decl);
+	    gimple_call_set_arg (stmt, num - 1,
+				 build_int_cst (NULL_TREE,
+						1 << tree_low_cst (last_arg,
+								   1)));
+	    update_stmt (stmt);
+	    if (tsan_atomic_table[i].action == fetch_op)
+	      {
+		args[1] = gimple_call_arg (stmt, 1);
+		goto adjust_result;
+	      }
+	    return;
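+	  /* The __sync_* builtins take no memory model argument; append
+	     __ATOMIC_SEQ_CST (or __ATOMIC_ACQUIRE for test_and_set) in
+	     the 1 << model encoding the tsan entry points expect.  */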
+	  case add_seq_cst:
+	  case add_acquire:
+	  case fetch_op_seq_cst:
+	    gcc_assert (num <= 2);
+	    for (j = 0; j < num; j++)
+	      args[j] = gimple_call_arg (stmt, j);
+	    for (; j < 2; j++)
+	      args[j] = NULL_TREE;
+	    args[num] = build_int_cst (NULL_TREE,
+				       1 << (tsan_atomic_table[i].action
+					     != add_acquire
+					     ? MEMMODEL_SEQ_CST
+					     : MEMMODEL_ACQUIRE));
+	    update_gimple_call (gsi, decl, num + 1, args[0], args[1], args[2]);
+	    stmt = gsi_stmt (*gsi);
+	    if (tsan_atomic_table[i].action == fetch_op_seq_cst)
+	      {
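+		/* libtsan only provides fetch-and-op entry points, so
+		   compute the op-and-fetch result by applying the
+		   operation to the value returned by the call.  */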
+	      adjust_result:
+		lhs = gimple_call_lhs (stmt);
+		if (lhs == NULL_TREE)
+		  return;
+		if (!useless_type_conversion_p (TREE_TYPE (lhs),
+						TREE_TYPE (args[1])))
+		  {
+		    tree var = make_ssa_name (TREE_TYPE (lhs), NULL);
+		    g = gimple_build_assign_with_ops (NOP_EXPR, var,
+						      args[1], NULL_TREE);
+		    gsi_insert_after (gsi, g, GSI_NEW_STMT);
+		    args[1] = var;
+		  }
+		gimple_call_set_lhs (stmt,
+				     make_ssa_name (TREE_TYPE (lhs), NULL));
+		g = gimple_build_assign_with_ops (tsan_atomic_table[i].code,
+						  lhs, gimple_call_lhs (stmt),
+						  args[1]);
+		update_stmt (stmt);
+		gsi_insert_after (gsi, g, GSI_NEW_STMT);
+	      }
+	    return;
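+	  /* __atomic_compare_exchange_*: the tsan entry points take only
+	     a single memory model, so the failure model is ignored; use
+	     the weak entry point only when the weak argument is a nonzero
+	     constant, otherwise continue to the strong_cas table entry.  */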
+	  case weak_cas:
+	    if (!integer_nonzerop (gimple_call_arg (stmt, 3)))
+	      continue;
+	    /* FALLTHRU */
+	  case strong_cas:
+	    gcc_assert (num == 6);
+	    for (j = 0; j < 6; j++)
+	      args[j] = gimple_call_arg (stmt, j);
+	    if (!host_integerp (args[4], 1)
+		|| (unsigned HOST_WIDE_INT) tree_low_cst (args[4], 1)
+		   > MEMMODEL_SEQ_CST)
+	      return;
+	    update_gimple_call (gsi, decl, 4, args[0], args[1], args[2],
+				build_int_cst (NULL_TREE,
+					       1 << tree_low_cst (args[4],
+								  1)));
+	    return;
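+	  /* __sync_{bool,val}_compare_and_swap pass the expected value
+	     directly, while the tsan entry points want its address, so
+	     copy it into an addressable temporary.  For the val variant
+	     reconstruct the old value from that temporary when the CAS
+	     fails.  */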
+	  case bool_cas:
+	  case val_cas:
+	    gcc_assert (num == 3);
+	    for (j = 0; j < 3; j++)
+	      args[j] = gimple_call_arg (stmt, j);
+	    t = TYPE_ARG_TYPES (TREE_TYPE (decl));
+	    t = TREE_VALUE (TREE_CHAIN (TREE_CHAIN (t)));
+	    t = create_tmp_var (t, NULL);
+	    mark_addressable (t);
+	    if (!useless_type_conversion_p (TREE_TYPE (t),
+					    TREE_TYPE (args[1])))
+	      {
+		g = gimple_build_assign_with_ops (NOP_EXPR,
+						  make_ssa_name (TREE_TYPE (t),
+								 NULL),
+						  args[1], NULL_TREE);
+		gsi_insert_before (gsi, g, GSI_SAME_STMT);
+		args[1] = gimple_assign_lhs (g);
+	      }
+	    g = gimple_build_assign (t, args[1]);
+	    gsi_insert_before (gsi, g, GSI_SAME_STMT);
+	    lhs = gimple_call_lhs (stmt);
+	    update_gimple_call (gsi, decl, 4, args[0],
+				build_fold_addr_expr (t), args[2],
+				build_int_cst (NULL_TREE,
+					       1 << MEMMODEL_SEQ_CST));
+	    if (tsan_atomic_table[i].action == val_cas && lhs)
+	      {
+		tree cond;
+		stmt = gsi_stmt (*gsi);
+		g = gimple_build_assign (make_ssa_name (TREE_TYPE (t), NULL),
+					 t);
+		gsi_insert_after (gsi, g, GSI_NEW_STMT);
+		t = make_ssa_name (TREE_TYPE (TREE_TYPE (decl)), stmt);
+		cond = build2 (NE_EXPR, boolean_type_node, t,
+			       build_int_cst (TREE_TYPE (t), 0));
+		g = gimple_build_assign_with_ops (COND_EXPR, lhs, cond,
+						  args[1],
+						  gimple_assign_lhs (g));
+		gimple_call_set_lhs (stmt, t);
+		update_stmt (stmt);
+		gsi_insert_after (gsi, g, GSI_NEW_STMT);
+	      }
+	    return;
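+	  /* __sync_lock_release is a release store of 0.  */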
+	  case lock_release:
+	    gcc_assert (num == 1);
+	    t = TYPE_ARG_TYPES (TREE_TYPE (decl));
+	    t = TREE_VALUE (TREE_CHAIN (t));
+	    update_gimple_call (gsi, decl, 3, gimple_call_arg (stmt, 0),
+				build_int_cst (t, 0),
+				build_int_cst (NULL_TREE,
+					       1 << MEMMODEL_RELEASE));
+	    return;
+	  default:
+	    continue;
+	  }
+      }
+}
+
 /* Instruments the gimple pointed to by GSI. Return
    true if func entry/exit should be instrumented.  */
 
 static bool
-instrument_gimple (gimple_stmt_iterator gsi)
+instrument_gimple (gimple_stmt_iterator *gsi)
 {
   gimple stmt;
   tree rhs, lhs;
   bool instrumented = false;
 
-  stmt = gsi_stmt (gsi);
+  stmt = gsi_stmt (*gsi);
   if (is_gimple_call (stmt)
       && (gimple_call_fndecl (stmt)
 	  != builtin_decl_implicit (BUILT_IN_TSAN_INIT)))
-    return true;
+    {
+      if (is_gimple_builtin_call (stmt))
+	instrument_builtin_call (gsi);
+      return true;
+    }
   else if (is_gimple_assign (stmt)
 	   && !gimple_clobber_p (stmt))
     {
       if (gimple_store_p (stmt))
 	{
 	  lhs = gimple_assign_lhs (stmt);
-	  instrumented = instrument_expr (gsi, lhs, true);
+	  instrumented = instrument_expr (*gsi, lhs, true);
 	}
       if (gimple_assign_load_p (stmt))
 	{
 	  rhs = gimple_assign_rhs1 (stmt);
-	  instrumented = instrument_expr (gsi, rhs, false);
+	  instrumented = instrument_expr (*gsi, rhs, false);
 	}
     }
   return instrumented;
@@ -260,7 +598,7 @@  instrument_memory_accesses (void)
 
   FOR_EACH_BB (bb)
     for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
-      fentry_exit_instrument |= instrument_gimple (gsi);
+      fentry_exit_instrument |= instrument_gimple (&gsi);
   return fentry_exit_instrument;
 }
 
--- gcc/builtin-types.def.jj	2012-11-23 10:31:37.866377460 +0100
+++ gcc/builtin-types.def	2012-11-23 13:36:00.576761947 +0100
@@ -447,6 +447,14 @@  DEF_FUNCTION_TYPE_4 (BT_FN_VOID_SIZE_VPT
 		     BT_VOLATILE_PTR, BT_PTR, BT_INT)
 DEF_FUNCTION_TYPE_4 (BT_FN_VOID_SIZE_CONST_VPTR_PTR_INT, BT_VOID, BT_SIZE,
 		     BT_CONST_VOLATILE_PTR, BT_PTR, BT_INT)
+DEF_FUNCTION_TYPE_4 (BT_FN_BOOL_VPTR_PTR_I1_INT,
+		     BT_BOOL, BT_VOLATILE_PTR, BT_PTR, BT_I1, BT_INT)
+DEF_FUNCTION_TYPE_4 (BT_FN_BOOL_VPTR_PTR_I2_INT,
+		     BT_BOOL, BT_VOLATILE_PTR, BT_PTR, BT_I2, BT_INT)
+DEF_FUNCTION_TYPE_4 (BT_FN_BOOL_VPTR_PTR_I4_INT,
+		     BT_BOOL, BT_VOLATILE_PTR, BT_PTR, BT_I4, BT_INT)
+DEF_FUNCTION_TYPE_4 (BT_FN_BOOL_VPTR_PTR_I8_INT,
+		     BT_BOOL, BT_VOLATILE_PTR, BT_PTR, BT_I8, BT_INT)
 
 DEF_FUNCTION_TYPE_5 (BT_FN_INT_STRING_INT_SIZE_CONST_STRING_VALIST_ARG,
 		     BT_INT, BT_STRING, BT_INT, BT_SIZE, BT_CONST_STRING,