diff mbox

Fix sporadic failure in g++.dg/tsan/aligned_vs_unaligned_race.C

Message ID D86529BC-DB94-481A-AE60-913D7E2B8D7F@comcast.net
State New
Headers show

Commit Message

Mike Stump Jan. 5, 2015, 10:01 p.m. UTC
On Jan 5, 2015, at 12:58 PM, Mike Stump <mikestump@comcast.net> wrote:
> So, to help you out, I tried my hand at wiring up the extra library code:

So, my tsan build finally finished, and I could try this with a real test case.  I ran it 20 times, and got 11 fails with:

$ i=20; while let i--; do make RUNTESTFLAGS=tsan.exp=aligned_vs_unaligned_race.C check-g++ | grep FAIL; done
FAIL: g++.dg/tsan/aligned_vs_unaligned_race.C   -O2  execution test
FAIL: g++.dg/tsan/aligned_vs_unaligned_race.C   -O0  execution test
FAIL: g++.dg/tsan/aligned_vs_unaligned_race.C   -O2  execution test
FAIL: g++.dg/tsan/aligned_vs_unaligned_race.C   -O2  execution test
FAIL: g++.dg/tsan/aligned_vs_unaligned_race.C   -O2  execution test
FAIL: g++.dg/tsan/aligned_vs_unaligned_race.C   -O0  execution test
FAIL: g++.dg/tsan/aligned_vs_unaligned_race.C   -O0  execution test
FAIL: g++.dg/tsan/aligned_vs_unaligned_race.C   -O2  execution test
FAIL: g++.dg/tsan/aligned_vs_unaligned_race.C   -O2  execution test
FAIL: g++.dg/tsan/aligned_vs_unaligned_race.C   -O2  execution test
FAIL: g++.dg/tsan/aligned_vs_unaligned_race.C   -O0  execution test
$ 

When I fixed the problem, I ran it 20 times:

$ i=20; while let i--; do make RUNTESTFLAGS=tsan.exp=aligned_vs_unaligned_race.C check-g++ | grep FAIL; done
$ 

and got 0 failures.

So, it seems to work.  I’d like a tsan person to review it and we can go from there.

The only thing I would add to it would be a huge comment that explains that the tsan run time isn’t thread safe and they should use a lock free update to the shadow bits, but, they don’t.  We introduce the step primitive to work around that bug.  Ideally, the the problem should be filed into a bug database for the tsan code gen and when closed as not to be fixed, we can then change the word bug to design, but leave the bug reference so that others that want to completely understand the issue can go read up on it.  If they actually fix the codegen to be thread safe, then we can simply remove all the step code.

To make this clang friendly, if one turns off inlining and obscures the semantics with weak from the optimizer and puts it into a header files and then #includes that header files, I think it would work.  I’ll leave this to someone that might want to do that.  If not, I’m fine with #ifdef clang/gcc and have the gcc test cases use step and the clang test cases, well, be unreliable.

$ svn diff
$ cat g++.dg/tsan/lib.c 
/* { dg-do compile } */
#include <unistd.h>

volatile int serial;

__attribute__((no_sanitize_address))
void step(int i) {
  while (serial != i-1) ;
  while (1) {
    if (++serial == i)
      return;
    --serial;
    sleep (1);
  }
}

Comments

Bernd Edlinger Jan. 6, 2015, 1:06 a.m. UTC | #1
Hi Mike,


On Mon, 5 Jan 2015 14:01:42, Mike Stump wrote:
>
> On Jan 5, 2015, at 12:58 PM, Mike Stump <mikestump@comcast.net> wrote:
>> So, to help you out, I tried my hand at wiring up the extra library code:
>
> So, my tsan build finally finished, and I could try this with a real test case. I ran it 20 times, and got 11 fails with:
>
> $ i=20; while let i--; do make RUNTESTFLAGS=tsan.exp=aligned_vs_unaligned_race.C check-g++ | grep FAIL; done
> FAIL: g++.dg/tsan/aligned_vs_unaligned_race.C -O2 execution test
> FAIL: g++.dg/tsan/aligned_vs_unaligned_race.C -O0 execution test
> FAIL: g++.dg/tsan/aligned_vs_unaligned_race.C -O2 execution test
> FAIL: g++.dg/tsan/aligned_vs_unaligned_race.C -O2 execution test
> FAIL: g++.dg/tsan/aligned_vs_unaligned_race.C -O2 execution test
> FAIL: g++.dg/tsan/aligned_vs_unaligned_race.C -O0 execution test
> FAIL: g++.dg/tsan/aligned_vs_unaligned_race.C -O0 execution test
> FAIL: g++.dg/tsan/aligned_vs_unaligned_race.C -O2 execution test
> FAIL: g++.dg/tsan/aligned_vs_unaligned_race.C -O2 execution test
> FAIL: g++.dg/tsan/aligned_vs_unaligned_race.C -O2 execution test
> FAIL: g++.dg/tsan/aligned_vs_unaligned_race.C -O0 execution test
> $
>
> When I fixed the problem, I ran it 20 times:
>
> $ i=20; while let i--; do make RUNTESTFLAGS=tsan.exp=aligned_vs_unaligned_race.C check-g++ | grep FAIL; done
> $
>
> and got 0 failures.
>
> So, it seems to work. I’d like a tsan person to review it and we can go from there.
>
> The only thing I would add to it would be a huge comment that explains that the tsan run time isn’t thread safe and they should use a lock free update to the shadow bits, but, they don’t. We introduce the step primitive to work around that bug. Ideally, the the problem should be filed into a bug database for the tsan code gen and when closed as not to be fixed, we can then change the word bug to design, but leave the bug reference so that others that want to completely understand the issue can go read up on it. If they actually fix the codegen to be thread safe, then we can simply remove all the step code.
>
> To make this clang friendly, if one turns off inlining and obscures the semantics with weak from the optimizer and puts it into a header files and then #includes that header files, I think it would work. I’ll leave this to someone that might want to do that. If not, I’m fine with #ifdef clang/gcc and have the gcc test cases use step and the clang test cases, well, be unreliable.
>
> $ svn diff
> Index: g++.dg/tsan/aligned_vs_unaligned_race.C
> ===================================================================
> --- g++.dg/tsan/aligned_vs_unaligned_race.C (revision 219198)
> +++ g++.dg/tsan/aligned_vs_unaligned_race.C (working copy)
> @@ -1,11 +1,17 @@
> +/* { dg-additional-sources "lib.c" } */
> /* { dg-shouldfail "tsan" } */
> +
> #include <pthread.h>
> #include <stdio.h>
> #include <stdint.h>
> +#include <unistd.h>
> +
> +void step(int);
>
> uint64_t Global[2];
>
> void *Thread1(void *x) {
> + step (2);
> Global[1]++;
> return NULL;
> }
> @@ -15,6 +21,7 @@ void *Thread2(void *x) {
> struct __attribute__((packed, aligned(1))) u_uint64_t { uint64_t val; };
> u_uint64_t *p4 = reinterpret_cast<u_uint64_t *>(p1 + 1);
> (*p4).val++;
> + step (1);
> return NULL;
> }
>
> $ cat g++.dg/tsan/lib.c
> /* { dg-do compile } */
> #include <unistd.h>
>
> volatile int serial;
>
> __attribute__((no_sanitize_address))
> void step(int i) {
> while (serial != i-1) ;
> while (1) {
> if (++serial == i)
> return;
> --serial;
> sleep (1);
> }
> }


I tried to elaborate your idea a bit, and used proper atomics.
It is necessary that the logic of the step function is completely invisible to tsan.
Therefore it should not call sleep, because that can be intercepted, even
if the code is not instrumented.  And if this is the right solution for 
aligned_vs_unaligned.C it should be the used in all other tests as well.

Unfortunately there is no __attribute__((no_sanitize_thread)) just
no_sanitize_address, and no_sanitize_undefined.  So I need to call
the gcc compiler driver a second time with different options to compile
lib.c and link that together with the test case.  If done by hand, the tests
work very reliable, even without any sleep.

I did not know about the dg-additional-sources, that was new to me.
But it does not quite do what I need.  Unfortunately it just adds lib.c
to the gcc command line, and both sources use the same options.
But that instruments the atomics in lib.c and thus tsan figures out,
that there is really no race condition at all.
So some tests start to fail.


make check-gcc-c++ RUNTESTFLAGS="tsan.exp=*"
...
FAIL: g++.dg/tsan/aligned_vs_unaligned_race.C   -O0  execution test
FAIL: g++.dg/tsan/aligned_vs_unaligned_race.C   -O2  execution test
FAIL: g++.dg/tsan/atomic_free.C   -O0  execution test
FAIL: g++.dg/tsan/atomic_free.C   -O2  execution test


That is of course only a technical problem, but one that is beyond my scope.
And OTOH I doubt that these race-free tests look very realistic.


Bernd.
Bernd Edlinger Jan. 6, 2015, 9:08 a.m. UTC | #2
Hi Mike,

after some hours of sleep I realized that your step function can do something very interesting,
(which you already requested previously):

That is: create a race condition that is _always_ at 100% missed by tsan:

cat lib.c
/* { dg-do compile } */
/* { dg-options "-O2 -fno-sanitize=all" } */
 
static volatile int serial = 0;
 
void step (int i)
{
  while (__atomic_load_n (&serial, __ATOMIC_ACQUIRE) != i - 1);
  __atomic_store_n (&serial, i, __ATOMIC_RELEASE);
}

cat tiny_race.c 
/* { dg-shouldfail "tsan" } */

#include <pthread.h>

void step(int);

int Global;

void *Thread1(void *x) {
  step (1);
  Global = 42;
  step (3);
  return x;
}

int main() {
  pthread_t t;
  pthread_create(&t, 0, Thread1, 0);
  step (2);
  Global = 43;
  step (4);
  pthread_join(t, 0);
  return Global;
}

/* { dg-output "WARNING: ThreadSanitizer: data race.*(\n|\r\n|\r)" } */



Bernd.
Jakub Jelinek Jan. 6, 2015, 9:16 a.m. UTC | #3
On Tue, Jan 06, 2015 at 10:08:22AM +0100, Bernd Edlinger wrote:
> Hi Mike,
> 
> after some hours of sleep I realized that your step function can do something very interesting,
> (which you already requested previously):
> 
> That is: create a race condition that is _always_ at 100% missed by tsan:
> 
> cat lib.c
> /* { dg-do compile } */
> /* { dg-options "-O2 -fno-sanitize=all" } */
>  
> static volatile int serial = 0;
>  
> void step (int i)
> {
>   while (__atomic_load_n (&serial, __ATOMIC_ACQUIRE) != i - 1);
>   __atomic_store_n (&serial, i, __ATOMIC_RELEASE);
> }

Such busy waiting is not very CPU time friendly in the testsuite, especially
if you have just a single HW thread.
If libtsan is not fixed not to be so racy, perhaps instead of all the sleeps
we could arrange (on x86_64-linux, which is the only target supporting tsan
right now) to make sure the thread runs on the same core/hw thread as the
initial thread using pthread_[gs]etaffinity_np/pthread_attr_setaffinity_np ?
Does tsan then always report the races when the threads can't run
concurrently?

	Jakub
Bernd Edlinger Jan. 6, 2015, 9:38 a.m. UTC | #4
On Tue, 6 Jan 2015 10:16:33, Jakub Jelinek wrote:
>
> On Tue, Jan 06, 2015 at 10:08:22AM +0100, Bernd Edlinger wrote:
>> Hi Mike,
>>
>> after some hours of sleep I realized that your step function can do something very interesting,
>> (which you already requested previously):
>>
>> That is: create a race condition that is _always_ at 100% missed by tsan:
>>
>> cat lib.c
>> /* { dg-do compile } */
>> /* { dg-options "-O2 -fno-sanitize=all" } */
>>
>> static volatile int serial = 0;
>>
>> void step (int i)
>> {
>>   while (__atomic_load_n (&serial, __ATOMIC_ACQUIRE) != i - 1);
>>   __atomic_store_n (&serial, i, __ATOMIC_RELEASE);
>> }
>
> Such busy waiting is not very CPU time friendly in the testsuite, especially
> if you have just a single HW thread.

and in real-time environment, without time-slicing it is even a deadlock...

plus, I need to compile this helper with different options, that can be solved,
but currently I see no example where something like that was ever done.

> If libtsan is not fixed not to be so racy, perhaps instead of all the sleeps
> we could arrange (on x86_64-linux, which is the only target supporting tsan
> right now) to make sure the thread runs on the same core/hw thread as the
> initial thread using pthread_[gs]etaffinity_np/pthread_attr_setaffinity_np ?
> Does tsan then always report the races when the threads can't run
> concurrently?
>

Probably yes, but there can be no guarantees.  Even a single core can
be interrupted.

I think we can live with sleep(1) in aligned_vs_unaligned.C,
failure of that test will be very very unlikely as well.

.. and if we find a way how to do this custom buiild step in the test suite,
it would be good as a XFAIL test, that reminds us of the racy nature of libtsan.


Bernd.

> Jakub
diff mbox

Patch

Index: g++.dg/tsan/aligned_vs_unaligned_race.C
===================================================================
--- g++.dg/tsan/aligned_vs_unaligned_race.C     (revision 219198)
+++ g++.dg/tsan/aligned_vs_unaligned_race.C     (working copy)
@@ -1,11 +1,17 @@ 
+/* { dg-additional-sources "lib.c" } */
 /* { dg-shouldfail "tsan" } */
+
 #include <pthread.h>
 #include <stdio.h>
 #include <stdint.h>
+#include <unistd.h>
+
+void step(int);
 
 uint64_t Global[2];
 
 void *Thread1(void *x) {
+  step (2);
   Global[1]++;
   return NULL;
 }
@@ -15,6 +21,7 @@  void *Thread2(void *x) {
   struct __attribute__((packed, aligned(1))) u_uint64_t { uint64_t val; };
   u_uint64_t *p4 = reinterpret_cast<u_uint64_t *>(p1 + 1);
   (*p4).val++;
+  step (1);
   return NULL;
 }