diff mbox

Fix sporadic failure in g++.dg/tsan/aligned_vs_unaligned_race.C

Message ID DUB118-W15350AC346C8CE664A5ECDE4590@phx.gbl
State New
Headers show

Commit Message

Bernd Edlinger Jan. 6, 2015, 5:45 p.m. UTC
----------------------------------------
> Date: Tue, 6 Jan 2015 10:16:33 +0100
> From: jakub@redhat.com
> To: bernd.edlinger@hotmail.de
> CC: mikestump@comcast.net; hjl.tools@gmail.com; gcc-patches@gcc.gnu.org; dvyukov@google.com
> Subject: Re: [PATCH] Fix sporadic failure in g++.dg/tsan/aligned_vs_unaligned_race.C
>
> On Tue, Jan 06, 2015 at 10:08:22AM +0100, Bernd Edlinger wrote:
>> Hi Mike,
>>
>> after some hours of sleep I realized that your step function can do something very interesting,
>> (which you already requested previously):
>>
>> That is: create a race condition that is _always_ at 100% missed by tsan:
>>
>> cat lib.c
>> /* { dg-do compile } */
>> /* { dg-options "-O2 -fno-sanitize=all" } */
>>
>> static volatile int serial = 0;
>>
>> void step (int i)
>> {
>>   while (__atomic_load_n (&serial, __ATOMIC_ACQUIRE) != i - 1);
>>   __atomic_store_n (&serial, i, __ATOMIC_RELEASE);
>> }
>
> Such busy waiting is not very CPU time friendly in the testsuite, especially
> if you have just a single HW thread.
> If libtsan is not fixed not to be so racy, perhaps instead of all the sleeps
> we could arrange (on x86_64-linux, which is the only target supporting tsan
> right now) to make sure the thread runs on the same core/hw thread as the
> initial thread using pthread_[gs]etaffinity_np/pthread_attr_setaffinity_np ?
> Does tsan then always report the races when the threads can't run
> concurrently?
>
> Jakub

I tried your suggestion now, and it seems to work. (on a  4-way core AMD64 laptop)

Would you prefer this over adding a sleep in Thread1, which I posted previously?




2015-01-06  Bernd Edlinger  <bernd.edlinger@hotmail.de>

        * g++.dg/tsan/aligned_vs_unaligned_race.C: Fixed sporadic failure.





Thanks,
Bernd.

Comments

Mike Stump Jan. 6, 2015, 7:47 p.m. UTC | #1
On Jan 6, 2015, at 9:45 AM, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote:
> I tried your suggestion now, and it seems to work. (on a  4-way core AMD64 laptop)
> 
> Would you prefer this over adding a sleep in Thread1, which I posted previously?

The problem with the patch is there is nothing in the patch that makes it work.  You merely lower the odds of it failing.  The point of addressing the problem, and the problem is, this test case randomly fails, is to change the test case, so that it is impossible by design for the test case to ever fail.  In my version of the fix, I think it can't fail.
Bernd Edlinger Jan. 6, 2015, 11:22 p.m. UTC | #2
On Tue, 6 Jan 2015 11:47:30, Mike Stump wrote:
>
> On Jan 6, 2015, at 9:45 AM, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote:
>> I tried your suggestion now, and it seems to work. (on a 4-way core AMD64 laptop)
>>
>> Would you prefer this over adding a sleep in Thread1, which I posted previously?
>
> The problem with the patch is there is nothing in the patch that makes it work. You merely lower the odds of it failing. The point of addressing the problem, and the problem is, this test case randomly fails, is to change the test case, so that it is impossible by design for the test case to ever fail. In my version of the fix, I think it can't fail.

Yes, I think too that it can't fail under these conditions. Maybe that is just a philosophical problem.

If we prepare the test in this way it does not really contain any race condition.
   
(*p4).val++;
   step (1);

happens first, and then:

   step (2);
   Global[1]++;

but we consider the test passed, when we see a diagnostic?


>> Therefore it should not call sleep,
> Again, sleep can’t fix race conditions, so therefore tsan can never use it as an indication that no race exists.  If it ever did, that would be a bug.  The only case where that would not be true, would be a hard real time analysis verifier, and, when we get one, sleep can be so modified.
>> because that can be intercepted, even if the code is not instrumented.
> I don’t see the relevance.


I mention that only, because sleep, usleep and nanosleep are intercepted by tsan_interceptors.cc
and calling them alters the printed diagnostics, some tests need to see "as if synchronized by sleep",
some don't.

As far as I can see, there is no interceptor for sched_yield, so using that in the step function is OK.

However the other use of step is a valid test, although one that always fails.
Here we see a real race condition with no tsan diagnostic, just because both
threads alter the same variable at the same nanosecond, that is remarkable.
Just as if tsan had a blind spot here.

Once again, with a 3-way handshake in case the Tread1 is scheduled before
pthread_create returns:

cat tiny-race.c
/* { dg-shouldfail "tsan" } */

#include <pthread.h>

void step(int);

int Global;

void *Thread1(void *x) {
  step (2);
  Global = 42;
  return x;
}

int main() {
  pthread_t t;
  pthread_create(&t, 0, Thread1, 0);
  step (1);
  step (3);
  Global = 43;
  pthread_join(t, 0);
  return 0;
}

/* { dg-output "WARNING: ThreadSanitizer: data race.*(\n|\r\n|\r)" } */

>> Unfortunately there is no __attribute__((no_sanitize_thread))
> So, is that any harder to add then:

If a new attribute is easier to implement than teaching the tsan.exp tcl script
that lib.c is something special, maybe...


Bernd.
Mike Stump Jan. 7, 2015, 12:32 a.m. UTC | #3
On Jan 6, 2015, at 3:22 PM, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote:
> Yes, I think too that it can't fail under these conditions.

If you mean your version…  A lot has been written on how to to make racy code non-racy…  I’d refer you to the literature on all the various solutions people have found to date.  I don’t think I’ve ever seen anyone claim that affinity can be used to solve race conditions like this.  Do you have a pointer?

> If we prepare the test in this way it does not really contain any race condition.
> 
> (*p4).val++;
>   step (1);
> 
> happens first, and then:
> 
>   step (2);
>   Global[1]++;
> 
> but we consider the test passed, when we see a diagnostic?

If tsan doesn’t work in the face of a race, then trivially, the code to test it cannot have a race.  The difference is letting tsan know there is a race, versus hiding the fact there is no race from it, so that it still thinks there is a race.  For test cases to diagnose a race, those cases must have the code that ensures there is not a race, hidden from view.

> As far as I can see, there is no interceptor for sched_yield, so using that in the step function is OK.

Sounds good.

I’m running a test suite run now with my attribute patch.  I’ll ask the tsan people if it can go in, if it passes.
Bernd Edlinger Jan. 7, 2015, 7:17 a.m. UTC | #4
On Tue, 6 Jan 2015 16:32:35, Mike Stump wrote:
>
> On Jan 6, 2015, at 3:22 PM, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote:
>> Yes, I think too that it can't fail under these conditions.
>
> If you mean your version… A lot has been written on how to to make racy code non-racy… I’d refer you to the literature on all the various solutions people have found to date. I don’t think I’ve ever seen anyone claim that affinity can be used to solve race conditions like this. Do you have a pointer?
>

no, I agree, the affinity would just lower the likelihood, as sleep does.
The version with affinity is just disgusting, sorry to have posted it ;-).

I meant your version, with step(1)/step(2) between the race. It is probably bullet-proof.
But the version with sleep is more realistic.  That is however just a matter of taste...

If we can agree to use your version, then all tests in g++.dg/tsan and
c-c++-common/tsan that currently use sleep(1) should use your step function instead.
sleep_sync.c can use step but needs still to call sleep because it triggers on the
"As if synchronized via sleep" diagnostic.

I don't know what to make of simple_race.c, which uses usleep, but uses
a loop to repeat the race often. Maybe leave that one as a "realistic" test.


Bernd.
Jakub Jelinek Jan. 7, 2015, 8:23 a.m. UTC | #5
On Wed, Jan 07, 2015 at 08:17:34AM +0100, Bernd Edlinger wrote:
> 
> On Tue, 6 Jan 2015 16:32:35, Mike Stump wrote:
> >
> > On Jan 6, 2015, at 3:22 PM, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote:
> >> Yes, I think too that it can't fail under these conditions.
> >
> > If you mean your version… A lot has been written on how to to make racy code non-racy… I’d refer you to the literature on all the various solutions people have found to date. I don’t think I’ve ever seen anyone claim that affinity can be used to solve race conditions like this. Do you have a pointer?
> >
> 
> no, I agree, the affinity would just lower the likelihood, as sleep does.
> The version with affinity is just disgusting, sorry to have posted it ;-).
> 
> I meant your version, with step(1)/step(2) between the race. It is probably bullet-proof.
> But the version with sleep is more realistic.  That is however just a matter of taste...
> 
> If we can agree to use your version, then all tests in g++.dg/tsan and
> c-c++-common/tsan that currently use sleep(1) should use your step function instead.
> sleep_sync.c can use step but needs still to call sleep because it triggers on the
> "As if synchronized via sleep" diagnostic.
> 
> I don't know what to make of simple_race.c, which uses usleep, but uses
> a loop to repeat the race often. Maybe leave that one as a "realistic" test.

I'm fine with adding the no_sanitize_thread attribute, the patch LGTM, I
think we might even have a PR for that.

But I really don't like the busy waiting.  You essentially want something
like pthread_barrier_wait that libtsan isn't aware of, right?

As tsan is only supported on x86_64-linux (and in the future could be on
some other 64-bit linuxes), I wonder if we couldn't just simplify libgomp's
config/linux/bar* for that purpose - libtsan doesn't intercept syscall(3)
and doesn't intercept futex in particular, so we could have a busy waiting
free synchronization primitive.  Or another option is to just dlopen
libpthread.so.0 and dlsym pthread_barrier_wait in it and use that, that way
libtsan won't intercept it either.

	Jakub
Bernd Edlinger Jan. 7, 2015, 2:55 p.m. UTC | #6
On Wed, 7 Jan 2015 09:23:39, Jakub Jelinek wrote:
>
> But I really don't like the busy waiting. You essentially want something
> like pthread_barrier_wait that libtsan isn't aware of, right?
>

Yes.


> As tsan is only supported on x86_64-linux (and in the future could be on
> some other 64-bit linuxes), I wonder if we couldn't just simplify libgomp's
> config/linux/bar* for that purpose - libtsan doesn't intercept syscall(3)
> and doesn't intercept futex in particular, so we could have a busy waiting
> free synchronization primitive. Or another option is to just dlopen
> libpthread.so.0 and dlsym pthread_barrier_wait in it and use that, that way
> libtsan won't intercept it either.
>


That would be the honey pot.  But unfortunately I can't reach it.


I tried this,

cat barrier.h
#include <dlfcn.h>

static pthread_barrier_t barrier;

static int (*barrier_wait)(pthread_barrier_t *);

__attribute__((constructor(101)))
void barrier_init()
{
  pthread_barrier_init (&barrier, NULL, 2);
  barrier_wait = (int (*)(pthread_barrier_t *))
                 dlsym (dlopen ("pthread.so.0", RTLD_NOW), "pthread_barrier_wait");
}


But dlsym gives me only a pointer to the tsan-interceptor.


Bernd.
Jakub Jelinek Jan. 7, 2015, 3:04 p.m. UTC | #7
On Wed, Jan 07, 2015 at 03:55:11PM +0100, Bernd Edlinger wrote:
> cat barrier.h
> #include <dlfcn.h>
> 
> static pthread_barrier_t barrier;
> 
> static int (*barrier_wait)(pthread_barrier_t *);

Better
static __typeof (pthread_barrier_wait) *barrier_wait;
?
> 
> __attribute__((constructor(101)))

I wouldn't use a constructor for it, instead simply call it from the
testcase at the start of main.

> void barrier_init()
> {
>   pthread_barrier_init (&barrier, NULL, 2);
>   barrier_wait = (int (*)(pthread_barrier_t *))
>                  dlsym (dlopen ("pthread.so.0", RTLD_NOW), "pthread_barrier_wait");

"libpthread.so.0" instead.

I'd use:
#ifdef RTLD_NOLOAD
  void *h = dlopen ("libpthread.so.0", RTLD_NOLOAD);
#else
  void *h = dlopen ("libpthread.so.0", RTLD_NOW);
#endif
  barrier_wait = (__typeof (pthread_barrier_wait) *) dlsym (h, "pthread_barrier_wait");

	Jakub
Mike Stump Jan. 7, 2015, 4:35 p.m. UTC | #8
On Jan 6, 2015, at 11:17 PM, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote:
> 
> On Tue, 6 Jan 2015 16:32:35, Mike Stump wrote:
>> 
>> On Jan 6, 2015, at 3:22 PM, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote:
>>> Yes, I think too that it can't fail under these conditions.
>> 
>> If you mean your version… A lot has been written on how to to make racy code non-racy… I’d refer you to the literature on all the various solutions people have found to date. I don’t think I’ve ever seen anyone claim that affinity can be used to solve race conditions like this. Do you have a pointer?
>> 
> 
> no, I agree, the affinity would just lower the likelihood, as sleep does.
> The version with affinity is just disgusting, sorry to have posted it ;-).
> 
> I meant your version, with step(1)/step(2) between the race. It is probably bullet-proof.
> But the version with sleep is more realistic.

> sleep_sync.c can use step but needs still to call sleep because it triggers on the
> "As if synchronized via sleep" diagnostic.

Ah, thanks for the pointer.  Yeah, it is important to not remove that sleep, though, if the test case itself is racy, we can use step to ensure the elements of it happen in a deterministic order.

> I don't know what to make of simple_race.c, which uses usleep, but uses
> a loop to repeat the race often. Maybe leave that one as a "realistic" test.

I looked at that test case to see if it is flaky (llvm term) and found:

http://reviews.llvm.org/D3913

> The tests test that ThreadSanitizer finds the data race in particular conditions. However, ThreadSanitizer core algorithm can miss a data race when the racy memory access attempts happen very close to each other (literally simultaneously). This was done intentionally, fixing this would impose significant slowdown and this is not a problem for programs other than unit tests.

So, clearly your presentation of the base problem is accurate.  I still doubt the costs of a good solution are all that expensive.

Anyway, http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140526/219172.html makes it clear that this test case is flakey (and should be fixed).

I’d fix it by removing all looping and making it deterministic (with step).  We can leave the usleep in there, it doesn’t hurt; though, not sure it adds anything.
Mike Stump Jan. 7, 2015, 4:58 p.m. UTC | #9
On Jan 7, 2015, at 12:23 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> But I really don't like the busy waiting.

We’ve already determined that sched_sleep isn’t intercepted and can be used to non-busy wait.  Any reason not to use it?

> As tsan is only supported on x86_64-linux

So, I hate hardening the code to be overly non-portable when it doesn’t have to be that.  There is something enticing to me about the simplicity of sched_sleep.
Jakub Jelinek Jan. 7, 2015, 5 p.m. UTC | #10
On Wed, Jan 07, 2015 at 08:58:04AM -0800, Mike Stump wrote:
> On Jan 7, 2015, at 12:23 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> > But I really don't like the busy waiting.
> 
> We’ve already determined that sched_sleep isn’t intercepted and can be used to non-busy wait.  Any reason not to use it?
> 
> > As tsan is only supported on x86_64-linux
> 
> So, I hate hardening the code to be overly non-portable when it doesn’t have to be that.  There is something enticing to me about the simplicity of sched_sleep.

Well, pthread_barrier_wait and dlopen/dlsym are already used by libtsan and
therefore have to be supported on all the architectures that support tsan.
So that method is as portable as libtsan itself.

	Jakub
diff mbox

Patch

Index: aligned_vs_unaligned_race.C
===================================================================
--- aligned_vs_unaligned_race.C    (revision 219198)
+++ aligned_vs_unaligned_race.C    (working copy)
@@ -1,5 +1,7 @@ 
 /* { dg-shouldfail "tsan" } */
+/* { dg-additional-options "-pthread" } */
 #include <pthread.h>
+#include <sched.h>
 #include <stdio.h>
 #include <stdint.h>
 
@@ -19,6 +21,12 @@  void *Thread2(void *x) {
 }
 
 int main() {
+  /* Run this test on a single CPU, to make it somewhat easier for tsan to
+     detect the race condition.  */
+  cpu_set_t s;
+  CPU_ZERO(&s);
+  CPU_SET(0, &s);
+  pthread_setaffinity_np(pthread_self(), sizeof(s), &s);
   pthread_t t[2];
   pthread_create(&t[0], NULL, Thread1, NULL);
   pthread_create(&t[1], NULL, Thread2, NULL);