diff mbox

Fix sporadic failure in g++.dg/tsan/aligned_vs_unaligned_race.C

Message ID DUB118-W3640B7ADB7473323625ED2E4460@phx.gbl
State New
Headers show

Commit Message

Bernd Edlinger Jan. 7, 2015, 6:21 p.m. UTC
On Wed, 7 Jan 2015 18:00:27, Jakub Jelinek wrote:
>
> On Wed, Jan 07, 2015 at 08:58:04AM -0800, Mike Stump wrote:
>> On Jan 7, 2015, at 12:23 AM, Jakub Jelinek <jakub@redhat.com> wrote:
>>> But I really don't like the busy waiting.
>>
>> We’ve already determined that sched_sleep isn’t intercepted and can be used to non-busy wait. Any reason not to use it?
>>
>>> As tsan is only supported on x86_64-linux
>>
>> So, I hate hardening the code to be overly non-portable when it doesn’t have to be that. There is something enticing to me about the simplicity of sched_sleep.
>
> Well, pthread_barrier_wait and dlopen/dlsym are already used by libtsan and
> therefore have to be supported on all the architectures that support tsan.
> So that method is as portable as libtsan itself.
>
> Jakub


Ok, just for completeness, I've got the dlopen finally working:
RTLD_NOLOAD is not working, but RTLD_LAZY or RTLD_NOW does.

The test case is passing reliably with this method too.

I am however not sure if I can always use -ldl or have to use some target-dependencies here.





Thanks
Bernd.

Comments

Jakub Jelinek Jan. 7, 2015, 6:32 p.m. UTC | #1
On Wed, Jan 07, 2015 at 07:21:40PM +0100, Bernd Edlinger wrote:
> Ok, just for completeness, I've got the dlopen finally working:
> RTLD_NOLOAD is not working, but RTLD_LAZY or RTLD_NOW does.

No big deal I guess.

> I am however not sure if I can always use -ldl or have to use some target-dependencies here.

libsanitizer always puts
-lpthread -ldl -lm
into libsanitizer.spec, so I'd say it should be safe and you shouldn't even
need -ldl in dg-additional-options, as merely linking with -fsanitize=thread
should link -ldl in.

> --- barrier.h    (revision 0)
> +++ barrier.h    (working copy)

I think better to name it tsan_barrier.h or otherwise make it
clear what this is for.

> @@ -0,0 +1,12 @@
> +#include <dlfcn.h>
> +
> +static __typeof(pthread_barrier_wait) *barrier_wait;
> +
> +static
> +void barrier_init (pthread_barrier_t *barrier)

And, I'd add unsigned count argument here, and pass it through
pthread_barrier_init, just in case you need more than 2 threads
in some test.

Also, what do you need <sched.h> for?

	Jakub
Bernd Edlinger Jan. 7, 2015, 10:44 p.m. UTC | #2
On Wed, 7 Jan 2015 19:32:16, Jakub Jelinek wrote:
>> I am however not sure if I can always use -ldl or have to use some target-dependencies here.
>
> libsanitizer always puts
> -lpthread -ldl -lm
> into libsanitizer.spec, so I'd say it should be safe and you shouldn't even
> need -ldl in dg-additional-options, as merely linking with -fsanitize=thread
> should link -ldl in.
>

I need -ldl otherwise this happens:

FAIL: g++.dg/tsan/aligned_vs_unaligned_race.C   -O0  (test for excess errors)
Excess errors:
/home/ed/gnu/install/x86_64-unknown-linux-gnu/bin/ld: /tmp/ccW6IHbj.o: undefined reference to symbol 'dlsym@@GLIBC_2.2.5'
/lib/x86_64-linux-gnu/libdl.so.2: error adding symbols: DSO missing from command line


>> --- barrier.h    (revision 0)
>> +++ barrier.h    (working copy)
>
> I think better to name it tsan_barrier.h or otherwise make it
> clear what this is for.
>

Yes. I'll call it tsan_barrier.h and add a comment.

>> @@ -0,0 +1,12 @@
>> +#include <dlfcn.h>
>> +
>> +static __typeof(pthread_barrier_wait) *barrier_wait;
>> +
>> +static
>> +void barrier_init (pthread_barrier_t *barrier)
>
> And, I'd add unsigned count argument here, and pass it through
> pthread_barrier_init, just in case you need more than 2 threads
> in some test.
>

done.

> Also, what do you need <sched.h> for?
>

Oops ... Thanks for your help.

Here is a new patch, that uses this method on all tsan tests,
which seem to depend on sleep(1), and which have unstable results
because of that.

Successfully tested multiple times with:

make check-gcc-c check-gcc-c++ RUNTESTFLAGS="tsan.exp=*"


OK for trunk?


Thanks
Bernd.
testsuite/ChangeLog:
2015-01-07  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* c-c++-common/tsan/tsan_barrier.h: New.
	* c-c++-common/tsan/atomic_stack.c: Reworked to not depend on sleep.
	* c-c++-common/tsan/bitfield_race.c: Likewise.
	* c-c++-common/tsan/fd_pipe_race.c: Likewise.
	* c-c++-common/tsan/mutexset1.c: Likewise.
	* c-c++-common/tsan/race_on_barrier.c: Likewise.
	* c-c++-common/tsan/race_on_mutex.c: Likewise.
	* c-c++-common/tsan/race_on_mutex2.c: Likewise.
	* c-c++-common/tsan/simple_race.c: Likewise.
	* c-c++-common/tsan/simple_stack.c: Likewise.
	* c-c++-common/tsan/sleep_sync.c: Likewise.
	* c-c++-common/tsan/tiny_race.c: Likewise.
	* c-c++-common/tsan/tls_race.c: Likewise.
	* c-c++-common/tsan/write_in_reader_lock.c: Likewise.
	* g++.dg/tsan/aligned_vs_unaligned_race.C: Likewise.
	* g++.dg/tsan/atomic_free.C: Likewise.
	* g++.dg/tsan/atomic_free2.C: Likewise.
	* g++.dg/tsan/cond_race.C: Likewise.
	* g++.dg/tsan/tsan_barrier.h: Copied from c-c++-common/tsan.
Mike Stump Jan. 8, 2015, 7:24 p.m. UTC | #3
On Jan 7, 2015, at 2:44 PM, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote:
> Here is a new patch, that uses this method on all tsan tests,
> which seem to depend on sleep(1), and which have unstable results
> because of that.

> OK for trunk?

No.  So, I think this is wrong.  The problem is that any synchronizing primitive can trapped by tsan and added to the analysis, and if it resolves the race, then it should change the analysis that tsan produces.

The point of the atomic set, load primitives and volatile, the code-gen is a single instruction that tsan by definition, won’t now, or ever instrument because we tell it explicitly, don’t with no_sanitize_thread.

Since gcc now supports no_sanitize_thread, I don’t know of any reason why the test cases should not now be fixed to use step.
Jakub Jelinek Jan. 8, 2015, 7:29 p.m. UTC | #4
On Thu, Jan 08, 2015 at 11:24:21AM -0800, Mike Stump wrote:
> On Jan 7, 2015, at 2:44 PM, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote:
> > Here is a new patch, that uses this method on all tsan tests,
> > which seem to depend on sleep(1), and which have unstable results
> > because of that.
> 
> > OK for trunk?
> 
> No.  So, I think this is wrong.  The problem is that any synchronizing primitive can trapped by tsan and added to the analysis, and if it resolves the race, then it should change the analysis that tsan produces.

I disagree.  Busy waiting of this kind is not appropriate for the testsuite,
we burn already way too much time and adding to that is undesirable.
tsan can't intercept the calls that you do through dlsym, because you
explicitly bypass tsan in that case.

> The point of the atomic set, load primitives and volatile, the code-gen is a single instruction that tsan by definition, won’t now, or ever instrument because we tell it explicitly, don’t with no_sanitize_thread.
> 
> Since gcc now supports no_sanitize_thread, I don’t know of any reason why the test cases should not now be fixed to use step.

See above.

	Jakub
Mike Stump Jan. 8, 2015, 9:07 p.m. UTC | #5
On Jan 8, 2015, at 11:29 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> I disagree.  Busy waiting of this kind is not appropriate for the test suite,

What busy waiting, there is none in the last version of the patch?

> tsan can't intercept the calls that you do through dlsym, because you
> explicitly bypass tsan in that case.

Ah, yes, right, I had pthread_barrier_wait on the brain, sorry.  The direct use of it would be problematic.  The dlopen use of it, is safe.

So, that removes the objection I had to his patch.  Jakub, since he has a complete solution to the problem submitted with all the test cases fixed, I think it should go in.

Any objections to approving it now?
Jakub Jelinek Jan. 8, 2015, 9:27 p.m. UTC | #6
On Thu, Jan 08, 2015 at 01:07:02PM -0800, Mike Stump wrote:
> On Jan 8, 2015, at 11:29 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> > I disagree.  Busy waiting of this kind is not appropriate for the test suite,
> 
> What busy waiting, there is none in the last version of the patch?

It was still while (... != i - 1), wasn't it?
While pthread_barrier_wait/futex would typically only wake you up when
needed.

> > tsan can't intercept the calls that you do through dlsym, because you
> > explicitly bypass tsan in that case.
> 
> Ah, yes, right, I had pthread_barrier_wait on the brain, sorry.  The direct use of it would be problematic.  The dlopen use of it, is safe.
> 
> So, that removes the objection I had to his patch.  Jakub, since he has a complete solution to the problem submitted with all the test cases fixed, I think it should go in.
> 
> Any objections to approving it now?

LGTM.

	Jakub
Mike Stump Jan. 8, 2015, 10:05 p.m. UTC | #7
On Jan 8, 2015, at 1:27 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>> Any objections to approving it now?
> 
> LGTM.

Patch is Ok.  If you could send the clang folks a heads up, I’ve love to see them adopt the style.
Bernd Edlinger Jan. 8, 2015, 10:23 p.m. UTC | #8
On Thu, 8 Jan 2015 14:05:32, Mike Stump wrote:
>
> On Jan 8, 2015, at 1:27 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>>> Any objections to approving it now?
>>
>> LGTM.
>
> Patch is Ok. If you could send the clang folks a heads up, I’ve love to see them adopt the style.


Thanks, I am glad that we finally have a working solution.

Checked in as https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=219367


Dmitry, would you like to use that solution also in the clang tree, instead of sleep(1) and %deflake ?


Thanks
Bernd.
Bernd Edlinger Jan. 9, 2015, 3:32 p.m. UTC | #9
Hi,

On Thu, 8 Jan 2015 22:27:26, Jakub Jelinek wrote:
>> Any objections to approving it now?
>
> LGTM.
>
> Jakub

would it be OK to apply this patch also to the 4.9 testsuite,
except for c-c++-common/tsan/bitfield_race.c and 
g++.dg/tsan/aligned_vs_unaligned_race.C of course?


Bernd.
Jakub Jelinek Jan. 9, 2015, 3:36 p.m. UTC | #10
On Fri, Jan 09, 2015 at 04:32:47PM +0100, Bernd Edlinger wrote:
> Hi,
> 
> On Thu, 8 Jan 2015 22:27:26, Jakub Jelinek wrote:
> >> Any objections to approving it now?
> >
> > LGTM.
> >
> > Jakub
> 
> would it be OK to apply this patch also to the 4.9 testsuite,
> except for c-c++-common/tsan/bitfield_race.c and 
> g++.dg/tsan/aligned_vs_unaligned_race.C of course?

Yes, but please give Dmitry some time to respond.

	Jakub
Dmitry Vyukov Jan. 19, 2015, 8:47 a.m. UTC | #11
Long story short. Tsan has a logical data race the core of data race
detection algorithm. The race is not a bug, but a deliberate design
decision that makes tsan considerably faster. So ironically, if the
race memory accesses happen almost simultaneously, tsan can miss the
race.
Thus we have sleeps.
Sleeps vs barrier is being discussed in the "Fix parameters of
__tsan_vptr_update" thread.
I would really like to keep llvm and gcc tests in sync as much as possible.



On Fri, Jan 9, 2015 at 6:36 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Fri, Jan 09, 2015 at 04:32:47PM +0100, Bernd Edlinger wrote:
>> Hi,
>>
>> On Thu, 8 Jan 2015 22:27:26, Jakub Jelinek wrote:
>> >> Any objections to approving it now?
>> >
>> > LGTM.
>> >
>> > Jakub
>>
>> would it be OK to apply this patch also to the 4.9 testsuite,
>> except for c-c++-common/tsan/bitfield_race.c and
>> g++.dg/tsan/aligned_vs_unaligned_race.C of course?
>
> Yes, but please give Dmitry some time to respond.
>
>         Jakub
Mike Stump Jan. 19, 2015, 3:12 p.m. UTC | #12
On Jan 19, 2015, at 12:47 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> Long story short. Tsan has a logical data race the core of data race
> detection algorithm. The race is not a bug, but a deliberate design
> decision that makes tsan considerably faster.

Could you please quantify that for us?  Also, what lockless update method did you use?  Did you try atomic increment?  On my machine, they are as cheap as stores; can’t imagine they could be slow at all.  If the latency and bandwidth of atomic increment is identical to store, would the costs be any higher than using a store to update the tsan data?  A proper port of tsan to my machine would make use of atomic increment.  I consider it a simple matter to sequence the thread termination and the output routine to ensure that all the updates in the threads happen before the output routine runs.  The output routine strikes me as slow, and thread termination strikes me as slow, so taking a little extra time there seems reasonable.  Was the excessive cost you saw due to the termination costs?

> So ironically, if the race memory accesses happen almost simultaneously, tsan can miss the
> race.
> Thus we have sleeps.

I’ve not seen a reason why the test suite should randomly fail.  The gcc test suite does not.  Could you explain why the llvm test suite does?  Surely you know that sleep is not a synchronization primitive?

> Sleeps vs barrier is being discussed in the "Fix parameters of
> __tsan_vptr_update" thread.

When finished, let us know the outcome.  To date, I’ve not seen any compelling reason to have the core of tsan be other than deterministic and the test suite other than deterministic.  I’d love to see the backing for such a decision.

> I would really like to keep llvm and gcc tests in sync as much as possible.

Excellent, from my perspective, that would mean that you make the llvm test suite deterministic.
diff mbox

Patch

Index: aligned_vs_unaligned_race.C
===================================================================
--- aligned_vs_unaligned_race.C    (revision 219198)
+++ aligned_vs_unaligned_race.C    (working copy)
@@ -1,11 +1,17 @@ 
 /* { dg-shouldfail "tsan" } */
+/* { dg-additional-options "-ldl" } */
 #include <pthread.h>
+#include <sched.h>
 #include <stdio.h>
 #include <stdint.h>
+#include "barrier.h"
 
+static pthread_barrier_t barrier;
+
 uint64_t Global[2];
 
 void *Thread1(void *x) {
+  barrier_wait(&barrier);
   Global[1]++;
   return NULL;
 }
@@ -15,10 +21,12 @@ 
   struct __attribute__((packed, aligned(1))) u_uint64_t { uint64_t val; };
   u_uint64_t *p4 = reinterpret_cast<u_uint64_t *>(p1 + 1);
   (*p4).val++;
+  barrier_wait(&barrier);
   return NULL;
 }
 
 int main() {
+  barrier_init(&barrier);
   pthread_t t[2];
   pthread_create(&t[0], NULL, Thread1, NULL);
   pthread_create(&t[1], NULL, Thread2, NULL);
Index: barrier.h
===================================================================
--- barrier.h    (revision 0)
+++ barrier.h    (working copy)
@@ -0,0 +1,12 @@ 
+#include <dlfcn.h>
+
+static __typeof(pthread_barrier_wait) *barrier_wait;
+
+static
+void barrier_init (pthread_barrier_t *barrier)
+{
+  void *h = dlopen ("libpthread.so.0", RTLD_LAZY);
+  barrier_wait = (__typeof (pthread_barrier_wait) *)
+          dlsym (h, "pthread_barrier_wait");
+  pthread_barrier_init (barrier, NULL, 2);
+}