Message ID | DUB118-W15350AC346C8CE664A5ECDE4590@phx.gbl |
---|---|
State | New |
Headers | show |
On Jan 6, 2015, at 9:45 AM, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote: > I tried your suggestion now, and it seems to work. (on a 4-way core AMD64 laptop) > > Would you prefer this over adding a sleep in Thread1, which I posted previously? The problem with the patch is there is nothing in the patch that makes it work. You merely lower the odds of it failing. The point of addressing the problem, and the problem is, this test case randomly fails, is to change the test case, so that it is impossible by design for the test case to ever fail. In my version of the fix, I think it can't fail.
On Tue, 6 Jan 2015 11:47:30, Mike Stump wrote: > > On Jan 6, 2015, at 9:45 AM, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote: >> I tried your suggestion now, and it seems to work. (on a 4-way core AMD64 laptop) >> >> Would you prefer this over adding a sleep in Thread1, which I posted previously? > > The problem with the patch is there is nothing in the patch that makes it work. You merely lower the odds of it failing. The point of addressing the problem, and the problem is, this test case randomly fails, is to change the test case, so that it is impossible by design for the test case to ever fail. In my version of the fix, I think it can't fail. Yes, I think too that it can't fail under these conditions. Maybe that is just a philosophical problem. If we prepare the test in this way it does not really contain any race condition. (*p4).val++; step (1); happens first, and then: step (2); Global[1]++; but we consider the test passed, when we see a diagnostic? >> Therefore it should not call sleep, > Again, sleep can’t fix race conditions, so therefore tsan can never use it as an indication that no race exists. If it ever did, that would be a bug. The only case where that would not be true, would be a hard real time analysis verifier, and, when we get one, sleep can be so modified. >> because that can be intercepted, even if the code is not instrumented. > I don’t see the relevance. I mention that only, because sleep, usleep and nanosleep are intercepted by tsan_interceptors.cc and calling them alters the printed diagnostics, some tests need to see "as if synchronized by sleep", some don't. As far as I can see, there is no interceptor for sched_yield, so using that in the step function is OK. However the other use of step is a valid test, although one that always fails. Here we see a real race condition with no tsan diagnostic, just because both threads alter the same variable at the same nanosecond, that is remarkable. Just as if tsan had a blind spot here. Once again, with a 3-way handshake in case the Tread1 is scheduled before pthread_create returns: cat tiny-race.c /* { dg-shouldfail "tsan" } */ #include <pthread.h> void step(int); int Global; void *Thread1(void *x) { step (2); Global = 42; return x; } int main() { pthread_t t; pthread_create(&t, 0, Thread1, 0); step (1); step (3); Global = 43; pthread_join(t, 0); return 0; } /* { dg-output "WARNING: ThreadSanitizer: data race.*(\n|\r\n|\r)" } */ >> Unfortunately there is no __attribute__((no_sanitize_thread)) > So, is that any harder to add then: If a new attribute is easier to implement than teaching the tsan.exp tcl script that lib.c is something special, maybe... Bernd.
On Jan 6, 2015, at 3:22 PM, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote: > Yes, I think too that it can't fail under these conditions. If you mean your version… A lot has been written on how to to make racy code non-racy… I’d refer you to the literature on all the various solutions people have found to date. I don’t think I’ve ever seen anyone claim that affinity can be used to solve race conditions like this. Do you have a pointer? > If we prepare the test in this way it does not really contain any race condition. > > (*p4).val++; > step (1); > > happens first, and then: > > step (2); > Global[1]++; > > but we consider the test passed, when we see a diagnostic? If tsan doesn’t work in the face of a race, then trivially, the code to test it cannot have a race. The difference is letting tsan know there is a race, versus hiding the fact there is no race from it, so that it still thinks there is a race. For test cases to diagnose a race, those cases must have the code that ensures there is not a race, hidden from view. > As far as I can see, there is no interceptor for sched_yield, so using that in the step function is OK. Sounds good. I’m running a test suite run now with my attribute patch. I’ll ask the tsan people if it can go in, if it passes.
On Tue, 6 Jan 2015 16:32:35, Mike Stump wrote: > > On Jan 6, 2015, at 3:22 PM, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote: >> Yes, I think too that it can't fail under these conditions. > > If you mean your version… A lot has been written on how to to make racy code non-racy… I’d refer you to the literature on all the various solutions people have found to date. I don’t think I’ve ever seen anyone claim that affinity can be used to solve race conditions like this. Do you have a pointer? > no, I agree, the affinity would just lower the likelihood, as sleep does. The version with affinity is just disgusting, sorry to have posted it ;-). I meant your version, with step(1)/step(2) between the race. It is probably bullet-proof. But the version with sleep is more realistic. That is however just a matter of taste... If we can agree to use your version, then all tests in g++.dg/tsan and c-c++-common/tsan that currently use sleep(1) should use your step function instead. sleep_sync.c can use step but needs still to call sleep because it triggers on the "As if synchronized via sleep" diagnostic. I don't know what to make of simple_race.c, which uses usleep, but uses a loop to repeat the race often. Maybe leave that one as a "realistic" test. Bernd.
On Wed, Jan 07, 2015 at 08:17:34AM +0100, Bernd Edlinger wrote: > > On Tue, 6 Jan 2015 16:32:35, Mike Stump wrote: > > > > On Jan 6, 2015, at 3:22 PM, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote: > >> Yes, I think too that it can't fail under these conditions. > > > > If you mean your version… A lot has been written on how to to make racy code non-racy… I’d refer you to the literature on all the various solutions people have found to date. I don’t think I’ve ever seen anyone claim that affinity can be used to solve race conditions like this. Do you have a pointer? > > > > no, I agree, the affinity would just lower the likelihood, as sleep does. > The version with affinity is just disgusting, sorry to have posted it ;-). > > I meant your version, with step(1)/step(2) between the race. It is probably bullet-proof. > But the version with sleep is more realistic. That is however just a matter of taste... > > If we can agree to use your version, then all tests in g++.dg/tsan and > c-c++-common/tsan that currently use sleep(1) should use your step function instead. > sleep_sync.c can use step but needs still to call sleep because it triggers on the > "As if synchronized via sleep" diagnostic. > > I don't know what to make of simple_race.c, which uses usleep, but uses > a loop to repeat the race often. Maybe leave that one as a "realistic" test. I'm fine with adding the no_sanitize_thread attribute, the patch LGTM, I think we might even have a PR for that. But I really don't like the busy waiting. You essentially want something like pthread_barrier_wait that libtsan isn't aware of, right? As tsan is only supported on x86_64-linux (and in the future could be on some other 64-bit linuxes), I wonder if we couldn't just simplify libgomp's config/linux/bar* for that purpose - libtsan doesn't intercept syscall(3) and doesn't intercept futex in particular, so we could have a busy waiting free synchronization primitive. Or another option is to just dlopen libpthread.so.0 and dlsym pthread_barrier_wait in it and use that, that way libtsan won't intercept it either. Jakub
On Wed, 7 Jan 2015 09:23:39, Jakub Jelinek wrote: > > But I really don't like the busy waiting. You essentially want something > like pthread_barrier_wait that libtsan isn't aware of, right? > Yes. > As tsan is only supported on x86_64-linux (and in the future could be on > some other 64-bit linuxes), I wonder if we couldn't just simplify libgomp's > config/linux/bar* for that purpose - libtsan doesn't intercept syscall(3) > and doesn't intercept futex in particular, so we could have a busy waiting > free synchronization primitive. Or another option is to just dlopen > libpthread.so.0 and dlsym pthread_barrier_wait in it and use that, that way > libtsan won't intercept it either. > That would be the honey pot. But unfortunately I can't reach it. I tried this, cat barrier.h #include <dlfcn.h> static pthread_barrier_t barrier; static int (*barrier_wait)(pthread_barrier_t *); __attribute__((constructor(101))) void barrier_init() { pthread_barrier_init (&barrier, NULL, 2); barrier_wait = (int (*)(pthread_barrier_t *)) dlsym (dlopen ("pthread.so.0", RTLD_NOW), "pthread_barrier_wait"); } But dlsym gives me only a pointer to the tsan-interceptor. Bernd.
On Wed, Jan 07, 2015 at 03:55:11PM +0100, Bernd Edlinger wrote: > cat barrier.h > #include <dlfcn.h> > > static pthread_barrier_t barrier; > > static int (*barrier_wait)(pthread_barrier_t *); Better static __typeof (pthread_barrier_wait) *barrier_wait; ? > > __attribute__((constructor(101))) I wouldn't use a constructor for it, instead simply call it from the testcase at the start of main. > void barrier_init() > { > pthread_barrier_init (&barrier, NULL, 2); > barrier_wait = (int (*)(pthread_barrier_t *)) > dlsym (dlopen ("pthread.so.0", RTLD_NOW), "pthread_barrier_wait"); "libpthread.so.0" instead. I'd use: #ifdef RTLD_NOLOAD void *h = dlopen ("libpthread.so.0", RTLD_NOLOAD); #else void *h = dlopen ("libpthread.so.0", RTLD_NOW); #endif barrier_wait = (__typeof (pthread_barrier_wait) *) dlsym (h, "pthread_barrier_wait"); Jakub
On Jan 6, 2015, at 11:17 PM, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote: > > On Tue, 6 Jan 2015 16:32:35, Mike Stump wrote: >> >> On Jan 6, 2015, at 3:22 PM, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote: >>> Yes, I think too that it can't fail under these conditions. >> >> If you mean your version… A lot has been written on how to to make racy code non-racy… I’d refer you to the literature on all the various solutions people have found to date. I don’t think I’ve ever seen anyone claim that affinity can be used to solve race conditions like this. Do you have a pointer? >> > > no, I agree, the affinity would just lower the likelihood, as sleep does. > The version with affinity is just disgusting, sorry to have posted it ;-). > > I meant your version, with step(1)/step(2) between the race. It is probably bullet-proof. > But the version with sleep is more realistic. > sleep_sync.c can use step but needs still to call sleep because it triggers on the > "As if synchronized via sleep" diagnostic. Ah, thanks for the pointer. Yeah, it is important to not remove that sleep, though, if the test case itself is racy, we can use step to ensure the elements of it happen in a deterministic order. > I don't know what to make of simple_race.c, which uses usleep, but uses > a loop to repeat the race often. Maybe leave that one as a "realistic" test. I looked at that test case to see if it is flaky (llvm term) and found: http://reviews.llvm.org/D3913 > The tests test that ThreadSanitizer finds the data race in particular conditions. However, ThreadSanitizer core algorithm can miss a data race when the racy memory access attempts happen very close to each other (literally simultaneously). This was done intentionally, fixing this would impose significant slowdown and this is not a problem for programs other than unit tests. So, clearly your presentation of the base problem is accurate. I still doubt the costs of a good solution are all that expensive. Anyway, http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140526/219172.html makes it clear that this test case is flakey (and should be fixed). I’d fix it by removing all looping and making it deterministic (with step). We can leave the usleep in there, it doesn’t hurt; though, not sure it adds anything.
On Jan 7, 2015, at 12:23 AM, Jakub Jelinek <jakub@redhat.com> wrote: > But I really don't like the busy waiting. We’ve already determined that sched_sleep isn’t intercepted and can be used to non-busy wait. Any reason not to use it? > As tsan is only supported on x86_64-linux So, I hate hardening the code to be overly non-portable when it doesn’t have to be that. There is something enticing to me about the simplicity of sched_sleep.
On Wed, Jan 07, 2015 at 08:58:04AM -0800, Mike Stump wrote: > On Jan 7, 2015, at 12:23 AM, Jakub Jelinek <jakub@redhat.com> wrote: > > But I really don't like the busy waiting. > > We’ve already determined that sched_sleep isn’t intercepted and can be used to non-busy wait. Any reason not to use it? > > > As tsan is only supported on x86_64-linux > > So, I hate hardening the code to be overly non-portable when it doesn’t have to be that. There is something enticing to me about the simplicity of sched_sleep. Well, pthread_barrier_wait and dlopen/dlsym are already used by libtsan and therefore have to be supported on all the architectures that support tsan. So that method is as portable as libtsan itself. Jakub
Index: aligned_vs_unaligned_race.C =================================================================== --- aligned_vs_unaligned_race.C (revision 219198) +++ aligned_vs_unaligned_race.C (working copy) @@ -1,5 +1,7 @@ /* { dg-shouldfail "tsan" } */ +/* { dg-additional-options "-pthread" } */ #include <pthread.h> +#include <sched.h> #include <stdio.h> #include <stdint.h> @@ -19,6 +21,12 @@ void *Thread2(void *x) { } int main() { + /* Run this test on a single CPU, to make it somewhat easier for tsan to + detect the race condition. */ + cpu_set_t s; + CPU_ZERO(&s); + CPU_SET(0, &s); + pthread_setaffinity_np(pthread_self(), sizeof(s), &s); pthread_t t[2]; pthread_create(&t[0], NULL, Thread1, NULL); pthread_create(&t[1], NULL, Thread2, NULL);