Message ID: d6e7acca-50e3-0672-ed72-1e39d63f8212@suse.cz
State: New
Received as HTML from Kelvin:

I have reviewed some of the implementation details of pthread_mutex_init and pthread_mutex_lock, and have confirmed that ppc64 does use memset to initialize the entirety of its mutex lock structure before potentially overwriting a few of its fields with non-zero data. So in this regard, your fix seems appropriate.

However, I'm confused about how this program runs. My understanding of the test case code is that the use of barrier_wait() calls is supposed to force Thread1 to run "to completion" before Thread2 even starts. So if my understanding is correct, it is not sensible for Thread1's pthread_mutex_init call to be followed "immediately" by Thread2's pthread_mutex_lock() call.

The comment in tsan_barrier.h describes barrier_init and barrier_wait as "TSAN-invisible barriers". Can you help me understand how TSAN works in this regard? I was expecting that TSAN runs the program with instrumentation on the TSAN-visible "thread services" but no instrumentation on the TSAN-invisible thread services. Is this understanding correct?

The questions I am trying to answer for myself are whether this test, as crafted, is really sufficiently "portable" and "meaningful". It seems to be making non-portable assumptions about the implementation of pthread_mutex_init, pthread_mutex_lock, and pthread_mutex_unlock, even with the proposed change.

Just to clarify: is it your understanding that the expected tsan diagnostic report is complaining that some data field of the mutex structure is being read in Thread2 after having been written in Thread1, without any "happens before" relationship (since tsan cannot see the barrier_wait calls)?

Since the "conflict" goes all the way back to Thread1's pthread_mutex_init call, am I correct in my understanding that the "first conflict noticed" involves some part of the lock structure that is not normally modified by the execution of pthread_mutex_lock and pthread_mutex_unlock?
Maybe adding a comment or two to the test case to clarify what is being tested would be most helpful here...
On 04/26/2017 12:05 PM, Martin Liška wrote:
> [...]

Well, I'm not the author of either the original test-case or the modifications that added tsan_barrier.h. I'm CCing Bernd, who can hopefully answer your questions.

Martin
On 04/26/17 12:07, Martin Liška wrote:
> [...]

Hmm, that's a difficult story. In the PR the problem is that the 1-byte access is expected to come from pthread_mutex_init. But in fact the access is artificially generated by the instrumentation in tsan/tsan_rtl_mutex.cc (MutexCreate), where we have this (after the mutex is already successfully created):

  MemoryWrite(thr, pc, addr, kSizeLog1);

What pthread_mutex_init normally does is invisible to tsan, because it is compiled without instrumentation. However, in your glibc version a memset is actually used, and that can be intercepted by tsan, even if the .so was not built with tsan instrumentation. That was probably an unexpected second access, and it made the test case fail, because the earlier access was reported.

I think, however, that only one extra call frame can ever be seen, because stack frames are only created by instrumentation at instrumented function begin and end statements. So the test expectations could probably be more strict than in the proposed patch.

I probably don't understand your question right, but without the barriers the test case could either report nothing or even crash, because pthread_mutex_lock in Thread2 operates on uninitialized data. The barrier is just there to nail down the sequence of events so that the expected test result happens with 100% likelihood.

It is in the nature of tsan that it cannot report every problem if you only run the program once, because the internal shadow memory is inherently racy. That is why in practice many test runs are necessary, but that is not how our test suite works.

Bernd.
From 8001530f2041a0a390e62a710f28ef42ef730cc7 Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Mon, 24 Apr 2017 14:59:18 +0200
Subject: [PATCH] Fix test-case on ppc64le (PR testsuite/79455).

gcc/testsuite/ChangeLog:

2017-04-24  Martin Liska  <mliska@suse.cz>

	* c-c++-common/tsan/race_on_mutex.c: Make the scanned pattern
	more generic.
---
 gcc/testsuite/c-c++-common/tsan/race_on_mutex.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/tsan/race_on_mutex.c b/gcc/testsuite/c-c++-common/tsan/race_on_mutex.c
index ae30d053c92..b3274a60ce5 100644
--- a/gcc/testsuite/c-c++-common/tsan/race_on_mutex.c
+++ b/gcc/testsuite/c-c++-common/tsan/race_on_mutex.c
@@ -37,9 +37,10 @@ int main() {
 }
 
 /* { dg-output "WARNING: ThreadSanitizer: data race.*(\n|\r\n|\r)" } */
-/* { dg-output " Atomic read of size 1 at .* by thread T2:(\n|\r\n|\r)" } */
+/* { dg-output " Atomic read of size \[0-9]\+ at .* by thread T2:(\n|\r\n|\r)" } */
 /* { dg-output " #0 pthread_mutex_lock.*" } */
 /* { dg-output " #1 Thread2.* .*(race_on_mutex.c:22|\\?{2}:0) (.*)" } */
-/* { dg-output " Previous write of size 1 at .* by thread T1:(\n|\r\n|\r)" } */
-/* { dg-output " #0 pthread_mutex_init .* (.)*" } */
-/* { dg-output " #1 Thread1.* .*(race_on_mutex.c:12|\\?{2}:0) .*" } */
+/* { dg-output " Previous write of size \[0-9]\+ at .* by thread T1:(\n|\r\n|\r)" } */
+/* { dg-output "( #. \[^\n\r\]*(\n|\r\n|\r))?" } */
+/* { dg-output " #. (__)?pthread_mutex_init \[^\n\r\]* (.)*" } */
+/* { dg-output " #. Thread1.* .*(race_on_mutex.c:12|\\?{2}:0) .*" } */
-- 
2.12.2