diff mbox

[1/4] Cherry-pick fprofile-generate-atomic from google/gcc-4_9 branch

Message ID 1253ac69-3301-f185-e43a-a34cadf8f51e@suse.cz
State New
Headers show

Commit Message

Martin Liška Aug. 5, 2016, 8:55 a.m. UTC
On 08/04/2016 07:03 PM, Nathan Sidwell wrote:
> On 08/04/16 12:43, Nathan Sidwell wrote:
> 
>> How about:
>> gcov_t expected;
>> atomic_load (&counter[0],  val, ...);
>> gcov_t delta = val == value ? 1 : -1;
>> atomic_add (&counter[1], delta);   <-- or atomic_add_fetch
>> if (delta < 0) {
>>   /* can we set counter[0]? */
>>   atomic_load (&counter[1], &expected, ...);
>>   if (expected < 0) {
>>     atomic_store (&counter[0], value, ...);
>>     atomic_add (&counter[1], 2, ...);
>>   }
>> }
>> atomic_add (&counter[2], 1, ...);
> 

Hi.

Thank you for very intensive brainstorming ;) Well I still believe that following code
is not thread safe, let's image following situation:

> we could do better by using compare_exchange storing value, and detect the race I mentioned:
> 
> gcov_t expected, val;
> atomic_load (&counter[0],  &val, ...);

[thread 1]: value == 1, read val == 1 // scheduled here

> gcov_t delta = val == value ? 1 : -1;
> atomic_add (&counter[1], delta);
> if (delta < 0) {
>    retry:
>     /* can we set counter[0]? */
>     atomic_load (&counter[1], &expected, ...);
>     if (expected < 0) {
>       bool stored = atomic_compare_exchange (&counter[0], &val, &value, ...);
>       if (!stored && val != value)
>         goto retry;
[thread 2]: value == 2, just updated counter[0] to 2
// after that [thread 1] continue, but wrongly does counter[1]++, but value != counter[0]
>       atomic_add (&counter[1], 2, ...);
>   }
> }
> atomic_add (&counter[2], 1, ...);
> 
> This  corrects the off-by one issue.
> 
> nathan

Well, I wrote attached test-case which should trigger a data-race, but TSAN is silent:

$ g++ race.cc  -pthread -fprofile-generate -g -fsanitize=thread -fprofile-update=atomic
$ ./a.out

In main: creating thread 0
In main: creating thread 1
new counter[1] value, N:0
In main: creating thread 2
new counter[1] value, N:1
new counter[1] value, N:2
new counter[1] value, N:3
new counter[1] value, N:4
new counter[1] value, N:5
new counter[1] value, N:6
new counter[1] value, N:7
new counter[1] value, N:8
new counter[1] value, N:9
new counter[1] value, N:10
new counter[1] value, N:11
new counter[1] value, N:12
new counter[1] value, N:12
new counter[1] value, N:13
new counter[1] value, N:14
new counter[1] value, N:15
new counter[1] value, N:16
In main: creating thread 3
In main: creating thread 4
In main: creating thread 5
In main: creating thread 6
In main: creating thread 7
In main: creating thread 8
In main: creating thread 9
In main: creating thread 10
In main: creating thread 11
In main: creating thread 12
In main: creating thread 13
In main: creating thread 14
In main: creating thread 15

However, not updating arc counters with atomic operations causes really many races:

$ g++ race.cc  -pthread -fprofile-generate -g -fsanitize=thread
$ ./a.out 2>&1 | grep 'data race' | wc -l
110

Sample:
WARNING: ThreadSanitizer: data race (pid=11424)
  Read of size 8 at 0x000000606718 by thread T4:
    #0 A::foo() /tmp/race.cc:10 (a.out+0x000000401e78)

  Previous write of size 8 at 0x000000606718 by thread T1:
    [failed to restore the stack]

  Location is global '__gcov0._ZN1A3fooEv' of size 16 at 0x000000606710 (a.out+0x000000606718)

  Thread T4 (tid=11429, running) created by main thread at:
    #0 pthread_create ../../../../libsanitizer/tsan/tsan_interceptors.cc:876 (libtsan.so.0+0x00000002ad2d)
    #1 main /tmp/race.cc:43 (a.out+0x000000401afb)

  Thread T1 (tid=11426, finished) created by main thread at:
    #0 pthread_create ../../../../libsanitizer/tsan/tsan_interceptors.cc:876 (libtsan.so.0+0x00000002ad2d)
    #1 main /tmp/race.cc:43 (a.out+0x000000401afb)

Maybe I miss something and my tester sample is wrong (please apply attached patch to use original __gcov_one_value_profiler_body)?
Thanks,
Martin

Comments

Nathan Sidwell Aug. 5, 2016, 12:38 p.m. UTC | #1
On 08/05/16 04:55, Martin Liška wrote:

> Thank you for very intensive brainstorming ;) Well I still believe that following code
> is not thread safe, let's image following situation:
>

yeah, you're right.

>> we could do better by using compare_exchange storing value, and detect the race I mentioned:
>>
>> gcov_t expected, val;
>> atomic_load (&counter[0],  &val, ...);
>
> [thread 1]: value == 1, read val == 1 // scheduled here
>
>> gcov_t delta = val == value ? 1 : -1;
>> atomic_add (&counter[1], delta);
>> if (delta < 0) {
>>    retry:
>>     /* can we set counter[0]? */
>>     atomic_load (&counter[1], &expected, ...);
>>     if (expected < 0) {
>>       bool stored = atomic_compare_exchange (&counter[0], &val, &value, ...);
>>       if (!stored && val != value)
>>         goto retry;
> [thread 2]: value == 2, just updated counter[0] to 2
> // after that [thread 1] continue, but wrongly does counter[1]++, but value != counter[0]
>>       atomic_add (&counter[1], 2, ...);

Bah.  but (a) does it matter enough? and (b) if so does changing the delta<0 
handling to store a count of 1 solve it?: (answer: no)

gcov_t expected, val;
A:atomic_load (&counter[0],  &val, ...);
gcov_t delta = val == value ? 1 : -1;
B:atomic_add (&counter[1], delta);

if (delta < 0) {
      /* can we set counter[0]? */
      C:atomic_load (&counter[1], &expected, ...);
      if (expected < 0) {
        D:atomic_store (&counter[0], &value);
        E: atomic_store (&counter[1], 1);
   }
atomic_add (&counter[1], 2, ...);


thread-1: value = 1, reads '1' at A
thread-2: value = 2, reads '1' at A
thread-2: decrements count @ B
thread-2: reads -1 at C
thread-2: write 2 at D
thread-2: stores 1 at E
thread-1: increments count @ B (finally)

So we still can go awry.  But the code's simpler.  Like you said, I don't think 
it's possible to solve without an atomic update to both counter[0] & counter[1].


> Well, I wrote attached test-case which should trigger a data-race, but TSAN is silent:

I'm not too surprised.  The race window is tiny and you put a printf in the 
middle of one path.  I suspect if you put a sleep/printf on the counter[1] 
increment path you'll see it more often.

nathan
Martin Liška Aug. 5, 2016, 12:48 p.m. UTC | #2
On 08/05/2016 02:38 PM, Nathan Sidwell wrote:
> On 08/05/16 04:55, Martin Liška wrote:
> 
>> Thank you for very intensive brainstorming ;) Well I still believe that following code
>> is not thread safe, let's image following situation:
>>
> 
> yeah, you're right.
> 
>>> we could do better by using compare_exchange storing value, and detect the race I mentioned:
>>>
>>> gcov_t expected, val;
>>> atomic_load (&counter[0],  &val, ...);
>>
>> [thread 1]: value == 1, read val == 1 // scheduled here
>>
>>> gcov_t delta = val == value ? 1 : -1;
>>> atomic_add (&counter[1], delta);
>>> if (delta < 0) {
>>>    retry:
>>>     /* can we set counter[0]? */
>>>     atomic_load (&counter[1], &expected, ...);
>>>     if (expected < 0) {
>>>       bool stored = atomic_compare_exchange (&counter[0], &val, &value, ...);
>>>       if (!stored && val != value)
>>>         goto retry;
>> [thread 2]: value == 2, just updated counter[0] to 2
>> // after that [thread 1] continue, but wrongly does counter[1]++, but value != counter[0]
>>>       atomic_add (&counter[1], 2, ...);
> 
> Bah.  but (a) does it matter enough? and (b) if so does changing the delta<0 handling to store a count of 1 solve it?: (answer: no)
> 
> gcov_t expected, val;
> A:atomic_load (&counter[0],  &val, ...);
> gcov_t delta = val == value ? 1 : -1;
> B:atomic_add (&counter[1], delta);
> 
> if (delta < 0) {
>      /* can we set counter[0]? */
>      C:atomic_load (&counter[1], &expected, ...);
>      if (expected < 0) {
>        D:atomic_store (&counter[0], &value);
>        E: atomic_store (&counter[1], 1);
>   }
> atomic_add (&counter[1], 2, ...);
> 
> 
> thread-1: value = 1, reads '1' at A
> thread-2: value = 2, reads '1' at A
> thread-2: decrements count @ B
> thread-2: reads -1 at C
> thread-2: write 2 at D
> thread-2: stores 1 at E
> thread-1: increments count @ B (finally)
> 
> So we still can go awry.  But the code's simpler.  Like you said, I don't think it's possible to solve without an atomic update to both counter[0] & counter[1].
> 
> 
>> Well, I wrote attached test-case which should trigger a data-race, but TSAN is silent:
> 
> I'm not too surprised.  The race window is tiny and you put a printf in the middle of one path.  I suspect if you put a sleep/printf on the counter[1] increment path you'll see it more often.
> 
> nathan

Ok, after all the experimenting and inventing "almost" thread-safe code, I incline to not to include __gcov_one_value_profiler_body_atomic
counter. The final solution is cumbersome and probably does not worth doing. Moreover, even having a thread-safe implementation, result of an indirect call counter
is not going to be stable among different runs (due to a single value storage capability).

If you agree, I'll prepare a final version of patch?

Thanks,
Martin
Nathan Sidwell Aug. 5, 2016, 1:14 p.m. UTC | #3
On 08/05/16 08:48, Martin Liška wrote:

> Ok, after all the experimenting and inventing "almost" thread-safe code, I incline to not to include __gcov_one_value_profiler_body_atomic
> counter. The final solution is cumbersome and probably does not worth doing. Moreover, even having a thread-safe implementation, result of an indirect call counter
> is not going to be stable among different runs (due to a single value storage capability).
>
> If you agree, I'll prepare a final version of patch?

Agreed.

nathan
diff mbox

Patch

diff --git a/gcc/tree-profile.c b/gcc/tree-profile.c
index 6e96fa9..42b780f 100644
--- a/gcc/tree-profile.c
+++ b/gcc/tree-profile.c
@@ -188,8 +188,8 @@  gimple_init_edge_profiler (void)
       profiler_fn_name = "__gcov_indirect_call_profiler_v2";
       if (PARAM_VALUE (PARAM_INDIR_CALL_TOPN_PROFILE))
 	profiler_fn_name = "__gcov_indirect_call_topn_profiler";
-      if (flag_profile_update == PROFILE_UPDATE_ATOMIC)
-	profiler_fn_name = "__gcov_indirect_call_profiler_v2_atomic";
+//      if (flag_profile_update == PROFILE_UPDATE_ATOMIC)
+//	profiler_fn_name = "__gcov_indirect_call_profiler_v2_atomic";
 
       tree_indirect_call_profiler_fn
 	      = build_fn_decl (profiler_fn_name, ic_profiler_fn_type);
diff --git a/libgcc/libgcov-profiler.c b/libgcc/libgcov-profiler.c
index c1e287d..d43932e 100644
--- a/libgcc/libgcov-profiler.c
+++ b/libgcc/libgcov-profiler.c
@@ -70,6 +70,8 @@  __gcov_pow2_profiler (gcov_type *counters, gcov_type value)
 
    In any case, COUNTERS[2] is incremented.  */
 
+static int counter = 0;
+
 static inline void
 __gcov_one_value_profiler_body (gcov_type *counters, gcov_type value)
 {
@@ -77,6 +79,7 @@  __gcov_one_value_profiler_body (gcov_type *counters, gcov_type value)
     counters[1]++;
   else if (counters[1] == 0)
     {
+      fprintf (stderr, "new counter[1] value, N:%d\n", counter++);
       counters[1] = 1;
       counters[0] = value;
     }