diff mbox

update edge profile info in nvptx.c

Message ID f08fb5be-40c8-117e-33f3-d523b8f2d065@codesourcery.com
State New
Headers show

Commit Message

Cesar Philippidis July 13, 2017, 4:53 p.m. UTC
The recent basic block profiling changes broke a couple of libgomp
OpenACC execution tests involving reductions with nvptx offloading. For
gang and worker reductions, the nvptx BE updates the original reduction
variable using a lock-free atomic algorithm. This lock-free algorithm
utilizes a polling loop to check the state of the variable being
updated. This loop introduced a new basic block edge, but it wasn't
assigned a branch probability. Because of the highly threaded nature of
CUDA accelerators, I set the branch probability for that edge as even.

Similarly, for nvptx vector reductions, when it comes time to initialize
the reduction variable, the nvptx BE constructs a branch so that only
vector lanes 1 to vector_length-1 are initialized the the default value
for a given reduction type, where vector lane 0 retains the original
value of the reduction variable. For similar reason to the gang and
worker reductions, I set the probability of the new edge introduced for
the vector reduction to even.

Is this OK for trunk?

Cesar

Comments

Tom de Vries July 20, 2017, 1:04 p.m. UTC | #1
On 07/13/2017 06:53 PM, Cesar Philippidis wrote:
> Similarly, for nvptx vector reductions, when it comes time to initialize
> the reduction variable, the nvptx BE constructs a branch so that only
> vector lanes 1 to vector_length-1 are initialized the the default value
> for a given reduction type, where vector lane 0 retains the original
> value of the reduction variable. For similar reason to the gang and
> worker reductions, I set the probability of the new edge introduced for
> the vector reduction to even.
> 

Hi,

The problem that you describe in abstract term looks like this concretely:
....
(gdb) call debug_bb_n (4)
;; basic block 4, loop depth 0, freq 662, maybe hot
;;  prev block 3, next block 16, flags: (VISITED)
;;  pred:       3 [always (guessed)]  (FALLTHRU,EXECUTABLE)
# VUSE <.MEM_61>
# PT = nonlocal unit-escaped null
_18 = MEM[(const struct .omp_data_t.33D.1518 &).omp_data_i_9(D)
           clique 1 base 1].s2D.1519;
# VUSE <.MEMD.1540>
# USE = anything
_72 = GOACC_DIM_POS (2);
if (_72 != 0)
   goto <bb 16>; [100.00%] [count: INV]
else
   goto <bb 17>; [INV] [count: INV]
;;  succ:       16 [always]  (TRUE_VALUE)
;;              17 (FALSE_VALUE)
...

The edge to bb16 has probability 100%. The edge to bb17 has no 
probability set.

> @@ -5211,6 +5213,7 @@ nvptx_goacc_reduction_init (gcall *call)
>         
>         /* Create false edge from call_bb to dst_bb.  */
>         edge nop_edge = make_edge (call_bb, dst_bb, EDGE_FALSE_VALUE);
> +      nop_edge->probability = profile_probability::even ();
>   
>         /* Create phi node in dst block.  */
>         gphi *phi = create_phi_node (lhs, dst_bb);
> 

Setting the probability of the edge to bb17 to 50% LGTM.

But obviously having an accumulative outgoing edge probability of 150% 
is not a good idea.

Thanks,
- Tom
Cesar Philippidis July 20, 2017, 2:12 p.m. UTC | #2
On 07/20/2017 06:04 AM, Tom de Vries wrote:
> On 07/13/2017 06:53 PM, Cesar Philippidis wrote:
>> Similarly, for nvptx vector reductions, when it comes time to initialize
>> the reduction variable, the nvptx BE constructs a branch so that only
>> vector lanes 1 to vector_length-1 are initialized the the default value
>> for a given reduction type, where vector lane 0 retains the original
>> value of the reduction variable. For similar reason to the gang and
>> worker reductions, I set the probability of the new edge introduced for
>> the vector reduction to even.
>>
> 
> Hi,
> 
> The problem that you describe in abstract term looks like this concretely:
> ....
> (gdb) call debug_bb_n (4)
> ;; basic block 4, loop depth 0, freq 662, maybe hot
> ;;  prev block 3, next block 16, flags: (VISITED)
> ;;  pred:       3 [always (guessed)]  (FALLTHRU,EXECUTABLE)
> # VUSE <.MEM_61>
> # PT = nonlocal unit-escaped null
> _18 = MEM[(const struct .omp_data_t.33D.1518 &).omp_data_i_9(D)
>           clique 1 base 1].s2D.1519;
> # VUSE <.MEMD.1540>
> # USE = anything
> _72 = GOACC_DIM_POS (2);
> if (_72 != 0)
>   goto <bb 16>; [100.00%] [count: INV]
> else
>   goto <bb 17>; [INV] [count: INV]
> ;;  succ:       16 [always]  (TRUE_VALUE)
> ;;              17 (FALSE_VALUE)
> ...
> 
> The edge to bb16 has probability 100%. The edge to bb17 has no
> probability set.

Is that with or without the patch?

>> @@ -5211,6 +5213,7 @@ nvptx_goacc_reduction_init (gcall *call)
>>                 /* Create false edge from call_bb to dst_bb.  */
>>         edge nop_edge = make_edge (call_bb, dst_bb, EDGE_FALSE_VALUE);
>> +      nop_edge->probability = profile_probability::even ();
>>           /* Create phi node in dst block.  */
>>         gphi *phi = create_phi_node (lhs, dst_bb);
>>
> 
> Setting the probability of the edge to bb17 to 50% LGTM.
> 
> But obviously having an accumulative outgoing edge probability of 150%
> is not a good idea.

Yeah, that doesn't seem correct. Would you like to take over this patch?
I saw that you started working on this issue (or a similar one to it) in
PR81442.

Cesar
Tom de Vries July 20, 2017, 2:17 p.m. UTC | #3
On 07/20/2017 04:12 PM, Cesar Philippidis wrote:
> Would you like to take over this patch?
> I saw that you started working on this issue (or a similar one to it) in
> PR81442.

Sure.

Thanks,
- Tom
diff mbox

Patch

2017-07-13  Cesar Philippidis  <cesar@codesourcery.com>

	gcc
	* config/nvptx/nvptx.c (nvptx_lockless_update): Update edge
	profiling information.
	(nvptx_lockfull_update): Likewise.
	(nvptx_goacc_reduction_init): Likewise.


diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index c8847a5dbba..3a24bd375ca 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -4985,6 +4985,7 @@  nvptx_lockless_update (location_t loc, gimple_stmt_iterator *gsi,
 
   post_edge->flags ^= EDGE_TRUE_VALUE | EDGE_FALLTHRU;
   edge loop_edge = make_edge (loop_bb, loop_bb, EDGE_FALSE_VALUE);
+  loop_edge->probability = profile_probability::even ();
   set_immediate_dominator (CDI_DOMINATORS, loop_bb, pre_bb);
   set_immediate_dominator (CDI_DOMINATORS, post_bb, loop_bb);
 
@@ -5057,7 +5058,8 @@  nvptx_lockfull_update (location_t loc, gimple_stmt_iterator *gsi,
   
   /* Create the lock loop ... */
   locked_edge->flags ^= EDGE_TRUE_VALUE | EDGE_FALLTHRU;
-  make_edge (lock_bb, lock_bb, EDGE_FALSE_VALUE);
+  edge e = make_edge (lock_bb, lock_bb, EDGE_FALSE_VALUE);
+  e->probability = profile_probability::even ();
   set_immediate_dominator (CDI_DOMINATORS, lock_bb, entry_bb);
   set_immediate_dominator (CDI_DOMINATORS, update_bb, lock_bb);
 
@@ -5211,6 +5213,7 @@  nvptx_goacc_reduction_init (gcall *call)
       
       /* Create false edge from call_bb to dst_bb.  */
       edge nop_edge = make_edge (call_bb, dst_bb, EDGE_FALSE_VALUE);
+      nop_edge->probability = profile_probability::even ();
 
       /* Create phi node in dst block.  */
       gphi *phi = create_phi_node (lhs, dst_bb);