Message ID | 20210503102409.GA20090@delia |
---|---|
State | New |
Headers | show |
Series | [openmp,simt] Error out for user-defined reduction | expand |
On Mon, May 03, 2021 at 12:24:10PM +0200, Tom de Vries wrote: > The test-case included in this patch contains this target region: > ... > for (int i0 = 0 ; i0 < N0 ; i0++ ) > counter_N0.i += 1; > ... > > When running with nvptx accelerator, the counter variable is expected to > be N0 after the region, but instead is N0 / 32. The problem is that rather > than getting the result for all warp lanes, we get it for just one lane. > > This is caused by the implementation of SIMT being incomplete. It handles > regular reductions, but appearantly not user-defined reductions. > > For now, make this explicit by erroring out for nvptx, like this: > ... > target-44.c: In function 'main': > target-44.c:20:9: error: SIMT reduction not fully implemented > ... > > Tested libgomp on x86_64-linux with and without nvptx accelerator. > > Any comments? If you want a workaround, the workaround should be to disable SIMT if UDR reductions are seen, rather than erroring out. So e.g. in lower_rec_simd_input_clauses for sctx->is_simt if sctx->max_vf isn't 1 look for OMP_CLAUSE_REDUCTION with OMP_CLAUSE_REDUCTION_PLACEHOLDER and punt (set max_vf = 1) in that case. The right thing is to implement it properly of course. Jakub
On 5/3/21 12:31 PM, Jakub Jelinek wrote: > On Mon, May 03, 2021 at 12:24:10PM +0200, Tom de Vries wrote: >> The test-case included in this patch contains this target region: >> ... >> for (int i0 = 0 ; i0 < N0 ; i0++ ) >> counter_N0.i += 1; >> ... >> >> When running with nvptx accelerator, the counter variable is expected to >> be N0 after the region, but instead is N0 / 32. The problem is that rather >> than getting the result for all warp lanes, we get it for just one lane. >> >> This is caused by the implementation of SIMT being incomplete. It handles >> regular reductions, but appearantly not user-defined reductions. >> >> For now, make this explicit by erroring out for nvptx, like this: >> ... >> target-44.c: In function 'main': >> target-44.c:20:9: error: SIMT reduction not fully implemented >> ... >> >> Tested libgomp on x86_64-linux with and without nvptx accelerator. >> >> Any comments? > > If you want a workaround, the workaround should be to disable SIMT if > UDR reductions are seen, rather than erroring out. > So e.g. in lower_rec_simd_input_clauses for sctx->is_simt if sctx->max_vf > isn't 1 look for OMP_CLAUSE_REDUCTION with OMP_CLAUSE_REDUCTION_PLACEHOLDER > and punt (set max_vf = 1) in that case. > Thanks for the review, I've tried to implement this, see patch below. > The right thing is to implement it properly of course. Ack, I've taken a look, and for me itd doesn't look like a below-a-day kind of task, so unfortunately I don't have the time for this right now. Thanks, - Tom
On Mon, May 03, 2021 at 07:03:24PM +0200, Tom de Vries wrote: > + if (sctx->is_simt && !known_eq (sctx->max_vf, 1U)) > + { > + tree c = omp_find_clause (gimple_omp_for_clauses (ctx->stmt), > + OMP_CLAUSE_REDUCTION); > + if (c && OMP_CLAUSE_REDUCTION_PLACEHOLDER (c)) > + /* UDR reductions are not supported yet for SIMT, disable SIMT. */ > + sctx->max_vf = 1; This isn't sufficient, you could have e.g. 2 reductions, the first non-UDR one and the second one with UDR. So it needs to be a for loop like: for (tree c = gimple_omp_for_clauses (ctx->stmt); c; c = OMP_CLAUSE_CHAIN (c)) if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION && OMP_CLAUSE_REDUCTION_PLACEHOLDER (c)) { /* UDR reductions are not supported yet for SIMT, disable SIMT. */ sctx->max_vf = 1; break; } (or with omp_find_clause used in two spots). Jakub
Hi! On 2021-05-03T19:03:24+0200, Tom de Vries <tdevries@suse.de> wrote: > --- /dev/null > +++ b/libgomp/testsuite/libgomp.c/target-44.c > @@ -0,0 +1,27 @@ > +/* { dg-additional-options "-foffload=-latomic" { target { offload_target_nvptx } } } */ Causes issues if more than nvptx offloading compilation is enabled. Thus pushed "'libgomp.c/target-44.c': Restrict '-latomic' to nvptx offloading compilation" to master branch in commit abf937ac00e523576ca86957dfa9769281896ca5, see attached. Grüße Thomas ----------------- Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf
diff --git a/gcc/omp-low.c b/gcc/omp-low.c index 7b122059c6e..0f122857a3a 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -6005,6 +6005,8 @@ lower_rec_input_clauses (tree clauses, gimple_seq *ilist, gimple_seq *dlist, tree placeholder = OMP_CLAUSE_REDUCTION_PLACEHOLDER (c); gimple *tseq; tree ptype = TREE_TYPE (placeholder); + if (sctx.is_simt) + error ("SIMT reduction not fully implemented"); if (cond) { x = error_mark_node; diff --git a/libgomp/testsuite/libgomp.c/target-44.c b/libgomp/testsuite/libgomp.c/target-44.c new file mode 100644 index 00000000000..497931cd14c --- /dev/null +++ b/libgomp/testsuite/libgomp.c/target-44.c @@ -0,0 +1,28 @@ +/* { dg-do link { target { offload_target_nvptx } } } */ +/* { dg-additional-options "-foffload=-latomic" { target { offload_target_nvptx } } } */ +/* { dg-error "SIMT reduction not fully implemented" "" { target { offload_target_nvptx } } 0 } */ +#include <stdlib.h> + +struct s +{ + int i; +}; + +#pragma omp declare reduction(+: struct s: omp_out.i += omp_in.i) + +int +main (void) +{ + const int N0 = 32768; + + struct s counter_N0 = { 0 }; +#pragma omp target +#pragma omp for simd reduction(+: counter_N0) + for (int i0 = 0 ; i0 < N0 ; i0++ ) + counter_N0.i += 1; + + if (counter_N0.i != N0) + abort (); + + return 0; +}