diff mbox series

[openmp,simt] Error out for user-defined reduction

Message ID 20210503102409.GA20090@delia
State New
Headers show
Series [openmp,simt] Error out for user-defined reduction | expand

Commit Message

Tom de Vries May 3, 2021, 10:24 a.m. UTC
Hi,

The test-case included in this patch contains this target region:
...
  for (int i0 = 0 ; i0 < N0 ; i0++ )
    counter_N0.i += 1;
...

When running with nvptx accelerator, the counter variable is expected to
be N0 after the region, but instead is N0 / 32.  The problem is that rather
than getting the result for all warp lanes, we get it for just one lane.

This is caused by the implementation of SIMT being incomplete.  It handles
regular reductions, but appearantly not user-defined reductions.

For now, make this explicit by erroring out for nvptx, like this:
...
target-44.c: In function 'main':
target-44.c:20:9: error: SIMT reduction not fully implemented
...

Tested libgomp on x86_64-linux with and without nvptx accelerator.

Any comments?

Thanks,
- Tom

[openmp, simt] Error out for user-defined reduction

gcc/ChangeLog:

2021-05-03  Tom de Vries  <tdevries@suse.de>

	PR target/100321
	* omp-low.c (lower_rec_input_clauses): Error out for user-defined reduction
	for SIMT.

libgomp/ChangeLog:

2021-05-03  Tom de Vries  <tdevries@suse.de>

	PR target/100321
	* testsuite/libgomp.c/target-44.c: New test.

---
 gcc/omp-low.c                           |  2 ++
 libgomp/testsuite/libgomp.c/target-44.c | 28 ++++++++++++++++++++++++++++
 2 files changed, 30 insertions(+)

Comments

Jakub Jelinek May 3, 2021, 10:31 a.m. UTC | #1
On Mon, May 03, 2021 at 12:24:10PM +0200, Tom de Vries wrote:
> The test-case included in this patch contains this target region:
> ...
>   for (int i0 = 0 ; i0 < N0 ; i0++ )
>     counter_N0.i += 1;
> ...
> 
> When running with nvptx accelerator, the counter variable is expected to
> be N0 after the region, but instead is N0 / 32.  The problem is that rather
> than getting the result for all warp lanes, we get it for just one lane.
> 
> This is caused by the implementation of SIMT being incomplete.  It handles
> regular reductions, but appearantly not user-defined reductions.
> 
> For now, make this explicit by erroring out for nvptx, like this:
> ...
> target-44.c: In function 'main':
> target-44.c:20:9: error: SIMT reduction not fully implemented
> ...
> 
> Tested libgomp on x86_64-linux with and without nvptx accelerator.
> 
> Any comments?

If you want a workaround, the workaround should be to disable SIMT if
UDR reductions are seen, rather than erroring out.
So e.g. in lower_rec_simd_input_clauses for sctx->is_simt if sctx->max_vf
isn't 1 look for OMP_CLAUSE_REDUCTION with OMP_CLAUSE_REDUCTION_PLACEHOLDER
and punt (set max_vf = 1) in that case.

The right thing is to implement it properly of course.

	Jakub
Tom de Vries May 3, 2021, 5:03 p.m. UTC | #2
On 5/3/21 12:31 PM, Jakub Jelinek wrote:
> On Mon, May 03, 2021 at 12:24:10PM +0200, Tom de Vries wrote:
>> The test-case included in this patch contains this target region:
>> ...
>>   for (int i0 = 0 ; i0 < N0 ; i0++ )
>>     counter_N0.i += 1;
>> ...
>>
>> When running with nvptx accelerator, the counter variable is expected to
>> be N0 after the region, but instead is N0 / 32.  The problem is that rather
>> than getting the result for all warp lanes, we get it for just one lane.
>>
>> This is caused by the implementation of SIMT being incomplete.  It handles
>> regular reductions, but appearantly not user-defined reductions.
>>
>> For now, make this explicit by erroring out for nvptx, like this:
>> ...
>> target-44.c: In function 'main':
>> target-44.c:20:9: error: SIMT reduction not fully implemented
>> ...
>>
>> Tested libgomp on x86_64-linux with and without nvptx accelerator.
>>
>> Any comments?
> 
> If you want a workaround, the workaround should be to disable SIMT if
> UDR reductions are seen, rather than erroring out.
> So e.g. in lower_rec_simd_input_clauses for sctx->is_simt if sctx->max_vf
> isn't 1 look for OMP_CLAUSE_REDUCTION with OMP_CLAUSE_REDUCTION_PLACEHOLDER
> and punt (set max_vf = 1) in that case.
> 

Thanks for the review, I've tried to implement this, see patch below.

> The right thing is to implement it properly of course.

Ack, I've taken a look, and for me itd doesn't look like a below-a-day
kind of task, so unfortunately I don't have the time for this right now.

Thanks,
- Tom
Jakub Jelinek May 3, 2021, 5:14 p.m. UTC | #3
On Mon, May 03, 2021 at 07:03:24PM +0200, Tom de Vries wrote:
> +      if (sctx->is_simt && !known_eq (sctx->max_vf, 1U))
> +	{
> +	  tree c = omp_find_clause (gimple_omp_for_clauses (ctx->stmt),
> +				    OMP_CLAUSE_REDUCTION);
> +	  if (c && OMP_CLAUSE_REDUCTION_PLACEHOLDER (c))
> +	    /* UDR reductions are not supported yet for SIMT, disable SIMT.  */
> +	    sctx->max_vf = 1;

This isn't sufficient, you could have e.g. 2 reductions, the first non-UDR
one and the second one with UDR.
So it needs to be a for loop like:
	  for (tree c = gimple_omp_for_clauses (ctx->stmt); c;
	       c = OMP_CLAUSE_CHAIN (c))
	    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
		&& OMP_CLAUSE_REDUCTION_PLACEHOLDER (c))
	      {
		/* UDR reductions are not supported yet for SIMT,
		   disable SIMT.  */
		sctx->max_vf = 1;
		break;
	      }
(or with omp_find_clause used in two spots).

	Jakub
Thomas Schwinge May 18, 2021, 11:03 a.m. UTC | #4
Hi!

On 2021-05-03T19:03:24+0200, Tom de Vries <tdevries@suse.de> wrote:
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.c/target-44.c
> @@ -0,0 +1,27 @@
> +/* { dg-additional-options "-foffload=-latomic" { target { offload_target_nvptx } } } */

Causes issues if more than nvptx offloading compilation is enabled.  Thus
pushed "'libgomp.c/target-44.c': Restrict '-latomic' to nvptx offloading
compilation" to master branch in commit
abf937ac00e523576ca86957dfa9769281896ca5, see attached.


Grüße
 Thomas


-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf
diff mbox series

Patch

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 7b122059c6e..0f122857a3a 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -6005,6 +6005,8 @@  lower_rec_input_clauses (tree clauses, gimple_seq *ilist, gimple_seq *dlist,
 		  tree placeholder = OMP_CLAUSE_REDUCTION_PLACEHOLDER (c);
 		  gimple *tseq;
 		  tree ptype = TREE_TYPE (placeholder);
+		  if (sctx.is_simt)
+		    error ("SIMT reduction not fully implemented");
 		  if (cond)
 		    {
 		      x = error_mark_node;
diff --git a/libgomp/testsuite/libgomp.c/target-44.c b/libgomp/testsuite/libgomp.c/target-44.c
new file mode 100644
index 00000000000..497931cd14c
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/target-44.c
@@ -0,0 +1,28 @@ 
+/* { dg-do link { target { offload_target_nvptx } } } */
+/* { dg-additional-options "-foffload=-latomic" { target { offload_target_nvptx } } } */
+/* { dg-error "SIMT reduction not fully implemented" "" { target { offload_target_nvptx } } 0 }  */
+#include <stdlib.h>
+
+struct s
+{
+  int i;
+};
+
+#pragma omp declare reduction(+: struct s: omp_out.i += omp_in.i)
+
+int
+main (void)
+{
+  const int N0 = 32768;
+
+  struct s counter_N0 = { 0 };
+#pragma omp target
+#pragma omp for simd reduction(+: counter_N0)
+  for (int i0 = 0 ; i0 < N0 ; i0++ )
+    counter_N0.i += 1;
+
+  if (counter_N0.i != N0)
+    abort ();
+
+  return 0;
+}