diff mbox

[2/2] OpenACC routine support

Message ID 5637B827.1020909@acm.org
State New
Headers show

Commit Message

Nathan Sidwell Nov. 2, 2015, 7:23 p.m. UTC
Here are the tests for the routine support.  The compiler tests check invalid 
combinations of gang, worker, vector & seq.  The libgomp execution tests check 
the expected partioning occurs within loops.  As  with the reduction tests, 
these ones  are taken from the execution model loop tests.

ok?

nathan

Comments

Jakub Jelinek Nov. 2, 2015, 7:41 p.m. UTC | #1
On Mon, Nov 02, 2015 at 02:23:19PM -0500, Nathan Sidwell wrote:
> +#pragma acc routine gang
> +void __attribute__ ((noinline)) gang (int ary[N])
> +{
> +#pragma acc loop gang
> +    for (unsigned ix = 0; ix < N; ix++)
> +      {
> +	if (__builtin_acc_on_device (5))
> +	  {
> +	    int g = 0, w = 0, v = 0;
> +
> +	    __asm__ volatile ("mov.u32 %0,%%ctaid.x;" : "=r" (g));
> +	    __asm__ volatile ("mov.u32 %0,%%tid.y;" : "=r" (w));
> +	    __asm__ volatile ("mov.u32 %0,%%tid.x;" : "=r" (v));
> +	    ary[ix] = (g << 16) | (w << 8) | v;
> +	  }
> +	else
> +	  ary[ix] = ix;

Does this work even with -O0?  I mean, the assembler is invalid
for any target other than PTX, so you are relying on aggressively folding
this away.

	Jakub
Nathan Sidwell Nov. 2, 2015, 8:01 p.m. UTC | #2
On 11/02/15 14:41, Jakub Jelinek wrote:
>
> Does this work even with -O0?  I mean, the assembler is invalid
> for any target other than PTX, so you are relying on aggressively folding
> this away.

Correct.  As thread identification is inherently target-specific, I don't see 
how to do otherwise.

We know _builtin_acc_on_device is folded for a constant arg, and -O2 enables 
dead code elimination such that non-PTX targets (such as the host) don't see 
that assembler.


nathan
Jakub Jelinek Nov. 3, 2015, 3:38 p.m. UTC | #3
On Mon, Nov 02, 2015 at 02:23:19PM -0500, Nathan Sidwell wrote:
> Here are the tests for the routine support.  The compiler tests check
> invalid combinations of gang, worker, vector & seq.  The libgomp execution
> tests check the expected partioning occurs within loops.  As  with the
> reduction tests, these ones  are taken from the execution model loop tests.

I find the testsuite coverage insufficient, e.g. you don't have equivalent
of first half of declare-simd-2.C or declare-simd-2.c
(everything above #pragma omp declare simd inbranch notinbranch),
to verify that if acc routine is used without the (fnname) in it, then
it can't be followed by var definition and various other tokens.

	Jakub
Nathan Sidwell Nov. 3, 2015, 3:56 p.m. UTC | #4
On 11/03/15 10:38, Jakub Jelinek wrote:
> On Mon, Nov 02, 2015 at 02:23:19PM -0500, Nathan Sidwell wrote:
>> Here are the tests for the routine support.  The compiler tests check
>> invalid combinations of gang, worker, vector & seq.  The libgomp execution
>> tests check the expected partioning occurs within loops.  As  with the
>> reduction tests, these ones  are taken from the execution model loop tests.
>
> I find the testsuite coverage insufficient, e.g. you don't have equivalent
> of first half of declare-simd-2.C or declare-simd-2.c
> (everything above #pragma omp declare simd inbranch notinbranch),
> to verify that if acc routine is used without the (fnname) in it, then
> it can't be followed by var definition and various other tokens.

d'oh! forgot to port those tests.  Easy fix.

ok with that added?

nathan
Jakub Jelinek Nov. 3, 2015, 4:04 p.m. UTC | #5
On Tue, Nov 03, 2015 at 10:56:37AM -0500, Nathan Sidwell wrote:
> On 11/03/15 10:38, Jakub Jelinek wrote:
> >On Mon, Nov 02, 2015 at 02:23:19PM -0500, Nathan Sidwell wrote:
> >>Here are the tests for the routine support.  The compiler tests check
> >>invalid combinations of gang, worker, vector & seq.  The libgomp execution
> >>tests check the expected partioning occurs within loops.  As  with the
> >>reduction tests, these ones  are taken from the execution model loop tests.
> >
> >I find the testsuite coverage insufficient, e.g. you don't have equivalent
> >of first half of declare-simd-2.C or declare-simd-2.c
> >(everything above #pragma omp declare simd inbranch notinbranch),
> >to verify that if acc routine is used without the (fnname) in it, then
> >it can't be followed by var definition and various other tokens.
> 
> d'oh! forgot to port those tests.  Easy fix.
> 
> ok with that added?

Yes.

	Jakub
diff mbox

Patch

2015-11-02  Nathan Sidwell  <nathan@codesourcery.com>

	gcc/testsuite/
	* c-c++-common/goacc/routine-1.c: New.
	* c-c++-common/goacc/routine-2.c: New.
	* c-c++-common/goacc/routine-3.c: New.
	* c-c++-common/goacc/routine-4.c: New.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/routine-g-1.c: New.
	* testsuite/libgomp.oacc-c-c++-common/routine-gwv-1.c: New.
	* testsuite/libgomp.oacc-c-c++-common/routine-v-1.c: New.
	* testsuite/libgomp.oacc-c-c++-common/routine-w-1.c: New.
	* testsuite/libgomp.oacc-c-c++-common/routine-wv-1.c: New.

Index: gcc/testsuite/c-c++-common/goacc/routine-1.c
===================================================================
--- gcc/testsuite/c-c++-common/goacc/routine-1.c	(revision 0)
+++ gcc/testsuite/c-c++-common/goacc/routine-1.c	(working copy)
@@ -0,0 +1,34 @@ 
+
+#pragma acc routine gang
+void gang (void)
+{
+}
+
+#pragma acc routine worker
+void worker (void)
+{
+}
+
+#pragma acc routine vector
+void vector (void)
+{
+}
+
+#pragma acc routine seq
+void seq (void)
+{
+}
+
+int main ()
+{
+
+#pragma acc parallel num_gangs (32) num_workers (32) vector_length (32)
+  {
+    gang ();
+    worker ();
+    vector ();
+    seq ();
+  }
+
+  return 0;
+}
Index: gcc/testsuite/c-c++-common/goacc/routine-2.c
===================================================================
--- gcc/testsuite/c-c++-common/goacc/routine-2.c	(revision 0)
+++ gcc/testsuite/c-c++-common/goacc/routine-2.c	(working copy)
@@ -0,0 +1,21 @@ 
+#pragma acc routine gang worker /* { dg-error "multiple loop axes" } */
+void gang (void)
+{
+}
+
+#pragma acc routine worker vector /* { dg-error "multiple loop axes" } */
+void worker (void)
+{
+}
+
+#pragma acc routine vector seq /* { dg-error "multiple loop axes" } */
+void vector (void)
+{
+}
+
+#pragma acc routine seq gang /* { dg-error "multiple loop axes" } */
+void seq (void)
+{
+}
+
+#pragma acc routine (nothing) gang /* { dg-error "not been declared" } */
Index: gcc/testsuite/c-c++-common/goacc/routine-3.c
===================================================================
--- gcc/testsuite/c-c++-common/goacc/routine-3.c	(revision 0)
+++ gcc/testsuite/c-c++-common/goacc/routine-3.c	(working copy)
@@ -0,0 +1,53 @@ 
+#pragma acc routine gang
+void gang (void) /* { dg-message "declared here" 3 } */
+{
+}
+
+#pragma acc routine worker
+void worker (void) /* { dg-message "declared here" 2 } */
+{
+}
+
+#pragma acc routine vector
+void vector (void) /* { dg-message "declared here" 1 } */
+{
+}
+
+#pragma acc routine seq
+void seq (void)
+{
+}
+
+int main ()
+{
+
+#pragma acc parallel num_gangs (32) num_workers (32) vector_length (32)
+  {
+    #pragma acc loop gang /* { dg-message "loop here" 1 } */
+    for (int i = 0; i < 10; i++)
+      {
+	gang (); /*  { dg-error "routine call uses same" } */
+	worker ();
+	vector ();
+	seq ();
+      }
+    #pragma acc loop worker /* { dg-message "loop here" 2 } */
+    for (int i = 0; i < 10; i++)
+      {
+	gang (); /*  { dg-error "routine call uses same" } */
+	worker (); /*  { dg-error "routine call uses same" } */
+	vector ();
+	seq ();
+      }
+    #pragma acc loop vector /* { dg-message "loop here" 3 } */
+    for (int i = 0; i < 10; i++)
+      {
+	gang (); /*  { dg-error "routine call uses same" } */
+	worker (); /*  { dg-error "routine call uses same" } */
+	vector (); /*  { dg-error "routine call uses same" } */
+	seq ();
+      }
+  }
+
+  return 0;
+}
Index: gcc/testsuite/c-c++-common/goacc/routine-4.c
===================================================================
--- gcc/testsuite/c-c++-common/goacc/routine-4.c	(revision 0)
+++ gcc/testsuite/c-c++-common/goacc/routine-4.c	(working copy)
@@ -0,0 +1,41 @@ 
+
+void gang (void);
+void worker (void);
+void vector (void);
+
+#pragma acc routine (gang) gang
+#pragma acc routine (worker) worker
+#pragma acc routine (vector) vector
+  
+#pragma acc routine seq
+void seq (void)
+{
+  gang ();  /* { dg-error "routine call uses" } */
+  worker ();  /* { dg-error "routine call uses" } */
+  vector ();  /* { dg-error "routine call uses" } */
+  seq ();
+}
+
+void vector (void) /* { dg-message "declared here" 1 } */
+{
+  gang ();  /* { dg-error "routine call uses" } */
+  worker ();  /* { dg-error "routine call uses" } */
+  vector ();
+  seq ();
+}
+
+void worker (void) /* { dg-message "declared here" 2 } */
+{
+  gang ();  /* { dg-error "routine call uses" } */
+  worker ();
+  vector ();
+  seq ();
+}
+
+void gang (void) /* { dg-message "declared here" 3 } */
+{
+  gang ();
+  worker ();
+  vector ();
+  seq ();
+}
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/routine-g-1.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/routine-g-1.c	(revision 0)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/routine-g-1.c	(working copy)
@@ -0,0 +1,64 @@ 
+/* { dg-do run } */
+/* { dg-additional-options "-O2" */
+
+#include <stdio.h>
+
+#define N (32*32*32+17)
+
+#pragma acc routine gang
+void __attribute__ ((noinline)) gang (int ary[N])
+{
+#pragma acc loop gang
+    for (unsigned ix = 0; ix < N; ix++)
+      {
+	if (__builtin_acc_on_device (5))
+	  {
+	    int g = 0, w = 0, v = 0;
+
+	    __asm__ volatile ("mov.u32 %0,%%ctaid.x;" : "=r" (g));
+	    __asm__ volatile ("mov.u32 %0,%%tid.y;" : "=r" (w));
+	    __asm__ volatile ("mov.u32 %0,%%tid.x;" : "=r" (v));
+	    ary[ix] = (g << 16) | (w << 8) | v;
+	  }
+	else
+	  ary[ix] = ix;
+      }
+}
+
+int main ()
+{
+  int ary[N];
+  int ix;
+  int exit = 0;
+  int ondev = 0;
+
+  for (ix = 0; ix < N;ix++)
+    ary[ix] = -1;
+  
+#pragma acc parallel num_gangs(32) vector_length(32) copy(ary) copy(ondev)
+  {
+    ondev = __builtin_acc_on_device (5);
+    gang (ary);
+  }
+
+  for (ix = 0; ix < N; ix++)
+    {
+      int expected = ix;
+      if(ondev)
+	{
+	  int g = ix / ((N + 31) / 32);
+	  int w = 0;
+	  int v = 0;
+
+	  expected = (g << 16) | (w << 8) | v;
+	}
+      
+      if (ary[ix] != expected)
+	{
+	  exit = 1;
+	  printf ("ary[%d]=%x expected %x\n", ix, ary[ix], expected);
+	}
+    }
+  
+  return exit;
+}
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/routine-gwv-1.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/routine-gwv-1.c	(revision 0)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/routine-gwv-1.c	(working copy)
@@ -0,0 +1,66 @@ 
+/* { dg-do run } */
+/* { dg-additional-options "-O2" */
+
+#include <stdio.h>
+
+#define N (32*32*32+17)
+
+#pragma acc routine gang
+void __attribute__ ((noinline)) gang (int ary[N])
+{
+#pragma acc loop gang worker vector
+  for (unsigned ix = 0; ix < N; ix++)
+    {
+      if (__builtin_acc_on_device (5))
+	{
+	  int g = 0, w = 0, v = 0;
+
+	  __asm__ volatile ("mov.u32 %0,%%ctaid.x;" : "=r" (g));
+	  __asm__ volatile ("mov.u32 %0,%%tid.y;" : "=r" (w));
+	  __asm__ volatile ("mov.u32 %0,%%tid.x;" : "=r" (v));
+	  ary[ix] = (g << 16) | (w << 8) | v;
+	}
+      else
+	ary[ix] = ix;
+    }
+}
+
+int main ()
+{
+  int ary[N];
+  int ix;
+  int exit = 0;
+  int ondev = 0;
+
+  for (ix = 0; ix < N;ix++)
+    ary[ix] = -1;
+  
+#pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) copy(ary) copy(ondev)
+  {
+    ondev = __builtin_acc_on_device (5);
+    gang (ary);
+  }
+
+  for (ix = 0; ix < N; ix++)
+    {
+      int expected = ix;
+      if(ondev)
+	{
+	  int chunk_size = (N + 32*32*32 - 1) / (32*32*32);
+	  
+	  int g = ix / (chunk_size * 32 * 32);
+	  int w = ix / 32 % 32;
+	  int v = ix % 32;
+
+	  expected = (g << 16) | (w << 8) | v;
+	}
+      
+      if (ary[ix] != expected)
+	{
+	  exit = 1;
+	  printf ("ary[%d]=%x expected %x\n", ix, ary[ix], expected);
+	}
+    }
+  
+  return exit;
+}
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/routine-v-1.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/routine-v-1.c	(revision 0)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/routine-v-1.c	(working copy)
@@ -0,0 +1,64 @@ 
+/* { dg-do run } */
+/* { dg-additional-options "-O2" */
+
+#include <stdio.h>
+
+#define N (32*32*32+17)
+
+#pragma acc routine vector
+void __attribute__ ((noinline)) vector (int ary[N])
+{
+#pragma acc loop vector
+  for (unsigned ix = 0; ix < N; ix++)
+    {
+      if (__builtin_acc_on_device (5))
+	{
+	  int g = 0, w = 0, v = 0;
+
+	  __asm__ volatile ("mov.u32 %0,%%ctaid.x;" : "=r" (g));
+	  __asm__ volatile ("mov.u32 %0,%%tid.y;" : "=r" (w));
+	  __asm__ volatile ("mov.u32 %0,%%tid.x;" : "=r" (v));
+	  ary[ix] = (g << 16) | (w << 8) | v;
+	}
+      else
+	ary[ix] = ix;
+    }
+}
+
+int main ()
+{
+  int ary[N];
+  int ix;
+  int exit = 0;
+  int ondev = 0;
+
+  for (ix = 0; ix < N;ix++)
+    ary[ix] = -1;
+  
+#pragma acc parallel vector_length(32) copy(ary) copy(ondev)
+  {
+    ondev = __builtin_acc_on_device (5);
+    vector (ary);
+  }
+
+  for (ix = 0; ix < N; ix++)
+    {
+      int expected = ix;
+      if(ondev)
+	{
+	  int g = 0;
+	  int w = 0;
+	  int v = ix % 32;
+
+	  expected = (g << 16) | (w << 8) | v;
+	}
+      
+      if (ary[ix] != expected)
+	{
+	  exit = 1;
+	  printf ("ary[%d]=%x expected %x\n", ix, ary[ix], expected);
+	}
+    }
+  
+  return exit;
+}
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/routine-w-1.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/routine-w-1.c	(revision 0)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/routine-w-1.c	(working copy)
@@ -0,0 +1,64 @@ 
+/* { dg-do run } */
+/* { dg-additional-options "-O2" */
+
+#include <stdio.h>
+
+#define N (32*32*32+17)
+
+#pragma acc routine worker
+void __attribute__ ((noinline)) worker (int ary[N])
+{
+#pragma acc loop worker
+  for (unsigned ix = 0; ix < N; ix++)
+    {
+      if (__builtin_acc_on_device (5))
+	{
+	  int g = 0, w = 0, v = 0;
+
+	  __asm__ volatile ("mov.u32 %0,%%ctaid.x;" : "=r" (g));
+	  __asm__ volatile ("mov.u32 %0,%%tid.y;" : "=r" (w));
+	  __asm__ volatile ("mov.u32 %0,%%tid.x;" : "=r" (v));
+	  ary[ix] = (g << 16) | (w << 8) | v;
+	}
+      else
+	ary[ix] = ix;
+    }
+}
+
+int main ()
+{
+  int ary[N];
+  int ix;
+  int exit = 0;
+  int ondev = 0;
+
+  for (ix = 0; ix < N;ix++)
+    ary[ix] = -1;
+  
+#pragma acc parallel num_workers(32) vector_length(32) copy(ary) copy(ondev)
+  {
+    ondev = __builtin_acc_on_device (5);
+    worker (ary);
+  }
+
+  for (ix = 0; ix < N; ix++)
+    {
+      int expected = ix;
+      if(ondev)
+	{
+	  int g = 0;
+	  int w = ix % 32;
+	  int v = 0;
+
+	  expected = (g << 16) | (w << 8) | v;
+	}
+      
+      if (ary[ix] != expected)
+	{
+	  exit = 1;
+	  printf ("ary[%d]=%x expected %x\n", ix, ary[ix], expected);
+	}
+    }
+  
+  return exit;
+}
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/routine-wv-1.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/routine-wv-1.c	(revision 0)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/routine-wv-1.c	(working copy)
@@ -0,0 +1,64 @@ 
+/* { dg-do run } */
+/* { dg-additional-options "-O2" */
+
+#include <stdio.h>
+
+#define N (32*32*32+17)
+
+#pragma acc routine worker
+void __attribute__ ((noinline)) worker (int ary[N])
+{
+#pragma acc loop worker vector
+  for (unsigned ix = 0; ix < N; ix++)
+    {
+      if (__builtin_acc_on_device (5))
+	{
+	  int g = 0, w = 0, v = 0;
+
+	  __asm__ volatile ("mov.u32 %0,%%ctaid.x;" : "=r" (g));
+	  __asm__ volatile ("mov.u32 %0,%%tid.y;" : "=r" (w));
+	  __asm__ volatile ("mov.u32 %0,%%tid.x;" : "=r" (v));
+	  ary[ix] = (g << 16) | (w << 8) | v;
+	}
+      else
+	ary[ix] = ix;
+    }
+}
+
+int main ()
+{
+  int ary[N];
+  int ix;
+  int exit = 0;
+  int ondev = 0;
+
+  for (ix = 0; ix < N;ix++)
+    ary[ix] = -1;
+  
+#pragma acc parallel num_workers(32) vector_length(32) copy(ary) copy(ondev)
+  {
+    ondev = __builtin_acc_on_device (5);
+    worker (ary);
+  }
+
+  for (ix = 0; ix < N; ix++)
+    {
+      int expected = ix;
+      if(ondev)
+	{
+	  int g = 0;
+	  int w = (ix / 32) % 32;
+	  int v = ix % 32;
+
+	  expected = (g << 16) | (w << 8) | v;
+	}
+      
+      if (ary[ix] != expected)
+	{
+	  exit = 1;
+	  printf ("ary[%d]=%x expected %x\n", ix, ary[ix], expected);
+	}
+    }
+  
+  return exit;
+}