diff mbox series

[committed,amdgcn] Update OpenACC testcases for amdgcn

Message ID 5383bcdf-90ea-0513-973f-9d97a9c76325@codesourcery.com
State New
Headers show
Series [committed,amdgcn] Update OpenACC testcases for amdgcn | expand

Commit Message

Andrew Stubbs Jan. 20, 2020, 4:53 p.m. UTC
Hi All,

I've committed this testsuite-only patch to fix some test cases that 
need GCN-specific settings in order to pass.

Test-case loop-auto-1.c is coded to require dimensions only suitable for 
nvptx devices. It might be fixable, but for now I'm just disabling it 
for amdgcn.

The other loop* and routine* test-cases will work with other launch 
dimensions, but need to be able to calculate what the correct results 
are. I've adjusted them so that they can do so dynamically.

Finally, the parallel-dims.c test-case needs to be configured for the 
new device. This configuration will need to be adjusted, once Julian's 
multi-worker patches are reviewed.

As far as I am aware, this should not change any test results for any 
other target. If that's not true then I apologise in advance and this 
patch can be reverted easily enough.

Andrew

Comments

Thomas Schwinge June 8, 2021, 9:51 a.m. UTC | #1
Hi!

On 2020-01-20T16:53:49+0000, Andrew Stubbs <ams@codesourcery.com> wrote:
> I've committed this testsuite-only patch to fix some test cases that
> need GCN-specific settings in order to pass.

>       * testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
>       (acc_gang): Recognise acc_device_radeon.
>       (acc_worker): Likewise.
>       (acc_vector): Likewise.

> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
> @@ -14,7 +14,8 @@ static unsigned int __attribute__ ((optimize ("O2"))) acc_gang ()
>  {
>    if (acc_on_device ((int) acc_device_host))
>      return 0;
> -  else if (acc_on_device ((int) acc_device_nvidia))
> +  else if (acc_on_device ((int) acc_device_nvidia)
> +        || acc_on_device ((int) acc_device_radeon))
>      return __builtin_goacc_parlevel_id (GOMP_DIM_GANG);
>    else
>      __builtin_abort ();
> @@ -25,7 +26,8 @@ static unsigned int __attribute__ ((optimize ("O2"))) acc_worker ()
>  {
>    if (acc_on_device ((int) acc_device_host))
>      return 0;
> -  else if (acc_on_device ((int) acc_device_nvidia))
> +  else if (acc_on_device ((int) acc_device_nvidia)
> +        || acc_on_device ((int) acc_device_radeon))
>      return __builtin_goacc_parlevel_id (GOMP_DIM_WORKER);
>    else
>      __builtin_abort ();
> @@ -36,7 +38,8 @@ static unsigned int __attribute__ ((optimize ("O2"))) acc_vector ()
>  {
>    if (acc_on_device ((int) acc_device_host))
>      return 0;
> -  else if (acc_on_device ((int) acc_device_nvidia))
> +  else if (acc_on_device ((int) acc_device_nvidia)
> +        || acc_on_device ((int) acc_device_radeon))
>      return __builtin_goacc_parlevel_id (GOMP_DIM_VECTOR);
>    else
>      __builtin_abort ();

Similar changes are necessary for
'libgomp.oacc-fortran/parallel-dims.f90', too -- and actually all that
can be simplified; pushed "Fix 'libgomp.oacc-fortran/parallel-dims.f90'
for 'acc_device_radeon'" to master branch in commit
32099c0d24adb93a031e0301ffd77b065b6f5472, see attached.

>       (main): Set expectations for amdgcn.

> @@ -282,6 +285,12 @@ int main ()
>         /* The GCC nvptx back end enforces num_workers (32).  */
>         workers_actual = 32;
>       }
> +      else if (acc_on_device (acc_device_radeon))
> +     {
> +       /* The GCC GCN back end is limited to num_workers (16).
> +          Temporarily set this to 1 until multiple workers are permitted. */
> +       workers_actual = 1; // 16;
> +     }

ACK; working on that.

> @@ -328,6 +337,11 @@ int main ()
>         /* We're actually executing with num_workers (32).  */
>         /* workers_actual = 32; */
>       }
> +      else if (acc_on_device (acc_device_radeon))
> +     {
> +       /* The GCC GCN back end is limited to num_workers (16).  */
> +       workers_actual = 16;
> +     }

That's surprising however, that here we're not similarly having to set
'workers_actual = 1' -- find the explanation in a forthcoming email.  ;-)

> [...]


Grüße
 Thomas


-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf
diff mbox series

Patch

Update OpenACC tests for amdgcn

2020-01-20  Andrew Stubbs  <ams@codesourcery.com>

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c: Skip test on gcn.
	* testsuite/libgomp.oacc-c-c++-common/loop-dim-default.c (main):
	Adjust test dimensions for amdgcn.
	* testsuite/libgomp.oacc-c-c++-common/loop-gwv-1.c (main): Adjust
	gang/worker/vector expectations dynamically.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c
	(main): Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c (main): Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-v-2.c (main): Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c (main): Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c (main): Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c (main): Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-v-1.c (main): Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-w-1.c (main): Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-wv-1.c (main): Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
	(acc_gang): Recognise acc_device_radeon.
	(acc_worker): Likewise.
	(acc_vector): Likewise.
	(main): Set expectations for amdgcn.
	* testsuite/libgomp.oacc-c-c++-common/routine-gwv-1.c
	(main): Adjust gang/worker/vector expectations dynamically.
	* testsuite/libgomp.oacc-c-c++-common/routine-v-1.c (main): Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-w-1.c (main): Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-wv-1.c (main): Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-wv-2.c: Set expectations
	for amdgcn.

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c
index 34bc57e51f5..0c9ae957460 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c
@@ -1,3 +1,6 @@ 
+/* AMD GCN does not use 32-lane vectors.
+   { dg-skip-if "unsuitable dimensions" { openacc_amdgcn_accel_selected } { "*" } { "" } } */
+
 /* { dg-additional-options "-fopenacc-dim=32" } */
 
 #include <stdio.h>
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-dim-default.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-dim-default.c
index 04387d36174..30f0539707f 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-dim-default.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-dim-default.c
@@ -128,5 +128,14 @@  int test_1 (int gp, int wp, int vp)
 
 int main ()
 {
+#ifdef ACC_DEVICE_TYPE_gcn
+  /* AMD GCN uses the autovectorizer for the vector dimension: the use
+     of a function call in vector-partitioned code in this test is not
+     currently supported.  */
+  /* AMD GCN does not currently support multiple workers.  This should be
+     set to 16 when that changes.  */
+  return test_1 (16, 1, 1);
+#else
   return test_1 (16, 16, 32);
+#endif
 }
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-1.c
index 766e5782b46..5c843012061 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-1.c
@@ -9,11 +9,13 @@  int main ()
   int ix;
   int exit = 0;
   int ondev = 0;
+  int gangsize, workersize, vectorsize;
 
   for (ix = 0; ix < N;ix++)
     ary[ix] = -1;
   
-#pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) copy(ary) copy(ondev)
+#pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+	    copy(ary) copy(ondev) copyout(gangsize, workersize, vectorsize)
   {
 #pragma acc loop gang worker vector
     for (unsigned ix = 0; ix < N; ix++)
@@ -32,6 +34,10 @@  int main ()
 	else
 	  ary[ix] = ix;
       }
+
+    gangsize = __builtin_goacc_parlevel_size (GOMP_DIM_GANG);
+    workersize = __builtin_goacc_parlevel_size (GOMP_DIM_WORKER);
+    vectorsize = __builtin_goacc_parlevel_size (GOMP_DIM_VECTOR);
   }
 
   for (ix = 0; ix < N; ix++)
@@ -39,11 +45,12 @@  int main ()
       int expected = ix;
       if(ondev)
 	{
-	  int chunk_size = (N + 32*32*32 - 1) / (32*32*32);
+	  int chunk_size = (N + gangsize * workersize * vectorsize - 1)
+			   / (gangsize * workersize * vectorsize);
 	  
-	  int g = ix / (chunk_size * 32 * 32);
-	  int w = ix / 32 % 32;
-	  int v = ix % 32;
+	  int g = ix / (chunk_size * workersize * vectorsize);
+	  int w = (ix / vectorsize) % workersize;
+	  int v = ix % vectorsize;
 
 	  expected = (g << 16) | (w << 8) | v;
 	}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c
index 0bec6e19510..9c4a85f7b16 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c
@@ -8,8 +8,10 @@  int main ()
   int ix;
   int ondev = 0;
   int t = 0, h = 0;
-  
-#pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) copy(ondev)
+  int gangsize, workersize, vectorsize;
+
+#pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+	copy(ondev) copyout(gangsize, workersize, vectorsize)
   {
 #pragma acc loop gang worker vector reduction(+:t)
     for (unsigned ix = 0; ix < N; ix++)
@@ -28,18 +30,22 @@  int main ()
 	  }
 	t += val;
       }
+    gangsize = __builtin_goacc_parlevel_size (GOMP_DIM_GANG);
+    workersize = __builtin_goacc_parlevel_size (GOMP_DIM_WORKER);
+    vectorsize = __builtin_goacc_parlevel_size (GOMP_DIM_VECTOR);
   }
 
   for (ix = 0; ix < N; ix++)
     {
       int val = ix;
-      if(ondev)
+      if (ondev)
 	{
-	  int chunk_size = (N + 32*32*32 - 1) / (32*32*32);
+	  int chunk_size = (N + gangsize * workersize * vectorsize - 1)
+			   / (gangsize * workersize * vectorsize);
 	  
-	  int g = ix / (chunk_size * 32 * 32);
-	  int w = ix / 32 % 32;
-	  int v = ix % 32;
+	  int g = ix / (chunk_size * vectorsize * workersize);
+	  int w = ix / vectorsize % workersize;
+	  int v = ix % vectorsize;
 
 	  val = (g << 16) | (w << 8) | v;
 	}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c
index da4921d15f9..1173c1f57bb 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c
@@ -9,8 +9,9 @@  int main ()
   int ix;
   int ondev = 0;
   int t = 0,  h = 0;
+  int vectorsize;
 
-#pragma acc parallel vector_length(32) copy(ondev)
+#pragma acc parallel vector_length(32) copy(ondev) copyout(vectorsize)
   {
 #pragma acc loop vector reduction (+:t)
     for (unsigned ix = 0; ix < N; ix++)
@@ -29,6 +30,7 @@  int main ()
 	  }
 	t += val;
       }
+    vectorsize = __builtin_goacc_parlevel_size (GOMP_DIM_VECTOR);
   }
 
   for (ix = 0; ix < N; ix++)
@@ -38,7 +40,7 @@  int main ()
 	{
 	  int g = 0;
 	  int w = 0;
-	  int v = ix % 32;
+	  int v = ix % vectorsize;
 
 	  val = (g << 16) | (w << 8) | v;
 	}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-v-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-v-2.c
index 15e2bc2f83b..84c2296a7b1 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-v-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-v-2.c
@@ -9,8 +9,9 @@  int main ()
   int ix;
   int ondev = 0;
   int q = 0,  h = 0;
+  int vectorsize;
 
-#pragma acc parallel vector_length(32) copy(q) copy(ondev)
+#pragma acc parallel vector_length(32) copy(q) copy(ondev) copyout(vectorsize)
   {
     int t = q;
     
@@ -32,6 +33,7 @@  int main ()
 	t += val;
       }
     q = t;
+    vectorsize = __builtin_goacc_parlevel_size (GOMP_DIM_VECTOR);
   }
 
   for (ix = 0; ix < N; ix++)
@@ -41,7 +43,7 @@  int main ()
 	{
 	  int g = 0;
 	  int w = 0;
-	  int v = ix % 32;
+	  int v = ix % vectorsize;
 
 	  val = (g << 16) | (w << 8) | v;
 	}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c
index 6bbd04fffea..648f89e1668 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c
@@ -8,8 +8,10 @@  int main ()
   int ix;
   int ondev = 0;
   int t = 0,  h = 0;
+  int workersize;
 
-#pragma acc parallel num_workers(32) vector_length(32) copy(ondev)
+#pragma acc parallel num_workers(32) vector_length(32) copy(ondev) \
+	    copyout(workersize)
   {
 #pragma acc loop worker reduction(+:t)
     for (unsigned ix = 0; ix < N; ix++)
@@ -28,6 +30,7 @@  int main ()
 	  }
 	t += val;
       }
+    workersize = __builtin_goacc_parlevel_size (GOMP_DIM_WORKER);
   }
 
   for (ix = 0; ix < N; ix++)
@@ -36,7 +39,7 @@  int main ()
       if(ondev)
 	{
 	  int g = 0;
-	  int w = ix % 32;
+	  int w = ix % workersize;
 	  int v = 0;
 
 	  val = (g << 16) | (w << 8) | v;
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c
index c63a5d4f808..f9fcf3703af 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c
@@ -8,8 +8,10 @@  int main ()
   int ix;
   int ondev = 0;
   int q = 0,  h = 0;
+  int workersize;
 
-#pragma acc parallel num_workers(32) vector_length(32) copy(q) copy(ondev)
+#pragma acc parallel num_workers(32) vector_length(32) copy(q) copy(ondev) \
+	    copyout(workersize)
   {
     int t = q;
     
@@ -31,6 +33,7 @@  int main ()
 	t += val;
       }
     q = t;
+    workersize = __builtin_goacc_parlevel_size (GOMP_DIM_WORKER);
   }
 
   for (ix = 0; ix < N; ix++)
@@ -39,7 +42,7 @@  int main ()
       if(ondev)
 	{
 	  int g = 0;
-	  int w = ix % 32;
+	  int w = ix % workersize;
 	  int v = 0;
 
 	  val = (g << 16) | (w << 8) | v;
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c
index 71d3969f7b6..c360ad11e7c 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c
@@ -8,8 +8,10 @@  int main ()
   int ix;
   int ondev = 0;
   int t = 0, h = 0;
+  int workersize, vectorsize;
   
-#pragma acc parallel num_workers(32) vector_length(32) copy(ondev)
+#pragma acc parallel num_workers(32) vector_length(32) copy(ondev) \
+	    copyout(workersize, vectorsize)
   {
 #pragma acc loop worker vector reduction (+:t)
     for (unsigned ix = 0; ix < N; ix++)
@@ -28,6 +30,8 @@  int main ()
 	  }
 	t += val;
       }
+    workersize = __builtin_goacc_parlevel_size (GOMP_DIM_WORKER);
+    vectorsize = __builtin_goacc_parlevel_size (GOMP_DIM_VECTOR);
   }
 
   for (ix = 0; ix < N; ix++)
@@ -36,8 +40,8 @@  int main ()
       if(ondev)
 	{
 	  int g = 0;
-	  int w = (ix / 32) % 32;
-	  int v = ix % 32;
+	  int w = (ix / vectorsize) % workersize;
+	  int v = ix % vectorsize;
 
 	  val = (g << 16) | (w << 8) | v;
 	}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-v-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-v-1.c
index 6010cd2498a..8c858f30563 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-v-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-v-1.c
@@ -9,11 +9,13 @@  int main ()
   int ix;
   int exit = 0;
   int ondev = 0;
+  int vectorsize;
 
   for (ix = 0; ix < N;ix++)
     ary[ix] = -1;
   
-#pragma acc parallel vector_length(32) copy(ary) copy(ondev)
+#pragma acc parallel vector_length(32) copy(ary) copy(ondev) \
+	    copyout(vectorsize)
   {
 #pragma acc loop vector
     for (unsigned ix = 0; ix < N; ix++)
@@ -31,6 +33,7 @@  int main ()
 	else
 	  ary[ix] = ix;
       }
+    vectorsize = __builtin_goacc_parlevel_size (GOMP_DIM_VECTOR);
   }
 
   for (ix = 0; ix < N; ix++)
@@ -40,7 +43,7 @@  int main ()
 	{
 	  int g = 0;
 	  int w = 0;
-	  int v = ix % 32;
+	  int v = ix % vectorsize;
 
 	  expected = (g << 16) | (w << 8) | v;
 	}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-w-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-w-1.c
index fa6fb9164e6..5fe486f50a1 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-w-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-w-1.c
@@ -9,11 +9,13 @@  int main ()
   int ix;
   int exit = 0;
   int ondev = 0;
+  int workersize;
 
   for (ix = 0; ix < N;ix++)
     ary[ix] = -1;
   
-#pragma acc parallel num_workers(32) vector_length(32) copy(ary) copy(ondev)
+#pragma acc parallel num_workers(32) vector_length(32) copy(ary) copy(ondev) \
+	    copyout(workersize)
   {
 #pragma acc loop worker
     for (unsigned ix = 0; ix < N; ix++)
@@ -31,6 +33,7 @@  int main ()
 	else
 	  ary[ix] = ix;
       }
+    workersize = __builtin_goacc_parlevel_size (GOMP_DIM_WORKER);
   }
 
   for (ix = 0; ix < N; ix++)
@@ -39,7 +42,7 @@  int main ()
       if(ondev)
 	{
 	  int g = 0;
-	  int w = ix % 32;
+	  int w = ix % workersize;
 	  int v = 0;
 
 	  expected = (g << 16) | (w << 8) | v;
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-wv-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-wv-1.c
index cd4cc994b82..fd4e4cf5ea9 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-wv-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-wv-1.c
@@ -9,11 +9,13 @@  int main ()
   int ix;
   int exit = 0;
   int ondev = 0;
+  int workersize, vectorsize;
 
   for (ix = 0; ix < N;ix++)
     ary[ix] = -1;
   
-#pragma acc parallel num_workers(32) vector_length(32) copy(ary) copy(ondev)
+#pragma acc parallel num_workers(32) vector_length(32) copy(ary) copy(ondev) \
+	    copyout(workersize, vectorsize)
   {
 #pragma acc loop worker vector
     for (unsigned ix = 0; ix < N; ix++)
@@ -31,6 +33,8 @@  int main ()
 	else
 	  ary[ix] = ix;
       }
+    workersize = __builtin_goacc_parlevel_size (GOMP_DIM_WORKER);
+    vectorsize = __builtin_goacc_parlevel_size (GOMP_DIM_VECTOR);
   }
 
   for (ix = 0; ix < N; ix++)
@@ -39,8 +43,8 @@  int main ()
       if(ondev)
 	{
 	  int g = 0;
-	  int w = (ix / 32) % 32;
-	  int v = ix % 32;
+	  int w = (ix / vectorsize) % workersize;
+	  int v = ix % vectorsize;
 
 	  expected = (g << 16) | (w << 8) | v;
 	}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
index a5edfc6ca16..cc4c738c1db 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
@@ -14,7 +14,8 @@  static unsigned int __attribute__ ((optimize ("O2"))) acc_gang ()
 {
   if (acc_on_device ((int) acc_device_host))
     return 0;
-  else if (acc_on_device ((int) acc_device_nvidia))
+  else if (acc_on_device ((int) acc_device_nvidia)
+	   || acc_on_device ((int) acc_device_radeon))
     return __builtin_goacc_parlevel_id (GOMP_DIM_GANG);
   else
     __builtin_abort ();
@@ -25,7 +26,8 @@  static unsigned int __attribute__ ((optimize ("O2"))) acc_worker ()
 {
   if (acc_on_device ((int) acc_device_host))
     return 0;
-  else if (acc_on_device ((int) acc_device_nvidia))
+  else if (acc_on_device ((int) acc_device_nvidia)
+	   || acc_on_device ((int) acc_device_radeon))
     return __builtin_goacc_parlevel_id (GOMP_DIM_WORKER);
   else
     __builtin_abort ();
@@ -36,7 +38,8 @@  static unsigned int __attribute__ ((optimize ("O2"))) acc_vector ()
 {
   if (acc_on_device ((int) acc_device_host))
     return 0;
-  else if (acc_on_device ((int) acc_device_nvidia))
+  else if (acc_on_device ((int) acc_device_nvidia)
+	   || acc_on_device ((int) acc_device_radeon))
     return __builtin_goacc_parlevel_id (GOMP_DIM_VECTOR);
   else
     __builtin_abort ();
@@ -282,6 +285,12 @@  int main ()
 	  /* The GCC nvptx back end enforces num_workers (32).  */
 	  workers_actual = 32;
 	}
+      else if (acc_on_device (acc_device_radeon))
+	{
+	  /* The GCC GCN back end is limited to num_workers (16).
+	     Temporarily set this to 1 until multiple workers are permitted. */
+	  workers_actual = 1; // 16;
+	}
       else
 	__builtin_abort ();
 #pragma acc loop worker reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max)
@@ -328,6 +337,11 @@  int main ()
 	  /* We're actually executing with num_workers (32).  */
 	  /* workers_actual = 32; */
 	}
+      else if (acc_on_device (acc_device_radeon))
+	{
+	  /* The GCC GCN back end is limited to num_workers (16).  */
+	  workers_actual = 16;
+	}
       else
 	__builtin_abort ();
 #pragma acc loop worker reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max)
@@ -367,6 +381,11 @@  int main ()
 	  /* The GCC nvptx back end enforces vector_length (32).  */
 	  vectors_actual = 1024;
 	}
+      else if (acc_on_device (acc_device_radeon))
+	{
+	  /* The GCC GCN back end enforces vector_length (1): autovectorize. */
+	  vectors_actual = 1;
+	}
       else
 	__builtin_abort ();
 #pragma acc loop vector reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max)
@@ -407,6 +426,13 @@  int main ()
 	  /* The GCC nvptx back end enforces vector_length (32).  */
 	  vectors_actual = 32;
 	}
+      else if (acc_on_device (acc_device_radeon))
+	{
+	  /* Because of the way vectors are implemented for GCN, a vector loop
+	     containing a seq routine call will not vectorize calls to that
+	     routine.  Hence, we'll only get one "vector".  */
+	  vectors_actual = 1;
+	}
       else
 	__builtin_abort ();
 #pragma acc loop vector reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max)
@@ -433,6 +459,9 @@  int main ()
        in the following case.  So, limit ourselves here.  */
     if (acc_get_device_type () == acc_device_nvidia)
       gangs = 3;
+    /* Similar appears to be true for GCN.  */
+    if (acc_get_device_type () == acc_device_radeon)
+      gangs = 3;
     int gangs_actual = gangs;
 #define WORKERS 3
     int workers_actual = WORKERS;
@@ -459,6 +488,13 @@  int main ()
 	  /* The GCC nvptx back end enforces vector_length (32).  */
 	  vectors_actual = 32;
 	}
+      else if (acc_on_device (acc_device_radeon))
+	{
+	  /* Temporary setting, until multiple workers are permitted.  */
+	  workers_actual = 1;
+	  /* See above comments about GCN vectors_actual.  */
+	  vectors_actual = 1;
+	}
       else
 	__builtin_abort ();
 #pragma acc loop gang reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gangs_max, workers_max, vectors_max)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-gwv-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-gwv-1.c
index a97e046b687..da13d84908a 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-gwv-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-gwv-1.c
@@ -30,14 +30,18 @@  int main ()
   int ix;
   int exit = 0;
   int ondev = 0;
+  int gangsize, workersize, vectorsize;
 
   for (ix = 0; ix < N;ix++)
     ary[ix] = -1;
   
-#pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) copy(ary) copy(ondev)
+#pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) copy(ary) copy(ondev) copyout(gangsize, workersize, vectorsize)
   {
     ondev = acc_on_device (acc_device_not_host);
     gang (ary);
+    gangsize = __builtin_goacc_parlevel_size (GOMP_DIM_GANG);
+    workersize = __builtin_goacc_parlevel_size (GOMP_DIM_WORKER);
+    vectorsize = __builtin_goacc_parlevel_size (GOMP_DIM_VECTOR);
   }
 
   for (ix = 0; ix < N; ix++)
@@ -45,11 +49,12 @@  int main ()
       int expected = ix;
       if(ondev)
 	{
-	  int chunk_size = (N + 32*32*32 - 1) / (32*32*32);
+	  int chunk_size = (N + gangsize * workersize * vectorsize - 1)
+			   / (gangsize * workersize * vectorsize);
 	  
-	  int g = ix / (chunk_size * 32 * 32);
-	  int w = ix / 32 % 32;
-	  int v = ix % 32;
+	  int g = ix / (chunk_size * vectorsize * workersize);
+	  int w = (ix / vectorsize) % workersize;
+	  int v = ix % vectorsize;
 
 	  expected = (g << 16) | (w << 8) | v;
 	}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-v-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-v-1.c
index b1e3e3a596a..dd7bb6cdcd1 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-v-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-v-1.c
@@ -30,14 +30,17 @@  int main ()
   int ix;
   int exit = 0;
   int ondev = 0;
+  int vectorsize;
 
   for (ix = 0; ix < N;ix++)
     ary[ix] = -1;
   
-#pragma acc parallel vector_length(32) copy(ary) copy(ondev)
+#pragma acc parallel vector_length(32) copy(ary) copy(ondev) \
+	    copyout(vectorsize)
   {
     ondev = acc_on_device (acc_device_not_host);
     vector (ary);
+    vectorsize = __builtin_goacc_parlevel_size (GOMP_DIM_VECTOR);
   }
 
   for (ix = 0; ix < N; ix++)
@@ -47,7 +50,7 @@  int main ()
 	{
 	  int g = 0;
 	  int w = 0;
-	  int v = ix % 32;
+	  int v = ix % vectorsize;
 
 	  expected = (g << 16) | (w << 8) | v;
 	}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-w-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-w-1.c
index 81f1e0361c0..acd9884cbd6 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-w-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-w-1.c
@@ -30,14 +30,17 @@  int main ()
   int ix;
   int exit = 0;
   int ondev = 0;
+  int workersize;
 
   for (ix = 0; ix < N;ix++)
     ary[ix] = -1;
   
-#pragma acc parallel num_workers(32) vector_length(32) copy(ary) copy(ondev)
+#pragma acc parallel num_workers(32) vector_length(32) copy(ary) copy(ondev) \
+	    copyout(workersize)
   {
     ondev = acc_on_device (acc_device_not_host);
     worker (ary);
+    workersize = __builtin_goacc_parlevel_size (GOMP_DIM_WORKER);
   }
 
   for (ix = 0; ix < N; ix++)
@@ -46,7 +49,7 @@  int main ()
       if(ondev)
 	{
 	  int g = 0;
-	  int w = ix % 32;
+	  int w = ix % workersize;
 	  int v = 0;
 
 	  expected = (g << 16) | (w << 8) | v;
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-wv-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-wv-1.c
index 23dbc1ae401..73696e4e59a 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-wv-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-wv-1.c
@@ -30,14 +30,18 @@  int main ()
   int ix;
   int exit = 0;
   int ondev = 0;
+  int workersize, vectorsize;
 
   for (ix = 0; ix < N;ix++)
     ary[ix] = -1;
   
-#pragma acc parallel num_workers(32) vector_length(32) copy(ary) copy(ondev)
+#pragma acc parallel num_workers(32) vector_length(32) copy(ary) copy(ondev) \
+	    copyout(workersize, vectorsize)
   {
     ondev = acc_on_device (acc_device_not_host);
     worker (ary);
+    workersize = __builtin_goacc_parlevel_size (GOMP_DIM_WORKER);
+    vectorsize = __builtin_goacc_parlevel_size (GOMP_DIM_VECTOR);
   }
 
   for (ix = 0; ix < N; ix++)
@@ -46,8 +50,8 @@  int main ()
       if(ondev)
 	{
 	  int g = 0;
-	  int w = (ix / 32) % 32;
-	  int v = ix % 32;
+	  int w = (ix / vectorsize) % workersize;
+	  int v = ix % vectorsize;
 
 	  expected = (g << 16) | (w << 8) | v;
 	}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-wv-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-wv-2.c
index 886214843f1..609f9f6a7da 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-wv-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-wv-2.c
@@ -2,8 +2,13 @@ 
 #include <openacc.h>
 #include <gomp-constants.h>
 
+#ifdef ACC_DEVICE_TYPE_gcn
+#define NUM_WORKERS 16
+#define NUM_VECTORS 1
+#else
 #define NUM_WORKERS 16
 #define NUM_VECTORS 32
+#endif
 #define WIDTH 64
 #define HEIGHT 32