[3/3] OpenACC reductions

Message ID 56379197.8020309@acm.org

Commit Message

Nathan Sidwell Nov. 2, 2015, 4:38 p.m. UTC
This patch contains the initial set of tests.  The libgomp tests use an idiom of
summing thread identifiers and then checking that the expected set of threads
participated.  They are all derived from the loop tests I recently added for the
execution model itself.

The Fortran test was duplicated in both the gfortran testsuite and the libgomp
testsuite.  I deleted it from the former.  It was slightly bogus: it asked for a
vector_length of 40, and appeared to work only by accident, because the loop was
not actually partitioned.  I fixed that up and reworked it to avoid needing a
reduction on a reference variable.  Reference handling will come in a later
patch.

nathan

Comments

Jakub Jelinek Nov. 4, 2015, 10:03 a.m. UTC | #1
On Mon, Nov 02, 2015 at 11:38:47AM -0500, Nathan Sidwell wrote:
> This patch contains the initial set of tests.  The libgomp tests use an idiom of
> summing thread identifiers and then checking that the expected set of threads
> participated.  They are all derived from the loop tests I recently added for
> the execution model itself.
> 
> The Fortran test was duplicated in both the gfortran testsuite and the
> libgomp testsuite.  I deleted it from the former.  It was slightly bogus: it
> asked for a vector_length of 40, and appeared to work only by accident,
> because the loop was not actually partitioned.  I fixed that up and reworked
> it to avoid needing a reduction on a reference variable.  Reference handling
> will come in a later patch.
> 
> nathan

> 2015-11-02  Nathan Sidwell  <nathan@codesourcery.com>
> 
> 	libgomp/
> 	* libgomp.oacc-c-c++-common/loop-red-g-1.c: New.
> 	* libgomp.oacc-c-c++-common/loop-red-gwv-1.c: New.
> 	* libgomp.oacc-c-c++-common/loop-red-v-1.c: New.
> 	* libgomp.oacc-c-c++-common/loop-red-v-2.c: New.
> 	* libgomp.oacc-c-c++-common/loop-red-w-1.c: New.
> 	* libgomp.oacc-c-c++-common/loop-red-w-2.c: New.
> 	* libgomp.oacc-c-c++-common/loop-red-wv-1.c: New.
> 	* libgomp.oacc-fortran/reduction-5.f90: Avoid reference var.
> 
> 	gcc/testsuite/
> 	* gfortran.dg/goacc/reduction-2.f95: Delete.

Ok.

	Jakub
Thomas Schwinge Nov. 6, 2015, 10:49 a.m. UTC | #2
Hi Nathan!

On Mon, 2 Nov 2015 11:38:47 -0500, Nathan Sidwell <nathan@acm.org> wrote:
> This patch contains the initial set of tests.  The libgomp tests use an idiom of
> summing thread identifiers and then checking that the expected set of threads
> participated.  They are all derived from the loop tests I recently added for the
> execution model itself.
> 
> The Fortran test was duplicated in both the gfortran testsuite and the libgomp
> testsuite.  I deleted it from the former.  It was slightly bogus: it asked
> for a vector_length of 40, and appeared to work only by accident, because the
> loop was not actually partitioned.  I fixed that up

On gomp-4_0-branch, you had modified/XFAILed (ICE) that test in r228955,
<http://news.gmane.org/find-root.php?message_id=%3C56240637.6040601%40acm.org%3E>
-- which still needs to be resolved, so I left that as-is, that is, did
not delete the gcc/testsuite/gfortran.dg/goacc/reduction-2.f95 file in
the merge commit.

> and reworked it to avoid
> needing a reduction on a reference variable.  Reference handling will come in a
> later patch.

As that is -- apparently -- functional on gomp-4_0-branch, I also left
the libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90 file as-is;
it's also doing more elaborate testing in its gomp-4_0-branch variant.

Merged your trunk r229769 and r229770 into gomp-4_0-branch in r229837,
effectively just adding your new libgomp testsuite files unmodified:

commit a222b569f0234d219fec69cd13b66446f664440d
Merge: 089a022 06d6724
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Fri Nov 6 09:40:44 2015 +0000

    svn merge -r 229768:229770 svn+ssh://gcc.gnu.org/svn/gcc/trunk
    
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@229837 138bc75d-0d04-0410-961f-82ee72b054a4

 gcc/testsuite/ChangeLog                            |  4 ++
 libgomp/ChangeLog                                  | 11 ++++
 .../libgomp.oacc-c-c++-common/loop-red-g-1.c       | 54 ++++++++++++++++++++
 .../libgomp.oacc-c-c++-common/loop-red-gwv-1.c     | 56 ++++++++++++++++++++
 .../libgomp.oacc-c-c++-common/loop-red-v-1.c       | 56 ++++++++++++++++++++
 .../libgomp.oacc-c-c++-common/loop-red-v-2.c       | 59 ++++++++++++++++++++++
 .../libgomp.oacc-c-c++-common/loop-red-w-1.c       | 54 ++++++++++++++++++++
 .../libgomp.oacc-c-c++-common/loop-red-w-2.c       | 57 +++++++++++++++++++++
 .../libgomp.oacc-c-c++-common/loop-red-wv-1.c      | 54 ++++++++++++++++++++
 9 files changed, 405 insertions(+)


Regards,
 Thomas

Patch

2015-11-02  Nathan Sidwell  <nathan@codesourcery.com>

	libgomp/
	* libgomp.oacc-c-c++-common/loop-red-g-1.c: New.
	* libgomp.oacc-c-c++-common/loop-red-gwv-1.c: New.
	* libgomp.oacc-c-c++-common/loop-red-v-1.c: New.
	* libgomp.oacc-c-c++-common/loop-red-v-2.c: New.
	* libgomp.oacc-c-c++-common/loop-red-w-1.c: New.
	* libgomp.oacc-c-c++-common/loop-red-w-2.c: New.
	* libgomp.oacc-c-c++-common/loop-red-wv-1.c: New.
	* libgomp.oacc-fortran/reduction-5.f90: Avoid reference var.

	gcc/testsuite/
	* gfortran.dg/goacc/reduction-2.f95: Delete.

Index: libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c	(revision 0)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c	(working copy)
@@ -0,0 +1,54 @@ 
+/* { dg-do run } */
+/* { dg-additional-options "-O2" } */
+
+#include <stdio.h>
+
+#define N (32*32*32+17)
+int main ()
+{
+  int ix;
+  int ondev = 0;
+  int t = 0, h = 0;
+  
+#pragma acc parallel num_gangs(32) vector_length(32) copy(t) copy(ondev)
+  {
+#pragma acc loop gang  reduction (+:t)
+    for (unsigned ix = 0; ix < N; ix++)
+      {
+	int val = ix;
+	
+	if (__builtin_acc_on_device (5))
+	  {
+	    int g = 0, w = 0, v = 0;
+
+	    __asm__ volatile ("mov.u32 %0,%%ctaid.x;" : "=r" (g));
+	    __asm__ volatile ("mov.u32 %0,%%tid.y;" : "=r" (w));
+	    __asm__ volatile ("mov.u32 %0,%%tid.x;" : "=r" (v));
+	    val = (g << 16) | (w << 8) | v;
+	    ondev = 1;
+	  }
+	t += val;
+      }
+  }
+
+  for (ix = 0; ix < N; ix++)
+    {
+      int val = ix;
+      if(ondev)
+	{
+	  int g = ix / ((N + 31) / 32);
+	  int w = 0;
+	  int v = 0;
+
+	  val = (g << 16) | (w << 8) | v;
+	}
+      h += val;
+    }
+  if (t != h)
+    {
+      printf ("t=%x expected %x\n", t, h);
+      return 1;
+    }
+  
+  return 0;
+}
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c	(revision 0)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c	(working copy)
@@ -0,0 +1,56 @@ 
+/* { dg-do run } */
+/* { dg-additional-options "-O2" } */
+
+#include <stdio.h>
+
+#define N (32*32*32+17)
+int main ()
+{
+  int ix;
+  int ondev = 0;
+  int t = 0, h = 0;
+  
+#pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) copy(t) copy(ondev)
+  {
+#pragma acc loop gang worker vector reduction(+:t)
+    for (unsigned ix = 0; ix < N; ix++)
+      {
+	int val = ix;
+	
+	if (__builtin_acc_on_device (5))
+	  {
+	    int g = 0, w = 0, v = 0;
+
+	    __asm__ volatile ("mov.u32 %0,%%ctaid.x;" : "=r" (g));
+	    __asm__ volatile ("mov.u32 %0,%%tid.y;" : "=r" (w));
+	    __asm__ volatile ("mov.u32 %0,%%tid.x;" : "=r" (v));
+	    val = (g << 16) | (w << 8) | v;
+	    ondev = 1;
+	  }
+	t += val;
+      }
+  }
+
+  for (ix = 0; ix < N; ix++)
+    {
+      int val = ix;
+      if(ondev)
+	{
+	  int chunk_size = (N + 32*32*32 - 1) / (32*32*32);
+	  
+	  int g = ix / (chunk_size * 32 * 32);
+	  int w = ix / 32 % 32;
+	  int v = ix % 32;
+
+	  val = (g << 16) | (w << 8) | v;
+	}
+      h += val;
+    }
+  if (t != h)
+    {
+      printf ("t=%x expected %x\n", t, h);
+      return 1;
+    }
+  
+  return 0;
+}
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c	(revision 0)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c	(working copy)
@@ -0,0 +1,56 @@ 
+/* { dg-do run } */
+/* { dg-additional-options "-O2" } */
+
+#include <stdio.h>
+
+#define N (32*32*32+17)
+
+int main ()
+{
+  int ix;
+  int ondev = 0;
+  int t = 0,  h = 0;
+
+#pragma acc parallel vector_length(32) copy(t) copy(ondev)
+  {
+#pragma acc loop vector reduction (+:t)
+    for (unsigned ix = 0; ix < N; ix++)
+      {
+	int val = ix;
+	
+	if (__builtin_acc_on_device (5))
+	  {
+	    int g = 0, w = 0, v = 0;
+
+	    __asm__ volatile ("mov.u32 %0,%%ctaid.x;" : "=r" (g));
+	    __asm__ volatile ("mov.u32 %0,%%tid.y;" : "=r" (w));
+	    __asm__ volatile ("mov.u32 %0,%%tid.x;" : "=r" (v));
+	    val = (g << 16) | (w << 8) | v;
+	    ondev = 1;
+	  }
+	t += val;
+      }
+  }
+
+  for (ix = 0; ix < N; ix++)
+    {
+      int val = ix;
+      if (ondev)
+	{
+	  int g = 0;
+	  int w = 0;
+	  int v = ix % 32;
+
+	  val = (g << 16) | (w << 8) | v;
+	}
+      h += val;
+    }
+
+  if (t != h)
+    {
+      printf ("t=%x expected %x\n", t, h);
+      return 1;
+    }
+  
+  return 0;
+}
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-v-2.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-v-2.c	(revision 0)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-v-2.c	(working copy)
@@ -0,0 +1,59 @@ 
+/* { dg-do run } */
+/* { dg-additional-options "-O2" } */
+
+#include <stdio.h>
+
+#define N (32*32*32+17)
+
+int main ()
+{
+  int ix;
+  int ondev = 0;
+  int q = 0,  h = 0;
+
+#pragma acc parallel vector_length(32) copy(q) copy(ondev)
+  {
+    int t = q;
+    
+#pragma acc loop vector reduction (+:t)
+    for (unsigned ix = 0; ix < N; ix++)
+      {
+	int val = ix;
+	
+	if (__builtin_acc_on_device (5))
+	  {
+	    int g = 0, w = 0, v = 0;
+
+	    __asm__ volatile ("mov.u32 %0,%%ctaid.x;" : "=r" (g));
+	    __asm__ volatile ("mov.u32 %0,%%tid.y;" : "=r" (w));
+	    __asm__ volatile ("mov.u32 %0,%%tid.x;" : "=r" (v));
+	    val = (g << 16) | (w << 8) | v;
+	    ondev = 1;
+	  }
+	t += val;
+      }
+    q = t;
+  }
+
+  for (ix = 0; ix < N; ix++)
+    {
+      int val = ix;
+      if (ondev)
+	{
+	  int g = 0;
+	  int w = 0;
+	  int v = ix % 32;
+
+	  val = (g << 16) | (w << 8) | v;
+	}
+      h += val;
+    }
+
+  if (q != h)
+    {
+      printf ("t=%x expected %x\n", q, h);
+      return 1;
+    }
+  
+  return 0;
+}
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c	(revision 0)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c	(working copy)
@@ -0,0 +1,54 @@ 
+/* { dg-do run } */
+/* { dg-additional-options "-O2" } */
+
+#include <stdio.h>
+
+#define N (32*32*32+17)
+int main ()
+{
+  int ix;
+  int ondev = 0;
+  int t = 0,  h = 0;
+
+#pragma acc parallel num_workers(32) vector_length(32) copy(t) copy(ondev)
+  {
+#pragma acc loop worker reduction(+:t)
+    for (unsigned ix = 0; ix < N; ix++)
+      {
+	int val = ix;
+	
+	if (__builtin_acc_on_device (5))
+	  {
+	    int g = 0, w = 0, v = 0;
+
+	    __asm__ volatile ("mov.u32 %0,%%ctaid.x;" : "=r" (g));
+	    __asm__ volatile ("mov.u32 %0,%%tid.y;" : "=r" (w));
+	    __asm__ volatile ("mov.u32 %0,%%tid.x;" : "=r" (v));
+	    val = (g << 16) | (w << 8) | v;
+	    ondev = 1;
+	  }
+	t += val;
+      }
+  }
+
+  for (ix = 0; ix < N; ix++)
+    {
+      int val = ix;
+      if(ondev)
+	{
+	  int g = 0;
+	  int w = ix % 32;
+	  int v = 0;
+
+	  val = (g << 16) | (w << 8) | v;
+	}
+      h += val;
+    }
+  if (t != h)
+    {
+      printf ("t=%x expected %x\n", t, h);
+      return 1;
+    }
+  
+  return 0;
+}
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c	(revision 0)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c	(working copy)
@@ -0,0 +1,57 @@ 
+/* { dg-do run } */
+/* { dg-additional-options "-O2" } */
+
+#include <stdio.h>
+
+#define N (32*32*32+17)
+int main ()
+{
+  int ix;
+  int ondev = 0;
+  int q = 0,  h = 0;
+
+#pragma acc parallel num_workers(32) vector_length(32) copy(q) copy(ondev)
+  {
+    int t = q;
+    
+#pragma acc loop worker reduction(+:t)
+    for (unsigned ix = 0; ix < N; ix++)
+      {
+	int val = ix;
+	
+	if (__builtin_acc_on_device (5))
+	  {
+	    int g = 0, w = 0, v = 0;
+
+	    __asm__ volatile ("mov.u32 %0,%%ctaid.x;" : "=r" (g));
+	    __asm__ volatile ("mov.u32 %0,%%tid.y;" : "=r" (w));
+	    __asm__ volatile ("mov.u32 %0,%%tid.x;" : "=r" (v));
+	    val = (g << 16) | (w << 8) | v;
+	    ondev = 1;
+	  }
+	t += val;
+      }
+    q = t;
+  }
+
+  for (ix = 0; ix < N; ix++)
+    {
+      int val = ix;
+      if(ondev)
+	{
+	  int g = 0;
+	  int w = ix % 32;
+	  int v = 0;
+
+	  val = (g << 16) | (w << 8) | v;
+	}
+      h += val;
+    }
+  if (q != h)
+    {
+      printf ("t=%x expected %x\n", q, h);
+      return 1;
+    }
+  
+  return 0;
+}
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c	(revision 0)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c	(working copy)
@@ -0,0 +1,54 @@ 
+/* { dg-do run } */
+/* { dg-additional-options "-O2" } */
+
+#include <stdio.h>
+
+#define N (32*32*32+17)
+int main ()
+{
+  int ix;
+  int ondev = 0;
+  int t = 0, h = 0;
+  
+#pragma acc parallel num_workers(32) vector_length(32) copy(t) copy(ondev)
+  {
+#pragma acc loop worker vector reduction (+:t)
+    for (unsigned ix = 0; ix < N; ix++)
+      {
+	int val = ix;
+	
+	if (__builtin_acc_on_device (5))
+	  {
+	    int g = 0, w = 0, v = 0;
+
+	    __asm__ volatile ("mov.u32 %0,%%ctaid.x;" : "=r" (g));
+	    __asm__ volatile ("mov.u32 %0,%%tid.y;" : "=r" (w));
+	    __asm__ volatile ("mov.u32 %0,%%tid.x;" : "=r" (v));
+	    val = (g << 16) | (w << 8) | v;
+	    ondev = 1;
+	  }
+	t += val;
+      }
+  }
+
+  for (ix = 0; ix < N; ix++)
+    {
+      int val = ix;
+      if(ondev)
+	{
+	  int g = 0;
+	  int w = (ix / 32) % 32;
+	  int v = ix % 32;
+
+	  val = (g << 16) | (w << 8) | v;
+	}
+      h += val;
+    }
+  if (t != h)
+    {
+      printf ("t=%x expected %x\n", t, h);
+      return 1;
+    }
+  
+  return 0;
+}
Index: libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90
===================================================================
--- libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90	(revision 229667)
+++ libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90	(working copy)
@@ -21,12 +21,15 @@  end program reduction
 subroutine redsub(sum, n, c)
   integer :: sum, n, c
 
-  sum = 0
+  integer :: s
+  s = 0
 
-  !$acc parallel vector_length(n) copyin (n, c) num_gangs(1)
-  !$acc loop reduction(+:sum)
+  !$acc parallel vector_length(32) copyin (n, c) copy (s) num_gangs(1)
+  !$acc loop reduction(+:s)
   do i = 1, n
-     sum = sum + c
+     s = s + c
   end do
   !$acc end parallel
+
+  sum = s
 end subroutine redsub
Index: gcc/testsuite/gfortran.dg/goacc/reduction-2.f95
===================================================================
--- gcc/testsuite/gfortran.dg/goacc/reduction-2.f95	(revision 229667)
+++ gcc/testsuite/gfortran.dg/goacc/reduction-2.f95	(working copy)
@@ -1,21 +0,0 @@ 
-! { dg-do compile }
-
-program reduction
-  integer, parameter    :: n = 40, c = 10
-  integer               :: i, sum
-
-  call redsub (sum, n, c)
-end program reduction
-
-subroutine redsub(sum, n, c)
-  integer :: sum, n, c
-
-  sum = 0
-
-  !$acc parallel vector_length(n) copyin (n, c)
-  !$acc loop reduction(+:sum)
-  do i = 1, n
-     sum = sum + c
-  end do
-  !$acc end parallel
-end subroutine redsub