diff mbox

[OpenACC] Make reduction arguments addressable

Message ID fac841d6-320a-6de8-2bcd-07145151d0f3@codesourcery.com
State New
Headers show

Commit Message

Chung-Lin Tang May 30, 2016, 2:38 p.m. UTC
Hi, a previous patch of Cesar's has made the middle-end omp-lowering
automatically create and insert a tofrom (i.e. present_or_copy) map for
parallel reductions.  This allowed the user to not need explicit
clauses to copy out the reduction result, but because reduction arguments
are not marked addressable, async does not work as expected,
i.e. the asynchronous copy-out results are not used in the compiler generated code.

This patch fixes this in the front-ends, I've tested this patch without
new regressions, and fixes some C++ OpenACC tests that regressed after
my last OpenACC async patch.  Is this okay for trunk?

Thanks,
Chung-Lin

2016-05-30  Chung-Lin Tang  <cltang@codesourcery.com>

	c/
	* c-typeck.c (c_finish_omp_clauses): Mark OpenACC reduction
	arguments as addressable.

	cp/
	* semantics.c (finish_omp_clauses): Mark OpenACC reduction
	arguments as addressable.

	fortran/
	* trans-openmp.c (gfc_trans_oacc_construct): Mark OpenACC reduction
	arguments as addressable.
	(gfc_trans_oacc_combined_directive): Likewise.

Comments

Jakub Jelinek May 30, 2016, 4:53 p.m. UTC | #1
On Mon, May 30, 2016 at 10:38:59PM +0800, Chung-Lin Tang wrote:
> Hi, a previous patch of Cesar's has made the middle-end omp-lowering
> automatically create and insert a tofrom (i.e. present_or_copy) map for
> parallel reductions.  This allowed the user to not need explicit
> clauses to copy out the reduction result, but because reduction arguments
> are not marked addressable, async does not work as expected,
> i.e. the asynchronous copy-out results are not used in the compiler generated code.

If you need it only for async parallel/kernels? regions, can't you do that
only for those and not for others?

> This patch fixes this in the front-ends, I've tested this patch without
> new regressions, and fixes some C++ OpenACC tests that regressed after
> my last OpenACC async patch.  Is this okay for trunk?

Testcases in the testsuite or others?  If the latter, we should add them.

	Jakub
Thomas Schwinge May 31, 2016, 7:28 a.m. UTC | #2
Hi!

On Mon, 30 May 2016 18:53:41 +0200, Jakub Jelinek <jakub@redhat.com> wrote:
> On Mon, May 30, 2016 at 10:38:59PM +0800, Chung-Lin Tang wrote:
> > Hi, a previous patch of Cesar's has made the middle-end omp-lowering
> > automatically create and insert a tofrom (i.e. present_or_copy) map for
> > parallel reductions.  This allowed the user to not need explicit
> > clauses to copy out the reduction result, but because reduction arguments
> > are not marked addressable, async does not work as expected,
> > i.e. the asynchronous copy-out results are not used in the compiler generated code.
> 
> If you need it only for async parallel/kernels? regions, can't you do that
> only for those and not for others?

Also, please add comments to the source code to document the need for
such special handling.

> > This patch fixes this in the front-ends, I've tested this patch without
> > new regressions, and fixes some C++ OpenACC tests that regressed after
> > my last OpenACC async patch.  Is this okay for trunk?
> 
> Testcases in the testsuite or others?  If the latter, we should add them.

The r236772 commit "[PATCH, libgomp] Rewire OpenACC async",
<http://news.gmane.org/find-root.php?message_id=%3C56FA4F69.1060101%40codesourcery.com%3E>
regressed (or, triggered/exposed the existing wrong behavior?)
libgomp.oacc-c++/template-reduction.C execution testing for nvptx
offloading.  (Please always send email about such known regressions, and
XFAIL them with your commit -- that would have saved me an hour
yesterday, when I bisected recent changes to figure out why that test
suddenly fails.)

For reference, here is a test case, a reduced C version of
libgomp/testsuite/libgomp.oacc-c++/template-reduction.C.  This test case
works (without Chung-Lin's "[PATCH, OpenACC] Make reduction arguments
addressable" patch) if I enable "POCs", which surprises me a bit, because
I thought after Cesar's recent changes, the gimplifier is doing the same
thing of adding a data clause next to the reduction clause.  Probably
it's not doing the exactly same thing, though.  Should it?  Cesar, do you
have any comments on this?  For example (just guessing), should
TREE_ADDRESSABLE be set where the gimplifier does its work, instead of in
the three front ends?

    // Reduced C version of libgomp/testsuite/libgomp.oacc-c++/template-reduction.C.
    
    const int n = 100;
    
    // Check present and async and an explicit firstprivate
    
    int
    async_sum (int c)
    {
      int s = 0;
    
    #define POCs //present_or_copy(s)
    #pragma acc parallel loop num_gangs (10) gang reduction (+:s) POCs firstprivate (c) async
      for (int i = 0; i < n; i++)
        s += i+c;
    
    #pragma acc wait
    
      return s;
    }
    
    int
    main()
    {
      int result = 0;
    
      for (int i = 0; i < n; i++)
        {
          result += i+1;
        }
    
      if (async_sum (1) != result)
        __builtin_abort ();
    
      return 0;
    }


Grüße
 Thomas
Chung-Lin Tang May 31, 2016, 9:51 a.m. UTC | #3
On 2016/5/31 3:28 PM, Thomas Schwinge wrote:
> Hi!
> 
> On Mon, 30 May 2016 18:53:41 +0200, Jakub Jelinek <jakub@redhat.com> wrote:
>> On Mon, May 30, 2016 at 10:38:59PM +0800, Chung-Lin Tang wrote:
>>> Hi, a previous patch of Cesar's has made the middle-end omp-lowering
>>> automatically create and insert a tofrom (i.e. present_or_copy) map for
>>> parallel reductions.  This allowed the user to not need explicit
>>> clauses to copy out the reduction result, but because reduction arguments
>>> are not marked addressable, async does not work as expected,
>>> i.e. the asynchronous copy-out results are not used in the compiler generated code.
>>
>> If you need it only for async parallel/kernels? regions, can't you do that
>> only for those and not for others?

That is achievable, but not in line with how we currently treat all other
data clause OMP_CLAUSE_MAPs, which are all marked addressable. Is this special
case handling really better here?

> Also, please add comments to the source code to document the need for
> such special handling.
> 
>>> This patch fixes this in the front-ends, I've tested this patch without
>>> new regressions, and fixes some C++ OpenACC tests that regressed after
>>> my last OpenACC async patch.  Is this okay for trunk?
>>
>> Testcases in the testsuite or others?  If the latter, we should add them.
> 
> The r236772 commit "[PATCH, libgomp] Rewire OpenACC async",
> <http://news.gmane.org/find-root.php?message_id=%3C56FA4F69.1060101%40codesourcery.com%3E>
> regressed (or, triggered/exposed the existing wrong behavior?)
> libgomp.oacc-c++/template-reduction.C execution testing for nvptx
> offloading.  (Please always send email about such known regressions, and
> XFAIL them with your commit -- that would have saved me an hour
> yesterday, when I bisected recent changes to figure out why that test
> suddenly fails.)

Sorry, Thomas. I was going to quickly send this follow-up patch, so glossed over XFAILing.

> For reference, here is a test case, a reduced C version of
> libgomp/testsuite/libgomp.oacc-c++/template-reduction.C.  This test case
> works (without Chung-Lin's "[PATCH, OpenACC] Make reduction arguments
> addressable" patch) if I enable "POCs", which surprises me a bit, because
> I thought after Cesar's recent changes, the gimplifier is doing the same
> thing of adding a data clause next to the reduction clause.  Probably
> it's not doing the exactly same thing, though.  Should it?  Cesar, do you
> have any comments on this?  For example (just guessing), should
> TREE_ADDRESSABLE be set where the gimplifier does its work, instead of in
> the three front ends?

There's really nothing wrong about Cesar's patch. The marking addressable needs
to be done earlier, or it may be too late during gimplification. I already
tried to fix this in gimplify.c before, but didn't completely work.

I'll add more testcases for this before I commit any final patches.

Thanks,
Chung-Lin

> 
>     // Reduced C version of libgomp/testsuite/libgomp.oacc-c++/template-reduction.C.
>     
>     const int n = 100;
>     
>     // Check present and async and an explicit firstprivate
>     
>     int
>     async_sum (int c)
>     {
>       int s = 0;
>     
>     #define POCs //present_or_copy(s)
>     #pragma acc parallel loop num_gangs (10) gang reduction (+:s) POCs firstprivate (c) async
>       for (int i = 0; i < n; i++)
>         s += i+c;
>     
>     #pragma acc wait
>     
>       return s;
>     }
>     
>     int
>     main()
>     {
>       int result = 0;
>     
>       for (int i = 0; i < n; i++)
>         {
>           result += i+1;
>         }
>     
>       if (async_sum (1) != result)
>         __builtin_abort ();
>     
>       return 0;
>     }
> 
> 
> Grüße
>  Thomas
>
diff mbox

Patch

Index: c/c-typeck.c
===================================================================
--- c/c-typeck.c	(revision 236845)
+++ c/c-typeck.c	(working copy)
@@ -12575,6 +12575,8 @@  c_finish_omp_clauses (tree clauses, enum c_omp_reg
 	      remove = true;
 	      break;
 	    }
+	  if (ort & C_ORT_ACC)
+	    c_mark_addressable (t);
 	  type = TREE_TYPE (t);
 	  if (TREE_CODE (t) == MEM_REF)
 	    type = TREE_TYPE (type);
Index: cp/semantics.c
===================================================================
--- cp/semantics.c	(revision 236845)
+++ cp/semantics.c	(working copy)
@@ -5827,6 +5827,8 @@  finish_omp_clauses (tree clauses, enum c_omp_regio
 		t = n;
 	      goto check_dup_generic_t;
 	    }
+	  if (ort & C_ORT_ACC)
+	    cxx_mark_addressable (t);
 	  goto check_dup_generic;
 	case OMP_CLAUSE_COPYPRIVATE:
 	  copyprivate_seen = true;
Index: fortran/trans-openmp.c
===================================================================
--- fortran/trans-openmp.c	(revision 236845)
+++ fortran/trans-openmp.c	(working copy)
@@ -2704,6 +2704,10 @@  gfc_trans_oacc_construct (gfc_code *code)
   gfc_start_block (&block);
   oacc_clauses = gfc_trans_omp_clauses (&block, code->ext.omp_clauses,
 					code->loc);
+  for (tree c = oacc_clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
+	&& DECL_P (OMP_CLAUSE_DECL (c)))
+      TREE_ADDRESSABLE (OMP_CLAUSE_DECL (c)) = 1;
   stmt = gfc_trans_omp_code (code->block->next, true);
   stmt = build2_loc (input_location, construct_code, void_type_node, stmt,
 		     oacc_clauses);
@@ -3501,6 +3505,10 @@  gfc_trans_oacc_combined_directive (gfc_code *code)
 	construct_clauses.lists[OMP_LIST_REDUCTION] = NULL;
       oacc_clauses = gfc_trans_omp_clauses (&block, &construct_clauses,
 					    code->loc);
+      for (tree c = oacc_clauses; c; c = OMP_CLAUSE_CHAIN (c))
+	if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
+	    && DECL_P (OMP_CLAUSE_DECL (c)))
+	  TREE_ADDRESSABLE (OMP_CLAUSE_DECL (c)) = 1;
     }
   if (!loop_clauses.seq)
     pblock = &block;