diff mbox

[4/16] Implement -foffload-alias

Message ID 565058F0.8040509@mentor.com
State New
Headers show

Commit Message

Tom de Vries Nov. 21, 2015, 11:43 a.m. UTC
On 13/11/15 12:39, Jakub Jelinek wrote:
> On Fri, Nov 13, 2015 at 12:29:51PM +0100, Richard Biener wrote:
>>> thanks for the explanation. Filed as PR68331 - '[meta-bug] fipa-pta issues'.
>>>
>>> Any feedback on the '#pragma GCC offload-alias=<none|pointer|all>' bit above?
>>> Is that sort of what you had in mind?
>>
>> Yes.  Whether that makes sense is another question of course.  You can
>> annotate memory references with MR_DEPENDENCE_BASE/CLIQUE yourself
>> as well if you know dependences without the users intervention.
>
> I really don't like even the GCC offload-alias, I just don't see anything
> special on the offload code.  Not to mention that the same issue is already
> with other outlined functions, like OpenMP tasks or parallel regions, those
> aren't offloaded, yet they can suffer from worse alias/points-to analysis
> too.

AFAIU there is one aspect that is different for offloaded code: the 
setup of the data on the device.

Consider this example:
...
unsigned int a[N];
unsigned int b[N];
unsigned int c[N];

int
main (void)
{
   ...

#pragma acc kernels copyin (a) copyin (b) copyout (c)
   {
     for (COUNTERTYPE ii = 0; ii < N; ii++)
       c[ii] = a[ii] + b[ii];
   }

   ...
...

At gimple level, we have:
...
#pragma omp target oacc_kernels \
   map(force_from:c [len: 2097152]) \
   map(force_to:b [len: 2097152]) \
   map(force_to:a [len: 2097152])
...

[ The meaning of the force_from/force_to mappings is given in 
include/gomp-constants.h:
...
     /* Allocate.  */
     GOMP_MAP_FORCE_ALLOC = (GOMP_MAP_FLAG_FORCE | GOMP_MAP_ALLOC),
     /* ..., and copy to device.  */
     GOMP_MAP_FORCE_TO = (GOMP_MAP_FLAG_FORCE | GOMP_MAP_TO),
     /* ..., and copy from device.  */
     GOMP_MAP_FORCE_FROM = (GOMP_MAP_FLAG_FORCE | GOMP_MAP_FROM),
     /* ..., and copy to and from device.  */
     GOMP_MAP_FORCE_TOFROM = (GOMP_MAP_FLAG_FORCE | GOMP_MAP_TOFROM),
...  ]

So before calling the offloaded function, a separate alloc is done for 
a, b and c, and the base pointers of the newly allocated objects are 
passed to the offloaded function.

This means we can mark those base pointers as restrict in the offloaded 
function.

Attached proof-of-concept patch implements that.

> We simply have some compiler internal interface between the caller and
> callee of the outlined regions, each interface in between those has
> its own structure type used to communicate the info;
> we can attach attributes on the fields, or some flags to indicate some
> properties interesting from aliasing POV.
> We don't really need to perform
> full IPA-PTA, perhaps it would be enough to a) record somewhere in cgraph
> the relationship in between such callers and callees (for offloading regions
> we already have "omp target entrypoint" attribute on the callee and a
> singler caller), tell LTO if possible not to split those into different
> partitions if easily possible, and then just for these pairs perform
> aliasing/points-to analysis in the caller and the result record using
> cliques/special attributes/whatever to the callee side, so that the callee
> (outlined OpenMP/OpenACC/Cilk+ region) can then improve its alias analysis.

As a start, is the approach of this patch OK?

It will allow us to commit the oacc kernels patch series with the 
ability to parallelize non-trivial testcases, and work on improving the 
alias bit after that.

Thanks,
- Tom

Comments

Richard Biener Nov. 23, 2015, 11:41 a.m. UTC | #1
On Sat, 21 Nov 2015, Tom de Vries wrote:

> On 13/11/15 12:39, Jakub Jelinek wrote:
> > On Fri, Nov 13, 2015 at 12:29:51PM +0100, Richard Biener wrote:
> > > > thanks for the explanation. Filed as PR68331 - '[meta-bug] fipa-pta
> > > > issues'.
> > > > 
> > > > Any feedback on the '#pragma GCC offload-alias=<none|pointer|all>' bit
> > > > above?
> > > > Is that sort of what you had in mind?
> > > 
> > > Yes.  Whether that makes sense is another question of course.  You can
> > > annotate memory references with MR_DEPENDENCE_BASE/CLIQUE yourself
> > > as well if you know dependences without the users intervention.
> > 
> > I really don't like even the GCC offload-alias, I just don't see anything
> > special on the offload code.  Not to mention that the same issue is already
> > with other outlined functions, like OpenMP tasks or parallel regions, those
> > aren't offloaded, yet they can suffer from worse alias/points-to analysis
> > too.
> 
> AFAIU there is one aspect that is different for offloaded code: the setup of
> the data on the device.
> 
> Consider this example:
> ...
> unsigned int a[N];
> unsigned int b[N];
> unsigned int c[N];
> 
> int
> main (void)
> {
>   ...
> 
> #pragma acc kernels copyin (a) copyin (b) copyout (c)
>   {
>     for (COUNTERTYPE ii = 0; ii < N; ii++)
>       c[ii] = a[ii] + b[ii];
>   }
> 
>   ...
> ...
> 
> At gimple level, we have:
> ...
> #pragma omp target oacc_kernels \
>   map(force_from:c [len: 2097152]) \
>   map(force_to:b [len: 2097152]) \
>   map(force_to:a [len: 2097152])
> ...
> 
> [ The meaning of the force_from/force_to mappings is given in
> include/gomp-constants.h:
> ...
>     /* Allocate.  */
>     GOMP_MAP_FORCE_ALLOC = (GOMP_MAP_FLAG_FORCE | GOMP_MAP_ALLOC),
>     /* ..., and copy to device.  */
>     GOMP_MAP_FORCE_TO = (GOMP_MAP_FLAG_FORCE | GOMP_MAP_TO),
>     /* ..., and copy from device.  */
>     GOMP_MAP_FORCE_FROM = (GOMP_MAP_FLAG_FORCE | GOMP_MAP_FROM),
>     /* ..., and copy to and from device.  */
>     GOMP_MAP_FORCE_TOFROM = (GOMP_MAP_FLAG_FORCE | GOMP_MAP_TOFROM),
> ...  ]
> 
> So before calling the offloaded function, a separate alloc is done for a, b
> and c, and the base pointers of the newly allocated objects are passed to the
> offloaded function.
> 
> This means we can mark those base pointers as restrict in the offloaded
> function.
> 
> Attached proof-of-concept patch implements that.
> 
> > We simply have some compiler internal interface between the caller and
> > callee of the outlined regions, each interface in between those has
> > its own structure type used to communicate the info;
> > we can attach attributes on the fields, or some flags to indicate some
> > properties interesting from aliasing POV.
> > We don't really need to perform
> > full IPA-PTA, perhaps it would be enough to a) record somewhere in cgraph
> > the relationship in between such callers and callees (for offloading regions
> > we already have "omp target entrypoint" attribute on the callee and a
> > singler caller), tell LTO if possible not to split those into different
> > partitions if easily possible, and then just for these pairs perform
> > aliasing/points-to analysis in the caller and the result record using
> > cliques/special attributes/whatever to the callee side, so that the callee
> > (outlined OpenMP/OpenACC/Cilk+ region) can then improve its alias analysis.
> 
> As a start, is the approach of this patch OK?

Works for me but leaving to Jakub to review for correctness.

Richard.

> It will allow us to commit the oacc kernels patch series with the ability to
> parallelize non-trivial testcases, and work on improving the alias bit after
> that.
> 
> Thanks,
> - Tom
> 
> 
> 
>
diff mbox

Patch

Mark pointers to allocated target vars as restricted, if possible

---
 gcc/omp-low.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 62 insertions(+), 5 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 268b67b..0ce822d 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -1372,7 +1372,8 @@  build_sender_ref (tree var, omp_context *ctx)
 /* Add a new field for VAR inside the structure CTX->SENDER_DECL.  */
 
 static void
-install_var_field (tree var, bool by_ref, int mask, omp_context *ctx)
+install_var_field_1 (tree var, bool by_ref, int mask, omp_context *ctx,
+		     bool base_pointers_restrict)
 {
   tree field, type, sfield = NULL_TREE;
   splay_tree_key key = (splay_tree_key) var;
@@ -1396,7 +1397,11 @@  install_var_field (tree var, bool by_ref, int mask, omp_context *ctx)
       type = build_pointer_type (build_pointer_type (type));
     }
   else if (by_ref)
-    type = build_pointer_type (type);
+    {
+      type = build_pointer_type (type);
+      if (base_pointers_restrict)
+	type = build_qualified_type (type, TYPE_QUAL_RESTRICT);
+    }
   else if ((mask & 3) == 1 && is_reference (var))
     type = TREE_TYPE (type);
 
@@ -1460,6 +1465,12 @@  install_var_field (tree var, bool by_ref, int mask, omp_context *ctx)
     splay_tree_insert (ctx->sfield_map, key, (splay_tree_value) sfield);
 }
 
+static void
+install_var_field (tree var, bool by_ref, int mask, omp_context *ctx)
+{
+  install_var_field_1 (var, by_ref, mask, ctx, false);
+}
+
 static tree
 install_var_local (tree var, omp_context *ctx)
 {
@@ -1816,7 +1827,8 @@  fixup_child_record_type (omp_context *ctx)
    specified by CLAUSES.  */
 
 static void
-scan_sharing_clauses (tree clauses, omp_context *ctx)
+scan_sharing_clauses_1 (tree clauses, omp_context *ctx,
+			bool base_pointers_restrict)
 {
   tree c, decl;
   bool scan_array_reductions = false;
@@ -2073,7 +2085,7 @@  scan_sharing_clauses (tree clauses, omp_context *ctx)
 		      && TREE_CODE (TREE_TYPE (decl)) == ARRAY_TYPE)
 		    install_var_field (decl, true, 7, ctx);
 		  else
-		    install_var_field (decl, true, 3, ctx);
+		    install_var_field_1 (decl, true, 3, ctx, base_pointers_restrict);
 		  if (is_gimple_omp_offloaded (ctx->stmt))
 		    install_var_local (decl, ctx);
 		}
@@ -2339,6 +2351,12 @@  scan_sharing_clauses (tree clauses, omp_context *ctx)
 	scan_omp (&OMP_CLAUSE_LINEAR_GIMPLE_SEQ (c), ctx);
 }
 
+static void
+scan_sharing_clauses (tree clauses, omp_context *ctx)
+{
+  scan_sharing_clauses_1 (clauses, ctx, false);
+}
+
 /* Create a new name for omp child function.  Returns an identifier.  If
    IS_CILK_FOR is true then the suffix for the child function is
    "_cilk_for_fn."  */
@@ -3056,13 +3074,52 @@  scan_omp_target (gomp_target *stmt, omp_context *outer_ctx)
   DECL_NAMELESS (name) = 1;
   TYPE_NAME (ctx->record_type) = name;
   TYPE_ARTIFICIAL (ctx->record_type) = 1;
+
+  bool base_pointers_restrict = false;
   if (offloaded)
     {
       create_omp_child_function (ctx, false);
       gimple_omp_target_set_child_fn (stmt, ctx->cb.dst_fn);
+
+      /* If all the clauses force allocation, we can be certain that the objects
+	 on the target are disjoint, and therefore mark the base pointers as
+	 restrict.  */
+      base_pointers_restrict = true;
+      tree c;
+      for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+	{
+	  switch (OMP_CLAUSE_CODE (c))
+	    {
+	    case OMP_CLAUSE_MAP:
+	      switch (OMP_CLAUSE_MAP_KIND (c))
+		{
+		case GOMP_MAP_ALLOC:
+		case GOMP_MAP_FORCE_TO:
+		case GOMP_MAP_FORCE_FROM:
+		case GOMP_MAP_FORCE_TOFROM:
+		  break;
+		default:
+		  base_pointers_restrict = false;
+		  break;
+		}
+	      break;
+
+	    default:
+	      base_pointers_restrict = false;
+	      break;
+	    }
+
+	  if (!base_pointers_restrict)
+	    break;
+	}
+      if (base_pointers_restrict)
+	{
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    fprintf (dump_file, "Base pointers in offloaded function are restrict\n");
+	}
     }
 
-  scan_sharing_clauses (clauses, ctx);
+  scan_sharing_clauses_1 (clauses, ctx, base_pointers_restrict);
   scan_omp (gimple_omp_body_ptr (stmt), ctx);
 
   if (TYPE_FIELDS (ctx->record_type) == NULL)