diff mbox series

[OpenACC,2.7] Connect readonly modifier to points-to analysis

Message ID 5196826c-e81a-ab5c-63e9-bd8509232da0@siemens.com
State New
Headers show
Series [OpenACC,2.7] Connect readonly modifier to points-to analysis | expand

Commit Message

Chung-Lin Tang July 25, 2023, 3:52 p.m. UTC
On 2023/7/11 2:33 AM, Chung-Lin Tang via Gcc-patches wrote:
> As we discussed earlier, the work for actually linking this to middle-end
> points-to analysis is a somewhat non-trivial issue. This first patch allows
> the language feature to be used in OpenACC directives first (with no effect for now).
> The middle-end changes are probably going to be a later patch.

This second patch tries to link the readonly modifier to points-to analysis.

There already exists SSA_NAME_POINTS_TO_READONLY_MEMORY and it's support in the
alias oracle routines in tree-ssa-alias.cc, so basically what this patch does is
try to make the variables holding the array section base pointers to have this
flag set.

There is an another OMP_CLAUSE_MAP_POINTS_TO_READONLY set by front-ends on the
associated pointer clauses if OMP_CLAUSE_MAP_READONLY is set.
Also a DECL_POINTS_TO_READONLY flag is set for VAR_DECLs when creating the tmp
vars carrying these receiver references on the offloaded side. These
eventually get translated to SSA_NAME_POINTS_TO_READONLY_MEMORY.

This still doesn't always work as expected in terms of optimization:
struct pointer fields and Fortran arrays (kind of like C structs) which have
several accesses to create the pointer access on the receive/offloaded side,
and SRA appears to not work on these sequences, so gets in the way of much
redundancy elimination.

Currently have one testcase where we can demonstrate 'readonly' can avoid
a clobber by function call. Tested on powerpc64le-linux/nvptx.

Note this patch is create a-top of the front-end patch.
(will respond to the other front-end patch comments later)

Thanks,
Chung-Lin

2023-07-25  Chung-Lin Tang  <cltang@codesourcery.com>

gcc/c/ChangeLog:

	* c-typeck.cc (handle_omp_array_sections):
	Set OMP_CLAUSE_MAP_POINTS_TO_READONLY on pointer clause.

gcc/cp/ChangeLog:

	* semantics.cc (handle_omp_array_sections):
	Set OMP_CLAUSE_MAP_POINTS_TO_READONLY on pointer clause.

gcc/fortran/ChangeLog:

	* trans-openmp.cc (gfc_trans_omp_array_section):
	Set OMP_CLAUSE_MAP_POINTS_TO_READONLY on pointer clause.

gcc/ChangeLog:

	* gimple-expr.cc (copy_var_decl): Copy DECL_POINTS_TO_READONLY
	for VAR_DECLs.
	* gimplify.cc (struct gimplify_omp_ctx):
	Add 'hash_set<tree_operand_hash> *pt_readonly_ptrs' field.
	(internal_get_tmp_var):	Set
	DECL_POINTS_TO_READONLY/SSA_NAME_POINTS_TO_READONLY_MEMORY for
	new temp vars.
	(build_omp_struct_comp_nodes):
	Set OMP_CLAUSE_MAP_POINTS_TO_READONLY on pointer clause.
	(gimplify_scan_omp_clauses): Collect OMP_CLAUSE_MAP_POINTS_TO_READONLY
	to ctx->pt_readonly_ptrs.
	* omp-low.cc (lower_omp_target): Set DECL_POINTS_TO_READONLY for
	variables of receiver refs.
	* tree-pretty-print.cc (dump_omp_clause):
	Print OMP_CLAUSE_MAP_POINTS_TO_READONLY.
	(dump_generic_node): Print SSA_NAME_POINTS_TO_READONLY_MEMORY.
	* tree.h (DECL_POINTS_TO_READONLY): New macro.
	(OMP_CLAUSE_MAP_POINTS_TO_READONLY): New macro.

gcc/testsuite/ChangeLog:

	* c-c++-common/goacc/readonly-1.c: Adjust testcase.
	* c-c++-common/goacc/readonly-2.c: New testcase.
	* gfortran.dg/goacc/readonly-1.f90: Adjust testcase.

Comments

Thomas Schwinge Oct. 27, 2023, 2:28 p.m. UTC | #1
Hi!

Richard, as the original author of 'SSA_NAME_POINTS_TO_READONLY_MEMORY':
2018 commit 6214d5c7e7470bdd5ecbeae668c2522551bfebbc (Subversion r263958)
"Move const_parm trick to generic code"; 'gcc/tree.h':

    /* Nonzero if this SSA_NAME is known to point to memory that may not
       be written to.  This is set for default defs of function parameters
       that have a corresponding r or R specification in the functions
       fn spec attribute.  This is used by alias analysis.  */
    #define SSA_NAME_POINTS_TO_READONLY_MEMORY(NODE) \
        SSA_NAME_CHECK (NODE)->base.deprecated_flag

..., may I ask you to please help review the following patch
(full-quoted)?

For context: this patch here ("second patch") depends on a first patch:
<inbox.sourceware.org/d0e6013f-ca38-b98d-dc01-b30adbd5901a@siemens.com>
"[PATCH, OpenACC 2.7] readonly modifier support in front-ends".  That one
is still under review/rework; so you're not able to apply this second
patch here.

In a nutshell: a 'readonly' modifier has been added to the OpenACC
'copyin' clause (copy host to device memory, don't copy back at end of
region):

| If the optional 'readonly' modifier appears, then the implementation may assume that the data
| referenced by _var-list_ is never written to within the applicable region.

That is, for example (untested):

    #pragma acc routine
    void escape(int *);

    int x[32] = [...];
    #pragma acc parallel copyin(readonly: x)
    {
      int a1 = x[3];
      escape(x);
      int a2 = x[3]; // Per 'readonly', don't need to reload 'x[3]' here.
      //x[22] = 0; // Invalid -- but no diagnostic mandated.
    }

What Chung-Lin's first patch does is mark the OMP clause for 'x' (not the
'x' decl itself!) as 'readonly', via a new 'OMP_CLAUSE_MAP_READONLY'
flag.

The actual optimization then is done in this second patch.  Chung-Lin
found that he could use 'SSA_NAME_POINTS_TO_READONLY_MEMORY' for that.
I don't have much experience with most of the following generic code, so
would appreciate a helping hand, whether that conceptually makes sense as
well as from the implementation point of view:

On 2023-07-25T23:52:06+0800, Chung-Lin Tang via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> On 2023/7/11 2:33 AM, Chung-Lin Tang via Gcc-patches wrote:
>> As we discussed earlier, the work for actually linking this to middle-end
>> points-to analysis is a somewhat non-trivial issue. This first patch allows
>> the language feature to be used in OpenACC directives first (with no effect for now).
>> The middle-end changes are probably going to be a later patch.
>
> This second patch tries to link the readonly modifier to points-to analysis.
>
> There already exists SSA_NAME_POINTS_TO_READONLY_MEMORY and it's support in the
> alias oracle routines in tree-ssa-alias.cc, so basically what this patch does is
> try to make the variables holding the array section base pointers to have this
> flag set.
>
> There is an another OMP_CLAUSE_MAP_POINTS_TO_READONLY set by front-ends on the
> associated pointer clauses if OMP_CLAUSE_MAP_READONLY is set.
> Also a DECL_POINTS_TO_READONLY flag is set for VAR_DECLs when creating the tmp
> vars carrying these receiver references on the offloaded side. These
> eventually get translated to SSA_NAME_POINTS_TO_READONLY_MEMORY.


> This still doesn't always work as expected in terms of optimization:
> struct pointer fields and Fortran arrays (kind of like C structs) which have
> several accesses to create the pointer access on the receive/offloaded side,
> and SRA appears to not work on these sequences, so gets in the way of much
> redundancy elimination.

I understand correctly that this is left as future work?  Please add the test
cases you have, XFAILed in some reasonable way.


> Currently have one testcase where we can demonstrate 'readonly' can avoid
> a clobber by function call.

:-)


> --- a/gcc/c/c-typeck.cc
> +++ b/gcc/c/c-typeck.cc
> @@ -14258,6 +14258,8 @@ handle_omp_array_sections (tree c, enum c_omp_region_type ort)
>       OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_ATTACH_DETACH);
>        else
>       OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_FIRSTPRIVATE_POINTER);
> +      if (OMP_CLAUSE_MAP_READONLY (c))
> +     OMP_CLAUSE_MAP_POINTS_TO_READONLY (c2) = 1;
>        OMP_CLAUSE_MAP_IMPLICIT (c2) = OMP_CLAUSE_MAP_IMPLICIT (c);
>        if (OMP_CLAUSE_MAP_KIND (c2) != GOMP_MAP_FIRSTPRIVATE_POINTER
>         && !c_mark_addressable (t))

> --- a/gcc/cp/semantics.cc
> +++ b/gcc/cp/semantics.cc
> @@ -5872,6 +5872,8 @@ handle_omp_array_sections (tree c, enum c_omp_region_type ort)
>           }
>         else
>           OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_FIRSTPRIVATE_POINTER);
> +       if (OMP_CLAUSE_MAP_READONLY (c))
> +         OMP_CLAUSE_MAP_POINTS_TO_READONLY (c2) = 1;
>         OMP_CLAUSE_MAP_IMPLICIT (c2) = OMP_CLAUSE_MAP_IMPLICIT (c);
>         if (OMP_CLAUSE_MAP_KIND (c2) != GOMP_MAP_FIRSTPRIVATE_POINTER
>             && !cxx_mark_addressable (t))

> --- a/gcc/fortran/trans-openmp.cc
> +++ b/gcc/fortran/trans-openmp.cc
> @@ -2524,6 +2524,8 @@ gfc_trans_omp_array_section (stmtblock_t *block, gfc_exec_op op,
>        node3 = build_omp_clause (input_location, OMP_CLAUSE_MAP);
>        OMP_CLAUSE_SET_MAP_KIND (node3, ptr_kind);
>        OMP_CLAUSE_DECL (node3) = gfc_conv_descriptor_data_get (decl);
> +      if (n->u.readonly)
> +     OMP_CLAUSE_MAP_POINTS_TO_READONLY (node3) = 1;
>        /* This purposely does not include GOMP_MAP_ALWAYS_POINTER.  The extra
>        cast prevents gimplify.cc from recognising it as being part of the
>        struct - and adding an 'alloc: for the 'desc.data' pointer, which
> @@ -2559,6 +2561,8 @@ gfc_trans_omp_array_section (stmtblock_t *block, gfc_exec_op op,
>                               OMP_CLAUSE_MAP);
>        OMP_CLAUSE_SET_MAP_KIND (node3, ptr_kind);
>        OMP_CLAUSE_DECL (node3) = decl;
> +      if (n->u.readonly)
> +     OMP_CLAUSE_MAP_POINTS_TO_READONLY (node3) = 1;
>      }

Could combine these two into one, after
'if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (decl)))' reconverges here, like
where 'OMP_CLAUSE_SIZE (node3)' is set:

>    ptr2 = fold_convert (ptrdiff_type_node, ptr2);
>    OMP_CLAUSE_SIZE (node3) = fold_build2 (MINUS_EXPR, ptrdiff_type_node,

Is 'n->u.readonly == OMP_CLAUSE_MAP_READONLY (node)'?  If yes, would the
latter be clearer to use as the 'if' expression (like in C, C++ front
ends)?

I see further additional 'OMP_CLAUSE_MAP' clauses synthesized, for
example in 'gcc/cp/semantics.cc:handle_omp_array_sections', or
'gcc/fortran/trans-openmp.cc:gfc_trans_omp_array_section', also
'gcc/gimplify.cc'.  I assume these are not relevant to have
'OMP_CLAUSE_MAP_READONLY' -> 'OMP_CLAUSE_MAP_POINTS_TO_READONLY'
propagated?  Actually, per your changes (see below), there is one
'OMP_CLAUSE_MAP_POINTS_TO_READONLY' propagation in
'gcc/gimplify.cc:build_omp_struct_comp_nodes'.

Is the current situation re flag setting/propagation what was empirically
necessary to make the test case work, or is it a systematic review?  (The
former is fine; I'd just like to know.)

> --- a/gcc/gimple-expr.cc
> +++ b/gcc/gimple-expr.cc
> @@ -376,6 +376,8 @@ copy_var_decl (tree var, tree name, tree type)
>    DECL_CONTEXT (copy) = DECL_CONTEXT (var);
>    TREE_USED (copy) = 1;
>    DECL_SEEN_IN_BIND_EXPR_P (copy) = 1;
> +  if (VAR_P (var))
> +    DECL_POINTS_TO_READONLY (copy) = DECL_POINTS_TO_READONLY (var);
>    DECL_ATTRIBUTES (copy) = DECL_ATTRIBUTES (var);
>    if (DECL_USER_ALIGN (var))
>      {

> --- a/gcc/gimplify.cc
> +++ b/gcc/gimplify.cc
> @@ -221,6 +221,7 @@ struct gimplify_omp_ctx
>    splay_tree variables;
>    hash_set<tree> *privatized_types;
>    tree clauses;
> +  hash_set<tree_operand_hash> *pt_readonly_ptrs;
>    /* Iteration variables in an OMP_FOR.  */
>    vec<tree> loop_iter_var;
>    location_t location;
> @@ -628,6 +629,15 @@ internal_get_tmp_var (tree val, gimple_seq *pre_p, gimple_seq *post_p,
>    gimplify_expr (&val, pre_p, post_p, is_gimple_reg_rhs_or_call,
>                fb_rvalue);
>
> +  bool pt_readonly = false;
> +  if (gimplify_omp_ctxp && gimplify_omp_ctxp->pt_readonly_ptrs)
> +    {
> +      tree ptr = val;
> +      if (TREE_CODE (ptr) == POINTER_PLUS_EXPR)
> +     ptr = TREE_OPERAND (ptr, 0);
> +      pt_readonly = gimplify_omp_ctxp->pt_readonly_ptrs->contains (ptr);
> +    }

'POINTER_PLUS_EXPR' is the only special thing we may run into, here?
(Generally, I prefer 'if', 'else if, [...], 'else gcc_unreachable ()'.)

> +
>    if (allow_ssa
>        && gimplify_ctxp->into_ssa
>        && is_gimple_reg_type (TREE_TYPE (val)))
> @@ -639,9 +649,18 @@ internal_get_tmp_var (tree val, gimple_seq *pre_p, gimple_seq *post_p,
>         if (name)
>           SET_SSA_NAME_VAR_OR_IDENTIFIER (t, create_tmp_var_name (name));
>       }
> +      if (pt_readonly)
> +     SSA_NAME_POINTS_TO_READONLY_MEMORY (t) = 1;
>      }
>    else
> -    t = lookup_tmp_var (val, is_formal, not_gimple_reg);
> +    {
> +      t = lookup_tmp_var (val, is_formal, not_gimple_reg);
> +      if (pt_readonly)
> +     {
> +       DECL_POINTS_TO_READONLY (t) = 1;
> +       gimplify_omp_ctxp->pt_readonly_ptrs->add (t);
> +     }
> +    }
>
>    mod = build2 (INIT_EXPR, TREE_TYPE (t), t, unshare_expr (val));
>
> @@ -8906,6 +8925,8 @@ build_omp_struct_comp_nodes (enum tree_code code, tree grp_start, tree grp_end,
>    OMP_CLAUSE_SET_MAP_KIND (c2, mkind);
>    OMP_CLAUSE_DECL (c2) = unshare_expr (OMP_CLAUSE_DECL (grp_end));
>    OMP_CLAUSE_CHAIN (c2) = NULL_TREE;
> +  if (OMP_CLAUSE_MAP_POINTS_TO_READONLY (grp_end))
> +    OMP_CLAUSE_MAP_POINTS_TO_READONLY (c2) = 1;
>    tree grp_mid = NULL_TREE;
>    if (OMP_CLAUSE_CHAIN (grp_start) != grp_end)
>      grp_mid = OMP_CLAUSE_CHAIN (grp_start);

For my understanding, is this empirically necessary, or a systematic
review?

> @@ -11741,6 +11762,16 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
>
>             gimplify_omp_ctxp = outer_ctx;
>           }
> +       else if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
> +                && (code == OACC_PARALLEL
> +                    || code == OACC_KERNELS
> +                    || code == OACC_SERIAL)
> +                && OMP_CLAUSE_MAP_POINTS_TO_READONLY (c))
> +         {
> +           if (ctx->pt_readonly_ptrs == NULL)
> +             ctx->pt_readonly_ptrs = new hash_set<tree_operand_hash> ();
> +           ctx->pt_readonly_ptrs->add (OMP_CLAUSE_DECL (c));
> +         }
>         if (notice_outer)
>           goto do_notice;
>         break;

Also need to 'delete ctx->pt_readonly_ptrs;' somewhere.

> --- a/gcc/omp-low.cc
> +++ b/gcc/omp-low.cc
> @@ -14098,6 +14098,8 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
>               if (ref_to_array)
>                 x = fold_convert_loc (clause_loc, TREE_TYPE (new_var), x);
>               gimplify_expr (&x, &new_body, NULL, is_gimple_val, fb_rvalue);
> +             if (OMP_CLAUSE_MAP_POINTS_TO_READONLY (c) && VAR_P (x))
> +               DECL_POINTS_TO_READONLY (x) = 1;
>               if ((is_ref && !ref_to_array)
>                   || ref_to_ptr)
>                 {

This is in the middle of the
"Handle GOMP_MAP_FIRSTPRIVATE_{POINTER,REFERENCE} in second pass" code
block.  Again, for my understanding, is this empirically necessary, or a
systematic review?

> --- a/gcc/testsuite/c-c++-common/goacc/readonly-1.c
> +++ b/gcc/testsuite/c-c++-common/goacc/readonly-1.c
> @@ -19,8 +19,8 @@ int main (void)
>    return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(readonly,to:\\*s.ptr \\\[len: 64\\\]\\) .+ map\\(readonly,to:x\\\[0\\\] \\\[len: 128\\\]\\)" 1 "original" { target { c } } } } */
> -/* { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(readonly,to:\\*NON_LVALUE_EXPR <s.ptr> \\\[len: 64\\\]\\) .+ map\\(readonly,to:x\\\[0\\\] \\\[len: 128\\\]\\)" 1 "original" { target { c++ } } } } */
> +/* { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(readonly,to:\\*s.ptr \\\[len: 64\\\]\\) map\\(pt_readonly,attach_detach:s.ptr \\\[bias: 0\\\]\\) map\\(readonly,to:x\\\[0\\\] \\\[len: 128\\\]\\) map\\(pt_readonly,firstprivate:x \\\[pointer assign, bias: 0\\\]\\)" 1 "original" { target { c } } } } */
> +/* { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(readonly,to:\\*NON_LVALUE_EXPR <s.ptr> \\\[len: 64\\\]\\) map\\(pt_readonly,attach_detach:s.ptr \\\[bias: 0\\\]\\) map\\(readonly,to:x\\\[0\\\] \\\[len: 128\\\]\\) map\\(pt_readonly,firstprivate:x \\\[pointer assign, bias: 0\\\]\\)" 1 "original" { target { c++ } } } } */
>  /* { dg-final { scan-tree-dump-times "(?n)#pragma acc cache \\(readonly:x\\\[0\\\] \\\[len: 128\\\]\\);$" 1 "original" } } */

I suppose the new 'map(pt_readonly,attach_detach:s.ptr [bias: 0])' clause
was previously "hidden" in '.+'?  Please then change that in the first
patch "[PATCH, OpenACC 2.7] readonly modifier support in front-ends", so
that we can see here what actually is changing (only 'pt_readonly', I
suppose).

> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/goacc/readonly-2.c
> @@ -0,0 +1,15 @@
> +/* { dg-additional-options "-O -fdump-tree-fre" } */
> +
> +#pragma acc routine
> +extern void foo (int *ptr, int val);
> +
> +int main (void)
> +{
> +  int r, a[32];
> +  #pragma acc parallel copyin(readonly: a[:32]) copyout(r)
> +  {
> +    foo (a, a[8]);
> +    r = a[8];
> +  }
> +}
> +/* { dg-final { scan-tree-dump-times "r\.\[_0-9\]+ = MEM\\\[\[^_\]+_\[0-9\]+\\(ptro\\)\\\]\\\[8\\\];" 1 "fre1" } } */

Please add a comment why 'fre1', and what generally is being checked
here; that's not obvious to the casual reader.  (That is, me in a few
weeks.)  ;-)

Also add a scan for "before the optimization": two 'MEM's, I suppose?

> --- a/gcc/testsuite/gfortran.dg/goacc/readonly-1.f90
> +++ b/gcc/testsuite/gfortran.dg/goacc/readonly-1.f90
> @@ -20,8 +20,8 @@ program main
>    !$acc end parallel
>  end program main
>
> -! { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(readonly,to:\\*\\(integer\\(kind=4\\)\\\[0:\\\] \\*\\) parm.*data \\\[len: .+\\\]\\) .+ map\\(readonly,to:\\*\\(integer\\(kind=4\\)\\\[0:\\\] \\*\\) parm.*data \\\[len: .+\\\]\\)" 1 "original" } }
> -! { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(readonly,to:a\\\[\\(\\(integer\\(kind=8\\)\\) parm.*data - \\(integer\\(kind=8\\)\\) &a\\) / 4\\\] \\\[len: .+\\\]\\) .+ map\\(readonly,to:b\\\[\\(\\(integer\\(kind=8\\)\\) parm.*data - \\(integer\\(kind=8\\)\\) &b\\) / 4\\\] \\\[len: .+\\\]\\)" 1 "original" } }
> +! { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(readonly,to:\\*\\(integer\\(kind=4\\)\\\[0:\\\] \\*\\) parm.*data \\\[len: .+\\\]\\) map\\(pt_readonly,alloc:a.0 \\\[pointer assign, bias: \\(integer\\(kind=8\\)\\) parm.*data - \\(integer\\(kind=8\\)\\) a.0\\\]\\) map\\(readonly,to:\\*\\(integer\\(kind=4\\)\\\[0:\\\] \\*\\) parm.*data \\\[len: .+\\\]\\) map\\(pt_readonly,alloc:b \\\[pointer assign, bias: \\(integer\\(kind=8\\)\\) parm.*data - \\(integer\\(kind=8\\)\\) b\\\]\\)" 1 "original" } }
> +! { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(readonly,to:a\\\[\\(\\(integer\\(kind=8\\)\\) parm.*data - \\(integer\\(kind=8\\)\\) &a\\) / 4\\\] \\\[len: .+\\\]\\) map\\(pt_readonly,alloc:a \\\[pointer assign, bias: \\(integer\\(kind=8\\)\\) parm.*data - \\(integer\\(kind=8\\)\\) &a\\\]\\) map\\(readonly,to:b\\\[\\(\\(integer\\(kind=8\\)\\) parm.*data - \\(integer\\(kind=8\\)\\) &b\\) / 4\\\] \\\[len: .+\\\]\\) map\\(pt_readonly,alloc:b \\\[pointer assign, bias: \\(integer\\(kind=8\\)\\) parm.*data - \\(integer\\(kind=8\\)\\) &b\\\]\\)" 1 "original" } }
>  ! { dg-final { scan-tree-dump-times "(?n)#pragma acc cache \\(readonly:\\*\\(integer\\(kind=4\\)\\\[0:\\\] \\*\\) parm.*data \\\[len: .+\\\]\\) \\(readonly:\\*\\(integer\\(kind=4\\)\\\[0:\\\] \\*\\) parm.*data \\\[len: .+\\\]\\);" 2 "original" } }

Same comment as for 'c-c++-common/goacc/readonly-1.c'.

> --- a/gcc/tree-pretty-print.cc
> +++ b/gcc/tree-pretty-print.cc
> @@ -907,6 +907,8 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, dump_flags_t flags)
>        pp_string (pp, "map(");
>        if (OMP_CLAUSE_MAP_READONLY (clause))
>       pp_string (pp, "readonly,");
> +      if (OMP_CLAUSE_MAP_POINTS_TO_READONLY (clause))
> +     pp_string (pp, "pt_readonly,");
>        switch (OMP_CLAUSE_MAP_KIND (clause))
>       {
>       case GOMP_MAP_ALLOC:
> @@ -3436,6 +3438,8 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
>       pp_string (pp, "(D)");
>        if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (node))
>       pp_string (pp, "(ab)");
> +      if (SSA_NAME_POINTS_TO_READONLY_MEMORY (node))
> +     pp_string (pp, "(ptro)");
>        break;
>
>      case WITH_SIZE_EXPR:

> --- a/gcc/tree-ssanames.cc
> +++ b/gcc/tree-ssanames.cc
> @@ -402,6 +402,9 @@ make_ssa_name_fn (struct function *fn, tree var, gimple *stmt,
>    else
>      SSA_NAME_RANGE_INFO (t) = NULL;
>
> +  if (VAR_P (var) && DECL_POINTS_TO_READONLY (var))
> +    SSA_NAME_POINTS_TO_READONLY_MEMORY (t) = 1;
> +
>    SSA_NAME_IN_FREE_LIST (t) = 0;
>    SSA_NAME_IS_DEFAULT_DEF (t) = 0;
>    init_ssa_name_imm_use (t);

> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -1021,6 +1021,13 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int,
>  #define DECL_HIDDEN_STRING_LENGTH(NODE) \
>    (TREE_CHECK (NODE, PARM_DECL)->decl_common.decl_nonshareable_flag)
>
> +/* In a VAR_DECL, set for variables regarded as pointing to memory not written
> +   to. SSA_NAME_POINTS_TO_READONLY_MEMORY gets set for SSA_NAMEs created from
> +   such VAR_DECLs. Currently used by OpenACC 'readonly' modifier in copyin
> +   clauses.  */
> +#define DECL_POINTS_TO_READONLY(NODE) \
> +  (TREE_CHECK (NODE, VAR_DECL)->decl_common.decl_not_flexarray)
> +
>  /* In a CALL_EXPR, means that the call is the jump from a thunk to the
>     thunked-to function.  Be careful to avoid using this macro when one of the
>     next two applies instead.  */
> @@ -1815,6 +1822,10 @@ class auto_suppress_location_wrappers
>  #define OMP_CLAUSE_MAP_READONLY(NODE) \
>    TREE_READONLY (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP))
>
> +/* Set if 'OMP_CLAUSE_DECL (NODE)' points to read-only memory.  */
> +#define OMP_CLAUSE_MAP_POINTS_TO_READONLY(NODE) \
> +  TREE_CONSTANT (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP))
> +
>  /* Same as above, for use in OpenACC cache directives.  */
>  #define OMP_CLAUSE__CACHE__READONLY(NODE) \
>    TREE_READONLY (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__CACHE_))

As in my "[PATCH, OpenACC 2.7] readonly modifier support in front-ends"
review, please document how certain flags are used for OMP clauses.


I note you're not actually using 'OMP_CLAUSE__CACHE__READONLY' anywhere
-- but that's OK given the current 'gcc/gimplify.cc:gimplify_oacc_cache'.
;-)


Grüße
 Thomas
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
Richard Biener Oct. 30, 2023, 12:46 p.m. UTC | #2
On Fri, Oct 27, 2023 at 4:28 PM Thomas Schwinge <thomas@codesourcery.com> wrote:
>
> Hi!
>
> Richard, as the original author of 'SSA_NAME_POINTS_TO_READONLY_MEMORY':
> 2018 commit 6214d5c7e7470bdd5ecbeae668c2522551bfebbc (Subversion r263958)
> "Move const_parm trick to generic code"; 'gcc/tree.h':
>
>     /* Nonzero if this SSA_NAME is known to point to memory that may not
>        be written to.  This is set for default defs of function parameters
>        that have a corresponding r or R specification in the functions
>        fn spec attribute.  This is used by alias analysis.  */
>     #define SSA_NAME_POINTS_TO_READONLY_MEMORY(NODE) \
>         SSA_NAME_CHECK (NODE)->base.deprecated_flag
>
> ..., may I ask you to please help review the following patch
> (full-quoted)?
>
> For context: this patch here ("second patch") depends on a first patch:
> <inbox.sourceware.org/d0e6013f-ca38-b98d-dc01-b30adbd5901a@siemens.com>
> "[PATCH, OpenACC 2.7] readonly modifier support in front-ends".  That one
> is still under review/rework; so you're not able to apply this second
> patch here.
>
> In a nutshell: a 'readonly' modifier has been added to the OpenACC
> 'copyin' clause (copy host to device memory, don't copy back at end of
> region):
>
> | If the optional 'readonly' modifier appears, then the implementation may assume that the data
> | referenced by _var-list_ is never written to within the applicable region.
>
> That is, for example (untested):
>
>     #pragma acc routine
>     void escape(int *);
>
>     int x[32] = [...];
>     #pragma acc parallel copyin(readonly: x)
>     {
>       int a1 = x[3];
>       escape(x);
>       int a2 = x[3]; // Per 'readonly', don't need to reload 'x[3]' here.
>       //x[22] = 0; // Invalid -- but no diagnostic mandated.
>     }
>
> What Chung-Lin's first patch does is mark the OMP clause for 'x' (not the
> 'x' decl itself!) as 'readonly', via a new 'OMP_CLAUSE_MAP_READONLY'
> flag.
>
> The actual optimization then is done in this second patch.  Chung-Lin
> found that he could use 'SSA_NAME_POINTS_TO_READONLY_MEMORY' for that.
> I don't have much experience with most of the following generic code, so
> would appreciate a helping hand, whether that conceptually makes sense as
> well as from the implementation point of view:

No, I don't think you can use that flag on non-default-defs, nor
preserve it on copying.  So
it also doesn't nicely extend to DECLs as done by the patch.  We
currently _only_ use it
for incoming parameters.  When used on arbitrary code you can get to for example

ptr1(points-to-readony-memory) = &p->x;
... access via ptr1 ...
ptr2 = &p->x;
... access via ptr2 ...

where both are your OMP regions differently constrained (the constrain is on the
code in the region, _not_ on the actual protections of the pointed to
data, much like
for the fortran case).  But now CSE comes along and happily replaces all ptr2
with ptr2 in the second region and ... oops!

So no, re-using SSA_NAME_POINTS_TO_READONLY_MEMORY doesn't look good.

Richard.

> On 2023-07-25T23:52:06+0800, Chung-Lin Tang via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> > On 2023/7/11 2:33 AM, Chung-Lin Tang via Gcc-patches wrote:
> >> As we discussed earlier, the work for actually linking this to middle-end
> >> points-to analysis is a somewhat non-trivial issue. This first patch allows
> >> the language feature to be used in OpenACC directives first (with no effect for now).
> >> The middle-end changes are probably going to be a later patch.
> >
> > This second patch tries to link the readonly modifier to points-to analysis.
> >
> > There already exists SSA_NAME_POINTS_TO_READONLY_MEMORY and it's support in the
> > alias oracle routines in tree-ssa-alias.cc, so basically what this patch does is
> > try to make the variables holding the array section base pointers to have this
> > flag set.
> >
> > There is an another OMP_CLAUSE_MAP_POINTS_TO_READONLY set by front-ends on the
> > associated pointer clauses if OMP_CLAUSE_MAP_READONLY is set.
> > Also a DECL_POINTS_TO_READONLY flag is set for VAR_DECLs when creating the tmp
> > vars carrying these receiver references on the offloaded side. These
> > eventually get translated to SSA_NAME_POINTS_TO_READONLY_MEMORY.
>
>
> > This still doesn't always work as expected in terms of optimization:
> > struct pointer fields and Fortran arrays (kind of like C structs) which have
> > several accesses to create the pointer access on the receive/offloaded side,
> > and SRA appears to not work on these sequences, so gets in the way of much
> > redundancy elimination.
>
> I understand correctly that this is left as future work?  Please add the test
> cases you have, XFAILed in some reasonable way.
>
>
> > Currently have one testcase where we can demonstrate 'readonly' can avoid
> > a clobber by function call.
>
> :-)
>
>
> > --- a/gcc/c/c-typeck.cc
> > +++ b/gcc/c/c-typeck.cc
> > @@ -14258,6 +14258,8 @@ handle_omp_array_sections (tree c, enum c_omp_region_type ort)
> >       OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_ATTACH_DETACH);
> >        else
> >       OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_FIRSTPRIVATE_POINTER);
> > +      if (OMP_CLAUSE_MAP_READONLY (c))
> > +     OMP_CLAUSE_MAP_POINTS_TO_READONLY (c2) = 1;
> >        OMP_CLAUSE_MAP_IMPLICIT (c2) = OMP_CLAUSE_MAP_IMPLICIT (c);
> >        if (OMP_CLAUSE_MAP_KIND (c2) != GOMP_MAP_FIRSTPRIVATE_POINTER
> >         && !c_mark_addressable (t))
>
> > --- a/gcc/cp/semantics.cc
> > +++ b/gcc/cp/semantics.cc
> > @@ -5872,6 +5872,8 @@ handle_omp_array_sections (tree c, enum c_omp_region_type ort)
> >           }
> >         else
> >           OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_FIRSTPRIVATE_POINTER);
> > +       if (OMP_CLAUSE_MAP_READONLY (c))
> > +         OMP_CLAUSE_MAP_POINTS_TO_READONLY (c2) = 1;
> >         OMP_CLAUSE_MAP_IMPLICIT (c2) = OMP_CLAUSE_MAP_IMPLICIT (c);
> >         if (OMP_CLAUSE_MAP_KIND (c2) != GOMP_MAP_FIRSTPRIVATE_POINTER
> >             && !cxx_mark_addressable (t))
>
> > --- a/gcc/fortran/trans-openmp.cc
> > +++ b/gcc/fortran/trans-openmp.cc
> > @@ -2524,6 +2524,8 @@ gfc_trans_omp_array_section (stmtblock_t *block, gfc_exec_op op,
> >        node3 = build_omp_clause (input_location, OMP_CLAUSE_MAP);
> >        OMP_CLAUSE_SET_MAP_KIND (node3, ptr_kind);
> >        OMP_CLAUSE_DECL (node3) = gfc_conv_descriptor_data_get (decl);
> > +      if (n->u.readonly)
> > +     OMP_CLAUSE_MAP_POINTS_TO_READONLY (node3) = 1;
> >        /* This purposely does not include GOMP_MAP_ALWAYS_POINTER.  The extra
> >        cast prevents gimplify.cc from recognising it as being part of the
> >        struct - and adding an 'alloc: for the 'desc.data' pointer, which
> > @@ -2559,6 +2561,8 @@ gfc_trans_omp_array_section (stmtblock_t *block, gfc_exec_op op,
> >                               OMP_CLAUSE_MAP);
> >        OMP_CLAUSE_SET_MAP_KIND (node3, ptr_kind);
> >        OMP_CLAUSE_DECL (node3) = decl;
> > +      if (n->u.readonly)
> > +     OMP_CLAUSE_MAP_POINTS_TO_READONLY (node3) = 1;
> >      }
>
> Could combine these two into one, after
> 'if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (decl)))' reconverges here, like
> where 'OMP_CLAUSE_SIZE (node3)' is set:
>
> >    ptr2 = fold_convert (ptrdiff_type_node, ptr2);
> >    OMP_CLAUSE_SIZE (node3) = fold_build2 (MINUS_EXPR, ptrdiff_type_node,
>
> Is 'n->u.readonly == OMP_CLAUSE_MAP_READONLY (node)'?  If yes, would the
> latter be clearer to use as the 'if' expression (like in C, C++ front
> ends)?
>
> I see further additional 'OMP_CLAUSE_MAP' clauses synthesized, for
> example in 'gcc/cp/semantics.cc:handle_omp_array_sections', or
> 'gcc/fortran/trans-openmp.cc:gfc_trans_omp_array_section', also
> 'gcc/gimplify.cc'.  I assume these are not relevant to have
> 'OMP_CLAUSE_MAP_READONLY' -> 'OMP_CLAUSE_MAP_POINTS_TO_READONLY'
> propagated?  Actually, per your changes (see below), there is one
> 'OMP_CLAUSE_MAP_POINTS_TO_READONLY' propagation in
> 'gcc/gimplify.cc:build_omp_struct_comp_nodes'.
>
> Is the current situation re flag setting/propagation what was empirically
> necessary to make the test case work, or is it a systematic review?  (The
> former is fine; I'd just like to know.)
>
> > --- a/gcc/gimple-expr.cc
> > +++ b/gcc/gimple-expr.cc
> > @@ -376,6 +376,8 @@ copy_var_decl (tree var, tree name, tree type)
> >    DECL_CONTEXT (copy) = DECL_CONTEXT (var);
> >    TREE_USED (copy) = 1;
> >    DECL_SEEN_IN_BIND_EXPR_P (copy) = 1;
> > +  if (VAR_P (var))
> > +    DECL_POINTS_TO_READONLY (copy) = DECL_POINTS_TO_READONLY (var);
> >    DECL_ATTRIBUTES (copy) = DECL_ATTRIBUTES (var);
> >    if (DECL_USER_ALIGN (var))
> >      {
>
> > --- a/gcc/gimplify.cc
> > +++ b/gcc/gimplify.cc
> > @@ -221,6 +221,7 @@ struct gimplify_omp_ctx
> >    splay_tree variables;
> >    hash_set<tree> *privatized_types;
> >    tree clauses;
> > +  hash_set<tree_operand_hash> *pt_readonly_ptrs;
> >    /* Iteration variables in an OMP_FOR.  */
> >    vec<tree> loop_iter_var;
> >    location_t location;
> > @@ -628,6 +629,15 @@ internal_get_tmp_var (tree val, gimple_seq *pre_p, gimple_seq *post_p,
> >    gimplify_expr (&val, pre_p, post_p, is_gimple_reg_rhs_or_call,
> >                fb_rvalue);
> >
> > +  bool pt_readonly = false;
> > +  if (gimplify_omp_ctxp && gimplify_omp_ctxp->pt_readonly_ptrs)
> > +    {
> > +      tree ptr = val;
> > +      if (TREE_CODE (ptr) == POINTER_PLUS_EXPR)
> > +     ptr = TREE_OPERAND (ptr, 0);
> > +      pt_readonly = gimplify_omp_ctxp->pt_readonly_ptrs->contains (ptr);
> > +    }
>
> 'POINTER_PLUS_EXPR' is the only special thing we may run into, here?
> (Generally, I prefer 'if', 'else if, [...], 'else gcc_unreachable ()'.)
>
> > +
> >    if (allow_ssa
> >        && gimplify_ctxp->into_ssa
> >        && is_gimple_reg_type (TREE_TYPE (val)))
> > @@ -639,9 +649,18 @@ internal_get_tmp_var (tree val, gimple_seq *pre_p, gimple_seq *post_p,
> >         if (name)
> >           SET_SSA_NAME_VAR_OR_IDENTIFIER (t, create_tmp_var_name (name));
> >       }
> > +      if (pt_readonly)
> > +     SSA_NAME_POINTS_TO_READONLY_MEMORY (t) = 1;
> >      }
> >    else
> > -    t = lookup_tmp_var (val, is_formal, not_gimple_reg);
> > +    {
> > +      t = lookup_tmp_var (val, is_formal, not_gimple_reg);
> > +      if (pt_readonly)
> > +     {
> > +       DECL_POINTS_TO_READONLY (t) = 1;
> > +       gimplify_omp_ctxp->pt_readonly_ptrs->add (t);
> > +     }
> > +    }
> >
> >    mod = build2 (INIT_EXPR, TREE_TYPE (t), t, unshare_expr (val));
> >
> > @@ -8906,6 +8925,8 @@ build_omp_struct_comp_nodes (enum tree_code code, tree grp_start, tree grp_end,
> >    OMP_CLAUSE_SET_MAP_KIND (c2, mkind);
> >    OMP_CLAUSE_DECL (c2) = unshare_expr (OMP_CLAUSE_DECL (grp_end));
> >    OMP_CLAUSE_CHAIN (c2) = NULL_TREE;
> > +  if (OMP_CLAUSE_MAP_POINTS_TO_READONLY (grp_end))
> > +    OMP_CLAUSE_MAP_POINTS_TO_READONLY (c2) = 1;
> >    tree grp_mid = NULL_TREE;
> >    if (OMP_CLAUSE_CHAIN (grp_start) != grp_end)
> >      grp_mid = OMP_CLAUSE_CHAIN (grp_start);
>
> For my understanding, is this empirically necessary, or a systematic
> review?
>
> > @@ -11741,6 +11762,16 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
> >
> >             gimplify_omp_ctxp = outer_ctx;
> >           }
> > +       else if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
> > +                && (code == OACC_PARALLEL
> > +                    || code == OACC_KERNELS
> > +                    || code == OACC_SERIAL)
> > +                && OMP_CLAUSE_MAP_POINTS_TO_READONLY (c))
> > +         {
> > +           if (ctx->pt_readonly_ptrs == NULL)
> > +             ctx->pt_readonly_ptrs = new hash_set<tree_operand_hash> ();
> > +           ctx->pt_readonly_ptrs->add (OMP_CLAUSE_DECL (c));
> > +         }
> >         if (notice_outer)
> >           goto do_notice;
> >         break;
>
> Also need to 'delete ctx->pt_readonly_ptrs;' somewhere.
>
> > --- a/gcc/omp-low.cc
> > +++ b/gcc/omp-low.cc
> > @@ -14098,6 +14098,8 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
> >               if (ref_to_array)
> >                 x = fold_convert_loc (clause_loc, TREE_TYPE (new_var), x);
> >               gimplify_expr (&x, &new_body, NULL, is_gimple_val, fb_rvalue);
> > +             if (OMP_CLAUSE_MAP_POINTS_TO_READONLY (c) && VAR_P (x))
> > +               DECL_POINTS_TO_READONLY (x) = 1;
> >               if ((is_ref && !ref_to_array)
> >                   || ref_to_ptr)
> >                 {
>
> This is in the middle of the
> "Handle GOMP_MAP_FIRSTPRIVATE_{POINTER,REFERENCE} in second pass" code
> block.  Again, for my understanding, is this empirically necessary, or a
> systematic review?
>
> > --- a/gcc/testsuite/c-c++-common/goacc/readonly-1.c
> > +++ b/gcc/testsuite/c-c++-common/goacc/readonly-1.c
> > @@ -19,8 +19,8 @@ int main (void)
> >    return 0;
> >  }
> >
> > -/* { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(readonly,to:\\*s.ptr \\\[len: 64\\\]\\) .+ map\\(readonly,to:x\\\[0\\\] \\\[len: 128\\\]\\)" 1 "original" { target { c } } } } */
> > -/* { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(readonly,to:\\*NON_LVALUE_EXPR <s.ptr> \\\[len: 64\\\]\\) .+ map\\(readonly,to:x\\\[0\\\] \\\[len: 128\\\]\\)" 1 "original" { target { c++ } } } } */
> > +/* { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(readonly,to:\\*s.ptr \\\[len: 64\\\]\\) map\\(pt_readonly,attach_detach:s.ptr \\\[bias: 0\\\]\\) map\\(readonly,to:x\\\[0\\\] \\\[len: 128\\\]\\) map\\(pt_readonly,firstprivate:x \\\[pointer assign, bias: 0\\\]\\)" 1 "original" { target { c } } } } */
> > +/* { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(readonly,to:\\*NON_LVALUE_EXPR <s.ptr> \\\[len: 64\\\]\\) map\\(pt_readonly,attach_detach:s.ptr \\\[bias: 0\\\]\\) map\\(readonly,to:x\\\[0\\\] \\\[len: 128\\\]\\) map\\(pt_readonly,firstprivate:x \\\[pointer assign, bias: 0\\\]\\)" 1 "original" { target { c++ } } } } */
> >  /* { dg-final { scan-tree-dump-times "(?n)#pragma acc cache \\(readonly:x\\\[0\\\] \\\[len: 128\\\]\\);$" 1 "original" } } */
>
> I suppose the new 'map(pt_readonly,attach_detach:s.ptr [bias: 0])' clause
> was previously "hidden" in '.+'?  Please then change that in the first
> patch "[PATCH, OpenACC 2.7] readonly modifier support in front-ends", so
> that we can see here what actually is changing (only 'pt_readonly', I
> suppose).
>
> > --- /dev/null
> > +++ b/gcc/testsuite/c-c++-common/goacc/readonly-2.c
> > @@ -0,0 +1,15 @@
> > +/* { dg-additional-options "-O -fdump-tree-fre" } */
> > +
> > +#pragma acc routine
> > +extern void foo (int *ptr, int val);
> > +
> > +int main (void)
> > +{
> > +  int r, a[32];
> > +  #pragma acc parallel copyin(readonly: a[:32]) copyout(r)
> > +  {
> > +    foo (a, a[8]);
> > +    r = a[8];
> > +  }
> > +}
> > +/* { dg-final { scan-tree-dump-times "r\.\[_0-9\]+ = MEM\\\[\[^_\]+_\[0-9\]+\\(ptro\\)\\\]\\\[8\\\];" 1 "fre1" } } */
>
> Please add a comment why 'fre1', and what generally is being checked
> here; that's not obvious to the casual reader.  (That is, me in a few
> weeks.)  ;-)
>
> Also add a scan for "before the optimization": two 'MEM's, I suppose?
>
> > --- a/gcc/testsuite/gfortran.dg/goacc/readonly-1.f90
> > +++ b/gcc/testsuite/gfortran.dg/goacc/readonly-1.f90
> > @@ -20,8 +20,8 @@ program main
> >    !$acc end parallel
> >  end program main
> >
> > -! { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(readonly,to:\\*\\(integer\\(kind=4\\)\\\[0:\\\] \\*\\) parm.*data \\\[len: .+\\\]\\) .+ map\\(readonly,to:\\*\\(integer\\(kind=4\\)\\\[0:\\\] \\*\\) parm.*data \\\[len: .+\\\]\\)" 1 "original" } }
> > -! { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(readonly,to:a\\\[\\(\\(integer\\(kind=8\\)\\) parm.*data - \\(integer\\(kind=8\\)\\) &a\\) / 4\\\] \\\[len: .+\\\]\\) .+ map\\(readonly,to:b\\\[\\(\\(integer\\(kind=8\\)\\) parm.*data - \\(integer\\(kind=8\\)\\) &b\\) / 4\\\] \\\[len: .+\\\]\\)" 1 "original" } }
> > +! { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(readonly,to:\\*\\(integer\\(kind=4\\)\\\[0:\\\] \\*\\) parm.*data \\\[len: .+\\\]\\) map\\(pt_readonly,alloc:a.0 \\\[pointer assign, bias: \\(integer\\(kind=8\\)\\) parm.*data - \\(integer\\(kind=8\\)\\) a.0\\\]\\) map\\(readonly,to:\\*\\(integer\\(kind=4\\)\\\[0:\\\] \\*\\) parm.*data \\\[len: .+\\\]\\) map\\(pt_readonly,alloc:b \\\[pointer assign, bias: \\(integer\\(kind=8\\)\\) parm.*data - \\(integer\\(kind=8\\)\\) b\\\]\\)" 1 "original" } }
> > +! { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(readonly,to:a\\\[\\(\\(integer\\(kind=8\\)\\) parm.*data - \\(integer\\(kind=8\\)\\) &a\\) / 4\\\] \\\[len: .+\\\]\\) map\\(pt_readonly,alloc:a \\\[pointer assign, bias: \\(integer\\(kind=8\\)\\) parm.*data - \\(integer\\(kind=8\\)\\) &a\\\]\\) map\\(readonly,to:b\\\[\\(\\(integer\\(kind=8\\)\\) parm.*data - \\(integer\\(kind=8\\)\\) &b\\) / 4\\\] \\\[len: .+\\\]\\) map\\(pt_readonly,alloc:b \\\[pointer assign, bias: \\(integer\\(kind=8\\)\\) parm.*data - \\(integer\\(kind=8\\)\\) &b\\\]\\)" 1 "original" } }
> >  ! { dg-final { scan-tree-dump-times "(?n)#pragma acc cache \\(readonly:\\*\\(integer\\(kind=4\\)\\\[0:\\\] \\*\\) parm.*data \\\[len: .+\\\]\\) \\(readonly:\\*\\(integer\\(kind=4\\)\\\[0:\\\] \\*\\) parm.*data \\\[len: .+\\\]\\);" 2 "original" } }
>
> Same comment as for 'c-c++-common/goacc/readonly-1.c'.
>
> > --- a/gcc/tree-pretty-print.cc
> > +++ b/gcc/tree-pretty-print.cc
> > @@ -907,6 +907,8 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, dump_flags_t flags)
> >        pp_string (pp, "map(");
> >        if (OMP_CLAUSE_MAP_READONLY (clause))
> >       pp_string (pp, "readonly,");
> > +      if (OMP_CLAUSE_MAP_POINTS_TO_READONLY (clause))
> > +     pp_string (pp, "pt_readonly,");
> >        switch (OMP_CLAUSE_MAP_KIND (clause))
> >       {
> >       case GOMP_MAP_ALLOC:
> > @@ -3436,6 +3438,8 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
> >       pp_string (pp, "(D)");
> >        if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (node))
> >       pp_string (pp, "(ab)");
> > +      if (SSA_NAME_POINTS_TO_READONLY_MEMORY (node))
> > +     pp_string (pp, "(ptro)");
> >        break;
> >
> >      case WITH_SIZE_EXPR:
>
> > --- a/gcc/tree-ssanames.cc
> > +++ b/gcc/tree-ssanames.cc
> > @@ -402,6 +402,9 @@ make_ssa_name_fn (struct function *fn, tree var, gimple *stmt,
> >    else
> >      SSA_NAME_RANGE_INFO (t) = NULL;
> >
> > +  if (VAR_P (var) && DECL_POINTS_TO_READONLY (var))
> > +    SSA_NAME_POINTS_TO_READONLY_MEMORY (t) = 1;
> > +
> >    SSA_NAME_IN_FREE_LIST (t) = 0;
> >    SSA_NAME_IS_DEFAULT_DEF (t) = 0;
> >    init_ssa_name_imm_use (t);
>
> > --- a/gcc/tree.h
> > +++ b/gcc/tree.h
> > @@ -1021,6 +1021,13 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int,
> >  #define DECL_HIDDEN_STRING_LENGTH(NODE) \
> >    (TREE_CHECK (NODE, PARM_DECL)->decl_common.decl_nonshareable_flag)
> >
> > +/* In a VAR_DECL, set for variables regarded as pointing to memory not written
> > +   to. SSA_NAME_POINTS_TO_READONLY_MEMORY gets set for SSA_NAMEs created from
> > +   such VAR_DECLs. Currently used by OpenACC 'readonly' modifier in copyin
> > +   clauses.  */
> > +#define DECL_POINTS_TO_READONLY(NODE) \
> > +  (TREE_CHECK (NODE, VAR_DECL)->decl_common.decl_not_flexarray)
> > +
> >  /* In a CALL_EXPR, means that the call is the jump from a thunk to the
> >     thunked-to function.  Be careful to avoid using this macro when one of the
> >     next two applies instead.  */
> > @@ -1815,6 +1822,10 @@ class auto_suppress_location_wrappers
> >  #define OMP_CLAUSE_MAP_READONLY(NODE) \
> >    TREE_READONLY (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP))
> >
> > +/* Set if 'OMP_CLAUSE_DECL (NODE)' points to read-only memory.  */
> > +#define OMP_CLAUSE_MAP_POINTS_TO_READONLY(NODE) \
> > +  TREE_CONSTANT (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP))
> > +
> >  /* Same as above, for use in OpenACC cache directives.  */
> >  #define OMP_CLAUSE__CACHE__READONLY(NODE) \
> >    TREE_READONLY (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__CACHE_))
>
> As in my "[PATCH, OpenACC 2.7] readonly modifier support in front-ends"
> review, please document how certain flags are used for OMP clauses.
>
>
> I note you're not actually using 'OMP_CLAUSE__CACHE__READONLY' anywhere
> -- but that's OK given the current 'gcc/gimplify.cc:gimplify_oacc_cache'.
> ;-)
>
>
> Grüße
>  Thomas
> -----------------
> Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
Chung-Lin Tang April 3, 2024, 11:50 a.m. UTC | #3
Hi Richard, Thomas,

On 2023/10/30 8:46 PM, Richard Biener wrote:
>>
>> What Chung-Lin's first patch does is mark the OMP clause for 'x' (not the
>> 'x' decl itself!) as 'readonly', via a new 'OMP_CLAUSE_MAP_READONLY'
>> flag.
>>
>> The actual optimization then is done in this second patch.  Chung-Lin
>> found that he could use 'SSA_NAME_POINTS_TO_READONLY_MEMORY' for that.
>> I don't have much experience with most of the following generic code, so
>> would appreciate a helping hand, whether that conceptually makes sense as
>> well as from the implementation point of view:

First of all, I have removed all of the gimplify-stage scanning and setting of
DECL_POINTS_TO_READONLY and SSA_NAME_POINTS_TO_READONLY_MEMORY (so no changes to
gimplify.cc now)

I remember this code was an artifact of earlier attempts to allow struct-member
pointer mappings to also work (e.g. map(readonly:rec.ptr[:N])), but failed anyways.
I think the omp_data_* member accesses when building child function side
receiver_refs is blocking points-to analysis from working (didn't try digging deeper)

Also during gimplify, VAR_DECLs appeared to be reused (at least in some cases) for map
clause decl reference building, so hoping that the variables "happen to be" single-use and
DECL_POINTS_TO_READONLY relaying into SSA_NAME_POINTS_TO_READONLY_MEMORY does appear to be
a little risky.

However, for firstprivate pointers processed during omp-low, it appears to be somewhat different.
(see below description)

> No, I don't think you can use that flag on non-default-defs, nor
> preserve it on copying.  So
> it also doesn't nicely extend to DECLs as done by the patch.  We
> currently _only_ use it
> for incoming parameters.  When used on arbitrary code you can get to for example
> 
> ptr1(points-to-readony-memory) = &p->x;
> ... access via ptr1 ...
> ptr2 = &p->x;
> ... access via ptr2 ...
> 
> where both are your OMP regions differently constrained (the constrain is on the
> code in the region, _not_ on the actual protections of the pointed to
> data, much like
> for the fortran case).  But now CSE comes along and happily replaces all ptr2
> with ptr2 in the second region and ... oops!

Richard, I assume what you meant was "happily replaces all ptr2 with ptr1 in the second region"?

That doesn't happen, because during omp-lower/expand, OMP target regions (which is all that
this applies currently) is separated into different individual child functions.

(Currently, the only "effective" use of DECL_POINTS_TO_READONLY is during omp-lower, when
for firstprivate pointers (i.e. 'a' here) we set this bit when constructing the first load
of this pointer)

  #pragma acc parallel copyin(readonly: a[:32]) copyout(r)
  {
    foo (a, a[8]);
    r = a[8];
  }
  #pragma acc parallel copyin(readonly: a[:32]) copyout(r)
  {
    foo (a, a[12]);
    r = a[12];
  }

After omp-expand (before SSA):

__attribute__((oacc parallel, omp target entrypoint, noclone))
void main._omp_fn.1 (const struct .omp_data_t.3 & restrict .omp_data_i)
{
 ...
  <bb 5> :
  D.2962 = .omp_data_i->D.2947;
  a.8 = D.2962;
  r.1 = (*a.8)[12];
  foo (a.8, r.1);
  r.1 = (*a.8)[12];
  D.2965 = .omp_data_i->r;
  *D.2965 = r.1;
  return;
}

__attribute__((oacc parallel, omp target entrypoint, noclone))
void main._omp_fn.0 (const struct .omp_data_t.2 & restrict .omp_data_i)
{
  ...
  <bb 3> :
  D.2968 = .omp_data_i->D.2939;
  a.4 = D.2968;
  r.0 = (*a.4)[8];
  foo (a.4, r.0);
  r.0 = (*a.4)[8];
  D.2971 = .omp_data_i->r;
  *D.2971 = r.0;
  return;
}

So actually, the creating of DECL_POINTS_TO_READONLY and its relaying to
SSA_NAME_POINTS_TO_READONLY_MEMORY here, is actually quite similar to a default-def
for an PARM_DECL, at least conceptually.

(If offloading was structured significantly differently, say if child functions
were separated much earlier before omp-lowering, than this readonly-modifier might
possibly be a direct application of 'r' in the "fn spec" attribute)

Other changes since first version of patch include:
1) update of C/C++ FE changes to new style in c-family/c-omp.cc
2) merging of two if cases in fortran/trans-openmp.cc like Thomas suggested
3) Update of readonly-2.c testcase to scan before/after "fre1" pass, to verify removal of a MEM load, also as Thomas suggested.

I have re-tested this patch using mainline, with no regressions. Is this okay for mainline?

Thanks,
Chung-Lin

2024-04-03  Chung-Lin Tang  <cltang@baylibre.com>

gcc/c-family/ChangeLog:

	* c-omp.cc (c_omp_address_inspector::expand_array_base):
	Set OMP_CLAUSE_MAP_POINTS_TO_READONLY on pointer clause.
	(c_omp_address_inspector::expand_component_selector): Likewise.

gcc/fortran/ChangeLog:

	* trans-openmp.cc (gfc_trans_omp_array_section):
	Set OMP_CLAUSE_MAP_POINTS_TO_READONLY on pointer clause.

gcc/ChangeLog:

	* gimple-expr.cc (copy_var_decl): Copy DECL_POINTS_TO_READONLY
	for VAR_DECLs.
	* omp-low.cc (lower_omp_target): Set DECL_POINTS_TO_READONLY for
	variables of receiver refs.
	* tree-pretty-print.cc (dump_omp_clause):
	Print OMP_CLAUSE_MAP_POINTS_TO_READONLY.
	(dump_generic_node): Print SSA_NAME_POINTS_TO_READONLY_MEMORY.
	* tree-ssanames.cc (make_ssa_name_fn): Set
	SSA_NAME_POINTS_TO_READONLY_MEMORY if DECL_POINTS_TO_READONLY is set.
	* tree.h (DECL_POINTS_TO_READONLY): New macro.
	(OMP_CLAUSE_MAP_POINTS_TO_READONLY): New macro.

gcc/testsuite/ChangeLog:

	* c-c++-common/goacc/readonly-1.c: Adjust testcase.
	* c-c++-common/goacc/readonly-2.c: New testcase.
	* gfortran.dg/goacc/readonly-1.f90: Adjust testcase.
diff --git a/gcc/c-family/c-omp.cc b/gcc/c-family/c-omp.cc
index c0e02aa422f..458df1434ed 100644
--- a/gcc/c-family/c-omp.cc
+++ b/gcc/c-family/c-omp.cc
@@ -3907,6 +3907,8 @@ c_omp_address_inspector::expand_array_base (tree c,
     }
   else if (c2)
     {
+      if (OMP_CLAUSE_MAP_READONLY (c))
+	OMP_CLAUSE_MAP_POINTS_TO_READONLY (c2) = 1;
       OMP_CLAUSE_CHAIN (c2) = OMP_CLAUSE_CHAIN (c);
       OMP_CLAUSE_CHAIN (c) = c2;
       if (implicit_p)
@@ -4051,6 +4053,8 @@ c_omp_address_inspector::expand_component_selector (tree c,
     }
   else if (c2)
     {
+      if (OMP_CLAUSE_MAP_READONLY (c))
+	OMP_CLAUSE_MAP_POINTS_TO_READONLY (c2) = 1;
       OMP_CLAUSE_CHAIN (c2) = OMP_CLAUSE_CHAIN (c);
       OMP_CLAUSE_CHAIN (c) = c2;
       c = c2;
diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index f867e2240bf..1b4bdb90cb6 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -2561,6 +2561,8 @@ gfc_trans_omp_array_section (stmtblock_t *block, gfc_exec_op op,
   ptr2 = fold_convert (ptrdiff_type_node, ptr2);
   OMP_CLAUSE_SIZE (node3) = fold_build2 (MINUS_EXPR, ptrdiff_type_node,
 					 ptr, ptr2);
+  if (n->u.map.readonly)
+    OMP_CLAUSE_MAP_POINTS_TO_READONLY (node3) = 1;
 }
 
 static tree
diff --git a/gcc/gimple-expr.cc b/gcc/gimple-expr.cc
index f8d7185530c..35aca9dc979 100644
--- a/gcc/gimple-expr.cc
+++ b/gcc/gimple-expr.cc
@@ -385,6 +385,8 @@ copy_var_decl (tree var, tree name, tree type)
   DECL_CONTEXT (copy) = DECL_CONTEXT (var);
   TREE_USED (copy) = 1;
   DECL_SEEN_IN_BIND_EXPR_P (copy) = 1;
+  if (VAR_P (var))
+    DECL_POINTS_TO_READONLY (copy) = DECL_POINTS_TO_READONLY (var);
   DECL_ATTRIBUTES (copy) = DECL_ATTRIBUTES (var);
   if (DECL_USER_ALIGN (var))
     {
diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index 4d003f42098..3c1024d563a 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -14003,6 +14003,8 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 		if (ref_to_array)
 		  x = fold_convert_loc (clause_loc, TREE_TYPE (new_var), x);
 		gimplify_expr (&x, &new_body, NULL, is_gimple_val, fb_rvalue);
+		if (OMP_CLAUSE_MAP_POINTS_TO_READONLY (c) && VAR_P (x))
+		  DECL_POINTS_TO_READONLY (x) = 1;
 		if ((is_ref && !ref_to_array)
 		    || ref_to_ptr)
 		  {
diff --git a/gcc/testsuite/c-c++-common/goacc/readonly-1.c b/gcc/testsuite/c-c++-common/goacc/readonly-1.c
index 300464c92e3..88b6bb9efcf 100644
--- a/gcc/testsuite/c-c++-common/goacc/readonly-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/readonly-1.c
@@ -48,17 +48,17 @@ int main (void)
 
 /* { dg-final { scan-tree-dump-times "(?n)#pragma acc declare map\\(to:y\\) map\\(readonly,to:s\\) map\\(readonly,to:x\\)" 1 "original" } } */
 
-/* { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:\\*s.ptr \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\)" 1 "original" { target { c } } } } */
-/* { dg-final { scan-tree-dump-times "(?n)#pragma acc kernels map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:\\*s.ptr \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\)" 1 "original" { target { c } } } } */
-/* { dg-final { scan-tree-dump-times "(?n)#pragma acc serial map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:\\*s.ptr \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\)" 1 "original" { target { c } } } } */
-/* { dg-final { scan-tree-dump-times "(?n)#pragma acc data map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) map\\(readonly,to:\\*s.ptr \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\)" 1 "original" { target { c } } } } */
-/* { dg-final { scan-tree-dump-times "(?n)#pragma acc enter data map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) map\\(readonly,to:\\*s.ptr \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\)" 1 "original" { target { c } } } } */
+/* { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:\\*s.ptr \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,attach_detach:s.ptr \\\[bias: 0\\\]\\) map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,firstprivate:x \\\[pointer assign, bias: 0\\\]\\)" 1 "original" { target { c } } } } */
+/* { dg-final { scan-tree-dump-times "(?n)#pragma acc kernels map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:\\*s.ptr \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,attach_detach:s.ptr \\\[bias: 0\\\]\\) map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,firstprivate:x \\\[pointer assign, bias: 0\\\]\\)" 1 "original" { target { c } } } } */
+/* { dg-final { scan-tree-dump-times "(?n)#pragma acc serial map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:\\*s.ptr \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,attach_detach:s.ptr \\\[bias: 0\\\]\\) map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,firstprivate:x \\\[pointer assign, bias: 0\\\]\\)" 1 "original" { target { c } } } } */
+/* { dg-final { scan-tree-dump-times "(?n)#pragma acc data map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) map\\(readonly,to:\\*s.ptr \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,attach_detach:s.ptr \\\[bias: 0\\\]\\) map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\)" 1 "original" { target { c } } } } */
+/* { dg-final { scan-tree-dump-times "(?n)#pragma acc enter data map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) map\\(readonly,to:\\*s.ptr \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,attach_detach:s.ptr \\\[bias: 0\\\]\\) map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\)" 1 "original" { target { c } } } } */
 
-/* { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:\\*NON_LVALUE_EXPR <s.ptr> \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\)" 1 "original" { target { c++ } } } } */
-/* { dg-final { scan-tree-dump-times "(?n)#pragma acc kernels map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:\\*NON_LVALUE_EXPR <s.ptr> \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\)" 1 "original" { target { c++ } } } } */
-/* { dg-final { scan-tree-dump-times "(?n)#pragma acc serial map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:\\*NON_LVALUE_EXPR <s.ptr> \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\)" 1 "original" { target { c++ } } } } */
-/* { dg-final { scan-tree-dump-times "(?n)#pragma acc data map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) map\\(readonly,to:\\*NON_LVALUE_EXPR <s.ptr> \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\)" 1 "original" { target { c++ } } } } */
-/* { dg-final { scan-tree-dump-times "(?n)#pragma acc enter data map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) map\\(readonly,to:\\*NON_LVALUE_EXPR <s.ptr> \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\)" 1 "original" { target { c++ } } } } */
+/* { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:\\*NON_LVALUE_EXPR <s.ptr> \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,attach_detach:s.ptr \\\[bias: 0\\\]\\) map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,firstprivate:x \\\[pointer assign, bias: 0\\\]\\)" 1 "original" { target { c++ } } } } */
+/* { dg-final { scan-tree-dump-times "(?n)#pragma acc kernels map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:\\*NON_LVALUE_EXPR <s.ptr> \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,attach_detach:s.ptr \\\[bias: 0\\\]\\) map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,firstprivate:x \\\[pointer assign, bias: 0\\\]\\)" 1 "original" { target { c++ } } } } */
+/* { dg-final { scan-tree-dump-times "(?n)#pragma acc serial map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:\\*NON_LVALUE_EXPR <s.ptr> \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,attach_detach:s.ptr \\\[bias: 0\\\]\\) map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,firstprivate:x \\\[pointer assign, bias: 0\\\]\\)" 1 "original" { target { c++ } } } } */
+/* { dg-final { scan-tree-dump-times "(?n)#pragma acc data map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) map\\(readonly,to:\\*NON_LVALUE_EXPR <s.ptr> \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,attach_detach:s.ptr \\\[bias: 0\\\]\\) map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\)" 1 "original" { target { c++ } } } } */
+/* { dg-final { scan-tree-dump-times "(?n)#pragma acc enter data map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) map\\(readonly,to:\\*NON_LVALUE_EXPR <s.ptr> \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,attach_detach:s.ptr \\\[bias: 0\\\]\\) map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\)" 1 "original" { target { c++ } } } } */
 
 /* { dg-final { scan-tree-dump-times "(?n)#pragma acc cache \\(readonly:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\);$" 4 "original" } } */
 /* { dg-final { scan-tree-dump-times "(?n)#pragma acc cache \\(y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\);$" 4 "original" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/readonly-2.c b/gcc/testsuite/c-c++-common/goacc/readonly-2.c
new file mode 100644
index 00000000000..3f52a9f6afb
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/readonly-2.c
@@ -0,0 +1,16 @@
+/* { dg-additional-options "-O -fdump-tree-phiprop -fdump-tree-fre" } */
+
+#pragma acc routine
+extern void foo (int *ptr, int val);
+
+int main (void)
+{
+  int r, a[32];
+  #pragma acc parallel copyin(readonly: a[:32]) copyout(r)
+  {
+    foo (a, a[8]);
+    r = a[8];
+  }
+}
+/* { dg-final { scan-tree-dump-times "r\.\[_0-9\]+ = MEM\\\[\[^_\]+_\[0-9\]+\\(ptro\\)\\\]\\\[8\\\];" 2 "phiprop1" } } */
+/* { dg-final { scan-tree-dump-times "r\.\[_0-9\]+ = MEM\\\[\[^_\]+_\[0-9\]+\\(ptro\\)\\\]\\\[8\\\];" 1 "fre1" } } */
diff --git a/gcc/testsuite/gfortran.dg/goacc/readonly-1.f90 b/gcc/testsuite/gfortran.dg/goacc/readonly-1.f90
index fc1e2719e67..cad449e6d40 100644
--- a/gcc/testsuite/gfortran.dg/goacc/readonly-1.f90
+++ b/gcc/testsuite/gfortran.dg/goacc/readonly-1.f90
@@ -80,16 +80,16 @@ end program main
 ! The front end turns OpenACC 'declare' into OpenACC 'data'.
 !   { dg-final { scan-tree-dump-times "(?n)#pragma acc data map\\(readonly,to:\\*b\\) map\\(alloc:b.+ map\\(to:\\*c\\) map\\(alloc:c.+" 1 "original" } }
 !   { dg-final { scan-tree-dump-times "(?n)#pragma acc data map\\(readonly,to:g\\) map\\(to:h\\)" 1 "original" } }
-! { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(readonly,to:\\*.+ map\\(alloc:a.+ map\\(readonly,to:\\*.+ map\\(alloc:b.+ map\\(to:\\*.+ map\\(alloc:c.+" 1 "original" } }
-! { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(readonly,to:a.+ map\\(alloc:a.+ map\\(readonly,to:b.+ map\\(alloc:b.+ map\\(to:c.+ map\\(alloc:c.+" 1 "original" } }
-! { dg-final { scan-tree-dump-times "(?n)#pragma acc kernels map\\(readonly,to:\\*.+ map\\(alloc:a.+ map\\(readonly,to:\\*.+ map\\(alloc:b.+ map\\(to:\\*.+ map\\(alloc:c.+" 1 "original" } }
-! { dg-final { scan-tree-dump-times "(?n)#pragma acc kernels map\\(readonly,to:a.+ map\\(alloc:a.+ map\\(readonly,to:b.+ map\\(alloc:b.+ map\\(to:c.+ map\\(alloc:c.+" 1 "original" } }
-! { dg-final { scan-tree-dump-times "(?n)#pragma acc serial map\\(readonly,to:\\*.+ map\\(alloc:a.+ map\\(readonly,to:\\*.+ map\\(alloc:b.+ map\\(to:\\*.+ map\\(alloc:c.+" 1 "original" } }
-! { dg-final { scan-tree-dump-times "(?n)#pragma acc serial map\\(readonly,to:a.+ map\\(alloc:a.+ map\\(readonly,to:b.+ map\\(alloc:b.+ map\\(to:c.+ map\\(alloc:c.+" 1 "original" } }
-! { dg-final { scan-tree-dump-times "(?n)#pragma acc data map\\(readonly,to:\\*.+ map\\(alloc:a.+ map\\(readonly,to:\\*.+ map\\(alloc:b.+ map\\(to:\\*.+ map\\(alloc:c.+" 1 "original" } }
-! { dg-final { scan-tree-dump-times "(?n)#pragma acc data map\\(readonly,to:a.+ map\\(alloc:a.+ map\\(readonly,to:b.+ map\\(alloc:b.+ map\\(to:c.+ map\\(alloc:c.+" 1 "original" } }
-! { dg-final { scan-tree-dump-times "(?n)#pragma acc enter data map\\(readonly,to:\\*.+ map\\(alloc:a.+ map\\(readonly,to:\\*.+ map\\(alloc:b.+ map\\(to:\\*.+ map\\(alloc:c.+" 1 "original" } }
-! { dg-final { scan-tree-dump-times "(?n)#pragma acc enter data map\\(readonly,to:a.+ map\\(alloc:a.+ map\\(readonly,to:b.+ map\\(alloc:b.+ map\\(to:c.+ map\\(alloc:c.+" 1 "original" } }
+! { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(readonly,to:\\*.+ map\\(pt_readonly,alloc:a.+ map\\(readonly,to:\\*.+ map\\(pt_readonly,alloc:b.+ map\\(to:\\*.+ map\\(alloc:c.+" 1 "original" } }
+! { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(readonly,to:a.+ map\\(pt_readonly,alloc:a.+ map\\(readonly,to:b.+ map\\(pt_readonly,alloc:b.+ map\\(to:c.+ map\\(alloc:c.+" 1 "original" } }
+! { dg-final { scan-tree-dump-times "(?n)#pragma acc kernels map\\(readonly,to:\\*.+ map\\(pt_readonly,alloc:a.+ map\\(readonly,to:\\*.+ map\\(pt_readonly,alloc:b.+ map\\(to:\\*.+ map\\(alloc:c.+" 1 "original" } }
+! { dg-final { scan-tree-dump-times "(?n)#pragma acc kernels map\\(readonly,to:a.+ map\\(pt_readonly,alloc:a.+ map\\(readonly,to:b.+ map\\(pt_readonly,alloc:b.+ map\\(to:c.+ map\\(alloc:c.+" 1 "original" } }
+! { dg-final { scan-tree-dump-times "(?n)#pragma acc serial map\\(readonly,to:\\*.+ map\\(pt_readonly,alloc:a.+ map\\(readonly,to:\\*.+ map\\(pt_readonly,alloc:b.+ map\\(to:\\*.+ map\\(alloc:c.+" 1 "original" } }
+! { dg-final { scan-tree-dump-times "(?n)#pragma acc serial map\\(readonly,to:a.+ map\\(pt_readonly,alloc:a.+ map\\(readonly,to:b.+ map\\(pt_readonly,alloc:b.+ map\\(to:c.+ map\\(alloc:c.+" 1 "original" } }
+! { dg-final { scan-tree-dump-times "(?n)#pragma acc data map\\(readonly,to:\\*.+ map\\(pt_readonly,alloc:a.+ map\\(readonly,to:\\*.+ map\\(pt_readonly,alloc:b.+ map\\(to:\\*.+ map\\(alloc:c.+" 1 "original" } }
+! { dg-final { scan-tree-dump-times "(?n)#pragma acc data map\\(readonly,to:a.+ map\\(pt_readonly,alloc:a.+ map\\(readonly,to:b.+ map\\(pt_readonly,alloc:b.+ map\\(to:c.+ map\\(alloc:c.+" 1 "original" } }
+! { dg-final { scan-tree-dump-times "(?n)#pragma acc enter data map\\(readonly,to:\\*.+ map\\(pt_readonly,alloc:a.+ map\\(readonly,to:\\*.+ map\\(pt_readonly,alloc:b.+ map\\(to:\\*.+ map\\(alloc:c.+" 1 "original" } }
+! { dg-final { scan-tree-dump-times "(?n)#pragma acc enter data map\\(readonly,to:a.+ map\\(pt_readonly,alloc:a.+ map\\(readonly,to:b.+ map\\(pt_readonly,alloc:b.+ map\\(to:c.+ map\\(alloc:c.+" 1 "original" } }
 
 ! { dg-final { scan-tree-dump-times "(?n)#pragma acc cache \\(readonly:\\*\\(integer\\(kind=4\\)\\\[0:\\\] \\*\\) parm.*data \\\[len: .+\\\]\\) \\(readonly:\\*\\(integer\\(kind=4\\)\\\[0:\\\] \\*\\) parm.*data \\\[len: .+\\\]\\);" 8 "original" } }
 ! { dg-final { scan-tree-dump-times "(?n)#pragma acc cache \\(\\*\\(integer\\(kind=4\\)\\\[0:\\\] \\*\\) parm.*data \\\[len: .+\\\]\\);" 8 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/pr67170.f90 b/gcc/testsuite/gfortran.dg/pr67170.f90
index 80236470f42..d7c33a4c3db 100644
--- a/gcc/testsuite/gfortran.dg/pr67170.f90
+++ b/gcc/testsuite/gfortran.dg/pr67170.f90
@@ -28,4 +28,4 @@ end subroutine foo
 end module test_module
 end program
 
-! { dg-final { scan-tree-dump-times "= \\*arg_\[0-9\]+\\(D\\);" 1 "fre1" } }
+! { dg-final { scan-tree-dump-times "= \\*arg_\[0-9\]+\\(D\\)\\(ptro\\);" 1 "fre1" } }
diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
index 926f7e006a7..62411a97ab9 100644
--- a/gcc/tree-pretty-print.cc
+++ b/gcc/tree-pretty-print.cc
@@ -915,6 +915,8 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, dump_flags_t flags)
       pp_string (pp, "map(");
       if (OMP_CLAUSE_MAP_READONLY (clause))
 	pp_string (pp, "readonly,");
+      if (OMP_CLAUSE_MAP_POINTS_TO_READONLY (clause))
+	pp_string (pp, "pt_readonly,");
       switch (OMP_CLAUSE_MAP_KIND (clause))
 	{
 	case GOMP_MAP_ALLOC:
@@ -3620,6 +3622,8 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
 	pp_string (pp, "(D)");
       if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (node))
 	pp_string (pp, "(ab)");
+      if (SSA_NAME_POINTS_TO_READONLY_MEMORY (node))
+	pp_string (pp, "(ptro)");
       break;
 
     case WITH_SIZE_EXPR:
diff --git a/gcc/tree-ssanames.cc b/gcc/tree-ssanames.cc
index 1753a421a0b..cbdf4b11769 100644
--- a/gcc/tree-ssanames.cc
+++ b/gcc/tree-ssanames.cc
@@ -402,6 +402,9 @@ make_ssa_name_fn (struct function *fn, tree var, gimple *stmt,
   else
     SSA_NAME_RANGE_INFO (t) = NULL;
 
+  if (VAR_P (var) && DECL_POINTS_TO_READONLY (var))
+    SSA_NAME_POINTS_TO_READONLY_MEMORY (t) = 1;
+
   SSA_NAME_IN_FREE_LIST (t) = 0;
   SSA_NAME_IS_DEFAULT_DEF (t) = 0;
   init_ssa_name_imm_use (t);
diff --git a/gcc/tree.h b/gcc/tree.h
index b67a37d6522..1c5b883bc82 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1036,6 +1036,13 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int,
 #define DECL_HIDDEN_STRING_LENGTH(NODE) \
   (TREE_CHECK (NODE, PARM_DECL)->decl_common.decl_nonshareable_flag)
 
+/* In a VAR_DECL, set for variables regarded as pointing to memory not written
+   to. SSA_NAME_POINTS_TO_READONLY_MEMORY gets set for SSA_NAMEs created from
+   such VAR_DECLs. Currently used by OpenACC 'readonly' modifier in copyin
+   clauses.  */
+#define DECL_POINTS_TO_READONLY(NODE) \
+  (TREE_CHECK (NODE, VAR_DECL)->decl_common.decl_not_flexarray)
+
 /* In a CALL_EXPR, means that the call is the jump from a thunk to the
    thunked-to function.  Be careful to avoid using this macro when one of the
    next two applies instead.  */
@@ -1845,6 +1852,10 @@ class auto_suppress_location_wrappers
 #define OMP_CLAUSE_MAP_READONLY(NODE) \
   TREE_READONLY (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP))
 
+/* Set if 'OMP_CLAUSE_DECL (NODE)' points to read-only memory.  */
+#define OMP_CLAUSE_MAP_POINTS_TO_READONLY(NODE) \
+  TREE_CONSTANT (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP))
+
 /* Same as above, for use in OpenACC cache directives.  */
 #define OMP_CLAUSE__CACHE__READONLY(NODE) \
   TREE_READONLY (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__CACHE_))
Thomas Schwinge April 11, 2024, 2:29 p.m. UTC | #4
Hi Chung-Lin, Richard!

From me just a few mechanical pieces, see below.  Richard, are you able
to again comment on Chung-Lin's general strategy, as I'm not at all
familiar with those parts of the code?

On 2024-04-03T19:50:55+0800, Chung-Lin Tang <cltang@pllab.cs.nthu.edu.tw> wrote:
> On 2023/10/30 8:46 PM, Richard Biener wrote:
>>>
>>> What Chung-Lin's first patch does is mark the OMP clause for 'x' (not the
>>> 'x' decl itself!) as 'readonly', via a new 'OMP_CLAUSE_MAP_READONLY'
>>> flag.
>>>
>>> The actual optimization then is done in this second patch.  Chung-Lin
>>> found that he could use 'SSA_NAME_POINTS_TO_READONLY_MEMORY' for that.
>>> I don't have much experience with most of the following generic code, so
>>> would appreciate a helping hand, whether that conceptually makes sense as
>>> well as from the implementation point of view:
>
> First of all, I have removed all of the gimplify-stage scanning and setting of
> DECL_POINTS_TO_READONLY and SSA_NAME_POINTS_TO_READONLY_MEMORY (so no changes to
> gimplify.cc now)
>
> I remember this code was an artifact of earlier attempts to allow struct-member
> pointer mappings to also work (e.g. map(readonly:rec.ptr[:N])), but failed anyways.
> I think the omp_data_* member accesses when building child function side
> receiver_refs is blocking points-to analysis from working (didn't try digging deeper)
>
> Also during gimplify, VAR_DECLs appeared to be reused (at least in some cases) for map
> clause decl reference building, so hoping that the variables "happen to be" single-use and
> DECL_POINTS_TO_READONLY relaying into SSA_NAME_POINTS_TO_READONLY_MEMORY does appear to be
> a little risky.
>
> However, for firstprivate pointers processed during omp-low, it appears to be somewhat different.
> (see below description)
>
>> No, I don't think you can use that flag on non-default-defs, nor
>> preserve it on copying.  So
>> it also doesn't nicely extend to DECLs as done by the patch.  We
>> currently _only_ use it
>> for incoming parameters.  When used on arbitrary code you can get to for example
>> 
>> ptr1(points-to-readony-memory) = &p->x;
>> ... access via ptr1 ...
>> ptr2 = &p->x;
>> ... access via ptr2 ...
>> 
>> where both are your OMP regions differently constrained (the constrain is on the
>> code in the region, _not_ on the actual protections of the pointed to
>> data, much like
>> for the fortran case).  But now CSE comes along and happily replaces all ptr2
>> with ptr2 in the second region and ... oops!
>
> Richard, I assume what you meant was "happily replaces all ptr2 with ptr1 in the second region"?
>
> That doesn't happen, because during omp-lower/expand, OMP target regions (which is all that
> this applies currently) is separated into different individual child functions.
>
> (Currently, the only "effective" use of DECL_POINTS_TO_READONLY is during omp-lower, when
> for firstprivate pointers (i.e. 'a' here) we set this bit when constructing the first load
> of this pointer)
>
>   #pragma acc parallel copyin(readonly: a[:32]) copyout(r)
>   {
>     foo (a, a[8]);
>     r = a[8];
>   }
>   #pragma acc parallel copyin(readonly: a[:32]) copyout(r)
>   {
>     foo (a, a[12]);
>     r = a[12];
>   }
>
> After omp-expand (before SSA):
>
> __attribute__((oacc parallel, omp target entrypoint, noclone))
> void main._omp_fn.1 (const struct .omp_data_t.3 & restrict .omp_data_i)
> {
>  ...
>   <bb 5> :
>   D.2962 = .omp_data_i->D.2947;
>   a.8 = D.2962;
>   r.1 = (*a.8)[12];
>   foo (a.8, r.1);
>   r.1 = (*a.8)[12];
>   D.2965 = .omp_data_i->r;
>   *D.2965 = r.1;
>   return;
> }
>
> __attribute__((oacc parallel, omp target entrypoint, noclone))
> void main._omp_fn.0 (const struct .omp_data_t.2 & restrict .omp_data_i)
> {
>   ...
>   <bb 3> :
>   D.2968 = .omp_data_i->D.2939;
>   a.4 = D.2968;
>   r.0 = (*a.4)[8];
>   foo (a.4, r.0);
>   r.0 = (*a.4)[8];
>   D.2971 = .omp_data_i->r;
>   *D.2971 = r.0;
>   return;
> }
>
> So actually, the creating of DECL_POINTS_TO_READONLY and its relaying to
> SSA_NAME_POINTS_TO_READONLY_MEMORY here, is actually quite similar to a default-def
> for an PARM_DECL, at least conceptually.
>
> (If offloading was structured significantly differently, say if child functions
> were separated much earlier before omp-lowering, than this readonly-modifier might
> possibly be a direct application of 'r' in the "fn spec" attribute)
>
> Other changes since first version of patch include:
> 1) update of C/C++ FE changes to new style in c-family/c-omp.cc
> 2) merging of two if cases in fortran/trans-openmp.cc like Thomas suggested
> 3) Update of readonly-2.c testcase to scan before/after "fre1" pass, to verify removal of a MEM load, also as Thomas suggested.

Thanks!

> I have re-tested this patch using mainline, with no regressions. Is this okay for mainline?

> 2024-04-03  Chung-Lin Tang  <cltang@baylibre.com>
>
> gcc/c-family/ChangeLog:
>
> 	* c-omp.cc (c_omp_address_inspector::expand_array_base):
> 	Set OMP_CLAUSE_MAP_POINTS_TO_READONLY on pointer clause.
> 	(c_omp_address_inspector::expand_component_selector): Likewise.
>
> gcc/fortran/ChangeLog:
>
> 	* trans-openmp.cc (gfc_trans_omp_array_section):
> 	Set OMP_CLAUSE_MAP_POINTS_TO_READONLY on pointer clause.
>
> gcc/ChangeLog:
>
> 	* gimple-expr.cc (copy_var_decl): Copy DECL_POINTS_TO_READONLY
> 	for VAR_DECLs.
> 	* omp-low.cc (lower_omp_target): Set DECL_POINTS_TO_READONLY for
> 	variables of receiver refs.
> 	* tree-pretty-print.cc (dump_omp_clause):
> 	Print OMP_CLAUSE_MAP_POINTS_TO_READONLY.
> 	(dump_generic_node): Print SSA_NAME_POINTS_TO_READONLY_MEMORY.
> 	* tree-ssanames.cc (make_ssa_name_fn): Set
> 	SSA_NAME_POINTS_TO_READONLY_MEMORY if DECL_POINTS_TO_READONLY is set.
> 	* tree.h (DECL_POINTS_TO_READONLY): New macro.
> 	(OMP_CLAUSE_MAP_POINTS_TO_READONLY): New macro.
>
> gcc/testsuite/ChangeLog:
>
> 	* c-c++-common/goacc/readonly-1.c: Adjust testcase.
> 	* c-c++-common/goacc/readonly-2.c: New testcase.
> 	* gfortran.dg/goacc/readonly-1.f90: Adjust testcase.

> --- a/gcc/c-family/c-omp.cc
> +++ b/gcc/c-family/c-omp.cc
> @@ -3907,6 +3907,8 @@ c_omp_address_inspector::expand_array_base (tree c,
>      }
>    else if (c2)
>      {
> +      if (OMP_CLAUSE_MAP_READONLY (c))
> +	OMP_CLAUSE_MAP_POINTS_TO_READONLY (c2) = 1;
>        OMP_CLAUSE_CHAIN (c2) = OMP_CLAUSE_CHAIN (c);
>        OMP_CLAUSE_CHAIN (c) = c2;
>        if (implicit_p)
> @@ -4051,6 +4053,8 @@ c_omp_address_inspector::expand_component_selector (tree c,
>      }
>    else if (c2)
>      {
> +      if (OMP_CLAUSE_MAP_READONLY (c))
> +	OMP_CLAUSE_MAP_POINTS_TO_READONLY (c2) = 1;
>        OMP_CLAUSE_CHAIN (c2) = OMP_CLAUSE_CHAIN (c);
>        OMP_CLAUSE_CHAIN (c) = c2;
>        c = c2;

(So this replaces the 'gcc/c/c-typeck.cc:handle_omp_array_sections',
'gcc/cp/semantics.cc:handle_omp_array_sections' changes of the previous
patch revision?)

Are we sure that really only the 'else if (c2)' branches need to handle
this, and explicitly not the preceding 'if (c3)' branches, too?  I
suggest we add a comment and/or handling, as necessary.  If that makes
sense, maybe handle for both 'c3', 'c2' via a 'bool readonly_p = [...]',
similar to the existing 'bool implicit_p'?

> --- a/gcc/fortran/trans-openmp.cc
> +++ b/gcc/fortran/trans-openmp.cc
> @@ -2561,6 +2561,8 @@ gfc_trans_omp_array_section (stmtblock_t *block, gfc_exec_op op,
>    ptr2 = fold_convert (ptrdiff_type_node, ptr2);
>    OMP_CLAUSE_SIZE (node3) = fold_build2 (MINUS_EXPR, ptrdiff_type_node,
>  					 ptr, ptr2);
> +  if (n->u.map.readonly)
> +    OMP_CLAUSE_MAP_POINTS_TO_READONLY (node3) = 1;
>  }
>  
>  static tree

> --- a/gcc/gimple-expr.cc
> +++ b/gcc/gimple-expr.cc
> @@ -385,6 +385,8 @@ copy_var_decl (tree var, tree name, tree type)
>    DECL_CONTEXT (copy) = DECL_CONTEXT (var);
>    TREE_USED (copy) = 1;
>    DECL_SEEN_IN_BIND_EXPR_P (copy) = 1;
> +  if (VAR_P (var))
> +    DECL_POINTS_TO_READONLY (copy) = DECL_POINTS_TO_READONLY (var);
>    DECL_ATTRIBUTES (copy) = DECL_ATTRIBUTES (var);
>    if (DECL_USER_ALIGN (var))
>      {

> --- a/gcc/omp-low.cc
> +++ b/gcc/omp-low.cc
> @@ -14003,6 +14003,8 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
>  		if (ref_to_array)
>  		  x = fold_convert_loc (clause_loc, TREE_TYPE (new_var), x);
>  		gimplify_expr (&x, &new_body, NULL, is_gimple_val, fb_rvalue);
> +		if (OMP_CLAUSE_MAP_POINTS_TO_READONLY (c) && VAR_P (x))
> +		  DECL_POINTS_TO_READONLY (x) = 1;
>  		if ((is_ref && !ref_to_array)
>  		    || ref_to_ptr)
>  		  {

> --- a/gcc/testsuite/c-c++-common/goacc/readonly-1.c
> +++ b/gcc/testsuite/c-c++-common/goacc/readonly-1.c
> @@ -48,17 +48,17 @@ int main (void)
>  
>  /* { dg-final { scan-tree-dump-times "(?n)#pragma acc declare map\\(to:y\\) map\\(readonly,to:s\\) map\\(readonly,to:x\\)" 1 "original" } } */
>  
> -/* { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:\\*s.ptr \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\)" 1 "original" { target { c } } } } */
> -/* { dg-final { scan-tree-dump-times "(?n)#pragma acc kernels map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:\\*s.ptr \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\)" 1 "original" { target { c } } } } */
> -/* { dg-final { scan-tree-dump-times "(?n)#pragma acc serial map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:\\*s.ptr \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\)" 1 "original" { target { c } } } } */
> -/* { dg-final { scan-tree-dump-times "(?n)#pragma acc data map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) map\\(readonly,to:\\*s.ptr \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\)" 1 "original" { target { c } } } } */
> -/* { dg-final { scan-tree-dump-times "(?n)#pragma acc enter data map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) map\\(readonly,to:\\*s.ptr \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\)" 1 "original" { target { c } } } } */
> +/* { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:\\*s.ptr \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,attach_detach:s.ptr \\\[bias: 0\\\]\\) map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,firstprivate:x \\\[pointer assign, bias: 0\\\]\\)" 1 "original" { target { c } } } } */
> +/* { dg-final { scan-tree-dump-times "(?n)#pragma acc kernels map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:\\*s.ptr \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,attach_detach:s.ptr \\\[bias: 0\\\]\\) map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,firstprivate:x \\\[pointer assign, bias: 0\\\]\\)" 1 "original" { target { c } } } } */
> +/* { dg-final { scan-tree-dump-times "(?n)#pragma acc serial map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:\\*s.ptr \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,attach_detach:s.ptr \\\[bias: 0\\\]\\) map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,firstprivate:x \\\[pointer assign, bias: 0\\\]\\)" 1 "original" { target { c } } } } */
> +/* { dg-final { scan-tree-dump-times "(?n)#pragma acc data map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) map\\(readonly,to:\\*s.ptr \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,attach_detach:s.ptr \\\[bias: 0\\\]\\) map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\)" 1 "original" { target { c } } } } */
> +/* { dg-final { scan-tree-dump-times "(?n)#pragma acc enter data map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) map\\(readonly,to:\\*s.ptr \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,attach_detach:s.ptr \\\[bias: 0\\\]\\) map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\)" 1 "original" { target { c } } } } */
>  
> -/* { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:\\*NON_LVALUE_EXPR <s.ptr> \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\)" 1 "original" { target { c++ } } } } */
> -/* { dg-final { scan-tree-dump-times "(?n)#pragma acc kernels map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:\\*NON_LVALUE_EXPR <s.ptr> \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\)" 1 "original" { target { c++ } } } } */
> -/* { dg-final { scan-tree-dump-times "(?n)#pragma acc serial map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:\\*NON_LVALUE_EXPR <s.ptr> \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\)" 1 "original" { target { c++ } } } } */
> -/* { dg-final { scan-tree-dump-times "(?n)#pragma acc data map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) map\\(readonly,to:\\*NON_LVALUE_EXPR <s.ptr> \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\)" 1 "original" { target { c++ } } } } */
> -/* { dg-final { scan-tree-dump-times "(?n)#pragma acc enter data map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) map\\(readonly,to:\\*NON_LVALUE_EXPR <s.ptr> \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\)" 1 "original" { target { c++ } } } } */
> +/* { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:\\*NON_LVALUE_EXPR <s.ptr> \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,attach_detach:s.ptr \\\[bias: 0\\\]\\) map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,firstprivate:x \\\[pointer assign, bias: 0\\\]\\)" 1 "original" { target { c++ } } } } */
> +/* { dg-final { scan-tree-dump-times "(?n)#pragma acc kernels map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:\\*NON_LVALUE_EXPR <s.ptr> \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,attach_detach:s.ptr \\\[bias: 0\\\]\\) map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,firstprivate:x \\\[pointer assign, bias: 0\\\]\\)" 1 "original" { target { c++ } } } } */
> +/* { dg-final { scan-tree-dump-times "(?n)#pragma acc serial map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:\\*NON_LVALUE_EXPR <s.ptr> \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,attach_detach:s.ptr \\\[bias: 0\\\]\\) map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,firstprivate:x \\\[pointer assign, bias: 0\\\]\\)" 1 "original" { target { c++ } } } } */
> +/* { dg-final { scan-tree-dump-times "(?n)#pragma acc data map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) map\\(readonly,to:\\*NON_LVALUE_EXPR <s.ptr> \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,attach_detach:s.ptr \\\[bias: 0\\\]\\) map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\)" 1 "original" { target { c++ } } } } */
> +/* { dg-final { scan-tree-dump-times "(?n)#pragma acc enter data map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) map\\(readonly,to:\\*NON_LVALUE_EXPR <s.ptr> \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,attach_detach:s.ptr \\\[bias: 0\\\]\\) map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\)" 1 "original" { target { c++ } } } } */
>  
>  /* { dg-final { scan-tree-dump-times "(?n)#pragma acc cache \\(readonly:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\);$" 4 "original" } } */
>  /* { dg-final { scan-tree-dump-times "(?n)#pragma acc cache \\(y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\);$" 4 "original" } } */

> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/goacc/readonly-2.c
> @@ -0,0 +1,16 @@
> +/* { dg-additional-options "-O -fdump-tree-phiprop -fdump-tree-fre" } */
> +
> +#pragma acc routine
> +extern void foo (int *ptr, int val);
> +
> +int main (void)
> +{
> +  int r, a[32];
> +  #pragma acc parallel copyin(readonly: a[:32]) copyout(r)
> +  {
> +    foo (a, a[8]);
> +    r = a[8];
> +  }
> +}
> +/* { dg-final { scan-tree-dump-times "r\.\[_0-9\]+ = MEM\\\[\[^_\]+_\[0-9\]+\\(ptro\\)\\\]\\\[8\\\];" 2 "phiprop1" } } */
> +/* { dg-final { scan-tree-dump-times "r\.\[_0-9\]+ = MEM\\\[\[^_\]+_\[0-9\]+\\(ptro\\)\\\]\\\[8\\\];" 1 "fre1" } } */

In the tree where I've been testing your patch, I've not been seeing
'MEM[x]' but '(*x)', therefore:

    -/* { dg-final { scan-tree-dump-times "r\.\[_0-9\]+ = MEM\\\[\[^_\]+_\[0-9\]+\\(ptro\\)\\\]\\\[8\\\];" 2 "phiprop1" } } */
    -/* { dg-final { scan-tree-dump-times "r\.\[_0-9\]+ = MEM\\\[\[^_\]+_\[0-9\]+\\(ptro\\)\\\]\\\[8\\\];" 1 "fre1" } } */
    +/* { dg-final { scan-tree-dump-times "r\.\[_0-9\]+ = \\(\\\*_\[0-9\]+\\(ptro\\)\\)\\\[8\\\];" 2 "phiprop1" } } */
    +/* { dg-final { scan-tree-dump-times "r\.\[_0-9\]+ = \\(\\\*_\[0-9\]+\\(ptro\\)\\)\\\[8\\\];" 1 "fre1" } } */

Maybe that's due to something else in my (long...) Git branch; just make
sure you've got PASSes here, eventually.

> --- a/gcc/testsuite/gfortran.dg/goacc/readonly-1.f90
> +++ b/gcc/testsuite/gfortran.dg/goacc/readonly-1.f90
> @@ -80,16 +80,16 @@ end program main
>  ! The front end turns OpenACC 'declare' into OpenACC 'data'.
>  !   { dg-final { scan-tree-dump-times "(?n)#pragma acc data map\\(readonly,to:\\*b\\) map\\(alloc:b.+ map\\(to:\\*c\\) map\\(alloc:c.+" 1 "original" } }
>  !   { dg-final { scan-tree-dump-times "(?n)#pragma acc data map\\(readonly,to:g\\) map\\(to:h\\)" 1 "original" } }
> -! { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(readonly,to:\\*.+ map\\(alloc:a.+ map\\(readonly,to:\\*.+ map\\(alloc:b.+ map\\(to:\\*.+ map\\(alloc:c.+" 1 "original" } }
> -! { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(readonly,to:a.+ map\\(alloc:a.+ map\\(readonly,to:b.+ map\\(alloc:b.+ map\\(to:c.+ map\\(alloc:c.+" 1 "original" } }
> -! { dg-final { scan-tree-dump-times "(?n)#pragma acc kernels map\\(readonly,to:\\*.+ map\\(alloc:a.+ map\\(readonly,to:\\*.+ map\\(alloc:b.+ map\\(to:\\*.+ map\\(alloc:c.+" 1 "original" } }
> -! { dg-final { scan-tree-dump-times "(?n)#pragma acc kernels map\\(readonly,to:a.+ map\\(alloc:a.+ map\\(readonly,to:b.+ map\\(alloc:b.+ map\\(to:c.+ map\\(alloc:c.+" 1 "original" } }
> -! { dg-final { scan-tree-dump-times "(?n)#pragma acc serial map\\(readonly,to:\\*.+ map\\(alloc:a.+ map\\(readonly,to:\\*.+ map\\(alloc:b.+ map\\(to:\\*.+ map\\(alloc:c.+" 1 "original" } }
> -! { dg-final { scan-tree-dump-times "(?n)#pragma acc serial map\\(readonly,to:a.+ map\\(alloc:a.+ map\\(readonly,to:b.+ map\\(alloc:b.+ map\\(to:c.+ map\\(alloc:c.+" 1 "original" } }
> -! { dg-final { scan-tree-dump-times "(?n)#pragma acc data map\\(readonly,to:\\*.+ map\\(alloc:a.+ map\\(readonly,to:\\*.+ map\\(alloc:b.+ map\\(to:\\*.+ map\\(alloc:c.+" 1 "original" } }
> -! { dg-final { scan-tree-dump-times "(?n)#pragma acc data map\\(readonly,to:a.+ map\\(alloc:a.+ map\\(readonly,to:b.+ map\\(alloc:b.+ map\\(to:c.+ map\\(alloc:c.+" 1 "original" } }
> -! { dg-final { scan-tree-dump-times "(?n)#pragma acc enter data map\\(readonly,to:\\*.+ map\\(alloc:a.+ map\\(readonly,to:\\*.+ map\\(alloc:b.+ map\\(to:\\*.+ map\\(alloc:c.+" 1 "original" } }
> -! { dg-final { scan-tree-dump-times "(?n)#pragma acc enter data map\\(readonly,to:a.+ map\\(alloc:a.+ map\\(readonly,to:b.+ map\\(alloc:b.+ map\\(to:c.+ map\\(alloc:c.+" 1 "original" } }
> +! { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(readonly,to:\\*.+ map\\(pt_readonly,alloc:a.+ map\\(readonly,to:\\*.+ map\\(pt_readonly,alloc:b.+ map\\(to:\\*.+ map\\(alloc:c.+" 1 "original" } }
> +! { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(readonly,to:a.+ map\\(pt_readonly,alloc:a.+ map\\(readonly,to:b.+ map\\(pt_readonly,alloc:b.+ map\\(to:c.+ map\\(alloc:c.+" 1 "original" } }
> +! { dg-final { scan-tree-dump-times "(?n)#pragma acc kernels map\\(readonly,to:\\*.+ map\\(pt_readonly,alloc:a.+ map\\(readonly,to:\\*.+ map\\(pt_readonly,alloc:b.+ map\\(to:\\*.+ map\\(alloc:c.+" 1 "original" } }
> +! { dg-final { scan-tree-dump-times "(?n)#pragma acc kernels map\\(readonly,to:a.+ map\\(pt_readonly,alloc:a.+ map\\(readonly,to:b.+ map\\(pt_readonly,alloc:b.+ map\\(to:c.+ map\\(alloc:c.+" 1 "original" } }
> +! { dg-final { scan-tree-dump-times "(?n)#pragma acc serial map\\(readonly,to:\\*.+ map\\(pt_readonly,alloc:a.+ map\\(readonly,to:\\*.+ map\\(pt_readonly,alloc:b.+ map\\(to:\\*.+ map\\(alloc:c.+" 1 "original" } }
> +! { dg-final { scan-tree-dump-times "(?n)#pragma acc serial map\\(readonly,to:a.+ map\\(pt_readonly,alloc:a.+ map\\(readonly,to:b.+ map\\(pt_readonly,alloc:b.+ map\\(to:c.+ map\\(alloc:c.+" 1 "original" } }
> +! { dg-final { scan-tree-dump-times "(?n)#pragma acc data map\\(readonly,to:\\*.+ map\\(pt_readonly,alloc:a.+ map\\(readonly,to:\\*.+ map\\(pt_readonly,alloc:b.+ map\\(to:\\*.+ map\\(alloc:c.+" 1 "original" } }
> +! { dg-final { scan-tree-dump-times "(?n)#pragma acc data map\\(readonly,to:a.+ map\\(pt_readonly,alloc:a.+ map\\(readonly,to:b.+ map\\(pt_readonly,alloc:b.+ map\\(to:c.+ map\\(alloc:c.+" 1 "original" } }
> +! { dg-final { scan-tree-dump-times "(?n)#pragma acc enter data map\\(readonly,to:\\*.+ map\\(pt_readonly,alloc:a.+ map\\(readonly,to:\\*.+ map\\(pt_readonly,alloc:b.+ map\\(to:\\*.+ map\\(alloc:c.+" 1 "original" } }
> +! { dg-final { scan-tree-dump-times "(?n)#pragma acc enter data map\\(readonly,to:a.+ map\\(pt_readonly,alloc:a.+ map\\(readonly,to:b.+ map\\(pt_readonly,alloc:b.+ map\\(to:c.+ map\\(alloc:c.+" 1 "original" } }
>  
>  ! { dg-final { scan-tree-dump-times "(?n)#pragma acc cache \\(readonly:\\*\\(integer\\(kind=4\\)\\\[0:\\\] \\*\\) parm.*data \\\[len: .+\\\]\\) \\(readonly:\\*\\(integer\\(kind=4\\)\\\[0:\\\] \\*\\) parm.*data \\\[len: .+\\\]\\);" 8 "original" } }
>  ! { dg-final { scan-tree-dump-times "(?n)#pragma acc cache \\(\\*\\(integer\\(kind=4\\)\\\[0:\\\] \\*\\) parm.*data \\\[len: .+\\\]\\);" 8 "original" } }

Can we also get an OpenACC/Fortran test case à la
'c-c++-common/goacc/readonly-2.c' to demonstrate this doing something?

> --- a/gcc/testsuite/gfortran.dg/pr67170.f90
> +++ b/gcc/testsuite/gfortran.dg/pr67170.f90
> @@ -28,4 +28,4 @@ end subroutine foo
>  end module test_module
>  end program
>  
> -! { dg-final { scan-tree-dump-times "= \\*arg_\[0-9\]+\\(D\\);" 1 "fre1" } }
> +! { dg-final { scan-tree-dump-times "= \\*arg_\[0-9\]+\\(D\\)\\(ptro\\);" 1 "fre1" } }

Is it understood what's happening here, that this is the correct
behavior?  I suppose so -- there's no actual change in behavior -- as
this here doesn't trigger for OpenACC 'readonly' modifier, but just the
pretty printer change for 'SSA_NAME_POINTS_TO_READONLY_MEMORY':

> --- a/gcc/tree-pretty-print.cc
> +++ b/gcc/tree-pretty-print.cc
> @@ -915,6 +915,8 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, dump_flags_t flags)
>        pp_string (pp, "map(");
>        if (OMP_CLAUSE_MAP_READONLY (clause))
>  	pp_string (pp, "readonly,");
> +      if (OMP_CLAUSE_MAP_POINTS_TO_READONLY (clause))
> +	pp_string (pp, "pt_readonly,");
>        switch (OMP_CLAUSE_MAP_KIND (clause))
>  	{
>  	case GOMP_MAP_ALLOC:
> @@ -3620,6 +3622,8 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
>  	pp_string (pp, "(D)");
>        if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (node))
>  	pp_string (pp, "(ab)");
> +      if (SSA_NAME_POINTS_TO_READONLY_MEMORY (node))
> +	pp_string (pp, "(ptro)");
>        break;
>  
>      case WITH_SIZE_EXPR:

> --- a/gcc/tree-ssanames.cc
> +++ b/gcc/tree-ssanames.cc
> @@ -402,6 +402,9 @@ make_ssa_name_fn (struct function *fn, tree var, gimple *stmt,
>    else
>      SSA_NAME_RANGE_INFO (t) = NULL;
>  
> +  if (VAR_P (var) && DECL_POINTS_TO_READONLY (var))
> +    SSA_NAME_POINTS_TO_READONLY_MEMORY (t) = 1;
> +
>    SSA_NAME_IN_FREE_LIST (t) = 0;
>    SSA_NAME_IS_DEFAULT_DEF (t) = 0;
>    init_ssa_name_imm_use (t);

> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -1036,6 +1036,13 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int,
>  #define DECL_HIDDEN_STRING_LENGTH(NODE) \
>    (TREE_CHECK (NODE, PARM_DECL)->decl_common.decl_nonshareable_flag)
>  
> +/* In a VAR_DECL, set for variables regarded as pointing to memory not written
> +   to. SSA_NAME_POINTS_TO_READONLY_MEMORY gets set for SSA_NAMEs created from
> +   such VAR_DECLs. Currently used by OpenACC 'readonly' modifier in copyin
> +   clauses.  */
> +#define DECL_POINTS_TO_READONLY(NODE) \
> +  (TREE_CHECK (NODE, VAR_DECL)->decl_common.decl_not_flexarray)

Again update the table for the flag uses are listed?

(There is a 'VAR_DECL_CHECK', which hopefully means the same thing.)

> +
>  /* In a CALL_EXPR, means that the call is the jump from a thunk to the
>     thunked-to function.  Be careful to avoid using this macro when one of the
>     next two applies instead.  */
> @@ -1845,6 +1852,10 @@ class auto_suppress_location_wrappers
>  #define OMP_CLAUSE_MAP_READONLY(NODE) \
>    TREE_READONLY (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP))
>  
> +/* Set if 'OMP_CLAUSE_DECL (NODE)' points to read-only memory.  */
> +#define OMP_CLAUSE_MAP_POINTS_TO_READONLY(NODE) \
> +  TREE_CONSTANT (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP))
> +
>  /* Same as above, for use in OpenACC cache directives.  */
>  #define OMP_CLAUSE__CACHE__READONLY(NODE) \
>    TREE_READONLY (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__CACHE_))

(Note, corresponding 'OMP_CLAUSE_MAP_POINTS_TO_READONLY' doesn't exist
yet, due to missing actual handling of the OpenACC 'cache' directive;
'gcc/gimplify.cc:gimplify_oacc_cache'.)


Grüße
 Thomas
Richard Biener April 12, 2024, 6:17 a.m. UTC | #5
On Thu, 11 Apr 2024, Thomas Schwinge wrote:

> Hi Chung-Lin, Richard!
> 
> From me just a few mechanical pieces, see below.  Richard, are you able
> to again comment on Chung-Lin's general strategy, as I'm not at all
> familiar with those parts of the code?

I've queued all stage1 material and will be only able to slowly look
at it after we released.

> On 2024-04-03T19:50:55+0800, Chung-Lin Tang <cltang@pllab.cs.nthu.edu.tw> wrote:
> > On 2023/10/30 8:46 PM, Richard Biener wrote:
> >>>
> >>> What Chung-Lin's first patch does is mark the OMP clause for 'x' (not the
> >>> 'x' decl itself!) as 'readonly', via a new 'OMP_CLAUSE_MAP_READONLY'
> >>> flag.
> >>>
> >>> The actual optimization then is done in this second patch.  Chung-Lin
> >>> found that he could use 'SSA_NAME_POINTS_TO_READONLY_MEMORY' for that.
> >>> I don't have much experience with most of the following generic code, so
> >>> would appreciate a helping hand, whether that conceptually makes sense as
> >>> well as from the implementation point of view:
> >
> > First of all, I have removed all of the gimplify-stage scanning and setting of
> > DECL_POINTS_TO_READONLY and SSA_NAME_POINTS_TO_READONLY_MEMORY (so no changes to
> > gimplify.cc now)
> >
> > I remember this code was an artifact of earlier attempts to allow struct-member
> > pointer mappings to also work (e.g. map(readonly:rec.ptr[:N])), but failed anyways.
> > I think the omp_data_* member accesses when building child function side
> > receiver_refs is blocking points-to analysis from working (didn't try digging deeper)
> >
> > Also during gimplify, VAR_DECLs appeared to be reused (at least in some cases) for map
> > clause decl reference building, so hoping that the variables "happen to be" single-use and
> > DECL_POINTS_TO_READONLY relaying into SSA_NAME_POINTS_TO_READONLY_MEMORY does appear to be
> > a little risky.
> >
> > However, for firstprivate pointers processed during omp-low, it appears to be somewhat different.
> > (see below description)
> >
> >> No, I don't think you can use that flag on non-default-defs, nor
> >> preserve it on copying.  So
> >> it also doesn't nicely extend to DECLs as done by the patch.  We
> >> currently _only_ use it
> >> for incoming parameters.  When used on arbitrary code you can get to for example
> >> 
> >> ptr1(points-to-readony-memory) = &p->x;
> >> ... access via ptr1 ...
> >> ptr2 = &p->x;
> >> ... access via ptr2 ...
> >> 
> >> where both are your OMP regions differently constrained (the constrain is on the
> >> code in the region, _not_ on the actual protections of the pointed to
> >> data, much like
> >> for the fortran case).  But now CSE comes along and happily replaces all ptr2
> >> with ptr2 in the second region and ... oops!
> >
> > Richard, I assume what you meant was "happily replaces all ptr2 with ptr1 in the second region"?
> >
> > That doesn't happen, because during omp-lower/expand, OMP target regions (which is all that
> > this applies currently) is separated into different individual child functions.
> >
> > (Currently, the only "effective" use of DECL_POINTS_TO_READONLY is during omp-lower, when
> > for firstprivate pointers (i.e. 'a' here) we set this bit when constructing the first load
> > of this pointer)
> >
> >   #pragma acc parallel copyin(readonly: a[:32]) copyout(r)
> >   {
> >     foo (a, a[8]);
> >     r = a[8];
> >   }
> >   #pragma acc parallel copyin(readonly: a[:32]) copyout(r)
> >   {
> >     foo (a, a[12]);
> >     r = a[12];
> >   }
> >
> > After omp-expand (before SSA):
> >
> > __attribute__((oacc parallel, omp target entrypoint, noclone))
> > void main._omp_fn.1 (const struct .omp_data_t.3 & restrict .omp_data_i)
> > {
> >  ...
> >   <bb 5> :
> >   D.2962 = .omp_data_i->D.2947;
> >   a.8 = D.2962;
> >   r.1 = (*a.8)[12];
> >   foo (a.8, r.1);
> >   r.1 = (*a.8)[12];
> >   D.2965 = .omp_data_i->r;
> >   *D.2965 = r.1;
> >   return;
> > }
> >
> > __attribute__((oacc parallel, omp target entrypoint, noclone))
> > void main._omp_fn.0 (const struct .omp_data_t.2 & restrict .omp_data_i)
> > {
> >   ...
> >   <bb 3> :
> >   D.2968 = .omp_data_i->D.2939;
> >   a.4 = D.2968;
> >   r.0 = (*a.4)[8];
> >   foo (a.4, r.0);
> >   r.0 = (*a.4)[8];
> >   D.2971 = .omp_data_i->r;
> >   *D.2971 = r.0;
> >   return;
> > }
> >
> > So actually, the creating of DECL_POINTS_TO_READONLY and its relaying to
> > SSA_NAME_POINTS_TO_READONLY_MEMORY here, is actually quite similar to a default-def
> > for an PARM_DECL, at least conceptually.
> >
> > (If offloading was structured significantly differently, say if child functions
> > were separated much earlier before omp-lowering, than this readonly-modifier might
> > possibly be a direct application of 'r' in the "fn spec" attribute)
> >
> > Other changes since first version of patch include:
> > 1) update of C/C++ FE changes to new style in c-family/c-omp.cc
> > 2) merging of two if cases in fortran/trans-openmp.cc like Thomas suggested
> > 3) Update of readonly-2.c testcase to scan before/after "fre1" pass, to verify removal of a MEM load, also as Thomas suggested.
> 
> Thanks!
> 
> > I have re-tested this patch using mainline, with no regressions. Is this okay for mainline?
> 
> > 2024-04-03  Chung-Lin Tang  <cltang@baylibre.com>
> >
> > gcc/c-family/ChangeLog:
> >
> > 	* c-omp.cc (c_omp_address_inspector::expand_array_base):
> > 	Set OMP_CLAUSE_MAP_POINTS_TO_READONLY on pointer clause.
> > 	(c_omp_address_inspector::expand_component_selector): Likewise.
> >
> > gcc/fortran/ChangeLog:
> >
> > 	* trans-openmp.cc (gfc_trans_omp_array_section):
> > 	Set OMP_CLAUSE_MAP_POINTS_TO_READONLY on pointer clause.
> >
> > gcc/ChangeLog:
> >
> > 	* gimple-expr.cc (copy_var_decl): Copy DECL_POINTS_TO_READONLY
> > 	for VAR_DECLs.
> > 	* omp-low.cc (lower_omp_target): Set DECL_POINTS_TO_READONLY for
> > 	variables of receiver refs.
> > 	* tree-pretty-print.cc (dump_omp_clause):
> > 	Print OMP_CLAUSE_MAP_POINTS_TO_READONLY.
> > 	(dump_generic_node): Print SSA_NAME_POINTS_TO_READONLY_MEMORY.
> > 	* tree-ssanames.cc (make_ssa_name_fn): Set
> > 	SSA_NAME_POINTS_TO_READONLY_MEMORY if DECL_POINTS_TO_READONLY is set.
> > 	* tree.h (DECL_POINTS_TO_READONLY): New macro.
> > 	(OMP_CLAUSE_MAP_POINTS_TO_READONLY): New macro.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 	* c-c++-common/goacc/readonly-1.c: Adjust testcase.
> > 	* c-c++-common/goacc/readonly-2.c: New testcase.
> > 	* gfortran.dg/goacc/readonly-1.f90: Adjust testcase.
> 
> > --- a/gcc/c-family/c-omp.cc
> > +++ b/gcc/c-family/c-omp.cc
> > @@ -3907,6 +3907,8 @@ c_omp_address_inspector::expand_array_base (tree c,
> >      }
> >    else if (c2)
> >      {
> > +      if (OMP_CLAUSE_MAP_READONLY (c))
> > +	OMP_CLAUSE_MAP_POINTS_TO_READONLY (c2) = 1;
> >        OMP_CLAUSE_CHAIN (c2) = OMP_CLAUSE_CHAIN (c);
> >        OMP_CLAUSE_CHAIN (c) = c2;
> >        if (implicit_p)
> > @@ -4051,6 +4053,8 @@ c_omp_address_inspector::expand_component_selector (tree c,
> >      }
> >    else if (c2)
> >      {
> > +      if (OMP_CLAUSE_MAP_READONLY (c))
> > +	OMP_CLAUSE_MAP_POINTS_TO_READONLY (c2) = 1;
> >        OMP_CLAUSE_CHAIN (c2) = OMP_CLAUSE_CHAIN (c);
> >        OMP_CLAUSE_CHAIN (c) = c2;
> >        c = c2;
> 
> (So this replaces the 'gcc/c/c-typeck.cc:handle_omp_array_sections',
> 'gcc/cp/semantics.cc:handle_omp_array_sections' changes of the previous
> patch revision?)
> 
> Are we sure that really only the 'else if (c2)' branches need to handle
> this, and explicitly not the preceding 'if (c3)' branches, too?  I
> suggest we add a comment and/or handling, as necessary.  If that makes
> sense, maybe handle for both 'c3', 'c2' via a 'bool readonly_p = [...]',
> similar to the existing 'bool implicit_p'?
> 
> > --- a/gcc/fortran/trans-openmp.cc
> > +++ b/gcc/fortran/trans-openmp.cc
> > @@ -2561,6 +2561,8 @@ gfc_trans_omp_array_section (stmtblock_t *block, gfc_exec_op op,
> >    ptr2 = fold_convert (ptrdiff_type_node, ptr2);
> >    OMP_CLAUSE_SIZE (node3) = fold_build2 (MINUS_EXPR, ptrdiff_type_node,
> >  					 ptr, ptr2);
> > +  if (n->u.map.readonly)
> > +    OMP_CLAUSE_MAP_POINTS_TO_READONLY (node3) = 1;
> >  }
> >  
> >  static tree
> 
> > --- a/gcc/gimple-expr.cc
> > +++ b/gcc/gimple-expr.cc
> > @@ -385,6 +385,8 @@ copy_var_decl (tree var, tree name, tree type)
> >    DECL_CONTEXT (copy) = DECL_CONTEXT (var);
> >    TREE_USED (copy) = 1;
> >    DECL_SEEN_IN_BIND_EXPR_P (copy) = 1;
> > +  if (VAR_P (var))
> > +    DECL_POINTS_TO_READONLY (copy) = DECL_POINTS_TO_READONLY (var);
> >    DECL_ATTRIBUTES (copy) = DECL_ATTRIBUTES (var);
> >    if (DECL_USER_ALIGN (var))
> >      {
> 
> > --- a/gcc/omp-low.cc
> > +++ b/gcc/omp-low.cc
> > @@ -14003,6 +14003,8 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
> >  		if (ref_to_array)
> >  		  x = fold_convert_loc (clause_loc, TREE_TYPE (new_var), x);
> >  		gimplify_expr (&x, &new_body, NULL, is_gimple_val, fb_rvalue);
> > +		if (OMP_CLAUSE_MAP_POINTS_TO_READONLY (c) && VAR_P (x))
> > +		  DECL_POINTS_TO_READONLY (x) = 1;
> >  		if ((is_ref && !ref_to_array)
> >  		    || ref_to_ptr)
> >  		  {
> 
> > --- a/gcc/testsuite/c-c++-common/goacc/readonly-1.c
> > +++ b/gcc/testsuite/c-c++-common/goacc/readonly-1.c
> > @@ -48,17 +48,17 @@ int main (void)
> >  
> >  /* { dg-final { scan-tree-dump-times "(?n)#pragma acc declare map\\(to:y\\) map\\(readonly,to:s\\) map\\(readonly,to:x\\)" 1 "original" } } */
> >  
> > -/* { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:\\*s.ptr \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\)" 1 "original" { target { c } } } } */
> > -/* { dg-final { scan-tree-dump-times "(?n)#pragma acc kernels map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:\\*s.ptr \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\)" 1 "original" { target { c } } } } */
> > -/* { dg-final { scan-tree-dump-times "(?n)#pragma acc serial map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:\\*s.ptr \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\)" 1 "original" { target { c } } } } */
> > -/* { dg-final { scan-tree-dump-times "(?n)#pragma acc data map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) map\\(readonly,to:\\*s.ptr \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\)" 1 "original" { target { c } } } } */
> > -/* { dg-final { scan-tree-dump-times "(?n)#pragma acc enter data map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) map\\(readonly,to:\\*s.ptr \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\)" 1 "original" { target { c } } } } */
> > +/* { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:\\*s.ptr \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,attach_detach:s.ptr \\\[bias: 0\\\]\\) map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,firstprivate:x \\\[pointer assign, bias: 0\\\]\\)" 1 "original" { target { c } } } } */
> > +/* { dg-final { scan-tree-dump-times "(?n)#pragma acc kernels map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:\\*s.ptr \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,attach_detach:s.ptr \\\[bias: 0\\\]\\) map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,firstprivate:x \\\[pointer assign, bias: 0\\\]\\)" 1 "original" { target { c } } } } */
> > +/* { dg-final { scan-tree-dump-times "(?n)#pragma acc serial map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:\\*s.ptr \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,attach_detach:s.ptr \\\[bias: 0\\\]\\) map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,firstprivate:x \\\[pointer assign, bias: 0\\\]\\)" 1 "original" { target { c } } } } */
> > +/* { dg-final { scan-tree-dump-times "(?n)#pragma acc data map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) map\\(readonly,to:\\*s.ptr \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,attach_detach:s.ptr \\\[bias: 0\\\]\\) map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\)" 1 "original" { target { c } } } } */
> > +/* { dg-final { scan-tree-dump-times "(?n)#pragma acc enter data map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) map\\(readonly,to:\\*s.ptr \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,attach_detach:s.ptr \\\[bias: 0\\\]\\) map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\)" 1 "original" { target { c } } } } */
> >  
> > -/* { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:\\*NON_LVALUE_EXPR <s.ptr> \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\)" 1 "original" { target { c++ } } } } */
> > -/* { dg-final { scan-tree-dump-times "(?n)#pragma acc kernels map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:\\*NON_LVALUE_EXPR <s.ptr> \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\)" 1 "original" { target { c++ } } } } */
> > -/* { dg-final { scan-tree-dump-times "(?n)#pragma acc serial map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:\\*NON_LVALUE_EXPR <s.ptr> \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\)" 1 "original" { target { c++ } } } } */
> > -/* { dg-final { scan-tree-dump-times "(?n)#pragma acc data map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) map\\(readonly,to:\\*NON_LVALUE_EXPR <s.ptr> \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\)" 1 "original" { target { c++ } } } } */
> > -/* { dg-final { scan-tree-dump-times "(?n)#pragma acc enter data map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) map\\(readonly,to:\\*NON_LVALUE_EXPR <s.ptr> \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\)" 1 "original" { target { c++ } } } } */
> > +/* { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:\\*NON_LVALUE_EXPR <s.ptr> \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,attach_detach:s.ptr \\\[bias: 0\\\]\\) map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,firstprivate:x \\\[pointer assign, bias: 0\\\]\\)" 1 "original" { target { c++ } } } } */
> > +/* { dg-final { scan-tree-dump-times "(?n)#pragma acc kernels map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:\\*NON_LVALUE_EXPR <s.ptr> \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,attach_detach:s.ptr \\\[bias: 0\\\]\\) map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,firstprivate:x \\\[pointer assign, bias: 0\\\]\\)" 1 "original" { target { c++ } } } } */
> > +/* { dg-final { scan-tree-dump-times "(?n)#pragma acc serial map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) .+ map\\(readonly,to:\\*NON_LVALUE_EXPR <s.ptr> \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,attach_detach:s.ptr \\\[bias: 0\\\]\\) map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,firstprivate:x \\\[pointer assign, bias: 0\\\]\\)" 1 "original" { target { c++ } } } } */
> > +/* { dg-final { scan-tree-dump-times "(?n)#pragma acc data map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) map\\(readonly,to:\\*NON_LVALUE_EXPR <s.ptr> \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,attach_detach:s.ptr \\\[bias: 0\\\]\\) map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\)" 1 "original" { target { c++ } } } } */
> > +/* { dg-final { scan-tree-dump-times "(?n)#pragma acc enter data map\\(to:y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\) map\\(readonly,to:\\*NON_LVALUE_EXPR <s.ptr> \\\[len: \[0-9\]+\\\]\\) map\\(pt_readonly,attach_detach:s.ptr \\\[bias: 0\\\]\\) map\\(readonly,to:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\)" 1 "original" { target { c++ } } } } */
> >  
> >  /* { dg-final { scan-tree-dump-times "(?n)#pragma acc cache \\(readonly:x\\\[0\\\] \\\[len: \[0-9\]+\\\]\\);$" 4 "original" } } */
> >  /* { dg-final { scan-tree-dump-times "(?n)#pragma acc cache \\(y\\\[0\\\] \\\[len: \[0-9\]+\\\]\\);$" 4 "original" } } */
> 
> > --- /dev/null
> > +++ b/gcc/testsuite/c-c++-common/goacc/readonly-2.c
> > @@ -0,0 +1,16 @@
> > +/* { dg-additional-options "-O -fdump-tree-phiprop -fdump-tree-fre" } */
> > +
> > +#pragma acc routine
> > +extern void foo (int *ptr, int val);
> > +
> > +int main (void)
> > +{
> > +  int r, a[32];
> > +  #pragma acc parallel copyin(readonly: a[:32]) copyout(r)
> > +  {
> > +    foo (a, a[8]);
> > +    r = a[8];
> > +  }
> > +}
> > +/* { dg-final { scan-tree-dump-times "r\.\[_0-9\]+ = MEM\\\[\[^_\]+_\[0-9\]+\\(ptro\\)\\\]\\\[8\\\];" 2 "phiprop1" } } */
> > +/* { dg-final { scan-tree-dump-times "r\.\[_0-9\]+ = MEM\\\[\[^_\]+_\[0-9\]+\\(ptro\\)\\\]\\\[8\\\];" 1 "fre1" } } */
> 
> In the tree where I've been testing your patch, I've not been seeing
> 'MEM[x]' but '(*x)', therefore:
> 
>     -/* { dg-final { scan-tree-dump-times "r\.\[_0-9\]+ = MEM\\\[\[^_\]+_\[0-9\]+\\(ptro\\)\\\]\\\[8\\\];" 2 "phiprop1" } } */
>     -/* { dg-final { scan-tree-dump-times "r\.\[_0-9\]+ = MEM\\\[\[^_\]+_\[0-9\]+\\(ptro\\)\\\]\\\[8\\\];" 1 "fre1" } } */
>     +/* { dg-final { scan-tree-dump-times "r\.\[_0-9\]+ = \\(\\\*_\[0-9\]+\\(ptro\\)\\)\\\[8\\\];" 2 "phiprop1" } } */
>     +/* { dg-final { scan-tree-dump-times "r\.\[_0-9\]+ = \\(\\\*_\[0-9\]+\\(ptro\\)\\)\\\[8\\\];" 1 "fre1" } } */
> 
> Maybe that's due to something else in my (long...) Git branch; just make
> sure you've got PASSes here, eventually.
> 
> > --- a/gcc/testsuite/gfortran.dg/goacc/readonly-1.f90
> > +++ b/gcc/testsuite/gfortran.dg/goacc/readonly-1.f90
> > @@ -80,16 +80,16 @@ end program main
> >  ! The front end turns OpenACC 'declare' into OpenACC 'data'.
> >  !   { dg-final { scan-tree-dump-times "(?n)#pragma acc data map\\(readonly,to:\\*b\\) map\\(alloc:b.+ map\\(to:\\*c\\) map\\(alloc:c.+" 1 "original" } }
> >  !   { dg-final { scan-tree-dump-times "(?n)#pragma acc data map\\(readonly,to:g\\) map\\(to:h\\)" 1 "original" } }
> > -! { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(readonly,to:\\*.+ map\\(alloc:a.+ map\\(readonly,to:\\*.+ map\\(alloc:b.+ map\\(to:\\*.+ map\\(alloc:c.+" 1 "original" } }
> > -! { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(readonly,to:a.+ map\\(alloc:a.+ map\\(readonly,to:b.+ map\\(alloc:b.+ map\\(to:c.+ map\\(alloc:c.+" 1 "original" } }
> > -! { dg-final { scan-tree-dump-times "(?n)#pragma acc kernels map\\(readonly,to:\\*.+ map\\(alloc:a.+ map\\(readonly,to:\\*.+ map\\(alloc:b.+ map\\(to:\\*.+ map\\(alloc:c.+" 1 "original" } }
> > -! { dg-final { scan-tree-dump-times "(?n)#pragma acc kernels map\\(readonly,to:a.+ map\\(alloc:a.+ map\\(readonly,to:b.+ map\\(alloc:b.+ map\\(to:c.+ map\\(alloc:c.+" 1 "original" } }
> > -! { dg-final { scan-tree-dump-times "(?n)#pragma acc serial map\\(readonly,to:\\*.+ map\\(alloc:a.+ map\\(readonly,to:\\*.+ map\\(alloc:b.+ map\\(to:\\*.+ map\\(alloc:c.+" 1 "original" } }
> > -! { dg-final { scan-tree-dump-times "(?n)#pragma acc serial map\\(readonly,to:a.+ map\\(alloc:a.+ map\\(readonly,to:b.+ map\\(alloc:b.+ map\\(to:c.+ map\\(alloc:c.+" 1 "original" } }
> > -! { dg-final { scan-tree-dump-times "(?n)#pragma acc data map\\(readonly,to:\\*.+ map\\(alloc:a.+ map\\(readonly,to:\\*.+ map\\(alloc:b.+ map\\(to:\\*.+ map\\(alloc:c.+" 1 "original" } }
> > -! { dg-final { scan-tree-dump-times "(?n)#pragma acc data map\\(readonly,to:a.+ map\\(alloc:a.+ map\\(readonly,to:b.+ map\\(alloc:b.+ map\\(to:c.+ map\\(alloc:c.+" 1 "original" } }
> > -! { dg-final { scan-tree-dump-times "(?n)#pragma acc enter data map\\(readonly,to:\\*.+ map\\(alloc:a.+ map\\(readonly,to:\\*.+ map\\(alloc:b.+ map\\(to:\\*.+ map\\(alloc:c.+" 1 "original" } }
> > -! { dg-final { scan-tree-dump-times "(?n)#pragma acc enter data map\\(readonly,to:a.+ map\\(alloc:a.+ map\\(readonly,to:b.+ map\\(alloc:b.+ map\\(to:c.+ map\\(alloc:c.+" 1 "original" } }
> > +! { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(readonly,to:\\*.+ map\\(pt_readonly,alloc:a.+ map\\(readonly,to:\\*.+ map\\(pt_readonly,alloc:b.+ map\\(to:\\*.+ map\\(alloc:c.+" 1 "original" } }
> > +! { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(readonly,to:a.+ map\\(pt_readonly,alloc:a.+ map\\(readonly,to:b.+ map\\(pt_readonly,alloc:b.+ map\\(to:c.+ map\\(alloc:c.+" 1 "original" } }
> > +! { dg-final { scan-tree-dump-times "(?n)#pragma acc kernels map\\(readonly,to:\\*.+ map\\(pt_readonly,alloc:a.+ map\\(readonly,to:\\*.+ map\\(pt_readonly,alloc:b.+ map\\(to:\\*.+ map\\(alloc:c.+" 1 "original" } }
> > +! { dg-final { scan-tree-dump-times "(?n)#pragma acc kernels map\\(readonly,to:a.+ map\\(pt_readonly,alloc:a.+ map\\(readonly,to:b.+ map\\(pt_readonly,alloc:b.+ map\\(to:c.+ map\\(alloc:c.+" 1 "original" } }
> > +! { dg-final { scan-tree-dump-times "(?n)#pragma acc serial map\\(readonly,to:\\*.+ map\\(pt_readonly,alloc:a.+ map\\(readonly,to:\\*.+ map\\(pt_readonly,alloc:b.+ map\\(to:\\*.+ map\\(alloc:c.+" 1 "original" } }
> > +! { dg-final { scan-tree-dump-times "(?n)#pragma acc serial map\\(readonly,to:a.+ map\\(pt_readonly,alloc:a.+ map\\(readonly,to:b.+ map\\(pt_readonly,alloc:b.+ map\\(to:c.+ map\\(alloc:c.+" 1 "original" } }
> > +! { dg-final { scan-tree-dump-times "(?n)#pragma acc data map\\(readonly,to:\\*.+ map\\(pt_readonly,alloc:a.+ map\\(readonly,to:\\*.+ map\\(pt_readonly,alloc:b.+ map\\(to:\\*.+ map\\(alloc:c.+" 1 "original" } }
> > +! { dg-final { scan-tree-dump-times "(?n)#pragma acc data map\\(readonly,to:a.+ map\\(pt_readonly,alloc:a.+ map\\(readonly,to:b.+ map\\(pt_readonly,alloc:b.+ map\\(to:c.+ map\\(alloc:c.+" 1 "original" } }
> > +! { dg-final { scan-tree-dump-times "(?n)#pragma acc enter data map\\(readonly,to:\\*.+ map\\(pt_readonly,alloc:a.+ map\\(readonly,to:\\*.+ map\\(pt_readonly,alloc:b.+ map\\(to:\\*.+ map\\(alloc:c.+" 1 "original" } }
> > +! { dg-final { scan-tree-dump-times "(?n)#pragma acc enter data map\\(readonly,to:a.+ map\\(pt_readonly,alloc:a.+ map\\(readonly,to:b.+ map\\(pt_readonly,alloc:b.+ map\\(to:c.+ map\\(alloc:c.+" 1 "original" } }
> >  
> >  ! { dg-final { scan-tree-dump-times "(?n)#pragma acc cache \\(readonly:\\*\\(integer\\(kind=4\\)\\\[0:\\\] \\*\\) parm.*data \\\[len: .+\\\]\\) \\(readonly:\\*\\(integer\\(kind=4\\)\\\[0:\\\] \\*\\) parm.*data \\\[len: .+\\\]\\);" 8 "original" } }
> >  ! { dg-final { scan-tree-dump-times "(?n)#pragma acc cache \\(\\*\\(integer\\(kind=4\\)\\\[0:\\\] \\*\\) parm.*data \\\[len: .+\\\]\\);" 8 "original" } }
> 
> Can we also get an OpenACC/Fortran test case à la
> 'c-c++-common/goacc/readonly-2.c' to demonstrate this doing something?
> 
> > --- a/gcc/testsuite/gfortran.dg/pr67170.f90
> > +++ b/gcc/testsuite/gfortran.dg/pr67170.f90
> > @@ -28,4 +28,4 @@ end subroutine foo
> >  end module test_module
> >  end program
> >  
> > -! { dg-final { scan-tree-dump-times "= \\*arg_\[0-9\]+\\(D\\);" 1 "fre1" } }
> > +! { dg-final { scan-tree-dump-times "= \\*arg_\[0-9\]+\\(D\\)\\(ptro\\);" 1 "fre1" } }
> 
> Is it understood what's happening here, that this is the correct
> behavior?  I suppose so -- there's no actual change in behavior -- as
> this here doesn't trigger for OpenACC 'readonly' modifier, but just the
> pretty printer change for 'SSA_NAME_POINTS_TO_READONLY_MEMORY':
> 
> > --- a/gcc/tree-pretty-print.cc
> > +++ b/gcc/tree-pretty-print.cc
> > @@ -915,6 +915,8 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, dump_flags_t flags)
> >        pp_string (pp, "map(");
> >        if (OMP_CLAUSE_MAP_READONLY (clause))
> >  	pp_string (pp, "readonly,");
> > +      if (OMP_CLAUSE_MAP_POINTS_TO_READONLY (clause))
> > +	pp_string (pp, "pt_readonly,");
> >        switch (OMP_CLAUSE_MAP_KIND (clause))
> >  	{
> >  	case GOMP_MAP_ALLOC:
> > @@ -3620,6 +3622,8 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
> >  	pp_string (pp, "(D)");
> >        if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (node))
> >  	pp_string (pp, "(ab)");
> > +      if (SSA_NAME_POINTS_TO_READONLY_MEMORY (node))
> > +	pp_string (pp, "(ptro)");
> >        break;
> >  
> >      case WITH_SIZE_EXPR:
> 
> > --- a/gcc/tree-ssanames.cc
> > +++ b/gcc/tree-ssanames.cc
> > @@ -402,6 +402,9 @@ make_ssa_name_fn (struct function *fn, tree var, gimple *stmt,
> >    else
> >      SSA_NAME_RANGE_INFO (t) = NULL;
> >  
> > +  if (VAR_P (var) && DECL_POINTS_TO_READONLY (var))
> > +    SSA_NAME_POINTS_TO_READONLY_MEMORY (t) = 1;
> > +
> >    SSA_NAME_IN_FREE_LIST (t) = 0;
> >    SSA_NAME_IS_DEFAULT_DEF (t) = 0;
> >    init_ssa_name_imm_use (t);
> 
> > --- a/gcc/tree.h
> > +++ b/gcc/tree.h
> > @@ -1036,6 +1036,13 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int,
> >  #define DECL_HIDDEN_STRING_LENGTH(NODE) \
> >    (TREE_CHECK (NODE, PARM_DECL)->decl_common.decl_nonshareable_flag)
> >  
> > +/* In a VAR_DECL, set for variables regarded as pointing to memory not written
> > +   to. SSA_NAME_POINTS_TO_READONLY_MEMORY gets set for SSA_NAMEs created from
> > +   such VAR_DECLs. Currently used by OpenACC 'readonly' modifier in copyin
> > +   clauses.  */
> > +#define DECL_POINTS_TO_READONLY(NODE) \
> > +  (TREE_CHECK (NODE, VAR_DECL)->decl_common.decl_not_flexarray)
> 
> Again update the table for the flag uses are listed?
> 
> (There is a 'VAR_DECL_CHECK', which hopefully means the same thing.)
> 
> > +
> >  /* In a CALL_EXPR, means that the call is the jump from a thunk to the
> >     thunked-to function.  Be careful to avoid using this macro when one of the
> >     next two applies instead.  */
> > @@ -1845,6 +1852,10 @@ class auto_suppress_location_wrappers
> >  #define OMP_CLAUSE_MAP_READONLY(NODE) \
> >    TREE_READONLY (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP))
> >  
> > +/* Set if 'OMP_CLAUSE_DECL (NODE)' points to read-only memory.  */
> > +#define OMP_CLAUSE_MAP_POINTS_TO_READONLY(NODE) \
> > +  TREE_CONSTANT (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP))
> > +
> >  /* Same as above, for use in OpenACC cache directives.  */
> >  #define OMP_CLAUSE__CACHE__READONLY(NODE) \
> >    TREE_READONLY (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__CACHE_))
> 
> (Note, corresponding 'OMP_CLAUSE_MAP_POINTS_TO_READONLY' doesn't exist
> yet, due to missing actual handling of the OpenACC 'cache' directive;
> 'gcc/gimplify.cc:gimplify_oacc_cache'.)
> 
> 
> Grüße
>  Thomas
>
Richard Biener May 16, 2024, 12:36 p.m. UTC | #6
On Wed, 3 Apr 2024, Chung-Lin Tang wrote:

> Hi Richard, Thomas,
> 
> On 2023/10/30 8:46 PM, Richard Biener wrote:
> >>
> >> What Chung-Lin's first patch does is mark the OMP clause for 'x' (not the
> >> 'x' decl itself!) as 'readonly', via a new 'OMP_CLAUSE_MAP_READONLY'
> >> flag.
> >>
> >> The actual optimization then is done in this second patch.  Chung-Lin
> >> found that he could use 'SSA_NAME_POINTS_TO_READONLY_MEMORY' for that.
> >> I don't have much experience with most of the following generic code, so
> >> would appreciate a helping hand, whether that conceptually makes sense as
> >> well as from the implementation point of view:
> 
> First of all, I have removed all of the gimplify-stage scanning and setting of
> DECL_POINTS_TO_READONLY and SSA_NAME_POINTS_TO_READONLY_MEMORY (so no changes to
> gimplify.cc now)
> 
> I remember this code was an artifact of earlier attempts to allow struct-member
> pointer mappings to also work (e.g. map(readonly:rec.ptr[:N])), but failed anyways.
> I think the omp_data_* member accesses when building child function side
> receiver_refs is blocking points-to analysis from working (didn't try digging deeper)
> 
> Also during gimplify, VAR_DECLs appeared to be reused (at least in some cases) for map
> clause decl reference building, so hoping that the variables "happen to be" single-use and
> DECL_POINTS_TO_READONLY relaying into SSA_NAME_POINTS_TO_READONLY_MEMORY does appear to be
> a little risky.
> 
> However, for firstprivate pointers processed during omp-low, it appears to be somewhat different.
> (see below description)
> 
> > No, I don't think you can use that flag on non-default-defs, nor
> > preserve it on copying.  So
> > it also doesn't nicely extend to DECLs as done by the patch.  We
> > currently _only_ use it
> > for incoming parameters.  When used on arbitrary code you can get to for example
> > 
> > ptr1(points-to-readony-memory) = &p->x;
> > ... access via ptr1 ...
> > ptr2 = &p->x;
> > ... access via ptr2 ...
> > 
> > where both are your OMP regions differently constrained (the constrain is on the
> > code in the region, _not_ on the actual protections of the pointed to
> > data, much like
> > for the fortran case).  But now CSE comes along and happily replaces all ptr2
> > with ptr2 in the second region and ... oops!
> 
> Richard, I assume what you meant was "happily replaces all ptr2 with ptr1 in the second region"?
> 
> That doesn't happen, because during omp-lower/expand, OMP target regions (which is all that
> this applies currently) is separated into different individual child functions.
> 
> (Currently, the only "effective" use of DECL_POINTS_TO_READONLY is during omp-lower, when
> for firstprivate pointers (i.e. 'a' here) we set this bit when constructing the first load
> of this pointer)
> 
>   #pragma acc parallel copyin(readonly: a[:32]) copyout(r)
>   {
>     foo (a, a[8]);
>     r = a[8];
>   }
>   #pragma acc parallel copyin(readonly: a[:32]) copyout(r)
>   {
>     foo (a, a[12]);
>     r = a[12];
>   }
> 
> After omp-expand (before SSA):
> 
> __attribute__((oacc parallel, omp target entrypoint, noclone))
> void main._omp_fn.1 (const struct .omp_data_t.3 & restrict .omp_data_i)
> {
>  ...
>   <bb 5> :
>   D.2962 = .omp_data_i->D.2947;
>   a.8 = D.2962;

So 'readonly: a[:32]' is put in .omp_data_i->D.2947 in the caller
and extracted here.  And you arrange for 'a.8' to have
DECL_POINTS_TO_READONLY set by "magic"?  Looking at this I wonder
if it would be more useful to "const qualify" (but "really", not
in the C sense) .omp_data_i->D.2947 instead?  Thus have a
FIELD_POINTS_TO_READONLY_MEMORY flag on the FIELD_DECL.

Points-to analysis should then be able to handle this similar to how
it handles loads of restrict qualified pointers.  Well, of course not
as simple since it now adds "qualifiers" to storage since I presume
the same object can be both readonly and not readonly like via

 #pragma acc parallel copyin(readonly: a[:32], a[33:64]) copyout(r)

?  That is, currently there's only one "readonly" object kind in
points-to, that's STRING_CSTs which get all globbed to string_id
and "ignored" for alias purposes since you can't change them.

So possibly you want to combine this with restrict qualifying the
pointer so we know there's no other (read-write) access to the memory
possible.  But then you might get all the good stuff already by
_just_ doing that restrict qualification and ignoring the readonly-ness?

>   r.1 = (*a.8)[12];
>   foo (a.8, r.1);
>   r.1 = (*a.8)[12];
>   D.2965 = .omp_data_i->r;
>   *D.2965 = r.1;
>   return;
> }
> 
> __attribute__((oacc parallel, omp target entrypoint, noclone))
> void main._omp_fn.0 (const struct .omp_data_t.2 & restrict .omp_data_i)
> {
>   ...
>   <bb 3> :
>   D.2968 = .omp_data_i->D.2939;
>   a.4 = D.2968;
>   r.0 = (*a.4)[8];
>   foo (a.4, r.0);
>   r.0 = (*a.4)[8];
>   D.2971 = .omp_data_i->r;
>   *D.2971 = r.0;
>   return;
> }
> 
> So actually, the creating of DECL_POINTS_TO_READONLY and its relaying to
> SSA_NAME_POINTS_TO_READONLY_MEMORY here, is actually quite similar to a default-def
> for an PARM_DECL, at least conceptually.
> 
> (If offloading was structured significantly differently, say if child functions
> were separated much earlier before omp-lowering, than this readonly-modifier might
> possibly be a direct application of 'r' in the "fn spec" attribute)
> 
> Other changes since first version of patch include:
> 1) update of C/C++ FE changes to new style in c-family/c-omp.cc
> 2) merging of two if cases in fortran/trans-openmp.cc like Thomas suggested
> 3) Update of readonly-2.c testcase to scan before/after "fre1" pass, to verify removal of a MEM load, also as Thomas suggested.
> 
> I have re-tested this patch using mainline, with no regressions. Is this 
> okay for mainline?

+/* In a VAR_DECL, set for variables regarded as pointing to memory not 
written
+   to. SSA_NAME_POINTS_TO_READONLY_MEMORY gets set for SSA_NAMEs created 
from
+   such VAR_DECLs. Currently used by OpenACC 'readonly' modifier in 
copyin
+   clauses.  */
+#define DECL_POINTS_TO_READONLY(NODE) \
+  (TREE_CHECK (NODE, VAR_DECL)->decl_common.decl_not_flexarray)

you need to document uses of flags in tree-core.h to avoid clashes.
Also since this doesn't apply to all DECLs it should be named
VAR_POINTS_TO_...

I still think this is too fragile - there's no real constraints
on what VAR_DECL we create SSA names off, so the automatism
in make_ssa_name_fn and esp. copy_var_decl and via copy_node
copy_decl_no_change, thus during inlining, makes your arguments
only apply to the use for OpenMP - but nothing above hints at
this is just usable there, asking for trouble.

Sorry for the delay,
Richard.

> Thanks,
> Chung-Lin
> 
> 2024-04-03  Chung-Lin Tang  <cltang@baylibre.com>
> 
> gcc/c-family/ChangeLog:
> 
> 	* c-omp.cc (c_omp_address_inspector::expand_array_base):
> 	Set OMP_CLAUSE_MAP_POINTS_TO_READONLY on pointer clause.
> 	(c_omp_address_inspector::expand_component_selector): Likewise.
> 
> gcc/fortran/ChangeLog:
> 
> 	* trans-openmp.cc (gfc_trans_omp_array_section):
> 	Set OMP_CLAUSE_MAP_POINTS_TO_READONLY on pointer clause.
> 
> gcc/ChangeLog:
> 
> 	* gimple-expr.cc (copy_var_decl): Copy DECL_POINTS_TO_READONLY
> 	for VAR_DECLs.
> 	* omp-low.cc (lower_omp_target): Set DECL_POINTS_TO_READONLY for
> 	variables of receiver refs.
> 	* tree-pretty-print.cc (dump_omp_clause):
> 	Print OMP_CLAUSE_MAP_POINTS_TO_READONLY.
> 	(dump_generic_node): Print SSA_NAME_POINTS_TO_READONLY_MEMORY.
> 	* tree-ssanames.cc (make_ssa_name_fn): Set
> 	SSA_NAME_POINTS_TO_READONLY_MEMORY if DECL_POINTS_TO_READONLY is set.
> 	* tree.h (DECL_POINTS_TO_READONLY): New macro.
> 	(OMP_CLAUSE_MAP_POINTS_TO_READONLY): New macro.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* c-c++-common/goacc/readonly-1.c: Adjust testcase.
> 	* c-c++-common/goacc/readonly-2.c: New testcase.
> 	* gfortran.dg/goacc/readonly-1.f90: Adjust testcase.
> 
> 
> 
> 
> 
> 
> 
> 
> 
>
diff mbox series

Patch

diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 7cf411155c6..42591e4029a 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -14258,6 +14258,8 @@  handle_omp_array_sections (tree c, enum c_omp_region_type ort)
 	OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_ATTACH_DETACH);
       else
 	OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_FIRSTPRIVATE_POINTER);
+      if (OMP_CLAUSE_MAP_READONLY (c))
+	OMP_CLAUSE_MAP_POINTS_TO_READONLY (c2) = 1;
       OMP_CLAUSE_MAP_IMPLICIT (c2) = OMP_CLAUSE_MAP_IMPLICIT (c);
       if (OMP_CLAUSE_MAP_KIND (c2) != GOMP_MAP_FIRSTPRIVATE_POINTER
 	  && !c_mark_addressable (t))
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 8fb47fd179e..6ab467e1140 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -5872,6 +5872,8 @@  handle_omp_array_sections (tree c, enum c_omp_region_type ort)
 	    }
 	  else
 	    OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_FIRSTPRIVATE_POINTER);
+	  if (OMP_CLAUSE_MAP_READONLY (c))
+	    OMP_CLAUSE_MAP_POINTS_TO_READONLY (c2) = 1;
 	  OMP_CLAUSE_MAP_IMPLICIT (c2) = OMP_CLAUSE_MAP_IMPLICIT (c);
 	  if (OMP_CLAUSE_MAP_KIND (c2) != GOMP_MAP_FIRSTPRIVATE_POINTER
 	      && !cxx_mark_addressable (t))
diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 2253d559f9c..d7cd65af1bb 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -2524,6 +2524,8 @@  gfc_trans_omp_array_section (stmtblock_t *block, gfc_exec_op op,
       node3 = build_omp_clause (input_location, OMP_CLAUSE_MAP);
       OMP_CLAUSE_SET_MAP_KIND (node3, ptr_kind);
       OMP_CLAUSE_DECL (node3) = gfc_conv_descriptor_data_get (decl);
+      if (n->u.readonly)
+	OMP_CLAUSE_MAP_POINTS_TO_READONLY (node3) = 1;
       /* This purposely does not include GOMP_MAP_ALWAYS_POINTER.  The extra
 	 cast prevents gimplify.cc from recognising it as being part of the
 	 struct - and adding an 'alloc: for the 'desc.data' pointer, which
@@ -2559,6 +2561,8 @@  gfc_trans_omp_array_section (stmtblock_t *block, gfc_exec_op op,
 				OMP_CLAUSE_MAP);
       OMP_CLAUSE_SET_MAP_KIND (node3, ptr_kind);
       OMP_CLAUSE_DECL (node3) = decl;
+      if (n->u.readonly)
+	OMP_CLAUSE_MAP_POINTS_TO_READONLY (node3) = 1;
     }
   ptr2 = fold_convert (ptrdiff_type_node, ptr2);
   OMP_CLAUSE_SIZE (node3) = fold_build2 (MINUS_EXPR, ptrdiff_type_node,
diff --git a/gcc/gimple-expr.cc b/gcc/gimple-expr.cc
index f15cc0ba715..42c0f6469b1 100644
--- a/gcc/gimple-expr.cc
+++ b/gcc/gimple-expr.cc
@@ -376,6 +376,8 @@  copy_var_decl (tree var, tree name, tree type)
   DECL_CONTEXT (copy) = DECL_CONTEXT (var);
   TREE_USED (copy) = 1;
   DECL_SEEN_IN_BIND_EXPR_P (copy) = 1;
+  if (VAR_P (var))
+    DECL_POINTS_TO_READONLY (copy) = DECL_POINTS_TO_READONLY (var);
   DECL_ATTRIBUTES (copy) = DECL_ATTRIBUTES (var);
   if (DECL_USER_ALIGN (var))
     {
diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 36e5df050b9..394e40fead2 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -221,6 +221,7 @@  struct gimplify_omp_ctx
   splay_tree variables;
   hash_set<tree> *privatized_types;
   tree clauses;
+  hash_set<tree_operand_hash> *pt_readonly_ptrs;
   /* Iteration variables in an OMP_FOR.  */
   vec<tree> loop_iter_var;
   location_t location;
@@ -628,6 +629,15 @@  internal_get_tmp_var (tree val, gimple_seq *pre_p, gimple_seq *post_p,
   gimplify_expr (&val, pre_p, post_p, is_gimple_reg_rhs_or_call,
 		 fb_rvalue);
 
+  bool pt_readonly = false;
+  if (gimplify_omp_ctxp && gimplify_omp_ctxp->pt_readonly_ptrs)
+    {
+      tree ptr = val;
+      if (TREE_CODE (ptr) == POINTER_PLUS_EXPR)
+	ptr = TREE_OPERAND (ptr, 0);
+      pt_readonly = gimplify_omp_ctxp->pt_readonly_ptrs->contains (ptr);
+    }
+
   if (allow_ssa
       && gimplify_ctxp->into_ssa
       && is_gimple_reg_type (TREE_TYPE (val)))
@@ -639,9 +649,18 @@  internal_get_tmp_var (tree val, gimple_seq *pre_p, gimple_seq *post_p,
 	  if (name)
 	    SET_SSA_NAME_VAR_OR_IDENTIFIER (t, create_tmp_var_name (name));
 	}
+      if (pt_readonly)
+	SSA_NAME_POINTS_TO_READONLY_MEMORY (t) = 1;
     }
   else
-    t = lookup_tmp_var (val, is_formal, not_gimple_reg);
+    {
+      t = lookup_tmp_var (val, is_formal, not_gimple_reg);
+      if (pt_readonly)
+	{
+	  DECL_POINTS_TO_READONLY (t) = 1;
+	  gimplify_omp_ctxp->pt_readonly_ptrs->add (t);
+	}
+    }
 
   mod = build2 (INIT_EXPR, TREE_TYPE (t), t, unshare_expr (val));
 
@@ -8906,6 +8925,8 @@  build_omp_struct_comp_nodes (enum tree_code code, tree grp_start, tree grp_end,
   OMP_CLAUSE_SET_MAP_KIND (c2, mkind);
   OMP_CLAUSE_DECL (c2) = unshare_expr (OMP_CLAUSE_DECL (grp_end));
   OMP_CLAUSE_CHAIN (c2) = NULL_TREE;
+  if (OMP_CLAUSE_MAP_POINTS_TO_READONLY (grp_end))
+    OMP_CLAUSE_MAP_POINTS_TO_READONLY (c2) = 1;
   tree grp_mid = NULL_TREE;
   if (OMP_CLAUSE_CHAIN (grp_start) != grp_end)
     grp_mid = OMP_CLAUSE_CHAIN (grp_start);
@@ -11741,6 +11762,16 @@  gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
 
 	      gimplify_omp_ctxp = outer_ctx;
 	    }
+	  else if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
+		   && (code == OACC_PARALLEL
+		       || code == OACC_KERNELS
+		       || code == OACC_SERIAL)
+		   && OMP_CLAUSE_MAP_POINTS_TO_READONLY (c))
+	    {
+	      if (ctx->pt_readonly_ptrs == NULL)
+		ctx->pt_readonly_ptrs = new hash_set<tree_operand_hash> ();
+	      ctx->pt_readonly_ptrs->add (OMP_CLAUSE_DECL (c));
+	    }
 	  if (notice_outer)
 	    goto do_notice;
 	  break;
diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index b882df048ef..204fc72ca2d 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -14098,6 +14098,8 @@  lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 		if (ref_to_array)
 		  x = fold_convert_loc (clause_loc, TREE_TYPE (new_var), x);
 		gimplify_expr (&x, &new_body, NULL, is_gimple_val, fb_rvalue);
+		if (OMP_CLAUSE_MAP_POINTS_TO_READONLY (c) && VAR_P (x))
+		  DECL_POINTS_TO_READONLY (x) = 1;
 		if ((is_ref && !ref_to_array)
 		    || ref_to_ptr)
 		  {
diff --git a/gcc/testsuite/c-c++-common/goacc/readonly-1.c b/gcc/testsuite/c-c++-common/goacc/readonly-1.c
index 171f96c08db..1f10fd25e46 100644
--- a/gcc/testsuite/c-c++-common/goacc/readonly-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/readonly-1.c
@@ -19,8 +19,8 @@  int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(readonly,to:\\*s.ptr \\\[len: 64\\\]\\) .+ map\\(readonly,to:x\\\[0\\\] \\\[len: 128\\\]\\)" 1 "original" { target { c } } } } */
-/* { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(readonly,to:\\*NON_LVALUE_EXPR <s.ptr> \\\[len: 64\\\]\\) .+ map\\(readonly,to:x\\\[0\\\] \\\[len: 128\\\]\\)" 1 "original" { target { c++ } } } } */
+/* { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(readonly,to:\\*s.ptr \\\[len: 64\\\]\\) map\\(pt_readonly,attach_detach:s.ptr \\\[bias: 0\\\]\\) map\\(readonly,to:x\\\[0\\\] \\\[len: 128\\\]\\) map\\(pt_readonly,firstprivate:x \\\[pointer assign, bias: 0\\\]\\)" 1 "original" { target { c } } } } */
+/* { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(readonly,to:\\*NON_LVALUE_EXPR <s.ptr> \\\[len: 64\\\]\\) map\\(pt_readonly,attach_detach:s.ptr \\\[bias: 0\\\]\\) map\\(readonly,to:x\\\[0\\\] \\\[len: 128\\\]\\) map\\(pt_readonly,firstprivate:x \\\[pointer assign, bias: 0\\\]\\)" 1 "original" { target { c++ } } } } */
 /* { dg-final { scan-tree-dump-times "(?n)#pragma acc cache \\(readonly:x\\\[0\\\] \\\[len: 128\\\]\\);$" 1 "original" } } */
 
 
diff --git a/gcc/testsuite/c-c++-common/goacc/readonly-2.c b/gcc/testsuite/c-c++-common/goacc/readonly-2.c
new file mode 100644
index 00000000000..d32d3362000
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/readonly-2.c
@@ -0,0 +1,15 @@ 
+/* { dg-additional-options "-O -fdump-tree-fre" } */
+
+#pragma acc routine
+extern void foo (int *ptr, int val);
+
+int main (void)
+{
+  int r, a[32];
+  #pragma acc parallel copyin(readonly: a[:32]) copyout(r)
+  {
+    foo (a, a[8]);
+    r = a[8];
+  }
+}
+/* { dg-final { scan-tree-dump-times "r\.\[_0-9\]+ = MEM\\\[\[^_\]+_\[0-9\]+\\(ptro\\)\\\]\\\[8\\\];" 1 "fre1" } } */
diff --git a/gcc/testsuite/gfortran.dg/goacc/readonly-1.f90 b/gcc/testsuite/gfortran.dg/goacc/readonly-1.f90
index 069fec0a0d5..1e5e60f9744 100644
--- a/gcc/testsuite/gfortran.dg/goacc/readonly-1.f90
+++ b/gcc/testsuite/gfortran.dg/goacc/readonly-1.f90
@@ -20,8 +20,8 @@  program main
   !$acc end parallel
 end program main
 
-! { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(readonly,to:\\*\\(integer\\(kind=4\\)\\\[0:\\\] \\*\\) parm.*data \\\[len: .+\\\]\\) .+ map\\(readonly,to:\\*\\(integer\\(kind=4\\)\\\[0:\\\] \\*\\) parm.*data \\\[len: .+\\\]\\)" 1 "original" } }
-! { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(readonly,to:a\\\[\\(\\(integer\\(kind=8\\)\\) parm.*data - \\(integer\\(kind=8\\)\\) &a\\) / 4\\\] \\\[len: .+\\\]\\) .+ map\\(readonly,to:b\\\[\\(\\(integer\\(kind=8\\)\\) parm.*data - \\(integer\\(kind=8\\)\\) &b\\) / 4\\\] \\\[len: .+\\\]\\)" 1 "original" } }
+! { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(readonly,to:\\*\\(integer\\(kind=4\\)\\\[0:\\\] \\*\\) parm.*data \\\[len: .+\\\]\\) map\\(pt_readonly,alloc:a.0 \\\[pointer assign, bias: \\(integer\\(kind=8\\)\\) parm.*data - \\(integer\\(kind=8\\)\\) a.0\\\]\\) map\\(readonly,to:\\*\\(integer\\(kind=4\\)\\\[0:\\\] \\*\\) parm.*data \\\[len: .+\\\]\\) map\\(pt_readonly,alloc:b \\\[pointer assign, bias: \\(integer\\(kind=8\\)\\) parm.*data - \\(integer\\(kind=8\\)\\) b\\\]\\)" 1 "original" } }
+! { dg-final { scan-tree-dump-times "(?n)#pragma acc parallel map\\(readonly,to:a\\\[\\(\\(integer\\(kind=8\\)\\) parm.*data - \\(integer\\(kind=8\\)\\) &a\\) / 4\\\] \\\[len: .+\\\]\\) map\\(pt_readonly,alloc:a \\\[pointer assign, bias: \\(integer\\(kind=8\\)\\) parm.*data - \\(integer\\(kind=8\\)\\) &a\\\]\\) map\\(readonly,to:b\\\[\\(\\(integer\\(kind=8\\)\\) parm.*data - \\(integer\\(kind=8\\)\\) &b\\) / 4\\\] \\\[len: .+\\\]\\) map\\(pt_readonly,alloc:b \\\[pointer assign, bias: \\(integer\\(kind=8\\)\\) parm.*data - \\(integer\\(kind=8\\)\\) &b\\\]\\)" 1 "original" } }
 ! { dg-final { scan-tree-dump-times "(?n)#pragma acc cache \\(readonly:\\*\\(integer\\(kind=4\\)\\\[0:\\\] \\*\\) parm.*data \\\[len: .+\\\]\\) \\(readonly:\\*\\(integer\\(kind=4\\)\\\[0:\\\] \\*\\) parm.*data \\\[len: .+\\\]\\);" 2 "original" } }
 
 
diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
index 9604c3eecc5..1a8b121f30b 100644
--- a/gcc/tree-pretty-print.cc
+++ b/gcc/tree-pretty-print.cc
@@ -907,6 +907,8 @@  dump_omp_clause (pretty_printer *pp, tree clause, int spc, dump_flags_t flags)
       pp_string (pp, "map(");
       if (OMP_CLAUSE_MAP_READONLY (clause))
 	pp_string (pp, "readonly,");
+      if (OMP_CLAUSE_MAP_POINTS_TO_READONLY (clause))
+	pp_string (pp, "pt_readonly,");
       switch (OMP_CLAUSE_MAP_KIND (clause))
 	{
 	case GOMP_MAP_ALLOC:
@@ -3436,6 +3438,8 @@  dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
 	pp_string (pp, "(D)");
       if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (node))
 	pp_string (pp, "(ab)");
+      if (SSA_NAME_POINTS_TO_READONLY_MEMORY (node))
+	pp_string (pp, "(ptro)");
       break;
 
     case WITH_SIZE_EXPR:
diff --git a/gcc/tree-ssanames.cc b/gcc/tree-ssanames.cc
index 23387b90fe3..32d35a29dfc 100644
--- a/gcc/tree-ssanames.cc
+++ b/gcc/tree-ssanames.cc
@@ -402,6 +402,9 @@  make_ssa_name_fn (struct function *fn, tree var, gimple *stmt,
   else
     SSA_NAME_RANGE_INFO (t) = NULL;
 
+  if (VAR_P (var) && DECL_POINTS_TO_READONLY (var))
+    SSA_NAME_POINTS_TO_READONLY_MEMORY (t) = 1;
+
   SSA_NAME_IN_FREE_LIST (t) = 0;
   SSA_NAME_IS_DEFAULT_DEF (t) = 0;
   init_ssa_name_imm_use (t);
diff --git a/gcc/tree.h b/gcc/tree.h
index ac563de1fc3..880ffb367a3 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1021,6 +1021,13 @@  extern void omp_clause_range_check_failed (const_tree, const char *, int,
 #define DECL_HIDDEN_STRING_LENGTH(NODE) \
   (TREE_CHECK (NODE, PARM_DECL)->decl_common.decl_nonshareable_flag)
 
+/* In a VAR_DECL, set for variables regarded as pointing to memory not written
+   to. SSA_NAME_POINTS_TO_READONLY_MEMORY gets set for SSA_NAMEs created from
+   such VAR_DECLs. Currently used by OpenACC 'readonly' modifier in copyin
+   clauses.  */
+#define DECL_POINTS_TO_READONLY(NODE) \
+  (TREE_CHECK (NODE, VAR_DECL)->decl_common.decl_not_flexarray)
+
 /* In a CALL_EXPR, means that the call is the jump from a thunk to the
    thunked-to function.  Be careful to avoid using this macro when one of the
    next two applies instead.  */
@@ -1815,6 +1822,10 @@  class auto_suppress_location_wrappers
 #define OMP_CLAUSE_MAP_READONLY(NODE) \
   TREE_READONLY (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP))
 
+/* Set if 'OMP_CLAUSE_DECL (NODE)' points to read-only memory.  */
+#define OMP_CLAUSE_MAP_POINTS_TO_READONLY(NODE) \
+  TREE_CONSTANT (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP))
+
 /* Same as above, for use in OpenACC cache directives.  */
 #define OMP_CLAUSE__CACHE__READONLY(NODE) \
   TREE_READONLY (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__CACHE_))