diff mbox series

Make memory copy functions scalar storage order barriers

Message ID 3410100.ZmJ22YzLrf@polaris
State New
Headers show
Series Make memory copy functions scalar storage order barriers | expand

Commit Message

Eric Botcazou June 1, 2020, 8:38 a.m. UTC
Hi,

this addresses the issue raised by Andrew a few weeks ago about the usage of 
memory copy functions to toggle the scalar storage order.  Recall that you 
cannot (the compiler errors out) take the address of a scalar which is stored 
in reverse order, but you can do it for the enclosing aggregate type., which 
means that you can also pass it to the memory copy functions.  In this case, 
the optimizer may rewrite the copy into a scalar copy, which is a no-no.

The patch also contains an unrelated hunk for the tree pretty printer.

Tested on x86-64/Linux, OK for the mainline?


2020-06-01  Eric Botcazou  <ebotcazou@adacore.com>

	* gimple-fold.c (gimple_fold_builtin_memory_op): Do not replace with a
	scalar copy if either type has reverse scalar storage order.
	* tree-ssa-sccvn.c (vn_reference_lookup_3): Do not propagate through a
	memory copy if either type has reverse scalar storage order.

	* tree-pretty-print.c (dump_generic_node) <ARRAY_TYPE>: Print quals.


2020-06-01  Eric Botcazou  <ebotcazou@adacore.com>

	* gcc.c-torture/execute/sso-1.c: New test.

Comments

Richard Biener June 2, 2020, 8:49 a.m. UTC | #1
On Mon, Jun 1, 2020 at 11:09 AM Eric Botcazou <botcazou@adacore.com> wrote:
>
> Hi,
>
> this addresses the issue raised by Andrew a few weeks ago about the usage of
> memory copy functions to toggle the scalar storage order.  Recall that you
> cannot (the compiler errors out) take the address of a scalar which is stored
> in reverse order, but you can do it for the enclosing aggregate type., which
> means that you can also pass it to the memory copy functions.  In this case,
> the optimizer may rewrite the copy into a scalar copy, which is a no-no.
>
> The patch also contains an unrelated hunk for the tree pretty printer.
>
> Tested on x86-64/Linux, OK for the mainline?

Hum.  How does the pointer type matter at all for memcpy()?  Isn't the
issue just that we may not use the TYPE_REVERSE_STORAGE_ORDER
qualified type in the following case:

      gimple *new_stmt;
      if (is_gimple_reg_type (TREE_TYPE (srcvar)))
        {
          tree tem = fold_const_aggregate_ref (srcvar);
          if (tem)
            srcvar = tem;
          if (! is_gimple_min_invariant (srcvar))
            {
              new_stmt = gimple_build_assign (NULL_TREE, srcvar);
              srcvar = create_tmp_reg_or_ssa_name (TREE_TYPE (srcvar),
                                                   new_stmt);

?

For the VN case the issue is interpreting a read via memcpy (non
reverse-storage)
as a reverse-storage one when matching up with a hashtable entry from
a regular reverse-storage order store, correct?  But likewise whether the
pointer destination had the attribute doesn't correctly qualify this - instead
I think we are already properly matching up the reverse storage property?
And for a rso lookup we just need to make sure to replace the inner
operands with rso ones here:

      /* The looked-through reference is a simple MEM_REF.  */
      memset (&op, 0, sizeof (op));
      op.type = vr->type;
      op.opcode = MEM_REF;
      op.op0 = build_int_cst (ptr_type_node, at - lhs_offset + rhs_offset);
      op.off = at - lhs_offset + rhs_offset;
      vr->operands[0] = op;
      op.type = TREE_TYPE (rhs);
      op.opcode = TREE_CODE (rhs);
      op.op0 = rhs;
      op.off = -1;
      vr->operands[1] = op;
      vr->hashcode = vn_reference_compute_hash (vr);

or instead guard vr->operands to not have any rso components?

The pretty-print hunk is OK.

Thanks,
Richard.

>
> 2020-06-01  Eric Botcazou  <ebotcazou@adacore.com>
>
>         * gimple-fold.c (gimple_fold_builtin_memory_op): Do not replace with a
>         scalar copy if either type has reverse scalar storage order.
>         * tree-ssa-sccvn.c (vn_reference_lookup_3): Do not propagate through a
>         memory copy if either type has reverse scalar storage order.
>
>         * tree-pretty-print.c (dump_generic_node) <ARRAY_TYPE>: Print quals.
>
>
> 2020-06-01  Eric Botcazou  <ebotcazou@adacore.com>
>
>         * gcc.c-torture/execute/sso-1.c: New test.
>
> --
> Eric Botcazou
Eric Botcazou June 2, 2020, 5:11 p.m. UTC | #2
[Starting with tne VN issue]

> For the VN case the issue is interpreting a read via memcpy (non
> reverse-storage) as a reverse-storage one when matching up with a hashtable
> entry from a regular reverse-storage order store, correct?

It's a read from a scalar (hence native order) combined with a store to an 
aggregate type with reverse order:

   u = 305419896;
  __builtin_memcpy (&tempb, &u, 4);
  _1 = tempb.val;

> But likewise whether the pointer destination had the attribute doesn't
> correctly qualify this - instead I think we are already properly matching up
> the reverse storage property?

No, we do not do direct matching for the storage order because you may not 
access the same memory location with different scalar storage orders, it's the 
fundamental invariant, so it's redundant with e.g. the offset for matching.

You need to byte swap if you want to propagate 305419896 to _1 above.  But 
it's an easy case: suppose that tempb is made up of 2 scalars with reverse 
storage order, then it's more complicated.  So memcpy where one of the types 
is aggregate with reverse storage order needs to be considered a storage order 
barrier, exactly like VIEW_CONVERT_EXPR:

	case VIEW_CONVERT_EXPR:
	  temp.off = 0;
	  temp.reverse = storage_order_barrier_p (ref);
	  break;

/* Return true if T is a storage order barrier, i.e. a VIEW_CONVERT_EXPR
   that can modify the storage order of objects.  Note that, even if the
   TYPE_REVERSE_STORAGE_ORDER flag is set on both the inner type and the
   outer type, a VIEW_CONVERT_EXPR can modify the storage order because
   it can change the partition of the aggregate object into scalars.  */

static inline bool
storage_order_barrier_p (const_tree t)
{
  if (TREE_CODE (t) != VIEW_CONVERT_EXPR)
    return false;

  if (AGGREGATE_TYPE_P (TREE_TYPE (t))
      && TYPE_REVERSE_STORAGE_ORDER (TREE_TYPE (t)))
    return true;

  tree op = TREE_OPERAND (t, 0);

  if (AGGREGATE_TYPE_P (TREE_TYPE (op))
      && TYPE_REVERSE_STORAGE_ORDER (TREE_TYPE (op)))
    return true;

  return false;
}

hence the similar conditions, but I can add some more commentary.

> Hum.  How does the pointer type matter at all for memcpy()?
> Isn't the
> issue just that we may not use the TYPE_REVERSE_STORAGE_ORDER
> qualified type in the following case:
> 
>       gimple *new_stmt;
>       if (is_gimple_reg_type (TREE_TYPE (srcvar)))
>         {
>           tree tem = fold_const_aggregate_ref (srcvar);
>           if (tem)
>             srcvar = tem;
>           if (! is_gimple_min_invariant (srcvar))
>             {
>               new_stmt = gimple_build_assign (NULL_TREE, srcvar);
>               srcvar = create_tmp_reg_or_ssa_name (TREE_TYPE (srcvar),
>                                                    new_stmt);
> 
> ?

Same rationale as above: if you want to rewrite the copy in the presence of an 
aggregate type with reverse storage order, you may need to split the operation 
into the appropriate pieces; one large access may be incorrect.
Richard Biener June 3, 2020, 7:53 a.m. UTC | #3
On Tue, Jun 2, 2020 at 7:11 PM Eric Botcazou <botcazou@adacore.com> wrote:
>
> [Starting with tne VN issue]
>
> > For the VN case the issue is interpreting a read via memcpy (non
> > reverse-storage) as a reverse-storage one when matching up with a hashtable
> > entry from a regular reverse-storage order store, correct?
>
> It's a read from a scalar (hence native order) combined with a store to an
> aggregate type with reverse order:
>
>    u = 305419896;
>   __builtin_memcpy (&tempb, &u, 4);
>   _1 = tempb.val;
>
> > But likewise whether the pointer destination had the attribute doesn't
> > correctly qualify this - instead I think we are already properly matching up
> > the reverse storage property?
>
> No, we do not do direct matching for the storage order because you may not
> access the same memory location with different scalar storage orders, it's the
> fundamental invariant, so it's redundant with e.g. the offset for matching.
>
> You need to byte swap if you want to propagate 305419896 to _1 above.  But
> it's an easy case: suppose that tempb is made up of 2 scalars with reverse
> storage order, then it's more complicated.  So memcpy where one of the types
> is aggregate with reverse storage order needs to be considered a storage order
> barrier, exactly like VIEW_CONVERT_EXPR:
>
>         case VIEW_CONVERT_EXPR:
>           temp.off = 0;
>           temp.reverse = storage_order_barrier_p (ref);
>           break;
>
> /* Return true if T is a storage order barrier, i.e. a VIEW_CONVERT_EXPR
>    that can modify the storage order of objects.  Note that, even if the
>    TYPE_REVERSE_STORAGE_ORDER flag is set on both the inner type and the
>    outer type, a VIEW_CONVERT_EXPR can modify the storage order because
>    it can change the partition of the aggregate object into scalars.  */
>
> static inline bool
> storage_order_barrier_p (const_tree t)
> {
>   if (TREE_CODE (t) != VIEW_CONVERT_EXPR)
>     return false;
>
>   if (AGGREGATE_TYPE_P (TREE_TYPE (t))
>       && TYPE_REVERSE_STORAGE_ORDER (TREE_TYPE (t)))
>     return true;
>
>   tree op = TREE_OPERAND (t, 0);
>
>   if (AGGREGATE_TYPE_P (TREE_TYPE (op))
>       && TYPE_REVERSE_STORAGE_ORDER (TREE_TYPE (op)))
>     return true;
>
>   return false;
> }
>
> hence the similar conditions, but I can add some more commentary.

OK, but all I was saying is that looking at the pointed-to type of
the address argument to the memcpy looks fragile to me.  For the
case cited above it would be better to look at the actual
reference we are looking up and not allowing memcpy handling
when that reference contains a storage order barrier?  Like
a && !contains_storage_order_barrier_p (vr->operands) on the whole block
like we do for the assign from constant/SSA value case?

> > Hum.  How does the pointer type matter at all for memcpy()?
> > Isn't the
> > issue just that we may not use the TYPE_REVERSE_STORAGE_ORDER
> > qualified type in the following case:
> >
> >       gimple *new_stmt;
> >       if (is_gimple_reg_type (TREE_TYPE (srcvar)))
> >         {
> >           tree tem = fold_const_aggregate_ref (srcvar);
> >           if (tem)
> >             srcvar = tem;
> >           if (! is_gimple_min_invariant (srcvar))
> >             {
> >               new_stmt = gimple_build_assign (NULL_TREE, srcvar);
> >               srcvar = create_tmp_reg_or_ssa_name (TREE_TYPE (srcvar),
> >                                                    new_stmt);
> >
> > ?
>
> Same rationale as above: if you want to rewrite the copy in the presence of an
> aggregate type with reverse storage order, you may need to split the operation
> into the appropriate pieces; one large access may be incorrect.

But the above cited case is the _only_ case we're possibly building an
assignment of a variable - so it's enough to assert that in the above case
destvar is not TYPE_REVERSE_STORAGE_ORDER?

The other cases all build an aggregate copy assignment with
a non-reverse-storage-order char[size] type which should be OK, no?
Note even for the above we probably would be fine if we'd made sure
to use a !TYPE_REVERSE_STORAGE_ORDER "qualified" type
for the access.  So the memcpy happens via a native order
load plus a native order store, exactly what memcpy does (in fact
the actual order does not matter unless the source and destination
order "agree")?  Thus maybe amend

      /* Make sure we are not copying using a floating-point mode or
         a type whose size possibly does not match its precision.  */
      if (FLOAT_MODE_P (TYPE_MODE (desttype))
          || TREE_CODE (desttype) == BOOLEAN_TYPE
          || TREE_CODE (desttype) == ENUMERAL_TYPE)
        desttype = bitwise_type_for_mode (TYPE_MODE (desttype));
      if (FLOAT_MODE_P (TYPE_MODE (srctype))
          || TREE_CODE (srctype) == BOOLEAN_TYPE
          || TREE_CODE (srctype) == ENUMERAL_TYPE)
        srctype = bitwise_type_for_mode (TYPE_MODE (srctype));

with || TYPE_REVERSE_STORAGE_ORDER (srctype) (same for desttype)?

Thanks,
Richard.

>
> --
> Eric Botcazou
Eric Botcazou June 8, 2020, 8:37 a.m. UTC | #4
> OK, but all I was saying is that looking at the pointed-to type of
> the address argument to the memcpy looks fragile to me.  For the
> case cited above it would be better to look at the actual
> reference we are looking up and not allowing memcpy handling
> when that reference contains a storage order barrier?  Like
> a && !contains_storage_order_barrier_p (vr->operands) on the whole block
> like we do for the assign from constant/SSA value case?

The memcpy is the storage order barrier though, there is no other, so I'm not 
sure what contains_storage_order_barrier_p should be called on.  Or do you 
mean that an operand should be created for the memcpy with "reverse" set?

> But the above cited case is the _only_ case we're possibly building an
> assignment of a variable - so it's enough to assert that in the above case
> destvar is not TYPE_REVERSE_STORAGE_ORDER?

What about

      /* If we can perform the copy efficiently with first doing all loads
         and then all stores inline it that way.  Currently efficiently
	 means that we can load all the memory into a single integer
	 register which is what MOVE_MAX gives us.  */
      src_align = get_pointer_alignment (src);

It's also a scalar copy, so it won't work either.

> The other cases all build an aggregate copy assignment with
> a non-reverse-storage-order char[size] type which should be OK, no?

Yes, theoritically, but SRA rewrites the access into a scalar access and we're 
back to square one (I tried), so this can possibly work only if the array type 
is built with TYPE_REVERSE_STORAGE_ORDER too, which implies that the source is 
in reverse order too.  So the only case that could be handled correctly is the 
memcpy between two aggregates types with TYPE_REVERSE_STORAGE_ORDER.

> Note even for the above we probably would be fine if we'd made sure
> to use a !TYPE_REVERSE_STORAGE_ORDER "qualified" type
> for the access.  So the memcpy happens via a native order
> load plus a native order store, exactly what memcpy does (in fact
> the actual order does not matter unless the source and destination
> order "agree")?  Thus maybe amend
> 
>       /* Make sure we are not copying using a floating-point mode or
>          a type whose size possibly does not match its precision.  */
>       if (FLOAT_MODE_P (TYPE_MODE (desttype))
> 
>           || TREE_CODE (desttype) == BOOLEAN_TYPE
>           || TREE_CODE (desttype) == ENUMERAL_TYPE)
> 
>         desttype = bitwise_type_for_mode (TYPE_MODE (desttype));
>       if (FLOAT_MODE_P (TYPE_MODE (srctype) )
> 
>           || TREE_CODE (srctype) == BOOLEAN_TYPE
>           || TREE_CODE (srctype) == ENUMERAL_TYPE)
> 
>         srctype = bitwise_type_for_mode (TYPE_MODE (srctype));
> 
> with || TYPE_REVERSE_STORAGE_ORDER (srctype) (same for desttype)?

We don't have TYPE_REVERSE_STORAGE_ORDER qualified types, the flag is only set 
on aggregate types and bitwise_type_for_mode will return an integer type.
Richard Biener June 9, 2020, 8:18 a.m. UTC | #5
On Mon, Jun 8, 2020 at 10:37 AM Eric Botcazou <botcazou@adacore.com> wrote:
>
> > OK, but all I was saying is that looking at the pointed-to type of
> > the address argument to the memcpy looks fragile to me.  For the
> > case cited above it would be better to look at the actual
> > reference we are looking up and not allowing memcpy handling
> > when that reference contains a storage order barrier?  Like
> > a && !contains_storage_order_barrier_p (vr->operands) on the whole block
> > like we do for the assign from constant/SSA value case?
>
> The memcpy is the storage order barrier though, there is no other, so I'm not
> sure what contains_storage_order_barrier_p should be called on.  Or do you
> mean that an operand should be created for the memcpy with "reverse" set?

I'm probably completely misunderstanding how TYPE_REVERSE_STORAGE_ORDER
works and what the constraints are.  But if the memcpy is the storage order
barrier then what cases are OK?

 - both source and destination objects are [only] accessed with the same
   storage order
 - both source and destination objects are [only] accessed with
   !TYPE_REVERSE_STORAGE_ORDER "containing" accesses

and what determines the "object state" with respect to storage order
when we see a memcpy call?  I had the impression (from the other
existing cases of contains_storage_order_barrier_p () calls) that
we derive the "object state" from the downstream access (vr->operands),
in this case the memcpy destination access.  But see below for maybe
the easier case (memcpy folding).

> > But the above cited case is the _only_ case we're possibly building an
> > assignment of a variable - so it's enough to assert that in the above case
> > destvar is not TYPE_REVERSE_STORAGE_ORDER?
>
> What about
>
>       /* If we can perform the copy efficiently with first doing all loads
>          and then all stores inline it that way.  Currently efficiently
>          means that we can load all the memory into a single integer
>          register which is what MOVE_MAX gives us.  */
>       src_align = get_pointer_alignment (src);
>
> It's also a scalar copy, so it won't work either.

In my understanding of how TYPE_REVERSE_STORAGE_ORDER works
a memcpy, irrespective of its actual "implementation" should not break
anything since it transfers bytes.  In particular a 8 byte
TYPE_REVERSE_STORAGE_ORDER
object can be transfered ("memcpy"ed) with (ignoring TBAA)

  int tema = *(int *)src;
  *(int *)dst = tema;
  int temb = *((int *)src + 1);
  *((int *)dst + 1) = temb;

and what byteorder the int accesses are performed in is irrelevant
as long as read and write use the same byteorder.

> > The other cases all build an aggregate copy assignment with
> > a non-reverse-storage-order char[size] type which should be OK, no?
>
> Yes, theoritically, but SRA rewrites the access into a scalar access and we're
> back to square one (I tried).

But what's the issue with this?

> , so this can possibly work only if the array type
> is built with TYPE_REVERSE_STORAGE_ORDER too, which implies that the source is
> in reverse order too.  So the only case that could be handled correctly is the
> memcpy between two aggregates types with TYPE_REVERSE_STORAGE_ORDER.

Why?  Is the following an invalid program?

void mymemcpy (void *a, void *b) { memcpy (a, b, 8); }
void foo()
{
  int b[2] __attribute__(reverse_storage);
  int a[2];
  mymemcpy (a, b);
  return a[0];
}

how can GCC infer, inside mymemcpy that b is reverse storage order?

Thus, if all this is a problem then I think we're papering over the issue
by not folding memcpy.

> > Note even for the above we probably would be fine if we'd made sure
> > to use a !TYPE_REVERSE_STORAGE_ORDER "qualified" type
> > for the access.  So the memcpy happens via a native order
> > load plus a native order store, exactly what memcpy does (in fact
> > the actual order does not matter unless the source and destination
> > order "agree")?  Thus maybe amend
> >
> >       /* Make sure we are not copying using a floating-point mode or
> >          a type whose size possibly does not match its precision.  */
> >       if (FLOAT_MODE_P (TYPE_MODE (desttype))
> >
> >           || TREE_CODE (desttype) == BOOLEAN_TYPE
> >           || TREE_CODE (desttype) == ENUMERAL_TYPE)
> >
> >         desttype = bitwise_type_for_mode (TYPE_MODE (desttype));
> >       if (FLOAT_MODE_P (TYPE_MODE (srctype) )
> >
> >           || TREE_CODE (srctype) == BOOLEAN_TYPE
> >           || TREE_CODE (srctype) == ENUMERAL_TYPE)
> >
> >         srctype = bitwise_type_for_mode (TYPE_MODE (srctype));
> >
> > with || TYPE_REVERSE_STORAGE_ORDER (srctype) (same for desttype)?
>
> We don't have TYPE_REVERSE_STORAGE_ORDER qualified types, the flag is only set
> on aggregate types and bitwise_type_for_mode will return an integer type.

Exactly.  When the original src/desttype turns out to be a RECORD_TYPE (IIRC
that can happen) with TYPE_REVERSE_STORAGE_ORDER make sure we're
using a bitwise_type_for_mode.  Because then the copying will happen in
chunks of, say, SImode which should be fine (see above).

Richard.

>
> --
> Eric Botcazou
Eric Botcazou June 9, 2020, 10:18 a.m. UTC | #6
> I'm probably completely misunderstanding how TYPE_REVERSE_STORAGE_ORDER
> works and what the constraints are.  But if the memcpy is the storage order
> barrier then what cases are OK?
> 
>  - both source and destination objects are [only] accessed with the same
>    storage order
>  - both source and destination objects are [only] accessed with
>    !TYPE_REVERSE_STORAGE_ORDER "containing" accesses
> 
> and what determines the "object state" with respect to storage order
> when we see a memcpy call?  I had the impression (from the other
> existing cases of contains_storage_order_barrier_p () calls) that
> we derive the "object state" from the downstream access (vr->operands),
> in this case the memcpy destination access.  But see below for maybe
> the easier case (memcpy folding).

All cases are OK if you don't try to replace memcpy with something else.  The 
only possible replacement would be with a VIEW_CONVERT_EXPR, but I guess that 
it would be quickly folded into a MEM_EXPR and then we're back to square one.

> In my understanding of how TYPE_REVERSE_STORAGE_ORDER works
> a memcpy, irrespective of its actual "implementation" should not break
> anything since it transfers bytes.

The point of the discussion is precisely to make it work in all cases.

> In particular a 8 byte TYPE_REVERSE_STORAGE_ORDER
> object can be transfered ("memcpy"ed) with (ignoring TBAA)
> 
>   int tema = *(int *)src;
>   *(int *)dst = tema;
>   int temb = *((int *)src + 1);
>   *((int *)dst + 1) = temb;
> 
> and what byteorder the int accesses are performed in is irrelevant
> as long as read and write use the same byteorder.

No, because of the fundamental invariant of the implementation: you may _not_ 
access the same memory location with different storage orders, this is simply 
not supported and we document the limitation in the manual.  The program does 
not do that, so it's valid; the program after folding does that, which means 
that the folding is invalid.

> > Yes, theoritically, but SRA rewrites the access into a scalar access and
> > we're back to square one (I tried).
> 
> But what's the issue with this?

Same as above: access to the same location with different storage orders.

> Why?  Is the following an invalid program?
> 
> void mymemcpy (void *a, void *b) { memcpy (a, b, 8); }
> void foo()
> {
>   int b[2] __attribute__(reverse_storage);
>   int a[2];
>   mymemcpy (a, b);
>   return a[0];
> }
> 
> how can GCC infer, inside mymemcpy that b is reverse storage order?

No, that's supported, again the point of the discussion is precisely to make 
it work!

> Thus, if all this is a problem then I think we're papering over the issue
> by not folding memcpy.

I don't really see why you're insisting on folding memcpy.  It's semantically 
equivalent to VIEW_CONVERT_EXPR so it needs to be treated the same way.
Richard Biener June 9, 2020, 12:57 p.m. UTC | #7
On Tue, Jun 9, 2020 at 12:19 PM Eric Botcazou <botcazou@adacore.com> wrote:
>
> > I'm probably completely misunderstanding how TYPE_REVERSE_STORAGE_ORDER
> > works and what the constraints are.  But if the memcpy is the storage order
> > barrier then what cases are OK?
> >
> >  - both source and destination objects are [only] accessed with the same
> >    storage order
> >  - both source and destination objects are [only] accessed with
> >    !TYPE_REVERSE_STORAGE_ORDER "containing" accesses
> >
> > and what determines the "object state" with respect to storage order
> > when we see a memcpy call?  I had the impression (from the other
> > existing cases of contains_storage_order_barrier_p () calls) that
> > we derive the "object state" from the downstream access (vr->operands),
> > in this case the memcpy destination access.  But see below for maybe
> > the easier case (memcpy folding).
>
> All cases are OK if you don't try to replace memcpy with something else.  The
> only possible replacement would be with a VIEW_CONVERT_EXPR, but I guess that
> it would be quickly folded into a MEM_EXPR and then we're back to square one.
>
> > In my understanding of how TYPE_REVERSE_STORAGE_ORDER works
> > a memcpy, irrespective of its actual "implementation" should not break
> > anything since it transfers bytes.
>
> The point of the discussion is precisely to make it work in all cases.
>
> > In particular a 8 byte TYPE_REVERSE_STORAGE_ORDER
> > object can be transfered ("memcpy"ed) with (ignoring TBAA)
> >
> >   int tema = *(int *)src;
> >   *(int *)dst = tema;
> >   int temb = *((int *)src + 1);
> >   *((int *)dst + 1) = temb;
> >
> > and what byteorder the int accesses are performed in is irrelevant
> > as long as read and write use the same byteorder.
>
> No, because of the fundamental invariant of the implementation: you may _not_
> access the same memory location with different storage orders, this is simply
> not supported and we document the limitation in the manual.  The program does
> not do that, so it's valid; the program after folding does that, which means
> that the folding is invalid.
>
> > > Yes, theoritically, but SRA rewrites the access into a scalar access and
> > > we're back to square one (I tried).
> >
> > But what's the issue with this?
>
> Same as above: access to the same location with different storage orders.
>
> > Why?  Is the following an invalid program?
> >
> > void mymemcpy (void *a, void *b) { memcpy (a, b, 8); }
> > void foo()
> > {
> >   int b[2] __attribute__(reverse_storage);
> >   int a[2];
> >   mymemcpy (a, b);
> >   return a[0];
> > }
> >
> > how can GCC infer, inside mymemcpy that b is reverse storage order?
>
> No, that's supported, again the point of the discussion is precisely to make
> it work!

But GCC does not see the reverse storage order in mymemcpy so
it happily folds the memcpy inside it, inlines the result and then?

> > Thus, if all this is a problem then I think we're papering over the issue
> > by not folding memcpy.
>
> I don't really see why you're insisting on folding memcpy.  It's semantically
> equivalent to VIEW_CONVERT_EXPR so it needs to be treated the same way.

I'm not insisting on folding memcpy.  I'm questioning you are fixing
the actual bug ;)

Richard.

> --
> Eric Botcazou
Eric Botcazou July 8, 2020, 8:52 a.m. UTC | #8
[Sorry for dropping the ball here]

> But GCC does not see the reverse storage order in mymemcpy so
> it happily folds the memcpy inside it, inlines the result and then?

You're right, this breaks, hence the following alternative: either we prevent 
inlining from happening, or we declare that this is simply not supported and 
warn (there is a -Wscalar-storage-order warning for problematic constructs).

I didn't find any existing infrastructure for the former and I'm not sure it's 
worth adding, so the attached implements the latter.  Tested on x86-64/Linux.


2020-07-08  Eric Botcazou  <ebotcazou@adacore.com>

c-family/
	* c.opt (Wscalar-storage-order): Add warn_scalar_storage_order variable.


2020-07-08  Eric Botcazou  <ebotcazou@adacore.com>

c/
	* c-typeck.c (convert_for_assignment): If -Wscalar-storage-order is set,
	warn for conversions between pointers that point to incompatible scalar
	storage orders.


2020-07-08  Eric Botcazou  <ebotcazou@adacore.com>

	* gimple-fold.c (gimple_fold_builtin_memory_op): Do not fold if either
	type has reverse scalar storage order.
	* tree-ssa-sccvn.c (vn_reference_lookup_3): Do not propagate through a
	memory copy if either type has reverse scalar storage order.


2020-07-08  Eric Botcazou  <ebotcazou@adacore.com>

testsuite/
	* gcc.dg/sso-11.c: New test.
	* gcc.dg/sso/sso.exp: Pass -Wno-scalar-storage-order.
	* gcc.dg/sso/memcpy-1.c: New test.
Richard Biener July 8, 2020, 3:22 p.m. UTC | #9
On Wed, Jul 8, 2020 at 10:53 AM Eric Botcazou <botcazou@adacore.com> wrote:
>
> [Sorry for dropping the ball here]
>
> > But GCC does not see the reverse storage order in mymemcpy so
> > it happily folds the memcpy inside it, inlines the result and then?
>
> You're right, this breaks, hence the following alternative: either we prevent
> inlining from happening, or we declare that this is simply not supported and
> warn (there is a -Wscalar-storage-order warning for problematic constructs).
>
> I didn't find any existing infrastructure for the former and I'm not sure it's
> worth adding, so the attached implements the latter.  Tested on x86-64/Linux.

OK with me.

I still believe we could handle reverse storage order more "optimistically"
(without all the current usage restrictions).  We seem to have no problems
with address-spaces in this area for example (their problematic cases are
of course slightly different).

Richard.

>
> 2020-07-08  Eric Botcazou  <ebotcazou@adacore.com>
>
> c-family/
>         * c.opt (Wscalar-storage-order): Add warn_scalar_storage_order variable.
>
>
> 2020-07-08  Eric Botcazou  <ebotcazou@adacore.com>
>
> c/
>         * c-typeck.c (convert_for_assignment): If -Wscalar-storage-order is set,
>         warn for conversions between pointers that point to incompatible scalar
>         storage orders.
>
>
> 2020-07-08  Eric Botcazou  <ebotcazou@adacore.com>
>
>         * gimple-fold.c (gimple_fold_builtin_memory_op): Do not fold if either
>         type has reverse scalar storage order.
>         * tree-ssa-sccvn.c (vn_reference_lookup_3): Do not propagate through a
>         memory copy if either type has reverse scalar storage order.
>
>
> 2020-07-08  Eric Botcazou  <ebotcazou@adacore.com>
>
> testsuite/
>         * gcc.dg/sso-11.c: New test.
>         * gcc.dg/sso/sso.exp: Pass -Wno-scalar-storage-order.
>         * gcc.dg/sso/memcpy-1.c: New test.
>
>
> --
> Eric Botcazou
Eric Botcazou July 9, 2020, 7:29 a.m. UTC | #10
> OK with me.

Thanks.

> I still believe we could handle reverse storage order more "optimistically"
> (without all the current usage restrictions).  We seem to have no problems
> with address-spaces in this area for example (their problematic cases are
> of course slightly different).

The fundamental restriction of the implementation is that we don't assign a 
storage order to the scalar rvalues themselves, i.e. there is only one kind of 
scalar rvalues and they are represented in target order.  Only scalar lvalues 
can have a storage order.  This one cannot reasonably be relaxed I think.

From there, you have secondary restrictions related to aliasing:

 - are pointers to scalar values stored in reverse order allowed?  Currently 
they are not and the compiler issues an error if you try to make one.  This 
one could probably be relaxed by adding a "reverse" flag to pointers and the 
analogy with address-spaces would be exact (as a matter of fact, the latest 
UltraSPARC specifications introduced an Address Space Identifier for little-
endian data, so these pointers could be implemented in hardware for them).

 - is storage order toggling by means of aliasing supported?  Currently it is 
not and this is documented as such.  This one looks hard to relax because this 
would be essentially equivalent to relaxing the fundamenal restriction: if you 
apply SRA to such a case, either materially or symbolically through VN, you 
will need to effectively assign a storage order to scalar rvalues internally.
Richard Biener July 9, 2020, 7:40 a.m. UTC | #11
On Thu, Jul 9, 2020 at 9:29 AM Eric Botcazou <botcazou@adacore.com> wrote:
>
> > OK with me.
>
> Thanks.
>
> > I still believe we could handle reverse storage order more "optimistically"
> > (without all the current usage restrictions).  We seem to have no problems
> > with address-spaces in this area for example (their problematic cases are
> > of course slightly different).
>
> The fundamental restriction of the implementation is that we don't assign a
> storage order to the scalar rvalues themselves, i.e. there is only one kind of
> scalar rvalues and they are represented in target order.  Only scalar lvalues
> can have a storage order.  This one cannot reasonably be relaxed I think.
>
> From there, you have secondary restrictions related to aliasing:
>
>  - are pointers to scalar values stored in reverse order allowed?  Currently
> they are not and the compiler issues an error if you try to make one.  This
> one could probably be relaxed by adding a "reverse" flag to pointers and the
> analogy with address-spaces would be exact (as a matter of fact, the latest
> UltraSPARC specifications introduced an Address Space Identifier for little-
> endian data, so these pointers could be implemented in hardware for them).
>
>  - is storage order toggling by means of aliasing supported?  Currently it is
> not and this is documented as such.  This one looks hard to relax because this
> would be essentially equivalent to relaxing the fundamenal restriction: if you
> apply SRA to such a case, either materially or symbolically through VN, you
> will need to effectively assign a storage order to scalar rvalues internally.

Hmm, I'd probably avoid to assign a storage order to rvalues (SSA names
and pseudos) and simply go with a storage order on lvalues.  That means
SRA can only scalarize when all actual accesses have the same storage
order and "full" scalarization (scalarization of portions where no accesses
are observed) can be only done when its materialized back to memory.
So I don't see where the need arises to tag pointers apart from on the
frontend side which of course needs to tag dereferences of pointers
appropriately for the middle-end.  In particular memcpy() behaves
storage-order agnostic so it shouldn't care.

I don't see how aliases or storage order "punning" become a problem then.
It's no different from aliasing int and float accesses?

As of treating it like address-spaces, yes - it's basically like that at the
moment but with more implementation restrictions.

Richard.

> --
> Eric Botcazou
diff mbox series

Patch

diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 4e3de95d2d2..64a9221f8cf 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -741,7 +741,8 @@  gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi,
     }
   else
     {
-      tree srctype, desttype;
+      tree srctype = TREE_TYPE (TREE_TYPE (src));
+      tree desttype = TREE_TYPE (TREE_TYPE (dest));
       unsigned int src_align, dest_align;
       tree off0;
       const char *tmp_str;
@@ -767,7 +768,11 @@  gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi,
 	     hack can be removed.  */
 	  && !c_strlen (src, 1)
 	  && !((tmp_str = c_getstr (src, &tmp_len)) != NULL
-	       && memchr (tmp_str, 0, tmp_len) == NULL))
+	       && memchr (tmp_str, 0, tmp_len) == NULL)
+	  && !(AGGREGATE_TYPE_P (srctype)
+	       && TYPE_REVERSE_STORAGE_ORDER (srctype))
+	  && !(AGGREGATE_TYPE_P (desttype)
+	       && TYPE_REVERSE_STORAGE_ORDER (desttype)))
 	{
 	  unsigned ilen = tree_to_uhwi (len);
 	  if (pow2p_hwi (ilen))
@@ -957,10 +962,15 @@  gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi,
 	 but that only gains us that the destination and source possibly
 	 no longer will have their address taken.  */
       srctype = TREE_TYPE (TREE_TYPE (src));
+      desttype = TREE_TYPE (TREE_TYPE (dest));
+      if ((AGGREGATE_TYPE_P (srctype)
+	   && TYPE_REVERSE_STORAGE_ORDER (srctype))
+	  || (AGGREGATE_TYPE_P (desttype)
+	      && TYPE_REVERSE_STORAGE_ORDER (desttype)))
+	return false;
       if (TREE_CODE (srctype) == ARRAY_TYPE
 	  && !tree_int_cst_equal (TYPE_SIZE_UNIT (srctype), len))
 	srctype = TREE_TYPE (srctype);
-      desttype = TREE_TYPE (TREE_TYPE (dest));
       if (TREE_CODE (desttype) == ARRAY_TYPE
 	  && !tree_int_cst_equal (TYPE_SIZE_UNIT (desttype), len))
 	desttype = TREE_TYPE (desttype);
diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index f04fd65091a..7d581214022 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -1899,8 +1899,16 @@  dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
 
     case ARRAY_TYPE:
       {
+	unsigned int quals = TYPE_QUALS (node);
 	tree tmp;
 
+	if (quals & TYPE_QUAL_ATOMIC)
+	  pp_string (pp, "atomic ");
+	if (quals & TYPE_QUAL_CONST)
+	  pp_string (pp, "const ");
+	if (quals & TYPE_QUAL_VOLATILE)
+	  pp_string (pp, "volatile ");
+
 	/* Print the innermost component type.  */
 	for (tmp = TREE_TYPE (node); TREE_CODE (tmp) == ARRAY_TYPE;
 	     tmp = TREE_TYPE (tmp))
diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
index 4b3f31c12cb..17867b65ecb 100644
--- a/gcc/tree-ssa-sccvn.c
+++ b/gcc/tree-ssa-sccvn.c
@@ -3275,6 +3275,9 @@  vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *data_,
 	}
       if (TREE_CODE (lhs) == ADDR_EXPR)
 	{
+	  if (AGGREGATE_TYPE_P (TREE_TYPE (TREE_TYPE (lhs)))
+	      && TYPE_REVERSE_STORAGE_ORDER (TREE_TYPE (TREE_TYPE (lhs))))
+	    return (void *)-1;
 	  tree tem = get_addr_base_and_unit_offset (TREE_OPERAND (lhs, 0),
 						    &lhs_offset);
 	  if (!tem)
@@ -3303,6 +3306,9 @@  vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *data_,
 	rhs = vn_valueize (rhs);
       if (TREE_CODE (rhs) == ADDR_EXPR)
 	{
+	  if (AGGREGATE_TYPE_P (TREE_TYPE (TREE_TYPE (rhs)))
+	      && TYPE_REVERSE_STORAGE_ORDER (TREE_TYPE (TREE_TYPE (rhs))))
+	    return (void *)-1;
 	  tree tem = get_addr_base_and_unit_offset (TREE_OPERAND (rhs, 0),
 						    &rhs_offset);
 	  if (!tem)