
Narrow ARRAY*_REF indexes to sizetype for gimple (PR tree-optimization/89223)

Message ID 20190207223321.GQ2135@tucnak
State New
Series Narrow ARRAY*_REF indexes to sizetype for gimple (PR tree-optimization/89223)

Commit Message

Jakub Jelinek Feb. 7, 2019, 10:33 p.m. UTC
Hi!

As mentioned in the PR, during expansion we expand ARRAY_REFs using
get_inner_reference, which casts ARRAY_*REF indexes to sizetype, so all the
bits above sizetype are ignored (whether one uses long long indexes on
32-bit targets or __int128 indexes on 64-bit ones).  I therefore think it is
desirable to make that visible already for GIMPLE optimizations, where we
can narrow other arithmetic stmts feeding those indexes etc.; it could also
help vectorization (e.g. gather/scatter).  Of course, fixing tree-data-ref.c
etc. to cope with wider constants or chrecs is desirable as well.
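
E.g. in something like the following (just an illustration of the effect,
not part of the patch), with a 64-bit sizetype the high half of the
__int128 index is simply dropped at expansion:

int a[16];

void
bar (void)
{
  unsigned __int128 i = (((unsigned __int128) 5) << 64) + 1;
  /* Expanded as a[(sizetype) i], i.e. a[1]; the upper 64 bits of i
     are ignored.  */
  a[i] = 0;
}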

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-02-06  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/89223
	* gimplify.c (gimplify_compound_lval): Narrow ARRAY*_REF indexes wider
	than sizetype to sizetype.

	* gcc.c-torture/execute/pr89223.c: New test.


	Jakub

Comments

Richard Biener Feb. 8, 2019, 9:48 a.m. UTC | #1
On Thu, 7 Feb 2019, Jakub Jelinek wrote:

> Hi!
> 
> As mentioned in the PR, during expansion we expand ARRAY_REFs using
> get_inner_reference, which casts ARRAY_*REF indexes to sizetype, so all the
> bits above sizetype are ignored (whether one uses long long indexes on
> 32-bit targets or __int128 indexes on 64-bit ones).  I therefore think it is
> desirable to make that visible already for GIMPLE optimizations, where we
> can narrow other arithmetic stmts feeding those indexes etc.; it could also
> help vectorization (e.g. gather/scatter).  Of course, fixing tree-data-ref.c
> etc. to cope with wider constants or chrecs is desirable as well.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2019-02-06  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR tree-optimization/89223
> 	* gimplify.c (gimplify_compound_lval): Narrow ARRAY*_REF indexes wider
> 	than sizetype to sizetype.
> 
> 	* gcc.c-torture/execute/pr89223.c: New test.
> 
> --- gcc/gimplify.c.jj	2019-01-28 23:30:53.199762928 +0100
> +++ gcc/gimplify.c	2019-02-06 17:15:35.368976785 +0100
> @@ -2977,6 +2977,12 @@ gimplify_compound_lval (tree *expr_p, gi
>  
>        if (TREE_CODE (t) == ARRAY_REF || TREE_CODE (t) == ARRAY_RANGE_REF)
>  	{
> +	  /* Expansion will cast the index to sizetype, so any excess bits
> +	     aren't useful for optimizations.  */
> +	  if (!error_operand_p (TREE_OPERAND (t, 1))
> +	      && (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (t, 1)))
> +		  > TYPE_PRECISION (sizetype)))
> +	    TREE_OPERAND (t, 1) = fold_convert (sizetype, TREE_OPERAND (t, 1));

Shouldn't we use ssizetype in case the index was signed (or always)?

I wonder what the effects are on SCEV-analyzability when the IV is, say
[unsigned] __int128 and the uses we analyze are then ([s]sizetype) IV
vs. plain IV.  That is, I'm not sure exposing the casting on GIMPLE
is always a good idea.  Did you look at how SCEV behaves with the change?

Otherwise I agree.  Not sure if we want to do this during stage4 though.

Thanks,
Richard.

>  	  /* Gimplify the dimension.  */
>  	  if (!is_gimple_min_invariant (TREE_OPERAND (t, 1)))
>  	    {
> --- gcc/testsuite/gcc.c-torture/execute/pr89223.c.jj	2019-02-07 18:09:39.869088769 +0100
> +++ gcc/testsuite/gcc.c-torture/execute/pr89223.c	2019-02-07 18:09:12.359543231 +0100
> @@ -0,0 +1,27 @@
> +/* PR tree-optimization/89223 */
> +/* { dg-do run { target int128 } } */
> +
> +int a[9] = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
> +unsigned __int128 b;
> +
> +__attribute__((noipa)) void
> +foo (void)
> +{
> +  for (b = (((unsigned __int128) 4) << 64) + 4; b != 0; b -= ((((unsigned __int128) 1) << 64) + 1))
> +    a[b] = a[b + b];
> +}
> +
> +int
> +main ()
> +{
> +  int i;
> +  if (sizeof (__UINTPTR_TYPE__) * __CHAR_BIT__ != 64
> +      || sizeof (__SIZE_TYPE__) * __CHAR_BIT__ != 64
> +      || sizeof (__int128) * __CHAR_BIT__ != 128)
> +    return 0;
> +  foo ();
> +  for (i = 0; i < 9; ++i)
> +    if (a[i] != (((1 << i) & 0x16) != 0 ? 9 : i == 3 ? 7 : i + 1))
> +      __builtin_abort ();
> +  return 0;
> +}
> 
> 	Jakub
> 
>
Jakub Jelinek Feb. 8, 2019, 10:11 a.m. UTC | #2
On Fri, Feb 08, 2019 at 10:48:30AM +0100, Richard Biener wrote:
> > --- gcc/gimplify.c.jj	2019-01-28 23:30:53.199762928 +0100
> > +++ gcc/gimplify.c	2019-02-06 17:15:35.368976785 +0100
> > @@ -2977,6 +2977,12 @@ gimplify_compound_lval (tree *expr_p, gi
> >  
> >        if (TREE_CODE (t) == ARRAY_REF || TREE_CODE (t) == ARRAY_RANGE_REF)
> >  	{
> > +	  /* Expansion will cast the index to sizetype, so any excess bits
> > +	     aren't useful for optimizations.  */
> > +	  if (!error_operand_p (TREE_OPERAND (t, 1))
> > +	      && (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (t, 1)))
> > +		  > TYPE_PRECISION (sizetype)))
> > +	    TREE_OPERAND (t, 1) = fold_convert (sizetype, TREE_OPERAND (t, 1));
> 
> Shouldn't we use ssizetype in case the index was signed (or always)?

As get_inner_reference uses sizetype, I think it is best to do what it will
do in the end anyway.  We use sizetype for POINTER_PLUS_EXPR as well and
SCEV needs to handle that too.  If we changed POINTER_PLUS_EXPR to use
ssizetype, using ssizetype for the indexes would make sense too and we could
even do the casts if the index is sizetype originally.  That said, adding
conversions to either sizetype or ssizetype when the index is already
narrower than sizetype is IMHO harmful.

> I wonder what the effects are on SCEV-analyzability when the IV is, say
> [unsigned] __int128 and the uses we analyze are then ([s]sizetype) IV
> vs. plain IV.  That is, I'm not sure exposing the casting on GIMPLE
> is always a good idea.  Did you look at how SCEV behaves with the change?

I don't see that it would change any SCEV decisions; there are minor
differences like
 (instantiate_scev 
   (instantiate_below = 2 -> 3)
   (evolution_loop = 1)
-  (chrec = {0x40000000000000004, +, 0xfffffffffffffffeffffffffffffffff}_1)
-  (res = {0x40000000000000004, +, 0xfffffffffffffffeffffffffffffffff}_1))
+  (chrec = {4, +, 18446744073709551615}_1)
+  (res = {4, +, 18446744073709551615}_1))
in -fdump-tree-all-scev, but the optimizations are pretty much the same;
initially the IL is a tiny bit larger (because of the extra converts), but
soon it is smaller:
-  _15 = a[0x80000000000000008];
-  a[0x40000000000000004] = _15;
-  _23 = a[0x60000000000000006];
-  a[0x30000000000000003] = _23;
-  _31 = a[0x40000000000000004];
-  a[0x20000000000000002] = _31;
-  a[0x10000000000000001] = _31;
+  _5 = a[8];
+  a[4] = _5;
+  _26 = a[6];
+  a[3] = _26;
+  a[2] = _5;
+  a[1] = _5;
in *.optimized.  By exposing the casts, I guess e.g. GIMPLE opts could find
out that a[2] and a[0x20000000000000002] are the same thing etc. (rather than
(sometimes?) discovering that only during RTL optimizations when it is
expanded that way).
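
E.g. (illustrative only, not from the testcase): after the narrowing both
references below use the same sizetype index, so a GIMPLE pass like FRE
could already treat them as the same memory location:

int a[4];

int
baz (void)
{
  unsigned __int128 i = (((unsigned __int128) 2) << 64) + 2;
  a[2] = 1;
  /* Same location as a[2] once the index is narrowed to (sizetype) i.  */
  return a[i];
}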

> Otherwise I agree.  Not sure if we want to do this during stage4 though.

If you want to defer this to stage1, I can certainly wait.

	Jakub
Richard Biener Feb. 8, 2019, 11:47 a.m. UTC | #3
On Fri, 8 Feb 2019, Jakub Jelinek wrote:

> On Fri, Feb 08, 2019 at 10:48:30AM +0100, Richard Biener wrote:
> > > --- gcc/gimplify.c.jj	2019-01-28 23:30:53.199762928 +0100
> > > +++ gcc/gimplify.c	2019-02-06 17:15:35.368976785 +0100
> > > @@ -2977,6 +2977,12 @@ gimplify_compound_lval (tree *expr_p, gi
> > >  
> > >        if (TREE_CODE (t) == ARRAY_REF || TREE_CODE (t) == ARRAY_RANGE_REF)
> > >  	{
> > > +	  /* Expansion will cast the index to sizetype, so any excess bits
> > > +	     aren't useful for optimizations.  */
> > > +	  if (!error_operand_p (TREE_OPERAND (t, 1))
> > > +	      && (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (t, 1)))
> > > +		  > TYPE_PRECISION (sizetype)))
> > > +	    TREE_OPERAND (t, 1) = fold_convert (sizetype, TREE_OPERAND (t, 1));
> > 
> > Shouldn't we use ssizetype in case the index was signed (or always)?
> 
> As get_inner_reference uses sizetype, I think it is best to do what it will
> do in the end anyway.  We use sizetype for POINTER_PLUS_EXPR as well and
> SCEV needs to handle that too.  If we changed POINTER_PLUS_EXPR to use
> ssizetype, using ssizetype for the indexes would make sense too and we could
> even do the casts if the index is sizetype originally.  That said, adding
> conversions to either sizetype or ssizetype when the index is already
> narrower than sizetype is IMHO harmful.
> 
> > I wonder what the effects are on SCEV-analyzability when the IV is, say
> > [unsigned] __int128 and the uses we analyze are then ([s]sizetype) IV
> > vs. plain IV.  That is, I'm not sure exposing the casting on GIMPLE
> > is always a good idea.  Did you look at how SCEV behaves with the change?
> 
> I don't see that it would change any SCEV decisions; there are minor
> differences like
>  (instantiate_scev 
>    (instantiate_below = 2 -> 3)
>    (evolution_loop = 1)
> -  (chrec = {0x40000000000000004, +, 0xfffffffffffffffeffffffffffffffff}_1)
> -  (res = {0x40000000000000004, +, 0xfffffffffffffffeffffffffffffffff}_1))
> +  (chrec = {4, +, 18446744073709551615}_1)
> +  (res = {4, +, 18446744073709551615}_1))
> in -fdump-tree-all-scev, but the optimizations are pretty much the same;
> initially the IL is a tiny bit larger (because of the extra converts), but
> soon it is smaller:
> -  _15 = a[0x80000000000000008];
> -  a[0x40000000000000004] = _15;
> -  _23 = a[0x60000000000000006];
> -  a[0x30000000000000003] = _23;
> -  _31 = a[0x40000000000000004];
> -  a[0x20000000000000002] = _31;
> -  a[0x10000000000000001] = _31;
> +  _5 = a[8];
> +  a[4] = _5;
> +  _26 = a[6];
> +  a[3] = _26;
> +  a[2] = _5;
> +  a[1] = _5;
> in *.optimized.  By exposing the casts, I guess e.g. GIMPLE opts could find
> out that a[2] and a[0x20000000000000002] are the same thing etc. (rather than
> (sometimes?) discovering that only during RTL optimizations when it is
> expanded that way).

My worry is that while we know the IV i doesn't wrap we 
don't know the same for (sizetype)i in

for (__int128 i = 0; i < n; ++i)
  a[i] = 0.;

and thus vectorization doesn't work.  Similar for

void foo(long n, int *a)
{
  for (long i = 0; i < n; ++i)
    a[i] = 1;
}

where if you cast i to int or unsigned int in the array reference
SCEV says 

Creating dr for *_4
analyze_innermost: t.c:4:15: missed:  failed: evolution of base is not affine.
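
I.e. a variant along these lines (a sketch of what I mean, not the exact
file from the dump above):

void
foo2 (long n, int *a)
{
  for (long i = 0; i < n; ++i)
    /* Narrowing the index below the IV's precision hides the affine
       evolution from SCEV.  */
    a[(unsigned int) i] = 1;
}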

So the important thing for GIMPLE is to make sure we do _not_ narrow IVs.
Yes, sizetype is probably not too bad here on normal targets, but some
embedded ones have a 16-bit sizetype with 24-bit pointers, and if people
write long or int IVs there they get bitten.  Btw, pure sign-changes seem
fine here (even to unsigned).

One thing I don't like too much is exposing more 'sizetype' to the
IL before we "fixed" POINTER_PLUS_EXPR et al to take signed sizetype
args...

> > Otherwise I agree.  Not sure if we want to do this during stage4 though.
> 
> If you want to defer this to stage1, I can certainly wait.

Definitely.

Richard.

> 	Jakub
> 
>
Jakub Jelinek Feb. 8, 2019, 11:58 a.m. UTC | #4
On Fri, Feb 08, 2019 at 12:47:08PM +0100, Richard Biener wrote:
> My worry is that while we know the IV i doesn't wrap we 
> don't know the same for (sizetype)i in
> 
> for (__int128 i = 0; i < n; ++i)
>   a[i] = 0.;

First of all, for normal targets I think it is very rare that people use
__int128 array indexes (or long long on 32-bit targets).  I agree it can
happen more often on weird targets.

> and thus vectorization doesn't work.  Similar for
> 
> void foo(long n, int *a)
> {
>   for (long i = 0; i < n; ++i)
>     a[i] = 1;
> }

Well, we should be able to improve the number of iterations analysis here
for the wider-than-sizetype ones: no valid array should have more than
PTRDIFF_MAX or SIZE_MAX-1 elements (well, all objects should have a
computable sizeof and pointer subtraction needs to work, so the bound is
even divided by the element size), and thus we could derive some reasonable
range for loops with wider-than-sizetype IVs where we use that.
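
I.e. for something like the following (just a sketch of the kind of loop I
mean), any valid execution already needs the index to fit in sizetype,
because no object can be larger than that:

void
qux (unsigned __int128 n, int *a)
{
  for (unsigned __int128 i = 0; i < n; ++i)
    /* For valid code i stays below SIZE_MAX / sizeof (int), so the
       (sizetype) cast of i cannot wrap.  */
    a[i] = 1;
}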

	Jakub

Patch

--- gcc/gimplify.c.jj	2019-01-28 23:30:53.199762928 +0100
+++ gcc/gimplify.c	2019-02-06 17:15:35.368976785 +0100
@@ -2977,6 +2977,12 @@  gimplify_compound_lval (tree *expr_p, gi
 
       if (TREE_CODE (t) == ARRAY_REF || TREE_CODE (t) == ARRAY_RANGE_REF)
 	{
+	  /* Expansion will cast the index to sizetype, so any excess bits
+	     aren't useful for optimizations.  */
+	  if (!error_operand_p (TREE_OPERAND (t, 1))
+	      && (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (t, 1)))
+		  > TYPE_PRECISION (sizetype)))
+	    TREE_OPERAND (t, 1) = fold_convert (sizetype, TREE_OPERAND (t, 1));
 	  /* Gimplify the dimension.  */
 	  if (!is_gimple_min_invariant (TREE_OPERAND (t, 1)))
 	    {
--- gcc/testsuite/gcc.c-torture/execute/pr89223.c.jj	2019-02-07 18:09:39.869088769 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr89223.c	2019-02-07 18:09:12.359543231 +0100
@@ -0,0 +1,27 @@ 
+/* PR tree-optimization/89223 */
+/* { dg-do run { target int128 } } */
+
+int a[9] = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
+unsigned __int128 b;
+
+__attribute__((noipa)) void
+foo (void)
+{
+  for (b = (((unsigned __int128) 4) << 64) + 4; b != 0; b -= ((((unsigned __int128) 1) << 64) + 1))
+    a[b] = a[b + b];
+}
+
+int
+main ()
+{
+  int i;
+  if (sizeof (__UINTPTR_TYPE__) * __CHAR_BIT__ != 64
+      || sizeof (__SIZE_TYPE__) * __CHAR_BIT__ != 64
+      || sizeof (__int128) * __CHAR_BIT__ != 128)
+    return 0;
+  foo ();
+  for (i = 0; i < 9; ++i)
+    if (a[i] != (((1 << i) & 0x16) != 0 ? 9 : i == 3 ? 7 : i + 1))
+      __builtin_abort ();
+  return 0;
+}