diff mbox series

[2/3] Add simplify rules for wrapped binary operations.

Message ID 20190813083326.10833-3-rdapp@linux.ibm.com
State New
Headers show
Series Simplify wrapped binops. | expand

Commit Message

Robin Dapp Aug. 13, 2019, 8:33 a.m. UTC
We would like to simplify code like
 (larger_type)(var + const1) + const2
to
 (larger_type)(var + combined_const1_const2)
when we know that no overflow happens.
---
 gcc/match.pd | 101 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 101 insertions(+)

Comments

Marc Glisse Aug. 13, 2019, 10:13 a.m. UTC | #1
On Tue, 13 Aug 2019, Robin Dapp wrote:

> +/* ((T)(A)) + CST -> (T)(A + CST)  */
> +#if GIMPLE
> +  (simplify
> +   (plus (convert SSA_NAME@0) INTEGER_CST@1)
> +    (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> +         && INTEGRAL_TYPE_P (type)

I have become rather wary of INTEGRAL_TYPE_P recently because it includes 
enum types, which with -fstrict-enum can have a surprising behavior. If I 
have
enum E { A, B, C };
and e has type enum E, with -fstrict-enum, do your tests manage to prevent 
(long)e+1 from becoming (long)(e+1) with an unsafe operation done in the 
enum type?

> +         && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> +         && int_fits_type_p (@1, TREE_TYPE (@0)))
> +     /* Perform binary operation inside the cast if the constant fits
> +        and (A + CST)'s range does not overflow.  */
> +     (with
> +      {
> +	wi::overflow_type min_ovf = wi::OVF_OVERFLOW,
> +			  max_ovf = wi::OVF_OVERFLOW;
> +        tree inner_type = TREE_TYPE (@0);
> +
> +        wide_int w1 = w1.from (wi::to_wide (@1), TYPE_PRECISION (inner_type),
> +	    TYPE_SIGN (inner_type));
> +
> +        wide_int wmin0, wmax0;
> +        if (get_range_info (@0, &wmin0, &wmax0) == VR_RANGE)
> +          {
> +            wi::add (wmin0, w1, TYPE_SIGN (inner_type), &min_ovf);
> +            wi::add (wmax0, w1, TYPE_SIGN (inner_type), &max_ovf);
> +          }
> +      }
> +     (if (min_ovf == wi::OVF_NONE && max_ovf == wi::OVF_NONE)
> +      (convert (plus @0 { {wide_int_to_tree (TREE_TYPE (@0), w1)}; })))
> +     )))
> +#endif
> +
Robin Dapp Aug. 13, 2019, 11:27 a.m. UTC | #2
> I have become rather wary of INTEGRAL_TYPE_P recently because it
> includes enum types, which with -fstrict-enum can have a surprising
> behavior. If I have
> enum E { A, B, C };
> and e has type enum E, with -fstrict-enum, do your tests manage to
> prevent (long)e+1 from becoming (long)(e+1) with an unsafe operation
> done in the enum type?

Ah, I didn't specifically check for this. Going to rather change the
checks to TREE_CODE (type) == INTEGER_TYPE then.

Regards
 Robin
Jeff Law Aug. 13, 2019, 8:20 p.m. UTC | #3
On 8/13/19 2:33 AM, Robin Dapp wrote:
> We would like to simplify code like
>  (larger_type)(var + const1) + const2
> to
>  (larger_type)(var + combined_const1_const2)
> when we know that no overflow happens.
> ---
>  gcc/match.pd | 101 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 101 insertions(+)
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 0317bc704f7..94400529ad8 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -2020,6 +2020,107 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>      (if (cst && !TREE_OVERFLOW (cst))
>       (plus { cst; } @0))))
>  
> +/* ((T)(A + CST1)) + CST2 -> (T)(A) + CST  */
Do you want to handle MINUS?  What about POINTER_PLUS_EXPR?

Richi and Marc are better suited to review the guts of this change than
I am.

I though we had these transformations in fold-const.c and I was going to
suggest removing them as a part of this patch.  *But* I can't seem to
find them.

Jeff
Robin Dapp Aug. 13, 2019, 8:42 p.m. UTC | #4
> +/* ((T)(A + CST1)) + CST2 -> (T)(A) + CST  */
> Do you want to handle MINUS?  What about POINTER_PLUS_EXPR?

When I last attempted this patch I had the MINUS still in it but got
confused easily by needing to think of too many cases at once leading to
lots of stupid mistakes.  Hence, I left it out in the last revision :)
Once everything is in place, I guess the additional cases can be added
more easily - hope that's reasonable.

Regards
 Robin
Jeff Law Aug. 13, 2019, 8:45 p.m. UTC | #5
On 8/13/19 2:42 PM, Robin Dapp wrote:
>> +/* ((T)(A + CST1)) + CST2 -> (T)(A) + CST  */
>> Do you want to handle MINUS?  What about POINTER_PLUS_EXPR?
> 
> When I last attempted this patch I had the MINUS still in it but got
> confused easily by needing to think of too many cases at once leading to
> lots of stupid mistakes.  Hence, I left it out in the last revision :)
> Once everything is in place, I guess the additional cases can be added
> more easily - hope that's reasonable.
It's fine by me.  Richi has the final say though...

jeff
Marc Glisse Aug. 13, 2019, 11:26 p.m. UTC | #6
On Tue, 13 Aug 2019, Robin Dapp wrote:

> diff --git a/gcc/match.pd b/gcc/match.pd
> index 0317bc704f7..94400529ad8 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -2020,6 +2020,107 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>     (if (cst && !TREE_OVERFLOW (cst))
>      (plus { cst; } @0))))
>
> +/* ((T)(A + CST1)) + CST2 -> (T)(A) + CST  */
> +#if GIMPLE
> +  (simplify
> +    (plus (convert (plus @0 INTEGER_CST@1)) INTEGER_CST@2)
> +      (if (INTEGRAL_TYPE_P (type)
> +           && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)))
> +       /* We actually want to perform two simplifications here:

Maybe two "transformations"? Since (1) is the exact opposite of the other 
simplification you are adding... It would have been nice if it could 
indeed be split into 2 simplifications, in particular because we already 
have a better (?) version of (2) not far above in the file. But we cannot 
add (1) and I can't think of a good way to reuse the existing version of 
(2) :-(

> +	  (1) (T)(A + CST1) + CST2  --> (T)(A) + (T)(CST1)
> +	      If for (A + CST1) we either do not care about overflow (e.g.
> +	      for a signed inner type) or the overflow is ok for an unsigned
> +	      inner type.

From the code, it seems like the condition is really "If we know that no 
overflow can occur, either because it would be UB (signed type), or thanks 
to range information". Maybe that's the meaning of what you wrote, but I 
had trouble interpreting it.

> +	  (2) (T)(A) + (T)(CST1) + CST2 --> (T)(A) + (T)(CST1 + CST2)
> +	      If (CST1 + CST2) does not overflow and we do care about overflow
> +	      (for a signed outer type) or we do not care about overflow in an
> +	      unsigned outer type.  */
> +       (with
> +       {
> +         tree inner_type = TREE_TYPE (@0);
> +         wide_int wmin0, wmax0;
> +         wide_int cst1 = wi::to_wide (@1);
> +
> +	 wi::overflow_type min_ovf = wi::OVF_OVERFLOW,

There seems to be an inconsistency between spaces and TABs depending on 
the lines.

> +			   max_ovf = wi::OVF_OVERFLOW;
> +
> +	 /* Get overflow behavior.  */
> +         bool ovf_undef_inner = TYPE_OVERFLOW_UNDEFINED (inner_type);
> +         bool ovf_undef_outer = TYPE_OVERFLOW_UNDEFINED (type);
> +
> +	 /* Get value range of A.  */
> +         enum value_range_kind vr0 = get_range_info (@0, &wmin0, &wmax0);
> +
> +	 /* If we have a proper range, determine min and max overflow
> +	    of (A + CST1).

At some point we may want to split this into a helper function 
operation_may_overflow or something, that takes an arbitrary mix of 
SSA_NAME and constants. Not necessary for this patch though.

> +	    ??? We might also want to handle anti ranges.  */
> +	 if (vr0 == VR_RANGE)
> +	   {
> +	     wi::add (wmin0, cst1, TYPE_SIGN (inner_type), &min_ovf);
> +	     wi::add (wmax0, cst1, TYPE_SIGN (inner_type), &max_ovf);
> +	   }
> +
> +	 /* Inner overflow does not matter in this case.  */
> +	 if (ovf_undef_inner)
> +	   {
> +	     min_ovf = wi::OVF_NONE;
> +	     max_ovf = wi::OVF_NONE;
> +	   }

Seems that you could switch the conditions, to avoid unnecessary additions
if (ovf_undef_inner) ... else if (vr0 == VR_RANGE) ...

Maybe even move get_range_info then.

> +
> +	 /* Extend CST from INNER_TYPE to TYPE.  */
> +	 cst1 = cst1.from (cst1, TYPE_PRECISION (type), TYPE_SIGN (inner_type));

wide_int::from? It looks strange to use an arbitrary variable to access a 
static function, but that's a style preference, it is ok as is.

> +
> +	 /* Check for overflow of (TYPE)(CST1 + CST2).  */
> +	 wi::overflow_type outer_ovf = wi::OVF_OVERFLOW;
> +	 wide_int cst = wi::add (cst1, wi::to_wide (@2), TYPE_SIGN (type),
> +	     &outer_ovf);
> +
> +	 /* We *do* care about an overflow here as we do not want to introduce
> +	    new undefined behavior that was not there before.  */

There is always the possibility to cast to an unsigned type, perform the 
operation, then cast back, like the "other (2)" does. But that makes the 
transformation even longer, and can lose some information.

> +	 if (ovf_undef_outer && outer_ovf)
> +	   {
> +	     /* Set these here to prevent the final conversion below
> +		to take place instead of introducing a new guard variable.  */
> +	     min_ovf = wi::OVF_OVERFLOW;
> +	     max_ovf = wi::OVF_OVERFLOW;
> +	   }
> +       }
> +   (if (min_ovf == wi::OVF_NONE && max_ovf == wi::OVF_NONE)
> +    (plus (convert @0) { wide_int_to_tree (type, cst); }
> +     )))))
> +#endif
> +
> +/* ((T)(A)) + CST -> (T)(A + CST)  */

The other transformation is a true simplification. This one is more a 
canonicalization. It does make sense to perform the operation in the 
narrower type, I just hope we have enough machinery to revert this later 
for platforms that do not like arithmetic in sub-word size. Or probably 
this is still restricted enough (doesn't narrow all operations) that it 
doesn't matter.

> +#if GIMPLE
> +  (simplify
> +   (plus (convert SSA_NAME@0) INTEGER_CST@1)
> +    (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> +         && INTEGRAL_TYPE_P (type)
> +         && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> +         && int_fits_type_p (@1, TREE_TYPE (@0)))
> +     /* Perform binary operation inside the cast if the constant fits
> +        and (A + CST)'s range does not overflow.  */
> +     (with
> +      {
> +	wi::overflow_type min_ovf = wi::OVF_OVERFLOW,
> +			  max_ovf = wi::OVF_OVERFLOW;
> +        tree inner_type = TREE_TYPE (@0);
> +
> +        wide_int w1 = w1.from (wi::to_wide (@1), TYPE_PRECISION (inner_type),
> +	    TYPE_SIGN (inner_type));
> +
> +        wide_int wmin0, wmax0;
> +        if (get_range_info (@0, &wmin0, &wmax0) == VR_RANGE)
> +          {
> +            wi::add (wmin0, w1, TYPE_SIGN (inner_type), &min_ovf);
> +            wi::add (wmax0, w1, TYPE_SIGN (inner_type), &max_ovf);
> +          }
> +      }
> +     (if (min_ovf == wi::OVF_NONE && max_ovf == wi::OVF_NONE)
> +      (convert (plus @0 { {wide_int_to_tree (TREE_TYPE (@0), w1)}; })))

There are some extra {} here.

> +     )))
> +#endif
> +
Richard Biener Aug. 14, 2019, 8:49 a.m. UTC | #7
On Tue, Aug 13, 2019 at 10:36 AM Robin Dapp <rdapp@linux.ibm.com> wrote:
>
> We would like to simplify code like
>  (larger_type)(var + const1) + const2
> to
>  (larger_type)(var + combined_const1_const2)
> when we know that no overflow happens.

Trowing in my own comments...

> ---
>  gcc/match.pd | 101 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 101 insertions(+)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 0317bc704f7..94400529ad8 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -2020,6 +2020,107 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>      (if (cst && !TREE_OVERFLOW (cst))
>       (plus { cst; } @0))))
>
> +/* ((T)(A + CST1)) + CST2 -> (T)(A) + CST  */
> +#if GIMPLE
> +  (simplify
> +    (plus (convert (plus @0 INTEGER_CST@1)) INTEGER_CST@2)
> +      (if (INTEGRAL_TYPE_P (type)
> +           && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)))
> +       /* We actually want to perform two simplifications here:
> +         (1) (T)(A + CST1) + CST2  --> (T)(A) + (T)(CST1)
> +             If for (A + CST1) we either do not care about overflow (e.g.
> +             for a signed inner type) or the overflow is ok for an unsigned
> +             inner type.
> +         (2) (T)(A) + (T)(CST1) + CST2 --> (T)(A) + (T)(CST1 + CST2)

But the original is already (T)(A) + CST1-in-T + CST2-in-T and thus we
can always
do 2) by means of already existing patterns.  So it's only 1) you need
to implement?!
And 1) is really (T)(A + CST) -> (T)A + CST-in-T, no?

> +             If (CST1 + CST2) does not overflow and we do care about overflow
> +             (for a signed outer type) or we do not care about overflow in an
> +             unsigned outer type.  */
> +       (with
> +       {
> +         tree inner_type = TREE_TYPE (@0);
> +         wide_int wmin0, wmax0;
> +         wide_int cst1 = wi::to_wide (@1);
> +
> +        wi::overflow_type min_ovf = wi::OVF_OVERFLOW,
> +                          max_ovf = wi::OVF_OVERFLOW;
> +
> +        /* Get overflow behavior.  */
> +         bool ovf_undef_inner = TYPE_OVERFLOW_UNDEFINED (inner_type);
> +         bool ovf_undef_outer = TYPE_OVERFLOW_UNDEFINED (type);
> +
> +        /* Get value range of A.  */
> +         enum value_range_kind vr0 = get_range_info (@0, &wmin0, &wmax0);
> +
> +        /* If we have a proper range, determine min and max overflow
> +           of (A + CST1).
> +           ??? We might also want to handle anti ranges.  */
> +        if (vr0 == VR_RANGE)
> +          {
> +            wi::add (wmin0, cst1, TYPE_SIGN (inner_type), &min_ovf);
> +            wi::add (wmax0, cst1, TYPE_SIGN (inner_type), &max_ovf);
> +          }
> +
> +        /* Inner overflow does not matter in this case.  */
> +        if (ovf_undef_inner)
> +          {
> +            min_ovf = wi::OVF_NONE;
> +            max_ovf = wi::OVF_NONE;
> +          }
> +
> +        /* Extend CST from INNER_TYPE to TYPE.  */
> +        cst1 = cst1.from (cst1, TYPE_PRECISION (type), TYPE_SIGN (inner_type));
> +
> +        /* Check for overflow of (TYPE)(CST1 + CST2).  */
> +        wi::overflow_type outer_ovf = wi::OVF_OVERFLOW;
> +        wide_int cst = wi::add (cst1, wi::to_wide (@2), TYPE_SIGN (type),
> +            &outer_ovf);
> +
> +        /* We *do* care about an overflow here as we do not want to introduce
> +           new undefined behavior that was not there before.  */
> +        if (ovf_undef_outer && outer_ovf)
> +          {
> +            /* Set these here to prevent the final conversion below
> +               to take place instead of introducing a new guard variable.  */
> +            min_ovf = wi::OVF_OVERFLOW;
> +            max_ovf = wi::OVF_OVERFLOW;
> +          }
> +       }
> +   (if (min_ovf == wi::OVF_NONE && max_ovf == wi::OVF_NONE)
> +    (plus (convert @0) { wide_int_to_tree (type, cst); }
> +     )))))
> +#endif
> +
> +/* ((T)(A)) + CST -> (T)(A + CST)  */

But then this is the reverse... (as Marc already noticed).

So - what are you really after? (sorry if I don't remeber, testcase(s)
are missing
from this patch)

To me it seems that 1) loses information if A + CST was done in a signed type
and we know that overflow doesn't happen because of that.  For the reverse
transformation we don't.  Btw, if you make A == A' + CST' then
you get (T)A + CST -> (T)(A' + CST' + CST) which is again trivially handled
so why do you need both transforms again?

> +#if GIMPLE
> +  (simplify
> +   (plus (convert SSA_NAME@0) INTEGER_CST@1)
> +    (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> +         && INTEGRAL_TYPE_P (type)
> +         && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> +         && int_fits_type_p (@1, TREE_TYPE (@0)))
> +     /* Perform binary operation inside the cast if the constant fits
> +        and (A + CST)'s range does not overflow.  */
> +     (with
> +      {
> +       wi::overflow_type min_ovf = wi::OVF_OVERFLOW,
> +                         max_ovf = wi::OVF_OVERFLOW;
> +        tree inner_type = TREE_TYPE (@0);
> +
> +        wide_int w1 = w1.from (wi::to_wide (@1), TYPE_PRECISION (inner_type),
> +           TYPE_SIGN (inner_type));
> +
> +        wide_int wmin0, wmax0;
> +        if (get_range_info (@0, &wmin0, &wmax0) == VR_RANGE)
> +          {
> +            wi::add (wmin0, w1, TYPE_SIGN (inner_type), &min_ovf);
> +            wi::add (wmax0, w1, TYPE_SIGN (inner_type), &max_ovf);
> +          }
> +      }
> +     (if (min_ovf == wi::OVF_NONE && max_ovf == wi::OVF_NONE)
> +      (convert (plus @0 { {wide_int_to_tree (TREE_TYPE (@0), w1)}; })))
> +     )))
> +#endif
> +
>    /* ~A + A -> -1 */
>    (simplify
>     (plus:c (bit_not @0) @0)
> --
> 2.17.0
>
Robin Dapp Aug. 16, 2019, 12:17 p.m. UTC | #8
> So - what are you really after? (sorry if I don't remeber, testcase(s)
> are missing
> from this patch)
> 
> To me it seems that 1) loses information if A + CST was done in a signed type
> and we know that overflow doesn't happen because of that.  For the reverse
> transformation we don't.  Btw, if you make A == A' + CST' then
> you get (T)A + CST -> (T)(A' + CST' + CST) which is again trivially handled
> so why do you need both transforms again?

As far as I know the second transformation resulted from some suggestion
to split what was the status back then.  Some things have changed in
match.pd so I checked again:

What I want to achieve in the end is getting rid of
 add r1,-1
 extend r1
 add r1,1.
emitted via doloop.

We have
(1)       (T)(A + CST1) + CST2 -> (T)(A) + (T)CST1 + CST2
(2)       (T)(A) + (T)CST1 + CST2 -> (T)(A) + (T)(CST1 + CST2)
(2better) (A +- CST1) +- CST2 -> A + CST3 (the one Marc mentioned)
(3)       (T)(A) + CST -> T(A + CST)

Considering
 unsigned long foo(unsigned int a)
 {
   if (a > 3 && a < INT_MAX - 100)
     return (unsigned long)(a - 1) + 1;
 }.

This results in s390 assembly
 ahi	%r2,-1
 llgfr	%r2,%r2
 aghi	%r2,1
 br     %r14.

With the second transform it becomes just
 br     %r14
due to

(3) Applying pattern match.pd:2091, gimple-match.c:36515
    Matching expression match.pd:111, gimple-match.c:65
(2better) Applying pattern match.pd:1980, gimple-match.c:1011

in evrp.

I believe when I last tried long time ago it would not work that way but
now it does, so (3) is basically an enabler of (2better).  As far as I
can tell (2better) does not make use of range information but I didn't
really now why it works the way it does using (3).


Now for

 unsigned long bar(unsigned int a)
 {
   if (a > 3 && a < INT_MAX - 100)
     return (unsigned long)(a + 1) + ULONG_MAX;
 }

we have

Folded into: _2 = a_7 + 1;
Folding statement: _3 = (long unsigned int) _2;
Folded into: _3 = (long unsigned int) _2;
Folding statement: _6 = _3 + 18446744073709551615;
(1) Applying pattern match.pd:2060, gimple-match.c:36437

in vrp1.

This is not being handled by (2alt) but by (1).  The same applies for
long/int instead of ulong/uint except that we don't need range
information there.

Now, when removing (2) from my patch I run into nasty out-of-mem errors
during bootstrap which I didn't find the reason for yet.  I thought of
something like two match.md patterns that keep reverting the other's
result but I think we have a recursion depth check in place there.

> Also do
> 
>   if ((did_replace || fold_all_stmts)
>      && fold_stmt (...))
>    {
>    }
> 
> to avoid extra work when folding does nothing.

Does this work like this? We want to mark the statement as modified even
if folding wasn't successful, I guess, and the gimple_set_modified ()
call should not be that expensive? Could do a separate

 if (fold_all_stmts && fold_stmt (...))

of course.

Regards
 Robin
Richard Biener Aug. 19, 2019, 1:45 p.m. UTC | #9
On Fri, Aug 16, 2019 at 2:17 PM Robin Dapp <rdapp@linux.ibm.com> wrote:
>
> > So - what are you really after? (sorry if I don't remeber, testcase(s)
> > are missing
> > from this patch)
> >
> > To me it seems that 1) loses information if A + CST was done in a signed type
> > and we know that overflow doesn't happen because of that.  For the reverse
> > transformation we don't.  Btw, if you make A == A' + CST' then
> > you get (T)A + CST -> (T)(A' + CST' + CST) which is again trivially handled
> > so why do you need both transforms again?
>
> As far as I know the second transformation resulted from some suggestion
> to split what was the status back then.  Some things have changed in
> match.pd so I checked again:
>
> What I want to achieve in the end is getting rid of
>  add r1,-1
>  extend r1
>  add r1,1.
> emitted via doloop.
>
> We have
> (1)       (T)(A + CST1) + CST2 -> (T)(A) + (T)CST1 + CST2

Here + CST2 isn't part of the transform - is it just to restrict the
"first half" to a specific subset of cases?

> (2)       (T)(A) + (T)CST1 + CST2 -> (T)(A) + (T)(CST1 + CST2)

Likewise for (T)(A).

> (2better) (A +- CST1) +- CST2 -> A + CST3 (the one Marc mentioned)

But that's (1) with A == (T)(A' + CST1)

> (3)       (T)(A) + CST -> T(A + CST)
>
> Considering
>  unsigned long foo(unsigned int a)
>  {
>    if (a > 3 && a < INT_MAX - 100)
>      return (unsigned long)(a - 1) + 1;

So you want to either compute everything
in unsigned long or everything in unsigned int.
IMHO unsigned int is preferable.  So that
asks for

 (T)A + 1 -> (T)(A + 1)

if T is wider.  We are already performing the reverse
of T is narrower, (T)(A + 1) -> (T)A + 1 (IIRC).

>  }.
>
> This results in s390 assembly
>  ahi    %r2,-1
>  llgfr  %r2,%r2
>  aghi   %r2,1
>  br     %r14.
>
> With the second transform it becomes just
>  br     %r14
> due to
>
> (3) Applying pattern match.pd:2091, gimple-match.c:36515
>     Matching expression match.pd:111, gimple-match.c:65
> (2better) Applying pattern match.pd:1980, gimple-match.c:1011
>
> in evrp.
>
> I believe when I last tried long time ago it would not work that way but
> now it does, so (3) is basically an enabler of (2better).  As far as I
> can tell (2better) does not make use of range information but I didn't
> really now why it works the way it does using (3).
>
>
> Now for
>
>  unsigned long bar(unsigned int a)
>  {
>    if (a > 3 && a < INT_MAX - 100)
>      return (unsigned long)(a + 1) + ULONG_MAX;

Here we can't perform the add in the narrower type.  But
we do not want, generally, (T)(a + CST) -> (T)A + CST
when T is a widening conversion.  So for this case it makes
sense to restrict it to the case where we can combine the
constants (but I wonder if this needs to be done in a pass
like reassoc to be effective).

So - which case is it?  IIRC we want to handle small signed
constants but the code can end up unsigned.  For the
above we could write (unsigned long)((int)a + 1 - 1) and thus
sign-extend?  Or even avoid this if we know the range.
That is, it becomes the first case again (operation performed
in a smaller type).

>  }
>
> we have
>
> Folded into: _2 = a_7 + 1;
> Folding statement: _3 = (long unsigned int) _2;
> Folded into: _3 = (long unsigned int) _2;
> Folding statement: _6 = _3 + 18446744073709551615;
> (1) Applying pattern match.pd:2060, gimple-match.c:36437
>
> in vrp1.
>
> This is not being handled by (2alt) but by (1).  The same applies for
> long/int instead of ulong/uint except that we don't need range
> information there.
>
> Now, when removing (2) from my patch I run into nasty out-of-mem errors
> during bootstrap which I didn't find the reason for yet.  I thought of
> something like two match.md patterns that keep reverting the other's
> result but I think we have a recursion depth check in place there.
>
> > Also do
> >
> >   if ((did_replace || fold_all_stmts)
> >      && fold_stmt (...))
> >    {
> >    }
> >
> > to avoid extra work when folding does nothing.
>
> Does this work like this? We want to mark the statement as modified even
> if folding wasn't successful, I guess, and the gimple_set_modified ()
> call should not be that expensive? Could do a separate
>
>  if (fold_all_stmts && fold_stmt (...))

Sure, I was asking for a way to make fold_stmt conditional, it's of course
not 100% as simple as I quoted.

Richard.

> of course.
>
> Regards
>  Robin
>
Robin Dapp Aug. 20, 2019, 8:37 a.m. UTC | #10
> So - which case is it?  IIRC we want to handle small signed
> constants but the code can end up unsigned.  For the
> above we could write (unsigned long)((int)a + 1 - 1) and thus
> sign-extend?  Or even avoid this if we know the range.
> That is, it becomes the first case again (operation performed
> in a smaller type).

Both :) But in order to move forward the second transform suffices to
get rid of the redundant subtraction/addition of the ivopts candidate.

Attached is a new version as single file that disregards the other
possible cases (3/3 from before), only performs this transform and
checks for gimple_simplified in vrp2 after ivopts now.

Bootstrapped, no regressions on s390 and x86.

Regards
 Robin
Richard Biener Aug. 20, 2019, 1:01 p.m. UTC | #11
On Tue, Aug 20, 2019 at 10:37 AM Robin Dapp <rdapp@linux.ibm.com> wrote:
>
> > So - which case is it?  IIRC we want to handle small signed
> > constants but the code can end up unsigned.  For the
> > above we could write (unsigned long)((int)a + 1 - 1) and thus
> > sign-extend?  Or even avoid this if we know the range.
> > That is, it becomes the first case again (operation performed
> > in a smaller type).
>
> Both :) But in order to move forward the second transform suffices to
> get rid of the redundant subtraction/addition of the ivopts candidate.
>
> Attached is a new version as single file that disregards the other
> possible cases (3/3 from before), only performs this transform and
> checks for gimple_simplified in vrp2 after ivopts now.

          gimple_set_modified (stmt, true);
        }

+      /* Also fold if we want to fold all statements.  */
+      if (!did_replace && substitute_and_fold_engine->fold_all_stmts
+         && fold_stmt (&i, follow_single_use_edges))
+       {
+         did_replace = true;
+         stmt = gsi_stmt (i);
+         gimple_set_modified (stmt, true);

make it

       } else if (substitute_and_fold_engine->fold_all_stmts
                    && fold_stmt (&...

also please make initialization of fold_all_stmts via
a new constructor argument defaulted to false and
elide the set_fold_all_stmts method.

Are the various testcase adjustments still needed
(-fno-tree-ccp, etc.)?  I guess not.

+         && int_fits_type_p (@1, TREE_TYPE (@0)))

while this properly guards against bogus truncation
of @1 I think it says false to -1ul fitting int.  So
we wouldn't handle (unsigned long)intvar + -1ul.
But it's conservatively correct for now.  Eventually
changing it to wi::min_precision (wi::to_wide (@1),
TYPE_SIGN (TREE_TYPE (@0))) <= TYPE_PRECISION (TREE_TYPE (@0))
might work for that special case.

> Bootstrapped, no regressions on s390 and x86.

OK with the first suggested change and the testcase changes
audited.  We can do the rest as followup.

Thanks,
Richard.

> Regards
>  Robin
Robin Dapp Aug. 21, 2019, 3:05 p.m. UTC | #12
I'm going to commit the attached two patches.  Removed the redundant
changes in test cases and added constructor initialization of
fold_all_stmts.

Regards
 Robin

--

gcc/ChangeLog:

2019-08-21  Robin Dapp  <rdapp@linux.ibm.com>

	* gimple-loop-versioning.cc (loop_versioning::record_address_fragment):
	Add nop_convert case.
	* tree-ssa-propagate.c
(substitute_and_fold_dom_walker::before_dom_children):
	Fold all statements if requested.
	* tree-ssa-propagate.h (class substitute_and_fold_engine):
	Allow to fold all statements.
	* tree-vrp.c (class vrp_folder):
	Let substitute_and_fold_engine fold all statements.

gcc/fortran/ChangeLog:

2019-08-21  Robin Dapp  <rdapp@linux.ibm.com>

	* trans-intrinsic.c (gfc_conv_intrinsic_findloc): Initialize
	to prevent "maybe used uninitialized".

--

gcc/ChangeLog:

2019-08-21  Robin Dapp  <rdapp@linux.ibm.com>

	* match.pd: Add (T)(A) + CST -> (T)(A + CST).

gcc/testsuite/ChangeLog:

2019-08-21  Robin Dapp  <rdapp@linux.ibm.com>

	* gcc.dg/tree-ssa/copy-headers-5.c: Do not run vrp pass.
	* gcc.dg/tree-ssa/copy-headers-7.c: Do not run vrp pass.
	* gcc.dg/tree-ssa/loop-15.c: Remove XFAIL.
	* gcc.dg/tree-ssa/pr23744.c: Change search pattern.
	* gcc.dg/wrapped-binop-simplify.c: New test.
diff mbox series

Patch

diff --git a/gcc/match.pd b/gcc/match.pd
index 0317bc704f7..94400529ad8 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2020,6 +2020,107 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
     (if (cst && !TREE_OVERFLOW (cst))
      (plus { cst; } @0))))
 
+/* ((T)(A + CST1)) + CST2 -> (T)(A) + CST  */
+#if GIMPLE
+  (simplify
+    (plus (convert (plus @0 INTEGER_CST@1)) INTEGER_CST@2)
+      (if (INTEGRAL_TYPE_P (type)
+           && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)))
+       /* We actually want to perform two simplifications here:
+	  (1) (T)(A + CST1) + CST2  --> (T)(A) + (T)(CST1)
+	      If for (A + CST1) we either do not care about overflow (e.g.
+	      for a signed inner type) or the overflow is ok for an unsigned
+	      inner type.
+	  (2) (T)(A) + (T)(CST1) + CST2 --> (T)(A) + (T)(CST1 + CST2)
+	      If (CST1 + CST2) does not overflow and we do care about overflow
+	      (for a signed outer type) or we do not care about overflow in an
+	      unsigned outer type.  */
+       (with
+       {
+         tree inner_type = TREE_TYPE (@0);
+         wide_int wmin0, wmax0;
+         wide_int cst1 = wi::to_wide (@1);
+
+	 wi::overflow_type min_ovf = wi::OVF_OVERFLOW,
+			   max_ovf = wi::OVF_OVERFLOW;
+
+	 /* Get overflow behavior.  */
+         bool ovf_undef_inner = TYPE_OVERFLOW_UNDEFINED (inner_type);
+         bool ovf_undef_outer = TYPE_OVERFLOW_UNDEFINED (type);
+
+	 /* Get value range of A.  */
+         enum value_range_kind vr0 = get_range_info (@0, &wmin0, &wmax0);
+
+	 /* If we have a proper range, determine min and max overflow
+	    of (A + CST1).
+	    ??? We might also want to handle anti ranges.  */
+	 if (vr0 == VR_RANGE)
+	   {
+	     wi::add (wmin0, cst1, TYPE_SIGN (inner_type), &min_ovf);
+	     wi::add (wmax0, cst1, TYPE_SIGN (inner_type), &max_ovf);
+	   }
+
+	 /* Inner overflow does not matter in this case.  */
+	 if (ovf_undef_inner)
+	   {
+	     min_ovf = wi::OVF_NONE;
+	     max_ovf = wi::OVF_NONE;
+	   }
+
+	 /* Extend CST from INNER_TYPE to TYPE.  */
+	 cst1 = cst1.from (cst1, TYPE_PRECISION (type), TYPE_SIGN (inner_type));
+
+	 /* Check for overflow of (TYPE)(CST1 + CST2).  */
+	 wi::overflow_type outer_ovf = wi::OVF_OVERFLOW;
+	 wide_int cst = wi::add (cst1, wi::to_wide (@2), TYPE_SIGN (type),
+	     &outer_ovf);
+
+	 /* We *do* care about an overflow here as we do not want to introduce
+	    new undefined behavior that was not there before.  */
+	 if (ovf_undef_outer && outer_ovf)
+	   {
+	     /* Set these here to prevent the final conversion below
+		to take place instead of introducing a new guard variable.  */
+	     min_ovf = wi::OVF_OVERFLOW;
+	     max_ovf = wi::OVF_OVERFLOW;
+	   }
+       }
+   (if (min_ovf == wi::OVF_NONE && max_ovf == wi::OVF_NONE)
+    (plus (convert @0) { wide_int_to_tree (type, cst); }
+     )))))
+#endif
+
+/* ((T)(A)) + CST -> (T)(A + CST)  */
+#if GIMPLE
+  (simplify
+   (plus (convert SSA_NAME@0) INTEGER_CST@1)
+    (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+         && INTEGRAL_TYPE_P (type)
+         && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
+         && int_fits_type_p (@1, TREE_TYPE (@0)))
+     /* Perform binary operation inside the cast if the constant fits
+        and (A + CST)'s range does not overflow.  */
+     (with
+      {
+	wi::overflow_type min_ovf = wi::OVF_OVERFLOW,
+			  max_ovf = wi::OVF_OVERFLOW;
+        tree inner_type = TREE_TYPE (@0);
+
+        wide_int w1 = w1.from (wi::to_wide (@1), TYPE_PRECISION (inner_type),
+	    TYPE_SIGN (inner_type));
+
+        wide_int wmin0, wmax0;
+        if (get_range_info (@0, &wmin0, &wmax0) == VR_RANGE)
+          {
+            wi::add (wmin0, w1, TYPE_SIGN (inner_type), &min_ovf);
+            wi::add (wmax0, w1, TYPE_SIGN (inner_type), &max_ovf);
+          }
+      }
+     (if (min_ovf == wi::OVF_NONE && max_ovf == wi::OVF_NONE)
+      (convert (plus @0 { {wide_int_to_tree (TREE_TYPE (@0), w1)}; })))
+     )))
+#endif
+
   /* ~A + A -> -1 */
   (simplify
    (plus:c (bit_not @0) @0)