Patchwork [WIP] Re: Inefficient end-of-loop value computation - missed optimization somewhere?

login
register
mail settings
Submitter Ulrich Weigand
Date March 8, 2012, 2:29 p.m.
Message ID <201203081429.q28ETB5T000824@d06av02.portsmouth.uk.ibm.com>
Download mbox | patch
Permalink /patch/145536/
State New
Headers show

Comments

Ulrich Weigand - March 8, 2012, 2:29 p.m.
Richard Guenther wrote:
> On Tue, Feb 28, 2012 at 4:10 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
> > I'll still need to do proper testing and benchmarking, but I thought
> > I'd post the patch anyway just as a heads-up ...
> >
> > Does this look reasonable to you?
> 
> Yes, that looks reasonable.  Though instead of unsigned_type_for
> please use build_nonstandard_integer_type instead (if the type
> is not already unsigned).  unsigned_type_for does not necessarily
> preserve type-precision and may even return NULL.

OK, I've changed the patch to use build_nonstandard_integer_type, and it
still fixes the original problem.  However, on further examination it
turns out that this "fix" is not because the expression is actually
simplified at tree level.  Rather, it is just that because of some
minor reassociation (the +1 is moved to the end), the *RTL* optimizers
now see a (just as complex but) slightly different RTL sequence, and
therefore combine now just happens to be able to fold everything
together (using the RTL-level simplify_plus_minus machinery).

Even worse, the patch causes a number of regressions:

> FAIL: gcc.dg/tree-ssa/loop-15.c scan-tree-dump-times optimized "\\+" 0
> FAIL: gcc.dg/tree-ssa/loop-15.c scan-tree-dump-times optimized "n_. \\* n_." 1
> FAIL: gcc.dg/vect/no-scevccp-noreassoc-outer-4.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1


The loop-15.c problem is probably just a missed optimization
elsewhere that can be fixed.  The problem is that a loop-end
value used to be computed as "(n - 1) * n + n" which got
simplified via fold_plusminus_mult_expr to "n * n".  When
computing in unsigned, however, the expression becomes something
along the lines of

  "[...] * (unsigned) n + (unsigned) n"

and the top-level of fold_binary_loc removes the *outermost* NOP_EXPRs
resulting in

  "[...] * (unsigned) n + n"

which fold_plusminus_mult_expr no longer recognizes as something
it can handle (since "(unsigned) n" is not operand_equal_p to "n").

I think this can be fixed by having fold_plusminus_mult_expr call
STRIP_NOPS on sub-operands, as is already done elsewhere in fold_binary_loc.

This basically fixes the test case (due to some remaining casts, the
scan-tree-dump patterns still don't match as-is, but the expected code
is again generated).


The no-scevccp-noreassoc-outer-4.c problem is more difficult, and would
seem to point to a fundamental problem with performing the computation
in unsigned mode.  In this case, the end value of an inner loop forms
part of a scalar evolution of the outer loop.  When computing the end
value in unsigned type, that scalar evolution is no longer detected.
In part, this is because the scalar evolution machinery might need to
be improved in handling casts.  In particular, in some places it only
does STRIP_USELESS_TYPE_CONVERSION; changing those to STRIP_NOPS makes
it detect the evolution again.  But I'm not fully convinced this is
a safe change ...   In any case, this still doesn't make the outer
loop vectorizable, since it is no longer provably finite.  Without
the patch, this could be proven *because* the computation of the
inner loop's end value used signed arithmetic which cannot overflow.
By performing the computation in unsigned, this information is in
fact lost.


This would seem to indicate that unconditionally using unsigned
arithmetic even if not strictly necessary might not always be
the best solution either.  I've gone back and looked that at the
original problem that

  (start + 1) + (int)((unsigned int) ~start + (unsigned int) end)

is not simplified by fold-const.

Interestingly enough, it *is* valid to simplify this expression
to just plain "end", even with signed arithmetic, since the
transformation does not introduce any new potential overflows.

This is actually recognized in fold_binary_loc, but only one special
case.  See the code around the following comment:
                  /* The only case we can still associate with two variables
                     is if they are the same, modulo negation.  */

Unfortunately, this handles only A and -A; it doesn't recognize that
A+1 is in fact the negation of ~A ...

There is also code in tree-ssa-forwprop.c:associate_plusminus that
handles quite a number of special cases where re-association *is*
allowed even for signed arithmetic:

        (A +- B) - A       ->  +- B
        (A +- B) -+ B      ->  A
        (CST +- A) +- CST  ->  CST +- A
        (A + CST) +- CST   ->  A + CST
        ~A + A             ->  -1
        ~A + 1             ->  -A 
        A - (A +- B)       ->  -+ B
        A +- (B +- A)      ->  +- B
        CST +- (CST +- A)  ->  CST +- A
        CST +- (A +- CST)  ->  CST +- A
        A + ~A             ->  -1

This handles some cases involving ~, but still not quite the case
we need for this expression.   In addition, the forward propagtion logic
doesn't seem to handle casts very well (as opposed to fold-const, which
makes liberal use of STRIP_NOPS).


I'm wondering where to go from there:

- Why are those special re-association cases handled only in forwprop,
  and not in fold-cost?   I would have expected forwprop to just propagate
  subexpressions into a tree and then use fold-const to actually simplify ...

- Should I try to improve forwprop to handle casts and additional re-
  association cases until it handles the above expression?

- Or should the logic in fold-const be improved to add additional cases
  of re-association?

Thanks for any further suggestions!


I've attached some of the changes mentioned above for reference.

Bye,
Ulrich


Patch: Compute loop-end value using unsigned type.
Richard Guenther - March 12, 2012, 10:56 a.m.
On Thu, Mar 8, 2012 at 3:29 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
> Richard Guenther wrote:
>> On Tue, Feb 28, 2012 at 4:10 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
>> > I'll still need to do proper testing and benchmarking, but I thought
>> > I'd post the patch anyway just as a heads-up ...
>> >
>> > Does this look reasonable to you?
>>
>> Yes, that looks reasonable.  Though instead of unsigned_type_for
>> please use build_nonstandard_integer_type instead (if the type
>> is not already unsigned).  unsigned_type_for does not necessarily
>> preserve type-precision and may even return NULL.
>
> OK, I've changed the patch to use build_nonstandard_integer_type, and it
> still fixes the original problem.  However, on further examination it
> turns out that this "fix" is not because the expression is actually
> simplified at tree level.  Rather, it is just that because of some
> minor reassociation (the +1 is moved to the end), the *RTL* optimizers
> now see a (just as complex but) slightly different RTL sequence, and
> therefore combine now just happens to be able to fold everything
> together (using the RTL-level simplify_plus_minus machinery).
>
> Even worse, the patch causes a number of regressions:
>
>> FAIL: gcc.dg/tree-ssa/loop-15.c scan-tree-dump-times optimized "\\+" 0
>> FAIL: gcc.dg/tree-ssa/loop-15.c scan-tree-dump-times optimized "n_. \\* n_." 1
>> FAIL: gcc.dg/vect/no-scevccp-noreassoc-outer-4.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
>
>
> The loop-15.c problem is probably just a missed optimization
> elsewhere that can be fixed.  The problem is that a loop-end
> value used to be computed as "(n - 1) * n + n" which got
> simplified via fold_plusminus_mult_expr to "n * n".  When
> computing in unsigned, however, the expression becomes something
> along the lines of
>
>  "[...] * (unsigned) n + (unsigned) n"
>
> and the top-level of fold_binary_loc removes the *outermost* NOP_EXPRs
> resulting in
>
>  "[...] * (unsigned) n + n"
>
> which fold_plusminus_mult_expr no longer recognizes as something
> it can handle (since "(unsigned) n" is not operand_equal_p to "n").
>
> I think this can be fixed by having fold_plusminus_mult_expr call
> STRIP_NOPS on sub-operands, as is already done elsewhere in fold_binary_loc.
>
> This basically fixes the test case (due to some remaining casts, the
> scan-tree-dump patterns still don't match as-is, but the expected code
> is again generated).
>
>
> The no-scevccp-noreassoc-outer-4.c problem is more difficult, and would
> seem to point to a fundamental problem with performing the computation
> in unsigned mode.  In this case, the end value of an inner loop forms
> part of a scalar evolution of the outer loop.  When computing the end
> value in unsigned type, that scalar evolution is no longer detected.
> In part, this is because the scalar evolution machinery might need to
> be improved in handling casts.  In particular, in some places it only
> does STRIP_USELESS_TYPE_CONVERSION; changing those to STRIP_NOPS makes
> it detect the evolution again.  But I'm not fully convinced this is
> a safe change ...   In any case, this still doesn't make the outer
> loop vectorizable, since it is no longer provably finite.  Without
> the patch, this could be proven *because* the computation of the
> inner loop's end value used signed arithmetic which cannot overflow.
> By performing the computation in unsigned, this information is in
> fact lost.
>
>
> This would seem to indicate that unconditionally using unsigned
> arithmetic even if not strictly necessary might not always be
> the best solution either.  I've gone back and looked that at the
> original problem that
>
>  (start + 1) + (int)((unsigned int) ~start + (unsigned int) end)
>
> is not simplified by fold-const.
>
> Interestingly enough, it *is* valid to simplify this expression
> to just plain "end", even with signed arithmetic, since the
> transformation does not introduce any new potential overflows.
>
> This is actually recognized in fold_binary_loc, but only one special
> case.  See the code around the following comment:
>                  /* The only case we can still associate with two variables
>                     is if they are the same, modulo negation.  */
>
> Unfortunately, this handles only A and -A; it doesn't recognize that
> A+1 is in fact the negation of ~A ...
>
> There is also code in tree-ssa-forwprop.c:associate_plusminus that
> handles quite a number of special cases where re-association *is*
> allowed even for signed arithmetic:
>
>        (A +- B) - A       ->  +- B
>        (A +- B) -+ B      ->  A
>        (CST +- A) +- CST  ->  CST +- A
>        (A + CST) +- CST   ->  A + CST
>        ~A + A             ->  -1
>        ~A + 1             ->  -A
>        A - (A +- B)       ->  -+ B
>        A +- (B +- A)      ->  +- B
>        CST +- (CST +- A)  ->  CST +- A
>        CST +- (A +- CST)  ->  CST +- A
>        A + ~A             ->  -1
>
> This handles some cases involving ~, but still not quite the case
> we need for this expression.   In addition, the forward propagtion logic
> doesn't seem to handle casts very well (as opposed to fold-const, which
> makes liberal use of STRIP_NOPS).
>
>
> I'm wondering where to go from there:
>
> - Why are those special re-association cases handled only in forwprop,
>  and not in fold-cost?   I would have expected forwprop to just propagate
>  subexpressions into a tree and then use fold-const to actually simplify ...

We are moving away from GENERIC expression simplification towards
simplifications/combines on GIMPLE SSA (Andrew Pinski is working
on this a lot on his branch at the moment).

> - Should I try to improve forwprop to handle casts and additional re-
>  association cases until it handles the above expression?

Yes, ideally by trying to sub-divide this task into separate profitable
transforms.  Maybe Andrew can chime in here?

> - Or should the logic in fold-const be improved to add additional cases
>  of re-association?

I'd say no.

> Thanks for any further suggestions!
>
>
> I've attached some of the changes mentioned above for reference.
>
> Bye,
> Ulrich
>
>
> Patch: Compute loop-end value using unsigned type.
>
> Index: gcc/tree-scalar-evolution.c
> ===================================================================
> --- gcc/tree-scalar-evolution.c (revision 184625)
> +++ gcc/tree-scalar-evolution.c (working copy)
> @@ -474,11 +474,22 @@
>            return chrec_dont_know;
>          else
>            {
> +             tree type = TREE_TYPE (evolution_fn), utype = type;
>              tree res;
>
> +             /* Since the number of iterations is always unsigned, we perform
> +                the computation in an unsigned type (if feasible) to allow for
> +                improved simplification of subexpressions.  */
> +             if ((INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type))
> +                 || POINTER_TYPE_P (type))
> +               utype = build_nonstandard_integer_type
> +                         (TYPE_PRECISION (type), 1);
> +
>              /* evolution_fn is the evolution function in LOOP.  Get
>                 its value in the nb_iter-th iteration.  */
> -             res = chrec_apply (inner_loop->num, evolution_fn, nb_iter);
> +             res = chrec_convert (utype, evolution_fn, NULL);
> +             res = chrec_apply (inner_loop->num, res, nb_iter);
> +             res = chrec_convert (type, res, NULL);
>
>              if (chrec_contains_symbols_defined_in_loop (res, loop->num))
>                res = instantiate_parameters (loop, res);
>
>
>
> Patch: Use STRIP_NOPS on subexpressions in fold_plusminus_mult_expr
>
> Index: gcc/fold-const.c
> ===================================================================
> --- gcc/fold-const.c    (revision 184625)
> +++ gcc/fold-const.c    (working copy)
> @@ -7081,6 +7081,8 @@
>     {
>       arg00 = TREE_OPERAND (arg0, 0);
>       arg01 = TREE_OPERAND (arg0, 1);
> +      STRIP_NOPS (arg00);
> +      STRIP_NOPS (arg01);
>     }
>   else if (TREE_CODE (arg0) == INTEGER_CST)
>     {
> @@ -7099,6 +7101,8 @@
>     {
>       arg10 = TREE_OPERAND (arg1, 0);
>       arg11 = TREE_OPERAND (arg1, 1);
> +      STRIP_NOPS (arg10);
> +      STRIP_NOPS (arg11);
>     }
>   else if (TREE_CODE (arg1) == INTEGER_CST)
>     {
>
>
>
> Patch: Use STRIP_NOPS when following scalar evolutions
>
> Index: gcc/tree-scalar-evolution.c
> ===================================================================
> --- gcc/tree-scalar-evolution.c (revision 184625)
> +++ gcc/tree-scalar-evolution.c (working copy)
> @@ -1154,8 +1165,8 @@
>       rhs0 = TREE_OPERAND (expr, 0);
>       rhs1 = TREE_OPERAND (expr, 1);
>       type = TREE_TYPE (rhs0);
> -      STRIP_USELESS_TYPE_CONVERSION (rhs0);
> -      STRIP_USELESS_TYPE_CONVERSION (rhs1);
> +      STRIP_NOPS (rhs0);
> +      STRIP_NOPS (rhs1);
>       res = follow_ssa_edge_binary (loop, at_stmt, type, rhs0, code, rhs1,
>                                    halting_phi, evolution_of_loop, limit);
>       break;
> @@ -1168,8 +1179,8 @@
>          rhs0 = TREE_OPERAND (expr, 0);
>          rhs1 = TREE_OPERAND (expr, 1);
>          type = TREE_TYPE (rhs0);
> -         STRIP_USELESS_TYPE_CONVERSION (rhs0);
> -         STRIP_USELESS_TYPE_CONVERSION (rhs1);
> +         STRIP_NOPS (rhs0);
> +         STRIP_NOPS (rhs1);
>          res = follow_ssa_edge_binary (loop, at_stmt, type,
>                                        rhs0, POINTER_PLUS_EXPR, rhs1,
>                                        halting_phi, evolution_of_loop, limit);
>
>
>
> --
>  Dr. Ulrich Weigand
>  GNU Toolchain for Linux on System z and Cell BE
>  Ulrich.Weigand@de.ibm.com
>
Richard Guenther - May 22, 2012, 11:14 a.m.
On Fri, May 18, 2012 at 10:27 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
> Richard Guenther wrote:
>> On Thu, Mar 8, 2012 at 3:29 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
>> > - Should I try to improve forwprop to handle casts and additional re-
>> > association cases until it handles the above expression?
>>
>> Yes, ideally by trying to sub-divide this task into separate profitable
>> transforms.  Maybe Andrew can chime in here?
>
> I finally got some time to look into this in detail.  The various special-
> case transforms in associate_plusminus all transform a plus/minus expression
> tree into either a single operand, a negated operand, or a single plus or
> minus of two operands.  This is valid as long as we can prove that the
> newly introduced expression can never overflow (if we're doing signed
> arithmetic).  This is true in those special cases due to either:
>
>  - If all sub-expressions of the contracted plus/minus tree are themselves
>   performed in signed arithmetic and thus are guaranteed to never overflow,
>   their result equals the "true" arithmetic result if we were to perform
>   the computation in the actual integers with unlimited precision.
>
>   Now, "true" arithmetic is associative.  Thus, if we re-associate the
>   expression tree such that everything cancels out and just a single
>   operation A +/- B (or -A) remains, the "true" result of that operation
>   must equal the true result of the original expression -- which means
>   in particular that it lies within the range of the underlying data
>   type, and hence that final operation itself cannot overflow.
>
>  - In the case of introducing a negation, we only need to be sure that
>   its operand can never be the minimum value of its type range.  This
>   can on occasion be proven via a form of value range tracking.
>
>   For example, when performing the transformation ~A + 1 --> -A, we note
>   that ~A cannot be INT_MAX, since we can add 1 to it without overflow;
>   and therefore A cannot be INT_MIN.
>
>
> Now, using those two rules, we can actually prove many more cases where
> reassociation is possible.  For example, we can transform (A + (B + C)) - C
> to A + B  (due to the first rule).   We can also transform the original
> expression resulting from end-of-loop value computation
>   (A + 1) + (int) ((unsigned) ~A + (unsigned) B)
> into just "B" -- this can never overflow since we aren't even introducing
> any new expression.
>
> The following patch rewrites associate_plusminus to remove all the
> explicitly coded special cases, and instead performs a scan of the
> plus/minus tree similar to what is done in tree-ssa-reassoc (and also
> in simplify-rtx for that matter).  If this results in an expression
> tree that collapses to just a single operand, or just a single newly
> introduced operation, and -in the latter case- one of the two rules
> above ensure the new operation is safe, the transformation is performed.
>
> This still passes all reassoc tests, and in fact allows to remove XFAILs
> from two of them.  It also catches the end-of-loop value computation case.
>
> Tested on i386-linux with no regressions.
>
> OK for mainline?

The point of the special-cases in forwprop was to make them fast to
detect - forwprop should be a pattern-matching thing, much like
combine on RTL.

So, instead of changing forwprop this way can you adjust tree-ssa-reassoc.c
to (again) associate !TYPE_OVERFLOW_WRAPS operations but make
sure we throw away results that would possibly introduce undefined overflow
for !TYPE_OVERFLOW_WRAPS types?  There is a reassoc pass after
loop optimizations, so that should fix it as well, no?

Thanks,
Richard.

> Bye,
> Ulrich
>
>
> ChangeLog:
>
>        gcc/
>        * tree-ssa-forwprop.c (enum plus_minus_op_range,
>        struct plus_minus_op_data, struct plus_minus_data): New types.
>        (next_plus_minus_op, remove_plus_minus_op, add_plus_minus_op,
>        add_plus_minus_ops, setup_plus_minus_ops): New functions.
>        (associate_plusminus): Reimplement.
>
>        gcc/testsuite/
>        * gcc.dg/tree-ssa/reassoc-2.c: Remove XFAIL.
>        * gcc.dg/tree-ssa/reassoc-9.c: Update test, remove XFAIL.
>        * gcc.dg/tree-ssa/reassoc-23.c: Update test.
>        * gcc.dg/tree-ssa/reassoc-26.c: New test.
>
>
> Index: gcc/testsuite/gcc.dg/tree-ssa/reassoc-23.c
> ===================================================================
> *** gcc/testsuite/gcc.dg/tree-ssa/reassoc-23.c  (revision 187653)
> --- gcc/testsuite/gcc.dg/tree-ssa/reassoc-23.c  (working copy)
> ***************
> *** 1,5 ****
>  /* { dg-do compile } */
> ! /* { dg-options "-O2 -fdump-tree-reassoc1" } */
>
>  unsigned int
>  foo(unsigned int a, unsigned int b, unsigned int c, unsigned int d,
> --- 1,5 ----
>  /* { dg-do compile } */
> ! /* { dg-options "-O2 -fdump-tree-optimized" } */
>
>  unsigned int
>  foo(unsigned int a, unsigned int b, unsigned int c, unsigned int d,
> *************** foo(unsigned int a, unsigned int b, unsi
> *** 13,17 ****
>    return e;
>  }
>
> ! /* { dg-final { scan-tree-dump-times "= 20" 1 "reassoc1"} } */
> ! /* { dg-final { cleanup-tree-dump "reassoc1" } } */
> --- 13,17 ----
>    return e;
>  }
>
> ! /* { dg-final { scan-tree-dump-times "return 20" 1 "optimized"} } */
> ! /* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/reassoc-26.c
> ===================================================================
> *** gcc/testsuite/gcc.dg/tree-ssa/reassoc-26.c  (revision 0)
> --- gcc/testsuite/gcc.dg/tree-ssa/reassoc-26.c  (revision 0)
> ***************
> *** 0 ****
> --- 1,17 ----
> + /* { dg-do compile } */
> + /* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> + int
> + foo (int start, int end)
> + {
> +   int i;
> +
> +   for (i = start; i < end; i++)
> +     ;
> +
> +   return i;
> + }
> +
> + /* Verify that the end-of-loop value is simplified to just "end".  */
> + /* { dg-final { scan-tree-dump-times "i_. = end_." 1 "optimized"} } */
> + /* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/reassoc-2.c
> ===================================================================
> *** gcc/testsuite/gcc.dg/tree-ssa/reassoc-2.c   (revision 187653)
> --- gcc/testsuite/gcc.dg/tree-ssa/reassoc-2.c   (working copy)
> *************** int b0, b1, b2, b3, b4,e;
> *** 14,20 ****
>    return e;
>  }
>
> ! /* We can't reassociate the expressions due to undefined signed overflow.  */
> !
> ! /* { dg-final { scan-tree-dump-times "return 0" 1 "optimized" { xfail *-*-* } } } */
>  /* { dg-final { cleanup-tree-dump "optimized" } } */
> --- 14,18 ----
>    return e;
>  }
>
> ! /* { dg-final { scan-tree-dump-times "return 0" 1 "optimized" } } */
>  /* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/reassoc-9.c
> ===================================================================
> *** gcc/testsuite/gcc.dg/tree-ssa/reassoc-9.c   (revision 187653)
> --- gcc/testsuite/gcc.dg/tree-ssa/reassoc-9.c   (working copy)
> ***************
> *** 1,5 ****
>  /* { dg-do compile } */
> ! /* { dg-options "-O2 -fdump-tree-reassoc1" } */
>
>  int main(int a, int b, int c, int d, int e, int f, int g, int h)
>  {
> --- 1,5 ----
>  /* { dg-do compile } */
> ! /* { dg-options "-O2 -fdump-tree-optimized" } */
>
>  int main(int a, int b, int c, int d, int e, int f, int g, int h)
>  {
> *************** int main(int a, int b, int c, int d, int
> *** 11,18 ****
>    return e;
>  }
>
> ! /* We can always re-associate to a final constant but the current
> !    implementation does not allow easy roll-back without IL changes.  */
> !
> ! /* { dg-final { scan-tree-dump-times "= 20" 1 "reassoc1" { xfail *-*-* } } } */
> ! /* { dg-final { cleanup-tree-dump "reassoc1" } } */
> --- 11,15 ----
>    return e;
>  }
>
> ! /* { dg-final { scan-tree-dump-times "return 20" 1 "optimized" } } */
> ! /* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/tree-ssa-forwprop.c
> ===================================================================
> *** gcc/tree-ssa-forwprop.c     (revision 187653)
> --- gcc/tree-ssa-forwprop.c     (working copy)
> *************** simplify_bitwise_binary (gimple_stmt_ite
> *** 2188,2462 ****
>  }
>
>
> ! /* Perform re-associations of the plus or minus statement STMT that are
> !    always permitted.  Returns true if the CFG was changed.  */
>
>  static bool
> ! associate_plusminus (gimple_stmt_iterator *gsi)
>  {
> !   gimple stmt = gsi_stmt (*gsi);
> !   tree rhs1 = gimple_assign_rhs1 (stmt);
> !   tree rhs2 = gimple_assign_rhs2 (stmt);
> !   enum tree_code code = gimple_assign_rhs_code (stmt);
> !   bool changed;
>
> !   /* We can't reassociate at all for saturating types.  */
> !   if (TYPE_SATURATING (TREE_TYPE (rhs1)))
> !     return false;
>
> !   /* First contract negates.  */
> !   do
>      {
> !       changed = false;
>
> !       /* A +- (-B) -> A -+ B.  */
> !       if (TREE_CODE (rhs2) == SSA_NAME)
>        {
> !         gimple def_stmt = SSA_NAME_DEF_STMT (rhs2);
> !         if (is_gimple_assign (def_stmt)
> !             && gimple_assign_rhs_code (def_stmt) == NEGATE_EXPR
> !             && can_propagate_from (def_stmt))
> !           {
> !             code = (code == MINUS_EXPR) ? PLUS_EXPR : MINUS_EXPR;
> !             gimple_assign_set_rhs_code (stmt, code);
> !             rhs2 = gimple_assign_rhs1 (def_stmt);
> !             gimple_assign_set_rhs2 (stmt, rhs2);
> !             gimple_set_modified (stmt, true);
> !             changed = true;
> !           }
>        }
>
> !       /* (-A) + B -> B - A.  */
> !       if (TREE_CODE (rhs1) == SSA_NAME
> !         && code == PLUS_EXPR)
>        {
> !         gimple def_stmt = SSA_NAME_DEF_STMT (rhs1);
> !         if (is_gimple_assign (def_stmt)
> !             && gimple_assign_rhs_code (def_stmt) == NEGATE_EXPR
> !             && can_propagate_from (def_stmt))
>            {
> !             code = MINUS_EXPR;
> !             gimple_assign_set_rhs_code (stmt, code);
> !             rhs1 = rhs2;
> !             gimple_assign_set_rhs1 (stmt, rhs1);
> !             rhs2 = gimple_assign_rhs1 (def_stmt);
> !             gimple_assign_set_rhs2 (stmt, rhs2);
> !             gimple_set_modified (stmt, true);
> !             changed = true;
>            }
>        }
>      }
> -   while (changed);
>
> !   /* We can't reassociate floating-point or fixed-point plus or minus
> !      because of saturation to +-Inf.  */
> !   if (FLOAT_TYPE_P (TREE_TYPE (rhs1))
> !       || FIXED_POINT_TYPE_P (TREE_TYPE (rhs1)))
> !     goto out;
> !
> !   /* Second match patterns that allow contracting a plus-minus pair
> !      irrespective of overflow issues.
> !
> !       (A +- B) - A       ->  +- B
> !       (A +- B) -+ B      ->  A
> !       (CST +- A) +- CST  ->  CST +- A
> !       (A + CST) +- CST   ->  A + CST
> !       ~A + A             ->  -1
> !       ~A + 1             ->  -A
> !       A - (A +- B)       ->  -+ B
> !       A +- (B +- A)      ->  +- B
> !       CST +- (CST +- A)  ->  CST +- A
> !       CST +- (A +- CST)  ->  CST +- A
> !       A + ~A             ->  -1
>
> !      via commutating the addition and contracting operations to zero
> !      by reassociation.  */
>
> !   if (TREE_CODE (rhs1) == SSA_NAME)
>      {
> !       gimple def_stmt = SSA_NAME_DEF_STMT (rhs1);
> !       if (is_gimple_assign (def_stmt) && can_propagate_from (def_stmt))
>        {
> !         enum tree_code def_code = gimple_assign_rhs_code (def_stmt);
> !         if (def_code == PLUS_EXPR
> !             || def_code == MINUS_EXPR)
> !           {
> !             tree def_rhs1 = gimple_assign_rhs1 (def_stmt);
> !             tree def_rhs2 = gimple_assign_rhs2 (def_stmt);
> !             if (operand_equal_p (def_rhs1, rhs2, 0)
> !                 && code == MINUS_EXPR)
> !               {
> !                 /* (A +- B) - A -> +- B.  */
> !                 code = ((def_code == PLUS_EXPR)
> !                         ? TREE_CODE (def_rhs2) : NEGATE_EXPR);
> !                 rhs1 = def_rhs2;
> !                 rhs2 = NULL_TREE;
> !                 gimple_assign_set_rhs_with_ops (gsi, code, rhs1, NULL_TREE);
> !                 gcc_assert (gsi_stmt (*gsi) == stmt);
> !                 gimple_set_modified (stmt, true);
> !               }
> !             else if (operand_equal_p (def_rhs2, rhs2, 0)
> !                      && code != def_code)
> !               {
> !                 /* (A +- B) -+ B -> A.  */
> !                 code = TREE_CODE (def_rhs1);
> !                 rhs1 = def_rhs1;
> !                 rhs2 = NULL_TREE;
> !                 gimple_assign_set_rhs_with_ops (gsi, code, rhs1, NULL_TREE);
> !                 gcc_assert (gsi_stmt (*gsi) == stmt);
> !                 gimple_set_modified (stmt, true);
> !               }
> !             else if (TREE_CODE (rhs2) == INTEGER_CST
> !                      && TREE_CODE (def_rhs1) == INTEGER_CST)
> !               {
> !                 /* (CST +- A) +- CST -> CST +- A.  */
> !                 tree cst = fold_binary (code, TREE_TYPE (rhs1),
> !                                         def_rhs1, rhs2);
> !                 if (cst && !TREE_OVERFLOW (cst))
> !                   {
> !                     code = def_code;
> !                     gimple_assign_set_rhs_code (stmt, code);
> !                     rhs1 = cst;
> !                     gimple_assign_set_rhs1 (stmt, rhs1);
> !                     rhs2 = def_rhs2;
> !                     gimple_assign_set_rhs2 (stmt, rhs2);
> !                     gimple_set_modified (stmt, true);
> !                   }
> !               }
> !             else if (TREE_CODE (rhs2) == INTEGER_CST
> !                      && TREE_CODE (def_rhs2) == INTEGER_CST
> !                      && def_code == PLUS_EXPR)
> !               {
> !                 /* (A + CST) +- CST -> A + CST.  */
> !                 tree cst = fold_binary (code, TREE_TYPE (rhs1),
> !                                         def_rhs2, rhs2);
> !                 if (cst && !TREE_OVERFLOW (cst))
> !                   {
> !                     code = PLUS_EXPR;
> !                     gimple_assign_set_rhs_code (stmt, code);
> !                     rhs1 = def_rhs1;
> !                     gimple_assign_set_rhs1 (stmt, rhs1);
> !                     rhs2 = cst;
> !                     gimple_assign_set_rhs2 (stmt, rhs2);
> !                     gimple_set_modified (stmt, true);
> !                   }
> !               }
> !           }
> !         else if (def_code == BIT_NOT_EXPR
> !                  && INTEGRAL_TYPE_P (TREE_TYPE (rhs1)))
>            {
> !             tree def_rhs1 = gimple_assign_rhs1 (def_stmt);
> !             if (code == PLUS_EXPR
> !                 && operand_equal_p (def_rhs1, rhs2, 0))
> !               {
> !                 /* ~A + A -> -1.  */
> !                 code = INTEGER_CST;
> !                 rhs1 = build_int_cst_type (TREE_TYPE (rhs2), -1);
> !                 rhs2 = NULL_TREE;
> !                 gimple_assign_set_rhs_with_ops (gsi, code, rhs1, NULL_TREE);
> !                 gcc_assert (gsi_stmt (*gsi) == stmt);
> !                 gimple_set_modified (stmt, true);
> !               }
> !             else if (code == PLUS_EXPR
> !                      && integer_onep (rhs1))
> !               {
> !                 /* ~A + 1 -> -A.  */
> !                 code = NEGATE_EXPR;
> !                 rhs1 = def_rhs1;
> !                 rhs2 = NULL_TREE;
> !                 gimple_assign_set_rhs_with_ops (gsi, code, rhs1, NULL_TREE);
> !                 gcc_assert (gsi_stmt (*gsi) == stmt);
> !                 gimple_set_modified (stmt, true);
> !               }
>            }
>        }
>      }
>
> !   if (rhs2 && TREE_CODE (rhs2) == SSA_NAME)
>      {
> !       gimple def_stmt = SSA_NAME_DEF_STMT (rhs2);
> !       if (is_gimple_assign (def_stmt) && can_propagate_from (def_stmt))
> !       {
> !         enum tree_code def_code = gimple_assign_rhs_code (def_stmt);
> !         if (def_code == PLUS_EXPR
> !             || def_code == MINUS_EXPR)
> !           {
> !             tree def_rhs1 = gimple_assign_rhs1 (def_stmt);
> !             tree def_rhs2 = gimple_assign_rhs2 (def_stmt);
> !             if (operand_equal_p (def_rhs1, rhs1, 0)
> !                 && code == MINUS_EXPR)
> !               {
> !                 /* A - (A +- B) -> -+ B.  */
> !                 code = ((def_code == PLUS_EXPR)
> !                         ? NEGATE_EXPR : TREE_CODE (def_rhs2));
> !                 rhs1 = def_rhs2;
> !                 rhs2 = NULL_TREE;
> !                 gimple_assign_set_rhs_with_ops (gsi, code, rhs1, NULL_TREE);
> !                 gcc_assert (gsi_stmt (*gsi) == stmt);
> !                 gimple_set_modified (stmt, true);
> !               }
> !             else if (operand_equal_p (def_rhs2, rhs1, 0)
> !                      && code != def_code)
> !               {
> !                 /* A +- (B +- A) -> +- B.  */
> !                 code = ((code == PLUS_EXPR)
> !                         ? TREE_CODE (def_rhs1) : NEGATE_EXPR);
> !                 rhs1 = def_rhs1;
> !                 rhs2 = NULL_TREE;
> !                 gimple_assign_set_rhs_with_ops (gsi, code, rhs1, NULL_TREE);
> !                 gcc_assert (gsi_stmt (*gsi) == stmt);
> !                 gimple_set_modified (stmt, true);
> !               }
> !             else if (TREE_CODE (rhs1) == INTEGER_CST
> !                      && TREE_CODE (def_rhs1) == INTEGER_CST)
> !               {
> !                 /* CST +- (CST +- A) -> CST +- A.  */
> !                 tree cst = fold_binary (code, TREE_TYPE (rhs2),
> !                                         rhs1, def_rhs1);
> !                 if (cst && !TREE_OVERFLOW (cst))
>                    {
> !                     code = (code == def_code ? PLUS_EXPR : MINUS_EXPR);
> !                     gimple_assign_set_rhs_code (stmt, code);
> !                     rhs1 = cst;
> !                     gimple_assign_set_rhs1 (stmt, rhs1);
> !                     rhs2 = def_rhs2;
> !                     gimple_assign_set_rhs2 (stmt, rhs2);
> !                     gimple_set_modified (stmt, true);
> !                   }
> !               }
> !             else if (TREE_CODE (rhs1) == INTEGER_CST
> !                      && TREE_CODE (def_rhs2) == INTEGER_CST)
> !               {
> !                 /* CST +- (A +- CST) -> CST +- A.  */
> !                 tree cst = fold_binary (def_code == code
> !                                         ? PLUS_EXPR : MINUS_EXPR,
> !                                         TREE_TYPE (rhs2),
> !                                         rhs1, def_rhs2);
> !                 if (cst && !TREE_OVERFLOW (cst))
> !                   {
> !                     rhs1 = cst;
> !                     gimple_assign_set_rhs1 (stmt, rhs1);
> !                     rhs2 = def_rhs1;
> !                     gimple_assign_set_rhs2 (stmt, rhs2);
> !                     gimple_set_modified (stmt, true);
>                    }
>                }
>            }
> !         else if (def_code == BIT_NOT_EXPR
> !                  && INTEGRAL_TYPE_P (TREE_TYPE (rhs2)))
> !           {
> !             tree def_rhs1 = gimple_assign_rhs1 (def_stmt);
> !             if (code == PLUS_EXPR
> !                 && operand_equal_p (def_rhs1, rhs1, 0))
> !               {
> !                 /* A + ~A -> -1.  */
> !                 code = INTEGER_CST;
> !                 rhs1 = build_int_cst_type (TREE_TYPE (rhs1), -1);
> !                 rhs2 = NULL_TREE;
> !                 gimple_assign_set_rhs_with_ops (gsi, code, rhs1, NULL_TREE);
> !                 gcc_assert (gsi_stmt (*gsi) == stmt);
> !                 gimple_set_modified (stmt, true);
> !               }
>            }
>        }
>      }
> --- 2188,2661 ----
>  }
>
>
> ! /* Data structures to hold plus/minus operand information during
> !    re-association.  */
> !
> ! /* Range information.  We only track whether we are sure that an operand
> !    cannot equal the minimum value of its type (RANGE_NOT_MIN), or that it
> !    cannot equal the maximum value of its type (RANGE_NOT_MAX).  */
> !
> ! enum plus_minus_op_range
> ! {
> !   RANGE_UNKNOWN = 0,
> !   RANGE_NOT_MIN,
> !   RANGE_NOT_MAX
> ! };
> !
> ! /* A single operand.  OP is the operand, NEG is true if the operand needs
> !    to be negated, and RANGE holds operand range information we were able
> !    to track.  */
> !
> ! struct plus_minus_op_data
> ! {
> !   tree op;
> !   bool neg;
> !   enum plus_minus_op_range range;
> ! };
> !
> ! /* All operands of the expression.  The value of the expression is considered
> !    the sum of all operands, negated if indicated.  There is no implicit order
> !    of the operands; this data structure is only used in contexts where we
> !    know addition is associative.  */
> !
> ! struct plus_minus_data
> ! {
> !   /* Operand array and number of current operands.  We do not bother to
> !      dynamically resize the array; 8 operands ought to be enough for the
> !      vast majority of cases.  */
> !   struct plus_minus_op_data ops[8];
> !   unsigned int n_ops;
> !
> !   /* Used for iterating through all operands.  */
> !   unsigned int curr_op;
> !
> !   /* True if we have successfully applied some simplification operation.  */
> !   bool simplified;
> !
> !   /* True if we ran into a failure (e.g. the ops array overflowed).  The
> !      rest of the data structure is no longer reliable in that case.  */
> !   bool failed;
> ! };
> !
> ! /* Iterate through the OPS operand array.  Return false if we are already
> !    at the end of the array.  Otherwise, return true and set OP to the index
> !    of the next operand to be considered.  */
>
>  static bool
> ! next_plus_minus_op (struct plus_minus_data *ops, unsigned int *op)
>  {
> !   if (ops->curr_op < ops->n_ops)
> !     {
> !       *op = ops->curr_op++;
> !       return true;
> !     }
> !
> !   return false;
> ! }
> !
> ! /* Remove operand with index OP from the OPS operand array.  */
> !
> ! static void
> ! remove_plus_minus_op (struct plus_minus_data *ops, unsigned int op)
> ! {
> !   if (op + 1 < ops->n_ops)
> !     memmove (&ops->ops[op], &ops->ops[op + 1],
> !            sizeof (struct plus_minus_op_data) * (ops->n_ops - op - 1));
> !
> !   ops->n_ops--;
> !
> !   if (op < ops->curr_op)
> !     ops->curr_op--;
> !
> !   ops->simplified = true;
> ! }
>
> ! /* Add NEW_OP at the end of the OPS operand array.  If possible, perform
> !    simplifications that are guaranteed to leave the total value of the
> !    expression encoded by OPS unchanged:
> !      - cancel NEW_OP against an equivalent but negated operand
> !      - simplify constant integer operands (without overflow).
> !    Assumes operands can be freely re-associated.  */
> !
> ! static void
> ! add_plus_minus_op (struct plus_minus_data *ops,
> !                  struct plus_minus_op_data *new_op)
> ! {
> !   struct plus_minus_op_data cst_op;
> !   unsigned int i;
> !
> !   if (ops->failed)
> !     return;
>
> !   if (TREE_CODE (new_op->op) == INTEGER_CST && new_op->neg)
>      {
> !       tree cst = fold_unary_to_constant (NEGATE_EXPR,
> !                                        TREE_TYPE (new_op->op), new_op->op);
> !       if (cst && !TREE_OVERFLOW (cst))
> !       {
> !         new_op = &cst_op;
> !         cst_op.op = cst;
> !         cst_op.neg = false;
> !         cst_op.range = RANGE_UNKNOWN;
> !         ops->simplified = true;
> !       }
> !     }
>
> !   for (i = 0; i < ops->n_ops; i++)
> !     {
> !       if (operand_equal_p (new_op->op, ops->ops[i].op, 0)
> !         && new_op->neg != ops->ops[i].neg)
>        {
> !         remove_plus_minus_op (ops, i);
> !         ops->simplified = true;
> !         return;
>        }
>
> !       if (TREE_CODE (new_op->op) == INTEGER_CST && !new_op->neg
> !         && TREE_CODE (ops->ops[i].op) == INTEGER_CST && !ops->ops[i].neg)
>        {
> !         tree cst = fold_binary (PLUS_EXPR, TREE_TYPE (new_op->op),
> !                                 new_op->op, ops->ops[i].op);
> !         if (cst && !TREE_OVERFLOW (cst))
>            {
> !             if (integer_zerop (cst))
> !               remove_plus_minus_op (ops, i);
> !             else
> !               ops->ops[i].op = cst;
> !
> !             ops->simplified = true;
> !             return;
>            }
>        }
>      }
>
> !   if (ops->n_ops < ARRAY_SIZE (ops->ops))
> !     {
> !       ops->ops[ops->n_ops++] = *new_op;
> !       return;
> !     }
> !
> !   /* If we run out of room in the array, give up.  This should
> !      almost never happen.  */
> !   ops->n_ops = 0;
> !   ops->failed = true;
> !   return;
> ! }
> !
> ! /* Add up to two operands LHS/RHS (as indicated by N_OPS) to the
> !    OPS operand array.  If OUTER_NEG is true, operands are to be
> !    negated before adding them to the array.  */
> !
> ! static void
> ! add_plus_minus_ops (struct plus_minus_data *ops,
> !                   unsigned int n_ops,
> !                   struct plus_minus_op_data *lhs,
> !                   struct plus_minus_op_data *rhs,
> !                   bool outer_neg)
> ! {
> !   lhs->neg ^= outer_neg;
> !   rhs->neg ^= outer_neg;
> !
> !   if (n_ops >= 1)
> !     add_plus_minus_op (ops, lhs);
> !   if (n_ops >= 2)
> !     add_plus_minus_op (ops, rhs);
> ! }
> !
> ! /* Extract plus/minus operands from STMT.  Stores up to two operands in
> !    LHS and RHS and returns the number of operands stored.  If no operands
> !    could be extracted, returns 0.
> !
> !    The routine guarantees that a plus/minus expression formed from LHS
> !    and RHS will evaluate to the same value as STMT, using modulo arithmetic.
> !    If STMT is of integral type and TRUE_ARITHMETIC is true, the routine
> !    guarantees in addition that this property holds true when performing
> !    operations in "true" integer arithmetic.
>
> !    OUTER_RANGE encodes range information known about the result of STMT;
> !    this may be used to infer range data on LHS and/or RHS.  */
>
> ! static unsigned int
> ! setup_plus_minus_ops (gimple stmt, struct plus_minus_op_data *lhs,
> !                     struct plus_minus_op_data *rhs,
> !                     enum plus_minus_op_range outer_range,
> !                     bool true_arithmetic)
> ! {
> !   enum tree_code this_code = gimple_assign_rhs_code (stmt);
> !   tree type = TREE_TYPE (gimple_assign_lhs (stmt));
> !
> !   switch (this_code)
>      {
> !     case PLUS_EXPR:
> !     case MINUS_EXPR:
> !       /* If we have an integral type where overflow wraps, we can only
> !        guarantee that LHS plus RHS equal STMT in modulo arithmetic,
> !        so fail if "true" arithmetic is requested.  For integral types
> !        where overflow does *not* wrap, we can assume that STMT does
> !        not overflow, and thus the equality holds in "true" arithmetic
> !        as well.
> !
> !        For floating point types, we -strictly speaking- could never
> !        extract operands due to rounding effect and saturation at
> !        +/-Inf.  However, the -flag-associative-math setting directs
> !        the compiler to ignore such effect; respect it.
> !
> !        Fixed point types can never be re-associated.  */
> !       if ((INTEGRAL_TYPE_P (type)
> !          && (!TYPE_OVERFLOW_WRAPS (type) || !true_arithmetic))
> !         || (FLOAT_TYPE_P (type)
> !             && flag_associative_math))
> !       {
> !         lhs->op = gimple_assign_rhs1 (stmt);
> !         lhs->neg = false;
> !         lhs->range = RANGE_UNKNOWN;
> !
> !         rhs->op = gimple_assign_rhs2 (stmt);
> !         rhs->neg = (this_code == MINUS_EXPR);
> !         rhs->range = RANGE_UNKNOWN;
> !
> !         /* For integer types that cannot overflow, the fact that a constant
> !            is added to / subtracted from an operand allows us to deduce
> !            range information about that operand.  */
> !         if (INTEGRAL_TYPE_P (type) && !TYPE_OVERFLOW_WRAPS (type)
> !             && TREE_CODE (rhs->op) == INTEGER_CST)
> !           {
> !             int sgn = tree_int_cst_sgn (rhs->op);
> !             if (sgn != 0)
> !               {
> !                 bool neg_p = (sgn < 0) ^ (this_code == MINUS_EXPR);
> !                 lhs->range = neg_p? RANGE_NOT_MIN : RANGE_NOT_MAX;
> !               }
> !           }
> !
> !         return 2;
> !       }
> !       break;
> !
> !     case NEGATE_EXPR:
> !       /* For integral types, we can extract a negation in the same set
> !        of circumstances we could extract a PLUS or MINUS, see above.
> !
> !        We can always extract a negation for types that use a separate
> !        sign bit: floating-point types and signed fixed-point types.  */
> !       if ((INTEGRAL_TYPE_P (type)
> !          && (!TYPE_OVERFLOW_WRAPS (type) || !true_arithmetic))
> !         || FLOAT_TYPE_P (type)
> !         || (FIXED_POINT_TYPE_P (type) && !TYPE_UNSIGNED (type)))
> !       {
> !         lhs->op = gimple_assign_rhs1 (stmt);
> !         lhs->neg = true;
> !         lhs->range = RANGE_UNKNOWN;
> !
> !         /* For integer types that cannot overflow, the fact that a
> !            negation is performed allows us to deduce range information
> !            about its operand.  */
> !         if (INTEGRAL_TYPE_P (type) && !TYPE_OVERFLOW_WRAPS (type))
> !           lhs->range = RANGE_NOT_MIN;
> !
> !         return 1;
> !       }
> !       break;
> !
> !     case BIT_NOT_EXPR:
> !       /* We transform ~A into -A - 1.  This is allowed only for integral
> !        types, and only in modulo arithmetic.  */
> !       if (INTEGRAL_TYPE_P (type) && !true_arithmetic)
>        {
> !         rhs->op = build_int_cst_type (type, -1);
> !         rhs->neg = false;
> !         rhs->range = RANGE_UNKNOWN;
> !
> !         lhs->op = gimple_assign_rhs1 (stmt);
> !         lhs->neg = true;
> !         lhs->range = RANGE_UNKNOWN;
> !
> !         /* For all integer types, BIT_NOT_EXPR transforms the maximum
> !            value of its range to the minmum value and vice versa.  */
> !         if (outer_range == RANGE_NOT_MIN)
> !           lhs->range = RANGE_NOT_MAX;
> !         else if (outer_range == RANGE_NOT_MAX)
> !           lhs->range = RANGE_NOT_MIN;
> !
> !         return 2;
> !       }
> !       break;
> !
> !     CASE_CONVERT:
> !       /* We look through conversions between integral types of the same
> !        precision.  This is in general allowed only in modulo arithmetic.
> !        This case is used in particular to handle constructs like
> !                 (int) ((unsigned) A + (unsigned) B)
> !        which are on occasion produced by other optimization passes that
> !        want to avoid introducing undefined overflows.  */
> !       if (INTEGRAL_TYPE_P (type) && !true_arithmetic)
> !       {
> !         tree op = gimple_assign_rhs1 (stmt);
> !
> !         if (TREE_TYPE (op) && INTEGRAL_TYPE_P (TREE_TYPE (op))
> !             && TYPE_PRECISION (type) == TYPE_PRECISION (TREE_TYPE (op)))
>            {
> !             lhs->op = op;
> !             lhs->neg = false;
> !             lhs->range = RANGE_UNKNOWN;
> !
> !             return 1;
>            }
>        }
> +       break;
> +
> +     default:
> +       break;
>      }
>
> !   return 0;
> ! }
> !
> ! /* Perform re-associations of the plus or minus statement STMT that are
> !    always permitted.  Returns true if the CFG was changed.  */
> !
> ! static bool
> ! associate_plusminus (gimple_stmt_iterator *gsi)
> ! {
> !   gimple stmt = gsi_stmt (*gsi);
> !   tree type = TREE_TYPE (gimple_assign_lhs (stmt));
> !   struct plus_minus_data ops;
> !   struct plus_minus_op_data lhs, rhs;
> !   unsigned int n_ops;
> !   bool no_overflow;
> !   bool true_arithmetic;
> !   unsigned int i;
> !   int pass;
> !
> !   /* The type of STMT determines whether we are allowed to create new
> !      operations in that type that may introduce overflows.  */
> !   no_overflow = (INTEGRAL_TYPE_P (type) && !TYPE_OVERFLOW_WRAPS (type));
> !
> !   /* Initialize OPS array from STMT.  */
> !   memset (&ops, 0, sizeof ops);
> !   n_ops = setup_plus_minus_ops (stmt, &lhs, &rhs, RANGE_UNKNOWN, no_overflow);
> !   add_plus_minus_ops (&ops, n_ops, &lhs, &rhs, false);
> !
> !   /* Iterate over OPS array and perform simplifications.  If we cannot
> !      introduce new overflows, we first perform only simplifications that
> !      are valid in "true" arithmetic, and in a second pass then perform
> !      all simplifications valid in modulo arithmetic.  If we *are* allowed
> !      to introduce new overflows, we skip the first pass.  */
> !   for (pass = 0; pass < 2; pass++)
>      {
> !       if (pass == 0)
> !       true_arithmetic = no_overflow;
> !       else if (true_arithmetic)
> !       true_arithmetic = false;
> !       else
> !       break;
> !
> !       ops.curr_op = 0;
> !       while (next_plus_minus_op (&ops, &i))
> !       {
> !         tree this_op = ops.ops[i].op;
> !         bool this_neg = ops.ops[i].neg;
> !         enum plus_minus_op_range this_range = ops.ops[i].range;
> !
> !         /* For each operand in the array that is an SSA_NAME, see whether
> !            we can extract additional plus/minus operands from its def.  */
> !         if (TREE_CODE (this_op) == SSA_NAME)
> !           {
> !             gimple def_stmt = SSA_NAME_DEF_STMT (this_op);
> !             if (is_gimple_assign (def_stmt)
> !                 && can_propagate_from (def_stmt))
> !               {
> !                 n_ops = setup_plus_minus_ops (def_stmt, &lhs, &rhs,
> !                                               this_range, true_arithmetic);
> !                 if (n_ops > 0)
>                    {
> !                     remove_plus_minus_op (&ops, i);
> !                     add_plus_minus_ops (&ops, n_ops, &lhs, &rhs, this_neg);
>                    }
>                }
>            }
> !
> !         /* After every simplification, check whether OPS array can now be
> !            represented by a new statement.  */
> !
> !         /* If we have just a single, non-negated operand left, we replace
> !            the whole expression by that operand.  */
> !         if (ops.n_ops == 1 && !ops.ops[0].neg
> !             && useless_type_conversion_p (TREE_TYPE (ops.ops[0].op), type))
> !           {
> !             gimple_assign_set_rhs_with_ops (gsi, TREE_CODE (ops.ops[0].op),
> !                                             ops.ops[0].op, NULL_TREE);
> !             gcc_assert (gsi_stmt (*gsi) == stmt);
> !             gimple_set_modified (stmt, true);
> !             goto out;
> !           }
> !
> !         /* If we have a single *negated* operand left, check whether we are
> !            allowed to introduce a new NEGATE_EXPR.  We can do that if:
> !               - we may introduce new overflows,
> !               - all simplifications were performed in "true" arithmetic,
> !                 because then the true value of the negated operand is in
> !                 the original range of the type, or
> !               - we were able to otherwise prove the operand cannot be the
> !                 minimum value of its type.  */
> !         if (ops.n_ops == 1 && ops.ops[0].neg
> !             && useless_type_conversion_p (TREE_TYPE (ops.ops[0].op), type)
> !             && (!no_overflow || true_arithmetic
> !                 || ops.ops[0].range == RANGE_NOT_MIN))
> !           {
> !             gimple_assign_set_rhs_with_ops (gsi, NEGATE_EXPR,
> !                                             ops.ops[0].op, NULL_TREE);
> !             gcc_assert (gsi_stmt (*gsi) == stmt);
> !             gimple_set_modified (stmt, true);
> !             goto out;
> !           }
> !
> !         /* If we have two operands left (and some simplifcation took place,
> !            so we're not simply looking at the original two operands), and
> !            at most one of them is negated, check whether we are allowed to
> !            introduce a new PLUS_EXPR or MINUS_EXPR.  This follows the same
> !            rules as above, except that range information doesn't help.
> !
> !            Note that even after we've replaced the original expression with
> !            a new PLUS or MINUS, we continue further simplications -- maybe
> !            we are still able to find an even simpler equivalence.  */
> !         if (ops.n_ops == 2 && ops.simplified
> !             && !ops.ops[0].neg && !ops.ops[1].neg
> !             && useless_type_conversion_p (TREE_TYPE (ops.ops[0].op), type)
> !             && useless_type_conversion_p (TREE_TYPE (ops.ops[1].op), type)
> !             && (!no_overflow || true_arithmetic))
> !           {
> !             gimple_assign_set_rhs_code (stmt, PLUS_EXPR);
> !             gimple_assign_set_rhs1 (stmt, ops.ops[0].op);
> !             gimple_assign_set_rhs2 (stmt, ops.ops[1].op);
> !             gimple_set_modified (stmt, true);
> !             ops.simplified = false;
> !           }
> !
> !         if (ops.n_ops == 2 && ops.simplified
> !             && !ops.ops[0].neg && ops.ops[1].neg
> !             && useless_type_conversion_p (TREE_TYPE (ops.ops[0].op), type)
> !             && useless_type_conversion_p (TREE_TYPE (ops.ops[1].op), type)
> !             && (!no_overflow || true_arithmetic))
> !           {
> !             gimple_assign_set_rhs_code (stmt, MINUS_EXPR);
> !             gimple_assign_set_rhs1 (stmt, ops.ops[0].op);
> !             gimple_assign_set_rhs2 (stmt, ops.ops[1].op);
> !             gimple_set_modified (stmt, true);
> !             ops.simplified = false;
> !           }
> !
> !         if (ops.n_ops == 2 && ops.simplified
> !             && ops.ops[0].neg && !ops.ops[1].neg
> !             && useless_type_conversion_p (TREE_TYPE (ops.ops[0].op), type)
> !             && useless_type_conversion_p (TREE_TYPE (ops.ops[1].op), type)
> !             && (!no_overflow || true_arithmetic))
> !           {
> !             gimple_assign_set_rhs_code (stmt, MINUS_EXPR);
> !             gimple_assign_set_rhs1 (stmt, ops.ops[1].op);
> !             gimple_assign_set_rhs2 (stmt, ops.ops[0].op);
> !             gimple_set_modified (stmt, true);
> !             ops.simplified = false;
>            }
>        }
>      }
>
> --
>  Dr. Ulrich Weigand
>  GNU Toolchain for Linux on System z and Cell BE
>  Ulrich.Weigand@de.ibm.com
>
Richard Guenther - May 22, 2012, 12:04 p.m.
On Tue, May 22, 2012 at 1:58 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
> Richard Guenther wrote:
>> On Fri, May 18, 2012 at 10:27 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
>> > The following patch rewrites associate_plusminus to remove all the
>> > explicitly coded special cases, and instead performs a scan of the
>> > plus/minus tree similar to what is done in tree-ssa-reassoc (and also
>> > in simplify-rtx for that matter).  If this results in an expression
>> > tree that collapses to just a single operand, or just a single newly
>> > introduced operation, and -in the latter case- one of the two rules
>> > above ensure the new operation is safe, the transformation is performed.
>> >
>> > This still passes all reassoc tests, and in fact allows to remove XFAILs
>> > from two of them.  It also catches the end-of-loop value computation case.
>> >
>> > Tested on i386-linux with no regressions.
>> >
>> > OK for mainline?
>>
>> The point of the special-cases in forwprop was to make them fast to
>> detect - forwprop should be a pattern-matching thing, much like
>> combine on RTL.
>
> Well, the problem is that you can really make the decision whether or not
> reassociation is allowed after you've seen the whole plus-minus tree.
>
> For example, it is valid to transform "(a + (b + c)) - c" into "a + b" --
> but the only potential "intermediate" transform, "a + (b + c)" into
> "(a + b) + c", is of course not valid in general.  It only becomes valid
> due to the outer context "... - c" in which it is executed ...
>
>> So, instead of changing forwprop this way can you adjust tree-ssa-reassoc.c
>> to (again) associate !TYPE_OVERFLOW_WRAPS operations but make
>> sure we throw away results that would possibly introduce undefined overflow
>> for !TYPE_OVERFLOW_WRAPS types?  There is a reassoc pass after
>> loop optimizations, so that should fix it as well, no?
>
> I had thought of that as well.  But it is not quite that simple -- the
> problem is that tree-ssa-reassoc.c as part of its core algorithm reassociates
> expressions all the time while even still building up the tree, see e.g.
> linearize_expr or break_up_subtract.  Those steps may all be invalid in
> general, but we only know whether that is true at the very end, once we've
> built up the full tree -- but at that point it is already too late.

Hmm, really?  I had not realized it does something it cannot undo later ...
but well ISTR patches floating around for re-organizing how we do
the break_up_subtract / negate stuff.  Micha?

> I guess it might be possible to re-work tree-ssa-reassoc to *first* build
> up the tree without changing any statements, then make the decision whether
> we can re-associate, and only then actually perform modifications.  I'll
> have to think about that a bit more.

Yes, I think that's what we want.

> If we manage to do that, would you then suggest we should remove the
> associate_plusminus phase in tree-ssa-forwprop.c again?

Not sure about removing it - simplifying the simple cases early enough
might be useful.  But yes, I installed them all to avoid regressing too much
as I "fixed" reassoc not to associate expressions with undefined overflow.

Richard.

> Bye,
> Ulrich
>
> --
>  Dr. Ulrich Weigand
>  GNU Toolchain for Linux on System z and Cell BE
>  Ulrich.Weigand@de.ibm.com
>
Richard Guenther - May 22, 2012, 12:11 p.m.
On Tue, May 22, 2012 at 2:04 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Tue, May 22, 2012 at 1:58 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
>> Richard Guenther wrote:
>>> On Fri, May 18, 2012 at 10:27 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
>>> > The following patch rewrites associate_plusminus to remove all the
>>> > explicitly coded special cases, and instead performs a scan of the
>>> > plus/minus tree similar to what is done in tree-ssa-reassoc (and also
>>> > in simplify-rtx for that matter).  If this results in an expression
>>> > tree that collapses to just a single operand, or just a single newly
>>> > introduced operation, and -in the latter case- one of the two rules
>>> > above ensure the new operation is safe, the transformation is performed.
>>> >
>>> > This still passes all reassoc tests, and in fact allows to remove XFAILs
>>> > from two of them.  It also catches the end-of-loop value computation case.
>>> >
>>> > Tested on i386-linux with no regressions.
>>> >
>>> > OK for mainline?
>>>
>>> The point of the special-cases in forwprop was to make them fast to
>>> detect - forwprop should be a pattern-matching thing, much like
>>> combine on RTL.
>>
>> Well, the problem is that you can really make the decision whether or not
>> reassociation is allowed after you've seen the whole plus-minus tree.
>>
>> For example, it is valid to transform "(a + (b + c)) - c" into "a + b" --
>> but the only potential "intermediate" transform, "a + (b + c)" into
>> "(a + b) + c", is of course not valid in general.  It only becomes valid
>> due to the outer context "... - c" in which it is executed ...
>>
>>> So, instead of changing forwprop this way can you adjust tree-ssa-reassoc.c
>>> to (again) associate !TYPE_OVERFLOW_WRAPS operations but make
>>> sure we throw away results that would possibly introduce undefined overflow
>>> for !TYPE_OVERFLOW_WRAPS types?  There is a reassoc pass after
>>> loop optimizations, so that should fix it as well, no?
>>
>> I had thought of that as well.  But it is not quite that simple -- the
>> problem is that tree-ssa-reassoc.c as part of its core algorithm reassociates
>> expressions all the time while even still building up the tree, see e.g.
>> linearize_expr or break_up_subtract.  Those steps may all be invalid in
>> general, but we only know whether that is true at the very end, once we've
>> built up the full tree -- but at that point it is already too late.
>
> Hmm, really?  I had not realized it does something it cannot undo later ...
> but well ISTR patches floating around for re-organizing how we do
> the break_up_subtract / negate stuff.  Micha?
>
>> I guess it might be possible to re-work tree-ssa-reassoc to *first* build
>> up the tree without changing any statements, then make the decision whether
>> we can re-associate, and only then actually perform modifications.  I'll
>> have to think about that a bit more.
>
> Yes, I think that's what we want.
>
>> If we manage to do that, would you then suggest we should remove the
>> associate_plusminus phase in tree-ssa-forwprop.c again?
>
> Not sure about removing it - simplifying the simple cases early enough
> might be useful.  But yes, I installed them all to avoid regressing too much
> as I "fixed" reassoc not to associate expressions with undefined overflow.

Btw, reassoc (and your patch?) would have the issue that for
a + (b + c) - a it would yield b + c as result which is not an atom
(but still ok, since it is a pre-existing value that is computed in the IL
already).  The simple forwprop matching catches this as it doesn't
even look into the (b + c) expression but just sees a temporary as
atom here.

Richard.
Michael Matz - May 22, 2012, 12:31 p.m.
Hi,

On Tue, 22 May 2012, Richard Guenther wrote:

> > I had thought of that as well.  But it is not quite that simple -- the
> > problem is that tree-ssa-reassoc.c as part of its core algorithm reassociates
> > expressions all the time while even still building up the tree, see e.g.
> > linearize_expr or break_up_subtract.  Those steps may all be invalid in
> > general, but we only know whether that is true at the very end, once we've
> > built up the full tree -- but at that point it is already too late.
> 
> Hmm, really?

Yep, reassoc rewrites already at linearize time.  I think that could be 
changed without too much hassle.

> I had not realized it does something it cannot undo later ...
> but well ISTR patches floating around for re-organizing how we do
> the break_up_subtract / negate stuff.  Micha?

Yes, I have an incomplete patch for that, but it didn't touch the current 
fundamental implementation detail of tree-reassoc, namely touching 
statements at analyze time already.


Ciao,
Michael.
Richard Guenther - May 22, 2012, 12:55 p.m.
On Tue, May 22, 2012 at 2:50 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote:
> Richard Guenther wrote:
>
>> Btw, reassoc (and your patch?) would have the issue that for
>> a + (b + c) - a it would yield b + c as result which is not an atom
>> (but still ok, since it is a pre-existing value that is computed in the IL
>> already).  The simple forwprop matching catches this as it doesn't
>> even look into the (b + c) expression but just sees a temporary as
>> atom here.
>
> Yes, that is a problem, and my patch partially solves it by being
> very eager in performing optimizations and checking whether we have
> found a simplified solution.  For example, in your case we'd see
>
>  y = b + c
>  x = a + y
>  r = x - a
>
> and then, when looking at "r":
>
>  - push "x" and "-a" onto the operand stack
>  - notice no simplification yet
>  - expand "x" one stage into "a" and "y"
>  - immediately cancel "a" and "-a" when pushing onto the stack
>  - notice the stack is now just "y", which represents a simplification
>  - stop
>
> This is not a *complete* solution, since by expanding "x" we might
> miss solutions we could have seen only when expanding "-a" instead;
> we'd really have to search the full solution space or back-track
> an already performed expansion.  I have not implemented this due
> to concerns about combinatorial explosion ...
>
>
> Solving this problem in the context of the existing tree-ssa-reassoc
> algorithm is indeed yet another issue that I need to think about.

I suppose we'd have to go the full way making the reassoc intermediate
IL a tree and not just a linear list of operands ... which incidentially
also would pave the way for supporting multiple operations.  Each
operation node could then link to a stmt (or have an SSA name) that
represents its value as long as we have it.  Associating would simply
remove that link, forcing a new stmt computation.

Re-structuring reassoc that way would also make its IL and optimizations
possibly usable by other passes that construct expressions, allowing
optimizing them without relying on fold and later gimplification ...

Richard.

> Bye,
> Ulrich
>
> --
>  Dr. Ulrich Weigand
>  GNU Toolchain for Linux on System z and Cell BE
>  Ulrich.Weigand@de.ibm.com
>

Patch

Index: gcc/tree-scalar-evolution.c
===================================================================
--- gcc/tree-scalar-evolution.c	(revision 184625)
+++ gcc/tree-scalar-evolution.c	(working copy)
@@ -474,11 +474,22 @@ 
 	    return chrec_dont_know;
 	  else
 	    {
+	      tree type = TREE_TYPE (evolution_fn), utype = type;
 	      tree res;
 
+	      /* Since the number of iterations is always unsigned, we perform
+		 the computation in an unsigned type (if feasible) to allow for
+		 improved simplification of subexpressions.  */
+	      if ((INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type))
+		  || POINTER_TYPE_P (type))
+		utype = build_nonstandard_integer_type
+			  (TYPE_PRECISION (type), 1);
+
 	      /* evolution_fn is the evolution function in LOOP.  Get
 		 its value in the nb_iter-th iteration.  */
-	      res = chrec_apply (inner_loop->num, evolution_fn, nb_iter);
+	      res = chrec_convert (utype, evolution_fn, NULL);
+	      res = chrec_apply (inner_loop->num, res, nb_iter);
+	      res = chrec_convert (type, res, NULL);
 
 	      if (chrec_contains_symbols_defined_in_loop (res, loop->num))
 		res = instantiate_parameters (loop, res);



Patch: Use STRIP_NOPS on subexpressions in fold_plusminus_mult_expr

Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	(revision 184625)
+++ gcc/fold-const.c	(working copy)
@@ -7081,6 +7081,8 @@ 
     {
       arg00 = TREE_OPERAND (arg0, 0);
       arg01 = TREE_OPERAND (arg0, 1);
+      STRIP_NOPS (arg00);
+      STRIP_NOPS (arg01);
     }
   else if (TREE_CODE (arg0) == INTEGER_CST)
     {
@@ -7099,6 +7101,8 @@ 
     {
       arg10 = TREE_OPERAND (arg1, 0);
       arg11 = TREE_OPERAND (arg1, 1);
+      STRIP_NOPS (arg10);
+      STRIP_NOPS (arg11);
     }
   else if (TREE_CODE (arg1) == INTEGER_CST)
     {



Patch: Use STRIP_NOPS when following scalar evolutions

Index: gcc/tree-scalar-evolution.c
===================================================================
--- gcc/tree-scalar-evolution.c	(revision 184625)
+++ gcc/tree-scalar-evolution.c	(working copy)
@@ -1154,8 +1165,8 @@ 
       rhs0 = TREE_OPERAND (expr, 0);
       rhs1 = TREE_OPERAND (expr, 1);
       type = TREE_TYPE (rhs0);
-      STRIP_USELESS_TYPE_CONVERSION (rhs0);
-      STRIP_USELESS_TYPE_CONVERSION (rhs1);
+      STRIP_NOPS (rhs0);
+      STRIP_NOPS (rhs1);
       res = follow_ssa_edge_binary (loop, at_stmt, type, rhs0, code, rhs1,
 				    halting_phi, evolution_of_loop, limit);
       break;
@@ -1168,8 +1179,8 @@ 
 	  rhs0 = TREE_OPERAND (expr, 0);
 	  rhs1 = TREE_OPERAND (expr, 1);
 	  type = TREE_TYPE (rhs0);
-	  STRIP_USELESS_TYPE_CONVERSION (rhs0);
-	  STRIP_USELESS_TYPE_CONVERSION (rhs1);
+	  STRIP_NOPS (rhs0);
+	  STRIP_NOPS (rhs1);
 	  res = follow_ssa_edge_binary (loop, at_stmt, type,
 					rhs0, POINTER_PLUS_EXPR, rhs1,
 					halting_phi, evolution_of_loop, limit);