Message ID: 20190813083326.10833-3-rdapp@linux.ibm.com
Series: Simplify wrapped binops.
On Tue, 13 Aug 2019, Robin Dapp wrote:

> +/* ((T)(A)) + CST -> (T)(A + CST) */
> +#if GIMPLE
> +  (simplify
> +   (plus (convert SSA_NAME@0) INTEGER_CST@1)
> +    (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> +         && INTEGRAL_TYPE_P (type)

I have become rather wary of INTEGRAL_TYPE_P recently because it includes
enum types, which with -fstrict-enums can have surprising behavior. If I
have

  enum E { A, B, C };

and e has type enum E, with -fstrict-enums, do your tests manage to
prevent (long)e+1 from becoming (long)(e+1), with an unsafe operation done
in the enum type?

> +         && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> +         && int_fits_type_p (@1, TREE_TYPE (@0)))
> +     /* Perform binary operation inside the cast if the constant fits
> +        and (A + CST)'s range does not overflow.  */
> +     (with
> +      {
> +        wi::overflow_type min_ovf = wi::OVF_OVERFLOW,
> +                          max_ovf = wi::OVF_OVERFLOW;
> +        tree inner_type = TREE_TYPE (@0);
> +
> +        wide_int w1 = w1.from (wi::to_wide (@1), TYPE_PRECISION (inner_type),
> +                               TYPE_SIGN (inner_type));
> +
> +        wide_int wmin0, wmax0;
> +        if (get_range_info (@0, &wmin0, &wmax0) == VR_RANGE)
> +          {
> +            wi::add (wmin0, w1, TYPE_SIGN (inner_type), &min_ovf);
> +            wi::add (wmax0, w1, TYPE_SIGN (inner_type), &max_ovf);
> +          }
> +      }
> +      (if (min_ovf == wi::OVF_NONE && max_ovf == wi::OVF_NONE)
> +       (convert (plus @0 { {wide_int_to_tree (TREE_TYPE (@0), w1)}; })))
> +    )))
> +#endif
> +
> I have become rather wary of INTEGRAL_TYPE_P recently because it
> includes enum types, which with -fstrict-enums can have a surprising
> behavior. If I have
>   enum E { A, B, C };
> and e has type enum E, with -fstrict-enums, do your tests manage to
> prevent (long)e+1 from becoming (long)(e+1) with an unsafe operation
> done in the enum type?

Ah, I didn't specifically check for this. I am going to change the checks
to TREE_CODE (type) == INTEGER_TYPE instead.

Regards
 Robin
On 8/13/19 2:33 AM, Robin Dapp wrote:
> We would like to simplify code like
>  (larger_type)(var + const1) + const2
> to
>  (larger_type)(var + combined_const1_const2)
> when we know that no overflow happens.
> ---
>  gcc/match.pd | 101 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 101 insertions(+)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 0317bc704f7..94400529ad8 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -2020,6 +2020,107 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>    (if (cst && !TREE_OVERFLOW (cst))
>     (plus { cst; } @0))))
>
> +/* ((T)(A + CST1)) + CST2 -> (T)(A) + CST */

Do you want to handle MINUS? What about POINTER_PLUS_EXPR?

Richi and Marc are better suited to review the guts of this change than I
am. I thought we had these transformations in fold-const.c and I was going
to suggest removing them as part of this patch. *But* I can't seem to find
them.

Jeff
> +/* ((T)(A + CST1)) + CST2 -> (T)(A) + CST */
> Do you want to handle MINUS? What about POINTER_PLUS_EXPR?

When I last attempted this patch I still had MINUS in it, but I was easily
confused by needing to think of too many cases at once, which led to lots
of stupid mistakes. Hence, I left it out in the last revision :) Once
everything is in place, I guess the additional cases can be added more
easily - I hope that's reasonable.

Regards
 Robin
On 8/13/19 2:42 PM, Robin Dapp wrote:
>> +/* ((T)(A + CST1)) + CST2 -> (T)(A) + CST */
>> Do you want to handle MINUS? What about POINTER_PLUS_EXPR?
>
> When I last attempted this patch I had the MINUS still in it but got
> confused easily by needing to think of too many cases at once leading to
> lots of stupid mistakes. Hence, I left it out in the last revision :)
> Once everything is in place, I guess the additional cases can be added
> more easily - hope that's reasonable.

It's fine by me. Richi has the final say though...

jeff
On Tue, 13 Aug 2019, Robin Dapp wrote:

> diff --git a/gcc/match.pd b/gcc/match.pd
> index 0317bc704f7..94400529ad8 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -2020,6 +2020,107 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>    (if (cst && !TREE_OVERFLOW (cst))
>     (plus { cst; } @0))))
>
> +/* ((T)(A + CST1)) + CST2 -> (T)(A) + CST */
> +#if GIMPLE
> +  (simplify
> +   (plus (convert (plus @0 INTEGER_CST@1)) INTEGER_CST@2)
> +    (if (INTEGRAL_TYPE_P (type)
> +         && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)))
> +     /* We actually want to perform two simplifications here:

Maybe two "transformations"? Since (1) is the exact opposite of the other
simplification you are adding... It would have been nice if it could
indeed be split into 2 simplifications, in particular because we already
have a better (?) version of (2) not far above in the file. But we cannot
add (1) and I can't think of a good way to reuse the existing version of
(2) :-(

> +        (1) (T)(A + CST1) + CST2  -->  (T)(A) + (T)(CST1)
> +            If for (A + CST1) we either do not care about overflow (e.g.
> +            for a signed inner type) or the overflow is ok for an
> +            unsigned inner type.

From the code, it seems like the condition is really "If we know that no
overflow can occur, either because it would be UB (signed type), or thanks
to range information". Maybe that's the meaning of what you wrote, but I
had trouble interpreting it.

> +        (2) (T)(A) + (T)(CST1) + CST2  -->  (T)(A) + (T)(CST1 + CST2)
> +            If (CST1 + CST2) does not overflow and we do care about
> +            overflow (for a signed outer type) or we do not care about
> +            overflow in an unsigned outer type.  */
> +     (with
> +      {
> +        tree inner_type = TREE_TYPE (@0);
> +        wide_int wmin0, wmax0;
> +        wide_int cst1 = wi::to_wide (@1);
> +
> +        wi::overflow_type min_ovf = wi::OVF_OVERFLOW,

There seems to be an inconsistency between spaces and TABs depending on
the lines.

> +                          max_ovf = wi::OVF_OVERFLOW;
> +
> +        /* Get overflow behavior.  */
> +        bool ovf_undef_inner = TYPE_OVERFLOW_UNDEFINED (inner_type);
> +        bool ovf_undef_outer = TYPE_OVERFLOW_UNDEFINED (type);
> +
> +        /* Get value range of A.  */
> +        enum value_range_kind vr0 = get_range_info (@0, &wmin0, &wmax0);
> +
> +        /* If we have a proper range, determine min and max overflow
> +           of (A + CST1).

At some point we may want to split this into a helper function
operation_may_overflow or something, that takes an arbitrary mix of
SSA_NAME and constants. Not necessary for this patch though.

> +           ??? We might also want to handle anti ranges.  */
> +        if (vr0 == VR_RANGE)
> +          {
> +            wi::add (wmin0, cst1, TYPE_SIGN (inner_type), &min_ovf);
> +            wi::add (wmax0, cst1, TYPE_SIGN (inner_type), &max_ovf);
> +          }
> +
> +        /* Inner overflow does not matter in this case.  */
> +        if (ovf_undef_inner)
> +          {
> +            min_ovf = wi::OVF_NONE;
> +            max_ovf = wi::OVF_NONE;
> +          }

Seems that you could switch the conditions, to avoid unnecessary additions

  if (ovf_undef_inner)
    ...
  else if (vr0 == VR_RANGE)
    ...

Maybe even move get_range_info then.

> +
> +        /* Extend CST from INNER_TYPE to TYPE.  */
> +        cst1 = cst1.from (cst1, TYPE_PRECISION (type), TYPE_SIGN (inner_type));

wide_int::from? It looks strange to use an arbitrary variable to access a
static function, but that's a style preference, it is ok as is.

> +
> +        /* Check for overflow of (TYPE)(CST1 + CST2).  */
> +        wi::overflow_type outer_ovf = wi::OVF_OVERFLOW;
> +        wide_int cst = wi::add (cst1, wi::to_wide (@2), TYPE_SIGN (type),
> +                                &outer_ovf);
> +
> +        /* We *do* care about an overflow here as we do not want to introduce
> +           new undefined behavior that was not there before.  */

There is always the possibility to cast to an unsigned type, perform the
operation, then cast back, like the "other (2)" does. But that makes the
transformation even longer, and can lose some information.

> +        if (ovf_undef_outer && outer_ovf)
> +          {
> +            /* Set these here to prevent the final conversion below
> +               to take place instead of introducing a new guard variable.  */
> +            min_ovf = wi::OVF_OVERFLOW;
> +            max_ovf = wi::OVF_OVERFLOW;
> +          }
> +      }
> +      (if (min_ovf == wi::OVF_NONE && max_ovf == wi::OVF_NONE)
> +       (plus (convert @0) { wide_int_to_tree (type, cst); }
> +       )))))
> +#endif
> +
> +/* ((T)(A)) + CST -> (T)(A + CST) */

The other transformation is a true simplification. This one is more a
canonicalization. It does make sense to perform the operation in the
narrower type, I just hope we have enough machinery to revert this later
for platforms that do not like arithmetic in sub-word size. Or probably
this is still restricted enough (doesn't narrow all operations) that it
doesn't matter.

> +#if GIMPLE
> +  (simplify
> +   (plus (convert SSA_NAME@0) INTEGER_CST@1)
> +    (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> +         && INTEGRAL_TYPE_P (type)
> +         && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> +         && int_fits_type_p (@1, TREE_TYPE (@0)))
> +     /* Perform binary operation inside the cast if the constant fits
> +        and (A + CST)'s range does not overflow.  */
> +     (with
> +      {
> +        wi::overflow_type min_ovf = wi::OVF_OVERFLOW,
> +                          max_ovf = wi::OVF_OVERFLOW;
> +        tree inner_type = TREE_TYPE (@0);
> +
> +        wide_int w1 = w1.from (wi::to_wide (@1), TYPE_PRECISION (inner_type),
> +                               TYPE_SIGN (inner_type));
> +
> +        wide_int wmin0, wmax0;
> +        if (get_range_info (@0, &wmin0, &wmax0) == VR_RANGE)
> +          {
> +            wi::add (wmin0, w1, TYPE_SIGN (inner_type), &min_ovf);
> +            wi::add (wmax0, w1, TYPE_SIGN (inner_type), &max_ovf);
> +          }
> +      }
> +      (if (min_ovf == wi::OVF_NONE && max_ovf == wi::OVF_NONE)
> +       (convert (plus @0 { {wide_int_to_tree (TREE_TYPE (@0), w1)}; })))

There are some extra {} here.

> +    )))
> +#endif
> +
On Tue, Aug 13, 2019 at 10:36 AM Robin Dapp <rdapp@linux.ibm.com> wrote:
>
> We would like to simplify code like
>  (larger_type)(var + const1) + const2
> to
>  (larger_type)(var + combined_const1_const2)
> when we know that no overflow happens.

Throwing in my own comments...

> ---
>  gcc/match.pd | 101 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 101 insertions(+)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 0317bc704f7..94400529ad8 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -2020,6 +2020,107 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>    (if (cst && !TREE_OVERFLOW (cst))
>     (plus { cst; } @0))))
>
> +/* ((T)(A + CST1)) + CST2 -> (T)(A) + CST */
> +#if GIMPLE
> +  (simplify
> +   (plus (convert (plus @0 INTEGER_CST@1)) INTEGER_CST@2)
> +    (if (INTEGRAL_TYPE_P (type)
> +         && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)))
> +     /* We actually want to perform two simplifications here:
> +        (1) (T)(A + CST1) + CST2  -->  (T)(A) + (T)(CST1)
> +            If for (A + CST1) we either do not care about overflow (e.g.
> +            for a signed inner type) or the overflow is ok for an
> +            unsigned inner type.
> +        (2) (T)(A) + (T)(CST1) + CST2  -->  (T)(A) + (T)(CST1 + CST2)

But the original is already (T)(A) + CST1-in-T + CST2-in-T and thus we can
always do 2) by means of already existing patterns. So it's only 1) you
need to implement?! And 1) is really (T)(A + CST) -> (T)A + CST-in-T, no?

> +            If (CST1 + CST2) does not overflow and we do care about
> +            overflow (for a signed outer type) or we do not care about
> +            overflow in an unsigned outer type.  */
> +     (with
> +      {
> +        tree inner_type = TREE_TYPE (@0);
> +        wide_int wmin0, wmax0;
> +        wide_int cst1 = wi::to_wide (@1);
> +
> +        wi::overflow_type min_ovf = wi::OVF_OVERFLOW,
> +                          max_ovf = wi::OVF_OVERFLOW;
> +
> +        /* Get overflow behavior.  */
> +        bool ovf_undef_inner = TYPE_OVERFLOW_UNDEFINED (inner_type);
> +        bool ovf_undef_outer = TYPE_OVERFLOW_UNDEFINED (type);
> +
> +        /* Get value range of A.  */
> +        enum value_range_kind vr0 = get_range_info (@0, &wmin0, &wmax0);
> +
> +        /* If we have a proper range, determine min and max overflow
> +           of (A + CST1).
> +           ??? We might also want to handle anti ranges.  */
> +        if (vr0 == VR_RANGE)
> +          {
> +            wi::add (wmin0, cst1, TYPE_SIGN (inner_type), &min_ovf);
> +            wi::add (wmax0, cst1, TYPE_SIGN (inner_type), &max_ovf);
> +          }
> +
> +        /* Inner overflow does not matter in this case.  */
> +        if (ovf_undef_inner)
> +          {
> +            min_ovf = wi::OVF_NONE;
> +            max_ovf = wi::OVF_NONE;
> +          }
> +
> +        /* Extend CST from INNER_TYPE to TYPE.  */
> +        cst1 = cst1.from (cst1, TYPE_PRECISION (type), TYPE_SIGN (inner_type));
> +
> +        /* Check for overflow of (TYPE)(CST1 + CST2).  */
> +        wi::overflow_type outer_ovf = wi::OVF_OVERFLOW;
> +        wide_int cst = wi::add (cst1, wi::to_wide (@2), TYPE_SIGN (type),
> +                                &outer_ovf);
> +
> +        /* We *do* care about an overflow here as we do not want to introduce
> +           new undefined behavior that was not there before.  */
> +        if (ovf_undef_outer && outer_ovf)
> +          {
> +            /* Set these here to prevent the final conversion below
> +               to take place instead of introducing a new guard variable.  */
> +            min_ovf = wi::OVF_OVERFLOW;
> +            max_ovf = wi::OVF_OVERFLOW;
> +          }
> +      }
> +      (if (min_ovf == wi::OVF_NONE && max_ovf == wi::OVF_NONE)
> +       (plus (convert @0) { wide_int_to_tree (type, cst); }
> +       )))))
> +#endif
> +
> +/* ((T)(A)) + CST -> (T)(A + CST) */

But then this is the reverse... (as Marc already noticed).

So - what are you really after? (sorry if I don't remember, testcase(s)
are missing from this patch)

To me it seems that 1) loses information if A + CST was done in a signed
type and we know that overflow doesn't happen because of that. For the
reverse transformation we don't. Btw, if you make A == A' + CST' then you
get (T)A + CST -> (T)(A' + CST' + CST) which is again trivially handled,
so why do you need both transforms again?

> +#if GIMPLE
> +  (simplify
> +   (plus (convert SSA_NAME@0) INTEGER_CST@1)
> +    (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> +         && INTEGRAL_TYPE_P (type)
> +         && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> +         && int_fits_type_p (@1, TREE_TYPE (@0)))
> +     /* Perform binary operation inside the cast if the constant fits
> +        and (A + CST)'s range does not overflow.  */
> +     (with
> +      {
> +        wi::overflow_type min_ovf = wi::OVF_OVERFLOW,
> +                          max_ovf = wi::OVF_OVERFLOW;
> +        tree inner_type = TREE_TYPE (@0);
> +
> +        wide_int w1 = w1.from (wi::to_wide (@1), TYPE_PRECISION (inner_type),
> +                               TYPE_SIGN (inner_type));
> +
> +        wide_int wmin0, wmax0;
> +        if (get_range_info (@0, &wmin0, &wmax0) == VR_RANGE)
> +          {
> +            wi::add (wmin0, w1, TYPE_SIGN (inner_type), &min_ovf);
> +            wi::add (wmax0, w1, TYPE_SIGN (inner_type), &max_ovf);
> +          }
> +      }
> +      (if (min_ovf == wi::OVF_NONE && max_ovf == wi::OVF_NONE)
> +       (convert (plus @0 { {wide_int_to_tree (TREE_TYPE (@0), w1)}; })))
> +    )))
> +#endif
> +
>  /* ~A + A -> -1 */
>  (simplify
>   (plus:c (bit_not @0) @0)
> --
> 2.17.0
>
> So - what are you really after? (sorry if I don't remember, testcase(s)
> are missing from this patch)
>
> To me it seems that 1) loses information if A + CST was done in a signed type
> and we know that overflow doesn't happen because of that. For the reverse
> transformation we don't. Btw, if you make A == A' + CST' then
> you get (T)A + CST -> (T)(A' + CST' + CST) which is again trivially handled
> so why do you need both transforms again?

As far as I know, the second transformation resulted from a suggestion to
split up what was the status back then. Some things have changed in
match.pd since, so I checked again:

What I want to achieve in the end is getting rid of

  add r1,-1
  extend r1
  add r1,1

emitted via doloop.

We have

 (1)       (T)(A + CST1) + CST2 -> (T)(A) + (T)CST1 + CST2
 (2)       (T)(A) + (T)CST1 + CST2 -> (T)(A) + (T)(CST1 + CST2)
 (2better) (A +- CST1) +- CST2 -> A + CST3  (the one Marc mentioned)
 (3)       (T)(A) + CST -> (T)(A + CST)

Considering

  unsigned long foo(unsigned int a)
  {
    if (a > 3 && a < INT_MAX - 100)
      return (unsigned long)(a - 1) + 1;
  }

this results in s390 assembly

  ahi	%r2,-1
  llgfr	%r2,%r2
  aghi	%r2,1
  br	%r14

With the second transform it becomes just

  br	%r14

due to

 (3)       Applying pattern match.pd:2091, gimple-match.c:36515
           Matching expression match.pd:111, gimple-match.c:65
 (2better) Applying pattern match.pd:1980, gimple-match.c:1011

in evrp.

I believe when I last tried a long time ago it would not work that way,
but now it does, so (3) is basically an enabler of (2better). As far as I
can tell, (2better) does not make use of range information, but I didn't
really know why it works the way it does using (3).

Now for

  unsigned long bar(unsigned int a)
  {
    if (a > 3 && a < INT_MAX - 100)
      return (unsigned long)(a + 1) + ULONG_MAX;
  }

we have

  Folded into: _2 = a_7 + 1;
  Folding statement: _3 = (long unsigned int) _2;
  Folded into: _3 = (long unsigned int) _2;
  Folding statement: _6 = _3 + 18446744073709551615;
 (1) Applying pattern match.pd:2060, gimple-match.c:36437

in vrp1.

This is not being handled by (2better) but by (1). The same applies for
long/int instead of ulong/uint, except that we don't need range
information there.

Now, when removing (2) from my patch I run into nasty out-of-memory errors
during bootstrap whose cause I haven't found yet. I thought of something
like two match.pd patterns that keep reverting each other's result, but I
think we have a recursion depth check in place there.

> Also do
>
>   if ((did_replace || fold_all_stmts)
>       && fold_stmt (...))
>     {
>     }
>
> to avoid extra work when folding does nothing.

Does this work like this? We want to mark the statement as modified even
if folding wasn't successful, I guess, and the gimple_set_modified () call
should not be that expensive? Could do a separate

  if (fold_all_stmts && fold_stmt (...))

of course.

Regards
 Robin
On Fri, Aug 16, 2019 at 2:17 PM Robin Dapp <rdapp@linux.ibm.com> wrote:
>
> > So - what are you really after? (sorry if I don't remember, testcase(s)
> > are missing
> > from this patch)
> >
> > To me it seems that 1) loses information if A + CST was done in a signed type
> > and we know that overflow doesn't happen because of that. For the reverse
> > transformation we don't. Btw, if you make A == A' + CST' then
> > you get (T)A + CST -> (T)(A' + CST' + CST) which is again trivially handled
> > so why do you need both transforms again?
>
> As far as I know the second transformation resulted from some suggestion
> to split what was the status back then. Some things have changed in
> match.pd so I checked again:
>
> What I want to achieve in the end is getting rid of
>  add r1,-1
>  extend r1
>  add r1,1
> emitted via doloop.
>
> We have
>  (1) (T)(A + CST1) + CST2 -> (T)(A) + (T)CST1 + CST2

Here + CST2 isn't part of the transform - is it just to restrict the
"first half" to a specific subset of cases?

>  (2) (T)(A) + (T)CST1 + CST2 -> (T)(A) + (T)(CST1 + CST2)

Likewise for (T)(A).

>  (2better) (A +- CST1) +- CST2 -> A + CST3 (the one Marc mentioned)

But that's (1) with A == (T)(A' + CST1)

>  (3) (T)(A) + CST -> (T)(A + CST)
>
> Considering
>  unsigned long foo(unsigned int a)
>  {
>    if (a > 3 && a < INT_MAX - 100)
>      return (unsigned long)(a - 1) + 1;

So you want to either compute everything in unsigned long or everything
in unsigned int. IMHO unsigned int is preferable. So that asks for
(T)A + 1 -> (T)(A + 1) if T is wider. We are already performing the
reverse if T is narrower, (T)(A + 1) -> (T)A + 1 (IIRC).

>  }
>
> This results in s390 assembly
>  ahi	%r2,-1
>  llgfr	%r2,%r2
>  aghi	%r2,1
>  br	%r14
>
> With the second transform it becomes just
>  br	%r14
> due to
>
>  (3) Applying pattern match.pd:2091, gimple-match.c:36515
>      Matching expression match.pd:111, gimple-match.c:65
>  (2better) Applying pattern match.pd:1980, gimple-match.c:1011
>
> in evrp.
>
> I believe when I last tried long time ago it would not work that way but
> now it does, so (3) is basically an enabler of (2better). As far as I
> can tell (2better) does not make use of range information but I didn't
> really know why it works the way it does using (3).
>
> Now for
>
>  unsigned long bar(unsigned int a)
>  {
>    if (a > 3 && a < INT_MAX - 100)
>      return (unsigned long)(a + 1) + ULONG_MAX;

Here we can't perform the add in the narrower type. But we do not want,
generally, (T)(a + CST) -> (T)A + CST when T is a widening conversion.
So for this case it makes sense to restrict it to the case where we can
combine the constants (but I wonder if this needs to be done in a pass
like reassoc to be effective).

So - which case is it? IIRC we want to handle small signed constants but
the code can end up unsigned. For the above we could write
(unsigned long)((int)a + 1 - 1) and thus sign-extend? Or even avoid this
if we know the range. That is, it becomes the first case again (operation
performed in a smaller type).

>  }
>
> we have
>
>  Folded into: _2 = a_7 + 1;
>  Folding statement: _3 = (long unsigned int) _2;
>  Folded into: _3 = (long unsigned int) _2;
>  Folding statement: _6 = _3 + 18446744073709551615;
>  (1) Applying pattern match.pd:2060, gimple-match.c:36437
>
> in vrp1.
>
> This is not being handled by (2better) but by (1). The same applies for
> long/int instead of ulong/uint except that we don't need range
> information there.
>
> Now, when removing (2) from my patch I run into nasty out-of-mem errors
> during bootstrap which I didn't find the reason for yet. I thought of
> something like two match.pd patterns that keep reverting the other's
> result but I think we have a recursion depth check in place there.
>
> > Also do
> >
> >   if ((did_replace || fold_all_stmts)
> >       && fold_stmt (...))
> >     {
> >     }
> >
> > to avoid extra work when folding does nothing.
>
> Does this work like this? We want to mark the statement as modified even
> if folding wasn't successful, I guess, and the gimple_set_modified ()
> call should not be that expensive? Could do a separate
>
>   if (fold_all_stmts && fold_stmt (...))

Sure, I was asking for a way to make fold_stmt conditional, it's of
course not 100% as simple as I quoted.

Richard.

> of course.
>
> Regards
> Robin
>
> So - which case is it? IIRC we want to handle small signed
> constants but the code can end up unsigned. For the
> above we could write (unsigned long)((int)a + 1 - 1) and thus
> sign-extend? Or even avoid this if we know the range.
> That is, it becomes the first case again (operation performed
> in a smaller type).

Both :) But in order to move forward, the second transform suffices to get
rid of the redundant subtraction/addition of the ivopts candidate.

Attached is a new version as a single file that disregards the other
possible cases (3/3 from before), only performs this transform, and now
checks for gimple_simplified in vrp2 after ivopts.

Bootstrapped, no regressions on s390 and x86.

Regards
 Robin
On Tue, Aug 20, 2019 at 10:37 AM Robin Dapp <rdapp@linux.ibm.com> wrote:
>
> > So - which case is it? IIRC we want to handle small signed
> > constants but the code can end up unsigned. For the
> > above we could write (unsigned long)((int)a + 1 - 1) and thus
> > sign-extend? Or even avoid this if we know the range.
> > That is, it becomes the first case again (operation performed
> > in a smaller type).
>
> Both :) But in order to move forward the second transform suffices to
> get rid of the redundant subtraction/addition of the ivopts candidate.
>
> Attached is a new version as single file that disregards the other
> possible cases (3/3 from before), only performs this transform and
> checks for gimple_simplified in vrp2 after ivopts now.

	  gimple_set_modified (stmt, true);
	}

+      /* Also fold if we want to fold all statements.  */
+      if (!did_replace && substitute_and_fold_engine->fold_all_stmts
+	  && fold_stmt (&i, follow_single_use_edges))
+	{
+	  did_replace = true;
+	  stmt = gsi_stmt (i);
+	  gimple_set_modified (stmt, true);

make it

	}
      else if (substitute_and_fold_engine->fold_all_stmts
	       && fold_stmt (&...

also please make initialization of fold_all_stmts via a new constructor
argument defaulted to false and elide the set_fold_all_stmts method.

Are the various testcase adjustments still needed (-fno-tree-ccp, etc.)?
I guess not.

+       && int_fits_type_p (@1, TREE_TYPE (@0)))

while this properly guards against bogus truncation of @1 I think it says
false to -1ul fitting int. So we wouldn't handle
(unsigned long)intvar + -1ul. But it's conservatively correct for now.
Eventually changing it to

  wi::min_precision (wi::to_wide (@1), TYPE_SIGN (TREE_TYPE (@0)))
    <= TYPE_PRECISION (TREE_TYPE (@0))

might work for that special case.

> Bootstrapped, no regressions on s390 and x86.

OK with the first suggested change and the testcase changes audited. We
can do the rest as followup.

Thanks,
Richard.

> Regards
> Robin
I'm going to commit the attached two patches. Removed the redundant
changes in test cases and added constructor initialization of
fold_all_stmts.

Regards
 Robin

--

gcc/ChangeLog:

2019-08-21  Robin Dapp  <rdapp@linux.ibm.com>

	* gimple-loop-versioning.cc
	(loop_versioning::record_address_fragment): Add nop_convert case.
	* tree-ssa-propagate.c
	(substitute_and_fold_dom_walker::before_dom_children): Fold all
	statements if requested.
	* tree-ssa-propagate.h (class substitute_and_fold_engine): Allow
	to fold all statements.
	* tree-vrp.c (class vrp_folder): Let substitute_and_fold_engine
	fold all statements.

gcc/fortran/ChangeLog:

2019-08-21  Robin Dapp  <rdapp@linux.ibm.com>

	* trans-intrinsic.c (gfc_conv_intrinsic_findloc): Initialize
	to prevent "maybe used uninitialized".

--

gcc/ChangeLog:

2019-08-21  Robin Dapp  <rdapp@linux.ibm.com>

	* match.pd: Add (T)(A) + CST -> (T)(A + CST).

gcc/testsuite/ChangeLog:

2019-08-21  Robin Dapp  <rdapp@linux.ibm.com>

	* gcc.dg/tree-ssa/copy-headers-5.c: Do not run vrp pass.
	* gcc.dg/tree-ssa/copy-headers-7.c: Do not run vrp pass.
	* gcc.dg/tree-ssa/loop-15.c: Remove XFAIL.
	* gcc.dg/tree-ssa/pr23744.c: Change search pattern.
	* gcc.dg/wrapped-binop-simplify.c: New test.
diff --git a/gcc/match.pd b/gcc/match.pd
index 0317bc704f7..94400529ad8 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2020,6 +2020,107 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (if (cst && !TREE_OVERFLOW (cst))
    (plus { cst; } @0))))
 
+/* ((T)(A + CST1)) + CST2 -> (T)(A) + CST */
+#if GIMPLE
+  (simplify
+   (plus (convert (plus @0 INTEGER_CST@1)) INTEGER_CST@2)
+    (if (INTEGRAL_TYPE_P (type)
+         && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)))
+     /* We actually want to perform two simplifications here:
+        (1) (T)(A + CST1) + CST2  -->  (T)(A) + (T)(CST1)
+            If for (A + CST1) we either do not care about overflow (e.g.
+            for a signed inner type) or the overflow is ok for an
+            unsigned inner type.
+        (2) (T)(A) + (T)(CST1) + CST2  -->  (T)(A) + (T)(CST1 + CST2)
+            If (CST1 + CST2) does not overflow and we do care about
+            overflow (for a signed outer type) or we do not care about
+            overflow in an unsigned outer type.  */
+     (with
+      {
+        tree inner_type = TREE_TYPE (@0);
+        wide_int wmin0, wmax0;
+        wide_int cst1 = wi::to_wide (@1);
+
+        wi::overflow_type min_ovf = wi::OVF_OVERFLOW,
+                          max_ovf = wi::OVF_OVERFLOW;
+
+        /* Get overflow behavior.  */
+        bool ovf_undef_inner = TYPE_OVERFLOW_UNDEFINED (inner_type);
+        bool ovf_undef_outer = TYPE_OVERFLOW_UNDEFINED (type);
+
+        /* Get value range of A.  */
+        enum value_range_kind vr0 = get_range_info (@0, &wmin0, &wmax0);
+
+        /* If we have a proper range, determine min and max overflow
+           of (A + CST1).
+           ??? We might also want to handle anti ranges.  */
+        if (vr0 == VR_RANGE)
+          {
+            wi::add (wmin0, cst1, TYPE_SIGN (inner_type), &min_ovf);
+            wi::add (wmax0, cst1, TYPE_SIGN (inner_type), &max_ovf);
+          }
+
+        /* Inner overflow does not matter in this case.  */
+        if (ovf_undef_inner)
+          {
+            min_ovf = wi::OVF_NONE;
+            max_ovf = wi::OVF_NONE;
+          }
+
+        /* Extend CST from INNER_TYPE to TYPE.  */
+        cst1 = cst1.from (cst1, TYPE_PRECISION (type), TYPE_SIGN (inner_type));
+
+        /* Check for overflow of (TYPE)(CST1 + CST2).  */
+        wi::overflow_type outer_ovf = wi::OVF_OVERFLOW;
+        wide_int cst = wi::add (cst1, wi::to_wide (@2), TYPE_SIGN (type),
+                                &outer_ovf);
+
+        /* We *do* care about an overflow here as we do not want to introduce
+           new undefined behavior that was not there before.  */
+        if (ovf_undef_outer && outer_ovf)
+          {
+            /* Set these here to prevent the final conversion below
+               to take place instead of introducing a new guard variable.  */
+            min_ovf = wi::OVF_OVERFLOW;
+            max_ovf = wi::OVF_OVERFLOW;
+          }
+      }
+      (if (min_ovf == wi::OVF_NONE && max_ovf == wi::OVF_NONE)
+       (plus (convert @0) { wide_int_to_tree (type, cst); }
+       )))))
+#endif
+
+/* ((T)(A)) + CST -> (T)(A + CST) */
+#if GIMPLE
+  (simplify
+   (plus (convert SSA_NAME@0) INTEGER_CST@1)
+    (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+         && INTEGRAL_TYPE_P (type)
+         && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
+         && int_fits_type_p (@1, TREE_TYPE (@0)))
+     /* Perform binary operation inside the cast if the constant fits
+        and (A + CST)'s range does not overflow.  */
+     (with
+      {
+        wi::overflow_type min_ovf = wi::OVF_OVERFLOW,
+                          max_ovf = wi::OVF_OVERFLOW;
+        tree inner_type = TREE_TYPE (@0);
+
+        wide_int w1 = w1.from (wi::to_wide (@1), TYPE_PRECISION (inner_type),
+                               TYPE_SIGN (inner_type));
+
+        wide_int wmin0, wmax0;
+        if (get_range_info (@0, &wmin0, &wmax0) == VR_RANGE)
+          {
+            wi::add (wmin0, w1, TYPE_SIGN (inner_type), &min_ovf);
+            wi::add (wmax0, w1, TYPE_SIGN (inner_type), &max_ovf);
+          }
+      }
+      (if (min_ovf == wi::OVF_NONE && max_ovf == wi::OVF_NONE)
+       (convert (plus @0 { {wide_int_to_tree (TREE_TYPE (@0), w1)}; })))
+    )))
+#endif
+
 /* ~A + A -> -1 */
 (simplify
  (plus:c (bit_not @0) @0)