Message ID | Ypdv5qrjnk87DA8p@tucnak |
---|---|
State | New |
Headers | show |
Series | match.pd: Optimize __builtin_mul_overflow_p (x, cst, (utype)0) to x > ~(utype)0 / cst [PR30314] | expand |
On 6/1/2022 7:55 AM, Jakub Jelinek via Gcc-patches wrote: > Hi! > > A comparison with a constant is most likely always faster than > .MUL_OVERFLOW from which we only check whether it overflowed and not the > multiplication result, and even if not, it is simpler operation on GIMPLE > and even if a target exists where such multiplications with overflow checking > are cheaper than comparisons, because comparisons are so much more common > than overflow checking multiplications, it would be nice if it simply > arranged for comparisons to be emitted like those multiplications on its > own... > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > 2022-06-01 Jakub Jelinek <jakub@redhat.com> > > PR middle-end/30314 > * match.pd (__builtin_mul_overflow_p (x, cst, (utype) 0) -> > x > ~(utype)0 / cst): New simplification. > > * gcc.dg/tree-ssa/pr30314.c: New test. OK jeff
On Wed, 1 Jun 2022, Jakub Jelinek wrote: > Hi! > > A comparison with a constant is most likely always faster than > .MUL_OVERFLOW from which we only check whether it overflowed and not the > multiplication result, and even if not, it is simpler operation on GIMPLE > and even if a target exists where such multiplications with overflow checking > are cheaper than comparisons, because comparisons are so much more common > than overflow checking multiplications, it would be nice if it simply > arranged for comparisons to be emitted like those multiplications on its > own... > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > 2022-06-01 Jakub Jelinek <jakub@redhat.com> > > PR middle-end/30314 > * match.pd (__builtin_mul_overflow_p (x, cst, (utype) 0) -> > x > ~(utype)0 / cst): New simplification. > > * gcc.dg/tree-ssa/pr30314.c: New test. > > --- gcc/match.pd.jj 2022-06-01 13:54:32.000654151 +0200 > +++ gcc/match.pd 2022-06-01 15:13:35.473084402 +0200 > @@ -5969,6 +5969,17 @@ (define_operator_list SYNC_FETCH_AND_AND > && (!TYPE_UNSIGNED (TREE_TYPE (@2)) || TYPE_UNSIGNED (TREE_TYPE (@0)))) > (ovf @1 @0)))) > > +/* Optimize __builtin_mul_overflow_p (x, cst, (utype) 0) if all 3 types > + are unsigned to x > (umax / cst). */ > +(simplify > + (imagpart (IFN_MUL_OVERFLOW:cs@2 @0 integer_nonzerop@1)) does :c work here? I think it is at least ignored, possibly diagnostic in genmatch is missing ... > + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) > + && TYPE_UNSIGNED (TREE_TYPE (@0)) > + && TYPE_MAX_VALUE (TREE_TYPE (@0)) > + && types_match (TREE_TYPE (@0), TREE_TYPE (TREE_TYPE (@2))) > + && int_fits_type_p (@1, TREE_TYPE (@0))) > + (convert (gt @0 (trunc_div! { TYPE_MAX_VALUE (TREE_TYPE (@0)); } @1))))) > + > /* Simplification of math builtins. These rules must all be optimizations > as well as IL simplifications. If there is a possibility that the new > form could be a pessimization, the rule should go in the canonicalization > --- gcc/testsuite/gcc.dg/tree-ssa/pr30314.c.jj 2022-06-01 15:22:53.201271365 +0200 > +++ gcc/testsuite/gcc.dg/tree-ssa/pr30314.c 2022-06-01 15:13:24.725196482 +0200 > @@ -0,0 +1,18 @@ > +/* PR middle-end/30314 */ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > +/* { dg-final { scan-tree-dump-not "\.MUL_OVERFLOW " "optimized" } } */ > +/* { dg-final { scan-tree-dump " > 122713351" "optimized" { target int32 } } } */ > +/* { dg-final { scan-tree-dump " > 527049830677415760" "optimized" { target lp64 } } } */ > + > +int > +foo (unsigned int x) > +{ > + return __builtin_mul_overflow_p (x, 35U, 0U); > +} > + > +int > +bar (unsigned long int x) > +{ > + return __builtin_mul_overflow_p (x, 35UL, 0UL); > +} > > Jakub > >
On Thu, Jun 02, 2022 at 08:36:42AM +0000, Richard Biener wrote: > > --- gcc/match.pd.jj 2022-06-01 13:54:32.000654151 +0200 > > +++ gcc/match.pd 2022-06-01 15:13:35.473084402 +0200 > > @@ -5969,6 +5969,17 @@ (define_operator_list SYNC_FETCH_AND_AND > > && (!TYPE_UNSIGNED (TREE_TYPE (@2)) || TYPE_UNSIGNED (TREE_TYPE (@0)))) > > (ovf @1 @0)))) > > > > +/* Optimize __builtin_mul_overflow_p (x, cst, (utype) 0) if all 3 types > > + are unsigned to x > (umax / cst). */ > > +(simplify > > + (imagpart (IFN_MUL_OVERFLOW:cs@2 @0 integer_nonzerop@1)) > > does :c work here? I think it is at least ignored, possibly diagnostic > in genmatch is missing ... I saw it used in another pattern, so thought it will work fine: (cmp:c (realpart (IFN_ADD_OVERFLOW:c@2 @0 @1)) @0) And looking at the generated source, I think it does work: case CFN_MUL_OVERFLOW: if (gimple_call_num_args (_c1) == 2) { tree _q20 = gimple_call_arg (_c1, 0); _q20 = do_valueize (valueize, _q20); tree _q21 = gimple_call_arg (_c1, 1); _q21 = do_valueize (valueize, _q21); if (integer_nonzerop (_q21)) { { /* #line 5976 "../../gcc/match.pd" */ tree captures[3] ATTRIBUTE_UNUSED = { _p0, _q20, _q21 }; if (gimple_simplify_256 (res_op, seq, valueize, type, captures)) return true; } } if (integer_nonzerop (_q20)) { { /* #line 5976 "../../gcc/match.pd" */ tree captures[3] ATTRIBUTE_UNUSED = { _p0, _q21, _q20 }; if (gimple_simplify_256 (res_op, seq, valueize, type, captures)) return true; } } } Though, sure, I should test it in the testcase too. Here is what I've committed as obvious after testing on x86_64-linux -m32/-m64: 2022-06-02 Jakub Jelinek <jakub@redhat.com> PR middle-end/30314 * gcc.dg/tree-ssa/pr30314.c: Add tests with swapped arguments. --- gcc/testsuite/gcc.dg/tree-ssa/pr30314.c.jj 2022-06-01 17:54:30.000000000 +0200 +++ gcc/testsuite/gcc.dg/tree-ssa/pr30314.c 2022-06-02 11:01:27.818987766 +0200 @@ -4,6 +4,8 @@ /* { dg-final { scan-tree-dump-not "\.MUL_OVERFLOW " "optimized" } } */ /* { dg-final { scan-tree-dump " > 122713351" "optimized" { target int32 } } } */ /* { dg-final { scan-tree-dump " > 527049830677415760" "optimized" { target lp64 } } } */ +/* { dg-final { scan-tree-dump " > 102261126" "optimized" { target int32 } } } */ +/* { dg-final { scan-tree-dump " > 439208192231179800" "optimized" { target lp64 } } } */ int foo (unsigned int x) @@ -16,3 +18,15 @@ bar (unsigned long int x) { return __builtin_mul_overflow_p (x, 35UL, 0UL); } + +int +baz (unsigned int x) +{ + return __builtin_mul_overflow_p (42, x, 0U); +} + +int +qux (unsigned long int x) +{ + return __builtin_mul_overflow_p (42, x, 0UL); +} Jakub
On Thu, 2 Jun 2022, Jakub Jelinek wrote: > On Thu, Jun 02, 2022 at 08:36:42AM +0000, Richard Biener wrote: > > > --- gcc/match.pd.jj 2022-06-01 13:54:32.000654151 +0200 > > > +++ gcc/match.pd 2022-06-01 15:13:35.473084402 +0200 > > > @@ -5969,6 +5969,17 @@ (define_operator_list SYNC_FETCH_AND_AND > > > && (!TYPE_UNSIGNED (TREE_TYPE (@2)) || TYPE_UNSIGNED (TREE_TYPE (@0)))) > > > (ovf @1 @0)))) > > > > > > +/* Optimize __builtin_mul_overflow_p (x, cst, (utype) 0) if all 3 types > > > + are unsigned to x > (umax / cst). */ > > > +(simplify > > > + (imagpart (IFN_MUL_OVERFLOW:cs@2 @0 integer_nonzerop@1)) > > > > does :c work here? I think it is at least ignored, possibly diagnostic > > in genmatch is missing ... > > I saw it used in another pattern, so thought it will work fine: > (cmp:c (realpart (IFN_ADD_OVERFLOW:c@2 @0 @1)) @0) > > And looking at the generated source, I think it does work: > case CFN_MUL_OVERFLOW: > if (gimple_call_num_args (_c1) == 2) > { > tree _q20 = gimple_call_arg (_c1, 0); > _q20 = do_valueize (valueize, _q20); > tree _q21 = gimple_call_arg (_c1, 1); > _q21 = do_valueize (valueize, _q21); > if (integer_nonzerop (_q21)) > { > { > /* #line 5976 "../../gcc/match.pd" */ > tree captures[3] ATTRIBUTE_UNUSED = { _p0, _q20, _q21 }; > if (gimple_simplify_256 (res_op, seq, valueize, type, captures)) > return true; > } > } > if (integer_nonzerop (_q20)) > { > { > /* #line 5976 "../../gcc/match.pd" */ > tree captures[3] ATTRIBUTE_UNUSED = { _p0, _q21, _q20 }; > if (gimple_simplify_256 (res_op, seq, valueize, type, captures)) > return true; > } > } > } > Though, sure, I should test it in the testcase too. Ah yeah, we simply trust :c on functions with two arguments (and explicitely handle CFN_FMA). Richard.
--- gcc/match.pd.jj 2022-06-01 13:54:32.000654151 +0200 +++ gcc/match.pd 2022-06-01 15:13:35.473084402 +0200 @@ -5969,6 +5969,17 @@ (define_operator_list SYNC_FETCH_AND_AND && (!TYPE_UNSIGNED (TREE_TYPE (@2)) || TYPE_UNSIGNED (TREE_TYPE (@0)))) (ovf @1 @0)))) +/* Optimize __builtin_mul_overflow_p (x, cst, (utype) 0) if all 3 types + are unsigned to x > (umax / cst). */ +(simplify + (imagpart (IFN_MUL_OVERFLOW:cs@2 @0 integer_nonzerop@1)) + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) + && TYPE_UNSIGNED (TREE_TYPE (@0)) + && TYPE_MAX_VALUE (TREE_TYPE (@0)) + && types_match (TREE_TYPE (@0), TREE_TYPE (TREE_TYPE (@2))) + && int_fits_type_p (@1, TREE_TYPE (@0))) + (convert (gt @0 (trunc_div! { TYPE_MAX_VALUE (TREE_TYPE (@0)); } @1))))) + /* Simplification of math builtins. These rules must all be optimizations as well as IL simplifications. If there is a possibility that the new form could be a pessimization, the rule should go in the canonicalization --- gcc/testsuite/gcc.dg/tree-ssa/pr30314.c.jj 2022-06-01 15:22:53.201271365 +0200 +++ gcc/testsuite/gcc.dg/tree-ssa/pr30314.c 2022-06-01 15:13:24.725196482 +0200 @@ -0,0 +1,18 @@ +/* PR middle-end/30314 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ +/* { dg-final { scan-tree-dump-not "\.MUL_OVERFLOW " "optimized" } } */ +/* { dg-final { scan-tree-dump " > 122713351" "optimized" { target int32 } } } */ +/* { dg-final { scan-tree-dump " > 527049830677415760" "optimized" { target lp64 } } } */ + +int +foo (unsigned int x) +{ + return __builtin_mul_overflow_p (x, 35U, 0U); +} + +int +bar (unsigned long int x) +{ + return __builtin_mul_overflow_p (x, 35UL, 0UL); +}