diff mbox series

match.pd: Optimize __builtin_mul_overflow_p (x, cst, (utype)0) to x > ~(utype)0 / cst [PR30314]

Message ID Ypdv5qrjnk87DA8p@tucnak
State New
Headers show
Series match.pd: Optimize __builtin_mul_overflow_p (x, cst, (utype)0) to x > ~(utype)0 / cst [PR30314] | expand

Commit Message

Jakub Jelinek June 1, 2022, 1:55 p.m. UTC
Hi!

A comparison with a constant is most likely always faster than
.MUL_OVERFLOW from which we only check whether it overflowed and not the
multiplication result, and even if not, it is simpler operation on GIMPLE
and even if a target exists where such multiplications with overflow checking
are cheaper than comparisons, because comparisons are so much more common
than overflow checking multiplications, it would be nice if it simply
arranged for comparisons to be emitted like those multiplications on its
own...

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-06-01  Jakub Jelinek  <jakub@redhat.com>

	PR middle-end/30314
	* match.pd (__builtin_mul_overflow_p (x, cst, (utype) 0) ->
	x > ~(utype)0 / cst): New simplification.

	* gcc.dg/tree-ssa/pr30314.c: New test.


	Jakub

Comments

Jeff Law June 1, 2022, 2:07 p.m. UTC | #1
On 6/1/2022 7:55 AM, Jakub Jelinek via Gcc-patches wrote:
> Hi!
>
> A comparison with a constant is most likely always faster than
> .MUL_OVERFLOW from which we only check whether it overflowed and not the
> multiplication result, and even if not, it is simpler operation on GIMPLE
> and even if a target exists where such multiplications with overflow checking
> are cheaper than comparisons, because comparisons are so much more common
> than overflow checking multiplications, it would be nice if it simply
> arranged for comparisons to be emitted like those multiplications on its
> own...
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2022-06-01  Jakub Jelinek  <jakub@redhat.com>
>
> 	PR middle-end/30314
> 	* match.pd (__builtin_mul_overflow_p (x, cst, (utype) 0) ->
> 	x > ~(utype)0 / cst): New simplification.
>
> 	* gcc.dg/tree-ssa/pr30314.c: New test.
OK
jeff
Richard Biener June 2, 2022, 8:36 a.m. UTC | #2
On Wed, 1 Jun 2022, Jakub Jelinek wrote:

> Hi!
> 
> A comparison with a constant is most likely always faster than
> .MUL_OVERFLOW from which we only check whether it overflowed and not the
> multiplication result, and even if not, it is simpler operation on GIMPLE
> and even if a target exists where such multiplications with overflow checking
> are cheaper than comparisons, because comparisons are so much more common
> than overflow checking multiplications, it would be nice if it simply
> arranged for comparisons to be emitted like those multiplications on its
> own...
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2022-06-01  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR middle-end/30314
> 	* match.pd (__builtin_mul_overflow_p (x, cst, (utype) 0) ->
> 	x > ~(utype)0 / cst): New simplification.
> 
> 	* gcc.dg/tree-ssa/pr30314.c: New test.
> 
> --- gcc/match.pd.jj	2022-06-01 13:54:32.000654151 +0200
> +++ gcc/match.pd	2022-06-01 15:13:35.473084402 +0200
> @@ -5969,6 +5969,17 @@ (define_operator_list SYNC_FETCH_AND_AND
>         && (!TYPE_UNSIGNED (TREE_TYPE (@2)) || TYPE_UNSIGNED (TREE_TYPE (@0))))
>     (ovf @1 @0))))
>  
> +/* Optimize __builtin_mul_overflow_p (x, cst, (utype) 0) if all 3 types
> +   are unsigned to x > (umax / cst).  */
> +(simplify
> + (imagpart (IFN_MUL_OVERFLOW:cs@2 @0 integer_nonzerop@1))

does :c work here?  I think it is at least ignored, possibly diagnostic
in genmatch is missing ...

> +  (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> +       && TYPE_UNSIGNED (TREE_TYPE (@0))
> +       && TYPE_MAX_VALUE (TREE_TYPE (@0))
> +       && types_match (TREE_TYPE (@0), TREE_TYPE (TREE_TYPE (@2)))
> +       && int_fits_type_p (@1, TREE_TYPE (@0)))
> +   (convert (gt @0 (trunc_div! { TYPE_MAX_VALUE (TREE_TYPE (@0)); } @1)))))
> +
>  /* Simplification of math builtins.  These rules must all be optimizations
>     as well as IL simplifications.  If there is a possibility that the new
>     form could be a pessimization, the rule should go in the canonicalization
> --- gcc/testsuite/gcc.dg/tree-ssa/pr30314.c.jj	2022-06-01 15:22:53.201271365 +0200
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr30314.c	2022-06-01 15:13:24.725196482 +0200
> @@ -0,0 +1,18 @@
> +/* PR middle-end/30314 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +/* { dg-final { scan-tree-dump-not "\.MUL_OVERFLOW " "optimized" } } */
> +/* { dg-final { scan-tree-dump " > 122713351" "optimized" { target int32 } } } */
> +/* { dg-final { scan-tree-dump " > 527049830677415760" "optimized" { target lp64 } } } */
> +
> +int
> +foo (unsigned int x)
> +{
> +  return __builtin_mul_overflow_p (x, 35U, 0U);
> +}
> +
> +int
> +bar (unsigned long int x)
> +{
> +  return __builtin_mul_overflow_p (x, 35UL, 0UL);
> +}
> 
> 	Jakub
> 
>
Jakub Jelinek June 2, 2022, 9:18 a.m. UTC | #3
On Thu, Jun 02, 2022 at 08:36:42AM +0000, Richard Biener wrote:
> > --- gcc/match.pd.jj	2022-06-01 13:54:32.000654151 +0200
> > +++ gcc/match.pd	2022-06-01 15:13:35.473084402 +0200
> > @@ -5969,6 +5969,17 @@ (define_operator_list SYNC_FETCH_AND_AND
> >         && (!TYPE_UNSIGNED (TREE_TYPE (@2)) || TYPE_UNSIGNED (TREE_TYPE (@0))))
> >     (ovf @1 @0))))
> >  
> > +/* Optimize __builtin_mul_overflow_p (x, cst, (utype) 0) if all 3 types
> > +   are unsigned to x > (umax / cst).  */
> > +(simplify
> > + (imagpart (IFN_MUL_OVERFLOW:cs@2 @0 integer_nonzerop@1))
> 
> does :c work here?  I think it is at least ignored, possibly diagnostic
> in genmatch is missing ...

I saw it used in another pattern, so thought it will work fine:
  (cmp:c (realpart (IFN_ADD_OVERFLOW:c@2 @0 @1)) @0)

And looking at the generated source, I think it does work:
              case CFN_MUL_OVERFLOW:
                if (gimple_call_num_args (_c1) == 2)
                  {
                    tree _q20 = gimple_call_arg (_c1, 0);
                    _q20 = do_valueize (valueize, _q20);
                    tree _q21 = gimple_call_arg (_c1, 1);
                    _q21 = do_valueize (valueize, _q21);
                    if (integer_nonzerop (_q21))
                      {
                        {
/* #line 5976 "../../gcc/match.pd" */
                          tree captures[3] ATTRIBUTE_UNUSED = { _p0, _q20, _q21 };
                          if (gimple_simplify_256 (res_op, seq, valueize, type, captures))
                            return true;
                        }
                      }
                    if (integer_nonzerop (_q20))
                      {
                        {
/* #line 5976 "../../gcc/match.pd" */
                          tree captures[3] ATTRIBUTE_UNUSED = { _p0, _q21, _q20 };
                          if (gimple_simplify_256 (res_op, seq, valueize, type, captures))
                            return true;
                        }
                      }
                  }
Though, sure, I should test it in the testcase too.
Here is what I've committed as obvious after testing on x86_64-linux
-m32/-m64:

2022-06-02  Jakub Jelinek  <jakub@redhat.com>

	PR middle-end/30314
	* gcc.dg/tree-ssa/pr30314.c: Add tests with swapped arguments.

--- gcc/testsuite/gcc.dg/tree-ssa/pr30314.c.jj	2022-06-01 17:54:30.000000000 +0200
+++ gcc/testsuite/gcc.dg/tree-ssa/pr30314.c	2022-06-02 11:01:27.818987766 +0200
@@ -4,6 +4,8 @@
 /* { dg-final { scan-tree-dump-not "\.MUL_OVERFLOW " "optimized" } } */
 /* { dg-final { scan-tree-dump " > 122713351" "optimized" { target int32 } } } */
 /* { dg-final { scan-tree-dump " > 527049830677415760" "optimized" { target lp64 } } } */
+/* { dg-final { scan-tree-dump " > 102261126" "optimized" { target int32 } } } */
+/* { dg-final { scan-tree-dump " > 439208192231179800" "optimized" { target lp64 } } } */
 
 int
 foo (unsigned int x)
@@ -16,3 +18,15 @@ bar (unsigned long int x)
 {
   return __builtin_mul_overflow_p (x, 35UL, 0UL);
 }
+
+int
+baz (unsigned int x)
+{
+  return __builtin_mul_overflow_p (42, x, 0U);
+}
+
+int
+qux (unsigned long int x)
+{
+  return __builtin_mul_overflow_p (42, x, 0UL);
+}


	Jakub
Richard Biener June 2, 2022, 9:58 a.m. UTC | #4
On Thu, 2 Jun 2022, Jakub Jelinek wrote:

> On Thu, Jun 02, 2022 at 08:36:42AM +0000, Richard Biener wrote:
> > > --- gcc/match.pd.jj	2022-06-01 13:54:32.000654151 +0200
> > > +++ gcc/match.pd	2022-06-01 15:13:35.473084402 +0200
> > > @@ -5969,6 +5969,17 @@ (define_operator_list SYNC_FETCH_AND_AND
> > >         && (!TYPE_UNSIGNED (TREE_TYPE (@2)) || TYPE_UNSIGNED (TREE_TYPE (@0))))
> > >     (ovf @1 @0))))
> > >  
> > > +/* Optimize __builtin_mul_overflow_p (x, cst, (utype) 0) if all 3 types
> > > +   are unsigned to x > (umax / cst).  */
> > > +(simplify
> > > + (imagpart (IFN_MUL_OVERFLOW:cs@2 @0 integer_nonzerop@1))
> > 
> > does :c work here?  I think it is at least ignored, possibly diagnostic
> > in genmatch is missing ...
> 
> I saw it used in another pattern, so thought it will work fine:
>   (cmp:c (realpart (IFN_ADD_OVERFLOW:c@2 @0 @1)) @0)
> 
> And looking at the generated source, I think it does work:
>               case CFN_MUL_OVERFLOW:
>                 if (gimple_call_num_args (_c1) == 2)
>                   {
>                     tree _q20 = gimple_call_arg (_c1, 0);
>                     _q20 = do_valueize (valueize, _q20);
>                     tree _q21 = gimple_call_arg (_c1, 1);
>                     _q21 = do_valueize (valueize, _q21);
>                     if (integer_nonzerop (_q21))
>                       {
>                         {
> /* #line 5976 "../../gcc/match.pd" */
>                           tree captures[3] ATTRIBUTE_UNUSED = { _p0, _q20, _q21 };
>                           if (gimple_simplify_256 (res_op, seq, valueize, type, captures))
>                             return true;
>                         }
>                       }
>                     if (integer_nonzerop (_q20))
>                       {
>                         {
> /* #line 5976 "../../gcc/match.pd" */
>                           tree captures[3] ATTRIBUTE_UNUSED = { _p0, _q21, _q20 };
>                           if (gimple_simplify_256 (res_op, seq, valueize, type, captures))
>                             return true;
>                         }
>                       }
>                   }
> Though, sure, I should test it in the testcase too.

Ah yeah, we simply trust :c on functions with two arguments (and
explicitely handle CFN_FMA).

Richard.
diff mbox series

Patch

--- gcc/match.pd.jj	2022-06-01 13:54:32.000654151 +0200
+++ gcc/match.pd	2022-06-01 15:13:35.473084402 +0200
@@ -5969,6 +5969,17 @@  (define_operator_list SYNC_FETCH_AND_AND
        && (!TYPE_UNSIGNED (TREE_TYPE (@2)) || TYPE_UNSIGNED (TREE_TYPE (@0))))
    (ovf @1 @0))))
 
+/* Optimize __builtin_mul_overflow_p (x, cst, (utype) 0) if all 3 types
+   are unsigned to x > (umax / cst).  */
+(simplify
+ (imagpart (IFN_MUL_OVERFLOW:cs@2 @0 integer_nonzerop@1))
+  (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+       && TYPE_UNSIGNED (TREE_TYPE (@0))
+       && TYPE_MAX_VALUE (TREE_TYPE (@0))
+       && types_match (TREE_TYPE (@0), TREE_TYPE (TREE_TYPE (@2)))
+       && int_fits_type_p (@1, TREE_TYPE (@0)))
+   (convert (gt @0 (trunc_div! { TYPE_MAX_VALUE (TREE_TYPE (@0)); } @1)))))
+
 /* Simplification of math builtins.  These rules must all be optimizations
    as well as IL simplifications.  If there is a possibility that the new
    form could be a pessimization, the rule should go in the canonicalization
--- gcc/testsuite/gcc.dg/tree-ssa/pr30314.c.jj	2022-06-01 15:22:53.201271365 +0200
+++ gcc/testsuite/gcc.dg/tree-ssa/pr30314.c	2022-06-01 15:13:24.725196482 +0200
@@ -0,0 +1,18 @@ 
+/* PR middle-end/30314 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-not "\.MUL_OVERFLOW " "optimized" } } */
+/* { dg-final { scan-tree-dump " > 122713351" "optimized" { target int32 } } } */
+/* { dg-final { scan-tree-dump " > 527049830677415760" "optimized" { target lp64 } } } */
+
+int
+foo (unsigned int x)
+{
+  return __builtin_mul_overflow_p (x, 35U, 0U);
+}
+
+int
+bar (unsigned long int x)
+{
+  return __builtin_mul_overflow_p (x, 35UL, 0UL);
+}