
Multiply Optimization in match and Simplify

Message ID SN2PR0701MB102448CA48B5A9C045EDBBAE8E200@SN2PR0701MB1024.namprd07.prod.outlook.com
State New

Commit Message

Hurugalawadi, Naveen Oct. 29, 2015, 4:34 a.m. UTC
Hi,

Please find attached the patch that moves some multiply optimizations
from fold-const.c to match.pd using match-and-simplify.

Please review the patch and let me know if any modifications are required.

Tested the patch on X86.

I'm observing the following failures:

>> FAIL: gcc.dg/fold-plusmult.c scan-tree-dump-times original "<a> \\* 4" 2

Should the testcase be changed to suit the current pattern?

>> FAIL: gcc.dg/tree-ssa/vrp47.c scan-tree-dump-times vrp2 " & 1;" 0
>> FAIL: gcc.dg/tree-ssa/vrp59.c scan-tree-dump-not vrp1 " & 3;"

It's due to the following pattern, which seems to be correct:
Fold X & (X ^ Y) as X & ~Y
The test PASSes when the result is ~X & Y in some of the
following combinations, which is wrong:
(bit_and (convert? (bit_xor:c @0 @1)) (convert? @0))
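For reference, the identity behind the pattern can be checked directly in C
(a standalone sketch with made-up helper names, not part of the patch):

```c
#include <assert.h>

/* Left-hand side of the pattern: X & (X ^ Y).  */
unsigned lhs_and_xor (unsigned x, unsigned y)
{
  return x & (x ^ y);
}

/* Right-hand side of the pattern: X & ~Y.  Both pick exactly the
   bits where X is 1 and Y is 0.  */
unsigned rhs_and_not (unsigned x, unsigned y)
{
  return x & ~y;
}
```

The fold itself is sound; the testsuite regressions below come from how it
interacts with other patterns, not from the identity.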

Thanks,
Naveen

ChangeLog

2015-10-29  Naveen H.S  <Naveen.Hurugalawadi@caviumnetworks.com>

        * fold-const.c (fold_binary_loc): Remove A - B -> A + (-B) if B
	is easily negatable, as it's already present.
	Move x * -C into -x * C if x is easily negatable to match.pd.
	Move (A + A) * C -> A * 2 * C to match.pd.
	Move Fold (X ^ Y) & Y as ~X & Y to match.pd.
	Move Fold (X ^ Y) & X as ~Y & X to match.pd.
	Move Fold X & (X ^ Y) as X & ~Y to match.pd.
	Move Fold X & (Y ^ X) as ~Y & X to match.pd.

        * match.pd (bit_and:c (convert? @0) (convert? (bit_xor:c @0 @1))):
	New simplifier.
	(mult:c (convert? negate_expr_p@0) INTEGER_CST@1): New simplifier.
	(mult:c (convert? (plus @0 @0)) (convert? @1)): New simplifier.
	(mult:c (convert? (plus @0 @0)) INTEGER_CST@1): New simplifier.

Comments

Richard Biener Oct. 30, 2015, 10:07 a.m. UTC | #1
On Thu, Oct 29, 2015 at 5:34 AM, Hurugalawadi, Naveen
<Naveen.Hurugalawadi@caviumnetworks.com> wrote:
> Hi,
>
> Please find attached the patch that moves some multiply optimizations
> from fold-const.c to match.pd using match-and-simplify.
>
> Please review the patch and let me know if any modifications are required.
>
> Tested the patch on X86.
>
> I'm observing the following failures:
>
>>> FAIL: gcc.dg/fold-plusmult.c scan-tree-dump-times original "<a> \\* 4" 2
>
> Should the testcase be changed to suit the current pattern?
>
>>> FAIL: gcc.dg/tree-ssa/vrp47.c scan-tree-dump-times vrp2 " & 1;" 0
>>> FAIL: gcc.dg/tree-ssa/vrp59.c scan-tree-dump-not vrp1 " & 3;"
>
> It's due to the following pattern, which seems to be correct:
> Fold X & (X ^ Y) as X & ~Y
> The test PASSes when the result is ~X & Y in some of the
> following combinations, which is wrong:
> (bit_and (convert? (bit_xor:c @0 @1)) (convert? @0))

Please do not drop A - B -> A + (-B) from fold-const as match.pd
doesn't implement all of fold-const.c negate_expr_p support.

+/* Fold X & (X ^ Y) as X & ~Y.  */
+(simplify
+ (bit_and:c (convert? @0) (convert? (bit_xor:c @0 @1)))
+  (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
+   (convert (bit_and @0 (bit_not @1)))))

Ok, so the VRP regression is because we convert

  _8 = x_2(D) ^ 1;
  _4 = (_Bool) _8;
  _5 = (int) _4;

to

  _7 = ~x_2(D);
  _9 = _7 & 1;

via the new pattern and

    /* A truncation to an unsigned type (a zero-extension) should be
       canonicalized as bitwise and of a mask.  */
    (if (final_int && inter_int && inside_int
         && final_prec == inside_prec
         && final_prec > inter_prec
         && inter_unsignedp)
     (convert (bit_and @0 { wide_int_to_tree
                              (inside_type,
                               wi::mask (inter_prec, false,
                                         TYPE_PRECISION (inside_type))); })))

Previously VRP ended up with

  <bb 3>:
  _8 = x_2(D) ^ 1;

and now we have

  _7 = ~x_2(D);
  _9 = _7 & 1;

which is more expensive.  This means that we miss a

(bit_and (bit_not @0) INTEGER_CST@1)

-> (bit_xor @0 @1)

pattern that applies when VRP knows the range of x_2(D)
(all masked bits are known to be zero).
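The identity behind that missing pattern, sketched in C (hypothetical helper
names; it holds only under the precondition VRP would prove here, namely that
all bits of x outside the mask are zero):

```c
#include <assert.h>

/* What the new bit_and pattern produces: ~x & mask.  */
unsigned not_and_mask (unsigned x, unsigned mask)
{
  return ~x & mask;
}

/* The cheaper single-statement form VRP used to keep: x ^ mask.
   Equivalent to ~x & mask only when (x & ~mask) == 0, i.e. every
   bit of x outside the mask is known to be zero.  */
unsigned xor_mask (unsigned x, unsigned mask)
{
  return x ^ mask;
}
```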

+/* Convert X * -C into -X * C.  */
+(simplify
+ (mult:c (convert? negate_expr_p@0) INTEGER_CST@1)
+  (if (tree_int_cst_sgn (@1) == -1)
+   (with { tree tem = const_unop (NEGATE_EXPR, type, @1); }
+    (if (!TREE_OVERFLOW (tem) && wi::ne_p (tem, @1)
+         && tree_nop_conversion_p (type, TREE_TYPE (@0)))
+     (mult (convert (negate @0)) @1)))))

As said above, match.pd's negate_expr_p doesn't capture everything
the fold-const.c version does, so moving the above isn't a good idea.
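The arithmetic identity this pattern implements, sketched in C for a sample
constant C = 5 (overflow-free inputs assumed; this illustrates only the
transform itself, not the negate_expr_p coverage issue raised above):

```c
#include <assert.h>

/* X * -C, the form the pattern matches.  */
int mul_by_neg_const (int x)
{
  return x * -5;
}

/* -X * C, the canonical form it rewrites to.  Equal to the above
   whenever neither form overflows.  */
int neg_mul_by_const (int x)
{
  return -x * 5;
}
```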

+/* Convert (A + A) * C -> A * 2 * C.  */
+(simplify
+ (mult:c (convert? (plus @0 @0)) (convert? @1))
+  (if (tree_nop_conversion_p (TREE_TYPE (@0), type))
+   (convert (mult @0 (mult { build_int_cst (TREE_TYPE (@1), 2); } @1)))))
+(simplify
+ (mult:c (convert? (plus @0 @0)) INTEGER_CST@1)
+  (if (tree_nop_conversion_p (TREE_TYPE (@0), type))
+   (convert (mult @0 (mult { build_int_cst (TREE_TYPE (@1), 2); } @1)))))

fold-const.c only handles constant C, so we only need the 2nd pattern.
Also the :c on the mult in that is not needed due to canonicalization rules.
Please build the result of the inner multiplication directly.
I think the fold-const.c code has an overflow issue when the outer
multiplication is signed and the inner addition unsigned.
(signed)((unsigned)INT_MAX + (unsigned)INT_MAX) * 2 is valid but
INT_MAX * 4 is not, as it overflows.  So I think we should _not_
allow nop conversions here (it's fine if all ops are signed or
unsigned).
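That overflow concern, sketched with fixed-width types (assumes the common
implementation-defined wrapping behaviour for the unsigned-to-signed
conversion; the 64-bit returns exist only so both values can be compared):

```c
#include <assert.h>
#include <stdint.h>

/* (int32_t)((uint32_t)a + (uint32_t)a) * 2: the inner unsigned
   addition wraps modulo 2^32, so the expression is well defined
   even for a = INT32_MAX.  */
int64_t mixed_sign_form (int32_t a)
{
  uint32_t sum = (uint32_t) a + (uint32_t) a;
  return (int64_t) (int32_t) sum * 2;
}

/* The value a * 4 that the (A + A) * C fold would produce, computed
   in 64 bits; in 32-bit signed arithmetic a * 4 overflows (UB) for
   a = INT32_MAX, and it does not match the mixed-sign result.  */
int64_t folded_form (int32_t a)
{
  return (int64_t) a * 4;
}
```

For small values the two agree, so the fold is fine when all operands share
one signedness; the mismatch at INT32_MAX is why nop conversions must be
rejected.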

Richard.



> Thanks,
> Naveen
>
> ChangeLog
>
> 2015-10-29  Naveen H.S  <Naveen.Hurugalawadi@caviumnetworks.com>
>
>         * fold-const.c (fold_binary_loc): Remove A - B -> A + (-B) if B
>         is easily negatable, as it's already present.
>         Move x * -C into -x * C if x is easily negatable to match.pd.
>         Move (A + A) * C -> A * 2 * C to match.pd.
>         Move Fold (X ^ Y) & Y as ~X & Y to match.pd.
>         Move Fold (X ^ Y) & X as ~Y & X to match.pd.
>         Move Fold X & (X ^ Y) as X & ~Y to match.pd.
>         Move Fold X & (Y ^ X) as ~Y & X to match.pd.
>
>         * match.pd (bit_and:c (convert? @0) (convert? (bit_xor:c @0 @1))):
>         New simplifier.
>         (mult:c (convert? negate_expr_p@0) INTEGER_CST@1): New simplifier.
>         (mult:c (convert? (plus @0 @0)) (convert? @1)): New simplifier.
>         (mult:c (convert? (plus @0 @0)) INTEGER_CST@1): New simplifier.
Marc Glisse Oct. 30, 2015, 10:26 a.m. UTC | #2
On Fri, 30 Oct 2015, Richard Biener wrote:

> +/* Convert (A + A) * C -> A * 2 * C.  */
> +(simplify
> + (mult:c (convert? (plus @0 @0)) (convert? @1))
> +  (if (tree_nop_conversion_p (TREE_TYPE (@0), type))
> +   (convert (mult @0 (mult { build_int_cst (TREE_TYPE (@1), 2); } @1)))))
> +(simplify
> + (mult:c (convert? (plus @0 @0)) INTEGER_CST@1)
> +  (if (tree_nop_conversion_p (TREE_TYPE (@0), type))
> +   (convert (mult @0 (mult { build_int_cst (TREE_TYPE (@1), 2); } @1)))))
>
> fold-const.c only handles constant C, so we only need the 2nd pattern.
> Also the :c on the mult in that is not needed due to canonicalization rules.
> Please build the result of the inner multiplication directly.
> I think the fold-const.c code has an overflow issue when the outer
> multiplication
> is signed and the inner addition unsigned.  (signed)((unsigned)INT_MAX
> + (unsigned)INT_MAX)*2
> is valid but INT_MAX * 4 is not as it overflows.  So I think we should
> _not_ allow
> nop conversions here (it's fine if all ops are signed or unsigned).

Is there a reason why the simple transformation A+A->2*A is not
generally a desirable canonicalization? We currently restrict it to
SCALAR_FLOAT_TYPE_P.
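For integer types the two forms are bit-identical even when the addition
wraps, which is what makes the SCALAR_FLOAT_TYPE_P restriction look
arbitrary — a quick sketch (unsigned, so the wraparound is well defined):

```c
#include <assert.h>

/* a + a and 2 * a compute the same value for unsigned types,
   including when the addition wraps modulo 2^N.  */
unsigned twice_by_add (unsigned a) { return a + a; }
unsigned twice_by_mul (unsigned a) { return 2u * a; }
```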
Richard Biener Oct. 30, 2015, 10:55 a.m. UTC | #3
On Fri, Oct 30, 2015 at 11:26 AM, Marc Glisse <marc.glisse@inria.fr> wrote:
> On Fri, 30 Oct 2015, Richard Biener wrote:
>
>> +/* Convert (A + A) * C -> A * 2 * C.  */
>> +(simplify
>> + (mult:c (convert? (plus @0 @0)) (convert? @1))
>> +  (if (tree_nop_conversion_p (TREE_TYPE (@0), type))
>> +   (convert (mult @0 (mult { build_int_cst (TREE_TYPE (@1), 2); } @1)))))
>> +(simplify
>> + (mult:c (convert? (plus @0 @0)) INTEGER_CST@1)
>> +  (if (tree_nop_conversion_p (TREE_TYPE (@0), type))
>> +   (convert (mult @0 (mult { build_int_cst (TREE_TYPE (@1), 2); } @1)))))
>>
>> fold-const.c only handles constant C, so we only need the 2nd pattern.
>> Also the :c on the mult in that is not needed due to canonicalization
>> rules.
>> Please build the result of the inner multiplication directly.
>> I think the fold-const.c code has an overflow issue when the outer
>> multiplication
>> is signed and the inner addition unsigned.  (signed)((unsigned)INT_MAX
>> + (unsigned)INT_MAX)*2
>> is valid but INT_MAX * 4 is not as it overflows.  So I think we should
>> _not_ allow
>> nop conversions here (it's fine if all ops are signed or unsigned).
>
>
> Is there a reason why the simple transformation A+A->2*A is not
> generally a desirable canonicalization? We currently restrict it to
> SCALAR_FLOAT_TYPE_P.

No special reason I know of.

>
> --
> Marc Glisse

Patch

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index e8ff1de..0334b53 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -9695,19 +9695,6 @@  fold_binary_loc (location_t loc,
 	    }
 	}
 
-      /* A - B -> A + (-B) if B is easily negatable.  */
-      if (negate_expr_p (arg1)
-	  && !TYPE_OVERFLOW_SANITIZED (type)
-	  && ((FLOAT_TYPE_P (type)
-               /* Avoid this transformation if B is a positive REAL_CST.  */
-	       && (TREE_CODE (arg1) != REAL_CST
-		   ||  REAL_VALUE_NEGATIVE (TREE_REAL_CST (arg1))))
-	      || INTEGRAL_TYPE_P (type)))
-	return fold_build2_loc (loc, PLUS_EXPR, type,
-			    fold_convert_loc (loc, type, arg0),
-			    fold_convert_loc (loc, type,
-					      negate_expr (arg1)));
-
       /* Fold &a[i] - &a[j] to i-j.  */
       if (TREE_CODE (arg0) == ADDR_EXPR
 	  && TREE_CODE (TREE_OPERAND (arg0, 0)) == ARRAY_REF
@@ -9749,29 +9736,6 @@  fold_binary_loc (location_t loc,
     case MULT_EXPR:
       if (! FLOAT_TYPE_P (type))
 	{
-	  /* Transform x * -C into -x * C if x is easily negatable.  */
-	  if (TREE_CODE (arg1) == INTEGER_CST
-	      && tree_int_cst_sgn (arg1) == -1
-	      && negate_expr_p (arg0)
-	      && (tem = negate_expr (arg1)) != arg1
-	      && !TREE_OVERFLOW (tem))
-	    return fold_build2_loc (loc, MULT_EXPR, type,
-	    			fold_convert_loc (loc, type,
-						  negate_expr (arg0)),
-				tem);
-
-	  /* (A + A) * C -> A * 2 * C  */
-	  if (TREE_CODE (arg0) == PLUS_EXPR
-	      && TREE_CODE (arg1) == INTEGER_CST
-	      && operand_equal_p (TREE_OPERAND (arg0, 0),
-			          TREE_OPERAND (arg0, 1), 0))
-	    return fold_build2_loc (loc, MULT_EXPR, type,
-				omit_one_operand_loc (loc, type,
-						  TREE_OPERAND (arg0, 0),
-						  TREE_OPERAND (arg0, 1)),
-				fold_build2_loc (loc, MULT_EXPR, type,
-					     build_int_cst (type, 2) , arg1));
-
 	  /* ((T) (X /[ex] C)) * C cancels out if the conversion is
 	     sign-changing only.  */
 	  if (TREE_CODE (arg1) == INTEGER_CST
@@ -9961,45 +9925,6 @@  fold_binary_loc (location_t loc,
 				  build_zero_cst (TREE_TYPE (tem)));
 	}
 
-      /* Fold (X ^ Y) & Y as ~X & Y.  */
-      if (TREE_CODE (arg0) == BIT_XOR_EXPR
-	  && operand_equal_p (TREE_OPERAND (arg0, 1), arg1, 0))
-	{
-	  tem = fold_convert_loc (loc, type, TREE_OPERAND (arg0, 0));
-	  return fold_build2_loc (loc, BIT_AND_EXPR, type,
-			      fold_build1_loc (loc, BIT_NOT_EXPR, type, tem),
-			      fold_convert_loc (loc, type, arg1));
-	}
-      /* Fold (X ^ Y) & X as ~Y & X.  */
-      if (TREE_CODE (arg0) == BIT_XOR_EXPR
-	  && operand_equal_p (TREE_OPERAND (arg0, 0), arg1, 0)
-	  && reorder_operands_p (TREE_OPERAND (arg0, 1), arg1))
-	{
-	  tem = fold_convert_loc (loc, type, TREE_OPERAND (arg0, 1));
-	  return fold_build2_loc (loc, BIT_AND_EXPR, type,
-			      fold_build1_loc (loc, BIT_NOT_EXPR, type, tem),
-			      fold_convert_loc (loc, type, arg1));
-	}
-      /* Fold X & (X ^ Y) as X & ~Y.  */
-      if (TREE_CODE (arg1) == BIT_XOR_EXPR
-	  && operand_equal_p (arg0, TREE_OPERAND (arg1, 0), 0))
-	{
-	  tem = fold_convert_loc (loc, type, TREE_OPERAND (arg1, 1));
-	  return fold_build2_loc (loc, BIT_AND_EXPR, type,
-			      fold_convert_loc (loc, type, arg0),
-			      fold_build1_loc (loc, BIT_NOT_EXPR, type, tem));
-	}
-      /* Fold X & (Y ^ X) as ~Y & X.  */
-      if (TREE_CODE (arg1) == BIT_XOR_EXPR
-	  && operand_equal_p (arg0, TREE_OPERAND (arg1, 1), 0)
-	  && reorder_operands_p (arg0, TREE_OPERAND (arg1, 0)))
-	{
-	  tem = fold_convert_loc (loc, type, TREE_OPERAND (arg1, 0));
-	  return fold_build2_loc (loc, BIT_AND_EXPR, type,
-			      fold_build1_loc (loc, BIT_NOT_EXPR, type, tem),
-			      fold_convert_loc (loc, type, arg0));
-	}
-
       /* Fold (X * Y) & -(1 << CST) to X * Y if Y is a constant
          multiple of 1 << CST.  */
       if (TREE_CODE (arg1) == INTEGER_CST)
diff --git a/gcc/match.pd b/gcc/match.pd
index 1d6dde1..6793f59 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -490,6 +490,12 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (wi::bit_not (@2) == @1)
   (bit_xor @0 @1)))
 
+/* Fold X & (X ^ Y) as X & ~Y.  */
+(simplify
+ (bit_and:c (convert? @0) (convert? (bit_xor:c @0 @1)))
+  (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
+   (convert (bit_and @0 (bit_not @1)))))
+
 /* X % Y is smaller than Y.  */
 (for cmp (lt ge)
  (simplify
@@ -1600,12 +1606,31 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
    (if (!TREE_OVERFLOW (tem) || !flag_trapping_math)
     (minus @0 { tem; })))))
 
+/* Convert X * -C into -X * C.  */
+(simplify
+ (mult:c (convert? negate_expr_p@0) INTEGER_CST@1)
+  (if (tree_int_cst_sgn (@1) == -1)
+   (with { tree tem = const_unop (NEGATE_EXPR, type, @1); }
+    (if (!TREE_OVERFLOW (tem) && wi::ne_p (tem, @1)
+	  && tree_nop_conversion_p (type, TREE_TYPE (@0)))
+     (mult (convert (negate @0)) @1)))))
+
 /* Convert x+x into x*2.0.  */
 (simplify
  (plus @0 @0)
  (if (SCALAR_FLOAT_TYPE_P (type))
   (mult @0 { build_real (type, dconst2); })))
 
+/* Convert (A + A) * C -> A * 2 * C.  */
+(simplify
+ (mult:c (convert? (plus @0 @0)) (convert? @1))
+  (if (tree_nop_conversion_p (TREE_TYPE (@0), type))
+   (convert (mult @0 (mult { build_int_cst (TREE_TYPE (@1), 2); } @1)))))
+(simplify
+ (mult:c (convert? (plus @0 @0)) INTEGER_CST@1)
+  (if (tree_nop_conversion_p (TREE_TYPE (@0), type))
+   (convert (mult @0 (mult { build_int_cst (TREE_TYPE (@1), 2); } @1)))))
+
 (simplify
  (minus integer_zerop @1)
  (negate @1))