Message ID: SN2PR0701MB102448CA48B5A9C045EDBBAE8E200@SN2PR0701MB1024.namprd07.prod.outlook.com
State: New
On Thu, Oct 29, 2015 at 5:34 AM, Hurugalawadi, Naveen
<Naveen.Hurugalawadi@caviumnetworks.com> wrote:
> Hi,
>
> Please find attached the patch that moves some multiply optimizations
> from fold-const using simplify and match.
>
> Please review the patch and let me know if any modifications are required.
>
> Tested the patch on X86.
>
> Observing the following failures:
>
>>> FAIL: gcc.dg/fold-plusmult.c scan-tree-dump-times original "<a> \\* 4" 2
>
> Should the testcase be changed to suit the current pattern?
>
>>> FAIL: gcc.dg/tree-ssa/vrp47.c scan-tree-dump-times vrp2 " & 1;" 0
>>> FAIL: gcc.dg/tree-ssa/vrp59.c scan-tree-dump-not vrp1 " & 3;"
>
> It's due to the following pattern.  The pattern seems to be right:
>   Fold X & (X ^ Y) as X & ~Y
> The test PASSes when we have the result as ~X & Y in some
> of the following combinations, which is wrong.
> (bit_and (convert? (bit_xor:c @0 @1) (convert? @0) ))

Please do not drop A - B -> A + (-B) from fold-const, as match.pd doesn't
implement all of fold-const.c's negate_expr_p support.

+/* Fold X & (X ^ Y) as X & ~Y.  */
+(simplify
+ (bit_and:c (convert? @0) (convert? (bit_xor:c @0 @1)))
+ (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
+  (convert (bit_and @0 (bit_not @1)))))

Ok, so the VRP regression is because we convert

  _8 = x_2(D) ^ 1;
  _4 = (_Bool) _8;
  _5 = (int) _4;

to

  _7 = ~x_2(D);
  _9 = _7 & 1;

via the new pattern and

   /* A truncation to an unsigned type (a zero-extension) should be
      canonicalized as bitwise and of a mask.  */
   (if (final_int && inter_int && inside_int
	&& final_prec == inside_prec
	&& final_prec > inter_prec
	&& inter_unsignedp)
    (convert (bit_and @0 { wide_int_to_tree
			     (inside_type,
			      wi::mask (inter_prec, false,
					TYPE_PRECISION (inside_type))); })))

Previously VRP ended up with

  <bb 3>:
  _8 = x_2(D) ^ 1;

and now we have

  _7 = ~x_2(D);
  _9 = _7 & 1;

which is more expensive.
This means that we miss a (bit_and (bit_not @0) INTEGER_CST@1) ->
(bit_xor @0 @1) pattern that applies when VRP knows the range of x_2(D)
(all masked bits are known to be zero).

+/* Convert X * -C into -X * C.  */
+(simplify
+ (mult:c (convert? negate_expr_p@0) INTEGER_CST@1)
+ (if (tree_int_cst_sgn (@1) == -1)
+  (with { tree tem = const_unop (NEGATE_EXPR, type, @1); }
+   (if (!TREE_OVERFLOW (tem) && wi::ne_p (tem, @1)
+	&& tree_nop_conversion_p (type, TREE_TYPE (@0)))
+    (mult (convert (negate @0)) @1)))))

As said above, match.pd's negate_expr_p doesn't capture everything
fold-const.c's does, so moving the above isn't a good idea.

+/* Convert (A + A) * C -> A * 2 * C.  */
+(simplify
+ (mult:c (convert? (plus @0 @0)) (convert? @1))
+ (if (tree_nop_conversion_p (TREE_TYPE (@0), type))
+  (convert (mult @0 (mult { build_int_cst (TREE_TYPE (@1), 2); } @1)))))
+(simplify
+ (mult:c (convert? (plus @0 @0)) INTEGER_CST@1)
+ (if (tree_nop_conversion_p (TREE_TYPE (@0), type))
+  (convert (mult @0 (mult { build_int_cst (TREE_TYPE (@1), 2); } @1)))))

fold-const.c only handles constant C, so we only need the 2nd pattern.
Also the :c on the mult in that one is not needed due to canonicalization
rules.  Please build the result of the inner multiplication directly.
I think the fold-const.c code has an overflow issue when the outer
multiplication is signed and the inner addition unsigned.
(signed)((unsigned)INT_MAX + (unsigned)INT_MAX) * 2 is valid but
INT_MAX * 4 is not, as it overflows.  So I think we should _not_ allow
nop conversions here (it's fine if all ops are signed or all unsigned).

Richard.

> Thanks,
> Naveen
>
> ChangeLog
>
> 2015-10-29  Naveen H.S  <Naveen.Hurugalawadi@caviumnetworks.com>
>
>	* fold-const.c (fold_binary_loc): Remove A - B -> A + (-B) if B
>	is easily negatable, as it's already present.
>	Move x * -C into -x * C if x is easily negatable to match.pd.
>	Move (A + A) * C -> A * 2 * C to match.pd.
>	Move Fold (X ^ Y) & Y as ~X & Y to match.pd.
>	Move Fold (X ^ Y) & X as ~Y & X to match.pd.
>	Move Fold X & (X ^ Y) as X & ~Y to match.pd.
>	Move Fold X & (Y ^ X) as ~Y & X to match.pd.
>
>	* match.pd (bit_and:c (convert? @0) (convert? (bit_xor:c @0 @1))):
>	New simplifier.
>	(mult:c (convert? negate_expr_p@0) INTEGER_CST@1): New simplifier.
>	(mult:c (convert? (plus @0 @0)) (convert? @1)): New simplifier.
>	(mult:c (convert? (plus @0 @0)) INTEGER_CST@1): New simplifier.
On Fri, 30 Oct 2015, Richard Biener wrote:

> +/* Convert (A + A) * C -> A * 2 * C.  */
> +(simplify
> + (mult:c (convert? (plus @0 @0)) (convert? @1))
> + (if (tree_nop_conversion_p (TREE_TYPE (@0), type))
> +  (convert (mult @0 (mult { build_int_cst (TREE_TYPE (@1), 2); } @1)))))
> +(simplify
> + (mult:c (convert? (plus @0 @0)) INTEGER_CST@1)
> + (if (tree_nop_conversion_p (TREE_TYPE (@0), type))
> +  (convert (mult @0 (mult { build_int_cst (TREE_TYPE (@1), 2); } @1)))))
>
> fold-const.c only handles constant C, so we only need the 2nd pattern.
> Also the :c on the mult in that is not needed due to canonicalization rules.
> Please build the result of the inner multiplication directly.
> I think the fold-const.c code has an overflow issue when the outer
> multiplication is signed and the inner addition unsigned.
> (signed)((unsigned)INT_MAX + (unsigned)INT_MAX) * 2
> is valid but INT_MAX * 4 is not as it overflows.  So I think we should
> _not_ allow nop conversions here (it's fine if all ops are signed or
> unsigned).

Is there a reason why the simple transformation A+A -> 2*A is not
generally a desirable canonicalization?  We currently restrict it to
SCALAR_FLOAT_TYPE_P.
On Fri, Oct 30, 2015 at 11:26 AM, Marc Glisse <marc.glisse@inria.fr> wrote:
> On Fri, 30 Oct 2015, Richard Biener wrote:
>
>> +/* Convert (A + A) * C -> A * 2 * C.  */
>> +(simplify
>> + (mult:c (convert? (plus @0 @0)) (convert? @1))
>> + (if (tree_nop_conversion_p (TREE_TYPE (@0), type))
>> +  (convert (mult @0 (mult { build_int_cst (TREE_TYPE (@1), 2); } @1)))))
>> +(simplify
>> + (mult:c (convert? (plus @0 @0)) INTEGER_CST@1)
>> + (if (tree_nop_conversion_p (TREE_TYPE (@0), type))
>> +  (convert (mult @0 (mult { build_int_cst (TREE_TYPE (@1), 2); } @1)))))
>>
>> fold-const.c only handles constant C, so we only need the 2nd pattern.
>> Also the :c on the mult in that is not needed due to canonicalization
>> rules.
>> Please build the result of the inner multiplication directly.
>> I think the fold-const.c code has an overflow issue when the outer
>> multiplication is signed and the inner addition unsigned.
>> (signed)((unsigned)INT_MAX + (unsigned)INT_MAX) * 2
>> is valid but INT_MAX * 4 is not as it overflows.  So I think we should
>> _not_ allow nop conversions here (it's fine if all ops are signed or
>> unsigned).
>
> Is there a reason why the simple transformation A+A -> 2*A is not
> generally a desirable canonicalization?  We currently restrict it to
> SCALAR_FLOAT_TYPE_P.

No special reason I know of.

>
> --
> Marc Glisse
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index e8ff1de..0334b53 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -9695,19 +9695,6 @@ fold_binary_loc (location_t loc,
	    }
	}

-      /* A - B -> A + (-B) if B is easily negatable.  */
-      if (negate_expr_p (arg1)
-	  && !TYPE_OVERFLOW_SANITIZED (type)
-	  && ((FLOAT_TYPE_P (type)
-	       /* Avoid this transformation if B is a positive REAL_CST.  */
-	       && (TREE_CODE (arg1) != REAL_CST
-		   || REAL_VALUE_NEGATIVE (TREE_REAL_CST (arg1))))
-	      || INTEGRAL_TYPE_P (type)))
-	return fold_build2_loc (loc, PLUS_EXPR, type,
-				fold_convert_loc (loc, type, arg0),
-				fold_convert_loc (loc, type,
-						  negate_expr (arg1)));
-
       /* Fold &a[i] - &a[j] to i-j.  */
       if (TREE_CODE (arg0) == ADDR_EXPR
	  && TREE_CODE (TREE_OPERAND (arg0, 0)) == ARRAY_REF
@@ -9749,29 +9736,6 @@ fold_binary_loc (location_t loc,
     case MULT_EXPR:
       if (! FLOAT_TYPE_P (type))
	{
-	  /* Transform x * -C into -x * C if x is easily negatable.  */
-	  if (TREE_CODE (arg1) == INTEGER_CST
-	      && tree_int_cst_sgn (arg1) == -1
-	      && negate_expr_p (arg0)
-	      && (tem = negate_expr (arg1)) != arg1
-	      && !TREE_OVERFLOW (tem))
-	    return fold_build2_loc (loc, MULT_EXPR, type,
-				    fold_convert_loc (loc, type,
-						      negate_expr (arg0)),
-				    tem);
-
-	  /* (A + A) * C -> A * 2 * C  */
-	  if (TREE_CODE (arg0) == PLUS_EXPR
-	      && TREE_CODE (arg1) == INTEGER_CST
-	      && operand_equal_p (TREE_OPERAND (arg0, 0),
-				  TREE_OPERAND (arg0, 1), 0))
-	    return fold_build2_loc (loc, MULT_EXPR, type,
-				    omit_one_operand_loc (loc, type,
-							  TREE_OPERAND (arg0, 0),
-							  TREE_OPERAND (arg0, 1)),
-				    fold_build2_loc (loc, MULT_EXPR, type,
-						     build_int_cst (type, 2),
-						     arg1));
-
	  /* ((T) (X /[ex] C)) * C cancels out if the conversion is
	     sign-changing only.  */
	  if (TREE_CODE (arg1) == INTEGER_CST
@@ -9961,45 +9925,6 @@ fold_binary_loc (location_t loc,
				    build_zero_cst (TREE_TYPE (tem)));
	}

-      /* Fold (X ^ Y) & Y as ~X & Y.  */
-      if (TREE_CODE (arg0) == BIT_XOR_EXPR
-	  && operand_equal_p (TREE_OPERAND (arg0, 1), arg1, 0))
-	{
-	  tem = fold_convert_loc (loc, type, TREE_OPERAND (arg0, 0));
-	  return fold_build2_loc (loc, BIT_AND_EXPR, type,
-				  fold_build1_loc (loc, BIT_NOT_EXPR, type, tem),
-				  fold_convert_loc (loc, type, arg1));
-	}
-      /* Fold (X ^ Y) & X as ~Y & X.  */
-      if (TREE_CODE (arg0) == BIT_XOR_EXPR
-	  && operand_equal_p (TREE_OPERAND (arg0, 0), arg1, 0)
-	  && reorder_operands_p (TREE_OPERAND (arg0, 1), arg1))
-	{
-	  tem = fold_convert_loc (loc, type, TREE_OPERAND (arg0, 1));
-	  return fold_build2_loc (loc, BIT_AND_EXPR, type,
-				  fold_build1_loc (loc, BIT_NOT_EXPR, type, tem),
-				  fold_convert_loc (loc, type, arg1));
-	}
-      /* Fold X & (X ^ Y) as X & ~Y.  */
-      if (TREE_CODE (arg1) == BIT_XOR_EXPR
-	  && operand_equal_p (arg0, TREE_OPERAND (arg1, 0), 0))
-	{
-	  tem = fold_convert_loc (loc, type, TREE_OPERAND (arg1, 1));
-	  return fold_build2_loc (loc, BIT_AND_EXPR, type,
-				  fold_convert_loc (loc, type, arg0),
-				  fold_build1_loc (loc, BIT_NOT_EXPR, type, tem));
-	}
-      /* Fold X & (Y ^ X) as ~Y & X.  */
-      if (TREE_CODE (arg1) == BIT_XOR_EXPR
-	  && operand_equal_p (arg0, TREE_OPERAND (arg1, 1), 0)
-	  && reorder_operands_p (arg0, TREE_OPERAND (arg1, 0)))
-	{
-	  tem = fold_convert_loc (loc, type, TREE_OPERAND (arg1, 0));
-	  return fold_build2_loc (loc, BIT_AND_EXPR, type,
-				  fold_build1_loc (loc, BIT_NOT_EXPR, type, tem),
-				  fold_convert_loc (loc, type, arg0));
-	}
-
       /* Fold (X * Y) & -(1 << CST) to X * Y if Y is a constant
	 multiple of 1 << CST.  */
       if (TREE_CODE (arg1) == INTEGER_CST
diff --git a/gcc/match.pd b/gcc/match.pd
index 1d6dde1..6793f59 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -490,6 +490,12 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (wi::bit_not (@2) == @1)
   (bit_xor @0 @1)))

+/* Fold X & (X ^ Y) as X & ~Y.  */
+(simplify
+ (bit_and:c (convert? @0) (convert? (bit_xor:c @0 @1)))
+ (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
+  (convert (bit_and @0 (bit_not @1)))))
+
 /* X % Y is smaller than Y.  */
 (for cmp (lt ge)
  (simplify
@@ -1600,12 +1606,31 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
    (if (!TREE_OVERFLOW (tem) || !flag_trapping_math)
     (minus @0 { tem; })))))

+/* Convert X * -C into -X * C.  */
+(simplify
+ (mult:c (convert? negate_expr_p@0) INTEGER_CST@1)
+ (if (tree_int_cst_sgn (@1) == -1)
+  (with { tree tem = const_unop (NEGATE_EXPR, type, @1); }
+   (if (!TREE_OVERFLOW (tem) && wi::ne_p (tem, @1)
+	&& tree_nop_conversion_p (type, TREE_TYPE (@0)))
+    (mult (convert (negate @0)) @1)))))
+
 /* Convert x+x into x*2.0.  */
 (simplify
  (plus @0 @0)
  (if (SCALAR_FLOAT_TYPE_P (type))
   (mult @0 { build_real (type, dconst2); })))

+/* Convert (A + A) * C -> A * 2 * C.  */
+(simplify
+ (mult:c (convert? (plus @0 @0)) (convert? @1))
+ (if (tree_nop_conversion_p (TREE_TYPE (@0), type))
+  (convert (mult @0 (mult { build_int_cst (TREE_TYPE (@1), 2); } @1)))))
+(simplify
+ (mult:c (convert? (plus @0 @0)) INTEGER_CST@1)
+ (if (tree_nop_conversion_p (TREE_TYPE (@0), type))
+  (convert (mult @0 (mult { build_int_cst (TREE_TYPE (@1), 2); } @1)))))
+
 (simplify
  (minus integer_zerop @1)
  (negate @1))