Message ID | VI1PR0802MB2176B86348A82C79E3645463E7A80@VI1PR0802MB2176.eurprd08.prod.outlook.com |
---|---|
State | New |
Headers | show |
On Tue, Oct 25, 2016 at 1:21 PM, Bin Cheng <Bin.Cheng@arm.com> wrote: > Hi, > Second patch optimizing (cond (cmp (convert (x), c1), x, c2)) into (minmax (x, c)). As commented in patch, this is done if: > > + 1) Comparison's operands are promoted from smaller type. > + 2) Const c1 equals to c2 after canonicalizing comparison. > + 3) Comparison has tree code LT, LE, GT or GE. > + This specific pattern is needed when (cmp (convert x) c) may not > + be simplified by comparison patterns because of multiple uses of > + x. It also makes sense here because simplifying across multiple > + referred var is always benefitial for complicated cases. > > It also adds call to fold_stmt in tree-if-conv.c so that generated cond_expr statement has its chance to be simplified. So this sounds like the fold-const.c code. Why only for GIMPLE btw? Thus, can you make sure you covered all cases fold-const.c did and remove the corresponding code there? Thanks, Richard. > Bootstrap and test on x86_64 and AArch64. It introduces below failure on both x86_64 and AArch64: > FAIL: gcc.dg/vect/slp-cond-3.c > I believe it reveals defect in vect-slp. In call to fold_stmt in ifcvt, canonicalization transforms _145 = _95 <= _96 ? _149 : _147 into _145 = _95 > _96 ? _147 : _149. As a result, this stmt has different code to the first one of SLP instance. IMO, SLP should be improved to handle operands swapping, apparently, current support is not OK. > > It also introduces more failures on AArch64(probably other targets) as below: > FAIL: gcc.dg/vect/pr65947-1.c -flto -ffat-lto-objects scan-tree-dump-times vect "LOOP VECTORIZED" 2 > FAIL: gcc.dg/vect/pr65947-1.c -flto -ffat-lto-objects scan-tree-dump-times vect "condition expression based on integer induction." 4 > FAIL: gcc.dg/vect/pr65947-1.c scan-tree-dump-times vect "LOOP VECTORIZED" 2 > FAIL: gcc.dg/vect/pr65947-1.c scan-tree-dump-times vect "condition expression based on integer induction." 4 > FAIL: gcc.dg/vect/pr65947-13.c -flto -ffat-lto-objects scan-tree-dump-times vect "LOOP VECTORIZED" 2 > FAIL: gcc.dg/vect/pr65947-13.c scan-tree-dump-times vect "LOOP VECTORIZED" 2 > FAIL: gcc.dg/vect/pr65947-4.c -flto -ffat-lto-objects scan-tree-dump-times vect "LOOP VECTORIZED" 2 > FAIL: gcc.dg/vect/pr65947-4.c -flto -ffat-lto-objects scan-tree-dump-times vect "condition expression based on integer induction." 4 > FAIL: gcc.dg/vect/pr65947-4.c scan-tree-dump-times vect "LOOP VECTORIZED" 2 > FAIL: gcc.dg/vect/pr65947-4.c scan-tree-dump-times vect "condition expression based on integer induction." 4 > FAIL: gcc.dg/vect/pr77503.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops" > FAIL: gcc.dg/vect/pr77503.c scan-tree-dump vect "vectorized 1 loops" > FAIL: gcc.dg/vect/vect-pr69848.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops" > FAIL: gcc.dg/vect/vect-pr69848.c scan-tree-dump vect "vectorized 1 loops" > Again, these failures reveal a defect in vectorizer that operand swapping is not supported for COND_REDUCTION. > > I will send another two patches independent to this patch set resolving these failures. Is this OK? > > Thanks, > bin > > 2016-10-21 Bin Cheng <bin.cheng@arm.com> > > * tree-if-conv.c (ifcvt_follow_ssa_use_edges): New func. > (predicate_scalar_phi): Call fold_stmt using the new valueize func. > * match.pd ((cond (cmp (convert (x), c1), x, c2)) -> (minmax (x, c))): > New pattern. > > gcc/testsuite/ChangeLog > 2016-10-21 Bin Cheng <bin.cheng@arm.com> > > * gcc.dg/fold-condcmpconv-1.c: New test. > * gcc.dg/fold-condcmpconv-2.c: New test.
On Thu, Oct 27, 2016 at 2:58 PM, Richard Biener <richard.guenther@gmail.com> wrote: > On Tue, Oct 25, 2016 at 1:21 PM, Bin Cheng <Bin.Cheng@arm.com> wrote: >> Hi, >> Second patch optimizing (cond (cmp (convert (x), c1), x, c2)) into (minmax (x, c)). As commented in patch, this is done if: >> >> + 1) Comparison's operands are promoted from smaller type. >> + 2) Const c1 equals to c2 after canonicalizing comparison. >> + 3) Comparison has tree code LT, LE, GT or GE. >> + This specific pattern is needed when (cmp (convert x) c) may not >> + be simplified by comparison patterns because of multiple uses of >> + x. It also makes sense here because simplifying across multiple >> + referred var is always benefitial for complicated cases. >> >> It also adds call to fold_stmt in tree-if-conv.c so that generated cond_expr statement has its chance to be simplified. > > So this sounds like the fold-const.c code. Why only for GIMPLE btw? > Thus, can you make sure you covered > all cases fold-const.c did and remove the corresponding code there? Hmm, no, this patch only adds simplification which is not done by GCC at the moment. I didn't look into existing simplification from fold-const.c when writing this one, however, I will look into it see if it can be moved to match.pd, either along with this pattern or as a separated ones. Thanks, bin > > Thanks, > Richard. > >> Bootstrap and test on x86_64 and AArch64. It introduces below failure on both x86_64 and AArch64: >> FAIL: gcc.dg/vect/slp-cond-3.c >> I believe it reveals defect in vect-slp. In call to fold_stmt in ifcvt, canonicalization transforms _145 = _95 <= _96 ? _149 : _147 into _145 = _95 > _96 ? _147 : _149. As a result, this stmt has different code to the first one of SLP instance. IMO, SLP should be improved to handle operands swapping, apparently, current support is not OK. >> >> It also introduces more failures on AArch64(probably other targets) as below: >> FAIL: gcc.dg/vect/pr65947-1.c -flto -ffat-lto-objects scan-tree-dump-times vect "LOOP VECTORIZED" 2 >> FAIL: gcc.dg/vect/pr65947-1.c -flto -ffat-lto-objects scan-tree-dump-times vect "condition expression based on integer induction." 4 >> FAIL: gcc.dg/vect/pr65947-1.c scan-tree-dump-times vect "LOOP VECTORIZED" 2 >> FAIL: gcc.dg/vect/pr65947-1.c scan-tree-dump-times vect "condition expression based on integer induction." 4 >> FAIL: gcc.dg/vect/pr65947-13.c -flto -ffat-lto-objects scan-tree-dump-times vect "LOOP VECTORIZED" 2 >> FAIL: gcc.dg/vect/pr65947-13.c scan-tree-dump-times vect "LOOP VECTORIZED" 2 >> FAIL: gcc.dg/vect/pr65947-4.c -flto -ffat-lto-objects scan-tree-dump-times vect "LOOP VECTORIZED" 2 >> FAIL: gcc.dg/vect/pr65947-4.c -flto -ffat-lto-objects scan-tree-dump-times vect "condition expression based on integer induction." 4 >> FAIL: gcc.dg/vect/pr65947-4.c scan-tree-dump-times vect "LOOP VECTORIZED" 2 >> FAIL: gcc.dg/vect/pr65947-4.c scan-tree-dump-times vect "condition expression based on integer induction." 4 >> FAIL: gcc.dg/vect/pr77503.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops" >> FAIL: gcc.dg/vect/pr77503.c scan-tree-dump vect "vectorized 1 loops" >> FAIL: gcc.dg/vect/vect-pr69848.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops" >> FAIL: gcc.dg/vect/vect-pr69848.c scan-tree-dump vect "vectorized 1 loops" >> Again, these failures reveal a defect in vectorizer that operand swapping is not supported for COND_REDUCTION. >> >> I will send another two patches independent to this patch set resolving these failures. Is this OK? >> >> Thanks, >> bin >> >> 2016-10-21 Bin Cheng <bin.cheng@arm.com> >> >> * tree-if-conv.c (ifcvt_follow_ssa_use_edges): New func. >> (predicate_scalar_phi): Call fold_stmt using the new valueize func. >> * match.pd ((cond (cmp (convert (x), c1), x, c2)) -> (minmax (x, c))): >> New pattern. >> >> gcc/testsuite/ChangeLog >> 2016-10-21 Bin Cheng <bin.cheng@arm.com> >> >> * gcc.dg/fold-condcmpconv-1.c: New test. >> * gcc.dg/fold-condcmpconv-2.c: New test.
diff --git a/gcc/match.pd b/gcc/match.pd index 7365bc1..7523b2f 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -1930,6 +1930,59 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (if (integer_zerop (@0)) @2))) +#if GIMPLE +/* (cond (cmp (convert (x) c1) x c2)) -> (minmax (x c)) if: + 1) Comparison's operands are promoted from smaller type. + 2) Const c1 equals to c2 after canonicalizing comparison. + 3) Comparison has tree code LT, LE, GT or GE. + This specific pattern is needed when (cmp (convert x) c) may not + be simplified by comparison patterns because of multiple uses of + x. It also makes sense here because simplifying across multiple + referred var is always benefitial for complicated cases. */ +(for cmp (lt le gt ge) + (simplify + (cond (cmp@0 (convert@3 @1) INTEGER_CST@4) @1 INTEGER_CST@2) + (with + { + tree op_type = TREE_TYPE (@1), cmp_type = TREE_TYPE (@3); + enum tree_code code = TREE_CODE (@0), cmp_code = TREE_CODE (@0); + + if (TYPE_SIGN (cmp_type) == TYPE_SIGN (op_type) + && TYPE_PRECISION (cmp_type) > TYPE_PRECISION (op_type)) + { + if (wi::to_widest (@4) == (wi::to_widest (@2) - 1)) + { + /* X <= Y - 1 equals to X < Y. */ + if (cmp_code == LE_EXPR) + code = LT_EXPR; + /* X > Y - 1 equals to X >= Y. */ + if (cmp_code == GT_EXPR) + code = GE_EXPR; + } + if (wi::to_widest (@4) == (wi::to_widest (@2) + 1)) + { + /* X < Y + 1 equals to X <= Y. */ + if (cmp_code == LT_EXPR) + code = LE_EXPR; + /* X >= Y + 1 equals to X > Y. */ + if (cmp_code == GE_EXPR) + code = GT_EXPR; + } + if (code != cmp_code || wi::to_widest (@2) == wi::to_widest (@4)) + { + if (cmp_code == LT_EXPR || cmp_code == LE_EXPR) + code = MIN_EXPR; + if (cmp_code == GT_EXPR || cmp_code == GE_EXPR) + code = MAX_EXPR; + } + } + } + (if (code == MAX_EXPR) + (max @1 @2) + (if (code == MIN_EXPR) + (min @1 @2)))))) +#endif + (for cnd (cond vec_cond) /* A ? B : (A ? X : C) -> A ? B : C. */ (simplify diff --git a/gcc/testsuite/gcc.dg/fold-condcmpconv-1.c b/gcc/testsuite/gcc.dg/fold-condcmpconv-1.c new file mode 100644 index 0000000..321294f --- /dev/null +++ b/gcc/testsuite/gcc.dg/fold-condcmpconv-1.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fdump-tree-ifcvt" } */ + +int foo (unsigned short a[], unsigned int x) +{ + unsigned int i; + for (i = 0; i < 1000; i++) + { + x = a[i]; + a[i] = (unsigned short)(x >= 255 ? 255 : x); + } return x; +} + +/* { dg-final { scan-tree-dump " = MIN_EXPR <" "ifcvt" } } */ diff --git a/gcc/testsuite/gcc.dg/fold-condcmpconv-2.c b/gcc/testsuite/gcc.dg/fold-condcmpconv-2.c new file mode 100644 index 0000000..5d3ef4a --- /dev/null +++ b/gcc/testsuite/gcc.dg/fold-condcmpconv-2.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fdump-tree-ifcvt" } */ + +int foo (short a[], int x) +{ + unsigned int i; + for (i = 0; i < 1000; i++) + { + x = a[i]; + a[i] = (short)(x <= 0 ? 0 : x); + } return x; +} + + +/* { dg-final { scan-tree-dump " = MAX_EXPR <" "ifcvt" } } */ diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c index 0a20189..24cca2e 100644 --- a/gcc/tree-if-conv.c +++ b/gcc/tree-if-conv.c @@ -1749,6 +1749,14 @@ gen_phi_arg_condition (gphi *phi, vec<int> *occur, return cond; } +/* Local valueization callback that follos all-use SSA edges. */ + +static tree +ifcvt_follow_ssa_use_edges (tree val) +{ + return val; +} + /* Replace a scalar PHI node with a COND_EXPR using COND as condition. This routine can handle PHI nodes with more than two arguments. @@ -1844,6 +1852,8 @@ predicate_scalar_phi (gphi *phi, gimple_stmt_iterator *gsi) arg0, arg1); new_stmt = gimple_build_assign (res, rhs); gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT); + gimple_stmt_iterator new_gsi = gsi_for_stmt (new_stmt); + fold_stmt (&new_gsi, ifcvt_follow_ssa_use_edges); update_stmt (new_stmt); if (dump_file && (dump_flags & TDF_DETAILS))