diff mbox

[2/4] Simplify (cond (cmp (convert (x), c1), x, c2)) into (minmax (x, c))

Message ID VI1PR0802MB2176B86348A82C79E3645463E7A80@VI1PR0802MB2176.eurprd08.prod.outlook.com
State New
Headers show

Commit Message

Bin Cheng Oct. 25, 2016, 11:21 a.m. UTC
Hi,
Second patch optimizing (cond (cmp (convert (x), c1), x, c2)) into (minmax (x, c)).  As commented in patch, this is done if:

+     1) Comparison's operands are promoted from smaller type.
+     2) Const c1 equals to c2 after canonicalizing comparison.
+     3) Comparison has tree code LT, LE, GT or GE.
+    This specific pattern is needed when (cmp (convert x) c) may not
+    be simplified by comparison patterns because of multiple uses of
+    x.  It also makes sense here because simplifying across multiple
+    referred var is always benefitial for complicated cases.

It also adds call to fold_stmt in tree-if-conv.c so that generated cond_expr statement has its chance to be simplified.

Bootstrap and test on x86_64 and AArch64.  It introduces below failure on both x86_64 and AArch64:
FAIL: gcc.dg/vect/slp-cond-3.c
I believe it reveals defect in vect-slp.  In call to fold_stmt in ifcvt, canonicalization transforms _145 = _95 <= _96 ? _149 : _147 into _145 = _95 > _96 ? _147 : _149.  As a result, this stmt has different code to the first one of SLP instance.  IMO, SLP should be improved to handle operands swapping, apparently, current support is not OK.

It also introduces more failures on AArch64(probably other targets) as below:
FAIL: gcc.dg/vect/pr65947-1.c -flto -ffat-lto-objects  scan-tree-dump-times vect "LOOP VECTORIZED" 2
FAIL: gcc.dg/vect/pr65947-1.c -flto -ffat-lto-objects  scan-tree-dump-times vect "condition expression based on integer induction." 4
FAIL: gcc.dg/vect/pr65947-1.c scan-tree-dump-times vect "LOOP VECTORIZED" 2
FAIL: gcc.dg/vect/pr65947-1.c scan-tree-dump-times vect "condition expression based on integer induction." 4
FAIL: gcc.dg/vect/pr65947-13.c -flto -ffat-lto-objects  scan-tree-dump-times vect "LOOP VECTORIZED" 2
FAIL: gcc.dg/vect/pr65947-13.c scan-tree-dump-times vect "LOOP VECTORIZED" 2
FAIL: gcc.dg/vect/pr65947-4.c -flto -ffat-lto-objects  scan-tree-dump-times vect "LOOP VECTORIZED" 2
FAIL: gcc.dg/vect/pr65947-4.c -flto -ffat-lto-objects  scan-tree-dump-times vect "condition expression based on integer induction." 4
FAIL: gcc.dg/vect/pr65947-4.c scan-tree-dump-times vect "LOOP VECTORIZED" 2
FAIL: gcc.dg/vect/pr65947-4.c scan-tree-dump-times vect "condition expression based on integer induction." 4
FAIL: gcc.dg/vect/pr77503.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
FAIL: gcc.dg/vect/pr77503.c scan-tree-dump vect "vectorized 1 loops"
FAIL: gcc.dg/vect/vect-pr69848.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
FAIL: gcc.dg/vect/vect-pr69848.c scan-tree-dump vect "vectorized 1 loops"
Again, these failures reveal a defect in vectorizer that operand swapping is not supported for COND_REDUCTION.

I will send another two patches independent to this patch set resolving these failures.  Is this OK?

Thanks,
bin

2016-10-21  Bin Cheng  <bin.cheng@arm.com>

	* tree-if-conv.c (ifcvt_follow_ssa_use_edges): New func.
	(predicate_scalar_phi): Call fold_stmt using the new valueize func.
	* match.pd ((cond (cmp (convert (x), c1), x, c2)) -> (minmax (x, c))):
	New pattern.

gcc/testsuite/ChangeLog
2016-10-21  Bin Cheng  <bin.cheng@arm.com>

	* gcc.dg/fold-condcmpconv-1.c: New test.
	* gcc.dg/fold-condcmpconv-2.c: New test.

Comments

Richard Biener Oct. 27, 2016, 1:58 p.m. UTC | #1
On Tue, Oct 25, 2016 at 1:21 PM, Bin Cheng <Bin.Cheng@arm.com> wrote:
> Hi,
> Second patch optimizing (cond (cmp (convert (x), c1), x, c2)) into (minmax (x, c)).  As commented in patch, this is done if:
>
> +     1) Comparison's operands are promoted from smaller type.
> +     2) Const c1 equals to c2 after canonicalizing comparison.
> +     3) Comparison has tree code LT, LE, GT or GE.
> +    This specific pattern is needed when (cmp (convert x) c) may not
> +    be simplified by comparison patterns because of multiple uses of
> +    x.  It also makes sense here because simplifying across multiple
> +    referred var is always benefitial for complicated cases.
>
> It also adds call to fold_stmt in tree-if-conv.c so that generated cond_expr statement has its chance to be simplified.

So this sounds like the fold-const.c code.  Why only for GIMPLE btw?
Thus, can you make sure you covered
all cases fold-const.c did and remove the corresponding code there?

Thanks,
Richard.

> Bootstrap and test on x86_64 and AArch64.  It introduces below failure on both x86_64 and AArch64:
> FAIL: gcc.dg/vect/slp-cond-3.c
> I believe it reveals defect in vect-slp.  In call to fold_stmt in ifcvt, canonicalization transforms _145 = _95 <= _96 ? _149 : _147 into _145 = _95 > _96 ? _147 : _149.  As a result, this stmt has different code to the first one of SLP instance.  IMO, SLP should be improved to handle operands swapping, apparently, current support is not OK.
>
> It also introduces more failures on AArch64(probably other targets) as below:
> FAIL: gcc.dg/vect/pr65947-1.c -flto -ffat-lto-objects  scan-tree-dump-times vect "LOOP VECTORIZED" 2
> FAIL: gcc.dg/vect/pr65947-1.c -flto -ffat-lto-objects  scan-tree-dump-times vect "condition expression based on integer induction." 4
> FAIL: gcc.dg/vect/pr65947-1.c scan-tree-dump-times vect "LOOP VECTORIZED" 2
> FAIL: gcc.dg/vect/pr65947-1.c scan-tree-dump-times vect "condition expression based on integer induction." 4
> FAIL: gcc.dg/vect/pr65947-13.c -flto -ffat-lto-objects  scan-tree-dump-times vect "LOOP VECTORIZED" 2
> FAIL: gcc.dg/vect/pr65947-13.c scan-tree-dump-times vect "LOOP VECTORIZED" 2
> FAIL: gcc.dg/vect/pr65947-4.c -flto -ffat-lto-objects  scan-tree-dump-times vect "LOOP VECTORIZED" 2
> FAIL: gcc.dg/vect/pr65947-4.c -flto -ffat-lto-objects  scan-tree-dump-times vect "condition expression based on integer induction." 4
> FAIL: gcc.dg/vect/pr65947-4.c scan-tree-dump-times vect "LOOP VECTORIZED" 2
> FAIL: gcc.dg/vect/pr65947-4.c scan-tree-dump-times vect "condition expression based on integer induction." 4
> FAIL: gcc.dg/vect/pr77503.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
> FAIL: gcc.dg/vect/pr77503.c scan-tree-dump vect "vectorized 1 loops"
> FAIL: gcc.dg/vect/vect-pr69848.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
> FAIL: gcc.dg/vect/vect-pr69848.c scan-tree-dump vect "vectorized 1 loops"
> Again, these failures reveal a defect in vectorizer that operand swapping is not supported for COND_REDUCTION.
>
> I will send another two patches independent to this patch set resolving these failures.  Is this OK?
>
> Thanks,
> bin
>
> 2016-10-21  Bin Cheng  <bin.cheng@arm.com>
>
>         * tree-if-conv.c (ifcvt_follow_ssa_use_edges): New func.
>         (predicate_scalar_phi): Call fold_stmt using the new valueize func.
>         * match.pd ((cond (cmp (convert (x), c1), x, c2)) -> (minmax (x, c))):
>         New pattern.
>
> gcc/testsuite/ChangeLog
> 2016-10-21  Bin Cheng  <bin.cheng@arm.com>
>
>         * gcc.dg/fold-condcmpconv-1.c: New test.
>         * gcc.dg/fold-condcmpconv-2.c: New test.
Bin.Cheng Oct. 27, 2016, 2:24 p.m. UTC | #2
On Thu, Oct 27, 2016 at 2:58 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Tue, Oct 25, 2016 at 1:21 PM, Bin Cheng <Bin.Cheng@arm.com> wrote:
>> Hi,
>> Second patch optimizing (cond (cmp (convert (x), c1), x, c2)) into (minmax (x, c)).  As commented in patch, this is done if:
>>
>> +     1) Comparison's operands are promoted from smaller type.
>> +     2) Const c1 equals to c2 after canonicalizing comparison.
>> +     3) Comparison has tree code LT, LE, GT or GE.
>> +    This specific pattern is needed when (cmp (convert x) c) may not
>> +    be simplified by comparison patterns because of multiple uses of
>> +    x.  It also makes sense here because simplifying across multiple
>> +    referred var is always benefitial for complicated cases.
>>
>> It also adds call to fold_stmt in tree-if-conv.c so that generated cond_expr statement has its chance to be simplified.
>
> So this sounds like the fold-const.c code.  Why only for GIMPLE btw?
> Thus, can you make sure you covered
> all cases fold-const.c did and remove the corresponding code there?
Hmm, no, this patch only adds simplification which is not done by GCC
at the moment.  I didn't look into existing simplification from
fold-const.c when writing this one, however, I will look into it see
if it can be moved to match.pd, either along with this pattern or as a
separated ones.

Thanks,
bin
>
> Thanks,
> Richard.
>
>> Bootstrap and test on x86_64 and AArch64.  It introduces below failure on both x86_64 and AArch64:
>> FAIL: gcc.dg/vect/slp-cond-3.c
>> I believe it reveals defect in vect-slp.  In call to fold_stmt in ifcvt, canonicalization transforms _145 = _95 <= _96 ? _149 : _147 into _145 = _95 > _96 ? _147 : _149.  As a result, this stmt has different code to the first one of SLP instance.  IMO, SLP should be improved to handle operands swapping, apparently, current support is not OK.
>>
>> It also introduces more failures on AArch64(probably other targets) as below:
>> FAIL: gcc.dg/vect/pr65947-1.c -flto -ffat-lto-objects  scan-tree-dump-times vect "LOOP VECTORIZED" 2
>> FAIL: gcc.dg/vect/pr65947-1.c -flto -ffat-lto-objects  scan-tree-dump-times vect "condition expression based on integer induction." 4
>> FAIL: gcc.dg/vect/pr65947-1.c scan-tree-dump-times vect "LOOP VECTORIZED" 2
>> FAIL: gcc.dg/vect/pr65947-1.c scan-tree-dump-times vect "condition expression based on integer induction." 4
>> FAIL: gcc.dg/vect/pr65947-13.c -flto -ffat-lto-objects  scan-tree-dump-times vect "LOOP VECTORIZED" 2
>> FAIL: gcc.dg/vect/pr65947-13.c scan-tree-dump-times vect "LOOP VECTORIZED" 2
>> FAIL: gcc.dg/vect/pr65947-4.c -flto -ffat-lto-objects  scan-tree-dump-times vect "LOOP VECTORIZED" 2
>> FAIL: gcc.dg/vect/pr65947-4.c -flto -ffat-lto-objects  scan-tree-dump-times vect "condition expression based on integer induction." 4
>> FAIL: gcc.dg/vect/pr65947-4.c scan-tree-dump-times vect "LOOP VECTORIZED" 2
>> FAIL: gcc.dg/vect/pr65947-4.c scan-tree-dump-times vect "condition expression based on integer induction." 4
>> FAIL: gcc.dg/vect/pr77503.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
>> FAIL: gcc.dg/vect/pr77503.c scan-tree-dump vect "vectorized 1 loops"
>> FAIL: gcc.dg/vect/vect-pr69848.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
>> FAIL: gcc.dg/vect/vect-pr69848.c scan-tree-dump vect "vectorized 1 loops"
>> Again, these failures reveal a defect in vectorizer that operand swapping is not supported for COND_REDUCTION.
>>
>> I will send another two patches independent to this patch set resolving these failures.  Is this OK?
>>
>> Thanks,
>> bin
>>
>> 2016-10-21  Bin Cheng  <bin.cheng@arm.com>
>>
>>         * tree-if-conv.c (ifcvt_follow_ssa_use_edges): New func.
>>         (predicate_scalar_phi): Call fold_stmt using the new valueize func.
>>         * match.pd ((cond (cmp (convert (x), c1), x, c2)) -> (minmax (x, c))):
>>         New pattern.
>>
>> gcc/testsuite/ChangeLog
>> 2016-10-21  Bin Cheng  <bin.cheng@arm.com>
>>
>>         * gcc.dg/fold-condcmpconv-1.c: New test.
>>         * gcc.dg/fold-condcmpconv-2.c: New test.
diff mbox

Patch

diff --git a/gcc/match.pd b/gcc/match.pd
index 7365bc1..7523b2f 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1930,6 +1930,59 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (if (integer_zerop (@0))
    @2)))
 
+#if GIMPLE
+/* (cond (cmp (convert (x) c1) x c2)) -> (minmax (x c)) if:
+     1) Comparison's operands are promoted from smaller type.
+     2) Const c1 equals to c2 after canonicalizing comparison.
+     3) Comparison has tree code LT, LE, GT or GE.
+    This specific pattern is needed when (cmp (convert x) c) may not
+    be simplified by comparison patterns because of multiple uses of
+    x.  It also makes sense here because simplifying across multiple
+    referred var is always benefitial for complicated cases.  */
+(for cmp (lt le gt ge)
+ (simplify
+  (cond (cmp@0 (convert@3 @1) INTEGER_CST@4) @1 INTEGER_CST@2)
+   (with
+    {
+      tree op_type = TREE_TYPE (@1), cmp_type = TREE_TYPE (@3);
+      enum tree_code code = TREE_CODE (@0), cmp_code = TREE_CODE (@0);
+
+      if (TYPE_SIGN (cmp_type) == TYPE_SIGN (op_type)
+	  && TYPE_PRECISION (cmp_type) > TYPE_PRECISION (op_type))
+	{
+	  if (wi::to_widest (@4) == (wi::to_widest (@2) - 1))
+	    {
+	      /* X <= Y - 1 equals to X < Y.  */
+	      if (cmp_code == LE_EXPR)
+		code = LT_EXPR;
+	      /* X > Y - 1 equals to X >= Y.  */
+	      if (cmp_code == GT_EXPR)
+		code = GE_EXPR;
+	    }
+	  if (wi::to_widest (@4) == (wi::to_widest (@2) + 1))
+	    {
+	      /* X < Y + 1 equals to X <= Y.  */
+	      if (cmp_code == LT_EXPR)
+		code = LE_EXPR;
+	      /* X >= Y + 1 equals to X > Y.  */
+	      if (cmp_code == GE_EXPR)
+		code = GT_EXPR;
+	    }
+	  if (code != cmp_code || wi::to_widest (@2) == wi::to_widest (@4))
+	    {
+	      if (cmp_code == LT_EXPR || cmp_code == LE_EXPR)
+		code = MIN_EXPR;
+	      if (cmp_code == GT_EXPR || cmp_code == GE_EXPR)
+		code = MAX_EXPR;
+	    }
+	}
+    }
+    (if (code == MAX_EXPR)
+     (max @1 @2)
+     (if (code == MIN_EXPR)
+      (min @1 @2))))))
+#endif
+
 (for cnd (cond vec_cond)
  /* A ? B : (A ? X : C) -> A ? B : C.  */
  (simplify
diff --git a/gcc/testsuite/gcc.dg/fold-condcmpconv-1.c b/gcc/testsuite/gcc.dg/fold-condcmpconv-1.c
new file mode 100644
index 0000000..321294f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-condcmpconv-1.c
@@ -0,0 +1,14 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-ifcvt" } */
+
+int foo (unsigned short a[], unsigned int x)
+{
+  unsigned int i;
+  for (i = 0; i < 1000; i++)
+    {
+      x = a[i];
+      a[i] = (unsigned short)(x >= 255 ? 255 : x);
+    }  return x;
+}
+
+/* { dg-final { scan-tree-dump " = MIN_EXPR <" "ifcvt" } } */
diff --git a/gcc/testsuite/gcc.dg/fold-condcmpconv-2.c b/gcc/testsuite/gcc.dg/fold-condcmpconv-2.c
new file mode 100644
index 0000000..5d3ef4a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-condcmpconv-2.c
@@ -0,0 +1,15 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-ifcvt" } */
+
+int foo (short a[], int x)
+{
+  unsigned int i;
+  for (i = 0; i < 1000; i++)
+    {
+      x = a[i];
+      a[i] = (short)(x <= 0 ? 0 : x);
+    }  return x;
+}
+
+
+/* { dg-final { scan-tree-dump " = MAX_EXPR <" "ifcvt" } } */
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 0a20189..24cca2e 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -1749,6 +1749,14 @@  gen_phi_arg_condition (gphi *phi, vec<int> *occur,
   return cond;
 }
 
+/* Local valueization callback that follos all-use SSA edges.  */
+
+static tree
+ifcvt_follow_ssa_use_edges (tree val)
+{
+  return val;
+}
+
 /* Replace a scalar PHI node with a COND_EXPR using COND as condition.
    This routine can handle PHI nodes with more than two arguments.
 
@@ -1844,6 +1852,8 @@  predicate_scalar_phi (gphi *phi, gimple_stmt_iterator *gsi)
 				    arg0, arg1);
       new_stmt = gimple_build_assign (res, rhs);
       gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+      gimple_stmt_iterator new_gsi = gsi_for_stmt (new_stmt);
+      fold_stmt (&new_gsi, ifcvt_follow_ssa_use_edges);
       update_stmt (new_stmt);
 
       if (dump_file && (dump_flags & TDF_DETAILS))