diff mbox

[RTL-ifcvt] PR rtl-optimization/68841: Make sure one basic block doesn't clobber CC reg usage of the other

Message ID 5677C50F.2010302@foss.arm.com
State New
Headers show

Commit Message

Kyrill Tkachov Dec. 21, 2015, 9:23 a.m. UTC
On 18/12/15 13:16, Bernd Schmidt wrote:
> On 12/18/2015 02:07 PM, Kyrill Tkachov wrote:
>> In this PR we have a THEN and an ELSE block where one uses the condition
>> reg from the preceeding comparison
>> but the other block has an arithmetic operation that clobbers the CC reg.
>> ifcvt emits the latter first and dead code elimination later sees this
>> and eliminates the first comparison
>> because it sees that the CC reg is clobbered between the comparison and
>> its usage.
>
>>      (noce_try_cmove_arith): Check CC reg usage in both blocks
>>      and emit them in such an order so as not to clobber the CC reg
>>      before its use, if possible.
>
> Why is this done here? It looks to me like bbs_ok_for_cmove_arith is the function that already tries to sort out issues like this. Does it maybe just need to be extended to see clobbers?
>

Here is a version integrating the new checks into bbs_ok_for_cmove_arith.
We might as well do it there since it already iterates over all the instructions in the basic blocks.

Bootstrapped and tested on arm, aarch64, x86_64 and checked that there are no codegen effects on SPEC2006.
How does this look?

Thanks,
Kyrill

2015-12-21  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR rtl-optimization/68841
     * ifcvt.c (cond_exec_get_condition): Rename to...
     (get_condition_from_jump): ... This.
     (cond_exec_process_if_block): Update callsites.
     (dead_or_predicable): Likewise.
     (bbs_ok_for_cmove_arith): Update to record use and clobbers
     of the CC register.
     (noce_try_cmove_arith): Check CC reg usage in both blocks
     and emit them in such an order so as not to clobber the CC reg
     before its use, if possible.

2015-12-21  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR rtl-optimization/68841
     * gcc.dg/pr68841.c: New test.

Comments

Kyrill Tkachov Jan. 5, 2016, 12:03 p.m. UTC | #1
Ping.
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01966.html

Thanks,
Kyrill

On 21/12/15 09:23, Kyrill Tkachov wrote:
>
> On 18/12/15 13:16, Bernd Schmidt wrote:
>> On 12/18/2015 02:07 PM, Kyrill Tkachov wrote:
>>> In this PR we have a THEN and an ELSE block where one uses the condition
>>> reg from the preceeding comparison
>>> but the other block has an arithmetic operation that clobbers the CC reg.
>>> ifcvt emits the latter first and dead code elimination later sees this
>>> and eliminates the first comparison
>>> because it sees that the CC reg is clobbered between the comparison and
>>> its usage.
>>
>>>      (noce_try_cmove_arith): Check CC reg usage in both blocks
>>>      and emit them in such an order so as not to clobber the CC reg
>>>      before its use, if possible.
>>
>> Why is this done here? It looks to me like bbs_ok_for_cmove_arith is the function that already tries to sort out issues like this. Does it maybe just need to be extended to see clobbers?
>>
>
> Here is a version integrating the new checks into bbs_ok_for_cmove_arith.
> We might as well do it there since it already iterates over all the instructions in the basic blocks.
>
> Bootstrapped and tested on arm, aarch64, x86_64 and checked that there are no codegen effects on SPEC2006.
> How does this look?
>
> Thanks,
> Kyrill
>
> 2015-12-21  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>     PR rtl-optimization/68841
>     * ifcvt.c (cond_exec_get_condition): Rename to...
>     (get_condition_from_jump): ... This.
>     (cond_exec_process_if_block): Update callsites.
>     (dead_or_predicable): Likewise.
>     (bbs_ok_for_cmove_arith): Update to record use and clobbers
>     of the CC register.
>     (noce_try_cmove_arith): Check CC reg usage in both blocks
>     and emit them in such an order so as not to clobber the CC reg
>     before its use, if possible.
>
> 2015-12-21  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>     PR rtl-optimization/68841
>     * gcc.dg/pr68841.c: New test.
Bernd Schmidt Jan. 5, 2016, 1:42 p.m. UTC | #2
On 12/21/2015 10:23 AM, Kyrill Tkachov wrote:
>
> Here is a version integrating the new checks into bbs_ok_for_cmove_arith.
> We might as well do it there since it already iterates over all the
> instructions in the basic blocks.

The patch looks somewhat complicated and asymmetrical to me. I tried to 
debug the issue, and found that bbs_ok_for_cmove_arith wasn't called at 
all. After changing that, like this:

@@ -2082,7 +2082,7 @@ noce_try_cmove_arith (struct noce_if_inf
  	}
      }

-  if (then_bb && else_bb && !a_simple && !b_simple
+  if (then_bb && else_bb
        && (!bbs_ok_for_cmove_arith (then_bb, else_bb)
  	  || !bbs_ok_for_cmove_arith (else_bb, then_bb)))
      return FALSE;

the testcase passes. Could you investigate whether this is sufficient?

>      * ifcvt.c (cond_exec_get_condition): Rename to...
>      (get_condition_from_jump): ... This.

Please don't.

> +/* { dg-options "-Og -fif-conversion -flive-range-shrinkage -fpeel-loops -frerun-cse-after-loop" } */

That's a very specific set of options there, I wonder how Zdenek even 
found that. Maybe we also want a copy in torture/ to test it more broadly?


Bernd
Kyrill Tkachov Jan. 5, 2016, 2:22 p.m. UTC | #3
On 05/01/16 13:42, Bernd Schmidt wrote:
> On 12/21/2015 10:23 AM, Kyrill Tkachov wrote:
>>
>> Here is a version integrating the new checks into bbs_ok_for_cmove_arith.
>> We might as well do it there since it already iterates over all the
>> instructions in the basic blocks.
>
> The patch looks somewhat complicated and asymmetrical to me. I tried to debug the issue, and found that bbs_ok_for_cmove_arith wasn't called at all. After changing that, like this:
>
> @@ -2082,7 +2082,7 @@ noce_try_cmove_arith (struct noce_if_inf
>      }
>      }
>
> -  if (then_bb && else_bb && !a_simple && !b_simple
> +  if (then_bb && else_bb
>        && (!bbs_ok_for_cmove_arith (then_bb, else_bb)
>        || !bbs_ok_for_cmove_arith (else_bb, then_bb)))
>      return FALSE;
>
> the testcase passes. Could you investigate whether this is sufficient?
>

This works around the issue but we don't want to do perform the check for pairs of
simple basic blocks because then we'll end up rejecting code that does things like:
x = cond ? x + 1 : x - 1
i.e. source of the set in both blocks reads and writes the same register.
We can deal with this safely later on in the function since we rename the destinations
of the two sets, so we don't want to reject this case here.
So your proposal rejects this case on that basis, which is overly restrictive and doesn't
deal with the underlying issue of the condition code being clobbered.
I had tried this approach initially but it caused code quality regressions on SPEC.

>>      * ifcvt.c (cond_exec_get_condition): Rename to...
>>      (get_condition_from_jump): ... This.
>
> Please don't.
>

Ok, I'll leave it as cond_exec_get_condition

>> +/* { dg-options "-Og -fif-conversion -flive-range-shrinkage -fpeel-loops -frerun-cse-after-loop" } */
>
> That's a very specific set of options there, I wonder how Zdenek even found that. Maybe we also want a copy in torture/ to test it more broadly?
>

Judging from some other reports from Zdenek that I've seen I think he tries a matrix of options, getting some
of the exotic combinations.
So should we have a copy in gcc.dg/ with the explicit options and a clean copy in torture?

Thanks,
Kyrill

>
> Bernd
diff mbox

Patch

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 2e1fe90e28c4a03b8c1287a45e7f12868aa7684e..c71767da3c76b5e8d4ec58cee934e5d064cfbee4 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -84,7 +84,7 @@  static rtx_insn *find_active_insn_after (basic_block, rtx_insn *);
 static basic_block block_fallthru (basic_block);
 static int cond_exec_process_insns (ce_if_block *, rtx_insn *, rtx, rtx, int,
 				    int);
-static rtx cond_exec_get_condition (rtx_insn *);
+static rtx get_condition_from_jump (rtx_insn *);
 static rtx noce_get_condition (rtx_insn *, rtx_insn **, bool);
 static int noce_operand_ok (const_rtx);
 static void merge_if_block (ce_if_block *);
@@ -421,7 +421,7 @@  cond_exec_process_insns (ce_if_block *ce_info ATTRIBUTE_UNUSED,
 /* Return the condition for a jump.  Do not do any special processing.  */
 
 static rtx
-cond_exec_get_condition (rtx_insn *jump)
+get_condition_from_jump (rtx_insn *jump)
 {
   rtx test_if, cond;
 
@@ -493,7 +493,7 @@  cond_exec_process_if_block (ce_if_block * ce_info,
 
   /* Find the conditional jump to the ELSE or JOIN part, and isolate
      the test.  */
-  test_expr = cond_exec_get_condition (BB_END (test_bb));
+  test_expr = get_condition_from_jump (BB_END (test_bb));
   if (! test_expr)
     return FALSE;
 
@@ -652,7 +652,7 @@  cond_exec_process_if_block (ce_if_block * ce_info,
 	    goto fail;
 
 	  /* Find the conditional jump and isolate the test.  */
-	  t = cond_exec_get_condition (BB_END (bb));
+	  t = get_condition_from_jump (BB_END (bb));
 	  if (! t)
 	    goto fail;
 
@@ -1897,10 +1897,13 @@  insn_valid_noce_process_p (rtx_insn *insn, rtx cc)
 
 
 /* Return true iff the registers that the insns in BB_A set do not
-   get used in BB_B.  */
+   get used in BB_B.  Set A_CLOBBERS_CC_FOR_B to true if basic block BB_A
+   clobbers condition the code register CC and bb_b uses CC.  Set it
+   to false otherwise.  */
 
 static bool
-bbs_ok_for_cmove_arith (basic_block bb_a, basic_block bb_b)
+bbs_ok_for_cmove_arith (basic_block bb_a, basic_block bb_b,
+			 rtx cc, bool *a_clobbers_cc_for_b)
 {
   rtx_insn *a_insn;
   bitmap bba_sets = BITMAP_ALLOC (&reg_obstack);
@@ -1908,6 +1911,10 @@  bbs_ok_for_cmove_arith (basic_block bb_a, basic_block bb_b)
   df_ref def;
   df_ref use;
 
+  bool a_clobbers_cc = false;
+  bool b_uses_cc = false;
+  *a_clobbers_cc_for_b = false;
+
   FOR_BB_INSNS (bb_a, a_insn)
     {
       if (!active_insn_p (a_insn))
@@ -1921,6 +1928,9 @@  bbs_ok_for_cmove_arith (basic_block bb_a, basic_block bb_b)
 	  return false;
 	}
 
+      if (cc)
+	a_clobbers_cc |= set_of (cc, PATTERN (a_insn)) != NULL_RTX;
+
       /* Record all registers that BB_A sets.  */
       FOR_EACH_INSN_DEF (def, a_insn)
 	bitmap_set_bit (bba_sets, DF_REF_REGNO (def));
@@ -1949,11 +1959,15 @@  bbs_ok_for_cmove_arith (basic_block bb_a, basic_block bb_b)
 	  return false;
 	}
 
+      if (cc)
+	b_uses_cc |= reg_mentioned_p (cc, SET_SRC (sset_b));
+
       /* If the insn uses a reg set in BB_A return false.  */
       FOR_EACH_INSN_USE (use, b_insn)
 	{
 	  if (bitmap_bit_p (bba_sets, DF_REF_REGNO (use)))
 	    {
+	     *a_clobbers_cc_for_b = a_clobbers_cc && b_uses_cc;
 	      BITMAP_FREE (bba_sets);
 	      return false;
 	    }
@@ -1961,6 +1975,7 @@  bbs_ok_for_cmove_arith (basic_block bb_a, basic_block bb_b)
 
     }
 
+  *a_clobbers_cc_for_b = a_clobbers_cc && b_uses_cc;
   BITMAP_FREE (bba_sets);
   return true;
 }
@@ -2112,10 +2127,31 @@  noce_try_cmove_arith (struct noce_if_info *if_info)
 	}
     }
 
-  if (then_bb && else_bb && !a_simple && !b_simple
-      && (!bbs_ok_for_cmove_arith (then_bb, else_bb)
-	  || !bbs_ok_for_cmove_arith (else_bb, then_bb)))
-    return FALSE;
+  rtx cc = cc_in_cond (if_info->cond);
+  /* If there is no condition code register in cond
+     (noce_get_condition could have replaced it) look into the original
+     jump condition.  */
+  if (!cc)
+    {
+      rtx jmp_cond = get_condition_from_jump (if_info->jump);
+      if (jmp_cond)
+	cc = cc_in_cond (jmp_cond);
+    }
+
+  bool then_clobbers_cc_use_in_else = false;
+  bool else_clobbers_cc_use_in_then = false;
+
+  if (then_bb && else_bb)
+    {
+      bool bbs_ok = bbs_ok_for_cmove_arith (then_bb, else_bb, cc,
+				    &then_clobbers_cc_use_in_else);
+      bbs_ok &= bbs_ok_for_cmove_arith (else_bb, then_bb, cc,
+				      &else_clobbers_cc_use_in_then);
+      /* If one or more of the blocks is simple then the conflict is
+	  on X, which will be replaced with a temp, so we allow it.  */
+      if (!bbs_ok && !a_simple && !b_simple)
+	return FALSE;
+    }
 
   start_sequence ();
 
@@ -2233,8 +2269,10 @@  noce_try_cmove_arith (struct noce_if_info *if_info)
 
   /* If insn to set up A clobbers any registers B depends on, try to
      swap insn that sets up A with the one that sets up B.  If even
-     that doesn't help, punt.  */
-  if (modified_in_a && !modified_in_b)
+     that doesn't help, punt.  Make sure one basic block doesn't clobber
+     the condition code register before the other uses it.  */
+  if (modified_in_a && !modified_in_b
+      && !else_clobbers_cc_use_in_then)
     {
       if (!noce_emit_bb (emit_b, else_bb, b_simple))
 	goto end_seq_and_fail;
@@ -2242,7 +2280,8 @@  noce_try_cmove_arith (struct noce_if_info *if_info)
       if (!noce_emit_bb (emit_a, then_bb, a_simple))
 	goto end_seq_and_fail;
     }
-  else if (!modified_in_a)
+  else if (!modified_in_a
+	   && !then_clobbers_cc_use_in_else)
     {
       if (!noce_emit_bb (emit_a, then_bb, a_simple))
 	goto end_seq_and_fail;
@@ -5045,7 +5084,7 @@  dead_or_predicable (basic_block test_bb, basic_block merge_bb,
 
       rtx cond;
 
-      cond = cond_exec_get_condition (jump);
+      cond = get_condition_from_jump (jump);
       if (! cond)
 	return FALSE;
 
diff --git a/gcc/testsuite/gcc.dg/pr68841.c b/gcc/testsuite/gcc.dg/pr68841.c
new file mode 100644
index 0000000000000000000000000000000000000000..470048cc24f0d7150ed1e3141181bc1e8472ae12
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr68841.c
@@ -0,0 +1,34 @@ 
+/* { dg-do run } */
+/* { dg-options "-Og -fif-conversion -flive-range-shrinkage -fpeel-loops -frerun-cse-after-loop" } */
+
+static inline int
+foo (int *x, int y)
+{
+  int z = *x;
+  while (y > z)
+    z *= 2;
+  return z;
+}
+
+int
+main ()
+{
+  int i;
+  for (i = 1; i < 17; i++)
+    {
+      int j;
+      int k;
+      j = foo (&i, 7);
+      if (i >= 7)
+	k = i;
+      else if (i >= 4)
+	k = 8 + (i - 4) * 2;
+      else if (i == 3)
+	k = 12;
+      else
+	k = 8;
+      if (j != k)
+	__builtin_abort ();
+    }
+  return 0;
+}