
combine: Fix for PR82024

Message ID 3a127cd6a785b1e067bf8458e8010ee01980c228.1504185832.git.segher@kernel.crashing.org
State New

Commit Message

Segher Boessenkool Sept. 1, 2017, 1:32 a.m. UTC
With the testcase in the PR, with all the command line options mentioned
there, a (comparison) instruction becomes dead in fwprop1 but is not
deleted until all the way in rtl_dce.

Before combine this insn looks like:

20: flags:CC=cmp(r106:DI,0xffffffffffffffff)
      REG_DEAD r106:DI
      REG_UNUSED flags:CC
      REG_EQUAL cmp(0,0xffffffffffffffff)

(note the only output is unused).

Combining some earlier insns gives

13: r106:DI=0
14: r105:DI=r101:DI+r103:DI

Combining 14+13+20 then gives

(parallel [
        (set (reg:CC 17 flags)
            (compare:CC (const_int 0 [0])
                (const_int -1 [0xffffffffffffffff])))
        (set (reg:DI 105)
            (plus:DI (reg/v:DI 101 [ e ])
                (reg:DI 103)))
    ])

which doesn't match; but the set of flags is dead, so combine makes the
set of r105 the whole new instruction, which it then places at i3.  But
that is wrong, because r105 is used after i2 but before i3!  We forget
to check for that in this case.
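
To illustrate the missing check, here is a minimal sketch on a toy model.
It is not GCC's RTL and not the patch itself: struct toy_insn, toy_stream,
toy_reg_used_between and toy_can_move_set_to_i3 are all made-up names, and
insn 17 is a hypothetical stand-in for whatever reads r105 between i2 and
i3.  It only shows the idea behind calling reg_used_between_p before
placing the SET at i3.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (assumption, not GCC RTL): each insn sets at most one
   register and reads at most two.  Register number 0 means "none".  */
struct toy_insn {
    int uid;      /* insn number as it would appear in a dump */
    int sets;     /* register written, or 0 */
    int uses[2];  /* registers read, or 0 */
};

/* Return true if REG is read by any insn strictly between positions
   FROM and TO (both exclusive).  This loosely mirrors the role that
   reg_used_between_p plays in the patch.  */
static bool toy_reg_used_between(const struct toy_insn *insns,
                                 int from, int to, int reg)
{
    assert(from <= to);
    for (int i = from + 1; i < to; i++)
        for (int j = 0; j < 2; j++)
            if (insns[i].uses[j] == reg)
                return true;
    return false;
}

/* A SET that originally lived at I2 may be emitted at I3 instead only
   if nothing in between reads its destination.  */
static bool toy_can_move_set_to_i3(const struct toy_insn *insns,
                                   int i2, int i3)
{
    return !toy_reg_used_between(insns, i2, i3, insns[i2].sets);
}

/* Hypothetical stream: insn 14 (i2) sets r105, the made-up insn 17
   reads r105, and insn 20 (i3) is the dead compare.  */
static const struct toy_insn toy_stream[] = {
    { 14, 105, { 101, 103 } },
    { 17, 107, { 105, 0 } },
    { 20, 0,   { 106, 0 } },
};
```

With this stream, toy_can_move_set_to_i3 (toy_stream, 0, 2) returns false:
moving the set of r105 down to i3 would leave insn 17 reading r105 before
it is set.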

This patch fixes it.  Bootstrapped and tested on powerpc64-linux {-m32,-m64}
and on x86_64-linux.  I also checked that it doesn't regress the generated
code, by comparing Linux kernel builds for 30 targets.

I'll commit this tomorrow.


Segher


2017-08-31  Segher Boessenkool  <segher@kernel.crashing.org>

	PR rtl-optimization/82024
	* combine.c (try_combine): If the combination result is a PARALLEL,
	and we only need to retain the SET in there that would be placed
	at I2, check that we can place that at I3 instead, before doing so.

---
 gcc/combine.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

Patch

diff --git a/gcc/combine.c b/gcc/combine.c
index c139cf6..bf3eda8 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -3505,7 +3505,10 @@  try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
      i3, and one from i2.  Combining then splitting the parallel results
      in the original i2 again plus an invalid insn (which we delete).
      The net effect is only to move instructions around, which makes
-     debug info less accurate.  */
+     debug info less accurate.
+
+     If the remaining SET came from I2 its destination should not be used
+     between I2 and I3.  See PR82024.  */
 
   if (!(added_sets_2 && i1 == 0)
       && is_parallel_of_n_reg_sets (newpat, 2)
@@ -3534,11 +3537,17 @@  try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
 	       && insn_nothrow_p (i3)
 	       && !side_effects_p (SET_SRC (set0)))
 	{
-	  newpat = set1;
-	  insn_code_number = recog_for_combine (&newpat, i3, &new_i3_notes);
+	  rtx dest = SET_DEST (set1);
+	  if (GET_CODE (dest) == SUBREG)
+	    dest = SUBREG_REG (dest);
+	  if (!reg_used_between_p (dest, i2, i3))
+	    {
+	      newpat = set1;
+	      insn_code_number = recog_for_combine (&newpat, i3, &new_i3_notes);
 
-	  if (insn_code_number >= 0)
-	    changed_i3_dest = 1;
+	      if (insn_code_number >= 0)
+		changed_i3_dest = 1;
+	    }
 	}
 
       if (insn_code_number < 0)