Patchwork Fix extendsidi2_1 splitting (PR rtl-optimization/57281, PR rtl-optimization/57300 wrong-code)

login
register
mail settings
Submitter Jakub Jelinek
Date May 16, 2013, 4:22 p.m.
Message ID <20130516162210.GI1377@tucnak.redhat.com>
Download mbox | patch
Permalink /patch/244392/
State New
Headers show

Comments

Jakub Jelinek - May 16, 2013, 4:22 p.m.
Hi!

As discussed in the PR, there seem to be only 3 define_split
patterns that use dead_or_set_p, one in i386.md and two in s390.md,
but unfortunately insn splitting is done in many passes
(combine, split{1,2,3,4,5}, dbr, pro_and_epilogue, final, sometimes mach)
and only in combine the note problem is computed.  Computing the note
problem in split{1,2,3,4,5} just because of the single pattern on i?86 -m32
and one on s390x -m64 might be too expensive, and while neither of these
targets do dbr scheduling, e.g. during final without cfg one can't
df_analyze.

So, the following patch fixes it by doing the transformation instead
in the peephole2 pass which computes the notes problem and has REG_DEAD
notes up2date (and peep2_reg_dead_p is used there heavily and works).

The splitters in i386.md for extendsidi2_1 were reload_completed, and I
think the usual case is that the pattern emerges already from register
allocation, the ordering of the relevant passes for i386 then is:
split2, ..., pro_and_epilogue, ..., peephole2, ..., split{4,3}, ... split5 not run, ... final
When the first splitter (for the case where the register is dead) is turned
into peephole2, and the latter remains define_split (so that it is split
even if matched later on), we need to prevent the second splitter
from splitting during split2, because then peephole2 would likely never
match.  That is done through the epilogue_completed check, while it isn't
exactly peephole2_completed, it is close enough (there are no other
splitting passes in between pro_and_epilogue and peephole2).

For -O0, which doesn't run either split{3,4,5} passes, we need to use
reload_completed, so that it is split at split2, peephole2 won't be run
anyway, and if -fno-peephole2, there is no point not splitting immediately
during split2 either.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.8?

If applied, I think s390x should do something similar.

2013-05-16  Jakub Jelinek  <jakub@redhat.com>

	PR rtl-optimization/57281
	PR rtl-optimization/57300
	* config/i386/i386.md (extendsidi2_1 splits): Turn the first
	one into define_peephole2 from define_split, use peep2_reg_dead_p
	instead of dead_or_set_p.  Guard the second splitter with
	epilogue_completed if possible.

	* gcc.dg/pr57300.c: New test.
	* gcc.c-torture/execute/pr57281.c: New test.


	Jakub

Patch

--- gcc/config/i386/i386.md.jj	2013-05-16 12:36:29.669418198 +0200
+++ gcc/config/i386/i386.md	2013-05-16 16:03:08.833424642 +0200
@@ -3333,13 +3333,13 @@  (define_insn "extendsidi2_1"
   "#")
 
 ;; Extend to memory case when source register does die.
-(define_split
-  [(set (match_operand:DI 0 "memory_operand")
-	(sign_extend:DI (match_operand:SI 1 "register_operand")))
-   (clobber (reg:CC FLAGS_REG))
-   (clobber (match_operand:SI 2 "register_operand"))]
-  "(reload_completed
-    && dead_or_set_p (insn, operands[1])
+;; This is a peephole2, so that it can use peep2_reg_dead_p.
+(define_peephole2
+  [(parallel [(set (match_operand:DI 0 "memory_operand")
+		   (sign_extend:DI (match_operand:SI 1 "register_operand")))
+	      (clobber (reg:CC FLAGS_REG))
+	      (clobber (match_operand:SI 2 "register_operand"))])]
+  "(peep2_reg_dead_p (1, operands[1])
     && !reg_mentioned_p (operands[1], operands[0]))"
   [(set (match_dup 3) (match_dup 1))
    (parallel [(set (match_dup 1) (ashiftrt:SI (match_dup 1) (const_int 31)))
@@ -3348,12 +3348,15 @@  (define_split
   "split_double_mode (DImode, &operands[0], 1, &operands[3], &operands[4]);")
 
 ;; Extend to memory case when source register does not die.
+;; Guarded by epilogue_completed, instead of reload_completed, if possible,
+;; so that it isn't split before peephole2 has been run.  Don't do that
+;; if none of the split[345] passes will be run or peephole2 will not happen.
 (define_split
   [(set (match_operand:DI 0 "memory_operand")
 	(sign_extend:DI (match_operand:SI 1 "register_operand")))
    (clobber (reg:CC FLAGS_REG))
    (clobber (match_operand:SI 2 "register_operand"))]
-  "reload_completed"
+  "(optimize > 0 && flag_peephole2) ? epilogue_completed : reload_completed"
   [(const_int 0)]
 {
   split_double_mode (DImode, &operands[0], 1, &operands[3], &operands[4]);
--- gcc/testsuite/gcc.dg/pr57300.c.jj	2013-05-16 15:51:25.084707211 +0200
+++ gcc/testsuite/gcc.dg/pr57300.c	2013-05-16 15:51:25.084707211 +0200
@@ -0,0 +1,21 @@ 
+/* PR rtl-optimization/57300 */
+/* { dg-do run } */
+/* { dg-options "-O3" } */
+/* { dg-additional-options "-msse2" { target sse2_runtime } } */
+
+extern void abort (void);
+int a, b, d[10];
+long long c;
+
+int
+main ()
+{
+  int e;
+  for (e = 0; e < 10; e++)
+    d[e] = 1;
+  if (d[0])
+    c = a = (b == 0 || 1 % b);
+  if (a != 1)
+    abort ();
+  return 0;
+}
--- gcc/testsuite/gcc.c-torture/execute/pr57281.c.jj	2013-05-16 15:51:25.085707131 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr57281.c	2013-05-16 15:51:25.085707131 +0200
@@ -0,0 +1,25 @@ 
+/* PR rtl-optimization/57281 */
+
+int a = 1, b, d, *e = &d;
+long long c, *g = &c;
+volatile long long f;
+
+int
+foo (int h)
+{
+  int j = *g = b;
+  return h == 0 ? j : 0;
+}
+
+int
+main ()
+{
+  int h = a;
+  for (; b != -20; b--)
+    {
+      (int) f;
+      *e = 0;
+      *e = foo (h);
+    }
+  return 0;
+}