Patchwork Fix extendsidi2_1 splitting (PR rtl-optimization/57281, PR rtl-optimization/57300 wrong-code, alternative)

login
register
mail settings
Submitter Jakub Jelinek
Date May 17, 2013, 8:25 a.m.
Message ID <20130517082501.GQ1377@tucnak.redhat.com>
Download mbox | patch
Permalink /patch/244527/
State New
Headers show

Comments

Jakub Jelinek - May 17, 2013, 8:25 a.m.
On Thu, May 16, 2013 at 06:22:10PM +0200, Jakub Jelinek wrote:
> As discussed in the PR, there seem to be only 3 define_split
> patterns that use dead_or_set_p, one in i386.md and two in s390.md,
> but unfortunately insn splitting is done in many passes
> (combine, split{1,2,3,4,5}, dbr, pro_and_epilogue, final, sometimes mach)
> and only in combine the note problem is computed.  Computing the note
> problem in split{1,2,3,4,5} just because of the single pattern on i?86 -m32
> and one on s390x -m64 might be too expensive, and while neither of these
> targets do dbr scheduling, e.g. during final without cfg one can't
> df_analyze.
> 
> So, the following patch fixes it by doing the transformation instead
> in the peephole2 pass which computes the notes problem and has REG_DEAD
> notes up2date (and peep2_reg_dead_p is used there heavily and works).

Alternative, so far untested, patch is let the register is not dead splitter
do its job always during split2 and just fix it up during peephole2, if the
register was dead.

For the non-cltd case the peephole2 is always desirable, we get rid of
a register move and free one hard register for potential other uses after
peephole2 (cprop_hardreg? anything else that could benefit from that?).

For the cltd case, it is questionable, while we gain a free hard register
at that spot, it isn't guaranteed any pass will benefit from that, and
cltd is 2 byts smaller than sarl $31, %eax.  Though, not sure about the
performance.  So, the patch below doesn't use the second peephole2 for -Os.

2013-05-17  Jakub Jelinek  <jakub@redhat.com>

	PR rtl-optimization/57281
	PR rtl-optimization/57300
	* config/i386/i386.md (extendsidi2_1 dead reg splitter): Remove.
	(extendsidi2_1 peephole2s): Add instead 2 new peephole2s, that undo
	what the other splitter did if the registers are dead.

	* gcc.dg/pr57300.c: New test.
	* gcc.c-torture/execute/pr57281.c: New test.


	Jakub
Jakub Jelinek - May 17, 2013, 12:15 p.m.
On Fri, May 17, 2013 at 10:25:01AM +0200, Jakub Jelinek wrote:
> Alternative, so far untested, patch is let the register is not dead splitter
> do its job always during split2 and just fix it up during peephole2, if the
> register was dead.

Now fully bootstrapped/regtested on x86_64-linux and i686-linux.

> 2013-05-17  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR rtl-optimization/57281
> 	PR rtl-optimization/57300
> 	* config/i386/i386.md (extendsidi2_1 dead reg splitter): Remove.
> 	(extendsidi2_1 peephole2s): Add instead 2 new peephole2s, that undo
> 	what the other splitter did if the registers are dead.
> 
> 	* gcc.dg/pr57300.c: New test.
> 	* gcc.c-torture/execute/pr57281.c: New test.

	Jakub
Richard Henderson - May 17, 2013, 3:07 p.m.
On 05/17/2013 01:25 AM, Jakub Jelinek wrote:
> +(define_peephole2
> +  [(set (match_operand:SI 0 "memory_operand")
> +	(match_operand:SI 1 "register_operand"))
> +   (set (match_operand:SI 2 "register_operand") (match_dup 1))
> +   (parallel [(set (match_dup 2)
> +		   (ashiftrt:SI (match_dup 2)
> +				(match_operand:QI 3 "const_int_operand")))
> +	       (clobber (reg:CC FLAGS_REG))])
> +   (set (match_operand:SI 4 "memory_operand") (match_dup 2))]
> +  "INTVAL (operands[3]) == 31

No sense in using match_operand in the pattern and INTVAL == 31
in the condition when you can just use (const_int 31) in the pattern.

Modulo those two cases, the patch is ok.


r~

Patch

--- gcc/config/i386/i386.md.jj	2013-05-16 18:22:59.000000000 +0200
+++ gcc/config/i386/i386.md	2013-05-17 10:11:20.365455394 +0200
@@ -3332,22 +3332,8 @@  (define_insn "extendsidi2_1"
   "!TARGET_64BIT"
   "#")
 
-;; Extend to memory case when source register does die.
-(define_split
-  [(set (match_operand:DI 0 "memory_operand")
-	(sign_extend:DI (match_operand:SI 1 "register_operand")))
-   (clobber (reg:CC FLAGS_REG))
-   (clobber (match_operand:SI 2 "register_operand"))]
-  "(reload_completed
-    && dead_or_set_p (insn, operands[1])
-    && !reg_mentioned_p (operands[1], operands[0]))"
-  [(set (match_dup 3) (match_dup 1))
-   (parallel [(set (match_dup 1) (ashiftrt:SI (match_dup 1) (const_int 31)))
-	      (clobber (reg:CC FLAGS_REG))])
-   (set (match_dup 4) (match_dup 1))]
-  "split_double_mode (DImode, &operands[0], 1, &operands[3], &operands[4]);")
-
-;; Extend to memory case when source register does not die.
+;; Split the memory case.  If the source register doesn't die, it will stay
+;; this way, if it does die, following peephole2s take care of it.
 (define_split
   [(set (match_operand:DI 0 "memory_operand")
 	(sign_extend:DI (match_operand:SI 1 "register_operand")))
@@ -3376,6 +3362,48 @@  (define_split
   DONE;
 })
 
+;; Peepholes for the case where the source register does die, after
+;; being split with the above splitter.
+(define_peephole2
+  [(set (match_operand:SI 0 "memory_operand")
+	(match_operand:SI 1 "register_operand"))
+   (set (match_operand:SI 2 "register_operand") (match_dup 1))
+   (parallel [(set (match_dup 2)
+		   (ashiftrt:SI (match_dup 2)
+				(match_operand:QI 3 "const_int_operand")))
+	       (clobber (reg:CC FLAGS_REG))])
+   (set (match_operand:SI 4 "memory_operand") (match_dup 2))]
+  "INTVAL (operands[3]) == 31
+   && REGNO (operands[1]) != REGNO (operands[2])
+   && peep2_reg_dead_p (2, operands[1])
+   && peep2_reg_dead_p (4, operands[2])
+   && !reg_mentioned_p (operands[2], operands[4])"
+  [(set (match_dup 0) (match_dup 1))
+   (parallel [(set (match_dup 1) (ashiftrt:SI (match_dup 1) (const_int 31)))
+	      (clobber (reg:CC FLAGS_REG))])
+   (set (match_dup 4) (match_dup 1))])
+
+(define_peephole2
+  [(set (match_operand:SI 0 "memory_operand")
+	(match_operand:SI 1 "register_operand"))
+   (parallel [(set (match_operand:SI 2 "register_operand")
+		   (ashiftrt:SI (match_dup 1)
+				(match_operand:QI 3 "const_int_operand")))
+	       (clobber (reg:CC FLAGS_REG))])
+   (set (match_operand:SI 4 "memory_operand") (match_dup 2))]
+  "INTVAL (operands[3]) == 31
+   /* cltd is shorter than sarl $31, %eax */
+   && !optimize_function_for_size_p (cfun)
+   && true_regnum (operands[1]) == AX_REG
+   && true_regnum (operands[2]) == DX_REG
+   && peep2_reg_dead_p (2, operands[1])
+   && peep2_reg_dead_p (3, operands[2])
+   && !reg_mentioned_p (operands[2], operands[4])"
+  [(set (match_dup 0) (match_dup 1))
+   (parallel [(set (match_dup 1) (ashiftrt:SI (match_dup 1) (const_int 31)))
+	      (clobber (reg:CC FLAGS_REG))])
+   (set (match_dup 4) (match_dup 1))])
+
 ;; Extend to register case.  Optimize case where source and destination
 ;; registers match and cases where we can use cltd.
 (define_split
--- gcc/testsuite/gcc.dg/pr57300.c.jj	2013-05-16 15:51:25.084707211 +0200
+++ gcc/testsuite/gcc.dg/pr57300.c	2013-05-16 15:51:25.084707211 +0200
@@ -0,0 +1,21 @@ 
+/* PR rtl-optimization/57300 */
+/* { dg-do run } */
+/* { dg-options "-O3" } */
+/* { dg-additional-options "-msse2" { target sse2_runtime } } */
+
+extern void abort (void);
+int a, b, d[10];
+long long c;
+
+int
+main ()
+{
+  int e;
+  for (e = 0; e < 10; e++)
+    d[e] = 1;
+  if (d[0])
+    c = a = (b == 0 || 1 % b);
+  if (a != 1)
+    abort ();
+  return 0;
+}
--- gcc/testsuite/gcc.c-torture/execute/pr57281.c.jj	2013-05-16 15:51:25.085707131 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr57281.c	2013-05-16 15:51:25.085707131 +0200
@@ -0,0 +1,25 @@ 
+/* PR rtl-optimization/57281 */
+
+int a = 1, b, d, *e = &d;
+long long c, *g = &c;
+volatile long long f;
+
+int
+foo (int h)
+{
+  int j = *g = b;
+  return h == 0 ? j : 0;
+}
+
+int
+main ()
+{
+  int h = a;
+  for (; b != -20; b--)
+    {
+      (int) f;
+      *e = 0;
+      *e = foo (h);
+    }
+  return 0;
+}