Patch
===================================================================
@@ -28313,6 +28314,45 @@ ix86_secondary_reload (bool in_p, rtx x,
return Q_REGS;
}
+ /* This condition handles a corner case where an expression involving
+ pointers gets vectorized. We're trying to use the address of a
+ stack slot as a vector initializer.
+
+ (set (reg:V2DI 74 [ vect_cst_.2 ])
+ (vec_duplicate:V2DI (reg/f:DI 20 frame)))
+
+ Eventually frame gets turned into sp+offset like this:
+
+ (set (reg:V2DI 21 xmm0 [orig:74 vect_cst_.2 ] [74])
+ (vec_duplicate:V2DI (plus:DI (reg/f:DI 7 sp)
+ (const_int 392 [0x188]))))
+
+ That later gets turned into:
+
+ (set (reg:V2DI 21 xmm0 [orig:74 vect_cst_.2 ] [74])
+ (vec_duplicate:V2DI (plus:DI (reg/f:DI 7 sp)
+ (mem/u/c/i:DI (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0 S8 A64]))))
+
+ We'll have the following reload recorded:
+
+ Reload 0: reload_in (DI) =
+ (plus:DI (reg/f:DI 7 sp)
+ (mem/u/c/i:DI (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0 S8 A64]))
+ reload_out (V2DI) = (reg:V2DI 21 xmm0 [orig:74 vect_cst_.2 ] [74])
+ SSE_REGS, RELOAD_OTHER (opnum = 0), can't combine
+ reload_in_reg: (plus:DI (reg/f:DI 7 sp) (const_int 392 [0x188]))
+ reload_out_reg: (reg:V2DI 21 xmm0 [orig:74 vect_cst_.2 ] [74])
+ reload_reg_rtx: (reg:V2DI 22 xmm1)
+
+ This isn't going to work, since SSE instructions can't handle scalar
+ additions. Returning GENERAL_REGS forces the addition into an integer
+ register, and reload can handle subsequent reloads without problems. */
+
+  if (in_p && GET_CODE (x) == PLUS
+      && SSE_CLASS_P (rclass)
+      && SCALAR_INT_MODE_P (mode))
+    return GENERAL_REGS;
+
   return NO_REGS;
 }
===================================================================
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -ftree-vectorize -msse" } */
+
+typedef struct {} S;
+
+void *foo()
+{
+  S a[64], *p[64];
+  int i;
+
+  for (i = 0; i < 64; i++)
+    p[i] = &a[i];
+  return p[0];
+}
Hello!

Attached patch fixes PR target/43653. The core of the problem is that the frame pointer register leaks into an SSE vector instruction. Eventually, the frame pointer gets eliminated to SP+offset, and reload chokes on the reload from SP+offset to an SSE register. The solution (as proposed by Jeff) is to help reload by specifying GENERAL_REGS when a PLUS RTX is to be reloaded.

2011-02-16  Uros Bizjak  <ubizjak@gmail.com>

	* config/i386/i386.c (ix86_secondary_reload): Handle SSE input
	reload with PLUS RTX.

testsuite/ChangeLog:

2011-02-16  Uros Bizjak  <ubizjak@gmail.com>

	* gcc.target/i386/pr43653.c: New test.

Patch was tested on x86_64-pc-linux-gnu {,-m32}. Patch will be committed to mainline and backported to release branches.

Uros.
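
For readers unfamiliar with the secondary reload hook, the decision the patch adds can be sketched as a small standalone C model. This is only an illustration, not GCC code: the enums, `struct reload_query`, and `model_secondary_reload` are hypothetical stand-ins, and the class test is simplified to an equality check, whereas the real `SSE_CLASS_P` and `SCALAR_INT_MODE_P` macros operate on GCC's own register classes and machine modes.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for GCC internals; the real types are far richer.  */
enum reg_class { NO_REGS, GENERAL_REGS, SSE_REGS };
enum rtx_code { REG, MEM, PLUS };
enum mode_kind { SCALAR_INT, VECTOR_INT };

struct reload_query
{
  bool in_p;              /* true for an input reload */
  enum rtx_code code;     /* outermost code of the value being reloaded */
  enum reg_class rclass;  /* class the value has to end up in */
  enum mode_kind mode;    /* mode of the value being reloaded */
};

/* Model of the new check only.  A scalar integer PLUS that has to be
   loaded into an SSE register asks for an intermediate general register,
   so the addition is done there first.  Every other case reports that no
   intermediate class is needed (the real hook has further cases).  */
enum reg_class
model_secondary_reload (const struct reload_query *q)
{
  if (q->in_p
      && q->code == PLUS
      && q->rclass == SSE_REGS      /* simplification of SSE_CLASS_P */
      && q->mode == SCALAR_INT)     /* simplification of SCALAR_INT_MODE_P */
    return GENERAL_REGS;

  return NO_REGS;
}
```

In this model, the SP+offset input reload from the PR corresponds to a query of `{ true, PLUS, SSE_REGS, SCALAR_INT }`, which yields `GENERAL_REGS`; an output reload, or an input reload whose value is a plain register, yields `NO_REGS`.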