, PR target/71294, Fix compiler segfault for reload & power8

Submitted by Michael Meissner on March 15, 2017, 10:31 p.m.

Details

Message ID 20170315223112.GA28120@ibm-tiger.the-meissners.org
State New
Headers show

Commit Message

Michael Meissner March 15, 2017, 10:31 p.m.
This patch is a simpler approach to fixing PR 71294, then I proposed last year.

The core of the problem is when we are creating V2DI/V2DF vectors via a splat
operation, the predicate for the second argument allows memory addresses, GPR
registers as well as VSX registers, but when we compile for power8, the
constraints only allow memory addresses and VSX registers.

I fixed this by allowing a GPR -> VSX splat operation and after reload,
splitting it to the appropriate scalar direct move and then XXPERMDI to form
the vector.  I did look at changing the splat_input_operation predicate to not
allow GPR registers in this case, but reload generates worse code in this case.

I have built compilers on a little endian Power8 system (64-bit only), big
endian Power8 system (64-bit only) and big endian Power7 system (32-bit and
64-bit).  At the moment, the two power8 systems have no regressions, and the
power7 system is still building.  Assuming the power7 system completes without
regressions, can I check this patch into the trunk?

Assuming the patch applies and causes no regressions, can I check this patch
into GCC 6 and 5 as well.  Since GCC 6/5 do run with reload as the default,
they would likely see the issue.  GCC 7 has switched to LRA, which is more
robust, and doesn't show the problem (unless you use the -mno-lra switch).

[gcc]
2017-03-15  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/71294
	* config/rs6000/vsx.md (vsx_splat_<mode>, VSX_D iterator): Allow a
	SPLAT operation on ISA 2.07 64-bit systems that have direct move,
	but no MTVSRDD support, by doing MTVSRD and XXPERMDI.

[gcc/testsuite]
2017-03-15  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/71294
	* g++.dg/pr71294.C: New test.

Comments

Segher Boessenkool March 16, 2017, 7:31 p.m.
Hi Mike,

On Wed, Mar 15, 2017 at 06:31:12PM -0400, Michael Meissner wrote:
> This patch is a simpler approach to fixing PR 71294, then I proposed last year.

Much simpler!  Thank you.

> I have built compilers on a little endian Power8 system (64-bit only), big
> endian Power8 system (64-bit only) and big endian Power7 system (32-bit and
> 64-bit).  At the moment, the two power8 systems have no regressions, and the
> power7 system is still building.  Assuming the power7 system completes without
> regressions, can I check this patch into the trunk?

Yes please.

> Assuming the patch applies and causes no regressions, can I check this patch
> into GCC 6 and 5 as well.  Since GCC 6/5 do run with reload as the default,
> they would likely see the issue.  GCC 7 has switched to LRA, which is more
> robust, and doesn't show the problem (unless you use the -mno-lra switch).

Okay after a week or so of soaking.


Segher

Patch hide | download patch | download mbox

Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 246137)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -3067,16 +3067,29 @@  (define_expand "vsx_mergeh_<mode>"
 })
 
 ;; V2DF/V2DI splat
-(define_insn "vsx_splat_<mode>"
-  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=<VSa>,<VSa>,we")
+(define_insn_and_split "vsx_splat_<mode>"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand"
+					"=<VSa>,    <VSa>,we,<VS_64dm>")
 	(vec_duplicate:VSX_D
-	 (match_operand:<VS_scalar> 1 "splat_input_operand" "<VS_64reg>,Z,b")))]
+	 (match_operand:<VS_scalar> 1 "splat_input_operand"
+					"<VS_64reg>,Z,    b, wr")))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
   "@
    xxpermdi %x0,%x1,%x1,0
    lxvdsx %x0,%y1
-   mtvsrdd %x0,%1,%1"
-  [(set_attr "type" "vecperm,vecload,vecperm")])
+   mtvsrdd %x0,%1,%1
+   #"
+  "&& reload_completed && TARGET_POWERPC64 && !TARGET_P9_VECTOR
+   && int_reg_operand (operands[1], <VS_scalar>mode)"
+  [(set (match_dup 2)
+	(match_dup 1))
+   (set (match_dup 0)
+	(vec_duplicate:VSX_D (match_dup 2)))]
+{
+  operands[2] = gen_rtx_REG (<VS_scalar>mode, reg_or_subregno (operands[0]));
+}
+  [(set_attr "type" "vecperm,vecload,vecperm,vecperm")
+   (set_attr "length" "4,4,4,8")])
 
 ;; V4SI splat support
 (define_insn "vsx_splat_v4si"
Index: gcc/testsuite/g++.dg/pr71294.C
===================================================================
--- gcc/testsuite/g++.dg/pr71294.C	(revision 0)
+++ gcc/testsuite/g++.dg/pr71294.C	(working copy)
@@ -0,0 +1,60 @@ 
+// { dg-do compile { target { powerpc64*-*-* && lp64 } } }
+// { dg-require-effective-target powerpc_p8vector_ok } */
+// { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } }
+// { dg-options "-mcpu=power8 -O3 -fstack-protector -mno-lra" }
+
+// PAR target/71294 failed because RELOAD could not figure how create a V2DI
+// vector that auto vectorization created with each element being the same
+// stack address, with stack-protector turned on.
+
+class A;
+template <typename _Tp, int m, int n> class B {
+public:
+  _Tp val[m * n];
+};
+class C {
+public:
+  C(A);
+};
+struct D {
+  D();
+  unsigned long &operator[](int);
+  unsigned long *p;
+};
+class A {
+public:
+  template <typename _Tp, int m, int n> A(const B<_Tp, m, n> &, bool);
+  int rows, cols;
+  unsigned char *data;
+  unsigned char *datastart;
+  unsigned char *dataend;
+  unsigned char *datalimit;
+  D step;
+};
+template <typename _Tp, int m, int n>
+A::A(const B<_Tp, m, n> &p1, bool)
+    : rows(m), cols(n) {
+  step[0] = cols * sizeof(_Tp);
+  datastart = data = (unsigned char *)p1.val;
+  datalimit = dataend = datastart + rows * step[0];
+}
+class F {
+public:
+  static void compute(C);
+  template <typename _Tp, int m, int n, int nm>
+  static void compute(const B<_Tp, m, n> &, B<_Tp, nm, 1> &, B<_Tp, m, nm> &,
+                      B<_Tp, n, nm> &);
+};
+D::D() {}
+unsigned long &D::operator[](int p1) { return p[p1]; }
+template <typename _Tp, int m, int n, int nm>
+void F::compute(const B<_Tp, m, n> &, B<_Tp, nm, 1> &, B<_Tp, m, nm> &,
+                B<_Tp, n, nm> &p4) {
+  A a(p4, false);
+  compute(a);
+}
+void fn1() {
+  B<double, 4, 4> b, c, e;
+  B<double, 4, 1> d;
+  F::compute(b, d, c, e);
+}