diff mbox

[#2] , PR 71201, Fix xxperm fusion on PowerPC ISA 3.0

Message ID 20160523221736.GA10818@ibm-tiger.the-meissners.org
State New
Headers show

Commit Message

Michael Meissner May 23, 2016, 10:17 p.m. UTC
On Thu, May 19, 2016 at 10:33:41AM -0500, Segher Boessenkool wrote:
> On Thu, May 19, 2016 at 10:53:41AM -0400, Michael Meissner wrote:
> > GCC 6.1 added support for the XXPERM instruction for the PowerPC ISA 3.0.  The
> > XXPERM instruction is essentially a 4 operand instruction, with only 3 operands
> > in the instruction (the target register overlaps with the first input
> > register).  The Power9 hardware has fusion support where if the instruction
> > that precedes the XXPERM is a XXLOR move instruction to set the first input
> > argument, it is fused with the XXPERM.  I added code to support this fusion.
> > 
> > Unfortunately, in running the testsuite on the power9 simulator, we discovered
> > that the test gcc.c-torture/execute/pr56866.c would fail because the fusion
> > alternatives confused the register allocator and/or the passes after the
> > register allocator.  This patch removes the explicit fusion support from
> > XXPERM.
> 
> Okay.  Please keep the PR open until that problem is fixed.  It also
> shouldn't be "target" category, if the problem is RA.
> 
> > In addition, ISA 3.0 added XXPERMR and VPERMR instructions for little endian
> > support where the permute vector reverses the bytes.  This patch adds support
> > for XXPERMR/VPERMR.
> 
> Please send that as a separate patch, it has nothing to do with the PR.
> 
> > +	x = gen_rtx_UNSPEC (mode,
> > +			    gen_rtvec (3, target, reg, 
> 
> Trailing space.
> 
> > +  if (TARGET_P9_VECTOR)
> > +    {
> > +      unspec = gen_rtx_UNSPEC (mode, gen_rtvec (3, op0, op1, sel), 
> 
> And another.
> 
> > +	 The VNAND is preferred for future fusion opportunities.  */
> > +      notx = gen_rtx_NOT (V16QImode, sel);
> > +      iorx = (TARGET_P8_VECTOR
> > +	      ? gen_rtx_IOR (V16QImode, notx, notx)
> > +	      : gen_rtx_AND (V16QImode, notx, notx));
> > +      emit_insn (gen_rtx_SET (norreg, iorx));
> > +      
> 
> Some more.
> 
> > +/* { dg-final { scan-assembler	   "vpermr\|xxpermr" } } */
> 
> Tab in the middle of the line.

This patch just fixes the xxperm fusion problem, and I will submit the
vpermr/xxpermr support in another patch.

Note, if you believe the register allocator and the post reload RTL passes need
to be fixed to allow the fusion of the move to the xxperm, that is fine.
However, take it on yourself.  As the person who wrote the code to add fusion
support for xxperm, I now think it was a bad idea, and I want to remove that
support.  It would probably be better done by modifying the scheduler to keep
the move and xxperm together, rather than including it in the insn.

I have bootstrapped the compiler on a little endian power9 system and there
were no regressions in the test suite.  Is it ok to check into the trunk and on
the 6.1 branch?

[gcc]
2016-05-23  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/71201
	* config/rs6000/altivec.md (altivec_vperm_<mode>_internal): Drop
	ISA 3.0 xxperm fusion alternative.
	(altivec_vperm_v8hiv16qi): Likewise.
	(altivec_vperm_<mode>_uns_internal): Likewise.
	(vperm_v8hiv4si): Likewise.
	(vperm_v16qiv8hi): Likewise.

[gcc/testsuite]
2016-05-23  Michael Meissner  <meissner@linux.vnet.ibm.com>
	    Kelvin Nilsen  <kelvin@gcc.gnu.org>

	* gcc.target/powerpc/p9-permute.c: Run test on big endian as well
	as little endian.

Comments

Segher Boessenkool May 23, 2016, 11:13 p.m. UTC | #1
Hi,

On Mon, May 23, 2016 at 06:17:36PM -0400, Michael Meissner wrote:
> > > Unfortunately, in running the testsuite on the power9 simulator, we discovered
> > > that the test gcc.c-torture/execute/pr56866.c would fail because the fusion
> > > alternatives confused the register allocator and/or the passes after the
> > > register allocator.  This patch removes the explicit fusion support from
> > > XXPERM.
> > 
> > Okay.  Please keep the PR open until that problem is fixed.  It also
> > shouldn't be "target" category, if the problem is RA.

> This patch just fixes the xxperm fusion problem, and I will submit the
> vpermr/xxpermr support in another patch.

Thanks.

> Note, if you believe the register allocator and the post reload RTL passes need
> to be fixed to allow the fusion of the move to the xxperm, that is fine.
> However, take it on yourself.

I'm just saying that if the RA (and later) are "confused", that is their
problem, not a target problem.  Or I'm not understanding what the problem
is.  Maybe it is just target abusing the RA?  Either way...

> As the person who wrote the code to add fusion
> support for xxperm, I now think it was a bad idea, and I want to remove that
> support.  It would probably be better done by modifying the scheduler to keep
> the move and xxperm together, rather than including it in the insn.

That should give most of the win without most of the complexity.  I like
that plan ;-)

> I have bootstrapped the compiler on a little endian power9 system and there
> were no regressions in the test suite.  Is it ok to check into the trunk and on
> the 6.1 branch?
> 
> [gcc]
> 2016-05-23  Michael Meissner  <meissner@linux.vnet.ibm.com>
> 
> 	PR target/71201
> 	* config/rs6000/altivec.md (altivec_vperm_<mode>_internal): Drop
> 	ISA 3.0 xxperm fusion alternative.
> 	(altivec_vperm_v8hiv16qi): Likewise.
> 	(altivec_vperm_<mode>_uns_internal): Likewise.
> 	(vperm_v8hiv4si): Likewise.
> 	(vperm_v16qiv8hi): Likewise.
> 
> [gcc/testsuite]
> 2016-05-23  Michael Meissner  <meissner@linux.vnet.ibm.com>
> 	    Kelvin Nilsen  <kelvin@gcc.gnu.org>
> 
> 	* gcc.target/powerpc/p9-permute.c: Run test on big endian as well
> 	as little endian.

Okay for trunk.  Okay for 6 after a week or so.


Segher
diff mbox

Patch

Index: gcc/config/rs6000/altivec.md
===================================================================
--- gcc/config/rs6000/altivec.md	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc)	(revision 236600)
+++ gcc/config/rs6000/altivec.md	(.../gcc)	(working copy)
@@ -1952,32 +1952,30 @@  (define_expand "altivec_vperm_<mode>"
 
 ;; Slightly prefer vperm, since the target does not overlap the source
 (define_insn "*altivec_vperm_<mode>_internal"
-  [(set (match_operand:VM 0 "register_operand" "=v,?wo,?&wo")
-	(unspec:VM [(match_operand:VM 1 "register_operand" "v,0,wo")
-		    (match_operand:VM 2 "register_operand" "v,wo,wo")
-		    (match_operand:V16QI 3 "register_operand" "v,wo,wo")]
+  [(set (match_operand:VM 0 "register_operand" "=v,?wo")
+	(unspec:VM [(match_operand:VM 1 "register_operand" "v,0")
+		    (match_operand:VM 2 "register_operand" "v,wo")
+		    (match_operand:V16QI 3 "register_operand" "v,wo")]
 		   UNSPEC_VPERM))]
   "TARGET_ALTIVEC"
   "@
    vperm %0,%1,%2,%3
-   xxperm %x0,%x2,%x3
-   xxlor %x0,%x1,%x1\t\t# xxperm fusion\;xxperm %x0,%x2,%x3"
+   xxperm %x0,%x2,%x3"
   [(set_attr "type" "vecperm")
-   (set_attr "length" "4,4,8")])
+   (set_attr "length" "4")])
 
 (define_insn "altivec_vperm_v8hiv16qi"
-  [(set (match_operand:V16QI 0 "register_operand" "=v,?wo,?&wo")
-	(unspec:V16QI [(match_operand:V8HI 1 "register_operand" "v,0,wo")
-   	               (match_operand:V8HI 2 "register_operand" "v,wo,wo")
-		       (match_operand:V16QI 3 "register_operand" "v,wo,wo")]
+  [(set (match_operand:V16QI 0 "register_operand" "=v,?wo")
+	(unspec:V16QI [(match_operand:V8HI 1 "register_operand" "v,0")
+   	               (match_operand:V8HI 2 "register_operand" "v,wo")
+		       (match_operand:V16QI 3 "register_operand" "v,wo")]
 		   UNSPEC_VPERM))]
   "TARGET_ALTIVEC"
   "@
    vperm %0,%1,%2,%3
-   xxperm %x0,%x2,%x3
-   xxlor %x0,%x1,%x1\t\t# xxperm fusion\;xxperm %x0,%x2,%x3"
+   xxperm %x0,%x2,%x3"
   [(set_attr "type" "vecperm")
-   (set_attr "length" "4,4,8")])
+   (set_attr "length" "4")])
 
 (define_expand "altivec_vperm_<mode>_uns"
   [(set (match_operand:VM 0 "register_operand" "")
@@ -1995,18 +1993,17 @@  (define_expand "altivec_vperm_<mode>_uns
 })
 
 (define_insn "*altivec_vperm_<mode>_uns_internal"
-  [(set (match_operand:VM 0 "register_operand" "=v,?wo,?&wo")
-	(unspec:VM [(match_operand:VM 1 "register_operand" "v,0,wo")
-		    (match_operand:VM 2 "register_operand" "v,wo,wo")
-		    (match_operand:V16QI 3 "register_operand" "v,wo,wo")]
+  [(set (match_operand:VM 0 "register_operand" "=v,?wo")
+	(unspec:VM [(match_operand:VM 1 "register_operand" "v,0")
+		    (match_operand:VM 2 "register_operand" "v,wo")
+		    (match_operand:V16QI 3 "register_operand" "v,wo")]
 		   UNSPEC_VPERM_UNS))]
   "TARGET_ALTIVEC"
   "@
    vperm %0,%1,%2,%3
-   xxperm %x0,%x2,%x3
-   xxlor %x0,%x1,%x1\t\t# xxperm fusion\;xxperm %x0,%x2,%x3"
+   xxperm %x0,%x2,%x3"
   [(set_attr "type" "vecperm")
-   (set_attr "length" "4,4,8")])
+   (set_attr "length" "4")])
 
 (define_expand "vec_permv16qi"
   [(set (match_operand:V16QI 0 "register_operand" "")
@@ -2844,32 +2841,30 @@  (define_expand "vec_unpacks_lo_<VP_small
   "")
 
 (define_insn "vperm_v8hiv4si"
-  [(set (match_operand:V4SI 0 "register_operand" "=v,?wo,?&wo")
-        (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v,0,wo")
-		      (match_operand:V4SI 2 "register_operand" "v,wo,wo")
-		      (match_operand:V16QI 3 "register_operand" "v,wo,wo")]
+  [(set (match_operand:V4SI 0 "register_operand" "=v,?wo")
+        (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v,0")
+		      (match_operand:V4SI 2 "register_operand" "v,wo")
+		      (match_operand:V16QI 3 "register_operand" "v,wo")]
                   UNSPEC_VPERMSI))]
   "TARGET_ALTIVEC"
   "@
    vperm %0,%1,%2,%3
-   xxperm %x0,%x2,%x3
-   xxlor %x0,%x1,%x1\t\t# xxperm fusion\;xxperm %x0,%x2,%x3"
+   xxperm %x0,%x2,%x3"
   [(set_attr "type" "vecperm")
-   (set_attr "length" "4,4,8")])
+   (set_attr "length" "4")])
 
 (define_insn "vperm_v16qiv8hi"
-  [(set (match_operand:V8HI 0 "register_operand" "=v,?wo,?&wo")
-        (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v,0,wo")
-		      (match_operand:V8HI 2 "register_operand" "v,wo,wo")
-		      (match_operand:V16QI 3 "register_operand" "v,wo,wo")]
+  [(set (match_operand:V8HI 0 "register_operand" "=v,?wo")
+        (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v,0")
+		      (match_operand:V8HI 2 "register_operand" "v,wo")
+		      (match_operand:V16QI 3 "register_operand" "v,wo")]
                   UNSPEC_VPERMHI))]
   "TARGET_ALTIVEC"
   "@
    vperm %0,%1,%2,%3
-   xxperm %x0,%x2,%x3
-   xxlor %x0,%x1,%x1\t\t# xxperm fusion\;xxperm %x0,%x2,%x3"
+   xxperm %x0,%x2,%x3"
   [(set_attr "type" "vecperm")
-   (set_attr "length" "4,4,8")])
+   (set_attr "length" "4")])
 
 
 (define_expand "vec_unpacku_hi_v16qi"
Index: gcc/testsuite/gcc.target/powerpc/p9-permute.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/p9-permute.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc)	(revision 236600)
+++ gcc/testsuite/gcc.target/powerpc/p9-permute.c	(.../gcc)	(working copy)
@@ -1,4 +1,4 @@ 
-/* { dg-do compile { target { powerpc64le-*-* } } } */
+/* { dg-do compile { target { powerpc64*-*-* } } } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
 /* { dg-options "-mcpu=power9 -O2" } */
 /* { dg-require-effective-target powerpc_p9vector_ok } */
@@ -17,5 +17,6 @@  permute (vector long long *p, vector lon
   return vec_perm (a, b, mask);
 }
 
+/* expect xxpermr on little-endian, xxperm on big-endian */
 /* { dg-final { scan-assembler	   "xxperm" } } */
 /* { dg-final { scan-assembler-not "vperm"  } } */