diff mbox series

, PR target/93932, Do not use input_operand for variable vector extract insns on PowerPC

Message ID 20200226223223.GA27931@ibm-toto.the-meissners.org
State New
Headers show
Series , PR target/93932, Do not use input_operand for variable vector extract insns on PowerPC | expand

Commit Message

Michael Meissner Feb. 26, 2020, 10:32 p.m. UTC
As part of improving the vector extract built-ins for the future processor, and
also looking at optimizing all of the cases with vector extracts and sign,
zero, float extension for GCC 11 (PR target/93230), I noticed that the code
generated for some of the tests had regressed from the GCC 8 time frame.

What is happening is in some instances, we want to do a vector extract with a
variable element where the vector is in a register:

	 #include <altivec.h>

	long long
	foo (vector long long v, unsigned long n)
	{
	  return vec_extract (v, n);
	}

During the reload pass, the register allocator decides that it should spill the
insn to the stack, and then do the vector extract from memory (which is an
optimization to prevent loading the vector in case we only want one element).

I came to the conclusion that having one insn that did both the variable vector
extract on a register and from memory, allowed this bad code to be generated.

I split the 3 variable vector extract insn patterns that used input_operand to
handle either type, to have one insn that handled a register, and a second insn
that handles only memory.

Note, there is a 4th place that uses input_operand for variable vector extracts
that is not touched by this patch.  The insn is trying to combine a variable
vector extract with a zero extend operation.  That insn pattern never will
match due to the way the insn pattern is written.  I will be submitting patches
to fix this insn separately (PR target/93937), and that patch will split the
insn into two separate insns.

I have built this code on both a little endian system and a big endian PowerPC
system, and I ran make check for C, C++, Fortran, and LTO.  There were no
regressions after applying the patch for the test suite.  Can I check this into
the master branch?

Because it also fails on the GCC 9 branch, I will be submitting patches for
that.  Due to the changes between GCC 9 and the current master, the patch in
this mail can not be blindly applied, but I have a patch for GCC 9.  GCC 8
generates the correct code, so it does not have to be fixed.

[gcc]
2020-02-26  Michael Meissner  <meissner@linux.ibm.com>

	PR target/93932
	* config/rs6000/vsx.md (vsx_extract_<mode>_var, VSX_D iterator):
	Split the insn into two parts.  This insn only does variable
	extract from a register.
	(vsx_extract_<mode>_var_load, VSX_D iterator): New insn, do
	variable extract from memory.
	(vsx_extract_v4sf_var): Split the insn into two parts.  This insn
	only does variable extract from a register.
	(vsx_extract_v4sf_var_load): New insn, do variable extract from
	memory.
	(vsx_extract_<mode>_var, VSX_EXTRACT_I iterator): Split the insn
	into two parts.  This insn only does variable extract from a
	register.
	(vsx_extract_<mode>_var_load, VSX_EXTRACT_I iterator): New insn,
	do variable extract from memory.

[gcc/testsuite]
2020-02-26  Michael Meissner  <meissner@linux.ibm.com>

	PR target/93932
	* gcc.target/powerpc/fold-vec-extract-longlong.p8.c: Adjust
	instruction counts.

Comments

Segher Boessenkool Feb. 27, 2020, 3:56 p.m. UTC | #1
Hi!

On Wed, Feb 26, 2020 at 05:32:23PM -0500, Michael Meissner wrote:
> What is happening is in some instances, we want to do a vector extract with a
> variable element where the vector is in a register:
> 
> 	 #include <altivec.h>
> 
> 	long long
> 	foo (vector long long v, unsigned long n)
> 	{
> 	  return vec_extract (v, n);
> 	}
> 
> During the reload pass, the register allocator decides that it should spill the
> insn to the stack, and then do the vector extract from memory (which is an
> optimization to prevent loading the vector in case we only want one element).

And that causes LHS/SHL, very undesirable.  Okay.

> Note, there is a 4th place that uses input_operand for variable vector extracts
> that is not touched by this patch.

There are more places that use input_operand.  input_operand is meant to
be used for the RHS of mov patterns (with a reg as LHS), and it isn't
good to use it anywhere else: input_operand allows too much: *all*
memory, many constants, datums that can only go into some kinds of regs
(while you might have another kind).

In pretty much all cases reload/LRA can fix things, but you get worse
code that way.

rs6000_expand_vector_extract is always called with a register as second
operand, so the changes you made are safe.

The "reload_completed"s now have no function other than to force worse
code to be generated, you might want to do something about that as a
follow-up.

> -// P8 (LE) variables: addi,xxpermdi,mr,stxvd2x|stxvd4x,rldicl,sldi,ldx,blr
> -// P8 (BE) constants: mfvsrd
> -// P8 (BE) Variables: addi,xxpermdi,rldicl,mr,stxvd2x|stxvd4x,sldi,ldx,blr
> +// P8 (LE) variables: xori, rldic, mtvsrd, xxpermdi, vslo, mfvsrd
> +// P8 (BE) constants: xxpermdi, mfvsrd
> +// P8 (BE) Variables:       rldic, mtvsrd, xxpermdi, vslo, mfvsrd
>  
> -/* { dg-final { scan-assembler-times {\maddi\M} 6 { target ilp32 } } } */
> -/* { dg-final { scan-assembler-times {\maddi\M} 3 { target lp64 } } } */
> +/* results. */
> +/* { dg-final { scan-assembler-times {\mxori\M} 3 { target le } } } */
> +/* { dg-final { scan-assembler-times {\mrldic\M|\mrlwinm\M} 3 } } */
> +/* { dg-final { scan-assembler-times {\mstxvd2x\M|\mstxvw4x\M} 4 { target ilp32 } } } */
>  /* { dg-final { scan-assembler-times {\madd\M} 3 { target ilp32 } } } */
>  /* { dg-final { scan-assembler-times {\mlwz\M} 11 { target ilp32 } } } */
> +/* { dg-final { scan-assembler-times {\maddi\M} 6 { target ilp32 } } } */
> +/* { dg-final { scan-assembler-times {\mmfvsrd\M} 6 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times {\mmtvsrd\M} 3 { target lp64 } } } */
>  /* { dg-final { scan-assembler-times {\mxxpermdi\M} 3 { target le } } } */
> +/* { dg-final { scan-assembler-times {\mxxpermdi\M} 6 { target { be && lp64 } } } } */
>  /* { dg-final { scan-assembler-times {\mxxpermdi\M} 2 { target { be && ilp32 } } } } */
> -/* { dg-final { scan-assembler-times {\mxxpermdi\M} 3 { target { be && lp64 } } } } */
> -/* { dg-final { scan-assembler-times {\mstxvd2x\M|\mstxvw4x\M} 3 { target lp64 } } } */
> -/* { dg-final { scan-assembler-times {\mstxvd2x\M|\mstxvw4x\M} 4 { target ilp32 } } } */
> -/* { dg-final { scan-assembler-times {\mrldicl\M|\mrldic\M|\mrlwinm\M} 3 } } */
> -/* { dg-final { scan-assembler-times {\mmfvsrd\M} 3 { target { lp64 } } } } */
> -/* { dg-final { scan-assembler-times {\mmfvsrd\M} 0 { target { be && ilp32 } } } } */
> -/* { dg-final { scan-assembler-times {\mmtvsrd\M} 0 { target { lp64 } } } } */
> -/* { dg-final { scan-assembler-times {\mmtvsrd\M} 0 { target { be && ilp32 } } } } */
> +/* { dg-final { scan-assembler-times {\mvslo\M} 3 { target lp64 } } } */

All this stays super fragile.

Okay for trunk.  Thanks!


Segher
diff mbox series

Patch

--- /tmp/TMHdwO_vsx.md	2020-02-26 13:25:27.250209645 -0500
+++ gcc/config/rs6000/vsx.md	2020-02-26 13:25:21.357125563 -0500
@@ -3245,14 +3245,14 @@  (define_insn "vsx_vslo_<mode>"
   "vslo %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
-;; Variable V2DI/V2DF extract
+;; Variable V2DI/V2DF extract from a register
 (define_insn_and_split "vsx_extract_<mode>_var"
-  [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=v,wa,r")
-	(unspec:<VS_scalar> [(match_operand:VSX_D 1 "input_operand" "v,Q,Q")
-			     (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
+  [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=v")
+	(unspec:<VS_scalar> [(match_operand:VSX_D 1 "gpc_reg_operand" "v")
+			     (match_operand:DI 2 "gpc_reg_operand" "r")]
 			    UNSPEC_VSX_EXTRACT))
-   (clobber (match_scratch:DI 3 "=r,&b,&b"))
-   (clobber (match_scratch:V2DI 4 "=&v,X,X"))]
+   (clobber (match_scratch:DI 3 "=r"))
+   (clobber (match_scratch:V2DI 4 "=&v"))]
   "VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_DIRECT_MOVE_64BIT"
   "#"
   "&& reload_completed"
@@ -3263,6 +3263,23 @@  (define_insn_and_split "vsx_extract_<mod
   DONE;
 })
 
+;; Variable V2DI/V2DF extract from memory
+(define_insn_and_split "*vsx_extract_<mode>_var_load"
+  [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=wa,r")
+	(unspec:<VS_scalar> [(match_operand:VSX_D 1 "memory_operand" "Q,Q")
+			     (match_operand:DI 2 "gpc_reg_operand" "r,r")]
+			    UNSPEC_VSX_EXTRACT))
+   (clobber (match_scratch:DI 3 "=&b,&b"))]
+  "VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_DIRECT_MOVE_64BIT"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0) (match_dup 4))]
+{
+  operands[4] = rs6000_adjust_vec_address (operands[0], operands[1], operands[2],
+					   operands[3], <VS_scalar>mode);
+}
+  [(set_attr "type" "fpload,load")])
+
 ;; Extract a SF element from V4SF
 (define_insn_and_split "vsx_extract_v4sf"
   [(set (match_operand:SF 0 "vsx_register_operand" "=wa")
@@ -3315,14 +3332,14 @@  (define_insn_and_split "*vsx_extract_v4s
    (set_attr "length" "8")
    (set_attr "isa" "*,p7v,p9v,*")])
 
-;; Variable V4SF extract
+;; Variable V4SF extract from a register
 (define_insn_and_split "vsx_extract_v4sf_var"
-  [(set (match_operand:SF 0 "gpc_reg_operand" "=wa,wa,?r")
-	(unspec:SF [(match_operand:V4SF 1 "input_operand" "v,Q,Q")
-		    (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
+  [(set (match_operand:SF 0 "gpc_reg_operand" "=wa")
+	(unspec:SF [(match_operand:V4SF 1 "gpc_reg_operand" "v")
+		    (match_operand:DI 2 "gpc_reg_operand" "r")]
 		   UNSPEC_VSX_EXTRACT))
-   (clobber (match_scratch:DI 3 "=r,&b,&b"))
-   (clobber (match_scratch:V2DI 4 "=&v,X,X"))]
+   (clobber (match_scratch:DI 3 "=r"))
+   (clobber (match_scratch:V2DI 4 "=&v"))]
   "VECTOR_MEM_VSX_P (V4SFmode) && TARGET_DIRECT_MOVE_64BIT"
   "#"
   "&& reload_completed"
@@ -3333,6 +3350,23 @@  (define_insn_and_split "vsx_extract_v4sf
   DONE;
 })
 
+;; Variable V4SF extract from memory
+(define_insn_and_split "*vsx_extract_v4sf_var_load"
+  [(set (match_operand:SF 0 "gpc_reg_operand" "=wa,?r")
+	(unspec:SF [(match_operand:V4SF 1 "memory_operand" "Q,Q")
+		    (match_operand:DI 2 "gpc_reg_operand" "r,r")]
+		   UNSPEC_VSX_EXTRACT))
+   (clobber (match_scratch:DI 3 "=&b,&b"))]
+  "VECTOR_MEM_VSX_P (V4SFmode) && TARGET_DIRECT_MOVE_64BIT"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0) (match_dup 4))]
+{
+  operands[4] = rs6000_adjust_vec_address (operands[0], operands[1], operands[2],
+					   operands[3], SFmode);
+}
+  [(set_attr "type" "fpload,load")])
+
 ;; Expand the builtin form of xxpermdi to canonical rtl.
 (define_expand "vsx_xxpermdi_<mode>"
   [(match_operand:VSX_L 0 "vsx_register_operand")
@@ -3677,15 +3711,15 @@  (define_insn_and_split "*vsx_extract_<mo
   [(set_attr "type" "load")
    (set_attr "length" "8")])
 
-;; Variable V16QI/V8HI/V4SI extract
+;; Variable V16QI/V8HI/V4SI extract from a register
 (define_insn_and_split "vsx_extract_<mode>_var"
-  [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=r,r,r")
+  [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=r,r")
 	(unspec:<VS_scalar>
-	 [(match_operand:VSX_EXTRACT_I 1 "input_operand" "v,v,Q")
-	  (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
+	 [(match_operand:VSX_EXTRACT_I 1 "gpc_reg_operand" "v,v")
+	  (match_operand:DI 2 "gpc_reg_operand" "r,r")]
 	 UNSPEC_VSX_EXTRACT))
-   (clobber (match_scratch:DI 3 "=r,r,&b"))
-   (clobber (match_scratch:V2DI 4 "=X,&v,X"))]
+   (clobber (match_scratch:DI 3 "=r,r"))
+   (clobber (match_scratch:V2DI 4 "=X,&v"))]
   "VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_DIRECT_MOVE_64BIT"
   "#"
   "&& reload_completed"
@@ -3695,7 +3729,25 @@  (define_insn_and_split "vsx_extract_<mod
 				operands[3], operands[4]);
   DONE;
 }
-  [(set_attr "isa" "p9v,*,*")])
+  [(set_attr "isa" "p9v,*")])
+
+;; Variable V16QI/V8HI/V4SI extract from memory
+(define_insn_and_split "*vsx_extract_<mode>_var_load"
+  [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=r")
+	(unspec:<VS_scalar>
+	 [(match_operand:VSX_EXTRACT_I 1 "memory_operand" "Q")
+	  (match_operand:DI 2 "gpc_reg_operand" "r")]
+	 UNSPEC_VSX_EXTRACT))
+   (clobber (match_scratch:DI 3 "=&b"))]
+  "VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_DIRECT_MOVE_64BIT"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0) (match_dup 4))]
+{
+  operands[4] = rs6000_adjust_vec_address (operands[0], operands[1], operands[2],
+					   operands[3], <VS_scalar>mode);
+}
+  [(set_attr "type" "load")])
 
 (define_insn_and_split "*vsx_extract_<mode>_<VS_scalar>mode_var"
   [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=r,r,r")
--- /tmp/oo6tIZ_fold-vec-extract-longlong.p8.c	2020-02-26 14:59:46.104993989 -0500
+++ gcc/testsuite/gcc.target/powerpc/fold-vec-extract-longlong.p8.c	2020-02-26 14:58:19.324752612 -0500
@@ -7,24 +7,23 @@ 
 
 // Targeting P8LE and P8BE, six tests total.
 // P8 (LE) constants: mfvsrd
-// P8 (LE) variables: addi,xxpermdi,mr,stxvd2x|stxvd4x,rldicl,sldi,ldx,blr
-// P8 (BE) constants: mfvsrd
-// P8 (BE) Variables: addi,xxpermdi,rldicl,mr,stxvd2x|stxvd4x,sldi,ldx,blr
+// P8 (LE) variables: xori, rldic, mtvsrd, xxpermdi, vslo, mfvsrd
+// P8 (BE) constants: xxpermdi, mfvsrd
+// P8 (BE) Variables:       rldic, mtvsrd, xxpermdi, vslo, mfvsrd
 
-/* { dg-final { scan-assembler-times {\maddi\M} 6 { target ilp32 } } } */
-/* { dg-final { scan-assembler-times {\maddi\M} 3 { target lp64 } } } */
+/* results. */
+/* { dg-final { scan-assembler-times {\mxori\M} 3 { target le } } } */
+/* { dg-final { scan-assembler-times {\mrldic\M|\mrlwinm\M} 3 } } */
+/* { dg-final { scan-assembler-times {\mstxvd2x\M|\mstxvw4x\M} 4 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times {\madd\M} 3 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times {\mlwz\M} 11 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times {\maddi\M} 6 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times {\mmfvsrd\M} 6 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {\mmtvsrd\M} 3 { target lp64 } } } */
 /* { dg-final { scan-assembler-times {\mxxpermdi\M} 3 { target le } } } */
+/* { dg-final { scan-assembler-times {\mxxpermdi\M} 6 { target { be && lp64 } } } } */
 /* { dg-final { scan-assembler-times {\mxxpermdi\M} 2 { target { be && ilp32 } } } } */
-/* { dg-final { scan-assembler-times {\mxxpermdi\M} 3 { target { be && lp64 } } } } */
-/* { dg-final { scan-assembler-times {\mstxvd2x\M|\mstxvw4x\M} 3 { target lp64 } } } */
-/* { dg-final { scan-assembler-times {\mstxvd2x\M|\mstxvw4x\M} 4 { target ilp32 } } } */
-/* { dg-final { scan-assembler-times {\mrldicl\M|\mrldic\M|\mrlwinm\M} 3 } } */
-/* { dg-final { scan-assembler-times {\mmfvsrd\M} 3 { target { lp64 } } } } */
-/* { dg-final { scan-assembler-times {\mmfvsrd\M} 0 { target { be && ilp32 } } } } */
-/* { dg-final { scan-assembler-times {\mmtvsrd\M} 0 { target { lp64 } } } } */
-/* { dg-final { scan-assembler-times {\mmtvsrd\M} 0 { target { be && ilp32 } } } } */
+/* { dg-final { scan-assembler-times {\mvslo\M} 3 { target lp64 } } } */
 
 #include <altivec.h>