Fix constraints on VSX Fma, Fix, and Reduce options

Message ID 20140910201631.GA20669@ibm-tiger.the-meissners.org
State New

Commit Message

Michael Meissner Sept. 10, 2014, 8:16 p.m. UTC
In doing work on improving power8 fusion support, I noticed that in several of
the patterns (vector fused multiply-add, optimization of float (fix (x)), and
vector reduction), I used the "ws" constraint which is the constraint for
scalar double precision floating point (currently FLOAT_REGS) in cases where
the operand is a vector, where we should use "wd" (preferred constraint for
V2DF), "wf" (preferred constraint for V4SF) or even "wa" (any VSX register).
This means the register allocator might generate extra code due to preferring
the traditional floating point registers.

I was curious about the code generation changes, so I built power8 versions of
the Spec 2006 benchmark suite, and compared the number of instructions
generated, using the same options.  Most of the floating point benchmarks had
some changes in code generation, including fewer scalar floating loads/stores
(where the RA picked a traditional scalar register, which meant elsewhere a
scalar was spilled to the stack), and different encodings of the FMA
instructions.
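
An instruction-count comparison of this kind could be scripted roughly as
follows. This is only a sketch, not the script used for the numbers above: it
assumes each build has been disassembled to text in the usual `objdump -d`
layout (address, tab, hex bytes, tab, mnemonic and operands), and the helper
names count_mnemonics and diff_counts are made up for illustration.

```python
import re
from collections import Counter

# Assumed line shape from `objdump -d`:
#   "  <addr>:<TAB><hex bytes><TAB><mnemonic> <operands>"
# Labels, blank lines and byte-only continuation lines do not match.
INSN_RE = re.compile(r"^\s*[0-9a-f]+:\t[0-9a-f ]+\t(\S+)")


def count_mnemonics(disassembly: str) -> Counter:
    """Count how often each instruction mnemonic appears in a disassembly dump."""
    counts = Counter()
    for line in disassembly.splitlines():
        match = INSN_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def diff_counts(before: Counter, after: Counter) -> dict:
    """Return {mnemonic: after - before} for every mnemonic whose count changed."""
    delta = {}
    for name in sorted(set(before) | set(after)):
        change = after[name] - before[name]  # a missing mnemonic counts as 0
        if change:
            delta[name] = change
    return delta
```

Feeding the disassembly of the same benchmark built with and without the
patch through count_mnemonics, then passing both results to diff_counts,
gives a per-mnemonic delta in which changes such as fewer scalar loads/stores
or a different FMA form show up directly.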

I did a run of the FP spec benchmarks on a big endian power8 system.  There
were no regressions that were significant, and the cactusADM benchmark sped up
by 2%.

I did a bootstrap/make check comparison, and there were no regressions.  Is it
ok to install in trunk and the active PowerPC branches?

Comments

David Edelsohn Sept. 10, 2014, 8:42 p.m. UTC | #1
On Wed, Sep 10, 2014 at 4:16 PM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:
> In doing work on improving power8 fusion support, I noticed that in several of
> the patterns (vector fused multiply-add, optimization of float (fix (x)), and
> vector reduction), I used the "ws" constraint which is the constraint for
> scalar double precision floating point (currently FLOAT_REGS) in cases where
> the operand is a vector, where we should use "wd" (preferred constraint for
> V2DF), "wf" (preferred constraint for V4SF) or even "wa" (any VSX register).
> This means the register allocator might generate extra code due to preferring
> the traditional floating point registers.
>
> I was curious about the code generation changes, so I built power8 versions of
> the Spec 2006 benchmark suite, and compared the number of instructions
> generated, using the same options.  Most of the floating point benchmarks had
> some changes in code generation, including fewer scalar floating loads/stores
> (where the RA picked a traditional scalar register, which meant elsewhere a
> scalar was spilled to the stack), and different encodings of the FMA
> instructions.
>
> I did a run of the FP spec benchmarks on a big endian power8 system.  There
> were no regressions that were significant, and the cactusADM benchmark sped up
> by 2%.
>
> I did a bootstrap/make check comparison, and there were no regressions.  Is it
> ok to install in trunk and the active PowerPC branches?

Needs a ChangeLog.

Okay.

thanks, David

Michael Meissner Sept. 10, 2014, 8:48 p.m. UTC | #2
On Wed, Sep 10, 2014 at 04:42:06PM -0400, David Edelsohn wrote:
> Needs a ChangeLog.

Whoops, I forgot to include it:

2014-09-10  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/vsx.md (vsx_fmav4sf4): Use correct constraints for
	V2DF, V4SF, DF, and DI modes.
	(vsx_fmav2df4): Likewise.
	(vsx_float_fix_<mode>2): Likewise.
	(vsx_reduc_<VEC_reduc_name>_v2df_scalar): Likewise.

Patch

Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 214455)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -905,11 +905,11 @@  (define_insn "*vsx_tsqrt<mode>2_internal
 ;; multiply.
 
 (define_insn "*vsx_fmav4sf4"
-  [(set (match_operand:V4SF 0 "vsx_register_operand" "=ws,ws,?wa,?wa,v")
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wf,wf,?wa,?wa,v")
 	(fma:V4SF
-	  (match_operand:V4SF 1 "vsx_register_operand" "%ws,ws,wa,wa,v")
-	  (match_operand:V4SF 2 "vsx_register_operand" "ws,0,wa,0,v")
-	  (match_operand:V4SF 3 "vsx_register_operand" "0,ws,0,wa,v")))]
+	  (match_operand:V4SF 1 "vsx_register_operand" "%wf,wf,wa,wa,v")
+	  (match_operand:V4SF 2 "vsx_register_operand" "wf,0,wa,0,v")
+	  (match_operand:V4SF 3 "vsx_register_operand" "0,wf,0,wa,v")))]
   "VECTOR_UNIT_VSX_P (V4SFmode)"
   "@
    xvmaddasp %x0,%x1,%x2
@@ -920,11 +920,11 @@  (define_insn "*vsx_fmav4sf4"
   [(set_attr "type" "vecfloat")])
 
 (define_insn "*vsx_fmav2df4"
-  [(set (match_operand:V2DF 0 "vsx_register_operand" "=ws,ws,?wa,?wa")
+  [(set (match_operand:V2DF 0 "vsx_register_operand" "=wd,wd,?wa,?wa")
 	(fma:V2DF
-	  (match_operand:V2DF 1 "vsx_register_operand" "%ws,ws,wa,wa")
-	  (match_operand:V2DF 2 "vsx_register_operand" "ws,0,wa,0")
-	  (match_operand:V2DF 3 "vsx_register_operand" "0,ws,0,wa")))]
+	  (match_operand:V2DF 1 "vsx_register_operand" "%wd,wd,wa,wa")
+	  (match_operand:V2DF 2 "vsx_register_operand" "wd,0,wa,0")
+	  (match_operand:V2DF 3 "vsx_register_operand" "0,wd,0,wa")))]
   "VECTOR_UNIT_VSX_P (V2DFmode)"
   "@
    xvmaddadp %x0,%x1,%x2
@@ -1360,8 +1360,8 @@  (define_insn "*vsx_float_fix_<mode>2"
 (define_insn "vsx_concat_<mode>"
   [(set (match_operand:VSX_D 0 "vsx_register_operand" "=<VSr>,?<VSa>")
 	(vec_concat:VSX_D
-	 (match_operand:<VS_scalar> 1 "vsx_register_operand" "ws,<VSa>")
-	 (match_operand:<VS_scalar> 2 "vsx_register_operand" "ws,<VSa>")))]
+	 (match_operand:<VS_scalar> 1 "vsx_register_operand" "<VS_64reg>,<VSa>")
+	 (match_operand:<VS_scalar> 2 "vsx_register_operand" "<VS_64reg>,<VSa>")))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
 {
   if (BYTES_BIG_ENDIAN)
@@ -2018,7 +2018,7 @@  (define_insn_and_split "*vsx_reduc_<VEC_
 ;; to the top element of the V2DF array without doing an extract.
 
 (define_insn_and_split "*vsx_reduc_<VEC_reduc_name>_v2df_scalar"
-  [(set (match_operand:DF 0 "vfloat_operand" "=&ws,&?wa,ws,?wa")
+  [(set (match_operand:DF 0 "vfloat_operand" "=&ws,&?ws,ws,?ws")
 	(vec_select:DF
 	 (VEC_reduc:V2DF
 	  (vec_concat:V2DF