Improve moving SFmode to GPR on PowerPC

Message ID 20170919191636.GA4037@ibm-tiger.the-meissners.org
State New
Series: Improve moving SFmode to GPR on PowerPC

Commit Message

Michael Meissner Sept. 19, 2017, 7:16 p.m. UTC
On the PowerPC, starting with ISA 2.07 (power8), moving a single precision value
(SFmode) from a vector register to a GPR involves converting the scalar value
in the register from double (DFmode) format to the 32-bit vector/storage
format, moving it to the GPR, and then shifting right by 32 bits to place the
value in the bottom 32 bits of the GPR for use as a scalar:

	xscvdpspn 0,1
	mfvsrd    3,0
	srdi      3,3,32

It turns out that current processors from ISA 2.06 (power7) through ISA 3.0
(power9) actually duplicate the 32-bit value produced by the XSCVDPSPN and
XSCVDPSP instructions into both the first and second 32-bit words of the
register.  This allows us to eliminate the shift instruction, since the value
is already in the correct location for a 32-bit scalar.

ISA 3.0 is being updated to include this specification (and other fixes) so
that future processors will also be able to eliminate the shift.

The new code is:

	xscvdpspn 0,1
	mfvsrwz   3,0
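As a point of reference, the kind of C source that exercises this move is a
type-punning bit cast along these lines (a hypothetical sketch; the function
name is illustrative and not part of the patch):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration: reinterpret the bits of a float as a
   32-bit integer.  On power8 the float starts out in a vector register
   in DFmode format, so the compiler emits XSCVDPSPN to convert it to
   the 32-bit storage format before moving it to a GPR.  */
uint32_t
float_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof (u));	/* the portable bit cast */
  return u;
}
```

Compiled for power8, the bit cast itself should now need only the
two-instruction XSCVDPSPN/MFVSRWZ sequence shown above.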

While I was working on the modifications, I noticed that if the user did a
round from DFmode to SFmode and then tried to move it to a GPR, it would
originally do:

	frsp      1,2
	xscvdpspn 0,1
	mfvsrd    3,0
	srdi      3,3,32

The XSCVDPSP instruction already handles values outside of the SFmode range
(XSCVDPSPN does not), and so I added a combiner pattern to combine the two
instructions:

	xscvdpsp  0,1
	mfvsrwz   3,0
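The source pattern the combiner matches here is a narrowing conversion followed
by a bit cast, roughly like this (a hypothetical sketch, not code from the
patch):

```c
#include <stdint.h>
#include <string.h>

/* Round a double to single precision, then move the SFmode bits to an
   integer register.  With the combiner pattern, the FRSP + XSCVDPSPN
   pair becomes a single XSCVDPSP.  */
uint32_t
bits_of_rounded (double d)
{
  float f = (float) d;		/* DFmode -> SFmode round */
  uint32_t u;
  memcpy (&u, &f, sizeof (u));	/* SFmode -> GPR move */
  return u;
}
```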

While looking at the code, I noticed that if we have an SImode value in a
vector register and we want to sign extend it and leave the result in a GPR,
on power8 the register allocator would do a 32-bit integer store and then a
sign-extending load into the GPR to perform the sign extension.  I added a
splitter to convert this into a pair of MFVSRWZ and EXTSW instructions.
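One hypothetical way such code arises is when a float computation is truncated
to a 32-bit integer (leaving the result in a vector register) and then widened:

```c
#include <stdint.h>

/* Truncate a float sum to SImode -- on power8 the conversion result
   lands in a vector register -- and then sign extend it into a GPR.
   With the splitter this becomes MFVSRWZ + EXTSW rather than a
   store/load round trip through memory.  */
int64_t
truncate_and_widen (float a, float b)
{
  int32_t i = (int32_t) (a + b);
  return (int64_t) i;
}
```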

I built SPEC 2006 with the changes, and I noticed the following changes in the
generated code:

    * Round DF->SF and move to GPR: namd, wrf;
    * Eliminate 32-bit shift: gromacs, namd, povray, wrf;
    * Use of MFVSRWZ/EXTSW: gromacs, povray, calculix, h264ref.

I have built these changes on the following machines with bootstrap and no
regressions in the regression test:

    * Big endian power7 (with both 32/64-bit targets);
    * Little endian power8;
    * Little endian power9 prototype.

Can I check these changes into GCC 8?  Can I backport them to the GCC 7
branch?

[gcc]
2017-09-19  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/vsx.md (vsx_xscvspdp_scalar2): Move insn so it is
	next to vsx_xscvspdp.
	(vsx_xscvdpsp_scalar): Use 'ww' constraint instead of 'f' to allow
	SFmode values to be in Altivec registers.
	(vsx_xscvdpspn): Eliminate unneeded alternative.  Use correct
	constraint ('ws') for DFmode.
	(vsx_xscvspdpn): Likewise.
	(vsx_xscvdpspn_scalar): Likewise.
	(peephole for optimizing move SF to GPR): Adjust code to eliminate
	needing to do the shift right 32-bits operation after XSCVDPSPN.
	* config/rs6000/rs6000.md (extendsi<mode>2): Add alternative to do
	sign extend from vector register to GPR via a split, preventing
	the register allocator from doing the move via store/load.
	(extendsi<mode>2 splitter): Likewise.
	(movsi_from_sf): Adjust code to eliminate doing a 32-bit shift
	right or vector extract after doing XSCVDPSPN.  Use MFVSRWZ
	instead of MFVSRD to move the value to a GPR register.
	(movdi_from_sf_zero_ext): Likewise.
	(movsi_from_df): Add optimization to merge a convert from DFmode
	to SFmode and moving the SFmode to a GPR to use XSCVDPSP instead
	of round and XSCVDPSPN.
	(reload_gpr_from_vsxsf): Use MFVSRWZ instead of MFVSRD to move the
	value to a GPR register.  Rename p8_mfvsrd_4_disf insn to
	p8_mfvsrwz_disf.
	(p8_mfvsrd_4_disf): Likewise.
	(p8_mfvsrwz_disf): Likewise.

[gcc/testsuite]
2017-09-19  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* gcc.target/powerpc/pr71977-1.c: Adjust scan-assembler codes to
	reflect that we don't generate a 32-bit shift right after
	XSCVDPSPN.
	* gcc.target/powerpc/direct-move-float1.c: Likewise.
	* gcc.target/powerpc/direct-move-float3.c: New test.

Comments

Michael Meissner Sept. 26, 2017, 2:30 p.m. UTC | #1
Off list, Segher asked that I break the patch eliminating a shift right when
transferring SFmode from a vector register to a GPR down into smaller
chunks.  The power7 and power8 instructions that convert values from double
precision format to single precision actually duplicate the 32-bit result in
both the first and second words of the register (the ISA says the second word
is undefined).  We are in the process of issuing an update to ISA 3.0 to
clarify that this will be the required behavior going forward.

I have broken the patches down into 8 chunks.  Some of the patches are just
cosmetic fixes for things I noticed while doing the main patch.  One patch
eliminates the shift.  Another fixes up the peephole2 that optimizes putting an
SFmode into a union and then doing masking on the value.  And the final patch
updates the tests that need to be changed.

I have verified that each of these sub-patches builds, and after all 8 patches
have been applied, I did the full bootstrap and regression test, and like the
previous combination patch there were no regressions.  If only some of the
patches are applied, there will be 3 regressions until the remaining patches
are applied.

This is patch #1.  Can I check this into the trunk?  I noticed that without
this patch, the register allocator would sometimes do a store and then a load
to move an SImode value from a vector register to a GPR and sign extend it,
instead of doing the move and then the sign extension.

2017-09-25  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/rs6000.md (extendsi<mode>2): Add a splitter to do
	sign extension from a vector register to a GPR by doing a 32-bit
	direct move and then an EXTSW.
	(extendsi<mode>2 splitter): Likewise.
Michael Meissner Sept. 26, 2017, 2:32 p.m. UTC | #2
This is patch #2.  Can I check this into the trunk?  Compared to the previous
patch, I simplified this to use zero_extendsidi2 instead of using an UNSPEC,
and I deleted the UNSPEC.

2017-09-25  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/rs6000.md (movsi_from_sf): Adjust code to
	eliminate doing a 32-bit shift right or vector extract after doing
	XSCVDPSPN.  Use zero_extendsidi2 instead of p8_mfvsrd_4_disf to
	move the value to the GPRs.
	(movdi_from_sf_zero_ext): Likewise.
	(reload_gpr_from_vsxsf): Likewise.
	(p8_mfvsrd_4_disf): Delete, no longer used.
Michael Meissner Sept. 26, 2017, 2:34 p.m. UTC | #3
This is patch #3.  Can I check this into the trunk?  I noticed that sometimes
you want to convert a DFmode value to SFmode and then move it to a GPR.
This patch adds a combiner insn to use the XSCVDPSP instruction (which does the
proper rounding and handles out-of-bounds values) instead of the XSCVDPSPN
instruction (which assumes the value has already been rounded, and does not
raise an exception in case of a NaN).

2017-09-25  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/rs6000.md (movsi_from_df): Optimize converting a
	DFmode to a SFmode, and then needing to move the SFmode to a GPR
	to use the XSCVDPSP instruction instead of FRSP and XSCVDPSPN.
Michael Meissner Sept. 26, 2017, 2:36 p.m. UTC | #4
This is patch #4.  Can I check this into the trunk?  This is a cosmetic change:
I noticed the two insns providing XSCVSPDP were separated by another insn, and
this patch moves the 2nd insn to be adjacent to the first.

2017-09-25  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/vsx.md (vsx_xscvspdp_scalar2): Move insn so that
	it is adjacent to the other XSCVSPDP insns.
Michael Meissner Sept. 26, 2017, 2:39 p.m. UTC | #5
This is patch #5.  Can I check this into the trunk?  In working on the patch, I
noticed that the XSCVDPSP insn used "f" to limit the register to traditional
FPR registers.  This insn was written for power7, which did not support SFmode
in Altivec registers.  Now that power8 supports SFmode in Altivec registers,
this patch uses the proper constraint ("ww") so we can avoid a move
instruction.

2017-09-25  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/vsx.md (vsx_xscvdpsp_scalar): Use "ww" constraint
	instead of "f" to allow SFmode to be in traditional Altivec
	registers.
Michael Meissner Sept. 26, 2017, 2:44 p.m. UTC | #6
This is patch #6.  Can I check this into the trunk?  When I first did the
power7 port years ago, I decorated a lot of the insns with 2 alternatives, one
that used the type-specific constraint (i.e. "wf" for V4SF, "wd" for V2DF,
etc.) and then the all-VSX constraint ("wa") as an alternative with '?'.  The
theory was that we might want to favor Altivec or FPR registers for a given
vector type.  However, we've never done this.  Furthermore, with the change to
use LRA instead of reload, this isn't as useful.  So, as I encounter these
dual alternatives, I have been eliminating them.

2017-09-25  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/vsx.md (vsx_xscvdpspn): Eliminate useless
	alternative constraint.
	(vsx_xscvspdpn): Likewise.
	(vsx_xscvspdpn_scalar): Likewise.
Michael Meissner Sept. 26, 2017, 2:48 p.m. UTC | #7
This is patch #7.  Can I check this into the trunk?  This patch fixes the
peephole2 that optimizes code like the following (which shows up in the math
library):

	float value;
	unsigned int u2;
	union {
	  float f;
	  unsigned int ui;
	} u;

	u.f = value;
	u2 = u.ui & 0x80000000;

2017-09-25  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/vsx.md (peephole for optimizing move SF to GPR):
	Adjust code to eliminate needing to do the shift right 32-bits
	operation after XSCVDPSPN.
Michael Meissner Sept. 26, 2017, 2:50 p.m. UTC | #8
This is patch #8.  Can I check this into the trunk?  It fixes the two tests
(pr71977-1.c and direct-move-float1.c) that need to be adjusted after the
previous patches are applied.  It also adds a new test for combining a round
from DFmode to SFmode with a move to a GPR.

2017-09-25  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* gcc.target/powerpc/pr71977-1.c: Update test to know that we
	don't generate a 32-bit shift after doing XSCVDPSPN.
	* gcc.target/powerpc/direct-move-float1.c: Likewise.
	* gcc.target/powerpc/direct-move-float3.c: New test.
Segher Boessenkool Sept. 26, 2017, 3:02 p.m. UTC | #9
Hi!

On Tue, Sep 26, 2017 at 10:30:03AM -0400, Michael Meissner wrote:
> I have broken the patches down to 8 chunks.

Thanks for doing this.

> +(define_split
> +  [(set (match_operand:DI 0 "int_reg_operand")
> +	(sign_extend:DI (match_operand:SI 1 "vsx_register_operand")))]

Should be EXTSI instead of DI, for clarity.

> +  "TARGET_DIRECT_MOVE_64BIT && reload_completed"
> +  [(set (match_dup 2)
> +	(match_dup 1))
> +   (set (match_dup 0)
> +	(sign_extend:DI (match_dup 2)))]
> +{
> +  operands[2] = gen_rtx_REG (SImode, reg_or_subregno (operands[0]));
> +})

Okay for trunk with that.  Thanks!


Segher
Segher Boessenkool Sept. 26, 2017, 4:06 p.m. UTC | #10
Hi,

On Tue, Sep 26, 2017 at 10:32:03AM -0400, Michael Meissner wrote:
> 	* config/rs6000/rs6000.md (movsi_from_sf): Adjust code to
> 	eliminate doing a 32-bit shift right or vector extract after doing
> 	XSCVDPSPN.  Use zero_extendsidi2 instead of p8_mfvsrd_4_disf to
> 	move the value to the GPRs.
> 	(movdi_from_sf_zero_ext): Likewise.
> 	(reload_gpr_from_vsxsf): Likewise.

> --- gcc/config/rs6000/rs6000.md	(revision 253169)
> +++ gcc/config/rs6000/rs6000.md	(working copy)
> @@ -6806,25 +6806,25 @@ (define_insn "*movsi_internal1_single"
>  ;; needed.
>  
>  ;;		MR           LWZ          LFIWZX       LXSIWZX   STW
> -;;		STFS         STXSSP       STXSSPX      VSX->GPR  MTVSRWZ
> -;;		VSX->VSX
> +;;		STFS         STXSSP       STXSSPX      VSX->GPR  VSX->VSX,
> +;;		MTVSRWZ

(Typo: comma at end of line).

>  (define_insn_and_split "movsi_from_sf"
>    [(set (match_operand:SI 0 "nonimmediate_operand"
>  		"=r,         r,           ?*wI,        ?*wH,     m,
> -		 m,          wY,          Z,           r,        wIwH,
> -		 ?wK")
> +		 m,          wY,          Z,           r,        ?*wIwH,
> +		 wIwH")
>  
>  	(unspec:SI [(match_operand:SF 1 "input_operand"
>  		"r,          m,           Z,           Z,        r,
> -		 f,          wb,          wu,          wIwH,     r,
> -		 wK")]
> +		 f,          wb,          wu,          wIwH,     wIwH,
> +		 r")]
>  		    UNSPEC_SI_FROM_SF))
>  
>     (clobber (match_scratch:V4SF 2
>  		"=X,         X,           X,           X,        X,
> -		 X,          X,           X,           wa,       X,
> -		 wa"))]
> +		 X,          X,           X,           wIwH,     X,
> +		 X"))]
>  
>    "TARGET_NO_SF_SUBREG
>     && (register_operand (operands[0], SImode)
> @@ -6839,10 +6839,10 @@ (define_insn_and_split "movsi_from_sf"
>     stxssp %1,%0
>     stxsspx %x1,%y0
>     #
> -   mtvsrwz %x0,%1
> -   #"
> +   xscvdpspn %x0,%x1
> +   mtvsrwz %x0,%1"
>    "&& reload_completed
> -   && register_operand (operands[0], SImode)
> +   && int_reg_operand (operands[0], SImode)
>     && vsx_reg_sfsubreg_ok (operands[1], SFmode)"
>    [(const_int 0)]
>  {

So you swap the last two alternatives.  Hrm okay.

> @@ -6850,52 +6850,41 @@ (define_insn_and_split "movsi_from_sf"
>    rtx op1 = operands[1];
>    rtx op2 = operands[2];
>    rtx op0_di = gen_rtx_REG (DImode, REGNO (op0));
> +  rtx op2_si = gen_rtx_REG (SImode, REGNO (op2));
>  

Does this work, btw?  I would expect you need reg_or_subregno, for op0
that is (the new op2 might be fine, not sure).  You do use it in most
places; please check.

>  ;; movsi_from_sf with zero extension
>  ;;
>  ;;		RLDICL       LWZ          LFIWZX       LXSIWZX   VSX->GPR
> -;;		MTVSRWZ      VSX->VSX
> +;;		VSX->VSX     MTVSRWZ
>  
>  (define_insn_and_split "*movdi_from_sf_zero_ext"
>    [(set (match_operand:DI 0 "gpc_reg_operand"
>  		"=r,         r,           ?*wI,        ?*wH,     r,
> -		wIwH,        ?wK")
> +		 wK,         wIwH")

This loses the "?", is that on purpose?

>    [(set_attr "type"
>  		"*,          load,        fpload,      fpload,  mftgpr,
> -		 mffgpr,     veclogical")
> +		 vecexts,    mffgpr")

vecsimple or vecfloat I guess, not vecexts.  We have no way of describing
it exactly, of course.  Maybe just "two".

Okay for trunk with those things taken care of.  Thanks!


Segher
Segher Boessenkool Sept. 26, 2017, 4:36 p.m. UTC | #11
Hi!

On Tue, Sep 26, 2017 at 10:34:44AM -0400, Michael Meissner wrote:
> 	* config/rs6000/rs6000.md (movsi_from_df): Optimize converting a
> 	DFmode to a SFmode, and then needing to move the SFmode to a GPR
> 	to use the XSCVDPSP instruction instead of FRSP and XSCVDPSPN.

> --- gcc/config/rs6000/rs6000.md	(revision 253170)
> +++ gcc/config/rs6000/rs6000.md	(working copy)
> @@ -6919,6 +6919,26 @@ (define_insn_and_split "*movdi_from_sf_z
>  		"4,          4,           4,           4,        8,
>  		 8,          4")])
>  
> +;; Like movsi_from_sf, but combine a convert from DFmode to SFmode before
> +;; moving it to SImode.  We can do a SFmode store without having to do the
> +;; conversion explicitly.  If we are doing a register->register conversion, use
> +;; XSCVDPSP instead of XSCVDPSPN, since the former handles cases where the
> +;; input will not fit in a SFmode, and the later assumes the value has already
> +;; been rounded.
> +(define_insn "*movsi_from_df"
> +  [(set (match_operand:SI 0 "nonimmediate_operand"         "=wa,m,wY,Z")
> +	(unspec:SI [(float_truncate:SF
> +		     (match_operand:DF 1 "gpc_reg_operand" "wa, f,wb,wa"))]
> +		    UNSPEC_SI_FROM_SF))]

(The indentation is a bit broken here -- DF line is indented a space too
many, and the constraint strings do not line up).

Other than that: looks fine, thanks!


Segher
Segher Boessenkool Sept. 26, 2017, 5:13 p.m. UTC | #12
On Tue, Sep 26, 2017 at 10:36:34AM -0400, Michael Meissner wrote:
> 	* config/rs6000/vsx.md (vsx_xscvspdp_scalar2): Move insn so that
> 	it is adjacent to the other XSCVSPDP insns.

Okay for trunk.  Thanks,


Segher
Segher Boessenkool Sept. 26, 2017, 5:15 p.m. UTC | #13
Hi!

On Tue, Sep 26, 2017 at 10:39:06AM -0400, Michael Meissner wrote:
> 	* config/rs6000/vsx.md (vsx_xscvdpsp_scalar): Use "ww" constraint
> 	instead of "f" to allow SFmode to be in traditional Altivec
> 	registers.

Okay.  Thanks,


Segher
Segher Boessenkool Sept. 26, 2017, 5:21 p.m. UTC | #14
On Tue, Sep 26, 2017 at 10:44:24AM -0400, Michael Meissner wrote:
> 	* config/rs6000/vsx.md (vsx_xscvdpspn): Eliminate useless
> 	alternative constraint.
> 	(vsx_xscvspdpn): Likewise.
> 	(vsx_xscvspdpn_scalar): Likewise.

Okay, nice cleanup!  Thanks,


Segher
Segher Boessenkool Sept. 26, 2017, 5:28 p.m. UTC | #15
On Tue, Sep 26, 2017 at 10:50:14AM -0400, Michael Meissner wrote:
> 	* gcc.target/powerpc/pr71977-1.c: Update test to know that we
> 	don't generate a 32-bit shift after doing XSCVDPSPN.
> 	* gcc.target/powerpc/direct-move-float1.c: Likewise.
> 	* gcc.target/powerpc/direct-move-float3.c: New test.

> --- gcc/testsuite/gcc.target/powerpc/pr71977-1.c	(revision 253176)
> +++ gcc/testsuite/gcc.target/powerpc/pr71977-1.c	(working copy)
> @@ -23,9 +23,9 @@ mask_and_float_var (float f, uint32_t ma
>    return u.value;
>  }
>  
> -/* { dg-final { scan-assembler     "\[ \t\]xxland " } } */
> -/* { dg-final { scan-assembler-not "\[ \t\]and "    } } */
> -/* { dg-final { scan-assembler-not "\[ \t\]mfvsrd " } } */
> -/* { dg-final { scan-assembler-not "\[ \t\]stxv"    } } */
> -/* { dg-final { scan-assembler-not "\[ \t\]lxv"     } } */
> -/* { dg-final { scan-assembler-not "\[ \t\]srdi "   } } */
> +/* { dg-final { scan-assembler     {\mxxland\M}  } } */
> +/* { dg-final { scan-assembler-not {\mand\M}     } } */
> +/* { dg-final { scan-assembler-not {\mmfvsrd\M}  } } */
> +/* { dg-final { scan-assembler-not {\mstxv\M}    } } */
> +/* { dg-final { scan-assembler-not {\mlxv\M}     } } */
> +/* { dg-final { scan-assembler-not {\msrdi\M}    } } */

Careful, you still want to disallow lxvx and stxvx -- so just remove
the \M from those patterns, I'd say (if that works :-) )
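Following that suggestion, the two directives might become (a sketch, assuming
that dropping \M still matches both the plain and the indexed forms):

```
/* { dg-final { scan-assembler-not {\mstxv}  } } */
/* { dg-final { scan-assembler-not {\mlxv}   } } */
```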

Okay for trunk with that fixed.  Thanks,


Segher
Michael Meissner Sept. 26, 2017, 5:59 p.m. UTC | #16
On Tue, Sep 26, 2017 at 11:06:09AM -0500, Segher Boessenkool wrote:
> > @@ -6850,52 +6850,41 @@ (define_insn_and_split "movsi_from_sf"
> >    rtx op1 = operands[1];
> >    rtx op2 = operands[2];
> >    rtx op0_di = gen_rtx_REG (DImode, REGNO (op0));
> > +  rtx op2_si = gen_rtx_REG (SImode, REGNO (op2));
> >  
> 
> Does this work, btw?  I would expect you need reg_or_subregno, for op0
> that is (the new op2 might be fine, not sure).  You do use it in most
> places; please check.

It does work, but using reg_or_subregno is better.  I've changed it.

> >  ;; movsi_from_sf with zero extension
> >  ;;
> >  ;;		RLDICL       LWZ          LFIWZX       LXSIWZX   VSX->GPR
> > -;;		MTVSRWZ      VSX->VSX
> > +;;		VSX->VSX     MTVSRWZ
> >  
> >  (define_insn_and_split "*movdi_from_sf_zero_ext"
> >    [(set (match_operand:DI 0 "gpc_reg_operand"
> >  		"=r,         r,           ?*wI,        ?*wH,     r,
> > -		wIwH,        ?wK")
> > +		 wK,         wIwH")
> 
> This loses the "?", is that on purpose?

No.  I'll put it back (but I do think it is harmless, since the direct move
occurs before it).  In my original patches, I just left the value in a vector
register, and let the register allocator generate the direct move.  However,
when I updated the peephole2 (patch #7), it was a lot easier to write, if I
kept in the direct move option.  But I missed putting the '?' back in.
Thanks.

> >    [(set_attr "type"
> >  		"*,          load,        fpload,      fpload,  mftgpr,
> > -		 mffgpr,     veclogical")
> > +		 vecexts,    mffgpr")
> 
> vecsimple or vecfloat I guess, not vecexts.  We have no way of describing
> it exactly, of course.  Maybe just "two".

Ok.

> Okay for trunk with those things taken care of.  Thanks!
Michael Meissner Sept. 26, 2017, 6:08 p.m. UTC | #17
On Tue, Sep 26, 2017 at 11:36:14AM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Sep 26, 2017 at 10:34:44AM -0400, Michael Meissner wrote:
> > 	* config/rs6000/rs6000.md (movsi_from_df): Optimize converting a
> > 	DFmode to a SFmode, and then needing to move the SFmode to a GPR
> > 	to use the XSCVDPSP instruction instead of FRSP and XSCVDPSPN.
> 
> > --- gcc/config/rs6000/rs6000.md	(revision 253170)
> > +++ gcc/config/rs6000/rs6000.md	(working copy)
> > @@ -6919,6 +6919,26 @@ (define_insn_and_split "*movdi_from_sf_z
> >  		"4,          4,           4,           4,        8,
> >  		 8,          4")])
> >  
> > +;; Like movsi_from_sf, but combine a convert from DFmode to SFmode before
> > +;; moving it to SImode.  We can do a SFmode store without having to do the
> > +;; conversion explicitly.  If we are doing a register->register conversion, use
> > +;; XSCVDPSP instead of XSCVDPSPN, since the former handles cases where the
> > +;; input will not fit in a SFmode, and the later assumes the value has already
> > +;; been rounded.
> > +(define_insn "*movsi_from_df"
> > +  [(set (match_operand:SI 0 "nonimmediate_operand"         "=wa,m,wY,Z")
> > +	(unspec:SI [(float_truncate:SF
> > +		     (match_operand:DF 1 "gpc_reg_operand" "wa, f,wb,wa"))]
> > +		    UNSPEC_SI_FROM_SF))]
> 
> (The indentation is a bit broken here -- DF line is indented a space too
> many, and the constraint strings do not line up).

That must be something funky with patches and tabs.  It looks ok after I apply
the patch (the match_operand:DF is indented one space under float_truncate:SF).
Segher Boessenkool Sept. 26, 2017, 9:56 p.m. UTC | #18
On Tue, Sep 26, 2017 at 10:48:29AM -0400, Michael Meissner wrote:
> 	* config/rs6000/vsx.md (peephole for optimizing move SF to GPR):
> 	Adjust code to eliminate needing to do the shift right 32-bits
> 	operation after XSCVDPSPN.

After staring at this way too long...  Looks correct.  What a monster :-)

Okay for trunk.  Thanks!


Segher
Michael Meissner Sept. 26, 2017, 10:37 p.m. UTC | #19
On Tue, Sep 26, 2017 at 04:56:54PM -0500, Segher Boessenkool wrote:
> On Tue, Sep 26, 2017 at 10:48:29AM -0400, Michael Meissner wrote:
> > 	* config/rs6000/vsx.md (peephole for optimizing move SF to GPR):
> > 	Adjust code to eliminate needing to do the shift right 32-bits
> > 	operation after XSCVDPSPN.
> 
> After staring at this way too long...  Looks correct.  What a monster :-)
> 
> Okay for trunk.  Thanks!

Thanks for taking the time to verify it.

Yeah, it is a monster to get right.  It would be nice to move this off to a
separate MD pass, instead of abusing peephole2s.
Segher Boessenkool Sept. 27, 2017, 9:12 a.m. UTC | #20
On Tue, Sep 26, 2017 at 06:37:17PM -0400, Michael Meissner wrote:
> On Tue, Sep 26, 2017 at 04:56:54PM -0500, Segher Boessenkool wrote:
> > On Tue, Sep 26, 2017 at 10:48:29AM -0400, Michael Meissner wrote:
> > > 	* config/rs6000/vsx.md (peephole for optimizing move SF to GPR):
> > > 	Adjust code to eliminate needing to do the shift right 32-bits
> > > 	operation after XSCVDPSPN.
> > 
> > After staring at this way too long...  Looks correct.  What a monster :-)
> > 
> > Okay for trunk.  Thanks!
> 
> Thanks for taking the time to verify it.
> 
> Yeah, it is a monster to get right.  It would be nice to put this off to a
> separate MD pass, instead of abusing peephole2's.

Or maybe we can teach LRA to help us out here.  LRA dearly needs some
target hooks for the target to tell it what code sequences are good
for the target.


Segher
Patch

Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 252844)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -1781,6 +1781,15 @@  (define_insn "vsx_xscvspdp"
   "xscvspdp %x0,%x1"
   [(set_attr "type" "fp")])
 
+;; Same as vsx_xscvspdp, but use SF as the type
+(define_insn "vsx_xscvspdp_scalar2"
+  [(set (match_operand:SF 0 "vsx_register_operand" "=ww")
+	(unspec:SF [(match_operand:V4SF 1 "vsx_register_operand" "wa")]
+		   UNSPEC_VSX_CVSPDP))]
+  "VECTOR_UNIT_VSX_P (V4SFmode)"
+  "xscvspdp %x0,%x1"
+  [(set_attr "type" "fp")])
+
 ;; Generate xvcvhpsp instruction
 (define_insn "vsx_xvcvhpsp"
   [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa")
@@ -1794,41 +1803,32 @@  (define_insn "vsx_xvcvhpsp"
 ;; format of scalars is actually DF.
 (define_insn "vsx_xscvdpsp_scalar"
   [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa")
-	(unspec:V4SF [(match_operand:SF 1 "vsx_register_operand" "f")]
+	(unspec:V4SF [(match_operand:SF 1 "vsx_register_operand" "ww")]
 		     UNSPEC_VSX_CVSPDP))]
   "VECTOR_UNIT_VSX_P (V4SFmode)"
   "xscvdpsp %x0,%x1"
   [(set_attr "type" "fp")])
 
-;; Same as vsx_xscvspdp, but use SF as the type
-(define_insn "vsx_xscvspdp_scalar2"
-  [(set (match_operand:SF 0 "vsx_register_operand" "=ww")
-	(unspec:SF [(match_operand:V4SF 1 "vsx_register_operand" "wa")]
-		   UNSPEC_VSX_CVSPDP))]
-  "VECTOR_UNIT_VSX_P (V4SFmode)"
-  "xscvspdp %x0,%x1"
-  [(set_attr "type" "fp")])
-
 ;; ISA 2.07 xscvdpspn/xscvspdpn that does not raise an error on signalling NaNs
 (define_insn "vsx_xscvdpspn"
-  [(set (match_operand:V4SF 0 "vsx_register_operand" "=ww,?ww")
-	(unspec:V4SF [(match_operand:DF 1 "vsx_register_operand" "wd,wa")]
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=ww")
+	(unspec:V4SF [(match_operand:DF 1 "vsx_register_operand" "ws")]
 		     UNSPEC_VSX_CVDPSPN))]
   "TARGET_XSCVDPSPN"
   "xscvdpspn %x0,%x1"
   [(set_attr "type" "fp")])
 
 (define_insn "vsx_xscvspdpn"
-  [(set (match_operand:DF 0 "vsx_register_operand" "=ws,?ws")
-	(unspec:DF [(match_operand:V4SF 1 "vsx_register_operand" "wf,wa")]
+  [(set (match_operand:DF 0 "vsx_register_operand" "=ws")
+	(unspec:DF [(match_operand:V4SF 1 "vsx_register_operand" "wa")]
 		   UNSPEC_VSX_CVSPDPN))]
   "TARGET_XSCVSPDPN"
   "xscvspdpn %x0,%x1"
   [(set_attr "type" "fp")])
 
 (define_insn "vsx_xscvdpspn_scalar"
-  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wf,?wa")
-	(unspec:V4SF [(match_operand:SF 1 "vsx_register_operand" "ww,ww")]
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa")
+	(unspec:V4SF [(match_operand:SF 1 "vsx_register_operand" "ww")]
 		     UNSPEC_VSX_CVDPSPN))]
   "TARGET_XSCVDPSPN"
   "xscvdpspn %x0,%x1"
@@ -4773,15 +4773,13 @@  (define_constants
 ;;
 ;; (set (reg:DI reg3) (unspec:DI [(reg:V4SF reg2)] UNSPEC_P8V_RELOAD_FROM_VSX))
 ;;
-;; (set (reg:DI reg3) (lshiftrt:DI (reg:DI reg3) (const_int 32)))
+;; (set (reg:DI reg4) (and:DI (reg:DI reg3) (reg:DI reg3)))
 ;;
-;; (set (reg:DI reg5) (and:DI (reg:DI reg3) (reg:DI reg4)))
+;; (set (reg:DI reg5) (ashift:DI (reg:DI reg4) (const_int 32)))
 ;;
-;; (set (reg:DI reg6) (ashift:DI (reg:DI reg5) (const_int 32)))
+;; (set (reg:SF reg6) (unspec:SF [(reg:DI reg5)] UNSPEC_P8V_MTVSRD))
 ;;
-;; (set (reg:SF reg7) (unspec:SF [(reg:DI reg6)] UNSPEC_P8V_MTVSRD))
-;;
-;; (set (reg:SF reg7) (unspec:SF [(reg:SF reg7)] UNSPEC_VSX_CVSPDPN))
+;; (set (reg:SF reg6) (unspec:SF [(reg:SF reg6)] UNSPEC_VSX_CVSPDPN))
 
 (define_peephole2
   [(match_scratch:DI SFBOOL_TMP_GPR "r")
@@ -4792,11 +4790,6 @@  (define_peephole2
 	(unspec:DI [(match_operand:V4SF SFBOOL_MFVSR_A "vsx_register_operand")]
 		   UNSPEC_P8V_RELOAD_FROM_VSX))
 
-   ;; SRDI
-   (set (match_dup SFBOOL_MFVSR_D)
-	(lshiftrt:DI (match_dup SFBOOL_MFVSR_D)
-		     (const_int 32)))
-
    ;; AND/IOR/XOR operation on int
    (set (match_operand:SI SFBOOL_BOOL_D "int_reg_operand")
 	(and_ior_xor:SI (match_operand:SI SFBOOL_BOOL_A1 "int_reg_operand")
@@ -4820,15 +4813,15 @@  (define_peephole2
    && (REG_P (operands[SFBOOL_BOOL_A2])
        || CONST_INT_P (operands[SFBOOL_BOOL_A2]))
    && (REGNO (operands[SFBOOL_BOOL_D]) == REGNO (operands[SFBOOL_MFVSR_D])
-       || peep2_reg_dead_p (3, operands[SFBOOL_MFVSR_D]))
+       || peep2_reg_dead_p (2, operands[SFBOOL_MFVSR_D]))
    && (REGNO (operands[SFBOOL_MFVSR_D]) == REGNO (operands[SFBOOL_BOOL_A1])
        || (REG_P (operands[SFBOOL_BOOL_A2])
 	   && REGNO (operands[SFBOOL_MFVSR_D])
 		== REGNO (operands[SFBOOL_BOOL_A2])))
    && REGNO (operands[SFBOOL_BOOL_D]) == REGNO (operands[SFBOOL_SHL_A])
    && (REGNO (operands[SFBOOL_SHL_D]) == REGNO (operands[SFBOOL_BOOL_D])
-       || peep2_reg_dead_p (4, operands[SFBOOL_BOOL_D]))
-   && peep2_reg_dead_p (5, operands[SFBOOL_SHL_D])"
+       || peep2_reg_dead_p (3, operands[SFBOOL_BOOL_D]))
+   && peep2_reg_dead_p (4, operands[SFBOOL_SHL_D])"
   [(set (match_dup SFBOOL_TMP_GPR)
 	(ashift:DI (match_dup SFBOOL_BOOL_A_DI)
 		   (const_int 32)))
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 252844)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -986,8 +986,11 @@  (define_insn_and_split "*extendhi<mode>2
 
 
 (define_insn "extendsi<mode>2"
-  [(set (match_operand:EXTSI 0 "gpc_reg_operand" "=r,r,wl,wu,wj,wK,wH")
-	(sign_extend:EXTSI (match_operand:SI 1 "lwa_operand" "Y,r,Z,Z,r,wK,wH")))]
+  [(set (match_operand:EXTSI 0 "gpc_reg_operand"
+		     "=r, r,   wl,    wu,    wj,    wK,     wH,    wr")
+
+	(sign_extend:EXTSI (match_operand:SI 1 "lwa_operand"
+		     "Y,  r,   Z,     Z,     r,     wK,     wH,    ?wIwH")))]
   ""
   "@
    lwa%U1%X1 %0,%1
@@ -996,10 +999,23 @@  (define_insn "extendsi<mode>2"
    lxsiwax %x0,%y1
    mtvsrwa %x0,%1
    vextsw2d %0,%1
+   #
    #"
-  [(set_attr "type" "load,exts,fpload,fpload,mffgpr,vecexts,vecperm")
+  [(set_attr "type" "load,exts,fpload,fpload,mffgpr,vecexts,vecperm,mftgpr")
    (set_attr "sign_extend" "yes")
-   (set_attr "length" "4,4,4,4,4,4,8")])
+   (set_attr "length" "4,4,4,4,4,4,8,8")])
+
+(define_split
+  [(set (match_operand:DI 0 "int_reg_operand")
+	(sign_extend:DI (match_operand:SI 1 "vsx_register_operand")))]
+  "TARGET_DIRECT_MOVE_64BIT && reload_completed"
+  [(set (match_dup 2)
+	(match_dup 1))
+   (set (match_dup 0)
+	(sign_extend:DI (match_dup 2)))]
+{
+  operands[2] = gen_rtx_REG (SImode, reg_or_subregno (operands[0]));
+})
 
 (define_split
   [(set (match_operand:DI 0 "altivec_register_operand")
@@ -6790,25 +6806,25 @@  (define_insn "*movsi_internal1_single"
 ;; needed.
 
 ;;		MR           LWZ          LFIWZX       LXSIWZX   STW
-;;		STFS         STXSSP       STXSSPX      VSX->GPR  MTVSRWZ
-;;		VSX->VSX
+;;		STFS         STXSSP       STXSSPX      VSX->GPR  VSX->VSX,
+;;		MTVSRWZ
 
 (define_insn_and_split "movsi_from_sf"
   [(set (match_operand:SI 0 "nonimmediate_operand"
 		"=r,         r,           ?*wI,        ?*wH,     m,
-		 m,          wY,          Z,           r,        wIwH,
-		 ?wK")
+		 m,          wY,          Z,           r,        ?*wIwH,
+		 wIwH")
 
 	(unspec:SI [(match_operand:SF 1 "input_operand"
 		"r,          m,           Z,           Z,        r,
-		 f,          wb,          wu,          wIwH,     r,
-		 wK")]
+		 f,          wb,          wu,          wIwH,     wIwH,
+		 r")]
 		    UNSPEC_SI_FROM_SF))
 
    (clobber (match_scratch:V4SF 2
 		"=X,         X,           X,           X,        X,
 		 X,          X,           X,           wa,       X,
-		 wa"))]
+		 X"))]
 
   "TARGET_NO_SF_SUBREG
    && (register_operand (operands[0], SImode)
@@ -6823,10 +6839,10 @@  (define_insn_and_split "movsi_from_sf"
    stxssp %1,%0
    stxsspx %x1,%y0
    #
-   mtvsrwz %x0,%1
-   #"
+   xscvdpspn %x0,%x1
+   mtvsrwz %x0,%1"
   "&& reload_completed
-   && register_operand (operands[0], SImode)
+   && int_reg_operand (operands[0], SImode)
    && vsx_reg_sfsubreg_ok (operands[1], SFmode)"
   [(const_int 0)]
 {
@@ -6836,50 +6852,38 @@  (define_insn_and_split "movsi_from_sf"
   rtx op0_di = gen_rtx_REG (DImode, REGNO (op0));
 
   emit_insn (gen_vsx_xscvdpspn_scalar (op2, op1));
-
-  if (int_reg_operand (op0, SImode))
-    {
-      emit_insn (gen_p8_mfvsrd_4_disf (op0_di, op2));
-      emit_insn (gen_lshrdi3 (op0_di, op0_di, GEN_INT (32)));
-    }
-  else
-    {
-      rtx op1_v16qi = gen_rtx_REG (V16QImode, REGNO (op1));
-      rtx byte_off = VECTOR_ELT_ORDER_BIG ? const0_rtx : GEN_INT (12);
-      emit_insn (gen_vextract4b (op0_di, op1_v16qi, byte_off));
-    }
-
+  emit_insn (gen_p8_mfvsrwz_disf (op0_di, op2));
   DONE;
 }
   [(set_attr "type"
 		"*,          load,        fpload,      fpload,   store,
-		 fpstore,    fpstore,     fpstore,     mftgpr,   mffgpr,
-		 veclogical")
+		 fpstore,    fpstore,     fpstore,     mftgpr,   fp,
+		 mffgpr")
 
    (set_attr "length"
 		"4,          4,           4,           4,        4,
-		 4,          4,           4,           12,       4,
-		 8")])
+		 4,          4,           4,           8,        4,
+		 4")])
 
 ;; movsi_from_sf with zero extension
 ;;
 ;;		RLDICL       LWZ          LFIWZX       LXSIWZX   VSX->GPR
-;;		MTVSRWZ      VSX->VSX
+;;		VSX->VSX     MTVSRWZ
 
 (define_insn_and_split "*movdi_from_sf_zero_ext"
   [(set (match_operand:DI 0 "gpc_reg_operand"
 		"=r,         r,           ?*wI,        ?*wH,     r,
-		wIwH,        ?wK")
+		 wK,         wIwH")
 
 	(zero_extend:DI
 	 (unspec:SI [(match_operand:SF 1 "input_operand"
 		"r,          m,           Z,           Z,        wIwH,
-		 r,          wK")]
+		 wIwH,       r")]
 		    UNSPEC_SI_FROM_SF)))
 
    (clobber (match_scratch:V4SF 2
 		"=X,         X,           X,           X,        wa,
-		 X,          wa"))]
+		 X,          X"))]
 
   "TARGET_DIRECT_MOVE_64BIT
    && (register_operand (operands[0], DImode)
@@ -6890,9 +6894,10 @@  (define_insn_and_split "*movdi_from_sf_z
    lfiwzx %0,%y1
    lxsiwzx %x0,%y1
    #
-   mtvsrwz %x0,%1
-   #"
+   #
+   mtvsrwz %x0,%1"
   "&& reload_completed
+   && register_operand (operands[0], DImode)
    && vsx_reg_sfsubreg_ok (operands[1], SFmode)"
   [(const_int 0)]
 {
@@ -6901,29 +6906,43 @@  (define_insn_and_split "*movdi_from_sf_z
   rtx op2 = operands[2];
 
   emit_insn (gen_vsx_xscvdpspn_scalar (op2, op1));
-
   if (int_reg_operand (op0, DImode))
-    {
-      emit_insn (gen_p8_mfvsrd_4_disf (op0, op2));
-      emit_insn (gen_lshrdi3 (op0, op0, GEN_INT (32)));
-    }
+    emit_insn (gen_p8_mfvsrwz_disf (op0, op2));
   else
     {
-      rtx op0_si = gen_rtx_REG (SImode, REGNO (op0));
-      rtx op1_v16qi = gen_rtx_REG (V16QImode, REGNO (op1));
-      rtx byte_off = VECTOR_ELT_ORDER_BIG ? const0_rtx : GEN_INT (12);
-      emit_insn (gen_vextract4b (op0_si, op1_v16qi, byte_off));
+      rtx op2_si = gen_rtx_REG (SImode, reg_or_subregno (op2));
+      emit_insn (gen_zero_extendsidi2 (op0, op2_si));
     }
 
   DONE;
 }
   [(set_attr "type"
 		"*,          load,        fpload,      fpload,  mftgpr,
-		 mffgpr,     veclogical")
+		 vecexts,    mffgpr")
 
    (set_attr "length"
-		"4,          4,           4,           4,        12,
-		 4,          8")])
+		"4,          4,           4,           4,        8,
+		 8,          4")])
+
+;; Like movsi_from_sf, but combine a convert from DFmode to SFmode before
+;; moving it to SImode.  We can do a SFmode store without having to do the
+;; conversion explicitly.  If we are doing a register->register conversion, use
+;; XSCVDPSP instead of XSCVDPSPN, since the former handles cases where the
+;; input will not fit in SFmode, and the latter assumes the value has already
+;; been rounded.
+(define_insn "*movsi_from_df"
+  [(set (match_operand:SI 0 "nonimmediate_operand"         "=wa,m,wY,Z")
+	(unspec:SI [(float_truncate:SF
+		     (match_operand:DF 1 "gpc_reg_operand" "wa, f,wb,wa"))]
+		    UNSPEC_SI_FROM_SF))]
+
+  "TARGET_NO_SF_SUBREG"
+  "@
+   xscvdpsp %x0,%x1
+   stfs%U0%X0 %1,%0
+   stxssp %1,%0
+   stxsspx %x1,%y0"
+  [(set_attr "type"   "fp,fpstore,fpstore,fpstore")])
 
 ;; Split a load of a large constant into the appropriate two-insn
 ;; sequence.
@@ -8437,19 +8456,20 @@  (define_insn_and_split "reload_gpr_from_
   rtx diop0 = simplify_gen_subreg (DImode, op0, SFmode, 0);
 
   emit_insn (gen_vsx_xscvdpspn_scalar (op2, op1));
-  emit_insn (gen_p8_mfvsrd_4_disf (diop0, op2));
-  emit_insn (gen_lshrdi3 (diop0, diop0, GEN_INT (32)));
+  emit_insn (gen_p8_mfvsrwz_disf (diop0, op2));
   DONE;
 }
   [(set_attr "length" "12")
    (set_attr "type" "three")])
 
-(define_insn "p8_mfvsrd_4_disf"
+;; XSCVDPSPN puts the 32-bit value in both the first and second words, so we do
+;; not need to do a shift to extract the value.
+(define_insn "p8_mfvsrwz_disf"
   [(set (match_operand:DI 0 "register_operand" "=r")
 	(unspec:DI [(match_operand:V4SF 1 "register_operand" "wa")]
 		   UNSPEC_P8V_RELOAD_FROM_VSX))]
   "TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
-  "mfvsrd %0,%x1"
+  "mfvsrwz %0,%x1"
   [(set_attr "type" "mftgpr")])
 
 
Index: gcc/testsuite/gcc.target/powerpc/pr71977-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr71977-1.c	(revision 252844)
+++ gcc/testsuite/gcc.target/powerpc/pr71977-1.c	(working copy)
@@ -23,9 +23,9 @@  mask_and_float_var (float f, uint32_t ma
   return u.value;
 }
 
-/* { dg-final { scan-assembler     "\[ \t\]xxland " } } */
-/* { dg-final { scan-assembler-not "\[ \t\]and "    } } */
-/* { dg-final { scan-assembler-not "\[ \t\]mfvsrd " } } */
-/* { dg-final { scan-assembler-not "\[ \t\]stxv"    } } */
-/* { dg-final { scan-assembler-not "\[ \t\]lxv"     } } */
-/* { dg-final { scan-assembler-not "\[ \t\]srdi "   } } */
+/* { dg-final { scan-assembler     {\mxxland\M}  } } */
+/* { dg-final { scan-assembler-not {\mand\M}     } } */
+/* { dg-final { scan-assembler-not {\mmfvsrwz\M} } } */
+/* { dg-final { scan-assembler-not {\mstxv\M}    } } */
+/* { dg-final { scan-assembler-not {\mlxv\M}     } } */
+/* { dg-final { scan-assembler-not {\msrdi\M}    } } */
Index: gcc/testsuite/gcc.target/powerpc/direct-move-float1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/direct-move-float1.c	(revision 252844)
+++ gcc/testsuite/gcc.target/powerpc/direct-move-float1.c	(working copy)
@@ -5,7 +5,7 @@ 
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
 /* { dg-options "-mcpu=power8 -O2" } */
 /* { dg-final { scan-assembler "mtvsrd" } } */
-/* { dg-final { scan-assembler "mfvsrd" } } */
+/* { dg-final { scan-assembler "mfvsrwz" } } */
 /* { dg-final { scan-assembler "xscvdpspn" } } */
 /* { dg-final { scan-assembler "xscvspdpn" } } */
 
Index: gcc/testsuite/gcc.target/powerpc/direct-move-float3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/direct-move-float3.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/direct-move-float3.c	(revision 0)
@@ -0,0 +1,28 @@ 
+/* { dg-do compile { target { powerpc*-*-linux* && lp64 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-skip-if "" { powerpc*-*-*spe* } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mpower8-vector -O2" } */
+
+/* Test that we generate XSCVDPSP instead of FRSP and XSCVDPSPN when we combine
+   a round from double to float and moving the float value to a GPR.  */
+
+union u {
+  float f;
+  unsigned int ui;
+  int si;
+};
+
+unsigned int
+ui_d (double d)
+{
+  union u x;
+  x.f = d;
+  return x.ui;
+}
+
+/* { dg-final { scan-assembler     {\mmfvsrwz\M}   } } */
+/* { dg-final { scan-assembler     {\mxscvdpsp\M}  } } */
+/* { dg-final { scan-assembler-not {\mmtvsrd\M}    } } */
+/* { dg-final { scan-assembler-not {\mxscvdpspn\M} } } */
+/* { dg-final { scan-assembler-not {\msrdi\M}      } } */