diff mbox

[RS6000] reload_vsx_from_gprsf splitter

Message ID 20160215123653.GG10888@bubble.grove.modra.org
State New
Headers show

Commit Message

Alan Modra Feb. 15, 2016, 12:36 p.m. UTC
On Fri, Feb 12, 2016 at 02:57:22PM +0100, Ulrich Weigand wrote:
> > On Fri, Feb 12, 2016 at 08:54:19AM +1030, Alan Modra wrote:
> > > Another concern I had about this, besides using %L in asm output (what
> > > forces TFmode to use just fprs?), is what happens when we're using
> > > IEEE 128-bit floats?  In that case it looks like we'd get just one reg.
> > 
> > Good point that it breaks if the default long double (TFmode) type is IEEE
> > 128-bit floating point.  We would need to have two patterns, one that uses
> > TFmode and one that uses IFmode.  I wrote the power8 direct move stuff before
> > going down the road of IEEE 128-bit floating point.
> 
> Right.  It's a bit unfortunate that we can't just use IFmode unconditionally,
> but it seems rs6000_scalar_mode_supported_p (IFmode) may return false, and
> then we probably shouldn't be using it.

Actually, we can use IFmode unconditionally.  scalar_mode_supported_p
is relevant only up to and including expand.  Nothing prevents the
backend from using IFmode.

> Another option might be to use TDmode to allocate a scratch register pair.

That won't work, at least if we want to extract the two component regs
with simplify_gen_subreg, due to rs6000_cannot_change_mode_class.  In
my original patch I just extracted the regs by using gen_rtx_REG but I
changed that, based on your criticism of using gen_rtx_REG in
reload_vsx_from_gprsf, and because rs6000.md avoids gen_rtx_REG using
operand regnos in other places.  That particular change is of course
entirely cosmetic.  I also changed reload_vsx_from_gprsf to avoid mode
punning regs, instead duplicating insn patterns as done elsewhere in
the vsx support.  I don't believe we will see subregs of vsx or fp
regs after reload, but I'm quite willing to concede the point for a
stage4 fix.

Here's the revised patch.  To recap, the main bug fixes here are:
- stop reload_vsx_from_gprsf splitter from emitting a move not
handled by movdi_internal64
- don't use TFmode, which cannot now be assumed to be IBM
double-double.
Secondary to that, not using or passing around TFmode means the %L
restriction no longer matters, and constraints on the reload temp reg
can be relaxed.

Bootstrapped and regression tested powerpc64-linux biarch and
powerpc64le-linux.  OK David?

	PR target/68973
	* config/rs6000/rs6000.md (reload_vsx_from_gprsf): Use p8_mtvsrd_sf
	rather than attempting to use movdi_internal64.  Remove op0_di.
	(p8_mtvsrd_df, p8_mtvsrd_sf): New.
	(p8_mtvsrd_1, p8_mtvsrd_2): Delete.
	(p8_mtvsrwz): New.
	(p8_mtvsrwz_1, p8_mtvsrwz_2): Delete.
	(p8_xxpermdi_<mode>): Take two DF inputs rather than one TF.
	(p8_fmrgow_<mode>): Likewise.
	(reload_vsx_from_gpr<mode>): Make clobber IF.  Adjust for above
	changes.
	(reload_fpr_from_gpr<mode>): Similarly. Use "d" for op0 constraint.
	* config/rs6000/vsx.md (vsx_xscvspdpn_directmove): Make op1 SFmode.

Comments

David Edelsohn Feb. 15, 2016, 2:42 p.m. UTC | #1
On Mon, Feb 15, 2016 at 4:36 AM, Alan Modra <amodra@gmail.com> wrote:
> On Fri, Feb 12, 2016 at 02:57:22PM +0100, Ulrich Weigand wrote:
>> > On Fri, Feb 12, 2016 at 08:54:19AM +1030, Alan Modra wrote:
>> > > Another concern I had about this, besides using %L in asm output (what
>> > > forces TFmode to use just fprs?), is what happens when we're using
>> > > IEEE 128-bit floats?  In that case it looks like we'd get just one reg.
>> >
>> > Good point that it breaks if the default long double (TFmode) type is IEEE
>> > 128-bit floating point.  We would need to have two patterns, one that uses
>> > TFmode and one that uses IFmode.  I wrote the power8 direct move stuff before
>> > going down the road of IEEE 128-bit floating point.
>>
>> Right.  It's a bit unfortunate that we can't just use IFmode unconditionally,
>> but it seems rs6000_scalar_mode_supported_p (IFmode) may return false, and
>> then we probably shouldn't be using it.
>
> Actually, we can use IFmode unconditionally.  scalar_mode_supported_p
> is relevant only up to and including expand.  Nothing prevents the
> backend from using IFmode.
>
>> Another option might be to use TDmode to allocate a scratch register pair.
>
> That won't work, at least if we want to extract the two component regs
> with simplify_gen_subreg, due to rs6000_cannot_change_mode_class.  In
> my original patch I just extracted the regs by using gen_rtx_REG but I
> changed that, based on your criticism of using gen_rtx_REG in
> reload_vsx_from_gprsf, and because rs6000.md avoids gen_rtx_REG using
> operand regnos in other places.  That particular change is of course
> entirely cosmetic.  I also changed reload_vsx_from_gprsf to avoid mode
> punning regs, instead duplicating insn patterns as done elsewhere in
> the vsx support.  I don't believe we will see subregs of vsx or fp
> regs after reload, but I'm quite willing to concede the point for a
> stage4 fix.
>
> Here's the revised patch.  To recap, the main bug fixes here are:
> - stop reload_vsx_from_gprsf splitter from emitting a move not
> handled by movdi_internal64
> - don't use TFmode, which cannot now be assumed to be IBM
> double-double.
> Secondary to that, not using or passing around TFmode means the %L
> restriction no longer matters, and constraints on the reload temp reg
> can be relaxed.
>
> Bootstrapped and regression tested powerpc64-linux biarch and
> powerpc64le-linux.  OK David?
>
>         PR target/68973
>         * config/rs6000/rs6000.md (reload_vsx_from_gprsf): Use p8_mtvsrd_sf
>         rather than attempting to use movdi_internal64.  Remove op0_di.
>         (p8_mtvsrd_df, p8_mtvsrd_sf): New.
>         (p8_mtvsrd_1, p8_mtvsrd_2): Delete.
>         (p8_mtvsrwz): New.
>         (p8_mtvsrwz_1, p8_mtvsrwz_2): Delete.
>         (p8_xxpermdi_<mode>): Take two DF inputs rather than one TF.
>         (p8_fmrgow_<mode>): Likewise.
>         (reload_vsx_from_gpr<mode>): Make clobber IF.  Adjust for above
>         changes.
>         (reload_fpr_from_gpr<mode>): Similarly. Use "d" for op0 constraint.
>         * config/rs6000/vsx.md (vsx_xscvspdpn_directmove): Make op1 SFmode.
>

Okay.

Is there still an issue with the constraints used for movdi_internal64?

Thanks, David
Alan Modra Feb. 16, 2016, 12:24 a.m. UTC | #2
On Mon, Feb 15, 2016 at 06:42:35AM -0800, David Edelsohn wrote:
> Is there still an issue with the constraints used for movdi_internal64?

Yes and no.  No because we shouldn't be attempting DI moves between vsx
regs and gprs.  Yes because we ought to allow DImode in vsx regs, but
fixing that is likely not trivial.

Do we want to backport the PR68973 fixes to gcc-5 and gcc-4.9?  We are
exposed to the reload_vsx_from_gprsf bug there, I think, but TFmode
won't be IEEE.
David Edelsohn Feb. 16, 2016, 1:29 a.m. UTC | #3
On Mon, Feb 15, 2016 at 4:24 PM, Alan Modra <amodra@gmail.com> wrote:
> On Mon, Feb 15, 2016 at 06:42:35AM -0800, David Edelsohn wrote:
>> Is there still an issue with the constraints used for movdi_internal64?
>
> Yes and no.  No because we shouldn't be attempting DI moves between vsx
> regs and gprs.  Yes because we ought to allow DImode in vsx regs, but
> fixing that is likely not trivial.
>
> Do we want to backport the PR68973 fixes to gcc-5 and gcc-4.9?  We are
> exposed to the reload_vsx_from_gprsf bug there, I think, but TFmode
> won't be IEEE.

Backporting to 5 and 4.9 branches is okay with me.

Thanks, David
diff mbox

Patch

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index cdbf873..ec356cb 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -7488,41 +7488,31 @@ 
 ;; value, since it is allocated in reload and not all of the flow information
 ;; is setup for it.  We have two patterns to do the two moves between gprs and
 ;; fprs.  There isn't a dependancy between the two, but we could potentially
-;; schedule other instructions between the two instructions.  TFmode is
-;; currently limited to traditional FPR registers.  If/when this is changed, we
-;; will need to revist %L to make sure it works with VSX registers, or add an
-;; %x version of %L.
+;; schedule other instructions between the two instructions.
 
 (define_insn "p8_fmrgow_<mode>"
   [(set (match_operand:FMOVE64X 0 "register_operand" "=d")
-	(unspec:FMOVE64X [(match_operand:TF 1 "register_operand" "d")]
+	(unspec:FMOVE64X [
+		(match_operand:DF 1 "register_operand" "d")
+		(match_operand:DF 2 "register_operand" "d")]
 			 UNSPEC_P8V_FMRGOW))]
   "!TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
-  "fmrgow %0,%1,%L1"
+  "fmrgow %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
-(define_insn "p8_mtvsrwz_1"
-  [(set (match_operand:TF 0 "register_operand" "=d")
-	(unspec:TF [(match_operand:SI 1 "register_operand" "r")]
+(define_insn "p8_mtvsrwz"
+  [(set (match_operand:DF 0 "register_operand" "=d")
+	(unspec:DF [(match_operand:SI 1 "register_operand" "r")]
 		   UNSPEC_P8V_MTVSRWZ))]
   "!TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
   "mtvsrwz %x0,%1"
   [(set_attr "type" "mftgpr")])
 
-(define_insn "p8_mtvsrwz_2"
-  [(set (match_operand:TF 0 "register_operand" "+d")
-	(unspec:TF [(match_dup 0)
-		    (match_operand:SI 1 "register_operand" "r")]
-		   UNSPEC_P8V_MTVSRWZ))]
-  "!TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
-  "mtvsrwz %L0,%1"
-  [(set_attr "type" "mftgpr")])
-
 (define_insn_and_split "reload_fpr_from_gpr<mode>"
-  [(set (match_operand:FMOVE64X 0 "register_operand" "=ws")
+  [(set (match_operand:FMOVE64X 0 "register_operand" "=d")
 	(unspec:FMOVE64X [(match_operand:FMOVE64X 1 "register_operand" "r")]
 			 UNSPEC_P8V_RELOAD_FROM_GPR))
-   (clobber (match_operand:TF 2 "register_operand" "=d"))]
+   (clobber (match_operand:IF 2 "register_operand" "=d"))]
   "!TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
   "#"
   "&& reload_completed"
@@ -7530,42 +7520,36 @@ 
 {
   rtx dest = operands[0];
   rtx src = operands[1];
-  rtx tmp = operands[2];
+  rtx tmp_hi = simplify_gen_subreg (DFmode, operands[2], IFmode, 0);
+  rtx tmp_lo = simplify_gen_subreg (DFmode, operands[2], IFmode, 8);
   rtx gpr_hi_reg = gen_highpart (SImode, src);
   rtx gpr_lo_reg = gen_lowpart (SImode, src);
 
-  emit_insn (gen_p8_mtvsrwz_1 (tmp, gpr_hi_reg));
-  emit_insn (gen_p8_mtvsrwz_2 (tmp, gpr_lo_reg));
-  emit_insn (gen_p8_fmrgow_<mode> (dest, tmp));
+  emit_insn (gen_p8_mtvsrwz (tmp_hi, gpr_hi_reg));
+  emit_insn (gen_p8_mtvsrwz (tmp_lo, gpr_lo_reg));
+  emit_insn (gen_p8_fmrgow_<mode> (dest, tmp_hi, tmp_lo));
   DONE;
 }
   [(set_attr "length" "12")
    (set_attr "type" "three")])
 
 ;; Move 128 bit values from GPRs to VSX registers in 64-bit mode
-(define_insn "p8_mtvsrd_1"
-  [(set (match_operand:TF 0 "register_operand" "=ws")
-	(unspec:TF [(match_operand:DI 1 "register_operand" "r")]
-		   UNSPEC_P8V_MTVSRD))]
-  "TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
-  "mtvsrd %0,%1"
-  [(set_attr "type" "mftgpr")])
-
-(define_insn "p8_mtvsrd_2"
-  [(set (match_operand:TF 0 "register_operand" "+ws")
-	(unspec:TF [(match_dup 0)
-		    (match_operand:DI 1 "register_operand" "r")]
+(define_insn "p8_mtvsrd_df"
+  [(set (match_operand:DF 0 "register_operand" "=wa")
+	(unspec:DF [(match_operand:DI 1 "register_operand" "r")]
 		   UNSPEC_P8V_MTVSRD))]
   "TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
-  "mtvsrd %L0,%1"
+  "mtvsrd %x0,%1"
   [(set_attr "type" "mftgpr")])
 
 (define_insn "p8_xxpermdi_<mode>"
   [(set (match_operand:FMOVE128_GPR 0 "register_operand" "=wa")
-	(unspec:FMOVE128_GPR [(match_operand:TF 1 "register_operand" "ws")]
-			     UNSPEC_P8V_XXPERMDI))]
+	(unspec:FMOVE128_GPR [
+		(match_operand:DF 1 "register_operand" "wa")
+		(match_operand:DF 2 "register_operand" "wa")]
+		UNSPEC_P8V_XXPERMDI))]
   "TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
-  "xxpermdi %x0,%1,%L1,0"
+  "xxpermdi %x0,%x1,%x2,0"
   [(set_attr "type" "vecperm")])
 
 (define_insn_and_split "reload_vsx_from_gpr<mode>"
@@ -7573,7 +7557,7 @@ 
 	(unspec:FMOVE128_GPR
 	 [(match_operand:FMOVE128_GPR 1 "register_operand" "r")]
 	 UNSPEC_P8V_RELOAD_FROM_GPR))
-   (clobber (match_operand:TF 2 "register_operand" "=ws"))]
+   (clobber (match_operand:IF 2 "register_operand" "=wa"))]
   "TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
   "#"
   "&& reload_completed"
@@ -7581,13 +7565,17 @@ 
 {
   rtx dest = operands[0];
   rtx src = operands[1];
-  rtx tmp = operands[2];
+  /* You might think that we could use op0 as one temp and a DF clobber
+     as op2, but you'd be wrong.  Secondary reload move patterns don't
+     check for overlap of the clobber and the destination.  */
+  rtx tmp_hi = simplify_gen_subreg (DFmode, operands[2], IFmode, 0);
+  rtx tmp_lo = simplify_gen_subreg (DFmode, operands[2], IFmode, 8);
   rtx gpr_hi_reg = gen_highpart (DImode, src);
   rtx gpr_lo_reg = gen_lowpart (DImode, src);
 
-  emit_insn (gen_p8_mtvsrd_1 (tmp, gpr_hi_reg));
-  emit_insn (gen_p8_mtvsrd_2 (tmp, gpr_lo_reg));
-  emit_insn (gen_p8_xxpermdi_<mode> (dest, tmp));
+  emit_insn (gen_p8_mtvsrd_df (tmp_hi, gpr_hi_reg));
+  emit_insn (gen_p8_mtvsrd_df (tmp_lo, gpr_lo_reg));
+  emit_insn (gen_p8_xxpermdi_<mode> (dest, tmp_hi, tmp_lo));
   DONE;
 }
   [(set_attr "length" "12")
@@ -7608,6 +7596,13 @@ 
 ;; Move SFmode to a VSX from a GPR register.  Because scalar floating point
 ;; type is stored internally as double precision in the VSX registers, we have
 ;; to convert it from the vector format.
+(define_insn "p8_mtvsrd_sf"
+  [(set (match_operand:SF 0 "register_operand" "=wa")
+	(unspec:SF [(match_operand:DI 1 "register_operand" "r")]
+		   UNSPEC_P8V_MTVSRD))]
+  "TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
+  "mtvsrd %x0,%1"
+  [(set_attr "type" "mftgpr")])
 
 (define_insn_and_split "reload_vsx_from_gprsf"
   [(set (match_operand:SF 0 "register_operand" "=wa")
@@ -7622,16 +7617,12 @@ 
   rtx op0 = operands[0];
   rtx op1 = operands[1];
   rtx op2 = operands[2];
-  /* Also use the destination register to hold the unconverted DImode value.
-     This is conceptually a separate value from OP0, so we use gen_rtx_REG
-     rather than simplify_gen_subreg.  */
-  rtx op0_di = gen_rtx_REG (DImode, REGNO (op0));
   rtx op1_di = simplify_gen_subreg (DImode, op1, SFmode, 0);
 
   /* Move SF value to upper 32-bits for xscvspdpn.  */
   emit_insn (gen_ashldi3 (op2, op1_di, GEN_INT (32)));
-  emit_move_insn (op0_di, op2);
-  emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0_di));
+  emit_insn (gen_p8_mtvsrd_sf (op0, op2));
+  emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0));
   DONE;
 }
   [(set_attr "length" "8")
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 997ff31..45af233 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1521,7 +1521,7 @@ 
 ;; Used by direct move to move a SFmode value from GPR to VSX register
 (define_insn "vsx_xscvspdpn_directmove"
   [(set (match_operand:SF 0 "vsx_register_operand" "=wa")
-	(unspec:SF [(match_operand:DI 1 "vsx_register_operand" "wa")]
+	(unspec:SF [(match_operand:SF 1 "vsx_register_operand" "wa")]
 		   UNSPEC_VSX_CVSPDPN))]
   "TARGET_XSCVSPDPN"
   "xscvspdpn %x0,%x1"