Add support for sparc VIS3 fp<-->int moves.

Submitter David Miller
Date Oct. 24, 2011, 3:53 a.m.
Message ID <20111023.235321.1404597026148966447.davem@davemloft.net>
Permalink /patch/121269/
State New

Comments

David Miller - Oct. 24, 2011, 3:53 a.m.
The non-trivial aspects (and what took the most time for me) of these
changes are:

1) Getting the register move costs and class preferencing right such
   that the VIS3 moves do get effectively used for incoming
   float/vector argument passing on 32-bit, yet IRA and reload don't
   go nuts allocating integer registers to float/vector mode values
   and vice versa.

   Non-optimized compiles are particularly sensitive to this because
   there are simply a lot of moves that never get cleaned up.  We
   might have 6 moves, 3 on each side of a single real calculation,
   so in the IRA costs the register classes of the moves dominate.

2) Making sure we don't merge a VIS3 move into a restore instruction.

3) Dealing with the restriction that we can't operate on 32-bit pieces
   of values contained in the upper 32 v9 float registers.

   We deal with this in two ways.

   First, we indicate an FP_REGS or GENERAL_OR_FP_REGS preferred
   reload class when we see reload try to load an integer register
   into class EXTRA_FP_REGS or GENERAL_OR_EXTRA_FP_REGS.

   Second, we teach reload that if it tries to move between float and
   integer regs, and some register class involving EXTRA_FP_REGS is
   involved, an intermediate FP_REGS class register may be needed to
   successfully complete the reload.

The rest is mostly mechanical work of splitting the existing v9/64-bit
move patterns into non-vis3 and vis3 variants.

Because float arguments are passed in the integer %o registers on
32-bit, these instructions help a lot.  This is evident even in the
simplest examples; this C code:

float fnegs (float a) { return -a; }
double fnegd (double a) { return -a; }

would generate:

fnegs:
        add     %sp, -104, %sp
        st      %o0, [%sp+100]
        ld      [%sp+100], %f8
        sub     %sp, -104, %sp
        jmp     %o7+8
         fnegs  %f8, %f0
fnegd:
        add     %sp, -104, %sp
        std     %o0, [%sp+96]
        ldd     [%sp+96], %f8
        sub     %sp, -104, %sp
        jmp     %o7+8
         fnegd  %f8, %f0

but with VIS3 moves we get:

fnegs:
        movwtos %o0, %f8
        jmp     %o7+8
         fnegs  %f8, %f0
fnegd:
        movwtos %o0, %f8
        movwtos %o1, %f9
        jmp     %o7+8
         fnegd  %f8, %f0

And with our good friend pdist.c we get the following code for
function 'foo' with VIS3 moves:

foo:
        fzero   %f8
        movwtos %o0, %f10
        movwtos %o1, %f11
        movwtos %o2, %f12
        movwtos %o3, %f13
        pdist   %f10, %f12, %f8
        movstouw        %f8, %o0
        jmp     %o7+8
         movstouw       %f9, %o1

Another good example of significantly improved code generation is
the output for libgcc2.c:_mulsc3()

Of course, sometimes we generate spurious secondary reloads because
the use of the EXTRA_FP_REGS (and GENERAL_OR_EXTRA_FP_REGS) register
class doesn't necessarily result in using one of the upper 32 v9 float
registers.  Maybe if we used segregated register classes for the lower
and upper float regs we could attack this issue effectively.

In the future, these VIS3 patterns can also be used for more crafty
constant and non-constant vec_init sequences.

This was regstrapped both with the compiler defaulting to vis3, and
without.

Committed to trunk.

gcc/

	* config/sparc/sparc.h (SECONDARY_MEMORY_NEEDED): We can move
	between float and non-float regs when VIS3.
	* config/sparc/sparc.c (eligible_for_restore_insn): We can't
	use a restore when the source is a float register.
	(sparc_split_regreg_legitimate): When VIS3 allow moves between
	float and integer regs.
	(sparc_register_move_cost): Adjust to account for VIS3 moves.
	(sparc_preferred_reload_class): On 32-bit with VIS3 when moving an
	integer reg to a class containing EXTRA_FP_REGS, constrain to
	FP_REGS.
	(sparc_secondary_reload): On 32-bit with VIS3 when moving between
	float and integer regs we sometimes need a FP_REGS class
	intermediate move to satisfy the reload.  When this happens
	specify an extra cost of 2.
	(*movsi_insn): Rename to have "_novis3" suffix and add !VIS3
	guard.
	(*movdi_insn_sp32_v9): Likewise.
	(*movdi_insn_sp64): Likewise.
	(*movsf_insn): Likewise.
	(*movdf_insn_sp32_v9): Likewise.
	(*movdf_insn_sp64): Likewise.
	(*zero_extendsidi2_insn_sp64): Likewise.
	(*sign_extendsidi2_insn): Likewise.
	(*movsi_insn_vis3): New insn.
	(*movdi_insn_sp32_v9_vis3): New insn.
	(*movdi_insn_sp64_vis3): New insn.
	(*movsf_insn_vis3): New insn.
	(*movdf_insn_sp32_v9_vis3): New insn.
	(*movdf_insn_sp64_vis3): New insn.
	(*zero_extendsidi2_insn_sp64_vis3): New insn.
	(*sign_extendsidi2_insn_vis3): New insn.
	(TFmode reg/reg split): Make sure both REG operands are float.
	(*mov<VM32:mode>_insn): Add "_novis3" suffix and !VIS3 guard. Remove
	easy constant to integer reg alternatives.
	(*mov<VM64:mode>_insn_sp64): Likewise.
	(*mov<VM64:mode>_insn_sp32_novis3): Likewise.
	(*mov<VM32:mode>_insn_vis3): New insn.
	(*mov<VM64:mode>_insn_sp64_vis3): New insn.
	(*mov<VM64:mode>_insn_sp32_vis3): New insn.
	(VM64 reg<-->reg split): New splitter for 32-bit.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@180360 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog             |   41 +++++
 gcc/config/sparc/sparc.c  |   85 ++++++++++-
 gcc/config/sparc/sparc.h  |    9 +-
 gcc/config/sparc/sparc.md |  375 +++++++++++++++++++++++++++++++++++++++++----
 4 files changed, 469 insertions(+), 41 deletions(-)
Richard Henderson - Oct. 24, 2011, 9:05 p.m.
On 10/23/2011 08:53 PM, David Miller wrote:
> -(define_insn "*movsi_insn"
> +(define_insn "*movsi_insn_novis3"
>    [(set (match_operand:SI 0 "nonimmediate_operand" "=r,r,r,m,!f,!f,!m,d,d")
>  	(match_operand:SI 1 "input_operand"   "rI,K,m,rJ,f,m,f,J,P"))]
> -  "(register_operand (operands[0], SImode)
> -    || register_or_zero_or_all_ones_operand (operands[1], SImode))"
> +  "(! TARGET_VIS3
> +    && (register_operand (operands[0], SImode)
> +        || register_or_zero_or_all_ones_operand (operands[1], SImode)))"
>    "@
>     mov\t%1, %0
>     sethi\t%%hi(%a1), %0
> @@ -1329,6 +1330,26 @@
>     fones\t%0"
>    [(set_attr "type" "*,*,load,store,fpmove,fpload,fpstore,fga,fga")])
>  
> +(define_insn "*movsi_insn_vis3"
> +  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,r,r, m, r,*f,*f,*f, m,d,d")
> +	(match_operand:SI 1 "input_operand"        "rI,K,m,rJ,*f, r, f, m,*f,J,P"))]
> +  "(TARGET_VIS3
> +    && (register_operand (operands[0], SImode)
> +        || register_or_zero_or_all_ones_operand (operands[1], SImode)))"
> +  "@
> +   mov\t%1, %0
> +   sethi\t%%hi(%a1), %0
> +   ld\t%1, %0
> +   st\t%r1, %0
> +   movstouw\t%1, %0
> +   movwtos\t%1, %0
> +   fmovs\t%1, %0
> +   ld\t%1, %0
> +   st\t%1, %0
> +   fzeros\t%0
> +   fones\t%0"
> +  [(set_attr "type" "*,*,load,store,*,*,fpmove,fpload,fpstore,fga,fga")])

You shouldn't need to split these anymore.  See the enabled attribute, as
used on several other targets so far.


r~
David Miller - Oct. 24, 2011, 9:18 p.m.
From: Richard Henderson <rth@redhat.com>
Date: Mon, 24 Oct 2011 14:05:28 -0700

> You shouldn't need to split these anymore.  See the enabled attribute, as
> used on several other targets so far.

See the patch I posted 2 hours after this one.

Patch

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index dfa4caf..1842402 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,46 @@ 
 2011-10-23  David S. Miller  <davem@davemloft.net>
 
+	* config/sparc/sparc.h (SECONDARY_MEMORY_NEEDED): We can move
+	between float and non-float regs when VIS3.
+	* config/sparc/sparc.c (eligible_for_restore_insn): We can't
+	use a restore when the source is a float register.
+	(sparc_split_regreg_legitimate): When VIS3 allow moves between
+	float and integer regs.
+	(sparc_register_move_cost): Adjust to account for VIS3 moves.
+	(sparc_preferred_reload_class): On 32-bit with VIS3 when moving an
+	integer reg to a class containing EXTRA_FP_REGS, constrain to
+	FP_REGS.
+	(sparc_secondary_reload): On 32-bit with VIS3 when moving between
+	float and integer regs we sometimes need a FP_REGS class
+	intermediate move to satisfy the reload.  When this happens
+	specify an extra cost of 2.
+	(*movsi_insn): Rename to have "_novis3" suffix and add !VIS3
+	guard.
+	(*movdi_insn_sp32_v9): Likewise.
+	(*movdi_insn_sp64): Likewise.
+	(*movsf_insn): Likewise.
+	(*movdf_insn_sp32_v9): Likewise.
+	(*movdf_insn_sp64): Likewise.
+	(*zero_extendsidi2_insn_sp64): Likewise.
+	(*sign_extendsidi2_insn): Likewise.
+	(*movsi_insn_vis3): New insn.
+	(*movdi_insn_sp32_v9_vis3): New insn.
+	(*movdi_insn_sp64_vis3): New insn.
+	(*movsf_insn_vis3): New insn.
+	(*movdf_insn_sp32_v9_vis3): New insn.
+	(*movdf_insn_sp64_vis3): New insn.
+	(*zero_extendsidi2_insn_sp64_vis3): New insn.
+	(*sign_extendsidi2_insn_vis3): New insn.
+	(TFmode reg/reg split): Make sure both REG operands are float.
+	(*mov<VM32:mode>_insn): Add "_novis3" suffix and !VIS3 guard. Remove
+	easy constant to integer reg alternatives.
+	(*mov<VM64:mode>_insn_sp64): Likewise.
+	(*mov<VM64:mode>_insn_sp32_novis3): Likewise.
+	(*mov<VM32:mode>_insn_vis3): New insn.
+	(*mov<VM64:mode>_insn_sp64_vis3): New insn.
+	(*mov<VM64:mode>_insn_sp32_vis3): New insn.
+	(VM64 reg<-->reg split): New splitter for 32-bit.
+
 	* config/sparc/sparc.c (sparc_split_regreg_legitimate): New
 	function.
 	* config/sparc/sparc-protos.h (sparc_split_regreg_legitimate):
diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c
index 29d2847..79bb821 100644
--- a/gcc/config/sparc/sparc.c
+++ b/gcc/config/sparc/sparc.c
@@ -2996,10 +2996,23 @@  eligible_for_restore_insn (rtx trial, bool return_p)
 {
   rtx pat = PATTERN (trial);
   rtx src = SET_SRC (pat);
+  bool src_is_freg = false;
+  rtx src_reg;
+
+  /* Since we now can do moves between float and integer registers when
+     VIS3 is enabled, we have to catch this case.  We can allow such
+     moves when doing a 'return' however.  */
+  src_reg = src;
+  if (GET_CODE (src_reg) == SUBREG)
+    src_reg = SUBREG_REG (src_reg);
+  if (GET_CODE (src_reg) == REG
+      && SPARC_FP_REG_P (REGNO (src_reg)))
+    src_is_freg = true;
 
   /* The 'restore src,%g0,dest' pattern for word mode and below.  */
   if (GET_MODE_CLASS (GET_MODE (src)) != MODE_FLOAT
-      && arith_operand (src, GET_MODE (src)))
+      && arith_operand (src, GET_MODE (src))
+      && ! src_is_freg)
     {
       if (TARGET_ARCH64)
         return GET_MODE_SIZE (GET_MODE (src)) <= GET_MODE_SIZE (DImode);
@@ -3009,7 +3022,8 @@  eligible_for_restore_insn (rtx trial, bool return_p)
 
   /* The 'restore src,%g0,dest' pattern for double-word mode.  */
   else if (GET_MODE_CLASS (GET_MODE (src)) != MODE_FLOAT
-	   && arith_double_operand (src, GET_MODE (src)))
+	   && arith_double_operand (src, GET_MODE (src))
+	   && ! src_is_freg)
     return GET_MODE_SIZE (GET_MODE (src)) <= GET_MODE_SIZE (DImode);
 
   /* The 'restore src,%g0,dest' pattern for float if no FPU.  */
@@ -7784,6 +7798,13 @@  sparc_split_regreg_legitimate (rtx reg1, rtx reg2)
   if (SPARC_INT_REG_P (regno1) && SPARC_INT_REG_P (regno2))
     return 1;
 
+  if (TARGET_VIS3)
+    {
+      if ((SPARC_INT_REG_P (regno1) && SPARC_FP_REG_P (regno2))
+	  || (SPARC_FP_REG_P (regno1) && SPARC_INT_REG_P (regno2)))
+	return 1;
+    }
+
   return 0;
 }
 
@@ -10302,10 +10323,28 @@  static int
 sparc_register_move_cost (enum machine_mode mode ATTRIBUTE_UNUSED,
 			  reg_class_t from, reg_class_t to)
 {
-  if ((FP_REG_CLASS_P (from) && general_or_i64_p (to))
-      || (general_or_i64_p (from) && FP_REG_CLASS_P (to))
-      || from == FPCC_REGS
-      || to == FPCC_REGS)  
+  bool need_memory = false;
+
+  if (from == FPCC_REGS || to == FPCC_REGS)
+    need_memory = true;
+  else if ((FP_REG_CLASS_P (from) && general_or_i64_p (to))
+	   || (general_or_i64_p (from) && FP_REG_CLASS_P (to)))
+    {
+      if (TARGET_VIS3)
+	{
+	  int size = GET_MODE_SIZE (mode);
+	  if (size == 8 || size == 4)
+	    {
+	      if (! TARGET_ARCH32 || size == 4)
+		return 4;
+	      else
+		return 6;
+	    }
+	}
+      need_memory = true;
+    }
+
+  if (need_memory)
     {
       if (sparc_cpu == PROCESSOR_ULTRASPARC
 	  || sparc_cpu == PROCESSOR_ULTRASPARC3
@@ -11163,6 +11202,18 @@  sparc_preferred_reload_class (rtx x, reg_class_t rclass)
 	}
     }
 
+  if (TARGET_VIS3
+      && ! TARGET_ARCH64
+      && (rclass == EXTRA_FP_REGS
+	  || rclass == GENERAL_OR_EXTRA_FP_REGS))
+    {
+      int regno = true_regnum (x);
+
+      if (SPARC_INT_REG_P (regno))
+	return (rclass == EXTRA_FP_REGS
+		? FP_REGS : GENERAL_OR_FP_REGS);
+    }
+
   return rclass;
 }
 
@@ -11275,6 +11326,9 @@  sparc_secondary_reload (bool in_p, rtx x, reg_class_t rclass_i,
 {
   enum reg_class rclass = (enum reg_class) rclass_i;
 
+  sri->icode = CODE_FOR_nothing;
+  sri->extra_cost = 0;
+
   /* We need a temporary when loading/storing a HImode/QImode value
      between memory and the FPU registers.  This can happen when combine puts
      a paradoxical subreg in a float/fix conversion insn.  */
@@ -11307,6 +11361,25 @@  sparc_secondary_reload (bool in_p, rtx x, reg_class_t rclass_i,
       return NO_REGS;
     }
 
+  if (TARGET_VIS3 && TARGET_ARCH32)
+    {
+      int regno = true_regnum (x);
+
+      /* When using VIS3 fp<-->int register moves, on 32-bit we have
+	 to move 8-byte values in 4-byte pieces.  This only works via
+	 FP_REGS, and not via EXTRA_FP_REGS.  Therefore if we try to
+	 move between EXTRA_FP_REGS and GENERAL_REGS, we will need
+	 an FP_REGS intermediate move.  */
+      if ((rclass == EXTRA_FP_REGS && SPARC_INT_REG_P (regno))
+	  || ((general_or_i64_p (rclass)
+	       || rclass == GENERAL_OR_FP_REGS)
+	      && SPARC_FP_REG_P (regno)))
+	{
+	  sri->extra_cost = 2;
+	  return FP_REGS;
+	}
+    }
+
   return NO_REGS;
 }
 
diff --git a/gcc/config/sparc/sparc.h b/gcc/config/sparc/sparc.h
index 76240f0..aed18fc 100644
--- a/gcc/config/sparc/sparc.h
+++ b/gcc/config/sparc/sparc.h
@@ -1040,10 +1040,13 @@  extern char leaf_reg_remap[];
 #define SPARC_SETHI32_P(X) \
   (SPARC_SETHI_P ((unsigned HOST_WIDE_INT) (X) & GET_MODE_MASK (SImode)))
 
-/* On SPARC it is not possible to directly move data between
-   GENERAL_REGS and FP_REGS.  */
+/* On SPARC when not VIS3 it is not possible to directly move data
+   between GENERAL_REGS and FP_REGS.  */
 #define SECONDARY_MEMORY_NEEDED(CLASS1, CLASS2, MODE) \
-  (FP_REG_CLASS_P (CLASS1) != FP_REG_CLASS_P (CLASS2))
+  ((FP_REG_CLASS_P (CLASS1) != FP_REG_CLASS_P (CLASS2)) \
+   && (! TARGET_VIS3 \
+       || GET_MODE_SIZE (MODE) > 8 \
+       || GET_MODE_SIZE (MODE) < 4))
 
 /* Get_secondary_mem widens its argument to BITS_PER_WORD which loses on v9
    because the movsi and movsf patterns don't handle r/f moves.
diff --git a/gcc/config/sparc/sparc.md b/gcc/config/sparc/sparc.md
index b84699a..0f716d6 100644
--- a/gcc/config/sparc/sparc.md
+++ b/gcc/config/sparc/sparc.md
@@ -1312,11 +1312,12 @@ 
     DONE;
 })
 
-(define_insn "*movsi_insn"
+(define_insn "*movsi_insn_novis3"
   [(set (match_operand:SI 0 "nonimmediate_operand" "=r,r,r,m,!f,!f,!m,d,d")
 	(match_operand:SI 1 "input_operand"   "rI,K,m,rJ,f,m,f,J,P"))]
-  "(register_operand (operands[0], SImode)
-    || register_or_zero_or_all_ones_operand (operands[1], SImode))"
+  "(! TARGET_VIS3
+    && (register_operand (operands[0], SImode)
+        || register_or_zero_or_all_ones_operand (operands[1], SImode)))"
   "@
    mov\t%1, %0
    sethi\t%%hi(%a1), %0
@@ -1329,6 +1330,26 @@ 
    fones\t%0"
   [(set_attr "type" "*,*,load,store,fpmove,fpload,fpstore,fga,fga")])
 
+(define_insn "*movsi_insn_vis3"
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,r,r, m, r,*f,*f,*f, m,d,d")
+	(match_operand:SI 1 "input_operand"        "rI,K,m,rJ,*f, r, f, m,*f,J,P"))]
+  "(TARGET_VIS3
+    && (register_operand (operands[0], SImode)
+        || register_or_zero_or_all_ones_operand (operands[1], SImode)))"
+  "@
+   mov\t%1, %0
+   sethi\t%%hi(%a1), %0
+   ld\t%1, %0
+   st\t%r1, %0
+   movstouw\t%1, %0
+   movwtos\t%1, %0
+   fmovs\t%1, %0
+   ld\t%1, %0
+   st\t%1, %0
+   fzeros\t%0
+   fones\t%0"
+  [(set_attr "type" "*,*,load,store,*,*,fpmove,fpload,fpstore,fga,fga")])
+
 (define_insn "*movsi_lo_sum"
   [(set (match_operand:SI 0 "register_operand" "=r")
 	(lo_sum:SI (match_operand:SI 1 "register_operand" "r")
@@ -1486,13 +1507,14 @@ 
   [(set_attr "type" "store,store,load,*,*,*,*,fpstore,fpload,*,*,*")
    (set_attr "length" "2,*,*,2,2,2,2,*,*,2,2,2")])
 
-(define_insn "*movdi_insn_sp32_v9"
+(define_insn "*movdi_insn_sp32_v9_novis3"
   [(set (match_operand:DI 0 "nonimmediate_operand"
 					"=T,o,T,U,o,r,r,r,?T,?f,?f,?o,?e,?e,?W,b,b")
         (match_operand:DI 1 "input_operand"
 					" J,J,U,T,r,o,i,r, f, T, o, f, e, W, e,J,P"))]
   "! TARGET_ARCH64
    && TARGET_V9
+   && ! TARGET_VIS3
    && (register_operand (operands[0], DImode)
        || register_or_zero_operand (operands[1], DImode))"
   "@
@@ -1517,10 +1539,45 @@ 
    (set_attr "length" "*,2,*,*,2,2,2,2,*,*,2,2,*,*,*,*,*")
    (set_attr "fptype" "*,*,*,*,*,*,*,*,*,*,*,*,double,*,*,double,double")])
 
-(define_insn "*movdi_insn_sp64"
+(define_insn "*movdi_insn_sp32_v9_vis3"
+  [(set (match_operand:DI 0 "nonimmediate_operand"
+					"=T,o,T,U,o,r,r,r,?T,?*f,?*f,?o,?*e,  r,?*f,?*e,?W,b,b")
+        (match_operand:DI 1 "input_operand"
+					" J,J,U,T,r,o,i,r,*f,  T,  o,*f, *e,?*f,  r,  W,*e,J,P"))]
+  "! TARGET_ARCH64
+   && TARGET_V9
+   && TARGET_VIS3
+   && (register_operand (operands[0], DImode)
+       || register_or_zero_operand (operands[1], DImode))"
+  "@
+   stx\t%%g0, %0
+   #
+   std\t%1, %0
+   ldd\t%1, %0
+   #
+   #
+   #
+   #
+   std\t%1, %0
+   ldd\t%1, %0
+   #
+   #
+   fmovd\t%1, %0
+   #
+   #
+   ldd\t%1, %0
+   std\t%1, %0
+   fzero\t%0
+   fone\t%0"
+  [(set_attr "type" "store,store,store,load,*,*,*,*,fpstore,fpload,*,*,*,*,fpmove,fpload,fpstore,fga,fga")
+   (set_attr "length" "*,2,*,*,2,2,2,2,*,*,2,2,*,2,2,*,*,*,*")
+   (set_attr "fptype" "*,*,*,*,*,*,*,*,*,*,*,*,double,*,*,*,*,double,double")])
+
+(define_insn "*movdi_insn_sp64_novis3"
   [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,r,m,?e,?e,?W,b,b")
         (match_operand:DI 1 "input_operand"   "rI,N,m,rJ,e,W,e,J,P"))]
   "TARGET_ARCH64
+   && ! TARGET_VIS3
    && (register_operand (operands[0], DImode)
        || register_or_zero_or_all_ones_operand (operands[1], DImode))"
   "@
@@ -1536,6 +1593,28 @@ 
   [(set_attr "type" "*,*,load,store,fpmove,fpload,fpstore,fga,fga")
    (set_attr "fptype" "*,*,*,*,double,*,*,double,double")])
 
+(define_insn "*movdi_insn_sp64_vis3"
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,r, m, r,*e,?*e,?*e,?W,b,b")
+        (match_operand:DI 1 "input_operand"        "rI,N,m,rJ,*e, r, *e,  W,*e,J,P"))]
+  "TARGET_ARCH64
+   && TARGET_VIS3
+   && (register_operand (operands[0], DImode)
+       || register_or_zero_or_all_ones_operand (operands[1], DImode))"
+  "@
+   mov\t%1, %0
+   sethi\t%%hi(%a1), %0
+   ldx\t%1, %0
+   stx\t%r1, %0
+   movdtox\t%1, %0
+   movxtod\t%1, %0
+   fmovd\t%1, %0
+   ldd\t%1, %0
+   std\t%1, %0
+   fzero\t%0
+   fone\t%0"
+  [(set_attr "type" "*,*,load,store,*,*,fpmove,fpload,fpstore,fga,fga")
+   (set_attr "fptype" "*,*,*,*,*,*,double,*,*,double,double")])
+
 (define_expand "movdi_pic_label_ref"
   [(set (match_dup 3) (high:DI
      (unspec:DI [(match_operand:DI 1 "label_ref_operand" "")
@@ -1933,10 +2012,11 @@ 
     DONE;
 })
 
-(define_insn "*movsf_insn"
+(define_insn "*movsf_insn_novis3"
   [(set (match_operand:SF 0 "nonimmediate_operand" "=d, d,f,  *r,*r,*r,f,*r,m,   m")
 	(match_operand:SF 1 "input_operand"        "GY,ZC,f,*rRY, Q, S,m, m,f,*rGY"))]
   "TARGET_FPU
+   && ! TARGET_VIS3
    && (register_operand (operands[0], SFmode)
        || register_or_zero_or_all_ones_operand (operands[1], SFmode))"
 {
@@ -1979,6 +2059,57 @@ 
 }
   [(set_attr "type" "fga,fga,fpmove,*,*,*,fpload,load,fpstore,store")])
 
+(define_insn "*movsf_insn_vis3"
+  [(set (match_operand:SF 0 "nonimmediate_operand" "=d, d,f,  *r,*r,*r,*r, f, f,*r, m,   m")
+	(match_operand:SF 1 "input_operand"        "GY,ZC,f,*rRY, Q, S, f,*r, m, m, f,*rGY"))]
+  "TARGET_FPU
+   && TARGET_VIS3
+   && (register_operand (operands[0], SFmode)
+       || register_or_zero_or_all_ones_operand (operands[1], SFmode))"
+{
+  if (GET_CODE (operands[1]) == CONST_DOUBLE
+      && (which_alternative == 3
+          || which_alternative == 4
+          || which_alternative == 5))
+    {
+      REAL_VALUE_TYPE r;
+      long i;
+
+      REAL_VALUE_FROM_CONST_DOUBLE (r, operands[1]);
+      REAL_VALUE_TO_TARGET_SINGLE (r, i);
+      operands[1] = GEN_INT (i);
+    }
+
+  switch (which_alternative)
+    {
+    case 0:
+      return "fzeros\t%0";
+    case 1:
+      return "fones\t%0";
+    case 2:
+      return "fmovs\t%1, %0";
+    case 3:
+      return "mov\t%1, %0";
+    case 4:
+      return "sethi\t%%hi(%a1), %0";
+    case 5:
+      return "#";
+    case 6:
+      return "movstouw\t%1, %0";
+    case 7:
+      return "movwtos\t%1, %0";
+    case 8:
+    case 9:
+      return "ld\t%1, %0";
+    case 10:
+    case 11:
+      return "st\t%r1, %0";
+    default:
+      gcc_unreachable ();
+    }
+}
+  [(set_attr "type" "fga,fga,fpmove,*,*,*,*,*,fpload,load,fpstore,store")])
+
 ;; Exactly the same as above, except that all `f' cases are deleted.
 ;; This is necessary to prevent reload from ever trying to use a `f' reg
 ;; when -mno-fpu.
@@ -2107,11 +2238,12 @@ 
    (set_attr "length" "*,*,2,2,2")])
 
 ;; We have available v9 double floats but not 64-bit integer registers.
-(define_insn "*movdf_insn_sp32_v9"
+(define_insn "*movdf_insn_sp32_v9_novis3"
   [(set (match_operand:DF 0 "nonimmediate_operand" "=b, b,e,  e, T,W,U,T,  f,     *r,    o")
         (match_operand:DF 1 "input_operand"        "GY,ZC,e,W#F,GY,e,T,U,o#F,*roGYDF,*rGYf"))]
   "TARGET_FPU
    && TARGET_V9
+   && ! TARGET_VIS3
    && ! TARGET_ARCH64
    && (register_operand (operands[0], DFmode)
        || register_or_zero_or_all_ones_operand (operands[1], DFmode))"
@@ -2131,6 +2263,33 @@ 
    (set_attr "length" "*,*,*,*,*,*,*,*,2,2,2")
    (set_attr "fptype" "double,double,double,*,*,*,*,*,*,*,*")])
 
+(define_insn "*movdf_insn_sp32_v9_vis3"
+  [(set (match_operand:DF 0 "nonimmediate_operand" "=b, b,e,*r, f,  e, T,W,U,T,  f,     *r,    o")
+        (match_operand:DF 1 "input_operand"        "GY,ZC,e, f,*r,W#F,GY,e,T,U,o#F,*roGYDF,*rGYf"))]
+  "TARGET_FPU
+   && TARGET_V9
+   && TARGET_VIS3
+   && ! TARGET_ARCH64
+   && (register_operand (operands[0], DFmode)
+       || register_or_zero_or_all_ones_operand (operands[1], DFmode))"
+  "@
+  fzero\t%0
+  fone\t%0
+  fmovd\t%1, %0
+  #
+  #
+  ldd\t%1, %0
+  stx\t%r1, %0
+  std\t%1, %0
+  ldd\t%1, %0
+  std\t%1, %0
+  #
+  #
+  #"
+  [(set_attr "type" "fga,fga,fpmove,*,*,load,store,store,load,store,*,*,*")
+   (set_attr "length" "*,*,*,2,2,*,*,*,*,*,2,2,2")
+   (set_attr "fptype" "double,double,double,*,*,*,*,*,*,*,*,*,*")])
+
 (define_insn "*movdf_insn_sp32_v9_no_fpu"
   [(set (match_operand:DF 0 "nonimmediate_operand" "=U,T,T,r,o")
 	(match_operand:DF 1 "input_operand"    "T,U,G,ro,rG"))]
@@ -2149,10 +2308,11 @@ 
    (set_attr "length" "*,*,*,2,2")])
 
 ;; We have available both v9 double floats and 64-bit integer registers.
-(define_insn "*movdf_insn_sp64"
+(define_insn "*movdf_insn_sp64_novis3"
   [(set (match_operand:DF 0 "nonimmediate_operand" "=b, b,e,  e,W,  *r,*r,   m,*r")
         (match_operand:DF 1 "input_operand"        "GY,ZC,e,W#F,e,*rGY, m,*rGY,DF"))]
   "TARGET_FPU
+   && ! TARGET_VIS3
    && TARGET_ARCH64
    && (register_operand (operands[0], DFmode)
        || register_or_zero_or_all_ones_operand (operands[1], DFmode))"
@@ -2170,6 +2330,30 @@ 
    (set_attr "length" "*,*,*,*,*,*,*,*,2")
    (set_attr "fptype" "double,double,double,*,*,*,*,*,*")])
 
+(define_insn "*movdf_insn_sp64_vis3"
+  [(set (match_operand:DF 0 "nonimmediate_operand" "=b, b,e,*r, e,  e,W,  *r,*r,   m,*r")
+        (match_operand:DF 1 "input_operand"        "GY,ZC,e, e,*r,W#F,e,*rGY, m,*rGY,DF"))]
+  "TARGET_FPU
+   && TARGET_ARCH64
+   && TARGET_VIS3
+   && (register_operand (operands[0], DFmode)
+       || register_or_zero_or_all_ones_operand (operands[1], DFmode))"
+  "@
+  fzero\t%0
+  fone\t%0
+  fmovd\t%1, %0
+  movdtox\t%1, %0
+  movxtod\t%1, %0
+  ldd\t%1, %0
+  std\t%1, %0
+  mov\t%r1, %0
+  ldx\t%1, %0
+  stx\t%r1, %0
+  #"
+  [(set_attr "type" "fga,fga,fpmove,*,*,load,store,*,load,store,*")
+   (set_attr "length" "*,*,*,*,*,*,*,*,*,*,2")
+   (set_attr "fptype" "double,double,double,double,double,*,*,*,*,*,*")])
+
 (define_insn "*movdf_insn_sp64_no_fpu"
   [(set (match_operand:DF 0 "nonimmediate_operand" "=r,r,m")
         (match_operand:DF 1 "input_operand"    "r,m,rG"))]
@@ -2444,7 +2628,8 @@ 
    && (! TARGET_ARCH64
        || (TARGET_FPU
            && ! TARGET_HARD_QUAD)
-       || ! fp_register_operand (operands[0], TFmode))"
+       || (! fp_register_operand (operands[0], TFmode)
+           && ! fp_register_operand (operands[1], TFmode)))"
   [(clobber (const_int 0))]
 {
   rtx set_dest = operands[0];
@@ -2944,15 +3129,29 @@ 
   ""
   "")
 
-(define_insn "*zero_extendsidi2_insn_sp64"
+(define_insn "*zero_extendsidi2_insn_sp64_novis3"
   [(set (match_operand:DI 0 "register_operand" "=r,r")
 	(zero_extend:DI (match_operand:SI 1 "input_operand" "r,m")))]
-  "TARGET_ARCH64 && GET_CODE (operands[1]) != CONST_INT"
+  "TARGET_ARCH64
+   && ! TARGET_VIS3
+   && GET_CODE (operands[1]) != CONST_INT"
   "@
    srl\t%1, 0, %0
    lduw\t%1, %0"
   [(set_attr "type" "shift,load")])
 
+(define_insn "*zero_extendsidi2_insn_sp64_vis3"
+  [(set (match_operand:DI 0 "register_operand" "=r,r,r")
+	(zero_extend:DI (match_operand:SI 1 "input_operand" "r,m,*f")))]
+  "TARGET_ARCH64
+   && TARGET_VIS3
+   && GET_CODE (operands[1]) != CONST_INT"
+  "@
+   srl\t%1, 0, %0
+   lduw\t%1, %0
+   movstouw\t%1, %0"
+  [(set_attr "type" "shift,load,*")])
+
 (define_insn_and_split "*zero_extendsidi2_insn_sp32"
   [(set (match_operand:DI 0 "register_operand" "=r")
         (zero_extend:DI (match_operand:SI 1 "register_operand" "r")))]
@@ -3276,16 +3475,27 @@ 
   "TARGET_ARCH64"
   "")
 
-(define_insn "*sign_extendsidi2_insn"
+(define_insn "*sign_extendsidi2_insn_novis3"
   [(set (match_operand:DI 0 "register_operand" "=r,r")
 	(sign_extend:DI (match_operand:SI 1 "input_operand" "r,m")))]
-  "TARGET_ARCH64"
+  "TARGET_ARCH64 && ! TARGET_VIS3"
   "@
   sra\t%1, 0, %0
   ldsw\t%1, %0"
   [(set_attr "type" "shift,sload")
    (set_attr "us3load_type" "*,3cycle")])
 
+(define_insn "*sign_extendsidi2_insn_vis3"
+  [(set (match_operand:DI 0 "register_operand" "=r,r,r")
+	(sign_extend:DI (match_operand:SI 1 "input_operand" "r,m,*f")))]
+  "TARGET_ARCH64 && TARGET_VIS3"
+  "@
+  sra\t%1, 0, %0
+  ldsw\t%1, %0
+  movstosw\t%1, %0"
+  [(set_attr "type" "shift,sload,*")
+   (set_attr "us3load_type" "*,3cycle,*")])
+
 
 ;; Special pattern for optimizing bit-field compares.  This is needed
 ;; because combine uses this as a canonical form.
@@ -7769,10 +7979,11 @@ 
     DONE;
 })
 
-(define_insn "*mov<VM32:mode>_insn"
-  [(set (match_operand:VM32 0 "nonimmediate_operand" "=f, f,f,f,m, m,r,m, r, r")
-	(match_operand:VM32 1 "input_operand"        "GY,ZC,f,m,f,GY,m,r,GY,ZC"))]
+(define_insn "*mov<VM32:mode>_insn_novis3"
+  [(set (match_operand:VM32 0 "nonimmediate_operand" "=f, f,f,f,m, m,r,m,*r")
+	(match_operand:VM32 1 "input_operand"        "GY,ZC,f,m,f,GY,m,r,*r"))]
   "TARGET_VIS
+   && ! TARGET_VIS3
    && (register_operand (operands[0], <VM32:MODE>mode)
        || register_or_zero_or_all_ones_operand (operands[1], <VM32:MODE>mode))"
   "@
@@ -7784,14 +7995,35 @@ 
   st\t%r1, %0
   ld\t%1, %0
   st\t%1, %0
-  mov\t0, %0
-  mov\t-1, %0"
-  [(set_attr "type" "fga,fga,fga,fpload,fpstore,store,load,store,*,*")])
+  mov\t%1, %0"
+  [(set_attr "type" "fga,fga,fga,fpload,fpstore,store,load,store,*")])
 
-(define_insn "*mov<VM64:mode>_insn_sp64"
-  [(set (match_operand:VM64 0 "nonimmediate_operand" "=e, e,e,e,m, m,r,m, r, r")
-	(match_operand:VM64 1 "input_operand"        "GY,ZC,e,m,e,GY,m,r,GY,ZC"))]
+(define_insn "*mov<VM32:mode>_insn_vis3"
+  [(set (match_operand:VM32 0 "nonimmediate_operand" "=f, f,f,f,m, m,*r, m,*r,*r, f")
+	(match_operand:VM32 1 "input_operand"        "GY,ZC,f,m,f,GY, m,*r,*r, f,*r"))]
   "TARGET_VIS
+   && TARGET_VIS3
+   && (register_operand (operands[0], <VM32:MODE>mode)
+       || register_or_zero_or_all_ones_operand (operands[1], <VM32:MODE>mode))"
+  "@
+  fzeros\t%0
+  fones\t%0
+  fsrc1s\t%1, %0
+  ld\t%1, %0
+  st\t%1, %0
+  st\t%r1, %0
+  ld\t%1, %0
+  st\t%1, %0
+  mov\t%1, %0
+  movstouw\t%1, %0
+  movwtos\t%1, %0"
+  [(set_attr "type" "fga,fga,fga,fpload,fpstore,store,load,store,*,*,*")])
+
+(define_insn "*mov<VM64:mode>_insn_sp64_novis3"
+  [(set (match_operand:VM64 0 "nonimmediate_operand" "=e, e,e,e,m, m,r,m,*r")
+	(match_operand:VM64 1 "input_operand"        "GY,ZC,e,m,e,GY,m,r,*r"))]
+  "TARGET_VIS
+   && ! TARGET_VIS3
    && TARGET_ARCH64
    && (register_operand (operands[0], <VM64:MODE>mode)
        || register_or_zero_or_all_ones_operand (operands[1], <VM64:MODE>mode))"
@@ -7804,14 +8036,36 @@ 
   stx\t%r1, %0
   ldx\t%1, %0
   stx\t%1, %0
-  mov\t0, %0
-  mov\t-1, %0"
-  [(set_attr "type" "fga,fga,fga,fpload,fpstore,store,load,store,*,*")])
+  mov\t%1, %0"
+  [(set_attr "type" "fga,fga,fga,fpload,fpstore,store,load,store,*")])
 
-(define_insn "*mov<VM64:mode>_insn_sp32"
-  [(set (match_operand:VM64 0 "nonimmediate_operand" "=e, e,e,e,m, m,U,T,o, r, r")
-	(match_operand:VM64 1 "input_operand"        "GY,ZC,e,m,e,GY,T,U,r,GY,ZC"))]
+(define_insn "*mov<VM64:mode>_insn_sp64_vis3"
+  [(set (match_operand:VM64 0 "nonimmediate_operand" "=e, e,e,e,m, m,*r, m,*r, f,*r")
+	(match_operand:VM64 1 "input_operand"        "GY,ZC,e,m,e,GY, m,*r, f,*r,*r"))]
   "TARGET_VIS
+   && TARGET_VIS3
+   && TARGET_ARCH64
+   && (register_operand (operands[0], <VM64:MODE>mode)
+       || register_or_zero_or_all_ones_operand (operands[1], <VM64:MODE>mode))"
+  "@
+  fzero\t%0
+  fone\t%0
+  fsrc1\t%1, %0
+  ldd\t%1, %0
+  std\t%1, %0
+  stx\t%r1, %0
+  ldx\t%1, %0
+  stx\t%1, %0
+  movdtox\t%1, %0
+  movxtod\t%1, %0
+  mov\t%1, %0"
+  [(set_attr "type" "fga,fga,fga,fpload,fpstore,store,load,store,*,*,*")])
+
+(define_insn "*mov<VM64:mode>_insn_sp32_novis3"
+  [(set (match_operand:VM64 0 "nonimmediate_operand" "=e, e,e,e,m, m,U,T,o,*r")
+	(match_operand:VM64 1 "input_operand"        "GY,ZC,e,m,e,GY,T,U,r,*r"))]
+  "TARGET_VIS
+   && ! TARGET_VIS3
    && ! TARGET_ARCH64
    && (register_operand (operands[0], <VM64:MODE>mode)
        || register_or_zero_or_all_ones_operand (operands[1], <VM64:MODE>mode))"
@@ -7825,10 +8079,33 @@ 
   ldd\t%1, %0
   std\t%1, %0
   #
-  mov 0, %L0; mov 0, %H0
-  mov -1, %L0; mov -1, %H0"
-  [(set_attr "type" "fga,fga,fga,fpload,fpstore,store,load,store,*,*,*")
-   (set_attr "length" "*,*,*,*,*,*,*,*,2,2,2")])
+  #"
+  [(set_attr "type" "fga,fga,fga,fpload,fpstore,store,load,store,*,*")
+   (set_attr "length" "*,*,*,*,*,*,*,*,2,2")])
+
+(define_insn "*mov<VM64:mode>_insn_sp32_vis3"
+  [(set (match_operand:VM64 0 "nonimmediate_operand" "=e, e,e,*r, f,e,m, m,U,T, o,*r")
+	(match_operand:VM64 1 "input_operand"        "GY,ZC,e, f,*r,m,e,GY,T,U,*r,*r"))]
+  "TARGET_VIS
+   && TARGET_VIS3
+   && ! TARGET_ARCH64
+   && (register_operand (operands[0], <VM64:MODE>mode)
+       || register_or_zero_or_all_ones_operand (operands[1], <VM64:MODE>mode))"
+  "@
+  fzero\t%0
+  fone\t%0
+  fsrc1\t%1, %0
+  #
+  #
+  ldd\t%1, %0
+  std\t%1, %0
+  stx\t%r1, %0
+  ldd\t%1, %0
+  std\t%1, %0
+  #
+  #"
+  [(set_attr "type" "fga,fga,fga,*,*,fpload,fpstore,store,load,store,*,*")
+   (set_attr "length" "*,*,*,2,2,*,*,*,*,*,2,2")])
 
 (define_split
   [(set (match_operand:VM64 0 "memory_operand" "")
@@ -7851,6 +8128,40 @@ 
   DONE;
 })
 
+(define_split
+  [(set (match_operand:VM64 0 "register_operand" "")
+        (match_operand:VM64 1 "register_operand" ""))]
+  "reload_completed
+   && TARGET_VIS
+   && ! TARGET_ARCH64
+   && sparc_split_regreg_legitimate (operands[0], operands[1])"
+  [(clobber (const_int 0))]
+{
+  rtx set_dest = operands[0];
+  rtx set_src = operands[1];
+  rtx dest1, dest2;
+  rtx src1, src2;
+
+  dest1 = gen_highpart (SImode, set_dest);
+  dest2 = gen_lowpart (SImode, set_dest);
+  src1 = gen_highpart (SImode, set_src);
+  src2 = gen_lowpart (SImode, set_src);
+
+  /* Now emit using the real source and destination we found, swapping
+     the order if we detect overlap.  */
+  if (reg_overlap_mentioned_p (dest1, src2))
+    {
+      emit_insn (gen_movsi (dest2, src2));
+      emit_insn (gen_movsi (dest1, src1));
+    }
+  else
+    {
+      emit_insn (gen_movsi (dest1, src1));
+      emit_insn (gen_movsi (dest2, src2));
+    }
+  DONE;
+})
+
 (define_expand "vec_init<mode>"
   [(match_operand:VMALL 0 "register_operand" "")
    (match_operand:VMALL 1 "" "")]