diff mbox

[RFC] GCC 4.9, powerpc, allow TImode in VSX registers

Message ID 20130201201505.GA30409@ibm-tiger.the-meissners.org
State New
Headers show

Commit Message

Michael Meissner Feb. 1, 2013, 8:15 p.m. UTC
When I did the initial power7 port, I punted on allowing TImode in the VSX
registers because I couldn't get it to work.  I am now revisiting it, and these
patches are my current effort, and I was wondering if people had comments on
them.  In terms of performance, there are two benchmarks in the Spec 2006 suite
that have minor regressions (perlbench and gamess), and 3 that have minor
improvements (hmmer, h264ref, and gromacs), so overall it looks like a wash.  I
do want to look the regressions, and see if there is something simple to
tweak.

Some issues I ran into include:

I needed to set CANNOT_CHANGE_MODE so that TImode won't overlap with smaller
data types, due to the scalar portion of the register being in the upper
64-bits of the VSX register.

I limited the available address formats for TImode to be REG+REG needed for VSX
instructions.

I discovered that setjmp/longjmp and exception handling needed to create TImode
values with the STACK_SAVEAREA_MODE macro.  However, the implementation of this
needs REG+OFFSET addressing.  So, I added a new type PTImode, which is only
used for STACK_SAVEAREA_MODE, and PTImode is limited to the GPRs.

If I enable logical operations in TImode mode (and, xor, etc.), the compiler
will convert DImode logical operations to TImode for 32-bit programs.  In the
future, I think I will tune this and/or provide insn splitters for DImode
logical operations.  For now, I just disallow logical operations on TImode if
32-bit.

I added a debug switch (-mvsx-timode) to disable putting TImode into VSX
registers.

2013-01-31  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/vector.md (mul<mode>3): Use the combined macro
	VECTOR_UNIT_ALTIVEC_OR_VSX_P instead of separate calls to
	VECTOR_UNIT_ALTIVEC_P and VECTOR_UNIT_VSX_P.
	(vcond<mode><mode>): Likewise.
	(vcondu<mode><mode>): Likewise.
	(vector_gtu<mode>): Likewise.
	(vector_gte<mode>): Likewise.
	(xor<mode>3): Don't allow logical operations on TImode in 32-bit
	to prevent the compiler from converting DImode operations to
	TImode.
	(ior<mode>3): Likewise.
	(and<mode>3): Likewise.
	(one_cmpl<mode>2): Likewise.
	(nor<mode>3): Likewise.
	(andc<mode>3): Likewise.

	* config/rs6000/constraints.md (wt constraint): New constraint
	that returns VSX_REGS if TImode is allowed in VSX registers.

	* config/rs6000/predicates.md (easy_fp_constant): 0.0f is an easy
	constant under VSX.

	* config/rs6000/rs6000-modes.def (PTImode): Define, PTImode is
	similar to TImode, but it is restricted to being in the GPRs.

	* config/rs6000/rs6000.opt (-mvsx-timode): New switch to allow
	TImode to occupy a single VSX register.

	* config/rs6000/rs6000-cpus.def (ISA_2_6_MASKS_SERVER): Default to
	-mvsx-timode for power7/power8.
	(power7 cpu): Likewise.
	(power8 cpu): Likewise.

	* config/rs6000/rs6000.c (rs6000_hard_regno_nregs_internal): Make
	sure that TFmode/TDmode take up two registers if they are ever
	allowed in the upper VSX registers.
	(rs6000_hard_regno_mode_ok): If -mvsx-timode, allow TImode in VSX
	registers.
	(rs6000_init_hard_regno_mode_ok): Likewise.
	(rs6000_debug_reg_global): Add debugging for PTImode and wt
	constraint.  Print if LRA is turned on.
	(rs6000_option_override_internal): Give an error if -mvsx-timode
	and VSX is not enabled.
	(invalid_e500_subreg): Handle PTImode, restricting it to GPRs.  If
	-mvsx-timode, restrict TImode to reg+reg addressing, and PTImode
	to reg+offset addressing.  Use PTImode when checking offset
	addresses for validity.
	(reg_offset_addressing_ok_p): Likewise.
	(rs6000_legitimate_offset_address_p): Likewise.
	(rs6000_legitimize_address): Likewise.
	(rs6000_legitimize_reload_address): Likewise.
	(rs6000_legitimate_address_p): Likewise.
	(rs6000_eliminate_indexed_memrefs): Likewise.
	(rs6000_emit_move): Likewise.
	(rs6000_secondary_reload): Likewise.
	(rs6000_secondary_reload_inner): Handle PTImode.  Allow 64-bit
	reloads to fpr registers to continue to use reg+offset addressing,
	but 64-bit reloads to altivec registers need reg+reg addressing.
	Drop test for PRE_MODIFY, since VSX loads/stores no longer support
	it.  Treat LO_SUM like a PLUS operation.
	(rs6000_secondary_reload_class): If type is 64-bit, prefer to use
	FLOAT_REGS instead of VSX_RGS to allow use of reg+offset
	addressing.
	(rs6000_cannot_change_mode_class): Do not allow TImode in VSX
	registers to share a register with a smaller sized type, since VSX
	puts scalars in the upper 64-bits.
	(print_operand): Add support for PTImode.
	(rs6000_register_move_cost): Use VECTOR_MEM_VSX_P instead of
	VECTOR_UNIT_VSX_P to catch types that can be loaded in VSX
	registers, but don't have arithmetic support.
	(rs6000_memory_move_cost): Add test for VSX.
	(rs6000_opt_masks): Add -mvsx-timode.

	* config/rs6000/vsx.md (VSm): Change to use 64-bit aligned moves
	for TImode.
	(VSs): Likewise.
	(VSr): Use wt constraint for TImode.
	(VSv): Drop TImode support.
	(vsx_movti): Delete, replace with versions for 32-bit and 64-bit.
	(vsx_movti_64bit): Likewise.
	(vsx_movti_32bit): Likewise.
	(vec_store_<mode>): Use VSX iterator instead of vector iterator.
	(vsx_and<mode>3): Delete use of '?' constraint on inputs, just put
	one '?' on the appropriate output constraint.  Do not allow TImode
	logical operations on 32-bit systems.
	(vsx_ior<mode>3): Likewise.
	(vsx_xor<mode>3): Likewise.
	(vsx_one_cmpl<mode>2): Likewise.
	(vsx_nor<mode>3): Likewise.
	(vsx_andc<mode>3): Likewise.
	(vsx_concat_<mode>): Likewise.
	(vsx_xxpermdi_<mode>): Fix thinko for non V2DF/V2DI modes.

	* config/rs6000/rs6000.h (MASK_VSX_TIMODE): Map from
	OPTION_MASK_VSX_TIMODE.
	(enum rs6000_reg_class_enum): Add RS6000_CONSTRAINT_wt.
	(STACK_SAVEAREA_MODE): Use PTImode instead of TImode.

	* config/rs6000/rs6000.md (INT mode attribute): Add PTImode.
	(TI2 iterator): New iterator for TImode, PTImode.
	(wd mode attribute): Add values for vector types.
	(movti_string): Replace TI move operations with operations for
	TImode and PTImode.  Add support for TImode being allowed in VSX
	registers.
	(mov<mode>_string, TImode/PTImode): Likewise.
	(movti_ppc64): Likewise.
	(mov<mode>_ppc64, TImode/PTImode): Likewise.
	(TI mode splitters): Likewise.

	* doc/md.texi (PowerPC and IBM RS6000 constraints): Document wt
	constraint.
diff mbox

Patch

Index: gcc/config/rs6000/vector.md
===================================================================
--- gcc/config/rs6000/vector.md	(revision 195592)
+++ gcc/config/rs6000/vector.md	(working copy)
@@ -249,7 +249,7 @@  (define_expand "mul<mode>3"
   [(set (match_operand:VEC_F 0 "vfloat_operand" "")
 	(mult:VEC_F (match_operand:VEC_F 1 "vfloat_operand" "")
 		    (match_operand:VEC_F 2 "vfloat_operand" "")))]
-  "VECTOR_UNIT_VSX_P (<MODE>mode) || VECTOR_UNIT_ALTIVEC_P (<MODE>mode)"
+  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
 {
   if (<MODE>mode == V4SFmode && VECTOR_UNIT_ALTIVEC_P (<MODE>mode))
     {
@@ -395,7 +395,7 @@  (define_expand "vcond<mode><mode>"
 			  (match_operand:VEC_I 5 "vint_operand" "")])
 	 (match_operand:VEC_I 1 "vint_operand" "")
 	 (match_operand:VEC_I 2 "vint_operand" "")))]
-  "VECTOR_UNIT_ALTIVEC_P (<MODE>mode)"
+  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "
 {
   if (rs6000_emit_vector_cond_expr (operands[0], operands[1], operands[2],
@@ -451,7 +451,7 @@  (define_expand "vcondu<mode><mode>"
 			  (match_operand:VEC_I 5 "vint_operand" "")])
 	 (match_operand:VEC_I 1 "vint_operand" "")
 	 (match_operand:VEC_I 2 "vint_operand" "")))]
-  "VECTOR_UNIT_ALTIVEC_P (<MODE>mode)"
+  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "
 {
   if (rs6000_emit_vector_cond_expr (operands[0], operands[1], operands[2],
@@ -505,14 +505,14 @@  (define_expand "vector_gtu<mode>"
   [(set (match_operand:VEC_I 0 "vint_operand" "")
 	(gtu:VEC_I (match_operand:VEC_I 1 "vint_operand" "")
 		   (match_operand:VEC_I 2 "vint_operand" "")))]
-  "VECTOR_UNIT_ALTIVEC_P (<MODE>mode)"
+  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
 (define_expand "vector_geu<mode>"
   [(set (match_operand:VEC_I 0 "vint_operand" "")
 	(geu:VEC_I (match_operand:VEC_I 1 "vint_operand" "")
 		   (match_operand:VEC_I 2 "vint_operand" "")))]
-  "VECTOR_UNIT_ALTIVEC_P (<MODE>mode)"
+  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
 (define_insn_and_split "*vector_uneq<mode>"
@@ -709,45 +709,55 @@  (define_expand "cr6_test_for_lt_reverse"
 
 
 ;; Vector logical instructions
+;; Do not support TImode logical instructions on 32-bit at present, because the
+;; compiler will see that we have a TImode and when it wanted DImode, and
+;; convert the DImode to TImode, store it on the stack, and load it in a VSX
+;; register.
 (define_expand "xor<mode>3"
   [(set (match_operand:VEC_L 0 "vlogical_operand" "")
         (xor:VEC_L (match_operand:VEC_L 1 "vlogical_operand" "")
 		   (match_operand:VEC_L 2 "vlogical_operand" "")))]
-  "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)"
+  "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)
+   && (<MODE>mode != TImode || TARGET_POWERPC64)"
   "")
 
 (define_expand "ior<mode>3"
   [(set (match_operand:VEC_L 0 "vlogical_operand" "")
         (ior:VEC_L (match_operand:VEC_L 1 "vlogical_operand" "")
 		   (match_operand:VEC_L 2 "vlogical_operand" "")))]
-  "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)"
+  "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)
+   && (<MODE>mode != TImode || TARGET_POWERPC64)"
   "")
 
 (define_expand "and<mode>3"
   [(set (match_operand:VEC_L 0 "vlogical_operand" "")
         (and:VEC_L (match_operand:VEC_L 1 "vlogical_operand" "")
 		   (match_operand:VEC_L 2 "vlogical_operand" "")))]
-  "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)"
+  "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)
+   && (<MODE>mode != TImode || TARGET_POWERPC64)"
   "")
 
 (define_expand "one_cmpl<mode>2"
   [(set (match_operand:VEC_L 0 "vlogical_operand" "")
         (not:VEC_L (match_operand:VEC_L 1 "vlogical_operand" "")))]
-  "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)"
+  "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)
+   && (<MODE>mode != TImode || TARGET_POWERPC64)"
   "")
 
 (define_expand "nor<mode>3"
   [(set (match_operand:VEC_L 0 "vlogical_operand" "")
         (not:VEC_L (ior:VEC_L (match_operand:VEC_L 1 "vlogical_operand" "")
 			      (match_operand:VEC_L 2 "vlogical_operand" ""))))]
-  "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)"
+  "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)
+   && (<MODE>mode != TImode || TARGET_POWERPC64)"
   "")
 
 (define_expand "andc<mode>3"
   [(set (match_operand:VEC_L 0 "vlogical_operand" "")
         (and:VEC_L (not:VEC_L (match_operand:VEC_L 2 "vlogical_operand" ""))
 		   (match_operand:VEC_L 1 "vlogical_operand" "")))]
-  "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)"
+  "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)
+   && (<MODE>mode != TImode || TARGET_POWERPC64)"
   "")
 
 ;; Same size conversions
Index: gcc/config/rs6000/constraints.md
===================================================================
--- gcc/config/rs6000/constraints.md	(revision 195592)
+++ gcc/config/rs6000/constraints.md	(working copy)
@@ -64,6 +64,10 @@  (define_register_constraint "wf" "rs6000
 (define_register_constraint "ws" "rs6000_constraints[RS6000_CONSTRAINT_ws]"
   "@internal")
 
+;; TImode in VSX registers
+(define_register_constraint "wt" "rs6000_constraints[RS6000_CONSTRAINT_wt]"
+  "@internal")
+
 ;; any VSX register
 (define_register_constraint "wa" "rs6000_constraints[RS6000_CONSTRAINT_wa]"
   "@internal")
Index: gcc/config/rs6000/predicates.md
===================================================================
--- gcc/config/rs6000/predicates.md	(revision 195592)
+++ gcc/config/rs6000/predicates.md	(working copy)
@@ -329,6 +329,11 @@  (define_predicate "easy_fp_constant"
       && mode != DImode)
     return 1;
 
+  /* The constant 0.0 is easy under VSX.  */
+  if ((mode == SFmode || mode == DFmode || mode == SDmode || mode == DDmode)
+      && VECTOR_UNIT_VSX_P (DFmode) && op == CONST0_RTX (mode))
+    return 1;
+
   if (DECIMAL_FLOAT_MODE_P (mode))
     return 0;
 
Index: gcc/config/rs6000/rs6000-modes.def
===================================================================
--- gcc/config/rs6000/rs6000-modes.def	(revision 195592)
+++ gcc/config/rs6000/rs6000-modes.def	(working copy)
@@ -41,3 +41,6 @@  VECTOR_MODE (INT, DI, 1);
 VECTOR_MODES (FLOAT, 8);      /*             V4HF V2SF */
 VECTOR_MODES (FLOAT, 16);     /*       V8HF  V4SF V2DF */
 VECTOR_MODES (FLOAT, 32);     /*       V16HF V8SF V4DF */
+
+/* Replacement for TImode that only is allowed in GPRs.  */
+PARTIAL_INT_MODE (TI);
Index: gcc/config/rs6000/rs6000-cpus.def
===================================================================
--- gcc/config/rs6000/rs6000-cpus.def	(revision 195592)
+++ gcc/config/rs6000/rs6000-cpus.def	(working copy)
@@ -42,7 +42,8 @@ 
 #define ISA_2_6_MASKS_SERVER	(ISA_2_5_MASKS_SERVER			\
 				 | OPTION_MASK_POPCNTD			\
 				 | OPTION_MASK_ALTIVEC			\
-				 | OPTION_MASK_VSX)
+				 | OPTION_MASK_VSX			\
+				 | OPTION_MASK_VSX_TIMODE)
 
 #define POWERPC_7400_MASK	(OPTION_MASK_PPC_GFXOPT | OPTION_MASK_ALTIVEC)
 
@@ -76,7 +77,8 @@ 
 				 | OPTION_MASK_RECIP_PRECISION		\
 				 | OPTION_MASK_SOFT_FLOAT		\
 				 | OPTION_MASK_STRICT_ALIGN_OPTIONAL	\
-				 | OPTION_MASK_VSX)
+				 | OPTION_MASK_VSX			\
+				 | OPTION_MASK_VSX_TIMODE)
 
 #endif
 
@@ -165,11 +167,11 @@  RS6000_CPU ("power6x", PROCESSOR_POWER6,
 RS6000_CPU ("power7", PROCESSOR_POWER7,   /* Don't add MASK_ISEL by default */
 	    POWERPC_7400_MASK | MASK_POWERPC64 | MASK_PPC_GPOPT | MASK_MFCRF
 	    | MASK_POPCNTB | MASK_FPRND | MASK_CMPB | MASK_DFP | MASK_POPCNTD
-	    | MASK_VSX | MASK_RECIP_PRECISION)
+	    | MASK_VSX | MASK_RECIP_PRECISION | MASK_VSX_TIMODE)
 RS6000_CPU ("power8", PROCESSOR_POWER7,   /* Don't add MASK_ISEL by default */
 	    POWERPC_7400_MASK | MASK_POWERPC64 | MASK_PPC_GPOPT | MASK_MFCRF
 	    | MASK_POPCNTB | MASK_FPRND | MASK_CMPB | MASK_DFP | MASK_POPCNTD
-	    | MASK_VSX | MASK_RECIP_PRECISION)
+	    | MASK_VSX | MASK_RECIP_PRECISION | MASK_VSX_TIMODE)
 RS6000_CPU ("powerpc", PROCESSOR_POWERPC, 0)
 RS6000_CPU ("powerpc64", PROCESSOR_POWERPC64, MASK_PPC_GFXOPT | MASK_POWERPC64)
 RS6000_CPU ("rs64", PROCESSOR_RS64A, MASK_PPC_GFXOPT | MASK_POWERPC64)
Index: gcc/config/rs6000/rs6000.opt
===================================================================
--- gcc/config/rs6000/rs6000.opt	(revision 195592)
+++ gcc/config/rs6000/rs6000.opt	(working copy)
@@ -514,3 +514,7 @@  Use/do not use r11 to hold the static li
 msave-toc-indirect
 Target Report Var(TARGET_SAVE_TOC_INDIRECT) Save
 Control whether we save the TOC in the prologue for indirect calls or generate the save inline
+
+mvsx-timode
+Target Undocumented Mask(VSX_TIMODE) Var(rs6000_isa_flags)
+; Allow/disallow TImode in VSX registers
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 195625)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -1516,8 +1516,9 @@  rs6000_hard_regno_nregs_internal (int re
 {
   unsigned HOST_WIDE_INT reg_size;
 
+  /* TF/TD modes are special in that they always take 2 registers.  */
   if (FP_REGNO_P (regno))
-    reg_size = (VECTOR_MEM_VSX_P (mode)
+    reg_size = ((VECTOR_MEM_VSX_P (mode) && mode != TDmode && mode != TFmode)
 		? UNITS_PER_VSX_WORD
 		: UNITS_PER_FP_WORD);
 
@@ -1561,14 +1562,18 @@  rs6000_hard_regno_mode_ok (int regno, en
 	return ALTIVEC_REGNO_P (last_regno);
     }
 
+  /* Allow TImode in all VSX registers if the user asked for it.  Note, PTImode
+     can only go in GPRs.  */
+  if (mode == TImode && TARGET_VSX_TIMODE && VSX_REGNO_P (regno))
+    return 1;
+
   /* The GPRs can hold any mode, but values bigger than one register
      cannot go past R31.  */
   if (INT_REGNO_P (regno))
     return INT_REGNO_P (last_regno);
 
   /* The float registers (except for VSX vector modes) can only hold floating
-     modes and DImode.  This excludes the 32-bit decimal float mode for
-     now.  */
+     modes and DImode.  */
   if (FP_REGNO_P (regno))
     {
       if (SCALAR_FLOAT_MODE_P (mode)
@@ -1602,9 +1607,8 @@  rs6000_hard_regno_mode_ok (int regno, en
   if (SPE_SIMD_REGNO_P (regno) && TARGET_SPE && SPE_VECTOR_MODE (mode))
     return 1;
 
-  /* We cannot put TImode anywhere except general register and it must be able
-     to fit within the register set.  In the future, allow TImode in the
-     Altivec or VSX registers.  */
+  /* We cannot put non-VSX TImode or PTImode anywhere except general register
+     and it must be able to fit within the register set.  */
 
   return GET_MODE_SIZE (mode) <= UNITS_PER_WORD;
 }
@@ -1721,6 +1725,7 @@  rs6000_debug_reg_global (void)
     SImode,
     DImode,
     TImode,
+    PTImode,
     SFmode,
     DFmode,
     TFmode,
@@ -1801,6 +1806,7 @@  rs6000_debug_reg_global (void)
 	   "wg reg_class = %s\n"
 	   "wl reg_class = %s\n"
 	   "ws reg_class = %s\n"
+	   "wt reg_class = %s\n"
 	   "wx reg_class = %s\n"
 	   "wz reg_class = %s\n"
 	   "\n",
@@ -1813,6 +1819,7 @@  rs6000_debug_reg_global (void)
 	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wg]],
 	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wl]],
 	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_ws]],
+	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wt]],
 	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wx]],
 	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wz]]);
 
@@ -2043,6 +2050,9 @@  rs6000_debug_reg_global (void)
   if (TARGET_LINK_STACK)
     fprintf (stderr, DEBUG_FMT_S, "link_stack", "true");
 
+  if (targetm.lra_p ())
+    fprintf (stderr, DEBUG_FMT_S, "lra", "true");
+
   fprintf (stderr, DEBUG_FMT_S, "plt-format",
 	   TARGET_SECURE_PLT ? "secure" : "bss");
   fprintf (stderr, DEBUG_FMT_S, "struct-return",
@@ -2188,6 +2198,13 @@  rs6000_init_hard_regno_mode_ok (bool glo
       rs6000_vector_align[DFmode] = align64;
     }
 
+  /* Allow TImode in VSX register and set the VSX memory macros.  */
+  if (TARGET_VSX && TARGET_VSX_TIMODE)
+    {
+      rs6000_vector_mem[TImode] = VECTOR_VSX;
+      rs6000_vector_align[TImode] = align64;
+    }
+
   /* TODO add SPE and paired floating point vector support.  */
 
   /* Register class constraints for the constraints that depend on compile
@@ -2211,6 +2228,8 @@  rs6000_init_hard_regno_mode_ok (bool glo
       rs6000_constraints[RS6000_CONSTRAINT_ws] = (TARGET_VSX_SCALAR_MEMORY
 						  ? VSX_REGS
 						  : FLOAT_REGS);
+      if (TARGET_VSX_TIMODE)
+	rs6000_constraints[RS6000_CONSTRAINT_wt] = VSX_REGS;
     }
 
   /* Add conditional constraints based on various options, to allow us to
@@ -2254,6 +2273,11 @@  rs6000_init_hard_regno_mode_ok (bool glo
 	      rs6000_vector_reload[DDmode][0]  = CODE_FOR_reload_dd_di_store;
 	      rs6000_vector_reload[DDmode][1]  = CODE_FOR_reload_dd_di_load;
 	    }
+	  if (TARGET_VSX_TIMODE)
+	    {
+	      rs6000_vector_reload[TImode][0]  = CODE_FOR_reload_ti_di_store;
+	      rs6000_vector_reload[TImode][1]  = CODE_FOR_reload_ti_di_load;
+	    }
 	}
       else
 	{
@@ -2276,6 +2300,11 @@  rs6000_init_hard_regno_mode_ok (bool glo
 	      rs6000_vector_reload[DDmode][0]  = CODE_FOR_reload_dd_si_store;
 	      rs6000_vector_reload[DDmode][1]  = CODE_FOR_reload_dd_si_load;
 	    }
+	  if (TARGET_VSX_TIMODE)
+	    {
+	      rs6000_vector_reload[TImode][0]  = CODE_FOR_reload_ti_si_store;
+	      rs6000_vector_reload[TImode][1]  = CODE_FOR_reload_ti_si_load;
+	    }
 	}
     }
 
@@ -2784,6 +2813,13 @@  rs6000_option_override_internal (bool gl
   else if (TARGET_ALTIVEC)
     rs6000_isa_flags |= (OPTION_MASK_PPC_GFXOPT & ~rs6000_isa_flags_explicit);
 
+  if (TARGET_VSX_TIMODE && !TARGET_VSX)
+    {
+      if (rs6000_isa_flags_explicit & OPTION_MASK_VSX_TIMODE)
+	error ("-mvsx-timode requires -mvsx");
+      rs6000_isa_flags &= ~OPTION_MASK_VSX_TIMODE;
+    }
+
   if (TARGET_DEBUG_REG || TARGET_DEBUG_TARGET)
     rs6000_print_isa_options (stderr, 0, "after defaults", rs6000_isa_flags);
 
@@ -5062,7 +5098,7 @@  invalid_e500_subreg (rtx op, enum machin
 	 purpose.  */
       if (GET_CODE (op) == SUBREG
 	  && (mode == SImode || mode == DImode || mode == TImode
-	      || mode == DDmode || mode == TDmode)
+	      || mode == DDmode || mode == TDmode || mode == PTImode)
 	  && REG_P (SUBREG_REG (op))
 	  && (GET_MODE (SUBREG_REG (op)) == DFmode
 	      || GET_MODE (SUBREG_REG (op)) == TFmode))
@@ -5075,6 +5111,7 @@  invalid_e500_subreg (rtx op, enum machin
 	  && REG_P (SUBREG_REG (op))
 	  && (GET_MODE (SUBREG_REG (op)) == DImode
 	      || GET_MODE (SUBREG_REG (op)) == TImode
+	      || GET_MODE (SUBREG_REG (op)) == PTImode
 	      || GET_MODE (SUBREG_REG (op)) == DDmode
 	      || GET_MODE (SUBREG_REG (op)) == TDmode))
 	return true;
@@ -5297,7 +5334,11 @@  reg_offset_addressing_ok_p (enum machine
     case V4SImode:
     case V2DFmode:
     case V2DImode:
-      /* AltiVec/VSX vector modes.  Only reg+reg addressing is valid.  */
+    case TImode:
+      /* AltiVec/VSX vector modes.  Only reg+reg addressing is valid.  While
+	 TImode is not a vector mode, if we want to use the VSX registers to
+	 move it around, we need to restrict ourselves to reg+reg
+	 addressing.  */
       if (VECTOR_MEM_ALTIVEC_OR_VSX_P (mode))
 	return false;
       break;
@@ -5550,7 +5591,7 @@  rs6000_legitimate_offset_address_p (enum
 
       /* If we are using VSX scalar loads, restrict ourselves to reg+reg
 	 addressing.  */
-      if (mode == DFmode && VECTOR_MEM_VSX_P (DFmode))
+      if (VECTOR_MEM_VSX_P (mode))
 	return false;
 
       if (!worst_case)
@@ -5564,6 +5605,7 @@  rs6000_legitimate_offset_address_p (enum
     case TFmode:
     case TDmode:
     case TImode:
+    case PTImode:
       if (TARGET_E500_DOUBLE)
 	return (SPE_CONST_OFFSET_OK (offset)
 		&& SPE_CONST_OFFSET_OK (offset + 8));
@@ -5737,11 +5779,12 @@  rs6000_legitimize_address (rtx x, rtx ol
     case TFmode:
     case TDmode:
     case TImode:
+    case PTImode:
       /* As in legitimate_offset_address_p we do not assume
 	 worst-case.  The mode here is just a hint as to the registers
 	 used.  A TImode is usually in gprs, but may actually be in
 	 fprs.  Leave worst-case scenario for reload to handle via
-	 insn constraints.  */
+	 insn constraints.  PTImode is only GPRs.  */
       extra = 8;
       break;
     default:
@@ -6472,7 +6515,7 @@  rs6000_legitimize_reload_address (rtx x,
       && !(TARGET_E500_DOUBLE && (mode == DFmode || mode == TFmode
 				  || mode == DDmode || mode == TDmode
 				  || mode == DImode))
-      && VECTOR_MEM_NONE_P (mode))
+      && (!VECTOR_MODE_P (mode) || VECTOR_MEM_NONE_P (mode)))
     {
       HOST_WIDE_INT val = INTVAL (XEXP (x, 1));
       HOST_WIDE_INT low = ((val & 0xffff) ^ 0x8000) - 0x8000;
@@ -6503,7 +6546,7 @@  rs6000_legitimize_reload_address (rtx x,
 
   if (GET_CODE (x) == SYMBOL_REF
       && reg_offset_p
-      && VECTOR_MEM_NONE_P (mode)
+      && (!VECTOR_MODE_P (mode) || VECTOR_MEM_NONE_P (mode))
       && !SPE_VECTOR_MODE (mode)
 #if TARGET_MACHO
       && DEFAULT_ABI == ABI_DARWIN
@@ -6529,6 +6572,8 @@  rs6000_legitimize_reload_address (rtx x,
 	 mem is sufficiently aligned.  */
       && mode != TFmode
       && mode != TDmode
+      && (mode != TImode || !TARGET_VSX_TIMODE)
+      && mode != PTImode
       && (mode != DImode || TARGET_POWERPC64)
       && ((mode != DFmode && mode != DDmode) || TARGET_POWERPC64
 	  || (TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT)))
@@ -6650,10 +6695,12 @@  rs6000_legitimate_address_p (enum machin
   if (legitimate_indirect_address_p (x, reg_ok_strict))
     return 1;
   if ((GET_CODE (x) == PRE_INC || GET_CODE (x) == PRE_DEC)
-      && !VECTOR_MEM_ALTIVEC_OR_VSX_P (mode)
+      && !ALTIVEC_OR_VSX_VECTOR_MODE (mode)
       && !SPE_VECTOR_MODE (mode)
       && mode != TFmode
       && mode != TDmode
+      && mode != TImode
+      && mode != PTImode
       /* Restrict addressing for DI because of our SUBREG hackery.  */
       && !(TARGET_E500_DOUBLE
 	   && (mode == DFmode || mode == DDmode || mode == DImode))
@@ -6678,26 +6725,28 @@  rs6000_legitimate_address_p (enum machin
     return 1;
   if (rs6000_legitimate_offset_address_p (mode, x, reg_ok_strict, false))
     return 1;
-  if (mode != TImode
-      && mode != TFmode
+  if (mode != TFmode
       && mode != TDmode
       && ((TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT)
 	  || TARGET_POWERPC64
 	  || (mode != DFmode && mode != DDmode)
 	  || (TARGET_E500_DOUBLE && mode != DDmode))
       && (TARGET_POWERPC64 || mode != DImode)
+      && (mode != TImode || VECTOR_MEM_VSX_P (TImode))
+      && mode != PTImode
       && !avoiding_indexed_address_p (mode)
       && legitimate_indexed_address_p (x, reg_ok_strict))
     return 1;
   if (GET_CODE (x) == PRE_MODIFY
       && mode != TImode
+      && mode != PTImode
       && mode != TFmode
       && mode != TDmode
       && ((TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT)
 	  || TARGET_POWERPC64
 	  || ((mode != DFmode && mode != DDmode) || TARGET_E500_DOUBLE))
       && (TARGET_POWERPC64 || mode != DImode)
-      && !VECTOR_MEM_ALTIVEC_OR_VSX_P (mode)
+      && !ALTIVEC_OR_VSX_VECTOR_MODE (mode)
       && !SPE_VECTOR_MODE (mode)
       /* Restrict addressing for DI because of our SUBREG hackery.  */
       && !(TARGET_E500_DOUBLE
@@ -7140,7 +7189,7 @@  rs6000_emit_set_long_const (rtx dest, HO
 }
 
 /* Helper for the following.  Get rid of [r+r] memory refs
-   in cases where it won't work (TImode, TFmode, TDmode).  */
+   in cases where it won't work (TImode, TFmode, TDmode, PTImode).  */
 
 static void
 rs6000_eliminate_indexed_memrefs (rtx operands[2])
@@ -7524,6 +7573,11 @@  rs6000_emit_move (rtx dest, rtx source, 
       break;
 
     case TImode:
+      if (!VECTOR_MEM_VSX_P (TImode))
+	rs6000_eliminate_indexed_memrefs (operands);
+      break;
+
+    case PTImode:
       rs6000_eliminate_indexed_memrefs (operands);
       break;
 
@@ -13893,7 +13947,7 @@  rs6000_secondary_reload (bool in_p,
 	  if (rclass == GENERAL_REGS || rclass == BASE_REGS)
 	    {
 	      if (!legitimate_indirect_address_p (addr, false)
-		  && !rs6000_legitimate_offset_address_p (TImode, addr,
+		  && !rs6000_legitimate_offset_address_p (PTImode, addr,
 							  false, true))
 		{
 		  sri->icode = icode;
@@ -13903,8 +13957,20 @@  rs6000_secondary_reload (bool in_p,
 				     + ((GET_CODE (addr) == AND) ? 1 : 0));
 		}
 	    }
-	  /* Loads to and stores from vector registers can only do reg+reg
-	     addressing.  Altivec registers can also do (reg+reg)&(-16).  */
+         /* Allow scalar loads to/from the traditional floating point
+            registers, even if VSX memory is set.  */
+         else if ((rclass == FLOAT_REGS || rclass == NO_REGS)
+                  && (GET_MODE_SIZE (mode) == 4 || GET_MODE_SIZE (mode) == 8)
+                  && (legitimate_indirect_address_p (addr, false)
+                      || legitimate_indirect_address_p (XEXP (addr, 0), false)
+                      || rs6000_legitimate_offset_address_p (mode, addr,
+                                                             false, true)))
+
+           ;
+         /* Loads to and stores from vector registers can only do reg+reg
+            addressing.  Altivec registers can also do (reg+reg)&(-16).  Allow
+            scalar modes loading up the traditional floating point registers
+            to use offset addresses.  */
 	  else if (rclass == VSX_REGS || rclass == ALTIVEC_REGS
 		   || rclass == FLOAT_REGS || rclass == NO_REGS)
 	    {
@@ -14130,7 +14196,7 @@  rs6000_secondary_reload_inner (rtx reg, 
 
       if (GET_CODE (addr) == PLUS
 	  && (and_op2 != NULL_RTX
-	      || !rs6000_legitimate_offset_address_p (TImode, addr,
+	      || !rs6000_legitimate_offset_address_p (PTImode, addr,
 						      false, true)))
 	{
 	  addr_op1 = XEXP (addr, 0);
@@ -14164,7 +14230,7 @@  rs6000_secondary_reload_inner (rtx reg, 
 	  scratch_or_premodify = scratch;
 	}
       else if (!legitimate_indirect_address_p (addr, false)
-	       && !rs6000_legitimate_offset_address_p (TImode, addr,
+	       && !rs6000_legitimate_offset_address_p (PTImode, addr,
 						       false, true))
 	{
 	  if (TARGET_DEBUG_ADDR)
@@ -14180,9 +14246,21 @@  rs6000_secondary_reload_inner (rtx reg, 
 	}
       break;
 
-      /* Float/Altivec registers can only handle reg+reg addressing.  Move
-	 other addresses into a scratch register.  */
+      /* Float registers can do offset+reg addressing for scalar types.  */
     case FLOAT_REGS:
+      if (legitimate_indirect_address_p (addr, false)	/* reg */
+	  || legitimate_indexed_address_p (addr, false)	/* reg+reg */
+	  || ((GET_MODE_SIZE (mode) == 4 || GET_MODE_SIZE (mode) == 8)
+	      && and_op2 == NULL_RTX
+	      && scratch_or_premodify == scratch
+	      && rs6000_legitimate_offset_address_p (mode, addr, false, false)))
+	break;
+
+      /* If this isn't a legacy floating point load/store, fall through to the
+	 VSX defaults.  */
+
+      /* VSX/Altivec registers can only handle reg+reg addressing.  Move other
+	 addresses into a scratch register.  */
     case VSX_REGS:
     case ALTIVEC_REGS:
 
@@ -14202,7 +14280,9 @@  rs6000_secondary_reload_inner (rtx reg, 
       /* If we aren't using a VSX load, save the PRE_MODIFY register and use it
 	 as the address later.  */
       if (GET_CODE (addr) == PRE_MODIFY
-	  && (!VECTOR_MEM_VSX_P (mode)
+	  && ((ALTIVEC_OR_VSX_VECTOR_MODE (mode)
+	       && (rclass != FLOAT_REGS
+		   || (GET_MODE_SIZE (mode) != 4 && GET_MODE_SIZE (mode) != 8)))
 	      || and_op2 != NULL_RTX
 	      || !legitimate_indexed_address_p (XEXP (addr, 1), false)))
 	{
@@ -14218,16 +14298,12 @@  rs6000_secondary_reload_inner (rtx reg, 
 
       if (legitimate_indirect_address_p (addr, false)	/* reg */
 	  || legitimate_indexed_address_p (addr, false)	/* reg+reg */
-	  || GET_CODE (addr) == PRE_MODIFY		/* VSX pre-modify */
 	  || (GET_CODE (addr) == AND			/* Altivec memory */
+	      && rclass == ALTIVEC_REGS
 	      && GET_CODE (XEXP (addr, 1)) == CONST_INT
 	      && INTVAL (XEXP (addr, 1)) == -16
-	      && VECTOR_MEM_ALTIVEC_P (mode))
-	  || (rclass == FLOAT_REGS			/* legacy float mem */
-	      && GET_MODE_SIZE (mode) == 8
-	      && and_op2 == NULL_RTX
-	      && scratch_or_premodify == scratch
-	      && rs6000_legitimate_offset_address_p (mode, addr, false, false)))
+	      && (legitimate_indirect_address_p (XEXP (addr, 0), false)
+		  || legitimate_indexed_address_p (XEXP (addr, 0), false))))
 	;
 
       else if (GET_CODE (addr) == PLUS)
@@ -14254,7 +14330,8 @@  rs6000_secondary_reload_inner (rtx reg, 
 	}
 
       else if (GET_CODE (addr) == SYMBOL_REF || GET_CODE (addr) == CONST
-	       || GET_CODE (addr) == CONST_INT || REG_P (addr))
+	       || GET_CODE (addr) == CONST_INT || GET_CODE (addr) == LO_SUM
+	       || REG_P (addr))
 	{
 	  if (TARGET_DEBUG_ADDR)
 	    {
@@ -14639,11 +14716,17 @@  rs6000_secondary_reload_class (enum reg_
     return (mode != SDmode) ? NO_REGS : GENERAL_REGS;
 
   /* Memory, and FP/altivec registers can go into fp/altivec registers under
-     VSX.  */
+     VSX.  However, for scalar variables, use the traditional floating point
+     registers so that we can use offset+register addressing.  */
   if (TARGET_VSX
       && (regno == -1 || VSX_REGNO_P (regno))
       && VSX_REG_CLASS_P (rclass))
-    return NO_REGS;
+    {
+      if (GET_MODE_SIZE (mode) < 16)
+	return FLOAT_REGS;
+
+      return NO_REGS;
+    }
 
   /* Memory, and AltiVec registers can go into AltiVec registers.  */
   if ((regno == -1 || ALTIVEC_REGNO_P (regno))
@@ -14688,8 +14771,35 @@  rs6000_cannot_change_mode_class (enum ma
   if (from_size != to_size)
     {
       enum reg_class xclass = (TARGET_VSX) ? VSX_REGS : FLOAT_REGS;
-      return ((from_size < 8 || to_size < 8 || TARGET_IEEEQUAD)
-	      && reg_classes_intersect_p (xclass, rclass));
+
+      if (reg_classes_intersect_p (xclass, rclass))
+	{
+	  unsigned to_nregs = hard_regno_nregs[FIRST_FPR_REGNO][to];
+	  unsigned from_nregs = hard_regno_nregs[FIRST_FPR_REGNO][from];
+
+	  /* Don't allow 64-bit types to overlap with 128-bit types that take a
+	     single register under VSX because the scalar part of the register
+	     is in the upper 64-bits, and not the lower 64-bits.  Types like
+	     TFmode/TDmode that take 2 scalar register can overlap.  128-bit
+	     IEEE floating point can't overlap, and neither can small
+	     values.  */
+
+	  if (TARGET_IEEEQUAD && (to == TFmode || from == TFmode))
+	    return true;
+
+	  if (from_size < 8 || to_size < 8)
+	    return true;
+
+	  if (from_size == 8 && (8 * to_nregs) != to_size)
+	    return true;
+
+	  if (to_size == 8 && (8 * from_nregs) != from_size)
+	    return true;
+
+	  return false;
+	}
+      else
+	return false;
     }
 
   if (TARGET_E500_DOUBLE
@@ -14703,9 +14813,18 @@  rs6000_cannot_change_mode_class (enum ma
   /* Since the VSX register set includes traditional floating point registers
      and altivec registers, just check for the size being different instead of
      trying to check whether the modes are vector modes.  Otherwise it won't
-     allow say DF and DI to change classes.  */
+     allow say DF and DI to change classes.  For types like TFmode and TDmode
+     that take 2 64-bit registers, rather than a single 128-bit register, don't
+     allow subregs of those types to other 128 bit types.  */
   if (TARGET_VSX && VSX_REG_CLASS_P (rclass))
-    return (from_size != 8 && from_size != 16);
+    {
+      unsigned num_regs = (from_size + 15) / 16;
+      if (hard_regno_nregs[FIRST_FPR_REGNO][to] > num_regs
+	  || hard_regno_nregs[FIRST_FPR_REGNO][from] > num_regs)
+	return true;
+
+      return (from_size != 8 && from_size != 16);
+    }
 
   if (TARGET_ALTIVEC && rclass == ALTIVEC_REGS
       && (ALTIVEC_VECTOR_MODE (from) + ALTIVEC_VECTOR_MODE (to)) == 1)
@@ -15440,7 +15559,7 @@  print_operand (FILE *file, rtx x, int co
       return;
 
     case 'Y':
-      /* Like 'L', for third word of TImode  */
+      /* Like 'L', for third word of TImode/PTImode  */
       if (REG_P (x))
 	fputs (reg_names[REGNO (x) + 2], file);
       else if (MEM_P (x))
@@ -15490,7 +15609,7 @@  print_operand (FILE *file, rtx x, int co
       return;
 
     case 'Z':
-      /* Like 'L', for last word of TImode.  */
+      /* Like 'L', for last word of TImode/PTImode.  */
       if (REG_P (x))
 	fputs (reg_names[REGNO (x) + 3], file);
       else if (MEM_P (x))
@@ -15521,7 +15640,8 @@  print_operand (FILE *file, rtx x, int co
 	if ((TARGET_SPE || TARGET_E500_DOUBLE)
 	    && (GET_MODE_SIZE (GET_MODE (x)) == 8
 		|| GET_MODE (x) == TFmode
-		|| GET_MODE (x) == TImode))
+		|| GET_MODE (x) == TImode
+		|| GET_MODE (x) == PTImode))
 	  {
 	    /* Handle [reg].  */
 	    if (REG_P (tmp))
@@ -26574,7 +26694,7 @@  rs6000_register_move_cost (enum machine_
     }
 
   /* If we have VSX, we can easily move between FPR or Altivec registers.  */
-  else if (VECTOR_UNIT_VSX_P (mode)
+  else if (VECTOR_MEM_VSX_P (mode)
 	   && reg_classes_intersect_p (to, VSX_REGS)
 	   && reg_classes_intersect_p (from, VSX_REGS))
     ret = 2 * hard_regno_nregs[32][mode];
@@ -26615,7 +26735,8 @@  rs6000_memory_move_cost (enum machine_mo
 
   if (reg_classes_intersect_p (rclass, GENERAL_REGS))
     ret = 4 * hard_regno_nregs[0][mode];
-  else if (reg_classes_intersect_p (rclass, FLOAT_REGS))
+  else if ((reg_classes_intersect_p (rclass, FLOAT_REGS)
+	    || reg_classes_intersect_p (rclass, VSX_REGS)))
     ret = 4 * hard_regno_nregs[32][mode];
   else if (reg_classes_intersect_p (rclass, ALTIVEC_REGS))
     ret = 4 * hard_regno_nregs[FIRST_ALTIVEC_REGNO][mode];
@@ -27829,6 +27950,7 @@  static struct rs6000_opt_mask const rs60
   { "recip-precision",		OPTION_MASK_RECIP_PRECISION,	false, true  },
   { "string",			OPTION_MASK_STRING,		false, true  },
   { "vsx",			OPTION_MASK_VSX,		false, true  },
+  { "vsx-timode",		OPTION_MASK_VSX_TIMODE,		false, true  },
 #ifdef OPTION_MASK_64BIT
 #if TARGET_AIX_OS
   { "aix64",			OPTION_MASK_64BIT,		false, false },
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 195592)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -48,7 +48,7 @@  (define_mode_attr VSm  [(V16QI "vw4")
 			(V2DF  "vd2")
 			(V2DI  "vd2")
 			(DF    "d")
-			(TI    "vw4")])
+			(TI    "vd2")])
 
 ;; Map into the appropriate suffix based on the type
 (define_mode_attr VSs	[(V16QI "sp")
@@ -59,7 +59,7 @@  (define_mode_attr VSs	[(V16QI "sp")
 			 (V2DI  "dp")
 			 (DF    "dp")
 			 (SF	"sp")
-			 (TI    "sp")])
+			 (TI    "dp")])
 
 ;; Map the register class used
 (define_mode_attr VSr	[(V16QI "v")
@@ -70,7 +70,7 @@  (define_mode_attr VSr	[(V16QI "v")
 			 (V2DF  "wd")
 			 (DF    "ws")
 			 (SF	"d")
-			 (TI    "wd")])
+			 (TI    "wt")])
 
 ;; Map the register class used for float<->int conversions
 (define_mode_attr VSr2	[(V2DF  "wd")
@@ -115,7 +115,6 @@  (define_mode_attr VSv	[(V16QI "v")
 			 (V4SF  "v")
 			 (V2DI  "v")
 			 (V2DF  "v")
-			 (TI    "v")
 			 (DF    "s")])
 
 ;; Appropriate type for add ops (and other simple FP ops)
@@ -268,12 +267,13 @@  (define_insn "*vsx_mov<mode>"
 }
   [(set_attr "type" "vecstore,vecload,vecsimple,vecstore,vecload,vecsimple,*,*,*,vecsimple,vecsimple,*,vecstore,vecload")])
 
-;; Unlike other VSX moves, allow the GPRs, since a normal use of TImode is for
-;; unions.  However for plain data movement, slightly favor the vector loads
-(define_insn "*vsx_movti"
-  [(set (match_operand:TI 0 "nonimmediate_operand" "=Z,wa,wa,?Y,?r,?r,wa,v,v,wZ")
-	(match_operand:TI 1 "input_operand" "wa,Z,wa,r,Y,r,j,W,wZ,v"))]
-  "VECTOR_MEM_VSX_P (TImode)
+;; Unlike other VSX moves, allow the GPRs even for reloading, since a normal
+;; use of TImode is for unions.  However for plain data movement, slightly
+;; favor the vector loads
+(define_insn "*vsx_movti_64bit"
+  [(set (match_operand:TI 0 "nonimmediate_operand" "=Z,wa,wa,wa,v, v,wZ,?Y,?r,?r")
+	(match_operand:TI 1 "input_operand"        "wa, Z,wa, j,W,wZ, v, r, Y, r"))]
+  "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (TImode)
    && (register_operand (operands[0], TImode) 
        || register_operand (operands[1], TImode))"
 {
@@ -289,27 +289,87 @@  (define_insn "*vsx_movti"
       return "xxlor %x0,%x1,%x1";
 
     case 3:
+      return "xxlxor %x0,%x0,%x0";
+
     case 4:
+      return output_vec_const_move (operands);
+
     case 5:
-      return "#";
+      return "stvx %1,%y0";
 
     case 6:
-      return "xxlxor %x0,%x0,%x0";
+      return "lvx %0,%y1";
 
     case 7:
+    case 8:
+    case 9:
+      return "#";
+
+    default:
+      gcc_unreachable ();
+    }
+}
+  [(set_attr "type" "vecstore,vecload,vecsimple,vecsimple,vecsimple,vecstore,vecload,*,*,*")
+   (set_attr "length" "     4,      4,        4,       4,         8,       4,      4,8,8,8")])
+
+(define_insn "*vsx_movti_32bit"
+  [(set (match_operand:TI 0 "nonimmediate_operand" "=Z,wa,wa,wa,v, v,wZ,Q,Y,????r,????r,????r,r")
+	(match_operand:TI 1 "input_operand"        "wa, Z,wa, j,W,wZ, v,r,r,    Q,    Y,    r,n"))]
+  "! TARGET_POWERPC64 && VECTOR_MEM_VSX_P (TImode)
+   && (register_operand (operands[0], TImode)
+       || register_operand (operands[1], TImode))"
+{
+  switch (which_alternative)
+    {
+    case 0:
+      return "stxvd2x %x1,%y0";
+
+    case 1:
+      return "lxvd2x %x0,%y1";
+
+    case 2:
+      return "xxlor %x0,%x1,%x1";
+
+    case 3:
+      return "xxlxor %x0,%x0,%x0";
+
+    case 4:
       return output_vec_const_move (operands);
 
-    case 8:
+    case 5:
       return "stvx %1,%y0";
 
-    case 9:
+    case 6:
       return "lvx %0,%y1";
 
+    case 7:
+      if (TARGET_STRING)
+        return \"stswi %1,%P0,16\";
+
+    case 8:
+      return \"#\";
+
+    case 9:
+      /* If the address is not used in the output, we can use lsi.  Otherwise,
+	 fall through to generating four loads.  */
+      if (TARGET_STRING
+          && ! reg_overlap_mentioned_p (operands[0], operands[1]))
+	return \"lswi %0,%P1,16\";
+      /* ... fall through ...  */
+
+    case 10:
+    case 11:
+    case 12:
+      return \"#\";
     default:
       gcc_unreachable ();
     }
 }
-  [(set_attr "type" "vecstore,vecload,vecsimple,*,*,*,vecsimple,*,vecstore,vecload")])
+  [(set_attr "type" "vecstore,vecload,vecsimple,vecsimple,vecsimple,vecstore,vecload,store_ux,store_ux,load_ux,load_ux, *, *")
+   (set_attr "length" "     4,      4,        4,       4,         8,       4,      4,      16,      16,     16,     16,16,16")
+   (set (attr "cell_micro") (if_then_else (match_test "TARGET_STRING")
+   			                  (const_string "always")
+					  (const_string "conditional")))])
 
 ;; Explicit  load/store expanders for the builtin functions
 (define_expand "vsx_load_<mode>"
@@ -319,8 +379,8 @@  (define_expand "vsx_load_<mode>"
   "")
 
 (define_expand "vsx_store_<mode>"
-  [(set (match_operand:VEC_M 0 "memory_operand" "")
-	(match_operand:VEC_M 1 "vsx_register_operand" ""))]
+  [(set (match_operand:VSX_M 0 "memory_operand" "")
+	(match_operand:VSX_M 1 "vsx_register_operand" ""))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
   "")
 
@@ -1026,38 +1086,46 @@  (define_insn "*vsx_float_fix_<mode>2"
    (set_attr "fp_type" "<VSfptype_simple>")])
 
 
-;; Logical and permute operations
+;; Logical operations
+;; Do not support TImode logical instructions on 32-bit at present, because the
+;; compiler will see that we have a TImode and when it wanted DImode, and
+;; convert the DImode to TImode, store it on the stack, and load it in a VSX
+;; register.
 (define_insn "*vsx_and<mode>3"
   [(set (match_operand:VSX_L 0 "vsx_register_operand" "=<VSr>,?wa")
         (and:VSX_L
-	 (match_operand:VSX_L 1 "vsx_register_operand" "<VSr>,?wa")
-	 (match_operand:VSX_L 2 "vsx_register_operand" "<VSr>,?wa")))]
-  "VECTOR_MEM_VSX_P (<MODE>mode)"
+	 (match_operand:VSX_L 1 "vsx_register_operand" "<VSr>,wa")
+	 (match_operand:VSX_L 2 "vsx_register_operand" "<VSr>,wa")))]
+  "VECTOR_MEM_VSX_P (<MODE>mode)
+   && (<MODE>mode != TImode || TARGET_POWERPC64)"
   "xxland %x0,%x1,%x2"
   [(set_attr "type" "vecsimple")])
 
 (define_insn "*vsx_ior<mode>3"
   [(set (match_operand:VSX_L 0 "vsx_register_operand" "=<VSr>,?wa")
-        (ior:VSX_L (match_operand:VSX_L 1 "vsx_register_operand" "<VSr>,?wa")
-		   (match_operand:VSX_L 2 "vsx_register_operand" "<VSr>,?wa")))]
-  "VECTOR_MEM_VSX_P (<MODE>mode)"
+        (ior:VSX_L (match_operand:VSX_L 1 "vsx_register_operand" "<VSr>,wa")
+		   (match_operand:VSX_L 2 "vsx_register_operand" "<VSr>,wa")))]
+  "VECTOR_MEM_VSX_P (<MODE>mode)
+   && (<MODE>mode != TImode || TARGET_POWERPC64)"
   "xxlor %x0,%x1,%x2"
   [(set_attr "type" "vecsimple")])
 
 (define_insn "*vsx_xor<mode>3"
   [(set (match_operand:VSX_L 0 "vsx_register_operand" "=<VSr>,?wa")
         (xor:VSX_L
-	 (match_operand:VSX_L 1 "vsx_register_operand" "<VSr>,?wa")
-	 (match_operand:VSX_L 2 "vsx_register_operand" "<VSr>,?wa")))]
-  "VECTOR_MEM_VSX_P (<MODE>mode)"
+	 (match_operand:VSX_L 1 "vsx_register_operand" "<VSr>,wa")
+	 (match_operand:VSX_L 2 "vsx_register_operand" "<VSr>,wa")))]
+  "VECTOR_MEM_VSX_P (<MODE>mode)
+   && (<MODE>mode != TImode || TARGET_POWERPC64)"
   "xxlxor %x0,%x1,%x2"
   [(set_attr "type" "vecsimple")])
 
 (define_insn "*vsx_one_cmpl<mode>2"
   [(set (match_operand:VSX_L 0 "vsx_register_operand" "=<VSr>,?wa")
         (not:VSX_L
-	 (match_operand:VSX_L 1 "vsx_register_operand" "<VSr>,?wa")))]
-  "VECTOR_MEM_VSX_P (<MODE>mode)"
+	 (match_operand:VSX_L 1 "vsx_register_operand" "<VSr>,wa")))]
+  "VECTOR_MEM_VSX_P (<MODE>mode)
+   && (<MODE>mode != TImode || TARGET_POWERPC64)"
   "xxlnor %x0,%x1,%x1"
   [(set_attr "type" "vecsimple")])
   
@@ -1067,7 +1135,8 @@  (define_insn "*vsx_nor<mode>3"
 	 (ior:VSX_L
 	  (match_operand:VSX_L 1 "vsx_register_operand" "<VSr>,?wa")
 	  (match_operand:VSX_L 2 "vsx_register_operand" "<VSr>,?wa"))))]
-  "VECTOR_MEM_VSX_P (<MODE>mode)"
+  "VECTOR_MEM_VSX_P (<MODE>mode)
+   && (<MODE>mode != TImode || TARGET_POWERPC64)"
   "xxlnor %x0,%x1,%x2"
   [(set_attr "type" "vecsimple")])
 
@@ -1077,7 +1146,8 @@  (define_insn "*vsx_andc<mode>3"
 	 (not:VSX_L
 	  (match_operand:VSX_L 2 "vsx_register_operand" "<VSr>,?wa"))
 	 (match_operand:VSX_L 1 "vsx_register_operand" "<VSr>,?wa")))]
-  "VECTOR_MEM_VSX_P (<MODE>mode)"
+  "VECTOR_MEM_VSX_P (<MODE>mode)
+   && (<MODE>mode != TImode || TARGET_POWERPC64)"
   "xxlandc %x0,%x1,%x2"
   [(set_attr "type" "vecsimple")])
 
@@ -1086,11 +1156,10 @@  (define_insn "*vsx_andc<mode>3"
 
 ;; Build a V2DF/V2DI vector from two scalars
 (define_insn "vsx_concat_<mode>"
-  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wd,?wa")
-	(unspec:VSX_D
-	 [(match_operand:<VS_scalar> 1 "vsx_register_operand" "ws,wa")
-	  (match_operand:<VS_scalar> 2 "vsx_register_operand" "ws,wa")]
-	 UNSPEC_VSX_CONCAT))]
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=<VSr>,?wa")
+	(vec_concat:VSX_D
+	 (match_operand:<VS_scalar> 1 "vsx_register_operand" "ws,wa")
+	 (match_operand:<VS_scalar> 2 "vsx_register_operand" "ws,wa")))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
   "xxpermdi %x0,%x1,%x2,0"
   [(set_attr "type" "vecperm")])
@@ -1212,8 +1281,8 @@  (define_expand "vsx_xxpermdi_<mode>"
       if (<MODE>mode != V2DImode)
 	{
 	  target = gen_lowpart (V2DImode, target);
-	  op0 = gen_lowpart (V2DImode, target);
-	  op1 = gen_lowpart (V2DImode, target);
+	  op0 = gen_lowpart (V2DImode, op0);
+	  op1 = gen_lowpart (V2DImode, op1);
 	}
     }
   emit_insn (gen (target, op0, op1, perm0, perm1));
Index: gcc/config/rs6000/rs6000.h
===================================================================
--- gcc/config/rs6000/rs6000.h	(revision 195592)
+++ gcc/config/rs6000/rs6000.h	(working copy)
@@ -504,6 +504,7 @@  extern int rs6000_vector_align[];
 #define MASK_STRING			OPTION_MASK_STRING
 #define MASK_UPDATE			OPTION_MASK_UPDATE
 #define MASK_VSX			OPTION_MASK_VSX
+#define MASK_VSX_TIMODE			OPTION_MASK_VSX_TIMODE
 
 #ifndef IN_LIBGCC2
 #define MASK_POWERPC64			OPTION_MASK_POWERPC64
@@ -1328,6 +1329,7 @@  enum r6000_reg_class_enum {
   RS6000_CONSTRAINT_wf,		/* VSX register for V4SF */
   RS6000_CONSTRAINT_wl,		/* FPR register for LFIWAX */
   RS6000_CONSTRAINT_ws,		/* VSX register for DF */
+  RS6000_CONSTRAINT_wt,		/* VSX register for TImode */
   RS6000_CONSTRAINT_wx,		/* FPR register for STFIWX */
   RS6000_CONSTRAINT_wz,		/* FPR register for LFIWZX */
   RS6000_CONSTRAINT_MAX
@@ -1514,7 +1516,7 @@  extern enum reg_class rs6000_constraints
    NONLOCAL needs twice Pmode to maintain both backchain and SP.  */
 #define STACK_SAVEAREA_MODE(LEVEL)	\
   (LEVEL == SAVE_FUNCTION ? VOIDmode	\
-  : LEVEL == SAVE_NONLOCAL ? (TARGET_32BIT ? DImode : TImode) : Pmode)
+  : LEVEL == SAVE_NONLOCAL ? (TARGET_32BIT ? DImode : PTImode) : Pmode)
 
 /* Minimum and maximum general purpose registers used to hold arguments.  */
 #define GP_ARG_MIN_REG 3
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 195625)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -219,7 +219,7 @@  (define_attr "cell_micro" "not,condition
 (define_mode_iterator GPR [SI (DI "TARGET_POWERPC64")])
 
 ; Any supported integer mode.
-(define_mode_iterator INT [QI HI SI DI TI])
+(define_mode_iterator INT [QI HI SI DI TI PTI])
 
 ; Any supported integer mode that fits in one register.
 (define_mode_iterator INT1 [QI HI SI (DI "TARGET_POWERPC64")])
@@ -234,6 +234,10 @@  (define_mode_iterator SDI [SI DI])
 ; (one with a '.') will compare; and the size used for arithmetic carries.
 (define_mode_iterator P [(SI "TARGET_32BIT") (DI "TARGET_64BIT")])
 
+; Iterator to add PTImode along with TImode (TImode can go in VSX registers,
+; PTImode is GPR only)
+(define_mode_iterator TI2 [TI PTI])
+
 ; Any hardware-supported floating-point mode
 (define_mode_iterator FP [
   (SF "TARGET_HARD_FLOAT 
@@ -304,7 +308,14 @@  (define_code_attr return_str [(return ""
 
 ; Various instructions that come in SI and DI forms.
 ; A generic w/d attribute, for things like cmpw/cmpd.
-(define_mode_attr wd [(QI "b") (HI "h") (SI "w") (DI "d")])
+(define_mode_attr wd [(QI    "b")
+		      (HI    "h")
+		      (SI    "w")
+		      (DI    "d")
+		      (V16QI "b")
+		      (V8HI  "h")
+		      (V4SI  "w")
+		      (V2DI  "d")])
 
 ; DImode bits
 (define_mode_attr dbits [(QI "56") (HI "48") (SI "32")])
@@ -8635,14 +8646,16 @@  (define_split
     FAIL;
 }")
 
-;; TImode is similar, except that we usually want to compute the address into
-;; a register and use lsi/stsi (the exception is during reload).
+;; TImode/PTImode is similar, except that we usually want to compute the
+;; address into a register and use lsi/stsi (the exception is during reload).
 
-(define_insn "*movti_string"
-  [(set (match_operand:TI 0 "reg_or_mem_operand" "=Q,Y,????r,????r,????r,r")
-	(match_operand:TI 1 "input_operand" "r,r,Q,Y,r,n"))]
+(define_insn "*mov<mode>_string"
+  [(set (match_operand:TI2 0 "reg_or_mem_operand" "=Q,Y,????r,????r,????r,r")
+	(match_operand:TI2 1 "input_operand" "r,r,Q,Y,r,n"))]
   "! TARGET_POWERPC64
-   && (gpc_reg_operand (operands[0], TImode) || gpc_reg_operand (operands[1], TImode))"
+   && (<MODE>mode != TImode || VECTOR_MEM_NONE_P (TImode))
+   && (gpc_reg_operand (operands[0], <MODE>mode)
+       || gpc_reg_operand (operands[1], <MODE>mode))"
   "*
 {
   switch (which_alternative)
@@ -8672,27 +8685,28 @@  (define_insn "*movti_string"
    			                  (const_string "always")
 					  (const_string "conditional")))])
 
-(define_insn "*movti_ppc64"
-  [(set (match_operand:TI 0 "nonimmediate_operand" "=Y,r,r")
-	(match_operand:TI 1 "input_operand" "r,Y,r"))]
-  "(TARGET_POWERPC64 && (gpc_reg_operand (operands[0], TImode)
-    || gpc_reg_operand (operands[1], TImode)))
-   && VECTOR_MEM_NONE_P (TImode)"
+(define_insn "*mov<mode>_ppc64"
+  [(set (match_operand:TI2 0 "nonimmediate_operand" "=Y,r,r")
+	(match_operand:TI2 1 "input_operand" "r,Y,r"))]
+  "(TARGET_POWERPC64
+   && (<MODE>mode != TImode || VECTOR_MEM_NONE_P (TImode))
+   && (gpc_reg_operand (operands[0], <MODE>mode)
+       || gpc_reg_operand (operands[1], <MODE>mode)))"
   "#"
   [(set_attr "type" "store,load,*")])
 
 (define_split
-  [(set (match_operand:TI 0 "gpc_reg_operand" "")
-	(match_operand:TI 1 "const_double_operand" ""))]
-  "TARGET_POWERPC64 && VECTOR_MEM_NONE_P (TImode)"
+  [(set (match_operand:TI2 0 "gpc_reg_operand" "")
+	(match_operand:TI2 1 "const_double_operand" ""))]
+  "TARGET_POWERPC64"
   [(set (match_dup 2) (match_dup 4))
    (set (match_dup 3) (match_dup 5))]
   "
 {
   operands[2] = operand_subword_force (operands[0], WORDS_BIG_ENDIAN == 0,
-				       TImode);
+				       <MODE>mode);
   operands[3] = operand_subword_force (operands[0], WORDS_BIG_ENDIAN != 0,
-				       TImode);
+				       <MODE>mode);
   if (GET_CODE (operands[1]) == CONST_DOUBLE)
     {
       operands[4] = GEN_INT (CONST_DOUBLE_HIGH (operands[1]));
@@ -8708,9 +8722,9 @@  (define_split
 }")
 
 (define_split
-  [(set (match_operand:TI 0 "nonimmediate_operand" "")
-        (match_operand:TI 1 "input_operand" ""))]
-  "reload_completed && VECTOR_MEM_NONE_P (TImode)
+  [(set (match_operand:TI2 0 "nonimmediate_operand" "")
+        (match_operand:TI2 1 "input_operand" ""))]
+  "reload_completed
    && gpr_or_gpr_p (operands[0], operands[1])"
   [(pc)]
 { rs6000_split_multireg_move (operands[0], operands[1]); DONE; })
Index: gcc/doc/md.texi
===================================================================
--- gcc/doc/md.texi	(revision 195592)
+++ gcc/doc/md.texi	(working copy)
@@ -2084,6 +2084,9 @@  If the LFIWAX instruction is enabled, a 
 @item ws
 VSX vector register to hold scalar float data
 
+@item wt
+VSX vector register to hold 128 bit integer
+
 @item wx
 If the STFIWX instruction is enabled, a floating point register