
Allow SImode to go into VSX registers on PowerPC ISA 2.07 (power8) and above

Message ID 20161026225154.GA6588@ibm-tiger.the-meissners.org
State New

Commit Message

Michael Meissner Oct. 26, 2016, 10:51 p.m. UTC
PowerPC GCC has traditionally only allowed DImode to go into FPR registers (and
now VSX registers) in order to allow floating point conversions.  Conversions
to/from SImode have always had to deal with special UNSPECs to allow the
generation of the LFIWAX, LXSIWAX, LFIWZX, LXSIWZX, STFIWX, and STXSIWX
instructions, since SImode was not allowed in the registers.

This patch adds support for ISA 2.07 (power8) and above to allow SImode values
in the vector registers.  It adds a new debug switch (-mvsx-small-integer) that
can turn off this support.
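For illustration only (this is not one of the patch's new tests), the C functions below show the source patterns the paragraphs above are about: a 32-bit integer loaded from memory and converted to floating point, and a floating-point value truncated and stored as a 32-bit integer. On ISA 2.07 these are the cases where GCC can use LFIWAX/LXSIWAX, LFIWZX/LXSIWZX and STFIWX/STXSIWX instead of passing the value through a GPR. The instruction choices named in the comments are an assumption; the actual sequence depends on the GCC version and options, so check it with `-O2 -mcpu=power8 -S`.

```c
#include <assert.h>
#include <stdint.h>

/* Signed load + convert: a candidate for LFIWAX/LXSIWAX (load word and
   sign-extend into an FPR/VSX register) followed by an int-to-FP convert.  */
double
load_int_to_double (const int32_t *p)
{
  assert (p);
  return (double) *p;
}

/* Unsigned load + convert: a candidate for LFIWZX/LXSIWZX, which
   zero-extend instead of sign-extending.  */
double
load_uint_to_double (const uint32_t *p)
{
  assert (p);
  return (double) *p;
}

/* Truncating convert + store: a candidate for an FP-to-int convert
   followed by STFIWX/STXSIWX, which store the low-order word straight
   from the FPR/VSX register.  */
void
store_double_as_int (int32_t *p, double d)
{
  assert (p);
  *p = (int32_t) d;
}
```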

I have done bootstrap builds for little endian 64-bit Power8 and big endian
64-bit Power8; both builds completed and had no regressions in the test
suite.  I have just started a big endian 64-bit Power7 build that covers
both 32-bit and 64-bit code.  If the power7 build finishes without
regressions, can I check in this patch?

In addition, I did a full SPEC 2006 run with an earlier version of the patch,
in which -mvsx-small-integer was not the default and had to be enabled with an
explicit switch.  All 29 benchmarks in the SPEC CPU 2006 suite ran.  The
436.cactusADM benchmark had a 3% improvement with -mvsx-small-integer, and all
of the other benchmarks were performance neutral.  The difference is that the
patch eliminates some GPR loads/stores and direct moves in favor of the 32-bit
integer load/store instructions that target the FPRs.

[gcc]
2016-10-26  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/constraints.md (wH constraint): Add new
	constraints for allowing 32-bit integers (and eventually 8/16-bit
	integers) into the vector registers.
	(wI constraint): Likewise.
	(wJ constraint): Likewise.
	(wK constraint): Likewise.
	* config/rs6000/rs6000-cpus.def (ISA_2_7_MASKS_SERVER): Add
	-mvsx-small-integer as a default option for ISA 2.07
	(i.e. power8).
	(POWERPC_MASKS): Likewise.
	* config/rs6000/rs6000.opt (-mvsx-small-integer): Add new debug
	switch to turn off small integer support in vector registers.
	* config/rs6000/rs6000.c (rs6000_hard_regno_mode_ok): Eliminate
	test for -mupper-regs-di, since it is already done with
	reg_addr[mode].scalar_in_vmx_p.  Add support for the switch
	-mvsx-small-integer.
	(rs6000_debug_reg_global): Add support for wH, wI, wJ, and wK
	constraints.
	(rs6000_setup_reg_addr_masks): Likewise.
	(rs6000_init_hard_regno_mode_ok): Likewise.
	(rs6000_option_override_internal): Add consistency checks for
	-mvsx-small-integer.
	(rs6000_secondary_reload_simple_move): SImode is a simple move if
	-mvsx-small-integer.
	(rs6000_secondary_reload): Use std::swap.
	(rs6000_preferred_reload_class): Don't prefer FLOAT_REGS over
	VSX_REGS for small integers in vector registers, since there is no
	D-FORM address mode for such types.
	(rs6000_register_move_cost): Use FIRST_FPR_REGNO instead of 32.
	(rs6000_opt_masks): Add -mvsx-small-integer.
	* config/rs6000/vsx.md (VSINT_84): Add SImode for small integer
	support.
	(VSX_EXTRACT_I2): Clone VSX_EXTRACT_I, but drop V4SI since SImode
	extracts can be done on ISA 2.07.
	(vsx_extract_<mode>): Add support for small integers in vsx
	registers.
	(vsx_extract_<mode>_p9): Use 'v' instead of VSX_EX, since we no
	longer support V4SImode in this pattern.
	(vsx_extract_si): New insn to support extraction of SImode in ISA
	2.07 using either xxextractuw or vspltw.
	(vsx_extract_<mode>_p8): Use 'v' instead of VSX_EX, since we no
	longer support V4SImode in this pattern.
	* config/rs6000/rs6000.h (enum rs6000_reg_class_enum): Add wH, wI,
	wJ, and wK constraints.
	* config/rs6000/rs6000.md (f32_sv): Use correct instruction for
	storing SDmode with VSX instructions.
	(zero_extendsi<mode>2): Reorder pattern, so RLDICL comes before
	the FPR and VSX loads, but before MTVSRWZ.  Remove ??, ! from the
	constraints.  Add MFVSRWZ and XXEXTRACTUW instructions to support
	small integers in vector registers.
	(extendsi<mode>2): Reorder pattern, so EXTSW comes before the FPR
	and VSX loads, but before MTVSRWA.  Remove ??, ! from the
	constraints.  Add VEXTSW2D support for small integers in vector
	registers.
	(lfiwax): Remove ! constraint.  Add VEXTSW2D support for small
	integers in vector registers.
	(floatsi<mode>2_lfiwax): If -mvsx-small-integer issue a normal
	move instead of using an UNSPEC.
	(lfiwzx): Remove ! constraint.  Add XXEXTRACTUW support for small
	integers in vector registers.
	(floatunssi<mode>2_lfiwzx): If -mvsx-small-integer issue a normal
	move instead of using an UNSPEC.
	(movsi_internal1): Add support for -mvsx-small-integer.  Align
	columns so that it is more readable.
	(SImode splitter for ISA 3.0 constants): Add splitter for
	-128..127 constants that can easily be constructed on ISA 3.0.
	* doc/md.texi (PowerPC Constraints): Document wH, wI, wJ, and wK
	constraints.

[gcc/testsuite]
2016-10-26  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* gcc.target/powerpc/vsx-simode.c: New test.
	* gcc.target/powerpc/vsx-simode2.c: Likewise.
	* gcc.target/powerpc/vsx-simode3.c: Likewise.
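The rs6000_preferred_reload_class entry above is easiest to see side by side. The sketch below is a standalone model, not GCC code: only the changed test is reproduced (the enclosing conditions in the real function are omitted), and the mode and class enums are hypothetical stand-ins. Under the old test every scalar smaller than 16 bytes in VSX_REGS was steered to FLOAT_REGS to get D-form addressing; under the new test SImode stays in VSX_REGS, because a small integer in a vector register only has REG+REG addressing anyway.

```c
#include <assert.h>

/* Hypothetical stand-ins for a few GCC machine modes and classes.  */
typedef enum { SIM_SImode, SIM_SFmode, SIM_DImode, SIM_DFmode, SIM_V4SImode } sim_mode;
typedef enum { SIM_FLOAT_REGS, SIM_VSX_REGS } sim_class;

/* Size in bytes, like GET_MODE_SIZE.  */
static int
sim_mode_size (sim_mode m)
{
  switch (m)
    {
    case SIM_SImode: case SIM_SFmode: return 4;
    case SIM_DImode: case SIM_DFmode: return 8;
    case SIM_V4SImode: return 16;
    }
  assert (0);
  return 0;
}

/* Old test: any scalar under 16 bytes is steered to FLOAT_REGS.  */
static sim_class
old_preferred (sim_mode m, sim_class rclass)
{
  if (sim_mode_size (m) < 16 && rclass == SIM_VSX_REGS)
    return SIM_FLOAT_REGS;
  return rclass;
}

/* New test: only SFmode and 8-byte scalars are steered to FLOAT_REGS.  */
static sim_class
new_preferred (sim_mode m, sim_class rclass)
{
  if (rclass == SIM_VSX_REGS
      && (m == SIM_SFmode || sim_mode_size (m) == 8))
    return SIM_FLOAT_REGS;
  return rclass;
}
```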

Comments

Michael Meissner Oct. 26, 2016, 11 p.m. UTC | #1
I forgot to mention, I will be working on a follow-on patch to this that
enables QImode and HImode to go in the vector registers for ISA 3.0, since ISA
3.0 now adds load (with zero extend) and store instructions for those types.

I will probably also update vector extract for the cases where small
integers can go in vector registers.

After this, I plan to work on setting vector elements, which will be easier if
the small integer types can go in vector registers.
Segher Boessenkool Oct. 27, 2016, 7:55 p.m. UTC | #2
On Wed, Oct 26, 2016 at 06:51:54PM -0400, Michael Meissner wrote:
> 	(zero_extendsi<mode>2): Reorder pattern, so RLDICL comes before
> 	the FPR and VSX loads, but before MTVSRWZ.  Remove ??, ! from the
> 	constraints.  Add MFVSRWZ and XXEXTRACTUW instructions to support
> 	small integers in vector registers.

"but those before MTVSRWZ"?  Or don't mention rldicl at all?

> 	(extendsi<mode>2): Reorder pattern, so EXTSW comes before the FPR
> 	and VSX loads, but before MTVSRWA.  Remove ??, ! from the
> 	constraints.  Add VEXTSW2D support for small integers in vector
> 	registers.

Similar here.

> @@ -3112,7 +3133,10 @@ rs6000_init_hard_regno_mode_ok (bool glo
>  	ww - Register class to do SF conversions in with VSX operations.
>  	wx - Float register if we can do 32-bit int stores.
>  	wy - Register class to do ISA 2.07 SF operations.
> -	wz - Float register if we can do 32-bit unsigned int loads.  */
> +	wz - Float register if we can do 32-bit unsigned int loads.
> +	wI - VSX register if SImode is allowed in VSX registers.
> +	wJ - VSX register if QImode/HImode are allowed in VSX registers.
> +	wK - Altivec register if QImode/HImode are allowed in VSX registers.  */

You don't mention wH here, is that an oversight?

>    /* Add support for various direct moves available.  In this function, we only
>       look at cases where we don't need any extra registers, and one or more
> -     simple move insns are issued.  At present, 32-bit integers are not allowed
> +     simple move insns are issued. Originally small integers are not allowed

dot space space.

> @@ -5019,7 +5023,10 @@ (define_insn_and_split "floatsi<mode>2_l
>    operands[1] = rs6000_address_for_fpconvert (operands[1]);
>    if (GET_CODE (operands[2]) == SCRATCH)
>      operands[2] = gen_reg_rtx (DImode);
> -  emit_insn (gen_lfiwax (operands[2], operands[1]));
> +  if (TARGET_VSX_SMALL_INTEGER)
> +    emit_insn (gen_extendsidi2 (operands[2], operands[1]));
> +  else    

Trailing spaces here.

> +	(match_operand:SI 1 "input_operand"
> +		"r,          U,           m,           Z,           Z,
> +		 r,          wI,          wH,          I,           L,
> +                 n,          wIwH,        O,           wM,          wB,

Indent with tabs instead of spaces, like the other lines.


Everything looks fine except for those nits, and I'm really happy there
is no benchmark degradation :-)

Please install to trunk (if the -m32 and power7 runs work out fine).

Thanks,


Segher
Michael Meissner Oct. 27, 2016, 8:54 p.m. UTC | #3
On Thu, Oct 27, 2016 at 02:55:51PM -0500, Segher Boessenkool wrote:
> On Wed, Oct 26, 2016 at 06:51:54PM -0400, Michael Meissner wrote:
> > 	(zero_extendsi<mode>2): Reorder pattern, so RLDICL comes before
> > 	the FPR and VSX loads, but before MTVSRWZ.  Remove ??, ! from the
> > 	constraints.  Add MFVSRWZ and XXEXTRACTUW instructions to support
> > 	small integers in vector registers.
> 
> "but those before MTVSRWZ"?  Or don't mention rldicl at all?
> 
> > 	(extendsi<mode>2): Reorder pattern, so EXTSW comes before the FPR
> > 	and VSX loads, but before MTVSRWA.  Remove ??, ! from the
> > 	constraints.  Add VEXTSW2D support for small integers in vector
> > 	registers.
> 
> Similar here.

I reworded it to say that RLDICL/EXTSW come after the GPR load but before
the FPR/VSX loads.

> 
> > @@ -3112,7 +3133,10 @@ rs6000_init_hard_regno_mode_ok (bool glo
> >  	ww - Register class to do SF conversions in with VSX operations.
> >  	wx - Float register if we can do 32-bit int stores.
> >  	wy - Register class to do ISA 2.07 SF operations.
> > -	wz - Float register if we can do 32-bit unsigned int loads.  */
> > +	wz - Float register if we can do 32-bit unsigned int loads.
> > +	wI - VSX register if SImode is allowed in VSX registers.
> > +	wJ - VSX register if QImode/HImode are allowed in VSX registers.
> > +	wK - Altivec register if QImode/HImode are allowed in VSX registers.  */
> 
> You don't mention wH here, is that an oversight?

Yes, thanks for catching this.
 
> >    /* Add support for various direct moves available.  In this function, we only
> >       look at cases where we don't need any extra registers, and one or more
> > -     simple move insns are issued.  At present, 32-bit integers are not allowed
> > +     simple move insns are issued. Originally small integers are not allowed
> 
> dot space space.

Thanks.

> > @@ -5019,7 +5023,10 @@ (define_insn_and_split "floatsi<mode>2_l
> >    operands[1] = rs6000_address_for_fpconvert (operands[1]);
> >    if (GET_CODE (operands[2]) == SCRATCH)
> >      operands[2] = gen_reg_rtx (DImode);
> > -  emit_insn (gen_lfiwax (operands[2], operands[1]));
> > +  if (TARGET_VSX_SMALL_INTEGER)
> > +    emit_insn (gen_extendsidi2 (operands[2], operands[1]));
> > +  else    
> 
> Trailing spaces here.

Fixed.

> > +	(match_operand:SI 1 "input_operand"
> > +		"r,          U,           m,           Z,           Z,
> > +		 r,          wI,          wH,          I,           L,
> > +                 n,          wIwH,        O,           wM,          wB,
> 
> Indent with tabs instead of spaces, like the other lines.

Thanks.

> Everything looks fine except for those nits, and I'm really happy there
> is no benchmark degradation :-)
> 
> Please install to trunk (if the -m32 and power7 runs work out fine).

Submitted as subversion id 241631.

Patch

Index: gcc/config/rs6000/constraints.md
===================================================================
--- gcc/config/rs6000/constraints.md	(svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 241274)
+++ gcc/config/rs6000/constraints.md	(.../gcc/config/rs6000)	(working copy)
@@ -159,6 +159,18 @@  (define_memory_constraint "wG"
   "Memory operand suitable for TOC fusion memory references"
   (match_operand 0 "toc_fusion_mem_wrapped"))
 
+(define_register_constraint "wH" "rs6000_constraints[RS6000_CONSTRAINT_wH]"
+  "Altivec register to hold 32-bit integers or NO_REGS.")
+
+(define_register_constraint "wI" "rs6000_constraints[RS6000_CONSTRAINT_wI]"
+  "FPR register to hold 32-bit integers or NO_REGS.")
+
+(define_register_constraint "wJ" "rs6000_constraints[RS6000_CONSTRAINT_wJ]"
+  "FPR register to hold 8/16-bit integers or NO_REGS.")
+
+(define_register_constraint "wK" "rs6000_constraints[RS6000_CONSTRAINT_wK]"
+  "Altivec register to hold 8/16-bit integers or NO_REGS.")
+
 (define_constraint "wL"
   "Int constant that is the element number mfvsrld accesses in a vector."
   (and (match_code "const_int")
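The four constraints above resolve to NO_REGS unless the corresponding feature is enabled, which lets the .md patterns name them unconditionally. The standalone C model below (hypothetical names, not GCC code) mirrors the assignments this patch adds to rs6000_init_hard_regno_mode_ok. It also shows why alternatives such as "wIwH" appear in the patterns: a multi-letter alternative accepts the union of its classes, here the FPRs (VSX registers 0-31) plus the Altivec registers (VSX registers 32-63).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical register classes encoded as bit sets, so that the union
   of two classes is a bitwise OR.  */
enum {
  CLASS_NONE    = 0,
  CLASS_FLOAT   = 1 << 0,   /* FPRs: VSX registers 0-31 */
  CLASS_ALTIVEC = 1 << 1    /* Altivec registers: VSX registers 32-63 */
};

struct small_int_constraints {
  unsigned wH, wI, wJ, wK;
};

/* Mirrors the "Support small integers in VSX registers" block.  */
static struct small_int_constraints
setup_small_int_constraints (bool vsx_small_integer, bool p9_vector)
{
  struct small_int_constraints c
    = { CLASS_NONE, CLASS_NONE, CLASS_NONE, CLASS_NONE };

  if (vsx_small_integer)
    {
      c.wH = CLASS_ALTIVEC;        /* 32-bit integers in Altivec regs */
      c.wI = CLASS_FLOAT;          /* 32-bit integers in FPRs */
      if (p9_vector)
        {
          c.wJ = CLASS_FLOAT;      /* 8/16-bit integers in FPRs */
          c.wK = CLASS_ALTIVEC;    /* 8/16-bit integers in Altivec regs */
        }
    }
  return c;
}

/* A multi-letter alternative such as "wIwH" accepts either class.  */
static unsigned
alternative_class (unsigned a, unsigned b)
{
  return a | b;
}
```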
Index: gcc/config/rs6000/rs6000-cpus.def
===================================================================
--- gcc/config/rs6000/rs6000-cpus.def	(svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 241274)
+++ gcc/config/rs6000/rs6000-cpus.def	(.../gcc/config/rs6000)	(working copy)
@@ -58,7 +58,8 @@ 
 				 | OPTION_MASK_HTM			\
 				 | OPTION_MASK_QUAD_MEMORY		\
   				 | OPTION_MASK_QUAD_MEMORY_ATOMIC	\
-				 | OPTION_MASK_UPPER_REGS_SF)
+				 | OPTION_MASK_UPPER_REGS_SF		\
+				 | OPTION_MASK_VSX_SMALL_INTEGER)
 
 /* Add ISEL back into ISA 3.0, since it is supposed to be a win.  Do not add
    P9_MINMAX until the hardware that supports it is available.  Do not add
@@ -138,6 +139,7 @@ 
 				 | OPTION_MASK_UPPER_REGS_DF		\
 				 | OPTION_MASK_UPPER_REGS_SF		\
 				 | OPTION_MASK_VSX			\
+				 | OPTION_MASK_VSX_SMALL_INTEGER	\
 				 | OPTION_MASK_VSX_TIMODE)
 
 #endif
Index: gcc/config/rs6000/rs6000.opt
===================================================================
--- gcc/config/rs6000/rs6000.opt	(svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 241274)
+++ gcc/config/rs6000/rs6000.opt	(.../gcc/config/rs6000)	(working copy)
@@ -664,3 +664,7 @@  Enable using IEEE 128-bit floating point
 mfloat128-convert
 Target Undocumented Mask(FLOAT128_CVT) Var(rs6000_isa_flags)
 Enable default conversions between __float128 & long double.
+
+mvsx-small-integer
+Target Report Mask(VSX_SMALL_INTEGER) Var(rs6000_isa_flags)
+Enable small integers to be in VSX registers.
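The new switch defaults on for ISA 2.07, but the check this patch adds to rs6000_option_override_internal clears it when direct move, -mpower8-vector or -mupper-regs-di is unavailable, and reports an error only if the user gave -mvsx-small-integer explicitly. A small standalone model of that logic, with hypothetical mask names standing in for GCC's OPTION_MASK_* bits:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits standing in for GCC's OPTION_MASK_* values.  */
enum {
  MASK_DIRECT_MOVE       = 1 << 0,
  MASK_P8_VECTOR         = 1 << 1,
  MASK_UPPER_REGS_DI     = 1 << 2,
  MASK_VSX_SMALL_INTEGER = 1 << 3
};

/* FLAGS are the effective ISA flags, EXPLICIT_FLAGS the ones set on the
   command line.  Returns the adjusted flags and sets *ERROR when the user
   explicitly asked for -mvsx-small-integer without its prerequisites.  */
static unsigned
check_small_integer (unsigned flags, unsigned explicit_flags, bool *error)
{
  const unsigned required
    = MASK_DIRECT_MOVE | MASK_P8_VECTOR | MASK_UPPER_REGS_DI;

  assert (error);
  *error = false;
  if ((flags & MASK_VSX_SMALL_INTEGER) && (flags & required) != required)
    {
      if (explicit_flags & MASK_VSX_SMALL_INTEGER)
        *error = true;                 /* GCC calls error () here */
      flags &= ~(unsigned) MASK_VSX_SMALL_INTEGER;
    }
  return flags;
}
```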
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 241274)
+++ gcc/config/rs6000/rs6000.c	(.../gcc/config/rs6000)	(working copy)
@@ -1977,8 +1977,7 @@  rs6000_hard_regno_mode_ok (int regno, ma
 	  || FLOAT128_VECTOR_P (mode)
 	  || reg_addr[mode].scalar_in_vmx_p
 	  || (TARGET_VSX_TIMODE && mode == TImode)
-	  || (TARGET_VADDUQM && mode == V1TImode)
-	  || (TARGET_UPPER_REGS_DI && mode == DImode)))
+	  || (TARGET_VADDUQM && mode == V1TImode)))
     {
       if (FP_REGNO_P (regno))
 	return FP_REGNO_P (last_regno);
@@ -2009,9 +2008,14 @@  rs6000_hard_regno_mode_ok (int regno, ma
 	  && FP_REGNO_P (last_regno))
 	return 1;
 
-      if (GET_MODE_CLASS (mode) == MODE_INT
-	  && GET_MODE_SIZE (mode) == UNITS_PER_FP_WORD)
-	return 1;
+      if (GET_MODE_CLASS (mode) == MODE_INT)
+	{
+	  if (GET_MODE_SIZE (mode) == UNITS_PER_FP_WORD)
+	    return 1;
+
+	  if (TARGET_VSX_SMALL_INTEGER && mode == SImode)
+	    return 1;
+	}
 
       if (PAIRED_SIMD_REGNO_P (regno) && TARGET_PAIRED_FLOAT
 	  && PAIRED_VECTOR_MODE (mode))
@@ -2444,6 +2448,10 @@  rs6000_debug_reg_global (void)
 	   "wx reg_class = %s\n"
 	   "wy reg_class = %s\n"
 	   "wz reg_class = %s\n"
+	   "wH reg_class = %s\n"
+	   "wI reg_class = %s\n"
+	   "wJ reg_class = %s\n"
+	   "wK reg_class = %s\n"
 	   "\n",
 	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_d]],
 	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_f]],
@@ -2471,7 +2479,11 @@  rs6000_debug_reg_global (void)
 	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_ww]],
 	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wx]],
 	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wy]],
-	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wz]]);
+	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wz]],
+	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wH]],
+	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wI]],
+	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wJ]],
+	   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wK]]);
 
   nl = "\n";
   for (m = 0; m < NUM_MACHINE_MODES; ++m)
@@ -2767,6 +2779,7 @@  rs6000_setup_reg_addr_masks (void)
     {
       machine_mode m2 = (machine_mode) m;
       bool complex_p = false;
+      bool small_int_p = (m2 == QImode || m2 == HImode || m2 == SImode);
       size_t msize;
 
       if (COMPLEX_MODE_P (m2))
@@ -2791,13 +2804,20 @@  rs6000_setup_reg_addr_masks (void)
 	  /* Can mode values go in the GPR/FPR/Altivec registers?  */
 	  if (reg >= 0 && rs6000_hard_regno_mode_ok_p[m][reg])
 	    {
+	      bool small_int_vsx_p = (small_int_p
+				      && (rc == RELOAD_REG_FPR
+					  || rc == RELOAD_REG_VMX));
+
 	      nregs = rs6000_hard_regno_nregs[m][reg];
 	      addr_mask |= RELOAD_REG_VALID;
 
 	      /* Indicate if the mode takes more than 1 physical register.  If
 		 it takes a single register, indicate it can do REG+REG
-		 addressing.  */
-	      if (nregs > 1 || m == BLKmode || complex_p)
+		 addressing.  Small integers in VSX registers can only do
+		 REG+REG addressing.  */
+	      if (small_int_vsx_p)
+		addr_mask |= RELOAD_REG_INDEXED;
+	      else if (nregs > 1 || m == BLKmode || complex_p)
 		addr_mask |= RELOAD_REG_MULTIPLE;
 	      else
 		addr_mask |= RELOAD_REG_INDEXED;
@@ -2814,6 +2834,7 @@  rs6000_setup_reg_addr_masks (void)
 		  && !VECTOR_MODE_P (m2)
 		  && !FLOAT128_VECTOR_P (m2)
 		  && !complex_p
+		  && !small_int_vsx_p
 		  && (m2 != DFmode || !TARGET_UPPER_REGS_DF)
 		  && (m2 != SFmode || !TARGET_UPPER_REGS_SF)
 		  && !(TARGET_E500_DOUBLE && msize == 8))
@@ -3112,7 +3133,10 @@  rs6000_init_hard_regno_mode_ok (bool glo
 	ww - Register class to do SF conversions in with VSX operations.
 	wx - Float register if we can do 32-bit int stores.
 	wy - Register class to do ISA 2.07 SF operations.
-	wz - Float register if we can do 32-bit unsigned int loads.  */
+	wz - Float register if we can do 32-bit unsigned int loads.
+	wI - VSX register if SImode is allowed in VSX registers.
+	wJ - VSX register if QImode/HImode are allowed in VSX registers.
+	wK - Altivec register if QImode/HImode are allowed in VSX registers.  */
 
   if (TARGET_HARD_FLOAT && TARGET_FPRS)
     rs6000_constraints[RS6000_CONSTRAINT_f] = FLOAT_REGS;	/* SFmode  */
@@ -3206,6 +3230,18 @@  rs6000_init_hard_regno_mode_ok (bool glo
   if (TARGET_DIRECT_MOVE_128)
     rs6000_constraints[RS6000_CONSTRAINT_we] = VSX_REGS;
 
+  /* Support small integers in VSX registers.  */
+  if (TARGET_VSX_SMALL_INTEGER)
+    {
+      rs6000_constraints[RS6000_CONSTRAINT_wH] = ALTIVEC_REGS;
+      rs6000_constraints[RS6000_CONSTRAINT_wI] = FLOAT_REGS;
+      if (TARGET_P9_VECTOR)
+	{
+	  rs6000_constraints[RS6000_CONSTRAINT_wJ] = FLOAT_REGS;
+	  rs6000_constraints[RS6000_CONSTRAINT_wK] = ALTIVEC_REGS;
+	}
+    }
+
   /* Set up the reload helper and direct move functions.  */
   if (TARGET_VSX || TARGET_ALTIVEC)
     {
@@ -3358,6 +3394,9 @@  rs6000_init_hard_regno_mode_ok (bool glo
 
       if (TARGET_UPPER_REGS_SF)
 	reg_addr[SFmode].scalar_in_vmx_p = true;
+
+      if (TARGET_VSX_SMALL_INTEGER)
+	reg_addr[SImode].scalar_in_vmx_p = true;
     }
 
   /* Setup the fusion operations.  */
@@ -4430,6 +4469,20 @@  rs6000_option_override_internal (bool gl
 	}
     }
 
+  /* Check whether we should allow small integers into VSX registers.  We
+     require direct move to prevent the register allocator from having to move
+     variables through memory to do moves.  SImode can be used on ISA 2.07,
+     while HImode and QImode require ISA 3.0.  */
+  if (TARGET_VSX_SMALL_INTEGER
+      && (!TARGET_DIRECT_MOVE || !TARGET_P8_VECTOR || !TARGET_UPPER_REGS_DI))
+    {
+      if (rs6000_isa_flags_explicit & OPTION_MASK_VSX_SMALL_INTEGER)
+	error ("-mvsx-small-integer requires -mpower8-vector, "
+	       "-mupper-regs-di, and -mdirect-move");
+
+      rs6000_isa_flags &= ~OPTION_MASK_VSX_SMALL_INTEGER;
+    }
+
   /* Set long double size before the IEEE 128-bit tests.  */
   if (!global_options_set.x_rs6000_long_double_type_size)
     {
@@ -20363,32 +20416,46 @@  rs6000_secondary_reload_simple_move (enu
 				     enum rs6000_reg_type from_type,
 				     machine_mode mode)
 {
-  int size;
+  int size = GET_MODE_SIZE (mode);
 
   /* Add support for various direct moves available.  In this function, we only
      look at cases where we don't need any extra registers, and one or more
-     simple move insns are issued.  At present, 32-bit integers are not allowed
+     simple move insns are issued.  Originally, small integers were not allowed
      in FPR/VSX registers.  Single precision binary floating is not a simple
      move because we need to convert to the single precision memory layout.
      The 4-byte SDmode can be moved.  TDmode values are disallowed since they
      need special direct move handling, which we do not support yet.  */
-  size = GET_MODE_SIZE (mode);
   if (TARGET_DIRECT_MOVE
-      && ((mode == SDmode) || (TARGET_POWERPC64 && size == 8))
       && ((to_type == GPR_REG_TYPE && from_type == VSX_REG_TYPE)
 	  || (to_type == VSX_REG_TYPE && from_type == GPR_REG_TYPE)))
-    return true;
+    {
+      if (TARGET_POWERPC64)
+	{
+	  /* ISA 2.07: MTVSRD or MFVSRD.  */
+	  if (size == 8)
+	    return true;
 
-  else if (TARGET_DIRECT_MOVE_128 && size == 16 && mode != TDmode
-	   && ((to_type == VSX_REG_TYPE && from_type == GPR_REG_TYPE)
-	       || (to_type == GPR_REG_TYPE && from_type == VSX_REG_TYPE)))
-    return true;
+	  /* ISA 3.0: MTVSRDD or MFVSRD + MFVSRLD.  */
+	  if (size == 16 && TARGET_P9_VECTOR && mode != TDmode)
+	    return true;
+	}
+
+      /* ISA 2.07: MTVSRWZ or MFVSRWZ.  */
+      if (TARGET_VSX_SMALL_INTEGER && mode == SImode)
+	return true;
+
+      /* ISA 2.07: MTVSRWZ or MFVSRWZ.  */
+      if (mode == SDmode)
+	return true;
+    }
 
+  /* Power6+: MFTGPR or MFFGPR.  */
   else if (TARGET_MFPGPR && TARGET_POWERPC64 && size == 8
-	   && ((to_type == GPR_REG_TYPE && from_type == FPR_REG_TYPE)
-	       || (to_type == FPR_REG_TYPE && from_type == GPR_REG_TYPE)))
+      && ((to_type == GPR_REG_TYPE && from_type == FPR_REG_TYPE)
+	  || (to_type == FPR_REG_TYPE && from_type == GPR_REG_TYPE)))
     return true;
 
+  /* Move to/from SPR.  */
   else if ((size == 4 || (TARGET_POWERPC64 && size == 8))
 	   && ((to_type == GPR_REG_TYPE && from_type == SPR_REG_TYPE)
 	       || (to_type == SPR_REG_TYPE && from_type == GPR_REG_TYPE)))
@@ -20564,11 +20631,7 @@  rs6000_secondary_reload (bool in_p,
       enum rs6000_reg_type from_type = register_to_reg_type (x, &altivec_p);
 
       if (!in_p)
-	{
-	  enum rs6000_reg_type exchange = to_type;
-	  to_type = from_type;
-	  from_type = exchange;
-	}
+	std::swap (to_type, from_type);
 
       /* Can we do a direct move of some sort?  */
       if (rs6000_secondary_reload_move (to_type, from_type, mode, sri,
@@ -21196,7 +21259,8 @@  rs6000_preferred_reload_class (rtx x, en
       /* If this is a scalar floating point value and we don't have D-form
 	 addressing, prefer the traditional floating point registers so that we
 	 can use D-form (register+offset) addressing.  */
-      if (GET_MODE_SIZE (mode) < 16 && rclass == VSX_REGS)
+      if (rclass == VSX_REGS
+	  && (mode == SFmode || GET_MODE_SIZE (mode) == 8))
 	return FLOAT_REGS;
 
       /* Prefer the Altivec registers if Altivec is handling the vector
@@ -35767,7 +35831,7 @@  rs6000_register_move_cost (machine_mode
   else if (VECTOR_MEM_VSX_P (mode)
 	   && reg_classes_intersect_p (to, VSX_REGS)
 	   && reg_classes_intersect_p (from, VSX_REGS))
-    ret = 2 * hard_regno_nregs[32][mode];
+    ret = 2 * hard_regno_nregs[FIRST_FPR_REGNO][mode];
 
   /* Moving between two similar registers is just one instruction.  */
   else if (reg_classes_intersect_p (to, from))
@@ -37373,6 +37437,7 @@  static struct rs6000_opt_mask const rs60
   { "upper-regs-df",		OPTION_MASK_UPPER_REGS_DF,	false, true  },
   { "upper-regs-sf",		OPTION_MASK_UPPER_REGS_SF,	false, true  },
   { "vsx",			OPTION_MASK_VSX,		false, true  },
+  { "vsx-small-integer",	OPTION_MASK_VSX_SMALL_INTEGER,	false, true  },
   { "vsx-timode",		OPTION_MASK_VSX_TIMODE,		false, true  },
 #ifdef OPTION_MASK_64BIT
 #if TARGET_AIX_OS
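The rewritten rs6000_secondary_reload_simple_move above is essentially a decision table. The standalone model below (not GCC code; the enums, the `sim_target` struct and `simple_move_p` are hypothetical) follows the shape of the new function so the cases are easy to check. It assumes the function returns false when no case matches, since the end of the function lies outside the hunk shown.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified inputs; in GCC these come from TARGET_* macros,
   the machine mode and the rs6000_reg_type of each side.  */
typedef enum { SIM_GPR, SIM_VSX, SIM_FPR, SIM_SPR } sim_reg_type;
typedef enum { SIM_SI, SIM_SD, SIM_DI, SIM_TI, SIM_TD } sim_mode;

struct sim_target {
  bool direct_move, powerpc64, p9_vector, vsx_small_integer, mfpgpr;
};

static int
sim_size (sim_mode m)
{
  switch (m)
    {
    case SIM_SI: case SIM_SD: return 4;
    case SIM_DI: return 8;
    default: return 16;                  /* SIM_TI, SIM_TD */
    }
}

/* True if {TO, FROM} is the unordered pair {A, B}.  */
static bool
pair_is (sim_reg_type to, sim_reg_type from, sim_reg_type a, sim_reg_type b)
{
  return (to == a && from == b) || (to == b && from == a);
}

static bool
simple_move_p (const struct sim_target *t, sim_reg_type to,
               sim_reg_type from, sim_mode m)
{
  int size = sim_size (m);

  if (t->direct_move && pair_is (to, from, SIM_GPR, SIM_VSX))
    {
      if (t->powerpc64)
        {
          if (size == 8)                               /* MTVSRD/MFVSRD */
            return true;
          if (size == 16 && t->p9_vector && m != SIM_TD)
            return true;                   /* MTVSRDD or MFVSRD+MFVSRLD */
        }
      if (t->vsx_small_integer && m == SIM_SI)         /* MTVSRWZ/MFVSRWZ */
        return true;
      if (m == SIM_SD)                                 /* MTVSRWZ/MFVSRWZ */
        return true;
    }
  else if (t->mfpgpr && t->powerpc64 && size == 8
           && pair_is (to, from, SIM_GPR, SIM_FPR))
    return true;                                       /* MFTGPR/MFFGPR */
  else if ((size == 4 || (t->powerpc64 && size == 8))
           && pair_is (to, from, SIM_GPR, SIM_SPR))
    return true;                                       /* move to/from SPR */

  return false;
}
```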
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 241274)
+++ gcc/config/rs6000/vsx.md	(.../gcc/config/rs6000)	(working copy)
@@ -263,11 +263,14 @@  (define_mode_attr VS_64reg [(V2DF	"ws")
 			    (V2DI	"wi")])
 
 ;; Iterators for loading constants with xxspltib
-(define_mode_iterator VSINT_84  [V4SI V2DI DI])
+(define_mode_iterator VSINT_84  [V4SI V2DI DI SI])
 (define_mode_iterator VSINT_842 [V8HI V4SI V2DI])
 
-;; Iterator for ISA 3.0 vector extract/insert of integer vectors
-(define_mode_iterator VSX_EXTRACT_I [V16QI V8HI V4SI])
+;; Iterator for ISA 3.0 vector extract/insert of small integer vectors.
+;; VSX_EXTRACT_I2 doesn't include V4SImode because SI extracts can be
+;; done on ISA 2.07 and not just ISA 3.0.
+(define_mode_iterator VSX_EXTRACT_I  [V16QI V8HI V4SI])
+(define_mode_iterator VSX_EXTRACT_I2 [V16QI V8HI])
 
 ;; Mode attribute to give the correct predicate for ISA 3.0 vector extract and
 ;; insert to validate the operand number.
@@ -2476,7 +2479,9 @@  (define_expand  "vsx_extract_<mode>"
 	      (clobber (match_dup 3))])]
   "VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_DIRECT_MOVE_64BIT"
 {
-  operands[3] = gen_rtx_SCRATCH ((TARGET_VEXTRACTUB) ? DImode : <MODE>mode);
+  machine_mode smode = ((<MODE>mode != V4SImode && TARGET_VEXTRACTUB)
+			? DImode : <MODE>mode);
+  operands[3] = gen_rtx_SCRATCH (smode);
 })
 
 ;; Under ISA 3.0, we can use the byte/half-word/word integer stores if we are
@@ -2485,9 +2490,9 @@  (define_expand  "vsx_extract_<mode>"
 (define_insn_and_split  "*vsx_extract_<mode>_p9"
   [(set (match_operand:<VS_scalar> 0 "nonimmediate_operand" "=r,Z")
 	(vec_select:<VS_scalar>
-	 (match_operand:VSX_EXTRACT_I 1 "gpc_reg_operand" "<VSX_EX>,<VSX_EX>")
+	 (match_operand:VSX_EXTRACT_I2 1 "gpc_reg_operand" "v,v")
 	 (parallel [(match_operand:QI 2 "<VSX_EXTRACT_PREDICATE>" "n,n")])))
-   (clobber (match_scratch:DI 3 "=<VSX_EX>,<VSX_EX>"))]
+   (clobber (match_scratch:DI 3 "=v,v"))]
   "VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_VEXTRACTUB"
   "#"
   "&& (reload_completed || MEM_P (operands[0]))"
@@ -2516,8 +2521,6 @@  (define_insn_and_split  "*vsx_extract_<m
 	emit_insn (gen_p9_stxsibx (dest, di_tmp));
       else if (<MODE>mode == V8HImode)
 	emit_insn (gen_p9_stxsihx (dest, di_tmp));
-      else if (<MODE>mode == V4SImode)
-	emit_insn (gen_stfiwx (dest, di_tmp));
       else
 	gcc_unreachable ();
     }
@@ -2550,12 +2553,70 @@  (define_insn  "vsx_extract_<mode>_di"
 }
   [(set_attr "type" "vecsimple")])
 
+(define_insn_and_split  "*vsx_extract_si"
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,Z,Z,wJwK")
+	(vec_select:SI
+	 (match_operand:V4SI 1 "gpc_reg_operand" "v,wJwK,v,v")
+	 (parallel [(match_operand:QI 2 "const_0_to_3_operand" "n,n,n,n")])))
+   (clobber (match_scratch:V4SI 3 "=v,wJwK,v,v"))]
+  "VECTOR_MEM_VSX_P (V4SImode) && TARGET_DIRECT_MOVE_64BIT"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  rtx dest = operands[0];
+  rtx src = operands[1];
+  rtx element = operands[2];
+  rtx vec_tmp = operands[3];
+  int value;
+
+  if (!VECTOR_ELT_ORDER_BIG)
+    element = GEN_INT (GET_MODE_NUNITS (V4SImode) - 1 - INTVAL (element));
+
+  /* If the value is in the correct position, we can avoid doing the VSPLT<x>
+     instruction.  */
+  value = INTVAL (element);
+  if (value != 1)
+    {
+      if (TARGET_VEXTRACTUB)
+	{
+	  rtx di_tmp = gen_rtx_REG (DImode, REGNO (vec_tmp));
+	  emit_insn (gen_vsx_extract_v4si_di (di_tmp, src, element));
+	}
+      else
+	emit_insn (gen_altivec_vspltw_direct (vec_tmp, src, element));
+    }
+  else
+    vec_tmp = src;
+
+  if (MEM_P (operands[0]))
+    {
+      if (can_create_pseudo_p ())
+	dest = rs6000_address_for_fpconvert (dest);
+
+      if (TARGET_VSX_SMALL_INTEGER)
+	emit_move_insn (dest, gen_rtx_REG (SImode, REGNO (vec_tmp)));
+      else
+	emit_insn (gen_stfiwx (dest, gen_rtx_REG (DImode, REGNO (vec_tmp))));
+    }
+
+  else if (TARGET_VSX_SMALL_INTEGER)
+    emit_move_insn (dest, gen_rtx_REG (SImode, REGNO (vec_tmp)));
+  else
+    emit_move_insn (gen_rtx_REG (DImode, REGNO (dest)),
+		    gen_rtx_REG (DImode, REGNO (vec_tmp)));
+
+  DONE;
+}
+  [(set_attr "type" "mftgpr,fpstore,fpstore,vecsimple")
+   (set_attr "length" "8")])
+
 (define_insn_and_split  "*vsx_extract_<mode>_p8"
   [(set (match_operand:<VS_scalar> 0 "nonimmediate_operand" "=r")
 	(vec_select:<VS_scalar>
-	 (match_operand:VSX_EXTRACT_I 1 "gpc_reg_operand" "v")
+	 (match_operand:VSX_EXTRACT_I2 1 "gpc_reg_operand" "v")
 	 (parallel [(match_operand:QI 2 "<VSX_EXTRACT_PREDICATE>" "n")])))
-   (clobber (match_scratch:VSX_EXTRACT_I 3 "=v"))]
+   (clobber (match_scratch:VSX_EXTRACT_I2 3 "=v"))]
   "VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_DIRECT_MOVE_64BIT"
   "#"
   "&& reload_completed"
@@ -2587,13 +2648,6 @@  (define_insn_and_split  "*vsx_extract_<m
       else
 	vec_tmp = src;
     }
-  else if (<MODE>mode == V4SImode)
-    {
-      if (value != 1)
-	emit_insn (gen_altivec_vspltw_direct (vec_tmp, src, element));
-      else
-	vec_tmp = src;
-    }
   else
     gcc_unreachable ();
 
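The element handling in the new *vsx_extract_si pattern can be modeled in plain C. The sketch below is hypothetical and only simulates the data movement: a V4SI value is held as four words in big-endian order, little-endian element numbers are flipped with `3 - element` as the split does, and only word 1 (the word MFVSRWZ and STXSIWX operate on) can be used without a VSPLTW first. It does not model the ISA 3.0 XXEXTRACTUW path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Map a GCC element number to a big-endian word number
   (GET_MODE_NUNITS (V4SImode) - 1 - element on little endian).  */
static int
be_word_index (int element, bool little_endian)
{
  assert (element >= 0 && element < 4);
  return little_endian ? 3 - element : element;
}

/* Only word 1 is already where the scalar move/store expects it.  */
static bool
needs_splat (int element, bool little_endian)
{
  return be_word_index (element, little_endian) != 1;
}

/* Simulate the split: splat the selected word if needed, then read
   word 1 as MFVSRWZ would.  */
static uint32_t
extract_si (const uint32_t be_words[4], int element, bool little_endian)
{
  uint32_t tmp[4];
  int w = be_word_index (element, little_endian);
  bool splat = needs_splat (element, little_endian);

  for (int i = 0; i < 4; i++)        /* VSPLTW copies word w everywhere */
    tmp[i] = splat ? be_words[w] : be_words[i];
  return tmp[1];
}
```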
Index: gcc/config/rs6000/rs6000.h
===================================================================
--- gcc/config/rs6000/rs6000.h	(svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 241274)
+++ gcc/config/rs6000/rs6000.h	(.../gcc/config/rs6000)	(working copy)
@@ -1602,6 +1602,10 @@  enum r6000_reg_class_enum {
   RS6000_CONSTRAINT_wx,		/* FPR register for STFIWX */
   RS6000_CONSTRAINT_wy,		/* VSX register for SF */
   RS6000_CONSTRAINT_wz,		/* FPR register for LFIWZX */
+  RS6000_CONSTRAINT_wH,		/* Altivec register for 32-bit integers.  */
+  RS6000_CONSTRAINT_wI,		/* VSX register for 32-bit integers.  */
+  RS6000_CONSTRAINT_wJ,		/* VSX register for 8/16-bit integers.  */
+  RS6000_CONSTRAINT_wK,		/* Altivec register for 8/16-bit integers.  */
   RS6000_CONSTRAINT_MAX
 };
 
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 241274)
+++ gcc/config/rs6000/rs6000.md	(.../gcc/config/rs6000)	(working copy)
@@ -458,7 +458,7 @@  (define_mode_attr f32_sm  [(SF "m")
 (define_mode_attr f32_sm2 [(SF "wY")		   (SD "wn")])
 (define_mode_attr f32_si  [(SF "stfs%U0%X0 %1,%0") (SD "stfiwx %1,%y0")])
 (define_mode_attr f32_si2 [(SF "stxssp %1,%0")     (SD "stfiwx %1,%y0")])
-(define_mode_attr f32_sv  [(SF "stxsspx %x1,%y0")  (SD "stxsiwzx %x1,%y0")])
+(define_mode_attr f32_sv  [(SF "stxsspx %x1,%y0")  (SD "stxsiwx %x1,%y0")])
 
 ; Definitions for 32-bit fpr direct move
 ; At present, the decimal modes are not allowed in the traditional altivec
@@ -837,16 +837,18 @@  (define_insn_and_split "*zero_extendhi<m
 
 
 (define_insn "zero_extendsi<mode>2"
-  [(set (match_operand:EXTSI 0 "gpc_reg_operand" "=r,r,??wj,!wz,!wu")
-	(zero_extend:EXTSI (match_operand:SI 1 "reg_or_mem_operand" "m,r,r,Z,Z")))]
+  [(set (match_operand:EXTSI 0 "gpc_reg_operand" "=r,r,wz,wu,wj,r,wJwK")
+	(zero_extend:EXTSI (match_operand:SI 1 "reg_or_mem_operand" "m,r,Z,Z,r,wIwH,wJwK")))]
   ""
   "@
    lwz%U1%X1 %0,%1
    rldicl %0,%1,0,32
-   mtvsrwz %x0,%1
    lfiwzx %0,%y1
-   lxsiwzx %x0,%y1"
-  [(set_attr "type" "load,shift,mffgpr,fpload,fpload")])
+   lxsiwzx %x0,%y1
+   mtvsrwz %x0,%1
+   mfvsrwz %0,%x1
+   xxextractuw %x0,%x1,1"
+  [(set_attr "type" "load,shift,fpload,fpload,mffgpr,mftgpr,vecexts")])
 
 (define_insn_and_split "*zero_extendsi<mode>2_dot"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x,?y")
@@ -1005,16 +1007,17 @@  (define_insn_and_split "*extendhi<mode>2
 
 
 (define_insn "extendsi<mode>2"
-  [(set (match_operand:EXTSI 0 "gpc_reg_operand" "=r,r,??wj,!wl,!wu")
-	(sign_extend:EXTSI (match_operand:SI 1 "lwa_operand" "Y,r,r,Z,Z")))]
+  [(set (match_operand:EXTSI 0 "gpc_reg_operand" "=r,r,wl,wu,wj,wK")
+	(sign_extend:EXTSI (match_operand:SI 1 "lwa_operand" "Y,r,Z,Z,r,wK")))]
   ""
   "@
    lwa%U1%X1 %0,%1
    extsw %0,%1
-   mtvsrwa %x0,%1
    lfiwax %0,%y1
-   lxsiwax %x0,%y1"
-  [(set_attr "type" "load,exts,mffgpr,fpload,fpload")
+   lxsiwax %x0,%y1
+   mtvsrwa %x0,%1
+   vextsw2d %0,%1"
+  [(set_attr "type" "load,exts,fpload,fpload,mffgpr,vecexts")
    (set_attr "sign_extend" "yes")])
 
 (define_insn_and_split "*extendsi<mode>2_dot"
@@ -4947,15 +4950,16 @@  (define_insn "*xxsel<mode>"
 ; We don't define lfiwax/lfiwzx with the normal definition, because we
 ; don't want to support putting SImode in FPR registers.
 (define_insn "lfiwax"
-  [(set (match_operand:DI 0 "gpc_reg_operand" "=d,wj,!wj")
-	(unspec:DI [(match_operand:SI 1 "reg_or_indexed_operand" "Z,Z,r")]
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=d,wj,wj,wK")
+	(unspec:DI [(match_operand:SI 1 "reg_or_indexed_operand" "Z,Z,r,wK")]
 		   UNSPEC_LFIWAX))]
   "TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT && TARGET_LFIWAX"
   "@
    lfiwax %0,%y1
    lxsiwax %x0,%y1
-   mtvsrwa %x0,%1"
-  [(set_attr "type" "fpload,fpload,mffgpr")])
+   mtvsrwa %x0,%1
+   vextsw2d %0,%1"
+  [(set_attr "type" "fpload,fpload,mffgpr,vecexts")])
 
 ; This split must be run before register allocation because it allocates the
 ; memory slot that is needed to move values to/from the FPR.  We don't allocate
@@ -5019,7 +5023,10 @@  (define_insn_and_split "floatsi<mode>2_l
   operands[1] = rs6000_address_for_fpconvert (operands[1]);
   if (GET_CODE (operands[2]) == SCRATCH)
     operands[2] = gen_reg_rtx (DImode);
-  emit_insn (gen_lfiwax (operands[2], operands[1]));
+  if (TARGET_VSX_SMALL_INTEGER)
+    emit_insn (gen_extendsidi2 (operands[2], operands[1]));
+  else
+    emit_insn (gen_lfiwax (operands[2], operands[1]));
   emit_insn (gen_floatdi<mode>2 (operands[0], operands[2]));
   DONE;
 }"
@@ -5027,15 +5034,16 @@  (define_insn_and_split "floatsi<mode>2_l
    (set_attr "type" "fpload")])
 
 (define_insn "lfiwzx"
-  [(set (match_operand:DI 0 "gpc_reg_operand" "=d,wj,!wj")
-	(unspec:DI [(match_operand:SI 1 "reg_or_indexed_operand" "Z,Z,r")]
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=d,wj,wj,wJwK")
+	(unspec:DI [(match_operand:SI 1 "reg_or_indexed_operand" "Z,Z,r,wJwK")]
 		   UNSPEC_LFIWZX))]
   "TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT && TARGET_LFIWZX"
   "@
    lfiwzx %0,%y1
    lxsiwzx %x0,%y1
-   mtvsrwz %x0,%1"
-  [(set_attr "type" "fpload,fpload,mftgpr")])
+   mtvsrwz %x0,%1
+   xxextractuw %x0,%x1,1"
+  [(set_attr "type" "fpload,fpload,mftgpr,vecexts")])
 
 (define_insn_and_split "floatunssi<mode>2_lfiwzx"
   [(set (match_operand:SFDF 0 "gpc_reg_operand" "=<Fv>")
@@ -5094,7 +5102,10 @@  (define_insn_and_split "floatunssi<mode>
   operands[1] = rs6000_address_for_fpconvert (operands[1]);
   if (GET_CODE (operands[2]) == SCRATCH)
     operands[2] = gen_reg_rtx (DImode);
-  emit_insn (gen_lfiwzx (operands[2], operands[1]));
+  if (TARGET_VSX_SMALL_INTEGER)
+    emit_insn (gen_zero_extendsidi2 (operands[2], operands[1]));
+  else
+    emit_insn (gen_lfiwzx (operands[2], operands[1]));
   emit_insn (gen_floatdi<mode>2 (operands[0], operands[2]));
   DONE;
 }"
@@ -6518,25 +6529,66 @@  (define_insn "movsi_low"
   [(set_attr "type" "load")
    (set_attr "length" "4")])
 
-(define_insn "*movsi_internal1"
-  [(set (match_operand:SI 0 "rs6000_nonimmediate_operand" "=r,r,r,m,r,r,r,r,*c*l,*h,*h")
-	(match_operand:SI 1 "input_operand" "r,U,m,r,I,L,n,*h,r,r,0"))]
+;;		MR           LA           LWZ          LFIWZX       LXSIWZX
+;;		STW          STFIWX       STXSIWX      LI           LIS
+;;		#            XXLOR        XXSPLTIB 0   XXSPLTIB -1  VSPLTISW
+;;		XXLXOR 0     XXLORC -1    P9 const     MTVSRWZ      MFVSRWZ
+;;		MF%1         MT%0         MT%0         NOP
+(define_insn "*movsi_internal1"
+  [(set (match_operand:SI 0 "rs6000_nonimmediate_operand"
+		"=r,         r,           r,           ?*wI,        ?*wH,
+		 m,          ?Z,          ?Z,          r,           r,
+		 r,          ?*wIwH,      ?*wJwK,      ?*wK,        ?*wJwK,
+		 ?*wJwK,     ?*wH,        ?*wK,        ?*wIwH,      ?r,
+		 r,          *c*l,        *h,          *h")
+
+	(match_operand:SI 1 "input_operand"
+		"r,          U,           m,           Z,           Z,
+		 r,          wI,          wH,          I,           L,
+		 n,          wIwH,        O,           wM,          wB,
+		 O,          wM,          wS,          r,           wIwH,
+		 *h,         r,           r,           0"))]
+
   "!TARGET_SINGLE_FPU &&
    (gpc_reg_operand (operands[0], SImode) || gpc_reg_operand (operands[1], SImode))"
   "@
    mr %0,%1
    la %0,%a1
    lwz%U1%X1 %0,%1
+   lfiwzx %0,%y1
+   lxsiwzx %x0,%y1
    stw%U0%X0 %1,%0
+   stfiwx %1,%y0
+   stxsiwx %x1,%y0
    li %0,%1
    lis %0,%v1
    #
+   xxlor %x0,%x1,%x1
+   xxspltib %x0,0
+   xxspltib %x0,255
+   vspltisw %0,%1
+   xxlxor %x0,%x0,%x0
+   xxlorc %x0,%x0,%x0
+   #
+   mtvsrwz %x0,%1
+   mfvsrwz %0,%x1
    mf%1 %0
    mt%0 %1
    mt%0 %1
    nop"
-  [(set_attr "type" "*,*,load,store,*,*,*,mfjmpr,mtjmpr,*,*")
-   (set_attr "length" "4,4,4,4,4,4,8,4,4,4,4")])
+  [(set_attr "type"
+		"*,          *,           load,        fpload,      fpload,
+		 store,      fpstore,     fpstore,     *,           *,
+		 *,          veclogical,  vecsimple,   vecsimple,   vecsimple,
+		 veclogical, veclogical,  vecsimple,   mffgpr,      mftgpr,
+		 *,           *,           *,           *")
+
+   (set_attr "length"
+		"4,          4,           4,           4,           4,
+		 4,          4,           4,           4,           4,
+		 8,          4,           4,           4,           4,
+		 4,          4,           8,           4,           4,
+		 4,          4,           4,           4")])
 
 (define_insn "*movsi_internal1_single"
   [(set (match_operand:SI 0 "rs6000_nonimmediate_operand" "=r,r,r,m,r,r,r,r,*c*l,*h,*h,m,*f")
@@ -6581,6 +6633,23 @@  (define_split
     FAIL;
 }")
 
+;; Split loading -128..127 to use XXSPLTIB and VEXTSW2D
+(define_split
+  [(set (match_operand:DI 0 "altivec_register_operand" "")
+	(match_operand:DI 1 "xxspltib_constant_split" ""))]
+  "TARGET_VSX_SMALL_INTEGER && TARGET_P9_VECTOR && reload_completed"
+  [(const_int 0)]
+{
+  rtx op0 = operands[0];
+  rtx op1 = operands[1];
+  int r = REGNO (op0);
+  rtx op0_v16qi = gen_rtx_REG (V16QImode, r);
+
+  emit_insn (gen_xxspltib_v16qi (op0_v16qi, op1));
+  emit_insn (gen_vsx_sign_extend_qi_si (operands[0], op0_v16qi));
+  DONE;
+})
+
 (define_insn "*mov<mode>_internal2"
   [(set (match_operand:CC 2 "cc_reg_operand" "=y,x,?y")
 	(compare:CC (match_operand:P 1 "gpc_reg_operand" "0,r,r")
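
A small portable C model of what the new XXSPLTIB split computes may help explain why its predicate is limited to -128..127. This is a sketch under stated assumptions, not GCC code. The names vreg_model, model_xxspltib, model_sign_extend_byte, fits_xxspltib_split and materialize_via_splat are hypothetical, and the 16-byte register is modeled in big-endian element order. XXSPLTIB replicates a single 8-bit immediate into every byte. Sign-extending one of those bytes back to 64 bits (the step the split emits via gen_vsx_sign_extend_qi_si) recovers the original constant only when that constant fits in a signed byte.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte model of an Altivec/VSX register (not a GCC type). */
typedef struct { uint8_t b[16]; } vreg_model;

/* Model of XXSPLTIB: replicate an 8-bit immediate into all 16 bytes.  */
static vreg_model model_xxspltib (uint8_t imm)
{
  vreg_model v;
  memset (v.b, imm, sizeof v.b);
  return v;
}

/* Model of a byte-to-doubleword sign extension: take the rightmost byte of
   doubleword 0 (byte 7 in big-endian numbering) and sign-extend it.  After
   a splat all bytes are equal, so the choice of byte does not matter.  */
static int64_t model_sign_extend_byte (const vreg_model *v)
{
  int64_t x = v->b[7];
  if (x & 0x80)
    x -= 256;			/* portable two's complement sign extension */
  return x;
}

/* Assumed predicate: only constants that fit in a signed byte qualify.  */
static bool fits_xxspltib_split (int64_t c)
{
  return c >= -128 && c <= 127;
}

/* Materialize C the way the split does: splat the low byte, then
   sign-extend it back to 64 bits.  */
static int64_t materialize_via_splat (int64_t c)
{
  vreg_model v = model_xxspltib ((uint8_t) c);	/* keeps c modulo 256 */
  return model_sign_extend_byte (&v);
}
```

For example, materialize_via_splat (200) yields -56 rather than 200. That is why constants outside the signed-byte range must not go through this split.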
Index: gcc/doc/md.texi
===================================================================
--- gcc/doc/md.texi	(svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/doc)	(revision 241274)
+++ gcc/doc/md.texi	(.../gcc/doc)	(working copy)
@@ -3125,6 +3125,18 @@  Memory operand suitable for power9 fusio
 @item wG
 Memory operand suitable for TOC fusion memory references.
 
+@item wH
+Altivec register if @option{-mvsx-small-integer}.
+
+@item wI
+Floating point register if @option{-mvsx-small-integer}.
+
+@item wJ
+FP register if @option{-mvsx-small-integer} and @option{-mpower9-vector}.
+
+@item wK
+Altivec register if @option{-mvsx-small-integer} and @option{-mpower9-vector}.
+
 @item wL
 Int constant that is the element number that the MFVSRLD instruction.
 targets.
Index: gcc/testsuite/gcc.target/powerpc/vsx-simode.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vsx-simode.c	(svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vsx-simode.c	(.../gcc/testsuite/gcc.target/powerpc)	(revision 241600)
@@ -0,0 +1,22 @@ 
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
+/* { dg-options "-mcpu=power8 -O2 -mvsx-small-integer" } */
+
+double load_asm_d_constraint (int *p)
+{
+  double ret;
+  __asm__ ("xxlor %x0,%x1,%x1\t# load d constraint" : "=d" (ret) : "d" (*p));
+  return ret;
+}
+
+void store_asm_d_constraint (int *p, double x)
+{
+  int i;
+  __asm__ ("xxlor %x0,%x1,%x1\t# store d constraint" : "=d" (i) : "d" (x));
+  *p = i;
+}
+
+/* { dg-final { scan-assembler "lfiwzx" } } */
+/* { dg-final { scan-assembler "stfiwx" } } */
Index: gcc/testsuite/gcc.target/powerpc/vsx-simode2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vsx-simode2.c	(svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vsx-simode2.c	(.../gcc/testsuite/gcc.target/powerpc)	(revision 241600)
@@ -0,0 +1,15 @@ 
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
+/* { dg-options "-mcpu=power8 -O2 -mvsx-small-integer" } */
+
+unsigned int foo (unsigned int u)
+{
+  unsigned int ret;
+  __asm__ ("xxlor %x0,%x1,%x1\t# v, v constraints" : "=v" (ret) : "v" (u));
+  return ret;
+}
+
+/* { dg-final { scan-assembler "mtvsrwz" } } */
+/* { dg-final { scan-assembler "mfvsrwz" } } */
Index: gcc/testsuite/gcc.target/powerpc/vsx-simode3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vsx-simode3.c	(svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vsx-simode3.c	(.../gcc/testsuite/gcc.target/powerpc)	(revision 241600)
@@ -0,0 +1,22 @@ 
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
+/* { dg-options "-mcpu=power8 -O2 -mvsx-small-integer" } */
+
+double load_asm_v_constraint (int *p)
+{
+  double ret;
+  __asm__ ("xxlor %x0,%x1,%x1\t# load v constraint" : "=d" (ret) : "v" (*p));
+  return ret;
+}
+
+void store_asm_v_constraint (int *p, double x)
+{
+  int i;
+  __asm__ ("xxlor %x0,%x1,%x1\t# store v constraint" : "=v" (i) : "d" (x));
+  *p = i;
+}
+
+/* { dg-final { scan-assembler "lxsiwzx" } } */
+/* { dg-final { scan-assembler "stxsiwx" } } */
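
With TARGET_VSX_SMALL_INTEGER, the floatsi<mode>2 and floatunssi<mode>2_lfiwzx splits in the rs6000.md part of this patch work as follows. They sign-extend or zero-extend the SImode input to DImode (gen_extendsidi2 / gen_zero_extendsidi2). Both then call the same gen_floatdi<mode>2. The portable C sketch below models only that data flow for the DFmode case. It shows why the unsigned path can reuse a signed 64-bit conversion: a zero-extended 32-bit value is always a non-negative int64_t, and every such value is exactly representable in a double. The names model_floatsi and model_floatunssi are hypothetical, and the comments describe the RTL steps, not the exact instructions GCC selects.

```c
#include <stdint.h>

/* Hypothetical model of the signed path: sign-extend SImode to DImode
   (the extendsidi2 step), then do a signed 64-bit to double conversion
   (the floatdi<mode>2 step).  */
static double model_floatsi (int32_t x)
{
  int64_t wide = (int64_t) x;			/* sign extension */
  return (double) wide;
}

/* Hypothetical model of the unsigned path: zero-extend SImode to DImode
   (the zero_extendsidi2 step), then reuse the same signed 64-bit
   conversion.  The zero-extended value is never negative as an int64_t,
   so no unsigned 64-bit conversion is needed.  */
static double model_floatunssi (uint32_t x)
{
  int64_t wide = (int64_t) (uint64_t) x;	/* zero extension */
  return (double) wide;
}
```

For instance, model_floatunssi (0xFFFFFFFFu) yields 4294967295.0. Sign-extending the same 32-bit pattern would have produced -1.0 instead.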