[rs6000] Implement -maltivec=be for vec_splat builtins

Message ID 1391057433.28411.27.camel@gnopaine
State New

Commit Message

Bill Schmidt Jan. 30, 2014, 4:50 a.m. UTC
Hi,

This continues the series of -maltivec=be patches, this one handling
vec_splat.  Since vec_splat is no longer treated as a true intrinsic,
its default behavior for little endian is changed to use right-to-left
element indexing for selecting the element to splat.  With -maltivec=be
for little endian, this reverts to left-to-right element indexing.
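
To illustrate the new default (this example mirrors the new test
cases below, and is not part of the patch itself):

  vector float vf = {-2.0,-1.0,0.0,1.0};
  vector float vfr = vec_splat (vf, 1);
  /* By default, vfr is now {-1.0,-1.0,-1.0,-1.0} on both BE and LE:
     element 1 is counted in the order of the initializer.  With
     -maltivec=be on an LE target, vfr is {0.0,0.0,0.0,0.0}: element 1
     is counted from the left of the big-endian register image.  */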

The main changes are in altivec.md, and are quite similar to what was
done for vector merge high/low.  Each of the vector splat builtins is
split into a define_expand and an "_internal" define_insn.  The expand
uses the natural element order for the target, unless we have
-maltivec=be for an LE target, in which case the order is reversed.  The
_internal insn modifies the element number on the hardware instruction
for LE to account for the instruction's big-endian bias.
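
As a worked example (for vspltw, requested element 2, on an LE
target; the values match the vuier results in the new tests):

  default:       the expand keeps index 2 in the RTL; the _internal
                 insn prints 3 - 2 = 1, and the BE-biased hardware
                 index 1 selects LE element 2 (value 2).
  -maltivec=be:  the expand flips the index to 3 - 2 = 1; the
                 _internal insn prints 3 - 1 = 2, and hardware index 2
                 selects LE element 1, which is BE element 2 (value 1).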

For vector splat instructions that are generated internally rather
than through a builtin, a "_direct" insn is supplied that emits
exactly the instruction requested, with no element renumbering.
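
For example, altivec_expand_vec_perm_const already performs its own
endian adjustment (elt = 15 - elt for LE in the vspltb case), so it
must emit the instruction with the operand unchanged:

  if (!BYTES_BIG_ENDIAN)
    elt = 15 - elt;
  emit_insn (gen_altivec_vspltb_direct (target, op0, GEN_INT (elt)));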

vsx.md contains the same changes for V4SI and V4SF patterns when VSX
instructions are enabled.  We don't need a separate define_expand for
this one because the define_expand for altivec_vspltw is used to
generate the pattern that is recognized by vsx_xxspltw_<mode>.  Either
vsx_xxspltw_<mode> or *altivec_vspltw_internal handles the pattern,
depending on whether or not VSX instructions are enabled.

Most of the changes in rs6000.c are to use the new _direct forms.  There
is one other change to rs6000_expand_vector_init, where the modification
of the selector field for a generated splat pattern is removed.  That is
properly handled instead by the code in altivec.md and vsx.md.

As usual, there are four new test cases to cover the various vector
types for VMX and VSX.  Two existing test cases require adjustment
because of the change in the default semantics of vec_splat for little
endian.

Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no
regressions.  Ok for trunk?

Thanks,
Bill


gcc:

2014-01-29  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* config/rs6000/rs6000.c (rs6000_expand_vector_init): Use
	gen_vsx_xxspltw_v4sf_direct instead of gen_vsx_xxspltw_v4sf;
	remove element index adjustment for endian (now handled in vsx.md
	and altivec.md).
	(altivec_expand_vec_perm_const): Use
	gen_altivec_vsplt[bhw]_direct instead of gen_altivec_vsplt[bhw].
	* config/rs6000/vsx.md (UNSPEC_VSX_XXSPLTW): New unspec.
	(vsx_xxspltw_<mode>): Adjust element index for little endian.
	* config/rs6000/altivec.md (altivec_vspltb): Divide into a
	define_expand and a new define_insn *altivec_vspltb_internal;
	adjust for -maltivec=be on a little endian target.
	(altivec_vspltb_direct): New.
	(altivec_vsplth): Divide into a define_expand and a new
	define_insn *altivec_vsplth_internal; adjust for -maltivec=be on a
	little endian target.
	(altivec_vsplth_direct): New.
	(altivec_vspltw): Divide into a define_expand and a new
	define_insn *altivec_vspltw_internal; adjust for -maltivec=be on a
	little endian target.
	(altivec_vspltw_direct): New.
	(altivec_vspltsf): Divide into a define_expand and a new
	define_insn *altivec_vspltsf_internal; adjust for -maltivec=be on
	a little endian target.

gcc/testsuite:

2014-01-29  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* gcc.dg/vmx/splat.c: New.
	* gcc.dg/vmx/splat-vsx.c: New.
	* gcc.dg/vmx/splat-be-order.c: New.
	* gcc.dg/vmx/splat-vsx-be-order.c: New.
	* gcc.dg/vmx/eg-5.c: Remove special casing for little endian.
	* gcc.dg/vmx/sn7153.c: Add special casing for little endian.

Comments

David Edelsohn Jan. 30, 2014, 6:06 p.m. UTC | #1
On Wed, Jan 29, 2014 at 11:50 PM, Bill Schmidt
<wschmidt@linux.vnet.ibm.com> wrote:
> Hi,
>
> This continues the series of -maltivec=be patches, this one handling
> vec_splat.

[full description and ChangeLog snipped]

> Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no
> regressions.  Ok for trunk?

Okay.

Thanks, David

Patch

Index: gcc/testsuite/gcc.dg/vmx/splat.c
===================================================================
--- gcc/testsuite/gcc.dg/vmx/splat.c	(revision 0)
+++ gcc/testsuite/gcc.dg/vmx/splat.c	(revision 0)
@@ -0,0 +1,47 @@ 
+#include "harness.h"
+
+static void test()
+{
+  /* Input vectors.  */
+  vector unsigned char vuc = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+  vector signed char vsc = {-8,-7,-6,-5,-4,-3,-2,-1,0,1,2,3,4,5,6,7};
+  vector unsigned short vus = {0,1,2,3,4,5,6,7};
+  vector signed short vss = {-4,-3,-2,-1,0,1,2,3};
+  vector unsigned int vui = {0,1,2,3};
+  vector signed int vsi = {-2,-1,0,1};
+  vector float vf = {-2.0,-1.0,0.0,1.0};
+
+  /* Result vectors.  */
+  vector unsigned char vucr;
+  vector signed char vscr;
+  vector unsigned short vusr;
+  vector signed short vssr;
+  vector unsigned int vuir;
+  vector signed int vsir;
+  vector float vfr;
+
+  /* Expected result vectors.  */
+  vector unsigned char vucer = {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1};
+  vector signed char vscer = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
+  vector unsigned short vuser = {7,7,7,7,7,7,7,7};
+  vector signed short vsser = {-4,-4,-4,-4,-4,-4,-4,-4};
+  vector unsigned int vuier = {2,2,2,2};
+  vector signed int vsier = {1,1,1,1};
+  vector float vfer = {-1.0,-1.0,-1.0,-1.0};
+
+  vucr = vec_splat (vuc, 1);
+  vscr = vec_splat (vsc, 8);
+  vusr = vec_splat (vus, 7);
+  vssr = vec_splat (vss, 0);
+  vuir = vec_splat (vui, 2);
+  vsir = vec_splat (vsi, 3);
+  vfr  = vec_splat (vf,  1);
+
+  check (vec_all_eq (vucr, vucer), "vuc");
+  check (vec_all_eq (vscr, vscer), "vsc");
+  check (vec_all_eq (vusr, vuser), "vus");
+  check (vec_all_eq (vssr, vsser), "vss");
+  check (vec_all_eq (vuir, vuier), "vui");
+  check (vec_all_eq (vsir, vsier), "vsi");
+  check (vec_all_eq (vfr,  vfer ), "vf");
+}
Index: gcc/testsuite/gcc.dg/vmx/splat-vsx-be-order.c
===================================================================
--- gcc/testsuite/gcc.dg/vmx/splat-vsx-be-order.c	(revision 0)
+++ gcc/testsuite/gcc.dg/vmx/splat-vsx-be-order.c	(revision 0)
@@ -0,0 +1,37 @@ 
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-maltivec=be -mabi=altivec -std=gnu99 -mvsx" } */
+
+#include "harness.h"
+
+static void test()
+{
+  /* Input vectors.  */
+  vector unsigned int vui = {0,1,2,3};
+  vector signed int vsi = {-2,-1,0,1};
+  vector float vf = {-2.0,-1.0,0.0,1.0};
+
+  /* Result vectors.  */
+  vector unsigned int vuir;
+  vector signed int vsir;
+  vector float vfr;
+
+  /* Expected result vectors.  */
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+  vector unsigned int vuier = {1,1,1,1};
+  vector signed int vsier = {-2,-2,-2,-2};
+  vector float vfer = {0.0,0.0,0.0,0.0};
+#else
+  vector unsigned int vuier = {2,2,2,2};
+  vector signed int vsier = {1,1,1,1};
+  vector float vfer = {-1.0,-1.0,-1.0,-1.0};
+#endif
+
+  vuir = vec_splat (vui, 2);
+  vsir = vec_splat (vsi, 3);
+  vfr  = vec_splat (vf,  1);
+
+  check (vec_all_eq (vuir, vuier), "vui");
+  check (vec_all_eq (vsir, vsier), "vsi");
+  check (vec_all_eq (vfr,  vfer ), "vf");
+}
Index: gcc/testsuite/gcc.dg/vmx/eg-5.c
===================================================================
--- gcc/testsuite/gcc.dg/vmx/eg-5.c	(revision 207294)
+++ gcc/testsuite/gcc.dg/vmx/eg-5.c	(working copy)
@@ -6,19 +6,10 @@  matvecmul4 (vector float c0, vector float c1, vect
 {
   /* Set result to a vector of f32 0's */
   vector float result = ((vector float){0.,0.,0.,0.});
-
-#ifdef __LITTLE_ENDIAN__
-  result  = vec_madd (c0, vec_splat (v, 3), result);
-  result  = vec_madd (c1, vec_splat (v, 2), result);
-  result  = vec_madd (c2, vec_splat (v, 1), result);
-  result  = vec_madd (c3, vec_splat (v, 0), result);
-#else
   result  = vec_madd (c0, vec_splat (v, 0), result);
   result  = vec_madd (c1, vec_splat (v, 1), result);
   result  = vec_madd (c2, vec_splat (v, 2), result);
   result  = vec_madd (c3, vec_splat (v, 3), result);
-#endif
-
   return result;
 }
 
Index: gcc/testsuite/gcc.dg/vmx/splat-be-order.c
===================================================================
--- gcc/testsuite/gcc.dg/vmx/splat-be-order.c	(revision 0)
+++ gcc/testsuite/gcc.dg/vmx/splat-be-order.c	(revision 0)
@@ -0,0 +1,59 @@ 
+/* { dg-options "-maltivec=be -mabi=altivec -std=gnu99 -mno-vsx" } */
+
+#include "harness.h"
+
+static void test()
+{
+  /* Input vectors.  */
+  vector unsigned char vuc = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+  vector signed char vsc = {-8,-7,-6,-5,-4,-3,-2,-1,0,1,2,3,4,5,6,7};
+  vector unsigned short vus = {0,1,2,3,4,5,6,7};
+  vector signed short vss = {-4,-3,-2,-1,0,1,2,3};
+  vector unsigned int vui = {0,1,2,3};
+  vector signed int vsi = {-2,-1,0,1};
+  vector float vf = {-2.0,-1.0,0.0,1.0};
+
+  /* Result vectors.  */
+  vector unsigned char vucr;
+  vector signed char vscr;
+  vector unsigned short vusr;
+  vector signed short vssr;
+  vector unsigned int vuir;
+  vector signed int vsir;
+  vector float vfr;
+
+  /* Expected result vectors.  */
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+  vector unsigned char vucer = {14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14};
+  vector signed char vscer = {-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1};
+  vector unsigned short vuser = {0,0,0,0,0,0,0,0};
+  vector signed short vsser = {3,3,3,3,3,3,3,3};
+  vector unsigned int vuier = {1,1,1,1};
+  vector signed int vsier = {-2,-2,-2,-2};
+  vector float vfer = {0.0,0.0,0.0,0.0};
+#else
+  vector unsigned char vucer = {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1};
+  vector signed char vscer = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
+  vector unsigned short vuser = {7,7,7,7,7,7,7,7};
+  vector signed short vsser = {-4,-4,-4,-4,-4,-4,-4,-4};
+  vector unsigned int vuier = {2,2,2,2};
+  vector signed int vsier = {1,1,1,1};
+  vector float vfer = {-1.0,-1.0,-1.0,-1.0};
+#endif
+
+  vucr = vec_splat (vuc, 1);
+  vscr = vec_splat (vsc, 8);
+  vusr = vec_splat (vus, 7);
+  vssr = vec_splat (vss, 0);
+  vuir = vec_splat (vui, 2);
+  vsir = vec_splat (vsi, 3);
+  vfr  = vec_splat (vf,  1);
+
+  check (vec_all_eq (vucr, vucer), "vuc");
+  check (vec_all_eq (vscr, vscer), "vsc");
+  check (vec_all_eq (vusr, vuser), "vus");
+  check (vec_all_eq (vssr, vsser), "vss");
+  check (vec_all_eq (vuir, vuier), "vui");
+  check (vec_all_eq (vsir, vsier), "vsi");
+  check (vec_all_eq (vfr,  vfer ), "vf");
+}
Index: gcc/testsuite/gcc.dg/vmx/sn7153.c
===================================================================
--- gcc/testsuite/gcc.dg/vmx/sn7153.c	(revision 207294)
+++ gcc/testsuite/gcc.dg/vmx/sn7153.c	(working copy)
@@ -34,7 +34,11 @@  main()
 
 void validate_sat()
 {
+#ifdef __LITTLE_ENDIAN__
+  if (vec_any_ne(vec_splat(vec_mfvscr(), 0), ((vector unsigned short){1,1,1,1,1,1,1,1})))
+#else
   if (vec_any_ne(vec_splat(vec_mfvscr(), 7), ((vector unsigned short){1,1,1,1,1,1,1,1})))
+#endif
     {
       union {vector unsigned short v; unsigned short s[8];} u;
       u.v = vec_mfvscr();
Index: gcc/testsuite/gcc.dg/vmx/splat-vsx.c
===================================================================
--- gcc/testsuite/gcc.dg/vmx/splat-vsx.c	(revision 0)
+++ gcc/testsuite/gcc.dg/vmx/splat-vsx.c	(revision 0)
@@ -0,0 +1,31 @@ 
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-maltivec -mabi=altivec -std=gnu99 -mvsx" } */
+
+#include "harness.h"
+
+static void test()
+{
+  /* Input vectors.  */
+  vector unsigned int vui = {0,1,2,3};
+  vector signed int vsi = {-2,-1,0,1};
+  vector float vf = {-2.0,-1.0,0.0,1.0};
+
+  /* Result vectors.  */
+  vector unsigned int vuir;
+  vector signed int vsir;
+  vector float vfr;
+
+  /* Expected result vectors.  */
+  vector unsigned int vuier = {2,2,2,2};
+  vector signed int vsier = {1,1,1,1};
+  vector float vfer = {-1.0,-1.0,-1.0,-1.0};
+
+  vuir = vec_splat (vui, 2);
+  vsir = vec_splat (vsi, 3);
+  vfr  = vec_splat (vf,  1);
+
+  check (vec_all_eq (vuir, vuier), "vui");
+  check (vec_all_eq (vsir, vsier), "vsi");
+  check (vec_all_eq (vfr,  vfer ), "vf");
+}
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 207294)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -5485,7 +5485,7 @@  rs6000_expand_vector_init (rtx target, rtx vals)
 		      : gen_vsx_xscvdpsp_scalar (freg, sreg));
 
 	  emit_insn (cvt);
-	  emit_insn (gen_vsx_xxspltw_v4sf (target, freg, const0_rtx));
+	  emit_insn (gen_vsx_xxspltw_v4sf_direct (target, freg, const0_rtx));
 	}
       else
 	{
@@ -5522,11 +5522,9 @@  rs6000_expand_vector_init (rtx target, rtx vals)
 					      gen_rtx_SET (VOIDmode,
 							   target, mem),
 					      x)));
-      field = (BYTES_BIG_ENDIAN ? const0_rtx
-	       : GEN_INT (GET_MODE_NUNITS (mode) - 1));
       x = gen_rtx_VEC_SELECT (inner_mode, target,
 			      gen_rtx_PARALLEL (VOIDmode,
-						gen_rtvec (1, field)));
+						gen_rtvec (1, const0_rtx)));
       emit_insn (gen_rtx_SET (VOIDmode, target,
 			      gen_rtx_VEC_DUPLICATE (mode, x)));
       return;
@@ -29980,7 +29978,7 @@  altivec_expand_vec_perm_const (rtx operands[4])
 	{
           if (!BYTES_BIG_ENDIAN)
             elt = 15 - elt;
-	  emit_insn (gen_altivec_vspltb (target, op0, GEN_INT (elt)));
+	  emit_insn (gen_altivec_vspltb_direct (target, op0, GEN_INT (elt)));
 	  return true;
 	}
 
@@ -29993,8 +29991,8 @@  altivec_expand_vec_perm_const (rtx operands[4])
 	    {
 	      int field = BYTES_BIG_ENDIAN ? elt / 2 : 7 - elt / 2;
 	      x = gen_reg_rtx (V8HImode);
-	      emit_insn (gen_altivec_vsplth (x, gen_lowpart (V8HImode, op0),
-					     GEN_INT (field)));
+	      emit_insn (gen_altivec_vsplth_direct (x, gen_lowpart (V8HImode, op0),
+						    GEN_INT (field)));
 	      emit_move_insn (target, gen_lowpart (V16QImode, x));
 	      return true;
 	    }
@@ -30012,8 +30010,8 @@  altivec_expand_vec_perm_const (rtx operands[4])
 	    {
 	      int field = BYTES_BIG_ENDIAN ? elt / 4 : 3 - elt / 4;
 	      x = gen_reg_rtx (V4SImode);
-	      emit_insn (gen_altivec_vspltw (x, gen_lowpart (V4SImode, op0),
-					     GEN_INT (field)));
+	      emit_insn (gen_altivec_vspltw_direct (x, gen_lowpart (V4SImode, op0),
+						    GEN_INT (field)));
 	      emit_move_insn (target, gen_lowpart (V16QImode, x));
 	      return true;
 	    }
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 207294)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -213,6 +213,7 @@ 
    UNSPEC_VSX_ROUND_I
    UNSPEC_VSX_ROUND_IC
    UNSPEC_VSX_SLDWI
+   UNSPEC_VSX_XXSPLTW
   ])
 
 ;; VSX moves
@@ -1751,6 +1752,20 @@ 
 	  (parallel
 	   [(match_operand:QI 2 "u5bit_cint_operand" "i,i")]))))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
+{
+  if (!BYTES_BIG_ENDIAN)
+    operands[2] = GEN_INT (3 - INTVAL (operands[2]));
+
+  return "xxspltw %x0,%x1,%2";
+}
+  [(set_attr "type" "vecperm")])
+
+(define_insn "vsx_xxspltw_<mode>_direct"
+  [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wf,?wa")
+        (unspec:VSX_W [(match_operand:VSX_W 1 "vsx_register_operand" "wf,wa")
+                       (match_operand:QI 2 "u5bit_cint_operand" "i,i")]
+                      UNSPEC_VSX_XXSPLTW))]
+  "VECTOR_MEM_VSX_P (<MODE>mode)"
   "xxspltw %x0,%x1,%2"
   [(set_attr "type" "vecperm")])
 
Index: gcc/config/rs6000/altivec.md
===================================================================
--- gcc/config/rs6000/altivec.md	(revision 207294)
+++ gcc/config/rs6000/altivec.md	(working copy)
@@ -1600,44 +1600,187 @@ 
   "vsumsws %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
-(define_insn "altivec_vspltb"
+(define_expand "altivec_vspltb"
+  [(match_operand:V16QI 0 "register_operand" "")
+   (match_operand:V16QI 1 "register_operand" "")
+   (match_operand:QI 2 "u5bit_cint_operand" "")]
+  "TARGET_ALTIVEC"
+{
+  rtvec v;
+  rtx x;
+
+  /* Special handling for LE with -maltivec=be.  We have to reflect
+     the actual selected index for the splat in the RTL.  */
+  if (!BYTES_BIG_ENDIAN && VECTOR_ELT_ORDER_BIG)
+    operands[2] = GEN_INT (15 - INTVAL (operands[2]));
+
+  v = gen_rtvec (1, operands[2]);
+  x = gen_rtx_VEC_SELECT (QImode, operands[1], gen_rtx_PARALLEL (VOIDmode, v));
+  x = gen_rtx_VEC_DUPLICATE (V16QImode, x);
+  emit_insn (gen_rtx_SET (VOIDmode, operands[0], x));
+  DONE;
+})
+
+(define_insn "*altivec_vspltb_internal"
   [(set (match_operand:V16QI 0 "register_operand" "=v")
         (vec_duplicate:V16QI
 	 (vec_select:QI (match_operand:V16QI 1 "register_operand" "v")
 			(parallel
 			 [(match_operand:QI 2 "u5bit_cint_operand" "")]))))]
   "TARGET_ALTIVEC"
+{
+  /* For true LE, this adjusts the selected index.  For LE with
+     -maltivec=be, this reverses what was done in the define_expand
+     because the instruction already has big-endian bias.  */
+  if (!BYTES_BIG_ENDIAN)
+    operands[2] = GEN_INT (15 - INTVAL (operands[2]));
+
+  return "vspltb %0,%1,%2";
+}
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vspltb_direct"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+        (unspec:V16QI [(match_operand:V16QI 1 "register_operand" "v")
+	               (match_operand:QI 2 "u5bit_cint_operand" "i")]
+                      UNSPEC_VSPLT_DIRECT))]
+  "TARGET_ALTIVEC"
   "vspltb %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
-(define_insn "altivec_vsplth"
+(define_expand "altivec_vsplth"
+  [(match_operand:V8HI 0 "register_operand" "")
+   (match_operand:V8HI 1 "register_operand" "")
+   (match_operand:QI 2 "u5bit_cint_operand" "")]
+  "TARGET_ALTIVEC"
+{
+  rtvec v;
+  rtx x;
+
+  /* Special handling for LE with -maltivec=be.  We have to reflect
+     the actual selected index for the splat in the RTL.  */
+  if (!BYTES_BIG_ENDIAN && VECTOR_ELT_ORDER_BIG)
+    operands[2] = GEN_INT (7 - INTVAL (operands[2]));
+
+  v = gen_rtvec (1, operands[2]);
+  x = gen_rtx_VEC_SELECT (HImode, operands[1], gen_rtx_PARALLEL (VOIDmode, v));
+  x = gen_rtx_VEC_DUPLICATE (V8HImode, x);
+  emit_insn (gen_rtx_SET (VOIDmode, operands[0], x));
+  DONE;
+})
+
+(define_insn "*altivec_vsplth_internal"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
 	(vec_duplicate:V8HI
 	 (vec_select:HI (match_operand:V8HI 1 "register_operand" "v")
 			(parallel
 			 [(match_operand:QI 2 "u5bit_cint_operand" "")]))))]
   "TARGET_ALTIVEC"
+{
+  /* For true LE, this adjusts the selected index.  For LE with
+     -maltivec=be, this reverses what was done in the define_expand
+     because the instruction already has big-endian bias.  */
+  if (!BYTES_BIG_ENDIAN)
+    operands[2] = GEN_INT (7 - INTVAL (operands[2]));
+
+  return "vsplth %0,%1,%2";
+}
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vsplth_direct"
+  [(set (match_operand:V8HI 0 "register_operand" "=v")
+        (unspec:V8HI [(match_operand:V8HI 1 "register_operand" "v")
+                      (match_operand:QI 2 "u5bit_cint_operand" "i")]
+                     UNSPEC_VSPLT_DIRECT))]
+  "TARGET_ALTIVEC"
   "vsplth %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
-(define_insn "altivec_vspltw"
+(define_expand "altivec_vspltw"
+  [(match_operand:V4SI 0 "register_operand" "")
+   (match_operand:V4SI 1 "register_operand" "")
+   (match_operand:QI 2 "u5bit_cint_operand" "")]
+  "TARGET_ALTIVEC"
+{
+  rtvec v;
+  rtx x;
+
+  /* Special handling for LE with -maltivec=be.  We have to reflect
+     the actual selected index for the splat in the RTL.  */
+  if (!BYTES_BIG_ENDIAN && VECTOR_ELT_ORDER_BIG)
+    operands[2] = GEN_INT (3 - INTVAL (operands[2]));
+
+  v = gen_rtvec (1, operands[2]);
+  x = gen_rtx_VEC_SELECT (SImode, operands[1], gen_rtx_PARALLEL (VOIDmode, v));
+  x = gen_rtx_VEC_DUPLICATE (V4SImode, x);
+  emit_insn (gen_rtx_SET (VOIDmode, operands[0], x));
+  DONE;
+})
+
+(define_insn "*altivec_vspltw_internal"
   [(set (match_operand:V4SI 0 "register_operand" "=v")
 	(vec_duplicate:V4SI
 	 (vec_select:SI (match_operand:V4SI 1 "register_operand" "v")
 			(parallel
 			 [(match_operand:QI 2 "u5bit_cint_operand" "i")]))))]
   "TARGET_ALTIVEC"
+{
+  /* For true LE, this adjusts the selected index.  For LE with
+     -maltivec=be, this reverses what was done in the define_expand
+     because the instruction already has big-endian bias.  */
+  if (!BYTES_BIG_ENDIAN)
+    operands[2] = GEN_INT (3 - INTVAL (operands[2]));
+
+  return "vspltw %0,%1,%2";
+}
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vspltw_direct"
+  [(set (match_operand:V4SI 0 "register_operand" "=v")
+        (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
+                      (match_operand:QI 2 "u5bit_cint_operand" "i")]
+                     UNSPEC_VSPLT_DIRECT))]
+  "TARGET_ALTIVEC"
   "vspltw %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
-(define_insn "altivec_vspltsf"
+(define_expand "altivec_vspltsf"
+  [(match_operand:V4SF 0 "register_operand" "")
+   (match_operand:V4SF 1 "register_operand" "")
+   (match_operand:QI 2 "u5bit_cint_operand" "")]
+  "TARGET_ALTIVEC"
+{
+  rtvec v;
+  rtx x;
+
+  /* Special handling for LE with -maltivec=be.  We have to reflect
+     the actual selected index for the splat in the RTL.  */
+  if (!BYTES_BIG_ENDIAN && VECTOR_ELT_ORDER_BIG)
+    operands[2] = GEN_INT (3 - INTVAL (operands[2]));
+
+  v = gen_rtvec (1, operands[2]);
+  x = gen_rtx_VEC_SELECT (SFmode, operands[1], gen_rtx_PARALLEL (VOIDmode, v));
+  x = gen_rtx_VEC_DUPLICATE (V4SFmode, x);
+  emit_insn (gen_rtx_SET (VOIDmode, operands[0], x));
+  DONE;
+})
+
+(define_insn "*altivec_vspltsf_internal"
   [(set (match_operand:V4SF 0 "register_operand" "=v")
 	(vec_duplicate:V4SF
 	 (vec_select:SF (match_operand:V4SF 1 "register_operand" "v")
 			(parallel
 			 [(match_operand:QI 2 "u5bit_cint_operand" "i")]))))]
   "VECTOR_UNIT_ALTIVEC_P (V4SFmode)"
-  "vspltw %0,%1,%2"
+{
+  /* For true LE, this adjusts the selected index.  For LE with
+     -maltivec=be, this reverses what was done in the define_expand
+     because the instruction already has big-endian bias.  */
+  if (!BYTES_BIG_ENDIAN)
+    operands[2] = GEN_INT (3 - INTVAL (operands[2]));
+
+  return "vspltw %0,%1,%2";
+}
   [(set_attr "type" "vecperm")])
 
 (define_insn "altivec_vspltis<VI_char>"