
[4.8,7/26] Backport Power8 and LE support: Vector LE

Message ID 1395257438.17148.9.camel@gnopaine
State New

Commit Message

Bill Schmidt March 19, 2014, 7:30 p.m. UTC
Hi,

This patch (diff-le-vector) backports the changes to support vector
infrastructure on powerpc64le.  Copying Richard and Jakub for the libcpp
bits.
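
The common thread in the vperm-related changes below is that, for little
endian, the two vperm inputs must be swapped and each element of the
permute control vector replaced by 31 minus its value (only the low five
bits of each element are significant).  As a rough illustrative sketch
only -- not part of the patch, and using a hypothetical helper name --
the selector adjustment that altivec_expand_vec_perm_const_le performs
on the CONST_VECTOR selector amounts to:

  /* Hypothetical helper, for illustration only: rewrite a big-endian
     vperm control vector so it selects the same bytes under the
     little-endian interpretation.  */
  static void
  adjust_vperm_selector_for_le (unsigned char sel[16])
  {
    int i;
    for (i = 0; i < 16; i++)
      sel[i] = 31 - (sel[i] & 31);  /* Only the low 5 bits are used.  */
  }

The vperm is then issued with its two source operands reversed, as shown
in the worked vr10/vr11 example in the comment added to rs6000.c.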

Thanks,
Bill


[gcc]

2014-03-29  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	Backport from mainline r205333
	2013-11-24  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* config/rs6000/rs6000.c (rs6000_expand_vec_perm_const_1): Correct
	for little endian.

	Backport from mainline r205241
	2013-11-21  Bill Schmidt  <wschmidt@vnet.ibm.com>

	* config/rs6000/vector.md (vec_pack_trunc_v2df): Revert previous
	little endian change.
	(vec_pack_sfix_trunc_v2df): Likewise.
	(vec_pack_ufix_trunc_v2df): Likewise.
	* config/rs6000/rs6000.c (rs6000_expand_interleave): Correct
	double checking of endianness.

	Backport from mainline r205146
	2013-11-20  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* config/rs6000/vsx.md (vsx_set_<mode>): Adjust for little endian.
	(vsx_extract_<mode>): Likewise.
	(*vsx_extract_<mode>_one_le): New LE variant on
	*vsx_extract_<mode>_zero.
	(vsx_extract_v4sf): Adjust for little endian.

	Backport from mainline r205080
	2013-11-19  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Adjust
	V16QI vector splat case for little endian.

	Backport from mainline r205045:

	2013-11-19  Ulrich Weigand  <Ulrich.Weigand@de.ibm.com>

	* config/rs6000/vector.md ("mov<mode>"): Do not call
	rs6000_emit_le_vsx_move to move into or out of GPRs.
	* config/rs6000/rs6000.c (rs6000_emit_le_vsx_move): Assert
	source and destination are not GPR hard regs.

	Backport from mainline r204920
	2013-11-17  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* config/rs6000/rs6000.c (rs6000_frame_related): Add split_reg
	parameter and use it in REG_FRAME_RELATED_EXPR note.
	(emit_frame_save): Call rs6000_frame_related with extra NULL_RTX
	parameter.
	(rs6000_emit_prologue): Likewise, but for little endian VSX
	stores, pass the source register of the store instead.

	Backport from mainline r204862
	2013-11-15  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* config/rs6000/altivec.md (UNSPEC_VPERM_X, UNSPEC_VPERM_UNS_X):
	Remove.
	(altivec_vperm_<mode>): Revert earlier little endian change.
	(*altivec_vperm_<mode>_internal): Remove.
	(altivec_vperm_<mode>_uns): Revert earlier little endian change.
	(*altivec_vperm_<mode>_uns_internal): Remove.
	* config/rs6000/vector.md (vec_realign_load_<mode>): Revise
	commentary.

	Backport from mainline r204441
	2013-11-05  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* config/rs6000/rs6000.c (rs6000_option_override_internal):
	Remove restriction against use of VSX instructions when generating
	code for little endian mode.

	Backport from mainline r204440
	2013-11-05  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* config/rs6000/altivec.md (mulv4si3): Ensure we generate vmulouh
	for both big and little endian.
	(mulv8hi3): Swap input operands for merge high and merge low
	instructions for little endian.

	Backport from mainline r204439
	2013-11-05  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* config/rs6000/altivec.md (vec_widen_umult_even_v16qi): Change
	define_insn to define_expand that uses even patterns for big
	endian and odd patterns for little endian.
	(vec_widen_smult_even_v16qi): Likewise.
	(vec_widen_umult_even_v8hi): Likewise.
	(vec_widen_smult_even_v8hi): Likewise.
	(vec_widen_umult_odd_v16qi): Likewise.
	(vec_widen_smult_odd_v16qi): Likewise.
	(vec_widen_umult_odd_v8hi): Likewise.
	(vec_widen_smult_odd_v8hi): Likewise.
	(altivec_vmuleub): New define_insn.
	(altivec_vmuloub): Likewise.
	(altivec_vmulesb): Likewise.
	(altivec_vmulosb): Likewise.
	(altivec_vmuleuh): Likewise.
	(altivec_vmulouh): Likewise.
	(altivec_vmulesh): Likewise.
	(altivec_vmulosh): Likewise.

	Backport from mainline r204395
	2013-11-05  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* config/rs6000/vector.md (vec_pack_sfix_trunc_v2df): Adjust for
	little endian.
	(vec_pack_ufix_trunc_v2df): Likewise.

	Backport from mainline r204363
	2013-11-04  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* config/rs6000/altivec.md (vec_widen_umult_hi_v16qi): Swap
	arguments to merge instruction for little endian.
	(vec_widen_umult_lo_v16qi): Likewise.
	(vec_widen_smult_hi_v16qi): Likewise.
	(vec_widen_smult_lo_v16qi): Likewise.
	(vec_widen_umult_hi_v8hi): Likewise.
	(vec_widen_umult_lo_v8hi): Likewise.
	(vec_widen_smult_hi_v8hi): Likewise.
	(vec_widen_smult_lo_v8hi): Likewise.

	Backport from mainline r204350
	2013-11-04  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* config/rs6000/vsx.md (*vsx_le_perm_store_<mode> for VSX_D):
	Replace the define_insn_and_split with a define_insn and two
	define_splits, with the split after reload re-permuting the source
	register to its original value.
	(*vsx_le_perm_store_<mode> for VSX_W): Likewise.
	(*vsx_le_perm_store_v8hi): Likewise.
	(*vsx_le_perm_store_v16qi): Likewise.

	Backport from mainline r204321
	2013-11-04  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* config/rs6000/vector.md (vec_pack_trunc_v2df): Adjust for
	little endian.

	Backport from mainline r204321
	2013-11-02  Bill Schmidt  <wschmidt@vnet.linux.ibm.com>

	* config/rs6000/rs6000.c (rs6000_expand_vector_set): Adjust for
	little endian.

	Backport from mainline r203980
	2013-10-23  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* config/rs6000/altivec.md (mulv8hi3): Adjust for little endian.

	Backport from mainline r203930
	2013-10-22  Bill Schmidt  <wschmidt@vnet.ibm.com>

	* config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Reverse
	meaning of merge-high and merge-low masks for little endian; avoid
	use of vector-pack masks for little endian for mismatched modes.

	Backport from mainline r203877
	2013-10-20  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* config/rs6000/altivec.md (vec_unpacku_hi_v16qi): Adjust for
	little endian.
	(vec_unpacku_hi_v8hi): Likewise.
	(vec_unpacku_lo_v16qi): Likewise.
	(vec_unpacku_lo_v8hi): Likewise.

	Backport from mainline r203863
	2013-10-19  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* config/rs6000/rs6000.c (vspltis_constant): Make sure we check
	all elements for both endian flavors.

	Backport from mainline r203714
	2013-10-16  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* config/rs6000/vector.md (vec_unpacks_hi_v4sf): Correct for
	endianness.
	(vec_unpacks_lo_v4sf): Likewise.
	(vec_unpacks_float_hi_v4si): Likewise.
	(vec_unpacks_float_lo_v4si): Likewise.
	(vec_unpacku_float_hi_v4si): Likewise.
	(vec_unpacku_float_lo_v4si): Likewise.

	Backport from mainline r203713
	2013-10-16  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* config/rs6000/vsx.md (vsx_concat_<mode>): Adjust output for LE.
	(vsx_concat_v2sf): Likewise.

	Backport from mainline r203458
	2013-10-11  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* config/rs6000/vsx.md (*vsx_le_perm_load_v2di): Generalize to
	handle vector float as well.
	(*vsx_le_perm_load_v4si): Likewise.
	(*vsx_le_perm_store_v2di): Likewise.
	(*vsx_le_perm_store_v4si): Likewise.

	Backport from mainline r203457
	2013-10-11  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* config/rs6000/vector.md (vec_realign_load<mode>): Generate vperm
	directly to circumvent subtract from splat{31} workaround.
	* config/rs6000/rs6000-protos.h (altivec_expand_vec_perm_le): New
	prototype.
	* config/rs6000/rs6000.c (altivec_expand_vec_perm_le): New.
	* config/rs6000/altivec.md (define_c_enum "unspec"): Add
	UNSPEC_VPERM_X and UNSPEC_VPERM_UNS_X.
	(altivec_vperm_<mode>): Convert to define_insn_and_split to
	separate big and little endian logic.
	(*altivec_vperm_<mode>_internal): New define_insn.
	(altivec_vperm_<mode>_uns): Convert to define_insn_and_split to
	separate big and little endian logic.
	(*altivec_vperm_<mode>_uns_internal): New define_insn.
	(vec_permv16qi): Add little endian logic.

	Backport from mainline r203247
	2013-10-07  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* config/rs6000/rs6000.c (altivec_expand_vec_perm_const_le): New.
	(altivec_expand_vec_perm_const): Call it.

	Backport from mainline r203246
	2013-10-07  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* config/rs6000/vector.md (mov<mode>): Emit permuted move
	sequences for LE VSX loads and stores at expand time.
	* config/rs6000/rs6000-protos.h (rs6000_emit_le_vsx_move): New
	prototype.
	* config/rs6000/rs6000.c (rs6000_const_vec): New.
	(rs6000_gen_le_vsx_permute): New.
	(rs6000_gen_le_vsx_load): New.
	(rs6000_gen_le_vsx_store): New.
	(rs6000_gen_le_vsx_move): New.
	* config/rs6000/vsx.md (*vsx_le_perm_load_v2di): New.
	(*vsx_le_perm_load_v4si): New.
	(*vsx_le_perm_load_v8hi): New.
	(*vsx_le_perm_load_v16qi): New.
	(*vsx_le_perm_store_v2di): New.
	(*vsx_le_perm_store_v4si): New.
	(*vsx_le_perm_store_v8hi): New.
	(*vsx_le_perm_store_v16qi): New.
	(*vsx_xxpermdi2_le_<mode>): New.
	(*vsx_xxpermdi4_le_<mode>): New.
	(*vsx_xxpermdi8_le_V8HI): New.
	(*vsx_xxpermdi16_le_V16QI): New.
	(*vsx_lxvd2x2_le_<mode>): New.
	(*vsx_lxvd2x4_le_<mode>): New.
	(*vsx_lxvd2x8_le_V8HI): New.
	(*vsx_lxvd2x16_le_V16QI): New.
	(*vsx_stxvd2x2_le_<mode>): New.
	(*vsx_stxvd2x4_le_<mode>): New.
	(*vsx_stxvd2x8_le_V8HI): New.
	(*vsx_stxvd2x16_le_V16QI): New.

	Backport from mainline r201235
	2013-07-24  Bill Schmidt  <wschmidt@linux.ibm.com>
	            Anton Blanchard <anton@au1.ibm.com>

	* config/rs6000/altivec.md (altivec_vpkpx): Handle little endian.
	(altivec_vpks<VI_char>ss): Likewise.
	(altivec_vpks<VI_char>us): Likewise.
	(altivec_vpku<VI_char>us): Likewise.
	(altivec_vpku<VI_char>um): Likewise.

	Backport from mainline r201208
	2013-07-24  Bill Schmidt  <wschmidt@vnet.linux.ibm.com>
	            Anton Blanchard <anton@au1.ibm.com>

	* config/rs6000/vector.md (vec_realign_load_<mode>): Reorder input
	operands to vperm for little endian.
	* config/rs6000/rs6000.c (rs6000_expand_builtin): Use lvsr instead
	of lvsl to create the control mask for a vperm for little endian.

	Backport from mainline r201195
	2013-07-23  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
	            Anton Blanchard <anton@au1.ibm.com>

	* config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Reverse
	two operands for little-endian.

	Backport from mainline r201193
	2013-07-23  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
	            Anton Blanchard <anton@au1.ibm.com>

	* config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Correct
	selection of field for vector splat in little endian mode.

	Backport from mainline r201149
	2013-07-22  Bill Schmidt  <wschmidt@vnet.linux.ibm.com>
	            Anton Blanchard <anton@au1.ibm.com>

	* config/rs6000/rs6000.c (rs6000_expand_vector_init): Fix
	endianness when selecting field to splat.

[gcc/testsuite]

2014-03-29  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	Backport from mainline r205638
	2013-12-03  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* gcc.dg/vect/costmodel/ppc/costmodel-slp-34.c: Skip for little
	endian.

	Backport from mainline r205146
	2013-11-20  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* gcc.target/powerpc/pr48258-1.c: Skip for little endian.

	Backport from mainline r204862
	2013-11-15  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* gcc.dg/vmx/3b-15.c: Revise for little endian.

	Backport from mainline r204321
	2013-11-02  Bill Schmidt  <wschmidt@vnet.linux.ibm.com>

	* gcc.dg/vmx/vec-set.c: New.

	Backport from mainline r204138
	2013-10-28  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* gcc.dg/vmx/gcc-bug-i.c: Add little endian variant.
	* gcc.dg/vmx/eg-5.c: Likewise.

	Backport from mainline r203930
	2013-10-22  Bill Schmidt  <wschmidt@vnet.ibm.com>

	* gcc.target/powerpc/altivec-perm-1.c: Move the two vector pack
	tests into...
	* gcc.target/powerpc/altivec-perm-3.c: ...this new test, which is
	restricted to big-endian targets.

	Backport from mainline r203246
	2013-10-07  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* gcc.target/powerpc/pr43154.c: Skip for ppc64 little endian.
	* gcc.target/powerpc/fusion.c: Likewise.

[libcpp]

2014-03-29  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	Backport from mainline
	2013-11-18  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* lex.c (search_line_fast): Correct for little endian.

Comments

Richard Biener March 24, 2014, 10:14 a.m. UTC | #1
On Wed, 19 Mar 2014, Bill Schmidt wrote:

> Hi,
> 
> This patch (diff-le-vector) backports the changes to support vector
> infrastructure on powerpc64le.  Copying Richard and Jakub for the libcpp
> bits.

The libcpp bits are fine.

Thanks,
Richard.

> Thanks,
> Bill
> 
> 
> Index: gcc-4_8-test/gcc/config/rs6000/rs6000.c
> ===================================================================
> --- gcc-4_8-test.orig/gcc/config/rs6000/rs6000.c
> +++ gcc-4_8-test/gcc/config/rs6000/rs6000.c
> @@ -3216,11 +3216,6 @@ rs6000_option_override_internal (bool gl
>  	}
>        else if (TARGET_PAIRED_FLOAT)
>  	msg = N_("-mvsx and -mpaired are incompatible");
> -      /* The hardware will allow VSX and little endian, but until we make sure
> -	 things like vector select, etc. work don't allow VSX on little endian
> -	 systems at this point.  */
> -      else if (!BYTES_BIG_ENDIAN)
> -	msg = N_("-mvsx used with little endian code");
>        else if (TARGET_AVOID_XFORM > 0)
>  	msg = N_("-mvsx needs indexed addressing");
>        else if (!TARGET_ALTIVEC && (rs6000_isa_flags_explicit
> @@ -4991,15 +4986,16 @@ vspltis_constant (rtx op, unsigned step,
>  
>    /* Check if VAL is present in every STEP-th element, and the
>       other elements are filled with its most significant bit.  */
> -  for (i = 0; i < nunits - 1; ++i)
> +  for (i = 1; i < nunits; ++i)
>      {
>        HOST_WIDE_INT desired_val;
> -      if (((BYTES_BIG_ENDIAN ? i + 1 : i) & (step - 1)) == 0)
> +      unsigned elt = BYTES_BIG_ENDIAN ? nunits - 1 - i : i;
> +      if ((i & (step - 1)) == 0)
>  	desired_val = val;
>        else
>  	desired_val = msb_val;
>  
> -      if (desired_val != const_vector_elt_as_int (op, i))
> +      if (desired_val != const_vector_elt_as_int (op, elt))
>  	return false;
>      }
>  
> @@ -5446,6 +5442,7 @@ rs6000_expand_vector_init (rtx target, r
>       of 64-bit items is not supported on Altivec.  */
>    if (all_same && GET_MODE_SIZE (inner_mode) <= 4)
>      {
> +      rtx field;
>        mem = assign_stack_temp (mode, GET_MODE_SIZE (inner_mode));
>        emit_move_insn (adjust_address_nv (mem, inner_mode, 0),
>  		      XVECEXP (vals, 0, 0));
> @@ -5456,9 +5453,11 @@ rs6000_expand_vector_init (rtx target, r
>  					      gen_rtx_SET (VOIDmode,
>  							   target, mem),
>  					      x)));
> +      field = (BYTES_BIG_ENDIAN ? const0_rtx
> +	       : GEN_INT (GET_MODE_NUNITS (mode) - 1));
>        x = gen_rtx_VEC_SELECT (inner_mode, target,
>  			      gen_rtx_PARALLEL (VOIDmode,
> -						gen_rtvec (1, const0_rtx)));
> +						gen_rtvec (1, field)));
>        emit_insn (gen_rtx_SET (VOIDmode, target,
>  			      gen_rtx_VEC_DUPLICATE (mode, x)));
>        return;
> @@ -5531,10 +5530,27 @@ rs6000_expand_vector_set (rtx target, rt
>      XVECEXP (mask, 0, elt*width + i)
>        = GEN_INT (i + 0x10);
>    x = gen_rtx_CONST_VECTOR (V16QImode, XVEC (mask, 0));
> -  x = gen_rtx_UNSPEC (mode,
> -		      gen_rtvec (3, target, reg,
> -				 force_reg (V16QImode, x)),
> -		      UNSPEC_VPERM);
> +
> +  if (BYTES_BIG_ENDIAN)
> +    x = gen_rtx_UNSPEC (mode,
> +			gen_rtvec (3, target, reg,
> +				   force_reg (V16QImode, x)),
> +			UNSPEC_VPERM);
> +  else 
> +    {
> +      /* Invert selector.  */
> +      rtx splat = gen_rtx_VEC_DUPLICATE (V16QImode,
> +					 gen_rtx_CONST_INT (QImode, -1));
> +      rtx tmp = gen_reg_rtx (V16QImode);
> +      emit_move_insn (tmp, splat);
> +      x = gen_rtx_MINUS (V16QImode, tmp, force_reg (V16QImode, x));
> +      emit_move_insn (tmp, x);
> +
> +      /* Permute with operands reversed and adjusted selector.  */
> +      x = gen_rtx_UNSPEC (mode, gen_rtvec (3, reg, target, tmp),
> +			  UNSPEC_VPERM);
> +    }
> +
>    emit_insn (gen_rtx_SET (VOIDmode, target, x));
>  }
>  
> @@ -7830,6 +7846,107 @@ rs6000_eliminate_indexed_memrefs (rtx op
>  			       copy_addr_to_reg (XEXP (operands[1], 0)));
>  }
>  
> +/* Generate a vector of constants to permute MODE for a little-endian
> +   storage operation by swapping the two halves of a vector.  */
> +static rtvec
> +rs6000_const_vec (enum machine_mode mode)
> +{
> +  int i, subparts;
> +  rtvec v;
> +
> +  switch (mode)
> +    {
> +    case V2DFmode:
> +    case V2DImode:
> +      subparts = 2;
> +      break;
> +    case V4SFmode:
> +    case V4SImode:
> +      subparts = 4;
> +      break;
> +    case V8HImode:
> +      subparts = 8;
> +      break;
> +    case V16QImode:
> +      subparts = 16;
> +      break;
> +    default:
> +      gcc_unreachable();
> +    }
> +
> +  v = rtvec_alloc (subparts);
> +
> +  for (i = 0; i < subparts / 2; ++i)
> +    RTVEC_ELT (v, i) = gen_rtx_CONST_INT (DImode, i + subparts / 2);
> +  for (i = subparts / 2; i < subparts; ++i)
> +    RTVEC_ELT (v, i) = gen_rtx_CONST_INT (DImode, i - subparts / 2);
> +
> +  return v;
> +}
> +
> +/* Generate a permute rtx that represents an lxvd2x, stxvd2x, or xxpermdi
> +   for a VSX load or store operation.  */
> +rtx
> +rs6000_gen_le_vsx_permute (rtx source, enum machine_mode mode)
> +{
> +  rtx par = gen_rtx_PARALLEL (VOIDmode, rs6000_const_vec (mode));
> +  return gen_rtx_VEC_SELECT (mode, source, par);
> +}
> +
> +/* Emit a little-endian load from vector memory location SOURCE to VSX
> +   register DEST in mode MODE.  The load is done with two permuting
> +   insn's that represent an lxvd2x and xxpermdi.  */
> +void
> +rs6000_emit_le_vsx_load (rtx dest, rtx source, enum machine_mode mode)
> +{
> +  rtx tmp = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (dest) : dest;
> +  rtx permute_mem = rs6000_gen_le_vsx_permute (source, mode);
> +  rtx permute_reg = rs6000_gen_le_vsx_permute (tmp, mode);
> +  emit_insn (gen_rtx_SET (VOIDmode, tmp, permute_mem));
> +  emit_insn (gen_rtx_SET (VOIDmode, dest, permute_reg));
> +}
> +
> +/* Emit a little-endian store to vector memory location DEST from VSX
> +   register SOURCE in mode MODE.  The store is done with two permuting
> +   insn's that represent an xxpermdi and an stxvd2x.  */
> +void
> +rs6000_emit_le_vsx_store (rtx dest, rtx source, enum machine_mode mode)
> +{
> +  rtx tmp = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (source) : source;
> +  rtx permute_src = rs6000_gen_le_vsx_permute (source, mode);
> +  rtx permute_tmp = rs6000_gen_le_vsx_permute (tmp, mode);
> +  emit_insn (gen_rtx_SET (VOIDmode, tmp, permute_src));
> +  emit_insn (gen_rtx_SET (VOIDmode, dest, permute_tmp));
> +}
> +
> +/* Emit a sequence representing a little-endian VSX load or store,
> +   moving data from SOURCE to DEST in mode MODE.  This is done
> +   separately from rs6000_emit_move to ensure it is called only
> +   during expand.  LE VSX loads and stores introduced later are
> +   handled with a split.  The expand-time RTL generation allows
> +   us to optimize away redundant pairs of register-permutes.  */
> +void
> +rs6000_emit_le_vsx_move (rtx dest, rtx source, enum machine_mode mode)
> +{
> +  gcc_assert (!BYTES_BIG_ENDIAN
> +	      && VECTOR_MEM_VSX_P (mode)
> +	      && mode != TImode
> +	      && !gpr_or_gpr_p (dest, source)
> +	      && (MEM_P (source) ^ MEM_P (dest)));
> +
> +  if (MEM_P (source))
> +    {
> +      gcc_assert (REG_P (dest));
> +      rs6000_emit_le_vsx_load (dest, source, mode);
> +    }
> +  else
> +    {
> +      if (!REG_P (source))
> +	source = force_reg (mode, source);
> +      rs6000_emit_le_vsx_store (dest, source, mode);
> +    }
> +}
> +
>  /* Emit a move from SOURCE to DEST in mode MODE.  */
>  void
>  rs6000_emit_move (rtx dest, rtx source, enum machine_mode mode)
> @@ -12589,7 +12706,8 @@ rs6000_expand_builtin (tree exp, rtx tar
>      case ALTIVEC_BUILTIN_MASK_FOR_LOAD:
>      case ALTIVEC_BUILTIN_MASK_FOR_STORE:
>        {
> -	int icode = (int) CODE_FOR_altivec_lvsr;
> +	int icode = (BYTES_BIG_ENDIAN ? (int) CODE_FOR_altivec_lvsr
> +		     : (int) CODE_FOR_altivec_lvsl);
>  	enum machine_mode tmode = insn_data[icode].operand[0].mode;
>  	enum machine_mode mode = insn_data[icode].operand[1].mode;
>  	tree arg;
> @@ -20880,7 +20998,7 @@ output_probe_stack_range (rtx reg1, rtx
>  
>  static rtx
>  rs6000_frame_related (rtx insn, rtx reg, HOST_WIDE_INT val,
> -		      rtx reg2, rtx rreg)
> +		      rtx reg2, rtx rreg, rtx split_reg)
>  {
>    rtx real, temp;
>  
> @@ -20971,6 +21089,11 @@ rs6000_frame_related (rtx insn, rtx reg,
>  	  }
>      }
>  
> +  /* If a store insn has been split into multiple insns, the
> +     true source register is given by split_reg.  */
> +  if (split_reg != NULL_RTX)
> +    real = gen_rtx_SET (VOIDmode, SET_DEST (real), split_reg);
> +
>    RTX_FRAME_RELATED_P (insn) = 1;
>    add_reg_note (insn, REG_FRAME_RELATED_EXPR, real);
>  
> @@ -21078,7 +21201,7 @@ emit_frame_save (rtx frame_reg, enum mac
>    reg = gen_rtx_REG (mode, regno);
>    insn = emit_insn (gen_frame_store (reg, frame_reg, offset));
>    return rs6000_frame_related (insn, frame_reg, frame_reg_to_sp,
> -			       NULL_RTX, NULL_RTX);
> +			       NULL_RTX, NULL_RTX, NULL_RTX);
>  }
>  
>  /* Emit an offset memory reference suitable for a frame store, while
> @@ -21599,7 +21722,7 @@ rs6000_emit_prologue (void)
>  
>        insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, p));
>        rs6000_frame_related (insn, frame_reg_rtx, sp_off - frame_off,
> -			    treg, GEN_INT (-info->total_size));
> +			    treg, GEN_INT (-info->total_size), NULL_RTX);
>        sp_off = frame_off = info->total_size;
>      }
>  
> @@ -21684,7 +21807,7 @@ rs6000_emit_prologue (void)
>  
>  	  insn = emit_move_insn (mem, reg);
>  	  rs6000_frame_related (insn, frame_reg_rtx, sp_off - frame_off,
> -				NULL_RTX, NULL_RTX);
> +				NULL_RTX, NULL_RTX, NULL_RTX);
>  	  END_USE (0);
>  	}
>      }
> @@ -21752,7 +21875,7 @@ rs6000_emit_prologue (void)
>  				     info->lr_save_offset,
>  				     DFmode, sel);
>        rs6000_frame_related (insn, ptr_reg, sp_off,
> -			    NULL_RTX, NULL_RTX);
> +			    NULL_RTX, NULL_RTX, NULL_RTX);
>        if (lr)
>  	END_USE (0);
>      }
> @@ -21831,7 +21954,7 @@ rs6000_emit_prologue (void)
>  					 SAVRES_SAVE | SAVRES_GPR);
>  
>  	  rs6000_frame_related (insn, spe_save_area_ptr, sp_off - save_off,
> -				NULL_RTX, NULL_RTX);
> +				NULL_RTX, NULL_RTX, NULL_RTX);
>  	}
>  
>        /* Move the static chain pointer back.  */
> @@ -21881,7 +22004,7 @@ rs6000_emit_prologue (void)
>  				     info->lr_save_offset + ptr_off,
>  				     reg_mode, sel);
>        rs6000_frame_related (insn, ptr_reg, sp_off - ptr_off,
> -			    NULL_RTX, NULL_RTX);
> +			    NULL_RTX, NULL_RTX, NULL_RTX);
>        if (lr)
>  	END_USE (0);
>      }
> @@ -21897,7 +22020,7 @@ rs6000_emit_prologue (void)
>  			     info->gp_save_offset + frame_off + reg_size * i);
>        insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, p));
>        rs6000_frame_related (insn, frame_reg_rtx, sp_off - frame_off,
> -			    NULL_RTX, NULL_RTX);
> +			    NULL_RTX, NULL_RTX, NULL_RTX);
>      }
>    else if (!WORLD_SAVE_P (info))
>      {
> @@ -22124,7 +22247,7 @@ rs6000_emit_prologue (void)
>  				     info->altivec_save_offset + ptr_off,
>  				     0, V4SImode, SAVRES_SAVE | SAVRES_VR);
>        rs6000_frame_related (insn, scratch_reg, sp_off - ptr_off,
> -			    NULL_RTX, NULL_RTX);
> +			    NULL_RTX, NULL_RTX, NULL_RTX);
>        if (REGNO (frame_reg_rtx) == REGNO (scratch_reg))
>  	{
>  	  /* The oddity mentioned above clobbered our frame reg.  */
> @@ -22140,7 +22263,7 @@ rs6000_emit_prologue (void)
>        for (i = info->first_altivec_reg_save; i <= LAST_ALTIVEC_REGNO; ++i)
>  	if (info->vrsave_mask & ALTIVEC_REG_BIT (i))
>  	  {
> -	    rtx areg, savereg, mem;
> +	    rtx areg, savereg, mem, split_reg;
>  	    int offset;
>  
>  	    offset = (info->altivec_save_offset + frame_off
> @@ -22158,8 +22281,18 @@ rs6000_emit_prologue (void)
>  
>  	    insn = emit_move_insn (mem, savereg);
>  
> +	    /* When we split a VSX store into two insns, we need to make
> +	       sure the DWARF info knows which register we are storing.
> +	       Pass it in to be used on the appropriate note.  */
> +	    if (!BYTES_BIG_ENDIAN
> +		&& GET_CODE (PATTERN (insn)) == SET
> +		&& GET_CODE (SET_SRC (PATTERN (insn))) == VEC_SELECT)
> +	      split_reg = savereg;
> +	    else
> +	      split_reg = NULL_RTX;
> +
>  	    rs6000_frame_related (insn, frame_reg_rtx, sp_off - frame_off,
> -				  areg, GEN_INT (offset));
> +				  areg, GEN_INT (offset), split_reg);
>  	  }
>      }
>  
> @@ -28813,6 +28946,136 @@ rs6000_emit_parity (rtx dst, rtx src)
>      }
>  }
>  
> +/* Expand an Altivec constant permutation for little endian mode.
> +   There are two issues: First, the two input operands must be
> +   swapped so that together they form a double-wide array in LE
> +   order.  Second, the vperm instruction has surprising behavior
> +   in LE mode:  it interprets the elements of the source vectors
> +   in BE mode ("left to right") and interprets the elements of
> +   the destination vector in LE mode ("right to left").  To
> +   correct for this, we must subtract each element of the permute
> +   control vector from 31.
> +
> +   For example, suppose we want to concatenate vr10 = {0, 1, 2, 3}
> +   with vr11 = {4, 5, 6, 7} and extract {0, 2, 4, 6} using a vperm.
> +   We place {0,1,2,3,8,9,10,11,16,17,18,19,24,25,26,27} in vr12 to
> +   serve as the permute control vector.  Then, in BE mode,
> +
> +     vperm 9,10,11,12
> +
> +   places the desired result in vr9.  However, in LE mode the 
> +   vector contents will be
> +
> +     vr10 = 00000003 00000002 00000001 00000000
> +     vr11 = 00000007 00000006 00000005 00000004
> +
> +   The result of the vperm using the same permute control vector is
> +
> +     vr9  = 05000000 07000000 01000000 03000000
> +
> +   That is, the leftmost 4 bytes of vr10 are interpreted as the
> +   source for the rightmost 4 bytes of vr9, and so on.
> +
> +   If we change the permute control vector to
> +
> +     vr12 = {31,30,29,28,23,22,21,20,15,14,13,12,7,6,5,4}
> +
> +   and issue
> +
> +     vperm 9,11,10,12
> +
> +   we get the desired
> +
> +   vr9  = 00000006 00000004 00000002 00000000.  */
> +
> +void
> +altivec_expand_vec_perm_const_le (rtx operands[4])
> +{
> +  unsigned int i;
> +  rtx perm[16];
> +  rtx constv, unspec;
> +  rtx target = operands[0];
> +  rtx op0 = operands[1];
> +  rtx op1 = operands[2];
> +  rtx sel = operands[3];
> +
> +  /* Unpack and adjust the constant selector.  */
> +  for (i = 0; i < 16; ++i)
> +    {
> +      rtx e = XVECEXP (sel, 0, i);
> +      unsigned int elt = 31 - (INTVAL (e) & 31);
> +      perm[i] = GEN_INT (elt);
> +    }
> +
> +  /* Expand to a permute, swapping the inputs and using the
> +     adjusted selector.  */
> +  if (!REG_P (op0))
> +    op0 = force_reg (V16QImode, op0);
> +  if (!REG_P (op1))
> +    op1 = force_reg (V16QImode, op1);
> +
> +  constv = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, perm));
> +  constv = force_reg (V16QImode, constv);
> +  unspec = gen_rtx_UNSPEC (V16QImode, gen_rtvec (3, op1, op0, constv),
> +			   UNSPEC_VPERM);
> +  if (!REG_P (target))
> +    {
> +      rtx tmp = gen_reg_rtx (V16QImode);
> +      emit_move_insn (tmp, unspec);
> +      unspec = tmp;
> +    }
> +
> +  emit_move_insn (target, unspec);
> +}
> +
> +/* Similarly to altivec_expand_vec_perm_const_le, we must adjust the
> +   permute control vector.  But here it's not a constant, so we must
> +   generate a vector splat/subtract to do the adjustment.  */
> +
> +void
> +altivec_expand_vec_perm_le (rtx operands[4])
> +{
> +  rtx splat, unspec;
> +  rtx target = operands[0];
> +  rtx op0 = operands[1];
> +  rtx op1 = operands[2];
> +  rtx sel = operands[3];
> +  rtx tmp = target;
> +
> +  /* Get everything in regs so the pattern matches.  */
> +  if (!REG_P (op0))
> +    op0 = force_reg (V16QImode, op0);
> +  if (!REG_P (op1))
> +    op1 = force_reg (V16QImode, op1);
> +  if (!REG_P (sel))
> +    sel = force_reg (V16QImode, sel);
> +  if (!REG_P (target))
> +    tmp = gen_reg_rtx (V16QImode);
> +
> +  /* SEL = splat(31) - SEL.  */
> +  /* We want to subtract from 31, but we can't vspltisb 31 since
> +     it's out of range.  -1 works as well because only the low-order
> +     five bits of the permute control vector elements are used.  */
> +  splat = gen_rtx_VEC_DUPLICATE (V16QImode,
> +				 gen_rtx_CONST_INT (QImode, -1));
> +  emit_move_insn (tmp, splat);
> +  sel = gen_rtx_MINUS (V16QImode, tmp, sel);
> +  emit_move_insn (tmp, sel);
> +
> +  /* Permute with operands reversed and adjusted selector.  */
> +  unspec = gen_rtx_UNSPEC (V16QImode, gen_rtvec (3, op1, op0, tmp),
> +			   UNSPEC_VPERM);
> +
> +  /* Copy into target, possibly by way of a register.  */
> +  if (!REG_P (target))
> +    {
> +      emit_move_insn (tmp, unspec);
> +      unspec = tmp;
> +    }
> +
> +  emit_move_insn (target, unspec);
> +}
> +
>  /* Expand an Altivec constant permutation.  Return true if we match
>     an efficient implementation; false to fall back to VPERM.  */
>  
> @@ -28829,17 +29092,23 @@ altivec_expand_vec_perm_const (rtx opera
>        {  1,  3,  5,  7,  9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31 } },
>      { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vpkuwum,
>        {  2,  3,  6,  7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31 } },
> -    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghb,
> +    { OPTION_MASK_ALTIVEC, 
> +      BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb : CODE_FOR_altivec_vmrglb,
>        {  0, 16,  1, 17,  2, 18,  3, 19,  4, 20,  5, 21,  6, 22,  7, 23 } },
> -    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghh,
> +    { OPTION_MASK_ALTIVEC,
> +      BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh : CODE_FOR_altivec_vmrglh,
>        {  0,  1, 16, 17,  2,  3, 18, 19,  4,  5, 20, 21,  6,  7, 22, 23 } },
> -    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghw,
> +    { OPTION_MASK_ALTIVEC,
> +      BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw : CODE_FOR_altivec_vmrglw,
>        {  0,  1,  2,  3, 16, 17, 18, 19,  4,  5,  6,  7, 20, 21, 22, 23 } },
> -    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglb,
> +    { OPTION_MASK_ALTIVEC,
> +      BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb : CODE_FOR_altivec_vmrghb,
>        {  8, 24,  9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31 } },
> -    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglh,
> +    { OPTION_MASK_ALTIVEC,
> +      BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh : CODE_FOR_altivec_vmrghh,
>        {  8,  9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31 } },
> -    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglw,
> +    { OPTION_MASK_ALTIVEC,
> +      BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw : CODE_FOR_altivec_vmrghw,
>        {  8,  9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31 } },
>      { OPTION_MASK_P8_VECTOR, CODE_FOR_p8_vmrgew,
>        {  0,  1,  2,  3, 16, 17, 18, 19,  8,  9, 10, 11, 24, 25, 26, 27 } },
> @@ -28901,6 +29170,8 @@ altivec_expand_vec_perm_const (rtx opera
>  	  break;
>        if (i == 16)
>  	{
> +          if (!BYTES_BIG_ENDIAN)
> +            elt = 15 - elt;
>  	  emit_insn (gen_altivec_vspltb (target, op0, GEN_INT (elt)));
>  	  return true;
>  	}
> @@ -28912,9 +29183,10 @@ altivec_expand_vec_perm_const (rtx opera
>  	      break;
>  	  if (i == 16)
>  	    {
> +	      int field = BYTES_BIG_ENDIAN ? elt / 2 : 7 - elt / 2;
>  	      x = gen_reg_rtx (V8HImode);
>  	      emit_insn (gen_altivec_vsplth (x, gen_lowpart (V8HImode, op0),
> -					     GEN_INT (elt / 2)));
> +					     GEN_INT (field)));
>  	      emit_move_insn (target, gen_lowpart (V16QImode, x));
>  	      return true;
>  	    }
> @@ -28930,9 +29202,10 @@ altivec_expand_vec_perm_const (rtx opera
>  	      break;
>  	  if (i == 16)
>  	    {
> +	      int field = BYTES_BIG_ENDIAN ? elt / 4 : 3 - elt / 4;
>  	      x = gen_reg_rtx (V4SImode);
>  	      emit_insn (gen_altivec_vspltw (x, gen_lowpart (V4SImode, op0),
> -					     GEN_INT (elt / 4)));
> +					     GEN_INT (field)));
>  	      emit_move_insn (target, gen_lowpart (V16QImode, x));
>  	      return true;
>  	    }
> @@ -28970,7 +29243,30 @@ altivec_expand_vec_perm_const (rtx opera
>  	  enum machine_mode omode = insn_data[icode].operand[0].mode;
>  	  enum machine_mode imode = insn_data[icode].operand[1].mode;
>  
> -	  if (swapped)
> +	  /* For little-endian, don't use vpkuwum and vpkuhum if the
> +	     underlying vector type is not V4SI and V8HI, respectively.
> +	     For example, using vpkuwum with a V8HI picks up the even
> +	     halfwords (BE numbering) when the even halfwords (LE
> +	     numbering) are what we need.  */
> +	  if (!BYTES_BIG_ENDIAN
> +	      && icode == CODE_FOR_altivec_vpkuwum
> +	      && ((GET_CODE (op0) == REG
> +		   && GET_MODE (op0) != V4SImode)
> +		  || (GET_CODE (op0) == SUBREG
> +		      && GET_MODE (XEXP (op0, 0)) != V4SImode)))
> +	    continue;
> +	  if (!BYTES_BIG_ENDIAN
> +	      && icode == CODE_FOR_altivec_vpkuhum
> +	      && ((GET_CODE (op0) == REG
> +		   && GET_MODE (op0) != V8HImode)
> +		  || (GET_CODE (op0) == SUBREG
> +		      && GET_MODE (XEXP (op0, 0)) != V8HImode)))
> +	    continue;
> +
> +          /* For little-endian, the two input operands must be swapped
> +             (or swapped back) to ensure proper right-to-left numbering
> +             from 0 to 2N-1.  */
> +	  if (swapped ^ !BYTES_BIG_ENDIAN)
>  	    x = op0, op0 = op1, op1 = x;
>  	  if (imode != V16QImode)
>  	    {
> @@ -28988,6 +29284,12 @@ altivec_expand_vec_perm_const (rtx opera
>  	}
>      }
>  
> +  if (!BYTES_BIG_ENDIAN)
> +    {
> +      altivec_expand_vec_perm_const_le (operands);
> +      return true;
> +    }
> +
>    return false;
>  }
>  
> @@ -29037,6 +29339,21 @@ rs6000_expand_vec_perm_const_1 (rtx targ
>        gcc_assert (GET_MODE_NUNITS (vmode) == 2);
>        dmode = mode_for_vector (GET_MODE_INNER (vmode), 4);
>  
> +      /* For little endian, swap operands and invert/swap selectors
> +	 to get the correct xxpermdi.  The operand swap sets up the
> +	 inputs as a little endian array.  The selectors are swapped
> +	 because they are defined to use big endian ordering.  The
> +	 selectors are inverted to get the correct doublewords for
> +	 little endian ordering.  */
> +      if (!BYTES_BIG_ENDIAN)
> +	{
> +	  int n;
> +	  perm0 = 3 - perm0;
> +	  perm1 = 3 - perm1;
> +	  n = perm0, perm0 = perm1, perm1 = n;
> +	  x = op0, op0 = op1, op1 = x;
> +	}
> +
>        x = gen_rtx_VEC_CONCAT (dmode, op0, op1);
>        v = gen_rtvec (2, GEN_INT (perm0), GEN_INT (perm1));
>        x = gen_rtx_VEC_SELECT (vmode, x, gen_rtx_PARALLEL (VOIDmode, v));
> @@ -29132,7 +29449,7 @@ rs6000_expand_interleave (rtx target, rt
>    unsigned i, high, nelt = GET_MODE_NUNITS (vmode);
>    rtx perm[16];
>  
> -  high = (highp == BYTES_BIG_ENDIAN ? 0 : nelt / 2);
> +  high = (highp ? 0 : nelt / 2);
>    for (i = 0; i < nelt / 2; i++)
>      {
>        perm[i * 2] = GEN_INT (i + high);
> Index: gcc-4_8-test/gcc/config/rs6000/vector.md
> ===================================================================
> --- gcc-4_8-test.orig/gcc/config/rs6000/vector.md
> +++ gcc-4_8-test/gcc/config/rs6000/vector.md
> @@ -88,7 +88,8 @@
>  				 (smax "smax")])
>  
>  
> -;; Vector move instructions.
> +;; Vector move instructions.  Little-endian VSX loads and stores require
> +;; special handling to circumvent "element endianness."
>  (define_expand "mov<mode>"
>    [(set (match_operand:VEC_M 0 "nonimmediate_operand" "")
>  	(match_operand:VEC_M 1 "any_operand" ""))]
> @@ -104,6 +105,16 @@
>  	       && !vlogical_operand (operands[1], <MODE>mode))
>  	operands[1] = force_reg (<MODE>mode, operands[1]);
>      }
> +  if (!BYTES_BIG_ENDIAN
> +      && VECTOR_MEM_VSX_P (<MODE>mode)
> +      && <MODE>mode != TImode
> +      && !gpr_or_gpr_p (operands[0], operands[1])
> +      && (memory_operand (operands[0], <MODE>mode)
> +          ^ memory_operand (operands[1], <MODE>mode)))
> +    {
> +      rs6000_emit_le_vsx_move (operands[0], operands[1], <MODE>mode);
> +      DONE;
> +    }
>  })
>  
>  ;; Generic vector floating point load/store instructions.  These will match
> @@ -862,7 +873,7 @@
>  {
>    rtx reg = gen_reg_rtx (V4SFmode);
>  
> -  rs6000_expand_interleave (reg, operands[1], operands[1], true);
> +  rs6000_expand_interleave (reg, operands[1], operands[1], BYTES_BIG_ENDIAN);
>    emit_insn (gen_vsx_xvcvspdp (operands[0], reg));
>    DONE;
>  })
> @@ -874,7 +885,7 @@
>  {
>    rtx reg = gen_reg_rtx (V4SFmode);
>  
> -  rs6000_expand_interleave (reg, operands[1], operands[1], false);
> +  rs6000_expand_interleave (reg, operands[1], operands[1], !BYTES_BIG_ENDIAN);
>    emit_insn (gen_vsx_xvcvspdp (operands[0], reg));
>    DONE;
>  })
> @@ -886,7 +897,7 @@
>  {
>    rtx reg = gen_reg_rtx (V4SImode);
>  
> -  rs6000_expand_interleave (reg, operands[1], operands[1], true);
> +  rs6000_expand_interleave (reg, operands[1], operands[1], BYTES_BIG_ENDIAN);
>    emit_insn (gen_vsx_xvcvsxwdp (operands[0], reg));
>    DONE;
>  })
> @@ -898,7 +909,7 @@
>  {
>    rtx reg = gen_reg_rtx (V4SImode);
>  
> -  rs6000_expand_interleave (reg, operands[1], operands[1], false);
> +  rs6000_expand_interleave (reg, operands[1], operands[1], !BYTES_BIG_ENDIAN);
>    emit_insn (gen_vsx_xvcvsxwdp (operands[0], reg));
>    DONE;
>  })
> @@ -910,7 +921,7 @@
>  {
>    rtx reg = gen_reg_rtx (V4SImode);
>  
> -  rs6000_expand_interleave (reg, operands[1], operands[1], true);
> +  rs6000_expand_interleave (reg, operands[1], operands[1], BYTES_BIG_ENDIAN);
>    emit_insn (gen_vsx_xvcvuxwdp (operands[0], reg));
>    DONE;
>  })
> @@ -922,7 +933,7 @@
>  {
>    rtx reg = gen_reg_rtx (V4SImode);
>  
> -  rs6000_expand_interleave (reg, operands[1], operands[1], false);
> +  rs6000_expand_interleave (reg, operands[1], operands[1], !BYTES_BIG_ENDIAN);
>    emit_insn (gen_vsx_xvcvuxwdp (operands[0], reg));
>    DONE;
>  })
> @@ -936,8 +947,19 @@
>     (match_operand:V16QI 3 "vlogical_operand" "")]
>    "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)"
>  {
> -  emit_insn (gen_altivec_vperm_<mode> (operands[0], operands[1], operands[2],
> -				       operands[3]));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vperm_<mode> (operands[0], operands[1],
> +    	      				 operands[2], operands[3]));
> +  else
> +    {
> +      /* We have changed lvsr to lvsl, so to complete the transformation
> +         of vperm for LE, we must swap the inputs.  */
> +      rtx unspec = gen_rtx_UNSPEC (<MODE>mode,
> +                                   gen_rtvec (3, operands[2],
> +                                              operands[1], operands[3]),
> +                                   UNSPEC_VPERM);
> +      emit_move_insn (operands[0], unspec);
> +    }
>    DONE;
>  })
>  
> Index: gcc-4_8-test/gcc/config/rs6000/altivec.md
> ===================================================================
> --- gcc-4_8-test.orig/gcc/config/rs6000/altivec.md
> +++ gcc-4_8-test/gcc/config/rs6000/altivec.md
> @@ -649,7 +649,7 @@
>     convert_move (small_swap, swap, 0);
>   
>     low_product = gen_reg_rtx (V4SImode);
> -   emit_insn (gen_vec_widen_umult_odd_v8hi (low_product, one, two));
> +   emit_insn (gen_altivec_vmulouh (low_product, one, two));
>   
>     high_product = gen_reg_rtx (V4SImode);
>     emit_insn (gen_altivec_vmsumuhm (high_product, one, small_swap, zero));
> @@ -676,10 +676,18 @@
>     emit_insn (gen_vec_widen_smult_even_v8hi (even, operands[1], operands[2]));
>     emit_insn (gen_vec_widen_smult_odd_v8hi (odd, operands[1], operands[2]));
>  
> -   emit_insn (gen_altivec_vmrghw (high, even, odd));
> -   emit_insn (gen_altivec_vmrglw (low, even, odd));
> -
> -   emit_insn (gen_altivec_vpkuwum (operands[0], high, low));
> +   if (BYTES_BIG_ENDIAN)
> +     {
> +       emit_insn (gen_altivec_vmrghw (high, even, odd));
> +       emit_insn (gen_altivec_vmrglw (low, even, odd));
> +       emit_insn (gen_altivec_vpkuwum (operands[0], high, low));
> +     }
> +   else
> +     {
> +       emit_insn (gen_altivec_vmrghw (high, odd, even));
> +       emit_insn (gen_altivec_vmrglw (low, odd, even));
> +       emit_insn (gen_altivec_vpkuwum (operands[0], low, high));
> +     } 
>  
>     DONE;
>  }")
> @@ -967,7 +975,111 @@
>    "vmrgow %0,%1,%2"
>    [(set_attr "type" "vecperm")])
>  
> -(define_insn "vec_widen_umult_even_v16qi"
> +(define_expand "vec_widen_umult_even_v16qi"
> +  [(use (match_operand:V8HI 0 "register_operand" ""))
> +   (use (match_operand:V16QI 1 "register_operand" ""))
> +   (use (match_operand:V16QI 2 "register_operand" ""))]
> +  "TARGET_ALTIVEC"
> +{
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmuleub (operands[0], operands[1], operands[2]));
> +  else
> +    emit_insn (gen_altivec_vmuloub (operands[0], operands[1], operands[2]));
> +  DONE;
> +})
> +
> +(define_expand "vec_widen_smult_even_v16qi"
> +  [(use (match_operand:V8HI 0 "register_operand" ""))
> +   (use (match_operand:V16QI 1 "register_operand" ""))
> +   (use (match_operand:V16QI 2 "register_operand" ""))]
> +  "TARGET_ALTIVEC"
> +{
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmulesb (operands[0], operands[1], operands[2]));
> +  else
> +    emit_insn (gen_altivec_vmulosb (operands[0], operands[1], operands[2]));
> +  DONE;
> +})
> +
> +(define_expand "vec_widen_umult_even_v8hi"
> +  [(use (match_operand:V4SI 0 "register_operand" ""))
> +   (use (match_operand:V8HI 1 "register_operand" ""))
> +   (use (match_operand:V8HI 2 "register_operand" ""))]
> +  "TARGET_ALTIVEC"
> +{
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmuleuh (operands[0], operands[1], operands[2]));
> +  else
> +    emit_insn (gen_altivec_vmulouh (operands[0], operands[1], operands[2]));
> +  DONE;
> +})
> +
> +(define_expand "vec_widen_smult_even_v8hi"
> +  [(use (match_operand:V4SI 0 "register_operand" ""))
> +   (use (match_operand:V8HI 1 "register_operand" ""))
> +   (use (match_operand:V8HI 2 "register_operand" ""))]
> +  "TARGET_ALTIVEC"
> +{
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmulesh (operands[0], operands[1], operands[2]));
> +  else
> +    emit_insn (gen_altivec_vmulosh (operands[0], operands[1], operands[2]));
> +  DONE;
> +})
> +
> +(define_expand "vec_widen_umult_odd_v16qi"
> +  [(use (match_operand:V8HI 0 "register_operand" ""))
> +   (use (match_operand:V16QI 1 "register_operand" ""))
> +   (use (match_operand:V16QI 2 "register_operand" ""))]
> +  "TARGET_ALTIVEC"
> +{
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmuloub (operands[0], operands[1], operands[2]));
> +  else
> +    emit_insn (gen_altivec_vmuleub (operands[0], operands[1], operands[2]));
> +  DONE;
> +})
> +
> +(define_expand "vec_widen_smult_odd_v16qi"
> +  [(use (match_operand:V8HI 0 "register_operand" ""))
> +   (use (match_operand:V16QI 1 "register_operand" ""))
> +   (use (match_operand:V16QI 2 "register_operand" ""))]
> +  "TARGET_ALTIVEC"
> +{
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmulosb (operands[0], operands[1], operands[2]));
> +  else
> +    emit_insn (gen_altivec_vmulesb (operands[0], operands[1], operands[2]));
> +  DONE;
> +})
> +
> +(define_expand "vec_widen_umult_odd_v8hi"
> +  [(use (match_operand:V4SI 0 "register_operand" ""))
> +   (use (match_operand:V8HI 1 "register_operand" ""))
> +   (use (match_operand:V8HI 2 "register_operand" ""))]
> +  "TARGET_ALTIVEC"
> +{
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmulouh (operands[0], operands[1], operands[2]));
> +  else
> +    emit_insn (gen_altivec_vmuleuh (operands[0], operands[1], operands[2]));
> +  DONE;
> +})
> +
> +(define_expand "vec_widen_smult_odd_v8hi"
> +  [(use (match_operand:V4SI 0 "register_operand" ""))
> +   (use (match_operand:V8HI 1 "register_operand" ""))
> +   (use (match_operand:V8HI 2 "register_operand" ""))]
> +  "TARGET_ALTIVEC"
> +{
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmulosh (operands[0], operands[1], operands[2]));
> +  else
> +    emit_insn (gen_altivec_vmulesh (operands[0], operands[1], operands[2]));
> +  DONE;
> +})
> +
> +(define_insn "altivec_vmuleub"
>    [(set (match_operand:V8HI 0 "register_operand" "=v")
>          (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")
>                        (match_operand:V16QI 2 "register_operand" "v")]
> @@ -976,43 +1088,25 @@
>    "vmuleub %0,%1,%2"
>    [(set_attr "type" "veccomplex")])
>  
> -(define_insn "vec_widen_smult_even_v16qi"
> +(define_insn "altivec_vmuloub"
>    [(set (match_operand:V8HI 0 "register_operand" "=v")
>          (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")
>                        (match_operand:V16QI 2 "register_operand" "v")]
> -		     UNSPEC_VMULESB))]
> -  "TARGET_ALTIVEC"
> -  "vmulesb %0,%1,%2"
> -  [(set_attr "type" "veccomplex")])
> -
> -(define_insn "vec_widen_umult_even_v8hi"
> -  [(set (match_operand:V4SI 0 "register_operand" "=v")
> -        (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")
> -                      (match_operand:V8HI 2 "register_operand" "v")]
> -		     UNSPEC_VMULEUH))]
> -  "TARGET_ALTIVEC"
> -  "vmuleuh %0,%1,%2"
> -  [(set_attr "type" "veccomplex")])
> -
> -(define_insn "vec_widen_smult_even_v8hi"
> -  [(set (match_operand:V4SI 0 "register_operand" "=v")
> -        (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")
> -                      (match_operand:V8HI 2 "register_operand" "v")]
> -		     UNSPEC_VMULESH))]
> +		     UNSPEC_VMULOUB))]
>    "TARGET_ALTIVEC"
> -  "vmulesh %0,%1,%2"
> +  "vmuloub %0,%1,%2"
>    [(set_attr "type" "veccomplex")])
>  
> -(define_insn "vec_widen_umult_odd_v16qi"
> +(define_insn "altivec_vmulesb"
>    [(set (match_operand:V8HI 0 "register_operand" "=v")
>          (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")
>                        (match_operand:V16QI 2 "register_operand" "v")]
> -		     UNSPEC_VMULOUB))]
> +		     UNSPEC_VMULESB))]
>    "TARGET_ALTIVEC"
> -  "vmuloub %0,%1,%2"
> +  "vmulesb %0,%1,%2"
>    [(set_attr "type" "veccomplex")])
>  
> -(define_insn "vec_widen_smult_odd_v16qi"
> +(define_insn "altivec_vmulosb"
>    [(set (match_operand:V8HI 0 "register_operand" "=v")
>          (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")
>                        (match_operand:V16QI 2 "register_operand" "v")]
> @@ -1021,7 +1115,16 @@
>    "vmulosb %0,%1,%2"
>    [(set_attr "type" "veccomplex")])
>  
> -(define_insn "vec_widen_umult_odd_v8hi"
> +(define_insn "altivec_vmuleuh"
> +  [(set (match_operand:V4SI 0 "register_operand" "=v")
> +        (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")
> +                      (match_operand:V8HI 2 "register_operand" "v")]
> +		     UNSPEC_VMULEUH))]
> +  "TARGET_ALTIVEC"
> +  "vmuleuh %0,%1,%2"
> +  [(set_attr "type" "veccomplex")])
> +
> +(define_insn "altivec_vmulouh"
>    [(set (match_operand:V4SI 0 "register_operand" "=v")
>          (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")
>                        (match_operand:V8HI 2 "register_operand" "v")]
> @@ -1030,7 +1133,16 @@
>    "vmulouh %0,%1,%2"
>    [(set_attr "type" "veccomplex")])
>  
> -(define_insn "vec_widen_smult_odd_v8hi"
> +(define_insn "altivec_vmulesh"
> +  [(set (match_operand:V4SI 0 "register_operand" "=v")
> +        (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")
> +                      (match_operand:V8HI 2 "register_operand" "v")]
> +		     UNSPEC_VMULESH))]
> +  "TARGET_ALTIVEC"
> +  "vmulesh %0,%1,%2"
> +  [(set_attr "type" "veccomplex")])
> +
> +(define_insn "altivec_vmulosh"
>    [(set (match_operand:V4SI 0 "register_operand" "=v")
>          (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")
>                        (match_operand:V8HI 2 "register_operand" "v")]
> @@ -1047,7 +1159,13 @@
>                        (match_operand:V4SI 2 "register_operand" "v")]
>  		     UNSPEC_VPKPX))]
>    "TARGET_ALTIVEC"
> -  "vpkpx %0,%1,%2"
> +  "*
> +  {
> +    if (BYTES_BIG_ENDIAN)
> +      return \"vpkpx %0,%1,%2\";
> +    else
> +      return \"vpkpx %0,%2,%1\";
> +  }"
>    [(set_attr "type" "vecperm")])
>  
>  (define_insn "altivec_vpks<VI_char>ss"
> @@ -1056,7 +1174,13 @@
>  			    (match_operand:VP 2 "register_operand" "v")]
>  			   UNSPEC_VPACK_SIGN_SIGN_SAT))]
>    "<VI_unit>"
> -  "vpks<VI_char>ss %0,%1,%2"
> +  "*
> +  {
> +    if (BYTES_BIG_ENDIAN)
> +      return \"vpks<VI_char>ss %0,%1,%2\";
> +    else
> +      return \"vpks<VI_char>ss %0,%2,%1\";
> +  }"
>    [(set_attr "type" "vecperm")])
>  
>  (define_insn "altivec_vpks<VI_char>us"
> @@ -1065,7 +1189,13 @@
>  			    (match_operand:VP 2 "register_operand" "v")]
>  			   UNSPEC_VPACK_SIGN_UNS_SAT))]
>    "<VI_unit>"
> -  "vpks<VI_char>us %0,%1,%2"
> +  "*
> +  {
> +    if (BYTES_BIG_ENDIAN)
> +      return \"vpks<VI_char>us %0,%1,%2\";
> +    else
> +      return \"vpks<VI_char>us %0,%2,%1\";
> +  }"
>    [(set_attr "type" "vecperm")])
>  
>  (define_insn "altivec_vpku<VI_char>us"
> @@ -1074,7 +1204,13 @@
>  			    (match_operand:VP 2 "register_operand" "v")]
>  			   UNSPEC_VPACK_UNS_UNS_SAT))]
>    "<VI_unit>"
> -  "vpku<VI_char>us %0,%1,%2"
> +  "*
> +  {
> +    if (BYTES_BIG_ENDIAN)
> +      return \"vpku<VI_char>us %0,%1,%2\";
> +    else
> +      return \"vpku<VI_char>us %0,%2,%1\";
> +  }"
>    [(set_attr "type" "vecperm")])
>  
>  (define_insn "altivec_vpku<VI_char>um"
> @@ -1083,7 +1219,13 @@
>  			    (match_operand:VP 2 "register_operand" "v")]
>  			   UNSPEC_VPACK_UNS_UNS_MOD))]
>    "<VI_unit>"
> -  "vpku<VI_char>um %0,%1,%2"
> +  "*
> +  {
> +    if (BYTES_BIG_ENDIAN)
> +      return \"vpku<VI_char>um %0,%1,%2\";
> +    else
> +      return \"vpku<VI_char>um %0,%2,%1\";
> +  }"
>    [(set_attr "type" "vecperm")])
>  
>  (define_insn "*altivec_vrl<VI_char>"
> @@ -1276,7 +1418,12 @@
>  		       (match_operand:V16QI 3 "register_operand" "")]
>  		      UNSPEC_VPERM))]
>    "TARGET_ALTIVEC"
> -  "")
> +{
> +  if (!BYTES_BIG_ENDIAN) {
> +    altivec_expand_vec_perm_le (operands);
> +    DONE;
> +  }
> +})
>  
>  (define_expand "vec_perm_constv16qi"
>    [(match_operand:V16QI 0 "register_operand" "")
> @@ -1928,25 +2075,26 @@
>    rtx vzero = gen_reg_rtx (V8HImode);
>    rtx mask = gen_reg_rtx (V16QImode);
>    rtvec v = rtvec_alloc (16);
> +  bool be = BYTES_BIG_ENDIAN;
>     
>    emit_insn (gen_altivec_vspltish (vzero, const0_rtx));
>     
> -  RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, 0);
> -  RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, 1);
> -  RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, 2);
> -  RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, 3);
> -  RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, 4);
> -  RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, 5);
> -  RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, 6);
> -  RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, 7);
> +  RTVEC_ELT (v,  0) = gen_rtx_CONST_INT (QImode, be ? 16 :  7);
> +  RTVEC_ELT (v,  1) = gen_rtx_CONST_INT (QImode, be ?  0 : 16);
> +  RTVEC_ELT (v,  2) = gen_rtx_CONST_INT (QImode, be ? 16 :  6);
> +  RTVEC_ELT (v,  3) = gen_rtx_CONST_INT (QImode, be ?  1 : 16);
> +  RTVEC_ELT (v,  4) = gen_rtx_CONST_INT (QImode, be ? 16 :  5);
> +  RTVEC_ELT (v,  5) = gen_rtx_CONST_INT (QImode, be ?  2 : 16);
> +  RTVEC_ELT (v,  6) = gen_rtx_CONST_INT (QImode, be ? 16 :  4);
> +  RTVEC_ELT (v,  7) = gen_rtx_CONST_INT (QImode, be ?  3 : 16);
> +  RTVEC_ELT (v,  8) = gen_rtx_CONST_INT (QImode, be ? 16 :  3);
> +  RTVEC_ELT (v,  9) = gen_rtx_CONST_INT (QImode, be ?  4 : 16);
> +  RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 16 :  2);
> +  RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ?  5 : 16);
> +  RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 :  1);
> +  RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ?  6 : 16);
> +  RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 16 :  0);
> +  RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ?  7 : 16);
>  
>    emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
>    emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask));
> @@ -1963,25 +2111,26 @@
>    rtx vzero = gen_reg_rtx (V4SImode);
>    rtx mask = gen_reg_rtx (V16QImode);
>    rtvec v = rtvec_alloc (16);
> +  bool be = BYTES_BIG_ENDIAN;
>  
>    emit_insn (gen_altivec_vspltisw (vzero, const0_rtx));
>   
> -  RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, 17);
> -  RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, 0);
> -  RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, 1);
> -  RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, 17);
> -  RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, 2);
> -  RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, 3);
> -  RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, 17);
> -  RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, 4);
> -  RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, 5);
> -  RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, 17);
> -  RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, 6);
> -  RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, 7);
> +  RTVEC_ELT (v,  0) = gen_rtx_CONST_INT (QImode, be ? 16 :  7);
> +  RTVEC_ELT (v,  1) = gen_rtx_CONST_INT (QImode, be ? 17 :  6);
> +  RTVEC_ELT (v,  2) = gen_rtx_CONST_INT (QImode, be ?  0 : 17);
> +  RTVEC_ELT (v,  3) = gen_rtx_CONST_INT (QImode, be ?  1 : 16);
> +  RTVEC_ELT (v,  4) = gen_rtx_CONST_INT (QImode, be ? 16 :  5);
> +  RTVEC_ELT (v,  5) = gen_rtx_CONST_INT (QImode, be ? 17 :  4);
> +  RTVEC_ELT (v,  6) = gen_rtx_CONST_INT (QImode, be ?  2 : 17);
> +  RTVEC_ELT (v,  7) = gen_rtx_CONST_INT (QImode, be ?  3 : 16);
> +  RTVEC_ELT (v,  8) = gen_rtx_CONST_INT (QImode, be ? 16 :  3);
> +  RTVEC_ELT (v,  9) = gen_rtx_CONST_INT (QImode, be ? 17 :  2);
> +  RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ?  4 : 17);
> +  RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ?  5 : 16);
> +  RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 :  1);
> +  RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 17 :  0);
> +  RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ?  6 : 17);
> +  RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ?  7 : 16);
>  
>    emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
>    emit_insn (gen_vperm_v8hiv4si (operands[0], operands[1], vzero, mask));
> @@ -1998,25 +2147,26 @@
>    rtx vzero = gen_reg_rtx (V8HImode);
>    rtx mask = gen_reg_rtx (V16QImode);
>    rtvec v = rtvec_alloc (16);
> +  bool be = BYTES_BIG_ENDIAN;
>  
>    emit_insn (gen_altivec_vspltish (vzero, const0_rtx));
>  
> -  RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, 8);
> -  RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, 9);
> -  RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, 10);
> -  RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, 11);
> -  RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, 12);
> -  RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, 13);
> -  RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, 14);
> -  RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, 15);
> +  RTVEC_ELT (v,  0) = gen_rtx_CONST_INT (QImode, be ? 16 : 15);
> +  RTVEC_ELT (v,  1) = gen_rtx_CONST_INT (QImode, be ?  8 : 16);
> +  RTVEC_ELT (v,  2) = gen_rtx_CONST_INT (QImode, be ? 16 : 14);
> +  RTVEC_ELT (v,  3) = gen_rtx_CONST_INT (QImode, be ?  9 : 16);
> +  RTVEC_ELT (v,  4) = gen_rtx_CONST_INT (QImode, be ? 16 : 13);
> +  RTVEC_ELT (v,  5) = gen_rtx_CONST_INT (QImode, be ? 10 : 16);
> +  RTVEC_ELT (v,  6) = gen_rtx_CONST_INT (QImode, be ? 16 : 12);
> +  RTVEC_ELT (v,  7) = gen_rtx_CONST_INT (QImode, be ? 11 : 16);
> +  RTVEC_ELT (v,  8) = gen_rtx_CONST_INT (QImode, be ? 16 : 11);
> +  RTVEC_ELT (v,  9) = gen_rtx_CONST_INT (QImode, be ? 12 : 16);
> +  RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 16 : 10);
> +  RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ? 13 : 16);
> +  RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 :  9);
> +  RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 14 : 16);
> +  RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 16 :  8);
> +  RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 15 : 16);
>  
>    emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
>    emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask));
> @@ -2033,25 +2183,26 @@
>    rtx vzero = gen_reg_rtx (V4SImode);
>    rtx mask = gen_reg_rtx (V16QImode);
>    rtvec v = rtvec_alloc (16);
> +  bool be = BYTES_BIG_ENDIAN;
>  
>    emit_insn (gen_altivec_vspltisw (vzero, const0_rtx));
>   
> -  RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, 17);
> -  RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, 8);
> -  RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, 9);
> -  RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, 17);
> -  RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, 10);
> -  RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, 11);
> -  RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, 17);
> -  RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, 12);
> -  RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, 13);
> -  RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, 16);
> -  RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, 17);
> -  RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, 14);
> -  RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, 15);
> +  RTVEC_ELT (v,  0) = gen_rtx_CONST_INT (QImode, be ? 16 : 15);
> +  RTVEC_ELT (v,  1) = gen_rtx_CONST_INT (QImode, be ? 17 : 14);
> +  RTVEC_ELT (v,  2) = gen_rtx_CONST_INT (QImode, be ?  8 : 17);
> +  RTVEC_ELT (v,  3) = gen_rtx_CONST_INT (QImode, be ?  9 : 16);
> +  RTVEC_ELT (v,  4) = gen_rtx_CONST_INT (QImode, be ? 16 : 13);
> +  RTVEC_ELT (v,  5) = gen_rtx_CONST_INT (QImode, be ? 17 : 12);
> +  RTVEC_ELT (v,  6) = gen_rtx_CONST_INT (QImode, be ? 10 : 17);
> +  RTVEC_ELT (v,  7) = gen_rtx_CONST_INT (QImode, be ? 11 : 16);
> +  RTVEC_ELT (v,  8) = gen_rtx_CONST_INT (QImode, be ? 16 : 11);
> +  RTVEC_ELT (v,  9) = gen_rtx_CONST_INT (QImode, be ? 17 : 10);
> +  RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 12 : 17);
> +  RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ? 13 : 16);
> +  RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 :  9);
> +  RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 17 :  8);
> +  RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 14 : 17);
> +  RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 15 : 16);
>  
>    emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
>    emit_insn (gen_vperm_v8hiv4si (operands[0], operands[1], vzero, mask));
> @@ -2071,7 +2222,10 @@
>    
>    emit_insn (gen_vec_widen_umult_even_v16qi (ve, operands[1], operands[2]));
>    emit_insn (gen_vec_widen_umult_odd_v16qi (vo, operands[1], operands[2]));
> -  emit_insn (gen_altivec_vmrghh (operands[0], ve, vo));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmrghh (operands[0], ve, vo));
> +  else
> +    emit_insn (gen_altivec_vmrghh (operands[0], vo, ve));
>    DONE;
>  }")
>  
> @@ -2088,7 +2242,10 @@
>    
>    emit_insn (gen_vec_widen_umult_even_v16qi (ve, operands[1], operands[2]));
>    emit_insn (gen_vec_widen_umult_odd_v16qi (vo, operands[1], operands[2]));
> -  emit_insn (gen_altivec_vmrglh (operands[0], ve, vo));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmrglh (operands[0], ve, vo));
> +  else
> +    emit_insn (gen_altivec_vmrglh (operands[0], vo, ve));
>    DONE;
>  }")
>  
> @@ -2105,7 +2262,10 @@
>    
>    emit_insn (gen_vec_widen_smult_even_v16qi (ve, operands[1], operands[2]));
>    emit_insn (gen_vec_widen_smult_odd_v16qi (vo, operands[1], operands[2]));
> -  emit_insn (gen_altivec_vmrghh (operands[0], ve, vo));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmrghh (operands[0], ve, vo));
> +  else
> +    emit_insn (gen_altivec_vmrghh (operands[0], vo, ve));
>    DONE;
>  }")
>  
> @@ -2122,7 +2282,10 @@
>    
>    emit_insn (gen_vec_widen_smult_even_v16qi (ve, operands[1], operands[2]));
>    emit_insn (gen_vec_widen_smult_odd_v16qi (vo, operands[1], operands[2]));
> -  emit_insn (gen_altivec_vmrglh (operands[0], ve, vo));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmrglh (operands[0], ve, vo));
> +  else
> +    emit_insn (gen_altivec_vmrglh (operands[0], vo, ve));
>    DONE;
>  }")
>  
> @@ -2139,7 +2302,10 @@
>    
>    emit_insn (gen_vec_widen_umult_even_v8hi (ve, operands[1], operands[2]));
>    emit_insn (gen_vec_widen_umult_odd_v8hi (vo, operands[1], operands[2]));
> -  emit_insn (gen_altivec_vmrghw (operands[0], ve, vo));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmrghw (operands[0], ve, vo));
> +  else
> +    emit_insn (gen_altivec_vmrghw (operands[0], vo, ve));
>    DONE;
>  }")
>  
> @@ -2156,7 +2322,10 @@
>    
>    emit_insn (gen_vec_widen_umult_even_v8hi (ve, operands[1], operands[2]));
>    emit_insn (gen_vec_widen_umult_odd_v8hi (vo, operands[1], operands[2]));
> -  emit_insn (gen_altivec_vmrglw (operands[0], ve, vo));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmrglw (operands[0], ve, vo));
> +  else
> +    emit_insn (gen_altivec_vmrglw (operands[0], vo, ve));
>    DONE;
>  }")
>  
> @@ -2173,7 +2342,10 @@
>    
>    emit_insn (gen_vec_widen_smult_even_v8hi (ve, operands[1], operands[2]));
>    emit_insn (gen_vec_widen_smult_odd_v8hi (vo, operands[1], operands[2]));
> -  emit_insn (gen_altivec_vmrghw (operands[0], ve, vo));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmrghw (operands[0], ve, vo));
> +  else
> +    emit_insn (gen_altivec_vmrghw (operands[0], vo, ve));
>    DONE;
>  }")
>  
> @@ -2190,7 +2362,10 @@
>    
>    emit_insn (gen_vec_widen_smult_even_v8hi (ve, operands[1], operands[2]));
>    emit_insn (gen_vec_widen_smult_odd_v8hi (vo, operands[1], operands[2]));
> -  emit_insn (gen_altivec_vmrglw (operands[0], ve, vo));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmrglw (operands[0], ve, vo));
> +  else
> +    emit_insn (gen_altivec_vmrglw (operands[0], vo, ve));
>    DONE;
>  }")
>  
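A note on the lane bookkeeping in the altivec.md changes above, for readers
following along: the even/odd expanders and the widen-mult hi/lo expanders
work as a pair.  On little endian the "even" expander emits the
architecturally odd multiply (and vice versa), and the merge operands are
swapped to match, so the element order the program sees is unchanged.  As a
rough sketch only (assuming vec_mule/vec_mulo expand through the
vec_widen_*_even/odd patterns; none of this code is part of the patch), the
high-part widening multiply now behaves like:

  #include <altivec.h>

  /* Illustrative model of vec_widen_umult_hi_v16qi after this change.  */
  vector unsigned short
  widen_umult_hi (vector unsigned char a, vector unsigned char b)
  {
    /* "Even" products: vmuleub on BE, vmuloub on LE.  */
    vector unsigned short ve = vec_mule (a, b);
    /* "Odd" products: vmuloub on BE, vmuleub on LE.  */
    vector unsigned short vo = vec_mulo (a, b);
  #ifdef __BIG_ENDIAN__
    return vec_mergeh (ve, vo);        /* vmrghh ve,vo */
  #else
    return vec_mergeh (vo, ve);        /* vmrghh vo,ve */
  #endif
  }

The vpkpx/vpks*/vpku* templates above apply the same idea directly in the
output template by swapping %1 and %2 for little endian.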
> Index: gcc-4_8-test/gcc/config/rs6000/rs6000-protos.h
> ===================================================================
> --- gcc-4_8-test.orig/gcc/config/rs6000/rs6000-protos.h
> +++ gcc-4_8-test/gcc/config/rs6000/rs6000-protos.h
> @@ -56,6 +56,7 @@ extern void paired_expand_vector_init (r
>  extern void rs6000_expand_vector_set (rtx, rtx, int);
>  extern void rs6000_expand_vector_extract (rtx, rtx, int);
>  extern bool altivec_expand_vec_perm_const (rtx op[4]);
> +extern void altivec_expand_vec_perm_le (rtx op[4]);
>  extern bool rs6000_expand_vec_perm_const (rtx op[4]);
>  extern void rs6000_expand_extract_even (rtx, rtx, rtx);
>  extern void rs6000_expand_interleave (rtx, rtx, rtx, bool);
> @@ -122,6 +123,7 @@ extern rtx rs6000_longcall_ref (rtx);
>  extern void rs6000_fatal_bad_address (rtx);
>  extern rtx create_TOC_reference (rtx, rtx);
>  extern void rs6000_split_multireg_move (rtx, rtx);
> +extern void rs6000_emit_le_vsx_move (rtx, rtx, enum machine_mode);
>  extern void rs6000_emit_move (rtx, rtx, enum machine_mode);
>  extern rtx rs6000_secondary_memory_needed_rtx (enum machine_mode);
>  extern rtx (*rs6000_legitimize_reload_address_ptr) (rtx, enum machine_mode,
> Index: gcc-4_8-test/gcc/config/rs6000/vsx.md
> ===================================================================
> --- gcc-4_8-test.orig/gcc/config/rs6000/vsx.md
> +++ gcc-4_8-test/gcc/config/rs6000/vsx.md
> @@ -216,6 +216,359 @@
>    ])
>  
>  ;; VSX moves
> +
> +;; The patterns for LE permuted loads and stores come before the general
> +;; VSX moves so they match first.
> +(define_insn_and_split "*vsx_le_perm_load_<mode>"
> +  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
> +        (match_operand:VSX_D 1 "memory_operand" "Z"))]
> +  "!BYTES_BIG_ENDIAN && TARGET_VSX"
> +  "#"
> +  "!BYTES_BIG_ENDIAN && TARGET_VSX"
> +  [(set (match_dup 2)
> +        (vec_select:<MODE>
> +          (match_dup 1)
> +          (parallel [(const_int 1) (const_int 0)])))
> +   (set (match_dup 0)
> +        (vec_select:<MODE>
> +          (match_dup 2)
> +          (parallel [(const_int 1) (const_int 0)])))]
> +  "
> +{
> +  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[0])
> +                                       : operands[0];
> +}
> +  "
> +  [(set_attr "type" "vecload")
> +   (set_attr "length" "8")])
> +
> +(define_insn_and_split "*vsx_le_perm_load_<mode>"
> +  [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa")
> +        (match_operand:VSX_W 1 "memory_operand" "Z"))]
> +  "!BYTES_BIG_ENDIAN && TARGET_VSX"
> +  "#"
> +  "!BYTES_BIG_ENDIAN && TARGET_VSX"
> +  [(set (match_dup 2)
> +        (vec_select:<MODE>
> +          (match_dup 1)
> +          (parallel [(const_int 2) (const_int 3)
> +                     (const_int 0) (const_int 1)])))
> +   (set (match_dup 0)
> +        (vec_select:<MODE>
> +          (match_dup 2)
> +          (parallel [(const_int 2) (const_int 3)
> +                     (const_int 0) (const_int 1)])))]
> +  "
> +{
> +  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[0])
> +                                       : operands[0];
> +}
> +  "
> +  [(set_attr "type" "vecload")
> +   (set_attr "length" "8")])
> +
> +(define_insn_and_split "*vsx_le_perm_load_v8hi"
> +  [(set (match_operand:V8HI 0 "vsx_register_operand" "=wa")
> +        (match_operand:V8HI 1 "memory_operand" "Z"))]
> +  "!BYTES_BIG_ENDIAN && TARGET_VSX"
> +  "#"
> +  "!BYTES_BIG_ENDIAN && TARGET_VSX"
> +  [(set (match_dup 2)
> +        (vec_select:V8HI
> +          (match_dup 1)
> +          (parallel [(const_int 4) (const_int 5)
> +                     (const_int 6) (const_int 7)
> +                     (const_int 0) (const_int 1)
> +                     (const_int 2) (const_int 3)])))
> +   (set (match_dup 0)
> +        (vec_select:V8HI
> +          (match_dup 2)
> +          (parallel [(const_int 4) (const_int 5)
> +                     (const_int 6) (const_int 7)
> +                     (const_int 0) (const_int 1)
> +                     (const_int 2) (const_int 3)])))]
> +  "
> +{
> +  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[0])
> +                                       : operands[0];
> +}
> +  "
> +  [(set_attr "type" "vecload")
> +   (set_attr "length" "8")])
> +
> +(define_insn_and_split "*vsx_le_perm_load_v16qi"
> +  [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa")
> +        (match_operand:V16QI 1 "memory_operand" "Z"))]
> +  "!BYTES_BIG_ENDIAN && TARGET_VSX"
> +  "#"
> +  "!BYTES_BIG_ENDIAN && TARGET_VSX"
> +  [(set (match_dup 2)
> +        (vec_select:V16QI
> +          (match_dup 1)
> +          (parallel [(const_int 8) (const_int 9)
> +                     (const_int 10) (const_int 11)
> +                     (const_int 12) (const_int 13)
> +                     (const_int 14) (const_int 15)
> +                     (const_int 0) (const_int 1)
> +                     (const_int 2) (const_int 3)
> +                     (const_int 4) (const_int 5)
> +                     (const_int 6) (const_int 7)])))
> +   (set (match_dup 0)
> +        (vec_select:V16QI
> +          (match_dup 2)
> +          (parallel [(const_int 8) (const_int 9)
> +                     (const_int 10) (const_int 11)
> +                     (const_int 12) (const_int 13)
> +                     (const_int 14) (const_int 15)
> +                     (const_int 0) (const_int 1)
> +                     (const_int 2) (const_int 3)
> +                     (const_int 4) (const_int 5)
> +                     (const_int 6) (const_int 7)])))]
> +  "
> +{
> +  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[0])
> +                                       : operands[0];
> +}
> +  "
> +  [(set_attr "type" "vecload")
> +   (set_attr "length" "8")])
> +
> +(define_insn "*vsx_le_perm_store_<mode>"
> +  [(set (match_operand:VSX_D 0 "memory_operand" "=Z")
> +        (match_operand:VSX_D 1 "vsx_register_operand" "+wa"))]
> +  "!BYTES_BIG_ENDIAN && TARGET_VSX"
> +  "#"
> +  [(set_attr "type" "vecstore")
> +   (set_attr "length" "12")])
> +
> +(define_split
> +  [(set (match_operand:VSX_D 0 "memory_operand" "")
> +        (match_operand:VSX_D 1 "vsx_register_operand" ""))]
> +  "!BYTES_BIG_ENDIAN && TARGET_VSX && !reload_completed"
> +  [(set (match_dup 2)
> +        (vec_select:<MODE>
> +          (match_dup 1)
> +          (parallel [(const_int 1) (const_int 0)])))
> +   (set (match_dup 0)
> +        (vec_select:<MODE>
> +          (match_dup 2)
> +          (parallel [(const_int 1) (const_int 0)])))]
> +{
> +  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[1]) 
> +                                       : operands[1];
> +})
> +
> +;; The post-reload split requires that we re-permute the source
> +;; register in case it is still live.
> +(define_split
> +  [(set (match_operand:VSX_D 0 "memory_operand" "")
> +        (match_operand:VSX_D 1 "vsx_register_operand" ""))]
> +  "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed"
> +  [(set (match_dup 1)
> +        (vec_select:<MODE>
> +          (match_dup 1)
> +          (parallel [(const_int 1) (const_int 0)])))
> +   (set (match_dup 0)
> +        (vec_select:<MODE>
> +          (match_dup 1)
> +          (parallel [(const_int 1) (const_int 0)])))
> +   (set (match_dup 1)
> +        (vec_select:<MODE>
> +          (match_dup 1)
> +          (parallel [(const_int 1) (const_int 0)])))]
> +  "")
> +
> +(define_insn "*vsx_le_perm_store_<mode>"
> +  [(set (match_operand:VSX_W 0 "memory_operand" "=Z")
> +        (match_operand:VSX_W 1 "vsx_register_operand" "+wa"))]
> +  "!BYTES_BIG_ENDIAN && TARGET_VSX"
> +  "#"
> +  [(set_attr "type" "vecstore")
> +   (set_attr "length" "12")])
> +
> +(define_split
> +  [(set (match_operand:VSX_W 0 "memory_operand" "")
> +        (match_operand:VSX_W 1 "vsx_register_operand" ""))]
> +  "!BYTES_BIG_ENDIAN && TARGET_VSX && !reload_completed"
> +  [(set (match_dup 2)
> +        (vec_select:<MODE>
> +          (match_dup 1)
> +          (parallel [(const_int 2) (const_int 3)
> +	             (const_int 0) (const_int 1)])))
> +   (set (match_dup 0)
> +        (vec_select:<MODE>
> +          (match_dup 2)
> +          (parallel [(const_int 2) (const_int 3)
> +	             (const_int 0) (const_int 1)])))]
> +{
> +  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[1]) 
> +                                       : operands[1];
> +})
> +
> +;; The post-reload split requires that we re-permute the source
> +;; register in case it is still live.
> +(define_split
> +  [(set (match_operand:VSX_W 0 "memory_operand" "")
> +        (match_operand:VSX_W 1 "vsx_register_operand" ""))]
> +  "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed"
> +  [(set (match_dup 1)
> +        (vec_select:<MODE>
> +          (match_dup 1)
> +          (parallel [(const_int 2) (const_int 3)
> +	             (const_int 0) (const_int 1)])))
> +   (set (match_dup 0)
> +        (vec_select:<MODE>
> +          (match_dup 1)
> +          (parallel [(const_int 2) (const_int 3)
> +	             (const_int 0) (const_int 1)])))
> +   (set (match_dup 1)
> +        (vec_select:<MODE>
> +          (match_dup 1)
> +          (parallel [(const_int 2) (const_int 3)
> +	             (const_int 0) (const_int 1)])))]
> +  "")
> +
> +(define_insn "*vsx_le_perm_store_v8hi"
> +  [(set (match_operand:V8HI 0 "memory_operand" "=Z")
> +        (match_operand:V8HI 1 "vsx_register_operand" "+wa"))]
> +  "!BYTES_BIG_ENDIAN && TARGET_VSX"
> +  "#"
> +  [(set_attr "type" "vecstore")
> +   (set_attr "length" "12")])
> +
> +(define_split
> +  [(set (match_operand:V8HI 0 "memory_operand" "")
> +        (match_operand:V8HI 1 "vsx_register_operand" ""))]
> +  "!BYTES_BIG_ENDIAN && TARGET_VSX && !reload_completed"
> +  [(set (match_dup 2)
> +        (vec_select:V8HI
> +          (match_dup 1)
> +          (parallel [(const_int 4) (const_int 5)
> +                     (const_int 6) (const_int 7)
> +                     (const_int 0) (const_int 1)
> +                     (const_int 2) (const_int 3)])))
> +   (set (match_dup 0)
> +        (vec_select:V8HI
> +          (match_dup 2)
> +          (parallel [(const_int 4) (const_int 5)
> +                     (const_int 6) (const_int 7)
> +                     (const_int 0) (const_int 1)
> +                     (const_int 2) (const_int 3)])))]
> +{
> +  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[1]) 
> +                                       : operands[1];
> +})
> +
> +;; The post-reload split requires that we re-permute the source
> +;; register in case it is still live.
> +(define_split
> +  [(set (match_operand:V8HI 0 "memory_operand" "")
> +        (match_operand:V8HI 1 "vsx_register_operand" ""))]
> +  "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed"
> +  [(set (match_dup 1)
> +        (vec_select:V8HI
> +          (match_dup 1)
> +          (parallel [(const_int 4) (const_int 5)
> +                     (const_int 6) (const_int 7)
> +                     (const_int 0) (const_int 1)
> +                     (const_int 2) (const_int 3)])))
> +   (set (match_dup 0)
> +        (vec_select:V8HI
> +          (match_dup 1)
> +          (parallel [(const_int 4) (const_int 5)
> +                     (const_int 6) (const_int 7)
> +                     (const_int 0) (const_int 1)
> +                     (const_int 2) (const_int 3)])))
> +   (set (match_dup 1)
> +        (vec_select:V8HI
> +          (match_dup 1)
> +          (parallel [(const_int 4) (const_int 5)
> +                     (const_int 6) (const_int 7)
> +                     (const_int 0) (const_int 1)
> +                     (const_int 2) (const_int 3)])))]
> +  "")
> +
> +(define_insn "*vsx_le_perm_store_v16qi"
> +  [(set (match_operand:V16QI 0 "memory_operand" "=Z")
> +        (match_operand:V16QI 1 "vsx_register_operand" "+wa"))]
> +  "!BYTES_BIG_ENDIAN && TARGET_VSX"
> +  "#"
> +  [(set_attr "type" "vecstore")
> +   (set_attr "length" "12")])
> +
> +(define_split
> +  [(set (match_operand:V16QI 0 "memory_operand" "")
> +        (match_operand:V16QI 1 "vsx_register_operand" ""))]
> +  "!BYTES_BIG_ENDIAN && TARGET_VSX && !reload_completed"
> +  [(set (match_dup 2)
> +        (vec_select:V16QI
> +          (match_dup 1)
> +          (parallel [(const_int 8) (const_int 9)
> +                     (const_int 10) (const_int 11)
> +                     (const_int 12) (const_int 13)
> +                     (const_int 14) (const_int 15)
> +                     (const_int 0) (const_int 1)
> +                     (const_int 2) (const_int 3)
> +                     (const_int 4) (const_int 5)
> +                     (const_int 6) (const_int 7)])))
> +   (set (match_dup 0)
> +        (vec_select:V16QI
> +          (match_dup 2)
> +          (parallel [(const_int 8) (const_int 9)
> +                     (const_int 10) (const_int 11)
> +                     (const_int 12) (const_int 13)
> +                     (const_int 14) (const_int 15)
> +                     (const_int 0) (const_int 1)
> +                     (const_int 2) (const_int 3)
> +                     (const_int 4) (const_int 5)
> +                     (const_int 6) (const_int 7)])))]
> +{
> +  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[1]) 
> +                                       : operands[1];
> +})
> +
> +;; The post-reload split requires that we re-permute the source
> +;; register in case it is still live.
> +(define_split
> +  [(set (match_operand:V16QI 0 "memory_operand" "")
> +        (match_operand:V16QI 1 "vsx_register_operand" ""))]
> +  "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed"
> +  [(set (match_dup 1)
> +        (vec_select:V16QI
> +          (match_dup 1)
> +          (parallel [(const_int 8) (const_int 9)
> +                     (const_int 10) (const_int 11)
> +                     (const_int 12) (const_int 13)
> +                     (const_int 14) (const_int 15)
> +                     (const_int 0) (const_int 1)
> +                     (const_int 2) (const_int 3)
> +                     (const_int 4) (const_int 5)
> +                     (const_int 6) (const_int 7)])))
> +   (set (match_dup 0)
> +        (vec_select:V16QI
> +          (match_dup 1)
> +          (parallel [(const_int 8) (const_int 9)
> +                     (const_int 10) (const_int 11)
> +                     (const_int 12) (const_int 13)
> +                     (const_int 14) (const_int 15)
> +                     (const_int 0) (const_int 1)
> +                     (const_int 2) (const_int 3)
> +                     (const_int 4) (const_int 5)
> +                     (const_int 6) (const_int 7)])))
> +   (set (match_dup 1)
> +        (vec_select:V16QI
> +          (match_dup 1)
> +          (parallel [(const_int 8) (const_int 9)
> +                     (const_int 10) (const_int 11)
> +                     (const_int 12) (const_int 13)
> +                     (const_int 14) (const_int 15)
> +                     (const_int 0) (const_int 1)
> +                     (const_int 2) (const_int 3)
> +                     (const_int 4) (const_int 5)
> +                     (const_int 6) (const_int 7)])))]
> +  "")
> +
> +
>  (define_insn "*vsx_mov<mode>"
>    [(set (match_operand:VSX_M 0 "nonimmediate_operand" "=Z,<VSr>,<VSr>,?Z,?wa,?wa,wQ,?&r,??Y,??r,??r,<VSr>,?wa,*r,v,wZ, v")
>  	(match_operand:VSX_M 1 "input_operand" "<VSr>,Z,<VSr>,wa,Z,wa,r,wQ,r,Y,r,j,j,j,W,v,wZ"))]
> @@ -962,7 +1315,12 @@
>  	 (match_operand:<VS_scalar> 1 "vsx_register_operand" "ws,wa")
>  	 (match_operand:<VS_scalar> 2 "vsx_register_operand" "ws,wa")))]
>    "VECTOR_MEM_VSX_P (<MODE>mode)"
> -  "xxpermdi %x0,%x1,%x2,0"
> +{
> +  if (BYTES_BIG_ENDIAN)
> +    return "xxpermdi %x0,%x1,%x2,0";
> +  else
> +    return "xxpermdi %x0,%x2,%x1,0";
> +}
>    [(set_attr "type" "vecperm")])
>  
>  ;; Special purpose concat using xxpermdi to glue two single precision values
> @@ -975,9 +1333,161 @@
>  	  (match_operand:SF 2 "vsx_register_operand" "f,f")]
>  	 UNSPEC_VSX_CONCAT))]
>    "VECTOR_MEM_VSX_P (V2DFmode)"
> -  "xxpermdi %x0,%x1,%x2,0"
> +{
> +  if (BYTES_BIG_ENDIAN)
> +    return "xxpermdi %x0,%x1,%x2,0";
> +  else
> +    return "xxpermdi %x0,%x2,%x1,0";
> +}
> +  [(set_attr "type" "vecperm")])
> +
> +;; xxpermdi for little endian loads and stores.  We need several of
> +;; these since the form of the PARALLEL differs by mode.
> +(define_insn "*vsx_xxpermdi2_le_<mode>"
> +  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
> +        (vec_select:VSX_D
> +          (match_operand:VSX_D 1 "vsx_register_operand" "wa")
> +          (parallel [(const_int 1) (const_int 0)])))]
> +  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)"
> +  "xxpermdi %x0,%x1,%x1,2"
> +  [(set_attr "type" "vecperm")])
> +
> +(define_insn "*vsx_xxpermdi4_le_<mode>"
> +  [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa")
> +        (vec_select:VSX_W
> +          (match_operand:VSX_W 1 "vsx_register_operand" "wa")
> +          (parallel [(const_int 2) (const_int 3)
> +                     (const_int 0) (const_int 1)])))]
> +  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)"
> +  "xxpermdi %x0,%x1,%x1,2"
> +  [(set_attr "type" "vecperm")])
> +
> +(define_insn "*vsx_xxpermdi8_le_V8HI"
> +  [(set (match_operand:V8HI 0 "vsx_register_operand" "=wa")
> +        (vec_select:V8HI
> +          (match_operand:V8HI 1 "vsx_register_operand" "wa")
> +          (parallel [(const_int 4) (const_int 5)
> +                     (const_int 6) (const_int 7)
> +                     (const_int 0) (const_int 1)
> +                     (const_int 2) (const_int 3)])))]
> +  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V8HImode)"
> +  "xxpermdi %x0,%x1,%x1,2"
> +  [(set_attr "type" "vecperm")])
> +
> +(define_insn "*vsx_xxpermdi16_le_V16QI"
> +  [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa")
> +        (vec_select:V16QI
> +          (match_operand:V16QI 1 "vsx_register_operand" "wa")
> +          (parallel [(const_int 8) (const_int 9)
> +                     (const_int 10) (const_int 11)
> +                     (const_int 12) (const_int 13)
> +                     (const_int 14) (const_int 15)
> +                     (const_int 0) (const_int 1)
> +                     (const_int 2) (const_int 3)
> +                     (const_int 4) (const_int 5)
> +                     (const_int 6) (const_int 7)])))]
> +  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V16QImode)"
> +  "xxpermdi %x0,%x1,%x1,2"
>    [(set_attr "type" "vecperm")])
>  
> +;; lxvd2x for little endian loads.  We need several of
> +;; these since the form of the PARALLEL differs by mode.
> +(define_insn "*vsx_lxvd2x2_le_<mode>"
> +  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
> +        (vec_select:VSX_D
> +          (match_operand:VSX_D 1 "memory_operand" "Z")
> +          (parallel [(const_int 1) (const_int 0)])))]
> +  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)"
> +  "lxvd2x %x0,%y1"
> +  [(set_attr "type" "vecload")])
> +
> +(define_insn "*vsx_lxvd2x4_le_<mode>"
> +  [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa")
> +        (vec_select:VSX_W
> +          (match_operand:VSX_W 1 "memory_operand" "Z")
> +          (parallel [(const_int 2) (const_int 3)
> +                     (const_int 0) (const_int 1)])))]
> +  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)"
> +  "lxvd2x %x0,%y1"
> +  [(set_attr "type" "vecload")])
> +
> +(define_insn "*vsx_lxvd2x8_le_V8HI"
> +  [(set (match_operand:V8HI 0 "vsx_register_operand" "=wa")
> +        (vec_select:V8HI
> +          (match_operand:V8HI 1 "memory_operand" "Z")
> +          (parallel [(const_int 4) (const_int 5)
> +                     (const_int 6) (const_int 7)
> +                     (const_int 0) (const_int 1)
> +                     (const_int 2) (const_int 3)])))]
> +  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V8HImode)"
> +  "lxvd2x %x0,%y1"
> +  [(set_attr "type" "vecload")])
> +
> +(define_insn "*vsx_lxvd2x16_le_V16QI"
> +  [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa")
> +        (vec_select:V16QI
> +          (match_operand:V16QI 1 "memory_operand" "Z")
> +          (parallel [(const_int 8) (const_int 9)
> +                     (const_int 10) (const_int 11)
> +                     (const_int 12) (const_int 13)
> +                     (const_int 14) (const_int 15)
> +                     (const_int 0) (const_int 1)
> +                     (const_int 2) (const_int 3)
> +                     (const_int 4) (const_int 5)
> +                     (const_int 6) (const_int 7)])))]
> +  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V16QImode)"
> +  "lxvd2x %x0,%y1"
> +  [(set_attr "type" "vecload")])
> +
> +;; stxvd2x for little endian stores.  We need several of
> +;; these since the form of the PARALLEL differs by mode.
> +(define_insn "*vsx_stxvd2x2_le_<mode>"
> +  [(set (match_operand:VSX_D 0 "memory_operand" "=Z")
> +        (vec_select:VSX_D
> +          (match_operand:VSX_D 1 "vsx_register_operand" "wa")
> +          (parallel [(const_int 1) (const_int 0)])))]
> +  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)"
> +  "stxvd2x %x1,%y0"
> +  [(set_attr "type" "vecstore")])
> +
> +(define_insn "*vsx_stxvd2x4_le_<mode>"
> +  [(set (match_operand:VSX_W 0 "memory_operand" "=Z")
> +        (vec_select:VSX_W
> +          (match_operand:VSX_W 1 "vsx_register_operand" "wa")
> +          (parallel [(const_int 2) (const_int 3)
> +                     (const_int 0) (const_int 1)])))]
> +  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)"
> +  "stxvd2x %x1,%y0"
> +  [(set_attr "type" "vecstore")])
> +
> +(define_insn "*vsx_stxvd2x8_le_V8HI"
> +  [(set (match_operand:V8HI 0 "memory_operand" "=Z")
> +        (vec_select:V8HI
> +          (match_operand:V8HI 1 "vsx_register_operand" "wa")
> +          (parallel [(const_int 4) (const_int 5)
> +                     (const_int 6) (const_int 7)
> +                     (const_int 0) (const_int 1)
> +                     (const_int 2) (const_int 3)])))]
> +  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V8HImode)"
> +  "stxvd2x %x1,%y0"
> +  [(set_attr "type" "vecstore")])
> +
> +(define_insn "*vsx_stxvd2x16_le_V16QI"
> +  [(set (match_operand:V16QI 0 "memory_operand" "=Z")
> +        (vec_select:V16QI
> +          (match_operand:V16QI 1 "vsx_register_operand" "wa")
> +          (parallel [(const_int 8) (const_int 9)
> +                     (const_int 10) (const_int 11)
> +                     (const_int 12) (const_int 13)
> +                     (const_int 14) (const_int 15)
> +                     (const_int 0) (const_int 1)
> +                     (const_int 2) (const_int 3)
> +                     (const_int 4) (const_int 5)
> +                     (const_int 6) (const_int 7)])))]
> +  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V16QImode)"
> +  "stxvd2x %x1,%y0"
> +  [(set_attr "type" "vecstore")])
> +
>  ;; Set the element of a V2DI/VD2F mode
>  (define_insn "vsx_set_<mode>"
>    [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wd,?wa")
> @@ -987,9 +1497,10 @@
>  		      UNSPEC_VSX_SET))]
>    "VECTOR_MEM_VSX_P (<MODE>mode)"
>  {
> -  if (INTVAL (operands[3]) == 0)
> +  int idx_first = BYTES_BIG_ENDIAN ? 0 : 1;
> +  if (INTVAL (operands[3]) == idx_first)
>      return \"xxpermdi %x0,%x2,%x1,1\";
> -  else if (INTVAL (operands[3]) == 1)
> +  else if (INTVAL (operands[3]) == 1 - idx_first)
>      return \"xxpermdi %x0,%x1,%x2,0\";
>    else
>      gcc_unreachable ();
> @@ -1004,8 +1515,12 @@
>  			[(match_operand:QI 2 "u5bit_cint_operand" "i,i,i")])))]
>    "VECTOR_MEM_VSX_P (<MODE>mode)"
>  {
> +  int fldDM;
>    gcc_assert (UINTVAL (operands[2]) <= 1);
> -  operands[3] = GEN_INT (INTVAL (operands[2]) << 1);
> +  fldDM = INTVAL (operands[2]) << 1;
> +  if (!BYTES_BIG_ENDIAN)
> +    fldDM = 3 - fldDM;
> +  operands[3] = GEN_INT (fldDM);
>    return \"xxpermdi %x0,%x1,%x1,%3\";
>  }
>    [(set_attr "type" "vecperm")])
> @@ -1025,6 +1540,21 @@
>  	(const_string "fpload")))
>     (set_attr "length" "4")])  
>  
> +;; Optimize extracting element 1 from memory for little endian
> +(define_insn "*vsx_extract_<mode>_one_le"
> +  [(set (match_operand:<VS_scalar> 0 "vsx_register_operand" "=ws,d,?wa")
> +	(vec_select:<VS_scalar>
> +	 (match_operand:VSX_D 1 "indexed_or_indirect_operand" "Z,Z,Z")
> +	 (parallel [(const_int 1)])))]
> +  "VECTOR_MEM_VSX_P (<MODE>mode) && !WORDS_BIG_ENDIAN"
> +  "lxsd%U1x %x0,%y1"
> +  [(set (attr "type")
> +      (if_then_else
> +	(match_test "update_indexed_address_mem (operands[1], VOIDmode)")
> +	(const_string "fpload_ux")
> +	(const_string "fpload")))
> +   (set_attr "length" "4")])  
> +
>  ;; Extract a SF element from V4SF
>  (define_insn_and_split "vsx_extract_v4sf"
>    [(set (match_operand:SF 0 "vsx_register_operand" "=f,f")
> @@ -1045,7 +1575,7 @@
>    rtx op2 = operands[2];
>    rtx op3 = operands[3];
>    rtx tmp;
> -  HOST_WIDE_INT ele = INTVAL (op2);
> +  HOST_WIDE_INT ele = BYTES_BIG_ENDIAN ? INTVAL (op2) : 3 - INTVAL (op2);
>  
>    if (ele == 0)
>      tmp = op1;
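One observation that may help when reading the new vsx.md patterns: the
vec_select PARALLELs in the LE permuted load/store, xxpermdi, lxvd2x and
stxvd2x patterns all encode the same permutation, namely exchange the two
64-bit halves of the vector while keeping element order within each half:
(1 0), (2 3 0 1), (4 5 6 7 0 1 2 3), and (8 ... 15 0 ... 7).  A tiny sketch
of that index mapping (illustration only, not code from the patch), assuming
the element count evenly divides the 16-byte vector:

  /* Element i of the result is taken from this element of the source.  */
  static int
  le_halfswap_source_elt (int i, int n_elts)
  {
    return (i + n_elts / 2) % n_elts;   /* n_elts == 4: 0,1,2,3 -> 2,3,0,1 */
  }

Because applying the swap twice is the identity, the load split can permute
the loaded image and permute again into the destination, and the post-reload
store split can restore the still-live source register with one more
application of the same permute.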
> Index: gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/fusion.c
> ===================================================================
> --- gcc-4_8-test.orig/gcc/testsuite/gcc.target/powerpc/fusion.c
> +++ gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/fusion.c
> @@ -1,5 +1,6 @@
>  /* { dg-do compile { target { powerpc*-*-* } } } */
>  /* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
> +/* { dg-skip-if "" { powerpc*le-*-* } { "*" } { "" } } */
>  /* { dg-require-effective-target powerpc_p8vector_ok } */
>  /* { dg-options "-mcpu=power7 -mtune=power8 -O3" } */
>  
> Index: gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/pr43154.c
> ===================================================================
> --- gcc-4_8-test.orig/gcc/testsuite/gcc.target/powerpc/pr43154.c
> +++ gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/pr43154.c
> @@ -1,5 +1,6 @@
>  /* { dg-do compile { target { powerpc*-*-* } } } */
>  /* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
> +/* { dg-skip-if "" { powerpc*le-*-* } { "*" } { "" } } */
>  /* { dg-require-effective-target powerpc_vsx_ok } */
>  /* { dg-options "-O2 -mcpu=power7" } */
>  
> Index: gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/altivec-perm-1.c
> ===================================================================
> --- gcc-4_8-test.orig/gcc/testsuite/gcc.target/powerpc/altivec-perm-1.c
> +++ gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/altivec-perm-1.c
> @@ -19,19 +19,6 @@ V b4(V x)
>    return __builtin_shuffle(x, (V){ 4,5,6,7, 4,5,6,7, 4,5,6,7, 4,5,6,7, });
>  }
>  
> -V p2(V x, V y)
> -{
> -  return __builtin_shuffle(x, y,
> -	(V){ 1,  3,  5,  7,  9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31 });
> -
> -}
> -
> -V p4(V x, V y)
> -{
> -  return __builtin_shuffle(x, y,
> -	(V){ 2,  3,  6,  7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31 });
> -}
> -
>  V h1(V x, V y)
>  {
>    return __builtin_shuffle(x, y,
> @@ -72,5 +59,3 @@ V l4(V x, V y)
>  /* { dg-final { scan-assembler "vspltb" } } */
>  /* { dg-final { scan-assembler "vsplth" } } */
>  /* { dg-final { scan-assembler "vspltw" } } */
> -/* { dg-final { scan-assembler "vpkuhum" } } */
> -/* { dg-final { scan-assembler "vpkuwum" } } */
> Index: gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/altivec-perm-3.c
> ===================================================================
> --- /dev/null
> +++ gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/altivec-perm-3.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_altivec_ok } */
> +/* { dg-skip-if "" { powerpc*le-*-* } { "*" } { "" } } */
> +/* { dg-options "-O -maltivec -mno-vsx" } */
> +
> +typedef unsigned char V __attribute__((vector_size(16)));
> +
> +V p2(V x, V y)
> +{
> +  return __builtin_shuffle(x, y,
> +	(V){ 1,  3,  5,  7,  9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31 });
> +
> +}
> +
> +V p4(V x, V y)
> +{
> +  return __builtin_shuffle(x, y,
> +	(V){ 2,  3,  6,  7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31 });
> +}
> +
> +/* { dg-final { scan-assembler-not "vperm" } } */
> +/* { dg-final { scan-assembler "vpkuhum" } } */
> +/* { dg-final { scan-assembler "vpkuwum" } } */
> Index: gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/eg-5.c
> ===================================================================
> --- gcc-4_8-test.orig/gcc/testsuite/gcc.dg/vmx/eg-5.c
> +++ gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/eg-5.c
> @@ -7,10 +7,17 @@ matvecmul4 (vector float c0, vector floa
>    /* Set result to a vector of f32 0's */
>    vector float result = ((vector float){0.,0.,0.,0.});
>  
> +#ifdef __LITTLE_ENDIAN__
> +  result  = vec_madd (c0, vec_splat (v, 3), result);
> +  result  = vec_madd (c1, vec_splat (v, 2), result);
> +  result  = vec_madd (c2, vec_splat (v, 1), result);
> +  result  = vec_madd (c3, vec_splat (v, 0), result);
> +#else
>    result  = vec_madd (c0, vec_splat (v, 0), result);
>    result  = vec_madd (c1, vec_splat (v, 1), result);
>    result  = vec_madd (c2, vec_splat (v, 2), result);
>    result  = vec_madd (c3, vec_splat (v, 3), result);
> +#endif
>  
>    return result;
>  }
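The eg-5.c change reflects the reversed element numbering of the little
endian vector layout: the lane a big-endian program calls element i is the
lane a little-endian program calls element n_elts - 1 - i.  A minimal sketch
of the index translation the test applies to its vec_splat arguments (not
part of the patch; n_elts is 4 for vector float here):

  /* BE element be_i and LE element (n_elts - 1 - be_i) name the same lane.  */
  static int
  le_elt_index (int be_i, int n_elts)
  {
    return n_elts - 1 - be_i;   /* V4SF: 0 <-> 3, 1 <-> 2 */
  }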
> Index: gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/gcc-bug-i.c
> ===================================================================
> --- gcc-4_8-test.orig/gcc/testsuite/gcc.dg/vmx/gcc-bug-i.c
> +++ gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/gcc-bug-i.c
> @@ -13,12 +13,27 @@
>  #define DO_INLINE __attribute__ ((always_inline))
>  #define DONT_INLINE __attribute__ ((noinline))
>  
> +#ifdef __LITTLE_ENDIAN__
> +static inline DO_INLINE int inline_me(vector signed short data)
> +{
> +  union {vector signed short v; signed short s[8];} u;
> +  signed short x;
> +  unsigned char x1, x2;
> +
> +  u.v = data;
> +  x = u.s[7];
> +  x1 = (x >> 8) & 0xff;
> +  x2 = x & 0xff;
> +  return ((x2 << 8) | x1);
> +}
> +#else
>  static inline DO_INLINE int inline_me(vector signed short data) 
>  {
>    union {vector signed short v; signed short s[8];} u;
>    u.v = data;
>    return u.s[7];
>  }
> +#endif
>  
>  static DONT_INLINE int foo(vector signed short data)
>  {
> Index: gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/vec-set.c
> ===================================================================
> --- /dev/null
> +++ gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/vec-set.c
> @@ -0,0 +1,14 @@
> +#include "harness.h"
> +
> +vector short
> +vec_set (short m)
> +{
> +  return (vector short){m, 0, 0, 0, 0, 0, 0, 0};
> +}
> +
> +static void test()
> +{
> +  check (vec_all_eq (vec_set (7),
> +		     ((vector short){7, 0, 0, 0, 0, 0, 0, 0})),
> +	 "vec_set");
> +}
> Index: gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/3b-15.c
> ===================================================================
> --- gcc-4_8-test.orig/gcc/testsuite/gcc.dg/vmx/3b-15.c
> +++ gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/3b-15.c
> @@ -3,7 +3,11 @@
>  vector unsigned char
>  f (vector unsigned char a, vector unsigned char b, vector unsigned char c)
>  {
> +#ifdef __BIG_ENDIAN__
>    return vec_perm(a,b,c); 
> +#else
> +  return vec_perm(b,a,c);
> +#endif
>  }
>  
>  static void test()
> @@ -12,8 +16,13 @@ static void test()
>  					    8,9,10,11,12,13,14,15}),
>  		     ((vector unsigned char){70,71,72,73,74,75,76,77,
>  					    78,79,80,81,82,83,84,85}),
> +#ifdef __BIG_ENDIAN__
>  		     ((vector unsigned char){0x1,0x14,0x18,0x10,0x16,0x15,0x19,0x1a,
>  					    0x1c,0x1c,0x1c,0x12,0x8,0x1d,0x1b,0xe})),
> +#else
> +                     ((vector unsigned char){0x1e,0xb,0x7,0xf,0x9,0xa,0x6,0x5,
> +                                            0x3,0x3,0x3,0xd,0x17,0x2,0x4,0x11})),
> +#endif
>  		   ((vector unsigned char){1,74,78,70,76,75,79,80,82,82,82,72,8,83,81,14})),
>  	"f");
>  }
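The 3b-15.c adjustment shows the identity the little endian vperm handling
relies on: swap the two input vectors and replace every selector byte s with
31 - s, and the program-visible result of vec_perm is unchanged.  The LE
selector constants added to the test are the BE constants run through exactly
that transformation.  A sketch under that assumption (selector bytes in the
0..31 range, as in the testcase; not code from the patch):

  /* With the inputs swapped, BE selector byte s corresponds to 31 - s.  */
  static unsigned char
  le_perm_selector_byte (unsigned char s_be)
  {
    return 31 - s_be;
  }

For instance, the first BE selector byte 0x01 becomes 0x1e and 0x14 becomes
0x0b, matching the constants in the hunk above.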
> Index: gcc-4_8-test/libcpp/lex.c
> ===================================================================
> --- gcc-4_8-test.orig/libcpp/lex.c
> +++ gcc-4_8-test/libcpp/lex.c
> @@ -559,8 +559,13 @@ search_line_fast (const uchar *s, const
>       beginning with all ones and shifting in zeros according to the
>       mis-alignment.  The LVSR instruction pulls the exact shift we
>       want from the address.  */
> +#ifdef __BIG_ENDIAN__
>    mask = __builtin_vec_lvsr(0, s);
>    mask = __builtin_vec_perm(zero, ones, mask);
> +#else
> +  mask = __builtin_vec_lvsl(0, s);
> +  mask = __builtin_vec_perm(ones, zero, mask);
> +#endif
>    data &= mask;
>  
>    /* While altivec loads mask addresses, we still need to align S so
> @@ -624,7 +629,11 @@ search_line_fast (const uchar *s, const
>      /* L now contains 0xff in bytes for which we matched one of the
>         relevant characters.  We can find the byte index by finding
>         its bit index and dividing by 8.  */
> +#ifdef __BIG_ENDIAN__
>      l = __builtin_clzl(l) >> 3;
> +#else
> +    l = __builtin_ctzl(l) >> 3;
> +#endif
>      return s + l;
>  
>  #undef N
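The clzl/ctzl swap in search_line_fast follows from where the match bits end
up: on big endian the lowest-addressed byte contributes the most significant
bits of l, on little endian the least significant bits, so the first matching
byte has to be located from opposite ends of the word.  A scalar sketch
(assuming a 64-bit unsigned long as in the surrounding code; illustration
only):

  /* Byte index of the first match, one all-ones byte per matching lane.  */
  static unsigned
  first_match_byte (unsigned long l, int big_endian)
  {
    return (big_endian ? __builtin_clzl (l) : __builtin_ctzl (l)) >> 3;
  }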
> Index: gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/pr48258-1.c
> ===================================================================
> --- gcc-4_8-test.orig/gcc/testsuite/gcc.target/powerpc/pr48258-1.c
> +++ gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/pr48258-1.c
> @@ -1,5 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
> +/* { dg-skip-if "" { powerpc*le-*-* } { "*" } { "" } } */
>  /* { dg-require-effective-target powerpc_vsx_ok } */
>  /* { dg-options "-O3 -mcpu=power7 -mabi=altivec -ffast-math -fno-unroll-loops" } */
>  /* { dg-final { scan-assembler-times "xvaddsp" 3 } } */
> Index: gcc-4_8-test/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-34.c
> ===================================================================
> --- gcc-4_8-test.orig/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-34.c
> +++ gcc-4_8-test/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-34.c
> @@ -1,4 +1,5 @@
>  /* { dg-require-effective-target vect_int } */
> +/* { dg-skip-if "cost too high" { powerpc*le-*-* } { "*" } { "" } } */
>  
>  #include <stdarg.h>
>  #include "../../tree-vect.h"
> 
David Edelsohn April 3, 2014, 2:33 p.m. UTC | #2
On Wed, Mar 19, 2014 at 3:30 PM, Bill Schmidt
<wschmidt@linux.vnet.ibm.com> wrote:
> Hi,
>
> This patch (diff-le-vector) backports the changes to support vector
> infrastructure on powerpc64le.  Copying Richard and Jakub for the libcpp
> bits.
>
> Thanks,
> Bill
>
>
> [gcc]
>
> 2014-03-29  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         Backport from mainline r205333
>         2013-11-24  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * config/rs6000/rs6000.c (rs6000_expand_vec_perm_const_1): Correct
>         for little endian.
>
>         Backport from mainline r205241
>         2013-11-21  Bill Schmidt  <wschmidt@vnet.ibm.com>
>
>         * config/rs6000/vector.md (vec_pack_trunc_v2df): Revert previous
>         little endian change.
>         (vec_pack_sfix_trunc_v2df): Likewise.
>         (vec_pack_ufix_trunc_v2df): Likewise.
>         * config/rs6000/rs6000.c (rs6000_expand_interleave): Correct
>         double checking of endianness.
>
>         Backport from mainline r205146
>         2013-11-20  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * config/rs6000/vsx.md (vsx_set_<mode>): Adjust for little endian.
>         (vsx_extract_<mode>): Likewise.
>         (*vsx_extract_<mode>_one_le): New LE variant on
>         *vsx_extract_<mode>_zero.
>         (vsx_extract_v4sf): Adjust for little endian.
>
>         Backport from mainline r205080
>         2013-11-19  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Adjust
>         V16QI vector splat case for little endian.
>
>         Backport from mainline r205045:
>
>         2013-11-19  Ulrich Weigand  <Ulrich.Weigand@de.ibm.com>
>
>         * config/rs6000/vector.md ("mov<mode>"): Do not call
>         rs6000_emit_le_vsx_move to move into or out of GPRs.
>         * config/rs6000/rs6000.c (rs6000_emit_le_vsx_move): Assert
>         source and destination are not GPR hard regs.
>
>         Backport from mainline r204920
>         2013-11-17  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * config/rs6000/rs6000.c (rs6000_frame_related): Add split_reg
>         parameter and use it in REG_FRAME_RELATED_EXPR note.
>         (emit_frame_save): Call rs6000_frame_related with extra NULL_RTX
>         parameter.
>         (rs6000_emit_prologue): Likewise, but for little endian VSX
>         stores, pass the source register of the store instead.
>
>         Backport from mainline r204862
>         2013-11-15  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * config/rs6000/altivec.md (UNSPEC_VPERM_X, UNSPEC_VPERM_UNS_X):
>         Remove.
>         (altivec_vperm_<mode>): Revert earlier little endian change.
>         (*altivec_vperm_<mode>_internal): Remove.
>         (altivec_vperm_<mode>_uns): Revert earlier little endian change.
>         (*altivec_vperm_<mode>_uns_internal): Remove.
>         * config/rs6000/vector.md (vec_realign_load_<mode>): Revise
>         commentary.
>
>         Backport from mainline r204441
>         2013-11-05  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * config/rs6000/rs6000.c (rs6000_option_override_internal):
>         Remove restriction against use of VSX instructions when generating
>         code for little endian mode.
>
>         Backport from mainline r204440
>         2013-11-05  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * config/rs6000/altivec.md (mulv4si3): Ensure we generate vmulouh
>         for both big and little endian.
>         (mulv8hi3): Swap input operands for merge high and merge low
>         instructions for little endian.
>
>         Backport from mainline r204439
>         2013-11-05  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * config/rs6000/altivec.md (vec_widen_umult_even_v16qi): Change
>         define_insn to define_expand that uses even patterns for big
>         endian and odd patterns for little endian.
>         (vec_widen_smult_even_v16qi): Likewise.
>         (vec_widen_umult_even_v8hi): Likewise.
>         (vec_widen_smult_even_v8hi): Likewise.
>         (vec_widen_umult_odd_v16qi): Likewise.
>         (vec_widen_smult_odd_v16qi): Likewise.
>         (vec_widen_umult_odd_v8hi): Likewise.
>         (vec_widen_smult_odd_v8hi): Likewise.
>         (altivec_vmuleub): New define_insn.
>         (altivec_vmuloub): Likewise.
>         (altivec_vmulesb): Likewise.
>         (altivec_vmulosb): Likewise.
>         (altivec_vmuleuh): Likewise.
>         (altivec_vmulouh): Likewise.
>         (altivec_vmulesh): Likewise.
>         (altivec_vmulosh): Likewise.
>
>         Backport from mainline r204395
>         2013-11-05  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * config/rs6000/vector.md (vec_pack_sfix_trunc_v2df): Adjust for
>         little endian.
>         (vec_pack_ufix_trunc_v2df): Likewise.
>
>         Backport from mainline r204363
>         2013-11-04  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * config/rs6000/altivec.md (vec_widen_umult_hi_v16qi): Swap
>         arguments to merge instruction for little endian.
>         (vec_widen_umult_lo_v16qi): Likewise.
>         (vec_widen_smult_hi_v16qi): Likewise.
>         (vec_widen_smult_lo_v16qi): Likewise.
>         (vec_widen_umult_hi_v8hi): Likewise.
>         (vec_widen_umult_lo_v8hi): Likewise.
>         (vec_widen_smult_hi_v8hi): Likewise.
>         (vec_widen_smult_lo_v8hi): Likewise.
>
>         Backport from mainline r204350
>         2013-11-04  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * config/rs6000/vsx.md (*vsx_le_perm_store_<mode> for VSX_D):
>         Replace the define_insn_and_split with a define_insn and two
>         define_splits, with the split after reload re-permuting the source
>         register to its original value.
>         (*vsx_le_perm_store_<mode> for VSX_W): Likewise.
>         (*vsx_le_perm_store_v8hi): Likewise.
>         (*vsx_le_perm_store_v16qi): Likewise.
>
>         Backport from mainline r204321
>         2013-11-04  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * config/rs6000/vector.md (vec_pack_trunc_v2df):  Adjust for
>         little endian.
>
>         Backport from mainline r204321
>         2013-11-02  Bill Schmidt  <wschmidt@vnet.linux.ibm.com>
>
>         * config/rs6000/rs6000.c (rs6000_expand_vector_set): Adjust for
>         little endian.
>
>         Backport from mainline r203980
>         2013-10-23  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * config/rs6000/altivec.md (mulv8hi3): Adjust for little endian.
>
>         Backport from mainline r203930
>         2013-10-22  Bill Schmidt  <wschmidt@vnet.ibm.com>
>
>         * config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Reverse
>         meaning of merge-high and merge-low masks for little endian; avoid
>         use of vector-pack masks for little endian for mismatched modes.
>
>         Backport from mainline r203877
>         2013-10-20  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * config/rs6000/altivec.md (vec_unpacku_hi_v16qi): Adjust for
>         little endian.
>         (vec_unpacku_hi_v8hi): Likewise.
>         (vec_unpacku_lo_v16qi): Likewise.
>         (vec_unpacku_lo_v8hi): Likewise.
>
>         Backport from mainline r203863
>         2013-10-19  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * config/rs6000/rs6000.c (vspltis_constant): Make sure we check
>         all elements for both endian flavors.
>
>         Backport from mainline r203714
>         2013-10-16  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * gcc/config/rs6000/vector.md (vec_unpacks_hi_v4sf): Correct for
>         endianness.
>         (vec_unpacks_lo_v4sf): Likewise.
>         (vec_unpacks_float_hi_v4si): Likewise.
>         (vec_unpacks_float_lo_v4si): Likewise.
>         (vec_unpacku_float_hi_v4si): Likewise.
>         (vec_unpacku_float_lo_v4si): Likewise.
>
>         Backport from mainline r203713
>         2013-10-16  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * config/rs6000/vsx.md (vsx_concat_<mode>): Adjust output for LE.
>         (vsx_concat_v2sf): Likewise.
>
>         Backport from mainline r203458
>         2013-10-11  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * config/rs6000/vsx.md (*vsx_le_perm_load_v2di): Generalize to
>         handle vector float as well.
>         (*vsx_le_perm_load_v4si): Likewise.
>         (*vsx_le_perm_store_v2di): Likewise.
>         (*vsx_le_perm_store_v4si): Likewise.
>
>         Backport from mainline r203457
>         2013-10-11  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * config/rs6000/vector.md (vec_realign_load<mode>): Generate vperm
>         directly to circumvent subtract from splat{31} workaround.
>         * config/rs6000/rs6000-protos.h (altivec_expand_vec_perm_le): New
>         prototype.
>         * config/rs6000/rs6000.c (altivec_expand_vec_perm_le): New.
>         * config/rs6000/altivec.md (define_c_enum "unspec"): Add
>         UNSPEC_VPERM_X and UNSPEC_VPERM_UNS_X.
>         (altivec_vperm_<mode>): Convert to define_insn_and_split to
>         separate big and little endian logic.
>         (*altivec_vperm_<mode>_internal): New define_insn.
>         (altivec_vperm_<mode>_uns): Convert to define_insn_and_split to
>         separate big and little endian logic.
>         (*altivec_vperm_<mode>_uns_internal): New define_insn.
>         (vec_permv16qi): Add little endian logic.
>
>         Backport from mainline r203247
>         2013-10-07  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * config/rs6000/rs6000.c (altivec_expand_vec_perm_const_le): New.
>         (altivec_expand_vec_perm_const): Call it.
>
>         Backport from mainline r203246
>         2013-10-07  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * config/rs6000/vector.md (mov<mode>): Emit permuted move
>         sequences for LE VSX loads and stores at expand time.
>         * config/rs6000/rs6000-protos.h (rs6000_emit_le_vsx_move): New
>         prototype.
>         * config/rs6000/rs6000.c (rs6000_const_vec): New.
>         (rs6000_gen_le_vsx_permute): New.
>         (rs6000_gen_le_vsx_load): New.
>         (rs6000_gen_le_vsx_store): New.
>         (rs6000_gen_le_vsx_move): New.
>         * config/rs6000/vsx.md (*vsx_le_perm_load_v2di): New.
>         (*vsx_le_perm_load_v4si): New.
>         (*vsx_le_perm_load_v8hi): New.
>         (*vsx_le_perm_load_v16qi): New.
>         (*vsx_le_perm_store_v2di): New.
>         (*vsx_le_perm_store_v4si): New.
>         (*vsx_le_perm_store_v8hi): New.
>         (*vsx_le_perm_store_v16qi): New.
>         (*vsx_xxpermdi2_le_<mode>): New.
>         (*vsx_xxpermdi4_le_<mode>): New.
>         (*vsx_xxpermdi8_le_V8HI): New.
>         (*vsx_xxpermdi16_le_V16QI): New.
>         (*vsx_lxvd2x2_le_<mode>): New.
>         (*vsx_lxvd2x4_le_<mode>): New.
>         (*vsx_lxvd2x8_le_V8HI): New.
>         (*vsx_lxvd2x16_le_V16QI): New.
>         (*vsx_stxvd2x2_le_<mode>): New.
>         (*vsx_stxvd2x4_le_<mode>): New.
>         (*vsx_stxvd2x8_le_V8HI): New.
>         (*vsx_stxvd2x16_le_V16QI): New.
>
>         Backport from mainline r201235
>         2013-07-24  Bill Schmidt  <wschmidt@linux.ibm.com>
>                     Anton Blanchard <anton@au1.ibm.com>
>
>         * config/rs6000/altivec.md (altivec_vpkpx): Handle little endian.
>         (altivec_vpks<VI_char>ss): Likewise.
>         (altivec_vpks<VI_char>us): Likewise.
>         (altivec_vpku<VI_char>us): Likewise.
>         (altivec_vpku<VI_char>um): Likewise.
>
>         Backport from mainline r201208
>         2013-07-24  Bill Schmidt  <wschmidt@vnet.linux.ibm.com>
>                     Anton Blanchard <anton@au1.ibm.com>
>
>         * config/rs6000/vector.md (vec_realign_load_<mode>): Reorder input
>         operands to vperm for little endian.
>         * config/rs6000/rs6000.c (rs6000_expand_builtin): Use lvsr instead
>         of lvsl to create the control mask for a vperm for little endian.
>
>         Backport from mainline r201195
>         2013-07-23  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>                     Anton Blanchard <anton@au1.ibm.com>
>
>         * config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Reverse
>         two operands for little-endian.
>
>         Backport from mainline r201193
>         2013-07-23  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>                     Anton Blanchard <anton@au1.ibm.com>
>
>         * config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Correct
>         selection of field for vector splat in little endian mode.
>
>         Backport from mainline r201149
>         2013-07-22  Bill Schmidt  <wschmidt@vnet.linux.ibm.com>
>                     Anton Blanchard <anton@au1.ibm.com>
>
>         * config/rs6000/rs6000.c (rs6000_expand_vector_init): Fix
>         endianness when selecting field to splat.
>
> [gcc/testsuite]
>
> 2014-03-29  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         Backport from mainline r205638
>         2013-12-03  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * gcc.dg/vect/costmodel/ppc/costmodel-slp-34.c: Skip for little
>         endian.
>
>         Backport from mainline r205146
>         2013-11-20  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * gcc.target/powerpc/pr48258-1.c: Skip for little endian.
>
>         Backport from mainline r204862
>         2013-11-15  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * gcc.dg/vmx/3b-15.c: Revise for little endian.
>
>         Backport from mainline r204321
>         2013-11-02  Bill Schmidt  <wschmidt@vnet.linux.ibm.com>
>
>         * gcc.dg/vmx/vec-set.c: New.
>
>         Backport from mainline r204138
>         2013-10-28  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * gcc.dg/vmx/gcc-bug-i.c: Add little endian variant.
>         * gcc.dg/vmx/eg-5.c: Likewise.
>
>         Backport from mainline r203930
>         2013-10-22  Bill Schmidt  <wschmidt@vnet.ibm.com>
>
>         * gcc.target/powerpc/altivec-perm-1.c: Move the two vector pack
>         tests into...
>         * gcc.target/powerpc/altivec-perm-3.c: ...this new test, which is
>         restricted to big-endian targets.
>
>         Backport from mainline r203246
>         2013-10-07  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * gcc.target/powerpc/pr43154.c: Skip for ppc64 little endian.
>         * gcc.target/powerpc/fusion.c: Likewise.
>
> [libcpp]
>
> 2014-03-29  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         Backport from mainline
>         2013-11-18  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * lex.c (search_line_fast): Correct for little endian.

PowerPC bits are okay.

Thanks, David

Patch
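
(An aside before the diff itself: the recurring little-endian trick in this
patch is to swap the two inputs of a vperm and to replace each element e of
the permute control vector with 31 - e.  Only the low five bits of each
element matter, which is why the non-constant case can splat -1 rather than
31.  The standalone C sketch below, with a hypothetical helper name and not
taken from the patch, reproduces that adjustment for the worked example in
the altivec_expand_vec_perm_const_le comment.)

#include <stdio.h>

/* Rewrite a big-endian vperm control vector for little-endian use.
   Hypothetical helper for illustration; the patch performs the same
   adjustment on RTL constants.  */
static void
adjust_vperm_selector_for_le (unsigned char sel[16])
{
  int i;
  for (i = 0; i < 16; i++)
    sel[i] = 31 - (sel[i] & 31);
}

int
main (void)
{
  /* BE control vector that extracts the even words of vr10:vr11.  */
  unsigned char sel[16] = { 0, 1, 2, 3, 8, 9, 10, 11,
                            16, 17, 18, 19, 24, 25, 26, 27 };
  int i;

  adjust_vperm_selector_for_le (sel);
  for (i = 0; i < 16; i++)
    printf ("%d ", sel[i]);
  printf ("\n");
  /* Prints 31 30 29 28 23 22 21 20 15 14 13 12 7 6 5 4; the two vperm
     inputs must also be swapped, i.e. vperm 9,11,10,12.  */
  return 0;
}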

Index: gcc-4_8-test/gcc/config/rs6000/rs6000.c
===================================================================
--- gcc-4_8-test.orig/gcc/config/rs6000/rs6000.c
+++ gcc-4_8-test/gcc/config/rs6000/rs6000.c
@@ -3216,11 +3216,6 @@  rs6000_option_override_internal (bool gl
 	}
       else if (TARGET_PAIRED_FLOAT)
 	msg = N_("-mvsx and -mpaired are incompatible");
-      /* The hardware will allow VSX and little endian, but until we make sure
-	 things like vector select, etc. work don't allow VSX on little endian
-	 systems at this point.  */
-      else if (!BYTES_BIG_ENDIAN)
-	msg = N_("-mvsx used with little endian code");
       else if (TARGET_AVOID_XFORM > 0)
 	msg = N_("-mvsx needs indexed addressing");
       else if (!TARGET_ALTIVEC && (rs6000_isa_flags_explicit
@@ -4991,15 +4986,16 @@  vspltis_constant (rtx op, unsigned step,
 
   /* Check if VAL is present in every STEP-th element, and the
      other elements are filled with its most significant bit.  */
-  for (i = 0; i < nunits - 1; ++i)
+  for (i = 1; i < nunits; ++i)
     {
       HOST_WIDE_INT desired_val;
-      if (((BYTES_BIG_ENDIAN ? i + 1 : i) & (step - 1)) == 0)
+      unsigned elt = BYTES_BIG_ENDIAN ? nunits - 1 - i : i;
+      if ((i & (step - 1)) == 0)
 	desired_val = val;
       else
 	desired_val = msb_val;
 
-      if (desired_val != const_vector_elt_as_int (op, i))
+      if (desired_val != const_vector_elt_as_int (op, elt))
 	return false;
     }
 
@@ -5446,6 +5442,7 @@  rs6000_expand_vector_init (rtx target, r
      of 64-bit items is not supported on Altivec.  */
   if (all_same && GET_MODE_SIZE (inner_mode) <= 4)
     {
+      rtx field;
       mem = assign_stack_temp (mode, GET_MODE_SIZE (inner_mode));
       emit_move_insn (adjust_address_nv (mem, inner_mode, 0),
 		      XVECEXP (vals, 0, 0));
@@ -5456,9 +5453,11 @@  rs6000_expand_vector_init (rtx target, r
 					      gen_rtx_SET (VOIDmode,
 							   target, mem),
 					      x)));
+      field = (BYTES_BIG_ENDIAN ? const0_rtx
+	       : GEN_INT (GET_MODE_NUNITS (mode) - 1));
       x = gen_rtx_VEC_SELECT (inner_mode, target,
 			      gen_rtx_PARALLEL (VOIDmode,
-						gen_rtvec (1, const0_rtx)));
+						gen_rtvec (1, field)));
       emit_insn (gen_rtx_SET (VOIDmode, target,
 			      gen_rtx_VEC_DUPLICATE (mode, x)));
       return;
@@ -5531,10 +5530,27 @@  rs6000_expand_vector_set (rtx target, rt
     XVECEXP (mask, 0, elt*width + i)
       = GEN_INT (i + 0x10);
   x = gen_rtx_CONST_VECTOR (V16QImode, XVEC (mask, 0));
-  x = gen_rtx_UNSPEC (mode,
-		      gen_rtvec (3, target, reg,
-				 force_reg (V16QImode, x)),
-		      UNSPEC_VPERM);
+
+  if (BYTES_BIG_ENDIAN)
+    x = gen_rtx_UNSPEC (mode,
+			gen_rtvec (3, target, reg,
+				   force_reg (V16QImode, x)),
+			UNSPEC_VPERM);
+  else 
+    {
+      /* Invert selector.  */
+      rtx splat = gen_rtx_VEC_DUPLICATE (V16QImode,
+					 gen_rtx_CONST_INT (QImode, -1));
+      rtx tmp = gen_reg_rtx (V16QImode);
+      emit_move_insn (tmp, splat);
+      x = gen_rtx_MINUS (V16QImode, tmp, force_reg (V16QImode, x));
+      emit_move_insn (tmp, x);
+
+      /* Permute with operands reversed and adjusted selector.  */
+      x = gen_rtx_UNSPEC (mode, gen_rtvec (3, reg, target, tmp),
+			  UNSPEC_VPERM);
+    }
+
   emit_insn (gen_rtx_SET (VOIDmode, target, x));
 }
 
@@ -7830,6 +7846,107 @@  rs6000_eliminate_indexed_memrefs (rtx op
 			       copy_addr_to_reg (XEXP (operands[1], 0)));
 }
 
+/* Generate a vector of constants to permute MODE for a little-endian
+   storage operation by swapping the two halves of a vector.  */
+static rtvec
+rs6000_const_vec (enum machine_mode mode)
+{
+  int i, subparts;
+  rtvec v;
+
+  switch (mode)
+    {
+    case V2DFmode:
+    case V2DImode:
+      subparts = 2;
+      break;
+    case V4SFmode:
+    case V4SImode:
+      subparts = 4;
+      break;
+    case V8HImode:
+      subparts = 8;
+      break;
+    case V16QImode:
+      subparts = 16;
+      break;
+    default:
+      gcc_unreachable();
+    }
+
+  v = rtvec_alloc (subparts);
+
+  for (i = 0; i < subparts / 2; ++i)
+    RTVEC_ELT (v, i) = gen_rtx_CONST_INT (DImode, i + subparts / 2);
+  for (i = subparts / 2; i < subparts; ++i)
+    RTVEC_ELT (v, i) = gen_rtx_CONST_INT (DImode, i - subparts / 2);
+
+  return v;
+}
+
+/* Generate a permute rtx that represents an lxvd2x, stxvd2x, or xxpermdi
+   for a VSX load or store operation.  */
+rtx
+rs6000_gen_le_vsx_permute (rtx source, enum machine_mode mode)
+{
+  rtx par = gen_rtx_PARALLEL (VOIDmode, rs6000_const_vec (mode));
+  return gen_rtx_VEC_SELECT (mode, source, par);
+}
+
+/* Emit a little-endian load from vector memory location SOURCE to VSX
+   register DEST in mode MODE.  The load is done with two permuting
+   insn's that represent an lxvd2x and xxpermdi.  */
+void
+rs6000_emit_le_vsx_load (rtx dest, rtx source, enum machine_mode mode)
+{
+  rtx tmp = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (dest) : dest;
+  rtx permute_mem = rs6000_gen_le_vsx_permute (source, mode);
+  rtx permute_reg = rs6000_gen_le_vsx_permute (tmp, mode);
+  emit_insn (gen_rtx_SET (VOIDmode, tmp, permute_mem));
+  emit_insn (gen_rtx_SET (VOIDmode, dest, permute_reg));
+}
+
+/* Emit a little-endian store to vector memory location DEST from VSX
+   register SOURCE in mode MODE.  The store is done with two permuting
+   insn's that represent an xxpermdi and an stxvd2x.  */
+void
+rs6000_emit_le_vsx_store (rtx dest, rtx source, enum machine_mode mode)
+{
+  rtx tmp = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (source) : source;
+  rtx permute_src = rs6000_gen_le_vsx_permute (source, mode);
+  rtx permute_tmp = rs6000_gen_le_vsx_permute (tmp, mode);
+  emit_insn (gen_rtx_SET (VOIDmode, tmp, permute_src));
+  emit_insn (gen_rtx_SET (VOIDmode, dest, permute_tmp));
+}
+
+/* Emit a sequence representing a little-endian VSX load or store,
+   moving data from SOURCE to DEST in mode MODE.  This is done
+   separately from rs6000_emit_move to ensure it is called only
+   during expand.  LE VSX loads and stores introduced later are
+   handled with a split.  The expand-time RTL generation allows
+   us to optimize away redundant pairs of register-permutes.  */
+void
+rs6000_emit_le_vsx_move (rtx dest, rtx source, enum machine_mode mode)
+{
+  gcc_assert (!BYTES_BIG_ENDIAN
+	      && VECTOR_MEM_VSX_P (mode)
+	      && mode != TImode
+	      && !gpr_or_gpr_p (dest, source)
+	      && (MEM_P (source) ^ MEM_P (dest)));
+
+  if (MEM_P (source))
+    {
+      gcc_assert (REG_P (dest));
+      rs6000_emit_le_vsx_load (dest, source, mode);
+    }
+  else
+    {
+      if (!REG_P (source))
+	source = force_reg (mode, source);
+      rs6000_emit_le_vsx_store (dest, source, mode);
+    }
+}
+
 /* Emit a move from SOURCE to DEST in mode MODE.  */
 void
 rs6000_emit_move (rtx dest, rtx source, enum machine_mode mode)
@@ -12589,7 +12706,8 @@  rs6000_expand_builtin (tree exp, rtx tar
     case ALTIVEC_BUILTIN_MASK_FOR_LOAD:
     case ALTIVEC_BUILTIN_MASK_FOR_STORE:
       {
-	int icode = (int) CODE_FOR_altivec_lvsr;
+	int icode = (BYTES_BIG_ENDIAN ? (int) CODE_FOR_altivec_lvsr
+		     : (int) CODE_FOR_altivec_lvsl);
 	enum machine_mode tmode = insn_data[icode].operand[0].mode;
 	enum machine_mode mode = insn_data[icode].operand[1].mode;
 	tree arg;
@@ -20880,7 +20998,7 @@  output_probe_stack_range (rtx reg1, rtx
 
 static rtx
 rs6000_frame_related (rtx insn, rtx reg, HOST_WIDE_INT val,
-		      rtx reg2, rtx rreg)
+		      rtx reg2, rtx rreg, rtx split_reg)
 {
   rtx real, temp;
 
@@ -20971,6 +21089,11 @@  rs6000_frame_related (rtx insn, rtx reg,
 	  }
     }
 
+  /* If a store insn has been split into multiple insns, the
+     true source register is given by split_reg.  */
+  if (split_reg != NULL_RTX)
+    real = gen_rtx_SET (VOIDmode, SET_DEST (real), split_reg);
+
   RTX_FRAME_RELATED_P (insn) = 1;
   add_reg_note (insn, REG_FRAME_RELATED_EXPR, real);
 
@@ -21078,7 +21201,7 @@  emit_frame_save (rtx frame_reg, enum mac
   reg = gen_rtx_REG (mode, regno);
   insn = emit_insn (gen_frame_store (reg, frame_reg, offset));
   return rs6000_frame_related (insn, frame_reg, frame_reg_to_sp,
-			       NULL_RTX, NULL_RTX);
+			       NULL_RTX, NULL_RTX, NULL_RTX);
 }
 
 /* Emit an offset memory reference suitable for a frame store, while
@@ -21599,7 +21722,7 @@  rs6000_emit_prologue (void)
 
       insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, p));
       rs6000_frame_related (insn, frame_reg_rtx, sp_off - frame_off,
-			    treg, GEN_INT (-info->total_size));
+			    treg, GEN_INT (-info->total_size), NULL_RTX);
       sp_off = frame_off = info->total_size;
     }
 
@@ -21684,7 +21807,7 @@  rs6000_emit_prologue (void)
 
 	  insn = emit_move_insn (mem, reg);
 	  rs6000_frame_related (insn, frame_reg_rtx, sp_off - frame_off,
-				NULL_RTX, NULL_RTX);
+				NULL_RTX, NULL_RTX, NULL_RTX);
 	  END_USE (0);
 	}
     }
@@ -21752,7 +21875,7 @@  rs6000_emit_prologue (void)
 				     info->lr_save_offset,
 				     DFmode, sel);
       rs6000_frame_related (insn, ptr_reg, sp_off,
-			    NULL_RTX, NULL_RTX);
+			    NULL_RTX, NULL_RTX, NULL_RTX);
       if (lr)
 	END_USE (0);
     }
@@ -21831,7 +21954,7 @@  rs6000_emit_prologue (void)
 					 SAVRES_SAVE | SAVRES_GPR);
 
 	  rs6000_frame_related (insn, spe_save_area_ptr, sp_off - save_off,
-				NULL_RTX, NULL_RTX);
+				NULL_RTX, NULL_RTX, NULL_RTX);
 	}
 
       /* Move the static chain pointer back.  */
@@ -21881,7 +22004,7 @@  rs6000_emit_prologue (void)
 				     info->lr_save_offset + ptr_off,
 				     reg_mode, sel);
       rs6000_frame_related (insn, ptr_reg, sp_off - ptr_off,
-			    NULL_RTX, NULL_RTX);
+			    NULL_RTX, NULL_RTX, NULL_RTX);
       if (lr)
 	END_USE (0);
     }
@@ -21897,7 +22020,7 @@  rs6000_emit_prologue (void)
 			     info->gp_save_offset + frame_off + reg_size * i);
       insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, p));
       rs6000_frame_related (insn, frame_reg_rtx, sp_off - frame_off,
-			    NULL_RTX, NULL_RTX);
+			    NULL_RTX, NULL_RTX, NULL_RTX);
     }
   else if (!WORLD_SAVE_P (info))
     {
@@ -22124,7 +22247,7 @@  rs6000_emit_prologue (void)
 				     info->altivec_save_offset + ptr_off,
 				     0, V4SImode, SAVRES_SAVE | SAVRES_VR);
       rs6000_frame_related (insn, scratch_reg, sp_off - ptr_off,
-			    NULL_RTX, NULL_RTX);
+			    NULL_RTX, NULL_RTX, NULL_RTX);
       if (REGNO (frame_reg_rtx) == REGNO (scratch_reg))
 	{
 	  /* The oddity mentioned above clobbered our frame reg.  */
@@ -22140,7 +22263,7 @@  rs6000_emit_prologue (void)
       for (i = info->first_altivec_reg_save; i <= LAST_ALTIVEC_REGNO; ++i)
 	if (info->vrsave_mask & ALTIVEC_REG_BIT (i))
 	  {
-	    rtx areg, savereg, mem;
+	    rtx areg, savereg, mem, split_reg;
 	    int offset;
 
 	    offset = (info->altivec_save_offset + frame_off
@@ -22158,8 +22281,18 @@  rs6000_emit_prologue (void)
 
 	    insn = emit_move_insn (mem, savereg);
 
+	    /* When we split a VSX store into two insns, we need to make
+	       sure the DWARF info knows which register we are storing.
+	       Pass it in to be used on the appropriate note.  */
+	    if (!BYTES_BIG_ENDIAN
+		&& GET_CODE (PATTERN (insn)) == SET
+		&& GET_CODE (SET_SRC (PATTERN (insn))) == VEC_SELECT)
+	      split_reg = savereg;
+	    else
+	      split_reg = NULL_RTX;
+
 	    rs6000_frame_related (insn, frame_reg_rtx, sp_off - frame_off,
-				  areg, GEN_INT (offset));
+				  areg, GEN_INT (offset), split_reg);
 	  }
     }
 
@@ -28813,6 +28946,136 @@  rs6000_emit_parity (rtx dst, rtx src)
     }
 }
 
+/* Expand an Altivec constant permutation for little endian mode.
+   There are two issues: First, the two input operands must be
+   swapped so that together they form a double-wide array in LE
+   order.  Second, the vperm instruction has surprising behavior
+   in LE mode:  it interprets the elements of the source vectors
+   in BE mode ("left to right") and interprets the elements of
+   the destination vector in LE mode ("right to left").  To
+   correct for this, we must subtract each element of the permute
+   control vector from 31.
+
+   For example, suppose we want to concatenate vr10 = {0, 1, 2, 3}
+   with vr11 = {4, 5, 6, 7} and extract {0, 2, 4, 6} using a vperm.
+   We place {0,1,2,3,8,9,10,11,16,17,18,19,24,25,26,27} in vr12 to
+   serve as the permute control vector.  Then, in BE mode,
+
+     vperm 9,10,11,12
+
+   places the desired result in vr9.  However, in LE mode the 
+   vector contents will be
+
+     vr10 = 00000003 00000002 00000001 00000000
+     vr11 = 00000007 00000006 00000005 00000004
+
+   The result of the vperm using the same permute control vector is
+
+     vr9  = 05000000 07000000 01000000 03000000
+
+   That is, the leftmost 4 bytes of vr10 are interpreted as the
+   source for the rightmost 4 bytes of vr9, and so on.
+
+   If we change the permute control vector to
+
+     vr12 = {31,30,29,28,23,22,21,20,15,14,13,12,7,6,5,4}
+
+   and issue
+
+     vperm 9,11,10,12
+
+   we get the desired
+
+   vr9  = 00000006 00000004 00000002 00000000.  */
+
+void
+altivec_expand_vec_perm_const_le (rtx operands[4])
+{
+  unsigned int i;
+  rtx perm[16];
+  rtx constv, unspec;
+  rtx target = operands[0];
+  rtx op0 = operands[1];
+  rtx op1 = operands[2];
+  rtx sel = operands[3];
+
+  /* Unpack and adjust the constant selector.  */
+  for (i = 0; i < 16; ++i)
+    {
+      rtx e = XVECEXP (sel, 0, i);
+      unsigned int elt = 31 - (INTVAL (e) & 31);
+      perm[i] = GEN_INT (elt);
+    }
+
+  /* Expand to a permute, swapping the inputs and using the
+     adjusted selector.  */
+  if (!REG_P (op0))
+    op0 = force_reg (V16QImode, op0);
+  if (!REG_P (op1))
+    op1 = force_reg (V16QImode, op1);
+
+  constv = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, perm));
+  constv = force_reg (V16QImode, constv);
+  unspec = gen_rtx_UNSPEC (V16QImode, gen_rtvec (3, op1, op0, constv),
+			   UNSPEC_VPERM);
+  if (!REG_P (target))
+    {
+      rtx tmp = gen_reg_rtx (V16QImode);
+      emit_move_insn (tmp, unspec);
+      unspec = tmp;
+    }
+
+  emit_move_insn (target, unspec);
+}
+
+/* Similarly to altivec_expand_vec_perm_const_le, we must adjust the
+   permute control vector.  But here it's not a constant, so we must
+   generate a vector splat/subtract to do the adjustment.  */
+
+void
+altivec_expand_vec_perm_le (rtx operands[4])
+{
+  rtx splat, unspec;
+  rtx target = operands[0];
+  rtx op0 = operands[1];
+  rtx op1 = operands[2];
+  rtx sel = operands[3];
+  rtx tmp = target;
+
+  /* Get everything in regs so the pattern matches.  */
+  if (!REG_P (op0))
+    op0 = force_reg (V16QImode, op0);
+  if (!REG_P (op1))
+    op1 = force_reg (V16QImode, op1);
+  if (!REG_P (sel))
+    sel = force_reg (V16QImode, sel);
+  if (!REG_P (target))
+    tmp = gen_reg_rtx (V16QImode);
+
+  /* SEL = splat(31) - SEL.  */
+  /* We want to subtract from 31, but we can't vspltisb 31 since
+     it's out of range.  -1 works as well because only the low-order
+     five bits of the permute control vector elements are used.  */
+  splat = gen_rtx_VEC_DUPLICATE (V16QImode,
+				 gen_rtx_CONST_INT (QImode, -1));
+  emit_move_insn (tmp, splat);
+  sel = gen_rtx_MINUS (V16QImode, tmp, sel);
+  emit_move_insn (tmp, sel);
+
+  /* Permute with operands reversed and adjusted selector.  */
+  unspec = gen_rtx_UNSPEC (V16QImode, gen_rtvec (3, op1, op0, tmp),
+			   UNSPEC_VPERM);
+
+  /* Copy into target, possibly by way of a register.  */
+  if (!REG_P (target))
+    {
+      emit_move_insn (tmp, unspec);
+      unspec = tmp;
+    }
+
+  emit_move_insn (target, unspec);
+}
+
 /* Expand an Altivec constant permutation.  Return true if we match
    an efficient implementation; false to fall back to VPERM.  */
 
@@ -28829,17 +29092,23 @@  altivec_expand_vec_perm_const (rtx opera
       {  1,  3,  5,  7,  9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31 } },
     { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vpkuwum,
       {  2,  3,  6,  7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31 } },
-    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghb,
+    { OPTION_MASK_ALTIVEC, 
+      BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb : CODE_FOR_altivec_vmrglb,
       {  0, 16,  1, 17,  2, 18,  3, 19,  4, 20,  5, 21,  6, 22,  7, 23 } },
-    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghh,
+    { OPTION_MASK_ALTIVEC,
+      BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh : CODE_FOR_altivec_vmrglh,
       {  0,  1, 16, 17,  2,  3, 18, 19,  4,  5, 20, 21,  6,  7, 22, 23 } },
-    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghw,
+    { OPTION_MASK_ALTIVEC,
+      BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw : CODE_FOR_altivec_vmrglw,
       {  0,  1,  2,  3, 16, 17, 18, 19,  4,  5,  6,  7, 20, 21, 22, 23 } },
-    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglb,
+    { OPTION_MASK_ALTIVEC,
+      BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb : CODE_FOR_altivec_vmrghb,
       {  8, 24,  9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31 } },
-    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglh,
+    { OPTION_MASK_ALTIVEC,
+      BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh : CODE_FOR_altivec_vmrghh,
       {  8,  9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31 } },
-    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglw,
+    { OPTION_MASK_ALTIVEC,
+      BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw : CODE_FOR_altivec_vmrghw,
       {  8,  9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31 } },
     { OPTION_MASK_P8_VECTOR, CODE_FOR_p8_vmrgew,
       {  0,  1,  2,  3, 16, 17, 18, 19,  8,  9, 10, 11, 24, 25, 26, 27 } },
@@ -28901,6 +29170,8 @@  altivec_expand_vec_perm_const (rtx opera
 	  break;
       if (i == 16)
 	{
+          if (!BYTES_BIG_ENDIAN)
+            elt = 15 - elt;
 	  emit_insn (gen_altivec_vspltb (target, op0, GEN_INT (elt)));
 	  return true;
 	}
@@ -28912,9 +29183,10 @@  altivec_expand_vec_perm_const (rtx opera
 	      break;
 	  if (i == 16)
 	    {
+	      int field = BYTES_BIG_ENDIAN ? elt / 2 : 7 - elt / 2;
 	      x = gen_reg_rtx (V8HImode);
 	      emit_insn (gen_altivec_vsplth (x, gen_lowpart (V8HImode, op0),
-					     GEN_INT (elt / 2)));
+					     GEN_INT (field)));
 	      emit_move_insn (target, gen_lowpart (V16QImode, x));
 	      return true;
 	    }
@@ -28930,9 +29202,10 @@  altivec_expand_vec_perm_const (rtx opera
 	      break;
 	  if (i == 16)
 	    {
+	      int field = BYTES_BIG_ENDIAN ? elt / 4 : 3 - elt / 4;
 	      x = gen_reg_rtx (V4SImode);
 	      emit_insn (gen_altivec_vspltw (x, gen_lowpart (V4SImode, op0),
-					     GEN_INT (elt / 4)));
+					     GEN_INT (field)));
 	      emit_move_insn (target, gen_lowpart (V16QImode, x));
 	      return true;
 	    }
@@ -28970,7 +29243,30 @@  altivec_expand_vec_perm_const (rtx opera
 	  enum machine_mode omode = insn_data[icode].operand[0].mode;
 	  enum machine_mode imode = insn_data[icode].operand[1].mode;
 
-	  if (swapped)
+	  /* For little-endian, don't use vpkuwum and vpkuhum if the
+	     underlying vector type is not V4SI and V8HI, respectively.
+	     For example, using vpkuwum with a V8HI picks up the even
+	     halfwords (BE numbering) when the even halfwords (LE
+	     numbering) are what we need.  */
+	  if (!BYTES_BIG_ENDIAN
+	      && icode == CODE_FOR_altivec_vpkuwum
+	      && ((GET_CODE (op0) == REG
+		   && GET_MODE (op0) != V4SImode)
+		  || (GET_CODE (op0) == SUBREG
+		      && GET_MODE (XEXP (op0, 0)) != V4SImode)))
+	    continue;
+	  if (!BYTES_BIG_ENDIAN
+	      && icode == CODE_FOR_altivec_vpkuhum
+	      && ((GET_CODE (op0) == REG
+		   && GET_MODE (op0) != V8HImode)
+		  || (GET_CODE (op0) == SUBREG
+		      && GET_MODE (XEXP (op0, 0)) != V8HImode)))
+	    continue;
+
+          /* For little-endian, the two input operands must be swapped
+             (or swapped back) to ensure proper right-to-left numbering
+             from 0 to 2N-1.  */
+	  if (swapped ^ !BYTES_BIG_ENDIAN)
 	    x = op0, op0 = op1, op1 = x;
 	  if (imode != V16QImode)
 	    {
@@ -28988,6 +29284,12 @@  altivec_expand_vec_perm_const (rtx opera
 	}
     }
 
+  if (!BYTES_BIG_ENDIAN)
+    {
+      altivec_expand_vec_perm_const_le (operands);
+      return true;
+    }
+
   return false;
 }
 
@@ -29037,6 +29339,21 @@  rs6000_expand_vec_perm_const_1 (rtx targ
       gcc_assert (GET_MODE_NUNITS (vmode) == 2);
       dmode = mode_for_vector (GET_MODE_INNER (vmode), 4);
 
+      /* For little endian, swap operands and invert/swap selectors
+	 to get the correct xxpermdi.  The operand swap sets up the
+	 inputs as a little endian array.  The selectors are swapped
+	 because they are defined to use big endian ordering.  The
+	 selectors are inverted to get the correct doublewords for
+	 little endian ordering.  */
+      if (!BYTES_BIG_ENDIAN)
+	{
+	  int n;
+	  perm0 = 3 - perm0;
+	  perm1 = 3 - perm1;
+	  n = perm0, perm0 = perm1, perm1 = n;
+	  x = op0, op0 = op1, op1 = x;
+	}
+
       x = gen_rtx_VEC_CONCAT (dmode, op0, op1);
       v = gen_rtvec (2, GEN_INT (perm0), GEN_INT (perm1));
       x = gen_rtx_VEC_SELECT (vmode, x, gen_rtx_PARALLEL (VOIDmode, v));
@@ -29132,7 +29449,7 @@  rs6000_expand_interleave (rtx target, rt
   unsigned i, high, nelt = GET_MODE_NUNITS (vmode);
   rtx perm[16];
 
-  high = (highp == BYTES_BIG_ENDIAN ? 0 : nelt / 2);
+  high = (highp ? 0 : nelt / 2);
   for (i = 0; i < nelt / 2; i++)
     {
       perm[i * 2] = GEN_INT (i + high);
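
(One more illustration before the vector.md changes: the selector built by
the new rs6000_const_vec simply swaps the two halves of the register, for
example {2, 3, 0, 1} for V4SImode and {1, 0} for V2DImode, and swapping
twice is the identity.  That is why generating the lxvd2x/xxpermdi sequences
at expand time lets redundant pairs of register permutes be optimized away.
Standalone C sketch with a hypothetical helper name, not code from the
patch.)

#include <assert.h>

/* Hypothetical helper mirroring what rs6000_const_vec computes: a
   selector that swaps the two halves of an NUNITS-element vector.  */
static void
half_swap_selector (int nunits, int sel[])
{
  int i;
  for (i = 0; i < nunits / 2; i++)
    sel[i] = i + nunits / 2;
  for (i = nunits / 2; i < nunits; i++)
    sel[i] = i - nunits / 2;
}

int
main (void)
{
  int sel[4];
  int i;

  half_swap_selector (4, sel);   /* V4SImode: {2, 3, 0, 1} */
  assert (sel[0] == 2 && sel[1] == 3 && sel[2] == 0 && sel[3] == 1);

  /* Composing the swap with itself yields the identity permutation,
     which is what lets a redundant pair of register permutes cancel.  */
  for (i = 0; i < 4; i++)
    assert (sel[sel[i]] == i);
  return 0;
}
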
Index: gcc-4_8-test/gcc/config/rs6000/vector.md
===================================================================
--- gcc-4_8-test.orig/gcc/config/rs6000/vector.md
+++ gcc-4_8-test/gcc/config/rs6000/vector.md
@@ -88,7 +88,8 @@ 
 				 (smax "smax")])
 
 
-;; Vector move instructions.
+;; Vector move instructions.  Little-endian VSX loads and stores require
+;; special handling to circumvent "element endianness."
 (define_expand "mov<mode>"
   [(set (match_operand:VEC_M 0 "nonimmediate_operand" "")
 	(match_operand:VEC_M 1 "any_operand" ""))]
@@ -104,6 +105,16 @@ 
 	       && !vlogical_operand (operands[1], <MODE>mode))
 	operands[1] = force_reg (<MODE>mode, operands[1]);
     }
+  if (!BYTES_BIG_ENDIAN
+      && VECTOR_MEM_VSX_P (<MODE>mode)
+      && <MODE>mode != TImode
+      && !gpr_or_gpr_p (operands[0], operands[1])
+      && (memory_operand (operands[0], <MODE>mode)
+          ^ memory_operand (operands[1], <MODE>mode)))
+    {
+      rs6000_emit_le_vsx_move (operands[0], operands[1], <MODE>mode);
+      DONE;
+    }
 })
 
 ;; Generic vector floating point load/store instructions.  These will match
@@ -862,7 +873,7 @@ 
 {
   rtx reg = gen_reg_rtx (V4SFmode);
 
-  rs6000_expand_interleave (reg, operands[1], operands[1], true);
+  rs6000_expand_interleave (reg, operands[1], operands[1], BYTES_BIG_ENDIAN);
   emit_insn (gen_vsx_xvcvspdp (operands[0], reg));
   DONE;
 })
@@ -874,7 +885,7 @@ 
 {
   rtx reg = gen_reg_rtx (V4SFmode);
 
-  rs6000_expand_interleave (reg, operands[1], operands[1], false);
+  rs6000_expand_interleave (reg, operands[1], operands[1], !BYTES_BIG_ENDIAN);
   emit_insn (gen_vsx_xvcvspdp (operands[0], reg));
   DONE;
 })
@@ -886,7 +897,7 @@ 
 {
   rtx reg = gen_reg_rtx (V4SImode);
 
-  rs6000_expand_interleave (reg, operands[1], operands[1], true);
+  rs6000_expand_interleave (reg, operands[1], operands[1], BYTES_BIG_ENDIAN);
   emit_insn (gen_vsx_xvcvsxwdp (operands[0], reg));
   DONE;
 })
@@ -898,7 +909,7 @@ 
 {
   rtx reg = gen_reg_rtx (V4SImode);
 
-  rs6000_expand_interleave (reg, operands[1], operands[1], false);
+  rs6000_expand_interleave (reg, operands[1], operands[1], !BYTES_BIG_ENDIAN);
   emit_insn (gen_vsx_xvcvsxwdp (operands[0], reg));
   DONE;
 })
@@ -910,7 +921,7 @@ 
 {
   rtx reg = gen_reg_rtx (V4SImode);
 
-  rs6000_expand_interleave (reg, operands[1], operands[1], true);
+  rs6000_expand_interleave (reg, operands[1], operands[1], BYTES_BIG_ENDIAN);
   emit_insn (gen_vsx_xvcvuxwdp (operands[0], reg));
   DONE;
 })
@@ -922,7 +933,7 @@ 
 {
   rtx reg = gen_reg_rtx (V4SImode);
 
-  rs6000_expand_interleave (reg, operands[1], operands[1], false);
+  rs6000_expand_interleave (reg, operands[1], operands[1], !BYTES_BIG_ENDIAN);
   emit_insn (gen_vsx_xvcvuxwdp (operands[0], reg));
   DONE;
 })
@@ -936,8 +947,19 @@ 
    (match_operand:V16QI 3 "vlogical_operand" "")]
   "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)"
 {
-  emit_insn (gen_altivec_vperm_<mode> (operands[0], operands[1], operands[2],
-				       operands[3]));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vperm_<mode> (operands[0], operands[1],
+    	      				 operands[2], operands[3]));
+  else
+    {
+      /* We have changed lvsr to lvsl, so to complete the transformation
+         of vperm for LE, we must swap the inputs.  */
+      rtx unspec = gen_rtx_UNSPEC (<MODE>mode,
+                                   gen_rtvec (3, operands[2],
+                                              operands[1], operands[3]),
+                                   UNSPEC_VPERM);
+      emit_move_insn (operands[0], unspec);
+    }
   DONE;
 })
 
Index: gcc-4_8-test/gcc/config/rs6000/altivec.md
===================================================================
--- gcc-4_8-test.orig/gcc/config/rs6000/altivec.md
+++ gcc-4_8-test/gcc/config/rs6000/altivec.md
@@ -649,7 +649,7 @@ 
    convert_move (small_swap, swap, 0);
  
    low_product = gen_reg_rtx (V4SImode);
-   emit_insn (gen_vec_widen_umult_odd_v8hi (low_product, one, two));
+   emit_insn (gen_altivec_vmulouh (low_product, one, two));
  
    high_product = gen_reg_rtx (V4SImode);
    emit_insn (gen_altivec_vmsumuhm (high_product, one, small_swap, zero));
@@ -676,10 +676,18 @@ 
    emit_insn (gen_vec_widen_smult_even_v8hi (even, operands[1], operands[2]));
    emit_insn (gen_vec_widen_smult_odd_v8hi (odd, operands[1], operands[2]));
 
-   emit_insn (gen_altivec_vmrghw (high, even, odd));
-   emit_insn (gen_altivec_vmrglw (low, even, odd));
-
-   emit_insn (gen_altivec_vpkuwum (operands[0], high, low));
+   if (BYTES_BIG_ENDIAN)
+     {
+       emit_insn (gen_altivec_vmrghw (high, even, odd));
+       emit_insn (gen_altivec_vmrglw (low, even, odd));
+       emit_insn (gen_altivec_vpkuwum (operands[0], high, low));
+     }
+   else
+     {
+       emit_insn (gen_altivec_vmrghw (high, odd, even));
+       emit_insn (gen_altivec_vmrglw (low, odd, even));
+       emit_insn (gen_altivec_vpkuwum (operands[0], low, high));
+     } 
 
    DONE;
 }")
@@ -967,7 +975,111 @@ 
   "vmrgow %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
-(define_insn "vec_widen_umult_even_v16qi"
+(define_expand "vec_widen_umult_even_v16qi"
+  [(use (match_operand:V8HI 0 "register_operand" ""))
+   (use (match_operand:V16QI 1 "register_operand" ""))
+   (use (match_operand:V16QI 2 "register_operand" ""))]
+  "TARGET_ALTIVEC"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmuleub (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (gen_altivec_vmuloub (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_expand "vec_widen_smult_even_v16qi"
+  [(use (match_operand:V8HI 0 "register_operand" ""))
+   (use (match_operand:V16QI 1 "register_operand" ""))
+   (use (match_operand:V16QI 2 "register_operand" ""))]
+  "TARGET_ALTIVEC"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmulesb (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (gen_altivec_vmulosb (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_expand "vec_widen_umult_even_v8hi"
+  [(use (match_operand:V4SI 0 "register_operand" ""))
+   (use (match_operand:V8HI 1 "register_operand" ""))
+   (use (match_operand:V8HI 2 "register_operand" ""))]
+  "TARGET_ALTIVEC"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmuleuh (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (gen_altivec_vmulouh (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_expand "vec_widen_smult_even_v8hi"
+  [(use (match_operand:V4SI 0 "register_operand" ""))
+   (use (match_operand:V8HI 1 "register_operand" ""))
+   (use (match_operand:V8HI 2 "register_operand" ""))]
+  "TARGET_ALTIVEC"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmulesh (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (gen_altivec_vmulosh (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_expand "vec_widen_umult_odd_v16qi"
+  [(use (match_operand:V8HI 0 "register_operand" ""))
+   (use (match_operand:V16QI 1 "register_operand" ""))
+   (use (match_operand:V16QI 2 "register_operand" ""))]
+  "TARGET_ALTIVEC"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmuloub (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (gen_altivec_vmuleub (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_expand "vec_widen_smult_odd_v16qi"
+  [(use (match_operand:V8HI 0 "register_operand" ""))
+   (use (match_operand:V16QI 1 "register_operand" ""))
+   (use (match_operand:V16QI 2 "register_operand" ""))]
+  "TARGET_ALTIVEC"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmulosb (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (gen_altivec_vmulesb (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_expand "vec_widen_umult_odd_v8hi"
+  [(use (match_operand:V4SI 0 "register_operand" ""))
+   (use (match_operand:V8HI 1 "register_operand" ""))
+   (use (match_operand:V8HI 2 "register_operand" ""))]
+  "TARGET_ALTIVEC"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmulouh (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (gen_altivec_vmuleuh (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_expand "vec_widen_smult_odd_v8hi"
+  [(use (match_operand:V4SI 0 "register_operand" ""))
+   (use (match_operand:V8HI 1 "register_operand" ""))
+   (use (match_operand:V8HI 2 "register_operand" ""))]
+  "TARGET_ALTIVEC"
+{
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmulosh (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (gen_altivec_vmulesh (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_insn "altivec_vmuleub"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
         (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")
                       (match_operand:V16QI 2 "register_operand" "v")]
@@ -976,43 +1088,25 @@ 
   "vmuleub %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
-(define_insn "vec_widen_smult_even_v16qi"
+(define_insn "altivec_vmuloub"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
         (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")
                       (match_operand:V16QI 2 "register_operand" "v")]
-		     UNSPEC_VMULESB))]
-  "TARGET_ALTIVEC"
-  "vmulesb %0,%1,%2"
-  [(set_attr "type" "veccomplex")])
-
-(define_insn "vec_widen_umult_even_v8hi"
-  [(set (match_operand:V4SI 0 "register_operand" "=v")
-        (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")
-                      (match_operand:V8HI 2 "register_operand" "v")]
-		     UNSPEC_VMULEUH))]
-  "TARGET_ALTIVEC"
-  "vmuleuh %0,%1,%2"
-  [(set_attr "type" "veccomplex")])
-
-(define_insn "vec_widen_smult_even_v8hi"
-  [(set (match_operand:V4SI 0 "register_operand" "=v")
-        (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")
-                      (match_operand:V8HI 2 "register_operand" "v")]
-		     UNSPEC_VMULESH))]
+		     UNSPEC_VMULOUB))]
   "TARGET_ALTIVEC"
-  "vmulesh %0,%1,%2"
+  "vmuloub %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
-(define_insn "vec_widen_umult_odd_v16qi"
+(define_insn "altivec_vmulesb"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
         (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")
                       (match_operand:V16QI 2 "register_operand" "v")]
-		     UNSPEC_VMULOUB))]
+		     UNSPEC_VMULESB))]
   "TARGET_ALTIVEC"
-  "vmuloub %0,%1,%2"
+  "vmulesb %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
-(define_insn "vec_widen_smult_odd_v16qi"
+(define_insn "altivec_vmulosb"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
         (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")
                       (match_operand:V16QI 2 "register_operand" "v")]
@@ -1021,7 +1115,16 @@ 
   "vmulosb %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
-(define_insn "vec_widen_umult_odd_v8hi"
+(define_insn "altivec_vmuleuh"
+  [(set (match_operand:V4SI 0 "register_operand" "=v")
+        (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")
+                      (match_operand:V8HI 2 "register_operand" "v")]
+		     UNSPEC_VMULEUH))]
+  "TARGET_ALTIVEC"
+  "vmuleuh %0,%1,%2"
+  [(set_attr "type" "veccomplex")])
+
+(define_insn "altivec_vmulouh"
   [(set (match_operand:V4SI 0 "register_operand" "=v")
         (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")
                       (match_operand:V8HI 2 "register_operand" "v")]
@@ -1030,7 +1133,16 @@ 
   "vmulouh %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
-(define_insn "vec_widen_smult_odd_v8hi"
+(define_insn "altivec_vmulesh"
+  [(set (match_operand:V4SI 0 "register_operand" "=v")
+        (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")
+                      (match_operand:V8HI 2 "register_operand" "v")]
+		     UNSPEC_VMULESH))]
+  "TARGET_ALTIVEC"
+  "vmulesh %0,%1,%2"
+  [(set_attr "type" "veccomplex")])
+
+(define_insn "altivec_vmulosh"
   [(set (match_operand:V4SI 0 "register_operand" "=v")
         (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")
                       (match_operand:V8HI 2 "register_operand" "v")]
@@ -1047,7 +1159,13 @@ 
                       (match_operand:V4SI 2 "register_operand" "v")]
 		     UNSPEC_VPKPX))]
   "TARGET_ALTIVEC"
-  "vpkpx %0,%1,%2"
+  "*
+  {
+    if (BYTES_BIG_ENDIAN)
+      return \"vpkpx %0,%1,%2\";
+    else
+      return \"vpkpx %0,%2,%1\";
+  }"
   [(set_attr "type" "vecperm")])
 
 (define_insn "altivec_vpks<VI_char>ss"
@@ -1056,7 +1174,13 @@ 
 			    (match_operand:VP 2 "register_operand" "v")]
 			   UNSPEC_VPACK_SIGN_SIGN_SAT))]
   "<VI_unit>"
-  "vpks<VI_char>ss %0,%1,%2"
+  "*
+  {
+    if (BYTES_BIG_ENDIAN)
+      return \"vpks<VI_char>ss %0,%1,%2\";
+    else
+      return \"vpks<VI_char>ss %0,%2,%1\";
+  }"
   [(set_attr "type" "vecperm")])
 
 (define_insn "altivec_vpks<VI_char>us"
@@ -1065,7 +1189,13 @@ 
 			    (match_operand:VP 2 "register_operand" "v")]
 			   UNSPEC_VPACK_SIGN_UNS_SAT))]
   "<VI_unit>"
-  "vpks<VI_char>us %0,%1,%2"
+  "*
+  {
+    if (BYTES_BIG_ENDIAN)
+      return \"vpks<VI_char>us %0,%1,%2\";
+    else
+      return \"vpks<VI_char>us %0,%2,%1\";
+  }"
   [(set_attr "type" "vecperm")])
 
 (define_insn "altivec_vpku<VI_char>us"
@@ -1074,7 +1204,13 @@ 
 			    (match_operand:VP 2 "register_operand" "v")]
 			   UNSPEC_VPACK_UNS_UNS_SAT))]
   "<VI_unit>"
-  "vpku<VI_char>us %0,%1,%2"
+  "*
+  {
+    if (BYTES_BIG_ENDIAN)
+      return \"vpku<VI_char>us %0,%1,%2\";
+    else
+      return \"vpku<VI_char>us %0,%2,%1\";
+  }"
   [(set_attr "type" "vecperm")])
 
 (define_insn "altivec_vpku<VI_char>um"
@@ -1083,7 +1219,13 @@ 
 			    (match_operand:VP 2 "register_operand" "v")]
 			   UNSPEC_VPACK_UNS_UNS_MOD))]
   "<VI_unit>"
-  "vpku<VI_char>um %0,%1,%2"
+  "*
+  {
+    if (BYTES_BIG_ENDIAN)
+      return \"vpku<VI_char>um %0,%1,%2\";
+    else
+      return \"vpku<VI_char>um %0,%2,%1\";
+  }"
   [(set_attr "type" "vecperm")])
 
 (define_insn "*altivec_vrl<VI_char>"
@@ -1276,7 +1418,12 @@ 
 		       (match_operand:V16QI 3 "register_operand" "")]
 		      UNSPEC_VPERM))]
   "TARGET_ALTIVEC"
-  "")
+{
+  if (!BYTES_BIG_ENDIAN) {
+    altivec_expand_vec_perm_le (operands);
+    DONE;
+  }
+})
 
 (define_expand "vec_perm_constv16qi"
   [(match_operand:V16QI 0 "register_operand" "")
@@ -1928,25 +2075,26 @@ 
   rtx vzero = gen_reg_rtx (V8HImode);
   rtx mask = gen_reg_rtx (V16QImode);
   rtvec v = rtvec_alloc (16);
+  bool be = BYTES_BIG_ENDIAN;
    
   emit_insn (gen_altivec_vspltish (vzero, const0_rtx));
    
-  RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, 16);
-  RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, 0);
-  RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, 16);
-  RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, 1);
-  RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, 16);
-  RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, 2);
-  RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, 16);
-  RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, 3);
-  RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, 16);
-  RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, 4);
-  RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, 16);
-  RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, 5);
-  RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, 16);
-  RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, 6);
-  RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, 16);
-  RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, 7);
+  RTVEC_ELT (v,  0) = gen_rtx_CONST_INT (QImode, be ? 16 :  7);
+  RTVEC_ELT (v,  1) = gen_rtx_CONST_INT (QImode, be ?  0 : 16);
+  RTVEC_ELT (v,  2) = gen_rtx_CONST_INT (QImode, be ? 16 :  6);
+  RTVEC_ELT (v,  3) = gen_rtx_CONST_INT (QImode, be ?  1 : 16);
+  RTVEC_ELT (v,  4) = gen_rtx_CONST_INT (QImode, be ? 16 :  5);
+  RTVEC_ELT (v,  5) = gen_rtx_CONST_INT (QImode, be ?  2 : 16);
+  RTVEC_ELT (v,  6) = gen_rtx_CONST_INT (QImode, be ? 16 :  4);
+  RTVEC_ELT (v,  7) = gen_rtx_CONST_INT (QImode, be ?  3 : 16);
+  RTVEC_ELT (v,  8) = gen_rtx_CONST_INT (QImode, be ? 16 :  3);
+  RTVEC_ELT (v,  9) = gen_rtx_CONST_INT (QImode, be ?  4 : 16);
+  RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 16 :  2);
+  RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ?  5 : 16);
+  RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 :  1);
+  RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ?  6 : 16);
+  RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 16 :  0);
+  RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ?  7 : 16);
 
   emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
   emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask));
@@ -1963,25 +2111,26 @@ 
   rtx vzero = gen_reg_rtx (V4SImode);
   rtx mask = gen_reg_rtx (V16QImode);
   rtvec v = rtvec_alloc (16);
+  bool be = BYTES_BIG_ENDIAN;
 
   emit_insn (gen_altivec_vspltisw (vzero, const0_rtx));
  
-  RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, 16);
-  RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, 17);
-  RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, 0);
-  RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, 1);
-  RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, 16);
-  RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, 17);
-  RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, 2);
-  RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, 3);
-  RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, 16);
-  RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, 17);
-  RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, 4);
-  RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, 5);
-  RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, 16);
-  RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, 17);
-  RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, 6);
-  RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, 7);
+  RTVEC_ELT (v,  0) = gen_rtx_CONST_INT (QImode, be ? 16 :  7);
+  RTVEC_ELT (v,  1) = gen_rtx_CONST_INT (QImode, be ? 17 :  6);
+  RTVEC_ELT (v,  2) = gen_rtx_CONST_INT (QImode, be ?  0 : 17);
+  RTVEC_ELT (v,  3) = gen_rtx_CONST_INT (QImode, be ?  1 : 16);
+  RTVEC_ELT (v,  4) = gen_rtx_CONST_INT (QImode, be ? 16 :  5);
+  RTVEC_ELT (v,  5) = gen_rtx_CONST_INT (QImode, be ? 17 :  4);
+  RTVEC_ELT (v,  6) = gen_rtx_CONST_INT (QImode, be ?  2 : 17);
+  RTVEC_ELT (v,  7) = gen_rtx_CONST_INT (QImode, be ?  3 : 16);
+  RTVEC_ELT (v,  8) = gen_rtx_CONST_INT (QImode, be ? 16 :  3);
+  RTVEC_ELT (v,  9) = gen_rtx_CONST_INT (QImode, be ? 17 :  2);
+  RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ?  4 : 17);
+  RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ?  5 : 16);
+  RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 :  1);
+  RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 17 :  0);
+  RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ?  6 : 17);
+  RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ?  7 : 16);
 
   emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
   emit_insn (gen_vperm_v8hiv4si (operands[0], operands[1], vzero, mask));
@@ -1998,25 +2147,26 @@ 
   rtx vzero = gen_reg_rtx (V8HImode);
   rtx mask = gen_reg_rtx (V16QImode);
   rtvec v = rtvec_alloc (16);
+  bool be = BYTES_BIG_ENDIAN;
 
   emit_insn (gen_altivec_vspltish (vzero, const0_rtx));
 
-  RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, 16);
-  RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, 8);
-  RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, 16);
-  RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, 9);
-  RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, 16);
-  RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, 10);
-  RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, 16);
-  RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, 11);
-  RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, 16);
-  RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, 12);
-  RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, 16);
-  RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, 13);
-  RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, 16);
-  RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, 14);
-  RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, 16);
-  RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, 15);
+  RTVEC_ELT (v,  0) = gen_rtx_CONST_INT (QImode, be ? 16 : 15);
+  RTVEC_ELT (v,  1) = gen_rtx_CONST_INT (QImode, be ?  8 : 16);
+  RTVEC_ELT (v,  2) = gen_rtx_CONST_INT (QImode, be ? 16 : 14);
+  RTVEC_ELT (v,  3) = gen_rtx_CONST_INT (QImode, be ?  9 : 16);
+  RTVEC_ELT (v,  4) = gen_rtx_CONST_INT (QImode, be ? 16 : 13);
+  RTVEC_ELT (v,  5) = gen_rtx_CONST_INT (QImode, be ? 10 : 16);
+  RTVEC_ELT (v,  6) = gen_rtx_CONST_INT (QImode, be ? 16 : 12);
+  RTVEC_ELT (v,  7) = gen_rtx_CONST_INT (QImode, be ? 11 : 16);
+  RTVEC_ELT (v,  8) = gen_rtx_CONST_INT (QImode, be ? 16 : 11);
+  RTVEC_ELT (v,  9) = gen_rtx_CONST_INT (QImode, be ? 12 : 16);
+  RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 16 : 10);
+  RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ? 13 : 16);
+  RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 :  9);
+  RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 14 : 16);
+  RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 16 :  8);
+  RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 15 : 16);
 
   emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
   emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask));
@@ -2033,25 +2183,26 @@ 
   rtx vzero = gen_reg_rtx (V4SImode);
   rtx mask = gen_reg_rtx (V16QImode);
   rtvec v = rtvec_alloc (16);
+  bool be = BYTES_BIG_ENDIAN;
 
   emit_insn (gen_altivec_vspltisw (vzero, const0_rtx));
  
-  RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, 16);
-  RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, 17);
-  RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, 8);
-  RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, 9);
-  RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, 16);
-  RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, 17);
-  RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, 10);
-  RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, 11);
-  RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, 16);
-  RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, 17);
-  RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, 12);
-  RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, 13);
-  RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, 16);
-  RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, 17);
-  RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, 14);
-  RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, 15);
+  RTVEC_ELT (v,  0) = gen_rtx_CONST_INT (QImode, be ? 16 : 15);
+  RTVEC_ELT (v,  1) = gen_rtx_CONST_INT (QImode, be ? 17 : 14);
+  RTVEC_ELT (v,  2) = gen_rtx_CONST_INT (QImode, be ?  8 : 17);
+  RTVEC_ELT (v,  3) = gen_rtx_CONST_INT (QImode, be ?  9 : 16);
+  RTVEC_ELT (v,  4) = gen_rtx_CONST_INT (QImode, be ? 16 : 13);
+  RTVEC_ELT (v,  5) = gen_rtx_CONST_INT (QImode, be ? 17 : 12);
+  RTVEC_ELT (v,  6) = gen_rtx_CONST_INT (QImode, be ? 10 : 17);
+  RTVEC_ELT (v,  7) = gen_rtx_CONST_INT (QImode, be ? 11 : 16);
+  RTVEC_ELT (v,  8) = gen_rtx_CONST_INT (QImode, be ? 16 : 11);
+  RTVEC_ELT (v,  9) = gen_rtx_CONST_INT (QImode, be ? 17 : 10);
+  RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 12 : 17);
+  RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ? 13 : 16);
+  RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 :  9);
+  RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 17 :  8);
+  RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 14 : 17);
+  RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 15 : 16);
 
   emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
   emit_insn (gen_vperm_v8hiv4si (operands[0], operands[1], vzero, mask));
@@ -2071,7 +2222,10 @@ 
   
   emit_insn (gen_vec_widen_umult_even_v16qi (ve, operands[1], operands[2]));
   emit_insn (gen_vec_widen_umult_odd_v16qi (vo, operands[1], operands[2]));
-  emit_insn (gen_altivec_vmrghh (operands[0], ve, vo));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmrghh (operands[0], ve, vo));
+  else
+    emit_insn (gen_altivec_vmrghh (operands[0], vo, ve));
   DONE;
 }")
 
@@ -2088,7 +2242,10 @@ 
   
   emit_insn (gen_vec_widen_umult_even_v16qi (ve, operands[1], operands[2]));
   emit_insn (gen_vec_widen_umult_odd_v16qi (vo, operands[1], operands[2]));
-  emit_insn (gen_altivec_vmrglh (operands[0], ve, vo));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmrglh (operands[0], ve, vo));
+  else
+    emit_insn (gen_altivec_vmrglh (operands[0], vo, ve));
   DONE;
 }")
 
@@ -2105,7 +2262,10 @@ 
   
   emit_insn (gen_vec_widen_smult_even_v16qi (ve, operands[1], operands[2]));
   emit_insn (gen_vec_widen_smult_odd_v16qi (vo, operands[1], operands[2]));
-  emit_insn (gen_altivec_vmrghh (operands[0], ve, vo));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmrghh (operands[0], ve, vo));
+  else
+    emit_insn (gen_altivec_vmrghh (operands[0], vo, ve));
   DONE;
 }")
 
@@ -2122,7 +2282,10 @@ 
   
   emit_insn (gen_vec_widen_smult_even_v16qi (ve, operands[1], operands[2]));
   emit_insn (gen_vec_widen_smult_odd_v16qi (vo, operands[1], operands[2]));
-  emit_insn (gen_altivec_vmrglh (operands[0], ve, vo));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmrglh (operands[0], ve, vo));
+  else
+    emit_insn (gen_altivec_vmrglh (operands[0], vo, ve));
   DONE;
 }")
 
@@ -2139,7 +2302,10 @@ 
   
   emit_insn (gen_vec_widen_umult_even_v8hi (ve, operands[1], operands[2]));
   emit_insn (gen_vec_widen_umult_odd_v8hi (vo, operands[1], operands[2]));
-  emit_insn (gen_altivec_vmrghw (operands[0], ve, vo));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmrghw (operands[0], ve, vo));
+  else
+    emit_insn (gen_altivec_vmrghw (operands[0], vo, ve));
   DONE;
 }")
 
@@ -2156,7 +2322,10 @@ 
   
   emit_insn (gen_vec_widen_umult_even_v8hi (ve, operands[1], operands[2]));
   emit_insn (gen_vec_widen_umult_odd_v8hi (vo, operands[1], operands[2]));
-  emit_insn (gen_altivec_vmrglw (operands[0], ve, vo));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmrglw (operands[0], ve, vo));
+  else
+    emit_insn (gen_altivec_vmrglw (operands[0], vo, ve));
   DONE;
 }")
 
@@ -2173,7 +2342,10 @@ 
   
   emit_insn (gen_vec_widen_smult_even_v8hi (ve, operands[1], operands[2]));
   emit_insn (gen_vec_widen_smult_odd_v8hi (vo, operands[1], operands[2]));
-  emit_insn (gen_altivec_vmrghw (operands[0], ve, vo));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmrghw (operands[0], ve, vo));
+  else
+    emit_insn (gen_altivec_vmrghw (operands[0], vo, ve));
   DONE;
 }")
 
@@ -2190,7 +2362,10 @@ 
   
   emit_insn (gen_vec_widen_smult_even_v8hi (ve, operands[1], operands[2]));
   emit_insn (gen_vec_widen_smult_odd_v8hi (vo, operands[1], operands[2]));
-  emit_insn (gen_altivec_vmrglw (operands[0], ve, vo));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmrglw (operands[0], ve, vo));
+  else
+    emit_insn (gen_altivec_vmrglw (operands[0], vo, ve));
   DONE;
 }")
 
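A note on the shape of the altivec.md changes above: each widening multiply is
built from the "even"/"odd" element-product instructions followed by a merge,
and both instruction families number elements from the most-significant end of
the register.  When elements are instead numbered from the low address, as the
compiler does for little endian, that numbering is reversed; for an even
element count the parity of every index flips, the two product vectors trade
roles, and swapping the merge operands restores the layout the expander
promised.  A minimal C sketch of the index mapping (hw_elt is an illustrative
helper, not a GCC function):

#include <stdio.h>

/* Map a program-level element index (numbered from the low address) to
   the hardware index used by the AltiVec even/odd and merge
   instructions (numbered from the most-significant end of the
   register).  */
static int
hw_elt (int i, int n_elts, int big_endian)
{
  return big_endian ? i : n_elts - 1 - i;
}

int
main (void)
{
  /* For V8HI the reversal flips the parity of every index, so the
     hardware's "even" instruction touches the program's odd elements
     on LE, and the merge operands are exchanged to compensate.  */
  for (int i = 0; i < 8; i++)
    printf ("V8HI element %d -> hardware element %d on LE\n",
            i, hw_elt (i, 8, 0));
  return 0;
}
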
Index: gcc-4_8-test/gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc-4_8-test.orig/gcc/config/rs6000/rs6000-protos.h
+++ gcc-4_8-test/gcc/config/rs6000/rs6000-protos.h
@@ -56,6 +56,7 @@  extern void paired_expand_vector_init (r
 extern void rs6000_expand_vector_set (rtx, rtx, int);
 extern void rs6000_expand_vector_extract (rtx, rtx, int);
 extern bool altivec_expand_vec_perm_const (rtx op[4]);
+extern void altivec_expand_vec_perm_le (rtx op[4]);
 extern bool rs6000_expand_vec_perm_const (rtx op[4]);
 extern void rs6000_expand_extract_even (rtx, rtx, rtx);
 extern void rs6000_expand_interleave (rtx, rtx, rtx, bool);
@@ -122,6 +123,7 @@  extern rtx rs6000_longcall_ref (rtx);
 extern void rs6000_fatal_bad_address (rtx);
 extern rtx create_TOC_reference (rtx, rtx);
 extern void rs6000_split_multireg_move (rtx, rtx);
+extern void rs6000_emit_le_vsx_move (rtx, rtx, enum machine_mode);
 extern void rs6000_emit_move (rtx, rtx, enum machine_mode);
 extern rtx rs6000_secondary_memory_needed_rtx (enum machine_mode);
 extern rtx (*rs6000_legitimize_reload_address_ptr) (rtx, enum machine_mode,
Index: gcc-4_8-test/gcc/config/rs6000/vsx.md
===================================================================
--- gcc-4_8-test.orig/gcc/config/rs6000/vsx.md
+++ gcc-4_8-test/gcc/config/rs6000/vsx.md
@@ -216,6 +216,359 @@ 
   ])
 
 ;; VSX moves
+
+;; The patterns for LE permuted loads and stores come before the general
+;; VSX moves so they match first.
+(define_insn_and_split "*vsx_le_perm_load_<mode>"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+        (match_operand:VSX_D 1 "memory_operand" "Z"))]
+  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  "#"
+  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  [(set (match_dup 2)
+        (vec_select:<MODE>
+          (match_dup 1)
+          (parallel [(const_int 1) (const_int 0)])))
+   (set (match_dup 0)
+        (vec_select:<MODE>
+          (match_dup 2)
+          (parallel [(const_int 1) (const_int 0)])))]
+  "
+{
+  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[0])
+                                       : operands[0];
+}
+  "
+  [(set_attr "type" "vecload")
+   (set_attr "length" "8")])
+
+(define_insn_and_split "*vsx_le_perm_load_<mode>"
+  [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa")
+        (match_operand:VSX_W 1 "memory_operand" "Z"))]
+  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  "#"
+  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  [(set (match_dup 2)
+        (vec_select:<MODE>
+          (match_dup 1)
+          (parallel [(const_int 2) (const_int 3)
+                     (const_int 0) (const_int 1)])))
+   (set (match_dup 0)
+        (vec_select:<MODE>
+          (match_dup 2)
+          (parallel [(const_int 2) (const_int 3)
+                     (const_int 0) (const_int 1)])))]
+  "
+{
+  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[0])
+                                       : operands[0];
+}
+  "
+  [(set_attr "type" "vecload")
+   (set_attr "length" "8")])
+
+(define_insn_and_split "*vsx_le_perm_load_v8hi"
+  [(set (match_operand:V8HI 0 "vsx_register_operand" "=wa")
+        (match_operand:V8HI 1 "memory_operand" "Z"))]
+  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  "#"
+  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  [(set (match_dup 2)
+        (vec_select:V8HI
+          (match_dup 1)
+          (parallel [(const_int 4) (const_int 5)
+                     (const_int 6) (const_int 7)
+                     (const_int 0) (const_int 1)
+                     (const_int 2) (const_int 3)])))
+   (set (match_dup 0)
+        (vec_select:V8HI
+          (match_dup 2)
+          (parallel [(const_int 4) (const_int 5)
+                     (const_int 6) (const_int 7)
+                     (const_int 0) (const_int 1)
+                     (const_int 2) (const_int 3)])))]
+  "
+{
+  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[0])
+                                       : operands[0];
+}
+  "
+  [(set_attr "type" "vecload")
+   (set_attr "length" "8")])
+
+(define_insn_and_split "*vsx_le_perm_load_v16qi"
+  [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa")
+        (match_operand:V16QI 1 "memory_operand" "Z"))]
+  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  "#"
+  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  [(set (match_dup 2)
+        (vec_select:V16QI
+          (match_dup 1)
+          (parallel [(const_int 8) (const_int 9)
+                     (const_int 10) (const_int 11)
+                     (const_int 12) (const_int 13)
+                     (const_int 14) (const_int 15)
+                     (const_int 0) (const_int 1)
+                     (const_int 2) (const_int 3)
+                     (const_int 4) (const_int 5)
+                     (const_int 6) (const_int 7)])))
+   (set (match_dup 0)
+        (vec_select:V16QI
+          (match_dup 2)
+          (parallel [(const_int 8) (const_int 9)
+                     (const_int 10) (const_int 11)
+                     (const_int 12) (const_int 13)
+                     (const_int 14) (const_int 15)
+                     (const_int 0) (const_int 1)
+                     (const_int 2) (const_int 3)
+                     (const_int 4) (const_int 5)
+                     (const_int 6) (const_int 7)])))]
+  "
+{
+  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[0])
+                                       : operands[0];
+}
+  "
+  [(set_attr "type" "vecload")
+   (set_attr "length" "8")])
+
+(define_insn "*vsx_le_perm_store_<mode>"
+  [(set (match_operand:VSX_D 0 "memory_operand" "=Z")
+        (match_operand:VSX_D 1 "vsx_register_operand" "+wa"))]
+  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  "#"
+  [(set_attr "type" "vecstore")
+   (set_attr "length" "12")])
+
+(define_split
+  [(set (match_operand:VSX_D 0 "memory_operand" "")
+        (match_operand:VSX_D 1 "vsx_register_operand" ""))]
+  "!BYTES_BIG_ENDIAN && TARGET_VSX && !reload_completed"
+  [(set (match_dup 2)
+        (vec_select:<MODE>
+          (match_dup 1)
+          (parallel [(const_int 1) (const_int 0)])))
+   (set (match_dup 0)
+        (vec_select:<MODE>
+          (match_dup 2)
+          (parallel [(const_int 1) (const_int 0)])))]
+{
+  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[1]) 
+                                       : operands[1];
+})
+
+;; The post-reload split requires that we re-permute the source
+;; register in case it is still live.
+(define_split
+  [(set (match_operand:VSX_D 0 "memory_operand" "")
+        (match_operand:VSX_D 1 "vsx_register_operand" ""))]
+  "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed"
+  [(set (match_dup 1)
+        (vec_select:<MODE>
+          (match_dup 1)
+          (parallel [(const_int 1) (const_int 0)])))
+   (set (match_dup 0)
+        (vec_select:<MODE>
+          (match_dup 1)
+          (parallel [(const_int 1) (const_int 0)])))
+   (set (match_dup 1)
+        (vec_select:<MODE>
+          (match_dup 1)
+          (parallel [(const_int 1) (const_int 0)])))]
+  "")
+
+(define_insn "*vsx_le_perm_store_<mode>"
+  [(set (match_operand:VSX_W 0 "memory_operand" "=Z")
+        (match_operand:VSX_W 1 "vsx_register_operand" "+wa"))]
+  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  "#"
+  [(set_attr "type" "vecstore")
+   (set_attr "length" "12")])
+
+(define_split
+  [(set (match_operand:VSX_W 0 "memory_operand" "")
+        (match_operand:VSX_W 1 "vsx_register_operand" ""))]
+  "!BYTES_BIG_ENDIAN && TARGET_VSX && !reload_completed"
+  [(set (match_dup 2)
+        (vec_select:<MODE>
+          (match_dup 1)
+          (parallel [(const_int 2) (const_int 3)
+	             (const_int 0) (const_int 1)])))
+   (set (match_dup 0)
+        (vec_select:<MODE>
+          (match_dup 2)
+          (parallel [(const_int 2) (const_int 3)
+	             (const_int 0) (const_int 1)])))]
+{
+  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[1]) 
+                                       : operands[1];
+})
+
+;; The post-reload split requires that we re-permute the source
+;; register in case it is still live.
+(define_split
+  [(set (match_operand:VSX_W 0 "memory_operand" "")
+        (match_operand:VSX_W 1 "vsx_register_operand" ""))]
+  "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed"
+  [(set (match_dup 1)
+        (vec_select:<MODE>
+          (match_dup 1)
+          (parallel [(const_int 2) (const_int 3)
+	             (const_int 0) (const_int 1)])))
+   (set (match_dup 0)
+        (vec_select:<MODE>
+          (match_dup 1)
+          (parallel [(const_int 2) (const_int 3)
+	             (const_int 0) (const_int 1)])))
+   (set (match_dup 1)
+        (vec_select:<MODE>
+          (match_dup 1)
+          (parallel [(const_int 2) (const_int 3)
+	             (const_int 0) (const_int 1)])))]
+  "")
+
+(define_insn "*vsx_le_perm_store_v8hi"
+  [(set (match_operand:V8HI 0 "memory_operand" "=Z")
+        (match_operand:V8HI 1 "vsx_register_operand" "+wa"))]
+  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  "#"
+  [(set_attr "type" "vecstore")
+   (set_attr "length" "12")])
+
+(define_split
+  [(set (match_operand:V8HI 0 "memory_operand" "")
+        (match_operand:V8HI 1 "vsx_register_operand" ""))]
+  "!BYTES_BIG_ENDIAN && TARGET_VSX && !reload_completed"
+  [(set (match_dup 2)
+        (vec_select:V8HI
+          (match_dup 1)
+          (parallel [(const_int 4) (const_int 5)
+                     (const_int 6) (const_int 7)
+                     (const_int 0) (const_int 1)
+                     (const_int 2) (const_int 3)])))
+   (set (match_dup 0)
+        (vec_select:V8HI
+          (match_dup 2)
+          (parallel [(const_int 4) (const_int 5)
+                     (const_int 6) (const_int 7)
+                     (const_int 0) (const_int 1)
+                     (const_int 2) (const_int 3)])))]
+{
+  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[1]) 
+                                       : operands[1];
+})
+
+;; The post-reload split requires that we re-permute the source
+;; register in case it is still live.
+(define_split
+  [(set (match_operand:V8HI 0 "memory_operand" "")
+        (match_operand:V8HI 1 "vsx_register_operand" ""))]
+  "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed"
+  [(set (match_dup 1)
+        (vec_select:V8HI
+          (match_dup 1)
+          (parallel [(const_int 4) (const_int 5)
+                     (const_int 6) (const_int 7)
+                     (const_int 0) (const_int 1)
+                     (const_int 2) (const_int 3)])))
+   (set (match_dup 0)
+        (vec_select:V8HI
+          (match_dup 1)
+          (parallel [(const_int 4) (const_int 5)
+                     (const_int 6) (const_int 7)
+                     (const_int 0) (const_int 1)
+                     (const_int 2) (const_int 3)])))
+   (set (match_dup 1)
+        (vec_select:V8HI
+          (match_dup 1)
+          (parallel [(const_int 4) (const_int 5)
+                     (const_int 6) (const_int 7)
+                     (const_int 0) (const_int 1)
+                     (const_int 2) (const_int 3)])))]
+  "")
+
+(define_insn "*vsx_le_perm_store_v16qi"
+  [(set (match_operand:V16QI 0 "memory_operand" "=Z")
+        (match_operand:V16QI 1 "vsx_register_operand" "+wa"))]
+  "!BYTES_BIG_ENDIAN && TARGET_VSX"
+  "#"
+  [(set_attr "type" "vecstore")
+   (set_attr "length" "12")])
+
+(define_split
+  [(set (match_operand:V16QI 0 "memory_operand" "")
+        (match_operand:V16QI 1 "vsx_register_operand" ""))]
+  "!BYTES_BIG_ENDIAN && TARGET_VSX && !reload_completed"
+  [(set (match_dup 2)
+        (vec_select:V16QI
+          (match_dup 1)
+          (parallel [(const_int 8) (const_int 9)
+                     (const_int 10) (const_int 11)
+                     (const_int 12) (const_int 13)
+                     (const_int 14) (const_int 15)
+                     (const_int 0) (const_int 1)
+                     (const_int 2) (const_int 3)
+                     (const_int 4) (const_int 5)
+                     (const_int 6) (const_int 7)])))
+   (set (match_dup 0)
+        (vec_select:V16QI
+          (match_dup 2)
+          (parallel [(const_int 8) (const_int 9)
+                     (const_int 10) (const_int 11)
+                     (const_int 12) (const_int 13)
+                     (const_int 14) (const_int 15)
+                     (const_int 0) (const_int 1)
+                     (const_int 2) (const_int 3)
+                     (const_int 4) (const_int 5)
+                     (const_int 6) (const_int 7)])))]
+{
+  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[1]) 
+                                       : operands[1];
+})
+
+;; The post-reload split requires that we re-permute the source
+;; register in case it is still live.
+(define_split
+  [(set (match_operand:V16QI 0 "memory_operand" "")
+        (match_operand:V16QI 1 "vsx_register_operand" ""))]
+  "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed"
+  [(set (match_dup 1)
+        (vec_select:V16QI
+          (match_dup 1)
+          (parallel [(const_int 8) (const_int 9)
+                     (const_int 10) (const_int 11)
+                     (const_int 12) (const_int 13)
+                     (const_int 14) (const_int 15)
+                     (const_int 0) (const_int 1)
+                     (const_int 2) (const_int 3)
+                     (const_int 4) (const_int 5)
+                     (const_int 6) (const_int 7)])))
+   (set (match_dup 0)
+        (vec_select:V16QI
+          (match_dup 1)
+          (parallel [(const_int 8) (const_int 9)
+                     (const_int 10) (const_int 11)
+                     (const_int 12) (const_int 13)
+                     (const_int 14) (const_int 15)
+                     (const_int 0) (const_int 1)
+                     (const_int 2) (const_int 3)
+                     (const_int 4) (const_int 5)
+                     (const_int 6) (const_int 7)])))
+   (set (match_dup 1)
+        (vec_select:V16QI
+          (match_dup 1)
+          (parallel [(const_int 8) (const_int 9)
+                     (const_int 10) (const_int 11)
+                     (const_int 12) (const_int 13)
+                     (const_int 14) (const_int 15)
+                     (const_int 0) (const_int 1)
+                     (const_int 2) (const_int 3)
+                     (const_int 4) (const_int 5)
+                     (const_int 6) (const_int 7)])))]
+  "")
+
+
 (define_insn "*vsx_mov<mode>"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand" "=Z,<VSr>,<VSr>,?Z,?wa,?wa,wQ,?&r,??Y,??r,??r,<VSr>,?wa,*r,v,wZ, v")
 	(match_operand:VSX_M 1 "input_operand" "<VSr>,Z,<VSr>,wa,Z,wa,r,wQ,r,Y,r,j,j,j,W,v,wZ"))]
@@ -962,7 +1315,12 @@ 
 	 (match_operand:<VS_scalar> 1 "vsx_register_operand" "ws,wa")
 	 (match_operand:<VS_scalar> 2 "vsx_register_operand" "ws,wa")))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
-  "xxpermdi %x0,%x1,%x2,0"
+{
+  if (BYTES_BIG_ENDIAN)
+    return "xxpermdi %x0,%x1,%x2,0";
+  else
+    return "xxpermdi %x0,%x2,%x1,0";
+}
   [(set_attr "type" "vecperm")])
 
 ;; Special purpose concat using xxpermdi to glue two single precision values
@@ -975,9 +1333,161 @@ 
 	  (match_operand:SF 2 "vsx_register_operand" "f,f")]
 	 UNSPEC_VSX_CONCAT))]
   "VECTOR_MEM_VSX_P (V2DFmode)"
-  "xxpermdi %x0,%x1,%x2,0"
+{
+  if (BYTES_BIG_ENDIAN)
+    return "xxpermdi %x0,%x1,%x2,0";
+  else
+    return "xxpermdi %x0,%x2,%x1,0";
+}
+  [(set_attr "type" "vecperm")])
+
+;; xxpermdi for little endian loads and stores.  We need several of
+;; these since the form of the PARALLEL differs by mode.
+(define_insn "*vsx_xxpermdi2_le_<mode>"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+        (vec_select:VSX_D
+          (match_operand:VSX_D 1 "vsx_register_operand" "wa")
+          (parallel [(const_int 1) (const_int 0)])))]
+  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)"
+  "xxpermdi %x0,%x1,%x1,2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "*vsx_xxpermdi4_le_<mode>"
+  [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa")
+        (vec_select:VSX_W
+          (match_operand:VSX_W 1 "vsx_register_operand" "wa")
+          (parallel [(const_int 2) (const_int 3)
+                     (const_int 0) (const_int 1)])))]
+  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)"
+  "xxpermdi %x0,%x1,%x1,2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "*vsx_xxpermdi8_le_V8HI"
+  [(set (match_operand:V8HI 0 "vsx_register_operand" "=wa")
+        (vec_select:V8HI
+          (match_operand:V8HI 1 "vsx_register_operand" "wa")
+          (parallel [(const_int 4) (const_int 5)
+                     (const_int 6) (const_int 7)
+                     (const_int 0) (const_int 1)
+                     (const_int 2) (const_int 3)])))]
+  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V8HImode)"
+  "xxpermdi %x0,%x1,%x1,2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "*vsx_xxpermdi16_le_V16QI"
+  [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa")
+        (vec_select:V16QI
+          (match_operand:V16QI 1 "vsx_register_operand" "wa")
+          (parallel [(const_int 8) (const_int 9)
+                     (const_int 10) (const_int 11)
+                     (const_int 12) (const_int 13)
+                     (const_int 14) (const_int 15)
+                     (const_int 0) (const_int 1)
+                     (const_int 2) (const_int 3)
+                     (const_int 4) (const_int 5)
+                     (const_int 6) (const_int 7)])))]
+  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V16QImode)"
+  "xxpermdi %x0,%x1,%x1,2"
   [(set_attr "type" "vecperm")])
 
+;; lxvd2x for little endian loads.  We need several of
+;; these since the form of the PARALLEL differs by mode.
+(define_insn "*vsx_lxvd2x2_le_<mode>"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+        (vec_select:VSX_D
+          (match_operand:VSX_D 1 "memory_operand" "Z")
+          (parallel [(const_int 1) (const_int 0)])))]
+  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)"
+  "lxvd2x %x0,%y1"
+  [(set_attr "type" "vecload")])
+
+(define_insn "*vsx_lxvd2x4_le_<mode>"
+  [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa")
+        (vec_select:VSX_W
+          (match_operand:VSX_W 1 "memory_operand" "Z")
+          (parallel [(const_int 2) (const_int 3)
+                     (const_int 0) (const_int 1)])))]
+  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)"
+  "lxvd2x %x0,%y1"
+  [(set_attr "type" "vecload")])
+
+(define_insn "*vsx_lxvd2x8_le_V8HI"
+  [(set (match_operand:V8HI 0 "vsx_register_operand" "=wa")
+        (vec_select:V8HI
+          (match_operand:V8HI 1 "memory_operand" "Z")
+          (parallel [(const_int 4) (const_int 5)
+                     (const_int 6) (const_int 7)
+                     (const_int 0) (const_int 1)
+                     (const_int 2) (const_int 3)])))]
+  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V8HImode)"
+  "lxvd2x %x0,%y1"
+  [(set_attr "type" "vecload")])
+
+(define_insn "*vsx_lxvd2x16_le_V16QI"
+  [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa")
+        (vec_select:V16QI
+          (match_operand:V16QI 1 "memory_operand" "Z")
+          (parallel [(const_int 8) (const_int 9)
+                     (const_int 10) (const_int 11)
+                     (const_int 12) (const_int 13)
+                     (const_int 14) (const_int 15)
+                     (const_int 0) (const_int 1)
+                     (const_int 2) (const_int 3)
+                     (const_int 4) (const_int 5)
+                     (const_int 6) (const_int 7)])))]
+  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V16QImode)"
+  "lxvd2x %x0,%y1"
+  [(set_attr "type" "vecload")])
+
+;; stxvd2x for little endian stores.  We need several of
+;; these since the form of the PARALLEL differs by mode.
+(define_insn "*vsx_stxvd2x2_le_<mode>"
+  [(set (match_operand:VSX_D 0 "memory_operand" "=Z")
+        (vec_select:VSX_D
+          (match_operand:VSX_D 1 "vsx_register_operand" "wa")
+          (parallel [(const_int 1) (const_int 0)])))]
+  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)"
+  "stxvd2x %x1,%y0"
+  [(set_attr "type" "vecstore")])
+
+(define_insn "*vsx_stxvd2x4_le_<mode>"
+  [(set (match_operand:VSX_W 0 "memory_operand" "=Z")
+        (vec_select:VSX_W
+          (match_operand:VSX_W 1 "vsx_register_operand" "wa")
+          (parallel [(const_int 2) (const_int 3)
+                     (const_int 0) (const_int 1)])))]
+  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)"
+  "stxvd2x %x1,%y0"
+  [(set_attr "type" "vecstore")])
+
+(define_insn "*vsx_stxvd2x8_le_V8HI"
+  [(set (match_operand:V8HI 0 "memory_operand" "=Z")
+        (vec_select:V8HI
+          (match_operand:V8HI 1 "vsx_register_operand" "wa")
+          (parallel [(const_int 4) (const_int 5)
+                     (const_int 6) (const_int 7)
+                     (const_int 0) (const_int 1)
+                     (const_int 2) (const_int 3)])))]
+  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V8HImode)"
+  "stxvd2x %x1,%y0"
+  [(set_attr "type" "vecstore")])
+
+(define_insn "*vsx_stxvd2x16_le_V16QI"
+  [(set (match_operand:V16QI 0 "memory_operand" "=Z")
+        (vec_select:V16QI
+          (match_operand:V16QI 1 "vsx_register_operand" "wa")
+          (parallel [(const_int 8) (const_int 9)
+                     (const_int 10) (const_int 11)
+                     (const_int 12) (const_int 13)
+                     (const_int 14) (const_int 15)
+                     (const_int 0) (const_int 1)
+                     (const_int 2) (const_int 3)
+                     (const_int 4) (const_int 5)
+                     (const_int 6) (const_int 7)])))]
+  "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V16QImode)"
+  "stxvd2x %x1,%y0"
+  [(set_attr "type" "vecstore")])
+
 ;; Set the element of a V2DI/VD2F mode
 (define_insn "vsx_set_<mode>"
   [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wd,?wa")
@@ -987,9 +1497,10 @@ 
 		      UNSPEC_VSX_SET))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
 {
-  if (INTVAL (operands[3]) == 0)
+  int idx_first = BYTES_BIG_ENDIAN ? 0 : 1;
+  if (INTVAL (operands[3]) == idx_first)
     return \"xxpermdi %x0,%x2,%x1,1\";
-  else if (INTVAL (operands[3]) == 1)
+  else if (INTVAL (operands[3]) == 1 - idx_first)
     return \"xxpermdi %x0,%x1,%x2,0\";
   else
     gcc_unreachable ();
@@ -1004,8 +1515,12 @@ 
 			[(match_operand:QI 2 "u5bit_cint_operand" "i,i,i")])))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
 {
+  int fldDM;
   gcc_assert (UINTVAL (operands[2]) <= 1);
-  operands[3] = GEN_INT (INTVAL (operands[2]) << 1);
+  fldDM = INTVAL (operands[2]) << 1;
+  if (!BYTES_BIG_ENDIAN)
+    fldDM = 3 - fldDM;
+  operands[3] = GEN_INT (fldDM);
   return \"xxpermdi %x0,%x1,%x1,%3\";
 }
   [(set_attr "type" "vecperm")])
@@ -1025,6 +1540,21 @@ 
 	(const_string "fpload")))
    (set_attr "length" "4")])  
 
+;; Optimize extracting element 1 from memory for little endian
+(define_insn "*vsx_extract_<mode>_one_le"
+  [(set (match_operand:<VS_scalar> 0 "vsx_register_operand" "=ws,d,?wa")
+	(vec_select:<VS_scalar>
+	 (match_operand:VSX_D 1 "indexed_or_indirect_operand" "Z,Z,Z")
+	 (parallel [(const_int 1)])))]
+  "VECTOR_MEM_VSX_P (<MODE>mode) && !WORDS_BIG_ENDIAN"
+  "lxsd%U1x %x0,%y1"
+  [(set (attr "type")
+      (if_then_else
+	(match_test "update_indexed_address_mem (operands[1], VOIDmode)")
+	(const_string "fpload_ux")
+	(const_string "fpload")))
+   (set_attr "length" "4")])  
+
 ;; Extract a SF element from V4SF
 (define_insn_and_split "vsx_extract_v4sf"
   [(set (match_operand:SF 0 "vsx_register_operand" "=f,f")
@@ -1045,7 +1575,7 @@ 
   rtx op2 = operands[2];
   rtx op3 = operands[3];
   rtx tmp;
-  HOST_WIDE_INT ele = INTVAL (op2);
+  HOST_WIDE_INT ele = BYTES_BIG_ENDIAN ? INTVAL (op2) : 3 - INTVAL (op2);
 
   if (ele == 0)
     tmp = op1;
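
The vsx.md additions above all turn on one idea: in little-endian mode,
lxvd2x/stxvd2x transfer the bytes within each doubleword correctly but leave
the two doublewords in the opposite halves of the register from the element
order the compiler expects.  Each permuting load therefore becomes lxvd2x plus
an xxpermdi ...,2 that swaps the halves, and each store is the mirror
sequence; the post-reload store split swaps the source register in place,
stores, and swaps it back, since new pseudos cannot be created after reload
and the source may still be live.  A small scalar model of the load side,
assuming a two-doubleword view of the register (struct vreg and the helper
names are illustrative only):

#include <stdint.h>
#include <assert.h>

struct vreg { uint64_t dw[2]; };   /* dw[0] models element 0 of a V2DI.  */

/* In LE mode the raw lxvd2x result is half-swapped relative to the
   element order the compiler wants.  */
static struct vreg
raw_lxvd2x_le (const uint64_t mem[2])
{
  struct vreg r = { { mem[1], mem[0] } };
  return r;
}

/* xxpermdi %x0,%x1,%x1,2 swaps the two 64-bit halves back.  */
static struct vreg
xxpermdi_swap (struct vreg v)
{
  struct vreg r = { { v.dw[1], v.dw[0] } };
  return r;
}

int
main (void)
{
  uint64_t mem[2] = { 0x1111111111111111ULL, 0x2222222222222222ULL };
  struct vreg v = xxpermdi_swap (raw_lxvd2x_le (mem));
  /* After the swap the V2DI elements appear in memory order again.  */
  assert (v.dw[0] == mem[0] && v.dw[1] == mem[1]);
  return 0;
}
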
Index: gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/fusion.c
===================================================================
--- gcc-4_8-test.orig/gcc/testsuite/gcc.target/powerpc/fusion.c
+++ gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/fusion.c
@@ -1,5 +1,6 @@ 
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-skip-if "" { powerpc*le-*-* } { "*" } { "" } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
 /* { dg-options "-mcpu=power7 -mtune=power8 -O3" } */
 
Index: gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/pr43154.c
===================================================================
--- gcc-4_8-test.orig/gcc/testsuite/gcc.target/powerpc/pr43154.c
+++ gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/pr43154.c
@@ -1,5 +1,6 @@ 
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-skip-if "" { powerpc*le-*-* } { "*" } { "" } } */
 /* { dg-require-effective-target powerpc_vsx_ok } */
 /* { dg-options "-O2 -mcpu=power7" } */
 
Index: gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/altivec-perm-1.c
===================================================================
--- gcc-4_8-test.orig/gcc/testsuite/gcc.target/powerpc/altivec-perm-1.c
+++ gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/altivec-perm-1.c
@@ -19,19 +19,6 @@  V b4(V x)
   return __builtin_shuffle(x, (V){ 4,5,6,7, 4,5,6,7, 4,5,6,7, 4,5,6,7, });
 }
 
-V p2(V x, V y)
-{
-  return __builtin_shuffle(x, y,
-	(V){ 1,  3,  5,  7,  9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31 });
-
-}
-
-V p4(V x, V y)
-{
-  return __builtin_shuffle(x, y,
-	(V){ 2,  3,  6,  7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31 });
-}
-
 V h1(V x, V y)
 {
   return __builtin_shuffle(x, y,
@@ -72,5 +59,3 @@  V l4(V x, V y)
 /* { dg-final { scan-assembler "vspltb" } } */
 /* { dg-final { scan-assembler "vsplth" } } */
 /* { dg-final { scan-assembler "vspltw" } } */
-/* { dg-final { scan-assembler "vpkuhum" } } */
-/* { dg-final { scan-assembler "vpkuwum" } } */
Index: gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/altivec-perm-3.c
===================================================================
--- /dev/null
+++ gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/altivec-perm-3.c
@@ -0,0 +1,23 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-skip-if "" { powerpc*le-*-* } { "*" } { "" } } */
+/* { dg-options "-O -maltivec -mno-vsx" } */
+
+typedef unsigned char V __attribute__((vector_size(16)));
+
+V p2(V x, V y)
+{
+  return __builtin_shuffle(x, y,
+	(V){ 1,  3,  5,  7,  9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31 });
+
+}
+
+V p4(V x, V y)
+{
+  return __builtin_shuffle(x, y,
+	(V){ 2,  3,  6,  7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31 });
+}
+
+/* { dg-final { scan-assembler-not "vperm" } } */
+/* { dg-final { scan-assembler "vpkuhum" } } */
+/* { dg-final { scan-assembler "vpkuwum" } } */
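
The two __builtin_shuffle tests moved into this new file select the odd bytes
and the odd byte pairs of the 32-byte concatenation of x and y.  On big endian
those are the low-order byte of each halfword and the low-order halfword of
each word, which is what vpkuhum and vpkuwum produce, so the scan-assembler
expectations are kept but the test is restricted to big endian.  A sketch of
the index arithmetic (the helper names are illustrative only):

#include <stdio.h>

/* Byte offsets, within a big-endian halfword or word, of its low-order
   byte or halfword: this is why the { 1,3,5,... } shuffle above maps to
   vpkuhum and the { 2,3,6,7,... } shuffle to vpkuwum on BE.  */
static int low_byte_of_halfword (int i) { return 2 * i + 1; }
static int low_half_of_word (int i)     { return 4 * i + 2; }

int
main (void)
{
  for (int i = 0; i < 4; i++)
    printf ("halfword %d low byte at %d; word %d low halfword at bytes %d,%d\n",
            i, low_byte_of_halfword (i), i,
            low_half_of_word (i), low_half_of_word (i) + 1);
  return 0;
}
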
Index: gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/eg-5.c
===================================================================
--- gcc-4_8-test.orig/gcc/testsuite/gcc.dg/vmx/eg-5.c
+++ gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/eg-5.c
@@ -7,10 +7,17 @@  matvecmul4 (vector float c0, vector floa
   /* Set result to a vector of f32 0's */
   vector float result = ((vector float){0.,0.,0.,0.});
 
+#ifdef __LITTLE_ENDIAN__
+  result  = vec_madd (c0, vec_splat (v, 3), result);
+  result  = vec_madd (c1, vec_splat (v, 2), result);
+  result  = vec_madd (c2, vec_splat (v, 1), result);
+  result  = vec_madd (c3, vec_splat (v, 0), result);
+#else
   result  = vec_madd (c0, vec_splat (v, 0), result);
   result  = vec_madd (c1, vec_splat (v, 1), result);
   result  = vec_madd (c2, vec_splat (v, 2), result);
   result  = vec_madd (c3, vec_splat (v, 3), result);
+#endif
 
   return result;
 }
Index: gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/gcc-bug-i.c
===================================================================
--- gcc-4_8-test.orig/gcc/testsuite/gcc.dg/vmx/gcc-bug-i.c
+++ gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/gcc-bug-i.c
@@ -13,12 +13,27 @@ 
 #define DO_INLINE __attribute__ ((always_inline))
 #define DONT_INLINE __attribute__ ((noinline))
 
+#ifdef __LITTLE_ENDIAN__
+static inline DO_INLINE int inline_me(vector signed short data)
+{
+  union {vector signed short v; signed short s[8];} u;
+  signed short x;
+  unsigned char x1, x2;
+
+  u.v = data;
+  x = u.s[7];
+  x1 = (x >> 8) & 0xff;
+  x2 = x & 0xff;
+  return ((x2 << 8) | x1);
+}
+#else
 static inline DO_INLINE int inline_me(vector signed short data) 
 {
   union {vector signed short v; signed short s[8];} u;
   u.v = data;
   return u.s[7];
 }
+#endif
 
 static DONT_INLINE int foo(vector signed short data)
 {
Index: gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/vec-set.c
===================================================================
--- /dev/null
+++ gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/vec-set.c
@@ -0,0 +1,14 @@ 
+#include "harness.h"
+
+vector short
+vec_set (short m)
+{
+  return (vector short){m, 0, 0, 0, 0, 0, 0, 0};
+}
+
+static void test()
+{
+  check (vec_all_eq (vec_set (7),
+		     ((vector short){7, 0, 0, 0, 0, 0, 0, 0})),
+	 "vec_set");
+}
Index: gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/3b-15.c
===================================================================
--- gcc-4_8-test.orig/gcc/testsuite/gcc.dg/vmx/3b-15.c
+++ gcc-4_8-test/gcc/testsuite/gcc.dg/vmx/3b-15.c
@@ -3,7 +3,11 @@ 
 vector unsigned char
 f (vector unsigned char a, vector unsigned char b, vector unsigned char c)
 {
+#ifdef __BIG_ENDIAN__
   return vec_perm(a,b,c); 
+#else
+  return vec_perm(b,a,c);
+#endif
 }
 
 static void test()
@@ -12,8 +16,13 @@  static void test()
 					    8,9,10,11,12,13,14,15}),
 		     ((vector unsigned char){70,71,72,73,74,75,76,77,
 					    78,79,80,81,82,83,84,85}),
+#ifdef __BIG_ENDIAN__
 		     ((vector unsigned char){0x1,0x14,0x18,0x10,0x16,0x15,0x19,0x1a,
 					    0x1c,0x1c,0x1c,0x12,0x8,0x1d,0x1b,0xe})),
+#else
+                     ((vector unsigned char){0x1e,0xb,0x7,0xf,0x9,0xa,0x6,0x5,
+                                            0x3,0x3,0x3,0xd,0x17,0x2,0x4,0x11})),
+#endif
 		   ((vector unsigned char){1,74,78,70,76,75,79,80,82,82,82,72,8,83,81,14})),
 	"f");
 }
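
The little-endian path in this test is derived mechanically from the
big-endian one: f() exchanges the two data operands of vec_perm, and every
byte s of the permute control literal in test() becomes 31 - s, while the
expected result vector is unchanged.  A small helper that performs the
control-vector rewrite and reproduces the __LITTLE_ENDIAN__ literal above
(the function name is illustrative only):

#include <stdio.h>

/* Rewrite a big-endian vperm control vector for the little-endian
   variant of the test: each selector byte s becomes 31 - s (the data
   operands are exchanged separately, in f() above).  */
static void
le_control (const unsigned char be[16], unsigned char le[16])
{
  for (int i = 0; i < 16; i++)
    le[i] = 31 - be[i];
}

int
main (void)
{
  unsigned char be[16] = { 0x1, 0x14, 0x18, 0x10, 0x16, 0x15, 0x19, 0x1a,
                           0x1c, 0x1c, 0x1c, 0x12, 0x8, 0x1d, 0x1b, 0xe };
  unsigned char le[16];
  le_control (be, le);
  for (int i = 0; i < 16; i++)
    printf ("%#x%s", le[i], i == 15 ? "\n" : ",");
  return 0;
}
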
Index: gcc-4_8-test/libcpp/lex.c
===================================================================
--- gcc-4_8-test.orig/libcpp/lex.c
+++ gcc-4_8-test/libcpp/lex.c
@@ -559,8 +559,13 @@  search_line_fast (const uchar *s, const
      beginning with all ones and shifting in zeros according to the
      mis-alignment.  The LVSR instruction pulls the exact shift we
      want from the address.  */
+#ifdef __BIG_ENDIAN__
   mask = __builtin_vec_lvsr(0, s);
   mask = __builtin_vec_perm(zero, ones, mask);
+#else
+  mask = __builtin_vec_lvsl(0, s);
+  mask = __builtin_vec_perm(ones, zero, mask);
+#endif
   data &= mask;
 
   /* While altivec loads mask addresses, we still need to align S so
@@ -624,7 +629,11 @@  search_line_fast (const uchar *s, const
     /* L now contains 0xff in bytes for which we matched one of the
        relevant characters.  We can find the byte index by finding
        its bit index and dividing by 8.  */
+#ifdef __BIG_ENDIAN__
     l = __builtin_clzl(l) >> 3;
+#else
+    l = __builtin_ctzl(l) >> 3;
+#endif
     return s + l;
 
 #undef N
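
The libcpp change makes two little-endian adjustments: the mask that hides the
bytes before the misaligned start is built from lvsl with the vec_perm
operands reversed, and the final byte index is computed with
count-trailing-zeros instead of count-leading-zeros, because on little endian
the lowest-addressed byte of the extracted word ends up in the least
significant bits.  A small model of the index step, assuming a 64-bit
unsigned long and a little-endian host for the memcpy view (the helper name
is illustrative only):

#include <stdint.h>
#include <string.h>
#include <stdio.h>

/* L holds one result byte per input byte, in memory order.  On BE the
   lowest-addressed byte sits in the most significant bits of the word,
   so leading zeros locate the first match; on LE it sits in the least
   significant bits, so trailing zeros do.  */
static unsigned
first_match_index (unsigned long l, int big_endian)
{
  return (big_endian ? __builtin_clzl (l) : __builtin_ctzl (l)) >> 3;
}

int
main (void)
{
  unsigned char bytes[8] = { 0, 0, 0xff, 0, 0, 0, 0, 0 };  /* match at index 2 */
  unsigned long le_word, be_word = 0;
  memcpy (&le_word, bytes, sizeof le_word);     /* little-endian host view */
  for (int i = 0; i < 8; i++)
    be_word = (be_word << 8) | bytes[i];        /* big-endian view */
  printf ("LE index: %u  BE index: %u\n",
          first_match_index (le_word, 0), first_match_index (be_word, 1));
  return 0;
}
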
Index: gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/pr48258-1.c
===================================================================
--- gcc-4_8-test.orig/gcc/testsuite/gcc.target/powerpc/pr48258-1.c
+++ gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/pr48258-1.c
@@ -1,5 +1,6 @@ 
 /* { dg-do compile } */
 /* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-skip-if "" { powerpc*le-*-* } { "*" } { "" } } */
 /* { dg-require-effective-target powerpc_vsx_ok } */
 /* { dg-options "-O3 -mcpu=power7 -mabi=altivec -ffast-math -fno-unroll-loops" } */
 /* { dg-final { scan-assembler-times "xvaddsp" 3 } } */
Index: gcc-4_8-test/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-34.c
===================================================================
--- gcc-4_8-test.orig/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-34.c
+++ gcc-4_8-test/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-34.c
@@ -1,4 +1,5 @@ 
 /* { dg-require-effective-target vect_int } */
+/* { dg-skip-if "cost too high" { powerpc*le-*-* } { "*" } { "" } } */
 
 #include <stdarg.h>
 #include "../../tree-vect.h"