More improvements to sparc VIS vec_init code generation.

Submitter David Miller
Date Nov. 6, 2011, 2:39 a.m.
Message ID <20111105.223920.1520491100204286381.davem@redhat.com>
Permalink /patch/123910/
State New

Comments

David Miller - Nov. 6, 2011, 2:39 a.m.
Eric, the testsuite target tests for VIS2- and VIS3-capable hardware
work well in my own testing, but if you find some problem with how
it's done just let me know and I'll try to fix it up.

I'm almost 100% satisfied with the code generation for vec_init now.
The one remaining case where I think we can do better is initializing
a V8QImode vector using bshuffle with more than 4 unique inputs.

I've been trying to come up with a trick to use fpmerge to get the
last few bytes into place for the bshuffle, but it's a bit of a
challenge because the bytes don't propagate into the destination in a
convenient way.  In particular it can't be done without fighting the
natural move coalescing from the RTL optimizers that cleans up these
RTL expansions.

The vector_init_bshuffle() code tries to rely upon moves as much as
possible to put the bshuffle inputs into place, because such moves
are typically completely optimized away by the compiler.

So something like:

__v4hi foo(short a, short b, short c, short d)
{
  __v4hi x = { a, b, c, d };
  return x;
}

generates:

foo:
	movwtos	%o0, %f2
	movwtos	%o1, %f3
	movwtos	%o2, %f4
	movwtos	%o3, %f5
	sethi	%hi(bmask_val), %g1
	or	%g1, %lo(bmask_val), %g1
	bmask	%g1, %g0, %g1
	retl
	 bshuffle %f2, %f4, %f0

for VIS3.
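
As a rough illustration of where a value like bmask_val comes from, the
sketch below mirrors the mask loop in vector_init_bshuffle() from the patch.
The helper name bshuffle_mask and its elt_size parameter are hypothetical and
not part of the patch. Each idxs[i] is the index (0..15) of the last byte of
element i within the 16 bytes formed by the two bshuffle inputs. The patch
assigns 0x7 to the first unique input, 0xf to the second, 0x3 to the third
and 0xb to the fourth.

```c
#include <stdint.h>

/* Hypothetical helper (not in the patch): build the 32-bit bmask value
   from per-element byte indexes, following the loop in
   vector_init_bshuffle().  elt_size is the element size in bytes and is
   assumed to be 1 or 2.  */
static uint32_t
bshuffle_mask (const int *idxs, int n_elts, int elt_size)
{
  uint32_t bmask = 0;
  int i;

  for (i = 0; i < n_elts; i++)
    {
      uint32_t v = (uint32_t) idxs[i];

      if (elt_size == 2)
	{
	  /* A half-word needs two selector nibbles: bytes v-1 and v.  */
	  bmask <<= 8;
	  bmask |= ((v - 1) << 4) | v;
	}
      else
	{
	  /* A byte needs a single selector nibble.  */
	  bmask <<= 4;
	  bmask |= v;
	}
    }
  return bmask;
}
```

Assuming the placement order described above, four distinct half-words
(indexes 0x7, 0xf, 0x3, 0xb) give 0x67ef23ab, and a V4HI splat of a single
input (all indexes 0x7) gives 0x67676767, the constant the old all-same
code used.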

For cases where we only load part of a register input to the bshuffle
instruction, and the rest of the register is "don't care", we emit a
preceding clobber so that the compiler doesn't try to zero-initialize
the uninitialized bits.

Support for the short floating point loads starts to show up here as
well, and I intend to flesh these out, support the short float stores,
and add VIS intrinsic access to them.

Richard, is there a better way to represent this in RTL?  These
instructions basically load a single byte or half-word into the bottom
of a 64-bit float register, and clear the rest of that register with
zeros.  So the v4hi one is essentially loading the vector:

	[(const_int 0) (const_int 0)
         (const_int 0) (mem:HI (register:P ...))]

into the destination 64-bit float reg.
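
To make the intended semantics concrete, here is a small host-side C model
of the short loads as described above. It is only a sketch: the function
names are hypothetical, the 64-bit float register is modelled as a uint64_t,
and lane 0 is assumed to be the most significant half-word, as on big-endian
SPARC.

```c
#include <stdint.h>

/* Model of the half-word short load: the value lands in the bottom
   16 bits of the 64-bit register and the rest is cleared.  */
static uint64_t
short_load_hi (const uint16_t *p)
{
  return (uint64_t) *p;
}

/* Model of the byte short load: same idea with an 8-bit value.  */
static uint64_t
short_load_qi (const uint8_t *p)
{
  return (uint64_t) *p;
}

/* View the modelled register as a V4HI vector; lane 0 is assumed to be
   the most significant half-word.  */
static uint16_t
v4hi_lane (uint64_t reg, int lane)
{
  return (uint16_t) (reg >> (48 - 16 * lane));
}
```

Under this model, loading the half-word 0x3344 yields lanes { 0, 0, 0, 0x3344 },
which is the vector written out above.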

For now I'm simply using an unspec.

Committed to trunk.

gcc/

	* config/sparc/sparc.md (UNSPEC_SHORT_LOAD): New unspec.
	(zero_extend_v8qi_vis, zero_extend_v4hi_vis): New expanders.
	(*zero_extend_v8qi_<P:mode>_insn,
	*zero_extend_v4hi_<P:mode>_insn): New insns.
	* config/sparc/sparc.c (vector_init_move_words)
	(vector_init_prepare_elts, sparc_expand_vector_init_vis2,
	sparc_expand_vector_init_vis1): New functions.
	(vector_init_bshuffle): Rewrite to handle more cases and make use
	of locs[] array prepared by vector_init_prepare_elts.
	(vector_init_fpmerge, vector_init_faligndata): Delete.
	(sparc_expand_vector_init): Rewrite using new infrastructure.

gcc/testsuite/

	* lib/target-supports.exp
	(check_effective_target_ultrasparc_vis2_hw): New proc.
	(check_effective_target_ultrasparc_vis3_hw): New proc.
	* gcc.target/sparc/vec-init-1.inc: New vector init common code.
	* gcc.target/sparc/vec-init-2.inc: Likewise.
	* gcc.target/sparc/vec-init-3.inc: Likewise.
	* gcc.target/sparc/vec-init-1-vis1.c: New test.
	* gcc.target/sparc/vec-init-1-vis2.c: New test.
	* gcc.target/sparc/vec-init-1-vis3.c: New test.
	* gcc.target/sparc/vec-init-2-vis1.c: New test.
	* gcc.target/sparc/vec-init-2-vis2.c: New test.
	* gcc.target/sparc/vec-init-2-vis3.c: New test.
	* gcc.target/sparc/vec-init-3-vis1.c: New test.
	* gcc.target/sparc/vec-init-3-vis2.c: New test.
	* gcc.target/sparc/vec-init-3-vis3.c: New test.
---
 gcc/ChangeLog                                    |   16 +-
 gcc/config/sparc/sparc.c                         |  419 +++++++++++++++++-----
 gcc/config/sparc/sparc.md                        |   43 +++
 gcc/testsuite/ChangeLog                          |   18 +
 gcc/testsuite/gcc.target/sparc/vec-init-1-vis1.c |    5 +
 gcc/testsuite/gcc.target/sparc/vec-init-1-vis2.c |    5 +
 gcc/testsuite/gcc.target/sparc/vec-init-1-vis3.c |    5 +
 gcc/testsuite/gcc.target/sparc/vec-init-1.inc    |   85 +++++
 gcc/testsuite/gcc.target/sparc/vec-init-2-vis1.c |    5 +
 gcc/testsuite/gcc.target/sparc/vec-init-2-vis2.c |    5 +
 gcc/testsuite/gcc.target/sparc/vec-init-2-vis3.c |    5 +
 gcc/testsuite/gcc.target/sparc/vec-init-2.inc    |   94 +++++
 gcc/testsuite/gcc.target/sparc/vec-init-3-vis1.c |    5 +
 gcc/testsuite/gcc.target/sparc/vec-init-3-vis2.c |    5 +
 gcc/testsuite/gcc.target/sparc/vec-init-3-vis3.c |    5 +
 gcc/testsuite/gcc.target/sparc/vec-init-3.inc    |  105 ++++++
 gcc/testsuite/lib/target-supports.exp            |   18 +
 17 files changed, 743 insertions(+), 100 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/sparc/vec-init-1-vis1.c
 create mode 100644 gcc/testsuite/gcc.target/sparc/vec-init-1-vis2.c
 create mode 100644 gcc/testsuite/gcc.target/sparc/vec-init-1-vis3.c
 create mode 100644 gcc/testsuite/gcc.target/sparc/vec-init-1.inc
 create mode 100644 gcc/testsuite/gcc.target/sparc/vec-init-2-vis1.c
 create mode 100644 gcc/testsuite/gcc.target/sparc/vec-init-2-vis2.c
 create mode 100644 gcc/testsuite/gcc.target/sparc/vec-init-2-vis3.c
 create mode 100644 gcc/testsuite/gcc.target/sparc/vec-init-2.inc
 create mode 100644 gcc/testsuite/gcc.target/sparc/vec-init-3-vis1.c
 create mode 100644 gcc/testsuite/gcc.target/sparc/vec-init-3-vis2.c
 create mode 100644 gcc/testsuite/gcc.target/sparc/vec-init-3-vis3.c
 create mode 100644 gcc/testsuite/gcc.target/sparc/vec-init-3.inc
Richard Henderson - Nov. 6, 2011, 5:55 p.m.
On 11/05/2011 07:39 PM, David Miller wrote:
> Richard, is there a better way to represent this in RTL?  These
> instructions basically load a single byte or half-word into the bottom
> of a 64-bit float register, and clear the rest of that register with
> zeros.  So the v4hi one is essentially loading the vector:
> 
> 	[(const_int 0) (const_int 0)
>          (const_int 0) (mem:HI (register:P ...))]

Try

(define_insn "*zero_extend_v4hi_vis"
  [(set (match_operand:V4HI 0 "register_operand" "=e")
	(vec_merge:V4HI
	  (vec_duplicate:V4HI
	    (match_operand:HI 1 "memory_operand" "m"))
	  (match_operand:V4HI 2 "const_zero_operand" "")
	  (const_int 14)))]
  ...
)

(define_expand "zero_extend_v4hi_vis"
  [(set (match_operand:V4HI 0 "register_operand" "=e")
	(vec_merge:V4HI
	  (vec_duplicate:V4HI
	    (match_operand:HI 1 "memory_operand" "m"))
	  (match_dup 2)
	  (const_int 14)))]
  "TARGET_VIS"
{
  operands[2] = CONST0_RTX (V4HImode);
})


r~

Patch

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 2df0736..819ec63 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,17 @@ 
+2011-11-05  David S. Miller  <davem@davemloft.net>
+
+	* config/sparc/sparc.md (UNSPEC_SHORT_LOAD): New unspec.
+	(zero_extend_v8qi_vis, zero_extend_v4hi_vis): New expanders.
+	(*zero_extend_v8qi_<P:mode>_insn,
+	*zero_extend_v4hi_<P:mode>_insn): New insns.
+	* config/sparc/sparc.c (vector_init_move_words)
+	(vector_init_prepare_elts, sparc_expand_vector_init_vis2,
+	sparc_expand_vector_init_vis1): New functions.
+	(vector_init_bshuffle): Rewrite to handle more cases and make use
+	of locs[] array prepared by vector_init_prepare_elts.
+	(vector_init_fpmerge, vector_init_faligndata): Delete.
+	(sparc_expand_vector_init): Rewrite using new infrastructure.
+
 2011-11-05  Joern Rennecke  <joern.rennecke@embecosm.com>
 
 	* config.gcc (epiphany-*-*): New architecture.
@@ -56,7 +70,7 @@ 
 	Remove -mcpu=601 multilib.
 	Remove -Dmpc8260 multilib.
 	* config/rs6000/rtems.h: Allow --float-gprs=... to override grps
-	on E500 targets.	
+	on E500 targets.
 
 2011-11-05  Quentin Neill  <quentin.neill@amd.com>
 
diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c
index 0daa53d..5d22fc0 100644
--- a/gcc/config/sparc/sparc.c
+++ b/gcc/config/sparc/sparc.c
@@ -11280,83 +11280,333 @@  output_v8plus_mult (rtx insn, rtx *operands, const char *name)
 }
 
 static void
-vector_init_bshuffle (rtx target, rtx elt, enum machine_mode mode,
+vector_init_bshuffle (rtx target, rtx *locs, int n_elts, enum machine_mode mode,
 		      enum machine_mode inner_mode)
 {
-      rtx t1, final_insn;
-      int bmask;
+  rtx mid_target, r0_high, r0_low, r1_high, r1_low;
+  enum machine_mode partial_mode;
+  int bmask, i, idxs[8];
 
-      t1 = gen_reg_rtx (mode);
+  partial_mode = (mode == V4HImode
+		  ? V2HImode
+		  : (mode == V8QImode
+		     ? V4QImode : mode));
 
-      elt = convert_modes (SImode, inner_mode, elt, true);
-      emit_move_insn (gen_lowpart(SImode, t1), elt);
+  r0_high = r0_low = NULL_RTX;
+  r1_high = r1_low = NULL_RTX;
 
-      switch (mode)
+  /* Move the pieces into place, as needed, and calculate the nibble
+     indexes for the bmask calculation.  After we execute this loop the
+     locs[] array is no longer needed.  Therefore, to simplify things,
+     we set entries that have been processed already to NULL_RTX.  */
+
+  for (i = 0; i < n_elts; i++)
+    {
+      int j;
+
+      if (locs[i] == NULL_RTX)
+	continue;
+
+      if (!r0_low)
 	{
-	case V2SImode:
-	  final_insn = gen_bshufflev2si_vis (target, t1, t1);
-	  bmask = 0x45674567;
-	  break;
-	case V4HImode:
-	  final_insn = gen_bshufflev4hi_vis (target, t1, t1);
-	  bmask = 0x67676767;
+	  r0_low = locs[i];
+	  idxs[i] = 0x7;
+	}
+      else if (!r1_low)
+	{
+	  r1_low = locs[i];
+	  idxs[i] = 0xf;
+	}
+      else if (!r0_high)
+	{
+	  r0_high = gen_highpart (partial_mode, r0_low);
+	  emit_move_insn (r0_high, gen_lowpart (partial_mode, locs[i]));
+	  idxs[i] = 0x3;
+	}
+      else if (!r1_high)
+	{
+	  r1_high = gen_highpart (partial_mode, r1_low);
+	  emit_move_insn (r1_high, gen_lowpart (partial_mode, locs[i]));
+	  idxs[i] = 0xb;
+	}
+      else
+	gcc_unreachable ();
+
+      for (j = i + 1; j < n_elts; j++)
+	{
+	  if (locs[j] == locs[i])
+	    {
+	      locs[j] = NULL_RTX;
+	      idxs[j] = idxs[i];
+	    }
+	}
+      locs[i] = NULL_RTX;
+    }
+
+  bmask = 0;
+  for (i = 0; i < n_elts; i++)
+    {
+      int v = idxs[i];
+
+      switch (GET_MODE_SIZE (inner_mode))
+	{
+	case 2:
+	  bmask <<= 8;
+	  bmask |= (((v - 1) << 4) | v);
 	  break;
-	case V8QImode:
-	  final_insn = gen_bshufflev8qi_vis (target, t1, t1);
-	  bmask = 0x77777777;
+
+	case 1:
+	  bmask <<= 4;
+	  bmask |= v;
 	  break;
+
 	default:
 	  gcc_unreachable ();
 	}
+    }
+
+  emit_insn (gen_bmasksi_vis (gen_reg_rtx (SImode), CONST0_RTX (SImode),
+			      force_reg (SImode, GEN_INT (bmask))));
+
+  mid_target = target;
+  if (GET_MODE_SIZE (mode) == 4)
+    {
+      mid_target = gen_reg_rtx (mode == V2HImode
+				? V4HImode : V8QImode);
+    }
+
+  if (!r1_low)
+    r1_low = r0_low;
+
+  switch (GET_MODE (mid_target))
+    {
+    case V4HImode:
+      emit_insn (gen_bshufflev4hi_vis (mid_target, r0_low, r1_low));
+      break;
+    case V8QImode:
+      emit_insn (gen_bshufflev8qi_vis (mid_target, r0_low, r1_low));
+      break;
+    default:
+      gcc_unreachable ();
+    }
 
-      emit_insn (gen_bmasksi_vis (gen_reg_rtx (SImode), CONST0_RTX (SImode),
-				  force_reg (SImode, GEN_INT (bmask))));
-      emit_insn (final_insn);
+  if (mid_target != target)
+    emit_move_insn (target, gen_lowpart (partial_mode, mid_target));
 }
 
+static bool
+vector_init_move_words (rtx target, rtx vals, enum machine_mode mode,
+			enum machine_mode inner_mode)
+{
+  switch (mode)
+    {
+    case V1SImode:
+    case V1DImode:
+      emit_move_insn (gen_lowpart (inner_mode, target),
+		      gen_lowpart (inner_mode, XVECEXP (vals, 0, 0)));
+      return true;
+
+    case V2SImode:
+      emit_move_insn (gen_highpart (SImode, target), XVECEXP (vals, 0, 0));
+      emit_move_insn (gen_lowpart (SImode, target), XVECEXP (vals, 0, 1));
+      return true;
+
+    default:
+      break;
+    }
+  return false;
+}
+
+/* Move the elements in rtvec VALS into registers compatible with MODE.
+   Store the rtx for these regs into the corresponding array entry of
+   LOCS.  */
 static void
-vector_init_fpmerge (rtx target, rtx elt, enum machine_mode inner_mode)
+vector_init_prepare_elts (rtx vals, int n_elts, rtx *locs, enum machine_mode mode,
+			  enum machine_mode inner_mode)
 {
-  rtx t1, t2, t3, t3_low;
+  enum machine_mode loc_mode;
+  int i;
 
-  t1 = gen_reg_rtx (V4QImode);
-  elt = convert_modes (SImode, inner_mode, elt, true);
-  emit_move_insn (gen_lowpart (SImode, t1), elt);
+  switch (mode)
+    {
+    case V2HImode:
+      loc_mode = V4HImode;
+      break;
 
-  t2 = gen_reg_rtx (V4QImode);
-  emit_move_insn (t2, t1);
+    case V4QImode:
+      loc_mode = V8QImode;
+      break;
+
+    case V4HImode:
+    case V8QImode:
+      loc_mode = mode;
+      break;
+
+    default:
+      gcc_unreachable ();
+    }
+
+  gcc_assert (GET_MODE_SIZE (inner_mode) <= 4);
+  for (i = 0; i < n_elts; i++)
+    {
+      rtx dst, elt = XVECEXP (vals, 0, i);
+      int j;
+
+      /* Did we see this already?  If so just record its location.  */
+      dst = NULL_RTX;
+      for (j = 0; j < i; j++)
+	{
+	  if (XVECEXP (vals, 0, j) == elt)
+	    {
+	      dst = locs[j];
+	      break;
+	    }
+	}
 
-  t3 = gen_reg_rtx (V8QImode);
-  t3_low = gen_lowpart (V4QImode, t3);
+      if (! dst)
+	{
+	  enum rtx_code code = GET_CODE (elt);
 
-  emit_insn (gen_fpmerge_vis (t3, t1, t2));
-  emit_move_insn (t1, t3_low);
-  emit_move_insn (t2, t3_low);
+	  dst = gen_reg_rtx (loc_mode);
 
-  emit_insn (gen_fpmerge_vis (t3, t1, t2));
-  emit_move_insn (t1, t3_low);
-  emit_move_insn (t2, t3_low);
+	  /* We use different strategies based upon whether the element
+	     is in memory or in a register.  When we start in a register
+	     and we're VIS3 capable, it's always cheaper to use the VIS3
+	     int-->fp register moves since we avoid having to use stack
+	     memory.  */
+	  if ((TARGET_VIS3 && (code == REG || code == SUBREG))
+	      || (CONSTANT_P (elt)
+		  && (const_zero_operand (elt, inner_mode)
+		      || const_all_ones_operand (elt, inner_mode))))
+	    {
+	      elt = convert_modes (SImode, inner_mode, elt, true);
 
-  emit_insn (gen_fpmerge_vis (gen_lowpart (V8QImode, target), t1, t2));
+	      emit_clobber (dst);
+	      emit_move_insn (gen_lowpart (SImode, dst), elt);
+	    }
+	  else
+	    {
+	      rtx m = elt;
+
+	      if (CONSTANT_P (elt))
+		{
+		  m = force_const_mem (inner_mode, elt);
+		}
+	      else if (code != MEM)
+		{
+		  rtx stk = assign_stack_temp (inner_mode, GET_MODE_SIZE(inner_mode), 0);
+		  emit_move_insn (stk, elt);
+		  m = stk;
+		}
+
+	      switch (loc_mode)
+		{
+		case V4HImode:
+		  emit_insn (gen_zero_extend_v4hi_vis (dst, m));
+		  break;
+		case V8QImode:
+		  emit_insn (gen_zero_extend_v8qi_vis (dst, m));
+		  break;
+		default:
+		  gcc_unreachable ();
+		}
+	    }
+	}
+      locs[i] = dst;
+    }
 }
 
 static void
-vector_init_faligndata (rtx target, rtx elt, enum machine_mode inner_mode)
+sparc_expand_vector_init_vis2 (rtx target, rtx *locs, int n_elts, int n_unique,
+			       enum machine_mode mode,
+			       enum machine_mode inner_mode)
 {
-  rtx t1 = gen_reg_rtx (V4HImode);
+  if (n_unique <= 4)
+    {
+      vector_init_bshuffle (target, locs, n_elts, mode, inner_mode);
+    }
+  else
+    {
+      int i;
 
-  elt = convert_modes (SImode, inner_mode, elt, true);
+      gcc_assert (mode == V8QImode);
 
-  emit_move_insn (gen_lowpart (SImode, t1), elt);
+      emit_insn (gen_alignaddrsi_vis (gen_reg_rtx (SImode),
+				      force_reg (SImode, GEN_INT (7)),
+				      CONST0_RTX (SImode)));
+      i = n_elts - 1;
+      emit_insn (gen_faligndatav8qi_vis (target, locs[i], locs[i]));
+      while (--i >= 0)
+	emit_insn (gen_faligndatav8qi_vis (target, locs[i], target));
+    }
+}
+
+static void
+sparc_expand_vector_init_vis1 (rtx target, rtx *locs, int n_elts, int n_unique,
+			       enum machine_mode mode)
+{
+  enum machine_mode full_mode = mode;
+  rtx (*emitter)(rtx, rtx, rtx);
+  int alignaddr_val, i;
+  rtx tmp = target;
+
+  if (n_unique == 1 && mode == V8QImode)
+    {
+      rtx t2, t2_low, t1;
+
+      t1 = gen_reg_rtx (V4QImode);
+      emit_move_insn (t1, gen_lowpart (V4QImode, locs[0]));
+
+      t2 = gen_reg_rtx (V8QImode);
+      t2_low = gen_lowpart (V4QImode, t2);
+
+      /* xxxxxxAA --> xxxxxxxxxxxxAAAA
+         xxxxAAAA --> xxxxxxxxAAAAAAAA
+         AAAAAAAA --> AAAAAAAAAAAAAAAA */
+      emit_insn (gen_fpmerge_vis (t2, t1, t1));
+      emit_move_insn (t1, t2_low);
+      emit_insn (gen_fpmerge_vis (t2, t1, t1));
+      emit_move_insn (t1, t2_low);
+      emit_insn (gen_fpmerge_vis (target, t1, t1));
+      return;
+    }
+
+  switch (mode)
+    {
+    case V2HImode:
+      full_mode = V4HImode;
+      /* FALLTHRU */
+    case V4HImode:
+      emitter = gen_faligndatav4hi_vis;
+      alignaddr_val = 6;
+      break;
+
+    case V4QImode:
+      full_mode = V8QImode;
+      /* FALLTHRU */
+    case V8QImode:
+      emitter = gen_faligndatav8qi_vis;
+      alignaddr_val = 7;
+      break;
+
+    default:
+      gcc_unreachable ();
+    }
+
+  if (full_mode != mode)
+    tmp = gen_reg_rtx (full_mode);
 
   emit_insn (gen_alignaddrsi_vis (gen_reg_rtx (SImode),
-				  force_reg (SImode, GEN_INT (6)),
+				  force_reg (SImode, GEN_INT (alignaddr_val)),
 				  CONST0_RTX (SImode)));
 
-  emit_insn (gen_faligndatav4hi_vis (target, t1, target));
-  emit_insn (gen_faligndatav4hi_vis (target, t1, target));
-  emit_insn (gen_faligndatav4hi_vis (target, t1, target));
-  emit_insn (gen_faligndatav4hi_vis (target, t1, target));
+  i = n_elts - 1;
+  emit_insn (emitter (tmp, locs[i], locs[i]));
+  while (--i >= 0)
+    emit_insn (emitter (tmp, locs[i], tmp));
+
+  if (tmp != target)
+    emit_move_insn (target, gen_highpart (mode, tmp));
 }
 
 void
@@ -11365,19 +11615,30 @@  sparc_expand_vector_init (rtx target, rtx vals)
   enum machine_mode mode = GET_MODE (target);
   enum machine_mode inner_mode = GET_MODE_INNER (mode);
   int n_elts = GET_MODE_NUNITS (mode);
-  int i, n_var = 0;
-  bool all_same;
-  rtx mem;
+  int i, n_var = 0, n_unique = 0;
+  rtx locs[8];
+
+  gcc_assert (n_elts <= 8);
 
-  all_same = true;
   for (i = 0; i < n_elts; i++)
     {
       rtx x = XVECEXP (vals, 0, i);
+      bool found = false;
+      int j;
+
       if (!CONSTANT_P (x))
 	n_var++;
 
-      if (i > 0 && !rtx_equal_p (x, XVECEXP (vals, 0, 0)))
-	all_same = false;
+      for (j = 0; j < i; j++)
+	{
+	  if (rtx_equal_p (x, XVECEXP (vals, 0, j)))
+	    {
+	      found = true;
+	      break;
+	    }
+	}
+      if (!found)
+	n_unique++;
     }
 
   if (n_var == 0)
@@ -11386,56 +11647,16 @@  sparc_expand_vector_init (rtx target, rtx vals)
       return;
     }
 
-  if (GET_MODE_SIZE (inner_mode) == GET_MODE_SIZE (mode))
-    {
-      if (GET_MODE_SIZE (inner_mode) == 4)
-	{
-	  emit_move_insn (gen_lowpart (SImode, target),
-			  gen_lowpart (SImode, XVECEXP (vals, 0, 0)));
-	  return;
-	}
-      else if (GET_MODE_SIZE (inner_mode) == 8)
-	{
-	  emit_move_insn (gen_lowpart (DImode, target),
-			  gen_lowpart (DImode, XVECEXP (vals, 0, 0)));
-	  return;
-	}
-    }
-  else if (GET_MODE_SIZE (inner_mode) == GET_MODE_SIZE (word_mode)
-	   && GET_MODE_SIZE (mode) == 2 * GET_MODE_SIZE (word_mode))
-    {
-      emit_move_insn (gen_highpart (word_mode, target),
-		      gen_lowpart (word_mode, XVECEXP (vals, 0, 0)));
-      emit_move_insn (gen_lowpart (word_mode, target),
-		      gen_lowpart (word_mode, XVECEXP (vals, 0, 1)));
-      return;
-    }
+  if (vector_init_move_words (target, vals, mode, inner_mode))
+    return;
 
-  if (all_same && GET_MODE_SIZE (mode) == 8)
-    {
-      if (TARGET_VIS2)
-	{
-	  vector_init_bshuffle (target, XVECEXP (vals, 0, 0), mode, inner_mode);
-	  return;
-	}
-      if (mode == V8QImode)
-	{
-	  vector_init_fpmerge (target, XVECEXP (vals, 0, 0), inner_mode);
-	  return;
-	}
-      if (mode == V4HImode)
-	{
-	  vector_init_faligndata (target, XVECEXP (vals, 0, 0), inner_mode);
-	  return;
-	}
-    }
+  vector_init_prepare_elts (vals, n_elts, locs, mode, inner_mode);
 
-  mem = assign_stack_temp (mode, GET_MODE_SIZE (mode), 0);
-  for (i = 0; i < n_elts; i++)
-    emit_move_insn (adjust_address_nv (mem, inner_mode,
-				    i * GET_MODE_SIZE (inner_mode)),
-		    XVECEXP (vals, 0, i));
-  emit_move_insn (target, mem);
+  if (TARGET_VIS2)
+    sparc_expand_vector_init_vis2 (target, locs, n_elts, n_unique,
+				   mode, inner_mode);
+  else
+    sparc_expand_vector_init_vis1 (target, locs, n_elts, n_unique, mode);
 }
 
 static reg_class_t
diff --git a/gcc/config/sparc/sparc.md b/gcc/config/sparc/sparc.md
index d4827bd..7452f96 100644
--- a/gcc/config/sparc/sparc.md
+++ b/gcc/config/sparc/sparc.md
@@ -92,6 +92,7 @@ 
    (UNSPEC_MUL8			86)
    (UNSPEC_MUL8SU		87)
    (UNSPEC_MULDSU		88)
+   (UNSPEC_SHORT_LOAD		89)
   ])
 
 (define_constants
@@ -7830,6 +7831,48 @@ 
   DONE;
 })
 
+(define_expand "zero_extend_v8qi_vis"
+  [(set (match_operand:V8QI 0 "register_operand" "")
+        (unspec:V8QI [(match_operand:QI 1 "memory_operand" "")]
+                     UNSPEC_SHORT_LOAD))]
+  "TARGET_VIS"
+{
+  if (! REG_P (XEXP (operands[1], 0)))
+    {
+      rtx addr = force_reg (Pmode, XEXP (operands[1], 0));
+      operands[1] = replace_equiv_address (operands[1], addr);
+    }
+})
+
+(define_expand "zero_extend_v4hi_vis"
+  [(set (match_operand:V4HI 0 "register_operand" "")
+        (unspec:V4HI [(match_operand:HI 1 "memory_operand" "")]
+                     UNSPEC_SHORT_LOAD))]
+  "TARGET_VIS"
+{
+  if (! REG_P (XEXP (operands[1], 0)))
+    {
+      rtx addr = force_reg (Pmode, XEXP (operands[1], 0));
+      operands[1] = replace_equiv_address (operands[1], addr);
+    }
+})
+
+(define_insn "*zero_extend_v8qi_<P:mode>_insn"
+  [(set (match_operand:V8QI 0 "register_operand" "=e")
+        (unspec:V8QI [(mem:QI
+                       (match_operand:P 1 "register_operand" "r"))]
+                     UNSPEC_SHORT_LOAD))]
+  "TARGET_VIS"
+  "ldda\t[%1] 0xd0, %0")
+
+(define_insn "*zero_extend_v4hi_<P:mode>_insn"
+  [(set (match_operand:V4HI 0 "register_operand" "=e")
+        (unspec:V4HI [(mem:HI
+                       (match_operand:P 1 "register_operand" "r"))]
+                     UNSPEC_SHORT_LOAD))]
+  "TARGET_VIS"
+  "ldda\t[%1] 0xd2, %0")
+
 (define_expand "vec_init<mode>"
   [(match_operand:VMALL 0 "register_operand" "")
    (match_operand:VMALL 1 "" "")]
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 8091789..b84dcf0 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,21 @@ 
+2011-11-05  David S. Miller  <davem@davemloft.net>
+
+	* lib/target-supports.exp
+	(check_effective_target_ultrasparc_vis2_hw): New proc.
+	(check_effective_target_ultrasparc_vis3_hw): New proc.
+	* gcc.target/sparc/vec-init-1.inc: New vector init common code.
+	* gcc.target/sparc/vec-init-2.inc: Likewise.
+	* gcc.target/sparc/vec-init-3.inc: Likewise.
+	* gcc.target/sparc/vec-init-1-vis1.c: New test.
+	* gcc.target/sparc/vec-init-1-vis2.c: New test.
+	* gcc.target/sparc/vec-init-1-vis3.c: New test.
+	* gcc.target/sparc/vec-init-2-vis1.c: New test.
+	* gcc.target/sparc/vec-init-2-vis2.c: New test.
+	* gcc.target/sparc/vec-init-2-vis3.c: New test.
+	* gcc.target/sparc/vec-init-3-vis1.c: New test.
+	* gcc.target/sparc/vec-init-3-vis2.c: New test.
+	* gcc.target/sparc/vec-init-3-vis3.c: New test.
+
 2011-11-05  Joern Rennecke  <joern.rennecke@embecosm.com>
 
 	* gcc.c-torture/execute/ieee/mul-subnormal-single-1.x:
diff --git a/gcc/testsuite/gcc.target/sparc/vec-init-1-vis1.c b/gcc/testsuite/gcc.target/sparc/vec-init-1-vis1.c
new file mode 100644
index 0000000..4202bfa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/sparc/vec-init-1-vis1.c
@@ -0,0 +1,5 @@ 
+/* { dg-do run } */
+/* { dg-require-effective-target ultrasparc_hw } */
+/* { dg-options "-mcpu=ultrasparc -mvis -O2" } */
+
+#include "vec-init-1.inc"
diff --git a/gcc/testsuite/gcc.target/sparc/vec-init-1-vis2.c b/gcc/testsuite/gcc.target/sparc/vec-init-1-vis2.c
new file mode 100644
index 0000000..a5c2132
--- /dev/null
+++ b/gcc/testsuite/gcc.target/sparc/vec-init-1-vis2.c
@@ -0,0 +1,5 @@ 
+/* { dg-do run } */
+/* { dg-require-effective-target ultrasparc_vis2_hw } */
+/* { dg-options "-mcpu=ultrasparc3 -O2" } */
+
+#include "vec-init-1.inc"
diff --git a/gcc/testsuite/gcc.target/sparc/vec-init-1-vis3.c b/gcc/testsuite/gcc.target/sparc/vec-init-1-vis3.c
new file mode 100644
index 0000000..ab916e0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/sparc/vec-init-1-vis3.c
@@ -0,0 +1,5 @@ 
+/* { dg-do run } */
+/* { dg-require-effective-target ultrasparc_vis3_hw } */
+/* { dg-options "-mcpu=niagara3 -O2" } */
+
+#include "vec-init-1.inc"
diff --git a/gcc/testsuite/gcc.target/sparc/vec-init-1.inc b/gcc/testsuite/gcc.target/sparc/vec-init-1.inc
new file mode 100644
index 0000000..e27bb6e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/sparc/vec-init-1.inc
@@ -0,0 +1,85 @@ 
+typedef int __v1si __attribute__ ((__vector_size__ (4)));
+typedef int __v2si __attribute__ ((__vector_size__ (8)));
+typedef short __v2hi __attribute__ ((__vector_size__ (4)));
+typedef short __v4hi __attribute__ ((__vector_size__ (8)));
+typedef unsigned char __v4qi __attribute__ ((__vector_size__ (4)));
+typedef unsigned char __v8qi __attribute__ ((__vector_size__ (8)));
+
+extern void abort (void);
+
+static void
+compare64 (void *p, unsigned long long val)
+{
+  if (*(unsigned long long *)p != val)
+    abort();
+}
+
+static void
+compare32 (void *p, unsigned int val)
+{
+  if (*(unsigned int *)p != val)
+    abort();
+}
+
+static void
+test_v8qi (unsigned char x)
+{
+  __v8qi v = { x, x, x, x, x, x, x, x };
+
+  compare64(&v, 0x4444444444444444ULL);
+}
+
+static void
+test_v4qi (unsigned char x)
+{
+  __v4qi v = { x, x, x, x };
+
+  compare32(&v, 0x44444444);
+}
+
+static void
+test_v4hi (unsigned short x)
+{
+  __v4hi v = { x, x, x, x, };
+
+  compare64(&v, 0x3344334433443344ULL);
+}
+
+static void
+test_v2hi (unsigned short x)
+{
+  __v2hi v = { x, x, };
+
+  compare32(&v, 0x33443344);
+}
+
+static void
+test_v2si (unsigned int x)
+{
+  __v2si v = { x, x, };
+
+  compare64(&v, 0x1122334411223344ULL);
+}
+
+static void
+test_v1si (unsigned int x)
+{
+  __v1si v = { x };
+
+  compare32(&v, 0x11223344);
+}
+
+unsigned char x8 = 0x44;
+unsigned short x16 = 0x3344;
+unsigned int x32 = 0x11223344;
+
+int main(void)
+{
+  test_v8qi (x8);
+  test_v4qi (x8);
+  test_v4hi (x16);
+  test_v2hi (x16);
+  test_v2si (x32);
+  test_v1si (x32);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/sparc/vec-init-2-vis1.c b/gcc/testsuite/gcc.target/sparc/vec-init-2-vis1.c
new file mode 100644
index 0000000..efa08fa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/sparc/vec-init-2-vis1.c
@@ -0,0 +1,5 @@ 
+/* { dg-do run } */
+/* { dg-require-effective-target ultrasparc_hw } */
+/* { dg-options "-mcpu=ultrasparc -mvis -O2" } */
+
+#include "vec-init-2.inc"
diff --git a/gcc/testsuite/gcc.target/sparc/vec-init-2-vis2.c b/gcc/testsuite/gcc.target/sparc/vec-init-2-vis2.c
new file mode 100644
index 0000000..3aa0f51
--- /dev/null
+++ b/gcc/testsuite/gcc.target/sparc/vec-init-2-vis2.c
@@ -0,0 +1,5 @@ 
+/* { dg-do run } */
+/* { dg-require-effective-target ultrasparc_vis2_hw } */
+/* { dg-options "-mcpu=ultrasparc3 -O2" } */
+
+#include "vec-init-2.inc"
diff --git a/gcc/testsuite/gcc.target/sparc/vec-init-2-vis3.c b/gcc/testsuite/gcc.target/sparc/vec-init-2-vis3.c
new file mode 100644
index 0000000..5f0c658
--- /dev/null
+++ b/gcc/testsuite/gcc.target/sparc/vec-init-2-vis3.c
@@ -0,0 +1,5 @@ 
+/* { dg-do run } */
+/* { dg-require-effective-target ultrasparc_vis3_hw } */
+/* { dg-options "-mcpu=niagara3 -O2" } */
+
+#include "vec-init-2.inc"
diff --git a/gcc/testsuite/gcc.target/sparc/vec-init-2.inc b/gcc/testsuite/gcc.target/sparc/vec-init-2.inc
new file mode 100644
index 0000000..13685a1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/sparc/vec-init-2.inc
@@ -0,0 +1,94 @@ 
+typedef short __v2hi __attribute__ ((__vector_size__ (4)));
+typedef short __v4hi __attribute__ ((__vector_size__ (8)));
+
+extern void abort (void);
+
+static void
+compare64 (int n, void *p, unsigned long long val)
+{
+  unsigned long long *x = (unsigned long long *) p;
+
+  if (*x != val)
+    abort();
+}
+
+static void
+compare32 (int n, void *p, unsigned int val)
+{
+  unsigned int *x = (unsigned int *) p;
+  if (*x != val)
+    abort();
+}
+
+#define V2HI_TEST(N, elt0, elt1) \
+static void \
+test_v2hi_##N (unsigned short x, unsigned short y) \
+{ \
+  __v2hi v = { (elt0), (elt1) }; \
+  compare32(N, &v, ((int)(elt0) << 16) | (elt1)); \
+}
+
+V2HI_TEST(1, x, y)
+V2HI_TEST(2, y, x)
+V2HI_TEST(3, x, x)
+V2HI_TEST(4, x, 0)
+V2HI_TEST(5, 0, x)
+V2HI_TEST(6, y, 1)
+V2HI_TEST(7, 1, y)
+V2HI_TEST(8, 2, 3)
+V2HI_TEST(9, 0x400, x)
+V2HI_TEST(10, y, 0x8000)
+
+#define V4HI_TEST(N, elt0, elt1, elt2, elt3)	\
+static void \
+test_v4hi_##N (unsigned short a, unsigned short b, unsigned short c, unsigned short d) \
+{ \
+  __v4hi v = { (elt0), (elt1), (elt2), (elt3) }; \
+  compare64(N, &v, \
+            ((long long)(elt0) << 48) | \
+	    ((long long)(elt1) << 32) | \
+            ((long long)(elt2) << 16) | \
+            ((long long)(elt3))); \
+}
+
+V4HI_TEST(1, a, a, a, a)
+V4HI_TEST(2, a, b, c, d)
+V4HI_TEST(3, a, a, b, b)
+V4HI_TEST(4, d, c, b, a)
+V4HI_TEST(5, a, 0, 0, 0)
+V4HI_TEST(6, a, 0, b, 0)
+V4HI_TEST(7, c, 5, 5, 5)
+V4HI_TEST(8, d, 6, a, 6)
+V4HI_TEST(9, 0x200, 0x300, 0x500, 0x8800)
+V4HI_TEST(10, 0x600, a, a, a)
+
+unsigned short a16 = 0x3344;
+unsigned short b16 = 0x5566;
+unsigned short c16 = 0x7788;
+unsigned short d16 = 0x9911;
+
+int main(void)
+{
+  test_v2hi_1 (a16, b16);
+  test_v2hi_2 (a16, b16);
+  test_v2hi_3 (a16, b16);
+  test_v2hi_4 (a16, b16);
+  test_v2hi_5 (a16, b16);
+  test_v2hi_6 (a16, b16);
+  test_v2hi_7 (a16, b16);
+  test_v2hi_8 (a16, b16);
+  test_v2hi_9 (a16, b16);
+  test_v2hi_10 (a16, b16);
+
+  test_v4hi_1 (a16, b16, c16, d16);
+  test_v4hi_2 (a16, b16, c16, d16);
+  test_v4hi_3 (a16, b16, c16, d16);
+  test_v4hi_4 (a16, b16, c16, d16);
+  test_v4hi_5 (a16, b16, c16, d16);
+  test_v4hi_6 (a16, b16, c16, d16);
+  test_v4hi_7 (a16, b16, c16, d16);
+  test_v4hi_8 (a16, b16, c16, d16);
+  test_v4hi_9 (a16, b16, c16, d16);
+  test_v4hi_10 (a16, b16, c16, d16);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/sparc/vec-init-3-vis1.c b/gcc/testsuite/gcc.target/sparc/vec-init-3-vis1.c
new file mode 100644
index 0000000..6c82610
--- /dev/null
+++ b/gcc/testsuite/gcc.target/sparc/vec-init-3-vis1.c
@@ -0,0 +1,5 @@ 
+/* { dg-do run } */
+/* { dg-require-effective-target ultrasparc_hw } */
+/* { dg-options "-mcpu=ultrasparc -mvis -O2" } */
+
+#include "vec-init-3.inc"
diff --git a/gcc/testsuite/gcc.target/sparc/vec-init-3-vis2.c b/gcc/testsuite/gcc.target/sparc/vec-init-3-vis2.c
new file mode 100644
index 0000000..6424e2f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/sparc/vec-init-3-vis2.c
@@ -0,0 +1,5 @@ 
+/* { dg-do run } */
+/* { dg-require-effective-target ultrasparc_vis2_hw } */
+/* { dg-options "-mcpu=ultrasparc3 -O2" } */
+
+#include "vec-init-3.inc"
diff --git a/gcc/testsuite/gcc.target/sparc/vec-init-3-vis3.c b/gcc/testsuite/gcc.target/sparc/vec-init-3-vis3.c
new file mode 100644
index 0000000..226c108
--- /dev/null
+++ b/gcc/testsuite/gcc.target/sparc/vec-init-3-vis3.c
@@ -0,0 +1,5 @@ 
+/* { dg-do run } */
+/* { dg-require-effective-target ultrasparc_vis3_hw } */
+/* { dg-options "-mcpu=niagara3 -O2" } */
+
+#include "vec-init-3.inc"
diff --git a/gcc/testsuite/gcc.target/sparc/vec-init-3.inc b/gcc/testsuite/gcc.target/sparc/vec-init-3.inc
new file mode 100644
index 0000000..8a3db26
--- /dev/null
+++ b/gcc/testsuite/gcc.target/sparc/vec-init-3.inc
@@ -0,0 +1,105 @@ 
+typedef unsigned char __v4qi __attribute__ ((__vector_size__ (4)));
+typedef unsigned char __v8qi __attribute__ ((__vector_size__ (8)));
+
+extern void abort (void);
+
+static void
+compare64 (int n, void *p, unsigned long long val)
+{
+  unsigned long long *x = (unsigned long long *) p;
+
+  if (*x != val)
+    abort();
+}
+
+static void
+compare32 (int n, void *p, unsigned int val)
+{
+  unsigned int *x = (unsigned int *) p;
+  if (*x != val)
+    abort();
+}
+
+#define V4QI_TEST(N, elt0, elt1, elt2, elt3)	\
+static void \
+test_v4qi_##N (unsigned char a, unsigned char b, unsigned char c, unsigned char d) \
+{ \
+  __v4qi v = { (elt0), (elt1), (elt2), (elt3) };	\
+  compare32(N, &v, ((unsigned int)(elt0) << 24) | \
+	           ((unsigned int)(elt1) << 16) | \
+	           ((unsigned int)(elt2) << 8) | ((unsigned int)(elt3)));	\
+}
+
+V4QI_TEST(1, a, a, a, a)
+V4QI_TEST(2, b, b, b, b)
+V4QI_TEST(3, a, b, c, d)
+V4QI_TEST(4, d, c, b, a)
+V4QI_TEST(5, a, 0, 0, 0)
+V4QI_TEST(6, b, 1, 1, b)
+V4QI_TEST(7, c, 5, d, 5)
+V4QI_TEST(8, 0x20, 0x30, b, a)
+V4QI_TEST(9, 0x40, 0x50, 0x60, 0x70)
+V4QI_TEST(10, 0x40, 0x50, 0x60, c)
+
+#define V8QI_TEST(N, elt0, elt1, elt2, elt3, elt4, elt5, elt6, elt7) \
+static void \
+test_v8qi_##N (unsigned char a, unsigned char b, unsigned char c, unsigned char d, \
+               unsigned char e, unsigned char f, unsigned char g, unsigned char h) \
+{ \
+  __v8qi v = { (elt0), (elt1), (elt2), (elt3), \
+	       (elt4), (elt5), (elt6), (elt7) }; \
+  compare64(N, &v, ((long long)(elt0) << 56) | \
+	           ((long long)(elt1) << 48) | \
+	           ((long long)(elt2) << 40) | \
+	           ((long long)(elt3) << 32) | \
+	           ((long long)(elt4) << 24) | \
+	           ((long long)(elt5) << 16) | \
+	           ((long long)(elt6) << 8) | \
+	           ((long long)(elt7) << 0)); \
+}
+
+V8QI_TEST(1, a, a, a, a, a, a, a, a)
+V8QI_TEST(2, a, b, c, d, e, f, g, h)
+V8QI_TEST(3, h, g, f, e, d, c, b, a)
+V8QI_TEST(4, a, b, a, b, a, b, a, b)
+V8QI_TEST(5, c, b, c, b, c, b, c, a)
+V8QI_TEST(6, a, 0, 0, 0, 0, 0, 0, 0)
+V8QI_TEST(7, b, 1, b, 1, b, 1, b, 1)
+V8QI_TEST(8, c, d, 0x20, a, 0x21, b, 0x23, c)
+V8QI_TEST(9, 1, 2, 3, 4, 5, 6, 7, 8)
+V8QI_TEST(10, a, a, b, b, c, c, d, d)
+
+unsigned char a8 = 0x33;
+unsigned char b8 = 0x55;
+unsigned char c8 = 0x77;
+unsigned char d8 = 0x99;
+unsigned char e8 = 0x11;
+unsigned char f8 = 0x22;
+unsigned char g8 = 0x44;
+unsigned char h8 = 0x66;
+
+int main(void)
+{
+  test_v4qi_1 (a8, b8, c8, d8);
+  test_v4qi_2 (a8, b8, c8, d8);
+  test_v4qi_3 (a8, b8, c8, d8);
+  test_v4qi_4 (a8, b8, c8, d8);
+  test_v4qi_5 (a8, b8, c8, d8);
+  test_v4qi_6 (a8, b8, c8, d8);
+  test_v4qi_7 (a8, b8, c8, d8);
+  test_v4qi_8 (a8, b8, c8, d8);
+  test_v4qi_9 (a8, b8, c8, d8);
+  test_v4qi_10 (a8, b8, c8, d8);
+
+  test_v8qi_1 (a8, b8, c8, d8, e8, f8, g8, h8);
+  test_v8qi_2 (a8, b8, c8, d8, e8, f8, g8, h8);
+  test_v8qi_3 (a8, b8, c8, d8, e8, f8, g8, h8);
+  test_v8qi_4 (a8, b8, c8, d8, e8, f8, g8, h8);
+  test_v8qi_5 (a8, b8, c8, d8, e8, f8, g8, h8);
+  test_v8qi_6 (a8, b8, c8, d8, e8, f8, g8, h8);
+  test_v8qi_7 (a8, b8, c8, d8, e8, f8, g8, h8);
+  test_v8qi_8 (a8, b8, c8, d8, e8, f8, g8, h8);
+  test_v8qi_9 (a8, b8, c8, d8, e8, f8, g8, h8);
+  test_v8qi_10 (a8, b8, c8, d8, e8, f8, g8, h8);
+  return 0;
+}
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index f19c3c5..1ba71f0 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2449,6 +2449,24 @@  proc check_effective_target_ultrasparc_hw { } {
     } "-mcpu=ultrasparc"]
 }
 
+# Return 1 if the test environment supports executing UltraSPARC VIS2
+# instructions.  We check this by attempting: "bmask %g0, %g0, %g0"
+
+proc check_effective_target_ultrasparc_vis2_hw { } {
+    return [check_runtime ultrasparc_vis2_hw {
+	int main() { __asm__(".word 0x81b00320"); return 0; }
+    } "-mcpu=ultrasparc3"]
+}
+
+# Return 1 if the test environment supports executing UltraSPARC VIS3
+# instructions.  We check this by attempting: "addxc %g0, %g0, %g0"
+
+proc check_effective_target_ultrasparc_vis3_hw { } {
+    return [check_runtime ultrasparc_vis3_hw {
+	int main() { __asm__(".word 0x81b00220"); return 0; }
+    } "-mcpu=niagara3"]
+}
+
 # Return 1 if the target supports hardware vector shift operation.
 
 proc check_effective_target_vect_shift { } {