[rs6000] Fix variable permute control vectors for little endian

Message ID 1381360293.6275.31.camel@gnopaine
State New

Commit Message

Bill Schmidt Oct. 9, 2013, 11:11 p.m. UTC
Hi,

This is a follow-up to the recent patch that fixed constant permute
control vectors for little endian.  When the control vector is constant,
we can adjust the constant and use a vperm without increasing code size.
When the control vector is unknown, however, we have to generate two
additional instructions to subtract each element of the control vector
from 31 (equivalently, from -1, since only 5 bits are pertinent).  This
patch adds the additional code generation.

There are two main paths to the affected permutes:  via the known
pattern vec_perm<mode>, and via an altivec builtin.  The builtin path
causes a little difficulty because there's no way to dispatch a builtin
to two different insns for BE and LE.  I solved this by adding two new
unspecs for the builtins (UNSPEC_VPERM_X and UNSPEC_VPERM_UNS_X).  The
insns for the builtins are changed from a define_insn to a
define_insn_and_split.  We create the _X forms at expand time and later
split them into the correct sequences for BE and LE, using the "real"
UNSPEC_VPERM and UNSPEC_VPERM_UNS to generate the vperm instruction.

For the path via the known pattern, I added a new routine in rs6000.c in
similar fashion to the solution for the constant control vector case.

When the permute control vector is a rotate vector loaded by lvsl or
lvsr, we can generate the desired control vector more cheaply by simply
changing to use the opposite instruction.  We are already doing that
when expanding an unaligned load.  The changes in vector.md avoid
undoing that effort by circumventing the subtract-from-splat (going
straight to the UNSPEC_VPERM).

I bootstrapped and tested this for big endian on
powerpc64-unknown-linux-gnu with no new regressions.  I did the same for
little endian on powerpc64le-unknown-linux-gnu.  Here the results were
slightly mixed: the changes fix 32 test failures, but expose an
unrelated bug in 9 others when -mvsx is permitted on LE (not currently
allowed).  The bug is a missing permute for a vector load in the
unaligned vector load logic that will be fixed in a subsequent patch.

Is this okay for trunk?

Thanks,
Bill


2013-10-09  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* config/rs6000/vector.md (vec_realign_load<mode>): Generate vperm
	directly to circumvent subtract from splat{31} workaround.
	* config/rs6000/rs6000-protos.h (altivec_expand_vec_perm_le): New
	prototype.
	* config/rs6000/rs6000.c (altivec_expand_vec_perm_le): New.
	* config/rs6000/altivec.md (define_c_enum "unspec"): Add
	UNSPEC_VPERM_X and UNSPEC_VPERM_UNS_X.
	(altivec_vperm_<mode>): Convert to define_insn_and_split to
	separate big and little endian logic.
	(*altivec_vperm_<mode>_internal): New define_insn.
	(altivec_vperm_<mode>_uns): Convert to define_insn_and_split to
	separate big and little endian logic.
	(*altivec_vperm_<mode>_uns_internal): New define_insn.
	(vec_permv16qi): Add little endian logic.

Comments

David Edelsohn Oct. 11, 2013, 12:39 p.m. UTC | #1
On Wed, Oct 9, 2013 at 7:11 PM, Bill Schmidt
<wschmidt@linux.vnet.ibm.com> wrote:

Okay.

Thanks, David

Patch

Index: gcc/config/rs6000/vector.md
===================================================================
--- gcc/config/rs6000/vector.md	(revision 203246)
+++ gcc/config/rs6000/vector.md	(working copy)
@@ -950,8 +950,15 @@ 
     emit_insn (gen_altivec_vperm_<mode> (operands[0], operands[1],
     	      				 operands[2], operands[3]));
   else
-    emit_insn (gen_altivec_vperm_<mode> (operands[0], operands[2],
-    	      				 operands[1], operands[3]));
+    {
+      /* Avoid the "subtract from splat31" workaround for vperm since
+         we have changed lvsr to lvsl instead.  */
+      rtx unspec = gen_rtx_UNSPEC (<MODE>mode,
+                                   gen_rtvec (3, operands[2],
+                                              operands[1], operands[3]),
+                                   UNSPEC_VPERM);
+      emit_move_insn (operands[0], unspec);
+    }
   DONE;
 })
 
Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h	(revision 203246)
+++ gcc/config/rs6000/rs6000-protos.h	(working copy)
@@ -56,6 +56,7 @@  extern void paired_expand_vector_init (rtx, rtx);
 extern void rs6000_expand_vector_set (rtx, rtx, int);
 extern void rs6000_expand_vector_extract (rtx, rtx, int);
 extern bool altivec_expand_vec_perm_const (rtx op[4]);
+extern void altivec_expand_vec_perm_le (rtx op[4]);
 extern bool rs6000_expand_vec_perm_const (rtx op[4]);
 extern void rs6000_expand_extract_even (rtx, rtx, rtx);
 extern void rs6000_expand_interleave (rtx, rtx, rtx, bool);
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 203247)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -28608,6 +28608,54 @@  altivec_expand_vec_perm_const_le (rtx operands[4])
   emit_move_insn (target, unspec);
 }
 
+/* Similarly to altivec_expand_vec_perm_const_le, we must adjust the
+   permute control vector.  But here it's not a constant, so we must
+   generate a vector splat/subtract to do the adjustment.  */
+
+void
+altivec_expand_vec_perm_le (rtx operands[4])
+{
+  rtx splat, unspec;
+  rtx target = operands[0];
+  rtx op0 = operands[1];
+  rtx op1 = operands[2];
+  rtx sel = operands[3];
+  rtx tmp = target;
+
+  /* Get everything in regs so the pattern matches.  */
+  if (!REG_P (op0))
+    op0 = force_reg (V16QImode, op0);
+  if (!REG_P (op1))
+    op1 = force_reg (V16QImode, op1);
+  if (!REG_P (sel))
+    sel = force_reg (V16QImode, sel);
+  if (!REG_P (target))
+    tmp = gen_reg_rtx (V16QImode);
+
+  /* SEL = splat(31) - SEL.  */
+  /* We want to subtract from 31, but we can't vspltisb 31 since
+     it's out of range.  -1 works as well because only the low-order
+     five bits of the permute control vector elements are used.  */
+  splat = gen_rtx_VEC_DUPLICATE (V16QImode,
+				 gen_rtx_CONST_INT (QImode, -1));
+  emit_move_insn (tmp, splat);
+  sel = gen_rtx_MINUS (V16QImode, tmp, sel);
+  emit_move_insn (tmp, sel);
+
+  /* Permute with operands reversed and adjusted selector.  */
+  unspec = gen_rtx_UNSPEC (V16QImode, gen_rtvec (3, op1, op0, tmp),
+			   UNSPEC_VPERM);
+
+  /* Copy into target, possibly by way of a register.  */
+  if (!REG_P (target))
+    {
+      emit_move_insn (tmp, unspec);
+      unspec = tmp;
+    }
+
+  emit_move_insn (target, unspec);
+}
+
 /* Expand an Altivec constant permutation.  Return true if we match
    an efficient implementation; false to fall back to VPERM.  */
 
Index: gcc/config/rs6000/altivec.md
===================================================================
--- gcc/config/rs6000/altivec.md	(revision 203244)
+++ gcc/config/rs6000/altivec.md	(working copy)
@@ -59,6 +59,8 @@ 
    UNSPEC_VSUMSWS
    UNSPEC_VPERM
    UNSPEC_VPERM_UNS
+   UNSPEC_VPERM_X
+   UNSPEC_VPERM_UNS_X
    UNSPEC_VRFIN
    UNSPEC_VCFUX
    UNSPEC_VCFSX
@@ -1279,21 +1281,91 @@ 
   "vrfiz %0,%1"
   [(set_attr "type" "vecfloat")])
 
-(define_insn "altivec_vperm_<mode>"
+(define_insn_and_split "altivec_vperm_<mode>"
   [(set (match_operand:VM 0 "register_operand" "=v")
 	(unspec:VM [(match_operand:VM 1 "register_operand" "v")
 		    (match_operand:VM 2 "register_operand" "v")
 		    (match_operand:V16QI 3 "register_operand" "v")]
+		   UNSPEC_VPERM_X))]
+  "TARGET_ALTIVEC"
+  "#"
+  "!reload_in_progress && !reload_completed"
+  [(set (match_dup 0) (match_dup 4))]
+{
+  if (BYTES_BIG_ENDIAN)
+    operands[4] = gen_rtx_UNSPEC (<MODE>mode,
+                                  gen_rtvec (3, operands[1],
+ 		                             operands[2], operands[3]),
+                                  UNSPEC_VPERM);
+  else
+    {
+      /* We want to subtract from 31, but we can't vspltisb 31 since
+         it's out of range.  -1 works as well because only the low-order
+         five bits of the permute control vector elements are used.  */
+      rtx splat = gen_rtx_VEC_DUPLICATE (V16QImode,
+                                         gen_rtx_CONST_INT (QImode, -1));
+      rtx tmp = gen_reg_rtx (V16QImode);
+      emit_move_insn (tmp, splat);
+      rtx sel = gen_rtx_MINUS (V16QImode, tmp, operands[3]);
+      emit_move_insn (tmp, sel);
+      operands[4] = gen_rtx_UNSPEC (<MODE>mode,
+                                    gen_rtvec (3, operands[2],
+		 		               operands[1], tmp),
+		                    UNSPEC_VPERM);
+    }
+}
+  [(set_attr "type" "vecperm")])
+
+(define_insn "*altivec_vperm_<mode>_internal"
+  [(set (match_operand:VM 0 "register_operand" "=v")
+	(unspec:VM [(match_operand:VM 1 "register_operand" "v")
+		    (match_operand:VM 2 "register_operand" "v")
+		    (match_operand:V16QI 3 "register_operand" "+v")]
 		   UNSPEC_VPERM))]
   "TARGET_ALTIVEC"
   "vperm %0,%1,%2,%3"
   [(set_attr "type" "vecperm")])
 
-(define_insn "altivec_vperm_<mode>_uns"
+(define_insn_and_split "altivec_vperm_<mode>_uns"
   [(set (match_operand:VM 0 "register_operand" "=v")
 	(unspec:VM [(match_operand:VM 1 "register_operand" "v")
 		    (match_operand:VM 2 "register_operand" "v")
 		    (match_operand:V16QI 3 "register_operand" "v")]
+		   UNSPEC_VPERM_UNS_X))]
+  "TARGET_ALTIVEC"
+  "#"
+  "!reload_in_progress && !reload_completed"
+  [(set (match_dup 0) (match_dup 4))]
+{
+  if (BYTES_BIG_ENDIAN)
+    operands[4] = gen_rtx_UNSPEC (<MODE>mode,
+                                  gen_rtvec (3, operands[1],
+				             operands[2], operands[3]),
+                                  UNSPEC_VPERM_UNS);
+  else
+    {
+      /* We want to subtract from 31, but we can't vspltisb 31 since
+         it's out of range.  -1 works as well because only the low-order
+         five bits of the permute control vector elements are used.  */
+      rtx splat = gen_rtx_VEC_DUPLICATE (V16QImode,
+                                         gen_rtx_CONST_INT (QImode, -1));
+      rtx tmp = gen_reg_rtx (V16QImode);
+      emit_move_insn (tmp, splat);
+      rtx sel = gen_rtx_MINUS (V16QImode, tmp, operands[3]);
+      emit_move_insn (tmp, sel);
+      operands[4] = gen_rtx_UNSPEC (<MODE>mode,
+                                    gen_rtvec (3, operands[2],
+				               operands[1], tmp),
+		                    UNSPEC_VPERM_UNS);
+    }
+}
+  [(set_attr "type" "vecperm")])
+
+(define_insn "*altivec_vperm_<mode>_uns_internal"
+  [(set (match_operand:VM 0 "register_operand" "=v")
+	(unspec:VM [(match_operand:VM 1 "register_operand" "v")
+		    (match_operand:VM 2 "register_operand" "v")
+		    (match_operand:V16QI 3 "register_operand" "+v")]
 		   UNSPEC_VPERM_UNS))]
   "TARGET_ALTIVEC"
   "vperm %0,%1,%2,%3"
@@ -1306,7 +1378,12 @@ 
 		       (match_operand:V16QI 3 "register_operand" "")]
 		      UNSPEC_VPERM))]
   "TARGET_ALTIVEC"
-  "")
+{
+  if (!BYTES_BIG_ENDIAN) {
+    altivec_expand_vec_perm_le (operands);
+    DONE;
+  }
+})
 
 (define_expand "vec_perm_constv16qi"
   [(match_operand:V16QI 0 "register_operand" "")