
[AARCH64] fix and enable non-const shuffle for bigendian using TBL instruction

Message ID 53AA949B.4030709@arm.com
State New

Commit Message

Alan Lawrence June 25, 2014, 9:21 a.m. UTC
This one seems to have slipped under the radar. I've just rebased and run the 
regression tests on aarch64_be-none-elf, with no issues; ping?

(patch applied straightforwardly, but rebased version below)

--Alan

Alan Lawrence wrote:
> At present vec_perm with non-const indices is not handled on bigendian, so gcc 
> generates generic, slow, code. This patch fixes up TBL to reverse the indices 
> within each input vector (following Richard Henderson's suggestion of using an 
> XOR with (nelts - 1) rather than a complicated mask/add/subtract, 
> http://gcc.gnu.org/ml/gcc-patches/2014-03/msg01285.html), and enables the code 
> for bigendian.
> 
> Regression-tested on aarch64_be-none-elf with no changes. (This is as expected: in all 
> affected cases, gcc was already producing correct, non-arch-specific scalar code. 
> However, I have manually verified for various tests in 
> c-c++-common/torture/vshuf-v* that (a) TBL instructions are now produced, and (b) a 
> version of the compiler that produces TBLs without the index correction fails those 
> tests.)
> 
> Note tests c-c++-common/torture/vshuf-{v16hi,v4df,v4di,v8si} (i.e. the 32-byte 
> vectors) were broken prior to this patch and are not affected.
> 
> gcc/ChangeLog:
> 2014-04-23  Alan Lawrence  <alan.lawrence@arm.com>
> 
> 	* config/aarch64/aarch64-simd.md (vec_perm): Enable for bigendian.
> 	* config/aarch64/aarch64.c (aarch64_expand_vec_perm): Remove assert
> 	against bigendian and adjust indices.
>

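The XOR trick referenced above can be checked in isolation. A minimal sketch (names are illustrative, not from the GCC sources): because nelt is a power of two, XOR-ing a lane index with (nelt - 1) computes exactly (nelt - 1) - i for in-range i, i.e. it reverses the lane within its vector while leaving the higher "which input vector" bit of a two-vector selector untouched.

```c
/* Big-endian lane order is the reverse of little-endian, so lane i of
   an nelt-lane vector must become lane (nelt - 1) - i.  When nelt is a
   power of two and 0 <= i < nelt, that subtraction is exactly
   i ^ (nelt - 1): the XOR flips every low-order index bit at once.
   Illustrative helper, not part of the patch.  */
static unsigned
reverse_lane (unsigned i, unsigned nelt)
{
  return i ^ (nelt - 1);
}
```

For a two-vector permute the selector also carries indices nelt..2*nelt-1 to pick the second vector; the XOR leaves that high bit alone, which is why no mask/add/subtract sequence is needed.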
Comments

Marcus Shawcroft June 27, 2014, 7:34 a.m. UTC | #1
On 25 June 2014 10:21, Alan Lawrence <alan.lawrence@arm.com> wrote:
> This one seems to have slipped under the radar. I've just rebased and run
> the regression tests on aarch64_be-none-elf, with no issues; ping?
>
> (patch applied straightforwardly, but rebased version below)

OK /Marcus

Patch

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-sim
index 42bfd3e..08eb6b3 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4224,7 +4224,7 @@ 
     (match_operand:VB 1 "register_operand")
     (match_operand:VB 2 "register_operand")
     (match_operand:VB 3 "register_operand")]
-  "TARGET_SIMD && !BYTES_BIG_ENDIAN"
+  "TARGET_SIMD"
  {
    aarch64_expand_vec_perm (operands[0], operands[1],
                            operands[2], operands[3]);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b2d005b..0ea277a 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -8730,18 +8730,24 @@  aarch64_expand_vec_perm (rtx target, rtx op0, rtx op1, r
    enum machine_mode vmode = GET_MODE (target);
    unsigned int i, nelt = GET_MODE_NUNITS (vmode);
    bool one_vector_p = rtx_equal_p (op0, op1);
-  rtx rmask[MAX_VECT_LEN], mask;
-
-  gcc_checking_assert (!BYTES_BIG_ENDIAN);
+  rtx mask;

    /* The TBL instruction does not use a modulo index, so we must take care
       of that ourselves.  */
-  mask = GEN_INT (one_vector_p ? nelt - 1 : 2 * nelt - 1);
-  for (i = 0; i < nelt; ++i)
-    rmask[i] = mask;
-  mask = gen_rtx_CONST_VECTOR (vmode, gen_rtvec_v (nelt, rmask));
+  mask = aarch64_simd_gen_const_vector_dup (vmode,
+      one_vector_p ? nelt - 1 : 2 * nelt - 1);
    sel = expand_simple_binop (vmode, AND, sel, mask, NULL, 0, OPTAB_LIB_WIDEN);

+  /* For big-endian, we also need to reverse the index within the vector
+     (but not which vector).  */
+  if (BYTES_BIG_ENDIAN)
+    {
+      /* If one_vector_p, mask is a vector of (nelt - 1)'s already.  */
+      if (!one_vector_p)
+        mask = aarch64_simd_gen_const_vector_dup (vmode, nelt - 1);
+      sel = expand_simple_binop (vmode, XOR, sel, mask,
+                                NULL, 0, OPTAB_LIB_WIDEN);
+    }
    aarch64_expand_vec_perm_1 (target, op0, op1, sel);
  }