From patchwork Wed Apr 23 15:29:52 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alan Lawrence <alan.lawrence@arm.com>
X-Patchwork-Id: 341903
Message-ID: <5357DC70.5080907@arm.com>
Date: Wed, 23 Apr 2014 16:29:52 +0100
From: Alan Lawrence <alan.lawrence@arm.com>
User-Agent: Thunderbird 2.0.0.24 (X11/20101213)
MIME-Version: 1.0
To: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: [PATCH AARCH64] fix and enable non-const shuffle for bigendian
	using TBL instruction

At present, vec_perm with non-constant indices is not handled on big-endian, so
GCC falls back to generic, slow code. This patch fixes up the TBL expansion to
reverse the indices within each input vector (following Richard Henderson's
suggestion of using an XOR with (nelts - 1) rather than a complicated
mask/add/subtract, http://gcc.gnu.org/ml/gcc-patches/2014-03/msg01285.html), and
enables the code for big-endian.
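
The XOR works because nelt is a power of two: for a lane index l in
[0, nelt - 1], (nelt - 1) - l equals l ^ (nelt - 1), since subtracting from an
all-ones value never borrows, and the higher bit that selects the input vector
is left untouched. A minimal C sketch comparing the two forms; the helper names
and the mask/add/subtract formulation are hypothetical reconstructions for
illustration, not code from GCC or from the linked thread.

```c
#include <assert.h>

/* Hypothetical mask/add/subtract form: keep which vector the index selects,
   reverse the lane within that vector.  nelt must be a power of two. */
static unsigned reverse_lane_arith(unsigned idx, unsigned nelt)
{
  unsigned vec  = idx & ~(nelt - 1);   /* vector-select bits (0 or nelt, ...) */
  unsigned lane = idx & (nelt - 1);    /* lane within that vector */
  return vec + (nelt - 1) - lane;
}

/* The single-XOR form: flips only the low log2(nelt) bits. */
static unsigned reverse_lane_xor(unsigned idx, unsigned nelt)
{
  return idx ^ (nelt - 1);
}
```

For nelt = 16, index 0 maps to 15, index 16 (lane 0 of the second vector) maps
to 31, and index 31 maps to 16: the lane is reversed but the vector is not.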

Regression-tested on aarch64_be-none-elf with no changes. (This is as expected:
in all affected cases, GCC was already producing correct non-arch-specific code
using scalar ops.) However, I have manually verified for various tests in
c-c++-common/torture/vshuf-v* that (a) TBL instructions are now produced, and
(b) a version of the compiler that produces TBLs without the index correction
fails the tests.
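
Point (b) can be illustrated with a small host-side C model of the expansion.
It assumes that on big-endian, GCC's lane i of a V16QI value sits in
architectural byte 15 - i of the register, and that TBL returns 0 for an
out-of-range index; it covers only the 16-byte case, using a one-register table
when both inputs are the same operand (standing in for rtx_equal_p) and a
two-register table otherwise. All function names are hypothetical; this is a
sketch of the reasoning, not GCC code.

```c
#include <assert.h>
#include <string.h>

#define NELT 16  /* V16QI: 16 byte lanes per vector (model assumption) */

/* Assumed big-endian layout: GCC lane i <-> architectural byte NELT-1-i.
   The mapping is its own inverse, so it converts in both directions. */
static void swap_lane_order(const unsigned char *src, unsigned char *dst)
{
  for (int i = 0; i < NELT; i++)
    dst[NELT - 1 - i] = src[i];
}

/* TBL model: indices are architectural byte numbers into the table
   registers; any index past the end of the table yields 0. */
static void tbl(unsigned char tab[][NELT], int nregs,
                const unsigned char *idx, unsigned char *out)
{
  for (int k = 0; k < NELT; k++)
    out[k] = idx[k] < nregs * NELT ? tab[idx[k] / NELT][idx[k] % NELT] : 0;
}

/* Reference vec_perm in GCC lane numbering: sel is taken modulo 2*NELT
   and indexes the concatenation of a and b. */
static void vec_perm_ref(const unsigned char *a, const unsigned char *b,
                         const unsigned char *sel, unsigned char *out)
{
  for (int j = 0; j < NELT; j++)
    {
      unsigned s = sel[j] & (2 * NELT - 1);
      out[j] = s < NELT ? a[s] : b[s - NELT];
    }
}

/* Mirrors the patched expander: AND for modulo semantics, then (if
   'correct') XOR with NELT-1 to reverse the lane but not the vector. */
static void expand_vec_perm_be(const unsigned char *a, const unsigned char *b,
                               const unsigned char *sel, int correct,
                               unsigned char *out)
{
  int one_vector_p = (a == b);
  unsigned mask = one_vector_p ? NELT - 1 : 2 * NELT - 1;
  unsigned char s[NELT], s_reg[NELT], regs[2][NELT], out_reg[NELT];

  for (int j = 0; j < NELT; j++)
    {
      s[j] = sel[j] & mask;
      if (correct)
        s[j] ^= NELT - 1;
    }
  swap_lane_order(a, regs[0]);
  swap_lane_order(b, regs[1]);
  swap_lane_order(s, s_reg);
  tbl(regs, one_vector_p ? 1 : 2, s_reg, out_reg);
  swap_lane_order(out_reg, out);
}
```

With the correction, the model agrees with the reference for both the one- and
two-vector cases; without it, an identity selector returns the lanes of a in
reverse order.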

Note that tests c-c++-common/torture/vshuf-{v16hi,v4df,v4di,v8si} (i.e. the
32-byte vectors) were already failing before this patch and are unaffected by it.

gcc/ChangeLog:
2014-04-23  Alan Lawrence  <alan.lawrence@arm.com>

	* config/aarch64/aarch64-simd.md (vec_perm): Enable for bigendian.
	* config/aarch64/aarch64.c (aarch64_expand_vec_perm): Remove assert
	against bigendian and adjust indices.

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 73aee2c..e14e9b0 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4002,7 +4002,7 @@
    (match_operand:VB 1 "register_operand")
    (match_operand:VB 2 "register_operand")
    (match_operand:VB 3 "register_operand")]
-  "TARGET_SIMD && !BYTES_BIG_ENDIAN"
+  "TARGET_SIMD"
 {
   aarch64_expand_vec_perm (operands[0], operands[1],
 			   operands[2], operands[3]);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index d332741..6875b58 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -7763,18 +7763,24 @@ aarch64_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel)
   enum machine_mode vmode = GET_MODE (target);
   unsigned int i, nelt = GET_MODE_NUNITS (vmode);
   bool one_vector_p = rtx_equal_p (op0, op1);
-  rtx rmask[MAX_VECT_LEN], mask;
-
-  gcc_checking_assert (!BYTES_BIG_ENDIAN);
+  rtx mask;
 
   /* The TBL instruction does not use a modulo index, so we must take care
      of that ourselves.  */
-  mask = GEN_INT (one_vector_p ? nelt - 1 : 2 * nelt - 1);
-  for (i = 0; i < nelt; ++i)
-    rmask[i] = mask;
-  mask = gen_rtx_CONST_VECTOR (vmode, gen_rtvec_v (nelt, rmask));
+  mask = aarch64_simd_gen_const_vector_dup (vmode,
+      one_vector_p ? nelt - 1 : 2 * nelt - 1);
   sel = expand_simple_binop (vmode, AND, sel, mask, NULL, 0, OPTAB_LIB_WIDEN);
 
+  /* For big-endian, we also need to reverse the index within the vector
+     (but not which vector).  */
+  if (BYTES_BIG_ENDIAN)
+    {
+      /* If one_vector_p, mask is a vector of (nelt - 1)'s already.  */
+      if (!one_vector_p)
+        mask = aarch64_simd_gen_const_vector_dup (vmode, nelt - 1);
+      sel = expand_simple_binop (vmode, XOR, sel, mask,
+				 NULL, 0, OPTAB_LIB_WIDEN);
+    }
   aarch64_expand_vec_perm_1 (target, op0, op1, sel);
 }