From patchwork Fri Feb 21 16:30:41 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tejas Belagod <tbelagod@arm.com>
X-Patchwork-Id: 322945
Return-Path: 
 <gcc-patches-return-362205-incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
	bits)) (No client certificate requested)
	by ozlabs.org (Postfix) with ESMTPS id 70C7C2C031A
	for <incoming@patchwork.ozlabs.org>;
	Sat, 22 Feb 2014 03:30:56 +1100 (EST)
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender
	:message-id:date:from:mime-version:to:cc:subject:content-type;
	q=dns; s=default; b=UkHpmfgmb3AguAwEHD8fsfYZV2c0FMGO4o4w9VDpsEd
	tF48EX61qcAEIGfscrJ8hYjOjGI7FJ6VOgdQVDEktwLvEHfw7gJTePb0TygxZQ++
	l0HkJi4zZXQN9GGcYTLzAc27nh2CG7oRCs/lNyVaVtwBqWvOw+RRv9aXffJRxO5U
	=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender
	:message-id:date:from:mime-version:to:cc:subject:content-type;
	s=default; bh=0x99PhJfloloTYmDfe7NVY+bIEc=; b=UAe2UK22FQOQMgMyZ
	i+RtIoxiBCS292KrIjObwmDGlDAv1269xvGIaQsZnn8xJE46qFIADF/iUjINpoqU
	BtB8ggXoYMg0y4uq74cLvXX8cC8hl8gkvSsYJ8xGBejJUWmlPlRS9X0543c7X5bX
	zx1rkwZ1Bo+l70pMgSWDs9NudY=
Received: (qmail 24731 invoked by alias); 21 Feb 2014 16:30:49 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: 
 <mailto:gcc-patches-unsubscribe-incoming=patchwork.ozlabs.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 24704 invoked by uid 89); 21 Feb 2014 16:30:48 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.4 required=5.0 tests=AWL, BAYES_00,
	RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.2
X-HELO: service87.mimecast.com
Received: from service87.mimecast.com (HELO service87.mimecast.com)
	(91.220.42.44) by sourceware.org
	(qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP;
	Fri, 21 Feb 2014 16:30:46 +0000
Received: from cam-owa1.Emea.Arm.com (fw-tnat.cambridge.arm.com
	[217.140.96.21]) by service87.mimecast.com;
	Fri, 21 Feb 2014 16:30:43 +0000
Received: from [10.1.203.80] ([10.1.255.212]) by cam-owa1.Emea.Arm.com with
	Microsoft SMTPSVC(6.0.3790.3959); Fri, 21 Feb 2014 16:30:42 +0000
Message-ID: <53077F31.8070003@arm.com>
Date: Fri, 21 Feb 2014 16:30:41 +0000
From: Tejas Belagod <tbelagod@arm.com>
User-Agent: Thunderbird 2.0.0.18 (X11/20081120)
MIME-Version: 1.0
To: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
CC: Marcus Shawcroft <Marcus.Shawcroft@arm.com>
Subject: [Patch, AArch64] Fix shuffle for big-endian.
X-MC-Unique: 114022116304308501
X-IsSubscribed: yes

Hi,

When a shuffle takes more than one input vector, the register list that TBL 
operates on ends up in a 'mixed-endian' layout on big-endian NEON. RTL does not 
account for this, so the shuffle produces incorrect results. This patch fixes up 
the index table in the selector rtx so that it is also mixed-endian, matching 
what happens on NEON.
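
To illustrate the remapping outside of GCC, here is a rough standalone model in 
plain C. It is only a sketch of the arithmetic, not the patch itself: 
remap_index and its parameters are made-up names, big_endian stands in for 
BYTES_BIG_ENDIAN, and nunits is assumed to be a power of two (the number of 
elements in one input vector).

```c
#include <assert.h>

/* Hypothetical model of the selector fix-up; not GCC code.
   idx        - selector index, in [0, nunits) for a one-vector shuffle
                or [0, 2 * nunits) for a two-vector shuffle.
   nunits     - elements per input vector (assumed to be a power of two).
   one_vector - nonzero if the shuffle has a single input vector.
   big_endian - nonzero to model a big-endian target.  */
static int
remap_index (int idx, int nunits, int one_vector, int big_endian)
{
  assert (nunits > 0 && (nunits & (nunits - 1)) == 0);

  /* Little-endian indices are used unchanged.  */
  if (!big_endian)
    return idx;

  if (!one_vector && (idx & nunits))
    {
      /* Index selects from the second vector: take the offset within
	 that vector, reverse it, then re-apply the nunits offset.  */
      int offset = idx & (nunits - 1);
      return nunits + (nunits - 1 - offset);
    }

  /* Index selects from the first (or only) vector: reverse it.  */
  return nunits - 1 - idx;
}
```

For example, with nunits = 4 and two input vectors on big-endian, the selector 
{0, 5, 2, 7} maps to {3, 6, 1, 4}: each half is reversed independently rather 
than the 8-entry table being reversed as a whole.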

As trunk stands, this patch will not be exercised, because constant vector 
permute is disabled for big-endian. I've tested it by locally enabling const 
vec_perm, and it fixes some of the regressions we have on big-endian:

aarch64_be-none-elf:
FAIL->PASS: gcc.c-torture/execute/loop-11.c execution,  -O3 -fomit-frame-pointer
FAIL->PASS: gcc.c-torture/execute/loop-11.c execution,  -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions
FAIL->PASS: gcc.c-torture/execute/loop-11.c execution,  -O3 -fomit-frame-pointer -funroll-loops
FAIL->PASS: gcc.c-torture/execute/loop-11.c execution,  -O3 -g
FAIL->PASS: gcc.dg/torture/vector-shuffle1.c  -O0  execution test
FAIL->PASS: gcc.dg/torture/vshuf-v16qi.c  -O2  execution test
FAIL->PASS: gcc.dg/torture/vshuf-v2df.c  -O2  execution test
FAIL->PASS: gcc.dg/torture/vshuf-v2di.c  -O2  execution test
FAIL->PASS: gcc.dg/torture/vshuf-v2sf.c  -O2  execution test
FAIL->PASS: gcc.dg/torture/vshuf-v2si.c  -O2  execution test
FAIL->PASS: gcc.dg/torture/vshuf-v4sf.c  -O2  execution test
FAIL->PASS: gcc.dg/torture/vshuf-v4si.c  -O2  execution test
FAIL->PASS: gcc.dg/torture/vshuf-v8hi.c  -O2  execution test
FAIL->PASS: gcc.dg/torture/vshuf-v8qi.c  -O2  execution test
FAIL->PASS: gcc.dg/vect/vect-114.c -flto -ffat-lto-objects execution test
FAIL->PASS: gcc.dg/vect/vect-114.c execution test
FAIL->PASS: gcc.dg/vect/vect-15.c -flto -ffat-lto-objects execution test
FAIL->PASS: gcc.dg/vect/vect-15.c execution test

Also regression-tested on aarch64-none-elf.

OK for stage-1?

Thanks,
Tejas.

2014-02-21  Tejas Belagod  <tejas.belagod@arm.com>

gcc/
	* config/aarch64/aarch64.c (aarch64_evpc_tbl): Fix index vector for
	big-endian when dealing with more than one input shuffle vector.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index ea90311..fd473a3 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -8128,7 +8128,28 @@ aarch64_evpc_tbl (struct expand_vec_perm_d *d)
     return false;

   for (i = 0; i < nelt; ++i)
-    rperm[i] = GEN_INT (d->perm[i]);
+    {
+      int nunits = GET_MODE_NUNITS (vmode);
+      int elt = d->perm[i];
+
+      /* If two vectors, we end up with a weird mixed-endian mode on NEON.  */
+      if (BYTES_BIG_ENDIAN)
+	{
+	  if (!d->one_vector_p && d->perm[i] & nunits)
+	    {
+	      /* Extract the offset.  */
+	      elt = d->perm[i] & (nunits - 1);
+	      /* Reverse the top half.  */
+	      elt = nunits - 1 - elt;
+	      /* Offset it by the bottom half.  */
+	      elt += nunits;
+	    }
+	  else
+	    elt = nunits - 1 - d->perm[i];
+	}
+
+      rperm[i] = GEN_INT (elt);
+    }
   sel = gen_rtx_CONST_VECTOR (vmode, gen_rtvec_v (nelt, rperm));
   sel = force_reg (vmode, sel);