From: Tamar Christina
To: gcc-patches@gcc.gnu.org
CC: nd, James Greenhalgh, Richard Earnshaw, Marcus Shawcroft
Subject: [PATCH][GCC][AArch64] Fix big-endian neon-intrinsics ICEs
Date: Mon, 14 Jan 2019 14:01:47 +0000
Message-ID: <20190114140143.GA31810@arm.com>

Hi All,

This patch fixes some ICEs that occur when the fcmla_lane intrinsics are used
on big-endian targets, by correcting the lane indices and by replacing the
hardcoded byte offset in the subreg calls with subreg_lowpart_offset.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
Cross-compiled and regtested on aarch64_be-none-elf with no issues.

A small standalone sketch of the lane remapping is appended after the patch
for illustration.

Ok for trunk?

Thanks,
Tamar

gcc/ChangeLog:

2019-01-14  Tamar Christina

	* config/aarch64/aarch64-builtins.c (aarch64_simd_expand_args): Use
	correct max nunits for endian swap.
	(aarch64_expand_fcmla_builtin): Correct subreg code.
	* config/aarch64/aarch64-simd.md (aarch64_fcmla_lane,
	aarch64_fcmla_laneqv4hf, aarch64_fcmlaq_lane): Correct lane
	endianness.

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 04063e5ed134d2e64487db23b8fa7794817b2739..c8f5a555f6724433dc6cea1cff3547c0c66c54a7 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -1197,7 +1197,9 @@ aarch64_simd_expand_args (rtx target, int icode, int have_retval,
 	    = GET_MODE_NUNITS (vmode).to_constant ();
 	  aarch64_simd_lane_bounds (op[opc], 0, nunits / 2, exp);
 	  /* Keep to GCC-vector-extension lane indices in the RTL.  */
-	  op[opc] = aarch64_endian_lane_rtx (vmode, INTVAL (op[opc]));
+	  int lane = INTVAL (op[opc]);
+	  op[opc] = gen_int_mode (ENDIAN_LANE_N (nunits / 2, lane),
+				  SImode);
 	}
       /* Fall through - if the lane index isn't a constant then
 	 the next case will error.  */
@@ -1443,14 +1445,12 @@ aarch64_expand_fcmla_builtin (tree exp, rtx target, int fcode)
   int nunits = GET_MODE_NUNITS (quadmode).to_constant ();
   aarch64_simd_lane_bounds (lane_idx, 0, nunits / 2, exp);
 
-  /* Keep to GCC-vector-extension lane indices in the RTL.  */
-  lane_idx = aarch64_endian_lane_rtx (quadmode, INTVAL (lane_idx));
-
   /* Generate the correct register and mode.  */
   int lane = INTVAL (lane_idx);
 
   if (lane < nunits / 4)
-    op2 = simplify_gen_subreg (d->mode, op2, quadmode, 0);
+    op2 = simplify_gen_subreg (d->mode, op2, quadmode,
+			       subreg_lowpart_offset (d->mode, quadmode));
   else
     {
       /* Select the upper 64 bits, either a V2SF or V4HF, this however
@@ -1460,15 +1460,24 @@ aarch64_expand_fcmla_builtin (tree exp, rtx target, int fcode)
 	 gen_highpart_mode generates code that isn't optimal.  */
       rtx temp1 = gen_reg_rtx (d->mode);
       rtx temp2 = gen_reg_rtx (DImode);
-      temp1 = simplify_gen_subreg (d->mode, op2, quadmode, 0);
+      temp1 = simplify_gen_subreg (d->mode, op2, quadmode,
+				   subreg_lowpart_offset (d->mode, quadmode));
       temp1 = simplify_gen_subreg (V2DImode, temp1, d->mode, 0);
-      emit_insn (gen_aarch64_get_lanev2di (temp2, temp1 , const1_rtx));
+      if (BYTES_BIG_ENDIAN)
+	emit_insn (gen_aarch64_get_lanev2di (temp2, temp1, const0_rtx));
+      else
+	emit_insn (gen_aarch64_get_lanev2di (temp2, temp1, const1_rtx));
       op2 = simplify_gen_subreg (d->mode, temp2, GET_MODE (temp2), 0);
 
       /* And recalculate the index.  */
       lane -= nunits / 4;
     }
 
+  /* Keep to GCC-vector-extension lane indices in the RTL, only nunits / 4
+     (max nunits in range check) are valid.  Which means only 0-1, so we
+     only need to know the order in a V2mode.  */
+  lane_idx = aarch64_endian_lane_rtx (V2DImode, lane);
+
   if (!target)
     target = gen_reg_rtx (d->mode);
   else
@@ -1477,8 +1486,7 @@ aarch64_expand_fcmla_builtin (tree exp, rtx target, int fcode)
 
   rtx pat = NULL_RTX;
   if (d->lane)
-    pat = GEN_FCN (d->icode) (target, op0, op1, op2,
-			      gen_int_mode (lane, SImode));
+    pat = GEN_FCN (d->icode) (target, op0, op1, op2, lane_idx);
   else
     pat = GEN_FCN (d->icode) (target, op0, op1, op2);
 
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index be6c27d319a1ca6fee581d8f8856a4dff8f4a060..805d7a895fad4c7370260fd77ef9864805206b07 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -455,7 +455,10 @@
 			 (match_operand:SI 4 "const_int_operand" "n")]
 			 FCMLA)))]
   "TARGET_COMPLEX"
-  "fcmla\t%0., %2., %3., #"
+{
+  operands[4] = aarch64_endian_lane_rtx (mode, INTVAL (operands[4]));
+  return "fcmla\t%0., %2., %3., #";
+}
   [(set_attr "type" "neon_fcmla")]
 )
 
@@ -467,7 +470,10 @@
 			 (match_operand:SI 4 "const_int_operand" "n")]
 			 FCMLA)))]
   "TARGET_COMPLEX"
-  "fcmla\t%0.4h, %2.4h, %3.h[%4], #"
+{
+  operands[4] = aarch64_endian_lane_rtx (V4HFmode, INTVAL (operands[4]));
+  return "fcmla\t%0.4h, %2.4h, %3.h[%4], #";
+}
   [(set_attr "type" "neon_fcmla")]
 )
 
@@ -479,7 +485,12 @@
 			 (match_operand:SI 4 "const_int_operand" "n")]
 			 FCMLA)))]
   "TARGET_COMPLEX"
-  "fcmla\t%0., %2., %3., #"
+{
+  int nunits = GET_MODE_NUNITS (mode).to_constant ();
+  operands[4]
+    = gen_int_mode (ENDIAN_LANE_N (nunits / 2, INTVAL (operands[4])), SImode);
+  return "fcmla\t%0., %2., %3., #";
+}
   [(set_attr "type" "neon_fcmla")]
 )
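
For illustration only (not part of the patch): a small standalone C sketch of
the big-endian lane remapping relied on above.  endian_lane_n is a
hypothetical stand-in for the backend's ENDIAN_LANE_N macro; the only
assumption it encodes is that on big-endian the GCC-vector-extension lane N of
an NUNITS-element vector corresponds to architectural lane NUNITS - 1 - N.

  #include <assert.h>

  /* Hypothetical model of ENDIAN_LANE_N: reverse the lane index on
     big-endian, leave it unchanged on little-endian.  */
  static int
  endian_lane_n (int nunits, int n, int big_endian)
  {
    return big_endian ? nunits - 1 - n : n;
  }

  int
  main (void)
  {
    /* A 128-bit vector of 8 half-precision floats holds nunits / 2 = 4
       complex pairs, so both the bounds check and the remap use 4, not 8.  */
    assert (endian_lane_n (4, 1, /*big_endian=*/0) == 1);
    assert (endian_lane_n (4, 1, /*big_endian=*/1) == 2);

    /* Once the 128-bit operand has been reduced to one 64-bit half, only two
       pairs remain, so the remap happens in a two-element space; this is why
       the patch remaps in V2DImode and selects lane 0 rather than lane 1 when
       extracting the high half on big-endian.  */
    assert (endian_lane_n (2, 1, /*big_endian=*/1) == 0);
    return 0;
  }

The sketch only restates the index arithmetic; the patch itself applies it via
aarch64_endian_lane_rtx / ENDIAN_LANE_N as shown in the hunks above.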