From patchwork Thu Nov 14 19:13:35 2019
X-Patchwork-Submitter: Srinath Parvathaneni
X-Patchwork-Id: 1195154
From: Srinath Parvathaneni <Srinath.Parvathaneni@arm.com>
To: gcc-patches@gcc.gnu.org
CC: Richard Earnshaw, Kyrylo Tkachov
Subject: [PATCH][ARM][GCC][12x]: MVE ACLE intrinsics to set and get vector lane.
Date: Thu, 14 Nov 2019 19:13:35 +0000
References: <157375666998.31400.16652205595246718910.scripted-patch-series@arm.com>
In-Reply-To: <157375666998.31400.16652205595246718910.scripted-patch-series@arm.com>

Hello,

This patch adds support for the following MVE ACLE intrinsics, which set
and get a vector lane:

vsetq_lane_f16, vsetq_lane_f32, vsetq_lane_s16, vsetq_lane_s32,
vsetq_lane_s8, vsetq_lane_s64, vsetq_lane_u8, vsetq_lane_u16,
vsetq_lane_u32, vsetq_lane_u64, vgetq_lane_f16, vgetq_lane_f32,
vgetq_lane_s16, vgetq_lane_s32, vgetq_lane_s8, vgetq_lane_s64,
vgetq_lane_u8, vgetq_lane_u16, vgetq_lane_u32, vgetq_lane_u64.

Please refer to the M-profile Vector Extension (MVE) intrinsics [1] for
more details.

[1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi with no regressions found.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2019-11-08  Andre Vieira
	    Mihail Ionescu
	    Srinath Parvathaneni

	* config/arm/arm_mve.h (vsetq_lane_f16): Define macro.
	(vsetq_lane_f32): Likewise.
	(vsetq_lane_s16): Likewise.
	(vsetq_lane_s32): Likewise.
	(vsetq_lane_s8): Likewise.
	(vsetq_lane_s64): Likewise.
	(vsetq_lane_u8): Likewise.
	(vsetq_lane_u16): Likewise.
	(vsetq_lane_u32): Likewise.
	(vsetq_lane_u64): Likewise.
	(vgetq_lane_f16): Likewise.
	(vgetq_lane_f32): Likewise.
	(vgetq_lane_s16): Likewise.
	(vgetq_lane_s32): Likewise.
	(vgetq_lane_s8): Likewise.
	(vgetq_lane_s64): Likewise.
	(vgetq_lane_u8): Likewise.
	(vgetq_lane_u16): Likewise.
	(vgetq_lane_u32): Likewise.
	(vgetq_lane_u64): Likewise.
	(__ARM_NUM_LANES): Likewise.
	(__ARM_LANEQ): Likewise.
	(__ARM_CHECK_LANEQ): Likewise.
	(__arm_vsetq_lane_s16): Define intrinsic.
	(__arm_vsetq_lane_s32): Likewise.
	(__arm_vsetq_lane_s8): Likewise.
	(__arm_vsetq_lane_s64): Likewise.
	(__arm_vsetq_lane_u8): Likewise.
	(__arm_vsetq_lane_u16): Likewise.
	(__arm_vsetq_lane_u32): Likewise.
	(__arm_vsetq_lane_u64): Likewise.
	(__arm_vgetq_lane_s16): Likewise.
	(__arm_vgetq_lane_s32): Likewise.
	(__arm_vgetq_lane_s8): Likewise.
	(__arm_vgetq_lane_s64): Likewise.
	(__arm_vgetq_lane_u8): Likewise.
	(__arm_vgetq_lane_u16): Likewise.
	(__arm_vgetq_lane_u32): Likewise.
	(__arm_vgetq_lane_u64): Likewise.
	(__arm_vsetq_lane_f16): Likewise.
	(__arm_vsetq_lane_f32): Likewise.
	(__arm_vgetq_lane_f16): Likewise.
	(__arm_vgetq_lane_f32): Likewise.
	(vgetq_lane): Define polymorphic variant.
	(vsetq_lane): Likewise.
	* config/arm/mve.md (mve_vec_extract): Define RTL pattern.
	(mve_vec_extractv2didi): Likewise.
	(mve_vec_extract_sext_internal): Likewise.
	(mve_vec_extract_zext_internal): Likewise.
	(mve_vec_set_internal): Likewise.
	(mve_vec_setv2di_internal): Likewise.
	* config/arm/neon.md (vec_set): Move RTL pattern to vec-common.md
	file.
	(vec_extract): Rename to "neon_vec_extract".
	(vec_extractv2didi): Rename to "neon_vec_extractv2didi".
	* config/arm/vec-common.md (vec_extract): Define RTL pattern common
	for MVE and NEON.
	(vec_set): Move RTL pattern from neon.md and modify to accept both
	MVE and NEON.

gcc/testsuite/ChangeLog:

2019-11-08  Andre Vieira
	    Mihail Ionescu
	    Srinath Parvathaneni

	* gcc.target/arm/mve/intrinsics/vgetq_lane_f16.c: New test.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_s64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_u64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vgetq_lane_u8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsetq_lane_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsetq_lane_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsetq_lane_s64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsetq_lane_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsetq_lane_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsetq_lane_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsetq_lane_u64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vsetq_lane_u8.c: Likewise.

############### Attachment also inlined for ease of reply ###############

diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index d0259d7bd96c565d901b7634e9f735e0e14bc9dc..9dcf8d692670cd8552fade9868bc051683553b91 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -2506,8 +2506,40 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t;
 #define vld1q_z_f32(__base, __p) __arm_vld1q_z_f32(__base, __p)
 #define vst2q_f32(__addr, __value) __arm_vst2q_f32(__addr, __value)
 #define vst1q_p_f32(__addr, __value, __p) __arm_vst1q_p_f32(__addr, __value, __p)
+#define vsetq_lane_f16(__a, __b, __idx) __arm_vsetq_lane_f16(__a, __b, __idx)
+#define vsetq_lane_f32(__a, __b, __idx) __arm_vsetq_lane_f32(__a, __b, __idx)
+#define vsetq_lane_s16(__a, __b, __idx) __arm_vsetq_lane_s16(__a, __b, __idx)
+#define vsetq_lane_s32(__a, __b, __idx) __arm_vsetq_lane_s32(__a, __b, __idx)
+#define vsetq_lane_s8(__a, __b, __idx) __arm_vsetq_lane_s8(__a, __b, __idx)
+#define vsetq_lane_s64(__a, __b, __idx) __arm_vsetq_lane_s64(__a, __b, __idx)
+#define vsetq_lane_u8(__a, __b, __idx)
__arm_vsetq_lane_u8(__a, __b, __idx)
+#define vsetq_lane_u16(__a, __b, __idx) __arm_vsetq_lane_u16(__a, __b, __idx)
+#define vsetq_lane_u32(__a, __b, __idx) __arm_vsetq_lane_u32(__a, __b, __idx)
+#define vsetq_lane_u64(__a, __b, __idx) __arm_vsetq_lane_u64(__a, __b, __idx)
+#define vgetq_lane_f16(__a, __idx) __arm_vgetq_lane_f16(__a, __idx)
+#define vgetq_lane_f32(__a, __idx) __arm_vgetq_lane_f32(__a, __idx)
+#define vgetq_lane_s16(__a, __idx) __arm_vgetq_lane_s16(__a, __idx)
+#define vgetq_lane_s32(__a, __idx) __arm_vgetq_lane_s32(__a, __idx)
+#define vgetq_lane_s8(__a, __idx) __arm_vgetq_lane_s8(__a, __idx)
+#define vgetq_lane_s64(__a, __idx) __arm_vgetq_lane_s64(__a, __idx)
+#define vgetq_lane_u8(__a, __idx) __arm_vgetq_lane_u8(__a, __idx)
+#define vgetq_lane_u16(__a, __idx) __arm_vgetq_lane_u16(__a, __idx)
+#define vgetq_lane_u32(__a, __idx) __arm_vgetq_lane_u32(__a, __idx)
+#define vgetq_lane_u64(__a, __idx) __arm_vgetq_lane_u64(__a, __idx)
 #endif

+/* For big-endian, GCC's vector indices are reversed within each 64 bits
+   compared to the architectural lane indices used by MVE intrinsics.  */
+#define __ARM_NUM_LANES(__v) (sizeof (__v) / sizeof (__v[0]))
+#ifdef __ARM_BIG_ENDIAN
+#define __ARM_LANEQ(__vec, __idx) (__idx ^ (__ARM_NUM_LANES(__vec)/2 - 1))
+#else
+#define __ARM_LANEQ(__vec, __idx) __idx
+#endif
+#define __ARM_CHECK_LANEQ(__vec, __idx) \
+  __builtin_arm_lane_check (__ARM_NUM_LANES(__vec), \
+			    __ARM_LANEQ(__vec, __idx))
+
 __extension__ extern __inline void
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vst4q_s8 (int8_t * __addr, int8x16x4_t __value)
@@ -16371,6 +16403,142 @@ __arm_vld4q_u32 (uint32_t const * __addr)
   return __rv.__i;
 }

+__extension__ extern __inline int16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsetq_lane_s16 (int16_t __a, int16x8_t __b, const int __idx)
+{
+  __ARM_CHECK_LANEQ (__b, __idx);
+  __b[__ARM_LANEQ(__b,__idx)] = __a;
+  return __b;
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsetq_lane_s32 (int32_t __a, int32x4_t __b, const int __idx)
+{
+  __ARM_CHECK_LANEQ (__b, __idx);
+  __b[__ARM_LANEQ(__b,__idx)] = __a;
+  return __b;
+}
+
+__extension__ extern __inline int8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsetq_lane_s8 (int8_t __a, int8x16_t __b, const int __idx)
+{
+  __ARM_CHECK_LANEQ (__b, __idx);
+  __b[__ARM_LANEQ(__b,__idx)] = __a;
+  return __b;
+}
+
+__extension__ extern __inline int64x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsetq_lane_s64 (int64_t __a, int64x2_t __b, const int __idx)
+{
+  __ARM_CHECK_LANEQ (__b, __idx);
+  __b[__ARM_LANEQ(__b,__idx)] = __a;
+  return __b;
+}
+
+__extension__ extern __inline uint8x16_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__arm_vsetq_lane_u8 (uint8_t __a, uint8x16_t __b, const int __idx)
+{
+  __ARM_CHECK_LANEQ (__b, __idx);
+  __b[__ARM_LANEQ(__b,__idx)] = __a;
+  return __b;
+}
+
+__extension__ extern __inline uint16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__arm_vsetq_lane_u16 (uint16_t __a, uint16x8_t __b, const int __idx) +{ + __ARM_CHECK_LANEQ (__b, __idx); + __b[__ARM_LANEQ(__b,__idx)] = __a; + return __b; +} + +__extension__ extern __inline uint32x4_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__arm_vsetq_lane_u32 (uint32_t __a, uint32x4_t __b, const int __idx) +{ + __ARM_CHECK_LANEQ (__b, __idx); + __b[__ARM_LANEQ(__b,__idx)] = __a; + return __b; +} + +__extension__ extern __inline uint64x2_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__arm_vsetq_lane_u64 (uint64_t __a, uint64x2_t __b, const int __idx) +{ + __ARM_CHECK_LANEQ (__b, __idx); + __b[__ARM_LANEQ(__b,__idx)] = __a; + return __b; +} + +__extension__ extern __inline int16_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__arm_vgetq_lane_s16 (int16x8_t __a, const int __idx) +{ + __ARM_CHECK_LANEQ (__a, __idx); + return __a[__ARM_LANEQ(__a,__idx)]; +} + +__extension__ extern __inline int32_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__arm_vgetq_lane_s32 (int32x4_t __a, const int __idx) +{ + __ARM_CHECK_LANEQ (__a, __idx); + return __a[__ARM_LANEQ(__a,__idx)]; +} + +__extension__ extern __inline int8_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__arm_vgetq_lane_s8 (int8x16_t __a, const int __idx) +{ + __ARM_CHECK_LANEQ (__a, __idx); + return __a[__ARM_LANEQ(__a,__idx)]; +} + +__extension__ extern __inline int64_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__arm_vgetq_lane_s64 (int64x2_t __a, const int __idx) +{ + __ARM_CHECK_LANEQ (__a, __idx); + return __a[__ARM_LANEQ(__a,__idx)]; +} + +__extension__ extern __inline uint8_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__arm_vgetq_lane_u8 (uint8x16_t __a, const int __idx) +{ + __ARM_CHECK_LANEQ (__a, __idx); + return __a[__ARM_LANEQ(__a,__idx)]; +} + 
+__extension__ extern __inline uint16_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__arm_vgetq_lane_u16 (uint16x8_t __a, const int __idx) +{ + __ARM_CHECK_LANEQ (__a, __idx); + return __a[__ARM_LANEQ(__a,__idx)]; +} + +__extension__ extern __inline uint32_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__arm_vgetq_lane_u32 (uint32x4_t __a, const int __idx) +{ + __ARM_CHECK_LANEQ (__a, __idx); + return __a[__ARM_LANEQ(__a,__idx)]; +} + +__extension__ extern __inline uint64_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__arm_vgetq_lane_u64 (uint64x2_t __a, const int __idx) +{ + __ARM_CHECK_LANEQ (__a, __idx); + return __a[__ARM_LANEQ(__a,__idx)]; +} + #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point. */ __extension__ extern __inline void @@ -19804,6 +19972,39 @@ __arm_vst1q_p_f32 (float32_t * __addr, float32x4_t __value, mve_pred16_t __p) return vstrwq_p_f32 (__addr, __value, __p); } +__extension__ extern __inline float16x8_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__arm_vsetq_lane_f16 (float16_t __a, float16x8_t __b, const int __idx) +{ + __ARM_CHECK_LANEQ (__b, __idx); + __b[__ARM_LANEQ(__b,__idx)] = __a; + return __b; +} + +__extension__ extern __inline float32x4_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__arm_vsetq_lane_f32 (float32_t __a, float32x4_t __b, const int __idx) +{ + __ARM_CHECK_LANEQ (__b, __idx); + __b[__ARM_LANEQ(__b,__idx)] = __a; + return __b; +} + +__extension__ extern __inline float16_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__arm_vgetq_lane_f16 (float16x8_t __a, const int __idx) +{ + __ARM_CHECK_LANEQ (__a, __idx); + return __a[__ARM_LANEQ(__a,__idx)]; +} + +__extension__ extern __inline float32_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__arm_vgetq_lane_f32 (float32x4_t __a, const int __idx) +{ + __ARM_CHECK_LANEQ (__a, __idx); + return 
__a[__ARM_LANEQ(__a,__idx)]; +} #endif enum { @@ -22165,6 +22366,35 @@ extern void *__ARM_undef; int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcmulq_rot90_x_f16 (__ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \ int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcmulq_rot90_x_f32 (__ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));}) +#define vgetq_lane(p0,p1) __arm_vgetq_lane(p0,p1) +#define __arm_vgetq_lane(p0,p1) ({ __typeof(p0) __p0 = (p0); \ + _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \ + int (*)[__ARM_mve_type_int8x16_t]: __arm_vgetq_lane_s8 (__ARM_mve_coerce(__p0, int8x16_t), p1), \ + int (*)[__ARM_mve_type_int16x8_t]: __arm_vgetq_lane_s16 (__ARM_mve_coerce(__p0, int16x8_t), p1), \ + int (*)[__ARM_mve_type_int32x4_t]: __arm_vgetq_lane_s32 (__ARM_mve_coerce(__p0, int32x4_t), p1), \ + int (*)[__ARM_mve_type_int64x2_t]: __arm_vgetq_lane_s64 (__ARM_mve_coerce(__p0, int64x2_t), p1), \ + int (*)[__ARM_mve_type_uint8x16_t]: __arm_vgetq_lane_u8 (__ARM_mve_coerce(__p0, uint8x16_t), p1), \ + int (*)[__ARM_mve_type_uint16x8_t]: __arm_vgetq_lane_u16 (__ARM_mve_coerce(__p0, uint16x8_t), p1), \ + int (*)[__ARM_mve_type_uint32x4_t]: __arm_vgetq_lane_u32 (__ARM_mve_coerce(__p0, uint32x4_t), p1), \ + int (*)[__ARM_mve_type_uint64x2_t]: __arm_vgetq_lane_u64 (__ARM_mve_coerce(__p0, uint64x2_t), p1), \ + int (*)[__ARM_mve_type_float16x8_t]: __arm_vgetq_lane_f16 (__ARM_mve_coerce(__p0, float16x8_t), p1), \ + int (*)[__ARM_mve_type_float32x4_t]: __arm_vgetq_lane_f32 (__ARM_mve_coerce(__p0, float32x4_t), p1));}) + +#define vsetq_lane(p0,p1,p2) __arm_vsetq_lane(p0,p1,p2) +#define __arm_vsetq_lane(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \ + __typeof(p1) __p1 = (p1); \ + _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \ + int (*)[__ARM_mve_type_int8_t][__ARM_mve_type_int8x16_t]: __arm_vsetq_lane_s8 (__ARM_mve_coerce(__p0, int8_t), __ARM_mve_coerce(__p1, 
int8x16_t), p2), \ + int (*)[__ARM_mve_type_int16_t][__ARM_mve_type_int16x8_t]: __arm_vsetq_lane_s16 (__ARM_mve_coerce(__p0, int16_t), __ARM_mve_coerce(__p1, int16x8_t), p2), \ + int (*)[__ARM_mve_type_int32_t][__ARM_mve_type_int32x4_t]: __arm_vsetq_lane_s32 (__ARM_mve_coerce(__p0, int32_t), __ARM_mve_coerce(__p1, int32x4_t), p2), \ + int (*)[__ARM_mve_type_int64_t][__ARM_mve_type_int64x2_t]: __arm_vsetq_lane_s64 (__ARM_mve_coerce(__p0, int64_t), __ARM_mve_coerce(__p1, int64x2_t), p2), \ + int (*)[__ARM_mve_type_uint8_t][__ARM_mve_type_uint8x16_t]: __arm_vsetq_lane_u8 (__ARM_mve_coerce(__p0, uint8_t), __ARM_mve_coerce(__p1, uint8x16_t), p2), \ + int (*)[__ARM_mve_type_uint16_t][__ARM_mve_type_uint16x8_t]: __arm_vsetq_lane_u16 (__ARM_mve_coerce(__p0, uint16_t), __ARM_mve_coerce(__p1, uint16x8_t), p2), \ + int (*)[__ARM_mve_type_uint32_t][__ARM_mve_type_uint32x4_t]: __arm_vsetq_lane_u32 (__ARM_mve_coerce(__p0, uint32_t), __ARM_mve_coerce(__p1, uint32x4_t), p2), \ + int (*)[__ARM_mve_type_uint64_t][__ARM_mve_type_uint64x2_t]: __arm_vsetq_lane_u64 (__ARM_mve_coerce(__p0, uint64_t), __ARM_mve_coerce(__p1, uint64x2_t), p2), \ + int (*)[__ARM_mve_type_float16_t][__ARM_mve_type_float16x8_t]: __arm_vsetq_lane_f16 (__ARM_mve_coerce(__p0, float16_t), __ARM_mve_coerce(__p1, float16x8_t), p2), \ + int (*)[__ARM_mve_type_float32_t][__ARM_mve_type_float32x4_t]: __arm_vsetq_lane_f32 (__ARM_mve_coerce(__p0, float32_t), __ARM_mve_coerce(__p1, float32x4_t), p2));}) + #else /* MVE Interger. 
*/ #define vst4q(p0,p1) __arm_vst4q(p0,p1) @@ -26262,6 +26492,31 @@ extern void *__ARM_undef; int (*)[__ARM_mve_type_uint16_t_const_ptr]: __arm_vld4q_u16 (__ARM_mve_coerce(__p0, uint16_t const *)), \ int (*)[__ARM_mve_type_uint32_t_const_ptr]: __arm_vld4q_u32 (__ARM_mve_coerce(__p0, uint32_t const *)));}) +#define vgetq_lane(p0,p1) __arm_vgetq_lane(p0,p1) +#define __arm_vgetq_lane(p0,p1) ({ __typeof(p0) __p0 = (p0); \ + _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \ + int (*)[__ARM_mve_type_int8x16_t]: __arm_vgetq_lane_s8 (__ARM_mve_coerce(__p0, int8x16_t), p1), \ + int (*)[__ARM_mve_type_int16x8_t]: __arm_vgetq_lane_s16 (__ARM_mve_coerce(__p0, int16x8_t), p1), \ + int (*)[__ARM_mve_type_int32x4_t]: __arm_vgetq_lane_s32 (__ARM_mve_coerce(__p0, int32x4_t), p1), \ + int (*)[__ARM_mve_type_int64x2_t]: __arm_vgetq_lane_s64 (__ARM_mve_coerce(__p0, int64x2_t), p1), \ + int (*)[__ARM_mve_type_uint8x16_t]: __arm_vgetq_lane_u8 (__ARM_mve_coerce(__p0, uint8x16_t), p1), \ + int (*)[__ARM_mve_type_uint16x8_t]: __arm_vgetq_lane_u16 (__ARM_mve_coerce(__p0, uint16x8_t), p1), \ + int (*)[__ARM_mve_type_uint32x4_t]: __arm_vgetq_lane_u32 (__ARM_mve_coerce(__p0, uint32x4_t), p1), \ + int (*)[__ARM_mve_type_uint64x2_t]: __arm_vgetq_lane_u64 (__ARM_mve_coerce(__p0, uint64x2_t), p1));}) + +#define vsetq_lane(p0,p1,p2) __arm_vsetq_lane(p0,p1,p2) +#define __arm_vsetq_lane(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \ + __typeof(p1) __p1 = (p1); \ + _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \ + int (*)[__ARM_mve_type_int8_t][__ARM_mve_type_int8x16_t]: __arm_vsetq_lane_s8 (__ARM_mve_coerce(__p0, int8_t), __ARM_mve_coerce(__p1, int8x16_t), p2), \ + int (*)[__ARM_mve_type_int16_t][__ARM_mve_type_int16x8_t]: __arm_vsetq_lane_s16 (__ARM_mve_coerce(__p0, int16_t), __ARM_mve_coerce(__p1, int16x8_t), p2), \ + int (*)[__ARM_mve_type_int32_t][__ARM_mve_type_int32x4_t]: __arm_vsetq_lane_s32 (__ARM_mve_coerce(__p0, int32_t), __ARM_mve_coerce(__p1, int32x4_t), p2), \ + int 
(*)[__ARM_mve_type_int64_t][__ARM_mve_type_int64x2_t]: __arm_vsetq_lane_s64 (__ARM_mve_coerce(__p0, int64_t), __ARM_mve_coerce(__p1, int64x2_t), p2), \ + int (*)[__ARM_mve_type_uint8_t][__ARM_mve_type_uint8x16_t]: __arm_vsetq_lane_u8 (__ARM_mve_coerce(__p0, uint8_t), __ARM_mve_coerce(__p1, uint8x16_t), p2), \ + int (*)[__ARM_mve_type_uint16_t][__ARM_mve_type_uint16x8_t]: __arm_vsetq_lane_u16 (__ARM_mve_coerce(__p0, uint16_t), __ARM_mve_coerce(__p1, uint16x8_t), p2), \ + int (*)[__ARM_mve_type_uint32_t][__ARM_mve_type_uint32x4_t]: __arm_vsetq_lane_u32 (__ARM_mve_coerce(__p0, uint32_t), __ARM_mve_coerce(__p1, uint32x4_t), p2), \ + int (*)[__ARM_mve_type_uint64_t][__ARM_mve_type_uint64x2_t]: __arm_vsetq_lane_u64 (__ARM_mve_coerce(__p0, uint64_t), __ARM_mve_coerce(__p1, uint64x2_t), p2));}) + #endif /* MVE Floating point. */ #ifdef __cplusplus diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md index 62735e5ddab1125e6f38fbf5a5cb5c04936a7717..b679511e42ce909cc9ef19e1cb790e8a5254d538 100644 --- a/gcc/config/arm/mve.md +++ b/gcc/config/arm/mve.md @@ -411,6 +411,8 @@ (define_mode_attr MVE_H_ELEM [ (V8HI "V8HI") (V4SI "V4HI")]) (define_mode_attr V_sz_elem1 [(V16QI "b") (V8HI "h") (V4SI "w") (V8HF "h") (V4SF "w")]) +(define_mode_attr V_extr_elem [(V16QI "u8") (V8HI "u16") (V4SI "32") + (V8HF "u16") (V4SF "32")]) (define_int_iterator VCVTQ_TO_F [VCVTQ_TO_F_S VCVTQ_TO_F_U]) (define_int_iterator VMVNQ_N [VMVNQ_N_U VMVNQ_N_S]) @@ -10863,3 +10865,121 @@ return ""; } [(set_attr "length" "16")]) +;; +;; [vgetq_lane_u, vgetq_lane_s, vgetq_lane_f]) +;; +(define_insn "mve_vec_extract" + [(set (match_operand: 0 "s_register_operand" "=r") + (vec_select: + (match_operand:MVE_VLD_ST 1 "s_register_operand" "w") + (parallel [(match_operand:SI 2 "immediate_operand" "i")])))] + "(TARGET_HAVE_MVE && VALID_MVE_SI_MODE (mode)) + || (TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE (mode))" +{ + if (BYTES_BIG_ENDIAN) + { + int elt = INTVAL (operands[2]); + elt = GET_MODE_NUNITS (mode) - 1 - elt; 
+ operands[2] = GEN_INT (elt); + } + return "vmov.\t%0, %q1[%c2]"; +} + [(set_attr "type" "mve_move")]) + +(define_insn "mve_vec_extractv2didi" + [(set (match_operand:DI 0 "s_register_operand" "=r") + (vec_select:DI + (match_operand:V2DI 1 "s_register_operand" "w") + (parallel [(match_operand:SI 2 "immediate_operand" "i")])))] + "TARGET_HAVE_MVE" +{ + int elt = INTVAL (operands[2]); + if (BYTES_BIG_ENDIAN) + elt = 1 - elt; + + if (elt == 0) + return "vmov\t%Q0, %R0, %e1"; + else + return "vmov\t%J0, %K0, %f1"; +} + [(set_attr "type" "mve_move")]) + +(define_insn "*mve_vec_extract_sext_internal" + [(set (match_operand:SI 0 "s_register_operand" "=r") + (sign_extend:SI + (vec_select: + (match_operand:MVE_2 1 "s_register_operand" "w") + (parallel [(match_operand:SI 2 "immediate_operand" "i")]))))] + "(TARGET_HAVE_MVE && VALID_MVE_SI_MODE (mode)) + || (TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE (mode))" +{ + if (BYTES_BIG_ENDIAN) + { + int elt = INTVAL (operands[2]); + elt = GET_MODE_NUNITS (mode) - 1 - elt; + operands[2] = GEN_INT (elt); + } + return "vmov.s\t%0, %q1[%c2]"; +} + [(set_attr "type" "mve_move")]) + +(define_insn "*mve_vec_extract_zext_internal" + [(set (match_operand:SI 0 "s_register_operand" "=r") + (zero_extend:SI + (vec_select: + (match_operand:MVE_2 1 "s_register_operand" "w") + (parallel [(match_operand:SI 2 "immediate_operand" "i")]))))] + "(TARGET_HAVE_MVE && VALID_MVE_SI_MODE (mode)) + || (TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE (mode))" +{ + if (BYTES_BIG_ENDIAN) + { + int elt = INTVAL (operands[2]); + elt = GET_MODE_NUNITS (mode) - 1 - elt; + operands[2] = GEN_INT (elt); + } + return "vmov.u\t%0, %q1[%c2]"; +} + [(set_attr "type" "mve_move")]) + +;; +;; [vsetq_lane_u, vsetq_lane_s, vsetq_lane_f]) +;; +(define_insn "mve_vec_set_internal" + [(set (match_operand:VQ2 0 "s_register_operand" "=w") + (vec_merge:VQ2 + (vec_duplicate:VQ2 + (match_operand: 1 "nonimmediate_operand" "r")) + (match_operand:VQ2 3 "s_register_operand" "0") + 
(match_operand:SI 2 "immediate_operand" "i")))] + "(TARGET_HAVE_MVE && VALID_MVE_SI_MODE (mode)) + || (TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE (mode))" +{ + int elt = ffs ((int) INTVAL (operands[2])) - 1; + if (BYTES_BIG_ENDIAN) + elt = GET_MODE_NUNITS (mode) - 1 - elt; + operands[2] = GEN_INT (elt); + + return "vmov.\t%q0[%c2], %1"; +} + [(set_attr "type" "mve_move")]) + +(define_insn "mve_vec_setv2di_internal" + [(set (match_operand:V2DI 0 "s_register_operand" "=w") + (vec_merge:V2DI + (vec_duplicate:V2DI + (match_operand:DI 1 "nonimmediate_operand" "r")) + (match_operand:V2DI 3 "s_register_operand" "0") + (match_operand:SI 2 "immediate_operand" "i")))] + "TARGET_HAVE_MVE" +{ + int elt = ffs ((int) INTVAL (operands[2])) - 1; + if (BYTES_BIG_ENDIAN) + elt = 1 - elt; + + if (elt == 0) + return "vmov\t%e0, %Q1, %R1"; + else + return "vmov\t%f0, %J1, %K1"; +} + [(set_attr "type" "mve_move")]) diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md index df8f4fd4166b0d9cd08f01f3ac7ac3958f20b9db..72bdf557dc9fd16a6d97286d7232f9a9071a6e5f 100644 --- a/gcc/config/arm/neon.md +++ b/gcc/config/arm/neon.md @@ -411,18 +411,6 @@ [(set_attr "type" "neon_load1_all_lanes_q,neon_from_gp_q")] ) -(define_expand "vec_set" - [(match_operand:VDQ 0 "s_register_operand") - (match_operand: 1 "s_register_operand") - (match_operand:SI 2 "immediate_operand")] - "TARGET_NEON" -{ - HOST_WIDE_INT elem = HOST_WIDE_INT_1 << INTVAL (operands[2]); - emit_insn (gen_vec_set_internal (operands[0], operands[1], - GEN_INT (elem), operands[0])); - DONE; -}) - (define_insn "vec_extract" [(set (match_operand: 0 "nonimmediate_operand" "=Um,r") (vec_select: @@ -445,7 +433,10 @@ [(set_attr "type" "neon_store1_one_lane,neon_to_gp")] ) -(define_insn "vec_extract" +;; This pattern is renamed from "vec_extract" to +;; "neon_vec_extract" and this pattern is called +;; by define_expand in vec-common.md file. 
+(define_insn "neon_vec_extract" [(set (match_operand: 0 "nonimmediate_operand" "=Um,r") (vec_select: (match_operand:VQ2 1 "s_register_operand" "w,w") @@ -471,7 +462,9 @@ [(set_attr "type" "neon_store1_one_lane,neon_to_gp")] ) -(define_insn "vec_extractv2didi" +;; This pattern is renamed from "vec_extractv2didi" to "neon_vec_extractv2didi"; +;; it is now invoked by the vec_extract define_expand in vec-common.md. +(define_insn "neon_vec_extractv2didi" [(set (match_operand:DI 0 "nonimmediate_operand" "=Um,r") (vec_select:DI (match_operand:V2DI 1 "s_register_operand" "w,w") diff --git a/gcc/config/arm/vec-common.md index 82a1a6bd7698fa25db88a5cdb5b3e762dc80a589..e052047f8b18e4b251ea0b322448c414b53ea422 100644 --- a/gcc/config/arm/vec-common.md +++ b/gcc/config/arm/vec-common.md @@ -190,3 +190,40 @@ arm_expand_vec_perm (operands[0], operands[1], operands[2], operands[3]); DONE; }) + +;; The following expand patterns were moved here from neon.md and modified +;; to support both NEON and MVE. They are needed for the MVE vgetq_lane +;; and vsetq_lane intrinsics.
+ +(define_expand "vec_extract" + [(match_operand: 0 "nonimmediate_operand") + (match_operand:VQX 1 "s_register_operand") + (match_operand:SI 2 "immediate_operand")] + "TARGET_NEON || TARGET_HAVE_MVE" +{ + if (TARGET_NEON) + emit_insn (gen_neon_vec_extract (operands[0], operands[1], + operands[2])); + else if (TARGET_HAVE_MVE) + emit_insn (gen_mve_vec_extract (operands[0], operands[1], + operands[2])); + else + gcc_unreachable (); + DONE; +}) + +(define_expand "vec_set" + [(match_operand:VQX 0 "s_register_operand" "") + (match_operand: 1 "s_register_operand" "") + (match_operand:SI 2 "immediate_operand" "")] + "TARGET_NEON || TARGET_HAVE_MVE" +{ + HOST_WIDE_INT elem = HOST_WIDE_INT_1 << INTVAL (operands[2]); + if (TARGET_NEON) + emit_insn (gen_vec_set_internal (operands[0], operands[1], + GEN_INT (elem), operands[0])); + else + emit_insn (gen_mve_vec_set_internal (operands[0], operands[1], + GEN_INT (elem), operands[0])); + DONE; +}) diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f16.c new file mode 100644 index 0000000000000000000000000000000000000000..ba5cf8187080919ff8df24b6a747ebe40a701a17 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f16.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2" } */ +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */ + +#include "arm_mve.h" + +float16_t +foo (float16x8_t a) +{ + return vgetq_lane_f16 (a, 0); +} + +/* { dg-final { scan-assembler "vmov.u16" } } */ + +float16_t +foo1 (float16x8_t a) +{ + return vgetq_lane (a, 0); +} + +/* { dg-final { scan-assembler "vmov.u16" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f32.c new file mode 100644 index 0000000000000000000000000000000000000000..e660e71bfb44b56f9e3f7c6ea55fe33753961641 --- 
/dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f32.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2" } */ +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */ + +#include "arm_mve.h" + +float32_t +foo (float32x4_t a) +{ + return vgetq_lane_f32 (a, 0); +} + +/* { dg-final { scan-assembler "vmov.32" } } */ + +float32_t +foo1 (float32x4_t a) +{ + return vgetq_lane (a, 0); +} + +/* { dg-final { scan-assembler "vmov.32" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s16.c new file mode 100644 index 0000000000000000000000000000000000000000..51609c3c4e40240204df1645717ed9eee38d897b --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s16.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2" } */ +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */ + +#include "arm_mve.h" + +int16_t +foo (int16x8_t a) +{ + return vgetq_lane_s16 (a, 0); +} + +/* { dg-final { scan-assembler "vmov.s16" } } */ + +int16_t +foo1 (int16x8_t a) +{ + return vgetq_lane (a, 0); +} + +/* { dg-final { scan-assembler "vmov.s16" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s32.c new file mode 100644 index 0000000000000000000000000000000000000000..47f092faeaef9abbcb947eb94a2a29ff011216c3 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s32.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2" } */ +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */ + +#include "arm_mve.h" + +int32_t +foo (int32x4_t a) +{ + return vgetq_lane_s32 (a, 0); +} + +/* { dg-final { scan-assembler "vmov.32" 
} } */ + +int32_t +foo1 (int32x4_t a) +{ + return vgetq_lane (a, 0); +} + +/* { dg-final { scan-assembler "vmov.32" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s64.c new file mode 100644 index 0000000000000000000000000000000000000000..4e048dbf3f64548891232790e5da21718b4d33ad --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s64.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2" } */ +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */ + +#include "arm_mve.h" + +int64_t +foo (int64x2_t a) +{ + return vgetq_lane_s64 (a, 0); +} + +/* { dg-final { scan-assembler {vmov\tr0, r1, d0} } } */ + +int64_t +foo1 (int64x2_t a) +{ + return vgetq_lane (a, 0); +} + +/* { dg-final { scan-assembler {vmov\tr0, r1, d0} } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s8.c new file mode 100644 index 0000000000000000000000000000000000000000..ccf26912915411952eef220c3cc689df3f7504bb --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s8.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2" } */ +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */ + +#include "arm_mve.h" + +int8_t +foo (int8x16_t a) +{ + return vgetq_lane_s8 (a, 0); +} + +/* { dg-final { scan-assembler "vmov.s8" } } */ + +int8_t +foo1 (int8x16_t a) +{ + return vgetq_lane (a, 0); +} + +/* { dg-final { scan-assembler "vmov.s8" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u16.c new file mode 100644 index 0000000000000000000000000000000000000000..ea463a6cc782cc04ea4607a20751d57f7d84808f --- /dev/null +++ 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u16.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2" } */ +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */ + +#include "arm_mve.h" + +uint16_t +foo (uint16x8_t a) +{ + return vgetq_lane_u16 (a, 0); +} + +/* { dg-final { scan-assembler "vmov.u16" } } */ + +uint16_t +foo1 (uint16x8_t a) +{ + return vgetq_lane (a, 0); +} + +/* { dg-final { scan-assembler "vmov.u16" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u32.c new file mode 100644 index 0000000000000000000000000000000000000000..e136d2e8ab2b99aff08e51eb3d8b32602d5a9990 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u32.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2" } */ +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */ + +#include "arm_mve.h" + +uint32_t +foo (uint32x4_t a) +{ + return vgetq_lane_u32 (a, 0); +} + +/* { dg-final { scan-assembler "vmov.32" } } */ + +uint32_t +foo1 (uint32x4_t a) +{ + return vgetq_lane (a, 0); +} + +/* { dg-final { scan-assembler "vmov.32" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u64.c new file mode 100644 index 0000000000000000000000000000000000000000..427687fbf404940fc5a8334bac5034e2aeb9112e --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u64.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2" } */ +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */ + +#include "arm_mve.h" + +uint64_t +foo (uint64x2_t a) +{ + return vgetq_lane_u64 (a, 0); +} + +/* { dg-final { scan-assembler {vmov\tr0, r1, d0} } } 
*/ + +uint64_t +foo1 (uint64x2_t a) +{ + return vgetq_lane (a, 0); +} + +/* { dg-final { scan-assembler {vmov\tr0, r1, d0} } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u8.c new file mode 100644 index 0000000000000000000000000000000000000000..4db14fce783eead4c0f602b02f7e1353adc0c671 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u8.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2" } */ +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */ + +#include "arm_mve.h" + +uint8_t +foo (uint8x16_t a) +{ + return vgetq_lane_u8 (a, 0); +} + +/* { dg-final { scan-assembler "vmov.u8" } } */ + +uint8_t +foo1 (uint8x16_t a) +{ + return vgetq_lane (a, 0); +} + +/* { dg-final { scan-assembler "vmov.u8" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c new file mode 100644 index 0000000000000000000000000000000000000000..a2e4b765b61c612c580e2177a57255786e5529e2 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2" } */ +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */ + +#include "arm_mve.h" + +float16x8_t +foo (float16_t a, float16x8_t b) +{ + return vsetq_lane_f16 (a, b, 0); +} + +/* { dg-final { scan-assembler "vmov.16" } } */ + diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c new file mode 100644 index 0000000000000000000000000000000000000000..c538d62c41145434a74770b75c762da4c3fc5898 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { 
dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2" } */ +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */ + +#include "arm_mve.h" + +float32x4_t +foo (float32_t a, float32x4_t b) +{ + return vsetq_lane_f32 (a, b, 0); +} + +/* { dg-final { scan-assembler "vmov.32" } } */ + diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s16.c new file mode 100644 index 0000000000000000000000000000000000000000..648789dc8df3f7bdc640255c6fb3f458a554c8e5 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s16.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2" } */ +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */ + +#include "arm_mve.h" + +int16x8_t +foo (int16_t a, int16x8_t b) +{ + return vsetq_lane_s16 (a, b, 0); +} + +/* { dg-final { scan-assembler "vmov.16" } } */ + diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s32.c new file mode 100644 index 0000000000000000000000000000000000000000..18ca25b637ca5335d8068aee41f2a220416f660d --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s32.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2" } */ +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */ + +#include "arm_mve.h" + +int32x4_t +foo (int32_t a, int32x4_t b) +{ + return vsetq_lane_s32 (a, b, 0); +} + +/* { dg-final { scan-assembler "vmov.32" } } */ + diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s64.c new file mode 100644 index 0000000000000000000000000000000000000000..0c82eeb081cb2733078a07a7e13f3854a51afd27 --- /dev/null +++ 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s64.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2" } */ +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */ + +#include "arm_mve.h" + +int64x2_t +foo (int64_t a, int64x2_t b) +{ + return vsetq_lane_s64 (a, b, 0); +} + +/* { dg-final { scan-assembler {vmov\td0, r[1-9]*[0-9], r[1-9]*[0-9]} } } */ + diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s8.c new file mode 100644 index 0000000000000000000000000000000000000000..87c06cd013da9a6b8e5a866bc946f2e10cd81522 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s8.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2" } */ +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */ + +#include "arm_mve.h" + +int8x16_t +foo (int8_t a, int8x16_t b) +{ + return vsetq_lane_s8 (a, b, 0); +} + +/* { dg-final { scan-assembler "vmov.8" } } */ + diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u16.c new file mode 100644 index 0000000000000000000000000000000000000000..92984297ad1d83bab29d1cfd12d72932eda50fe4 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u16.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2" } */ +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */ + +#include "arm_mve.h" + +uint16x8_t +foo (uint16_t a, uint16x8_t b) +{ + return vsetq_lane_u16 (a, b, 0); +} + +/* { dg-final { scan-assembler "vmov.16" } } */ + diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u32.c new file mode 100644 index 
0000000000000000000000000000000000000000..755e7c9a978cd28dfcef108f6e624413fea35dff --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u32.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2" } */ +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */ + +#include "arm_mve.h" + +uint32x4_t +foo (uint32_t a, uint32x4_t b) +{ + return vsetq_lane_u32 (a, b, 0); +} + +/* { dg-final { scan-assembler "vmov.32" } } */ + diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u64.c new file mode 100644 index 0000000000000000000000000000000000000000..d7d593b2193cec787fefe547a47b1261def00f40 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u64.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2" } */ +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */ + +#include "arm_mve.h" + +uint64x2_t +foo (uint64_t a, uint64x2_t b) +{ + return vsetq_lane_u64 (a, b, 0); +} + +/* { dg-final { scan-assembler {vmov\td0, r[1-9]*[0-9], r[1-9]*[0-9]} } } */ + diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u8.c new file mode 100644 index 0000000000000000000000000000000000000000..b263360fc1a23124e91a705232cc6eb77012f803 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u8.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2" } */ +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */ + +#include "arm_mve.h" + +uint8x16_t +foo (uint8_t a, uint8x16_t b) +{ + return vsetq_lane_u8 (a, b, 0); +} + +/* { dg-final { scan-assembler "vmov.8" } } */ + diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h 
index d0259d7bd96c565d901b7634e9f735e0e14bc9dc..9dcf8d692670cd8552fade9868bc051683553b91 100644 --- a/gcc/config/arm/arm_mve.h +++ b/gcc/config/arm/arm_mve.h @@ -2506,8 +2506,40 @@ typedef struct { uint8x16_t val[4]; } uint8x16x4_t; #define vld1q_z_f32(__base, __p) __arm_vld1q_z_f32(__base, __p) #define vst2q_f32(__addr, __value) __arm_vst2q_f32(__addr, __value) #define vst1q_p_f32(__addr, __value, __p) __arm_vst1q_p_f32(__addr, __value, __p) +#define vsetq_lane_f16(__a, __b, __idx) __arm_vsetq_lane_f16(__a, __b, __idx) +#define vsetq_lane_f32(__a, __b, __idx) __arm_vsetq_lane_f32(__a, __b, __idx) +#define vsetq_lane_s16(__a, __b, __idx) __arm_vsetq_lane_s16(__a, __b, __idx) +#define vsetq_lane_s32(__a, __b, __idx) __arm_vsetq_lane_s32(__a, __b, __idx) +#define vsetq_lane_s8(__a, __b, __idx) __arm_vsetq_lane_s8(__a, __b, __idx) +#define vsetq_lane_s64(__a, __b, __idx) __arm_vsetq_lane_s64(__a, __b, __idx) +#define vsetq_lane_u8(__a, __b, __idx) __arm_vsetq_lane_u8(__a, __b, __idx) +#define vsetq_lane_u16(__a, __b, __idx) __arm_vsetq_lane_u16(__a, __b, __idx) +#define vsetq_lane_u32(__a, __b, __idx) __arm_vsetq_lane_u32(__a, __b, __idx) +#define vsetq_lane_u64(__a, __b, __idx) __arm_vsetq_lane_u64(__a, __b, __idx) +#define vgetq_lane_f16(__a, __idx) __arm_vgetq_lane_f16(__a, __idx) +#define vgetq_lane_f32(__a, __idx) __arm_vgetq_lane_f32(__a, __idx) +#define vgetq_lane_s16(__a, __idx) __arm_vgetq_lane_s16(__a, __idx) +#define vgetq_lane_s32(__a, __idx) __arm_vgetq_lane_s32(__a, __idx) +#define vgetq_lane_s8(__a, __idx) __arm_vgetq_lane_s8(__a, __idx) +#define vgetq_lane_s64(__a, __idx) __arm_vgetq_lane_s64(__a, __idx) +#define vgetq_lane_u8(__a, __idx) __arm_vgetq_lane_u8(__a, __idx) +#define vgetq_lane_u16(__a, __idx) __arm_vgetq_lane_u16(__a, __idx) +#define vgetq_lane_u32(__a, __idx) __arm_vgetq_lane_u32(__a, __idx) +#define vgetq_lane_u64(__a, __idx) __arm_vgetq_lane_u64(__a, __idx) #endif +/* For big-endian, GCC's vector indices are reversed within each 64 bits 
+ compared to the architectural lane indices used by MVE intrinsics. */ +#define __ARM_NUM_LANES(__v) (sizeof (__v) / sizeof (__v[0])) +#ifdef __ARM_BIG_ENDIAN +#define __ARM_LANEQ(__vec, __idx) (__idx ^ (__ARM_NUM_LANES(__vec)/2 - 1)) +#else +#define __ARM_LANEQ(__vec, __idx) __idx +#endif +#define __ARM_CHECK_LANEQ(__vec, __idx) \ + __builtin_arm_lane_check (__ARM_NUM_LANES(__vec), \ + __ARM_LANEQ(__vec, __idx)) + __extension__ extern __inline void __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) __arm_vst4q_s8 (int8_t * __addr, int8x16x4_t __value) @@ -16371,6 +16403,142 @@ __arm_vld4q_u32 (uint32_t const * __addr) return __rv.__i; } +__extension__ extern __inline int16x8_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__arm_vsetq_lane_s16 (int16_t __a, int16x8_t __b, const int __idx) +{ + __ARM_CHECK_LANEQ (__b, __idx); + __b[__ARM_LANEQ(__b,__idx)] = __a; + return __b; +} + +__extension__ extern __inline int32x4_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__arm_vsetq_lane_s32 (int32_t __a, int32x4_t __b, const int __idx) +{ + __ARM_CHECK_LANEQ (__b, __idx); + __b[__ARM_LANEQ(__b,__idx)] = __a; + return __b; +} + +__extension__ extern __inline int8x16_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__arm_vsetq_lane_s8 (int8_t __a, int8x16_t __b, const int __idx) +{ + __ARM_CHECK_LANEQ (__b, __idx); + __b[__ARM_LANEQ(__b,__idx)] = __a; + return __b; +} + +__extension__ extern __inline int64x2_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__arm_vsetq_lane_s64 (int64_t __a, int64x2_t __b, const int __idx) +{ + __ARM_CHECK_LANEQ (__b, __idx); + __b[__ARM_LANEQ(__b,__idx)] = __a; + return __b; +} + +__extension__ extern __inline uint8x16_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__arm_vsetq_lane_u8 (uint8_t __a, uint8x16_t __b, const int __idx) +{ + __ARM_CHECK_LANEQ (__b, __idx); + __b[__ARM_LANEQ(__b,__idx)] = __a; 
+ return __b; +} + +__extension__ extern __inline uint16x8_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__arm_vsetq_lane_u16 (uint16_t __a, uint16x8_t __b, const int __idx) +{ + __ARM_CHECK_LANEQ (__b, __idx); + __b[__ARM_LANEQ(__b,__idx)] = __a; + return __b; +} + +__extension__ extern __inline uint32x4_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__arm_vsetq_lane_u32 (uint32_t __a, uint32x4_t __b, const int __idx) +{ + __ARM_CHECK_LANEQ (__b, __idx); + __b[__ARM_LANEQ(__b,__idx)] = __a; + return __b; +} + +__extension__ extern __inline uint64x2_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__arm_vsetq_lane_u64 (uint64_t __a, uint64x2_t __b, const int __idx) +{ + __ARM_CHECK_LANEQ (__b, __idx); + __b[__ARM_LANEQ(__b,__idx)] = __a; + return __b; +} + +__extension__ extern __inline int16_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__arm_vgetq_lane_s16 (int16x8_t __a, const int __idx) +{ + __ARM_CHECK_LANEQ (__a, __idx); + return __a[__ARM_LANEQ(__a,__idx)]; +} + +__extension__ extern __inline int32_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__arm_vgetq_lane_s32 (int32x4_t __a, const int __idx) +{ + __ARM_CHECK_LANEQ (__a, __idx); + return __a[__ARM_LANEQ(__a,__idx)]; +} + +__extension__ extern __inline int8_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__arm_vgetq_lane_s8 (int8x16_t __a, const int __idx) +{ + __ARM_CHECK_LANEQ (__a, __idx); + return __a[__ARM_LANEQ(__a,__idx)]; +} + +__extension__ extern __inline int64_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__arm_vgetq_lane_s64 (int64x2_t __a, const int __idx) +{ + __ARM_CHECK_LANEQ (__a, __idx); + return __a[__ARM_LANEQ(__a,__idx)]; +} + +__extension__ extern __inline uint8_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__arm_vgetq_lane_u8 (uint8x16_t __a, const int __idx) +{ + __ARM_CHECK_LANEQ 
(__a, __idx); + return __a[__ARM_LANEQ(__a,__idx)]; +} + +__extension__ extern __inline uint16_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__arm_vgetq_lane_u16 (uint16x8_t __a, const int __idx) +{ + __ARM_CHECK_LANEQ (__a, __idx); + return __a[__ARM_LANEQ(__a,__idx)]; +} + +__extension__ extern __inline uint32_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__arm_vgetq_lane_u32 (uint32x4_t __a, const int __idx) +{ + __ARM_CHECK_LANEQ (__a, __idx); + return __a[__ARM_LANEQ(__a,__idx)]; +} + +__extension__ extern __inline uint64_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__arm_vgetq_lane_u64 (uint64x2_t __a, const int __idx) +{ + __ARM_CHECK_LANEQ (__a, __idx); + return __a[__ARM_LANEQ(__a,__idx)]; +} + #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point. */ __extension__ extern __inline void @@ -19804,6 +19972,39 @@ __arm_vst1q_p_f32 (float32_t * __addr, float32x4_t __value, mve_pred16_t __p) return vstrwq_p_f32 (__addr, __value, __p); } +__extension__ extern __inline float16x8_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__arm_vsetq_lane_f16 (float16_t __a, float16x8_t __b, const int __idx) +{ + __ARM_CHECK_LANEQ (__b, __idx); + __b[__ARM_LANEQ(__b,__idx)] = __a; + return __b; +} + +__extension__ extern __inline float32x4_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__arm_vsetq_lane_f32 (float32_t __a, float32x4_t __b, const int __idx) +{ + __ARM_CHECK_LANEQ (__b, __idx); + __b[__ARM_LANEQ(__b,__idx)] = __a; + return __b; +} + +__extension__ extern __inline float16_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__arm_vgetq_lane_f16 (float16x8_t __a, const int __idx) +{ + __ARM_CHECK_LANEQ (__a, __idx); + return __a[__ARM_LANEQ(__a,__idx)]; +} + +__extension__ extern __inline float32_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +__arm_vgetq_lane_f32 (float32x4_t __a, const int __idx) 
+{ + __ARM_CHECK_LANEQ (__a, __idx); + return __a[__ARM_LANEQ(__a,__idx)]; +} #endif enum { @@ -22165,6 +22366,35 @@ extern void *__ARM_undef; int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: __arm_vcmulq_rot90_x_f16 (__ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, float16x8_t), p3), \ int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: __arm_vcmulq_rot90_x_f32 (__ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, float32x4_t), p3));}) +#define vgetq_lane(p0,p1) __arm_vgetq_lane(p0,p1) +#define __arm_vgetq_lane(p0,p1) ({ __typeof(p0) __p0 = (p0); \ + _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \ + int (*)[__ARM_mve_type_int8x16_t]: __arm_vgetq_lane_s8 (__ARM_mve_coerce(__p0, int8x16_t), p1), \ + int (*)[__ARM_mve_type_int16x8_t]: __arm_vgetq_lane_s16 (__ARM_mve_coerce(__p0, int16x8_t), p1), \ + int (*)[__ARM_mve_type_int32x4_t]: __arm_vgetq_lane_s32 (__ARM_mve_coerce(__p0, int32x4_t), p1), \ + int (*)[__ARM_mve_type_int64x2_t]: __arm_vgetq_lane_s64 (__ARM_mve_coerce(__p0, int64x2_t), p1), \ + int (*)[__ARM_mve_type_uint8x16_t]: __arm_vgetq_lane_u8 (__ARM_mve_coerce(__p0, uint8x16_t), p1), \ + int (*)[__ARM_mve_type_uint16x8_t]: __arm_vgetq_lane_u16 (__ARM_mve_coerce(__p0, uint16x8_t), p1), \ + int (*)[__ARM_mve_type_uint32x4_t]: __arm_vgetq_lane_u32 (__ARM_mve_coerce(__p0, uint32x4_t), p1), \ + int (*)[__ARM_mve_type_uint64x2_t]: __arm_vgetq_lane_u64 (__ARM_mve_coerce(__p0, uint64x2_t), p1), \ + int (*)[__ARM_mve_type_float16x8_t]: __arm_vgetq_lane_f16 (__ARM_mve_coerce(__p0, float16x8_t), p1), \ + int (*)[__ARM_mve_type_float32x4_t]: __arm_vgetq_lane_f32 (__ARM_mve_coerce(__p0, float32x4_t), p1));}) + +#define vsetq_lane(p0,p1,p2) __arm_vsetq_lane(p0,p1,p2) +#define __arm_vsetq_lane(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \ + __typeof(p1) __p1 = (p1); \ + _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \ + int (*)[__ARM_mve_type_int8_t][__ARM_mve_type_int8x16_t]: __arm_vsetq_lane_s8 
(__ARM_mve_coerce(__p0, int8_t), __ARM_mve_coerce(__p1, int8x16_t), p2), \ + int (*)[__ARM_mve_type_int16_t][__ARM_mve_type_int16x8_t]: __arm_vsetq_lane_s16 (__ARM_mve_coerce(__p0, int16_t), __ARM_mve_coerce(__p1, int16x8_t), p2), \ + int (*)[__ARM_mve_type_int32_t][__ARM_mve_type_int32x4_t]: __arm_vsetq_lane_s32 (__ARM_mve_coerce(__p0, int32_t), __ARM_mve_coerce(__p1, int32x4_t), p2), \ + int (*)[__ARM_mve_type_int64_t][__ARM_mve_type_int64x2_t]: __arm_vsetq_lane_s64 (__ARM_mve_coerce(__p0, int64_t), __ARM_mve_coerce(__p1, int64x2_t), p2), \ + int (*)[__ARM_mve_type_uint8_t][__ARM_mve_type_uint8x16_t]: __arm_vsetq_lane_u8 (__ARM_mve_coerce(__p0, uint8_t), __ARM_mve_coerce(__p1, uint8x16_t), p2), \ + int (*)[__ARM_mve_type_uint16_t][__ARM_mve_type_uint16x8_t]: __arm_vsetq_lane_u16 (__ARM_mve_coerce(__p0, uint16_t), __ARM_mve_coerce(__p1, uint16x8_t), p2), \ + int (*)[__ARM_mve_type_uint32_t][__ARM_mve_type_uint32x4_t]: __arm_vsetq_lane_u32 (__ARM_mve_coerce(__p0, uint32_t), __ARM_mve_coerce(__p1, uint32x4_t), p2), \ + int (*)[__ARM_mve_type_uint64_t][__ARM_mve_type_uint64x2_t]: __arm_vsetq_lane_u64 (__ARM_mve_coerce(__p0, uint64_t), __ARM_mve_coerce(__p1, uint64x2_t), p2), \ + int (*)[__ARM_mve_type_float16_t][__ARM_mve_type_float16x8_t]: __arm_vsetq_lane_f16 (__ARM_mve_coerce(__p0, float16_t), __ARM_mve_coerce(__p1, float16x8_t), p2), \ + int (*)[__ARM_mve_type_float32_t][__ARM_mve_type_float32x4_t]: __arm_vsetq_lane_f32 (__ARM_mve_coerce(__p0, float32_t), __ARM_mve_coerce(__p1, float32x4_t), p2));}) + #else /* MVE Integer.
*/ #define vst4q(p0,p1) __arm_vst4q(p0,p1) @@ -26262,6 +26492,31 @@ extern void *__ARM_undef; int (*)[__ARM_mve_type_uint16_t_const_ptr]: __arm_vld4q_u16 (__ARM_mve_coerce(__p0, uint16_t const *)), \ int (*)[__ARM_mve_type_uint32_t_const_ptr]: __arm_vld4q_u32 (__ARM_mve_coerce(__p0, uint32_t const *)));}) +#define vgetq_lane(p0,p1) __arm_vgetq_lane(p0,p1) +#define __arm_vgetq_lane(p0,p1) ({ __typeof(p0) __p0 = (p0); \ + _Generic( (int (*)[__ARM_mve_typeid(__p0)])0, \ + int (*)[__ARM_mve_type_int8x16_t]: __arm_vgetq_lane_s8 (__ARM_mve_coerce(__p0, int8x16_t), p1), \ + int (*)[__ARM_mve_type_int16x8_t]: __arm_vgetq_lane_s16 (__ARM_mve_coerce(__p0, int16x8_t), p1), \ + int (*)[__ARM_mve_type_int32x4_t]: __arm_vgetq_lane_s32 (__ARM_mve_coerce(__p0, int32x4_t), p1), \ + int (*)[__ARM_mve_type_int64x2_t]: __arm_vgetq_lane_s64 (__ARM_mve_coerce(__p0, int64x2_t), p1), \ + int (*)[__ARM_mve_type_uint8x16_t]: __arm_vgetq_lane_u8 (__ARM_mve_coerce(__p0, uint8x16_t), p1), \ + int (*)[__ARM_mve_type_uint16x8_t]: __arm_vgetq_lane_u16 (__ARM_mve_coerce(__p0, uint16x8_t), p1), \ + int (*)[__ARM_mve_type_uint32x4_t]: __arm_vgetq_lane_u32 (__ARM_mve_coerce(__p0, uint32x4_t), p1), \ + int (*)[__ARM_mve_type_uint64x2_t]: __arm_vgetq_lane_u64 (__ARM_mve_coerce(__p0, uint64x2_t), p1));}) + +#define vsetq_lane(p0,p1,p2) __arm_vsetq_lane(p0,p1,p2) +#define __arm_vsetq_lane(p0,p1,p2) ({ __typeof(p0) __p0 = (p0); \ + __typeof(p1) __p1 = (p1); \ + _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \ + int (*)[__ARM_mve_type_int8_t][__ARM_mve_type_int8x16_t]: __arm_vsetq_lane_s8 (__ARM_mve_coerce(__p0, int8_t), __ARM_mve_coerce(__p1, int8x16_t), p2), \ + int (*)[__ARM_mve_type_int16_t][__ARM_mve_type_int16x8_t]: __arm_vsetq_lane_s16 (__ARM_mve_coerce(__p0, int16_t), __ARM_mve_coerce(__p1, int16x8_t), p2), \ + int (*)[__ARM_mve_type_int32_t][__ARM_mve_type_int32x4_t]: __arm_vsetq_lane_s32 (__ARM_mve_coerce(__p0, int32_t), __ARM_mve_coerce(__p1, int32x4_t), p2), \ + int 
(*)[__ARM_mve_type_int64_t][__ARM_mve_type_int64x2_t]: __arm_vsetq_lane_s64 (__ARM_mve_coerce(__p0, int64_t), __ARM_mve_coerce(__p1, int64x2_t), p2), \ + int (*)[__ARM_mve_type_uint8_t][__ARM_mve_type_uint8x16_t]: __arm_vsetq_lane_u8 (__ARM_mve_coerce(__p0, uint8_t), __ARM_mve_coerce(__p1, uint8x16_t), p2), \ + int (*)[__ARM_mve_type_uint16_t][__ARM_mve_type_uint16x8_t]: __arm_vsetq_lane_u16 (__ARM_mve_coerce(__p0, uint16_t), __ARM_mve_coerce(__p1, uint16x8_t), p2), \ + int (*)[__ARM_mve_type_uint32_t][__ARM_mve_type_uint32x4_t]: __arm_vsetq_lane_u32 (__ARM_mve_coerce(__p0, uint32_t), __ARM_mve_coerce(__p1, uint32x4_t), p2), \ + int (*)[__ARM_mve_type_uint64_t][__ARM_mve_type_uint64x2_t]: __arm_vsetq_lane_u64 (__ARM_mve_coerce(__p0, uint64_t), __ARM_mve_coerce(__p1, uint64x2_t), p2));}) + #endif /* MVE Floating point. */ #ifdef __cplusplus diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md index 62735e5ddab1125e6f38fbf5a5cb5c04936a7717..b679511e42ce909cc9ef19e1cb790e8a5254d538 100644 --- a/gcc/config/arm/mve.md +++ b/gcc/config/arm/mve.md @@ -411,6 +411,8 @@ (define_mode_attr MVE_H_ELEM [ (V8HI "V8HI") (V4SI "V4HI")]) (define_mode_attr V_sz_elem1 [(V16QI "b") (V8HI "h") (V4SI "w") (V8HF "h") (V4SF "w")]) +(define_mode_attr V_extr_elem [(V16QI "u8") (V8HI "u16") (V4SI "32") + (V8HF "u16") (V4SF "32")]) (define_int_iterator VCVTQ_TO_F [VCVTQ_TO_F_S VCVTQ_TO_F_U]) (define_int_iterator VMVNQ_N [VMVNQ_N_U VMVNQ_N_S]) @@ -10863,3 +10865,121 @@ return ""; } [(set_attr "length" "16")]) +;; +;; [vgetq_lane_u, vgetq_lane_s, vgetq_lane_f]) +;; +(define_insn "mve_vec_extract" + [(set (match_operand: 0 "s_register_operand" "=r") + (vec_select: + (match_operand:MVE_VLD_ST 1 "s_register_operand" "w") + (parallel [(match_operand:SI 2 "immediate_operand" "i")])))] + "(TARGET_HAVE_MVE && VALID_MVE_SI_MODE (mode)) + || (TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE (mode))" +{ + if (BYTES_BIG_ENDIAN) + { + int elt = INTVAL (operands[2]); + elt = GET_MODE_NUNITS (mode) - 1 - elt; 
+ operands[2] = GEN_INT (elt); + } + return "vmov.\t%0, %q1[%c2]"; +} + [(set_attr "type" "mve_move")]) + +(define_insn "mve_vec_extractv2didi" + [(set (match_operand:DI 0 "s_register_operand" "=r") + (vec_select:DI + (match_operand:V2DI 1 "s_register_operand" "w") + (parallel [(match_operand:SI 2 "immediate_operand" "i")])))] + "TARGET_HAVE_MVE" +{ + int elt = INTVAL (operands[2]); + if (BYTES_BIG_ENDIAN) + elt = 1 - elt; + + if (elt == 0) + return "vmov\t%Q0, %R0, %e1"; + else + return "vmov\t%J0, %K0, %f1"; +} + [(set_attr "type" "mve_move")]) + +(define_insn "*mve_vec_extract_sext_internal" + [(set (match_operand:SI 0 "s_register_operand" "=r") + (sign_extend:SI + (vec_select: + (match_operand:MVE_2 1 "s_register_operand" "w") + (parallel [(match_operand:SI 2 "immediate_operand" "i")]))))] + "(TARGET_HAVE_MVE && VALID_MVE_SI_MODE (mode)) + || (TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE (mode))" +{ + if (BYTES_BIG_ENDIAN) + { + int elt = INTVAL (operands[2]); + elt = GET_MODE_NUNITS (mode) - 1 - elt; + operands[2] = GEN_INT (elt); + } + return "vmov.s\t%0, %q1[%c2]"; +} + [(set_attr "type" "mve_move")]) + +(define_insn "*mve_vec_extract_zext_internal" + [(set (match_operand:SI 0 "s_register_operand" "=r") + (zero_extend:SI + (vec_select: + (match_operand:MVE_2 1 "s_register_operand" "w") + (parallel [(match_operand:SI 2 "immediate_operand" "i")]))))] + "(TARGET_HAVE_MVE && VALID_MVE_SI_MODE (mode)) + || (TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE (mode))" +{ + if (BYTES_BIG_ENDIAN) + { + int elt = INTVAL (operands[2]); + elt = GET_MODE_NUNITS (mode) - 1 - elt; + operands[2] = GEN_INT (elt); + } + return "vmov.u\t%0, %q1[%c2]"; +} + [(set_attr "type" "mve_move")]) + +;; +;; [vsetq_lane_u, vsetq_lane_s, vsetq_lane_f]) +;; +(define_insn "mve_vec_set_internal" + [(set (match_operand:VQ2 0 "s_register_operand" "=w") + (vec_merge:VQ2 + (vec_duplicate:VQ2 + (match_operand: 1 "nonimmediate_operand" "r")) + (match_operand:VQ2 3 "s_register_operand" "0") + 
(match_operand:SI 2 "immediate_operand" "i")))] + "(TARGET_HAVE_MVE && VALID_MVE_SI_MODE (mode)) + || (TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE (mode))" +{ + int elt = ffs ((int) INTVAL (operands[2])) - 1; + if (BYTES_BIG_ENDIAN) + elt = GET_MODE_NUNITS (mode) - 1 - elt; + operands[2] = GEN_INT (elt); + + return "vmov.\t%q0[%c2], %1"; +} + [(set_attr "type" "mve_move")]) + +(define_insn "mve_vec_setv2di_internal" + [(set (match_operand:V2DI 0 "s_register_operand" "=w") + (vec_merge:V2DI + (vec_duplicate:V2DI + (match_operand:DI 1 "nonimmediate_operand" "r")) + (match_operand:V2DI 3 "s_register_operand" "0") + (match_operand:SI 2 "immediate_operand" "i")))] + "TARGET_HAVE_MVE" +{ + int elt = ffs ((int) INTVAL (operands[2])) - 1; + if (BYTES_BIG_ENDIAN) + elt = 1 - elt; + + if (elt == 0) + return "vmov\t%e0, %Q1, %R1"; + else + return "vmov\t%f0, %J1, %K1"; +} + [(set_attr "type" "mve_move")]) diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md index df8f4fd4166b0d9cd08f01f3ac7ac3958f20b9db..72bdf557dc9fd16a6d97286d7232f9a9071a6e5f 100644 --- a/gcc/config/arm/neon.md +++ b/gcc/config/arm/neon.md @@ -411,18 +411,6 @@ [(set_attr "type" "neon_load1_all_lanes_q,neon_from_gp_q")] ) -(define_expand "vec_set" - [(match_operand:VDQ 0 "s_register_operand") - (match_operand: 1 "s_register_operand") - (match_operand:SI 2 "immediate_operand")] - "TARGET_NEON" -{ - HOST_WIDE_INT elem = HOST_WIDE_INT_1 << INTVAL (operands[2]); - emit_insn (gen_vec_set_internal (operands[0], operands[1], - GEN_INT (elem), operands[0])); - DONE; -}) - (define_insn "vec_extract" [(set (match_operand: 0 "nonimmediate_operand" "=Um,r") (vec_select: @@ -445,7 +433,10 @@ [(set_attr "type" "neon_store1_one_lane,neon_to_gp")] ) -(define_insn "vec_extract" +;; This pattern is renamed from "vec_extract" to +;; "neon_vec_extract"; it is now called via the +;; vec_extract define_expand in vec-common.md.
+(define_insn "neon_vec_extract" [(set (match_operand: 0 "nonimmediate_operand" "=Um,r") (vec_select: (match_operand:VQ2 1 "s_register_operand" "w,w") @@ -471,7 +462,9 @@ [(set_attr "type" "neon_store1_one_lane,neon_to_gp")] ) -(define_insn "vec_extractv2didi" +;; This pattern is renamed from "vec_extractv2didi" to "neon_vec_extractv2didi"; +;; it is now called via the vec_extract define_expand in vec-common.md. +(define_insn "neon_vec_extractv2didi" [(set (match_operand:DI 0 "nonimmediate_operand" "=Um,r") (vec_select:DI (match_operand:V2DI 1 "s_register_operand" "w,w") diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md index 82a1a6bd7698fa25db88a5cdb5b3e762dc80a589..e052047f8b18e4b251ea0b322448c414b53ea422 100644 --- a/gcc/config/arm/vec-common.md +++ b/gcc/config/arm/vec-common.md @@ -190,3 +190,40 @@ arm_expand_vec_perm (operands[0], operands[1], operands[2], operands[3]); DONE; }) + +;; The following expand patterns are moved here from neon.md and modified +;; to support both NEON and MVE. They are needed for the MVE vgetq_lane +;; and vsetq_lane intrinsics.
+ +(define_expand "vec_extract" + [(match_operand: 0 "nonimmediate_operand") + (match_operand:VQX 1 "s_register_operand") + (match_operand:SI 2 "immediate_operand")] + "TARGET_NEON || TARGET_HAVE_MVE" +{ + if (TARGET_NEON) + emit_insn (gen_neon_vec_extract (operands[0], operands[1], + operands[2])); + else if (TARGET_HAVE_MVE) + emit_insn (gen_mve_vec_extract (operands[0], operands[1], + operands[2])); + else + gcc_unreachable (); + DONE; +}) + +(define_expand "vec_set" + [(match_operand:VQX 0 "s_register_operand" "") + (match_operand: 1 "s_register_operand" "") + (match_operand:SI 2 "immediate_operand" "")] + "TARGET_NEON || TARGET_HAVE_MVE" +{ + HOST_WIDE_INT elem = HOST_WIDE_INT_1 << INTVAL (operands[2]); + if (TARGET_NEON) + emit_insn (gen_vec_set_internal (operands[0], operands[1], + GEN_INT (elem), operands[0])); + else + emit_insn (gen_mve_vec_set_internal (operands[0], operands[1], + GEN_INT (elem), operands[0])); + DONE; +}) diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f16.c new file mode 100644 index 0000000000000000000000000000000000000000..ba5cf8187080919ff8df24b6a747ebe40a701a17 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f16.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2" } */ +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */ + +#include "arm_mve.h" + +float16_t +foo (float16x8_t a) +{ + return vgetq_lane_f16 (a, 0); +} + +/* { dg-final { scan-assembler "vmov.u16" } } */ + +float16_t +foo1 (float16x8_t a) +{ + return vgetq_lane (a, 0); +} + +/* { dg-final { scan-assembler "vmov.u16" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f32.c new file mode 100644 index 0000000000000000000000000000000000000000..e660e71bfb44b56f9e3f7c6ea55fe33753961641 --- 
/dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_f32.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2" } */ +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */ + +#include "arm_mve.h" + +float32_t +foo (float32x4_t a) +{ + return vgetq_lane_f32 (a, 0); +} + +/* { dg-final { scan-assembler "vmov.32" } } */ + +float32_t +foo1 (float32x4_t a) +{ + return vgetq_lane (a, 0); +} + +/* { dg-final { scan-assembler "vmov.32" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s16.c new file mode 100644 index 0000000000000000000000000000000000000000..51609c3c4e40240204df1645717ed9eee38d897b --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s16.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2" } */ +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */ + +#include "arm_mve.h" + +int16_t +foo (int16x8_t a) +{ + return vgetq_lane_s16 (a, 0); +} + +/* { dg-final { scan-assembler "vmov.s16" } } */ + +int16_t +foo1 (int16x8_t a) +{ + return vgetq_lane (a, 0); +} + +/* { dg-final { scan-assembler "vmov.s16" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s32.c new file mode 100644 index 0000000000000000000000000000000000000000..47f092faeaef9abbcb947eb94a2a29ff011216c3 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s32.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2" } */ +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */ + +#include "arm_mve.h" + +int32_t +foo (int32x4_t a) +{ + return vgetq_lane_s32 (a, 0); +} + +/* { dg-final { scan-assembler "vmov.32" 
} } */ + +int32_t +foo1 (int32x4_t a) +{ + return vgetq_lane (a, 0); +} + +/* { dg-final { scan-assembler "vmov.32" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s64.c new file mode 100644 index 0000000000000000000000000000000000000000..4e048dbf3f64548891232790e5da21718b4d33ad --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s64.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2" } */ +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */ + +#include "arm_mve.h" + +int64_t +foo (int64x2_t a) +{ + return vgetq_lane_s64 (a, 0); +} + +/* { dg-final { scan-assembler {vmov\tr0, r1, d0} } } */ + +int64_t +foo1 (int64x2_t a) +{ + return vgetq_lane (a, 0); +} + +/* { dg-final { scan-assembler {vmov\tr0, r1, d0} } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s8.c new file mode 100644 index 0000000000000000000000000000000000000000..ccf26912915411952eef220c3cc689df3f7504bb --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_s8.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2" } */ +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */ + +#include "arm_mve.h" + +int8_t +foo (int8x16_t a) +{ + return vgetq_lane_s8 (a, 0); +} + +/* { dg-final { scan-assembler "vmov.s8" } } */ + +int8_t +foo1 (int8x16_t a) +{ + return vgetq_lane (a, 0); +} + +/* { dg-final { scan-assembler "vmov.s8" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u16.c new file mode 100644 index 0000000000000000000000000000000000000000..ea463a6cc782cc04ea4607a20751d57f7d84808f --- /dev/null +++ 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u16.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2" } */ +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */ + +#include "arm_mve.h" + +uint16_t +foo (uint16x8_t a) +{ + return vgetq_lane_u16 (a, 0); +} + +/* { dg-final { scan-assembler "vmov.u16" } } */ + +uint16_t +foo1 (uint16x8_t a) +{ + return vgetq_lane (a, 0); +} + +/* { dg-final { scan-assembler "vmov.u16" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u32.c new file mode 100644 index 0000000000000000000000000000000000000000..e136d2e8ab2b99aff08e51eb3d8b32602d5a9990 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u32.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2" } */ +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */ + +#include "arm_mve.h" + +uint32_t +foo (uint32x4_t a) +{ + return vgetq_lane_u32 (a, 0); +} + +/* { dg-final { scan-assembler "vmov.32" } } */ + +uint32_t +foo1 (uint32x4_t a) +{ + return vgetq_lane (a, 0); +} + +/* { dg-final { scan-assembler "vmov.32" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u64.c new file mode 100644 index 0000000000000000000000000000000000000000..427687fbf404940fc5a8334bac5034e2aeb9112e --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u64.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2" } */ +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */ + +#include "arm_mve.h" + +uint64_t +foo (uint64x2_t a) +{ + return vgetq_lane_u64 (a, 0); +} + +/* { dg-final { scan-assembler {vmov\tr0, r1, d0} } } 
*/ + +uint64_t +foo1 (uint64x2_t a) +{ + return vgetq_lane (a, 0); +} + +/* { dg-final { scan-assembler {vmov\tr0, r1, d0} } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u8.c new file mode 100644 index 0000000000000000000000000000000000000000..4db14fce783eead4c0f602b02f7e1353adc0c671 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vgetq_lane_u8.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2" } */ +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */ + +#include "arm_mve.h" + +uint8_t +foo (uint8x16_t a) +{ + return vgetq_lane_u8 (a, 0); +} + +/* { dg-final { scan-assembler "vmov.u8" } } */ + +uint8_t +foo1 (uint8x16_t a) +{ + return vgetq_lane (a, 0); +} + +/* { dg-final { scan-assembler "vmov.u8" } } */ diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c new file mode 100644 index 0000000000000000000000000000000000000000..a2e4b765b61c612c580e2177a57255786e5529e2 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2" } */ +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */ + +#include "arm_mve.h" + +float16x8_t +foo (float16_t a, float16x8_t b) +{ + return vsetq_lane_f16 (a, b, 0); +} + +/* { dg-final { scan-assembler "vmov.16" } } */ + diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c new file mode 100644 index 0000000000000000000000000000000000000000..c538d62c41145434a74770b75c762da4c3fc5898 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { 
dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard -O2" } */ +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */ + +#include "arm_mve.h" + +float32x4_t +foo (float32_t a, float32x4_t b) +{ + return vsetq_lane_f32 (a, b, 0); +} + +/* { dg-final { scan-assembler "vmov.32" } } */ + diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s16.c new file mode 100644 index 0000000000000000000000000000000000000000..648789dc8df3f7bdc640255c6fb3f458a554c8e5 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s16.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2" } */ +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */ + +#include "arm_mve.h" + +int16x8_t +foo (int16_t a, int16x8_t b) +{ + return vsetq_lane_s16 (a, b, 0); +} + +/* { dg-final { scan-assembler "vmov.16" } } */ + diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s32.c new file mode 100644 index 0000000000000000000000000000000000000000..18ca25b637ca5335d8068aee41f2a220416f660d --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s32.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2" } */ +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */ + +#include "arm_mve.h" + +int32x4_t +foo (int32_t a, int32x4_t b) +{ + return vsetq_lane_s32 (a, b, 0); +} + +/* { dg-final { scan-assembler "vmov.32" } } */ + diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s64.c new file mode 100644 index 0000000000000000000000000000000000000000..0c82eeb081cb2733078a07a7e13f3854a51afd27 --- /dev/null +++ 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s64.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2" } */ +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */ + +#include "arm_mve.h" + +int64x2_t +foo (int64_t a, int64x2_t b) +{ + return vsetq_lane_s64 (a, b, 0); +} + +/* { dg-final { scan-assembler {vmov\td0, r[1-9]*[0-9], r[1-9]*[0-9]} } } */ + diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s8.c new file mode 100644 index 0000000000000000000000000000000000000000..87c06cd013da9a6b8e5a866bc946f2e10cd81522 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s8.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2" } */ +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */ + +#include "arm_mve.h" + +int8x16_t +foo (int8_t a, int8x16_t b) +{ + return vsetq_lane_s8 (a, b, 0); +} + +/* { dg-final { scan-assembler "vmov.8" } } */ + diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u16.c new file mode 100644 index 0000000000000000000000000000000000000000..92984297ad1d83bab29d1cfd12d72932eda50fe4 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u16.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2" } */ +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */ + +#include "arm_mve.h" + +uint16x8_t +foo (uint16_t a, uint16x8_t b) +{ + return vsetq_lane_u16 (a, b, 0); +} + +/* { dg-final { scan-assembler "vmov.16" } } */ + diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u32.c new file mode 100644 index 
0000000000000000000000000000000000000000..755e7c9a978cd28dfcef108f6e624413fea35dff --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u32.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2" } */ +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */ + +#include "arm_mve.h" + +uint32x4_t +foo (uint32_t a, uint32x4_t b) +{ + return vsetq_lane_u32 (a, b, 0); +} + +/* { dg-final { scan-assembler "vmov.32" } } */ + diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u64.c new file mode 100644 index 0000000000000000000000000000000000000000..d7d593b2193cec787fefe547a47b1261def00f40 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u64.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2" } */ +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */ + +#include "arm_mve.h" + +uint64x2_t +foo (uint64_t a, uint64x2_t b) +{ + return vsetq_lane_u64 (a, b, 0); +} + +/* { dg-final { scan-assembler {vmov\td0, r[1-9]*[0-9], r[1-9]*[0-9]} } } */ + diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u8.c new file mode 100644 index 0000000000000000000000000000000000000000..b263360fc1a23124e91a705232cc6eb77012f803 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_u8.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard -O2" } */ +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mfpu=auto"} } */ + +#include "arm_mve.h" + +uint8x16_t +foo (uint8_t a, uint8x16_t b) +{ + return vsetq_lane_u8 (a, b, 0); +} + +/* { dg-final { scan-assembler "vmov.8" } } */ +