From patchwork Wed Oct 28 11:16:00 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrea Corallo X-Patchwork-Id: 1389241 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=sourceware.org; envelope-from=gcc-patches-bounces@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=gcc.gnu.org Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=hWi2fUj3; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4CLmGq6XCkz9sVD for ; Wed, 28 Oct 2020 22:16:19 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 65143396E02B; Wed, 28 Oct 2020 11:16:17 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 65143396E02B DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1603883777; bh=h26IRKPtvPvbzQBuL2utNCVhVIQCQ4xZo52TZPdDTLA=; h=To:Subject:References:Date:In-Reply-To:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=hWi2fUj3DKNOe4LaJIGOc+RZp3STCvnacMJSM8jhb0XX7tdv7Yg2/6mZpfyfZqCED /7VS+RLNib0tZ1UyXJa4mrAIQBjHKXPrGpLakO+Rpe1SnQe2As4M4RX7VLQY4G0vFC /4Q/pBP09bPrjrR5zFMy3WpEaHCiHilLOFEjPzHc= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from EUR04-DB3-obe.outbound.protection.outlook.com (mail-eopbgr60041.outbound.protection.outlook.com [40.107.6.41]) by sourceware.org (Postfix) with ESMTPS id 40D13396EC83 for ; Wed, 28 Oct 2020 11:16:12 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 40D13396EC83 Received: from DB7PR05CA0029.eurprd05.prod.outlook.com (2603:10a6:10:36::42) by VI1PR08MB5309.eurprd08.prod.outlook.com (2603:10a6:803:133::18) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3455.28; Wed, 28 Oct 2020 11:16:10 +0000 Received: from DB5EUR03FT019.eop-EUR03.prod.protection.outlook.com (2603:10a6:10:36:cafe::7e) by DB7PR05CA0029.outlook.office365.com (2603:10a6:10:36::42) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3499.18 via Frontend Transport; Wed, 28 Oct 2020 11:16:10 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 63.35.35.123) smtp.mailfrom=arm.com; gcc.gnu.org; dkim=pass (signature was verified) header.d=armh.onmicrosoft.com;gcc.gnu.org; dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 63.35.35.123 as permitted sender) receiver=protection.outlook.com; client-ip=63.35.35.123; helo=64aa7808-outbound-1.mta.getcheckrecipient.com; Received: from 64aa7808-outbound-1.mta.getcheckrecipient.com (63.35.35.123) by DB5EUR03FT019.mail.protection.outlook.com (10.152.20.163) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3520.15 via Frontend Transport; Wed, 28 Oct 2020 11:16:09 +0000 Received: ("Tessian outbound 7c188528bfe0:v64"); Wed, 28 Oct 2020 11:16:09 +0000 X-CheckRecipientChecked: true X-CR-MTA-CID: baaf7226832c5293 X-CR-MTA-TID: 64aa7808 Received: from 3a65761da2d6.2 by 64aa7808-outbound-1.mta.getcheckrecipient.com id 4D52B3AD-331E-43E4-8367-F5B368995634.1; Wed, 28 Oct 2020 11:16:04 +0000 Received: from EUR05-VI1-obe.outbound.protection.outlook.com by 64aa7808-outbound-1.mta.getcheckrecipient.com with ESMTPS id 3a65761da2d6.2 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384); Wed, 28 Oct 2020 11:16:04 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=O5oEbbDbwzFaJZQ+ocmEFqX7ypyHlUh95ec53E0bZ1B6WXhcbObGsqt+h6vTcyNjxMOino+EEbzDQtOCHsZre6Vq5bfuyIo4yKhvc9mu9kQlIOj7yfeT86XFVjR0KHvoRamrGPaW9E0RRgM623PACNIFPTGV8wgi8kZ0M2r2IbRHWqKl8OSmVHo/eeYFTAs3zUqftFl19HDqZDAmCGWQ9KsncWw139SfxvoqxiaGcg7feemR2eVXLRFSq54jDw+iIh2Mk+lIAKBHqmU1CQc3O0P/ILzi7fF2p9OPwwU1Gs1cdfKyVZEXOTf8CfLCq3qpswEw39R0LHB4NjbcvEybZw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=h26IRKPtvPvbzQBuL2utNCVhVIQCQ4xZo52TZPdDTLA=; b=GiOb19cJ7/dXm4mmkGPi1pjpDcZ2tpvbkGHEhfbZZz0eujnGawCP7rDtFgIhY4MkMjHXunfgrTpQ4mFHfMKzH2ByRaY4AXhjEh6WTiA4k5w5T87xunwHCuryRHSIKMTjS2OFtbBN7lz3Cc7nk3fzsK+Q4ZBydvcOZC6u11OQg6wU814qeu8kQ1XDPn+t7F+CmYB3AdoPFrHKDMDEPRcAKIfhXMMSF6DHslOd/KJPhHd0Ul5Hh3rFZzPc261e1lmMjqXZevOzm4W/vUijvS2jX/P+p9zBWPBHFJ9n88YmE81UWsyTdIAkZagZaGn1zz1HOuGZqvFZjA1SIRNolX0eTQ== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass header.d=arm.com; arc=none Authentication-Results-Original: arm.com; dkim=none (message not signed) header.d=none;arm.com; dmarc=none action=none header.from=arm.com; Received: from AM6PR08MB4900.eurprd08.prod.outlook.com (2603:10a6:20b:cc::10) by AM6PR08MB3496.eurprd08.prod.outlook.com (2603:10a6:20b:4e::31) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3499.18; Wed, 28 Oct 2020 11:16:02 +0000 Received: from AM6PR08MB4900.eurprd08.prod.outlook.com ([fe80::48cc:50a9:4261:4260]) by AM6PR08MB4900.eurprd08.prod.outlook.com ([fe80::48cc:50a9:4261:4260%7]) with mapi id 15.20.3499.018; Wed, 28 Oct 2020 11:16:02 +0000 To: Andrea Corallo via Gcc-patches Subject: [PATCH V3] aarch64: Add vstN_lane_bf16 + vstNq_lane_bf16 intrinsics References: Date: Wed, 28 Oct 2020 12:16:00 +0100 In-Reply-To: (Andrea Corallo via Gcc-patches's message of "Mon, 26 Oct 2020 10:08:45 +0100") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux) X-Originating-IP: [217.140.106.37] X-ClientProxiedBy: LNXP265CA0046.GBRP265.PROD.OUTLOOK.COM (2603:10a6:600:5c::34) To AM6PR08MB4900.eurprd08.prod.outlook.com (2603:10a6:20b:cc::10) MIME-Version: 1.0 X-MS-Exchange-MessageSentRepresentingType: 1 Received: from e124257 (217.140.106.37) by LNXP265CA0046.GBRP265.PROD.OUTLOOK.COM (2603:10a6:600:5c::34) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3499.19 via Frontend Transport; Wed, 28 Oct 2020 11:16:01 +0000 X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-HT: Tenant X-MS-Office365-Filtering-Correlation-Id: 4bd20732-7072-4b9b-d146-08d87b32df6c X-MS-TrafficTypeDiagnostic: AM6PR08MB3496:|VI1PR08MB5309: X-MS-Exchange-Transport-Forked: True X-Microsoft-Antispam-PRVS: x-checkrecipientrouted: true NoDisclaimer: true X-MS-Oob-TLC-OOBClassifiers: OLM:8273;OLM:8273; X-MS-Exchange-SenderADCheck: 1 X-Microsoft-Antispam-Untrusted: BCL:0; X-Microsoft-Antispam-Message-Info-Original: sQoVTnmEVpP2IINU7BsIAqrm9kcRb4TctsP8FGl/0m8/Q8XKkTamkti3bn87zzzUH8s9fW3KP1bnkbWm10tY80ZGdXv8nNmUDSW0rVdbaH5YTObORSFQQvKw5Sgq8cLbL4quXopAwMuijgHc9uNBokVPrZzVVXztYNozQDzuUxrECAw5sE/jOcHETQprTGfgIJN3KmuxQSYAsm8ZV6sFTdLKOzubGJusUHQYmlikjX7VxMNqO1oA4R/IpPeNNk3tTh4Q9AI7v5v65eV/LJ69DCqultA/KapHt9PLVG7qjDHlTd9qDxo2ZFhp4CoG4rcfl+temSYnlxWDu1O8ioSMrz4Aez9yXmz5Awxl62yja8xEe1t32c/U3NsErBl/i4AbYnxclUChouC3dJb4VvhgYw== X-Forefront-Antispam-Report-Untrusted: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:AM6PR08MB4900.eurprd08.prod.outlook.com; PTR:; CAT:NONE; SFS:(4636009)(39860400002)(376002)(366004)(396003)(136003)(346002)(6496006)(44832011)(16526019)(186003)(478600001)(54906003)(2906002)(33964004)(316002)(52116002)(66946007)(66556008)(66476007)(66616009)(5660300002)(26005)(6486002)(8676002)(36756003)(235185007)(4326008)(956004)(2616005)(8936002)(86362001)(6916009); DIR:OUT; SFP:1101; X-MS-Exchange-AntiSpam-MessageData: 2m58dUE1szaZqqMNiSmUtRrZBE92+AC4Srab8HcNI6axRzdAABLtG3GerZ92Ij2OViwJyk3PcJ5RAG/ppsqFwREfm7+urH7jV9Xy5LIKjMVSx4yEcqQ+JJlUQHE5c8CyIGJqdl9V6p58deTMcvqRmK1l1bUt6uP6rQSAFDU/pM2pkAHoN6D6/44p8U5jC2cyUGfuo2v8O18KYyItqOaLuFKhLBp/p6n9TsYwSfrbN5DTfNDnv7U3pR8UYT+obZH/7iA2gBJCfg8x2o22WMuvucG9fvvYzsN2wE44jcEKfWpX8aF343G4gwzrEERMnVXuWm/qKkIZi+bdSqJM6pnq5qaW4bY1xU5QSLKIA/J2YzScdDvqGTv9J/qoIBXexDCRIAI0xOd4Niu7S6tChQ4djpdBj/9icn9CExOQafGXPfj3FgQMY2a+Fm7Uh1T/rXSC+0PeXWQLp34L67U/83Xgo6BysDL7PLlSbdeMDM4NNiJKBOq9rgbgTaVb8KrroWolkL58wOG+IGnRw4I3cm6sK3QzHuGk8kM5CFx0WxtRv3GqumSfNxYUA19DZVPcw4FQs+wCSDKCg5ZJNf71KTMtBewas7Ig0/2aK1ikeyhURs8WJBgURNh3pvWMIBfsSFf3wyQdnb6DLaofqIlX9wfpFQ== X-MS-Exchange-Transport-CrossTenantHeadersStamped: AM6PR08MB3496 Original-Authentication-Results: arm.com; dkim=none (message not signed) header.d=none;arm.com; dmarc=none action=none header.from=arm.com; X-EOPAttributedMessage: 0 X-MS-Exchange-Transport-CrossTenantHeadersStripped: DB5EUR03FT019.eop-EUR03.prod.protection.outlook.com X-MS-Office365-Filtering-Correlation-Id-Prvs: 49500b8d-12a1-435b-61c3-08d87b32dab4 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: 9nm+Md/8E9mf0pnCSaWaCBPy1p/rYph3a38/JQQYIQCw2QGHHbnlPsm26DNcMYZwuGiTqNyLfwTK18AmA+7fwKfZMBZlNZRgG7pkrtF1HWV10t1KfvJmtQQXEV4a8Jakn5n35taNTXPOWxfJUYe/J5/feIOIVnSd4JZUGVswyWU75jFd1xiMX/yOOnKrWEhzBMgoFTtppgVlxLLWEeRxXFySNTAP2A0a+VRDOmX1R1QPmoZ6Zhg/kSC2eCVFwXFuOHEmW6YbUABmVpqw98cABcFP1W6PtPx1gkQPuWyCb8TL7TUiVAM5hApz9Plu5P1hu5L/ZwtCBHYeVJYUVbqPBq8juvxbsxuOMZeMVSt/vyhOegsnHT1Mnk2LeDaL5PvROQm4s24XJ05mSMIZ/sHJtaB3E8O9TRNt6JZ7aGYpyKXm0pOd7poNj6IptqRVG1ezYZmhEiBx4aIatM1V0Vh83tQABQ2GzvTQxJ/IBO30zCs= X-Forefront-Antispam-Report: CIP:63.35.35.123; CTRY:IE; LANG:en; SCL:1; SRV:; IPV:CAL; SFV:NSPM; H:64aa7808-outbound-1.mta.getcheckrecipient.com; PTR:ec2-63-35-35-123.eu-west-1.compute.amazonaws.com; CAT:NONE; SFS:(4636009)(376002)(346002)(396003)(39860400002)(136003)(46966005)(235185007)(66616009)(70206006)(186003)(6496006)(5660300002)(54906003)(316002)(70586007)(44832011)(16526019)(33964004)(26005)(336012)(82740400003)(2616005)(81166007)(478600001)(4326008)(86362001)(6486002)(6916009)(82310400003)(356005)(8936002)(36756003)(956004)(2906002)(8676002)(47076004); DIR:OUT; SFP:1101; X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 28 Oct 2020 11:16:09.9877 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 4bd20732-7072-4b9b-d146-08d87b32df6c X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d; Ip=[63.35.35.123]; Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com] X-MS-Exchange-CrossTenant-AuthSource: DB5EUR03FT019.eop-EUR03.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: VI1PR08MB5309 X-Spam-Status: No, score=-14.8 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, GIT_PATCH_0, KAM_SHORT, MSGID_FROM_MTA_HEADER, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_PASS, TXREP, UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Andrea Corallo via Gcc-patches From: Andrea Corallo Reply-To: Andrea Corallo Cc: Richard Earnshaw , nd , richard.sandiford@arm.com Errors-To: gcc-patches-bounces@gcc.gnu.org Sender: "Gcc-patches" Andrea Corallo via Gcc-patches writes: > Hi all, > > Second version of the patch here implementing the bfloat16_t neon > related store intrinsics: vst2_lane_bf16, vst2q_lane_bf16, > vst3_lane_bf16, vst3q_lane_bf16 vst4_lane_bf16, vst4q_lane_bf16. > > Please see refer to: > ACLE > ISA > > This better narrows testcases so they do not cause regressions for the > arm backend where these intrinsics are not yet present. > > Please see refer to: > ACLE > ISA > Hi all, third version of this patch following the suggestions got for its sister patch Regtested and bootstrapped. Okay for trunk and 10? Thanks! Andrea From 55535eada983c4be9cd6a4ba26afec685c01ba91 Mon Sep 17 00:00:00 2001 From: Andrea Corallo Date: Thu, 8 Oct 2020 11:02:09 +0200 Subject: [PATCH] aarch64: Add vstN_lane_bf16 + vstNq_lane_bf16 intrinsics gcc/ChangeLog 2020-10-19 Andrea Corallo * config/aarch64/arm_neon.h (__ST2_LANE_FUNC, __ST3_LANE_FUNC) (__ST4_LANE_FUNC): Rename the macro generating the 'q' variants into __ST2Q_LANE_FUNC, __ST2Q_LANE_FUNC, __ST2Q_LANE_FUNC so they all can be undefed at the and of the file. (vst2_lane_bf16, vst2q_lane_bf16, vst3_lane_bf16, vst3q_lane_bf16) (vst4_lane_bf16, vst4q_lane_bf16): Add new intrinsics. gcc/testsuite/ChangeLog 2020-10-19 Andrea Corallo * gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h (hbfloat16_t): Define type. (CHECK_FP): Make it working for bfloat types. * gcc.target/aarch64/advsimd-intrinsics/bf16_vstN_lane_1.c: New file. * gcc.target/aarch64/advsimd-intrinsics/bf16_vstN_lane_2.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vst2_lane_bf16_indices_1.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vst2q_lane_bf16_indices_1.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vst3_lane_bf16_indices_1.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vst3q_lane_bf16_indices_1.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vst4_lane_bf16_indices_1.c: Likewise. * gcc.target/aarch64/advsimd-intrinsics/vst4q_lane_bf16_indices_1.c: Likewise. --- gcc/config/aarch64/arm_neon.h | 110 +++++---- .../aarch64/advsimd-intrinsics/arm-neon-ref.h | 4 +- .../advsimd-intrinsics/bf16_vstN_lane_1.c | 227 ++++++++++++++++++ .../advsimd-intrinsics/bf16_vstN_lane_2.c | 52 ++++ .../vst2_lane_bf16_indices_1.c | 16 ++ .../vst2q_lane_bf16_indices_1.c | 16 ++ .../vst3_lane_bf16_indices_1.c | 16 ++ .../vst3q_lane_bf16_indices_1.c | 16 ++ .../vst4_lane_bf16_indices_1.c | 16 ++ .../vst4q_lane_bf16_indices_1.c | 16 ++ 10 files changed, 440 insertions(+), 49 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bf16_vstN_lane_1.c create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bf16_vstN_lane_2.c create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst2_lane_bf16_indices_1.c create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst2q_lane_bf16_indices_1.c create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst3_lane_bf16_indices_1.c create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst3q_lane_bf16_indices_1.c create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst4_lane_bf16_indices_1.c create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst4q_lane_bf16_indices_1.c diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index 8b380201553..7071610e90c 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -10926,8 +10926,7 @@ __ST2_LANE_FUNC (uint32x2x2_t, uint32x4x2_t, uint32_t, v2si, v4si, si, u32, __ST2_LANE_FUNC (uint64x1x2_t, uint64x2x2_t, uint64_t, di, v2di, di, u64, int64x2_t) -#undef __ST2_LANE_FUNC -#define __ST2_LANE_FUNC(intype, ptrtype, mode, ptr_mode, funcsuffix) \ +#define __ST2Q_LANE_FUNC(intype, ptrtype, mode, ptr_mode, funcsuffix) \ __extension__ extern __inline void \ __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) \ vst2q_lane_ ## funcsuffix (ptrtype *__ptr, \ @@ -10939,20 +10938,20 @@ vst2q_lane_ ## funcsuffix (ptrtype *__ptr, \ __ptr, __temp.__o, __c); \ } -__ST2_LANE_FUNC (float16x8x2_t, float16_t, v8hf, hf, f16) -__ST2_LANE_FUNC (float32x4x2_t, float32_t, v4sf, sf, f32) -__ST2_LANE_FUNC (float64x2x2_t, float64_t, v2df, df, f64) -__ST2_LANE_FUNC (poly8x16x2_t, poly8_t, v16qi, qi, p8) -__ST2_LANE_FUNC (poly16x8x2_t, poly16_t, v8hi, hi, p16) -__ST2_LANE_FUNC (poly64x2x2_t, poly64_t, v2di, di, p64) -__ST2_LANE_FUNC (int8x16x2_t, int8_t, v16qi, qi, s8) -__ST2_LANE_FUNC (int16x8x2_t, int16_t, v8hi, hi, s16) -__ST2_LANE_FUNC (int32x4x2_t, int32_t, v4si, si, s32) -__ST2_LANE_FUNC (int64x2x2_t, int64_t, v2di, di, s64) -__ST2_LANE_FUNC (uint8x16x2_t, uint8_t, v16qi, qi, u8) -__ST2_LANE_FUNC (uint16x8x2_t, uint16_t, v8hi, hi, u16) -__ST2_LANE_FUNC (uint32x4x2_t, uint32_t, v4si, si, u32) -__ST2_LANE_FUNC (uint64x2x2_t, uint64_t, v2di, di, u64) +__ST2Q_LANE_FUNC (float16x8x2_t, float16_t, v8hf, hf, f16) +__ST2Q_LANE_FUNC (float32x4x2_t, float32_t, v4sf, sf, f32) +__ST2Q_LANE_FUNC (float64x2x2_t, float64_t, v2df, df, f64) +__ST2Q_LANE_FUNC (poly8x16x2_t, poly8_t, v16qi, qi, p8) +__ST2Q_LANE_FUNC (poly16x8x2_t, poly16_t, v8hi, hi, p16) +__ST2Q_LANE_FUNC (poly64x2x2_t, poly64_t, v2di, di, p64) +__ST2Q_LANE_FUNC (int8x16x2_t, int8_t, v16qi, qi, s8) +__ST2Q_LANE_FUNC (int16x8x2_t, int16_t, v8hi, hi, s16) +__ST2Q_LANE_FUNC (int32x4x2_t, int32_t, v4si, si, s32) +__ST2Q_LANE_FUNC (int64x2x2_t, int64_t, v2di, di, s64) +__ST2Q_LANE_FUNC (uint8x16x2_t, uint8_t, v16qi, qi, u8) +__ST2Q_LANE_FUNC (uint16x8x2_t, uint16_t, v8hi, hi, u16) +__ST2Q_LANE_FUNC (uint32x4x2_t, uint32_t, v4si, si, u32) +__ST2Q_LANE_FUNC (uint64x2x2_t, uint64_t, v2di, di, u64) #define __ST3_LANE_FUNC(intype, largetype, ptrtype, mode, \ qmode, ptr_mode, funcsuffix, signedtype) \ @@ -11011,8 +11010,7 @@ __ST3_LANE_FUNC (uint32x2x3_t, uint32x4x3_t, uint32_t, v2si, v4si, si, u32, __ST3_LANE_FUNC (uint64x1x3_t, uint64x2x3_t, uint64_t, di, v2di, di, u64, int64x2_t) -#undef __ST3_LANE_FUNC -#define __ST3_LANE_FUNC(intype, ptrtype, mode, ptr_mode, funcsuffix) \ +#define __ST3Q_LANE_FUNC(intype, ptrtype, mode, ptr_mode, funcsuffix) \ __extension__ extern __inline void \ __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) \ vst3q_lane_ ## funcsuffix (ptrtype *__ptr, \ @@ -11024,20 +11022,20 @@ vst3q_lane_ ## funcsuffix (ptrtype *__ptr, \ __ptr, __temp.__o, __c); \ } -__ST3_LANE_FUNC (float16x8x3_t, float16_t, v8hf, hf, f16) -__ST3_LANE_FUNC (float32x4x3_t, float32_t, v4sf, sf, f32) -__ST3_LANE_FUNC (float64x2x3_t, float64_t, v2df, df, f64) -__ST3_LANE_FUNC (poly8x16x3_t, poly8_t, v16qi, qi, p8) -__ST3_LANE_FUNC (poly16x8x3_t, poly16_t, v8hi, hi, p16) -__ST3_LANE_FUNC (poly64x2x3_t, poly64_t, v2di, di, p64) -__ST3_LANE_FUNC (int8x16x3_t, int8_t, v16qi, qi, s8) -__ST3_LANE_FUNC (int16x8x3_t, int16_t, v8hi, hi, s16) -__ST3_LANE_FUNC (int32x4x3_t, int32_t, v4si, si, s32) -__ST3_LANE_FUNC (int64x2x3_t, int64_t, v2di, di, s64) -__ST3_LANE_FUNC (uint8x16x3_t, uint8_t, v16qi, qi, u8) -__ST3_LANE_FUNC (uint16x8x3_t, uint16_t, v8hi, hi, u16) -__ST3_LANE_FUNC (uint32x4x3_t, uint32_t, v4si, si, u32) -__ST3_LANE_FUNC (uint64x2x3_t, uint64_t, v2di, di, u64) +__ST3Q_LANE_FUNC (float16x8x3_t, float16_t, v8hf, hf, f16) +__ST3Q_LANE_FUNC (float32x4x3_t, float32_t, v4sf, sf, f32) +__ST3Q_LANE_FUNC (float64x2x3_t, float64_t, v2df, df, f64) +__ST3Q_LANE_FUNC (poly8x16x3_t, poly8_t, v16qi, qi, p8) +__ST3Q_LANE_FUNC (poly16x8x3_t, poly16_t, v8hi, hi, p16) +__ST3Q_LANE_FUNC (poly64x2x3_t, poly64_t, v2di, di, p64) +__ST3Q_LANE_FUNC (int8x16x3_t, int8_t, v16qi, qi, s8) +__ST3Q_LANE_FUNC (int16x8x3_t, int16_t, v8hi, hi, s16) +__ST3Q_LANE_FUNC (int32x4x3_t, int32_t, v4si, si, s32) +__ST3Q_LANE_FUNC (int64x2x3_t, int64_t, v2di, di, s64) +__ST3Q_LANE_FUNC (uint8x16x3_t, uint8_t, v16qi, qi, u8) +__ST3Q_LANE_FUNC (uint16x8x3_t, uint16_t, v8hi, hi, u16) +__ST3Q_LANE_FUNC (uint32x4x3_t, uint32_t, v4si, si, u32) +__ST3Q_LANE_FUNC (uint64x2x3_t, uint64_t, v2di, di, u64) #define __ST4_LANE_FUNC(intype, largetype, ptrtype, mode, \ qmode, ptr_mode, funcsuffix, signedtype) \ @@ -11101,8 +11099,7 @@ __ST4_LANE_FUNC (uint32x2x4_t, uint32x4x4_t, uint32_t, v2si, v4si, si, u32, __ST4_LANE_FUNC (uint64x1x4_t, uint64x2x4_t, uint64_t, di, v2di, di, u64, int64x2_t) -#undef __ST4_LANE_FUNC -#define __ST4_LANE_FUNC(intype, ptrtype, mode, ptr_mode, funcsuffix) \ +#define __ST4Q_LANE_FUNC(intype, ptrtype, mode, ptr_mode, funcsuffix) \ __extension__ extern __inline void \ __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) \ vst4q_lane_ ## funcsuffix (ptrtype *__ptr, \ @@ -11114,20 +11111,20 @@ vst4q_lane_ ## funcsuffix (ptrtype *__ptr, \ __ptr, __temp.__o, __c); \ } -__ST4_LANE_FUNC (float16x8x4_t, float16_t, v8hf, hf, f16) -__ST4_LANE_FUNC (float32x4x4_t, float32_t, v4sf, sf, f32) -__ST4_LANE_FUNC (float64x2x4_t, float64_t, v2df, df, f64) -__ST4_LANE_FUNC (poly8x16x4_t, poly8_t, v16qi, qi, p8) -__ST4_LANE_FUNC (poly16x8x4_t, poly16_t, v8hi, hi, p16) -__ST4_LANE_FUNC (poly64x2x4_t, poly64_t, v2di, di, p64) -__ST4_LANE_FUNC (int8x16x4_t, int8_t, v16qi, qi, s8) -__ST4_LANE_FUNC (int16x8x4_t, int16_t, v8hi, hi, s16) -__ST4_LANE_FUNC (int32x4x4_t, int32_t, v4si, si, s32) -__ST4_LANE_FUNC (int64x2x4_t, int64_t, v2di, di, s64) -__ST4_LANE_FUNC (uint8x16x4_t, uint8_t, v16qi, qi, u8) -__ST4_LANE_FUNC (uint16x8x4_t, uint16_t, v8hi, hi, u16) -__ST4_LANE_FUNC (uint32x4x4_t, uint32_t, v4si, si, u32) -__ST4_LANE_FUNC (uint64x2x4_t, uint64_t, v2di, di, u64) +__ST4Q_LANE_FUNC (float16x8x4_t, float16_t, v8hf, hf, f16) +__ST4Q_LANE_FUNC (float32x4x4_t, float32_t, v4sf, sf, f32) +__ST4Q_LANE_FUNC (float64x2x4_t, float64_t, v2df, df, f64) +__ST4Q_LANE_FUNC (poly8x16x4_t, poly8_t, v16qi, qi, p8) +__ST4Q_LANE_FUNC (poly16x8x4_t, poly16_t, v8hi, hi, p16) +__ST4Q_LANE_FUNC (poly64x2x4_t, poly64_t, v2di, di, p64) +__ST4Q_LANE_FUNC (int8x16x4_t, int8_t, v16qi, qi, s8) +__ST4Q_LANE_FUNC (int16x8x4_t, int16_t, v8hi, hi, s16) +__ST4Q_LANE_FUNC (int32x4x4_t, int32_t, v4si, si, s32) +__ST4Q_LANE_FUNC (int64x2x4_t, int64_t, v2di, di, s64) +__ST4Q_LANE_FUNC (uint8x16x4_t, uint8_t, v16qi, qi, u8) +__ST4Q_LANE_FUNC (uint16x8x4_t, uint16_t, v8hi, hi, u16) +__ST4Q_LANE_FUNC (uint32x4x4_t, uint32_t, v4si, si, u32) +__ST4Q_LANE_FUNC (uint64x2x4_t, uint64_t, v2di, di, u64) __extension__ extern __inline int64_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) @@ -35749,6 +35746,17 @@ __LD3Q_LANE_FUNC (bfloat16x8x3_t, bfloat16x8_t, bfloat16_t, v8bf, bf, bf16) __LD4_LANE_FUNC (bfloat16x4x4_t, bfloat16x4_t, bfloat16x8x4_t, bfloat16_t, v4bf, v8bf, bf, bf16, bfloat16x8_t) __LD4Q_LANE_FUNC (bfloat16x8x4_t, bfloat16x8_t, bfloat16_t, v8bf, bf, bf16) + +__ST2_LANE_FUNC (bfloat16x4x2_t, bfloat16x8x2_t, bfloat16_t, v4bf, v8bf, bf, + bf16, bfloat16x8_t) +__ST2Q_LANE_FUNC (bfloat16x8x2_t, bfloat16_t, v8bf, bf, bf16) +__ST3_LANE_FUNC (bfloat16x4x3_t, bfloat16x8x3_t, bfloat16_t, v4bf, v8bf, bf, + bf16, bfloat16x8_t) +__ST3Q_LANE_FUNC (bfloat16x8x3_t, bfloat16_t, v8bf, bf, bf16) +__ST4_LANE_FUNC (bfloat16x4x4_t, bfloat16x8x4_t, bfloat16_t, v4bf, v8bf, bf, + bf16, bfloat16x8_t) +__ST4Q_LANE_FUNC (bfloat16x8x4_t, bfloat16_t, v8bf, bf, bf16) + #pragma GCC pop_options /* AdvSIMD 8-bit Integer Matrix Multiply (I8MM) intrinsics. */ @@ -35968,5 +35976,11 @@ vaddq_p128 (poly128_t __a, poly128_t __b) #undef __LD3Q_LANE_FUNC #undef __LD4_LANE_FUNC #undef __LD4Q_LANE_FUNC +#undef __ST2_LANE_FUNC +#undef __ST2Q_LANE_FUNC +#undef __ST3_LANE_FUNC +#undef __ST3Q_LANE_FUNC +#undef __ST4_LANE_FUNC +#undef __ST4Q_LANE_FUNC #endif diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h index 791972c737e..61fe7e759dc 100644 --- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h @@ -11,6 +11,8 @@ typedef uint16_t hfloat16_t; typedef uint32_t hfloat32_t; typedef uint64_t hfloat64_t; +typedef uint16_t hbfloat16_t; + extern void abort(void); extern void *memset(void *, int, size_t); extern void *memcpy(void *, const void *, size_t); @@ -107,7 +109,7 @@ extern size_t strlen(const char *); { \ union fp_operand { \ uint##W##_t i; \ - float##W##_t f; \ + T##W##_t f; \ } tmp_res, tmp_exp; \ tmp_res.f = VECT_VAR(result, T, W, N)[i]; \ tmp_exp.i = VECT_VAR(EXPECTED, h##T, W, N)[i]; \ diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bf16_vstN_lane_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bf16_vstN_lane_1.c new file mode 100644 index 00000000000..2c70bb9de9c --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bf16_vstN_lane_1.c @@ -0,0 +1,227 @@ +/* { dg-do run { target { aarch64*-*-* } } } */ +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */ +/* { dg-add-options arm_v8_2a_bf16_neon } */ + +#include +#include "arm-neon-ref.h" +#include "compute-ref-data.h" + +/* Expected results for vst2, chunk 0. */ +VECT_VAR_DECL(expected_st2_0,hbfloat,16,4) [] = { 0xABAB, 0x3210, 0x0, 0x0 }; +VECT_VAR_DECL(expected_st2_0,hbfloat,16,8) [] = { 0xABAB, 0x3210, 0x0, 0x0, + 0x0, 0x0, 0x0, 0x0 }; + +/* Expected results for vst2, chunk 1. */ +VECT_VAR_DECL(expected_st2_1,hbfloat,16,4) [] = { 0x0, 0x0, 0x0, 0x0 }; +VECT_VAR_DECL(expected_st2_1,hbfloat,16,8) [] = { 0x0, 0x0, 0x0, 0x0, + 0x0, 0x0, 0x0, 0x0 }; + +/* Expected results for vst3, chunk 0. */ +VECT_VAR_DECL(expected_st3_0,hbfloat,16,4) [] = { 0xABAB, 0x3210, 0xCAFE, 0x0 }; +VECT_VAR_DECL(expected_st3_0,hbfloat,16,8) [] = { 0xABAB, 0x3210, 0xCAFE, 0x0, + 0x0, 0x0, 0x0, 0x0 }; + +/* Expected results for vst3, chunk 1. */ +VECT_VAR_DECL(expected_st3_1,hbfloat,16,4) [] = { 0x0, 0x0, 0x0, 0x0 }; +VECT_VAR_DECL(expected_st3_1,hbfloat,16,8) [] = { 0x0, 0x0, 0x0, 0x0, + 0x0, 0x0, 0x0, 0x0 }; + +/* Expected results for vst3, chunk 2. */ +VECT_VAR_DECL(expected_st3_2,hbfloat,16,4) [] = { 0x0, 0x0, 0x0, 0x0 }; +VECT_VAR_DECL(expected_st3_2,hbfloat,16,8) [] = { 0x0, 0x0, 0x0, 0x0, + 0x0, 0x0, 0x0, 0x0 }; + +/* Expected results for vst4, chunk 0. */ +VECT_VAR_DECL(expected_st4_0,hbfloat,16,4) [] = + { 0xABAB, 0x3210, 0xCAFE, 0x1234 }; +VECT_VAR_DECL(expected_st4_0,hbfloat,16,8) [] = + { 0xABAB, 0x3210, 0xCAFE, 0x1234, 0x0, 0x0, 0x0, 0x0 }; + +/* Expected results for vst4, chunk 1. */ +VECT_VAR_DECL(expected_st4_1,hbfloat,16,4) [] = { 0x0, 0x0, 0x0, 0x0 }; +VECT_VAR_DECL(expected_st4_1,hbfloat,16,8) [] = { 0x0, 0x0, 0x0, 0x0, + 0x0, 0x0, 0x0, 0x0 }; + +/* Expected results for vst4, chunk 2. */ +VECT_VAR_DECL(expected_st4_2,hbfloat,16,4) [] = { 0x0, 0x0, 0x0, 0x0 }; +VECT_VAR_DECL(expected_st4_2,hbfloat,16,8) [] = { 0x0, 0x0, 0x0, 0x0, + 0x0, 0x0, 0x0, 0x0 }; + +/* Expected results for vst4, chunk 3. */ +VECT_VAR_DECL(expected_st4_3,hbfloat,16,4) [] = { 0x0, 0x0, 0x0, 0x0 }; +VECT_VAR_DECL(expected_st4_3,hbfloat,16,8) [] = { 0x0, 0x0, 0x0, 0x0, + 0x0, 0x0, 0x0, 0x0 }; + +typedef union +{ + bfloat16_t bf16; + uint16_t u16; +} bfloat16_u_t; + +static bfloat16_t result_bfloat16x4[4]; +static bfloat16_t result_bfloat16x8[8]; + +void exec_vstX_lane (void) +{ + bfloat16_u_t bfloat16_data[4]; + bfloat16_data[0].u16 = 0xABAB; + bfloat16_data[1].u16 = 0x3210; + bfloat16_data[2].u16 = 0xCAFE; + bfloat16_data[3].u16 = 0x1234; + + bfloat16_t buffer_vld2_lane_bfloat16x2 [2] = + { bfloat16_data[0].bf16, + bfloat16_data[1].bf16 }; + bfloat16_t buffer_vld3_lane_bfloat16x3 [3] = + { bfloat16_data[0].bf16, + bfloat16_data[1].bf16, + bfloat16_data[2].bf16 }; + bfloat16_t buffer_vld4_lane_bfloat16x4 [4] = + { bfloat16_data[0].bf16, + bfloat16_data[1].bf16, + bfloat16_data[2].bf16, + bfloat16_data[3].bf16 }; + + /* In this case, input variables are arrays of vectors. */ +#define DECL_VSTX_LANE(T1, W, N, X) \ + VECT_ARRAY_TYPE(T1, W, N, X) VECT_ARRAY_VAR(vector, T1, W, N, X); \ + VECT_ARRAY_TYPE(T1, W, N, X) VECT_ARRAY_VAR(vector_src, T1, W, N, X); \ + VECT_VAR_DECL(result_bis_##X, T1, W, N)[X * N] + + /* We need to use a temporary result buffer (result_bis), because + the one used for other tests is not large enough. A subset of the + result data is moved from result_bis to result, and it is this + subset which is used to check the actual behavior. The next + macro enables to move another chunk of data from result_bis to + result. */ + /* We also use another extra input buffer (buffer_src), which we + fill with 0xAA, and which it used to load a vector from which we + read a given lane. */ +#define TEST_VSTX_LANE(Q, T1, T2, W, N, X, L) \ + memset (VECT_VAR(buffer_src, T1, W, N), 0xAA, \ + sizeof(VECT_VAR(buffer_src, T1, W, N))); \ + memset (VECT_VAR(result_bis_##X, T1, W, N), 0, \ + sizeof(VECT_VAR(result_bis_##X, T1, W, N))); \ + \ + VECT_ARRAY_VAR(vector_src, T1, W, N, X) = \ + vld##X##Q##_##T2##W(VECT_VAR(buffer_src, T1, W, N)); \ + \ + VECT_ARRAY_VAR(vector, T1, W, N, X) = \ + /* Use dedicated init buffer, of size X. */ \ + vld##X##Q##_lane_##T2##W(VECT_VAR(buffer_vld##X##_lane, T1, W, X), \ + VECT_ARRAY_VAR(vector_src, T1, W, N, X), \ + L); \ + vst##X##Q##_lane_##T2##W(VECT_VAR(result_bis_##X, T1, W, N), \ + VECT_ARRAY_VAR(vector, T1, W, N, X), \ + L); \ + memcpy(VECT_VAR(result, T1, W, N), VECT_VAR(result_bis_##X, T1, W, N), \ + sizeof(VECT_VAR(result, T1, W, N))); + + /* Overwrite "result" with the contents of "result_bis"[Y]. */ +#define TEST_EXTRA_CHUNK(T1, W, N, X, Y) \ + memcpy(VECT_VAR(result, T1, W, N), \ + &(VECT_VAR(result_bis_##X, T1, W, N)[Y*N]), \ + sizeof(VECT_VAR(result, T1, W, N))); + +#define DUMMY_ARRAY(V, T, W, N, L) VECT_VAR_DECL(V,T,W,N)[N*L] + + DECL_VSTX_LANE(bfloat, 16, 4, 2); + DECL_VSTX_LANE(bfloat, 16, 8, 2); + DECL_VSTX_LANE(bfloat, 16, 4, 3); + DECL_VSTX_LANE(bfloat, 16, 8, 3); + DECL_VSTX_LANE(bfloat, 16, 4, 4); + DECL_VSTX_LANE(bfloat, 16, 8, 4); + + DUMMY_ARRAY(buffer_src, bfloat, 16, 4, 4); + DUMMY_ARRAY(buffer_src, bfloat, 16, 8, 4); + + /* Check vst2_lane/vst2q_lane. */ + clean_results (); + TEST_VSTX_LANE(, bfloat, bf, 16, 4, 2, 2); + TEST_VSTX_LANE(q, bfloat, bf, 16, 8, 2, 6); + +#undef CMT +#define CMT " (chunk 0)" +#undef TEST_MSG +#define TEST_MSG "VST2_LANE/VST2Q_LANE" + CHECK_FP(TEST_MSG, bfloat, 16, 4, PRIx16, expected_st2_0, CMT); + CHECK_FP(TEST_MSG, bfloat, 16, 8, PRIx16, expected_st2_0, CMT); + TEST_EXTRA_CHUNK(bfloat, 16, 4, 2, 1); + TEST_EXTRA_CHUNK(bfloat, 16, 8, 2, 1); + +#undef CMT +#define CMT " (chunk 1)" + CHECK_FP(TEST_MSG, bfloat, 16, 4, PRIx16, expected_st2_1, CMT); + CHECK_FP(TEST_MSG, bfloat, 16, 8, PRIx16, expected_st2_1, CMT); + + /* Check vst3_lane/vst3q_lane. */ + clean_results (); +#undef TEST_MSG +#define TEST_MSG "VST3_LANE/VST3Q_LANE" + TEST_VSTX_LANE(, bfloat, bf, 16, 4, 3, 2); + TEST_VSTX_LANE(q, bfloat, bf, 16, 8, 3, 6); + +#undef CMT +#define CMT " (chunk 0)" + CHECK_FP(TEST_MSG, bfloat, 16, 4, PRIx16, expected_st3_0, CMT); + CHECK_FP(TEST_MSG, bfloat, 16, 8, PRIx16, expected_st3_0, CMT); + + TEST_EXTRA_CHUNK(bfloat, 16, 4, 3, 1); + TEST_EXTRA_CHUNK(bfloat, 16, 8, 3, 1); + + +#undef CMT +#define CMT " (chunk 1)" + CHECK_FP(TEST_MSG, bfloat, 16, 4, PRIx16, expected_st3_1, CMT); + CHECK_FP(TEST_MSG, bfloat, 16, 8, PRIx16, expected_st3_1, CMT); + + TEST_EXTRA_CHUNK(bfloat, 16, 4, 3, 2); + TEST_EXTRA_CHUNK(bfloat, 16, 8, 3, 2); + +#undef CMT +#define CMT " (chunk 2)" + CHECK_FP(TEST_MSG, bfloat, 16, 4, PRIx16, expected_st3_2, CMT); + CHECK_FP(TEST_MSG, bfloat, 16, 8, PRIx16, expected_st3_2, CMT); + + /* Check vst4_lane/vst4q_lane. */ + clean_results (); +#undef TEST_MSG +#define TEST_MSG "VST4_LANE/VST4Q_LANE" + TEST_VSTX_LANE(, bfloat, bf, 16, 4, 4, 2); + TEST_VSTX_LANE(q, bfloat, bf, 16, 8, 4, 6); + +#undef CMT +#define CMT " (chunk 0)" + CHECK_FP(TEST_MSG, bfloat, 16, 4, PRIx16, expected_st4_0, CMT); + CHECK_FP(TEST_MSG, bfloat, 16, 8, PRIx16, expected_st4_0, CMT); + + TEST_EXTRA_CHUNK(bfloat, 16, 4, 4, 1); + TEST_EXTRA_CHUNK(bfloat, 16, 8, 4, 1); + +#undef CMT +#define CMT " (chunk 1)" + CHECK_FP(TEST_MSG, bfloat, 16, 4, PRIx16, expected_st4_1, CMT); + CHECK_FP(TEST_MSG, bfloat, 16, 8, PRIx16, expected_st4_1, CMT); + + TEST_EXTRA_CHUNK(bfloat, 16, 4, 4, 2); + TEST_EXTRA_CHUNK(bfloat, 16, 8, 4, 2); + +#undef CMT +#define CMT " (chunk 2)" + CHECK_FP(TEST_MSG, bfloat, 16, 4, PRIx16, expected_st4_2, CMT); + CHECK_FP(TEST_MSG, bfloat, 16, 8, PRIx16, expected_st4_2, CMT); + + TEST_EXTRA_CHUNK(bfloat, 16, 4, 4, 3); + TEST_EXTRA_CHUNK(bfloat, 16, 8, 4, 3); + +#undef CMT +#define CMT " (chunk 3)" + CHECK_FP(TEST_MSG, bfloat, 16, 4, PRIx16, expected_st4_3, CMT); + CHECK_FP(TEST_MSG, bfloat, 16, 8, PRIx16, expected_st4_3, CMT); +} + +int main (void) +{ + exec_vstX_lane (); + return 0; +} diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bf16_vstN_lane_2.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bf16_vstN_lane_2.c new file mode 100644 index 00000000000..f70c34dbd83 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bf16_vstN_lane_2.c @@ -0,0 +1,52 @@ +/* { dg-do assemble { target { aarch64*-*-* } } } */ +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */ +/* { dg-add-options arm_v8_2a_bf16_neon } */ +/* { dg-additional-options "-O2 --save-temps" } */ + +#include + +void +test_vst2_lane_bf16 (bfloat16_t *ptr, bfloat16x4x2_t b) +{ + vst2_lane_bf16 (ptr, b, 2); +} + +/* { dg-final { scan-assembler-times "st2\\t{v2.h - v3.h}\\\[2\\\], \\\[x0\\\]" 1 } } */ + +void +test_vst2q_lane_bf16 (bfloat16_t *ptr, bfloat16x8x2_t b) +{ + vst2q_lane_bf16 (ptr, b, 2); +} + +/* { dg-final { scan-assembler-times "st2\\t{v0.h - v1.h}\\\[2\\\], \\\[x0\\\]" 1 } } */ + +void +test_vst3_lane_bf16 (bfloat16_t *ptr, bfloat16x4x3_t b) +{ + vst3_lane_bf16 (ptr, b, 2); +} + +void +test_vst3q_lane_bf16 (bfloat16_t *ptr, bfloat16x8x3_t b) +{ + vst3q_lane_bf16 (ptr, b, 2); +} + +/* { dg-final { scan-assembler-times "st3\\t{v4.h - v6.h}\\\[2\\\], \\\[x0\\\]" 2 } } */ + +void +test_vst4_lane_bf16 (bfloat16_t *ptr, bfloat16x4x4_t b) +{ + vst4_lane_bf16 (ptr, b, 2); +} + +/* { dg-final { scan-assembler-times "st4\\t{v4.h - v7.h}\\\[2\\\], \\\[x0\\\]" 1 } } */ + +void +test_vst4q_lane_bf16 (bfloat16_t *ptr, bfloat16x8x4_t b) +{ + vst4q_lane_bf16 (ptr, b, 2); +} + +/* { dg-final { scan-assembler-times "st4\\t{v0.h - v3.h}\\\[2\\\], \\\[x0\\\]" 1 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst2_lane_bf16_indices_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst2_lane_bf16_indices_1.c new file mode 100644 index 00000000000..4579217dbf2 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst2_lane_bf16_indices_1.c @@ -0,0 +1,16 @@ +/* { dg-do compile { target { aarch64*-*-* } } } */ +/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */ +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */ +/* { dg-add-options arm_v8_2a_bf16_neon } */ + +#include + +void +f_vst2_lane_bf16 (bfloat16_t * p, bfloat16x4x2_t v) +{ + /* { dg-error "lane 4 out of range 0 - 3" "" { target *-*-* } 0 } */ + vst2_lane_bf16 (p, v, 4); + /* { dg-error "lane -1 out of range 0 - 3" "" { target *-*-* } 0 } */ + vst2_lane_bf16 (p, v, -1); + return; +} diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst2q_lane_bf16_indices_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst2q_lane_bf16_indices_1.c new file mode 100644 index 00000000000..29b72eae291 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst2q_lane_bf16_indices_1.c @@ -0,0 +1,16 @@ +/* { dg-do compile { target { aarch64*-*-* } } } */ +/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */ +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */ +/* { dg-add-options arm_v8_2a_bf16_neon } */ + +#include + +void +f_vst2q_lane_bf16 (bfloat16_t * p, bfloat16x8x2_t v) +{ + /* { dg-error "lane 8 out of range 0 - 7" "" { target *-*-* } 0 } */ + vst2q_lane_bf16 (p, v, 8); + /* { dg-error "lane -1 out of range 0 - 7" "" { target *-*-* } 0 } */ + vst2q_lane_bf16 (p, v, -1); + return; +} diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst3_lane_bf16_indices_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst3_lane_bf16_indices_1.c new file mode 100644 index 00000000000..ee0117f813a --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst3_lane_bf16_indices_1.c @@ -0,0 +1,16 @@ +/* { dg-do compile { target { aarch64*-*-* } } } */ +/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */ +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */ +/* { dg-add-options arm_v8_2a_bf16_neon } */ + +#include + +void +f_vst3_lane_bf16 (bfloat16_t * p, bfloat16x4x3_t v) +{ + /* { dg-error "lane 4 out of range 0 - 3" "" { target *-*-* } 0 } */ + vst3_lane_bf16 (p, v, 4); + /* { dg-error "lane -1 out of range 0 - 3" "" { target *-*-* } 0 } */ + vst3_lane_bf16 (p, v, -1); + return; +} diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst3q_lane_bf16_indices_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst3q_lane_bf16_indices_1.c new file mode 100644 index 00000000000..ae13a7f7f8d --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst3q_lane_bf16_indices_1.c @@ -0,0 +1,16 @@ +/* { dg-do compile { target { aarch64*-*-* } } } */ +/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */ +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */ +/* { dg-add-options arm_v8_2a_bf16_neon } */ + +#include + +void +f_vst3q_lane_bf16 (bfloat16_t * p, bfloat16x8x3_t v) +{ + /* { dg-error "lane 8 out of range 0 - 7" "" { target *-*-* } 0 } */ + vst3q_lane_bf16 (p, v, 8); + /* { dg-error "lane -1 out of range 0 - 7" "" { target *-*-* } 0 } */ + vst3q_lane_bf16 (p, v, -1); + return; +} diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst4_lane_bf16_indices_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst4_lane_bf16_indices_1.c new file mode 100644 index 00000000000..541bd311d53 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst4_lane_bf16_indices_1.c @@ -0,0 +1,16 @@ +/* { dg-do compile { target { aarch64*-*-* } } } */ +/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */ +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */ +/* { dg-add-options arm_v8_2a_bf16_neon } */ + +#include + +void +f_vst4_lane_bf16 (bfloat16_t * p, bfloat16x4x4_t v) +{ + /* { dg-error "lane 4 out of range 0 - 3" "" { target *-*-* } 0 } */ + vst4_lane_bf16 (p, v, 4); + /* { dg-error "lane -1 out of range 0 - 3" "" { target *-*-* } 0 } */ + vst4_lane_bf16 (p, v, -1); + return; +} diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst4q_lane_bf16_indices_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst4q_lane_bf16_indices_1.c new file mode 100644 index 00000000000..f3c42db34ec --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vst4q_lane_bf16_indices_1.c @@ -0,0 +1,16 @@ +/* { dg-do compile { target { aarch64*-*-* } } } */ +/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */ +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */ +/* { dg-add-options arm_v8_2a_bf16_neon } */ + +#include + +void +f_vst4q_lane_bf16 (bfloat16_t * p, bfloat16x8x4_t v) +{ + /* { dg-error "lane 8 out of range 0 - 7" "" { target *-*-* } 0 } */ + vst4q_lane_bf16 (p, v, 8); + /* { dg-error "lane -1 out of range 0 - 7" "" { target *-*-* } 0 } */ + vst4q_lane_bf16 (p, v, -1); + return; +}