From patchwork Fri Dec 20 18:44:08 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Delia Burduv X-Patchwork-Id: 1214276 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-516390-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=arm.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="F7ciVAi4"; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=armh.onmicrosoft.com header.i=@armh.onmicrosoft.com header.b="i/lE61DM"; dkim=fail reason="signature verification failed" (1024-bit key) header.d=armh.onmicrosoft.com header.i=@armh.onmicrosoft.com header.b="i/lE61DM"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 47fd2R28gFz9sP6 for ; Sat, 21 Dec 2019 05:44:31 +1100 (AEDT) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:content-type:mime-version; q=dns; s=default; b=L0mO/1XdCyump6RgDnUqUbl4kbp7hNrf+JTFOjYdUpUVQNwyOr oelDFa301DQ27sVaumYuuHFjhjuQ/z7YkLBtmf/1XpzvZp9lTGeVuLKfwI9piga2 5MhSPWa48nSa9fTUEHrGn2JwSnpFSMcepau0MlfbIs7qbtS9Uo+xn6Gh4= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:content-type:mime-version; s= default; bh=E3i3i+FHqiezMT0IjkeFqFeAzoc=; b=F7ciVAi4z394IIi+9WTL Ro1UKwSQ8WzjTtrQt+pQI/U6xgvcGnYdEaxNZ1A5rpvKgcihrjjcNBGVGHozbcft muTJVlRzjrbXkCv3kuxOodW3N5FTsjeVvy7WYZPuGOrP9SvJeP3iE0BNUd9sMW/x ydx8psBX7O16kXB2458VXIw= Received: (qmail 9975 invoked by alias); 20 Dec 2019 18:44:23 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 9966 invoked by uid 89); 20 Dec 2019 18:44:23 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-23.4 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_ASCII_DIVIDERS, KAM_LOTSOFHASH, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_PASS, SPF_PASS, UNPARSEABLE_RELAY autolearn=ham version=3.3.1 spammy= X-HELO: EUR01-HE1-obe.outbound.protection.outlook.com Received: from mail-eopbgr130057.outbound.protection.outlook.com (HELO EUR01-HE1-obe.outbound.protection.outlook.com) (40.107.13.57) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 20 Dec 2019 18:44:19 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=armh.onmicrosoft.com; s=selector2-armh-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=HNjM5dwerswOg7pSfLUtCrfSjtD5STGtvoxKQIe2OU0=; b=i/lE61DMKXCxU6VkxUE6nwgqgA8V6x2NfP2ETv+uKIWXZsCqT4T0LICgrXYjDGyjugIwoaQmMR0EpVKyHXn3m44+sB/zNXdeDFJvr8HBYKKfiBfIQvsQ4q0Y7jnrwN6prm+s8FIBTb65yN0Y8eIMo7UB+Sv0ytbHEHPUDzvhsGA= Received: from DB6PR0802CA0046.eurprd08.prod.outlook.com (2603:10a6:4:a3::32) by VI1PR0801MB1790.eurprd08.prod.outlook.com (2603:10a6:800:5b::15) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.2559.14; Fri, 20 Dec 2019 18:44:16 +0000 Received: from AM5EUR03FT053.eop-EUR03.prod.protection.outlook.com (2a01:111:f400:7e08::206) by DB6PR0802CA0046.outlook.office365.com (2603:10a6:4:a3::32) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.2559.14 via Frontend Transport; Fri, 20 Dec 2019 18:44:16 +0000 Authentication-Results: spf=pass (sender IP is 63.35.35.123) smtp.mailfrom=arm.com; gcc.gnu.org; dkim=pass (signature was verified) header.d=armh.onmicrosoft.com; gcc.gnu.org; dmarc=bestguesspass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 63.35.35.123 as permitted sender) receiver=protection.outlook.com; client-ip=63.35.35.123; helo=64aa7808-outbound-1.mta.getcheckrecipient.com; Received: from 64aa7808-outbound-1.mta.getcheckrecipient.com (63.35.35.123) by AM5EUR03FT053.mail.protection.outlook.com (10.152.16.210) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.2559.14 via Frontend Transport; Fri, 20 Dec 2019 18:44:16 +0000 Received: ("Tessian outbound 4f3bc9719026:v40"); Fri, 20 Dec 2019 18:44:16 +0000 X-CheckRecipientChecked: true X-CR-MTA-CID: 5a2a5eba23dc0b95 X-CR-MTA-TID: 64aa7808 Received: from 29d473f4f6ac.2 by 64aa7808-outbound-1.mta.getcheckrecipient.com id B2972306-8624-47AB-88E6-66D1BCB088A2.1; Fri, 20 Dec 2019 18:44:10 +0000 Received: from EUR01-HE1-obe.outbound.protection.outlook.com by 64aa7808-outbound-1.mta.getcheckrecipient.com with ESMTPS id 29d473f4f6ac.2 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384); Fri, 20 Dec 2019 18:44:10 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=mpi1Wung8CmOtKWP6ep5ixPwimSRrzzgage6QjCkaTRXAqOcRwsmimCMCZN0JqJ4jhp6tu1WGy5iPbE2tQOh2e0WhlsVd8OzXKsXN+73KuOmXK8sex1Frvd+xJiIGr4ieeW4vjN0tV1iO/8oDfQ94RzyHuv/ckEk8pgMIQCk16qYAvlNvfdVctP3Arg7dN88yPJZSaQ0bVsp41mYdGnad/qD8899p+VCJWex7b4tGVXzTge5t7hfiODxhrImt+NVtq3WqcwceKVY7mIby5LeS+PjTqDkS7OTBrPNelqDm7F+8spdb7NC+A/BkwLo34FKIa7tLwgyd6l5brrYJttfPw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=HNjM5dwerswOg7pSfLUtCrfSjtD5STGtvoxKQIe2OU0=; b=BPhXImczmZZMJhA5AnWgBBzWkSoMeSgC+Xhfmo7JPQf0ZBE4qmWl6XEZPSk0EBFd2+rHmo4hOfA//5ZnbUH3/1nVlt8SE1nYkhgv2bPfohCkhIlEppZTBk8OKOuyZVYB4Q49zubCTAAHP51aDJncWNlOQWElHqnR6DYGYYAEKTdKDACLkU93HjJKhG44OkKJ3UZPNjdxp6aENCQNkFPdH6kljep9cVdGWWlr6oSpo/V/TlsBeFpg5BlpjwF2zDF8fAL7i0unD33fsJCmDW3Tx0s5eMhq9xYAxgNAw0wgqK5j6bOraRgUQoCvrVVWlGZQmezgzpkBWgFF61nUR5TmZQ== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass header.d=arm.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=armh.onmicrosoft.com; s=selector2-armh-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=HNjM5dwerswOg7pSfLUtCrfSjtD5STGtvoxKQIe2OU0=; b=i/lE61DMKXCxU6VkxUE6nwgqgA8V6x2NfP2ETv+uKIWXZsCqT4T0LICgrXYjDGyjugIwoaQmMR0EpVKyHXn3m44+sB/zNXdeDFJvr8HBYKKfiBfIQvsQ4q0Y7jnrwN6prm+s8FIBTb65yN0Y8eIMo7UB+Sv0ytbHEHPUDzvhsGA= Received: from VI1PR08MB4096.eurprd08.prod.outlook.com (20.178.126.87) by VI1PR08MB2894.eurprd08.prod.outlook.com (10.170.238.12) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.2559.15; Fri, 20 Dec 2019 18:44:09 +0000 Received: from VI1PR08MB4096.eurprd08.prod.outlook.com ([fe80::65ed:6cb7:3c80:ba3b]) by VI1PR08MB4096.eurprd08.prod.outlook.com ([fe80::65ed:6cb7:3c80:ba3b%3]) with mapi id 15.20.2559.015; Fri, 20 Dec 2019 18:44:09 +0000 From: Delia Burduv To: "gcc-patches@gcc.gnu.org" CC: "nickc@redhat.com" , Richard Earnshaw , Ramana Radhakrishnan , Kyrylo Tkachov Subject: [GCC][PATCH][AArch32] ACLE intrinsics bfloat16 vmmla and vfma for AArch32 AdvSIMD Date: Fri, 20 Dec 2019 18:44:08 +0000 Message-ID: <1994305c-a05b-774b-15da-5f1474b7c841@arm.com> Authentication-Results-Original: spf=none (sender IP is ) smtp.mailfrom=Delia.Burduv@arm.com; x-ms-exchange-transport-forked: True x-checkrecipientrouted: true x-ms-oob-tlc-oobclassifiers: OLM:4125;OLM:4125; X-Forefront-Antispam-Report-Untrusted: SFV:NSPM; SFS:(10009020)(4636009)(396003)(376002)(136003)(346002)(366004)(39860400002)(54534003)(199004)(189003)(6486002)(5660300002)(71200400001)(8676002)(4326008)(31696002)(86362001)(2906002)(6916009)(478600001)(316002)(54906003)(36756003)(81166006)(81156014)(6506007)(186003)(8936002)(31686004)(52116002)(2616005)(26005)(66476007)(4001150100001)(66446008)(66946007)(6512007)(64756008)(66556008)(66616009)(44832011); DIR:OUT; SFP:1101; SCL:1; SRVR:VI1PR08MB2894; H:VI1PR08MB4096.eurprd08.prod.outlook.com; FPR:; SPF:None; LANG:en; PTR:InfoNoRecords; A:1; MX:1; received-spf: None (protection.outlook.com: arm.com does not designate permitted sender hosts) X-MS-Exchange-SenderADCheck: 1 X-Microsoft-Antispam-Untrusted: BCL:0; X-Microsoft-Antispam-Message-Info-Original: bcKt0MgYb2ZkUMQ/2/M5e2DCfnBi0HJpJumJqmvIF/fVbuUYI9uu4NW+iDSZaoEEPi2AY3z+L83Lg0rJ5wz0UJBHKCKO7fiL4puKBi6sM1FRekWm4JPPcppF3JTttcCDIlBXRTL2wRkqlopd1UDlomXrkm6NU9tNczLFOu87UM+U/uwL6VM7WCPd6khBZFROq5DUwRizeFjPK4Z72xVSTy+oMVOALZJdhJyWbO8MaokeNk//nZSkSxMpJH59Q+WfW4KZNzL0BlRs+uyAEbIASQVDQGdqJveYc/V+UGk8fwM1d92EKzmhuUc4gXxjYVlSOEYUGwaEi/x27c5teehw0Xuph87opQtc3r+Zx3qfzNvCF1a4gdP0gid9FUQiUch+vZTIIiajswsBv+TMFIYZZciBvZuRNNB83IbZ6X44JjubXTbif5DF/OigtSZzVzx2mduyHDv7ewF+aHwSg5VYoYqpAIBPAeDc65uJV5mHvc9BKz6I+n9/cG2nU3tEBSAC8y/C0FBIbnU2UWuko+sS5zJlXMT/UF3+ENNXNS7nJRg= MIME-Version: 1.0 Original-Authentication-Results: spf=none (sender IP is ) smtp.mailfrom=Delia.Burduv@arm.com; X-MS-Exchange-Transport-CrossTenantHeadersStripped: AM5EUR03FT053.eop-EUR03.prod.protection.outlook.com X-MS-Office365-Filtering-Correlation-Id-Prvs: 18f59462-8b4f-42e7-dccd-08d7857c9916 This patch adds the ARMv8.6 ACLE intrinsics for vmmla, vfmab and vfmat as part of the BFloat16 extension. (https://developer.arm.com/docs/101028/latest.) The intrinsics are declared in arm_neon.h and the RTL patterns are defined in neon.md. Two new tests are added to check assembler output and lane indices. This patch depends on the Arm back-end patche. (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html) Tested for regression on arm-none-eabi and armeb-none-eabi. I don't have commit rights, so if this is ok can someone please commit it for me? gcc/ChangeLog: 2019-11-12 Delia Burduv * config/arm/arm_neon.h (vbfmmlaq_f32): New. (vbfmlalbq_f32): New. (vbfmlaltq_f32): New. (vbfmlalbq_lane_f32): New. (vbfmlaltq_lane_f32): New. (vbfmlalbq_laneq_f32): New. (vbfmlaltq_laneq_f32): New. * config/arm/arm_neon_builtins.def (vbfmmla): New. (vbfmab): New. (vbfmat): New. (vbfmab_lane): New. (vbfmat_lane): New. (vbfmab_laneq): New. (vbfmat_laneq): New. * config/arm/iterators.md (BF_MA): New int iterator. (bt): New int attribute. (VQXBF): Copy of VQX with V8BF. (V_HALF): Added V8BF. * config/arm/neon.md (neon_vbfmmlav8hi): New insn. (neon_vbfmav8hi): New insn. (neon_vbfma_lanev8hi): New insn. (neon_vbfma_laneqv8hi): New expand. (neon_vget_high): Changed iterator to VQXBF. * config/arm/unspecs.md (UNSPEC_BFMMLA): New UNSPEC. (UNSPEC_BFMAB): New UNSPEC. (UNSPEC_BFMAT): New UNSPEC. 2019-11-12 Delia Burduv * gcc.target/arm/simd/bf16_ma_1.c: New test. * gcc.target/arm/simd/bf16_ma_2.c: New test. * gcc.target/arm/simd/bf16_mmla_1.c: New test. diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h index 71e7568e4315a9354062dee5442ca4af9d9660a9..097d7bb30ad0109ca2f41885206b1cfb2ce962dc 100644 --- a/gcc/config/arm/arm_neon.h +++ b/gcc/config/arm/arm_neon.h @@ -91,6 +91,60 @@ typedef float float32_t; #ifdef __ARM_FEATURE_BF16_VECTOR_ARITHMETIC typedef __simd128_bfloat16_t bfloat16x8_t; typedef __simd64_bfloat16_t bfloat16x4_t; + +__extension__ extern __inline float32x4_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vbfmmlaq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b) +{ + return __builtin_neon_vbfmmlav8bf (__r, __a, __b); +} + +__extension__ extern __inline float32x4_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vbfmlalbq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b) +{ + return __builtin_neon_vbfmabv8bf (__r, __a, __b); +} + +__extension__ extern __inline float32x4_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vbfmlaltq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b) +{ + return __builtin_neon_vbfmatv8bf (__r, __a, __b); +} + +__extension__ extern __inline float32x4_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vbfmlalbq_lane_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x4_t __b, + const int __index) +{ + return __builtin_neon_vbfmab_lanev8bf (__r, __a, __b, __index); +} + +__extension__ extern __inline float32x4_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vbfmlaltq_lane_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x4_t __b, + const int __index) +{ + return __builtin_neon_vbfmat_lanev8bf (__r, __a, __b, __index); +} + +__extension__ extern __inline float32x4_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vbfmlalbq_laneq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b, + const int __index) +{ + return __builtin_neon_vbfmab_laneqv8bf (__r, __a, __b, __index); +} + +__extension__ extern __inline float32x4_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vbfmlaltq_laneq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b, + const int __index) +{ + return __builtin_neon_vbfmat_laneqv8bf (__r, __a, __b, __index); +} + #endif #pragma GCC pop_options #pragma GCC pop_options diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def index bcccf93f7fa2750e9006e5856efecbec0fb331b9..169781fa9a07930eb755165019427be055dc36ef 100644 --- a/gcc/config/arm/arm_neon_builtins.def +++ b/gcc/config/arm/arm_neon_builtins.def @@ -373,3 +373,12 @@ VAR2 (MAC_LANE_PAIR, vcmlaq_lane0, v4sf, v8hf) VAR2 (MAC_LANE_PAIR, vcmlaq_lane90, v4sf, v8hf) VAR2 (MAC_LANE_PAIR, vcmlaq_lane180, v4sf, v8hf) VAR2 (MAC_LANE_PAIR, vcmlaq_lane270, v4sf, v8hf) + +VAR1 (TERNOP, vbfmmla, v8bf) + +VAR1 (TERNOP, vbfmab, v8bf) +VAR1 (TERNOP, vbfmat, v8bf) +VAR1 (MAC_LANE, vbfmab_lane, v8bf) +VAR1 (MAC_LANE, vbfmat_lane, v8bf) +VAR1 (MAC_LANE, vbfmab_laneq, v8bf) +VAR1 (MAC_LANE, vbfmat_laneq, v8bf) diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md index 439021fa0733ac31706287c4f98d62b080afc3a1..b31f54ffe8957d3dad0a7e3d3fedc48911e7b2c4 100644 --- a/gcc/config/arm/iterators.md +++ b/gcc/config/arm/iterators.md @@ -108,6 +108,9 @@ ;; Quad-width vector modes plus 64-bit elements. (define_mode_iterator VQX [V16QI V8HI V8HF V4SI V4SF V2DI]) +;; Quad-width vector modes plus 64-bit elements and V8BF. +(define_mode_iterator VQXBF [V16QI V8HI V8HF (V8BF "TARGET_BF16_SIMD") V4SI V4SF V2DI]) + ;; Quad-width vector modes without floating-point elements. (define_mode_iterator VQI [V16QI V8HI V4SI]) @@ -488,6 +491,8 @@ (define_int_iterator VCADD [UNSPEC_VCADD90 UNSPEC_VCADD270]) (define_int_iterator VCMLA [UNSPEC_VCMLA UNSPEC_VCMLA90 UNSPEC_VCMLA180 UNSPEC_VCMLA270]) +(define_int_iterator BF_MA [UNSPEC_BFMAB UNSPEC_BFMAT]) + ;;---------------------------------------------------------------------------- ;; Mode attributes ;;---------------------------------------------------------------------------- @@ -612,7 +617,8 @@ (define_mode_attr V_HALF [(V16QI "V8QI") (V8HI "V4HI") (V8HF "V4HF") (V4SI "V2SI") (V4SF "V2SF") (V2DF "DF") - (V2DI "DI") (V4HF "HF")]) + (V2DI "DI") (V4HF "HF") + (V8BF "V4BF")]) ;; Same, but lower-case. (define_mode_attr V_half [(V16QI "v8qi") (V8HI "v4hi") @@ -1174,4 +1180,7 @@ (define_int_attr opsuffix [(UNSPEC_DOT_S "s8") (UNSPEC_DOT_U "u8")]) +;; An iterator for VFMA +(define_int_attr bt [(UNSPEC_BFMAB "b") (UNSPEC_BFMAT "t")]) + (define_int_attr smlaw_op [(UNSPEC_SMLAWB "smlawb") (UNSPEC_SMLAWT "smlawt")]) diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md index b724aab65f720bf0e48bb828f0874426effd235c..42763de178a96422f9df7f4500e4328adfa81d27 100644 --- a/gcc/config/arm/neon.md +++ b/gcc/config/arm/neon.md @@ -3879,7 +3879,7 @@ if (BYTES_BIG_ENDIAN) (define_expand "neon_vget_high" [(match_operand: 0 "s_register_operand") - (match_operand:VQX 1 "s_register_operand")] + (match_operand:VQXBF 1 "s_register_operand")] "TARGET_NEON" { emit_move_insn (operands[0], @@ -6556,3 +6556,62 @@ if (BYTES_BIG_ENDIAN) "vabd. %0, %1, %2" [(set_attr "type" "neon_fp_abd_s")] ) + +(define_insn "neon_vbfmmlav8bf" + [(set (match_operand:V4SF 0 "register_operand" "=w") + (plus:V4SF (match_operand:V4SF 1 "register_operand" "0") + (unspec:V4SF [(match_operand:V8BF 2 "register_operand" "w") + (match_operand:V8BF 3 "register_operand" "w")] + UNSPEC_BFMMLA)))] + "TARGET_BF16_SIMD" + "vmmla.bf16\\t%q0, %q2, %q3" + [(set_attr "type" "neon_mla_s_q")] +) + +(define_insn "neon_vbfmav8bf" + [(set (match_operand:V4SF 0 "register_operand" "=w") + (plus: V4SF (match_operand:V4SF 1 "register_operand" "0") + (unspec:V4SF [(match_operand:V8BF 2 "register_operand" "w") + (match_operand:V8BF 3 "register_operand" "w")] + BF_MA)))] + "TARGET_BF16_SIMD" + "vfma.bf16\\t%q0, %q2, %q3" + [(set_attr "type" "neon_fp_mla_s")] +) + +(define_insn "neon_vbfma_lanev8bf" + [(set (match_operand:V4SF 0 "register_operand" "=w") + (plus: V4SF (match_operand:V4SF 1 "register_operand" "0") + (unspec:V4SF [(match_operand:V8BF 2 "register_operand" "w") + (match_operand:V4BF 3 "register_operand" "x") + (match_operand:SI 4 "const_int_operand" "n")] + BF_MA)))] + "TARGET_BF16_SIMD" + "vfma.bf16\\t%q0, %q2, %P3[%c4]" + [(set_attr "type" "neon_fp_mla_s")] +) + +(define_expand "neon_vbfma_laneqv8bf" + [(set (match_operand:V4SF 0 "register_operand" "=w") + (plus: V4SF (match_operand:V4SF 1 "register_operand" "0") + (unspec:V4SF [(match_operand:V8BF 2 "register_operand" "w") + (match_operand:V8BF 3 "register_operand" "x") + (match_operand:SI 4 "const_int_operand" "n")] + BF_MA)))] + "TARGET_BF16_SIMD" + { + int lane = INTVAL (operands[4]); + gcc_assert (lane >=0 && lane <=7); + if (lane < 4) + emit_insn (gen_neon_vbfma_lanev8bf (operands[0], operands[1], operands[2], operands[3], operands[4])); + else + { + rtx op_highpart = gen_reg_rtx (V4BFmode); + emit_insn (gen_neon_vget_highv8bf (op_highpart, operands[3])); + operands[4] = GEN_INT (lane - 4); + emit_insn (gen_neon_vbfma_lanev8bf (operands[0], operands[1], operands[2], op_highpart, operands[4])); + } + DONE; + } + [(set_attr "type" "neon_fp_mla_s")] +) diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md index b4196b0e5cd939c3ee5e3f9bd19622fcc963adae..f452082b4bdb3a22a8e3b62113bb7f9470279e93 100644 --- a/gcc/config/arm/unspecs.md +++ b/gcc/config/arm/unspecs.md @@ -493,4 +493,7 @@ UNSPEC_VCMLA90 UNSPEC_VCMLA180 UNSPEC_VCMLA270 + UNSPEC_BFMMLA + UNSPEC_BFMAB + UNSPEC_BFMAT ]) diff --git a/gcc/testsuite/gcc.target/arm/simd/bf16_ma_1.c b/gcc/testsuite/gcc.target/arm/simd/bf16_ma_1.c new file mode 100644 index 0000000000000000000000000000000000000000..ead3e9d569f45f5507985e5d7cb12e0541349dd1 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/simd/bf16_ma_1.c @@ -0,0 +1,84 @@ +/* { dg-do assemble } */ +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */ +/* { dg-add-options arm_v8_2a_bf16_neon } */ +/* { dg-additional-options "-save-temps" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#include "arm_neon.h" + +/* +**test_vbfmlalbq_f32: +** ... +** vfmab.bf16\tq[0-9]+, q[0-9]+, q[0-9]+ +** ... +*/ +float32x4_t +test_vbfmlalbq_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b) +{ + return vbfmlalbq_f32 (r, a, b); +} + +/* +**test_vbfmlaltq_f32: +** ... +** vfmat.bf16\tq[0-9]+, q[0-9]+, q[0-9]+ +** ... +*/ +float32x4_t +test_vbfmlaltq_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b) +{ + return vbfmlaltq_f32 (r, a, b); +} + +/* +**test_vbfmlalbq_lane_f32: +** ... +** vfmab.bf16\tq[0-9]+, q[0-9]+, d[0-9]+\[0\] +** ... +*/ +float32x4_t +test_vbfmlalbq_lane_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b) +{ + return vbfmlalbq_lane_f32 (r, a, b, 0); +} + +/* +**test_vbfmlaltq_lane_f32: +** ... +** vfmat.bf16\tq[0-9]+, q[0-9]+, d[0-9]+\[2\] +** ... +*/ +float32x4_t +test_vbfmlaltq_lane_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b) +{ + return vbfmlaltq_lane_f32 (r, a, b, 2); +} + +/* +**test_vbfmlalbq_laneq_f32: +** ... +** vfmab.bf16\tq[0-9]+, q[0-9]+, d[0-9]+\[1\] +** ... +*/ +float32x4_t +test_vbfmlalbq_laneq_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b) +{ + return vbfmlalbq_laneq_f32 (r, a, b, 5); +} + +/* +**test_vbfmlaltq_laneq_f32: +** ... +** vfmat.bf16\tq[0-9]+, q[0-9]+, d[0-9]+\[3\] +** ... +*/ +float32x4_t +test_vbfmlaltq_laneq_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b) +{ + return vbfmlaltq_laneq_f32 (r, a, b, 7); +} + +int main() +{ + return 0; +} diff --git a/gcc/testsuite/gcc.target/arm/simd/bf16_ma_2.c b/gcc/testsuite/gcc.target/arm/simd/bf16_ma_2.c new file mode 100644 index 0000000000000000000000000000000000000000..226ed7e1d8e4747d73b0518c809aaf0e3c5bc78d --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/simd/bf16_ma_2.c @@ -0,0 +1,31 @@ +/* { dg-do compile { target { arm*-*-* } } } */ +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */ +/* { dg-add-options arm_v8_2a_bf16_neon } */ + +#include "arm_neon.h" + +/* Test lane index limits for vbfmlalbq_lane_f32 */ +float32x4_t +test_vbfmlalbq_lane_f32_low (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b) +{ + return __builtin_neon_vbfmab_lanev8bf (r, a, b, -1); /* { dg-error {lane -1 out of range 0 - 3} } */ +} + +float32x4_t +test_vbfmlalbq_lane_f32_high (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b) +{ + return __builtin_neon_vbfmab_lanev8bf (r, a, b, 4); /* { dg-error {lane 4 out of range 0 - 3} } */ +} + +/* Test lane index limits for vbfmlaltq_lane_f32 */ +float32x4_t +test_vbfmlaltq_lane_f32_low (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b) +{ + return __builtin_neon_vbfmat_lanev8bf (r, a, b, -1); /* { dg-error {lane -1 out of range 0 - 3} } */ +} + +float32x4_t +test_vbfmlaltq_lane_f32_high (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b) +{ + return __builtin_neon_vbfmat_lanev8bf (r, a, b, 4); /* { dg-error {lane 4 out of range 0 - 3} } */ +} diff --git a/gcc/testsuite/gcc.target/arm/simd/bf16_mmla_1.c b/gcc/testsuite/gcc.target/arm/simd/bf16_mmla_1.c new file mode 100644 index 0000000000000000000000000000000000000000..0c7422b78c385850eaa53492af0da8826e8b3b4a --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/simd/bf16_mmla_1.c @@ -0,0 +1,24 @@ +/* { dg-do assemble } */ +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */ +/* { dg-add-options arm_v8_2a_bf16_neon } */ +/* { dg-additional-options "-save-temps" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#include "arm_neon.h" + +/* +**test_vbfmmlaq_f32: +** ... +** vmmla.bf16\tq[0-9]+, q[0-9]+, q[0-9]+ +** ... +*/ +float32x4_t +test_vbfmmlaq_f32 (float32x4_t r, bfloat16x8_t x, bfloat16x8_t y) +{ + return vbfmmlaq_f32 (r, x, y); +} + +int main() +{ + return 0; +}