From patchwork Wed Jul 20 17:00:46 2016
X-Patchwork-Submitter: Jiong Wang
X-Patchwork-Id: 650797
Subject: Re: [AArch64][3/14] ARMv8.2-A FP16 two operands vector intrinsics
To: GCC Patches <gcc-patches@gcc.gnu.org>
From: Jiong Wang
Date: Wed, 20 Jul 2016 18:00:46 +0100
In-Reply-To: <4f39f1aa-195d-87c8-c5f5-631e5bf89e5e@foss.arm.com>

On 07/07/16 17:15, Jiong Wang wrote:
> This patch adds ARMv8.2-A FP16 two operands vector intrinsics.
The updated patch resolves the conflict with
https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00309.html

The change is to let aarch64_emit_approx_div return false for V4HFmode
and V8HFmode.

gcc/
2016-07-20  Jiong Wang

        * config/aarch64/aarch64-simd-builtins.def: Register new builtins.
        * config/aarch64/aarch64-simd.md (aarch64_rsqrts): Extend to HF modes.
        (fabd3): Likewise.
        (3): Likewise.
        (3): Likewise.
        (aarch64_p): Likewise.
        (3): Likewise.
        (3): Likewise.
        (3): Likewise.
        (aarch64_faddp): Likewise.
        (aarch64_fmulx): Likewise.
        (aarch64_frecps): Likewise.
        (*aarch64_fac): Rename to aarch64_fac.
        (add3): Extend to HF modes.
        (sub3): Likewise.
        (mul3): Likewise.
        (div3): Likewise.
        (*div3): Likewise.
        * config/aarch64/aarch64.c (aarch64_emit_approx_div): Return false
        for V4HF and V8HF.
        * config/aarch64/iterators.md (VDQ_HSDI, VSDQ_HSDI): New mode
        iterators.
        * config/aarch64/arm_neon.h (vadd_f16): New.
        (vaddq_f16): Likewise.
        (vabd_f16): Likewise.
        (vabdq_f16): Likewise.
        (vcage_f16): Likewise.
        (vcageq_f16): Likewise.
        (vcagt_f16): Likewise.
        (vcagtq_f16): Likewise.
        (vcale_f16): Likewise.
        (vcaleq_f16): Likewise.
        (vcalt_f16): Likewise.
        (vcaltq_f16): Likewise.
        (vceq_f16): Likewise.
        (vceqq_f16): Likewise.
        (vcge_f16): Likewise.
        (vcgeq_f16): Likewise.
        (vcgt_f16): Likewise.
        (vcgtq_f16): Likewise.
        (vcle_f16): Likewise.
        (vcleq_f16): Likewise.
        (vclt_f16): Likewise.
        (vcltq_f16): Likewise.
        (vcvt_n_f16_s16): Likewise.
        (vcvtq_n_f16_s16): Likewise.
        (vcvt_n_f16_u16): Likewise.
        (vcvtq_n_f16_u16): Likewise.
        (vcvt_n_s16_f16): Likewise.
        (vcvtq_n_s16_f16): Likewise.
        (vcvt_n_u16_f16): Likewise.
        (vcvtq_n_u16_f16): Likewise.
        (vdiv_f16): Likewise.
        (vdivq_f16): Likewise.
        (vdup_lane_f16): Likewise.
        (vdup_laneq_f16): Likewise.
        (vdupq_lane_f16): Likewise.
        (vdupq_laneq_f16): Likewise.
        (vdups_lane_f16): Likewise.
        (vdups_laneq_f16): Likewise.
        (vmax_f16): Likewise.
        (vmaxq_f16): Likewise.
        (vmaxnm_f16): Likewise.
        (vmaxnmq_f16): Likewise.
        (vmin_f16): Likewise.
        (vminq_f16): Likewise.
        (vminnm_f16): Likewise.
        (vminnmq_f16): Likewise.
        (vmul_f16): Likewise.
        (vmulq_f16): Likewise.
        (vmulx_f16): Likewise.
        (vmulxq_f16): Likewise.
        (vpadd_f16): Likewise.
        (vpaddq_f16): Likewise.
        (vpmax_f16): Likewise.
        (vpmaxq_f16): Likewise.
        (vpmaxnm_f16): Likewise.
        (vpmaxnmq_f16): Likewise.
        (vpmin_f16): Likewise.
        (vpminq_f16): Likewise.
        (vpminnm_f16): Likewise.
        (vpminnmq_f16): Likewise.
        (vrecps_f16): Likewise.
        (vrecpsq_f16): Likewise.
        (vrsqrts_f16): Likewise.
        (vrsqrtsq_f16): Likewise.
        (vsub_f16): Likewise.
        (vsubq_f16): Likewise.
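To make the new API concrete, here is a minimal usage sketch. It is
illustrative only and not part of the patch: the function names are
invented for the example, and it assumes a toolchain with this series
applied, compiled for an ARMv8.2-A FP16 target, e.g. with
-march=armv8.2-a+fp16.

#include <arm_neon.h>

/* Lane-wise |a - b| scaled by SCALE, using the new vabd_f16 and
   vmul_f16 intrinsics.  */
float16x4_t
scaled_abs_diff (float16x4_t a, float16x4_t b, float16x4_t scale)
{
  return vmul_f16 (vabd_f16 (a, b), scale);
}

/* Per-lane test |a - b| <= |tol| using the new absolute-compare
   intrinsic vcale_f16; each result lane is all-ones or all-zeros.  */
uint16x4_t
within_tolerance (float16x4_t a, float16x4_t b, float16x4_t tol)
{
  return vcale_f16 (vsub_f16 (a, b), tol);
}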
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 22c87be429ba1aac2bbe77f1119d16b6b8bd6e80..007dad60b6999158a1c9c1cf2a501a9f0712af54 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -41,7 +41,7 @@
   BUILTIN_VDC (COMBINE, combine, 0)
   BUILTIN_VB (BINOP, pmul, 0)
-  BUILTIN_VALLF (BINOP, fmulx, 0)
+  BUILTIN_VHSDF_SDF (BINOP, fmulx, 0)
   BUILTIN_VHSDF_DF (UNOP, sqrt, 2)
   BUILTIN_VD_BHSI (BINOP, addp, 0)
   VAR1 (UNOP, addp, 0, di)
@@ -248,22 +248,22 @@
   BUILTIN_VDQ_BHSI (BINOP, smin, 3)
   BUILTIN_VDQ_BHSI (BINOP, umax, 3)
   BUILTIN_VDQ_BHSI (BINOP, umin, 3)
-  BUILTIN_VDQF (BINOP, smax_nan, 3)
-  BUILTIN_VDQF (BINOP, smin_nan, 3)
+  BUILTIN_VHSDF (BINOP, smax_nan, 3)
+  BUILTIN_VHSDF (BINOP, smin_nan, 3)

   /* Implemented by 3.  */
-  BUILTIN_VDQF (BINOP, fmax, 3)
-  BUILTIN_VDQF (BINOP, fmin, 3)
+  BUILTIN_VHSDF (BINOP, fmax, 3)
+  BUILTIN_VHSDF (BINOP, fmin, 3)

   /* Implemented by aarch64_p.  */
   BUILTIN_VDQ_BHSI (BINOP, smaxp, 0)
   BUILTIN_VDQ_BHSI (BINOP, sminp, 0)
   BUILTIN_VDQ_BHSI (BINOP, umaxp, 0)
   BUILTIN_VDQ_BHSI (BINOP, uminp, 0)
-  BUILTIN_VDQF (BINOP, smaxp, 0)
-  BUILTIN_VDQF (BINOP, sminp, 0)
-  BUILTIN_VDQF (BINOP, smax_nanp, 0)
-  BUILTIN_VDQF (BINOP, smin_nanp, 0)
+  BUILTIN_VHSDF (BINOP, smaxp, 0)
+  BUILTIN_VHSDF (BINOP, sminp, 0)
+  BUILTIN_VHSDF (BINOP, smax_nanp, 0)
+  BUILTIN_VHSDF (BINOP, smin_nanp, 0)

   /* Implemented by 2.  */
   BUILTIN_VHSDF (UNOP, btrunc, 2)
@@ -383,7 +383,7 @@
   BUILTIN_VDQ_SI (UNOP, urecpe, 0)

   BUILTIN_VHSDF (UNOP, frecpe, 0)
-  BUILTIN_VDQF (BINOP, frecps, 0)
+  BUILTIN_VHSDF (BINOP, frecps, 0)

   /* Implemented by a mixture of abs2 patterns.  Note the DImode builtin is
      only ever used for the int64x1_t intrinsic, there is no scalar version.  */
@@ -475,22 +475,22 @@
   BUILTIN_VSDQ_HSI (QUADOP_LANE, sqrdmlsh_laneq, 0)

   /* Implemented by <*><*>3.  */
-  BUILTIN_VSDQ_SDI (SHIFTIMM, scvtf, 3)
-  BUILTIN_VSDQ_SDI (FCVTIMM_SUS, ucvtf, 3)
-  BUILTIN_VALLF (SHIFTIMM, fcvtzs, 3)
-  BUILTIN_VALLF (SHIFTIMM_USS, fcvtzu, 3)
+  BUILTIN_VSDQ_HSDI (SHIFTIMM, scvtf, 3)
+  BUILTIN_VSDQ_HSDI (FCVTIMM_SUS, ucvtf, 3)
+  BUILTIN_VHSDF_SDF (SHIFTIMM, fcvtzs, 3)
+  BUILTIN_VHSDF_SDF (SHIFTIMM_USS, fcvtzu, 3)

   /* Implemented by aarch64_rsqrte.  */
   BUILTIN_VHSDF_SDF (UNOP, rsqrte, 0)

   /* Implemented by aarch64_rsqrts.  */
-  BUILTIN_VALLF (BINOP, rsqrts, 0)
+  BUILTIN_VHSDF_SDF (BINOP, rsqrts, 0)

   /* Implemented by fabd3.  */
-  BUILTIN_VALLF (BINOP, fabd, 3)
+  BUILTIN_VHSDF_SDF (BINOP, fabd, 3)

   /* Implemented by aarch64_faddp.  */
-  BUILTIN_VDQF (BINOP, faddp, 0)
+  BUILTIN_VHSDF (BINOP, faddp, 0)

   /* Implemented by aarch64_cm.  */
   BUILTIN_VHSDF_SDF (BINOP_USS, cmeq, 0)
@@ -501,3 +501,9 @@

   /* Implemented by neg2.  */
   BUILTIN_VHSDF (UNOP, neg, 2)
+
+  /* Implemented by aarch64_fac.  */
+  BUILTIN_VHSDF_SDF (BINOP_USS, faclt, 0)
+  BUILTIN_VHSDF_SDF (BINOP_USS, facle, 0)
+  BUILTIN_VHSDF_SDF (BINOP_USS, facgt, 0)
+  BUILTIN_VHSDF_SDF (BINOP_USS, facge, 0)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 8d895a545672a255da6234d6fafeea51dc92ae3b..ec7ab8669cec217e196e9b3d341119bb5988346c 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -391,13 +391,13 @@
   [(set_attr "type" "neon_fp_rsqrte_")])

 (define_insn "aarch64_rsqrts"
-  [(set (match_operand:VALLF 0 "register_operand" "=w")
-        (unspec:VALLF [(match_operand:VALLF 1 "register_operand" "w")
-                       (match_operand:VALLF 2 "register_operand" "w")]
-         UNSPEC_RSQRTS))]
+  [(set (match_operand:VHSDF_SDF 0 "register_operand" "=w")
+        (unspec:VHSDF_SDF [(match_operand:VHSDF_SDF 1 "register_operand" "w")
+                           (match_operand:VHSDF_SDF 2 "register_operand" "w")]
+         UNSPEC_RSQRTS))]
   "TARGET_SIMD"
   "frsqrts\\t%0, %1, %2"
-  [(set_attr "type" "neon_fp_rsqrts_")])
+  [(set_attr "type" "neon_fp_rsqrts_")])

 (define_expand "rsqrt2"
   [(set (match_operand:VALLF 0 "register_operand" "=w")
@@ -475,14 +475,14 @@
 )

 (define_insn "fabd3"
-  [(set (match_operand:VALLF 0 "register_operand" "=w")
-        (abs:VALLF
-          (minus:VALLF
-            (match_operand:VALLF 1 "register_operand" "w")
-            (match_operand:VALLF 2 "register_operand" "w"))))]
+  [(set (match_operand:VHSDF_SDF 0 "register_operand" "=w")
+        (abs:VHSDF_SDF
+          (minus:VHSDF_SDF
+            (match_operand:VHSDF_SDF 1 "register_operand" "w")
+            (match_operand:VHSDF_SDF 2 "register_operand" "w"))))]
   "TARGET_SIMD"
   "fabd\t%0, %1, %2"
-  [(set_attr "type" "neon_fp_abd_")]
+  [(set_attr "type" "neon_fp_abd_")]
 )

 (define_insn "and3"
@@ -1105,10 +1105,10 @@
 ;; Pairwise FP Max/Min operations.
 (define_insn "aarch64_p"
-  [(set (match_operand:VDQF 0 "register_operand" "=w")
-        (unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")
-                      (match_operand:VDQF 2 "register_operand" "w")]
-         FMAXMINV))]
+  [(set (match_operand:VHSDF 0 "register_operand" "=w")
+        (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w")
+                       (match_operand:VHSDF 2 "register_operand" "w")]
+         FMAXMINV))]
   "TARGET_SIMD"
   "p\t%0., %1., %2."
   [(set_attr "type" "neon_minmax")]
 )
@@ -1517,36 +1517,36 @@
 ;; FP arithmetic operations.

 (define_insn "add3"
-  [(set (match_operand:VDQF 0 "register_operand" "=w")
-        (plus:VDQF (match_operand:VDQF 1 "register_operand" "w")
-                   (match_operand:VDQF 2 "register_operand" "w")))]
+  [(set (match_operand:VHSDF 0 "register_operand" "=w")
+        (plus:VHSDF (match_operand:VHSDF 1 "register_operand" "w")
+                    (match_operand:VHSDF 2 "register_operand" "w")))]
   "TARGET_SIMD"
   "fadd\\t%0., %1., %2."
-  [(set_attr "type" "neon_fp_addsub_")]
+  [(set_attr "type" "neon_fp_addsub_")]
 )

 (define_insn "sub3"
-  [(set (match_operand:VDQF 0 "register_operand" "=w")
-        (minus:VDQF (match_operand:VDQF 1 "register_operand" "w")
-                    (match_operand:VDQF 2 "register_operand" "w")))]
+  [(set (match_operand:VHSDF 0 "register_operand" "=w")
+        (minus:VHSDF (match_operand:VHSDF 1 "register_operand" "w")
+                     (match_operand:VHSDF 2 "register_operand" "w")))]
   "TARGET_SIMD"
   "fsub\\t%0., %1., %2."
-  [(set_attr "type" "neon_fp_addsub_")]
+  [(set_attr "type" "neon_fp_addsub_")]
 )

 (define_insn "mul3"
-  [(set (match_operand:VDQF 0 "register_operand" "=w")
-        (mult:VDQF (match_operand:VDQF 1 "register_operand" "w")
-                   (match_operand:VDQF 2 "register_operand" "w")))]
+  [(set (match_operand:VHSDF 0 "register_operand" "=w")
+        (mult:VHSDF (match_operand:VHSDF 1 "register_operand" "w")
+                    (match_operand:VHSDF 2 "register_operand" "w")))]
   "TARGET_SIMD"
   "fmul\\t%0., %1., %2."
-  [(set_attr "type" "neon_fp_mul_")]
+  [(set_attr "type" "neon_fp_mul_")]
 )

 (define_expand "div3"
-  [(set (match_operand:VDQF 0 "register_operand")
-        (div:VDQF (match_operand:VDQF 1 "general_operand")
-                  (match_operand:VDQF 2 "register_operand")))]
+  [(set (match_operand:VHSDF 0 "register_operand" "=w")
+        (div:VHSDF (match_operand:VHSDF 1 "register_operand" "w")
+                   (match_operand:VHSDF 2 "register_operand" "w")))]
   "TARGET_SIMD"
 {
   if (aarch64_emit_approx_div (operands[0], operands[1], operands[2]))
@@ -1556,12 +1556,12 @@
 })

 (define_insn "*div3"
-  [(set (match_operand:VDQF 0 "register_operand" "=w")
-        (div:VDQF (match_operand:VDQF 1 "register_operand" "w")
-                  (match_operand:VDQF 2 "register_operand" "w")))]
+  [(set (match_operand:VHSDF 0 "register_operand" "=w")
+        (div:VHSDF (match_operand:VHSDF 1 "register_operand" "w")
+                   (match_operand:VHSDF 2 "register_operand" "w")))]
   "TARGET_SIMD"
   "fdiv\\t%0., %1., %2."
-  [(set_attr "type" "neon_fp_div_")]
+  [(set_attr "type" "neon_fp_div_")]
 )

 (define_insn "neg2"
@@ -1826,24 +1826,26 @@
 ;; Convert between fixed-point and floating-point (vector modes)

-(define_insn "3"
-  [(set (match_operand: 0 "register_operand" "=w")
-        (unspec: [(match_operand:VDQF 1 "register_operand" "w")
-                  (match_operand:SI 2 "immediate_operand" "i")]
+(define_insn "3"
+  [(set (match_operand: 0 "register_operand" "=w")
+        (unspec:
+          [(match_operand:VHSDF 1 "register_operand" "w")
+           (match_operand:SI 2 "immediate_operand" "i")]
          FCVT_F2FIXED))]
   "TARGET_SIMD"
   "\t%0, %1, #%2"
-  [(set_attr "type" "neon_fp_to_int_")]
+  [(set_attr "type" "neon_fp_to_int_")]
 )

-(define_insn "3"
-  [(set (match_operand: 0 "register_operand" "=w")
-        (unspec: [(match_operand:VDQ_SDI 1 "register_operand" "w")
-                  (match_operand:SI 2 "immediate_operand" "i")]
+(define_insn "3"
+  [(set (match_operand: 0 "register_operand" "=w")
+        (unspec:
+          [(match_operand:VDQ_HSDI 1 "register_operand" "w")
+           (match_operand:SI 2 "immediate_operand" "i")]
          FCVT_FIXED2F))]
   "TARGET_SIMD"
   "\t%0, %1, #%2"
-  [(set_attr "type" "neon_int_to_fp_")]
+  [(set_attr "type" "neon_int_to_fp_")]
 )

 ;; ??? Note that the vectorizer usage of the vec_unpacks_[lo/hi] patterns
@@ -2002,33 +2004,33 @@
 ;; NaNs.

 (define_insn "3"
-  [(set (match_operand:VDQF 0 "register_operand" "=w")
-        (FMAXMIN:VDQF (match_operand:VDQF 1 "register_operand" "w")
-                      (match_operand:VDQF 2 "register_operand" "w")))]
+  [(set (match_operand:VHSDF 0 "register_operand" "=w")
+        (FMAXMIN:VHSDF (match_operand:VHSDF 1 "register_operand" "w")
+                       (match_operand:VHSDF 2 "register_operand" "w")))]
   "TARGET_SIMD"
   "fnm\\t%0., %1., %2."
-  [(set_attr "type" "neon_fp_minmax_")]
+  [(set_attr "type" "neon_fp_minmax_")]
 )

 (define_insn "3"
-  [(set (match_operand:VDQF 0 "register_operand" "=w")
-        (unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")
-                      (match_operand:VDQF 2 "register_operand" "w")]
-         FMAXMIN_UNS))]
+  [(set (match_operand:VHSDF 0 "register_operand" "=w")
+        (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w")
+                       (match_operand:VHSDF 2 "register_operand" "w")]
+         FMAXMIN_UNS))]
   "TARGET_SIMD"
   "\\t%0., %1., %2."
-  [(set_attr "type" "neon_fp_minmax_")]
+  [(set_attr "type" "neon_fp_minmax_")]
 )

 ;; Auto-vectorized forms for the IEEE-754 fmax()/fmin() functions
 (define_insn "3"
-  [(set (match_operand:VDQF 0 "register_operand" "=w")
-        (unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")
-                      (match_operand:VDQF 2 "register_operand" "w")]
-         FMAXMIN))]
+  [(set (match_operand:VHSDF 0 "register_operand" "=w")
+        (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w")
+                       (match_operand:VHSDF 2 "register_operand" "w")]
+         FMAXMIN))]
   "TARGET_SIMD"
   "\\t%0., %1., %2."
-  [(set_attr "type" "neon_fp_minmax_")]
+  [(set_attr "type" "neon_fp_minmax_")]
 )

 ;; 'across lanes' add.
@@ -2048,13 +2050,13 @@
 )

 (define_insn "aarch64_faddp"
-  [(set (match_operand:VDQF 0 "register_operand" "=w")
-        (unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")
-                      (match_operand:VDQF 2 "register_operand" "w")]
-         UNSPEC_FADDV))]
+  [(set (match_operand:VHSDF 0 "register_operand" "=w")
+        (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w")
+                       (match_operand:VHSDF 2 "register_operand" "w")]
+         UNSPEC_FADDV))]
   "TARGET_SIMD"
   "faddp\t%0., %1., %2."
-  [(set_attr "type" "neon_fp_reduc_add_")]
+  [(set_attr "type" "neon_fp_reduc_add_")]
 )

 (define_insn "aarch64_reduc_plus_internal"
@@ -3050,13 +3052,14 @@
 ;; fmulx.
(define_insn "aarch64_fmulx" - [(set (match_operand:VALLF 0 "register_operand" "=w") - (unspec:VALLF [(match_operand:VALLF 1 "register_operand" "w") - (match_operand:VALLF 2 "register_operand" "w")] - UNSPEC_FMULX))] + [(set (match_operand:VHSDF_SDF 0 "register_operand" "=w") + (unspec:VHSDF_SDF + [(match_operand:VHSDF_SDF 1 "register_operand" "w") + (match_operand:VHSDF_SDF 2 "register_operand" "w")] + UNSPEC_FMULX))] "TARGET_SIMD" "fmulx\t%0, %1, %2" - [(set_attr "type" "neon_fp_mul_")] + [(set_attr "type" "neon_fp_mul_")] ) ;; vmulxq_lane_f32, and vmulx_laneq_f32 @@ -4310,16 +4313,18 @@ ;; Note we can also handle what would be fac(le|lt) by ;; generating fac(ge|gt). -(define_insn "*aarch64_fac" +(define_insn "aarch64_fac" [(set (match_operand: 0 "register_operand" "=w") (neg: (FAC_COMPARISONS: - (abs:VALLF (match_operand:VALLF 1 "register_operand" "w")) - (abs:VALLF (match_operand:VALLF 2 "register_operand" "w")) + (abs:VHSDF_SDF + (match_operand:VHSDF_SDF 1 "register_operand" "w")) + (abs:VHSDF_SDF + (match_operand:VHSDF_SDF 2 "register_operand" "w")) )))] "TARGET_SIMD" "fac\t%0, %, %" - [(set_attr "type" "neon_fp_compare_")] + [(set_attr "type" "neon_fp_compare_")] ) ;; addp @@ -5431,13 +5436,14 @@ ) (define_insn "aarch64_frecps" - [(set (match_operand:VALLF 0 "register_operand" "=w") - (unspec:VALLF [(match_operand:VALLF 1 "register_operand" "w") - (match_operand:VALLF 2 "register_operand" "w")] - UNSPEC_FRECPS))] + [(set (match_operand:VHSDF_SDF 0 "register_operand" "=w") + (unspec:VHSDF_SDF + [(match_operand:VHSDF_SDF 1 "register_operand" "w") + (match_operand:VHSDF_SDF 2 "register_operand" "w")] + UNSPEC_FRECPS))] "TARGET_SIMD" "frecps\\t%0, %1, %2" - [(set_attr "type" "neon_fp_recps_")] + [(set_attr "type" "neon_fp_recps_")] ) (define_insn "aarch64_urecpe" diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 5ed633542efe58763d68fd9bfbb478ae6ef569c3..a7437c04eb936a5e3ebd0bc77eb4afd8c052df28 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -7717,6 +7717,10 @@ bool aarch64_emit_approx_div (rtx quo, rtx num, rtx den) { machine_mode mode = GET_MODE (quo); + + if (mode == V4HFmode || mode == V8HFmode) + return false; + bool use_approx_division_p = (flag_mlow_precision_div || (aarch64_tune_params.approx_modes->division & AARCH64_APPROX_MODE (mode))); diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index b4310f27aac08ab6ff5e89d58512dafc389b2c37..baae27619a6a1c34c0ad338f2afec4932b51cbeb 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -26385,6 +26385,368 @@ vsqrtq_f16 (float16x8_t a) return __builtin_aarch64_sqrtv8hf (a); } +/* ARMv8.2-A FP16 two operands vector intrinsics. 
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vadd_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __a + __b;
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vaddq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __a + __b;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vabd_f16 (float16x4_t a, float16x4_t b)
+{
+  return __builtin_aarch64_fabdv4hf (a, b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vabdq_f16 (float16x8_t a, float16x8_t b)
+{
+  return __builtin_aarch64_fabdv8hf (a, b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcage_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_facgev4hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcageq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_aarch64_facgev8hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcagt_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_facgtv4hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcagtq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_aarch64_facgtv8hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcale_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_faclev4hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcaleq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_aarch64_faclev8hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcalt_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_facltv4hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcaltq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_aarch64_facltv8hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vceq_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_cmeqv4hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vceqq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_aarch64_cmeqv8hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcge_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_cmgev4hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcgeq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_aarch64_cmgev8hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcgt_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_cmgtv4hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcgtq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_aarch64_cmgtv8hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcle_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_cmlev4hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcleq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_aarch64_cmlev8hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vclt_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_cmltv4hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcltq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_aarch64_cmltv8hf_uss (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vcvt_n_f16_s16 (int16x4_t __a, const int __b)
+{
+  return __builtin_aarch64_scvtfv4hi (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vcvtq_n_f16_s16 (int16x8_t __a, const int __b)
+{
+  return __builtin_aarch64_scvtfv8hi (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vcvt_n_f16_u16 (uint16x4_t __a, const int __b)
+{
+  return __builtin_aarch64_ucvtfv4hi_sus (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vcvtq_n_f16_u16 (uint16x8_t __a, const int __b)
+{
+  return __builtin_aarch64_ucvtfv8hi_sus (__a, __b);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vcvt_n_s16_f16 (float16x4_t __a, const int __b)
+{
+  return __builtin_aarch64_fcvtzsv4hf (__a, __b);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vcvtq_n_s16_f16 (float16x8_t __a, const int __b)
+{
+  return __builtin_aarch64_fcvtzsv8hf (__a, __b);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vcvt_n_u16_f16 (float16x4_t __a, const int __b)
+{
+  return __builtin_aarch64_fcvtzuv4hf_uss (__a, __b);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vcvtq_n_u16_f16 (float16x8_t __a, const int __b)
+{
+  return __builtin_aarch64_fcvtzuv8hf_uss (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vdiv_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __a / __b;
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vdivq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __a / __b;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmax_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_smax_nanv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmaxq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_aarch64_smax_nanv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmaxnm_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_fmaxv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmaxnmq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_aarch64_fmaxv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmin_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_smin_nanv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vminq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_aarch64_smin_nanv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vminnm_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_fminv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vminnmq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_aarch64_fminv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmul_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __a * __b;
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmulq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __a * __b;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vmulx_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_fmulxv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vmulxq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_aarch64_fmulxv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vpadd_f16 (float16x4_t a, float16x4_t b)
+{
+  return __builtin_aarch64_faddpv4hf (a, b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vpaddq_f16 (float16x8_t a, float16x8_t b)
+{
+  return __builtin_aarch64_faddpv8hf (a, b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vpmax_f16 (float16x4_t a, float16x4_t b)
+{
+  return __builtin_aarch64_smax_nanpv4hf (a, b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vpmaxq_f16 (float16x8_t a, float16x8_t b)
+{
+  return __builtin_aarch64_smax_nanpv8hf (a, b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vpmaxnm_f16 (float16x4_t a, float16x4_t b)
+{
+  return __builtin_aarch64_smaxpv4hf (a, b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vpmaxnmq_f16 (float16x8_t a, float16x8_t b)
+{
+  return __builtin_aarch64_smaxpv8hf (a, b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vpmin_f16 (float16x4_t a, float16x4_t b)
+{
+  return __builtin_aarch64_smin_nanpv4hf (a, b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vpminq_f16 (float16x8_t a, float16x8_t b)
+{
+  return __builtin_aarch64_smin_nanpv8hf (a, b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vpminnm_f16 (float16x4_t a, float16x4_t b)
+{
+  return __builtin_aarch64_sminpv4hf (a, b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vpminnmq_f16 (float16x8_t a, float16x8_t b)
+{
+  return __builtin_aarch64_sminpv8hf (a, b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrecps_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_aarch64_frecpsv4hf (__a, __b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrecpsq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_aarch64_frecpsv8hf (__a, __b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vrsqrts_f16 (float16x4_t a, float16x4_t b)
+{
+  return __builtin_aarch64_rsqrtsv4hf (a, b);
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vrsqrtsq_f16 (float16x8_t a, float16x8_t b)
+{
+  return __builtin_aarch64_rsqrtsv8hf (a, b);
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vsub_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __a - __b;
+}
+
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vsubq_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __a - __b;
+}
+
 #pragma GCC pop_options

 #undef __aarch64_vget_lane_any
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index af5eda9b9f4a80e1309655dcd7798337e1d818eb..35190b4343bd6dfb3a77a58bd1697426962cedc7 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -166,9 +166,19 @@
 ;; Vector modes for S and D
 (define_mode_iterator VDQ_SDI [V2SI V4SI V2DI])

+;; Vector modes for H, S and D
+(define_mode_iterator VDQ_HSDI [(V4HI "TARGET_SIMD_F16INST")
+                                (V8HI "TARGET_SIMD_F16INST")
+                                V2SI V4SI V2DI])
+
 ;; Scalar and Vector modes for S and D
 (define_mode_iterator VSDQ_SDI [V2SI V4SI V2DI SI DI])

+;; Scalar and Vector modes for S and D, Vector modes for H.
+(define_mode_iterator VSDQ_HSDI [(V4HI "TARGET_SIMD_F16INST")
+                                 (V8HI "TARGET_SIMD_F16INST")
+                                 V2SI V4SI V2DI SI DI])
+
 ;; Vector modes for Q and H types.
 (define_mode_iterator VDQQH [V8QI V16QI V4HI V8HI])
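
As a closing note, the new vcvt_n_* intrinsics take a compile-time-constant
count of fractional bits. A minimal sketch of how they might be exercised
(again illustrative only, not part of the patch; same toolchain assumptions
as the example above, and the function names are invented):

#include <arm_neon.h>

/* Treat each int16 lane as fixed-point with 4 fractional bits and
   convert to float16: result lane = fx / 2^4 (SCVTF #4).  */
float16x4_t
q4_to_half (int16x4_t fx)
{
  return vcvt_n_f16_s16 (fx, 4);
}

/* Back again: result lane = f * 2^4, rounded toward zero (FCVTZS #4).  */
int16x4_t
half_to_q4 (float16x4_t f)
{
  return vcvt_n_s16_f16 (f, 4);
}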