[8/17,ARM] Add VFP FP16 arithmetic instructions.

Message ID	573B2C4E.4090900@foss.arm.com
State	New
Headers
Subject: [PATCH 8/17][ARM] Add VFP FP16 arithmetic instructions.
From: Matthew Wahab <matthew.wahab@foss.arm.com>
To: gcc-patches <gcc-patches@gcc.gnu.org>
Date: Tue, 17 May 2016 15:35:58 +0100
Message-ID: <573B2C4E.4090900@foss.arm.com>
In-Reply-To: <573B28A3.9030603@foss.arm.com>

From 3e773f2ec85ea66d0be0e3a97ea52826156c00f2 Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Thu, 7 Apr 2016 14:49:17 +0100
Subject: [PATCH 08/17] [PATCH 8/17][ARM] Add VFP FP16 arithmetic instructions.

2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* config/arm/iterators.md (Code iterators): Fix some white-space
	in the comments.
	(GLTE): New.
	(ABSNEG): New.
	(FCVT): Moved from vfp.md.
	(VCVT_HF_US_N): New.
	(VCVT_SI_US_N): New.
	(VCVT_HF_US): New.
	(VCVTH_US): New.
	(FP16_RND): New.
	(absneg_str): New.
	(FCVTI32typename): Moved from vfp.md.
	(sup): Add UNSPEC_VCVTA_S, UNSPEC_VCVTA_U, UNSPEC_VCVTM_S,
	UNSPEC_VCVTM_U, UNSPEC_VCVTN_S, UNSPEC_VCVTN_U, UNSPEC_VCVTP_S,
	UNSPEC_VCVTP_U, UNSPEC_VCVT_HF_S_N, UNSPEC_VCVT_HF_U_N,
	UNSPEC_VCVT_SI_S_N, UNSPEC_VCVT_SI_U_N, UNSPEC_VCVTH_S and
	UNSPEC_VCVTH_U.
	(vcvth_op): New.
	(fp16_rnd_str): New.
	(fp16_rnd_insn): New.
	* config/arm/unspecs.md (UNSPEC_VCVT_HF_S_N): New.
	(UNSPEC_VCVT_HF_U_N): New.
	(UNSPEC_VCVT_SI_S_N): New.
	(UNSPEC_VCVT_SI_U_N): New.
	(UNSPEC_VCVTH_S): New.
	(UNSPEC_VCVTH_U): New.
	(UNSPEC_VCVTA_S): New.
	(UNSPEC_VCVTA_U): New.
	(UNSPEC_VCVTM_S): New.
	(UNSPEC_VCVTM_U): New.
	(UNSPEC_VCVTN_S): New.
	(UNSPEC_VCVTN_U): New.
	(UNSPEC_VCVTP_S): New.
	(UNSPEC_VCVTP_U): New.
	(UNSPEC_VRND): New.
	(UNSPEC_VRNDA): New.
	(UNSPEC_VRNDI): New.
	(UNSPEC_VRNDM): New.
	(UNSPEC_VRNDN): New.
	(UNSPEC_VRNDP): New.
	(UNSPEC_VRNDX): New.
	* config/arm/vfp.md (<absneg_str>hf2): New.
	(neon_v<absneg_str>hf): New.
	(neon_v<fp16_rnd_str>hf): New.
	(neon_vrndihf): New.
	(addhf3_fp16): New.
	(neon_vaddhf): New.
	(subhf3_fp16): New.
	(neon_vsubhf): New.
	(divhf3_fp16): New.
	(neon_vdivhf): New.
	(mulhf3_fp16): New.
	(neon_vmulfhf): New.
	(*mulsf3neghf_vfp): New.
	(*negmulhf3_vfp): New.
	(*mulsf3addhf_vfp): New.
	(*mulhf3subhf_vfp): New.
	(*mulhf3neghfaddhf_vfp): New.
	(*mulhf3neghfsubhf_vfp): New.
	(fmahf4_fp16): New.
	(neon_vfmahf): New.
	(fmsubhf4_fp16): New.
	(neon_vfmshf): New.
	(*fnmsubhf4): New.
	(*fnmaddhf4): New.
	(neon_vsqrthf): New.
	(neon_vrsqrtshf): New.
	(FCVT): Move to iterators.md.
	(FCVTI32typename): Likewise.
	(neon_vcvth<sup>hf): New.
	(neon_vcvth<sup>si): New.
	(neon_vcvth<sup>_nhf_unspec): New.
	(neon_vcvth<sup>_nhf): New.
	(neon_vcvth<sup>_nsi_unspec): New.
	(neon_vcvth<sup>_nsi): New.
	(neon_vcvt<vcvth_op>h<sup>si): New.
	(neon_<fmaxmin_op>hf): New.

testsuite/
2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* gcc.target/arm/armv8_2-fp16-arith-1.c: New.
	* gcc.target/arm/armv8_2-fp16-conv-1.c: New.
---
 gcc/config/arm/iterators.md                        |  59 ++-
 gcc/config/arm/unspecs.md                          |  21 +
 gcc/config/arm/vfp.md                              | 423 ++++++++++++++++++++-
 .../gcc.target/arm/armv8_2-fp16-arith-1.c          |  68 ++++
 gcc/testsuite/gcc.target/arm/armv8_2-fp16-conv-1.c | 101 +++++
 5 files changed, 666 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/armv8_2-fp16-conv-1.c

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 3f9d9e4..9371b6a 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -199,14 +199,17 @@
 ;; Code iterators
 ;;----------------------------------------------------------------------------
 
-;; A list of condition codes used in compare instructions where 
-;; the carry flag from the addition is used instead of doing the 
+;; A list of condition codes used in compare instructions where
+;; the carry flag from the addition is used instead of doing the
 ;; compare a second time.
 (define_code_iterator LTUGEU [ltu geu])
 
 ;; The signed gt, ge comparisons
 (define_code_iterator GTGE [gt ge])
 
+;; The signed gt, ge, lt, le comparisons
+(define_code_iterator GLTE [gt ge lt le])
+
 ;; The unsigned gt, ge comparisons
 (define_code_iterator GTUGEU [gtu geu])
 
@@ -235,6 +238,12 @@
 ;; Binary operators whose second operand can be shifted.
 (define_code_iterator SHIFTABLE_OPS [plus minus ior xor and])
 
+;; Operations on the sign of a number.
+(define_code_iterator ABSNEG [abs neg])
+
+;; Conversions.
+(define_code_iterator FCVT [unsigned_float float])
+
 ;; plus and minus are the only SHIFTABLE_OPS for which Thumb2 allows
 ;; a stack pointer opoerand.  The minus operation is a candidate for an rsub
 ;; and hence only plus is supported.
@@ -330,6 +339,22 @@
 
 (define_int_iterator VCVT_US_N [UNSPEC_VCVT_S_N UNSPEC_VCVT_U_N])
 
+(define_int_iterator VCVT_HF_US_N [UNSPEC_VCVT_HF_S_N UNSPEC_VCVT_HF_U_N])
+
+(define_int_iterator VCVT_SI_US_N [UNSPEC_VCVT_SI_S_N UNSPEC_VCVT_SI_U_N])
+
+(define_int_iterator VCVT_HF_US [UNSPEC_VCVTA_S UNSPEC_VCVTA_U
+                                 UNSPEC_VCVTM_S UNSPEC_VCVTM_U
+                                 UNSPEC_VCVTN_S UNSPEC_VCVTN_U
+                                 UNSPEC_VCVTP_S UNSPEC_VCVTP_U])
+
+(define_int_iterator VCVTH_US [UNSPEC_VCVTH_S UNSPEC_VCVTH_U])
+
+;; Operators for FP16 instructions.
+(define_int_iterator FP16_RND [UNSPEC_VRND UNSPEC_VRNDA
+                               UNSPEC_VRNDM UNSPEC_VRNDN
+                               UNSPEC_VRNDP UNSPEC_VRNDX])
+
 (define_int_iterator VQMOVN [UNSPEC_VQMOVN_S UNSPEC_VQMOVN_U])
 
 (define_int_iterator VMOVL [UNSPEC_VMOVL_S UNSPEC_VMOVL_U])
@@ -687,6 +712,12 @@
 (define_code_attr shift [(ashiftrt "ashr") (lshiftrt "lshr")])
 (define_code_attr shifttype [(ashiftrt "signed") (lshiftrt "unsigned")])
 
+;; String representations of operations on the sign of a number.
+(define_code_attr absneg_str [(abs "abs") (neg "neg")])
+
+;; Conversions.
+(define_code_attr FCVTI32typename [(unsigned_float "u32") (float "s32")])
+
 ;;----------------------------------------------------------------------------
 ;; Int attributes
 ;;----------------------------------------------------------------------------
@@ -718,7 +749,13 @@
   (UNSPEC_VPMAX "s") (UNSPEC_VPMAX_U "u")
   (UNSPEC_VPMIN "s") (UNSPEC_VPMIN_U "u")
   (UNSPEC_VCVT_S "s") (UNSPEC_VCVT_U "u")
+  (UNSPEC_VCVTA_S "s") (UNSPEC_VCVTA_U "u")
+  (UNSPEC_VCVTM_S "s") (UNSPEC_VCVTM_U "u")
+  (UNSPEC_VCVTN_S "s") (UNSPEC_VCVTN_U "u")
+  (UNSPEC_VCVTP_S "s") (UNSPEC_VCVTP_U "u")
   (UNSPEC_VCVT_S_N "s") (UNSPEC_VCVT_U_N "u")
+  (UNSPEC_VCVT_HF_S_N "s") (UNSPEC_VCVT_HF_U_N "u")
+  (UNSPEC_VCVT_SI_S_N "s") (UNSPEC_VCVT_SI_U_N "u")
   (UNSPEC_VQMOVN_S "s") (UNSPEC_VQMOVN_U "u")
   (UNSPEC_VMOVL_S "s") (UNSPEC_VMOVL_U "u")
   (UNSPEC_VSHL_S "s") (UNSPEC_VSHL_U "u")
@@ -733,9 +770,25 @@
   (UNSPEC_VSHLL_S_N "s") (UNSPEC_VSHLL_U_N "u")
   (UNSPEC_VSRA_S_N "s") (UNSPEC_VSRA_U_N "u")
   (UNSPEC_VRSRA_S_N "s") (UNSPEC_VRSRA_U_N "u")
-
+  (UNSPEC_VCVTH_S "s") (UNSPEC_VCVTH_U "u")
 ])
 
+(define_int_attr vcvth_op
+ [(UNSPEC_VCVTA_S "a") (UNSPEC_VCVTA_U "a")
+  (UNSPEC_VCVTM_S "m") (UNSPEC_VCVTM_U "m")
+  (UNSPEC_VCVTN_S "n") (UNSPEC_VCVTN_U "n")
+  (UNSPEC_VCVTP_S "p") (UNSPEC_VCVTP_U "p")])
+
+(define_int_attr fp16_rnd_str
+  [(UNSPEC_VRND "rnd") (UNSPEC_VRNDA "rnda")
+   (UNSPEC_VRNDM "rndm") (UNSPEC_VRNDN "rndn")
+   (UNSPEC_VRNDP "rndp") (UNSPEC_VRNDX "rndx")])
+
+(define_int_attr fp16_rnd_insn
+  [(UNSPEC_VRND "vrintz") (UNSPEC_VRNDA "vrinta")
+   (UNSPEC_VRNDM "vrintm") (UNSPEC_VRNDN "vrintn")
+   (UNSPEC_VRNDP "vrintp") (UNSPEC_VRNDX "vrintx")])
+
 (define_int_attr cmp_op_unsp [(UNSPEC_VCEQ "eq") (UNSPEC_VCGT "gt")
                               (UNSPEC_VCGE "ge") (UNSPEC_VCLE "le")
                               (UNSPEC_VCLT "lt") (UNSPEC_VCAGE "ge")
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 5744c62..57a47ff 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -203,6 +203,20 @@
   UNSPEC_VCVT_U
   UNSPEC_VCVT_S_N
   UNSPEC_VCVT_U_N
+  UNSPEC_VCVT_HF_S_N
+  UNSPEC_VCVT_HF_U_N
+  UNSPEC_VCVT_SI_S_N
+  UNSPEC_VCVT_SI_U_N
+  UNSPEC_VCVTH_S
+  UNSPEC_VCVTH_U
+  UNSPEC_VCVTA_S
+  UNSPEC_VCVTA_U
+  UNSPEC_VCVTM_S
+  UNSPEC_VCVTM_U
+  UNSPEC_VCVTN_S
+  UNSPEC_VCVTN_U
+  UNSPEC_VCVTP_S
+  UNSPEC_VCVTP_U
   UNSPEC_VEXT
   UNSPEC_VHADD_S
   UNSPEC_VHADD_U
@@ -365,5 +379,12 @@
   UNSPEC_NVRINTN
   UNSPEC_VQRDMLAH
   UNSPEC_VQRDMLSH
+  UNSPEC_VRND
+  UNSPEC_VRNDA
+  UNSPEC_VRNDI
+  UNSPEC_VRNDM
+  UNSPEC_VRNDN
+  UNSPEC_VRNDP
+  UNSPEC_VRNDX
 ])
 
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index b1c13fa..6202fc3 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -937,9 +937,73 @@
    (set_attr "type" "ffarithd")]
 )
 
+;; ABS and NEG for FP16.
+(define_insn "<absneg_str>hf2"
+  [(set (match_operand:HF 0 "s_register_operand" "=w")
+    (ABSNEG:HF (match_operand:HF 1 "s_register_operand" "w")))]
+ "TARGET_VFP_FP16INST"
+ "v<absneg_str>.f16\t%0, %1"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "ffariths")]
+)
+
+(define_expand "neon_v<absneg_str>hf"
+ [(set
+   (match_operand:HF 0 "s_register_operand")
+   (ABSNEG:HF (match_operand:HF 1 "s_register_operand")))]
+ "TARGET_VFP_FP16INST"
+{
+  emit_insn (gen_<absneg_str>hf2 (operands[0], operands[1]));
+  DONE;
+})
+
+;; VRND for FP16.
+(define_insn "neon_v<fp16_rnd_str>hf"
+  [(set (match_operand:HF 0 "s_register_operand" "=w")
+    (unspec:HF
+     [(match_operand:HF 1 "s_register_operand" "w")]
+     FP16_RND))]
+ "TARGET_VFP_FP16INST"
+ "<fp16_rnd_insn>.f16\t%0, %1"
+ [(set_attr "conds" "unconditional")
+  (set_attr "type" "neon_fp_round_s")]
+)
+
+(define_insn "neon_vrndihf"
+  [(set (match_operand:HF 0 "s_register_operand" "=w")
+    (unspec:HF
+     [(match_operand:HF 1 "s_register_operand" "w")]
+     UNSPEC_VRNDI))]
+  "TARGET_VFP_FP16INST"
+  "vrintr.f16\t%0, %1"
+ [(set_attr "conds" "unconditional")
+  (set_attr "type" "neon_fp_round_s")]
+)
 
 ;; Arithmetic insns
 
+(define_insn "addhf3_fp16"
+  [(set
+    (match_operand:HF 0 "s_register_operand" "=w")
+    (plus:HF
+     (match_operand:HF 1 "s_register_operand" "w")
+     (match_operand:HF 2 "s_register_operand" "w")))]
+ "TARGET_VFP_FP16INST"
+ "vadd.f16\t%0, %1, %2"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "fadds")]
+)
+
+(define_expand "neon_vaddhf"
+  [(match_operand:HF 0 "s_register_operand")
+   (match_operand:HF 1 "s_register_operand")
+   (match_operand:HF 2 "s_register_operand")]
+  "TARGET_VFP_FP16INST"
+{
+  emit_insn (gen_addhf3_fp16 (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
 (define_insn "*addsf3_vfp"
   [(set (match_operand:SF 0 "s_register_operand" "=t")
         (plus:SF (match_operand:SF 1 "s_register_operand" "t")
@@ -962,6 +1026,28 @@
    (set_attr "type" "faddd")]
 )
 
+(define_insn "subhf3_fp16"
+  [(set
+    (match_operand:HF 0 "s_register_operand" "=w")
+    (minus:HF
+     (match_operand:HF 1 "s_register_operand" "w")
+     (match_operand:HF 2 "s_register_operand" "w")))]
+ "TARGET_VFP_FP16INST"
+ "vsub.f16\t%0, %1, %2"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "fadds")]
+)
+
+(define_expand "neon_vsubhf"
+  [(match_operand:HF 0 "s_register_operand")
+   (match_operand:HF 1 "s_register_operand")
+   (match_operand:HF 2 "s_register_operand")]
+  "TARGET_VFP_FP16INST"
+{
+  emit_insn (gen_subhf3_fp16 (operands[0], operands[1],
+                              operands[2]));
+  DONE;
+})
 
 (define_insn "*subsf3_vfp"
   [(set (match_operand:SF 0 "s_register_operand" "=t")
@@ -988,6 +1074,29 @@
 
 ;; Division insns
 
+;; FP16 Division.
+(define_insn "divhf3_fp16"
+  [(set
+    (match_operand:HF 0 "s_register_operand" "=w")
+    (div:HF
+     (match_operand:HF 1 "s_register_operand" "w")
+     (match_operand:HF 2 "s_register_operand" "w")))]
+  "TARGET_VFP_FP16INST"
+  "vdiv.f16\t%0, %1, %2"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "fdivs")]
+)
+
+(define_expand "neon_vdivhf"
+  [(match_operand:HF 0 "s_register_operand")
+   (match_operand:HF 1 "s_register_operand")
+   (match_operand:HF 2 "s_register_operand")]
+  "TARGET_VFP_FP16INST"
+{
+  emit_insn (gen_divhf3_fp16 (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
 ; VFP9 Erratum 760019: It's potentially unsafe to overwrite the input
 ; operands, so mark the output as early clobber for VFPv2 on ARMv5 or
 ; earlier.
@@ -1018,6 +1127,27 @@
 
 ;; Multiplication insns
 
+(define_insn "mulhf3_fp16"
+ [(set
+   (match_operand:HF 0 "s_register_operand" "=w")
+   (mult:HF (match_operand:HF 1 "s_register_operand" "w")
+            (match_operand:HF 2 "s_register_operand" "w")))]
+  "TARGET_VFP_FP16INST"
+  "vmul.f16\t%0, %1, %2"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "fmuls")]
+)
+
+(define_expand "neon_vmulfhf"
+  [(match_operand:HF 0 "s_register_operand")
+   (match_operand:HF 1 "s_register_operand")
+   (match_operand:HF 2 "s_register_operand")]
+  "TARGET_VFP_FP16INST"
+{
+  emit_insn (gen_mulhf3_fp16 (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
 (define_insn "*mulsf3_vfp"
   [(set (match_operand:SF 0 "s_register_operand" "=t")
         (mult:SF (match_operand:SF 1 "s_register_operand" "t")
@@ -1040,6 +1170,26 @@
    (set_attr "type" "fmuld")]
 )
 
+(define_insn "*mulsf3neghf_vfp"
+  [(set (match_operand:HF 0 "s_register_operand" "=t")
+        (mult:HF (neg:HF (match_operand:HF 1 "s_register_operand" "t"))
+                 (match_operand:HF 2 "s_register_operand" "t")))]
+  "TARGET_VFP_FP16INST && !flag_rounding_math"
+  "vnmul.f16\\t%0, %1, %2"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "fmuls")]
+)
+
+(define_insn "*negmulhf3_vfp"
+  [(set (match_operand:HF 0 "s_register_operand" "=t")
+        (neg:HF (mult:HF (match_operand:HF 1 "s_register_operand" "t")
+                 (match_operand:HF 2 "s_register_operand" "t"))))]
+  "TARGET_VFP_FP16INST"
+  "vnmul.f16\\t%0, %1, %2"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "fmuls")]
+)
+
 (define_insn "*mulsf3negsf_vfp"
   [(set (match_operand:SF 0 "s_register_operand" "=t")
         (mult:SF (neg:SF (match_operand:SF 1 "s_register_operand" "t"))
@@ -1089,6 +1239,18 @@
 ;; Multiply-accumulate insns
 
 ;; 0 = 1 * 2 + 0
+(define_insn "*mulsf3addhf_vfp"
+ [(set (match_operand:HF 0 "s_register_operand" "=t")
+       (plus:HF
+        (mult:HF (match_operand:HF 2 "s_register_operand" "t")
+                 (match_operand:HF 3 "s_register_operand" "t"))
+        (match_operand:HF 1 "s_register_operand" "0")))]
+  "TARGET_VFP_FP16INST"
+  "vmla.f16\\t%0, %2, %3"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "fmacs")]
+)
+
 (define_insn "*mulsf3addsf_vfp"
   [(set (match_operand:SF 0 "s_register_operand" "=t")
         (plus:SF (mult:SF (match_operand:SF 2 "s_register_operand" "t")
@@ -1114,6 +1276,17 @@
 )
 
 ;; 0 = 1 * 2 - 0
+(define_insn "*mulhf3subhf_vfp"
+ [(set (match_operand:HF 0 "s_register_operand" "=t")
+       (minus:HF (mult:HF (match_operand:HF 2 "s_register_operand" "t")
+                          (match_operand:HF 3 "s_register_operand" "t"))
+                 (match_operand:HF 1 "s_register_operand" "0")))]
+  "TARGET_VFP_FP16INST"
+  "vnmls.f16\\t%0, %2, %3"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "fmacs")]
+)
+
 (define_insn "*mulsf3subsf_vfp"
   [(set (match_operand:SF 0 "s_register_operand" "=t")
         (minus:SF (mult:SF (match_operand:SF 2 "s_register_operand" "t")
@@ -1139,6 +1312,17 @@
 )
 
 ;; 0 = -(1 * 2) + 0
+(define_insn "*mulhf3neghfaddhf_vfp"
+ [(set (match_operand:HF 0 "s_register_operand" "=t")
+       (minus:HF (match_operand:HF 1 "s_register_operand" "0")
+                 (mult:HF (match_operand:HF 2 "s_register_operand" "t")
+                          (match_operand:HF 3 "s_register_operand" "t"))))]
+  "TARGET_VFP_FP16INST"
+  "vmls.f16\\t%0, %2, %3"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "fmacs")]
+)
+
 (define_insn "*mulsf3negsfaddsf_vfp"
   [(set (match_operand:SF 0 "s_register_operand" "=t")
         (minus:SF (match_operand:SF 1 "s_register_operand" "0")
@@ -1165,6 +1349,18 @@
 
 
 ;; 0 = -(1 * 2) - 0
+(define_insn "*mulhf3neghfsubhf_vfp"
+ [(set (match_operand:HF 0 "s_register_operand" "=t")
+       (minus:HF (mult:HF
+                   (neg:HF (match_operand:HF 2 "s_register_operand" "t"))
+                   (match_operand:HF 3 "s_register_operand" "t"))
+                 (match_operand:HF 1 "s_register_operand" "0")))]
+  "TARGET_VFP_FP16INST"
+  "vnmla.f16\\t%0, %2, %3"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "fmacs")]
+)
+
 (define_insn "*mulsf3negsfsubsf_vfp"
   [(set (match_operand:SF 0 "s_register_operand" "=t")
         (minus:SF (mult:SF
@@ -1193,6 +1389,30 @@
 
 ;; Fused-multiply-accumulate
 
+(define_insn "fmahf4_fp16"
+  [(set (match_operand:HF 0 "register_operand" "=w")
+    (fma:HF
+     (match_operand:HF 1 "register_operand" "w")
+     (match_operand:HF 2 "register_operand" "w")
+     (match_operand:HF 3 "register_operand" "0")))]
+ "TARGET_VFP_FP16INST"
+ "vfma.f16\\t%0, %1, %2"
+ [(set_attr "conds" "unconditional")
+  (set_attr "type" "ffmas")]
+)
+
+(define_expand "neon_vfmahf"
+  [(match_operand:HF 0 "s_register_operand")
+   (match_operand:HF 1 "s_register_operand")
+   (match_operand:HF 2 "s_register_operand")
+   (match_operand:HF 3 "s_register_operand")]
+  "TARGET_VFP_FP16INST"
+{
+  emit_insn (gen_fmahf4_fp16 (operands[0], operands[2], operands[3],
+                              operands[1]));
+  DONE;
+})
+
 (define_insn "fma<SDF:mode>4"
   [(set (match_operand:SDF 0 "register_operand" "=<F_constraint>")
         (fma:SDF (match_operand:SDF 1 "register_operand" "<F_constraint>")
@@ -1205,6 +1425,30 @@
    (set_attr "type" "ffma<vfp_type>")]
 )
 
+(define_insn "fmsubhf4_fp16"
+ [(set (match_operand:HF 0 "register_operand" "=w")
+   (fma:HF
+    (neg:HF (match_operand:HF 1 "register_operand" "w"))
+    (match_operand:HF 2 "register_operand" "w")
+    (match_operand:HF 3 "register_operand" "0")))]
+ "TARGET_VFP_FP16INST"
+ "vfms.f16\\t%0, %1, %2"
+ [(set_attr "conds" "unconditional")
+  (set_attr "type" "ffmas")]
+)
+
+(define_expand "neon_vfmshf"
+  [(match_operand:HF 0 "s_register_operand")
+   (match_operand:HF 1 "s_register_operand")
+   (match_operand:HF 2 "s_register_operand")
+   (match_operand:HF 3 "s_register_operand")]
+  "TARGET_VFP_FP16INST"
+{
+  emit_insn (gen_fmsubhf4_fp16 (operands[0], operands[2], operands[3],
+                                operands[1]));
+  DONE;
+})
+
 (define_insn "*fmsub<SDF:mode>4"
   [(set (match_operand:SDF 0 "register_operand" "=<F_constraint>")
         (fma:SDF (neg:SDF (match_operand:SDF 1 "register_operand"
@@ -1218,6 +1462,17 @@
    (set_attr "type" "ffma<vfp_type>")]
 )
 
+(define_insn "*fnmsubhf4"
+  [(set (match_operand:HF 0 "register_operand" "=w")
+        (fma:HF (match_operand:HF 1 "register_operand" "w")
+                (match_operand:HF 2 "register_operand" "w")
+                (neg:HF (match_operand:HF 3 "register_operand" "0"))))]
+  "TARGET_VFP_FP16INST"
+  "vfnms.f16\\t%0, %1, %2"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "ffmas")]
+)
+
 (define_insn "*fnmsub<SDF:mode>4"
   [(set (match_operand:SDF 0 "register_operand" "=<F_constraint>")
         (fma:SDF (match_operand:SDF 1 "register_operand" "<F_constraint>")
@@ -1230,6 +1485,17 @@
    (set_attr "type" "ffma<vfp_type>")]
 )
 
+(define_insn "*fnmaddhf4"
+  [(set (match_operand:HF 0 "register_operand" "=w")
+        (fma:HF (neg:HF (match_operand:HF 1 "register_operand" "w"))
+                (match_operand:HF 2 "register_operand" "w")
+                (neg:HF (match_operand:HF 3 "register_operand" "0"))))]
+  "TARGET_VFP_FP16INST"
+  "vfnma.f16\\t%0, %1, %2"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "ffmas")]
+)
+
 (define_insn "*fnmadd<SDF:mode>4"
   [(set (match_operand:SDF 0 "register_operand" "=<F_constraint>")
         (fma:SDF (neg:SDF (match_operand:SDF 1 "register_operand"
@@ -1372,6 +1638,27 @@
 
 ;; Sqrt insns.
 
+(define_insn "neon_vsqrthf"
+  [(set (match_operand:HF 0 "s_register_operand" "=w")
+        (sqrt:HF (match_operand:HF 1 "s_register_operand" "w")))]
+  "TARGET_VFP_FP16INST"
+  "vsqrt.f16\t%0, %1"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "fsqrts")]
+)
+
+(define_insn "neon_vrsqrtshf"
+  [(set
+    (match_operand:HF 0 "s_register_operand" "=w")
+    (unspec:HF [(match_operand:HF 1 "s_register_operand" "w")
+                (match_operand:HF 2 "s_register_operand" "w")]
+     UNSPEC_VRSQRTS))]
+ "TARGET_VFP_FP16INST"
+ "vrsqrts.f16\t%0, %1, %2"
+ [(set_attr "conds" "unconditional")
+  (set_attr "type" "fsqrts")]
+)
+
 ; VFP9 Erratum 760019: It's potentially unsafe to overwrite the input
 ; operands, so mark the output as early clobber for VFPv2 on ARMv5 or
 ; earlier.
@@ -1528,9 +1815,6 @@
 )
 
 ;; Fixed point to floating point conversions.
-(define_code_iterator FCVT [unsigned_float float])
-(define_code_attr FCVTI32typename [(unsigned_float "u32") (float "s32")])
-
 (define_insn "*combine_vcvt_f32_<FCVTI32typename>"
   [(set (match_operand:SF 0 "s_register_operand" "=t")
         (mult:SF (FCVT:SF (match_operand:SI 1 "s_register_operand" "0"))
@@ -1575,6 +1859,125 @@
    (set_attr "type" "f_cvtf2i")]
 )
 
+;; FP16 conversions.
+(define_insn "neon_vcvth<sup>hf"
+ [(set (match_operand:HF 0 "s_register_operand" "=w")
+   (unspec:HF
+    [(match_operand:SI 1 "s_register_operand" "w")]
+    VCVTH_US))]
+ "TARGET_VFP_FP16INST"
+ "vcvt.f16.<sup>%#32\t%0, %1"
+ [(set_attr "conds" "unconditional")
+  (set_attr "type" "f_cvti2f")]
+)
+
+(define_insn "neon_vcvth<sup>si"
+ [(set (match_operand:SI 0 "s_register_operand" "=w")
+   (unspec:SI
+    [(match_operand:HF 1 "s_register_operand" "w")]
+    VCVTH_US))]
+ "TARGET_VFP_FP16INST"
+ "vcvt.<sup>%#32.f16\t%0, %1"
+ [(set_attr "conds" "unconditional")
+  (set_attr "type" "f_cvtf2i")]
+)
+
+;; The neon_vcvth<sup>_nhf patterns are used to generate the instruction for the
+;; vcvth_n_f16_<sup>32 arm_fp16 intrinsics.  They are complicated by the
+;; hardware requirement that the source and destination registers are the same
+;; despite having different machine modes.  The approach is to use a temporary
+;; register for the conversion and move that to the correct destination.
+
+;; Generate an unspec pattern for the intrinsic.
+(define_insn "neon_vcvth<sup>_nhf_unspec"
+ [(set
+   (match_operand:SI 0 "s_register_operand" "=w")
+   (unspec:SI
+    [(match_operand:SI 1 "s_register_operand" "0")
+     (match_operand:SI 2 "immediate_operand" "i")]
+    VCVT_HF_US_N))
+ (set
+  (match_operand:HF 3 "s_register_operand" "=w")
+  (float_truncate:HF (float:SF (match_dup 0))))]
+ "TARGET_VFP_FP16INST"
+{
+  neon_const_bounds (operands[2], 1, 33);
+  return "vcvt.f16.<sup>32\t%0, %0, %2\;vmov.f32\t%3, %0";
+}
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "f_cvti2f")]
+)
+
+;; Generate the instruction patterns needed for vcvth_n_f16_s32 neon intrinsics.
+(define_expand "neon_vcvth<sup>_nhf"
+ [(match_operand:HF 0 "s_register_operand")
+  (unspec:HF [(match_operand:SI 1 "s_register_operand")
+              (match_operand:SI 2 "immediate_operand")]
+   VCVT_HF_US_N)]
+"TARGET_VFP_FP16INST"
+{
+  rtx op1 = gen_reg_rtx (SImode);
+
+  neon_const_bounds (operands[2], 1, 33);
+
+  emit_move_insn (op1, operands[1]);
+  emit_insn (gen_neon_vcvth<sup>_nhf_unspec (op1, op1, operands[2],
+                                             operands[0]));
+  DONE;
+})
+
+;; The neon_vcvth<sup>_nsi patterns are used to generate the instruction for the
+;; vcvth_n_<sup>32_f16 arm_fp16 intrinsics.  They have the same restrictions and
+;; are implemented in the same way as the neon_vcvth<sup>_nhf patterns.
+
+;; Generate an unspec pattern, constraining the registers.
+(define_insn "neon_vcvth<sup>_nsi_unspec"
+ [(set (match_operand:SI 0 "s_register_operand" "=w")
+   (unspec:SI
+    [(fix:SI
+      (fix:SF
+       (float_extend:SF
+        (match_operand:HF 1 "s_register_operand" "w"))))
+     (match_operand:SI 2 "immediate_operand" "i")]
+    VCVT_SI_US_N))]
+ "TARGET_VFP_FP16INST"
+{
+  neon_const_bounds (operands[2], 1, 33);
+  return "vmov.f32\t%0, %1\;vcvt.<sup>%#32.f16\t%0, %0, %2";
+}
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "f_cvtf2i")]
+)
+
+;; Generate the instruction patterns needed for vcvth_n_s32_f16 neon intrinsics.
+(define_expand "neon_vcvth<sup>_nsi"
+ [(match_operand:SI 0 "s_register_operand")
+  (unspec:SI
+   [(match_operand:HF 1 "s_register_operand")
+    (match_operand:SI 2 "immediate_operand")]
+   VCVT_SI_US_N)]
+ "TARGET_VFP_FP16INST"
+{
+  rtx op1 = gen_reg_rtx (SImode);
+
+  neon_const_bounds (operands[2], 1, 33);
+  emit_insn (gen_neon_vcvth<sup>_nsi_unspec (op1, operands[1], operands[2]));
+  emit_move_insn (operands[0], op1);
+  DONE;
+})
+
+(define_insn "neon_vcvt<vcvth_op>h<sup>si"
+ [(set
+   (match_operand:SI 0 "s_register_operand" "=w")
+   (unspec:SI
+    [(match_operand:HF 1 "s_register_operand" "w")]
+    VCVT_HF_US))]
+ "TARGET_VFP_FP16INST"
+ "vcvt<vcvth_op>.<sup>%#32.f16\t%0, %1"
+  [(set_attr "conds" "unconditional")
+   (set_attr "type" "f_cvtf2i")]
+)
+
 ;; Store multiple insn used in function prologue.
 (define_insn "*push_multi_vfp"
   [(match_parallel 2 "multi_register_push"
@@ -1644,6 +2047,20 @@
 )
 
 ;; Scalar forms for the IEEE-754 fmax()/fmin() functions
+
+(define_insn "neon_<fmaxmin_op>hf"
+ [(set
+   (match_operand:HF 0 "s_register_operand" "=w")
+   (unspec:HF
+    [(match_operand:HF 1 "s_register_operand" "w")
+     (match_operand:HF 2 "s_register_operand" "w")]
+    VMAXMINFNM))]
+ "TARGET_VFP_FP16INST"
+ "<fmaxmin_op>.f16\t%0, %1, %2"
+ [(set_attr "conds" "unconditional")
+  (set_attr "type" "f_minmaxs")]
+)
+
 (define_insn "<fmaxmin><mode>3"
   [(set (match_operand:SDF 0 "s_register_operand" "=<F_constraint>")
         (unspec:SDF [(match_operand:SDF 1 "s_register_operand" "<F_constraint>")
diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-1.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-1.c
new file mode 100644
index 0000000..8399288
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-1.c
@@ -0,0 +1,68 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok } */
+/* { dg-options "-O2 -ffast-math" } */
+/* { dg-add-options arm_v8_2a_fp16_scalar } */
+
+/* Test instructions generated for half-precision arithmetic.  */
+
+typedef __fp16 float16_t;
+typedef __simd64_float16_t float16x4_t;
+typedef __simd128_float16_t float16x8_t;
+
+float16_t
+fp16_abs (float16_t a)
+{
+  return (a < 0) ? -a : a;
+}
+
+#define TEST_UNOP(NAME, OPERATOR, TY)          \
+  TY test_##NAME##_##TY (TY a)                 \
+  {                                            \
+    return OPERATOR (a);                       \
+  }
+
+#define TEST_BINOP(NAME, OPERATOR, TY)         \
+  TY test_##NAME##_##TY (TY a, TY b)           \
+  {                                            \
+    return a OPERATOR b;                       \
+  }
+
+#define TEST_CMP(NAME, OPERATOR, RTY, TY)      \
+  RTY test_##NAME##_##TY (TY a, TY b)          \
+  {                                            \
+    return a OPERATOR b;                       \
+  }
+
+/* Scalars.  */
+
+TEST_UNOP (neg, -, float16_t)
+TEST_UNOP (abs, fp16_abs, float16_t)
+
+TEST_BINOP (add, +, float16_t)
+TEST_BINOP (sub, -, float16_t)
+TEST_BINOP (mult, *, float16_t)
+TEST_BINOP (div, /, float16_t)
+
+TEST_CMP (equal, ==, int, float16_t)
+TEST_CMP (unequal, !=, int, float16_t)
+TEST_CMP (lessthan, <, int, float16_t)
+TEST_CMP (greaterthan, >, int, float16_t)
+TEST_CMP (lessthanequal, <=, int, float16_t)
+TEST_CMP (greaterthanequal, >=, int, float16_t)
+
+/* { dg-final { scan-assembler-times {vneg\.f16\ts[0-9]+, s[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vabs\.f16\ts[0-9]+, s[0-9]+} 2 } } */
+
+/* { dg-final { scan-assembler-times {vadd\.f32\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vsub\.f32\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vmul\.f32\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vdiv\.f32\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vcmp\.f32\ts[0-9]+, s[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vcmpe\.f32\ts[0-9]+, s[0-9]+} 4 } } */
+
+/* { dg-final { scan-assembler-not {vadd\.f16} } } */
+/* { dg-final { scan-assembler-not {vsub\.f16} } } */
+/* { dg-final { scan-assembler-not {vmul\.f16} } } */
+/* { dg-final { scan-assembler-not {vdiv\.f16} } } */
+/* { dg-final { scan-assembler-not {vcmp\.f16} } } */
+/* { dg-final { scan-assembler-not {vcmpe\.f16} } } */
diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-conv-1.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-conv-1.c
new file mode 100644
index 0000000..c9639a5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-conv-1.c
@@ -0,0 +1,101 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok } */
+/* { dg-options "-O2" } */
+/* { dg-add-options arm_v8_2a_fp16_scalar } */
+
+/* Test ARMv8.2 FP16 conversions.  */
+#include <arm_fp16.h>
+
+float
+f16_to_f32 (__fp16 a)
+{
+  return (float)a;
+}
+
+float
+f16_to_pf32 (__fp16* a)
+{
+  return (float)*a;
+}
+
+short
+f16_to_s16 (__fp16 a)
+{
+  return (short)a;
+}
+
+short
+pf16_to_s16 (__fp16* a)
+{
+  return (short)*a;
+}
+
+/* { dg-final { scan-assembler-times {vcvtb\.f32\.f16\ts[0-9]+, s[0-9]+} 4 } } */
+
+__fp16
+f32_to_f16 (float a)
+{
+  return (__fp16)a;
+}
+
+void
+f32_to_pf16 (__fp16* x, float a)
+{
+  *x = (__fp16)a;
+}
+
+__fp16
+s16_to_f16 (short a)
+{
+  return (__fp16)a;
+}
+
+void
+s16_to_pf16 (__fp16* x, short a)
+{
+  *x = (__fp16)a;
+}
+
+/* { dg-final { scan-assembler-times {vcvtb\.f16\.f32\ts[0-9]+, s[0-9]+} 4 } } */
+
+float
+s16_to_f32 (short a)
+{
+  return (float)a;
+}
+
+/* { dg-final { scan-assembler-times {vcvt\.f32\.s32\ts[0-9]+, s[0-9]+} 3 } } */
+
+short
+f32_to_s16 (float a)
+{
+  return (short)a;
+}
+
+/* { dg-final { scan-assembler-times {vcvt\.s32\.f32\ts[0-9]+, s[0-9]+} 3 } } */
+
+unsigned short
+f32_to_u16 (float a)
+{
+  return (unsigned short)a;
+}
+
+/* { dg-final { scan-assembler-times {vcvt\.u32\.f32\ts[0-9]+, s[0-9]+} 1 } } */
+
+short
+f64_to_s16 (double a)
+{
+  return (short)a;
+}
+
+/* { dg-final { scan-assembler-times {vcvt\.s32\.f64\ts[0-9]+, d[0-9]+} 1 } } */
+
+unsigned short
+f64_to_u16 (double a)
+{
+  return (unsigned short)a;
+}
+
+/* { dg-final { scan-assembler-times {vcvt\.s32\.f64\ts[0-9]+, d[0-9]+} 1 } } */
+
+
-- 
2.1.4
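When reviewing the non-fused multiply-accumulate patterns in vfp.md, it can help to spell out their sign conventions. The C sketch below is only an illustration. The function names are hypothetical, values are computed in `float` rather than `__fp16`, and the sketch follows the operand numbering used in the patterns: operand 1 is the accumulator tied to the output, and operands 2 and 3 are the multiplicands. It assumes floating-point contraction is disabled (for example with GCC's `-ffp-contract=off`) so that the product and the sum are rounded separately, as in the chained VMLA family.

```c
/* Hypothetical reference model of the chained (non-fused) VFP
   multiply-accumulate forms matched by the HF patterns in this patch.
   D is the accumulator (operand 1, tied to the output); A and B are the
   multiplicands (operands 2 and 3).  Computed in float, not __fp16.  */

/* *mulsf3addhf_vfp: vmla.f16 computes d + a * b.  */
float
vmla_model (float d, float a, float b)
{
  return d + a * b;
}

/* *mulhf3neghfaddhf_vfp: vmls.f16 computes d - a * b.  */
float
vmls_model (float d, float a, float b)
{
  return d - a * b;
}

/* *mulhf3neghfsubhf_vfp: vnmla.f16 computes (-a) * b - d.  */
float
vnmla_model (float d, float a, float b)
{
  return (-a) * b - d;
}

/* *mulhf3subhf_vfp: vnmls.f16 computes a * b - d.  */
float
vnmls_model (float d, float a, float b)
{
  return a * b - d;
}

/* *negmulhf3_vfp and *mulsf3neghf_vfp: vnmul.f16 computes -(a * b).  */
float
vnmul_model (float a, float b)
{
  return -(a * b);
}
```

The fused patterns use the same sign structure with a single rounding step. With a and b as the multiplicands and d as the tied accumulator, fmahf4_fp16 (vfma) is fma (a, b, d), fmsubhf4_fp16 (vfms) is fma (-a, b, d), *fnmsubhf4 (vfnms) is fma (a, b, -d), and *fnmaddhf4 (vfnma) is fma (-a, b, -d).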
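The VCVT_HF_US and VCVTH_US iterators select float-to-integer conversions that differ only in rounding direction. The FP16_RND iterator maps the same directions onto vrintz, vrinta, vrintm, vrintn and vrintp; vrintx, and the vrintr used by neon_vrndihf, instead use the rounding mode currently set in the FPSCR. The unofficial C sketch below models the signed 32-bit conversions. The enum and function names are hypothetical, the arithmetic is done in `float` rather than `__fp16`, and the model assumes the Arm behaviour of converting NaN to 0 and saturating out-of-range inputs.

```c
#include <stdint.h>

/* Hypothetical names: a reference model of the rounding directions used by
   the FP16 float-to-integer conversions, not code from GCC or arm_fp16.h.
   It operates on float rather than __fp16.  */
enum cvt_round
{
  CVT_Z, /* Plain vcvt to s32: toward zero.  */
  CVT_A, /* vcvta: to nearest, ties away from zero.  */
  CVT_M, /* vcvtm: toward minus infinity.  */
  CVT_N, /* vcvtn: to nearest, ties to even.  */
  CVT_P  /* vcvtp: toward plus infinity.  */
};

/* Convert F to int32_t with rounding direction MODE.  Assumes Arm
   semantics: NaN converts to 0 and out-of-range values saturate.  */
int32_t
cvt_model_s32 (enum cvt_round mode, float f)
{
  double x = f;  /* Exact: every float is representable as a double.  */

  if (x != x)    /* NaN.  */
    return 0;
  if (x >= 2147483648.0)
    return INT32_MAX;
  if (x < -2147483648.0)
    return INT32_MIN;

  /* Floor of X without <math.h>: the cast truncates toward zero, so step
     negative non-integers down by one.  */
  int64_t lo = (int64_t) x;
  if ((double) lo > x)
    lo -= 1;
  double frac = x - (double) lo;  /* Exact, and 0 <= frac < 1.  */

  int64_t r;
  switch (mode)
    {
    case CVT_Z:
      r = (int64_t) x;
      break;
    case CVT_M:
      r = lo;
      break;
    case CVT_P:
      r = frac > 0.0 ? lo + 1 : lo;
      break;
    case CVT_A:
      r = (frac > 0.5 || (frac == 0.5 && x > 0.0)) ? lo + 1 : lo;
      break;
    case CVT_N:
    default:
      r = (frac > 0.5 || (frac == 0.5 && lo % 2 != 0)) ? lo + 1 : lo;
      break;
    }

  /* Defensive clamp; after the range checks above rounding stays in range.  */
  if (r > INT32_MAX)
    return INT32_MAX;
  if (r < INT32_MIN)
    return INT32_MIN;
  return (int32_t) r;
}
```

For example, 2.5 gives 2 under CVT_Z and CVT_N, 3 under CVT_A and CVT_P, and 2 under CVT_M. The only difference between the "a" and "n" variants is how they handle exact ties.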
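The fixed-point conversion patterns in vfp.md convert between half precision and 32-bit fixed point. They take an immediate count of fraction bits, which neon_const_bounds (operands[2], 1, 33) limits to the range 1 through 32. The minimal sketch below illustrates the arithmetic under the usual Arm fixed-point conventions: a fixed-point integer x with fbits fraction bits represents x / 2^fbits, and the real-to-fixed direction rounds toward zero and saturates. The helper names are hypothetical. They compute in `double`, so they leave out the final rounding to half precision that the hardware performs.

```c
#include <stdint.h>

/* Hypothetical reference model, not GCC code.  FBITS must be in [1, 32],
   the range accepted by neon_const_bounds (operands[2], 1, 33).  */

/* 2^fbits as a double; exact for fbits <= 32.  */
static double
fixed_scale (unsigned fbits)
{
  return (double) ((uint64_t) 1 << fbits);
}

/* Signed fixed point -> real value: x / 2^fbits.  Exact in double; the
   hardware would then round this to half precision.  */
double
fixed_s32_to_real (int32_t x, unsigned fbits)
{
  return (double) x / fixed_scale (fbits);
}

/* Real value -> signed fixed point: scale by 2^fbits, round toward zero
   and saturate to the int32_t range (assumed Arm behaviour; NaN gives 0).  */
int32_t
real_to_fixed_s32 (double v, unsigned fbits)
{
  double scaled = v * fixed_scale (fbits);  /* Exact power-of-two scaling.  */

  if (scaled != scaled)
    return 0;
  if (scaled >= 2147483648.0)
    return INT32_MAX;
  if (scaled <= -2147483649.0)
    return INT32_MIN;
  return (int32_t) scaled;  /* C conversion truncates toward zero.  */
}
```

For example, with fbits = 2 the value 1.75 converts to the fixed-point integer 7, and 7 converts back to 1.75.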
