
[9/17,ARM] Add NEON FP16 arithmetic instructions.

Message ID 573B2CA9.5060703@foss.arm.com
State New

Commit Message

Matthew Wahab May 17, 2016, 2:37 p.m. UTC
The ARMv8.2-A FP16 extension adds a number of arithmetic instructions
to the NEON instruction set. This patch adds support for these
instructions to the ARM backend.

As with the VFP FP16 arithmetic instructions, operations on __fp16
values are done by conversion to single-precision. Any new optimization
supported by the instruction descriptions can only apply to code
generated using intrinsics added in this patch series.
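
To make this concrete: a minimal sketch, not part of the patch, assuming
an ARM target with __fp16 support (for example -mfp16-format=ieee or
ARMv8.2-A FP16).  Plain __fp16 arithmetic like the function below is
widened to single precision, so only code written with the new
intrinsics can use the f16 instructions added here.

  /* Hedged example, not from the patch: this compiles to a widening
     convert to single precision, a vadd.f32 and a convert back to
     __fp16, rather than a vadd.f16.  The function name is made up.  */
  __fp16
  add_halfs (__fp16 a, __fp16 b)
  {
    return a + b;
  }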

A number of the instructions are modelled as two variants, one using
UNSPEC and the other using RTL operations, with the choice between them
decided by the -funsafe-math-optimizations flag. This follows the
single-precision instructions and is due to the half-precision
operations having the same conditions and restrictions on their use in
optimizations (when they are enabled).

Tested the series for arm-none-linux-gnueabihf with native bootstrap and
make check and for arm-none-eabi and armeb-none-eabi with make check on
an ARMv8.2-A emulator.

Ok for trunk?
Matthew

2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* config/arm/iterators.md (VCVTHI): New.
	(NEON_VCMP): Add UNSPEC_VCLT and UNSPEC_VCLE.  Fix a long line.
	(NEON_VAGLTE): New.
	(VFM_LANE_AS): New.
	(VH_CVTTO): New.
	(V_reg): Add HF, V4HF and V8HF.  Fix white-space.
	(V_HALF): Add V4HF.  Fix white-space.
	(V_if_elem): Add HF, V4HF and V8HF.  Fix white-space.
	(V_s_elem): Likewise.
	(V_sz_elem): Fix white-space.
	(V_elem_ch): Likewise.
	(VH_elem_ch): New.
	(scalar_mul_constraint): Add V8HF and V4HF.
	(Is_float_mode): Fix white-space.
	(Is_d_reg): Likewise.
	(q): Add HF.  Fix white-space.
	(float_sup): New.
	(float_SUP): New.
	(cmp_op_unsp): Add UNSPEC_VCALE and UNSPEC_VCALT.
	(neon_vfm_lane_as): New.
	* config/arm/neon.md (add<mode>3_fp16): New.
	(sub<mode>3_fp16): New.
	(mul<mode>3add<mode>_neon): New.
	(*fma<VH:mode>4): New.
	(fma<VH:mode>4_intrinsic): New.
	(fmsub<VCVTF:mode>4_intrinsic): Fix white-space.
	(*fmsub<VH:mode>4): New.
	(fmsub<VH:mode>4_intrinsic): New.
	(<absneg_str><mode>2_fp16): New.
	(neon_v<absneg_str><mode>): New.
	(neon_v<fp16_rnd_str><mode>): New.
	(neon_vsqrte<mode>): New.
	(neon_vpaddv4hf): New.
	(neon_vadd<mode>): New.
	(neon_vsub<mode>): New.
	(neon_vadd<mode>_unspec): New.
	(neon_vsub<mode>_unspec): New.
	(neon_vmulf<mode>): New.
	(neon_vfma<VH:mode>): New.
	(neon_vfms<VH:mode>): New.
	(neon_vc<cmp_op><mode>): New.
	(neon_vc<cmp_op><mode>_fp16insn): New.
	(neon_vc<cmp_op_unsp><mode>_fp16insn_unspec): New.
	(neon_vca<cmp_op><mode>): New.
	(neon_vca<cmp_op><mode>_fp16insn): New.
	(neon_vca<cmp_op_unsp><mode>_fp16insn_unspec): New.
	(neon_vc<cmp_op>z<mode>): New.
	(neon_vabd<mode>): New.
	(neon_v<maxmin>f<mode>): New.
	(neon_vp<maxmin>fv4hf): New.
	(neon_<fmaxmin_op><mode>): New.
	(neon_vrecps<mode>): New.
	(neon_vrsqrts<mode>): New.
	(neon_vrecpe<mode>): New (VH variant).
	(neon_vcvt<sup><mode>): New (VCVTHI variant).
	(neon_vcvt<sup><mode>): New (VH variant).
	(neon_vcvt<sup>_n<mode>): New (VH variant).
	(neon_vcvt<sup>_n<mode>): New (VCVTHI variant).
	(neon_vcvt<vcvth_op><sup><mode>): New (VH variant).
	(neon_vmul_lane<mode>): New.
	(neon_vmul_n<mode>): New.
	* config/arm/unspecs.md (UNSPEC_VCALE): New.
	(UNSPEC_VCALT): New.
	(UNSPEC_VFMA_LANE): New.
	(UNSPEC_VFMS_LANE): New.
	(UNSPEC_VSQRTE): New.

testsuite/
2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* gcc.target/arm/armv8_2-fp16-arith-1.c: Add tests for float16x4_t
	and float16x8_t.

Comments

Joseph Myers May 18, 2016, 12:58 a.m. UTC | #1
On Tue, 17 May 2016, Matthew Wahab wrote:

> As with the VFP FP16 arithmetic instructions, operations on __fp16
> values are done by conversion to single-precision. Any new optimization
> supported by the instruction descriptions can only apply to code
> generated using intrinsics added in this patch series.

As with the scalar instructions, I think it is legitimate in most cases to 
optimize arithmetic via single precision to work direct on __fp16 values 
(and this would be natural for vectorization of __fp16 arithmetic).
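
For illustration only (a hedged sketch, not something this patch adds or
changes; the function name is invented): with standard-named HF patterns
available, a loop like the following could in principle be vectorized to
f16 vector arithmetic instead of widening each element to single
precision.

  void
  add_fp16_arrays (__fp16 *restrict r, const __fp16 *restrict a,
                   const __fp16 *restrict b, int n)
  {
    for (int i = 0; i < n; i++)
      r[i] = a[i] + b[i];  /* A candidate for vadd.f16.  */
  }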

> A number of the instructions are modelled as two variants, one using
> UNSPEC and the other using RTL operations, with the choice between them
> decided by the -funsafe-math-optimizations flag. This follows the
> single-precision instructions and is due to the half-precision
> operations having the same conditions and restrictions on their use in
> optimizations (when they are enabled).

(Of course, these restrictions still apply.)
Jiong Wang May 19, 2016, 5:01 p.m. UTC | #2
On 18/05/16 01:58, Joseph Myers wrote:
> On Tue, 17 May 2016, Matthew Wahab wrote:
>
>> As with the VFP FP16 arithmetic instructions, operations on __fp16
>> values are done by conversion to single-precision. Any new optimization
>> supported by the instruction descriptions can only apply to code
>> generated using intrinsics added in this patch series.
> As with the scalar instructions, I think it is legitimate in most cases to
> optimize arithmetic via single precision to work direct on __fp16 values
> (and this would be natural for vectorization of __fp16 arithmetic).

Hi Joseph,

   Currently, for vector types like V4HF, there is no type promotion:
the vector operation survives until it reaches the vector lowering
pass, where it is split into HF operations, and those HF operations are
then widened into SF operations during RTL expansion because we don't
have scalar HF support in the standard patterns.

Then,

   * if we add scalar HF modes to the standard patterns, vector HF
     operations will be turned into scalar HF operations instead of
     scalar SF operations.

   * if we add vector HF modes to the standard patterns, vector HF
     operations will generate vector HF instructions directly.

   Will this still cause precision inconsistencies with old GCC when
   there are cascaded vector float operations?

   Thanks

Regards,
Jiong
Joseph Myers May 19, 2016, 5:29 p.m. UTC | #3
On Thu, 19 May 2016, Jiong Wang wrote:

> Then,
> 
>    * if we add scalar HF modes to the standard patterns, vector HF
>      operations will be turned into scalar HF operations instead of
>      scalar SF operations.
> 
>    * if we add vector HF modes to the standard patterns, vector HF
>      operations will generate vector HF instructions directly.
> 
>    Will this still cause precision inconsistencies with old GCC when
>    there are cascaded vector float operations?

I'm not sure inconsistency with old GCC is what's relevant here.

Standard-named RTL patterns have particular semantics.  Those semantics do 
not depend on the target architecture (except where there are target 
macros / hooks to define such dependence).  If you have an instruction 
that matches those target-independent semantics, it should be available 
for the standard-named pattern.  I believe that is the case here, for both 
the scalar and the vector instructions - they have the standard semantics, 
so should be available for the standard patterns.

It is the responsibility of the target-independent parts of the compiler 
to ensure that the RTL generated matches the source code semantics, so 
that providing a standard pattern for an instruction that matches the 
pattern's semantics does not cause any problems regarding source code 
semantics.

That said: if the expander in old GCC is converting a vector HF operation 
into scalar SF operations, I'd expect it also to include a conversion from 
SFmode back to HFmode after those operations, since it will be producing a 
vector HF result.  And that would apply for each individual operation 
expanded.  So I would not expect inconsistency to arise from making direct 
HFmode operations available (given that the semantics of scalar + - * / 
are the same whether you do them directly on HFmode or promote to SFmode, 
do the operation there and then convert the result back to HFmode before 
doing any further operations on it).
James Greenhalgh June 8, 2016, 8:45 a.m. UTC | #4
On Thu, May 19, 2016 at 05:29:16PM +0000, Joseph Myers wrote:
> On Thu, 19 May 2016, Jiong Wang wrote:
> 
> > Then,
> > 
> >    * if we add scalar HF modes to the standard patterns, vector HF
> >      operations will be turned into scalar HF operations instead of
> >      scalar SF operations.
> > 
> >    * if we add vector HF modes to the standard patterns, vector HF
> >      operations will generate vector HF instructions directly.
> > 
> >    Will this still cause precision inconsistencies with old GCC when
> >    there are cascaded vector float operations?
> 
> I'm not sure inconsistency with old GCC is what's relevant here.
> 
> Standard-named RTL patterns have particular semantics.  Those semantics do 
> not depend on the target architecture (except where there are target 
> macros / hooks to define such dependence).  If you have an instruction 
> that matches those target-independent semantics, it should be available 
> for the standard-named pattern.  I believe that is the case here, for both 
> the scalar and the vector instructions - they have the standard semantics, 
> so should be available for the standard patterns.
> 
> It is the responsibility of the target-independent parts of the compiler 
> to ensure that the RTL generated matches the source code semantics, so 
> that providing a standard pattern for an instruction that matches the 
> pattern's semantics does not cause any problems regarding source code 
> semantics.
> 
> That said: if the expander in old GCC is converting a vector HF operation 
> into scalar SF operations, I'd expect it also to include a conversion from 
> SFmode back to HFmode after those operations, since it will be producing a 
> vector HF result.  And that would apply for each individual operation 
> expanded.  So I would not expect inconsistency to arise from making direct 
> HFmode operations available (given that the semantics of scalar + - * / 
> are the same whether you do them directly on HFmode or promote to SFmode, 
> do the operation there and then convert the result back to HFmode before 
> doing any further operations on it).

I think the confusion here is that these two functions:

  float16x8_t
  __attribute__ ((noinline)) 
  foo (float16x8_t a, float16x8_t b, float16x8_t c)
  {
    return a * b / c;
  }

  float16_t
  __attribute__ ((noinline)) 
  bar (float16_t a, float16_t b, float16_t c)
  {
    return a * b / c;
  }

Have different behaviours in terms of when they extend and truncate between
floating-point precisions.

A full testcase calling these functions is attached.

Compile with

  `gcc -O3`
     for AArch64 ARMv8-A
  `gcc -O3 -mfloat-abi=hard -mfpu=neon-fp16 -mfp16-format=ieee -march=armv7-a`
     for ARMv7-A 

This prints:

  Fail:
	Scalar Input	256.000000
	Scalar Output	256.000000
	Vector input	256.000000
	Vector output	inf
  Fail:
	Scalar Input	3.300781
	Scalar Output	3.300781
	Vector input	3.300781
	Vector output	3.302734
  Fail:
	Scalar Input	10000.000000
	Scalar Output	10000.000000
	Vector input	10000.000000
	Vector output	inf
  Fail:
	Scalar Input	0.000003
	Scalar Output	0.000003
	Vector input	0.000003
	Vector output	0.000000
  Fail:
	Scalar Input	0.000400
	Scalar Output	0.000400
	Vector input	0.000400
	Vector output	0.000447

foo, operating on vectors, remains in 16-bit precision throughout gimple,
will scalarise during veclower, and will add float_extend and float_truncate
around each operation during expand to preserve the 16-bit rounding
behaviour. For this testcase, that means two truncates per vector element:
one after the multiply, one after the divide.

bar, operating on scalars, adds promotions early due to TARGET_PROMOTED_TYPE.
In gimple we stay in 32-bit precision for the two operations, and we
truncate only after both operations. That means one truncate, taking place
after the divide.

The extra intermediate truncation is what the first failing case shows:
256.0 * 256.0 = 65536.0 exceeds the largest half-precision value (65504),
so truncating after the multiply gives infinity and the divide cannot
recover it, while the scalar path keeps 65536.0f in single precision and
only truncates the in-range final quotient.

However, I find this surprising at a language level, though I see
that Clang 3.8 has the same behaviour.  ACLE doesn't mention the GCC
vector extensions, so doesn't specify the behaviour of the arithmetic
operators on vector-of-float16_t types. GCC's vector extension documentation
gives this definition for arithmetic operations:

  The types defined in this manner can be used with a subset of normal
  C operations. Currently, GCC allows using the following operators on
  these types: +, -, *, /, unary minus, ^, |, &, ~, %.

  The operations behave like C++ valarrays. Addition is defined as
  the addition of the corresponding elements of the operands. For
  example, in the code below, each of the 4 elements in a is added to
  the corresponding 4 elements in b and the resulting vector is stored
  in c.

  Subtraction, multiplication, division, and the logical operations
  operate in a similar manner. Likewise, the result of using the unary
  minus or complement operators on a vector type is a vector whose
  elements are the negative or complemented values of the corresponding
  elements in the operand. 

Without digging into the compiler code, I would have expected the vector
implementation to give equivalent results to the scalar one.

My question is whether you consider the different behaviour between scalar
float16_t and vector-of-float16_t types to be a bug? I can think of some
ways to fix the vector behaviour if it is buggy, but they would of course
be a change in behaviour from current releases (and from clang 3.8).

Clearly, this makes no difference to your comment that we should implement
these using standard pattern names. Either this is a bug, in which case
the front-end will arrange for the promotion to vector-of-float32_t
types, and implementing the vector standard pattern names would potentially
allow for some optimisation back to vector-of-float16_t type, or this
is not a bug, in which case the vector-of-float16_t standard pattern names
match the expected semantics perfectly.

Thanks,
James
Joseph Myers June 8, 2016, 8:02 p.m. UTC | #5
On Wed, 8 Jun 2016, James Greenhalgh wrote:

> My question is whether you consider the different behaviour between scalar
> float16_t and vector-of-float16_t types to be a bug? I can think of some

No, because it matches how things work for vectors of integer types.  
E.g.:

typedef unsigned char vuc __attribute__((vector_size(8)));

vuc a = { 128, 128, 128, 128, 128, 128, 128, 128 }, b;

int
main (void)
{
  b = a / (a + a);
  return 0;
}

(Does a divide-by-zero, because (a + a) is evaluated without promotion to 
vector of int.)

It's a general rule for vector operations that there are no promotions 
that change the bit-size of the vectors, so arithmetic is done directly on 
unsigned char in this case, even though it normally would not be.  
Conversions when the types match apart from signedness are, as the comment 
in c_common_type notes, not fully defined.

  /* If one type is a vector type, return that type.  (How the usual
     arithmetic conversions apply to the vector types extension is not
     precisely specified.)  */

Patch

From 623f36632cc2848f16ba1c75f400198a72dc6ea4 Mon Sep 17 00:00:00 2001
From: Matthew Wahab <matthew.wahab@arm.com>
Date: Thu, 7 Apr 2016 16:19:57 +0100
Subject: [PATCH 09/17] [PATCH 9/17][ARM] Add NEON FP16 arithmetic
 instructions.

2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* config/arm/iterators.md (VCVTHI): New.
	(NEON_VCMP): Add UNSPEC_VCLT and UNSPEC_VCLE.  Fix a long line.
	(NEON_VAGLTE): New.
	(VFM_LANE_AS): New.
	(VH_CVTTO): New.
	(V_reg): Add HF, V4HF and V8HF.  Fix white-space.
	(V_HALF): Add V4HF.  Fix white-space.
	(V_if_elem): Add HF, V4HF and V8HF.  Fix white-space.
	(V_s_elem): Likewise.
	(V_sz_elem): Fix white-space.
	(V_elem_ch): Likewise.
	(VH_elem_ch): New.
	(scalar_mul_constraint): Add V8HF and V4HF.
	(Is_float_mode): Fix white-space.
	(Is_d_reg): Add V4HF and V8HF.  Fix white-space.
	(q): Add HF.  Fix white-space.
	(float_sup): New.
	(float_SUP): New.
	(cmp_op_unsp): Add UNSPEC_VCALE and UNSPEC_VCALT.
	(neon_vfm_lane_as): New.
	* config/arm/neon.md (add<mode>3_fp16): New.
	(sub<mode>3_fp16): New.
	(mul<mode>3add<mode>_neon): New.
	(*fma<VH:mode>4): New.
	(fma<VH:mode>4_intrinsic): New.
	(fmsub<VCVTF:mode>4_intrinsic): Fix white-space.
	(*fmsub<VH:mode>4): New.
	(fmsub<VH:mode>4_intrinsic): New.
	(<absneg_str><mode>2_fp16): New.
	(neon_v<absneg_str><mode>): New.
	(neon_v<fp16_rnd_str><mode>): New.
	(neon_vsqrte<mode>): New.
	(neon_vpaddv4hf): New.
	(neon_vadd<mode>): New.
	(neon_vsub<mode>): New.
	(neon_vadd<mode>_unspec): New.
	(neon_vsub<mode>_unspec): New.
	(neon_vmulf<mode>): New.
	(neon_vfma<VH:mode>): New.
	(neon_vfms<VH:mode>): New.
	(neon_vc<cmp_op><mode>): New.
	(neon_vc<cmp_op><mode>_fp16insn): New.
	(neon_vc<cmp_op_unsp><mode>_fp16insn_unspec): New.
	(neon_vca<cmp_op><mode>): New.
	(neon_vca<cmp_op><mode>_fp16insn): New.
	(neon_vca<cmp_op_unsp><mode>_fp16insn_unspec): New.
	(neon_vc<cmp_op>z<mode>): New.
	(neon_vabd<mode>): New.
	(neon_v<maxmin>f<mode>): New.
	(neon_vp<maxmin>fv4hf): New.
	(neon_<fmaxmin_op><mode>): New.
	(neon_vrecps<mode>): New.
	(neon_vrsqrts<mode>): New.
	(neon_vrecpe<mode>): New (VH variant).
	(neon_vdup_lane<mode>_internal): New.
	(neon_vdup_lane<mode>): New.
	(neon_vcvt<sup><mode>): New (VCVTHI variant).
	(neon_vcvt<sup><mode>): New (VH variant).
	(neon_vcvt<sup>_n<mode>): New (VH variant).
	(neon_vcvt<sup>_n<mode>): New (VCVTHI variant).
	(neon_vcvt<vcvth_op><sup><mode>): New (VH variant).
	(neon_vmul_lane<mode>): New.
	(neon_vmul_n<mode>): New.
	* config/arm/unspecs.md (UNSPEC_VCALE): New.
	(UNSPEC_VCALT): New.
	(UNSPEC_VFMA_LANE): New.
	(UNSPEC_VFMS_LANE): New.
	(UNSPEC_VSQRTE): New.

testsuite/
2016-05-17  Matthew Wahab  <matthew.wahab@arm.com>

	* gcc.target/arm/armv8_2-fp16-arith-1.c: Add tests for float16x4_t
	and float16x8_t.
---
 gcc/config/arm/iterators.md                        | 121 +++--
 gcc/config/arm/neon.md                             | 503 ++++++++++++++++++++-
 gcc/config/arm/unspecs.md                          |   6 +-
 .../gcc.target/arm/armv8_2-fp16-arith-1.c          |  49 +-
 4 files changed, 621 insertions(+), 58 deletions(-)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 9371b6a..be39e4a 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -145,6 +145,9 @@ 
 ;; Vector modes form int->float conversions.
 (define_mode_iterator VCVTI [V2SI V4SI])
 
+;; Vector modes for int->half conversions.
+(define_mode_iterator VCVTHI [V4HI V8HI])
+
 ;; Vector modes for doubleword multiply-accumulate, etc. insns.
 (define_mode_iterator VMD [V4HI V2SI V2SF])
 
@@ -267,10 +270,14 @@ 
 (define_int_iterator VRINT [UNSPEC_VRINTZ UNSPEC_VRINTP UNSPEC_VRINTM
                             UNSPEC_VRINTR UNSPEC_VRINTX UNSPEC_VRINTA])
 
-(define_int_iterator NEON_VCMP [UNSPEC_VCEQ UNSPEC_VCGT UNSPEC_VCGE UNSPEC_VCLT UNSPEC_VCLE])
+(define_int_iterator NEON_VCMP [UNSPEC_VCEQ UNSPEC_VCGT UNSPEC_VCGE
+				UNSPEC_VCLT UNSPEC_VCLE])
 
 (define_int_iterator NEON_VACMP [UNSPEC_VCAGE UNSPEC_VCAGT])
 
+(define_int_iterator NEON_VAGLTE [UNSPEC_VCAGE UNSPEC_VCAGT
+				  UNSPEC_VCALE UNSPEC_VCALT])
+
 (define_int_iterator VCVT [UNSPEC_VRINTP UNSPEC_VRINTM UNSPEC_VRINTA])
 
 (define_int_iterator NEON_VRINT [UNSPEC_NVRINTP UNSPEC_NVRINTZ UNSPEC_NVRINTM
@@ -398,6 +405,8 @@ 
 
 (define_int_iterator VQRDMLH_AS [UNSPEC_VQRDMLAH UNSPEC_VQRDMLSH])
 
+(define_int_iterator VFM_LANE_AS [UNSPEC_VFMA_LANE UNSPEC_VFMS_LANE])
+
 ;;----------------------------------------------------------------------------
 ;; Mode attributes
 ;;----------------------------------------------------------------------------
@@ -416,6 +425,10 @@ 
 (define_mode_attr V_cvtto [(V2SI "v2sf") (V2SF "v2si")
                            (V4SI "v4sf") (V4SF "v4si")])
 
+;; (Opposite) mode to convert to/from for vector-half mode conversions.
+(define_mode_attr VH_CVTTO [(V4HI "V4HF") (V4HF "V4HI")
+			    (V8HI "V8HF") (V8HF "V8HI")])
+
 ;; Define element mode for each vector mode.
 (define_mode_attr V_elem [(V8QI "QI") (V16QI "QI")
 			  (V4HI "HI") (V8HI "HI")
@@ -459,12 +472,13 @@ 
 
 ;; Register width from element mode
 (define_mode_attr V_reg [(V8QI "P") (V16QI "q")
-                         (V4HI "P") (V8HI  "q")
-                         (V4HF "P") (V8HF  "q")
-                         (V2SI "P") (V4SI  "q")
-                         (V2SF "P") (V4SF  "q")
-                         (DI   "P") (V2DI  "q")
-                         (SF   "")  (DF    "P")])
+			 (V4HI "P") (V8HI  "q")
+			 (V4HF "P") (V8HF  "q")
+			 (V2SI "P") (V4SI  "q")
+			 (V2SF "P") (V4SF  "q")
+			 (DI   "P") (V2DI  "q")
+			 (SF   "")  (DF    "P")
+			 (HF   "")])
 
 ;; Wider modes with the same number of elements.
 (define_mode_attr V_widen [(V8QI "V8HI") (V4HI "V4SI") (V2SI "V2DI")])
@@ -480,7 +494,7 @@ 
 (define_mode_attr V_HALF [(V16QI "V8QI") (V8HI "V4HI")
 			  (V8HF "V4HF") (V4SI  "V2SI")
 			  (V4SF "V2SF") (V2DF "DF")
-                          (V2DI "DI")])
+			  (V2DI "DI") (V4HF "HF")])
 
 ;; Same, but lower-case.
 (define_mode_attr V_half [(V16QI "v8qi") (V8HI "v4hi")
@@ -529,18 +543,22 @@ 
 ;; Get element type from double-width mode, for operations where we 
 ;; don't care about signedness.
 (define_mode_attr V_if_elem [(V8QI "i8")  (V16QI "i8")
-                 (V4HI "i16") (V8HI  "i16")
-                             (V2SI "i32") (V4SI  "i32")
-                             (DI   "i64") (V2DI  "i64")
-                 (V2SF "f32") (V4SF  "f32")
-                 (SF "f32") (DF "f64")])
+			     (V4HI "i16") (V8HI  "i16")
+			     (V2SI "i32") (V4SI  "i32")
+			     (DI   "i64") (V2DI  "i64")
+			     (V2SF "f32") (V4SF  "f32")
+			     (SF   "f32") (DF    "f64")
+			     (HF   "f16") (V4HF  "f16")
+			     (V8HF "f16")])
 
 ;; Same, but for operations which work on signed values.
 (define_mode_attr V_s_elem [(V8QI "s8")  (V16QI "s8")
-                (V4HI "s16") (V8HI  "s16")
-                            (V2SI "s32") (V4SI  "s32")
-                            (DI   "s64") (V2DI  "s64")
-                (V2SF "f32") (V4SF  "f32")])
+			    (V4HI "s16") (V8HI  "s16")
+			    (V2SI "s32") (V4SI  "s32")
+			    (DI   "s64") (V2DI  "s64")
+			    (V2SF "f32") (V4SF  "f32")
+			    (HF   "f16") (V4HF  "f16")
+			    (V8HF "f16")])
 
 ;; Same, but for operations which work on unsigned values.
 (define_mode_attr V_u_elem [(V8QI "u8")  (V16QI "u8")
@@ -557,17 +575,22 @@ 
                              (V2SF "32") (V4SF "32")])
 
 (define_mode_attr V_sz_elem [(V8QI "8")  (V16QI "8")
-                 (V4HI "16") (V8HI  "16")
-                             (V2SI "32") (V4SI  "32")
-                             (DI   "64") (V2DI  "64")
+			     (V4HI "16") (V8HI  "16")
+			     (V2SI "32") (V4SI  "32")
+			     (DI   "64") (V2DI  "64")
 			     (V4HF "16") (V8HF "16")
-                 (V2SF "32") (V4SF  "32")])
+			     (V2SF "32") (V4SF  "32")])
 
 (define_mode_attr V_elem_ch [(V8QI "b")  (V16QI "b")
-                             (V4HI "h") (V8HI  "h")
-                             (V2SI "s") (V4SI  "s")
-                             (DI   "d") (V2DI  "d")
-                             (V2SF "s") (V4SF  "s")])
+			     (V4HI "h") (V8HI  "h")
+			     (V2SI "s") (V4SI  "s")
+			     (DI   "d") (V2DI  "d")
+			     (V2SF "s") (V4SF  "s")
+			     (V2SF "s") (V4SF  "s")])
+
+(define_mode_attr VH_elem_ch [(V4HI "s") (V8HI  "s")
+			      (V4HF "s") (V8HF  "s")
+			      (HF "s")])
 
 ;; Element sizes for duplicating ARM registers to all elements of a vector.
 (define_mode_attr VD_dup [(V8QI "8") (V4HI "16") (V2SI "32") (V2SF "32")])
@@ -603,16 +626,17 @@ 
 ;; This mode attribute is used to obtain the correct register constraints.
 
 (define_mode_attr scalar_mul_constraint [(V4HI "x") (V2SI "t") (V2SF "t")
-                                         (V8HI "x") (V4SI "t") (V4SF "t")])
+					 (V8HI "x") (V4SI "t") (V4SF "t")
+					 (V8HF "x") (V4HF "x")])
 
 ;; Predicates used for setting type for neon instructions
 
 (define_mode_attr Is_float_mode [(V8QI "false") (V16QI "false")
-                 (V4HI "false") (V8HI "false")
-                 (V2SI "false") (V4SI "false")
-                 (V4HF "true") (V8HF "true")
-                 (V2SF "true") (V4SF "true")
-                 (DI "false") (V2DI "false")])
+				 (V4HI "false") (V8HI "false")
+				 (V2SI "false") (V4SI "false")
+				 (V4HF "true") (V8HF "true")
+				 (V2SF "true") (V4SF "true")
+				 (DI "false") (V2DI "false")])
 
 (define_mode_attr Scalar_mul_8_16 [(V8QI "true") (V16QI "true")
 				   (V4HI "true") (V8HI "true")
@@ -621,10 +645,10 @@ 
 				   (DI "false") (V2DI "false")])
 
 (define_mode_attr Is_d_reg [(V8QI "true") (V16QI "false")
-                            (V4HI "true") (V8HI  "false")
-                            (V2SI "true") (V4SI  "false")
-                            (V2SF "true") (V4SF  "false")
-                            (DI   "true") (V2DI  "false")
+			    (V4HI "true") (V8HI  "false")
+			    (V2SI "true") (V4SI  "false")
+			    (V2SF "true") (V4SF  "false")
+			    (DI   "true") (V2DI  "false")
 			    (V4HF "true") (V8HF  "false")])
 
 (define_mode_attr V_mode_nunits [(V8QI "8") (V16QI "16")
@@ -670,12 +694,14 @@ 
 
 ;; Mode attribute used to build the "type" attribute.
 (define_mode_attr q [(V8QI "") (V16QI "_q")
-                     (V4HI "") (V8HI "_q")
-                     (V2SI "") (V4SI "_q")
+		     (V4HI "") (V8HI "_q")
+		     (V2SI "") (V4SI "_q")
 		     (V4HF "") (V8HF "_q")
-                     (V2SF "") (V4SF "_q")
-                     (DI "")   (V2DI "_q")
-                     (DF "")   (V2DF "_q")])
+		     (V2SF "") (V4SF "_q")
+		     (V4HF "") (V8HF "_q")
+		     (DI "")   (V2DI "_q")
+		     (DF "")   (V2DF "_q")
+		     (HF "")])
 
 (define_mode_attr pf [(V8QI "p") (V16QI "p") (V2SF "f") (V4SF "f")])
 
@@ -718,6 +744,10 @@ 
 ;; Conversions.
 (define_code_attr FCVTI32typename [(unsigned_float "u32") (float "s32")])
 
+(define_code_attr float_sup [(unsigned_float "u") (float "s")])
+
+(define_code_attr float_SUP [(unsigned_float "U") (float "S")])
+
 ;;----------------------------------------------------------------------------
 ;; Int attributes
 ;;----------------------------------------------------------------------------
@@ -790,9 +820,10 @@ 
    (UNSPEC_VRNDP "vrintp") (UNSPEC_VRNDX "vrintx")])
 
 (define_int_attr cmp_op_unsp [(UNSPEC_VCEQ "eq") (UNSPEC_VCGT "gt")
-                              (UNSPEC_VCGE "ge") (UNSPEC_VCLE "le")
-                              (UNSPEC_VCLT "lt") (UNSPEC_VCAGE "ge")
-                              (UNSPEC_VCAGT "gt")])
+			      (UNSPEC_VCGE "ge") (UNSPEC_VCLE "le")
+			      (UNSPEC_VCLT "lt") (UNSPEC_VCAGE "ge")
+			      (UNSPEC_VCAGT "gt") (UNSPEC_VCALE "le")
+			      (UNSPEC_VCALT "lt")])
 
 (define_int_attr r [
   (UNSPEC_VRHADD_S "r") (UNSPEC_VRHADD_U "r")
@@ -908,3 +939,7 @@ 
 
 ;; Attributes for VQRDMLAH/VQRDMLSH
 (define_int_attr neon_rdma_as [(UNSPEC_VQRDMLAH "a") (UNSPEC_VQRDMLSH "s")])
+
+;; Attributes for VFMA_LANE/ VFMS_LANE
+(define_int_attr neon_vfm_lane_as
+ [(UNSPEC_VFMA_LANE "a") (UNSPEC_VFMS_LANE "s")])
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 5fcc991..7a44f5f 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -505,6 +505,20 @@ 
                     (const_string "neon_add<q>")))]
 )
 
+(define_insn "add<mode>3_fp16"
+  [(set
+    (match_operand:VH 0 "s_register_operand" "=w")
+    (plus:VH
+     (match_operand:VH 1 "s_register_operand" "w")
+     (match_operand:VH 2 "s_register_operand" "w")))]
+ "TARGET_NEON_FP16INST"
+ "vadd.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set (attr "type")
+   (if_then_else (match_test "<Is_float_mode>")
+    (const_string "neon_fp_addsub_s<q>")
+    (const_string "neon_add<q>")))]
+)
+
 (define_insn "adddi3_neon"
   [(set (match_operand:DI 0 "s_register_operand" "=w,?&r,?&r,?w,?&r,?&r,?&r")
         (plus:DI (match_operand:DI 1 "s_register_operand" "%w,0,0,w,r,0,r")
@@ -543,6 +557,17 @@ 
                     (const_string "neon_sub<q>")))]
 )
 
+(define_insn "sub<mode>3_fp16"
+ [(set
+   (match_operand:VH 0 "s_register_operand" "=w")
+   (minus:VH
+    (match_operand:VH 1 "s_register_operand" "w")
+    (match_operand:VH 2 "s_register_operand" "w")))]
+ "TARGET_NEON_FP16INST"
+ "vsub.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_sub<q>")]
+)
+
 (define_insn "subdi3_neon"
   [(set (match_operand:DI 0 "s_register_operand" "=w,?&r,?&r,?&r,?w")
         (minus:DI (match_operand:DI 1 "s_register_operand" "w,0,r,0,w")
@@ -591,6 +616,16 @@ 
 		    (const_string "neon_mla_<V_elem_ch><q>")))]
 )
 
+(define_insn "mul<mode>3add<mode>_neon"
+  [(set (match_operand:VH 0 "s_register_operand" "=w")
+	(plus:VH (mult:VH (match_operand:VH 2 "s_register_operand" "w")
+			  (match_operand:VH 3 "s_register_operand" "w"))
+		  (match_operand:VH 1 "s_register_operand" "0")))]
+  "TARGET_NEON_FP16INST && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
+  "vmla.f16\t%<V_reg>0, %<V_reg>2, %<V_reg>3"
+  [(set_attr "type" "neon_fp_mla_s<q>")]
+)
+
 (define_insn "mul<mode>3neg<mode>add<mode>_neon"
   [(set (match_operand:VDQW 0 "s_register_operand" "=w")
         (minus:VDQW (match_operand:VDQW 1 "s_register_operand" "0")
@@ -629,6 +664,28 @@ 
   [(set_attr "type" "neon_fp_mla_s<q>")]
 )
 
+(define_insn "*fma<VH:mode>4"
+  [(set (match_operand:VH 0 "register_operand" "=w")
+    (fma:VH
+     (match_operand:VH 1 "register_operand" "w")
+     (match_operand:VH 2 "register_operand" "w")
+     (match_operand:VH 3 "register_operand" "0")))]
+ "TARGET_NEON_FP16INST && flag_unsafe_math_optimizations"
+ "vfma.<V_if_elem>\\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_fp_mla_s<q>")]
+)
+
+(define_insn "fma<VH:mode>4_intrinsic"
+ [(set (match_operand:VH 0 "register_operand" "=w")
+   (fma:VH
+    (match_operand:VH 1 "register_operand" "w")
+    (match_operand:VH 2 "register_operand" "w")
+    (match_operand:VH 3 "register_operand" "0")))]
+ "TARGET_NEON_FP16INST"
+ "vfma.<V_if_elem>\\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_fp_mla_s<q>")]
+)
+
 (define_insn "*fmsub<VCVTF:mode>4"
   [(set (match_operand:VCVTF 0 "register_operand" "=w")
         (fma:VCVTF (neg:VCVTF (match_operand:VCVTF 1 "register_operand" "w"))
@@ -640,13 +697,36 @@ 
 )
 
 (define_insn "fmsub<VCVTF:mode>4_intrinsic"
-  [(set (match_operand:VCVTF 0 "register_operand" "=w")
-        (fma:VCVTF (neg:VCVTF (match_operand:VCVTF 1 "register_operand" "w"))
-		   (match_operand:VCVTF 2 "register_operand" "w")
-		   (match_operand:VCVTF 3 "register_operand" "0")))]
-  "TARGET_NEON && TARGET_FMA"
-  "vfms%?.<V_if_elem>\\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
-  [(set_attr "type" "neon_fp_mla_s<q>")]
+ [(set (match_operand:VCVTF 0 "register_operand" "=w")
+   (fma:VCVTF
+    (neg:VCVTF (match_operand:VCVTF 1 "register_operand" "w"))
+    (match_operand:VCVTF 2 "register_operand" "w")
+    (match_operand:VCVTF 3 "register_operand" "0")))]
+ "TARGET_NEON && TARGET_FMA"
+ "vfms%?.<V_if_elem>\\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_fp_mla_s<q>")]
+)
+
+(define_insn "*fmsub<VH:mode>4"
+ [(set (match_operand:VH 0 "register_operand" "=w")
+   (fma:VH
+    (neg:VH (match_operand:VH 1 "register_operand" "w"))
+    (match_operand:VH 2 "register_operand" "w")
+    (match_operand:VH 3 "register_operand" "0")))]
+ "TARGET_NEON_FP16INST && flag_unsafe_math_optimizations"
+ "vfms.<V_if_elem>\\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_fp_mla_s<q>")]
+)
+
+(define_insn "fmsub<VH:mode>4_intrinsic"
+ [(set (match_operand:VH 0 "register_operand" "=w")
+   (fma:VH
+    (neg:VH (match_operand:VH 1 "register_operand" "w"))
+    (match_operand:VH 2 "register_operand" "w")
+    (match_operand:VH 3 "register_operand" "0")))]
+ "TARGET_NEON_FP16INST"
+ "vfms.<V_if_elem>\\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_fp_mla_s<q>")]
 )
 
 (define_insn "neon_vrint<NEON_VRINT:nvrint_variant><VCVTF:mode>"
@@ -860,6 +940,44 @@ 
   ""
 )
 
+(define_insn "<absneg_str><mode>2_fp16"
+  [(set (match_operand:VH 0 "s_register_operand" "=w")
+    (ABSNEG:VH (match_operand:VH 1 "s_register_operand" "w")))]
+ "TARGET_NEON_FP16INST"
+ "v<absneg_str>.<V_s_elem>\t%<V_reg>0, %<V_reg>1"
+ [(set_attr "type" "neon_abs<q>")]
+)
+
+(define_expand "neon_v<absneg_str><mode>"
+ [(set
+   (match_operand:VH 0 "s_register_operand")
+   (ABSNEG:VH (match_operand:VH 1 "s_register_operand")))]
+ "TARGET_NEON_FP16INST"
+{
+  emit_insn (gen_<absneg_str><mode>2_fp16 (operands[0], operands[1]));
+  DONE;
+})
+
+(define_insn "neon_v<fp16_rnd_str><mode>"
+  [(set (match_operand:VH 0 "s_register_operand" "=w")
+    (unspec:VH
+     [(match_operand:VH 1 "s_register_operand" "w")]
+     FP16_RND))]
+ "TARGET_NEON_FP16INST"
+ "<fp16_rnd_insn>.<V_s_elem>\t%<V_reg>0, %<V_reg>1"
+ [(set_attr "type" "neon_fp_round_s<q>")]
+)
+
+(define_insn "neon_vsqrte<mode>"
+  [(set (match_operand:VH 0 "s_register_operand" "=w")
+    (unspec:VH
+     [(match_operand:VH 1 "s_register_operand" "w")]
+     UNSPEC_VSQRTE))]
+  "TARGET_NEON_FP16INST"
+  "vsqrte.f16\t%<V_reg>0, %<V_reg>1"
+ [(set_attr "type" "neon_fp_rsqrte_s<q>")]
+)
+
 (define_insn "*umin<mode>3_neon"
   [(set (match_operand:VDQIW 0 "s_register_operand" "=w")
 	(umin:VDQIW (match_operand:VDQIW 1 "s_register_operand" "w")
@@ -1601,6 +1719,17 @@ 
                     (const_string "neon_reduc_add<q>")))]
 )
 
+(define_insn "neon_vpaddv4hf"
+ [(set
+   (match_operand:V4HF 0 "s_register_operand" "=w")
+   (unspec:V4HF [(match_operand:V4HF 1 "s_register_operand" "w")
+		 (match_operand:V4HF 2 "s_register_operand" "w")]
+    UNSPEC_VPADD))]
+ "TARGET_NEON_FP16INST"
+ "vpadd.f16\t%P0, %P1, %P2"
+ [(set_attr "type" "neon_reduc_add")]
+)
+
 (define_insn "neon_vpsmin<mode>"
   [(set (match_operand:VD 0 "s_register_operand" "=w")
 	(unspec:VD [(match_operand:VD 1 "s_register_operand" "w")
@@ -1949,6 +2078,26 @@ 
   DONE;
 })
 
+(define_expand "neon_vadd<mode>"
+  [(match_operand:VH 0 "s_register_operand")
+   (match_operand:VH 1 "s_register_operand")
+   (match_operand:VH 2 "s_register_operand")]
+  "TARGET_NEON_FP16INST"
+{
+  emit_insn (gen_add<mode>3_fp16 (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_expand "neon_vsub<mode>"
+  [(match_operand:VH 0 "s_register_operand")
+   (match_operand:VH 1 "s_register_operand")
+   (match_operand:VH 2 "s_register_operand")]
+  "TARGET_NEON_FP16INST"
+{
+  emit_insn (gen_sub<mode>3_fp16 (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
 ; Note that NEON operations don't support the full IEEE 754 standard: in
 ; particular, denormal values are flushed to zero.  This means that GCC cannot
 ; use those instructions for autovectorization, etc. unless
@@ -1974,6 +2123,30 @@ 
                     (const_string "neon_add<q>")))]
 )
 
+(define_insn "neon_vadd<mode>_unspec"
+  [(set
+    (match_operand:VH 0 "s_register_operand" "=w")
+    (unspec:VH
+     [(match_operand:VH 1 "s_register_operand" "w")
+      (match_operand:VH 2 "s_register_operand" "w")]
+     UNSPEC_VADD))]
+ "TARGET_NEON_FP16INST"
+ "vadd.f16\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_add<q>")]
+)
+
+(define_insn "neon_vsub<mode>_unspec"
+  [(set
+    (match_operand:VH 0 "s_register_operand" "=w")
+    (unspec:VH
+     [(match_operand:VH 1 "s_register_operand" "w")
+      (match_operand:VH 2 "s_register_operand" "w")]
+     UNSPEC_VSUB))]
+ "TARGET_NEON_FP16INST"
+ "vsub.f16\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_sub<q>")]
+)
+
 (define_insn "neon_vaddl<sup><mode>"
   [(set (match_operand:<V_widen> 0 "s_register_operand" "=w")
         (unspec:<V_widen> [(match_operand:VDI 1 "s_register_operand" "w")
@@ -2040,6 +2213,17 @@ 
                     (const_string "neon_mul_<V_elem_ch><q>")))]
 )
 
+(define_insn "neon_vmulf<mode>"
+ [(set
+   (match_operand:VH 0 "s_register_operand" "=w")
+   (mult:VH
+    (match_operand:VH 1 "s_register_operand" "w")
+    (match_operand:VH 2 "s_register_operand" "w")))]
+  "TARGET_NEON_FP16INST"
+  "vmul.f16\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_mul_<VH_elem_ch><q>")]
+)
+
 (define_expand "neon_vmla<mode>"
   [(match_operand:VDQW 0 "s_register_operand" "=w")
    (match_operand:VDQW 1 "s_register_operand" "0")
@@ -2068,6 +2252,18 @@ 
   DONE;
 })
 
+(define_expand "neon_vfma<VH:mode>"
+  [(match_operand:VH 0 "s_register_operand")
+   (match_operand:VH 1 "s_register_operand")
+   (match_operand:VH 2 "s_register_operand")
+   (match_operand:VH 3 "s_register_operand")]
+  "TARGET_NEON_FP16INST"
+{
+  emit_insn (gen_fma<mode>4_intrinsic (operands[0], operands[2], operands[3],
+				       operands[1]));
+  DONE;
+})
+
 (define_expand "neon_vfms<VCVTF:mode>"
   [(match_operand:VCVTF 0 "s_register_operand")
    (match_operand:VCVTF 1 "s_register_operand")
@@ -2080,6 +2276,18 @@ 
   DONE;
 })
 
+(define_expand "neon_vfms<VH:mode>"
+  [(match_operand:VH 0 "s_register_operand")
+   (match_operand:VH 1 "s_register_operand")
+   (match_operand:VH 2 "s_register_operand")
+   (match_operand:VH 3 "s_register_operand")]
+  "TARGET_NEON_FP16INST"
+{
+  emit_insn (gen_fmsub<mode>4_intrinsic (operands[0], operands[2], operands[3],
+					 operands[1]));
+  DONE;
+})
+
 ; Used for intrinsics when flag_unsafe_math_optimizations is false.
 
 (define_insn "neon_vmla<mode>_unspec"
@@ -2380,6 +2588,72 @@ 
   [(set_attr "type" "neon_fp_compare_s<q>")]
 )
 
+(define_expand "neon_vc<cmp_op><mode>"
+ [(match_operand:<V_cmp_result> 0 "s_register_operand")
+  (neg:<V_cmp_result>
+   (COMPARISONS:VH
+    (match_operand:VH 1 "s_register_operand")
+    (match_operand:VH 2 "reg_or_zero_operand")))]
+ "TARGET_NEON_FP16INST"
+{
+  /* For FP comparisons use UNSPECS unless -funsafe-math-optimizations
+     are enabled.  */
+  if (GET_MODE_CLASS (<MODE>mode) == MODE_VECTOR_FLOAT
+      && !flag_unsafe_math_optimizations)
+    emit_insn
+      (gen_neon_vc<cmp_op><mode>_fp16insn_unspec
+       (operands[0], operands[1], operands[2]));
+  else
+    emit_insn
+      (gen_neon_vc<cmp_op><mode>_fp16insn
+       (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_insn "neon_vc<cmp_op><mode>_fp16insn"
+ [(set (match_operand:<V_cmp_result> 0 "s_register_operand" "=w,w")
+   (neg:<V_cmp_result>
+    (COMPARISONS:<V_cmp_result>
+     (match_operand:VH 1 "s_register_operand" "w,w")
+     (match_operand:VH 2 "reg_or_zero_operand" "w,Dz"))))]
+ "TARGET_NEON_FP16INST
+  && !(GET_MODE_CLASS (<MODE>mode) == MODE_VECTOR_FLOAT
+  && !flag_unsafe_math_optimizations)"
+{
+  char pattern[100];
+  sprintf (pattern, "vc<cmp_op>.%s%%#<V_sz_elem>\t%%<V_reg>0,"
+	   " %%<V_reg>1, %s",
+	   GET_MODE_CLASS (<MODE>mode) == MODE_VECTOR_FLOAT
+	   ? "f" : "<cmp_type>",
+	   which_alternative == 0
+	   ? "%<V_reg>2" : "#0");
+  output_asm_insn (pattern, operands);
+  return "";
+}
+ [(set (attr "type")
+   (if_then_else (match_operand 2 "zero_operand")
+    (const_string "neon_compare_zero<q>")
+    (const_string "neon_compare<q>")))])
+
+(define_insn "neon_vc<cmp_op_unsp><mode>_fp16insn_unspec"
+ [(set
+   (match_operand:<V_cmp_result> 0 "s_register_operand" "=w,w")
+   (unspec:<V_cmp_result>
+    [(match_operand:VH 1 "s_register_operand" "w,w")
+     (match_operand:VH 2 "reg_or_zero_operand" "w,Dz")]
+    NEON_VCMP))]
+ "TARGET_NEON_FP16INST"
+{
+  char pattern[100];
+  sprintf (pattern, "vc<cmp_op_unsp>.f%%#<V_sz_elem>\t%%<V_reg>0,"
+	   " %%<V_reg>1, %s",
+	   which_alternative == 0
+	   ? "%<V_reg>2" : "#0");
+  output_asm_insn (pattern, operands);
+  return "";
+}
+ [(set_attr "type" "neon_fp_compare_s<q>")])
+
 (define_insn "neon_vc<cmp_op>u<mode>"
   [(set (match_operand:<V_cmp_result> 0 "s_register_operand" "=w")
         (neg:<V_cmp_result>
@@ -2431,6 +2705,60 @@ 
   [(set_attr "type" "neon_fp_compare_s<q>")]
 )
 
+(define_expand "neon_vca<cmp_op><mode>"
+  [(set
+    (match_operand:<V_cmp_result> 0 "s_register_operand")
+    (neg:<V_cmp_result>
+     (GLTE:<V_cmp_result>
+      (abs:VH (match_operand:VH 1 "s_register_operand"))
+      (abs:VH (match_operand:VH 2 "s_register_operand")))))]
+ "TARGET_NEON_FP16INST"
+{
+  if (flag_unsafe_math_optimizations)
+    emit_insn (gen_neon_vca<cmp_op><mode>_fp16insn
+	       (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (gen_neon_vca<cmp_op><mode>_fp16insn_unspec
+	       (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_insn "neon_vca<cmp_op><mode>_fp16insn"
+  [(set
+    (match_operand:<V_cmp_result> 0 "s_register_operand" "=w")
+    (neg:<V_cmp_result>
+     (GLTE:<V_cmp_result>
+      (abs:VH (match_operand:VH 1 "s_register_operand" "w"))
+      (abs:VH (match_operand:VH 2 "s_register_operand" "w")))))]
+ "TARGET_NEON_FP16INST && flag_unsafe_math_optimizations"
+ "vac<cmp_op>.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_fp_compare_s<q>")]
+)
+
+(define_insn "neon_vca<cmp_op_unsp><mode>_fp16insn_unspec"
+ [(set (match_operand:<V_cmp_result> 0 "s_register_operand" "=w")
+   (unspec:<V_cmp_result>
+    [(match_operand:VH 1 "s_register_operand" "w")
+     (match_operand:VH 2 "s_register_operand" "w")]
+    NEON_VAGLTE))]
+ "TARGET_NEON"
+ "vac<cmp_op_unsp>.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_fp_compare_s<q>")]
+)
+
+(define_expand "neon_vc<cmp_op>z<mode>"
+ [(set
+   (match_operand:<V_cmp_result> 0 "s_register_operand")
+   (COMPARISONS:<V_cmp_result>
+    (match_operand:VH 1 "s_register_operand")
+    (const_int 0)))]
+ "TARGET_NEON_FP16INST"
+ {
+  emit_insn (gen_neon_vc<cmp_op><mode> (operands[0], operands[1],
+					CONST0_RTX (<MODE>mode)));
+  DONE;
+})
+
 (define_insn "neon_vtst<mode>"
   [(set (match_operand:VDQIW 0 "s_register_operand" "=w")
         (unspec:VDQIW [(match_operand:VDQIW 1 "s_register_operand" "w")
@@ -2451,6 +2779,16 @@ 
   [(set_attr "type" "neon_abd<q>")]
 )
 
+(define_insn "neon_vabd<mode>"
+  [(set (match_operand:VH 0 "s_register_operand" "=w")
+    (unspec:VH [(match_operand:VH 1 "s_register_operand" "w")
+		(match_operand:VH 2 "s_register_operand" "w")]
+     UNSPEC_VABD_F))]
+ "TARGET_NEON_FP16INST"
+ "vabd.<V_s_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+  [(set_attr "type" "neon_abd<q>")]
+)
+
 (define_insn "neon_vabdf<mode>"
   [(set (match_operand:VCVTF 0 "s_register_operand" "=w")
         (unspec:VCVTF [(match_operand:VCVTF 1 "s_register_operand" "w")
@@ -2513,6 +2851,40 @@ 
   [(set_attr "type" "neon_fp_minmax_s<q>")]
 )
 
+(define_insn "neon_v<maxmin>f<mode>"
+ [(set (match_operand:VH 0 "s_register_operand" "=w")
+   (unspec:VH
+    [(match_operand:VH 1 "s_register_operand" "w")
+     (match_operand:VH 2 "s_register_operand" "w")]
+    VMAXMINF))]
+ "TARGET_NEON_FP16INST"
+ "v<maxmin>.<V_s_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_fp_minmax_s<q>")]
+)
+
+(define_insn "neon_vp<maxmin>fv4hf"
+ [(set (match_operand:V4HF 0 "s_register_operand" "=w")
+   (unspec:V4HF
+    [(match_operand:V4HF 1 "s_register_operand" "w")
+     (match_operand:V4HF 2 "s_register_operand" "w")]
+    VPMAXMINF))]
+ "TARGET_NEON_FP16INST"
+ "vp<maxmin>.f16\t%P0, %P1, %P2"
+  [(set_attr "type" "neon_reduc_minmax")]
+)
+
+(define_insn "neon_<fmaxmin_op><mode>"
+ [(set
+   (match_operand:VH 0 "s_register_operand" "=w")
+   (unspec:VH
+    [(match_operand:VH 1 "s_register_operand" "w")
+     (match_operand:VH 2 "s_register_operand" "w")]
+    VMAXMINFNM))]
+ "TARGET_NEON_FP16INST"
+ "<fmaxmin_op>.<V_s_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_fp_minmax_s<q>")]
+)
+
 ;; Vector forms for the IEEE-754 fmax()/fmin() functions
 (define_insn "<fmaxmin><mode>3"
   [(set (match_operand:VCVTF 0 "s_register_operand" "=w")
@@ -2584,6 +2956,17 @@ 
   [(set_attr "type" "neon_fp_recps_s<q>")]
 )
 
+(define_insn "neon_vrecps<mode>"
+  [(set
+    (match_operand:VH 0 "s_register_operand" "=w")
+    (unspec:VH [(match_operand:VH 1 "s_register_operand" "w")
+		(match_operand:VH 2 "s_register_operand" "w")]
+     UNSPEC_VRECPS))]
+  "TARGET_NEON_FP16INST"
+  "vrecps.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+  [(set_attr "type" "neon_fp_recps_s<q>")]
+)
+
 (define_insn "neon_vrsqrts<mode>"
   [(set (match_operand:VCVTF 0 "s_register_operand" "=w")
         (unspec:VCVTF [(match_operand:VCVTF 1 "s_register_operand" "w")
@@ -2594,6 +2977,17 @@ 
   [(set_attr "type" "neon_fp_rsqrts_s<q>")]
 )
 
+(define_insn "neon_vrsqrts<mode>"
+  [(set
+    (match_operand:VH 0 "s_register_operand" "=w")
+    (unspec:VH [(match_operand:VH 1 "s_register_operand" "w")
+		 (match_operand:VH 2 "s_register_operand" "w")]
+     UNSPEC_VRSQRTS))]
+ "TARGET_NEON_FP16INST"
+ "vrsqrts.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+ [(set_attr "type" "neon_fp_rsqrts_s<q>")]
+)
+
 (define_expand "neon_vabs<mode>"
   [(match_operand:VDQW 0 "s_register_operand" "")
    (match_operand:VDQW 1 "s_register_operand" "")]
@@ -2709,6 +3103,15 @@ 
 })
 
 (define_insn "neon_vrecpe<mode>"
+  [(set (match_operand:VH 0 "s_register_operand" "=w")
+	(unspec:VH [(match_operand:VH 1 "s_register_operand" "w")]
+		   UNSPEC_VRECPE))]
+  "TARGET_NEON_FP16INST"
+  "vrecpe.f16\t%<V_reg>0, %<V_reg>1"
+  [(set_attr "type" "neon_fp_recpe_s<q>")]
+)
+
+(define_insn "neon_vrecpe<mode>"
   [(set (match_operand:V32 0 "s_register_operand" "=w")
 	(unspec:V32 [(match_operand:V32 1 "s_register_operand" "w")]
                     UNSPEC_VRECPE))]
@@ -3251,6 +3654,28 @@  if (BYTES_BIG_ENDIAN)
   [(set_attr "type" "neon_fp_cvt_narrow_s_q")]
 )
 
+(define_insn "neon_vcvt<sup><mode>"
+ [(set
+   (match_operand:<VH_CVTTO> 0 "s_register_operand" "=w")
+   (unspec:<VH_CVTTO>
+    [(match_operand:VCVTHI 1 "s_register_operand" "w")]
+    VCVT_US))]
+ "TARGET_NEON_FP16INST"
+ "vcvt.f16.<sup>%#16\t%<V_reg>0, %<V_reg>1"
+  [(set_attr "type" "neon_int_to_fp_<VH_elem_ch><q>")]
+)
+
+(define_insn "neon_vcvt<sup><mode>"
+ [(set
+   (match_operand:<VH_CVTTO> 0 "s_register_operand" "=w")
+   (unspec:<VH_CVTTO>
+    [(match_operand:VH 1 "s_register_operand" "w")]
+    VCVT_US))]
+ "TARGET_NEON_FP16INST"
+ "vcvt.<sup>%#16.f16\t%<V_reg>0, %<V_reg>1"
+  [(set_attr "type" "neon_fp_to_int_<VH_elem_ch><q>")]
+)
+
 (define_insn "neon_vcvt<sup>_n<mode>"
   [(set (match_operand:<V_CVTTO> 0 "s_register_operand" "=w")
 	(unspec:<V_CVTTO> [(match_operand:VCVTF 1 "s_register_operand" "w")
@@ -3265,6 +3690,20 @@  if (BYTES_BIG_ENDIAN)
 )
 
 (define_insn "neon_vcvt<sup>_n<mode>"
+ [(set (match_operand:<VH_CVTTO> 0 "s_register_operand" "=w")
+   (unspec:<VH_CVTTO>
+    [(match_operand:VH 1 "s_register_operand" "w")
+     (match_operand:SI 2 "immediate_operand" "i")]
+    VCVT_US_N))]
+  "TARGET_NEON_FP16INST"
+{
+  neon_const_bounds (operands[2], 1, 33);
+  return "vcvt.<sup>%#16.f16\t%<V_reg>0, %<V_reg>1, %2";
+}
+ [(set_attr "type" "neon_fp_to_int_<VH_elem_ch><q>")]
+)
+
+(define_insn "neon_vcvt<sup>_n<mode>"
   [(set (match_operand:<V_CVTTO> 0 "s_register_operand" "=w")
 	(unspec:<V_CVTTO> [(match_operand:VCVTI 1 "s_register_operand" "w")
 			   (match_operand:SI 2 "immediate_operand" "i")]
@@ -3277,6 +3716,31 @@  if (BYTES_BIG_ENDIAN)
   [(set_attr "type" "neon_int_to_fp_<V_elem_ch><q>")]
 )
 
+(define_insn "neon_vcvt<sup>_n<mode>"
+ [(set (match_operand:<VH_CVTTO> 0 "s_register_operand" "=w")
+   (unspec:<VH_CVTTO>
+    [(match_operand:VCVTHI 1 "s_register_operand" "w")
+     (match_operand:SI 2 "immediate_operand" "i")]
+    VCVT_US_N))]
+ "TARGET_NEON_FP16INST"
+{
+  neon_const_bounds (operands[2], 1, 33);
+  return "vcvt.f16.<sup>%#16\t%<V_reg>0, %<V_reg>1, %2";
+}
+ [(set_attr "type" "neon_int_to_fp_<VH_elem_ch><q>")]
+)
+
+(define_insn "neon_vcvt<vcvth_op><sup><mode>"
+ [(set
+   (match_operand:<VH_CVTTO> 0 "s_register_operand" "=w")
+   (unspec:<VH_CVTTO>
+    [(match_operand:VH 1 "s_register_operand" "w")]
+    VCVT_HF_US))]
+ "TARGET_NEON_FP16INST"
+ "vcvt<vcvth_op>.<sup>%#16.f16\t%<V_reg>0, %<V_reg>1"
+  [(set_attr "type" "neon_fp_to_int_<VH_elem_ch><q>")]
+)
+
 (define_insn "neon_vmovn<mode>"
   [(set (match_operand:<V_narrow> 0 "s_register_operand" "=w")
 	(unspec:<V_narrow> [(match_operand:VN 1 "s_register_operand" "w")]
@@ -3347,6 +3811,18 @@  if (BYTES_BIG_ENDIAN)
                    (const_string "neon_mul_<V_elem_ch>_scalar<q>")))]
 )
 
+(define_insn "neon_vmul_lane<mode>"
+  [(set (match_operand:VH 0 "s_register_operand" "=w")
+	(unspec:VH [(match_operand:VH 1 "s_register_operand" "w")
+		    (match_operand:V4HF 2 "s_register_operand"
+		     "<scalar_mul_constraint>")
+		     (match_operand:SI 3 "immediate_operand" "i")]
+		     UNSPEC_VMUL_LANE))]
+  "TARGET_NEON_FP16INST"
+  "vmul.f16\t%<V_reg>0, %<V_reg>1, %P2[%c3]"
+  [(set_attr "type" "neon_fp_mul_s_scalar<q>")]
+)
+
 (define_insn "neon_vmull<sup>_lane<mode>"
   [(set (match_operand:<V_widen> 0 "s_register_operand" "=w")
 	(unspec:<V_widen> [(match_operand:VMDI 1 "s_register_operand" "w")
@@ -3601,6 +4077,19 @@  if (BYTES_BIG_ENDIAN)
   DONE;
 })
 
+(define_expand "neon_vmul_n<mode>"
+  [(match_operand:VH 0 "s_register_operand")
+   (match_operand:VH 1 "s_register_operand")
+   (match_operand:<V_elem> 2 "s_register_operand")]
+  "TARGET_NEON_FP16INST"
+{
+  rtx tmp = gen_reg_rtx (V4HFmode);
+  emit_insn (gen_neon_vset_lanev4hf (tmp, operands[2], tmp, const0_rtx));
+  emit_insn (gen_neon_vmul_lane<mode> (operands[0], operands[1], tmp,
+				       const0_rtx));
+  DONE;
+})
+
 (define_expand "neon_vmulls_n<mode>"
   [(match_operand:<V_widen> 0 "s_register_operand" "")
    (match_operand:VMDI 1 "s_register_operand" "")
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 57a47ff..cc5a16a 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -191,6 +191,8 @@ 
   UNSPEC_VBSL
   UNSPEC_VCAGE
   UNSPEC_VCAGT
+  UNSPEC_VCALE
+  UNSPEC_VCALT
   UNSPEC_VCEQ
   UNSPEC_VCGE
   UNSPEC_VCGEU
@@ -258,6 +260,8 @@ 
   UNSPEC_VMLSL_S_LANE
   UNSPEC_VMLSL_U_LANE
   UNSPEC_VMLSL_LANE
+  UNSPEC_VFMA_LANE
+  UNSPEC_VFMS_LANE
   UNSPEC_VMOVL_S
   UNSPEC_VMOVL_U
   UNSPEC_VMOVN
@@ -386,5 +390,5 @@ 
   UNSPEC_VRNDN
   UNSPEC_VRNDP
   UNSPEC_VRNDX
+  UNSPEC_VSQRTE
 ])
-
diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-1.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-1.c
index 8399288..029d13c 100644
--- a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-1.c
+++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-arith-1.c
@@ -9,6 +9,9 @@  typedef __fp16 float16_t;
 typedef __simd64_float16_t float16x4_t;
 typedef __simd128_float16_t float16x8_t;
 
+typedef short int16x4_t __attribute__ ((vector_size (8)));
+typedef short int int16x8_t  __attribute__ ((vector_size (16)));
+
 float16_t
 fp16_abs (float16_t a)
 {
@@ -50,15 +53,47 @@  TEST_CMP (greaterthan, >, int, float16_t)
 TEST_CMP (lessthanequal, <=, int, float16_t)
 TEST_CMP (greaterthanqual, >=, int, float16_t)
 
-/* { dg-final { scan-assembler-times {vneg\.f16\ts[0-9]+, s[0-9]+} 1 } }  */
+/* Vectors of size 4.  */
+
+TEST_UNOP (neg, -, float16x4_t)
+
+TEST_BINOP (add, +, float16x4_t)
+TEST_BINOP (sub, -, float16x4_t)
+TEST_BINOP (mult, *, float16x4_t)
+TEST_BINOP (div, /, float16x4_t)
+
+TEST_CMP (equal, ==, int16x4_t, float16x4_t)
+TEST_CMP (unequal, !=, int16x4_t, float16x4_t)
+TEST_CMP (lessthan, <, int16x4_t, float16x4_t)
+TEST_CMP (greaterthan, >, int16x4_t, float16x4_t)
+TEST_CMP (lessthanequal, <=, int16x4_t, float16x4_t)
+TEST_CMP (greaterthanqual, >=, int16x4_t, float16x4_t)
+
+/* Vectors of size 8.  */
+
+TEST_UNOP (neg, -, float16x8_t)
+
+TEST_BINOP (add, +, float16x8_t)
+TEST_BINOP (sub, -, float16x8_t)
+TEST_BINOP (mult, *, float16x8_t)
+TEST_BINOP (div, /, float16x8_t)
+
+TEST_CMP (equal, ==, int16x8_t, float16x8_t)
+TEST_CMP (unequal, !=, int16x8_t, float16x8_t)
+TEST_CMP (lessthan, <, int16x8_t, float16x8_t)
+TEST_CMP (greaterthan, >, int16x8_t, float16x8_t)
+TEST_CMP (lessthanequal, <=, int16x8_t, float16x8_t)
+TEST_CMP (greaterthanqual, >=, int16x8_t, float16x8_t)
+
+/* { dg-final { scan-assembler-times {vneg\.f16\ts[0-9]+, s[0-9]+} 13 } }  */
 /* { dg-final { scan-assembler-times {vabs\.f16\ts[0-9]+, s[0-9]+} 2 } }  */
 
-/* { dg-final { scan-assembler-times {vadd\.f32\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
-/* { dg-final { scan-assembler-times {vsub\.f32\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
-/* { dg-final { scan-assembler-times {vmul\.f32\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
-/* { dg-final { scan-assembler-times {vdiv\.f32\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
-/* { dg-final { scan-assembler-times {vcmp\.f32\ts[0-9]+, s[0-9]+} 2 } }  */
-/* { dg-final { scan-assembler-times {vcmpe\.f32\ts[0-9]+, s[0-9]+} 4 } }  */
+/* { dg-final { scan-assembler-times {vadd\.f32\ts[0-9]+, s[0-9]+, s[0-9]+} 13 } }  */
+/* { dg-final { scan-assembler-times {vsub\.f32\ts[0-9]+, s[0-9]+, s[0-9]+} 13 } }  */
+/* { dg-final { scan-assembler-times {vmul\.f32\ts[0-9]+, s[0-9]+, s[0-9]+} 13 } }  */
+/* { dg-final { scan-assembler-times {vdiv\.f32\ts[0-9]+, s[0-9]+, s[0-9]+} 13 } }  */
+/* { dg-final { scan-assembler-times {vcmp\.f32\ts[0-9]+, s[0-9]+} 26 } }  */
+/* { dg-final { scan-assembler-times {vcmpe\.f32\ts[0-9]+, s[0-9]+} 52 } }  */
 
 /* { dg-final { scan-assembler-not {vadd\.f16} } }  */
 /* { dg-final { scan-assembler-not {vsub\.f16} } }  */
-- 
2.1.4