From patchwork Wed Dec 6 07:04:49 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jiahao Xu X-Patchwork-Id: 1872469 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4SlT1x0WDTz23mf for ; Wed, 6 Dec 2023 18:05:25 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 1424A386D62B for ; Wed, 6 Dec 2023 07:05:23 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail.loongson.cn (mail.loongson.cn [114.242.206.163]) by sourceware.org (Postfix) with ESMTP id C72D0386C5A5 for ; Wed, 6 Dec 2023 07:05:07 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org C72D0386C5A5 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=loongson.cn Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=loongson.cn ARC-Filter: OpenARC Filter v1.0.0 sourceware.org C72D0386C5A5 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=114.242.206.163 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701846312; cv=none; b=QRulH0vUHGHLiyfIMO+D/duhS4ehds7V97j28+lw/OuJzok3RlrTcEEBxSfJB28EgezG4BL1qwvrDuGnNkidyHC0pi9P4T1Q0kLh0SWV4PYtTq/ohg9qpc1v50Cpy1iu9WwoLvei48IpL/v+PgHMYl5yPSmBr+2GLKaJShXFCPk= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701846312; c=relaxed/simple; bh=8g8MbX5LagJGwRTPWwMhyvVL+MeJ+0CciaeqAruD5+c=; h=From:To:Subject:Date:Message-Id:MIME-Version; b=mxb9z29rQ+TWoiGCpMmdS5d/RgVRPA55uhvfJjPLhhHBiRaJU1NX5KvppA2tHo0gkAi1P2ay+NkesJ/pPui8QjPYMzi5ocCMBnwpNaJWmBw53fLK7vk2sIiEruJuVSAB31ndz0qD+y4plD8yH9PHZNPm+0j332/Kxk85m6fNXqA= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from loongson.cn (unknown [10.10.130.252]) by gateway (Coremail) with SMTP id _____8AxZ+gcHXBlQzs_AA--.25131S3; Wed, 06 Dec 2023 15:05:00 +0800 (CST) Received: from slurm-master.loongson.cn (unknown [10.10.130.252]) by localhost.localdomain (Coremail) with SMTP id AQAAf8Dxvi8XHXBlp0BWAA--.59594S5; Wed, 06 Dec 2023 15:04:57 +0800 (CST) From: Jiahao Xu To: gcc-patches@gcc.gnu.org Cc: xry111@xry111.site, i@xen0n.name, chenglulu@loongson.cn, xuchenghua@loongson.cn, Jiahao Xu Subject: [PATCH v3 1/5] LoongArch: Add support for LoongArch V1.1 approximate instructions. Date: Wed, 6 Dec 2023 15:04:49 +0800 Message-Id: <20231206070453.3252-2-xujiahao@loongson.cn> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20231206070453.3252-1-xujiahao@loongson.cn> References: <20231206070453.3252-1-xujiahao@loongson.cn> MIME-Version: 1.0 X-CM-TRANSID: AQAAf8Dxvi8XHXBlp0BWAA--.59594S5 X-CM-SenderInfo: 50xmxthkdrqz5rrqw2lrqou0/ X-Coremail-Antispam: 1Uk129KBj9fXoWfur1UXFW5Wr4fAF1ktF4kZrc_yoW5WF4DGo WrCF4DJa1xGryIyrW5KrnxXrWjvayFyF4DAay3Zws5Ca1xJr90k347W3WFy342qF1kWrn8 C3s5W3sxXa4xJFs5l-sFpf9Il3svdjkaLaAFLSUrUUUUjb8apTn2vfkv8UJUUUU8wcxFpf 9Il3svdxBIdaVrn0xqx4xG64xvF2IEw4CE5I8CrVC2j2Jv73VFW2AGmfu7bjvjm3AaLaJ3 UjIYCTnIWjp_UUUY17kC6x804xWl14x267AKxVWUJVW8JwAFc2x0x2IEx4CE42xK8VAvwI 8IcIk0rVWrJVCq3wAFIxvE14AKwVWUGVWUXwA2ocxC64kIII0Yj41l84x0c7CEw4AK67xG Y2AK021l84ACjcxK6xIIjxv20xvE14v26r1I6r4UM28EF7xvwVC0I7IYx2IY6xkF7I0E14 v26r4j6F4UM28EF7xvwVC2z280aVAFwI0_Gr1j6F4UJwA2z4x0Y4vEx4A2jsIEc7CjxVAF wI0_Gr1j6F4UJwAS0I0E0xvYzxvE52x082IY62kv0487Mc804VCY07AIYIkI8VC2zVCFFI 0UMc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7IYx2IY67AKxVWUAVWUtwAv7VC2z280 aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4UM4x0Y48IcxkI7VAKI48JMxAIw28Icx kI7VAKI48JMxC20s026xCaFVCjc4AY6r1j6r4UMI8I3I0E5I8CrVAFwI0_Jr0_Jr4lx2Iq xVCjr7xvwVAFwI0_JrI_JrWlx4CE17CEb7AF67AKxVWUAVWUtwCIc40Y0x0EwIxGrwCI42 IY6xIIjxv20xvE14v26r1j6r1xMIIF0xvE2Ix0cI8IcVCY1x0267AKxVWUJVW8JwCI42IY 6xAIw20EY4v20xvaj40_Jr0_JF4lIxAIcVC2z280aVAFwI0_Jr0_Gr1lIxAIcVC2z280aV CY1x0267AKxVWUJVW8JbIYCTnIWIevJa73UjIFyTuYvjxU7MmhUUUUU X-Spam-Status: No, score=-13.1 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, KAM_SHORT, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org This patch adds define_insn/builtins/intrinsics for these instructions, and add option -mfrecipe to control instruction generation. gcc/ChangeLog: * config/loongarch/genopts/isa-evolution.in (fecipe): Add. * config/loongarch/larchintrin.h (__frecipe_s): New intrinsic. (__frecipe_d): Ditto. (__frsqrte_s): Ditto. (__frsqrte_d): Ditto. * config/loongarch/lasx.md (lasx_xvfrecipe_): New insn pattern. (lasx_xvfrsqrte_): Ditto. * config/loongarch/lasxintrin.h (__lasx_xvfrecipe_s): New intrinsic. (__lasx_xvfrecipe_d): Ditto. (__lasx_xvfrsqrte_s): Ditto. (__lasx_xvfrsqrte_d): Ditto. * config/loongarch/loongarch-builtins.cc (AVAIL_ALL): Add predicates. (LSX_EXT_BUILTIN): New macro. (LASX_EXT_BUILTIN): Ditto. * config/loongarch/loongarch-cpucfg-map.h: Regenerate. * config/loongarch/loongarch-c.cc: Add builtin macro "__loongarch_frecipe". * config/loongarch/loongarch-def.cc: Regenerate. * config/loongarch/loongarch-str.h (OPTSTR_FRECIPE): Regenerate. * config/loongarch/loongarch.cc (loongarch_asm_code_end): Dump status for TARGET_FRECIPE. * config/loongarch/loongarch.md (loongarch_frecipe_): New insn pattern. (loongarch_frsqrte_): Ditto. * config/loongarch/loongarch.opt: Regenerate. * config/loongarch/lsx.md (lsx_vfrecipe_): New insn pattern. (lsx_vfrsqrte_): Ditto. * config/loongarch/lsxintrin.h (__lsx_vfrecipe_s): New intrinsic. (__lsx_vfrecipe_d): Ditto. (__lsx_vfrsqrte_s): Ditto. (__lsx_vfrsqrte_d): Ditto. * doc/extend.texi: Add documentation for LoongArch new builtins and intrinsics. gcc/testsuite/ChangeLog: * gcc.target/loongarch/larch-frecipe-builtin.c: New test. * gcc.target/loongarch/vector/lasx/lasx-frecipe-builtin.c: New test. * gcc.target/loongarch/vector/lsx/lsx-frecipe-builtin.c: New test. diff --git a/gcc/config/loongarch/genopts/isa-evolution.in b/gcc/config/loongarch/genopts/isa-evolution.in index a6bc3f87f20..11a198b649f 100644 --- a/gcc/config/loongarch/genopts/isa-evolution.in +++ b/gcc/config/loongarch/genopts/isa-evolution.in @@ -1,3 +1,4 @@ +2 25 frecipe Support frecipe.{s/d} and frsqrte.{s/d} instructions. 2 26 div32 Support div.w[u] and mod.w[u] instructions with inputs not sign-extended. 2 27 lam-bh Support am{swap/add}[_db].{b/h} instructions. 2 28 lamcas Support amcas[_db].{b/h/w/d} instructions. diff --git a/gcc/config/loongarch/larchintrin.h b/gcc/config/loongarch/larchintrin.h index e571ed27b37..bb1cda831eb 100644 --- a/gcc/config/loongarch/larchintrin.h +++ b/gcc/config/loongarch/larchintrin.h @@ -333,6 +333,44 @@ __iocsrwr_d (unsigned long int _1, unsigned int _2) } #endif +#ifdef __loongarch_frecipe +/* Assembly instruction format: fd, fj. */ +/* Data types in instruction templates: SF, SF. */ +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +__frecipe_s (float _1) +{ + __builtin_loongarch_frecipe_s ((float) _1); +} + +/* Assembly instruction format: fd, fj. */ +/* Data types in instruction templates: DF, DF. */ +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +__frecipe_d (double _1) +{ + __builtin_loongarch_frecipe_d ((double) _1); +} + +/* Assembly instruction format: fd, fj. */ +/* Data types in instruction templates: SF, SF. */ +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +__frsqrte_s (float _1) +{ + __builtin_loongarch_frsqrte_s ((float) _1); +} + +/* Assembly instruction format: fd, fj. */ +/* Data types in instruction templates: DF, DF. */ +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +__frsqrte_d (double _1) +{ + __builtin_loongarch_frsqrte_d ((double) _1); +} +#endif + /* Assembly instruction format: ui15. */ /* Data types in instruction templates: USI. */ #define __dbar(/*ui15*/ _1) __builtin_loongarch_dbar ((_1)) diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md index 116b30c0774..f6e5208a6f1 100644 --- a/gcc/config/loongarch/lasx.md +++ b/gcc/config/loongarch/lasx.md @@ -40,8 +40,10 @@ (define_c_enum "unspec" [ UNSPEC_LASX_XVFCVTL UNSPEC_LASX_XVFLOGB UNSPEC_LASX_XVFRECIP + UNSPEC_LASX_XVFRECIPE UNSPEC_LASX_XVFRINT UNSPEC_LASX_XVFRSQRT + UNSPEC_LASX_XVFRSQRTE UNSPEC_LASX_XVFCMP_SAF UNSPEC_LASX_XVFCMP_SEQ UNSPEC_LASX_XVFCMP_SLE @@ -1633,6 +1635,17 @@ (define_insn "lasx_xvfrecip_" [(set_attr "type" "simd_fdiv") (set_attr "mode" "")]) +;; Approximate Reciprocal Instructions. + +(define_insn "lasx_xvfrecipe_" + [(set (match_operand:FLASX 0 "register_operand" "=f") + (unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")] + UNSPEC_LASX_XVFRECIPE))] + "ISA_HAS_LASX && TARGET_FRECIPE" + "xvfrecipe.\t%u0,%u1" + [(set_attr "type" "simd_fdiv") + (set_attr "mode" "")]) + (define_insn "lasx_xvfrsqrt_" [(set (match_operand:FLASX 0 "register_operand" "=f") (unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")] @@ -1642,6 +1655,17 @@ (define_insn "lasx_xvfrsqrt_" [(set_attr "type" "simd_fdiv") (set_attr "mode" "")]) +;; Approximate Reciprocal Square Root Instructions. + +(define_insn "lasx_xvfrsqrte_" + [(set (match_operand:FLASX 0 "register_operand" "=f") + (unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")] + UNSPEC_LASX_XVFRSQRTE))] + "ISA_HAS_LASX && TARGET_FRECIPE" + "xvfrsqrte.\t%u0,%u1" + [(set_attr "type" "simd_fdiv") + (set_attr "mode" "")]) + (define_insn "lasx_xvftint_u__" [(set (match_operand: 0 "register_operand" "=f") (unspec: [(match_operand:FLASX 1 "register_operand" "f")] diff --git a/gcc/config/loongarch/lasxintrin.h b/gcc/config/loongarch/lasxintrin.h index 7bce2c757f1..5e65e76e74c 100644 --- a/gcc/config/loongarch/lasxintrin.h +++ b/gcc/config/loongarch/lasxintrin.h @@ -2399,6 +2399,40 @@ __m256d __lasx_xvfrecip_d (__m256d _1) return (__m256d)__builtin_lasx_xvfrecip_d ((v4f64)_1); } +#if defined(__loongarch_frecipe) +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V8SF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256 __lasx_xvfrecipe_s (__m256 _1) +{ + return (__m256)__builtin_lasx_xvfrecipe_s ((v8f32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256d __lasx_xvfrecipe_d (__m256d _1) +{ + return (__m256d)__builtin_lasx_xvfrecipe_d ((v4f64)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V8SF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256 __lasx_xvfrsqrte_s (__m256 _1) +{ + return (__m256)__builtin_lasx_xvfrsqrte_s ((v8f32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256d __lasx_xvfrsqrte_d (__m256d _1) +{ + return (__m256d)__builtin_lasx_xvfrsqrte_d ((v4f64)_1); +} +#endif + /* Assembly instruction format: xd, xj. */ /* Data types in instruction templates: V8SF, V8SF. */ extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) diff --git a/gcc/config/loongarch/loongarch-builtins.cc b/gcc/config/loongarch/loongarch-builtins.cc index 5d037ab7f10..507fc953c72 100644 --- a/gcc/config/loongarch/loongarch-builtins.cc +++ b/gcc/config/loongarch/loongarch-builtins.cc @@ -120,6 +120,9 @@ struct loongarch_builtin_description AVAIL_ALL (hard_float, TARGET_HARD_FLOAT_ABI) AVAIL_ALL (lsx, ISA_HAS_LSX) AVAIL_ALL (lasx, ISA_HAS_LASX) +AVAIL_ALL (frecipe, TARGET_FRECIPE && TARGET_HARD_FLOAT_ABI) +AVAIL_ALL (lsx_frecipe, ISA_HAS_LSX && TARGET_FRECIPE) +AVAIL_ALL (lasx_frecipe, ISA_HAS_LASX && TARGET_FRECIPE) /* Construct a loongarch_builtin_description from the given arguments. @@ -164,6 +167,15 @@ AVAIL_ALL (lasx, ISA_HAS_LASX) "__builtin_lsx_" #INSN, LARCH_BUILTIN_DIRECT, \ FUNCTION_TYPE, loongarch_builtin_avail_lsx } + /* Define an LSX LARCH_BUILTIN_DIRECT function __builtin_lsx_ + for instruction CODE_FOR_lsx_. FUNCTION_TYPE is a builtin_description + field. AVAIL is the name of the availability predicate, without the leading + loongarch_builtin_avail_. */ +#define LSX_EXT_BUILTIN(INSN, FUNCTION_TYPE, AVAIL) \ + { CODE_FOR_lsx_ ## INSN, \ + "__builtin_lsx_" #INSN, LARCH_BUILTIN_DIRECT, \ + FUNCTION_TYPE, loongarch_builtin_avail_##AVAIL } + /* Define an LSX LARCH_BUILTIN_LSX_TEST_BRANCH function __builtin_lsx_ for instruction CODE_FOR_lsx_. FUNCTION_TYPE is a builtin_description @@ -189,6 +201,15 @@ AVAIL_ALL (lasx, ISA_HAS_LASX) "__builtin_lasx_" #INSN, LARCH_BUILTIN_LASX, \ FUNCTION_TYPE, loongarch_builtin_avail_lasx } +/* Define an LASX LARCH_BUILTIN_DIRECT function __builtin_lasx_ + for instruction CODE_FOR_lasx_. FUNCTION_TYPE is a builtin_description + field. AVAIL is the name of the availability predicate, without the leading + loongarch_builtin_avail_. */ +#define LASX_EXT_BUILTIN(INSN, FUNCTION_TYPE, AVAIL) \ + { CODE_FOR_lasx_ ## INSN, \ + "__builtin_lasx_" #INSN, LARCH_BUILTIN_LASX, \ + FUNCTION_TYPE, loongarch_builtin_avail_##AVAIL } + /* Define an LASX LARCH_BUILTIN_DIRECT_NO_TARGET function __builtin_lasx_ for instruction CODE_FOR_lasx_. FUNCTION_TYPE is a builtin_description field. */ @@ -804,6 +825,27 @@ static const struct loongarch_builtin_description loongarch_builtins[] = { DIRECT_NO_TARGET_BUILTIN (syscall, LARCH_VOID_FTYPE_USI, default), DIRECT_NO_TARGET_BUILTIN (break, LARCH_VOID_FTYPE_USI, default), + /* Built-in functions for frecipe.{s/d} and frsqrte.{s/d}. */ + + DIRECT_BUILTIN (frecipe_s, LARCH_SF_FTYPE_SF, frecipe), + DIRECT_BUILTIN (frecipe_d, LARCH_DF_FTYPE_DF, frecipe), + DIRECT_BUILTIN (frsqrte_s, LARCH_SF_FTYPE_SF, frecipe), + DIRECT_BUILTIN (frsqrte_d, LARCH_DF_FTYPE_DF, frecipe), + + /* Built-in functions for new LSX instructions. */ + + LSX_EXT_BUILTIN (vfrecipe_s, LARCH_V4SF_FTYPE_V4SF, lsx_frecipe), + LSX_EXT_BUILTIN (vfrecipe_d, LARCH_V2DF_FTYPE_V2DF, lsx_frecipe), + LSX_EXT_BUILTIN (vfrsqrte_s, LARCH_V4SF_FTYPE_V4SF, lsx_frecipe), + LSX_EXT_BUILTIN (vfrsqrte_d, LARCH_V2DF_FTYPE_V2DF, lsx_frecipe), + + /* Built-in functions for new LASX instructions. */ + + LASX_EXT_BUILTIN (xvfrecipe_s, LARCH_V8SF_FTYPE_V8SF, lasx_frecipe), + LASX_EXT_BUILTIN (xvfrecipe_d, LARCH_V4DF_FTYPE_V4DF, lasx_frecipe), + LASX_EXT_BUILTIN (xvfrsqrte_s, LARCH_V8SF_FTYPE_V8SF, lasx_frecipe), + LASX_EXT_BUILTIN (xvfrsqrte_d, LARCH_V4DF_FTYPE_V4DF, lasx_frecipe), + /* Built-in functions for LSX. */ LSX_BUILTIN (vsll_b, LARCH_V16QI_FTYPE_V16QI_V16QI), LSX_BUILTIN (vsll_h, LARCH_V8HI_FTYPE_V8HI_V8HI), diff --git a/gcc/config/loongarch/loongarch-c.cc b/gcc/config/loongarch/loongarch-c.cc index fbc33a10351..44f52245c78 100644 --- a/gcc/config/loongarch/loongarch-c.cc +++ b/gcc/config/loongarch/loongarch-c.cc @@ -102,6 +102,9 @@ loongarch_cpu_cpp_builtins (cpp_reader *pfile) else builtin_define ("__loongarch_frlen=0"); + if (TARGET_HARD_FLOAT && TARGET_FRECIPE) + builtin_define ("__loongarch_frecipe"); + if (ISA_HAS_LSX) { builtin_define ("__loongarch_simd"); diff --git a/gcc/config/loongarch/loongarch-cpucfg-map.h b/gcc/config/loongarch/loongarch-cpucfg-map.h index 02ff1671255..148333c249c 100644 --- a/gcc/config/loongarch/loongarch-cpucfg-map.h +++ b/gcc/config/loongarch/loongarch-cpucfg-map.h @@ -29,6 +29,7 @@ static constexpr struct { unsigned int cpucfg_bit; HOST_WIDE_INT isa_evolution_bit; } cpucfg_map[] = { + { 2, 1u << 25, OPTION_MASK_ISA_FRECIPE }, { 2, 1u << 26, OPTION_MASK_ISA_DIV32 }, { 2, 1u << 27, OPTION_MASK_ISA_LAM_BH }, { 2, 1u << 28, OPTION_MASK_ISA_LAMCAS }, diff --git a/gcc/config/loongarch/loongarch-def.cc b/gcc/config/loongarch/loongarch-def.cc index bc6997e45b5..c41804a180e 100644 --- a/gcc/config/loongarch/loongarch-def.cc +++ b/gcc/config/loongarch/loongarch-def.cc @@ -60,7 +60,8 @@ array_arch loongarch_cpu_default_isa = .fpu_ (ISA_EXT_FPU64) .simd_ (ISA_EXT_SIMD_LASX) .evolution_ (OPTION_MASK_ISA_DIV32 | OPTION_MASK_ISA_LD_SEQ_SA - | OPTION_MASK_ISA_LAM_BH | OPTION_MASK_ISA_LAMCAS)); + | OPTION_MASK_ISA_LAM_BH | OPTION_MASK_ISA_LAMCAS + | OPTION_MASK_ISA_FRECIPE)); static inline loongarch_cache la464_cache () { diff --git a/gcc/config/loongarch/loongarch-str.h b/gcc/config/loongarch/loongarch-str.h index 7c78d1443d5..4d1bfd675e8 100644 --- a/gcc/config/loongarch/loongarch-str.h +++ b/gcc/config/loongarch/loongarch-str.h @@ -68,6 +68,7 @@ along with GCC; see the file COPYING3. If not see #define STR_EXPLICIT_RELOCS_NONE "none" #define STR_EXPLICIT_RELOCS_ALWAYS "always" +#define OPTSTR_FRECIPE "frecipe" #define OPTSTR_DIV32 "div32" #define OPTSTR_LAM_BH "lam-bh" #define OPTSTR_LAMCAS "lamcas" diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index 3545e66a10e..57a20bec8a4 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -11503,6 +11503,7 @@ loongarch_asm_code_end (void) loongarch_cpu_strings [la_target.cpu_tune]); fprintf (asm_out_file, "%s Base ISA: %s\n", ASM_COMMENT_START, loongarch_isa_base_strings [la_target.isa.base]); + DUMP_FEATURE (TARGET_FRECIPE); DUMP_FEATURE (TARGET_DIV32); DUMP_FEATURE (TARGET_LAM_BH); DUMP_FEATURE (TARGET_LAMCAS); diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md index 7a101dd64b7..07beede8892 100644 --- a/gcc/config/loongarch/loongarch.md +++ b/gcc/config/loongarch/loongarch.md @@ -59,6 +59,12 @@ (define_c_enum "unspec" [ ;; Stack tie UNSPEC_TIE + ;; RSQRT + UNSPEC_RSQRTE + + ;; RECIP + UNSPEC_RECIPE + ;; CRC UNSPEC_CRC UNSPEC_CRCC @@ -220,6 +226,7 @@ (define_attr "qword_mode" "no,yes" ;; fmadd floating point multiply-add ;; fdiv floating point divide ;; frdiv floating point reciprocal divide +;; frecipe floating point approximate reciprocal ;; fabs floating point absolute value ;; flogb floating point exponent extract ;; fneg floating point negation @@ -229,6 +236,7 @@ (define_attr "qword_mode" "no,yes" ;; fscaleb floating point scale ;; fsqrt floating point square root ;; frsqrt floating point reciprocal square root +;; frsqrte floating point approximate reciprocal square root ;; multi multiword sequence (or user asm statements) ;; atomic atomic memory update instruction ;; syncloop memory atomic operation implemented as a sync loop @@ -238,8 +246,8 @@ (define_attr "type" "unknown,branch,jump,call,load,fpload,fpidxload,store,fpstore,fpidxstore, prefetch,prefetchx,condmove,mgtf,mftg,const,arith,logical, shift,slt,signext,clz,trap,imul,idiv,move, - fmove,fadd,fmul,fmadd,fdiv,frdiv,fabs,flogb,fneg,fcmp,fcopysign,fcvt, - fscaleb,fsqrt,frsqrt,accext,accmod,multi,atomic,syncloop,nop,ghost, + fmove,fadd,fmul,fmadd,fdiv,frdiv,frecipe,fabs,flogb,fneg,fcmp,fcopysign,fcvt, + fscaleb,fsqrt,frsqrt,frsqrte,accext,accmod,multi,atomic,syncloop,nop,ghost, simd_div,simd_fclass,simd_flog2,simd_fadd,simd_fcvt,simd_fmul,simd_fmadd, simd_fdiv,simd_bitins,simd_bitmov,simd_insert,simd_sld,simd_mul,simd_fcmp, simd_fexp2,simd_int_arith,simd_bit,simd_shift,simd_splat,simd_fill, @@ -908,6 +916,18 @@ (define_insn "*recip3" [(set_attr "type" "frdiv") (set_attr "mode" "")]) +;; Approximate Reciprocal Instructions. + +(define_insn "loongarch_frecipe_" + [(set (match_operand:ANYF 0 "register_operand" "=f") + (unspec:ANYF [(match_operand:ANYF 1 "register_operand" "f")] + UNSPEC_RECIPE))] + "TARGET_FRECIPE" + "frecipe.\t%0,%1" + [(set_attr "type" "frecipe") + (set_attr "mode" "") + (set_attr "insn_count" "1")]) + ;; Integer division and modulus. (define_expand "3" [(set (match_operand:GPR 0 "register_operand") @@ -1133,6 +1153,17 @@ (define_insn "*rsqrtb" [(set_attr "type" "frsqrt") (set_attr "mode" "") (set_attr "insn_count" "1")]) + +;; Approximate Reciprocal Square Root Instructions. + +(define_insn "loongarch_frsqrte_" + [(set (match_operand:ANYF 0 "register_operand" "=f") + (unspec:ANYF [(match_operand:ANYF 1 "register_operand" "f")] + UNSPEC_RSQRTE))] + "TARGET_FRECIPE" + "frsqrte.\t%0,%1" + [(set_attr "type" "frsqrte") + (set_attr "mode" "")]) ;; ;; .................... diff --git a/gcc/config/loongarch/loongarch.opt b/gcc/config/loongarch/loongarch.opt index 41e6424e861..cdd59ae4fcf 100644 --- a/gcc/config/loongarch/loongarch.opt +++ b/gcc/config/loongarch/loongarch.opt @@ -260,6 +260,10 @@ default value is 4. Variable HOST_WIDE_INT isa_evolution = 0 +mfrecipe +Target Mask(ISA_FRECIPE) Var(isa_evolution) +Support frecipe.{s/d} and frsqrte.{s/d} instructions. + mdiv32 Target Mask(ISA_DIV32) Var(isa_evolution) Support div.w[u] and mod.w[u] instructions with inputs not sign-extended. diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md index 23239993404..e2393aed139 100644 --- a/gcc/config/loongarch/lsx.md +++ b/gcc/config/loongarch/lsx.md @@ -42,8 +42,10 @@ (define_c_enum "unspec" [ UNSPEC_LSX_VFCVTL UNSPEC_LSX_VFLOGB UNSPEC_LSX_VFRECIP + UNSPEC_LSX_VFRECIPE UNSPEC_LSX_VFRINT UNSPEC_LSX_VFRSQRT + UNSPEC_LSX_VFRSQRTE UNSPEC_LSX_VFCMP_SAF UNSPEC_LSX_VFCMP_SEQ UNSPEC_LSX_VFCMP_SLE @@ -1546,6 +1548,17 @@ (define_insn "lsx_vfrecip_" [(set_attr "type" "simd_fdiv") (set_attr "mode" "")]) +;; Approximate Reciprocal Instructions. + +(define_insn "lsx_vfrecipe_" + [(set (match_operand:FLSX 0 "register_operand" "=f") + (unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")] + UNSPEC_LSX_VFRECIPE))] + "ISA_HAS_LSX && TARGET_FRECIPE" + "vfrecipe.\t%w0,%w1" + [(set_attr "type" "simd_fdiv") + (set_attr "mode" "")]) + (define_insn "lsx_vfrsqrt_" [(set (match_operand:FLSX 0 "register_operand" "=f") (unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")] @@ -1555,6 +1568,17 @@ (define_insn "lsx_vfrsqrt_" [(set_attr "type" "simd_fdiv") (set_attr "mode" "")]) +;; Approximate Reciprocal Square Root Instructions. + +(define_insn "lsx_vfrsqrte_" + [(set (match_operand:FLSX 0 "register_operand" "=f") + (unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")] + UNSPEC_LSX_VFRSQRTE))] + "ISA_HAS_LSX && TARGET_FRECIPE" + "vfrsqrte.\t%w0,%w1" + [(set_attr "type" "simd_fdiv") + (set_attr "mode" "")]) + (define_insn "lsx_vftint_u__" [(set (match_operand: 0 "register_operand" "=f") (unspec: [(match_operand:FLSX 1 "register_operand" "f")] diff --git a/gcc/config/loongarch/lsxintrin.h b/gcc/config/loongarch/lsxintrin.h index 29553c093fa..57a6fc40a8f 100644 --- a/gcc/config/loongarch/lsxintrin.h +++ b/gcc/config/loongarch/lsxintrin.h @@ -2480,6 +2480,40 @@ __m128d __lsx_vfrecip_d (__m128d _1) return (__m128d)__builtin_lsx_vfrecip_d ((v2f64)_1); } +#if defined(__loongarch_frecipe) +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V4SF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128 __lsx_vfrecipe_s (__m128 _1) +{ + return (__m128)__builtin_lsx_vfrecipe_s ((v4f32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128d __lsx_vfrecipe_d (__m128d _1) +{ + return (__m128d)__builtin_lsx_vfrecipe_d ((v2f64)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V4SF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128 __lsx_vfrsqrte_s (__m128 _1) +{ + return (__m128)__builtin_lsx_vfrsqrte_s ((v4f32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128d __lsx_vfrsqrte_d (__m128d _1) +{ + return (__m128d)__builtin_lsx_vfrsqrte_d ((v2f64)_1); +} +#endif + /* Assembly instruction format: vd, vj. */ /* Data types in instruction templates: V4SF, V4SF. */ extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 32ae15e1d5b..98c6d320fbe 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -17027,6 +17027,14 @@ The intrinsics provided are listed below: void __builtin_loongarch_break (imm0_32767) @end smallexample +These instrisic functions are available by using @option{-mfrecipe}. +@smallexample + float __builtin_loongarch_frecipe_s (float); + double __builtin_loongarch_frecipe_d (double); + float __builtin_loongarch_frsqrte_s (float); + double __builtin_loongarch_frsqrte_d (double); +@end smallexample + @emph{Note:}Since the control register is divided into 32-bit and 64-bit, but the access instruction is not distinguished. So GCC renames the control instructions when implementing intrinsics. @@ -17099,6 +17107,15 @@ function you need to include @code{larchintrin.h}. void __break (imm0_32767) @end smallexample +These instrisic functions are available by including @code{larchintrin.h} and +using @option{-mfrecipe}. +@smallexample + float __frecipe_s (float); + double __frecipe_d (double); + float __frsqrte_s (float); + double __frsqrte_d (double); +@end smallexample + Additional built-in functions are available for LoongArch family processors to efficiently use 128-bit floating-point (__float128) values. @@ -17939,6 +17956,15 @@ __m128i __lsx_vxori_b (__m128i, imm0_255); __m128i __lsx_vxor_v (__m128i, __m128i); @end smallexample +These instrisic functions are available by including @code{lsxintrin.h} and +using @option{-mfrecipe} and @option{-mlsx}. +@smallexample +__m128d __lsx_vfrecipe_d (__m128d); +__m128 __lsx_vfrecipe_s (__m128); +__m128d __lsx_vfrsqrte_d (__m128d); +__m128 __lsx_vfrsqrte_s (__m128); +@end smallexample + @node LoongArch ASX Vector Intrinsics @subsection LoongArch ASX Vector Intrinsics @@ -18778,6 +18804,15 @@ __m256i __lasx_xvxori_b (__m256i, imm0_255); __m256i __lasx_xvxor_v (__m256i, __m256i); @end smallexample +These instrisic functions are available by including @code{lasxintrin.h} and +using @option{-mfrecipe} and @option{-mlasx}. +@smallexample +__m256d __lasx_xvfrecipe_d (__m256d); +__m256 __lasx_xvfrecipe_s (__m256); +__m256d __lasx_xvfrsqrte_d (__m256d); +__m256 __lasx_xvfrsqrte_s (__m256); +@end smallexample + @node MIPS DSP Built-in Functions @subsection MIPS DSP Built-in Functions diff --git a/gcc/testsuite/gcc.target/loongarch/larch-frecipe-builtin.c b/gcc/testsuite/gcc.target/loongarch/larch-frecipe-builtin.c new file mode 100644 index 00000000000..b9329f34676 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/larch-frecipe-builtin.c @@ -0,0 +1,28 @@ +/* Test builtins for frecipe.{s/d} and frsqrte.{s/d} instructions */ +/* { dg-do compile } */ +/* { dg-options "-mfrecipe" } */ +/* { dg-final { scan-assembler-times "test_frecipe_s:.*frecipe\\.s.*test_frecipe_s" 1 } } */ +/* { dg-final { scan-assembler-times "test_frecipe_d:.*frecipe\\.d.*test_frecipe_d" 1 } } */ +/* { dg-final { scan-assembler-times "test_frsqrte_s:.*frsqrte\\.s.*test_frsqrte_s" 1 } } */ +/* { dg-final { scan-assembler-times "test_frsqrte_d:.*frsqrte\\.d.*test_frsqrte_d" 1 } } */ + +float +test_frecipe_s (float _1) +{ + return __builtin_loongarch_frecipe_s (_1); +} +double +test_frecipe_d (double _1) +{ + return __builtin_loongarch_frecipe_d (_1); +} +float +test_frsqrte_s (float _1) +{ + return __builtin_loongarch_frsqrte_s (_1); +} +double +test_frsqrte_d (double _1) +{ + return __builtin_loongarch_frsqrte_d (_1); +} diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-frecipe-builtin.c b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-frecipe-builtin.c new file mode 100644 index 00000000000..522535b45a3 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-frecipe-builtin.c @@ -0,0 +1,30 @@ +/* Test builtins for xvfrecipe.{s/d} and xvfrsqrte.{s/d} instructions */ +/* { dg-do compile } */ +/* { dg-options "-mlasx -mfrecipe" } */ +/* { dg-final { scan-assembler-times "lasx_xvfrecipe_s:.*xvfrecipe\\.s.*lasx_xvfrecipe_s" 1 } } */ +/* { dg-final { scan-assembler-times "lasx_xvfrecipe_d:.*xvfrecipe\\.d.*lasx_xvfrecipe_d" 1 } } */ +/* { dg-final { scan-assembler-times "lasx_xvfrsqrte_s:.*xvfrsqrte\\.s.*lasx_xvfrsqrte_s" 1 } } */ +/* { dg-final { scan-assembler-times "lasx_xvfrsqrte_d:.*xvfrsqrte\\.d.*lasx_xvfrsqrte_d" 1 } } */ + +#include + +v8f32 +__lasx_xvfrecipe_s (v8f32 _1) +{ + return __builtin_lasx_xvfrecipe_s (_1); +} +v4f64 +__lasx_xvfrecipe_d (v4f64 _1) +{ + return __builtin_lasx_xvfrecipe_d (_1); +} +v8f32 +__lasx_xvfrsqrte_s (v8f32 _1) +{ + return __builtin_lasx_xvfrsqrte_s (_1); +} +v4f64 +__lasx_xvfrsqrte_d (v4f64 _1) +{ + return __builtin_lasx_xvfrsqrte_d (_1); +} diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-frecipe-builtin.c b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-frecipe-builtin.c new file mode 100644 index 00000000000..4ad0cb0ffd6 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-frecipe-builtin.c @@ -0,0 +1,30 @@ +/* Test builtins for vfrecipe.{s/d} and vfrsqrte.{s/d} instructions */ +/* { dg-do compile } */ +/* { dg-options "-mlsx -mfrecipe" } */ +/* { dg-final { scan-assembler-times "lsx_vfrecipe_s:.*vfrecipe\\.s.*lsx_vfrecipe_s" 1 } } */ +/* { dg-final { scan-assembler-times "lsx_vfrecipe_d:.*vfrecipe\\.d.*lsx_vfrecipe_d" 1 } } */ +/* { dg-final { scan-assembler-times "lsx_vfrsqrte_s:.*vfrsqrte\\.s.*lsx_vfrsqrte_s" 1 } } */ +/* { dg-final { scan-assembler-times "lsx_vfrsqrte_d:.*vfrsqrte\\.d.*lsx_vfrsqrte_d" 1 } } */ + +#include + +v4f32 +__lsx_vfrecipe_s (v4f32 _1) +{ + return __builtin_lsx_vfrecipe_s (_1); +} +v2f64 +__lsx_vfrecipe_d (v2f64 _1) +{ + return __builtin_lsx_vfrecipe_d (_1); +} +v4f32 +__lsx_vfrsqrte_s (v4f32 _1) +{ + return __builtin_lsx_vfrsqrte_s (_1); +} +v2f64 +__lsx_vfrsqrte_d (v2f64 _1) +{ + return __builtin_lsx_vfrsqrte_d (_1); +} From patchwork Wed Dec 6 07:04:50 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jiahao Xu X-Patchwork-Id: 1872472 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4SlT2W3WK4z23mf for ; Wed, 6 Dec 2023 18:05:55 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 36F373870922 for ; Wed, 6 Dec 2023 07:05:53 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from eggs.gnu.org (eggs.gnu.org [IPv6:2001:470:142:3::10]) by sourceware.org (Postfix) with ESMTPS id 80E113865C21 for ; Wed, 6 Dec 2023 07:05:18 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 80E113865C21 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=loongson.cn Authentication-Results: sourceware.org; spf=fail smtp.mailfrom=loongson.cn ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 80E113865C21 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2001:470:142:3::10 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701846324; cv=none; b=FZjwEbF3JUEvW6+7J0Iozq2J2UtJy4FgeWHnOg3kaURMfDTx9R5E8rN8rAcW+sRPNI9tWWoBkYT/Z/6UBjOo1XbRz/VQOSrCxZpypWAjVLgRzI6A5mN88ZyNyonS8RFxUkwaRr0r21/FFS0QdkCULl0QQd9RLznjh1DJ92gT7uM= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701846324; c=relaxed/simple; bh=SYLL0MPKyX3ubCnXPY2Rc8+Yoz12WrTvhRZj6ryyT7Y=; h=From:To:Subject:Date:Message-Id:MIME-Version; b=KjvszpjguK24MQrzNV+yWpZQW9KIcWOAfqJ+YeG10cCkpsbJ8YQT2M3b41Tb2UohDQ43xZ4uyZvYPHtTIQUnZPsSb/JPrAc01cCJW6HsYjRRGKuU2SCdenTcbOunsCyv7AIq5IEhOVSeGWpVT0zFMGJcsxD2gJ90b2N8uXV1mp8= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from mail.loongson.cn ([114.242.206.163]) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rAly6-0000no-FG for gcc-patches@gcc.gnu.org; Wed, 06 Dec 2023 02:05:17 -0500 Received: from loongson.cn (unknown [10.10.130.252]) by gateway (Coremail) with SMTP id _____8AxlPAfHXBlRjs_AA--.60693S3; Wed, 06 Dec 2023 15:05:03 +0800 (CST) Received: from slurm-master.loongson.cn (unknown [10.10.130.252]) by localhost.localdomain (Coremail) with SMTP id AQAAf8Dxvi8XHXBlp0BWAA--.59594S6; Wed, 06 Dec 2023 15:05:01 +0800 (CST) From: Jiahao Xu To: gcc-patches@gcc.gnu.org Cc: xry111@xry111.site, i@xen0n.name, chenglulu@loongson.cn, xuchenghua@loongson.cn, Jiahao Xu Subject: [PATCH v3 2/5] LoongArch: Use standard pattern name for xvfrsqrt/vfrsqrt instructions. Date: Wed, 6 Dec 2023 15:04:50 +0800 Message-Id: <20231206070453.3252-3-xujiahao@loongson.cn> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20231206070453.3252-1-xujiahao@loongson.cn> References: <20231206070453.3252-1-xujiahao@loongson.cn> MIME-Version: 1.0 X-CM-TRANSID: AQAAf8Dxvi8XHXBlp0BWAA--.59594S6 X-CM-SenderInfo: 50xmxthkdrqz5rrqw2lrqou0/ X-Coremail-Antispam: 1Uk129KBj93XoW3WF4xCFyUWr4kWw1xWw4Dtrc_yoW3uw18p3 9rCw1vyrW8JFs7Kr1kt3y5Xr45tr9rGF129a9I93y2kan0q3WDZF1vkFZFqFyjqw4rGryI vw4rW3WjvFWUC3cCm3ZEXasCq-sJn29KB7ZKAUJUUUU5529EdanIXcx71UUUUU7KY7ZEXa sCq-sGcSsGvfJ3Ic02F40EFcxC0VAKzVAqx4xG6I80ebIjqfuFe4nvWSU5nxnvy29KBjDU 0xBIdaVrnRJUUUkFb4IE77IF4wAFF20E14v26r1j6r4UM7CY07I20VC2zVCF04k26cxKx2 IYs7xG6rWj6s0DM7CIcVAFz4kK6r126r13M28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48v e4kI8wA2z4x0Y4vE2Ix0cI8IcVAFwI0_Gr0_Xr1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI 0_Gr0_Cr1l84ACjcxK6I8E87Iv67AKxVW8Jr0_Cr1UM28EF7xvwVC2z280aVCY1x0267AK xVW8Jr0_Cr1UM2AIxVAIcxkEcVAq07x20xvEncxIr21l57IF6xkI12xvs2x26I8E6xACxx 1l5I8CrVACY4xI64kE6c02F40Ex7xfMcIj6xIIjxv20xvE14v26r126r1DMcIj6I8E87Iv 67AKxVWUJVW8JwAm72CE4IkC6x0Yz7v_Jr0_Gr1lF7xvr2IYc2Ij64vIr41l42xK82IYc2 Ij64vIr41l4I8I3I0E4IkC6x0Yz7v_Jr0_Gr1lx2IqxVAqx4xG67AKxVWUJVWUGwC20s02 6x8GjcxK67AKxVWUGVWUWwC2zVAF1VAY17CE14v26r126r1DMIIYrxkI7VAKI48JMIIF0x vE2Ix0cI8IcVAFwI0_JFI_Gr1lIxAIcVC0I7IYx2IY6xkF7I0E14v26r1j6r4UMIIF0xvE 42xK8VAvwI8IcIk0rVWUJVWUCwCI42IY6I8E87Iv67AKxVWUJVW8JwCI42IY6I8E87Iv6x kF7I0E14v26r1j6r4UYxBIdaVFxhVjvjDU0xZFpf9x07jjpB-UUUUU= Received-SPF: pass client-ip=114.242.206.163; envelope-from=xujiahao@loongson.cn; helo=mail.loongson.cn X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-Spam-Status: No, score=-13.3 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, KAM_SHORT, SPF_FAIL, SPF_HELO_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Rename lasx_xvfrsqrt*/lsx_vfrsqrt* to rsqrt2 to align with standard pattern name. Define function use_rsqrt_p to decide when to use rsqrt optab. gcc/ChangeLog: * config/loongarch/lasx.md (lasx_xvfrsqrt_): Renamed to .. (rsqrt2): .. this. * config/loongarch/loongarch-builtins.cc (CODE_FOR_lsx_vfrsqrt_d): Redefine to standard pattern name. (CODE_FOR_lsx_vfrsqrt_s): Ditto. (CODE_FOR_lasx_xvfrsqrt_d): Ditto. (CODE_FOR_lasx_xvfrsqrt_s): Ditto. * config/loongarch/loongarch.cc (use_rsqrt_p): New function. (loongarch_optab_supported_p): Ditto. (TARGET_OPTAB_SUPPORTED_P): New hook. * config/loongarch/loongarch.md (*rsqrta): Remove. (*rsqrt2): New insn pattern. (*rsqrtb): Remove. * config/loongarch/lsx.md (lsx_vfrsqrt_): Renamed to .. (rsqrt2): .. this. gcc/testsuite/ChangeLog: * gcc.target/loongarch/vector/lasx/lasx-rsqrt.c: New test. * gcc.target/loongarch/vector/lsx/lsx-rsqrt.c: New test. diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md index f6e5208a6f1..c8edc1bfd76 100644 --- a/gcc/config/loongarch/lasx.md +++ b/gcc/config/loongarch/lasx.md @@ -1646,10 +1646,10 @@ (define_insn "lasx_xvfrecipe_" [(set_attr "type" "simd_fdiv") (set_attr "mode" "")]) -(define_insn "lasx_xvfrsqrt_" +(define_insn "rsqrt2" [(set (match_operand:FLASX 0 "register_operand" "=f") - (unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")] - UNSPEC_LASX_XVFRSQRT))] + (unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")] + UNSPEC_LASX_XVFRSQRT))] "ISA_HAS_LASX" "xvfrsqrt.\t%u0,%u1" [(set_attr "type" "simd_fdiv") diff --git a/gcc/config/loongarch/loongarch-builtins.cc b/gcc/config/loongarch/loongarch-builtins.cc index 507fc953c72..ba8686d4ceb 100644 --- a/gcc/config/loongarch/loongarch-builtins.cc +++ b/gcc/config/loongarch/loongarch-builtins.cc @@ -500,6 +500,8 @@ AVAIL_ALL (lasx_frecipe, ISA_HAS_LASX && TARGET_FRECIPE) #define CODE_FOR_lsx_vssrlrn_bu_h CODE_FOR_lsx_vssrlrn_u_bu_h #define CODE_FOR_lsx_vssrlrn_hu_w CODE_FOR_lsx_vssrlrn_u_hu_w #define CODE_FOR_lsx_vssrlrn_wu_d CODE_FOR_lsx_vssrlrn_u_wu_d +#define CODE_FOR_lsx_vfrsqrt_d CODE_FOR_rsqrtv2df2 +#define CODE_FOR_lsx_vfrsqrt_s CODE_FOR_rsqrtv4sf2 /* LoongArch ASX define CODE_FOR_lasx_mxxx */ #define CODE_FOR_lasx_xvsadd_b CODE_FOR_ssaddv32qi3 @@ -776,6 +778,8 @@ AVAIL_ALL (lasx_frecipe, ISA_HAS_LASX && TARGET_FRECIPE) #define CODE_FOR_lasx_xvsat_hu CODE_FOR_lasx_xvsat_u_hu #define CODE_FOR_lasx_xvsat_wu CODE_FOR_lasx_xvsat_u_wu #define CODE_FOR_lasx_xvsat_du CODE_FOR_lasx_xvsat_u_du +#define CODE_FOR_lasx_xvfrsqrt_d CODE_FOR_rsqrtv4df2 +#define CODE_FOR_lasx_xvfrsqrt_s CODE_FOR_rsqrtv8sf2 static const struct loongarch_builtin_description loongarch_builtins[] = { #define LARCH_MOVFCSR2GR 0 diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index 57a20bec8a4..96a4b846f2d 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -11487,6 +11487,30 @@ loongarch_builtin_support_vector_misalignment (machine_mode mode, is_packed); } +static bool +use_rsqrt_p (void) +{ + return (flag_finite_math_only + && !flag_trapping_math + && flag_unsafe_math_optimizations); +} + +/* Implement the TARGET_OPTAB_SUPPORTED_P hook. */ + +static bool +loongarch_optab_supported_p (int op, machine_mode, machine_mode, + optimization_type opt_type) +{ + switch (op) + { + case rsqrt_optab: + return opt_type == OPTIMIZE_FOR_SPEED && use_rsqrt_p (); + + default: + return true; + } +} + /* If -fverbose-asm, dump some info for debugging. */ static void loongarch_asm_code_end (void) @@ -11625,6 +11649,9 @@ loongarch_asm_code_end (void) #undef TARGET_FUNCTION_ARG_BOUNDARY #define TARGET_FUNCTION_ARG_BOUNDARY loongarch_function_arg_boundary +#undef TARGET_OPTAB_SUPPORTED_P +#define TARGET_OPTAB_SUPPORTED_P loongarch_optab_supported_p + #undef TARGET_VECTOR_MODE_SUPPORTED_P #define TARGET_VECTOR_MODE_SUPPORTED_P loongarch_vector_mode_supported_p diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md index 07beede8892..fd154b02e48 100644 --- a/gcc/config/loongarch/loongarch.md +++ b/gcc/config/loongarch/loongarch.md @@ -60,6 +60,7 @@ (define_c_enum "unspec" [ UNSPEC_TIE ;; RSQRT + UNSPEC_RSQRT UNSPEC_RSQRTE ;; RECIP @@ -1134,25 +1135,14 @@ (define_insn "sqrt2" (set_attr "mode" "") (set_attr "insn_count" "1")]) -(define_insn "*rsqrta" +(define_insn "*rsqrt2" [(set (match_operand:ANYF 0 "register_operand" "=f") - (div:ANYF (match_operand:ANYF 1 "const_1_operand" "") - (sqrt:ANYF (match_operand:ANYF 2 "register_operand" "f"))))] - "flag_unsafe_math_optimizations" - "frsqrt.\t%0,%2" - [(set_attr "type" "frsqrt") - (set_attr "mode" "") - (set_attr "insn_count" "1")]) - -(define_insn "*rsqrtb" - [(set (match_operand:ANYF 0 "register_operand" "=f") - (sqrt:ANYF (div:ANYF (match_operand:ANYF 1 "const_1_operand" "") - (match_operand:ANYF 2 "register_operand" "f"))))] - "flag_unsafe_math_optimizations" - "frsqrt.\t%0,%2" + (unspec:ANYF [(match_operand:ANYF 1 "register_operand" "f")] + UNSPEC_RSQRT))] + "TARGET_HARD_FLOAT" + "frsqrt.\t%0,%1" [(set_attr "type" "frsqrt") - (set_attr "mode" "") - (set_attr "insn_count" "1")]) + (set_attr "mode" "")]) ;; Approximate Reciprocal Square Root Instructions. diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md index e2393aed139..aeae1b1a622 100644 --- a/gcc/config/loongarch/lsx.md +++ b/gcc/config/loongarch/lsx.md @@ -1559,10 +1559,10 @@ (define_insn "lsx_vfrecipe_" [(set_attr "type" "simd_fdiv") (set_attr "mode" "")]) -(define_insn "lsx_vfrsqrt_" +(define_insn "rsqrt2" [(set (match_operand:FLSX 0 "register_operand" "=f") - (unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")] - UNSPEC_LSX_VFRSQRT))] + (unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")] + UNSPEC_LSX_VFRSQRT))] "ISA_HAS_LSX" "vfrsqrt.\t%w0,%w1" [(set_attr "type" "simd_fdiv") diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-rsqrt.c b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-rsqrt.c new file mode 100644 index 00000000000..24316944d4e --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-rsqrt.c @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mlasx -ffast-math" } */ +/* { dg-final { scan-assembler "xvfrsqrt.s" } } */ +/* { dg-final { scan-assembler "xvfrsqrt.d" } } */ + +extern float sqrtf (float); + +float a[8], b[8]; + +void +foo1(void) +{ + for (int i = 0; i < 8; i++) + a[i] = 1 / sqrtf (b[i]); +} + +extern double sqrt (double); + +double da[4], db[4]; + +void +foo2(void) +{ + for (int i = 0; i < 4; i++) + da[i] = 1 / sqrt (db[i]); +} diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-rsqrt.c b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-rsqrt.c new file mode 100644 index 00000000000..519cc47644c --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-rsqrt.c @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mlsx -ffast-math" } */ +/* { dg-final { scan-assembler "vfrsqrt.s" } } */ +/* { dg-final { scan-assembler "vfrsqrt.d" } } */ + +extern float sqrtf (float); + +float a[4], b[4]; + +void +foo1(void) +{ + for (int i = 0; i < 4; i++) + a[i] = 1 / sqrtf (b[i]); +} + +extern double sqrt (double); + +double da[2], db[2]; + +void +foo2(void) +{ + for (int i = 0; i < 2; i++) + da[i] = 1 / sqrt (db[i]); +} From patchwork Wed Dec 6 07:04:51 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jiahao Xu X-Patchwork-Id: 1872470 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4SlT2959nwz23mf for ; Wed, 6 Dec 2023 18:05:37 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 6B5B1386D621 for ; Wed, 6 Dec 2023 07:05:35 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from eggs.gnu.org (eggs.gnu.org [IPv6:2001:470:142:3::10]) by sourceware.org (Postfix) with ESMTPS id 8324D3845BDE for ; Wed, 6 Dec 2023 07:05:18 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 8324D3845BDE Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=loongson.cn Authentication-Results: sourceware.org; spf=fail smtp.mailfrom=loongson.cn ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 8324D3845BDE Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2001:470:142:3::10 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701846324; cv=none; b=CDBqtL4S3cF6HuyyY/p8R1DQlWtDcomo7K2Ds0YmTO2eTjP+0NfuXY889WPrj45DgMKDxYZTQ9QcTkas2YOhUSf4asOEwbEvR77MxZSogr0IN3x0+VgtCcYYj7NN9p+ilq+irz8N4LK5EADpvE3XeykVuIT8bMntUITCafULmAY= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701846324; c=relaxed/simple; bh=EntkarUqWmzN5aay2Ju73qLYajU1MEfGIwSUv3kUBCA=; h=From:To:Subject:Date:Message-Id:MIME-Version; b=dGQay5qJJkXn0F0+HVmmhP05471+/5m8doxpqX4zkQkq/qzZU4yyeRrbs+v1z9+qmxlvhYlrpPwaKwxXbbGmtaZ4/ymTMRI+FGtiCx6ojrnImgPo+A2hmJtA7jbn7oATVXR8WjN+4kcXC9UihCsNa0abAwOxHNdhOY/J7048w1I= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from mail.loongson.cn ([114.242.206.163]) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rAly8-0000og-1L for gcc-patches@gcc.gnu.org; Wed, 06 Dec 2023 02:05:17 -0500 Received: from loongson.cn (unknown [10.10.130.252]) by gateway (Coremail) with SMTP id _____8Bxd+giHXBlSTs_AA--.24548S3; Wed, 06 Dec 2023 15:05:06 +0800 (CST) Received: from slurm-master.loongson.cn (unknown [10.10.130.252]) by localhost.localdomain (Coremail) with SMTP id AQAAf8Dxvi8XHXBlp0BWAA--.59594S7; Wed, 06 Dec 2023 15:05:04 +0800 (CST) From: Jiahao Xu To: gcc-patches@gcc.gnu.org Cc: xry111@xry111.site, i@xen0n.name, chenglulu@loongson.cn, xuchenghua@loongson.cn, Jiahao Xu Subject: [PATCH v3 3/5] LoongArch: Redefine pattern for xvfrecip/vfrecip instructions. Date: Wed, 6 Dec 2023 15:04:51 +0800 Message-Id: <20231206070453.3252-4-xujiahao@loongson.cn> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20231206070453.3252-1-xujiahao@loongson.cn> References: <20231206070453.3252-1-xujiahao@loongson.cn> MIME-Version: 1.0 X-CM-TRANSID: AQAAf8Dxvi8XHXBlp0BWAA--.59594S7 X-CM-SenderInfo: 50xmxthkdrqz5rrqw2lrqou0/ X-Coremail-Antispam: 1Uk129KBj93XoW3WF43Gr15Jr1UtFWUur1rGrX_yoW7AryDpr ZrC3ZFyrWrJFsIgw1ktay5Xr15Kr9rKF429FW3Z39xAa1jqw1vyF1FkFZIqF17Xw4rKr1I va1Fga1YvFWDC3gCm3ZEXasCq-sJn29KB7ZKAUJUUUU8529EdanIXcx71UUUUU7KY7ZEXa sCq-sGcSsGvfJ3Ic02F40EFcxC0VAKzVAqx4xG6I80ebIjqfuFe4nvWSU5nxnvy29KBjDU 0xBIdaVrnRJUUUkFb4IE77IF4wAFF20E14v26r1j6r4UM7CY07I20VC2zVCF04k26cxKx2 IYs7xG6rWj6s0DM7CIcVAFz4kK6r1Y6r17M28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48v e4kI8wA2z4x0Y4vE2Ix0cI8IcVAFwI0_Gr0_Xr1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI 0_Gr0_Cr1l84ACjcxK6I8E87Iv67AKxVW8Jr0_Cr1UM28EF7xvwVC2z280aVCY1x0267AK xVW8Jr0_Cr1UM2AIxVAIcxkEcVAq07x20xvEncxIr21l57IF6xkI12xvs2x26I8E6xACxx 1l5I8CrVACY4xI64kE6c02F40Ex7xfMcIj6xIIjxv20xvE14v26r1q6rW5McIj6I8E87Iv 67AKxVW8JVWxJwAm72CE4IkC6x0Yz7v_Jr0_Gr1lF7xvr2IYc2Ij64vIr41l42xK82IYc2 Ij64vIr41l4I8I3I0E4IkC6x0Yz7v_Jr0_Gr1lx2IqxVAqx4xG67AKxVWUJVWUGwC20s02 6x8GjcxK67AKxVWUGVWUWwC2zVAF1VAY17CE14v26r126r1DMIIYrxkI7VAKI48JMIIF0x vE2Ix0cI8IcVAFwI0_JFI_Gr1lIxAIcVC0I7IYx2IY6xkF7I0E14v26r1j6r4UMIIF0xvE 42xK8VAvwI8IcIk0rVWUJVWUCwCI42IY6I8E87Iv67AKxVWUJVW8JwCI42IY6I8E87Iv6x kF7I0E14v26r1j6r4UYxBIdaVFxhVjvjDU0xZFpf9x07josjUUUUUU= Received-SPF: pass client-ip=114.242.206.163; envelope-from=xujiahao@loongson.cn; helo=mail.loongson.cn X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-Spam-Status: No, score=-13.6 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, SPF_FAIL, SPF_HELO_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Redefine pattern for [x]vfrecip instructions use rtx code instead of unspec, and enable [x]vfrecip instructions to be generated during auto-vectorization. gcc/ChangeLog: * config/loongarch/lasx.md (lasx_xvfrecip_): Renamed to .. (recip3): .. this. * config/loongarch/loongarch-builtins.cc (CODE_FOR_lsx_vfrecip_d): Redefine to new pattern name. (CODE_FOR_lsx_vfrecip_s): Ditto. (CODE_FOR_lasx_xvfrecip_d): Ditto. (CODE_FOR_lasx_xvfrecip_s): Ditto. (loongarch_expand_builtin_direct): For the vector recip instructions, construct a temporary parameter const1_vector. * config/loongarch/lsx.md (lsx_vfrecip_): Renamed to .. (recip3): .. this. * config/loongarch/predicates.md (const_vector_1_operand): New predicate. diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md index c8edc1bfd76..e4310c4523d 100644 --- a/gcc/config/loongarch/lasx.md +++ b/gcc/config/loongarch/lasx.md @@ -1626,12 +1626,12 @@ (define_insn "lasx_xvfmina_" [(set_attr "type" "simd_fminmax") (set_attr "mode" "")]) -(define_insn "lasx_xvfrecip_" +(define_insn "recip3" [(set (match_operand:FLASX 0 "register_operand" "=f") - (unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")] - UNSPEC_LASX_XVFRECIP))] + (div:FLASX (match_operand:FLASX 1 "const_vector_1_operand" "") + (match_operand:FLASX 2 "register_operand" "f")))] "ISA_HAS_LASX" - "xvfrecip.\t%u0,%u1" + "xvfrecip.\t%u0,%u2" [(set_attr "type" "simd_fdiv") (set_attr "mode" "")]) diff --git a/gcc/config/loongarch/loongarch-builtins.cc b/gcc/config/loongarch/loongarch-builtins.cc index ba8686d4ceb..c77394176db 100644 --- a/gcc/config/loongarch/loongarch-builtins.cc +++ b/gcc/config/loongarch/loongarch-builtins.cc @@ -502,6 +502,8 @@ AVAIL_ALL (lasx_frecipe, ISA_HAS_LASX && TARGET_FRECIPE) #define CODE_FOR_lsx_vssrlrn_wu_d CODE_FOR_lsx_vssrlrn_u_wu_d #define CODE_FOR_lsx_vfrsqrt_d CODE_FOR_rsqrtv2df2 #define CODE_FOR_lsx_vfrsqrt_s CODE_FOR_rsqrtv4sf2 +#define CODE_FOR_lsx_vfrecip_d CODE_FOR_recipv2df3 +#define CODE_FOR_lsx_vfrecip_s CODE_FOR_recipv4sf3 /* LoongArch ASX define CODE_FOR_lasx_mxxx */ #define CODE_FOR_lasx_xvsadd_b CODE_FOR_ssaddv32qi3 @@ -780,6 +782,8 @@ AVAIL_ALL (lasx_frecipe, ISA_HAS_LASX && TARGET_FRECIPE) #define CODE_FOR_lasx_xvsat_du CODE_FOR_lasx_xvsat_u_du #define CODE_FOR_lasx_xvfrsqrt_d CODE_FOR_rsqrtv4df2 #define CODE_FOR_lasx_xvfrsqrt_s CODE_FOR_rsqrtv8sf2 +#define CODE_FOR_lasx_xvfrecip_d CODE_FOR_recipv4df3 +#define CODE_FOR_lasx_xvfrecip_s CODE_FOR_recipv8sf3 static const struct loongarch_builtin_description loongarch_builtins[] = { #define LARCH_MOVFCSR2GR 0 @@ -3024,6 +3028,22 @@ loongarch_expand_builtin_direct (enum insn_code icode, rtx target, tree exp, if (has_target_p) create_output_operand (&ops[opno++], target, TYPE_MODE (TREE_TYPE (exp))); + /* For the vector reciprocal instructions, we need to construct a temporary + parameter const1_vector. */ + switch (icode) + { + case CODE_FOR_recipv8sf3: + case CODE_FOR_recipv4df3: + case CODE_FOR_recipv4sf3: + case CODE_FOR_recipv2df3: + loongarch_prepare_builtin_arg (&ops[2], exp, 0); + create_input_operand (&ops[1], CONST1_RTX (ops[0].mode), ops[0].mode); + return loongarch_expand_builtin_insn (icode, 3, ops, has_target_p); + + default: + break; + } + /* Map the arguments to the other operands. */ gcc_assert (opno + call_expr_nargs (exp) == insn_data[icode].n_generator_args); diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md index aeae1b1a622..06402e3b353 100644 --- a/gcc/config/loongarch/lsx.md +++ b/gcc/config/loongarch/lsx.md @@ -1539,12 +1539,12 @@ (define_insn "lsx_vfmina_" [(set_attr "type" "simd_fminmax") (set_attr "mode" "")]) -(define_insn "lsx_vfrecip_" +(define_insn "recip3" [(set (match_operand:FLSX 0 "register_operand" "=f") - (unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")] - UNSPEC_LSX_VFRECIP))] + (div:FLSX (match_operand:FLSX 1 "const_vector_1_operand" "") + (match_operand:FLSX 2 "register_operand" "f")))] "ISA_HAS_LSX" - "vfrecip.\t%w0,%w1" + "vfrecip.\t%w0,%w2" [(set_attr "type" "simd_fdiv") (set_attr "mode" "")]) diff --git a/gcc/config/loongarch/predicates.md b/gcc/config/loongarch/predicates.md index d02e846cb12..f7796da10b2 100644 --- a/gcc/config/loongarch/predicates.md +++ b/gcc/config/loongarch/predicates.md @@ -227,6 +227,10 @@ (define_predicate "const_1_operand" (and (match_code "const_int,const_wide_int,const_double,const_vector") (match_test "op == CONST1_RTX (GET_MODE (op))"))) +(define_predicate "const_vector_1_operand" + (and (match_code "const_vector") + (match_test "op == CONST1_RTX (GET_MODE (op))"))) + (define_predicate "reg_or_1_operand" (ior (match_operand 0 "const_1_operand") (match_operand 0 "register_operand"))) From patchwork Wed Dec 6 07:04:52 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jiahao Xu X-Patchwork-Id: 1872473 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4SlT3122Syz23mf for ; Wed, 6 Dec 2023 18:06:21 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 36A2E384577C for ; Wed, 6 Dec 2023 07:06:19 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from eggs.gnu.org (eggs.gnu.org [IPv6:2001:470:142:3::10]) by sourceware.org (Postfix) with ESMTPS id 15DDD386D62E for ; Wed, 6 Dec 2023 07:05:20 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 15DDD386D62E Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=loongson.cn Authentication-Results: sourceware.org; spf=fail smtp.mailfrom=loongson.cn ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 15DDD386D62E Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2001:470:142:3::10 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701846324; cv=none; b=HBfA1hRMdHSgCyZwweqfk1XtoLmlIpu87oMY7fBFORu53mLQMvyfPk360TqA8NCcuQzeIXUXfON+CSJWZPcLy8kd6K22c+qR2hJNP2yfy6HxQ3ZTiwP6uEYyLm+GFpOQqe4JnVgwv+n0fydzjK73BmnyGVf+Z56Gaz+6QxL1P7Y= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701846324; c=relaxed/simple; bh=hv/ia8d5dXQGYle6Wo/OdxjbVwZ2FwFheRNeSA5lT9I=; h=From:To:Subject:Date:Message-Id:MIME-Version; b=ib8cePTGaR5BM9vcwYSKKAPnpLY649FSurRMdSINnzkyqfRGBLpKLwuKGygnZ1KbR8xJGvRjRxksMFMEJlQWmXEGArHLkN/GvB0f+EvdKkAg3OYzi2Bb5qh5jYXysccMBdY/vf8oW799ev6jFE9rAZ6KoE84rFGil9E2uhucZaA= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from mail.loongson.cn ([114.242.206.163]) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rAly8-0000px-1R for gcc-patches@gcc.gnu.org; Wed, 06 Dec 2023 02:05:19 -0500 Received: from loongson.cn (unknown [10.10.130.252]) by gateway (Coremail) with SMTP id _____8BxHOslHXBlTDs_AA--.56420S3; Wed, 06 Dec 2023 15:05:10 +0800 (CST) Received: from slurm-master.loongson.cn (unknown [10.10.130.252]) by localhost.localdomain (Coremail) with SMTP id AQAAf8Dxvi8XHXBlp0BWAA--.59594S8; Wed, 06 Dec 2023 15:05:07 +0800 (CST) From: Jiahao Xu To: gcc-patches@gcc.gnu.org Cc: xry111@xry111.site, i@xen0n.name, chenglulu@loongson.cn, xuchenghua@loongson.cn, Jiahao Xu Subject: [PATCH v3 4/5] LoongArch: New options -mrecip and -mrecip= with ffast-math. Date: Wed, 6 Dec 2023 15:04:52 +0800 Message-Id: <20231206070453.3252-5-xujiahao@loongson.cn> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20231206070453.3252-1-xujiahao@loongson.cn> References: <20231206070453.3252-1-xujiahao@loongson.cn> MIME-Version: 1.0 X-CM-TRANSID: AQAAf8Dxvi8XHXBlp0BWAA--.59594S8 X-CM-SenderInfo: 50xmxthkdrqz5rrqw2lrqou0/ X-Coremail-Antispam: 1Uk129KBj9fXoWfCr47Xr48tF15KrW5Wr45Jwc_yoW5uF4rGo WrAF4DGw18GrySkw4DKrsxZry8Xw1jyr4xAa9I9wn5CFs7Xr15t3sFka1Yv343CrnxXry5 C3s7uFZ8Z347Za1kl-sFpf9Il3svdjkaLaAFLSUrUUUUjb8apTn2vfkv8UJUUUU8wcxFpf 9Il3svdxBIdaVrn0xqx4xG64xvF2IEw4CE5I8CrVC2j2Jv73VFW2AGmfu7bjvjm3AaLaJ3 UjIYCTnIWjp_UUUY87kC6x804xWl14x267AKxVWUJVW8JwAFc2x0x2IEx4CE42xK8VAvwI 8IcIk0rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2ocxC64kIII0Yj41l84x0c7CEw4AK67xG Y2AK021l84ACjcxK6xIIjxv20xvE14v26r4j6ryUM28EF7xvwVC0I7IYx2IY6xkF7I0E14 v26r4j6F4UM28EF7xvwVC2z280aVAFwI0_Gr1j6F4UJwA2z4x0Y4vEx4A2jsIEc7CjxVAF wI0_Gr1j6F4UJwAS0I0E0xvYzxvE52x082IY62kv0487Mc804VCY07AIYIkI8VC2zVCFFI 0UMc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7IYx2IY67AKxVWUtVWrXwAv7VC2z280 aVAFwI0_Gr0_Cr1lOx8S6xCaFVCjc4AY6r1j6r4UM4x0Y48IcxkI7VAKI48JMxAIw28Icx kI7VAKI48JMxC20s026xCaFVCjc4AY6r1j6r4UMI8I3I0E5I8CrVAFwI0_Jr0_Jr4lx2Iq xVCjr7xvwVAFwI0_JrI_JrWlx4CE17CEb7AF67AKxVWUAVWUtwCIc40Y0x0EwIxGrwCI42 IY6xIIjxv20xvE14v26r4j6ryUMIIF0xvE2Ix0cI8IcVCY1x0267AKxVW8JVWxJwCI42IY 6xAIw20EY4v20xvaj40_Jr0_JF4lIxAIcVC2z280aVAFwI0_Gr0_Cr1lIxAIcVC2z280aV CY1x0267AKxVW8JVW8JrUvcSsGvfC2KfnxnUUI43ZEXa7IU84xRDUUUUU== Received-SPF: pass client-ip=114.242.206.163; envelope-from=xujiahao@loongson.cn; helo=mail.loongson.cn X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-Spam-Status: No, score=-13.3 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, KAM_SHORT, SPF_FAIL, SPF_HELO_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org When both the -mrecip and -mfrecipe options are enabled, use approximate reciprocal instructions and approximate reciprocal square root instructions with additional Newton-Raphson steps to implement single precision floating-point division, square root and reciprocal square root operations, for a better performance. gcc/ChangeLog: * config/loongarch/genopts/loongarch.opt.in (recip_mask): New variable. (-mrecip, -mrecip): New options. * config/loongarch/lasx.md (div3): New expander. (*div3): Rename. (sqrt2): New expander. (*sqrt2): Rename. (rsqrt2): New expander. * config/loongarch/loongarch-protos.h (loongarch_emit_swrsqrtsf): New prototype. (loongarch_emit_swdivsf): Ditto. * config/loongarch/loongarch.cc (loongarch_option_override_internal): Set recip_mask for -mrecip and -mrecip= options. (loongarch_emit_swrsqrtsf): New function. (loongarch_emit_swdivsf): Ditto. * config/loongarch/loongarch.h (RECIP_MASK_NONE, RECIP_MASK_DIV, RECIP_MASK_SQRT RECIP_MASK_RSQRT, RECIP_MASK_VEC_DIV, RECIP_MASK_VEC_SQRT, RECIP_MASK_VEC_RSQRT RECIP_MASK_ALL): New bitmasks. (TARGET_RECIP_DIV, TARGET_RECIP_SQRT, TARGET_RECIP_RSQRT, TARGET_RECIP_VEC_DIV TARGET_RECIP_VEC_SQRT, TARGET_RECIP_VEC_RSQRT): New tests. * config/loongarch/loongarch.md (sqrt2): New expander. (*sqrt2): Rename. (rsqrt2): New expander. * config/loongarch/loongarch.opt (recip_mask): New variable. (-mrecip, -mrecip): New options. * config/loongarch/lsx.md (div3): New expander. (*div3): Rename. (sqrt2): New expander. (*sqrt2): Rename. (rsqrt2): New expander. * config/loongarch/predicates.md (reg_or_vecotr_1_operand): New predicate. * doc/invoke.texi (LoongArch Options): Document new options. gcc/testsuite/ChangeLog: * gcc.target/loongarch/divf.c: New test. * gcc.target/loongarch/recip-divf.c: New test. * gcc.target/loongarch/recip-sqrtf.c: New test. * gcc.target/loongarch/sqrtf.c: New test. * gcc.target/loongarch/vector/lasx/lasx-divf.c: New test. * gcc.target/loongarch/vector/lasx/lasx-recip-divf.c: New test. * gcc.target/loongarch/vector/lasx/lasx-recip-sqrtf.c: New test. * gcc.target/loongarch/vector/lasx/lasx-recip.c: New test. * gcc.target/loongarch/vector/lasx/lasx-sqrtf.c: New test. * gcc.target/loongarch/vector/lsx/lsx-divf.c: New test. * gcc.target/loongarch/vector/lsx/lsx-recip-divf.c: New test. * gcc.target/loongarch/vector/lsx/lsx-recip-sqrtf.c: New test. * gcc.target/loongarch/vector/lsx/lsx-recip.c: New test. * gcc.target/loongarch/vector/lsx/lsx-sqrtf.c: New test. diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in b/gcc/config/loongarch/genopts/loongarch.opt.in index 483b185b059..c3848d02fd3 100644 --- a/gcc/config/loongarch/genopts/loongarch.opt.in +++ b/gcc/config/loongarch/genopts/loongarch.opt.in @@ -23,6 +23,9 @@ config/loongarch/loongarch-opts.h HeaderInclude config/loongarch/loongarch-str.h +TargetVariable +unsigned int recip_mask = 0 + ; ISA related options ;; Base ISA Enum @@ -194,6 +197,14 @@ mexplicit-relocs Target Var(la_opt_explicit_relocs_backward) Init(M_OPT_UNSET) Use %reloc() assembly operators (for backward compatibility). +mrecip +Target RejectNegative Var(loongarch_recip) +Generate approximate reciprocal divide and square root for better throughput. + +mrecip= +Target RejectNegative Joined Var(loongarch_recip_name) +Control generation of reciprocal estimates. + ; The code model option names for -mcmodel. Enum Name(cmodel) Type(int) diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md index e4310c4523d..f6f2feedbb3 100644 --- a/gcc/config/loongarch/lasx.md +++ b/gcc/config/loongarch/lasx.md @@ -1194,7 +1194,25 @@ (define_insn "mul3" [(set_attr "type" "simd_fmul") (set_attr "mode" "")]) -(define_insn "div3" +(define_expand "div3" + [(set (match_operand:FLASX 0 "register_operand") + (div:FLASX (match_operand:FLASX 1 "reg_or_vecotr_1_operand") + (match_operand:FLASX 2 "register_operand")))] + "ISA_HAS_LASX" +{ + if (mode == V8SFmode + && TARGET_RECIP_VEC_DIV + && optimize_insn_for_speed_p () + && flag_finite_math_only && !flag_trapping_math + && flag_unsafe_math_optimizations) + { + loongarch_emit_swdivsf (operands[0], operands[1], + operands[2], V8SFmode); + DONE; + } +}) + +(define_insn "*div3" [(set (match_operand:FLASX 0 "register_operand" "=f") (div:FLASX (match_operand:FLASX 1 "register_operand" "f") (match_operand:FLASX 2 "register_operand" "f")))] @@ -1223,7 +1241,23 @@ (define_insn "fnma4" [(set_attr "type" "simd_fmadd") (set_attr "mode" "")]) -(define_insn "sqrt2" +(define_expand "sqrt2" + [(set (match_operand:FLASX 0 "register_operand") + (sqrt:FLASX (match_operand:FLASX 1 "register_operand")))] + "ISA_HAS_LASX" +{ + if (mode == V8SFmode + && TARGET_RECIP_VEC_SQRT + && flag_unsafe_math_optimizations + && optimize_insn_for_speed_p () + && flag_finite_math_only && !flag_trapping_math) + { + loongarch_emit_swrsqrtsf (operands[0], operands[1], V8SFmode, 0); + DONE; + } +}) + +(define_insn "*sqrt2" [(set (match_operand:FLASX 0 "register_operand" "=f") (sqrt:FLASX (match_operand:FLASX 1 "register_operand" "f")))] "ISA_HAS_LASX" @@ -1646,7 +1680,20 @@ (define_insn "lasx_xvfrecipe_" [(set_attr "type" "simd_fdiv") (set_attr "mode" "")]) -(define_insn "rsqrt2" +(define_expand "rsqrt2" + [(set (match_operand:FLASX 0 "register_operand" "=f") + (unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")] + UNSPEC_LASX_XVFRSQRT))] + "ISA_HAS_LASX" + { + if (mode == V8SFmode && TARGET_RECIP_VEC_RSQRT) + { + loongarch_emit_swrsqrtsf (operands[0], operands[1], V8SFmode, 1); + DONE; + } +}) + +(define_insn "*rsqrt2" [(set (match_operand:FLASX 0 "register_operand" "=f") (unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")] UNSPEC_LASX_XVFRSQRT))] diff --git a/gcc/config/loongarch/loongarch-protos.h b/gcc/config/loongarch/loongarch-protos.h index cb8fc36b086..f2ff93b5e10 100644 --- a/gcc/config/loongarch/loongarch-protos.h +++ b/gcc/config/loongarch/loongarch-protos.h @@ -220,5 +220,7 @@ extern rtx loongarch_gen_const_int_vector_shuffle (machine_mode, int); extern tree loongarch_build_builtin_va_list (void); extern rtx loongarch_build_signbit_mask (machine_mode, bool, bool); +extern void loongarch_emit_swrsqrtsf (rtx, rtx, machine_mode, bool); +extern void loongarch_emit_swdivsf (rtx, rtx, rtx, machine_mode); extern bool loongarch_explicit_relocs_p (enum loongarch_symbol_type); #endif /* ! GCC_LOONGARCH_PROTOS_H */ diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index 96a4b846f2d..2c06edcff92 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -7547,6 +7547,71 @@ loongarch_option_override_internal (struct gcc_options *opts, /* Function to allocate machine-dependent function status. */ init_machine_status = &loongarch_init_machine_status; + + /* -mrecip options. */ + static struct + { + const char *string; /* option name. */ + unsigned int mask; /* mask bits to set. */ + } + const recip_options[] = { + { "all", RECIP_MASK_ALL }, + { "none", RECIP_MASK_NONE }, + { "div", RECIP_MASK_DIV }, + { "sqrt", RECIP_MASK_SQRT }, + { "rsqrt", RECIP_MASK_RSQRT }, + { "vec-div", RECIP_MASK_VEC_DIV }, + { "vec-sqrt", RECIP_MASK_VEC_SQRT }, + { "vec-rsqrt", RECIP_MASK_VEC_RSQRT }, + }; + + if (loongarch_recip_name) + { + char *p = ASTRDUP (loongarch_recip_name); + char *q; + unsigned int mask, i; + bool invert; + + while ((q = strtok (p, ",")) != NULL) + { + p = NULL; + if (*q == '!') + { + invert = true; + q++; + } + else + invert = false; + + if (!strcmp (q, "default")) + mask = RECIP_MASK_ALL; + else + { + for (i = 0; i < ARRAY_SIZE (recip_options); i++) + if (!strcmp (q, recip_options[i].string)) + { + mask = recip_options[i].mask; + break; + } + + if (i == ARRAY_SIZE (recip_options)) + { + error ("unknown option for %<-mrecip=%s%>", q); + invert = false; + mask = RECIP_MASK_NONE; + } + } + + if (invert) + recip_mask &= ~mask; + else + recip_mask |= mask; + } + } + if (loongarch_recip) + recip_mask |= RECIP_MASK_ALL; + if (!TARGET_FRECIPE) + recip_mask = RECIP_MASK_NONE; } @@ -11470,6 +11535,126 @@ loongarch_build_signbit_mask (machine_mode mode, bool vect, bool invert) return force_reg (vec_mode, v); } +/* Use rsqrte instruction and Newton-Rhapson to compute the approximation of + a single precision floating point [reciprocal] square root. */ + +void loongarch_emit_swrsqrtsf (rtx res, rtx a, machine_mode mode, bool recip) +{ + rtx x0, e0, e1, e2, mhalf, monehalf; + REAL_VALUE_TYPE r; + int unspec; + + x0 = gen_reg_rtx (mode); + e0 = gen_reg_rtx (mode); + e1 = gen_reg_rtx (mode); + e2 = gen_reg_rtx (mode); + + real_arithmetic (&r, ABS_EXPR, &dconsthalf, NULL); + mhalf = const_double_from_real_value (r, SFmode); + + real_arithmetic (&r, PLUS_EXPR, &dconsthalf, &dconst1); + monehalf = const_double_from_real_value (r, SFmode); + unspec = UNSPEC_RSQRTE; + + if (VECTOR_MODE_P (mode)) + { + mhalf = loongarch_build_const_vector (mode, true, mhalf); + monehalf = loongarch_build_const_vector (mode, true, monehalf); + unspec = GET_MODE_SIZE (mode) == 32 ? UNSPEC_LASX_XVFRSQRTE + : UNSPEC_LSX_VFRSQRTE; + } + + /* rsqrt(a) = rsqrte(a) * (1.5 - 0.5 * a * rsqrte(a) * rsqrte(a)) + sqrt(a) = a * rsqrte(a) * (1.5 - 0.5 * a * rsqrte(a) * rsqrte(a)) */ + + a = force_reg (mode, a); + + /* x0 = rsqrt(a) estimate. */ + emit_insn (gen_rtx_SET (x0, gen_rtx_UNSPEC (mode, gen_rtvec (1, a), + unspec))); + + /* If (a == 0.0) Filter out infinity to prevent NaN for sqrt(0.0). */ + if (!recip) + { + rtx zero = force_reg (mode, CONST0_RTX (mode)); + + if (VECTOR_MODE_P (mode)) + { + machine_mode imode = related_int_vector_mode (mode).require (); + rtx mask = gen_reg_rtx (imode); + emit_insn (gen_rtx_SET (mask, gen_rtx_NE (imode, a, zero))); + emit_insn (gen_rtx_SET (x0, gen_rtx_AND (mode, x0, + gen_lowpart (mode, mask)))); + } + else + { + rtx target = emit_conditional_move (x0, { GT, a, zero, mode }, + x0, zero, mode, 0); + if (target != x0) + emit_move_insn (x0, target); + } + } + + /* e0 = x0 * a */ + emit_insn (gen_rtx_SET (e0, gen_rtx_MULT (mode, x0, a))); + /* e1 = e0 * x0 */ + emit_insn (gen_rtx_SET (e1, gen_rtx_MULT (mode, e0, x0))); + + /* e2 = 1.5 - e1 * 0.5 */ + mhalf = force_reg (mode, mhalf); + monehalf = force_reg (mode, monehalf); + emit_insn (gen_rtx_SET (e2, gen_rtx_FMA (mode, + gen_rtx_NEG (mode, e1), + mhalf, monehalf))); + + if (recip) + /* res = e2 * x0 */ + emit_insn (gen_rtx_SET (res, gen_rtx_MULT (mode, x0, e2))); + else + /* res = e2 * e0 */ + emit_insn (gen_rtx_SET (res, gen_rtx_MULT (mode, e2, e0))); +} + +/* Use recipe instruction and Newton-Rhapson to compute the approximation of + a single precision floating point divide. */ + +void loongarch_emit_swdivsf (rtx res, rtx a, rtx b, machine_mode mode) +{ + rtx x0, e0, mtwo; + REAL_VALUE_TYPE r; + x0 = gen_reg_rtx (mode); + e0 = gen_reg_rtx (mode); + int unspec = UNSPEC_RECIPE; + + real_arithmetic (&r, ABS_EXPR, &dconst2, NULL); + mtwo = const_double_from_real_value (r, SFmode); + + if (VECTOR_MODE_P (mode)) + { + mtwo = loongarch_build_const_vector (mode, true, mtwo); + unspec = GET_MODE_SIZE (mode) == 32 ? UNSPEC_LASX_XVFRECIPE + : UNSPEC_LSX_VFRECIPE; + } + + mtwo = force_reg (mode, mtwo); + + /* a / b = a * recipe(b) * (2.0 - b * recipe(b)) */ + + /* x0 = 1./b estimate. */ + emit_insn (gen_rtx_SET (x0, gen_rtx_UNSPEC (mode, gen_rtvec (1, b), + unspec))); + /* 2.0 - b * x0 */ + emit_insn (gen_rtx_SET (e0, gen_rtx_FMA (mode, + gen_rtx_NEG (mode, b), x0, mtwo))); + + /* x0 = a * x0 */ + if (a != CONST1_RTX (mode)) + emit_insn (gen_rtx_SET (x0, gen_rtx_MULT (mode, a, x0))); + + /* res = e0 * x0 */ + emit_insn (gen_rtx_SET (res, gen_rtx_MULT (mode, e0, x0))); +} + static bool loongarch_builtin_support_vector_misalignment (machine_mode mode, const_tree type, @@ -11665,6 +11850,9 @@ loongarch_asm_code_end (void) #define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES \ loongarch_autovectorize_vector_modes +#undef TARGET_OPTAB_SUPPORTED_P +#define TARGET_OPTAB_SUPPORTED_P loongarch_optab_supported_p + #undef TARGET_INIT_BUILTINS #define TARGET_INIT_BUILTINS loongarch_init_builtins #undef TARGET_BUILTIN_DECL diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h index fa8a3f5582f..f1350b6048f 100644 --- a/gcc/config/loongarch/loongarch.h +++ b/gcc/config/loongarch/loongarch.h @@ -702,6 +702,24 @@ enum reg_class && (GET_MODE_CLASS (MODE) == MODE_VECTOR_INT \ || GET_MODE_CLASS (MODE) == MODE_VECTOR_FLOAT)) +#define RECIP_MASK_NONE 0x00 +#define RECIP_MASK_DIV 0x01 +#define RECIP_MASK_SQRT 0x02 +#define RECIP_MASK_RSQRT 0x04 +#define RECIP_MASK_VEC_DIV 0x08 +#define RECIP_MASK_VEC_SQRT 0x10 +#define RECIP_MASK_VEC_RSQRT 0x20 +#define RECIP_MASK_ALL (RECIP_MASK_DIV | RECIP_MASK_SQRT \ + | RECIP_MASK_RSQRT | RECIP_MASK_VEC_SQRT \ + | RECIP_MASK_VEC_DIV | RECIP_MASK_VEC_RSQRT) + +#define TARGET_RECIP_DIV ((recip_mask & RECIP_MASK_DIV) != 0 || TARGET_uARCH_LA664) +#define TARGET_RECIP_SQRT ((recip_mask & RECIP_MASK_SQRT) != 0 || TARGET_uARCH_LA664) +#define TARGET_RECIP_RSQRT ((recip_mask & RECIP_MASK_RSQRT) != 0 || TARGET_uARCH_LA664) +#define TARGET_RECIP_VEC_DIV ((recip_mask & RECIP_MASK_VEC_DIV) != 0 || TARGET_uARCH_LA664) +#define TARGET_RECIP_VEC_SQRT ((recip_mask & RECIP_MASK_VEC_SQRT) != 0 || TARGET_uARCH_LA664) +#define TARGET_RECIP_VEC_RSQRT ((recip_mask & RECIP_MASK_VEC_RSQRT) != 0 || TARGET_uARCH_LA664) + /* 1 if N is a possible register number for function argument passing. We have no FP argument registers when soft-float. */ diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md index fd154b02e48..1a10b809e3c 100644 --- a/gcc/config/loongarch/loongarch.md +++ b/gcc/config/loongarch/loongarch.md @@ -893,9 +893,21 @@ (define_peephole ;; Float division and modulus. (define_expand "div3" [(set (match_operand:ANYF 0 "register_operand") - (div:ANYF (match_operand:ANYF 1 "reg_or_1_operand") - (match_operand:ANYF 2 "register_operand")))] - "") + (div:ANYF (match_operand:ANYF 1 "reg_or_1_operand") + (match_operand:ANYF 2 "register_operand")))] + "" +{ + if (mode == SFmode + && TARGET_RECIP_DIV + && optimize_insn_for_speed_p () + && flag_finite_math_only && !flag_trapping_math + && flag_unsafe_math_optimizations) + { + loongarch_emit_swdivsf (operands[0], operands[1], + operands[2], SFmode); + DONE; + } +}) (define_insn "*div3" [(set (match_operand:ANYF 0 "register_operand" "=f") @@ -1126,7 +1138,23 @@ (define_insn "*fnma4" ;; ;; .................... -(define_insn "sqrt2" +(define_expand "sqrt2" + [(set (match_operand:ANYF 0 "register_operand") + (sqrt:ANYF (match_operand:ANYF 1 "register_operand")))] + "" + { + if (mode == SFmode + && TARGET_RECIP_SQRT + && flag_unsafe_math_optimizations + && !optimize_insn_for_size_p () + && flag_finite_math_only && !flag_trapping_math) + { + loongarch_emit_swrsqrtsf (operands[0], operands[1], SFmode, 0); + DONE; + } + }) + +(define_insn "*sqrt2" [(set (match_operand:ANYF 0 "register_operand" "=f") (sqrt:ANYF (match_operand:ANYF 1 "register_operand" "f")))] "" @@ -1135,6 +1163,19 @@ (define_insn "sqrt2" (set_attr "mode" "") (set_attr "insn_count" "1")]) +(define_expand "rsqrt2" + [(set (match_operand:ANYF 0 "register_operand") + (unspec:ANYF [(match_operand:ANYF 1 "register_operand")] + UNSPEC_RSQRT))] + "TARGET_HARD_FLOAT" +{ + if (mode == SFmode && TARGET_RECIP_RSQRT) + { + loongarch_emit_swrsqrtsf (operands[0], operands[1], SFmode, 1); + DONE; + } +}) + (define_insn "*rsqrt2" [(set (match_operand:ANYF 0 "register_operand" "=f") (unspec:ANYF [(match_operand:ANYF 1 "register_operand" "f")] diff --git a/gcc/config/loongarch/loongarch.opt b/gcc/config/loongarch/loongarch.opt index cdd59ae4fcf..61d25130ea9 100644 --- a/gcc/config/loongarch/loongarch.opt +++ b/gcc/config/loongarch/loongarch.opt @@ -31,6 +31,9 @@ config/loongarch/loongarch-opts.h HeaderInclude config/loongarch/loongarch-str.h +TargetVariable +unsigned int recip_mask = 0 + ; ISA related options ;; Base ISA Enum @@ -202,6 +205,14 @@ mexplicit-relocs Target Var(la_opt_explicit_relocs_backward) Init(M_OPT_UNSET) Use %reloc() assembly operators (for backward compatibility). +mrecip +Target RejectNegative Var(loongarch_recip) +Generate approximate reciprocal divide and square root for better throughput. + +mrecip= +Target RejectNegative Joined Var(loongarch_recip_name) +Control generation of reciprocal estimates. + ; The code model option names for -mcmodel. Enum Name(cmodel) Type(int) diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md index 06402e3b353..55810041d39 100644 --- a/gcc/config/loongarch/lsx.md +++ b/gcc/config/loongarch/lsx.md @@ -1083,7 +1083,25 @@ (define_insn "mul3" [(set_attr "type" "simd_fmul") (set_attr "mode" "")]) -(define_insn "div3" +(define_expand "div3" + [(set (match_operand:FLSX 0 "register_operand") + (div:FLSX (match_operand:FLSX 1 "reg_or_vecotr_1_operand") + (match_operand:FLSX 2 "register_operand")))] + "ISA_HAS_LSX" +{ + if (mode == V4SFmode + && TARGET_RECIP_VEC_DIV + && optimize_insn_for_speed_p () + && flag_finite_math_only && !flag_trapping_math + && flag_unsafe_math_optimizations) + { + loongarch_emit_swdivsf (operands[0], operands[1], + operands[2], V4SFmode); + DONE; + } +}) + +(define_insn "*div3" [(set (match_operand:FLSX 0 "register_operand" "=f") (div:FLSX (match_operand:FLSX 1 "register_operand" "f") (match_operand:FLSX 2 "register_operand" "f")))] @@ -1112,7 +1130,23 @@ (define_insn "fnma4" [(set_attr "type" "simd_fmadd") (set_attr "mode" "")]) -(define_insn "sqrt2" +(define_expand "sqrt2" + [(set (match_operand:FLSX 0 "register_operand") + (sqrt:FLSX (match_operand:FLSX 1 "register_operand")))] + "ISA_HAS_LSX" +{ + if (mode == V4SFmode + && TARGET_RECIP_VEC_SQRT + && flag_unsafe_math_optimizations + && optimize_insn_for_speed_p () + && flag_finite_math_only && !flag_trapping_math) + { + loongarch_emit_swrsqrtsf (operands[0], operands[1], V4SFmode, 0); + DONE; + } +}) + +(define_insn "*sqrt2" [(set (match_operand:FLSX 0 "register_operand" "=f") (sqrt:FLSX (match_operand:FLSX 1 "register_operand" "f")))] "ISA_HAS_LSX" @@ -1559,7 +1593,20 @@ (define_insn "lsx_vfrecipe_" [(set_attr "type" "simd_fdiv") (set_attr "mode" "")]) -(define_insn "rsqrt2" +(define_expand "rsqrt2" + [(set (match_operand:FLSX 0 "register_operand" "=f") + (unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")] + UNSPEC_LSX_VFRSQRT))] + "ISA_HAS_LSX" +{ + if (mode == V4SFmode && TARGET_RECIP_VEC_RSQRT) + { + loongarch_emit_swrsqrtsf (operands[0], operands[1], V4SFmode, 1); + DONE; + } +}) + +(define_insn "*rsqrt2" [(set (match_operand:FLSX 0 "register_operand" "=f") (unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")] UNSPEC_LSX_VFRSQRT))] diff --git a/gcc/config/loongarch/predicates.md b/gcc/config/loongarch/predicates.md index f7796da10b2..9e9ce58cb53 100644 --- a/gcc/config/loongarch/predicates.md +++ b/gcc/config/loongarch/predicates.md @@ -235,6 +235,10 @@ (define_predicate "reg_or_1_operand" (ior (match_operand 0 "const_1_operand") (match_operand 0 "register_operand"))) +(define_predicate "reg_or_vecotr_1_operand" + (ior (match_operand 0 "const_vector_1_operand") + (match_operand 0 "register_operand"))) + ;; These are used in vec_merge, hence accept bitmask as const_int. (define_predicate "const_exp_2_operand" (and (match_code "const_int") diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 6fe63b5f999..bb83edbcb8f 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -1051,6 +1051,7 @@ Objective-C and Objective-C++ Dialects}. -mexplicit-relocs=@var{style} -mexplicit-relocs -mno-explicit-relocs -mdirect-extern-access -mno-direct-extern-access -mcmodel=@var{code-model} -mrelax -mpass-mrelax-to-as} +-mrecip -mrecip=@var{opt} @emph{M32R/D Options} @gccoptlist{-m32r2 -m32rx -m32r @@ -26598,6 +26599,59 @@ detecting corresponding assembler support: This option is mostly useful for debugging, or interoperation with assemblers different from the build-time one. +@opindex mrecip +@item -mrecip +This option enables use of the reciprocal estimate and reciprocal square +root estimate instructions with additional Newton-Raphson steps to increase +precision instead of doing a divide or square root and divide for +floating-point arguments. +These instructions are generated only when @option{-funsafe-math-optimizations} +is enabled together with @option{-ffinite-math-only} and +@option{-fno-trapping-math}. +This option is off by default. Before you can use this option, you must sure the +target CPU supports frecipe and frsqrte instructions. +Note that while the throughput of the sequence is higher than the throughput of +the non-reciprocal instruction, the precision of the sequence can be decreased +by up to 2 ulp (i.e. the inverse of 1.0 equals 0.99999994). + +@opindex mrecip=opt +@item -mrecip=@var{opt} +This option controls which reciprocal estimate instructions +may be used. @var{opt} is a comma-separated list of options, which may +be preceded by a @samp{!} to invert the option: + +@table @samp +@item all +Enable all estimate instructions. + +@item default +Enable the default instructions, equivalent to @option{-mrecip}. + +@item none +Disable all estimate instructions, equivalent to @option{-mno-recip}. + +@item div +Enable the approximation for scalar division. + +@item vec-div +Enable the approximation for vectorized division. + +@item sqrt +Enable the approximation for scalar square root. + +@item vec-sqrt +Enable the approximation for vectorized square root. + +@item rsqrt +Enable the approximation for scalar reciprocal square root. + +@item vec-rsqrt +Enable the approximation for vectorized reciprocal square root. +@end table + +So, for example, @option{-mrecip=all,!sqrt} enables +all of the reciprocal approximations, except for scalar square root. + @item loongarch-vect-unroll-limit The vectorizer will use available tuning information to determine whether it would be beneficial to unroll the main vectorized loop and by how much. This diff --git a/gcc/testsuite/gcc.target/loongarch/divf.c b/gcc/testsuite/gcc.target/loongarch/divf.c new file mode 100644 index 00000000000..6c831817c9e --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/divf.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ffast-math -mrecip -mfrecipe -fno-unsafe-math-optimizations" } */ +/* { dg-final { scan-assembler "fdiv.s" } } */ +/* { dg-final { scan-assembler-not "frecipe.s" } } */ + +float +foo(float a, float b) +{ + return a / b; +} diff --git a/gcc/testsuite/gcc.target/loongarch/recip-divf.c b/gcc/testsuite/gcc.target/loongarch/recip-divf.c new file mode 100644 index 00000000000..db5e3e48888 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/recip-divf.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ffast-math -mrecip -mfrecipe" } */ +/* { dg-final { scan-assembler "frecipe.s" } } */ + +float +foo(float a, float b) +{ + return a / b; +} diff --git a/gcc/testsuite/gcc.target/loongarch/recip-sqrtf.c b/gcc/testsuite/gcc.target/loongarch/recip-sqrtf.c new file mode 100644 index 00000000000..7f45db6cdea --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/recip-sqrtf.c @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ffast-math -mrecip -mfrecipe" } */ +/* { dg-final { scan-assembler-times "frsqrte.s" 3 } } */ + +extern float sqrtf (float); + +float +foo1 (float a, float b) +{ + return a/sqrtf(b); +} + +float +foo2 (float a, float b) +{ + return sqrtf(a/b); +} + +float +foo3 (float a) +{ + return sqrtf(a); +} diff --git a/gcc/testsuite/gcc.target/loongarch/sqrtf.c b/gcc/testsuite/gcc.target/loongarch/sqrtf.c new file mode 100644 index 00000000000..c2720faac7b --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/sqrtf.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ffast-math -mrecip -mfrecipe -fno-unsafe-math-optimizations" } */ +/* { dg-final { scan-assembler-times "fsqrt.s" 3 } } */ +/* { dg-final { scan-assembler-not "frsqrte.s" } } */ + +extern float sqrtf (float); + +float +foo1 (float a, float b) +{ + return a/sqrtf(b); +} + +float +foo2 (float a, float b) +{ + return sqrtf(a/b); +} + +float +foo3 (float a) +{ + return sqrtf(a); +} diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-divf.c b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-divf.c new file mode 100644 index 00000000000..748a82200d9 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-divf.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mrecip -mlasx -mfrecipe -fno-unsafe-math-optimizations" } */ +/* { dg-final { scan-assembler "xvfdiv.s" } } */ +/* { dg-final { scan-assembler-not "xvfrecipe.s" } } */ + +float a[8],b[8],c[8]; + +void +foo () +{ + for (int i = 0; i < 8; i++) + c[i] = a[i] / b[i]; +} diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip-divf.c b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip-divf.c new file mode 100644 index 00000000000..6532756f07d --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip-divf.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ffast-math -mrecip -mlasx -mfrecipe" } */ +/* { dg-final { scan-assembler "xvfrecipe.s" } } */ + +float a[8],b[8],c[8]; + +void +foo () +{ + for (int i = 0; i < 8; i++) + c[i] = a[i] / b[i]; +} diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip-sqrtf.c b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip-sqrtf.c new file mode 100644 index 00000000000..a623dff8f27 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip-sqrtf.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ffast-math -mrecip -mlasx -mfrecipe" } */ +/* { dg-final { scan-assembler-times "xvfrsqrte.s" 3 } } */ + +float a[8], b[8], c[8]; + +extern float sqrtf (float); + +void +foo1 (void) +{ + for (int i = 0; i < 8; i++) + c[i] = a[i] / sqrtf (b[i]); +} + +void +foo2 (void) +{ + for (int i = 0; i < 8; i++) + c[i] = sqrtf (a[i] / b[i]); +} + +void +foo3 (void) +{ + for (int i = 0; i < 8; i++) + c[i] = sqrtf (a[i]); +} diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip.c b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip.c new file mode 100644 index 00000000000..083c868406b --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mlasx -fno-vect-cost-model" } */ +/* { dg-final { scan-assembler "xvfrecip.s" } } */ +/* { dg-final { scan-assembler "xvfrecip.d" } } */ +/* { dg-final { scan-assembler-not "xvfdiv.s" } } */ +/* { dg-final { scan-assembler-not "xvfdiv.d" } } */ + +float a[8], b[8]; + +void +foo1(void) +{ + for (int i = 0; i < 8; i++) + a[i] = 1 / (b[i]); +} + +double da[4], db[4]; + +void +foo2(void) +{ + for (int i = 0; i < 4; i++) + da[i] = 1 / (db[i]); +} diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-sqrtf.c b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-sqrtf.c new file mode 100644 index 00000000000..a005a38865d --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-sqrtf.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ffast-math -fno-unsafe-math-optimizations -mrecip -mlasx -mfrecipe" } */ +/* { dg-final { scan-assembler-times "xvfsqrt.s" 3 } } */ +/* { dg-final { scan-assembler-not "xvfrsqrte.s" } } */ + +float a[8], b[8], c[8]; + +extern float sqrtf (float); + +void +foo1 (void) +{ + for (int i = 0; i < 8; i++) + c[i] = a[i] / sqrtf (b[i]); +} + +void +foo2 (void) +{ + for (int i = 0; i < 8; i++) + c[i] = sqrtf (a[i] / b[i]); +} + +void +foo3 (void) +{ + for (int i = 0; i < 8; i++) + c[i] = sqrtf (a[i]); +} diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-divf.c b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-divf.c new file mode 100644 index 00000000000..1219b1ef842 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-divf.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ffast-math -mrecip -mlsx -mfrecipe -fno-unsafe-math-optimizations" } */ +/* { dg-final { scan-assembler "vfdiv.s" } } */ +/* { dg-final { scan-assembler-not "vfrecipe.s" } } */ + +float a[4],b[4],c[4]; + +void +foo () +{ + for (int i = 0; i < 4; i++) + c[i] = a[i] / b[i]; +} diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip-divf.c b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip-divf.c new file mode 100644 index 00000000000..edbe8d9098f --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip-divf.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ffast-math -mrecip -mlsx -mfrecipe" } */ +/* { dg-final { scan-assembler "vfrecipe.s" } } */ + +float a[4],b[4],c[4]; + +void +foo () +{ + for (int i = 0; i < 4; i++) + c[i] = a[i] / b[i]; +} diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip-sqrtf.c b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip-sqrtf.c new file mode 100644 index 00000000000..d356f915eb5 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip-sqrtf.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ffast-math -mrecip -mlsx -mfrecipe" } */ +/* { dg-final { scan-assembler-times "vfrsqrte.s" 3 } } */ + +float a[4], b[4], c[4]; + +extern float sqrtf (float); + +void +foo1 (void) +{ + for (int i = 0; i < 4; i++) + c[i] = a[i] / sqrtf (b[i]); +} + +void +foo2 (void) +{ + for (int i = 0; i < 4; i++) + c[i] = sqrtf (a[i] / b[i]); +} + +void +foo3 (void) +{ + for (int i = 0; i < 4; i++) + c[i] = sqrtf (a[i]); +} diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip.c b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip.c new file mode 100644 index 00000000000..c4d6af4db93 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mlsx -fno-vect-cost-model" } */ +/* { dg-final { scan-assembler "vfrecip.s" } } */ +/* { dg-final { scan-assembler "vfrecip.d" } } */ +/* { dg-final { scan-assembler-not "vfdiv.s" } } */ +/* { dg-final { scan-assembler-not "vfdiv.d" } } */ + +float a[4], b[4]; + +void +foo1(void) +{ + for (int i = 0; i < 4; i++) + a[i] = 1 / (b[i]); +} + +double da[2], db[2]; + +void +foo2(void) +{ + for (int i = 0; i < 2; i++) + da[i] = 1 / (db[i]); +} diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-sqrtf.c b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-sqrtf.c new file mode 100644 index 00000000000..3ff6570a67a --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-sqrtf.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ffast-math -mrecip -mlsx -mfrecipe -fno-unsafe-math-optimizations" } */ +/* { dg-final { scan-assembler-times "vfsqrt.s" 3 } } */ +/* { dg-final { scan-assembler-not "vfrsqrte.s" } } */ + +float a[4], b[4], c[4]; + +extern float sqrtf (float); + +void +foo1 (void) +{ + for (int i = 0; i < 4; i++) + c[i] = a[i] / sqrtf (b[i]); +} + +void +foo2 (void) +{ + for (int i = 0; i < 4; i++) + c[i] = sqrtf (a[i] / b[i]); +} + +void +foo3 (void) +{ + for (int i = 0; i < 4; i++) + c[i] = sqrtf (a[i]); +} From patchwork Wed Dec 6 07:04:53 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jiahao Xu X-Patchwork-Id: 1872471 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4SlT2S2pD0z23mf for ; Wed, 6 Dec 2023 18:05:52 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id C54F4386C5AD for ; Wed, 6 Dec 2023 07:05:49 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail.loongson.cn (mail.loongson.cn [114.242.206.163]) by sourceware.org (Postfix) with ESMTP id D6DE338449F5 for ; Wed, 6 Dec 2023 07:05:13 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org D6DE338449F5 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=loongson.cn Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=loongson.cn ARC-Filter: OpenARC Filter v1.0.0 sourceware.org D6DE338449F5 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=114.242.206.163 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701846316; cv=none; b=grI5V88W/9hi2dyksf1KupjxdfscchPN15mviU2AxMId3SEgRF3pDHVLecpbepLa7fMk5dWlYqWbAFYotPRQg9sj7p3PA9RPXh+e1snx7LeoLBBcBNzJQFIpSsyu3JLWKdcJXe/NrBxQOAkWTGngWOOzMGsy4eiVLB9HAmt26Hw= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701846316; c=relaxed/simple; bh=gtvKEAUbYRjuSptFnR0xIoa60twLkfCjw3OJJ2hx9L4=; h=From:To:Subject:Date:Message-Id:MIME-Version; b=ro6WydDUyYvEU/2M+xnjJn4RonPKRg/M1KBEYmuLtCGkwHiJuFFl+Gut46FMm4SFg6DZJjyTZ6d5YGBrhUJbyWObTvilhF1s7suepowvV8RZ3xaEG99xq/2HT+xuUv1QuSzDWgiCjLmuNBMjYmzbZdv905HldrdIhM87P+uOa4Y= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from loongson.cn (unknown [10.10.130.252]) by gateway (Coremail) with SMTP id _____8Cxc_AoHXBlUDs_AA--.60467S3; Wed, 06 Dec 2023 15:05:12 +0800 (CST) Received: from slurm-master.loongson.cn (unknown [10.10.130.252]) by localhost.localdomain (Coremail) with SMTP id AQAAf8Dxvi8XHXBlp0BWAA--.59594S9; Wed, 06 Dec 2023 15:05:11 +0800 (CST) From: Jiahao Xu To: gcc-patches@gcc.gnu.org Cc: xry111@xry111.site, i@xen0n.name, chenglulu@loongson.cn, xuchenghua@loongson.cn, Jiahao Xu Subject: [PATCH v3 5/5] LoongArch: Vectorized loop unrolling is disable for divf/sqrtf/rsqrtf when -mrecip is enabled. Date: Wed, 6 Dec 2023 15:04:53 +0800 Message-Id: <20231206070453.3252-6-xujiahao@loongson.cn> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20231206070453.3252-1-xujiahao@loongson.cn> References: <20231206070453.3252-1-xujiahao@loongson.cn> MIME-Version: 1.0 X-CM-TRANSID: AQAAf8Dxvi8XHXBlp0BWAA--.59594S9 X-CM-SenderInfo: 50xmxthkdrqz5rrqw2lrqou0/ X-Coremail-Antispam: 1Uk129KBj93XoW7KrWUAFW7trWfZryrCry7twc_yoW8tFyUpr ZIyr13tw4DJr47WrsrJ3yxWw1ayr9xGF42qa13ta4fCa17Kr1Fq3WkKr1qvFZrX3y5WryI vr1IqFs8Za45CwbCm3ZEXasCq-sJn29KB7ZKAUJUUUU5529EdanIXcx71UUUUU7KY7ZEXa sCq-sGcSsGvfJ3Ic02F40EFcxC0VAKzVAqx4xG6I80ebIjqfuFe4nvWSU5nxnvy29KBjDU 0xBIdaVrnRJUUUk2b4IE77IF4wAFF20E14v26r1j6r4UM7CY07I20VC2zVCF04k26cxKx2 IYs7xG6rWj6s0DM7CIcVAFz4kK6r106r15M28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48v e4kI8wA2z4x0Y4vE2Ix0cI8IcVAFwI0_Xr0_Ar1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI 0_Gr0_Cr1l84ACjcxK6I8E87Iv67AKxVW8Jr0_Cr1UM28EF7xvwVC2z280aVCY1x0267AK xVW8Jr0_Cr1UM2AIxVAIcxkEcVAq07x20xvEncxIr21l57IF6xkI12xvs2x26I8E6xACxx 1l5I8CrVACY4xI64kE6c02F40Ex7xfMcIj6xIIjxv20xvE14v26r1q6rW5McIj6I8E87Iv 67AKxVW8JVWxJwAm72CE4IkC6x0Yz7v_Jr0_Gr1lF7xvr2IYc2Ij64vIr41l42xK82IYc2 Ij64vIr41l4I8I3I0E4IkC6x0Yz7v_Jr0_Gr1lx2IqxVAqx4xG67AKxVWUJVWUGwC20s02 6x8GjcxK67AKxVWUGVWUWwC2zVAF1VAY17CE14v26r126r1DMIIYrxkI7VAKI48JMIIF0x vE2Ix0cI8IcVAFwI0_Gr0_Xr1lIxAIcVC0I7IYx2IY6xkF7I0E14v26r4j6F4UMIIF0xvE 42xK8VAvwI8IcIk0rVWUJVWUCwCI42IY6I8E87Iv67AKxVW8JVWxJwCI42IY6I8E87Iv6x kF7I0E14v26r4j6r4UJbIYCTnIWIevJa73UjIFyTuYvjxUcHUqUUUUU X-Spam-Status: No, score=-13.1 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Using -mrecip generates a sequence of instructions to replace divf, sqrtf and rsqrtf. The number of generated instructions is close to or exceeds the maximum issue instructions per cycle of the LoongArch, so vectorized loop unrolling is not performed on them. gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_vector_costs::determine_suggested_unroll_factor): If m_has_recip is true, uf return 1. (loongarch_vector_costs::add_stmt_cost): Detect the use of approximate instruction sequence. diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index 2c06edcff92..0ca60e15ced 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -3974,7 +3974,9 @@ protected: /* Reduction factor for suggesting unroll factor. */ unsigned m_reduc_factor = 0; /* True if the loop contains an average operation. */ - bool m_has_avg =false; + bool m_has_avg = false; + /* True if the loop uses approximation instruction sequence. */ + bool m_has_recip = false; }; /* Implement TARGET_VECTORIZE_CREATE_COSTS. */ @@ -4021,7 +4023,7 @@ loongarch_vector_costs::determine_suggested_unroll_factor (loop_vec_info loop_vi { class loop *loop = LOOP_VINFO_LOOP (loop_vinfo); - if (m_has_avg) + if (m_has_avg || m_has_recip) return 1; /* Don't unroll if it's specified explicitly not to be unrolled. */ @@ -4081,6 +4083,36 @@ loongarch_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind, } } + combined_fn cfn; + if (kind == vector_stmt + && stmt_info + && stmt_info->stmt) + { + /* Detect the use of approximate instruction sequence. */ + if ((TARGET_RECIP_VEC_SQRT || TARGET_RECIP_VEC_RSQRT) + && (cfn = gimple_call_combined_fn (stmt_info->stmt)) != CFN_LAST) + switch (cfn) + { + case CFN_BUILT_IN_SQRTF: + m_has_recip = true; + default: + break; + } + else if (TARGET_RECIP_VEC_DIV + && gimple_code (stmt_info->stmt) == GIMPLE_ASSIGN) + { + machine_mode mode = TYPE_MODE (vectype); + switch (gimple_assign_rhs_code (stmt_info->stmt)) + { + case RDIV_EXPR: + if (GET_MODE_INNER (mode) == SFmode) + m_has_recip = true; + default: + break; + } + } + } + return retval; }