From patchwork Tue Dec 5 07:01:43 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jiahao Xu X-Patchwork-Id: 1871847 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4Sks1H5Q9yz1ySd for ; Tue, 5 Dec 2023 18:02:43 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 2FC6138312E3 for ; Tue, 5 Dec 2023 07:02:37 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail.loongson.cn (mail.loongson.cn [114.242.206.163]) by sourceware.org (Postfix) with ESMTP id 334FD385828B for ; Tue, 5 Dec 2023 07:02:05 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 334FD385828B Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=loongson.cn Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=loongson.cn ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 334FD385828B Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=114.242.206.163 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701759731; cv=none; b=R8vUmg3D/37C7sPKUoEPKdD8YsmeCHUUSg0gXse5yX9riDL4kGc1Q7dFRDIvjHMbNsUJXBsfBRFS2vWsyyxhw2EHyFHFQ/K6iY1Pc+hEb+URo61SIHa0cLl8eGwDcVIOA45wV0FDYBAfqW/j2fKEp1SEsWZZataOeyoj+pZWXDo= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701759731; c=relaxed/simple; bh=z79VbZQ9GHMmw36z6Al78OsEoIcbADqlqL2KfJMvLDM=; h=From:To:Subject:Date:Message-Id:MIME-Version; b=tKxIADY6JiYDmKni1O8dtxabH+iDe40Su0E7k8zXSGn8pSpq9Xw583+VWzvEY9GeszushpsD3H7fyvuTw2tRud9HO1ZuLZ5jyugctAxFxpmeTxNTCLFHN3/+KvNBEt5wG6bYmnQVcSXZHbAsAVXrYJjrUOgx5D+Sr8AyRuaP1Zg= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from loongson.cn (unknown [10.10.130.252]) by gateway (Coremail) with SMTP id _____8BxNujpym5lx_g+AA--.6551S3; Tue, 05 Dec 2023 15:02:01 +0800 (CST) Received: from slurm-master.loongson.cn (unknown [10.10.130.252]) by localhost.localdomain (Coremail) with SMTP id AQAAf8Cxvdzeym5ljDlVAA--.57842S5; Tue, 05 Dec 2023 15:01:58 +0800 (CST) From: Jiahao Xu To: gcc-patches@gcc.gnu.org Cc: xry111@xry111.site, i@xen0n.name, chenglulu@loongson.cn, xuchenghua@loongson.cn, Jiahao Xu Subject: [PATCH v2 1/5] LoongArch: Add support for LoongArch V1.1 approximate instructions. Date: Tue, 5 Dec 2023 15:01:43 +0800 Message-Id: <20231205070147.53352-2-xujiahao@loongson.cn> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20231205070147.53352-1-xujiahao@loongson.cn> References: <20231205070147.53352-1-xujiahao@loongson.cn> MIME-Version: 1.0 X-CM-TRANSID: AQAAf8Cxvdzeym5ljDlVAA--.57842S5 X-CM-SenderInfo: 50xmxthkdrqz5rrqw2lrqou0/ X-Coremail-Antispam: 1Uk129KBj9fXoWfur1UXFW8tr15JF4ruFyrZrc_yoW5XFyDGo WrCF4DJa1xGryIyFW5Krn5ZrWjvaySyF4DAay3Zws5Ca1xJ3s0yw17u3WFy342qF4kWrn8 C3s5W3sxXFWxJFs5l-sFpf9Il3svdjkaLaAFLSUrUUUUUb8apTn2vfkv8UJUUUU8wcxFpf 9Il3svdxBIdaVrn0xqx4xG64xvF2IEw4CE5I8CrVC2j2Jv73VFW2AGmfu7bjvjm3AaLaJ3 UjIYCTnIWjp_UUUY17kC6x804xWl14x267AKxVWUJVW8JwAFc2x0x2IEx4CE42xK8VAvwI 8IcIk0rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2ocxC64kIII0Yj41l84x0c7CEw4AK67xG Y2AK021l84ACjcxK6xIIjxv20xvE14v26r1j6r1xM28EF7xvwVC0I7IYx2IY6xkF7I0E14 v26r1j6r4UM28EF7xvwVC2z280aVAFwI0_Gr1j6F4UJwA2z4x0Y4vEx4A2jsIEc7CjxVAF wI0_Gr1j6F4UJwAS0I0E0xvYzxvE52x082IY62kv0487Mc804VCY07AIYIkI8VC2zVCFFI 0UMc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7IYx2IY67AKxVWUXVWUAwAv7VC2z280 aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4UM4x0Y48IcxkI7VAKI48JMxAIw28Icx kI7VAKI48JMxC20s026xCaFVCjc4AY6r1j6r4UMI8I3I0E5I8CrVAFwI0_Jr0_Jr4lx2Iq xVCjr7xvwVAFwI0_JrI_JrWlx4CE17CEb7AF67AKxVWUAVWUtwCIc40Y0x0EwIxGrwCI42 IY6xIIjxv20xvE14v26r1j6r1xMIIF0xvE2Ix0cI8IcVCY1x0267AKxVWUJVW8JwCI42IY 6xAIw20EY4v20xvaj40_Jr0_JF4lIxAIcVC2z280aVAFwI0_Jr0_Gr1lIxAIcVC2z280aV CY1x0267AKxVWUJVW8JbIYCTnIWIevJa73UjIFyTuYvjxUwmhFDUUUU X-Spam-Status: No, score=-13.2 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, KAM_SHORT, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org This patch adds define_insn/builtins/intrinsics for these instructions, and add option -mfrecipe to control instruction generation. gcc/ChangeLog: * config/loongarch/genopts/isa-evolution.in (fecipe): Add. * config/loongarch/larchintrin.h (__frecipe_s): New intrinsic. (__frecipe_d): Ditto. (__frsqrte_s): Ditto. (__frsqrte_d): Ditto. * config/loongarch/lasx.md (lasx_xvfrecipe_): New insn pattern. (lasx_xvfrsqrte_): Ditto. * config/loongarch/lasxintrin.h (__lasx_xvfrecipe_s): New intrinsic. (__lasx_xvfrecipe_d): Ditto. (__lasx_xvfrsqrte_s): Ditto. (__lasx_xvfrsqrte_d): Ditto. * config/loongarch/loongarch-builtins.cc (AVAIL_ALL): Add predicates. (LSX_EXT_BUILTIN): New macro. (LASX_EXT_BUILTIN): Ditto. * config/loongarch/loongarch-cpucfg-map.h: Regenerate. * config/loongarch/loongarch-str.h (OPTSTR_FRECIPE): Regenerate. * config/loongarch/loongarch.cc (loongarch_asm_code_end): Dump status for TARGET_FRECIPE. * config/loongarch/loongarch.md (loongarch_frecipe_): New insn pattern. (loongarch_frsqrte_): Ditto. * config/loongarch/loongarch.opt: Regenerate. * config/loongarch/lsx.md (lsx_vfrecipe_): New insn pattern. (lsx_vfrsqrte_): Ditto. * config/loongarch/lsxintrin.h (__lsx_vfrecipe_s): New intrinsic. (__lsx_vfrecipe_d): Ditto. (__lsx_vfrsqrte_s): Ditto. (__lsx_vfrsqrte_d): Ditto. * doc/extend.texi: Add documentation for LoongArch new builtins and intrinsics. gcc/testsuite/ChangeLog: * gcc.target/loongarch/larch-frecipe-builtin.c: New test. * gcc.target/loongarch/vector/lasx/lasx-frecipe-builtin.c: New test. * gcc.target/loongarch/vector/lsx/lsx-frecipe-builtin.c: New test. diff --git a/gcc/config/loongarch/genopts/isa-evolution.in b/gcc/config/loongarch/genopts/isa-evolution.in index a6bc3f87f20..11a198b649f 100644 --- a/gcc/config/loongarch/genopts/isa-evolution.in +++ b/gcc/config/loongarch/genopts/isa-evolution.in @@ -1,3 +1,4 @@ +2 25 frecipe Support frecipe.{s/d} and frsqrte.{s/d} instructions. 2 26 div32 Support div.w[u] and mod.w[u] instructions with inputs not sign-extended. 2 27 lam-bh Support am{swap/add}[_db].{b/h} instructions. 2 28 lamcas Support amcas[_db].{b/h/w/d} instructions. diff --git a/gcc/config/loongarch/larchintrin.h b/gcc/config/loongarch/larchintrin.h index e571ed27b37..028081cccfb 100644 --- a/gcc/config/loongarch/larchintrin.h +++ b/gcc/config/loongarch/larchintrin.h @@ -333,6 +333,44 @@ __iocsrwr_d (unsigned long int _1, unsigned int _2) } #endif +#if defined(__loongarch64) && defined(TARGET_FRECIPE) +/* Assembly instruction format: fd, fj. */ +/* Data types in instruction templates: SF, SF. */ +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +__frecipe_s (float _1) +{ + __builtin_loongarch_frecipe_s ((float) _1); +} + +/* Assembly instruction format: fd, fj. */ +/* Data types in instruction templates: DF, DF. */ +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +__frecipe_d (double _1) +{ + __builtin_loongarch_frecipe_d ((double) _1); +} + +/* Assembly instruction format: fd, fj. */ +/* Data types in instruction templates: SF, SF. */ +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +__frsqrte_s (float _1) +{ + __builtin_loongarch_frsqrte_s ((float) _1); +} + +/* Assembly instruction format: fd, fj. */ +/* Data types in instruction templates: DF, DF. */ +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +__frsqrte_d (double _1) +{ + __builtin_loongarch_frsqrte_d ((double) _1); +} +#endif + /* Assembly instruction format: ui15. */ /* Data types in instruction templates: USI. */ #define __dbar(/*ui15*/ _1) __builtin_loongarch_dbar ((_1)) diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md index 116b30c0774..f6e5208a6f1 100644 --- a/gcc/config/loongarch/lasx.md +++ b/gcc/config/loongarch/lasx.md @@ -40,8 +40,10 @@ (define_c_enum "unspec" [ UNSPEC_LASX_XVFCVTL UNSPEC_LASX_XVFLOGB UNSPEC_LASX_XVFRECIP + UNSPEC_LASX_XVFRECIPE UNSPEC_LASX_XVFRINT UNSPEC_LASX_XVFRSQRT + UNSPEC_LASX_XVFRSQRTE UNSPEC_LASX_XVFCMP_SAF UNSPEC_LASX_XVFCMP_SEQ UNSPEC_LASX_XVFCMP_SLE @@ -1633,6 +1635,17 @@ (define_insn "lasx_xvfrecip_" [(set_attr "type" "simd_fdiv") (set_attr "mode" "")]) +;; Approximate Reciprocal Instructions. + +(define_insn "lasx_xvfrecipe_" + [(set (match_operand:FLASX 0 "register_operand" "=f") + (unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")] + UNSPEC_LASX_XVFRECIPE))] + "ISA_HAS_LASX && TARGET_FRECIPE" + "xvfrecipe.\t%u0,%u1" + [(set_attr "type" "simd_fdiv") + (set_attr "mode" "")]) + (define_insn "lasx_xvfrsqrt_" [(set (match_operand:FLASX 0 "register_operand" "=f") (unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")] @@ -1642,6 +1655,17 @@ (define_insn "lasx_xvfrsqrt_" [(set_attr "type" "simd_fdiv") (set_attr "mode" "")]) +;; Approximate Reciprocal Square Root Instructions. + +(define_insn "lasx_xvfrsqrte_" + [(set (match_operand:FLASX 0 "register_operand" "=f") + (unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")] + UNSPEC_LASX_XVFRSQRTE))] + "ISA_HAS_LASX && TARGET_FRECIPE" + "xvfrsqrte.\t%u0,%u1" + [(set_attr "type" "simd_fdiv") + (set_attr "mode" "")]) + (define_insn "lasx_xvftint_u__" [(set (match_operand: 0 "register_operand" "=f") (unspec: [(match_operand:FLASX 1 "register_operand" "f")] diff --git a/gcc/config/loongarch/lasxintrin.h b/gcc/config/loongarch/lasxintrin.h index 7bce2c757f1..c76eb6c58ef 100644 --- a/gcc/config/loongarch/lasxintrin.h +++ b/gcc/config/loongarch/lasxintrin.h @@ -2399,6 +2399,40 @@ __m256d __lasx_xvfrecip_d (__m256d _1) return (__m256d)__builtin_lasx_xvfrecip_d ((v4f64)_1); } +#ifdef TARGET_FRECIPE +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V8SF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256 __lasx_xvfrecipe_s (__m256 _1) +{ + return (__m256)__builtin_lasx_xvfrecipe_s ((v8f32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256d __lasx_xvfrecipe_d (__m256d _1) +{ + return (__m256d)__builtin_lasx_xvfrecipe_d ((v4f64)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V8SF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256 __lasx_xvfrsqrte_s (__m256 _1) +{ + return (__m256)__builtin_lasx_xvfrsqrte_s ((v8f32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256d __lasx_xvfrsqrte_d (__m256d _1) +{ + return (__m256d)__builtin_lasx_xvfrsqrte_d ((v4f64)_1); +} +#endif + /* Assembly instruction format: xd, xj. */ /* Data types in instruction templates: V8SF, V8SF. */ extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) diff --git a/gcc/config/loongarch/loongarch-builtins.cc b/gcc/config/loongarch/loongarch-builtins.cc index 5d037ab7f10..bf95a44c0d2 100644 --- a/gcc/config/loongarch/loongarch-builtins.cc +++ b/gcc/config/loongarch/loongarch-builtins.cc @@ -120,6 +120,9 @@ struct loongarch_builtin_description AVAIL_ALL (hard_float, TARGET_HARD_FLOAT_ABI) AVAIL_ALL (lsx, ISA_HAS_LSX) AVAIL_ALL (lasx, ISA_HAS_LASX) +AVAIL_ALL (frecipe, TARGET_FRECIPE) +AVAIL_ALL (lsx_frecipe, ISA_HAS_LSX && TARGET_FRECIPE) +AVAIL_ALL (lasx_frecipe, ISA_HAS_LASX && TARGET_FRECIPE) /* Construct a loongarch_builtin_description from the given arguments. @@ -164,6 +167,15 @@ AVAIL_ALL (lasx, ISA_HAS_LASX) "__builtin_lsx_" #INSN, LARCH_BUILTIN_DIRECT, \ FUNCTION_TYPE, loongarch_builtin_avail_lsx } + /* Define an LSX LARCH_BUILTIN_DIRECT function __builtin_lsx_ + for instruction CODE_FOR_lsx_. FUNCTION_TYPE is a builtin_description + field. AVAIL is the name of the availability predicate, without the leading + loongarch_builtin_avail_. */ +#define LSX_EXT_BUILTIN(INSN, FUNCTION_TYPE, AVAIL) \ + { CODE_FOR_lsx_ ## INSN, \ + "__builtin_lsx_" #INSN, LARCH_BUILTIN_DIRECT, \ + FUNCTION_TYPE, loongarch_builtin_avail_##AVAIL } + /* Define an LSX LARCH_BUILTIN_LSX_TEST_BRANCH function __builtin_lsx_ for instruction CODE_FOR_lsx_. FUNCTION_TYPE is a builtin_description @@ -189,6 +201,15 @@ AVAIL_ALL (lasx, ISA_HAS_LASX) "__builtin_lasx_" #INSN, LARCH_BUILTIN_LASX, \ FUNCTION_TYPE, loongarch_builtin_avail_lasx } +/* Define an LASX LARCH_BUILTIN_DIRECT function __builtin_lasx_ + for instruction CODE_FOR_lasx_. FUNCTION_TYPE is a builtin_description + field. AVAIL is the name of the availability predicate, without the leading + loongarch_builtin_avail_. */ +#define LASX_EXT_BUILTIN(INSN, FUNCTION_TYPE, AVAIL) \ + { CODE_FOR_lasx_ ## INSN, \ + "__builtin_lasx_" #INSN, LARCH_BUILTIN_LASX, \ + FUNCTION_TYPE, loongarch_builtin_avail_##AVAIL } + /* Define an LASX LARCH_BUILTIN_DIRECT_NO_TARGET function __builtin_lasx_ for instruction CODE_FOR_lasx_. FUNCTION_TYPE is a builtin_description field. */ @@ -804,6 +825,27 @@ static const struct loongarch_builtin_description loongarch_builtins[] = { DIRECT_NO_TARGET_BUILTIN (syscall, LARCH_VOID_FTYPE_USI, default), DIRECT_NO_TARGET_BUILTIN (break, LARCH_VOID_FTYPE_USI, default), + /* Built-in functions for frecipe.{s/d} and frsqrte.{s/d}. */ + + DIRECT_BUILTIN (frecipe_s, LARCH_SF_FTYPE_SF, frecipe), + DIRECT_BUILTIN (frecipe_d, LARCH_DF_FTYPE_DF, frecipe), + DIRECT_BUILTIN (frsqrte_s, LARCH_SF_FTYPE_SF, frecipe), + DIRECT_BUILTIN (frsqrte_d, LARCH_DF_FTYPE_DF, frecipe), + + /* Built-in functions for new LSX instructions. */ + + LSX_EXT_BUILTIN (vfrecipe_s, LARCH_V4SF_FTYPE_V4SF, lsx_frecipe), + LSX_EXT_BUILTIN (vfrecipe_d, LARCH_V2DF_FTYPE_V2DF, lsx_frecipe), + LSX_EXT_BUILTIN (vfrsqrte_s, LARCH_V4SF_FTYPE_V4SF, lsx_frecipe), + LSX_EXT_BUILTIN (vfrsqrte_d, LARCH_V2DF_FTYPE_V2DF, lsx_frecipe), + + /* Built-in functions for new LASX instructions. */ + + LASX_EXT_BUILTIN (xvfrecipe_s, LARCH_V8SF_FTYPE_V8SF, lasx_frecipe), + LASX_EXT_BUILTIN (xvfrecipe_d, LARCH_V4DF_FTYPE_V4DF, lasx_frecipe), + LASX_EXT_BUILTIN (xvfrsqrte_s, LARCH_V8SF_FTYPE_V8SF, lasx_frecipe), + LASX_EXT_BUILTIN (xvfrsqrte_d, LARCH_V4DF_FTYPE_V4DF, lasx_frecipe), + /* Built-in functions for LSX. */ LSX_BUILTIN (vsll_b, LARCH_V16QI_FTYPE_V16QI_V16QI), LSX_BUILTIN (vsll_h, LARCH_V8HI_FTYPE_V8HI_V8HI), diff --git a/gcc/config/loongarch/loongarch-cpucfg-map.h b/gcc/config/loongarch/loongarch-cpucfg-map.h index 02ff1671255..148333c249c 100644 --- a/gcc/config/loongarch/loongarch-cpucfg-map.h +++ b/gcc/config/loongarch/loongarch-cpucfg-map.h @@ -29,6 +29,7 @@ static constexpr struct { unsigned int cpucfg_bit; HOST_WIDE_INT isa_evolution_bit; } cpucfg_map[] = { + { 2, 1u << 25, OPTION_MASK_ISA_FRECIPE }, { 2, 1u << 26, OPTION_MASK_ISA_DIV32 }, { 2, 1u << 27, OPTION_MASK_ISA_LAM_BH }, { 2, 1u << 28, OPTION_MASK_ISA_LAMCAS }, diff --git a/gcc/config/loongarch/loongarch-str.h b/gcc/config/loongarch/loongarch-str.h index 0384493765c..236b804766e 100644 --- a/gcc/config/loongarch/loongarch-str.h +++ b/gcc/config/loongarch/loongarch-str.h @@ -69,6 +69,7 @@ along with GCC; see the file COPYING3. If not see #define STR_EXPLICIT_RELOCS_NONE "none" #define STR_EXPLICIT_RELOCS_ALWAYS "always" +#define OPTSTR_FRECIPE "frecipe" #define OPTSTR_DIV32 "div32" #define OPTSTR_LAM_BH "lam-bh" #define OPTSTR_LAMCAS "lamcas" diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index 3545e66a10e..57a20bec8a4 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -11503,6 +11503,7 @@ loongarch_asm_code_end (void) loongarch_cpu_strings [la_target.cpu_tune]); fprintf (asm_out_file, "%s Base ISA: %s\n", ASM_COMMENT_START, loongarch_isa_base_strings [la_target.isa.base]); + DUMP_FEATURE (TARGET_FRECIPE); DUMP_FEATURE (TARGET_DIV32); DUMP_FEATURE (TARGET_LAM_BH); DUMP_FEATURE (TARGET_LAMCAS); diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md index 7a101dd64b7..07beede8892 100644 --- a/gcc/config/loongarch/loongarch.md +++ b/gcc/config/loongarch/loongarch.md @@ -59,6 +59,12 @@ (define_c_enum "unspec" [ ;; Stack tie UNSPEC_TIE + ;; RSQRT + UNSPEC_RSQRTE + + ;; RECIP + UNSPEC_RECIPE + ;; CRC UNSPEC_CRC UNSPEC_CRCC @@ -220,6 +226,7 @@ (define_attr "qword_mode" "no,yes" ;; fmadd floating point multiply-add ;; fdiv floating point divide ;; frdiv floating point reciprocal divide +;; frecipe floating point approximate reciprocal ;; fabs floating point absolute value ;; flogb floating point exponent extract ;; fneg floating point negation @@ -229,6 +236,7 @@ (define_attr "qword_mode" "no,yes" ;; fscaleb floating point scale ;; fsqrt floating point square root ;; frsqrt floating point reciprocal square root +;; frsqrte floating point approximate reciprocal square root ;; multi multiword sequence (or user asm statements) ;; atomic atomic memory update instruction ;; syncloop memory atomic operation implemented as a sync loop @@ -238,8 +246,8 @@ (define_attr "type" "unknown,branch,jump,call,load,fpload,fpidxload,store,fpstore,fpidxstore, prefetch,prefetchx,condmove,mgtf,mftg,const,arith,logical, shift,slt,signext,clz,trap,imul,idiv,move, - fmove,fadd,fmul,fmadd,fdiv,frdiv,fabs,flogb,fneg,fcmp,fcopysign,fcvt, - fscaleb,fsqrt,frsqrt,accext,accmod,multi,atomic,syncloop,nop,ghost, + fmove,fadd,fmul,fmadd,fdiv,frdiv,frecipe,fabs,flogb,fneg,fcmp,fcopysign,fcvt, + fscaleb,fsqrt,frsqrt,frsqrte,accext,accmod,multi,atomic,syncloop,nop,ghost, simd_div,simd_fclass,simd_flog2,simd_fadd,simd_fcvt,simd_fmul,simd_fmadd, simd_fdiv,simd_bitins,simd_bitmov,simd_insert,simd_sld,simd_mul,simd_fcmp, simd_fexp2,simd_int_arith,simd_bit,simd_shift,simd_splat,simd_fill, @@ -908,6 +916,18 @@ (define_insn "*recip3" [(set_attr "type" "frdiv") (set_attr "mode" "")]) +;; Approximate Reciprocal Instructions. + +(define_insn "loongarch_frecipe_" + [(set (match_operand:ANYF 0 "register_operand" "=f") + (unspec:ANYF [(match_operand:ANYF 1 "register_operand" "f")] + UNSPEC_RECIPE))] + "TARGET_FRECIPE" + "frecipe.\t%0,%1" + [(set_attr "type" "frecipe") + (set_attr "mode" "") + (set_attr "insn_count" "1")]) + ;; Integer division and modulus. (define_expand "3" [(set (match_operand:GPR 0 "register_operand") @@ -1133,6 +1153,17 @@ (define_insn "*rsqrtb" [(set_attr "type" "frsqrt") (set_attr "mode" "") (set_attr "insn_count" "1")]) + +;; Approximate Reciprocal Square Root Instructions. + +(define_insn "loongarch_frsqrte_" + [(set (match_operand:ANYF 0 "register_operand" "=f") + (unspec:ANYF [(match_operand:ANYF 1 "register_operand" "f")] + UNSPEC_RSQRTE))] + "TARGET_FRECIPE" + "frsqrte.\t%0,%1" + [(set_attr "type" "frsqrte") + (set_attr "mode" "")]) ;; ;; .................... diff --git a/gcc/config/loongarch/loongarch.opt b/gcc/config/loongarch/loongarch.opt index 4d36e3ec4de..38c6b23d400 100644 --- a/gcc/config/loongarch/loongarch.opt +++ b/gcc/config/loongarch/loongarch.opt @@ -263,6 +263,10 @@ default value is 4. Variable HOST_WIDE_INT isa_evolution = 0 +mfrecipe +Target Mask(ISA_FRECIPE) Var(isa_evolution) +Support frecipe.{s/d} and frsqrte.{s/d} instructions. + mdiv32 Target Mask(ISA_DIV32) Var(isa_evolution) Support div.w[u] and mod.w[u] instructions with inputs not sign-extended. diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md index 23239993404..e2393aed139 100644 --- a/gcc/config/loongarch/lsx.md +++ b/gcc/config/loongarch/lsx.md @@ -42,8 +42,10 @@ (define_c_enum "unspec" [ UNSPEC_LSX_VFCVTL UNSPEC_LSX_VFLOGB UNSPEC_LSX_VFRECIP + UNSPEC_LSX_VFRECIPE UNSPEC_LSX_VFRINT UNSPEC_LSX_VFRSQRT + UNSPEC_LSX_VFRSQRTE UNSPEC_LSX_VFCMP_SAF UNSPEC_LSX_VFCMP_SEQ UNSPEC_LSX_VFCMP_SLE @@ -1546,6 +1548,17 @@ (define_insn "lsx_vfrecip_" [(set_attr "type" "simd_fdiv") (set_attr "mode" "")]) +;; Approximate Reciprocal Instructions. + +(define_insn "lsx_vfrecipe_" + [(set (match_operand:FLSX 0 "register_operand" "=f") + (unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")] + UNSPEC_LSX_VFRECIPE))] + "ISA_HAS_LSX && TARGET_FRECIPE" + "vfrecipe.\t%w0,%w1" + [(set_attr "type" "simd_fdiv") + (set_attr "mode" "")]) + (define_insn "lsx_vfrsqrt_" [(set (match_operand:FLSX 0 "register_operand" "=f") (unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")] @@ -1555,6 +1568,17 @@ (define_insn "lsx_vfrsqrt_" [(set_attr "type" "simd_fdiv") (set_attr "mode" "")]) +;; Approximate Reciprocal Square Root Instructions. + +(define_insn "lsx_vfrsqrte_" + [(set (match_operand:FLSX 0 "register_operand" "=f") + (unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")] + UNSPEC_LSX_VFRSQRTE))] + "ISA_HAS_LSX && TARGET_FRECIPE" + "vfrsqrte.\t%w0,%w1" + [(set_attr "type" "simd_fdiv") + (set_attr "mode" "")]) + (define_insn "lsx_vftint_u__" [(set (match_operand: 0 "register_operand" "=f") (unspec: [(match_operand:FLSX 1 "register_operand" "f")] diff --git a/gcc/config/loongarch/lsxintrin.h b/gcc/config/loongarch/lsxintrin.h index 29553c093fa..890173c53e0 100644 --- a/gcc/config/loongarch/lsxintrin.h +++ b/gcc/config/loongarch/lsxintrin.h @@ -2480,6 +2480,40 @@ __m128d __lsx_vfrecip_d (__m128d _1) return (__m128d)__builtin_lsx_vfrecip_d ((v2f64)_1); } +#ifdef TARGET_FRECIPE +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V4SF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128 __lsx_vfrecipe_s (__m128 _1) +{ + return (__m128)__builtin_lsx_vfrecipe_s ((v4f32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128d __lsx_vfrecipe_d (__m128d _1) +{ + return (__m128d)__builtin_lsx_vfrecipe_d ((v2f64)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V4SF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128 __lsx_vfrsqrte_s (__m128 _1) +{ + return (__m128)__builtin_lsx_vfrsqrte_s ((v4f32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128d __lsx_vfrsqrte_d (__m128d _1) +{ + return (__m128d)__builtin_lsx_vfrsqrte_d ((v2f64)_1); +} +#endif + /* Assembly instruction format: vd, vj. */ /* Data types in instruction templates: V4SF, V4SF. */ extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 32ae15e1d5b..99eeb83043f 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -17020,6 +17020,11 @@ The intrinsics provided are listed below: void __builtin_loongarch_iocsrwr_w (unsigned int, unsigned int) void __builtin_loongarch_iocsrwr_d (unsigned long int, unsigned int) + float __builtin_loongarch_frecipe_s (float); + double __builtin_loongarch_frecipe_d (double); + float __builtin_loongarch_frsqrte_s (float); + double __builtin_loongarch_frsqrte_d (double); + void __builtin_loongarch_dbar (imm0_32767) void __builtin_loongarch_ibar (imm0_32767) @@ -17092,6 +17097,11 @@ function you need to include @code{larchintrin.h}. void __iocsrwr_w (unsigned int, unsigned int) void __iocsrwr_d (unsigned long, unsigned int) + float __frecipe_s (float); + double __frecipe_d (double); + float __frsqrte_s (float); + double __frsqrte_d (double); + void __dbar (imm0_32767) void __ibar (imm0_32767) @@ -17434,6 +17444,8 @@ __m128d __lsx_vfnmsub_d (__m128d, __m128d, __m128d); __m128 __lsx_vfnmsub_s (__m128, __m128, __m128); __m128d __lsx_vfrecip_d (__m128d); __m128 __lsx_vfrecip_s (__m128); +__m128d __lsx_vfrecipe_d (__m128d); +__m128 __lsx_vfrecipe_s (__m128); __m128d __lsx_vfrint_d (__m128d); __m128i __lsx_vfrintrm_d (__m128d); __m128i __lsx_vfrintrm_s (__m128); @@ -17446,6 +17458,8 @@ __m128i __lsx_vfrintrz_s (__m128); __m128 __lsx_vfrint_s (__m128); __m128d __lsx_vfrsqrt_d (__m128d); __m128 __lsx_vfrsqrt_s (__m128); +__m128d __lsx_vfrsqrte_d (__m128d); +__m128 __lsx_vfrsqrte_s (__m128); __m128i __lsx_vfrstp_b (__m128i, __m128i, __m128i); __m128i __lsx_vfrstp_h (__m128i, __m128i, __m128i); __m128i __lsx_vfrstpi_b (__m128i, __m128i, imm0_31); @@ -18269,6 +18283,8 @@ __m256d __lasx_xvfnmsub_d (__m256d, __m256d, __m256d); __m256 __lasx_xvfnmsub_s (__m256, __m256, __m256); __m256d __lasx_xvfrecip_d (__m256d); __m256 __lasx_xvfrecip_s (__m256); +__m256d __lasx_xvfrecipe_d (__m256d); +__m256 __lasx_xvfrecipe_s (__m256); __m256d __lasx_xvfrint_d (__m256d); __m256i __lasx_xvfrintrm_d (__m256d); __m256i __lasx_xvfrintrm_s (__m256); @@ -18281,6 +18297,8 @@ __m256i __lasx_xvfrintrz_s (__m256); __m256 __lasx_xvfrint_s (__m256); __m256d __lasx_xvfrsqrt_d (__m256d); __m256 __lasx_xvfrsqrt_s (__m256); +__m256d __lasx_xvfrsqrte_d (__m256d); +__m256 __lasx_xvfrsqrte_s (__m256); __m256i __lasx_xvfrstp_b (__m256i, __m256i, __m256i); __m256i __lasx_xvfrstp_h (__m256i, __m256i, __m256i); __m256i __lasx_xvfrstpi_b (__m256i, __m256i, imm0_31); diff --git a/gcc/testsuite/gcc.target/loongarch/larch-frecipe-builtin.c b/gcc/testsuite/gcc.target/loongarch/larch-frecipe-builtin.c new file mode 100644 index 00000000000..b9329f34676 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/larch-frecipe-builtin.c @@ -0,0 +1,28 @@ +/* Test builtins for frecipe.{s/d} and frsqrte.{s/d} instructions */ +/* { dg-do compile } */ +/* { dg-options "-mfrecipe" } */ +/* { dg-final { scan-assembler-times "test_frecipe_s:.*frecipe\\.s.*test_frecipe_s" 1 } } */ +/* { dg-final { scan-assembler-times "test_frecipe_d:.*frecipe\\.d.*test_frecipe_d" 1 } } */ +/* { dg-final { scan-assembler-times "test_frsqrte_s:.*frsqrte\\.s.*test_frsqrte_s" 1 } } */ +/* { dg-final { scan-assembler-times "test_frsqrte_d:.*frsqrte\\.d.*test_frsqrte_d" 1 } } */ + +float +test_frecipe_s (float _1) +{ + return __builtin_loongarch_frecipe_s (_1); +} +double +test_frecipe_d (double _1) +{ + return __builtin_loongarch_frecipe_d (_1); +} +float +test_frsqrte_s (float _1) +{ + return __builtin_loongarch_frsqrte_s (_1); +} +double +test_frsqrte_d (double _1) +{ + return __builtin_loongarch_frsqrte_d (_1); +} diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-frecipe-builtin.c b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-frecipe-builtin.c new file mode 100644 index 00000000000..522535b45a3 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-frecipe-builtin.c @@ -0,0 +1,30 @@ +/* Test builtins for xvfrecipe.{s/d} and xvfrsqrte.{s/d} instructions */ +/* { dg-do compile } */ +/* { dg-options "-mlasx -mfrecipe" } */ +/* { dg-final { scan-assembler-times "lasx_xvfrecipe_s:.*xvfrecipe\\.s.*lasx_xvfrecipe_s" 1 } } */ +/* { dg-final { scan-assembler-times "lasx_xvfrecipe_d:.*xvfrecipe\\.d.*lasx_xvfrecipe_d" 1 } } */ +/* { dg-final { scan-assembler-times "lasx_xvfrsqrte_s:.*xvfrsqrte\\.s.*lasx_xvfrsqrte_s" 1 } } */ +/* { dg-final { scan-assembler-times "lasx_xvfrsqrte_d:.*xvfrsqrte\\.d.*lasx_xvfrsqrte_d" 1 } } */ + +#include + +v8f32 +__lasx_xvfrecipe_s (v8f32 _1) +{ + return __builtin_lasx_xvfrecipe_s (_1); +} +v4f64 +__lasx_xvfrecipe_d (v4f64 _1) +{ + return __builtin_lasx_xvfrecipe_d (_1); +} +v8f32 +__lasx_xvfrsqrte_s (v8f32 _1) +{ + return __builtin_lasx_xvfrsqrte_s (_1); +} +v4f64 +__lasx_xvfrsqrte_d (v4f64 _1) +{ + return __builtin_lasx_xvfrsqrte_d (_1); +} diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-frecipe-builtin.c b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-frecipe-builtin.c new file mode 100644 index 00000000000..4ad0cb0ffd6 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-frecipe-builtin.c @@ -0,0 +1,30 @@ +/* Test builtins for vfrecipe.{s/d} and vfrsqrte.{s/d} instructions */ +/* { dg-do compile } */ +/* { dg-options "-mlsx -mfrecipe" } */ +/* { dg-final { scan-assembler-times "lsx_vfrecipe_s:.*vfrecipe\\.s.*lsx_vfrecipe_s" 1 } } */ +/* { dg-final { scan-assembler-times "lsx_vfrecipe_d:.*vfrecipe\\.d.*lsx_vfrecipe_d" 1 } } */ +/* { dg-final { scan-assembler-times "lsx_vfrsqrte_s:.*vfrsqrte\\.s.*lsx_vfrsqrte_s" 1 } } */ +/* { dg-final { scan-assembler-times "lsx_vfrsqrte_d:.*vfrsqrte\\.d.*lsx_vfrsqrte_d" 1 } } */ + +#include + +v4f32 +__lsx_vfrecipe_s (v4f32 _1) +{ + return __builtin_lsx_vfrecipe_s (_1); +} +v2f64 +__lsx_vfrecipe_d (v2f64 _1) +{ + return __builtin_lsx_vfrecipe_d (_1); +} +v4f32 +__lsx_vfrsqrte_s (v4f32 _1) +{ + return __builtin_lsx_vfrsqrte_s (_1); +} +v2f64 +__lsx_vfrsqrte_d (v2f64 _1) +{ + return __builtin_lsx_vfrsqrte_d (_1); +} From patchwork Tue Dec 5 07:01:44 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jiahao Xu X-Patchwork-Id: 1871846 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4Sks0y5bN6z1ySd for ; Tue, 5 Dec 2023 18:02:26 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 895B23843B4B for ; Tue, 5 Dec 2023 07:02:23 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail.loongson.cn (mail.loongson.cn [114.242.206.163]) by sourceware.org (Postfix) with ESMTP id 53960384CBB6 for ; Tue, 5 Dec 2023 07:02:09 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 53960384CBB6 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=loongson.cn Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=loongson.cn ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 53960384CBB6 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=114.242.206.163 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701759731; cv=none; b=SLO+Fr//hPRvALGHSUHFcY/pQQ3ug4X5RNdjRzY45DbsYUC555OiwrpCf/Ms/GDqQp2tyrEWKmjhAZmOCPOMwyOlVQMhtsd4u8MUySm8l2Gr45acfa1GQB0sCh5NG1mvhEj3kIxXhEMoIudBi5uNO1SbZEWJ7LORkQNDRhbxDgc= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701759731; c=relaxed/simple; bh=rx1HhqPfTb/hFWc+/y1ZG1ALgHrLsyYOlHbaYRzvJwk=; h=From:To:Subject:Date:Message-Id:MIME-Version; b=hL5MJzPKIVxjLO8tLBSNhcQofk67dgXjP2KMdzHqU2BtjLzJFmoiNmKePGnwNU1WU+Tnrvx2uV8mat9y7vkmPVl9lEqp4lyUV1yiUsSiHWQlmsy6+LPWazdTX/9ZBxGNliw46pRi7h4MKfLVmc5oPqH1EFJKLWAs3SZvdes8FWA= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from loongson.cn (unknown [10.10.130.252]) by gateway (Coremail) with SMTP id _____8AxV_Htym5ly_g+AA--.60882S3; Tue, 05 Dec 2023 15:02:05 +0800 (CST) Received: from slurm-master.loongson.cn (unknown [10.10.130.252]) by localhost.localdomain (Coremail) with SMTP id AQAAf8Cxvdzeym5ljDlVAA--.57842S6; Tue, 05 Dec 2023 15:02:03 +0800 (CST) From: Jiahao Xu To: gcc-patches@gcc.gnu.org Cc: xry111@xry111.site, i@xen0n.name, chenglulu@loongson.cn, xuchenghua@loongson.cn, Jiahao Xu Subject: [PATCH v2 2/5] LoongArch: Use standard pattern name for xvfrsqrt/vfrsqrt instructions. Date: Tue, 5 Dec 2023 15:01:44 +0800 Message-Id: <20231205070147.53352-3-xujiahao@loongson.cn> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20231205070147.53352-1-xujiahao@loongson.cn> References: <20231205070147.53352-1-xujiahao@loongson.cn> MIME-Version: 1.0 X-CM-TRANSID: AQAAf8Cxvdzeym5ljDlVAA--.57842S6 X-CM-SenderInfo: 50xmxthkdrqz5rrqw2lrqou0/ X-Coremail-Antispam: 1Uk129KBj93XoW3WF4xCFyUWr4kWw1xWw4Dtrc_yoW3uw18p3 9rCw1vyrW8JFs7Kr1kt3y5Xr45tr9rGF129a9Iv3y2kan0q3WDZF1vkFZFqFyjqw4rGryI vw4rW3WjvFWUC3cCm3ZEXasCq-sJn29KB7ZKAUJUUUU5529EdanIXcx71UUUUU7KY7ZEXa sCq-sGcSsGvfJ3Ic02F40EFcxC0VAKzVAqx4xG6I80ebIjqfuFe4nvWSU5nxnvy29KBjDU 0xBIdaVrnRJUUUkFb4IE77IF4wAFF20E14v26r1j6r4UM7CY07I20VC2zVCF04k26cxKx2 IYs7xG6rWj6s0DM7CIcVAFz4kK6r1Y6r17M28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48v e4kI8wA2z4x0Y4vE2Ix0cI8IcVAFwI0_JFI_Gr1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI 0_Jr0_Gr1l84ACjcxK6I8E87Iv67AKxVW8Jr0_Cr1UM28EF7xvwVC2z280aVCY1x0267AK xVW8Jr0_Cr1UM2AIxVAIcxkEcVAq07x20xvEncxIr21l57IF6xkI12xvs2x26I8E6xACxx 1l5I8CrVACY4xI64kE6c02F40Ex7xfMcIj6xIIjxv20xvE14v26r1q6rW5McIj6I8E87Iv 67AKxVW8JVWxJwAm72CE4IkC6x0Yz7v_Jr0_Gr1lF7xvr2IYc2Ij64vIr41l42xK82IYc2 Ij64vIr41l4I8I3I0E4IkC6x0Yz7v_Jr0_Gr1lx2IqxVAqx4xG67AKxVWUJVWUGwC20s02 6x8GjcxK67AKxVWUGVWUWwC2zVAF1VAY17CE14v26r126r1DMIIYrxkI7VAKI48JMIIF0x vE2Ix0cI8IcVAFwI0_JFI_Gr1lIxAIcVC0I7IYx2IY6xkF7I0E14v26r1j6r4UMIIF0xvE 42xK8VAvwI8IcIk0rVWUJVWUCwCI42IY6I8E87Iv67AKxVWUJVW8JwCI42IY6I8E87Iv6x kF7I0E14v26r1j6r4UYxBIdaVFxhVjvjDU0xZFpf9x07josjUUUUUU= X-Spam-Status: No, score=-13.2 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, KAM_SHORT, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Rename lasx_xvfrsqrt*/lsx_vfrsqrt* to rsqrt2 to align with standard pattern name. Define function use_rsqrt_p to decide when to use rsqrt optab. gcc/ChangeLog: * config/loongarch/lasx.md (lasx_xvfrsqrt_): Renamed to .. (rsqrt2): .. this. * config/loongarch/loongarch-builtins.cc (CODE_FOR_lsx_vfrsqrt_d): Redefine to standard pattern name. (CODE_FOR_lsx_vfrsqrt_s): Ditto. (CODE_FOR_lasx_xvfrsqrt_d): Ditto. (CODE_FOR_lasx_xvfrsqrt_s): Ditto. * config/loongarch/loongarch.cc (use_rsqrt_p): New function. (loongarch_optab_supported_p): Ditto. (TARGET_OPTAB_SUPPORTED_P): New hook. * config/loongarch/loongarch.md (*rsqrta): Remove. (*rsqrt2): New insn pattern. (*rsqrtb): Remove. * config/loongarch/lsx.md (lsx_vfrsqrt_): Renamed to .. (rsqrt2): .. this. gcc/testsuite/ChangeLog: * gcc.target/loongarch/vector/lasx/lasx-rsqrt.c: New test. * gcc.target/loongarch/vector/lsx/lsx-rsqrt.c: New test. diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md index f6e5208a6f1..c8edc1bfd76 100644 --- a/gcc/config/loongarch/lasx.md +++ b/gcc/config/loongarch/lasx.md @@ -1646,10 +1646,10 @@ (define_insn "lasx_xvfrecipe_" [(set_attr "type" "simd_fdiv") (set_attr "mode" "")]) -(define_insn "lasx_xvfrsqrt_" +(define_insn "rsqrt2" [(set (match_operand:FLASX 0 "register_operand" "=f") - (unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")] - UNSPEC_LASX_XVFRSQRT))] + (unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")] + UNSPEC_LASX_XVFRSQRT))] "ISA_HAS_LASX" "xvfrsqrt.\t%u0,%u1" [(set_attr "type" "simd_fdiv") diff --git a/gcc/config/loongarch/loongarch-builtins.cc b/gcc/config/loongarch/loongarch-builtins.cc index bf95a44c0d2..b196e142d61 100644 --- a/gcc/config/loongarch/loongarch-builtins.cc +++ b/gcc/config/loongarch/loongarch-builtins.cc @@ -500,6 +500,8 @@ AVAIL_ALL (lasx_frecipe, ISA_HAS_LASX && TARGET_FRECIPE) #define CODE_FOR_lsx_vssrlrn_bu_h CODE_FOR_lsx_vssrlrn_u_bu_h #define CODE_FOR_lsx_vssrlrn_hu_w CODE_FOR_lsx_vssrlrn_u_hu_w #define CODE_FOR_lsx_vssrlrn_wu_d CODE_FOR_lsx_vssrlrn_u_wu_d +#define CODE_FOR_lsx_vfrsqrt_d CODE_FOR_rsqrtv2df2 +#define CODE_FOR_lsx_vfrsqrt_s CODE_FOR_rsqrtv4sf2 /* LoongArch ASX define CODE_FOR_lasx_mxxx */ #define CODE_FOR_lasx_xvsadd_b CODE_FOR_ssaddv32qi3 @@ -776,6 +778,8 @@ AVAIL_ALL (lasx_frecipe, ISA_HAS_LASX && TARGET_FRECIPE) #define CODE_FOR_lasx_xvsat_hu CODE_FOR_lasx_xvsat_u_hu #define CODE_FOR_lasx_xvsat_wu CODE_FOR_lasx_xvsat_u_wu #define CODE_FOR_lasx_xvsat_du CODE_FOR_lasx_xvsat_u_du +#define CODE_FOR_lasx_xvfrsqrt_d CODE_FOR_rsqrtv4df2 +#define CODE_FOR_lasx_xvfrsqrt_s CODE_FOR_rsqrtv8sf2 static const struct loongarch_builtin_description loongarch_builtins[] = { #define LARCH_MOVFCSR2GR 0 diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index 57a20bec8a4..96a4b846f2d 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -11487,6 +11487,30 @@ loongarch_builtin_support_vector_misalignment (machine_mode mode, is_packed); } +static bool +use_rsqrt_p (void) +{ + return (flag_finite_math_only + && !flag_trapping_math + && flag_unsafe_math_optimizations); +} + +/* Implement the TARGET_OPTAB_SUPPORTED_P hook. */ + +static bool +loongarch_optab_supported_p (int op, machine_mode, machine_mode, + optimization_type opt_type) +{ + switch (op) + { + case rsqrt_optab: + return opt_type == OPTIMIZE_FOR_SPEED && use_rsqrt_p (); + + default: + return true; + } +} + /* If -fverbose-asm, dump some info for debugging. */ static void loongarch_asm_code_end (void) @@ -11625,6 +11649,9 @@ loongarch_asm_code_end (void) #undef TARGET_FUNCTION_ARG_BOUNDARY #define TARGET_FUNCTION_ARG_BOUNDARY loongarch_function_arg_boundary +#undef TARGET_OPTAB_SUPPORTED_P +#define TARGET_OPTAB_SUPPORTED_P loongarch_optab_supported_p + #undef TARGET_VECTOR_MODE_SUPPORTED_P #define TARGET_VECTOR_MODE_SUPPORTED_P loongarch_vector_mode_supported_p diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md index 07beede8892..fd154b02e48 100644 --- a/gcc/config/loongarch/loongarch.md +++ b/gcc/config/loongarch/loongarch.md @@ -60,6 +60,7 @@ (define_c_enum "unspec" [ UNSPEC_TIE ;; RSQRT + UNSPEC_RSQRT UNSPEC_RSQRTE ;; RECIP @@ -1134,25 +1135,14 @@ (define_insn "sqrt2" (set_attr "mode" "") (set_attr "insn_count" "1")]) -(define_insn "*rsqrta" +(define_insn "*rsqrt2" [(set (match_operand:ANYF 0 "register_operand" "=f") - (div:ANYF (match_operand:ANYF 1 "const_1_operand" "") - (sqrt:ANYF (match_operand:ANYF 2 "register_operand" "f"))))] - "flag_unsafe_math_optimizations" - "frsqrt.\t%0,%2" - [(set_attr "type" "frsqrt") - (set_attr "mode" "") - (set_attr "insn_count" "1")]) - -(define_insn "*rsqrtb" - [(set (match_operand:ANYF 0 "register_operand" "=f") - (sqrt:ANYF (div:ANYF (match_operand:ANYF 1 "const_1_operand" "") - (match_operand:ANYF 2 "register_operand" "f"))))] - "flag_unsafe_math_optimizations" - "frsqrt.\t%0,%2" + (unspec:ANYF [(match_operand:ANYF 1 "register_operand" "f")] + UNSPEC_RSQRT))] + "TARGET_HARD_FLOAT" + "frsqrt.\t%0,%1" [(set_attr "type" "frsqrt") - (set_attr "mode" "") - (set_attr "insn_count" "1")]) + (set_attr "mode" "")]) ;; Approximate Reciprocal Square Root Instructions. diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md index e2393aed139..aeae1b1a622 100644 --- a/gcc/config/loongarch/lsx.md +++ b/gcc/config/loongarch/lsx.md @@ -1559,10 +1559,10 @@ (define_insn "lsx_vfrecipe_" [(set_attr "type" "simd_fdiv") (set_attr "mode" "")]) -(define_insn "lsx_vfrsqrt_" +(define_insn "rsqrt2" [(set (match_operand:FLSX 0 "register_operand" "=f") - (unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")] - UNSPEC_LSX_VFRSQRT))] + (unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")] + UNSPEC_LSX_VFRSQRT))] "ISA_HAS_LSX" "vfrsqrt.\t%w0,%w1" [(set_attr "type" "simd_fdiv") diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-rsqrt.c b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-rsqrt.c new file mode 100644 index 00000000000..24316944d4e --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-rsqrt.c @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mlasx -ffast-math" } */ +/* { dg-final { scan-assembler "xvfrsqrt.s" } } */ +/* { dg-final { scan-assembler "xvfrsqrt.d" } } */ + +extern float sqrtf (float); + +float a[8], b[8]; + +void +foo1(void) +{ + for (int i = 0; i < 8; i++) + a[i] = 1 / sqrtf (b[i]); +} + +extern double sqrt (double); + +double da[4], db[4]; + +void +foo2(void) +{ + for (int i = 0; i < 4; i++) + da[i] = 1 / sqrt (db[i]); +} diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-rsqrt.c b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-rsqrt.c new file mode 100644 index 00000000000..519cc47644c --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-rsqrt.c @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mlsx -ffast-math" } */ +/* { dg-final { scan-assembler "vfrsqrt.s" } } */ +/* { dg-final { scan-assembler "vfrsqrt.d" } } */ + +extern float sqrtf (float); + +float a[4], b[4]; + +void +foo1(void) +{ + for (int i = 0; i < 4; i++) + a[i] = 1 / sqrtf (b[i]); +} + +extern double sqrt (double); + +double da[2], db[2]; + +void +foo2(void) +{ + for (int i = 0; i < 2; i++) + da[i] = 1 / sqrt (db[i]); +} From patchwork Tue Dec 5 07:01:45 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jiahao Xu X-Patchwork-Id: 1871848 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4Sks1d4HPNz1ySd for ; Tue, 5 Dec 2023 18:03:01 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id DDB8E38768AF for ; Tue, 5 Dec 2023 07:02:57 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail.loongson.cn (mail.loongson.cn [114.242.206.163]) by sourceware.org (Postfix) with ESMTP id 2B486384CBA9 for ; Tue, 5 Dec 2023 07:02:09 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 2B486384CBA9 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=loongson.cn Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=loongson.cn ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 2B486384CBA9 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=114.242.206.163 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701759732; cv=none; b=XYYoJSQx/es39sCYpovKdy+cPW+YMYBcenf5WDPpALgOnZ5AtDqlgISjtujB/gCwvo+axwkU/Km3CHbuzQ70jNJYNIxj9ISjYKB05IwqH4sJXEDXooZpM+mqYLA6Lg7jG7L38M5ScqB4x1kpBC7uTGWX1/okSMueDHPcrb1AVs0= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701759732; c=relaxed/simple; bh=Q2oCjL8g9i+D2+gYcazYKSKf1dqfgcnMhisynzzBOLM=; h=From:To:Subject:Date:Message-Id:MIME-Version; b=Zg1nertPOzYbyqZMApj1T28Ll0b0U1SrlgSnSR1XTYRci1iYg9wjTfoJ/wa3RQZwxzUcDg6T6iMNZRI7OANMF5XlUaO+AxxoOJGm+HZ2woDvwOgXjRVY3FpjS8JFjjfih6AwznR07fOd26y9kWmS43UfHyGPxvS+3D1s1cyZ9Ic= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from loongson.cn (unknown [10.10.130.252]) by gateway (Coremail) with SMTP id _____8DxqOrwym5l0Pg+AA--.24562S3; Tue, 05 Dec 2023 15:02:08 +0800 (CST) Received: from slurm-master.loongson.cn (unknown [10.10.130.252]) by localhost.localdomain (Coremail) with SMTP id AQAAf8Cxvdzeym5ljDlVAA--.57842S7; Tue, 05 Dec 2023 15:02:07 +0800 (CST) From: Jiahao Xu To: gcc-patches@gcc.gnu.org Cc: xry111@xry111.site, i@xen0n.name, chenglulu@loongson.cn, xuchenghua@loongson.cn, Jiahao Xu Subject: [PATCH v2 3/5] LoongArch: Redefine pattern for xvfrecip/vfrecip instructions. Date: Tue, 5 Dec 2023 15:01:45 +0800 Message-Id: <20231205070147.53352-4-xujiahao@loongson.cn> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20231205070147.53352-1-xujiahao@loongson.cn> References: <20231205070147.53352-1-xujiahao@loongson.cn> MIME-Version: 1.0 X-CM-TRANSID: AQAAf8Cxvdzeym5ljDlVAA--.57842S7 X-CM-SenderInfo: 50xmxthkdrqz5rrqw2lrqou0/ X-Coremail-Antispam: 1Uk129KBj93XoW3WF43Gr15Jr1UtFWUur1rGrX_yoW7AryDpr ZrC3ZFyrWrJFsIgw1ktay5Xr15Kr9rKF429FW3Z39xAa1jqw1vvF1FkFZIqF17Xw4rGr1I va1Fga15ZFWDC3gCm3ZEXasCq-sJn29KB7ZKAUJUUUU8529EdanIXcx71UUUUU7KY7ZEXa sCq-sGcSsGvfJ3Ic02F40EFcxC0VAKzVAqx4xG6I80ebIjqfuFe4nvWSU5nxnvy29KBjDU 0xBIdaVrnRJUUUkFb4IE77IF4wAFF20E14v26r1j6r4UM7CY07I20VC2zVCF04k26cxKx2 IYs7xG6rWj6s0DM7CIcVAFz4kK6r106r15M28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48v e4kI8wA2z4x0Y4vE2Ix0cI8IcVAFwI0_Gr0_Xr1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI 0_Gr0_Cr1l84ACjcxK6I8E87Iv67AKxVW8Jr0_Cr1UM28EF7xvwVC2z280aVCY1x0267AK xVW8Jr0_Cr1UM2AIxVAIcxkEcVAq07x20xvEncxIr21l57IF6xkI12xvs2x26I8E6xACxx 1l5I8CrVACY4xI64kE6c02F40Ex7xfMcIj6xIIjxv20xvE14v26r1q6rW5McIj6I8E87Iv 67AKxVW8JVWxJwAm72CE4IkC6x0Yz7v_Jr0_Gr1lF7xvr2IYc2Ij64vIr41l42xK82IYc2 Ij64vIr41l4I8I3I0E4IkC6x0Yz7v_Jr0_Gr1lx2IqxVAqx4xG67AKxVWUJVWUGwC20s02 6x8GjcxK67AKxVWUGVWUWwC2zVAF1VAY17CE14v26r126r1DMIIYrxkI7VAKI48JMIIF0x vE2Ix0cI8IcVAFwI0_JFI_Gr1lIxAIcVC0I7IYx2IY6xkF7I0E14v26r1j6r4UMIIF0xvE 42xK8VAvwI8IcIk0rVWUJVWUCwCI42IY6I8E87Iv67AKxVWUJVW8JwCI42IY6I8E87Iv6x kF7I0E14v26r1j6r4UYxBIdaVFxhVjvjDU0xZFpf9x07jOdb8UUUUU= X-Spam-Status: No, score=-13.2 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Redefine pattern for [x]vfrecip instructions use rtx code instead of unspec, and enable [x]vfrecip instructions to be generated during auto-vectorization. gcc/ChangeLog: * config/loongarch/lasx.md (lasx_xvfrecip_): Renamed to .. (recip3): .. this. * config/loongarch/loongarch-builtins.cc (CODE_FOR_lsx_vfrecip_d): Redefine to new pattern name. (CODE_FOR_lsx_vfrecip_s): Ditto. (CODE_FOR_lasx_xvfrecip_d): Ditto. (CODE_FOR_lasx_xvfrecip_s): Ditto. (loongarch_expand_builtin_direct): For the vector recip instructions, construct a temporary parameter const1_vector. * config/loongarch/lsx.md (lsx_vfrecip_): Renamed to .. (recip3): .. this. * config/loongarch/predicates.md (const_vector_1_operand): New predicate. diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md index c8edc1bfd76..e4310c4523d 100644 --- a/gcc/config/loongarch/lasx.md +++ b/gcc/config/loongarch/lasx.md @@ -1626,12 +1626,12 @@ (define_insn "lasx_xvfmina_" [(set_attr "type" "simd_fminmax") (set_attr "mode" "")]) -(define_insn "lasx_xvfrecip_" +(define_insn "recip3" [(set (match_operand:FLASX 0 "register_operand" "=f") - (unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")] - UNSPEC_LASX_XVFRECIP))] + (div:FLASX (match_operand:FLASX 1 "const_vector_1_operand" "") + (match_operand:FLASX 2 "register_operand" "f")))] "ISA_HAS_LASX" - "xvfrecip.\t%u0,%u1" + "xvfrecip.\t%u0,%u2" [(set_attr "type" "simd_fdiv") (set_attr "mode" "")]) diff --git a/gcc/config/loongarch/loongarch-builtins.cc b/gcc/config/loongarch/loongarch-builtins.cc index b196e142d61..e0933537166 100644 --- a/gcc/config/loongarch/loongarch-builtins.cc +++ b/gcc/config/loongarch/loongarch-builtins.cc @@ -502,6 +502,8 @@ AVAIL_ALL (lasx_frecipe, ISA_HAS_LASX && TARGET_FRECIPE) #define CODE_FOR_lsx_vssrlrn_wu_d CODE_FOR_lsx_vssrlrn_u_wu_d #define CODE_FOR_lsx_vfrsqrt_d CODE_FOR_rsqrtv2df2 #define CODE_FOR_lsx_vfrsqrt_s CODE_FOR_rsqrtv4sf2 +#define CODE_FOR_lsx_vfrecip_d CODE_FOR_recipv2df3 +#define CODE_FOR_lsx_vfrecip_s CODE_FOR_recipv4sf3 /* LoongArch ASX define CODE_FOR_lasx_mxxx */ #define CODE_FOR_lasx_xvsadd_b CODE_FOR_ssaddv32qi3 @@ -780,6 +782,8 @@ AVAIL_ALL (lasx_frecipe, ISA_HAS_LASX && TARGET_FRECIPE) #define CODE_FOR_lasx_xvsat_du CODE_FOR_lasx_xvsat_u_du #define CODE_FOR_lasx_xvfrsqrt_d CODE_FOR_rsqrtv4df2 #define CODE_FOR_lasx_xvfrsqrt_s CODE_FOR_rsqrtv8sf2 +#define CODE_FOR_lasx_xvfrecip_d CODE_FOR_recipv4df3 +#define CODE_FOR_lasx_xvfrecip_s CODE_FOR_recipv8sf3 static const struct loongarch_builtin_description loongarch_builtins[] = { #define LARCH_MOVFCSR2GR 0 @@ -3024,6 +3028,22 @@ loongarch_expand_builtin_direct (enum insn_code icode, rtx target, tree exp, if (has_target_p) create_output_operand (&ops[opno++], target, TYPE_MODE (TREE_TYPE (exp))); + /* For the vector reciprocal instructions, we need to construct a temporary + parameter const1_vector. */ + switch (icode) + { + case CODE_FOR_recipv8sf3: + case CODE_FOR_recipv4df3: + case CODE_FOR_recipv4sf3: + case CODE_FOR_recipv2df3: + loongarch_prepare_builtin_arg (&ops[2], exp, 0); + create_input_operand (&ops[1], CONST1_RTX (ops[0].mode), ops[0].mode); + return loongarch_expand_builtin_insn (icode, 3, ops, has_target_p); + + default: + break; + } + /* Map the arguments to the other operands. */ gcc_assert (opno + call_expr_nargs (exp) == insn_data[icode].n_generator_args); diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md index aeae1b1a622..06402e3b353 100644 --- a/gcc/config/loongarch/lsx.md +++ b/gcc/config/loongarch/lsx.md @@ -1539,12 +1539,12 @@ (define_insn "lsx_vfmina_" [(set_attr "type" "simd_fminmax") (set_attr "mode" "")]) -(define_insn "lsx_vfrecip_" +(define_insn "recip3" [(set (match_operand:FLSX 0 "register_operand" "=f") - (unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")] - UNSPEC_LSX_VFRECIP))] + (div:FLSX (match_operand:FLSX 1 "const_vector_1_operand" "") + (match_operand:FLSX 2 "register_operand" "f")))] "ISA_HAS_LSX" - "vfrecip.\t%w0,%w1" + "vfrecip.\t%w0,%w2" [(set_attr "type" "simd_fdiv") (set_attr "mode" "")]) diff --git a/gcc/config/loongarch/predicates.md b/gcc/config/loongarch/predicates.md index d02e846cb12..f7796da10b2 100644 --- a/gcc/config/loongarch/predicates.md +++ b/gcc/config/loongarch/predicates.md @@ -227,6 +227,10 @@ (define_predicate "const_1_operand" (and (match_code "const_int,const_wide_int,const_double,const_vector") (match_test "op == CONST1_RTX (GET_MODE (op))"))) +(define_predicate "const_vector_1_operand" + (and (match_code "const_vector") + (match_test "op == CONST1_RTX (GET_MODE (op))"))) + (define_predicate "reg_or_1_operand" (ior (match_operand 0 "const_1_operand") (match_operand 0 "register_operand"))) From patchwork Tue Dec 5 07:01:46 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jiahao Xu X-Patchwork-Id: 1871849 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4Sks1r6gqDz23mj for ; Tue, 5 Dec 2023 18:03:12 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 69AD8386C580 for ; Tue, 5 Dec 2023 07:03:09 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from eggs.gnu.org (eggs.gnu.org [IPv6:2001:470:142:3::10]) by sourceware.org (Postfix) with ESMTPS id 48B2C386EC2D for ; Tue, 5 Dec 2023 07:02:25 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 48B2C386EC2D Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=loongson.cn Authentication-Results: sourceware.org; spf=fail smtp.mailfrom=loongson.cn ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 48B2C386EC2D Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2001:470:142:3::10 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701759752; cv=none; b=KfhGtr6xj04yN8hSFsRi+aDl1Pyc5ogblYZsXImlW12MY26ZtBKfxzaRSygHnWEH+mLtK9/T6yYTTBJhB4Dnt4SJjAEMZUnsOf4Sv8/YLN2Isls5bxhnk+kZZUKaTEpWC1bEbVPALSnjC+W0Sp+AHc4v8xsgtRI+WD6ucp2MQyo= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701759752; c=relaxed/simple; bh=TQGw2hX+YfsyEiAn0pXyUzyCA3pdyj6v3ml5pHpTbGg=; h=From:To:Subject:Date:Message-Id:MIME-Version; b=rIJsobhBerj3Clfj96LigfwZTLtNcr1CYRlXXgrun24IuzHhm5pHWQLRFudIvdjzDJqOCAmOId6ytoZ2vJgWNso/a0iL+iE3HUCjTu08xhCU+jXLKTplDT7P7q4BxhW0NgnWXkw7tlVuL4qVi/UaZ29MUsSYe/+nKu8mT4UE5R0= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from mail.loongson.cn ([114.242.206.163]) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rAPRl-0006gk-9F for gcc-patches@gcc.gnu.org; Tue, 05 Dec 2023 02:02:24 -0500 Received: from loongson.cn (unknown [10.10.130.252]) by gateway (Coremail) with SMTP id _____8AxXOr1ym5l1Pg+AA--.32936S3; Tue, 05 Dec 2023 15:02:13 +0800 (CST) Received: from slurm-master.loongson.cn (unknown [10.10.130.252]) by localhost.localdomain (Coremail) with SMTP id AQAAf8Cxvdzeym5ljDlVAA--.57842S8; Tue, 05 Dec 2023 15:02:10 +0800 (CST) From: Jiahao Xu To: gcc-patches@gcc.gnu.org Cc: xry111@xry111.site, i@xen0n.name, chenglulu@loongson.cn, xuchenghua@loongson.cn, Jiahao Xu Subject: [PATCH v2 4/5] LoongArch: New options -mrecip and -mrecip= with ffast-math. Date: Tue, 5 Dec 2023 15:01:46 +0800 Message-Id: <20231205070147.53352-5-xujiahao@loongson.cn> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20231205070147.53352-1-xujiahao@loongson.cn> References: <20231205070147.53352-1-xujiahao@loongson.cn> MIME-Version: 1.0 X-CM-TRANSID: AQAAf8Cxvdzeym5ljDlVAA--.57842S8 X-CM-SenderInfo: 50xmxthkdrqz5rrqw2lrqou0/ X-Coremail-Antispam: 1Uk129KBj9fXoWfCr47Xr48tF15KrW5Wr45Jwc_yoW5urWkGo WrAF4DGw18GrySkw4DKrsxZry8Xw1jyr4xAa9I93Z5CFs7Xr15t3sFkw4Yv343CrnxXry5 C3s7uFZ8Z347Xa1kl-sFpf9Il3svdjkaLaAFLSUrUUUUbb8apTn2vfkv8UJUUUU8wcxFpf 9Il3svdxBIdaVrn0xqx4xG64xvF2IEw4CE5I8CrVC2j2Jv73VFW2AGmfu7bjvjm3AaLaJ3 UjIYCTnIWjp_UUUY87kC6x804xWl14x267AKxVWUJVW8JwAFc2x0x2IEx4CE42xK8VAvwI 8IcIk0rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2ocxC64kIII0Yj41l84x0c7CEw4AK67xG Y2AK021l84ACjcxK6xIIjxv20xvE14v26r4j6ryUM28EF7xvwVC0I7IYx2IY6xkF7I0E14 v26r4j6F4UM28EF7xvwVC2z280aVAFwI0_Gr1j6F4UJwA2z4x0Y4vEx4A2jsIEc7CjxVAF wI0_Gr1j6F4UJwAS0I0E0xvYzxvE52x082IY62kv0487Mc804VCY07AIYIkI8VC2zVCFFI 0UMc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7IYx2IY67AKxVWUtVWrXwAv7VC2z280 aVAFwI0_Gr0_Cr1lOx8S6xCaFVCjc4AY6r1j6r4UM4x0Y48IcxkI7VAKI48JMxAIw28Icx kI7VAKI48JMxC20s026xCaFVCjc4AY6r1j6r4UMI8I3I0E5I8CrVAFwI0_Jr0_Jr4lx2Iq xVCjr7xvwVAFwI0_JrI_JrWlx4CE17CEb7AF67AKxVWUAVWUtwCIc40Y0x0EwIxGrwCI42 IY6xIIjxv20xvE14v26r4j6ryUMIIF0xvE2Ix0cI8IcVCY1x0267AKxVW8JVWxJwCI42IY 6xAIw20EY4v20xvaj40_Jr0_JF4lIxAIcVC2z280aVAFwI0_Gr0_Cr1lIxAIcVC2z280aV CY1x0267AKxVW8JVW8JrUvcSsGvfC2KfnxnUUI43ZEXa7IU84xRDUUUUU== Received-SPF: pass client-ip=114.242.206.163; envelope-from=xujiahao@loongson.cn; helo=mail.loongson.cn X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-Spam-Status: No, score=-13.2 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, KAM_SHORT, SPF_FAIL, SPF_HELO_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org When both the -mrecip and -mfrecipe options are enabled, use approximate reciprocal instructions and approximate reciprocal square root instructions with additional Newton-Raphson steps to implement single precision floating-point division, square root and reciprocal square root operations, for a better performance. gcc/ChangeLog: * config/loongarch/genopts/loongarch.opt.in (recip_mask): New variable. (-mrecip, -mrecip): New options. * config/loongarch/lasx.md (div3): New expander. (*div3): Rename. (sqrt2): New expander. (*sqrt2): Rename. (rsqrt2): New expander. * config/loongarch/loongarch-protos.h (loongarch_emit_swrsqrtsf): New prototype. (loongarch_emit_swdivsf): Ditto. * config/loongarch/loongarch.cc (loongarch_option_override_internal): Set recip_mask for -mrecip and -mrecip= options. (loongarch_emit_swrsqrtsf): New function. (loongarch_emit_swdivsf): Ditto. * config/loongarch/loongarch.h (RECIP_MASK_NONE, RECIP_MASK_DIV, RECIP_MASK_SQRT RECIP_MASK_RSQRT, RECIP_MASK_VEC_DIV, RECIP_MASK_VEC_SQRT, RECIP_MASK_VEC_RSQRT RECIP_MASK_ALL): New bitmasks. (TARGET_RECIP_DIV, TARGET_RECIP_SQRT, TARGET_RECIP_RSQRT, TARGET_RECIP_VEC_DIV TARGET_RECIP_VEC_SQRT, TARGET_RECIP_VEC_RSQRT): New tests. * config/loongarch/loongarch.md (sqrt2): New expander. (*sqrt2): Rename. (rsqrt2): New expander. * config/loongarch/loongarch.opt (recip_mask): New variable. (-mrecip, -mrecip): New options. * config/loongarch/lsx.md (div3): New expander. (*div3): Rename. (sqrt2): New expander. (*sqrt2): Rename. (rsqrt2): New expander. * config/loongarch/predicates.md (reg_or_vecotr_1_operand): New predicate. * doc/invoke.texi (LoongArch Options): Document new options. gcc/testsuite/ChangeLog: * gcc.target/loongarch/divf.c: New test. * gcc.target/loongarch/recip-divf.c: New test. * gcc.target/loongarch/recip-sqrtf.c: New test. * gcc.target/loongarch/sqrtf.c: New test. * gcc.target/loongarch/vector/lasx/lasx-divf.c: New test. * gcc.target/loongarch/vector/lasx/lasx-recip-divf.c: New test. * gcc.target/loongarch/vector/lasx/lasx-recip-sqrtf.c: New test. * gcc.target/loongarch/vector/lasx/lasx-recip.c: New test. * gcc.target/loongarch/vector/lasx/lasx-sqrtf.c: New test. * gcc.target/loongarch/vector/lsx/lsx-divf.c: New test. * gcc.target/loongarch/vector/lsx/lsx-recip-divf.c: New test. * gcc.target/loongarch/vector/lsx/lsx-recip-sqrtf.c: New test. * gcc.target/loongarch/vector/lsx/lsx-recip.c: New test. * gcc.target/loongarch/vector/lsx/lsx-sqrtf.c: New test. diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in b/gcc/config/loongarch/genopts/loongarch.opt.in index 8af6cc6f532..cc1a9daf7cf 100644 --- a/gcc/config/loongarch/genopts/loongarch.opt.in +++ b/gcc/config/loongarch/genopts/loongarch.opt.in @@ -23,6 +23,9 @@ config/loongarch/loongarch-opts.h HeaderInclude config/loongarch/loongarch-str.h +TargetVariable +unsigned int recip_mask = 0 + ; ISA related options ;; Base ISA Enum @@ -197,6 +200,14 @@ mexplicit-relocs Target Var(la_opt_explicit_relocs_backward) Init(M_OPT_UNSET) Use %reloc() assembly operators (for backward compatibility). +mrecip +Target RejectNegative Var(loongarch_recip) +Generate approximate reciprocal divide and square root for better throughput. + +mrecip= +Target RejectNegative Joined Var(loongarch_recip_name) +Control generation of reciprocal estimates. + ; The code model option names for -mcmodel. Enum Name(cmodel) Type(int) diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md index e4310c4523d..f6f2feedbb3 100644 --- a/gcc/config/loongarch/lasx.md +++ b/gcc/config/loongarch/lasx.md @@ -1194,7 +1194,25 @@ (define_insn "mul3" [(set_attr "type" "simd_fmul") (set_attr "mode" "")]) -(define_insn "div3" +(define_expand "div3" + [(set (match_operand:FLASX 0 "register_operand") + (div:FLASX (match_operand:FLASX 1 "reg_or_vecotr_1_operand") + (match_operand:FLASX 2 "register_operand")))] + "ISA_HAS_LASX" +{ + if (mode == V8SFmode + && TARGET_RECIP_VEC_DIV + && optimize_insn_for_speed_p () + && flag_finite_math_only && !flag_trapping_math + && flag_unsafe_math_optimizations) + { + loongarch_emit_swdivsf (operands[0], operands[1], + operands[2], V8SFmode); + DONE; + } +}) + +(define_insn "*div3" [(set (match_operand:FLASX 0 "register_operand" "=f") (div:FLASX (match_operand:FLASX 1 "register_operand" "f") (match_operand:FLASX 2 "register_operand" "f")))] @@ -1223,7 +1241,23 @@ (define_insn "fnma4" [(set_attr "type" "simd_fmadd") (set_attr "mode" "")]) -(define_insn "sqrt2" +(define_expand "sqrt2" + [(set (match_operand:FLASX 0 "register_operand") + (sqrt:FLASX (match_operand:FLASX 1 "register_operand")))] + "ISA_HAS_LASX" +{ + if (mode == V8SFmode + && TARGET_RECIP_VEC_SQRT + && flag_unsafe_math_optimizations + && optimize_insn_for_speed_p () + && flag_finite_math_only && !flag_trapping_math) + { + loongarch_emit_swrsqrtsf (operands[0], operands[1], V8SFmode, 0); + DONE; + } +}) + +(define_insn "*sqrt2" [(set (match_operand:FLASX 0 "register_operand" "=f") (sqrt:FLASX (match_operand:FLASX 1 "register_operand" "f")))] "ISA_HAS_LASX" @@ -1646,7 +1680,20 @@ (define_insn "lasx_xvfrecipe_" [(set_attr "type" "simd_fdiv") (set_attr "mode" "")]) -(define_insn "rsqrt2" +(define_expand "rsqrt2" + [(set (match_operand:FLASX 0 "register_operand" "=f") + (unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")] + UNSPEC_LASX_XVFRSQRT))] + "ISA_HAS_LASX" + { + if (mode == V8SFmode && TARGET_RECIP_VEC_RSQRT) + { + loongarch_emit_swrsqrtsf (operands[0], operands[1], V8SFmode, 1); + DONE; + } +}) + +(define_insn "*rsqrt2" [(set (match_operand:FLASX 0 "register_operand" "=f") (unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")] UNSPEC_LASX_XVFRSQRT))] diff --git a/gcc/config/loongarch/loongarch-protos.h b/gcc/config/loongarch/loongarch-protos.h index cb8fc36b086..f2ff93b5e10 100644 --- a/gcc/config/loongarch/loongarch-protos.h +++ b/gcc/config/loongarch/loongarch-protos.h @@ -220,5 +220,7 @@ extern rtx loongarch_gen_const_int_vector_shuffle (machine_mode, int); extern tree loongarch_build_builtin_va_list (void); extern rtx loongarch_build_signbit_mask (machine_mode, bool, bool); +extern void loongarch_emit_swrsqrtsf (rtx, rtx, machine_mode, bool); +extern void loongarch_emit_swdivsf (rtx, rtx, rtx, machine_mode); extern bool loongarch_explicit_relocs_p (enum loongarch_symbol_type); #endif /* ! GCC_LOONGARCH_PROTOS_H */ diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index 96a4b846f2d..2c06edcff92 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -7547,6 +7547,71 @@ loongarch_option_override_internal (struct gcc_options *opts, /* Function to allocate machine-dependent function status. */ init_machine_status = &loongarch_init_machine_status; + + /* -mrecip options. */ + static struct + { + const char *string; /* option name. */ + unsigned int mask; /* mask bits to set. */ + } + const recip_options[] = { + { "all", RECIP_MASK_ALL }, + { "none", RECIP_MASK_NONE }, + { "div", RECIP_MASK_DIV }, + { "sqrt", RECIP_MASK_SQRT }, + { "rsqrt", RECIP_MASK_RSQRT }, + { "vec-div", RECIP_MASK_VEC_DIV }, + { "vec-sqrt", RECIP_MASK_VEC_SQRT }, + { "vec-rsqrt", RECIP_MASK_VEC_RSQRT }, + }; + + if (loongarch_recip_name) + { + char *p = ASTRDUP (loongarch_recip_name); + char *q; + unsigned int mask, i; + bool invert; + + while ((q = strtok (p, ",")) != NULL) + { + p = NULL; + if (*q == '!') + { + invert = true; + q++; + } + else + invert = false; + + if (!strcmp (q, "default")) + mask = RECIP_MASK_ALL; + else + { + for (i = 0; i < ARRAY_SIZE (recip_options); i++) + if (!strcmp (q, recip_options[i].string)) + { + mask = recip_options[i].mask; + break; + } + + if (i == ARRAY_SIZE (recip_options)) + { + error ("unknown option for %<-mrecip=%s%>", q); + invert = false; + mask = RECIP_MASK_NONE; + } + } + + if (invert) + recip_mask &= ~mask; + else + recip_mask |= mask; + } + } + if (loongarch_recip) + recip_mask |= RECIP_MASK_ALL; + if (!TARGET_FRECIPE) + recip_mask = RECIP_MASK_NONE; } @@ -11470,6 +11535,126 @@ loongarch_build_signbit_mask (machine_mode mode, bool vect, bool invert) return force_reg (vec_mode, v); } +/* Use rsqrte instruction and Newton-Rhapson to compute the approximation of + a single precision floating point [reciprocal] square root. */ + +void loongarch_emit_swrsqrtsf (rtx res, rtx a, machine_mode mode, bool recip) +{ + rtx x0, e0, e1, e2, mhalf, monehalf; + REAL_VALUE_TYPE r; + int unspec; + + x0 = gen_reg_rtx (mode); + e0 = gen_reg_rtx (mode); + e1 = gen_reg_rtx (mode); + e2 = gen_reg_rtx (mode); + + real_arithmetic (&r, ABS_EXPR, &dconsthalf, NULL); + mhalf = const_double_from_real_value (r, SFmode); + + real_arithmetic (&r, PLUS_EXPR, &dconsthalf, &dconst1); + monehalf = const_double_from_real_value (r, SFmode); + unspec = UNSPEC_RSQRTE; + + if (VECTOR_MODE_P (mode)) + { + mhalf = loongarch_build_const_vector (mode, true, mhalf); + monehalf = loongarch_build_const_vector (mode, true, monehalf); + unspec = GET_MODE_SIZE (mode) == 32 ? UNSPEC_LASX_XVFRSQRTE + : UNSPEC_LSX_VFRSQRTE; + } + + /* rsqrt(a) = rsqrte(a) * (1.5 - 0.5 * a * rsqrte(a) * rsqrte(a)) + sqrt(a) = a * rsqrte(a) * (1.5 - 0.5 * a * rsqrte(a) * rsqrte(a)) */ + + a = force_reg (mode, a); + + /* x0 = rsqrt(a) estimate. */ + emit_insn (gen_rtx_SET (x0, gen_rtx_UNSPEC (mode, gen_rtvec (1, a), + unspec))); + + /* If (a == 0.0) Filter out infinity to prevent NaN for sqrt(0.0). */ + if (!recip) + { + rtx zero = force_reg (mode, CONST0_RTX (mode)); + + if (VECTOR_MODE_P (mode)) + { + machine_mode imode = related_int_vector_mode (mode).require (); + rtx mask = gen_reg_rtx (imode); + emit_insn (gen_rtx_SET (mask, gen_rtx_NE (imode, a, zero))); + emit_insn (gen_rtx_SET (x0, gen_rtx_AND (mode, x0, + gen_lowpart (mode, mask)))); + } + else + { + rtx target = emit_conditional_move (x0, { GT, a, zero, mode }, + x0, zero, mode, 0); + if (target != x0) + emit_move_insn (x0, target); + } + } + + /* e0 = x0 * a */ + emit_insn (gen_rtx_SET (e0, gen_rtx_MULT (mode, x0, a))); + /* e1 = e0 * x0 */ + emit_insn (gen_rtx_SET (e1, gen_rtx_MULT (mode, e0, x0))); + + /* e2 = 1.5 - e1 * 0.5 */ + mhalf = force_reg (mode, mhalf); + monehalf = force_reg (mode, monehalf); + emit_insn (gen_rtx_SET (e2, gen_rtx_FMA (mode, + gen_rtx_NEG (mode, e1), + mhalf, monehalf))); + + if (recip) + /* res = e2 * x0 */ + emit_insn (gen_rtx_SET (res, gen_rtx_MULT (mode, x0, e2))); + else + /* res = e2 * e0 */ + emit_insn (gen_rtx_SET (res, gen_rtx_MULT (mode, e2, e0))); +} + +/* Use recipe instruction and Newton-Rhapson to compute the approximation of + a single precision floating point divide. */ + +void loongarch_emit_swdivsf (rtx res, rtx a, rtx b, machine_mode mode) +{ + rtx x0, e0, mtwo; + REAL_VALUE_TYPE r; + x0 = gen_reg_rtx (mode); + e0 = gen_reg_rtx (mode); + int unspec = UNSPEC_RECIPE; + + real_arithmetic (&r, ABS_EXPR, &dconst2, NULL); + mtwo = const_double_from_real_value (r, SFmode); + + if (VECTOR_MODE_P (mode)) + { + mtwo = loongarch_build_const_vector (mode, true, mtwo); + unspec = GET_MODE_SIZE (mode) == 32 ? UNSPEC_LASX_XVFRECIPE + : UNSPEC_LSX_VFRECIPE; + } + + mtwo = force_reg (mode, mtwo); + + /* a / b = a * recipe(b) * (2.0 - b * recipe(b)) */ + + /* x0 = 1./b estimate. */ + emit_insn (gen_rtx_SET (x0, gen_rtx_UNSPEC (mode, gen_rtvec (1, b), + unspec))); + /* 2.0 - b * x0 */ + emit_insn (gen_rtx_SET (e0, gen_rtx_FMA (mode, + gen_rtx_NEG (mode, b), x0, mtwo))); + + /* x0 = a * x0 */ + if (a != CONST1_RTX (mode)) + emit_insn (gen_rtx_SET (x0, gen_rtx_MULT (mode, a, x0))); + + /* res = e0 * x0 */ + emit_insn (gen_rtx_SET (res, gen_rtx_MULT (mode, e0, x0))); +} + static bool loongarch_builtin_support_vector_misalignment (machine_mode mode, const_tree type, @@ -11665,6 +11850,9 @@ loongarch_asm_code_end (void) #define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES \ loongarch_autovectorize_vector_modes +#undef TARGET_OPTAB_SUPPORTED_P +#define TARGET_OPTAB_SUPPORTED_P loongarch_optab_supported_p + #undef TARGET_INIT_BUILTINS #define TARGET_INIT_BUILTINS loongarch_init_builtins #undef TARGET_BUILTIN_DECL diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h index fa8a3f5582f..ad9510c2c41 100644 --- a/gcc/config/loongarch/loongarch.h +++ b/gcc/config/loongarch/loongarch.h @@ -702,6 +702,24 @@ enum reg_class && (GET_MODE_CLASS (MODE) == MODE_VECTOR_INT \ || GET_MODE_CLASS (MODE) == MODE_VECTOR_FLOAT)) +#define RECIP_MASK_NONE 0x00 +#define RECIP_MASK_DIV 0x01 +#define RECIP_MASK_SQRT 0x02 +#define RECIP_MASK_RSQRT 0x04 +#define RECIP_MASK_VEC_DIV 0x08 +#define RECIP_MASK_VEC_SQRT 0x10 +#define RECIP_MASK_VEC_RSQRT 0x20 +#define RECIP_MASK_ALL (RECIP_MASK_DIV | RECIP_MASK_SQRT \ + | RECIP_MASK_RSQRT | RECIP_MASK_VEC_SQRT \ + | RECIP_MASK_VEC_DIV | RECIP_MASK_VEC_RSQRT) + +#define TARGET_RECIP_DIV ((recip_mask & RECIP_MASK_DIV) != 0) +#define TARGET_RECIP_SQRT ((recip_mask & RECIP_MASK_SQRT) != 0) +#define TARGET_RECIP_RSQRT ((recip_mask & RECIP_MASK_RSQRT) != 0) +#define TARGET_RECIP_VEC_DIV ((recip_mask & RECIP_MASK_VEC_DIV) != 0) +#define TARGET_RECIP_VEC_SQRT ((recip_mask & RECIP_MASK_VEC_SQRT) != 0) +#define TARGET_RECIP_VEC_RSQRT ((recip_mask & RECIP_MASK_VEC_RSQRT) != 0) + /* 1 if N is a possible register number for function argument passing. We have no FP argument registers when soft-float. */ diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md index fd154b02e48..1a10b809e3c 100644 --- a/gcc/config/loongarch/loongarch.md +++ b/gcc/config/loongarch/loongarch.md @@ -893,9 +893,21 @@ (define_peephole ;; Float division and modulus. (define_expand "div3" [(set (match_operand:ANYF 0 "register_operand") - (div:ANYF (match_operand:ANYF 1 "reg_or_1_operand") - (match_operand:ANYF 2 "register_operand")))] - "") + (div:ANYF (match_operand:ANYF 1 "reg_or_1_operand") + (match_operand:ANYF 2 "register_operand")))] + "" +{ + if (mode == SFmode + && TARGET_RECIP_DIV + && optimize_insn_for_speed_p () + && flag_finite_math_only && !flag_trapping_math + && flag_unsafe_math_optimizations) + { + loongarch_emit_swdivsf (operands[0], operands[1], + operands[2], SFmode); + DONE; + } +}) (define_insn "*div3" [(set (match_operand:ANYF 0 "register_operand" "=f") @@ -1126,7 +1138,23 @@ (define_insn "*fnma4" ;; ;; .................... -(define_insn "sqrt2" +(define_expand "sqrt2" + [(set (match_operand:ANYF 0 "register_operand") + (sqrt:ANYF (match_operand:ANYF 1 "register_operand")))] + "" + { + if (mode == SFmode + && TARGET_RECIP_SQRT + && flag_unsafe_math_optimizations + && !optimize_insn_for_size_p () + && flag_finite_math_only && !flag_trapping_math) + { + loongarch_emit_swrsqrtsf (operands[0], operands[1], SFmode, 0); + DONE; + } + }) + +(define_insn "*sqrt2" [(set (match_operand:ANYF 0 "register_operand" "=f") (sqrt:ANYF (match_operand:ANYF 1 "register_operand" "f")))] "" @@ -1135,6 +1163,19 @@ (define_insn "sqrt2" (set_attr "mode" "") (set_attr "insn_count" "1")]) +(define_expand "rsqrt2" + [(set (match_operand:ANYF 0 "register_operand") + (unspec:ANYF [(match_operand:ANYF 1 "register_operand")] + UNSPEC_RSQRT))] + "TARGET_HARD_FLOAT" +{ + if (mode == SFmode && TARGET_RECIP_RSQRT) + { + loongarch_emit_swrsqrtsf (operands[0], operands[1], SFmode, 1); + DONE; + } +}) + (define_insn "*rsqrt2" [(set (match_operand:ANYF 0 "register_operand" "=f") (unspec:ANYF [(match_operand:ANYF 1 "register_operand" "f")] diff --git a/gcc/config/loongarch/loongarch.opt b/gcc/config/loongarch/loongarch.opt index 38c6b23d400..f7a3f765b81 100644 --- a/gcc/config/loongarch/loongarch.opt +++ b/gcc/config/loongarch/loongarch.opt @@ -31,6 +31,9 @@ config/loongarch/loongarch-opts.h HeaderInclude config/loongarch/loongarch-str.h +TargetVariable +unsigned int recip_mask = 0 + ; ISA related options ;; Base ISA Enum @@ -205,6 +208,14 @@ mexplicit-relocs Target Var(la_opt_explicit_relocs_backward) Init(M_OPT_UNSET) Use %reloc() assembly operators (for backward compatibility). +mrecip +Target RejectNegative Var(loongarch_recip) +Generate approximate reciprocal divide and square root for better throughput. + +mrecip= +Target RejectNegative Joined Var(loongarch_recip_name) +Control generation of reciprocal estimates. + ; The code model option names for -mcmodel. Enum Name(cmodel) Type(int) diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md index 06402e3b353..55810041d39 100644 --- a/gcc/config/loongarch/lsx.md +++ b/gcc/config/loongarch/lsx.md @@ -1083,7 +1083,25 @@ (define_insn "mul3" [(set_attr "type" "simd_fmul") (set_attr "mode" "")]) -(define_insn "div3" +(define_expand "div3" + [(set (match_operand:FLSX 0 "register_operand") + (div:FLSX (match_operand:FLSX 1 "reg_or_vecotr_1_operand") + (match_operand:FLSX 2 "register_operand")))] + "ISA_HAS_LSX" +{ + if (mode == V4SFmode + && TARGET_RECIP_VEC_DIV + && optimize_insn_for_speed_p () + && flag_finite_math_only && !flag_trapping_math + && flag_unsafe_math_optimizations) + { + loongarch_emit_swdivsf (operands[0], operands[1], + operands[2], V4SFmode); + DONE; + } +}) + +(define_insn "*div3" [(set (match_operand:FLSX 0 "register_operand" "=f") (div:FLSX (match_operand:FLSX 1 "register_operand" "f") (match_operand:FLSX 2 "register_operand" "f")))] @@ -1112,7 +1130,23 @@ (define_insn "fnma4" [(set_attr "type" "simd_fmadd") (set_attr "mode" "")]) -(define_insn "sqrt2" +(define_expand "sqrt2" + [(set (match_operand:FLSX 0 "register_operand") + (sqrt:FLSX (match_operand:FLSX 1 "register_operand")))] + "ISA_HAS_LSX" +{ + if (mode == V4SFmode + && TARGET_RECIP_VEC_SQRT + && flag_unsafe_math_optimizations + && optimize_insn_for_speed_p () + && flag_finite_math_only && !flag_trapping_math) + { + loongarch_emit_swrsqrtsf (operands[0], operands[1], V4SFmode, 0); + DONE; + } +}) + +(define_insn "*sqrt2" [(set (match_operand:FLSX 0 "register_operand" "=f") (sqrt:FLSX (match_operand:FLSX 1 "register_operand" "f")))] "ISA_HAS_LSX" @@ -1559,7 +1593,20 @@ (define_insn "lsx_vfrecipe_" [(set_attr "type" "simd_fdiv") (set_attr "mode" "")]) -(define_insn "rsqrt2" +(define_expand "rsqrt2" + [(set (match_operand:FLSX 0 "register_operand" "=f") + (unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")] + UNSPEC_LSX_VFRSQRT))] + "ISA_HAS_LSX" +{ + if (mode == V4SFmode && TARGET_RECIP_VEC_RSQRT) + { + loongarch_emit_swrsqrtsf (operands[0], operands[1], V4SFmode, 1); + DONE; + } +}) + +(define_insn "*rsqrt2" [(set (match_operand:FLSX 0 "register_operand" "=f") (unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")] UNSPEC_LSX_VFRSQRT))] diff --git a/gcc/config/loongarch/predicates.md b/gcc/config/loongarch/predicates.md index f7796da10b2..9e9ce58cb53 100644 --- a/gcc/config/loongarch/predicates.md +++ b/gcc/config/loongarch/predicates.md @@ -235,6 +235,10 @@ (define_predicate "reg_or_1_operand" (ior (match_operand 0 "const_1_operand") (match_operand 0 "register_operand"))) +(define_predicate "reg_or_vecotr_1_operand" + (ior (match_operand 0 "const_vector_1_operand") + (match_operand 0 "register_operand"))) + ;; These are used in vec_merge, hence accept bitmask as const_int. (define_predicate "const_exp_2_operand" (and (match_code "const_int") diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 6fe63b5f999..726eead2d8e 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -1211,6 +1211,7 @@ Objective-C and Objective-C++ Dialects}. -msoft-float -mhard-float -mdouble-float -munordered-float -mcmov -mror -mrori -msext -msfimm -mshftimm -mcmodel=@var{code-model}} +-mrecip -mrecip=@var{opt} @emph{PDP-11 Options} @gccoptlist{-mfpu -msoft-float -mac0 -mno-ac0 -m40 -m45 -m10 @@ -26598,6 +26599,59 @@ detecting corresponding assembler support: This option is mostly useful for debugging, or interoperation with assemblers different from the build-time one. +@opindex mrecip +@item -mrecip +This option enables use of the reciprocal estimate and reciprocal square +root estimate instructions with additional Newton-Raphson steps to increase +precision instead of doing a divide or square root and divide for +floating-point arguments. +These instructions are generated only when @option{-funsafe-math-optimizations} +is enabled together with @option{-ffinite-math-only} and +@option{-fno-trapping-math}. +This option is off by default. Before you can use this option, you must sure the +target CPU supports frecipe and frsqrte instructions. +Note that while the throughput of the sequence is higher than the throughput of +the non-reciprocal instruction, the precision of the sequence can be decreased +by up to 2 ulp (i.e. the inverse of 1.0 equals 0.99999994). + +@opindex mrecip=opt +@item -mrecip=@var{opt} +This option controls which reciprocal estimate instructions +may be used. @var{opt} is a comma-separated list of options, which may +be preceded by a @samp{!} to invert the option: + +@table @samp +@item all +Enable all estimate instructions. + +@item default +Enable the default instructions, equivalent to @option{-mrecip}. + +@item none +Disable all estimate instructions, equivalent to @option{-mno-recip}. + +@item div +Enable the approximation for scalar division. + +@item vec-div +Enable the approximation for vectorized division. + +@item sqrt +Enable the approximation for scalar square root. + +@item vec-sqrt +Enable the approximation for vectorized square root. + +@item rsqrt +Enable the approximation for scalar reciprocal square root. + +@item vec-rsqrt +Enable the approximation for vectorized reciprocal square root. +@end table + +So, for example, @option{-mrecip=all,!sqrt} enables +all of the reciprocal approximations, except for scalar square root. + @item loongarch-vect-unroll-limit The vectorizer will use available tuning information to determine whether it would be beneficial to unroll the main vectorized loop and by how much. This diff --git a/gcc/testsuite/gcc.target/loongarch/divf.c b/gcc/testsuite/gcc.target/loongarch/divf.c new file mode 100644 index 00000000000..6c831817c9e --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/divf.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ffast-math -mrecip -mfrecipe -fno-unsafe-math-optimizations" } */ +/* { dg-final { scan-assembler "fdiv.s" } } */ +/* { dg-final { scan-assembler-not "frecipe.s" } } */ + +float +foo(float a, float b) +{ + return a / b; +} diff --git a/gcc/testsuite/gcc.target/loongarch/recip-divf.c b/gcc/testsuite/gcc.target/loongarch/recip-divf.c new file mode 100644 index 00000000000..db5e3e48888 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/recip-divf.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ffast-math -mrecip -mfrecipe" } */ +/* { dg-final { scan-assembler "frecipe.s" } } */ + +float +foo(float a, float b) +{ + return a / b; +} diff --git a/gcc/testsuite/gcc.target/loongarch/recip-sqrtf.c b/gcc/testsuite/gcc.target/loongarch/recip-sqrtf.c new file mode 100644 index 00000000000..7f45db6cdea --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/recip-sqrtf.c @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ffast-math -mrecip -mfrecipe" } */ +/* { dg-final { scan-assembler-times "frsqrte.s" 3 } } */ + +extern float sqrtf (float); + +float +foo1 (float a, float b) +{ + return a/sqrtf(b); +} + +float +foo2 (float a, float b) +{ + return sqrtf(a/b); +} + +float +foo3 (float a) +{ + return sqrtf(a); +} diff --git a/gcc/testsuite/gcc.target/loongarch/sqrtf.c b/gcc/testsuite/gcc.target/loongarch/sqrtf.c new file mode 100644 index 00000000000..c2720faac7b --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/sqrtf.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ffast-math -mrecip -mfrecipe -fno-unsafe-math-optimizations" } */ +/* { dg-final { scan-assembler-times "fsqrt.s" 3 } } */ +/* { dg-final { scan-assembler-not "frsqrte.s" } } */ + +extern float sqrtf (float); + +float +foo1 (float a, float b) +{ + return a/sqrtf(b); +} + +float +foo2 (float a, float b) +{ + return sqrtf(a/b); +} + +float +foo3 (float a) +{ + return sqrtf(a); +} diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-divf.c b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-divf.c new file mode 100644 index 00000000000..748a82200d9 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-divf.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mrecip -mlasx -mfrecipe -fno-unsafe-math-optimizations" } */ +/* { dg-final { scan-assembler "xvfdiv.s" } } */ +/* { dg-final { scan-assembler-not "xvfrecipe.s" } } */ + +float a[8],b[8],c[8]; + +void +foo () +{ + for (int i = 0; i < 8; i++) + c[i] = a[i] / b[i]; +} diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip-divf.c b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip-divf.c new file mode 100644 index 00000000000..6532756f07d --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip-divf.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ffast-math -mrecip -mlasx -mfrecipe" } */ +/* { dg-final { scan-assembler "xvfrecipe.s" } } */ + +float a[8],b[8],c[8]; + +void +foo () +{ + for (int i = 0; i < 8; i++) + c[i] = a[i] / b[i]; +} diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip-sqrtf.c b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip-sqrtf.c new file mode 100644 index 00000000000..a623dff8f27 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip-sqrtf.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ffast-math -mrecip -mlasx -mfrecipe" } */ +/* { dg-final { scan-assembler-times "xvfrsqrte.s" 3 } } */ + +float a[8], b[8], c[8]; + +extern float sqrtf (float); + +void +foo1 (void) +{ + for (int i = 0; i < 8; i++) + c[i] = a[i] / sqrtf (b[i]); +} + +void +foo2 (void) +{ + for (int i = 0; i < 8; i++) + c[i] = sqrtf (a[i] / b[i]); +} + +void +foo3 (void) +{ + for (int i = 0; i < 8; i++) + c[i] = sqrtf (a[i]); +} diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip.c b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip.c new file mode 100644 index 00000000000..083c868406b --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mlasx -fno-vect-cost-model" } */ +/* { dg-final { scan-assembler "xvfrecip.s" } } */ +/* { dg-final { scan-assembler "xvfrecip.d" } } */ +/* { dg-final { scan-assembler-not "xvfdiv.s" } } */ +/* { dg-final { scan-assembler-not "xvfdiv.d" } } */ + +float a[8], b[8]; + +void +foo1(void) +{ + for (int i = 0; i < 8; i++) + a[i] = 1 / (b[i]); +} + +double da[4], db[4]; + +void +foo2(void) +{ + for (int i = 0; i < 4; i++) + da[i] = 1 / (db[i]); +} diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-sqrtf.c b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-sqrtf.c new file mode 100644 index 00000000000..a005a38865d --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-sqrtf.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ffast-math -fno-unsafe-math-optimizations -mrecip -mlasx -mfrecipe" } */ +/* { dg-final { scan-assembler-times "xvfsqrt.s" 3 } } */ +/* { dg-final { scan-assembler-not "xvfrsqrte.s" } } */ + +float a[8], b[8], c[8]; + +extern float sqrtf (float); + +void +foo1 (void) +{ + for (int i = 0; i < 8; i++) + c[i] = a[i] / sqrtf (b[i]); +} + +void +foo2 (void) +{ + for (int i = 0; i < 8; i++) + c[i] = sqrtf (a[i] / b[i]); +} + +void +foo3 (void) +{ + for (int i = 0; i < 8; i++) + c[i] = sqrtf (a[i]); +} diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-divf.c b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-divf.c new file mode 100644 index 00000000000..1219b1ef842 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-divf.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ffast-math -mrecip -mlsx -mfrecipe -fno-unsafe-math-optimizations" } */ +/* { dg-final { scan-assembler "vfdiv.s" } } */ +/* { dg-final { scan-assembler-not "vfrecipe.s" } } */ + +float a[4],b[4],c[4]; + +void +foo () +{ + for (int i = 0; i < 4; i++) + c[i] = a[i] / b[i]; +} diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip-divf.c b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip-divf.c new file mode 100644 index 00000000000..edbe8d9098f --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip-divf.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ffast-math -mrecip -mlsx -mfrecipe" } */ +/* { dg-final { scan-assembler "vfrecipe.s" } } */ + +float a[4],b[4],c[4]; + +void +foo () +{ + for (int i = 0; i < 4; i++) + c[i] = a[i] / b[i]; +} diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip-sqrtf.c b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip-sqrtf.c new file mode 100644 index 00000000000..d356f915eb5 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip-sqrtf.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ffast-math -mrecip -mlsx -mfrecipe" } */ +/* { dg-final { scan-assembler-times "vfrsqrte.s" 3 } } */ + +float a[4], b[4], c[4]; + +extern float sqrtf (float); + +void +foo1 (void) +{ + for (int i = 0; i < 4; i++) + c[i] = a[i] / sqrtf (b[i]); +} + +void +foo2 (void) +{ + for (int i = 0; i < 4; i++) + c[i] = sqrtf (a[i] / b[i]); +} + +void +foo3 (void) +{ + for (int i = 0; i < 4; i++) + c[i] = sqrtf (a[i]); +} diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip.c b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip.c new file mode 100644 index 00000000000..c4d6af4db93 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mlsx -fno-vect-cost-model" } */ +/* { dg-final { scan-assembler "vfrecip.s" } } */ +/* { dg-final { scan-assembler "vfrecip.d" } } */ +/* { dg-final { scan-assembler-not "vfdiv.s" } } */ +/* { dg-final { scan-assembler-not "vfdiv.d" } } */ + +float a[4], b[4]; + +void +foo1(void) +{ + for (int i = 0; i < 4; i++) + a[i] = 1 / (b[i]); +} + +double da[2], db[2]; + +void +foo2(void) +{ + for (int i = 0; i < 2; i++) + da[i] = 1 / (db[i]); +} diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-sqrtf.c b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-sqrtf.c new file mode 100644 index 00000000000..3ff6570a67a --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-sqrtf.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ffast-math -mrecip -mlsx -mfrecipe -fno-unsafe-math-optimizations" } */ +/* { dg-final { scan-assembler-times "vfsqrt.s" 3 } } */ +/* { dg-final { scan-assembler-not "vfrsqrte.s" } } */ + +float a[4], b[4], c[4]; + +extern float sqrtf (float); + +void +foo1 (void) +{ + for (int i = 0; i < 4; i++) + c[i] = a[i] / sqrtf (b[i]); +} + +void +foo2 (void) +{ + for (int i = 0; i < 4; i++) + c[i] = sqrtf (a[i] / b[i]); +} + +void +foo3 (void) +{ + for (int i = 0; i < 4; i++) + c[i] = sqrtf (a[i]); +} From patchwork Tue Dec 5 07:01:47 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jiahao Xu X-Patchwork-Id: 1871850 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4Sks2B5VSSz1ySd for ; Tue, 5 Dec 2023 18:03:30 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 6405C3845BD5 for ; Tue, 5 Dec 2023 07:03:27 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail.loongson.cn (mail.loongson.cn [114.242.206.163]) by sourceware.org (Postfix) with ESMTP id 07AFF3857B99 for ; Tue, 5 Dec 2023 07:02:20 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 07AFF3857B99 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=loongson.cn Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=loongson.cn ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 07AFF3857B99 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=114.242.206.163 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701759743; cv=none; b=TWVv+Sf6JYXw7lm7KtuZepK2v+ZmUZz9AdNKYR4Di4Ahugwl8B/N60xact3xzhQEYsxB1Gq9i6xyABpb9lb9DZp33BbTqtrXRKDZ+Xgk5bMJZw3TvbmiWvLlTqHltlenZFzH4eDInx06ubaN4J54ObRZPtwsBXQlcmmT5xF/vss= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701759743; c=relaxed/simple; bh=gtvKEAUbYRjuSptFnR0xIoa60twLkfCjw3OJJ2hx9L4=; h=From:To:Subject:Date:Message-Id:MIME-Version; b=s/Wfp0cBEKTnhGUq5L/HJ3itIH5FfJZKb15z82BtPIBcaKHgiHLHsm/jXRJJvovXah/2GHcamw/5BbP+ksTZwgCkYUxiXz2LOJNopQj0p9dCvSHcWIyTy0wMyUbiaOJt9TJjrBsUZ4cm9Jpr3KhLNQPumIlQKC8U6cTh1I5qzOg= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from loongson.cn (unknown [10.10.130.252]) by gateway (Coremail) with SMTP id _____8CxNvH5ym5l1_g+AA--.59904S3; Tue, 05 Dec 2023 15:02:17 +0800 (CST) Received: from slurm-master.loongson.cn (unknown [10.10.130.252]) by localhost.localdomain (Coremail) with SMTP id AQAAf8Cxvdzeym5ljDlVAA--.57842S9; Tue, 05 Dec 2023 15:02:16 +0800 (CST) From: Jiahao Xu To: gcc-patches@gcc.gnu.org Cc: xry111@xry111.site, i@xen0n.name, chenglulu@loongson.cn, xuchenghua@loongson.cn, Jiahao Xu Subject: [PATCH v2 5/5] LoongArch: Vectorized loop unrolling is disable for divf/sqrtf/rsqrtf when -mrecip is enabled. Date: Tue, 5 Dec 2023 15:01:47 +0800 Message-Id: <20231205070147.53352-6-xujiahao@loongson.cn> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20231205070147.53352-1-xujiahao@loongson.cn> References: <20231205070147.53352-1-xujiahao@loongson.cn> MIME-Version: 1.0 X-CM-TRANSID: AQAAf8Cxvdzeym5ljDlVAA--.57842S9 X-CM-SenderInfo: 50xmxthkdrqz5rrqw2lrqou0/ X-Coremail-Antispam: 1Uk129KBj93XoW7KrWUAFW7trWfZryrCry7twc_yoW8tFyUpr ZIyr13tw4DJr47WrsrJ3yxWw1ayr9xGF42qa13ta4fCa17Kr1Fq3WkKr1qvFZrX3y5WryI vr1IqFs8Za45CwbCm3ZEXasCq-sJn29KB7ZKAUJUUUUU529EdanIXcx71UUUUU7KY7ZEXa sCq-sGcSsGvfJ3Ic02F40EFcxC0VAKzVAqx4xG6I80ebIjqfuFe4nvWSU5nxnvy29KBjDU 0xBIdaVrnRJUUUk2b4IE77IF4wAFF20E14v26r1j6r4UM7CY07I20VC2zVCF04k26cxKx2 IYs7xG6rWj6s0DM7CIcVAFz4kK6r106r15M28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48v e4kI8wA2z4x0Y4vE2Ix0cI8IcVAFwI0_Gr0_Xr1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI 0_Gr0_Cr1l84ACjcxK6I8E87Iv67AKxVW8Jr0_Cr1UM28EF7xvwVC2z280aVCY1x0267AK xVW8Jr0_Cr1UM2AIxVAIcxkEcVAq07x20xvEncxIr21l57IF6xkI12xvs2x26I8E6xACxx 1l5I8CrVACY4xI64kE6c02F40Ex7xfMcIj6xIIjxv20xvE14v26r1q6rW5McIj6I8E87Iv 67AKxVW8JVWxJwAm72CE4IkC6x0Yz7v_Jr0_Gr1lF7xvr2IYc2Ij64vIr41l42xK82IYc2 Ij64vIr41l4I8I3I0E4IkC6x0Yz7v_Jr0_Gr1lx2IqxVAqx4xG67AKxVWUJVWUGwC20s02 6x8GjcxK67AKxVWUGVWUWwC2zVAF1VAY17CE14v26r126r1DMIIYrxkI7VAKI48JMIIF0x vE2Ix0cI8IcVAFwI0_Gr0_Xr1lIxAIcVC0I7IYx2IY6xkF7I0E14v26r4j6F4UMIIF0xvE 42xK8VAvwI8IcIk0rVWUJVWUCwCI42IY6I8E87Iv67AKxVW8JVWxJwCI42IY6I8E87Iv6x kF7I0E14v26r4j6r4UJbIYCTnIWIevJa73UjIFyTuYvjxUcHUqUUUUU X-Spam-Status: No, score=-13.2 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Using -mrecip generates a sequence of instructions to replace divf, sqrtf and rsqrtf. The number of generated instructions is close to or exceeds the maximum issue instructions per cycle of the LoongArch, so vectorized loop unrolling is not performed on them. gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_vector_costs::determine_suggested_unroll_factor): If m_has_recip is true, uf return 1. (loongarch_vector_costs::add_stmt_cost): Detect the use of approximate instruction sequence. diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index 2c06edcff92..0ca60e15ced 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -3974,7 +3974,9 @@ protected: /* Reduction factor for suggesting unroll factor. */ unsigned m_reduc_factor = 0; /* True if the loop contains an average operation. */ - bool m_has_avg =false; + bool m_has_avg = false; + /* True if the loop uses approximation instruction sequence. */ + bool m_has_recip = false; }; /* Implement TARGET_VECTORIZE_CREATE_COSTS. */ @@ -4021,7 +4023,7 @@ loongarch_vector_costs::determine_suggested_unroll_factor (loop_vec_info loop_vi { class loop *loop = LOOP_VINFO_LOOP (loop_vinfo); - if (m_has_avg) + if (m_has_avg || m_has_recip) return 1; /* Don't unroll if it's specified explicitly not to be unrolled. */ @@ -4081,6 +4083,36 @@ loongarch_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind, } } + combined_fn cfn; + if (kind == vector_stmt + && stmt_info + && stmt_info->stmt) + { + /* Detect the use of approximate instruction sequence. */ + if ((TARGET_RECIP_VEC_SQRT || TARGET_RECIP_VEC_RSQRT) + && (cfn = gimple_call_combined_fn (stmt_info->stmt)) != CFN_LAST) + switch (cfn) + { + case CFN_BUILT_IN_SQRTF: + m_has_recip = true; + default: + break; + } + else if (TARGET_RECIP_VEC_DIV + && gimple_code (stmt_info->stmt) == GIMPLE_ASSIGN) + { + machine_mode mode = TYPE_MODE (vectype); + switch (gimple_assign_rhs_code (stmt_info->stmt)) + { + case RDIV_EXPR: + if (GET_MODE_INNER (mode) == SFmode) + m_has_recip = true; + default: + break; + } + } + } + return retval; }