From patchwork Wed Dec 13 14:20:09 2023
X-Patchwork-Submitter: Xi Ruoyao
X-Patchwork-Id: 1875678
From: Xi Ruoyao
To: gcc-patches@gcc.gnu.org
Cc: chenglulu, i@xen0n.name, xuchenghua@loongson.cn, Xi Ruoyao
Subject: [PATCH] LoongArch: Use the movcf2gr instruction to implement cstore4
Date: Wed, 13 Dec 2023 22:20:09 +0800
Message-ID: <20231213142240.7974-1-xry111@xry111.site>

We used a branch to load floating-point comparison results into GPR.
This is very slow when the branch is not predictable.

Use the movcf2gr instruction to implement cstore4 if movcf2gr is fast
enough.

gcc/ChangeLog:

	* config/loongarch/genopts/loongarch.opt.in (muse-movcf2gr): New
	option.
	* config/loongarch/loongarch.opt: Regenerate.
	* config/loongarch/loongarch-tune.h
	(loongarch_rtx_cost_data::movcf2gr): New field.
	(loongarch_rtx_cost_data::movcf2gr_): New method.
	(loongarch_rtx_cost_data::use_movcf2gr): New method.
	(simple_insn_cost): Declare.
	* config/loongarch/loongarch-def.cc
	(loongarch_rtx_cost_data::loongarch_rtx_cost_data): Set movcf2gr
	to COSTS_N_INSNS (7).
	(loongarch_cpu_rtx_cost_data): Set movcf2gr to COSTS_N_INSNS (1)
	for LA664.
	(loongarch_rtx_cost_optimize_size): Set movcf2gr to
	COSTS_N_INSNS (1) + 1.
	(simple_insn_cost): Define and initialize to COSTS_N_INSNS (1).
	* doc/invoke.texi (-muse-movcf2gr): Document the new option.
	* config/loongarch/predicates.md (loongarch_fcmp_operator): New
	predicate.
	* config/loongarch/loongarch.md (movcf2gr): New define_insn.
	(cstore4): New define_expand.
	* config/loongarch/loongarch.cc
	(loongarch_option_override_internal): Set the default of
	-muse-movcf2gr based on -mtune=.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/movcf2gr.c: New test.
---

Bootstrapped and regtested on loongarch64-linux-gnu (twice, with
BOOT_CFLAGS and {C,CXX}FLAGS_FOR_TARGET set to "-O2 -muse-movcf2gr" and
"-O2 -mno-use-movcf2gr").  Ok for trunk?

 gcc/config/loongarch/genopts/loongarch.opt.in |  4 +++
 gcc/config/loongarch/loongarch-def.cc         | 12 +++++--
 gcc/config/loongarch/loongarch-tune.h         | 14 ++++++++
 gcc/config/loongarch/loongarch.cc             |  3 ++
 gcc/config/loongarch/loongarch.md             | 36 +++++++++++++++++++
 gcc/config/loongarch/loongarch.opt            |  4 +++
 gcc/config/loongarch/predicates.md            |  4 +++
 gcc/doc/invoke.texi                           |  8 +++++
 gcc/testsuite/gcc.target/loongarch/movcf2gr.c |  9 +++++
 9 files changed, 91 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/movcf2gr.c

diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in b/gcc/config/loongarch/genopts/loongarch.opt.in
index c3848d02fd3..a87915d9b5a 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -245,6 +245,10 @@ mpass-mrelax-to-as
 Target Var(loongarch_pass_mrelax_to_as) Init(HAVE_AS_MRELAX_OPTION)
 Pass -mrelax or -mno-relax option to the assembler.
 
+muse-movcf2gr
+Target Var(loongarch_use_movcf2gr) Init(M_OPT_UNSET)
+Emit the movcf2gr instruction.
+
 -param=loongarch-vect-unroll-limit=
 Target Joined UInteger Var(loongarch_vect_unroll_limit) Init(6) IntegerRange(1, 64) Param
 Used to limit unroll factor which indicates how much the autovectorizer may
diff --git a/gcc/config/loongarch/loongarch-def.cc b/gcc/config/loongarch/loongarch-def.cc
index 4a8885e8343..6da085d375e 100644
--- a/gcc/config/loongarch/loongarch-def.cc
+++ b/gcc/config/loongarch/loongarch-def.cc
@@ -36,6 +36,8 @@ using array_tune = array;
 template
 using array_arch = array;
 
+const int simple_insn_cost = COSTS_N_INSNS (1);
+
 /* CPU property tables.  */
 array_tune loongarch_cpu_strings = array_tune ()
   .set (CPU_NATIVE, STR_CPU_NATIVE)
@@ -101,15 +103,18 @@ loongarch_rtx_cost_data::loongarch_rtx_cost_data ()
     int_mult_di (COSTS_N_INSNS (4)),
     int_div_si (COSTS_N_INSNS (5)),
     int_div_di (COSTS_N_INSNS (5)),
+    movcf2gr (COSTS_N_INSNS (7)),
     branch_cost (6),
     memory_latency (4) {}
 
 /* The following properties cannot be looked up directly using "cpucfg".
  So it is necessary to provide a default value for "unknown native"
  tune targets (i.e. -mtune=native while PRID does not correspond to
- any known "-mtune" type).  Currently all numbers are default.  */
+ any known "-mtune" type).  */
 array_tune loongarch_cpu_rtx_cost_data =
-  array_tune ();
+  array_tune ()
+    .set (CPU_LA664,
+          loongarch_rtx_cost_data ().movcf2gr_ (COSTS_N_INSNS (1)));
 
 /* RTX costs to use when optimizing for size.
    We use a value slightly larger than COSTS_N_INSNS (1) for all of them
@@ -125,7 +130,8 @@ const loongarch_rtx_cost_data loongarch_rtx_cost_optimize_size =
     .int_mult_si_ (COST_COMPLEX_INSN)
     .int_mult_di_ (COST_COMPLEX_INSN)
     .int_div_si_ (COST_COMPLEX_INSN)
-    .int_div_di_ (COST_COMPLEX_INSN);
+    .int_div_di_ (COST_COMPLEX_INSN)
+    .movcf2gr_ (COST_COMPLEX_INSN);
 
 array_tune loongarch_cpu_issue_rate = array_tune ()
   .set (CPU_NATIVE, 4)
diff --git a/gcc/config/loongarch/loongarch-tune.h b/gcc/config/loongarch/loongarch-tune.h
index 4aa01c54c08..7f478e009cd 100644
--- a/gcc/config/loongarch/loongarch-tune.h
+++ b/gcc/config/loongarch/loongarch-tune.h
@@ -23,6 +23,8 @@ along with GCC; see the file COPYING3.  If not see
 
 #include "loongarch-def-array.h"
 
+extern const int simple_insn_cost;
+
 /* RTX costs of various operations on the different architectures.  */
 struct loongarch_rtx_cost_data
 {
@@ -35,6 +37,7 @@ struct loongarch_rtx_cost_data
   unsigned short int_mult_di;
   unsigned short int_div_si;
   unsigned short int_div_di;
+  unsigned short movcf2gr;
   unsigned short branch_cost;
   unsigned short memory_latency;
 
@@ -95,6 +98,12 @@ struct loongarch_rtx_cost_data
     return *this;
   }
 
+  loongarch_rtx_cost_data movcf2gr_ (unsigned short _movcf2gr)
+  {
+    movcf2gr = _movcf2gr;
+    return *this;
+  }
+
   loongarch_rtx_cost_data branch_cost_ (unsigned short _branch_cost)
   {
     branch_cost = _branch_cost;
@@ -107,6 +116,11 @@ struct loongarch_rtx_cost_data
     return *this;
   }
 
+  bool use_movcf2gr () const
+  {
+    /* Use movcf2gr if it costs no more than two li.w and a branch.  */
+    return movcf2gr <= simple_insn_cost * 2 + branch_cost;
+  }
 };
 
 /* Costs to use when optimizing for size.  */
diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 390e3206a17..35e84964eb7 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -7528,6 +7528,9 @@ loongarch_option_override_internal (struct gcc_options *opts,
   else
     loongarch_cost = &loongarch_cpu_rtx_cost_data[la_target.cpu_tune];
 
+  if (loongarch_use_movcf2gr == M_OPT_UNSET)
+    loongarch_use_movcf2gr = loongarch_cost->use_movcf2gr ();
+
   /* If the user hasn't specified a branch cost, use the processor's
      default.  */
   if (loongarch_branch_cost == 0)
diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index a5d0dcd65fe..de3015c923b 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -3169,6 +3169,42 @@ (define_insn "s__using_FCCmode"
   [(set_attr "type" "fcmp")
    (set_attr "mode" "FCC")])
 
+(define_insn "movcf2gr"
+  [(set (match_operand:GPR 0 "register_operand" "=r")
+        (if_then_else:GPR (ne (match_operand:FCC 1 "register_operand" "z")
+                              (const_int 0))
+                          (const_int 1)
+                          (const_int 0)))]
+  "TARGET_HARD_FLOAT && loongarch_use_movcf2gr"
+  "movcf2gr\t%0,%1"
+  [(set_attr "type" "move")
+   (set_attr "mode" "FCC")])
+
+(define_expand "cstore4"
+  [(set (match_operand:SI 0 "register_operand")
+        (match_operator:SI 1 "loongarch_fcmp_operator"
+          [(match_operand:ANYF 2 "register_operand")
+           (match_operand:ANYF 3 "register_operand")]))]
+  "loongarch_use_movcf2gr"
+  {
+    rtx fcc = gen_reg_rtx (FCCmode);
+    rtx cmp = gen_rtx_fmt_ee (GET_CODE (operands[1]), FCCmode,
+                              operands[2], operands[3]);
+
+    emit_insn (gen_rtx_SET (fcc, cmp));
+    if (TARGET_64BIT)
+      {
+        rtx gpr = gen_reg_rtx (DImode);
+        emit_insn (gen_movcf2grdi (gpr, fcc));
+        emit_insn (gen_rtx_SET (operands[0],
+                                lowpart_subreg (SImode, gpr, DImode)));
+      }
+    else
+      emit_insn (gen_movcf2grsi (operands[0], fcc));
+
+    DONE;
+  })
+
 
 ;;
 ;;  ....................
diff --git a/gcc/config/loongarch/loongarch.opt b/gcc/config/loongarch/loongarch.opt
index 61d25130ea9..b553eaa34e7 100644
--- a/gcc/config/loongarch/loongarch.opt
+++ b/gcc/config/loongarch/loongarch.opt
@@ -253,6 +253,10 @@ mpass-mrelax-to-as
 Target Var(loongarch_pass_mrelax_to_as) Init(HAVE_AS_MRELAX_OPTION)
 Pass -mrelax or -mno-relax option to the assembler.
 
+muse-movcf2gr
+Target Var(loongarch_use_movcf2gr) Init(M_OPT_UNSET)
+Emit the movcf2gr instruction.
+
 -param=loongarch-vect-unroll-limit=
 Target Joined UInteger Var(loongarch_vect_unroll_limit) Init(6) IntegerRange(1, 64) Param
 Used to limit unroll factor which indicates how much the autovectorizer may
diff --git a/gcc/config/loongarch/predicates.md b/gcc/config/loongarch/predicates.md
index 9e9ce58cb53..83fea08315c 100644
--- a/gcc/config/loongarch/predicates.md
+++ b/gcc/config/loongarch/predicates.md
@@ -590,6 +590,10 @@ (define_predicate "order_operator"
 (define_predicate "loongarch_cstore_operator"
   (match_code "ne,eq,gt,gtu,ge,geu,lt,ltu,le,leu"))
 
+(define_predicate "loongarch_fcmp_operator"
+  (match_code
+    "unordered,uneq,unlt,unle,eq,lt,le,ordered,ltgt,ne,ge,gt,unge,ungt"))
+
 (define_predicate "small_data_pattern"
   (and (match_code "set,parallel,unspec,unspec_volatile,prefetch")
        (match_test "loongarch_small_data_pattern_p (op)")))
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 1f26f80d26c..1f79a888627 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -26811,6 +26811,14 @@ Enable the approximation for vectorized reciprocal square root.
 So, for example, @option{-mrecip=all,!sqrt} enables
 all of the reciprocal approximations, except for scalar square root.
 
+@item -muse-movcf2gr
+@itemx -mno-use-movcf2gr
+Use (do not use) the @code{movcf2gr} instruction.  The default
+depends on the setting of the @option{-mtune=} option:
+@option{-muse-movcf2gr} if tuning for a microarchitecture where the
+@code{movcf2gr} instruction is faster than a @code{bceqz} or @code{bcnez}
+branch setting a GPR to 0 or 1; @option{-mno-use-movcf2gr} otherwise.
+
 @item loongarch-vect-unroll-limit
 The vectorizer will use available tuning information to determine whether it
 would be beneficial to unroll the main vectorized loop and by how much.  This
diff --git a/gcc/testsuite/gcc.target/loongarch/movcf2gr.c b/gcc/testsuite/gcc.target/loongarch/movcf2gr.c
new file mode 100644
index 00000000000..d27c393b5ed
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/movcf2gr.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=loongarch64 -mtune=la664 -mabi=lp64d" } */
+/* { dg-final { scan-assembler "movcf2gr\t\\\$r4,\\\$fcc" } } */
+
+int
+t (float a, float b)
+{
+  return a > b;
+}
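
Note on the tuning heuristic: the default chosen for -muse-movcf2gr can be checked by hand. The following is only a minimal sketch of the comparison made in use_movcf2gr (), assuming COSTS_N_INSNS (N) expands to N * 4 as in GCC's rtl.h. The names costs_n_insns, cost_sketch, generic_tune, la664_tune and check_tuning_sketch are hypothetical stand-ins for illustration and are not part of the patch.

```cpp
#include <cassert>

// Assumption: mirrors GCC's COSTS_N_INSNS macro, which scales by 4.
constexpr int costs_n_insns (int n) { return n * 4; }

// Hypothetical, trimmed-down stand-in for loongarch_rtx_cost_data.
struct cost_sketch
{
  unsigned short movcf2gr;
  unsigned short branch_cost;

  // Same comparison as the patch: movcf2gr against two simple
  // instructions (the two li.w) plus the branch cost.
  constexpr bool use_movcf2gr () const
  {
    return movcf2gr <= costs_n_insns (1) * 2 + branch_cost;
  }
};

// Default costs from the patch: movcf2gr = COSTS_N_INSNS (7), branch_cost = 6.
constexpr cost_sketch generic_tune = { costs_n_insns (7), 6 };
// LA664 override from the patch: movcf2gr = COSTS_N_INSNS (1).
constexpr cost_sketch la664_tune = { costs_n_insns (1), 6 };

static_assert (!generic_tune.use_movcf2gr (), "28 <= 14 is false");
static_assert (la664_tune.use_movcf2gr (), "4 <= 14 is true");

// Runtime version of the same checks.
void check_tuning_sketch ()
{
  assert (!generic_tune.use_movcf2gr ());
  assert (la664_tune.use_movcf2gr ());
}
```

Under these assumptions the default cost gives 28 <= 14, which is false, so the branch sequence is kept; the LA664 cost gives 4 <= 14, which is true, so movcf2gr is emitted. This is consistent with the new test passing -mtune=la664.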