[09/44] RISC-V: Rework branch costing model for if-conversion

Message ID	alpine.DEB.2.20.2311171501110.5892@tpp.orcam.me.uk
State	New
Headers	show Return-Path: <gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org> DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 75EC9385C32C Date: Sun, 19 Nov 2023 05:36:52 +0000 (GMT) From: "Maciej W. Rozycki" <macro@embecosm.com> To: gcc-patches@gcc.gnu.org cc: Andrew Waterman <andrew@sifive.com>, Jim Wilson <jim.wilson.gcc@gmail.com>, Kito Cheng <kito.cheng@gmail.com>, Palmer Dabbelt <palmer@dabbelt.com> Subject: [PATCH 09/44] RISC-V: Rework branch costing model for if-conversion In-Reply-To: <alpine.DEB.2.20.2311171315580.5892@tpp.orcam.me.uk> Message-ID: <alpine.DEB.2.20.2311171501110.5892@tpp.orcam.me.uk> References: <alpine.DEB.2.20.2311171315580.5892@tpp.orcam.me.uk> User-Agent: Alpine 2.20 (DEB 67 2015-01-07) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Precedence: list Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org
Series	RISC-V: Various if-conversion fixes and improvements \| expand [00/44] RISC-V: Various if-conversion fixes and improvements [13/44] RISC-V/testsuite: Add branchless cases for FP cond-move operations [01/44] testsuite: Add cases for conditional-move and conditional-add operations [02/44] RISC-V/testsuite: Add cases for integer SFB cond-move operations [03/44] RISC-V: Reorder comment on SFB patterns [04/44] RISC-V: Sanitise NEED_EQ_NE_P case with `riscv_emit_int_compare' [05/44] RISC-V: Fix `mode' usage in `riscv_expand_conditional_move' [06/44] RISC-V: Avoid repeated GET_MODE calls in `riscv_expand_conditional_move' [07/44] RISC-V: Use `nullptr' in `riscv_expand_conditional_move' [08/44] RISC-V: Simplify EQ vs NE selection in `riscv_expand_conditional_move' [09/44] RISC-V: Rework branch costing model for if-conversion [10/44] RISC-V/testsuite: Add branched cases for integer cond-move operations [11/44] RISC-V/testsuite: Add branchless cases for integer cond-move operations [12/44] RISC-V/testsuite: Add branched cases for FP cond-move operations [14/44] RISC-V: Also invert the cond-move condition for GEU and LEU [15/44] RISC-V/testsuite: Add branched cases for GEU and LEU cond-move operations [16/44] RISC-V/testsuite: Add branchless cases for GEU and LEU cond-move operations [17/44] RISC-V: Avoid extraneous EQ or NE operation in cond-move expansion [18/44] RISC-V/testsuite: Add branched cases for equality cond-move operations [19/44] RISC-V/testsuite: Add branchless cases for equality cond-move operations [20/44] RISC-V: Also accept constants for T-Head cond-move comparison operands [21/44] RISC-V: Also accept constants for T-Head cond-move data input operands [22/44] RISC-V: Fold all the cond-move variants together [23/44] RISC-V/testsuite: Add branched cases for T-Head non-equality cond moves [24/44] RISC-V/testsuite: Add branchless cases for T-Head non-equality cond moves [25/44] RISC-V: Implement `riscv_emit_unary' helper [26/44] RISC-V: Add `movMODEcc' implementation for generic targets [27/44] RISC-V/testsuite: Add branched cases for generic integer cond moves [28/44] RISC-V/testsuite: Add branchless cases for generic integer cond moves [29/44] RISC-V: Add `addMODEcc' implementation for generic targets [30/44] RISC-V/testsuite: Add branched cases for generic integer cond adds [31/44] RISC-V/testsuite: Add branchless cases for generic integer cond adds [32/44] RISC-V: Only use SUBREG if applicable in `riscv_expand_float_scc' [33/44] RISC-V: Also allow FP conditions in `riscv_expand_conditional_move' [34/44] RISC-V: Provide FP conditional-branch instructions for if-conversion [35/44] RISC-V: Avoid extraneous integer comparison for FP comparisons [36/44] RISC-V/testsuite: Add branched cases for generic FP cond moves [37/44] RISC-V/testsuite: Add branchless cases for generic FP cond moves [38/44] RISC-V/testsuite: Add branched cases for generic FP cond adds [39/44] RISC-V/testsuite: Add branchless cases for generic FP cond adds [40/44] RISC-V: Handle FP NE operator via inversion in cond-operation expansion [41/44] RISC-V/testsuite: Add branched cases for FP NE cond-move operations [42/44] RISC-V/testsuite: Add branched cases for FP NE cond-move operations [43/44] RISC-V/testsuite: Add branched cases for FP NE cond-add operation [44/44] RISC-V/testsuite: Add branchless cases for FP NE cond-add operation

Message ID

alpine.DEB.2.20.2311171501110.5892@tpp.orcam.me.uk

State

New

Headers

DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 75EC9385C32C
Date: Sun, 19 Nov 2023 05:36:52 +0000 (GMT)
From: "Maciej W. Rozycki" <macro@embecosm.com>
To: gcc-patches@gcc.gnu.org
cc: Andrew Waterman <andrew@sifive.com>,
 Jim Wilson <jim.wilson.gcc@gmail.com>,
 Kito Cheng <kito.cheng@gmail.com>, Palmer Dabbelt <palmer@dabbelt.com>
Subject: [PATCH 09/44] RISC-V: Rework branch costing model for
 if-conversion
In-Reply-To: <alpine.DEB.2.20.2311171315580.5892@tpp.orcam.me.uk>
Message-ID: <alpine.DEB.2.20.2311171501110.5892@tpp.orcam.me.uk>
References: <alpine.DEB.2.20.2311171315580.5892@tpp.orcam.me.uk>
User-Agent: Alpine 2.20 (DEB 67 2015-01-07)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Precedence: list
Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org

Series

RISC-V: Various if-conversion fixes and improvements | expand

Commit Message

Maciej W. Rozycki Nov. 19, 2023, 5:36 a.m. UTC

The generic branch costing model for if-conversion assumes a fixed cost
of COSTS_N_INSNS (2) for a conditional branch, and that one half of that
cost comes from a preceding condition-set instruction, such as with
MODE_CC targets, and then the other half of that cost is for the actual
branch instruction. This is hardcoded for `if_info.original_cost' in
`noce_find_if_block' and regardless of the cost set for branches via
BRANCH_COST.

Then `default_max_noce_ifcvt_seq_cost' instructs if-conversion to prefer
a branchless sequence as costly as high as triple the BRANCH_COST value
set. This is apparently to make up for the inability to accurately
guess the branch penalty.

Consequently for the BRANCH_COST of 3 we commonly set for tuning,
if-conversion will consider branchless sequences costing 3 * 3 - 2 = 7
instruction units more than a corresponding branch sequence. For the
BRANCH_COST of 4 such as with `sifive-7-series' tuning this is even
worse, at 3 * 4 - 2 = 10. Effectively it means a branchless sequence
will always be chosen if available, even a very inefficient one.

Rework the branch costing model to better match our architecture,
observing in particular that we have no preparatory instructions for
branches so that the cost of a branch is naked BRANCH_COST plus any
extra overhead the processing of a branch's source RTX might incur.

Provide TARGET_INSN_COST and TARGET_MAX_NOCE_IFCVT_SEQ_COST handlers
than that return suitable cost based on BRANCH_COST. The latter hook
usually returns a value that is lower than the cost of the corresponding
branched sequence. This is because we don't really want to produce a
branchless sequence that is more expensive than the original branched
sequence. If this turns out too conservative for some corner case, then
this choice might be revisited.

Then we don't want to fiddle with `noce_find_if_block' without a lot of
cross-target verification, so add TARGET_NOCE_CONVERSION_PROFITABLE_P
defined such that it subtracts the fixed COSTS_N_INSNS (2) cost from the
cost of the original branched sequence supplied and instead adds actual
branch cost calculated from the conditional branch instruction used. It
is then further tweaked according to simple analysis of the replacement
branchless sequence produced so as to cancel the cost of an extraneous
zero extend operation produced by `noce_try_store_flag_mask' as observed
with gcc/testsuite/gcc.target/riscv/pr105314.c.

Tweak the testsuite accordingly and set `-mbranch-cost=' explicitly for
the relevant cases so that the expected if-conversion transformation is
made regardless of the default BRANCH_COST value of tuning in effect.
Some of these settings will be lowered later on as deficiencies in
branchless sequence generation have been fixed that lower their cost
calculated by if-conversion.

gcc/
* config/riscv/riscv.cc (riscv_insn_cost): New function.
(riscv_max_noce_ifcvt_seq_cost): Likewise.
(riscv_noce_conversion_profitable_p): Likewise.
(TARGET_INSN_COST): New macro.
(TARGET_MAX_NOCE_IFCVT_SEQ_COST): New macro.
(TARGET_NOCE_CONVERSION_PROFITABLE_P: New macro.

gcc/testsuite/
* gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_imm_imm.c:
Explicitly set the branch cost.
* gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_imm_reg.c:
Likewise.
* gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_reg_reg.c:
Likewise.
* gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_imm_imm.c:
Likewise.
* gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_imm_reg.c:
Likewise.
* gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_reg_reg.c:
Likewise.
---
FWIW I don't understand why the test cases absolutely HAD to have such
overlong names guaranteed to exceed our 80 column limit in any context.
It's such a pain to handle.
---
gcc/config/riscv/riscv.cc | 120 ++++++++++
gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_imm_imm.c | 4
gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_imm_reg.c | 4
gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_reg_reg.c | 4
gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_imm_imm.c | 4
gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_imm_reg.c | 4
gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_reg_reg.c | 4
7 files changed, 132 insertions(+), 12 deletions(-)

gcc-riscv-branch-cost.diff

Comments

Jeff Law Nov. 19, 2023, 6:52 p.m. UTC | #1

On 11/18/23 22:36, Maciej W. Rozycki wrote:
> The generic branch costing model for if-conversion assumes a fixed cost
> of COSTS_N_INSNS (2) for a conditional branch, and that one half of that
> cost comes from a preceding condition-set instruction, such as with
> MODE_CC targets, and then the other half of that cost is for the actual
> branch instruction.  This is hardcoded for `if_info.original_cost' in
> `noce_find_if_block' and regardless of the cost set for branches via
> BRANCH_COST.
> 
> Then `default_max_noce_ifcvt_seq_cost' instructs if-conversion to prefer
> a branchless sequence as costly as high as triple the BRANCH_COST value
> set.  This is apparently to make up for the inability to accurately
> guess the branch penalty.
> 
> Consequently for the BRANCH_COST of 3 we commonly set for tuning,
> if-conversion will consider branchless sequences costing 3 * 3 - 2 = 7
> instruction units more than a corresponding branch sequence.  For the
> BRANCH_COST of 4 such as with `sifive-7-series' tuning this is even
> worse, at 3 * 4 - 2 = 10.  Effectively it means a branchless sequence
> will always be chosen if available, even a very inefficient one.
> 
> Rework the branch costing model to better match our architecture,
> observing in particular that we have no preparatory instructions for
> branches so that the cost of a branch is naked BRANCH_COST plus any
> extra overhead the processing of a branch's source RTX might incur.
> 
> Provide TARGET_INSN_COST and TARGET_MAX_NOCE_IFCVT_SEQ_COST handlers
> than that return suitable cost based on BRANCH_COST.  The latter hook
> usually returns a value that is lower than the cost of the corresponding
> branched sequence.  This is because we don't really want to produce a
> branchless sequence that is more expensive than the original branched
> sequence.  If this turns out too conservative for some corner case, then
> this choice might be revisited.
> 
> Then we don't want to fiddle with `noce_find_if_block' without a lot of
> cross-target verification, so add TARGET_NOCE_CONVERSION_PROFITABLE_P
> defined such that it subtracts the fixed COSTS_N_INSNS (2) cost from the
> cost of the original branched sequence supplied and instead adds actual
> branch cost calculated from the conditional branch instruction used.  It
> is then further tweaked according to simple analysis of the replacement
> branchless sequence produced so as to cancel the cost of an extraneous
> zero extend operation produced by `noce_try_store_flag_mask' as observed
> with gcc/testsuite/gcc.target/riscv/pr105314.c.
> 
> Tweak the testsuite accordingly and set `-mbranch-cost=' explicitly for
> the relevant cases so that the expected if-conversion transformation is
> made regardless of the default BRANCH_COST value of tuning in effect.
> Some of these settings will be lowered later on as deficiencies in
> branchless sequence generation have been fixed that lower their cost
> calculated by if-conversion.
As I suspect you know a big part of the problem here is that BRANCH_COST 
and rtx_cost don't have any common scale and thus trying to compare 
BRANCH_COST to RTX_COST doesn't have well defined meaning.

That hasn't kept us from trying to do precisely that and the result has 
always been less than satisfactory.  You're introducing more, but I 
don't think there's a reasonable way out of this mess at this point.


> 
> 	gcc/
> 	* config/riscv/riscv.cc (riscv_insn_cost): New function.
> 	(riscv_max_noce_ifcvt_seq_cost): Likewise.
> 	(riscv_noce_conversion_profitable_p): Likewise.
> 	(TARGET_INSN_COST): New macro.
> 	(TARGET_MAX_NOCE_IFCVT_SEQ_COST): New macro.
> 	(TARGET_NOCE_CONVERSION_PROFITABLE_P: New macro.
> 
> 	gcc/testsuite/
> 	* gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_imm_imm.c:
> 	Explicitly set the branch cost.
> 	* gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_imm_reg.c:
> 	Likewise.
> 	* gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_reg_reg.c:
> 	Likewise.
> 	* gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_imm_imm.c:
> 	Likewise.
> 	* gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_imm_reg.c:
> 	Likewise.
> 	* gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_reg_reg.c:
> 	Likewise.
> ---
> FWIW I don't understand why the test cases absolutely HAD to have such
> overlong names guaranteed to exceed our 80 column limit in any context.
> It's such a pain to handle.
I dislike the long names as well.  I nearly changed them myself as part 
of the eswin submission, but that seemed a bit gratituous to me so I 
left them as-is.

If you wanted to rename them, be my guest, consider it pre-approved ;-)

WRT the extraneous zero-extension.  Isn't that arguably a bug in the scc 
expander for risc-v?  Fixing that isn't a prerequisite here, but it 
probably worth a bit of someone's time.

OK for the trunk.

jeff

Maciej W. Rozycki Nov. 23, 2023, 6:34 p.m. UTC | #2

On Sun, 19 Nov 2023, Jeff Law wrote:

> As I suspect you know a big part of the problem here is that BRANCH_COST and
> rtx_cost don't have any common scale and thus trying to compare BRANCH_COST to
> RTX_COST doesn't have well defined meaning.

 We do have preexisting places using COSTS_N_INSNS (BRANCH_COST ()) 
though, as documented in ifcvt.cc:

     ??? Actually, instead of the branch instruction costs we might want
     to use COSTS_N_INSNS (BRANCH_COST ()) as in other places.  */

so it seems the right direction, and given that we expose this measure to 
the user (and at the very least GCC developers implementing new tuning 
microarchitectures) I think it's the only sane way to do branch costing: 
define the measure in terms of how many ordinary ALU instructions a branch 
is statistically equivalent to.

 We may have to consider whether we want/need higher resolution here, 
effectively branch costs including a fractional part.

> That hasn't kept us from trying to do precisely that and the result has always
> been less than satisfactory.  You're introducing more, but I don't think
> there's a reasonable way out of this mess at this point.

 Ack.

> > FWIW I don't understand why the test cases absolutely HAD to have such
> > overlong names guaranteed to exceed our 80 column limit in any context.
> > It's such a pain to handle.
> I dislike the long names as well.  I nearly changed them myself as part of the
> eswin submission, but that seemed a bit gratituous to me so I left them as-is.
> 
> If you wanted to rename them, be my guest, consider it pre-approved ;-)

 Ack.

> WRT the extraneous zero-extension.  Isn't that arguably a bug in the scc
> expander for risc-v?  Fixing that isn't a prerequisite here, but it probably
> worth a bit of someone's time.

 I've looked at it already and it's the middle end that ends up with the 
zero-extension, specifically `convert_move' invoked from `emit_cstore' 
down the call to `noce_try_store_flag_mask', to widen the output from 
`cstoredi4', so I don't think we can do anything in the backend to prevent 
it from happening.  And neither I think we can do anything useful about 
`cstoredi4' having a SImode output, as it's a pattern matched by name 
rather than RTX, so we can't provide variants having a SImode and a DImode 
output each both at a time, as that would cause a name clash.

 What we could do is adding new named patterns with the output mode spelt 
out explicitly, such as `cstoresidi4', `cstoredidi4', etc., but that would 
be quite a big rewrite.  Maybe worth doing, maybe not, I don't know.  For 
the time being the hack I have made does the trick.

  Maciej

Jeff Law Nov. 29, 2023, 1:19 a.m. UTC | #3

On 11/23/23 11:34, Maciej W. Rozycki wrote:
> On Sun, 19 Nov 2023, Jeff Law wrote:
> 
>> As I suspect you know a big part of the problem here is that BRANCH_COST and
>> rtx_cost don't have any common scale and thus trying to compare BRANCH_COST to
>> RTX_COST doesn't have well defined meaning.
> 
>   We do have preexisting places using COSTS_N_INSNS (BRANCH_COST ())
> though, as documented in ifcvt.cc:
> 
>       ??? Actually, instead of the branch instruction costs we might want
>       to use COSTS_N_INSNS (BRANCH_COST ()) as in other places.  */
> 
> so it seems the right direction, and given that we expose this measure to
> the user (and at the very least GCC developers implementing new tuning
> microarchitectures) I think it's the only sane way to do branch costing:
> define the measure in terms of how many ordinary ALU instructions a branch
> is statistically equivalent to.
Oh, it probably is the only sane way at this point.  To do anything more 
sane we'd have to disassociate the BRANCH_COST uses during tree/gimple 
from those in RTL-land.

FWIW, I was looking at a regression with our internal tests after your 
changes.   It was quite nice to see how well twiddling -mbranch-cost 
correlated to how many instructions we would allow in a conditional move 
sequence.

The downside is it highlighted the gimple vs RTL use issue.  I'm 
confident that we would like to see a higher branch cost in the RTL 
phases for our uarch, but I'm much less comfortable with how that's 
going to change the decisions made in trees/gimple.  We'll have to 
investigate that at some depth.



> 
>> WRT the extraneous zero-extension.  Isn't that arguably a bug in the scc
>> expander for risc-v?  Fixing that isn't a prerequisite here, but it probably
>> worth a bit of someone's time.
> 
>   I've looked at it already and it's the middle end that ends up with the
> zero-extension, specifically `convert_move' invoked from `emit_cstore'
> down the call to `noce_try_store_flag_mask', to widen the output from
> `cstoredi4', so I don't think we can do anything in the backend to prevent
> it from happening.  And neither I think we can do anything useful about
> `cstoredi4' having a SImode output, as it's a pattern matched by name
> rather than RTX, so we can't provide variants having a SImode and a DImode
> output each both at a time, as that would cause a name clash.
We're actually tracking some of these extraneous extensions.  Do you 
happen to know if the zero-extended object happens to be (subreg:SI 
(reg:DI)) kind of construct?  That's the kind of thing we're chasing 
down right now from various points.  Vineet has already fixed one class 
of them.  Jivan and I are looking at others.

jeff

Maciej W. Rozycki Nov. 29, 2023, 12:01 p.m. UTC | #4

On Tue, 28 Nov 2023, Jeff Law wrote:

> FWIW, I was looking at a regression with our internal tests after your
> changes.   It was quite nice to see how well twiddling -mbranch-cost
> correlated to how many instructions we would allow in a conditional move
> sequence.

 I'm a bit concerned though that our interpretation of `-mbranch-cost=0' 
is different from the middle end's, such as in `emit_store_flag':

  /* If we reached here, we can't do this with a scc insn, however there
     are some comparisons that can be done in other ways.  Don't do any
     of these cases if branches are very cheap.  */
  if (BRANCH_COST (optimize_insn_for_speed_p (), false) == 0)
    return 0;

> The downside is it highlighted the gimple vs RTL use issue.  I'm confident
> that we would like to see a higher branch cost in the RTL phases for our
> uarch, but I'm much less comfortable with how that's going to change the
> decisions made in trees/gimple.  We'll have to investigate that at some depth.

 Ack.

> >  I've looked at it already and it's the middle end that ends up with the
> > zero-extension, specifically `convert_move' invoked from `emit_cstore'
> > down the call to `noce_try_store_flag_mask', to widen the output from
> > `cstoredi4', so I don't think we can do anything in the backend to prevent
> > it from happening.  And neither I think we can do anything useful about
> > `cstoredi4' having a SImode output, as it's a pattern matched by name
> > rather than RTX, so we can't provide variants having a SImode and a DImode
> > output each both at a time, as that would cause a name clash.
> We're actually tracking some of these extraneous extensions.  Do you happen to
> know if the zero-extended object happens to be (subreg:SI (reg:DI)) kind of
> construct?  That's the kind of thing we're chasing down right now from various
> points.  Vineet has already fixed one class of them.  Jivan and I are looking
> at others.

 Under GDB it's a plain move from (reg:SI 140) to (reg:DI 139), as in the 
FROM and TO arguments to `convert_move' respectively.  This makes it call 
`convert_mode_scalar', which then chooses between `zext_optab' and 
`sext_optab' as appropriate, under:

  /* If the target has a converter from FROM_MODE to TO_MODE, use it.  */

to produce:

(set (reg:DI 139)
    (zero_extend:DI (reg:SI 140)))

ending up with this complete sequence:

(insn 27 0 28 (set (reg:SI 140)
        (eq:SI (reg/v:DI 137 [ c ])
            (const_int 0 [0]))) -1
     (nil))
(insn 28 27 29 (set (reg:DI 139)
        (zero_extend:DI (reg:SI 140))) -1
     (nil))
(insn 29 28 30 (set (reg:DI 141)
        (neg:DI (reg:DI 139))) -1
     (nil))
(insn 30 29 0 (set (reg/v:DI 134 [ <retval> ])
        (and:DI (reg/v:DI 135 [ a ])
            (reg:DI 141))) -1
     (nil))

passed to `targetm.noce_conversion_profitable_p' right away.  Maybe you 
can teach `emit_cstore' or `convert_move' to use a subreg when it is known
for the particular target that the value produced by the conditional-set 
machine instruction emitted by `cstoreMODE4' is valid unchanged in both 
modes.

 You can fiddle with it by trying:

$ gcc -march=rv64gc -mbranch-cost=3 -O2 -S gcc/testsuite/gcc.target/riscv/pr105314.c

Set a breakpoint at `noce_try_store_flag_mask' and then single-step to see 
how things proceed.

  Maciej

Index: gcc/gcc/config/riscv/riscv.cc
===================================================================
--- gcc.orig/gcc/config/riscv/riscv.cc
+++ gcc/gcc/config/riscv/riscv.cc
@@ -43,6 +43,7 @@  along with GCC; see the file COPYING3.
 #include "calls.h"
 #include "function.h"
 #include "explow.h"
+#include "ifcvt.h"
 #include "memmodel.h"
 #include "emit-rtl.h"
 #include "reload.h"
@@ -3295,6 +3296,118 @@  riscv_address_cost (rtx addr, machine_mo
   return riscv_address_insns (addr, mode, false);
 }
 
+/* Implement TARGET_INSN_COST.  We factor in the branch cost in the cost
+   calculation for conditional branches: one unit is considered the cost
+   of microarchitecture-dependent actual branch execution and therefore
+   multiplied by BRANCH_COST and any remaining units are considered fixed
+   branch overhead.  */
+
+static int
+riscv_insn_cost (rtx_insn *insn, bool speed)
+{
+  rtx x = PATTERN (insn);
+  int cost = pattern_cost (x, speed);
+
+  if (JUMP_P (insn)
+      && GET_CODE (x) == SET
+      && GET_CODE (SET_DEST (x)) == PC
+      && GET_CODE (SET_SRC (x)) == IF_THEN_ELSE)
+    cost += COSTS_N_INSNS (BRANCH_COST (speed, false) - 1);
+  return cost;
+}
+
+/* Implement TARGET_MAX_NOCE_IFCVT_SEQ_COST.  Like the default implementation,
+   but we consider cost units of branch instructions equal to cost units of
+   other instructions.  */
+
+static unsigned int
+riscv_max_noce_ifcvt_seq_cost (edge e)
+{
+  bool predictable_p = predictable_edge_p (e);
+
+  if (predictable_p)
+    {
+      if (OPTION_SET_P (param_max_rtl_if_conversion_predictable_cost))
+	return param_max_rtl_if_conversion_predictable_cost;
+    }
+  else
+    {
+      if (OPTION_SET_P (param_max_rtl_if_conversion_unpredictable_cost))
+	return param_max_rtl_if_conversion_unpredictable_cost;
+    }
+
+  return COSTS_N_INSNS (BRANCH_COST (true, predictable_p));
+}
+
+/* Implement TARGET_NOCE_CONVERSION_PROFITABLE_P.  We replace the cost of a
+   conditional branch assumed by `noce_find_if_block' at `COSTS_N_INSNS (2)'
+   by our actual conditional branch cost, observing that our branches test
+   conditions directly, so there is no preparatory extra condition-set
+   instruction.  */
+
+static bool
+riscv_noce_conversion_profitable_p (rtx_insn *seq,
+				    struct noce_if_info *if_info)
+{
+  struct noce_if_info riscv_if_info = *if_info;
+
+  riscv_if_info.original_cost -= COSTS_N_INSNS (2);
+  riscv_if_info.original_cost += insn_cost (if_info->jump, if_info->speed_p);
+
+  /* Hack alert!  When `noce_try_store_flag_mask' uses `cstore<mode>4'
+     to emit a conditional set operation on DImode output it comes up
+     with a sequence such as:
+
+     (insn 26 0 27 (set (reg:SI 140)
+             (eq:SI (reg/v:DI 137 [ c ])
+                 (const_int 0 [0]))) 302 {*seq_zero_disi}
+          (nil))
+     (insn 27 26 28 (set (reg:DI 139)
+             (zero_extend:DI (reg:SI 140))) 116 {*zero_extendsidi2_internal}
+          (nil))
+
+     because our `cstore<mode>4' pattern expands to an insn that gives
+     a SImode output.  The output of conditional set is 0 or 1 boolean,
+     so it is valid for input in any scalar integer mode and therefore
+     combine later folds the zero extend operation into an equivalent
+     conditional set operation that produces a DImode output, however
+     this redundant zero extend operation counts towards the cost of
+     the replacement sequence.  Compensate for that by incrementing the
+     cost of the original sequence as well as the maximum sequence cost
+     accordingly.  */
+  rtx last_dest = NULL_RTX;
+  for (rtx_insn *insn = seq; insn; insn = NEXT_INSN (insn))
+    {
+      if (!NONDEBUG_INSN_P (insn))
+	continue;
+
+      rtx x = PATTERN (insn);
+      if (NONJUMP_INSN_P (insn)
+	  && GET_CODE (x) == SET)
+	{
+	  rtx src = SET_SRC (x);
+	  if (last_dest != NULL_RTX
+	      && GET_CODE (src) == ZERO_EXTEND
+	      && REG_P (XEXP (src, 0))
+	      && REGNO (XEXP (src, 0)) == REGNO (last_dest))
+	    {
+	      riscv_if_info.original_cost += COSTS_N_INSNS (1);
+	      riscv_if_info.max_seq_cost += COSTS_N_INSNS (1);
+	    }
+	  last_dest = NULL_RTX;
+	  rtx dest = SET_DEST (x);
+	  if (COMPARISON_P (src)
+	      && REG_P (dest)
+	      && GET_MODE (dest) == SImode)
+	    last_dest = dest;
+	}
+      else
+	last_dest = NULL_RTX;
+    }
+
+  return default_noce_conversion_profitable_p (seq, &riscv_if_info);
+}
+
 /* Return one word of double-word value OP.  HIGH_P is true to select the
    high part or false to select the low part. */
 
@@ -9823,6 +9936,13 @@  riscv_preferred_else_value (unsigned ifn
 #define TARGET_RTX_COSTS riscv_rtx_costs
 #undef TARGET_ADDRESS_COST
 #define TARGET_ADDRESS_COST riscv_address_cost
+#undef TARGET_INSN_COST
+#define TARGET_INSN_COST riscv_insn_cost
+
+#undef TARGET_MAX_NOCE_IFCVT_SEQ_COST
+#define TARGET_MAX_NOCE_IFCVT_SEQ_COST riscv_max_noce_ifcvt_seq_cost
+#undef TARGET_NOCE_CONVERSION_PROFITABLE_P
+#define TARGET_NOCE_CONVERSION_PROFITABLE_P riscv_noce_conversion_profitable_p
 
 #undef TARGET_ASM_FILE_START
 #define TARGET_ASM_FILE_START riscv_file_start
Index: gcc/gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_imm_imm.c
===================================================================
--- gcc.orig/gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_imm_imm.c
+++ gcc/gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_imm_imm.c
@@ -1,6 +1,6 @@ 
 /* { dg-do compile } */
-/* { dg-options "-march=rv64gc_zicond -mabi=lp64d" { target { rv64 } } } */
-/* { dg-options "-march=rv32gc_zicond -mabi=ilp32f" { target { rv32 } } } */
+/* { dg-options "-march=rv64gc_zicond -mabi=lp64d -mbranch-cost=4" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc_zicond -mabi=ilp32f -mbranch-cost=4" { target { rv32 } } } */
 /* { dg-skip-if "" { *-*-* } {"-O0" "-Og" "-Os" "-Oz"} } */
 
 long primitiveSemantics_compare_imm_return_imm_imm_00(long a, long b) {
Index: gcc/gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_imm_reg.c
===================================================================
--- gcc.orig/gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_imm_reg.c
+++ gcc/gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_imm_reg.c
@@ -1,6 +1,6 @@ 
 /* { dg-do compile } */
-/* { dg-options "-march=rv64gc_zicond -mabi=lp64d" { target { rv64 } } } */
-/* { dg-options "-march=rv32gc_zicond -mabi=ilp32f" { target { rv32 } } } */
+/* { dg-options "-march=rv64gc_zicond -mabi=lp64d -mbranch-cost=4" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc_zicond -mabi=ilp32f -mbranch-cost=4" { target { rv32 } } } */
 /* { dg-skip-if "" { *-*-* } {"-O0" "-Og" "-Os" "-Oz"} } */
 
 long primitiveSemantics_compare_imm_return_imm_reg_00(long a, long b) {
Index: gcc/gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_reg_reg.c
===================================================================
--- gcc.orig/gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_reg_reg.c
+++ gcc/gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_imm_return_reg_reg.c
@@ -1,6 +1,6 @@ 
 /* { dg-do compile } */
-/* { dg-options "-march=rv64gc_zicond -mabi=lp64d" { target { rv64 } } } */
-/* { dg-options "-march=rv32gc_zicond -mabi=ilp32f" { target { rv32 } } } */
+/* { dg-options "-march=rv64gc_zicond -mabi=lp64d -mbranch-cost=5" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc_zicond -mabi=ilp32f -mbranch-cost=5" { target { rv32 } } } */
 /* { dg-skip-if "" { *-*-* } {"-O0" "-Og" "-Os" "-Oz"} } */
 
 long primitiveSemantics_compare_imm_return_reg_reg_00(long a, long b, long c) {
Index: gcc/gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_imm_imm.c
===================================================================
--- gcc.orig/gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_imm_imm.c
+++ gcc/gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_imm_imm.c
@@ -1,6 +1,6 @@ 
 /* { dg-do compile } */
-/* { dg-options "-march=rv64gc_zicond -mabi=lp64d" { target { rv64 } } } */
-/* { dg-options "-march=rv32gc_zicond -mabi=ilp32f" { target { rv32 } } } */
+/* { dg-options "-march=rv64gc_zicond -mabi=lp64d -mbranch-cost=4" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc_zicond -mabi=ilp32f -mbranch-cost=4" { target { rv32 } } } */
 /* { dg-skip-if "" { *-*-* } {"-O0" "-Og" "-Os" "-Oz"} } */
 
 long primitiveSemantics_compare_reg_return_imm_imm_00(long a, long b, long c) {
Index: gcc/gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_imm_reg.c
===================================================================
--- gcc.orig/gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_imm_reg.c
+++ gcc/gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_imm_reg.c
@@ -1,6 +1,6 @@ 
 /* { dg-do compile } */
-/* { dg-options "-march=rv64gc_zicond -mabi=lp64d" { target { rv64 } } } */
-/* { dg-options "-march=rv32gc_zicond -mabi=ilp32f" { target { rv32 } } } */
+/* { dg-options "-march=rv64gc_zicond -mabi=lp64d -mbranch-cost=4" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc_zicond -mabi=ilp32f -mbranch-cost=4" { target { rv32 } } } */
 /* { dg-skip-if "" { *-*-* } {"-O0" "-Og" "-Os" "-Oz"} } */
 
 long primitiveSemantics_compare_reg_return_imm_reg_00(long a, long b, long c) {
Index: gcc/gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_reg_reg.c
===================================================================
--- gcc.orig/gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_reg_reg.c
+++ gcc/gcc/testsuite/gcc.target/riscv/zicond-primitiveSemantics_compare_reg_return_reg_reg.c
@@ -1,6 +1,6 @@ 
 /* { dg-do compile } */
-/* { dg-options "-march=rv64gc_zicond -mabi=lp64d" { target { rv64 } } } */
-/* { dg-options "-march=rv32gc_zicond -mabi=ilp32f" { target { rv32 } } } */
+/* { dg-options "-march=rv64gc_zicond -mabi=lp64d -mbranch-cost=5" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc_zicond -mabi=ilp32f -mbranch-cost=5" { target { rv32 } } } */
 /* { dg-skip-if "" { *-*-* } {"-O0" "-Og" "-Os" "-Oz"} } */
 
 long primitiveSemantics_compare_reg_return_reg_reg_00(long a, long b, long c,

[09/44] RISC-V: Rework branch costing model for if-conversion

Commit Message

Comments

Patch