ifcvt.cc: Prevent excessive if-conversion for conditional moves

Message ID 076a3744-f608-6f31-7244-2bf7ab06cdb1@yahoo.co.jp
State New

Commit Message

Takayuki 'January June' Suwa Jan. 11, 2023, 4:20 a.m. UTC
Currently, cond_move_process_if_block() does the conversion without
balancing the cost of the converted sequence with the original one, but
this should be checked by calling targetm.noce_conversion_profitable_p().

Doing so lets a target use its own cost estimate to prevent unwanted
size growth from excessive conditional moves when optimizing for size.

When optimizing for speed, default_noce_conversion_profitable_p() allows
plenty of headroom, so this patch has little impact.

Also, if the target-specific cost estimate is accurate or allows for
margins, the impact should be similarly small.

gcc/ChangeLog:

	* ifcvt.cc (cond_move_process_if_block):
	Consider the result of targetm.noce_conversion_profitable_p()
	when replacing the original sequence with the converted one.
---
 gcc/ifcvt.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Robin Dapp Jan. 11, 2023, 8:02 a.m. UTC | #1
Hi,
 
> On optimizing for speed, default_noce_conversion_profitable_p() allows
> plenty of headroom, so this patch has little impact.
> 
> Also, if the target-specific cost estimate is accurate or allows for
> margins, the impact should be similarly small.
I believe this part of ifcvt does/did not use the costing on purpose.
It will generally convert more sequences than other paths that compare
before and after costs since we just count the number of converted
insns comparing them against the "branch costs".  Similar to rtx costs
they are kind of relative to a single insn but AFAIK it's not used
consistently everywhere.  All the major platforms have low branch costs
nowadays (0 or 1?) thus we won't emit too many conditional moves here.

In general I agree that we should compare costs everywhere and not just
count (the costing should include the branch costs as well) but this would
be a major overhaul.  For your case (assuming xtensa), could you not
tune xtensa_branch_cost?  It is currently 3 allowing up to 4 conditional
moves to be generated.  optimize_function_for_speed_p is already being
passed to the hook so you could make use of that and decrease branch
costs when optimizing for size only.

Regards
 Robin
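
The arithmetic in the reply above ("currently 3 allowing up to 4
conditional moves") can be illustrated with a small standalone sketch.
It assumes the count-based limit is the branch cost plus one, which is
consistent with the numbers quoted. The size-mode value of 1 is made up
for illustration only, and toy_branch_cost, toy_max_cond_moves and
toy_count_check_ok are hypothetical names, not GCC functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a target branch-cost hook.  The value 3 is
   the setting quoted in the thread; the size-mode value 1 is an
   assumption chosen only for illustration.  */
int
toy_branch_cost (bool speed_p)
{
  return speed_p ? 3 : 1;
}

/* Assumed limit: at most (branch cost + 1) conditional moves.  */
int
toy_max_cond_moves (bool speed_p)
{
  return toy_branch_cost (speed_p) + 1;
}

/* Return true if a block needing N_MOVES conditional moves would pass
   the count-based check under the assumptions above.  */
bool
toy_count_check_ok (int n_moves, bool speed_p)
{
  assert (n_moves >= 0);
  return n_moves <= toy_max_cond_moves (speed_p);
}
```

With these made-up numbers, a block that needs three conditional moves
passes the check when optimizing for speed but is rejected when
optimizing for size, which is the effect a size-aware branch cost would
be meant to achieve.
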
Takayuki 'January June' Suwa Jan. 12, 2023, 3:34 a.m. UTC | #2
On 2023/01/11 17:02, Robin Dapp wrote:
> Hi,
Hi!

>  
>> On optimizing for speed, default_noce_conversion_profitable_p() allows
>> plenty of headroom, so this patch has little impact.
>>
>> Also, if the target-specific cost estimate is accurate or allows for
>> margins, the impact should be similarly small.
> I believe this part of ifcvt does/did not use the costing on purpose.
> It will generally convert more sequences than other paths that compare
> before and after costs since we just count the number of converted
> insns comparing them against the "branch costs".  Similar to rtx costs
> they are kind of relative to a single insn but AFAIK it's not used
> consistently everywhere.  All the major platforms have low branch costs
> nowadays (0 or 1?) thus we won't emit too many conditional moves here.
> 
> In general I agree that we should compare costs everywhere and not just
> count (the costing should include the branch costs as well) but this would
> be a major overhaul.  For your case (assuming xtensa), could you not
> tune xtensa_branch_cost?  It is currently 3 allowing up to 4 conditional
> moves to be generated.  optimize_function_for_speed_p is already being
> passed to the hook so you could make use of that and decrease branch
> costs when optimizing for size only.
> 
> Regards
>  Robin

Thank you for your detailed explanation.

In my case (for Xtensa), the cost of branching isn't really the issue.
The actual problem, I think, is the cost of the sequence itself before
and after conversion.
ifcvt's internal estimate is based on PATTERN (insn), so the instruction
lengths (the "length" attribute) associated with the insns are not well
reflected.
This is especially noticeable when optimizing for size, where the
original cost is overestimated.

In addition to the patch, I have implemented the following hook, and so
far it appears to work reasonably well (fine adjustments are still
required).

/* Return true if the instruction sequence seq is a good candidate as a
   replacement for the if-convertible sequence described in if_info.  */

static bool
xtensa_noce_conversion_profitable_p (rtx_insn *seq,
				     struct noce_if_info *if_info)
{
  unsigned int cost, original_cost;
  bool speed_p;
  rtx_insn *insn;

  speed_p = if_info->speed_p;  /* of TEST_BB */

  /* Estimate the cost for the replacing sequence.  */
  cost = 0;
  for (insn = seq; insn; insn = NEXT_INSN (insn))
    if (active_insn_p (insn))
      cost += xtensa_insn_cost (insn, speed_p);

  /* Short circuit and margins if optimizing for speed.  */
  if (speed_p)
    return cost <= if_info->max_seq_cost;

  /* Estimate the cost for the original sequence if optimizing for
     size.  */
  original_cost = xtensa_insn_cost (if_info->jump, speed_p);
  speed_p = optimize_bb_for_speed_p (if_info->then_bb);
  FOR_BB_INSNS (if_info->then_bb, insn)
    if (active_insn_p (insn))
      original_cost += xtensa_insn_cost (insn, speed_p);
  if (if_info->else_bb)
    {
      speed_p = optimize_bb_for_speed_p (if_info->else_bb);
      FOR_BB_INSNS (if_info->else_bb, insn)
	if (active_insn_p (insn))
	  original_cost += xtensa_insn_cost (insn, speed_p);
    }

  return cost <= original_cost;
}
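
The size-mode branch of the hook above can be modeled outside of GCC as
follows. This is only a sketch under stated assumptions: struct
toy_insn, toy_seq_size and toy_size_profitable_p are hypothetical names,
each instruction is reduced to its encoded length in bytes, and the
lengths used in any example are illustrative rather than measured.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical instruction record: only the encoded length in bytes
   matters for this size-oriented model.  */
struct toy_insn
{
  unsigned int length;
};

/* Sum the encoded lengths of the N instructions in INSNS.  */
unsigned int
toy_seq_size (const struct toy_insn *insns, size_t n)
{
  unsigned int total = 0;

  assert (insns != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    total += insns[i].length;
  return total;
}

/* Size-mode decision: accept the converted sequence only if it is no
   larger than the jump plus both arms of the original code.  */
bool
toy_size_profitable_p (const struct toy_insn *converted, size_t n_converted,
                       unsigned int jump_length,
                       const struct toy_insn *then_arm, size_t n_then,
                       const struct toy_insn *else_arm, size_t n_else)
{
  unsigned int cost = toy_seq_size (converted, n_converted);
  unsigned int original_cost = jump_length
                               + toy_seq_size (then_arm, n_then)
                               + toy_seq_size (else_arm, n_else);

  return cost <= original_cost;
}
```

For example, three 3-byte conditional moves (9 bytes) would be rejected
as a replacement for a 3-byte jump over two 2-byte moves (7 bytes), but
accepted if an else arm adds another 2-byte move (9 bytes in total).
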
Jeff Law March 11, 2023, 4:30 p.m. UTC | #3
On 1/10/23 21:20, Takayuki 'January June' Suwa via Gcc-patches wrote:
> Currently, cond_move_process_if_block() does the conversion without
> balancing the cost of the converted sequence with the original one, but
> this should be checked by calling targetm.noce_conversion_profitable_p().
> 
> Doing so allows us to provide a way based on the target-specific cost
> estimate, to prevent unwanted size growth due to excessive conditional
> moves on optimizing for size.
> 
> On optimizing for speed, default_noce_conversion_profitable_p() allows
> plenty of headroom, so this patch has little impact.
> 
> Also, if the target-specific cost estimate is accurate or allows for
> margins, the impact should be similarly small.
> 
> gcc/ChangeLog:
> 
> 	* ifcvt.cc (cond_move_process_if_block):
> 	Consider the result of targetm.noce_conversion_profitable_p()
> 	when replacing the original sequence with the converted one.
This is OK for gcc-14 when stage1 opens.  The only way I see including 
it in gcc-13 would be if it fixes a regression.

jeff
Jeff Law April 18, 2023, 8:12 p.m. UTC | #4
On 1/10/23 21:20, Takayuki 'January June' Suwa via Gcc-patches wrote:
> Currently, cond_move_process_if_block() does the conversion without
> balancing the cost of the converted sequence with the original one, but
> this should be checked by calling targetm.noce_conversion_profitable_p().
> 
> Doing so allows us to provide a way based on the target-specific cost
> estimate, to prevent unwanted size growth due to excessive conditional
> moves on optimizing for size.
> 
> On optimizing for speed, default_noce_conversion_profitable_p() allows
> plenty of headroom, so this patch has little impact.
> 
> Also, if the target-specific cost estimate is accurate or allows for
> margins, the impact should be similarly small.
> 
> gcc/ChangeLog:
> 
> 	* ifcvt.cc (cond_move_process_if_block):
> 	Consider the result of targetm.noce_conversion_profitable_p()
> 	when replacing the original sequence with the converted one.
Thanks.  I pushed this to the trunk.

Jeff

Patch

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index 008796838f7..a896e14bb3c 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -4350,7 +4350,7 @@  cond_move_process_if_block (struct noce_if_info *if_info)
       goto done;
     }
   seq = end_ifcvt_sequence (if_info);
-  if (!seq)
+  if (!seq || !targetm.noce_conversion_profitable_p (seq, if_info))
     goto done;
 
   loc_insn = first_active_insn (then_bb);
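
The shape of the changed condition can be sketched as a standalone
model. Everything here is hypothetical and simplified: struct
toy_if_info and struct toy_seq are stand-ins that keep only a few cost
fields, and toy_profitable_p is one example policy, not a copy of any
GCC hook. The point is only that the conversion is now abandoned both
when no sequence was produced and when the profitability hook rejects
it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one conversion attempt.  */
struct toy_if_info
{
  bool speed_p;                 /* optimizing for speed?  */
  unsigned int original_cost;   /* estimated cost of the original code  */
  unsigned int max_seq_cost;    /* headroom allowed when optimizing for speed  */
};

/* Hypothetical converted sequence: only its estimated cost is modeled.  */
struct toy_seq
{
  unsigned int cost;
};

typedef bool (*toy_profitable_fn) (const struct toy_seq *,
                                   const struct toy_if_info *);

/* Example policy (an assumption for illustration): accept anything no
   more expensive than the original, and allow extra headroom only when
   optimizing for speed.  */
bool
toy_profitable_p (const struct toy_seq *seq, const struct toy_if_info *info)
{
  if (seq->cost <= info->original_cost)
    return true;
  return info->speed_p && seq->cost <= info->max_seq_cost;
}

/* Mirrors the shape of the patched condition: give up when there is no
   sequence or when the hook rejects it.  */
bool
toy_try_convert (const struct toy_seq *seq, const struct toy_if_info *info,
                 toy_profitable_fn hook)
{
  assert (info != NULL && hook != NULL);
  if (!seq || !hook (seq, info))
    return false;   /* corresponds to "goto done"  */
  return true;      /* the conversion would proceed  */
}
```

Under this example policy, a sequence costing 12 against an original
cost of 8 and a maximum of 20 would still be converted when optimizing
for speed, but not when optimizing for size.
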