[11/29,arm] Reduce cost of insns that are simple reg-reg moves.
diff mbox series

Message ID 20191018194900.34795-12-Richard.Earnshaw@arm.com
State New
Headers show
Series
  • Rewrite DImode arithmetic support
Related show

Commit Message

Richard Earnshaw (lists) Oct. 18, 2019, 7:48 p.m. UTC
Consider this sequence during combine:

Trying 18, 7 -> 22:
   18: r118:SI=r122:SI
      REG_DEAD r122:SI
    7: r114:SI=0x1-r118:SI-ltu(cc:CC_RSB,0)
      REG_DEAD r118:SI
      REG_DEAD cc:CC_RSB
   22: r1:SI=r114:SI
      REG_DEAD r114:SI
Failed to match this instruction:
(set (reg:SI 1 r1 [+4 ])
    (minus:SI (geu:SI (reg:CC_RSB 100 cc)
            (const_int 0 [0]))
        (reg:SI 122)))
Successfully matched this instruction:
(set (reg:SI 114)
    (geu:SI (reg:CC_RSB 100 cc)
        (const_int 0 [0])))
Successfully matched this instruction:
(set (reg:SI 1 r1 [+4 ])
    (minus:SI (reg:SI 114)
        (reg:SI 122)))
allowing combination of insns 18, 7 and 22
original costs 4 + 4 + 4 = 12
replacement costs 8 + 4 = 12

The costs are all correct, but we really don't want this combination
to take place.  The original costs contain an insn that is a simple
move of one pseudo register to another and it is extremely likely that
register allocation will eliminate this insn entirely.  On the other
hand, the resulting sequence really does expand into a sequence that
costs 12 (ie 3 insns).

We don't want to prevent combine from eliminating such moves, as this
can expose more combine opportunities, but we shouldn't rate them as
profitable in themselves.  We can do this be adjusting the costs
slightly so that the benefit of eliminating such a simple insn is
reduced.

We only do this before register allocation; after allocation we give
such insns their full cost.

	* config/arm/arm.c (arm_insn_cost): New function.
	(TARGET_INSN_COST): Override default definition.
---
 gcc/config/arm/arm.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

Comments

Segher Boessenkool Oct. 19, 2019, 4:17 p.m. UTC | #1
On Fri, Oct 18, 2019 at 08:48:42PM +0100, Richard Earnshaw wrote:
> 
> Consider this sequence during combine:
> 
> Trying 18, 7 -> 22:
>    18: r118:SI=r122:SI
>       REG_DEAD r122:SI
>     7: r114:SI=0x1-r118:SI-ltu(cc:CC_RSB,0)
>       REG_DEAD r118:SI
>       REG_DEAD cc:CC_RSB
>    22: r1:SI=r114:SI
>       REG_DEAD r114:SI

r122 is dead here.  combine will have tried just 7+22 before, and that
failed; it now only succeeds because the register copy is removed.

All like you say and what your patch is about.  But, this is a generic
problem, that should be solved in combine itself (whether or not you also
want to reduce the insn cost here).

Maybe we should simply disallow pseudo-to-pseudo (or even all) copies when
combining more than two insns, always?  I'll experiment.

> We don't want to prevent combine from eliminating such moves, as this
> can expose more combine opportunities, but we shouldn't rate them as
> profitable in themselves.  We can do this be adjusting the costs
> slightly so that the benefit of eliminating such a simple insn is
> reduced.

Yes, but combine should have removed the move in a 2->1 combination
already, if it is beneficial: both 18->7 and 7->22 should have combined
just fine.  This also points to a potential target problem: why are those
two combinations not allowed?


Segher
Richard Earnshaw (lists) Oct. 21, 2019, 3:46 p.m. UTC | #2
On 19/10/2019 17:17, Segher Boessenkool wrote:
> On Fri, Oct 18, 2019 at 08:48:42PM +0100, Richard Earnshaw wrote:
>>
>> Consider this sequence during combine:
>>
>> Trying 18, 7 -> 22:
>>     18: r118:SI=r122:SI
>>        REG_DEAD r122:SI
>>      7: r114:SI=0x1-r118:SI-ltu(cc:CC_RSB,0)
>>        REG_DEAD r118:SI
>>        REG_DEAD cc:CC_RSB
>>     22: r1:SI=r114:SI
>>        REG_DEAD r114:SI
> 
> r122 is dead here.  combine will have tried just 7+22 before, and that
> failed; it now only succeeds because the register copy is removed.
> 
> All like you say and what your patch is about.  But, this is a generic
> problem, that should be solved in combine itself (whether or not you also
> want to reduce the insn cost here).

This patch can easily be reverted if combine is changed; it doesn't 
really interfere with the other patches in the series.

My real problem was that this case didn't really show up until I 
introduced other patches in this series and this combine simplification 
was then causing the generated code to be significantly worse.

> 
> Maybe we should simply disallow pseudo-to-pseudo (or even all) copies when
> combining more than two insns, always?  I'll experiment.

> 
>> We don't want to prevent combine from eliminating such moves, as this
>> can expose more combine opportunities, but we shouldn't rate them as
>> profitable in themselves.  We can do this be adjusting the costs
>> slightly so that the benefit of eliminating such a simple insn is
>> reduced.
> 
> Yes, but combine should have removed the move in a 2->1 combination
> already, if it is beneficial: both 18->7 and 7->22 should have combined
> just fine.  This also points to a potential target problem: why are those
> two combinations not allowed?
> 
> 


In this case it's because we have a generic pattern for

reg = const - reg - ltu(ccreg)

and the specific case of const == 1, which is then re-ordered as

reg = (1 - ltu(ccreg)) - reg

=> reg = geu(ccreg) - reg

and we don't have a pattern for that.  It would be possible to write one 
if really needed, but because the GEU has to match several different 
modes, it's a bit fiddly and results in a lot of code for not much real 
benefit.

For the 2-insn case we don't try a split and simply give up; but when we 
have a three-insn sequence we then combine tries harder and the generic 
splitter does rewrite that in 2 insns.

Perhaps combine should never try a 3, or 4 insn combination where i0 or 
i1 is a simple reg-reg move and only feeds one subsequent insn in the 
sequence on the basis that if that was really going to help then it 
would have been caught by the two-insn sequence.  If it feeds both i2 
and i3 then that's a different matter, of course.

R.
Segher Boessenkool Oct. 21, 2019, 7:51 p.m. UTC | #3
On Mon, Oct 21, 2019 at 04:46:47PM +0100, Richard Earnshaw (lists) wrote:
> On 19/10/2019 17:17, Segher Boessenkool wrote:
> >Maybe we should simply disallow pseudo-to-pseudo (or even all) copies when
> >combining more than two insns, always?  I'll experiment.

> For the 2-insn case we don't try a split and simply give up; but when we 
> have a three-insn sequence we then combine tries harder and the generic 
> splitter does rewrite that in 2 insns.

When doing a 2-2 combination, we don't try to split currently.  We could,
if that ever is useful, but that needs more work (to make sure we won't
loop, for one thing).

> Perhaps combine should never try a 3, or 4 insn combination where i0 or 
> i1 is a simple reg-reg move and only feeds one subsequent insn in the 
> sequence on the basis that if that was really going to help then it 
> would have been caught by the two-insn sequence.  If it feeds both i2 
> and i3 then that's a different matter, of course.

Yeah...  Maybe a simple move as producer where the reg dies in the consumer
should not be tried with 3 or more insns?  (Never doing combinations with
trivial copies results in a few in a thousand extra insns, pretty bad).


Segher
Richard Earnshaw (lists) Oct. 22, 2019, 1:31 p.m. UTC | #4
On 21/10/2019 16:46, Richard Earnshaw (lists) wrote:
> On 19/10/2019 17:17, Segher Boessenkool wrote:

>> Yes, but combine should have removed the move in a 2->1 combination
>> already, if it is beneficial: both 18->7 and 7->22 should have combined
>> just fine.  This also points to a potential target problem: why are those
>> two combinations not allowed?
>>
>>
> 
> 
> In this case it's because we have a generic pattern for
> 
> reg = const - reg - ltu(ccreg)
> 
> and the specific case of const == 1, which is then re-ordered as
> 
> reg = (1 - ltu(ccreg)) - reg
> 
> => reg = geu(ccreg) - reg
> 
> and we don't have a pattern for that.  It would be possible to write one 
> if really needed, but because the GEU has to match several different 
> modes, it's a bit fiddly and results in a lot of code for not much real 
> benefit.

I was thinking about this some more last night and realised I already 
had all the logic in place to do this.


https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01569.html

So now we will do the right thing in this particular case.  I still 
think, however, that with the current combine approach, costing reg-reg 
pseudo moves at 2 rather than 4 is a useful safeguard to have.

R.

Patch
diff mbox series

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index b91b52f6d51..e33b6b14d28 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -181,6 +181,7 @@  static bool arm_have_conditional_execution (void);
 static bool arm_cannot_force_const_mem (machine_mode, rtx);
 static bool arm_legitimate_constant_p (machine_mode, rtx);
 static bool arm_rtx_costs (rtx, machine_mode, int, int, int *, bool);
+static int arm_insn_cost (rtx_insn *, bool);
 static int arm_address_cost (rtx, machine_mode, addr_space_t, bool);
 static int arm_register_move_cost (machine_mode, reg_class_t, reg_class_t);
 static int arm_memory_move_cost (machine_mode, reg_class_t, bool);
@@ -510,6 +511,8 @@  static const struct attribute_spec arm_attribute_table[] =
 #define TARGET_RTX_COSTS arm_rtx_costs
 #undef  TARGET_ADDRESS_COST
 #define TARGET_ADDRESS_COST arm_address_cost
+#undef TARGET_INSN_COST
+#define TARGET_INSN_COST arm_insn_cost
 
 #undef TARGET_SHIFT_TRUNCATION_MASK
 #define TARGET_SHIFT_TRUNCATION_MASK arm_shift_truncation_mask
@@ -11486,6 +11489,24 @@  arm_rtx_costs (rtx x, machine_mode mode ATTRIBUTE_UNUSED, int outer_code,
   return result;
 }
 
+static int
+arm_insn_cost (rtx_insn *insn, bool speed)
+{
+  int cost;
+
+  /* Don't cost a simple reg-reg move at a full insn cost: such moves
+     will likely disappear during register allocation.  */
+  if (!reload_completed
+      && GET_CODE (PATTERN (insn)) == SET
+      && REG_P (SET_DEST (PATTERN (insn)))
+      && REG_P (SET_SRC (PATTERN (insn))))
+    return 2;
+  cost = pattern_cost (PATTERN (insn), speed);
+  /* If the cost is zero, then it's likely a complex insn.  We don't want the
+     cost of these to be less than something we know about.  */
+  return cost ? cost : COSTS_N_INSNS (2);
+}
+
 /* All address computations that can be done are free, but rtx cost returns
    the same for practically all of them.  So we weight the different types
    of address here in the order (most pref first):