diff mbox series

[09/29,arm] Correctly cost addition with a carry-in

Message ID 20191018194900.34795-10-Richard.Earnshaw@arm.com
State New
Headers show
Series Rewrite DImode arithmetic support | expand

Commit Message

Richard Earnshaw (lists) Oct. 18, 2019, 7:48 p.m. UTC
The cost routine for Arm and Thumb2 was not recognising the idioms that
describe the addition with carry, this results in the instructions
appearing more expensive than they really are, which occasionally can lead
to poor choices by combine.  Recognising all the possible variants is
a little trickier than normal because the expressions can become complex
enough that this is no single canonical from.

	* config/arm/arm.c (strip_carry_operation): New function.
	(arm_rtx_costs_internal, case PLUS): Handle addtion with carry-in
	for SImode.
---
 gcc/config/arm/arm.c | 76 +++++++++++++++++++++++++++++++++++++-------
 1 file changed, 65 insertions(+), 11 deletions(-)

Comments

Segher Boessenkool Oct. 19, 2019, 1 p.m. UTC | #1
On Fri, Oct 18, 2019 at 08:48:40PM +0100, Richard Earnshaw wrote:
> 
> The cost routine for Arm and Thumb2 was not recognising the idioms that
> describe the addition with carry, this results in the instructions
> appearing more expensive than they really are, which occasionally can lead
> to poor choices by combine.  Recognising all the possible variants is
> a little trickier than normal because the expressions can become complex
> enough that this is no single canonical from.

There also is the insn_cost hook, which especially for RISC-like targets
is a lot easier to define.


Segher
Richard Earnshaw (lists) Oct. 21, 2019, 2:46 p.m. UTC | #2
On 19/10/2019 14:00, Segher Boessenkool wrote:
> On Fri, Oct 18, 2019 at 08:48:40PM +0100, Richard Earnshaw wrote:
>>
>> The cost routine for Arm and Thumb2 was not recognising the idioms that
>> describe the addition with carry, this results in the instructions
>> appearing more expensive than they really are, which occasionally can lead
>> to poor choices by combine.  Recognising all the possible variants is
>> a little trickier than normal because the expressions can become complex
>> enough that this is no single canonical from.
> 
> There also is the insn_cost hook, which especially for RISC-like targets
> is a lot easier to define.
> 
> 
> Segher
> 

Easier, but not a complete replacement for rtx_costs, so not necessarily 
easier in the end...

R.
Segher Boessenkool Oct. 21, 2019, 3:46 p.m. UTC | #3
On Mon, Oct 21, 2019 at 03:46:53PM +0100, Richard Earnshaw (lists) wrote:
> On 19/10/2019 14:00, Segher Boessenkool wrote:
> >On Fri, Oct 18, 2019 at 08:48:40PM +0100, Richard Earnshaw wrote:
> >>
> >>The cost routine for Arm and Thumb2 was not recognising the idioms that
> >>describe the addition with carry, this results in the instructions
> >>appearing more expensive than they really are, which occasionally can lead
> >>to poor choices by combine.  Recognising all the possible variants is
> >>a little trickier than normal because the expressions can become complex
> >>enough that this is no single canonical from.
> >
> >There also is the insn_cost hook, which especially for RISC-like targets
> >is a lot easier to define.
> 
> Easier, but not a complete replacement for rtx_costs, so not necessarily 
> easier in the end...

It isn't a full replacement *yet*, still chipping away at it.  If your
port has an rtx_cost already, adding ai reasonable insn_cost will only
improve it, not regress anything.

But there are some places that still need rtx_costs, yes.

Do you have anything in particular in mind?  PRs welcome!


Segher
Richard Earnshaw (lists) Oct. 21, 2019, 4:06 p.m. UTC | #4
On 21/10/2019 16:46, Segher Boessenkool wrote:
> On Mon, Oct 21, 2019 at 03:46:53PM +0100, Richard Earnshaw (lists) wrote:
>> On 19/10/2019 14:00, Segher Boessenkool wrote:
>>> On Fri, Oct 18, 2019 at 08:48:40PM +0100, Richard Earnshaw wrote:
>>>>
>>>> The cost routine for Arm and Thumb2 was not recognising the idioms that
>>>> describe the addition with carry, this results in the instructions
>>>> appearing more expensive than they really are, which occasionally can lead
>>>> to poor choices by combine.  Recognising all the possible variants is
>>>> a little trickier than normal because the expressions can become complex
>>>> enough that this is no single canonical from.
>>>
>>> There also is the insn_cost hook, which especially for RISC-like targets
>>> is a lot easier to define.
>>
>> Easier, but not a complete replacement for rtx_costs, so not necessarily
>> easier in the end...
> 
> It isn't a full replacement *yet*, still chipping away at it.  If your
> port has an rtx_cost already, adding ai reasonable insn_cost will only
> improve it, not regress anything.
> 
> But there are some places that still need rtx_costs, yes.
> 
> Do you have anything in particular in mind?  PRs welcome!
> 
> 
> Segher
> 

Nothing specific.  I have vague recollections of a few places that 
generated quite complex RTL expressions and then expected rtx_costs to 
do the decomposition into the cost of the sequence for computing that. 
Perhaps the mul synthesis code was like that, but I'm probably 
misremembering it.

Certainly an API where you *know* the RTL is a legal insn and not 
something made up by the mid-end would be a real step forward - I think 
there are still places in the mid-end that generate (PLUS:QI ...), for 
example, and simple expect rtx_cost to return a useful answer even on a 
RISC target with no such instruction.

R.
Segher Boessenkool Oct. 21, 2019, 4:32 p.m. UTC | #5
On Mon, Oct 21, 2019 at 05:06:20PM +0100, Richard Earnshaw (lists) wrote:
> On 21/10/2019 16:46, Segher Boessenkool wrote:
> >>>There also is the insn_cost hook, which especially for RISC-like targets
> >>>is a lot easier to define.
> >>
> >>Easier, but not a complete replacement for rtx_costs, so not necessarily
> >>easier in the end...
> >
> >It isn't a full replacement *yet*, still chipping away at it.  If your
> >port has an rtx_cost already, adding ai reasonable insn_cost will only
> >improve it, not regress anything.
> >
> >But there are some places that still need rtx_costs, yes.
> >
> >Do you have anything in particular in mind?  PRs welcome!
> 
> Nothing specific.  I have vague recollections of a few places that 
> generated quite complex RTL expressions and then expected rtx_costs to 
> do the decomposition into the cost of the sequence for computing that. 
> Perhaps the mul synthesis code was like that, but I'm probably 
> misremembering it.

Yeah, and there are a few other (less important) places that do not have
anything like an insn.

> Certainly an API where you *know* the RTL is a legal insn and not 
> something made up by the mid-end would be a real step forward -

That is exactly what insn_cost is.  Well, it takes an rtx_insn *, even.

> I think 
> there are still places in the mid-end that generate (PLUS:QI ...), for 
> example, and simple expect rtx_cost to return a useful answer even on a 
> RISC target with no such instruction.

Yup.

CSE still uses rtx_cost directly.  avoid_expensive_constant does, too, and
some other optabs stuff.

Less important are the uses in dojump, expr, expmed (there is your "mult"),

But the really big ones left are set_src_cost and set_rtx_cost I think.


Segher
diff mbox series

Patch

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 9a779e24cac..dfbd5cde5eb 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -9504,6 +9504,20 @@  thumb1_size_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer)
     }
 }
 
+/* Helper function for arm_rtx_costs.  If one operand of the OP, a
+   PLUS, adds the carry flag, then return the other operand.  If
+   neither is a carry, return OP unchanged.  */
+static rtx
+strip_carry_operation (rtx op)
+{
+  gcc_assert (GET_CODE (op) == PLUS);
+  if (arm_carry_operation (XEXP (op, 0), GET_MODE (op)))
+    return XEXP (op, 1);
+  else if (arm_carry_operation (XEXP (op, 1), GET_MODE (op)))
+    return XEXP (op, 0);
+  return op;
+}
+
 /* Helper function for arm_rtx_costs.  If the operand is a valid shift
    operand, then return the operand that is being shifted.  If the shift
    is not by a constant, then set SHIFT_REG to point to the operand.
@@ -10253,8 +10267,41 @@  arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code,
 	      return true;
 	    }
 
+	  rtx op0 = XEXP (x, 0);
+	  rtx op1 = XEXP (x, 1);
+
+	  /* Handle a side effect of adding in the carry to an addition.  */
+	  if (GET_CODE (op0) == PLUS
+	      && arm_carry_operation (op1, mode))
+	    {
+	      op1 = XEXP (op0, 1);
+	      op0 = XEXP (op0, 0);
+	    }
+	  else if (GET_CODE (op1) == PLUS
+		   && arm_carry_operation (op0, mode))
+	    {
+	      op0 = XEXP (op1, 0);
+	      op1 = XEXP (op1, 1);
+	    }
+	  else if (GET_CODE (op0) == PLUS)
+	    {
+	      op0 = strip_carry_operation (op0);
+	      if (swap_commutative_operands_p (op0, op1))
+		std::swap (op0, op1);
+	    }
+
+	  if (arm_carry_operation (op0, mode))
+	    {
+	      /* Adding the carry to a register is a canonicalization of
+		 adding 0 to the register plus the carry.  */
+	      if (speed_p)
+		*cost += extra_cost->alu.arith;
+	      *cost += rtx_cost (op1, mode, PLUS, 1, speed_p);
+	      return true;
+	    }
+
 	  shift_reg = NULL;
-	  shift_op = shifter_op_p (XEXP (x, 0), &shift_reg);
+	  shift_op = shifter_op_p (op0, &shift_reg);
 	  if (shift_op != NULL)
 	    {
 	      if (shift_reg)
@@ -10267,12 +10314,13 @@  arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code,
 		*cost += extra_cost->alu.arith_shift;
 
 	      *cost += (rtx_cost (shift_op, mode, ASHIFT, 0, speed_p)
-			+ rtx_cost (XEXP (x, 1), mode, PLUS, 1, speed_p));
+			+ rtx_cost (op1, mode, PLUS, 1, speed_p));
 	      return true;
 	    }
-	  if (GET_CODE (XEXP (x, 0)) == MULT)
+
+	  if (GET_CODE (op0) == MULT)
 	    {
-	      rtx mul_op = XEXP (x, 0);
+	      rtx mul_op = op0;
 
 	      if (TARGET_DSP_MULTIPLY
 		  && ((GET_CODE (XEXP (mul_op, 0)) == SIGN_EXTEND
@@ -10296,7 +10344,7 @@  arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code,
 				      SIGN_EXTEND, 0, speed_p)
 			    + rtx_cost (XEXP (XEXP (mul_op, 1), 0), mode,
 					SIGN_EXTEND, 0, speed_p)
-			    + rtx_cost (XEXP (x, 1), mode, PLUS, 1, speed_p));
+			    + rtx_cost (op1, mode, PLUS, 1, speed_p));
 		  return true;
 		}
 
@@ -10304,24 +10352,30 @@  arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code,
 		*cost += extra_cost->mult[0].add;
 	      *cost += (rtx_cost (XEXP (mul_op, 0), mode, MULT, 0, speed_p)
 			+ rtx_cost (XEXP (mul_op, 1), mode, MULT, 1, speed_p)
-			+ rtx_cost (XEXP (x, 1), mode, PLUS, 1, speed_p));
+			+ rtx_cost (op1, mode, PLUS, 1, speed_p));
 	      return true;
 	    }
-	  if (CONST_INT_P (XEXP (x, 1)))
+
+	  if (CONST_INT_P (op1))
 	    {
 	      int insns = arm_gen_constant (PLUS, SImode, NULL_RTX,
-					    INTVAL (XEXP (x, 1)), NULL_RTX,
+					    INTVAL (op1), NULL_RTX,
 					    NULL_RTX, 1, 0);
 	      *cost = COSTS_N_INSNS (insns);
 	      if (speed_p)
 		*cost += insns * extra_cost->alu.arith;
-	      *cost += rtx_cost (XEXP (x, 0), mode, PLUS, 0, speed_p);
+	      *cost += rtx_cost (op0, mode, PLUS, 0, speed_p);
 	      return true;
 	    }
-	  else if (speed_p)
+
+	  if (speed_p)
 	    *cost += extra_cost->alu.arith;
 
-	  return false;
+	  /* Don't recurse here because we want to test the operands
+	     without any carry operation.  */
+	  *cost += rtx_cost (op0, mode, PLUS, 0, speed_p);
+	  *cost += rtx_cost (op1, mode, PLUS, 1, speed_p);
+	  return true;
 	}
 
       if (mode == DImode)