diff mbox

Fix gcc.dg/lower-subreg-1.c failure (was: [C Patch]: pr52543)

Message ID 201205090602.q4962Pli001609@ignucius.se.axis.com
State New
Headers show

Commit Message

Hans-Peter Nilsson May 9, 2012, 6:02 a.m. UTC
> From: Richard Sandiford <rdsandiford@googlemail.com>
> Date: Tue, 1 May 2012 16:46:38 +0200

> To repeat: as things stand, very few targets define proper rtx costs
> for SET.

IMHO it's wrong to start blaming targets when rtx_cost doesn't
take the mode in account in the first place, for the default
cost.  (Well, except for the modes-tieable subreg special-case.)
The targets where an operation in N * word_mode costs no more
than one in word_mode, if there even is one, is a minority,
let's adjust the defaults to that.

>  This patch is therefore expected to prevent lower-subreg
> from running in cases where it's actually benefical.  If you see that
> happening, please check whether the rtx_costs are defined properly.

Well, for CRIS (one of the targets of the PR53176 fallout) they
are sane, basically.  Where cris_rtx_costs returns true, it
returns mostly(*) ballparkly-correct costs *where it's passed an
rtx for which there's a corresponding insn*, otherwise falling
back to the defaults.  It shouldn't have to check for validity
of the rtx asked about; core GCC already knows which insns there
are and can gate that in rtx_cost or its callers.

(*) I see a bug in that cris_rtx_costs doesn't check the mode
for extendsidi2, to return COSTS_N_INSNS (3) instead of 0
(because a sign-extending move to SImode doesn't cost more than
a move; a sign- or zero-extension is also free in an operand for
addition and multiplication).  But this isn't on the path that
lower-subreg.c takes, so only an incidental observation...

> Of course, if the costs are defined properly and lower-subreg still
> makes the wrong choice, we need to look at why.

By the way, regarding validity of rtx_cost calls:

> +++ gcc/lower-subreg.c  2012-05-01 09:46:48.473830772 +0100

> +/* Return the cost of a CODE shift in mode MODE by OP1 bits, using the
> +   rtxes in RTXES.  SPEED_P selects between the speed and size cost.  */
> +
> +static int
> +shift_cost (bool speed_p, struct cost_rtxes *rtxes, enum rtx_code code,
> +           enum machine_mode mode, int op1)
> +{
> +  PUT_MODE (rtxes->target, mode);
> +  PUT_CODE (rtxes->shift, code);
> +  PUT_MODE (rtxes->shift, mode);
> +  PUT_MODE (rtxes->source, mode);
> +  XEXP (rtxes->shift, 1) = GEN_INT (op1);
> +  SET_SRC (rtxes->set) = rtxes->shift;
> +  return insn_rtx_cost (rtxes->set, speed_p);
> +}
> +
> +/* For each X in the range [0, BITS_PER_WORD), set SPLITTING[X]
> +   to true if it is profitable to split a double-word CODE shift
> +   of X + BITS_PER_WORD bits.  SPEED_P says whether we are testing
> +   for speed or size profitability.
> +
> +   Use the rtxes in RTXES to calculate costs.  WORD_MOVE_ZERO_COST is
> +   the cost of moving zero into a word-mode register.  WORD_MOVE_COST
> +   is the cost of moving between word registers.  */
> +
> +static void
> +compute_splitting_shift (bool speed_p, struct cost_rtxes *rtxes,
> +                        bool *splitting, enum rtx_code code,
> +                        int word_move_zero_cost, int word_move_cost)
> +{

I think there should be a gating check whether the target
implements that kind of shift in that mode at all, before
checking the cost.  Not sure whether it's generally best to put
that test here, or to make the rtx_cost function return the cost
of a libcall for that mode when that happens.  Similar for the
other insns.

Isn't the below better than doing virtually the same in each
target's rtx_costs?  Not tested yet besides "make cc1" and
checking that lower-subreg.c yields sane costs and that
gcc.dg/lower-subreg-1.c passes for cris-elf.  Note that
untieable SUBREGs still get a higher cost than tieable ones.

I'll test this for cris-elf, please tell me if/what other tests
and targets are required (simulator or compilefarm targets only,
please).

	* rtlanal.c (rtx_cost): Adjust default cost for X with a
	UNITS_PER_WORD factor for all X according to the size of
	its mode, not just for SUBREGs with untieable modes.


brgds, H-P

Comments

Richard Sandiford May 9, 2012, 9:14 a.m. UTC | #1
Hans-Peter Nilsson <hans-peter.nilsson@axis.com> writes:
>> From: Richard Sandiford <rdsandiford@googlemail.com>
>> Date: Tue, 1 May 2012 16:46:38 +0200
>
>> To repeat: as things stand, very few targets define proper rtx costs
>> for SET.
>
> IMHO it's wrong to start blaming targets when rtx_cost doesn't
> take the mode in account in the first place, for the default
> cost.  (Well, except for the modes-tieable subreg special-case.)
> The targets where an operation in N * word_mode costs no more
> than one in word_mode, if there even is one, is a minority,
> let's adjust the defaults to that.

I'll pass on approving or disapproving this patch, but for the record:
a factor of word_mode seems a bit too simplistic.  It's OK for moves
and logic ops, but addition of multiword modes is a bit more complicated.
Multiplication and division by multiword modes is even more so, of course.

>> Of course, if the costs are defined properly and lower-subreg still
>> makes the wrong choice, we need to look at why.
>
> By the way, regarding validity of rtx_cost calls:
>
>> +++ gcc/lower-subreg.c  2012-05-01 09:46:48.473830772 +0100
>
>> +/* Return the cost of a CODE shift in mode MODE by OP1 bits, using the
>> +   rtxes in RTXES.  SPEED_P selects between the speed and size cost.  */
>> +
>> +static int
>> +shift_cost (bool speed_p, struct cost_rtxes *rtxes, enum rtx_code code,
>> +           enum machine_mode mode, int op1)
>> +{
>> +  PUT_MODE (rtxes->target, mode);
>> +  PUT_CODE (rtxes->shift, code);
>> +  PUT_MODE (rtxes->shift, mode);
>> +  PUT_MODE (rtxes->source, mode);
>> +  XEXP (rtxes->shift, 1) = GEN_INT (op1);
>> +  SET_SRC (rtxes->set) = rtxes->shift;
>> +  return insn_rtx_cost (rtxes->set, speed_p);
>> +}
>> +
>> +/* For each X in the range [0, BITS_PER_WORD), set SPLITTING[X]
>> +   to true if it is profitable to split a double-word CODE shift
>> +   of X + BITS_PER_WORD bits.  SPEED_P says whether we are testing
>> +   for speed or size profitability.
>> +
>> +   Use the rtxes in RTXES to calculate costs.  WORD_MOVE_ZERO_COST is
>> +   the cost of moving zero into a word-mode register.  WORD_MOVE_COST
>> +   is the cost of moving between word registers.  */
>> +
>> +static void
>> +compute_splitting_shift (bool speed_p, struct cost_rtxes *rtxes,
>> +                        bool *splitting, enum rtx_code code,
>> +                        int word_move_zero_cost, int word_move_cost)
>> +{
>
> I think there should be a gating check whether the target
> implements that kind of shift in that mode at all, before
> checking the cost.  Not sure whether it's generally best to put
> that test here, or to make the rtx_cost function return the cost
> of a libcall for that mode when that happens.  Similar for the
> other insns.

This has come up a few times in past discussions about rtx_cost
(as I'm sure you remember :-)).  On the one hand, I agree it might
be nice to shield the backend from invalid insns.  That would
effectively mean calling expand on each insn though, which would be
rather expensive.  Especially in a GC world.  It would also prevent
the target from being able to take things like pipeline characteristics
into account.  E.g. if you know that something takes 2 operations
that must be issued sequentially, you might want to add in an extra
(sub-COSTS_N_INSN (1)) cost compared to 2 operations that can be issued
together, even if the individual operations take the same number of cycles.

The same goes for multiplication in general.  On some targets
the speed of a multiplication can depend on the second operand,
so knowing its value is useful even if the target doesn't have
a multiplication instruction that takes constant operands.

As things stand, rtx_cost is intentionally used for more than
just valid target insns.  One of the main uses has always been
to decide whether it is better to implement multiplications by a
shift-and-add sequence, or whether to use multiplication instructions.
The associated expmed code has never tried to decide which shifts are
actually representable as matching rtl insns and which aren't.
The same goes for division-using-multiplication.

So I think this patch is using rtx_cost according to its current
interface.  If someone wants to change or restrict that interface,
than that's a separate change IMO.  And it should be done consistently
rather than in this one place.

In this case it doesn't matter anyway.  If we never see a shift
in mode M by amount X, we'll never need to make a decision about
whether to split it.

> Isn't the below better than doing virtually the same in each
> target's rtx_costs?

FWIW, MIPS, SH and x86 all used something slightly different (and more
complicated).  I imagine PowerPC and SPARC will too.  So "each" seems
a bit strong.

That's not an objection to the patch though.  I realise some ports do
have very regular architectures where every register is the same width
and has the same move cost.

Richard
diff mbox

Patch

Index: gcc/rtlanal.c
===================================================================
--- gcc/rtlanal.c	(revision 187308)
+++ gcc/rtlanal.c	(working copy)
@@ -3755,10 +3755,17 @@  rtx_cost (rtx x, enum rtx_code outer_cod
   enum rtx_code code;
   const char *fmt;
   int total;
+  int factor;
 
   if (x == 0)
     return 0;
 
+  /* A size N times larger than UNITS_PER_WORD likely needs N times as
+     many insns, taking N times as long.  */
+  factor = GET_MODE_SIZE (GET_MODE (x)) / UNITS_PER_WORD;
+  if (factor == 0)
+    factor = 1;
+
   /* Compute the default costs of certain things.
      Note that targetm.rtx_costs can override the defaults.  */
 
@@ -3766,20 +3773,27 @@  rtx_cost (rtx x, enum rtx_code outer_cod
   switch (code)
     {
     case MULT:
-      total = COSTS_N_INSNS (5);
+      total = factor * COSTS_N_INSNS (5);
       break;
     case DIV:
     case UDIV:
     case MOD:
     case UMOD:
-      total = COSTS_N_INSNS (7);
+      total = factor * COSTS_N_INSNS (7);
       break;
     case USE:
       /* Used in combine.c as a marker.  */
       total = 0;
       break;
+    case SET:
+      /* A SET doesn't have a mode, so let's look at the SET_DEST to get
+	 the mode for the factor.  */
+      factor = GET_MODE_SIZE (GET_MODE (SET_DEST (x))) / UNITS_PER_WORD;
+      if (factor == 0)
+	factor = 1;
+      /* Pass through.  */
     default:
-      total = COSTS_N_INSNS (1);
+      total = factor * COSTS_N_INSNS (1);
     }
 
   switch (code)
@@ -3792,8 +3806,7 @@  rtx_cost (rtx x, enum rtx_code outer_cod
       /* If we can't tie these modes, make this expensive.  The larger
 	 the mode, the more expensive it is.  */
       if (! MODES_TIEABLE_P (GET_MODE (x), GET_MODE (SUBREG_REG (x))))
-	return COSTS_N_INSNS (2
-			      + GET_MODE_SIZE (GET_MODE (x)) / UNITS_PER_WORD);
+	return COSTS_N_INSNS (2 + factor);
       break;
 
     default: