
Patch: Revert find_reloads part of the IRA merge

Message ID: 4C370373.7000607@codesourcery.com
State: New

Commit Message

Bernd Schmidt July 9, 2010, 11:09 a.m. UTC
For a while now I've been struggling with a few Thumb-1 patches, because
trying to tune the backend's code generation always seemed to produce
some inexplicable results.  I've finally tracked the problem down to a
change in reload.

As part of the big IRA merge, find_reloads was changed to prefer
alternatives for which fewer operands require a
SMALL_REGISTER_CLASS_P class.  (I'll repeat what I said at the time:
changes such as this should have been posted and reviewed separately
from IRA.)

This change causes problems for Thumb-1, because its general register
class, LO_REGS, is CLASS_LIKELY_SPILLED_P, while the high registers
(which are fewer in number and harder to access) aren't.  As a result,
we get bizarre results where reload prefers alternatives using high
registers over alternatives using low registers, or even prefers
alternatives in which fewer operands allow registers at all.  Another
defect is that the code doesn't take goal_alternative_win into account.

There are other conceivable ways of tuning this heuristic, e.g.
computing the smallest register class size over all operands for an
alternative, and then picking the alternative that has the highest such
minimum.  I tested a patch for this, but thinking about it some more,
this also has the potential for unpleasant surprises: imagine a Thumb-1
insn with one alternative allowing "l" (LO_REGS) and another allowing
"lk" (LO_REGS + the stack pointer); we'd always prefer the second one
for no reason.  At this point I decided that the best thing to do for
now would be to revert this change.
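
For concreteness, here is roughly what such a tie-breaker could look
like, written against locals that already exist in find_reloads
(this_alternative, noperands) plus the global reg_class_size[]; it is
illustrative only and not part of the patch:

  /* Sketch: smallest class size among the operands of the alternative
     currently being scored.  */
  int min_class_size = INT_MAX;
  for (i = 0; i < noperands; i++)
    if (this_alternative[i] != NO_REGS)
      min_class_size = MIN (min_class_size,
                            (int) reg_class_size[this_alternative[i]]);
  /* Ties on LOSERS would then go to the alternative with the larger
     MIN_CLASS_SIZE, which is why the "l" vs. "lk" example above would
     always prefer the "lk" alternative for no good reason.  */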

It should be possible to do better here.  The thorough, but likely
expensive, way would be to record reloads for all alternatives with the
same cost, then choose the best one (the one requiring the fewest
spills) in the caller using find_reload_regs.  A simpler way, closer to
Vlad's change here, would be to determine whether an operand is input,
output, or both, pick the right regset(s) from the insn_chain, check
whether any registers are available, and then count the number of
known forced spills for each alternative.
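
A rough sketch of that simpler idea, again illustrative only: the chain
would have to be passed down into find_reloads, and
class_has_no_free_reg_p is a hypothetical helper, not existing code:

  /* Sketch: count operands whose class has no register free around
     this insn.  The hypothetical helper would pick the appropriate
     regset(s) that the insn_chain records for this insn, depending on
     whether operand I is an input, an output, or both.  */
  int forced_spills = 0;
  for (i = 0; i < noperands; i++)
    {
      enum reg_class rclass = this_alternative[i];
      if (rclass != NO_REGS
          && class_has_no_free_reg_p (chain, rclass, i))
        forced_spills++;
    }
  /* When two alternatives tie on the LOSERS cost, prefer the one with
     the smaller FORCED_SPILLS.  */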

It might also be possible to achieve a similar effect by just reordering
alternatives in a port that wishes to prevent reload from using
likely-spilled classes if possible.

The patch below has bootstrapped on i686-linux; regression tests in
progress.  I've also tested completely different patches for the
problem, but I guess no one wants to know that. :-)

Typical code generation differences on Thumb-1 look like this:

-       mov     r2, #192
-       mov     r3, r2
-       add     r3, r3, r9
+       mov     r3, r9
+       add     r3, r3, #192

There's a small arm-specific bit in the patch, which switches two
alternatives on the grounds that synthesizing a negative constant in a
register is more expensive than reloading one register into another.
Hence, the alternative using constants should be preferred if it can
match, and thus it should go first:

-       mov     r5, #32
-       neg     r5, r5
-       add     r4, r2, r5
+       mov     r4, r2
+       sub     r4, r4, #32
        bmi     .L2

This was a case where the find_reloads code produced better results for
entirely the wrong reasons.

I'll commit the reload patch if no one objects soonish and if the ARM
part is approved.


Bernd

	* reload.c (find_reloads): Revert code to penalize small register
	classes that was brought in with the IRA merge.
	* config/arm/arm.md (addsi3_cbranch): Switch alternatives 0 and 1.

Comments

Richard Earnshaw July 9, 2010, 1:28 p.m. UTC | #1
On Fri, 2010-07-09 at 13:09 +0200, Bernd Schmidt wrote:
> For a while now I've been struggling with a few Thumb-1 patches, because
> trying to tune the backend's code generation always seemed to produce
> some inexplicable results.  I've finally tracked the problem down to a
> change in reload.
> 
> As part of the big IRA merge, find_reloads was changed to prefer
> alternatives for which fewer operands require a
> SMALL_REGISTER_CLASS_P class.  (I'll repeat what I said at the time:
> changes such as this should have been posted and reviewed separately
> from IRA.)
> 
> This change causes problems for Thumb-1, because its general register
> class, LO_REGS, is CLASS_LIKELY_SPILLED_P, while the high registers
> (which are fewer in number and harder to access) aren't.  As a result,
> we get bizarre results where reload prefers alternatives using high
> registers over alternatives using low registers, or even prefers
> alternatives in which fewer operands allow registers at all.  Another
> defect is that the code doesn't take goal_alternative_win into account.
> 
> There are other conceivable ways of tuning this heuristic, e.g.
> computing the smallest register class size over all operands for an
> alternative, and then picking the alternative that has the highest such
> minimum.  I tested a patch for this, but thinking about it some more,
> this also has the potential for unpleasant surprises: imagine a Thumb-1
> insn with one alternative allowing "l" (LO_REGS) and another allowing
> "lk" (LO_REGS + the stack pointer); we'd always prefer the second one
> for no reason.  At this point I decided that the best thing to do for
> now would be to revert this change.
> 
> It should be possible to do better here.  The thorough, but likely
> expensive, way would be to record reloads for all alternatives with the
> same cost, then choose the best one (the one requiring the fewest
> spills) in the caller using find_reload_regs.  A simpler way, closer to
> Vlad's change here, would be to determine whether an operand is input,
> output, or both, pick the right regset(s) from the insn_chain, check
> whether any registers are available, and then count the number of
> known forced spills for each alternative.
> 
> It might also be possible to achieve a similar effect by just reordering
> alternatives in a port that wishes to prevent reload from using
> likely-spilled classes if possible.
> 
> The patch below has bootstrapped on i686-linux; regression tests in
> progress.  I've also tested completely different patches for the
> problem, but I guess no one wants to know that. :-)
> 
> Typical code generation differences on Thumb-1 look like this:
> 
> -       mov     r2, #192
> -       mov     r3, r2
> -       add     r3, r3, r9
> +       mov     r3, r9
> +       add     r3, r3, #192
> 
> There's a small arm-specific bit in the patch, which switches two
> alternatives on the grounds that synthesizing a negative constant in a
> register is more expensive than reloading one register into another.
> Hence, the alternative using constants should be preferred if it can
> match, and thus it should go first:
> 
> -       mov     r5, #32
> -       neg     r5, r5
> -       add     r4, r2, r5
> +       mov     r4, r2
> +       sub     r4, r4, #32
>         bmi     .L2
> 
> This was a case where the find_reloads code produced better results for
> entirely the wrong reasons.
> 
> I'll commit the reload patch if no one objects soonish and if the ARM
> part is approved.
> 
> 
> Bernd

The ARM bit is fine.

R.

Patch

Index: reload.c
===================================================================
--- reload.c	(revision 161987)
+++ reload.c	(working copy)
@@ -2602,7 +2602,6 @@  find_reloads (rtx insn, int replace, int
   char goal_alternative_earlyclobber[MAX_RECOG_OPERANDS];
   int goal_alternative_swapped;
   int best;
-  int best_small_class_operands_num;
   int commutative;
   char operands_match[MAX_RECOG_OPERANDS][MAX_RECOG_OPERANDS];
   rtx substed_operand[MAX_RECOG_OPERANDS];
@@ -2928,7 +2927,6 @@  find_reloads (rtx insn, int replace, int
      all the operands together against the register constraints.  */
 
   best = MAX_RECOG_OPERANDS * 2 + 600;
-  best_small_class_operands_num = 0;
 
   swapped = 0;
   goal_alternative_swapped = 0;
@@ -3714,27 +3712,7 @@  find_reloads (rtx insn, int replace, int
 	 record it as the chosen goal for reloading.  */
       if (! bad)
 	{
-	  bool change_p = false;
-	  int small_class_operands_num = 0;
-
-	  if (best >= losers)
-	    {
-	      for (i = 0; i < noperands; i++)
-		small_class_operands_num
-		  += SMALL_REGISTER_CLASS_P (this_alternative[i]) ? 1 : 0;
-	      if (best > losers
-		  || (best == losers
-		      /* If the cost of the reloads is the same,
-			 prefer alternative which requires minimal
-			 number of small register classes for the
-			 operands.  This improves chances of reloads
-			 for insn requiring small register
-			 classes.  */
-		      && (small_class_operands_num
-			  < best_small_class_operands_num)))
-		change_p = true;
-	    }
-	  if (change_p)
+	  if (best > losers)
 	    {
 	      for (i = 0; i < noperands; i++)
 		{
@@ -3750,7 +3728,6 @@  find_reloads (rtx insn, int replace, int
 		}
 	      goal_alternative_swapped = swapped;
 	      best = losers;
-	      best_small_class_operands_num = small_class_operands_num;
 	      goal_alternative_number = this_alternative_number;
 	      goal_earlyclobber = this_earlyclobber;
 	    }
Index: config/arm/arm.md
===================================================================
--- config/arm/arm.md	(revision 161987)
+++ config/arm/arm.md	(working copy)
@@ -7459,8 +7459,8 @@  (define_insn "*addsi3_cbranch"
 	(if_then_else
 	 (match_operator 4 "arm_comparison_operator"
 	  [(plus:SI
-	    (match_operand:SI 2 "s_register_operand" "%l,0,*l,1,1,1")
-	    (match_operand:SI 3 "reg_or_int_operand" "lL,IJ,*l,lIJ,lIJ,lIJ"))
+	    (match_operand:SI 2 "s_register_operand" "%0,l,*l,1,1,1")
+	    (match_operand:SI 3 "reg_or_int_operand" "IJ,lL,*l,lIJ,lIJ,lIJ"))
 	   (const_int 0)])
 	 (label_ref (match_operand 5 "" ""))
 	 (pc)))