Message ID: 4C370373.7000607@codesourcery.com
State: New
On Fri, 2010-07-09 at 13:09 +0200, Bernd Schmidt wrote:
> For a while now I've been struggling with a few Thumb-1 patches, because
> trying to tune the backend's code generation always seemed to produce
> some inexplicable results.  I've finally tracked the problem down to a
> change in reload.
>
> As part of the big IRA merge, find_reloads was changed to prefer
> alternatives for which fewer operands require a
> SMALL_REGISTER_CLASS_P class.  (I'll repeat what I said at the time:
> changes such as this should have been posted and reviewed separately
> from IRA.)
>
> This change causes problems for Thumb-1, because its general register
> class, LO_REGS, is CLASS_LIKELY_SPILLED_P, while the high registers
> (which are fewer in number and harder to access) aren't.  As a result,
> we get bizarre behavior where reload prefers alternatives using high
> registers over alternatives using low registers, or even prefers
> alternatives with fewer operands allowing registers.  Another defect is
> that the code doesn't take goal_alternative_win into account.
>
> There are other conceivable ways of tuning this heuristic, e.g.
> computing the smallest register class size over all operands for an
> alternative, and then picking the alternative that has the highest such
> minimum.  I tested a patch for this, but thinking about it some more,
> this also has the potential for unpleasant surprises: imagine a Thumb-1
> insn with one alternative allowing "l" (LO_REGS) and another allowing
> "lk" (LO_REGS + the stack pointer); we'd always prefer the second one
> for no reason.  At this point I decided that the best thing to do for
> now would be to revert this change.
>
> It should be possible to do better here.  The thorough, but likely
> expensive, way would be to record reloads for all alternatives with the
> same cost, then choose the best one (requiring the fewest spills) in
> the caller using find_reload_regs.  A simpler way that is closer to
> Vlad's change here would be to determine whether an operand is input,
> output, or both, then pick the right regset(s) from the insn_chain,
> check whether there are any registers available, and then count the
> number of known forced spills for each alternative.
>
> It might also be possible to achieve a similar effect by just reordering
> alternatives in a port that wishes to prevent reload from using
> likely-spilled classes if possible.
>
> The patch below has bootstrapped on i686-linux; regression tests are in
> progress.  I've also tested completely different patches for the
> problem, but I guess no one wants to know that.  :-)
>
> Typical code generation differences on Thumb-1 look like this:
>
> -       mov     r2, #192
> -       mov     r3, r2
> -       add     r3, r3, r9
> +       mov     r3, r9
> +       add     r3, r3, #192
>
> There's a small ARM-specific bit in the patch, which swaps two
> alternatives on the grounds that synthesizing a negative constant in a
> register is more expensive than reloading one register into another.
> Hence, the alternative using constants should be preferred if it can
> match, and thus it should go first:
>
> -       mov     r5, #32
> -       neg     r5, r5
> -       add     r4, r2, r5
> +       mov     r4, r2
> +       sub     r4, r4, #32
>         bmi     .L2
>
> This was a case where the find_reloads code produced better results for
> entirely the wrong reasons.
>
> I'll commit the reload patch if no one objects soonish and if the ARM
> part is approved.
>
>
> Bernd

The ARM bit is fine.

R.
Index: reload.c
===================================================================
--- reload.c	(revision 161987)
+++ reload.c	(working copy)
@@ -2602,7 +2602,6 @@ find_reloads (rtx insn, int replace, int
   char goal_alternative_earlyclobber[MAX_RECOG_OPERANDS];
   int goal_alternative_swapped;
   int best;
-  int best_small_class_operands_num;
   int commutative;
   char operands_match[MAX_RECOG_OPERANDS][MAX_RECOG_OPERANDS];
   rtx substed_operand[MAX_RECOG_OPERANDS];
@@ -2928,7 +2927,6 @@ find_reloads (rtx insn, int replace, int
      all the operands together against the register constraints.  */

   best = MAX_RECOG_OPERANDS * 2 + 600;
-  best_small_class_operands_num = 0;
   swapped = 0;
   goal_alternative_swapped = 0;
@@ -3714,27 +3712,7 @@ find_reloads (rtx insn, int replace, int
	 record it as the chosen goal for reloading.  */
      if (! bad)
	{
-	  bool change_p = false;
-	  int small_class_operands_num = 0;
-
-	  if (best >= losers)
-	    {
-	      for (i = 0; i < noperands; i++)
-		small_class_operands_num
-		  += SMALL_REGISTER_CLASS_P (this_alternative[i]) ? 1 : 0;
-	      if (best > losers
-		  || (best == losers
-		      /* If the cost of the reloads is the same,
-			 prefer alternative which requires minimal
-			 number of small register classes for the
-			 operands.  This improves chances of reloads
-			 for insn requiring small register
-			 classes.  */
-		      && (small_class_operands_num
-			  < best_small_class_operands_num)))
-		change_p = true;
-	    }
-	  if (change_p)
+	  if (best > losers)
	    {
	      for (i = 0; i < noperands; i++)
		{
@@ -3750,7 +3728,6 @@ find_reloads (rtx insn, int replace, int
		}
	      goal_alternative_swapped = swapped;
	      best = losers;
-	      best_small_class_operands_num = small_class_operands_num;
	      goal_alternative_number = this_alternative_number;
	      goal_earlyclobber = this_earlyclobber;
	    }
Index: config/arm/arm.md
===================================================================
--- config/arm/arm.md	(revision 161987)
+++ config/arm/arm.md	(working copy)
@@ -7459,8 +7459,8 @@
 (define_insn "*addsi3_cbranch"
	 (if_then_else
	  (match_operator 4 "arm_comparison_operator"
	   [(plus:SI
-	     (match_operand:SI 2 "s_register_operand" "%l,0,*l,1,1,1")
-	     (match_operand:SI 3 "reg_or_int_operand" "lL,IJ,*l,lIJ,lIJ,lIJ"))
+	     (match_operand:SI 2 "s_register_operand" "%0,l,*l,1,1,1")
+	     (match_operand:SI 3 "reg_or_int_operand" "IJ,lL,*l,lIJ,lIJ,lIJ"))
	    (const_int 0)])
	   (label_ref (match_operand 5 "" ""))
	   (pc)))