Message ID | 87lhrpnmg0.fsf@talisman.default |
---|---|
State | New |
Headers | show |
On 07/19/14 14:05, Richard Sandiford wrote: > IRA (like the old allocators) calculates a "best" class and an "alternative" > class for each allocno. The best class is the one with the lowest cost while > the alternative class is the biggest class whose cost is smaller than the > cost of spilling. When optimising we adjust the costs so that the best > class is preferred, but when not optimisting we effectively use the > alternative class: > > if (optimize && ALLOCNO_CLASS (a) != pref[i]) > { > n = ira_class_hard_regs_num[aclass]; > ALLOCNO_HARD_REG_COSTS (a) > = reg_costs = ira_allocate_cost_vector (aclass); > for (j = n - 1; j >= 0; j--) > { > hard_regno = ira_class_hard_regs[aclass][j]; > if (TEST_HARD_REG_BIT (reg_class_contents[pref[i]], hard_regno)) > reg_costs[j] = ALLOCNO_CLASS_COST (a); > else > { > rclass = REGNO_REG_CLASS (hard_regno); > num = cost_classes_ptr->index[rclass]; > if (num < 0) > { > num = cost_classes_ptr->hard_regno_index[hard_regno]; > ira_assert (num >= 0); > } > reg_costs[j] = COSTS (costs, i)->cost[num]; > } > } > } > > In some cases the alternative class can be significantly more costly > than the preferred class. If reg_alloc_order lists a member of the > alternative class that isn't in the best class, then for -O0 we'll tend > to use it even if many members of the best class are still free. > > Obviously we don't want to spend too much on RA at -O0. But with the > code above disabled, I think we should use the best class as the allocno > class if it isn't likely to be spilled. The patch below does this. > > One case where this occurs is on MIPS with: > > (set (reg:DI y) (sign_extend:SI (reg:SI x))) > ...(reg:DI y)... > > where y is used in a context that requires a GPR and where the sign > extension is done by: > > (define_insn_and_split "extendsidi2" > [(set (match_operand:DI 0 "register_operand" "=d,l,d") > (sign_extend:DI (match_operand:SI 1 "nonimmediate_operand" "0,0,m")))] > "TARGET_64BIT" > "@ > # > # > lw\t%0,%1" > "&& reload_completed && register_operand (operands[1], VOIDmode)" > [(const_int 0)] > { > emit_note (NOTE_INSN_DELETED); > DONE; > } > [(set_attr "move_type" "move,move,load") > (set_attr "mode" "DI")]) > > ("l" is the LO register.) Because a "d" register would satisfy both > the definition and use of "y", it is the best class and has a cost of 0. > But the costs are deliberately set up so that using "l" is cheaper > than memory (which might not be true on all processors, but that's > a question of -mtune-specific costs). So the alternative class is > GR_AND_MD?_REGS. The problem is that the LO register comes first > in the allocation order: > > #define REG_ALLOC_ORDER \ > { /* Accumulator registers. When GPRs and accumulators have equal \ > cost, we generally prefer to use accumulators. For example, \ > a division of multiplication result is better allocated to LO, \ > so that we put the MFLO at the point of use instead of at the \ > point of definition. It's also needed if we're to take advantage \ > of the extra accumulators available with -mdspr2. In some cases, \ > it can also help to reduce register pressure. */ \ > 64, 65,176,177,178,179,180,181, \ > > So we end up picking LO even though it is significantly more costly > than GPRs in this case (but still less costly than memory, again > according to the chosen cost model). This is the cause of a regression > in gcc.target/mips/branch-7.c after: > > 2014-06-24 Catherine Moore <clm@codesourcery.com> > Sandra Loosemore <sandra@codesourcery.com> > > * config/mips/mips.c (mips_order_regs_for_local_alloc): Delete. > * config/mips/mips.h (ADJUST_REG_ALLOC_ORDER): Delete. > * config/mips/mips-protos.h (mips_order_regs_for_local_alloc): Delete. > > But I think it's a general problem. I tried the patch on CSiBE for: > > MIPS -mips64r2 -mabi=32 -mips16 > MIPS -mips64r2 -mabi=32 > MIPS -mips64r2 -mabi=n32 > MIPS -mips64r2 -mabi=64 > MIPS -mips3 -mabi=64 > x86_64 -m32 > x86_64 > > all at -O0, and it was a size win in all cases. The biggest win was for > MIPS -mips64r2 -mabi=32 (which isn't affected by the sign_extend case above, > since the pattern is restricted to 64-bit targets). The saving there was 10%. > The saving for x86_64 -m32 was just 0.05%, but the saving for x86_64 was a > more worthwhile 0.5%. > > Tested on mips64-linux-gnu and x86_64-linux-gnu. It fixes the > gcc.target/mips/branch-7.c regression for MIPS. OK to install? > > Thanks, > Richard > > > gcc/ > * ira-costs.c (find_costs_and_classes): For -O0, use the best class > as the allocation class if it isn't likely to be spilled. Seems reasonable. Please keep an eye out for fallout on other targets (I'm thinking ARM in particular). Thanks, jeff
Index: gcc/ira-costs.c =================================================================== --- gcc/ira-costs.c 2014-07-19 12:22:53.182907048 +0100 +++ gcc/ira-costs.c 2014-07-19 21:02:08.000441494 +0100 @@ -1753,6 +1753,20 @@ find_costs_and_classes (FILE *dump_file) alt_class = ira_allocno_class_translate[alt_class]; if (best_cost > i_mem_cost) regno_aclass[i] = NO_REGS; + else if (!optimize && !targetm.class_likely_spilled_p (best)) + /* Registers in the alternative class are likely to need + longer or slower sequences than registers in the best class. + When optimizing we make some effort to use the best class + over the alternative class where possible, but at -O0 we + effectively give the alternative class equal weight. + We then run the risk of using slower alternative registers + when plenty of registers from the best class are still free. + This is especially true because live ranges tend to be very + short in -O0 code and so register pressure tends to be low. + + Avoid that by ignoring the alternative class if the best + class has plenty of registers. */ + regno_aclass[i] = best; else { /* Make the common class the biggest class of best and