Message ID | 4FEA77E0.4020501@redhat.com |
---|---|
State | New |
Headers | show |
On Wed, Jun 27, 2012 at 5:02 AM, Richard Henderson <rth@redhat.com> wrote: > The problem I'd like to solve is stuff like > > pxor %xmm4, %xmm4 > ... > movdqa %xmm4, %xmm2 > pcmpgtd %xmm0, %xmm2 > > In that there's no point performing the copy from xmm4 > rather than just emitting a new pxor insn. > > The Real Problem, as I see it, is that at the point (g)cse > runs we have no visibility into the 2-operand matching > constraint on that pcmpgtd so we make the wrong choice > in sharing the zero. > > If we're using AVX, instead of SSE, we don't use matching > constraints and given the 3-operand insn, hoisting the zero > is the right and proper thing to do because we won't need > to emit that movdqa. > > Of course, this fires for normal integer code as well. > Some cases it's a clear win: > > -: 41 be 1f 00 00 00 mov $0x1f,%r14d > ... > -: 4c 89 f1 mov %r14,%rcx > +: b9 1f 00 00 00 mov $0x1f,%ecx > > sometimes not (increased code size): > > -: 41 bd 01 00 00 00 mov $0x1,%r13d > -: 4d 89 ec mov %r13,%r12 > +: 41 bc 01 00 00 00 mov $0x1,%r12d > +: 41 bd 01 00 00 00 mov $0x1,%r13d I suppose that might be fixed if instead of + /* Only use the constant when it's just as cheap as a reg move. */ + if (set_src_cost (c, optimize_function_for_speed_p (cfun)) == 0) + return c; you'd unconditionall use size costs? > although the total difference is minimal, and ambiguous: > > new text old text > cc1 13971302 13971342 > cc1plus 15882736 15882728 > > Also, note that in the first case above, r14 is otherwise > unused, and we wind up with an unnecessary save/restore of > the register in the function. > > Thoughts? We have an inverse issue elsewhere in that we don't CSE a propagated constant but get mov $0, %(eax) mov $0, 4%(eax) ... instead of doing one register clearing and then re-using that as zero. But I suppose reload is not exactly the place to fix that ;) Richard. > > r~
On 06/27/2012 01:45 AM, Richard Guenther wrote: >> > Of course, this fires for normal integer code as well. >> > Some cases it's a clear win: >> > >> > -: 41 be 1f 00 00 00 mov $0x1f,%r14d >> > ... >> > -: 4c 89 f1 mov %r14,%rcx >> > +: b9 1f 00 00 00 mov $0x1f,%ecx >> > >> > sometimes not (increased code size): >> > >> > -: 41 bd 01 00 00 00 mov $0x1,%r13d >> > -: 4d 89 ec mov %r13,%r12 >> > +: 41 bc 01 00 00 00 mov $0x1,%r12d >> > +: 41 bd 01 00 00 00 mov $0x1,%r13d > I suppose that might be fixed if instead of > > + /* Only use the constant when it's just as cheap as a reg move. */ > + if (set_src_cost (c, optimize_function_for_speed_p (cfun)) == 0) > + return c; > > you'd unconditionall use size costs? > For one, without x86 cost changes that wouldn't affect anything. For another, unconditionally using size costs, locally, would then exchange the missed optimization from the second case to the first. > We have an inverse issue elsewhere in that we don't CSE a propagated constant > but get > > mov $0, %(eax) > mov $0, 4%(eax) > ... > > instead of doing one register clearing and then re-using that as zero. But I > suppose reload is not exactly the place to fix that ;) That would be exactly because x86 doesn't model immediate costs properly. My patch trying to un-cse in exactly the spot where the value is about to be clobbered. While we could give a go at this in a pre-reload pass, it would be just a guess until register allocation does or does not assign a hard reg to the constant, and does or does not choose an alternative that requires the constant match an output. Having reviewed more of the cc1 asm diff, the vast majority of cases are: * the cx input to string insns, * (1 << n). These results are certainly skewed by the kind of stuff we do in gcc, but it makes a fair amount of sense. r~
On 06/27/2012 10:45 AM, Richard Guenther wrote: > On Wed, Jun 27, 2012 at 5:02 AM, Richard Henderson <rth@redhat.com> wrote: >> sometimes not (increased code size): >> >> -: 41 bd 01 00 00 00 mov $0x1,%r13d >> -: 4d 89 ec mov %r13,%r12 >> +: 41 bc 01 00 00 00 mov $0x1,%r12d >> +: 41 bd 01 00 00 00 mov $0x1,%r13d > > I suppose that might be fixed if instead of > > + /* Only use the constant when it's just as cheap as a reg move. */ > + if (set_src_cost (c, optimize_function_for_speed_p (cfun)) == 0) > + return c; > > you'd unconditionall use size costs? I've added some code last year or so in rtl.h to operate on full_rtx_costs, taking both into account (use the primary cost comparison, and if that's equal, use the secondary). Would that work here? Bernd
diff --git a/gcc/reload.c b/gcc/reload.c index e42cc5c..fb928be 100644 --- a/gcc/reload.c +++ b/gcc/reload.c @@ -2534,6 +2534,38 @@ safe_from_earlyclobber (rtx op, rtx clobber) early_data = decompose (clobber); return immune_p (op, clobber, early_data); } + +/* For matching operand constraints, we may need to make a reg-reg copy. + In some cases the source reg is equivalent to a constant, and it's + more efficient to set the dest reg from the constant rather than the + source reg. Return the source to use for the reload. */ + +static rtx +matching_constraint_src (rtx src) +{ + unsigned regno; + rtx c; + + if (!REG_P (src)) + return src; + + regno = REGNO (src); + if (HARD_REGISTER_NUM_P (regno)) + { + regno = ORIGINAL_REGNO (src); + if (HARD_REGISTER_NUM_P (regno)) + return src; + } + c = reg_equiv_constant (regno); + if (c == NULL) + return src; + + /* Only use the constant when it's just as cheap as a reg move. */ + if (set_src_cost (c, optimize_function_for_speed_p (cfun)) == 0) + return c; + else + return src; +} /* Main entry point of this file: search the body of INSN for values that need reloading and record them with push_reload. @@ -4049,28 +4081,28 @@ find_reloads (rtx insn, int replace, int ind_levels, int live_known, else if (modified[i] == RELOAD_READ && modified[goal_alternative_matched[i]] == RELOAD_WRITE) { + int a = goal_alternative_matched[i]; operand_reloadnum[i] - = push_reload (recog_data.operand[i], - recog_data.operand[goal_alternative_matched[i]], + = push_reload (matching_constraint_src (recog_data.operand[i]), + recog_data.operand[a], recog_data.operand_loc[i], - recog_data.operand_loc[goal_alternative_matched[i]], + recog_data.operand_loc[a], (enum reg_class) goal_alternative[i], - operand_mode[i], - operand_mode[goal_alternative_matched[i]], + operand_mode[i], operand_mode[a], 0, 0, i, RELOAD_OTHER); - operand_reloadnum[goal_alternative_matched[i]] = output_reloadnum; + operand_reloadnum[a] = output_reloadnum; } else if (modified[i] == RELOAD_WRITE && modified[goal_alternative_matched[i]] == RELOAD_READ) { - operand_reloadnum[goal_alternative_matched[i]] - = push_reload (recog_data.operand[goal_alternative_matched[i]], + int a = goal_alternative_matched[i]; + operand_reloadnum[a] + = push_reload (matching_constraint_src (recog_data.operand[a]), recog_data.operand[i], - recog_data.operand_loc[goal_alternative_matched[i]], + recog_data.operand_loc[a], recog_data.operand_loc[i], (enum reg_class) goal_alternative[i], - operand_mode[goal_alternative_matched[i]], - operand_mode[i], + operand_mode[a], operand_mode[i], 0, 0, i, RELOAD_OTHER); operand_reloadnum[i] = output_reloadnum; }