| Submitter | Sandra Loosemore |
|---|---|
| Date | Aug. 1, 2012, 10:06 p.m. |
| Message ID | <5019A869.2050909@codesourcery.com> |
| Download | mbox | patch |
| Permalink | /patch/174612/ |
| State | New |
| Headers | show |
Comments
Sandra Loosemore <sandra@codesourcery.com> writes: > The existing scheduler bypass information for madd on the 74k uses some > bits copied from the 24k, and is not quite correct. This patch is based > on one originally sent to us by MIPS and has been present in our local > source base for years. I've confirmed that we are legally allowed to > contribute this to the FSF; ok for mainline? Sorry to ask, but do you have a record of why? Reason I ask is that... > Index: gcc/config/mips/74k.md > =================================================================== > --- gcc/config/mips/74k.md (revision 189988) > +++ gcc/config/mips/74k.md (working copy) > @@ -168,10 +168,11 @@ > ;; mult/madd/msub->int_mfhilo : 4 cycles (default) > ;; mult->madd/msub : 1 cycles > ;; madd/msub->madd/msub : 1 cycles > -(define_bypass 1 "r74k_int_mult,r74k_int_mul3" "r74k_int_madd" > - "mips_linked_madd_p") > -(define_bypass 1 "r74k_int_madd" "r74k_int_madd" > - "mips_linked_madd_p") > +(define_bypass 1 "r74k_int_mult" "r74k_int_madd") > +(define_bypass 1 "r74k_int_madd" "r74k_int_madd") > + > +(define_bypass 1 "r74k_int_mul3" "r74k_int_madd" > + "mips_mult_madd_chain_bypass_p") > > ;; -------------------------------------------------------------- > ;; Floating Point Instructions ...this looks like a step backwards. Before reload, the original version assumes that a multiplication feeding a madd is going to form a chain _as long as_ the result of the multiplication is used as the accumulator input to the madd. The condition is important because something like: r1 = r2 * r3 r6 = r1 * r4 + r5 (which is perfectly OK before reload) shouldn't form a chain. mult and mul3 are equivalent at this stage because we're still using pseudo registers and haven't yet introduced uses of LO. Having both reservations in the same bypass makes sense because of that. The original version should be OK after reload too. It should then never be the case that r74k_int_mul3 feeds r74k_int_madd in a way that satisfies mips_linked_madd_p, so the bypass is harmless but becomes temporarily redundant. But it should always be the case that r74k_int_mult only feeds r74k_int_madd in a way that satisfies mips_linked_madd_p, so the _predicate_ should be harmless but temporarily equivalent to "always true". I.e. the original version should behave as: (define_bypass 1 "r74k_int_mult" "r74k_int_madd") (define_bypass 1 "r74k_int_madd" "r74k_int_madd") after reload (with no bypass for r74k_int_mul3). At least that's how it's supposed to work. The new version is obviously equivalent in the post-reload case, but seems to be making the pre-reload case worse. I'm wondering whether someone hit a situation where mips_linked_madd_p wasn't working properly. Richard
On 08/04/2012 07:48 AM, Richard Sandiford wrote: > Sandra Loosemore<sandra@codesourcery.com> writes: >> The existing scheduler bypass information for madd on the 74k uses some >> bits copied from the 24k, and is not quite correct. This patch is based >> on one originally sent to us by MIPS and has been present in our local >> source base for years. I've confirmed that we are legally allowed to >> contribute this to the FSF; ok for mainline? > > Sorry to ask, but do you have a record of why? Reason I ask is that... > >> Index: gcc/config/mips/74k.md >> =================================================================== >> --- gcc/config/mips/74k.md (revision 189988) >> +++ gcc/config/mips/74k.md (working copy) >> @@ -168,10 +168,11 @@ >> ;; mult/madd/msub->int_mfhilo : 4 cycles (default) >> ;; mult->madd/msub : 1 cycles >> ;; madd/msub->madd/msub : 1 cycles >> -(define_bypass 1 "r74k_int_mult,r74k_int_mul3" "r74k_int_madd" >> - "mips_linked_madd_p") >> -(define_bypass 1 "r74k_int_madd" "r74k_int_madd" >> - "mips_linked_madd_p") >> +(define_bypass 1 "r74k_int_mult" "r74k_int_madd") >> +(define_bypass 1 "r74k_int_madd" "r74k_int_madd") >> + >> +(define_bypass 1 "r74k_int_mul3" "r74k_int_madd" >> + "mips_mult_madd_chain_bypass_p") >> >> ;; -------------------------------------------------------------- >> ;; Floating Point Instructions > > ...this looks like a step backwards. >[long explanation trimmed] Sigh, I wasn't able to find any detailed rationale for this patch. However, the "DSP ALU scheduling" patch I posted separately gives a clue what the intent was as it also uses mips_mult_madd_chain_bypass_p to give different behavior pre- and post-reload: > +;; Before reload, all multiplier is registered as imul3 (which has a long > +;; latency). We temporary jig the latency such that the macc groups > +;; are scheduled closely together during the first scheduler pass. > +(define_bypass 1 "r74k_int_mul3" "r74k_dsp_mac" > + "mips_mult_madd_chain_bypass_p") > +(define_bypass 1 "r74k_int_mul3" "r74k_dsp_mac_sat" > + "mips_mult_madd_chain_bypass_p") If this is incorrect or looks like a hack to paper over some other problem, I'd be happy to drop the predicate on these bits too. WDYT? -Sandra
Sandra Loosemore <sandra@codesourcery.com> writes: >> +;; Before reload, all multiplier is registered as imul3 (which has a long >> +;; latency). We temporary jig the latency such that the macc groups >> +;; are scheduled closely together during the first scheduler pass. >> +(define_bypass 1 "r74k_int_mul3" "r74k_dsp_mac" >> + "mips_mult_madd_chain_bypass_p") >> +(define_bypass 1 "r74k_int_mul3" "r74k_dsp_mac_sat" >> + "mips_mult_madd_chain_bypass_p") > > If this is incorrect or looks like a hack to paper over some other > problem, I'd be happy to drop the predicate on these bits too. WDYT? Hmm, yeah, it does look like they should be using mips_linked_madd_p instead, except that mips_linked_madd_p isn't yet wired up to handle DSP macs. Rather than pattern-match them all, the easiest thing would probably be to define a new attribute along the lines of: (define_attr "accum_in" "none,0,1,2,3,4,5" (const_string "none")) and use it for the existing imadds too. E.g.: (define_insn "*mul_acc_si" [(set (match_operand:SI 0 "register_operand" "=l*?*?,d?") (plus:SI (mult:SI (match_operand:SI 1 "register_operand" "d,d") (match_operand:SI 2 "register_operand" "d,d")) (match_operand:SI 3 "register_operand" "0,d"))) (clobber (match_scratch:SI 4 "=X,l")) (clobber (match_scratch:SI 5 "=X,&d"))] "GENERATE_MADD_MSUB && !TARGET_MIPS16" "@ madd\t%1,%2 #" [(set_attr "type" "imadd") (set_attr "accum_in" "3") (set_attr "mode" "SI") (set_attr "length" "4,8")]) Then mips_linked_madd_p can use get_attr_accum_in to check for chains. But if that's more work than you want right now, just dropping the predicates above would be OK too, like you say. Richard
On 9/08/2012, at 7:10 AM, Richard Sandiford wrote: > Hmm, yeah, it does look like they should be using mips_linked_madd_p > instead, except that mips_linked_madd_p isn't yet wired up to handle > DSP macs. Rather than pattern-match them all, the easiest thing would > probably be to define a new attribute along the lines of: > > (define_attr "accum_in" "none,0,1,2,3,4,5" (const_string "none")) > > and use it for the existing imadds too. E.g.: > > (define_insn "*mul_acc_si" > [(set (match_operand:SI 0 "register_operand" "=l*?*?,d?") > (plus:SI (mult:SI (match_operand:SI 1 "register_operand" "d,d") > (match_operand:SI 2 "register_operand" "d,d")) > (match_operand:SI 3 "register_operand" "0,d"))) > (clobber (match_scratch:SI 4 "=X,l")) > (clobber (match_scratch:SI 5 "=X,&d"))] > "GENERATE_MADD_MSUB && !TARGET_MIPS16" > "@ > madd\t%1,%2 > #" > [(set_attr "type" "imadd") > (set_attr "accum_in" "3") > (set_attr "mode" "SI") > (set_attr "length" "4,8")]) > > Then mips_linked_madd_p can use get_attr_accum_in to check for chains. > I thought I'll butt in since I did a very similar thing for sync_memmodel a couple of months ago. Is the attached patch what you have in mind? It is a standalone change by itself and can be checked in if OK. Sandra can than adjust DSP patches she's working on to use mips_linked_madd_p. OK to apply if no regressions? Thanks, -- Maxim Kuvyrkov CodeSourcery / Mentor Graphics
Maxim Kuvyrkov <maxim@codesourcery.com> writes: > I thought I'll butt in since I did a very similar thing for > sync_memmodel a couple of months ago. Thanks. > + /* We take care in instruction definitions to make sure accum_in operand is > + a register_operand or [a more restrictive] muldiv_target_operand. */ > + gcc_assert (REG_P (accum_in_op)); register_operand can accept subregs too. I think it'd be better to leave this bit out. > diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md > index 759958b..79c1f25 100644 > --- a/gcc/config/mips/mips.md > +++ b/gcc/config/mips/mips.md > @@ -275,6 +275,10 @@ > (define_attr "sync_memmodel" "" (const_int 10)) > > > +;; Accumulator operand for madd patterns. > +(define_attr "accum_in" "none,0,1,2,3,4,5" (const_string "none")) > + > + Nit: just one blank line between attributes. > @@ -1715,6 +1724,7 @@ > "ISA_HAS_MACC && reload_completed" > "macc\t%3,%1,%2" > [(set_attr "type" "imadd") > + (set_attr "accum_in" "3") > (set_attr "mode" "SI")]) > > (define_insn "*msac2" > @@ -1729,6 +1739,7 @@ > "ISA_HAS_MSAC && reload_completed" > "msac\t%3,%1,%2" > [(set_attr "type" "imadd") > + (set_attr "accum_in" "3") > (set_attr "mode" "SI")]) > > ;; Convert macc $0,<r1>,<r2> & mflo <r3> into macc <r3>,<r1>,<r2> These two should be "0" instead. OK with those changes, thanks. Richard
On 14/08/2012, at 7:08 PM, Richard Sandiford wrote:
> OK with those changes, thanks.
Checked in with the noted changes and a fixed bug.
It turns out that mips_linked_madd_p is also called via mips_macc_chains_reorder, which may pass a (use ...) instruction, which causes get_attr_* to blow up. Fixed by adding
if (recog_memoized (in_insn) < 0)
return false;
at the beginning of mips_linked_madd_p.
Richard, thanks for review, and thanks Sandra for testing the patch.
--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics
Patch
Index: gcc/config/mips/74k.md =================================================================== --- gcc/config/mips/74k.md (revision 189988) +++ gcc/config/mips/74k.md (working copy) @@ -168,10 +168,11 @@ ;; mult/madd/msub->int_mfhilo : 4 cycles (default) ;; mult->madd/msub : 1 cycles ;; madd/msub->madd/msub : 1 cycles -(define_bypass 1 "r74k_int_mult,r74k_int_mul3" "r74k_int_madd" - "mips_linked_madd_p") -(define_bypass 1 "r74k_int_madd" "r74k_int_madd" - "mips_linked_madd_p") +(define_bypass 1 "r74k_int_mult" "r74k_int_madd") +(define_bypass 1 "r74k_int_madd" "r74k_int_madd") + +(define_bypass 1 "r74k_int_mul3" "r74k_int_madd" + "mips_mult_madd_chain_bypass_p") ;; -------------------------------------------------------------- ;; Floating Point Instructions Index: gcc/config/mips/mips-protos.h =================================================================== --- gcc/config/mips/mips-protos.h (revision 189988) +++ gcc/config/mips/mips-protos.h (working copy) @@ -296,6 +296,7 @@ extern unsigned int mips_sync_loop_insns extern const char *mips_output_division (const char *, rtx *); extern unsigned int mips_hard_regno_nregs (int, enum machine_mode); extern bool mips_linked_madd_p (rtx, rtx); +extern bool mips_mult_madd_chain_bypass_p (rtx, rtx); extern bool mips_store_data_bypass_p (rtx, rtx); extern rtx mips_prefetch_cookie (rtx, rtx); Index: gcc/config/mips/mips.c =================================================================== --- gcc/config/mips/mips.c (revision 189988) +++ gcc/config/mips/mips.c (working copy) @@ -12392,6 +12392,18 @@ mips_linked_madd_p (rtx out_insn, rtx in return false; } +/* Helper function for 74k; returns true to enable the chained mult/madd + bypass. */ +bool +mips_mult_madd_chain_bypass_p (rtx out_insn ATTRIBUTE_UNUSED, + rtx in_insn ATTRIBUTE_UNUSED) +{ + if (reload_completed) + return false; + else + return true; +} + /* True if the dependency between OUT_INSN and IN_INSN is on the store data rather than the address. We need this because the cprestore pattern is type "store", but is defined using an UNSPEC_VOLATILE,