Patchwork [MIPS] 74k madd scheduler tweaks

login
register
mail settings
Submitter Sandra Loosemore
Date Aug. 1, 2012, 10:06 p.m.
Message ID <5019A869.2050909@codesourcery.com>
Download mbox | patch
Permalink /patch/174612/
State New
Headers show

Comments

Sandra Loosemore - Aug. 1, 2012, 10:06 p.m.
The existing scheduler bypass information for madd on the 74k uses some 
bits copied from the 24k, and is not quite correct.  This patch is based 
on one originally sent to us by MIPS and has been present in our local 
source base for years.  I've confirmed that we are legally allowed to 
contribute this to the FSF; ok for mainline?

-Sandra

2012-08-01  Sandra Loosemore  <sandra@codesourcery.com>
	    Maxim Kuvyrkov  <maxim@codesourcery.com>
	    Julian Brown  <julian@codesourcery.com>
	    MIPS Technologies, Inc.
	
	* config/mips/74k.md (r74k_int_mult, r74k_int_madd): Don't use
	mips_linked_madd_p for bypasses.
	(r74k_int_mul3): Use mips_mult_madd_chain_bypass_p for bypass.
	* config/mips/mips.c (mips_mult_madd_chain_bypass_p): New.
	* config/mips/mips-protos.h (mips_mult_madd_chain_bypass_p): Add
	prototype.
Richard Sandiford - Aug. 4, 2012, 1:48 p.m.
Sandra Loosemore <sandra@codesourcery.com> writes:
> The existing scheduler bypass information for madd on the 74k uses some 
> bits copied from the 24k, and is not quite correct.  This patch is based 
> on one originally sent to us by MIPS and has been present in our local 
> source base for years.  I've confirmed that we are legally allowed to 
> contribute this to the FSF; ok for mainline?

Sorry to ask, but do you have a record of why?  Reason I ask is that...

> Index: gcc/config/mips/74k.md
> ===================================================================
> --- gcc/config/mips/74k.md	(revision 189988)
> +++ gcc/config/mips/74k.md	(working copy)
> @@ -168,10 +168,11 @@
>  ;; mult/madd/msub->int_mfhilo  : 4 cycles (default)
>  ;; mult->madd/msub             : 1 cycles
>  ;; madd/msub->madd/msub        : 1 cycles
> -(define_bypass 1 "r74k_int_mult,r74k_int_mul3" "r74k_int_madd"
> -  "mips_linked_madd_p")
> -(define_bypass 1 "r74k_int_madd" "r74k_int_madd"
> -  "mips_linked_madd_p")
> +(define_bypass 1 "r74k_int_mult" "r74k_int_madd")
> +(define_bypass 1 "r74k_int_madd" "r74k_int_madd")
> +
> +(define_bypass 1 "r74k_int_mul3" "r74k_int_madd"
> +  "mips_mult_madd_chain_bypass_p")
>  
>  ;; --------------------------------------------------------------
>  ;; Floating Point Instructions

...this looks like a step backwards.  Before reload, the original
version assumes that a multiplication feeding a madd is going to form
a chain _as long as_ the result of the multiplication is used as the
accumulator input to the madd.  The condition is important because
something like:

     r1 = r2 * r3
     r6 = r1 * r4 + r5

(which is perfectly OK before reload) shouldn't form a chain.

mult and mul3 are equivalent at this stage because we're still using
pseudo registers and haven't yet introduced uses of LO.  Having both
reservations in the same bypass makes sense because of that.

The original version should be OK after reload too.  It should then
never be the case that r74k_int_mul3 feeds r74k_int_madd in a way
that satisfies mips_linked_madd_p, so the bypass is harmless but
becomes temporarily redundant.  But it should always be the case
that r74k_int_mult only feeds r74k_int_madd in a way that satisfies
mips_linked_madd_p, so the _predicate_ should be harmless but temporarily
equivalent to "always true".

I.e. the original version should behave as:

  (define_bypass 1 "r74k_int_mult" "r74k_int_madd")
  (define_bypass 1 "r74k_int_madd" "r74k_int_madd")

after reload (with no bypass for r74k_int_mul3).

At least that's how it's supposed to work.  The new version is
obviously equivalent in the post-reload case, but seems to be making
the pre-reload case worse.  I'm wondering whether someone hit a
situation where mips_linked_madd_p wasn't working properly.

Richard
Sandra Loosemore - Aug. 4, 2012, 8:44 p.m.
On 08/04/2012 07:48 AM, Richard Sandiford wrote:
> Sandra Loosemore<sandra@codesourcery.com>  writes:
>> The existing scheduler bypass information for madd on the 74k uses some
>> bits copied from the 24k, and is not quite correct.  This patch is based
>> on one originally sent to us by MIPS and has been present in our local
>> source base for years.  I've confirmed that we are legally allowed to
>> contribute this to the FSF; ok for mainline?
>
> Sorry to ask, but do you have a record of why?  Reason I ask is that...
>
>> Index: gcc/config/mips/74k.md
>> ===================================================================
>> --- gcc/config/mips/74k.md	(revision 189988)
>> +++ gcc/config/mips/74k.md	(working copy)
>> @@ -168,10 +168,11 @@
>>   ;; mult/madd/msub->int_mfhilo  : 4 cycles (default)
>>   ;; mult->madd/msub             : 1 cycles
>>   ;; madd/msub->madd/msub        : 1 cycles
>> -(define_bypass 1 "r74k_int_mult,r74k_int_mul3" "r74k_int_madd"
>> -  "mips_linked_madd_p")
>> -(define_bypass 1 "r74k_int_madd" "r74k_int_madd"
>> -  "mips_linked_madd_p")
>> +(define_bypass 1 "r74k_int_mult" "r74k_int_madd")
>> +(define_bypass 1 "r74k_int_madd" "r74k_int_madd")
>> +
>> +(define_bypass 1 "r74k_int_mul3" "r74k_int_madd"
>> +  "mips_mult_madd_chain_bypass_p")
>>
>>   ;; --------------------------------------------------------------
>>   ;; Floating Point Instructions
>
> ...this looks like a step backwards.
>[long explanation trimmed]

Sigh, I wasn't able to find any detailed rationale for this patch. 
However, the "DSP ALU scheduling" patch I posted separately gives a clue 
what the intent was as it also uses mips_mult_madd_chain_bypass_p to 
give different behavior pre- and post-reload:

> +;; Before reload, all multiplier is registered as imul3 (which has a long
> +;;  latency).  We temporary jig the latency such that the macc groups
> +;;  are scheduled closely together during the first scheduler pass.
> +(define_bypass 1 "r74k_int_mul3" "r74k_dsp_mac"
> +  "mips_mult_madd_chain_bypass_p")
> +(define_bypass 1 "r74k_int_mul3" "r74k_dsp_mac_sat"
> +  "mips_mult_madd_chain_bypass_p")

If this is incorrect or looks like a hack to paper over some other 
problem, I'd be happy to drop the predicate on these bits too.  WDYT?

-Sandra
Richard Sandiford - Aug. 8, 2012, 7:10 p.m.
Sandra Loosemore <sandra@codesourcery.com> writes:
>> +;; Before reload, all multiplier is registered as imul3 (which has a long
>> +;;  latency).  We temporary jig the latency such that the macc groups
>> +;;  are scheduled closely together during the first scheduler pass.
>> +(define_bypass 1 "r74k_int_mul3" "r74k_dsp_mac"
>> +  "mips_mult_madd_chain_bypass_p")
>> +(define_bypass 1 "r74k_int_mul3" "r74k_dsp_mac_sat"
>> +  "mips_mult_madd_chain_bypass_p")
>
> If this is incorrect or looks like a hack to paper over some other 
> problem, I'd be happy to drop the predicate on these bits too.  WDYT?

Hmm, yeah, it does look like they should be using mips_linked_madd_p
instead, except that mips_linked_madd_p isn't yet wired up to handle
DSP macs.  Rather than pattern-match them all, the easiest thing would
probably be to define a new attribute along the lines of:

(define_attr "accum_in" "none,0,1,2,3,4,5" (const_string "none"))

and use it for the existing imadds too.  E.g.:

(define_insn "*mul_acc_si"
  [(set (match_operand:SI 0 "register_operand" "=l*?*?,d?")
	(plus:SI (mult:SI (match_operand:SI 1 "register_operand" "d,d")
			  (match_operand:SI 2 "register_operand" "d,d"))
		 (match_operand:SI 3 "register_operand" "0,d")))
   (clobber (match_scratch:SI 4 "=X,l"))
   (clobber (match_scratch:SI 5 "=X,&d"))]
  "GENERATE_MADD_MSUB && !TARGET_MIPS16"
  "@
    madd\t%1,%2
    #"
  [(set_attr "type"	"imadd")
   (set_attr "accum_in" "3")
   (set_attr "mode"	"SI")
   (set_attr "length"	"4,8")])

Then mips_linked_madd_p can use get_attr_accum_in to check for chains.

But if that's more work than you want right now, just dropping the
predicates above would be OK too, like you say.

Richard
Maxim Kuvyrkov - Aug. 14, 2012, 12:06 a.m.
On 9/08/2012, at 7:10 AM, Richard Sandiford wrote:

> Hmm, yeah, it does look like they should be using mips_linked_madd_p
> instead, except that mips_linked_madd_p isn't yet wired up to handle
> DSP macs.  Rather than pattern-match them all, the easiest thing would
> probably be to define a new attribute along the lines of:
> 
> (define_attr "accum_in" "none,0,1,2,3,4,5" (const_string "none"))
> 
> and use it for the existing imadds too.  E.g.:
> 
> (define_insn "*mul_acc_si"
>  [(set (match_operand:SI 0 "register_operand" "=l*?*?,d?")
> 	(plus:SI (mult:SI (match_operand:SI 1 "register_operand" "d,d")
> 			  (match_operand:SI 2 "register_operand" "d,d"))
> 		 (match_operand:SI 3 "register_operand" "0,d")))
>   (clobber (match_scratch:SI 4 "=X,l"))
>   (clobber (match_scratch:SI 5 "=X,&d"))]
>  "GENERATE_MADD_MSUB && !TARGET_MIPS16"
>  "@
>    madd\t%1,%2
>    #"
>  [(set_attr "type"	"imadd")
>   (set_attr "accum_in" "3")
>   (set_attr "mode"	"SI")
>   (set_attr "length"	"4,8")])
> 
> Then mips_linked_madd_p can use get_attr_accum_in to check for chains.
> 

I thought I'll butt in since I did a very similar thing for sync_memmodel a couple of months ago.

Is the attached patch what you have in mind?  It is a standalone change by itself and can be checked in if OK.  Sandra can than adjust DSP patches she's working on to use mips_linked_madd_p.

OK to apply if no regressions?

Thanks,

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics
Richard Sandiford - Aug. 14, 2012, 7:08 a.m.
Maxim Kuvyrkov <maxim@codesourcery.com> writes:
> I thought I'll butt in since I did a very similar thing for
> sync_memmodel a couple of months ago.

Thanks.

> +  /* We take care in instruction definitions to make sure accum_in operand is
> +     a register_operand or [a more restrictive] muldiv_target_operand.  */
> +  gcc_assert (REG_P (accum_in_op));

register_operand can accept subregs too.  I think it'd be better to
leave this bit out.

> diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
> index 759958b..79c1f25 100644
> --- a/gcc/config/mips/mips.md
> +++ b/gcc/config/mips/mips.md
> @@ -275,6 +275,10 @@
>  (define_attr "sync_memmodel" "" (const_int 10))
>  
>  
> +;; Accumulator operand for madd patterns.
> +(define_attr "accum_in" "none,0,1,2,3,4,5" (const_string "none"))
> +
> +

Nit: just one blank line between attributes.

> @@ -1715,6 +1724,7 @@
>    "ISA_HAS_MACC && reload_completed"
>    "macc\t%3,%1,%2"
>    [(set_attr "type"	"imadd")
> +   (set_attr "accum_in"	"3")
>     (set_attr "mode"	"SI")])
>  
>  (define_insn "*msac2"
> @@ -1729,6 +1739,7 @@
>    "ISA_HAS_MSAC && reload_completed"
>    "msac\t%3,%1,%2"
>    [(set_attr "type"	"imadd")
> +   (set_attr "accum_in"	"3")
>     (set_attr "mode"	"SI")])
>  
>  ;; Convert macc $0,<r1>,<r2> & mflo <r3> into macc <r3>,<r1>,<r2>

These two should be "0" instead.

OK with those changes, thanks.

Richard
Maxim Kuvyrkov - Aug. 15, 2012, 5:53 a.m.
On 14/08/2012, at 7:08 PM, Richard Sandiford wrote:

> OK with those changes, thanks.

Checked in with the noted changes and a fixed bug.

It turns out that mips_linked_madd_p is also called via mips_macc_chains_reorder, which may pass a (use ...) instruction, which causes get_attr_* to blow up.  Fixed by adding

if (recog_memoized (in_insn) < 0)
  return false;

at the beginning of mips_linked_madd_p.

Richard, thanks for review, and thanks Sandra for testing the patch.

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics

Patch

Index: gcc/config/mips/74k.md
===================================================================
--- gcc/config/mips/74k.md	(revision 189988)
+++ gcc/config/mips/74k.md	(working copy)
@@ -168,10 +168,11 @@ 
 ;; mult/madd/msub->int_mfhilo  : 4 cycles (default)
 ;; mult->madd/msub             : 1 cycles
 ;; madd/msub->madd/msub        : 1 cycles
-(define_bypass 1 "r74k_int_mult,r74k_int_mul3" "r74k_int_madd"
-  "mips_linked_madd_p")
-(define_bypass 1 "r74k_int_madd" "r74k_int_madd"
-  "mips_linked_madd_p")
+(define_bypass 1 "r74k_int_mult" "r74k_int_madd")
+(define_bypass 1 "r74k_int_madd" "r74k_int_madd")
+
+(define_bypass 1 "r74k_int_mul3" "r74k_int_madd"
+  "mips_mult_madd_chain_bypass_p")
 
 ;; --------------------------------------------------------------
 ;; Floating Point Instructions
Index: gcc/config/mips/mips-protos.h
===================================================================
--- gcc/config/mips/mips-protos.h	(revision 189988)
+++ gcc/config/mips/mips-protos.h	(working copy)
@@ -296,6 +296,7 @@  extern unsigned int mips_sync_loop_insns
 extern const char *mips_output_division (const char *, rtx *);
 extern unsigned int mips_hard_regno_nregs (int, enum machine_mode);
 extern bool mips_linked_madd_p (rtx, rtx);
+extern bool mips_mult_madd_chain_bypass_p (rtx, rtx);
 extern bool mips_store_data_bypass_p (rtx, rtx);
 extern rtx mips_prefetch_cookie (rtx, rtx);
 
Index: gcc/config/mips/mips.c
===================================================================
--- gcc/config/mips/mips.c	(revision 189988)
+++ gcc/config/mips/mips.c	(working copy)
@@ -12392,6 +12392,18 @@  mips_linked_madd_p (rtx out_insn, rtx in
   return false;
 }
 
+/* Helper function for 74k; returns true to enable the chained mult/madd
+   bypass.  */
+bool
+mips_mult_madd_chain_bypass_p (rtx out_insn ATTRIBUTE_UNUSED,
+			       rtx in_insn ATTRIBUTE_UNUSED)
+{
+  if (reload_completed)
+    return false;
+  else
+    return true;
+}
+
 /* True if the dependency between OUT_INSN and IN_INSN is on the store
    data rather than the address.  We need this because the cprestore
    pattern is type "store", but is defined using an UNSPEC_VOLATILE,