Patchwork fixes for power4 scheduler description

login
register
mail settings
Submitter Steven Bosscher
Date July 10, 2012, 12:20 p.m.
Message ID <CABu31nP1vUUrmjDSdyE4LwjX0=7rOuect00E9UZLv7C3H3_58g@mail.gmail.com>
Download mbox | patch
Permalink /patch/170167/
State New
Headers show

Comments

Steven Bosscher - July 10, 2012, 12:20 p.m.
Hello,

These look like typos:

* "power4-store-update" wants "iuX,iuY" for X=1|2 and Y=1|2. The
"iu2,iu1" case appeared twice.
* "power4-three" wants "iuX,iuX,iuY|iuX,iuY,iuY" for X=1|2 and Y=1|2.
The "iu1,iu1,iu2" case appeared twice.

Bootstrapped&tested on powerpc64-unknown-linux-gnu.
OK for trunk?

Note, it'd be nice if the size of the power4iu automaton could be
reduced somehow. It is by far the largest automaton in the rs6000 back
end, accounting for ~40% of the total.

Automaton `power4iu'
     8128 NDFA states,          68391 NDFA arcs
    12609 DFA states,           104521 DFA arcs
    10894 minimal DFA states,   89248 minimal DFA arcs
      683 all insns         23 insn equivalence classes
    0 locked states
107203 transition comb vector els, 250562 trans table els: use simple vect
250562 min delay table els, compression factor 1

(All:)
124282 all allocated states,     493955 all allocated arcs
811237 all allocated alternative states
248236 all transition comb vector els, 613376 all trans table els
613376 all min delay table els
    0 all locked states

For comparison, power7iu:

Automaton `power7iu'
     7697 NDFA states,          23034 NDFA arcs
     3738 DFA states,           10277 DFA arcs
     3690 minimal DFA states,   10056 minimal DFA arcs
      683 all insns          9 insn equivalence classes
    0 locked states
10690 transition comb vector els, 33210 trans table els: use comb vect
33210 min delay table els, compression factor 1

I don't understand well enough how the scheduler descriptions are
translated to DFAs, so I don't really understand why the power4iu
automaton needs so many table elts, but the above seems
disproportional to me.

Ciao!
Steven

        * config/rs6000/power4.md (power4-store-update): Fix reservation.
        (power4-three): Likewise.
David Edelsohn - July 17, 2012, 5:51 p.m.
On Tue, Jul 10, 2012 at 8:20 AM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
> Hello,
>
> These look like typos:
>
> * "power4-store-update" wants "iuX,iuY" for X=1|2 and Y=1|2. The
> "iu2,iu1" case appeared twice.
> * "power4-three" wants "iuX,iuX,iuY|iuX,iuY,iuY" for X=1|2 and Y=1|2.
> The "iu1,iu1,iu2" case appeared twice.
>
> Bootstrapped&tested on powerpc64-unknown-linux-gnu.
> OK for trunk?
>
> Note, it'd be nice if the size of the power4iu automaton could be
> reduced somehow. It is by far the largest automaton in the rs6000 back
> end, accounting for ~40% of the total.

Steven,

The change to the power3-three reservation is okay.

The change to power4-store-update looks incorrect or at least incomplete.

These reservations and others were changed by Vlad in March/April 2004
to fix a consistency check that he introduced at the time. Note that
the dispatch units for the final choice also is wrong and duplicated
from the previous line for power4-store-update:

   |(du3_power4+du4_power4,lsu2_power4)\
    |(du3_power4+du4_power4,lsu2_power4))+\

It looks like the final line should be

du4_power4+du1_power4,lsu1_power4

On POWER4, the dispatch slot forces specific function units.  I don't
remember and don't have the documents handy to know if the function
units for store-update should be

2-1
2-2
1-1
1-2

or

2-1
2-2
1-2
1-1

I think it is the latter.

If you look at the thread from 2009, we discuss that the POWER4
scheduler description already is an approximation because an accurate
description creates an unreasonably large automata.  A lot of the
problem is the 1 cycle delay for dependent integer ops.

Thanks, David
Pat Haugen - July 19, 2012, 6:43 p.m.
On 07/17/2012 12:51 PM, David Edelsohn wrote:
> The change to power4-store-update looks incorrect or at least incomplete.
>
> These reservations and others were changed by Vlad in March/April 2004
> to fix a consistency check that he introduced at the time. Note that
> the dispatch units for the final choice also is wrong and duplicated
> from the previous line for power4-store-update:
>
>     |(du3_power4+du4_power4,lsu2_power4)\
>      |(du3_power4+du4_power4,lsu2_power4))+\
>
> It looks like the final line should be
>
> du4_power4+du1_power4,lsu1_power4

This 4th line should be removed altogether. If only the 4th dispatch slot is available and a cracked (or microcoded) insn is next, a nop will fill that slot and the cracked insn will start a new dispatch group.

>
> On POWER4, the dispatch slot forces specific function units.  I don't
> remember and don't have the documents handy to know if the function
> units for store-update should be
>
> 2-1
> 2-2
> 1-1
> 1-2
>
> or
>
> 2-1
> 2-2
> 1-2
> 1-1
>
> I think it is the latter.

On Power4, the dispatch to issue/execution unit affinity is as follows (4 dispatch slots, 2 FXU/LSU issue/execution units):
1 -> 1
2 -> 2
3 -> 2
4 -> 1

Update form stores are cracked into a store and addi, and the store is dual issued to the LSU/FXU. So based on this info I would think it should look like the following (untested).

@@ -141,12 +141,10 @@ (define_insn_reservation "power4-store-u
         (eq_attr "cpu" "power4"))
    "((du1_power4+du2_power4,lsu1_power4)\
      |(du2_power4+du3_power4,lsu2_power4)\
-    |(du3_power4+du4_power4,lsu2_power4)\
      |(du3_power4+du4_power4,lsu2_power4))+\
-   ((nothing,iu2_power4,iu1_power4)\
+   ((nothing,iu1_power4,iu2_power4)\
      |(nothing,iu2_power4,iu2_power4)\
-    |(nothing,iu1_power4,iu2_power4)\
-    |(nothing,iu1_power4,iu2_power4))")
+    |(nothing,iu2_power4,iu1_power4))")

-Pat

>
> If you look at the thread from 2009, we discuss that the POWER4
> scheduler description already is an approximation because an accurate
> description creates an unreasonably large automata.  A lot of the
> problem is the 1 cycle delay for dependent integer ops.
David Edelsohn - July 20, 2012, 8:34 p.m.
On Thu, Jul 19, 2012 at 2:43 PM, Pat Haugen <pthaugen@linux.vnet.ibm.com> wrote:

> Update form stores are cracked into a store and addi, and the store is dual
> issued to the LSU/FXU. So based on this info I would think it should look
> like the following (untested).
>
> @@ -141,12 +141,10 @@ (define_insn_reservation "power4-store-u
>         (eq_attr "cpu" "power4"))
>    "((du1_power4+du2_power4,lsu1_power4)\
>      |(du2_power4+du3_power4,lsu2_power4)\
> -    |(du3_power4+du4_power4,lsu2_power4)\
>      |(du3_power4+du4_power4,lsu2_power4))+\
> -   ((nothing,iu2_power4,iu1_power4)\
> +   ((nothing,iu1_power4,iu2_power4)\
>      |(nothing,iu2_power4,iu2_power4)\
> -    |(nothing,iu1_power4,iu2_power4)\
> -    |(nothing,iu1_power4,iu2_power4))")
> +    |(nothing,iu2_power4,iu1_power4))")

Looks good.  Check it in after testing.

Thanks, David
Pat Haugen - July 20, 2012, 9:33 p.m.
On 07/20/2012 03:34 PM, David Edelsohn wrote:
> Looks good.  Check it in after testing.
>

Testing went fine, patch checked in.

-Pat

Patch

Index: config/rs6000/power4.md
===================================================================
--- config/rs6000/power4.md     (revision 189388)
+++ config/rs6000/power4.md     (working copy)
@@ -145,7 +145,7 @@ 
     |(du3_power4+du4_power4,lsu2_power4))+\
    ((nothing,iu2_power4,iu1_power4)\
     |(nothing,iu2_power4,iu2_power4)\
-    |(nothing,iu1_power4,iu2_power4)\
+    |(nothing,iu1_power4,iu1_power4)\
     |(nothing,iu1_power4,iu2_power4))")

 (define_insn_reservation "power4-store-update-indexed" 12
@@ -212,7 +212,7 @@ 
    ((iu1_power4,nothing,iu2_power4,nothing,iu2_power4)\
     |(iu2_power4,nothing,iu2_power4,nothing,iu1_power4)\
     |(iu2_power4,nothing,iu1_power4,nothing,iu1_power4)\
-    |(iu1_power4,nothing,iu2_power4,nothing,iu2_power4))")
+    |(iu1_power4,nothing,iu1_power4,nothing,iu2_power4))")

 (define_insn_reservation "power4-insert" 4
   (and (eq_attr "type" "insert_word")