diff mbox

[SMS,2/2] Support instructions with REG_INC_NOTE (second try)

Message ID CAHz1=dWb73CaAJCywt0+uwKx+9TE2wo1dXGmMDW8G753dvh2xA@mail.gmail.com
State New
Headers show

Commit Message

Revital Eres Sept. 26, 2011, 4:31 a.m. UTC
Hello,

This patch extends the implementation to support instructions with
REG_INC notes.
It addresses the comments from the previous submission:
http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01299.html.

btw, regarding your previous question about the usage of the address
register been auto inc, apparently it can be used as follows:

insn 1) def1 = MEM [ pre_dec (addr) ]
out edges: [1 -(T,2,1)-> 1]  [1 -(T,2,0)-> 2]
in edges: [1 -(T,2,1)-> 1]  [2 -(T,2,1)-> 1]  [2 -(A,0,1)->1]

insn 2) MEM [ addr + 3 ] = def1
out edges:  [2 -(T,2,1)-> 1]  [2 -(A,0,1)->1]
in edges: [1 -(T,2,0)-> 2]

Reg-moves were not created for the address when testing on ppc.

Tested and bootstrap with patch 1 on ppc64-redhat-linux
enabling SMS on loops with SC 1. On arm-linux-gnueabi bootstrap
c on top of the set of patches that support do-loop pattern
(http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01807.html) which solves
the bootstrap failure on ARM with SMS flags.

OK for mainline?

Thanks,
Revital


        * ddg.c (autoinc_var_is_used_p): New function.
        (create_ddg_dep_from_intra_loop_link,
        add_cross_iteration_register_deps): Call it.
        * modulo-sched.c (sms_schedule): Handle instructions with REG_INC.

Comments

Ayal Zaks Sept. 27, 2011, 1:46 a.m. UTC | #1
On Mon, Sep 26, 2011 at 7:31 AM, Revital Eres <revital.eres@linaro.org> wrote:
> Hello,
>
> This patch extends the implementation to support instructions with
> REG_INC notes.
> It addresses the comments from the previous submission:
> http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01299.html.
>

ok, so if we have an auto-inc'ing insn which defines (auto-inc's) an
addr register and another (say, result) register, we want to allow the
result register to have life ranges in excess of ii (by eliminating
anti-dep edges of distance 1 from uses to def, and generating
reg_moves if/where needed), but avoid having such life ranges of addr
(by retaining such anti-dep edges). Right?

> btw, regarding your previous question about the usage of the address
> register been auto inc, apparently it can be used as follows:
>
> insn 1) def1 = MEM [ pre_dec (addr) ]
> out edges: [1 -(T,2,1)-> 1]  [1 -(T,2,0)-> 2]
> in edges: [1 -(T,2,1)-> 1]  [2 -(T,2,1)-> 1]  [2 -(A,0,1)->1]
>
> insn 2) MEM [ addr + 3 ] = def1
> out edges:  [2 -(T,2,1)-> 1]  [2 -(A,0,1)->1]
> in edges: [1 -(T,2,0)-> 2]
>

Are these all the edges? We have only one True dependence edge from
insn 1 to insn 2, but insn 1 is setting two registers both used by
insn 2 (regardless of what we decide to do with Anti-deps). As for
Anti-deps of distance 1, we have only one going back from insn 2 to
insn 1, perhaps corresponding to addr, allowing reg_moves for def1(?).
But, it won't help def1, because this other Anti-dep will force them
to be scheduled w/o reg_moves.

> Reg-moves were not created for the address when testing on ppc.

ok; note that the example above shows one could potentially have an
insn 2 scheduled in a later stage than insn 1, wanting to feed off an
earlier version of the pre_dec'd addr. (In this case, it would
probably be better to update the offset (3) than use a reg_move for
the addr. (Such folding might work for other cases as well(?))).

>
> Tested and bootstrap with patch 1 on ppc64-redhat-linux
> enabling SMS on loops with SC 1. On arm-linux-gnueabi bootstrap
> c on top of the set of patches that support do-loop pattern
> (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01807.html) which solves
> the bootstrap failure on ARM with SMS flags.
>
> OK for mainline?
>

OK, with the following comments:

Make sure reg_moves are generated for the correct (result, not addr)
register, in generate_reg_moves().

"been">>"being" (multiple appearances).

Add a note that autoinc_var_is_used_p (rtx def_insn, rtx use_insn)
doesn't need to consider the specific address register; no reg_moves
will be allowed for any life range defined by def_insn and used by
use_insn, if use_insn uses an address register auto-inc'ed by
def_insn.

In other words, one would expect to see two Anti-dep edges from insn 2
to insn 1, right?

Ayal.


> Thanks,
> Revital
>
>
>        * ddg.c (autoinc_var_is_used_p): New function.
>        (create_ddg_dep_from_intra_loop_link,
>        add_cross_iteration_register_deps): Call it.
>        * modulo-sched.c (sms_schedule): Handle instructions with REG_INC.
>
Revital Eres Sept. 27, 2011, 6:47 a.m. UTC | #2
Hello,

> ok, so if we have an auto-inc'ing insn which defines (auto-inc's) an
> addr register and another (say, result) register, we want to allow the
> result register to have life ranges in excess of ii (by eliminating
> anti-dep edges of distance 1 from uses to def, and generating
> reg_moves if/where needed), but avoid having such life ranges of addr
> (by retaining such anti-dep edges). Right?

Yes.
>
> Are these all the edges? We have only one True dependence edge from
> insn 1 to insn 2, but insn 1 is setting two registers both used by
> insn 2 (regardless of what we decide to do with Anti-deps). As for
> Anti-deps of distance 1, we have only one going back from insn 2 to
> insn 1, perhaps corresponding to addr, allowing reg_moves for def1(?).
> But, it won't help def1, because this other Anti-dep will force them
> to be scheduled w/o reg_moves.

Please ignore the edges in the previous example. It indeed was a mistake,
sorry about the confusion.  Here are two examples taken from bootstrap
on PPC of how the address is used; with the current patch applied and
running with -fmodulo-sched-allow-regmoves:

Node num: 2
(insn 3681 3678 3682 500 (set (reg:QI 2914 [ MEM[base: D.9586_4130,
offset: 0B] ])
        (mem:QI (pre_dec:DI (reg:DI 1644 [ ivtmp.687 ])) [0 MEM[base:
D.9586_4130, offset: 0B]+0 S1 A8])) ../../gcc/libiberty/regex.c:4259
358 {*movqi_internal}
     (expr_list:REG_INC (reg:DI 1644 [ ivtmp.687 ])
        (nil)))
OUT ARCS:  [3681 -(T,2,1)-> 3681]  [3681 -(T,2,0)-> 3682]
IN ARCS:  [3682 -(A,0,1)-> 3681]  [3681 -(T,2,1)-> 3681]  [3682
-(A,0,1)-> 3681]  [3682 -(T,2,1)-> 3681]
Node num: 3
(insn 3682 3681 3683 500 (set (mem:QI (plus:DI (reg:DI 1644 [ ivtmp.687 ])
                (const_int 3 [0x3])) [0 MEM[base: D.9586_4130, offset:
3B]+0 S1 A8])
        (reg:QI 2914 [ MEM[base: D.9586_4130, offset: 0B] ]))
../../gcc/libiberty/regex.c:4259 358 {*movqi_internal}
     (expr_list:REG_DEAD (reg:QI 2914 [ MEM[base: D.9586_4130, offset: 0B] ])
        (nil)))
OUT ARCS:  [3682 -(A,0,1)-> 3681]  [3682 -(A,0,1)-> 3681]  [3682
-(O,0,0)-> 7263]  [3682 -(A,0,0)-> 3683]  [3682 -(T,2,1)-> 3681]
IN ARCS:  [3681 -(T,2,0)-> 3682]

Another example of usage is as follows (the address register is not
used in MEM):

Node num: 0
(insn 1419 1415 1423 9 (set (mem/f:DI (pre_inc:DI (reg:DI 1882 [
ivtmp.1636 ])) [3 MEM[base: D.10911_2945, offset: 0B]+0 S8 A64])
        (reg/f:DI 3923)) ../../gcc/libiberty/regex.c:5788 378
{*movdi_internal64}
     (expr_list:REG_INC (reg:DI 1882 [ ivtmp.1636 ])
        (nil)))
OUT ARCS:  [1419 -(T,2,1)-> 1419]  [1419 -(O,0,0)-> 5932]  [1419
-(O,0,0)-> 1449]  [1419 -(T,2,1)-> 1434]  [1419 -(T,2,0)-> 1434]
[1419 -(T,2,0)-> 1432]  [1419 -(O,0,0)-> 1431]  [1419 -(O,0,0)-> 1427]
 [1419 -(O,0,0)-> 1423]
IN ARCS:  [1419 -(T,2,1)-> 1419]  [1432 -(A,0,1)-> 1419]  [1449
-(O,0,1)-> 1419]  [1434 -(A,0,1)-> 1419]  [1431 -(O,0,1)-> 1419]
[1427 -(O,0,1)-> 1419]  [1423 -(O,0,1)-> 1419]
Node num: 4
(insn 1432 1431 1433 9 (set (reg:DI 2632)
        (plus:DI (reg/v/f:DI 1058 [ reg_info ])
            (reg:DI 1882 [ ivtmp.1636 ])))
../../gcc/libiberty/regex.c:5543 79 {*adddi3_internal1}
     (nil))
OUT ARCS:  [1432 -(A,0,1)-> 1419]  [1432 -(T,1,0)-> 1433]
IN ARCS:  [1419 -(T,2,0)-> 1432]

>> OK for mainline?
>>
>
> OK, with the following comments:

Thanks, will address the comments and re-submit.

> In other words, one would expect to see two Anti-dep edges from insn 2
> to insn 1, right?

Yes, that's indeed the case in the first example above.

Thanks,
Revital
Ayal Zaks Sept. 27, 2011, 11:59 a.m. UTC | #3
On Tue, Sep 27, 2011 at 9:47 AM, Revital Eres <revital.eres@linaro.org> wrote:
> Hello,
>
>> ok, so if we have an auto-inc'ing insn which defines (auto-inc's) an
>> addr register and another (say, result) register, we want to allow the
>> result register to have life ranges in excess of ii (by eliminating
>> anti-dep edges of distance 1 from uses to def, and generating
>> reg_moves if/where needed), but avoid having such life ranges of addr
>> (by retaining such anti-dep edges). Right?
>
> Yes.
>>
>> Are these all the edges? We have only one True dependence edge from
>> insn 1 to insn 2, but insn 1 is setting two registers both used by
>> insn 2 (regardless of what we decide to do with Anti-deps). As for
>> Anti-deps of distance 1, we have only one going back from insn 2 to
>> insn 1, perhaps corresponding to addr, allowing reg_moves for def1(?).
>> But, it won't help def1, because this other Anti-dep will force them
>> to be scheduled w/o reg_moves.
>
> Please ignore the edges in the previous example. It indeed was a mistake,
> sorry about the confusion.  Here are two examples taken from bootstrap
> on PPC of how the address is used; with the current patch applied and
> running with -fmodulo-sched-allow-regmoves:
>

Ok, this does have two anti-dep edges. But still, only a single true
dependence(?) ... can you see why?

Thanks,
Ayal.


> Node num: 2
> (insn 3681 3678 3682 500 (set (reg:QI 2914 [ MEM[base: D.9586_4130,
> offset: 0B] ])
>        (mem:QI (pre_dec:DI (reg:DI 1644 [ ivtmp.687 ])) [0 MEM[base:
> D.9586_4130, offset: 0B]+0 S1 A8])) ../../gcc/libiberty/regex.c:4259
> 358 {*movqi_internal}
>     (expr_list:REG_INC (reg:DI 1644 [ ivtmp.687 ])
>        (nil)))
> OUT ARCS:  [3681 -(T,2,1)-> 3681]  [3681 -(T,2,0)-> 3682]
> IN ARCS:  [3682 -(A,0,1)-> 3681]  [3681 -(T,2,1)-> 3681]  [3682
> -(A,0,1)-> 3681]  [3682 -(T,2,1)-> 3681]
> Node num: 3
> (insn 3682 3681 3683 500 (set (mem:QI (plus:DI (reg:DI 1644 [ ivtmp.687 ])
>                (const_int 3 [0x3])) [0 MEM[base: D.9586_4130, offset:
> 3B]+0 S1 A8])
>        (reg:QI 2914 [ MEM[base: D.9586_4130, offset: 0B] ]))
> ../../gcc/libiberty/regex.c:4259 358 {*movqi_internal}
>     (expr_list:REG_DEAD (reg:QI 2914 [ MEM[base: D.9586_4130, offset: 0B] ])
>        (nil)))
> OUT ARCS:  [3682 -(A,0,1)-> 3681]  [3682 -(A,0,1)-> 3681]  [3682
> -(O,0,0)-> 7263]  [3682 -(A,0,0)-> 3683]  [3682 -(T,2,1)-> 3681]
> IN ARCS:  [3681 -(T,2,0)-> 3682]
>
> Another example of usage is as follows (the address register is not
> used in MEM):
>
> Node num: 0
> (insn 1419 1415 1423 9 (set (mem/f:DI (pre_inc:DI (reg:DI 1882 [
> ivtmp.1636 ])) [3 MEM[base: D.10911_2945, offset: 0B]+0 S8 A64])
>        (reg/f:DI 3923)) ../../gcc/libiberty/regex.c:5788 378
> {*movdi_internal64}
>     (expr_list:REG_INC (reg:DI 1882 [ ivtmp.1636 ])
>        (nil)))
> OUT ARCS:  [1419 -(T,2,1)-> 1419]  [1419 -(O,0,0)-> 5932]  [1419
> -(O,0,0)-> 1449]  [1419 -(T,2,1)-> 1434]  [1419 -(T,2,0)-> 1434]
> [1419 -(T,2,0)-> 1432]  [1419 -(O,0,0)-> 1431]  [1419 -(O,0,0)-> 1427]
>  [1419 -(O,0,0)-> 1423]
> IN ARCS:  [1419 -(T,2,1)-> 1419]  [1432 -(A,0,1)-> 1419]  [1449
> -(O,0,1)-> 1419]  [1434 -(A,0,1)-> 1419]  [1431 -(O,0,1)-> 1419]
> [1427 -(O,0,1)-> 1419]  [1423 -(O,0,1)-> 1419]
> Node num: 4
> (insn 1432 1431 1433 9 (set (reg:DI 2632)
>        (plus:DI (reg/v/f:DI 1058 [ reg_info ])
>            (reg:DI 1882 [ ivtmp.1636 ])))
> ../../gcc/libiberty/regex.c:5543 79 {*adddi3_internal1}
>     (nil))
> OUT ARCS:  [1432 -(A,0,1)-> 1419]  [1432 -(T,1,0)-> 1433]
> IN ARCS:  [1419 -(T,2,0)-> 1432]
>
>>> OK for mainline?
>>>
>>
>> OK, with the following comments:
>
> Thanks, will address the comments and re-submit.
>
>> In other words, one would expect to see two Anti-dep edges from insn 2
>> to insn 1, right?
>
> Yes, that's indeed the case in the first example above.
>
> Thanks,
> Revital
>
Revital Eres Sept. 27, 2011, 1:55 p.m. UTC | #4
Hello,

> Ok, this does have two anti-dep edges. But still, only a single true
> dependence(?) ... can you see why?

The intra edge [3681 -(T,2,0)-> 3682] was created by haifa-sched and I guess
that because both of the expected true-dep edges (one for the target
and one for the address) are identical only one is created.  The rest
of the inter edges were created in ddg.c where we do not check for
multiply identical edges.

Thanks,
Revital



>
> Thanks,
> Ayal.
>
>
>> Node num: 2
>> (insn 3681 3678 3682 500 (set (reg:QI 2914 [ MEM[base: D.9586_4130,
>> offset: 0B] ])
>>        (mem:QI (pre_dec:DI (reg:DI 1644 [ ivtmp.687 ])) [0 MEM[base:
>> D.9586_4130, offset: 0B]+0 S1 A8])) ../../gcc/libiberty/regex.c:4259
>> 358 {*movqi_internal}
>>     (expr_list:REG_INC (reg:DI 1644 [ ivtmp.687 ])
>>        (nil)))
>> OUT ARCS:  [3681 -(T,2,1)-> 3681]  [3681 -(T,2,0)-> 3682]
>> IN ARCS:  [3682 -(A,0,1)-> 3681]  [3681 -(T,2,1)-> 3681]  [3682
>> -(A,0,1)-> 3681]  [3682 -(T,2,1)-> 3681]
>> Node num: 3
>> (insn 3682 3681 3683 500 (set (mem:QI (plus:DI (reg:DI 1644 [ ivtmp.687 ])
>>                (const_int 3 [0x3])) [0 MEM[base: D.9586_4130, offset:
>> 3B]+0 S1 A8])
>>        (reg:QI 2914 [ MEM[base: D.9586_4130, offset: 0B] ]))
>> ../../gcc/libiberty/regex.c:4259 358 {*movqi_internal}
>>     (expr_list:REG_DEAD (reg:QI 2914 [ MEM[base: D.9586_4130, offset: 0B] ])
>>        (nil)))
>> OUT ARCS:  [3682 -(A,0,1)-> 3681]  [3682 -(A,0,1)-> 3681]  [3682
>> -(O,0,0)-> 7263]  [3682 -(A,0,0)-> 3683]  [3682 -(T,2,1)-> 3681]
>> IN ARCS:  [3681 -(T,2,0)-> 3682]
>>
>> Another example of usage is as follows (the address register is not
>> used in MEM):
>>
>> Node num: 0
>> (insn 1419 1415 1423 9 (set (mem/f:DI (pre_inc:DI (reg:DI 1882 [
>> ivtmp.1636 ])) [3 MEM[base: D.10911_2945, offset: 0B]+0 S8 A64])
>>        (reg/f:DI 3923)) ../../gcc/libiberty/regex.c:5788 378
>> {*movdi_internal64}
>>     (expr_list:REG_INC (reg:DI 1882 [ ivtmp.1636 ])
>>        (nil)))
>> OUT ARCS:  [1419 -(T,2,1)-> 1419]  [1419 -(O,0,0)-> 5932]  [1419
>> -(O,0,0)-> 1449]  [1419 -(T,2,1)-> 1434]  [1419 -(T,2,0)-> 1434]
>> [1419 -(T,2,0)-> 1432]  [1419 -(O,0,0)-> 1431]  [1419 -(O,0,0)-> 1427]
>>  [1419 -(O,0,0)-> 1423]
>> IN ARCS:  [1419 -(T,2,1)-> 1419]  [1432 -(A,0,1)-> 1419]  [1449
>> -(O,0,1)-> 1419]  [1434 -(A,0,1)-> 1419]  [1431 -(O,0,1)-> 1419]
>> [1427 -(O,0,1)-> 1419]  [1423 -(O,0,1)-> 1419]
>> Node num: 4
>> (insn 1432 1431 1433 9 (set (reg:DI 2632)
>>        (plus:DI (reg/v/f:DI 1058 [ reg_info ])
>>            (reg:DI 1882 [ ivtmp.1636 ])))
>> ../../gcc/libiberty/regex.c:5543 79 {*adddi3_internal1}
>>     (nil))
>> OUT ARCS:  [1432 -(A,0,1)-> 1419]  [1432 -(T,1,0)-> 1433]
>> IN ARCS:  [1419 -(T,2,0)-> 1432]
>>
>>>> OK for mainline?
>>>>
>>>
>>> OK, with the following comments:
>>
>> Thanks, will address the comments and re-submit.
>>
>>> In other words, one would expect to see two Anti-dep edges from insn 2
>>> to insn 1, right?
>>
>> Yes, that's indeed the case in the first example above.
>>
>> Thanks,
>> Revital
>>
>
diff mbox

Patch

Index: ddg.c
===================================================================
--- ddg.c	(revision 179138)
+++ ddg.c	(working copy)
@@ -145,6 +145,23 @@  mem_access_insn_p (rtx insn)
   return rtx_mem_access_p (PATTERN (insn));
 }
 
+/* Return true if DEF_INSN contains address been auto-inc or auto-dec
+   which is used in USE_INSN.  Otherwise return false.  The result is
+   been used to decide whether to remove the edge between def_insn and
+   use_insn when -fmodulo-sched-allow-regmoves is set.  */
+static bool
+autoinc_var_is_used_p (rtx def_insn, rtx use_insn)
+{
+  rtx note;
+
+  for (note = REG_NOTES (def_insn); note; note = XEXP (note, 1))
+    if (REG_NOTE_KIND (note) == REG_INC
+	&& reg_referenced_p (XEXP (note, 0), PATTERN (use_insn)))
+      return true;
+
+  return false;
+}
+
 /* Computes the dependence parameters (latency, distance etc.), creates
    a ddg_edge and adds it to the given DDG.  */
 static void
@@ -173,10 +190,15 @@  create_ddg_dep_from_intra_loop_link (ddg
      compensate for that by generating reg-moves based on the life-range
      analysis.  The anti-deps that will be deleted are the ones which
      have true-deps edges in the opposite direction (in other words
-     the kernel has only one def of the relevant register).  TODO:
-     support the removal of all anti-deps edges, i.e. including those
+     the kernel has only one def of the relevant register).
+     If the address that is been auto-inc or auto-dec in DEST_NODE
+     is used in SRC_NODE then do not remove the edge to make sure
+     reg-moves will not be created for this address.  
+     TODO: support the removal of all anti-deps edges, i.e. including those
      whose register has multiple defs in the loop.  */
-  if (flag_modulo_sched_allow_regmoves && (t == ANTI_DEP && dt == REG_DEP))
+  if (flag_modulo_sched_allow_regmoves 
+      && (t == ANTI_DEP && dt == REG_DEP)
+      && !autoinc_var_is_used_p (dest_node->insn, src_node->insn))
     {
       rtx set;
 
@@ -302,10 +324,14 @@  add_cross_iteration_register_deps (ddg_p
 	  gcc_assert (first_def_node);
 
          /* Always create the edge if the use node is a branch in
-            order to prevent the creation of reg-moves.  */
+            order to prevent the creation of reg-moves.  
+            If the address that is been auto-inc or auto-dec in LAST_DEF
+            is used in USE_INSN then do not remove the edge to make sure
+            reg-moves will not be created for that address.  */
           if (DF_REF_ID (last_def) != DF_REF_ID (first_def)
               || !flag_modulo_sched_allow_regmoves
-	      || JUMP_P (use_node->insn))
+	      || JUMP_P (use_node->insn)
+              || autoinc_var_is_used_p (DF_REF_INSN (last_def), use_insn))
             create_ddg_dep_no_link (g, use_node, first_def_node, ANTI_DEP,
                                     REG_DEP, 1);
 
Index: modulo-sched.c
===================================================================
--- modulo-sched.c	(revision 179138)
+++ modulo-sched.c	(working copy)
@@ -1266,12 +1271,10 @@  sms_schedule (void)
 	continue;
       }
 
-      /* Don't handle BBs with calls or barriers or auto-increment insns 
-	 (to avoid creating invalid reg-moves for the auto-increment insns),
+      /* Don't handle BBs with calls or barriers
 	 or !single_set with the exception of instructions that include
 	 count_reg---these instructions are part of the control part
 	 that do-loop recognizes.
-         ??? Should handle auto-increment insns.
          ??? Should handle insns defining subregs.  */
      for (insn = head; insn != NEXT_INSN (tail); insn = NEXT_INSN (insn))
       {
@@ -1282,7 +1285,6 @@  sms_schedule (void)
             || (NONDEBUG_INSN_P (insn) && !JUMP_P (insn)
                 && !single_set (insn) && GET_CODE (PATTERN (insn)) != USE
                 && !reg_mentioned_p (count_reg, insn))
-            || (FIND_REG_INC_NOTE (insn, NULL_RTX) != 0)
             || (INSN_P (insn) && (set = single_set (insn))
                 && GET_CODE (SET_DEST (set)) == SUBREG))
         break;
@@ -1296,8 +1298,6 @@  sms_schedule (void)
 		fprintf (dump_file, "SMS loop-with-call\n");
 	      else if (BARRIER_P (insn))
 		fprintf (dump_file, "SMS loop-with-barrier\n");
-              else if (FIND_REG_INC_NOTE (insn, NULL_RTX) != 0)
-                fprintf (dump_file, "SMS reg inc\n");
               else if ((NONDEBUG_INSN_P (insn) && !JUMP_P (insn)
                 && !single_set (insn) && GET_CODE (PATTERN (insn)) != USE))
                 fprintf (dump_file, "SMS loop-with-not-single-set\n");