Message ID | 4FAD2270.7030103@redhat.com |
---|---|
State | New |
Headers | show |
>>> 2012-05-10 Vladimir Makarov<vmakarov@redhat.com> >>> >>> PR rtl-optimization/53125 >>> * ira.c (ira): Call find_moveable_pseudos and >>> move_unallocated_pseudos if only ira_conflicts_p is true. And the attached patch fixes the reginfo slowness. The reginfo pass was doing: for each bb: for each insn in bb, from BB_END(bb) to BB_HEAD(bb): for each reg that died between BB_END(bb) and insn: // i.e. register is live at insn REG_LIVE_LENGTH(reg)++ With very large basic blocks, and the kind of code for the test case of the PR results in many live registers for SPARC (for x86_64 there are far fewer live registers). For SPARC, there are O(1e5) insns and O(1e4) registers live for most of the basic block. That is effectively almost quadratic behavior in the number of insns per basic block. But the above loop is a bit silly. It is much easier and computationally cheaper to just remember at what insn reg died (last used), and add to REG_LIVE_LENGTH the distance from the insn that sets reg to the insn that used reg. It turns out that (before or after the patch) partial or conditional sets never kill a register, so that REG_LIVE_LENGTH for registers set by partial or conditional stores is not very accurate. I have a few ideas for how to improve that situation a bit, but I'm not sure if it's worth the trouble. I also think that the computation of REG_FREQ and REG_FREQ_CALLS_CROSSED has been wrong for some time. The maximum REG_FREQ is REG_FREQ_MAX (see regs.h) but regstat.c just accumulates REG_FREQ_FROM_BB. I've added a few lines to limit the REG_FREQ to REG_FREQ_MAX, e.g.: REG_FREQ (dregno) += REG_FREQ_FROM_BB (bb); + REG_FREQ (dregno) = + MIN (REG_FREQ (dregno), REG_FREQ_MAX); With this patch, the reginfo pass disappears from the time report for the test case attached to PR53125, compiled at -O0. Bootstrapped and tested on x86_64-unknown-linux-gnu, and I verified for a few simple test cases that the computed REG_LIVE_LENGHTs are unchanged. OK for trunk? Ciao! Steven
Now with patch On Fri, May 11, 2012 at 8:42 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote: >>>> 2012-05-10 Vladimir Makarov<vmakarov@redhat.com> >>>> >>>> PR rtl-optimization/53125 >>>> * ira.c (ira): Call find_moveable_pseudos and >>>> move_unallocated_pseudos if only ira_conflicts_p is true. > > And the attached patch fixes the reginfo slowness. > > The reginfo pass was doing: > > for each bb: > for each insn in bb, from BB_END(bb) to BB_HEAD(bb): > for each reg that died between BB_END(bb) and insn: // i.e. > register is live at insn > REG_LIVE_LENGTH(reg)++ > > With very large basic blocks, and the kind of code for the test case > of the PR results in many live registers for SPARC (for x86_64 there > are far fewer live registers). For SPARC, there are O(1e5) insns and > O(1e4) registers live for most of the basic block. That is effectively > almost quadratic behavior in the number of insns per basic block. > > But the above loop is a bit silly. It is much easier and > computationally cheaper to just remember at what insn reg died (last > used), and add to REG_LIVE_LENGTH the distance from the insn that sets > reg to the insn that used reg. > > It turns out that (before or after the patch) partial or conditional > sets never kill a register, so that REG_LIVE_LENGTH for registers set > by partial or conditional stores is not very accurate. I have a few > ideas for how to improve that situation a bit, but I'm not sure if > it's worth the trouble. > > I also think that the computation of REG_FREQ and > REG_FREQ_CALLS_CROSSED has been wrong for some time. The maximum > REG_FREQ is REG_FREQ_MAX (see regs.h) but regstat.c just accumulates > REG_FREQ_FROM_BB. I've added a few lines to limit the REG_FREQ to > REG_FREQ_MAX, e.g.: > > REG_FREQ (dregno) += REG_FREQ_FROM_BB (bb); > + REG_FREQ (dregno) = > + MIN (REG_FREQ (dregno), REG_FREQ_MAX); > > With this patch, the reginfo pass disappears from the time report for > the test case attached to PR53125, compiled at -O0. > > Bootstrapped and tested on x86_64-unknown-linux-gnu, and I verified > for a few simple test cases that the computed REG_LIVE_LENGHTs are > unchanged. OK for trunk? > > Ciao! > Steven
On Fri, May 11, 2012 at 8:42 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote: > Now with patch > > On Fri, May 11, 2012 at 8:42 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote: >>>>> 2012-05-10 Vladimir Makarov<vmakarov@redhat.com> >>>>> >>>>> PR rtl-optimization/53125 >>>>> * ira.c (ira): Call find_moveable_pseudos and >>>>> move_unallocated_pseudos if only ira_conflicts_p is true. >> >> And the attached patch fixes the reginfo slowness. >> >> The reginfo pass was doing: >> >> for each bb: >> for each insn in bb, from BB_END(bb) to BB_HEAD(bb): >> for each reg that died between BB_END(bb) and insn: // i.e. >> register is live at insn >> REG_LIVE_LENGTH(reg)++ >> >> With very large basic blocks, and the kind of code for the test case >> of the PR results in many live registers for SPARC (for x86_64 there >> are far fewer live registers). For SPARC, there are O(1e5) insns and >> O(1e4) registers live for most of the basic block. That is effectively >> almost quadratic behavior in the number of insns per basic block. >> >> But the above loop is a bit silly. It is much easier and >> computationally cheaper to just remember at what insn reg died (last >> used), and add to REG_LIVE_LENGTH the distance from the insn that sets >> reg to the insn that used reg. >> >> It turns out that (before or after the patch) partial or conditional >> sets never kill a register, so that REG_LIVE_LENGTH for registers set >> by partial or conditional stores is not very accurate. I have a few >> ideas for how to improve that situation a bit, but I'm not sure if >> it's worth the trouble. >> >> I also think that the computation of REG_FREQ and >> REG_FREQ_CALLS_CROSSED has been wrong for some time. The maximum >> REG_FREQ is REG_FREQ_MAX (see regs.h) but regstat.c just accumulates >> REG_FREQ_FROM_BB. I've added a few lines to limit the REG_FREQ to >> REG_FREQ_MAX, e.g.: >> >> REG_FREQ (dregno) += REG_FREQ_FROM_BB (bb); >> + REG_FREQ (dregno) = >> + MIN (REG_FREQ (dregno), REG_FREQ_MAX); >> >> With this patch, the reginfo pass disappears from the time report for >> the test case attached to PR53125, compiled at -O0. >> >> Bootstrapped and tested on x86_64-unknown-linux-gnu, and I verified >> for a few simple test cases that the computed REG_LIVE_LENGHTs are >> unchanged. OK for trunk? >> >> Ciao! >> Steven Ping?
On 05/15/2012 07:59 AM, Steven Bosscher wrote: > On Fri, May 11, 2012 at 8:42 PM, Steven Bosscher<stevenb.gcc@gmail.com> wrote: >> Now with patch >> >> On Fri, May 11, 2012 at 8:42 PM, Steven Bosscher<stevenb.gcc@gmail.com> wrote: >>>>>> 2012-05-10 Vladimir Makarov<vmakarov@redhat.com> >>>>>> >>>>>> PR rtl-optimization/53125 >>>>>> * ira.c (ira): Call find_moveable_pseudos and >>>>>> move_unallocated_pseudos if only ira_conflicts_p is true. >>> And the attached patch fixes the reginfo slowness. >>> >>> The reginfo pass was doing: >>> >>> for each bb: >>> for each insn in bb, from BB_END(bb) to BB_HEAD(bb): >>> for each reg that died between BB_END(bb) and insn: // i.e. >>> register is live at insn >>> REG_LIVE_LENGTH(reg)++ >>> >>> With very large basic blocks, and the kind of code for the test case >>> of the PR results in many live registers for SPARC (for x86_64 there >>> are far fewer live registers). For SPARC, there are O(1e5) insns and >>> O(1e4) registers live for most of the basic block. That is effectively >>> almost quadratic behavior in the number of insns per basic block. >>> >>> But the above loop is a bit silly. It is much easier and >>> computationally cheaper to just remember at what insn reg died (last >>> used), and add to REG_LIVE_LENGTH the distance from the insn that sets >>> reg to the insn that used reg. >>> >>> It turns out that (before or after the patch) partial or conditional >>> sets never kill a register, so that REG_LIVE_LENGTH for registers set >>> by partial or conditional stores is not very accurate. I have a few >>> ideas for how to improve that situation a bit, but I'm not sure if >>> it's worth the trouble. >>> >>> I also think that the computation of REG_FREQ and >>> REG_FREQ_CALLS_CROSSED has been wrong for some time. The maximum >>> REG_FREQ is REG_FREQ_MAX (see regs.h) but regstat.c just accumulates >>> REG_FREQ_FROM_BB. I've added a few lines to limit the REG_FREQ to >>> REG_FREQ_MAX, e.g.: >>> >>> REG_FREQ (dregno) += REG_FREQ_FROM_BB (bb); >>> + REG_FREQ (dregno) = >>> + MIN (REG_FREQ (dregno), REG_FREQ_MAX); >>> >>> With this patch, the reginfo pass disappears from the time report for >>> the test case attached to PR53125, compiled at -O0. >>> >>> Bootstrapped and tested on x86_64-unknown-linux-gnu, and I verified >>> for a few simple test cases that the computed REG_LIVE_LENGHTs are >>> unchanged. OK for trunk? Sorry, Steven. I believe that regstat.c is a part of DF-infrastructure. At least, it was introduced by this work. Therefore I did not respond to your email and probably can not have a right to approve the patch. May be Kenny or Paolo (as the most active reviewers of DF-infrastructure these days) can help you with the approval.
> Now with patch > > On Fri, May 11, 2012 at 8:42 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote: >>>>> 2012-05-10 Vladimir Makarov<vmakarov@redhat.com> >>>>> >>>>> PR rtl-optimization/53125 >>>>> * ira.c (ira): Call find_moveable_pseudos and >>>>> move_unallocated_pseudos if only ira_conflicts_p is true. >> And the attached patch fixes the reginfo slowness. >> >> The reginfo pass was doing: >> >> for each bb: >> for each insn in bb, from BB_END(bb) to BB_HEAD(bb): >> for each reg that died between BB_END(bb) and insn: // i.e. >> register is live at insn >> REG_LIVE_LENGTH(reg)++ >> >> With very large basic blocks, and the kind of code for the test case >> of the PR results in many live registers for SPARC (for x86_64 there >> are far fewer live registers). For SPARC, there are O(1e5) insns and >> O(1e4) registers live for most of the basic block. That is effectively >> almost quadratic behavior in the number of insns per basic block. >> >> But the above loop is a bit silly. It is much easier and >> computationally cheaper to just remember at what insn reg died (last >> used), and add to REG_LIVE_LENGTH the distance from the insn that sets >> reg to the insn that used reg. Yes, this makes a lot of sense. Patch ok with one typo fixed. >> It turns out that (before or after the patch) partial or conditional >> sets never kill a register, so that REG_LIVE_LENGTH for registers set >> by partial or conditional stores is not very accurate. But this is correct, isn't it? The live range for such registers is indeed extending before and after the store. > @@ -310,22 +322,26 @@ regstat_bb_compute_ri (unsigned int bb_i > > if (bitmap_set_bit (live, uregno)) > { > - /* This register is now live. */ > + /* This register is now live. Begin to process it locally. > > - /* If we have seen this regno, then it has already been > - processed correctly with the per insn increment. If > - we have not seen it we set the bit so that begins to > - get processed locally. Note that we don't even get > - here if the variable was live at the end of the block > - since just a ref inside the block does not effect the > - calculations. */ > + Note that we don't even get here if the variable was live > + at the end of the block since just a ref inside the block > + does not effect the calculations. */ > REG_LIVE_LENGTH (uregno) ++; > + local_live_last_luid[uregno] = luid; > bitmap_set_bit (local_live, uregno); > bitmap_set_bit (local_processed, uregno); > } > } > } > > + /* Add the liveness length to all registers that were used somewhere > + in this bock, but not between that use and the head of this block. */ Typo is here ("bock"->"block"). Paolo
Index: ira.c =================================================================== --- ira.c (revision 187372) +++ ira.c (working copy) @@ -4125,7 +4125,12 @@ ira (FILE *f) } allocated_reg_info_size = max_reg_num (); - find_moveable_pseudos (); + + /* It is not worth to do such improvement when we use a simple + allocation because of -O0 usage or because the function is too + big. */ + if (ira_conflicts_p) + find_moveable_pseudos (); max_regno_before_ira = max_reg_num (); ira_setup_eliminable_regset (); @@ -4234,7 +4239,10 @@ ira (FILE *f) max_regno * sizeof (struct ira_spilled_reg_stack_slot)); } allocate_initial_values (reg_equivs); - move_unallocated_pseudos (); + + /* See comment for find_moveable_pseudos call. */ + if (ira_conflicts_p) + move_unallocated_pseudos (); } static void