
patch for PR53125

Message ID 4FAD2270.7030103@redhat.com
State New

Commit Message

Vladimir Makarov May 11, 2012, 2:30 p.m. UTC
On 05/11/2012 08:17 AM, Richard Earnshaw wrote:
> On 10/05/12 20:58, Vladimir Makarov wrote:
>> The following patch is for PR53125.  The PR is described on
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53125.
>>
>> The patch improves the compilation speed by 35% for the case.
>>
>> The patch was successfully bootstrapped on x86-64.
>>
>> Committed as rev. 187373.
>>
>> 2012-05-10  Vladimir Makarov<vmakarov@redhat.com>
>>
>>           PR rtl-optimization/53125
>>           * ira.c (ira): Call find_moveable_pseudos and
>>           move_unallocated_pseudos if only ira_conflicts_p is true.
>>
>>
>
> ENOPATCH?
>
Sorry for omitting the patch.  It is attached now.

Comments

Steven Bosscher May 11, 2012, 6:42 p.m. UTC | #1
>>> 2012-05-10  Vladimir Makarov<vmakarov@redhat.com>
>>>
>>>          PR rtl-optimization/53125
>>>          * ira.c (ira): Call find_moveable_pseudos and
>>>          move_unallocated_pseudos if only ira_conflicts_p is true.

And the attached patch fixes the reginfo slowness.

The reginfo pass was doing:

  for each bb:
    for each insn in bb, from BB_END(bb) to BB_HEAD(bb):
      for each reg that died between BB_END(bb) and insn:  // i.e. the reg is live at insn
        REG_LIVE_LENGTH(reg)++

With very large basic blocks, the kind of code in the test case of the
PR results in many live registers on SPARC (for x86_64 there are far
fewer live registers). For SPARC, there are O(1e5) insns and O(1e4)
registers live for most of the basic block. That is effectively almost
quadratic behavior in the number of insns per basic block.

But the above loop is a bit silly. It is much easier and
computationally cheaper to just remember at what insn reg died (last
used), and add to REG_LIVE_LENGTH the distance from the insn that sets
reg to the insn that used reg.
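
A minimal standalone sketch of that scheme (not the actual regstat.c
code; the fixed array size, the single-use/single-set-per-insn model,
and ignoring registers live across block boundaries are simplifications):

#define MAX_REGS 1024                   /* hypothetical bound for the sketch */

static int  reg_live_length[MAX_REGS];  /* models REG_LIVE_LENGTH            */
static int  death_luid[MAX_REGS];       /* luid of the insn where a reg died */
static char live[MAX_REGS];             /* live at this point of the scan    */

/* Walk the N_INSNS insns of one block backwards, numbering them with a
   local luid that counts up from 0 at BB_END.  INSN_USE[i] and INSN_SET[i]
   give the single register used or set by insn i, or -1 for none.  */
static void
compute_live_lengths (int n_insns, const int *insn_use, const int *insn_set)
{
  for (int luid = 0; luid < n_insns; luid++)
    {
      int i = n_insns - 1 - luid;       /* scan from BB_END to BB_HEAD */

      int u = insn_use[i];
      if (u >= 0 && !live[u])
        {
          /* The first use seen in the backward scan is the death.  */
          live[u] = 1;
          death_luid[u] = luid;
        }

      int s = insn_set[i];
      if (s >= 0 && live[s])
        {
          /* One addition per definition, instead of one increment per
             insn for which the register is live.  */
          reg_live_length[s] += luid - death_luid[s];
          live[s] = 0;
        }
    }
}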

It turns out that (before or after the patch) partial or conditional
sets never kill a register, so that REG_LIVE_LENGTH for registers set
by partial or conditional stores is not very accurate. I have a few
ideas for how to improve that situation a bit, but I'm not sure if
it's worth the trouble.
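
As a rough illustration (plain C, not from the patch) of why such stores
do not kill the old value, and hence why the bookkeeping for them stays
approximate:

/* A conditional set: on the path where COND is false, the old value of X
   is still needed after the store, so X stays live across it.  */
int
conditional_set (int x, int y, int cond)
{
  if (cond)
    x = y;
  return x;
}

/* Semantically a partial set of the low 32 bits (what a strict_low_part
   or subreg store expresses at the RTL level): the high 32 bits of the
   old X survive, so the old value is not killed either.  */
long long
partial_set (long long x, int y)
{
  x = (x & ~0xffffffffLL) | (unsigned int) y;
  return x;
}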

I also think that the computation of REG_FREQ and
REG_FREQ_CALLS_CROSSED has been wrong for some time. The maximum
REG_FREQ is REG_FREQ_MAX (see regs.h) but regstat.c just accumulates
REG_FREQ_FROM_BB. I've added a few lines to limit the REG_FREQ to
REG_FREQ_MAX, e.g.:

                  REG_FREQ (dregno) += REG_FREQ_FROM_BB (bb);
+                 REG_FREQ (dregno) =
+                   MIN (REG_FREQ (dregno), REG_FREQ_MAX);

With this patch, the reginfo pass disappears from the time report for
the test case attached to PR53125, compiled at -O0.

Bootstrapped and tested on x86_64-unknown-linux-gnu, and I verified
for a few simple test cases that the computed REG_LIVE_LENGTHs are
unchanged. OK for trunk?

Ciao!
Steven
Steven Bosscher May 11, 2012, 6:42 p.m. UTC | #2
Now with patch

On Fri, May 11, 2012 at 8:42 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
>>>> 2012-05-10  Vladimir Makarov<vmakarov@redhat.com>
>>>>
>>>>          PR rtl-optimization/53125
>>>>          * ira.c (ira): Call find_moveable_pseudos and
>>>>          move_unallocated_pseudos if only ira_conflicts_p is true.
>
> And the attached patch fixes the reginfo slowness.
>
> The reginfo pass was doing:
>
>  for each bb:
>    for each insn in bb, from BB_END(bb) to BB_HEAD(bb):
>      for each reg that died between BB_END(bb) and insn: // i.e.
> register is live at insn
>        REG_LIVE_LENGTH(reg)++
>
> With very large basic blocks, the kind of code in the test case of the
> PR results in many live registers on SPARC (for x86_64 there are far
> fewer live registers). For SPARC, there are O(1e5) insns and O(1e4)
> registers live for most of the basic block. That is effectively almost
> quadratic behavior in the number of insns per basic block.
>
> But the above loop is a bit silly. It is much easier and
> computationally cheaper to just remember at what insn reg died (last
> used), and add to REG_LIVE_LENGTH the distance from the insn that sets
> reg to the insn that used reg.
>
> It turns out that (before or after the patch) partial or conditional
> sets never kill a register, so that REG_LIVE_LENGTH for registers set
> by partial or conditional stores is not very accurate. I have a few
> ideas for how to improve that situation a bit, but I'm not sure if
> it's worth the trouble.
>
> I also think that the computation of REG_FREQ and
> REG_FREQ_CALLS_CROSSED has been wrong for some time. The maximum
> REG_FREQ is REG_FREQ_MAX (see regs.h) but regstat.c just accumulates
> REG_FREQ_FROM_BB. I've added a few lines to limit the REG_FREQ to
> REG_FREQ_MAX, e.g.:
>
>                  REG_FREQ (dregno) += REG_FREQ_FROM_BB (bb);
> +                 REG_FREQ (dregno) =
> +                   MIN (REG_FREQ (dregno), REG_FREQ_MAX);
>
> With this patch, the reginfo pass disappears from the time report for
> the test case attached to PR53125, compiled at -O0.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, and I verified
> for a few simple test cases that the computed REG_LIVE_LENGTHs are
> unchanged. OK for trunk?
>
> Ciao!
> Steven
Steven Bosscher May 15, 2012, 11:59 a.m. UTC | #3
On Fri, May 11, 2012 at 8:42 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
> Now with patch
>
> On Fri, May 11, 2012 at 8:42 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
>>>>> 2012-05-10  Vladimir Makarov<vmakarov@redhat.com>
>>>>>
>>>>>          PR rtl-optimization/53125
>>>>>          * ira.c (ira): Call find_moveable_pseudos and
>>>>>          move_unallocated_pseudos if only ira_conflicts_p is true.
>>
>> And the attached patch fixes the reginfo slowness.
>>
>> The reginfo pass was doing:
>>
>>  for each bb:
>>    for each insn in bb, from BB_END(bb) to BB_HEAD(bb):
>>      for each reg that died between BB_END(bb) and insn: // i.e.
>> register is live at insn
>>        REG_LIVE_LENGTH(reg)++
>>
>> With very large basic blocks, the kind of code in the test case of the
>> PR results in many live registers on SPARC (for x86_64 there are far
>> fewer live registers). For SPARC, there are O(1e5) insns and O(1e4)
>> registers live for most of the basic block. That is effectively almost
>> quadratic behavior in the number of insns per basic block.
>>
>> But the above loop is a bit silly. It is much easier and
>> computationally cheaper to just remember at what insn reg died (last
>> used), and add to REG_LIVE_LENGTH the distance from the insn that sets
>> reg to the insn that used reg.
>>
>> It turns out that (before or after the patch) partial or conditional
>> sets never kill a register, so that REG_LIVE_LENGTH for registers set
>> by partial or conditional stores is not very accurate. I have a few
>> ideas for how to improve that situation a bit, but I'm not sure if
>> it's worth the trouble.
>>
>> I also think that the computation of REG_FREQ and
>> REG_FREQ_CALLS_CROSSED has been wrong for some time. The maximum
>> REG_FREQ is REG_FREQ_MAX (see regs.h) but regstat.c just accumulates
>> REG_FREQ_FROM_BB. I've added a few lines to limit the REG_FREQ to
>> REG_FREQ_MAX, e.g.:
>>
>>                  REG_FREQ (dregno) += REG_FREQ_FROM_BB (bb);
>> +                 REG_FREQ (dregno) =
>> +                   MIN (REG_FREQ (dregno), REG_FREQ_MAX);
>>
>> With this patch, the reginfo pass disappears from the time report for
>> the test case attached to PR53125, compiled at -O0.
>>
>> Bootstrapped and tested on x86_64-unknown-linux-gnu, and I verified
>> for a few simple test cases that the computed REG_LIVE_LENGTHs are
>> unchanged. OK for trunk?
>>
>> Ciao!
>> Steven

Ping?
Vladimir Makarov May 15, 2012, 8:38 p.m. UTC | #4
On 05/15/2012 07:59 AM, Steven Bosscher wrote:
> On Fri, May 11, 2012 at 8:42 PM, Steven Bosscher<stevenb.gcc@gmail.com>  wrote:
>> Now with patch
>>
>> On Fri, May 11, 2012 at 8:42 PM, Steven Bosscher<stevenb.gcc@gmail.com>  wrote:
>>>>>> 2012-05-10  Vladimir Makarov<vmakarov@redhat.com>
>>>>>>
>>>>>>           PR rtl-optimization/53125
>>>>>>           * ira.c (ira): Call find_moveable_pseudos and
>>>>>>           move_unallocated_pseudos if only ira_conflicts_p is true.
>>> And the attached patch fixes the reginfo slowness.
>>>
>>> The reginfo pass was doing:
>>>
>>>   for each bb:
>>>     for each insn in bb, from BB_END(bb) to BB_HEAD(bb):
>>>       for each reg that died between BB_END(bb) and insn: // i.e.
>>> register is live at insn
>>>         REG_LIVE_LENGTH(reg)++
>>>
>>> With very large basic blocks, the kind of code in the test case of the
>>> PR results in many live registers on SPARC (for x86_64 there are far
>>> fewer live registers). For SPARC, there are O(1e5) insns and O(1e4)
>>> registers live for most of the basic block. That is effectively almost
>>> quadratic behavior in the number of insns per basic block.
>>>
>>> But the above loop is a bit silly. It is much easier and
>>> computationally cheaper to just remember at what insn reg died (last
>>> used), and add to REG_LIVE_LENGTH the distance from the insn that sets
>>> reg to the insn that used reg.
>>>
>>> It turns out that (before or after the patch) partial or conditional
>>> sets never kill a register, so that REG_LIVE_LENGTH for registers set
>>> by partial or conditional stores is not very accurate. I have a few
>>> ideas for how to improve that situation a bit, but I'm not sure if
>>> it's worth the trouble.
>>>
>>> I also think that the computation of REG_FREQ and
>>> REG_FREQ_CALLS_CROSSED has been wrong for some time. The maximum
>>> REG_FREQ is REG_FREQ_MAX (see regs.h) but regstat.c just accumulates
>>> REG_FREQ_FROM_BB. I've added a few lines to limit the REG_FREQ to
>>> REG_FREQ_MAX, e.g.:
>>>
>>>                   REG_FREQ (dregno) += REG_FREQ_FROM_BB (bb);
>>> +                 REG_FREQ (dregno) =
>>> +                   MIN (REG_FREQ (dregno), REG_FREQ_MAX);
>>>
>>> With this patch, the reginfo pass disappears from the time report for
>>> the test case attached to PR53125, compiled at -O0.
>>>
>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, and I verified
>>> for a few simple test cases that the computed REG_LIVE_LENGTHs are
>>> unchanged. OK for trunk?
Sorry, Steven.  I believe that regstat.c is part of the DF
infrastructure; at least, it was introduced by that work.  That is why
I did not respond to your email, and I probably do not have the right
to approve the patch.

Maybe Kenny or Paolo (as the most active reviewers of the DF
infrastructure these days) can help you with the approval.
Paolo Bonzini May 16, 2012, 6:27 a.m. UTC | #5
> Now with patch
> 
> On Fri, May 11, 2012 at 8:42 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
>>>>> 2012-05-10  Vladimir Makarov<vmakarov@redhat.com>
>>>>>
>>>>>          PR rtl-optimization/53125
>>>>>          * ira.c (ira): Call find_moveable_pseudos and
>>>>>          move_unallocated_pseudos if only ira_conflicts_p is true.
>> And the attached patch fixes the reginfo slowness.
>>
>> The reginfo pass was doing:
>>
>>  for each bb:
>>    for each insn in bb, from BB_END(bb) to BB_HEAD(bb):
>>      for each reg that died between BB_END(bb) and insn: // i.e.
>> register is live at insn
>>        REG_LIVE_LENGTH(reg)++
>>
>> With very large basic blocks, the kind of code in the test case of the
>> PR results in many live registers on SPARC (for x86_64 there are far
>> fewer live registers). For SPARC, there are O(1e5) insns and O(1e4)
>> registers live for most of the basic block. That is effectively almost
>> quadratic behavior in the number of insns per basic block.
>>
>> But the above loop is a bit silly. It is much easier and
>> computationally cheaper to just remember at what insn reg died (last
>> used), and add to REG_LIVE_LENGTH the distance from the insn that sets
>> reg to the insn that used reg.

Yes, this makes a lot of sense.  Patch ok with one typo fixed.

>> It turns out that (before or after the patch) partial or conditional
>> sets never kill a register, so that REG_LIVE_LENGTH for registers set
>> by partial or conditional stores is not very accurate.

But this is correct, isn't it?  The live range for such registers does
indeed extend before and after the store.

> @@ -310,22 +322,26 @@ regstat_bb_compute_ri (unsigned int bb_i
>  
>  	  if (bitmap_set_bit (live, uregno))
>  	    {
> -	      /* This register is now live.  */
> +	      /* This register is now live.  Begin to process it locally.
>  
> -	      /* If we have seen this regno, then it has already been
> -		 processed correctly with the per insn increment.  If
> -		 we have not seen it we set the bit so that begins to
> -		 get processed locally.  Note that we don't even get
> -		 here if the variable was live at the end of the block
> -		 since just a ref inside the block does not effect the
> -		 calculations.  */
> +		 Note that we don't even get here if the variable was live
> +		 at the end of the block since just a ref inside the block
> +		 does not effect the calculations.  */
>  	      REG_LIVE_LENGTH (uregno) ++;
> +	      local_live_last_luid[uregno] = luid;
>  	      bitmap_set_bit (local_live, uregno);
>  	      bitmap_set_bit (local_processed, uregno);
>  	    }
>  	}
>      }
>  
> +  /* Add the liveness length to all registers that were used somewhere
> +     in this bock, but not between that use and the head of this block.  */

Typo is here ("bock"->"block").

Paolo

Patch

Index: ira.c
===================================================================
--- ira.c       (revision 187372)
+++ ira.c       (working copy)
@@ -4125,7 +4125,12 @@  ira (FILE *f)
     }

   allocated_reg_info_size = max_reg_num ();
-  find_moveable_pseudos ();
+
+  /* It is not worth to do such improvement when we use a simple
+     allocation because of -O0 usage or because the function is too
+     big.  */
+  if (ira_conflicts_p)
+    find_moveable_pseudos ();

   max_regno_before_ira = max_reg_num ();
   ira_setup_eliminable_regset ();
@@ -4234,7 +4239,10 @@  ira (FILE *f)
              max_regno * sizeof (struct ira_spilled_reg_stack_slot));
     }
   allocate_initial_values (reg_equivs);
-  move_unallocated_pseudos ();
+
+  /* See comment for find_moveable_pseudos call.  */
+  if (ira_conflicts_p)
+    move_unallocated_pseudos ();
 }

 static void