Patchwork [lra] 3rd patch to speed more compilation of PR54146

login
register
mail settings
Submitter Vladimir Makarov
Date Oct. 9, 2012, 4:10 p.m.
Message ID <50744C5D.2010501@redhat.com>
Download mbox | patch
Permalink /patch/190368/
State New
Headers show

Comments

Vladimir Makarov - Oct. 9, 2012, 4:10 p.m.
On 10/08/2012 05:14 PM, Steven Bosscher wrote:
> On Mon, Oct 8, 2012 at 10:26 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:
>
>> So I checked it on big file with > hundred functionson Intel machine and got
>>
>> before a part of the patch implementing the insn stack as sbitmap
>> real=243.40 user=241.61 system=1.00
>>
>> after the part of the patch:
>> real=244.89 user=243.02 system=0.96
> Is that more than just noise, you think? A ~1.5s difference on ~240
> total isn't very much. I measured the timings on my set of cc1-i
> files, and sometimes the without-patch compiler was faster by a tiny
> amount, and sometimes it was slower. Even on an average of 10 runs I
> really couldn't say that the patch was a win or loss on the whole.
>
> I measured this on a mostly idle machine at home, not gcc17, which
> seems to be even more busy than usual lately, thanks to you and me :-)
>
>
Sorry, Steven.  It was a noise.  I ran it again now 3 times  and found 
that was a noise.

After some thinking, I realized that sbitmap is really the best 
representation for this particular case.  That is because at the 1st 
constraint pass we have all bits are set as we process *all* insns on 
the 1st pass.  So sbitmap (at least before the extension and if we have 
pretty compact UIDs) will be always smaller than bitmap.

I committed the following your patch to the branch (as rev. 192264).

And again, sorry for the false conclusions.

2012-10-09  Steven Bosscher  <steven@gcc.gnu.org>

         * lra-int.h (lra_constraint_insn_stack_bitmap,
         lra_constraint_insn_stack): Remove.
         (lra_pop_insn, lra_insn_stack_length): New prototypes.
         * lra.c (lra_constraint_insn_stack_bitmap): Make static sbitmap.
         (lra_constraint_insn_stack): Make static.
         (lra_push_insn_1): New function.
         (lra_push_insn): Rewrite using lra_push_insn_1.
         (lra_push_insn_and_update_insn_regno_info): Likewise.
         (lra_pop_insn, lra_insn_stack_length): New functions.
         * lra_constraints.c (lra_constraints): Use new interface to
         insns stack instead of manipulating in-place.
Richard Guenther - Oct. 9, 2012, 5:06 p.m.
On Tue, Oct 9, 2012 at 6:10 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:
> On 10/08/2012 05:14 PM, Steven Bosscher wrote:
>>
>> On Mon, Oct 8, 2012 at 10:26 PM, Vladimir Makarov <vmakarov@redhat.com>
>> wrote:
>>
>>> So I checked it on big file with > hundred functionson Intel machine and
>>> got
>>>
>>> before a part of the patch implementing the insn stack as sbitmap
>>> real=243.40 user=241.61 system=1.00
>>>
>>> after the part of the patch:
>>> real=244.89 user=243.02 system=0.96
>>
>> Is that more than just noise, you think? A ~1.5s difference on ~240
>> total isn't very much. I measured the timings on my set of cc1-i
>> files, and sometimes the without-patch compiler was faster by a tiny
>> amount, and sometimes it was slower. Even on an average of 10 runs I
>> really couldn't say that the patch was a win or loss on the whole.
>>
>> I measured this on a mostly idle machine at home, not gcc17, which
>> seems to be even more busy than usual lately, thanks to you and me :-)
>>
>>
> Sorry, Steven.  It was a noise.  I ran it again now 3 times  and found that
> was a noise.
>
> After some thinking, I realized that sbitmap is really the best
> representation for this particular case.  That is because at the 1st
> constraint pass we have all bits are set as we process *all* insns on the
> 1st pass.  So sbitmap (at least before the extension and if we have pretty
> compact UIDs) will be always smaller than bitmap.
>
> I committed the following your patch to the branch (as rev. 192264).
>
> And again, sorry for the false conclusions.

Btw, congratulations for all the speedups (even though they probably
are noise for "regular" programs?)!  I'm looking forward to the
merge of LRA for x86 now.

Richard.

> 2012-10-09  Steven Bosscher  <steven@gcc.gnu.org>
>
>
>         * lra-int.h (lra_constraint_insn_stack_bitmap,
>         lra_constraint_insn_stack): Remove.
>         (lra_pop_insn, lra_insn_stack_length): New prototypes.
>         * lra.c (lra_constraint_insn_stack_bitmap): Make static sbitmap.
>         (lra_constraint_insn_stack): Make static.
>         (lra_push_insn_1): New function.
>         (lra_push_insn): Rewrite using lra_push_insn_1.
>         (lra_push_insn_and_update_insn_regno_info): Likewise.
>         (lra_pop_insn, lra_insn_stack_length): New functions.
>         * lra_constraints.c (lra_constraints): Use new interface to
>         insns stack instead of manipulating in-place.
>
Vladimir Makarov - Oct. 9, 2012, 7:02 p.m.
On 10/09/2012 01:06 PM, Richard Guenther wrote:
> On Tue, Oct 9, 2012 at 6:10 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:
>> On 10/08/2012 05:14 PM, Steven Bosscher wrote:
>>> On Mon, Oct 8, 2012 at 10:26 PM, Vladimir Makarov <vmakarov@redhat.com>
>>> wrote:
>>>
>>>> So I checked it on big file with > hundred functionson Intel machine and
>>>> got
>>>>
>>>> before a part of the patch implementing the insn stack as sbitmap
>>>> real=243.40 user=241.61 system=1.00
>>>>
>>>> after the part of the patch:
>>>> real=244.89 user=243.02 system=0.96
>>> Is that more than just noise, you think? A ~1.5s difference on ~240
>>> total isn't very much. I measured the timings on my set of cc1-i
>>> files, and sometimes the without-patch compiler was faster by a tiny
>>> amount, and sometimes it was slower. Even on an average of 10 runs I
>>> really couldn't say that the patch was a win or loss on the whole.
>>>
>>> I measured this on a mostly idle machine at home, not gcc17, which
>>> seems to be even more busy than usual lately, thanks to you and me :-)
>>>
>>>
>> Sorry, Steven.  It was a noise.  I ran it again now 3 times  and found that
>> was a noise.
>>
>> After some thinking, I realized that sbitmap is really the best
>> representation for this particular case.  That is because at the 1st
>> constraint pass we have all bits are set as we process *all* insns on the
>> 1st pass.  So sbitmap (at least before the extension and if we have pretty
>> compact UIDs) will be always smaller than bitmap.
>>
>> I committed the following your patch to the branch (as rev. 192264).
>>
>> And again, sorry for the false conclusions.
> Btw, congratulations for all the speedups (even though they probably
> are noise for "regular" programs?)!  I'm looking forward to the
> merge of LRA for x86 now.
>
Thanks, Richard.  Many thanks to Steven too.

I think I'll do merging in a week as I need to address proposals from 
the reviews (some of them needs good testing).

Patch

Index: lra-int.h
===================================================================
--- lra-int.h	(revision 192183)
+++ lra-int.h	(working copy)
@@ -249,14 +243,13 @@  extern HARD_REG_SET lra_no_alloc_regs;
 extern int lra_insn_recog_data_len;
 extern lra_insn_recog_data_t *lra_insn_recog_data;
 
-extern bitmap_head lra_constraint_insn_stack_bitmap;
-extern VEC (rtx, heap) *lra_constraint_insn_stack;
-
 extern int lra_curr_reload_num;
 
 extern void lra_push_insn (rtx);
 extern void lra_push_insn_by_uid (unsigned int);
 extern void lra_push_insn_and_update_insn_regno_info (rtx);
+extern rtx lra_pop_insn (void);
+extern unsigned int lra_insn_stack_length (void);
 
 extern rtx lra_create_new_reg_with_unique_value (enum machine_mode, rtx,
 						 enum reg_class, const char *);
Index: lra.c
===================================================================
--- lra.c	(revision 192183)
+++ lra.c	(working copy)
@@ -1736,19 +1736,43 @@  lra_get_insn_regs (int uid)
    should be processed by the next constraint pass.  */
 
 /* Bitmap used to put an insn on the stack only in one exemplar.  */
-bitmap_head lra_constraint_insn_stack_bitmap;
+static sbitmap lra_constraint_insn_stack_bitmap;
 
 /* The stack itself.  */
 VEC (rtx, heap) *lra_constraint_insn_stack;
 
+/* Put INSN on the stack.  If ALWAYS_UPDATE is true, always update the reg
+   info for INSN, otherwise only update it if INSN is not already on the
+   stack.  */
+static inline void
+lra_push_insn_1 (rtx insn, bool always_update)
+{
+  unsigned int uid = INSN_UID (insn);
+  if (always_update)
+    lra_update_insn_regno_info (insn);
+  if (uid >= SBITMAP_SIZE (lra_constraint_insn_stack_bitmap))
+    lra_constraint_insn_stack_bitmap =
+      sbitmap_resize (lra_constraint_insn_stack_bitmap, 3 * uid / 2, 0);
+  if (TEST_BIT (lra_constraint_insn_stack_bitmap, uid))
+    return;
+  SET_BIT (lra_constraint_insn_stack_bitmap, uid);
+  if (! always_update)
+    lra_update_insn_regno_info (insn);
+  VEC_safe_push (rtx, heap, lra_constraint_insn_stack, insn);
+}
+
 /* Put INSN on the stack.  */
 void
 lra_push_insn (rtx insn)
 {
-  if (! bitmap_set_bit (&lra_constraint_insn_stack_bitmap, INSN_UID (insn)))
-    return;
-  lra_update_insn_regno_info (insn);
-  VEC_safe_push (rtx, heap, lra_constraint_insn_stack, insn);
+  lra_push_insn_1 (insn, false);
+}
+
+/* Put INSN on the stack and update its reg info.  */
+void
+lra_push_insn_and_update_insn_regno_info (rtx insn)
+{
+  lra_push_insn_1 (insn, true);
 }
 
 /* Put insn with UID on the stack.  */
@@ -1758,14 +1782,20 @@  lra_push_insn_by_uid (unsigned int uid)
   lra_push_insn (lra_insn_recog_data[uid]->insn);
 }
 
-/* Put INSN on the stack and update its reg info.  */
-void
-lra_push_insn_and_update_insn_regno_info (rtx insn)
+/* Take the last-inserted insns off the stack and return it.  */
+rtx
+lra_pop_insn (void)
 {
-  lra_update_insn_regno_info (insn);
-  if (! bitmap_set_bit (&lra_constraint_insn_stack_bitmap, INSN_UID (insn)))
-    return;
-  VEC_safe_push (rtx, heap, lra_constraint_insn_stack, insn);
+  rtx insn = VEC_pop (rtx, lra_constraint_insn_stack);
+  RESET_BIT (lra_constraint_insn_stack_bitmap, INSN_UID (insn));
+  return insn;
+}
+
+/* Return the current size of the insn stack.  */
+unsigned int
+lra_insn_stack_length (void)
+{
+  return VEC_length (rtx, lra_constraint_insn_stack);
 }
 
 /* Push insns FROM to TO (excluding it) going in reverse order.	 */
@@ -2240,7 +2270,8 @@  lra (FILE *f)
      expensive when a lot of RTL changes are made.  */
   df_set_flags (DF_NO_INSN_RESCAN);
   lra_constraint_insn_stack = VEC_alloc (rtx, heap, get_max_uid ());
-  bitmap_initialize (&lra_constraint_insn_stack_bitmap, &reg_obstack);
+  lra_constraint_insn_stack_bitmap = sbitmap_alloc (get_max_uid ());
+  sbitmap_zero (lra_constraint_insn_stack_bitmap);
   lra_live_ranges_init ();
   lra_contraints_init ();
   lra_curr_reload_num = 0;
@@ -2322,7 +2353,7 @@  lra (FILE *f)
   lra_live_ranges_finish ();
   lra_contraints_finish ();
   finish_reg_info ();
-  bitmap_clear (&lra_constraint_insn_stack_bitmap);
+  sbitmap_free (lra_constraint_insn_stack_bitmap);
   VEC_free (rtx, heap, lra_constraint_insn_stack);
   finish_insn_recog_data ();
   regstat_free_n_sets_and_refs ();
Index: lra-constraints.c
===================================================================
--- lra-constraints.c	(revision 192183)
+++ lra-constraints.c	(working copy)
@@ -851,7 +851,7 @@  operands_match_p (rtx x, rtx y, int y_ha
   return true;
 }
 
-/* Reload pseudos created for input and output(or early clobber)
+/* Reload pseudos created for input and output (or early clobber)
    reloads which should have the same hard register.  Such pseudos as
    input operands should be not inherited because the output rewrite
    the value in any away.  */
@@ -3674,7 +3674,7 @@  lra_constraints (bool first_p)
 {
   bool changed_p;
   int i, hard_regno, new_insns_num;
-  unsigned int min_len;
+  unsigned int min_len, new_min_len;
   rtx set, x, dest_reg;
   basic_block last_bb;
 
@@ -3738,24 +3738,23 @@  lra_constraints (bool first_p)
 	  }
       }
   lra_eliminate (false);
-  min_len = VEC_length (rtx, lra_constraint_insn_stack);
+  min_len = lra_insn_stack_length ();
   new_insns_num = 0;
   last_bb = NULL;
   changed_p = false;
-  for (;VEC_length (rtx, lra_constraint_insn_stack) != 0;)
+  while ((new_min_len = lra_insn_stack_length ()) != 0)
     {
-      curr_insn = VEC_pop (rtx, lra_constraint_insn_stack);
-      bitmap_clear_bit (&lra_constraint_insn_stack_bitmap,
-			INSN_UID (curr_insn));
+      curr_insn = lra_pop_insn ();
+      --new_min_len;
       curr_bb = BLOCK_FOR_INSN (curr_insn); 
       if (curr_bb != last_bb)
 	{
 	  last_bb = curr_bb;
 	  bb_reload_num = lra_curr_reload_num;
 	}
-      if (min_len > VEC_length (rtx, lra_constraint_insn_stack))
+      if (min_len > new_min_len)
 	{
-	  min_len = VEC_length (rtx, lra_constraint_insn_stack);
+	  min_len = new_min_len;
 	  new_insns_num = 0;
 	}
       if (new_insns_num > MAX_RELOAD_INSNS_NUMBER)