Patchwork [lra] patch to speed more compilation of PR54146

login
register
mail settings
Submitter Vladimir Makarov
Date Oct. 7, 2012, 3:59 p.m.
Message ID <5071A6FB.5030202@redhat.com>
Download mbox | patch
Permalink /patch/189855/
State New
Headers show

Comments

Vladimir Makarov - Oct. 7, 2012, 3:59 p.m.
The following patch speeds LRA up more on PR54146.  Below times for 
compilation of the test on gcc17.fsffrance.org (an AMD machine):

Before:
real=1214.71 user=1192.05 system=22.48
After:
real=1144.37 user=1124.31 system=20.11

The patch should not change the generated code.  About 2/3 of speed up 
is achieved by insertion of live ranges of pseudos only interesting for 
assignment in the start point chains.
1/3 is achieved by switching iterations through a bitmap instead of 
sparseset at the end of function process_bb_lives.

The patch was successfully bootstrapped on x86/x86-64.

Committed as rev. 192183.

2012-10-07  Vladimir Makarov  <vmakarov@redhat.com>

         * lra-int.h (struct lra_live_range): Remove finish_next.
         (lra_start_point_ranges, lra_finish_point_ranges): Remove.
         * lra-lives.c (lra_start_point_ranges, lra_finish_point_ranges):
         Remove.
         (process_bb_lives): Change start regno in
         EXECUTE_IF_SET_IN_BITMAP.  Iterate on DF_LR_IN (bb) instead of
         pseudos_live_through_calls.
         (create_start_finish_chains, rebuild_start_finish_chains): Remove.
         (compress_live_ranges): Don't call rebuild_start_finish_chains.
         (lra_create_live_ranges): Don't call create_start_finish_chains.
         (lra_clear_live_ranges): Remove code freeing
         lra_start_point_ranges and lra_finish_point_ranges.
         * lra-assigns.c (start_point_ranges, not_in_chain_mark): New.
         (create_live_range_start_chains): Ditto.
         (insert_in_live_range_start_chain): Ditto.
         (finish_live_range_start_chains): Ditto.
         (update_lives, assign_temporarily): Call
         insert_in_live_range_start_chain.
         (find_hard_regno_for, spill_for): Rename lra_start_point_ranges to
         start_point_ranges.
         (setup_live_pseudos_and_spill_after_risky): Ditto.
         (lra_assign): Call create_live_range_start_chains and
         finish_live_range_start_chains.
Steven Bosscher - Oct. 7, 2012, 4:24 p.m.
Hi Vlad,

Thanks for working on this!

> -  EXECUTE_IF_SET_IN_BITMAP (reg_live_out, 0, j, bi)
> -    if (j >= FIRST_PSEUDO_REGISTER)
> -      mark_pseudo_live (j);
> +  EXECUTE_IF_SET_IN_BITMAP (reg_live_out, FIRST_PSEUDO_REGISTER, j, bi)
> +    mark_pseudo_live (j);

FWIW, the above is optimized by GCC itself, in VRP. (I was surprised
and impressed by that :-)

Ciao!
Steven
Steven Bosscher - Oct. 7, 2012, 9:27 p.m.
On Sun, Oct 7, 2012 at 5:59 PM, Vladimir Makarov wrote:
> The following patch speeds LRA up more on PR54146.  Below times for
> compilation of the test on gcc17.fsffrance.org (an AMD machine):
>
> Before:
> real=1214.71 user=1192.05 system=22.48
> After:
> real=1144.37 user=1124.31 system=20.11

Hi Vlad,

The next bottle-neck in my timings is in
lra-eliminate.c:lra_eliminate(), in this loop:

   FOR_EACH_BB (bb)
     FOR_BB_INSNS_SAFE (bb, insn, temp)
       {
       if (bitmap_bit_p (&insns_with_changed_offsets, INSN_UID (insn)))
          process_insn_for_elimination (insn, final_p);
       }

The problem is in bitmap_bit_p. Random access to a large bitmap can be
very slow.

I'm playing with a patch to expand the insns_with_changed_offsets
bitmap to an sbitmap, and will send a patch if this works better.

Ciao!
Steven
Richard Guenther - Oct. 8, 2012, 7:20 a.m.
On Sun, Oct 7, 2012 at 11:27 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
> On Sun, Oct 7, 2012 at 5:59 PM, Vladimir Makarov wrote:
>> The following patch speeds LRA up more on PR54146.  Below times for
>> compilation of the test on gcc17.fsffrance.org (an AMD machine):
>>
>> Before:
>> real=1214.71 user=1192.05 system=22.48
>> After:
>> real=1144.37 user=1124.31 system=20.11
>
> Hi Vlad,
>
> The next bottle-neck in my timings is in
> lra-eliminate.c:lra_eliminate(), in this loop:
>
>    FOR_EACH_BB (bb)
>      FOR_BB_INSNS_SAFE (bb, insn, temp)
>        {
>        if (bitmap_bit_p (&insns_with_changed_offsets, INSN_UID (insn)))
>           process_insn_for_elimination (insn, final_p);
>        }
>
> The problem is in bitmap_bit_p. Random access to a large bitmap can be
> very slow.
>
> I'm playing with a patch to expand the insns_with_changed_offsets
> bitmap to an sbitmap, and will send a patch if this works better.

Or make insns_with_changed_offsets a VEC of insns (or a pointer-set).

Richard.

> Ciao!
> Steven
Jakub Jelinek - Oct. 8, 2012, 8:18 a.m.
On Mon, Oct 08, 2012 at 09:20:47AM +0200, Richard Guenther wrote:
> On Sun, Oct 7, 2012 at 11:27 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
> > The next bottle-neck in my timings is in
> > lra-eliminate.c:lra_eliminate(), in this loop:
> >
> >    FOR_EACH_BB (bb)
> >      FOR_BB_INSNS_SAFE (bb, insn, temp)
> >        {
> >        if (bitmap_bit_p (&insns_with_changed_offsets, INSN_UID (insn)))
> >           process_insn_for_elimination (insn, final_p);
> >        }
> >
> > The problem is in bitmap_bit_p. Random access to a large bitmap can be
> > very slow.
> >
> > I'm playing with a patch to expand the insns_with_changed_offsets
> > bitmap to an sbitmap, and will send a patch if this works better.
> 
> Or make insns_with_changed_offsets a VEC of insns (or a pointer-set).

Or use temporarily some rtx flag on the insns, from what I can see,
in_struct on *INSN is right now only used during scheduling and from reorg
till eoc, so for LRA sitting in between both scheduling passes it might
be possible to use that bit too.

	Jakub
Steven Bosscher - Oct. 8, 2012, 10:20 a.m.
On Mon, Oct 8, 2012 at 10:18 AM, Jakub Jelinek <jakub@redhat.com> wrote:
>> > I'm playing with a patch to expand the insns_with_changed_offsets
>> > bitmap to an sbitmap, and will send a patch if this works better.
>>
>> Or make insns_with_changed_offsets a VEC of insns (or a pointer-set).
>
> Or use temporarily some rtx flag on the insns, from what I can see,
> in_struct on *INSN is right now only used during scheduling and from reorg
> till eoc, so for LRA sitting in between both scheduling passes it might
> be possible to use that bit too.

AFAICT neither of these ideas will work because only insn UIDs are
used when computing insns_with_changed_offsets. You'd need the actual
insn for a VEC, pointer map or flag. Also, with a VEC or pointer map,
it's difficult to union of the the insn_bitmap sets.

The patch I have for this uses an sbitmap, it's posted in a new thread
starting here:
http://gcc.gnu.org/ml/gcc-patches/2012-10/msg00698.html

Ciao!
Steven

Patch

Index: lra-lives.c
===================================================================
--- lra-lives.c	(revision 192169)
+++ lra-lives.c	(working copy)
@@ -54,10 +54,6 @@  along with GCC; see the file COPYING3.	I
    correspond to places where output operands are born.	 */
 int lra_live_max_point;
 
-/* Arrays of size LRA_LIVE_MAX_POINT mapping a program point to the
-   pseudo live ranges with given start/finish point.  */
-lra_live_range_t *lra_start_point_ranges, *lra_finish_point_ranges;
-
 /* Accumulated execution frequency of all references for each hard
    register.  */
 int lra_hard_reg_usage[FIRST_PSEUDO_REGISTER];
@@ -529,9 +525,8 @@  process_bb_lives (basic_block bb)
   REG_SET_TO_HARD_REG_SET (hard_regs_live, reg_live_out);
   AND_COMPL_HARD_REG_SET (hard_regs_live, eliminable_regset);
   AND_COMPL_HARD_REG_SET (hard_regs_live, lra_no_alloc_regs);
-  EXECUTE_IF_SET_IN_BITMAP (reg_live_out, 0, j, bi)
-    if (j >= FIRST_PSEUDO_REGISTER)
-      mark_pseudo_live (j);
+  EXECUTE_IF_SET_IN_BITMAP (reg_live_out, FIRST_PSEUDO_REGISTER, j, bi)
+    mark_pseudo_live (j);
       
   freq = REG_FREQ_FROM_BB (bb);
 
@@ -755,47 +750,13 @@  process_bb_lives (basic_block bb)
   EXECUTE_IF_SET_IN_SPARSESET (pseudos_live, i)
     mark_pseudo_dead (i);
 
-  EXECUTE_IF_SET_IN_SPARSESET (pseudos_live_through_calls, i)
-    if (bitmap_bit_p (DF_LR_IN (bb), i))
-      check_pseudos_live_through_calls (i);
+  EXECUTE_IF_SET_IN_BITMAP (DF_LR_IN (bb), FIRST_PSEUDO_REGISTER, j, bi)
+    if (sparseset_bit_p (pseudos_live_through_calls, j))
+      check_pseudos_live_through_calls (j);
   
   incr_curr_point (freq);
 }
 
-/* Create and set up LRA_START_POINT_RANGES and
-   LRA_FINISH_POINT_RANGES.  */
-static void
-create_start_finish_chains (void)
-{
-  int i, max_regno;
-  lra_live_range_t r;
-
-  lra_start_point_ranges = XCNEWVEC (lra_live_range_t, lra_live_max_point);
-  lra_finish_point_ranges = XCNEWVEC (lra_live_range_t, lra_live_max_point);
-  max_regno = max_reg_num ();
-  for (i = FIRST_PSEUDO_REGISTER; i < max_regno; i++)
-    {
-      for (r = lra_reg_info[i].live_ranges; r != NULL; r = r->next)
-	{
-	  r->start_next = lra_start_point_ranges[r->start];
-	  lra_start_point_ranges[r->start] = r;
-	  r->finish_next = lra_finish_point_ranges[r->finish];
-	  lra_finish_point_ranges[r->finish] = r;
-	}
-    }
-}
-
-/* Rebuild LRA_START_POINT_RANGES and LRA_FINISH_POINT_RANGES after
-   new live ranges and program points were added as a result of new
-   insn generation.  */
-static void
-rebuild_start_finish_chains (void)
-{
-  free (lra_finish_point_ranges);
-  free (lra_start_point_ranges);
-  create_start_finish_chains ();
-}
-
 /* Compress pseudo live ranges by removing program points where
    nothing happens.  Complexity of many algorithms in LRA is linear
    function of program points number.  To speed up the code we try to
@@ -934,7 +895,6 @@  static void
 compress_live_ranges (void)
 {
   remove_some_program_points_and_update_live_ranges ();
-  rebuild_start_finish_chains ();
   if (lra_dump_file != NULL)
     {
       fprintf (lra_dump_file, "Ranges after the compression:\n");
@@ -997,7 +957,6 @@  lra_create_live_ranges (bool all_p)
   FOR_EACH_BB (bb)
     process_bb_lives (bb);
   lra_live_max_point = curr_point;
-  create_start_finish_chains ();
   if (lra_dump_file != NULL)
     print_live_ranges (lra_dump_file);
   /* Clean up.	*/
@@ -1018,16 +977,6 @@  lra_clear_live_ranges (void)
 {
   int i;
 
-  if (lra_finish_point_ranges != NULL)
-    {
-      free (lra_finish_point_ranges);
-      lra_finish_point_ranges = NULL;
-    }
-  if (lra_start_point_ranges != NULL)
-    {
-      free (lra_start_point_ranges);
-      lra_start_point_ranges = NULL;
-    }
   for (i = 0; i < max_reg_num (); i++)
     free_live_range_list (lra_reg_info[i].live_ranges);
   VEC_free (int, heap, point_freq_vec);
Index: lra-assigns.c
===================================================================
--- lra-assigns.c	(revision 192169)
+++ lra-assigns.c	(working copy)
@@ -213,6 +213,66 @@  pseudo_compare_func (const void *v1p, co
   return r1 - r2;
 }
 
+/* Arrays of size LRA_LIVE_MAX_POINT mapping a program point to the
+   pseudo live ranges with given start point.  We insert only live
+   ranges of pseudos interesting for assignment purposes.  They are
+   reload pseudos and pseudos assigned to hard registers.  */
+static lra_live_range_t *start_point_ranges;
+
+/* Used as a flag that a live range is not inserted in the start point
+   chain.  */
+static struct lra_live_range not_in_chain_mark;
+
+/* Create and set up START_POINT_RANGES.  */
+static void
+create_live_range_start_chains (void)
+{
+  int i, max_regno;
+  lra_live_range_t r;
+
+  start_point_ranges = XCNEWVEC (lra_live_range_t, lra_live_max_point);
+  max_regno = max_reg_num ();
+  for (i = FIRST_PSEUDO_REGISTER; i < max_regno; i++)
+    if (i >= lra_constraint_new_regno_start || reg_renumber[i] >= 0)
+      {
+	for (r = lra_reg_info[i].live_ranges; r != NULL; r = r->next)
+	  {
+	    r->start_next = start_point_ranges[r->start];
+	    start_point_ranges[r->start] = r;
+	  }
+      }
+    else
+      {
+	for (r = lra_reg_info[i].live_ranges; r != NULL; r = r->next)
+	  r->start_next = &not_in_chain_mark;
+      }
+}
+
+/* Insert live ranges of pseudo REGNO into start chains if they are
+   not there yet.  */
+static void
+insert_in_live_range_start_chain (int regno)
+{
+  lra_live_range_t r = lra_reg_info[regno].live_ranges;
+
+  if (r->start_next != &not_in_chain_mark)
+    return;
+  for (; r != NULL; r = r->next)
+    {
+      r->start_next = start_point_ranges[r->start];
+      start_point_ranges[r->start] = r;
+    }
+}
+
+/* Free START_POINT_RANGES.  */
+static void
+finish_live_range_start_chains (void)
+{
+  gcc_assert (start_point_ranges != NULL);
+  free (start_point_ranges);
+  start_point_ranges = NULL;
+}
+
 /* Map: program point -> bitmap of all pseudos living at the point and
    assigned to hard registers.	*/
 static bitmap_head *live_hard_reg_pseudos;
@@ -279,7 +339,10 @@  update_lives (int regno, bool free_p)
 	if (free_p)
 	  bitmap_clear_bit (&live_hard_reg_pseudos[p], regno);
 	else
-	  bitmap_set_bit (&live_hard_reg_pseudos[p], regno);
+	  {
+	    bitmap_set_bit (&live_hard_reg_pseudos[p], regno);
+	    insert_in_live_range_start_chain (regno);
+	  }
     }
 }
 
@@ -378,7 +441,7 @@  find_hard_regno_for (int regno, int *cos
 	{
 	  lra_live_range_t r2;
 	  
-	  for (r2 = lra_start_point_ranges[p];
+	  for (r2 = start_point_ranges[p];
 	       r2 != NULL;
 	       r2 = r2->start_next)
 	    {
@@ -697,7 +760,10 @@  assign_temporarily (int regno, int hard_
 	if (hard_regno < 0)
 	  bitmap_clear_bit (&live_hard_reg_pseudos[p], regno);
 	else
-	  bitmap_set_bit (&live_hard_reg_pseudos[p], regno);
+	  {
+	    bitmap_set_bit (&live_hard_reg_pseudos[p], regno);
+	    insert_in_live_range_start_chain (regno);
+	  }
     }
   live_pseudos_reg_renumber[regno] = hard_regno;
 }
@@ -794,7 +860,7 @@  spill_for (int regno, bitmap spilled_pse
 		{
 		  lra_live_range_t r2;
 		  
-		  for (r2 = lra_start_point_ranges[p];
+		  for (r2 = start_point_ranges[p];
 		       r2 != NULL;
 		       r2 = r2->start_next)
 		    if (r2->regno >= lra_constraint_new_regno_start)
@@ -968,7 +1034,7 @@  setup_live_pseudos_and_spill_after_risky
 	    {
 	      lra_live_range_t r2;
 	      
-	      for (r2 = lra_start_point_ranges[p];
+	      for (r2 = start_point_ranges[p];
 		   r2 != NULL;
 		   r2 = r2->start_next)
 		if (live_pseudos_reg_renumber[r2->regno] >= 0)
@@ -1267,6 +1333,7 @@  lra_assign (void)
     regno_allocno_class_array[i] = lra_get_allocno_class (i);
   init_regno_assign_info ();
   bitmap_initialize (&all_spilled_pseudos, &reg_obstack);
+  create_live_range_start_chains ();
   setup_live_pseudos_and_spill_after_risky_transforms (&all_spilled_pseudos);
 #ifdef ENABLE_CHECKING
   for (i = FIRST_PSEUDO_REGISTER; i < max_reg_num (); i++)
@@ -1292,6 +1359,7 @@  lra_assign (void)
 	no_spills_p = false;
 	break;
       }
+  finish_live_range_start_chains ();
   bitmap_clear (&all_spilled_pseudos);
   bitmap_initialize (&insns_to_process, &reg_obstack);
   EXECUTE_IF_SET_IN_BITMAP (&changed_pseudo_bitmap, 0, u, bi)
Index: lra-int.h
===================================================================
--- lra-int.h	(revision 192169)
+++ lra-int.h	(working copy)
@@ -62,8 +62,8 @@  struct lra_live_range
   /* Next structure describing program points where the pseudo
      lives.  */
   lra_live_range_t next;
-  /* Pointers to structures with the same start/finish.	 */
-  lra_live_range_t start_next, finish_next;
+  /* Pointer to structures with the same start.	 */
+  lra_live_range_t start_next;
 };
 
 typedef struct lra_copy *lra_copy_t;
@@ -315,7 +315,6 @@  extern bool lra_undo_inheritance (void);
 /* lra-lives.c: */
 
 extern int lra_live_max_point;
-extern lra_live_range_t *lra_start_point_ranges, *lra_finish_point_ranges;
 extern int *lra_point_freq;
 
 extern int lra_hard_reg_usage[FIRST_PSEUDO_REGISTER];