diff mbox

[df-scan.c] Optimise DF_REFs ordering in collection_rec, use HARD_REG_SETs instead of bitmaps

Message ID alpine.LNX.2.02.1107080549070.1237@localhost.localdomain
State New
Headers show

Commit Message

Dimitrios Apostolou July 8, 2011, 3:20 a.m. UTC
Hello list,

The attached patch does two things for df_get_call_refs():
* First it uses HARD_REG_SETs for defs_generated and 
regs_invalidated_by_call, instead of bitmaps. Replacing in total more 
than 400K calls (for my testcase) to bitmap_bit_p() with the much faster 
TEST_HARD_REG_BIT, reduces the total instruction count from about 13M to 
1.5M.
* Second it produces the REFs in REGNO order, which is important to keep 
the collection_rec sorted most times, and avoid expensive calls to 
qsort(). Thanks to Paolo Bonzini for idea and mentoring.

The second part makes a big difference if accompanied with another patch 
in df_insn_refs_collect(). I'll post a followup patch, that is 
unfortunately unstable for some of my tests, so I'd appreciate any 
comments.


Thanks,
Dimitris

Comments

Dimitrios Apostolou July 8, 2011, 3:33 a.m. UTC | #1
To document the gains from the bitmaps, here is (part of) the annotated 
source from callgrind profiler, showing instruction count. Before:


  1,154,400      if (bitmap_bit_p(regs_invalidated_by_call_regset, i)
  8,080,800  => bitmap.c:bitmap_bit_p (192400x)
  1,021,200          && !bitmap_bit_p (&defs_generated, i)
  5,106,000  => bitmap.c:bitmap_bit_p (170200x)
    340,400          && (!is_sibling_call
          .              || !bitmap_bit_p (df->exit_block_uses, i)
          .              || refers_to_regno_p (i, i+1,
          .                                    crtl->return_rtx, NULL)))
  2,053,500        df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i
35,279,934  => df-scan.c:df_ref_record (170200x)
          .                       NULL, bb, insn_info, DF_REF_REG_DEF,
          .                       DF_REF_MAY_CLOBBER | flags);
          .      }


After:


  1,346,800      if (TEST_HARD_REG_BIT(regs_invalidated_by_call, i)
    510,600          && !TEST_HARD_REG_BIT (defs_generated, i)
    340,400          && (!is_sibling_call
          .              || !bitmap_bit_p (df->exit_block_uses, i)
          .              || refers_to_regno_p (i, i+1,
          .                                    crtl->return_rtx, NULL)))
  2,057,200        df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i
35,279,934  => df-scan.c:df_ref_record (170200x)
          .                       NULL, bb, insn_info, DF_REF_REG_DEF,
          .                       DF_REF_MAY_CLOBBER | flags);
          .      }



Dimitris
Steven Bosscher July 8, 2011, 6:36 a.m. UTC | #2
On Fri, Jul 8, 2011 at 5:20 AM, Dimitrios Apostolou <jimis@gmx.net> wrote:
> The attached patch does two things for df_get_call_refs():

How did you test this patch?

Normally, a patch submission comes with text like, "Bootstrapped &
tested on ..., no regressions.". Also, you chould write a ChangeLog
entry, best included in your mail somewhere at the end ;-)

Ciao!
Steven
Jakub Jelinek July 8, 2011, 6:37 a.m. UTC | #3
On Fri, Jul 08, 2011 at 06:20:04AM +0300, Dimitrios Apostolou wrote:
> The attached patch does two things for df_get_call_refs():
> * First it uses HARD_REG_SETs for defs_generated and
> regs_invalidated_by_call, instead of bitmaps. Replacing in total
> more than 400K calls (for my testcase) to bitmap_bit_p() with the
> much faster TEST_HARD_REG_BIT, reduces the total instruction count
> from about 13M to 1.5M.

Have you verified that collection_rec->def_vec never contains pseudo
register references?  Otherwise you couldn't use
HARD_REG_SET... gcc_checking_assert might be useful.

Also, a nit, space should be added before (.
  CLEAR_HARD_REG_SET(defs_generated);

	Jakub
Dimitrios Apostolou July 8, 2011, 6:59 a.m. UTC | #4
On Fri, 8 Jul 2011, Steven Bosscher wrote:
> On Fri, Jul 8, 2011 at 5:20 AM, Dimitrios Apostolou <jimis@gmx.net> wrote:
>> The attached patch does two things for df_get_call_refs():
>
> How did you test this patch?
>
> Normally, a patch submission comes with text like, "Bootstrapped &
> tested on ..., no regressions.". Also, you chould write a ChangeLog
> entry, best included in your mail somewhere at the end ;-)

Hi Steven, thanks for the instructions. I've not run the mandatory tests 
you have told me about, only done some minor testing due to lack of time. 
I'm not yet posting patches for inclusion, but more as an RFC. Should such 
patches be sent to gcc instead of gcc-patches?


Thanks,
Dimitris
Richard Biener July 8, 2011, 8:40 a.m. UTC | #5
On Fri, Jul 8, 2011 at 5:20 AM, Dimitrios Apostolou <jimis@gmx.net> wrote:
> Hello list,
>
> The attached patch does two things for df_get_call_refs():
> * First it uses HARD_REG_SETs for defs_generated and
> regs_invalidated_by_call, instead of bitmaps. Replacing in total more than
> 400K calls (for my testcase) to bitmap_bit_p() with the much faster
> TEST_HARD_REG_BIT, reduces the total instruction count from about 13M to
> 1.5M.
> * Second it produces the REFs in REGNO order, which is important to keep the
> collection_rec sorted most times, and avoid expensive calls to qsort().
> Thanks to Paolo Bonzini for idea and mentoring.
>
> The second part makes a big difference if accompanied with another patch in
> df_insn_refs_collect(). I'll post a followup patch, that is unfortunately
> unstable for some of my tests, so I'd appreciate any comments.

Did you check the impact on memory usage?  I suppose on targets
with not many hard registers it should even improve, but do we expect
memory usage to be worse in any case?

Thanks,
Richard.

>
> Thanks,
> Dimitris
>
Dimitrios Apostolou July 8, 2011, 9:05 a.m. UTC | #6
On Fri, 8 Jul 2011, Jakub Jelinek wrote:
> On Fri, Jul 08, 2011 at 06:20:04AM +0300, Dimitrios Apostolou wrote:
>> The attached patch does two things for df_get_call_refs():
>> * First it uses HARD_REG_SETs for defs_generated and
>> regs_invalidated_by_call, instead of bitmaps. Replacing in total
>> more than 400K calls (for my testcase) to bitmap_bit_p() with the
>> much faster TEST_HARD_REG_BIT, reduces the total instruction count
>> from about 13M to 1.5M.
>
> Have you verified that collection_rec->def_vec never contains pseudo
> register references?  Otherwise you couldn't use
> HARD_REG_SET... gcc_checking_assert might be useful.
>

Hi Jakub, Steve pointed me to the following from GCC Internals Manual:

call_insn insns have the same extra fields as insn insns, accessed in the 
same way and in addition contain a field CALL_INSN_FUNCTION_USAGE, which 
contains a list (chain of expr_list expressions) containing use and 
clobber expressions that denote hard registers and MEMs used or clobbered 
by the called function.


So doesn't that mean that for CALL insns it should contain only HARD_REG 
DEFs? I will ofcourse use an assert to be sure.


Thanks,
Dimitris
Dimitrios Apostolou July 8, 2011, 10:12 a.m. UTC | #7
On Fri, 8 Jul 2011, Richard Guenther wrote:
> On Fri, Jul 8, 2011 at 5:20 AM, Dimitrios Apostolou <jimis@gmx.net> wrote:
>> Hello list,
>>
>> The attached patch does two things for df_get_call_refs():
>> * First it uses HARD_REG_SETs for defs_generated and
>> regs_invalidated_by_call, instead of bitmaps. Replacing in total more than
>> 400K calls (for my testcase) to bitmap_bit_p() with the much faster
>> TEST_HARD_REG_BIT, reduces the total instruction count from about 13M to
>> 1.5M.
>> * Second it produces the REFs in REGNO order, which is important to keep the
>> collection_rec sorted most times, and avoid expensive calls to qsort().
>> Thanks to Paolo Bonzini for idea and mentoring.
>>
>> The second part makes a big difference if accompanied with another patch in
>> df_insn_refs_collect(). I'll post a followup patch, that is unfortunately
>> unstable for some of my tests, so I'd appreciate any comments.
>
> Did you check the impact on memory usage?  I suppose on targets
> with not many hard registers it should even improve, but do we expect
> memory usage to be worse in any case?

Hi Richard, I didn't check memory usage, is that important? Since the 
struct bitmap is fairly bulky, it should take an arch with lots of hard 
regs (which one has the most?).

But still a few bytes tradeoff wouldn't be acceptable for a much faster 
type? And IMHO it makes the code better to understand, since once you see 
HARD_REG_SET you know you can't expect else. FWIW I'm now in the process 
of converting all other bitmap uses for hard regs, to HARD_REG_SETs, at 
least within DF. I'm not sure whether performance gains will be visible, 
however, not much code is as hot as df_get_call_refs().


Thanks,
Dimitris
Paolo Bonzini July 8, 2011, 2:14 p.m. UTC | #8
On 07/08/2011 11:05 AM, Dimitrios Apostolou wrote:
> On Fri, 8 Jul 2011, Jakub Jelinek wrote:
>> On Fri, Jul 08, 2011 at 06:20:04AM +0300, Dimitrios Apostolou wrote:
>>> The attached patch does two things for df_get_call_refs():
>>> * First it uses HARD_REG_SETs for defs_generated and
>>> regs_invalidated_by_call, instead of bitmaps. Replacing in total
>>> more than 400K calls (for my testcase) to bitmap_bit_p() with the
>>> much faster TEST_HARD_REG_BIT, reduces the total instruction count
>>> from about 13M to 1.5M.
>>
>> Have you verified that collection_rec->def_vec never contains pseudo
>> register references? Otherwise you couldn't use
>> HARD_REG_SET... gcc_checking_assert might be useful.
>>
>
> Hi Jakub, Steve pointed me to the following from GCC Internals Manual:
>
> call_insn insns have the same extra fields as insn insns, accessed in
> the same way and in addition contain a field CALL_INSN_FUNCTION_USAGE,
> which contains a list (chain of expr_list expressions) containing use
> and clobber expressions that denote hard registers and MEMs used or
> clobbered by the called function.
>
> So doesn't that mean that for CALL insns it should contain only HARD_REG
> DEFs? I will ofcourse use an assert to be sure.

That part is only for CALL_INSN_FUNCTION_USAGE, which is what 
df_get_call_refs handles.  However, if you rewrite the handling of 
defs_generated as required by your second patch, you'll then be sure 
that you will only have hard registers.

BTW, what testcase are you using?  I suggest that you try building 
stage1 with CFLAGS=--save-temps, and get some of the largest 
preprocessed .i files from there (combine and fold-const for example). 
You can then time them very easily from the old and new build 
directories, with "./cc1 /path/to/file.i -O2".

Paolo
diff mbox

Patch

=== modified file 'gcc/df-scan.c'
--- gcc/df-scan.c	2011-02-02 20:08:06 +0000
+++ gcc/df-scan.c	2011-07-08 01:28:55 +0000
@@ -3317,20 +3317,56 @@ 
                   int flags)
 {
   rtx note;
-  bitmap_iterator bi;
-  unsigned int ui;
   bool is_sibling_call;
   unsigned int i;
   df_ref def;
-  bitmap_head defs_generated;
+  HARD_REG_SET defs_generated;
 
-  bitmap_initialize (&defs_generated, &df_bitmap_obstack);
+  CLEAR_HARD_REG_SET(defs_generated);
 
   /* Do not generate clobbers for registers that are the result of the
      call.  This causes ordering problems in the chain building code
      depending on which def is seen first.  */
   FOR_EACH_VEC_ELT (df_ref, collection_rec->def_vec, i, def)
-    bitmap_set_bit (&defs_generated, DF_REF_REGNO (def));
+    SET_HARD_REG_BIT (defs_generated, DF_REF_REGNO (def));
+
+  is_sibling_call = SIBLING_CALL_P (insn_info->insn);
+
+  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+    {
+      if (i == STACK_POINTER_REGNUM)
+	/* The stack ptr is used (honorarily) by a CALL insn.  */
+	df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],
+		       NULL, bb, insn_info, DF_REF_REG_USE,
+		       DF_REF_CALL_STACK_USAGE | flags);
+      else if (global_regs[i])
+	{
+	  /* Calls to const functions cannot access any global registers and
+	     calls to pure functions cannot set them.  All other calls may
+	     reference any of the global registers, so they are recorded as
+	     used. */
+	  if (!RTL_CONST_CALL_P (insn_info->insn))
+	    {
+	      df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],
+			     NULL, bb, insn_info, DF_REF_REG_USE, flags);
+	      if (!RTL_PURE_CALL_P (insn_info->insn))
+		df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],
+			       NULL, bb, insn_info, DF_REF_REG_DEF, flags);
+	    }
+	}
+      /* TODO HARD_REG_SET set intersection! */
+      else			/* !global_regs[i] */
+	/* track Caller-Saved registers */
+	if (TEST_HARD_REG_BIT(regs_invalidated_by_call, i)
+	    && !TEST_HARD_REG_BIT (defs_generated, i)
+	    && (!is_sibling_call
+		|| !bitmap_bit_p (df->exit_block_uses, i)
+		|| refers_to_regno_p (i, i+1,
+				      crtl->return_rtx, NULL)))
+	  df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],
+			 NULL, bb, insn_info, DF_REF_REG_DEF,
+			 DF_REF_MAY_CLOBBER | flags);
+    }
 
   /* Record the registers used to pass arguments, and explicitly
      noted as clobbered.  */
@@ -3345,7 +3381,7 @@ 
 	  if (REG_P (XEXP (XEXP (note, 0), 0)))
 	    {
 	      unsigned int regno = REGNO (XEXP (XEXP (note, 0), 0));
-	      if (!bitmap_bit_p (&defs_generated, regno))
+	      if (!TEST_HARD_REG_BIT (defs_generated, regno))
 		df_defs_record (collection_rec, XEXP (note, 0), bb,
 				insn_info, flags);
 	    }
@@ -3355,40 +3391,6 @@ 
 	}
     }
 
-  /* The stack ptr is used (honorarily) by a CALL insn.  */
-  df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[STACK_POINTER_REGNUM],
-		 NULL, bb, insn_info, DF_REF_REG_USE,
-		 DF_REF_CALL_STACK_USAGE | flags);
-
-  /* Calls to const functions cannot access any global registers and calls to
-     pure functions cannot set them.  All other calls may reference any of the
-     global registers, so they are recorded as used.  */
-  if (!RTL_CONST_CALL_P (insn_info->insn))
-    for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
-      if (global_regs[i])
-	{
-	  df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],
-			 NULL, bb, insn_info, DF_REF_REG_USE, flags);
-	  if (!RTL_PURE_CALL_P (insn_info->insn))
-	    df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],
-			   NULL, bb, insn_info, DF_REF_REG_DEF, flags);
-	}
-
-  is_sibling_call = SIBLING_CALL_P (insn_info->insn);
-  EXECUTE_IF_SET_IN_BITMAP (regs_invalidated_by_call_regset, 0, ui, bi)
-    {
-      if (!global_regs[ui]
-	  && (!bitmap_bit_p (&defs_generated, ui))
-	  && (!is_sibling_call
-	      || !bitmap_bit_p (df->exit_block_uses, ui)
-	      || refers_to_regno_p (ui, ui+1,
-				    crtl->return_rtx, NULL)))
-        df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[ui],
-		       NULL, bb, insn_info, DF_REF_REG_DEF,
-		       DF_REF_MAY_CLOBBER | flags);
-    }
-
-  bitmap_clear (&defs_generated);
   return;
 }