Message ID | alpine.LNX.2.02.1107080549070.1237@localhost.localdomain |
---|---|
State | New |
Headers | show |
To document the gains from the bitmaps, here is (part of) the annotated source from callgrind profiler, showing instruction count. Before: 1,154,400 if (bitmap_bit_p(regs_invalidated_by_call_regset, i) 8,080,800 => bitmap.c:bitmap_bit_p (192400x) 1,021,200 && !bitmap_bit_p (&defs_generated, i) 5,106,000 => bitmap.c:bitmap_bit_p (170200x) 340,400 && (!is_sibling_call . || !bitmap_bit_p (df->exit_block_uses, i) . || refers_to_regno_p (i, i+1, . crtl->return_rtx, NULL))) 2,053,500 df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i 35,279,934 => df-scan.c:df_ref_record (170200x) . NULL, bb, insn_info, DF_REF_REG_DEF, . DF_REF_MAY_CLOBBER | flags); . } After: 1,346,800 if (TEST_HARD_REG_BIT(regs_invalidated_by_call, i) 510,600 && !TEST_HARD_REG_BIT (defs_generated, i) 340,400 && (!is_sibling_call . || !bitmap_bit_p (df->exit_block_uses, i) . || refers_to_regno_p (i, i+1, . crtl->return_rtx, NULL))) 2,057,200 df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i 35,279,934 => df-scan.c:df_ref_record (170200x) . NULL, bb, insn_info, DF_REF_REG_DEF, . DF_REF_MAY_CLOBBER | flags); . } Dimitris
On Fri, Jul 8, 2011 at 5:20 AM, Dimitrios Apostolou <jimis@gmx.net> wrote:
> The attached patch does two things for df_get_call_refs():
How did you test this patch?
Normally, a patch submission comes with text like, "Bootstrapped &
tested on ..., no regressions.". Also, you chould write a ChangeLog
entry, best included in your mail somewhere at the end ;-)
Ciao!
Steven
On Fri, Jul 08, 2011 at 06:20:04AM +0300, Dimitrios Apostolou wrote: > The attached patch does two things for df_get_call_refs(): > * First it uses HARD_REG_SETs for defs_generated and > regs_invalidated_by_call, instead of bitmaps. Replacing in total > more than 400K calls (for my testcase) to bitmap_bit_p() with the > much faster TEST_HARD_REG_BIT, reduces the total instruction count > from about 13M to 1.5M. Have you verified that collection_rec->def_vec never contains pseudo register references? Otherwise you couldn't use HARD_REG_SET... gcc_checking_assert might be useful. Also, a nit, space should be added before (. CLEAR_HARD_REG_SET(defs_generated); Jakub
On Fri, 8 Jul 2011, Steven Bosscher wrote: > On Fri, Jul 8, 2011 at 5:20 AM, Dimitrios Apostolou <jimis@gmx.net> wrote: >> The attached patch does two things for df_get_call_refs(): > > How did you test this patch? > > Normally, a patch submission comes with text like, "Bootstrapped & > tested on ..., no regressions.". Also, you chould write a ChangeLog > entry, best included in your mail somewhere at the end ;-) Hi Steven, thanks for the instructions. I've not run the mandatory tests you have told me about, only done some minor testing due to lack of time. I'm not yet posting patches for inclusion, but more as an RFC. Should such patches be sent to gcc instead of gcc-patches? Thanks, Dimitris
On Fri, Jul 8, 2011 at 5:20 AM, Dimitrios Apostolou <jimis@gmx.net> wrote: > Hello list, > > The attached patch does two things for df_get_call_refs(): > * First it uses HARD_REG_SETs for defs_generated and > regs_invalidated_by_call, instead of bitmaps. Replacing in total more than > 400K calls (for my testcase) to bitmap_bit_p() with the much faster > TEST_HARD_REG_BIT, reduces the total instruction count from about 13M to > 1.5M. > * Second it produces the REFs in REGNO order, which is important to keep the > collection_rec sorted most times, and avoid expensive calls to qsort(). > Thanks to Paolo Bonzini for idea and mentoring. > > The second part makes a big difference if accompanied with another patch in > df_insn_refs_collect(). I'll post a followup patch, that is unfortunately > unstable for some of my tests, so I'd appreciate any comments. Did you check the impact on memory usage? I suppose on targets with not many hard registers it should even improve, but do we expect memory usage to be worse in any case? Thanks, Richard. > > Thanks, > Dimitris >
On Fri, 8 Jul 2011, Jakub Jelinek wrote: > On Fri, Jul 08, 2011 at 06:20:04AM +0300, Dimitrios Apostolou wrote: >> The attached patch does two things for df_get_call_refs(): >> * First it uses HARD_REG_SETs for defs_generated and >> regs_invalidated_by_call, instead of bitmaps. Replacing in total >> more than 400K calls (for my testcase) to bitmap_bit_p() with the >> much faster TEST_HARD_REG_BIT, reduces the total instruction count >> from about 13M to 1.5M. > > Have you verified that collection_rec->def_vec never contains pseudo > register references? Otherwise you couldn't use > HARD_REG_SET... gcc_checking_assert might be useful. > Hi Jakub, Steve pointed me to the following from GCC Internals Manual: call_insn insns have the same extra fields as insn insns, accessed in the same way and in addition contain a field CALL_INSN_FUNCTION_USAGE, which contains a list (chain of expr_list expressions) containing use and clobber expressions that denote hard registers and MEMs used or clobbered by the called function. So doesn't that mean that for CALL insns it should contain only HARD_REG DEFs? I will ofcourse use an assert to be sure. Thanks, Dimitris
On Fri, 8 Jul 2011, Richard Guenther wrote: > On Fri, Jul 8, 2011 at 5:20 AM, Dimitrios Apostolou <jimis@gmx.net> wrote: >> Hello list, >> >> The attached patch does two things for df_get_call_refs(): >> * First it uses HARD_REG_SETs for defs_generated and >> regs_invalidated_by_call, instead of bitmaps. Replacing in total more than >> 400K calls (for my testcase) to bitmap_bit_p() with the much faster >> TEST_HARD_REG_BIT, reduces the total instruction count from about 13M to >> 1.5M. >> * Second it produces the REFs in REGNO order, which is important to keep the >> collection_rec sorted most times, and avoid expensive calls to qsort(). >> Thanks to Paolo Bonzini for idea and mentoring. >> >> The second part makes a big difference if accompanied with another patch in >> df_insn_refs_collect(). I'll post a followup patch, that is unfortunately >> unstable for some of my tests, so I'd appreciate any comments. > > Did you check the impact on memory usage? I suppose on targets > with not many hard registers it should even improve, but do we expect > memory usage to be worse in any case? Hi Richard, I didn't check memory usage, is that important? Since the struct bitmap is fairly bulky, it should take an arch with lots of hard regs (which one has the most?). But still a few bytes tradeoff wouldn't be acceptable for a much faster type? And IMHO it makes the code better to understand, since once you see HARD_REG_SET you know you can't expect else. FWIW I'm now in the process of converting all other bitmap uses for hard regs, to HARD_REG_SETs, at least within DF. I'm not sure whether performance gains will be visible, however, not much code is as hot as df_get_call_refs(). Thanks, Dimitris
On 07/08/2011 11:05 AM, Dimitrios Apostolou wrote: > On Fri, 8 Jul 2011, Jakub Jelinek wrote: >> On Fri, Jul 08, 2011 at 06:20:04AM +0300, Dimitrios Apostolou wrote: >>> The attached patch does two things for df_get_call_refs(): >>> * First it uses HARD_REG_SETs for defs_generated and >>> regs_invalidated_by_call, instead of bitmaps. Replacing in total >>> more than 400K calls (for my testcase) to bitmap_bit_p() with the >>> much faster TEST_HARD_REG_BIT, reduces the total instruction count >>> from about 13M to 1.5M. >> >> Have you verified that collection_rec->def_vec never contains pseudo >> register references? Otherwise you couldn't use >> HARD_REG_SET... gcc_checking_assert might be useful. >> > > Hi Jakub, Steve pointed me to the following from GCC Internals Manual: > > call_insn insns have the same extra fields as insn insns, accessed in > the same way and in addition contain a field CALL_INSN_FUNCTION_USAGE, > which contains a list (chain of expr_list expressions) containing use > and clobber expressions that denote hard registers and MEMs used or > clobbered by the called function. > > So doesn't that mean that for CALL insns it should contain only HARD_REG > DEFs? I will ofcourse use an assert to be sure. That part is only for CALL_INSN_FUNCTION_USAGE, which is what df_get_call_refs handles. However, if you rewrite the handling of defs_generated as required by your second patch, you'll then be sure that you will only have hard registers. BTW, what testcase are you using? I suggest that you try building stage1 with CFLAGS=--save-temps, and get some of the largest preprocessed .i files from there (combine and fold-const for example). You can then time them very easily from the old and new build directories, with "./cc1 /path/to/file.i -O2". Paolo
=== modified file 'gcc/df-scan.c' --- gcc/df-scan.c 2011-02-02 20:08:06 +0000 +++ gcc/df-scan.c 2011-07-08 01:28:55 +0000 @@ -3317,20 +3317,56 @@ int flags) { rtx note; - bitmap_iterator bi; - unsigned int ui; bool is_sibling_call; unsigned int i; df_ref def; - bitmap_head defs_generated; + HARD_REG_SET defs_generated; - bitmap_initialize (&defs_generated, &df_bitmap_obstack); + CLEAR_HARD_REG_SET(defs_generated); /* Do not generate clobbers for registers that are the result of the call. This causes ordering problems in the chain building code depending on which def is seen first. */ FOR_EACH_VEC_ELT (df_ref, collection_rec->def_vec, i, def) - bitmap_set_bit (&defs_generated, DF_REF_REGNO (def)); + SET_HARD_REG_BIT (defs_generated, DF_REF_REGNO (def)); + + is_sibling_call = SIBLING_CALL_P (insn_info->insn); + + for (i = 0; i < FIRST_PSEUDO_REGISTER; i++) + { + if (i == STACK_POINTER_REGNUM) + /* The stack ptr is used (honorarily) by a CALL insn. */ + df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i], + NULL, bb, insn_info, DF_REF_REG_USE, + DF_REF_CALL_STACK_USAGE | flags); + else if (global_regs[i]) + { + /* Calls to const functions cannot access any global registers and + calls to pure functions cannot set them. All other calls may + reference any of the global registers, so they are recorded as + used. */ + if (!RTL_CONST_CALL_P (insn_info->insn)) + { + df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i], + NULL, bb, insn_info, DF_REF_REG_USE, flags); + if (!RTL_PURE_CALL_P (insn_info->insn)) + df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i], + NULL, bb, insn_info, DF_REF_REG_DEF, flags); + } + } + /* TODO HARD_REG_SET set intersection! */ + else /* !global_regs[i] */ + /* track Caller-Saved registers */ + if (TEST_HARD_REG_BIT(regs_invalidated_by_call, i) + && !TEST_HARD_REG_BIT (defs_generated, i) + && (!is_sibling_call + || !bitmap_bit_p (df->exit_block_uses, i) + || refers_to_regno_p (i, i+1, + crtl->return_rtx, NULL))) + df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i], + NULL, bb, insn_info, DF_REF_REG_DEF, + DF_REF_MAY_CLOBBER | flags); + } /* Record the registers used to pass arguments, and explicitly noted as clobbered. */ @@ -3345,7 +3381,7 @@ if (REG_P (XEXP (XEXP (note, 0), 0))) { unsigned int regno = REGNO (XEXP (XEXP (note, 0), 0)); - if (!bitmap_bit_p (&defs_generated, regno)) + if (!TEST_HARD_REG_BIT (defs_generated, regno)) df_defs_record (collection_rec, XEXP (note, 0), bb, insn_info, flags); } @@ -3355,40 +3391,6 @@ } } - /* The stack ptr is used (honorarily) by a CALL insn. */ - df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[STACK_POINTER_REGNUM], - NULL, bb, insn_info, DF_REF_REG_USE, - DF_REF_CALL_STACK_USAGE | flags); - - /* Calls to const functions cannot access any global registers and calls to - pure functions cannot set them. All other calls may reference any of the - global registers, so they are recorded as used. */ - if (!RTL_CONST_CALL_P (insn_info->insn)) - for (i = 0; i < FIRST_PSEUDO_REGISTER; i++) - if (global_regs[i]) - { - df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i], - NULL, bb, insn_info, DF_REF_REG_USE, flags); - if (!RTL_PURE_CALL_P (insn_info->insn)) - df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i], - NULL, bb, insn_info, DF_REF_REG_DEF, flags); - } - - is_sibling_call = SIBLING_CALL_P (insn_info->insn); - EXECUTE_IF_SET_IN_BITMAP (regs_invalidated_by_call_regset, 0, ui, bi) - { - if (!global_regs[ui] - && (!bitmap_bit_p (&defs_generated, ui)) - && (!is_sibling_call - || !bitmap_bit_p (df->exit_block_uses, ui) - || refers_to_regno_p (ui, ui+1, - crtl->return_rtx, NULL))) - df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[ui], - NULL, bb, insn_info, DF_REF_REG_DEF, - DF_REF_MAY_CLOBBER | flags); - } - - bitmap_clear (&defs_generated); return; }