Message ID | 20110608020503.73AB8B218D@azwildcat.mtv.corp.google.com |
---|---|
State | New |
+davidxl

On Tue, Jun 7, 2011 at 7:05 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> Patch Description:
> =================
> [...]
> --
> This patch is available for review at http://codereview.appspot.com/4591045

ok for google/main.

David

On Wed, Jun 8, 2011 at 9:13 AM, Sriraman Tallam <tmsriram@google.com> wrote:
> +davidxl
>
> On Tue, Jun 7, 2011 at 7:05 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>> [...]

On Wed, Jun 8, 2011 at 9:16 AM, Xinliang David Li <davidxl@google.com> wrote:
> ok for google/main.

Thanks, the patch is now committed.

Patch Description:
=================

I am working on a project to do global function layout in the linker, where the linker reads the callgraph edge profile information generated by FDO and uses it to find an ordering of functions that places functions that frequently call each other closer together, like the Pettis-Hansen code ordering algorithm described in the paper "Profile Guided Code Positioning" in PLDI 1990.

This patch adds a flag that allows the callgraph edge profile information to be stored in .note sections called ".note.callgraph.text". The new compiler flag -fcallgraph-profiles-sections generates these sections and must be used along with -fprofile-use. I have added a PARAM to only output callgraph edges whose count is greater than a specified threshold. Once this is available, the linker can read these sections and build a global callgraph, which can be used to determine a global function ordering.

I am adding plugin support in the gold linker to allow linker plugins to read the contents of sections, and also adding plugin hooks to specify a desired ordering of functions to the linker. The linker patch is available here: http://sourceware.org/ml/binutils/2011-03/msg00043.html. Once this is available, linker plugins can be used to determine the function layout of the final binary, for example with the Pettis-Hansen algorithm.

Example: The new .note.callgraph.text section looks like this for a function foo that calls bar 100 times and zap 50 times:
****************************
        .section .note.callgraph.text._Z3foov,"",@progbits
        .string "Function _Z3foov"
        .string "_Z3barv"
        .string "100"
        .string "_Z3zapv"
        .string "50"
***************************

For now, this is for google/main. I will re-submit for review to trunk along with data layout.

Google ref 41940

2011-06-07  Sriraman Tallam  <tmsriram@google.com>

	* doc/invoke.texi: Document option -fcallgraph-profiles-sections.
	* final.c (dump_cgraph_profiles): New function.
	(rest_of_handle_final): Create new section '.note.callgraph.text'
	with compiler flag -fcallgraph-profiles-sections.
	* common.opt: New option -fcallgraph-profiles-sections.
	* params.def (DEFPARAM): New param
	PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD.

Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 174789)
+++ doc/invoke.texi	(working copy)
@@ -351,7 +351,7 @@ Objective-C and Objective-C++ Dialects}.
 -falign-labels[=@var{n}] -falign-loops[=@var{n}] -fassociative-math @gol
 -fauto-inc-dec -fbranch-probabilities -fbranch-target-load-optimize @gol
 -fbranch-target-load-optimize2 -fbtr-bb-exclusive -fcaller-saves @gol
--fcheck-data-deps -fclone-hot-version-paths @gol
+-fcallgraph-profiles-sections -fcheck-data-deps -fclone-hot-version-paths @gol
 -fcombine-stack-adjustments -fconserve-stack @gol
 -fcompare-elim -fcprop-registers -fcrossjumping @gol
 -fcse-follow-jumps -fcse-skip-blocks -fcx-fortran-rules @gol
@@ -8114,6 +8114,15 @@ Do not promote static functions with always inline
 @opindex fripa-verbose
 Enable printing of verbose information about dynamic inter-procedural optimizations.
 This is used in conjunction with the @option{-fripa}.
+
+@item -fcallgraph-profiles-sections
+@opindex fcallgraph-profiles-sections
+Emit call graph edge profile counts in .note.callgraph.text sections. This is
+used in conjunction with @option{-fprofile-use}. A new .note.callgraph.text
+section is created for each function. This section lists every callee and the
+number of times it is called. The params variable
+"note-cgraph-section-edge-threshold" can be used to only list edges above a
+certain threshold.
 @end table
 
 The following options control compiler behavior regarding floating
Index: final.c
===================================================================
--- final.c	(revision 174789)
+++ final.c	(working copy)
@@ -4321,13 +4321,37 @@ debug_free_queue (void)
       symbol_queue_size = 0;
     }
 }
-
+
+/* List the call graph profiled edges whose value is greater than
+   PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD in the
+   ".note.callgraph.text" section.  */
+static void
+dump_cgraph_profiles (void)
+{
+  struct cgraph_node *node = cgraph_node (current_function_decl);
+  struct cgraph_edge *e;
+  struct cgraph_node *callee;
+
+  for (e = node->callees; e != NULL; e = e->next_callee)
+    {
+      if (e->count <= PARAM_VALUE (PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD))
+        continue;
+      callee = e->callee;
+      fprintf (asm_out_file, "\t.string \"%s\"\n",
+               IDENTIFIER_POINTER (decl_assembler_name (callee->decl)));
+      fprintf (asm_out_file, "\t.string \"" HOST_WIDEST_INT_PRINT_DEC "\"\n",
+               e->count);
+    }
+}
+
 /* Turn the RTL into assembly.  */
 static unsigned int
 rest_of_handle_final (void)
 {
   rtx x;
   const char *fnname;
+  char *profile_fnname;
+  unsigned int flags;
 
   /* Get the function's name, as described by its RTL.  This may be
      different from the DECL_NAME name used in the source file.  */
@@ -4387,6 +4411,21 @@ rest_of_handle_final (void)
     targetm.asm_out.destructor (XEXP (DECL_RTL (current_function_decl), 0),
                                 decl_fini_priority_lookup
                                   (current_function_decl));
+
+  /* With -fcallgraph-profiles-sections, emit a ".note.callgraph.text"
+     section storing profiling information.  */
+  if (flag_callgraph_profiles_sections
+      && flag_profile_use
+      && cgraph_node (current_function_decl) != NULL)
+    {
+      flags = SECTION_DEBUG;
+      asprintf (&profile_fnname, ".note.callgraph.text.%s", fnname);
+      switch_to_section (get_section (profile_fnname, flags, NULL));
+      fprintf (asm_out_file, "\t.string \"Function %s\"\n", fnname);
+      dump_cgraph_profiles ();
+      free (profile_fnname);
+    }
+
   return 0;
 }
 
Index: common.opt
===================================================================
--- common.opt	(revision 174789)
+++ common.opt	(working copy)
@@ -907,6 +907,10 @@ fcaller-saves
 Common Report Var(flag_caller_saves) Optimization
 Save registers around function calls
 
+fcallgraph-profiles-sections
+Common Report Var(flag_callgraph_profiles_sections) Init(0)
+Generate .note.callgraph.text sections listing callees and edge counts.
+
 fcheck-data-deps
 Common Report Var(flag_check_data_deps)
 Compare the results of several data dependence analyzers.
Index: params.def
===================================================================
--- params.def	(revision 174789)
+++ params.def	(working copy)
@@ -1002,6 +1002,15 @@ DEFPARAM (PARAM_MVERSN_CLONE_CGRAPH_DEPTH,
           "maximum length of the call graph path to be cloned "
           "while doing multiversioning",
           2, 0, 5)
+
+/* Only output those call graph edges in .note.callgraph.text sections
+   whose count is greater than this value.  */
+DEFPARAM (PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD,
+          "note-cgraph-section-edge-threshold",
+          "minimum call graph edge count for inclusion in "
+          ".note.callgraph.text section",
+          0, 0, 0)
+
 /*
 Local variables:
 mode:c
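
On the consumer side, each `.string` directive emits a NUL-terminated string, so the contents of one `.note.callgraph.text.<function>` section are the header `"Function <caller>"` followed by alternating callee-name and decimal-count strings, exactly as written by `rest_of_handle_final` and `dump_cgraph_profiles` in the patch above. The C sketch below parses one such buffer into edges. It is illustrative only: it assumes a linker plugin has already obtained the raw bytes of a single section (the gold plugin calls for that are not shown), and `struct cg_edge` and `parse_callgraph_note` are hypothetical names, not part of GCC or gold.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One profiled call graph edge (hypothetical type).  The name pointers
   point into the section buffer, so the buffer must outlive the edges.  */
struct cg_edge
{
  const char *caller;
  const char *callee;
  long long count;
};

/* Parse the raw contents of one .note.callgraph.text.* section:
   "Function <caller>\0" followed by "<callee>\0<count>\0" pairs.
   Returns the number of edges stored in EDGES, or -1 if the input is
   malformed or holds more than MAX_EDGES edges.  */
static int
parse_callgraph_note (const char *buf, size_t len,
                      struct cg_edge *edges, int max_edges)
{
  static const char prefix[] = "Function ";
  const char *end = buf + len;
  const char *p, *nul, *caller;
  int n = 0;

  assert (edges != NULL || max_edges == 0);

  /* The header must be NUL-terminated and start with "Function ".  */
  nul = memchr (buf, '\0', len);
  if (nul == NULL || strncmp (buf, prefix, sizeof prefix - 1) != 0)
    return -1;
  caller = buf + sizeof prefix - 1;
  p = nul + 1;

  while (p < end)
    {
      const char *callee = p, *count_str;
      char *endp;

      /* Callee assembler name; a count string must follow it.  */
      nul = memchr (p, '\0', end - p);
      if (nul == NULL || nul + 1 >= end)
        return -1;

      /* Decimal edge count (printed with HOST_WIDEST_INT_PRINT_DEC).  */
      count_str = nul + 1;
      nul = memchr (count_str, '\0', end - count_str);
      if (nul == NULL || n == max_edges)
        return -1;

      edges[n].caller = caller;
      edges[n].callee = callee;
      edges[n].count = strtoll (count_str, &endp, 10);
      if (endp == count_str || endp != nul)
        return -1;  /* Empty or non-numeric count.  */
      n++;
      p = nul + 1;
    }
  return n;
}
```

Because `dump_cgraph_profiles` skips every edge whose count is `<=` the `note-cgraph-section-edge-threshold` value, every edge a reader finds in a section already has a count strictly greater than that threshold; the reader does not need to re-apply it.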
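
Once the edges from all sections are collected into a global callgraph, a plugin can compute a function order. The C sketch below is a simplified greedy chain-merging pass in the spirit of Pettis-Hansen: every function starts in its own chain, edges are visited from the highest count to the lowest, and the chain containing the callee is appended to the chain containing the caller. It is not the full published algorithm (which is more careful about how two chains are joined) and not necessarily what any gold plugin does. It assumes function names have already been mapped to small integer indices; `MAX_FUNCS`, `struct weighted_edge` and `order_functions` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_FUNCS 64  /* Hypothetical fixed limit to keep the sketch short.  */

/* Call graph edge from caller A to callee B (function indices),
   weighted by its profile count W.  */
struct weighted_edge
{
  int a, b;
  long long w;
};

/* qsort comparator: heavier edges first.  */
static int
cmp_weight_desc (const void *x, const void *y)
{
  const struct weighted_edge *ex = x, *ey = y;
  return (ex->w < ey->w) - (ex->w > ey->w);
}

/* Write a layout of functions 0..NFUNCS-1 into ORDER.  EDGES is sorted
   in place.  Chains are emitted in order of their head's index.  */
static void
order_functions (int nfuncs, struct weighted_edge *edges, int nedges,
                 int *order)
{
  int next[MAX_FUNCS];  /* Successor of i in its chain, or -1.  */
  int head[MAX_FUNCS];  /* First function of the chain holding i.  */
  int tail[MAX_FUNCS];  /* Last function of the chain headed by i.  */
  int i, j, k = 0;

  assert (nfuncs >= 0 && nfuncs <= MAX_FUNCS);
  for (i = 0; i < nfuncs; i++)
    {
      next[i] = -1;
      head[i] = i;
      tail[i] = i;
    }

  /* Visit edges from hottest to coldest.  */
  qsort (edges, nedges, sizeof *edges, cmp_weight_desc);

  for (i = 0; i < nedges; i++)
    {
      int ha, hb;

      assert (edges[i].a >= 0 && edges[i].a < nfuncs);
      assert (edges[i].b >= 0 && edges[i].b < nfuncs);
      ha = head[edges[i].a];
      hb = head[edges[i].b];
      if (ha == hb)
        continue;  /* Already in one chain (includes self-recursion).  */

      /* Append the callee's chain after the caller's chain and
         relabel the moved functions with their new chain head.  */
      next[tail[ha]] = hb;
      tail[ha] = tail[hb];
      for (j = hb; j != -1; j = next[j])
        head[j] = ha;
    }

  /* Emit each chain, front to back.  */
  for (i = 0; i < nfuncs; i++)
    if (head[i] == i)
      for (j = i; j != -1; j = next[j])
        order[k++] = j;
}
```

For the foo/bar/zap example above, numbering zap = 0, bar = 1 and foo = 2 with edges (2, 1, 100) and (2, 0, 50), this pass lays the functions out as foo, bar, zap, so the hottest callee sits directly after its caller.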