Patchwork [google] Patch to add compiler flag to dump callgraph edge profiles in special .note sections (issue4591045)

Submitter Sriraman Tallam
Date June 8, 2011, 2:05 a.m.
Message ID <20110608020503.73AB8B218D@azwildcat.mtv.corp.google.com>
Permalink /patch/99353/
State New

Comments

Sriraman Tallam - June 8, 2011, 2:05 a.m.
Patch Description:

--
This patch is available for review at http://codereview.appspot.com/4591045
Sriraman Tallam - June 8, 2011, 4:13 p.m.
+davidxl

On Tue, Jun 7, 2011 at 7:05 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> Patch Description:
> =================
>
> I am working on a project to do global function layout in the linker, where the linker reads the callgraph edge profile information generated by FDO and uses it to find an ordering of functions that places functions that call each other frequently closer together, like the Pettis-Hansen code ordering algorithm described in the paper "Profile Guided Code Positioning" (PLDI 1990).
>
> This patch adds a flag that allows the callgraph edge profile information to be stored in .note sections named ".note.callgraph.text". The new compiler flag -fcallgraph-profiles-sections generates these sections and must be used along with -fprofile-use. I have added a PARAM so that only callgraph edges whose count is greater than a specified threshold are output. Once this is available, the linker can read these sections and build a global callgraph, which can be used to determine a global function ordering.
>
> I am adding plugin support in the gold linker so that linker plugins can read the contents of sections, and I am also adding plugin hooks to specify a desired ordering of functions to the linker. The linker patch is available here: http://sourceware.org/ml/binutils/2011-03/msg00043.html. Once this is available, linker plugins can be used to determine the function layout of the final binary, for example with the Pettis-Hansen algorithm.
>
> Example: A new .note.callgraph.text section looks like this for a function foo that calls bar 100 times and zap 50 times:
> ****************************
> .section        .note.callgraph.text._Z3foov,"",@progbits
>        .string "Function _Z3foov"
>        .string "_Z3barv"
>        .string "100"
>        .string "_Z3zapv"
>        .string "50"
> ***************************
>
> For now, this is for google/main. I will re-submit for review to trunk along with data layout.
>
> Google ref 41940
>
> 2011-06-07  Sriraman Tallam  <tmsriram@google.com>
>
>        * doc/invoke.texi: Document option -fcallgraph-profiles-sections.
>        * final.c (dump_cgraph_profiles): New function.
>        (rest_of_handle_final): Create new section '.note.callgraph.text'
>        with compiler flag -fcallgraph-profiles-sections.
>        * common.opt: New option -fcallgraph-profiles-sections.
>        * params.def (DEFPARAM): New param
>        PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD.
>
> Index: doc/invoke.texi
> ===================================================================
> --- doc/invoke.texi     (revision 174789)
> +++ doc/invoke.texi     (working copy)
> @@ -351,7 +351,7 @@ Objective-C and Objective-C++ Dialects}.
>  -falign-labels[=@var{n}] -falign-loops[=@var{n}] -fassociative-math @gol
>  -fauto-inc-dec -fbranch-probabilities -fbranch-target-load-optimize @gol
>  -fbranch-target-load-optimize2 -fbtr-bb-exclusive -fcaller-saves @gol
> --fcheck-data-deps -fclone-hot-version-paths @gol
> +-fcallgraph-profiles-sections -fcheck-data-deps -fclone-hot-version-paths @gol
>  -fcombine-stack-adjustments -fconserve-stack @gol
>  -fcompare-elim -fcprop-registers -fcrossjumping @gol
>  -fcse-follow-jumps -fcse-skip-blocks -fcx-fortran-rules @gol
> @@ -8114,6 +8114,15 @@ Do not promote static functions with always inline
>  @opindex fripa-verbose
>  Enable printing of verbose information about dynamic inter-procedural optimizations.
>  This is used in conjunction with the @option{-fripa}.
> +
> +@item -fcallgraph-profiles-sections
> +@opindex fcallgraph-profiles-sections
> +Emit call graph edge profile counts in .note.callgraph.text sections. This is
> +used in conjunction with @option{-fprofile-use}. A new .note.callgraph.text
> +section is created for each function. This section lists every callee and the
> +number of times it is called. The params variable
> +"note-cgraph-section-edge-threshold" can be used to only list edges above a
> +certain threshold.
>  @end table
>
>  The following options control compiler behavior regarding floating
> Index: final.c
> ===================================================================
> --- final.c     (revision 174789)
> +++ final.c     (working copy)
> @@ -4321,13 +4321,37 @@ debug_free_queue (void)
>       symbol_queue_size = 0;
>     }
>  }
> -
> +
> +/* List the call graph profiled edges whose value is greater than
> +   PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD in the
> +   ".note.callgraph.text" section. */
> +static void
> +dump_cgraph_profiles (void)
> +{
> +  struct cgraph_node *node = cgraph_node (current_function_decl);
> +  struct cgraph_edge *e;
> +  struct cgraph_node *callee;
> +
> +  for (e = node->callees; e != NULL; e = e->next_callee)
> +    {
> +      if (e->count <= PARAM_VALUE (PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD))
> +        continue;
> +      callee = e->callee;
> +      fprintf (asm_out_file, "\t.string \"%s\"\n",
> +               IDENTIFIER_POINTER (decl_assembler_name (callee->decl)));
> +      fprintf (asm_out_file, "\t.string \"" HOST_WIDEST_INT_PRINT_DEC "\"\n",
> +               e->count);
> +    }
> +}
> +
>  /* Turn the RTL into assembly.  */
>  static unsigned int
>  rest_of_handle_final (void)
>  {
>   rtx x;
>   const char *fnname;
> +  char *profile_fnname;
> +  unsigned int flags;
>
>   /* Get the function's name, as described by its RTL.  This may be
>      different from the DECL_NAME name used in the source file.  */
> @@ -4387,6 +4411,21 @@ rest_of_handle_final (void)
>     targetm.asm_out.destructor (XEXP (DECL_RTL (current_function_decl), 0),
>                                decl_fini_priority_lookup
>                                  (current_function_decl));
> +
> +  /* With -fcallgraph-profiles-sections, add a ".note.callgraph.text"
> +     section for storing profiling information.  */
> +  if (flag_callgraph_profiles_sections
> +      && flag_profile_use
> +      && cgraph_node (current_function_decl) != NULL)
> +    {
> +      flags = SECTION_DEBUG;
> +      asprintf (&profile_fnname, ".note.callgraph.text.%s", fnname);
> +      switch_to_section (get_section (profile_fnname, flags, NULL));
> +      fprintf (asm_out_file, "\t.string \"Function %s\"\n", fnname);
> +      dump_cgraph_profiles ();
> +      free (profile_fnname);
> +    }
> +
>   return 0;
>  }
>
> Index: common.opt
> ===================================================================
> --- common.opt  (revision 174789)
> +++ common.opt  (working copy)
> @@ -907,6 +907,10 @@ fcaller-saves
>  Common Report Var(flag_caller_saves) Optimization
>  Save registers around function calls
>
> +fcallgraph-profiles-sections
> +Common Report Var(flag_callgraph_profiles_sections) Init(0)
> +Generate .note.callgraph.text sections listing callees and edge counts.
> +
>  fcheck-data-deps
>  Common Report Var(flag_check_data_deps)
>  Compare the results of several data dependence analyzers.
> Index: params.def
> ===================================================================
> --- params.def  (revision 174789)
> +++ params.def  (working copy)
> @@ -1002,6 +1002,15 @@ DEFPARAM (PARAM_MVERSN_CLONE_CGRAPH_DEPTH,
>          "maximum length of the call graph path to be cloned "
>           "while doing multiversioning",
>          2, 0, 5)
> +
> +/* Only output those call graph edges in .note.callgraph.text sections
> +   whose count is greater than this value. */
> +DEFPARAM (PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD,
> +         "note-cgraph-section-edge-threshold",
> +         "minimum call graph edge count for inclusion in "
> +          ".note.callgraph.text section",
> +         0, 0, 0)
> +
>  /*
>  Local variables:
>  mode:c
>
> --
> This patch is available for review at http://codereview.appspot.com/4591045
>
Xinliang David Li - June 8, 2011, 4:16 p.m.
ok for google/main.

David

Sriraman Tallam - June 8, 2011, 4:30 p.m.
On Wed, Jun 8, 2011 at 9:16 AM, Xinliang David Li <davidxl@google.com> wrote:
> ok for google/main.

Thanks, the patch is now committed.

Patch

Patch Description:
=================

I am working on a project to do global function layout in the linker, where the linker reads the callgraph edge profile information generated by FDO and uses it to find an ordering of functions that places functions that call each other frequently closer together, like the Pettis-Hansen code ordering algorithm described in the paper "Profile Guided Code Positioning" (PLDI 1990).

This patch adds a flag that allows the callgraph edge profile information to be stored in .note sections named ".note.callgraph.text". The new compiler flag -fcallgraph-profiles-sections generates these sections and must be used along with -fprofile-use. I have added a PARAM so that only callgraph edges whose count is greater than a specified threshold are output. Once this is available, the linker can read these sections and build a global callgraph, which can be used to determine a global function ordering.
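
The threshold filtering can be illustrated outside the compiler. What follows is a minimal standalone sketch, not the patch's code: `struct edge`, `emit_callgraph_note`, and the buffer-based interface are hypothetical names introduced only for illustration. It mirrors the comparison used in the patch, where an edge is listed only when its count is strictly greater than the threshold.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a profiled call graph edge.  */
struct edge
{
  const char *callee;   /* Assembler name of the callee.  */
  long long count;      /* Profiled call count.  */
};

/* Write the assembly text of one .note.callgraph.text section for
   CALLER into BUF (of size LEN).  Edges whose count is not strictly
   greater than THRESHOLD are skipped.  Returns the number of edges
   written, or -1 if BUF is too small.  */
static int
emit_callgraph_note (char *buf, size_t len, const char *caller,
                     const struct edge *edges, size_t n_edges,
                     long long threshold)
{
  size_t used;
  int emitted = 0;
  int n;

  assert (buf != NULL && caller != NULL);

  n = snprintf (buf, len,
                ".section\t.note.callgraph.text.%s,\"\",@progbits\n"
                "\t.string \"Function %s\"\n", caller, caller);
  if (n < 0 || (size_t) n >= len)
    return -1;
  used = (size_t) n;

  for (size_t i = 0; i < n_edges; i++)
    {
      /* Same test as the patch: drop edges at or below the threshold.  */
      if (edges[i].count <= threshold)
        continue;
      n = snprintf (buf + used, len - used,
                    "\t.string \"%s\"\n\t.string \"%lld\"\n",
                    edges[i].callee, edges[i].count);
      if (n < 0 || (size_t) n >= len - used)
        return -1;
      used += (size_t) n;
      emitted++;
    }
  return emitted;
}
```

With a threshold of 0, every edge with a non-zero count is listed. With a threshold of 50, an edge whose count is exactly 50 is dropped, because the comparison is strict.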

I am adding plugin support in the gold linker so that linker plugins can read the contents of sections, and I am also adding plugin hooks to specify a desired ordering of functions to the linker. The linker patch is available here: http://sourceware.org/ml/binutils/2011-03/msg00043.html. Once this is available, linker plugins can be used to determine the function layout of the final binary, for example with the Pettis-Hansen algorithm.
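
To give a rough idea of what such a plugin might compute, here is a simplified greedy chain-merging sketch in the spirit of Pettis-Hansen. It is not the full algorithm and not the plugin's code: it does not choose a chain orientation when merging and does not re-sum edge weights between merged groups. `struct call_edge`, `greedy_layout`, and `MAX_FUNCS` are hypothetical names, and functions are identified by small integer indices for brevity.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_FUNCS 64

/* Hypothetical weighted call edge between two function indices.  */
struct call_edge
{
  int caller;
  int callee;
  long long count;
};

/* qsort comparator: heaviest edge first.  */
static int
by_count_desc (const void *a, const void *b)
{
  const struct call_edge *x = a, *y = b;
  return (x->count < y->count) - (x->count > y->count);
}

/* Compute a layout for N functions from N_EDGES call edges and store
   the function indices, in layout order, into ORDER (N entries).
   EDGES is sorted in place.  */
static void
greedy_layout (struct call_edge *edges, size_t n_edges, int n, int *order)
{
  int chain_of[MAX_FUNCS];  /* Chain that each function belongs to.  */
  int next[MAX_FUNCS];      /* Successor within a chain, or -1.  */
  int head[MAX_FUNCS];      /* First function of a chain, -1 if merged.  */
  int tail[MAX_FUNCS];      /* Last function of a chain.  */
  int i, f, k = 0;
  size_t e;

  assert (n > 0 && n <= MAX_FUNCS);
  for (i = 0; i < n; i++)
    {
      chain_of[i] = head[i] = tail[i] = i;
      next[i] = -1;
    }

  qsort (edges, n_edges, sizeof *edges, by_count_desc);

  for (e = 0; e < n_edges; e++)
    {
      int a, b;

      assert (edges[e].caller >= 0 && edges[e].caller < n);
      assert (edges[e].callee >= 0 && edges[e].callee < n);
      a = chain_of[edges[e].caller];
      b = chain_of[edges[e].callee];
      if (a == b)
        continue;               /* Already placed together.  */

      /* Append the callee's chain B to the caller's chain A.  */
      next[tail[a]] = head[b];
      for (f = head[b]; f != -1; f = next[f])
        chain_of[f] = a;
      tail[a] = tail[b];
      head[b] = -1;
    }

  /* Concatenate the surviving chains in chain-id order.  */
  for (i = 0; i < n; i++)
    for (f = head[i]; f != -1; f = next[f])
      order[k++] = f;
}
```

Processing the heaviest edges first means the most frequently executed caller/callee pairs are the first to be placed next to each other. A real implementation would also decide which ends of two chains to join.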

Example: A new .note.callgraph.text section looks like this for a function foo that calls bar 100 times and zap 50 times:
****************************
.section	.note.callgraph.text._Z3foov,"",@progbits
	.string "Function _Z3foov"
	.string "_Z3barv"
	.string "100"
	.string "_Z3zapv"
	.string "50"
***************************
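
A consumer of this section sees the `.string` directives as consecutive NUL-terminated strings: a "Function <name>" header followed by callee/count pairs. The sketch below shows one way such a payload could be decoded. It is only an illustration under that assumption, not the gold plugin's code: `struct parsed_note`, `parse_callgraph_note`, and the fixed `MAX_EDGES` limit are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_EDGES 32

struct parsed_edge
{
  const char *callee;   /* Points into the section data.  */
  long long count;
};

struct parsed_note
{
  const char *caller;   /* Points into the section data.  */
  size_t n_edges;
  struct parsed_edge edges[MAX_EDGES];
};

/* Return the next complete NUL-terminated string in [*P, END) and
   advance *P past its terminator, or return NULL if there is none.  */
static const char *
next_string (const char **p, const char *end)
{
  const char *s = *p;
  const char *nul;

  if (s >= end)
    return NULL;
  nul = memchr (s, '\0', (size_t) (end - s));
  if (nul == NULL)
    return NULL;
  *p = nul + 1;
  return s;
}

/* Decode the SIZE bytes at DATA into OUT.  Returns 0 on success and
   -1 if the payload is malformed or has more than MAX_EDGES edges.  */
static int
parse_callgraph_note (const char *data, size_t size,
                      struct parsed_note *out)
{
  static const char prefix[] = "Function ";
  const char *p = data;
  const char *end = data + size;
  const char *s;

  assert (data != NULL && out != NULL);
  out->n_edges = 0;

  /* The first string names the caller.  */
  s = next_string (&p, end);
  if (s == NULL || strncmp (s, prefix, sizeof prefix - 1) != 0)
    return -1;
  out->caller = s + sizeof prefix - 1;

  /* The remaining strings come in callee/count pairs.  */
  while ((s = next_string (&p, end)) != NULL)
    {
      const char *num = next_string (&p, end);
      char *num_end;

      if (num == NULL || out->n_edges == MAX_EDGES)
        return -1;
      out->edges[out->n_edges].callee = s;
      out->edges[out->n_edges].count = strtoll (num, &num_end, 10);
      if (num_end == num || *num_end != '\0')
        return -1;
      out->n_edges++;
    }
  /* Trailing bytes without a terminator mean a truncated payload.  */
  return p == end ? 0 : -1;
}
```

Storing the counts as decimal strings keeps the section independent of host byte order and integer width, at the cost of a parsing step like the one above.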

For now, this is for google/main. I will re-submit for review to trunk along with data layout.

Google ref 41940

2011-06-07  Sriraman Tallam  <tmsriram@google.com>

	* doc/invoke.texi: Document option -fcallgraph-profiles-sections.
	* final.c (dump_cgraph_profiles): New function.
	(rest_of_handle_final): Create new section '.note.callgraph.text'
	with compiler flag -fcallgraph-profiles-sections.
	* common.opt: New option -fcallgraph-profiles-sections.
	* params.def (DEFPARAM): New param
	PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD.

Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 174789)
+++ doc/invoke.texi	(working copy)
@@ -351,7 +351,7 @@  Objective-C and Objective-C++ Dialects}.
 -falign-labels[=@var{n}] -falign-loops[=@var{n}] -fassociative-math @gol
 -fauto-inc-dec -fbranch-probabilities -fbranch-target-load-optimize @gol
 -fbranch-target-load-optimize2 -fbtr-bb-exclusive -fcaller-saves @gol
--fcheck-data-deps -fclone-hot-version-paths @gol
+-fcallgraph-profiles-sections -fcheck-data-deps -fclone-hot-version-paths @gol
 -fcombine-stack-adjustments -fconserve-stack @gol
 -fcompare-elim -fcprop-registers -fcrossjumping @gol
 -fcse-follow-jumps -fcse-skip-blocks -fcx-fortran-rules @gol
@@ -8114,6 +8114,15 @@  Do not promote static functions with always inline
 @opindex fripa-verbose
 Enable printing of verbose information about dynamic inter-procedural optimizations.
 This is used in conjunction with the @option{-fripa}.
+
+@item -fcallgraph-profiles-sections
+@opindex fcallgraph-profiles-sections
+Emit call graph edge profile counts in .note.callgraph.text sections. This is
+used in conjunction with @option{-fprofile-use}. A new .note.callgraph.text
+section is created for each function. This section lists every callee and the
+number of times it is called. The params variable
+"note-cgraph-section-edge-threshold" can be used to only list edges above a
+certain threshold.
 @end table
 
 The following options control compiler behavior regarding floating
Index: final.c
===================================================================
--- final.c	(revision 174789)
+++ final.c	(working copy)
@@ -4321,13 +4321,37 @@  debug_free_queue (void)
       symbol_queue_size = 0;
     }
 }
-
+
+/* List the call graph profiled edges whose value is greater than
+   PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD in the
+   ".note.callgraph.text" section. */
+static void
+dump_cgraph_profiles (void)
+{
+  struct cgraph_node *node = cgraph_node (current_function_decl);
+  struct cgraph_edge *e;
+  struct cgraph_node *callee;
+
+  for (e = node->callees; e != NULL; e = e->next_callee)
+    {
+      if (e->count <= PARAM_VALUE (PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD))
+        continue;
+      callee = e->callee;
+      fprintf (asm_out_file, "\t.string \"%s\"\n",
+               IDENTIFIER_POINTER (decl_assembler_name (callee->decl)));
+      fprintf (asm_out_file, "\t.string \"" HOST_WIDEST_INT_PRINT_DEC "\"\n",
+               e->count);
+    }
+}
+
 /* Turn the RTL into assembly.  */
 static unsigned int
 rest_of_handle_final (void)
 {
   rtx x;
   const char *fnname;
+  char *profile_fnname;
+  unsigned int flags;
 
   /* Get the function's name, as described by its RTL.  This may be
      different from the DECL_NAME name used in the source file.  */
@@ -4387,6 +4411,21 @@  rest_of_handle_final (void)
     targetm.asm_out.destructor (XEXP (DECL_RTL (current_function_decl), 0),
 				decl_fini_priority_lookup
 				  (current_function_decl));
+
+  /* With -fcallgraph-profiles-sections, add a ".note.callgraph.text"
+     section for storing profiling information.  */
+  if (flag_callgraph_profiles_sections
+      && flag_profile_use
+      && cgraph_node (current_function_decl) != NULL)
+    {
+      flags = SECTION_DEBUG;
+      asprintf (&profile_fnname, ".note.callgraph.text.%s", fnname);
+      switch_to_section (get_section (profile_fnname, flags, NULL));
+      fprintf (asm_out_file, "\t.string \"Function %s\"\n", fnname);
+      dump_cgraph_profiles ();
+      free (profile_fnname);
+    }
+
   return 0;
 }
 
Index: common.opt
===================================================================
--- common.opt	(revision 174789)
+++ common.opt	(working copy)
@@ -907,6 +907,10 @@  fcaller-saves
 Common Report Var(flag_caller_saves) Optimization
 Save registers around function calls
 
+fcallgraph-profiles-sections
+Common Report Var(flag_callgraph_profiles_sections) Init(0)
+Generate .note.callgraph.text sections listing callees and edge counts.
+
 fcheck-data-deps
 Common Report Var(flag_check_data_deps)
 Compare the results of several data dependence analyzers.
Index: params.def
===================================================================
--- params.def	(revision 174789)
+++ params.def	(working copy)
@@ -1002,6 +1002,15 @@  DEFPARAM (PARAM_MVERSN_CLONE_CGRAPH_DEPTH,
 	  "maximum length of the call graph path to be cloned "
           "while doing multiversioning",
 	  2, 0, 5)
+
+/* Only output those call graph edges in .note.callgraph.text sections
+   whose count is greater than this value. */
+DEFPARAM (PARAM_NOTE_CGRAPH_SECTION_EDGE_THRESHOLD,
+	  "note-cgraph-section-edge-threshold",
+	  "minimum call graph edge count for inclusion in "
+          ".note.callgraph.text section",
+	  0, 0, 0)
+
 /*
 Local variables:
 mode:c