Convert more passes to new dump framework

Message ID CAAe5K+Vu0en=WFJp5bw-Vu=n05LYGxvmmfctnZSWeS-5PKT=Rg@mail.gmail.com
State New

Commit Message

Teresa Johnson Aug. 7, 2013, 5:23 a.m. UTC
On Tue, Aug 6, 2013 at 9:29 AM, Teresa Johnson <tejohnson@google.com> wrote:
> On Tue, Aug 6, 2013 at 9:01 AM, Martin Jambor <mjambor@suse.cz> wrote:
>> Hi,
>>
>> On Tue, Aug 06, 2013 at 07:14:42AM -0700, Teresa Johnson wrote:
>>> On Tue, Aug 6, 2013 at 5:37 AM, Martin Jambor <mjambor@suse.cz> wrote:
>>> > On Mon, Aug 05, 2013 at 10:37:00PM -0700, Teresa Johnson wrote:
>>> >> This patch ports messages to the new dump framework,
>>> >
>>> > It would be great if this new framework was documented somewhere.  I lost
>>> > track of what was agreed it would be and from the uses in the
>>> > vectorizer I was never quite sure how to utilize it in other passes.
>>>
>>> Cc'ing Sharad who implemented this - Sharad, is this documented on a
>>> wiki or elsewhere?
>>
>> Thanks
>>
>>>
>>> >
>>> > I'd also like to point out two other minor things inline:
>>> >
>>> > [...]
>>> >
>>> >> 2013-08-06  Teresa Johnson  <tejohnson@google.com>
>>> >>             Dehao Chen  <dehao@google.com>
>>> >>
>>> >>         * dumpfile.c (dump_loc): Add column number to output, make newlines
>>> >>         consistent.
>>> >>         * dumpfile.h (OPTGROUP_OTHER): Add and enable under OPTGROUP_ALL.
>>> >>         * ipa-inline-transform.c (clone_inlined_nodes):
>>> >>         (cgraph_node_opt_info): New function.
>>> >>         (cgraph_node_call_chain): Ditto.
>>> >>         (dump_inline_decision): Ditto.
>>> >>         (inline_call): Invoke dump_inline_decision.
>>> >>         * doc/invoke.texi: Document optall -fopt-info flag.
>>> >>         * profile.c (read_profile_edge_counts): Use new dump framework.
>>> >>         (compute_branch_probabilities): Ditto.
>>> >>         * passes.c (pass_manager::register_one_dump_file): Use OPTGROUP_OTHER
>>> >>         when pass not in any opt group.
>>> >>         * value-prof.c (check_counter): Use new dump framework.
>>> >>         (find_func_by_funcdef_no): Ditto.
>>> >>         (check_ic_target): Ditto.
>>> >>         * coverage.c (get_coverage_counts): Ditto.
>>> >>         (coverage_init): Setup new dump framework.
>>> >>         * ipa-inline.c (inline_small_functions): Set is_in_ipa_inline.
>>> >>         * ipa-inline.h (is_in_ipa_inline): Declare.
>>> >>
>>> >>         * testsuite/gcc.dg/pr40209.c: Use -fopt-info.
>>> >>         * testsuite/gcc.dg/pr26570.c: Ditto.
>>> >>         * testsuite/gcc.dg/pr32773.c: Ditto.
>>> >>         * testsuite/g++.dg/tree-ssa/dom-invalid.C (struct C): Ditto.
>>> >>
>>> >
>>> > [...]
>>> >
>>> >> Index: ipa-inline-transform.c
>>> >> ===================================================================
>>> >> --- ipa-inline-transform.c      (revision 201461)
>>> >> +++ ipa-inline-transform.c      (working copy)
>>> >> @@ -192,6 +192,108 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d
>>> >>  }
>>> >>
>>> >>
>>> >> +#define MAX_INT_LENGTH 20
>>> >> +
>>> >> +/* Return NODE's name and profile count, if available.  */
>>> >> +
>>> >> +static const char *
>>> >> +cgraph_node_opt_info (struct cgraph_node *node)
>>> >> +{
>>> >> +  char *buf;
>>> >> +  size_t buf_size;
>>> >> +  const char *bfd_name = lang_hooks.dwarf_name (node->symbol.decl, 0);
>>> >> +
>>> >> +  if (!bfd_name)
>>> >> +    bfd_name = "unknown";
>>> >> +
>>> >> +  buf_size = strlen (bfd_name) + 1;
>>> >> +  if (profile_info)
>>> >> +    buf_size += (MAX_INT_LENGTH + 3);
>>> >> +
>>> >> +  buf = (char *) xmalloc (buf_size);
>>> >> +
>>> >> +  strcpy (buf, bfd_name);
>>> >> +
>>> >> +  if (profile_info)
>>> >> +    sprintf (buf, "%s ("HOST_WIDEST_INT_PRINT_DEC")", buf, node->count);
>>> >> +  return buf;
>>> >> +}
>>> >
>>> > I'm not sure if the output of this function is aimed only at the user or
>>> > if it is supposed to be used by gcc developers as well.  If the
>>> > latter, an incredibly useful thing is to dump node->symbol.order
>>> > too.  We usually dump it after a "/" sign separating it from the node name.
>>> > It is invaluable when examining decisions in C++ code where you can
>>> > have lots of clones of a node (and also because existing dumps print
>>> > it, it is easy to combine them).
>>>
>>> The output is useful both for power users doing performance tuning of
>>> their application and for gcc developers. Adding the id is not so
>>> useful for the former, but I agree that it is very useful for compiler
>>> developers. In fact, in the google branch version we emit more verbose
>>> information (the lipo module id and the funcdef_no) to help uniquely
>>> identify the routines and to aid in post-processing by humans and
>>> tools. So it is probably useful to add something similar here too. Is
>>> the node->symbol.order more or less unique than the funcdef_no? I see
>>> that you added a patch a few months ago to print the
>>> node->symbol.order in the function header, and it also has the
>>> advantage as you note of matching up with existing ipa dumps.
>>
>> node->symbol.order is unique and if I remember correctly, it is not
>> even recycled.  Clones, inline clones, thunks, every symbol table node
>> gets its own symbol order so it should be more unique than funcdef_no.
>> On the other hand it may be a bit cryptic for users but at the same
>> time it is only one number.
>
> Ok, I am going to go ahead and add this to the output.
>
>>
>>>
>>> >
>>> > [...]
>>> >
>>> >> Index: ipa-inline.c
>>> >> ===================================================================
>>> >> --- ipa-inline.c        (revision 201461)
>>> >> +++ ipa-inline.c        (working copy)
>>> >> @@ -118,6 +118,9 @@ along with GCC; see the file COPYING3.  If not see
>>> >>  static int overall_size;
>>> >>  static gcov_type max_count;
>>> >>
>>> >> +/* Global variable to denote if it is in ipa-inline pass. */
>>> >> +bool is_in_ipa_inline = false;
>>> >> +
>>> >>  /* Return false when inlining edge E would lead to violating
>>> >>     limits on function unit growth or stack usage growth.
>>> >>
>>> >
>>> > In this age of removing global variables, are you sure you need this?
>>> > The only user of this seems to be a function that is only being called
>>> > from inline_call... can that ever happen when not inlining?  If you
>>> > plan to use this function also elsewhere, perhaps the callers will
>>> > know whether we are inlining or not and can provide this in a
>>> > parameter?
>>>
>>> This is to distinguish early inlining from ipa inlining.
>>
>> Oh, right, I did not realize that the IPA part was the important bit
>> of the name.
>>
>>> The volume of
>>> early inlining messages is too high to be on for the default setting
>>> of -fopt-info, and they are usually not as interesting for performance
>>> tuning. The dumper will only emit the early inline messages under a
>>> more verbose setting (MSG_NOTE):
>>>       dump_printf_loc (is_in_ipa_inline ? MSG_OPTIMIZED_LOCATIONS : MSG_NOTE ...
>>> The other way I can see to distinguish this would be to check the
>>> always_inline_functions_inlined flag on the caller's function. It
>>> could also be possible to pass down a flag from the callers of
>>> inline_call, but at least one caller (flatten_functions) is shared
>>> between early and late inlining, so the flag needs to be passed
>>> through that as well. WDYT?
>>
>> Did you mean flatten_function?  It already has a bool "early"
>> parameter.  But I can see that being able to quickly figure out
>> whether we are in early inliner or ipa inliner without much hassle
>> would have been useful enough to justify a global variable a month
>> ago.  However, I suppose we should not be introducing them now and so
>> you'd have to put such stuff into... well, you'd probably have to put
>> it into the universe object somewhere because it is basically shared
>> between two passes.
>> Another option, even though somewhat hackish, would be to look at
>> current_pass and see which pass it is.  I don't know, do what is
>> easier or what you like more, just be aware of the problem.
>
> After thinking about this some more, I think passing down an early
> flag from callers is the cleanest way to go.
>
> I'll fix these and post a new patch later today.
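
A minimal sketch of the parameter-based approach, for illustration
only: the caller says whether it is the early inliner, and the dump
kind is chosen from that flag instead of from a global. The SKETCH_*
values and both function names are hypothetical; they are not GCC's
MSG_* definitions or any function in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for MSG_OPTIMIZED_LOCATIONS and MSG_NOTE.
   The real values live in dumpfile.h and are not reproduced here.  */
enum sketch_msg_kind
{
  SKETCH_MSG_OPTIMIZED = 1 << 0,
  SKETCH_MSG_NOTE = 1 << 1
};

/* Choose the dump kind for an inline decision.  Early-inliner
   decisions get the more verbose kind, so they stay out of the
   default output.  */
static int
inline_dump_kind (bool early)
{
  return early ? SKETCH_MSG_NOTE : SKETCH_MSG_OPTIMIZED;
}

/* Return true when a message of KIND should be printed, given the
   set of kinds the user enabled.  */
static bool
kind_enabled_p (int kind, int enabled_kinds)
{
  assert (kind != 0);
  return (kind & enabled_kinds) != 0;
}
```

With only SKETCH_MSG_OPTIMIZED enabled, a decision made with
early == true is filtered out, while one made with early == false is
printed.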

New patch below that removes this global variable, and also outputs
the node->symbol.order (in square brackets after the function name so
as to not clutter it). Inline messages with profile data look like:

test.c:8:3: note: foobar [0] (99999000) inlined into foo [2] (1000)
with call count 99999000 (via inline instance bar [3] (99999000))

(without FDO the counts in parentheses and the call count would not be
included).
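
For illustration, the "name [order] (count)" part of such a message can
be built with a single snprintf call into a buffer sized by a first
measuring pass. This avoids passing the destination buffer as a %s
argument to sprintf, which the C standard leaves undefined when the
objects overlap. struct node_info and format_node_info are hypothetical
stand-ins, not GCC's cgraph_node or the patch's cgraph_node_opt_info.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the cgraph_node fields used here.  */
struct node_info
{
  const char *name;  /* NULL when no name is available */
  int order;         /* stand-in for node->symbol.order */
  long long count;   /* stand-in for node->count */
};

/* Return a freshly allocated "name [order]" string, with " (count)"
   appended when HAVE_PROFILE is nonzero.  The caller frees the
   result.  Returns NULL if allocation fails.  */
static char *
format_node_info (const struct node_info *node, int have_profile)
{
  const char *name = node->name ? node->name : "unknown";
  const char *fmt = have_profile ? "%s [%d] (%lld)" : "%s [%d]";
  int len;
  char *buf;

  /* Measuring pass: with a NULL buffer and size 0, C99 snprintf
     returns the number of characters that would be written.  The
     extra argument is ignored when FMT has no %lld.  */
  len = snprintf (NULL, 0, fmt, name, node->order, node->count);
  assert (len >= 0);

  buf = (char *) malloc ((size_t) len + 1);
  if (!buf)
    return NULL;

  snprintf (buf, (size_t) len + 1, fmt, name, node->order, node->count);
  assert (strlen (buf) == (size_t) len);
  return buf;
}
```

Called with name "foobar", order 0, count 99999000 and have_profile
set, this yields the "foobar [0] (99999000)" fragment shown above.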

Ok for trunk?
Thanks,
Teresa

2013-08-06  Teresa Johnson  <tejohnson@google.com>
            Dehao Chen  <dehao@google.com>

        * dumpfile.c (dump_loc): Output column number, make newlines consistent.
        * dumpfile.h (OPTGROUP_OTHER): Add and enable under OPTGROUP_ALL.
        * ipa-inline-transform.c (cgraph_node_opt_info): New function.
        (cgraph_node_call_chain): Ditto.
        (dump_inline_decision): Ditto.
        (inline_call): Invoke dump_inline_decision, new parameter.
        * doc/invoke.texi: Document optall -fopt-info flag.
        * profile.c (read_profile_edge_counts): Use new dump framework.
        (compute_branch_probabilities): Ditto.
        * passes.c (pass_manager::register_one_dump_file): Use OPTGROUP_OTHER
        when pass not in any opt group.
        * value-prof.c (check_counter): Use new dump framework.
        (find_func_by_funcdef_no): Ditto.
        (check_ic_target): Ditto.
        * coverage.c (get_coverage_counts): Ditto.
        (coverage_init): Setup new dump framework.
        * ipa-inline.c (recursive_inlining): New inline_call parameter.
        (inline_small_functions): Ditto.
        (flatten_function): Ditto.
        (ipa_inline): Ditto.
        (inline_always_inline_functions): Ditto.
        (early_inline_small_functions): Ditto.
        * ipa-inline.h: Ditto.

        * testsuite/gcc.dg/pr40209.c: Use -fopt-info.
        * testsuite/gcc.dg/pr26570.c: Ditto.
        * testsuite/gcc.dg/pr32773.c: Ditto.
        * testsuite/g++.dg/tree-ssa/dom-invalid.C: Ditto.
        * testsuite/gcc.dg/inline-dump.c: New test.
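
The OPTGROUP_OTHER entries above can be sketched as follows: a pass
with no optimization group falls back to the new group, and the "all"
mask includes that group, so such a pass is selected only when all
groups are requested. This is a simplified illustration. The SK_*
macros and both functions are hypothetical; the LOOP, INLINE, VEC and
OTHER bit positions follow the dumpfile.h change in this thread, while
the NONE and IPA values are assumptions. The special handling of IPA
passes in passes.c is left out.

```c
#include <assert.h>

/* Hypothetical mirror of the OPTGROUP_* flags.  */
#define SK_OPTGROUP_NONE    0          /* assumed value */
#define SK_OPTGROUP_IPA     (1 << 1)   /* assumed value */
#define SK_OPTGROUP_LOOP    (1 << 2)
#define SK_OPTGROUP_INLINE  (1 << 3)
#define SK_OPTGROUP_VEC     (1 << 4)
#define SK_OPTGROUP_OTHER   (1 << 5)
#define SK_OPTGROUP_ALL     (SK_OPTGROUP_IPA | SK_OPTGROUP_LOOP \
                             | SK_OPTGROUP_INLINE | SK_OPTGROUP_VEC \
                             | SK_OPTGROUP_OTHER)

/* Return the group flags a pass is registered with: passes that
   declare no group are placed in the catch-all group.  */
static int
effective_optgroup (int pass_flags)
{
  assert (pass_flags >= 0);
  return pass_flags == SK_OPTGROUP_NONE ? SK_OPTGROUP_OTHER : pass_flags;
}

/* Return nonzero when a pass with PASS_FLAGS belongs to at least
   one of the REQUESTED_GROUPS.  */
static int
pass_selected_p (int pass_flags, int requested_groups)
{
  return (effective_optgroup (pass_flags) & requested_groups) != 0;
}
```

Under these assumptions a pass without a group is selected by
SK_OPTGROUP_ALL but not by SK_OPTGROUP_VEC alone.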

+int bar(void) { return foo(); }
>
> Thanks,
> Teresa
>
>>
>> Thanks,
>>
>> Martin
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

Comments

Teresa Johnson Aug. 12, 2013, 1:54 p.m. UTC | #1
On Tue, Aug 6, 2013 at 10:23 PM, Teresa Johnson <tejohnson@google.com> wrote:
> [...]
>
> New patch below that removes this global variable, and also outputs
> the node->symbol.order (in square brackets after the function name so
> as to not clutter it). Inline messages with profile data look like:
>
> test.c:8:3: note: foobar [0] (99999000) inlined into foo [2] (1000)
> with call count 99999000 (via inline instance bar [3] (99999000))
>
> (without FDO the counts in parentheses and the call count would not be
> included).
>
> Ok for trunk?

Ping.
Teresa

> Thanks,
> Teresa
>
> 2013-08-06  Teresa Johnson  <tejohnson@google.com>
>             Dehao Chen  <dehao@google.com>
>
>         * dumpfile.c (dump_loc): Output column number, make newlines consistent.
>         * dumpfile.h (OPTGROUP_OTHER): Add and enable under OPTGROUP_ALL.
>         * ipa-inline-transform.c (cgraph_node_opt_info): New function.
>         (cgraph_node_call_chain): Ditto.
>         (dump_inline_decision): Ditto.
>         (inline_call): Invoke dump_inline_decision, new parameter.
>         * doc/invoke.texi: Document optall -fopt-info flag.
>         * profile.c (read_profile_edge_counts): Use new dump framework.
>         (compute_branch_probabilities): Ditto.
>         * passes.c (pass_manager::register_one_dump_file): Use OPTGROUP_OTHER
>         when pass not in any opt group.
>         * value-prof.c (check_counter): Use new dump framework.
>         (find_func_by_funcdef_no): Ditto.
>         (check_ic_target): Ditto.
>         * coverage.c (get_coverage_counts): Ditto.
>         (coverage_init): Setup new dump framework.
>         * ipa-inline.c (recursive_inlining): New inline_call parameter.
>         (inline_small_functions): Ditto.
>         (flatten_function): Ditto.
>         (ipa_inline): Ditto.
>         (inline_always_inline_functions): Ditto.
>         (early_inline_small_functions): Ditto.
>         * ipa-inline.h: Ditto.
>
>         * testsuite/gcc.dg/pr40209.c: Use -fopt-info.
>         * testsuite/gcc.dg/pr26570.c: Ditto.
>         * testsuite/gcc.dg/pr32773.c: Ditto.
>         * testsuite/g++.dg/tree-ssa/dom-invalid.C: Ditto.
>         * testsuite/gcc.dg/inline-dump.c: New test.
>
> Index: dumpfile.c
> ===================================================================
> --- dumpfile.c  (revision 201461)
> +++ dumpfile.c  (working copy)
> @@ -257,16 +257,18 @@ dump_open_alternate_stream (struct dump_file_info
>  void
>  dump_loc (int dump_kind, FILE *dfile, source_location loc)
>  {
> -  /* Currently vectorization passes print location information.  */
>    if (dump_kind)
>      {
> +      /* Ensure dump message starts on a new line.  */
> +      fprintf (dfile, "\n");
>        if (LOCATION_LOCUS (loc) > BUILTINS_LOCATION)
> -        fprintf (dfile, "\n%s:%d: note: ", LOCATION_FILE (loc),
> -                 LOCATION_LINE (loc));
> +        fprintf (dfile, "%s:%d:%d: note: ", LOCATION_FILE (loc),
> +                 LOCATION_LINE (loc), LOCATION_COLUMN (loc));
>        else if (current_function_decl)
> -        fprintf (dfile, "\n%s:%d: note: ",
> +        fprintf (dfile, "%s:%d:%d: note: ",
>                   DECL_SOURCE_FILE (current_function_decl),
> -                 DECL_SOURCE_LINE (current_function_decl));
> +                 DECL_SOURCE_LINE (current_function_decl),
> +                 DECL_SOURCE_COLUMN (current_function_decl));
>      }
>  }
>
> Index: dumpfile.h
> ===================================================================
> --- dumpfile.h  (revision 201461)
> +++ dumpfile.h  (working copy)
> @@ -97,8 +97,9 @@ enum tree_dump_index
>  #define OPTGROUP_LOOP        (1 << 2)   /* Loop optimization passes */
>  #define OPTGROUP_INLINE      (1 << 3)   /* Inlining passes */
>  #define OPTGROUP_VEC         (1 << 4)   /* Vectorization passes */
> +#define OPTGROUP_OTHER       (1 << 5)   /* All other passes */
>  #define OPTGROUP_ALL        (OPTGROUP_IPA | OPTGROUP_LOOP | OPTGROUP_INLINE \
> -                              | OPTGROUP_VEC)
> +                              | OPTGROUP_VEC | OPTGROUP_OTHER)
>
>  /* Define a tree dump switch.  */
>  struct dump_file_info
> Index: ipa-inline-transform.c
> ===================================================================
> --- ipa-inline-transform.c      (revision 201461)
> +++ ipa-inline-transform.c      (working copy)
> @@ -192,6 +192,111 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d
>  }
>
>
> +#define MAX_INT_LENGTH 20
> +
> +/* Return NODE's name and profile count, if available.  */
> +
> +static const char *
> +cgraph_node_opt_info (struct cgraph_node *node)
> +{
> +  char *buf;
> +  size_t buf_size;
> +  const char *bfd_name = lang_hooks.dwarf_name (node->symbol.decl, 0);
> +
> +  if (!bfd_name)
> +    bfd_name = "unknown";
> +
> +  buf_size = strlen (bfd_name) + 1;
> +  if (profile_info)
> +    buf_size += (MAX_INT_LENGTH + 3);
> +  buf_size += MAX_INT_LENGTH;
> +
> +  buf = (char *) xmalloc (buf_size);
> +
> +  strcpy (buf, bfd_name);
> +  //sprintf (buf, "%s/%i", buf, node->symbol.order);
> +  sprintf (buf + strlen (buf), " [%i]", node->symbol.order);
> +
> +  if (profile_info)
> +    sprintf (buf + strlen (buf), " ("HOST_WIDEST_INT_PRINT_DEC")", node->count);
> +  return buf;
> +}
> +
> +
> +/* Return CALLER's inlined call chain. Save the cgraph_node of the ultimate
> +   function that the caller is inlined to in FINAL_CALLER.  */
> +
> +static const char *
> +cgraph_node_call_chain (struct cgraph_node *caller,
> +                       struct cgraph_node **final_caller)
> +{
> +  struct cgraph_node *node;
> +  const char *via_str = " (via inline instance";
> +  size_t current_string_len = strlen (via_str) + 1;
> +  size_t buf_size = current_string_len;
> +  char *buf = (char *) xmalloc (buf_size);
> +
> +  buf[0] = 0;
> +  gcc_assert (caller->global.inlined_to != NULL);
> +  strcat (buf, via_str);
> +  for (node = caller; node->global.inlined_to != NULL;
> +       node = node->callers->caller)
> +    {
> +      const char *name = cgraph_node_opt_info (node);
> +      current_string_len += (strlen (name) + 1);
> +      if (current_string_len >= buf_size)
> +       {
> +         buf_size = current_string_len * 2;
> +         buf = (char *) xrealloc (buf, buf_size);
> +       }
> +      strcat (buf, " ");
> +      strcat (buf, name);
> +    }
> +  strcat (buf, ")");
> +  *final_caller = node;
> +  return buf;
> +}
> +
> +
> +/* Dump the inline decision of EDGE.  */
> +
> +static void
> +dump_inline_decision (struct cgraph_edge *edge, bool early)
> +{
> +  location_t locus;
> +  const char *inline_chain_text;
> +  const char *call_count_text;
> +  struct cgraph_node *final_caller = edge->caller;
> +
> +  if (final_caller->global.inlined_to != NULL)
> +    inline_chain_text = cgraph_node_call_chain (final_caller, &final_caller);
> +  else
> +    inline_chain_text = "";
> +
> +  if (edge->count > 0)
> +    {
> +      const char *call_count_str = " with call count ";
> +      char *buf = (char *) xmalloc (strlen (call_count_str) + MAX_INT_LENGTH);
> +      sprintf (buf, "%s"HOST_WIDEST_INT_PRINT_DEC, call_count_str,
> +              edge->count);
> +      call_count_text = buf;
> +    }
> +  else
> +    {
> +      call_count_text = "";
> +    }
> +
> +  locus = gimple_location (edge->call_stmt);
> +  dump_printf_loc (early ? MSG_NOTE : MSG_OPTIMIZED_LOCATIONS,
> +                   locus,
> +                   "%s inlined into %s%s%s\n",
> +                   cgraph_node_opt_info (edge->callee),
> +                   cgraph_node_opt_info (final_caller),
> +                   call_count_text,
> +                   inline_chain_text);
> +}
> +
> +
>  /* Mark edge E as inlined and update callgraph accordingly.  UPDATE_ORIGINAL
>     specify whether profile of original function should be updated.  If any new
>     indirect edges are discovered in the process, add them to NEW_EDGES, unless
> @@ -205,7 +310,8 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d
>  bool
>  inline_call (struct cgraph_edge *e, bool update_original,
>              vec<cgraph_edge_p> *new_edges,
> -            int *overall_size, bool update_overall_summary)
> +            int *overall_size, bool update_overall_summary,
> +             bool early)
>  {
>    int old_size = 0, new_size = 0;
>    struct cgraph_node *to = NULL;
> @@ -218,6 +324,9 @@ inline_call (struct cgraph_edge *e, bool update_or
>    bool predicated = inline_edge_summary (e)->predicate != NULL;
>  #endif
>
> +  if (dump_enabled_p ())
> +    dump_inline_decision (e, early);
> +
>    /* Don't inline inlined edges.  */
>    gcc_assert (e->inline_failed);
>    /* Don't even think of inlining inline clone.  */
> Index: doc/invoke.texi
> ===================================================================
> --- doc/invoke.texi     (revision 201461)
> +++ doc/invoke.texi     (working copy)
> @@ -6234,6 +6234,9 @@ Enable dumps from all loop optimizations.
>  Enable dumps from all inlining optimizations.
>  @item vec
>  Enable dumps from all vectorization optimizations.
> +@item optall
> +Enable dumps from all optimizations. This is a superset of
> +the optimization groups listed above.
>  @end table
>
>  For example,
> Index: profile.c
> ===================================================================
> --- profile.c   (revision 201461)
> +++ profile.c   (working copy)
> @@ -432,8 +432,8 @@ read_profile_edge_counts (gcov_type *exec_counts)
>                     if (flag_profile_correction)
>                       {
>                         static bool informed = 0;
> -                       if (!informed)
> -                         inform (input_location,
> +                       if (dump_enabled_p () && !informed)
> +                         dump_printf_loc (MSG_NOTE, input_location,
>                                   "corrupted profile info: edge count exceeds maximal count");
>                         informed = 1;
>                       }
> @@ -692,10 +692,11 @@ compute_branch_probabilities (unsigned cfg_checksu
>         {
>           /* Inconsistency detected. Make it flow-consistent. */
>           static int informed = 0;
> -         if (informed == 0)
> +         if (dump_enabled_p () && informed == 0)
>             {
>               informed = 1;
> -             inform (input_location, "correcting inconsistent profile data");
> +             dump_printf_loc (MSG_NOTE, input_location,
> +                              "correcting inconsistent profile data");
>             }
>           correct_negative_edge_counts ();
>           /* Set bb counts to the sum of the outgoing edge counts */
> Index: passes.c
> ===================================================================
> --- passes.c    (revision 201461)
> +++ passes.c    (working copy)
> @@ -524,6 +524,11 @@ pass_manager::register_one_dump_file (struct opt_p
>    flag_name = concat (prefix, name, num, NULL);
>    glob_name = concat (prefix, name, NULL);
>    optgroup_flags |= pass->optinfo_flags;
> +  /* For any passes that do not have an optgroup set, and which are not
> +     IPA passes setup above, set the optgroup to OPTGROUP_OTHER so that
> +     any dump messages are emitted properly under -fopt-info(-optall).  */
> +  if (optgroup_flags == OPTGROUP_NONE)
> +    optgroup_flags = OPTGROUP_OTHER;
>    id = dump_register (dot_name, flag_name, glob_name, flags, optgroup_flags);
>    set_pass_for_id (id, pass);
>    full_name = concat (prefix, pass->name, num, NULL);
> Index: value-prof.c
> ===================================================================
> --- value-prof.c        (revision 201461)
> +++ value-prof.c        (working copy)
> @@ -585,9 +585,11 @@ check_counter (gimple stmt, const char * name,
>                : DECL_SOURCE_LOCATION (current_function_decl);
>        if (flag_profile_correction)
>          {
> -         inform (locus, "correcting inconsistent value profile: "
> -                 "%s profiler overall count (%d) does not match BB count "
> -                  "(%d)", name, (int)*all, (int)bb_count);
> +          if (dump_enabled_p ())
> +            dump_printf_loc (MSG_MISSED_OPTIMIZATION, locus,
> +                             "correcting inconsistent value profile: %s "
> +                             "profiler overall count (%d) does not match BB "
> +                             "count (%d)", name, (int)*all, (int)bb_count);
>           *all = bb_count;
>           if (*count > *all)
>              *count = *all;
> @@ -1209,9 +1211,11 @@ find_func_by_funcdef_no (int func_id)
>    int max_id = get_last_funcdef_no ();
>    if (func_id >= max_id || cgraph_node_map[func_id] == NULL)
>      {
> -      if (flag_profile_correction)
> -        inform (DECL_SOURCE_LOCATION (current_function_decl),
> -                "Inconsistent profile: indirect call target (%d) does not exist", func_id);
> +      if (flag_profile_correction && dump_enabled_p ())
> +        dump_printf_loc (MSG_MISSED_OPTIMIZATION,
> +                         DECL_SOURCE_LOCATION (current_function_decl),
> +                         "Inconsistent profile: indirect call target (%d) "
> +                         "does not exist", func_id);
>        else
>          error ("Inconsistent profile: indirect call target (%d) does not exist", func_id);
>
> @@ -1235,8 +1239,10 @@ check_ic_target (gimple call_stmt, struct cgraph_n
>       return true;
>
>     locus =  gimple_location (call_stmt);
> -   inform (locus, "Skipping target %s with mismatching types for icall ",
> -           cgraph_node_name (target));
> +   if (dump_enabled_p ())
> +     dump_printf_loc (MSG_MISSED_OPTIMIZATION, locus,
> +                      "Skipping target %s with mismatching types for icall ",
> +                      cgraph_node_name (target));
>     return false;
>  }
>
> Index: coverage.c
> ===================================================================
> --- coverage.c  (revision 201461)
> +++ coverage.c  (working copy)
> @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "langhooks.h"
>  #include "hash-table.h"
>  #include "tree-iterator.h"
> +#include "tree-pass.h"
>  #include "cgraph.h"
>  #include "dumpfile.h"
>  #include "diagnostic-core.h"
> @@ -341,11 +342,13 @@ get_coverage_counts (unsigned counter, unsigned ex
>      {
>        static int warned = 0;
>
> -      if (!warned++)
> -       inform (input_location, (flag_guess_branch_prob
> -                ? "file %s not found, execution counts estimated"
> -                : "file %s not found, execution counts assumed to be zero"),
> -               da_file_name);
> +      if (!warned++ && dump_enabled_p ())
> +       dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
> +                         (flag_guess_branch_prob
> +                          ? "file %s not found, execution counts estimated"
> +                          : "file %s not found, execution counts assumed to "
> +                            "be zero"),
> +                         da_file_name);
>        return NULL;
>      }
>
> @@ -369,21 +372,25 @@ get_coverage_counts (unsigned counter, unsigned ex
>         warning_at (input_location, OPT_Wcoverage_mismatch,
>                     "the control flow of function %qE does not match "
>                     "its profile data (counter %qs)", id, ctr_names[counter]);
> -      if (warning_printed)
> +      if (warning_printed && dump_enabled_p ())
>         {
> -        inform (input_location, "use -Wno-error=coverage-mismatch to tolerate "
> -                "the mismatch but performance may drop if the function is hot");
> +          dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
> +                           "use -Wno-error=coverage-mismatch to tolerate "
> +                           "the mismatch but performance may drop if the "
> +                           "function is hot");
>
>           if (!seen_error ()
>               && !warned++)
>             {
> -             inform (input_location, "coverage mismatch ignored");
> -             inform (input_location, flag_guess_branch_prob
> -                     ? G_("execution counts estimated")
> -                     : G_("execution counts assumed to be zero"));
> +             dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
> +                               "coverage mismatch ignored");
> +             dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
> +                               flag_guess_branch_prob
> +                               ? G_("execution counts estimated")
> +                               : G_("execution counts assumed to be zero"));
>               if (!flag_guess_branch_prob)
> -               inform (input_location,
> -                       "this can result in poorly optimized code");
> +               dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
> +                                 "this can result in poorly optimized code");
>             }
>         }
>
> @@ -1103,6 +1110,11 @@ coverage_init (const char *filename)
>    int len = strlen (filename);
>    int prefix_len = 0;
>
> +  /* Since coverage_init is invoked very early, before the pass
> +     manager, we need to set up the dumping explicitly. This is
> +     similar to the handling in finish_optimization_passes.  */
> +  dump_start (pass_profile.pass.static_pass_number, NULL);
> +
>    if (!profile_data_prefix && !IS_ABSOLUTE_PATH (filename))
>      profile_data_prefix = getpwd ();
>
> @@ -1145,6 +1157,8 @@ coverage_init (const char *filename)
>           gcov_write_unsigned (bbg_file_stamp);
>         }
>      }
> +
> +  dump_finish (pass_profile.pass.static_pass_number);
>  }
>
>  /* Performs file-level cleanup.  Close notes file, generate coverage
> Index: ipa-inline.c
> ===================================================================
> --- ipa-inline.c        (revision 201461)
> +++ ipa-inline.c        (working copy)
> @@ -1322,7 +1322,7 @@ recursive_inlining (struct cgraph_edge *edge,
>            reset_edge_growth_cache (curr);
>         }
>
> -      inline_call (curr, false, new_edges, &overall_size, true);
> +      inline_call (curr, false, new_edges, &overall_size, true, false);
>        lookup_recursive_calls (node, curr->callee, heap);
>        n++;
>      }
> @@ -1612,7 +1612,8 @@ inline_small_functions (void)
>             fprintf (dump_file, " Peeling recursion with depth %i\n", depth);
>
>           gcc_checking_assert (!callee->global.inlined_to);
> -         inline_call (edge, true, &new_indirect_edges, &overall_size, true);
> +         inline_call (edge, true, &new_indirect_edges, &overall_size, true,
> +                       false);
>           if (flag_indirect_inlining)
>             add_new_edges_to_heap (edge_heap, new_indirect_edges);
>
> @@ -1733,7 +1734,7 @@ flatten_function (struct cgraph_node *node, bool e
>                  xstrdup (cgraph_node_name (callee)),
>                  xstrdup (cgraph_node_name (e->caller)));
>        orig_callee = callee;
> -      inline_call (e, true, NULL, NULL, false);
> +      inline_call (e, true, NULL, NULL, false, early);
>        if (e->callee != orig_callee)
>         orig_callee->symbol.aux = (void *) node;
>        flatten_function (e->callee, early);
> @@ -1852,7 +1853,8 @@ ipa_inline (void)
>                                    inline_summary (node->callers->caller)->size);
>                         }
>
> -                     inline_call (node->callers, true, NULL, NULL, true);
> +                     inline_call (node->callers, true, NULL, NULL, true,
> +                                   false);
>                       if (dump_file)
>                         fprintf (dump_file,
>                                  " Inlined into %s which now has %i size\n",
> @@ -1925,7 +1927,7 @@ inline_always_inline_functions (struct cgraph_node
>         fprintf (dump_file, "  Inlining %s into %s (always_inline).\n",
>                  xstrdup (cgraph_node_name (e->callee)),
>                  xstrdup (cgraph_node_name (e->caller)));
> -      inline_call (e, true, NULL, NULL, false);
> +      inline_call (e, true, NULL, NULL, false, true);
>        inlined = true;
>      }
>    if (inlined)
> @@ -1977,7 +1979,7 @@ early_inline_small_functions (struct cgraph_node *
>         fprintf (dump_file, " Inlining %s into %s.\n",
>                  xstrdup (cgraph_node_name (callee)),
>                  xstrdup (cgraph_node_name (e->caller)));
> -      inline_call (e, true, NULL, NULL, true);
> +      inline_call (e, true, NULL, NULL, true, true);
>        inlined = true;
>      }
>
> Index: ipa-inline.h
> ===================================================================
> --- ipa-inline.h        (revision 201461)
> +++ ipa-inline.h        (working copy)
> @@ -228,7 +228,8 @@ void free_growth_caches (void);
>  void compute_inline_parameters (struct cgraph_node *, bool);
>
>  /* In ipa-inline-transform.c  */
> -bool inline_call (struct cgraph_edge *, bool, vec<cgraph_edge_p> *, int *, bool);
> +bool inline_call (struct cgraph_edge *, bool, vec<cgraph_edge_p> *, int *,
> +                  bool, bool);
>  unsigned int inline_transform (struct cgraph_node *);
>  void clone_inlined_nodes (struct cgraph_edge *e, bool, bool, int *);
>
> Index: testsuite/gcc.dg/pr40209.c
> ===================================================================
> --- testsuite/gcc.dg/pr40209.c  (revision 201461)
> +++ testsuite/gcc.dg/pr40209.c  (working copy)
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fprofile-use" } */
> +/* { dg-options "-O2 -fprofile-use -fopt-info" } */
>
>  void process(const char *s);
>
> Index: testsuite/gcc.dg/pr26570.c
> ===================================================================
> --- testsuite/gcc.dg/pr26570.c  (revision 201461)
> +++ testsuite/gcc.dg/pr26570.c  (working copy)
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fprofile-generate -fprofile-use" } */
> +/* { dg-options "-O2 -fprofile-generate -fprofile-use -fopt-info" } */
>
>  unsigned test (unsigned a, unsigned b)
>  {
> Index: testsuite/gcc.dg/pr32773.c
> ===================================================================
> --- testsuite/gcc.dg/pr32773.c  (revision 201461)
> +++ testsuite/gcc.dg/pr32773.c  (working copy)
> @@ -1,6 +1,6 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O -fprofile-use" } */
> -/* { dg-options "-O -m4 -fprofile-use" { target sh-*-* } } */
> +/* { dg-options "-O -fprofile-use -fopt-info" } */
> +/* { dg-options "-O -m4 -fprofile-use -fopt-info" { target sh-*-* } } */
>
>  void foo (int *p)
>  {
> Index: testsuite/g++.dg/tree-ssa/dom-invalid.C
> ===================================================================
> --- testsuite/g++.dg/tree-ssa/dom-invalid.C     (revision 201461)
> +++ testsuite/g++.dg/tree-ssa/dom-invalid.C     (working copy)
> @@ -1,7 +1,7 @@
>  // PR tree-optimization/39557
>  // invalid post-dom info leads to infinite loop
>  // { dg-do run }
> -// { dg-options "-Wall -fno-exceptions -O2 -fprofile-use -fno-rtti" }
> +// { dg-options "-Wall -fno-exceptions -O2 -fprofile-use -fopt-info -fno-rtti" }
>
>  struct C
>  {
> Index: testsuite/gcc.dg/inline-dump.c
> ===================================================================
> --- testsuite/gcc.dg/inline-dump.c      (revision 0)
> +++ testsuite/gcc.dg/inline-dump.c      (revision 0)
> @@ -0,0 +1,11 @@
> +/* Verify that -fopt-info can output correct inline info.  */
> +/* { dg-do compile } */
> +/* { dg-options "-Wall -fopt-info-inline=stderr -O2 -fno-early-inlining" } */
> +static inline int leaf() {
> +  int i, ret = 0;
> +  for (i = 0; i < 10; i++)
> +    ret += i;
> +  return ret;
> +}
> +static inline int foo(void) { return leaf(); } /* { dg-message "note: leaf .*inlined into bar .*via inline instance foo.*\n" } */
> +int bar(void) { return foo(); }
>>
>> Thanks,
>> Teresa
>>
>>>
>>> Thanks,
>>>
>>> Martin
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
Teresa Johnson Aug. 19, 2013, 6:33 p.m. UTC | #2
Ping.
Thanks,
Teresa

On Mon, Aug 12, 2013 at 6:54 AM, Teresa Johnson <tejohnson@google.com> wrote:
> On Tue, Aug 6, 2013 at 10:23 PM, Teresa Johnson <tejohnson@google.com> wrote:
>> On Tue, Aug 6, 2013 at 9:29 AM, Teresa Johnson <tejohnson@google.com> wrote:
>>> On Tue, Aug 6, 2013 at 9:01 AM, Martin Jambor <mjambor@suse.cz> wrote:
>>>> Hi,
>>>>
>>>> On Tue, Aug 06, 2013 at 07:14:42AM -0700, Teresa Johnson wrote:
>>>>> On Tue, Aug 6, 2013 at 5:37 AM, Martin Jambor <mjambor@suse.cz> wrote:
>>>>> > On Mon, Aug 05, 2013 at 10:37:00PM -0700, Teresa Johnson wrote:
>>>>> >> This patch ports messages to the new dump framework,
>>>>> >
>>>>> > It would be great if this new framework was documented somewhere.  I lost
>>>>> > track of what was agreed it would be and from the uses in the
>>>>> > vectorizer I was never quite sure how to utilize it in other passes.
>>>>>
>>>>> Cc'ing Sharad who implemented this - Sharad, is this documented on a
>>>>> wiki or elsewhere?
>>>>
>>>> Thanks
>>>>
>>>>>
>>>>> >
>>>>> > I'd also like to point out two other minor things inline:
>>>>> >
>>>>> > [...]
>>>>> >
>>>>> >> 2013-08-06  Teresa Johnson  <tejohnson@google.com>
>>>>> >>             Dehao Chen  <dehao@google.com>
>>>>> >>
>>>>> >>         * dumpfile.c (dump_loc): Add column number to output, make newlines
>>>>> >>         consistent.
>>>>> >>         * dumpfile.h (OPTGROUP_OTHER): Add and enable under OPTGROUP_ALL.
>>>>> >>         * ipa-inline-transform.c (clone_inlined_nodes):
>>>>> >>         (cgraph_node_opt_info): New function.
>>>>> >>         (cgraph_node_call_chain): Ditto.
>>>>> >>         (dump_inline_decision): Ditto.
>>>>> >>         (inline_call): Invoke dump_inline_decision.
>>>>> >>         * doc/invoke.texi: Document optall -fopt-info flag.
>>>>> >>         * profile.c (read_profile_edge_counts): Use new dump framework.
>>>>> >>         (compute_branch_probabilities): Ditto.
>>>>> >>         * passes.c (pass_manager::register_one_dump_file): Use OPTGROUP_OTHER
>>>>> >>         when pass not in any opt group.
>>>>> >>         * value-prof.c (check_counter): Use new dump framework.
>>>>> >>         (find_func_by_funcdef_no): Ditto.
>>>>> >>         (check_ic_target): Ditto.
>>>>> >>         * coverage.c (get_coverage_counts): Ditto.
>>>>> >>         (coverage_init): Setup new dump framework.
>>>>> >>         * ipa-inline.c (inline_small_functions): Set is_in_ipa_inline.
>>>>> >>         * ipa-inline.h (is_in_ipa_inline): Declare.
>>>>> >>
>>>>> >>         * testsuite/gcc.dg/pr40209.c: Use -fopt-info.
>>>>> >>         * testsuite/gcc.dg/pr26570.c: Ditto.
>>>>> >>         * testsuite/gcc.dg/pr32773.c: Ditto.
>>>>> >>         * testsuite/g++.dg/tree-ssa/dom-invalid.C (struct C): Ditto.
>>>>> >>
>>>>> >
>>>>> > [...]
>>>>> >
>>>>> >> Index: ipa-inline-transform.c
>>>>> >> ===================================================================
>>>>> >> --- ipa-inline-transform.c      (revision 201461)
>>>>> >> +++ ipa-inline-transform.c      (working copy)
>>>>> >> @@ -192,6 +192,108 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d
>>>>> >>  }
>>>>> >>
>>>>> >>
>>>>> >> +#define MAX_INT_LENGTH 20
>>>>> >> +
>>>>> >> +/* Return NODE's name and profile count, if available.  */
>>>>> >> +
>>>>> >> +static const char *
>>>>> >> +cgraph_node_opt_info (struct cgraph_node *node)
>>>>> >> +{
>>>>> >> +  char *buf;
>>>>> >> +  size_t buf_size;
>>>>> >> +  const char *bfd_name = lang_hooks.dwarf_name (node->symbol.decl, 0);
>>>>> >> +
>>>>> >> +  if (!bfd_name)
>>>>> >> +    bfd_name = "unknown";
>>>>> >> +
>>>>> >> +  buf_size = strlen (bfd_name) + 1;
>>>>> >> +  if (profile_info)
>>>>> >> +    buf_size += (MAX_INT_LENGTH + 3);
>>>>> >> +
>>>>> >> +  buf = (char *) xmalloc (buf_size);
>>>>> >> +
>>>>> >> +  strcpy (buf, bfd_name);
>>>>> >> +
>>>>> >> +  if (profile_info)
>>>>> >> +    sprintf (buf, "%s ("HOST_WIDEST_INT_PRINT_DEC")", buf, node->count);
>>>>> >> +  return buf;
>>>>> >> +}
>>>>> >
>>>>> > I'm not sure if output of this function is aimed only at the user or
>>>>> > if it is supposed to be used by gcc developers as well.  If the
>>>>> > latter, an incredibly useful thing is to also dump node->symbol.order
>>>>> > too.  We usually dump it after "/" sign separating it from node name.
>>>>> > It is invaluable when examining decisions in C++ code where you can
>>>>> > have lots of clones of a node (and also because existing dumps print
>>>>> > it, it is easy to combine them).
>>>>>
>>>>> The output is useful both for power users doing performance tuning of
>>>>> their applications and for gcc developers. Adding the id is not so
>>>>> useful for the former, but I agree that it is very useful for compiler
>>>>> developers. In fact, in the google branch version we emit more verbose
>>>>> information (the lipo module id and the funcdef_no) to help uniquely
>>>>> identify the routines and to aid in post-processing by humans and
>>>>> tools. So it is probably useful to add something similar here too. Is
>>>>> the node->symbol.order more or less unique than the funcdef_no? I see
>>>>> that you added a patch a few months ago to print the
>>>>> node->symbol.order in the function header, and it also has the
>>>>> advantage as you note of matching up with existing ipa dumps.
>>>>
>>>> node->symbol.order is unique and if I remember correctly, it is not
>>>> even recycled.  Clones, inline clones, thunks, every symbol table node
>>>> gets its own symbol order so it should be more unique than funcdef_no.
>>>> On the other hand it may be a bit cryptic for users but at the same
>>>> time it is only one number.
>>>
>>> Ok, I am going to go ahead and add this to the output.
>>>
>>>>
>>>>>
>>>>> >
>>>>> > [...]
>>>>> >
>>>>> >> Index: ipa-inline.c
>>>>> >> ===================================================================
>>>>> >> --- ipa-inline.c        (revision 201461)
>>>>> >> +++ ipa-inline.c        (working copy)
>>>>> >> @@ -118,6 +118,9 @@ along with GCC; see the file COPYING3.  If not see
>>>>> >>  static int overall_size;
>>>>> >>  static gcov_type max_count;
>>>>> >>
>>>>> >> +/* Global variable to denote if it is in ipa-inline pass. */
>>>>> >> +bool is_in_ipa_inline = false;
>>>>> >> +
>>>>> >>  /* Return false when inlining edge E would lead to violating
>>>>> >>     limits on function unit growth or stack usage growth.
>>>>> >>
>>>>> >
>>>>> > In this age of removing global variables, are you sure you need this?
>>>>> > The only user of this seems to be a function that is only being called
>>>>> > from inline_call... can that ever happen when not inlining?  If you
>>>>> > plan to use this function also elsewhere, perhaps the callers will
>>>>> > know whether we are inlining or not and can provide this in a
>>>>> > parameter?
>>>>>
>>>>> This is to distinguish early inlining from ipa inlining.
>>>>
>>>> Oh, right, I did not realize that the IPA part was the important bit
>>>> of the name.
>>>>
>>>>> The volume of
>>>>> early inlining messages is too high to be on for the default setting
>>>>> of -fopt-info, and they are usually not as interesting for performance
>>>>> tuning. The dumper will only emit the early inline messages under a
>>>>> more verbose setting (MSG_NOTE):
>>>>>       dump_printf_loc (is_in_ipa_inline ? MSG_OPTIMIZED_LOCATIONS : MSG_NOTE ...
>>>>> The other way I can see to distinguish this would be to check the
>>>>> always_inline_functions_inlined flag on the caller's function. It
>>>>> could also be possible to pass down a flag from the callers of
>>>>> inline_call, but at least one caller (flatten_functions) is shared
>>>>> between early and late inlining, so the flag needs to be passed
>>>>> through that as well. WDYT?
>>>>
>>>> Did you mean flatten_function?  It already has a bool "early"
>>>> parameter.  But I can see that being able to quickly figure out
>>>> whether we are in early inliner or ipa inliner without much hassle is
>>>> useful enough to justify a global variable a month ago, however I
>>>> suppose we should not be introducing them now and so you'd have to put
>>>> such stuff into... well, you'd probably have to put it into the universe
>>>> object somewhere because it is basically shared between two passes.
>>>> Another option, even though somewhat hackish, would be to look at
>>>> current_pass and see which pass it is.  I don't know, do what is
>>>> easier or what you like more, just be aware of the problem.
>>>
>>> After thinking about this some more, I think passing down an early
>>> flag from callers is the cleanest way to go.
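For illustration, the idea of threading an "early" flag down from the callers can be sketched in isolation as below. This is a standalone sketch, not code from the patch: the sketch_* names, the struct, and the enum values are all hypothetical stand-ins for the real MSG_NOTE / MSG_OPTIMIZED_LOCATIONS dump kinds and for inline_call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the dump kinds named in the thread; the
   numeric values are made up for this sketch.  */
enum sketch_msg_kind
{
  SKETCH_MSG_OPTIMIZED_LOCATIONS = 1,
  SKETCH_MSG_NOTE = 2
};

/* Hypothetical stand-in for a call graph edge.  */
struct sketch_edge
{
  const char *callee;
  const char *caller;
};

/* Early-inlining messages go to the more verbose kind; IPA-inlining
   messages go to the kind that is on by default.  */
static enum sketch_msg_kind
sketch_inline_msg_kind (bool early)
{
  return early ? SKETCH_MSG_NOTE : SKETCH_MSG_OPTIMIZED_LOCATIONS;
}

/* The caller knows which inliner it belongs to and passes that down as a
   parameter, so no global variable is consulted.  Returns the kind the
   message would be emitted under.  */
static enum sketch_msg_kind
sketch_inline_call (const struct sketch_edge *e, bool early)
{
  assert (e != NULL);
  return sketch_inline_msg_kind (early);
}
```

An early-inliner caller would pass true and an IPA-inliner caller false; a function shared by both, such as one that already takes an "early" parameter, would simply forward its own argument.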
>>>
>>> I'll fix these and post a new patch later today.
>>
>> New patch below that removes this global variable, and also outputs
>> the node->symbol.order (in square brackets after the function name so
>> as to not clutter it). Inline messages with profile data look like:
>>
>> test.c:8:3: note: foobar [0] (99999000) inlined into foo [2] (1000)
>> with call count 99999000 (via inline instance bar [3] (99999000))
>>
>> (without FDO the counts in parentheses and the call count would not be
>> included).
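For illustration only, the per-function part of such a message (for example "foobar [0] (99999000)") could be built as in the sketch below. This is standalone C, not the patch's code: format_node_info and its parameters are made-up names. It lets snprintf measure the output instead of relying on a fixed length estimate, and it never passes the destination buffer as one of its own "%s" arguments, which the C standard leaves undefined for sprintf.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of the patch: format "NAME [ORDER]" and,
   when HAVE_PROFILE is nonzero, append " (COUNT)".  Returns a malloc'd
   string that the caller must free, or NULL on failure.  */
static char *
format_node_info (const char *name, int order, int have_profile,
                  long long count)
{
  int len;
  char *buf;

  assert (name != NULL);

  /* First pass: a NULL buffer of size 0 only measures the output.  */
  if (have_profile)
    len = snprintf (NULL, 0, "%s [%d] (%lld)", name, order, count);
  else
    len = snprintf (NULL, 0, "%s [%d]", name, order);
  if (len < 0)
    return NULL;

  buf = (char *) malloc ((size_t) len + 1);
  if (buf == NULL)
    return NULL;

  /* Second pass: write into a buffer of exactly the measured size.  */
  if (have_profile)
    snprintf (buf, (size_t) len + 1, "%s [%d] (%lld)", name, order, count);
  else
    snprintf (buf, (size_t) len + 1, "%s [%d]", name, order);
  return buf;
}
```

Without profile data the same helper yields just the name and the order in square brackets, matching the description above of what is omitted without FDO.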
>>
>> Ok for trunk?
>
> Ping.
> Teresa
>
>> Thanks,
>> Teresa
>>
>> 2013-08-06  Teresa Johnson  <tejohnson@google.com>
>>             Dehao Chen  <dehao@google.com>
>>
>>         * dumpfile.c (dump_loc): Output column number, make newlines consistent.
>>         * dumpfile.h (OPTGROUP_OTHER): Add and enable under OPTGROUP_ALL.
>>         * ipa-inline-transform.c (cgraph_node_opt_info): New function.
>>         (cgraph_node_call_chain): Ditto.
>>         (dump_inline_decision): Ditto.
>>         (inline_call): Invoke dump_inline_decision, new parameter.
>>         * doc/invoke.texi: Document optall -fopt-info flag.
>>         * profile.c (read_profile_edge_counts): Use new dump framework.
>>         (compute_branch_probabilities): Ditto.
>>         * passes.c (pass_manager::register_one_dump_file): Use OPTGROUP_OTHER
>>         when pass not in any opt group.
>>         * value-prof.c (check_counter): Use new dump framework.
>>         (find_func_by_funcdef_no): Ditto.
>>         (check_ic_target): Ditto.
>>         * coverage.c (get_coverage_counts): Ditto.
>>         (coverage_init): Setup new dump framework.
>>         * ipa-inline.c (recursive_inlining): New inline_call parameter.
>>         (inline_small_functions): Ditto.
>>         (flatten_function): Ditto.
>>         (ipa_inline): Ditto.
>>         (inline_always_inline_functions): Ditto.
>>         (early_inline_small_functions): Ditto.
>>         * ipa-inline.h: Ditto.
>>
>>         * testsuite/gcc.dg/pr40209.c: Use -fopt-info.
>>         * testsuite/gcc.dg/pr26570.c: Ditto.
>>         * testsuite/gcc.dg/pr32773.c: Ditto.
>>         * testsuite/g++.dg/tree-ssa/dom-invalid.C: Ditto.
>>         * testsuite/gcc.dg/inline-dump.c: New test.
>>
>> Index: dumpfile.c
>> ===================================================================
>> --- dumpfile.c  (revision 201461)
>> +++ dumpfile.c  (working copy)
>> @@ -257,16 +257,18 @@ dump_open_alternate_stream (struct dump_file_info
>>  void
>>  dump_loc (int dump_kind, FILE *dfile, source_location loc)
>>  {
>> -  /* Currently vectorization passes print location information.  */
>>    if (dump_kind)
>>      {
>> +      /* Ensure dump message starts on a new line.  */
>> +      fprintf (dfile, "\n");
>>        if (LOCATION_LOCUS (loc) > BUILTINS_LOCATION)
>> -        fprintf (dfile, "\n%s:%d: note: ", LOCATION_FILE (loc),
>> -                 LOCATION_LINE (loc));
>> +        fprintf (dfile, "%s:%d:%d: note: ", LOCATION_FILE (loc),
>> +                 LOCATION_LINE (loc), LOCATION_COLUMN (loc));
>>        else if (current_function_decl)
>> -        fprintf (dfile, "\n%s:%d: note: ",
>> +        fprintf (dfile, "%s:%d:%d: note: ",
>>                   DECL_SOURCE_FILE (current_function_decl),
>> -                 DECL_SOURCE_LINE (current_function_decl));
>> +                 DECL_SOURCE_LINE (current_function_decl),
>> +                 DECL_SOURCE_COLUMN (current_function_decl));
>>      }
>>  }
>>
>> Index: dumpfile.h
>> ===================================================================
>> --- dumpfile.h  (revision 201461)
>> +++ dumpfile.h  (working copy)
>> @@ -97,8 +97,9 @@ enum tree_dump_index
>>  #define OPTGROUP_LOOP        (1 << 2)   /* Loop optimization passes */
>>  #define OPTGROUP_INLINE      (1 << 3)   /* Inlining passes */
>>  #define OPTGROUP_VEC         (1 << 4)   /* Vectorization passes */
>> +#define OPTGROUP_OTHER       (1 << 5)   /* All other passes */
>>  #define OPTGROUP_ALL        (OPTGROUP_IPA | OPTGROUP_LOOP | OPTGROUP_INLINE \
>> -                              | OPTGROUP_VEC)
>> +                              | OPTGROUP_VEC | OPTGROUP_OTHER)
>>
>>  /* Define a tree dump switch.  */
>>  struct dump_file_info
>> Index: ipa-inline-transform.c
>> ===================================================================
>> --- ipa-inline-transform.c      (revision 201461)
>> +++ ipa-inline-transform.c      (working copy)
>> @@ -192,6 +192,111 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d
>>  }
>>
>>
>> +#define MAX_INT_LENGTH 20
>> +
>> +/* Return NODE's name and profile count, if available.  */
>> +
>> +static const char *
>> +cgraph_node_opt_info (struct cgraph_node *node)
>> +{
>> +  char *buf;
>> +  size_t buf_size;
>> +  const char *bfd_name = lang_hooks.dwarf_name (node->symbol.decl, 0);
>> +
>> +  if (!bfd_name)
>> +    bfd_name = "unknown";
>> +
>> +  buf_size = strlen (bfd_name) + 1;
>> +  if (profile_info)
>> +    buf_size += (MAX_INT_LENGTH + 3);
>> +  buf_size += MAX_INT_LENGTH;
>> +
>> +  buf = (char *) xmalloc (buf_size);
>> +
>> +  strcpy (buf, bfd_name);
>> +  //sprintf (buf, "%s/%i", buf, node->symbol.order);
>> +  sprintf (buf, "%s [%i]", buf, node->symbol.order);
>> +
>> +  if (profile_info)
>> +    sprintf (buf, "%s ("HOST_WIDEST_INT_PRINT_DEC")", buf, node->count);
>> +  return buf;
>> +}
>> +
>> +
>> +/* Return CALLER's inlined call chain. Save the cgraph_node of the ultimate
>> +   function that the caller is inlined to in FINAL_CALLER.  */
>> +
>> +static const char *
>> +cgraph_node_call_chain (struct cgraph_node *caller,
>> +                       struct cgraph_node **final_caller)
>> +{
>> +  struct cgraph_node *node;
>> +  const char *via_str = " (via inline instance";
>> +  size_t current_string_len = strlen (via_str) + 1;
>> +  size_t buf_size = current_string_len;
>> +  char *buf = (char *) xmalloc (buf_size);
>> +
>> +  buf[0] = 0;
>> +  gcc_assert (caller->global.inlined_to != NULL);
>> +  strcat (buf, via_str);
>> +  for (node = caller; node->global.inlined_to != NULL;
>> +       node = node->callers->caller)
>> +    {
>> +      const char *name = cgraph_node_opt_info (node);
>> +      current_string_len += (strlen (name) + 1);
>> +      if (current_string_len >= buf_size)
>> +       {
>> +         buf_size = current_string_len * 2;
>> +         buf = (char *) xrealloc (buf, buf_size);
>> +       }
>> +      strcat (buf, " ");
>> +      strcat (buf, name);
>> +    }
>> +  strcat (buf, ")");
>> +  *final_caller = node;
>> +  return buf;
>> +}
>> +
>> +
>> +/* Dump the inline decision of EDGE.  */
>> +
>> +static void
>> +dump_inline_decision (struct cgraph_edge *edge, bool early)
>> +{
>> +  location_t locus;
>> +  const char *inline_chain_text;
>> +  const char *call_count_text;
>> +  struct cgraph_node *final_caller = edge->caller;
>> +
>> +  if (final_caller->global.inlined_to != NULL)
>> +    inline_chain_text = cgraph_node_call_chain (final_caller, &final_caller);
>> +  else
>> +    inline_chain_text = "";
>> +
>> +  if (edge->count > 0)
>> +    {
>> +      const char *call_count_str = " with call count ";
>> +      char *buf = (char *) xmalloc (strlen (call_count_str) + MAX_INT_LENGTH);
>> +      sprintf (buf, "%s"HOST_WIDEST_INT_PRINT_DEC, call_count_str,
>> +              edge->count);
>> +      call_count_text = buf;
>> +    }
>> +  else
>> +    {
>> +      call_count_text = "";
>> +    }
>> +
>> +  locus = gimple_location (edge->call_stmt);
>> +  dump_printf_loc (early ? MSG_NOTE : MSG_OPTIMIZED_LOCATIONS,
>> +                   locus,
>> +                   "%s inlined into %s%s%s\n",
>> +                   cgraph_node_opt_info (edge->callee),
>> +                   cgraph_node_opt_info (final_caller),
>> +                   call_count_text,
>> +                   inline_chain_text);
>> +}
>> +
>> +
>>  /* Mark edge E as inlined and update callgraph accordingly.  UPDATE_ORIGINAL
>>     specify whether profile of original function should be updated.  If any new
>>     indirect edges are discovered in the process, add them to NEW_EDGES, unless
>> @@ -205,7 +310,8 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d
>>  bool
>>  inline_call (struct cgraph_edge *e, bool update_original,
>>              vec<cgraph_edge_p> *new_edges,
>> -            int *overall_size, bool update_overall_summary)
>> +            int *overall_size, bool update_overall_summary,
>> +             bool early)
>>  {
>>    int old_size = 0, new_size = 0;
>>    struct cgraph_node *to = NULL;
>> @@ -218,6 +324,9 @@ inline_call (struct cgraph_edge *e, bool update_or
>>    bool predicated = inline_edge_summary (e)->predicate != NULL;
>>  #endif
>>
>> +  if (dump_enabled_p ())
>> +    dump_inline_decision (e, early);
>> +
>>    /* Don't inline inlined edges.  */
>>    gcc_assert (e->inline_failed);
>>    /* Don't even think of inlining inline clone.  */
>> Index: doc/invoke.texi
>> ===================================================================
>> --- doc/invoke.texi     (revision 201461)
>> +++ doc/invoke.texi     (working copy)
>> @@ -6234,6 +6234,9 @@ Enable dumps from all loop optimizations.
>>  Enable dumps from all inlining optimizations.
>>  @item vec
>>  Enable dumps from all vectorization optimizations.
>> +@item optall
>> +Enable dumps from all optimizations. This is a superset of
>> +the optimization groups listed above.
>>  @end table
>>
>>  For example,
>> Index: profile.c
>> ===================================================================
>> --- profile.c   (revision 201461)
>> +++ profile.c   (working copy)
>> @@ -432,8 +432,8 @@ read_profile_edge_counts (gcov_type *exec_counts)
>>                     if (flag_profile_correction)
>>                       {
>>                         static bool informed = 0;
>> -                       if (!informed)
>> -                         inform (input_location,
>> +                       if (dump_enabled_p () && !informed)
>> +                         dump_printf_loc (MSG_NOTE, input_location,
>>                                   "corrupted profile info: edge count exceeds maximal count");
>>                         informed = 1;
>>                       }
>> @@ -692,10 +692,11 @@ compute_branch_probabilities (unsigned cfg_checksu
>>         {
>>           /* Inconsistency detected. Make it flow-consistent. */
>>           static int informed = 0;
>> -         if (informed == 0)
>> +         if (dump_enabled_p () && informed == 0)
>>             {
>>               informed = 1;
>> -             inform (input_location, "correcting inconsistent profile data");
>> +             dump_printf_loc (MSG_NOTE, input_location,
>> +                              "correcting inconsistent profile data");
>>             }
>>           correct_negative_edge_counts ();
>>           /* Set bb counts to the sum of the outgoing edge counts */
>> Index: passes.c
>> ===================================================================
>> --- passes.c    (revision 201461)
>> +++ passes.c    (working copy)
>> @@ -524,6 +524,11 @@ pass_manager::register_one_dump_file (struct opt_p
>>    flag_name = concat (prefix, name, num, NULL);
>>    glob_name = concat (prefix, name, NULL);
>>    optgroup_flags |= pass->optinfo_flags;
>> +  /* For any passes that do not have an optgroup set, and which are not
>> +     IPA passes setup above, set the optgroup to OPTGROUP_OTHER so that
>> +     any dump messages are emitted properly under -fopt-info(-optall).  */
>> +  if (optgroup_flags == OPTGROUP_NONE)
>> +    optgroup_flags = OPTGROUP_OTHER;
>>    id = dump_register (dot_name, flag_name, glob_name, flags, optgroup_flags);
>>    set_pass_for_id (id, pass);
>>    full_name = concat (prefix, pass->name, num, NULL);
>> Index: value-prof.c
>> ===================================================================
>> --- value-prof.c        (revision 201461)
>> +++ value-prof.c        (working copy)
>> @@ -585,9 +585,11 @@ check_counter (gimple stmt, const char * name,
>>                : DECL_SOURCE_LOCATION (current_function_decl);
>>        if (flag_profile_correction)
>>          {
>> -         inform (locus, "correcting inconsistent value profile: "
>> -                 "%s profiler overall count (%d) does not match BB count "
>> -                  "(%d)", name, (int)*all, (int)bb_count);
>> +          if (dump_enabled_p ())
>> +            dump_printf_loc (MSG_MISSED_OPTIMIZATION, locus,
>> +                             "correcting inconsistent value profile: %s "
>> +                             "profiler overall count (%d) does not match BB "
>> +                             "count (%d)", name, (int)*all, (int)bb_count);
>>           *all = bb_count;
>>           if (*count > *all)
>>              *count = *all;
>> @@ -1209,9 +1211,11 @@ find_func_by_funcdef_no (int func_id)
>>    int max_id = get_last_funcdef_no ();
>>    if (func_id >= max_id || cgraph_node_map[func_id] == NULL)
>>      {
>> -      if (flag_profile_correction)
>> -        inform (DECL_SOURCE_LOCATION (current_function_decl),
>> -                "Inconsistent profile: indirect call target (%d) does not exist", func_id);
>> +      if (flag_profile_correction && dump_enabled_p ())
>> +        dump_printf_loc (MSG_MISSED_OPTIMIZATION,
>> +                         DECL_SOURCE_LOCATION (current_function_decl),
>> +                         "Inconsistent profile: indirect call target (%d) "
>> +                         "does not exist", func_id);
>>        else
>>          error ("Inconsistent profile: indirect call target (%d) does not exist", func_id);
>>
>> @@ -1235,8 +1239,10 @@ check_ic_target (gimple call_stmt, struct cgraph_n
>>       return true;
>>
>>     locus =  gimple_location (call_stmt);
>> -   inform (locus, "Skipping target %s with mismatching types for icall ",
>> -           cgraph_node_name (target));
>> +   if (dump_enabled_p ())
>> +     dump_printf_loc (MSG_MISSED_OPTIMIZATION, locus,
>> +                      "Skipping target %s with mismatching types for icall ",
>> +                      cgraph_node_name (target));
>>     return false;
>>  }
>>
>> Index: coverage.c
>> ===================================================================
>> --- coverage.c  (revision 201461)
>> +++ coverage.c  (working copy)
>> @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "langhooks.h"
>>  #include "hash-table.h"
>>  #include "tree-iterator.h"
>> +#include "tree-pass.h"
>>  #include "cgraph.h"
>>  #include "dumpfile.h"
>>  #include "diagnostic-core.h"
>> @@ -341,11 +342,13 @@ get_coverage_counts (unsigned counter, unsigned ex
>>      {
>>        static int warned = 0;
>>
>> -      if (!warned++)
>> -       inform (input_location, (flag_guess_branch_prob
>> -                ? "file %s not found, execution counts estimated"
>> -                : "file %s not found, execution counts assumed to be zero"),
>> -               da_file_name);
>> +      if (!warned++ && dump_enabled_p ())
>> +       dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>> +                         (flag_guess_branch_prob
>> +                          ? "file %s not found, execution counts estimated"
>> +                          : "file %s not found, execution counts assumed to "
>> +                            "be zero"),
>> +                         da_file_name);
>>        return NULL;
>>      }
>>
>> @@ -369,21 +372,25 @@ get_coverage_counts (unsigned counter, unsigned ex
>>         warning_at (input_location, OPT_Wcoverage_mismatch,
>>                     "the control flow of function %qE does not match "
>>                     "its profile data (counter %qs)", id, ctr_names[counter]);
>> -      if (warning_printed)
>> +      if (warning_printed && dump_enabled_p ())
>>         {
>> -        inform (input_location, "use -Wno-error=coverage-mismatch to tolerate "
>> -                "the mismatch but performance may drop if the function is hot");
>> +          dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>> +                           "use -Wno-error=coverage-mismatch to tolerate "
>> +                           "the mismatch but performance may drop if the "
>> +                           "function is hot");
>>
>>           if (!seen_error ()
>>               && !warned++)
>>             {
>> -             inform (input_location, "coverage mismatch ignored");
>> -             inform (input_location, flag_guess_branch_prob
>> -                     ? G_("execution counts estimated")
>> -                     : G_("execution counts assumed to be zero"));
>> +             dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>> +                               "coverage mismatch ignored");
>> +             dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>> +                               flag_guess_branch_prob
>> +                               ? G_("execution counts estimated")
>> +                               : G_("execution counts assumed to be zero"));
>>               if (!flag_guess_branch_prob)
>> -               inform (input_location,
>> -                       "this can result in poorly optimized code");
>> +               dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>> +                                 "this can result in poorly optimized code");
>>             }
>>         }
>>
>> @@ -1103,6 +1110,11 @@ coverage_init (const char *filename)
>>    int len = strlen (filename);
>>    int prefix_len = 0;
>>
>> +  /* Since coverage_init is invoked very early, before the pass
>> +     manager, we need to set up the dumping explicitly. This is
>> +     similar to the handling in finish_optimization_passes.  */
>> +  dump_start (pass_profile.pass.static_pass_number, NULL);
>> +
>>    if (!profile_data_prefix && !IS_ABSOLUTE_PATH (filename))
>>      profile_data_prefix = getpwd ();
>>
>> @@ -1145,6 +1157,8 @@ coverage_init (const char *filename)
>>           gcov_write_unsigned (bbg_file_stamp);
>>         }
>>      }
>> +
>> +  dump_finish (pass_profile.pass.static_pass_number);
>>  }
>>
>>  /* Performs file-level cleanup.  Close notes file, generate coverage
>> Index: ipa-inline.c
>> ===================================================================
>> --- ipa-inline.c        (revision 201461)
>> +++ ipa-inline.c        (working copy)
>> @@ -1322,7 +1322,7 @@ recursive_inlining (struct cgraph_edge *edge,
>>            reset_edge_growth_cache (curr);
>>         }
>>
>> -      inline_call (curr, false, new_edges, &overall_size, true);
>> +      inline_call (curr, false, new_edges, &overall_size, true, false);
>>        lookup_recursive_calls (node, curr->callee, heap);
>>        n++;
>>      }
>> @@ -1612,7 +1612,8 @@ inline_small_functions (void)
>>             fprintf (dump_file, " Peeling recursion with depth %i\n", depth);
>>
>>           gcc_checking_assert (!callee->global.inlined_to);
>> -         inline_call (edge, true, &new_indirect_edges, &overall_size, true);
>> +         inline_call (edge, true, &new_indirect_edges, &overall_size, true,
>> +                       false);
>>           if (flag_indirect_inlining)
>>             add_new_edges_to_heap (edge_heap, new_indirect_edges);
>>
>> @@ -1733,7 +1734,7 @@ flatten_function (struct cgraph_node *node, bool e
>>                  xstrdup (cgraph_node_name (callee)),
>>                  xstrdup (cgraph_node_name (e->caller)));
>>        orig_callee = callee;
>> -      inline_call (e, true, NULL, NULL, false);
>> +      inline_call (e, true, NULL, NULL, false, early);
>>        if (e->callee != orig_callee)
>>         orig_callee->symbol.aux = (void *) node;
>>        flatten_function (e->callee, early);
>> @@ -1852,7 +1853,8 @@ ipa_inline (void)
>>                                    inline_summary (node->callers->caller)->size);
>>                         }
>>
>> -                     inline_call (node->callers, true, NULL, NULL, true);
>> +                     inline_call (node->callers, true, NULL, NULL, true,
>> +                                   false);
>>                       if (dump_file)
>>                         fprintf (dump_file,
>>                                  " Inlined into %s which now has %i size\n",
>> @@ -1925,7 +1927,7 @@ inline_always_inline_functions (struct cgraph_node
>>         fprintf (dump_file, "  Inlining %s into %s (always_inline).\n",
>>                  xstrdup (cgraph_node_name (e->callee)),
>>                  xstrdup (cgraph_node_name (e->caller)));
>> -      inline_call (e, true, NULL, NULL, false);
>> +      inline_call (e, true, NULL, NULL, false, true);
>>        inlined = true;
>>      }
>>    if (inlined)
>> @@ -1977,7 +1979,7 @@ early_inline_small_functions (struct cgraph_node *
>>         fprintf (dump_file, " Inlining %s into %s.\n",
>>                  xstrdup (cgraph_node_name (callee)),
>>                  xstrdup (cgraph_node_name (e->caller)));
>> -      inline_call (e, true, NULL, NULL, true);
>> +      inline_call (e, true, NULL, NULL, true, true);
>>        inlined = true;
>>      }
>>
>> Index: ipa-inline.h
>> ===================================================================
>> --- ipa-inline.h        (revision 201461)
>> +++ ipa-inline.h        (working copy)
>> @@ -228,7 +228,8 @@ void free_growth_caches (void);
>>  void compute_inline_parameters (struct cgraph_node *, bool);
>>
>>  /* In ipa-inline-transform.c  */
>> -bool inline_call (struct cgraph_edge *, bool, vec<cgraph_edge_p> *, int *, bool);
>> +bool inline_call (struct cgraph_edge *, bool, vec<cgraph_edge_p> *, int *,
>> +                  bool, bool);
>>  unsigned int inline_transform (struct cgraph_node *);
>>  void clone_inlined_nodes (struct cgraph_edge *e, bool, bool, int *);
>>
>> Index: testsuite/gcc.dg/pr40209.c
>> ===================================================================
>> --- testsuite/gcc.dg/pr40209.c  (revision 201461)
>> +++ testsuite/gcc.dg/pr40209.c  (working copy)
>> @@ -1,5 +1,5 @@
>>  /* { dg-do compile } */
>> -/* { dg-options "-O2 -fprofile-use" } */
>> +/* { dg-options "-O2 -fprofile-use -fopt-info" } */
>>
>>  void process(const char *s);
>>
>> Index: testsuite/gcc.dg/pr26570.c
>> ===================================================================
>> --- testsuite/gcc.dg/pr26570.c  (revision 201461)
>> +++ testsuite/gcc.dg/pr26570.c  (working copy)
>> @@ -1,5 +1,5 @@
>>  /* { dg-do compile } */
>> -/* { dg-options "-O2 -fprofile-generate -fprofile-use" } */
>> +/* { dg-options "-O2 -fprofile-generate -fprofile-use -fopt-info" } */
>>
>>  unsigned test (unsigned a, unsigned b)
>>  {
>> Index: testsuite/gcc.dg/pr32773.c
>> ===================================================================
>> --- testsuite/gcc.dg/pr32773.c  (revision 201461)
>> +++ testsuite/gcc.dg/pr32773.c  (working copy)
>> @@ -1,6 +1,6 @@
>>  /* { dg-do compile } */
>> -/* { dg-options "-O -fprofile-use" } */
>> -/* { dg-options "-O -m4 -fprofile-use" { target sh-*-* } } */
>> +/* { dg-options "-O -fprofile-use -fopt-info" } */
>> +/* { dg-options "-O -m4 -fprofile-use -fopt-info" { target sh-*-* } } */
>>
>>  void foo (int *p)
>>  {
>> Index: testsuite/g++.dg/tree-ssa/dom-invalid.C
>> ===================================================================
>> --- testsuite/g++.dg/tree-ssa/dom-invalid.C     (revision 201461)
>> +++ testsuite/g++.dg/tree-ssa/dom-invalid.C     (working copy)
>> @@ -1,7 +1,7 @@
>>  // PR tree-optimization/39557
>>  // invalid post-dom info leads to infinite loop
>>  // { dg-do run }
>> -// { dg-options "-Wall -fno-exceptions -O2 -fprofile-use -fno-rtti" }
>> +// { dg-options "-Wall -fno-exceptions -O2 -fprofile-use -fopt-info -fno-rtti" }
>>
>>  struct C
>>  {
>> Index: testsuite/gcc.dg/inline-dump.c
>> ===================================================================
>> --- testsuite/gcc.dg/inline-dump.c      (revision 0)
>> +++ testsuite/gcc.dg/inline-dump.c      (revision 0)
>> @@ -0,0 +1,11 @@
>> +/* Verify that -fopt-info can output correct inline info.  */
>> +/* { dg-do compile } */
>> +/* { dg-options "-Wall -fopt-info-inline=stderr -O2 -fno-early-inlining" } */
>> +static inline int leaf() {
>> +  int i, ret = 0;
>> +  for (i = 0; i < 10; i++)
>> +    ret += i;
>> +  return ret;
>> +}
>> +static inline int foo(void) { return leaf(); } /* { dg-message "note: leaf .*inlined into bar .*via inline instance foo.*\n" } */
>> +int bar(void) { return foo(); }
>>>
>>> Thanks,
>>> Teresa
>>>
>>>>
>>>> Thanks,
>>>>
>>>> Martin
>>>
>>>
>>>
>>> --
>>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
Teresa Johnson Aug. 27, 2013, 5:56 p.m. UTC | #3
Ping #3.

Thanks,
Teresa

On Mon, Aug 19, 2013 at 11:33 AM, Teresa Johnson <tejohnson@google.com> wrote:
> Ping.
> Thanks,
> Teresa
>
> On Mon, Aug 12, 2013 at 6:54 AM, Teresa Johnson <tejohnson@google.com> wrote:
>> On Tue, Aug 6, 2013 at 10:23 PM, Teresa Johnson <tejohnson@google.com> wrote:
>>> On Tue, Aug 6, 2013 at 9:29 AM, Teresa Johnson <tejohnson@google.com> wrote:
>>>> On Tue, Aug 6, 2013 at 9:01 AM, Martin Jambor <mjambor@suse.cz> wrote:
>>>>> Hi,
>>>>>
>>>>> On Tue, Aug 06, 2013 at 07:14:42AM -0700, Teresa Johnson wrote:
>>>>>> On Tue, Aug 6, 2013 at 5:37 AM, Martin Jambor <mjambor@suse.cz> wrote:
>>>>>> > On Mon, Aug 05, 2013 at 10:37:00PM -0700, Teresa Johnson wrote:
>>>>>> >> This patch ports messages to the new dump framework,
>>>>>> >
>>>>>> > It would be great if this new framework was documented somewhere.  I lost
>>>>>> > track of what was agreed it would be and from the uses in the
>>>>>> > vectorizer I was never quite sure how to utilize it in other passes.
>>>>>>
>>>>>> Cc'ing Sharad who implemented this - Sharad, is this documented on a
>>>>>> wiki or elsewhere?
>>>>>
>>>>> Thanks
>>>>>
>>>>>>
>>>>>> >
>>>>>> > I'd also like to point out two other minor things inline:
>>>>>> >
>>>>>> > [...]
>>>>>> >
>>>>>> >> 2013-08-06  Teresa Johnson  <tejohnson@google.com>
>>>>>> >>             Dehao Chen  <dehao@google.com>
>>>>>> >>
>>>>>> >>         * dumpfile.c (dump_loc): Add column number to output, make newlines
>>>>>> >>         consistent.
>>>>>> >>         * dumpfile.h (OPTGROUP_OTHER): Add and enable under OPTGROUP_ALL.
>>>>>> >>         * ipa-inline-transform.c (clone_inlined_nodes):
>>>>>> >>         (cgraph_node_opt_info): New function.
>>>>>> >>         (cgraph_node_call_chain): Ditto.
>>>>>> >>         (dump_inline_decision): Ditto.
>>>>>> >>         (inline_call): Invoke dump_inline_decision.
>>>>>> >>         * doc/invoke.texi: Document optall -fopt-info flag.
>>>>>> >>         * profile.c (read_profile_edge_counts): Use new dump framework.
>>>>>> >>         (compute_branch_probabilities): Ditto.
>>>>>> >>         * passes.c (pass_manager::register_one_dump_file): Use OPTGROUP_OTHER
>>>>>> >>         when pass not in any opt group.
>>>>>> >>         * value-prof.c (check_counter): Use new dump framework.
>>>>>> >>         (find_func_by_funcdef_no): Ditto.
>>>>>> >>         (check_ic_target): Ditto.
>>>>>> >>         * coverage.c (get_coverage_counts): Ditto.
>>>>>> >>         (coverage_init): Setup new dump framework.
>>>>>> >>         * ipa-inline.c (inline_small_functions): Set is_in_ipa_inline.
>>>>>> >>         * ipa-inline.h (is_in_ipa_inline): Declare.
>>>>>> >>
>>>>>> >>         * testsuite/gcc.dg/pr40209.c: Use -fopt-info.
>>>>>> >>         * testsuite/gcc.dg/pr26570.c: Ditto.
>>>>>> >>         * testsuite/gcc.dg/pr32773.c: Ditto.
>>>>>> >>         * testsuite/g++.dg/tree-ssa/dom-invalid.C (struct C): Ditto.
>>>>>> >>
>>>>>> >
>>>>>> > [...]
>>>>>> >
>>>>>> >> Index: ipa-inline-transform.c
>>>>>> >> ===================================================================
>>>>>> >> --- ipa-inline-transform.c      (revision 201461)
>>>>>> >> +++ ipa-inline-transform.c      (working copy)
>>>>>> >> @@ -192,6 +192,108 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d
>>>>>> >>  }
>>>>>> >>
>>>>>> >>
>>>>>> >> +#define MAX_INT_LENGTH 20
>>>>>> >> +
>>>>>> >> +/* Return NODE's name and profile count, if available.  */
>>>>>> >> +
>>>>>> >> +static const char *
>>>>>> >> +cgraph_node_opt_info (struct cgraph_node *node)
>>>>>> >> +{
>>>>>> >> +  char *buf;
>>>>>> >> +  size_t buf_size;
>>>>>> >> +  const char *bfd_name = lang_hooks.dwarf_name (node->symbol.decl, 0);
>>>>>> >> +
>>>>>> >> +  if (!bfd_name)
>>>>>> >> +    bfd_name = "unknown";
>>>>>> >> +
>>>>>> >> +  buf_size = strlen (bfd_name) + 1;
>>>>>> >> +  if (profile_info)
>>>>>> >> +    buf_size += (MAX_INT_LENGTH + 3);
>>>>>> >> +
>>>>>> >> +  buf = (char *) xmalloc (buf_size);
>>>>>> >> +
>>>>>> >> +  strcpy (buf, bfd_name);
>>>>>> >> +
>>>>>> >> +  if (profile_info)
>>>>>> >> +    sprintf (buf, "%s ("HOST_WIDEST_INT_PRINT_DEC")", buf, node->count);
>>>>>> >> +  return buf;
>>>>>> >> +}
>>>>>> >
>>>>>> > I'm not sure if output of this function is aimed only at the user or
>>>>>> > if it is supposed to be used by gcc developers as well.  If the
>>>>>> > latter, an incredibly useful thing is to also dump node->symbol.order
>>>>>> > too.  We usually dump it after "/" sign separating it from node name.
>>>>>> > It is invaluable when examining decisions in C++ code where you can
>>>>>> > have lots of clones of a node (and also because existing dumps print
>>>>>> > it, it is easy to combine them).
>>>>>>
>>>>>> The output is useful both for power users doing performance tuning of
>>>>>> their application and for gcc developers. Adding the id is not so
>>>>>> useful for the former, but I agree that it is very useful for compiler
>>>>>> developers. In fact, in the google branch version we emit more verbose
>>>>>> information (the lipo module id and the funcdef_no) to help uniquely
>>>>>> identify the routines and to aid in post-processing by humans and
>>>>>> tools. So it is probably useful to add something similar here too. Is
>>>>>> the node->symbol.order more or less unique than the funcdef_no? I see
>>>>>> that you added a patch a few months ago to print the
>>>>>> node->symbol.order in the function header, and it also has the
>>>>>> advantage as you note of matching up with existing ipa dumps.
>>>>>
>>>>> node->symbol.order is unique and if I remember correctly, it is not
>>>>> even recycled.  Clones, inline clones, thunks, every symbol table node
>>>>> gets its own symbol order so it should be more unique than funcdef_no.
>>>>> On the other hand it may be a bit cryptic for users but at the same
>>>>> time it is only one number.
>>>>
>>>> Ok, I am going to go ahead and add this to the output.
>>>>
>>>>>
>>>>>>
>>>>>> >
>>>>>> > [...]
>>>>>> >
>>>>>> >> Index: ipa-inline.c
>>>>>> >> ===================================================================
>>>>>> >> --- ipa-inline.c        (revision 201461)
>>>>>> >> +++ ipa-inline.c        (working copy)
>>>>>> >> @@ -118,6 +118,9 @@ along with GCC; see the file COPYING3.  If not see
>>>>>> >>  static int overall_size;
>>>>>> >>  static gcov_type max_count;
>>>>>> >>
>>>>>> >> +/* Global variable to denote if it is in ipa-inline pass. */
>>>>>> >> +bool is_in_ipa_inline = false;
>>>>>> >> +
>>>>>> >>  /* Return false when inlining edge E would lead to violating
>>>>>> >>     limits on function unit growth or stack usage growth.
>>>>>> >>
>>>>>> >
>>>>>> > In this age of removing global variables, are you sure you need this?
>>>>>> > The only user of this seems to be a function that is only being called
>>>>>> > from inline_call... can that ever happen when not inlining?  If you
>>>>>> > plan to use this function also elsewhere, perhaps the callers will
>>>>>> > know whether we are inlining or not and can provide this in a
>>>>>> > parameter?
>>>>>>
>>>>>> This is to distinguish early inlining from ipa inlining.
>>>>>
>>>>> Oh, right, I did not realize that the IPA part was the important bit
>>>>> of the name.
>>>>>
>>>>>> The volume of
>>>>>> early inlining messages is too high to be on for the default setting
>>>>>> of -fopt-info, and they are usually not as interesting for performance
>>>>>> tuning. The dumper will only emit the early inline messages under a
>>>>>> more verbose setting (MSG_NOTE):
>>>>>>       dump_printf_loc (is_in_ipa_inline ? MSG_OPTIMIZED_LOCATIONS : MSG_NOTE ...
>>>>>> The other way I can see to distinguish this would be to check the
>>>>>> always_inline_functions_inlined flag on the caller's function. It
>>>>>> could also be possible to pass down a flag from the callers of
>>>>>> inline_call, but at least one caller (flatten_functions) is shared
>>>>>> between early and late inlining, so the flag needs to be passed
>>>>>> through that as well. WDYT?
>>>>>
>>>>> Did you mean flatten_function?  It already has a bool "early"
>>>>> parameter.  But I can see that being able to quickly figure out
>>>>> whether we are in early inliner or ipa inliner without much hassle is
>>>>> useful enough to justify a global variable a month ago, however I
>>>>> suppose we should not be introducing them now and so you'd have to put
>>>>> such stuff into... well, you'd probably have to put into the universe
>>>>> object somewhere because it is basically shared between two passes.
>>>>> Another option, even though somewhat hackish, would be to look at
>>>>> current_pass and see which pass it is.  I don't know, do what is
>>>>> easier or what you like more, just be aware of the problem.
>>>>
>>>> After thinking about this some more, I think passing down an early
>>>> flag from callers is the cleanest way to go.
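
Passing an early flag down from the callers, rather than consulting a global, can be sketched as below. This is a minimal illustration under stated assumptions: `KIND_OPTIMIZED`, `KIND_NOTE`, `inline_msg_kind`, and `report_inline` are hypothetical stand-ins, not the dump framework's real names or signatures. The caller states whether it is the early inliner, and the reporting helper picks the message kind from that argument, so early-inlining messages appear only when the more verbose kind is enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical message kinds, modelled as bit flags.  */
enum msg_kind
{
  KIND_OPTIMIZED = 1 << 0,  /* shown at the default verbosity */
  KIND_NOTE = 1 << 1        /* shown only at a more verbose setting */
};

/* Early-inliner decisions are demoted to the verbose kind.  */
enum msg_kind
inline_msg_kind (bool early)
{
  return early ? KIND_NOTE : KIND_OPTIMIZED;
}

/* Format an inline decision into OUT if its kind is in ENABLED_KINDS.
   Returns true when a message was written, false when it was filtered.  */
bool
report_inline (char *out, size_t out_size, unsigned enabled_kinds,
               bool early, const char *callee, const char *caller)
{
  assert (out != NULL && out_size > 0);
  out[0] = '\0';
  if ((enabled_kinds & (unsigned) inline_msg_kind (early)) == 0)
    return false;
  snprintf (out, out_size, "%s inlined into %s", callee, caller);
  return true;
}
```

With only `KIND_OPTIMIZED` enabled, a call with `early` set to true is filtered out, while the same call with `early` set to false produces the message. No shared state is needed to tell the two inliners apart.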
>>>>
>>>> I'll fix these and post a new patch later today.
>>>
>>> New patch below that removes this global variable, and also outputs
>>> the node->symbol.order (in square brackets after the function name so
>>> as to not clutter it). Inline messages with profile data look like:
>>>
>>> test.c:8:3: note: foobar [0] (99999000) inlined into foo [2] (1000)
>>> with call count 99999000 (via inline instance bar [3] (99999000))
>>>
>>> (without FDO the counts in parentheses and the call count would not be
>>> included).
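
The label format described above (name, symbol order in square brackets, then the profile count in parentheses when profile data is present) can be sketched in plain C as follows. This is a minimal illustration, not the patch's code: `struct node_label_info` and `format_node_label` are hypothetical stand-ins for the cgraph node fields. The buffer is sized with `snprintf`, so the destination is never also passed as a source argument, which the C standard leaves undefined for `sprintf`.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the few cgraph node fields the label needs.  */
struct node_label_info
{
  const char *name;   /* NULL when no name is available */
  int order;          /* plays the role of node->symbol.order */
  long long count;    /* plays the role of node->count */
};

/* Return a malloc'd "name [order]" label, with " (count)" appended when
   HAVE_PROFILE is nonzero.  Returns NULL on failure; the caller frees.  */
char *
format_node_label (const struct node_label_info *n, int have_profile)
{
  const char *name;
  char *buf;
  int len;

  assert (n != NULL);
  name = n->name ? n->name : "unknown";

  /* First pass: ask snprintf how many characters the label needs.  */
  if (have_profile)
    len = snprintf (NULL, 0, "%s [%d] (%lld)", name, n->order, n->count);
  else
    len = snprintf (NULL, 0, "%s [%d]", name, n->order);
  if (len < 0)
    return NULL;

  buf = (char *) malloc ((size_t) len + 1);
  if (buf == NULL)
    return NULL;

  /* Second pass: write into the exactly sized buffer.  */
  if (have_profile)
    snprintf (buf, (size_t) len + 1, "%s [%d] (%lld)",
              name, n->order, n->count);
  else
    snprintf (buf, (size_t) len + 1, "%s [%d]", name, n->order);
  return buf;
}
```

For a node named `foobar` with order 0 and count 99999000, the profile variant yields `foobar [0] (99999000)`, the same shape as the labels in the example message. Measuring first and allocating second also removes the need for a fixed maximum integer length.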
>>>
>>> Ok for trunk?
>>
>> Ping.
>> Teresa
>>
>>> Thanks,
>>> Teresa
>>>
>>> 2013-08-06  Teresa Johnson  <tejohnson@google.com>
>>>             Dehao Chen  <dehao@google.com>
>>>
>>>         * dumpfile.c (dump_loc): Output column number, make newlines consistent.
>>>         * dumpfile.h (OPTGROUP_OTHER): Add and enable under OPTGROUP_ALL.
>>>         * ipa-inline-transform.c (cgraph_node_opt_info): New function.
>>>         (cgraph_node_call_chain): Ditto.
>>>         (dump_inline_decision): Ditto.
>>>         (inline_call): Invoke dump_inline_decision, new parameter.
>>>         * doc/invoke.texi: Document optall -fopt-info flag.
>>>         * profile.c (read_profile_edge_counts): Use new dump framework.
>>>         (compute_branch_probabilities): Ditto.
>>>         * passes.c (pass_manager::register_one_dump_file): Use OPTGROUP_OTHER
>>>         when pass not in any opt group.
>>>         * value-prof.c (check_counter): Use new dump framework.
>>>         (find_func_by_funcdef_no): Ditto.
>>>         (check_ic_target): Ditto.
>>>         * coverage.c (get_coverage_counts): Ditto.
>>>         (coverage_init): Setup new dump framework.
>>>         * ipa-inline.c (recursive_inlining): New inline_call parameter.
>>>         (inline_small_functions): Ditto.
>>>         (flatten_function): Ditto.
>>>         (ipa_inline): Ditto.
>>>         (inline_always_inline_functions): Ditto.
>>>         (early_inline_small_functions): Ditto.
>>>         * ipa-inline.h: Ditto.
>>>
>>>         * testsuite/gcc.dg/pr40209.c: Use -fopt-info.
>>>         * testsuite/gcc.dg/pr26570.c: Ditto.
>>>         * testsuite/gcc.dg/pr32773.c: Ditto.
>>>         * testsuite/g++.dg/tree-ssa/dom-invalid.C: Ditto.
>>>         * testsuite/gcc.dg/inline-dump.c: New test.
>>>
>>> Index: dumpfile.c
>>> ===================================================================
>>> --- dumpfile.c  (revision 201461)
>>> +++ dumpfile.c  (working copy)
>>> @@ -257,16 +257,18 @@ dump_open_alternate_stream (struct dump_file_info
>>>  void
>>>  dump_loc (int dump_kind, FILE *dfile, source_location loc)
>>>  {
>>> -  /* Currently vectorization passes print location information.  */
>>>    if (dump_kind)
>>>      {
>>> +      /* Ensure dump message starts on a new line.  */
>>> +      fprintf (dfile, "\n");
>>>        if (LOCATION_LOCUS (loc) > BUILTINS_LOCATION)
>>> -        fprintf (dfile, "\n%s:%d: note: ", LOCATION_FILE (loc),
>>> -                 LOCATION_LINE (loc));
>>> +        fprintf (dfile, "%s:%d:%d: note: ", LOCATION_FILE (loc),
>>> +                 LOCATION_LINE (loc), LOCATION_COLUMN (loc));
>>>        else if (current_function_decl)
>>> -        fprintf (dfile, "\n%s:%d: note: ",
>>> +        fprintf (dfile, "%s:%d:%d: note: ",
>>>                   DECL_SOURCE_FILE (current_function_decl),
>>> -                 DECL_SOURCE_LINE (current_function_decl));
>>> +                 DECL_SOURCE_LINE (current_function_decl),
>>> +                 DECL_SOURCE_COLUMN (current_function_decl));
>>>      }
>>>  }
>>>
>>> Index: dumpfile.h
>>> ===================================================================
>>> --- dumpfile.h  (revision 201461)
>>> +++ dumpfile.h  (working copy)
>>> @@ -97,8 +97,9 @@ enum tree_dump_index
>>>  #define OPTGROUP_LOOP        (1 << 2)   /* Loop optimization passes */
>>>  #define OPTGROUP_INLINE      (1 << 3)   /* Inlining passes */
>>>  #define OPTGROUP_VEC         (1 << 4)   /* Vectorization passes */
>>> +#define OPTGROUP_OTHER       (1 << 5)   /* All other passes */
>>>  #define OPTGROUP_ALL        (OPTGROUP_IPA | OPTGROUP_LOOP | OPTGROUP_INLINE \
>>> -                              | OPTGROUP_VEC)
>>> +                              | OPTGROUP_VEC | OPTGROUP_OTHER)
>>>
>>>  /* Define a tree dump switch.  */
>>>  struct dump_file_info
>>> Index: ipa-inline-transform.c
>>> ===================================================================
>>> --- ipa-inline-transform.c      (revision 201461)
>>> +++ ipa-inline-transform.c      (working copy)
>>> @@ -192,6 +192,111 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d
>>>  }
>>>
>>>
>>> +#define MAX_INT_LENGTH 20
>>> +
>>> +/* Return NODE's name and profile count, if available.  */
>>> +
>>> +static const char *
>>> +cgraph_node_opt_info (struct cgraph_node *node)
>>> +{
>>> +  char *buf;
>>> +  size_t buf_size;
>>> +  const char *bfd_name = lang_hooks.dwarf_name (node->symbol.decl, 0);
>>> +
>>> +  if (!bfd_name)
>>> +    bfd_name = "unknown";
>>> +
>>> +  buf_size = strlen (bfd_name) + 1;
>>> +  if (profile_info)
>>> +    buf_size += (MAX_INT_LENGTH + 3);
>>> +  buf_size += MAX_INT_LENGTH;
>>> +
>>> +  buf = (char *) xmalloc (buf_size);
>>> +
>>> +  strcpy (buf, bfd_name);
>>> +  //sprintf (buf, "%s/%i", buf, node->symbol.order);
>>> +  sprintf (buf, "%s [%i]", buf, node->symbol.order);
>>> +
>>> +  if (profile_info)
>>> +    sprintf (buf, "%s ("HOST_WIDEST_INT_PRINT_DEC")", buf, node->count);
>>> +  return buf;
>>> +}
>>> +
>>> +
>>> +/* Return CALLER's inlined call chain. Save the cgraph_node of the ultimate
>>> +   function that the caller is inlined to in FINAL_CALLER.  */
>>> +
>>> +static const char *
>>> +cgraph_node_call_chain (struct cgraph_node *caller,
>>> +                       struct cgraph_node **final_caller)
>>> +{
>>> +  struct cgraph_node *node;
>>> +  const char *via_str = " (via inline instance";
>>> +  size_t current_string_len = strlen (via_str) + 1;
>>> +  size_t buf_size = current_string_len;
>>> +  char *buf = (char *) xmalloc (buf_size);
>>> +
>>> +  buf[0] = 0;
>>> +  gcc_assert (caller->global.inlined_to != NULL);
>>> +  strcat (buf, via_str);
>>> +  for (node = caller; node->global.inlined_to != NULL;
>>> +       node = node->callers->caller)
>>> +    {
>>> +      const char *name = cgraph_node_opt_info (node);
>>> +      current_string_len += (strlen (name) + 1);
>>> +      if (current_string_len >= buf_size)
>>> +       {
>>> +         buf_size = current_string_len * 2;
>>> +         buf = (char *) xrealloc (buf, buf_size);
>>> +       }
>>> +      strcat (buf, " ");
>>> +      strcat (buf, name);
>>> +    }
>>> +  strcat (buf, ")");
>>> +  *final_caller = node;
>>> +  return buf;
>>> +}
>>> +
>>> +
>>> +/* Dump the inline decision of EDGE.  */
>>> +
>>> +static void
>>> +dump_inline_decision (struct cgraph_edge *edge, bool early)
>>> +{
>>> +  location_t locus;
>>> +  const char *inline_chain_text;
>>> +  const char *call_count_text;
>>> +  struct cgraph_node *final_caller = edge->caller;
>>> +
>>> +  if (final_caller->global.inlined_to != NULL)
>>> +    inline_chain_text = cgraph_node_call_chain (final_caller, &final_caller);
>>> +  else
>>> +    inline_chain_text = "";
>>> +
>>> +  if (edge->count > 0)
>>> +    {
>>> +      const char *call_count_str = " with call count ";
>>> +      char *buf = (char *) xmalloc (strlen (call_count_str) + MAX_INT_LENGTH);
>>> +      sprintf (buf, "%s"HOST_WIDEST_INT_PRINT_DEC, call_count_str,
>>> +              edge->count);
>>> +      call_count_text = buf;
>>> +    }
>>> +  else
>>> +    {
>>> +      call_count_text = "";
>>> +    }
>>> +
>>> +  locus = gimple_location (edge->call_stmt);
>>> +  dump_printf_loc (early ? MSG_NOTE : MSG_OPTIMIZED_LOCATIONS,
>>> +                   locus,
>>> +                   "%s inlined into %s%s%s\n",
>>> +                   cgraph_node_opt_info (edge->callee),
>>> +                   cgraph_node_opt_info (final_caller),
>>> +                   call_count_text,
>>> +                   inline_chain_text);
>>> +}
>>> +
>>> +
>>>  /* Mark edge E as inlined and update callgraph accordingly.  UPDATE_ORIGINAL
>>>     specify whether profile of original function should be updated.  If any new
>>>     indirect edges are discovered in the process, add them to NEW_EDGES, unless
>>> @@ -205,7 +310,8 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d
>>>  bool
>>>  inline_call (struct cgraph_edge *e, bool update_original,
>>>              vec<cgraph_edge_p> *new_edges,
>>> -            int *overall_size, bool update_overall_summary)
>>> +            int *overall_size, bool update_overall_summary,
>>> +             bool early)
>>>  {
>>>    int old_size = 0, new_size = 0;
>>>    struct cgraph_node *to = NULL;
>>> @@ -218,6 +324,9 @@ inline_call (struct cgraph_edge *e, bool update_or
>>>    bool predicated = inline_edge_summary (e)->predicate != NULL;
>>>  #endif
>>>
>>> +  if (dump_enabled_p ())
>>> +    dump_inline_decision (e, early);
>>> +
>>>    /* Don't inline inlined edges.  */
>>>    gcc_assert (e->inline_failed);
>>>    /* Don't even think of inlining inline clone.  */
>>> Index: doc/invoke.texi
>>> ===================================================================
>>> --- doc/invoke.texi     (revision 201461)
>>> +++ doc/invoke.texi     (working copy)
>>> @@ -6234,6 +6234,9 @@ Enable dumps from all loop optimizations.
>>>  Enable dumps from all inlining optimizations.
>>>  @item vec
>>>  Enable dumps from all vectorization optimizations.
>>> +@item optall
>>> +Enable dumps from all optimizations. This is a superset of
>>> +the optimization groups listed above.
>>>  @end table
>>>
>>>  For example,
>>> Index: profile.c
>>> ===================================================================
>>> --- profile.c   (revision 201461)
>>> +++ profile.c   (working copy)
>>> @@ -432,8 +432,8 @@ read_profile_edge_counts (gcov_type *exec_counts)
>>>                     if (flag_profile_correction)
>>>                       {
>>>                         static bool informed = 0;
>>> -                       if (!informed)
>>> -                         inform (input_location,
>>> +                       if (dump_enabled_p () && !informed)
>>> +                         dump_printf_loc (MSG_NOTE, input_location,
>>>                                   "corrupted profile info: edge count exceeds maximal count");
>>>                         informed = 1;
>>>                       }
>>> @@ -692,10 +692,11 @@ compute_branch_probabilities (unsigned cfg_checksu
>>>         {
>>>           /* Inconsistency detected. Make it flow-consistent. */
>>>           static int informed = 0;
>>> -         if (informed == 0)
>>> +         if (dump_enabled_p () && informed == 0)
>>>             {
>>>               informed = 1;
>>> -             inform (input_location, "correcting inconsistent profile data");
>>> +             dump_printf_loc (MSG_NOTE, input_location,
>>> +                              "correcting inconsistent profile data");
>>>             }
>>>           correct_negative_edge_counts ();
>>>           /* Set bb counts to the sum of the outgoing edge counts */
>>> Index: passes.c
>>> ===================================================================
>>> --- passes.c    (revision 201461)
>>> +++ passes.c    (working copy)
>>> @@ -524,6 +524,11 @@ pass_manager::register_one_dump_file (struct opt_p
>>>    flag_name = concat (prefix, name, num, NULL);
>>>    glob_name = concat (prefix, name, NULL);
>>>    optgroup_flags |= pass->optinfo_flags;
>>> +  /* For any passes that do not have an optgroup set, and which are not
>>> +     IPA passes setup above, set the optgroup to OPTGROUP_OTHER so that
>>> +     any dump messages are emitted properly under -fopt-info(-optall).  */
>>> +  if (optgroup_flags == OPTGROUP_NONE)
>>> +    optgroup_flags = OPTGROUP_OTHER;
>>>    id = dump_register (dot_name, flag_name, glob_name, flags, optgroup_flags);
>>>    set_pass_for_id (id, pass);
>>>    full_name = concat (prefix, pass->name, num, NULL);
>>> Index: value-prof.c
>>> ===================================================================
>>> --- value-prof.c        (revision 201461)
>>> +++ value-prof.c        (working copy)
>>> @@ -585,9 +585,11 @@ check_counter (gimple stmt, const char * name,
>>>                : DECL_SOURCE_LOCATION (current_function_decl);
>>>        if (flag_profile_correction)
>>>          {
>>> -         inform (locus, "correcting inconsistent value profile: "
>>> -                 "%s profiler overall count (%d) does not match BB count "
>>> -                  "(%d)", name, (int)*all, (int)bb_count);
>>> +          if (dump_enabled_p ())
>>> +            dump_printf_loc (MSG_MISSED_OPTIMIZATION, locus,
>>> +                             "correcting inconsistent value profile: %s "
>>> +                             "profiler overall count (%d) does not match BB "
>>> +                             "count (%d)", name, (int)*all, (int)bb_count);
>>>           *all = bb_count;
>>>           if (*count > *all)
>>>              *count = *all;
>>> @@ -1209,9 +1211,11 @@ find_func_by_funcdef_no (int func_id)
>>>    int max_id = get_last_funcdef_no ();
>>>    if (func_id >= max_id || cgraph_node_map[func_id] == NULL)
>>>      {
>>> -      if (flag_profile_correction)
>>> -        inform (DECL_SOURCE_LOCATION (current_function_decl),
>>> -                "Inconsistent profile: indirect call target (%d) does not exist", func_id);
>>> +      if (flag_profile_correction && dump_enabled_p ())
>>> +        dump_printf_loc (MSG_MISSED_OPTIMIZATION,
>>> +                         DECL_SOURCE_LOCATION (current_function_decl),
>>> +                         "Inconsistent profile: indirect call target (%d) "
>>> +                         "does not exist", func_id);
>>>        else
>>>          error ("Inconsistent profile: indirect call target (%d) does not exist", func_id);
>>>
>>> @@ -1235,8 +1239,10 @@ check_ic_target (gimple call_stmt, struct cgraph_n
>>>       return true;
>>>
>>>     locus =  gimple_location (call_stmt);
>>> -   inform (locus, "Skipping target %s with mismatching types for icall ",
>>> -           cgraph_node_name (target));
>>> +   if (dump_enabled_p ())
>>> +     dump_printf_loc (MSG_MISSED_OPTIMIZATION, locus,
>>> +                      "Skipping target %s with mismatching types for icall ",
>>> +                      cgraph_node_name (target));
>>>     return false;
>>>  }
>>>
>>> Index: coverage.c
>>> ===================================================================
>>> --- coverage.c  (revision 201461)
>>> +++ coverage.c  (working copy)
>>> @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.  If not see
>>>  #include "langhooks.h"
>>>  #include "hash-table.h"
>>>  #include "tree-iterator.h"
>>> +#include "tree-pass.h"
>>>  #include "cgraph.h"
>>>  #include "dumpfile.h"
>>>  #include "diagnostic-core.h"
>>> @@ -341,11 +342,13 @@ get_coverage_counts (unsigned counter, unsigned ex
>>>      {
>>>        static int warned = 0;
>>>
>>> -      if (!warned++)
>>> -       inform (input_location, (flag_guess_branch_prob
>>> -                ? "file %s not found, execution counts estimated"
>>> -                : "file %s not found, execution counts assumed to be zero"),
>>> -               da_file_name);
>>> +      if (!warned++ && dump_enabled_p ())
>>> +       dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>>> +                         (flag_guess_branch_prob
>>> +                          ? "file %s not found, execution counts estimated"
>>> +                          : "file %s not found, execution counts assumed to "
>>> +                            "be zero"),
>>> +                         da_file_name);
>>>        return NULL;
>>>      }
>>>
>>> @@ -369,21 +372,25 @@ get_coverage_counts (unsigned counter, unsigned ex
>>>         warning_at (input_location, OPT_Wcoverage_mismatch,
>>>                     "the control flow of function %qE does not match "
>>>                     "its profile data (counter %qs)", id, ctr_names[counter]);
>>> -      if (warning_printed)
>>> +      if (warning_printed && dump_enabled_p ())
>>>         {
>>> -        inform (input_location, "use -Wno-error=coverage-mismatch to tolerate "
>>> -                "the mismatch but performance may drop if the function is hot");
>>> +          dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>>> +                           "use -Wno-error=coverage-mismatch to tolerate "
>>> +                           "the mismatch but performance may drop if the "
>>> +                           "function is hot");
>>>
>>>           if (!seen_error ()
>>>               && !warned++)
>>>             {
>>> -             inform (input_location, "coverage mismatch ignored");
>>> -             inform (input_location, flag_guess_branch_prob
>>> -                     ? G_("execution counts estimated")
>>> -                     : G_("execution counts assumed to be zero"));
>>> +             dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>>> +                               "coverage mismatch ignored");
>>> +             dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>>> +                               flag_guess_branch_prob
>>> +                               ? G_("execution counts estimated")
>>> +                               : G_("execution counts assumed to be zero"));
>>>               if (!flag_guess_branch_prob)
>>> -               inform (input_location,
>>> -                       "this can result in poorly optimized code");
>>> +               dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>>> +                                 "this can result in poorly optimized code");
>>>             }
>>>         }
>>>
>>> @@ -1103,6 +1110,11 @@ coverage_init (const char *filename)
>>>    int len = strlen (filename);
>>>    int prefix_len = 0;
>>>
>>> +  /* Since coverage_init is invoked very early, before the pass
>>> +     manager, we need to set up the dumping explicitly. This is
>>> +     similar to the handling in finish_optimization_passes.  */
>>> +  dump_start (pass_profile.pass.static_pass_number, NULL);
>>> +
>>>    if (!profile_data_prefix && !IS_ABSOLUTE_PATH (filename))
>>>      profile_data_prefix = getpwd ();
>>>
>>> @@ -1145,6 +1157,8 @@ coverage_init (const char *filename)
>>>           gcov_write_unsigned (bbg_file_stamp);
>>>         }
>>>      }
>>> +
>>> +  dump_finish (pass_profile.pass.static_pass_number);
>>>  }
>>>
>>>  /* Performs file-level cleanup.  Close notes file, generate coverage
>>> Index: ipa-inline.c
>>> ===================================================================
>>> --- ipa-inline.c        (revision 201461)
>>> +++ ipa-inline.c        (working copy)
>>> @@ -1322,7 +1322,7 @@ recursive_inlining (struct cgraph_edge *edge,
>>>            reset_edge_growth_cache (curr);
>>>         }
>>>
>>> -      inline_call (curr, false, new_edges, &overall_size, true);
>>> +      inline_call (curr, false, new_edges, &overall_size, true, false);
>>>        lookup_recursive_calls (node, curr->callee, heap);
>>>        n++;
>>>      }
>>> @@ -1612,7 +1612,8 @@ inline_small_functions (void)
>>>             fprintf (dump_file, " Peeling recursion with depth %i\n", depth);
>>>
>>>           gcc_checking_assert (!callee->global.inlined_to);
>>> -         inline_call (edge, true, &new_indirect_edges, &overall_size, true);
>>> +         inline_call (edge, true, &new_indirect_edges, &overall_size, true,
>>> +                       false);
>>>           if (flag_indirect_inlining)
>>>             add_new_edges_to_heap (edge_heap, new_indirect_edges);
>>>
>>> @@ -1733,7 +1734,7 @@ flatten_function (struct cgraph_node *node, bool e
>>>                  xstrdup (cgraph_node_name (callee)),
>>>                  xstrdup (cgraph_node_name (e->caller)));
>>>        orig_callee = callee;
>>> -      inline_call (e, true, NULL, NULL, false);
>>> +      inline_call (e, true, NULL, NULL, false, early);
>>>        if (e->callee != orig_callee)
>>>         orig_callee->symbol.aux = (void *) node;
>>>        flatten_function (e->callee, early);
>>> @@ -1852,7 +1853,8 @@ ipa_inline (void)
>>>                                    inline_summary (node->callers->caller)->size);
>>>                         }
>>>
>>> -                     inline_call (node->callers, true, NULL, NULL, true);
>>> +                     inline_call (node->callers, true, NULL, NULL, true,
>>> +                                   false);
>>>                       if (dump_file)
>>>                         fprintf (dump_file,
>>>                                  " Inlined into %s which now has %i size\n",
>>> @@ -1925,7 +1927,7 @@ inline_always_inline_functions (struct cgraph_node
>>>         fprintf (dump_file, "  Inlining %s into %s (always_inline).\n",
>>>                  xstrdup (cgraph_node_name (e->callee)),
>>>                  xstrdup (cgraph_node_name (e->caller)));
>>> -      inline_call (e, true, NULL, NULL, false);
>>> +      inline_call (e, true, NULL, NULL, false, true);
>>>        inlined = true;
>>>      }
>>>    if (inlined)
>>> @@ -1977,7 +1979,7 @@ early_inline_small_functions (struct cgraph_node *
>>>         fprintf (dump_file, " Inlining %s into %s.\n",
>>>                  xstrdup (cgraph_node_name (callee)),
>>>                  xstrdup (cgraph_node_name (e->caller)));
>>> -      inline_call (e, true, NULL, NULL, true);
>>> +      inline_call (e, true, NULL, NULL, true, true);
>>>        inlined = true;
>>>      }
>>>
>>> Index: ipa-inline.h
>>> ===================================================================
>>> --- ipa-inline.h        (revision 201461)
>>> +++ ipa-inline.h        (working copy)
>>> @@ -228,7 +228,8 @@ void free_growth_caches (void);
>>>  void compute_inline_parameters (struct cgraph_node *, bool);
>>>
>>>  /* In ipa-inline-transform.c  */
>>> -bool inline_call (struct cgraph_edge *, bool, vec<cgraph_edge_p> *, int *, bool);
>>> +bool inline_call (struct cgraph_edge *, bool, vec<cgraph_edge_p> *, int *,
>>> +                  bool, bool);
>>>  unsigned int inline_transform (struct cgraph_node *);
>>>  void clone_inlined_nodes (struct cgraph_edge *e, bool, bool, int *);
>>>
>>> Index: testsuite/gcc.dg/pr40209.c
>>> ===================================================================
>>> --- testsuite/gcc.dg/pr40209.c  (revision 201461)
>>> +++ testsuite/gcc.dg/pr40209.c  (working copy)
>>> @@ -1,5 +1,5 @@
>>>  /* { dg-do compile } */
>>> -/* { dg-options "-O2 -fprofile-use" } */
>>> +/* { dg-options "-O2 -fprofile-use -fopt-info" } */
>>>
>>>  void process(const char *s);
>>>
>>> Index: testsuite/gcc.dg/pr26570.c
>>> ===================================================================
>>> --- testsuite/gcc.dg/pr26570.c  (revision 201461)
>>> +++ testsuite/gcc.dg/pr26570.c  (working copy)
>>> @@ -1,5 +1,5 @@
>>>  /* { dg-do compile } */
>>> -/* { dg-options "-O2 -fprofile-generate -fprofile-use" } */
>>> +/* { dg-options "-O2 -fprofile-generate -fprofile-use -fopt-info" } */
>>>
>>>  unsigned test (unsigned a, unsigned b)
>>>  {
>>> Index: testsuite/gcc.dg/pr32773.c
>>> ===================================================================
>>> --- testsuite/gcc.dg/pr32773.c  (revision 201461)
>>> +++ testsuite/gcc.dg/pr32773.c  (working copy)
>>> @@ -1,6 +1,6 @@
>>>  /* { dg-do compile } */
>>> -/* { dg-options "-O -fprofile-use" } */
>>> -/* { dg-options "-O -m4 -fprofile-use" { target sh-*-* } } */
>>> +/* { dg-options "-O -fprofile-use -fopt-info" } */
>>> +/* { dg-options "-O -m4 -fprofile-use -fopt-info" { target sh-*-* } } */
>>>
>>>  void foo (int *p)
>>>  {
>>> Index: testsuite/g++.dg/tree-ssa/dom-invalid.C
>>> ===================================================================
>>> --- testsuite/g++.dg/tree-ssa/dom-invalid.C     (revision 201461)
>>> +++ testsuite/g++.dg/tree-ssa/dom-invalid.C     (working copy)
>>> @@ -1,7 +1,7 @@
>>>  // PR tree-optimization/39557
>>>  // invalid post-dom info leads to infinite loop
>>>  // { dg-do run }
>>> -// { dg-options "-Wall -fno-exceptions -O2 -fprofile-use -fno-rtti" }
>>> +// { dg-options "-Wall -fno-exceptions -O2 -fprofile-use -fopt-info -fno-rtti" }
>>>
>>>  struct C
>>>  {
>>> Index: testsuite/gcc.dg/inline-dump.c
>>> ===================================================================
>>> --- testsuite/gcc.dg/inline-dump.c      (revision 0)
>>> +++ testsuite/gcc.dg/inline-dump.c      (revision 0)
>>> @@ -0,0 +1,11 @@
>>> +/* Verify that -fopt-info can output correct inline info.  */
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-Wall -fopt-info-inline=stderr -O2 -fno-early-inlining" } */
>>> +static inline int leaf() {
>>> +  int i, ret = 0;
>>> +  for (i = 0; i < 10; i++)
>>> +    ret += i;
>>> +  return ret;
>>> +}
>>> +static inline int foo(void) { return leaf(); } /* { dg-message "note: leaf .*inlined into bar .*via inline instance foo.*\n" } */
>>> +int bar(void) { return foo(); }
>>>>
>>>> Thanks,
>>>> Teresa
>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Martin
>>>>
>>>>
>>>>
>>>> --
>>>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
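The pattern the patch applies at every converted call site (test dump_enabled_p (), then call dump_printf_loc with a MSG_* kind) can be sketched outside GCC as follows. This is a minimal mock under stated assumptions, not the dumpfile.c implementation: enabled_kinds, last_message, format_node and report_inline are hypothetical names, the MSG_* values are illustrative, and this dump_printf_loc takes a file/line/column triple where the real one takes a location. format_node also shows one way to build the "name [order] (count)" text by appending at an offset rather than passing the buffer back into its own format call.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the message kinds; values are illustrative.  */
enum
{
  MSG_OPTIMIZED_LOCATIONS = 1 << 0,
  MSG_MISSED_OPTIMIZATION = 1 << 1,
  MSG_NOTE = 1 << 2
};

static int enabled_kinds;       /* Kinds requested by the user (assumed).  */
static char last_message[256];  /* Last emitted line, kept for inspection.  */

/* Cheap test so callers can skip building messages nobody will see.  */
static bool
dump_enabled_p (void)
{
  return enabled_kinds != 0;
}

/* Emit "file:line:col: note: <message>" if KIND is enabled.  */
static void
dump_printf_loc (int kind, const char *file, int line, int col,
                 const char *fmt, ...)
{
  va_list ap;
  size_t used;

  if (!(kind & enabled_kinds))
    return;
  snprintf (last_message, sizeof last_message, "%s:%d:%d: note: ",
            file, line, col);
  used = strlen (last_message);
  va_start (ap, fmt);
  vsnprintf (last_message + used, sizeof last_message - used, fmt, ap);
  va_end (ap);
  fprintf (stderr, "%s\n", last_message);
}

/* Write "NAME [ORDER]" into BUF, followed by " (COUNT)" when profile data
   is available.  The second write starts where the first one ended.  */
static void
format_node (char *buf, size_t size, const char *name, int order,
             bool have_profile, long long count)
{
  int n;

  assert (size > 0);
  n = snprintf (buf, size, "%s [%d]", name, order);
  if (have_profile && n >= 0 && (size_t) n < size)
    snprintf (buf + n, size - n, " (%lld)", count);
}

/* Early-inliner decisions go to the more verbose MSG_NOTE kind.  */
static void
report_inline (bool early, const char *callee, const char *caller)
{
  if (dump_enabled_p ())
    dump_printf_loc (early ? MSG_NOTE : MSG_OPTIMIZED_LOCATIONS,
                     "test.c", 8, 3, "%s inlined into %s", callee, caller);
}
```

With enabled_kinds set to MSG_OPTIMIZED_LOCATIONS, a call such as report_inline (false, ...) emits a note while report_inline (true, ...) stays silent, which mirrors the early/late split discussed in the thread.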
Xinliang David Li Aug. 27, 2013, 6:03 p.m. UTC | #4
+ Honza

On Tue, Aug 27, 2013 at 10:56 AM, Teresa Johnson <tejohnson@google.com> wrote:
> Ping #3.
>
> Thanks,
> Teresa
>
> On Mon, Aug 19, 2013 at 11:33 AM, Teresa Johnson <tejohnson@google.com> wrote:
>> Ping.
>> Thanks,
>> Teresa
>>
>> On Mon, Aug 12, 2013 at 6:54 AM, Teresa Johnson <tejohnson@google.com> wrote:
>>> On Tue, Aug 6, 2013 at 10:23 PM, Teresa Johnson <tejohnson@google.com> wrote:
>>>> On Tue, Aug 6, 2013 at 9:29 AM, Teresa Johnson <tejohnson@google.com> wrote:
>>>>> On Tue, Aug 6, 2013 at 9:01 AM, Martin Jambor <mjambor@suse.cz> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On Tue, Aug 06, 2013 at 07:14:42AM -0700, Teresa Johnson wrote:
>>>>>>> On Tue, Aug 6, 2013 at 5:37 AM, Martin Jambor <mjambor@suse.cz> wrote:
>>>>>>> > On Mon, Aug 05, 2013 at 10:37:00PM -0700, Teresa Johnson wrote:
>>>>>>> >> This patch ports messages to the new dump framework,
>>>>>>> >
>>>>>>> > It would be great if this new framework was documented somewhere.  I lost
>>>>>>> > track of what was agreed it would be and from the uses in the
>>>>>>> > vectorizer I was never quite sure how to utilize it in other passes.
>>>>>>>
>>>>>>> Cc'ing Sharad who implemented this - Sharad, is this documented on a
>>>>>>> wiki or elsewhere?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>>
>>>>>>> >
>>>>>>> > I'd also like to point out two other minor things inline:
>>>>>>> >
>>>>>>> > [...]
>>>>>>> >
>>>>>>> >> 2013-08-06  Teresa Johnson  <tejohnson@google.com>
>>>>>>> >>             Dehao Chen  <dehao@google.com>
>>>>>>> >>
>>>>>>> >>         * dumpfile.c (dump_loc): Add column number to output, make newlines
>>>>>>> >>         consistent.
>>>>>>> >>         * dumpfile.h (OPTGROUP_OTHER): Add and enable under OPTGROUP_ALL.
>>>>>>> >>         * ipa-inline-transform.c (clone_inlined_nodes):
>>>>>>> >>         (cgraph_node_opt_info): New function.
>>>>>>> >>         (cgraph_node_call_chain): Ditto.
>>>>>>> >>         (dump_inline_decision): Ditto.
>>>>>>> >>         (inline_call): Invoke dump_inline_decision.
>>>>>>> >>         * doc/invoke.texi: Document optall -fopt-info flag.
>>>>>>> >>         * profile.c (read_profile_edge_counts): Use new dump framework.
>>>>>>> >>         (compute_branch_probabilities): Ditto.
>>>>>>> >>         * passes.c (pass_manager::register_one_dump_file): Use OPTGROUP_OTHER
>>>>>>> >>         when pass not in any opt group.
>>>>>>> >>         * value-prof.c (check_counter): Use new dump framework.
>>>>>>> >>         (find_func_by_funcdef_no): Ditto.
>>>>>>> >>         (check_ic_target): Ditto.
>>>>>>> >>         * coverage.c (get_coverage_counts): Ditto.
>>>>>>> >>         (coverage_init): Setup new dump framework.
>>>>>>> >>         * ipa-inline.c (inline_small_functions): Set is_in_ipa_inline.
>>>>>>> >>         * ipa-inline.h (is_in_ipa_inline): Declare.
>>>>>>> >>
>>>>>>> >>         * testsuite/gcc.dg/pr40209.c: Use -fopt-info.
>>>>>>> >>         * testsuite/gcc.dg/pr26570.c: Ditto.
>>>>>>> >>         * testsuite/gcc.dg/pr32773.c: Ditto.
>>>>>>> >>         * testsuite/g++.dg/tree-ssa/dom-invalid.C (struct C): Ditto.
>>>>>>> >>
>>>>>>> >
>>>>>>> > [...]
>>>>>>> >
>>>>>>> >> Index: ipa-inline-transform.c
>>>>>>> >> ===================================================================
>>>>>>> >> --- ipa-inline-transform.c      (revision 201461)
>>>>>>> >> +++ ipa-inline-transform.c      (working copy)
>>>>>>> >> @@ -192,6 +192,108 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d
>>>>>>> >>  }
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> +#define MAX_INT_LENGTH 20
>>>>>>> >> +
>>>>>>> >> +/* Return NODE's name and profile count, if available.  */
>>>>>>> >> +
>>>>>>> >> +static const char *
>>>>>>> >> +cgraph_node_opt_info (struct cgraph_node *node)
>>>>>>> >> +{
>>>>>>> >> +  char *buf;
>>>>>>> >> +  size_t buf_size;
>>>>>>> >> +  const char *bfd_name = lang_hooks.dwarf_name (node->symbol.decl, 0);
>>>>>>> >> +
>>>>>>> >> +  if (!bfd_name)
>>>>>>> >> +    bfd_name = "unknown";
>>>>>>> >> +
>>>>>>> >> +  buf_size = strlen (bfd_name) + 1;
>>>>>>> >> +  if (profile_info)
>>>>>>> >> +    buf_size += (MAX_INT_LENGTH + 3);
>>>>>>> >> +
>>>>>>> >> +  buf = (char *) xmalloc (buf_size);
>>>>>>> >> +
>>>>>>> >> +  strcpy (buf, bfd_name);
>>>>>>> >> +
>>>>>>> >> +  if (profile_info)
>>>>>>> >> +    sprintf (buf + strlen (buf), " ("HOST_WIDEST_INT_PRINT_DEC")", node->count);
>>>>>>> >> +  return buf;
>>>>>>> >> +}
>>>>>>> >
>>>>>>> > I'm not sure if output of this function is aimed only at the user or
>>>>>>> > if it is supposed to be used by gcc developers as well.  If the
>>>>>>> > latter, an incredibly useful thing is to also dump node->symbol.order
>>>>>>> > too.  We usually dump it after "/" sign separating it from node name.
>>>>>>> > It is invaluable when examining decisions in C++ code where you can
>>>>>>> > have lots of clones of a node (and also because existing dumps print
>>>>>>> > it, it is easy to combine them).
>>>>>>>
>>>>>>> The output is useful both for power users doing performance tuning of
>>>>>>> their application and for gcc developers. Adding the id is not so
>>>>>>> useful for the former, but I agree that it is very useful for compiler
>>>>>>> developers. In fact, in the google branch version we emit more verbose
>>>>>>> information (the lipo module id and the funcdef_no) to help uniquely
>>>>>>> identify the routines and to aid in post-processing by humans and
>>>>>>> tools. So it is probably useful to add something similar here too. Is
>>>>>>> the node->symbol.order more or less unique than the funcdef_no? I see
>>>>>>> that you added a patch a few months ago to print the
>>>>>>> node->symbol.order in the function header, and it also has the
>>>>>>> advantage as you note of matching up with existing ipa dumps.
>>>>>>
>>>>>> node->symbol.order is unique and if I remember correctly, it is not
>>>>>> even recycled.  Clones, inline clones, thunks, every symbol table node
>>>>>> gets its own symbol order so it should be more unique than funcdef_no.
>>>>>> On the other hand it may be a bit cryptic for users but at the same
>>>>>> time it is only one number.
>>>>>
>>>>> Ok, I am going to go ahead and add this to the output.
>>>>>
>>>>>>
>>>>>>>
>>>>>>> >
>>>>>>> > [...]
>>>>>>> >
>>>>>>> >> Index: ipa-inline.c
>>>>>>> >> ===================================================================
>>>>>>> >> --- ipa-inline.c        (revision 201461)
>>>>>>> >> +++ ipa-inline.c        (working copy)
>>>>>>> >> @@ -118,6 +118,9 @@ along with GCC; see the file COPYING3.  If not see
>>>>>>> >>  static int overall_size;
>>>>>>> >>  static gcov_type max_count;
>>>>>>> >>
>>>>>>> >> +/* Global variable to denote if it is in ipa-inline pass. */
>>>>>>> >> +bool is_in_ipa_inline = false;
>>>>>>> >> +
>>>>>>> >>  /* Return false when inlining edge E would lead to violating
>>>>>>> >>     limits on function unit growth or stack usage growth.
>>>>>>> >>
>>>>>>> >
>>>>>>> > In this age of removing global variables, are you sure you need this?
>>>>>>> > The only user of this seems to be a function that is only being called
>>>>>>> > from inline_call... can that ever happen when not inlining?  If you
>>>>>>> > plan to use this function also elsewhere, perhaps the callers will
>>>>>>> > know whether we are inlining or not and can provide this in a
>>>>>>> > parameter?
>>>>>>>
>>>>>>> This is to distinguish early inlining from ipa inlining.
>>>>>>
>>>>>> Oh, right, I did not realize that the IPA part was the important bit
>>>>>> of the name.
>>>>>>
>>>>>>> The volume of
>>>>>>> early inlining messages is too high to be on for the default setting
>>>>>>> of -fopt-info, and they are usually not as interesting for performance
>>>>>>> tuning. The dumper will only emit the early inline messages under a
>>>>>>> more verbose setting (MSG_NOTE):
>>>>>>>       dump_printf_loc (is_in_ipa_inline ? MSG_OPTIMIZED_LOCATIONS : MSG_NOTE ...
>>>>>>> The other way I can see to distinguish this would be to check the
>>>>>>> always_inline_functions_inlined flag on the caller's function. It
>>>>>>> could also be possible to pass down a flag from the callers of
>>>>>>> inline_call, but at least one caller (flatten_functions) is shared
>>>>>>> between early and late inlining, so the flag needs to be passed
>>>>>>> through that as well. WDYT?
>>>>>>
>>>>>> Did you mean flatten_function?  It already has a bool "early"
>>>>>> parameter.  But I can see that being able to quickly figure out
>>>>>> whether we are in early inliner or ipa inliner without much hassle is
>>>>>> useful enough to justify a global variable a month ago, however I
>>>>>> suppose we should not be introducing them now and so you'd have to put
>>>>>> such stuff into... well, you'd probably have to put into the universe
>>>>>> object somewhere because it is basically shared between two passes.
>>>>>> Another option, even though somewhat hackish, would be to look at
>>>>>> current_pass and see which pass it is.  I don't know, do what is
>>>>>> easier or what you like more, just be aware of the problem.
>>>>>
>>>>> After thinking about this some more, I think passing down an early
>>>>> flag from callers is the cleanest way to go.
>>>>>
>>>>> I'll fix these and post a new patch later today.
>>>>
>>>> New patch below that removes this global variable, and also outputs
>>>> the node->symbol.order (in square brackets after the function name so
>>>> as to not clutter it). Inline messages with profile data look like:
>>>>
>>>> test.c:8:3: note: foobar [0] (99999000) inlined into foo [2] (1000)
>>>> with call count 99999000 (via inline instance bar [3] (99999000))
>>>>
>>>> (without FDO the counts in parentheses and the call count would not be
>>>> included).
>>>>
>>>> Ok for trunk?
>>>
>>> Ping.
>>> Teresa
>>>
>>>> Thanks,
>>>> Teresa
>>>>
>>>> 2013-08-06  Teresa Johnson  <tejohnson@google.com>
>>>>             Dehao Chen  <dehao@google.com>
>>>>
>>>>         * dumpfile.c (dump_loc): Output column number, make newlines consistent.
>>>>         * dumpfile.h (OPTGROUP_OTHER): Add and enable under OPTGROUP_ALL.
>>>>         * ipa-inline-transform.c (cgraph_node_opt_info): New function.
>>>>         (cgraph_node_call_chain): Ditto.
>>>>         (dump_inline_decision): Ditto.
>>>>         (inline_call): Invoke dump_inline_decision, new parameter.
>>>>         * doc/invoke.texi: Document optall -fopt-info flag.
>>>>         * profile.c (read_profile_edge_counts): Use new dump framework.
>>>>         (compute_branch_probabilities): Ditto.
>>>>         * passes.c (pass_manager::register_one_dump_file): Use OPTGROUP_OTHER
>>>>         when pass not in any opt group.
>>>>         * value-prof.c (check_counter): Use new dump framework.
>>>>         (find_func_by_funcdef_no): Ditto.
>>>>         (check_ic_target): Ditto.
>>>>         * coverage.c (get_coverage_counts): Ditto.
>>>>         (coverage_init): Setup new dump framework.
>>>>         * ipa-inline.c (recursive_inlining): New inline_call parameter.
>>>>         (inline_small_functions): Ditto.
>>>>         (flatten_function): Ditto.
>>>>         (ipa_inline): Ditto.
>>>>         (inline_always_inline_functions): Ditto.
>>>>         (early_inline_small_functions): Ditto.
>>>>         * ipa-inline.h: Ditto.
>>>>
>>>>         * testsuite/gcc.dg/pr40209.c: Use -fopt-info.
>>>>         * testsuite/gcc.dg/pr26570.c: Ditto.
>>>>         * testsuite/gcc.dg/pr32773.c: Ditto.
>>>>         * testsuite/g++.dg/tree-ssa/dom-invalid.C: Ditto.
>>>>         * testsuite/gcc.dg/inline-dump.c: New test.
>>>>
>>>> Index: dumpfile.c
>>>> ===================================================================
>>>> --- dumpfile.c  (revision 201461)
>>>> +++ dumpfile.c  (working copy)
>>>> @@ -257,16 +257,18 @@ dump_open_alternate_stream (struct dump_file_info
>>>>  void
>>>>  dump_loc (int dump_kind, FILE *dfile, source_location loc)
>>>>  {
>>>> -  /* Currently vectorization passes print location information.  */
>>>>    if (dump_kind)
>>>>      {
>>>> +      /* Ensure dump message starts on a new line.  */
>>>> +      fprintf (dfile, "\n");
>>>>        if (LOCATION_LOCUS (loc) > BUILTINS_LOCATION)
>>>> -        fprintf (dfile, "\n%s:%d: note: ", LOCATION_FILE (loc),
>>>> -                 LOCATION_LINE (loc));
>>>> +        fprintf (dfile, "%s:%d:%d: note: ", LOCATION_FILE (loc),
>>>> +                 LOCATION_LINE (loc), LOCATION_COLUMN (loc));
>>>>        else if (current_function_decl)
>>>> -        fprintf (dfile, "\n%s:%d: note: ",
>>>> +        fprintf (dfile, "%s:%d:%d: note: ",
>>>>                   DECL_SOURCE_FILE (current_function_decl),
>>>> -                 DECL_SOURCE_LINE (current_function_decl));
>>>> +                 DECL_SOURCE_LINE (current_function_decl),
>>>> +                 DECL_SOURCE_COLUMN (current_function_decl));
>>>>      }
>>>>  }
>>>>
>>>> Index: dumpfile.h
>>>> ===================================================================
>>>> --- dumpfile.h  (revision 201461)
>>>> +++ dumpfile.h  (working copy)
>>>> @@ -97,8 +97,9 @@ enum tree_dump_index
>>>>  #define OPTGROUP_LOOP        (1 << 2)   /* Loop optimization passes */
>>>>  #define OPTGROUP_INLINE      (1 << 3)   /* Inlining passes */
>>>>  #define OPTGROUP_VEC         (1 << 4)   /* Vectorization passes */
>>>> +#define OPTGROUP_OTHER       (1 << 5)   /* All other passes */
>>>>  #define OPTGROUP_ALL        (OPTGROUP_IPA | OPTGROUP_LOOP | OPTGROUP_INLINE \
>>>> -                              | OPTGROUP_VEC)
>>>> +                              | OPTGROUP_VEC | OPTGROUP_OTHER)
>>>>
>>>>  /* Define a tree dump switch.  */
>>>>  struct dump_file_info
>>>> Index: ipa-inline-transform.c
>>>> ===================================================================
>>>> --- ipa-inline-transform.c      (revision 201461)
>>>> +++ ipa-inline-transform.c      (working copy)
>>>> @@ -192,6 +192,111 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d
>>>>  }
>>>>
>>>>
>>>> +#define MAX_INT_LENGTH 20
>>>> +
>>>> +/* Return NODE's name and profile count, if available.  */
>>>> +
>>>> +static const char *
>>>> +cgraph_node_opt_info (struct cgraph_node *node)
>>>> +{
>>>> +  char *buf;
>>>> +  size_t buf_size;
>>>> +  const char *bfd_name = lang_hooks.dwarf_name (node->symbol.decl, 0);
>>>> +
>>>> +  if (!bfd_name)
>>>> +    bfd_name = "unknown";
>>>> +
>>>> +  buf_size = strlen (bfd_name) + 1;
>>>> +  if (profile_info)
>>>> +    buf_size += (MAX_INT_LENGTH + 3);
>>>> +  buf_size += MAX_INT_LENGTH;
>>>> +
>>>> +  buf = (char *) xmalloc (buf_size);
>>>> +
>>>> +  strcpy (buf, bfd_name);
>>>> +  /* Append in place: sprintf must not read and write BUF at once.  */
>>>> +  sprintf (buf + strlen (buf), " [%i]", node->symbol.order);
>>>> +
>>>> +  if (profile_info)
>>>> +    sprintf (buf + strlen (buf), " ("HOST_WIDEST_INT_PRINT_DEC")", node->count);
>>>> +  return buf;
>>>> +}
>>>> +
>>>> +
>>>> +/* Return CALLER's inlined call chain. Save the cgraph_node of the ultimate
>>>> +   function that the caller is inlined to in FINAL_CALLER.  */
>>>> +
>>>> +static const char *
>>>> +cgraph_node_call_chain (struct cgraph_node *caller,
>>>> +                       struct cgraph_node **final_caller)
>>>> +{
>>>> +  struct cgraph_node *node;
>>>> +  const char *via_str = " (via inline instance";
>>>> +  size_t current_string_len = strlen (via_str) + 1;
>>>> +  size_t buf_size = current_string_len;
>>>> +  char *buf = (char *) xmalloc (buf_size);
>>>> +
>>>> +  buf[0] = 0;
>>>> +  gcc_assert (caller->global.inlined_to != NULL);
>>>> +  strcat (buf, via_str);
>>>> +  for (node = caller; node->global.inlined_to != NULL;
>>>> +       node = node->callers->caller)
>>>> +    {
>>>> +      const char *name = cgraph_node_opt_info (node);
>>>> +      current_string_len += (strlen (name) + 1);
>>>> +      if (current_string_len >= buf_size)
>>>> +       {
>>>> +         buf_size = current_string_len * 2;
>>>> +         buf = (char *) xrealloc (buf, buf_size);
>>>> +       }
>>>> +      strcat (buf, " ");
>>>> +      strcat (buf, name);
>>>> +    }
>>>> +  strcat (buf, ")");
>>>> +  *final_caller = node;
>>>> +  return buf;
>>>> +}
>>>> +
>>>> +
>>>> +/* Dump the inline decision of EDGE.  */
>>>> +
>>>> +static void
>>>> +dump_inline_decision (struct cgraph_edge *edge, bool early)
>>>> +{
>>>> +  location_t locus;
>>>> +  const char *inline_chain_text;
>>>> +  const char *call_count_text;
>>>> +  struct cgraph_node *final_caller = edge->caller;
>>>> +
>>>> +  if (final_caller->global.inlined_to != NULL)
>>>> +    inline_chain_text = cgraph_node_call_chain (final_caller, &final_caller);
>>>> +  else
>>>> +    inline_chain_text = "";
>>>> +
>>>> +  if (edge->count > 0)
>>>> +    {
>>>> +      const char *call_count_str = " with call count ";
>>>> +      char *buf = (char *) xmalloc (strlen (call_count_str) + MAX_INT_LENGTH);
>>>> +      sprintf (buf, "%s"HOST_WIDEST_INT_PRINT_DEC, call_count_str,
>>>> +              edge->count);
>>>> +      call_count_text = buf;
>>>> +    }
>>>> +  else
>>>> +    {
>>>> +      call_count_text = "";
>>>> +    }
>>>> +
>>>> +  locus = gimple_location (edge->call_stmt);
>>>> +  dump_printf_loc (early ? MSG_NOTE : MSG_OPTIMIZED_LOCATIONS,
>>>> +                   locus,
>>>> +                   "%s inlined into %s%s%s\n",
>>>> +                   cgraph_node_opt_info (edge->callee),
>>>> +                   cgraph_node_opt_info (final_caller),
>>>> +                   call_count_text,
>>>> +                   inline_chain_text);
>>>> +}
>>>> +
>>>> +
>>>>  /* Mark edge E as inlined and update callgraph accordingly.  UPDATE_ORIGINAL
>>>>     specify whether profile of original function should be updated.  If any new
>>>>     indirect edges are discovered in the process, add them to NEW_EDGES, unless
>>>> @@ -205,7 +310,8 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d
>>>>  bool
>>>>  inline_call (struct cgraph_edge *e, bool update_original,
>>>>              vec<cgraph_edge_p> *new_edges,
>>>> -            int *overall_size, bool update_overall_summary)
>>>> +            int *overall_size, bool update_overall_summary,
>>>> +             bool early)
>>>>  {
>>>>    int old_size = 0, new_size = 0;
>>>>    struct cgraph_node *to = NULL;
>>>> @@ -218,6 +324,9 @@ inline_call (struct cgraph_edge *e, bool update_or
>>>>    bool predicated = inline_edge_summary (e)->predicate != NULL;
>>>>  #endif
>>>>
>>>> +  if (dump_enabled_p ())
>>>> +    dump_inline_decision (e, early);
>>>> +
>>>>    /* Don't inline inlined edges.  */
>>>>    gcc_assert (e->inline_failed);
>>>>    /* Don't even think of inlining inline clone.  */
>>>> Index: doc/invoke.texi
>>>> ===================================================================
>>>> --- doc/invoke.texi     (revision 201461)
>>>> +++ doc/invoke.texi     (working copy)
>>>> @@ -6234,6 +6234,9 @@ Enable dumps from all loop optimizations.
>>>>  Enable dumps from all inlining optimizations.
>>>>  @item vec
>>>>  Enable dumps from all vectorization optimizations.
>>>> +@item optall
>>>> +Enable dumps from all optimizations. This is a superset of
>>>> +the optimization groups listed above.
>>>>  @end table
>>>>
>>>>  For example,
>>>> Index: profile.c
>>>> ===================================================================
>>>> --- profile.c   (revision 201461)
>>>> +++ profile.c   (working copy)
>>>> @@ -432,8 +432,8 @@ read_profile_edge_counts (gcov_type *exec_counts)
>>>>                     if (flag_profile_correction)
>>>>                       {
>>>>                         static bool informed = 0;
>>>> -                       if (!informed)
>>>> -                         inform (input_location,
>>>> +                       if (dump_enabled_p () && !informed)
>>>> +                         dump_printf_loc (MSG_NOTE, input_location,
>>>>                                   "corrupted profile info: edge count exceeds maximal count");
>>>>                         informed = 1;
>>>>                       }
>>>> @@ -692,10 +692,11 @@ compute_branch_probabilities (unsigned cfg_checksu
>>>>         {
>>>>           /* Inconsistency detected. Make it flow-consistent. */
>>>>           static int informed = 0;
>>>> -         if (informed == 0)
>>>> +         if (dump_enabled_p () && informed == 0)
>>>>             {
>>>>               informed = 1;
>>>> -             inform (input_location, "correcting inconsistent profile data");
>>>> +             dump_printf_loc (MSG_NOTE, input_location,
>>>> +                              "correcting inconsistent profile data");
>>>>             }
>>>>           correct_negative_edge_counts ();
>>>>           /* Set bb counts to the sum of the outgoing edge counts */
>>>> Index: passes.c
>>>> ===================================================================
>>>> --- passes.c    (revision 201461)
>>>> +++ passes.c    (working copy)
>>>> @@ -524,6 +524,11 @@ pass_manager::register_one_dump_file (struct opt_p
>>>>    flag_name = concat (prefix, name, num, NULL);
>>>>    glob_name = concat (prefix, name, NULL);
>>>>    optgroup_flags |= pass->optinfo_flags;
>>>> +  /* For any passes that do not have an optgroup set, and which are not
>>>> +     IPA passes setup above, set the optgroup to OPTGROUP_OTHER so that
>>>> +     any dump messages are emitted properly under -fopt-info(-optall).  */
>>>> +  if (optgroup_flags == OPTGROUP_NONE)
>>>> +    optgroup_flags = OPTGROUP_OTHER;
>>>>    id = dump_register (dot_name, flag_name, glob_name, flags, optgroup_flags);
>>>>    set_pass_for_id (id, pass);
>>>>    full_name = concat (prefix, pass->name, num, NULL);
>>>> Index: value-prof.c
>>>> ===================================================================
>>>> --- value-prof.c        (revision 201461)
>>>> +++ value-prof.c        (working copy)
>>>> @@ -585,9 +585,11 @@ check_counter (gimple stmt, const char * name,
>>>>                : DECL_SOURCE_LOCATION (current_function_decl);
>>>>        if (flag_profile_correction)
>>>>          {
>>>> -         inform (locus, "correcting inconsistent value profile: "
>>>> -                 "%s profiler overall count (%d) does not match BB count "
>>>> -                  "(%d)", name, (int)*all, (int)bb_count);
>>>> +          if (dump_enabled_p ())
>>>> +            dump_printf_loc (MSG_MISSED_OPTIMIZATION, locus,
>>>> +                             "correcting inconsistent value profile: %s "
>>>> +                             "profiler overall count (%d) does not match BB "
>>>> +                             "count (%d)", name, (int)*all, (int)bb_count);
>>>>           *all = bb_count;
>>>>           if (*count > *all)
>>>>              *count = *all;
>>>> @@ -1209,9 +1211,11 @@ find_func_by_funcdef_no (int func_id)
>>>>    int max_id = get_last_funcdef_no ();
>>>>    if (func_id >= max_id || cgraph_node_map[func_id] == NULL)
>>>>      {
>>>> -      if (flag_profile_correction)
>>>> -        inform (DECL_SOURCE_LOCATION (current_function_decl),
>>>> -                "Inconsistent profile: indirect call target (%d) does not exist", func_id);
>>>> +      if (flag_profile_correction && dump_enabled_p ())
>>>> +        dump_printf_loc (MSG_MISSED_OPTIMIZATION,
>>>> +                         DECL_SOURCE_LOCATION (current_function_decl),
>>>> +                         "Inconsistent profile: indirect call target (%d) "
>>>> +                         "does not exist", func_id);
>>>>        else
>>>>          error ("Inconsistent profile: indirect call target (%d) does not exist", func_id);
>>>>
>>>> @@ -1235,8 +1239,10 @@ check_ic_target (gimple call_stmt, struct cgraph_n
>>>>       return true;
>>>>
>>>>     locus =  gimple_location (call_stmt);
>>>> -   inform (locus, "Skipping target %s with mismatching types for icall ",
>>>> -           cgraph_node_name (target));
>>>> +   if (dump_enabled_p ())
>>>> +     dump_printf_loc (MSG_MISSED_OPTIMIZATION, locus,
>>>> +                      "Skipping target %s with mismatching types for icall ",
>>>> +                      cgraph_node_name (target));
>>>>     return false;
>>>>  }
>>>>
>>>> Index: coverage.c
>>>> ===================================================================
>>>> --- coverage.c  (revision 201461)
>>>> +++ coverage.c  (working copy)
>>>> @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.  If not see
>>>>  #include "langhooks.h"
>>>>  #include "hash-table.h"
>>>>  #include "tree-iterator.h"
>>>> +#include "tree-pass.h"
>>>>  #include "cgraph.h"
>>>>  #include "dumpfile.h"
>>>>  #include "diagnostic-core.h"
>>>> @@ -341,11 +342,13 @@ get_coverage_counts (unsigned counter, unsigned ex
>>>>      {
>>>>        static int warned = 0;
>>>>
>>>> -      if (!warned++)
>>>> -       inform (input_location, (flag_guess_branch_prob
>>>> -                ? "file %s not found, execution counts estimated"
>>>> -                : "file %s not found, execution counts assumed to be zero"),
>>>> -               da_file_name);
>>>> +      if (!warned++ && dump_enabled_p ())
>>>> +       dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>>>> +                         (flag_guess_branch_prob
>>>> +                          ? "file %s not found, execution counts estimated"
>>>> +                          : "file %s not found, execution counts assumed to "
>>>> +                            "be zero"),
>>>> +                         da_file_name);
>>>>        return NULL;
>>>>      }
>>>>
>>>> @@ -369,21 +372,25 @@ get_coverage_counts (unsigned counter, unsigned ex
>>>>         warning_at (input_location, OPT_Wcoverage_mismatch,
>>>>                     "the control flow of function %qE does not match "
>>>>                     "its profile data (counter %qs)", id, ctr_names[counter]);
>>>> -      if (warning_printed)
>>>> +      if (warning_printed && dump_enabled_p ())
>>>>         {
>>>> -        inform (input_location, "use -Wno-error=coverage-mismatch to tolerate "
>>>> -                "the mismatch but performance may drop if the function is hot");
>>>> +          dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>>>> +                           "use -Wno-error=coverage-mismatch to tolerate "
>>>> +                           "the mismatch but performance may drop if the "
>>>> +                           "function is hot");
>>>>
>>>>           if (!seen_error ()
>>>>               && !warned++)
>>>>             {
>>>> -             inform (input_location, "coverage mismatch ignored");
>>>> -             inform (input_location, flag_guess_branch_prob
>>>> -                     ? G_("execution counts estimated")
>>>> -                     : G_("execution counts assumed to be zero"));
>>>> +             dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>>>> +                               "coverage mismatch ignored");
>>>> +             dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>>>> +                               flag_guess_branch_prob
>>>> +                               ? G_("execution counts estimated")
>>>> +                               : G_("execution counts assumed to be zero"));
>>>>               if (!flag_guess_branch_prob)
>>>> -               inform (input_location,
>>>> -                       "this can result in poorly optimized code");
>>>> +               dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>>>> +                                 "this can result in poorly optimized code");
>>>>             }
>>>>         }
>>>>
>>>> @@ -1103,6 +1110,11 @@ coverage_init (const char *filename)
>>>>    int len = strlen (filename);
>>>>    int prefix_len = 0;
>>>>
>>>> +  /* Since coverage_init is invoked very early, before the pass
>>>> +     manager, we need to set up the dumping explicitly. This is
>>>> +     similar to the handling in finish_optimization_passes.  */
>>>> +  dump_start (pass_profile.pass.static_pass_number, NULL);
>>>> +
>>>>    if (!profile_data_prefix && !IS_ABSOLUTE_PATH (filename))
>>>>      profile_data_prefix = getpwd ();
>>>>
>>>> @@ -1145,6 +1157,8 @@ coverage_init (const char *filename)
>>>>           gcov_write_unsigned (bbg_file_stamp);
>>>>         }
>>>>      }
>>>> +
>>>> +  dump_finish (pass_profile.pass.static_pass_number);
>>>>  }
>>>>
>>>>  /* Performs file-level cleanup.  Close notes file, generate coverage
>>>> Index: ipa-inline.c
>>>> ===================================================================
>>>> --- ipa-inline.c        (revision 201461)
>>>> +++ ipa-inline.c        (working copy)
>>>> @@ -1322,7 +1322,7 @@ recursive_inlining (struct cgraph_edge *edge,
>>>>            reset_edge_growth_cache (curr);
>>>>         }
>>>>
>>>> -      inline_call (curr, false, new_edges, &overall_size, true);
>>>> +      inline_call (curr, false, new_edges, &overall_size, true, false);
>>>>        lookup_recursive_calls (node, curr->callee, heap);
>>>>        n++;
>>>>      }
>>>> @@ -1612,7 +1612,8 @@ inline_small_functions (void)
>>>>             fprintf (dump_file, " Peeling recursion with depth %i\n", depth);
>>>>
>>>>           gcc_checking_assert (!callee->global.inlined_to);
>>>> -         inline_call (edge, true, &new_indirect_edges, &overall_size, true);
>>>> +         inline_call (edge, true, &new_indirect_edges, &overall_size, true,
>>>> +                       false);
>>>>           if (flag_indirect_inlining)
>>>>             add_new_edges_to_heap (edge_heap, new_indirect_edges);
>>>>
>>>> @@ -1733,7 +1734,7 @@ flatten_function (struct cgraph_node *node, bool e
>>>>                  xstrdup (cgraph_node_name (callee)),
>>>>                  xstrdup (cgraph_node_name (e->caller)));
>>>>        orig_callee = callee;
>>>> -      inline_call (e, true, NULL, NULL, false);
>>>> +      inline_call (e, true, NULL, NULL, false, early);
>>>>        if (e->callee != orig_callee)
>>>>         orig_callee->symbol.aux = (void *) node;
>>>>        flatten_function (e->callee, early);
>>>> @@ -1852,7 +1853,8 @@ ipa_inline (void)
>>>>                                    inline_summary (node->callers->caller)->size);
>>>>                         }
>>>>
>>>> -                     inline_call (node->callers, true, NULL, NULL, true);
>>>> +                     inline_call (node->callers, true, NULL, NULL, true,
>>>> +                                   false);
>>>>                       if (dump_file)
>>>>                         fprintf (dump_file,
>>>>                                  " Inlined into %s which now has %i size\n",
>>>> @@ -1925,7 +1927,7 @@ inline_always_inline_functions (struct cgraph_node
>>>>         fprintf (dump_file, "  Inlining %s into %s (always_inline).\n",
>>>>                  xstrdup (cgraph_node_name (e->callee)),
>>>>                  xstrdup (cgraph_node_name (e->caller)));
>>>> -      inline_call (e, true, NULL, NULL, false);
>>>> +      inline_call (e, true, NULL, NULL, false, true);
>>>>        inlined = true;
>>>>      }
>>>>    if (inlined)
>>>> @@ -1977,7 +1979,7 @@ early_inline_small_functions (struct cgraph_node *
>>>>         fprintf (dump_file, " Inlining %s into %s.\n",
>>>>                  xstrdup (cgraph_node_name (callee)),
>>>>                  xstrdup (cgraph_node_name (e->caller)));
>>>> -      inline_call (e, true, NULL, NULL, true);
>>>> +      inline_call (e, true, NULL, NULL, true, true);
>>>>        inlined = true;
>>>>      }
>>>>
>>>> Index: ipa-inline.h
>>>> ===================================================================
>>>> --- ipa-inline.h        (revision 201461)
>>>> +++ ipa-inline.h        (working copy)
>>>> @@ -228,7 +228,8 @@ void free_growth_caches (void);
>>>>  void compute_inline_parameters (struct cgraph_node *, bool);
>>>>
>>>>  /* In ipa-inline-transform.c  */
>>>> -bool inline_call (struct cgraph_edge *, bool, vec<cgraph_edge_p> *, int *, bool);
>>>> +bool inline_call (struct cgraph_edge *, bool, vec<cgraph_edge_p> *, int *,
>>>> +                  bool, bool);
>>>>  unsigned int inline_transform (struct cgraph_node *);
>>>>  void clone_inlined_nodes (struct cgraph_edge *e, bool, bool, int *);
>>>>
>>>> Index: testsuite/gcc.dg/pr40209.c
>>>> ===================================================================
>>>> --- testsuite/gcc.dg/pr40209.c  (revision 201461)
>>>> +++ testsuite/gcc.dg/pr40209.c  (working copy)
>>>> @@ -1,5 +1,5 @@
>>>>  /* { dg-do compile } */
>>>> -/* { dg-options "-O2 -fprofile-use" } */
>>>> +/* { dg-options "-O2 -fprofile-use -fopt-info" } */
>>>>
>>>>  void process(const char *s);
>>>>
>>>> Index: testsuite/gcc.dg/pr26570.c
>>>> ===================================================================
>>>> --- testsuite/gcc.dg/pr26570.c  (revision 201461)
>>>> +++ testsuite/gcc.dg/pr26570.c  (working copy)
>>>> @@ -1,5 +1,5 @@
>>>>  /* { dg-do compile } */
>>>> -/* { dg-options "-O2 -fprofile-generate -fprofile-use" } */
>>>> +/* { dg-options "-O2 -fprofile-generate -fprofile-use -fopt-info" } */
>>>>
>>>>  unsigned test (unsigned a, unsigned b)
>>>>  {
>>>> Index: testsuite/gcc.dg/pr32773.c
>>>> ===================================================================
>>>> --- testsuite/gcc.dg/pr32773.c  (revision 201461)
>>>> +++ testsuite/gcc.dg/pr32773.c  (working copy)
>>>> @@ -1,6 +1,6 @@
>>>>  /* { dg-do compile } */
>>>> -/* { dg-options "-O -fprofile-use" } */
>>>> -/* { dg-options "-O -m4 -fprofile-use" { target sh-*-* } } */
>>>> +/* { dg-options "-O -fprofile-use -fopt-info" } */
>>>> +/* { dg-options "-O -m4 -fprofile-use -fopt-info" { target sh-*-* } } */
>>>>
>>>>  void foo (int *p)
>>>>  {
>>>> Index: testsuite/g++.dg/tree-ssa/dom-invalid.C
>>>> ===================================================================
>>>> --- testsuite/g++.dg/tree-ssa/dom-invalid.C     (revision 201461)
>>>> +++ testsuite/g++.dg/tree-ssa/dom-invalid.C     (working copy)
>>>> @@ -1,7 +1,7 @@
>>>>  // PR tree-optimization/39557
>>>>  // invalid post-dom info leads to infinite loop
>>>>  // { dg-do run }
>>>> -// { dg-options "-Wall -fno-exceptions -O2 -fprofile-use -fno-rtti" }
>>>> +// { dg-options "-Wall -fno-exceptions -O2 -fprofile-use -fopt-info -fno-rtti" }
>>>>
>>>>  struct C
>>>>  {
>>>> Index: testsuite/gcc.dg/inline-dump.c
>>>> ===================================================================
>>>> --- testsuite/gcc.dg/inline-dump.c      (revision 0)
>>>> +++ testsuite/gcc.dg/inline-dump.c      (revision 0)
>>>> @@ -0,0 +1,11 @@
>>>> +/* Verify that -fopt-info can output correct inline info.  */
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-Wall -fopt-info-inline=stderr -O2 -fno-early-inlining" } */
>>>> +static inline int leaf() {
>>>> +  int i, ret = 0;
>>>> +  for (i = 0; i < 10; i++)
>>>> +    ret += i;
>>>> +  return ret;
>>>> +}
>>>> +static inline int foo(void) { return leaf(); } /* { dg-message "note: leaf .*inlined into bar .*via inline instance foo.*\n" } */
>>>> +int bar(void) { return foo(); }
>>>>>
>>>>> Thanks,
>>>>> Teresa
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Martin
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
Richard Biener Aug. 28, 2013, 11:01 a.m. UTC | #5
On Wed, Aug 7, 2013 at 7:23 AM, Teresa Johnson <tejohnson@google.com> wrote:
> On Tue, Aug 6, 2013 at 9:29 AM, Teresa Johnson <tejohnson@google.com> wrote:
>> On Tue, Aug 6, 2013 at 9:01 AM, Martin Jambor <mjambor@suse.cz> wrote:
>>> Hi,
>>>
>>> On Tue, Aug 06, 2013 at 07:14:42AM -0700, Teresa Johnson wrote:
>>>> On Tue, Aug 6, 2013 at 5:37 AM, Martin Jambor <mjambor@suse.cz> wrote:
>>>> > On Mon, Aug 05, 2013 at 10:37:00PM -0700, Teresa Johnson wrote:
>>>> >> This patch ports messages to the new dump framework,
>>>> >
>>>> > It would be great if this new framework was documented somewhere.  I lost
>>>> > track of what was agreed it would be and from the uses in the
>>>> > vectorizer I was never quite sure how to utilize it in other passes.
>>>>
>>>> Cc'ing Sharad who implemented this - Sharad, is this documented on a
>>>> wiki or elsewhere?
>>>
>>> Thanks
>>>
>>>>
>>>> >
>>>> > I'd also like to point out two other minor things inline:
>>>> >
>>>> > [...]
>>>> >
>>>> >> 2013-08-06  Teresa Johnson  <tejohnson@google.com>
>>>> >>             Dehao Chen  <dehao@google.com>
>>>> >>
>>>> >>         * dumpfile.c (dump_loc): Add column number to output, make newlines
>>>> >>         consistent.
>>>> >>         * dumpfile.h (OPTGROUP_OTHER): Add and enable under OPTGROUP_ALL.
>>>> >>         * ipa-inline-transform.c (clone_inlined_nodes):
>>>> >>         (cgraph_node_opt_info): New function.
>>>> >>         (cgraph_node_call_chain): Ditto.
>>>> >>         (dump_inline_decision): Ditto.
>>>> >>         (inline_call): Invoke dump_inline_decision.
>>>> >>         * doc/invoke.texi: Document optall -fopt-info flag.
>>>> >>         * profile.c (read_profile_edge_counts): Use new dump framework.
>>>> >>         (compute_branch_probabilities): Ditto.
>>>> >>         * passes.c (pass_manager::register_one_dump_file): Use OPTGROUP_OTHER
>>>> >>         when pass not in any opt group.
>>>> >>         * value-prof.c (check_counter): Use new dump framework.
>>>> >>         (find_func_by_funcdef_no): Ditto.
>>>> >>         (check_ic_target): Ditto.
>>>> >>         * coverage.c (get_coverage_counts): Ditto.
>>>> >>         (coverage_init): Setup new dump framework.
>>>> >>         * ipa-inline.c (inline_small_functions): Set is_in_ipa_inline.
>>>> >>         * ipa-inline.h (is_in_ipa_inline): Declare.
>>>> >>
>>>> >>         * testsuite/gcc.dg/pr40209.c: Use -fopt-info.
>>>> >>         * testsuite/gcc.dg/pr26570.c: Ditto.
>>>> >>         * testsuite/gcc.dg/pr32773.c: Ditto.
>>>> >>         * testsuite/g++.dg/tree-ssa/dom-invalid.C (struct C): Ditto.
>>>> >>
>>>> >
>>>> > [...]
>>>> >
>>>> >> Index: ipa-inline-transform.c
>>>> >> ===================================================================
>>>> >> --- ipa-inline-transform.c      (revision 201461)
>>>> >> +++ ipa-inline-transform.c      (working copy)
>>>> >> @@ -192,6 +192,108 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d
>>>> >>  }
>>>> >>
>>>> >>
>>>> >> +#define MAX_INT_LENGTH 20
>>>> >> +
>>>> >> +/* Return NODE's name and profile count, if available.  */
>>>> >> +
>>>> >> +static const char *
>>>> >> +cgraph_node_opt_info (struct cgraph_node *node)
>>>> >> +{
>>>> >> +  char *buf;
>>>> >> +  size_t buf_size;
>>>> >> +  const char *bfd_name = lang_hooks.dwarf_name (node->symbol.decl, 0);
>>>> >> +
>>>> >> +  if (!bfd_name)
>>>> >> +    bfd_name = "unknown";
>>>> >> +
>>>> >> +  buf_size = strlen (bfd_name) + 1;
>>>> >> +  if (profile_info)
>>>> >> +    buf_size += (MAX_INT_LENGTH + 3);
>>>> >> +
>>>> >> +  buf = (char *) xmalloc (buf_size);
>>>> >> +
>>>> >> +  strcpy (buf, bfd_name);
>>>> >> +
>>>> >> +  if (profile_info)
>>>> >> +    sprintf (buf, "%s ("HOST_WIDEST_INT_PRINT_DEC")", buf, node->count);
>>>> >> +  return buf;
>>>> >> +}
>>>> >
>>>> > I'm not sure if output of this function is aimed only at the user or
>>>> > if it is supposed to be used by gcc developers as well.  If the
>>>> > latter, an incredibly useful thing is to also dump node->symbol.order
>>>> > too.  We usually dump it after "/" sign separating it from node name.
>>>> > It is invaluable when examining decisions in C++ code where you can
>>>> > have lots of clones of a node (and also because existing dumps print
>>>> > it, it is easy to combine them).
>>>>
>>>> The output is useful both for power users doing performance tuning of
>>>> their application and for gcc developers. Adding the id is not so
>>>> useful for the former, but I agree that it is very useful for compiler
>>>> developers. In fact, in the google branch version we emit more verbose
>>>> information (the lipo module id and the funcdef_no) to help uniquely
>>>> identify the routines and to aid in post-processing by humans and
>>>> tools. So it is probably useful to add something similar here too. Is
>>>> the node->symbol.order more or less unique than the funcdef_no? I see
>>>> that you added a patch a few months ago to print the
>>>> node->symbol.order in the function header, and it also has the
>>>> advantage as you note of matching up with existing ipa dumps.
>>>
>>> node->symbol.order is unique and if I remember correctly, it is not
>>> even recycled.  Clones, inline clones, thunks, every symbol table node
>>> gets its own symbol order so it should be more unique than funcdef_no.
>>> On the other hand it may be a bit cryptic for users but at the same
>>> time it is only one number.
>>
>> Ok, I am going to go ahead and add this to the output.
>>
>>>
>>>>
>>>> >
>>>> > [...]
>>>> >
>>>> >> Index: ipa-inline.c
>>>> >> ===================================================================
>>>> >> --- ipa-inline.c        (revision 201461)
>>>> >> +++ ipa-inline.c        (working copy)
>>>> >> @@ -118,6 +118,9 @@ along with GCC; see the file COPYING3.  If not see
>>>> >>  static int overall_size;
>>>> >>  static gcov_type max_count;
>>>> >>
>>>> >> +/* Global variable to denote if it is in ipa-inline pass. */
>>>> >> +bool is_in_ipa_inline = false;
>>>> >> +
>>>> >>  /* Return false when inlining edge E would lead to violating
>>>> >>     limits on function unit growth or stack usage growth.
>>>> >>
>>>> >
>>>> > In this age of removing global variables, are you sure you need this?
>>>> > The only user of this seems to be a function that is only being called
>>>> > from inline_call... can that ever happen when not inlining?  If you
>>>> > plan to use this function also elsewhere, perhaps the callers will
>>>> > know whether we are inlining or not and can provide this in a
>>>> > parameter?
>>>>
>>>> This is to distinguish early inlining from ipa inlining.
>>>
>>> Oh, right, I did not realize that the IPA part was the important bit
>>> of the name.
>>>
>>>> The volume of
>>>> early inlining messages is too high to be on for the default setting
>>>> of -fopt-info, and they are usually not as interesting for performance
>>>> tuning. The dumper will only emit the early inline messages under a
>>>> more verbose setting (MSG_NOTE):
>>>>       dump_printf_loc (is_in_ipa_inline ? MSG_OPTIMIZED_LOCATIONS : MSG_NOTE ...
>>>> The other way I can see to distinguish this would be to check the
>>>> always_inline_functions_inlined flag on the caller's function. It
>>>> could also be possible to pass down a flag from the callers of
>>>> inline_call, but at least one caller (flatten_functions) is shared
>>>> between early and late inlining, so the flag needs to be passed
>>>> through that as well. WDYT?
>>>
>>> Did you mean flatten_function?  It already has a bool "early"
>>> parameter.  But I can see that being able to quickly figure out
>>> whether we are in early inliner or ipa inliner without much hassle is
>>> useful enough to justify a global variable a month ago, however I
>>> suppose we should not be introducing them now and so you'd have to put
>>> such stuff into... well, you'd probably have to put into the universe
>>> object somewhere because it is basically shared between two passes.
>>> Another option, even though somewhat hackish, would be to look at
>>> current_pass and see which pass it is.  I don't know, do what is
>>> easier or what you like more, just be aware of the problem.
>>
>> After thinking about this some more, I think passing down an early
>> flag from callers is the cleanest way to go.
>>
>> I'll fix these and post a new patch later today.
>
> New patch below that removes this global variable, and also outputs
> the node->symbol.order (in square brackets after the function name so
> as to not clutter it). Inline messages with profile data look like:
>
> test.c:8:3: note: foobar [0] (99999000) inlined into foo [2] (1000)
> with call count 99999000 (via inline instance bar [3] (99999000))

Ick.  This looks both redundant and cluttered.  This is supposed to be
understandable by GCC users, not only GCC developers.

> (without FDO the counts in parentheses and the call count would not be
> included).
>
> Ok for trunk?

Let's split this patch.

> Thanks,
> Teresa
>
> 2013-08-06  Teresa Johnson  <tejohnson@google.com>
>             Dehao Chen  <dehao@google.com>
>
>         * dumpfile.c (dump_loc): Output column number, make newlines consistent.

I don't like column numbers; they are generally not of much use.  Does
'make newlines consistent' avoid all the spurious vertical spacing I see with
-fopt-info?
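
To make the two prefix styles concrete, here is a minimal standalone sketch
in plain C. It is not GCC code: format_loc_prefix is a hypothetical helper,
and treating a column of 0 as "no column" is an assumption made only for
this illustration. It prints the "file:line:column: note: " form that the
patch introduces, or the older "file:line: note: " form.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>   /* only needed by callers that compare the result */

/* Hypothetical helper, not part of dumpfile.c.  Write the location prefix
   for a dump message into BUF (at most LEN bytes including the NUL).
   A COLUMN of 0 is treated as "unknown" and omitted; that convention is
   an assumption of this sketch.  Returns what snprintf returns.  */
static int
format_loc_prefix (char *buf, size_t len, const char *file, int line,
                   int column)
{
  assert (buf != NULL && file != NULL);

  if (column > 0)
    return snprintf (buf, len, "%s:%d:%d: note: ", file, line, column);

  return snprintf (buf, len, "%s:%d: note: ", file, line);
}
```

With file "test.c", line 8 and column 3 this yields the prefix shown in the
sample message quoted above; passing 0 for the column gives the shorter form.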

>         * dumpfile.h (OPTGROUP_OTHER): Add and enable under OPTGROUP_ALL.

Good change - please split this out (with the related changes) and commit it.
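
For reference, the group handling can be sketched as below. This is a
simplified standalone illustration, not the passes.c code: the values of
OPTGROUP_NONE and OPTGROUP_IPA are assumed (they are not shown in the
dumpfile.h hunk), and effective_optgroup and optgroup_selected are
hypothetical helper names. It ignores the IPA setup that
register_one_dump_file does before the fallback.

```c
#include <assert.h>

#define OPTGROUP_NONE    (0)        /* assumed value, not in the hunk */
#define OPTGROUP_IPA     (1 << 1)   /* assumed value, not in the hunk */
#define OPTGROUP_LOOP    (1 << 2)   /* Loop optimization passes */
#define OPTGROUP_INLINE  (1 << 3)   /* Inlining passes */
#define OPTGROUP_VEC     (1 << 4)   /* Vectorization passes */
#define OPTGROUP_OTHER   (1 << 5)   /* All other passes */
#define OPTGROUP_ALL     (OPTGROUP_IPA | OPTGROUP_LOOP | OPTGROUP_INLINE \
                          | OPTGROUP_VEC | OPTGROUP_OTHER)

/* Mirror of the fallback added to register_one_dump_file: a pass with no
   group set is filed under OPTGROUP_OTHER.  */
static int
effective_optgroup (int pass_flags)
{
  return pass_flags == OPTGROUP_NONE ? OPTGROUP_OTHER : pass_flags;
}

/* Return nonzero if a pass with PASS_FLAGS is covered by the REQUESTED
   set of groups.  */
static int
optgroup_selected (int pass_flags, int requested)
{
  return (effective_optgroup (pass_flags) & requested) != 0;
}
```

The point of the fallback is that a pass without any group still matches
OPTGROUP_ALL, while a request for a single group such as OPTGROUP_VEC
continues to exclude it.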

>         * ipa-inline-transform.c (cgraph_node_opt_info): New function.
>         (cgraph_node_call_chain): Ditto.
>         (dump_inline_decision): Ditto.
>         (inline_call): Invoke dump_inline_decision, new parameter.

The inline stuff should be split and re-sent; it's non-obvious to me (the extra
function parameters are not documented, for example).  I'd rather have something
like inline_and_report_call () instead of an extra bool parameter.
But let's iterate over this once it's split out.
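
For the sake of discussion, a wrapper along these lines might look roughly
like the following. This is a standalone sketch with stand-in types, not GCC
code: struct edge_stub, inline_call_stub and the report format are all
hypothetical, and only the name inline_and_report_call comes from the
suggestion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>   /* only needed by callers that compare the report */

/* Hypothetical stand-in for struct cgraph_edge.  */
struct edge_stub
{
  const char *callee;
  const char *caller;
  bool inlined;
};

/* Hypothetical stand-in for inline_call: keeps its original signature and
   knows nothing about reporting.  */
static bool
inline_call_stub (struct edge_stub *e)
{
  e->inlined = true;
  return true;
}

/* Write a one-line report about E into BUF (LEN bytes), then inline it.
   EARLY selects how the decision is labelled.  The report is produced
   before the edge is modified, as in the patch.  */
static bool
inline_and_report_call (struct edge_stub *e, bool early, char *buf,
                        size_t len)
{
  assert (e != NULL && buf != NULL);
  snprintf (buf, len, "%s: %s inlined into %s",
            early ? "early" : "ipa", e->callee, e->caller);
  return inline_call_stub (e);
}
```

Callers that want a report would call the wrapper; everything else keeps
calling the plain function, so no existing call site needs a new argument.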

>         * doc/invoke.texi: Document optall -fopt-info flag.
>         * profile.c (read_profile_edge_counts): Use new dump framework.
>         (compute_branch_probabilities): Ditto.
>         * passes.c (pass_manager::register_one_dump_file): Use OPTGROUP_OTHER
>         when pass not in any opt group.
>         * value-prof.c (check_counter): Use new dump framework.
>         (find_func_by_funcdef_no): Ditto.
>         (check_ic_target): Ditto.
>         * coverage.c (get_coverage_counts): Ditto.
>         (coverage_init): Setup new dump framework.

These pieces look good to me.

>         * ipa-inline.c (recursive_inlining): New inline_call parameter.
>         (inline_small_functions): Ditto.
>         (flatten_function): Ditto.
>         (ipa_inline): Ditto.
>         (inline_always_inline_functions): Ditto.
>         (early_inline_small_functions): Ditto.
>         * ipa-inline.h: Ditto.
>
>         * testsuite/gcc.dg/pr40209.c: Use -fopt-info.
>         * testsuite/gcc.dg/pr26570.c: Ditto.
>         * testsuite/gcc.dg/pr32773.c: Ditto.
>         * testsuite/g++.dg/tree-ssa/dom-invalid.C: Ditto.

Why?  Just remove the stray dg- annotations that deal with the unwanted output?

Thanks,
Richard.

>         * testsuite/gcc.dg/inline-dump.c: New test.
>
> Index: dumpfile.c
> ===================================================================
> --- dumpfile.c  (revision 201461)
> +++ dumpfile.c  (working copy)
> @@ -257,16 +257,18 @@ dump_open_alternate_stream (struct dump_file_info
>  void
>  dump_loc (int dump_kind, FILE *dfile, source_location loc)
>  {
> -  /* Currently vectorization passes print location information.  */
>    if (dump_kind)
>      {
> +      /* Ensure dump message starts on a new line.  */
> +      fprintf (dfile, "\n");
>        if (LOCATION_LOCUS (loc) > BUILTINS_LOCATION)
> -        fprintf (dfile, "\n%s:%d: note: ", LOCATION_FILE (loc),
> -                 LOCATION_LINE (loc));
> +        fprintf (dfile, "%s:%d:%d: note: ", LOCATION_FILE (loc),
> +                 LOCATION_LINE (loc), LOCATION_COLUMN (loc));
>        else if (current_function_decl)
> -        fprintf (dfile, "\n%s:%d: note: ",
> +        fprintf (dfile, "%s:%d:%d: note: ",
>                   DECL_SOURCE_FILE (current_function_decl),
> -                 DECL_SOURCE_LINE (current_function_decl));
> +                 DECL_SOURCE_LINE (current_function_decl),
> +                 DECL_SOURCE_COLUMN (current_function_decl));
>      }
>  }
>
> Index: dumpfile.h
> ===================================================================
> --- dumpfile.h  (revision 201461)
> +++ dumpfile.h  (working copy)
> @@ -97,8 +97,9 @@ enum tree_dump_index
>  #define OPTGROUP_LOOP        (1 << 2)   /* Loop optimization passes */
>  #define OPTGROUP_INLINE      (1 << 3)   /* Inlining passes */
>  #define OPTGROUP_VEC         (1 << 4)   /* Vectorization passes */
> +#define OPTGROUP_OTHER       (1 << 5)   /* All other passes */
>  #define OPTGROUP_ALL        (OPTGROUP_IPA | OPTGROUP_LOOP | OPTGROUP_INLINE \
> -                              | OPTGROUP_VEC)
> +                              | OPTGROUP_VEC | OPTGROUP_OTHER)
>
>  /* Define a tree dump switch.  */
>  struct dump_file_info
> Index: ipa-inline-transform.c
> ===================================================================
> --- ipa-inline-transform.c      (revision 201461)
> +++ ipa-inline-transform.c      (working copy)
> @@ -192,6 +192,111 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d
>  }
>
>
> +#define MAX_INT_LENGTH 20
> +
> +/* Return NODE's name and profile count, if available.  */
> +
> +static const char *
> +cgraph_node_opt_info (struct cgraph_node *node)
> +{
> +  char *buf;
> +  size_t buf_size;
> +  const char *bfd_name = lang_hooks.dwarf_name (node->symbol.decl, 0);
> +
> +  if (!bfd_name)
> +    bfd_name = "unknown";
> +
> +  buf_size = strlen (bfd_name) + 1;
> +  if (profile_info)
> +    buf_size += (MAX_INT_LENGTH + 3);
> +  buf_size += MAX_INT_LENGTH;
> +
> +  buf = (char *) xmalloc (buf_size);
> +
> +  strcpy (buf, bfd_name);
> +  //sprintf (buf, "%s/%i", buf, node->symbol.order);
> +  sprintf (buf, "%s [%i]", buf, node->symbol.order);
> +
> +  if (profile_info)
> +    sprintf (buf, "%s ("HOST_WIDEST_INT_PRINT_DEC")", buf, node->count);
> +  return buf;
> +}
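
A note on the sprintf calls in cgraph_node_opt_info above: buf is passed both
as the destination and as a "%s" argument, so source and destination overlap,
which ISO C leaves undefined for sprintf. A minimal standalone sketch that
formats the string in one pass instead is shown below. It is not GCC code:
format_node_info is a hypothetical name, have_profile stands in for the
profile_info check, and long long stands in for the widest host integer type.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>   /* only needed by callers that compare the result */

/* Hypothetical helper: build "NAME [ORDER]" or "NAME [ORDER] (COUNT)".
   The caller owns the returned buffer and must free it.  Returns NULL if
   allocation fails.  */
static char *
format_node_info (const char *name, int order, int have_profile,
                  long long count)
{
  int len;
  char *buf;

  if (!name)
    name = "unknown";

  /* First pass: snprintf with a zero size only computes the length.  */
  len = have_profile
        ? snprintf (NULL, 0, "%s [%d] (%lld)", name, order, count)
        : snprintf (NULL, 0, "%s [%d]", name, order);
  assert (len >= 0);

  buf = (char *) malloc ((size_t) len + 1);
  if (!buf)
    return NULL;

  /* Second pass: write into a buffer that no argument aliases.  */
  if (have_profile)
    snprintf (buf, (size_t) len + 1, "%s [%d] (%lld)", name, order, count);
  else
    snprintf (buf, (size_t) len + 1, "%s [%d]", name, order);
  return buf;
}
```

Measuring first also removes the need for a fixed MAX_INT_LENGTH estimate,
at the cost of formatting twice.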
> +
> +
> +/* Return CALLER's inlined call chain. Save the cgraph_node of the ultimate
> +   function that the caller is inlined to in FINAL_CALLER.  */
> +
> +static const char *
> +cgraph_node_call_chain (struct cgraph_node *caller,
> +                       struct cgraph_node **final_caller)
> +{
> +  struct cgraph_node *node;
> +  const char *via_str = " (via inline instance";
> +  size_t current_string_len = strlen (via_str) + 1;
> +  size_t buf_size = current_string_len;
> +  char *buf = (char *) xmalloc (buf_size);
> +
> +  buf[0] = 0;
> +  gcc_assert (caller->global.inlined_to != NULL);
> +  strcat (buf, via_str);
> +  for (node = caller; node->global.inlined_to != NULL;
> +       node = node->callers->caller)
> +    {
> +      const char *name = cgraph_node_opt_info (node);
> +      current_string_len += (strlen (name) + 1);
> +      if (current_string_len >= buf_size)
> +       {
> +         buf_size = current_string_len * 2;
> +         buf = (char *) xrealloc (buf, buf_size);
> +       }
> +      strcat (buf, " ");
> +      strcat (buf, name);
> +    }
> +  strcat (buf, ")");
> +  *final_caller = node;
> +  return buf;
> +}
> +
> +
> +/* Dump the inline decision of EDGE.  */
> +
> +static void
> +dump_inline_decision (struct cgraph_edge *edge, bool early)
> +{
> +  location_t locus;
> +  const char *inline_chain_text;
> +  const char *call_count_text;
> +  struct cgraph_node *final_caller = edge->caller;
> +
> +  if (final_caller->global.inlined_to != NULL)
> +    inline_chain_text = cgraph_node_call_chain (final_caller, &final_caller);
> +  else
> +    inline_chain_text = "";
> +
> +  if (edge->count > 0)
> +    {
> +      const char *call_count_str = " with call count ";
> +      char *buf = (char *) xmalloc (strlen (call_count_str) + MAX_INT_LENGTH);
> +      sprintf (buf, "%s"HOST_WIDEST_INT_PRINT_DEC, call_count_str,
> +              edge->count);
> +      call_count_text = buf;
> +    }
> +  else
> +    {
> +      call_count_text = "";
> +    }
> +
> +  locus = gimple_location (edge->call_stmt);
> +  dump_printf_loc (early ? MSG_NOTE : MSG_OPTIMIZED_LOCATIONS,
> +                   locus,
> +                   "%s inlined into %s%s%s\n",
> +                   cgraph_node_opt_info (edge->callee),
> +                   cgraph_node_opt_info (final_caller),
> +                   call_count_text,
> +                   inline_chain_text);
> +}
> +
> +
>  /* Mark edge E as inlined and update callgraph accordingly.  UPDATE_ORIGINAL
>     specify whether profile of original function should be updated.  If any new
>     indirect edges are discovered in the process, add them to NEW_EDGES, unless
> @@ -205,7 +310,8 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d
>  bool
>  inline_call (struct cgraph_edge *e, bool update_original,
>              vec<cgraph_edge_p> *new_edges,
> -            int *overall_size, bool update_overall_summary)
> +            int *overall_size, bool update_overall_summary,
> +             bool early)
>  {
>    int old_size = 0, new_size = 0;
>    struct cgraph_node *to = NULL;
> @@ -218,6 +324,9 @@ inline_call (struct cgraph_edge *e, bool update_or
>    bool predicated = inline_edge_summary (e)->predicate != NULL;
>  #endif
>
> +  if (dump_enabled_p ())
> +    dump_inline_decision (e, early);
> +
>    /* Don't inline inlined edges.  */
>    gcc_assert (e->inline_failed);
>    /* Don't even think of inlining inline clone.  */
> Index: doc/invoke.texi
> ===================================================================
> --- doc/invoke.texi     (revision 201461)
> +++ doc/invoke.texi     (working copy)
> @@ -6234,6 +6234,9 @@ Enable dumps from all loop optimizations.
>  Enable dumps from all inlining optimizations.
>  @item vec
>  Enable dumps from all vectorization optimizations.
> +@item optall
> +Enable dumps from all optimizations. This is a superset of
> +the optimization groups listed above.
>  @end table
>
>  For example,
> Index: profile.c
> ===================================================================
> --- profile.c   (revision 201461)
> +++ profile.c   (working copy)
> @@ -432,8 +432,8 @@ read_profile_edge_counts (gcov_type *exec_counts)
>                     if (flag_profile_correction)
>                       {
>                         static bool informed = 0;
> -                       if (!informed)
> -                         inform (input_location,
> +                       if (dump_enabled_p () && !informed)
> +                         dump_printf_loc (MSG_NOTE, input_location,
>                                   "corrupted profile info: edge count exceeds maximal count");
>                         informed = 1;
>                       }
> @@ -692,10 +692,11 @@ compute_branch_probabilities (unsigned cfg_checksu
>         {
>           /* Inconsistency detected. Make it flow-consistent. */
>           static int informed = 0;
> -         if (informed == 0)
> +         if (dump_enabled_p () && informed == 0)
>             {
>               informed = 1;
> -             inform (input_location, "correcting inconsistent profile data");
> +             dump_printf_loc (MSG_NOTE, input_location,
> +                              "correcting inconsistent profile data");
>             }
>           correct_negative_edge_counts ();
>           /* Set bb counts to the sum of the outgoing edge counts */
> Index: passes.c
> ===================================================================
> --- passes.c    (revision 201461)
> +++ passes.c    (working copy)
> @@ -524,6 +524,11 @@ pass_manager::register_one_dump_file (struct opt_p
>    flag_name = concat (prefix, name, num, NULL);
>    glob_name = concat (prefix, name, NULL);
>    optgroup_flags |= pass->optinfo_flags;
> +  /* For any passes that do not have an optgroup set, and which are not
> +     IPA passes setup above, set the optgroup to OPTGROUP_OTHER so that
> +     any dump messages are emitted properly under -fopt-info(-optall).  */
> +  if (optgroup_flags == OPTGROUP_NONE)
> +    optgroup_flags = OPTGROUP_OTHER;
>    id = dump_register (dot_name, flag_name, glob_name, flags, optgroup_flags);
>    set_pass_for_id (id, pass);
>    full_name = concat (prefix, pass->name, num, NULL);
> Index: value-prof.c
> ===================================================================
> --- value-prof.c        (revision 201461)
> +++ value-prof.c        (working copy)
> @@ -585,9 +585,11 @@ check_counter (gimple stmt, const char * name,
>                : DECL_SOURCE_LOCATION (current_function_decl);
>        if (flag_profile_correction)
>          {
> -         inform (locus, "correcting inconsistent value profile: "
> -                 "%s profiler overall count (%d) does not match BB count "
> -                  "(%d)", name, (int)*all, (int)bb_count);
> +          if (dump_enabled_p ())
> +            dump_printf_loc (MSG_MISSED_OPTIMIZATION, locus,
> +                             "correcting inconsistent value profile: %s "
> +                             "profiler overall count (%d) does not match BB "
> +                             "count (%d)", name, (int)*all, (int)bb_count);
>           *all = bb_count;
>           if (*count > *all)
>              *count = *all;
> @@ -1209,9 +1211,11 @@ find_func_by_funcdef_no (int func_id)
>    int max_id = get_last_funcdef_no ();
>    if (func_id >= max_id || cgraph_node_map[func_id] == NULL)
>      {
> -      if (flag_profile_correction)
> -        inform (DECL_SOURCE_LOCATION (current_function_decl),
> -                "Inconsistent profile: indirect call target (%d) does not exist", func_id);
> +      if (flag_profile_correction && dump_enabled_p ())
> +        dump_printf_loc (MSG_MISSED_OPTIMIZATION,
> +                         DECL_SOURCE_LOCATION (current_function_decl),
> +                         "Inconsistent profile: indirect call target (%d) "
> +                         "does not exist", func_id);
>        else
>          error ("Inconsistent profile: indirect call target (%d) does not exist", func_id);
>
> @@ -1235,8 +1239,10 @@ check_ic_target (gimple call_stmt, struct cgraph_n
>       return true;
>
>     locus =  gimple_location (call_stmt);
> -   inform (locus, "Skipping target %s with mismatching types for icall ",
> -           cgraph_node_name (target));
> +   if (dump_enabled_p ())
> +     dump_printf_loc (MSG_MISSED_OPTIMIZATION, locus,
> +                      "Skipping target %s with mismatching types for icall ",
> +                      cgraph_node_name (target));
>     return false;
>  }
>
> Index: coverage.c
> ===================================================================
> --- coverage.c  (revision 201461)
> +++ coverage.c  (working copy)
> @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "langhooks.h"
>  #include "hash-table.h"
>  #include "tree-iterator.h"
> +#include "tree-pass.h"
>  #include "cgraph.h"
>  #include "dumpfile.h"
>  #include "diagnostic-core.h"
> @@ -341,11 +342,13 @@ get_coverage_counts (unsigned counter, unsigned ex
>      {
>        static int warned = 0;
>
> -      if (!warned++)
> -       inform (input_location, (flag_guess_branch_prob
> -                ? "file %s not found, execution counts estimated"
> -                : "file %s not found, execution counts assumed to be zero"),
> -               da_file_name);
> +      if (!warned++ && dump_enabled_p ())
> +       dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
> +                         (flag_guess_branch_prob
> +                          ? "file %s not found, execution counts estimated"
> +                          : "file %s not found, execution counts assumed to "
> +                            "be zero"),
> +                         da_file_name);
>        return NULL;
>      }
>
> @@ -369,21 +372,25 @@ get_coverage_counts (unsigned counter, unsigned ex
>         warning_at (input_location, OPT_Wcoverage_mismatch,
>                     "the control flow of function %qE does not match "
>                     "its profile data (counter %qs)", id, ctr_names[counter]);
> -      if (warning_printed)
> +      if (warning_printed && dump_enabled_p ())
>         {
> -        inform (input_location, "use -Wno-error=coverage-mismatch to tolerate "
> -                "the mismatch but performance may drop if the function is hot");
> +          dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
> +                           "use -Wno-error=coverage-mismatch to tolerate "
> +                           "the mismatch but performance may drop if the "
> +                           "function is hot");
>
>           if (!seen_error ()
>               && !warned++)
>             {
> -             inform (input_location, "coverage mismatch ignored");
> -             inform (input_location, flag_guess_branch_prob
> -                     ? G_("execution counts estimated")
> -                     : G_("execution counts assumed to be zero"));
> +             dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
> +                               "coverage mismatch ignored");
> +             dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
> +                               flag_guess_branch_prob
> +                               ? G_("execution counts estimated")
> +                               : G_("execution counts assumed to be zero"));
>               if (!flag_guess_branch_prob)
> -               inform (input_location,
> -                       "this can result in poorly optimized code");
> +               dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
> +                                 "this can result in poorly optimized code");
>             }
>         }
>
> @@ -1103,6 +1110,11 @@ coverage_init (const char *filename)
>    int len = strlen (filename);
>    int prefix_len = 0;
>
> +  /* Since coverage_init is invoked very early, before the pass
> +     manager, we need to set up the dumping explicitly. This is
> +     similar to the handling in finish_optimization_passes.  */
> +  dump_start (pass_profile.pass.static_pass_number, NULL);
> +
>    if (!profile_data_prefix && !IS_ABSOLUTE_PATH (filename))
>      profile_data_prefix = getpwd ();
>
> @@ -1145,6 +1157,8 @@ coverage_init (const char *filename)
>           gcov_write_unsigned (bbg_file_stamp);
>         }
>      }
> +
> +  dump_finish (pass_profile.pass.static_pass_number);
>  }
>
>  /* Performs file-level cleanup.  Close notes file, generate coverage
> Index: ipa-inline.c
> ===================================================================
> --- ipa-inline.c        (revision 201461)
> +++ ipa-inline.c        (working copy)
> @@ -1322,7 +1322,7 @@ recursive_inlining (struct cgraph_edge *edge,
>            reset_edge_growth_cache (curr);
>         }
>
> -      inline_call (curr, false, new_edges, &overall_size, true);
> +      inline_call (curr, false, new_edges, &overall_size, true, false);
>        lookup_recursive_calls (node, curr->callee, heap);
>        n++;
>      }
> @@ -1612,7 +1612,8 @@ inline_small_functions (void)
>             fprintf (dump_file, " Peeling recursion with depth %i\n", depth);
>
>           gcc_checking_assert (!callee->global.inlined_to);
> -         inline_call (edge, true, &new_indirect_edges, &overall_size, true);
> +         inline_call (edge, true, &new_indirect_edges, &overall_size, true,
> +                       false);
>           if (flag_indirect_inlining)
>             add_new_edges_to_heap (edge_heap, new_indirect_edges);
>
> @@ -1733,7 +1734,7 @@ flatten_function (struct cgraph_node *node, bool e
>                  xstrdup (cgraph_node_name (callee)),
>                  xstrdup (cgraph_node_name (e->caller)));
>        orig_callee = callee;
> -      inline_call (e, true, NULL, NULL, false);
> +      inline_call (e, true, NULL, NULL, false, early);
>        if (e->callee != orig_callee)
>         orig_callee->symbol.aux = (void *) node;
>        flatten_function (e->callee, early);
> @@ -1852,7 +1853,8 @@ ipa_inline (void)
>                                    inline_summary (node->callers->caller)->size);
>                         }
>
> -                     inline_call (node->callers, true, NULL, NULL, true);
> +                     inline_call (node->callers, true, NULL, NULL, true,
> +                                   false);
>                       if (dump_file)
>                         fprintf (dump_file,
>                                  " Inlined into %s which now has %i size\n",
> @@ -1925,7 +1927,7 @@ inline_always_inline_functions (struct cgraph_node
>         fprintf (dump_file, "  Inlining %s into %s (always_inline).\n",
>                  xstrdup (cgraph_node_name (e->callee)),
>                  xstrdup (cgraph_node_name (e->caller)));
> -      inline_call (e, true, NULL, NULL, false);
> +      inline_call (e, true, NULL, NULL, false, true);
>        inlined = true;
>      }
>    if (inlined)
> @@ -1977,7 +1979,7 @@ early_inline_small_functions (struct cgraph_node *
>         fprintf (dump_file, " Inlining %s into %s.\n",
>                  xstrdup (cgraph_node_name (callee)),
>                  xstrdup (cgraph_node_name (e->caller)));
> -      inline_call (e, true, NULL, NULL, true);
> +      inline_call (e, true, NULL, NULL, true, true);
>        inlined = true;
>      }
>
> Index: ipa-inline.h
> ===================================================================
> --- ipa-inline.h        (revision 201461)
> +++ ipa-inline.h        (working copy)
> @@ -228,7 +228,8 @@ void free_growth_caches (void);
>  void compute_inline_parameters (struct cgraph_node *, bool);
>
>  /* In ipa-inline-transform.c  */
> -bool inline_call (struct cgraph_edge *, bool, vec<cgraph_edge_p> *, int *, bool);
> +bool inline_call (struct cgraph_edge *, bool, vec<cgraph_edge_p> *, int *,
> +                  bool, bool);
>  unsigned int inline_transform (struct cgraph_node *);
>  void clone_inlined_nodes (struct cgraph_edge *e, bool, bool, int *);
>
> Index: testsuite/gcc.dg/pr40209.c
> ===================================================================
> --- testsuite/gcc.dg/pr40209.c  (revision 201461)
> +++ testsuite/gcc.dg/pr40209.c  (working copy)
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fprofile-use" } */
> +/* { dg-options "-O2 -fprofile-use -fopt-info" } */
>
>  void process(const char *s);
>
> Index: testsuite/gcc.dg/pr26570.c
> ===================================================================
> --- testsuite/gcc.dg/pr26570.c  (revision 201461)
> +++ testsuite/gcc.dg/pr26570.c  (working copy)
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fprofile-generate -fprofile-use" } */
> +/* { dg-options "-O2 -fprofile-generate -fprofile-use -fopt-info" } */
>
>  unsigned test (unsigned a, unsigned b)
>  {
> Index: testsuite/gcc.dg/pr32773.c
> ===================================================================
> --- testsuite/gcc.dg/pr32773.c  (revision 201461)
> +++ testsuite/gcc.dg/pr32773.c  (working copy)
> @@ -1,6 +1,6 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O -fprofile-use" } */
> -/* { dg-options "-O -m4 -fprofile-use" { target sh-*-* } } */
> +/* { dg-options "-O -fprofile-use -fopt-info" } */
> +/* { dg-options "-O -m4 -fprofile-use -fopt-info" { target sh-*-* } } */
>
>  void foo (int *p)
>  {
> Index: testsuite/g++.dg/tree-ssa/dom-invalid.C
> ===================================================================
> --- testsuite/g++.dg/tree-ssa/dom-invalid.C     (revision 201461)
> +++ testsuite/g++.dg/tree-ssa/dom-invalid.C     (working copy)
> @@ -1,7 +1,7 @@
>  // PR tree-optimization/39557
>  // invalid post-dom info leads to infinite loop
>  // { dg-do run }
> -// { dg-options "-Wall -fno-exceptions -O2 -fprofile-use -fno-rtti" }
> +// { dg-options "-Wall -fno-exceptions -O2 -fprofile-use -fopt-info -fno-rtti" }
>
>  struct C
>  {
> Index: testsuite/gcc.dg/inline-dump.c
> ===================================================================
> --- testsuite/gcc.dg/inline-dump.c      (revision 0)
> +++ testsuite/gcc.dg/inline-dump.c      (revision 0)
> @@ -0,0 +1,11 @@
> +/* Verify that -fopt-info can output correct inline info.  */
> +/* { dg-do compile } */
> +/* { dg-options "-Wall -fopt-info-inline=stderr -O2 -fno-early-inlining" } */
> +static inline int leaf() {
> +  int i, ret = 0;
> +  for (i = 0; i < 10; i++)
> +    ret += i;
> +  return ret;
> +}
> +static inline int foo(void) { return leaf(); } /* { dg-message "note: leaf .*inlined into bar .*via inline instance foo.*\n" } */
> +int bar(void) { return foo(); }
>>
>> Thanks,
>> Teresa
>>
>>>
>>> Thanks,
>>>
>>> Martin
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
Teresa Johnson Aug. 28, 2013, 2:09 p.m. UTC | #6
On Wed, Aug 28, 2013 at 4:01 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Wed, Aug 7, 2013 at 7:23 AM, Teresa Johnson <tejohnson@google.com> wrote:
>> On Tue, Aug 6, 2013 at 9:29 AM, Teresa Johnson <tejohnson@google.com> wrote:
>>> On Tue, Aug 6, 2013 at 9:01 AM, Martin Jambor <mjambor@suse.cz> wrote:
>>>> Hi,
>>>>
>>>> On Tue, Aug 06, 2013 at 07:14:42AM -0700, Teresa Johnson wrote:
>>>>> On Tue, Aug 6, 2013 at 5:37 AM, Martin Jambor <mjambor@suse.cz> wrote:
>>>>> > On Mon, Aug 05, 2013 at 10:37:00PM -0700, Teresa Johnson wrote:
>>>>> >> This patch ports messages to the new dump framework,
>>>>> >
>>>>> > It would be great this new framework was documented somewhere.  I lost
>>>>> > track of what was agreed it would be and from the uses in the
>>>>> > vectorizer I was never quite sure how to utilize it in other passes.
>>>>>
>>>>> Cc'ing Sharad who implemented this - Sharad, is this documented on a
>>>>> wiki or elsewhere?
>>>>
>>>> Thanks
>>>>
>>>>>
>>>>> >
>>>>> > I'd also like to point out two other minor things inline:
>>>>> >
>>>>> > [...]
>>>>> >
>>>>> >> 2013-08-06  Teresa Johnson  <tejohnson@google.com>
>>>>> >>             Dehao Chen  <dehao@google.com>
>>>>> >>
>>>>> >>         * dumpfile.c (dump_loc): Add column number to output, make newlines
>>>>> >>         consistent.
>>>>> >>         * dumpfile.h (OPTGROUP_OTHER): Add and enable under OPTGROUP_ALL.
>>>>> >>         * ipa-inline-transform.c (clone_inlined_nodes):
>>>>> >>         (cgraph_node_opt_info): New function.
>>>>> >>         (cgraph_node_call_chain): Ditto.
>>>>> >>         (dump_inline_decision): Ditto.
>>>>> >>         (inline_call): Invoke dump_inline_decision.
>>>>> >>         * doc/invoke.texi: Document optall -fopt-info flag.
>>>>> >>         * profile.c (read_profile_edge_counts): Use new dump framework.
>>>>> >>         (compute_branch_probabilities): Ditto.
>>>>> >>         * passes.c (pass_manager::register_one_dump_file): Use OPTGROUP_OTHER
>>>>> >>         when pass not in any opt group.
>>>>> >>         * value-prof.c (check_counter): Use new dump framework.
>>>>> >>         (find_func_by_funcdef_no): Ditto.
>>>>> >>         (check_ic_target): Ditto.
>>>>> >>         * coverage.c (get_coverage_counts): Ditto.
>>>>> >>         (coverage_init): Setup new dump framework.
>>>>> >>         * ipa-inline.c (inline_small_functions): Set is_in_ipa_inline.
>>>>> >>         * ipa-inline.h (is_in_ipa_inline): Declare.
>>>>> >>
>>>>> >>         * testsuite/gcc.dg/pr40209.c: Use -fopt-info.
>>>>> >>         * testsuite/gcc.dg/pr26570.c: Ditto.
>>>>> >>         * testsuite/gcc.dg/pr32773.c: Ditto.
>>>>> >>         * testsuite/g++.dg/tree-ssa/dom-invalid.C (struct C): Ditto.
>>>>> >>
>>>>> >
>>>>> > [...]
>>>>> >
>>>>> >> Index: ipa-inline-transform.c
>>>>> >> ===================================================================
>>>>> >> --- ipa-inline-transform.c      (revision 201461)
>>>>> >> +++ ipa-inline-transform.c      (working copy)
>>>>> >> @@ -192,6 +192,108 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d
>>>>> >>  }
>>>>> >>
>>>>> >>
>>>>> >> +#define MAX_INT_LENGTH 20
>>>>> >> +
>>>>> >> +/* Return NODE's name and profile count, if available.  */
>>>>> >> +
>>>>> >> +static const char *
>>>>> >> +cgraph_node_opt_info (struct cgraph_node *node)
>>>>> >> +{
>>>>> >> +  char *buf;
>>>>> >> +  size_t buf_size;
>>>>> >> +  const char *bfd_name = lang_hooks.dwarf_name (node->symbol.decl, 0);
>>>>> >> +
>>>>> >> +  if (!bfd_name)
>>>>> >> +    bfd_name = "unknown";
>>>>> >> +
>>>>> >> +  buf_size = strlen (bfd_name) + 1;
>>>>> >> +  if (profile_info)
>>>>> >> +    buf_size += (MAX_INT_LENGTH + 3);
>>>>> >> +
>>>>> >> +  buf = (char *) xmalloc (buf_size);
>>>>> >> +
>>>>> >> +  strcpy (buf, bfd_name);
>>>>> >> +
>>>>> >> +  if (profile_info)
>>>>> >> +    sprintf (buf, "%s ("HOST_WIDEST_INT_PRINT_DEC")", buf, node->count);
>>>>> >> +  return buf;
>>>>> >> +}
>>>>> >
>>>>> > I'm not sure if output of this function is aimed only at the user or
>>>>> > if it is supposed to be used by gcc developers as well.  If the
>>>>> > latter, an incredibly useful thing is to also dump node->symbol.order
>>>>> > too.  We usually dump it after "/" sign separating it from node name.
>>>>> > It is invaluable when examining decisions in C++ code where you can
>>>>> > have lots of clones of a node (and also because existing dumps print
>>>>> > it, it is easy to combine them).
>>>>>
>>>>> The output is useful for both power users doing performance tuning of
>>>>> their application, and by gcc developers. Adding the id is not so
>>>>> useful for the former, but I agree that it is very useful for compiler
>>>>> developers. In fact, in the google branch version we emit more verbose
>>>>> information (the lipo module id and the funcdef_no) to help uniquely
>>>>> identify the routines and to aid in post-processing by humans and
>>>>> tools. So it is probably useful to add something similar here too. Is
>>>>> the node->symbol.order more or less unique than the funcdef_no? I see
>>>>> that you added a patch a few months ago to print the
>>>>> node->symbol.order in the function header, and it also has the
>>>>> advantage as you note of matching up with existing ipa dumps.
>>>>
>>>> node->symbol.order is unique and if I remember correctly, it is not
>>>> even recycled.  Clones, inline clones, thunks, every symbol table node
>>>> gets its own symbol order so it should be more unique than funcdef_no.
>>>> On the other hand it may be a bit cryptic for users but at the same
>>>> time it is only one number.
>>>
>>> Ok, I am going to go ahead and add this to the output.
>>>
>>>>
>>>>>
>>>>> >
>>>>> > [...]
>>>>> >
>>>>> >> Index: ipa-inline.c
>>>>> >> ===================================================================
>>>>> >> --- ipa-inline.c        (revision 201461)
>>>>> >> +++ ipa-inline.c        (working copy)
>>>>> >> @@ -118,6 +118,9 @@ along with GCC; see the file COPYING3.  If not see
>>>>> >>  static int overall_size;
>>>>> >>  static gcov_type max_count;
>>>>> >>
>>>>> >> +/* Global variable to denote if it is in ipa-inline pass. */
>>>>> >> +bool is_in_ipa_inline = false;
>>>>> >> +
>>>>> >>  /* Return false when inlining edge E would lead to violating
>>>>> >>     limits on function unit growth or stack usage growth.
>>>>> >>
>>>>> >
>>>>> > In this age of removing global variables, are you sure you need this?
>>>>> > The only user of this seems to be a function that is only being called
>>>>> > from inline_call... can that ever happen when not inlining?  If you
>>>>> > plan to use this function also elsewhere, perhaps the callers will
>>>>> > know whether we are inlining or not and can provide this in a
>>>>> > parameter?
>>>>>
>>>>> This is to distinguish early inlining from ipa inlining.
>>>>
>>>> Oh, right, I did not realize that the IPA part was the important bit
>>>> of the name.
>>>>
>>>>> The volume of
>>>>> early inlining messages is too high to be on for the default setting
>>>>> of -fopt-info, and are not as interesting usually for performance
>>>>> tuning. The dumper will only emit the early inline messages under a
>>>>> more verbose setting (MSG_NOTE):
>>>>>       dump_printf_loc (is_in_ipa_inline ? MSG_OPTIMIZED_LOCATIONS : MSG_NOTE ...
>>>>> The other way I can see to distinguish this would be to check the
>>>>> always_inline_functions_inlined flag on the caller's function. It
>>>>> could also be possible to pass down a flag from the callers of
>>>>> inline_call, but at least one caller (flatten_functions) is shared
>>>>> between early and late inlining, so the flag needs to be passed
>>>>> through that as well. WDYT?
>>>>
>>>> Did you mean flatten_function?  It already has a bool "early"
>>>> parameter.  But I can see that being able to quickly figure out
>>>> whether we are in early inliner or ipa inliner without much hassle is
>>>> useful enough to justify a global variable a month ago, however I
>>>> suppose we should not be introducing them now and so you'd have to put
>>>> such stuff into... well, you'd probably have to put into the universe
>>>> object somewhere because it is basically shared between two passes.
>>>> Another option, even though somewhat hackish, would be to look at
>>>> current_pass and see which pass it is.  I don't know, do what is
>>>> easier or what you like more, just be aware of the problem.
>>>
>>> After thinking about this some more, I think passing down an early
>>> flag from callers is the cleanest way to go.
>>>
>>> I'll fix these and post a new patch later today.
>>
>> New patch below that removes this global variable, and also outputs
>> the node->symbol.order (in square brackets after the function name so
>> as to not clutter it). Inline messages with profile data look like:
>>
>> test.c:8:3: note: foobar [0] (99999000) inlined into foo [2] (1000)
>> with call count 99999000 (via inline instance bar [3] (99999000))
>
> Ick.  This looks both redundant and cluttered.  This is supposed to be
> understandable by GCC users, not only GCC developers.

The main part that is only useful/understandable to gcc developers is
the node->symbol.order in square brackets, requested by Martin. One
possibility is that I could put that part under a param, disabled by
default. We have something similar on the google branches that emits
LIPO module info in the message, enabled via a param.

I'd argue that the other information (the profile counts, emitted only
when using -fprofile-use, and the inline call chains) is useful if
you want to understand whether and how critical inlines are occurring.
I think this is the type of information that users focused on
optimizations, as well as gcc developers, want when they use
-fopt-info. Otherwise it is difficult to make sense of the inline
information.
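
To make the param idea concrete, here is a minimal sketch of how the per-node text could be assembled with the symbol order gated behind a flag. It is not the code from the patch: sketch_node_opt_info and its show_order/have_profile arguments are hypothetical names, and the count is a plain long long rather than gcov_type. It appends with snprintf at an offset instead of passing the buffer as both source and destination.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper (not in the patch): format "NAME [ORDER] (COUNT)"
   into BUF.  The " [ORDER]" part is emitted only when SHOW_ORDER is set,
   modelling a param that is disabled by default.  The " (COUNT)" part is
   emitted only when HAVE_PROFILE is set, modelling -fprofile-use.
   Returns the number of characters the full text needs.  */
static int
sketch_node_opt_info (char *buf, size_t size, const char *name,
                      int order, bool show_order,
                      bool have_profile, long long count)
{
  int n = snprintf (buf, size, "%s", name);

  /* Append at an offset; stop appending once the buffer is full.  */
  if (show_order && n >= 0 && (size_t) n < size)
    n += snprintf (buf + n, size - n, " [%d]", order);
  if (have_profile && n >= 0 && (size_t) n < size)
    n += snprintf (buf + n, size - n, " (%lld)", count);
  return n;
}
```

With show_order off, a node would print as in the user-facing form ("foobar (99999000)"); with it on, the developer-oriented form ("foo [2] (1000)") comes back.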

>
>> (without FDO the counts in parentheses and the call count would not be
>> included).
>>
>> Ok for trunk?
>
> Let's split this patch.

Ok.

>
>> Thanks,
>> Teresa
>>
>> 2013-08-06  Teresa Johnson  <tejohnson@google.com>
>>             Dehao Chen  <dehao@google.com>
>>
>>         * dumpfile.c (dump_loc): Output column number, make newlines consistent.
>
> I don't like column numbers, they are of not much use generally.

I added these here to get consistency with other messages (notes
emitted via inform(), warnings, errors). Plus the dg-message testing
was failing for the test cases that parse this output, since it
expects the column to exist.

> Does
> 'make newlines consistent' avoid all the spurious vertical spacing I see with
> -fopt-info?

Well, it helps get us there. The problem was that before, since
dump_loc was not consistently emitting newlines, the calls had to emit
their own newlines manually in the string to ensure there was a
newline at all. I was thinking that once this is fixed I could go back
and clean up all those calls by removing the newlines in the string. I
could split this part into a separate patch and do both at once.

However, after thinking about this some more this morning, I am
wondering whether it is better to remove the newline emission
completely from dump_loc and rely on the caller to put the newline in
the string. The reason is that there are 2 high level interfaces to
the new dump infrastructure, dump_printf() and dump_printf_loc(). Only
the latter invokes dump_loc and gets the newline at the start of the
message. The typical usage seems to be to start a message via
dump_printf_loc, and then use dump_printf to emit parts of the message
(thus not requiring a newline), but I think relying on this assumption
may lead to problems.

So if you agree, I will simply remove the newline altogether from
dump_loc, and ensure that all clients of dump_printf/dump_printf_loc
include a newline char as appropriate in the string they pass.
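
As an illustration of that convention, here is a small self-contained sketch. The sketch_* functions are hypothetical stand-ins for dump_printf and dump_printf_loc, not the real dumpfile.c code; they append to a buffer so the layout is easy to check. The location helper emits no newline of its own, and the caller ends each complete message with "\n".

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the dump stream: everything is appended here.  */
static char sketch_out[512];

static void
sketch_vappend (const char *fmt, va_list ap)
{
  size_t used = strlen (sketch_out);
  vsnprintf (sketch_out + used, sizeof sketch_out - used, fmt, ap);
}

/* Stand-in for dump_printf: emits only the formatted text.  */
static void
sketch_dump_printf (const char *fmt, ...)
{
  va_list ap;
  va_start (ap, fmt);
  sketch_vappend (fmt, ap);
  va_end (ap);
}

/* Stand-in for dump_printf_loc under the proposed convention: emits the
   location prefix with no leading newline, then the formatted text.  */
static void
sketch_dump_printf_loc (const char *file, int line, int column,
                        const char *fmt, ...)
{
  va_list ap;
  sketch_dump_printf ("%s:%d:%d: note: ", file, line, column);
  va_start (ap, fmt);
  sketch_vappend (fmt, ap);
  va_end (ap);
}

/* A caller that builds one message in two pieces and a second message in
   one piece.  Only the piece that finishes a message carries the "\n".  */
static const char *
sketch_example (void)
{
  sketch_out[0] = '\0';
  sketch_dump_printf_loc ("test.c", 8, 3, "%s inlined into %s",
                          "foobar", "foo");
  sketch_dump_printf (" with call count %d\n", 1000);
  sketch_dump_printf_loc ("test.c", 9, 5, "%s inlined into %s\n",
                          "bar", "foo");
  return sketch_out;
}
```

The result is two lines with no blank line between them, regardless of whether a message was started with the location variant or continued with the plain one.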

>
>>         * dumpfile.h (OPTGROUP_OTHER): Add and enable under OPTGROUP_ALL.
>
> Good change - please split this out (with the related changes) and commit it.

Ok, thanks. Will do.

>
>>         * ipa-inline-transform.c (cgraph_node_opt_info): New function.
>>         (cgraph_node_call_chain): Ditto.
>>         (dump_inline_decision): Ditto.
>>         (inline_call): Invoke dump_inline_decision, new parameter.
>
> The inline stuff should be split and re-sent, it's non-obvious to me (extra
> function parameters are not documented for example).  I'd rather have
> inline_and_report_call () for example instead of an extra bool parameter.
> But let's iterate over this once it's split out.

Ok, I will send this separately. I guess we could have a separate
interface inline_and_report_call that is a wrapper around inline_call
and simply invokes the dumper. Note that flatten_function will need to
conditionally call one of the two interfaces based on the value of its
bool early parameter though.
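
A rough sketch of that shape, using stub types so it stands alone. Everything prefixed sketch_ is hypothetical; the real inline_call takes more arguments, and whether the early path should stay silent or report at a more verbose level is still open, so the choice below is only an assumption.

```c
#include <stdbool.h>
#include <stdio.h>

/* Stub standing in for struct cgraph_edge.  */
struct sketch_edge
{
  const char *callee;
  const char *caller;
};

/* Counts emitted reports so the behaviour can be checked.  */
static int sketch_reports;

/* Stand-in for inline_call: pretend the inlining always succeeds.  */
static bool
sketch_inline_call (struct sketch_edge *e)
{
  (void) e;
  return true;
}

/* Stand-in for dump_inline_decision.  */
static void
sketch_dump_inline_decision (struct sketch_edge *e)
{
  sketch_reports++;
  fprintf (stderr, "%s inlined into %s\n", e->callee, e->caller);
}

/* The wrapper: report the decision, then do the inlining.  */
static bool
sketch_inline_and_report_call (struct sketch_edge *e)
{
  sketch_dump_inline_decision (e);
  return sketch_inline_call (e);
}

/* A caller shared by early and late inlining, like flatten_function,
   picks the interface from its EARLY flag (assumed: early stays quiet).  */
static bool
sketch_flatten_step (struct sketch_edge *e, bool early)
{
  return early ? sketch_inline_call (e)
               : sketch_inline_and_report_call (e);
}
```

This keeps the signature of the plain interface unchanged, at the cost of a conditional in each caller that serves both inliners.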

>
>>         * doc/invoke.texi: Document optall -fopt-info flag.
>>         * profile.c (read_profile_edge_counts): Use new dump framework.
>>         (compute_branch_probabilities): Ditto.
>>         * passes.c (pass_manager::register_one_dump_file): Use OPTGROUP_OTHER
>>         when pass not in any opt group.
>>         * value-prof.c (check_counter): Use new dump framework.
>>         (find_func_by_funcdef_no): Ditto.
>>         (check_ic_target): Ditto.
>>         * coverage.c (get_coverage_counts): Ditto.
>>         (coverage_init): Setup new dump framework.
>
> These pieces look good to me.
>
>>         * ipa-inline.c (recursive_inlining): New inline_call parameter.
>>         (inline_small_functions): Ditto.
>>         (flatten_function): Ditto.
>>         (ipa_inline): Ditto.
>>         (inline_always_inline_functions): Ditto.
>>         (early_inline_small_functions): Ditto.
>>         * ipa-inline.h: Ditto.
>>
>>         * testsuite/gcc.dg/pr40209.c: Use -fopt-info.
>>         * testsuite/gcc.dg/pr26570.c: Ditto.
>>         * testsuite/gcc.dg/pr32773.c: Ditto.
>>         * testsuite/g++.dg/tree-ssa/dom-invalid.C: Ditto.
>
> Why?  Just remove the stray dg- annotations that deal with the unwanted output?

Because there are dg-message annotations that want to confirm this output.

Teresa

>
> Thanks,
> Richard.
>
>>         * testsuite/gcc.dg/inline-dump.c: New test.
>>
>> Index: dumpfile.c
>> ===================================================================
>> --- dumpfile.c  (revision 201461)
>> +++ dumpfile.c  (working copy)
>> @@ -257,16 +257,18 @@ dump_open_alternate_stream (struct dump_file_info
>>  void
>>  dump_loc (int dump_kind, FILE *dfile, source_location loc)
>>  {
>> -  /* Currently vectorization passes print location information.  */
>>    if (dump_kind)
>>      {
>> +      /* Ensure dump message starts on a new line.  */
>> +      fprintf (dfile, "\n");
>>        if (LOCATION_LOCUS (loc) > BUILTINS_LOCATION)
>> -        fprintf (dfile, "\n%s:%d: note: ", LOCATION_FILE (loc),
>> -                 LOCATION_LINE (loc));
>> +        fprintf (dfile, "%s:%d:%d: note: ", LOCATION_FILE (loc),
>> +                 LOCATION_LINE (loc), LOCATION_COLUMN (loc));
>>        else if (current_function_decl)
>> -        fprintf (dfile, "\n%s:%d: note: ",
>> +        fprintf (dfile, "%s:%d:%d: note: ",
>>                   DECL_SOURCE_FILE (current_function_decl),
>> -                 DECL_SOURCE_LINE (current_function_decl));
>> +                 DECL_SOURCE_LINE (current_function_decl),
>> +                 DECL_SOURCE_COLUMN (current_function_decl));
>>      }
>>  }
>>
>> Index: dumpfile.h
>> ===================================================================
>> --- dumpfile.h  (revision 201461)
>> +++ dumpfile.h  (working copy)
>> @@ -97,8 +97,9 @@ enum tree_dump_index
>>  #define OPTGROUP_LOOP        (1 << 2)   /* Loop optimization passes */
>>  #define OPTGROUP_INLINE      (1 << 3)   /* Inlining passes */
>>  #define OPTGROUP_VEC         (1 << 4)   /* Vectorization passes */
>> +#define OPTGROUP_OTHER       (1 << 5)   /* All other passes */
>>  #define OPTGROUP_ALL        (OPTGROUP_IPA | OPTGROUP_LOOP | OPTGROUP_INLINE \
>> -                              | OPTGROUP_VEC)
>> +                              | OPTGROUP_VEC | OPTGROUP_OTHER)
>>
>>  /* Define a tree dump switch.  */
>>  struct dump_file_info
>> Index: ipa-inline-transform.c
>> ===================================================================
>> --- ipa-inline-transform.c      (revision 201461)
>> +++ ipa-inline-transform.c      (working copy)
>> @@ -192,6 +192,111 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d
>>  }
>>
>>
>> +#define MAX_INT_LENGTH 20
>> +
>> +/* Return NODE's name and profile count, if available.  */
>> +
>> +static const char *
>> +cgraph_node_opt_info (struct cgraph_node *node)
>> +{
>> +  char *buf;
>> +  size_t buf_size;
>> +  const char *bfd_name = lang_hooks.dwarf_name (node->symbol.decl, 0);
>> +
>> +  if (!bfd_name)
>> +    bfd_name = "unknown";
>> +
>> +  buf_size = strlen (bfd_name) + 1;
>> +  if (profile_info)
>> +    buf_size += (MAX_INT_LENGTH + 3);
>> +  buf_size += MAX_INT_LENGTH;
>> +
>> +  buf = (char *) xmalloc (buf_size);
>> +
>> +  strcpy (buf, bfd_name);
>> +  //sprintf (buf, "%s/%i", buf, node->symbol.order);
>> +  sprintf (buf, "%s [%i]", buf, node->symbol.order);
>> +
>> +  if (profile_info)
>> +    sprintf (buf, "%s ("HOST_WIDEST_INT_PRINT_DEC")", buf, node->count);
>> +  return buf;
>> +}
>> +
>> +
>> +/* Return CALLER's inlined call chain. Save the cgraph_node of the ultimate
>> +   function that the caller is inlined to in FINAL_CALLER.  */
>> +
>> +static const char *
>> +cgraph_node_call_chain (struct cgraph_node *caller,
>> +                       struct cgraph_node **final_caller)
>> +{
>> +  struct cgraph_node *node;
>> +  const char *via_str = " (via inline instance";
>> +  size_t current_string_len = strlen (via_str) + 1;
>> +  size_t buf_size = current_string_len;
>> +  char *buf = (char *) xmalloc (buf_size);
>> +
>> +  buf[0] = 0;
>> +  gcc_assert (caller->global.inlined_to != NULL);
>> +  strcat (buf, via_str);
>> +  for (node = caller; node->global.inlined_to != NULL;
>> +       node = node->callers->caller)
>> +    {
>> +      const char *name = cgraph_node_opt_info (node);
>> +      current_string_len += (strlen (name) + 1);
>> +      if (current_string_len >= buf_size)
>> +       {
>> +         buf_size = current_string_len * 2;
>> +         buf = (char *) xrealloc (buf, buf_size);
>> +       }
>> +      strcat (buf, " ");
>> +      strcat (buf, name);
>> +    }
>> +  strcat (buf, ")");
>> +  *final_caller = node;
>> +  return buf;
>> +}
>> +
>> +
>> +/* Dump the inline decision of EDGE.  */
>> +
>> +static void
>> +dump_inline_decision (struct cgraph_edge *edge, bool early)
>> +{
>> +  location_t locus;
>> +  const char *inline_chain_text;
>> +  const char *call_count_text;
>> +  struct cgraph_node *final_caller = edge->caller;
>> +
>> +  if (final_caller->global.inlined_to != NULL)
>> +    inline_chain_text = cgraph_node_call_chain (final_caller, &final_caller);
>> +  else
>> +    inline_chain_text = "";
>> +
>> +  if (edge->count > 0)
>> +    {
>> +      const char *call_count_str = " with call count ";
>> +      char *buf = (char *) xmalloc (strlen (call_count_str) + MAX_INT_LENGTH);
>> +      sprintf (buf, "%s"HOST_WIDEST_INT_PRINT_DEC, call_count_str,
>> +              edge->count);
>> +      call_count_text = buf;
>> +    }
>> +  else
>> +    {
>> +      call_count_text = "";
>> +    }
>> +
>> +  locus = gimple_location (edge->call_stmt);
>> +  dump_printf_loc (early ? MSG_NOTE : MSG_OPTIMIZED_LOCATIONS,
>> +                   locus,
>> +                   "%s inlined into %s%s%s\n",
>> +                   cgraph_node_opt_info (edge->callee),
>> +                   cgraph_node_opt_info (final_caller),
>> +                   call_count_text,
>> +                   inline_chain_text);
>> +}
>> +
>> +
>>  /* Mark edge E as inlined and update callgraph accordingly.  UPDATE_ORIGINAL
>>     specify whether profile of original function should be updated.  If any new
>>     indirect edges are discovered in the process, add them to NEW_EDGES, unless
>> @@ -205,7 +310,8 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d
>>  bool
>>  inline_call (struct cgraph_edge *e, bool update_original,
>>              vec<cgraph_edge_p> *new_edges,
>> -            int *overall_size, bool update_overall_summary)
>> +            int *overall_size, bool update_overall_summary,
>> +             bool early)
>>  {
>>    int old_size = 0, new_size = 0;
>>    struct cgraph_node *to = NULL;
>> @@ -218,6 +324,9 @@ inline_call (struct cgraph_edge *e, bool update_or
>>    bool predicated = inline_edge_summary (e)->predicate != NULL;
>>  #endif
>>
>> +  if (dump_enabled_p ())
>> +    dump_inline_decision (e, early);
>> +
>>    /* Don't inline inlined edges.  */
>>    gcc_assert (e->inline_failed);
>>    /* Don't even think of inlining inline clone.  */
>> Index: doc/invoke.texi
>> ===================================================================
>> --- doc/invoke.texi     (revision 201461)
>> +++ doc/invoke.texi     (working copy)
>> @@ -6234,6 +6234,9 @@ Enable dumps from all loop optimizations.
>>  Enable dumps from all inlining optimizations.
>>  @item vec
>>  Enable dumps from all vectorization optimizations.
>> +@item optall
>> +Enable dumps from all optimizations. This is a superset of
>> +the optimization groups listed above.
>>  @end table
>>
>>  For example,
>> Index: profile.c
>> ===================================================================
>> --- profile.c   (revision 201461)
>> +++ profile.c   (working copy)
>> @@ -432,8 +432,8 @@ read_profile_edge_counts (gcov_type *exec_counts)
>>                     if (flag_profile_correction)
>>                       {
>>                         static bool informed = 0;
>> -                       if (!informed)
>> -                         inform (input_location,
>> +                       if (dump_enabled_p () && !informed)
>> +                         dump_printf_loc (MSG_NOTE, input_location,
>>                                   "corrupted profile info: edge count exceeds maximal count");
>>                         informed = 1;
>>                       }
>> @@ -692,10 +692,11 @@ compute_branch_probabilities (unsigned cfg_checksu
>>         {
>>           /* Inconsistency detected. Make it flow-consistent. */
>>           static int informed = 0;
>> -         if (informed == 0)
>> +         if (dump_enabled_p () && informed == 0)
>>             {
>>               informed = 1;
>> -             inform (input_location, "correcting inconsistent profile data");
>> +             dump_printf_loc (MSG_NOTE, input_location,
>> +                              "correcting inconsistent profile data");
>>             }
>>           correct_negative_edge_counts ();
>>           /* Set bb counts to the sum of the outgoing edge counts */
>> Index: passes.c
>> ===================================================================
>> --- passes.c    (revision 201461)
>> +++ passes.c    (working copy)
>> @@ -524,6 +524,11 @@ pass_manager::register_one_dump_file (struct opt_p
>>    flag_name = concat (prefix, name, num, NULL);
>>    glob_name = concat (prefix, name, NULL);
>>    optgroup_flags |= pass->optinfo_flags;
>> +  /* For any passes that do not have an optgroup set, and which are not
>> +     IPA passes setup above, set the optgroup to OPTGROUP_OTHER so that
>> +     any dump messages are emitted properly under -fopt-info(-optall).  */
>> +  if (optgroup_flags == OPTGROUP_NONE)
>> +    optgroup_flags = OPTGROUP_OTHER;
>>    id = dump_register (dot_name, flag_name, glob_name, flags, optgroup_flags);
>>    set_pass_for_id (id, pass);
>>    full_name = concat (prefix, pass->name, num, NULL);
>> Index: value-prof.c
>> ===================================================================
>> --- value-prof.c        (revision 201461)
>> +++ value-prof.c        (working copy)
>> @@ -585,9 +585,11 @@ check_counter (gimple stmt, const char * name,
>>                : DECL_SOURCE_LOCATION (current_function_decl);
>>        if (flag_profile_correction)
>>          {
>> -         inform (locus, "correcting inconsistent value profile: "
>> -                 "%s profiler overall count (%d) does not match BB count "
>> -                  "(%d)", name, (int)*all, (int)bb_count);
>> +          if (dump_enabled_p ())
>> +            dump_printf_loc (MSG_MISSED_OPTIMIZATION, locus,
>> +                             "correcting inconsistent value profile: %s "
>> +                             "profiler overall count (%d) does not match BB "
>> +                             "count (%d)", name, (int)*all, (int)bb_count);
>>           *all = bb_count;
>>           if (*count > *all)
>>              *count = *all;
>> @@ -1209,9 +1211,11 @@ find_func_by_funcdef_no (int func_id)
>>    int max_id = get_last_funcdef_no ();
>>    if (func_id >= max_id || cgraph_node_map[func_id] == NULL)
>>      {
>> -      if (flag_profile_correction)
>> -        inform (DECL_SOURCE_LOCATION (current_function_decl),
>> -                "Inconsistent profile: indirect call target (%d) does not exist", func_id);
>> +      if (flag_profile_correction && dump_enabled_p ())
>> +        dump_printf_loc (MSG_MISSED_OPTIMIZATION,
>> +                         DECL_SOURCE_LOCATION (current_function_decl),
>> +                         "Inconsistent profile: indirect call target (%d) "
>> +                         "does not exist", func_id);
>>        else
>>          error ("Inconsistent profile: indirect call target (%d) does not exist", func_id);
>>
>> @@ -1235,8 +1239,10 @@ check_ic_target (gimple call_stmt, struct cgraph_n
>>       return true;
>>
>>     locus =  gimple_location (call_stmt);
>> -   inform (locus, "Skipping target %s with mismatching types for icall ",
>> -           cgraph_node_name (target));
>> +   if (dump_enabled_p ())
>> +     dump_printf_loc (MSG_MISSED_OPTIMIZATION, locus,
>> +                      "Skipping target %s with mismatching types for icall ",
>> +                      cgraph_node_name (target));
>>     return false;
>>  }
>>
>> Index: coverage.c
>> ===================================================================
>> --- coverage.c  (revision 201461)
>> +++ coverage.c  (working copy)
>> @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "langhooks.h"
>>  #include "hash-table.h"
>>  #include "tree-iterator.h"
>> +#include "tree-pass.h"
>>  #include "cgraph.h"
>>  #include "dumpfile.h"
>>  #include "diagnostic-core.h"
>> @@ -341,11 +342,13 @@ get_coverage_counts (unsigned counter, unsigned ex
>>      {
>>        static int warned = 0;
>>
>> -      if (!warned++)
>> -       inform (input_location, (flag_guess_branch_prob
>> -                ? "file %s not found, execution counts estimated"
>> -                : "file %s not found, execution counts assumed to be zero"),
>> -               da_file_name);
>> +      if (!warned++ && dump_enabled_p ())
>> +       dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>> +                         (flag_guess_branch_prob
>> +                          ? "file %s not found, execution counts estimated"
>> +                          : "file %s not found, execution counts assumed to "
>> +                            "be zero"),
>> +                         da_file_name);
>>        return NULL;
>>      }
>>
>> @@ -369,21 +372,25 @@ get_coverage_counts (unsigned counter, unsigned ex
>>         warning_at (input_location, OPT_Wcoverage_mismatch,
>>                     "the control flow of function %qE does not match "
>>                     "its profile data (counter %qs)", id, ctr_names[counter]);
>> -      if (warning_printed)
>> +      if (warning_printed && dump_enabled_p ())
>>         {
>> -        inform (input_location, "use -Wno-error=coverage-mismatch to tolerate "
>> -                "the mismatch but performance may drop if the function is hot");
>> +          dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>> +                           "use -Wno-error=coverage-mismatch to tolerate "
>> +                           "the mismatch but performance may drop if the "
>> +                           "function is hot");
>>
>>           if (!seen_error ()
>>               && !warned++)
>>             {
>> -             inform (input_location, "coverage mismatch ignored");
>> -             inform (input_location, flag_guess_branch_prob
>> -                     ? G_("execution counts estimated")
>> -                     : G_("execution counts assumed to be zero"));
>> +             dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>> +                               "coverage mismatch ignored");
>> +             dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>> +                               flag_guess_branch_prob
>> +                               ? G_("execution counts estimated")
>> +                               : G_("execution counts assumed to be zero"));
>>               if (!flag_guess_branch_prob)
>> -               inform (input_location,
>> -                       "this can result in poorly optimized code");
>> +               dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>> +                                 "this can result in poorly optimized code");
>>             }
>>         }
>>
>> @@ -1103,6 +1110,11 @@ coverage_init (const char *filename)
>>    int len = strlen (filename);
>>    int prefix_len = 0;
>>
>> +  /* Since coverage_init is invoked very early, before the pass
>> +     manager, we need to set up the dumping explicitly. This is
>> +     similar to the handling in finish_optimization_passes.  */
>> +  dump_start (pass_profile.pass.static_pass_number, NULL);
>> +
>>    if (!profile_data_prefix && !IS_ABSOLUTE_PATH (filename))
>>      profile_data_prefix = getpwd ();
>>
>> @@ -1145,6 +1157,8 @@ coverage_init (const char *filename)
>>           gcov_write_unsigned (bbg_file_stamp);
>>         }
>>      }
>> +
>> +  dump_finish (pass_profile.pass.static_pass_number);
>>  }
>>
>>  /* Performs file-level cleanup.  Close notes file, generate coverage
>> Index: ipa-inline.c
>> ===================================================================
>> --- ipa-inline.c        (revision 201461)
>> +++ ipa-inline.c        (working copy)
>> @@ -1322,7 +1322,7 @@ recursive_inlining (struct cgraph_edge *edge,
>>            reset_edge_growth_cache (curr);
>>         }
>>
>> -      inline_call (curr, false, new_edges, &overall_size, true);
>> +      inline_call (curr, false, new_edges, &overall_size, true, false);
>>        lookup_recursive_calls (node, curr->callee, heap);
>>        n++;
>>      }
>> @@ -1612,7 +1612,8 @@ inline_small_functions (void)
>>             fprintf (dump_file, " Peeling recursion with depth %i\n", depth);
>>
>>           gcc_checking_assert (!callee->global.inlined_to);
>> -         inline_call (edge, true, &new_indirect_edges, &overall_size, true);
>> +         inline_call (edge, true, &new_indirect_edges, &overall_size, true,
>> +                       false);
>>           if (flag_indirect_inlining)
>>             add_new_edges_to_heap (edge_heap, new_indirect_edges);
>>
>> @@ -1733,7 +1734,7 @@ flatten_function (struct cgraph_node *node, bool e
>>                  xstrdup (cgraph_node_name (callee)),
>>                  xstrdup (cgraph_node_name (e->caller)));
>>        orig_callee = callee;
>> -      inline_call (e, true, NULL, NULL, false);
>> +      inline_call (e, true, NULL, NULL, false, early);
>>        if (e->callee != orig_callee)
>>         orig_callee->symbol.aux = (void *) node;
>>        flatten_function (e->callee, early);
>> @@ -1852,7 +1853,8 @@ ipa_inline (void)
>>                                    inline_summary (node->callers->caller)->size);
>>                         }
>>
>> -                     inline_call (node->callers, true, NULL, NULL, true);
>> +                     inline_call (node->callers, true, NULL, NULL, true,
>> +                                   false);
>>                       if (dump_file)
>>                         fprintf (dump_file,
>>                                  " Inlined into %s which now has %i size\n",
>> @@ -1925,7 +1927,7 @@ inline_always_inline_functions (struct cgraph_node
>>         fprintf (dump_file, "  Inlining %s into %s (always_inline).\n",
>>                  xstrdup (cgraph_node_name (e->callee)),
>>                  xstrdup (cgraph_node_name (e->caller)));
>> -      inline_call (e, true, NULL, NULL, false);
>> +      inline_call (e, true, NULL, NULL, false, true);
>>        inlined = true;
>>      }
>>    if (inlined)
>> @@ -1977,7 +1979,7 @@ early_inline_small_functions (struct cgraph_node *
>>         fprintf (dump_file, " Inlining %s into %s.\n",
>>                  xstrdup (cgraph_node_name (callee)),
>>                  xstrdup (cgraph_node_name (e->caller)));
>> -      inline_call (e, true, NULL, NULL, true);
>> +      inline_call (e, true, NULL, NULL, true, true);
>>        inlined = true;
>>      }
>>
>> Index: ipa-inline.h
>> ===================================================================
>> --- ipa-inline.h        (revision 201461)
>> +++ ipa-inline.h        (working copy)
>> @@ -228,7 +228,8 @@ void free_growth_caches (void);
>>  void compute_inline_parameters (struct cgraph_node *, bool);
>>
>>  /* In ipa-inline-transform.c  */
>> -bool inline_call (struct cgraph_edge *, bool, vec<cgraph_edge_p> *, int *, bool);
>> +bool inline_call (struct cgraph_edge *, bool, vec<cgraph_edge_p> *, int *,
>> +                  bool, bool);
>>  unsigned int inline_transform (struct cgraph_node *);
>>  void clone_inlined_nodes (struct cgraph_edge *e, bool, bool, int *);
>>
>> Index: testsuite/gcc.dg/pr40209.c
>> ===================================================================
>> --- testsuite/gcc.dg/pr40209.c  (revision 201461)
>> +++ testsuite/gcc.dg/pr40209.c  (working copy)
>> @@ -1,5 +1,5 @@
>>  /* { dg-do compile } */
>> -/* { dg-options "-O2 -fprofile-use" } */
>> +/* { dg-options "-O2 -fprofile-use -fopt-info" } */
>>
>>  void process(const char *s);
>>
>> Index: testsuite/gcc.dg/pr26570.c
>> ===================================================================
>> --- testsuite/gcc.dg/pr26570.c  (revision 201461)
>> +++ testsuite/gcc.dg/pr26570.c  (working copy)
>> @@ -1,5 +1,5 @@
>>  /* { dg-do compile } */
>> -/* { dg-options "-O2 -fprofile-generate -fprofile-use" } */
>> +/* { dg-options "-O2 -fprofile-generate -fprofile-use -fopt-info" } */
>>
>>  unsigned test (unsigned a, unsigned b)
>>  {
>> Index: testsuite/gcc.dg/pr32773.c
>> ===================================================================
>> --- testsuite/gcc.dg/pr32773.c  (revision 201461)
>> +++ testsuite/gcc.dg/pr32773.c  (working copy)
>> @@ -1,6 +1,6 @@
>>  /* { dg-do compile } */
>> -/* { dg-options "-O -fprofile-use" } */
>> -/* { dg-options "-O -m4 -fprofile-use" { target sh-*-* } } */
>> +/* { dg-options "-O -fprofile-use -fopt-info" } */
>> +/* { dg-options "-O -m4 -fprofile-use -fopt-info" { target sh-*-* } } */
>>
>>  void foo (int *p)
>>  {
>> Index: testsuite/g++.dg/tree-ssa/dom-invalid.C
>> ===================================================================
>> --- testsuite/g++.dg/tree-ssa/dom-invalid.C     (revision 201461)
>> +++ testsuite/g++.dg/tree-ssa/dom-invalid.C     (working copy)
>> @@ -1,7 +1,7 @@
>>  // PR tree-optimization/39557
>>  // invalid post-dom info leads to infinite loop
>>  // { dg-do run }
>> -// { dg-options "-Wall -fno-exceptions -O2 -fprofile-use -fno-rtti" }
>> +// { dg-options "-Wall -fno-exceptions -O2 -fprofile-use -fopt-info -fno-rtti" }
>>
>>  struct C
>>  {
>> Index: testsuite/gcc.dg/inline-dump.c
>> ===================================================================
>> --- testsuite/gcc.dg/inline-dump.c      (revision 0)
>> +++ testsuite/gcc.dg/inline-dump.c      (revision 0)
>> @@ -0,0 +1,11 @@
>> +/* Verify that -fopt-info can output correct inline info.  */
>> +/* { dg-do compile } */
>> +/* { dg-options "-Wall -fopt-info-inline=stderr -O2 -fno-early-inlining" } */
>> +static inline int leaf() {
>> +  int i, ret = 0;
>> +  for (i = 0; i < 10; i++)
>> +    ret += i;
>> +  return ret;
>> +}
>> +static inline int foo(void) { return leaf(); } /* { dg-message "note: leaf .*inlined into bar .*via inline instance foo.*\n" } */
>> +int bar(void) { return foo(); }
>>>
>>> Thanks,
>>> Teresa
>>>
>>>>
>>>> Thanks,
>>>>
>>>> Martin
>>>
>>>
>>>
>>> --
>>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
Xinliang David Li Aug. 28, 2013, 3:20 p.m. UTC | #7
On Wed, Aug 28, 2013 at 7:09 AM, Teresa Johnson <tejohnson@google.com> wrote:
> On Wed, Aug 28, 2013 at 4:01 AM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Wed, Aug 7, 2013 at 7:23 AM, Teresa Johnson <tejohnson@google.com> wrote:
>>> On Tue, Aug 6, 2013 at 9:29 AM, Teresa Johnson <tejohnson@google.com> wrote:
>>>> On Tue, Aug 6, 2013 at 9:01 AM, Martin Jambor <mjambor@suse.cz> wrote:
>>>>> Hi,
>>>>>
>>>>> On Tue, Aug 06, 2013 at 07:14:42AM -0700, Teresa Johnson wrote:
>>>>>> On Tue, Aug 6, 2013 at 5:37 AM, Martin Jambor <mjambor@suse.cz> wrote:
>>>>>> > On Mon, Aug 05, 2013 at 10:37:00PM -0700, Teresa Johnson wrote:
>>>>>> >> This patch ports messages to the new dump framework,
>>>>>> >
>>>>>> > It would be great if this new framework was documented somewhere.  I lost
>>>>>> > track of what was agreed it would be and from the uses in the
>>>>>> > vectorizer I was never quite sure how to utilize it in other passes.
>>>>>>
>>>>>> Cc'ing Sharad who implemented this - Sharad, is this documented on a
>>>>>> wiki or elsewhere?
>>>>>
>>>>> Thanks
>>>>>
>>>>>>
>>>>>> >
>>>>>> > I'd also like to point out two other minor things inline:
>>>>>> >
>>>>>> > [...]
>>>>>> >
>>>>>> >> 2013-08-06  Teresa Johnson  <tejohnson@google.com>
>>>>>> >>             Dehao Chen  <dehao@google.com>
>>>>>> >>
>>>>>> >>         * dumpfile.c (dump_loc): Add column number to output, make newlines
>>>>>> >>         consistent.
>>>>>> >>         * dumpfile.h (OPTGROUP_OTHER): Add and enable under OPTGROUP_ALL.
>>>>>> >>         * ipa-inline-transform.c (clone_inlined_nodes):
>>>>>> >>         (cgraph_node_opt_info): New function.
>>>>>> >>         (cgraph_node_call_chain): Ditto.
>>>>>> >>         (dump_inline_decision): Ditto.
>>>>>> >>         (inline_call): Invoke dump_inline_decision.
>>>>>> >>         * doc/invoke.texi: Document optall -fopt-info flag.
>>>>>> >>         * profile.c (read_profile_edge_counts): Use new dump framework.
>>>>>> >>         (compute_branch_probabilities): Ditto.
>>>>>> >>         * passes.c (pass_manager::register_one_dump_file): Use OPTGROUP_OTHER
>>>>>> >>         when pass not in any opt group.
>>>>>> >>         * value-prof.c (check_counter): Use new dump framework.
>>>>>> >>         (find_func_by_funcdef_no): Ditto.
>>>>>> >>         (check_ic_target): Ditto.
>>>>>> >>         * coverage.c (get_coverage_counts): Ditto.
>>>>>> >>         (coverage_init): Setup new dump framework.
>>>>>> >>         * ipa-inline.c (inline_small_functions): Set is_in_ipa_inline.
>>>>>> >>         * ipa-inline.h (is_in_ipa_inline): Declare.
>>>>>> >>
>>>>>> >>         * testsuite/gcc.dg/pr40209.c: Use -fopt-info.
>>>>>> >>         * testsuite/gcc.dg/pr26570.c: Ditto.
>>>>>> >>         * testsuite/gcc.dg/pr32773.c: Ditto.
>>>>>> >>         * testsuite/g++.dg/tree-ssa/dom-invalid.C (struct C): Ditto.
>>>>>> >>
>>>>>> >
>>>>>> > [...]
>>>>>> >
>>>>>> >> Index: ipa-inline-transform.c
>>>>>> >> ===================================================================
>>>>>> >> --- ipa-inline-transform.c      (revision 201461)
>>>>>> >> +++ ipa-inline-transform.c      (working copy)
>>>>>> >> @@ -192,6 +192,108 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d
>>>>>> >>  }
>>>>>> >>
>>>>>> >>
>>>>>> >> +#define MAX_INT_LENGTH 20
>>>>>> >> +
>>>>>> >> +/* Return NODE's name and profile count, if available.  */
>>>>>> >> +
>>>>>> >> +static const char *
>>>>>> >> +cgraph_node_opt_info (struct cgraph_node *node)
>>>>>> >> +{
>>>>>> >> +  char *buf;
>>>>>> >> +  size_t buf_size;
>>>>>> >> +  const char *bfd_name = lang_hooks.dwarf_name (node->symbol.decl, 0);
>>>>>> >> +
>>>>>> >> +  if (!bfd_name)
>>>>>> >> +    bfd_name = "unknown";
>>>>>> >> +
>>>>>> >> +  buf_size = strlen (bfd_name) + 1;
>>>>>> >> +  if (profile_info)
>>>>>> >> +    buf_size += (MAX_INT_LENGTH + 3);
>>>>>> >> +
>>>>>> >> +  buf = (char *) xmalloc (buf_size);
>>>>>> >> +
>>>>>> >> +  strcpy (buf, bfd_name);
>>>>>> >> +
>>>>>> >> +  if (profile_info)
>>>>>> >> +    sprintf (buf, "%s ("HOST_WIDEST_INT_PRINT_DEC")", buf, node->count);
>>>>>> >> +  return buf;
>>>>>> >> +}
>>>>>> >
>>>>>> > I'm not sure if output of this function is aimed only at the user or
>>>>>> > if it is supposed to be used by gcc developers as well.  If the
>>>>>> > latter, an incredibly useful thing is to also dump node->symbol.order
>>>>>> > too.  We usually dump it after "/" sign separating it from node name.
>>>>>> > It is invaluable when examining decisions in C++ code where you can
>>>>>> > have lots of clones of a node (and also because existing dumps print
>>>>>> > it, it is easy to combine them).
>>>>>>
>>>>>> The output is useful both for power users doing performance tuning of
>>>>>> their application and for gcc developers. Adding the id is not so
>>>>>> useful for the former, but I agree that it is very useful for compiler
>>>>>> developers. In fact, in the google branch version we emit more verbose
>>>>>> information (the lipo module id and the funcdef_no) to help uniquely
>>>>>> identify the routines and to aid in post-processing by humans and
>>>>>> tools. So it is probably useful to add something similar here too. Is
>>>>>> the node->symbol.order more or less unique than the funcdef_no? I see
>>>>>> that you added a patch a few months ago to print the
>>>>>> node->symbol.order in the function header, and it also has the
>>>>>> advantage as you note of matching up with existing ipa dumps.
>>>>>
>>>>> node->symbol.order is unique and if I remember correctly, it is not
>>>>> even recycled.  Clones, inline clones, thunks, every symbol table node
>>>>> gets its own symbol order so it should be more unique than funcdef_no.
>>>>> On the other hand it may be a bit cryptic for users but at the same
>>>>> time it is only one number.
>>>>
>>>> Ok, I am going to go ahead and add this to the output.
>>>>
>>>>>
>>>>>>
>>>>>> >
>>>>>> > [...]
>>>>>> >
>>>>>> >> Index: ipa-inline.c
>>>>>> >> ===================================================================
>>>>>> >> --- ipa-inline.c        (revision 201461)
>>>>>> >> +++ ipa-inline.c        (working copy)
>>>>>> >> @@ -118,6 +118,9 @@ along with GCC; see the file COPYING3.  If not see
>>>>>> >>  static int overall_size;
>>>>>> >>  static gcov_type max_count;
>>>>>> >>
>>>>>> >> +/* Global variable to denote if it is in ipa-inline pass. */
>>>>>> >> +bool is_in_ipa_inline = false;
>>>>>> >> +
>>>>>> >>  /* Return false when inlining edge E would lead to violating
>>>>>> >>     limits on function unit growth or stack usage growth.
>>>>>> >>
>>>>>> >
>>>>>> > In this age of removing global variables, are you sure you need this?
>>>>>> > The only user of this seems to be a function that is only being called
>>>>>> > from inline_call... can that ever happen when not inlining?  If you
>>>>>> > plan to use this function also elsewhere, perhaps the callers will
>>>>>> > know whether we are inlining or not and can provide this in a
>>>>>> > parameter?
>>>>>>
>>>>>> This is to distinguish early inlining from ipa inlining.
>>>>>
>>>>> Oh, right, I did not realize that the IPA part was the important bit
>>>>> of the name.
>>>>>
>>>>>> The volume of
>>>>>> early inlining messages is too high to be on for the default setting
>>>>>> of -fopt-info, and they are usually not as interesting for performance
>>>>>> tuning. The dumper will only emit the early inline messages under a
>>>>>> more verbose setting (MSG_NOTE):
>>>>>>       dump_printf_loc (is_in_ipa_inline ? MSG_OPTIMIZED_LOCATIONS : MSG_NOTE ...
>>>>>> The other way I can see to distinguish this would be to check the
>>>>>> always_inline_functions_inlined flag on the caller's function. It
>>>>>> could also be possible to pass down a flag from the callers of
>>>>>> inline_call, but at least one caller (flatten_functions) is shared
>>>>>> between early and late inlining, so the flag needs to be passed
>>>>>> through that as well. WDYT?
>>>>>
>>>>> Did you mean flatten_function?  It already has a bool "early"
>>>>> parameter.  But I can see that being able to quickly figure out
>>>>> whether we are in early inliner or ipa inliner without much hassle is
>>>>> useful enough to justify a global variable a month ago, however I
>>>>> suppose we should not be introducing them now and so you'd have to put
>>>>> such stuff into... well, you'd probably have to put into the universe
>>>>> object somewhere because it is basically shared between two passes.
>>>>> Another option, even though somewhat hackish, would be to look at
>>>>> current_pass and see which pass it is.  I don't know, do what is
>>>>> easier or what you like more, just be aware of the problem.
>>>>
>>>> After thinking about this some more, I think passing down an early
>>>> flag from callers is the cleanest way to go.
>>>>
>>>> I'll fix these and post a new patch later today.
>>>
>>> New patch below that removes this global variable, and also outputs
>>> the node->symbol.order (in square brackets after the function name so
>>> as to not clutter it). Inline messages with profile data look like:
>>>
>>> test.c:8:3: note: foobar [0] (99999000) inlined into foo [2] (1000)
>>> with call count 99999000 (via inline instance bar [3] (99999000))
>>
>> Ick.  This looks both redundant and cluttered.  This is supposed to be
>> understandable by GCC users, not only GCC developers.
>
> The main part that is only useful/understandable to gcc developers is
> the node->symbol.order in square brackets, requested by Martin. One
> possibility is that I could put that part under a param, disabled by
> default. We have something similar on the google branches that emits
> LIPO module info in the message, enabled via a param.
>
> I'd argue that the other information (the profile counts, emitted only
> when using -fprofile-use, and the inline call chains) are useful if
> you want to understand whether and how critical inlines are occurring.
> I think this is the type of information that users focused on
> optimizations, as well as gcc developers, want when they use
> -fopt-info. Otherwise it is difficult to make sense of the inline
> information.
>
>>
>>> (without FDO the counts in parentheses and the call count would not be
>>> included).
>>>
>>> Ok for trunk?
>>
>> Let's split this patch.
>
> Ok.
>
>>
>>> Thanks,
>>> Teresa
>>>
>>> 2013-08-06  Teresa Johnson  <tejohnson@google.com>
>>>             Dehao Chen  <dehao@google.com>
>>>
>>>         * dumpfile.c (dump_loc): Output column number, make newlines consistent.
>>
>> I don't like column numbers; they are generally not of much use.
>
> I added these here to get consistency with other messages (notes
> emitted via inform(), warnings, errors). Plus the dg-message testing
> was failing for the test cases that parse this output, since it
> expects the column to exist.
>
>> Does
>> 'make newlines consistent' avoid all the spurious vertical spacing I see with
>> -fopt-info?
>
> Well, it helps get us there. The problem was that before, since
> dump_loc was not consistently emitting newlines, the calls had to emit
> their own newlines manually in the string to ensure there was a
> newline at all. I was thinking that once this is fixed I could go back
> and clean up all those calls by removing the newlines in the string. I
> could split this part into a separate patch and do both at once.
>
> However, after thinking about this some more this morning, I am
> wondering whether it is better to remove the newline emission
> completely from dump_loc and rely on the caller to put the newline in
> the string. The reason is that there are 2 high level interfaces to
> the new dump infrastructure, dump_printf() and dump_printf_loc(). Only
> the latter invokes dump_loc and gets the newline at the start of the
> message. The typical usage seems to be to start a message via
> dump_printf_loc, and then use dump_printf to emit parts of the message
> (thus not requiring a newline), but I think it may lead to problems to
> rely on this assumption.
>
> So if you agree, I will simply remove the newline altogether from
> dump_loc, and ensure that all clients of dump_printf/dump_printf_loc
> include a newline char as appropriate in the string they pass.


As a helper function, dump_loc should not blindly emit a newline, as it
has no context.  I have tried removing it and pushing the newline to
higher-level helpers -- it mostly works, but the vectorizer verbose
messages need serious cleanup -- most of them assume that
dump_printf_loc does not end with a newline, so that the expression
dump can follow on the same line (the message texts need cleanup too
-- I do not like the === === in info messages).

David


>
>>
>>>         * dumpfile.h (OPTGROUP_OTHER): Add and enable under OPTGROUP_ALL.
>>
>> Good change - please split this out (with the related changes) and commit it.
>
> Ok, thanks. Will do.
>
>>
>>>         * ipa-inline-transform.c (cgraph_node_opt_info): New function.
>>>         (cgraph_node_call_chain): Ditto.
>>>         (dump_inline_decision): Ditto.
>>>         (inline_call): Invoke dump_inline_decision, new parameter.
>>
>> The inline stuff should be split and re-sent, it's non-obvious to me (extra
>> function parameters are not documented for example).  I'd rather have
>> inline_and_report_call () for example instead of an extra bool parameter.
>> But let's iterate over this once it's split out.
>
> Ok, I will send this separately. I guess we could have a separate
> interface inline_and_report_call that is a wrapper around inline_call
> and simply invokes the dumper. Note that flatten_function will need to
> conditionally call one of the two interfaces based on the value of its
> bool early parameter though.
>
>>
>>>         * doc/invoke.texi: Document optall -fopt-info flag.
>>>         * profile.c (read_profile_edge_counts): Use new dump framework.
>>>         (compute_branch_probabilities): Ditto.
>>>         * passes.c (pass_manager::register_one_dump_file): Use OPTGROUP_OTHER
>>>         when pass not in any opt group.
>>>         * value-prof.c (check_counter): Use new dump framework.
>>>         (find_func_by_funcdef_no): Ditto.
>>>         (check_ic_target): Ditto.
>>>         * coverage.c (get_coverage_counts): Ditto.
>>>         (coverage_init): Setup new dump framework.
>>
>> These pieces look good to me.
>>
>>>         * ipa-inline.c (recursive_inlining): New inline_call parameter.
>>>         (inline_small_functions): Ditto.
>>>         (flatten_function): Ditto.
>>>         (ipa_inline): Ditto.
>>>         (inline_always_inline_functions): Ditto.
>>>         (early_inline_small_functions): Ditto.
>>>         * ipa-inline.h: Ditto.
>>>
>>>         * testsuite/gcc.dg/pr40209.c: Use -fopt-info.
>>>         * testsuite/gcc.dg/pr26570.c: Ditto.
>>>         * testsuite/gcc.dg/pr32773.c: Ditto.
>>>         * testsuite/g++.dg/tree-ssa/dom-invalid.C: Ditto.
>>
>> Why?  Just remove the stray dg- annotations that deal with the unwanted output?
>
> Because there are dg-message annotations that want to confirm this output.
>
> Teresa
>
>>
>> Thanks,
>> Richard.
>>
>>>         * testsuite/gcc.dg/inline-dump.c: New test.
>>>
>>> Index: dumpfile.c
>>> ===================================================================
>>> --- dumpfile.c  (revision 201461)
>>> +++ dumpfile.c  (working copy)
>>> @@ -257,16 +257,18 @@ dump_open_alternate_stream (struct dump_file_info
>>>  void
>>>  dump_loc (int dump_kind, FILE *dfile, source_location loc)
>>>  {
>>> -  /* Currently vectorization passes print location information.  */
>>>    if (dump_kind)
>>>      {
>>> +      /* Ensure dump message starts on a new line.  */
>>> +      fprintf (dfile, "\n");
>>>        if (LOCATION_LOCUS (loc) > BUILTINS_LOCATION)
>>> -        fprintf (dfile, "\n%s:%d: note: ", LOCATION_FILE (loc),
>>> -                 LOCATION_LINE (loc));
>>> +        fprintf (dfile, "%s:%d:%d: note: ", LOCATION_FILE (loc),
>>> +                 LOCATION_LINE (loc), LOCATION_COLUMN (loc));
>>>        else if (current_function_decl)
>>> -        fprintf (dfile, "\n%s:%d: note: ",
>>> +        fprintf (dfile, "%s:%d:%d: note: ",
>>>                   DECL_SOURCE_FILE (current_function_decl),
>>> -                 DECL_SOURCE_LINE (current_function_decl));
>>> +                 DECL_SOURCE_LINE (current_function_decl),
>>> +                 DECL_SOURCE_COLUMN (current_function_decl));
>>>      }
>>>  }
>>>
>>> Index: dumpfile.h
>>> ===================================================================
>>> --- dumpfile.h  (revision 201461)
>>> +++ dumpfile.h  (working copy)
>>> @@ -97,8 +97,9 @@ enum tree_dump_index
>>>  #define OPTGROUP_LOOP        (1 << 2)   /* Loop optimization passes */
>>>  #define OPTGROUP_INLINE      (1 << 3)   /* Inlining passes */
>>>  #define OPTGROUP_VEC         (1 << 4)   /* Vectorization passes */
>>> +#define OPTGROUP_OTHER       (1 << 5)   /* All other passes */
>>>  #define OPTGROUP_ALL        (OPTGROUP_IPA | OPTGROUP_LOOP | OPTGROUP_INLINE \
>>> -                              | OPTGROUP_VEC)
>>> +                              | OPTGROUP_VEC | OPTGROUP_OTHER)
>>>
>>>  /* Define a tree dump switch.  */
>>>  struct dump_file_info
>>> Index: ipa-inline-transform.c
>>> ===================================================================
>>> --- ipa-inline-transform.c      (revision 201461)
>>> +++ ipa-inline-transform.c      (working copy)
>>> @@ -192,6 +192,111 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d
>>>  }
>>>
>>>
>>> +#define MAX_INT_LENGTH 20
>>> +
>>> +/* Return NODE's name and profile count, if available.  */
>>> +
>>> +static const char *
>>> +cgraph_node_opt_info (struct cgraph_node *node)
>>> +{
>>> +  char *buf;
>>> +  size_t buf_size;
>>> +  const char *bfd_name = lang_hooks.dwarf_name (node->symbol.decl, 0);
>>> +
>>> +  if (!bfd_name)
>>> +    bfd_name = "unknown";
>>> +
>>> +  buf_size = strlen (bfd_name) + 1;
>>> +  if (profile_info)
>>> +    buf_size += (MAX_INT_LENGTH + 3);
>>> +  buf_size += MAX_INT_LENGTH;
>>> +
>>> +  buf = (char *) xmalloc (buf_size);
>>> +
>>> +  strcpy (buf, bfd_name);
>>> +  //sprintf (buf, "%s/%i", buf, node->symbol.order);
>>> +  sprintf (buf, "%s [%i]", buf, node->symbol.order);
>>> +
>>> +  if (profile_info)
>>> +    sprintf (buf, "%s ("HOST_WIDEST_INT_PRINT_DEC")", buf, node->count);
>>> +  return buf;
>>> +}
>>> +
>>> +
>>> +/* Return CALLER's inlined call chain. Save the cgraph_node of the ultimate
>>> +   function that the caller is inlined to in FINAL_CALLER.  */
>>> +
>>> +static const char *
>>> +cgraph_node_call_chain (struct cgraph_node *caller,
>>> +                       struct cgraph_node **final_caller)
>>> +{
>>> +  struct cgraph_node *node;
>>> +  const char *via_str = " (via inline instance";
>>> +  size_t current_string_len = strlen (via_str) + 1;
>>> +  size_t buf_size = current_string_len;
>>> +  char *buf = (char *) xmalloc (buf_size);
>>> +
>>> +  buf[0] = 0;
>>> +  gcc_assert (caller->global.inlined_to != NULL);
>>> +  strcat (buf, via_str);
>>> +  for (node = caller; node->global.inlined_to != NULL;
>>> +       node = node->callers->caller)
>>> +    {
>>> +      const char *name = cgraph_node_opt_info (node);
>>> +      current_string_len += (strlen (name) + 1);
>>> +      if (current_string_len >= buf_size)
>>> +       {
>>> +         buf_size = current_string_len * 2;
>>> +         buf = (char *) xrealloc (buf, buf_size);
>>> +       }
>>> +      strcat (buf, " ");
>>> +      strcat (buf, name);
>>> +    }
>>> +  strcat (buf, ")");
>>> +  *final_caller = node;
>>> +  return buf;
>>> +}
>>> +
>>> +
>>> +/* Dump the inline decision of EDGE.  */
>>> +
>>> +static void
>>> +dump_inline_decision (struct cgraph_edge *edge, bool early)
>>> +{
>>> +  location_t locus;
>>> +  const char *inline_chain_text;
>>> +  const char *call_count_text;
>>> +  struct cgraph_node *final_caller = edge->caller;
>>> +
>>> +  if (final_caller->global.inlined_to != NULL)
>>> +    inline_chain_text = cgraph_node_call_chain (final_caller, &final_caller);
>>> +  else
>>> +    inline_chain_text = "";
>>> +
>>> +  if (edge->count > 0)
>>> +    {
>>> +      const char *call_count_str = " with call count ";
>>> +      char *buf = (char *) xmalloc (strlen (call_count_str) + MAX_INT_LENGTH);
>>> +      sprintf (buf, "%s"HOST_WIDEST_INT_PRINT_DEC, call_count_str,
>>> +              edge->count);
>>> +      call_count_text = buf;
>>> +    }
>>> +  else
>>> +    {
>>> +      call_count_text = "";
>>> +    }
>>> +
>>> +  locus = gimple_location (edge->call_stmt);
>>> +  dump_printf_loc (early ? MSG_NOTE : MSG_OPTIMIZED_LOCATIONS,
>>> +                   locus,
>>> +                   "%s inlined into %s%s%s\n",
>>> +                   cgraph_node_opt_info (edge->callee),
>>> +                   cgraph_node_opt_info (final_caller),
>>> +                   call_count_text,
>>> +                   inline_chain_text);
>>> +}
>>> +
>>> +
>>>  /* Mark edge E as inlined and update callgraph accordingly.  UPDATE_ORIGINAL
>>>     specify whether profile of original function should be updated.  If any new
>>>     indirect edges are discovered in the process, add them to NEW_EDGES, unless
>>> @@ -205,7 +310,8 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d
>>>  bool
>>>  inline_call (struct cgraph_edge *e, bool update_original,
>>>              vec<cgraph_edge_p> *new_edges,
>>> -            int *overall_size, bool update_overall_summary)
>>> +            int *overall_size, bool update_overall_summary,
>>> +             bool early)
>>>  {
>>>    int old_size = 0, new_size = 0;
>>>    struct cgraph_node *to = NULL;
>>> @@ -218,6 +324,9 @@ inline_call (struct cgraph_edge *e, bool update_or
>>>    bool predicated = inline_edge_summary (e)->predicate != NULL;
>>>  #endif
>>>
>>> +  if (dump_enabled_p ())
>>> +    dump_inline_decision (e, early);
>>> +
>>>    /* Don't inline inlined edges.  */
>>>    gcc_assert (e->inline_failed);
>>>    /* Don't even think of inlining inline clone.  */
>>> Index: doc/invoke.texi
>>> ===================================================================
>>> --- doc/invoke.texi     (revision 201461)
>>> +++ doc/invoke.texi     (working copy)
>>> @@ -6234,6 +6234,9 @@ Enable dumps from all loop optimizations.
>>>  Enable dumps from all inlining optimizations.
>>>  @item vec
>>>  Enable dumps from all vectorization optimizations.
>>> +@item optall
>>> +Enable dumps from all optimizations. This is a superset of
>>> +the optimization groups listed above.
>>>  @end table
>>>
>>>  For example,
>>> Index: profile.c
>>> ===================================================================
>>> --- profile.c   (revision 201461)
>>> +++ profile.c   (working copy)
>>> @@ -432,8 +432,8 @@ read_profile_edge_counts (gcov_type *exec_counts)
>>>                     if (flag_profile_correction)
>>>                       {
>>>                         static bool informed = 0;
>>> -                       if (!informed)
>>> -                         inform (input_location,
>>> +                       if (dump_enabled_p () && !informed)
>>> +                         dump_printf_loc (MSG_NOTE, input_location,
>>>                                   "corrupted profile info: edge count exceeds maximal count");
>>>                         informed = 1;
>>>                       }
>>> @@ -692,10 +692,11 @@ compute_branch_probabilities (unsigned cfg_checksu
>>>         {
>>>           /* Inconsistency detected. Make it flow-consistent. */
>>>           static int informed = 0;
>>> -         if (informed == 0)
>>> +         if (dump_enabled_p () && informed == 0)
>>>             {
>>>               informed = 1;
>>> -             inform (input_location, "correcting inconsistent profile data");
>>> +             dump_printf_loc (MSG_NOTE, input_location,
>>> +                              "correcting inconsistent profile data");
>>>             }
>>>           correct_negative_edge_counts ();
>>>           /* Set bb counts to the sum of the outgoing edge counts */
>>> Index: passes.c
>>> ===================================================================
>>> --- passes.c    (revision 201461)
>>> +++ passes.c    (working copy)
>>> @@ -524,6 +524,11 @@ pass_manager::register_one_dump_file (struct opt_p
>>>    flag_name = concat (prefix, name, num, NULL);
>>>    glob_name = concat (prefix, name, NULL);
>>>    optgroup_flags |= pass->optinfo_flags;
>>> +  /* For any passes that do not have an optgroup set, and which are not
>>> +     IPA passes setup above, set the optgroup to OPTGROUP_OTHER so that
>>> +     any dump messages are emitted properly under -fopt-info(-optall).  */
>>> +  if (optgroup_flags == OPTGROUP_NONE)
>>> +    optgroup_flags = OPTGROUP_OTHER;
>>>    id = dump_register (dot_name, flag_name, glob_name, flags, optgroup_flags);
>>>    set_pass_for_id (id, pass);
>>>    full_name = concat (prefix, pass->name, num, NULL);
>>> Index: value-prof.c
>>> ===================================================================
>>> --- value-prof.c        (revision 201461)
>>> +++ value-prof.c        (working copy)
>>> @@ -585,9 +585,11 @@ check_counter (gimple stmt, const char * name,
>>>                : DECL_SOURCE_LOCATION (current_function_decl);
>>>        if (flag_profile_correction)
>>>          {
>>> -         inform (locus, "correcting inconsistent value profile: "
>>> -                 "%s profiler overall count (%d) does not match BB count "
>>> -                  "(%d)", name, (int)*all, (int)bb_count);
>>> +          if (dump_enabled_p ())
>>> +            dump_printf_loc (MSG_MISSED_OPTIMIZATION, locus,
>>> +                             "correcting inconsistent value profile: %s "
>>> +                             "profiler overall count (%d) does not match BB "
>>> +                             "count (%d)", name, (int)*all, (int)bb_count);
>>>           *all = bb_count;
>>>           if (*count > *all)
>>>              *count = *all;
>>> @@ -1209,9 +1211,11 @@ find_func_by_funcdef_no (int func_id)
>>>    int max_id = get_last_funcdef_no ();
>>>    if (func_id >= max_id || cgraph_node_map[func_id] == NULL)
>>>      {
>>> -      if (flag_profile_correction)
>>> -        inform (DECL_SOURCE_LOCATION (current_function_decl),
>>> -                "Inconsistent profile: indirect call target (%d) does not exist", func_id);
>>> +      if (flag_profile_correction && dump_enabled_p ())
>>> +        dump_printf_loc (MSG_MISSED_OPTIMIZATION,
>>> +                         DECL_SOURCE_LOCATION (current_function_decl),
>>> +                         "Inconsistent profile: indirect call target (%d) "
>>> +                         "does not exist", func_id);
>>>        else
>>>          error ("Inconsistent profile: indirect call target (%d) does not exist", func_id);
>>>
>>> @@ -1235,8 +1239,10 @@ check_ic_target (gimple call_stmt, struct cgraph_n
>>>       return true;
>>>
>>>     locus =  gimple_location (call_stmt);
>>> -   inform (locus, "Skipping target %s with mismatching types for icall ",
>>> -           cgraph_node_name (target));
>>> +   if (dump_enabled_p ())
>>> +     dump_printf_loc (MSG_MISSED_OPTIMIZATION, locus,
>>> +                      "Skipping target %s with mismatching types for icall ",
>>> +                      cgraph_node_name (target));
>>>     return false;
>>>  }
>>>
>>> Index: coverage.c
>>> ===================================================================
>>> --- coverage.c  (revision 201461)
>>> +++ coverage.c  (working copy)
>>> @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.  If not see
>>>  #include "langhooks.h"
>>>  #include "hash-table.h"
>>>  #include "tree-iterator.h"
>>> +#include "tree-pass.h"
>>>  #include "cgraph.h"
>>>  #include "dumpfile.h"
>>>  #include "diagnostic-core.h"
>>> @@ -341,11 +342,13 @@ get_coverage_counts (unsigned counter, unsigned ex
>>>      {
>>>        static int warned = 0;
>>>
>>> -      if (!warned++)
>>> -       inform (input_location, (flag_guess_branch_prob
>>> -                ? "file %s not found, execution counts estimated"
>>> -                : "file %s not found, execution counts assumed to be zero"),
>>> -               da_file_name);
>>> +      if (!warned++ && dump_enabled_p ())
>>> +       dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>>> +                         (flag_guess_branch_prob
>>> +                          ? "file %s not found, execution counts estimated"
>>> +                          : "file %s not found, execution counts assumed to "
>>> +                            "be zero"),
>>> +                         da_file_name);
>>>        return NULL;
>>>      }
>>>
>>> @@ -369,21 +372,25 @@ get_coverage_counts (unsigned counter, unsigned ex
>>>         warning_at (input_location, OPT_Wcoverage_mismatch,
>>>                     "the control flow of function %qE does not match "
>>>                     "its profile data (counter %qs)", id, ctr_names[counter]);
>>> -      if (warning_printed)
>>> +      if (warning_printed && dump_enabled_p ())
>>>         {
>>> -        inform (input_location, "use -Wno-error=coverage-mismatch to tolerate "
>>> -                "the mismatch but performance may drop if the function is hot");
>>> +          dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>>> +                           "use -Wno-error=coverage-mismatch to tolerate "
>>> +                           "the mismatch but performance may drop if the "
>>> +                           "function is hot");
>>>
>>>           if (!seen_error ()
>>>               && !warned++)
>>>             {
>>> -             inform (input_location, "coverage mismatch ignored");
>>> -             inform (input_location, flag_guess_branch_prob
>>> -                     ? G_("execution counts estimated")
>>> -                     : G_("execution counts assumed to be zero"));
>>> +             dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>>> +                               "coverage mismatch ignored");
>>> +             dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>>> +                               flag_guess_branch_prob
>>> +                               ? G_("execution counts estimated")
>>> +                               : G_("execution counts assumed to be zero"));
>>>               if (!flag_guess_branch_prob)
>>> -               inform (input_location,
>>> -                       "this can result in poorly optimized code");
>>> +               dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>>> +                                 "this can result in poorly optimized code");
>>>             }
>>>         }
>>>
>>> @@ -1103,6 +1110,11 @@ coverage_init (const char *filename)
>>>    int len = strlen (filename);
>>>    int prefix_len = 0;
>>>
>>> +  /* Since coverage_init is invoked very early, before the pass
>>> +     manager, we need to set up the dumping explicitly. This is
>>> +     similar to the handling in finish_optimization_passes.  */
>>> +  dump_start (pass_profile.pass.static_pass_number, NULL);
>>> +
>>>    if (!profile_data_prefix && !IS_ABSOLUTE_PATH (filename))
>>>      profile_data_prefix = getpwd ();
>>>
>>> @@ -1145,6 +1157,8 @@ coverage_init (const char *filename)
>>>           gcov_write_unsigned (bbg_file_stamp);
>>>         }
>>>      }
>>> +
>>> +  dump_finish (pass_profile.pass.static_pass_number);
>>>  }
>>>
>>>  /* Performs file-level cleanup.  Close notes file, generate coverage
>>> Index: ipa-inline.c
>>> ===================================================================
>>> --- ipa-inline.c        (revision 201461)
>>> +++ ipa-inline.c        (working copy)
>>> @@ -1322,7 +1322,7 @@ recursive_inlining (struct cgraph_edge *edge,
>>>            reset_edge_growth_cache (curr);
>>>         }
>>>
>>> -      inline_call (curr, false, new_edges, &overall_size, true);
>>> +      inline_call (curr, false, new_edges, &overall_size, true, false);
>>>        lookup_recursive_calls (node, curr->callee, heap);
>>>        n++;
>>>      }
>>> @@ -1612,7 +1612,8 @@ inline_small_functions (void)
>>>             fprintf (dump_file, " Peeling recursion with depth %i\n", depth);
>>>
>>>           gcc_checking_assert (!callee->global.inlined_to);
>>> -         inline_call (edge, true, &new_indirect_edges, &overall_size, true);
>>> +         inline_call (edge, true, &new_indirect_edges, &overall_size, true,
>>> +                       false);
>>>           if (flag_indirect_inlining)
>>>             add_new_edges_to_heap (edge_heap, new_indirect_edges);
>>>
>>> @@ -1733,7 +1734,7 @@ flatten_function (struct cgraph_node *node, bool e
>>>                  xstrdup (cgraph_node_name (callee)),
>>>                  xstrdup (cgraph_node_name (e->caller)));
>>>        orig_callee = callee;
>>> -      inline_call (e, true, NULL, NULL, false);
>>> +      inline_call (e, true, NULL, NULL, false, early);
>>>        if (e->callee != orig_callee)
>>>         orig_callee->symbol.aux = (void *) node;
>>>        flatten_function (e->callee, early);
>>> @@ -1852,7 +1853,8 @@ ipa_inline (void)
>>>                                    inline_summary (node->callers->caller)->size);
>>>                         }
>>>
>>> -                     inline_call (node->callers, true, NULL, NULL, true);
>>> +                     inline_call (node->callers, true, NULL, NULL, true,
>>> +                                   false);
>>>                       if (dump_file)
>>>                         fprintf (dump_file,
>>>                                  " Inlined into %s which now has %i size\n",
>>> @@ -1925,7 +1927,7 @@ inline_always_inline_functions (struct cgraph_node
>>>         fprintf (dump_file, "  Inlining %s into %s (always_inline).\n",
>>>                  xstrdup (cgraph_node_name (e->callee)),
>>>                  xstrdup (cgraph_node_name (e->caller)));
>>> -      inline_call (e, true, NULL, NULL, false);
>>> +      inline_call (e, true, NULL, NULL, false, true);
>>>        inlined = true;
>>>      }
>>>    if (inlined)
>>> @@ -1977,7 +1979,7 @@ early_inline_small_functions (struct cgraph_node *
>>>         fprintf (dump_file, " Inlining %s into %s.\n",
>>>                  xstrdup (cgraph_node_name (callee)),
>>>                  xstrdup (cgraph_node_name (e->caller)));
>>> -      inline_call (e, true, NULL, NULL, true);
>>> +      inline_call (e, true, NULL, NULL, true, true);
>>>        inlined = true;
>>>      }
>>>
>>> Index: ipa-inline.h
>>> ===================================================================
>>> --- ipa-inline.h        (revision 201461)
>>> +++ ipa-inline.h        (working copy)
>>> @@ -228,7 +228,8 @@ void free_growth_caches (void);
>>>  void compute_inline_parameters (struct cgraph_node *, bool);
>>>
>>>  /* In ipa-inline-transform.c  */
>>> -bool inline_call (struct cgraph_edge *, bool, vec<cgraph_edge_p> *, int *, bool);
>>> +bool inline_call (struct cgraph_edge *, bool, vec<cgraph_edge_p> *, int *,
>>> +                  bool, bool);
>>>  unsigned int inline_transform (struct cgraph_node *);
>>>  void clone_inlined_nodes (struct cgraph_edge *e, bool, bool, int *);
>>>
>>> Index: testsuite/gcc.dg/pr40209.c
>>> ===================================================================
>>> --- testsuite/gcc.dg/pr40209.c  (revision 201461)
>>> +++ testsuite/gcc.dg/pr40209.c  (working copy)
>>> @@ -1,5 +1,5 @@
>>>  /* { dg-do compile } */
>>> -/* { dg-options "-O2 -fprofile-use" } */
>>> +/* { dg-options "-O2 -fprofile-use -fopt-info" } */
>>>
>>>  void process(const char *s);
>>>
>>> Index: testsuite/gcc.dg/pr26570.c
>>> ===================================================================
>>> --- testsuite/gcc.dg/pr26570.c  (revision 201461)
>>> +++ testsuite/gcc.dg/pr26570.c  (working copy)
>>> @@ -1,5 +1,5 @@
>>>  /* { dg-do compile } */
>>> -/* { dg-options "-O2 -fprofile-generate -fprofile-use" } */
>>> +/* { dg-options "-O2 -fprofile-generate -fprofile-use -fopt-info" } */
>>>
>>>  unsigned test (unsigned a, unsigned b)
>>>  {
>>> Index: testsuite/gcc.dg/pr32773.c
>>> ===================================================================
>>> --- testsuite/gcc.dg/pr32773.c  (revision 201461)
>>> +++ testsuite/gcc.dg/pr32773.c  (working copy)
>>> @@ -1,6 +1,6 @@
>>>  /* { dg-do compile } */
>>> -/* { dg-options "-O -fprofile-use" } */
>>> -/* { dg-options "-O -m4 -fprofile-use" { target sh-*-* } } */
>>> +/* { dg-options "-O -fprofile-use -fopt-info" } */
>>> +/* { dg-options "-O -m4 -fprofile-use -fopt-info" { target sh-*-* } } */
>>>
>>>  void foo (int *p)
>>>  {
>>> Index: testsuite/g++.dg/tree-ssa/dom-invalid.C
>>> ===================================================================
>>> --- testsuite/g++.dg/tree-ssa/dom-invalid.C     (revision 201461)
>>> +++ testsuite/g++.dg/tree-ssa/dom-invalid.C     (working copy)
>>> @@ -1,7 +1,7 @@
>>>  // PR tree-optimization/39557
>>>  // invalid post-dom info leads to infinite loop
>>>  // { dg-do run }
>>> -// { dg-options "-Wall -fno-exceptions -O2 -fprofile-use -fno-rtti" }
>>> +// { dg-options "-Wall -fno-exceptions -O2 -fprofile-use -fopt-info -fno-rtti" }
>>>
>>>  struct C
>>>  {
>>> Index: testsuite/gcc.dg/inline-dump.c
>>> ===================================================================
>>> --- testsuite/gcc.dg/inline-dump.c      (revision 0)
>>> +++ testsuite/gcc.dg/inline-dump.c      (revision 0)
>>> @@ -0,0 +1,11 @@
>>> +/* Verify that -fopt-info can output correct inline info.  */
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-Wall -fopt-info-inline=stderr -O2 -fno-early-inlining" } */
>>> +static inline int leaf() {
>>> +  int i, ret = 0;
>>> +  for (i = 0; i < 10; i++)
>>> +    ret += i;
>>> +  return ret;
>>> +}
>>> +static inline int foo(void) { return leaf(); } /* { dg-message "note: leaf .*inlined into bar .*via inline instance foo.*\n" } */
>>> +int bar(void) { return foo(); }
>>>>
>>>> Thanks,
>>>> Teresa
>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Martin
>>>>
>>>>
>>>>
>>>> --
>>>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
Teresa Johnson Aug. 28, 2013, 4:07 p.m. UTC | #8
On Wed, Aug 28, 2013 at 7:09 AM, Teresa Johnson <tejohnson@google.com> wrote:
> On Wed, Aug 28, 2013 at 4:01 AM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Wed, Aug 7, 2013 at 7:23 AM, Teresa Johnson <tejohnson@google.com> wrote:
>>> On Tue, Aug 6, 2013 at 9:29 AM, Teresa Johnson <tejohnson@google.com> wrote:
>>>> On Tue, Aug 6, 2013 at 9:01 AM, Martin Jambor <mjambor@suse.cz> wrote:
>>>>> Hi,
>>>>>
>>>>> On Tue, Aug 06, 2013 at 07:14:42AM -0700, Teresa Johnson wrote:
>>>>>> On Tue, Aug 6, 2013 at 5:37 AM, Martin Jambor <mjambor@suse.cz> wrote:
>>>>>> > On Mon, Aug 05, 2013 at 10:37:00PM -0700, Teresa Johnson wrote:
>>>>>> >> This patch ports messages to the new dump framework,
>>>>>> >
>>>>>> > It would be great this new framework was documented somewhere.  I lost
>>>>>> > track of what was agreed it would be and from the uses in the
>>>>>> > vectorizer I was never quite sure how to utilize it in other passes.
>>>>>>
>>>>>> Cc'ing Sharad who implemented this - Sharad, is this documented on a
>>>>>> wiki or elsewhere?
>>>>>
>>>>> Thanks
>>>>>
>>>>>>
>>>>>> >
>>>>>> > I'd also like to point out two other minor things inline:
>>>>>> >
>>>>>> > [...]
>>>>>> >
>>>>>> >> 2013-08-06  Teresa Johnson  <tejohnson@google.com>
>>>>>> >>             Dehao Chen  <dehao@google.com>
>>>>>> >>
>>>>>> >>         * dumpfile.c (dump_loc): Add column number to output, make newlines
>>>>>> >>         consistent.
>>>>>> >>         * dumpfile.h (OPTGROUP_OTHER): Add and enable under OPTGROUP_ALL.
>>>>>> >>         * ipa-inline-transform.c (clone_inlined_nodes):
>>>>>> >>         (cgraph_node_opt_info): New function.
>>>>>> >>         (cgraph_node_call_chain): Ditto.
>>>>>> >>         (dump_inline_decision): Ditto.
>>>>>> >>         (inline_call): Invoke dump_inline_decision.
>>>>>> >>         * doc/invoke.texi: Document optall -fopt-info flag.
>>>>>> >>         * profile.c (read_profile_edge_counts): Use new dump framework.
>>>>>> >>         (compute_branch_probabilities): Ditto.
>>>>>> >>         * passes.c (pass_manager::register_one_dump_file): Use OPTGROUP_OTHER
>>>>>> >>         when pass not in any opt group.
>>>>>> >>         * value-prof.c (check_counter): Use new dump framework.
>>>>>> >>         (find_func_by_funcdef_no): Ditto.
>>>>>> >>         (check_ic_target): Ditto.
>>>>>> >>         * coverage.c (get_coverage_counts): Ditto.
>>>>>> >>         (coverage_init): Setup new dump framework.
>>>>>> >>         * ipa-inline.c (inline_small_functions): Set is_in_ipa_inline.
>>>>>> >>         * ipa-inline.h (is_in_ipa_inline): Declare.
>>>>>> >>
>>>>>> >>         * testsuite/gcc.dg/pr40209.c: Use -fopt-info.
>>>>>> >>         * testsuite/gcc.dg/pr26570.c: Ditto.
>>>>>> >>         * testsuite/gcc.dg/pr32773.c: Ditto.
>>>>>> >>         * testsuite/g++.dg/tree-ssa/dom-invalid.C (struct C): Ditto.
>>>>>> >>
>>>>>> >
>>>>>> > [...]
>>>>>> >
>>>>>> >> Index: ipa-inline-transform.c
>>>>>> >> ===================================================================
>>>>>> >> --- ipa-inline-transform.c      (revision 201461)
>>>>>> >> +++ ipa-inline-transform.c      (working copy)
>>>>>> >> @@ -192,6 +192,108 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d
>>>>>> >>  }
>>>>>> >>
>>>>>> >>
>>>>>> >> +#define MAX_INT_LENGTH 20
>>>>>> >> +
>>>>>> >> +/* Return NODE's name and profile count, if available.  */
>>>>>> >> +
>>>>>> >> +static const char *
>>>>>> >> +cgraph_node_opt_info (struct cgraph_node *node)
>>>>>> >> +{
>>>>>> >> +  char *buf;
>>>>>> >> +  size_t buf_size;
>>>>>> >> +  const char *bfd_name = lang_hooks.dwarf_name (node->symbol.decl, 0);
>>>>>> >> +
>>>>>> >> +  if (!bfd_name)
>>>>>> >> +    bfd_name = "unknown";
>>>>>> >> +
>>>>>> >> +  buf_size = strlen (bfd_name) + 1;
>>>>>> >> +  if (profile_info)
>>>>>> >> +    buf_size += (MAX_INT_LENGTH + 3);
>>>>>> >> +
>>>>>> >> +  buf = (char *) xmalloc (buf_size);
>>>>>> >> +
>>>>>> >> +  strcpy (buf, bfd_name);
>>>>>> >> +
>>>>>> >> +  if (profile_info)
>>>>>> >> +    sprintf (buf, "%s ("HOST_WIDEST_INT_PRINT_DEC")", buf, node->count);
>>>>>> >> +  return buf;
>>>>>> >> +}
>>>>>> >
>>>>>> > I'm not sure if output of this function is aimed only at the user or
>>>>>> > if it is supposed to be used by gcc developers as well.  If the
>>>>>> > latter, an incredibly useful thing is to also dump node->symbol.order
>>>>>> > too.  We usually dump it after "/" sign separating it from node name.
>>>>>> > It is invaluable when examining decisions in C++ code where you can
>>>>>> > have lots of clones of a node (and also because existing dumps print
>>>>>> > it, it is easy to combine them).
>>>>>>
>>>>>> The output is useful for both power users doing performance tuning of
>>>>>> their application and gcc developers. Adding the id is not so
>>>>>> useful for the former, but I agree that it is very useful for compiler
>>>>>> developers. In fact, in the google branch version we emit more verbose
>>>>>> information (the lipo module id and the funcdef_no) to help uniquely
>>>>>> identify the routines and to aid in post-processing by humans and
>>>>>> tools. So it is probably useful to add something similar here too. Is
>>>>>> the node->symbol.order more or less unique than the funcdef_no? I see
>>>>>> that you added a patch a few months ago to print the
>>>>>> node->symbol.order in the function header, and it also has the
>>>>>> advantage as you note of matching up with existing ipa dumps.
>>>>>
>>>>> node->symbol.order is unique and if I remember correctly, it is not
>>>>> even recycled.  Clones, inline clones, thunks, every symbol table node
>>>>> gets its own symbol order so it should be more unique than funcdef_no.
>>>>> On the other hand it may be a bit cryptic for users but at the same
>>>>> time it is only one number.
>>>>
>>>> Ok, I am going to go ahead and add this to the output.
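
A side note on cgraph_node_opt_info as posted: it passes buf to sprintf
both as the destination and as a %s argument, and the C standard leaves
sprintf undefined when copying takes place between overlapping objects.
A minimal sketch of one alternative, building the label in a single
snprintf call, is below. struct node_info and format_node_label are
hypothetical stand-ins rather than GCC names, and plain malloc is used
instead of xmalloc so the sketch compiles on its own.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the cgraph_node fields used by the helper.  */
struct node_info
{
  const char *name;   /* printable name, may be NULL */
  int order;          /* plays the role of node->symbol.order */
  long long count;    /* plays the role of node->count */
};

/* Return a malloc'd "name [order]" label, with " (count)" appended when
   HAVE_PROFILE is nonzero.  The caller owns the result and frees it.  */
static char *
format_node_label (const struct node_info *node, int have_profile)
{
  const char *name = node->name ? node->name : "unknown";
  char *buf;
  int len;

  /* A NULL/0 call only measures: snprintf returns the length needed.  */
  if (have_profile)
    len = snprintf (NULL, 0, "%s [%d] (%lld)", name, node->order,
                    node->count);
  else
    len = snprintf (NULL, 0, "%s [%d]", name, node->order);
  assert (len >= 0);

  buf = (char *) malloc ((size_t) len + 1);
  if (!buf)
    abort ();

  /* Second call writes into a buffer that no argument aliases.  */
  if (have_profile)
    snprintf (buf, (size_t) len + 1, "%s [%d] (%lld)", name, node->order,
              node->count);
  else
    snprintf (buf, (size_t) len + 1, "%s [%d]", name, node->order);
  return buf;
}
```

Measuring first also removes the need for a fixed MAX_INT_LENGTH guess
when sizing the buffer.
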
>>>>
>>>>>
>>>>>>
>>>>>> >
>>>>>> > [...]
>>>>>> >
>>>>>> >> Index: ipa-inline.c
>>>>>> >> ===================================================================
>>>>>> >> --- ipa-inline.c        (revision 201461)
>>>>>> >> +++ ipa-inline.c        (working copy)
>>>>>> >> @@ -118,6 +118,9 @@ along with GCC; see the file COPYING3.  If not see
>>>>>> >>  static int overall_size;
>>>>>> >>  static gcov_type max_count;
>>>>>> >>
>>>>>> >> +/* Global variable to denote if it is in ipa-inline pass. */
>>>>>> >> +bool is_in_ipa_inline = false;
>>>>>> >> +
>>>>>> >>  /* Return false when inlining edge E would lead to violating
>>>>>> >>     limits on function unit growth or stack usage growth.
>>>>>> >>
>>>>>> >
>>>>>> > In this age of removing global variables, are you sure you need this?
>>>>>> > The only user of this seems to be a function that is only being called
>>>>>> > from inline_call... can that ever happen when not inlining?  If you
>>>>>> > plan to use this function also elsewhere, perhaps the callers will
>>>>>> > know whether we are inlining or not and can provide this in a
>>>>>> > parameter?
>>>>>>
>>>>>> This is to distinguish early inlining from ipa inlining.
>>>>>
>>>>> Oh, right, I did not realize that the IPA part was the important bit
>>>>> of the name.
>>>>>
>>>>>> The volume of
>>>>>> early inlining messages is too high to be on for the default setting
>>>>>> of -fopt-info, and they are usually not as interesting for performance
>>>>>> tuning. The dumper will only emit the early inline messages under a
>>>>>> more verbose setting (MSG_NOTE):
>>>>>>       dump_printf_loc (is_in_ipa_inline ? MSG_OPTIMIZED_LOCATIONS : MSG_NOTE ...
>>>>>> The other way I can see to distinguish this would be to check the
>>>>>> always_inline_functions_inlined flag on the caller's function. It
>>>>>> could also be possible to pass down a flag from the callers of
>>>>>> inline_call, but at least one caller (flatten_functions) is shared
>>>>>> between early and late inlining, so the flag needs to be passed
>>>>>> through that as well. WDYT?
>>>>>
>>>>> Did you mean flatten_function?  It already has a bool "early"
>>>>> parameter.  But I can see that being able to quickly figure out
>>>>> whether we are in early inliner or ipa inliner without much hassle is
>>>>> useful enough to justify a global variable a month ago, however I
>>>>> suppose we should not be introducing them now and so you'd have to put
>>>>> such stuff into... well, you'd probably have to put into the universe
>>>>> object somewhere because it is basically shared between two passes.
>>>>> Another option, even though somewhat hackish, would be to look at
>>>>> current_pass and see which pass it is.  I don't know, do what is
>>>>> easier or what you like more, just be aware of the problem.
>>>>
>>>> After thinking about this some more, I think passing down an early
>>>> flag from callers is the cleanest way to go.
>>>>
>>>> I'll fix these and post a new patch later today.
>>>
>>> New patch below that removes this global variable, and also outputs
>>> the node->symbol.order (in square brackets after the function name so
>>> as to not clutter it). Inline messages with profile data look like:
>>>
>>> test.c:8:3: note: foobar [0] (99999000) inlined into foo [2] (1000)
>>> with call count 99999000 (via inline instance bar [3] (99999000))
>>
>> Ick.  This looks both redundant and cluttered.  This is supposed to be
>> understandable by GCC users, not only GCC developers.
>
> The main part that is only useful/understandable to gcc developers is
> the node->symbol.order in square brackets, requested by Martin. One
> possibility is that I could put that part under a param, disabled by
> default. We have something similar on the google branches that emits
> LIPO module info in the message, enabled via a param.
>
> I'd argue that the other information (the profile counts, emitted only
> when using -fprofile-use, and the inline call chains) is useful if
> you want to understand whether and how critical inlines are occurring.
> I think this is the type of information that users focused on
> optimizations, as well as gcc developers, want when they use
> -fopt-info. Otherwise it is difficult to make sense of the inline
> information.
>
>>
>>> (without FDO the counts in parentheses and the call count would not be
>>> included).
>>>
>>> Ok for trunk?
>>
>> Let's split this patch.
>
> Ok.
>
>>
>>> Thanks,
>>> Teresa
>>>
>>> 2013-08-06  Teresa Johnson  <tejohnson@google.com>
>>>             Dehao Chen  <dehao@google.com>
>>>
>>>         * dumpfile.c (dump_loc): Output column number, make newlines consistent.
>>
>> I don't like column numbers; they are generally not of much use.
>
> I added these here to get consistency with other messages (notes
> emitted via inform(), warnings, errors). Plus the dg-message testing
> was failing for the test cases that parse this output, since it
> expects the column to exist.

The above change (output column number) and the changes in the
testsuite go with the change you have approved below (due to moving
some profile messages to the new framework). Ok to commit these along
with that approved portion?

Thanks,
Teresa

>
>> Does
>> 'make newlines consistent' avoid all the spurious vertical spacing I see with
>> -fopt-info?
>
> Well, it helps get us there. The problem was that before, since
> dump_loc was not consistently emitting newlines, the calls had to emit
> their own newlines manually in the string to ensure there was a
> newline at all. I was thinking that once this is fixed I could go back
> and clean up all those calls by removing the newlines in the string. I
> could split this part into a separate patch and do both at once.
>
> However, after thinking about this some more this morning, I am
> wondering whether it is better to remove the newline emission
> completely from dump_loc and rely on the caller to put the newline in
> the string. The reason is that there are 2 high level interfaces to
> the new dump infrastructure, dump_printf() and dump_printf_loc(). Only
> the latter invokes dump_loc and gets the newline at the start of the
> message. The typical usage seems to be to start a message via
> dump_printf_loc, and then use dump_printf to emit parts of the message
> (thus not requiring a newline), but I think it may lead to problems to
> rely on this assumption.
>
> So if you agree, I will simply remove the newline altogether from
> dump_loc, and ensure that all clients of dump_printf/dump_printf_loc
> include a newline char as appropriate in the string they pass.
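
To make the proposed convention concrete, here is a small sketch in
which neither entry point adds a newline and the caller ends the
message itself. mock_dump_printf, mock_dump_printf_loc and the
in-memory sink are hypothetical mocks written for this illustration;
they only imitate the shape of dump_printf and dump_printf_loc and are
not the GCC implementations.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory sink standing in for the dump stream.  */
static char sink[256];
static size_t sink_len;

/* Append formatted text to the sink, clamping on overflow.  */
static void
sink_vappend (const char *fmt, va_list ap)
{
  size_t room = sizeof sink - sink_len;
  int n = vsnprintf (sink + sink_len, room, fmt, ap);

  assert (n >= 0);
  /* On truncation vsnprintf reports the untruncated length.  */
  sink_len += ((size_t) n < room) ? (size_t) n : room - 1;
}

/* Mock of dump_printf: emits the text and nothing else.  */
static void
mock_dump_printf (const char *fmt, ...)
{
  va_list ap;

  va_start (ap, fmt);
  sink_vappend (fmt, ap);
  va_end (ap);
}

/* Mock of dump_printf_loc: emits a location prefix and the text, with
   no implicit newline before or after.  */
static void
mock_dump_printf_loc (const char *file, int line, int col,
                      const char *fmt, ...)
{
  va_list ap;

  mock_dump_printf ("%s:%d:%d: note: ", file, line, col);
  va_start (ap, fmt);
  sink_vappend (fmt, ap);
  va_end (ap);
}
```

With this shape, a message started with mock_dump_printf_loc and
continued with mock_dump_printf stays on one line, and only the final
piece carries the "\n".
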
>
>>
>>>         * dumpfile.h (OPTGROUP_OTHER): Add and enable under OPTGROUP_ALL.
>>
>> Good change - please split this out (with the related changes) and commit it.
>
> Ok, thanks. Will do.
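
For reference, a minimal sketch of the fallback being split out: a pass
with no optimization group falls back to OPTGROUP_OTHER, so a request
for all groups still reaches it. The LOOP, INLINE, VEC and OTHER bit
values mirror the dumpfile.h hunk in the patch; the values used for
OPTGROUP_NONE and OPTGROUP_IPA are assumptions here, and
effective_optgroup and dumps_enabled_for are hypothetical helper names.

```c
#include <assert.h>

/* LOOP, INLINE, VEC and OTHER mirror the patch; NONE and IPA are
   assumed values for illustration.  */
#define OPTGROUP_NONE    (0)
#define OPTGROUP_IPA     (1 << 1)
#define OPTGROUP_LOOP    (1 << 2)
#define OPTGROUP_INLINE  (1 << 3)
#define OPTGROUP_VEC     (1 << 4)
#define OPTGROUP_OTHER   (1 << 5)
#define OPTGROUP_ALL     (OPTGROUP_IPA | OPTGROUP_LOOP | OPTGROUP_INLINE \
                          | OPTGROUP_VEC | OPTGROUP_OTHER)

/* Group recorded for a pass: passes without any group get OTHER.  */
static int
effective_optgroup (int pass_flags)
{
  /* Only known group bits are expected.  */
  assert ((pass_flags & ~OPTGROUP_ALL) == 0);
  return pass_flags == OPTGROUP_NONE ? OPTGROUP_OTHER : pass_flags;
}

/* Nonzero when the REQUESTED group mask selects the pass.  */
static int
dumps_enabled_for (int requested, int pass_flags)
{
  return (requested & effective_optgroup (pass_flags)) != 0;
}
```

Under these assumptions an ungrouped pass is selected by OPTGROUP_ALL
but not by a narrower request such as OPTGROUP_VEC.
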
>
>>
>>>         * ipa-inline-transform.c (cgraph_node_opt_info): New function.
>>>         (cgraph_node_call_chain): Ditto.
>>>         (dump_inline_decision): Ditto.
>>>         (inline_call): Invoke dump_inline_decision, new parameter.
>>
>> The inline stuff should be split and re-sent, it's non-obvious to me (extra
>> function parameters are not documented for example).  I'd rather have
>> inline_and_report_call () for example instead of an extra bool parameter.
>> But let's iterate over this once it's split out.
>
> Ok, I will send this separately. I guess we could have a separate
> interface inline_and_report_call that is a wrapper around inline_call
> and simply invokes the dumper. Note that flatten_function will need to
> conditionally call one of the two interfaces based on the value of its
> bool early parameter though.
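
A rough sketch of one reading of that wrapper design, with the
reporting variant layered on the plain one and the flatten step picking
between them. Everything here is a hypothetical mock: struct edge,
do_inline (standing in for inline_call), inline_and_report and
flatten_one are illustration names, and the report is reduced to a
counter so the sketch stays self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a call graph edge.  */
struct edge
{
  const char *callee;
  const char *caller;
  bool inlined;
};

/* Number of inline decisions reported so far.  */
static int report_count;

/* Stands in for the dumper; a real one would format a message.  */
static void
report_inline (const struct edge *e)
{
  assert (e->callee != NULL && e->caller != NULL);
  report_count++;
}

/* Stands in for inline_call: performs the inlining, reports nothing.  */
static bool
do_inline (struct edge *e)
{
  e->inlined = true;
  return true;
}

/* Wrapper: report the decision, then defer to the plain interface.  */
static bool
inline_and_report (struct edge *e)
{
  report_inline (e);
  return do_inline (e);
}

/* Shared caller: early inlining stays quiet, late inlining reports.  */
static bool
flatten_one (struct edge *e, bool early)
{
  return early ? do_inline (e) : inline_and_report (e);
}
```

The trade-off is that the choice moves into each shared caller instead
of travelling as an extra bool through inline_call.
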
>
>>
>>>         * doc/invoke.texi: Document optall -fopt-info flag.
>>>         * profile.c (read_profile_edge_counts): Use new dump framework.
>>>         (compute_branch_probabilities): Ditto.
>>>         * passes.c (pass_manager::register_one_dump_file): Use OPTGROUP_OTHER
>>>         when pass not in any opt group.
>>>         * value-prof.c (check_counter): Use new dump framework.
>>>         (find_func_by_funcdef_no): Ditto.
>>>         (check_ic_target): Ditto.
>>>         * coverage.c (get_coverage_counts): Ditto.
>>>         (coverage_init): Setup new dump framework.
>>
>> These pieces look good to me.
>>
>>>         * ipa-inline.c (recursive_inlining): New inline_call parameter.
>>>         (inline_small_functions): Ditto.
>>>         (flatten_function): Ditto.
>>>         (ipa_inline): Ditto.
>>>         (inline_always_inline_functions): Ditto.
>>>         (early_inline_small_functions): Ditto.
>>>         * ipa-inline.h: Ditto.
>>>
>>>         * testsuite/gcc.dg/pr40209.c: Use -fopt-info.
>>>         * testsuite/gcc.dg/pr26570.c: Ditto.
>>>         * testsuite/gcc.dg/pr32773.c: Ditto.
>>>         * testsuite/g++.dg/tree-ssa/dom-invalid.C: Ditto.
>>
>> Why?  Just remove the stray dg- annotations that deal with the unwanted output?
>
> Because there are dg-message annotations that want to confirm this output.
>
> Teresa
>
>>
>> Thanks,
>> Richard.
>>
>>>         * testsuite/gcc.dg/inline-dump.c: New test.
>>>
>>> Index: dumpfile.c
>>> ===================================================================
>>> --- dumpfile.c  (revision 201461)
>>> +++ dumpfile.c  (working copy)
>>> @@ -257,16 +257,18 @@ dump_open_alternate_stream (struct dump_file_info
>>>  void
>>>  dump_loc (int dump_kind, FILE *dfile, source_location loc)
>>>  {
>>> -  /* Currently vectorization passes print location information.  */
>>>    if (dump_kind)
>>>      {
>>> +      /* Ensure dump message starts on a new line.  */
>>> +      fprintf (dfile, "\n");
>>>        if (LOCATION_LOCUS (loc) > BUILTINS_LOCATION)
>>> -        fprintf (dfile, "\n%s:%d: note: ", LOCATION_FILE (loc),
>>> -                 LOCATION_LINE (loc));
>>> +        fprintf (dfile, "%s:%d:%d: note: ", LOCATION_FILE (loc),
>>> +                 LOCATION_LINE (loc), LOCATION_COLUMN (loc));
>>>        else if (current_function_decl)
>>> -        fprintf (dfile, "\n%s:%d: note: ",
>>> +        fprintf (dfile, "%s:%d:%d: note: ",
>>>                   DECL_SOURCE_FILE (current_function_decl),
>>> -                 DECL_SOURCE_LINE (current_function_decl));
>>> +                 DECL_SOURCE_LINE (current_function_decl),
>>> +                 DECL_SOURCE_COLUMN (current_function_decl));
>>>      }
>>>  }
>>>
>>> Index: dumpfile.h
>>> ===================================================================
>>> --- dumpfile.h  (revision 201461)
>>> +++ dumpfile.h  (working copy)
>>> @@ -97,8 +97,9 @@ enum tree_dump_index
>>>  #define OPTGROUP_LOOP        (1 << 2)   /* Loop optimization passes */
>>>  #define OPTGROUP_INLINE      (1 << 3)   /* Inlining passes */
>>>  #define OPTGROUP_VEC         (1 << 4)   /* Vectorization passes */
>>> +#define OPTGROUP_OTHER       (1 << 5)   /* All other passes */
>>>  #define OPTGROUP_ALL        (OPTGROUP_IPA | OPTGROUP_LOOP | OPTGROUP_INLINE \
>>> -                              | OPTGROUP_VEC)
>>> +                              | OPTGROUP_VEC | OPTGROUP_OTHER)
>>>
>>>  /* Define a tree dump switch.  */
>>>  struct dump_file_info
>>> Index: ipa-inline-transform.c
>>> ===================================================================
>>> --- ipa-inline-transform.c      (revision 201461)
>>> +++ ipa-inline-transform.c      (working copy)
>>> @@ -192,6 +192,111 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d
>>>  }
>>>
>>>
>>> +#define MAX_INT_LENGTH 20
>>> +
>>> +/* Return NODE's name and profile count, if available.  */
>>> +
>>> +static const char *
>>> +cgraph_node_opt_info (struct cgraph_node *node)
>>> +{
>>> +  char *buf;
>>> +  size_t buf_size;
>>> +  const char *bfd_name = lang_hooks.dwarf_name (node->symbol.decl, 0);
>>> +
>>> +  if (!bfd_name)
>>> +    bfd_name = "unknown";
>>> +
>>> +  buf_size = strlen (bfd_name) + 1;
>>> +  if (profile_info)
>>> +    buf_size += (MAX_INT_LENGTH + 3);
>>> +  buf_size += MAX_INT_LENGTH;
>>> +
>>> +  buf = (char *) xmalloc (buf_size);
>>> +
>>> +  strcpy (buf, bfd_name);
>>> +  //sprintf (buf, "%s/%i", buf, node->symbol.order);
>>> +  sprintf (buf, "%s [%i]", buf, node->symbol.order);
>>> +
>>> +  if (profile_info)
>>> +    sprintf (buf, "%s ("HOST_WIDEST_INT_PRINT_DEC")", buf, node->count);
>>> +  return buf;
>>> +}
>>> +
>>> +
>>> +/* Return CALLER's inlined call chain. Save the cgraph_node of the ultimate
>>> +   function that the caller is inlined to in FINAL_CALLER.  */
>>> +
>>> +static const char *
>>> +cgraph_node_call_chain (struct cgraph_node *caller,
>>> +                       struct cgraph_node **final_caller)
>>> +{
>>> +  struct cgraph_node *node;
>>> +  const char *via_str = " (via inline instance";
>>> +  size_t current_string_len = strlen (via_str) + 1;
>>> +  size_t buf_size = current_string_len;
>>> +  char *buf = (char *) xmalloc (buf_size);
>>> +
>>> +  buf[0] = 0;
>>> +  gcc_assert (caller->global.inlined_to != NULL);
>>> +  strcat (buf, via_str);
>>> +  for (node = caller; node->global.inlined_to != NULL;
>>> +       node = node->callers->caller)
>>> +    {
>>> +      const char *name = cgraph_node_opt_info (node);
>>> +      current_string_len += (strlen (name) + 1);
>>> +      if (current_string_len >= buf_size)
>>> +       {
>>> +         buf_size = current_string_len * 2;
>>> +         buf = (char *) xrealloc (buf, buf_size);
>>> +       }
>>> +      strcat (buf, " ");
>>> +      strcat (buf, name);
>>> +    }
>>> +  strcat (buf, ")");
>>> +  *final_caller = node;
>>> +  return buf;
>>> +}
>>> +
>>> +
>>> +/* Dump the inline decision of EDGE.  */
>>> +
>>> +static void
>>> +dump_inline_decision (struct cgraph_edge *edge, bool early)
>>> +{
>>> +  location_t locus;
>>> +  const char *inline_chain_text;
>>> +  const char *call_count_text;
>>> +  struct cgraph_node *final_caller = edge->caller;
>>> +
>>> +  if (final_caller->global.inlined_to != NULL)
>>> +    inline_chain_text = cgraph_node_call_chain (final_caller, &final_caller);
>>> +  else
>>> +    inline_chain_text = "";
>>> +
>>> +  if (edge->count > 0)
>>> +    {
>>> +      const char *call_count_str = " with call count ";
>>> +      char *buf = (char *) xmalloc (strlen (call_count_str) + MAX_INT_LENGTH);
>>> +      sprintf (buf, "%s"HOST_WIDEST_INT_PRINT_DEC, call_count_str,
>>> +              edge->count);
>>> +      call_count_text = buf;
>>> +    }
>>> +  else
>>> +    {
>>> +      call_count_text = "";
>>> +    }
>>> +
>>> +  locus = gimple_location (edge->call_stmt);
>>> +  dump_printf_loc (early ? MSG_NOTE : MSG_OPTIMIZED_LOCATIONS,
>>> +                   locus,
>>> +                   "%s inlined into %s%s%s\n",
>>> +                   cgraph_node_opt_info (edge->callee),
>>> +                   cgraph_node_opt_info (final_caller),
>>> +                   call_count_text,
>>> +                   inline_chain_text);
>>> +}
>>> +
>>> +
>>>  /* Mark edge E as inlined and update callgraph accordingly.  UPDATE_ORIGINAL
>>>     specify whether profile of original function should be updated.  If any new
>>>     indirect edges are discovered in the process, add them to NEW_EDGES, unless
>>> @@ -205,7 +310,8 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d
>>>  bool
>>>  inline_call (struct cgraph_edge *e, bool update_original,
>>>              vec<cgraph_edge_p> *new_edges,
>>> -            int *overall_size, bool update_overall_summary)
>>> +            int *overall_size, bool update_overall_summary,
>>> +             bool early)
>>>  {
>>>    int old_size = 0, new_size = 0;
>>>    struct cgraph_node *to = NULL;
>>> @@ -218,6 +324,9 @@ inline_call (struct cgraph_edge *e, bool update_or
>>>    bool predicated = inline_edge_summary (e)->predicate != NULL;
>>>  #endif
>>>
>>> +  if (dump_enabled_p ())
>>> +    dump_inline_decision (e, early);
>>> +
>>>    /* Don't inline inlined edges.  */
>>>    gcc_assert (e->inline_failed);
>>>    /* Don't even think of inlining inline clone.  */
>>> Index: doc/invoke.texi
>>> ===================================================================
>>> --- doc/invoke.texi     (revision 201461)
>>> +++ doc/invoke.texi     (working copy)
>>> @@ -6234,6 +6234,9 @@ Enable dumps from all loop optimizations.
>>>  Enable dumps from all inlining optimizations.
>>>  @item vec
>>>  Enable dumps from all vectorization optimizations.
>>> +@item optall
>>> +Enable dumps from all optimizations. This is a superset of
>>> +the optimization groups listed above.
>>>  @end table
>>>
>>>  For example,
>>> Index: profile.c
>>> ===================================================================
>>> --- profile.c   (revision 201461)
>>> +++ profile.c   (working copy)
>>> @@ -432,8 +432,8 @@ read_profile_edge_counts (gcov_type *exec_counts)
>>>                     if (flag_profile_correction)
>>>                       {
>>>                         static bool informed = 0;
>>> -                       if (!informed)
>>> -                         inform (input_location,
>>> +                       if (dump_enabled_p () && !informed)
>>> +                         dump_printf_loc (MSG_NOTE, input_location,
>>>                                   "corrupted profile info: edge count exceeds maximal count");
>>>                         informed = 1;
>>>                       }
>>> @@ -692,10 +692,11 @@ compute_branch_probabilities (unsigned cfg_checksu
>>>         {
>>>           /* Inconsistency detected. Make it flow-consistent. */
>>>           static int informed = 0;
>>> -         if (informed == 0)
>>> +         if (dump_enabled_p () && informed == 0)
>>>             {
>>>               informed = 1;
>>> -             inform (input_location, "correcting inconsistent profile data");
>>> +             dump_printf_loc (MSG_NOTE, input_location,
>>> +                              "correcting inconsistent profile data");
>>>             }
>>>           correct_negative_edge_counts ();
>>>           /* Set bb counts to the sum of the outgoing edge counts */
>>> Index: passes.c
>>> ===================================================================
>>> --- passes.c    (revision 201461)
>>> +++ passes.c    (working copy)
>>> @@ -524,6 +524,11 @@ pass_manager::register_one_dump_file (struct opt_p
>>>    flag_name = concat (prefix, name, num, NULL);
>>>    glob_name = concat (prefix, name, NULL);
>>>    optgroup_flags |= pass->optinfo_flags;
>>> +  /* For any passes that do not have an optgroup set, and which are not
>>> +     IPA passes setup above, set the optgroup to OPTGROUP_OTHER so that
>>> +     any dump messages are emitted properly under -fopt-info(-optall).  */
>>> +  if (optgroup_flags == OPTGROUP_NONE)
>>> +    optgroup_flags = OPTGROUP_OTHER;
>>>    id = dump_register (dot_name, flag_name, glob_name, flags, optgroup_flags);
>>>    set_pass_for_id (id, pass);
>>>    full_name = concat (prefix, pass->name, num, NULL);
>>> Index: value-prof.c
>>> ===================================================================
>>> --- value-prof.c        (revision 201461)
>>> +++ value-prof.c        (working copy)
>>> @@ -585,9 +585,11 @@ check_counter (gimple stmt, const char * name,
>>>                : DECL_SOURCE_LOCATION (current_function_decl);
>>>        if (flag_profile_correction)
>>>          {
>>> -         inform (locus, "correcting inconsistent value profile: "
>>> -                 "%s profiler overall count (%d) does not match BB count "
>>> -                  "(%d)", name, (int)*all, (int)bb_count);
>>> +          if (dump_enabled_p ())
>>> +            dump_printf_loc (MSG_MISSED_OPTIMIZATION, locus,
>>> +                             "correcting inconsistent value profile: %s "
>>> +                             "profiler overall count (%d) does not match BB "
>>> +                             "count (%d)", name, (int)*all, (int)bb_count);
>>>           *all = bb_count;
>>>           if (*count > *all)
>>>              *count = *all;
>>> @@ -1209,9 +1211,11 @@ find_func_by_funcdef_no (int func_id)
>>>    int max_id = get_last_funcdef_no ();
>>>    if (func_id >= max_id || cgraph_node_map[func_id] == NULL)
>>>      {
>>> -      if (flag_profile_correction)
>>> -        inform (DECL_SOURCE_LOCATION (current_function_decl),
>>> -                "Inconsistent profile: indirect call target (%d) does not exist", func_id);
>>> +      if (flag_profile_correction && dump_enabled_p ())
>>> +        dump_printf_loc (MSG_MISSED_OPTIMIZATION,
>>> +                         DECL_SOURCE_LOCATION (current_function_decl),
>>> +                         "Inconsistent profile: indirect call target (%d) "
>>> +                         "does not exist", func_id);
>>>        else
>>>          error ("Inconsistent profile: indirect call target (%d) does not exist", func_id);
>>>
>>> @@ -1235,8 +1239,10 @@ check_ic_target (gimple call_stmt, struct cgraph_n
>>>       return true;
>>>
>>>     locus =  gimple_location (call_stmt);
>>> -   inform (locus, "Skipping target %s with mismatching types for icall ",
>>> -           cgraph_node_name (target));
>>> +   if (dump_enabled_p ())
>>> +     dump_printf_loc (MSG_MISSED_OPTIMIZATION, locus,
>>> +                      "Skipping target %s with mismatching types for icall ",
>>> +                      cgraph_node_name (target));
>>>     return false;
>>>  }
>>>
>>> Index: coverage.c
>>> ===================================================================
>>> --- coverage.c  (revision 201461)
>>> +++ coverage.c  (working copy)
>>> @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.  If not see
>>>  #include "langhooks.h"
>>>  #include "hash-table.h"
>>>  #include "tree-iterator.h"
>>> +#include "tree-pass.h"
>>>  #include "cgraph.h"
>>>  #include "dumpfile.h"
>>>  #include "diagnostic-core.h"
>>> @@ -341,11 +342,13 @@ get_coverage_counts (unsigned counter, unsigned ex
>>>      {
>>>        static int warned = 0;
>>>
>>> -      if (!warned++)
>>> -       inform (input_location, (flag_guess_branch_prob
>>> -                ? "file %s not found, execution counts estimated"
>>> -                : "file %s not found, execution counts assumed to be zero"),
>>> -               da_file_name);
>>> +      if (!warned++ && dump_enabled_p ())
>>> +       dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>>> +                         (flag_guess_branch_prob
>>> +                          ? "file %s not found, execution counts estimated"
>>> +                          : "file %s not found, execution counts assumed to "
>>> +                            "be zero"),
>>> +                         da_file_name);
>>>        return NULL;
>>>      }
>>>
>>> @@ -369,21 +372,25 @@ get_coverage_counts (unsigned counter, unsigned ex
>>>         warning_at (input_location, OPT_Wcoverage_mismatch,
>>>                     "the control flow of function %qE does not match "
>>>                     "its profile data (counter %qs)", id, ctr_names[counter]);
>>> -      if (warning_printed)
>>> +      if (warning_printed && dump_enabled_p ())
>>>         {
>>> -        inform (input_location, "use -Wno-error=coverage-mismatch to tolerate "
>>> -                "the mismatch but performance may drop if the function is hot");
>>> +          dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>>> +                           "use -Wno-error=coverage-mismatch to tolerate "
>>> +                           "the mismatch but performance may drop if the "
>>> +                           "function is hot");
>>>
>>>           if (!seen_error ()
>>>               && !warned++)
>>>             {
>>> -             inform (input_location, "coverage mismatch ignored");
>>> -             inform (input_location, flag_guess_branch_prob
>>> -                     ? G_("execution counts estimated")
>>> -                     : G_("execution counts assumed to be zero"));
>>> +             dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>>> +                               "coverage mismatch ignored");
>>> +             dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>>> +                               flag_guess_branch_prob
>>> +                               ? G_("execution counts estimated")
>>> +                               : G_("execution counts assumed to be zero"));
>>>               if (!flag_guess_branch_prob)
>>> -               inform (input_location,
>>> -                       "this can result in poorly optimized code");
>>> +               dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>>> +                                 "this can result in poorly optimized code");
>>>             }
>>>         }
>>>
>>> @@ -1103,6 +1110,11 @@ coverage_init (const char *filename)
>>>    int len = strlen (filename);
>>>    int prefix_len = 0;
>>>
>>> +  /* Since coverage_init is invoked very early, before the pass
>>> +     manager, we need to set up the dumping explicitly. This is
>>> +     similar to the handling in finish_optimization_passes.  */
>>> +  dump_start (pass_profile.pass.static_pass_number, NULL);
>>> +
>>>    if (!profile_data_prefix && !IS_ABSOLUTE_PATH (filename))
>>>      profile_data_prefix = getpwd ();
>>>
>>> @@ -1145,6 +1157,8 @@ coverage_init (const char *filename)
>>>           gcov_write_unsigned (bbg_file_stamp);
>>>         }
>>>      }
>>> +
>>> +  dump_finish (pass_profile.pass.static_pass_number);
>>>  }
>>>
>>>  /* Performs file-level cleanup.  Close notes file, generate coverage
>>> Index: ipa-inline.c
>>> ===================================================================
>>> --- ipa-inline.c        (revision 201461)
>>> +++ ipa-inline.c        (working copy)
>>> @@ -1322,7 +1322,7 @@ recursive_inlining (struct cgraph_edge *edge,
>>>            reset_edge_growth_cache (curr);
>>>         }
>>>
>>> -      inline_call (curr, false, new_edges, &overall_size, true);
>>> +      inline_call (curr, false, new_edges, &overall_size, true, false);
>>>        lookup_recursive_calls (node, curr->callee, heap);
>>>        n++;
>>>      }
>>> @@ -1612,7 +1612,8 @@ inline_small_functions (void)
>>>             fprintf (dump_file, " Peeling recursion with depth %i\n", depth);
>>>
>>>           gcc_checking_assert (!callee->global.inlined_to);
>>> -         inline_call (edge, true, &new_indirect_edges, &overall_size, true);
>>> +         inline_call (edge, true, &new_indirect_edges, &overall_size, true,
>>> +                       false);
>>>           if (flag_indirect_inlining)
>>>             add_new_edges_to_heap (edge_heap, new_indirect_edges);
>>>
>>> @@ -1733,7 +1734,7 @@ flatten_function (struct cgraph_node *node, bool e
>>>                  xstrdup (cgraph_node_name (callee)),
>>>                  xstrdup (cgraph_node_name (e->caller)));
>>>        orig_callee = callee;
>>> -      inline_call (e, true, NULL, NULL, false);
>>> +      inline_call (e, true, NULL, NULL, false, early);
>>>        if (e->callee != orig_callee)
>>>         orig_callee->symbol.aux = (void *) node;
>>>        flatten_function (e->callee, early);
>>> @@ -1852,7 +1853,8 @@ ipa_inline (void)
>>>                                    inline_summary (node->callers->caller)->size);
>>>                         }
>>>
>>> -                     inline_call (node->callers, true, NULL, NULL, true);
>>> +                     inline_call (node->callers, true, NULL, NULL, true,
>>> +                                   false);
>>>                       if (dump_file)
>>>                         fprintf (dump_file,
>>>                                  " Inlined into %s which now has %i size\n",
>>> @@ -1925,7 +1927,7 @@ inline_always_inline_functions (struct cgraph_node
>>>         fprintf (dump_file, "  Inlining %s into %s (always_inline).\n",
>>>                  xstrdup (cgraph_node_name (e->callee)),
>>>                  xstrdup (cgraph_node_name (e->caller)));
>>> -      inline_call (e, true, NULL, NULL, false);
>>> +      inline_call (e, true, NULL, NULL, false, true);
>>>        inlined = true;
>>>      }
>>>    if (inlined)
>>> @@ -1977,7 +1979,7 @@ early_inline_small_functions (struct cgraph_node *
>>>         fprintf (dump_file, " Inlining %s into %s.\n",
>>>                  xstrdup (cgraph_node_name (callee)),
>>>                  xstrdup (cgraph_node_name (e->caller)));
>>> -      inline_call (e, true, NULL, NULL, true);
>>> +      inline_call (e, true, NULL, NULL, true, true);
>>>        inlined = true;
>>>      }
>>>
>>> Index: ipa-inline.h
>>> ===================================================================
>>> --- ipa-inline.h        (revision 201461)
>>> +++ ipa-inline.h        (working copy)
>>> @@ -228,7 +228,8 @@ void free_growth_caches (void);
>>>  void compute_inline_parameters (struct cgraph_node *, bool);
>>>
>>>  /* In ipa-inline-transform.c  */
>>> -bool inline_call (struct cgraph_edge *, bool, vec<cgraph_edge_p> *, int *, bool);
>>> +bool inline_call (struct cgraph_edge *, bool, vec<cgraph_edge_p> *, int *,
>>> +                  bool, bool);
>>>  unsigned int inline_transform (struct cgraph_node *);
>>>  void clone_inlined_nodes (struct cgraph_edge *e, bool, bool, int *);
>>>
>>> Index: testsuite/gcc.dg/pr40209.c
>>> ===================================================================
>>> --- testsuite/gcc.dg/pr40209.c  (revision 201461)
>>> +++ testsuite/gcc.dg/pr40209.c  (working copy)
>>> @@ -1,5 +1,5 @@
>>>  /* { dg-do compile } */
>>> -/* { dg-options "-O2 -fprofile-use" } */
>>> +/* { dg-options "-O2 -fprofile-use -fopt-info" } */
>>>
>>>  void process(const char *s);
>>>
>>> Index: testsuite/gcc.dg/pr26570.c
>>> ===================================================================
>>> --- testsuite/gcc.dg/pr26570.c  (revision 201461)
>>> +++ testsuite/gcc.dg/pr26570.c  (working copy)
>>> @@ -1,5 +1,5 @@
>>>  /* { dg-do compile } */
>>> -/* { dg-options "-O2 -fprofile-generate -fprofile-use" } */
>>> +/* { dg-options "-O2 -fprofile-generate -fprofile-use -fopt-info" } */
>>>
>>>  unsigned test (unsigned a, unsigned b)
>>>  {
>>> Index: testsuite/gcc.dg/pr32773.c
>>> ===================================================================
>>> --- testsuite/gcc.dg/pr32773.c  (revision 201461)
>>> +++ testsuite/gcc.dg/pr32773.c  (working copy)
>>> @@ -1,6 +1,6 @@
>>>  /* { dg-do compile } */
>>> -/* { dg-options "-O -fprofile-use" } */
>>> -/* { dg-options "-O -m4 -fprofile-use" { target sh-*-* } } */
>>> +/* { dg-options "-O -fprofile-use -fopt-info" } */
>>> +/* { dg-options "-O -m4 -fprofile-use -fopt-info" { target sh-*-* } } */
>>>
>>>  void foo (int *p)
>>>  {
>>> Index: testsuite/g++.dg/tree-ssa/dom-invalid.C
>>> ===================================================================
>>> --- testsuite/g++.dg/tree-ssa/dom-invalid.C     (revision 201461)
>>> +++ testsuite/g++.dg/tree-ssa/dom-invalid.C     (working copy)
>>> @@ -1,7 +1,7 @@
>>>  // PR tree-optimization/39557
>>>  // invalid post-dom info leads to infinite loop
>>>  // { dg-do run }
>>> -// { dg-options "-Wall -fno-exceptions -O2 -fprofile-use -fno-rtti" }
>>> +// { dg-options "-Wall -fno-exceptions -O2 -fprofile-use -fopt-info -fno-rtti" }
>>>
>>>  struct C
>>>  {
>>> Index: testsuite/gcc.dg/inline-dump.c
>>> ===================================================================
>>> --- testsuite/gcc.dg/inline-dump.c      (revision 0)
>>> +++ testsuite/gcc.dg/inline-dump.c      (revision 0)
>>> @@ -0,0 +1,11 @@
>>> +/* Verify that -fopt-info can output correct inline info.  */
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-Wall -fopt-info-inline=stderr -O2 -fno-early-inlining" } */
>>> +static inline int leaf() {
>>> +  int i, ret = 0;
>>> +  for (i = 0; i < 10; i++)
>>> +    ret += i;
>>> +  return ret;
>>> +}
>>> +static inline int foo(void) { return leaf(); } /* { dg-message "note: leaf .*inlined into bar .*via inline instance foo.*\n" } */
>>> +int bar(void) { return foo(); }
>>>>
>>>> Thanks,
>>>> Teresa
>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Martin
>>>>
>>>>
>>>>
>>>> --
>>>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
Richard Biener Aug. 29, 2013, 10:04 a.m. UTC | #9
On Wed, Aug 28, 2013 at 4:09 PM, Teresa Johnson <tejohnson@google.com> wrote:
> On Wed, Aug 28, 2013 at 4:01 AM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Wed, Aug 7, 2013 at 7:23 AM, Teresa Johnson <tejohnson@google.com> wrote:
>>> On Tue, Aug 6, 2013 at 9:29 AM, Teresa Johnson <tejohnson@google.com> wrote:
>>>> On Tue, Aug 6, 2013 at 9:01 AM, Martin Jambor <mjambor@suse.cz> wrote:
>>>>> Hi,
>>>>>
>>>>> On Tue, Aug 06, 2013 at 07:14:42AM -0700, Teresa Johnson wrote:
>>>>>> On Tue, Aug 6, 2013 at 5:37 AM, Martin Jambor <mjambor@suse.cz> wrote:
>>>>>> > On Mon, Aug 05, 2013 at 10:37:00PM -0700, Teresa Johnson wrote:
>>>>>> >> This patch ports messages to the new dump framework,
>>>>>> >
>>>>>> > It would be great if this new framework were documented somewhere.  I lost
>>>>>> > track of what was agreed it would be and from the uses in the
>>>>>> > vectorizer I was never quite sure how to utilize it in other passes.
>>>>>>
>>>>>> Cc'ing Sharad who implemented this - Sharad, is this documented on a
>>>>>> wiki or elsewhere?
>>>>>
>>>>> Thanks
>>>>>
>>>>>>
>>>>>> >
>>>>>> > I'd also like to point out two other minor things inline:
>>>>>> >
>>>>>> > [...]
>>>>>> >
>>>>>> >> 2013-08-06  Teresa Johnson  <tejohnson@google.com>
>>>>>> >>             Dehao Chen  <dehao@google.com>
>>>>>> >>
>>>>>> >>         * dumpfile.c (dump_loc): Add column number to output, make newlines
>>>>>> >>         consistent.
>>>>>> >>         * dumpfile.h (OPTGROUP_OTHER): Add and enable under OPTGROUP_ALL.
>>>>>> >>         * ipa-inline-transform.c (clone_inlined_nodes):
>>>>>> >>         (cgraph_node_opt_info): New function.
>>>>>> >>         (cgraph_node_call_chain): Ditto.
>>>>>> >>         (dump_inline_decision): Ditto.
>>>>>> >>         (inline_call): Invoke dump_inline_decision.
>>>>>> >>         * doc/invoke.texi: Document optall -fopt-info flag.
>>>>>> >>         * profile.c (read_profile_edge_counts): Use new dump framework.
>>>>>> >>         (compute_branch_probabilities): Ditto.
>>>>>> >>         * passes.c (pass_manager::register_one_dump_file): Use OPTGROUP_OTHER
>>>>>> >>         when pass not in any opt group.
>>>>>> >>         * value-prof.c (check_counter): Use new dump framework.
>>>>>> >>         (find_func_by_funcdef_no): Ditto.
>>>>>> >>         (check_ic_target): Ditto.
>>>>>> >>         * coverage.c (get_coverage_counts): Ditto.
>>>>>> >>         (coverage_init): Setup new dump framework.
>>>>>> >>         * ipa-inline.c (inline_small_functions): Set is_in_ipa_inline.
>>>>>> >>         * ipa-inline.h (is_in_ipa_inline): Declare.
>>>>>> >>
>>>>>> >>         * testsuite/gcc.dg/pr40209.c: Use -fopt-info.
>>>>>> >>         * testsuite/gcc.dg/pr26570.c: Ditto.
>>>>>> >>         * testsuite/gcc.dg/pr32773.c: Ditto.
>>>>>> >>         * testsuite/g++.dg/tree-ssa/dom-invalid.C (struct C): Ditto.
>>>>>> >>
>>>>>> >
>>>>>> > [...]
>>>>>> >
>>>>>> >> Index: ipa-inline-transform.c
>>>>>> >> ===================================================================
>>>>>> >> --- ipa-inline-transform.c      (revision 201461)
>>>>>> >> +++ ipa-inline-transform.c      (working copy)
>>>>>> >> @@ -192,6 +192,108 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d
>>>>>> >>  }
>>>>>> >>
>>>>>> >>
>>>>>> >> +#define MAX_INT_LENGTH 20
>>>>>> >> +
>>>>>> >> +/* Return NODE's name and profile count, if available.  */
>>>>>> >> +
>>>>>> >> +static const char *
>>>>>> >> +cgraph_node_opt_info (struct cgraph_node *node)
>>>>>> >> +{
>>>>>> >> +  char *buf;
>>>>>> >> +  size_t buf_size;
>>>>>> >> +  const char *bfd_name = lang_hooks.dwarf_name (node->symbol.decl, 0);
>>>>>> >> +
>>>>>> >> +  if (!bfd_name)
>>>>>> >> +    bfd_name = "unknown";
>>>>>> >> +
>>>>>> >> +  buf_size = strlen (bfd_name) + 1;
>>>>>> >> +  if (profile_info)
>>>>>> >> +    buf_size += (MAX_INT_LENGTH + 3);
>>>>>> >> +
>>>>>> >> +  buf = (char *) xmalloc (buf_size);
>>>>>> >> +
>>>>>> >> +  strcpy (buf, bfd_name);
>>>>>> >> +
>>>>>> >> +  if (profile_info)
>>>>>> >> +    sprintf (buf, "%s ("HOST_WIDEST_INT_PRINT_DEC")", buf, node->count);
>>>>>> >> +  return buf;
>>>>>> >> +}
>>>>>> >
>>>>>> > I'm not sure if output of this function is aimed only at the user or
>>>>>> > if it is supposed to be used by gcc developers as well.  If the
>>>>>> > latter, an incredibly useful thing is to also dump node->symbol.order
>>>>>> > too.  We usually dump it after "/" sign separating it from node name.
>>>>>> > It is invaluable when examining decisions in C++ code where you can
>>>>>> > have lots of clones of a node (and also because existing dumps print
>>>>>> > it, it is easy to combine them).
>>>>>>
>>>>>> The output is useful both for power users doing performance tuning of
>>>>>> their application and for gcc developers. Adding the id is not so
>>>>>> useful for the former, but I agree that it is very useful for compiler
>>>>>> developers. In fact, in the google branch version we emit more verbose
>>>>>> information (the lipo module id and the funcdef_no) to help uniquely
>>>>>> identify the routines and to aid in post-processing by humans and
>>>>>> tools. So it is probably useful to add something similar here too. Is
>>>>>> the node->symbol.order more or less unique than the funcdef_no? I see
>>>>>> that you added a patch a few months ago to print the
>>>>>> node->symbol.order in the function header, and it also has the
>>>>>> advantage as you note of matching up with existing ipa dumps.
>>>>>
>>>>> node->symbol.order is unique and if I remember correctly, it is not
>>>>> even recycled.  Clones, inline clones, thunks, every symbol table node
>>>>> gets its own symbol order so it should be more unique than funcdef_no.
>>>>> On the other hand it may be a bit cryptic for users but at the same
>>>>> time it is only one number.
>>>>
>>>> Ok, I am going to go ahead and add this to the output.
>>>>
>>>>>
>>>>>>
>>>>>> >
>>>>>> > [...]
>>>>>> >
>>>>>> >> Index: ipa-inline.c
>>>>>> >> ===================================================================
>>>>>> >> --- ipa-inline.c        (revision 201461)
>>>>>> >> +++ ipa-inline.c        (working copy)
>>>>>> >> @@ -118,6 +118,9 @@ along with GCC; see the file COPYING3.  If not see
>>>>>> >>  static int overall_size;
>>>>>> >>  static gcov_type max_count;
>>>>>> >>
>>>>>> >> +/* Global variable to denote if it is in ipa-inline pass. */
>>>>>> >> +bool is_in_ipa_inline = false;
>>>>>> >> +
>>>>>> >>  /* Return false when inlining edge E would lead to violating
>>>>>> >>     limits on function unit growth or stack usage growth.
>>>>>> >>
>>>>>> >
>>>>>> > In this age of removing global variables, are you sure you need this?
>>>>>> > The only user of this seems to be a function that is only being called
>>>>>> > from inline_call... can that ever happen when not inlining?  If you
>>>>>> > plan to use this function also elsewhere, perhaps the callers will
>>>>>> > know whether we are inlining or not and can provide this in a
>>>>>> > parameter?
>>>>>>
>>>>>> This is to distinguish early inlining from ipa inlining.
>>>>>
>>>>> Oh, right, I did not realize that the IPA part was the important bit
>>>>> of the name.
>>>>>
>>>>>> The volume of
>>>>>> early inlining messages is too high to be on for the default setting
>>>>>> of -fopt-info, and they are usually not as interesting for performance
>>>>>> tuning. The dumper will only emit the early inline messages under a
>>>>>> more verbose setting (MSG_NOTE):
>>>>>>       dump_printf_loc (is_in_ipa_inline ? MSG_OPTIMIZED_LOCATIONS : MSG_NOTE ...
>>>>>> The other way I can see to distinguish this would be to check the
>>>>>> always_inline_functions_inlined flag on the caller's function. It
>>>>>> could also be possible to pass down a flag from the callers of
>>>>>> inline_call, but at least one caller (flatten_functions) is shared
>>>>>> between early and late inlining, so the flag needs to be passed
>>>>>> through that as well. WDYT?
>>>>>
>>>>> Did you mean flatten_function?  It already has a bool "early"
>>>>> parameter.  But I can see that being able to quickly figure out
>>>>> whether we are in early inliner or ipa inliner without much hassle
>>>>> would have been useful enough to justify a global variable a month
>>>>> ago; however, I suppose we should not be introducing them now and so
>>>>> you'd have to put such stuff into... well, you'd probably have to put
>>>>> it into the universe object somewhere because it is basically shared
>>>>> between two passes.  Another option, even though somewhat hackish,
>>>>> would be to look at current_pass and see which pass it is.  I don't
>>>>> know, do what is easier or what you like more, just be aware of the
>>>>> problem.
>>>>
>>>> After thinking about this some more, I think passing down an early
>>>> flag from callers is the cleanest way to go.
>>>>
>>>> I'll fix these and post a new patch later today.
>>>
>>> New patch below that removes this global variable, and also outputs
>>> the node->symbol.order (in square brackets after the function name so
>>> as to not clutter it). Inline messages with profile data look like:
>>>
>>> test.c:8:3: note: foobar [0] (99999000) inlined into foo [2] (1000)
>>> with call count 99999000 (via inline instance bar [3] (99999000))
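
A side note on producing labels such as "foobar [0] (99999000)": the posted cgraph_node_opt_info builds them with calls of the form sprintf (buf, "%s [%i]", buf, ...), where buf is both the destination and a source argument. The C standard leaves copying between overlapping objects undefined for sprintf. The following is a minimal sketch of formatting the same text in one pass. The name format_node_label and its parameters are hypothetical stand-ins, not GCC functions, and long long stands in for the profile count type.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, not part of GCC: format "NAME [ORDER]" and, when
   HAVE_PROFILE is nonzero, append " (COUNT)".  Returns a malloc'd string
   that the caller must free, or NULL on failure.  */
static char *
format_node_label (const char *name, int order, int have_profile,
                   long long count)
{
  int len;
  char *buf;

  if (!name)
    name = "unknown";

  /* First pass: a NULL buffer of size 0 only computes the needed length.  */
  len = have_profile
        ? snprintf (NULL, 0, "%s [%d] (%lld)", name, order, count)
        : snprintf (NULL, 0, "%s [%d]", name, order);
  if (len < 0)
    return NULL;

  buf = (char *) malloc ((size_t) len + 1);
  if (!buf)
    return NULL;

  /* Second pass: write into the right-sized buffer.  The destination is
     never also passed as a source argument.  */
  if (have_profile)
    snprintf (buf, (size_t) len + 1, "%s [%d] (%lld)", name, order, count);
  else
    snprintf (buf, (size_t) len + 1, "%s [%d]", name, order);
  return buf;
}
```

Sizing the buffer from the first snprintf call also removes the need for a fixed MAX_INT_LENGTH estimate in this sketch.
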
>>
>> Ick.  This looks both redundant and cluttered.  This is supposed to be
>> understandable by GCC users, not only GCC developers.
>
> The main part that is only useful/understandable to gcc developers is
> the node->symbol.order in square brackets, requested by Martin. One
> possibility is that I could put that part under a param, disabled by
> default. We have something similar on the google branches that emits
> LIPO module info in the message, enabled via a param.

But we have _dump files_ for that.  That's the developer-consumed
form of opt-info.  -fopt-info is purely user sugar and for usual translation
units it shouldn't exceed a single terminal full of output.

> I'd argue that the other information (the profile counts, emitted only
> when using -fprofile-use, and the inline call chains) are useful if
> you want to understand whether and how critical inlines are occurring.
> I think this is the type of information that users focused on
> optimizations, as well as gcc developers, want when they use
> -fopt-info. Otherwise it is difficult to make sense of the inline
> information.

Well, I doubt that inline information is interesting to users unless we are
able to aggressively filter it to what users are interested in.  Which IMHO
isn't possible - users are interested in "I have not inlined this even though
inlining would severely improve performance" which would indicate a bug
in the heuristics we can reliably detect and thus it wouldn't be there.

Richard.

>>
>>> (without FDO the counts in parentheses and the call count would not be
>>> included).
>>>
>>> Ok for trunk?
>>
>> Let's split this patch.
>
> Ok.
>
>>
>>> Thanks,
>>> Teresa
>>>
>>> 2013-08-06  Teresa Johnson  <tejohnson@google.com>
>>>             Dehao Chen  <dehao@google.com>
>>>
>>>         * dumpfile.c (dump_loc): Output column number, make newlines consistent.
>>
>> I don't like column numbers, they are not of much use generally.
>
> I added these here to get consistency with other messages (notes
> emitted via inform(), warnings, errors). Plus the dg-message testing
> was failing for the test cases that parse this output, since it
> expects the column to exist.
>
>> Does
>> 'make newlines consistent' avoid all the spurious vertical spacing I see with
>> -fopt-info?
>
> Well, it helps get us there. The problem was that before, since
> dump_loc was not consistently emitting newlines, the calls had to emit
> their own newlines manually in the string to ensure there was a
> newline at all. I was thinking that once this is fixed I could go back
> and clean up all those calls by removing the newlines in the string. I
> could split this part into a separate patch and do both at once.
>
> However, after thinking about this some more this morning, I am
> wondering whether it is better to remove the newline emission
> completely from dump_loc and rely on the caller to put the newline in
> the string. The reason is that there are 2 high level interfaces to
> the new dump infrastructure, dump_printf() and dump_printf_loc(). Only
> the latter invokes dump_loc and gets the newline at the start of the
> message. The typical usage seems to be to start a message via
> dump_printf_loc, and then use dump_printf to emit parts of the message
> (thus not requiring a newline), but I think it may lead to problems to
> rely on this assumption.
>
> So if you agree, I will simply remove the newline altogether from
> dump_loc, and ensure that all clients of dump_printf/dump_printf_loc
> include a newline char as appropriate in the string they pass.
>
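
The convention proposed in the paragraph above (the location prefix never emits a newline, and the caller ends the message with "\n") can be illustrated with a small mock. This is only a sketch: mock_sink, mock_dump_printf, and mock_dump_printf_loc are hypothetical stand-ins that append to an in-memory buffer, not the real dumpfile.c interfaces.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory sink standing in for the dump stream.  */
struct mock_sink
{
  char text[256];
  size_t len;
};

/* Append formatted text to the sink, truncating if it would overflow.  */
static void
sink_vappend (struct mock_sink *s, const char *fmt, va_list ap)
{
  size_t room = sizeof s->text - s->len;
  int n = vsnprintf (s->text + s->len, room, fmt, ap);
  if (n < 0)
    return;
  s->len += ((size_t) n < room) ? (size_t) n : room - 1;
}

/* Stand-in for dump_printf: emits exactly what the caller passes.  */
static void
mock_dump_printf (struct mock_sink *s, const char *fmt, ...)
{
  va_list ap;
  va_start (ap, fmt);
  sink_vappend (s, fmt, ap);
  va_end (ap);
}

/* Stand-in for dump_printf_loc: emits a "file:line:col: note: " prefix
   with no leading or trailing newline, then the caller's text.  */
static void
mock_dump_printf_loc (struct mock_sink *s, const char *file, int line,
                      int col, const char *fmt, ...)
{
  va_list ap;
  mock_dump_printf (s, "%s:%d:%d: note: ", file, line, col);
  va_start (ap, fmt);
  sink_vappend (s, fmt, ap);
  va_end (ap);
}
```

With this shape, a message started by mock_dump_printf_loc and continued by mock_dump_printf stays on one line, and only the final piece carries the "\n".
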
>>
>>>         * dumpfile.h (OPTGROUP_OTHER): Add and enable under OPTGROUP_ALL.
>>
>> Good change - please split this out (with the related changes) and commit it.
>
> Ok, thanks. Will do.
>
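
For reference, the OPTGROUP_OTHER change amounts to one more bit in the optgroup mask plus a fallback for passes that set no group. A minimal sketch follows. The LOOP, INLINE, VEC, and OTHER values are taken from the dumpfile.h hunk in this patch. The OPTGROUP_NONE and OPTGROUP_IPA values and the helpers effective_optgroup and optgroup_enabled_p are assumptions made for illustration, and the sketch ignores the IPA-specific setup that precedes the fallback in passes.c.

```c
/* Values marked "assumed" do not appear in the quoted hunk.  */
#define OPTGROUP_NONE   (0)        /* assumed */
#define OPTGROUP_IPA    (1 << 1)   /* assumed */
#define OPTGROUP_LOOP   (1 << 2)   /* Loop optimization passes */
#define OPTGROUP_INLINE (1 << 3)   /* Inlining passes */
#define OPTGROUP_VEC    (1 << 4)   /* Vectorization passes */
#define OPTGROUP_OTHER  (1 << 5)   /* All other passes */
#define OPTGROUP_ALL    (OPTGROUP_IPA | OPTGROUP_LOOP | OPTGROUP_INLINE \
                         | OPTGROUP_VEC | OPTGROUP_OTHER)

/* Hypothetical helper: a pass with no optgroup falls back to
   OPTGROUP_OTHER, mirroring the check added to
   register_one_dump_file.  */
static int
effective_optgroup (int pass_flags)
{
  return pass_flags == OPTGROUP_NONE ? OPTGROUP_OTHER : pass_flags;
}

/* Hypothetical helper: nonzero when a pass's messages would be selected
   by the groups the user enabled.  */
static int
optgroup_enabled_p (int enabled_mask, int pass_flags)
{
  return (enabled_mask & effective_optgroup (pass_flags)) != 0;
}
```

Under these assumptions a pass without a group is selected by the all-groups mask but not by a request for vectorization messages only.
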
>>
>>>         * ipa-inline-transform.c (cgraph_node_opt_info): New function.
>>>         (cgraph_node_call_chain): Ditto.
>>>         (dump_inline_decision): Ditto.
>>>         (inline_call): Invoke dump_inline_decision, new parameter.
>>
>> The inline stuff should be split and re-sent, it's non-obvious to me (extra
>> function parameters are not documented for example).  I'd rather have
>> inline_and_report_call () for example instead of an extra bool parameter.
>> But let's iterate over this once it's split out.
>
> Ok, I will send this separately. I guess we could have a separate
> interface inline_and_report_call that is a wrapper around inline_call
> and simply invokes the dumper. Note that flatten_function will need to
> conditionally call one of the two interfaces based on the value of its
> bool early parameter though.
>
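
The wrapper idea discussed above could look roughly like the sketch below. All mock_* names and the mock_edge type are hypothetical stand-ins, not the real cgraph interfaces. As a simplification, the early path here reports nothing at all, whereas the posted patch still emits early inlines at MSG_NOTE.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for struct cgraph_edge.  */
struct mock_edge
{
  const char *callee;
  const char *caller;
  bool inlined;
};

/* Number of reports emitted so far, kept only to make the sketch testable.  */
static int mock_reports;

/* Stand-in for inline_call: performs the transformation only.  */
static bool
mock_inline_call (struct mock_edge *e)
{
  e->inlined = true;
  return true;
}

/* Stand-in for dump_inline_decision.  */
static void
mock_report (const struct mock_edge *e)
{
  mock_reports++;
  fprintf (stderr, "note: %s inlined into %s\n", e->callee, e->caller);
}

/* Wrapper: report first, then inline, matching the order in the posted
   patch, which dumps at the top of inline_call.  */
static bool
mock_inline_and_report_call (struct mock_edge *e)
{
  mock_report (e);
  return mock_inline_call (e);
}

/* A caller shared by early and late inlining picks the interface from
   its existing "early" flag, as flatten_function would have to.  */
static bool
mock_flatten_step (struct mock_edge *e, bool early)
{
  return early ? mock_inline_call (e) : mock_inline_and_report_call (e);
}
```

This keeps the signature of the plain inlining call unchanged, at the cost of one conditional in each caller that serves both the early and the IPA inliner.
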
>>
>>>         * doc/invoke.texi: Document optall -fopt-info flag.
>>>         * profile.c (read_profile_edge_counts): Use new dump framework.
>>>         (compute_branch_probabilities): Ditto.
>>>         * passes.c (pass_manager::register_one_dump_file): Use OPTGROUP_OTHER
>>>         when pass not in any opt group.
>>>         * value-prof.c (check_counter): Use new dump framework.
>>>         (find_func_by_funcdef_no): Ditto.
>>>         (check_ic_target): Ditto.
>>>         * coverage.c (get_coverage_counts): Ditto.
>>>         (coverage_init): Setup new dump framework.
>>
>> These pieces look good to me.
>>
>>>         * ipa-inline.c (recursive_inlining): New inline_call parameter.
>>>         (inline_small_functions): Ditto.
>>>         (flatten_function): Ditto.
>>>         (ipa_inline): Ditto.
>>>         (inline_always_inline_functions): Ditto.
>>>         (early_inline_small_functions): Ditto.
>>>         * ipa-inline.h: Ditto.
>>>
>>>         * testsuite/gcc.dg/pr40209.c: Use -fopt-info.
>>>         * testsuite/gcc.dg/pr26570.c: Ditto.
>>>         * testsuite/gcc.dg/pr32773.c: Ditto.
>>>         * testsuite/g++.dg/tree-ssa/dom-invalid.C: Ditto.
>>
>> Why?  Just remove the stray dg- annotations that deal with the unwanted output?
>
> Because there are dg-message annotations that want to confirm this output.
>
> Teresa
>
>>
>> Thanks,
>> Richard.
>>
>>>         * testsuite/gcc.dg/inline-dump.c: New test.
>>>
>>> Index: dumpfile.c
>>> ===================================================================
>>> --- dumpfile.c  (revision 201461)
>>> +++ dumpfile.c  (working copy)
>>> @@ -257,16 +257,18 @@ dump_open_alternate_stream (struct dump_file_info
>>>  void
>>>  dump_loc (int dump_kind, FILE *dfile, source_location loc)
>>>  {
>>> -  /* Currently vectorization passes print location information.  */
>>>    if (dump_kind)
>>>      {
>>> +      /* Ensure dump message starts on a new line.  */
>>> +      fprintf (dfile, "\n");
>>>        if (LOCATION_LOCUS (loc) > BUILTINS_LOCATION)
>>> -        fprintf (dfile, "\n%s:%d: note: ", LOCATION_FILE (loc),
>>> -                 LOCATION_LINE (loc));
>>> +        fprintf (dfile, "%s:%d:%d: note: ", LOCATION_FILE (loc),
>>> +                 LOCATION_LINE (loc), LOCATION_COLUMN (loc));
>>>        else if (current_function_decl)
>>> -        fprintf (dfile, "\n%s:%d: note: ",
>>> +        fprintf (dfile, "%s:%d:%d: note: ",
>>>                   DECL_SOURCE_FILE (current_function_decl),
>>> -                 DECL_SOURCE_LINE (current_function_decl));
>>> +                 DECL_SOURCE_LINE (current_function_decl),
>>> +                 DECL_SOURCE_COLUMN (current_function_decl));
>>>      }
>>>  }
>>>
>>> Index: dumpfile.h
>>> ===================================================================
>>> --- dumpfile.h  (revision 201461)
>>> +++ dumpfile.h  (working copy)
>>> @@ -97,8 +97,9 @@ enum tree_dump_index
>>>  #define OPTGROUP_LOOP        (1 << 2)   /* Loop optimization passes */
>>>  #define OPTGROUP_INLINE      (1 << 3)   /* Inlining passes */
>>>  #define OPTGROUP_VEC         (1 << 4)   /* Vectorization passes */
>>> +#define OPTGROUP_OTHER       (1 << 5)   /* All other passes */
>>>  #define OPTGROUP_ALL        (OPTGROUP_IPA | OPTGROUP_LOOP | OPTGROUP_INLINE \
>>> -                              | OPTGROUP_VEC)
>>> +                              | OPTGROUP_VEC | OPTGROUP_OTHER)
>>>
>>>  /* Define a tree dump switch.  */
>>>  struct dump_file_info
>>> Index: ipa-inline-transform.c
>>> ===================================================================
>>> --- ipa-inline-transform.c      (revision 201461)
>>> +++ ipa-inline-transform.c      (working copy)
>>> @@ -192,6 +192,111 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d
>>>  }
>>>
>>>
>>> +#define MAX_INT_LENGTH 20
>>> +
>>> +/* Return NODE's name and profile count, if available.  */
>>> +
>>> +static const char *
>>> +cgraph_node_opt_info (struct cgraph_node *node)
>>> +{
>>> +  char *buf;
>>> +  size_t buf_size;
>>> +  const char *bfd_name = lang_hooks.dwarf_name (node->symbol.decl, 0);
>>> +
>>> +  if (!bfd_name)
>>> +    bfd_name = "unknown";
>>> +
>>> +  buf_size = strlen (bfd_name) + 1;
>>> +  if (profile_info)
>>> +    buf_size += (MAX_INT_LENGTH + 3);
>>> +  buf_size += MAX_INT_LENGTH;
>>> +
>>> +  buf = (char *) xmalloc (buf_size);
>>> +
>>> +  strcpy (buf, bfd_name);
>>> +  //sprintf (buf, "%s/%i", buf, node->symbol.order);
>>> +  sprintf (buf, "%s [%i]", buf, node->symbol.order);
>>> +
>>> +  if (profile_info)
>>> +    sprintf (buf, "%s ("HOST_WIDEST_INT_PRINT_DEC")", buf, node->count);
>>> +  return buf;
>>> +}
>>> +
>>> +
>>> +/* Return CALLER's inlined call chain. Save the cgraph_node of the ultimate
>>> +   function that the caller is inlined to in FINAL_CALLER.  */
>>> +
>>> +static const char *
>>> +cgraph_node_call_chain (struct cgraph_node *caller,
>>> +                       struct cgraph_node **final_caller)
>>> +{
>>> +  struct cgraph_node *node;
>>> +  const char *via_str = " (via inline instance";
>>> +  size_t current_string_len = strlen (via_str) + 1;
>>> +  size_t buf_size = current_string_len;
>>> +  char *buf = (char *) xmalloc (buf_size);
>>> +
>>> +  buf[0] = 0;
>>> +  gcc_assert (caller->global.inlined_to != NULL);
>>> +  strcat (buf, via_str);
>>> +  for (node = caller; node->global.inlined_to != NULL;
>>> +       node = node->callers->caller)
>>> +    {
>>> +      const char *name = cgraph_node_opt_info (node);
>>> +      current_string_len += (strlen (name) + 1);
>>> +      if (current_string_len >= buf_size)
>>> +       {
>>> +         buf_size = current_string_len * 2;
>>> +         buf = (char *) xrealloc (buf, buf_size);
>>> +       }
>>> +      strcat (buf, " ");
>>> +      strcat (buf, name);
>>> +    }
>>> +  strcat (buf, ")");
>>> +  *final_caller = node;
>>> +  return buf;
>>> +}
>>> +
>>> +
>>> +/* Dump the inline decision of EDGE.  */
>>> +
>>> +static void
>>> +dump_inline_decision (struct cgraph_edge *edge, bool early)
>>> +{
>>> +  location_t locus;
>>> +  const char *inline_chain_text;
>>> +  const char *call_count_text;
>>> +  struct cgraph_node *final_caller = edge->caller;
>>> +
>>> +  if (final_caller->global.inlined_to != NULL)
>>> +    inline_chain_text = cgraph_node_call_chain (final_caller, &final_caller);
>>> +  else
>>> +    inline_chain_text = "";
>>> +
>>> +  if (edge->count > 0)
>>> +    {
>>> +      const char *call_count_str = " with call count ";
>>> +      char *buf = (char *) xmalloc (strlen (call_count_str) + MAX_INT_LENGTH);
>>> +      sprintf (buf, "%s"HOST_WIDEST_INT_PRINT_DEC, call_count_str,
>>> +              edge->count);
>>> +      call_count_text = buf;
>>> +    }
>>> +  else
>>> +    {
>>> +      call_count_text = "";
>>> +    }
>>> +
>>> +  locus = gimple_location (edge->call_stmt);
>>> +  dump_printf_loc (early ? MSG_NOTE : MSG_OPTIMIZED_LOCATIONS,
>>> +                   locus,
>>> +                   "%s inlined into %s%s%s\n",
>>> +                   cgraph_node_opt_info (edge->callee),
>>> +                   cgraph_node_opt_info (final_caller),
>>> +                   call_count_text,
>>> +                   inline_chain_text);
>>> +}
>>> +
>>> +
>>>  /* Mark edge E as inlined and update callgraph accordingly.  UPDATE_ORIGINAL
>>>     specify whether profile of original function should be updated.  If any new
>>>     indirect edges are discovered in the process, add them to NEW_EDGES, unless
>>> @@ -205,7 +310,8 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d
>>>  bool
>>>  inline_call (struct cgraph_edge *e, bool update_original,
>>>              vec<cgraph_edge_p> *new_edges,
>>> -            int *overall_size, bool update_overall_summary)
>>> +            int *overall_size, bool update_overall_summary,
>>> +             bool early)
>>>  {
>>>    int old_size = 0, new_size = 0;
>>>    struct cgraph_node *to = NULL;
>>> @@ -218,6 +324,9 @@ inline_call (struct cgraph_edge *e, bool update_or
>>>    bool predicated = inline_edge_summary (e)->predicate != NULL;
>>>  #endif
>>>
>>> +  if (dump_enabled_p ())
>>> +    dump_inline_decision (e, early);
>>> +
>>>    /* Don't inline inlined edges.  */
>>>    gcc_assert (e->inline_failed);
>>>    /* Don't even think of inlining inline clone.  */
>>> Index: doc/invoke.texi
>>> ===================================================================
>>> --- doc/invoke.texi     (revision 201461)
>>> +++ doc/invoke.texi     (working copy)
>>> @@ -6234,6 +6234,9 @@ Enable dumps from all loop optimizations.
>>>  Enable dumps from all inlining optimizations.
>>>  @item vec
>>>  Enable dumps from all vectorization optimizations.
>>> +@item optall
>>> +Enable dumps from all optimizations. This is a superset of
>>> +the optimization groups listed above.
>>>  @end table
>>>
>>>  For example,
>>> Index: profile.c
>>> ===================================================================
>>> --- profile.c   (revision 201461)
>>> +++ profile.c   (working copy)
>>> @@ -432,8 +432,8 @@ read_profile_edge_counts (gcov_type *exec_counts)
>>>                     if (flag_profile_correction)
>>>                       {
>>>                         static bool informed = 0;
>>> -                       if (!informed)
>>> -                         inform (input_location,
>>> +                       if (dump_enabled_p () && !informed)
>>> +                         dump_printf_loc (MSG_NOTE, input_location,
>>>                                   "corrupted profile info: edge count exceeds maximal count");
>>>                         informed = 1;
>>>                       }
>>> @@ -692,10 +692,11 @@ compute_branch_probabilities (unsigned cfg_checksu
>>>         {
>>>           /* Inconsistency detected. Make it flow-consistent. */
>>>           static int informed = 0;
>>> -         if (informed == 0)
>>> +         if (dump_enabled_p () && informed == 0)
>>>             {
>>>               informed = 1;
>>> -             inform (input_location, "correcting inconsistent profile data");
>>> +             dump_printf_loc (MSG_NOTE, input_location,
>>> +                              "correcting inconsistent profile data");
>>>             }
>>>           correct_negative_edge_counts ();
>>>           /* Set bb counts to the sum of the outgoing edge counts */
>>> Index: passes.c
>>> ===================================================================
>>> --- passes.c    (revision 201461)
>>> +++ passes.c    (working copy)
>>> @@ -524,6 +524,11 @@ pass_manager::register_one_dump_file (struct opt_p
>>>    flag_name = concat (prefix, name, num, NULL);
>>>    glob_name = concat (prefix, name, NULL);
>>>    optgroup_flags |= pass->optinfo_flags;
>>> +  /* For any passes that do not have an optgroup set, and which are not
>>> +     IPA passes setup above, set the optgroup to OPTGROUP_OTHER so that
>>> +     any dump messages are emitted properly under -fopt-info(-optall).  */
>>> +  if (optgroup_flags == OPTGROUP_NONE)
>>> +    optgroup_flags = OPTGROUP_OTHER;
>>>    id = dump_register (dot_name, flag_name, glob_name, flags, optgroup_flags);
>>>    set_pass_for_id (id, pass);
>>>    full_name = concat (prefix, pass->name, num, NULL);
>>> Index: value-prof.c
>>> ===================================================================
>>> --- value-prof.c        (revision 201461)
>>> +++ value-prof.c        (working copy)
>>> @@ -585,9 +585,11 @@ check_counter (gimple stmt, const char * name,
>>>                : DECL_SOURCE_LOCATION (current_function_decl);
>>>        if (flag_profile_correction)
>>>          {
>>> -         inform (locus, "correcting inconsistent value profile: "
>>> -                 "%s profiler overall count (%d) does not match BB count "
>>> -                  "(%d)", name, (int)*all, (int)bb_count);
>>> +          if (dump_enabled_p ())
>>> +            dump_printf_loc (MSG_MISSED_OPTIMIZATION, locus,
>>> +                             "correcting inconsistent value profile: %s "
>>> +                             "profiler overall count (%d) does not match BB "
>>> +                             "count (%d)", name, (int)*all, (int)bb_count);
>>>           *all = bb_count;
>>>           if (*count > *all)
>>>              *count = *all;
>>> @@ -1209,9 +1211,11 @@ find_func_by_funcdef_no (int func_id)
>>>    int max_id = get_last_funcdef_no ();
>>>    if (func_id >= max_id || cgraph_node_map[func_id] == NULL)
>>>      {
>>> -      if (flag_profile_correction)
>>> -        inform (DECL_SOURCE_LOCATION (current_function_decl),
>>> -                "Inconsistent profile: indirect call target (%d) does not exist", func_id);
>>> +      if (flag_profile_correction && dump_enabled_p ())
>>> +        dump_printf_loc (MSG_MISSED_OPTIMIZATION,
>>> +                         DECL_SOURCE_LOCATION (current_function_decl),
>>> +                         "Inconsistent profile: indirect call target (%d) "
>>> +                         "does not exist", func_id);
>>>        else
>>>          error ("Inconsistent profile: indirect call target (%d) does not exist", func_id);
>>>
>>> @@ -1235,8 +1239,10 @@ check_ic_target (gimple call_stmt, struct cgraph_n
>>>       return true;
>>>
>>>     locus =  gimple_location (call_stmt);
>>> -   inform (locus, "Skipping target %s with mismatching types for icall ",
>>> -           cgraph_node_name (target));
>>> +   if (dump_enabled_p ())
>>> +     dump_printf_loc (MSG_MISSED_OPTIMIZATION, locus,
>>> +                      "Skipping target %s with mismatching types for icall ",
>>> +                      cgraph_node_name (target));
>>>     return false;
>>>  }
>>>
>>> Index: coverage.c
>>> ===================================================================
>>> --- coverage.c  (revision 201461)
>>> +++ coverage.c  (working copy)
>>> @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.  If not see
>>>  #include "langhooks.h"
>>>  #include "hash-table.h"
>>>  #include "tree-iterator.h"
>>> +#include "tree-pass.h"
>>>  #include "cgraph.h"
>>>  #include "dumpfile.h"
>>>  #include "diagnostic-core.h"
>>> @@ -341,11 +342,13 @@ get_coverage_counts (unsigned counter, unsigned ex
>>>      {
>>>        static int warned = 0;
>>>
>>> -      if (!warned++)
>>> -       inform (input_location, (flag_guess_branch_prob
>>> -                ? "file %s not found, execution counts estimated"
>>> -                : "file %s not found, execution counts assumed to be zero"),
>>> -               da_file_name);
>>> +      if (!warned++ && dump_enabled_p ())
>>> +       dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>>> +                         (flag_guess_branch_prob
>>> +                          ? "file %s not found, execution counts estimated"
>>> +                          : "file %s not found, execution counts assumed to "
>>> +                            "be zero"),
>>> +                         da_file_name);
>>>        return NULL;
>>>      }
>>>
>>> @@ -369,21 +372,25 @@ get_coverage_counts (unsigned counter, unsigned ex
>>>         warning_at (input_location, OPT_Wcoverage_mismatch,
>>>                     "the control flow of function %qE does not match "
>>>                     "its profile data (counter %qs)", id, ctr_names[counter]);
>>> -      if (warning_printed)
>>> +      if (warning_printed && dump_enabled_p ())
>>>         {
>>> -        inform (input_location, "use -Wno-error=coverage-mismatch to tolerate "
>>> -                "the mismatch but performance may drop if the function is hot");
>>> +          dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>>> +                           "use -Wno-error=coverage-mismatch to tolerate "
>>> +                           "the mismatch but performance may drop if the "
>>> +                           "function is hot");
>>>
>>>           if (!seen_error ()
>>>               && !warned++)
>>>             {
>>> -             inform (input_location, "coverage mismatch ignored");
>>> -             inform (input_location, flag_guess_branch_prob
>>> -                     ? G_("execution counts estimated")
>>> -                     : G_("execution counts assumed to be zero"));
>>> +             dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>>> +                               "coverage mismatch ignored");
>>> +             dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>>> +                               flag_guess_branch_prob
>>> +                               ? G_("execution counts estimated")
>>> +                               : G_("execution counts assumed to be zero"));
>>>               if (!flag_guess_branch_prob)
>>> -               inform (input_location,
>>> -                       "this can result in poorly optimized code");
>>> +               dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>>> +                                 "this can result in poorly optimized code");
>>>             }
>>>         }
>>>
>>> @@ -1103,6 +1110,11 @@ coverage_init (const char *filename)
>>>    int len = strlen (filename);
>>>    int prefix_len = 0;
>>>
>>> +  /* Since coverage_init is invoked very early, before the pass
>>> +     manager, we need to set up the dumping explicitly. This is
>>> +     similar to the handling in finish_optimization_passes.  */
>>> +  dump_start (pass_profile.pass.static_pass_number, NULL);
>>> +
>>>    if (!profile_data_prefix && !IS_ABSOLUTE_PATH (filename))
>>>      profile_data_prefix = getpwd ();
>>>
>>> @@ -1145,6 +1157,8 @@ coverage_init (const char *filename)
>>>           gcov_write_unsigned (bbg_file_stamp);
>>>         }
>>>      }
>>> +
>>> +  dump_finish (pass_profile.pass.static_pass_number);
>>>  }
>>>
>>>  /* Performs file-level cleanup.  Close notes file, generate coverage
>>> Index: ipa-inline.c
>>> ===================================================================
>>> --- ipa-inline.c        (revision 201461)
>>> +++ ipa-inline.c        (working copy)
>>> @@ -1322,7 +1322,7 @@ recursive_inlining (struct cgraph_edge *edge,
>>>            reset_edge_growth_cache (curr);
>>>         }
>>>
>>> -      inline_call (curr, false, new_edges, &overall_size, true);
>>> +      inline_call (curr, false, new_edges, &overall_size, true, false);
>>>        lookup_recursive_calls (node, curr->callee, heap);
>>>        n++;
>>>      }
>>> @@ -1612,7 +1612,8 @@ inline_small_functions (void)
>>>             fprintf (dump_file, " Peeling recursion with depth %i\n", depth);
>>>
>>>           gcc_checking_assert (!callee->global.inlined_to);
>>> -         inline_call (edge, true, &new_indirect_edges, &overall_size, true);
>>> +         inline_call (edge, true, &new_indirect_edges, &overall_size, true,
>>> +                       false);
>>>           if (flag_indirect_inlining)
>>>             add_new_edges_to_heap (edge_heap, new_indirect_edges);
>>>
>>> @@ -1733,7 +1734,7 @@ flatten_function (struct cgraph_node *node, bool e
>>>                  xstrdup (cgraph_node_name (callee)),
>>>                  xstrdup (cgraph_node_name (e->caller)));
>>>        orig_callee = callee;
>>> -      inline_call (e, true, NULL, NULL, false);
>>> +      inline_call (e, true, NULL, NULL, false, early);
>>>        if (e->callee != orig_callee)
>>>         orig_callee->symbol.aux = (void *) node;
>>>        flatten_function (e->callee, early);
>>> @@ -1852,7 +1853,8 @@ ipa_inline (void)
>>>                                    inline_summary (node->callers->caller)->size);
>>>                         }
>>>
>>> -                     inline_call (node->callers, true, NULL, NULL, true);
>>> +                     inline_call (node->callers, true, NULL, NULL, true,
>>> +                                   false);
>>>                       if (dump_file)
>>>                         fprintf (dump_file,
>>>                                  " Inlined into %s which now has %i size\n",
>>> @@ -1925,7 +1927,7 @@ inline_always_inline_functions (struct cgraph_node
>>>         fprintf (dump_file, "  Inlining %s into %s (always_inline).\n",
>>>                  xstrdup (cgraph_node_name (e->callee)),
>>>                  xstrdup (cgraph_node_name (e->caller)));
>>> -      inline_call (e, true, NULL, NULL, false);
>>> +      inline_call (e, true, NULL, NULL, false, true);
>>>        inlined = true;
>>>      }
>>>    if (inlined)
>>> @@ -1977,7 +1979,7 @@ early_inline_small_functions (struct cgraph_node *
>>>         fprintf (dump_file, " Inlining %s into %s.\n",
>>>                  xstrdup (cgraph_node_name (callee)),
>>>                  xstrdup (cgraph_node_name (e->caller)));
>>> -      inline_call (e, true, NULL, NULL, true);
>>> +      inline_call (e, true, NULL, NULL, true, true);
>>>        inlined = true;
>>>      }
>>>
>>> Index: ipa-inline.h
>>> ===================================================================
>>> --- ipa-inline.h        (revision 201461)
>>> +++ ipa-inline.h        (working copy)
>>> @@ -228,7 +228,8 @@ void free_growth_caches (void);
>>>  void compute_inline_parameters (struct cgraph_node *, bool);
>>>
>>>  /* In ipa-inline-transform.c  */
>>> -bool inline_call (struct cgraph_edge *, bool, vec<cgraph_edge_p> *, int *, bool);
>>> +bool inline_call (struct cgraph_edge *, bool, vec<cgraph_edge_p> *, int *,
>>> +                  bool, bool);
>>>  unsigned int inline_transform (struct cgraph_node *);
>>>  void clone_inlined_nodes (struct cgraph_edge *e, bool, bool, int *);
>>>
>>> Index: testsuite/gcc.dg/pr40209.c
>>> ===================================================================
>>> --- testsuite/gcc.dg/pr40209.c  (revision 201461)
>>> +++ testsuite/gcc.dg/pr40209.c  (working copy)
>>> @@ -1,5 +1,5 @@
>>>  /* { dg-do compile } */
>>> -/* { dg-options "-O2 -fprofile-use" } */
>>> +/* { dg-options "-O2 -fprofile-use -fopt-info" } */
>>>
>>>  void process(const char *s);
>>>
>>> Index: testsuite/gcc.dg/pr26570.c
>>> ===================================================================
>>> --- testsuite/gcc.dg/pr26570.c  (revision 201461)
>>> +++ testsuite/gcc.dg/pr26570.c  (working copy)
>>> @@ -1,5 +1,5 @@
>>>  /* { dg-do compile } */
>>> -/* { dg-options "-O2 -fprofile-generate -fprofile-use" } */
>>> +/* { dg-options "-O2 -fprofile-generate -fprofile-use -fopt-info" } */
>>>
>>>  unsigned test (unsigned a, unsigned b)
>>>  {
>>> Index: testsuite/gcc.dg/pr32773.c
>>> ===================================================================
>>> --- testsuite/gcc.dg/pr32773.c  (revision 201461)
>>> +++ testsuite/gcc.dg/pr32773.c  (working copy)
>>> @@ -1,6 +1,6 @@
>>>  /* { dg-do compile } */
>>> -/* { dg-options "-O -fprofile-use" } */
>>> -/* { dg-options "-O -m4 -fprofile-use" { target sh-*-* } } */
>>> +/* { dg-options "-O -fprofile-use -fopt-info" } */
>>> +/* { dg-options "-O -m4 -fprofile-use -fopt-info" { target sh-*-* } } */
>>>
>>>  void foo (int *p)
>>>  {
>>> Index: testsuite/g++.dg/tree-ssa/dom-invalid.C
>>> ===================================================================
>>> --- testsuite/g++.dg/tree-ssa/dom-invalid.C     (revision 201461)
>>> +++ testsuite/g++.dg/tree-ssa/dom-invalid.C     (working copy)
>>> @@ -1,7 +1,7 @@
>>>  // PR tree-optimization/39557
>>>  // invalid post-dom info leads to infinite loop
>>>  // { dg-do run }
>>> -// { dg-options "-Wall -fno-exceptions -O2 -fprofile-use -fno-rtti" }
>>> +// { dg-options "-Wall -fno-exceptions -O2 -fprofile-use -fopt-info -fno-rtti" }
>>>
>>>  struct C
>>>  {
>>> Index: testsuite/gcc.dg/inline-dump.c
>>> ===================================================================
>>> --- testsuite/gcc.dg/inline-dump.c      (revision 0)
>>> +++ testsuite/gcc.dg/inline-dump.c      (revision 0)
>>> @@ -0,0 +1,11 @@
>>> +/* Verify that -fopt-info can output correct inline info.  */
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-Wall -fopt-info-inline=stderr -O2 -fno-early-inlining" } */
>>> +static inline int leaf() {
>>> +  int i, ret = 0;
>>> +  for (i = 0; i < 10; i++)
>>> +    ret += i;
>>> +  return ret;
>>> +}
>>> +static inline int foo(void) { return leaf(); } /* { dg-message "note: leaf .*inlined into bar .*via inline instance foo.*\n" } */
>>> +int bar(void) { return foo(); }
>>>>
>>>> Thanks,
>>>> Teresa
>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Martin
>>>>
>>>>
>>>>
>>>> --
>>>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
>>>
>>>
>>>
>>> --
>>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
Richard Biener Aug. 29, 2013, 10:05 a.m. UTC | #10
On Wed, Aug 28, 2013 at 5:20 PM, Xinliang David Li <davidxl@google.com> wrote:
> On Wed, Aug 28, 2013 at 7:09 AM, Teresa Johnson <tejohnson@google.com> wrote:
>> On Wed, Aug 28, 2013 at 4:01 AM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>> On Wed, Aug 7, 2013 at 7:23 AM, Teresa Johnson <tejohnson@google.com> wrote:
>>>> On Tue, Aug 6, 2013 at 9:29 AM, Teresa Johnson <tejohnson@google.com> wrote:
>>>>> On Tue, Aug 6, 2013 at 9:01 AM, Martin Jambor <mjambor@suse.cz> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On Tue, Aug 06, 2013 at 07:14:42AM -0700, Teresa Johnson wrote:
>>>>>>> On Tue, Aug 6, 2013 at 5:37 AM, Martin Jambor <mjambor@suse.cz> wrote:
>>>>>>> > On Mon, Aug 05, 2013 at 10:37:00PM -0700, Teresa Johnson wrote:
>>>>>>> >> This patch ports messages to the new dump framework,
>>>>>>> >
>>>>>>> > It would be great if this new framework was documented somewhere.  I lost
>>>>>>> > track of what was agreed it would be and from the uses in the
>>>>>>> > vectorizer I was never quite sure how to utilize it in other passes.
>>>>>>>
>>>>>>> Cc'ing Sharad who implemented this - Sharad, is this documented on a
>>>>>>> wiki or elsewhere?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>>
>>>>>>> >
>>>>>>> > I'd also like to point out two other minor things inline:
>>>>>>> >
>>>>>>> > [...]
>>>>>>> >
>>>>>>> >> 2013-08-06  Teresa Johnson  <tejohnson@google.com>
>>>>>>> >>             Dehao Chen  <dehao@google.com>
>>>>>>> >>
>>>>>>> >>         * dumpfile.c (dump_loc): Add column number to output, make newlines
>>>>>>> >>         consistent.
>>>>>>> >>         * dumpfile.h (OPTGROUP_OTHER): Add and enable under OPTGROUP_ALL.
>>>>>>> >>         * ipa-inline-transform.c (clone_inlined_nodes):
>>>>>>> >>         (cgraph_node_opt_info): New function.
>>>>>>> >>         (cgraph_node_call_chain): Ditto.
>>>>>>> >>         (dump_inline_decision): Ditto.
>>>>>>> >>         (inline_call): Invoke dump_inline_decision.
>>>>>>> >>         * doc/invoke.texi: Document optall -fopt-info flag.
>>>>>>> >>         * profile.c (read_profile_edge_counts): Use new dump framework.
>>>>>>> >>         (compute_branch_probabilities): Ditto.
>>>>>>> >>         * passes.c (pass_manager::register_one_dump_file): Use OPTGROUP_OTHER
>>>>>>> >>         when pass not in any opt group.
>>>>>>> >>         * value-prof.c (check_counter): Use new dump framework.
>>>>>>> >>         (find_func_by_funcdef_no): Ditto.
>>>>>>> >>         (check_ic_target): Ditto.
>>>>>>> >>         * coverage.c (get_coverage_counts): Ditto.
>>>>>>> >>         (coverage_init): Setup new dump framework.
>>>>>>> >>         * ipa-inline.c (inline_small_functions): Set is_in_ipa_inline.
>>>>>>> >>         * ipa-inline.h (is_in_ipa_inline): Declare.
>>>>>>> >>
>>>>>>> >>         * testsuite/gcc.dg/pr40209.c: Use -fopt-info.
>>>>>>> >>         * testsuite/gcc.dg/pr26570.c: Ditto.
>>>>>>> >>         * testsuite/gcc.dg/pr32773.c: Ditto.
>>>>>>> >>         * testsuite/g++.dg/tree-ssa/dom-invalid.C (struct C): Ditto.
>>>>>>> >>
>>>>>>> >
>>>>>>> > [...]
>>>>>>> >
>>>>>>> >> Index: ipa-inline-transform.c
>>>>>>> >> ===================================================================
>>>>>>> >> --- ipa-inline-transform.c      (revision 201461)
>>>>>>> >> +++ ipa-inline-transform.c      (working copy)
>>>>>>> >> @@ -192,6 +192,108 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d
>>>>>>> >>  }
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> +#define MAX_INT_LENGTH 20
>>>>>>> >> +
>>>>>>> >> +/* Return NODE's name and profile count, if available.  */
>>>>>>> >> +
>>>>>>> >> +static const char *
>>>>>>> >> +cgraph_node_opt_info (struct cgraph_node *node)
>>>>>>> >> +{
>>>>>>> >> +  char *buf;
>>>>>>> >> +  size_t buf_size;
>>>>>>> >> +  const char *bfd_name = lang_hooks.dwarf_name (node->symbol.decl, 0);
>>>>>>> >> +
>>>>>>> >> +  if (!bfd_name)
>>>>>>> >> +    bfd_name = "unknown";
>>>>>>> >> +
>>>>>>> >> +  buf_size = strlen (bfd_name) + 1;
>>>>>>> >> +  if (profile_info)
>>>>>>> >> +    buf_size += (MAX_INT_LENGTH + 3);
>>>>>>> >> +
>>>>>>> >> +  buf = (char *) xmalloc (buf_size);
>>>>>>> >> +
>>>>>>> >> +  strcpy (buf, bfd_name);
>>>>>>> >> +
>>>>>>> >> +  if (profile_info)
>>>>>>> >> +    sprintf (buf, "%s ("HOST_WIDEST_INT_PRINT_DEC")", buf, node->count);
>>>>>>> >> +  return buf;
>>>>>>> >> +}
>>>>>>> >
>>>>>>> > I'm not sure if output of this function is aimed only at the user or
>>>>>>> > if it is supposed to be used by gcc developers as well.  If the
>>>>>>> > latter, an incredibly useful thing is to also dump node->symbol.order
>>>>>>> > too.  We usually dump it after "/" sign separating it from node name.
>>>>>>> > It is invaluable when examining decisions in C++ code where you can
>>>>>>> > have lots of clones of a node (and also because existing dumps print
>>>>>>> > it, it is easy to combine them).
>>>>>>>
>>>>>>> The output is useful both for power users doing performance tuning of
>>>>>>> their application and for gcc developers. Adding the id is not so
>>>>>>> useful for the former, but I agree that it is very useful for compiler
>>>>>>> developers. In fact, in the google branch version we emit more verbose
>>>>>>> information (the lipo module id and the funcdef_no) to help uniquely
>>>>>>> identify the routines and to aid in post-processing by humans and
>>>>>>> tools. So it is probably useful to add something similar here too. Is
>>>>>>> the node->symbol.order more or less unique than the funcdef_no? I see
>>>>>>> that you added a patch a few months ago to print the
>>>>>>> node->symbol.order in the function header, and it also has the
>>>>>>> advantage as you note of matching up with existing ipa dumps.
>>>>>>
>>>>>> node->symbol.order is unique and if I remember correctly, it is not
>>>>>> even recycled.  Clones, inline clones, thunks, every symbol table node
>>>>>> gets its own symbol order so it should be more unique than funcdef_no.
>>>>>> On the other hand it may be a bit cryptic for users but at the same
>>>>>> time it is only one number.
>>>>>
>>>>> Ok, I am going to go ahead and add this to the output.
>>>>>
>>>>>>
>>>>>>>
>>>>>>> >
>>>>>>> > [...]
>>>>>>> >
>>>>>>> >> Index: ipa-inline.c
>>>>>>> >> ===================================================================
>>>>>>> >> --- ipa-inline.c        (revision 201461)
>>>>>>> >> +++ ipa-inline.c        (working copy)
>>>>>>> >> @@ -118,6 +118,9 @@ along with GCC; see the file COPYING3.  If not see
>>>>>>> >>  static int overall_size;
>>>>>>> >>  static gcov_type max_count;
>>>>>>> >>
>>>>>>> >> +/* Global variable to denote if it is in ipa-inline pass. */
>>>>>>> >> +bool is_in_ipa_inline = false;
>>>>>>> >> +
>>>>>>> >>  /* Return false when inlining edge E would lead to violating
>>>>>>> >>     limits on function unit growth or stack usage growth.
>>>>>>> >>
>>>>>>> >
>>>>>>> > In this age of removing global variables, are you sure you need this?
>>>>>>> > The only user of this seems to be a function that is only being called
>>>>>>> > from inline_call... can that ever happen when not inlining?  If you
>>>>>>> > plan to use this function also elsewhere, perhaps the callers will
>>>>>>> > know whether we are inlining or not and can provide this in a
>>>>>>> > parameter?
>>>>>>>
>>>>>>> This is to distinguish early inlining from ipa inlining.
>>>>>>
>>>>>> Oh, right, I did not realize that the IPA part was the important bit
>>>>>> of the name.
>>>>>>
>>>>>>> The volume of
>>>>>>> early inlining messages is too high to be on for the default setting
>>>>>>> of -fopt-info, and they are usually not as interesting for performance
>>>>>>> tuning. The dumper will only emit the early inline messages under a
>>>>>>> more verbose setting (MSG_NOTE):
>>>>>>>       dump_printf_loc (is_in_ipa_inline ? MSG_OPTIMIZED_LOCATIONS : MSG_NOTE ...
>>>>>>> The other way I can see to distinguish this would be to check the
>>>>>>> always_inline_functions_inlined flag on the caller's function. It
>>>>>>> could also be possible to pass down a flag from the callers of
>>>>>>> inline_call, but at least one caller (flatten_functions) is shared
>>>>>>> between early and late inlining, so the flag needs to be passed
>>>>>>> through that as well. WDYT?
>>>>>>
>>>>>> Did you mean flatten_function?  It already has a bool "early"
>>>>>> parameter.  But I can see that being able to quickly figure out
>>>>>> whether we are in early inliner or ipa inliner without much hassle is
>>>>>> useful enough to justify a global variable a month ago, however I
>>>>>> suppose we should not be introducing them now and so you'd have to put
>>>>>> such stuff into... well, you'd probably have to put into the universe
>>>>>> object somewhere because it is basically shared between two passes.
>>>>>> Another option, even though somewhat hackish, would be to look at
>>>>>> current_pass and see which pass it is.  I don't know, do what is
>>>>>> easier or what you like more, just be aware of the problem.
>>>>>
>>>>> After thinking about this some more, I think passing down an early
>>>>> flag from callers is the cleanest way to go.
>>>>>
>>>>> I'll fix these and post a new patch later today.
>>>>
>>>> New patch below that removes this global variable, and also outputs
>>>> the node->symbol.order (in square brackets after the function name so
>>>> as to not clutter it). Inline messages with profile data look like:
>>>>
>>>> test.c:8:3: note: foobar [0] (99999000) inlined into foo [2] (1000)
>>>> with call count 99999000 (via inline instance bar [3] (99999000))
>>>
>>> Ick.  This looks both redundant and cluttered.  This is supposed to be
>>> understandable by GCC users, not only GCC developers.
>>
>> The main part that is only useful/understandable to gcc developers is
>> the node->symbol.order in square brackets, requested by Martin. One
>> possibility is that I could put that part under a param, disabled by
>> default. We have something similar on the google branches that emits
>> LIPO module info in the message, enabled via a param.
>>
>> I'd argue that the other information (the profile counts, emitted only
>> when using -fprofile-use, and the inline call chains) are useful if
>> you want to understand whether and how critical inlines are occurring.
>> I think this is the type of information that users focused on
>> optimizations, as well as gcc developers, want when they use
>> -fopt-info. Otherwise it is difficult to make sense of the inline
>> information.
>>
>>>
>>>> (without FDO the counts in parentheses and the call count would not be
>>>> included).
>>>>
>>>> Ok for trunk?
>>>
>>> Let's split this patch.
>>
>> Ok.
>>
>>>
>>>> Thanks,
>>>> Teresa
>>>>
>>>> 2013-08-06  Teresa Johnson  <tejohnson@google.com>
>>>>             Dehao Chen  <dehao@google.com>
>>>>
>>>>         * dumpfile.c (dump_loc): Output column number, make newlines consistent.
>>>
>>> I don't like column numbers, they are not of much use generally.
>>
>> I added these here to get consistency with other messages (notes
>> emitted via inform(), warnings, errors). Plus the dg-message testing
>> was failing for the test cases that parse this output, since it
>> expects the column to exist.
>>
>>> Does
>>> 'make newlines consistent' avoid all the spurious vertical spacing I see with
>>> -fopt-info?
>>
>> Well, it helps get us there. The problem was that before, since
>> dump_loc was not consistently emitting newlines, the calls had to emit
>> their own newlines manually in the string to ensure there was a
>> newline at all. I was thinking that once this is fixed I could go back
>> and clean up all those calls by removing the newlines in the string. I
>> could split this part into a separate patch and do both at once.
>>
>> However, after thinking about this some more this morning, I am
>> wondering whether it is better to remove the newline emission
>> completely from dump_loc and rely on the caller to put the newline in
>> the string. The reason is that there are 2 high level interfaces to
>> the new dump infrastructure, dump_printf() and dump_printf_loc(). Only
>> the latter invokes dump_loc and gets the newline at the start of the
>> message. The typical usage seems to be to start a message via
>> dump_printf_loc, and then use dump_printf to emit parts of the message
>> (thus not requiring a newline), but I think it may lead to problems to
>> rely on this assumption.
>>
>> So if you agree, I will simply remove the newline altogether from
>> dump_loc, and ensure that all clients of dump_printf/dump_printf_loc
>> include a newline char as appropriate in the string they pass.
>
>
> As a helper function, dump_loc should not blindly emit new line as it
> has no context.  I have tried to remove it, and push the newline to
> higher level helpers -- it mostly works, but the vectorizer verbose
> messages need serious clean up -- most of them assume that
> dump_printf_loc does not end with new line, so that the expression
> dump can follow in the same line (the message texts need clean up too
> -- I do not like the === === in info messages).

I know, but we should really do that cleanup.

Richard.
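
For illustration only, the convention discussed in this exchange (the
location helper emits just the "file:line:col: note: " prefix, and the
caller's format string carries the trailing newline) can be sketched in
standalone C.  emit_loc_prefix and emit_note_loc are hypothetical stand-ins,
not the real dump_loc/dump_printf_loc, and they write into a caller-supplied
buffer instead of a dump stream so the result is easy to inspect.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for dump_loc: writes only the location prefix.
   It never emits a newline, because it has no context about whether the
   message continues on the same line.  */
static int
emit_loc_prefix (char *out, size_t size, const char *file, int line, int col)
{
  return snprintf (out, size, "%s:%d:%d: note: ", file, line, col);
}

/* Hypothetical stand-in for dump_printf_loc: prefix followed by the
   formatted text.  The caller decides, via FMT, whether the line ends.
   Returns the number of characters the full message needs, or a negative
   value on a formatting error.  */
static int
emit_note_loc (char *out, size_t size, const char *file, int line, int col,
               const char *fmt, ...)
{
  int n = emit_loc_prefix (out, size, file, line, col);
  if (n < 0 || (size_t) n >= size)
    return n;

  va_list ap;
  va_start (ap, fmt);
  int m = vsnprintf (out + n, size - n, fmt, ap);
  va_end (ap);

  return m < 0 ? m : n + m;
}
```

A caller that wants a complete line passes a format ending in "\n", e.g.
emit_note_loc (buf, sizeof buf, "test.c", 8, 3, "%s inlined into %s\n",
"foobar", "foo"); a caller that intends to append more text (as the
vectorizer messages do) leaves the newline off.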

> David
>
>
>>
>>>
>>>>         * dumpfile.h (OPTGROUP_OTHER): Add and enable under OPTGROUP_ALL.
>>>
>>> Good change - please split this out (with the related changes) and commit it.
>>
>> Ok, thanks. Will do.
>>
>>>
>>>>         * ipa-inline-transform.c (cgraph_node_opt_info): New function.
>>>>         (cgraph_node_call_chain): Ditto.
>>>>         (dump_inline_decision): Ditto.
>>>>         (inline_call): Invoke dump_inline_decision, new parameter.
>>>
>>> The inline stuff should be split and re-sent, it's non-obvious to me (extra
>>> function parameters are not documented for example).  I'd rather have
>>> inline_and_report_call () for example instead of an extra bool parameter.
>>> But let's iterate over this once it's split out.
>>
>> Ok, I will send this separately. I guess we could have a separate
>> interface inline_and_report_call that is a wrapper around inline_call
>> and simply invokes the dumper. Note that flatten_function will need to
>> conditionally call one of the two interfaces based on the value of its
>> bool early parameter though.
>>
>>>
>>>>         * doc/invoke.texi: Document optall -fopt-info flag.
>>>>         * profile.c (read_profile_edge_counts): Use new dump framework.
>>>>         (compute_branch_probabilities): Ditto.
>>>>         * passes.c (pass_manager::register_one_dump_file): Use OPTGROUP_OTHER
>>>>         when pass not in any opt group.
>>>>         * value-prof.c (check_counter): Use new dump framework.
>>>>         (find_func_by_funcdef_no): Ditto.
>>>>         (check_ic_target): Ditto.
>>>>         * coverage.c (get_coverage_counts): Ditto.
>>>>         (coverage_init): Setup new dump framework.
>>>
>>> These pieces look good to me.
>>>
>>>>         * ipa-inline.c (recursive_inlining): New inline_call parameter.
>>>>         (inline_small_functions): Ditto.
>>>>         (flatten_function): Ditto.
>>>>         (ipa_inline): Ditto.
>>>>         (inline_always_inline_functions): Ditto.
>>>>         (early_inline_small_functions): Ditto.
>>>>         * ipa-inline.h: Ditto.
>>>>
>>>>         * testsuite/gcc.dg/pr40209.c: Use -fopt-info.
>>>>         * testsuite/gcc.dg/pr26570.c: Ditto.
>>>>         * testsuite/gcc.dg/pr32773.c: Ditto.
>>>>         * testsuite/g++.dg/tree-ssa/dom-invalid.C: Ditto.
>>>
>>> Why?  Just remove the stray dg- annotations that deal with the unwanted output?
>>
>> Because there are dg-message annotations that want to confirm this output.
>>
>> Teresa
>>
>>>
>>> Thanks,
>>> Richard.
>>>
>>>>         * testsuite/gcc.dg/inline-dump.c: New test.
>>>>
>>>> Index: dumpfile.c
>>>> ===================================================================
>>>> --- dumpfile.c  (revision 201461)
>>>> +++ dumpfile.c  (working copy)
>>>> @@ -257,16 +257,18 @@ dump_open_alternate_stream (struct dump_file_info
>>>>  void
>>>>  dump_loc (int dump_kind, FILE *dfile, source_location loc)
>>>>  {
>>>> -  /* Currently vectorization passes print location information.  */
>>>>    if (dump_kind)
>>>>      {
>>>> +      /* Ensure dump message starts on a new line.  */
>>>> +      fprintf (dfile, "\n");
>>>>        if (LOCATION_LOCUS (loc) > BUILTINS_LOCATION)
>>>> -        fprintf (dfile, "\n%s:%d: note: ", LOCATION_FILE (loc),
>>>> -                 LOCATION_LINE (loc));
>>>> +        fprintf (dfile, "%s:%d:%d: note: ", LOCATION_FILE (loc),
>>>> +                 LOCATION_LINE (loc), LOCATION_COLUMN (loc));
>>>>        else if (current_function_decl)
>>>> -        fprintf (dfile, "\n%s:%d: note: ",
>>>> +        fprintf (dfile, "%s:%d:%d: note: ",
>>>>                   DECL_SOURCE_FILE (current_function_decl),
>>>> -                 DECL_SOURCE_LINE (current_function_decl));
>>>> +                 DECL_SOURCE_LINE (current_function_decl),
>>>> +                 DECL_SOURCE_COLUMN (current_function_decl));
>>>>      }
>>>>  }
>>>>
>>>> Index: dumpfile.h
>>>> ===================================================================
>>>> --- dumpfile.h  (revision 201461)
>>>> +++ dumpfile.h  (working copy)
>>>> @@ -97,8 +97,9 @@ enum tree_dump_index
>>>>  #define OPTGROUP_LOOP        (1 << 2)   /* Loop optimization passes */
>>>>  #define OPTGROUP_INLINE      (1 << 3)   /* Inlining passes */
>>>>  #define OPTGROUP_VEC         (1 << 4)   /* Vectorization passes */
>>>> +#define OPTGROUP_OTHER       (1 << 5)   /* All other passes */
>>>>  #define OPTGROUP_ALL        (OPTGROUP_IPA | OPTGROUP_LOOP | OPTGROUP_INLINE \
>>>> -                              | OPTGROUP_VEC)
>>>> +                              | OPTGROUP_VEC | OPTGROUP_OTHER)
>>>>
>>>>  /* Define a tree dump switch.  */
>>>>  struct dump_file_info
>>>> Index: ipa-inline-transform.c
>>>> ===================================================================
>>>> --- ipa-inline-transform.c      (revision 201461)
>>>> +++ ipa-inline-transform.c      (working copy)
>>>> @@ -192,6 +192,111 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d
>>>>  }
>>>>
>>>>
>>>> +#define MAX_INT_LENGTH 20
>>>> +
>>>> +/* Return NODE's name and profile count, if available.  */
>>>> +
>>>> +static const char *
>>>> +cgraph_node_opt_info (struct cgraph_node *node)
>>>> +{
>>>> +  char *buf;
>>>> +  size_t buf_size;
>>>> +  const char *bfd_name = lang_hooks.dwarf_name (node->symbol.decl, 0);
>>>> +
>>>> +  if (!bfd_name)
>>>> +    bfd_name = "unknown";
>>>> +
>>>> +  buf_size = strlen (bfd_name) + 1;
>>>> +  if (profile_info)
>>>> +    buf_size += (MAX_INT_LENGTH + 3);
>>>> +  buf_size += MAX_INT_LENGTH;
>>>> +
>>>> +  buf = (char *) xmalloc (buf_size);
>>>> +
>>>> +  strcpy (buf, bfd_name);
>>>> +  //sprintf (buf, "%s/%i", buf, node->symbol.order);
>>>> +  sprintf (buf, "%s [%i]", buf, node->symbol.order);
>>>> +
>>>> +  if (profile_info)
>>>> +    sprintf (buf, "%s ("HOST_WIDEST_INT_PRINT_DEC")", buf, node->count);
>>>> +  return buf;
>>>> +}
>>>> +
>>>> +
>>>> +/* Return CALLER's inlined call chain. Save the cgraph_node of the ultimate
>>>> +   function that the caller is inlined to in FINAL_CALLER.  */
>>>> +
>>>> +static const char *
>>>> +cgraph_node_call_chain (struct cgraph_node *caller,
>>>> +                       struct cgraph_node **final_caller)
>>>> +{
>>>> +  struct cgraph_node *node;
>>>> +  const char *via_str = " (via inline instance";
>>>> +  size_t current_string_len = strlen (via_str) + 1;
>>>> +  size_t buf_size = current_string_len;
>>>> +  char *buf = (char *) xmalloc (buf_size);
>>>> +
>>>> +  buf[0] = 0;
>>>> +  gcc_assert (caller->global.inlined_to != NULL);
>>>> +  strcat (buf, via_str);
>>>> +  for (node = caller; node->global.inlined_to != NULL;
>>>> +       node = node->callers->caller)
>>>> +    {
>>>> +      const char *name = cgraph_node_opt_info (node);
>>>> +      current_string_len += (strlen (name) + 1);
>>>> +      if (current_string_len >= buf_size)
>>>> +       {
>>>> +         buf_size = current_string_len * 2;
>>>> +         buf = (char *) xrealloc (buf, buf_size);
>>>> +       }
>>>> +      strcat (buf, " ");
>>>> +      strcat (buf, name);
>>>> +    }
>>>> +  strcat (buf, ")");
>>>> +  *final_caller = node;
>>>> +  return buf;
>>>> +}
>>>> +
>>>> +
>>>> +/* Dump the inline decision of EDGE.  */
>>>> +
>>>> +static void
>>>> +dump_inline_decision (struct cgraph_edge *edge, bool early)
>>>> +{
>>>> +  location_t locus;
>>>> +  const char *inline_chain_text;
>>>> +  const char *call_count_text;
>>>> +  struct cgraph_node *final_caller = edge->caller;
>>>> +
>>>> +  if (final_caller->global.inlined_to != NULL)
>>>> +    inline_chain_text = cgraph_node_call_chain (final_caller, &final_caller);
>>>> +  else
>>>> +    inline_chain_text = "";
>>>> +
>>>> +  if (edge->count > 0)
>>>> +    {
>>>> +      const char *call_count_str = " with call count ";
>>>> +      char *buf = (char *) xmalloc (strlen (call_count_str) + MAX_INT_LENGTH);
>>>> +      sprintf (buf, "%s"HOST_WIDEST_INT_PRINT_DEC, call_count_str,
>>>> +              edge->count);
>>>> +      call_count_text = buf;
>>>> +    }
>>>> +  else
>>>> +    {
>>>> +      call_count_text = "";
>>>> +    }
>>>> +
>>>> +  locus = gimple_location (edge->call_stmt);
>>>> +  dump_printf_loc (early ? MSG_NOTE : MSG_OPTIMIZED_LOCATIONS,
>>>> +                   locus,
>>>> +                   "%s inlined into %s%s%s\n",
>>>> +                   cgraph_node_opt_info (edge->callee),
>>>> +                   cgraph_node_opt_info (final_caller),
>>>> +                   call_count_text,
>>>> +                   inline_chain_text);
>>>> +}
>>>> +
>>>> +
>>>>  /* Mark edge E as inlined and update callgraph accordingly.  UPDATE_ORIGINAL
>>>>     specify whether profile of original function should be updated.  If any new
>>>>     indirect edges are discovered in the process, add them to NEW_EDGES, unless
>>>> @@ -205,7 +310,8 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d
>>>>  bool
>>>>  inline_call (struct cgraph_edge *e, bool update_original,
>>>>              vec<cgraph_edge_p> *new_edges,
>>>> -            int *overall_size, bool update_overall_summary)
>>>> +            int *overall_size, bool update_overall_summary,
>>>> +             bool early)
>>>>  {
>>>>    int old_size = 0, new_size = 0;
>>>>    struct cgraph_node *to = NULL;
>>>> @@ -218,6 +324,9 @@ inline_call (struct cgraph_edge *e, bool update_or
>>>>    bool predicated = inline_edge_summary (e)->predicate != NULL;
>>>>  #endif
>>>>
>>>> +  if (dump_enabled_p ())
>>>> +    dump_inline_decision (e, early);
>>>> +
>>>>    /* Don't inline inlined edges.  */
>>>>    gcc_assert (e->inline_failed);
>>>>    /* Don't even think of inlining inline clone.  */
>>>> Index: doc/invoke.texi
>>>> ===================================================================
>>>> --- doc/invoke.texi     (revision 201461)
>>>> +++ doc/invoke.texi     (working copy)
>>>> @@ -6234,6 +6234,9 @@ Enable dumps from all loop optimizations.
>>>>  Enable dumps from all inlining optimizations.
>>>>  @item vec
>>>>  Enable dumps from all vectorization optimizations.
>>>> +@item optall
>>>> +Enable dumps from all optimizations. This is a superset of
>>>> +the optimization groups listed above.
>>>>  @end table
>>>>
>>>>  For example,
>>>> Index: profile.c
>>>> ===================================================================
>>>> --- profile.c   (revision 201461)
>>>> +++ profile.c   (working copy)
>>>> @@ -432,8 +432,8 @@ read_profile_edge_counts (gcov_type *exec_counts)
>>>>                     if (flag_profile_correction)
>>>>                       {
>>>>                         static bool informed = 0;
>>>> -                       if (!informed)
>>>> -                         inform (input_location,
>>>> +                       if (dump_enabled_p () && !informed)
>>>> +                         dump_printf_loc (MSG_NOTE, input_location,
>>>>                                   "corrupted profile info: edge count exceeds maximal count");
>>>>                         informed = 1;
>>>>                       }
>>>> @@ -692,10 +692,11 @@ compute_branch_probabilities (unsigned cfg_checksu
>>>>         {
>>>>           /* Inconsistency detected. Make it flow-consistent. */
>>>>           static int informed = 0;
>>>> -         if (informed == 0)
>>>> +         if (dump_enabled_p () && informed == 0)
>>>>             {
>>>>               informed = 1;
>>>> -             inform (input_location, "correcting inconsistent profile data");
>>>> +             dump_printf_loc (MSG_NOTE, input_location,
>>>> +                              "correcting inconsistent profile data");
>>>>             }
>>>>           correct_negative_edge_counts ();
>>>>           /* Set bb counts to the sum of the outgoing edge counts */
>>>> Index: passes.c
>>>> ===================================================================
>>>> --- passes.c    (revision 201461)
>>>> +++ passes.c    (working copy)
>>>> @@ -524,6 +524,11 @@ pass_manager::register_one_dump_file (struct opt_p
>>>>    flag_name = concat (prefix, name, num, NULL);
>>>>    glob_name = concat (prefix, name, NULL);
>>>>    optgroup_flags |= pass->optinfo_flags;
>>>> +  /* For any passes that do not have an optgroup set, and which are not
>>>> +     IPA passes setup above, set the optgroup to OPTGROUP_OTHER so that
>>>> +     any dump messages are emitted properly under -fopt-info(-optall).  */
>>>> +  if (optgroup_flags == OPTGROUP_NONE)
>>>> +    optgroup_flags = OPTGROUP_OTHER;
>>>>    id = dump_register (dot_name, flag_name, glob_name, flags, optgroup_flags);
>>>>    set_pass_for_id (id, pass);
>>>>    full_name = concat (prefix, pass->name, num, NULL);
>>>> Index: value-prof.c
>>>> ===================================================================
>>>> --- value-prof.c        (revision 201461)
>>>> +++ value-prof.c        (working copy)
>>>> @@ -585,9 +585,11 @@ check_counter (gimple stmt, const char * name,
>>>>                : DECL_SOURCE_LOCATION (current_function_decl);
>>>>        if (flag_profile_correction)
>>>>          {
>>>> -         inform (locus, "correcting inconsistent value profile: "
>>>> -                 "%s profiler overall count (%d) does not match BB count "
>>>> -                  "(%d)", name, (int)*all, (int)bb_count);
>>>> +          if (dump_enabled_p ())
>>>> +            dump_printf_loc (MSG_MISSED_OPTIMIZATION, locus,
>>>> +                             "correcting inconsistent value profile: %s "
>>>> +                             "profiler overall count (%d) does not match BB "
>>>> +                             "count (%d)", name, (int)*all, (int)bb_count);
>>>>           *all = bb_count;
>>>>           if (*count > *all)
>>>>              *count = *all;
>>>> @@ -1209,9 +1211,11 @@ find_func_by_funcdef_no (int func_id)
>>>>    int max_id = get_last_funcdef_no ();
>>>>    if (func_id >= max_id || cgraph_node_map[func_id] == NULL)
>>>>      {
>>>> -      if (flag_profile_correction)
>>>> -        inform (DECL_SOURCE_LOCATION (current_function_decl),
>>>> -                "Inconsistent profile: indirect call target (%d) does not exist", func_id);
>>>> +      if (flag_profile_correction && dump_enabled_p ())
>>>> +        dump_printf_loc (MSG_MISSED_OPTIMIZATION,
>>>> +                         DECL_SOURCE_LOCATION (current_function_decl),
>>>> +                         "Inconsistent profile: indirect call target (%d) "
>>>> +                         "does not exist", func_id);
>>>>        else
>>>>          error ("Inconsistent profile: indirect call target (%d) does not exist", func_id);
>>>>
>>>> @@ -1235,8 +1239,10 @@ check_ic_target (gimple call_stmt, struct cgraph_n
>>>>       return true;
>>>>
>>>>     locus =  gimple_location (call_stmt);
>>>> -   inform (locus, "Skipping target %s with mismatching types for icall ",
>>>> -           cgraph_node_name (target));
>>>> +   if (dump_enabled_p ())
>>>> +     dump_printf_loc (MSG_MISSED_OPTIMIZATION, locus,
>>>> +                      "Skipping target %s with mismatching types for icall ",
>>>> +                      cgraph_node_name (target));
>>>>     return false;
>>>>  }
>>>>
>>>> Index: coverage.c
>>>> ===================================================================
>>>> --- coverage.c  (revision 201461)
>>>> +++ coverage.c  (working copy)
>>>> @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.  If not see
>>>>  #include "langhooks.h"
>>>>  #include "hash-table.h"
>>>>  #include "tree-iterator.h"
>>>> +#include "tree-pass.h"
>>>>  #include "cgraph.h"
>>>>  #include "dumpfile.h"
>>>>  #include "diagnostic-core.h"
>>>> @@ -341,11 +342,13 @@ get_coverage_counts (unsigned counter, unsigned ex
>>>>      {
>>>>        static int warned = 0;
>>>>
>>>> -      if (!warned++)
>>>> -       inform (input_location, (flag_guess_branch_prob
>>>> -                ? "file %s not found, execution counts estimated"
>>>> -                : "file %s not found, execution counts assumed to be zero"),
>>>> -               da_file_name);
>>>> +      if (!warned++ && dump_enabled_p ())
>>>> +       dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>>>> +                         (flag_guess_branch_prob
>>>> +                          ? "file %s not found, execution counts estimated"
>>>> +                          : "file %s not found, execution counts assumed to "
>>>> +                            "be zero"),
>>>> +                         da_file_name);
>>>>        return NULL;
>>>>      }
>>>>
>>>> @@ -369,21 +372,25 @@ get_coverage_counts (unsigned counter, unsigned ex
>>>>         warning_at (input_location, OPT_Wcoverage_mismatch,
>>>>                     "the control flow of function %qE does not match "
>>>>                     "its profile data (counter %qs)", id, ctr_names[counter]);
>>>> -      if (warning_printed)
>>>> +      if (warning_printed && dump_enabled_p ())
>>>>         {
>>>> -        inform (input_location, "use -Wno-error=coverage-mismatch to tolerate "
>>>> -                "the mismatch but performance may drop if the function is hot");
>>>> +          dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>>>> +                           "use -Wno-error=coverage-mismatch to tolerate "
>>>> +                           "the mismatch but performance may drop if the "
>>>> +                           "function is hot");
>>>>
>>>>           if (!seen_error ()
>>>>               && !warned++)
>>>>             {
>>>> -             inform (input_location, "coverage mismatch ignored");
>>>> -             inform (input_location, flag_guess_branch_prob
>>>> -                     ? G_("execution counts estimated")
>>>> -                     : G_("execution counts assumed to be zero"));
>>>> +             dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>>>> +                               "coverage mismatch ignored");
>>>> +             dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>>>> +                               flag_guess_branch_prob
>>>> +                               ? G_("execution counts estimated")
>>>> +                               : G_("execution counts assumed to be zero"));
>>>>               if (!flag_guess_branch_prob)
>>>> -               inform (input_location,
>>>> -                       "this can result in poorly optimized code");
>>>> +               dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>>>> +                                 "this can result in poorly optimized code");
>>>>             }
>>>>         }
>>>>
>>>> @@ -1103,6 +1110,11 @@ coverage_init (const char *filename)
>>>>    int len = strlen (filename);
>>>>    int prefix_len = 0;
>>>>
>>>> +  /* Since coverage_init is invoked very early, before the pass
>>>> +     manager, we need to set up the dumping explicitly. This is
>>>> +     similar to the handling in finish_optimization_passes.  */
>>>> +  dump_start (pass_profile.pass.static_pass_number, NULL);
>>>> +
>>>>    if (!profile_data_prefix && !IS_ABSOLUTE_PATH (filename))
>>>>      profile_data_prefix = getpwd ();
>>>>
>>>> @@ -1145,6 +1157,8 @@ coverage_init (const char *filename)
>>>>           gcov_write_unsigned (bbg_file_stamp);
>>>>         }
>>>>      }
>>>> +
>>>> +  dump_finish (pass_profile.pass.static_pass_number);
>>>>  }
>>>>
>>>>  /* Performs file-level cleanup.  Close notes file, generate coverage
>>>> Index: ipa-inline.c
>>>> ===================================================================
>>>> --- ipa-inline.c        (revision 201461)
>>>> +++ ipa-inline.c        (working copy)
>>>> @@ -1322,7 +1322,7 @@ recursive_inlining (struct cgraph_edge *edge,
>>>>            reset_edge_growth_cache (curr);
>>>>         }
>>>>
>>>> -      inline_call (curr, false, new_edges, &overall_size, true);
>>>> +      inline_call (curr, false, new_edges, &overall_size, true, false);
>>>>        lookup_recursive_calls (node, curr->callee, heap);
>>>>        n++;
>>>>      }
>>>> @@ -1612,7 +1612,8 @@ inline_small_functions (void)
>>>>             fprintf (dump_file, " Peeling recursion with depth %i\n", depth);
>>>>
>>>>           gcc_checking_assert (!callee->global.inlined_to);
>>>> -         inline_call (edge, true, &new_indirect_edges, &overall_size, true);
>>>> +         inline_call (edge, true, &new_indirect_edges, &overall_size, true,
>>>> +                       false);
>>>>           if (flag_indirect_inlining)
>>>>             add_new_edges_to_heap (edge_heap, new_indirect_edges);
>>>>
>>>> @@ -1733,7 +1734,7 @@ flatten_function (struct cgraph_node *node, bool e
>>>>                  xstrdup (cgraph_node_name (callee)),
>>>>                  xstrdup (cgraph_node_name (e->caller)));
>>>>        orig_callee = callee;
>>>> -      inline_call (e, true, NULL, NULL, false);
>>>> +      inline_call (e, true, NULL, NULL, false, early);
>>>>        if (e->callee != orig_callee)
>>>>         orig_callee->symbol.aux = (void *) node;
>>>>        flatten_function (e->callee, early);
>>>> @@ -1852,7 +1853,8 @@ ipa_inline (void)
>>>>                                    inline_summary (node->callers->caller)->size);
>>>>                         }
>>>>
>>>> -                     inline_call (node->callers, true, NULL, NULL, true);
>>>> +                     inline_call (node->callers, true, NULL, NULL, true,
>>>> +                                   false);
>>>>                       if (dump_file)
>>>>                         fprintf (dump_file,
>>>>                                  " Inlined into %s which now has %i size\n",
>>>> @@ -1925,7 +1927,7 @@ inline_always_inline_functions (struct cgraph_node
>>>>         fprintf (dump_file, "  Inlining %s into %s (always_inline).\n",
>>>>                  xstrdup (cgraph_node_name (e->callee)),
>>>>                  xstrdup (cgraph_node_name (e->caller)));
>>>> -      inline_call (e, true, NULL, NULL, false);
>>>> +      inline_call (e, true, NULL, NULL, false, true);
>>>>        inlined = true;
>>>>      }
>>>>    if (inlined)
>>>> @@ -1977,7 +1979,7 @@ early_inline_small_functions (struct cgraph_node *
>>>>         fprintf (dump_file, " Inlining %s into %s.\n",
>>>>                  xstrdup (cgraph_node_name (callee)),
>>>>                  xstrdup (cgraph_node_name (e->caller)));
>>>> -      inline_call (e, true, NULL, NULL, true);
>>>> +      inline_call (e, true, NULL, NULL, true, true);
>>>>        inlined = true;
>>>>      }
>>>>
>>>> Index: ipa-inline.h
>>>> ===================================================================
>>>> --- ipa-inline.h        (revision 201461)
>>>> +++ ipa-inline.h        (working copy)
>>>> @@ -228,7 +228,8 @@ void free_growth_caches (void);
>>>>  void compute_inline_parameters (struct cgraph_node *, bool);
>>>>
>>>>  /* In ipa-inline-transform.c  */
>>>> -bool inline_call (struct cgraph_edge *, bool, vec<cgraph_edge_p> *, int *, bool);
>>>> +bool inline_call (struct cgraph_edge *, bool, vec<cgraph_edge_p> *, int *,
>>>> +                  bool, bool);
>>>>  unsigned int inline_transform (struct cgraph_node *);
>>>>  void clone_inlined_nodes (struct cgraph_edge *e, bool, bool, int *);
>>>>
>>>> Index: testsuite/gcc.dg/pr40209.c
>>>> ===================================================================
>>>> --- testsuite/gcc.dg/pr40209.c  (revision 201461)
>>>> +++ testsuite/gcc.dg/pr40209.c  (working copy)
>>>> @@ -1,5 +1,5 @@
>>>>  /* { dg-do compile } */
>>>> -/* { dg-options "-O2 -fprofile-use" } */
>>>> +/* { dg-options "-O2 -fprofile-use -fopt-info" } */
>>>>
>>>>  void process(const char *s);
>>>>
>>>> Index: testsuite/gcc.dg/pr26570.c
>>>> ===================================================================
>>>> --- testsuite/gcc.dg/pr26570.c  (revision 201461)
>>>> +++ testsuite/gcc.dg/pr26570.c  (working copy)
>>>> @@ -1,5 +1,5 @@
>>>>  /* { dg-do compile } */
>>>> -/* { dg-options "-O2 -fprofile-generate -fprofile-use" } */
>>>> +/* { dg-options "-O2 -fprofile-generate -fprofile-use -fopt-info" } */
>>>>
>>>>  unsigned test (unsigned a, unsigned b)
>>>>  {
>>>> Index: testsuite/gcc.dg/pr32773.c
>>>> ===================================================================
>>>> --- testsuite/gcc.dg/pr32773.c  (revision 201461)
>>>> +++ testsuite/gcc.dg/pr32773.c  (working copy)
>>>> @@ -1,6 +1,6 @@
>>>>  /* { dg-do compile } */
>>>> -/* { dg-options "-O -fprofile-use" } */
>>>> -/* { dg-options "-O -m4 -fprofile-use" { target sh-*-* } } */
>>>> +/* { dg-options "-O -fprofile-use -fopt-info" } */
>>>> +/* { dg-options "-O -m4 -fprofile-use -fopt-info" { target sh-*-* } } */
>>>>
>>>>  void foo (int *p)
>>>>  {
>>>> Index: testsuite/g++.dg/tree-ssa/dom-invalid.C
>>>> ===================================================================
>>>> --- testsuite/g++.dg/tree-ssa/dom-invalid.C     (revision 201461)
>>>> +++ testsuite/g++.dg/tree-ssa/dom-invalid.C     (working copy)
>>>> @@ -1,7 +1,7 @@
>>>>  // PR tree-optimization/39557
>>>>  // invalid post-dom info leads to infinite loop
>>>>  // { dg-do run }
>>>> -// { dg-options "-Wall -fno-exceptions -O2 -fprofile-use -fno-rtti" }
>>>> +// { dg-options "-Wall -fno-exceptions -O2 -fprofile-use -fopt-info -fno-rtti" }
>>>>
>>>>  struct C
>>>>  {
>>>> Index: testsuite/gcc.dg/inline-dump.c
>>>> ===================================================================
>>>> --- testsuite/gcc.dg/inline-dump.c      (revision 0)
>>>> +++ testsuite/gcc.dg/inline-dump.c      (revision 0)
>>>> @@ -0,0 +1,11 @@
>>>> +/* Verify that -fopt-info can output correct inline info.  */
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-Wall -fopt-info-inline=stderr -O2 -fno-early-inlining" } */
>>>> +static inline int leaf() {
>>>> +  int i, ret = 0;
>>>> +  for (i = 0; i < 10; i++)
>>>> +    ret += i;
>>>> +  return ret;
>>>> +}
>>>> +static inline int foo(void) { return leaf(); } /* { dg-message "note: leaf .*inlined into bar .*via inline instance foo.*\n" } */
>>>> +int bar(void) { return foo(); }
>>>>>
>>>>> Thanks,
>>>>> Teresa
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Martin
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
Teresa Johnson Aug. 29, 2013, 1:18 p.m. UTC | #11
On Wed, Aug 28, 2013 at 9:07 AM, Teresa Johnson <tejohnson@google.com> wrote:
> On Wed, Aug 28, 2013 at 7:09 AM, Teresa Johnson <tejohnson@google.com> wrote:
>> On Wed, Aug 28, 2013 at 4:01 AM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>> On Wed, Aug 7, 2013 at 7:23 AM, Teresa Johnson <tejohnson@google.com> wrote:
>>>> On Tue, Aug 6, 2013 at 9:29 AM, Teresa Johnson <tejohnson@google.com> wrote:
>>>>> On Tue, Aug 6, 2013 at 9:01 AM, Martin Jambor <mjambor@suse.cz> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On Tue, Aug 06, 2013 at 07:14:42AM -0700, Teresa Johnson wrote:
>>>>>>> On Tue, Aug 6, 2013 at 5:37 AM, Martin Jambor <mjambor@suse.cz> wrote:
>>>>>>> > On Mon, Aug 05, 2013 at 10:37:00PM -0700, Teresa Johnson wrote:
>>>>>>> >> This patch ports messages to the new dump framework,
>>>>>>> >
>>>>>>> > It would be great if this new framework was documented somewhere.  I lost
>>>>>>> > track of what was agreed it would be and from the uses in the
>>>>>>> > vectorizer I was never quite sure how to utilize it in other passes.
>>>>>>>
>>>>>>> Cc'ing Sharad who implemented this - Sharad, is this documented on a
>>>>>>> wiki or elsewhere?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>>
>>>>>>> >
>>>>>>> > I'd also like to point out two other minor things inline:
>>>>>>> >
>>>>>>> > [...]
>>>>>>> >
>>>>>>> >> 2013-08-06  Teresa Johnson  <tejohnson@google.com>
>>>>>>> >>             Dehao Chen  <dehao@google.com>
>>>>>>> >>
>>>>>>> >>         * dumpfile.c (dump_loc): Add column number to output, make newlines
>>>>>>> >>         consistent.
>>>>>>> >>         * dumpfile.h (OPTGROUP_OTHER): Add and enable under OPTGROUP_ALL.
>>>>>>> >>         * ipa-inline-transform.c (clone_inlined_nodes):
>>>>>>> >>         (cgraph_node_opt_info): New function.
>>>>>>> >>         (cgraph_node_call_chain): Ditto.
>>>>>>> >>         (dump_inline_decision): Ditto.
>>>>>>> >>         (inline_call): Invoke dump_inline_decision.
>>>>>>> >>         * doc/invoke.texi: Document optall -fopt-info flag.
>>>>>>> >>         * profile.c (read_profile_edge_counts): Use new dump framework.
>>>>>>> >>         (compute_branch_probabilities): Ditto.
>>>>>>> >>         * passes.c (pass_manager::register_one_dump_file): Use OPTGROUP_OTHER
>>>>>>> >>         when pass not in any opt group.
>>>>>>> >>         * value-prof.c (check_counter): Use new dump framework.
>>>>>>> >>         (find_func_by_funcdef_no): Ditto.
>>>>>>> >>         (check_ic_target): Ditto.
>>>>>>> >>         * coverage.c (get_coverage_counts): Ditto.
>>>>>>> >>         (coverage_init): Setup new dump framework.
>>>>>>> >>         * ipa-inline.c (inline_small_functions): Set is_in_ipa_inline.
>>>>>>> >>         * ipa-inline.h (is_in_ipa_inline): Declare.
>>>>>>> >>
>>>>>>> >>         * testsuite/gcc.dg/pr40209.c: Use -fopt-info.
>>>>>>> >>         * testsuite/gcc.dg/pr26570.c: Ditto.
>>>>>>> >>         * testsuite/gcc.dg/pr32773.c: Ditto.
>>>>>>> >>         * testsuite/g++.dg/tree-ssa/dom-invalid.C (struct C): Ditto.
>>>>>>> >>
>>>>>>> >
>>>>>>> > [...]
>>>>>>> >
>>>>>>> >> Index: ipa-inline-transform.c
>>>>>>> >> ===================================================================
>>>>>>> >> --- ipa-inline-transform.c      (revision 201461)
>>>>>>> >> +++ ipa-inline-transform.c      (working copy)
>>>>>>> >> @@ -192,6 +192,108 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d
>>>>>>> >>  }
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> +#define MAX_INT_LENGTH 20
>>>>>>> >> +
>>>>>>> >> +/* Return NODE's name and profile count, if available.  */
>>>>>>> >> +
>>>>>>> >> +static const char *
>>>>>>> >> +cgraph_node_opt_info (struct cgraph_node *node)
>>>>>>> >> +{
>>>>>>> >> +  char *buf;
>>>>>>> >> +  size_t buf_size;
>>>>>>> >> +  const char *bfd_name = lang_hooks.dwarf_name (node->symbol.decl, 0);
>>>>>>> >> +
>>>>>>> >> +  if (!bfd_name)
>>>>>>> >> +    bfd_name = "unknown";
>>>>>>> >> +
>>>>>>> >> +  buf_size = strlen (bfd_name) + 1;
>>>>>>> >> +  if (profile_info)
>>>>>>> >> +    buf_size += (MAX_INT_LENGTH + 3);
>>>>>>> >> +
>>>>>>> >> +  buf = (char *) xmalloc (buf_size);
>>>>>>> >> +
>>>>>>> >> +  strcpy (buf, bfd_name);
>>>>>>> >> +
>>>>>>> >> +  if (profile_info)
>>>>>>> >> +    sprintf (buf, "%s ("HOST_WIDEST_INT_PRINT_DEC")", buf, node->count);
>>>>>>> >> +  return buf;
>>>>>>> >> +}
>>>>>>> >
>>>>>>> > I'm not sure if output of this function is aimed only at the user or
>>>>>>> > if it is supposed to be used by gcc developers as well.  If the
>>>>>>> > latter, an incredibly useful thing is to also dump node->symbol.order
>>>>>>> > too.  We usually dump it after "/" sign separating it from node name.
>>>>>>> > It is invaluable when examining decisions in C++ code where you can
>>>>>>> > have lots of clones of a node (and also because existing dumps print
>>>>>>> > it, it is easy to combine them).
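A note on the quoted cgraph_node_opt_info: it calls sprintf with buf as both the destination and a "%s" argument, which the C standard leaves undefined because the objects overlap. The following is only a minimal sketch of building a "name/order (count)" label in a fresh buffer instead. The struct node_info type and format_node_label are hypothetical stand-ins, not GCC interfaces, and the "/" separator simply follows the convention Martin describes above.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the few cgraph_node fields used here.  */
struct node_info
{
  const char *name;   /* printable name, may be NULL */
  int order;          /* stand-in for node->symbol.order */
  long long count;    /* stand-in for node->count */
};

/* Return a freshly allocated "name/order" or "name/order (count)" string.
   The caller owns the result and must free it.  Returns NULL on failure.  */
char *
format_node_label (const struct node_info *node, int have_profile)
{
  const char *name = node->name ? node->name : "unknown";
  int len;
  char *buf;

  /* First pass: ask snprintf how many characters are needed.  */
  if (have_profile)
    len = snprintf (NULL, 0, "%s/%d (%lld)", name, node->order, node->count);
  else
    len = snprintf (NULL, 0, "%s/%d", name, node->order);
  if (len < 0)
    return NULL;

  buf = (char *) malloc ((size_t) len + 1);
  if (!buf)
    return NULL;

  /* Second pass: format into the new buffer, so the source strings and
     the destination never overlap.  */
  if (have_profile)
    snprintf (buf, (size_t) len + 1, "%s/%d (%lld)", name, node->order,
              node->count);
  else
    snprintf (buf, (size_t) len + 1, "%s/%d", name, node->order);
  return buf;
}
```

Sizing the buffer from snprintf's return value also avoids the fixed MAX_INT_LENGTH estimate, at the cost of formatting twice.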
>>>>>>>
>>>>>>> The output is useful both for power users doing performance tuning of
>>>>>>> their application and for gcc developers. Adding the id is not so
>>>>>>> useful for the former, but I agree that it is very useful for compiler
>>>>>>> developers. In fact, in the google branch version we emit more verbose
>>>>>>> information (the lipo module id and the funcdef_no) to help uniquely
>>>>>>> identify the routines and to aid in post-processing by humans and
>>>>>>> tools. So it is probably useful to add something similar here too. Is
>>>>>>> the node->symbol.order more or less unique than the funcdef_no? I see
>>>>>>> that you added a patch a few months ago to print the
>>>>>>> node->symbol.order in the function header, and it also has the
>>>>>>> advantage as you note of matching up with existing ipa dumps.
>>>>>>
>>>>>> node->symbol.order is unique and if I remember correctly, it is not
>>>>>> even recycled.  Clones, inline clones, thunks, every symbol table node
>>>>>> gets its own symbol order so it should be more unique than funcdef_no.
>>>>>> On the other hand it may be a bit cryptic for users but at the same
>>>>>> time it is only one number.
>>>>>
>>>>> Ok, I am going to go ahead and add this to the output.
>>>>>
>>>>>>
>>>>>>>
>>>>>>> >
>>>>>>> > [...]
>>>>>>> >
>>>>>>> >> Index: ipa-inline.c
>>>>>>> >> ===================================================================
>>>>>>> >> --- ipa-inline.c        (revision 201461)
>>>>>>> >> +++ ipa-inline.c        (working copy)
>>>>>>> >> @@ -118,6 +118,9 @@ along with GCC; see the file COPYING3.  If not see
>>>>>>> >>  static int overall_size;
>>>>>>> >>  static gcov_type max_count;
>>>>>>> >>
>>>>>>> >> +/* Global variable to denote if it is in ipa-inline pass. */
>>>>>>> >> +bool is_in_ipa_inline = false;
>>>>>>> >> +
>>>>>>> >>  /* Return false when inlining edge E would lead to violating
>>>>>>> >>     limits on function unit growth or stack usage growth.
>>>>>>> >>
>>>>>>> >
>>>>>>> > In this age of removing global variables, are you sure you need this?
>>>>>>> > The only user of this seems to be a function that is only being called
>>>>>>> > from inline_call... can that ever happen when not inlining?  If you
>>>>>>> > plan to use this function also elsewhere, perhaps the callers will
>>>>>>> > know whether we are inlining or not and can provide this in a
>>>>>>> > parameter?
>>>>>>>
>>>>>>> This is to distinguish early inlining from ipa inlining.
>>>>>>
>>>>>> Oh, right, I did not realize that the IPA part was the important bit
>>>>>> of the name.
>>>>>>
>>>>>>> The volume of
>>>>>>> early inlining messages is too high to enable under the default setting
>>>>>>> of -fopt-info, and they are usually not as interesting for performance
>>>>>>> tuning. The dumper will only emit the early inline messages under a
>>>>>>> more verbose setting (MSG_NOTE):
>>>>>>>       dump_printf_loc (is_in_ipa_inline ? MSG_OPTIMIZED_LOCATIONS : MSG_NOTE ...
>>>>>>> The other way I can see to distinguish this would be to check the
>>>>>>> always_inline_functions_inlined flag on the caller's function. It
>>>>>>> could also be possible to pass down a flag from the callers of
>>>>>>> inline_call, but at least one caller (flatten_functions) is shared
>>>>>>> between early and late inlining, so the flag needs to be passed
>>>>>>> through that as well. WDYT?
>>>>>>
>>>>>> Did you mean flatten_function?  It already has a bool "early"
>>>>>> parameter.  But I can see that being able to quickly figure out
>>>>>> whether we are in early inliner or ipa inliner without much hassle is
>>>>>> useful enough to justify a global variable a month ago, however I
>>>>>> suppose we should not be introducing them now and so you'd have to put
>>>>>> such stuff into... well, you'd probably have to put into the universe
>>>>>> object somewhere because it is basically shared between two passes.
>>>>>> Another option, even though somewhat hackish, would be to look at
>>>>>> current_pass and see which pass it is.  I don't know, do what is
>>>>>> easier or what you like more, just be aware of the problem.
>>>>>
>>>>> After thinking about this some more, I think passing down an early
>>>>> flag from callers is the cleanest way to go.
>>>>>
>>>>> I'll fix these and post a new patch later today.
>>>>
>>>> New patch below that removes this global variable, and also outputs
>>>> the node->symbol.order (in square brackets after the function name so
>>>> as to not clutter it). Inline messages with profile data look like:
>>>>
>>>> test.c:8:3: note: foobar [0] (99999000) inlined into foo [2] (1000)
>>>> with call count 99999000 (via inline instance bar [3] (99999000))
>>>
>>> Ick.  This looks both redundant and cluttered.  This is supposed to be
>>> understandable by GCC users, not only GCC developers.
>>
>> The main part that is only useful/understandable to gcc developers is
>> the node->symbol.order in square brackets, requested by Martin. One
>> possibility is that I could put that part under a param, disabled by
>> default. We have something similar on the google branches that emits
>> LIPO module info in the message, enabled via a param.
>>
>> I'd argue that the other information (the profile counts, emitted only
>> when using -fprofile-use, and the inline call chains) are useful if
>> you want to understand whether and how critical inlines are occurring.
>> I think this is the type of information that users focused on
>> optimizations, as well as gcc developers, want when they use
>> -fopt-info. Otherwise it is difficult to make sense of the inline
>> information.
>>
>>>
>>>> (without FDO the counts in parentheses and the call count would not be
>>>> included).
>>>>
>>>> Ok for trunk?
>>>
>>> Let's split this patch.
>>
>> Ok.
>>
>>>
>>>> Thanks,
>>>> Teresa
>>>>
>>>> 2013-08-06  Teresa Johnson  <tejohnson@google.com>
>>>>             Dehao Chen  <dehao@google.com>
>>>>
>>>>         * dumpfile.c (dump_loc): Output column number, make newlines consistent.
>>>
>>> I don't like column numbers, they are generally not of much use.
>>
>> I added these here to get consistency with other messages (notes
>> emitted via inform(), warnings, errors). Plus the dg-message testing
>> was failing for the test cases that parse this output, since it
>> expects the column to exist.
>
> The above change (output column number) and the changes in the
> testsuite go with the change you have approved below (due to moving
> some profile messages to the new framework). Ok to commit these along
> with that approved portion?

Richard, is this part OK, since it goes with the part you approved below?

Thanks,
Teresa

>
> Thanks,
> Teresa
>
>>
>>> Does
>>> 'make newlines consistent' avoid all the spurious vertical spacing I see with
>>> -fopt-info?
>>
>> Well, it helps get us there. The problem was that before, since
>> dump_loc was not consistently emitting newlines, the calls had to emit
>> their own newlines manually in the string to ensure there was a
>> newline at all. I was thinking that once this is fixed I could go back
>> and clean up all those calls by removing the newlines in the string. I
>> could split this part into a separate patch and do both at once.
>>
>> However, after thinking about this some more this morning, I am
>> wondering whether it is better to remove the newline emission
>> completely from dump_loc and rely on the caller to put the newline in
>> the string. The reason is that there are 2 high level interfaces to
>> the new dump infrastructure, dump_printf() and dump_printf_loc(). Only
>> the latter invokes dump_loc and gets the newline at the start of the
>> message. The typical usage seems to be to start a message via
>> dump_printf_loc, and then use dump_printf to emit parts of the message
>> (thus not requiring a newline), but I think it may lead to problems to
>> rely on this assumption.
>>
>> So if you agree, I will simply remove the newline altogether from
>> dump_loc, and ensure that all clients of dump_printf/dump_printf_loc
>> include a newline char as appropriate in the string they pass.
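The convention proposed above (no newline from the location printer, every caller ends its own message) can be sketched roughly as follows. This is a toy model, not the GCC implementation: sketch_sink, sketch_dump_printf, sketch_dump_printf_loc and sketch_demo are hypothetical names, and the sink is an in-memory buffer standing in for the dump stream.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory sink standing in for the dump stream.  */
char sketch_sink[256];

/* Append formatted text to the sink.  Never adds a newline itself.  */
void
sketch_dump_printf (const char *fmt, ...)
{
  size_t used = strlen (sketch_sink);
  va_list ap;

  va_start (ap, fmt);
  vsnprintf (sketch_sink + used, sizeof sketch_sink - used, fmt, ap);
  va_end (ap);
}

/* Append a "file:line:column: note: " prefix followed by formatted text.
   No leading or trailing newline is emitted here either.  */
void
sketch_dump_printf_loc (const char *file, int line, int column,
                        const char *fmt, ...)
{
  size_t used = strlen (sketch_sink);
  va_list ap;

  snprintf (sketch_sink + used, sizeof sketch_sink - used,
            "%s:%d:%d: note: ", file, line, column);
  used = strlen (sketch_sink);
  va_start (ap, fmt);
  vsnprintf (sketch_sink + used, sizeof sketch_sink - used, fmt, ap);
  va_end (ap);
}

/* One message built from a location call plus continuations; the caller
   supplies the single terminating newline.  */
const char *
sketch_demo (void)
{
  sketch_sink[0] = '\0';
  sketch_dump_printf_loc ("test.c", 8, 3, "%s inlined into %s",
                          "foobar", "foo");
  sketch_dump_printf (" with call count %d", 42);
  sketch_dump_printf ("\n");
  return sketch_sink;
}
```

Under this model a message split across several calls still ends up on one line, and there is no blank line before it, because neither helper guesses where a message starts or ends.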
>>
>>>
>>>>         * dumpfile.h (OPTGROUP_OTHER): Add and enable under OPTGROUP_ALL.
>>>
>>> Good change - please split this out (with the related changes) and commit it.
>>
>> Ok, thanks. Will do.
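For readers following the OPTGROUP_OTHER change, here is a small sketch of how the fallback interacts with the group mask. The values for LOOP, INLINE, VEC and OTHER are copied from the quoted dumpfile.h hunk; the values shown for OPTGROUP_NONE and OPTGROUP_IPA are assumptions, and effective_optgroup and optgroup_enabled_p are hypothetical helpers, not functions in the patch.

```c
/* LOOP, INLINE, VEC and OTHER match the quoted dumpfile.h hunk.
   NONE and IPA are assumed values for this sketch.  */
#define OPTGROUP_NONE    (0)
#define OPTGROUP_IPA     (1 << 1)
#define OPTGROUP_LOOP    (1 << 2)
#define OPTGROUP_INLINE  (1 << 3)
#define OPTGROUP_VEC     (1 << 4)
#define OPTGROUP_OTHER   (1 << 5)
#define OPTGROUP_ALL     (OPTGROUP_IPA | OPTGROUP_LOOP | OPTGROUP_INLINE \
                          | OPTGROUP_VEC | OPTGROUP_OTHER)

/* Mirror the passes.c hunk: a pass with no group falls into OTHER.  */
int
effective_optgroup (int pass_flags)
{
  return pass_flags == OPTGROUP_NONE ? OPTGROUP_OTHER : pass_flags;
}

/* Nonzero if a pass with PASS_FLAGS emits messages when the user asked
   for the groups in REQUESTED.  */
int
optgroup_enabled_p (int requested, int pass_flags)
{
  return (requested & effective_optgroup (pass_flags)) != 0;
}
```

Without the fallback, a pass registered with no group would never match any requested mask, which is why its messages would be lost even under the all-groups setting.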
>>
>>>
>>>>         * ipa-inline-transform.c (cgraph_node_opt_info): New function.
>>>>         (cgraph_node_call_chain): Ditto.
>>>>         (dump_inline_decision): Ditto.
>>>>         (inline_call): Invoke dump_inline_decision, new parameter.
>>>
>>> The inline stuff should be split and re-sent, it's non-obvious to me (extra
>>> function parameters are not documented for example).  I'd rather have
>>> inline_and_report_call () for example instead of an extra bool parameter.
>>> But let's iterate over this once it's split out.
>>
>> Ok, I will send this separately. I guess we could have a separate
>> interface inline_and_report_call that is a wrapper around inline_call
>> and simply invokes the dumper. Note that flatten_function will need to
>> conditionally call one of the two interfaces based on the value of its
>> bool early parameter though.
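A rough sketch of the wrapper idea discussed above, with stub types in place of the real cgraph structures. Everything here is hypothetical: sketch_edge, sketch_inline_call, sketch_inline_and_report_call and sketch_flatten_step are illustration-only names, and the choice to report only on the late (non-early) path is an assumption made for the sketch, not something the thread settles.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for struct cgraph_edge.  */
struct sketch_edge
{
  const char *callee;
  const char *caller;
  bool inlined;
};

/* Counts reports, so the effect of the wrapper is observable.  */
int sketch_reports_emitted;

/* Stand-in for inline_call: performs the transformation only.  */
bool
sketch_inline_call (struct sketch_edge *e)
{
  e->inlined = true;
  return true;
}

/* Wrapper: report the decision, then delegate to the plain call.  */
bool
sketch_inline_and_report_call (struct sketch_edge *e)
{
  sketch_reports_emitted++;
  fprintf (stderr, "note: %s inlined into %s\n", e->callee, e->caller);
  return sketch_inline_call (e);
}

/* A caller shared by early and late inlining picks one interface
   based on its own EARLY flag (assumed policy: report only when late).  */
bool
sketch_flatten_step (struct sketch_edge *e, bool early)
{
  return early ? sketch_inline_call (e)
               : sketch_inline_and_report_call (e);
}
```

This keeps the signature of the plain call unchanged for callers that never report, and confines the early/late decision to the one shared caller.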
>>
>>>
>>>>         * doc/invoke.texi: Document optall -fopt-info flag.
>>>>         * profile.c (read_profile_edge_counts): Use new dump framework.
>>>>         (compute_branch_probabilities): Ditto.
>>>>         * passes.c (pass_manager::register_one_dump_file): Use OPTGROUP_OTHER
>>>>         when pass not in any opt group.
>>>>         * value-prof.c (check_counter): Use new dump framework.
>>>>         (find_func_by_funcdef_no): Ditto.
>>>>         (check_ic_target): Ditto.
>>>>         * coverage.c (get_coverage_counts): Ditto.
>>>>         (coverage_init): Setup new dump framework.
>>>
>>> These pieces look good to me.
>>>
>>>>         * ipa-inline.c (recursive_inlining): New inline_call parameter.
>>>>         (inline_small_functions): Ditto.
>>>>         (flatten_function): Ditto.
>>>>         (ipa_inline): Ditto.
>>>>         (inline_always_inline_functions): Ditto.
>>>>         (early_inline_small_functions): Ditto.
>>>>         * ipa-inline.h: Ditto.
>>>>
>>>>         * testsuite/gcc.dg/pr40209.c: Use -fopt-info.
>>>>         * testsuite/gcc.dg/pr26570.c: Ditto.
>>>>         * testsuite/gcc.dg/pr32773.c: Ditto.
>>>>         * testsuite/g++.dg/tree-ssa/dom-invalid.C: Ditto.
>>>
>>> Why?  Just remove the stray dg- annotations that deal with the unwanted output?
>>
>> Because there are dg-message annotations that want to confirm this output.
>>
>> Teresa
>>
>>>
>>> Thanks,
>>> Richard.
>>>
>>>>         * testsuite/gcc.dg/inline-dump.c: New test.
>>>>
>>>> Index: dumpfile.c
>>>> ===================================================================
>>>> --- dumpfile.c  (revision 201461)
>>>> +++ dumpfile.c  (working copy)
>>>> @@ -257,16 +257,18 @@ dump_open_alternate_stream (struct dump_file_info
>>>>  void
>>>>  dump_loc (int dump_kind, FILE *dfile, source_location loc)
>>>>  {
>>>> -  /* Currently vectorization passes print location information.  */
>>>>    if (dump_kind)
>>>>      {
>>>> +      /* Ensure dump message starts on a new line.  */
>>>> +      fprintf (dfile, "\n");
>>>>        if (LOCATION_LOCUS (loc) > BUILTINS_LOCATION)
>>>> -        fprintf (dfile, "\n%s:%d: note: ", LOCATION_FILE (loc),
>>>> -                 LOCATION_LINE (loc));
>>>> +        fprintf (dfile, "%s:%d:%d: note: ", LOCATION_FILE (loc),
>>>> +                 LOCATION_LINE (loc), LOCATION_COLUMN (loc));
>>>>        else if (current_function_decl)
>>>> -        fprintf (dfile, "\n%s:%d: note: ",
>>>> +        fprintf (dfile, "%s:%d:%d: note: ",
>>>>                   DECL_SOURCE_FILE (current_function_decl),
>>>> -                 DECL_SOURCE_LINE (current_function_decl));
>>>> +                 DECL_SOURCE_LINE (current_function_decl),
>>>> +                 DECL_SOURCE_COLUMN (current_function_decl));
>>>>      }
>>>>  }
>>>>
>>>> Index: dumpfile.h
>>>> ===================================================================
>>>> --- dumpfile.h  (revision 201461)
>>>> +++ dumpfile.h  (working copy)
>>>> @@ -97,8 +97,9 @@ enum tree_dump_index
>>>>  #define OPTGROUP_LOOP        (1 << 2)   /* Loop optimization passes */
>>>>  #define OPTGROUP_INLINE      (1 << 3)   /* Inlining passes */
>>>>  #define OPTGROUP_VEC         (1 << 4)   /* Vectorization passes */
>>>> +#define OPTGROUP_OTHER       (1 << 5)   /* All other passes */
>>>>  #define OPTGROUP_ALL        (OPTGROUP_IPA | OPTGROUP_LOOP | OPTGROUP_INLINE \
>>>> -                              | OPTGROUP_VEC)
>>>> +                              | OPTGROUP_VEC | OPTGROUP_OTHER)
>>>>
>>>>  /* Define a tree dump switch.  */
>>>>  struct dump_file_info
>>>> Index: ipa-inline-transform.c
>>>> ===================================================================
>>>> --- ipa-inline-transform.c      (revision 201461)
>>>> +++ ipa-inline-transform.c      (working copy)
>>>> @@ -192,6 +192,111 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d
>>>>  }
>>>>
>>>>
>>>> +#define MAX_INT_LENGTH 20
>>>> +
>>>> +/* Return NODE's name and profile count, if available.  */
>>>> +
>>>> +static const char *
>>>> +cgraph_node_opt_info (struct cgraph_node *node)
>>>> +{
>>>> +  char *buf;
>>>> +  size_t buf_size;
>>>> +  const char *bfd_name = lang_hooks.dwarf_name (node->symbol.decl, 0);
>>>> +
>>>> +  if (!bfd_name)
>>>> +    bfd_name = "unknown";
>>>> +
>>>> +  buf_size = strlen (bfd_name) + 1;
>>>> +  if (profile_info)
>>>> +    buf_size += (MAX_INT_LENGTH + 3);
>>>> +  buf_size += MAX_INT_LENGTH;
>>>> +
>>>> +  buf = (char *) xmalloc (buf_size);
>>>> +
>>>> +  strcpy (buf, bfd_name);
>>>> +  //sprintf (buf, "%s/%i", buf, node->symbol.order);
>>>> +  sprintf (buf, "%s [%i]", buf, node->symbol.order);
>>>> +
>>>> +  if (profile_info)
>>>> +    sprintf (buf, "%s ("HOST_WIDEST_INT_PRINT_DEC")", buf, node->count);
>>>> +  return buf;
>>>> +}
>>>> +
>>>> +
>>>> +/* Return CALLER's inlined call chain. Save the cgraph_node of the ultimate
>>>> +   function that the caller is inlined to in FINAL_CALLER.  */
>>>> +
>>>> +static const char *
>>>> +cgraph_node_call_chain (struct cgraph_node *caller,
>>>> +                       struct cgraph_node **final_caller)
>>>> +{
>>>> +  struct cgraph_node *node;
>>>> +  const char *via_str = " (via inline instance";
>>>> +  size_t current_string_len = strlen (via_str) + 1;
>>>> +  size_t buf_size = current_string_len;
>>>> +  char *buf = (char *) xmalloc (buf_size);
>>>> +
>>>> +  buf[0] = 0;
>>>> +  gcc_assert (caller->global.inlined_to != NULL);
>>>> +  strcat (buf, via_str);
>>>> +  for (node = caller; node->global.inlined_to != NULL;
>>>> +       node = node->callers->caller)
>>>> +    {
>>>> +      const char *name = cgraph_node_opt_info (node);
>>>> +      current_string_len += (strlen (name) + 1);
>>>> +      if (current_string_len >= buf_size)
>>>> +       {
>>>> +         buf_size = current_string_len * 2;
>>>> +         buf = (char *) xrealloc (buf, buf_size);
>>>> +       }
>>>> +      strcat (buf, " ");
>>>> +      strcat (buf, name);
>>>> +    }
>>>> +  strcat (buf, ")");
>>>> +  *final_caller = node;
>>>> +  return buf;
>>>> +}
>>>> +
>>>> +
>>>> +/* Dump the inline decision of EDGE.  */
>>>> +
>>>> +static void
>>>> +dump_inline_decision (struct cgraph_edge *edge, bool early)
>>>> +{
>>>> +  location_t locus;
>>>> +  const char *inline_chain_text;
>>>> +  const char *call_count_text;
>>>> +  struct cgraph_node *final_caller = edge->caller;
>>>> +
>>>> +  if (final_caller->global.inlined_to != NULL)
>>>> +    inline_chain_text = cgraph_node_call_chain (final_caller, &final_caller);
>>>> +  else
>>>> +    inline_chain_text = "";
>>>> +
>>>> +  if (edge->count > 0)
>>>> +    {
>>>> +      const char *call_count_str = " with call count ";
>>>> +      char *buf = (char *) xmalloc (strlen (call_count_str) + MAX_INT_LENGTH);
>>>> +      sprintf (buf, "%s"HOST_WIDEST_INT_PRINT_DEC, call_count_str,
>>>> +              edge->count);
>>>> +      call_count_text = buf;
>>>> +    }
>>>> +  else
>>>> +    {
>>>> +      call_count_text = "";
>>>> +    }
>>>> +
>>>> +  locus = gimple_location (edge->call_stmt);
>>>> +  dump_printf_loc (early ? MSG_NOTE : MSG_OPTIMIZED_LOCATIONS,
>>>> +                   locus,
>>>> +                   "%s inlined into %s%s%s\n",
>>>> +                   cgraph_node_opt_info (edge->callee),
>>>> +                   cgraph_node_opt_info (final_caller),
>>>> +                   call_count_text,
>>>> +                   inline_chain_text);
>>>> +}
>>>> +
>>>> +
>>>>  /* Mark edge E as inlined and update callgraph accordingly.  UPDATE_ORIGINAL
>>>>     specify whether profile of original function should be updated.  If any new
>>>>     indirect edges are discovered in the process, add them to NEW_EDGES, unless
>>>> @@ -205,7 +310,8 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d
>>>>  bool
>>>>  inline_call (struct cgraph_edge *e, bool update_original,
>>>>              vec<cgraph_edge_p> *new_edges,
>>>> -            int *overall_size, bool update_overall_summary)
>>>> +            int *overall_size, bool update_overall_summary,
>>>> +             bool early)
>>>>  {
>>>>    int old_size = 0, new_size = 0;
>>>>    struct cgraph_node *to = NULL;
>>>> @@ -218,6 +324,9 @@ inline_call (struct cgraph_edge *e, bool update_or
>>>>    bool predicated = inline_edge_summary (e)->predicate != NULL;
>>>>  #endif
>>>>
>>>> +  if (dump_enabled_p ())
>>>> +    dump_inline_decision (e, early);
>>>> +
>>>>    /* Don't inline inlined edges.  */
>>>>    gcc_assert (e->inline_failed);
>>>>    /* Don't even think of inlining inline clone.  */
>>>> Index: doc/invoke.texi
>>>> ===================================================================
>>>> --- doc/invoke.texi     (revision 201461)
>>>> +++ doc/invoke.texi     (working copy)
>>>> @@ -6234,6 +6234,9 @@ Enable dumps from all loop optimizations.
>>>>  Enable dumps from all inlining optimizations.
>>>>  @item vec
>>>>  Enable dumps from all vectorization optimizations.
>>>> +@item optall
>>>> +Enable dumps from all optimizations. This is a superset of
>>>> +the optimization groups listed above.
>>>>  @end table
>>>>
>>>>  For example,
>>>> Index: profile.c
>>>> ===================================================================
>>>> --- profile.c   (revision 201461)
>>>> +++ profile.c   (working copy)
>>>> @@ -432,8 +432,8 @@ read_profile_edge_counts (gcov_type *exec_counts)
>>>>                     if (flag_profile_correction)
>>>>                       {
>>>>                         static bool informed = 0;
>>>> -                       if (!informed)
>>>> -                         inform (input_location,
>>>> +                       if (dump_enabled_p () && !informed)
>>>> +                         dump_printf_loc (MSG_NOTE, input_location,
>>>>                                   "corrupted profile info: edge count exceeds maximal count");
>>>>                         informed = 1;
>>>>                       }
>>>> @@ -692,10 +692,11 @@ compute_branch_probabilities (unsigned cfg_checksu
>>>>         {
>>>>           /* Inconsistency detected. Make it flow-consistent. */
>>>>           static int informed = 0;
>>>> -         if (informed == 0)
>>>> +         if (dump_enabled_p () && informed == 0)
>>>>             {
>>>>               informed = 1;
>>>> -             inform (input_location, "correcting inconsistent profile data");
>>>> +             dump_printf_loc (MSG_NOTE, input_location,
>>>> +                              "correcting inconsistent profile data");
>>>>             }
>>>>           correct_negative_edge_counts ();
>>>>           /* Set bb counts to the sum of the outgoing edge counts */
>>>> Index: passes.c
>>>> ===================================================================
>>>> --- passes.c    (revision 201461)
>>>> +++ passes.c    (working copy)
>>>> @@ -524,6 +524,11 @@ pass_manager::register_one_dump_file (struct opt_p
>>>>    flag_name = concat (prefix, name, num, NULL);
>>>>    glob_name = concat (prefix, name, NULL);
>>>>    optgroup_flags |= pass->optinfo_flags;
>>>> +  /* For any passes that do not have an optgroup set, and which are not
>>>> +     IPA passes setup above, set the optgroup to OPTGROUP_OTHER so that
>>>> +     any dump messages are emitted properly under -fopt-info(-optall).  */
>>>> +  if (optgroup_flags == OPTGROUP_NONE)
>>>> +    optgroup_flags = OPTGROUP_OTHER;
>>>>    id = dump_register (dot_name, flag_name, glob_name, flags, optgroup_flags);
>>>>    set_pass_for_id (id, pass);
>>>>    full_name = concat (prefix, pass->name, num, NULL);
>>>> Index: value-prof.c
>>>> ===================================================================
>>>> --- value-prof.c        (revision 201461)
>>>> +++ value-prof.c        (working copy)
>>>> @@ -585,9 +585,11 @@ check_counter (gimple stmt, const char * name,
>>>>                : DECL_SOURCE_LOCATION (current_function_decl);
>>>>        if (flag_profile_correction)
>>>>          {
>>>> -         inform (locus, "correcting inconsistent value profile: "
>>>> -                 "%s profiler overall count (%d) does not match BB count "
>>>> -                  "(%d)", name, (int)*all, (int)bb_count);
>>>> +          if (dump_enabled_p ())
>>>> +            dump_printf_loc (MSG_MISSED_OPTIMIZATION, locus,
>>>> +                             "correcting inconsistent value profile: %s "
>>>> +                             "profiler overall count (%d) does not match BB "
>>>> +                             "count (%d)", name, (int)*all, (int)bb_count);
>>>>           *all = bb_count;
>>>>           if (*count > *all)
>>>>              *count = *all;
>>>> @@ -1209,9 +1211,11 @@ find_func_by_funcdef_no (int func_id)
>>>>    int max_id = get_last_funcdef_no ();
>>>>    if (func_id >= max_id || cgraph_node_map[func_id] == NULL)
>>>>      {
>>>> -      if (flag_profile_correction)
>>>> -        inform (DECL_SOURCE_LOCATION (current_function_decl),
>>>> -                "Inconsistent profile: indirect call target (%d) does not exist", func_id);
>>>> +      if (flag_profile_correction && dump_enabled_p ())
>>>> +        dump_printf_loc (MSG_MISSED_OPTIMIZATION,
>>>> +                         DECL_SOURCE_LOCATION (current_function_decl),
>>>> +                         "Inconsistent profile: indirect call target (%d) "
>>>> +                         "does not exist", func_id);
>>>>        else
>>>>          error ("Inconsistent profile: indirect call target (%d) does not exist", func_id);
>>>>
>>>> @@ -1235,8 +1239,10 @@ check_ic_target (gimple call_stmt, struct cgraph_n
>>>>       return true;
>>>>
>>>>     locus =  gimple_location (call_stmt);
>>>> -   inform (locus, "Skipping target %s with mismatching types for icall ",
>>>> -           cgraph_node_name (target));
>>>> +   if (dump_enabled_p ())
>>>> +     dump_printf_loc (MSG_MISSED_OPTIMIZATION, locus,
>>>> +                      "Skipping target %s with mismatching types for icall ",
>>>> +                      cgraph_node_name (target));
>>>>     return false;
>>>>  }
>>>>
>>>> Index: coverage.c
>>>> ===================================================================
>>>> --- coverage.c  (revision 201461)
>>>> +++ coverage.c  (working copy)
>>>> @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.  If not see
>>>>  #include "langhooks.h"
>>>>  #include "hash-table.h"
>>>>  #include "tree-iterator.h"
>>>> +#include "tree-pass.h"
>>>>  #include "cgraph.h"
>>>>  #include "dumpfile.h"
>>>>  #include "diagnostic-core.h"
>>>> @@ -341,11 +342,13 @@ get_coverage_counts (unsigned counter, unsigned ex
>>>>      {
>>>>        static int warned = 0;
>>>>
>>>> -      if (!warned++)
>>>> -       inform (input_location, (flag_guess_branch_prob
>>>> -                ? "file %s not found, execution counts estimated"
>>>> -                : "file %s not found, execution counts assumed to be zero"),
>>>> -               da_file_name);
>>>> +      if (!warned++ && dump_enabled_p ())
>>>> +       dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>>>> +                         (flag_guess_branch_prob
>>>> +                          ? "file %s not found, execution counts estimated"
>>>> +                          : "file %s not found, execution counts assumed to "
>>>> +                            "be zero"),
>>>> +                         da_file_name);
>>>>        return NULL;
>>>>      }
>>>>
>>>> @@ -369,21 +372,25 @@ get_coverage_counts (unsigned counter, unsigned ex
>>>>         warning_at (input_location, OPT_Wcoverage_mismatch,
>>>>                     "the control flow of function %qE does not match "
>>>>                     "its profile data (counter %qs)", id, ctr_names[counter]);
>>>> -      if (warning_printed)
>>>> +      if (warning_printed && dump_enabled_p ())
>>>>         {
>>>> -        inform (input_location, "use -Wno-error=coverage-mismatch to tolerate "
>>>> -                "the mismatch but performance may drop if the function is hot");
>>>> +          dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>>>> +                           "use -Wno-error=coverage-mismatch to tolerate "
>>>> +                           "the mismatch but performance may drop if the "
>>>> +                           "function is hot");
>>>>
>>>>           if (!seen_error ()
>>>>               && !warned++)
>>>>             {
>>>> -             inform (input_location, "coverage mismatch ignored");
>>>> -             inform (input_location, flag_guess_branch_prob
>>>> -                     ? G_("execution counts estimated")
>>>> -                     : G_("execution counts assumed to be zero"));
>>>> +             dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>>>> +                               "coverage mismatch ignored");
>>>> +             dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>>>> +                               flag_guess_branch_prob
>>>> +                               ? G_("execution counts estimated")
>>>> +                               : G_("execution counts assumed to be zero"));
>>>>               if (!flag_guess_branch_prob)
>>>> -               inform (input_location,
>>>> -                       "this can result in poorly optimized code");
>>>> +               dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>>>> +                                 "this can result in poorly optimized code");
>>>>             }
>>>>         }
>>>>
>>>> @@ -1103,6 +1110,11 @@ coverage_init (const char *filename)
>>>>    int len = strlen (filename);
>>>>    int prefix_len = 0;
>>>>
>>>> +  /* Since coverage_init is invoked very early, before the pass
>>>> +     manager, we need to set up the dumping explicitly. This is
>>>> +     similar to the handling in finish_optimization_passes.  */
>>>> +  dump_start (pass_profile.pass.static_pass_number, NULL);
>>>> +
>>>>    if (!profile_data_prefix && !IS_ABSOLUTE_PATH (filename))
>>>>      profile_data_prefix = getpwd ();
>>>>
>>>> @@ -1145,6 +1157,8 @@ coverage_init (const char *filename)
>>>>           gcov_write_unsigned (bbg_file_stamp);
>>>>         }
>>>>      }
>>>> +
>>>> +  dump_finish (pass_profile.pass.static_pass_number);
>>>>  }
>>>>
>>>>  /* Performs file-level cleanup.  Close notes file, generate coverage
>>>> Index: ipa-inline.c
>>>> ===================================================================
>>>> --- ipa-inline.c        (revision 201461)
>>>> +++ ipa-inline.c        (working copy)
>>>> @@ -1322,7 +1322,7 @@ recursive_inlining (struct cgraph_edge *edge,
>>>>            reset_edge_growth_cache (curr);
>>>>         }
>>>>
>>>> -      inline_call (curr, false, new_edges, &overall_size, true);
>>>> +      inline_call (curr, false, new_edges, &overall_size, true, false);
>>>>        lookup_recursive_calls (node, curr->callee, heap);
>>>>        n++;
>>>>      }
>>>> @@ -1612,7 +1612,8 @@ inline_small_functions (void)
>>>>             fprintf (dump_file, " Peeling recursion with depth %i\n", depth);
>>>>
>>>>           gcc_checking_assert (!callee->global.inlined_to);
>>>> -         inline_call (edge, true, &new_indirect_edges, &overall_size, true);
>>>> +         inline_call (edge, true, &new_indirect_edges, &overall_size, true,
>>>> +                       false);
>>>>           if (flag_indirect_inlining)
>>>>             add_new_edges_to_heap (edge_heap, new_indirect_edges);
>>>>
>>>> @@ -1733,7 +1734,7 @@ flatten_function (struct cgraph_node *node, bool e
>>>>                  xstrdup (cgraph_node_name (callee)),
>>>>                  xstrdup (cgraph_node_name (e->caller)));
>>>>        orig_callee = callee;
>>>> -      inline_call (e, true, NULL, NULL, false);
>>>> +      inline_call (e, true, NULL, NULL, false, early);
>>>>        if (e->callee != orig_callee)
>>>>         orig_callee->symbol.aux = (void *) node;
>>>>        flatten_function (e->callee, early);
>>>> @@ -1852,7 +1853,8 @@ ipa_inline (void)
>>>>                                    inline_summary (node->callers->caller)->size);
>>>>                         }
>>>>
>>>> -                     inline_call (node->callers, true, NULL, NULL, true);
>>>> +                     inline_call (node->callers, true, NULL, NULL, true,
>>>> +                                   false);
>>>>                       if (dump_file)
>>>>                         fprintf (dump_file,
>>>>                                  " Inlined into %s which now has %i size\n",
>>>> @@ -1925,7 +1927,7 @@ inline_always_inline_functions (struct cgraph_node
>>>>         fprintf (dump_file, "  Inlining %s into %s (always_inline).\n",
>>>>                  xstrdup (cgraph_node_name (e->callee)),
>>>>                  xstrdup (cgraph_node_name (e->caller)));
>>>> -      inline_call (e, true, NULL, NULL, false);
>>>> +      inline_call (e, true, NULL, NULL, false, true);
>>>>        inlined = true;
>>>>      }
>>>>    if (inlined)
>>>> @@ -1977,7 +1979,7 @@ early_inline_small_functions (struct cgraph_node *
>>>>         fprintf (dump_file, " Inlining %s into %s.\n",
>>>>                  xstrdup (cgraph_node_name (callee)),
>>>>                  xstrdup (cgraph_node_name (e->caller)));
>>>> -      inline_call (e, true, NULL, NULL, true);
>>>> +      inline_call (e, true, NULL, NULL, true, true);
>>>>        inlined = true;
>>>>      }
>>>>
>>>> Index: ipa-inline.h
>>>> ===================================================================
>>>> --- ipa-inline.h        (revision 201461)
>>>> +++ ipa-inline.h        (working copy)
>>>> @@ -228,7 +228,8 @@ void free_growth_caches (void);
>>>>  void compute_inline_parameters (struct cgraph_node *, bool);
>>>>
>>>>  /* In ipa-inline-transform.c  */
>>>> -bool inline_call (struct cgraph_edge *, bool, vec<cgraph_edge_p> *, int *, bool);
>>>> +bool inline_call (struct cgraph_edge *, bool, vec<cgraph_edge_p> *, int *,
>>>> +                  bool, bool);
>>>>  unsigned int inline_transform (struct cgraph_node *);
>>>>  void clone_inlined_nodes (struct cgraph_edge *e, bool, bool, int *);
>>>>
>>>> Index: testsuite/gcc.dg/pr40209.c
>>>> ===================================================================
>>>> --- testsuite/gcc.dg/pr40209.c  (revision 201461)
>>>> +++ testsuite/gcc.dg/pr40209.c  (working copy)
>>>> @@ -1,5 +1,5 @@
>>>>  /* { dg-do compile } */
>>>> -/* { dg-options "-O2 -fprofile-use" } */
>>>> +/* { dg-options "-O2 -fprofile-use -fopt-info" } */
>>>>
>>>>  void process(const char *s);
>>>>
>>>> Index: testsuite/gcc.dg/pr26570.c
>>>> ===================================================================
>>>> --- testsuite/gcc.dg/pr26570.c  (revision 201461)
>>>> +++ testsuite/gcc.dg/pr26570.c  (working copy)
>>>> @@ -1,5 +1,5 @@
>>>>  /* { dg-do compile } */
>>>> -/* { dg-options "-O2 -fprofile-generate -fprofile-use" } */
>>>> +/* { dg-options "-O2 -fprofile-generate -fprofile-use -fopt-info" } */
>>>>
>>>>  unsigned test (unsigned a, unsigned b)
>>>>  {
>>>> Index: testsuite/gcc.dg/pr32773.c
>>>> ===================================================================
>>>> --- testsuite/gcc.dg/pr32773.c  (revision 201461)
>>>> +++ testsuite/gcc.dg/pr32773.c  (working copy)
>>>> @@ -1,6 +1,6 @@
>>>>  /* { dg-do compile } */
>>>> -/* { dg-options "-O -fprofile-use" } */
>>>> -/* { dg-options "-O -m4 -fprofile-use" { target sh-*-* } } */
>>>> +/* { dg-options "-O -fprofile-use -fopt-info" } */
>>>> +/* { dg-options "-O -m4 -fprofile-use -fopt-info" { target sh-*-* } } */
>>>>
>>>>  void foo (int *p)
>>>>  {
>>>> Index: testsuite/g++.dg/tree-ssa/dom-invalid.C
>>>> ===================================================================
>>>> --- testsuite/g++.dg/tree-ssa/dom-invalid.C     (revision 201461)
>>>> +++ testsuite/g++.dg/tree-ssa/dom-invalid.C     (working copy)
>>>> @@ -1,7 +1,7 @@
>>>>  // PR tree-optimization/39557
>>>>  // invalid post-dom info leads to infinite loop
>>>>  // { dg-do run }
>>>> -// { dg-options "-Wall -fno-exceptions -O2 -fprofile-use -fno-rtti" }
>>>> +// { dg-options "-Wall -fno-exceptions -O2 -fprofile-use -fopt-info -fno-rtti" }
>>>>
>>>>  struct C
>>>>  {
>>>> Index: testsuite/gcc.dg/inline-dump.c
>>>> ===================================================================
>>>> --- testsuite/gcc.dg/inline-dump.c      (revision 0)
>>>> +++ testsuite/gcc.dg/inline-dump.c      (revision 0)
>>>> @@ -0,0 +1,11 @@
>>>> +/* Verify that -fopt-info can output correct inline info.  */
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-Wall -fopt-info-inline=stderr -O2 -fno-early-inlining" } */
>>>> +static inline int leaf() {
>>>> +  int i, ret = 0;
>>>> +  for (i = 0; i < 10; i++)
>>>> +    ret += i;
>>>> +  return ret;
>>>> +}
>>>> +static inline int foo(void) { return leaf(); } /* { dg-message "note: leaf .*inlined into bar .*via inline instance foo.*\n" } */
>>>> +int bar(void) { return foo(); }
>>>>>
>>>>> Thanks,
>>>>> Teresa
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Martin
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
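
A side note on cgraph_node_opt_info in the patch quoted above: it builds its
result with calls of the form sprintf (buf, "%s [%i]", buf, ...), so buf is
both the destination and a source argument. The C standard leaves sprintf
undefined when the objects being copied overlap, so this may not behave the
same with every C library. Below is a minimal sketch of one way to avoid the
overlap: measure with snprintf first, then format once into a fresh buffer.
The name format_node_info and its parameters are hypothetical stand-ins, not
part of the patch, and plain malloc is used here where GCC code would use
xmalloc.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for cgraph_node_opt_info: formats
   "NAME [ORDER]" and, when HAVE_COUNT is nonzero, appends " (COUNT)".
   Returns a malloc'ed string the caller must free, or NULL on failure.  */
static char *
format_node_info (const char *name, int order, int have_count,
                  long long count)
{
  int len;
  char *buf;

  if (!name)
    name = "unknown";

  /* First pass: ask snprintf how many bytes the text needs.  */
  if (have_count)
    len = snprintf (NULL, 0, "%s [%d] (%lld)", name, order, count);
  else
    len = snprintf (NULL, 0, "%s [%d]", name, order);
  if (len < 0)
    return NULL;

  buf = (char *) malloc ((size_t) len + 1);
  if (!buf)
    return NULL;

  /* Second pass: write once; the output buffer is never also an input.  */
  if (have_count)
    snprintf (buf, (size_t) len + 1, "%s [%d] (%lld)", name, order, count);
  else
    snprintf (buf, (size_t) len + 1, "%s [%d]", name, order);
  return buf;
}
```

Sizing the buffer from snprintf's return value also removes the need for a
hand-maintained bound such as MAX_INT_LENGTH. The caller owns the returned
string and should free it once the message has been emitted.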
Teresa Johnson Aug. 29, 2013, 1:34 p.m. UTC | #12
On Thu, Aug 29, 2013 at 3:05 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
>>>> Does
>>>> 'make newlines consistent' avoid all the spurious vertical spacing I see with
>>>> -fopt-info?
>>>
>>> Well, it helps get us there. The problem was that before, since
>>> dump_loc was not consistently emitting newlines, the calls had to emit
>>> their own newlines manually in the string to ensure there was a
>>> newline at all. I was thinking that once this is fixed I could go back
>>> and clean up all those calls by removing the newlines in the string. I
>>> could split this part into a separate patch and do both at once.
>>>
>>> However, after thinking about this some more this morning, I am
>>> wondering whether it is better to remove the newline emission
>>> completely from dump_loc and rely on the caller to put the newline in
>>> the string. The reason is that there are 2 high level interfaces to
>>> the new dump infrastructure, dump_printf() and dump_printf_loc(). Only
>>> the latter invokes dump_loc and gets the newline at the start of the
>>> message. The typical usage seems to be to start a message via
>>> dump_printf_loc, and then use dump_printf to emit parts of the message
>>> (thus not requiring a newline), but I think it may lead to problems to
>>> rely on this assumption.
>>>
>>> So if you agree, I will simply remove the newline altogether from
>>> dump_loc, and ensure that all clients of dump_printf/dump_printf_loc
>>> include a newline char as appropriate in the string they pass.
>>
>>
>> As a helper function, dump_loc should not blindly emit new line as it
>> has no context.  I have tried to remove it, and push the newline to
>> higher level helpers -- it mostly works, but the vectorizer verbose
>> messages need serious clean up -- most of them assume that
>> dump_printf_loc does not end with new line, so that the expression
>> dump can follow in the same line (the message texts need clean up too
>> -- I do not like the === === in info messages).
>
> I know, but we should really do that cleanup.

I can work on this and will send a separate patch.
Teresa
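
To make the convention discussed above concrete, here is a small
self-contained sketch, not GCC code. It assumes the proposed behaviour: the
location helper prints only the "file:line:col: note: " prefix, with no
leading newline, and each caller ends its own message with "\n". The names
struct sink, note_printf and note_printf_loc are hypothetical stand-ins for
the dump stream, dump_printf and dump_printf_loc.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory sink standing in for the dump stream.  */
struct sink
{
  char text[256];
  size_t used;
};

/* Append formatted text to S, truncating if the sink is full.  */
static void
sink_vprintf (struct sink *s, const char *fmt, va_list ap)
{
  size_t room = sizeof s->text - s->used;
  int n = vsnprintf (s->text + s->used, room, fmt, ap);
  if (n < 0)
    return;
  s->used += ((size_t) n < room) ? (size_t) n : room - 1;
}

/* Continue the current message; emits no prefix and no newline.  */
static void
note_printf (struct sink *s, const char *fmt, ...)
{
  va_list ap;
  va_start (ap, fmt);
  sink_vprintf (s, fmt, ap);
  va_end (ap);
}

/* Start a message with a "file:line:col: note: " prefix.  No leading
   newline is emitted; ending the line is the caller's job.  */
static void
note_printf_loc (struct sink *s, const char *file, int line, int col,
                 const char *fmt, ...)
{
  va_list ap;
  note_printf (s, "%s:%d:%d: note: ", file, line, col);
  va_start (ap, fmt);
  sink_vprintf (s, fmt, ap);
  va_end (ap);
}
```

With this split, a message started by note_printf_loc can be continued by
note_printf on the same line, and only the final piece carries the newline,
so no blank lines appear between consecutive notes.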
Richard Biener Aug. 29, 2013, 1:41 p.m. UTC | #13
Ok.

Richard.

On Thu, Aug 29, 2013 at 3:18 PM, Teresa Johnson <tejohnson@google.com> wrote:
> On Wed, Aug 28, 2013 at 9:07 AM, Teresa Johnson <tejohnson@google.com> wrote:
>> On Wed, Aug 28, 2013 at 7:09 AM, Teresa Johnson <tejohnson@google.com> wrote:
>>> On Wed, Aug 28, 2013 at 4:01 AM, Richard Biener
>>> <richard.guenther@gmail.com> wrote:
>>>> On Wed, Aug 7, 2013 at 7:23 AM, Teresa Johnson <tejohnson@google.com> wrote:
>>>>> On Tue, Aug 6, 2013 at 9:29 AM, Teresa Johnson <tejohnson@google.com> wrote:
>>>>>> On Tue, Aug 6, 2013 at 9:01 AM, Martin Jambor <mjambor@suse.cz> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On Tue, Aug 06, 2013 at 07:14:42AM -0700, Teresa Johnson wrote:
>>>>>>>> On Tue, Aug 6, 2013 at 5:37 AM, Martin Jambor <mjambor@suse.cz> wrote:
>>>>>>>> > On Mon, Aug 05, 2013 at 10:37:00PM -0700, Teresa Johnson wrote:
>>>>>>>> >> This patch ports messages to the new dump framework,
>>>>>>>> >
>>>>>>>> > It would be great if this new framework was documented somewhere.  I lost
>>>>>>>> > track of what was agreed it would be and from the uses in the
>>>>>>>> > vectorizer I was never quite sure how to utilize it in other passes.
>>>>>>>>
>>>>>>>> Cc'ing Sharad who implemented this - Sharad, is this documented on a
>>>>>>>> wiki or elsewhere?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>>>
>>>>>>>> >
>>>>>>>> > I'd also like to point out two other minor things inline:
>>>>>>>> >
>>>>>>>> > [...]
>>>>>>>> >
>>>>>>>> >> 2013-08-06  Teresa Johnson  <tejohnson@google.com>
>>>>>>>> >>             Dehao Chen  <dehao@google.com>
>>>>>>>> >>
>>>>>>>> >>         * dumpfile.c (dump_loc): Add column number to output, make newlines
>>>>>>>> >>         consistent.
>>>>>>>> >>         * dumpfile.h (OPTGROUP_OTHER): Add and enable under OPTGROUP_ALL.
>>>>>>>> >>         * ipa-inline-transform.c (clone_inlined_nodes):
>>>>>>>> >>         (cgraph_node_opt_info): New function.
>>>>>>>> >>         (cgraph_node_call_chain): Ditto.
>>>>>>>> >>         (dump_inline_decision): Ditto.
>>>>>>>> >>         (inline_call): Invoke dump_inline_decision.
>>>>>>>> >>         * doc/invoke.texi: Document optall -fopt-info flag.
>>>>>>>> >>         * profile.c (read_profile_edge_counts): Use new dump framework.
>>>>>>>> >>         (compute_branch_probabilities): Ditto.
>>>>>>>> >>         * passes.c (pass_manager::register_one_dump_file): Use OPTGROUP_OTHER
>>>>>>>> >>         when pass not in any opt group.
>>>>>>>> >>         * value-prof.c (check_counter): Use new dump framework.
>>>>>>>> >>         (find_func_by_funcdef_no): Ditto.
>>>>>>>> >>         (check_ic_target): Ditto.
>>>>>>>> >>         * coverage.c (get_coverage_counts): Ditto.
>>>>>>>> >>         (coverage_init): Setup new dump framework.
>>>>>>>> >>         * ipa-inline.c (inline_small_functions): Set is_in_ipa_inline.
>>>>>>>> >>         * ipa-inline.h (is_in_ipa_inline): Declare.
>>>>>>>> >>
>>>>>>>> >>         * testsuite/gcc.dg/pr40209.c: Use -fopt-info.
>>>>>>>> >>         * testsuite/gcc.dg/pr26570.c: Ditto.
>>>>>>>> >>         * testsuite/gcc.dg/pr32773.c: Ditto.
>>>>>>>> >>         * testsuite/g++.dg/tree-ssa/dom-invalid.C (struct C): Ditto.
>>>>>>>> >>
>>>>>>>> >
>>>>>>>> > [...]
>>>>>>>> >
>>>>>>>> >> Index: ipa-inline-transform.c
>>>>>>>> >> ===================================================================
>>>>>>>> >> --- ipa-inline-transform.c      (revision 201461)
>>>>>>>> >> +++ ipa-inline-transform.c      (working copy)
>>>>>>>> >> @@ -192,6 +192,108 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d
>>>>>>>> >>  }
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> +#define MAX_INT_LENGTH 20
>>>>>>>> >> +
>>>>>>>> >> +/* Return NODE's name and profile count, if available.  */
>>>>>>>> >> +
>>>>>>>> >> +static const char *
>>>>>>>> >> +cgraph_node_opt_info (struct cgraph_node *node)
>>>>>>>> >> +{
>>>>>>>> >> +  char *buf;
>>>>>>>> >> +  size_t buf_size;
>>>>>>>> >> +  const char *bfd_name = lang_hooks.dwarf_name (node->symbol.decl, 0);
>>>>>>>> >> +
>>>>>>>> >> +  if (!bfd_name)
>>>>>>>> >> +    bfd_name = "unknown";
>>>>>>>> >> +
>>>>>>>> >> +  buf_size = strlen (bfd_name) + 1;
>>>>>>>> >> +  if (profile_info)
>>>>>>>> >> +    buf_size += (MAX_INT_LENGTH + 3);
>>>>>>>> >> +
>>>>>>>> >> +  buf = (char *) xmalloc (buf_size);
>>>>>>>> >> +
>>>>>>>> >> +  strcpy (buf, bfd_name);
>>>>>>>> >> +
>>>>>>>> >> +  if (profile_info)
>>>>>>>> >> +    sprintf (buf, "%s ("HOST_WIDEST_INT_PRINT_DEC")", buf, node->count);
>>>>>>>> >> +  return buf;
>>>>>>>> >> +}
>>>>>>>> >
>>>>>>>> > I'm not sure if output of this function is aimed only at the user or
>>>>>>>> > if it is supposed to be used by gcc developers as well.  If the
>>>>>>>> > latter, an incredibly useful thing is to also dump node->symbol.order
>>>>>>>> > too.  We usually dump it after "/" sign separating it from node name.
>>>>>>>> > It is invaluable when examining decisions in C++ code where you can
>>>>>>>> > have lots of clones of a node (and also because existing dumps print
>>>>>>>> > it, it is easy to combine them).
>>>>>>>>
>>>>>>>> The output is useful for both power users doing performance tuning of
>>>>>>>> their application, and by gcc developers. Adding the id is not so
>>>>>>>> useful for the former, but I agree that it is very useful for compiler
>>>>>>>> developers. In fact, in the google branch version we emit more verbose
>>>>>>>> information (the lipo module id and the funcdef_no) to help uniquely
>>>>>>>> identify the routines and to aid in post-processing by humans and
>>>>>>>> tools. So it is probably useful to add something similar here too. Is
>>>>>>>> the node->symbol.order more or less unique than the funcdef_no? I see
>>>>>>>> that you added a patch a few months ago to print the
>>>>>>>> node->symbol.order in the function header, and it also has the
>>>>>>>> advantage as you note of matching up with existing ipa dumps.
>>>>>>>
>>>>>>> node->symbol.order is unique and if I remember correctly, it is not
>>>>>>> even recycled.  Clones, inline clones, thunks, every symbol table node
>>>>>>> gets its own symbol order so it should be more unique than funcdef_no.
>>>>>>> On the other hand it may be a bit cryptic for users but at the same
>>>>>>> time it is only one number.
>>>>>>
>>>>>> Ok, I am going to go ahead and add this to the output.
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> >
>>>>>>>> > [...]
>>>>>>>> >
>>>>>>>> >> Index: ipa-inline.c
>>>>>>>> >> ===================================================================
>>>>>>>> >> --- ipa-inline.c        (revision 201461)
>>>>>>>> >> +++ ipa-inline.c        (working copy)
>>>>>>>> >> @@ -118,6 +118,9 @@ along with GCC; see the file COPYING3.  If not see
>>>>>>>> >>  static int overall_size;
>>>>>>>> >>  static gcov_type max_count;
>>>>>>>> >>
>>>>>>>> >> +/* Global variable to denote if it is in ipa-inline pass. */
>>>>>>>> >> +bool is_in_ipa_inline = false;
>>>>>>>> >> +
>>>>>>>> >>  /* Return false when inlining edge E would lead to violating
>>>>>>>> >>     limits on function unit growth or stack usage growth.
>>>>>>>> >>
>>>>>>>> >
>>>>>>>> > In this age of removing global variables, are you sure you need this?
>>>>>>>> > The only user of this seems to be a function that is only being called
>>>>>>>> > from inline_call... can that ever happen when not inlining?  If you
>>>>>>>> > plan to use this function also elsewhere, perhaps the callers will
>>>>>>>> > know whether we are inlining or not and can provide this in a
>>>>>>>> > parameter?
>>>>>>>>
>>>>>>>> This is to distinguish early inlining from ipa inlining.
>>>>>>>
>>>>>>> Oh, right, I did not realize that the IPA part was the important bit
>>>>>>> of the name.
>>>>>>>
>>>>>>>> The volume of
>>>>>>>> early inlining messages is too high to be on for the default setting
>>>>>>>> of -fopt-info, and they are usually not as interesting for performance
>>>>>>>> tuning. The dumper will only emit the early inline messages under a
>>>>>>>> more verbose setting (MSG_NOTE):
>>>>>>>>       dump_printf_loc (is_in_ipa_inline ? MSG_OPTIMIZED_LOCATIONS : MSG_NOTE ...
>>>>>>>> The other way I can see to distinguish this would be to check the
>>>>>>>> always_inline_functions_inlined flag on the caller's function. It
>>>>>>>> could also be possible to pass down a flag from the callers of
>>>>>>>> inline_call, but at least one caller (flatten_functions) is shared
>>>>>>>> between early and late inlining, so the flag needs to be passed
>>>>>>>> through that as well. WDYT?
>>>>>>>
>>>>>>> Did you mean flatten_function?  It already has a bool "early"
>>>>>>> parameter.  But I can see that being able to quickly figure out
>>>>>>> whether we are in early inliner or ipa inliner without much hassle is
>>>>>>> useful enough to justify a global variable a month ago, however I
>>>>>>> suppose we should not be introducing them now and so you'd have to put
>>>>>>> such stuff into... well, you'd probably have to put into the universe
>>>>>>> object somewhere because it is basically shared between two passes.
>>>>>>> Another option, even though somewhat hackish, would be to look at
>>>>>>> current_pass and see which pass it is.  I don't know, do what is
>>>>>>> easier or what you like more, just be aware of the problem.
>>>>>>
>>>>>> After thinking about this some more, I think passing down an early
>>>>>> flag from callers is the cleanest way to go.
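The early/late distinction discussed above can be sketched as a small selection function. This is only an illustration: the flag values and the names `sketch_msg_kind`, `inline_msg_kind` and `should_emit` are hypothetical, and the real MSG_* constants in dumpfile.h may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values only; the real MSG_* constants are defined
   in dumpfile.h and are not reproduced here.  */
enum sketch_msg_kind
{
  SKETCH_MSG_OPTIMIZED_LOCATIONS = 1 << 0,
  SKETCH_MSG_NOTE = 1 << 1
};

/* Pick the message kind from an EARLY flag passed down by the caller,
   so no global variable is needed.  */
static int
inline_msg_kind (bool early)
{
  return early ? SKETCH_MSG_NOTE : SKETCH_MSG_OPTIMIZED_LOCATIONS;
}

/* A message is emitted only if its kind is in the enabled mask.  */
static bool
should_emit (int kind, int enabled_mask)
{
  assert (kind != 0);
  return (kind & enabled_mask) != 0;
}
```

With a mask that enables only the optimized-locations kind, late inlining messages pass the filter and early ones do not; enabling the note kind as well lets both through.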
>>>>>>
>>>>>> I'll fix these and post a new patch later today.
>>>>>
>>>>> New patch below that removes this global variable, and also outputs
>>>>> the node->symbol.order (in square brackets after the function name so
>>>>> as to not clutter it). Inline messages with profile data look like:
>>>>>
>>>>> test.c:8:3: note: foobar [0] (99999000) inlined into foo [2] (1000)
>>>>> with call count 99999000 (via inline instance bar [3] (99999000))
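One way to build the "name [order] (count)" piece of the message above is sketched here. It is not the code from the patch: `format_node_info` and its parameters are hypothetical. It appends at an offset with snprintf, because passing the destination buffer to sprintf as one of its own source arguments is undefined behavior in C.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "NAME [ORDER]" and, when HAS_PROFILE is
   nonzero, " (COUNT)" into BUF of SIZE bytes.  Returns the number of
   characters written, or -1 if BUF is too small.  */
static int
format_node_info (char *buf, size_t size, const char *name, int order,
                  int has_profile, long long count)
{
  assert (buf != NULL && name != NULL);

  int len = snprintf (buf, size, "%s [%d]", name, order);
  if (len < 0 || (size_t) len >= size)
    return -1;

  if (has_profile)
    {
      /* Append at the current end rather than re-reading BUF as input.  */
      int more = snprintf (buf + len, size - (size_t) len, " (%lld)", count);
      if (more < 0 || (size_t) more >= size - (size_t) len)
        return -1;
      len += more;
    }
  return len;
}
```

Called with the name "foobar", order 0 and count 99999000, this yields the first part of the sample message; without profile data the parenthesized count is simply omitted.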
>>>>
>>>> Ick.  This looks both redundant and cluttered.  This is supposed to be
>>>> understandable by GCC users, not only GCC developers.
>>>
>>> The main part that is only useful/understandable to gcc developers is
>>> the node->symbol.order in square brackets, requested by Martin. One
>>> possibility is that I could put that part under a param, disabled by
>>> default. We have something similar on the google branches that emits
>>> LIPO module info in the message, enabled via a param.
>>>
>>> I'd argue that the other information (the profile counts, emitted only
>>> when using -fprofile-use, and the inline call chains) are useful if
>>> you want to understand whether and how critical inlines are occurring.
>>> I think this is the type of information that users focused on
>>> optimizations, as well as gcc developers, want when they use
>>> -fopt-info. Otherwise it is difficult to make sense of the inline
>>> information.
>>>
>>>>
>>>>> (without FDO the counts in parentheses and the call count would not be
>>>>> included).
>>>>>
>>>>> Ok for trunk?
>>>>
>>>> Let's split this patch.
>>>
>>> Ok.
>>>
>>>>
>>>>> Thanks,
>>>>> Teresa
>>>>>
>>>>> 2013-08-06  Teresa Johnson  <tejohnson@google.com>
>>>>>             Dehao Chen  <dehao@google.com>
>>>>>
>>>>>         * dumpfile.c (dump_loc): Output column number, make newlines consistent.
>>>>
>>>> I don't like column numbers, they are of not much use generally.
>>>
>>> I added these here to get consistency with other messages (notes
>>> emitted via inform(), warnings, errors). Plus the dg-message testing
>>> was failing for the test cases that parse this output, since it
>>> expects the column to exist.
>>
>> The above change (output column number) and the changes in the
>> testsuite go with the change you have approved below (due to moving
>> some profile messages to the new framework). Ok to commit these along
>> with that approved portion?
>
> Richard is this part ok since it goes with the part you approved below?
>
> Thanks,
> Teresa
>
>>
>> Thanks,
>> Teresa
>>
>>>
>>>> Does
>>>> 'make newlines consistent' avoid all the spurious vertical spacing I see with
>>>> -fopt-info?
>>>
>>> Well, it helps get us there. The problem was that before, since
>>> dump_loc was not consistently emitting newlines, the calls had to emit
>>> their own newlines manually in the string to ensure there was a
>>> newline at all. I was thinking that once this is fixed I could go back
>>> and clean up all those calls by removing the newlines in the string. I
>>> could split this part into a separate patch and do both at once.
>>>
>>> However, after thinking about this some more this morning, I am
>>> wondering whether it is better to remove the newline emission
>>> completely from dump_loc and rely on the caller to put the newline in
>>> the string. The reason is that there are 2 high level interfaces to
>>> the new dump infrastructure, dump_printf() and dump_printf_loc(). Only
>>> the latter invokes dump_loc and gets the newline at the start of the
>>> message. The typical usage seems to be to start a message via
>>> dump_printf_loc, and then use dump_printf to emit parts of the message
>>> (thus not requiring a newline), but I think it may lead to problems to
>>> rely on this assumption.
>>>
>>> So if you agree, I will simply remove the newline altogether from
>>> dump_loc, and ensure that all clients of dump_printf/dump_printf_loc
>>> include a newline char as appropriate in the string they pass.
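The convention proposed above (the location prefix emits no newline, and the caller's string carries its own) can be sketched as follows. The functions `format_loc_prefix` and `format_message` are hypothetical stand-ins that write into a buffer; they are not the dumpfile.c interfaces.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the location prefix: writes
   "FILE:LINE:COLUMN: note: " with no leading or trailing newline.
   Returns the length written, or -1 if BUF is too small.  */
static int
format_loc_prefix (char *buf, size_t size, const char *file, int line,
                   int column)
{
  assert (buf != NULL && file != NULL);
  int len = snprintf (buf, size, "%s:%d:%d: note: ", file, line, column);
  return (len < 0 || (size_t) len >= size) ? -1 : len;
}

/* Prefix followed by TEXT exactly as given; the caller is responsible
   for ending TEXT with a newline when the message is complete.  */
static int
format_message (char *buf, size_t size, const char *file, int line,
                int column, const char *text)
{
  int len = format_loc_prefix (buf, size, file, line, column);
  if (len < 0)
    return -1;
  int more = snprintf (buf + len, size - (size_t) len, "%s", text);
  if (more < 0 || (size_t) more >= size - (size_t) len)
    return -1;
  return len + more;
}
```

Under this convention a message split across several calls only needs the newline in its last piece, and nothing is printed before the location.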
>>>
>>>>
>>>>>         * dumpfile.h (OPTGROUP_OTHER): Add and enable under OPTGROUP_ALL.
>>>>
>>>> Good change - please split this out (with the related changes) and commit it.
>>>
>>> Ok, thanks. Will do.
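The OPTGROUP_OTHER fallback can be sketched as below. The LOOP, INLINE, VEC and OTHER values are taken from the dumpfile.h hunk in this patch; the NONE and IPA values and the helper `effective_optgroup` are assumptions made for illustration.

```c
#include <assert.h>

#define OPTGROUP_NONE    (0)        /* Assumed value.  */
#define OPTGROUP_IPA     (1 << 1)   /* Assumed value.  */
#define OPTGROUP_LOOP    (1 << 2)
#define OPTGROUP_INLINE  (1 << 3)
#define OPTGROUP_VEC     (1 << 4)
#define OPTGROUP_OTHER   (1 << 5)
#define OPTGROUP_ALL     (OPTGROUP_IPA | OPTGROUP_LOOP | OPTGROUP_INLINE \
                          | OPTGROUP_VEC | OPTGROUP_OTHER)

/* Hypothetical helper: a pass that belongs to no group falls back to
   OPTGROUP_OTHER, so that a request for all groups still selects it.  */
static int
effective_optgroup (int optgroup_flags)
{
  assert (optgroup_flags >= 0);
  if (optgroup_flags == OPTGROUP_NONE)
    return OPTGROUP_OTHER;
  return optgroup_flags;
}
```

Without the fallback, a pass with no group would have an empty mask and could never match OPTGROUP_ALL.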
>>>
>>>>
>>>>>         * ipa-inline-transform.c (cgraph_node_opt_info): New function.
>>>>>         (cgraph_node_call_chain): Ditto.
>>>>>         (dump_inline_decision): Ditto.
>>>>>         (inline_call): Invoke dump_inline_decision, new parameter.
>>>>
>>>> The inline stuff should be split and re-sent, it's non-obvious to me (extra
>>>> function parameters are not documented for example).  I'd rather have
>>>> inline_and_report_call () for example instead of an extra bool parameter.
>>>> But let's iterate over this once it's split out.
>>>
>>> Ok, I will send this separately. I guess we could have a separate
>>> interface inline_and_report_call that is a wrapper around inline_call
>>> and simply invokes the dumper. Note that flatten_function will need to
>>> conditionally call one of the two interfaces based on the value of its
>>> bool early parameter though.
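A rough sketch of the wrapper idea, using stub types rather than the real cgraph structures. Every name here (`sketch_edge`, `sketch_inline_call`, `sketch_inline_and_report_call`, `sketch_flatten_step`) is hypothetical, and it assumes the early path uses the non-reporting interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Stub edge type; the real code operates on struct cgraph_edge.  */
struct sketch_edge
{
  const char *callee;
  const char *caller;
  bool inlined;
};

/* Counts reports so the behavior can be checked without parsing output.  */
static int reports_emitted;

/* Stand-in for the plain inlining interface.  */
static bool
sketch_inline_call (struct sketch_edge *e)
{
  assert (e != NULL);
  e->inlined = true;
  return true;
}

/* Wrapper: report the decision, then defer to the plain interface.  */
static bool
sketch_inline_and_report_call (struct sketch_edge *e)
{
  fprintf (stderr, "%s inlined into %s\n", e->callee, e->caller);
  reports_emitted++;
  return sketch_inline_call (e);
}

/* A caller shared by early and late inlining picks the interface
   from its EARLY parameter.  */
static bool
sketch_flatten_step (struct sketch_edge *e, bool early)
{
  return early ? sketch_inline_call (e) : sketch_inline_and_report_call (e);
}
```

The trade-off is that shared callers still branch on the flag, but the inlining routine itself needs no extra parameter.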
>>>
>>>>
>>>>>         * doc/invoke.texi: Document optall -fopt-info flag.
>>>>>         * profile.c (read_profile_edge_counts): Use new dump framework.
>>>>>         (compute_branch_probabilities): Ditto.
>>>>>         * passes.c (pass_manager::register_one_dump_file): Use OPTGROUP_OTHER
>>>>>         when pass not in any opt group.
>>>>>         * value-prof.c (check_counter): Use new dump framework.
>>>>>         (find_func_by_funcdef_no): Ditto.
>>>>>         (check_ic_target): Ditto.
>>>>>         * coverage.c (get_coverage_counts): Ditto.
>>>>>         (coverage_init): Setup new dump framework.
>>>>
>>>> These pieces look good to me.
>>>>
>>>>>         * ipa-inline.c (recursive_inlining): New inline_call parameter.
>>>>>         (inline_small_functions): Ditto.
>>>>>         (flatten_function): Ditto.
>>>>>         (ipa_inline): Ditto.
>>>>>         (inline_always_inline_functions): Ditto.
>>>>>         (early_inline_small_functions): Ditto.
>>>>>         * ipa-inline.h: Ditto.
>>>>>
>>>>>         * testsuite/gcc.dg/pr40209.c: Use -fopt-info.
>>>>>         * testsuite/gcc.dg/pr26570.c: Ditto.
>>>>>         * testsuite/gcc.dg/pr32773.c: Ditto.
>>>>>         * testsuite/g++.dg/tree-ssa/dom-invalid.C: Ditto.
>>>>
>>>> Why?  Just remove the stray dg- annotations that deal with the unwanted output?
>>>
>>> Because there are dg-message annotations that want to confirm this output.
>>>
>>> Teresa
>>>
>>>>
>>>> Thanks,
>>>> Richard.
>>>>
>>>>>         * testsuite/gcc.dg/inline-dump.c: New test.
>>>>>
>>>>> Index: dumpfile.c
>>>>> ===================================================================
>>>>> --- dumpfile.c  (revision 201461)
>>>>> +++ dumpfile.c  (working copy)
>>>>> @@ -257,16 +257,18 @@ dump_open_alternate_stream (struct dump_file_info
>>>>>  void
>>>>>  dump_loc (int dump_kind, FILE *dfile, source_location loc)
>>>>>  {
>>>>> -  /* Currently vectorization passes print location information.  */
>>>>>    if (dump_kind)
>>>>>      {
>>>>> +      /* Ensure dump message starts on a new line.  */
>>>>> +      fprintf (dfile, "\n");
>>>>>        if (LOCATION_LOCUS (loc) > BUILTINS_LOCATION)
>>>>> -        fprintf (dfile, "\n%s:%d: note: ", LOCATION_FILE (loc),
>>>>> -                 LOCATION_LINE (loc));
>>>>> +        fprintf (dfile, "%s:%d:%d: note: ", LOCATION_FILE (loc),
>>>>> +                 LOCATION_LINE (loc), LOCATION_COLUMN (loc));
>>>>>        else if (current_function_decl)
>>>>> -        fprintf (dfile, "\n%s:%d: note: ",
>>>>> +        fprintf (dfile, "%s:%d:%d: note: ",
>>>>>                   DECL_SOURCE_FILE (current_function_decl),
>>>>> -                 DECL_SOURCE_LINE (current_function_decl));
>>>>> +                 DECL_SOURCE_LINE (current_function_decl),
>>>>> +                 DECL_SOURCE_COLUMN (current_function_decl));
>>>>>      }
>>>>>  }
>>>>>
>>>>> Index: dumpfile.h
>>>>> ===================================================================
>>>>> --- dumpfile.h  (revision 201461)
>>>>> +++ dumpfile.h  (working copy)
>>>>> @@ -97,8 +97,9 @@ enum tree_dump_index
>>>>>  #define OPTGROUP_LOOP        (1 << 2)   /* Loop optimization passes */
>>>>>  #define OPTGROUP_INLINE      (1 << 3)   /* Inlining passes */
>>>>>  #define OPTGROUP_VEC         (1 << 4)   /* Vectorization passes */
>>>>> +#define OPTGROUP_OTHER       (1 << 5)   /* All other passes */
>>>>>  #define OPTGROUP_ALL        (OPTGROUP_IPA | OPTGROUP_LOOP | OPTGROUP_INLINE \
>>>>> -                              | OPTGROUP_VEC)
>>>>> +                              | OPTGROUP_VEC | OPTGROUP_OTHER)
>>>>>
>>>>>  /* Define a tree dump switch.  */
>>>>>  struct dump_file_info
>>>>> Index: ipa-inline-transform.c
>>>>> ===================================================================
>>>>> --- ipa-inline-transform.c      (revision 201461)
>>>>> +++ ipa-inline-transform.c      (working copy)
>>>>> @@ -192,6 +192,111 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d
>>>>>  }
>>>>>
>>>>>
>>>>> +#define MAX_INT_LENGTH 20
>>>>> +
>>>>> +/* Return NODE's name and profile count, if available.  */
>>>>> +
>>>>> +static const char *
>>>>> +cgraph_node_opt_info (struct cgraph_node *node)
>>>>> +{
>>>>> +  char *buf;
>>>>> +  size_t buf_size;
>>>>> +  const char *bfd_name = lang_hooks.dwarf_name (node->symbol.decl, 0);
>>>>> +
>>>>> +  if (!bfd_name)
>>>>> +    bfd_name = "unknown";
>>>>> +
>>>>> +  buf_size = strlen (bfd_name) + 1;
>>>>> +  if (profile_info)
>>>>> +    buf_size += (MAX_INT_LENGTH + 3);
>>>>> +  buf_size += MAX_INT_LENGTH;
>>>>> +
>>>>> +  buf = (char *) xmalloc (buf_size);
>>>>> +
>>>>> +  strcpy (buf, bfd_name);
>>>>> +  //sprintf (buf, "%s/%i", buf, node->symbol.order);
>>>>> +  sprintf (buf, "%s [%i]", buf, node->symbol.order);
>>>>> +
>>>>> +  if (profile_info)
>>>>> +    sprintf (buf, "%s ("HOST_WIDEST_INT_PRINT_DEC")", buf, node->count);
>>>>> +  return buf;
>>>>> +}
>>>>> +
>>>>> +
>>>>> +/* Return CALLER's inlined call chain. Save the cgraph_node of the ultimate
>>>>> +   function that the caller is inlined to in FINAL_CALLER.  */
>>>>> +
>>>>> +static const char *
>>>>> +cgraph_node_call_chain (struct cgraph_node *caller,
>>>>> +                       struct cgraph_node **final_caller)
>>>>> +{
>>>>> +  struct cgraph_node *node;
>>>>> +  const char *via_str = " (via inline instance";
>>>>> +  size_t current_string_len = strlen (via_str) + 1;
>>>>> +  size_t buf_size = current_string_len;
>>>>> +  char *buf = (char *) xmalloc (buf_size);
>>>>> +
>>>>> +  buf[0] = 0;
>>>>> +  gcc_assert (caller->global.inlined_to != NULL);
>>>>> +  strcat (buf, via_str);
>>>>> +  for (node = caller; node->global.inlined_to != NULL;
>>>>> +       node = node->callers->caller)
>>>>> +    {
>>>>> +      const char *name = cgraph_node_opt_info (node);
>>>>> +      current_string_len += (strlen (name) + 1);
>>>>> +      if (current_string_len >= buf_size)
>>>>> +       {
>>>>> +         buf_size = current_string_len * 2;
>>>>> +         buf = (char *) xrealloc (buf, buf_size);
>>>>> +       }
>>>>> +      strcat (buf, " ");
>>>>> +      strcat (buf, name);
>>>>> +    }
>>>>> +  strcat (buf, ")");
>>>>> +  *final_caller = node;
>>>>> +  return buf;
>>>>> +}
>>>>> +
>>>>> +
>>>>> +/* Dump the inline decision of EDGE.  */
>>>>> +
>>>>> +static void
>>>>> +dump_inline_decision (struct cgraph_edge *edge, bool early)
>>>>> +{
>>>>> +  location_t locus;
>>>>> +  const char *inline_chain_text;
>>>>> +  const char *call_count_text;
>>>>> +  struct cgraph_node *final_caller = edge->caller;
>>>>> +
>>>>> +  if (final_caller->global.inlined_to != NULL)
>>>>> +    inline_chain_text = cgraph_node_call_chain (final_caller, &final_caller);
>>>>> +  else
>>>>> +    inline_chain_text = "";
>>>>> +
>>>>> +  if (edge->count > 0)
>>>>> +    {
>>>>> +      const char *call_count_str = " with call count ";
>>>>> +      char *buf = (char *) xmalloc (strlen (call_count_str) + MAX_INT_LENGTH);
>>>>> +      sprintf (buf, "%s"HOST_WIDEST_INT_PRINT_DEC, call_count_str,
>>>>> +              edge->count);
>>>>> +      call_count_text = buf;
>>>>> +    }
>>>>> +  else
>>>>> +    {
>>>>> +      call_count_text = "";
>>>>> +    }
>>>>> +
>>>>> +  locus = gimple_location (edge->call_stmt);
>>>>> +  dump_printf_loc (early ? MSG_NOTE : MSG_OPTIMIZED_LOCATIONS,
>>>>> +                   locus,
>>>>> +                   "%s inlined into %s%s%s\n",
>>>>> +                   cgraph_node_opt_info (edge->callee),
>>>>> +                   cgraph_node_opt_info (final_caller),
>>>>> +                   call_count_text,
>>>>> +                   inline_chain_text);
>>>>> +}
>>>>> +
>>>>> +
>>>>>  /* Mark edge E as inlined and update callgraph accordingly.  UPDATE_ORIGINAL
>>>>>     specify whether profile of original function should be updated.  If any new
>>>>>     indirect edges are discovered in the process, add them to NEW_EDGES, unless
>>>>> @@ -205,7 +310,8 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d
>>>>>  bool
>>>>>  inline_call (struct cgraph_edge *e, bool update_original,
>>>>>              vec<cgraph_edge_p> *new_edges,
>>>>> -            int *overall_size, bool update_overall_summary)
>>>>> +            int *overall_size, bool update_overall_summary,
>>>>> +             bool early)
>>>>>  {
>>>>>    int old_size = 0, new_size = 0;
>>>>>    struct cgraph_node *to = NULL;
>>>>> @@ -218,6 +324,9 @@ inline_call (struct cgraph_edge *e, bool update_or
>>>>>    bool predicated = inline_edge_summary (e)->predicate != NULL;
>>>>>  #endif
>>>>>
>>>>> +  if (dump_enabled_p ())
>>>>> +    dump_inline_decision (e, early);
>>>>> +
>>>>>    /* Don't inline inlined edges.  */
>>>>>    gcc_assert (e->inline_failed);
>>>>>    /* Don't even think of inlining inline clone.  */
>>>>> Index: doc/invoke.texi
>>>>> ===================================================================
>>>>> --- doc/invoke.texi     (revision 201461)
>>>>> +++ doc/invoke.texi     (working copy)
>>>>> @@ -6234,6 +6234,9 @@ Enable dumps from all loop optimizations.
>>>>>  Enable dumps from all inlining optimizations.
>>>>>  @item vec
>>>>>  Enable dumps from all vectorization optimizations.
>>>>> +@item optall
>>>>> +Enable dumps from all optimizations. This is a superset of
>>>>> +the optimization groups listed above.
>>>>>  @end table
>>>>>
>>>>>  For example,
>>>>> Index: profile.c
>>>>> ===================================================================
>>>>> --- profile.c   (revision 201461)
>>>>> +++ profile.c   (working copy)
>>>>> @@ -432,8 +432,8 @@ read_profile_edge_counts (gcov_type *exec_counts)
>>>>>                     if (flag_profile_correction)
>>>>>                       {
>>>>>                         static bool informed = 0;
>>>>> -                       if (!informed)
>>>>> -                         inform (input_location,
>>>>> +                       if (dump_enabled_p () && !informed)
>>>>> +                         dump_printf_loc (MSG_NOTE, input_location,
>>>>>                                   "corrupted profile info: edge count exceeds maximal count");
>>>>>                         informed = 1;
>>>>>                       }
>>>>> @@ -692,10 +692,11 @@ compute_branch_probabilities (unsigned cfg_checksu
>>>>>         {
>>>>>           /* Inconsistency detected. Make it flow-consistent. */
>>>>>           static int informed = 0;
>>>>> -         if (informed == 0)
>>>>> +         if (dump_enabled_p () && informed == 0)
>>>>>             {
>>>>>               informed = 1;
>>>>> -             inform (input_location, "correcting inconsistent profile data");
>>>>> +             dump_printf_loc (MSG_NOTE, input_location,
>>>>> +                              "correcting inconsistent profile data");
>>>>>             }
>>>>>           correct_negative_edge_counts ();
>>>>>           /* Set bb counts to the sum of the outgoing edge counts */
>>>>> Index: passes.c
>>>>> ===================================================================
>>>>> --- passes.c    (revision 201461)
>>>>> +++ passes.c    (working copy)
>>>>> @@ -524,6 +524,11 @@ pass_manager::register_one_dump_file (struct opt_p
>>>>>    flag_name = concat (prefix, name, num, NULL);
>>>>>    glob_name = concat (prefix, name, NULL);
>>>>>    optgroup_flags |= pass->optinfo_flags;
>>>>> +  /* For any passes that do not have an optgroup set, and which are not
>>>>> +     IPA passes setup above, set the optgroup to OPTGROUP_OTHER so that
>>>>> +     any dump messages are emitted properly under -fopt-info(-optall).  */
>>>>> +  if (optgroup_flags == OPTGROUP_NONE)
>>>>> +    optgroup_flags = OPTGROUP_OTHER;
>>>>>    id = dump_register (dot_name, flag_name, glob_name, flags, optgroup_flags);
>>>>>    set_pass_for_id (id, pass);
>>>>>    full_name = concat (prefix, pass->name, num, NULL);
>>>>> Index: value-prof.c
>>>>> ===================================================================
>>>>> --- value-prof.c        (revision 201461)
>>>>> +++ value-prof.c        (working copy)
>>>>> @@ -585,9 +585,11 @@ check_counter (gimple stmt, const char * name,
>>>>>                : DECL_SOURCE_LOCATION (current_function_decl);
>>>>>        if (flag_profile_correction)
>>>>>          {
>>>>> -         inform (locus, "correcting inconsistent value profile: "
>>>>> -                 "%s profiler overall count (%d) does not match BB count "
>>>>> -                  "(%d)", name, (int)*all, (int)bb_count);
>>>>> +          if (dump_enabled_p ())
>>>>> +            dump_printf_loc (MSG_MISSED_OPTIMIZATION, locus,
>>>>> +                             "correcting inconsistent value profile: %s "
>>>>> +                             "profiler overall count (%d) does not match BB "
>>>>> +                             "count (%d)", name, (int)*all, (int)bb_count);
>>>>>           *all = bb_count;
>>>>>           if (*count > *all)
>>>>>              *count = *all;
>>>>> @@ -1209,9 +1211,11 @@ find_func_by_funcdef_no (int func_id)
>>>>>    int max_id = get_last_funcdef_no ();
>>>>>    if (func_id >= max_id || cgraph_node_map[func_id] == NULL)
>>>>>      {
>>>>> -      if (flag_profile_correction)
>>>>> -        inform (DECL_SOURCE_LOCATION (current_function_decl),
>>>>> -                "Inconsistent profile: indirect call target (%d) does not exist", func_id);
>>>>> +      if (flag_profile_correction && dump_enabled_p ())
>>>>> +        dump_printf_loc (MSG_MISSED_OPTIMIZATION,
>>>>> +                         DECL_SOURCE_LOCATION (current_function_decl),
>>>>> +                         "Inconsistent profile: indirect call target (%d) "
>>>>> +                         "does not exist", func_id);
>>>>>        else
>>>>>          error ("Inconsistent profile: indirect call target (%d) does not exist", func_id);
>>>>>
>>>>> @@ -1235,8 +1239,10 @@ check_ic_target (gimple call_stmt, struct cgraph_n
>>>>>       return true;
>>>>>
>>>>>     locus =  gimple_location (call_stmt);
>>>>> -   inform (locus, "Skipping target %s with mismatching types for icall ",
>>>>> -           cgraph_node_name (target));
>>>>> +   if (dump_enabled_p ())
>>>>> +     dump_printf_loc (MSG_MISSED_OPTIMIZATION, locus,
>>>>> +                      "Skipping target %s with mismatching types for icall ",
>>>>> +                      cgraph_node_name (target));
>>>>>     return false;
>>>>>  }
>>>>>
>>>>> Index: coverage.c
>>>>> ===================================================================
>>>>> --- coverage.c  (revision 201461)
>>>>> +++ coverage.c  (working copy)
>>>>> @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.  If not see
>>>>>  #include "langhooks.h"
>>>>>  #include "hash-table.h"
>>>>>  #include "tree-iterator.h"
>>>>> +#include "tree-pass.h"
>>>>>  #include "cgraph.h"
>>>>>  #include "dumpfile.h"
>>>>>  #include "diagnostic-core.h"
>>>>> @@ -341,11 +342,13 @@ get_coverage_counts (unsigned counter, unsigned ex
>>>>>      {
>>>>>        static int warned = 0;
>>>>>
>>>>> -      if (!warned++)
>>>>> -       inform (input_location, (flag_guess_branch_prob
>>>>> -                ? "file %s not found, execution counts estimated"
>>>>> -                : "file %s not found, execution counts assumed to be zero"),
>>>>> -               da_file_name);
>>>>> +      if (!warned++ && dump_enabled_p ())
>>>>> +       dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>>>>> +                         (flag_guess_branch_prob
>>>>> +                          ? "file %s not found, execution counts estimated"
>>>>> +                          : "file %s not found, execution counts assumed to "
>>>>> +                            "be zero"),
>>>>> +                         da_file_name);
>>>>>        return NULL;
>>>>>      }
>>>>>
>>>>> @@ -369,21 +372,25 @@ get_coverage_counts (unsigned counter, unsigned ex
>>>>>         warning_at (input_location, OPT_Wcoverage_mismatch,
>>>>>                     "the control flow of function %qE does not match "
>>>>>                     "its profile data (counter %qs)", id, ctr_names[counter]);
>>>>> -      if (warning_printed)
>>>>> +      if (warning_printed && dump_enabled_p ())
>>>>>         {
>>>>> -        inform (input_location, "use -Wno-error=coverage-mismatch to tolerate "
>>>>> -                "the mismatch but performance may drop if the function is hot");
>>>>> +          dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>>>>> +                           "use -Wno-error=coverage-mismatch to tolerate "
>>>>> +                           "the mismatch but performance may drop if the "
>>>>> +                           "function is hot");
>>>>>
>>>>>           if (!seen_error ()
>>>>>               && !warned++)
>>>>>             {
>>>>> -             inform (input_location, "coverage mismatch ignored");
>>>>> -             inform (input_location, flag_guess_branch_prob
>>>>> -                     ? G_("execution counts estimated")
>>>>> -                     : G_("execution counts assumed to be zero"));
>>>>> +             dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>>>>> +                               "coverage mismatch ignored");
>>>>> +             dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>>>>> +                               flag_guess_branch_prob
>>>>> +                               ? G_("execution counts estimated")
>>>>> +                               : G_("execution counts assumed to be zero"));
>>>>>               if (!flag_guess_branch_prob)
>>>>> -               inform (input_location,
>>>>> -                       "this can result in poorly optimized code");
>>>>> +               dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
>>>>> +                                 "this can result in poorly optimized code");
>>>>>             }
>>>>>         }
>>>>>
>>>>> @@ -1103,6 +1110,11 @@ coverage_init (const char *filename)
>>>>>    int len = strlen (filename);
>>>>>    int prefix_len = 0;
>>>>>
>>>>> +  /* Since coverage_init is invoked very early, before the pass
>>>>> +     manager, we need to set up the dumping explicitly. This is
>>>>> +     similar to the handling in finish_optimization_passes.  */
>>>>> +  dump_start (pass_profile.pass.static_pass_number, NULL);
>>>>> +
>>>>>    if (!profile_data_prefix && !IS_ABSOLUTE_PATH (filename))
>>>>>      profile_data_prefix = getpwd ();
>>>>>
>>>>> @@ -1145,6 +1157,8 @@ coverage_init (const char *filename)
>>>>>           gcov_write_unsigned (bbg_file_stamp);
>>>>>         }
>>>>>      }
>>>>> +
>>>>> +  dump_finish (pass_profile.pass.static_pass_number);
>>>>>  }
>>>>>
>>>>>  /* Performs file-level cleanup.  Close notes file, generate coverage
>>>>> Index: ipa-inline.c
>>>>> ===================================================================
>>>>> --- ipa-inline.c        (revision 201461)
>>>>> +++ ipa-inline.c        (working copy)
>>>>> @@ -1322,7 +1322,7 @@ recursive_inlining (struct cgraph_edge *edge,
>>>>>            reset_edge_growth_cache (curr);
>>>>>         }
>>>>>
>>>>> -      inline_call (curr, false, new_edges, &overall_size, true);
>>>>> +      inline_call (curr, false, new_edges, &overall_size, true, false);
>>>>>        lookup_recursive_calls (node, curr->callee, heap);
>>>>>        n++;
>>>>>      }
>>>>> @@ -1612,7 +1612,8 @@ inline_small_functions (void)
>>>>>             fprintf (dump_file, " Peeling recursion with depth %i\n", depth);
>>>>>
>>>>>           gcc_checking_assert (!callee->global.inlined_to);
>>>>> -         inline_call (edge, true, &new_indirect_edges, &overall_size, true);
>>>>> +         inline_call (edge, true, &new_indirect_edges, &overall_size, true,
>>>>> +                       false);
>>>>>           if (flag_indirect_inlining)
>>>>>             add_new_edges_to_heap (edge_heap, new_indirect_edges);
>>>>>
>>>>> @@ -1733,7 +1734,7 @@ flatten_function (struct cgraph_node *node, bool e
>>>>>                  xstrdup (cgraph_node_name (callee)),
>>>>>                  xstrdup (cgraph_node_name (e->caller)));
>>>>>        orig_callee = callee;
>>>>> -      inline_call (e, true, NULL, NULL, false);
>>>>> +      inline_call (e, true, NULL, NULL, false, early);
>>>>>        if (e->callee != orig_callee)
>>>>>         orig_callee->symbol.aux = (void *) node;
>>>>>        flatten_function (e->callee, early);
>>>>> @@ -1852,7 +1853,8 @@ ipa_inline (void)
>>>>>                                    inline_summary (node->callers->caller)->size);
>>>>>                         }
>>>>>
>>>>> -                     inline_call (node->callers, true, NULL, NULL, true);
>>>>> +                     inline_call (node->callers, true, NULL, NULL, true,
>>>>> +                                   false);
>>>>>                       if (dump_file)
>>>>>                         fprintf (dump_file,
>>>>>                                  " Inlined into %s which now has %i size\n",
>>>>> @@ -1925,7 +1927,7 @@ inline_always_inline_functions (struct cgraph_node
>>>>>         fprintf (dump_file, "  Inlining %s into %s (always_inline).\n",
>>>>>                  xstrdup (cgraph_node_name (e->callee)),
>>>>>                  xstrdup (cgraph_node_name (e->caller)));
>>>>> -      inline_call (e, true, NULL, NULL, false);
>>>>> +      inline_call (e, true, NULL, NULL, false, true);
>>>>>        inlined = true;
>>>>>      }
>>>>>    if (inlined)
>>>>> @@ -1977,7 +1979,7 @@ early_inline_small_functions (struct cgraph_node *
>>>>>         fprintf (dump_file, " Inlining %s into %s.\n",
>>>>>                  xstrdup (cgraph_node_name (callee)),
>>>>>                  xstrdup (cgraph_node_name (e->caller)));
>>>>> -      inline_call (e, true, NULL, NULL, true);
>>>>> +      inline_call (e, true, NULL, NULL, true, true);
>>>>>        inlined = true;
>>>>>      }
>>>>>
>>>>> Index: ipa-inline.h
>>>>> ===================================================================
>>>>> --- ipa-inline.h        (revision 201461)
>>>>> +++ ipa-inline.h        (working copy)
>>>>> @@ -228,7 +228,8 @@ void free_growth_caches (void);
>>>>>  void compute_inline_parameters (struct cgraph_node *, bool);
>>>>>
>>>>>  /* In ipa-inline-transform.c  */
>>>>> -bool inline_call (struct cgraph_edge *, bool, vec<cgraph_edge_p> *, int *, bool);
>>>>> +bool inline_call (struct cgraph_edge *, bool, vec<cgraph_edge_p> *, int *,
>>>>> +                  bool, bool);
>>>>>  unsigned int inline_transform (struct cgraph_node *);
>>>>>  void clone_inlined_nodes (struct cgraph_edge *e, bool, bool, int *);
>>>>>
>>>>> Index: testsuite/gcc.dg/pr40209.c
>>>>> ===================================================================
>>>>> --- testsuite/gcc.dg/pr40209.c  (revision 201461)
>>>>> +++ testsuite/gcc.dg/pr40209.c  (working copy)
>>>>> @@ -1,5 +1,5 @@
>>>>>  /* { dg-do compile } */
>>>>> -/* { dg-options "-O2 -fprofile-use" } */
>>>>> +/* { dg-options "-O2 -fprofile-use -fopt-info" } */
>>>>>
>>>>>  void process(const char *s);
>>>>>
>>>>> Index: testsuite/gcc.dg/pr26570.c
>>>>> ===================================================================
>>>>> --- testsuite/gcc.dg/pr26570.c  (revision 201461)
>>>>> +++ testsuite/gcc.dg/pr26570.c  (working copy)
>>>>> @@ -1,5 +1,5 @@
>>>>>  /* { dg-do compile } */
>>>>> -/* { dg-options "-O2 -fprofile-generate -fprofile-use" } */
>>>>> +/* { dg-options "-O2 -fprofile-generate -fprofile-use -fopt-info" } */
>>>>>
>>>>>  unsigned test (unsigned a, unsigned b)
>>>>>  {
>>>>> Index: testsuite/gcc.dg/pr32773.c
>>>>> ===================================================================
>>>>> --- testsuite/gcc.dg/pr32773.c  (revision 201461)
>>>>> +++ testsuite/gcc.dg/pr32773.c  (working copy)
>>>>> @@ -1,6 +1,6 @@
>>>>>  /* { dg-do compile } */
>>>>> -/* { dg-options "-O -fprofile-use" } */
>>>>> -/* { dg-options "-O -m4 -fprofile-use" { target sh-*-* } } */
>>>>> +/* { dg-options "-O -fprofile-use -fopt-info" } */
>>>>> +/* { dg-options "-O -m4 -fprofile-use -fopt-info" { target sh-*-* } } */
>>>>>
>>>>>  void foo (int *p)
>>>>>  {
>>>>> Index: testsuite/g++.dg/tree-ssa/dom-invalid.C
>>>>> ===================================================================
>>>>> --- testsuite/g++.dg/tree-ssa/dom-invalid.C     (revision 201461)
>>>>> +++ testsuite/g++.dg/tree-ssa/dom-invalid.C     (working copy)
>>>>> @@ -1,7 +1,7 @@
>>>>>  // PR tree-optimization/39557
>>>>>  // invalid post-dom info leads to infinite loop
>>>>>  // { dg-do run }
>>>>> -// { dg-options "-Wall -fno-exceptions -O2 -fprofile-use -fno-rtti" }
>>>>> +// { dg-options "-Wall -fno-exceptions -O2 -fprofile-use -fopt-info -fno-rtti" }
>>>>>
>>>>>  struct C
>>>>>  {
>>>>> Index: testsuite/gcc.dg/inline-dump.c
>>>>> ===================================================================
>>>>> --- testsuite/gcc.dg/inline-dump.c      (revision 0)
>>>>> +++ testsuite/gcc.dg/inline-dump.c      (revision 0)
>>>>> @@ -0,0 +1,11 @@
>>>>> +/* Verify that -fopt-info can output correct inline info.  */
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-Wall -fopt-info-inline=stderr -O2 -fno-early-inlining" } */
>>>>> +static inline int leaf() {
>>>>> +  int i, ret = 0;
>>>>> +  for (i = 0; i < 10; i++)
>>>>> +    ret += i;
>>>>> +  return ret;
>>>>> +}
>>>>> +static inline int foo(void) { return leaf(); } /* { dg-message "note: leaf .*inlined into bar .*via inline instance foo.*\n" } */
>>>>> +int bar(void) { return foo(); }
>>>>>>
>>>>>> Thanks,
>>>>>> Teresa
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Martin
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
Teresa Johnson Aug. 29, 2013, 3:15 p.m. UTC | #14
On Thu, Aug 29, 2013 at 3:04 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
>>>> New patch below that removes this global variable, and also outputs
>>>> the node->symbol.order (in square brackets after the function name so
>>>> as to not clutter it). Inline messages with profile data look like:
>>>>
>>>> test.c:8:3: note: foobar [0] (99999000) inlined into foo [2] (1000)
>>>> with call count 99999000 (via inline instance bar [3] (99999000))
>>>
>>> Ick.  This looks both redundant and cluttered.  This is supposed to be
>>> understandable by GCC users, not only GCC developers.
>>
>> The main part that is only useful/understandable to gcc developers is
>> the node->symbol.order in square brackets, requested by Martin. One
>> possibility is that I could put that part under a param, disabled by
>> default. We have something similar on the google branches that emits
>> LIPO module info in the message, enabled via a param.
>
> But we have _dump files_ for that.  That's the developer-consumed
> form of opt-info.  -fopt-info is purely user sugar and for usual translation
> units it shouldn't exceed a single terminal full of output.

But as a developer I don't want to have to parse lots of dump files
for a summary of the major optimizations performed (e.g. inlining,
unrolling) for an application, unless I am diving into the reasons for
why or why not one of those optimizations occurred in a particular
location. I really do want a summary emitted to stderr so that it is
easily searchable/summarizable for the app as a whole.

For example, some of the apps I am interested in have thousands of
input files, and trying to collect and parse dump files for each and
every one is overwhelming (it probably would be even if my input files
numbered in the hundreds). What has been very useful is having these
high level summary messages of inlines and unrolls emitted to stderr
by -fopt-info. Then it is easy to search and sort by hotness to get a
feel for things like what inlines are missing when moving to a new
compiler, or compiling a new version of the source, for example. Then
you know which files to focus on and collect dump files for.
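
As a rough illustration of the "sort by hotness" step (not part of the
patch), the sketch below pulls the call count out of one inline message
and orders messages hottest first. It assumes the message format
proposed earlier in this thread ("... with call count N ..."), which is
still under discussion, and that each message sits on a single line of
stderr. The names inline_call_count and hotter_first are made up for
this example.

```c
#include <stdlib.h>
#include <string.h>

/* Return the call count carried by one -fopt-info inline message, or 0
   if the line has none.  Assumes the "with call count N" wording
   proposed in this thread; the final format may differ.  */
unsigned long long
inline_call_count (const char *line)
{
  static const char key[] = "with call count ";
  const char *p = strstr (line, key);
  if (p == NULL)
    return 0;
  /* Parse the decimal number that follows the key.  */
  return strtoull (p + sizeof key - 1, NULL, 10);
}

/* qsort comparator over an array of message strings: hottest first.  */
int
hotter_first (const void *a, const void *b)
{
  unsigned long long ca = inline_call_count (*(const char *const *) a);
  unsigned long long cb = inline_call_count (*(const char *const *) b);
  return (ca < cb) - (ca > cb);
}
```

With the messages collected into an array of strings,
qsort (msgs, n, sizeof msgs[0], hotter_first) puts the hottest inlines
at the front, and messages without a count sort last.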

>
>> I'd argue that the other information (the profile counts, emitted only
>> when using -fprofile-use, and the inline call chains) are useful if
>> you want to understand whether and how critical inlines are occurring.
>> I think this is the type of information that users focused on
>> optimizations, as well as gcc developers, want when they use
>> -fopt-info. Otherwise it is difficult to make sense of the inline
>> information.
>
> Well, I doubt that inline information is interesting to users unless we are
> able to aggressively filter it to what users are interested in.  Which IMHO
> isn't possible - users are interested in "I have not inlined this even though
> inlining would severely improve performance" which would indicate a bug
> in the heuristics we can reliably detect and thus it wouldn't be there.

I have interacted with users who are aware of optimizations such as
inlining and unrolling and want to look at that information to
diagnose performance differences when refactoring code or using a new
compiler version. I also think inlining (especially cross-module) is
one example of an optimization that is still being tuned, and user
reports of performance issues related to that have been useful.

I really think that the two groups of people who will find -fopt-info
useful are gcc developers and savvy performance-hungry users. For the
former group the additional info is extremely useful. For the latter
group some of the extra information may not be required (although a
call count is useful for those using profile feedback), but IMO is not
unreasonable.

Teresa
Andreas Schwab Aug. 30, 2013, 7:17 a.m. UTC | #15
Teresa Johnson <tejohnson@google.com> writes:

> Index: testsuite/gcc.dg/inline-dump.c
> ===================================================================
> --- testsuite/gcc.dg/inline-dump.c      (revision 0)
> +++ testsuite/gcc.dg/inline-dump.c      (revision 0)
> @@ -0,0 +1,11 @@
> +/* Verify that -fopt-info can output correct inline info.  */
> +/* { dg-do compile } */
> +/* { dg-options "-Wall -fopt-info-inline=stderr -O2 -fno-early-inlining" } */
> +static inline int leaf() {
> +  int i, ret = 0;
> +  for (i = 0; i < 10; i++)
> +    ret += i;
> +  return ret;
> +}
> +static inline int foo(void) { return leaf(); } /* { dg-message "note: leaf .*inlined into bar .*via inline instance foo.*\n" } */

I don't see that message on either ia64 or m68k.

Andreas.
Richard Biener Aug. 30, 2013, 8:30 a.m. UTC | #16
On Thu, Aug 29, 2013 at 5:15 PM, Teresa Johnson <tejohnson@google.com> wrote:
> On Thu, Aug 29, 2013 at 3:04 AM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>>>>> New patch below that removes this global variable, and also outputs
>>>>> the node->symbol.order (in square brackets after the function name so
>>>>> as to not clutter it). Inline messages with profile data look like:
>>>>>
>>>>> test.c:8:3: note: foobar [0] (99999000) inlined into foo [2] (1000)
>>>>> with call count 99999000 (via inline instance bar [3] (99999000))
>>>>
>>>> Ick.  This looks both redundant and cluttered.  This is supposed to be
>>>> understandable by GCC users, not only GCC developers.
>>>
>>> The main part that is only useful/understandable to gcc developers is
>>> the node->symbol.order in square brackets, requested by Martin. One
>>> possibility is that I could put that part under a param, disabled by
>>> default. We have something similar on the google branches that emits
>>> LIPO module info in the message, enabled via a param.
>>
>> But we have _dump files_ for that.  That's the developer-consumed
>> form of opt-info.  -fopt-info is purely user sugar and for usual translation
>> units it shouldn't exceed a single terminal full of output.
>
> But as a developer I don't want to have to parse lots of dump files
> for a summary of the major optimizations performed (e.g. inlining,
> unrolling) for an application, unless I am diving into the reasons for
> why or why not one of those optimizations occurred in a particular
> location. I really do want a summary emitted to stderr so that it is
> easily searchable/summarizable for the app as a whole.
>
> For example, some of the apps I am interested in have thousands of
> input files, and trying to collect and parse dump files for each and
> every one is overwhelming (it probably would be even if my input files
> numbered in the hundreds). What has been very useful is having these
> high level summary messages of inlines and unrolls emitted to stderr
> by -fopt-info. Then it is easy to search and sort by hotness to get a
> feel for things like what inlines are missing when moving to a new
> compiler, or compiling a new version of the source, for example. Then
> you know which files to focus on and collect dump files for.

I thought we can direct dump files to stderr now?  So, just use
-fdump-tree-all=stderr

and grep its contents.

>>
>>> I'd argue that the other information (the profile counts, emitted only
>>> when using -fprofile-use, and the inline call chains) are useful if
>>> you want to understand whether and how critical inlines are occurring.
>>> I think this is the type of information that users focused on
>>> optimizations, as well as gcc developers, want when they use
>>> -fopt-info. Otherwise it is difficult to make sense of the inline
>>> information.
>>
>> Well, I doubt that inline information is interesting to users unless we are
>> able to aggressively filter it to what users are interested in.  Which IMHO
>> isn't possible - users are interested in "I have not inlined this even though
>> inlining would severely improve performance" which would indicate a bug
>> in the heuristics we can reliably detect and thus it wouldn't be there.
>
> I have interacted with users who are aware of optimizations such as
> inlining and unrolling and want to look at that information to
> diagnose performance differences when refactoring code or using a new
> compiler version. I also think inlining (especially cross-module) is
> one example of an optimization that is still being tuned, and user
> reports of performance issues related to that have been useful.
>
> I really think that the two groups of people who will find -fopt-info
> useful are gcc developers and savvy performance-hungry users. For the
> former group the additional info is extremely useful. For the latter
> group some of the extra information may not be required (although a
> call count is useful for those using profile feedback), but IMO is not
> unreasonable.

Well, your proposed output already wrecks my 80x24 terminal due to overly
long lines.

In the end we may end up with a verbosity level for each subset of opt-info
messages.  Ick.

Richard.

Teresa Johnson Aug. 30, 2013, 1:17 p.m. UTC | #17
Sorry, I should not have committed that new test along with this
portion of the patch. Removed as of r202106.
Teresa

On Fri, Aug 30, 2013 at 12:17 AM, Andreas Schwab <schwab@linux-m68k.org> wrote:
> Teresa Johnson <tejohnson@google.com> writes:
>
>> Index: testsuite/gcc.dg/inline-dump.c
>> ===================================================================
>> --- testsuite/gcc.dg/inline-dump.c      (revision 0)
>> +++ testsuite/gcc.dg/inline-dump.c      (revision 0)
>> @@ -0,0 +1,11 @@
>> +/* Verify that -fopt-info can output correct inline info.  */
>> +/* { dg-do compile } */
>> +/* { dg-options "-Wall -fopt-info-inline=stderr -O2 -fno-early-inlining" } */
>> +static inline int leaf() {
>> +  int i, ret = 0;
>> +  for (i = 0; i < 10; i++)
>> +    ret += i;
>> +  return ret;
>> +}
>> +static inline int foo(void) { return leaf(); } /* { dg-message "note: leaf .*inlined into bar .*via inline instance foo.*\n" } */
>
> I don't see that message, neither on ia64 nor m68k.
>
> Andreas.
>
> --
> Andreas Schwab, schwab@linux-m68k.org
> GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
> "And now for something completely different."
Xinliang David Li Aug. 30, 2013, 4:27 p.m. UTC | #18
Except that in this form, the dump will be extremely large and not
suitable for very large applications. Besides, we might also want to
use the same machinery (dump_printf_loc etc) for dump file dumping.
The current behavior of using '-details' to turn on opt-info-all
messages for dump files is not desirable.  How about the following:

1) add a new dump_kind modifier so that when that modifier is
specified, the messages won't go to the alt_dumpfile (controlled by
-fopt-info), but only to the primary dump file. With this, the inline
messages can be dumped via:

   dump_printf_loc (OPT_OPTIMIZED_LOCATIONS | OPT_DUMP_FILE_ONLY, .....)


2) add more flags in -fdump- support:

   -fdump-ipa-inline-opt   --> turn on opt-info messages only
   -fdump-ipa-inline-optall --> turn on opt-info-all messages
   -fdump-tree-pre-ir --> turn on GIMPLE dump only
   -fdump-tree-pre-details --> turn on everything (ir, optall, trace)

With this, developers can really just use


-fdump-ipa-inline-opt=stderr for inline messages.

thanks,

David

Teresa Johnson Aug. 30, 2013, 7:51 p.m. UTC | #19
On Fri, Aug 30, 2013 at 9:27 AM, Xinliang David Li <davidxl@google.com> wrote:
> Except that in this form, the dump will be extremely large and not
> suitable for very large applications.

Yes. I did some measurements for both a fairly large source file that
is heavily optimized with LIPO and for a simple toy example that has
some inlining. For the large source file, the output from
-fdump-ipa-inline=stderr was almost 100x the line count of the
-fopt-info output. For the toy source file it was 43x. The size of the
-details output was 250x and 100x, respectively. Which is untenable
for a large app.

The issue I am having here is that I want a more verbose message, not
a more voluminous set of messages. Using either -fopt-info-all or
-fdump-ipa-inline to provoke the more verbose inline message will give
me a much greater volume of output.

One compromise could be to emit the more verbose inliner message under
a param (and a more concise "foo inlined into bar" by default with
-fopt-info). Or we could do some variant of what David talks about
below.

> Besides, we might also want to
> use the same machinery (dump_printf_loc etc) for dump file dumping.
> The current behavior of using '-details' to turn on opt-info-all
> messages for dump files is not desirable.

Interestingly, this doesn't even work. When I do
-fdump-ipa-inline-details=stderr (with my patch containing the inliner
messages) I am not getting those inliner messages emitted to stderr.
Even though in dumpfile.c "details" is set to (TDF_DETAILS |
MSG_OPTIMIZED_LOCATIONS | MSG_MISSED_OPTIMIZATION | MSG_NOTE). I'm not
sure why, but will need to debug this.

> How about the following:
>
> 1) add a new dump_kind modifier so that when that modifier is
> specified, the messages won't go to the alt_dumpfile (controlled by
> -fopt-info), but only to the primary dump file. With this, the inline
> messages can be dumped via:
>
>    dump_printf_loc (OPT_OPTIMIZED_LOCATIONS | OPT_DUMP_FILE_ONLY, .....)

(you mean (MSG_OPTIMIZED_LOCATIONS | OPT_DUMP_FILE_ONLY) )

Typically OR-ing together flags like this indicates dump under any of
those conditions. But we could implement special handling for
OPT_DUMP_FILE_ONLY, which in the above case would mean dump only to
the primary dump file, and only under the other conditions specified
in the flag (here under "-optimized")
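
To make that special handling concrete, here is a minimal sketch (not
the dumpfile.c implementation) of the routing decision. The SK_* flag
values, SK_DUMP_FILE_ONLY, and both function names are hypothetical
stand-ins chosen for this example; the real MSG_* values live in
dumpfile.h and may differ.

```c
#include <stdbool.h>

/* Hypothetical flag values for illustration only.  */
enum
{
  SK_MSG_OPTIMIZED_LOCATIONS = 1 << 0,
  SK_MSG_MISSED_OPTIMIZATION = 1 << 1,
  SK_MSG_NOTE = 1 << 2,
  SK_MSG_ALL = (SK_MSG_OPTIMIZED_LOCATIONS
                | SK_MSG_MISSED_OPTIMIZATION
                | SK_MSG_NOTE),
  /* The proposed modifier: a routing bit, not a message kind.  */
  SK_DUMP_FILE_ONLY = 1 << 3
};

/* True if a message of DUMP_KIND goes to the primary (-fdump-*) file
   whose enabled message kinds are PFLAGS.  The modifier bit is masked
   out so that it never counts as a matching condition.  */
bool
sk_to_primary (int dump_kind, int pflags)
{
  return ((dump_kind & SK_MSG_ALL) & pflags) != 0;
}

/* True if the message goes to the alternate (-fopt-info) stream whose
   enabled message kinds are ALT_FLAGS.  The modifier suppresses this
   stream regardless of which kinds match.  */
bool
sk_to_alt (int dump_kind, int alt_flags)
{
  if (dump_kind & SK_DUMP_FILE_ONLY)
    return false;
  return ((dump_kind & SK_MSG_ALL) & alt_flags) != 0;
}
```

The point of treating the modifier as a separate routing bit is that
the remaining bits keep their usual "any of these kinds" meaning.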

>
>
> 2) add more flags in -fdump- support:
>
>    -fdump-ipa-inline-opt   --> turn on opt-info messages only
>    -fdump-ipa-inline-optall --> turn on opt-info-all messages

According to the documentation (see the -fdump-tree- documentation on
http://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html#Debugging-Options),
the above are already supposed to be there (-optimized, -missed, -note
and -optall). However, specifying any of these gives a warning like:
   cc1: warning: ignoring unknown option ‘optimized’ in ‘-fdump-ipa-inline’ [enabled by default]
Probably because none is listed in the dump_options[] array in dumpfile.c.

However, I don't think there is currently a way to use -fdump- options
and *only* get one of these, as much of the current dump output is
emitted whenever there is a dump_file defined. Until everything is
migrated to the new framework it may be difficult to get this to work.

>    -fdump-tree-pre-ir --> turn on GIMPLE dump only
>    -fdump-tree-pre-details --> turn on everything (ir, optall, trace)
>
> With this, developers can really just use
>
>
> -fdump-ipa-inline-opt=stderr for inline messages.

Yes, if we can figure out a good way to get this to work (i.e. only
emit the optimized messages and not the rest of the dump messages).
And unfortunately to get them all you need to specify
"-fdump-ipa-all-optimized -fdump-tree-all-optimized
-fdump-rtl-all-optimized" instead of just -fopt-info. Unless we can
add -fdump-all-all-optimized.

Teresa

Xinliang David Li Aug. 30, 2013, 8:30 p.m. UTC | #20
On Fri, Aug 30, 2013 at 12:51 PM, Teresa Johnson <tejohnson@google.com> wrote:
> On Fri, Aug 30, 2013 at 9:27 AM, Xinliang David Li <davidxl@google.com> wrote:
>> Except that in this form, the dump will be extremely large and not
>> suitable for very large applications.
>
> Yes. I did some measurements for both a fairly large source file that
> is heavily optimized with LIPO and for a simple toy example that has
> some inlining. For the large source file, the output from
> -fdump-ipa-inline=stderr was almost 100x the line count of the
> -fopt-info output. For the toy source file it was 43x. The size of the
> -details output was 250x and 100x, respectively. Which is untenable
> for a large app.
>
> The issue I am having here is that I want a more verbose message, not
> a more voluminous set of messages. Using either -fopt-info-all or
> -fdump-ipa-inline to provoke the more verbose inline message will give
> me a much greater volume of output.
>
> One compromise could be to emit the more verbose inliner message under
> a param (and a more concise "foo inlined into bar" by default with
> -fopt-info). Or we could do some variant of what David talks about
> below.

something like --param=verbose-opt-info=1


>
>> Besides, we might also want to
>> use the same machinery (dump_printf_loc etc) for dump file dumping.
>> The current behavior of using '-details' to turn on opt-info-all
>> messages for dump files is not desirable.
>
> Interestingly, this doesn't even work. When I do
> -fdump-ipa-inline-details=stderr (with my patch containing the inliner
> messages) I am not getting those inliner messages emitted to stderr.
> Even though in dumpfile.c "details" is set to (TDF_DETAILS |
> MSG_OPTIMIZED_LOCATIONS | MSG_MISSED_OPTIMIZATION | MSG_NOTE). I'm not
> sure why, but will need to debug this.

It works for vectorizer pass.

>
>> How about the following:
>>
>> 1) add a new dump_kind modifier so that when that modifier is
>> specified, the messages won't go to the alt_dumpfile (controlled by
>> -fopt-info), but only to the primary dump file. With this, the inline
>> messages can be dumped via:
>>
>>    dump_printf_loc (OPT_OPTIMIZED_LOCATIONS | OPT_DUMP_FILE_ONLY, .....)
>
> (you mean (MSG_OPTIMIZED_LOCATIONS | OPT_DUMP_FILE_ONLY) )
>

Yes.

> Typically OR-ing together flags like this indicates dump under any of
> those conditions. But we could implement special handling for
> OPT_DUMP_FILE_ONLY, which in the above case would mean dump only to
> the primary dump file, and only under the other conditions specified
> in the flag (here under "-optimized")
>
>>
>>
>> 2) add more flags in -fdump- support:
>>
>>    -fdump-ipa-inline-opt   --> turn on opt-info messages only
>>    -fdump-ipa-inline-optall --> turn on opt-info-all messages
>
> According to the documentation (see the -fdump-tree- documentation on
> http://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html#Debugging-Options),
> the above are already supposed to be there (-optimized, -missed, -note
> and -optall). However, specifying any of these gives a warning like:
>    cc1: warning: ignoring unknown option ‘optimized’ in ‘-fdump-ipa-inline’ [enabled by default]
> Probably because none is listed in the dump_options[] array in dumpfile.c.
>
> However, I don't think there is currently a way to use -fdump- options
> and *only* get one of these, as much of the current dump output is
> emitted whenever there is a dump_file defined. Until everything is
> migrated to the new framework it may be difficult to get this to work.
>
>>    -fdump-tree-pre-ir --> turn on GIMPLE dump only
>>    -fdump-tree-pre-details --> turn on everything (ir, optall, trace)
>>
>> With this, developers can really just use
>>
>>
>> -fdump-ipa-inline-opt=stderr for inline messages.
>
> Yes, if we can figure out a good way to get this to work (i.e. only
> emit the optimized messages and not the rest of the dump messages).
> And unfortunately to get them all you need to specify
> "-fdump-ipa-all-optimized -fdump-tree-all-optimized
> -fdump-rtl-all-optimized" instead of just -fopt-info. Unless we can
> add -fdump-all-all-optimized.

General support requires converting all the old-style
if (dump_file) fprintf (dump_file, ...) instances to:

  if (dump_enabled_p ())
    dump_printf (dump_kind, ...);


However, it might be easier to do this filtering for the IR dump only (in
execute_function_dump): do not dump IR if any of the MSG_xxxx flags is
specified, unless an IR flag (a new flag) is also specified.
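
A minimal sketch of that IR filter, assuming made-up flag values and a
made-up FLT_TDF_IR bit for the proposed "ir" flag (none of these names
exist in dumpfile.h today):

```c
#include <stdbool.h>

/* Hypothetical flag values for illustration only.  */
enum
{
  FLT_MSG_OPTIMIZED_LOCATIONS = 1 << 0,
  FLT_MSG_MISSED_OPTIMIZATION = 1 << 1,
  FLT_MSG_NOTE = 1 << 2,
  FLT_MSG_ALL = (FLT_MSG_OPTIMIZED_LOCATIONS
                 | FLT_MSG_MISSED_OPTIMIZATION
                 | FLT_MSG_NOTE),
  /* Stand-in for the proposed new "ir" dump flag.  */
  FLT_TDF_IR = 1 << 3
};

/* Decide whether the per-function IR should be written to the dump
   file, given the flags the user attached to -fdump-*.  */
bool
flt_should_dump_ir (int flags)
{
  /* An explicit "ir" request always wins.  */
  if (flags & FLT_TDF_IR)
    return true;
  /* Otherwise dump IR only when no message kind was requested, which
     keeps a plain -fdump-tree-foo behaving as it does now.  */
  return (flags & FLT_MSG_ALL) == 0;
}
```

A check like this would sit in front of the IR printing in
execute_function_dump, so something like -fdump-ipa-inline-opt=stderr
would emit only the messages.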

David


>
> Teresa
>
>>
>> thanks,
>>
>> David
>>
>> On Fri, Aug 30, 2013 at 1:30 AM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>> On Thu, Aug 29, 2013 at 5:15 PM, Teresa Johnson <tejohnson@google.com> wrote:
>>>> On Thu, Aug 29, 2013 at 3:04 AM, Richard Biener
>>>> <richard.guenther@gmail.com> wrote:
>>>>>>>> New patch below that removes this global variable, and also outputs
>>>>>>>> the node->symbol.order (in square brackets after the function name so
>>>>>>>> as to not clutter it). Inline messages with profile data look like:
>>>>>>>>
>>>>>>>> test.c:8:3: note: foobar [0] (99999000) inlined into foo [2] (1000)
>>>>>>>> with call count 99999000 (via inline instance bar [3] (99999000))
>>>>>>>
>>>>>>> Ick.  This looks both redundant and cluttered.  This is supposed to be
>>>>>>> understandable by GCC users, not only GCC developers.
>>>>>>
>>>>>> The main part that is only useful/understandable to gcc developers is
>>>>>> the node->symbol.order in square brackets, requested by Martin. One
>>>>>> possibility is that I could put that part under a param, disabled by
>>>>>> default. We have something similar on the google branches that emits
>>>>>> LIPO module info in the message, enabled via a param.
>>>>>
>>>>> But we have _dump files_ for that.  That's the developer-consumed
>>>>> form of opt-info.  -fopt-info is purely user sugar and for usual translation
>>>>> units it shouldn't exceed a single terminal full of output.
>>>>
>>>> But as a developer I don't want to have to parse lots of dump files
>>>> for a summary of the major optimizations performed (e.g. inlining,
>>>> unrolling) for an application, unless I am diving into the reasons for
>>>> why or why not one of those optimizations occurred in a particular
>>>> location. I really do want a summary emitted to stderr so that it is
>>>> easily searchable/summarizable for the app as a whole.
>>>>
>>>> For example, some of the apps I am interested in have thousands of
>>>> input files, and trying to collect and parse dump files for each and
>>>> every one is overwhelming (it probably would be even if my input files
>>>> numbered in the hundreds). What has been very useful is having these
>>>> high level summary messages of inlines and unrolls emitted to stderr
>>>> by -fopt-info. Then it is easy to search and sort by hotness to get a
>>>> feel for things like what inlines are missing when moving to a new
>>>> compiler, or compiling a new version of the source, for example. Then
>>>> you know which files to focus on and collect dump files for.
>>>
>>> I thought we can direct dump files to stderr now?  So, just use
>>> -fdump-tree-all=stderr
>>>
>>> and grep its contents.
>>>
>>>>>
>>>>>> I'd argue that the other information (the profile counts, emitted only
>>>>>> when using -fprofile-use, and the inline call chains) are useful if
>>>>>> you want to understand whether and how critical inlines are occurring.
>>>>>> I think this is the type of information that users focused on
>>>>>> optimizations, as well as gcc developers, want when they use
>>>>>> -fopt-info. Otherwise it is difficult to make sense of the inline
>>>>>> information.
>>>>>
>>>>> Well, I doubt that inline information is interesting to users unless we are
>>>>> able to aggressively filter it to what users are interested in.  Which IMHO
>>>>> isn't possible - users are interested in "I have not inlined this even though
>>>>> inlining would severely improve performance" which would indicate a bug
>>>>> in the heuristics we can reliably detect and thus it wouldn't be there.
>>>>
>>>> I have interacted with users who are aware of optimizations such as
>>>> inlining and unrolling and want to look at that information to
>>>> diagnose performance differences when refactoring code or using a new
>>>> compiler version. I also think inlining (especially cross-module) is
>>>> one example of an optimization that is still being tuned, and user
>>>> reports of performance issues related to that have been useful.
>>>>
>>>> I really think that the two groups of people who will find -fopt-info
>>>> useful are gcc developers and savvy performance-hungry users. For the
>>>> former group the additional info is extremely useful. For the latter
>>>> group some of the extra information may not be required (although a
>>>> call count is useful for those using profile feedback), but IMO is not
>>>> unreasonable.
>>>
>>> well, your proposed output wrecks my 80x24 terminal already due to overly
>>> long lines.
>>>
>>> In the end we may end up with a verbosity level for each sub-set of opt-info
>>> messages.  Ick.
>>>
>>> Richard.
>>>
>>>> Teresa
>>>>
>>>>
>>>> --
>>>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
Teresa Johnson Aug. 30, 2013, 9:23 p.m. UTC | #21
On Fri, Aug 30, 2013 at 1:30 PM, Xinliang David Li <davidxl@google.com> wrote:
> On Fri, Aug 30, 2013 at 12:51 PM, Teresa Johnson <tejohnson@google.com> wrote:
>> On Fri, Aug 30, 2013 at 9:27 AM, Xinliang David Li <davidxl@google.com> wrote:
>>> Except that in this form, the dump will be extremely large and not
>>> suitable for very large applications.
>>
>> Yes. I did some measurements for both a fairly large source file that
>> is heavily optimized with LIPO and for a simple toy example that has
>> some inlining. For the large source file, the output from
>> -fdump-ipa-inline=stderr was almost 100x the line count of the
>> -fopt-info output. For the toy source file it was 43x. The size of the
>> -details output was 250x and 100x, respectively. Which is untenable
>> for a large app.
>>
>> The issue I am having here is that I want a more verbose message, not
>> a more voluminous set of messages. Using either -fopt-info-all or
>> -fdump-ipa-inline to provoke the more verbose inline message will give
>> me a much greater volume of output.
>>
>> One compromise could be to emit the more verbose inliner message under
>> a param (and a more concise "foo inlined into bar" by default with
>> -fopt-info). Or we could do some variant of what David talks about
>> below.
>
> something like --param=verbose-opt-info=1

Yes. Richard, would this be acceptable for now?

i.e. the inliner messages would be like:

-fopt-info:
   "test.c:8:3: note: foobar inlined into foo with call count 99999000"
(the "with call count X" only when there is profile feedback)

-fopt-info --param=verbose-opt-info=1:
   "test.c:8:3: note: foobar/0 (99999000) inlined into foo/2 (1000)
with call count 99999000 (via inline instance bar [3] (99999000))"
(again the call counts only emitted under profile feedback)

>
>
>>
>>> Besides, we might also want to
>>> use the same machinery (dump_printf_loc etc) for dump file dumping.
>>> The current behavior of using '-details' to turn on opt-info-all
>>> messages for dump files is not desirable.
>>
>> Interestingly, this doesn't even work. When I do
>> -fdump-ipa-inline-details=stderr (with my patch containing the inliner
>> messages) I am not getting those inliner messages emitted to stderr.
>> Even though in dumpfile.c "details" is set to (TDF_DETAILS |
>> MSG_OPTIMIZED_LOCATIONS | MSG_MISSED_OPTIMIZATION | MSG_NOTE). I'm not
>> sure why, but will need to debug this.
>
> It works for vectorizer pass.

Ok, let me see what is going on - I just confirmed that it is not
working for the loop unroller messages either.

>
>>
>>> How about the following:
>>>
>>> 1) add a new dump_kind modifier so that when that modifier is
>>> specified, the messages won't go to the alt_dumpfile (controlled by
>>> -fopt-info), but only to primary dump file. With this, the inline
>>> messages can be dumped via:
>>>
>>>    dump_printf_loc (OPT_OPTIMIZED_LOCATIONS | OPT_DUMP_FILE_ONLY, .....)
>>
>> (you mean (MSG_OPTIMIZED_LOCATIONS | OPT_DUMP_FILE_ONLY) )
>>
>
> Yes.
>
>> Typically OR-ing together flags like this indicates dump under any of
>> those conditions. But we could implement special handling for
>> OPT_DUMP_FILE_ONLY, which in the above case would mean dump only to
>> the primary dump file, and only under the other conditions specified
>> in the flag (here under "-optimized")
>>
>>>
>>>
>>> 2) add more flags in -fdump- support:
>>>
>>>    -fdump-ipa-inline-opt   --> turn on opt-info messages only
>>>    -fdump-ipa-inline-optall --> turn on opt-info-all messages
>>
>> According to the documentation (see the -fdump-tree- documentation on
>> http://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html#Debugging-Options),
>> the above are already supposed to be there (-optimized, -missed, -note
>> and -optall). However, specifying any of these gives a warning like:
>>    cc1: warning: ignoring unknown option ‘optimized’ in
>> ‘-fdump-ipa-inline’ [enabled by default]
>> Probably because none is listed in the dump_options[] array in dumpfile.c.
>>
>> However, I don't think there is currently a way to use -fdump- options
>> and *only* get one of these, as much of the current dump output is
>> emitted whenever there is a dump_file defined. Until everything is
>> migrated to the new framework it may be difficult to get this to work.
>>
>>>    -fdump-tree-pre-ir --> turn on GIMPLE dump only
>>>    -fdump-tree-pre-details --> turn on everything (ir, optall, trace)
>>>
>>> With this, developers can really just use
>>>
>>>
>>> -fdump-ipa-inline-opt=stderr for inline messages.
>>
>> Yes, if we can figure out a good way to get this to work (i.e. only
>> emit the optimized messages and not the rest of the dump messages).
>> And unfortunately to get them all you need to specify
>> "-fdump-ipa-all-optimized -fdump-tree-all-optimized
>> -fdump-rtl-all-optimized" instead of just -fopt-info. Unless we can
>> add -fdump-all-all-optimized.
>
> Having general support requires cleanup of all the old style  if
> (dump_file) fprintf (dump_file, ...) instances to be:
>
>   if (dump_enabled_p ())
>     dump_printf (dump_kind ....);

Right. But that is going to be a big longer-term effort - grepping for
dump_file in gcc/*.c gives about 6000 instances.
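
To make the routing idea in this exchange concrete, here is a small
standalone toy model of a dump_kind modifier that keeps a message out of
the -fopt-info stream. It is only a sketch under stated assumptions: the
TOY_* names, struct toy_dump_state and toy_dump_destinations are all
hypothetical, TOY_DUMP_FILE_ONLY stands in for the proposed
OPT_DUMP_FILE_ONLY, and none of this is the dumpfile.c implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message kinds, loosely modelled on the MSG_* flags.  */
enum
{
  TOY_MSG_OPTIMIZED  = 1 << 0,
  TOY_MSG_MISSED     = 1 << 1,
  TOY_MSG_NOTE       = 1 << 2,
  /* Stand-in for the proposed OPT_DUMP_FILE_ONLY modifier.  */
  TOY_DUMP_FILE_ONLY = 1 << 3
};

#define TOY_MSG_ALL (TOY_MSG_OPTIMIZED | TOY_MSG_MISSED | TOY_MSG_NOTE)

/* Possible destinations, returned as a bit mask.  */
enum { DEST_NONE = 0, DEST_PRIMARY = 1 << 0, DEST_ALT = 1 << 1 };

/* Which kinds each stream accepts; 0 means the stream is disabled.  */
struct toy_dump_state
{
  int primary_flags;  /* Primary dump file (-fdump-...).  */
  int alt_flags;      /* Alternate stream (-fopt-info).  */
};

/* Return the streams that a message of kind DUMP_KIND should reach.  */
int
toy_dump_destinations (const struct toy_dump_state *st, int dump_kind)
{
  int kinds = dump_kind & TOY_MSG_ALL;
  int dest = DEST_NONE;

  assert (st != NULL);

  /* The message kind is checked for the primary dump file as well.  */
  if (st->primary_flags & kinds)
    dest |= DEST_PRIMARY;

  /* The modifier suppresses the alternate stream only.  */
  if (!(dump_kind & TOY_DUMP_FILE_ONLY) && (st->alt_flags & kinds))
    dest |= DEST_ALT;

  return dest;
}
```

In this model a TOY_MSG_OPTIMIZED message reaches both streams when both
accept that kind, while the same message tagged TOY_DUMP_FILE_ONLY reaches
only the primary dump. An old-style "if (dump_file) fprintf (dump_file, ...)"
call never passes through a check like this, which is why the filtering
depends on the migration.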

>
>
> However, it might be easier to do this filtering for IR dump only (in
> execute_function_dump) -- do not dump IR if any of the MSG_xxxx is
> specified unless IR flag (a new flag) is also specified.

Unfortunately there are a lot of messages that are not from
execute_function_dump.
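
For reference, the IR-only filter suggested above amounts to a predicate
like the one below. This is a hedged sketch: TOY_DUMP_IR stands in for the
proposed new IR flag, and the other TOY_* names and toy_should_dump_ir are
hypothetical. It only decides whether the IR dump is printed, so messages
emitted elsewhere are unaffected by it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; TOY_DUMP_IR models the proposed new IR flag.  */
enum
{
  TOY_MSG_OPTIMIZED = 1 << 0,
  TOY_MSG_MISSED    = 1 << 1,
  TOY_MSG_NOTE      = 1 << 2,
  TOY_DUMP_IR       = 1 << 3
};

#define TOY_MSG_ALL (TOY_MSG_OPTIMIZED | TOY_MSG_MISSED | TOY_MSG_NOTE)

/* Return true if the IR should be dumped for DUMP_FLAGS: always when no
   message kind was requested, otherwise only when the IR flag is set.  */
bool
toy_should_dump_ir (int dump_flags)
{
  assert (dump_flags >= 0);

  bool any_msg = (dump_flags & TOY_MSG_ALL) != 0;
  bool want_ir = (dump_flags & TOY_DUMP_IR) != 0;

  return !any_msg || want_ir;
}
```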

Thanks,
Teresa

>
> David
>
>
>>
>> Teresa
>>
>>>
>>> thanks,
>>>
>>> David
>>>
>>> On Fri, Aug 30, 2013 at 1:30 AM, Richard Biener
>>> <richard.guenther@gmail.com> wrote:
>>>> On Thu, Aug 29, 2013 at 5:15 PM, Teresa Johnson <tejohnson@google.com> wrote:
>>>>> On Thu, Aug 29, 2013 at 3:04 AM, Richard Biener
>>>>> <richard.guenther@gmail.com> wrote:
>>>>>>>>> New patch below that removes this global variable, and also outputs
>>>>>>>>> the node->symbol.order (in square brackets after the function name so
>>>>>>>>> as to not clutter it). Inline messages with profile data look like:
>>>>>>>>>
>>>>>>>>> test.c:8:3: note: foobar [0] (99999000) inlined into foo [2] (1000)
>>>>>>>>> with call count 99999000 (via inline instance bar [3] (99999000))
>>>>>>>>
>>>>>>>> Ick.  This looks both redundant and cluttered.  This is supposed to be
>>>>>>>> understandable by GCC users, not only GCC developers.
>>>>>>>
>>>>>>> The main part that is only useful/understandable to gcc developers is
>>>>>>> the node->symbol.order in square brackets, requested by Martin. One
>>>>>>> possibility is that I could put that part under a param, disabled by
>>>>>>> default. We have something similar on the google branches that emits
>>>>>>> LIPO module info in the message, enabled via a param.
>>>>>>
>>>>>> But we have _dump files_ for that.  That's the developer-consumed
>>>>>> form of opt-info.  -fopt-info is purely user sugar and for usual translation
>>>>>> units it shouldn't exceed a single terminal full of output.
>>>>>
>>>>> But as a developer I don't want to have to parse lots of dump files
>>>>> for a summary of the major optimizations performed (e.g. inlining,
>>>>> unrolling) for an application, unless I am diving into the reasons for
>>>>> why or why not one of those optimizations occurred in a particular
>>>>> location. I really do want a summary emitted to stderr so that it is
>>>>> easily searchable/summarizable for the app as a whole.
>>>>>
>>>>> For example, some of the apps I am interested in have thousands of
>>>>> input files, and trying to collect and parse dump files for each and
>>>>> every one is overwhelming (it probably would be even if my input files
>>>>> numbered in the hundreds). What has been very useful is having these
>>>>> high level summary messages of inlines and unrolls emitted to stderr
>>>>> by -fopt-info. Then it is easy to search and sort by hotness to get a
>>>>> feel for things like what inlines are missing when moving to a new
>>>>> compiler, or compiling a new version of the source, for example. Then
>>>>> you know which files to focus on and collect dump files for.
>>>>
>>>> I thought we can direct dump files to stderr now?  So, just use
>>>> -fdump-tree-all=stderr
>>>>
>>>> and grep its contents.
>>>>
>>>>>>
>>>>>>> I'd argue that the other information (the profile counts, emitted only
>>>>>>> when using -fprofile-use, and the inline call chains) are useful if
>>>>>>> you want to understand whether and how critical inlines are occurring.
>>>>>>> I think this is the type of information that users focused on
>>>>>>> optimizations, as well as gcc developers, want when they use
>>>>>>> -fopt-info. Otherwise it is difficult to make sense of the inline
>>>>>>> information.
>>>>>>
>>>>>> Well, I doubt that inline information is interesting to users unless we are
>>>>>> able to aggressively filter it to what users are interested in.  Which IMHO
>>>>>> isn't possible - users are interested in "I have not inlined this even though
>>>>>> inlining would severely improve performance" which would indicate a bug
>>>>>> in the heuristics we can reliably detect and thus it wouldn't be there.
>>>>>
>>>>> I have interacted with users who are aware of optimizations such as
>>>>> inlining and unrolling and want to look at that information to
>>>>> diagnose performance differences when refactoring code or using a new
>>>>> compiler version. I also think inlining (especially cross-module) is
>>>>> one example of an optimization that is still being tuned, and user
>>>>> reports of performance issues related to that have been useful.
>>>>>
>>>>> I really think that the two groups of people who will find -fopt-info
>>>>> useful are gcc developers and savvy performance-hungry users. For the
>>>>> former group the additional info is extremely useful. For the latter
>>>>> group some of the extra information may not be required (although a
>>>>> call count is useful for those using profile feedback), but IMO is not
>>>>> unreasonable.
>>>>
>>>> well, your proposed output wrecks my 80x24 terminal already due to overly
>>>> long lines.
>>>>
>>>> In the end we may end up with a verbosity level for each sub-set of opt-info
>>>> messages.  Ick.
>>>>
>>>> Richard.
>>>>
>>>>> Teresa
>>>>>
>>>>>
>>>>> --
>>>>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
Bernhard Reutner-Fischer Aug. 31, 2013, 7:26 a.m. UTC | #22
On 30 August 2013 23:23:16 Teresa Johnson <tejohnson@google.com> wrote:
> On Fri, Aug 30, 2013 at 1:30 PM, Xinliang David Li <davidxl@google.com> wrote:
> > On Fri, Aug 30, 2013 at 12:51 PM, Teresa Johnson <tejohnson@google.com> wrote:
> >> On Fri, Aug 30, 2013 at 9:27 AM, Xinliang David Li <davidxl@google.com> wrote:
> >>> Except that in this form, the dump will be extremely large and not
> >>> suitable for very large applications.
> >>
> >> Yes. I did some measurements for both a fairly large source file that
> >> is heavily optimized with LIPO and for a simple toy example that has
> >> some inlining. For the large source file, the output from
> >> -fdump-ipa-inline=stderr was almost 100x the line count of the
> >> -fopt-info output. For the toy source file it was 43x. The size of the
> >> -details output was 250x and 100x, respectively. Which is untenable
> >> for a large app.
> >>
> >> The issue I am having here is that I want a more verbose message, not
> >> a more voluminous set of messages. Using either -fopt-info-all or
> >> -fdump-ipa-inline to provoke the more verbose inline message will give
> >> me a much greater volume of output.
> >>
> >> One compromise could be to emit the more verbose inliner message under
> >> a param (and a more concise "foo inlined into bar" by default with
> >> -fopt-info). Or we could do some variant of what David talks about
> >> below.
> >
> > something like --param=verbose-opt-info=1
>
> Yes. Richard, would this be acceptable for now?
>
> i.e. the inliner messages would be like:
>
> -fopt-info:
>    "test.c:8:3: note: foobar inlined into foo with call count 99999000"
> (the "with call count X" only when there is profile feedback)
>
> -fopt-info --param=verbose-opt-info=1:
>    "test.c:8:3: note: foobar/0 (99999000) inlined into foo/2 (1000)
> with call count 99999000 (via inline instance bar [3] (99999000))
> (again the call counts only emitted under profile feedback)

Assuming the [3] is order, please change that to match what the inliner
uses, i.e. /3

Thanks
>
> >
> >
> >>
> >>> Besides, we might also want to
> >>> use the same machinery (dump_printf_loc etc) for dump file dumping.
> >>> The current behavior of using '-details' to turn on opt-info-all
> >>> messages for dump files is not desirable.
> >>
> >> Interestingly, this doesn't even work. When I do
> >> -fdump-ipa-inline-details=stderr (with my patch containing the inliner
> >> messages) I am not getting those inliner messages emitted to stderr.
> >> Even though in dumpfile.c "details" is set to (TDF_DETAILS |
> >> MSG_OPTIMIZED_LOCATIONS | MSG_MISSED_OPTIMIZATION | MSG_NOTE). I'm not
> >> sure why, but will need to debug this.
> >
> > It works for vectorizer pass.
>
> Ok, let me see what is going on - I just confirmed that it is not
> working for the loop unroller messages either.
>
> >
> >>
> >>> How about the following:
> >>>
> >>> 1) add a new dump_kind modifier so that when that modifier is
> >>> specified, the messages won't go to the alt_dumpfile (controlled by
> >>> -fopt-info), but only to primary dump file. With this, the inline
> >>> messages can be dumped via:
> >>>
> >>>    dump_printf_loc (OPT_OPTIMIZED_LOCATIONS | OPT_DUMP_FILE_ONLY, .....)
> >>
> >> (you mean (MSG_OPTIMIZED_LOCATIONS | OPT_DUMP_FILE_ONLY) )
> >>
> >
> > Yes.
> >
> >> Typically OR-ing together flags like this indicates dump under any of
> >> those conditions. But we could implement special handling for
> >> OPT_DUMP_FILE_ONLY, which in the above case would mean dump only to
> >> the primary dump file, and only under the other conditions specified
> >> in the flag (here under "-optimized")
> >>
> >>>
> >>>
> >>> 2) add more flags in -fdump- support:
> >>>
> >>>    -fdump-ipa-inline-opt   --> turn on opt-info messages only
> >>>    -fdump-ipa-inline-optall --> turn on opt-info-all messages
> >>
> >> According to the documentation (see the -fdump-tree- documentation on
> >> http://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html#Debugging-Options),
> >> the above are already supposed to be there (-optimized, -missed, -note
> >> and -optall). However, specifying any of these gives a warning like:
> >>    cc1: warning: ignoring unknown option ‘optimized’ in
> >> ‘-fdump-ipa-inline’ [enabled by default]
> >> Probably because none is listed in the dump_options[] array in dumpfile.c.
> >>
> >> However, I don't think there is currently a way to use -fdump- options
> >> and *only* get one of these, as much of the current dump output is
> >> emitted whenever there is a dump_file defined. Until everything is
> >> migrated to the new framework it may be difficult to get this to work.
> >>
> >>>    -fdump-tree-pre-ir --> turn on GIMPLE dump only
> >>>    -fdump-tree-pre-details --> turn on everything (ir, optall, trace)
> >>>
> >>> With this, developers can really just use
> >>>
> >>>
> >>> -fdump-ipa-inline-opt=stderr for inline messages.
> >>
> >> Yes, if we can figure out a good way to get this to work (i.e. only
> >> emit the optimized messages and not the rest of the dump messages).
> >> And unfortunately to get them all you need to specify
> >> "-fdump-ipa-all-optimized -fdump-tree-all-optimized
> >> -fdump-rtl-all-optimized" instead of just -fopt-info. Unless we can
> >> add -fdump-all-all-optimized.
> >
> > Having general support requires cleanup of all the old style  if
> > (dump_file) fprintf (dump_file, ...) instances to be:
> >
> >   if (dump_enabled_p ())
> >     dump_printf (dump_kind ....);
>
> Right. But that is going to be a big longer-term effort - grepping for
> dump_file in gcc/*.c gives about 6000 instances.
>
> >
> >
> > However, it might be easier to do this filtering for IR dump only (in
> > execute_function_dump) -- do not dump IR if any of the MSG_xxxx is
> > specified unless IR flag (a new flag) is also specified.
>
> Unfortunately there are a lot of messages that are not from
> execute_function_dump.
>
> Thanks,
> Teresa
>
> >
> > David
> >
> >
> >>
> >> Teresa
> >>
> >>>
> >>> thanks,
> >>>
> >>> David
> >>>
> >>> On Fri, Aug 30, 2013 at 1:30 AM, Richard Biener
> >>> <richard.guenther@gmail.com> wrote:
> >>>> On Thu, Aug 29, 2013 at 5:15 PM, Teresa Johnson <tejohnson@google.com> wrote:
> >>>>> On Thu, Aug 29, 2013 at 3:04 AM, Richard Biener
> >>>>> <richard.guenther@gmail.com> wrote:
> >>>>>>>>> New patch below that removes this global variable, and also outputs
> >>>>>>>>> the node->symbol.order (in square brackets after the function name so
> >>>>>>>>> as to not clutter it). Inline messages with profile data look like:
> >>>>>>>>>
> >>>>>>>>> test.c:8:3: note: foobar [0] (99999000) inlined into foo [2] (1000)
> >>>>>>>>> with call count 99999000 (via inline instance bar [3] (99999000))
> >>>>>>>>
> >>>>>>>> Ick.  This looks both redundant and cluttered.  This is supposed to be
> >>>>>>>> understandable by GCC users, not only GCC developers.
> >>>>>>>
> >>>>>>> The main part that is only useful/understandable to gcc developers is
> >>>>>>> the node->symbol.order in square brackets, requested by Martin. One
> >>>>>>> possibility is that I could put that part under a param, disabled by
> >>>>>>> default. We have something similar on the google branches that emits
> >>>>>>> LIPO module info in the message, enabled via a param.
> >>>>>>
> >>>>>> But we have _dump files_ for that.  That's the developer-consumed
> >>>>>> form of opt-info.  -fopt-info is purely user sugar and for usual translation
> >>>>>> units it shouldn't exceed a single terminal full of output.
> >>>>>
> >>>>> But as a developer I don't want to have to parse lots of dump files
> >>>>> for a summary of the major optimizations performed (e.g. inlining,
> >>>>> unrolling) for an application, unless I am diving into the reasons for
> >>>>> why or why not one of those optimizations occurred in a particular
> >>>>> location. I really do want a summary emitted to stderr so that it is
> >>>>> easily searchable/summarizable for the app as a whole.
> >>>>>
> >>>>> For example, some of the apps I am interested in have thousands of
> >>>>> input files, and trying to collect and parse dump files for each and
> >>>>> every one is overwhelming (it probably would be even if my input files
> >>>>> numbered in the hundreds). What has been very useful is having these
> >>>>> high level summary messages of inlines and unrolls emitted to stderr
> >>>>> by -fopt-info. Then it is easy to search and sort by hotness to get a
> >>>>> feel for things like what inlines are missing when moving to a new
> >>>>> compiler, or compiling a new version of the source, for example. Then
> >>>>> you know which files to focus on and collect dump files for.
> >>>>
> >>>> I thought we can direct dump files to stderr now?  So, just use
> >>>> -fdump-tree-all=stderr
> >>>>
> >>>> and grep its contents.
> >>>>
> >>>>>>
> >>>>>>> I'd argue that the other information (the profile counts, emitted only
> >>>>>>> when using -fprofile-use, and the inline call chains) are useful if
> >>>>>>> you want to understand whether and how critical inlines are occurring.
> >>>>>>> I think this is the type of information that users focused on
> >>>>>>> optimizations, as well as gcc developers, want when they use
> >>>>>>> -fopt-info. Otherwise it is difficult to make sense of the inline
> >>>>>>> information.
> >>>>>>
> >>>>>> Well, I doubt that inline information is interesting to users unless we are
> >>>>>> able to aggressively filter it to what users are interested in.  Which IMHO
> >>>>>> isn't possible - users are interested in "I have not inlined this even though
> >>>>>> inlining would severely improve performance" which would indicate a bug
> >>>>>> in the heuristics we can reliably detect and thus it wouldn't be there.
> >>>>>
> >>>>> I have interacted with users who are aware of optimizations such as
> >>>>> inlining and unrolling and want to look at that information to
> >>>>> diagnose performance differences when refactoring code or using a new
> >>>>> compiler version. I also think inlining (especially cross-module) is
> >>>>> one example of an optimization that is still being tuned, and user
> >>>>> reports of performance issues related to that have been useful.
> >>>>>
> >>>>> I really think that the two groups of people who will find -fopt-info
> >>>>> useful are gcc developers and savvy performance-hungry users. For the
> >>>>> former group the additional info is extremely useful. For the latter
> >>>>> group some of the extra information may not be required (although a
> >>>>> call count is useful for those using profile feedback), but IMO is not
> >>>>> unreasonable.
> >>>>
> >>>> well, your proposed output wrecks my 80x24 terminal already due to overly
> >>>> long lines.
> >>>>
> >>>> In the end we may end up with a verbosity level for each sub-set of opt-info
> >>>> messages.  Ick.
> >>>>
> >>>> Richard.
> >>>>
> >>>>> Teresa
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
> >>
> >>
> >>
> >> --
> >> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413


Teresa Johnson Aug. 31, 2013, 2:04 p.m. UTC | #23
On Sat, Aug 31, 2013 at 12:26 AM, Bernhard Reutner-Fischer
<rep.dot.nop@gmail.com> wrote:
> On 30 August 2013 23:23:16 Teresa Johnson <tejohnson@google.com> wrote:
>>
>> On Fri, Aug 30, 2013 at 1:30 PM, Xinliang David Li <davidxl@google.com>
>> wrote:
>> > On Fri, Aug 30, 2013 at 12:51 PM, Teresa Johnson <tejohnson@google.com>
>> > wrote:
>> >> On Fri, Aug 30, 2013 at 9:27 AM, Xinliang David Li <davidxl@google.com>
>> >> wrote:
>> >>> Except that in this form, the dump will be extremely large and not
>> >>> suitable for very large applications.
>> >>
>> >> Yes. I did some measurements for both a fairly large source file that
>> >> is heavily optimized with LIPO and for a simple toy example that has
>> >> some inlining. For the large source file, the output from
>> >> -fdump-ipa-inline=stderr was almost 100x the line count of the
>> >> -fopt-info output. For the toy source file it was 43x. The size of the
>> >> -details output was 250x and 100x, respectively. Which is untenable
>> >> for a large app.
>> >>
>> >> The issue I am having here is that I want a more verbose message, not
>> >> a more voluminous set of messages. Using either -fopt-info-all or
>> >> -fdump-ipa-inline to provoke the more verbose inline message will give
>> >> me a much greater volume of output.
>> >>
>> >> One compromise could be to emit the more verbose inliner message under
>> >> a param (and a more concise "foo inlined into bar" by default with
>> >> -fopt-info). Or we could do some variant of what David talks about
>> >> below.
>> >
>> > something like --param=verbose-opt-info=1
>>
>> Yes. Richard, would this be acceptable for now?
>>
>> i.e. the inliner messages would be like:
>>
>> -fopt-info:
>>    "test.c:8:3: note: foobar inlined into foo with call count 99999000"
>> (the "with call count X" only when there is profile feedback)
>>
>> -fopt-info --param=verbose-opt-info=1:
>>    "test.c:8:3: note: foobar/0 (99999000) inlined into foo/2 (1000)
>> with call count 99999000 (via inline instance bar [3] (99999000))
>> (again the call counts only emitted under profile feedback)
>
>
> Assuming the [3] is order, please change that to match what the inliner
> uses, i.e. /3

Agreed - I meant to switch that back to "/" in both places but missed
the last. It should read:

"test.c:8:3: note: foobar/0 (99999000) inlined into foo/2 (1000) with
call count 99999000 (via inline instance bar/3 (99999000))"
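
For illustration only, the concise and verbose message shapes discussed in
this thread could come from one formatter along the following lines. This
is a standalone sketch, not the patch's dump_inline_decision: struct
inline_note, append and format_inline_note are hypothetical names, and the
verbose argument stands in for the proposed --param=verbose-opt-info=1.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical description of one inline decision; not a GCC type.  */
struct inline_note
{
  const char *loc;         /* "file:line:col" of the call site.  */
  const char *callee;
  int callee_order;
  long long callee_count;
  const char *caller;
  int caller_order;
  long long caller_count;
  long long call_count;
  const char *via;         /* Enclosing inline instance, or NULL.  */
  int via_order;
  long long via_count;
  bool have_profile;       /* True when profile feedback is available.  */
};

/* Append formatted text to BUF (capacity SIZE), truncating on overflow.  */
static void
append (char *buf, size_t size, const char *fmt, ...)
{
  size_t used = strlen (buf);
  va_list ap;

  if (used >= size - 1)
    return;
  va_start (ap, fmt);
  vsnprintf (buf + used, size - used, fmt, ap);
  va_end (ap);
}

/* Build the note for N in BUF.  Orders and per-function counts appear
   only when VERBOSE; all counts appear only with profile feedback.  */
void
format_inline_note (char *buf, size_t size, const struct inline_note *n,
                    bool verbose)
{
  assert (buf != NULL && size > 0 && n != NULL);
  buf[0] = '\0';

  append (buf, size, "%s: note: %s", n->loc, n->callee);
  if (verbose)
    append (buf, size, "/%d", n->callee_order);
  if (verbose && n->have_profile)
    append (buf, size, " (%lld)", n->callee_count);

  append (buf, size, " inlined into %s", n->caller);
  if (verbose)
    append (buf, size, "/%d", n->caller_order);
  if (verbose && n->have_profile)
    append (buf, size, " (%lld)", n->caller_count);

  if (n->have_profile)
    append (buf, size, " with call count %lld", n->call_count);

  if (verbose && n->via != NULL)
    {
      append (buf, size, " (via inline instance %s/%d", n->via, n->via_order);
      if (n->have_profile)
        append (buf, size, " (%lld)", n->via_count);
      append (buf, size, ")");
    }
}
```

Keeping both shapes in one function means the default and verbose outputs
cannot drift apart in wording, and the "/order" suffix is written the same
way for every function name.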

Thanks,
Teresa

>
> Thanks
>
>>
>> >
>> >
>> >>
>> >>> Besides, we might also want to
>> >>> use the same machinery (dump_printf_loc etc) for dump file dumping.
>> >>> The current behavior of using '-details' to turn on opt-info-all
>> >>> messages for dump files is not desirable.
>> >>
>> >> Interestingly, this doesn't even work. When I do
>> >> -fdump-ipa-inline-details=stderr (with my patch containing the inliner
>> >> messages) I am not getting those inliner messages emitted to stderr.
>> >> Even though in dumpfile.c "details" is set to (TDF_DETAILS |
>> >> MSG_OPTIMIZED_LOCATIONS | MSG_MISSED_OPTIMIZATION | MSG_NOTE). I'm not
>> >> sure why, but will need to debug this.
>> >
>> > It works for vectorizer pass.
>>
>> Ok, let me see what is going on - I just confirmed that it is not
>> working for the loop unroller messages either.
>>
>> >
>> >>
>> >>> How about the following:
>> >>>
>> >>> 1) add a new dump_kind modifier so that when that modifier is
>> >>> specified, the messages won't go to the alt_dumpfile (controlled by
>> >>> -fopt-info), but only to primary dump file. With this, the inline
>> >>> messages can be dumped via:
>> >>>
>> >>>    dump_printf_loc (OPT_OPTIMIZED_LOCATIONS | OPT_DUMP_FILE_ONLY,
>> >>> .....)
>> >>
>> >> (you mean (MSG_OPTIMIZED_LOCATIONS | OPT_DUMP_FILE_ONLY) )
>> >>
>> >
>> > Yes.
>> >
>> >> Typically OR-ing together flags like this indicates dump under any of
>> >> those conditions. But we could implement special handling for
>> >> OPT_DUMP_FILE_ONLY, which in the above case would mean dump only to
>> >> the primary dump file, and only under the other conditions specified
>> >> in the flag (here under "-optimized")
>> >>
>> >>>
>> >>>
>> >>> 2) add more flags in -fdump- support:
>> >>>
>> >>>    -fdump-ipa-inline-opt   --> turn on opt-info messages only
>> >>>    -fdump-ipa-inline-optall --> turn on opt-info-all messages
>> >>
>> >> According to the documentation (see the -fdump-tree- documentation on
>> >>
>> >> http://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html#Debugging-Options),
>> >> the above are already supposed to be there (-optimized, -missed, -note
>> >> and -optall). However, specifying any of these gives a warning like:
>> >>    cc1: warning: ignoring unknown option ‘optimized’ in
>> >> ‘-fdump-ipa-inline’ [enabled by default]
>> >> Probably because none is listed in the dump_options[] array in
>> >> dumpfile.c.
>> >>
>> >> However, I don't think there is currently a way to use -fdump- options
>> >> and *only* get one of these, as much of the current dump output is
>> >> emitted whenever there is a dump_file defined. Until everything is
>> >> migrated to the new framework it may be difficult to get this to work.
>> >>
>> >>>    -fdump-tree-pre-ir --> turn on GIMPLE dump only
>> >>>    -fdump-tree-pre-details --> turn on everything (ir, optall, trace)
>> >>>
>> >>> With this, developers can really just use
>> >>>
>> >>>
>> >>> -fdump-ipa-inline-opt=stderr for inline messages.
>> >>
>> >> Yes, if we can figure out a good way to get this to work (i.e. only
>> >> emit the optimized messages and not the rest of the dump messages).
>> >> And unfortunately to get them all you need to specify
>> >> "-fdump-ipa-all-optimized -fdump-tree-all-optimized
>> >> -fdump-rtl-all-optimized" instead of just -fopt-info. Unless we can
>> >> add -fdump-all-all-optimized.
>> >
>> > Having general support requires cleanup of all the old style  if
>> > (dump_file) fprintf (dump_file, ...) instances to be:
>> >
>> >   if (dump_enabled_p ())
>> >     dump_printf (dump_kind ....);
>>
>> Right. But that is going to be a big longer-term effort - grepping for
>> dump_file in gcc/*.c gives about 6000 instances.
>>
>> >
>> >
>> > However, it might be easier to do this filtering for IR dump only (in
>> > execute_function_dump) -- do not dump IR if any of the MSG_xxxx is
>> > specified unless IR flag (a new flag) is also specified.
>>
>> Unfortunately there are a lot of messages that are not from
>> execute_function_dump.
>>
>> Thanks,
>> Teresa
>>
>> >
>> > David
>> >
>> >
>> >>
>> >> Teresa
>> >>
>> >>>
>> >>> thanks,
>> >>>
>> >>> David
>> >>>
>> >>> On Fri, Aug 30, 2013 at 1:30 AM, Richard Biener
>> >>> <richard.guenther@gmail.com> wrote:
>> >>>> On Thu, Aug 29, 2013 at 5:15 PM, Teresa Johnson
>> >>>> <tejohnson@google.com> wrote:
>> >>>>> On Thu, Aug 29, 2013 at 3:04 AM, Richard Biener
>> >>>>> <richard.guenther@gmail.com> wrote:
>> >>>>>>>>> New patch below that removes this global variable, and also
>> >>>>>>>>> outputs
>> >>>>>>>>> the node->symbol.order (in square brackets after the function
>> >>>>>>>>> name so
>> >>>>>>>>> as to not clutter it). Inline messages with profile data look
>> >>>>>>>>> like:
>> >>>>>>>>>
>> >>>>>>>>> test.c:8:3: note: foobar [0] (99999000) inlined into foo [2]
>> >>>>>>>>> (1000)
>> >>>>>>>>> with call count 99999000 (via inline instance bar [3]
>> >>>>>>>>> (99999000))
>> >>>>>>>>
>> >>>>>>>> Ick.  This looks both redundant and cluttered.  This is supposed
>> >>>>>>>> to be
>> >>>>>>>> understandable by GCC users, not only GCC developers.
>> >>>>>>>
>> >>>>>>> The main part that is only useful/understandable to gcc developers
>> >>>>>>> is
>> >>>>>>> the node->symbol.order in square brackets, requested by Martin. One
>> >>>>>>> possibility is that I could put that part under a param, disabled
>> >>>>>>> by
>> >>>>>>> default. We have something similar on the google branches that
>> >>>>>>> emits
>> >>>>>>> LIPO module info in the message, enabled via a param.
>> >>>>>>
>> >>>>>> But we have _dump files_ for that.  That's the developer-consumed
>> >>>>>> form of opt-info.  -fopt-info is purely user sugar and for usual
>> >>>>>> translation
>> >>>>>> units it shouldn't exceed a single terminal full of output.
>> >>>>>
>> >>>>> But as a developer I don't want to have to parse lots of dump files
>> >>>>> for a summary of the major optimizations performed (e.g. inlining,
>> >>>>> unrolling) for an application, unless I am diving into the reasons
>> >>>>> for
>> >>>>> why or why not one of those optimizations occurred in a particular
>> >>>>> location. I really do want a summary emitted to stderr so that it is
>> >>>>> easily searchable/summarizable for the app as a whole.
>> >>>>>
>> >>>>> For example, some of the apps I am interested in have thousands of
>> >>>>> input files, and trying to collect and parse dump files for each and
>> >>>>> every one is overwhelming (it probably would be even if my input
>> >>>>> files
>> >>>>> numbered in the hundreds). What has been very useful is having these
>> >>>>> high level summary messages of inlines and unrolls emitted to stderr
>> >>>>> by -fopt-info. Then it is easy to search and sort by hotness to get
>> >>>>> a
>> >>>>> feel for things like what inlines are missing when moving to a new
>> >>>>> compiler, or compiling a new version of the source, for example.
>> >>>>> Then
>> >>>>> you know which files to focus on and collect dump files for.
>> >>>>
>> >>>> I thought we can direct dump files to stderr now?  So, just use
>> >>>> -fdump-tree-all=stderr
>> >>>>
>> >>>> and grep its contents.
>> >>>>
>> >>>>>>
>> >>>>>>> I'd argue that the other information (the profile counts, emitted
>> >>>>>>> only
>> >>>>>>> when using -fprofile-use, and the inline call chains) are useful
>> >>>>>>> if
>> >>>>>>> you want to understand whether and how critical inlines are
>> >>>>>>> occurring.
>> >>>>>>> I think this is the type of information that users focused on
>> >>>>>>> optimizations, as well as gcc developers, want when they use
>> >>>>>>> -fopt-info. Otherwise it is difficult to make sense of the inline
>> >>>>>>> information.
>> >>>>>>
>> >>>>>> Well, I doubt that inline information is interesting to users
>> >>>>>> unless we are
>> >>>>>> able to aggressively filter it to what users are interested in.
>> >>>>>> Which IMHO
>> >>>>>> isn't possible - users are interested in "I have not inlined this
>> >>>>>> even though
>> >>>>>> inlining would severely improve performance" which would indicate a
>> >>>>>> bug
>> >>>>>> in the heuristics we can reliably detect and thus it wouldn't be
>> >>>>>> there.
>> >>>>>
>> >>>>> I have interacted with users who are aware of optimizations such as
>> >>>>> inlining and unrolling and want to look at that information to
>> >>>>> diagnose performance differences when refactoring code or using a
>> >>>>> new
>> >>>>> compiler version. I also think inlining (especially cross-module) is
>> >>>>> one example of an optimization that is still being tuned, and user
>> >>>>> reports of performance issues related to that have been useful.
>> >>>>>
>> >>>>> I really think that the two groups of people who will find
>> >>>>> -fopt-info
>> >>>>> useful are gcc developers and savvy performance-hungry users. For
>> >>>>> the
>> >>>>> former group the additional info is extremely useful. For the latter
>> >>>>> group some of the extra information may not be required (although a
>> >>>>> call count is useful for those using profile feedback), but IMO is
>> >>>>> not
>> >>>>> unreasonable.
>> >>>>
>> >>>> well, your proposed output wrecks my 80x24 terminal already due to
>> >>>> overly
>> >>>> long lines.
>> >>>>
>> >>>> In the end we may end up with a verbosity level for each sub-set of
>> >>>> opt-info
>> >>>> messages.  Ick.
>> >>>>
>> >>>> Richard.
>> >>>>
>> >>>>> Teresa
>> >>>>>
>> >>>>>
>> >>>>> --
>> >>>>> Teresa Johnson | Software Engineer | tejohnson@google.com |
>> >>>>> 408-460-2413
>> >>
>> >>
>> >>
>> >> --
>> >> Teresa Johnson | Software Engineer | tejohnson@google.com |
>> >> 408-460-2413
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
>
>
>
>
Richard Biener Sept. 2, 2013, 8:59 a.m. UTC | #24
On Fri, Aug 30, 2013 at 6:27 PM, Xinliang David Li <davidxl@google.com> wrote:
> Except that in this form, the dump will be extremely large and not
> suitable for very large applications.

So?  You asked for it and you can easily grep the output before storing it
away.

> Besides, we might also want to
> use the same machinery (dump_printf_loc etc) for dump file dumping.
> The current behavior of using '-details' to turn on opt-info-all
> messages for dump files is not desirable.  How about the following:
>
> 1) add a new dump_kind modifier so that when that modifier is
> specified, the messages won't go to the alt_dumpfile (controlled by
> -fopt-info), but only to primary dump file. With this, the inline
> messages can be dumped via:
>
>    dump_printf_loc (OPT_OPTIMIZED_LOCATIONS | OPT_DUMP_FILE_ONLY, .....)
>
>
> 2) add more flags in -fdump- support:
>
>    -fdump-ipa-inline-opt   --> turn on opt-info messages only
>    -fdump-ipa-inline-optall --> turn on opt-info-all messages
>    -fdump-tree-pre-ir --> turn on GIMPLE dump only
>    -fdump-tree-pre-details --> turn on everything (ir, optall, trace)
>
> With this, developers can really just use
>
>
> -fdump-ipa-inline-opt=stderr for inline messages.
>
> thanks,
>
> David
>
> On Fri, Aug 30, 2013 at 1:30 AM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Thu, Aug 29, 2013 at 5:15 PM, Teresa Johnson <tejohnson@google.com> wrote:
>>> On Thu, Aug 29, 2013 at 3:04 AM, Richard Biener
>>> <richard.guenther@gmail.com> wrote:
>>>>>>> New patch below that removes this global variable, and also outputs
>>>>>>> the node->symbol.order (in square brackets after the function name so
>>>>>>> as to not clutter it). Inline messages with profile data look like:
>>>>>>>
>>>>>>> test.c:8:3: note: foobar [0] (99999000) inlined into foo [2] (1000)
>>>>>>> with call count 99999000 (via inline instance bar [3] (99999000))
>>>>>>
>>>>>> Ick.  This looks both redundant and cluttered.  This is supposed to be
>>>>>> understandable by GCC users, not only GCC developers.
>>>>>
>>>>> The main part that is only useful/understandable to gcc developers is
>>>>> the node->symbol.order in square brackets, requested by Martin. One
>>>>> possibility is that I could put that part under a param, disabled by
>>>>> default. We have something similar on the google branches that emits
>>>>> LIPO module info in the message, enabled via a param.
>>>>
>>>> But we have _dump files_ for that.  That's the developer-consumed
>>>> form of opt-info.  -fopt-info is purely user sugar and for usual translation
>>>> units it shouldn't exceed a single terminal full of output.
>>>
>>> But as a developer I don't want to have to parse lots of dump files
>>> for a summary of the major optimizations performed (e.g. inlining,
>>> unrolling) for an application, unless I am diving into the reasons for
>>> why or why not one of those optimizations occurred in a particular
>>> location. I really do want a summary emitted to stderr so that it is
>>> easily searchable/summarizable for the app as a whole.
>>>
>>> For example, some of the apps I am interested in have thousands of
>>> input files, and trying to collect and parse dump files for each and
>>> every one is overwhelming (it probably would be even if my input files
>>> numbered in the hundreds). What has been very useful is having these
>>> high level summary messages of inlines and unrolls emitted to stderr
>>> by -fopt-info. Then it is easy to search and sort by hotness to get a
>>> feel for things like what inlines are missing when moving to a new
>>> compiler, or compiling a new version of the source, for example. Then
>>> you know which files to focus on and collect dump files for.
>>
>> I thought we can direct dump files to stderr now?  So, just use
>> -fdump-tree-all=stderr
>>
>> and grep its contents.
>>
>>>>
>>>>> I'd argue that the other information (the profile counts, emitted only
>>>>> when using -fprofile-use, and the inline call chains) are useful if
>>>>> you want to understand whether and how critical inlines are occurring.
>>>>> I think this is the type of information that users focused on
>>>>> optimizations, as well as gcc developers, want when they use
>>>>> -fopt-info. Otherwise it is difficult to make sense of the inline
>>>>> information.
>>>>
>>>> Well, I doubt that inline information is interesting to users unless we are
>>>> able to aggressively filter it to what users are interested in.  Which IMHO
>>>> isn't possible - users are interested in "I have not inlined this even though
>>>> inlining would severely improve performance" which would indicate a bug
>>>> in the heuristics we can reliably detect and thus it wouldn't be there.
>>>
>>> I have interacted with users who are aware of optimizations such as
>>> inlining and unrolling and want to look at that information to
>>> diagnose performance differences when refactoring code or using a new
>>> compiler version. I also think inlining (especially cross-module) is
>>> one example of an optimization that is still being tuned, and user
>>> reports of performance issues related to that have been useful.
>>>
>>> I really think that the two groups of people who will find -fopt-info
>>> useful are gcc developers and savvy performance-hungry users. For the
>>> former group the additional info is extremely useful. For the latter
>>> group some of the extra information may not be required (although a
>>> call count is useful for those using profile feedback), but IMO is not
>>> unreasonable.
>>
>> well, your proposed output wrecks my 80x24 terminal already due to overly
>> long lines.
>>
>> In the end we may end up with a verbosity level for each sub-set of opt-info
>> messages.  Ick.
>>
>> Richard.
>>
>>> Teresa
>>>
>>>
>>> --
>>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
Richard Biener Sept. 2, 2013, 9:01 a.m. UTC | #25
On Fri, Aug 30, 2013 at 9:51 PM, Teresa Johnson <tejohnson@google.com> wrote:
> On Fri, Aug 30, 2013 at 9:27 AM, Xinliang David Li <davidxl@google.com> wrote:
>> Except that in this form, the dump will be extremely large and not
>> suitable for very large applications.
>
> Yes. I did some measurements for both a fairly large source file that
> is heavily optimized with LIPO and for a simple toy example that has
> some inlining. For the large source file, the output from
> -fdump-ipa-inline=stderr was almost 100x the line count of the
> -fopt-info output. For the toy source file it was 43x. The size of the
> -details output was 250x and 100x, respectively. Which is untenable
> for a large app.
>
> The issue I am having here is that I want a more verbose message, not
> a more voluminous set of messages. Using either -fopt-info-all or
> -fdump-ipa-inline to provoke the more verbose inline message will give
> me a much greater volume of output.

I think we will never reach the state where the dumping is exactly what
each developer wants (because their wants will differ).  Developers can
easily post-process the stderr output by piping it through grep.

Richard.

> One compromise could be to emit the more verbose inliner message under
> a param (and a more concise "foo inlined into bar" by default with
> -fopt-info). Or we could do some variant of what David talks about
> below.
>
>> Besides, we might also want to
>> use the same machinery (dump_printf_loc etc) for dump file dumping.
>> The current behavior of using '-details' to turn on opt-info-all
>> messages for dump files is not desirable.
>
> Interestingly, this doesn't even work. When I do
> -fdump-ipa-inline-details=stderr (with my patch containing the inliner
> messages) I am not getting those inliner messages emitted to stderr.
> Even though in dumpfile.c "details" is set to (TDF_DETAILS |
> MSG_OPTIMIZED_LOCATIONS | MSG_MISSED_OPTIMIZATION | MSG_NOTE). I'm not
> sure why, but will need to debug this.
>
>> How about the following:
>>
>> 1) add a new dump_kind modifier so that when that modifier is
>> specified, the messages won't go to the alt_dumpfile (controlled by
>> -fopt-info), but only to primary dump file. With this, the inline
>> messages can be dumped via:
>>
>>    dump_printf_loc (OPT_OPTIMIZED_LOCATIONS | OPT_DUMP_FILE_ONLY, .....)
>
> (you mean (MSG_OPTIMIZED_LOCATIONS | OPT_DUMP_FILE_ONLY) )
>
> Typically OR-ing together flags like this indicates dump under any of
> those conditions. But we could implement special handling for
> OPT_DUMP_FILE_ONLY, which in the above case would mean dump only to
> the primary dump file, and only under the other conditions specified
> in the flag (here under "-optimized")
>
>>
>>
>> 2) add more flags in -fdump- support:
>>
>>    -fdump-ipa-inline-opt   --> turn on opt-info messages only
>>    -fdump-ipa-inline-optall --> turn on opt-info-all messages
>
> According to the documentation (see the -fdump-tree- documentation on
> http://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html#Debugging-Options),
> the above are already supposed to be there (-optimized, -missed, -note
> and -optall). However, specifying any of these gives a warning like:
>    cc1: warning: ignoring unknown option ‘optimized’ in
> ‘-fdump-ipa-inline’ [enabled by default]
> Probably because none is listed in the dump_options[] array in dumpfile.c.
>
> However, I don't think there is currently a way to use -fdump- options
> and *only* get one of these, as much of the current dump output is
> emitted whenever there is a dump_file defined. Until everything is
> migrated to the new framework it may be difficult to get this to work.
>
>>    -fdump-tree-pre-ir --> turn on GIMPLE dump only
>>    -fdump-tree-pre-details --> turn on everything (ir, optall, trace)
>>
>> With this, developers can really just use
>>
>>
>> -fdump-ipa-inline-opt=stderr for inline messages.
>
> Yes, if we can figure out a good way to get this to work (i.e. only
> emit the optimized messages and not the rest of the dump messages).
> And unfortunately to get them all you need to specify
> "-fdump-ipa-all-optimized -fdump-tree-all-optimized
> -fdump-rtl-all-optimized" instead of just -fopt-info. Unless we can
> add -fdump-all-all-optimized.
>
> Teresa
>
>>
>> thanks,
>>
>> David
>>
>> On Fri, Aug 30, 2013 at 1:30 AM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>> On Thu, Aug 29, 2013 at 5:15 PM, Teresa Johnson <tejohnson@google.com> wrote:
>>>> On Thu, Aug 29, 2013 at 3:04 AM, Richard Biener
>>>> <richard.guenther@gmail.com> wrote:
>>>>>>>> New patch below that removes this global variable, and also outputs
>>>>>>>> the node->symbol.order (in square brackets after the function name so
>>>>>>>> as to not clutter it). Inline messages with profile data look like:
>>>>>>>>
>>>>>>>> test.c:8:3: note: foobar [0] (99999000) inlined into foo [2] (1000)
>>>>>>>> with call count 99999000 (via inline instance bar [3] (99999000))
>>>>>>>
>>>>>>> Ick.  This looks both redundant and cluttered.  This is supposed to be
>>>>>>> understandable by GCC users, not only GCC developers.
>>>>>>
>>>>>> The main part that is only useful/understandable to gcc developers is
>>>>>> the node->symbol.order in square brackets, requested by Martin. One
>>>>>> possibility is that I could put that part under a param, disabled by
>>>>>> default. We have something similar on the google branches that emits
>>>>>> LIPO module info in the message, enabled via a param.
>>>>>
>>>>> But we have _dump files_ for that.  That's the developer-consumed
>>>>> form of opt-info.  -fopt-info is purely user sugar and for usual translation
>>>>> units it shouldn't exceed a single terminal full of output.
>>>>
>>>> But as a developer I don't want to have to parse lots of dump files
>>>> for a summary of the major optimizations performed (e.g. inlining,
>>>> unrolling) for an application, unless I am diving into the reasons for
>>>> why or why not one of those optimizations occurred in a particular
>>>> location. I really do want a summary emitted to stderr so that it is
>>>> easily searchable/summarizable for the app as a whole.
>>>>
>>>> For example, some of the apps I am interested in have thousands of
>>>> input files, and trying to collect and parse dump files for each and
>>>> every one is overwhelming (it probably would be even if my input files
>>>> numbered in the hundreds). What has been very useful is having these
>>>> high level summary messages of inlines and unrolls emitted to stderr
>>>> by -fopt-info. Then it is easy to search and sort by hotness to get a
>>>> feel for things like what inlines are missing when moving to a new
>>>> compiler, or compiling a new version of the source, for example. Then
>>>> you know which files to focus on and collect dump files for.
>>>
>>> I thought we can direct dump files to stderr now?  So, just use
>>> -fdump-tree-all=stderr
>>>
>>> and grep its contents.
>>>
>>>>>
>>>>>> I'd argue that the other information (the profile counts, emitted only
>>>>>> when using -fprofile-use, and the inline call chains) are useful if
>>>>>> you want to understand whether and how critical inlines are occurring.
>>>>>> I think this is the type of information that users focused on
>>>>>> optimizations, as well as gcc developers, want when they use
>>>>>> -fopt-info. Otherwise it is difficult to make sense of the inline
>>>>>> information.
>>>>>
>>>>> Well, I doubt that inline information is interesting to users unless we are
>>>>> able to aggressively filter it to what users are interested in.  Which IMHO
>>>>> isn't possible - users are interested in "I have not inlined this even though
>>>>> inlining would severely improve performance" which would indicate a bug
>>>>> in the heuristics we can reliably detect and thus it wouldn't be there.
>>>>
>>>> I have interacted with users who are aware of optimizations such as
>>>> inlining and unrolling and want to look at that information to
>>>> diagnose performance differences when refactoring code or using a new
>>>> compiler version. I also think inlining (especially cross-module) is
>>>> one example of an optimization that is still being tuned, and user
>>>> reports of performance issues related to that have been useful.
>>>>
>>>> I really think that the two groups of people who will find -fopt-info
>>>> useful are gcc developers and savvy performance-hungry users. For the
>>>> former group the additional info is extremely useful. For the latter
>>>> group some of the extra information may not be required (although a
>>>> call count is useful for those using profile feedback), but IMO is not
>>>> unreasonable.
>>>
>>> well, your proposed output wrecks my 80x24 terminal already due to overly
>>> long lines.
>>>
>>> In the end we may end up with a verbosity level for each sub-set of opt-info
>>> messages.  Ick.
>>>
>>> Richard.
>>>
>>>> Teresa
>>>>
>>>>
>>>> --
>>>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
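
A minimal sketch of the OPT_DUMP_FILE_ONLY idea from the quoted exchange
above, assuming the modifier acts as a veto on the alternate (-fopt-info)
stream while the remaining MSG_ bits are still matched as usual. The flag
values and the helpers goes_to_primary and goes_to_alternate are
hypothetical names chosen for illustration; they are not taken from
dumpfile.h or dumpfile.c, and OPT_DUMP_FILE_ONLY exists only as a proposal
in this thread.

```c
/* Illustrative flag values only; these are NOT the definitions from
   GCC's dumpfile.h.  */
#define MSG_OPTIMIZED_LOCATIONS  (1 << 0)
#define MSG_MISSED_OPTIMIZATION  (1 << 1)
#define MSG_NOTE                 (1 << 2)
#define MSG_ALL  (MSG_OPTIMIZED_LOCATIONS | MSG_MISSED_OPTIMIZATION \
                  | MSG_NOTE)

/* Hypothetical modifier from the proposal: keep the message out of the
   alternate stream.  */
#define OPT_DUMP_FILE_ONLY       (1 << 3)

/* Return 1 if a message of DUMP_KIND should reach the primary dump file,
   whose enabled message kinds are PFLAGS.  The modifier bit is masked
   off so that it never matches on its own.  */
static int
goes_to_primary (int dump_kind, int pflags)
{
  return (dump_kind & MSG_ALL & pflags) != 0;
}

/* Return 1 if a message of DUMP_KIND should reach the alternate stream,
   whose enabled message kinds are ALT_FLAGS.  OPT_DUMP_FILE_ONLY
   suppresses the message regardless of the other bits.  */
static int
goes_to_alternate (int dump_kind, int alt_flags)
{
  if (dump_kind & OPT_DUMP_FILE_ONLY)
    return 0;
  return (dump_kind & MSG_ALL & alt_flags) != 0;
}
```

Under these assumptions, a message tagged
MSG_OPTIMIZED_LOCATIONS | OPT_DUMP_FILE_ONLY reaches a primary dump file
opened with "-optimized" but never the alternate stream, which is the
special handling described in the quoted text: the modifier is not OR-ed
in as one more condition, it narrows the destination.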
Richard Biener Sept. 2, 2013, 9:02 a.m. UTC | #26
On Fri, Aug 30, 2013 at 11:23 PM, Teresa Johnson <tejohnson@google.com> wrote:
> On Fri, Aug 30, 2013 at 1:30 PM, Xinliang David Li <davidxl@google.com> wrote:
>> On Fri, Aug 30, 2013 at 12:51 PM, Teresa Johnson <tejohnson@google.com> wrote:
>>> On Fri, Aug 30, 2013 at 9:27 AM, Xinliang David Li <davidxl@google.com> wrote:
>>>> Except that in this form, the dump will be extremely large and not
>>>> suitable for very large applications.
>>>
>>> Yes. I did some measurements for both a fairly large source file that
>>> is heavily optimized with LIPO and for a simple toy example that has
>>> some inlining. For the large source file, the output from
>>> -fdump-ipa-inline=stderr was almost 100x the line count of the
>>> -fopt-info output. For the toy source file it was 43x. The size of the
>>> -details output was 250x and 100x, respectively. Which is untenable
>>> for a large app.
>>>
>>> The issue I am having here is that I want a more verbose message, not
>>> a more voluminous set of messages. Using either -fopt-info-all or
>>> -fdump-ipa-inline to provoke the more verbose inline message will give
>>> me a much greater volume of output.
>>>
>>> One compromise could be to emit the more verbose inliner message under
>>> a param (and a more concise "foo inlined into bar" by default with
>>> -fopt-info). Or we could do some variant of what David talks about
>>> below.
>>
>> something like --param=verbose-opt-info=1
>
> Yes. Richard, would this be acceptable for now?
>
> i.e. the inliner messages would be like:
>
> -fopt-info:
>    "test.c:8:3: note: foobar inlined into foo with call count 99999000"
> (the "with call count X" only when there is profile feedback)
>
> -fopt-info --param=verbose-opt-info=1:
>    "test.c:8:3: note: foobar/0 (99999000) inlined into foo/2 (1000)
> with call count 99999000 (via inline instance bar [3] (99999000))
> (again the call counts only emitted under profile feedback)

It looks like a hack to me.  Is -fdump-ipa-inline useful at all?  That is,
can't we simply push some of the -details dumping into the non-details
dump?

Richard.

>>
>>
>>>
>>>> Besides, we might also want to
>>>> use the same machinery (dump_printf_loc etc) for dump file dumping.
>>>> The current behavior of using '-details' to turn on opt-info-all
>>>> messages for dump files is not desirable.
>>>
>>> Interestingly, this doesn't even work. When I do
>>> -fdump-ipa-inline-details=stderr (with my patch containing the inliner
>>> messages) I am not getting those inliner messages emitted to stderr.
>>> Even though in dumpfile.c "details" is set to (TDF_DETAILS |
>>> MSG_OPTIMIZED_LOCATIONS | MSG_MISSED_OPTIMIZATION | MSG_NOTE). I'm not
>>> sure why, but will need to debug this.
>>
>> It works for vectorizer pass.
>
> Ok, let me see what is going on - I just confirmed that it is not
> working for the loop unroller messages either.
>
>>
>>>
>>>> How about the following:
>>>>
>>>> 1) add a new dump_kind modifier so that when that modifier is
>>>> specified, the messages won't go to the alt_dumpfile (controlled by
>>>> -fopt-info), but only to primary dump file. With this, the inline
>>>> messages can be dumped via:
>>>>
>>>>    dump_printf_loc (OPT_OPTIMIZED_LOCATIONS | OPT_DUMP_FILE_ONLY, .....)
>>>
>>> (you mean (MSG_OPTIMIZED_LOCATIONS | OPT_DUMP_FILE_ONLY) )
>>>
>>
>> Yes.
>>
>>> Typically OR-ing together flags like this indicates dump under any of
>>> those conditions. But we could implement special handling for
>>> OPT_DUMP_FILE_ONLY, which in the above case would mean dump only to
>>> the primary dump file, and only under the other conditions specified
>>> in the flag (here under "-optimized")
>>>
>>>>
>>>>
>>>> 2) add more flags in -fdump- support:
>>>>
>>>>    -fdump-ipa-inline-opt   --> turn on opt-info messages only
>>>>    -fdump-ipa-inline-optall --> turn on opt-info-all messages
>>>
>>> According to the documentation (see the -fdump-tree- documentation on
>>> http://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html#Debugging-Options),
>>> the above are already supposed to be there (-optimized, -missed, -note
>>> and -optall). However, specifying any of these gives a warning like:
>>>    cc1: warning: ignoring unknown option ‘optimized’ in
>>> ‘-fdump-ipa-inline’ [enabled by default]
>>> Probably because none is listed in the dump_options[] array in dumpfile.c.
>>>
>>> However, I don't think there is currently a way to use -fdump- options
>>> and *only* get one of these, as much of the current dump output is
>>> emitted whenever there is a dump_file defined. Until everything is
>>> migrated to the new framework it may be difficult to get this to work.
>>>
>>>>    -fdump-tree-pre-ir --> turn on GIMPLE dump only
>>>>    -fdump-tree-pre-details --> turn on everything (ir, optall, trace)
>>>>
>>>> With this, developers can really just use
>>>>
>>>>
>>>> -fdump-ipa-inline-opt=stderr for inline messages.
>>>
>>> Yes, if we can figure out a good way to get this to work (i.e. only
>>> emit the optimized messages and not the rest of the dump messages).
>>> And unfortunately to get them all you need to specify
>>> "-fdump-ipa-all-optimized -fdump-tree-all-optimized
>>> -fdump-rtl-all-optimized" instead of just -fopt-info. Unless we can
>>> add -fdump-all-all-optimized.
>>
>> Having general support requires cleanup of all the old style  if
>> (dump_file) fprintf (dump_file, ...) instances to be:
>>
>>   if (dump_enabled_p ())
>>     dump_printf (dump_kind ....);
>
> Right. But that is going to be a big longer-term effort - grepping for
> dump_file in gcc/*.c gives about 6000 instances.
>
>>
>>
>> However, it might be easier to do this filtering for IR dump only (in
>> execute_function_dump) -- do not dump IR if any of the MSG_xxxx is
>> specified unless IR flag (a new flag) is also specified.
>
> Unfortunately there are a lot of messages that are not from
> execute_function_dump.
>
> Thanks,
> Teresa
>
>>
>> David
>>
>>
>>>
>>> Teresa
>>>
>>>>
>>>> thanks,
>>>>
>>>> David
>>>>
>>>> On Fri, Aug 30, 2013 at 1:30 AM, Richard Biener
>>>> <richard.guenther@gmail.com> wrote:
>>>>> On Thu, Aug 29, 2013 at 5:15 PM, Teresa Johnson <tejohnson@google.com> wrote:
>>>>>> On Thu, Aug 29, 2013 at 3:04 AM, Richard Biener
>>>>>> <richard.guenther@gmail.com> wrote:
>>>>>>>>>> New patch below that removes this global variable, and also outputs
>>>>>>>>>> the node->symbol.order (in square brackets after the function name so
>>>>>>>>>> as to not clutter it). Inline messages with profile data look like:
>>>>>>>>>>
>>>>>>>>>> test.c:8:3: note: foobar [0] (99999000) inlined into foo [2] (1000)
>>>>>>>>>> with call count 99999000 (via inline instance bar [3] (99999000))
>>>>>>>>>
>>>>>>>>> Ick.  This looks both redundant and cluttered.  This is supposed to be
>>>>>>>>> understandable by GCC users, not only GCC developers.
>>>>>>>>
>>>>>>>> The main part that is only useful/understandable to gcc developers is
>>>>>>>> the node->symbol.order in square brackets, requested by Martin. One
>>>>>>>> possibility is that I could put that part under a param, disabled by
>>>>>>>> default. We have something similar on the google branches that emits
>>>>>>>> LIPO module info in the message, enabled via a param.
>>>>>>>
>>>>>>> But we have _dump files_ for that.  That's the developer-consumed
>>>>>>> form of opt-info.  -fopt-info is purely user sugar and for usual translation
>>>>>>> units it shouldn't exceed a single terminal full of output.
>>>>>>
>>>>>> But as a developer I don't want to have to parse lots of dump files
>>>>>> for a summary of the major optimizations performed (e.g. inlining,
>>>>>> unrolling) for an application, unless I am diving into the reasons for
>>>>>> why or why not one of those optimizations occurred in a particular
>>>>>> location. I really do want a summary emitted to stderr so that it is
>>>>>> easily searchable/summarizable for the app as a whole.
>>>>>>
>>>>>> For example, some of the apps I am interested in have thousands of
>>>>>> input files, and trying to collect and parse dump files for each and
>>>>>> every one is overwhelming (it probably would be even if my input files
>>>>>> numbered in the hundreds). What has been very useful is having these
>>>>>> high level summary messages of inlines and unrolls emitted to stderr
>>>>>> by -fopt-info. Then it is easy to search and sort by hotness to get a
>>>>>> feel for things like what inlines are missing when moving to a new
>>>>>> compiler, or compiling a new version of the source, for example. Then
>>>>>> you know which files to focus on and collect dump files for.
>>>>>
>>>>> I thought we can direct dump files to stderr now?  So, just use
>>>>> -fdump-tree-all=stderr
>>>>>
>>>>> and grep its contents.
>>>>>
>>>>>>>
>>>>>>>> I'd argue that the other information (the profile counts, emitted only
>>>>>>>> when using -fprofile-use, and the inline call chains) are useful if
>>>>>>>> you want to understand whether and how critical inlines are occurring.
>>>>>>>> I think this is the type of information that users focused on
>>>>>>>> optimizations, as well as gcc developers, want when they use
>>>>>>>> -fopt-info. Otherwise it is difficult to make sense of the inline
>>>>>>>> information.
>>>>>>>
>>>>>>> Well, I doubt that inline information is interesting to users unless we are
>>>>>>> able to aggressively filter it to what users are interested in.  Which IMHO
>>>>>>> isn't possible - users are interested in "I have not inlined this even though
>>>>>>> inlining would severely improve performance" which would indicate a bug
>>>>>>> in the heuristics we can reliably detect and thus it wouldn't be there.
>>>>>>
>>>>>> I have interacted with users who are aware of optimizations such as
>>>>>> inlining and unrolling and want to look at that information to
>>>>>> diagnose performance differences when refactoring code or using a new
>>>>>> compiler version. I also think inlining (especially cross-module) is
>>>>>> one example of an optimization that is still being tuned, and user
>>>>>> reports of performance issues related to that have been useful.
>>>>>>
>>>>>> I really think that the two groups of people who will find -fopt-info
>>>>>> useful are gcc developers and savvy performance-hungry users. For the
>>>>>> former group the additional info is extremely useful. For the latter
>>>>>> group some of the extra information may not be required (although a
>>>>>> call count is useful for those using profile feedback), but IMO is not
>>>>>> unreasonable.
>>>>>
>>>>> well, your proposed output wrecks my 80x24 terminal already due to overly
>>>>> long lines.
>>>>>
>>>>> In the end we may end up with a verbosity level for each sub-set of opt-info
>>>>> messages.  Ick.
>>>>>
>>>>> Richard.
>>>>>
>>>>>> Teresa
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
>>>
>>>
>>>
>>> --
>>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
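
The migration discussed in the quoted text above, from
"if (dump_file) fprintf (dump_file, ...)" to
"if (dump_enabled_p ()) dump_printf (dump_kind, ...)", can be sketched in
isolation as follows. The functions below are simplified stand-ins written
for this example, assuming one primary stream and one alternate stream
that are each filtered by message kind; the real declarations live in
GCC's dumpfile.h and may differ in signature and behavior. The names
report_unroll_old and report_unroll_new and the message text are
hypothetical.

```c
#include <stdarg.h>
#include <stdio.h>

/* Illustrative values; not the definitions from dumpfile.h.  */
#define MSG_OPTIMIZED_LOCATIONS (1 << 0)
#define MSG_NOTE                (1 << 1)

static FILE *dump_file;      /* primary per-pass dump stream, or NULL */
static int pflags;           /* kinds enabled on the primary stream */
static FILE *alt_dump_file;  /* -fopt-info style stream, or NULL */
static int alt_flags;        /* kinds enabled on the alternate stream */

/* Stand-in: true when at least one stream is open.  */
static int
dump_enabled_p (void)
{
  return dump_file != NULL || alt_dump_file != NULL;
}

/* Stand-in: print to every open stream that accepts DUMP_KIND and
   return the number of streams written to.  */
static int
dump_printf (int dump_kind, const char *format, ...)
{
  int written = 0;
  va_list ap;

  if (dump_file && (dump_kind & pflags))
    {
      va_start (ap, format);
      vfprintf (dump_file, format, ap);
      va_end (ap);
      written++;
    }
  if (alt_dump_file && (dump_kind & alt_flags))
    {
      va_start (ap, format);
      vfprintf (alt_dump_file, format, ap);
      va_end (ap);
      written++;
    }
  return written;
}

/* Old style: only the primary dump file can ever see the message, and
   it sees it whenever that file is open.  */
static void
report_unroll_old (int times)
{
  if (dump_file)
    fprintf (dump_file, "loop unrolled %d times\n", times);
}

/* New style: the message carries a kind, and the framework decides
   which streams receive it.  */
static int
report_unroll_new (int times)
{
  if (dump_enabled_p ())
    return dump_printf (MSG_OPTIMIZED_LOCATIONS,
                        "loop unrolled %d times\n", times);
  return 0;
}
```

With only the alternate stream open and MSG_OPTIMIZED_LOCATIONS enabled
on it, report_unroll_new writes one copy of the message while
report_unroll_old writes nothing. That difference is why unconverted
call sites never show up under -fopt-info.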

Patch

Index: dumpfile.c
===================================================================
--- dumpfile.c  (revision 201461)
+++ dumpfile.c  (working copy)
@@ -257,16 +257,18 @@  dump_open_alternate_stream (struct dump_file_info
 void
 dump_loc (int dump_kind, FILE *dfile, source_location loc)
 {
-  /* Currently vectorization passes print location information.  */
   if (dump_kind)
     {
+      /* Ensure dump message starts on a new line.  */
+      fprintf (dfile, "\n");
       if (LOCATION_LOCUS (loc) > BUILTINS_LOCATION)
-        fprintf (dfile, "\n%s:%d: note: ", LOCATION_FILE (loc),
-                 LOCATION_LINE (loc));
+        fprintf (dfile, "%s:%d:%d: note: ", LOCATION_FILE (loc),
+                 LOCATION_LINE (loc), LOCATION_COLUMN (loc));
       else if (current_function_decl)
-        fprintf (dfile, "\n%s:%d: note: ",
+        fprintf (dfile, "%s:%d:%d: note: ",
                  DECL_SOURCE_FILE (current_function_decl),
-                 DECL_SOURCE_LINE (current_function_decl));
+                 DECL_SOURCE_LINE (current_function_decl),
+                 DECL_SOURCE_COLUMN (current_function_decl));
     }
 }

Index: dumpfile.h
===================================================================
--- dumpfile.h  (revision 201461)
+++ dumpfile.h  (working copy)
@@ -97,8 +97,9 @@  enum tree_dump_index
 #define OPTGROUP_LOOP        (1 << 2)   /* Loop optimization passes */
 #define OPTGROUP_INLINE      (1 << 3)   /* Inlining passes */
 #define OPTGROUP_VEC         (1 << 4)   /* Vectorization passes */
+#define OPTGROUP_OTHER       (1 << 5)   /* All other passes */
 #define OPTGROUP_ALL        (OPTGROUP_IPA | OPTGROUP_LOOP | OPTGROUP_INLINE \
-                              | OPTGROUP_VEC)
+                              | OPTGROUP_VEC | OPTGROUP_OTHER)

 /* Define a tree dump switch.  */
 struct dump_file_info
Index: ipa-inline-transform.c
===================================================================
--- ipa-inline-transform.c      (revision 201461)
+++ ipa-inline-transform.c      (working copy)
@@ -192,6 +192,111 @@  clone_inlined_nodes (struct cgraph_edge *e, bool d
 }


+#define MAX_INT_LENGTH 20
+
+/* Return NODE's name and profile count, if available.  */
+
+static const char *
+cgraph_node_opt_info (struct cgraph_node *node)
+{
+  char *buf;
+  size_t buf_size;
+  const char *bfd_name = lang_hooks.dwarf_name (node->symbol.decl, 0);
+
+  if (!bfd_name)
+    bfd_name = "unknown";
+
+  buf_size = strlen (bfd_name) + 1;
+  if (profile_info)
+    buf_size += (MAX_INT_LENGTH + 3);
+  buf_size += MAX_INT_LENGTH;
+
+  buf = (char *) xmalloc (buf_size);
+
+  /* Write each piece once; sprintf with BUF as both source and destination
+     is undefined behavior.  */
+  sprintf (buf, "%s [%i]", bfd_name, node->symbol.order);
+  if (profile_info)
+    sprintf (buf + strlen (buf), " ("HOST_WIDEST_INT_PRINT_DEC")",
+            node->count);
+  return buf;
+}
+
+
+/* Return CALLER's inlined call chain. Save the cgraph_node of the ultimate
+   function that the caller is inlined to in FINAL_CALLER.  */
+
+static const char *
+cgraph_node_call_chain (struct cgraph_node *caller,
+                       struct cgraph_node **final_caller)
+{
+  struct cgraph_node *node;
+  const char *via_str = " (via inline instance";
+  size_t current_string_len = strlen (via_str) + 1;
+  size_t buf_size = current_string_len;
+  char *buf = (char *) xmalloc (buf_size);
+
+  buf[0] = 0;
+  gcc_assert (caller->global.inlined_to != NULL);
+  strcat (buf, via_str);
+  for (node = caller; node->global.inlined_to != NULL;
+       node = node->callers->caller)
+    {
+      const char *name = cgraph_node_opt_info (node);
+      current_string_len += (strlen (name) + 1);
+      if (current_string_len >= buf_size)
+       {
+         buf_size = current_string_len * 2;
+         buf = (char *) xrealloc (buf, buf_size);
+       }
+      strcat (buf, " ");
+      strcat (buf, name);
+    }
+  strcat (buf, ")");
+  *final_caller = node;
+  return buf;
+}
+
+
+/* Dump the inline decision of EDGE.  */
+
+static void
+dump_inline_decision (struct cgraph_edge *edge, bool early)
+{
+  location_t locus;
+  const char *inline_chain_text;
+  const char *call_count_text;
+  struct cgraph_node *final_caller = edge->caller;
+
+  if (final_caller->global.inlined_to != NULL)
+    inline_chain_text = cgraph_node_call_chain (final_caller, &final_caller);
+  else
+    inline_chain_text = "";
+
+  if (edge->count > 0)
+    {
+      const char *call_count_str = " with call count ";
+      char *buf = (char *) xmalloc (strlen (call_count_str) + MAX_INT_LENGTH);
+      sprintf (buf, "%s"HOST_WIDEST_INT_PRINT_DEC, call_count_str,
+              edge->count);
+      call_count_text = buf;
+    }
+  else
+    {
+      call_count_text = "";
+    }
+
+  locus = gimple_location (edge->call_stmt);
+  dump_printf_loc (early ? MSG_NOTE : MSG_OPTIMIZED_LOCATIONS,
+                   locus,
+                   "%s inlined into %s%s%s\n",
+                   cgraph_node_opt_info (edge->callee),
+                   cgraph_node_opt_info (final_caller),
+                   call_count_text,
+                   inline_chain_text);
+}
+
+
 /* Mark edge E as inlined and update callgraph accordingly.  UPDATE_ORIGINAL
    specify whether profile of original function should be updated.  If any new
    indirect edges are discovered in the process, add them to NEW_EDGES, unless
@@ -205,7 +310,8 @@  clone_inlined_nodes (struct cgraph_edge *e, bool d
 bool
 inline_call (struct cgraph_edge *e, bool update_original,
             vec<cgraph_edge_p> *new_edges,
-            int *overall_size, bool update_overall_summary)
+            int *overall_size, bool update_overall_summary,
+             bool early)
 {
   int old_size = 0, new_size = 0;
   struct cgraph_node *to = NULL;
@@ -218,6 +324,9 @@  inline_call (struct cgraph_edge *e, bool update_or
   bool predicated = inline_edge_summary (e)->predicate != NULL;
 #endif

+  if (dump_enabled_p ())
+    dump_inline_decision (e, early);
+
   /* Don't inline inlined edges.  */
   gcc_assert (e->inline_failed);
   /* Don't even think of inlining inline clone.  */
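
An aside on the label built by cgraph_node_opt_info above: a minimal
standalone sketch of producing "name [order] (count)" by appending at an
offset, so the buffer is never passed to the call that writes it.  The
helper name format_node_label, its parameters, and the use of long long for
the count are assumptions made for illustration; nothing here is GCC code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of the patch: format "NAME [ORDER]" into
   BUF, followed by " (COUNT)" when HAVE_PROFILE is nonzero.  Returns the
   length the full label needs, excluding the terminating NUL, so a caller
   can detect truncation by comparing the result against BUF_SIZE.  */
static int
format_node_label (char *buf, size_t buf_size, const char *name,
                   int order, int have_profile, long long count)
{
  int len, more;
  size_t used;

  assert (buf != NULL && buf_size > 0);
  len = snprintf (buf, buf_size, "%s [%i]", name ? name : "unknown", order);
  if (len < 0 || !have_profile)
    return len;

  /* Append after what was actually stored; BUF is never read as an
     argument of the call that writes it.  */
  used = (size_t) len < buf_size ? (size_t) len : buf_size - 1;
  more = snprintf (buf + used, buf_size - used, " (%lld)", count);
  return more < 0 ? more : len + more;
}
```

A caller that wants an exactly sized heap buffer could call the helper once
with a small stack buffer, allocate the returned length plus one, and call it
again.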
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi     (revision 201461)
+++ doc/invoke.texi     (working copy)
@@ -6234,6 +6234,9 @@  Enable dumps from all loop optimizations.
 Enable dumps from all inlining optimizations.
 @item vec
 Enable dumps from all vectorization optimizations.
+@item optall
+Enable dumps from all optimizations. This is a superset of
+the optimization groups listed above.
 @end table

 For example,
Index: profile.c
===================================================================
--- profile.c   (revision 201461)
+++ profile.c   (working copy)
@@ -432,8 +432,8 @@  read_profile_edge_counts (gcov_type *exec_counts)
                    if (flag_profile_correction)
                      {
                        static bool informed = 0;
-                       if (!informed)
-                         inform (input_location,
+                       if (dump_enabled_p () && !informed)
+                         dump_printf_loc (MSG_NOTE, input_location,
                                  "corrupted profile info: edge count exceeds maximal count");
                        informed = 1;
                      }
@@ -692,10 +692,11 @@  compute_branch_probabilities (unsigned cfg_checksu
        {
          /* Inconsistency detected. Make it flow-consistent. */
          static int informed = 0;
-         if (informed == 0)
+         if (dump_enabled_p () && informed == 0)
            {
              informed = 1;
-             inform (input_location, "correcting inconsistent profile data");
+             dump_printf_loc (MSG_NOTE, input_location,
+                              "correcting inconsistent profile data");
            }
          correct_negative_edge_counts ();
          /* Set bb counts to the sum of the outgoing edge counts */
Index: passes.c
===================================================================
--- passes.c    (revision 201461)
+++ passes.c    (working copy)
@@ -524,6 +524,11 @@  pass_manager::register_one_dump_file (struct opt_p
   flag_name = concat (prefix, name, num, NULL);
   glob_name = concat (prefix, name, NULL);
   optgroup_flags |= pass->optinfo_flags;
+  /* For any passes that do not have an optgroup set, and which are not
+     IPA passes setup above, set the optgroup to OPTGROUP_OTHER so that
+     any dump messages are emitted properly under -fopt-info(-optall).  */
+  if (optgroup_flags == OPTGROUP_NONE)
+    optgroup_flags = OPTGROUP_OTHER;
   id = dump_register (dot_name, flag_name, glob_name, flags, optgroup_flags);
   set_pass_for_id (id, pass);
   full_name = concat (prefix, pass->name, num, NULL);
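
To illustrate how the OPTGROUP_OTHER fallback above interacts with the
OPTGROUP_ALL mask from the dumpfile.h hunk, here is a small standalone
sketch.  The values of OPTGROUP_NONE and OPTGROUP_IPA are not shown in the
hunk and are assumed here, and the two helper functions are hypothetical
stand-ins, not GCC functions.

```c
/* Bit values follow the patched dumpfile.h hunk.  OPTGROUP_NONE and
   OPTGROUP_IPA do not appear in the hunk; their values are assumed.  */
#define OPTGROUP_NONE    (0)
#define OPTGROUP_IPA     (1 << 1)
#define OPTGROUP_LOOP    (1 << 2)
#define OPTGROUP_INLINE  (1 << 3)
#define OPTGROUP_VEC     (1 << 4)
#define OPTGROUP_OTHER   (1 << 5)
#define OPTGROUP_ALL     (OPTGROUP_IPA | OPTGROUP_LOOP | OPTGROUP_INLINE \
                          | OPTGROUP_VEC | OPTGROUP_OTHER)

/* Hypothetical stand-in for the fallback added to register_one_dump_file:
   a pass that belongs to no group is filed under OPTGROUP_OTHER.  */
static int
effective_optgroup (int pass_flags)
{
  return pass_flags == OPTGROUP_NONE ? OPTGROUP_OTHER : pass_flags;
}

/* Hypothetical check: nonzero if a pass with PASS_FLAGS would be covered
   by the REQUESTED group mask.  */
static int
optgroup_selected (int pass_flags, int requested)
{
  return (effective_optgroup (pass_flags) & requested) != 0;
}
```

Under these assumptions a pass with no group is covered by OPTGROUP_ALL but
not by a request for OPTGROUP_VEC alone, which is the behavior the comment in
the hunk describes.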
Index: value-prof.c
===================================================================
--- value-prof.c        (revision 201461)
+++ value-prof.c        (working copy)
@@ -585,9 +585,11 @@  check_counter (gimple stmt, const char * name,
               : DECL_SOURCE_LOCATION (current_function_decl);
       if (flag_profile_correction)
         {
-         inform (locus, "correcting inconsistent value profile: "
-                 "%s profiler overall count (%d) does not match BB count "
-                  "(%d)", name, (int)*all, (int)bb_count);
+          if (dump_enabled_p ())
+            dump_printf_loc (MSG_MISSED_OPTIMIZATION, locus,
+                             "correcting inconsistent value profile: %s "
+                             "profiler overall count (%d) does not match BB "
+                             "count (%d)", name, (int)*all, (int)bb_count);
          *all = bb_count;
          if (*count > *all)
             *count = *all;
@@ -1209,9 +1211,11 @@  find_func_by_funcdef_no (int func_id)
   int max_id = get_last_funcdef_no ();
   if (func_id >= max_id || cgraph_node_map[func_id] == NULL)
     {
-      if (flag_profile_correction)
-        inform (DECL_SOURCE_LOCATION (current_function_decl),
-                "Inconsistent profile: indirect call target (%d) does not exist", func_id);
+      if (flag_profile_correction && dump_enabled_p ())
+        dump_printf_loc (MSG_MISSED_OPTIMIZATION,
+                         DECL_SOURCE_LOCATION (current_function_decl),
+                         "Inconsistent profile: indirect call target (%d) "
+                         "does not exist", func_id);
       else
         error ("Inconsistent profile: indirect call target (%d) does not exist", func_id);

@@ -1235,8 +1239,10 @@  check_ic_target (gimple call_stmt, struct cgraph_n
      return true;

    locus =  gimple_location (call_stmt);
-   inform (locus, "Skipping target %s with mismatching types for icall ",
-           cgraph_node_name (target));
+   if (dump_enabled_p ())
+     dump_printf_loc (MSG_MISSED_OPTIMIZATION, locus,
+                      "Skipping target %s with mismatching types for icall ",
+                      cgraph_node_name (target));
    return false;
 }

Index: coverage.c
===================================================================
--- coverage.c  (revision 201461)
+++ coverage.c  (working copy)
@@ -43,6 +43,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "langhooks.h"
 #include "hash-table.h"
 #include "tree-iterator.h"
+#include "tree-pass.h"
 #include "cgraph.h"
 #include "dumpfile.h"
 #include "diagnostic-core.h"
@@ -341,11 +342,13 @@  get_coverage_counts (unsigned counter, unsigned ex
     {
       static int warned = 0;

-      if (!warned++)
-       inform (input_location, (flag_guess_branch_prob
-                ? "file %s not found, execution counts estimated"
-                : "file %s not found, execution counts assumed to be zero"),
-               da_file_name);
+      if (!warned++ && dump_enabled_p ())
+       dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
+                         (flag_guess_branch_prob
+                          ? "file %s not found, execution counts estimated"
+                          : "file %s not found, execution counts assumed to "
+                            "be zero"),
+                         da_file_name);
       return NULL;
     }

@@ -369,21 +372,25 @@  get_coverage_counts (unsigned counter, unsigned ex
        warning_at (input_location, OPT_Wcoverage_mismatch,
                    "the control flow of function %qE does not match "
                    "its profile data (counter %qs)", id, ctr_names[counter]);
-      if (warning_printed)
+      if (warning_printed && dump_enabled_p ())
        {
-        inform (input_location, "use -Wno-error=coverage-mismatch to tolerate "
-                "the mismatch but performance may drop if the function is hot");
+          dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
+                           "use -Wno-error=coverage-mismatch to tolerate "
+                           "the mismatch but performance may drop if the "
+                           "function is hot");

          if (!seen_error ()
              && !warned++)
            {
-             inform (input_location, "coverage mismatch ignored");
-             inform (input_location, flag_guess_branch_prob
-                     ? G_("execution counts estimated")
-                     : G_("execution counts assumed to be zero"));
+             dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
+                               "coverage mismatch ignored");
+             dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
+                               flag_guess_branch_prob
+                               ? G_("execution counts estimated")
+                               : G_("execution counts assumed to be zero"));
              if (!flag_guess_branch_prob)
-               inform (input_location,
-                       "this can result in poorly optimized code");
+               dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, input_location,
+                                 "this can result in poorly optimized code");
            }
        }

@@ -1103,6 +1110,11 @@  coverage_init (const char *filename)
   int len = strlen (filename);
   int prefix_len = 0;

+  /* Since coverage_init is invoked very early, before the pass
+     manager, we need to set up the dumping explicitly. This is
+     similar to the handling in finish_optimization_passes.  */
+  dump_start (pass_profile.pass.static_pass_number, NULL);
+
   if (!profile_data_prefix && !IS_ABSOLUTE_PATH (filename))
     profile_data_prefix = getpwd ();

@@ -1145,6 +1157,8 @@  coverage_init (const char *filename)
          gcov_write_unsigned (bbg_file_stamp);
        }
     }
+
+  dump_finish (pass_profile.pass.static_pass_number);
 }

 /* Performs file-level cleanup.  Close notes file, generate coverage
Index: ipa-inline.c
===================================================================
--- ipa-inline.c        (revision 201461)
+++ ipa-inline.c        (working copy)
@@ -1322,7 +1322,7 @@  recursive_inlining (struct cgraph_edge *edge,
           reset_edge_growth_cache (curr);
        }

-      inline_call (curr, false, new_edges, &overall_size, true);
+      inline_call (curr, false, new_edges, &overall_size, true, false);
       lookup_recursive_calls (node, curr->callee, heap);
       n++;
     }
@@ -1612,7 +1612,8 @@  inline_small_functions (void)
            fprintf (dump_file, " Peeling recursion with depth %i\n", depth);

          gcc_checking_assert (!callee->global.inlined_to);
-         inline_call (edge, true, &new_indirect_edges, &overall_size, true);
+         inline_call (edge, true, &new_indirect_edges, &overall_size, true,
+                       false);
          if (flag_indirect_inlining)
            add_new_edges_to_heap (edge_heap, new_indirect_edges);

@@ -1733,7 +1734,7 @@  flatten_function (struct cgraph_node *node, bool e
                 xstrdup (cgraph_node_name (callee)),
                 xstrdup (cgraph_node_name (e->caller)));
       orig_callee = callee;
-      inline_call (e, true, NULL, NULL, false);
+      inline_call (e, true, NULL, NULL, false, early);
       if (e->callee != orig_callee)
        orig_callee->symbol.aux = (void *) node;
       flatten_function (e->callee, early);
@@ -1852,7 +1853,8 @@  ipa_inline (void)
                                   inline_summary (node->callers->caller)->size);
                        }

-                     inline_call (node->callers, true, NULL, NULL, true);
+                     inline_call (node->callers, true, NULL, NULL, true,
+                                   false);
                      if (dump_file)
                        fprintf (dump_file,
                                 " Inlined into %s which now has %i size\n",
@@ -1925,7 +1927,7 @@  inline_always_inline_functions (struct cgraph_node
        fprintf (dump_file, "  Inlining %s into %s (always_inline).\n",
                 xstrdup (cgraph_node_name (e->callee)),
                 xstrdup (cgraph_node_name (e->caller)));
-      inline_call (e, true, NULL, NULL, false);
+      inline_call (e, true, NULL, NULL, false, true);
       inlined = true;
     }
   if (inlined)
@@ -1977,7 +1979,7 @@  early_inline_small_functions (struct cgraph_node *
        fprintf (dump_file, " Inlining %s into %s.\n",
                 xstrdup (cgraph_node_name (callee)),
                 xstrdup (cgraph_node_name (e->caller)));
-      inline_call (e, true, NULL, NULL, true);
+      inline_call (e, true, NULL, NULL, true, true);
       inlined = true;
     }

Index: ipa-inline.h
===================================================================
--- ipa-inline.h        (revision 201461)
+++ ipa-inline.h        (working copy)
@@ -228,7 +228,8 @@  void free_growth_caches (void);
 void compute_inline_parameters (struct cgraph_node *, bool);

 /* In ipa-inline-transform.c  */
-bool inline_call (struct cgraph_edge *, bool, vec<cgraph_edge_p> *, int *, bool);
+bool inline_call (struct cgraph_edge *, bool, vec<cgraph_edge_p> *, int *,
+                  bool, bool);
 unsigned int inline_transform (struct cgraph_node *);
 void clone_inlined_nodes (struct cgraph_edge *e, bool, bool, int *);

Index: testsuite/gcc.dg/pr40209.c
===================================================================
--- testsuite/gcc.dg/pr40209.c  (revision 201461)
+++ testsuite/gcc.dg/pr40209.c  (working copy)
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fprofile-use" } */
+/* { dg-options "-O2 -fprofile-use -fopt-info" } */

 void process(const char *s);

Index: testsuite/gcc.dg/pr26570.c
===================================================================
--- testsuite/gcc.dg/pr26570.c  (revision 201461)
+++ testsuite/gcc.dg/pr26570.c  (working copy)
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fprofile-generate -fprofile-use" } */
+/* { dg-options "-O2 -fprofile-generate -fprofile-use -fopt-info" } */

 unsigned test (unsigned a, unsigned b)
 {
Index: testsuite/gcc.dg/pr32773.c
===================================================================
--- testsuite/gcc.dg/pr32773.c  (revision 201461)
+++ testsuite/gcc.dg/pr32773.c  (working copy)
@@ -1,6 +1,6 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O -fprofile-use" } */
-/* { dg-options "-O -m4 -fprofile-use" { target sh-*-* } } */
+/* { dg-options "-O -fprofile-use -fopt-info" } */
+/* { dg-options "-O -m4 -fprofile-use -fopt-info" { target sh-*-* } } */

 void foo (int *p)
 {
Index: testsuite/g++.dg/tree-ssa/dom-invalid.C
===================================================================
--- testsuite/g++.dg/tree-ssa/dom-invalid.C     (revision 201461)
+++ testsuite/g++.dg/tree-ssa/dom-invalid.C     (working copy)
@@ -1,7 +1,7 @@ 
 // PR tree-optimization/39557
 // invalid post-dom info leads to infinite loop
 // { dg-do run }
-// { dg-options "-Wall -fno-exceptions -O2 -fprofile-use -fno-rtti" }
+// { dg-options "-Wall -fno-exceptions -O2 -fprofile-use -fopt-info -fno-rtti" }

 struct C
 {
Index: testsuite/gcc.dg/inline-dump.c
===================================================================
--- testsuite/gcc.dg/inline-dump.c      (revision 0)
+++ testsuite/gcc.dg/inline-dump.c      (revision 0)
@@ -0,0 +1,11 @@ 
+/* Verify that -fopt-info can output correct inline info.  */
+/* { dg-do compile } */
+/* { dg-options "-Wall -fopt-info-inline=stderr -O2 -fno-early-inlining" } */
+static inline int leaf() {
+  int i, ret = 0;
+  for (i = 0; i < 10; i++)
+    ret += i;
+  return ret;
+}
+static inline int foo(void) { return leaf(); } /* { dg-message "note: leaf .*inlined into bar .*via inline instance foo.*\n" } */