Patchwork decide edge's hotness when there is profile info

login
register
mail settings
Submitter Teresa Johnson
Date Nov. 8, 2013, 9:58 p.m.
Message ID <CAAe5K+UOn4wxvwsGhy_EjiFFQMDB-emRbpNJkY5bBf6QgRJo4w@mail.gmail.com>
Download mbox | patch
Permalink /patch/289925/
State New
Headers show

Comments

Teresa Johnson - Nov. 8, 2013, 9:58 p.m.
On Tue, Nov 5, 2013 at 6:02 AM, Teresa Johnson <tejohnson@google.com> wrote:
> Ping.
> Thanks, Teresa
>

I updated the patch to include the follow-on work to apply call counts
to 0-weight functions using the estimated frequencies when inlining.
This helps avoid 0-weight blocks of inlined code which are
particularly bad for function splitting.

Bootstrapped and tested on x86-64-unknown-linux-gnu. Also tested on
profiledbootstrap and profile-use build of SPEC cpu2006. Ok for trunk?

Thanks,
Teresa

2013-11-08  Teresa Johnson  <tejohnson@google.com>
            Jan Hubicka  <jh@suse.cz>

        * predict.c (drop_profile): New function.
        (handle_missing_profiles): Ditto.
        (counts_to_freqs): Don't overwrite estimated frequencies
        when function has no profile counts.
        * predict.h (handle_missing_profiles): Declare.
        * tree-inline.c (freqs_to_counts): New function.
        (copy_cfg_body): Invoke freqs_to_counts as needed.
        * tree-profile.c (tree_profiling): Invoke handle_missing_profiles.


> On Wed, Oct 30, 2013 at 1:15 PM, Teresa Johnson <tejohnson@google.com> wrote:
>> On Fri, Oct 18, 2013 at 2:17 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>>> Here is the patch updated to use the new parameter from r203830.
>>>> Passed bootstrap and regression tests.
>>>>
>>>> 2013-10-18  Jan Hubicka  <jh@suse.cz>
>>>>             Teresa Johnson  <tejohnson@google.com>
>>>>
>>>>         * predict.c (handle_missing_profiles): New function.
>>>>         (counts_to_freqs): Don't overwrite estimated frequencies
>>>>         when function has no profile counts.
>>>>         * predict.h (handle_missing_profiles): Declare.
>>>>         * tree-profile.c (tree_profiling): Invoke handle_missing_profiles.
>>>>
>>>> Index: predict.c
>>>> ===================================================================
>>>> --- predict.c   (revision 203830)
>>>> +++ predict.c   (working copy)
>>>> @@ -2758,6 +2758,40 @@ estimate_loops (void)
>>>>    BITMAP_FREE (tovisit);
>>>>  }
>>>>
>>>
>>> You need block comment. It should explain the problem of COMDATs and how they are handled.
>>> It is not an obvious thing.
>>
>> Done.
>>
>>>
>>>> +void
>>>> +handle_missing_profiles (void)
>>>> +{
>>>> +  struct cgraph_node *node;
>>>> +  int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION);
>>> extra line
>>>> +  /* See if 0 count function has non-0 count callers.  In this case we
>>>> +     lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
>>>> +  FOR_EACH_DEFINED_FUNCTION (node)
>>>> +    {
>>>> +      struct cgraph_edge *e;
>>>> +      gcov_type call_count = 0;
>>>> +      struct function *fn = DECL_STRUCT_FUNCTION (node->symbol.decl);
>>> Extra line
>>>> +      if (node->count)
>>>> +        continue;
>>>> +      for (e = node->callers; e; e = e->next_caller)
>>>> +        call_count += e->count;
>>> What about checking if the sum is way off even for non-0 counts.
>>> I.e. for case where function was inlined to some calls but not to others?  In
>>> that case we may want to scale up the counts (with some resonable care for
>>> capping)
>>
>> In this patch I am not changing any counts, so I am leaving this one
>> for follow-on work (even for the routines missing counts completely
>> like these, we don't apply any counts, just mark them as guessed. I
>> have a follow-on patch to send once this goes in that does apply
>> counts to these 0-count routines only when we decide to inline as we
>> discussed).
>>
>>>
>>> Also I think the code really should propagate - i.e. have simple worklist and
>>> look for functions that are COMDAT and are called by other COMDAT functions
>>> where we decided to drop the profile.  Having non-trivial chains of comdats is
>>> common thing.
>>
>> Done.
>>
>>>
>>> What about outputting user visible warning/error on the incosnsistency when
>>> no COMDAT function is involved same way as we do for BB profile?
>>
>> Done. This one caught the fact that we have this situation for "extern
>> template" functions as well when I tested on cpu2006. I added in a
>> check to ignore those when issuing the warning (they are not emitted
>> thus don't get any profile counts).
>>
>> Updated patch below.
>>
>> Bootstrapped and tested on x86-64-unknown-linux-gnu. Also tested on
>> profiledbootstrap and profile-use build of SPEC cpu2006. Ok for trunk?
>>
>> Thanks,
>> Teresa
>>
>> 2013-10-30  Teresa Johnson  <tejohnson@google.com>
>>
>>         * predict.c (drop_profile): New function.
>>         (handle_missing_profiles): Ditto.
>>         (counts_to_freqs): Don't overwrite estimated frequencies
>>         when function has no profile counts.
>>         * predict.h (handle_missing_profiles): Declare.
>>         * tree-profile.c (tree_profiling): Invoke handle_missing_profiles.
>>
>> Index: predict.c
>> ===================================================================
>> --- predict.c   (revision 204213)
>> +++ predict.c   (working copy)
>> @@ -2765,6 +2765,107 @@ estimate_loops (void)
>>    BITMAP_FREE (tovisit);
>>  }
>>
>> +/* Drop the profile for NODE to guessed, and update its frequency based on
>> +   whether it is expected to be HOT.  */
>> +
>> +static void
>> +drop_profile (struct cgraph_node *node, bool hot)
>> +{
>> +  struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
>> +
>> +  if (dump_file)
>> +    fprintf (dump_file,
>> +             "Dropping 0 profile for %s/%i. %s based on calls.\n",
>> +             cgraph_node_name (node), node->order,
>> +             hot ? "Function is hot" : "Function is normal");
>> +  /* We only expect to miss profiles for functions that are reached
>> +     via non-zero call edges in cases where the function may have
>> +     been linked from another module or library (COMDATs and extern
>> +     templates). See the comments below for handle_missing_profiles.  */
>> +  if (!DECL_COMDAT (node->decl) && !DECL_EXTERNAL (node->decl))
>> +    warning (0,
>> +             "Missing counts for called function %s/%i",
>> +             cgraph_node_name (node), node->order);
>> +
>> +  profile_status_for_function (fn)
>> +      = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
>> +  node->frequency
>> +      = hot ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;
>> +}
>> +
>> +/* In the case of COMDAT routines, multiple object files will contain the same
>> +   function and the linker will select one for the binary. In that case
>> +   all the other copies from the profile instrument binary will be missing
>> +   profile counts. Look for cases where this happened, due to non-zero
>> +   call counts going to 0-count functions, and drop the profile to guessed
>> +   so that we can use the estimated probabilities and avoid optimizing only
>> +   for size.
>> +
>> +   The other case where the profile may be missing is when the routine
>> +   is not going to be emitted to the object file, e.g. for "extern template"
>> +   class methods. Those will be marked DECL_EXTERNAL. Emit a warning in
>> +   all other cases of non-zero calls to 0-count functions.  */
>> +
>> +void
>> +handle_missing_profiles (void)
>> +{
>> +  struct cgraph_node *node;
>> +  int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION);
>> +  vec<struct cgraph_node *> worklist;
>> +  worklist.create (64);
>> +
>> +  /* See if 0 count function has non-0 count callers.  In this case we
>> +     lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
>> +  FOR_EACH_DEFINED_FUNCTION (node)
>> +    {
>> +      struct cgraph_edge *e;
>> +      gcov_type call_count = 0;
>> +      struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
>> +
>> +      if (node->count)
>> +        continue;
>> +      for (e = node->callers; e; e = e->next_caller)
>> +        call_count += e->count;
>> +      if (call_count
>> +          && fn && fn->cfg
>> +          && (call_count * unlikely_count_fraction >= profile_info->runs))
>> +        {
>> +          bool maybe_hot = maybe_hot_count_p (NULL, call_count);
>> +
>> +          drop_profile (node, maybe_hot);
>> +          worklist.safe_push (node);
>> +        }
>> +    }
>> +
>> +  /* Propagate the profile dropping to other 0-count COMDATs that are
>> +     potentially called by COMDATs we already dropped the profile on.  */
>> +  while (worklist.length () > 0)
>> +    {
>> +      struct cgraph_edge *e;
>> +
>> +      node = worklist.pop ();
>> +      for (e = node->callees; e; e = e->next_caller)
>> +        {
>> +          struct cgraph_node *callee = e->callee;
>> +          struct function *fn = DECL_STRUCT_FUNCTION (callee->decl);
>> +
>> +          if (callee->count > 0)
>> +            continue;
>> +          if (DECL_COMDAT (callee->decl) && fn && fn->cfg
>> +              && profile_status_for_function (fn) == PROFILE_READ)
>> +            {
>> +              /* Since there are no non-0 call counts to this function,
>> +                 we don't know for sure whether it is hot. Indicate to
>> +                 the drop_profile routine that function should be marked
>> +                 normal, rather than hot.  */
>> +              drop_profile (node, false);
>> +              worklist.safe_push (callee);
>> +            }
>> +        }
>> +    }
>> +  worklist.release ();
>> +}
>> +
>>  /* Convert counts measured by profile driven feedback to frequencies.
>>     Return nonzero iff there was any nonzero execution count.  */
>>
>> @@ -2774,6 +2875,9 @@ counts_to_freqs (void)
>>    gcov_type count_max, true_count_max = 0;
>>    basic_block bb;
>>
>> +  if (!ENTRY_BLOCK_PTR->count)
>> +    return 0;
>> +
>>    FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb)
>>      true_count_max = MAX (bb->count, true_count_max);
>>
>> Index: predict.h
>> ===================================================================
>> --- predict.h   (revision 204213)
>> +++ predict.h   (working copy)
>> @@ -37,6 +37,7 @@ enum prediction
>>
>>  extern void predict_insn_def (rtx, enum br_predictor, enum prediction);
>>  extern int counts_to_freqs (void);
>> +extern void handle_missing_profiles (void);
>>  extern void estimate_bb_frequencies (bool);
>>  extern const char *predictor_name (enum br_predictor);
>>  extern tree build_predict_expr (enum br_predictor, enum prediction);
>> Index: tree-profile.c
>> ===================================================================
>> --- tree-profile.c      (revision 204213)
>> +++ tree-profile.c      (working copy)
>> @@ -612,6 +612,8 @@ tree_profiling (void)
>>        pop_cfun ();
>>      }
>>
>> +  handle_missing_profiles ();
>> +
>>    del_node_map ();
>>    return 0;
>>  }
>>
>>>
>>> Honza
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
Steven Bosscher - Nov. 8, 2013, 10:14 p.m.
On Fri, Nov 8, 2013 at 10:58 PM, Teresa Johnson wrote:

> +static void
> +drop_profile (struct cgraph_node *node, bool hot)
> +{
> +  struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
> +
> +  if (dump_file)
> +    fprintf (dump_file,
> +             "Dropping 0 profile for %s/%i. %s based on calls.\n",
> +             cgraph_node_name (node), node->order,
> +             hot ? "Function is hot" : "Function is normal");
> +  /* We only expect to miss profiles for functions that are reached
> +     via non-zero call edges in cases where the function may have
> +     been linked from another module or library (COMDATs and extern
> +     templates). See the comments below for handle_missing_profiles.  */
> +  if (!DECL_COMDAT (node->decl) && !DECL_EXTERNAL (node->decl))
> +    warning (0,
> +             "Missing counts for called function %s/%i",
> +             cgraph_node_name (node), node->order);

Maybe OPT_Wdisabled_optimization?



> +
> +  profile_status_for_function (fn)
> +      = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
> +  node->frequency
> +      = hot ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;

In GCC code style the = goes at the end of the line:

  profile_status_for_function (fn)
    (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
  node->frequency =
    hot ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;



> @@ -2774,6 +2875,9 @@ counts_to_freqs (void)
>    gcov_type count_max, true_count_max = 0;
>    basic_block bb;
>
> +  if (!ENTRY_BLOCK_PTR->count)
> +    return 0;
> +

Deserves a one-line comment ;-)




> +void
> +freqs_to_counts (struct cgraph_node *node, gcov_type count)
> +{
> +  basic_block bb;
> +  edge_iterator ei;
> +  edge e;
> +  struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
> +
> +  FOR_ALL_BB_FN(bb, fn)
> +  {
> +    bb->count = apply_scale (count,
> +                             GCOV_COMPUTE_SCALE (bb->frequency, BB_FREQ_MAX));
> +    FOR_EACH_EDGE (e, ei, bb->succs)
> +      e->count = apply_probability (e->src->count, e->probability);
> +  }

Indent +2:

  FOR_ALL_BB_FN (bb, fn)
    {
      ...
    }


Ciao!
Steven
Eric Botcazou - Nov. 10, 2013, 12:08 p.m.
> > +
> > +  profile_status_for_function (fn)
> > +      = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
> > +  node->frequency
> > +      = hot ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;
> 
> In GCC code style the = goes at the end of the line:
> 
>   profile_status_for_function (fn)
>     (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
>   node->frequency =
>     hot ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;

Absolutely not, Teresa's version is the correct one, see reload.c for example.
Steven Bosscher - Nov. 10, 2013, 4:37 p.m.
On Sun, Nov 10, 2013 at 1:08 PM, Eric Botcazou wrote:
>> > +
>> > +  profile_status_for_function (fn)
>> > +      = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
>> > +  node->frequency
>> > +      = hot ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;
>>
>> In GCC code style the = goes at the end of the line:
>>
>>   profile_status_for_function (fn)
>>     (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
>>   node->frequency =
>>     hot ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;
>
> Absolutely not, Teresa's version is the correct one, see reload.c for example.


Hmm, "absolutely"?

[stevenb@gcc1-power7 trunk]$ egrep -ch "^\s+= " gcc/*.[ch] | awk
'{sum=sum+$1}END{print sum}'
1797
[stevenb@gcc1-power7 trunk]$ egrep -ch "\s=$" gcc/*.[ch] | awk
'{sum=sum+$1}END{print sum}'
685

I can't find a rule/guide in the GNU or GCC coding style documents.

Anyway, not important. The "=" starting a new line is the majority, so
Theresa please forget about my comment :-)

Ciao!
Steven
Jan Hubicka - Nov. 11, 2013, 3:23 p.m.
> 2013-11-08  Teresa Johnson  <tejohnson@google.com>
>             Jan Hubicka  <jh@suse.cz>
> 
>         * predict.c (drop_profile): New function.
>         (handle_missing_profiles): Ditto.
>         (counts_to_freqs): Don't overwrite estimated frequencies
>         when function has no profile counts.
>         * predict.h (handle_missing_profiles): Declare.
>         * tree-inline.c (freqs_to_counts): New function.
>         (copy_cfg_body): Invoke freqs_to_counts as needed.
>         * tree-profile.c (tree_profiling): Invoke handle_missing_profiles.
> 
> Index: predict.c
> ===================================================================
> --- predict.c   (revision 204516)
> +++ predict.c   (working copy)
> @@ -2765,6 +2765,107 @@ estimate_loops (void)
>    BITMAP_FREE (tovisit);
>  }
> 
> +/* Drop the profile for NODE to guessed, and update its frequency based on
> +   whether it is expected to be HOT.  */
> +
> +static void
> +drop_profile (struct cgraph_node *node, bool hot)
> +{
> +  struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
> +
> +  if (dump_file)
> +    fprintf (dump_file,
> +             "Dropping 0 profile for %s/%i. %s based on calls.\n",
> +             cgraph_node_name (node), node->order,
> +             hot ? "Function is hot" : "Function is normal");
> +  /* We only expect to miss profiles for functions that are reached
> +     via non-zero call edges in cases where the function may have
> +     been linked from another module or library (COMDATs and extern
> +     templates). See the comments below for handle_missing_profiles.  */
> +  if (!DECL_COMDAT (node->decl) && !DECL_EXTERNAL (node->decl))
> +    warning (0,
> +             "Missing counts for called function %s/%i",
> +             cgraph_node_name (node), node->order);
> +
> +  profile_status_for_function (fn)
> +      = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
> +  node->frequency
> +      = hot ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;
> +}
> +
> +/* In the case of COMDAT routines, multiple object files will contain the same
> +   function and the linker will select one for the binary. In that case
> +   all the other copies from the profile instrument binary will be missing
> +   profile counts. Look for cases where this happened, due to non-zero
> +   call counts going to 0-count functions, and drop the profile to guessed
> +   so that we can use the estimated probabilities and avoid optimizing only
> +   for size.
> +
> +   The other case where the profile may be missing is when the routine
> +   is not going to be emitted to the object file, e.g. for "extern template"
> +   class methods. Those will be marked DECL_EXTERNAL. Emit a warning in
> +   all other cases of non-zero calls to 0-count functions.  */
> +
> +void
> +handle_missing_profiles (void)
> +{
> +  struct cgraph_node *node;
> +  int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION);
> +  vec<struct cgraph_node *> worklist;
> +  worklist.create (64);
> +
> +  /* See if 0 count function has non-0 count callers.  In this case we
> +     lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
> +  FOR_EACH_DEFINED_FUNCTION (node)
> +    {
> +      struct cgraph_edge *e;
> +      gcov_type call_count = 0;
> +      struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
> +
> +      if (node->count)
> +        continue;
> +      for (e = node->callers; e; e = e->next_caller)
> +        call_count += e->count;
> +      if (call_count
> +          && fn && fn->cfg
> +          && (call_count * unlikely_count_fraction >= profile_info->runs))
> +        {
> +          bool maybe_hot = maybe_hot_count_p (NULL, call_count);
> +
> +          drop_profile (node, maybe_hot);
> +          worklist.safe_push (node);
> +        }

Perhaps we should add an note/error on mishandled profile when function is not COMDAT?
That way we may notice further bugs in this area.
> +    }
> +
> +  /* Propagate the profile dropping to other 0-count COMDATs that are
> +     potentially called by COMDATs we already dropped the profile on.  */
> +  while (worklist.length () > 0)
> +    {
> +      struct cgraph_edge *e;
> +
> +      node = worklist.pop ();
> +      for (e = node->callees; e; e = e->next_caller)
> +        {
> +          struct cgraph_node *callee = e->callee;
> +          struct function *fn = DECL_STRUCT_FUNCTION (callee->decl);
> +
> +          if (callee->count > 0)
> +            continue;
> +          if (DECL_COMDAT (callee->decl) && fn && fn->cfg
> +              && profile_status_for_function (fn) == PROFILE_READ)

Perhaps we can check here maybe_hot_bb_p for the caller and rely on profile estimate
to give us false only for really known to be unlikely paths? (i.e. EH handling, noreturns
etc.)
> +            {
> +              /* Since there are no non-0 call counts to this function,
> +                 we don't know for sure whether it is hot. Indicate to
> +                 the drop_profile routine that function should be marked
> +                 normal, rather than hot.  */
> +              drop_profile (node, false);
> +              worklist.safe_push (callee);
> +            }
> +        }
> +    }
> +  worklist.release ();

I would add a pointer set so we avoid duplicates in worklist.  This is potentially quadratic
for large programs.

OK, with these changes.

Honza
Teresa Johnson - Nov. 11, 2013, 3:55 p.m.
On Mon, Nov 11, 2013 at 7:23 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> 2013-11-08  Teresa Johnson  <tejohnson@google.com>
>>             Jan Hubicka  <jh@suse.cz>
>>
>>         * predict.c (drop_profile): New function.
>>         (handle_missing_profiles): Ditto.
>>         (counts_to_freqs): Don't overwrite estimated frequencies
>>         when function has no profile counts.
>>         * predict.h (handle_missing_profiles): Declare.
>>         * tree-inline.c (freqs_to_counts): New function.
>>         (copy_cfg_body): Invoke freqs_to_counts as needed.
>>         * tree-profile.c (tree_profiling): Invoke handle_missing_profiles.
>>
>> Index: predict.c
>> ===================================================================
>> --- predict.c   (revision 204516)
>> +++ predict.c   (working copy)
>> @@ -2765,6 +2765,107 @@ estimate_loops (void)
>>    BITMAP_FREE (tovisit);
>>  }
>>
>> +/* Drop the profile for NODE to guessed, and update its frequency based on
>> +   whether it is expected to be HOT.  */
>> +
>> +static void
>> +drop_profile (struct cgraph_node *node, bool hot)
>> +{
>> +  struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
>> +
>> +  if (dump_file)
>> +    fprintf (dump_file,
>> +             "Dropping 0 profile for %s/%i. %s based on calls.\n",
>> +             cgraph_node_name (node), node->order,
>> +             hot ? "Function is hot" : "Function is normal");
>> +  /* We only expect to miss profiles for functions that are reached
>> +     via non-zero call edges in cases where the function may have
>> +     been linked from another module or library (COMDATs and extern
>> +     templates). See the comments below for handle_missing_profiles.  */
>> +  if (!DECL_COMDAT (node->decl) && !DECL_EXTERNAL (node->decl))
>> +    warning (0,
>> +             "Missing counts for called function %s/%i",
>> +             cgraph_node_name (node), node->order);
>> +
>> +  profile_status_for_function (fn)
>> +      = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
>> +  node->frequency
>> +      = hot ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;
>> +}
>> +
>> +/* In the case of COMDAT routines, multiple object files will contain the same
>> +   function and the linker will select one for the binary. In that case
>> +   all the other copies from the profile instrument binary will be missing
>> +   profile counts. Look for cases where this happened, due to non-zero
>> +   call counts going to 0-count functions, and drop the profile to guessed
>> +   so that we can use the estimated probabilities and avoid optimizing only
>> +   for size.
>> +
>> +   The other case where the profile may be missing is when the routine
>> +   is not going to be emitted to the object file, e.g. for "extern template"
>> +   class methods. Those will be marked DECL_EXTERNAL. Emit a warning in
>> +   all other cases of non-zero calls to 0-count functions.  */
>> +
>> +void
>> +handle_missing_profiles (void)
>> +{
>> +  struct cgraph_node *node;
>> +  int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION);
>> +  vec<struct cgraph_node *> worklist;
>> +  worklist.create (64);
>> +
>> +  /* See if 0 count function has non-0 count callers.  In this case we
>> +     lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
>> +  FOR_EACH_DEFINED_FUNCTION (node)
>> +    {
>> +      struct cgraph_edge *e;
>> +      gcov_type call_count = 0;
>> +      struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
>> +
>> +      if (node->count)
>> +        continue;
>> +      for (e = node->callers; e; e = e->next_caller)
>> +        call_count += e->count;
>> +      if (call_count
>> +          && fn && fn->cfg
>> +          && (call_count * unlikely_count_fraction >= profile_info->runs))
>> +        {
>> +          bool maybe_hot = maybe_hot_count_p (NULL, call_count);
>> +
>> +          drop_profile (node, maybe_hot);
>> +          worklist.safe_push (node);
>> +        }
>
> Perhaps we should add an note/error on mishandled profile when function is not COMDAT?
> That way we may notice further bugs in this area.

I have a warning like that already in drop_profile(). Is that
sufficient? Also, Steven Bosscher suggested putting that warning under
OPT_Wdisabled_optimization instead of '0', what do you prefer for
that?

>> +    }
>> +
>> +  /* Propagate the profile dropping to other 0-count COMDATs that are
>> +     potentially called by COMDATs we already dropped the profile on.  */
>> +  while (worklist.length () > 0)
>> +    {
>> +      struct cgraph_edge *e;
>> +
>> +      node = worklist.pop ();
>> +      for (e = node->callees; e; e = e->next_caller)
>> +        {
>> +          struct cgraph_node *callee = e->callee;
>> +          struct function *fn = DECL_STRUCT_FUNCTION (callee->decl);
>> +
>> +          if (callee->count > 0)
>> +            continue;
>> +          if (DECL_COMDAT (callee->decl) && fn && fn->cfg
>> +              && profile_status_for_function (fn) == PROFILE_READ)
>
> Perhaps we can check here maybe_hot_bb_p for the caller and rely on profile estimate
> to give us false only for really known to be unlikely paths? (i.e. EH handling, noreturns
> etc.)

Ok, let me try this.

>> +            {
>> +              /* Since there are no non-0 call counts to this function,
>> +                 we don't know for sure whether it is hot. Indicate to
>> +                 the drop_profile routine that function should be marked
>> +                 normal, rather than hot.  */
>> +              drop_profile (node, false);
>> +              worklist.safe_push (callee);
>> +            }
>> +        }
>> +    }
>> +  worklist.release ();
>
> I would add a pointer set so we avoid duplicates in worklist.  This is potentially quadratic
> for large programs.

I'll add a visited_nodes set to keep track of processed nodes so they
don't get added to the worklist multiple times.

Thanks,
Teresa

>
> OK, with these changes.
>
> Honza
Jan Hubicka - Nov. 11, 2013, 4:22 p.m.
> I have a warning like that already in drop_profile(). Is that

I think it should be warning (or silent) for COMDAT and error/note for
other functions (depending on flag_profile_correction).
I guess drop_profile is better place for it indeed.

> sufficient? Also, Steven Bosscher suggested putting that warning under
> OPT_Wdisabled_optimization instead of '0', what do you prefer for
> that?

It is a bit different case than other disabled optimizations we have (where
the optimization does not happen because of --param tunable limits), but I
am fine with both alternatives.
The warning may end up quite noisy so having way to silence it definitely
makes sense.
> 
> >> +    }
> >> +
> >> +  /* Propagate the profile dropping to other 0-count COMDATs that are
> >> +     potentially called by COMDATs we already dropped the profile on.  */
> >> +  while (worklist.length () > 0)
> >> +    {
> >> +      struct cgraph_edge *e;
> >> +
> >> +      node = worklist.pop ();
> >> +      for (e = node->callees; e; e = e->next_caller)
> >> +        {
> >> +          struct cgraph_node *callee = e->callee;
> >> +          struct function *fn = DECL_STRUCT_FUNCTION (callee->decl);
> >> +
> >> +          if (callee->count > 0)
> >> +            continue;
> >> +          if (DECL_COMDAT (callee->decl) && fn && fn->cfg
> >> +              && profile_status_for_function (fn) == PROFILE_READ)
> >
> > Perhaps we can check here maybe_hot_bb_p for the caller and rely on profile estimate
> > to give us false only for really known to be unlikely paths? (i.e. EH handling, noreturns
> > etc.)
> 
> Ok, let me try this.
> 
> >> +            {
> >> +              /* Since there are no non-0 call counts to this function,
> >> +                 we don't know for sure whether it is hot. Indicate to
> >> +                 the drop_profile routine that function should be marked
> >> +                 normal, rather than hot.  */
> >> +              drop_profile (node, false);
> >> +              worklist.safe_push (callee);
> >> +            }
> >> +        }
> >> +    }
> >> +  worklist.release ();
> >
> > I would add a pointer set so we avoid duplicates in worklist.  This is potentially quadratic
> > for large programs.
> 
> I'll add a visited_nodes set to keep track of processed nodes so they
> don't get added to the worklist multiple times.

Perhaps you can also track this by testing profile_status_for_function. Both solutions are fine for me.

Honza
> 
> Thanks,
> Teresa
> 
> >
> > OK, with these changes.
> >
> > Honza
> 
> 
> 
> -- 
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
Teresa Johnson - Nov. 11, 2013, 5:05 p.m.
On Mon, Nov 11, 2013 at 8:22 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> I have a warning like that already in drop_profile(). Is that
>
> I think it should be warning (or silent) for COMDAT and error/note for
> other functions (depending on flag_profile_correction).
> I guess drop_profile is better place for it indeed.

I don't want to warn for COMDAT since this is currently an expected
issue in that case, so I'm afraid it will be too noisy (but there is a
note emitted to the dump file to track how often this occurs). So
currently the warning is only emitted in cases where we don't expect
it to occur (e.g. non-comdat).

>
>> sufficient? Also, Steven Bosscher suggested putting that warning under
>> OPT_Wdisabled_optimization instead of '0', what do you prefer for
>> that?
>
> It is a bit different case than other disabled optimizations we have (where
> the optimization does not happen because of --param tunable limits), but I
> am fine with both alternatives.
> The warning may end up quite noisy so having way to silence it definitely
> makes sense.

I didn't find any warnings being emitted when running this for spec or
internal benchmarks, so how about instead of a warning, I handle it
like you suggest above: depending on the setting of
flag_profile_correction either error or emit a note to the dump file
under MSG_MISSED_OPTIMIZATION?

>>
>> >> +    }
>> >> +
>> >> +  /* Propagate the profile dropping to other 0-count COMDATs that are
>> >> +     potentially called by COMDATs we already dropped the profile on.  */
>> >> +  while (worklist.length () > 0)
>> >> +    {
>> >> +      struct cgraph_edge *e;
>> >> +
>> >> +      node = worklist.pop ();
>> >> +      for (e = node->callees; e; e = e->next_caller)
>> >> +        {
>> >> +          struct cgraph_node *callee = e->callee;
>> >> +          struct function *fn = DECL_STRUCT_FUNCTION (callee->decl);
>> >> +
>> >> +          if (callee->count > 0)
>> >> +            continue;
>> >> +          if (DECL_COMDAT (callee->decl) && fn && fn->cfg
>> >> +              && profile_status_for_function (fn) == PROFILE_READ)
>> >
>> > Perhaps we can check here maybe_hot_bb_p for the caller and rely on profile estimate
>> > to give us false only for really known to be unlikely paths? (i.e. EH handling, noreturns
>> > etc.)
>>
>> Ok, let me try this.
>>
>> >> +            {
>> >> +              /* Since there are no non-0 call counts to this function,
>> >> +                 we don't know for sure whether it is hot. Indicate to
>> >> +                 the drop_profile routine that function should be marked
>> >> +                 normal, rather than hot.  */
>> >> +              drop_profile (node, false);
>> >> +              worklist.safe_push (callee);
>> >> +            }
>> >> +        }
>> >> +    }
>> >> +  worklist.release ();
>> >
>> > I would add a pointer set so we avoid duplicates in worklist.  This is potentially quadratic
>> > for large programs.
>>
>> I'll add a visited_nodes set to keep track of processed nodes so they
>> don't get added to the worklist multiple times.
>
> Perhaps you can also track this by testing profile_status_for_function. Both solutions are fine for me.

Ah - I just realized I am already checking profile_status_for_function
above and adding to the worklist if it is still PROFILE_READ. Since I
call drop_profile before adding to the worklist, we can't end up
adding multiple times. So I don't think there is currently an issue
with this.

Thanks,
Teresa

>
> Honza
>>
>> Thanks,
>> Teresa
>>
>> >
>> > OK, with these changes.
>> >
>> > Honza
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
Jan Hubicka - Nov. 11, 2013, 5:13 p.m.
> On Mon, Nov 11, 2013 at 8:22 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
> >> I have a warning like that already in drop_profile(). Is that
> >
> > I think it should be warning (or silent) for COMDAT and error/note for
> > other functions (depending on flag_profile_correction).
> > I guess drop_profile is better place for it indeed.
> 
> I don't want to warn for COMDAT since this is currently an expected
> issue in that case, so I'm afraid it will be too noisy (but there is a
> note emitted to the dump file to track how often this occurs). So
> currently the warning is only emitted in cases where we don't expect
> it to occur (e.g. non-comdat).

I see, I misread it.
The non-comdat warning IMO ought go the same way as the other profile confussions.
I guess something like
if (!flag_profile_correction)
  error (...);
like existing cases in profile.c
> 
> I didn't find any warnings being emitted when running this for spec or
> internal benchmarks, so how about instead of a warning, I handle it
> like you suggest above: depending on the setting of
> flag_profile_correction either error or emit a note to the dump file
> under MSG_MISSED_OPTIMIZATION?

Yep, lets just go with error with !flag_profile_correction and being silent
otherwise.
If we want to go with warnings, we need to change all the similar places
in profile.c and elsewhere.
> 
> Ah - I just realized I am already checking profile_status_for_function
> above and adding to the worklist if it is still PROFILE_READ. Since I
> call drop_profile before adding to the worklist, we can't end up
> adding multiple times. So I don't think there is currently an issue
> with this.

Yep, indeed.

The patch is OK with the error message as discussed above.

Thanks.
Honza
> 
> Thanks,
> Teresa
> 
> >
> > Honza
> >>
> >> Thanks,
> >> Teresa
> >>
> >> >
> >> > OK, with these changes.
> >> >
> >> > Honza
> >>
> >>
> >>
> >> --
> >> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
> 
> 
> 
> -- 
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
Teresa Johnson - Nov. 11, 2013, 5:17 p.m.
On Mon, Nov 11, 2013 at 9:13 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> On Mon, Nov 11, 2013 at 8:22 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> >> I have a warning like that already in drop_profile(). Is that
>> >
>> > I think it should be warning (or silent) for COMDAT and error/note for
>> > other functions (depending on flag_profile_correction).
>> > I guess drop_profile is better place for it indeed.
>>
>> I don't want to warn for COMDAT since this is currently an expected
>> issue in that case, so I'm afraid it will be too noisy (but there is a
>> note emitted to the dump file to track how often this occurs). So
>> currently the warning is only emitted in cases where we don't expect
>> it to occur (e.g. non-comdat).
>
> I see, I misread it.
> The non-comdat warning IMO ought go the same way as the other profile confussions.
> I guess something like
> if (!flag_profile_correction)
>   error (...);
> like existing cases in profile.c

Ok, will do (emitting a note to the dump file as in other cases in
profile.c that do this same thing).

>>
>> I didn't find any warnings being emitted when running this for spec or
>> internal benchmarks, so how about instead of a warning, I handle it
>> like you suggest above: depending on the setting of
>> flag_profile_correction either error or emit a note to the dump file
>> under MSG_MISSED_OPTIMIZATION?
>
> Yep, lets just go with error with !flag_profile_correction and being silent
> otherwise.
> If we want to go with warnings, we need to change all the similar places
> in profile.c and elsewhere.
>>
>> Ah - I just realized I am already checking profile_status_for_function
>> above and adding to the worklist if it is still PROFILE_READ. Since I
>> call drop_profile before adding to the worklist, we can't end up
>> adding multiple times. So I don't think there is currently an issue
>> with this.
>
> Yep, indeed.
>
> The patch is OK with the error message as discussed above.

Ok, will do that and fix a couple of minor issues mentioned by Steven
(missing comment and incorrect indentation in one spot). Will retest
just to make sure I didn't miss any warnings which will now be errors.

Thanks,
Teresa

>
> Thanks.
> Honza
>>
>> Thanks,
>> Teresa
>>
>> >
>> > Honza
>> >>
>> >> Thanks,
>> >> Teresa
>> >>
>> >> >
>> >> > OK, with these changes.
>> >> >
>> >> > Honza
>> >>
>> >>
>> >>
>> >> --
>> >> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

Patch

Index: predict.c
===================================================================
--- predict.c   (revision 204516)
+++ predict.c   (working copy)
@@ -2765,6 +2765,107 @@  estimate_loops (void)
   BITMAP_FREE (tovisit);
 }

+/* Drop the profile for NODE to guessed, and update its frequency based on
+   whether it is expected to be HOT.  */
+
+static void
+drop_profile (struct cgraph_node *node, bool hot)
+{
+  struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
+
+  if (dump_file)
+    fprintf (dump_file,
+             "Dropping 0 profile for %s/%i. %s based on calls.\n",
+             cgraph_node_name (node), node->order,
+             hot ? "Function is hot" : "Function is normal");
+  /* We only expect to miss profiles for functions that are reached
+     via non-zero call edges in cases where the function may have
+     been linked from another module or library (COMDATs and extern
+     templates). See the comments below for handle_missing_profiles.  */
+  if (!DECL_COMDAT (node->decl) && !DECL_EXTERNAL (node->decl))
+    warning (0,
+             "Missing counts for called function %s/%i",
+             cgraph_node_name (node), node->order);
+
+  profile_status_for_function (fn)
+      = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
+  node->frequency
+      = hot ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;
+}
+
+/* In the case of COMDAT routines, multiple object files will contain the same
+   function and the linker will select one for the binary. In that case
+   all the other copies from the profile instrument binary will be missing
+   profile counts. Look for cases where this happened, due to non-zero
+   call counts going to 0-count functions, and drop the profile to guessed
+   so that we can use the estimated probabilities and avoid optimizing only
+   for size.
+
+   The other case where the profile may be missing is when the routine
+   is not going to be emitted to the object file, e.g. for "extern template"
+   class methods. Those will be marked DECL_EXTERNAL. Emit a warning in
+   all other cases of non-zero calls to 0-count functions.  */
+
+void
+handle_missing_profiles (void)
+{
+  struct cgraph_node *node;
+  int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION);
+  vec<struct cgraph_node *> worklist;
+  worklist.create (64);
+
+  /* See if 0 count function has non-0 count callers.  In this case we
+     lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
+  FOR_EACH_DEFINED_FUNCTION (node)
+    {
+      struct cgraph_edge *e;
+      gcov_type call_count = 0;
+      struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
+
+      if (node->count)
+        continue;
+      for (e = node->callers; e; e = e->next_caller)
+        call_count += e->count;
+      if (call_count
+          && fn && fn->cfg
+          && (call_count * unlikely_count_fraction >= profile_info->runs))
+        {
+          bool maybe_hot = maybe_hot_count_p (NULL, call_count);
+
+          drop_profile (node, maybe_hot);
+          worklist.safe_push (node);
+        }
+    }
+
+  /* Propagate the profile dropping to other 0-count COMDATs that are
+     potentially called by COMDATs we already dropped the profile on.  */
+  while (worklist.length () > 0)
+    {
+      struct cgraph_edge *e;
+
+      node = worklist.pop ();
+      for (e = node->callees; e; e = e->next_caller)
+        {
+          struct cgraph_node *callee = e->callee;
+          struct function *fn = DECL_STRUCT_FUNCTION (callee->decl);
+
+          if (callee->count > 0)
+            continue;
+          if (DECL_COMDAT (callee->decl) && fn && fn->cfg
+              && profile_status_for_function (fn) == PROFILE_READ)
+            {
+              /* Since there are no non-0 call counts to this function,
+                 we don't know for sure whether it is hot. Indicate to
+                 the drop_profile routine that function should be marked
+                 normal, rather than hot.  */
+              drop_profile (node, false);
+              worklist.safe_push (callee);
+            }
+        }
+    }
+  worklist.release ();
+}
+
 /* Convert counts measured by profile driven feedback to frequencies.
    Return nonzero iff there was any nonzero execution count.  */

@@ -2774,6 +2875,9 @@  counts_to_freqs (void)
   gcov_type count_max, true_count_max = 0;
   basic_block bb;

+  if (!ENTRY_BLOCK_PTR->count)
+    return 0;
+
   FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb)
     true_count_max = MAX (bb->count, true_count_max);

Index: predict.h
===================================================================
--- predict.h   (revision 204516)
+++ predict.h   (working copy)
@@ -37,6 +37,7 @@  enum prediction

 extern void predict_insn_def (rtx, enum br_predictor, enum prediction);
 extern int counts_to_freqs (void);
+extern void handle_missing_profiles (void);
 extern void estimate_bb_frequencies (bool);
 extern const char *predictor_name (enum br_predictor);
 extern tree build_predict_expr (enum br_predictor, enum prediction);
Index: tree-inline.c
===================================================================
--- tree-inline.c       (revision 204516)
+++ tree-inline.c       (working copy)
@@ -2353,6 +2353,29 @@  redirect_all_calls (copy_body_data * id, basic_blo
     }
 }

+/* Convert estimated frequencies into counts for NODE, scaling COUNT
+   with each bb's frequency. Used when NODE has a 0-weight entry
+   but we are about to inline it into a non-zero count call bb.
+   See the comments for handle_missing_profiles() in predict.c for
+   when this can happen for COMDATs.  */
+
+void
+freqs_to_counts (struct cgraph_node *node, gcov_type count)
+{
+  basic_block bb;
+  edge_iterator ei;
+  edge e;
+  struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
+
+  FOR_ALL_BB_FN(bb, fn)
+  {
+    bb->count = apply_scale (count,
+                             GCOV_COMPUTE_SCALE (bb->frequency, BB_FREQ_MAX));
+    FOR_EACH_EDGE (e, ei, bb->succs)
+      e->count = apply_probability (e->src->count, e->probability);
+  }
+}
+
 /* Make a copy of the body of FN so that it can be inserted inline in
    another function.  Walks FN via CFG, returns new fndecl.  */

@@ -2373,6 +2396,24 @@  copy_cfg_body (copy_body_data * id, gcov_type coun
   int incoming_frequency = 0;
   gcov_type incoming_count = 0;

+  /* This can happen for COMDAT routines that end up with 0 counts
+     despite being called (see the comments for handle_missing_profiles()
+     in predict.c as to why). Apply counts to the blocks in the callee
+     before inlining, using the guessed edge frequencies, so that we don't
+     end up with a 0-count inline body which can confuse downstream
+     optimizations such as function splitting.  */
+  if (!ENTRY_BLOCK_PTR_FOR_FUNCTION (src_cfun)->count && count)
+    {
+      /* Apply the larger of the call bb count and the total incoming
+         call edge count to the callee.  */
+      gcov_type in_count = 0;
+      struct cgraph_edge *in_edge;
+      for (in_edge = id->src_node->callers; in_edge;
+           in_edge = in_edge->next_caller)
+        in_count += in_edge->count;
+      freqs_to_counts (id->src_node, count > in_count ? count : in_count);
+    }
+
   if (ENTRY_BLOCK_PTR_FOR_FUNCTION (src_cfun)->count)
     count_scale
         = GCOV_COMPUTE_SCALE (count,
Index: tree-profile.c
===================================================================
--- tree-profile.c      (revision 204516)
+++ tree-profile.c      (working copy)
@@ -612,6 +612,8 @@  tree_profiling (void)
       pop_cfun ();
     }

+  handle_missing_profiles ();
+
   del_node_map ();
   return 0;
 }