Patchwork decide edge's hotness when there is profile info

login
register
mail settings
Submitter Teresa Johnson
Date Oct. 30, 2013, 8:15 p.m.
Message ID <CAAe5K+XKp_5eBhEXxF29OmdrRv9nyr+aXW+PCdGH4hsFGY0X3w@mail.gmail.com>
Download mbox | patch
Permalink /patch/287337/
State New
Headers show

Comments

Teresa Johnson - Oct. 30, 2013, 8:15 p.m.
On Fri, Oct 18, 2013 at 2:17 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> Here is the patch updated to use the new parameter from r203830.
>> Passed bootstrap and regression tests.
>>
>> 2013-10-18  Jan Hubicka  <jh@suse.cz>
>>             Teresa Johnson  <tejohnson@google.com>
>>
>>         * predict.c (handle_missing_profiles): New function.
>>         (counts_to_freqs): Don't overwrite estimated frequencies
>>         when function has no profile counts.
>>         * predict.h (handle_missing_profiles): Declare.
>>         * tree-profile.c (tree_profiling): Invoke handle_missing_profiles.
>>
>> Index: predict.c
>> ===================================================================
>> --- predict.c   (revision 203830)
>> +++ predict.c   (working copy)
>> @@ -2758,6 +2758,40 @@ estimate_loops (void)
>>    BITMAP_FREE (tovisit);
>>  }
>>
>
> You need block comment. It should explain the problem of COMDATs and how they are handled.
> It is not an obvious thing.

Done.

>
>> +void
>> +handle_missing_profiles (void)
>> +{
>> +  struct cgraph_node *node;
>> +  int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION);
> extra line
>> +  /* See if 0 count function has non-0 count callers.  In this case we
>> +     lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
>> +  FOR_EACH_DEFINED_FUNCTION (node)
>> +    {
>> +      struct cgraph_edge *e;
>> +      gcov_type call_count = 0;
>> +      struct function *fn = DECL_STRUCT_FUNCTION (node->symbol.decl);
> Extra line
>> +      if (node->count)
>> +        continue;
>> +      for (e = node->callers; e; e = e->next_caller)
>> +        call_count += e->count;
> What about checking if the sum is way off even for non-0 counts.
> I.e. for case where function was inlined to some calls but not to others?  In
> that case we may want to scale up the counts (with some resonable care for
> capping)

In this patch I am not changing any counts, so I am leaving this one
for follow-on work (even for the routines missing counts completely
like these, we don't apply any counts, just mark them as guessed. I
have a follow-on patch to send once this goes in that does apply
counts to these 0-count routines only when we decide to inline as we
discussed).

>
> Also I think the code really should propagate - i.e. have simple worklist and
> look for functions that are COMDAT and are called by other COMDAT functions
> where we decided to drop the profile.  Having non-trivial chains of comdats is
> common thing.

Done.

>
> What about outputting user visible warning/error on the incosnsistency when
> no COMDAT function is involved same way as we do for BB profile?

Done. This one caught the fact that we have this situation for "extern
template" functions as well when I tested on cpu2006. I added in a
check to ignore those when issuing the warning (they are not emitted
thus don't get any profile counts).

Updated patch below.

Bootstrapped and tested on x86-64-unknown-linux-gnu. Also tested on
profiledbootstrap and profile-use build of SPEC cpu2006. Ok for trunk?

Thanks,
Teresa

2013-10-30  Teresa Johnson  <tejohnson@google.com>

        * predict.c (drop_profile): New function.
        (handle_missing_profiles): Ditto.
        (counts_to_freqs): Don't overwrite estimated frequencies
        when function has no profile counts.
        * predict.h (handle_missing_profiles): Declare.
        * tree-profile.c (tree_profiling): Invoke handle_missing_profiles.


>
> Honza
Teresa Johnson - Nov. 5, 2013, 2:02 p.m.
Ping.
Thanks, Teresa

On Wed, Oct 30, 2013 at 1:15 PM, Teresa Johnson <tejohnson@google.com> wrote:
> On Fri, Oct 18, 2013 at 2:17 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>> Here is the patch updated to use the new parameter from r203830.
>>> Passed bootstrap and regression tests.
>>>
>>> 2013-10-18  Jan Hubicka  <jh@suse.cz>
>>>             Teresa Johnson  <tejohnson@google.com>
>>>
>>>         * predict.c (handle_missing_profiles): New function.
>>>         (counts_to_freqs): Don't overwrite estimated frequencies
>>>         when function has no profile counts.
>>>         * predict.h (handle_missing_profiles): Declare.
>>>         * tree-profile.c (tree_profiling): Invoke handle_missing_profiles.
>>>
>>> Index: predict.c
>>> ===================================================================
>>> --- predict.c   (revision 203830)
>>> +++ predict.c   (working copy)
>>> @@ -2758,6 +2758,40 @@ estimate_loops (void)
>>>    BITMAP_FREE (tovisit);
>>>  }
>>>
>>
>> You need block comment. It should explain the problem of COMDATs and how they are handled.
>> It is not an obvious thing.
>
> Done.
>
>>
>>> +void
>>> +handle_missing_profiles (void)
>>> +{
>>> +  struct cgraph_node *node;
>>> +  int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION);
>> extra line
>>> +  /* See if 0 count function has non-0 count callers.  In this case we
>>> +     lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
>>> +  FOR_EACH_DEFINED_FUNCTION (node)
>>> +    {
>>> +      struct cgraph_edge *e;
>>> +      gcov_type call_count = 0;
>>> +      struct function *fn = DECL_STRUCT_FUNCTION (node->symbol.decl);
>> Extra line
>>> +      if (node->count)
>>> +        continue;
>>> +      for (e = node->callers; e; e = e->next_caller)
>>> +        call_count += e->count;
>> What about checking if the sum is way off even for non-0 counts.
>> I.e. for case where function was inlined to some calls but not to others?  In
>> that case we may want to scale up the counts (with some resonable care for
>> capping)
>
> In this patch I am not changing any counts, so I am leaving this one
> for follow-on work (even for the routines missing counts completely
> like these, we don't apply any counts, just mark them as guessed. I
> have a follow-on patch to send once this goes in that does apply
> counts to these 0-count routines only when we decide to inline as we
> discussed).
>
>>
>> Also I think the code really should propagate - i.e. have simple worklist and
>> look for functions that are COMDAT and are called by other COMDAT functions
>> where we decided to drop the profile.  Having non-trivial chains of comdats is
>> common thing.
>
> Done.
>
>>
>> What about outputting user visible warning/error on the incosnsistency when
>> no COMDAT function is involved same way as we do for BB profile?
>
> Done. This one caught the fact that we have this situation for "extern
> template" functions as well when I tested on cpu2006. I added in a
> check to ignore those when issuing the warning (they are not emitted
> thus don't get any profile counts).
>
> Updated patch below.
>
> Bootstrapped and tested on x86-64-unknown-linux-gnu. Also tested on
> profiledbootstrap and profile-use build of SPEC cpu2006. Ok for trunk?
>
> Thanks,
> Teresa
>
> 2013-10-30  Teresa Johnson  <tejohnson@google.com>
>
>         * predict.c (drop_profile): New function.
>         (handle_missing_profiles): Ditto.
>         (counts_to_freqs): Don't overwrite estimated frequencies
>         when function has no profile counts.
>         * predict.h (handle_missing_profiles): Declare.
>         * tree-profile.c (tree_profiling): Invoke handle_missing_profiles.
>
> Index: predict.c
> ===================================================================
> --- predict.c   (revision 204213)
> +++ predict.c   (working copy)
> @@ -2765,6 +2765,107 @@ estimate_loops (void)
>    BITMAP_FREE (tovisit);
>  }
>
> +/* Drop the profile for NODE to guessed, and update its frequency based on
> +   whether it is expected to be HOT.  */
> +
> +static void
> +drop_profile (struct cgraph_node *node, bool hot)
> +{
> +  struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
> +
> +  if (dump_file)
> +    fprintf (dump_file,
> +             "Dropping 0 profile for %s/%i. %s based on calls.\n",
> +             cgraph_node_name (node), node->order,
> +             hot ? "Function is hot" : "Function is normal");
> +  /* We only expect to miss profiles for functions that are reached
> +     via non-zero call edges in cases where the function may have
> +     been linked from another module or library (COMDATs and extern
> +     templates). See the comments below for handle_missing_profiles.  */
> +  if (!DECL_COMDAT (node->decl) && !DECL_EXTERNAL (node->decl))
> +    warning (0,
> +             "Missing counts for called function %s/%i",
> +             cgraph_node_name (node), node->order);
> +
> +  profile_status_for_function (fn)
> +      = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
> +  node->frequency
> +      = hot ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;
> +}
> +
> +/* In the case of COMDAT routines, multiple object files will contain the same
> +   function and the linker will select one for the binary. In that case
> +   all the other copies from the profile instrument binary will be missing
> +   profile counts. Look for cases where this happened, due to non-zero
> +   call counts going to 0-count functions, and drop the profile to guessed
> +   so that we can use the estimated probabilities and avoid optimizing only
> +   for size.
> +
> +   The other case where the profile may be missing is when the routine
> +   is not going to be emitted to the object file, e.g. for "extern template"
> +   class methods. Those will be marked DECL_EXTERNAL. Emit a warning in
> +   all other cases of non-zero calls to 0-count functions.  */
> +
> +void
> +handle_missing_profiles (void)
> +{
> +  struct cgraph_node *node;
> +  int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION);
> +  vec<struct cgraph_node *> worklist;
> +  worklist.create (64);
> +
> +  /* See if 0 count function has non-0 count callers.  In this case we
> +     lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
> +  FOR_EACH_DEFINED_FUNCTION (node)
> +    {
> +      struct cgraph_edge *e;
> +      gcov_type call_count = 0;
> +      struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
> +
> +      if (node->count)
> +        continue;
> +      for (e = node->callers; e; e = e->next_caller)
> +        call_count += e->count;
> +      if (call_count
> +          && fn && fn->cfg
> +          && (call_count * unlikely_count_fraction >= profile_info->runs))
> +        {
> +          bool maybe_hot = maybe_hot_count_p (NULL, call_count);
> +
> +          drop_profile (node, maybe_hot);
> +          worklist.safe_push (node);
> +        }
> +    }
> +
> +  /* Propagate the profile dropping to other 0-count COMDATs that are
> +     potentially called by COMDATs we already dropped the profile on.  */
> +  while (worklist.length () > 0)
> +    {
> +      struct cgraph_edge *e;
> +
> +      node = worklist.pop ();
> +      for (e = node->callees; e; e = e->next_caller)
> +        {
> +          struct cgraph_node *callee = e->callee;
> +          struct function *fn = DECL_STRUCT_FUNCTION (callee->decl);
> +
> +          if (callee->count > 0)
> +            continue;
> +          if (DECL_COMDAT (callee->decl) && fn && fn->cfg
> +              && profile_status_for_function (fn) == PROFILE_READ)
> +            {
> +              /* Since there are no non-0 call counts to this function,
> +                 we don't know for sure whether it is hot. Indicate to
> +                 the drop_profile routine that function should be marked
> +                 normal, rather than hot.  */
> +              drop_profile (node, false);
> +              worklist.safe_push (callee);
> +            }
> +        }
> +    }
> +  worklist.release ();
> +}
> +
>  /* Convert counts measured by profile driven feedback to frequencies.
>     Return nonzero iff there was any nonzero execution count.  */
>
> @@ -2774,6 +2875,9 @@ counts_to_freqs (void)
>    gcov_type count_max, true_count_max = 0;
>    basic_block bb;
>
> +  if (!ENTRY_BLOCK_PTR->count)
> +    return 0;
> +
>    FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb)
>      true_count_max = MAX (bb->count, true_count_max);
>
> Index: predict.h
> ===================================================================
> --- predict.h   (revision 204213)
> +++ predict.h   (working copy)
> @@ -37,6 +37,7 @@ enum prediction
>
>  extern void predict_insn_def (rtx, enum br_predictor, enum prediction);
>  extern int counts_to_freqs (void);
> +extern void handle_missing_profiles (void);
>  extern void estimate_bb_frequencies (bool);
>  extern const char *predictor_name (enum br_predictor);
>  extern tree build_predict_expr (enum br_predictor, enum prediction);
> Index: tree-profile.c
> ===================================================================
> --- tree-profile.c      (revision 204213)
> +++ tree-profile.c      (working copy)
> @@ -612,6 +612,8 @@ tree_profiling (void)
>        pop_cfun ();
>      }
>
> +  handle_missing_profiles ();
> +
>    del_node_map ();
>    return 0;
>  }
>
>>
>> Honza
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

Patch

Index: predict.c
===================================================================
--- predict.c   (revision 204213)
+++ predict.c   (working copy)
@@ -2765,6 +2765,107 @@  estimate_loops (void)
   BITMAP_FREE (tovisit);
 }

+/* Drop the profile for NODE to guessed, and update its frequency based on
+   whether it is expected to be HOT.  */
+
+static void
+drop_profile (struct cgraph_node *node, bool hot)
+{
+  struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
+
+  if (dump_file)
+    fprintf (dump_file,
+             "Dropping 0 profile for %s/%i. %s based on calls.\n",
+             cgraph_node_name (node), node->order,
+             hot ? "Function is hot" : "Function is normal");
+  /* We only expect to miss profiles for functions that are reached
+     via non-zero call edges in cases where the function may have
+     been linked from another module or library (COMDATs and extern
+     templates). See the comments below for handle_missing_profiles.  */
+  if (!DECL_COMDAT (node->decl) && !DECL_EXTERNAL (node->decl))
+    warning (0,
+             "Missing counts for called function %s/%i",
+             cgraph_node_name (node), node->order);
+
+  profile_status_for_function (fn)
+      = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
+  node->frequency
+      = hot ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;
+}
+
+/* In the case of COMDAT routines, multiple object files will contain the same
+   function and the linker will select one for the binary. In that case
+   all the other copies from the profile instrument binary will be missing
+   profile counts. Look for cases where this happened, due to non-zero
+   call counts going to 0-count functions, and drop the profile to guessed
+   so that we can use the estimated probabilities and avoid optimizing only
+   for size.
+
+   The other case where the profile may be missing is when the routine
+   is not going to be emitted to the object file, e.g. for "extern template"
+   class methods. Those will be marked DECL_EXTERNAL. Emit a warning in
+   all other cases of non-zero calls to 0-count functions.  */
+
+void
+handle_missing_profiles (void)
+{
+  struct cgraph_node *node;
+  int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION);
+  vec<struct cgraph_node *> worklist;
+  worklist.create (64);
+
+  /* See if 0 count function has non-0 count callers.  In this case we
+     lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
+  FOR_EACH_DEFINED_FUNCTION (node)
+    {
+      struct cgraph_edge *e;
+      gcov_type call_count = 0;
+      struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
+
+      if (node->count)
+        continue;
+      for (e = node->callers; e; e = e->next_caller)
+        call_count += e->count;
+      if (call_count
+          && fn && fn->cfg
+          && (call_count * unlikely_count_fraction >= profile_info->runs))
+        {
+          bool maybe_hot = maybe_hot_count_p (NULL, call_count);
+
+          drop_profile (node, maybe_hot);
+          worklist.safe_push (node);
+        }
+    }
+
+  /* Propagate the profile dropping to other 0-count COMDATs that are
+     potentially called by COMDATs we already dropped the profile on.  */
+  while (worklist.length () > 0)
+    {
+      struct cgraph_edge *e;
+
+      node = worklist.pop ();
+      for (e = node->callees; e; e = e->next_caller)
+        {
+          struct cgraph_node *callee = e->callee;
+          struct function *fn = DECL_STRUCT_FUNCTION (callee->decl);
+
+          if (callee->count > 0)
+            continue;
+          if (DECL_COMDAT (callee->decl) && fn && fn->cfg
+              && profile_status_for_function (fn) == PROFILE_READ)
+            {
+              /* Since there are no non-0 call counts to this function,
+                 we don't know for sure whether it is hot. Indicate to
+                 the drop_profile routine that function should be marked
+                 normal, rather than hot.  */
+              drop_profile (node, false);
+              worklist.safe_push (callee);
+            }
+        }
+    }
+  worklist.release ();
+}
+
 /* Convert counts measured by profile driven feedback to frequencies.
    Return nonzero iff there was any nonzero execution count.  */

@@ -2774,6 +2875,9 @@  counts_to_freqs (void)
   gcov_type count_max, true_count_max = 0;
   basic_block bb;

+  if (!ENTRY_BLOCK_PTR->count)
+    return 0;
+
   FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb)
     true_count_max = MAX (bb->count, true_count_max);

Index: predict.h
===================================================================
--- predict.h   (revision 204213)
+++ predict.h   (working copy)
@@ -37,6 +37,7 @@  enum prediction

 extern void predict_insn_def (rtx, enum br_predictor, enum prediction);
 extern int counts_to_freqs (void);
+extern void handle_missing_profiles (void);
 extern void estimate_bb_frequencies (bool);
 extern const char *predictor_name (enum br_predictor);
 extern tree build_predict_expr (enum br_predictor, enum prediction);
Index: tree-profile.c
===================================================================
--- tree-profile.c      (revision 204213)
+++ tree-profile.c      (working copy)
@@ -612,6 +612,8 @@  tree_profiling (void)
       pop_cfun ();
     }

+  handle_missing_profiles ();
+
   del_node_map ();
   return 0;
 }