Patchwork Sanitize block partitioning under -freorder-blocks-and-partition

Submitter Jan Hubicka
Date Aug. 21, 2013, 3:25 p.m.
Message ID <20130821152527.GB16124@kam.mff.cuni.cz>
Permalink /patch/268845/
State New

Comments

Jan Hubicka - Aug. 21, 2013, 3:25 p.m.
> >
> > Because an offline COMDAT function will be produced for every COMDAT used,
> > I think it is bad to produce any COMDAT (or any function reachable via
> > calls with non-0 count) that has an empty profile (either because it got
> > lost by COMDAT merging or because of a reading mismatch).
> 
> The approach this patch takes is to simply treat those functions the
> same as we would if we didn't feed back profile data in the first
> place, by using the frequencies. This is sufficient except when one is
> inlined, which is why I have the special handling in the inliner
> itself.

Yes, my original plan was to have a per-function profile_status that
specifies whether the profile is read, guessed or absent, and to make sane
function-local decisions for each setting.

Here, when we read the function, we set its profile to READ (with all counts
being 0).  We should drop it to GUESSED when we see non-0 count edges
calling the function in question, and we should probably also check whether it
is obviously hot (i.e. reachable by a hot call) and promote its function
profile to HOT, to make code placement less bad...
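Sketched in simplified form, that decision might look like the function below. This is a hedged illustration, not GCC code: `struct caller`, `sanitize_zero_profile` and the flattened caller array are hypothetical stand-ins for walking `node->callers` on the real cgraph.

```c
#include <stdint.h>
#include <assert.h>

typedef int64_t gcov_type;

enum profile_status { PROFILE_ABSENT, PROFILE_GUESSED, PROFILE_READ };

/* Hypothetical stand-in for a cgraph caller edge: its count and whether
   the edge is considered hot.  */
struct caller { gcov_type count; int hot; };

/* Decide the new profile status of a function whose read profile is all
   zero: if any caller has a non-zero count, the profile was evidently
   lost (e.g. by COMDAT merging), so degrade READ to GUESSED; a hot
   caller additionally asks for the callee to be marked hot.  */
enum profile_status
sanitize_zero_profile (gcov_type fn_count, enum profile_status status,
                       const struct caller *callers, int n_callers,
                       int *make_hot)
{
  *make_hot = 0;
  if (fn_count != 0 || status != PROFILE_READ)
    return status;
  for (int i = 0; i < n_callers; i++)
    if (callers[i].count != 0)
      {
        if (callers[i].hot)
          *make_hot = 1;
        return PROFILE_GUESSED;
      }
  return status;
}
```

The patch at the bottom of this mail implements this idea directly on the cgraph, in the pass that reads the profile in.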
> >
> > Since new direct calls can be discovered later, inline may want to do that
> > again each time it inlines non-0 count call of COMDAT with 0 count...
> 
> How about an approach like this:
> - Invoke init_and_estimate_bb_frequencies as I am doing to guess the
> profiles at profile read time for functions with 0 counts.

I see, here we are out of sync. 
We always used to go with estimated frequencies for functions with 0 counts,
but it seems that this code broke when prediction was moved before profiling.
(We should also keep the edge probabilities from predict.c in that case.)

The estimated profile is already there before the profile is read in, so we
only need to avoid overwriting it.  Does the following work for you?


> - At inline time, invoke some kind of freqs_to_counts routine for any
> 0-count routine that is reached by non-zero call edges. It would take

We should not need that since frequencies ought to be there.

> the sum of all incoming call edge counts and synthesize counts for the
> bbs using the guessed profile frequencies applied earlier by
> init_and_estimate_bb_frequencies. Then the inliner can do its normal
> bb count scaling.

Yes, I guess we should go this way.  Still, we will need to watch for overly
large values; recursive inlining can easily produce quite some nonsense here.

We will also need to solve the problem that in this case cgraph edges will
have a 0 profile.  We probably want to play the same game there and just do
the scaling for the edge counts, since IPA passes probably do not want to
care about partial profiles.
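The freqs_to_counts scheme discussed above (synthesize a count for each basic block from its guessed frequency, scaled by the sum of incoming call-edge counts, while saturating so recursive inlining cannot blow the numbers up) can be sketched as follows. This is an illustrative sketch only; `synthesize_bb_count` and the `max_count` cap are assumptions, though `BB_FREQ_MAX` mirrors the constant GCC uses for block frequencies.

```c
#include <stdint.h>

typedef int64_t gcov_type;

/* Frequencies are fixed-point with this base, as in GCC's basic-block.h.  */
#define BB_FREQ_MAX 10000

/* Given the sum of incoming call-edge counts and a block's guessed
   frequency, synthesize a count for the block.  The result is capped at
   max_count (a hypothetical limit chosen by the caller) so that repeated
   scaling, e.g. under recursive inlining, cannot produce runaway values.  */
static gcov_type
synthesize_bb_count (gcov_type sum_incoming, int bb_frequency,
                     gcov_type max_count)
{
  gcov_type count = sum_incoming * (gcov_type) bb_frequency / BB_FREQ_MAX;
  if (count > max_count)
    count = max_count;
  return count;
}
```

A block guessed to run half as often as function entry would then get half the incoming count, and a pathologically scaled block would be clamped instead of overflowing.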
> 
> Does that seem like a reasonable approach?
> 
> There is one other fix in this patch:
> - The clone_inlined_nodes/update_noncloned_frequencies changes below
> are handling a different case: 0-count call edge in this module, with
> non-zero callee node count due to calls from other modules. It will
> allow update_noncloned_frequencies to scale down the edge counts in
> callee's cloned call tree. This was a fix I made for the
> callgraph-based linker plugin function reordering, and not splitting
> (since it is using both the node and edge weights to make ordering
> decisions). Here's a description of the issue when I was debugging it:

Yes, it seems reasonable.  I did not really care about COMDATs at the time
I was writing this function.
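The count scaling Teresa describes boils down to two fixed-point helpers. The versions below are simplified stand-ins for GCC's `GCOV_COMPUTE_SCALE` and `apply_scale` (rounded division over a `REG_BR_PROB_BASE` fixed point), shown only to illustrate the arithmetic in the hunk that follows:

```c
#include <stdint.h>

typedef int64_t gcov_type;

/* Fixed-point base for probabilities and scales, as in GCC.  */
#define REG_BR_PROB_BASE 10000

/* Express num/den as a fixed-point scale, with rounding; a zero
   denominator yields the identity scale.  Simplified stand-in for
   GCOV_COMPUTE_SCALE.  */
static gcov_type
compute_scale (gcov_type num, gcov_type den)
{
  return den ? (num * REG_BR_PROB_BASE + den / 2) / den : REG_BR_PROB_BASE;
}

/* Apply such a scale to a count, with rounding.  Simplified stand-in
   for apply_scale.  */
static gcov_type
apply_scale_ (gcov_type count, gcov_type scale)
{
  return (count * scale + REG_BR_PROB_BASE / 2) / REG_BR_PROB_BASE;
}
```

So if this module's call edge has count 30 but the COMDAT callee accumulated count 120 from other modules, the callee's call tree is scaled by 30/120, and an edge of count 200 inside it becomes 50.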
> >> Index: ipa-inline-transform.c
> >> ===================================================================
> >> --- ipa-inline-transform.c      (revision 201644)
> >> +++ ipa-inline-transform.c      (working copy)
> >> @@ -51,7 +51,7 @@ int nfunctions_inlined;
> >>
> >>  static void
> >>  update_noncloned_frequencies (struct cgraph_node *node,
> >> -                             int freq_scale)
> >> +                             gcov_type count_scale, int freq_scale)
> >>  {
> >>    struct cgraph_edge *e;
> >>
> >> @@ -60,14 +60,16 @@ update_noncloned_frequencies (struct cgraph_node *
> >>      freq_scale = 1;
> >>    for (e = node->callees; e; e = e->next_callee)
> >>      {
> >> +      e->count = apply_scale (e->count, count_scale);
> >>        e->frequency = e->frequency * (gcov_type) freq_scale / CGRAPH_FREQ_BASE;
> >>        if (e->frequency > CGRAPH_FREQ_MAX)
> >>          e->frequency = CGRAPH_FREQ_MAX;
> >>        if (!e->inline_failed)
> >> -        update_noncloned_frequencies (e->callee, freq_scale);
> >> +        update_noncloned_frequencies (e->callee, count_scale, freq_scale);
> >>      }
> >>    for (e = node->indirect_calls; e; e = e->next_callee)
> >>      {
> >> +      e->count = apply_scale (e->count, count_scale);
> >>        e->frequency = e->frequency * (gcov_type) freq_scale / CGRAPH_FREQ_BASE;
> >>        if (e->frequency > CGRAPH_FREQ_MAX)
> >>          e->frequency = CGRAPH_FREQ_MAX;
> >> @@ -169,7 +171,13 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d
> >>             }
> >>           duplicate = false;
> >>           e->callee->symbol.externally_visible = false;
> >> -          update_noncloned_frequencies (e->callee, e->frequency);
> >> +          // In the case of a COMDAT, the callee's count may be from other
> >> +          // modules, and we need to scale it for the current module's calls
> >> +          // (e.g. e->count may be 0 despite e->callee->count > 0).
> >> +          gcov_type count_scale = REG_BR_PROB_BASE;
> >> +          if (e->callee->count > e->count)
> >> +            count_scale = GCOV_COMPUTE_SCALE (e->count, e->callee->count);
> >> +          update_noncloned_frequencies (e->callee, count_scale, e->frequency);

Please fix the comment to the usual style and go ahead with this change.

Thanks,
Honza
> >>         }
> >>        else
> >>         {
> 
> 
> 
> -- 
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

Patch

Index: tree-profile.c
===================================================================
--- tree-profile.c	(revision 201838)
+++ tree-profile.c	(working copy)
@@ -604,6 +604,34 @@ 
 
       pop_cfun ();
     }
+  /* See if 0 count function has non-0 count callers.  In this case we
+     lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
+  FOR_EACH_DEFINED_FUNCTION (node)
+    {
+      struct cgraph_edge *e;
+      bool called = false;
+      if (node->count)
+	continue;
+      for (e = node->callers; e; e = e->next_caller)
+	{
+	  if (e->count)
+	    called = true;
+	  if (cgraph_maybe_hot_edge_p (e))
+	    break;
+	}
+      if ((e || called)
+	  && profile_status_for_function
+	      (DECL_STRUCT_FUNCTION (node->symbol.decl)) == PROFILE_READ)
+	{
+	  if (dump_file)
+	    fprintf (dump_file, "Dropping 0 profile for %s/%i.%s based on calls.\n",
+		     cgraph_node_name (node), node->symbol.order,
+		     e ? "function is hot" : "function is normal");
+	  profile_status_for_function (DECL_STRUCT_FUNCTION (node->symbol.decl))
+	    = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
+	  node->frequency = e ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;
+	}
+    }
 
   del_node_map();
   return 0;
Index: predict.c
===================================================================
--- predict.c	(revision 201838)
+++ predict.c	(working copy)
@@ -2715,6 +2715,9 @@ 
   gcov_type count_max, true_count_max = 0;
   basic_block bb;
 
+  if (!ENTRY_BLOCK_PTR->count)
+    return 0;
+
   FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb)
     true_count_max = MAX (bb->count, true_count_max);