Handle more COMDAT profiling issues

> This patch attempts to address the lost profile issue for COMDATs in
> more circumstances, exposed by function splitting.
> 
> My earlier patch handled the case where the comdat had 0 counts since
> the linker kept the copy in a different module. In that case we
> prevent the guessed frequencies on 0-count functions from being
> dropped by counts_to_freq, and then later mark any reached via
> non-zero callgraph edges as guessed. Finally, when one such 0-count
> comdat is inlined the call count is propagated to the callee blocks
> using the guessed probabilities.
> 
> However, in this case, there was a comdat that had a very small
> non-zero count, that was being inlined to a much hotter callsite. This
> could happen when there was a copy that was ipa-inlined
> in the profile gen compile, so the copy in that module gets some
> non-zero counts from the ipa inlined instance, but when the out of
> line copy was eliminated by the linker (selected from a different
> module). In this case the inliner was scaling the bb counts up quite a
> lot when inlining. The problem is that you most likely can't trust
> that the 0 count bbs in such a case are really not executed by the
> callsite it is being into, since the counts are very small and
> correspond to a different callsite. In some internal C++ applications
> I am seeing that we execute out of the split cold portion of COMDATs
> for this reason.

With all the problems we have with COMDATs, it still seems to me that perhaps
best approach would be to merge them in runtime, as we discussed some time ago
and incrementally handle the problem how to get callstack sensitive profiles.
> 
> This problem is more complicated to address than the 0-count instance,
> because we need the cgraph to determine which functions to drop the
> profile on, and at that point the estimated frequencies have already
> been overwritten by counts_to_freqs. To handle this broader case, I
> have changed the drop_profile routine to simply re-estimate the
> probabilities/frequencies (and translate these into counts scaled from
> the incoming call edge counts). This unfortunately necessitates
> rebuilding the cgraph, to propagate the new synthesized counts and
> avoid checking failures downstream. But it will only be rebuilt if we
> dropped any profiles. With this solution, some of the older approach
> can be removed (e.g. propagating counts using the guessed
> probabilities during inlining).
> 
> Patch is below. Bootstrapped and tested on x86-64-unknown-linux-gnu.
> Also tested on
> a profile-use build of SPEC cpu2006. Ok for trunk when stage 1 reopens?
> 
> Thanks,
> Teresa

2014-02-12  Teresa Johnson  <tejohnson@google.com>

	* graphite.c (graphite_finalize): Pass new parameter.
	* params.def (PARAM_MIN_CALLER_REESTIMATE_RATIO): New.
	* predict.c (tree_estimate_probability): New parameter.
	(tree_estimate_probability_worker): Renamed from
        tree_estimate_probability_driver.
	(tree_reestimate_probability): New function.
	(tree_estimate_probability_driver): Invoke
        tree_estimate_probability_worker.
	(freqs_to_counts): Move here from tree-inline.c.
	(drop_profile): Re-estimated profiles when dropping counts.
	(handle_missing_profiles): Drop for some non-zero functions as well,
        get counts from bbs to support invocation before cgraph rebuild.
	(counts_to_freqs): Remove code obviated by reestimation.
	* predict.h (tree_estimate_probability): Update declaration.
	* tree-inline.c (freqs_to_counts): Move to predict.c.
	(copy_cfg_body): Remove code obviated by reestimation.
	* tree-profile.c (tree_profiling): Invoke handle_missing_profiles
        before cgraph rebuild.

   connect_infinite_loops_to_exit ();
   /* We use loop_niter_by_eval, which requires that the loops have
      preheaders.  */
-  create_preheaders (CP_SIMPLE_PREHEADERS);
+  if (!redo)
+    create_preheaders (CP_SIMPLE_PREHEADERS);

So you disable preheaders construction because this all happens at inlining
time when we do not want to change BBs because callgraph is built, right?
Are you sure we do have preheaders built at IPA time for all functions reliably?
(It would make sense to do so, but I think loop analysis is done rather randomly
by IPA analysis passes, such as inliner).
Otherwise the SCEV infrastructure will explode, if I remember correctly.

+/* Force re-estimation of the probabilities, because the profile was
+   deemed insufficient.  */
+
+static void
+tree_reestimate_probability (void)
+{
+  basic_block bb;
+  edge e;
+  edge_iterator ei;
+
+  /* Need to reset the counts to ensure probabilities are recomputed.  */
+  FOR_EACH_BB_FN (bb, cfun)
+    {
+      bb->count = 0;
+      FOR_EACH_EDGE (e, ei, bb->succs)
+        e->count = 0;
+    }
+  tree_estimate_probability_worker (true);

You also want to remove recorded loop count estimates, since they are also
driven by profile we chose to not trust.

@@ -2842,48 +2927,77 @@ handle_missing_profiles (void)
       gcov_type max_tp_first_run = 0;
       struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
 
-      if (node->count)
-        continue;
       for (e = node->callers; e; e = e->next_caller)
       {
-        call_count += e->count;
+        call_count += cgraph_function_with_gimple_body_p (e->caller)
+                      ? gimple_bb (e->call_stmt)->count : 0;

I am quite confused by this change - we you don't want to trust edge counts?
Other parts of the patch seems quite resonable, but I don't really follow here.

Honza


Handle more COMDAT profiling issues

Commit Message

Comments

Patch