From patchwork Fri Oct 4 05:15:49 2013
From: Teresa Johnson
To: Jan Hubicka, gcc-patches@gcc.gnu.org
Cc: David Li
Date: Thu, 3 Oct 2013 22:15:49 -0700
Subject: [Patch] Handle profile counts truncated to 0 in coldness test

This patch handles the fact that small profile count values sometimes get truncated down to 0 during updates due to integer arithmetic. This causes sub-optimal function splitting under -freorder-blocks-and-partition.

The first part fixes the logic in probably_never_executed that looks at the bb frequency when the bb count is zero. It now correctly compares the count computed from the frequency to the number of profile runs instead of to REG_BR_PROB_BASE.
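The truncation problem can be seen with a toy model of the rounding divide. This is a sketch, not GCC code: RDIV mirrors GCC's rounding-divide macro and min_run_ratio models the hard-coded factor of 4, but the helper name and the count/run numbers are made up for illustration.

```c
/* Toy model (not GCC code): how small counts truncate to 0 in the
   coldness test.  RDIV is a rounding divide; min_run_ratio models the
   hard-coded factor of 4 in the existing code.  */
typedef long long gcov_type;
#define RDIV(X, Y) (((X) + (Y) / 2) / (Y))

/* Nonzero if COUNT is large enough relative to RUNS that the block is
   not considered "probably never executed".  */
static int
count_rules_out_cold (gcov_type count, gcov_type runs, int min_run_ratio)
{
  return RDIV (count * min_run_ratio, runs) > 0;
}
```

With 1000 training runs, a block whose count has been truncated down to a handful of executions fails this test and looks cold, which is why the frequency-based fallback matters.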
The minimum ratio of profile counts to training runs required for a block to not be considered cold is now encoded in a parameter, with a default value of 4 to match the existing code.

The second part ensures that frequencies are correctly updated after inlining. The problem is that after inlining the frequencies were being recomputed directly from the corresponding bb counts in rebuild_frequencies. If the counts had been truncated to 0, then the recomputed frequencies would become 0 as well (often leading to inconsistencies in the frequencies between blocks). The above change to probably_never_executed would then be unable to identify these blocks as non-cold from their frequencies. The fix is to use the existing logic for static profile rebuilding to also recompute/propagate the frequencies from the branch probabilities in the profile feedback case.

I also renamed a number of estimate_* routines to compute_* and updated the comments to reflect the fact that these routines are not doing estimation (from a static profile), but in fact recomputing/propagating frequencies from the existing (either guessed or profile-feedback-based) probabilities.

Bootstrapped and tested on x86_64-unknown-linux-gnu. Ok for trunk? (One more round of regression testing in progress after making some slight cleanup changes.)

Thanks,
Teresa

2013-10-03  Teresa Johnson

	* params.def (PARAM_MIN_HOT_RUN_RATIO): New parameter.
	* predict.c (probably_never_executed): Compare frequency-based
	count to number of training runs.
	(tree_estimate_probability): Update call to renamed
	compute_bb_frequencies and pass new parameter.
	(compute_loops_at_level): Renamed.
	(compute_loops): Ditto.
	(compute_bb_frequencies): Renamed, and new parameter to force
	recomputing frequencies.
	(rebuild_frequencies): Recompute bb frequencies from the
	probabilities instead of from counts due to truncation issues.
	* predict.h (compute_bb_frequencies): Update declaration.
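To illustrate why recomputing frequencies from truncated counts loses information, here is a deliberately simplified sketch; BB_FREQ_MAX mirrors GCC's frequency scale, but the function below is a toy, not the actual counts_to_freqs.

```c
/* Simplified sketch of deriving a block frequency from a profile count
   by scaling against the function's maximum count.  Once a count has
   been truncated to 0, the derived frequency is 0 too, and the block
   looks never-executed.  */
#define BB_FREQ_MAX 10000
typedef long long gcov_type;

static int
freq_from_count (gcov_type bb_count, gcov_type count_max)
{
  return (int) ((bb_count * BB_FREQ_MAX + count_max / 2) / count_max);
}
```

Propagating frequencies from branch probabilities instead keeps them nonzero and mutually consistent for such blocks, which is what the rebuild_frequencies change relies on.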
Index: params.def
===================================================================
--- params.def	(revision 203152)
+++ params.def	(working copy)
@@ -373,6 +373,12 @@ DEFPARAM(HOT_BB_FREQUENCY_FRACTION,
 	 "Select fraction of the maximal frequency of executions of basic block in function given basic block needs to have to be considered hot",
 	 1000, 0, 0)
 
+DEFPARAM(PARAM_MIN_HOT_RUN_RATIO,
+	 "min-hot-run-ratio",
+	 "The minimum ratio of profile runs to basic block execution count "
+	 "for the block to be considered hot",
+	 4, 0, 10000)
+
 DEFPARAM (PARAM_ALIGN_THRESHOLD,
 	  "align-threshold",
 	  "Select fraction of the maximal frequency of executions of basic block in function given basic block get alignment",
Index: predict.c
===================================================================
--- predict.c	(revision 203152)
+++ predict.c	(working copy)
@@ -237,17 +237,31 @@ probably_never_executed (struct function *fun,
   gcc_checking_assert (fun);
   if (profile_status_for_function (fun) == PROFILE_READ)
     {
-      if ((count * 4 + profile_info->runs / 2) / profile_info->runs > 0)
+      int min_run_ratio = PARAM_VALUE (PARAM_MIN_HOT_RUN_RATIO);
+      if (RDIV (count * min_run_ratio, profile_info->runs) > 0)
 	return false;
       if (!frequency)
 	return true;
       if (!ENTRY_BLOCK_PTR->frequency)
 	return false;
-      if (ENTRY_BLOCK_PTR->count && ENTRY_BLOCK_PTR->count < REG_BR_PROB_BASE)
+      if (ENTRY_BLOCK_PTR->count)
 	{
-	  return (RDIV (frequency * ENTRY_BLOCK_PTR->count,
-			ENTRY_BLOCK_PTR->frequency)
-		  < REG_BR_PROB_BASE / 4);
+	  gcov_type scaled_count
+	      = frequency * ENTRY_BLOCK_PTR->count * min_run_ratio;
+	  gcov_type computed_count;
+	  /* Check for overflow, in which case ENTRY_BLOCK_PTR->count should
+	     be large enough to do the division first without losing much
+	     precision.  */
+	  if (scaled_count/frequency/min_run_ratio != ENTRY_BLOCK_PTR->count)
+	    {
+	      computed_count = RDIV (ENTRY_BLOCK_PTR->count,
+				     ENTRY_BLOCK_PTR->frequency);
+	      computed_count *= frequency * min_run_ratio;
+	    }
+	  else
+	    computed_count = RDIV (scaled_count, ENTRY_BLOCK_PTR->frequency);
+	  if (RDIV (computed_count * min_run_ratio, profile_info->runs) > 0)
+	    return false;
 	}
       return true;
     }
@@ -2388,7 +2402,7 @@ tree_estimate_probability (void)
   pointer_map_destroy (bb_predictions);
   bb_predictions = NULL;
 
-  estimate_bb_frequencies ();
+  compute_bb_frequencies (false);
   free_dominance_info (CDI_POST_DOMINATORS);
   remove_fake_exit_edges ();
 }
@@ -2551,7 +2565,7 @@ typedef struct edge_info_def
 #define BLOCK_INFO(B)	((block_info) (B)->aux)
 #define EDGE_INFO(E)	((edge_info) (E)->aux)
 
-/* Helper function for estimate_bb_frequencies.
+/* Helper function for compute_bb_frequencies.
    Propagate the frequencies in blocks marked in TOVISIT, starting in HEAD.  */
 
@@ -2691,10 +2705,10 @@ propagate_freq (basic_block head, bitmap tovisit)
     }
 }
 
-/* Estimate probabilities of loopback edges in loops at same nest level.  */
+/* Compute frequencies in loops at same nest level.  */
 
 static void
-estimate_loops_at_level (struct loop *first_loop)
+compute_loops_at_level (struct loop *first_loop)
 {
   struct loop *loop;
 
@@ -2705,7 +2719,7 @@ static void
       unsigned i;
       bitmap tovisit = BITMAP_ALLOC (NULL);
 
-      estimate_loops_at_level (loop->inner);
+      compute_loops_at_level (loop->inner);
 
       /* Find current loop back edge and mark it.  */
       e = loop_latch_edge (loop);
 
@@ -2723,14 +2737,14 @@ static void
 /* Propagates frequencies through structure of loops.  */
 
 static void
-estimate_loops (void)
+compute_loops (void)
 {
   bitmap tovisit = BITMAP_ALLOC (NULL);
   basic_block bb;
 
-  /* Start by estimating the frequencies in the loops.  */
+  /* Start by computing the frequencies in the loops.  */
   if (number_of_loops (cfun) > 1)
-    estimate_loops_at_level (current_loops->tree_root->inner);
+    compute_loops_at_level (current_loops->tree_root->inner);
 
   /* Now propagate the frequencies through all the blocks.  */
   FOR_ALL_BB (bb)
@@ -2800,15 +2814,16 @@ expensive_function_p (int threshold)
   return false;
 }
 
-/* Estimate basic blocks frequency by given branch probabilities.  */
+/* Compute and propagate basic block frequencies using the given branch
+   probabilities.  */
 
 void
-estimate_bb_frequencies (void)
+compute_bb_frequencies (bool rebuild)
 {
   basic_block bb;
   sreal freq_max;
 
-  if (profile_status != PROFILE_READ || !counts_to_freqs ())
+  if (rebuild || profile_status != PROFILE_READ || !counts_to_freqs ())
     {
       static int real_values_initialized = 0;
 
@@ -2845,9 +2860,9 @@ void
 	}
     }
 
-      /* First compute probabilities locally for each loop from innermost
+      /* First compute frequencies locally for each loop from innermost
 	 to outermost to examine probabilities for back edges.  */
-      estimate_loops ();
+      compute_loops ();
 
       memcpy (&freq_max, &real_zero, sizeof (real_zero));
       FOR_EACH_BB (bb)
@@ -3027,18 +3042,16 @@ void
 rebuild_frequencies (void)
 {
   timevar_push (TV_REBUILD_FREQUENCIES);
-  if (profile_status == PROFILE_GUESSED)
+  if (profile_status == PROFILE_GUESSED || profile_status == PROFILE_READ)
     {
       loop_optimizer_init (0);
       add_noreturn_fake_exit_edges ();
       mark_irreducible_loops ();
       connect_infinite_loops_to_exit ();
-      estimate_bb_frequencies ();
+      compute_bb_frequencies (true);
       remove_fake_exit_edges ();
       loop_optimizer_finalize ();
     }
-  else if (profile_status == PROFILE_READ)
-    counts_to_freqs ();
   else
     gcc_unreachable ();
   timevar_pop (TV_REBUILD_FREQUENCIES);
Index: predict.h
===================================================================
--- predict.h	(revision 203152)
+++ predict.h	(working copy)
@@ -37,7 +37,7 @@ enum prediction
 extern void predict_insn_def (rtx, enum br_predictor, enum prediction);
 extern int counts_to_freqs (void);
-extern void estimate_bb_frequencies (void);
+extern void compute_bb_frequencies (bool rebuild);
 extern const char *predictor_name (enum br_predictor);
 extern tree build_predict_expr (enum br_predictor, enum prediction);
 extern void tree_estimate_probability (void);
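For reference, the overflow guard in the probably_never_executed hunk can be modeled standalone as below. The wrapper name and parameter list are hypothetical; gcov_type is a 64-bit signed integer in GCC, and note that in strict C signed overflow is undefined behavior, so this mirrors the patch's wraparound check rather than a fully portable idiom.

```c
/* Standalone model of scaling FREQUENCY by the entry block's
   count-to-frequency ratio without overflowing: if the triple product
   wraps, ENTRY_COUNT is large, so dividing it first loses little
   precision.  */
typedef long long gcov_type;
#define RDIV(X, Y) (((X) + (Y) / 2) / (Y))

static gcov_type
scaled_count (int frequency, gcov_type entry_count,
              int entry_frequency, int min_run_ratio)
{
  gcov_type scaled = frequency * entry_count * min_run_ratio;
  /* Detect wraparound by checking that dividing the product back out
     recovers entry_count.  */
  if (frequency && min_run_ratio
      && scaled / frequency / min_run_ratio != entry_count)
    return RDIV (entry_count, entry_frequency)
	   * frequency * min_run_ratio;
  return RDIV (scaled, entry_frequency);
}
```

In the common (non-wrapping) case both branches compute the same quantity, frequency * entry_count * min_run_ratio / entry_frequency, just with different rounding error.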