From patchwork Wed Aug 21 15:25:27 2013
X-Patchwork-Submitter: Jan Hubicka
X-Patchwork-Id: 268845
Date: Wed, 21 Aug 2013 17:25:27 +0200
From: Jan Hubicka
To: Teresa Johnson
Cc: Jan Hubicka, Bernhard Reutner-Fischer, "gcc-patches@gcc.gnu.org",
 Steven Bosscher, Jeff Law, "marxin.liska", Sriraman Tallam
Subject: Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition
Message-ID: <20130821152527.GB16124@kam.mff.cuni.cz>
References: <20130808222332.GA31755@kam.mff.cuni.cz>
 <20130809095843.GC31755@kam.mff.cuni.cz>
 <20130809152804.GA6579@kam.mff.cuni.cz>
 <20130817204408.GA16557@kam.mff.cuni.cz>

> > Because an offline COMDAT function will be produced for every COMDAT used, I think
> > it is bad to produce any COMDAT (or any function reachable via calls with non-0
> > count) that has an empty profile (either because it got lost by COMDAT merging
> > or because of a read mismatch).
>
> The approach this patch takes is to simply treat those functions the
> same as we would if we didn't feed back profile data in the first
> place, by using the frequencies. This is sufficient except when one is
> inlined, which is why I have the special handling in the inliner
> itself.
Yes, my original plan was to have a per-function profile_status that specifies
whether the profile is read, guessed or absent, and to make the function-local
decisions sanely for each setting.  Here we read the function, so we set its
profile to READ (with all counts being 0).  We should drop it to GUESSED when
we see that there are non-0 count edges calling the function in question, and
probably we should also check whether it is obviously hot (i.e. reachable by a
hot call) and promote its function profile to HOT then, to make the code
placement less bad...

> > Since new direct calls can be discovered later, inline may want to do that
> > again each time it inlines non-0 count call of COMDAT with 0 count...
>
> How about an approach like this:
> - Invoke init_and_estimate_bb_frequencies as I am doing to guess the
> profiles at profile read time for functions with 0 counts.

I see, here we are out of sync.  We always used to go with the estimated
frequencies for functions with 0 counts, but it seems that this code broke
when prediction was moved before profiling.  (We should also keep the edge
probabilities from predict.c in that case.)

The estimated profile is already there before the profile is read in, so we
only need to make sure we do not overwrite it.  Does the patch at the end of
this mail work for you?

> - At inline time, invoke some kind of freqs_to_counts routine for any
> 0-count routine that is reached by non-zero call edges. It would take

We should not need that, since the frequencies ought to be there.

> the sum of all incoming call edge counts and synthesize counts for the
> bbs using the guessed profile frequencies applied earlier by
> init_and_estimate_bb_frequencies. Then the inliner can do its normal
> bb count scaling.

Yes, I guess we should go this way (see the untested sketch further down).
Still we will need to watch out for overly large values; recursive inlining
can easily produce quite a nonsense here.

We will also need to solve the problem that in this case the cgraph edges will
have a 0 profile.  We probably want to play the same game there and just do
the scaling for the edge counts, since IPA passes probably do not want to care
about partial profiles.

> Does that seem like a reasonable approach?
>
> There is one other fix in this patch:
> - The clone_inlined_nodes/update_noncloned_frequencies changes below
> are handling a different case: 0-count call edge in this module, with
> non-zero callee node count due to calls from other modules. It will
> allow update_noncloned_frequencies to scale down the edge counts in
> callee's cloned call tree. This was a fix I made for the
> callgraph-based linker plugin function reordering, and not splitting
> (since it is using both the node and edge weights to make ordering
> decisions). Here's a description of the issue when I was debugging it:

Yes, it seems reasonable.  I did not really care about COMDATs at the time I
was writing this function.
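Back to the freqs_to_counts idea: below is roughly the kind of untested sketch
I have in mind.  The routine does not exist yet, so its name, placement and
the exact helpers used are only my guess; it assumes the estimated profile
from init_and_estimate_bb_frequencies is still sitting in bb->frequency and
e->probability.

static void
freqs_to_counts (struct cgraph_node *node)
{
  struct function *fn = DECL_STRUCT_FUNCTION (node->symbol.decl);
  struct cgraph_edge *ce;
  gcov_type call_count = 0;
  basic_block bb;
  edge_iterator ei;
  edge e;

  /* Sum the profiled counts of all calls reaching this 0-count function.  */
  for (ce = node->callers; ce; ce = ce->next_caller)
    call_count += ce->count;
  if (!call_count)
    return;

  /* Distribute the incoming count over the blocks according to the
     estimated frequencies, and derive CFG edge counts from the edge
     probabilities, so the inliner can do its usual count scaling.  */
  FOR_ALL_BB_FN (bb, fn)
    {
      bb->count = apply_scale (call_count,
                               GCOV_COMPUTE_SCALE (bb->frequency, BB_FREQ_MAX));
      FOR_EACH_EDGE (e, ei, bb->succs)
        e->count = apply_probability (bb->count, e->probability);
    }
}

The cgraph edges of the calls inside such a function would then need the same
scaling treatment, and as said above the sums can get silly with recursive
inlining, so some capping is probably needed.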
> >> Index: ipa-inline-transform.c
> >> ===================================================================
> >> --- ipa-inline-transform.c      (revision 201644)
> >> +++ ipa-inline-transform.c      (working copy)
> >> @@ -51,7 +51,7 @@ int nfunctions_inlined;
> >>
> >>  static void
> >>  update_noncloned_frequencies (struct cgraph_node *node,
> >> -                              int freq_scale)
> >> +                              gcov_type count_scale, int freq_scale)
> >>  {
> >>    struct cgraph_edge *e;
> >>
> >> @@ -60,14 +60,16 @@ update_noncloned_frequencies (struct cgraph_node *
> >>      freq_scale = 1;
> >>    for (e = node->callees; e; e = e->next_callee)
> >>      {
> >> +      e->count = apply_scale (e->count, count_scale);
> >>        e->frequency = e->frequency * (gcov_type) freq_scale / CGRAPH_FREQ_BASE;
> >>        if (e->frequency > CGRAPH_FREQ_MAX)
> >>          e->frequency = CGRAPH_FREQ_MAX;
> >>        if (!e->inline_failed)
> >> -        update_noncloned_frequencies (e->callee, freq_scale);
> >> +        update_noncloned_frequencies (e->callee, count_scale, freq_scale);
> >>      }
> >>    for (e = node->indirect_calls; e; e = e->next_callee)
> >>      {
> >> +      e->count = apply_scale (e->count, count_scale);
> >>        e->frequency = e->frequency * (gcov_type) freq_scale / CGRAPH_FREQ_BASE;
> >>        if (e->frequency > CGRAPH_FREQ_MAX)
> >>          e->frequency = CGRAPH_FREQ_MAX;
> >> @@ -169,7 +171,13 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d
> >>          }
> >>        duplicate = false;
> >>        e->callee->symbol.externally_visible = false;
> >> -      update_noncloned_frequencies (e->callee, e->frequency);
> >> +      // In the case of a COMDAT, the callee's count may be from other
> >> +      // modules, and we need to scale it for the current module's calls
> >> +      // (e.g. e->count may be 0 despite e->callee->count > 0).
> >> +      gcov_type count_scale = REG_BR_PROB_BASE;
> >> +      if (e->callee->count > e->count)
> >> +        count_scale = GCOV_COMPUTE_SCALE (e->count, e->callee->count);
> >> +      update_noncloned_frequencies (e->callee, count_scale, e->frequency);

Please fix the comment to the usual style and go ahead with this change.

Thanks,
Honza

> >>      }
> >>    else
> >>      {
> >
> --
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

Index: tree-profile.c
===================================================================
--- tree-profile.c      (revision 201838)
+++ tree-profile.c      (working copy)
@@ -604,6 +604,34 @@
       pop_cfun ();
     }
 
+  /* See if a 0 count function has non-0 count callers.  In this case we
+     lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
+  FOR_EACH_DEFINED_FUNCTION (node)
+    {
+      struct cgraph_edge *e;
+      bool called = false;
+      if (node->count)
+        continue;
+      for (e = node->callers; e; e = e->next_caller)
+        {
+          if (e->count)
+            called = true;
+          if (cgraph_maybe_hot_edge_p (e))
+            break;
+        }
+      if ((e || called)
+          && profile_status_for_function
+               (DECL_STRUCT_FUNCTION (node->symbol.decl)) == PROFILE_READ)
+        {
+          if (dump_file)
+            fprintf (dump_file,
+                     "Dropping 0 profile for %s/%i. %s based on calls.\n",
+                     cgraph_node_name (node), node->symbol.order,
+                     e ? "function is hot" : "function is normal");
+          profile_status_for_function (DECL_STRUCT_FUNCTION (node->symbol.decl))
+            = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
+          node->frequency = e ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;
+        }
+    }
   del_node_map();
   return 0;

Index: predict.c
===================================================================
--- predict.c   (revision 201838)
+++ predict.c   (working copy)
@@ -2715,6 +2715,9 @@
   gcov_type count_max, true_count_max = 0;
   basic_block bb;
 
+  if (!ENTRY_BLOCK_PTR->count)
+    return 0;
+
   FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb)
     true_count_max = MAX (bb->count, true_count_max);