From patchwork Fri Nov 8 21:58:16 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Teresa Johnson X-Patchwork-Id: 289925 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by ozlabs.org (Postfix) with ESMTPS id 532152C00AA for ; Sat, 9 Nov 2013 08:58:36 +1100 (EST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :mime-version:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; q=dns; s=default; b=xAYgHC8prt0Gmho8Pd IGw+HLZbQwzhs3U7FQ3CLudcQ2xl9fGqY3lzpzAZsOUD+fYy6EMzhN7mhtV3Uutx fUeIv5Y3OtRqtgofuK4lhoMtgJgRCOAITFOUnE4j07CGa7waOXyEUOSboudOeO5m wlUw7HoJGb+HVp+E/uHVtqfek= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :mime-version:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; s=default; bh=mfFAqeAPAOG+b2l/1/+AF8KF oj8=; b=An3jc2JZGDjNvDhuXI0uqLvny0bq5zCZXEKkWpFmdzc4+nZldHHek/Qk XUKptOSnvSdXnPdmzEU2x3M75crFIyO1H5WdevX39ze3BWN6z7toMJC6sD3oB0W2 +n+JZ52UBDQOTstKBzHI6GtXArAQPguK8K45h2SfugedgPKsOeU= Received: (qmail 29543 invoked by alias); 8 Nov 2013 21:58:27 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 29530 invoked by uid 89); 8 Nov 2013 21:58:27 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.1 required=5.0 tests=AWL, BAYES_50, RDNS_NONE, SPF_PASS, URIBL_BLOCKED autolearn=no version=3.3.2 X-HELO: mail-qe0-f42.google.com Received: from Unknown (HELO mail-qe0-f42.google.com) (209.85.128.42) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted) ESMTPS; Fri, 08 Nov 2013 21:58:25 +0000 Received: by mail-qe0-f42.google.com with SMTP id gc15so2588958qeb.1 for ; Fri, 08 Nov 2013 13:58:17 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=VYiM85Ll3oYdYssbWJ3HXbijInQH6r0Ma5HZoMtW6mE=; b=BJwZgEKQDZmQJ3L/db9X5JqkaxvX0wVidYuwec7OQYsl6CXCpehbf9s0wy/ja1Y3p4 ue5kShtHgjmdPjYW32YikFMGUM6+RhewhKpr++WxDJch+croSE1nMS3NdHwxsQhvjoiZ faUEbVmpzo9WvEFwu6gGvddp+vstzN0qixeA7V4X9YiopHcTZhhjTVs+g9Qlwi+bbmFo YS99AR2kPLx20AneVftq416tqmy4Dmq4tMmmdKmfffzvf9Qj6dVaNjcAXQlVX1goalxP 2Y4r8W+Ke5UzY705y245EEfkcTQLn3HAOxsIVGn6KbXkI3aeA+kNm53OvfhfZ+DI3pnl 2lZw== X-Gm-Message-State: ALoCoQnYO1Hf7mwxIlyMKXhkCzopK2cXAwk5wCU8gB3qk5FKk3kD/8LuVPKvtHktyyD4lSTCpY2aomW2QB/+eLd7Vlhubc+LAcgnApDJ6JJYu+44JQmg5yBQum6StkgMgb6dnG6GyxvSN/q0GA3I1N2SKS0mVkdf1R8OG5dP41KKUzUUCp6YCzGH3/1SoLF5Jv/smHmtVa/5z37hC2nRKtK6lG2b2Tmz1A== MIME-Version: 1.0 X-Received: by 10.224.69.69 with SMTP id y5mr27408114qai.53.1383947897054; Fri, 08 Nov 2013 13:58:17 -0800 (PST) Received: by 10.229.117.69 with HTTP; Fri, 8 Nov 2013 13:58:16 -0800 (PST) In-Reply-To: References: <20131015210324.GA23980@kam.mff.cuni.cz> <20131015215741.GA22365@kam.mff.cuni.cz> <20131018211757.GB31793@kam.mff.cuni.cz> Date: Fri, 8 Nov 2013 13:58:16 -0800 Message-ID: Subject: Re: [PATCH] decide edge's hotness when there is profile info From: Teresa Johnson To: Jan Hubicka Cc: Dehao Chen , Xinliang David Li , GCC Patches X-IsSubscribed: yes On Tue, Nov 5, 2013 at 6:02 AM, Teresa Johnson wrote: > Ping. > Thanks, Teresa > I updated the patch to include the follow-on work to apply call counts to 0-weight functions using the estimated frequencies when inlining. This helps avoid 0-weight blocks of inlined code which are particularly bad for function splitting. Bootstrapped and tested on x86-64-unknown-linux-gnu. Also tested on profiledbootstrap and profile-use build of SPEC cpu2006. Ok for trunk? Thanks, Teresa 2013-11-08 Teresa Johnson Jan Hubicka * predict.c (drop_profile): New function. (handle_missing_profiles): Ditto. (counts_to_freqs): Don't overwrite estimated frequencies when function has no profile counts. * predict.h (handle_missing_profiles): Declare. * tree-inline.c (freqs_to_counts): New function. (copy_cfg_body): Invoke freqs_to_counts as needed. * tree-profile.c (tree_profiling): Invoke handle_missing_profiles. > On Wed, Oct 30, 2013 at 1:15 PM, Teresa Johnson wrote: >> On Fri, Oct 18, 2013 at 2:17 PM, Jan Hubicka wrote: >>>> Here is the patch updated to use the new parameter from r203830. >>>> Passed bootstrap and regression tests. >>>> >>>> 2013-10-18 Jan Hubicka >>>> Teresa Johnson >>>> >>>> * predict.c (handle_missing_profiles): New function. >>>> (counts_to_freqs): Don't overwrite estimated frequencies >>>> when function has no profile counts. >>>> * predict.h (handle_missing_profiles): Declare. >>>> * tree-profile.c (tree_profiling): Invoke handle_missing_profiles. >>>> >>>> Index: predict.c >>>> =================================================================== >>>> --- predict.c (revision 203830) >>>> +++ predict.c (working copy) >>>> @@ -2758,6 +2758,40 @@ estimate_loops (void) >>>> BITMAP_FREE (tovisit); >>>> } >>>> >>> >>> You need block comment. It should explain the problem of COMDATs and how they are handled. >>> It is not an obvious thing. >> >> Done. >> >>> >>>> +void >>>> +handle_missing_profiles (void) >>>> +{ >>>> + struct cgraph_node *node; >>>> + int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION); >>> extra line >>>> + /* See if 0 count function has non-0 count callers. In this case we >>>> + lost some profile. Drop its function profile to PROFILE_GUESSED. */ >>>> + FOR_EACH_DEFINED_FUNCTION (node) >>>> + { >>>> + struct cgraph_edge *e; >>>> + gcov_type call_count = 0; >>>> + struct function *fn = DECL_STRUCT_FUNCTION (node->symbol.decl); >>> Extra line >>>> + if (node->count) >>>> + continue; >>>> + for (e = node->callers; e; e = e->next_caller) >>>> + call_count += e->count; >>> What about checking if the sum is way off even for non-0 counts. >>> I.e. for case where function was inlined to some calls but not to others? In >>> that case we may want to scale up the counts (with some resonable care for >>> capping) >> >> In this patch I am not changing any counts, so I am leaving this one >> for follow-on work (even for the routines missing counts completely >> like these, we don't apply any counts, just mark them as guessed. I >> have a follow-on patch to send once this goes in that does apply >> counts to these 0-count routines only when we decide to inline as we >> discussed). >> >>> >>> Also I think the code really should propagate - i.e. have simple worklist and >>> look for functions that are COMDAT and are called by other COMDAT functions >>> where we decided to drop the profile. Having non-trivial chains of comdats is >>> common thing. >> >> Done. >> >>> >>> What about outputting user visible warning/error on the incosnsistency when >>> no COMDAT function is involved same way as we do for BB profile? >> >> Done. This one caught the fact that we have this situation for "extern >> template" functions as well when I tested on cpu2006. I added in a >> check to ignore those when issuing the warning (they are not emitted >> thus don't get any profile counts). >> >> Updated patch below. >> >> Bootstrapped and tested on x86-64-unknown-linux-gnu. Also tested on >> profiledbootstrap and profile-use build of SPEC cpu2006. Ok for trunk? >> >> Thanks, >> Teresa >> >> 2013-10-30 Teresa Johnson >> >> * predict.c (drop_profile): New function. >> (handle_missing_profiles): Ditto. >> (counts_to_freqs): Don't overwrite estimated frequencies >> when function has no profile counts. >> * predict.h (handle_missing_profiles): Declare. >> * tree-profile.c (tree_profiling): Invoke handle_missing_profiles. >> >> Index: predict.c >> =================================================================== >> --- predict.c (revision 204213) >> +++ predict.c (working copy) >> @@ -2765,6 +2765,107 @@ estimate_loops (void) >> BITMAP_FREE (tovisit); >> } >> >> +/* Drop the profile for NODE to guessed, and update its frequency based on >> + whether it is expected to be HOT. */ >> + >> +static void >> +drop_profile (struct cgraph_node *node, bool hot) >> +{ >> + struct function *fn = DECL_STRUCT_FUNCTION (node->decl); >> + >> + if (dump_file) >> + fprintf (dump_file, >> + "Dropping 0 profile for %s/%i. %s based on calls.\n", >> + cgraph_node_name (node), node->order, >> + hot ? "Function is hot" : "Function is normal"); >> + /* We only expect to miss profiles for functions that are reached >> + via non-zero call edges in cases where the function may have >> + been linked from another module or library (COMDATs and extern >> + templates). See the comments below for handle_missing_profiles. */ >> + if (!DECL_COMDAT (node->decl) && !DECL_EXTERNAL (node->decl)) >> + warning (0, >> + "Missing counts for called function %s/%i", >> + cgraph_node_name (node), node->order); >> + >> + profile_status_for_function (fn) >> + = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT); >> + node->frequency >> + = hot ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL; >> +} >> + >> +/* In the case of COMDAT routines, multiple object files will contain the same >> + function and the linker will select one for the binary. In that case >> + all the other copies from the profile instrument binary will be missing >> + profile counts. Look for cases where this happened, due to non-zero >> + call counts going to 0-count functions, and drop the profile to guessed >> + so that we can use the estimated probabilities and avoid optimizing only >> + for size. >> + >> + The other case where the profile may be missing is when the routine >> + is not going to be emitted to the object file, e.g. for "extern template" >> + class methods. Those will be marked DECL_EXTERNAL. Emit a warning in >> + all other cases of non-zero calls to 0-count functions. */ >> + >> +void >> +handle_missing_profiles (void) >> +{ >> + struct cgraph_node *node; >> + int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION); >> + vec worklist; >> + worklist.create (64); >> + >> + /* See if 0 count function has non-0 count callers. In this case we >> + lost some profile. Drop its function profile to PROFILE_GUESSED. */ >> + FOR_EACH_DEFINED_FUNCTION (node) >> + { >> + struct cgraph_edge *e; >> + gcov_type call_count = 0; >> + struct function *fn = DECL_STRUCT_FUNCTION (node->decl); >> + >> + if (node->count) >> + continue; >> + for (e = node->callers; e; e = e->next_caller) >> + call_count += e->count; >> + if (call_count >> + && fn && fn->cfg >> + && (call_count * unlikely_count_fraction >= profile_info->runs)) >> + { >> + bool maybe_hot = maybe_hot_count_p (NULL, call_count); >> + >> + drop_profile (node, maybe_hot); >> + worklist.safe_push (node); >> + } >> + } >> + >> + /* Propagate the profile dropping to other 0-count COMDATs that are >> + potentially called by COMDATs we already dropped the profile on. */ >> + while (worklist.length () > 0) >> + { >> + struct cgraph_edge *e; >> + >> + node = worklist.pop (); >> + for (e = node->callees; e; e = e->next_caller) >> + { >> + struct cgraph_node *callee = e->callee; >> + struct function *fn = DECL_STRUCT_FUNCTION (callee->decl); >> + >> + if (callee->count > 0) >> + continue; >> + if (DECL_COMDAT (callee->decl) && fn && fn->cfg >> + && profile_status_for_function (fn) == PROFILE_READ) >> + { >> + /* Since there are no non-0 call counts to this function, >> + we don't know for sure whether it is hot. Indicate to >> + the drop_profile routine that function should be marked >> + normal, rather than hot. */ >> + drop_profile (node, false); >> + worklist.safe_push (callee); >> + } >> + } >> + } >> + worklist.release (); >> +} >> + >> /* Convert counts measured by profile driven feedback to frequencies. >> Return nonzero iff there was any nonzero execution count. */ >> >> @@ -2774,6 +2875,9 @@ counts_to_freqs (void) >> gcov_type count_max, true_count_max = 0; >> basic_block bb; >> >> + if (!ENTRY_BLOCK_PTR->count) >> + return 0; >> + >> FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb) >> true_count_max = MAX (bb->count, true_count_max); >> >> Index: predict.h >> =================================================================== >> --- predict.h (revision 204213) >> +++ predict.h (working copy) >> @@ -37,6 +37,7 @@ enum prediction >> >> extern void predict_insn_def (rtx, enum br_predictor, enum prediction); >> extern int counts_to_freqs (void); >> +extern void handle_missing_profiles (void); >> extern void estimate_bb_frequencies (bool); >> extern const char *predictor_name (enum br_predictor); >> extern tree build_predict_expr (enum br_predictor, enum prediction); >> Index: tree-profile.c >> =================================================================== >> --- tree-profile.c (revision 204213) >> +++ tree-profile.c (working copy) >> @@ -612,6 +612,8 @@ tree_profiling (void) >> pop_cfun (); >> } >> >> + handle_missing_profiles (); >> + >> del_node_map (); >> return 0; >> } >> >>> >>> Honza >> >> >> >> -- >> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413 > > > > -- > Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413 Index: predict.c =================================================================== --- predict.c (revision 204516) +++ predict.c (working copy) @@ -2765,6 +2765,107 @@ estimate_loops (void) BITMAP_FREE (tovisit); } +/* Drop the profile for NODE to guessed, and update its frequency based on + whether it is expected to be HOT. */ + +static void +drop_profile (struct cgraph_node *node, bool hot) +{ + struct function *fn = DECL_STRUCT_FUNCTION (node->decl); + + if (dump_file) + fprintf (dump_file, + "Dropping 0 profile for %s/%i. %s based on calls.\n", + cgraph_node_name (node), node->order, + hot ? "Function is hot" : "Function is normal"); + /* We only expect to miss profiles for functions that are reached + via non-zero call edges in cases where the function may have + been linked from another module or library (COMDATs and extern + templates). See the comments below for handle_missing_profiles. */ + if (!DECL_COMDAT (node->decl) && !DECL_EXTERNAL (node->decl)) + warning (0, + "Missing counts for called function %s/%i", + cgraph_node_name (node), node->order); + + profile_status_for_function (fn) + = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT); + node->frequency + = hot ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL; +} + +/* In the case of COMDAT routines, multiple object files will contain the same + function and the linker will select one for the binary. In that case + all the other copies from the profile instrument binary will be missing + profile counts. Look for cases where this happened, due to non-zero + call counts going to 0-count functions, and drop the profile to guessed + so that we can use the estimated probabilities and avoid optimizing only + for size. + + The other case where the profile may be missing is when the routine + is not going to be emitted to the object file, e.g. for "extern template" + class methods. Those will be marked DECL_EXTERNAL. Emit a warning in + all other cases of non-zero calls to 0-count functions. */ + +void +handle_missing_profiles (void) +{ + struct cgraph_node *node; + int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION); + vec worklist; + worklist.create (64); + + /* See if 0 count function has non-0 count callers. In this case we + lost some profile. Drop its function profile to PROFILE_GUESSED. */ + FOR_EACH_DEFINED_FUNCTION (node) + { + struct cgraph_edge *e; + gcov_type call_count = 0; + struct function *fn = DECL_STRUCT_FUNCTION (node->decl); + + if (node->count) + continue; + for (e = node->callers; e; e = e->next_caller) + call_count += e->count; + if (call_count + && fn && fn->cfg + && (call_count * unlikely_count_fraction >= profile_info->runs)) + { + bool maybe_hot = maybe_hot_count_p (NULL, call_count); + + drop_profile (node, maybe_hot); + worklist.safe_push (node); + } + } + + /* Propagate the profile dropping to other 0-count COMDATs that are + potentially called by COMDATs we already dropped the profile on. */ + while (worklist.length () > 0) + { + struct cgraph_edge *e; + + node = worklist.pop (); + for (e = node->callees; e; e = e->next_caller) + { + struct cgraph_node *callee = e->callee; + struct function *fn = DECL_STRUCT_FUNCTION (callee->decl); + + if (callee->count > 0) + continue; + if (DECL_COMDAT (callee->decl) && fn && fn->cfg + && profile_status_for_function (fn) == PROFILE_READ) + { + /* Since there are no non-0 call counts to this function, + we don't know for sure whether it is hot. Indicate to + the drop_profile routine that function should be marked + normal, rather than hot. */ + drop_profile (node, false); + worklist.safe_push (callee); + } + } + } + worklist.release (); +} + /* Convert counts measured by profile driven feedback to frequencies. Return nonzero iff there was any nonzero execution count. */ @@ -2774,6 +2875,9 @@ counts_to_freqs (void) gcov_type count_max, true_count_max = 0; basic_block bb; + if (!ENTRY_BLOCK_PTR->count) + return 0; + FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb) true_count_max = MAX (bb->count, true_count_max); Index: predict.h =================================================================== --- predict.h (revision 204516) +++ predict.h (working copy) @@ -37,6 +37,7 @@ enum prediction extern void predict_insn_def (rtx, enum br_predictor, enum prediction); extern int counts_to_freqs (void); +extern void handle_missing_profiles (void); extern void estimate_bb_frequencies (bool); extern const char *predictor_name (enum br_predictor); extern tree build_predict_expr (enum br_predictor, enum prediction); Index: tree-inline.c =================================================================== --- tree-inline.c (revision 204516) +++ tree-inline.c (working copy) @@ -2353,6 +2353,29 @@ redirect_all_calls (copy_body_data * id, basic_blo } } +/* Convert estimated frequencies into counts for NODE, scaling COUNT + with each bb's frequency. Used when NODE has a 0-weight entry + but we are about to inline it into a non-zero count call bb. + See the comments for handle_missing_profiles() in predict.c for + when this can happen for COMDATs. */ + +void +freqs_to_counts (struct cgraph_node *node, gcov_type count) +{ + basic_block bb; + edge_iterator ei; + edge e; + struct function *fn = DECL_STRUCT_FUNCTION (node->decl); + + FOR_ALL_BB_FN(bb, fn) + { + bb->count = apply_scale (count, + GCOV_COMPUTE_SCALE (bb->frequency, BB_FREQ_MAX)); + FOR_EACH_EDGE (e, ei, bb->succs) + e->count = apply_probability (e->src->count, e->probability); + } +} + /* Make a copy of the body of FN so that it can be inserted inline in another function. Walks FN via CFG, returns new fndecl. */ @@ -2373,6 +2396,24 @@ copy_cfg_body (copy_body_data * id, gcov_type coun int incoming_frequency = 0; gcov_type incoming_count = 0; + /* This can happen for COMDAT routines that end up with 0 counts + despite being called (see the comments for handle_missing_profiles() + in predict.c as to why). Apply counts to the blocks in the callee + before inlining, using the guessed edge frequencies, so that we don't + end up with a 0-count inline body which can confuse downstream + optimizations such as function splitting. */ + if (!ENTRY_BLOCK_PTR_FOR_FUNCTION (src_cfun)->count && count) + { + /* Apply the larger of the call bb count and the total incoming + call edge count to the callee. */ + gcov_type in_count = 0; + struct cgraph_edge *in_edge; + for (in_edge = id->src_node->callers; in_edge; + in_edge = in_edge->next_caller) + in_count += in_edge->count; + freqs_to_counts (id->src_node, count > in_count ? count : in_count); + } + if (ENTRY_BLOCK_PTR_FOR_FUNCTION (src_cfun)->count) count_scale = GCOV_COMPUTE_SCALE (count, Index: tree-profile.c =================================================================== --- tree-profile.c (revision 204516) +++ tree-profile.c (working copy) @@ -612,6 +612,8 @@ tree_profiling (void) pop_cfun (); } + handle_missing_profiles (); + del_node_map (); return 0; }