From patchwork Sun Sep 29 15:06:57 2013
X-Patchwork-Submitter: Teresa Johnson
X-Patchwork-Id: 278871
Date: Sun, 29 Sep 2013 08:06:57 -0700
Subject: Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition
From: Teresa Johnson
To: Jan Hubicka
Cc: "gcc-patches@gcc.gnu.org" , "marxin.liska"

On Fri, Sep 27, 2013 at 7:15 AM, Teresa Johnson wrote:
> On Thu, Sep 26, 2013 at 3:02 PM, Jan Hubicka wrote:
>>> Why not just have probably_never_executed_bb_p simply return
>>> false when bb->frequency is non-zero (right now it does the opposite -
>>
>> We want to have frequencies guessed for functions that were not trained
>> in the profiling run (that was the patch I posted earlier that I think
>> did not go in, yet).
>
> Right, but for splitting and bb layout purposes, for these statically
> guessed unprofiled functions we in fact don't want to do any splitting
> or treat the bbs as never executed. (This shouldn't be a change from
> the status quo, since all the bbs in these functions currently have 0
> weight; it's only when we inline in the case of comdats that they
> appear colder than the surrounding code, but in fact we don't want
> this.)
>
> The only other caller of probably_never_executed_bb_p is
> compute_function_frequency, but in the case of statically guessed
> functions they will have profile_status != PROFILE_READ and won't
> invoke probably_never_executed_bb_p. But re-reading our most recent
> exchange on the comdat profile issue, it sounds like you were
> suggesting guessing profiles for all 0-weight functions early, then
> dropping them from PROFILE_READ to PROFILE_GUESSED only once we
> determine in ipa-inline that there is a potentially non-zero call path
> to them. In that case, with the change I describe above to
> probably_never_executed_bb_p, the 0-weight functions with 0 calls to
> them would incorrectly be marked as NODE_FREQUENCY_NORMAL, which would
> be bad as they would not be size optimized or moved into the cold
> section.
>
> So it seems like we want different handling of these guessed
> frequencies in compute_function_frequency and bb-reorder.c. Actually I
> think we can handle this by checking whether the function entry block
> has a 0 count. If so, then we just look at the bb counts and not the
> frequencies for determining bb hotness, as the frequencies would
> presumably have been statically guessed.
> This will ensure that the
> cgraph node continues to be marked unlikely and size-optimized. If the
> function entry block has a non-zero count, then we look at both the bb
> count and the bb frequency - if they are both zero then the bb is
> probably never executed, but if either is non-zero then we should
> treat the block as possibly executed (which will come into play for
> splitting and bb layout).

Here is a patch to handle the profile insanities conservatively during
splitting. It also simplifies the probably_never_executed* code to treat
missing counts within a profiled function differently (conservatively,
based on frequency) from the case where the whole function has a guessed
profile. That way, once a patch to guess profiles for non-executed
functions is added, they will continue to have their nodes marked as
unlikely.

I also pulled the guts of the probably_never_executed_bb_p code out into
a helper that is then invoked by both the bb and edge versions of this
function, so they stay in sync.

This gets rid of a number of the failures seen with splitting plus the
linker script that makes the unlikely section non-executable. I have a
patch to fix some jump threading insanities that I will send separately.

Bootstrapped and regression tested on x86_64. Also tested with an lto
profiledbootstrap. Ok for trunk?

Thanks,
Teresa

2013-09-29  Teresa Johnson  <tejohnson@google.com>

	* bb-reorder.c (find_rarely_executed_basic_blocks_and_crossing_edges):
	Treat profile insanities conservatively.
	* predict.c (probably_never_executed): New function.
	Treat profile insanities conservatively.
	(probably_never_executed_bb_p): Invoke probably_never_executed.
	(probably_never_executed_edge_p): Invoke probably_never_executed.

> Teresa
>
>> Currently I return true when the frequency indicates that the BB is
>> executed at least in 1/4th of all executions. With the cases discussed
>> I see we may need to reduce this threshold.
>> In general I do not much like hard tests for 0, because the meaning
>> of 0 depends on REG_BR_FREQ_BASE, which is supposed to be changeable,
>> and we may want to make frequencies sreal, too.
>>
>> I suppose we may introduce a --param for this. You are also right that
>> I should update probably_never_executed_edge_p (I intended to, but
>> obviously the code ended up in mainline accidentally).
>>
>> I however saw at least one case of jump threading where this trick did
>> not help: the jump threading update confused itself by scaling via
>> counts rather than frequencies and ended up dropping everything to 0.
>> This makes it more tempting to try to go with sreals for those....
>>
>> Honza
>>
>>> returns true when bb->frequency is 0)? Making this change removed a
>>> bunch of other failures. With this change as well, there are only 3
>>> cases that still fail with 1 train run that pass with 100. Need to
>>> look at those.
>>>
>>> >
>>> > Will you look into the logic of do_jump, or shall I try to dive in?
>>>
>>> I can take a look, but probably won't have a chance until late this
>>> week. If you don't get to it before then, I will see if I can figure
>>> out why it is applying the branch probabilities this way.
>>>
>>> Teresa
>>>
>>> >
>>> > Honza
>>>
>>> --
>>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
>
> --
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

Index: bb-reorder.c
===================================================================
--- bb-reorder.c	(revision 202947)
+++ bb-reorder.c	(working copy)
@@ -1564,8 +1564,25 @@ find_rarely_executed_basic_blocks_and_crossing_edg
   /* Mark which partition (hot/cold) each basic block belongs in.  */
   FOR_EACH_BB (bb)
     {
+      bool cold_bb = false;
       if (probably_never_executed_bb_p (cfun, bb))
 	{
+	  /* Handle profile insanities created by upstream optimizations
+	     by also checking the incoming edge weights.  If there is a
+	     non-cold incoming edge, conservatively prevent this block
+	     from being split into the cold section.  */
+	  cold_bb = true;
+	  FOR_EACH_EDGE (e, ei, bb->preds)
+	    {
+	      if (!probably_never_executed_edge_p (cfun, e))
+		{
+		  cold_bb = false;
+		  break;
+		}
+	    }
+	}
+      if (cold_bb)
+	{
 	  BB_SET_PARTITION (bb, BB_COLD_PARTITION);
 	  cold_bb_count++;
 	}
Index: predict.c
===================================================================
--- predict.c	(revision 202947)
+++ predict.c	(working copy)
@@ -226,26 +226,26 @@ maybe_hot_edge_p (edge e)
 }
 
-/* Return true in case BB is probably never executed.  */
-bool
-probably_never_executed_bb_p (struct function *fun, const_basic_block bb)
+/* Return true if profile COUNT and FREQUENCY, or function FUN static
+   node frequency reflects never being executed.  */
+
+static bool
+probably_never_executed (struct function *fun,
+			 gcov_type count, int frequency)
 {
   gcc_checking_assert (fun);
   if (profile_status_for_function (fun) == PROFILE_READ)
     {
-      if ((bb->count * 4 + profile_info->runs / 2) / profile_info->runs > 0)
+      if ((count * 4 + profile_info->runs / 2) / profile_info->runs > 0)
 	return false;
-      if (!bb->frequency)
-	return true;
-      if (!ENTRY_BLOCK_PTR->frequency)
-	return false;
-      if (ENTRY_BLOCK_PTR->count && ENTRY_BLOCK_PTR->count < REG_BR_PROB_BASE)
-	{
-	  return (RDIV (bb->frequency * ENTRY_BLOCK_PTR->count,
-			ENTRY_BLOCK_PTR->frequency)
-		  < REG_BR_PROB_BASE / 4);
-	}
+      // If this is a profiled function (entry bb non-zero count), then base
+      // the coldness decision on the frequency.  This will handle cases where
+      // counts are not updated properly during optimizations or expansion.
+      if (ENTRY_BLOCK_PTR->count)
+	return frequency == 0;
+      // Unprofiled function, frequencies statically assigned.  All bbs are
+      // treated as cold.
       return true;
     }
   if ((!profile_info || !flag_branch_probabilities)
@@ -256,19 +256,21 @@ maybe_hot_edge_p (edge e)
 }
 
+/* Return true in case BB is probably never executed.  */
+
+bool
+probably_never_executed_bb_p (struct function *fun, const_basic_block bb)
+{
+  return probably_never_executed (fun, bb->count, bb->frequency);
+}
+
+
 /* Return true in case edge E is probably never executed.  */
 bool
 probably_never_executed_edge_p (struct function *fun, edge e)
 {
-  gcc_checking_assert (fun);
-  if (profile_info && flag_branch_probabilities)
-    return ((e->count + profile_info->runs / 2) / profile_info->runs) == 0;
-  if ((!profile_info || !flag_branch_probabilities)
-      && (cgraph_get_node (fun->decl)->frequency
-	  == NODE_FREQUENCY_UNLIKELY_EXECUTED))
-    return true;
-  return false;
+  return probably_never_executed (fun, e->count, EDGE_FREQUENCY (e));
 }
 
 /* Return true if NODE should be optimized for size.  */