From patchwork Fri Nov 17 19:53:07 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jan Hubicka
X-Patchwork-Id: 839145
Date: Fri, 17 Nov 2017 20:53:07 +0100
From: Jan Hubicka
To: gcc-patches@gcc.gnu.org
Subject: Increase precision of static profiles
Message-ID: <20171117195307.GC15395@kam.mff.cuni.cz>

Hi,
this patch makes the static profile use the range 0...2^30 rather than
0...10000.  This is safe now because profile counts take care of possible
overflow when the profile ends up accumulating too high after inlining.

Two testcases need adjusting.  dump-2.c simply checks for a specific
counter value, which is now different.  pr77445-2 now gets one extra
mismatch reported.  The mismatch was present before too, but due to low
precision it was not visible.

Bootstrapped/regtested x86_64-linux, committed.

Honza

	* predict.c (determine_unlikely_bbs): Set cgraph node count to 0
	when entry block was promoted unlikely.
	(estimate_bb_frequencies): Increase frequency scale.
	* profile-count.h (profile_count): Export precision info.

	* gcc.dg/tree-ssa/dump-2.c: Fixup template for profile precision
	changes.
	* gcc.dg/tree-ssa/pr77445-2.c: Fixup template for profile precision
	changes.
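
To illustrate why the wider range matters, here is a minimal standalone sketch. It is not GCC code: scale_count and check_examples are hypothetical helpers, and the only values taken from the patch are n_bits = 61, the old maximum of 10000 and the new maximum 1 << (n_bits / 2). It scales a relative block frequency, given as a fraction of the hottest block, to an integer count under both maxima.

```cpp
#include <cassert>
#include <cstdint>

// Values taken from the patch description and diff.
constexpr int n_bits = 61;                                     // profile_count::n_bits
constexpr uint64_t old_max = 10000;                            // old static profile scale
constexpr uint64_t new_max = (uint64_t) 1 << (n_bits / 2);     // 2^30

// Hypothetical helper (not part of GCC): scale the relative frequency
// num/den (1/1 == hottest block) to an integer count with the given
// maximum, rounding to nearest.
constexpr uint64_t
scale_count (uint64_t num, uint64_t den, uint64_t freq_max)
{
  return (num * freq_max + den / 2) / den;
}

// Hypothetical self-check: a block executed once per million executions
// of the hottest block.
inline void
check_examples ()
{
  // On the old scale the cold block rounds down to zero ...
  assert (scale_count (1, 1000000, old_max) == 0);
  // ... while on the new scale it keeps a nonzero count.
  assert (scale_count (1, 1000000, new_max) == 1074);
}
```

On the old scale such a block is indistinguishable from one that never executes, which is presumably the kind of precision loss that hid the extra mismatch in pr77445-2.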
Index: predict.c
===================================================================
--- predict.c	(revision 254884)
+++ predict.c	(working copy)
@@ -3542,6 +3542,8 @@ determine_unlikely_bbs ()
 		     bb->index, e->dest->index);
 	  e->probability = profile_probability::never ();
 	}
+  if (ENTRY_BLOCK_PTR_FOR_FN (cfun)->count == profile_count::zero ())
+    cgraph_node::get (current_function_decl)->count = profile_count::zero ();
 }
 
 /* Estimate and propagate basic block frequencies using the given branch
@@ -3565,7 +3567,11 @@ estimate_bb_frequencies (bool force)
 	{
 	  real_values_initialized = 1;
 	  real_br_prob_base = REG_BR_PROB_BASE;
-	  real_bb_freq_max = BB_FREQ_MAX;
+	  /* Scaling frequencies up to maximal profile count may result in
+	     frequent overflows especially when inlining loops.
+	     Small scalling results in unnecesary precision loss.  Stay in
+	     the half of the (exponential) range.  */
+	  real_bb_freq_max = (uint64_t)1 << (profile_count::n_bits / 2);
 	  real_one_half = sreal (1, -1);
 	  real_inv_br_prob_base = sreal (1) / real_br_prob_base;
 	  real_almost_one = sreal (1) - real_inv_br_prob_base;
@@ -3610,6 +3616,8 @@ estimate_bb_frequencies (bool force)
 	  freq_max = BLOCK_INFO (bb)->frequency;
 
       freq_max = real_bb_freq_max / freq_max;
+      if (freq_max < 16)
+	freq_max = 16;
       cfun->cfg->count_max = profile_count::uninitialized ();
       FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR_FOR_FN (cfun), NULL, next_bb)
 	{
Index: profile-count.h
===================================================================
--- profile-count.h	(revision 254884)
+++ profile-count.h	(working copy)
@@ -605,11 +605,13 @@ class sreal;
 
 class GTY(()) profile_count
 {
+public:
   /* Use 62bit to hold basic block counters.  Should be at least
      64bit.  Although a counter cannot be negative, we use a signed
      type to hold various extra stages.  */
 
   static const int n_bits = 61;
+private:
   static const uint64_t max_count = ((uint64_t) 1 << n_bits) - 2;
   static const uint64_t uninitialized_count = ((uint64_t) 1 << n_bits) - 1;
 
Index: testsuite/gcc.dg/tree-ssa/dump-2.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/dump-2.c	(revision 254884)
+++ testsuite/gcc.dg/tree-ssa/dump-2.c	(working copy)
@@ -6,4 +6,4 @@ int f(void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "  \\\[local count: 10000\\\]:" "optimized" } } */
+/* { dg-final { scan-tree-dump "  \\\[local count: " "optimized" } } */
Index: testsuite/gcc.dg/tree-ssa/pr77445-2.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/pr77445-2.c	(revision 254884)
+++ testsuite/gcc.dg/tree-ssa/pr77445-2.c	(working copy)
@@ -120,7 +120,7 @@ enum STATES FMS( u8 **in , u32 *transiti
    profile estimation stage.  But the number of inconsistencies should not
    increase much.  */
 /* { dg-final { scan-tree-dump "Jumps threaded: 1\[1-9\]" "thread1" } } */
-/* { dg-final { scan-tree-dump-times "Invalid sum" 2 "thread1" } } */
+/* { dg-final { scan-tree-dump-times "Invalid sum" 3 "thread1" } } */
 /* { dg-final { scan-tree-dump-not "not considered" "thread1" } } */
 /* { dg-final { scan-tree-dump-not "not considered" "thread2" } } */
 /* { dg-final { scan-tree-dump-not "not considered" "thread3" } } */