From patchwork Fri Jun 16 11:46:41 2017
X-Patchwork-Submitter: Jan Hubicka
X-Patchwork-Id: 776673
Date: Fri, 16 Jun 2017 13:46:41 +0200
From: Jan Hubicka
To: gcc-patches@gcc.gnu.org
Subject: Make profile update in inlining more robust
Message-ID: <20170616114641.GB6166@kam.mff.cuni.cz>

Hi,
when inlining the last copy of a function, we assume that its counts are
already correct, because all other execution paths have been inlined
earlier.  While this is true in the simple case of static functions, it
can be wrong for comdats: we optimize out the last copy in this unit,
but during the training run we may have collected execution counts from
other units as well.  This can result in quite large profile
inconsistencies, so it is better to fix it.

Of course this is not perfect if an offline copy of the function is
produced later, because that one will have zero counts.  I do not think
there is an easy solution to this problem short of LTO, or possibly
arranging the profile counts of different copies of the same comdat to
be shared, which we have no infrastructure for.
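To make the inconsistency concrete, here is a small standalone sketch.
The simple_count type, its apply_scale helper, and the numbers are made
up for illustration only; this is not GCC's profile_count API:

#include <cstdint>
#include <cstdio>

/* Hypothetical stand-in for a profile count.  */
struct simple_count
{
  uint64_t value;

  /* Rescale this count by NUM/DEN, guarding against a zero denominator.  */
  simple_count apply_scale (simple_count num, simple_count den) const
  {
    if (den.value == 0)
      return {0};
    return {value * num.value / den.value};
  }
};

int
main ()
{
  /* Training run: the comdat body executed 1000 times in total across all
     units, but only 400 of those executions came through the call edge
     that this unit is now inlining.  */
  simple_count callee_body = {1000};   /* den: whole-comdat count */
  simple_count call_edge   = {400};    /* num: count of the inlined edge */

  /* Without scaling, the inlined body would keep the count 1000 even
     though the surrounding caller executes only 400 times; scaling by
     num/den removes that inconsistency.  */
  simple_count inlined_body = callee_body.apply_scale (call_edge, callee_body);
  printf ("inlined body count: %llu\n",
	  (unsigned long long) inlined_body.value);   /* prints 400 */
  return 0;
}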
Bootstrapped/regtested x86_64-linux, committed.

Honza

	* ipa-inline-transform.c (update_noncloned_frequencies): Update also
	counts.
	(clone_inlined_nodes): Update.

Index: ipa-inline-transform.c
===================================================================
--- ipa-inline-transform.c	(revision 249227)
+++ ipa-inline-transform.c	(working copy)
@@ -54,10 +54,12 @@ int nfunctions_inlined;
 /* Scale frequency of NODE edges by FREQ_SCALE.  */
 
 static void
-update_noncloned_frequencies (struct cgraph_node *node,
-			      int freq_scale)
+update_noncloned_frequencies (struct cgraph_node *node,
+			      int freq_scale, profile_count num,
+			      profile_count den)
 {
   struct cgraph_edge *e;
+  bool scale = (num == profile_count::zero () || den > 0);
 
   /* We do not want to ignore high loop nest after freq drops to 0.  */
   if (!freq_scale)
@@ -68,14 +70,20 @@ update_noncloned_frequencies (struct cgr
       if (e->frequency > CGRAPH_FREQ_MAX)
 	e->frequency = CGRAPH_FREQ_MAX;
       if (!e->inline_failed)
-	update_noncloned_frequencies (e->callee, freq_scale);
+	update_noncloned_frequencies (e->callee, freq_scale, num, den);
+      if (scale)
+	e->count = e->count.apply_scale (num, den);
     }
   for (e = node->indirect_calls; e; e = e->next_callee)
     {
       e->frequency = e->frequency * (gcov_type) freq_scale / CGRAPH_FREQ_BASE;
       if (e->frequency > CGRAPH_FREQ_MAX)
 	e->frequency = CGRAPH_FREQ_MAX;
+      if (scale)
+	e->count = e->count.apply_scale (num, den);
     }
+  if (scale)
+    node->count = node->count.apply_scale (num, den);
 }
 
 /* We removed or are going to remove the last call to NODE.
@@ -212,7 +220,8 @@ clone_inlined_nodes (struct cgraph_edge
 	}
       duplicate = false;
       e->callee->externally_visible = false;
-      update_noncloned_frequencies (e->callee, e->frequency);
+      update_noncloned_frequencies (e->callee, e->frequency,
+				    e->count, e->callee->count);
       dump_callgraph_transformation (e->callee, inlining_into,
 				     "inlining to");
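An aside on the new `scale' guard: as I read the patch (this reading and
the helper below are mine, not part of the commit), rescaling is applied
either when NUM is the zero count, so every count in the inlined body
collapses to zero anyway, or when DEN is a known positive count that is
safe to divide by; otherwise the counts are left untouched.  A hedged
standalone sketch of that predicate, using plain integers with a negative
value standing in for an unknown count (GCC's profile_count tracks such
states internally):

#include <cassert>
#include <cstdint>

/* Hypothetical stand-in for the patch's guard: scale when NUM is exactly
   zero (everything becomes zero regardless of DEN) or DEN is a known
   positive count.  */
static bool
want_scale (int64_t num, int64_t den)
{
  return num == 0 || den > 0;
}

int
main ()
{
  assert (want_scale (0, -1));     /* zero edge count: always safe to scale */
  assert (want_scale (400, 1000)); /* known positive denominator: scale */
  assert (!want_scale (400, 0));   /* nothing meaningful to divide by: skip */
  return 0;
}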