Date: Mon, 16 Sep 2019 18:39:14 +0200
From: Jan Hubicka
To: gcc-patches@gcc.gnu.org
Subject: -O2 inliner returning 1/n: reduce EARLY_INLINING_INSNS for O1 and O2
Message-ID: <20190916163914.macczibmd54rjqri@kam.mff.cuni.cz>

Hi,
as discussed at the Cauldron this week, I plan to push out changes enabling
-finline-functions at -O2 with limited parameters, aiming for better overall
performance without large code size increases.

Currently we aggressively inline functions declared inline, we inline when
the function size is expected to shrink, and the early inliner also does
limited auto-inlining of non-inline functions even if the code grows.  The
latter is controlled by PARAM_EARLY_INLINING_INSNS.  This patch tunes it down
for -O1 and -O2 in order to leave some room for the real IPA inliner to do
its work.
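The cases in the preceding paragraph can be pictured with a toy C model of the early inliner's decision for a non-inline callee. This is not GCC's want_early_inline_function_p. The enum, the function names, and treating `growth <= 0` as "the body is expected to shrink" are assumptions for illustration; the real code first accepts any growth up to PARAM_MAX_INLINE_INSNS_SIZE, as the diff shows. Only the two default limits come from the params.def hunk of this patch: 14 for -O3/-Ofast and 6 for -O1/-O2.

```c
#include <assert.h>

/* Default limits taken from the params.def hunk of this patch.  */
#define EARLY_INLINING_INSNS     14 /* used at -O3 and -Ofast */
#define EARLY_INLINING_INSNS_O2   6 /* used at -O1 and -O2 */

/* Hypothetical outcome of early inlining for a non-inline callee.  */
enum early_inline_outcome
{
  EI_SHRINKS,        /* inlining is expected to shrink the caller */
  EI_LIMITED_GROWTH, /* auto-inlined although the caller grows */
  EI_REJECTED        /* growth exceeds the early-inlining budget */
};

/* Growth budget for a caller compiled at optimization level OPTIMIZE.  */
static int
early_inlining_limit (int optimize)
{
  return optimize >= 3 ? EARLY_INLINING_INSNS : EARLY_INLINING_INSNS_O2;
}

/* Toy model: GROWTH is the estimated change in caller size in insns,
   zero or negative when inlining is expected to shrink it.  */
static enum early_inline_outcome
classify_early_inline (int growth, int optimize)
{
  assert (optimize >= 1);
  if (growth <= 0)
    return EI_SHRINKS;
  if (growth <= early_inlining_limit (optimize))
    return EI_LIMITED_GROWTH;
  return EI_REJECTED;
}
```

Consider a callee whose inlining adds 10 insns. `classify_early_inline (10, 3)` gives EI_LIMITED_GROWTH because 10 <= 14. `classify_early_inline (10, 2)` gives EI_REJECTED because 10 > 6, so that call is left for the IPA inliner to decide.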
The combined effect of my changes is shown in
https://lnt.opensuse.org/db_default/v4/SPEC/latest_runs_report?younger_in_days=14&older_in_days=0&all_elf_detail_stats=on&min_percentage_change=0.001&revisions=ddee20190fa78935338bc3161c1b29b8528d82dd%2C9b247ee17d1030b88462531225cc842251507bb6

This involves further forking the inline-insns-auto, inline-insns-single and
big-speedup params.  Generally I was able to improve most SPEC 2006 and 2017
scores, as follows:

O2 Kabylake
  SPEC/SPEC2006/INT/total   0.58%
  SPEC/SPEC2006/FP/total    0.19%
  SPEC/SPEC2017/FP/total    0.45%
  SPEC/SPEC2017/INT/total   0.18%
O2 LTO Kabylake
  SPEC/SPEC2006/INT/total   1.08%
  SPEC/SPEC2006/FP/total    0.60%
O2 Zen
  SPEC/SPEC2006/INT/total   1.64%
  SPEC/SPEC2006/FP/total    0.23%
  SPEC/SPEC2017/INT/total  -0.58%
  SPEC/SPEC2017/FP/total    0.52%
O2 Zen LTO
  SPEC/SPEC2006/FP/total    1.40%
  SPEC/SPEC2006/INT/total   1.26%
  SPEC/SPEC2017/INT/total   0.93%
  SPEC/SPEC2017/FP/total   -0.22%

The SPEC2017 FP score on Zen is affected by a 10% regression on CactusBSSN.
This seems to be due to microarchitectural behaviour that depends on code
layout, rather than to any inlining changes in hot parts of the program.
Another notable regression is omnetpp, which also shows up on Zen only.

Comparing the Zen and Kabylake results, it seems that the only consistent
losers are gcc (3%) and xalancbmk (2.8%), both with non-LTO only.  I plan to
investigate those if the regression persists, even though it is a bit small
and there is no obvious problem in the backtrace.

Code size improves by 0.67% for SPEC2006 non-LTO and regresses by 1.64% with
LTO.  For SPEC2017 it is a 2.2% improvement and a 2.4% regression,
respectively.  The difference between LTO and non-LTO is mostly due to the
fact that LTO units tend to hit the overall unit growth cap of inlining,
since there are too many inline candidates.  For this reason the patch is
not as effective on Firefox and other really big packages as I would like.
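The LTO explanation above can be illustrated with a hypothetical greedy loop. The claim is that a whole-program unit exhausts its overall growth budget because it has many more inline candidates. This is not GCC's IPA inliner. The following are assumptions, loosely modeled on the idea behind GCC's `--param inline-unit-growth`:

- the function name;
- the candidate array, assumed already ordered by expected benefit;
- the rule that the unit may grow by at most `unit_growth_percent` percent of its original size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a unit-wide growth cap: walk the candidates
   (assumed pre-sorted by benefit) and accept each one whose growth still
   fits in the remaining budget.  Returns the number accepted.  */
static size_t
inline_under_unit_cap (long unit_size, int unit_growth_percent,
                       const long *growth, size_t n_candidates)
{
  assert (unit_size > 0 && unit_growth_percent >= 0);
  long max_size = unit_size + unit_size * unit_growth_percent / 100;
  long size = unit_size;
  size_t accepted = 0;

  for (size_t i = 0; i < n_candidates; i++)
    {
      if (size + growth[i] > max_size)
        continue; /* would break the cap: skip this candidate */
      size += growth[i];
      accepted++;
    }
  return accepted;
}
```

Take a 1000-insn unit with a 20% cap: there is room for 200 insns of growth. Only four of five 50-insn candidates are accepted, and the fifth is skipped however profitable it would be.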
I still plan a number of changes to the inliner this stage1, so this is not
the final situation.  Still, I think it is better to do the change early so
it gets tested on other architectures (and, as I understand it, that was the
consensus of the Cauldron discussion).

This patch does not enable -finline-functions, so it will temporarily
regress performance (and improve code size).  I am doing this in incremental
steps to get more data on both inliners.

Bootstrapped/regtested x86_64-linux; I plan to commit it later today.

Honza

        * ipa-inline.c (want_early_inline_function_p): Use
        PARAM_EARLY_INLINING_INSNS_O2.
        * params.def (PARAM_EARLY_INLINING_INSNS_O2): New.
        (PARAM_EARLY_INLINING_INSNS): Update documentation.
        * doc/invoke.texi (early-inlining-insns-O2): New.
        (early-inlining-insns): Update documentation.

Index: ipa-inline.c
===================================================================
--- ipa-inline.c	(revision 275716)
+++ ipa-inline.c	(working copy)
@@ -641,6 +641,10 @@ want_early_inline_function_p (struct cgr
     {
       int growth = estimate_edge_growth (e);
       int n;
+      int early_inlining_insns = opt_for_fn (e->caller->decl, optimize) >= 3
+                                 ? PARAM_VALUE (PARAM_EARLY_INLINING_INSNS)
+                                 : PARAM_VALUE (PARAM_EARLY_INLINING_INSNS_O2);
+
 
       if (growth <= PARAM_VALUE (PARAM_MAX_INLINE_INSNS_SIZE))
         ;
@@ -654,26 +658,28 @@ want_early_inline_function_p (struct cgr
                              growth);
           want_inline = false;
         }
-      else if (growth > PARAM_VALUE (PARAM_EARLY_INLINING_INSNS))
+      else if (growth > early_inlining_insns)
         {
           if (dump_enabled_p ())
             dump_printf_loc (MSG_MISSED_OPTIMIZATION, e->call_stmt,
                              "  will not early inline: %C->%C, "
-                             "growth %i exceeds --param early-inlining-insns\n",
-                             e->caller, callee,
-                             growth);
+                             "growth %i exceeds --param early-inlining-insns%s\n",
+                             e->caller, callee, growth,
+                             opt_for_fn (e->caller->decl, optimize) >= 3
+                             ? "" : "-O2");
           want_inline = false;
         }
       else if ((n = num_calls (callee)) != 0
-               && growth * (n + 1) > PARAM_VALUE (PARAM_EARLY_INLINING_INSNS))
+               && growth * (n + 1) > early_inlining_insns)
         {
           if (dump_enabled_p ())
             dump_printf_loc (MSG_MISSED_OPTIMIZATION, e->call_stmt,
                              "  will not early inline: %C->%C, "
-                             "growth %i exceeds --param early-inlining-insns "
+                             "growth %i exceeds --param early-inlining-insns%s "
                              "divided by number of calls\n",
-                             e->caller, callee,
-                             growth);
+                             e->caller, callee, growth,
+                             opt_for_fn (e->caller->decl, optimize) >= 3
+                             ? "" : "-O2");
           want_inline = false;
         }
     }
Index: params.def
===================================================================
--- params.def	(revision 275716)
+++ params.def	(working copy)
@@ -233,8 +233,12 @@ DEFPARAM(PARAM_IPCP_UNIT_GROWTH,
          10, 0, 0)
 DEFPARAM(PARAM_EARLY_INLINING_INSNS,
          "early-inlining-insns",
-         "Maximal estimated growth of function body caused by early inlining of single call.",
+         "Maximal estimated growth of function body caused by early inlining of single call with -O3 and -Ofast.",
          14, 0, 0)
+DEFPARAM(PARAM_EARLY_INLINING_INSNS_O2,
+         "early-inlining-insns-O2",
+         "Maximal estimated growth of function body caused by early inlining of single call with -O1 and -O2.",
+         6, 0, 0)
 DEFPARAM(PARAM_LARGE_STACK_FRAME,
          "large-stack-frame",
          "The size of stack frame to be considered large.",
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 275716)
+++ doc/invoke.texi	(working copy)
@@ -11290,9 +11290,17 @@ recursion depth can be guessed from the
 via a given call expression.  This parameter limits inlining only to call
 expressions whose probability exceeds the given threshold (in percents).
 
+@item early-inlining-insns-O2
+Specify growth that the early inliner can make.  In effect it increases
+the amount of inlining for code having a large abstraction penalty.
+This is applied to functions compiled with @option{-O1} or @option{-O2}
+optimization levels.
+
 @item early-inlining-insns
 Specify growth that the early inliner can make.  In effect it increases
 the amount of inlining for code having a large abstraction penalty.
+This is applied to functions compiled with @option{-O3} or @option{-Ofast}
+optimization levels.
 
 @item max-early-inliner-iterations
 Limit of iterations of the early inliner.  This basically bounds
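The ipa-inline.c hunk can be restated as a standalone C function. It shows how the per-level limit feeds both growth tests and the new "-O2" suffix in the dump messages. This is a simplified sketch, not GCC code:

- plain `int` arguments stand in for estimate_edge_growth (e), num_calls (callee) and opt_for_fn (e->caller->decl, optimize);
- printf replaces dump_printf_loc;
- the checks that precede these two tests in want_early_inline_function_p are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Defaults from the params.def hunk of this patch.  */
enum { EARLY_INLINING_INSNS = 14, EARLY_INLINING_INSNS_O2 = 6 };

/* Returns true if the two growth tests rewritten by the patch allow early
   inlining.  GROWTH stands for estimate_edge_growth (e), N_CALLS for
   num_calls (callee), OPTIMIZE for the caller's optimization level.  */
static bool
early_growth_ok (int growth, int n_calls, int optimize)
{
  assert (n_calls >= 0);
  /* Same selection as the new early_inlining_insns local.  */
  int limit = optimize >= 3 ? EARLY_INLINING_INSNS : EARLY_INLINING_INSNS_O2;
  /* Names the param that actually applied, as the new %s does.  */
  const char *suffix = optimize >= 3 ? "" : "-O2";

  if (growth > limit)
    {
      printf ("will not early inline: growth %i exceeds "
              "--param early-inlining-insns%s\n", growth, suffix);
      return false;
    }
  /* Growth must also not exceed the limit divided by n_calls + 1.  */
  if (n_calls != 0 && growth * (n_calls + 1) > limit)
    {
      printf ("will not early inline: growth %i exceeds "
              "--param early-inlining-insns%s divided by number of calls\n",
              growth, suffix);
      return false;
    }
  return true;
}
```

For example, take growth 4 and one call in the callee, so 4 * 2 = 8. This passes at -O3 (8 <= 14) but fails at -O2 (8 > 6), and the message names early-inlining-insns-O2. Growth 3 with one call gives exactly 6 and is still accepted at -O2.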