From patchwork Fri Sep 6 10:58:46 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Graham Markall X-Patchwork-Id: 1158944 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-508471-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=embecosm.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="sXNjbxpJ"; dkim=pass (2048-bit key; unprotected) header.d=embecosm.com header.i=@embecosm.com header.b="dOy4Iou2"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 46Pvh51hKvz9s7T for ; Fri, 6 Sep 2019 20:59:15 +1000 (AEST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id; q=dns; s=default; b=GcpfmKpZK/tC +ZjOfS19SU9PKXY5Sd26N8TS2Fd4MTwaQpw3oyTXDv5C/VUB4YDweope+qjPuRe/ 5vAdk7hbQdHBVCq+iU0keaUCxi4taH0xhRoWusCoLYvgikpJXJYRjhNpDPqhGKh6 OkoDvAn8Cf4GbN13L4LFI+/Kj1wP+zI= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id; s=default; bh=RzMSNwb11Xpd28JC0L /U1NXgSQ0=; b=sXNjbxpJ64JDPA7rsUbqGN+YoS0+X7LE8iL/8kCtZKlKGOIDt/ fc+RiOElIL6VykZxz2ZYJWPm4ws5+V1VUbZhNuQvK7rsmAbJidUwliXnE9EiskI1 Sn2Dwr/xD0RtR1kGiOHbEUy68XexNvongy91TGdBLep8IS83xo8O43ZV8= Received: (qmail 42232 invoked by alias); 6 Sep 2019 10:59:08 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 42224 invoked by uid 89); 6 Sep 2019 10:59:08 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-19.4 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, RCVD_IN_DNSWL_NONE, SPF_PASS autolearn=ham version=3.3.1 spammy=8444, 020, positively, presenting X-HELO: mail-wr1-f66.google.com Received: from mail-wr1-f66.google.com (HELO mail-wr1-f66.google.com) (209.85.221.66) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 06 Sep 2019 10:59:05 +0000 Received: by mail-wr1-f66.google.com with SMTP id s18so6111744wrn.1 for ; Fri, 06 Sep 2019 03:59:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=embecosm.com; s=google; h=from:to:cc:subject:date:message-id; bh=vuSPgVrZdvl4P+IhxAZY3BOFu2s/NYvF+dtElIo3PAk=; b=dOy4Iou2my1gOOwt0/yIDPwUNZE62cQgOFuGl14d8OicjSaN2lIoci5jzJJk6xKfEH F8rGYyYNHB7V+aKnzQp0B8LPzJ6OP7YlN+2bo5GIrMjh06Yt/4wKISBv+fpI+jgCu5uK vzdO2N36GUudcexatwy4vTXus2vsxCLVqiuysLL54uJlxypOiiD26KfkpQPz5efz8jtA dxggDd+oG61VC/x0aouejUYlhYAOelSQCzzk4842eFs1ePWYFDU7JpxcAH32E6BjvgZk j6EdYXoQmH4BX1yWKAr2UWnpxT3Rglm0wwMiwXMCOWJF/Ni/ZDuZ0YQBF1sGsnXuRJEy jUpw== Received: from carrot.localdomain (cust64-dsl91-135-5.idnet.net. [91.135.5.64]) by smtp.gmail.com with ESMTPSA id h2sm8483002wrb.31.2019.09.06.03.59.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 06 Sep 2019 03:59:02 -0700 (PDT) From: Graham Markall To: gcc-patches@gcc.gnu.org Cc: ofer.shinaar@wdc.com, nidal.faour@wdc.com, Graham Markall Subject: [RFC PATCH] Add inlining growth bias flag Date: Fri, 6 Sep 2019 11:58:46 +0100 Message-Id: <20190906105846.1123-1-graham.markall@embecosm.com> X-IsSubscribed: yes This patch is an RFC to invite comments as to whether the approach to solving the problem is a suitable one for upstream GCC, or whether there are alternative approaches that would be recommended. Motivation ---------- We have observed that in some cases the estimation of callee growth for inlining particular functions can be tuned for better overall code size with particular programs on particular targets. Although modification of the heuristics to make a general improvement is a difficult problem to tackle, simply biasing the growth by a fixed amount can lead to improvements in code size within the context of a particular program. This has first been tested on a proprietary program, where setting the growth bias to -2 resulted in a saving of 1.35% in the text section size (62396 bytes as opposed to 63252 bytes). Using the Embench suite ( https://www.embench.org/ ) for a riscv32 bare-metal target with -Os also shows that adjusting the inline growth estimate carefully can also reduce code size. Results presented here are percentages relative to the Embench reference platform (which is the standard way of presenting Embench results) for values of the bias from -2 to 2: Benchmark -2 -1 0 1 2 aha-mont64 99.05 99.05 99.05 99.05 99.05 crc32 100.00 100.00 100.00 100.00 100.00 cubic 94.66 94.66 94.66 94.66 94.66 edn 100.00 100.00 100.00 100.00 100.00 huffbench 99.88 99.88 99.88 99.88 99.88 matmult-int 100.00 100.00 100.00 102.86 102.86 minver 97.96 97.96 97.96 106.88 106.88 nbody 98.87 98.87 98.87 98.87 98.87 nettle-aes 99.31 99.10 99.10 99.10 99.10 nettle-sha256 99.89 99.89 99.89 99.89 99.89 nsichneu 100.00 100.00 100.00 100.00 100.00 picojpeg 102.56 102.54 99.73 99.38 99.28 qrduino 99.70 99.70 99.90 99.90 99.90 sglib-combined 95.52 100.00 100.00 100.43 101.12 slre 100.66 100.66 99.09 98.85 98.85 st 96.36 96.36 96.36 96.82 96.82 statemate 100.00 100.00 100.00 100.00 100.00 ud 100.00 100.00 100.00 100.00 100.00 wikisort 99.48 99.48 99.48 99.48 99.48 Mean 99.14 99.36 99.15 99.77 99.80 In most cases, leaving the bias at 0 (unmodified) produces the smallest code. However, there are some cases where an alternative value prevails, The "Best Diff" column shows the reduction in size compared to leaving the bias at 0, for cases where changing it yielded an improvement: Benchmark Best Worst Best diff aha-mont64 0 0 crc32 0 0 cubic 0 0 edn 0 0 huffbench 0 0 matmult-int 0 1 / 2 minver 0 1 / 2 nbody 0 0 nettle-aes 0 -2 nettle-sha256 0 0 nsichneu 0 0 picojpeg 2 -2 -0.45% qrduino -1 /-2 0 -0.20% sglib-combined -2 1 -4.48% slre 1 / 2 -1 / -2 -0.24% st 0 1 / 2 statemate 0 0 ud 0 0 wikisort 0 0 In summary, for this test setup: - In most cases, leaving the growth bias at 0 is optimal. - Biasing the growth estimate positively may either increase or decrease code size. - Biasing the estimate negatively may also either increase or decrease code size. - In a small number of cases, biasing the growth estimate improves code size (up to 4.48% smaller for sglib-combined). Patch ----- The growth bias can be set with a flag: -finline-growth-bias= which controls the bias (an increase or decrease) of the inline growth in ipa-inline.c. In cases where the bias would result in a negative function size, we clip the growth estimate so that adding the growth to the size of the function results in a size of 0, by setting the growth to the negative of the function size. There is not a great deal of validation of the argument - if a non-integer is passed as the parameter (e.g. -finline-growth-bias=xyz), it will be as if the parameter were 0. More validation could be added if necessary. It seemed to me that GCC's infrastructure doesn't seem to anticipate option values / parameters that could contain negative values. For parameters, -1 seemed to represent an error and could result in an ICE (-2 etc. pass through OK though). For options, I looked at the UInteger and Host_Wide_Int numeric types, but they both expect a positive integer. I did consider extending this with an Integer type that could accept positive and negative integers, (e.g. starting with augmenting switch_bit_fields in opt-functions.awk) but rooting out assumptions of non-negative integer values in places further down seemed like an onerous task. Further discussion ------------------ Given that a default setting of 0 (equivalent to current behaviour) should provide for some potential improvement in code size in individual situations where it is critical (albeit using a rather coarse instrument to do so), without affecting the general case - is the addition of such a flag to GCC likely to be considered an acceptable inclusion for this purpose? gcc/ChangeLog: * common.opt: Add -finline-growth-bias= flag. * ipa-inline.h (estimate_edge_growth): Adjust inline growth by bias factor, if supplied. --- gcc/common.opt | 4 ++++ gcc/ipa-inline.h | 39 ++++++++++++++++++++++++++++++++++++++- 2 files changed, 42 insertions(+), 1 deletion(-) diff --git a/gcc/common.opt b/gcc/common.opt index c1605385712..4725df2c477 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -1670,6 +1670,10 @@ finline-limit= Common RejectNegative Joined UInteger -finline-limit= Limit the size of inlined functions to . +finline-growth-bias= +Common RejectNegative Joined Var(flag_inline_growth_bias) Init(0) +-finline-growth-bias= Adjust the growth estimate for inlined functions by . + finline-atomics Common Report Var(flag_inline_atomics) Init(1) Optimization Inline __atomic operations when a lock free instruction sequence is available. diff --git a/gcc/ipa-inline.h b/gcc/ipa-inline.h index 18c8e1eebd0..65e105b00a2 100644 --- a/gcc/ipa-inline.h +++ b/gcc/ipa-inline.h @@ -84,7 +84,44 @@ estimate_edge_growth (struct cgraph_edge *edge) { ipa_call_summary *s = ipa_call_summaries->get (edge); gcc_checking_assert (s->call_stmt_size || !edge->callee->analyzed); - return (estimate_edge_size (edge) - s->call_stmt_size); + + if (dump_file) + { + const char *caller_name + = edge->caller->ultimate_alias_target ()->name (); + const char *callee_name + = edge->callee->ultimate_alias_target ()->name (); + + fprintf (dump_file, "Estimate edge growth for %s -> %s\n", + caller_name, callee_name); + fprintf (dump_file, "\t estimate_edge_size = %d\n", + estimate_edge_size (edge)); + fprintf (dump_file, "\t s (%p) ->call_stmt_size = %d\n", + ((void *) s), s->call_stmt_size); + } + + int growth = (estimate_edge_size (edge) - s->call_stmt_size); + + int sz = 0; + if (flag_inline_growth_bias) + sz = strtol(flag_inline_growth_bias, NULL, 0); + if (sz != 0) + { + struct cgraph_node *caller = edge->caller; + ipa_fn_summary *fs = ipa_fn_summaries->get (caller); + if (dump_file) + fprintf (dump_file, "\t Adjusting growth by %d\n", sz); + growth += sz; + if (growth < 0 && fs->size + growth < 0) { + if (dump_file) + fprintf (dump_file, "\t Growth %d would shrink size beneath zero.\n", growth); + growth = -fs->size; + if (dump_file) + fprintf (dump_file, "\t Setting growth to %d to make size zero.\n", growth); + } + } + + return growth; } /* Return estimated callee runtime increase after inlining