From patchwork Mon Sep 16 16:39:14 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jan Hubicka <hubicka@ucw.cz>
X-Patchwork-Id: 1163002
Return-Path: 
 <gcc-patches-return-509061-incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org;
	spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org
	(client-ip=209.132.180.131; helo=sourceware.org;
	envelope-from=gcc-patches-return-509061-incoming=patchwork.ozlabs.org@gcc.gnu.org;
	receiver=<UNKNOWN>)
Authentication-Results: ozlabs.org;
	dmarc=none (p=none dis=none) header.from=ucw.cz
Authentication-Results: ozlabs.org; dkim=pass (1024-bit key;
	unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org
	header.b="a/ovPXCT"; dkim-atps=neutral
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
	bits)) (No client certificate requested)
	by ozlabs.org (Postfix) with ESMTPS id 46XBm76Hc1z9sPk
	for <incoming@patchwork.ozlabs.org>;
	Tue, 17 Sep 2019 02:39:34 +1000 (AEST)
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:date
	:from:to:subject:message-id:mime-version:content-type; q=dns; s=
	default; b=Vuzl84PJoxImf7GFJTLQ8pLSEYBz8N7oPARbTPFc3wlz+FV8GIFLn
	hTYIV0F8olb/Bgkdf9xGP6aO+9f4FvLC1dkip0D4wMCQ2n8eIWnf5Cx2OGTF7FpI
	nUic73IxYE/oAriZGUarScRQgZTTtfao8C3tf2/JzZ8C1HffuvBYqw=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:date
	:from:to:subject:message-id:mime-version:content-type; s=
	default; bh=uo94Irid4yqH8PmREpGZ3iYk37Y=; b=a/ovPXCTqRVXvKRxKaQL
	HW/Hktru4/F3y/6KX6fvhJA5/OxK1HY5gH2lV9sVg0ucax+352VIUlMOJq3u2JVf
	D8iofsNQ8WWOkMFxQ5zZWvwxTRFo/7KPJTpUUbaeMZ9wGXn8DQZPw2Ge0cQbqvLX
	EsLPh6yvFShGBT585cSLsF0=
Received: (qmail 40966 invoked by alias); 16 Sep 2019 16:39:26 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: 
 <mailto:gcc-patches-unsubscribe-incoming=patchwork.ozlabs.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 40600 invoked by uid 89); 16 Sep 2019 16:39:26 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-9.5 required=5.0 tests=AWL, BAYES_00,
	GIT_PATCH_2, GIT_PATCH_3, KAM_ASCII_DIVIDERS,
	KAM_NUMSUBJECT autolearn=ham version=3.3.1 spammy=aiming,
	shrink, investigate, 058
X-HELO: nikam.ms.mff.cuni.cz
Received: from nikam.ms.mff.cuni.cz (HELO nikam.ms.mff.cuni.cz)
	(195.113.20.16) by sourceware.org
	(qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP;
	Mon, 16 Sep 2019 16:39:17 +0000
Received: by nikam.ms.mff.cuni.cz (Postfix, from userid 16202)	id
	7CE65281EDC; Mon, 16 Sep 2019 18:39:14 +0200 (CEST)
Date: Mon, 16 Sep 2019 18:39:14 +0200
From: Jan Hubicka <hubicka@ucw.cz>
To: gcc-patches@gcc.gnu.org
Subject: -O2 inliner returning 1/n: reduce EARLY_INLINING_INSNS for O1 and O2
Message-ID: <20190916163914.macczibmd54rjqri@kam.mff.cuni.cz>
MIME-Version: 1.0
Content-Disposition: inline
User-Agent: NeoMutt/20170113 (1.7.2)

Hi,
as discussed at Cauldron this week, I plan to push out changes enabling
-finline-functions at -O2 with limited parameters, aiming for overall
better performance without large code size increases.

Currently we aggressively inline functions declared inline, we inline
when the function size is expected to shrink, and we also do limited
auto-inlining in the early inliner for non-inline functions even if code
grows.  The last case is controlled by PARAM_EARLY_INLINING_INSNS.

This patch tunes it down for -O2 in order to leave some room for the real
IPA inliner to do its work.
The combined effect of my changes is shown at
https://lnt.opensuse.org/db_default/v4/SPEC/latest_runs_report?younger_in_days=14&older_in_days=0&all_elf_detail_stats=on&min_percentage_change=0.001&revisions=ddee20190fa78935338bc3161c1b29b8528d82dd%2C9b247ee17d1030b88462531225cc842251507bb6

This involves further forking of the inline-insns-auto, inline-insns-single
and big-speedup params.
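
For illustration, the growth checks touched by the ipa-inline.c hunk below
can be modelled as a small standalone C sketch.  This is a simplified model,
not the GCC code: early_inlining_limit and want_early_inline are hypothetical
names, max_inline_insns_size stands in for PARAM_MAX_INLINE_INSNS_SIZE (whose
value is not shown in this patch), and the other checks made by
want_early_inline_function_p are omitted.  The limits 14 and 6 are the
defaults from the params.def hunk.

```c
#include <stdbool.h>

/* Default limits taken from the params.def hunk of this patch.  */
enum { EARLY_INLINING_INSNS = 14, EARLY_INLINING_INSNS_O2 = 6 };

/* Hypothetical helper: pick the growth limit from the caller's
   optimization level, mirroring the opt_for_fn (...) >= 3 test.  */
static int
early_inlining_limit (int optimize)
{
  return optimize >= 3 ? EARLY_INLINING_INSNS : EARLY_INLINING_INSNS_O2;
}

/* Simplified model of the growth checks only.  GROWTH is the estimated
   growth of the caller, N_CALLS the number of calls in the callee, and
   MAX_INLINE_INSNS_SIZE an assumed stand-in for the size-based limit.  */
static bool
want_early_inline (int optimize, int growth, int n_calls,
		   int max_inline_insns_size)
{
  int limit = early_inlining_limit (optimize);

  /* Small enough growth is always accepted.  */
  if (growth <= max_inline_insns_size)
    return true;
  /* Growth of a single call exceeds the per-level limit.  */
  if (growth > limit)
    return false;
  /* Callee contains calls itself: scale growth by (n + 1).  */
  if (n_calls != 0 && growth * (n_calls + 1) > limit)
    return false;
  return true;
}
```

Under these assumptions, a call with estimated growth 10 and no calls in the
callee is accepted at -O3 (10 <= 14) but rejected at -O2 (10 > 6), which is
the room this patch leaves for the IPA inliner.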

Generally I was able to mostly improve SPEC 2006 and 2017 scores as
follows:

O2 Kabylake
SPEC/SPEC2006/INT/total     0.58%
SPEC/SPEC2006/FP/total      0.19%
SPEC/SPEC2017/FP/total      0.45%
SPEC/SPEC2017/INT/total     0.18%

O2 LTO Kabylake
SPEC/SPEC2006/INT/total     1.08%
SPEC/SPEC2006/FP/total      0.60%

O2 Zen
SPEC/SPEC2006/INT/total     1.64%
SPEC/SPEC2006/FP/total      0.23%
SPEC/SPEC2017/INT/total    -0.58%
SPEC/SPEC2017/FP/total      0.52%

O2 Zen LTO
SPEC/SPEC2006/FP/total      1.40%
SPEC/SPEC2006/INT/total     1.26%
SPEC/SPEC2017/INT/total     0.93%
SPEC/SPEC2017/FP/total     -0.22%

The SPEC2017 FP score on Zen is affected by a 10% regression on CactusBSSN
that seems to be due to microarchitectural behaviour depending on code layout
rather than any inlining changes in hot parts of the program.  The other
notable regression is omnetpp, which also shows on Zen only.  Comparing the
Zen and Kabylake results, it seems that the only consistent losers are gcc
(3%) and xalancbmk (2.8%), both with non-LTO only.  I plan to investigate
those if the regressions persist, even though they are a bit small and there
is no obvious problem in the backtrace.

Code size improves by 0.67% for SPEC2006 non-LTO and regresses by 1.64% with
LTO.  For SPEC2017 it is a 2.2% improvement and a 2.4% regression respectively.

The difference between LTO and non-LTO is mostly due to the fact that LTO
units tend to hit the overall unit growth cap of inlining, since there are
too many inline candidates.  For this reason the patch is not as
effective on Firefox and other really big packages as I would like.  I
still plan a number of changes to the inliner this stage1, so this is not the
final situation, but I think it is better to make the change early so it gets
tested on other architectures.  (And that was the consensus of the Cauldron
discussion, by my understanding.)

This patch does not enable -finline-functions, so it will temporarily
regress performance (and improve code size).  I am doing this in
incremental steps to get more data on both inliners.

Bootstrapped/regtested x86_64-linux, plan to commit it later today.

Honza

	* ipa-inline.c (want_early_inline_function_p): Use
	PARAM_EARLY_INLINING_INSNS_O2.
	* params.def (PARAM_EARLY_INLINING_INSNS_O2): New.
	(PARAM_EARLY_INLINING_INSNS): Update documentation.
	* invoke.texi (early-inlining-insns-O2): New.
	(early-inlining-insns): Update documentation.

Index: ipa-inline.c
===================================================================
--- ipa-inline.c	(revision 275716)
+++ ipa-inline.c	(working copy)
@@ -641,6 +641,10 @@ want_early_inline_function_p (struct cgr
     {
       int growth = estimate_edge_growth (e);
       int n;
+      int early_inlining_insns = opt_for_fn (e->caller->decl, optimize) >= 3
+				 ? PARAM_VALUE (PARAM_EARLY_INLINING_INSNS)
+				 : PARAM_VALUE (PARAM_EARLY_INLINING_INSNS_O2);
+
 
       if (growth <= PARAM_VALUE (PARAM_MAX_INLINE_INSNS_SIZE))
 	;
@@ -654,26 +658,28 @@ want_early_inline_function_p (struct cgr
 			     growth);
 	  want_inline = false;
 	}
-      else if (growth > PARAM_VALUE (PARAM_EARLY_INLINING_INSNS))
+      else if (growth > early_inlining_insns)
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, e->call_stmt,
 			     "  will not early inline: %C->%C, "
-			     "growth %i exceeds --param early-inlining-insns\n",
-			     e->caller, callee,
-			     growth);
+			     "growth %i exceeds --param early-inlining-insns%s\n",
+			     e->caller, callee, growth,
+			     opt_for_fn (e->caller->decl, optimize) >= 3
+			     ? "" : "-O2");
 	  want_inline = false;
 	}
       else if ((n = num_calls (callee)) != 0
-	       && growth * (n + 1) > PARAM_VALUE (PARAM_EARLY_INLINING_INSNS))
+	       && growth * (n + 1) > early_inlining_insns)
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, e->call_stmt,
 			     "  will not early inline: %C->%C, "
-			     "growth %i exceeds --param early-inlining-insns "
+			     "growth %i exceeds --param early-inlining-insns%s "
 			     "divided by number of calls\n",
-			     e->caller, callee,
-			     growth);
+			     e->caller, callee, growth,
+			     opt_for_fn (e->caller->decl, optimize) >= 3
+			     ? "" : "-O2");
 	  want_inline = false;
 	}
     }
Index: params.def
===================================================================
--- params.def	(revision 275716)
+++ params.def	(working copy)
@@ -233,8 +233,12 @@ DEFPARAM(PARAM_IPCP_UNIT_GROWTH,
 	 10, 0, 0)
 DEFPARAM(PARAM_EARLY_INLINING_INSNS,
 	 "early-inlining-insns",
-	 "Maximal estimated growth of function body caused by early inlining of single call.",
+	 "Maximal estimated growth of function body caused by early inlining of single call with -O3 and -Ofast.",
 	 14, 0, 0)
+DEFPARAM(PARAM_EARLY_INLINING_INSNS_O2,
+	 "early-inlining-insns-O2",
+	 "Maximal estimated growth of function body caused by early inlining of single call with -O1 and -O2.",
+	 6, 0, 0)
 DEFPARAM(PARAM_LARGE_STACK_FRAME,
 	 "large-stack-frame",
 	 "The size of stack frame to be considered large.",
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 275716)
+++ doc/invoke.texi	(working copy)
@@ -11290,9 +11290,17 @@ recursion depth can be guessed from the
 via a given call expression.  This parameter limits inlining only to call
 expressions whose probability exceeds the given threshold (in percents).
 
+@item early-inlining-insns-O2
+Specify growth that the early inliner can make.  In effect it increases
+the amount of inlining for code having a large abstraction penalty.
+This is applied to functions compiled with @option{-O1} or @option{-O2}
+optimization levels.
+
 @item early-inlining-insns
 Specify growth that the early inliner can make.  In effect it increases
 the amount of inlining for code having a large abstraction penalty.
+This is applied to functions compiled with @option{-O3} or @option{-Ofast}
+optimization levels.
 
 @item max-early-inliner-iterations
 Limit of iterations of the early inliner.  This basically bounds