From patchwork Tue Apr 18 10:53:15 2017
X-Patchwork-Submitter: Bin Cheng
X-Patchwork-Id: 751790
From: Bin Cheng
To: "gcc-patches@gcc.gnu.org"
CC: nd
Subject: [PATCH GCC8][29/33]New register pressure estimation
Date: Tue, 18 Apr 2017 10:53:15 +0000

Hi,
Currently IVOPTs shares its register pressure computation with the RTL loop
invariant pass, which doesn't work very well.  This patch introduces a
dedicated interface for IVOPTs.  The general idea is described in the cover
message as below:

  C) The current implementation shares its register pressure computation
     with the RTL loop invariant pass.  It has difficulty handling loop
     nests (especially large ones) and quite often generates too many
     candidates (especially for outer loops).  This change introduces a
     new register pressure computation.
     The brief idea is to differentiate the (hot) innermost loop from
     outer loops.  For a (possibly hot) innermost loop, more registers
     are allowed as long as the register pressure stays within the number
     of registers available on the target.  It can also help to restrict
     the number of candidates for outer loops.

Is it OK?

Thanks,
bin

2017-04-11  Bin Cheng

	* tree-ssa-loop-ivopts.c (struct ivopts_data): New field.
	(ivopts_estimate_reg_pressure): New reg_pressure model function.
	(ivopts_global_cost_for_size): Delete.
	(determine_set_costs, iv_ca_recount_cost): Call new model function
	ivopts_estimate_reg_pressure.
	(determine_hot_innermost_loop): New.
	(tree_ssa_iv_optimize_loop): Call above function.

From 2b6f11666a86f740a7f813eca26905ce15691d5e Mon Sep 17 00:00:00 2001
From: Bin Cheng
Date: Fri, 10 Mar 2017 11:03:16 +0000
Subject: [PATCH 29/33] ivopt-reg_pressure-model-20170223.txt

---
 gcc/tree-ssa-loop-ivopts.c | 82 ++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 72 insertions(+), 10 deletions(-)

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index db8254c..464f96e 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -589,6 +589,9 @@ struct ivopts_data
 
   /* Whether the loop body can only be exited via single exit.  */
   bool loop_single_exit_p;
+
+  /* The current loop is the innermost loop and maybe hot.  */
+  bool hot_innermost_loop_p;
 };
 
 /* An assignment of iv candidates to uses.  */
@@ -5537,17 +5540,51 @@ determine_iv_costs (struct ivopts_data *data)
     fprintf (dump_file, "\n");
 }
 
-/* Calculates cost for having N_REGS registers.  This number includes
-   induction variables, invariant variables and invariant expressions.  */
+/* Estimate register pressure for loop having N_INVS invariants and N_CANDS
+   induction variables.  Note N_INVS includes both invariant variables and
+   invariant expressions.  */
 
 static unsigned
-ivopts_global_cost_for_size (struct ivopts_data *data, unsigned n_regs)
+ivopts_estimate_reg_pressure (struct ivopts_data *data, unsigned n_invs,
+                              unsigned n_cands)
 {
-  unsigned cost = estimate_reg_pressure_cost (n_regs,
-                                              data->regs_used, data->speed,
-                                              data->body_includes_call);
-  /* Add n_regs to the cost, so that we prefer eliminating ivs if possible.  */
-  return n_regs + cost;
+  unsigned cost;
+  unsigned n_old = data->regs_used, n_new = n_invs + n_cands;
+  unsigned regs_needed = n_new + n_old, available_regs = target_avail_regs;
+  bool speed = data->speed, hot_p = data->hot_innermost_loop_p;
+
+  /* If there is a call in the loop body, the call-clobbered registers
+     are not available for loop invariants.  */
+  if (data->body_includes_call)
+    available_regs = available_regs - target_clobbered_regs;
+
+  /* If we have enough registers.  */
+  if (regs_needed + target_res_regs < available_regs)
+    {
+      /* For the maybe hot innermost loop, we use available registers and
+         do not restrict the transformations unnecessarily.  For other
+         loops, we want to use fewer registers.  */
+      cost = hot_p ? 0 : target_reg_cost [speed] * n_new; //regs_needed;
+    }
+  /* If close to running out of registers, try to preserve them.  */
+  else if (regs_needed <= available_regs)
+    cost = target_reg_cost [speed] * regs_needed;
+  /* If we run out of available registers but the number of candidates
+     does not, we penalize extra registers using target_spill_cost.  */
+  else if (n_cands <= available_regs)
+    cost = target_reg_cost [speed] * available_regs
+           + target_spill_cost [speed] * (regs_needed - available_regs);
+  /* If the number of candidates exceeds the available registers, we penalize
+     extra candidate registers using target_spill_cost * 2, because it is
+     more expensive to spill an induction variable than an invariant.  */
+  else
+    cost = target_reg_cost [speed] * available_regs
+           + target_spill_cost [speed] * (n_cands - available_regs) * 2
+           + target_spill_cost [speed] * (regs_needed - n_cands);
+
+  /* Finally, add the number of candidates, so that we prefer eliminating
+     induction variables if possible.  */
+  return cost + n_cands;
 }
 
 /* For each size of the induction variable set determine the penalty.  */
@@ -5607,7 +5644,7 @@ determine_set_costs (struct ivopts_data *data)
       fprintf (dump_file, " ivs\tcost\n");
       for (j = 0; j <= 2 * target_avail_regs; j++)
         fprintf (dump_file, " %d\t%d\n", j,
-                 ivopts_global_cost_for_size (data, j));
+                 ivopts_estimate_reg_pressure (data, 0, j));
       fprintf (dump_file, "\n");
     }
 }
@@ -5666,7 +5703,7 @@ iv_ca_recount_cost (struct ivopts_data *data, struct iv_ca *ivs)
   comp_cost cost = ivs->cand_use_cost;
 
   cost += ivs->cand_cost;
-  cost += ivopts_global_cost_for_size (data, ivs->n_invs + ivs->n_cands);
+  cost += ivopts_estimate_reg_pressure (data, ivs->n_invs, ivs->n_cands);
 
   ivs->cost = cost;
 }
@@ -7367,6 +7404,30 @@ loop_body_includes_call (basic_block *body, unsigned num_nodes)
   return false;
 }
 
+/* Determine if current loop is the innermost loop and maybe hot.  */
+
+static void
+determine_hot_innermost_loop (struct ivopts_data *data)
+{
+  data->hot_innermost_loop_p = true;
+  if (!data->speed)
+    return;
+
+  struct loop *loop = data->current_loop;
+  if (loop->inner != NULL)
+    {
+      data->hot_innermost_loop_p = false;
+      return;
+    }
+
+  HOST_WIDE_INT niter = avg_loop_niter (loop);
+  if (niter < PARAM_VALUE (PARAM_AVG_LOOP_NITER)
+      || loop_constraint_set_p (loop, LOOP_C_PROLOG)
+      || loop_constraint_set_p (loop, LOOP_C_EPILOG)
+      || loop_constraint_set_p (loop, LOOP_C_VERSION))
+    data->hot_innermost_loop_p = false;
+}
+
 /* Optimizes the LOOP.  Returns true if anything changed.  */
 
 static bool
@@ -7381,6 +7442,7 @@ tree_ssa_iv_optimize_loop (struct ivopts_data *data, struct loop *loop)
   data->current_loop = loop;
   data->loop_loc = find_loop_location (loop);
   data->speed = optimize_loop_for_speed_p (loop);
+  determine_hot_innermost_loop (data);
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
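
For anyone who wants to try the cost tiers of ivopts_estimate_reg_pressure
outside of GCC, below is a minimal standalone sketch of the same arithmetic.
It is not part of the patch.  struct reg_model and its fields are
hypothetical stand-ins for the target_* globals, n_old stands in for
data->regs_used, and the function name is shortened for illustration.  The
assert is an addition of the sketch only; it guards the unsigned subtraction.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-target globals used by the patch
   (target_avail_regs, target_clobbered_regs, target_res_regs,
   target_reg_cost[speed], target_spill_cost[speed]).  */
struct reg_model
{
  unsigned avail_regs;     /* registers usable inside the loop */
  unsigned clobbered_regs; /* call-clobbered registers */
  unsigned res_regs;       /* registers kept in reserve */
  unsigned reg_cost;       /* cost of occupying one register */
  unsigned spill_cost;     /* cost of spilling one register */
};

/* Sketch of the four tiers: N_OLD registers are already in use, N_INVS
   invariants and N_CANDS induction variable candidates are to be added.  */
unsigned
estimate_reg_pressure (const struct reg_model *m, unsigned n_old,
                       unsigned n_invs, unsigned n_cands,
                       bool body_includes_call, bool hot_p)
{
  unsigned n_new = n_invs + n_cands;
  unsigned regs_needed = n_new + n_old;
  unsigned available = m->avail_regs;
  unsigned cost;

  if (body_includes_call)
    {
      /* Sketch-only check: avoid unsigned wrap-around.  */
      assert (available >= m->clobbered_regs);
      available -= m->clobbered_regs;
    }

  if (regs_needed + m->res_regs < available)
    /* Plenty of registers: free for a hot innermost loop.  */
    cost = hot_p ? 0 : m->reg_cost * n_new;
  else if (regs_needed <= available)
    /* Close to the limit: charge every register needed.  */
    cost = m->reg_cost * regs_needed;
  else if (n_cands <= available)
    /* Over the limit, but the candidates alone still fit.  */
    cost = m->reg_cost * available
           + m->spill_cost * (regs_needed - available);
  else
    /* Candidates alone overflow: spilled candidates count double.  */
    cost = m->reg_cost * available
           + m->spill_cost * (n_cands - available) * 2
           + m->spill_cost * (regs_needed - n_cands);

  /* Prefer sets with fewer induction variables.  */
  return cost + n_cands;
}
```

With a made-up model of 10 available registers, 4 call-clobbered, 3 reserved,
register cost 1 and spill cost 5, a hot innermost loop with 2 registers in
use, 1 invariant and 2 candidates gets a cost of 2, while the same set in a
non-hot loop gets 5.  Raising the candidate count to 12 lands in the last
tier and gives 57.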