From patchwork Wed Aug 14 07:23:47 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kewen.Lin" X-Patchwork-Id: 1146786 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-506869-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="YlgSAQK8"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 467h1B62r9z9sML for ; Wed, 14 Aug 2019 17:24:44 +1000 (AEST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :subject:from:to:cc:references:date:mime-version:in-reply-to :content-type:message-id; q=dns; s=default; b=IORAxA2BQ1MuUH002F S9cZU1GePLaGXWeQcFBfKuyOSCycQFnpm+hymZR0mDQI9N07WtzUjoyL7I9Mq3vZ QzXXI6VR2vEdu+9aQ0uPqfqei8nyKbKduinLxQkSZaT8v9lxBvNEVxpAT8O8Mo/s Jl++X+VaNXQDE0OYzXLSdRpww= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :subject:from:to:cc:references:date:mime-version:in-reply-to :content-type:message-id; s=default; bh=4e0CpMvDxEzJLDkdcbAt3DR4 fpM=; b=YlgSAQK8YT01NIVNKKTaCfuiInDqJwll+tZV0C98etrNtj3m/cphj3MS GXo16Pk+LrE7ytjViTASe8nkJ+XKLV9lVE8yC2VWkwu1tb+Ed8P0heQaBJBkLt4o k69xt9KSgrRyfVrmD1h0XyEs9cKeGoD3eqnCWRQd1pVl6BMrg3A= Received: (qmail 54801 invoked by alias); 14 Aug 2019 07:24:04 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 54733 invoked by uid 89); 14 Aug 2019 07:24:03 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-19.3 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.1 spammy= X-HELO: mx0a-001b2d01.pphosted.com Received: from mx0b-001b2d01.pphosted.com (HELO mx0a-001b2d01.pphosted.com) (148.163.158.5) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 14 Aug 2019 07:23:59 +0000 Received: from pps.filterd (m0098421.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.27/8.16.0.27) with SMTP id x7E7LhZK116812 for ; Wed, 14 Aug 2019 03:23:57 -0400 Received: from e06smtp04.uk.ibm.com (e06smtp04.uk.ibm.com [195.75.94.100]) by mx0a-001b2d01.pphosted.com with ESMTP id 2ucd899jja-1 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT) for ; Wed, 14 Aug 2019 03:23:56 -0400 Received: from localhost by e06smtp04.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Wed, 14 Aug 2019 08:23:54 +0100 Received: from b06cxnps4076.portsmouth.uk.ibm.com (9.149.109.198) by e06smtp04.uk.ibm.com (192.168.101.134) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; (version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256) Wed, 14 Aug 2019 08:23:52 +0100 Received: from d06av22.portsmouth.uk.ibm.com (d06av22.portsmouth.uk.ibm.com [9.149.105.58]) by b06cxnps4076.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id x7E7NpGF6553630 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 14 Aug 2019 07:23:51 GMT Received: from d06av22.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 3F77D4C046; Wed, 14 Aug 2019 07:23:51 +0000 (GMT) Received: from d06av22.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id DD2AF4C04A; Wed, 14 Aug 2019 07:23:48 +0000 (GMT) Received: from kewenlins-mbp.cn.ibm.com (unknown [9.200.146.99]) by d06av22.portsmouth.uk.ibm.com (Postfix) with ESMTP; Wed, 14 Aug 2019 07:23:48 +0000 (GMT) Subject: [PATCH v6 3/3] PR80791 Consider doloop cmp use in ivopts From: "Kewen.Lin" To: gcc-patches List Cc: "Bin.Cheng" , segher@kernel.crashing.org, Bill Schmidt , Richard Guenther References: <1557803406-123657-1-git-send-email-linkw@linux.ibm.com> <2d897dc2-a01c-5005-6973-aad0c5930aa8@linux.ibm.com> <9d622cb7-2c1f-91bf-a61e-0239aa2ea8bf@linux.ibm.com> Date: Wed, 14 Aug 2019 15:23:47 +0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:60.0) Gecko/20100101 Thunderbird/60.8.0 MIME-Version: 1.0 In-Reply-To: x-cbid: 19081407-0016-0000-0000-0000029EA177 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 19081407-0017-0000-0000-000032FEB99C Message-Id: <6d0b4b11-1c33-fbd5-604d-293c01b1c285@linux.ibm.com> X-IsSubscribed: yes Hi! Comparing to the previous versions of implementation mainly based on the existing IV cands but zeroing the related group/use cost, this new one is based on Richard and Segher's suggestion introducing one doloop dedicated IV cand. Some key points are listed below: 1) New field doloop_p in struct iv_cand to indicate doloop dedicated IV cand. 2) Special name "doloop" assigned. 3) Doloop IV cand with form (niter+1, +, -1) 4) For doloop IV cand, no extra one cost like BIV, assign zero cost for step. 5) Support may_be_zero (regressed PR is in this case), the base of doloop IV can be COND_EXPR, add handlings in cand_value_at and may_eliminate_iv. 6) Add more expr support in force_expr_to_var_cost for reasonable cost calculation on the IV base with may_be_zero (like COND_EXPR). 7) Set zero cost when using doloop IV cand for doloop use. 8) Add three hooks (should we merge _generic and _address?). *) have_count_reg_decr_p, is to indicate the target has special hardware count register, we shouldn't consider the impact of doloop IV when calculating register pressures. *) doloop_cost_for_generic, is the extra cost when using doloop IV cand for generic type IV use. *) doloop_cost_for_address, is the extra cost when using doloop IV cand for address type IV use. Bootstrapped on powerpc64le-linux-gnu and regression testing passed excepting for one failure on gcc/testsuite/gcc.dg/guality/loop-1.c at -O3 which is tracked by PR89983. Any comments and suggestions are highly appreciated. Thanks! Kewen --------- gcc/ChangeLog 2019-08-14 Kewen Lin PR middle-end/80791 * config/rs6000/rs6000.c (TARGET_HAVE_COUNT_REG_DECR_P): New macro. (TARGET_DOLOOP_COST_FOR_GENERIC): Likewise. (TARGET_DOLOOP_COST_FOR_ADDRESS): Likewise. * target.def (have_count_reg_decr_p): New hook. (doloop_cost_for_generic): Likewise. (doloop_cost_for_address): Likewise. * doc/tm.texi.in (TARGET_HAVE_COUNT_REG_DECR_P): Likewise. (TARGET_DOLOOP_COST_FOR_GENERIC): Likewise. (TARGET_DOLOOP_COST_FOR_ADDRESS): Likewise. * doc/tm.texi: Regenerate. * tree-ssa-loop-ivopts.c (comp_cost::operator+=): Consider infinite cost addend. (record_group): Init doloop_p. (add_candidate_1): Add optional argument doloop, change the handlings accordingly. (add_candidate): Likewise. (add_iv_candidate_for_biv): Update the call to add_candidate. (generic_predict_doloop_p): Update attribute. (force_expr_to_var_cost): Add costing for expressions COND_EXPR/LT_EXPR/ LE_EXPR/GT_EXPR/GE_EXPR/EQ_EXPR/NE_EXPR/UNORDERED_EXPR/ORDERED_EXPR/ UNLT_EXPR/UNLE_EXPR/UNGT_EXPR/UNGE_EXPR/UNEQ_EXPR/LTGT_EXPR/MAX_EXPR/ MIN_EXPR. (determine_group_iv_cost_generic): Update for doloop IV cand. (determine_group_iv_cost_address): Likewise. (determine_group_iv_cost_cond): Likewise. (determine_iv_cost): Likewise. (ivopts_estimate_reg_pressure): Likewise. (cand_value_at): Update argument niter type to struct tree_niter_desc*, consider doloop IV cand and may_be_zero. (may_eliminate_iv): Update the call to cand_value_at, consider doloop IV cand and may_be_zero. (add_iv_candidate_for_doloop): New function. (find_iv_candidates): Call function add_iv_candidate_for_doloop. (determine_set_costs): Update the call to ivopts_estimate_reg_pressure. (iv_ca_recount_cost): Likewise. (iv_ca_new): Init n_doloop_cands. (iv_ca_set_no_cp): Update n_doloop_cands. (iv_ca_set_cp): Likewise. (iv_ca_dump): Dump register cost. (find_doloop_use): Likewise. (tree_ssa_iv_optimize_loop): Call function generic_predict_doloop_p and find_doloop_use. gcc/testsuite/ChangeLog 2019-08-14 Kewen Lin PR middle-end/80791 * gcc.dg/tree-ssa/ivopts-3.c: Adjust for doloop change. * gcc.dg/tree-ssa/ivopts-lt.c: Likewise. * gcc.dg/tree-ssa/pr32044.c: Likewise. diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 6667cd0..5eccbdc 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -1912,6 +1912,16 @@ static const struct attribute_spec rs6000_attribute_table[] = #undef TARGET_PREDICT_DOLOOP_P #define TARGET_PREDICT_DOLOOP_P rs6000_predict_doloop_p +#undef TARGET_HAVE_COUNT_REG_DECR_P +#define TARGET_HAVE_COUNT_REG_DECR_P true + +/* 1000000000 is infinite cost in IVOPTs. */ +#undef TARGET_DOLOOP_COST_FOR_GENERIC +#define TARGET_DOLOOP_COST_FOR_GENERIC 1000000000 + +#undef TARGET_DOLOOP_COST_FOR_ADDRESS +#define TARGET_DOLOOP_COST_FOR_ADDRESS 1000000000 + #undef TARGET_ATOMIC_ASSIGN_EXPAND_FENV #define TARGET_ATOMIC_ASSIGN_EXPAND_FENV rs6000_atomic_assign_expand_fenv diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index c2aa4d0..9f3a08a 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -11618,6 +11618,29 @@ loops, and will help ivopts to make some decisions. The default version of this hook returns false. @end deftypefn +@deftypevr {Target Hook} bool TARGET_HAVE_COUNT_REG_DECR_P +Return true if the target supports hardware count register for decrement +and branch. This count register can't be used as general register since +moving to/from a general register from/to it is very expensive. +The default value is false. +@end deftypevr + +@deftypevr {Target Hook} int64_t TARGET_DOLOOP_COST_FOR_GENERIC +IVOPTs introduces one doloop dedicated IV candidate, this hook offers + target owner a way to adjust cost when selecting doloop IV candidate for a + generic IV use. At calcuation, this value will be added on normal cost + already calculated by current implementation. +The default value is zero. +@end deftypevr + +@deftypevr {Target Hook} int64_t TARGET_DOLOOP_COST_FOR_ADDRESS +IVOPTs introduces one doloop dedicated IV candidate, this hook offers + target owner a way to adjust cost when selecting doloop IV candidate for an + address IV use. At calcuation, this value will be added on normal cost + already calculated by current implementation. +The default value is zero. +@end deftypevr + @deftypefn {Target Hook} bool TARGET_CAN_USE_DOLOOP_P (const widest_int @var{&iterations}, const widest_int @var{&iterations_max}, unsigned int @var{loop_depth}, bool @var{entered_at_top}) Return true if it is possible to use low-overhead loops (@code{doloop_end} and @code{doloop_begin}) for a particular loop. @var{iterations} gives the diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index b4d57b8..4346773 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -7946,6 +7946,12 @@ to by @var{ce_info}. @hook TARGET_PREDICT_DOLOOP_P +@hook TARGET_HAVE_COUNT_REG_DECR_P + +@hook TARGET_DOLOOP_COST_FOR_GENERIC + +@hook TARGET_DOLOOP_COST_FOR_ADDRESS + @hook TARGET_CAN_USE_DOLOOP_P @hook TARGET_INVALID_WITHIN_DOLOOP diff --git a/gcc/target.def b/gcc/target.def index 71b6972..69e2844 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -4246,6 +4246,32 @@ The default version of this hook returns false.", bool, (struct loop *loop), default_predict_doloop_p) +DEFHOOKPOD +(have_count_reg_decr_p, + "Return true if the target supports hardware count register for decrement\n\ +and branch. This count register can't be used as general register since\n\ +moving to/from a general register from/to it is very expensive.\n\ +The default value is false.", + bool, false) + +DEFHOOKPOD +(doloop_cost_for_generic, + "IVOPTs introduces one doloop dedicated IV candidate, this hook offers\n\ + target owner a way to adjust cost when selecting doloop IV candidate for a\n\ + generic IV use. At calcuation, this value will be added on normal cost\n\ + already calculated by current implementation.\n\ +The default value is zero.", + int64_t, 0) + +DEFHOOKPOD +(doloop_cost_for_address, + "IVOPTs introduces one doloop dedicated IV candidate, this hook offers\n\ + target owner a way to adjust cost when selecting doloop IV candidate for an\n\ + address IV use. At calcuation, this value will be added on normal cost\n\ + already calculated by current implementation.\n\ +The default value is zero.", + int64_t, 0) + DEFHOOK (can_use_doloop_p, "Return true if it is possible to use low-overhead loops (@code{doloop_end}\n\ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-3.c b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-3.c index 214e6a7..ce4b1d0 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-3.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-3.c @@ -10,4 +10,6 @@ int main (void) f2 (); } -/* { dg-final { scan-tree-dump-times "!= 0" 5 "ivopts" } } */ +/* { dg-final { scan-tree-dump-times "!= 0" 5 "ivopts" { target { ! powerpc*-*-* } } } } */ +/* More debug information emitted for doloop on powerpc. */ +/* { dg-final { scan-tree-dump-times "!= 0" 6 "ivopts" { target { powerpc*-*-* } } } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c index 7d5859b..71d7f67 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c @@ -17,6 +17,7 @@ f1 (char *p, uintptr_t i, uintptr_t n) while (i < n); } -/* { dg-final { scan-tree-dump-times "PHI" 1 "ivopts" } } */ -/* { dg-final { scan-tree-dump-times "PHI = 45) diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c index 530ea4a..11852af 100644 --- a/gcc/tree-ssa-loop-ivopts.c +++ b/gcc/tree-ssa-loop-ivopts.c @@ -275,6 +275,9 @@ comp_cost::operator+= (comp_cost cost) comp_cost comp_cost::operator+= (HOST_WIDE_INT c) { + if (c >= INFTY) + this->cost = INFTY; + if (infinite_cost_p ()) return *this; @@ -399,6 +402,8 @@ struct iv_group struct cost_pair *cost_map; /* The selected candidate for the group. */ struct iv_cand *selected; + /* To indicate this is a doloop use group. */ + bool doloop_p; /* Uses in the group. */ vec vuses; }; @@ -439,6 +444,7 @@ struct iv_cand be hoisted out of loop. */ struct iv *orig_iv; /* The original iv if this cand is added from biv with smaller type. */ + bool doloop_p; /* Whether this is a doloop candidate. */ }; /* Hashtable entry for common candidate derived from iv uses. */ @@ -612,6 +618,9 @@ struct ivopts_data /* Whether the loop body can only be exited via single exit. */ bool loop_single_exit_p; + + /* Whether the loop has doloop comparison use. */ + bool doloop_use_p; }; /* An assignment of iv candidates to uses. */ @@ -636,6 +645,9 @@ struct iv_ca /* The number of candidates in the set. */ unsigned n_cands; + /* The number of doloop candidate in the set. */ + unsigned n_doloop_cands; + /* The number of invariants needed, including both invariant variants and invariant expressions. */ unsigned n_invs; @@ -1528,6 +1540,7 @@ record_group (struct ivopts_data *data, enum use_type type) group->type = type; group->related_cands = BITMAP_ALLOC (NULL); group->vuses.create (1); + group->doloop_p = false; data->vgroups.safe_push (group); return group; @@ -3017,9 +3030,9 @@ get_loop_invariant_expr (struct ivopts_data *data, tree inv_expr) replacement of the final value of the iv by a direct computation. */ static struct iv_cand * -add_candidate_1 (struct ivopts_data *data, - tree base, tree step, bool important, enum iv_position pos, - struct iv_use *use, gimple *incremented_at, +add_candidate_1 (struct ivopts_data *data, tree base, tree step, bool important, + enum iv_position pos, struct iv_use *use, + gimple *incremented_at, bool doloop = false, struct iv *orig_iv = NULL) { unsigned i; @@ -3079,11 +3092,15 @@ add_candidate_1 (struct ivopts_data *data, cand->pos = pos; if (pos != IP_ORIGINAL) { - cand->var_before = create_tmp_var_raw (TREE_TYPE (base), "ivtmp"); + if (doloop) + cand->var_before = create_tmp_var_raw (TREE_TYPE (base), "doloop"); + else + cand->var_before = create_tmp_var_raw (TREE_TYPE (base), "ivtmp"); cand->var_after = cand->var_before; } cand->important = important; cand->incremented_at = incremented_at; + cand->doloop_p = doloop; data->vcands.safe_push (cand); if (!poly_int_tree_p (step)) @@ -3116,6 +3133,7 @@ add_candidate_1 (struct ivopts_data *data, } cand->important |= important; + cand->doloop_p |= doloop; /* Relate candidate to the group for which it is added. */ if (use) @@ -3209,16 +3227,17 @@ add_autoinc_candidates (struct ivopts_data *data, tree base, tree step, the end of loop. */ static void -add_candidate (struct ivopts_data *data, - tree base, tree step, bool important, struct iv_use *use, +add_candidate (struct ivopts_data *data, tree base, tree step, bool important, + struct iv_use *use, bool doloop = false, struct iv *orig_iv = NULL) { if (ip_normal_pos (data->current_loop)) - add_candidate_1 (data, base, step, important, - IP_NORMAL, use, NULL, orig_iv); + add_candidate_1 (data, base, step, important, IP_NORMAL, use, NULL, doloop, + orig_iv); if (ip_end_pos (data->current_loop) && allow_ip_end_pos_p (data->current_loop)) - add_candidate_1 (data, base, step, important, IP_END, use, NULL, orig_iv); + add_candidate_1 (data, base, step, important, IP_END, use, NULL, doloop, + orig_iv); } /* Adds standard iv candidates. */ @@ -3262,7 +3281,7 @@ add_iv_candidate_for_biv (struct ivopts_data *data, struct iv *iv) tree step = fold_convert (sizetype, iv->step); /* Add iv cand of same precision as index part in TARGET_MEM_REF. */ - add_candidate (data, base, step, true, NULL, iv); + add_candidate (data, base, step, true, NULL, false, iv); /* Add iv cand of the original type only if it has nonlinear use. */ if (iv->nonlin_use) add_candidate (data, iv->base, iv->step, true, NULL); @@ -3724,7 +3743,7 @@ prepare_decl_rtl (tree *expr_p, int *ws, void *data) Some RTL specific checks seems unable to be checked in gimple, if any new checks or easy checks _are_ missing here, please add them. */ -static bool ATTRIBUTE_UNUSED +static bool generic_predict_doloop_p (struct ivopts_data *data) { struct loop *loop = data->current_loop; @@ -4177,6 +4196,36 @@ force_expr_to_var_cost (tree expr, bool speed) STRIP_NOPS (op0); op1 = NULL_TREE; break; + /* See add_iv_candidate_for_doloop, for doloop may_be_zero case, we + introduce COND_EXPR for IV base, need to support better cost estimation + for this COND_EXPR and tcc_comparison. */ + case COND_EXPR: + op0 = TREE_OPERAND (expr, 1); + STRIP_NOPS (op0); + op1 = TREE_OPERAND (expr, 2); + STRIP_NOPS (op1); + break; + case LT_EXPR: + case LE_EXPR: + case GT_EXPR: + case GE_EXPR: + case EQ_EXPR: + case NE_EXPR: + case UNORDERED_EXPR: + case ORDERED_EXPR: + case UNLT_EXPR: + case UNLE_EXPR: + case UNGT_EXPR: + case UNGE_EXPR: + case UNEQ_EXPR: + case LTGT_EXPR: + case MAX_EXPR: + case MIN_EXPR: + op0 = TREE_OPERAND (expr, 0); + STRIP_NOPS (op0); + op1 = TREE_OPERAND (expr, 1); + STRIP_NOPS (op1); + break; default: /* Just an arbitrary value, FIXME. */ @@ -4258,6 +4307,35 @@ force_expr_to_var_cost (tree expr, bool speed) case RSHIFT_EXPR: cost = comp_cost (add_cost (speed, mode), 0); break; + case COND_EXPR: + op0 = TREE_OPERAND (expr, 0); + STRIP_NOPS (op0); + if (op0 == NULL_TREE || TREE_CODE (op0) == SSA_NAME + || CONSTANT_CLASS_P (op0)) + cost = no_cost; + else + cost = force_expr_to_var_cost (op0, speed); + break; + case LT_EXPR: + case LE_EXPR: + case GT_EXPR: + case GE_EXPR: + case EQ_EXPR: + case NE_EXPR: + case UNORDERED_EXPR: + case ORDERED_EXPR: + case UNLT_EXPR: + case UNLE_EXPR: + case UNGT_EXPR: + case UNGE_EXPR: + case UNEQ_EXPR: + case LTGT_EXPR: + case MAX_EXPR: + case MIN_EXPR: + /* Simply use 1.5 * add cost for now, FIXME if there is some more accurate + cost evaluation way. */ + cost = comp_cost (1.5 * add_cost (speed, mode), 0); + break; default: gcc_unreachable (); @@ -4706,8 +4784,12 @@ determine_group_iv_cost_generic (struct ivopts_data *data, if (cand->pos == IP_ORIGINAL && cand->incremented_at == use->stmt) cost = no_cost; else - cost = get_computation_cost (data, use, cand, false, - &inv_vars, NULL, &inv_expr); + { + cost = get_computation_cost (data, use, cand, false, &inv_vars, NULL, + &inv_expr); + if (cand->doloop_p) + cost += targetm.doloop_cost_for_generic; + } if (inv_expr) { @@ -4735,6 +4817,9 @@ determine_group_iv_cost_address (struct ivopts_data *data, cost = get_computation_cost (data, use, cand, true, &inv_vars, &can_autoinc, &inv_expr); + if (cand->doloop_p) + cost += targetm.doloop_cost_for_address; + if (inv_expr) { inv_exprs = BITMAP_ALLOC (NULL); @@ -4783,11 +4868,12 @@ determine_group_iv_cost_address (struct ivopts_data *data, stores it to VAL. */ static void -cand_value_at (struct loop *loop, struct iv_cand *cand, gimple *at, tree niter, - aff_tree *val) +cand_value_at (struct loop *loop, struct iv_cand *cand, gimple *at, + struct tree_niter_desc *desc, aff_tree *val) { aff_tree step, delta, nit; struct iv *iv = cand->iv; + tree niter = desc->niter; tree type = TREE_TYPE (iv->base); tree steptype; if (POINTER_TYPE_P (type)) @@ -4803,7 +4889,15 @@ cand_value_at (struct loop *loop, struct iv_cand *cand, gimple *at, tree niter, if (stmt_after_increment (loop, cand, at)) aff_combination_add (&delta, &step); - tree_to_aff_combination (iv->base, type, val); + tree base = iv->base; + /* See add_iv_candidate_for_doloop, if may_be_zero is set, we want to extract + the value under !may_be_zero to get the compact bound which also well fits + for may_be_zero since we ensure the value for it is const one. */ + if (cand->doloop_p && desc->may_be_zero && !integer_zerop (desc->may_be_zero)) + base = fold_build2 (PLUS_EXPR, TREE_TYPE (niter), + unshare_expr (rewrite_to_non_trapping_overflow (niter)), + build_int_cst (TREE_TYPE (niter), 1)); + tree_to_aff_combination (base, type, val); if (!POINTER_TYPE_P (type)) aff_combination_convert (val, steptype); aff_combination_add (val, &delta); @@ -5142,7 +5236,7 @@ may_eliminate_iv (struct ivopts_data *data, } } - cand_value_at (loop, cand, use->stmt, desc->niter, &bnd); + cand_value_at (loop, cand, use->stmt, desc, &bnd); *bound = fold_convert (TREE_TYPE (cand->iv->base), aff_combination_to_tree (&bnd)); @@ -5159,8 +5253,11 @@ may_eliminate_iv (struct ivopts_data *data, TODO: we could also calculate the value MAY_BE_ZERO ? 0 : NITER and base the exit condition on it. However, that is often too - expensive. */ - if (!integer_zerop (desc->may_be_zero)) + expensive. + + For doloop candidate, we have considered MAY_BE_ZERO for IV base, need to + support MAY_BE_ZERO ? 0 : NITER, so simply bypass this check. */ + if (!integer_zerop (desc->may_be_zero) && !cand->doloop_p) return iv_elimination_compare_lt (data, cand, comp, desc); return true; @@ -5264,6 +5361,9 @@ determine_group_iv_cost_cond (struct ivopts_data *data, inv_vars = inv_vars_elim; inv_vars_elim = NULL; inv_expr = inv_expr_elim; + /* For doloop candidate/use pair, adjust to zero cost. */ + if (group->doloop_p && cand->doloop_p) + cost = no_cost; } else { @@ -5390,6 +5490,42 @@ relate_compare_use_with_all_cands (struct ivopts_data *data) } } +/* Add one doloop dedicated IV candidate: + - Base is (may_be_zero ? 1 : (niter + 1)). + - Step is -1. */ + +static void +add_iv_candidate_for_doloop (struct ivopts_data *data) +{ + tree_niter_desc *niter_desc = niter_for_single_dom_exit (data); + gcc_assert (niter_desc && niter_desc->assumptions); + + tree niter = niter_desc->niter; + tree ntype = TREE_TYPE (niter); + gcc_assert (TREE_CODE (ntype) == INTEGER_TYPE); + + tree may_be_zero = niter_desc->may_be_zero; + if (may_be_zero && integer_zerop (may_be_zero)) + may_be_zero = NULL_TREE; + if (may_be_zero) + { + if (COMPARISON_CLASS_P (may_be_zero)) + { + niter = fold_build3 (COND_EXPR, ntype, may_be_zero, + build_int_cst (ntype, 0), + rewrite_to_non_trapping_overflow (niter)); + } + /* Don't try to obtain the iteration count expression when may_be_zero is + integer_nonzerop (actually iteration count is one) or else. */ + else + return; + } + + tree base = fold_build2 (PLUS_EXPR, ntype, unshare_expr (niter), + build_int_cst (ntype, 1)); + add_candidate (data, base, build_int_cst (ntype, -1), true, NULL, true); +} + /* Finds the candidates for the induction variables. */ static void @@ -5398,6 +5534,10 @@ find_iv_candidates (struct ivopts_data *data) /* Add commonly used ivs. */ add_standard_iv_candidates (data); + /* Add doloop dedicate ivs. */ + if (data->doloop_use_p) + add_iv_candidate_for_doloop (data); + /* Add old induction variables. */ add_iv_candidate_for_bivs (data); @@ -5578,16 +5718,21 @@ determine_iv_cost (struct ivopts_data *data, struct iv_cand *cand) or a const set. */ if (cost_base.cost == 0) cost_base.cost = COSTS_N_INSNS (1); - cost_step = add_cost (data->speed, TYPE_MODE (TREE_TYPE (base))); - + /* Doloop decrement should be considered as zero cost. */ + if (cand->doloop_p) + cost_step = 0; + else + cost_step = add_cost (data->speed, TYPE_MODE (TREE_TYPE (base))); cost = cost_step + adjust_setup_cost (data, cost_base.cost); /* Prefer the original ivs unless we may gain something by replacing it. The reason is to make debugging simpler; so this is not relevant for artificial ivs created by other optimization passes. */ - if (cand->pos != IP_ORIGINAL - || !SSA_NAME_VAR (cand->var_before) - || DECL_ARTIFICIAL (SSA_NAME_VAR (cand->var_before))) + if ((cand->pos != IP_ORIGINAL + || !SSA_NAME_VAR (cand->var_before) + || DECL_ARTIFICIAL (SSA_NAME_VAR (cand->var_before))) + /* Prefer doloop as well. */ + && !cand->doloop_p) cost++; /* Prefer not to insert statements into latch unless there are some @@ -5633,10 +5778,14 @@ determine_iv_costs (struct ivopts_data *data) static unsigned ivopts_estimate_reg_pressure (struct ivopts_data *data, unsigned n_invs, - unsigned n_cands) + unsigned n_cands, unsigned n_doloop_cands) { unsigned cost; - unsigned n_old = data->regs_used, n_new = n_invs + n_cands; + unsigned n_old = data->regs_used, n_spr_for_doloop = 0; + /* If target supports count register for doloop, it doesn't take GPR. */ + if (targetm.have_count_reg_decr_p) + n_spr_for_doloop = n_doloop_cands; + unsigned n_new = n_invs + n_cands - n_spr_for_doloop; unsigned regs_needed = n_new + n_old, available_regs = target_avail_regs; bool speed = data->speed; @@ -5666,7 +5815,7 @@ ivopts_estimate_reg_pressure (struct ivopts_data *data, unsigned n_invs, /* Finally, add the number of candidates, so that we prefer eliminating induction variables if possible. */ - return cost + n_cands; + return cost + n_cands - n_spr_for_doloop; } /* For each size of the induction variable set determine the penalty. */ @@ -5727,7 +5876,7 @@ determine_set_costs (struct ivopts_data *data) fprintf (dump_file, " ivs\tcost\n"); for (j = 0; j <= 2 * target_avail_regs; j++) fprintf (dump_file, " %d\t%d\n", j, - ivopts_estimate_reg_pressure (data, 0, j)); + ivopts_estimate_reg_pressure (data, 0, j, 0)); fprintf (dump_file, "\n"); } } @@ -5786,7 +5935,8 @@ iv_ca_recount_cost (struct ivopts_data *data, struct iv_ca *ivs) comp_cost cost = ivs->cand_use_cost; cost += ivs->cand_cost; - cost += ivopts_estimate_reg_pressure (data, ivs->n_invs, ivs->n_cands); + cost += ivopts_estimate_reg_pressure (data, ivs->n_invs, ivs->n_cands, + ivs->n_doloop_cands); ivs->cost = cost; } @@ -5833,6 +5983,8 @@ iv_ca_set_no_cp (struct ivopts_data *data, struct iv_ca *ivs, { bitmap_clear_bit (ivs->cands, cid); ivs->n_cands--; + if (cp->cand->doloop_p) + ivs->n_doloop_cands--; ivs->cand_cost -= cp->cand->cost; iv_ca_set_remove_invs (ivs, cp->cand->inv_vars, ivs->n_inv_var_uses); iv_ca_set_remove_invs (ivs, cp->cand->inv_exprs, ivs->n_inv_expr_uses); @@ -5890,6 +6042,8 @@ iv_ca_set_cp (struct ivopts_data *data, struct iv_ca *ivs, { bitmap_set_bit (ivs->cands, cid); ivs->n_cands++; + if (cp->cand->doloop_p) + ivs->n_doloop_cands++; ivs->cand_cost += cp->cand->cost; iv_ca_set_add_invs (ivs, cp->cand->inv_vars, ivs->n_inv_var_uses); iv_ca_set_add_invs (ivs, cp->cand->inv_exprs, ivs->n_inv_expr_uses); @@ -6100,6 +6254,7 @@ iv_ca_new (struct ivopts_data *data) nw->n_cand_uses = XCNEWVEC (unsigned, data->vcands.length ()); nw->cands = BITMAP_ALLOC (NULL); nw->n_cands = 0; + nw->n_doloop_cands = 0; nw->n_invs = 0; nw->cand_use_cost = no_cost; nw->cand_cost = 0; @@ -6134,6 +6289,9 @@ iv_ca_dump (struct ivopts_data *data, FILE *file, struct iv_ca *ivs) fprintf (file, " cost: %" PRId64 " (complexity %d)\n", cost.cost, cost.complexity); + fprintf (file, " reg_cost: %d\n", + ivopts_estimate_reg_pressure (data, ivs->n_invs, ivs->n_cands, + ivs->n_doloop_cands)); fprintf (file, " cand_cost: %" PRId64 "\n cand_group_cost: " "%" PRId64 " (complexity %d)\n", ivs->cand_cost, ivs->cand_use_cost.cost, ivs->cand_use_cost.complexity); @@ -7568,6 +7726,47 @@ determine_scaling_factor (struct ivopts_data *data, basic_block *body) } } +/* Find doloop comparison use and set its doloop_p on if found. */ + +static bool +find_doloop_use (struct ivopts_data *data) +{ + struct loop *loop = data->current_loop; + + for (unsigned i = 0; i < data->vgroups.length (); i++) + { + struct iv_group *group = data->vgroups[i]; + if (group->type == USE_COMPARE) + { + gcc_assert (group->vuses.length () == 1); + struct iv_use *use = group->vuses[0]; + gimple *stmt = use->stmt; + if (gimple_code (stmt) == GIMPLE_COND) + { + basic_block bb = gimple_bb (stmt); + edge true_edge, false_edge; + extract_true_false_edges_from_block (bb, &true_edge, &false_edge); + /* This comparison is used for loop latch. Require latch is empty + for now. */ + if ((loop->latch == true_edge->dest + || loop->latch == false_edge->dest) + && empty_block_p (loop->latch)) + { + group->doloop_p = true; + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Doloop cmp iv use: "); + print_gimple_stmt (dump_file, stmt, TDF_DETAILS); + } + return true; + } + } + } + } + + return false; +} + /* Optimizes the LOOP. Returns true if anything changed. */ static bool @@ -7580,6 +7779,7 @@ tree_ssa_iv_optimize_loop (struct ivopts_data *data, struct loop *loop, basic_block *body; gcc_assert (!data->niters); + data->doloop_use_p = false; data->current_loop = loop; data->loop_loc = find_loop_location (loop).get_location_t (); data->speed = optimize_loop_for_speed_p (loop); @@ -7622,6 +7822,22 @@ tree_ssa_iv_optimize_loop (struct ivopts_data *data, struct loop *loop, /* Determine cost scaling factor for basic blocks in loop. */ determine_scaling_factor (data, body); + if (flag_branch_on_count_reg && generic_predict_doloop_p (data)) + { + if (find_doloop_use (data)) + { + data->doloop_use_p = true; + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, + "Predict loop %d can perform" + " doloop optimization later.\n", + loop->num); + flow_loop_dump (loop, dump_file, NULL, 1); + } + } + } + /* Finds candidates for the induction variables (item 2). */ find_iv_candidates (data);