From patchwork Wed Jun 9 05:18:41 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: "Kewen.Lin"
X-Patchwork-Id: 1489700
Subject: [RFC/PATCH] ira: Consider matching constraints with param [PR100328]
To: GCC Patches
Date: Wed, 9 Jun 2021 13:18:41 +0800
List-Id: Gcc-patches mailing list
X-Patchwork-Original-From: "Kewen.Lin via Gcc-patches"
From: "Kewen.Lin"
Reply-To: "Kewen.Lin"
Cc: Richard Sandiford, Bill Schmidt, Segher Boessenkool

Hi,

PR100328 has some details about this issue; I'll try to summarize it
here. In the hottest function, LBM_performStreamCollideTRT, of the
SPEC2017 benchmark 519.lbm_r, there are many FMA-style expressions
(27 FMA, 19 FMS, 11 FNMA).
On rs6000, this kind of FMA-style insn has two flavors: FLOAT_REG and
VSX_REG. The VSX_REG register class has 64 registers, the first 32 of
which make up the whole of FLOAT_REG. There are some differences
between these two flavors; taking "*fma<mode>4_fpr" as an example:

(define_insn "*fma<mode>4_fpr"
  [(set (match_operand:SFDF 0 "gpc_reg_operand" "=<Ff>,wa,wa")
        (fma:SFDF
          (match_operand:SFDF 1 "gpc_reg_operand" "%<Ff>,wa,wa")
          (match_operand:SFDF 2 "gpc_reg_operand" "<Ff>,wa,0")
          (match_operand:SFDF 3 "gpc_reg_operand" "<Ff>,0,wa")))]

// wa => A VSX register (VSR), vs0…vs63, aka. VSX_REG.
// <Ff> (f/d) => A floating point register, aka. FLOAT_REG.

So for VSX_REG we only have the destructive form: when a VSX_REG
alternative is used, operand 2 or operand 3 is required to be the same
as operand 0. reload has to take care of this constraint and create
some non-free register copies if required.

Assuming one fma insn looks like:

  op0 = FMA (op1, op2, op3)

The best regclass for all of them is VSX_REG. When op1, op2 and op3 are
all dead, IRA simply creates three shuffle copies for them (here the
operand order matters, since with the same freq the one with the
smaller number takes preference), but IMO both op2 and op3 should take
higher priority in the copy queue due to the matching constraint.

I noticed that there is one function, ira_get_dup_out_num, which is
meant to create this kind of constraint copy, but the code below seems
to refuse to create one if there is an alternative which has a valid
regclass that is not likely to be spilled.

              default:
                {
                  enum constraint_num cn = lookup_constraint (str);
                  enum reg_class cl = reg_class_for_constraint (cn);
                  if (cl != NO_REGS
                      && !targetm.class_likely_spilled_p (cl))
                    goto fail;
                  ...

I cooked up the attached patch to make IRA respect this kind of
matching constraint, guarded with one parameter. As I stated in the
PR, I am not sure this is on the right track.
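
To make the matching constraint concrete, here is a minimal standalone
sketch in C. It is a toy model and not GCC code: the function names
matching_operand_in_alt and count_matching_alts are hypothetical, and
the parsing is simplified (single-digit operand numbers, alternatives
separated by commas). It only shows how a digit inside one alternative
of an input operand's constraint string ties that input to an output
operand, which is the situation for op2 and op3 above.

```c
#include <ctype.h>
#include <stddef.h>

/* Toy model only, not GCC code.  Walk one operand's comma-separated
   constraint string and return, for alternative ALT, the operand
   number named by a matching constraint (a digit), or -1 if that
   alternative has none.  Multi-letter constraints such as "wa" are
   skipped character by character, which is enough here because only
   digits are of interest.  */
static int
matching_operand_in_alt (const char *constraints, int alt)
{
  int curr_alt = 0;
  for (const char *p = constraints; *p != '\0'; p++)
    {
      if (*p == ',')
        {
          /* Stop once we have walked past the requested alternative.  */
          if (++curr_alt > alt)
            break;
        }
      else if (curr_alt == alt && isdigit ((unsigned char) *p))
        return *p - '0';
    }
  return -1;
}

/* Count the alternatives of one operand that tie it to operand OUT.  */
static int
count_matching_alts (const char *constraints, int n_alts, int out)
{
  int count = 0;
  for (int alt = 0; alt < n_alts; alt++)
    if (matching_operand_in_alt (constraints, alt) == out)
      count++;
  return count;
}
```

With an assumed three-alternative string such as "d,wa,0", calling
matching_operand_in_alt ("d,wa,0", 2) reports operand 0 for the last
alternative and -1 for the first two, so only that alternative would
motivate a constraint copy between this input and the output.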

The RFC patch checks the matching constraint in all alternatives: if
there is one alternative with a matching constraint that also matches
the current preferred regclass (or best of allocno?), it records the
output operand number and further creates one constraint copy for it.
Normally this copy takes priority over shuffle copies, so the matching
constraint is more likely to be satisfied, and reload no longer has to
create extra copies to meet the matching constraint or the desirable
register class.

For FMA A,B,C,D, I think ideally the copies A/B, A/C, A/D can first
stay as shuffle copies. Later, if any of A,B,C,D gets assigned a
hardware register which is a VSX register (VSX_REG) but not an FP
register (FLOAT_REG), we would have to pay costs if we could NOT go
with the VSX alternatives, so at that point it becomes important to
respect the matching constraint, and we can increase the freq of the
remaining copies related to it (A/B, A/C, A/D). This idea requires
some side tables to record some information and seems a bit
complicated in the current framework, so the proposed patch
aggressively emphasizes the matching constraint at the time of
creating copies.

Any comments are highly appreciated!

BR,
Kewen
---
 gcc/config/rs6000/rs6000.c |  3 ++
 gcc/ira.c                  | 69 ++++++++++++++++++++++++++++++++++----
 gcc/params.opt             |  4 +++
 3 files changed, 70 insertions(+), 6 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 5ae40d6f4ce..eb9c4284f91 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4852,6 +4852,9 @@ rs6000_option_override_internal (bool global_init_p)
          ap = __builtin_next_arg (0).  */
       if (DEFAULT_ABI != ABI_V4)
         targetm.expand_builtin_va_start = NULL;
+
+      SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+                           param_ira_consider_dup_in_all_alts, 1);
     }
 
   rs6000_override_options_after_change ();
diff --git a/gcc/ira.c b/gcc/ira.c
index b93588d8a9f..beebee7499b 100644
--- a/gcc/ira.c
+++ b/gcc/ira.c
@@ -1937,10 +1939,16 @@ ira_get_dup_out_num (int op_num, alternative_mask alts)
     return -1;
   str = recog_data.constraints[op_num];
   use_commut_op_p = false;
+
+  rtx op = recog_data.operand[op_num];
+  int op_no = reg_or_subregno (op);
+  enum reg_class op_pref_cl = reg_preferred_class (op_no);
+  machine_mode op_mode = GET_MODE (op);
+
   for (;;)
     {
-      rtx op = recog_data.operand[op_num];
-
+      bool saw_reg_cstr = false;
+
       for (curr_alt = 0, ignore_p = !TEST_BIT (alts, curr_alt),
            original = -1;;)
         {
@@ -1963,9 +1971,25 @@ ira_get_dup_out_num (int op_num, alternative_mask alts)
                 {
                   enum constraint_num cn = lookup_constraint (str);
                   enum reg_class cl = reg_class_for_constraint (cn);
-                  if (cl != NO_REGS
-                      && !targetm.class_likely_spilled_p (cl))
-                    goto fail;
+                  if (cl != NO_REGS && !targetm.class_likely_spilled_p (cl))
+                    {
+                      if (param_ira_consider_dup_in_all_alts
+                          && op_pref_cl != NO_REGS)
+                        {
+                          /* If it's free to move from one preferred class to
+                             the one without matching constraint, it doesn't
+                             have to respect this constraint with costs.  */
+                          if (cl != op_pref_cl
+                              && (ira_reg_class_intersect[cl][op_pref_cl]
+                                  != NO_REGS)
+                              && (ira_may_move_in_cost[op_mode][op_pref_cl][cl]
+                                  == 0))
+                            goto fail;
+                          saw_reg_cstr = true;
+                        }
+                      else
+                        goto fail;
+                    }
                   if (constraint_satisfied_p (op, cn))
                     goto fail;
                   break;
@@ -1979,7 +2003,40 @@ ira_get_dup_out_num (int op_num, alternative_mask alts)
                   str = end;
                   if (original != -1 && original != n)
                     goto fail;
-                  original = n;
+                  if (param_ira_consider_dup_in_all_alts && saw_reg_cstr)
+                    {
+                      rtx out = recog_data.operand[n];
+                      if (!REG_P (out)
+                          && (GET_CODE (out) != SUBREG
+                              || !REG_P (SUBREG_REG (out))))
+                        goto fail;
+                      int out_no = reg_or_subregno (out);
+                      if (out_no >= FIRST_PSEUDO_REGISTER)
+                        {
+                          const char *out_alts = recog_data.constraints[n];
+                          int tot = curr_alt;
+                          while (tot > 0)
+                            {
+                              if (out_alts[0] == ',')
+                                tot--;
+                              out_alts++;
+                            }
+                          enum reg_class out_cl = NO_REGS;
+                          while (*out_alts != '\0' && *out_alts != ',')
+                            {
+                              enum constraint_num cn
+                                = lookup_constraint (out_alts);
+                              out_cl = reg_class_for_constraint (cn);
+                              if (out_cl != NO_REGS)
+                                break;
+                            }
+                          /* Respect this as it's for preferred rclass.  */
+                          if (out_cl == op_pref_cl)
+                            original = n;
+                        }
+                    }
+                  else
+                    original = n;
                   continue;
                 }
               }
diff --git a/gcc/params.opt b/gcc/params.opt
index 7c7aa78992a..7d9d3a5876d 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -326,6 +326,10 @@ Max size of conflict table in MB.
 Common Joined UInteger Var(param_ira_max_loops_num) Init(100) Param Optimization
 Max loops number for regional RA.
 
+-param=ira-consider-dup-in-all-alts=
+Common Joined UInteger Var(param_ira_consider_dup_in_all_alts) Init(0) IntegerRange(0, 1) Param Optimization
+Control ira to continue to find matching constraint (duplicated operand number) even if it has encountered some constraint that has the appropriate register class, it will skip those alternatives whose constraint don't have the same register class as which the operand prefers.
+
 -param=iv-always-prune-cand-set-bound=
 Common Joined UInteger Var(param_iv_always_prune_cand_set_bound) Init(10) Param Optimization
 If number of candidates in the set is smaller, we always try to remove unused ivs during its optimization.