From patchwork Wed May 10 15:40:00 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michael Meissner
X-Patchwork-Id: 1779598
Date: Wed, 10 May 2023 11:40:00 -0400
To: Michael Meissner, gcc-patches@gcc.gnu.org, Segher Boessenkool,
 "Kewen.Lin", David Edelsohn, Peter Bergner
Subject: [PATCH V5, 2/2] PR target/105325: Fix memory constraints for
 power10 fusion.
List-Id: Gcc-patches mailing list
X-Patchwork-Original-From: Michael Meissner via Gcc-patches
From: Michael Meissner
Reply-To: Michael Meissner

This patch applies stricter predicates and constraints for LD and LWA
instructions with power10 fusion.  These instructions are DS-form instructions,
which means that the bottom 2 bits of the offset must be 0.  Previously, we
did not use the stricter predicates and constraints, so if the user passed the
-fstack-protector option and the stack frame was large, GCC could generate a
non-prefixed load instruction whose offset was too big.

This patch has been tested on:

    *   Little endian power9 with both IEEE and IBM long double
    *   Little endian power10
    *   Big endian power8 using both 32-bit and 64-bit code generation.

Can I check this into the master branch?  Assuming I can check this in, I will
also commit it to the active GCC branches after a burn-in period.

2023-05-10   Michael Meissner

gcc/

        PR target/105325
        * config/rs6000/genfusion.pl (print_ld_cmpi_p10): Use "YZ" constraints
        for DS-form loads.  Set the sign_extend attribute for loads that do
        sign extension.  Use the lwa_operand predicate for the LWA instruction.
        * config/rs6000/fusion.md: Regenerate.

gcc/testsuite/

        PR target/105325
        * g++.target/powerpc/pr105325.C: New test.
        * gcc.target/powerpc/fusion-p10-ldcmpi.c: Adjust insn counts.
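
As a rough illustration of the encoding limits described above, here is a
minimal sketch of the check for whether an offset can be used by a
non-prefixed load.  It is not code from GCC: fits_d_form and fits_ds_form are
hypothetical helpers written only for this note, and they assume the usual
signed 16-bit displacement of non-prefixed D-form and DS-form loads.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not a GCC function: true if OFFSET fits the signed
   16-bit displacement of a non-prefixed D-form load such as LWZ.  */
static bool
fits_d_form (int64_t offset)
{
  return offset >= -32768 && offset <= 32767;
}

/* Hypothetical helper, not a GCC function: true if OFFSET can be encoded in
   a non-prefixed DS-form load such as LD or LWA.  On top of the D-form
   range check, the bottom 2 bits of the offset must be 0.  */
static bool
fits_ds_form (int64_t offset)
{
  return fits_d_form (offset) && (offset & 3) == 0;
}
```

For example, fits_ds_form (8) is true, fits_ds_form (6) is false because the
bottom 2 bits are not 0, and fits_ds_form (40000) is false because the offset
is out of range.  40000 bytes is the size of the _mapTable array in the new
pr105325.C test (assuming a 4-byte int), so a load at an offset of that size
cannot use a plain LWA or LD.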
---
 gcc/config/rs6000/fusion.md                   | 17 +++++++-----
 gcc/config/rs6000/genfusion.pl                | 20 +++++++++++---
 gcc/testsuite/g++.target/powerpc/pr105325.C   | 26 +++++++++++++++++++
 .../gcc.target/powerpc/fusion-p10-ldcmpi.c    |  4 +--
 4 files changed, 54 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/pr105325.C

diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index 81ba4b33940..836dbd20948 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -129,6 +129,12 @@ sub print_ld_cmpi_p10
   print "  \"\"\n";
   print "  [(set_attr \"type\" \"fused_load_cmpi\")\n";
   print "   (set_attr \"cost\" \"8\")\n";
+
+  if ($extend eq "sign")
+    {
+      print "   (set_attr \"sign_extend\" \"yes\")\n";
+    }
+
   print "   (set_attr \"length\" \"8\")])\n";
   print "\n";
 }
@@ -147,9 +153,9 @@ sub gen_ld_cmpi_p10
                        "HI" => "lhz",
                        "QI" => "lbz");
 
-  # Memory predicate to use.
+  # Memory predicate to use.  For LWA, use the special LWA_OPERAND.
   my %signed_memory_predicate = ("DI" => "ds_form_mem_operand",
-                                 "SI" => "ds_form_mem_operand",
+                                 "SI" => "lwa_operand",
                                  "HI" => "non_update_memory_operand");
 
   my %unsigned_memory_predicate = ("DI" => "ds_form_mem_operand",
@@ -161,6 +167,10 @@ sub gen_ld_cmpi_p10
   my %np = ("ds" => "NON_PREFIXED_DS",
             "d" => "NON_PREFIXED_D");
 
+  # Constraint to use.
+  my %constraint = ("ds" => "YZ",
+                    "d" => "m");
+
   # Result modes to use.  Clobber is used when you are comparing the load to
   # -1/0/1, but you are not using it otherwise.  EXTDI does not exist.  We
   # cannot directly use HI/QI results because we only have word and double word
@@ -189,7 +199,8 @@ sub gen_ld_cmpi_p10
 
           print_ld_cmpi_p10 ($lmode, $result, "CC", "", "const_m1_to_1_operand",
                              $extend,
-                             $signed_load{$lmode}, $np{$mem_format}, "m",
+                             $signed_load{$lmode}, $np{$mem_format},
+                             $constraint{$mem_format},
                              $signed_memory_predicate{$lmode});
         }
 
@@ -204,7 +215,8 @@ sub gen_ld_cmpi_p10
 
           print_ld_cmpi_p10 ($lmode, $result, "CCUNS", "l", "const_0_to_1_operand",
                              $extend,
-                             $unsigned_load{$lmode}, $np{$mem_format}, "m",
+                             $unsigned_load{$lmode}, $np{$mem_format},
+                             $constraint{$mem_format},
                              $unsigned_memory_predicate{$lmode});
         }
     }
diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index d45fb138a70..da9953d9ad9 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -22,7 +22,7 @@
 ;; load mode is DI result mode is clobber compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                     (match_operand:DI 3 "const_m1_to_1_operand" "n")))
    (clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -43,7 +43,7 @@ (define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
 ;; load mode is DI result mode is clobber compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                        (match_operand:DI 3 "const_0_to_1_operand" "n")))
    (clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -64,7 +64,7 @@ (define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
 ;; load mode is DI result mode is DI compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                     (match_operand:DI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -85,7 +85,7 @@ (define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none"
 ;; load mode is DI result mode is DI compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_DI_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                        (match_operand:DI 3 "const_0_to_1_operand" "n")))
    (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -106,7 +106,7 @@ (define_insn_and_split "*ld_cmpldi_cr0_DI_DI_CCUNS_none"
 ;; load mode is SI result mode is clobber compare mode is CC extend is none
 (define_insn_and_split "*lwa_cmpdi_cr0_SI_clobber_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "lwa_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (clobber (match_scratch:SI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -148,7 +148,7 @@ (define_insn_and_split "*lwz_cmpldi_cr0_SI_clobber_CCUNS_none"
 ;; load mode is SI result mode is SI compare mode is CC extend is none
 (define_insn_and_split "*lwa_cmpdi_cr0_SI_SI_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "lwa_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:SI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -190,7 +190,7 @@ (define_insn_and_split "*lwz_cmpldi_cr0_SI_SI_CCUNS_none"
 ;; load mode is SI result mode is EXTSI compare mode is CC extend is sign
 (define_insn_and_split "*lwa_cmpdi_cr0_SI_EXTSI_CC_sign"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "lwa_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:EXTSI 0 "gpc_reg_operand" "=r") (sign_extend:EXTSI (match_dup 1)))]
   "(TARGET_P10_FUSION)"
@@ -205,6 +205,7 @@ (define_insn_and_split "*lwa_cmpdi_cr0_SI_EXTSI_CC_sign"
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
@@ -247,6 +248,7 @@ (define_insn_and_split "*lha_cmpdi_cr0_HI_clobber_CC_sign"
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
@@ -289,6 +291,7 @@ (define_insn_and_split "*lha_cmpdi_cr0_HI_EXTHI_CC_sign"
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
diff --git a/gcc/testsuite/g++.target/powerpc/pr105325.C b/gcc/testsuite/g++.target/powerpc/pr105325.C
new file mode 100644
index 00000000000..d0e66a0b897
--- /dev/null
+++ b/gcc/testsuite/g++.target/powerpc/pr105325.C
@@ -0,0 +1,26 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10 -fstack-protector" } */
+
+/* Test that power10 fusion does not generate an LWA/CMPDI instruction pair
+   instead of PLWZ/CMPWI.  Ultimately the code was dying because the fusion
+   load + compare -1/0/1 patterns did not handle the possibility that the load
+   might be prefixed.  The -fstack-protector option is needed to show the
+   bug.  */
+
+struct Ath__array1D {
+  int _current;
+  int getCnt() { return _current; }
+};
+struct extMeasure {
+  int _mapTable[10000];
+  Ath__array1D _metRCTable;
+};
+void measureRC() {
+  extMeasure m;
+  for (; m._metRCTable.getCnt();)
+    for (;;)
+      ;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c b/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c
index 526a026d874..ca7297375a4 100644
--- a/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c
+++ b/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c
@@ -61,7 +61,7 @@ TEST(int8_t)
 /* { dg-final { scan-assembler-times "lha_cmpdi_cr0_HI_clobber_CC_sign"     16 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lhz_cmpldi_cr0_HI_clobber_CCUNS_zero"  4 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_EXTSI_CC_sign"        0 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"      4 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"      8 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero"    0 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_clobber_CCUNS_none"  2 { target lp64 } } } */
 
@@ -73,6 +73,6 @@ TEST(int8_t)
 /* { dg-final { scan-assembler-times "lha_cmpdi_cr0_HI_clobber_CC_sign"      8 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "lhz_cmpldi_cr0_HI_clobber_CCUNS_zero"  2 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_EXTSI_CC_sign"        0 { target ilp32 } } } */
-/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"      9 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"     16 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero"    0 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_clobber_CCUNS_none"  6 { target ilp32 } } } */