From: Hao Liu OS
Date: Wed, 19 Jul 2023 04:33:48 +0000
To: "GCC-patches@gcc.gnu.org"
CC: "richard.sandiford@arm.com"
Subject: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]
This only affects the new vector costs in the AArch64 backend.

Currently, the reduction latency of the vector body is too large because it
is multiplied by the statement count.  Since the scalar reduction latency is
small, the new cost model may conclude that "scalar code would issue more
quickly" and increase the vector body cost substantially, which causes
vectorization opportunities to be missed.

Tested by bootstrapping on aarch64-linux-gnu.

gcc/ChangeLog:

	PR target/110625
	* config/aarch64/aarch64.cc (count_ops): Remove the '* count' for
	reduction_latency.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/pr110625.c: New testcase.
---
 gcc/config/aarch64/aarch64.cc               |  5 +--
 gcc/testsuite/gcc.target/aarch64/pr110625.c | 46 +++++++++++++++++++++
 2 files changed, 47 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr110625.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 560e5431636..27afa64b7d5 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -16788,10 +16788,7 @@ aarch64_vector_costs::count_ops (unsigned int count, vect_cost_for_stmt kind,
     {
       unsigned int base
 	= aarch64_in_loop_reduction_latency (m_vinfo, stmt_info, m_vec_flags);
-
-      /* ??? Ideally we'd do COUNT reductions in parallel, but unfortunately
-	 that's not yet the case.  */
-      ops->reduction_latency = MAX (ops->reduction_latency, base * count);
+      ops->reduction_latency = MAX (ops->reduction_latency, base);
     }
 
   /* Assume that multiply-adds will become a single operation.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/pr110625.c b/gcc/testsuite/gcc.target/aarch64/pr110625.c
new file mode 100644
index 00000000000..0965cac33a0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr110625.c
@@ -0,0 +1,46 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -mcpu=neoverse-n2 -fdump-tree-vect-details -fno-tree-slp-vectorize" } */
+/* { dg-final { scan-tree-dump-not "reduction latency = 8" "vect" } } */
+
+/* Do not increase the vector body cost due to the incorrect reduction latency
+    Original vector body cost = 51
+    Scalar issue estimate:
+      ...
+      reduction latency = 2
+      estimated min cycles per iteration = 2.000000
+      estimated cycles per vector iteration (for VF 2) = 4.000000
+    Vector issue estimate:
+      ...
+      reduction latency = 8      <-- Too large
+      estimated min cycles per iteration = 8.000000
+    Increasing body cost to 102 because scalar code would issue more quickly
+      ...
+    missed: cost model: the vector iteration cost = 102 divided by the scalar iteration cost = 44 is greater or equal to the vectorization factor = 2.
+    missed: not vectorized: vectorization not profitable.  */
+
+typedef struct
+{
+  unsigned short m1, m2, m3, m4;
+} the_struct_t;
+typedef struct
+{
+  double m1, m2, m3, m4, m5;
+} the_struct2_t;
+
+double
+bar (the_struct2_t *);
+
+double
+foo (double *k, unsigned int n, the_struct_t *the_struct)
+{
+  unsigned int u;
+  the_struct2_t result;
+  for (u = 0; u < n; u++, k--)
+    {
+      result.m1 += (*k) * the_struct[u].m1;
+      result.m2 += (*k) * the_struct[u].m2;
+      result.m3 += (*k) * the_struct[u].m3;
+      result.m4 += (*k) * the_struct[u].m4;
+    }
+  return bar (&result);
+}
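The chain from the inflated latency to the missed vectorization can be illustrated with a small C model.  This is a hedged sketch, not GCC code: the function names (model_reduction_latency, model_adjust_body_cost, model_profitable_p) are hypothetical, the model assumes the reduction latency is the only bound on cycles per iteration, and the base/count values in the usage note are illustrative.  Only the costs 51, 44 and 102, the VF of 2 and the cycle estimates 4 and 8 come from the dump quoted in the testcase.

```c
#include <assert.h>

/* Hypothetical model of the changed line in count_ops: before the patch the
   per-statement latency was BASE * COUNT, after it is just BASE.  */
static unsigned int
model_reduction_latency (unsigned int prev, unsigned int base,
                         unsigned int count, int multiply_by_count)
{
  unsigned int latency = multiply_by_count ? base * count : base;
  return prev > latency ? prev : latency;   /* MAX (prev, latency) */
}

/* Hypothetical, simplified issue-rate adjustment: if the vector loop needs
   more cycles per iteration than the scalar code needs for VF scalar
   iterations, scale the body cost up by the ratio ("Increasing body cost
   ... because scalar code would issue more quickly").  */
static unsigned int
model_adjust_body_cost (unsigned int body_cost,
                        double scalar_cycles_per_vec_iter,
                        double vector_cycles_per_iter)
{
  assert (scalar_cycles_per_vec_iter > 0.0);
  if (vector_cycles_per_iter > scalar_cycles_per_vec_iter)
    return (unsigned int) (body_cost * vector_cycles_per_iter
                           / scalar_cycles_per_vec_iter);
  return body_cost;
}

/* Profitability test as worded in the dump: vectorization is rejected when
   vector_cost / scalar_cost >= vf, i.e. when vector_cost >= scalar_cost * vf
   (positive costs assumed).  */
static int
model_profitable_p (unsigned int vector_cost, unsigned int scalar_cost,
                    unsigned int vf)
{
  return vector_cost < scalar_cost * vf;
}
```

With the dump's numbers, model_adjust_body_cost (51, 4.0, 8.0) returns 102 and model_profitable_p (102, 44, 2) fails because 102 >= 88.  If the vector estimate no longer exceeds the scalar 4 cycles, the body cost stays at 51 and 51 < 88 passes.  With illustrative values base = 2 and count = 4, model_reduction_latency (0, 2, 4, 1) gives 8 while model_reduction_latency (0, 2, 4, 0) gives 2.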