From patchwork Wed Jul 5 08:46:26 2023
X-Patchwork-Submitter: Hao Liu OS
X-Patchwork-Id: 1803495
To: "GCC-patches@gcc.gnu.org"
Subject: [PATCH] Vect: select small VF for epilog of unrolled loop (PR tree-optimization/110474)
Date: Wed, 5 Jul 2023 08:46:26 +0000
List-Id: Gcc-patches mailing list
From: Hao Liu OS
Reply-To: Hao Liu OS

Hi,

If a loop is unrolled during vectorization (i.e. suggested_unroll_factor > 1),
the VFs of both the main and epilog loops are enlarged.  The epilog vect loop
targets loops with small iteration counts, so a large VF may hurt performance.

This patch unscales the main loop VF by suggested_unroll_factor while
selecting the epilog loop VF, so that the epilog VF is the same as for a
vectorized loop without unrolling (i.e. suggested_unroll_factor = 1).

gcc/ChangeLog:

	PR tree-optimization/110474
	* tree-vect-loop.cc (vect_analyze_loop_2): Unscale the VF by the
	suggested unroll factor while selecting the epilog vect loop VF.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/pr110474.c: New testcase.
---
 gcc/testsuite/gcc.target/aarch64/pr110474.c | 37 +++++++++++++++++++++
 gcc/tree-vect-loop.cc                       | 16 +++++----
 2 files changed, 47 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr110474.c

diff --git a/gcc/testsuite/gcc.target/aarch64/pr110474.c b/gcc/testsuite/gcc.target/aarch64/pr110474.c
new file mode 100644
index 00000000000..e548416162a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr110474.c
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mtune=neoverse-n2 -mcpu=neoverse-n1 -fdump-tree-vect-details --param aarch64-vect-unroll-limit=2" } */
+/* { dg-final { scan-tree-dump "Choosing vector mode V8HI"  "vect" } } */
+/* { dg-final { scan-tree-dump "Choosing epilogue vector mode V8QI"  "vect" } } */
+
+/* Do not increase the vectorization factor of the epilog vectorized loop
+   for a loop with suggested_unroll_factor > 1.
+
+   before (suggested_unroll_factor=1):
+     if N >= 16:
+         main vect loop
+     if N >= 8:
+         epilog vect loop
+     scalar code
+
+   before (suggested_unroll_factor=2):
+     if N >= 32:
+         main vect loop
+     if N >= 16:  // May fail to execute vectorized code (e.g. N is 8)
+         epilog vect loop
+     scalar code
+
+   after  (suggested_unroll_factor=2):
+     if N >= 32:
+         main vect loop
+     if N >= 8:  // The same VF as suggested_unroll_factor=1
+         epilog vect loop
+     scalar code  */
+
+int
+foo (short *A, char *B, int N)
+{
+  int sum = 0;
+  for (int i = 0; i < N; ++i)
+    sum += A[i] * B[i];
+  return sum;
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 3b46c58a8d8..4d9abd035ea 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -3021,12 +3021,16 @@ start_over:
      to be able to handle fewer than VF scalars, or needs to have a lower VF
      than the main loop.  */
   if (LOOP_VINFO_EPILOGUE_P (loop_vinfo)
-      && !LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
-      && maybe_ge (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
-                   LOOP_VINFO_VECT_FACTOR (orig_loop_vinfo)))
-    return opt_result::failure_at (vect_location,
-                                   "Vectorization factor too high for"
-                                   " epilogue loop.\n");
+      && !LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
+    {
+      poly_uint64 unscaled_vf
+        = exact_div (LOOP_VINFO_VECT_FACTOR (orig_loop_vinfo),
+                     orig_loop_vinfo->suggested_unroll_factor);
+      if (maybe_ge (LOOP_VINFO_VECT_FACTOR (loop_vinfo), unscaled_vf))
+        return opt_result::failure_at (vect_location,
+                                       "Vectorization factor too high for"
+                                       " epilogue loop.\n");
+    }
 
   /* Decide whether this loop_vinfo should use partial vectors or peeling,
      assuming that the loop will be used as a main loop.  We will redo
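
For reviewers who want to sanity-check the numbers in the testcase comment, the changed check can be modelled with plain integers.  This is only a rough sketch, not GCC code: the helper names epilogue_vf_too_high and pick_epilogue_vf are hypothetical, constant VFs stand in for poly_uint64 (so maybe_ge becomes >= and exact_div becomes ordinary division), the epilogue is assumed not to use partial vectors, and the candidate order {16, 8} is an assumption chosen to match the comment.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified model of the epilogue VF check in vect_analyze_loop_2.
// Hypothetical helper, not a GCC API: constant VFs replace poly_uint64.
static bool
epilogue_vf_too_high (std::uint64_t epilogue_vf, std::uint64_t main_vf,
                      std::uint64_t unroll_factor, bool patched)
{
  std::uint64_t limit = main_vf;
  if (patched)
    {
      // Mirrors exact_div: the main VF must be a multiple of the factor.
      assert (unroll_factor != 0 && main_vf % unroll_factor == 0);
      limit = main_vf / unroll_factor;
    }
  // Mirrors maybe_ge for constant values.
  return epilogue_vf >= limit;
}

// Return the first candidate VF that passes the check, or 0 if none does.
// The candidate list and its order are assumptions made for illustration.
static std::uint64_t
pick_epilogue_vf (const std::vector<std::uint64_t> &candidates,
                  std::uint64_t main_vf, std::uint64_t unroll_factor,
                  bool patched)
{
  for (std::uint64_t vf : candidates)
    if (!epilogue_vf_too_high (vf, main_vf, unroll_factor, patched))
      return vf;
  return 0;
}
```

Under these assumptions, pick_epilogue_vf ({16, 8}, 32, 2, false) yields 16 and pick_epilogue_vf ({16, 8}, 32, 2, true) yields 8, which corresponds to the "before" and "after" cases for suggested_unroll_factor=2 in the testcase comment.  With main VF 16 and unroll factor 1, both variants yield 8.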