From patchwork Sun Aug 6 22:56:26 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Hubicka X-Patchwork-Id: 1817573 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=vRBXpzq/; dkim-atps=neutral Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4RJvw723bXz1yYl for ; Mon, 7 Aug 2023 08:56:54 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 29CAF3858284 for ; Sun, 6 Aug 2023 22:56:52 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 29CAF3858284 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1691362612; bh=stuyxitQZLKOgTWUKvFUXPcP1Pq3zIDyrNI5Q4qc7ec=; h=Date:To:Subject:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:From; b=vRBXpzq/JstXuGbDUx38/oDw+IYWPhcbeW3MJzndWMbmbuIXecPMqvS2jvNIoLQl/ 00O+oIG6MA+6jVWceRj8q9xKoFX/af7J5eVRZamRO5cAyfPqQe0IUmmZrPRDM8tBVN jwHnwC7Tkl9HuNhzXh0tSzOtZhdGco+mbbiZe3gI= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from nikam.ms.mff.cuni.cz (nikam.ms.mff.cuni.cz [195.113.20.16]) by sourceware.org (Postfix) with ESMTPS id 675D63858C2D for ; Sun, 6 Aug 2023 22:56:28 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 675D63858C2D Received: by nikam.ms.mff.cuni.cz (Postfix, from userid 16202) id 0E1DD2812F9; Mon, 7 Aug 2023 00:56:27 +0200 (CEST) Date: Mon, 7 Aug 2023 00:56:26 +0200 To: gcc-patches@gcc.gnu.org Subject: Fix profile update after versioning ifconverted loop Message-ID: MIME-Version: 1.0 Content-Disposition: inline X-Spam-Status: No, score=-11.1 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, JMQ_SPF_NEUTRAL, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Jan Hubicka via Gcc-patches From: Jan Hubicka Reply-To: Jan Hubicka Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" Hi, If loop is ifconverted and later versioning by vectorizer, vectorizer will reuse the scalar loop produced by ifconvert. Curiously enough it does not seem to do so for versions produced by loop distribution while for loop distribution this matters (since since both ldist versions survive to final code) while after ifcvt it does not (since we remove non-vectorized path). This patch fixes associated profile update. Here it is necessary to scale both arms of the conditional according to runtime checks inserted. We got partly right the loop body, but not the preheader block and block after exit. The first is particularly bad since it changes loop iterations estimates. So we now turn 4 original loops: loop 1: iterations by profile: 473.497707 (reliable) entry count:84821 (precise, freq 0.9979) loop 2: iterations by profile: 100.000000 (reliable) entry count:39848881 (precise, freq 468.8104) loop 3: iterations by profile: 100.000000 (reliable) entry count:39848881 (precise, freq 468.8104) loop 4: iterations by profile: 100.999596 (reliable) entry count:84167 (precise, freq 0.9902) Into following loops iterations by profile: 5.312499 (unreliable, maybe flat) entry count:12742188 (guessed, freq 149.9081) vectorized and split loop 1, peeled iterations by profile: 0.009496 (unreliable, maybe flat) entry count:374798 (guessed, freq 4.4094) split loop 1 (last iteration), peeled iterations by profile: 100.000008 (unreliable) entry count:3945039 (guessed, freq 46.4122) scalar version of loop 1 iterations by profile: 100.000007 (unreliable) entry count:7101070 (guessed, freq 83.5420) redundant scalar version of loop 1 which we could eliminate if vectorizer understood ldist iterations by profile: 100.000000 (unreliable) entry count:35505353 (guessed, freq 417.7100) unvectorized loop 2 iterations by profile: 5.312500 (unreliable) entry count:25563855 (guessed, freq 300.7512) vectorized loop 2, not peeled (hits max-peel-insns) iterations by profile: 100.000007 (unreliable) entry count:7101070 (guessed, freq 83.5420) unvectorized loop 3 iterations by profile: 5.312500 (unreliable) entry count:25563855 (guessed, freq 300.7512) vectorized loop 3, not peeled (hits max-peel-insns) iterations by profile: 473.497707 (reliable) entry count:84821 (precise, freq 0.9979) loop 1 iterations by profile: 100.999596 (reliable) entry count:84167 (precise, freq 0.9902) loop 4 With this change we are on 0 profile erros on hmmer benchmark: Pass dump id |dynamic mismatch |overall | |in count |size |time | 172t ch_vect | 0 | 996 | 385812023346 | 173t ifcvt | 71010686 +71010686| 1021 +2.5%| 468361969416 +21.4%| 174t vect | 210830784 +139820098| 1497 +46.6%| 216073467874 -53.9%| 175t dce | 210830784 | 1387 -7.3%| 205273170281 -5.0%| 176t pcom | 210830784 | 1387 | 201722634966 -1.7%| 177t cunroll | 0 -210830784| 1443 +4.0%| 180441501289 -10.5%| 182t ivopts | 0 | 1385 -4.0%| 136412345683 -24.4%| 183t lim | 0 | 1389 +0.3%| 135093950836 -1.0%| 192t reassoc | 0 | 1381 -0.6%| 134778347700 -0.2%| 193t slsr | 0 | 1380 -0.1%| 134738100330 -0.0%| 195t tracer | 0 | 1521 +10.2%| 134738179146 +0.0%| 196t fre | 2680654 +2680654| 1489 -2.1%| 134659672725 -0.1%| 198t dom | 5361308 +2680654| 1473 -1.1%| 134449553658 -0.2%| 201t vrp | 5361308 | 1474 +0.1%| 134489004050 +0.0%| 202t ccp | 5361308 | 1472 -0.1%| 134440752274 -0.0%| 204t dse | 5361308 | 1444 -1.9%| 133802300525 -0.5%| 206t forwprop| 5361308 | 1433 -0.8%| 133542828370 -0.2%| 207t sink | 5361308 | 1431 -0.1%| 133542658728 -0.0%| 211t store-me| 5361308 | 1430 -0.1%| 133542573728 -0.0%| 212t cddce | 5361308 | 1428 -0.1%| 133541776728 -0.0%| 258r expand | 5361308 |----------------|--------------------| 260r into_cfg| 5361308 | 9334 -0.8%| 885820707913 -0.6%| 261r jump | 5361308 | 9330 -0.0%| 885820367913 -0.0%| 265r fwprop1 | 5361308 | 9206 -1.3%| 876756504385 -1.0%| 267r rtl pre | 5361308 | 9210 +0.0%| 876914305953 +0.0%| 269r cprop | 5361308 | 9202 -0.1%| 876756165101 -0.0%| 271r cse_loca| 5361308 | 9198 -0.0%| 876727760821 -0.0%| 272r ce1 | 5361308 | 9126 -0.8%| 875726815885 -0.1%| 276r loop2_in| 5361308 | 9167 +0.4%| 873573110570 -0.2%| 282r cprop | 5361308 | 9095 -0.8%| 871937317262 -0.2%| 284r cse2 | 5361308 | 9091 -0.0%| 871936977978 -0.0%| 285r dse1 | 5361308 | 9067 -0.3%| 871437031602 -0.1%| 290r combine | 5361308 | 9071 +0.0%| 869206278202 -0.3%| 292r stv | 5361308 | 17157 +89.1%| 2111071925708+142.9%| 295r bbpart | 5361308 | 17161 +0.0%| 2111071925708 | 296r outof_cf| 5361308 | 17233 +0.4%| 2111655121000 +0.0%| 297r split1 | 5361308 | 17245 +0.1%| 2111656138852 +0.0%| 306r ira | 5361308 | 19189 +11.3%| 2136098398308 +1.2%| 307r reload | 5361308 | 12101 -36.9%| 981091222830 -54.1%| 309r postrelo| 5361308 | 12019 -0.7%| 978750345475 -0.2%| 310r gcse2 | 5361308 | 12027 +0.1%| 978329108320 -0.0%| 311r split2 | 5361308 | 12023 -0.0%| 978507631352 +0.0%| 312r ree | 5361308 | 12027 +0.0%| 978505414244 -0.0%| 313r cmpelim | 5361308 | 11979 -0.4%| 977531601988 -0.1%| 314r pro_and_| 5361308 | 12091 +0.9%| 977541801988 +0.0%| 315r dse2 | 5361308 | 12091 | 977541801988 | 316r csa | 5361308 | 12087 -0.0%| 977541461988 -0.0%| 317r jump2 | 5361308 | 12039 -0.4%| 977683176572 +0.0%| 318r compgoto| 5361308 | 12039 | 977683176572 | 320r peephole| 5361308 | 12047 +0.1%| 977362727612 -0.0%| 321r ce3 | 5361308 | 12047 | 977362727612 | 323r cprop_ha| 5361308 | 11907 -1.2%| 968751076676 -0.9%| 324r rtl_dce | 5361308 | 11903 -0.0%| 968593274820 -0.0%| 325r bbro | 5361308 | 11883 -0.2%| 967964046644 -0.1%| Bootstrapped/regtested x86_64-linux, plan to commit it tomorrow if there are no complains. gcc/ChangeLog: PR tree-optimization/106293 * tree-vect-loop-manip.cc (vect_loop_versioning): Fix profile update. * tree-vect-loop.cc (vect_transform_loop): Likewise. gcc/testsuite/ChangeLog: PR tree-optimization/106293 * gcc.dg/vect/vect-cond-11.c: Check profile consistency. * gcc.dg/vect/vect-widen-mult-extern-1.c: Check profile consistency. diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-11.c b/gcc/testsuite/gcc.dg/vect/vect-cond-11.c index 38f1f8f5090..b1347bddbd5 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-cond-11.c +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-11.c @@ -1,3 +1,4 @@ +/* { dg-additional-options "-fdump-tree-optimized-details-blocks" } */ #include "tree-vect.h" #define N 1024 @@ -116,3 +117,4 @@ main () return 0; } +/* { dg-final { scan-tree-dump-not "Invalid sum" "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-extern-1.c b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-extern-1.c index 2ac3be0c242..2df3c8f1cf1 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-widen-mult-extern-1.c +++ b/gcc/testsuite/gcc.dg/vect/vect-widen-mult-extern-1.c @@ -1,3 +1,4 @@ +/* { dg-additional-options "-fdump-tree-optimized-details-blocks" } */ /* { dg-do compile } */ #define N 1024 @@ -13,3 +14,4 @@ f (unsigned int *x1, unsigned int *x2, unsigned short *y, unsigned char z) x2[i] += 1; } } +/* { dg-final { scan-tree-dump-not "Invalid sum" "optimized" } } */ diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc index 0e7e223f22a..09641901ff1 100644 --- a/gcc/tree-vect-loop-manip.cc +++ b/gcc/tree-vect-loop-manip.cc @@ -3933,9 +3933,15 @@ vect_loop_versioning (loop_vec_info loop_vinfo, te->probability = prob; fe->probability = prob.invert (); /* We can scale loops counts immediately but have to postpone - scaling the scalar loop because we re-use it during peeling. */ - scale_loop_frequencies (loop_to_version, te->probability); - LOOP_VINFO_SCALAR_LOOP_SCALING (loop_vinfo) = fe->probability; + scaling the scalar loop because we re-use it during peeling. + + Ifcvt duplicates loop preheader, loop body and produces an basic + block after loop exit. We need to scale all that. */ + basic_block preheader = loop_preheader_edge (loop_to_version)->src; + preheader->count = preheader->count.apply_probability (prob * prob2); + scale_loop_frequencies (loop_to_version, prob * prob2); + single_exit (loop_to_version)->dest->count = preheader->count; + LOOP_VINFO_SCALAR_LOOP_SCALING (loop_vinfo) = (prob * prob2).invert (); nloop = scalar_loop; if (dump_enabled_p ()) diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index a83952aff60..00058c3c13e 100644 --- a/gcc/tree-vect-loop.cc +++ b/gcc/tree-vect-loop.cc @@ -11318,8 +11318,19 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call) &advance); if (LOOP_VINFO_SCALAR_LOOP (loop_vinfo) && LOOP_VINFO_SCALAR_LOOP_SCALING (loop_vinfo).initialized_p ()) - scale_loop_frequencies (LOOP_VINFO_SCALAR_LOOP (loop_vinfo), - LOOP_VINFO_SCALAR_LOOP_SCALING (loop_vinfo)); + { + /* Ifcvt duplicates loop preheader, loop body and produces an basic + block after loop exit. We need to scale all that. */ + basic_block preheader + = loop_preheader_edge (LOOP_VINFO_SCALAR_LOOP (loop_vinfo))->src; + preheader->count + = preheader->count.apply_probability + (LOOP_VINFO_SCALAR_LOOP_SCALING (loop_vinfo)); + scale_loop_frequencies (LOOP_VINFO_SCALAR_LOOP (loop_vinfo), + LOOP_VINFO_SCALAR_LOOP_SCALING (loop_vinfo)); + single_exit (LOOP_VINFO_SCALAR_LOOP (loop_vinfo))->dest->count + = preheader->count; + } if (niters_vector == NULL_TREE) {