From patchwork Wed Aug 2 07:28:26 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jan Hubicka
X-Patchwork-Id: 1815819
Date: Wed, 2 Aug 2023 09:28:26 +0200
To: gcc-patches@gcc.gnu.org
Subject: Fix profile update after cancelled loop distribution
List-Id: Gcc-patches mailing list
From: Jan Hubicka
Reply-To: Jan Hubicka

Hi,
Loop distribution and ifcvt introduce versions of loops which may be removed
later if vectorization fails.  Ifcvt does this by temporarily breaking the
profile and producing a conditional that has two arms with 100% probability,
because we know one of the versions will be removed.

Loop distribution is trickier, since it introduces a test for alignment that
either survives to the final code if vectorization succeeds or is turned into
a constant if it fails.

Here we need to assign some reasonable probabilities for the case where
vectorization goes well, so this code adds logic to scale the profile back in
case we remove the call.

This is not perfect, since we drop precise BB counts to guessed.  It is not a
big deal, since we do not rely much on the reliability of BB counts after this
point.
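
To make the scaling step concrete, here is a minimal standalone sketch of the
same idea.  It is an illustration under simplifying assumptions, not the code
in this patch: ToyBlock, dom_sons and scale_strictly_dominated are
hypothetical names, counts are plain integers rather than profile_count, and
the division simply truncates instead of going through
profile_count::apply_scale.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a basic block: an execution count plus the
// indices of its immediate children in the dominator tree.
struct ToyBlock {
  std::uint64_t count;
  std::vector<int> dom_sons;
};

// Scale the count of every block strictly dominated by `root` by num/den.
// `root` itself is left untouched; the caller scales it separately.
void scale_strictly_dominated(std::vector<ToyBlock> &blocks, int root,
                              std::uint64_t num, std::uint64_t den) {
  assert(root >= 0 && static_cast<std::size_t>(root) < blocks.size());
  // A zero denominator with a nonzero numerator gives no usable ratio.
  if (den == 0 && num != 0)
    return;
  std::vector<int> worklist{root};
  while (!worklist.empty()) {
    int bb = worklist.back();
    worklist.pop_back();
    // Walk the immediate dominator-tree children and queue each of them so
    // that their own children are visited later.
    for (int son : blocks[bb].dom_sons) {
      blocks[son].count = (num == 0) ? 0 : blocks[son].count * num / den;
      worklist.push_back(son);
    }
  }
}
```

With a dominator tree 0 -> 1 -> {2, 3} holding counts 100, 40, 30 and 10,
scaling from block 1 by 100/40 turns the counts of blocks 2 and 3 into 75 and
25 and leaves block 1 alone.  The root is scaled by the caller, in the same
way fold_loop_internal_call scales taken_edge->dest before calling
scale_strictly_dominated_blocks.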

Another option would be to apply the scale only if vectorization succeeds.
That, however, needs a bit more work on the tree-loop-distribution side and
would still need all the code in this patch, with the small change that
fold_loop_internal_call would have to know how to adjust the profile if the
conditional stays.  I decided to go for the easier solution for now.

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

        * cfg.cc (scale_strictly_dominated_blocks): New function.
        * cfg.h (scale_strictly_dominated_blocks): Declare.
        * tree-cfg.cc (fold_loop_internal_call): Fixup CFG profile.

gcc/testsuite/ChangeLog:

        * gcc.dg/vect/pr98308.c: Check that profile is consistent.

diff --git a/gcc/cfg.cc b/gcc/cfg.cc
index 0de6d6b9e71..9eb9916f61a 100644
--- a/gcc/cfg.cc
+++ b/gcc/cfg.cc
@@ -1195,3 +1195,27 @@ get_loop_copy (class loop *loop)
   else
     return NULL;
 }
+
+/* Scales the frequencies of all basic blocks that are strictly
+   dominated by BB by NUM/DEN.  */
+
+void
+scale_strictly_dominated_blocks (basic_block bb,
+                                 profile_count num, profile_count den)
+{
+  basic_block son;
+
+  if (!den.nonzero_p () && !(num == profile_count::zero ()))
+    return;
+  auto_vec <basic_block> worklist;
+  worklist.safe_push (bb);
+
+  while (!worklist.is_empty ())
+    for (son = first_dom_son (CDI_DOMINATORS, worklist.pop ());
+         son;
+         son = next_dom_son (CDI_DOMINATORS, son))
+      {
+        son->count = son->count.apply_scale (num, den);
+        worklist.safe_push (son);
+      }
+}
diff --git a/gcc/cfg.h b/gcc/cfg.h
index 4bf4263ebfc..a0e944979c8 100644
--- a/gcc/cfg.h
+++ b/gcc/cfg.h
@@ -127,6 +127,8 @@ extern void set_bb_copy (basic_block, basic_block);
 extern basic_block get_bb_copy (basic_block);
 void set_loop_copy (class loop *, class loop *);
 class loop *get_loop_copy (class loop *);
+void scale_strictly_dominated_blocks (basic_block,
+                                      profile_count, profile_count);
 
 /* Generic RAII class to allocate a bit from storage of integer type T.
    The allocated bit is accessible as mask with the single bit set
diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
index c65af8cc800..c158454946c 100644
--- a/gcc/tree-cfg.cc
+++ b/gcc/tree-cfg.cc
@@ -7703,6 +7703,44 @@ fold_loop_internal_call (gimple *g, tree value)
       FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
         SET_USE (use_p, value);
       update_stmt (use_stmt);
+      /* If we turn conditional to constant, scale profile counts.
+         We know that the conditional was created by loop distribution
+         and all basic blocks dominated by the taken edge are part of
+         the loop distributed.  */
+      if (gimple_code (use_stmt) == GIMPLE_COND)
+        {
+          edge true_edge, false_edge;
+          extract_true_false_edges_from_block (gimple_bb (use_stmt),
+                                               &true_edge, &false_edge);
+          edge taken_edge = NULL, other_edge = NULL;
+          if (gimple_cond_true_p (as_a <gcond *>(use_stmt)))
+            {
+              taken_edge = true_edge;
+              other_edge = false_edge;
+            }
+          else if (gimple_cond_false_p (as_a <gcond *>(use_stmt)))
+            {
+              taken_edge = false_edge;
+              other_edge = true_edge;
+            }
+          if (taken_edge
+              && !(taken_edge->probability == profile_probability::always ()))
+            {
+              profile_count old_count = taken_edge->count ();
+              profile_count new_count = taken_edge->src->count;
+              taken_edge->probability = profile_probability::always ();
+              other_edge->probability = profile_probability::never ();
+              /* If we have multiple predecessors, we can't use the dominance
+                 test.  This should not happen as the guarded code should
+                 start with pre-header.  */
+              gcc_assert (single_pred_edge (taken_edge->dest));
+              taken_edge->dest->count
+                = taken_edge->dest->count.apply_scale (new_count,
+                                                       old_count);
+              scale_strictly_dominated_blocks (taken_edge->dest,
+                                               new_count, old_count);
+            }
+        }
     }
 }
 
diff --git a/gcc/testsuite/gcc.dg/vect/pr98308.c b/gcc/testsuite/gcc.dg/vect/pr98308.c
index 7d717b1ee51..aeec9771c55 100644
--- a/gcc/testsuite/gcc.dg/vect/pr98308.c
+++ b/gcc/testsuite/gcc.dg/vect/pr98308.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-additional-options "-O3" } */
 /* { dg-additional-options "-march=skylake-avx512" { target avx512f } } */
+/* { dg-additional-options "-fdump-tree-optimized-details-blocks" } */
 
 extern unsigned long long int arr_86[];
 extern unsigned long long int arr_87[][15];
@@ -14,3 +15,4 @@ void test(_Bool a, unsigned short c[][15], unsigned char d[])
         arr_87[h][0] = a ? c[h][i] : 0;
       }
 }
+/* { dg-final { scan-tree-dump-not "Invalid sum" "optimized" } } */