From patchwork Mon Aug 30 12:03:43 2021
X-Patchwork-Submitter: Richard Biener
X-Patchwork-Id: 1522218
Date: Mon, 30 Aug 2021 14:03:43 +0200 (CEST)
From: Richard Biener
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] tree-optimization/102128 - rework if-converted BB vect heuristic

This reworks the previous attempt to avoid leaving if-converted scalar
code around in BB-vectorized loop bodies so that independent subgraphs
are still costed separately, which should address the observed
regression with 519.lbm_r.  For this to work we now first cost all
subgraphs and only then proceed to emit vectorized code.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-08-30  Richard Biener

	PR tree-optimization/102128
	* tree-vect-slp.c (vect_bb_vectorization_profitable_p): Move
	scanning for if-converted scalar code to the caller and instead
	delay clearing the visited flag for profitable subgraphs.
	(vect_slp_region): Cost all subgraphs before scheduling.
	For if-converted BB vectorization scan for scalar COND_EXPRs
	and do not vectorize if any are found and the cost model is
	very-cheap.
---
 gcc/tree-vect-slp.c | 112 +++++++++++++++++++++++---------------------
 1 file changed, 58 insertions(+), 54 deletions(-)

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 4d688c7a267..4ca24408249 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -5275,34 +5275,6 @@ vect_bb_vectorization_profitable_p (bb_vec_info bb_vinfo,
       vector_costs.safe_splice (instance->cost_vec);
       instance->cost_vec.release ();
     }
-  /* When we're vectorizing an if-converted loop body with the
-     very-cheap cost model make sure we vectorized all if-converted
-     code.  */
-  bool force_not_profitable = false;
-  if (orig_loop && flag_vect_cost_model == VECT_COST_MODEL_VERY_CHEAP)
-    {
-      gcc_assert (bb_vinfo->bbs.length () == 1);
-      for (gimple_stmt_iterator gsi = gsi_start_bb (bb_vinfo->bbs[0]);
-           !gsi_end_p (gsi); gsi_next (&gsi))
-        {
-          /* The costing above left us with DCEable vectorized scalar
-             stmts having the visited flag set.  */
-          if (gimple_visited_p (gsi_stmt (gsi)))
-            continue;
-
-          if (gassign *ass = dyn_cast <gassign *> (gsi_stmt (gsi)))
-            if (gimple_assign_rhs_code (ass) == COND_EXPR)
-              {
-                force_not_profitable = true;
-                break;
-              }
-        }
-    }
-
-  /* Unset visited flag.  */
-  stmt_info_for_cost *cost;
-  FOR_EACH_VEC_ELT (scalar_costs, i, cost)
-    gimple_set_visited (cost->stmt_info->stmt, false);
 
   if (dump_enabled_p ())
     dump_printf_loc (MSG_NOTE, vect_location, "Cost model analysis: \n");
@@ -5319,6 +5291,7 @@ vect_bb_vectorization_profitable_p (bb_vec_info bb_vinfo,
     li_scalar_costs (scalar_costs.length ());
   auto_vec >
     li_vector_costs (vector_costs.length ());
+  stmt_info_for_cost *cost;
   FOR_EACH_VEC_ELT (scalar_costs, i, cost)
     {
       unsigned l = gimple_bb (cost->stmt_info->stmt)->loop_father->num;
@@ -5341,6 +5314,7 @@ vect_bb_vectorization_profitable_p (bb_vec_info bb_vinfo,
   /* Now cost the portions individually.  */
   unsigned vi = 0;
   unsigned si = 0;
+  bool profitable = true;
   while (si < li_scalar_costs.length ()
          && vi < li_vector_costs.length ())
     {
@@ -5407,30 +5381,29 @@ vect_bb_vectorization_profitable_p (bb_vec_info bb_vinfo,
          example).  */
       if (vec_outside_cost + vec_inside_cost > scalar_cost)
         {
-          scalar_costs.release ();
-          vector_costs.release ();
-          return false;
+          profitable = false;
+          break;
         }
     }
-  if (vi < li_vector_costs.length ())
+  if (profitable && vi < li_vector_costs.length ())
     {
       if (dump_enabled_p ())
         dump_printf_loc (MSG_NOTE, vect_location,
                          "Excess vector cost for part in loop %d:\n",
                          li_vector_costs[vi].first);
-      scalar_costs.release ();
-      vector_costs.release ();
-      return false;
+      profitable = false;
     }
 
-  if (dump_enabled_p () && force_not_profitable)
-    dump_printf_loc (MSG_NOTE, vect_location,
-                     "not profitable because of unprofitable if-converted "
-                     "scalar code\n");
+  /* Unset visited flag.  This is delayed when the subgraph is profitable
+     and we process the loop for remaining unvectorized if-converted code.  */
+  if (orig_loop && !profitable)
+    FOR_EACH_VEC_ELT (scalar_costs, i, cost)
+      gimple_set_visited (cost->stmt_info->stmt, false);
 
   scalar_costs.release ();
   vector_costs.release ();
-  return !force_not_profitable;
+
+  return profitable;
 }
 
 /* qsort comparator for lane defs.  */
@@ -5884,9 +5857,8 @@ vect_slp_region (vec bbs, vec datarefs,
 
       bb_vinfo->shared->check_datarefs ();
 
-      unsigned i;
-      slp_instance instance;
-      FOR_EACH_VEC_ELT (BB_VINFO_SLP_INSTANCES (bb_vinfo), i, instance)
+      auto_vec<slp_instance> profitable_subgraphs;
+      for (slp_instance instance : BB_VINFO_SLP_INSTANCES (bb_vinfo))
         {
           if (instance->subgraph_entries.is_empty ())
             continue;
@@ -5894,9 +5866,7 @@ vect_slp_region (vec bbs, vec datarefs,
           vect_location = instance->location ();
           if (!unlimited_cost_model (NULL)
               && !vect_bb_vectorization_profitable_p
-                    (bb_vinfo,
-                     orig_loop ? BB_VINFO_SLP_INSTANCES (bb_vinfo)
-                     : instance->subgraph_entries, orig_loop))
+                    (bb_vinfo, instance->subgraph_entries, orig_loop))
             {
               if (dump_enabled_p ())
                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5908,15 +5878,54 @@ vect_slp_region (vec bbs, vec datarefs,
           if (!dbg_cnt (vect_slp))
             continue;
 
+          profitable_subgraphs.safe_push (instance);
+        }
+
+      /* When we're vectorizing an if-converted loop body with the
+         very-cheap cost model make sure we vectorized all if-converted
+         code.  */
+      if (!profitable_subgraphs.is_empty ()
+          && orig_loop)
+        {
+          gcc_assert (bb_vinfo->bbs.length () == 1);
+          for (gimple_stmt_iterator gsi = gsi_start_bb (bb_vinfo->bbs[0]);
+               !gsi_end_p (gsi); gsi_next (&gsi))
+            {
+              /* The costing above left us with DCEable vectorized scalar
+                 stmts having the visited flag set on profitable
+                 subgraphs.  Do the delayed clearing of the flag here.  */
+              if (gimple_visited_p (gsi_stmt (gsi)))
+                {
+                  gimple_set_visited (gsi_stmt (gsi), false);
+                  continue;
+                }
+              if (flag_vect_cost_model != VECT_COST_MODEL_VERY_CHEAP)
+                continue;
+
+              if (gassign *ass = dyn_cast <gassign *> (gsi_stmt (gsi)))
+                if (gimple_assign_rhs_code (ass) == COND_EXPR)
+                  {
+                    if (!profitable_subgraphs.is_empty ()
+                        && dump_enabled_p ())
+                      dump_printf_loc (MSG_NOTE, vect_location,
+                                       "not profitable because of "
+                                       "unprofitable if-converted scalar "
+                                       "code\n");
+                    profitable_subgraphs.truncate (0);
+                  }
+            }
+        }
+
+      /* Finally schedule the profitable subgraphs.  */
+      for (slp_instance instance : profitable_subgraphs)
+        {
           if (!vectorized && dump_enabled_p ())
             dump_printf_loc (MSG_NOTE, vect_location,
                              "Basic block will be vectorized "
                              "using SLP\n");
           vectorized = true;
 
-          vect_schedule_slp (bb_vinfo,
-                             orig_loop ? BB_VINFO_SLP_INSTANCES (bb_vinfo)
-                             : instance->subgraph_entries);
+          vect_schedule_slp (bb_vinfo, instance->subgraph_entries);
 
           unsigned HOST_WIDE_INT bytes;
           if (dump_enabled_p ())
@@ -5931,11 +5940,6 @@ vect_slp_region (vec bbs, vec datarefs,
                                  "basic block part vectorized using "
                                  "variable length vectors\n");
             }
-
-          /* When we're called from loop vectorization we're considering
-             all subgraphs at once.  */
-          if (orig_loop)
-            break;
         }
     }
   else
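The new order in vect_slp_region (cost every subgraph, then veto on leftover scalar COND_EXPRs, then schedule) can be illustrated with a small stand-alone model. The sketch below is hypothetical and is not GCC code: Stmt, Subgraph, subgraph_profitable_p and select_subgraphs are invented names, costs are plain integers, a subgraph is assumed profitable when its vector cost does not exceed its scalar cost, and only the if-converted loop-body case (orig_loop set) is modelled. It shows why the visited flag has to survive costing for profitable subgraphs: the later scan uses it to tell covered COND_EXPRs from ones that would stay scalar.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, heavily simplified model of the decision flow in the patch
// for an if-converted loop body (orig_loop set).  None of these types or
// functions exist in GCC.

struct Stmt
{
  bool is_cond_expr;  // stands in for a scalar COND_EXPR assignment
  bool visited;       // stands in for gimple_visited_p ()
};

struct Subgraph
{
  std::vector<int> stmts;  // indices of the scalar stmts it would replace
  int scalar_cost;
  int vector_cost;
};

// Phase 1 helper: cost one subgraph on its own.  The stmts it covers keep
// their visited flag if it is profitable (delayed clearing); otherwise the
// flag is cleared immediately.
static bool
subgraph_profitable_p (const Subgraph &sg, std::vector<Stmt> &body)
{
  for (int s : sg.stmts)
    {
      assert (s >= 0 && s < (int) body.size ());
      body[s].visited = true;
    }
  bool profitable = sg.vector_cost <= sg.scalar_cost;
  if (!profitable)
    for (int s : sg.stmts)
      body[s].visited = false;
  return profitable;
}

// Returns the indices of the subgraphs that would be scheduled.
static std::vector<int>
select_subgraphs (const std::vector<Subgraph> &sgs, std::vector<Stmt> &body,
                  bool very_cheap)
{
  // Phase 1: cost every subgraph independently; schedule nothing yet.
  std::vector<int> profitable;
  for (int i = 0; i < (int) sgs.size (); ++i)
    if (subgraph_profitable_p (sgs[i], body))
      profitable.push_back (i);

  // Phase 2: scan the body.  A visited stmt is covered by a profitable
  // subgraph, so only its flag is cleared.  Any other COND_EXPR would
  // remain scalar, which the very-cheap model rejects for the whole body.
  // The scan does not stop early, so every remaining flag is cleared.
  if (!profitable.empty ())
    for (Stmt &st : body)
      {
        if (st.visited)
          {
            st.visited = false;
            continue;
          }
        if (very_cheap && st.is_cond_expr)
          profitable.clear ();
      }

  // Phase 3: the caller would schedule only the subgraphs listed here.
  return profitable;
}
```

For example, with the very-cheap model, if one profitable subgraph covers a COND_EXPR while a second COND_EXPR belongs only to an unprofitable subgraph, the second statement loses its visited flag in phase 1, so phase 2 treats it as leftover scalar code and drops every subgraph; with very_cheap set to false the profitable subgraph is kept.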