From patchwork Fri Jun 17 10:53:41 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kewen.Lin" X-Patchwork-Id: 1644804 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=Yx3HN9E5; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4LPbWv0Qfkz9s5V for ; Fri, 17 Jun 2022 20:54:18 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 8AFAF3858292 for ; Fri, 17 Jun 2022 10:54:15 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 8AFAF3858292 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1655463255; bh=aBIi/chygPcq6H9tMuBrD7DZz2W3/xm5jFyTQzmMPkg=; h=Date:Subject:To:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:Cc:From; b=Yx3HN9E5R6dWS/d+JrGgsBop3Xt/VWEFPg79fsGfh5Rbok3l+yyfGGJXFCr/lIgIw AAJ0lHQXksiBJEqFaPjXXjDkFidHKQ9CZcnUcnDOd0sVNENUow6fcU8c9Qo1/fnbjY WfMn0610G9wo2ksxfrwMUoyryr87nOBOkBDpGg+s= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id 1F2A13858292 for ; Fri, 17 Jun 2022 10:53:54 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 1F2A13858292 Received: from pps.filterd (m0098396.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 25HAF8Ap008375; Fri, 17 Jun 2022 10:53:50 GMT Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3grqpn0r4x-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 17 Jun 2022 10:53:50 +0000 Received: from m0098396.ppops.net (m0098396.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 25HAOFmP006023; Fri, 17 Jun 2022 10:53:49 GMT Received: from ppma02fra.de.ibm.com (47.49.7a9f.ip4.static.sl-reverse.com [159.122.73.71]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3grqpn0r45-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 17 Jun 2022 10:53:49 +0000 Received: from pps.filterd (ppma02fra.de.ibm.com [127.0.0.1]) by ppma02fra.de.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 25HAod1u021269; Fri, 17 Jun 2022 10:53:47 GMT Received: from b06cxnps3074.portsmouth.uk.ibm.com (d06relay09.portsmouth.uk.ibm.com [9.149.109.194]) by ppma02fra.de.ibm.com with ESMTP id 3gmjp96r39-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 17 Jun 2022 10:53:47 +0000 Received: from d06av21.portsmouth.uk.ibm.com (d06av21.portsmouth.uk.ibm.com [9.149.105.232]) by b06cxnps3074.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 25HAriNs19661206 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 17 Jun 2022 10:53:44 GMT Received: from d06av21.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id BA26B52050; Fri, 17 Jun 2022 10:53:44 +0000 (GMT) Received: from [9.200.42.225] (unknown [9.200.42.225]) by d06av21.portsmouth.uk.ibm.com (Postfix) with ESMTP id 28BDC5204E; Fri, 17 Jun 2022 10:53:42 +0000 (GMT) Message-ID: <2d1a0b10-0221-2d4c-4aab-0627dc47b14f@linux.ibm.com> Date: Fri, 17 Jun 2022 18:53:41 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0) Gecko/20100101 Thunderbird/91.6.1 Content-Language: en-US Subject: [PATCH] vect: Respect slp decision when applying suggested uf [PR105940] To: GCC Patches X-TM-AS-GCONF: 00 X-Proofpoint-GUID: NHXPrZVqyzNlvGnWxDICIXxbJmiyRpqh X-Proofpoint-ORIG-GUID: rKB9Po73FuUAQV3x3X3TYlH08ZNcM7PC X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.64.514 definitions=2022-06-17_08,2022-06-16_01,2022-02-23_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 suspectscore=0 priorityscore=1501 mlxscore=0 adultscore=0 spamscore=0 impostorscore=0 clxscore=1011 lowpriorityscore=0 malwarescore=0 mlxlogscore=999 bulkscore=0 phishscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2204290000 definitions=main-2206170047 X-Spam-Status: No, score=-11.5 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: "Kewen.Lin via Gcc-patches" From: "Kewen.Lin" Reply-To: "Kewen.Lin" Cc: Richard Sandiford Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" Hi, This follows Richi's suggestion in PR105940, it aims to avoid inconsistent slp decision between when the suggested unroll factor is worked out and when the suggested unroll factor is applied. If the previous slp decision is true when the suggested unroll factor is worked out, when we are applying unroll factor we don't need to start over with slp off if the analysis with slp on fails. On the other hand, if the previous slp decision is false when the suggested unroll factor is worked out, when we are applying unroll factor we can skip the slp handlings. Function vect_is_simple_reduction saves reduction chains for subsequent slp analyses, we have to disable this early otherwise there is an ICE in vectorizable_reduction for below: if (REDUC_GROUP_FIRST_ELEMENT (stmt_info)) gcc_assert (slp_node && REDUC_GROUP_FIRST_ELEMENT (stmt_info) == stmt_info); Bootstrapped and regtested on x86_64-redhat-linux, powerpc64{,le}-linux-gnu and aarch64-linux-gnu. Also tested with SPEC2017 build with some rs6000 hacking. Is it ok for trunk? BR, Kewen ----- PR tree-optimization/105940 gcc/ChangeLog: * tree-vect-loop.cc (vect_analyze_loop_2): Add new parameter slp_done_for_suggested_uf and adjust with it accordingly. (vect_analyze_loop_1): Add new variable slp_done_for_suggested_uf, pass it down to vect_analyze_loop_2 for the initial analysis and applying suggested unroll factor. (vect_is_simple_reduction): Add parameter slp and adjust with it. (vect_analyze_scalar_cycles_1): Add parameter slp and pass down. (vect_analyze_scalar_cycles): Likewise. --- gcc/tree-vect-loop.cc | 101 ++++++++++++++++++++++++++++-------------- 1 file changed, 67 insertions(+), 34 deletions(-) -- 2.27.0 diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index e05f8e87f7d..ccab68caf9a 100644 --- a/gcc/tree-vect-loop.cc +++ b/gcc/tree-vect-loop.cc @@ -157,7 +157,7 @@ along with GCC; see the file COPYING3. If not see static void vect_estimate_min_profitable_iters (loop_vec_info, int *, int *, unsigned *); static stmt_vec_info vect_is_simple_reduction (loop_vec_info, stmt_vec_info, - bool *, bool *); + bool *, bool *, bool); /* Subroutine of vect_determine_vf_for_stmt that handles only one statement. VECTYPE_MAYBE_SET_P is true if STMT_VINFO_VECTYPE @@ -463,10 +463,12 @@ vect_inner_phi_in_double_reduction_p (loop_vec_info loop_vinfo, gphi *phi) Examine the cross iteration def-use cycles of scalar variables in LOOP. LOOP_VINFO represents the loop that is now being considered for vectorization (can be LOOP, or an outer-loop - enclosing LOOP). */ + enclosing LOOP). SLP indicates there will be some subsequent + slp analyses or not. */ static void -vect_analyze_scalar_cycles_1 (loop_vec_info loop_vinfo, class loop *loop) +vect_analyze_scalar_cycles_1 (loop_vec_info loop_vinfo, class loop *loop, + bool slp) { basic_block bb = loop->header; tree init, step; @@ -545,7 +547,7 @@ vect_analyze_scalar_cycles_1 (loop_vec_info loop_vinfo, class loop *loop) stmt_vec_info reduc_stmt_info = vect_is_simple_reduction (loop_vinfo, stmt_vinfo, &double_reduc, - &reduc_chain); + &reduc_chain, slp); if (reduc_stmt_info) { STMT_VINFO_REDUC_DEF (stmt_vinfo) = reduc_stmt_info; @@ -616,11 +618,11 @@ vect_analyze_scalar_cycles_1 (loop_vec_info loop_vinfo, class loop *loop) a[i] = i; */ static void -vect_analyze_scalar_cycles (loop_vec_info loop_vinfo) +vect_analyze_scalar_cycles (loop_vec_info loop_vinfo, bool slp) { class loop *loop = LOOP_VINFO_LOOP (loop_vinfo); - vect_analyze_scalar_cycles_1 (loop_vinfo, loop); + vect_analyze_scalar_cycles_1 (loop_vinfo, loop, slp); /* When vectorizing an outer-loop, the inner-loop is executed sequentially. Reductions in such inner-loop therefore have different properties than @@ -632,7 +634,7 @@ vect_analyze_scalar_cycles (loop_vec_info loop_vinfo) current checks are too strict. */ if (loop->inner) - vect_analyze_scalar_cycles_1 (loop_vinfo, loop->inner); + vect_analyze_scalar_cycles_1 (loop_vinfo, loop->inner, slp); } /* Transfer group and reduction information from STMT_INFO to its @@ -2223,12 +2225,18 @@ vect_determine_partial_vectors_and_peeling (loop_vec_info loop_vinfo, /* Function vect_analyze_loop_2. - Apply a set of analyses on LOOP, and create a loop_vec_info struct - for it. The different analyses will record information in the - loop_vec_info struct. */ + Apply a set of analyses on LOOP specified by LOOP_VINFO, the different + analyses will record information in some members of LOOP_VINFO. FATAL + indicates if some analysis meets fatal error. If one non-NULL pointer + SUGGESTED_UNROLL_FACTOR is provided, it's intent to be filled with one + worked out suggested unroll factor, while one NULL pointer shows it's + going to apply the suggested unroll factor. SLP_DONE_FOR_SUGGESTED_UF + is to hold the slp decision when the suggested unroll factor is worked + out. */ static opt_result vect_analyze_loop_2 (loop_vec_info loop_vinfo, bool &fatal, - unsigned *suggested_unroll_factor) + unsigned *suggested_unroll_factor, + bool& slp_done_for_suggested_uf) { opt_result ok = opt_result::success (); int res; @@ -2290,9 +2298,18 @@ vect_analyze_loop_2 (loop_vec_info loop_vinfo, bool &fatal, return ok; } + /* Check if we are applying unroll factor now. */ + bool applying_suggested_uf = loop_vinfo->suggested_unroll_factor > 1; + gcc_assert (!applying_suggested_uf || !suggested_unroll_factor); + + /* If the slp decision is false when suggested unroll factor is worked + out, and we are applying suggested unroll factor, we can simply skip + all slp related analyses this time. */ + bool slp = !applying_suggested_uf || slp_done_for_suggested_uf; + /* Classify all cross-iteration scalar data-flow cycles. Cross-iteration cycles caused by virtual phis are analyzed separately. */ - vect_analyze_scalar_cycles (loop_vinfo); + vect_analyze_scalar_cycles (loop_vinfo, slp); vect_pattern_recog (loop_vinfo); @@ -2359,26 +2376,30 @@ vect_analyze_loop_2 (loop_vec_info loop_vinfo, bool &fatal, poly_uint64 saved_vectorization_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo); - /* Check the SLP opportunities in the loop, analyze and build SLP trees. */ - ok = vect_analyze_slp (loop_vinfo, LOOP_VINFO_N_STMTS (loop_vinfo)); - if (!ok) - return ok; - - /* If there are any SLP instances mark them as pure_slp. */ - bool slp = vect_make_slp_decision (loop_vinfo); if (slp) { - /* Find stmts that need to be both vectorized and SLPed. */ - vect_detect_hybrid_slp (loop_vinfo); + /* Check the SLP opportunities in the loop, analyze and build + SLP trees. */ + ok = vect_analyze_slp (loop_vinfo, LOOP_VINFO_N_STMTS (loop_vinfo)); + if (!ok) + return ok; + + /* If there are any SLP instances mark them as pure_slp. */ + slp = vect_make_slp_decision (loop_vinfo); + if (slp) + { + /* Find stmts that need to be both vectorized and SLPed. */ + vect_detect_hybrid_slp (loop_vinfo); - /* Update the vectorization factor based on the SLP decision. */ - vect_update_vf_for_slp (loop_vinfo); + /* Update the vectorization factor based on the SLP decision. */ + vect_update_vf_for_slp (loop_vinfo); - /* Optimize the SLP graph with the vectorization factor fixed. */ - vect_optimize_slp (loop_vinfo); + /* Optimize the SLP graph with the vectorization factor fixed. */ + vect_optimize_slp (loop_vinfo); - /* Gather the loads reachable from the SLP graph entries. */ - vect_gather_slp_loads (loop_vinfo); + /* Gather the loads reachable from the SLP graph entries. */ + vect_gather_slp_loads (loop_vinfo); + } } bool saved_can_use_partial_vectors_p @@ -2394,7 +2415,7 @@ start_over: /* Apply the suggested unrolling factor, this was determined by the backend during finish_cost the first time we ran the analyzis for this vector mode. */ - if (loop_vinfo->suggested_unroll_factor > 1) + if (applying_suggested_uf) LOOP_VINFO_VECT_FACTOR (loop_vinfo) *= loop_vinfo->suggested_unroll_factor; /* Now the vectorization factor is final. */ @@ -2665,6 +2686,8 @@ start_over: gcc_assert (known_eq (vectorization_factor, LOOP_VINFO_VECT_FACTOR (loop_vinfo))); + slp_done_for_suggested_uf = slp; + /* Ok to vectorize! */ LOOP_VINFO_VECTORIZABLE_P (loop_vinfo) = 1; return opt_result::success (); @@ -2678,6 +2701,12 @@ again: if (!slp) return ok; + /* If the slp decision is true when suggested unroll factor is worked + out, and we are applying suggested unroll factor, we don't need to + re-try any more. */ + if (applying_suggested_uf && slp_done_for_suggested_uf) + return ok; + /* If there are reduction chains re-trying will fail anyway. */ if (! LOOP_VINFO_REDUCTION_CHAINS (loop_vinfo).is_empty ()) return ok; @@ -2864,10 +2893,12 @@ vect_analyze_loop_1 (class loop *loop, vec_info_shared *shared, machine_mode vector_mode = vector_modes[mode_i]; loop_vinfo->vector_mode = vector_mode; unsigned int suggested_unroll_factor = 1; + bool slp_done_for_suggested_uf; /* Run the main analysis. */ opt_result res = vect_analyze_loop_2 (loop_vinfo, fatal, - &suggested_unroll_factor); + &suggested_unroll_factor, + slp_done_for_suggested_uf); if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, "***** Analysis %s with vector mode %s\n", @@ -2879,13 +2910,15 @@ vect_analyze_loop_1 (class loop *loop, vec_info_shared *shared, if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, "***** Re-trying analysis for unrolling" - " with unroll factor %d.\n", - suggested_unroll_factor); + " with unroll factor %d and slp %s.\n", + suggested_unroll_factor, + slp_done_for_suggested_uf ? "on" : "off"); loop_vec_info unroll_vinfo = vect_create_loop_vinfo (loop, shared, loop_form_info, main_loop_vinfo); unroll_vinfo->vector_mode = vector_mode; unroll_vinfo->suggested_unroll_factor = suggested_unroll_factor; - opt_result new_res = vect_analyze_loop_2 (unroll_vinfo, fatal, NULL); + opt_result new_res = vect_analyze_loop_2 (unroll_vinfo, fatal, NULL, + slp_done_for_suggested_uf); if (new_res) { delete loop_vinfo; @@ -3598,7 +3631,7 @@ check_reduction_path (dump_user_location_t loc, loop_p loop, gphi *phi, static stmt_vec_info vect_is_simple_reduction (loop_vec_info loop_info, stmt_vec_info phi_info, - bool *double_reduc, bool *reduc_chain_p) + bool *double_reduc, bool *reduc_chain_p, bool slp) { gphi *phi = as_a (phi_info->stmt); gimple *phi_use_stmt = NULL; @@ -3787,7 +3820,7 @@ vect_is_simple_reduction (loop_vec_info loop_info, stmt_vec_info phi_info, continue; reduc_chain.safe_push (stmt_info); } - if (is_slp_reduc && reduc_chain.length () > 1) + if (slp && is_slp_reduc && reduc_chain.length () > 1) { for (unsigned i = 0; i < reduc_chain.length () - 1; ++i) {