From patchwork Tue Jun 13 02:03:22 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kewen.Lin" X-Patchwork-Id: 1794253 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=GwB84fec; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4QgBgl0KhYz20X6 for ; Tue, 13 Jun 2023 12:04:17 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 183F73857C55 for ; Tue, 13 Jun 2023 02:04:15 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 183F73857C55 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1686621855; bh=DfP78s4ejZX/xk6T5/bESAP04I3jfMu8OzccAcqdMNQ=; h=To:Cc:Subject:Date:In-Reply-To:References:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; b=GwB84fecbPMqMxi90o/lRM3A5hOgqTpPGFwiKzii3e1bQ/FVG5MEyZmp0P/z/LuvB eObGrt3OHZhKh5xLk914vWO8lvR6HjQF5lQxQtTgfUzegxlrT1L96sjCfaQqsw5cms pt7LtLLt4Iqd99dEDfZ6R0NyhA4w4UUWWkIo807k= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id A9A093858D32 for ; Tue, 13 Jun 2023 02:03:52 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org A9A093858D32 Received: from pps.filterd (m0353729.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 35D2175X020248; Tue, 13 Jun 2023 02:03:49 GMT Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3r6fa2829b-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 13 Jun 2023 02:03:48 +0000 Received: from m0353729.ppops.net (m0353729.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 35D22Psj024218; Tue, 13 Jun 2023 02:03:48 GMT Received: from ppma02fra.de.ibm.com (47.49.7a9f.ip4.static.sl-reverse.com [159.122.73.71]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3r6fa28281-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 13 Jun 2023 02:03:48 +0000 Received: from pps.filterd (ppma02fra.de.ibm.com [127.0.0.1]) by ppma02fra.de.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 35D1ojFw031937; Tue, 13 Jun 2023 02:03:45 GMT Received: from smtprelay02.fra02v.mail.ibm.com ([9.218.2.226]) by ppma02fra.de.ibm.com (PPS) with ESMTPS id 3r4gt51bmn-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 13 Jun 2023 02:03:45 +0000 Received: from smtpav04.fra02v.mail.ibm.com (smtpav04.fra02v.mail.ibm.com [10.20.54.103]) by smtprelay02.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 35D23h0C51773790 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 13 Jun 2023 02:03:43 GMT Received: from smtpav04.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 4467C20040; Tue, 13 Jun 2023 02:03:43 +0000 (GMT) Received: from smtpav04.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 5693420043; Tue, 13 Jun 2023 02:03:42 +0000 (GMT) Received: from trout.aus.stglabs.ibm.com (unknown [9.40.194.100]) by smtpav04.fra02v.mail.ibm.com (Postfix) with ESMTP; Tue, 13 Jun 2023 02:03:42 +0000 (GMT) To: gcc-patches@gcc.gnu.org Cc: richard.guenther@gmail.com, richard.sandiford@arm.com, segher@kernel.crashing.org, bergner@linux.ibm.com Subject: [PATCH 1/9] vect: Move vect_model_load_cost next to the transform in vectorizable_load Date: Mon, 12 Jun 2023 21:03:22 -0500 Message-Id: X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: 2J0PigIBd2vBtkufuH_x7eLqAvvnw3HZ X-Proofpoint-GUID: LLmldVoNeCc4-gqSznHy8e1UdHh10JHo X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.957,Hydra:6.0.573,FMLib:17.11.176.26 definitions=2023-06-12_18,2023-06-12_02,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 phishscore=0 impostorscore=0 suspectscore=0 spamscore=0 adultscore=0 lowpriorityscore=0 mlxscore=0 priorityscore=1501 malwarescore=0 clxscore=1015 mlxlogscore=999 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2305260000 definitions=main-2306130016 X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_MSPIKE_H5, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Kewen Lin via Gcc-patches From: "Kewen.Lin" Reply-To: Kewen Lin Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" This patch is an initial patch to move costing next to the transform, it still adopts vect_model_load_cost for costing but moves and duplicates it down according to the handlings of different vect_memory_access_types, hope it can make the subsequent patches easy to review. This patch should not have any functional changes. gcc/ChangeLog: * tree-vect-stmts.cc (vectorizable_load): Move and duplicate the call to vect_model_load_cost down to some different transform paths according to the handlings of different vect_memory_access_types. --- gcc/tree-vect-stmts.cc | 86 ++++++++++++++++++++++++++++-------------- 1 file changed, 57 insertions(+), 29 deletions(-) diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index a7acc032d47..44514658be3 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -9430,7 +9430,9 @@ vectorizable_load (vec_info *vinfo, } } - if (!vec_stmt) /* transformation not required. */ + bool costing_p = !vec_stmt; + + if (costing_p) /* transformation not required. */ { if (slp_node && mask @@ -9464,17 +9466,13 @@ vectorizable_load (vec_info *vinfo, vinfo->any_known_not_updated_vssa = true; STMT_VINFO_TYPE (stmt_info) = load_vec_info_type; - vect_model_load_cost (vinfo, stmt_info, ncopies, vf, memory_access_type, - alignment_support_scheme, misalignment, - &gs_info, slp_node, cost_vec); - return true; } if (!slp) gcc_assert (memory_access_type == STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info)); - if (dump_enabled_p ()) + if (dump_enabled_p () && !costing_p) dump_printf_loc (MSG_NOTE, vect_location, "transform load. ncopies = %d\n", ncopies); @@ -9485,13 +9483,26 @@ vectorizable_load (vec_info *vinfo, if (memory_access_type == VMAT_GATHER_SCATTER && gs_info.decl) { - vect_build_gather_load_calls (vinfo, - stmt_info, gsi, vec_stmt, &gs_info, mask); + if (costing_p) + vect_model_load_cost (vinfo, stmt_info, ncopies, vf, memory_access_type, + alignment_support_scheme, misalignment, &gs_info, + slp_node, cost_vec); + else + vect_build_gather_load_calls (vinfo, stmt_info, gsi, vec_stmt, &gs_info, + mask); return true; } if (memory_access_type == VMAT_INVARIANT) { + if (costing_p) + { + vect_model_load_cost (vinfo, stmt_info, ncopies, vf, + memory_access_type, alignment_support_scheme, + misalignment, &gs_info, slp_node, cost_vec); + return true; + } + gcc_assert (!grouped_load && !mask && !bb_vinfo); /* If we have versioned for aliasing or the loop doesn't have any data dependencies that would preclude this, @@ -9548,6 +9559,14 @@ vectorizable_load (vec_info *vinfo, if (memory_access_type == VMAT_ELEMENTWISE || memory_access_type == VMAT_STRIDED_SLP) { + if (costing_p) + { + vect_model_load_cost (vinfo, stmt_info, ncopies, vf, + memory_access_type, alignment_support_scheme, + misalignment, &gs_info, slp_node, cost_vec); + return true; + } + gimple_stmt_iterator incr_gsi; bool insert_after; tree offvar; @@ -9989,17 +10008,20 @@ vectorizable_load (vec_info *vinfo, here, since we can't guarantee first_stmt_info DR has been initialized yet, use first_stmt_info_for_drptr DR by bumping the distance from first_stmt_info DR instead as below. */ - if (!diff_first_stmt_info) - msq = vect_setup_realignment (vinfo, - first_stmt_info, gsi, &realignment_token, - alignment_support_scheme, NULL_TREE, - &at_loop); - if (alignment_support_scheme == dr_explicit_realign_optimized) - { - phi = as_a (SSA_NAME_DEF_STMT (msq)); - offset = size_binop (MINUS_EXPR, TYPE_SIZE_UNIT (vectype), - size_one_node); - gcc_assert (!first_stmt_info_for_drptr); + if (!costing_p) + { + if (!diff_first_stmt_info) + msq = vect_setup_realignment (vinfo, first_stmt_info, gsi, + &realignment_token, + alignment_support_scheme, NULL_TREE, + &at_loop); + if (alignment_support_scheme == dr_explicit_realign_optimized) + { + phi = as_a (SSA_NAME_DEF_STMT (msq)); + offset = size_binop (MINUS_EXPR, TYPE_SIZE_UNIT (vectype), + size_one_node); + gcc_assert (!first_stmt_info_for_drptr); + } } } else @@ -10020,8 +10042,9 @@ vectorizable_load (vec_info *vinfo, else if (memory_access_type == VMAT_GATHER_SCATTER) { aggr_type = elem_type; - vect_get_strided_load_store_ops (stmt_info, loop_vinfo, &gs_info, - &bump, &vec_offset); + if (!costing_p) + vect_get_strided_load_store_ops (stmt_info, loop_vinfo, &gs_info, &bump, + &vec_offset); } else { @@ -10035,7 +10058,7 @@ vectorizable_load (vec_info *vinfo, auto_vec vec_offsets; auto_vec vec_masks; - if (mask) + if (mask && !costing_p) { if (slp_node) vect_get_slp_defs (SLP_TREE_CHILDREN (slp_node)[mask_index], @@ -10049,7 +10072,7 @@ vectorizable_load (vec_info *vinfo, for (j = 0; j < ncopies; j++) { /* 1. Create the vector or array pointer update chain. */ - if (j == 0) + if (j == 0 && !costing_p) { bool simd_lane_access_p = STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) != 0; @@ -10108,7 +10131,7 @@ vectorizable_load (vec_info *vinfo, if (mask) vec_mask = vec_masks[0]; } - else + else if (!costing_p) { gcc_assert (!LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo)); if (dataref_offset) @@ -10125,7 +10148,7 @@ vectorizable_load (vec_info *vinfo, dr_chain.create (vec_num); gimple *new_stmt = NULL; - if (memory_access_type == VMAT_LOAD_STORE_LANES) + if (memory_access_type == VMAT_LOAD_STORE_LANES && !costing_p) { tree vec_array; @@ -10177,7 +10200,7 @@ vectorizable_load (vec_info *vinfo, /* Record that VEC_ARRAY is now dead. */ vect_clobber_variable (vinfo, stmt_info, gsi, vec_array); } - else + else if (!costing_p) { for (i = 0; i < vec_num; i++) { @@ -10631,7 +10654,7 @@ vectorizable_load (vec_info *vinfo, if (slp && !slp_perm) continue; - if (slp_perm) + if (slp_perm && !costing_p) { unsigned n_perms; /* For SLP we know we've seen all possible uses of dr_chain so @@ -10643,7 +10666,7 @@ vectorizable_load (vec_info *vinfo, nullptr, true); gcc_assert (ok); } - else + else if (!costing_p) { if (grouped_load) { @@ -10659,9 +10682,14 @@ vectorizable_load (vec_info *vinfo, } dr_chain.release (); } - if (!slp) + if (!slp && !costing_p) *vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0]; + if (costing_p) + vect_model_load_cost (vinfo, stmt_info, ncopies, vf, memory_access_type, + alignment_support_scheme, misalignment, &gs_info, + slp_node, cost_vec); + return true; } From patchwork Tue Jun 13 02:03:23 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kewen.Lin" X-Patchwork-Id: 1794256 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=p0Ze/2G/; dkim-atps=neutral Received: from sourceware.org (ip-8-43-85-97.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4QgBj04NDGz20QH for ; Tue, 13 Jun 2023 12:05:24 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 4E2163857C51 for ; Tue, 13 Jun 2023 02:05:21 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 4E2163857C51 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1686621921; bh=iSFZQDIFqKy7C8+BrTU+LC3kCipl5K4VhKTUUgxNPoY=; h=To:Cc:Subject:Date:In-Reply-To:References:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; b=p0Ze/2G/MQNIs+steGHTI+ETC58oPLO23pZYHE7sLuS148I1OqaJF+2igYTFpo+bd DgMLKXLTF08I+c3sGniVqvYZkY+KXPdTO7c6LG7rueuaJuBmYUDu6CtHMoAE7fzbH8 D102x+HAeoj6VP1fl+no3kZQodwOCcXMFzvroc/Q= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id CF3DA3858D33 for ; Tue, 13 Jun 2023 02:03:53 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org CF3DA3858D33 Received: from pps.filterd (m0353729.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 35D217Z6020252; Tue, 13 Jun 2023 02:03:50 GMT Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3r6fa2829w-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 13 Jun 2023 02:03:49 +0000 Received: from m0353729.ppops.net (m0353729.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 35D22xQ6025525; Tue, 13 Jun 2023 02:03:49 GMT Received: from ppma06ams.nl.ibm.com (66.31.33a9.ip4.static.sl-reverse.com [169.51.49.102]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3r6fa2828s-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 13 Jun 2023 02:03:49 +0000 Received: from pps.filterd (ppma06ams.nl.ibm.com [127.0.0.1]) by ppma06ams.nl.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 35D1lqvZ025458; Tue, 13 Jun 2023 02:03:46 GMT Received: from smtprelay03.fra02v.mail.ibm.com ([9.218.2.224]) by ppma06ams.nl.ibm.com (PPS) with ESMTPS id 3r4gee1ue3-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 13 Jun 2023 02:03:46 +0000 Received: from smtpav04.fra02v.mail.ibm.com (smtpav04.fra02v.mail.ibm.com [10.20.54.103]) by smtprelay03.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 35D23iPv18023006 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 13 Jun 2023 02:03:44 GMT Received: from smtpav04.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 6FF8F20043; Tue, 13 Jun 2023 02:03:44 +0000 (GMT) Received: from smtpav04.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 81B9620040; Tue, 13 Jun 2023 02:03:43 +0000 (GMT) Received: from trout.aus.stglabs.ibm.com (unknown [9.40.194.100]) by smtpav04.fra02v.mail.ibm.com (Postfix) with ESMTP; Tue, 13 Jun 2023 02:03:43 +0000 (GMT) To: gcc-patches@gcc.gnu.org Cc: richard.guenther@gmail.com, richard.sandiford@arm.com, segher@kernel.crashing.org, bergner@linux.ibm.com Subject: [PATCH 2/9] vect: Adjust vectorizable_load costing on VMAT_GATHER_SCATTER && gs_info.decl Date: Mon, 12 Jun 2023 21:03:23 -0500 Message-Id: <9bad792a4bcef35fbd9906245bf3493672b340fe.1686573640.git.linkw@linux.ibm.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: Z-tFxoFxGVOAJ2XOlvMjuOK3EcjvOhDb X-Proofpoint-GUID: c-yv9Djaz3OGVHKcHdarypMAH1gdqogk X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.957,Hydra:6.0.573,FMLib:17.11.176.26 definitions=2023-06-12_18,2023-06-12_02,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 phishscore=0 impostorscore=0 suspectscore=0 spamscore=0 adultscore=0 lowpriorityscore=0 mlxscore=0 priorityscore=1501 malwarescore=0 clxscore=1015 mlxlogscore=999 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2305260000 definitions=main-2306130016 X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_MSPIKE_H5, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Kewen Lin via Gcc-patches From: "Kewen.Lin" Reply-To: Kewen Lin Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" This patch adds one extra argument cost_vec to function vect_build_gather_load_calls, so that we can do costing next to the tranform in vect_build_gather_load_calls. For now, the implementation just follows the handlings in vect_model_load_cost, it isn't so good, so placing one FIXME for any further improvement. This patch should not cause any functional changes. gcc/ChangeLog: * tree-vect-stmts.cc (vect_build_gather_load_calls): Add the handlings on costing with one extra argument cost_vec. (vectorizable_load): Adjust the call to vect_build_gather_load_calls. (vect_model_load_cost): Assert it won't get VMAT_GATHER_SCATTER with gs_info.decl set any more. --- gcc/tree-vect-stmts.cc | 31 +++++++++++++++++++++++-------- 1 file changed, 23 insertions(+), 8 deletions(-) diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index 44514658be3..744cdf40e26 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -1135,6 +1135,8 @@ vect_model_load_cost (vec_info *vinfo, slp_tree slp_node, stmt_vector_for_cost *cost_vec) { + gcc_assert (memory_access_type != VMAT_GATHER_SCATTER || !gs_info->decl); + unsigned int inside_cost = 0, prologue_cost = 0; bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info); @@ -2819,7 +2821,8 @@ vect_build_gather_load_calls (vec_info *vinfo, stmt_vec_info stmt_info, gimple_stmt_iterator *gsi, gimple **vec_stmt, gather_scatter_info *gs_info, - tree mask) + tree mask, + stmt_vector_for_cost *cost_vec) { loop_vec_info loop_vinfo = dyn_cast (vinfo); class loop *loop = LOOP_VINFO_LOOP (loop_vinfo); @@ -2831,6 +2834,23 @@ vect_build_gather_load_calls (vec_info *vinfo, stmt_vec_info stmt_info, poly_uint64 gather_off_nunits = TYPE_VECTOR_SUBPARTS (gs_info->offset_vectype); + /* FIXME: Keep the previous costing way in vect_model_load_cost by costing + N scalar loads, but it should be tweaked to use target specific costs + on related gather load calls. */ + if (!vec_stmt) + { + unsigned int assumed_nunits = vect_nunits_for_cost (vectype); + unsigned int inside_cost; + inside_cost = record_stmt_cost (cost_vec, ncopies * assumed_nunits, + scalar_load, stmt_info, 0, vect_body); + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "vect_model_load_cost: inside_cost = %d, " + "prologue_cost = 0 .\n", + inside_cost); + return; + } + tree arglist = TYPE_ARG_TYPES (TREE_TYPE (gs_info->decl)); tree rettype = TREE_TYPE (TREE_TYPE (gs_info->decl)); tree srctype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist); @@ -9483,13 +9503,8 @@ vectorizable_load (vec_info *vinfo, if (memory_access_type == VMAT_GATHER_SCATTER && gs_info.decl) { - if (costing_p) - vect_model_load_cost (vinfo, stmt_info, ncopies, vf, memory_access_type, - alignment_support_scheme, misalignment, &gs_info, - slp_node, cost_vec); - else - vect_build_gather_load_calls (vinfo, stmt_info, gsi, vec_stmt, &gs_info, - mask); + vect_build_gather_load_calls (vinfo, stmt_info, gsi, vec_stmt, &gs_info, + mask, cost_vec); return true; } From patchwork Tue Jun 13 02:03:24 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kewen.Lin" X-Patchwork-Id: 1794255 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=TAuIl3NE; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4QgBh02cPKz20XW for ; Tue, 13 Jun 2023 12:04:32 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 4011A3858023 for ; Tue, 13 Jun 2023 02:04:30 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 4011A3858023 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1686621870; bh=PlXErXM0Hq1vo6prvL+ycIHRqez5rE/rBL/Xl+0sxWw=; h=To:Cc:Subject:Date:In-Reply-To:References:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; b=TAuIl3NEvKw3m2zbbRoNngdc2HksVkYNn9dHajANtO3DJPO5eSYSstXKNcLVSW0La deMbyKTt7TH79TawR9KwC2CjJfvWsMrREP/7mTQ+vydgxPIV0+YTNv3DaEf44sQwdh vDjIi8lXwSTJU14EvyjUH4rIamQbAsdSTP28xL9g= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id B20243858CDA for ; Tue, 13 Jun 2023 02:03:58 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org B20243858CDA Received: from pps.filterd (m0353729.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 35D217x5020227; Tue, 13 Jun 2023 02:03:55 GMT Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3r6fa282c7-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 13 Jun 2023 02:03:54 +0000 Received: from m0353729.ppops.net (m0353729.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 35D21wpp022474; Tue, 13 Jun 2023 02:03:54 GMT Received: from ppma02fra.de.ibm.com (47.49.7a9f.ip4.static.sl-reverse.com [159.122.73.71]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3r6fa2829h-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 13 Jun 2023 02:03:54 +0000 Received: from pps.filterd (ppma02fra.de.ibm.com [127.0.0.1]) by ppma02fra.de.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 35D1DYxK002195; Tue, 13 Jun 2023 02:03:48 GMT Received: from smtprelay04.fra02v.mail.ibm.com ([9.218.2.228]) by ppma02fra.de.ibm.com (PPS) with ESMTPS id 3r4gt51bmq-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 13 Jun 2023 02:03:47 +0000 Received: from smtpav04.fra02v.mail.ibm.com (smtpav04.fra02v.mail.ibm.com [10.20.54.103]) by smtprelay04.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 35D23jH947448400 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 13 Jun 2023 02:03:45 GMT Received: from smtpav04.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 9B1792004B; Tue, 13 Jun 2023 02:03:45 +0000 (GMT) Received: from smtpav04.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id AD31920040; Tue, 13 Jun 2023 02:03:44 +0000 (GMT) Received: from trout.aus.stglabs.ibm.com (unknown [9.40.194.100]) by smtpav04.fra02v.mail.ibm.com (Postfix) with ESMTP; Tue, 13 Jun 2023 02:03:44 +0000 (GMT) To: gcc-patches@gcc.gnu.org Cc: richard.guenther@gmail.com, richard.sandiford@arm.com, segher@kernel.crashing.org, bergner@linux.ibm.com Subject: [PATCH 3/9] vect: Adjust vectorizable_load costing on VMAT_INVARIANT Date: Mon, 12 Jun 2023 21:03:24 -0500 Message-Id: X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: SKYxW2H_i4FvV8m2LrYlsFHky2p7KqAY X-Proofpoint-GUID: 3Pp6oZ4jh4EjJbbvEbk4PZOxCmtFUejG X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.957,Hydra:6.0.573,FMLib:17.11.176.26 definitions=2023-06-12_18,2023-06-12_02,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 phishscore=0 impostorscore=0 suspectscore=0 spamscore=0 adultscore=0 lowpriorityscore=0 mlxscore=0 priorityscore=1501 malwarescore=0 clxscore=1015 mlxlogscore=999 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2305260000 definitions=main-2306130016 X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_MSPIKE_H5, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Kewen Lin via Gcc-patches From: "Kewen.Lin" Reply-To: Kewen Lin Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" This patch adjusts the cost handling on VMAT_INVARIANT in function vectorizable_load. We don't call function vect_model_load_cost for it any more. To make the costing on VMAT_INVARIANT better, this patch is to query hoist_defs_of_uses for hoisting decision, and add costs for different "where" based on it. Currently function hoist_defs_of_uses would always hoist the defs of all SSA uses, adding one argument HOIST_P aims to avoid the actual hoisting during costing phase. gcc/ChangeLog: * tree-vect-stmts.cc (hoist_defs_of_uses): Add one argument HOIST_P. (vectorizable_load): Adjust the handling on VMAT_INVARIANT to respect hoisting decision and without calling vect_model_load_cost. (vect_model_load_cost): Assert it won't get VMAT_INVARIANT any more and remove VMAT_INVARIANT related handlings. --- gcc/tree-vect-stmts.cc | 61 +++++++++++++++++++++++++++--------------- 1 file changed, 39 insertions(+), 22 deletions(-) diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index 744cdf40e26..19c61d703c8 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -1135,7 +1135,8 @@ vect_model_load_cost (vec_info *vinfo, slp_tree slp_node, stmt_vector_for_cost *cost_vec) { - gcc_assert (memory_access_type != VMAT_GATHER_SCATTER || !gs_info->decl); + gcc_assert ((memory_access_type != VMAT_GATHER_SCATTER || !gs_info->decl) + && memory_access_type != VMAT_INVARIANT); unsigned int inside_cost = 0, prologue_cost = 0; bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info); @@ -1238,16 +1239,6 @@ vect_model_load_cost (vec_info *vinfo, ncopies * assumed_nunits, scalar_load, stmt_info, 0, vect_body); } - else if (memory_access_type == VMAT_INVARIANT) - { - /* Invariant loads will ideally be hoisted and splat to a vector. */ - prologue_cost += record_stmt_cost (cost_vec, 1, - scalar_load, stmt_info, 0, - vect_prologue); - prologue_cost += record_stmt_cost (cost_vec, 1, - scalar_to_vec, stmt_info, 0, - vect_prologue); - } else vect_get_load_cost (vinfo, stmt_info, ncopies, alignment_support_scheme, misalignment, first_stmt_p, @@ -9121,10 +9112,11 @@ permute_vec_elements (vec_info *vinfo, /* Hoist the definitions of all SSA uses on STMT_INFO out of the loop LOOP, inserting them on the loops preheader edge. Returns true if we were successful in doing so (and thus STMT_INFO can be moved then), - otherwise returns false. */ + otherwise returns false. HOIST_P indicates if we want to hoist the + definitions of all SSA uses, it would be false when we are costing. */ static bool -hoist_defs_of_uses (stmt_vec_info stmt_info, class loop *loop) +hoist_defs_of_uses (stmt_vec_info stmt_info, class loop *loop, bool hoist_p) { ssa_op_iter i; tree op; @@ -9158,6 +9150,9 @@ hoist_defs_of_uses (stmt_vec_info stmt_info, class loop *loop) if (!any) return true; + if (!hoist_p) + return true; + FOR_EACH_SSA_TREE_OPERAND (op, stmt_info->stmt, i, SSA_OP_USE) { gimple *def_stmt = SSA_NAME_DEF_STMT (op); @@ -9510,14 +9505,6 @@ vectorizable_load (vec_info *vinfo, if (memory_access_type == VMAT_INVARIANT) { - if (costing_p) - { - vect_model_load_cost (vinfo, stmt_info, ncopies, vf, - memory_access_type, alignment_support_scheme, - misalignment, &gs_info, slp_node, cost_vec); - return true; - } - gcc_assert (!grouped_load && !mask && !bb_vinfo); /* If we have versioned for aliasing or the loop doesn't have any data dependencies that would preclude this, @@ -9525,7 +9512,37 @@ vectorizable_load (vec_info *vinfo, thus we can insert it on the preheader edge. */ bool hoist_p = (LOOP_VINFO_NO_DATA_DEPENDENCIES (loop_vinfo) && !nested_in_vect_loop - && hoist_defs_of_uses (stmt_info, loop)); + && hoist_defs_of_uses (stmt_info, loop, !costing_p)); + if (costing_p) + { + if (hoist_p) + { + unsigned int prologue_cost; + prologue_cost = record_stmt_cost (cost_vec, 1, scalar_load, + stmt_info, 0, vect_prologue); + prologue_cost += record_stmt_cost (cost_vec, 1, scalar_to_vec, + stmt_info, 0, vect_prologue); + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "vect_model_load_cost: inside_cost = 0, " + "prologue_cost = %d .\n", + prologue_cost); + } + else + { + unsigned int inside_cost; + inside_cost = record_stmt_cost (cost_vec, 1, scalar_load, + stmt_info, 0, vect_body); + inside_cost += record_stmt_cost (cost_vec, 1, scalar_to_vec, + stmt_info, 0, vect_body); + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "vect_model_load_cost: inside_cost = %d, " + "prologue_cost = 0 .\n", + inside_cost); + } + return true; + } if (hoist_p) { gassign *stmt = as_a (stmt_info->stmt); From patchwork Tue Jun 13 02:03:25 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kewen.Lin" X-Patchwork-Id: 1794259 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=yfYlR7Fn; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4QgBkp0r6Fz20QH for ; Tue, 13 Jun 2023 12:06:58 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 7D84038558BA for ; Tue, 13 Jun 2023 02:06:55 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 7D84038558BA DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1686622015; bh=Wk2efnAObf7dh8ft7Yn4+GyyR4hRUw1UbaEwS3COqQY=; h=To:Cc:Subject:Date:In-Reply-To:References:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; b=yfYlR7Fnmhc/zEipSp+VRm67B4437giM0vNnJ0dom4ffM5RaKc7FINds2dwuvvEOQ kpG4J0hfbrHksWAFNwS8e5cyXl+Cm0bFEIDYCPaDGJAZJ/sqxtw7eAKYFov6sr1DQl Qr2BnIAcM8LTSMF8XvGGwqof5SP3XJKUXuFD4zek= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id C41C83858023 for ; Tue, 13 Jun 2023 02:06:29 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org C41C83858023 Received: from pps.filterd (m0353725.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 35D1lNlZ010299; Tue, 13 Jun 2023 02:06:28 GMT Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3r6f3gg9s6-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 13 Jun 2023 02:06:27 +0000 Received: from m0353725.ppops.net (m0353725.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 35D26RN9001455; Tue, 13 Jun 2023 02:06:27 GMT Received: from ppma06fra.de.ibm.com (48.49.7a9f.ip4.static.sl-reverse.com [159.122.73.72]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3r6f3gg9m3-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 13 Jun 2023 02:06:26 +0000 Received: from pps.filterd (ppma06fra.de.ibm.com [127.0.0.1]) by ppma06fra.de.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 35D0duK9010323; Tue, 13 Jun 2023 02:03:49 GMT Received: from smtprelay05.fra02v.mail.ibm.com ([9.218.2.225]) by ppma06fra.de.ibm.com (PPS) with ESMTPS id 3r4gedsc3q-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 13 Jun 2023 02:03:49 +0000 Received: from smtpav04.fra02v.mail.ibm.com (smtpav04.fra02v.mail.ibm.com [10.20.54.103]) by smtprelay05.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 35D23kb920382260 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 13 Jun 2023 02:03:46 GMT Received: from smtpav04.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id C64F520043; Tue, 13 Jun 2023 02:03:46 +0000 (GMT) Received: from smtpav04.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id D866B2004B; Tue, 13 Jun 2023 02:03:45 +0000 (GMT) Received: from trout.aus.stglabs.ibm.com (unknown [9.40.194.100]) by smtpav04.fra02v.mail.ibm.com (Postfix) with ESMTP; Tue, 13 Jun 2023 02:03:45 +0000 (GMT) To: gcc-patches@gcc.gnu.org Cc: richard.guenther@gmail.com, richard.sandiford@arm.com, segher@kernel.crashing.org, bergner@linux.ibm.com Subject: [PATCH 4/9] vect: Adjust vectorizable_load costing on VMAT_ELEMENTWISE and VMAT_STRIDED_SLP Date: Mon, 12 Jun 2023 21:03:25 -0500 Message-Id: <0281a2a022869efe379130aea6e0782e4827ef61.1686573640.git.linkw@linux.ibm.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-GUID: reORvQXV-4stvmIX0PZ1UjFtbz4KYBRh X-Proofpoint-ORIG-GUID: -MoQxbBkL_e8Nlig1u4CFiiyPTzmr-Va X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.957,Hydra:6.0.573,FMLib:17.11.176.26 definitions=2023-06-12_18,2023-06-12_02,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 impostorscore=0 bulkscore=0 suspectscore=0 lowpriorityscore=0 clxscore=1015 phishscore=0 spamscore=0 adultscore=0 priorityscore=1501 mlxscore=0 malwarescore=0 mlxlogscore=999 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2305260000 definitions=main-2306130016 X-Spam-Status: No, score=-12.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_MSPIKE_H5, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Kewen Lin via Gcc-patches From: "Kewen.Lin" Reply-To: Kewen Lin Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" This patch adjusts the cost handling on VMAT_ELEMENTWISE and VMAT_STRIDED_SLP in function vectorizable_load. We don't call function vect_model_load_cost for them any more. As PR82255 shows, we don't always need a vector construction there, moving costing next to the transform can make us only cost for vector construction when it's actually needed. Besides, it can count the number of loads consistently for some cases. PR tree-optimization/82255 gcc/ChangeLog: * tree-vect-stmts.cc (vectorizable_load): Adjust the cost handling on VMAT_ELEMENTWISE and VMAT_STRIDED_SLP without calling vect_model_load_cost. (vect_model_load_cost): Assert it won't get VMAT_ELEMENTWISE and VMAT_STRIDED_SLP any more, and remove their related handlings. gcc/testsuite/ChangeLog: * gcc.dg/vect/costmodel/ppc/costmodel-pr82255.c: New test. 2023-06-13 Bill Schmidt Kewen Lin --- .../vect/costmodel/ppc/costmodel-pr82255.c | 31 ++++ gcc/tree-vect-stmts.cc | 170 +++++++++++------- 2 files changed, 134 insertions(+), 67 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-pr82255.c diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-pr82255.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-pr82255.c new file mode 100644 index 00000000000..9317ee2e15b --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-pr82255.c @@ -0,0 +1,31 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_int } */ + +/* PR82255: Ensure we don't require a vec_construct cost when we aren't + going to generate a strided load. */ + +extern int abs (int __x) __attribute__ ((__nothrow__, __leaf__)) +__attribute__ ((__const__)); + +static int +foo (unsigned char *w, int i, unsigned char *x, int j) +{ + int tot = 0; + for (int a = 0; a < 16; a++) + { +#pragma GCC unroll 16 + for (int b = 0; b < 16; b++) + tot += abs (w[b] - x[b]); + w += i; + x += j; + } + return tot; +} + +void +bar (unsigned char *w, unsigned char *x, int i, int *result) +{ + *result = foo (w, 16, x, i); +} + +/* { dg-final { scan-tree-dump-times "vec_construct" 0 "vect" } } */ diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index 19c61d703c8..651dc800380 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -1136,7 +1136,9 @@ vect_model_load_cost (vec_info *vinfo, stmt_vector_for_cost *cost_vec) { gcc_assert ((memory_access_type != VMAT_GATHER_SCATTER || !gs_info->decl) - && memory_access_type != VMAT_INVARIANT); + && memory_access_type != VMAT_INVARIANT + && memory_access_type != VMAT_ELEMENTWISE + && memory_access_type != VMAT_STRIDED_SLP); unsigned int inside_cost = 0, prologue_cost = 0; bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info); @@ -1221,8 +1223,7 @@ vect_model_load_cost (vec_info *vinfo, } /* The loads themselves. */ - if (memory_access_type == VMAT_ELEMENTWISE - || memory_access_type == VMAT_GATHER_SCATTER) + if (memory_access_type == VMAT_GATHER_SCATTER) { tree vectype = STMT_VINFO_VECTYPE (stmt_info); unsigned int assumed_nunits = vect_nunits_for_cost (vectype); @@ -1244,10 +1245,10 @@ vect_model_load_cost (vec_info *vinfo, alignment_support_scheme, misalignment, first_stmt_p, &inside_cost, &prologue_cost, cost_vec, cost_vec, true); - if (memory_access_type == VMAT_ELEMENTWISE - || memory_access_type == VMAT_STRIDED_SLP - || (memory_access_type == VMAT_GATHER_SCATTER - && gs_info->ifn == IFN_LAST && !gs_info->decl)) + + if (memory_access_type == VMAT_GATHER_SCATTER + && gs_info->ifn == IFN_LAST + && !gs_info->decl) inside_cost += record_stmt_cost (cost_vec, ncopies, vec_construct, stmt_info, 0, vect_body); @@ -9591,14 +9592,6 @@ vectorizable_load (vec_info *vinfo, if (memory_access_type == VMAT_ELEMENTWISE || memory_access_type == VMAT_STRIDED_SLP) { - if (costing_p) - { - vect_model_load_cost (vinfo, stmt_info, ncopies, vf, - memory_access_type, alignment_support_scheme, - misalignment, &gs_info, slp_node, cost_vec); - return true; - } - gimple_stmt_iterator incr_gsi; bool insert_after; tree offvar; @@ -9610,6 +9603,7 @@ vectorizable_load (vec_info *vinfo, unsigned int const_nunits = nunits.to_constant (); unsigned HOST_WIDE_INT cst_offset = 0; tree dr_offset; + unsigned int inside_cost = 0; gcc_assert (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)); gcc_assert (!nested_in_vect_loop); @@ -9624,6 +9618,7 @@ vectorizable_load (vec_info *vinfo, first_stmt_info = stmt_info; first_dr_info = dr_info; } + if (slp && grouped_load) { group_size = DR_GROUP_SIZE (first_stmt_info); @@ -9640,43 +9635,44 @@ vectorizable_load (vec_info *vinfo, ref_type = reference_alias_ptr_type (DR_REF (dr_info->dr)); } - dr_offset = get_dr_vinfo_offset (vinfo, first_dr_info); - stride_base - = fold_build_pointer_plus - (DR_BASE_ADDRESS (first_dr_info->dr), - size_binop (PLUS_EXPR, - convert_to_ptrofftype (dr_offset), - convert_to_ptrofftype (DR_INIT (first_dr_info->dr)))); - stride_step = fold_convert (sizetype, DR_STEP (first_dr_info->dr)); + if (!costing_p) + { + dr_offset = get_dr_vinfo_offset (vinfo, first_dr_info); + stride_base = fold_build_pointer_plus ( + DR_BASE_ADDRESS (first_dr_info->dr), + size_binop (PLUS_EXPR, convert_to_ptrofftype (dr_offset), + convert_to_ptrofftype (DR_INIT (first_dr_info->dr)))); + stride_step = fold_convert (sizetype, DR_STEP (first_dr_info->dr)); - /* For a load with loop-invariant (but other than power-of-2) - stride (i.e. not a grouped access) like so: + /* For a load with loop-invariant (but other than power-of-2) + stride (i.e. not a grouped access) like so: - for (i = 0; i < n; i += stride) - ... = array[i]; + for (i = 0; i < n; i += stride) + ... = array[i]; - we generate a new induction variable and new accesses to - form a new vector (or vectors, depending on ncopies): + we generate a new induction variable and new accesses to + form a new vector (or vectors, depending on ncopies): - for (j = 0; ; j += VF*stride) - tmp1 = array[j]; - tmp2 = array[j + stride]; - ... - vectemp = {tmp1, tmp2, ...} - */ + for (j = 0; ; j += VF*stride) + tmp1 = array[j]; + tmp2 = array[j + stride]; + ... + vectemp = {tmp1, tmp2, ...} + */ - ivstep = fold_build2 (MULT_EXPR, TREE_TYPE (stride_step), stride_step, - build_int_cst (TREE_TYPE (stride_step), vf)); + ivstep = fold_build2 (MULT_EXPR, TREE_TYPE (stride_step), stride_step, + build_int_cst (TREE_TYPE (stride_step), vf)); - standard_iv_increment_position (loop, &incr_gsi, &insert_after); + standard_iv_increment_position (loop, &incr_gsi, &insert_after); - stride_base = cse_and_gimplify_to_preheader (loop_vinfo, stride_base); - ivstep = cse_and_gimplify_to_preheader (loop_vinfo, ivstep); - create_iv (stride_base, PLUS_EXPR, ivstep, NULL, - loop, &incr_gsi, insert_after, - &offvar, NULL); + stride_base = cse_and_gimplify_to_preheader (loop_vinfo, stride_base); + ivstep = cse_and_gimplify_to_preheader (loop_vinfo, ivstep); + create_iv (stride_base, PLUS_EXPR, ivstep, NULL, + loop, &incr_gsi, insert_after, + &offvar, NULL); - stride_step = cse_and_gimplify_to_preheader (loop_vinfo, stride_step); + stride_step = cse_and_gimplify_to_preheader (loop_vinfo, stride_step); + } running_off = offvar; alias_off = build_int_cst (ref_type, 0); @@ -9743,11 +9739,23 @@ vectorizable_load (vec_info *vinfo, unsigned int n_groups = 0; for (j = 0; j < ncopies; j++) { - if (nloads > 1) + if (nloads > 1 && !costing_p) vec_alloc (v, nloads); gimple *new_stmt = NULL; for (i = 0; i < nloads; i++) { + if (costing_p) + { + if (VECTOR_TYPE_P (ltype)) + vect_get_load_cost (vinfo, stmt_info, 1, + alignment_support_scheme, misalignment, + false, &inside_cost, nullptr, cost_vec, + cost_vec, true); + else + inside_cost += record_stmt_cost (cost_vec, 1, scalar_load, + stmt_info, 0, vect_body); + continue; + } tree this_off = build_int_cst (TREE_TYPE (alias_off), group_el * elsz + cst_offset); tree data_ref = build2 (MEM_REF, ltype, running_off, this_off); @@ -9778,42 +9786,70 @@ vectorizable_load (vec_info *vinfo, group_el = 0; } } + if (nloads > 1) { - tree vec_inv = build_constructor (lvectype, v); - new_temp = vect_init_vector (vinfo, stmt_info, - vec_inv, lvectype, gsi); - new_stmt = SSA_NAME_DEF_STMT (new_temp); - if (lvectype != vectype) + if (costing_p) + inside_cost += record_stmt_cost (cost_vec, 1, vec_construct, + stmt_info, 0, vect_body); + else { - new_stmt = gimple_build_assign (make_ssa_name (vectype), - VIEW_CONVERT_EXPR, - build1 (VIEW_CONVERT_EXPR, - vectype, new_temp)); - vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi); + tree vec_inv = build_constructor (lvectype, v); + new_temp = vect_init_vector (vinfo, stmt_info, vec_inv, + lvectype, gsi); + new_stmt = SSA_NAME_DEF_STMT (new_temp); + if (lvectype != vectype) + { + new_stmt + = gimple_build_assign (make_ssa_name (vectype), + VIEW_CONVERT_EXPR, + build1 (VIEW_CONVERT_EXPR, + vectype, new_temp)); + vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, + gsi); + } } } - if (slp) + if (!costing_p) { - if (slp_perm) - dr_chain.quick_push (gimple_assign_lhs (new_stmt)); + if (slp) + { + if (slp_perm) + dr_chain.quick_push (gimple_assign_lhs (new_stmt)); + else + SLP_TREE_VEC_STMTS (slp_node).quick_push (new_stmt); + } else - SLP_TREE_VEC_STMTS (slp_node).quick_push (new_stmt); - } - else - { - if (j == 0) - *vec_stmt = new_stmt; - STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt); + { + if (j == 0) + *vec_stmt = new_stmt; + STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt); + } } } if (slp_perm) { unsigned n_perms; - vect_transform_slp_perm_load (vinfo, slp_node, dr_chain, gsi, vf, - false, &n_perms); + if (costing_p) + { + unsigned n_loads; + vect_transform_slp_perm_load (vinfo, slp_node, vNULL, NULL, vf, + true, &n_perms, &n_loads); + inside_cost += record_stmt_cost (cost_vec, n_perms, vec_perm, + first_stmt_info, 0, vect_body); + } + else + vect_transform_slp_perm_load (vinfo, slp_node, dr_chain, gsi, vf, + false, &n_perms); } + + if (costing_p && dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "vect_model_load_cost: inside_cost = %u, " + "prologue_cost = 0 .\n", + inside_cost); + return true; } From patchwork Tue Jun 13 02:03:26 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kewen.Lin" X-Patchwork-Id: 1794260 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=ql2SE5+J; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4QgBlW3HtJz20QH for ; Tue, 13 Jun 2023 12:07:35 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 76AF638558B3 for ; Tue, 13 Jun 2023 02:07:33 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 76AF638558B3 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1686622053; bh=2iIuUoiMNVt4R8YqrRR2tVZ7ts5u/gTwBTY9G88/4xo=; h=To:Cc:Subject:Date:In-Reply-To:References:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; b=ql2SE5+JxqVqFN3QLau6ttby/S5YDMjZRTv0CWdevjKHkdDB9LOnWZ7frqyP1dBFD jrYcqB90zfAURt7aK45u0kucldAS+tThouERkClaYQ3B3DlL+QVqIwEUB/VyCUHRkk tQypeJquHp4sev8a0H0YuGb3pcbp1KGASxwkQVG8= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id 92CCB3858422 for ; Tue, 13 Jun 2023 02:06:25 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 92CCB3858422 Received: from pps.filterd (m0353725.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 35D1lPuM010320; Tue, 13 Jun 2023 02:06:23 GMT Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3r6f3gg9ry-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 13 Jun 2023 02:06:22 +0000 Received: from m0353725.ppops.net (m0353725.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 35D26LVM000454; Tue, 13 Jun 2023 02:06:21 GMT Received: from ppma04ams.nl.ibm.com (63.31.33a9.ip4.static.sl-reverse.com [169.51.49.99]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3r6f3gg9mc-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 13 Jun 2023 02:06:21 +0000 Received: from pps.filterd (ppma04ams.nl.ibm.com [127.0.0.1]) by ppma04ams.nl.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 35D1H3kD031882; Tue, 13 Jun 2023 02:03:50 GMT Received: from smtprelay06.fra02v.mail.ibm.com ([9.218.2.230]) by ppma04ams.nl.ibm.com (PPS) with ESMTPS id 3r4gt51ups-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 13 Jun 2023 02:03:50 +0000 Received: from smtpav04.fra02v.mail.ibm.com (smtpav04.fra02v.mail.ibm.com [10.20.54.103]) by smtprelay06.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 35D23mCV42861034 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 13 Jun 2023 02:03:48 GMT Received: from smtpav04.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id F14372004B; Tue, 13 Jun 2023 02:03:47 +0000 (GMT) Received: from smtpav04.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 0FC3520040; Tue, 13 Jun 2023 02:03:47 +0000 (GMT) Received: from trout.aus.stglabs.ibm.com (unknown [9.40.194.100]) by smtpav04.fra02v.mail.ibm.com (Postfix) with ESMTP; Tue, 13 Jun 2023 02:03:46 +0000 (GMT) To: gcc-patches@gcc.gnu.org Cc: richard.guenther@gmail.com, richard.sandiford@arm.com, segher@kernel.crashing.org, bergner@linux.ibm.com Subject: [PATCH 5/9] vect: Adjust vectorizable_load costing on VMAT_GATHER_SCATTER Date: Mon, 12 Jun 2023 21:03:26 -0500 Message-Id: X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-GUID: QOsdEONhD17RqhKwfUcm6s0-o-RyokJw X-Proofpoint-ORIG-GUID: tjEdt0Kgl_RJ_cnf9S09cB94IAnyuwSA X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.957,Hydra:6.0.573,FMLib:17.11.176.26 definitions=2023-06-12_18,2023-06-12_02,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 impostorscore=0 bulkscore=0 suspectscore=0 lowpriorityscore=0 clxscore=1015 phishscore=0 spamscore=0 adultscore=0 priorityscore=1501 mlxscore=0 malwarescore=0 mlxlogscore=999 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2305260000 definitions=main-2306130016 X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_MSPIKE_H5, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Kewen Lin via Gcc-patches From: "Kewen.Lin" Reply-To: Kewen Lin Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" This patch adjusts the cost handling on VMAT_GATHER_SCATTER in function vectorizable_load. We don't call function vect_model_load_cost for it any more. It's mainly for gather loads with IFN or emulated gather loads, it follows the handlings in function vect_model_load_cost. This patch shouldn't have any functional changes. gcc/ChangeLog: * tree-vect-stmts.cc (vectorizable_load): Adjust the cost handling on VMAT_GATHER_SCATTER without calling vect_model_load_cost. (vect_model_load_cost): Adjut the assertion on VMAT_GATHER_SCATTER, remove VMAT_GATHER_SCATTER related handlings and the related parameter gs_info. --- gcc/tree-vect-stmts.cc | 123 +++++++++++++++++++++++++---------------- 1 file changed, 75 insertions(+), 48 deletions(-) diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index 651dc800380..a3fd0bf879e 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -1131,11 +1131,10 @@ vect_model_load_cost (vec_info *vinfo, vect_memory_access_type memory_access_type, dr_alignment_support alignment_support_scheme, int misalignment, - gather_scatter_info *gs_info, slp_tree slp_node, stmt_vector_for_cost *cost_vec) { - gcc_assert ((memory_access_type != VMAT_GATHER_SCATTER || !gs_info->decl) + gcc_assert (memory_access_type != VMAT_GATHER_SCATTER && memory_access_type != VMAT_INVARIANT && memory_access_type != VMAT_ELEMENTWISE && memory_access_type != VMAT_STRIDED_SLP); @@ -1222,35 +1221,9 @@ vect_model_load_cost (vec_info *vinfo, group_size); } - /* The loads themselves. */ - if (memory_access_type == VMAT_GATHER_SCATTER) - { - tree vectype = STMT_VINFO_VECTYPE (stmt_info); - unsigned int assumed_nunits = vect_nunits_for_cost (vectype); - if (memory_access_type == VMAT_GATHER_SCATTER - && gs_info->ifn == IFN_LAST && !gs_info->decl) - /* For emulated gathers N offset vector element extracts - (we assume the scalar scaling and ptr + offset add is consumed by - the load). */ - inside_cost += record_stmt_cost (cost_vec, ncopies * assumed_nunits, - vec_to_scalar, stmt_info, 0, - vect_body); - /* N scalar loads plus gathering them into a vector. */ - inside_cost += record_stmt_cost (cost_vec, - ncopies * assumed_nunits, - scalar_load, stmt_info, 0, vect_body); - } - else - vect_get_load_cost (vinfo, stmt_info, ncopies, - alignment_support_scheme, misalignment, first_stmt_p, - &inside_cost, &prologue_cost, - cost_vec, cost_vec, true); - - if (memory_access_type == VMAT_GATHER_SCATTER - && gs_info->ifn == IFN_LAST - && !gs_info->decl) - inside_cost += record_stmt_cost (cost_vec, ncopies, vec_construct, - stmt_info, 0, vect_body); + vect_get_load_cost (vinfo, stmt_info, ncopies, alignment_support_scheme, + misalignment, first_stmt_p, &inside_cost, &prologue_cost, + cost_vec, cost_vec, true); if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, @@ -10137,6 +10110,7 @@ vectorizable_load (vec_info *vinfo, } tree vec_mask = NULL_TREE; poly_uint64 group_elt = 0; + unsigned int inside_cost = 0; for (j = 0; j < ncopies; j++) { /* 1. Create the vector or array pointer update chain. */ @@ -10268,23 +10242,25 @@ vectorizable_load (vec_info *vinfo, /* Record that VEC_ARRAY is now dead. */ vect_clobber_variable (vinfo, stmt_info, gsi, vec_array); } - else if (!costing_p) + else { for (i = 0; i < vec_num; i++) { tree final_mask = NULL_TREE; - if (loop_masks - && memory_access_type != VMAT_INVARIANT) - final_mask = vect_get_loop_mask (gsi, loop_masks, - vec_num * ncopies, - vectype, vec_num * j + i); - if (vec_mask) - final_mask = prepare_vec_mask (loop_vinfo, mask_vectype, - final_mask, vec_mask, gsi); - - if (i > 0 && !STMT_VINFO_GATHER_SCATTER_P (stmt_info)) - dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, ptr_incr, - gsi, stmt_info, bump); + if (!costing_p) + { + if (loop_masks && memory_access_type != VMAT_INVARIANT) + final_mask + = vect_get_loop_mask (gsi, loop_masks, vec_num * ncopies, + vectype, vec_num * j + i); + if (vec_mask) + final_mask = prepare_vec_mask (loop_vinfo, mask_vectype, + final_mask, vec_mask, gsi); + + if (i > 0 && !STMT_VINFO_GATHER_SCATTER_P (stmt_info)) + dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, ptr_incr, + gsi, stmt_info, bump); + } /* 2. Create the vector-load in the loop. */ switch (alignment_support_scheme) @@ -10298,6 +10274,16 @@ vectorizable_load (vec_info *vinfo, if (memory_access_type == VMAT_GATHER_SCATTER && gs_info.ifn != IFN_LAST) { + if (costing_p) + { + unsigned int cnunits + = vect_nunits_for_cost (vectype); + inside_cost + = record_stmt_cost (cost_vec, cnunits, + scalar_load, stmt_info, 0, + vect_body); + goto vec_num_loop_costing_end; + } if (STMT_VINFO_GATHER_SCATTER_P (stmt_info)) vec_offset = vec_offsets[vec_num * j + i]; tree zero = build_zero_cst (vectype); @@ -10322,6 +10308,25 @@ vectorizable_load (vec_info *vinfo, gcc_assert (!final_mask); unsigned HOST_WIDE_INT const_nunits = nunits.to_constant (); + if (costing_p) + { + /* For emulated gathers N offset vector element + offset add is consumed by the load). */ + inside_cost + = record_stmt_cost (cost_vec, const_nunits, + vec_to_scalar, stmt_info, 0, + vect_body); + /* N scalar loads plus gathering them into a + vector. */ + inside_cost + = record_stmt_cost (cost_vec, const_nunits, + scalar_load, stmt_info, 0, + vect_body); + inside_cost + = record_stmt_cost (cost_vec, 1, vec_construct, + stmt_info, 0, vect_body); + goto vec_num_loop_costing_end; + } unsigned HOST_WIDE_INT const_offset_nunits = TYPE_VECTOR_SUBPARTS (gs_info.offset_vectype) .to_constant (); @@ -10374,6 +10379,9 @@ vectorizable_load (vec_info *vinfo, break; } + if (costing_p) + goto vec_num_loop_costing_end; + align = known_alignment (DR_TARGET_ALIGNMENT (first_dr_info)); if (alignment_support_scheme == dr_aligned) @@ -10544,6 +10552,8 @@ vectorizable_load (vec_info *vinfo, } case dr_explicit_realign: { + if (costing_p) + goto vec_num_loop_costing_end; tree ptr, bump; tree vs = size_int (TYPE_VECTOR_SUBPARTS (vectype)); @@ -10606,6 +10616,8 @@ vectorizable_load (vec_info *vinfo, } case dr_explicit_realign_optimized: { + if (costing_p) + goto vec_num_loop_costing_end; if (TREE_CODE (dataref_ptr) == SSA_NAME) new_temp = copy_ssa_name (dataref_ptr); else @@ -10702,10 +10714,14 @@ vectorizable_load (vec_info *vinfo, gsi, stmt_info, bump); group_elt = 0; } +vec_num_loop_costing_end: + ; } /* Bump the vector pointer to account for a gap or for excess elements loaded for a permuted SLP load. */ - if (maybe_ne (group_gap_adj, 0U) && slp_perm) + if (!costing_p + && maybe_ne (group_gap_adj, 0U) + && slp_perm) { poly_wide_int bump_val = (wi::to_wide (TYPE_SIZE_UNIT (elem_type)) @@ -10754,9 +10770,20 @@ vectorizable_load (vec_info *vinfo, *vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0]; if (costing_p) - vect_model_load_cost (vinfo, stmt_info, ncopies, vf, memory_access_type, - alignment_support_scheme, misalignment, &gs_info, - slp_node, cost_vec); + { + if (memory_access_type == VMAT_GATHER_SCATTER) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "vect_model_load_cost: inside_cost = %u, " + "prologue_cost = 0 .\n", + inside_cost); + } + else + vect_model_load_cost (vinfo, stmt_info, ncopies, vf, memory_access_type, + alignment_support_scheme, misalignment, slp_node, + cost_vec); + } return true; } From patchwork Tue Jun 13 02:03:27 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kewen.Lin" X-Patchwork-Id: 1794254 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=LbCll5Cv; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4QgBgz5Gttz20X6 for ; Tue, 13 Jun 2023 12:04:31 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id AF9943858296 for ; Tue, 13 Jun 2023 02:04:29 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org AF9943858296 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1686621869; bh=USFAO+TfZHyeJuqDINwWdcUMUJcmsagSoosUry4cx+o=; h=To:Cc:Subject:Date:In-Reply-To:References:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; b=LbCll5Cv634h2h7AsvNiM4JCgiyGYKH+0C7cGqKb39uqZM0gousXYIJg+suLtYQ6I fOl4OrJiWuCWAPb0po+oWJfowsQvbmhsSa3uuieGdPfYWW+k3rR6WxqqRkllEntWuW 4oS/zEOEHmHPzAQbXg2BLbxiekWgla47KbjPDiCk= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id F35A43858D38 for ; Tue, 13 Jun 2023 02:03:56 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org F35A43858D38 Received: from pps.filterd (m0356516.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 35D1qaB4008139; Tue, 13 Jun 2023 02:03:54 GMT Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3r6f6886np-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 13 Jun 2023 02:03:54 +0000 Received: from m0356516.ppops.net (m0356516.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 35D1ujaR017694; Tue, 13 Jun 2023 02:03:53 GMT Received: from ppma05fra.de.ibm.com (6c.4a.5195.ip4.static.sl-reverse.com [149.81.74.108]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3r6f6886n0-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 13 Jun 2023 02:03:53 +0000 Received: from pps.filterd (ppma05fra.de.ibm.com [127.0.0.1]) by ppma05fra.de.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 35D1qLEv027734; Tue, 13 Jun 2023 02:03:51 GMT Received: from smtprelay07.fra02v.mail.ibm.com ([9.218.2.229]) by ppma05fra.de.ibm.com (PPS) with ESMTPS id 3r4gt4sbt2-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 13 Jun 2023 02:03:51 +0000 Received: from smtpav04.fra02v.mail.ibm.com (smtpav04.fra02v.mail.ibm.com [10.20.54.103]) by smtprelay07.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 35D23nvI58458608 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 13 Jun 2023 02:03:49 GMT Received: from smtpav04.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 2722020040; Tue, 13 Jun 2023 02:03:49 +0000 (GMT) Received: from smtpav04.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 3A62B20043; Tue, 13 Jun 2023 02:03:48 +0000 (GMT) Received: from trout.aus.stglabs.ibm.com (unknown [9.40.194.100]) by smtpav04.fra02v.mail.ibm.com (Postfix) with ESMTP; Tue, 13 Jun 2023 02:03:48 +0000 (GMT) To: gcc-patches@gcc.gnu.org Cc: richard.guenther@gmail.com, richard.sandiford@arm.com, segher@kernel.crashing.org, bergner@linux.ibm.com Subject: [PATCH 6/9] vect: Adjust vectorizable_load costing on VMAT_LOAD_STORE_LANES Date: Mon, 12 Jun 2023 21:03:27 -0500 Message-Id: <1a263aa46335ad08c0cd198b4c2075560a3ed44d.1686573640.git.linkw@linux.ibm.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: 6T6bzLIMIc7ASoN7NIpRuh_KhZJsOiUF X-Proofpoint-GUID: pSeY0fP0Wl69qI97tKf14nsnH8so7R2h X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.957,Hydra:6.0.573,FMLib:17.11.176.26 definitions=2023-06-12_18,2023-06-12_02,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 adultscore=0 bulkscore=0 phishscore=0 lowpriorityscore=0 malwarescore=0 suspectscore=0 spamscore=0 mlxlogscore=999 impostorscore=0 priorityscore=1501 clxscore=1015 mlxscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2305260000 definitions=main-2306130016 X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_MSPIKE_H5, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Kewen Lin via Gcc-patches From: "Kewen.Lin" Reply-To: Kewen Lin Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" This patch adjusts the cost handling on VMAT_LOAD_STORE_LANES in function vectorizable_load. We don't call function vect_model_load_cost for it any more. It follows what we do in the function vect_model_load_cost, and shouldn't have any functional changes. gcc/ChangeLog: * tree-vect-stmts.cc (vectorizable_load): Adjust the cost handling on VMAT_LOAD_STORE_LANES without calling vect_model_load_cost. (vectorizable_load): Remove VMAT_LOAD_STORE_LANES related handling and assert it will never get VMAT_LOAD_STORE_LANES. --- gcc/tree-vect-stmts.cc | 73 ++++++++++++++++++++++++------------------ 1 file changed, 42 insertions(+), 31 deletions(-) diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index a3fd0bf879e..4c5ce2ab278 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -1137,7 +1137,8 @@ vect_model_load_cost (vec_info *vinfo, gcc_assert (memory_access_type != VMAT_GATHER_SCATTER && memory_access_type != VMAT_INVARIANT && memory_access_type != VMAT_ELEMENTWISE - && memory_access_type != VMAT_STRIDED_SLP); + && memory_access_type != VMAT_STRIDED_SLP + && memory_access_type != VMAT_LOAD_STORE_LANES); unsigned int inside_cost = 0, prologue_cost = 0; bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info); @@ -1176,31 +1177,6 @@ vect_model_load_cost (vec_info *vinfo, once per group anyhow. */ bool first_stmt_p = (first_stmt_info == stmt_info); - /* An IFN_LOAD_LANES will load all its vector results, regardless of which - ones we actually need. Account for the cost of unused results. */ - if (first_stmt_p && !slp_node && memory_access_type == VMAT_LOAD_STORE_LANES) - { - unsigned int gaps = DR_GROUP_SIZE (first_stmt_info); - stmt_vec_info next_stmt_info = first_stmt_info; - do - { - gaps -= 1; - next_stmt_info = DR_GROUP_NEXT_ELEMENT (next_stmt_info); - } - while (next_stmt_info); - if (gaps) - { - if (dump_enabled_p ()) - dump_printf_loc (MSG_NOTE, vect_location, - "vect_model_load_cost: %d unused vectors.\n", - gaps); - vect_get_load_cost (vinfo, stmt_info, ncopies * gaps, - alignment_support_scheme, misalignment, false, - &inside_cost, &prologue_cost, - cost_vec, cost_vec, true); - } - } - /* We assume that the cost of a single load-lanes instruction is equivalent to the cost of DR_GROUP_SIZE separate loads. If a grouped access is instead being provided by a load-and-permute operation, @@ -10110,7 +10086,7 @@ vectorizable_load (vec_info *vinfo, } tree vec_mask = NULL_TREE; poly_uint64 group_elt = 0; - unsigned int inside_cost = 0; + unsigned int inside_cost = 0, prologue_cost = 0; for (j = 0; j < ncopies; j++) { /* 1. Create the vector or array pointer update chain. */ @@ -10190,8 +10166,42 @@ vectorizable_load (vec_info *vinfo, dr_chain.create (vec_num); gimple *new_stmt = NULL; - if (memory_access_type == VMAT_LOAD_STORE_LANES && !costing_p) + if (memory_access_type == VMAT_LOAD_STORE_LANES) { + if (costing_p) + { + /* An IFN_LOAD_LANES will load all its vector results, + regardless of which ones we actually need. Account + for the cost of unused results. */ + if (grouped_load && first_stmt_info == stmt_info) + { + unsigned int gaps = DR_GROUP_SIZE (first_stmt_info); + stmt_vec_info next_stmt_info = first_stmt_info; + do + { + gaps -= 1; + next_stmt_info = DR_GROUP_NEXT_ELEMENT (next_stmt_info); + } + while (next_stmt_info); + if (gaps) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "vect_model_load_cost: %d " + "unused vectors.\n", + gaps); + vect_get_load_cost (vinfo, stmt_info, gaps, + alignment_support_scheme, + misalignment, false, &inside_cost, + &prologue_cost, cost_vec, cost_vec, + true); + } + } + vect_get_load_cost (vinfo, stmt_info, 1, alignment_support_scheme, + misalignment, false, &inside_cost, + &prologue_cost, cost_vec, cost_vec, true); + continue; + } tree vec_array; vec_array = create_vector_array (vectype, vec_num); @@ -10771,13 +10781,14 @@ vec_num_loop_costing_end: if (costing_p) { - if (memory_access_type == VMAT_GATHER_SCATTER) + if (memory_access_type == VMAT_GATHER_SCATTER + || memory_access_type == VMAT_LOAD_STORE_LANES) { if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, "vect_model_load_cost: inside_cost = %u, " - "prologue_cost = 0 .\n", - inside_cost); + "prologue_cost = %u .\n", + inside_cost, prologue_cost); } else vect_model_load_cost (vinfo, stmt_info, ncopies, vf, memory_access_type, From patchwork Tue Jun 13 02:03:28 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kewen.Lin" X-Patchwork-Id: 1794261 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=mQlWcDe0; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4QgBlv6l26z20QH for ; Tue, 13 Jun 2023 12:07:55 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id DE86B3857C45 for ; Tue, 13 Jun 2023 02:07:53 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org DE86B3857C45 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1686622073; bh=ImoyiwXX94q+FFMQvdFVUKR8M60ZthY2akheBRg/UW4=; h=To:Cc:Subject:Date:In-Reply-To:References:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; b=mQlWcDe0rbwm6VutQ7TH4mnMv7ycgpTLrPcnlnwGtFBRu9hwAWpzXYavSVitsPRW2 6y3xTqigFv71cS8F7ZjOpH2zaTgA3c9RQHZLCQkir/9utAjICwMbjMtWLrZWNCtfYu TbuZRrIGt4EBwzfScFrPVjcDOeUuq4qD9KECJDGA= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id 8BBC8385C6E8 for ; Tue, 13 Jun 2023 02:07:33 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 8BBC8385C6E8 Received: from pps.filterd (m0353722.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 35D27QHs012364; Tue, 13 Jun 2023 02:07:31 GMT Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3r6f3era6r-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 13 Jun 2023 02:07:30 +0000 Received: from m0353722.ppops.net (m0353722.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 35D27TVU012849; Tue, 13 Jun 2023 02:07:29 GMT Received: from ppma03ams.nl.ibm.com (62.31.33a9.ip4.static.sl-reverse.com [169.51.49.98]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3r6f3era21-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 13 Jun 2023 02:07:28 +0000 Received: from pps.filterd (ppma03ams.nl.ibm.com [127.0.0.1]) by ppma03ams.nl.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 35D0J485032234; Tue, 13 Jun 2023 02:03:52 GMT Received: from smtprelay02.fra02v.mail.ibm.com ([9.218.2.226]) by ppma03ams.nl.ibm.com (PPS) with ESMTPS id 3r4gt51u1c-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 13 Jun 2023 02:03:52 +0000 Received: from smtpav04.fra02v.mail.ibm.com (smtpav04.fra02v.mail.ibm.com [10.20.54.103]) by smtprelay02.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 35D23oxv27197988 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 13 Jun 2023 02:03:50 GMT Received: from smtpav04.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 5314C20043; Tue, 13 Jun 2023 02:03:50 +0000 (GMT) Received: from smtpav04.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 64FF720040; Tue, 13 Jun 2023 02:03:49 +0000 (GMT) Received: from trout.aus.stglabs.ibm.com (unknown [9.40.194.100]) by smtpav04.fra02v.mail.ibm.com (Postfix) with ESMTP; Tue, 13 Jun 2023 02:03:49 +0000 (GMT) To: gcc-patches@gcc.gnu.org Cc: richard.guenther@gmail.com, richard.sandiford@arm.com, segher@kernel.crashing.org, bergner@linux.ibm.com Subject: [PATCH 7/9] vect: Adjust vectorizable_load costing on VMAT_CONTIGUOUS_REVERSE Date: Mon, 12 Jun 2023 21:03:28 -0500 Message-Id: X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-GUID: LTRSoTFJCf6CNUvPZkwnE0L90vWKn63m X-Proofpoint-ORIG-GUID: CXPBCiOfpGBWTpc4KvoEfczgtoeTXfEf X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.957,Hydra:6.0.573,FMLib:17.11.176.26 definitions=2023-06-12_18,2023-06-12_02,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 mlxscore=0 lowpriorityscore=0 spamscore=0 malwarescore=0 adultscore=0 suspectscore=0 mlxlogscore=999 clxscore=1015 impostorscore=0 bulkscore=0 phishscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2305260000 definitions=main-2306130016 X-Spam-Status: No, score=-12.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_MSPIKE_H5, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Kewen Lin via Gcc-patches From: "Kewen.Lin" Reply-To: Kewen Lin Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" This patch adjusts the cost handling on VMAT_CONTIGUOUS_REVERSE in function vectorizable_load. We don't call function vect_model_load_cost for it any more. This change makes us not miscount some required vector permutation as the associated test case shows. gcc/ChangeLog: * tree-vect-stmts.cc (vect_model_load_cost): Assert it won't get VMAT_CONTIGUOUS_REVERSE any more. (vectorizable_load): Adjust the costing handling on VMAT_CONTIGUOUS_REVERSE without calling vect_model_load_cost. gcc/testsuite/ChangeLog: * gcc.dg/vect/costmodel/ppc/costmodel-vect-reversed.c: New test. --- .../costmodel/ppc/costmodel-vect-reversed.c | 22 ++++ gcc/tree-vect-stmts.cc | 109 ++++++++++++------ 2 files changed, 93 insertions(+), 38 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-reversed.c diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-reversed.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-reversed.c new file mode 100644 index 00000000000..651274be038 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-reversed.c @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_int } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-additional-options "-mvsx" } */ + +/* Verify we do cost the required vec_perm. */ + +int x[1024], y[1024]; + +void +foo () +{ + for (int i = 0; i < 512; ++i) + { + x[2 * i] = y[1023 - (2 * i)]; + x[2 * i + 1] = y[1023 - (2 * i + 1)]; + } +} +/* The reason why it doesn't check the exact count is that + retrying for the epilogue with partial vector capability + like Power10 can result in more than 1 vec_perm. */ +/* { dg-final { scan-tree-dump {\mvec_perm\M} "vect" } } */ diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index 4c5ce2ab278..7f8d9db5363 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -1134,11 +1134,8 @@ vect_model_load_cost (vec_info *vinfo, slp_tree slp_node, stmt_vector_for_cost *cost_vec) { - gcc_assert (memory_access_type != VMAT_GATHER_SCATTER - && memory_access_type != VMAT_INVARIANT - && memory_access_type != VMAT_ELEMENTWISE - && memory_access_type != VMAT_STRIDED_SLP - && memory_access_type != VMAT_LOAD_STORE_LANES); + gcc_assert (memory_access_type == VMAT_CONTIGUOUS + || memory_access_type == VMAT_CONTIGUOUS_PERMUTE); unsigned int inside_cost = 0, prologue_cost = 0; bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info); @@ -10292,7 +10289,7 @@ vectorizable_load (vec_info *vinfo, = record_stmt_cost (cost_vec, cnunits, scalar_load, stmt_info, 0, vect_body); - goto vec_num_loop_costing_end; + break; } if (STMT_VINFO_GATHER_SCATTER_P (stmt_info)) vec_offset = vec_offsets[vec_num * j + i]; @@ -10335,7 +10332,7 @@ vectorizable_load (vec_info *vinfo, inside_cost = record_stmt_cost (cost_vec, 1, vec_construct, stmt_info, 0, vect_body); - goto vec_num_loop_costing_end; + break; } unsigned HOST_WIDE_INT const_offset_nunits = TYPE_VECTOR_SUBPARTS (gs_info.offset_vectype) @@ -10390,7 +10387,7 @@ vectorizable_load (vec_info *vinfo, } if (costing_p) - goto vec_num_loop_costing_end; + break; align = known_alignment (DR_TARGET_ALIGNMENT (first_dr_info)); @@ -10563,7 +10560,7 @@ vectorizable_load (vec_info *vinfo, case dr_explicit_realign: { if (costing_p) - goto vec_num_loop_costing_end; + break; tree ptr, bump; tree vs = size_int (TYPE_VECTOR_SUBPARTS (vectype)); @@ -10627,7 +10624,7 @@ vectorizable_load (vec_info *vinfo, case dr_explicit_realign_optimized: { if (costing_p) - goto vec_num_loop_costing_end; + break; if (TREE_CODE (dataref_ptr) == SSA_NAME) new_temp = copy_ssa_name (dataref_ptr); else @@ -10650,22 +10647,37 @@ vectorizable_load (vec_info *vinfo, default: gcc_unreachable (); } - vec_dest = vect_create_destination_var (scalar_dest, vectype); - /* DATA_REF is null if we've already built the statement. */ - if (data_ref) + + /* One common place to cost the above vect load for different + alignment support schemes. */ + if (costing_p) { - vect_copy_ref_info (data_ref, DR_REF (first_dr_info->dr)); - new_stmt = gimple_build_assign (vec_dest, data_ref); + if (memory_access_type == VMAT_CONTIGUOUS_REVERSE) + vect_get_load_cost (vinfo, stmt_info, 1, + alignment_support_scheme, misalignment, + false, &inside_cost, &prologue_cost, + cost_vec, cost_vec, true); + } + else + { + vec_dest = vect_create_destination_var (scalar_dest, vectype); + /* DATA_REF is null if we've already built the statement. */ + if (data_ref) + { + vect_copy_ref_info (data_ref, DR_REF (first_dr_info->dr)); + new_stmt = gimple_build_assign (vec_dest, data_ref); + } + new_temp = make_ssa_name (vec_dest, new_stmt); + gimple_set_lhs (new_stmt, new_temp); + vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi); } - new_temp = make_ssa_name (vec_dest, new_stmt); - gimple_set_lhs (new_stmt, new_temp); - vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi); /* 3. Handle explicit realignment if necessary/supported. Create in loop: vec_dest = realign_load (msq, lsq, realignment_token) */ - if (alignment_support_scheme == dr_explicit_realign_optimized - || alignment_support_scheme == dr_explicit_realign) + if (!costing_p + && (alignment_support_scheme == dr_explicit_realign_optimized + || alignment_support_scheme == dr_explicit_realign)) { lsq = gimple_assign_lhs (new_stmt); if (!realignment_token) @@ -10690,26 +10702,34 @@ vectorizable_load (vec_info *vinfo, if (memory_access_type == VMAT_CONTIGUOUS_REVERSE) { - tree perm_mask = perm_mask_for_reverse (vectype); - new_temp = permute_vec_elements (vinfo, new_temp, new_temp, - perm_mask, stmt_info, gsi); - new_stmt = SSA_NAME_DEF_STMT (new_temp); + if (costing_p) + inside_cost = record_stmt_cost (cost_vec, 1, vec_perm, + stmt_info, 0, vect_body); + else + { + tree perm_mask = perm_mask_for_reverse (vectype); + new_temp + = permute_vec_elements (vinfo, new_temp, new_temp, + perm_mask, stmt_info, gsi); + new_stmt = SSA_NAME_DEF_STMT (new_temp); + } } /* Collect vector loads and later create their permutation in vect_transform_grouped_load (). */ - if (grouped_load || slp_perm) + if (!costing_p && (grouped_load || slp_perm)) dr_chain.quick_push (new_temp); /* Store vector loads in the corresponding SLP_NODE. */ - if (slp && !slp_perm) + if (!costing_p && slp && !slp_perm) SLP_TREE_VEC_STMTS (slp_node).quick_push (new_stmt); /* With SLP permutation we load the gaps as well, without we need to skip the gaps after we manage to fully load all elements. group_gap_adj is DR_GROUP_SIZE here. */ group_elt += nunits; - if (maybe_ne (group_gap_adj, 0U) + if (!costing_p + && maybe_ne (group_gap_adj, 0U) && !slp_perm && known_eq (group_elt, group_size - group_gap_adj)) { @@ -10724,8 +10744,6 @@ vectorizable_load (vec_info *vinfo, gsi, stmt_info, bump); group_elt = 0; } -vec_num_loop_costing_end: - ; } /* Bump the vector pointer to account for a gap or for excess elements loaded for a permuted SLP load. */ @@ -10748,18 +10766,30 @@ vec_num_loop_costing_end: if (slp && !slp_perm) continue; - if (slp_perm && !costing_p) - { + if (slp_perm) + { unsigned n_perms; /* For SLP we know we've seen all possible uses of dr_chain so direct vect_transform_slp_perm_load to DCE the unused parts. ??? This is a hack to prevent compile-time issues as seen in PR101120 and friends. */ - bool ok = vect_transform_slp_perm_load (vinfo, slp_node, dr_chain, - gsi, vf, false, &n_perms, - nullptr, true); - gcc_assert (ok); - } + if (costing_p + && memory_access_type != VMAT_CONTIGUOUS + && memory_access_type != VMAT_CONTIGUOUS_PERMUTE) + { + vect_transform_slp_perm_load (vinfo, slp_node, vNULL, nullptr, vf, + true, &n_perms, nullptr); + inside_cost = record_stmt_cost (cost_vec, n_perms, vec_perm, + stmt_info, 0, vect_body); + } + else if (!costing_p) + { + bool ok = vect_transform_slp_perm_load (vinfo, slp_node, dr_chain, + gsi, vf, false, &n_perms, + nullptr, true); + gcc_assert (ok); + } + } else if (!costing_p) { if (grouped_load) @@ -10781,8 +10811,11 @@ vec_num_loop_costing_end: if (costing_p) { - if (memory_access_type == VMAT_GATHER_SCATTER - || memory_access_type == VMAT_LOAD_STORE_LANES) + gcc_assert (memory_access_type != VMAT_INVARIANT + && memory_access_type != VMAT_ELEMENTWISE + && memory_access_type != VMAT_STRIDED_SLP); + if (memory_access_type != VMAT_CONTIGUOUS + && memory_access_type != VMAT_CONTIGUOUS_PERMUTE) { if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, From patchwork Tue Jun 13 02:03:29 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kewen.Lin" X-Patchwork-Id: 1794258 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=wy/XmWud; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4QgBkF5QJ8z20QH for ; Tue, 13 Jun 2023 12:06:29 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id B91773857016 for ; Tue, 13 Jun 2023 02:06:27 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org B91773857016 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1686621987; bh=UWeazaYN0EcO47hx+2yvOIhXof+YK144D1RZcXN1UYo=; h=To:Cc:Subject:Date:In-Reply-To:References:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; b=wy/XmWud1D19485Xgi8BFausUNwPvdYjXygQtj91H1DSk/iG0LuSPnnO9EWkWpa9R 4Sv1jZZNf4VYeahMPGyE4UG8I0XxxQwz0vwVEDulvxTTiFDzx7fAByVQLBcB+ZHIaD AHjvq5irfejIgSx1sb3oKwLeD+GT3auR8+QFGUWM= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id 2ACD93858C74 for ; Tue, 13 Jun 2023 02:03:59 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 2ACD93858C74 Received: from pps.filterd (m0356516.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 35D1qaB5008139; Tue, 13 Jun 2023 02:03:57 GMT Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3r6f6886pw-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 13 Jun 2023 02:03:56 +0000 Received: from m0356516.ppops.net (m0356516.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 35D1xV8A026350; Tue, 13 Jun 2023 02:03:56 GMT Received: from ppma05fra.de.ibm.com (6c.4a.5195.ip4.static.sl-reverse.com [149.81.74.108]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3r6f6886p7-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 13 Jun 2023 02:03:56 +0000 Received: from pps.filterd (ppma05fra.de.ibm.com [127.0.0.1]) by ppma05fra.de.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 35D1uSHa031197; Tue, 13 Jun 2023 02:03:54 GMT Received: from smtprelay03.fra02v.mail.ibm.com ([9.218.2.224]) by ppma05fra.de.ibm.com (PPS) with ESMTPS id 3r4gt4sbt3-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 13 Jun 2023 02:03:54 +0000 Received: from smtpav04.fra02v.mail.ibm.com (smtpav04.fra02v.mail.ibm.com [10.20.54.103]) by smtprelay03.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 35D23phU18547298 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 13 Jun 2023 02:03:51 GMT Received: from smtpav04.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id BBA062004B; Tue, 13 Jun 2023 02:03:51 +0000 (GMT) Received: from smtpav04.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 906F720040; Tue, 13 Jun 2023 02:03:50 +0000 (GMT) Received: from trout.aus.stglabs.ibm.com (unknown [9.40.194.100]) by smtpav04.fra02v.mail.ibm.com (Postfix) with ESMTP; Tue, 13 Jun 2023 02:03:50 +0000 (GMT) To: gcc-patches@gcc.gnu.org Cc: richard.guenther@gmail.com, richard.sandiford@arm.com, segher@kernel.crashing.org, bergner@linux.ibm.com, ubizjak@gmail.com, hongtao.liu@intel.com Subject: [PATCH 8/9] vect: Adjust vectorizable_load costing on VMAT_CONTIGUOUS_PERMUTE Date: Mon, 12 Jun 2023 21:03:29 -0500 Message-Id: <216bf6e61d4fe2caa6b87ae1e5c8e15b6d31c409.1686573640.git.linkw@linux.ibm.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: J5OwV_98bKJCcbKliKwJrpaeYtsci4Fn X-Proofpoint-GUID: Z8UQk-ZCwXKPGzI_xQ3vJWIl2fGKb1Yp X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.957,Hydra:6.0.573,FMLib:17.11.176.26 definitions=2023-06-12_18,2023-06-12_02,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 adultscore=0 bulkscore=0 phishscore=0 lowpriorityscore=0 malwarescore=0 suspectscore=0 spamscore=0 mlxlogscore=999 impostorscore=0 priorityscore=1501 clxscore=1011 mlxscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2305260000 definitions=main-2306130016 X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_MSPIKE_H5, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Kewen Lin via Gcc-patches From: "Kewen.Lin" Reply-To: Kewen Lin Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" This patch adjusts the cost handling on VMAT_CONTIGUOUS_PERMUTE in function vectorizable_load. We don't call function vect_model_load_cost for it any more. As the affected test case gcc.target/i386/pr70021.c shows, the previous costing can under-cost the total generated vector loads as for VMAT_CONTIGUOUS_PERMUTE function vect_model_load_cost doesn't consider the group size which is considered as vec_num during the transformation. This patch makes the count of vector load in costing become consistent with what we generates during the transformation. To be more specific, for the given test case, for memory access b[i_20], it costed for 2 vector loads before, with this patch it costs 8 instead, it matches the final count of generated vector loads basing from b. This costing change makes cost model analysis feel it's not profitable to vectorize the first loop, so this patch adjusts the test case without vect cost model any more. But note that this test case also exposes something we can improve further is that although the number of vector permutation what we costed and generated are consistent, but DCE can further optimize some unused permutation out, it would be good if we can predict that and generate only those necessary permutations. gcc/ChangeLog: * tree-vect-stmts.cc (vect_model_load_cost): Assert this function only handle memory_access_type VMAT_CONTIGUOUS, remove some VMAT_CONTIGUOUS_PERMUTE related handlings. (vectorizable_load): Adjust the cost handling on VMAT_CONTIGUOUS_PERMUTE without calling vect_model_load_cost. gcc/testsuite/ChangeLog: * gcc.target/i386/pr70021.c: Adjust with -fno-vect-cost-model. --- gcc/testsuite/gcc.target/i386/pr70021.c | 2 +- gcc/tree-vect-stmts.cc | 88 ++++++++++++++----------- 2 files changed, 51 insertions(+), 39 deletions(-) diff --git a/gcc/testsuite/gcc.target/i386/pr70021.c b/gcc/testsuite/gcc.target/i386/pr70021.c index 6562c0f2bd0..d509583601e 100644 --- a/gcc/testsuite/gcc.target/i386/pr70021.c +++ b/gcc/testsuite/gcc.target/i386/pr70021.c @@ -1,7 +1,7 @@ /* PR target/70021 */ /* { dg-do run } */ /* { dg-require-effective-target avx2 } */ -/* { dg-options "-O2 -ftree-vectorize -mavx2 -fdump-tree-vect-details -mtune=skylake" } */ +/* { dg-options "-O2 -ftree-vectorize -mavx2 -fdump-tree-vect-details -mtune=skylake -fno-vect-cost-model" } */ #include "avx2-check.h" diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index 7f8d9db5363..e7a97dbe05d 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -1134,8 +1134,7 @@ vect_model_load_cost (vec_info *vinfo, slp_tree slp_node, stmt_vector_for_cost *cost_vec) { - gcc_assert (memory_access_type == VMAT_CONTIGUOUS - || memory_access_type == VMAT_CONTIGUOUS_PERMUTE); + gcc_assert (memory_access_type == VMAT_CONTIGUOUS); unsigned int inside_cost = 0, prologue_cost = 0; bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info); @@ -1174,26 +1173,6 @@ vect_model_load_cost (vec_info *vinfo, once per group anyhow. */ bool first_stmt_p = (first_stmt_info == stmt_info); - /* We assume that the cost of a single load-lanes instruction is - equivalent to the cost of DR_GROUP_SIZE separate loads. If a grouped - access is instead being provided by a load-and-permute operation, - include the cost of the permutes. */ - if (first_stmt_p - && memory_access_type == VMAT_CONTIGUOUS_PERMUTE) - { - /* Uses an even and odd extract operations or shuffle operations - for each needed permute. */ - int group_size = DR_GROUP_SIZE (first_stmt_info); - int nstmts = ncopies * ceil_log2 (group_size) * group_size; - inside_cost += record_stmt_cost (cost_vec, nstmts, vec_perm, - stmt_info, 0, vect_body); - - if (dump_enabled_p ()) - dump_printf_loc (MSG_NOTE, vect_location, - "vect_model_load_cost: strided group_size = %d .\n", - group_size); - } - vect_get_load_cost (vinfo, stmt_info, ncopies, alignment_support_scheme, misalignment, first_stmt_p, &inside_cost, &prologue_cost, cost_vec, cost_vec, true); @@ -10652,11 +10631,22 @@ vectorizable_load (vec_info *vinfo, alignment support schemes. */ if (costing_p) { - if (memory_access_type == VMAT_CONTIGUOUS_REVERSE) + /* For VMAT_CONTIGUOUS_PERMUTE if it's grouped load, we + only need to take care of the first stmt, whose + stmt_info is first_stmt_info, vec_num iterating on it + will cover the cost for the remaining, it's consistent + with transforming. For the prologue cost for realign, + we only need to count it once for the whole group. */ + bool first_stmt_info_p = first_stmt_info == stmt_info; + bool add_realign_cost = first_stmt_info_p && i == 0; + if (memory_access_type == VMAT_CONTIGUOUS_REVERSE + || (memory_access_type == VMAT_CONTIGUOUS_PERMUTE + && (!grouped_load || first_stmt_info_p))) vect_get_load_cost (vinfo, stmt_info, 1, alignment_support_scheme, misalignment, - false, &inside_cost, &prologue_cost, - cost_vec, cost_vec, true); + add_realign_cost, &inside_cost, + &prologue_cost, cost_vec, cost_vec, + true); } else { @@ -10774,8 +10764,7 @@ vectorizable_load (vec_info *vinfo, ??? This is a hack to prevent compile-time issues as seen in PR101120 and friends. */ if (costing_p - && memory_access_type != VMAT_CONTIGUOUS - && memory_access_type != VMAT_CONTIGUOUS_PERMUTE) + && memory_access_type != VMAT_CONTIGUOUS) { vect_transform_slp_perm_load (vinfo, slp_node, vNULL, nullptr, vf, true, &n_perms, nullptr); @@ -10790,20 +10779,44 @@ vectorizable_load (vec_info *vinfo, gcc_assert (ok); } } - else if (!costing_p) + else { if (grouped_load) { if (memory_access_type != VMAT_LOAD_STORE_LANES) - vect_transform_grouped_load (vinfo, stmt_info, dr_chain, - group_size, gsi); - *vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0]; - } - else - { - STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt); + { + gcc_assert (memory_access_type == VMAT_CONTIGUOUS_PERMUTE); + /* We assume that the cost of a single load-lanes instruction + is equivalent to the cost of DR_GROUP_SIZE separate loads. + If a grouped access is instead being provided by a + load-and-permute operation, include the cost of the + permutes. */ + if (costing_p && first_stmt_info == stmt_info) + { + /* Uses an even and odd extract operations or shuffle + operations for each needed permute. */ + int group_size = DR_GROUP_SIZE (first_stmt_info); + int nstmts = ceil_log2 (group_size) * group_size; + inside_cost + += record_stmt_cost (cost_vec, nstmts, vec_perm, + stmt_info, 0, vect_body); + + if (dump_enabled_p ()) + dump_printf_loc ( + MSG_NOTE, vect_location, + "vect_model_load_cost: strided group_size = %d .\n", + group_size); + } + else if (!costing_p) + vect_transform_grouped_load (vinfo, stmt_info, dr_chain, + group_size, gsi); + } + if (!costing_p) + *vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0]; } - } + else if (!costing_p) + STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt); + } dr_chain.release (); } if (!slp && !costing_p) @@ -10814,8 +10827,7 @@ vectorizable_load (vec_info *vinfo, gcc_assert (memory_access_type != VMAT_INVARIANT && memory_access_type != VMAT_ELEMENTWISE && memory_access_type != VMAT_STRIDED_SLP); - if (memory_access_type != VMAT_CONTIGUOUS - && memory_access_type != VMAT_CONTIGUOUS_PERMUTE) + if (memory_access_type != VMAT_CONTIGUOUS) { if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, From patchwork Tue Jun 13 02:03:30 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kewen.Lin" X-Patchwork-Id: 1794257 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=QnTnwc5x; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4QgBjC5B2Tz20QH for ; Tue, 13 Jun 2023 12:05:35 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id AE6363858D32 for ; Tue, 13 Jun 2023 02:05:33 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org AE6363858D32 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1686621933; bh=KMYcvLTeXv9fOzBkMngu3Q31sMhdAR+5kddROD1r5V4=; h=To:Cc:Subject:Date:In-Reply-To:References:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; b=QnTnwc5xrBe4PjmjFQ3T7GRN96+86rB4UiU9wTgQ3Y9YBWq/VLhe4lBArTNw32WXZ yVXX61mSuRwd/CgcHb0tgRQtODvXdOwgI5Z+16Rk4l70qTODOeaBPkGd+c4IYIp+Ti Lx+gOefoj3T+i+pv1KYLHjCi+REm/ldS4k9I+g6M= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id 045AD38582A3 for ; Tue, 13 Jun 2023 02:04:01 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 045AD38582A3 Received: from pps.filterd (m0356517.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 35D21fHv005028; Tue, 13 Jun 2023 02:03:59 GMT Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3r6fag81ry-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 13 Jun 2023 02:03:58 +0000 Received: from m0356517.ppops.net (m0356517.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 35D223ve006045; Tue, 13 Jun 2023 02:03:58 GMT Received: from ppma02fra.de.ibm.com (47.49.7a9f.ip4.static.sl-reverse.com [159.122.73.71]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3r6fag81r0-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 13 Jun 2023 02:03:57 +0000 Received: from pps.filterd (ppma02fra.de.ibm.com [127.0.0.1]) by ppma02fra.de.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 35D12lgc026986; Tue, 13 Jun 2023 02:03:55 GMT Received: from smtprelay05.fra02v.mail.ibm.com ([9.218.2.225]) by ppma02fra.de.ibm.com (PPS) with ESMTPS id 3r4gt51bmr-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 13 Jun 2023 02:03:55 +0000 Received: from smtpav04.fra02v.mail.ibm.com (smtpav04.fra02v.mail.ibm.com [10.20.54.103]) by smtprelay05.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 35D23rOL52101622 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 13 Jun 2023 02:03:53 GMT Received: from smtpav04.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id E68C220043; Tue, 13 Jun 2023 02:03:52 +0000 (GMT) Received: from smtpav04.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 050BF2004B; Tue, 13 Jun 2023 02:03:52 +0000 (GMT) Received: from trout.aus.stglabs.ibm.com (unknown [9.40.194.100]) by smtpav04.fra02v.mail.ibm.com (Postfix) with ESMTP; Tue, 13 Jun 2023 02:03:51 +0000 (GMT) To: gcc-patches@gcc.gnu.org Cc: richard.guenther@gmail.com, richard.sandiford@arm.com, segher@kernel.crashing.org, bergner@linux.ibm.com Subject: [PATCH 9/9] vect: Adjust vectorizable_load costing on VMAT_CONTIGUOUS Date: Mon, 12 Jun 2023 21:03:30 -0500 Message-Id: <625eccff9102ffe35497ad03ebd8242d6d6b06a4.1686573640.git.linkw@linux.ibm.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: p1cueWveSHt7twT9OiOa-AXj9dS7R63W X-Proofpoint-GUID: nohtEyf5Duo8jh7GliekJv7y3HkwwH1N X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.957,Hydra:6.0.573,FMLib:17.11.176.26 definitions=2023-06-12_18,2023-06-12_02,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 malwarescore=0 clxscore=1015 priorityscore=1501 impostorscore=0 mlxscore=0 spamscore=0 suspectscore=0 phishscore=0 adultscore=0 lowpriorityscore=0 mlxlogscore=999 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2305260000 definitions=main-2306130016 X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_MSPIKE_H5, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Kewen Lin via Gcc-patches From: "Kewen.Lin" Reply-To: Kewen Lin Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" This patch adjusts the cost handling on VMAT_CONTIGUOUS in function vectorizable_load. We don't call function vect_model_load_cost for it any more. It removes function vect_model_load_cost which becomes useless and unreachable now. gcc/ChangeLog: * tree-vect-stmts.cc (vect_model_load_cost): Remove. (vectorizable_load): Adjust the cost handling on VMAT_CONTIGUOUS without calling vect_model_load_cost. --- gcc/tree-vect-stmts.cc | 92 +++++------------------------------------- 1 file changed, 9 insertions(+), 83 deletions(-) diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index e7a97dbe05d..be3b277e8e1 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -1117,73 +1117,6 @@ vect_get_store_cost (vec_info *, stmt_vec_info stmt_info, int ncopies, } } - -/* Function vect_model_load_cost - - Models cost for loads. In the case of grouped accesses, one access has - the overhead of the grouped access attributed to it. Since unaligned - accesses are supported for loads, we also account for the costs of the - access scheme chosen. */ - -static void -vect_model_load_cost (vec_info *vinfo, - stmt_vec_info stmt_info, unsigned ncopies, poly_uint64 vf, - vect_memory_access_type memory_access_type, - dr_alignment_support alignment_support_scheme, - int misalignment, - slp_tree slp_node, - stmt_vector_for_cost *cost_vec) -{ - gcc_assert (memory_access_type == VMAT_CONTIGUOUS); - - unsigned int inside_cost = 0, prologue_cost = 0; - bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info); - - gcc_assert (cost_vec); - - /* ??? Somehow we need to fix this at the callers. */ - if (slp_node) - ncopies = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node); - - if (slp_node && SLP_TREE_LOAD_PERMUTATION (slp_node).exists ()) - { - /* If the load is permuted then the alignment is determined by - the first group element not by the first scalar stmt DR. */ - stmt_vec_info first_stmt_info = DR_GROUP_FIRST_ELEMENT (stmt_info); - /* Record the cost for the permutation. */ - unsigned n_perms, n_loads; - vect_transform_slp_perm_load (vinfo, slp_node, vNULL, NULL, - vf, true, &n_perms, &n_loads); - inside_cost += record_stmt_cost (cost_vec, n_perms, vec_perm, - first_stmt_info, 0, vect_body); - - /* And adjust the number of loads performed. This handles - redundancies as well as loads that are later dead. */ - ncopies = n_loads; - } - - /* Grouped loads read all elements in the group at once, - so we want the DR for the first statement. */ - stmt_vec_info first_stmt_info = stmt_info; - if (!slp_node && grouped_access_p) - first_stmt_info = DR_GROUP_FIRST_ELEMENT (stmt_info); - - /* True if we should include any once-per-group costs as well as - the cost of the statement itself. For SLP we only get called - once per group anyhow. */ - bool first_stmt_p = (first_stmt_info == stmt_info); - - vect_get_load_cost (vinfo, stmt_info, ncopies, alignment_support_scheme, - misalignment, first_stmt_p, &inside_cost, &prologue_cost, - cost_vec, cost_vec, true); - - if (dump_enabled_p ()) - dump_printf_loc (MSG_NOTE, vect_location, - "vect_model_load_cost: inside_cost = %d, " - "prologue_cost = %d .\n", inside_cost, prologue_cost); -} - - /* Calculate cost of DR's memory access. */ void vect_get_load_cost (vec_info *, stmt_vec_info stmt_info, int ncopies, @@ -10639,7 +10572,8 @@ vectorizable_load (vec_info *vinfo, we only need to count it once for the whole group. */ bool first_stmt_info_p = first_stmt_info == stmt_info; bool add_realign_cost = first_stmt_info_p && i == 0; - if (memory_access_type == VMAT_CONTIGUOUS_REVERSE + if (memory_access_type == VMAT_CONTIGUOUS + || memory_access_type == VMAT_CONTIGUOUS_REVERSE || (memory_access_type == VMAT_CONTIGUOUS_PERMUTE && (!grouped_load || first_stmt_info_p))) vect_get_load_cost (vinfo, stmt_info, 1, @@ -10763,15 +10697,14 @@ vectorizable_load (vec_info *vinfo, direct vect_transform_slp_perm_load to DCE the unused parts. ??? This is a hack to prevent compile-time issues as seen in PR101120 and friends. */ - if (costing_p - && memory_access_type != VMAT_CONTIGUOUS) + if (costing_p) { vect_transform_slp_perm_load (vinfo, slp_node, vNULL, nullptr, vf, true, &n_perms, nullptr); inside_cost = record_stmt_cost (cost_vec, n_perms, vec_perm, stmt_info, 0, vect_body); } - else if (!costing_p) + else { bool ok = vect_transform_slp_perm_load (vinfo, slp_node, dr_chain, gsi, vf, false, &n_perms, @@ -10827,18 +10760,11 @@ vectorizable_load (vec_info *vinfo, gcc_assert (memory_access_type != VMAT_INVARIANT && memory_access_type != VMAT_ELEMENTWISE && memory_access_type != VMAT_STRIDED_SLP); - if (memory_access_type != VMAT_CONTIGUOUS) - { - if (dump_enabled_p ()) - dump_printf_loc (MSG_NOTE, vect_location, - "vect_model_load_cost: inside_cost = %u, " - "prologue_cost = %u .\n", - inside_cost, prologue_cost); - } - else - vect_model_load_cost (vinfo, stmt_info, ncopies, vf, memory_access_type, - alignment_support_scheme, misalignment, slp_node, - cost_vec); + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "vect_model_load_cost: inside_cost = %u, " + "prologue_cost = %u .\n", + inside_cost, prologue_cost); } return true;