From: juzhe.zhong@rivai.ai
To: gcc-patches@gcc.gnu.org
Cc: richard.sandiford@arm.com, rguenther@suse.de, Ju-Zhe Zhong
Subject: [PATCH] VECT: Support loop len control on EXTRACT_LAST vectorization
Date: Wed, 9 Aug 2023 14:36:22 +0800
Message-Id: <20230809063622.316743-1-juzhe.zhong@rivai.ai>

From: Ju-Zhe Zhong

Hi,

This patch adds loop length control to EXTRACT_LAST autovectorization.

Consider the following case:

#include <stdint.h>

#define EXTRACT_LAST(TYPE)                      \
  TYPE __attribute__ ((noinline, noclone))      \
  test_##TYPE (TYPE *x, int n, TYPE value)      \
  {                                             \
    TYPE last;                                  \
    for (int j = 0; j < n; ++j)                 \
      {                                         \
        last = x[j];                            \
        x[j] = last * value;                    \
      }                                         \
    return last;                                \
  }

#define TEST_ALL(T)                             \
  T (uint8_t)

TEST_ALL (EXTRACT_LAST)

ARM SVE IR:

Preheader:
  max_mask_34 = .WHILE_ULT (0, bnd.5_6, { 0, ... });

Loop:
  ...
  # loop_mask_22 = PHI
  ...
  vect_last_12.8_23 = .MASK_LOAD (_7, 8B, loop_mask_22);
  vect__4.9_27 = vect_last_12.8_23 * vect_cst__26;
  .MASK_STORE (_7, 8B, loop_mask_22, vect__4.9_27);
  ...
  next_mask_35 = .WHILE_ULT (_1, bnd.5_6, { 0, ... });
  ...

Epilogue:
  _25 = .EXTRACT_LAST (loop_mask_22, vect_last_12.8_23);

For RVV, which prefers length-based loop control, this patch produces:

Loop:
  ...
  loop_len_22 = SELECT_VL;
  vect_last_12.8_23 = .MASK_LOAD (_7, 8B, loop_len_22);
  vect__4.9_27 = vect_last_12.8_23 * vect_cst__26;
  .MASK_STORE (_7, 8B, loop_len_22, vect__4.9_27);
  ...

Epilogue:
  _25 = .EXTRACT_LAST (loop_len_22, vect_last_12.8_23);

This patch does not add a new pattern for length-controlled EXTRACT_LAST;
instead, it reuses the existing EXTRACT_LAST. The changes are as follows.

Step 1 - Enable length control and record a loop length for EXTRACT_LAST:

+      machine_mode vec_mode = TYPE_MODE (vectype);
+      if (get_len_load_store_mode (vec_mode, true).exists (&vec_mode))
+        vect_record_loop_len (loop_vinfo,
+                              &LOOP_VINFO_LENS (loop_vinfo), 1,
+                              vectype, 1);
+      else
+        vect_record_loop_mask (loop_vinfo,
+                               &LOOP_VINFO_MASKS (loop_vinfo), 1,
+                               vectype, NULL);

We use get_len_load_store_mode to check whether the target supports
length-based loop control. If it does, we record a loop length; otherwise
we record a loop mask, as before.

Step 2 - Build EXTRACT_LAST with a length:

-      tree mask = vect_get_loop_mask (loop_vinfo, gsi,
-                                      &LOOP_VINFO_MASKS (loop_vinfo),
-                                      1, vectype, 0);
+      tree control;
+      if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
+        control = vect_get_loop_len (loop_vinfo, gsi,
+                                     &LOOP_VINFO_LENS (loop_vinfo), 1,
+                                     vectype, 0, 0);
+      else
+        control = vect_get_loop_mask (loop_vinfo, gsi,
+                                      &LOOP_VINFO_MASKS (loop_vinfo), 1,
+                                      vectype, 0);
       tree scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
-                                      mask, vec_lhs_phi);
+                                      control, vec_lhs_phi);

We reuse the existing code that builds EXTRACT_LAST with a mask, but pass
the loop length instead when LOOP_VINFO_FULLY_WITH_LENGTH_P is true.

This patch has been fully tested on the RISC-V port. Bootstrap and
regression testing on x86 passed.

OK for trunk?
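As a rough illustration of why the final iteration's length is the right control for EXTRACT_LAST, the C sketch below models the RVV-style loop on plain arrays. It is a model, not GCC or RVV code: it assumes a fixed VF of 4 and that SELECT_VL returns min (remaining, VF), and the helpers select_vl, extract_last_len and vec_extract_last are hypothetical names introduced only for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed vector factor for this model; real RVV hardware
   decides the number of lanes at run time.  */
#define VF 4

/* Scalar reference: the test-case loop instantiated for uint8_t.
   LAST is initialised here only so that the model is well defined.  */
uint8_t
ref_extract_last (uint8_t *x, int n, uint8_t value)
{
  uint8_t last = 0;
  for (int j = 0; j < n; ++j)
    {
      last = x[j];
      x[j] = last * value;
    }
  return last;
}

/* Assumed SELECT_VL behaviour: process min (remaining, VF) lanes.  */
size_t
select_vl (size_t remaining)
{
  return remaining < VF ? remaining : VF;
}

/* Models .EXTRACT_LAST (len, vec): lanes [0, len) are active, so the
   last active lane is len - 1.  */
uint8_t
extract_last_len (size_t len, const uint8_t vec[VF])
{
  assert (len >= 1 && len <= VF);
  return vec[len - 1];
}

/* Length-controlled model of the vectorized loop.  Lanes at or beyond
   LOOP_LEN keep stale data from the previous iteration, so the epilogue
   must extract with the final iteration's length, not lane VF - 1.
   Requires n >= 1, i.e. at least one vector iteration.  */
uint8_t
vec_extract_last (uint8_t *x, int n, uint8_t value)
{
  uint8_t vect_last[VF] = { 0 };
  size_t loop_len = 0;
  assert (n >= 1);
  for (size_t i = 0; i < (size_t) n; i += loop_len)
    {
      loop_len = select_vl ((size_t) n - i);
      for (size_t k = 0; k < loop_len; ++k)   /* length-controlled load */
        vect_last[k] = x[i + k];
      for (size_t k = 0; k < loop_len; ++k)   /* multiply, then store */
        x[i + k] = (uint8_t) (vect_last[k] * value);
    }
  return extract_last_len (loop_len, vect_last);
}
```

With n = 7 and VF = 4, the model runs one iteration with length 4 and one with length 3, so the epilogue reads lane 2 of the last loaded vector (the value 7 for input 1..7) rather than the stale lane 3.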
gcc/ChangeLog:

	* tree-vect-loop.cc (vectorizable_live_operation): Add length control.

---
 gcc/tree-vect-loop.cc | 40 ++++++++++++++++++++++++++++------------
 1 file changed, 28 insertions(+), 12 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 00058c3c13e..fde098cafde 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -10311,9 +10311,15 @@ vectorizable_live_operation (vec_info *vinfo,
           else
             {
               gcc_assert (ncopies == 1 && !slp_node);
-              vect_record_loop_mask (loop_vinfo,
-                                     &LOOP_VINFO_MASKS (loop_vinfo),
-                                     1, vectype, NULL);
+              machine_mode vec_mode = TYPE_MODE (vectype);
+              if (get_len_load_store_mode (vec_mode, true).exists (&vec_mode))
+                vect_record_loop_len (loop_vinfo,
+                                      &LOOP_VINFO_LENS (loop_vinfo), 1,
+                                      vectype, 1);
+              else
+                vect_record_loop_mask (loop_vinfo,
+                                       &LOOP_VINFO_MASKS (loop_vinfo), 1,
+                                       vectype, NULL);
             }
         }
       /* ??? Enable for loop costing as well.  */
@@ -10339,7 +10345,9 @@ vectorizable_live_operation (vec_info *vinfo,
   gimple *vec_stmt;
   if (slp_node)
     {
-      gcc_assert (!loop_vinfo || !LOOP_VINFO_FULLY_MASKED_P (loop_vinfo));
+      gcc_assert (!loop_vinfo
+                  || (!LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
+                      && !LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo)));
 
       /* Get the correct slp vectorized stmt.  */
       vec_lhs = SLP_TREE_VEC_DEFS (slp_node)[vec_entry];
@@ -10383,21 +10391,29 @@ vectorizable_live_operation (vec_info *vinfo,
   gimple_seq stmts = NULL;
   tree new_tree;
 
-  if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
+  if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
+      || LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
     {
       /* Emit:
 
-           SCALAR_RES = EXTRACT_LAST
+           SCALAR_RES = EXTRACT_LAST
 
-         where VEC_LHS is the vectorized live-out result and MASK is
-         the loop mask for the final iteration.  */
+         where VEC_LHS is the vectorized live-out result and CONTROL can
+         be either the loop mask for the final iteration or the loop len
+         for the final iteration.  */
       gcc_assert (ncopies == 1 && !slp_node);
       tree scalar_type = TREE_TYPE (STMT_VINFO_VECTYPE (stmt_info));
-      tree mask = vect_get_loop_mask (loop_vinfo, gsi,
-                                      &LOOP_VINFO_MASKS (loop_vinfo),
-                                      1, vectype, 0);
+      tree control;
+      if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
+        control = vect_get_loop_len (loop_vinfo, gsi,
+                                     &LOOP_VINFO_LENS (loop_vinfo), 1,
+                                     vectype, 0, 0);
+      else
+        control = vect_get_loop_mask (loop_vinfo, gsi,
+                                      &LOOP_VINFO_MASKS (loop_vinfo), 1,
+                                      vectype, 0);
       tree scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
-                                      mask, vec_lhs_phi);
+                                      control, vec_lhs_phi);
 
       /* Convert the extracted vector element to the scalar type.  */
       new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
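The new comment relies on a loop length and a loop mask being interchangeable as the CONTROL operand of EXTRACT_LAST. The C sketch below models that equivalence, assuming that a length LEN corresponds to the prefix mask with the first LEN lanes active (the shape WHILE_ULT produces). NLANES, extract_last_mask, extract_last_len, len_to_mask and controls_agree are hypothetical names for this model, not GCC functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lane count for this model.  */
#define NLANES 8

/* Models .EXTRACT_LAST (mask, vec): return the element in the highest
   lane whose mask bit is set.  The model requires an active lane.  */
uint8_t
extract_last_mask (const bool mask[NLANES], const uint8_t vec[NLANES])
{
  for (size_t lane = NLANES; lane-- > 0;)
    if (mask[lane])
      return vec[lane];
  assert (!"model requires at least one active lane");
  return 0;
}

/* Models .EXTRACT_LAST (len, vec): lanes [0, len) are active.  */
uint8_t
extract_last_len (size_t len, const uint8_t vec[NLANES])
{
  assert (len >= 1 && len <= NLANES);
  return vec[len - 1];
}

/* Builds the prefix mask with the first LEN lanes active.  */
void
len_to_mask (size_t len, bool mask[NLANES])
{
  for (size_t lane = 0; lane < NLANES; ++lane)
    mask[lane] = lane < len;
}

/* Returns true if the length control and the equivalent prefix mask
   select the same element for every length in [1, NLANES].  */
bool
controls_agree (const uint8_t vec[NLANES])
{
  bool mask[NLANES];
  for (size_t len = 1; len <= NLANES; ++len)
    {
      len_to_mask (len, mask);
      if (extract_last_mask (mask, vec) != extract_last_len (len, vec))
        return false;
    }
  return true;
}
```

Under this assumption, a length-controlled loop never needs a mask-specific EXTRACT_LAST: for a prefix mask, "last active lane" is simply lane LEN - 1.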