From patchwork Fri Aug 11 14:24:48 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "juzhe.zhong@rivai.ai" X-Patchwork-Id: 1820300 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4RMmKS0xYMz1yf7 for ; Sat, 12 Aug 2023 00:25:15 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 01F143857351 for ; Fri, 11 Aug 2023 14:25:12 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from smtpbgeu1.qq.com (smtpbgeu1.qq.com [52.59.177.22]) by sourceware.org (Postfix) with ESMTPS id 7F4323858D20 for ; Fri, 11 Aug 2023 14:24:56 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 7F4323858D20 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=rivai.ai Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=rivai.ai X-QQ-mid: bizesmtp72t1691763891t4k64xik Received: from rios-cad5.localdomain ( [58.60.1.11]) by bizesmtp.qq.com (ESMTP) with id ; Fri, 11 Aug 2023 22:24:49 +0800 (CST) X-QQ-SSF: 01400000000000G0V000000A0000000 X-QQ-FEAT: IcCSTr/hHjPIWJvCh6AY0ty7Eb80TDU63w5859M9d/0oRUoFVUw3HkeWDPnGN VTYES4P49VaTD6wmg/f4QApPQSnE4DNM+o8SBlCurqUN+jlGFEedTos+bFFZnPOCA/Bs4fB gjze+//ercvBcz4xHoWQAJGZfldWelupzSPKQELiAia9LZQSWcLqkKKE+HQWh4u2Xm1Dr19 dYEhX5HE1y5n3+B4CXYcHPN90WAnbKo+wcXrdyHjGOb78gLqXmU6M00y8cxOwl8IpcFXE7O iTb5NEF1pcn8Pzy9ENRlPJHv6dfUPNsXzd4QG6TQNUyKhxyfOO/KM/h34jb9OmctmDQroRi xoSgapXLwNfjOERfAg4vYUbn53brH4ugtSxRPcHjHws6ylm6HMtMM8yT4p73WRZsgIYN769 1ZnddJKLjPWHCeve+vjkFQ== X-QQ-GoodBg: 2 X-BIZMAIL-ID: 5780460162900055646 From: juzhe.zhong@rivai.ai To: gcc-patches@gcc.gnu.org Cc: rguenther@suse.de, richard.sandiford@arm.com, Ju-Zhe Zhong Subject: [PATCH V4] VECT: Support loop len control on EXTRACT_LAST vectorization Date: Fri, 11 Aug 2023 22:24:48 +0800 Message-Id: <20230811142448.2913622-1-juzhe.zhong@rivai.ai> X-Mailer: git-send-email 2.36.3 MIME-Version: 1.0 X-QQ-SENDSIZE: 520 Feedback-ID: bizesmtp:rivai.ai:qybglogicsvrgz:qybglogicsvrgz7a-one-0 X-Spam-Status: No, score=-12.1 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_PASS, TXREP, T_SPF_HELO_TEMPERROR autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" From: Ju-Zhe Zhong Hi, Richard and Richi. This patch add support live vectorization by VEC_EXTRACT for LEN loop control. Consider this following case: #include #define EXTRACT_LAST(TYPE) \ TYPE __attribute__ ((noinline, noclone)) \ test_##TYPE (TYPE *x, int n, TYPE value) \ { \ TYPE last; \ for (int j = 0; j < n; ++j) \ { \ last = x[j]; \ x[j] = last * value; \ } \ return last; \ } #define TEST_ALL(T) \ T (uint8_t) \ TEST_ALL (EXTRACT_LAST) ARM SVE IR: Preheader: max_mask_34 = .WHILE_ULT (0, bnd.5_6, { 0, ... }); Loop: ... # loop_mask_22 = PHI ... vect_last_12.8_23 = .MASK_LOAD (_7, 8B, loop_mask_22); vect__4.9_27 = vect_last_12.8_23 * vect_cst__26; .MASK_STORE (_7, 8B, loop_mask_22, vect__4.9_27); ... next_mask_35 = .WHILE_ULT (_1, bnd.5_6, { 0, ... }); ... Epilogue: _25 = .EXTRACT_LAST (loop_mask_22, vect_last_12.8_23); For RVV since we prefer len in loop control, after this patch for RVV: Loop: ... loop_len_22 = SELECT_VL; vect_last_12.8_23 = .MASK_LOAD (_7, 8B, loop_len_22); vect__4.9_27 = vect_last_12.8_23 * vect_cst__26; .MASK_STORE (_7, 8B, loop_len_22, vect__4.9_27); ... Epilogue: _25 = .VEC_EXTRACT (loop_len_22 + bias - 1, vect_last_12.8_23); Details of this approach: 1. Step 1 - Add 'vect_can_vectorize_extract_last_with_len_p' to enable live vectorization for LEN loop control. This function we check whether target support: - Use LEN as the loop control. - Support VEC_EXTRACT optab. 2. Step 2 - Record LEN for loop control if 'vect_can_vectorize_extract_last_with_len_p' is true. 3. Step 3 - Gerenate VEC_EXTRACT (v, LEN + BIAS - 1). The only difference between mask and len is that len is using length generated by SELECT_VL and use VEC_EXTRACT pattern. The rest of the live vectorization is totally the same ARM SVE. gcc/ChangeLog: * tree-vect-loop.cc (vectorizable_live_operation): Add loop len control. --- gcc/tree-vect-loop.cc | 78 ++++++++++++++++++++++++++++++++++--------- 1 file changed, 62 insertions(+), 16 deletions(-) diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index bf8d677b584..a011e2dacb2 100644 --- a/gcc/tree-vect-loop.cc +++ b/gcc/tree-vect-loop.cc @@ -10278,17 +10278,7 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info, /* No transformation required. */ if (loop_vinfo && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)) { - if (!direct_internal_fn_supported_p (IFN_EXTRACT_LAST, vectype, - OPTIMIZE_FOR_SPEED)) - { - if (dump_enabled_p ()) - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, - "can't operate on partial vectors " - "because the target doesn't support extract " - "last reduction.\n"); - LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false; - } - else if (slp_node) + if (slp_node) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, @@ -10308,9 +10298,28 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info, else { gcc_assert (ncopies == 1 && !slp_node); - vect_record_loop_mask (loop_vinfo, - &LOOP_VINFO_MASKS (loop_vinfo), - 1, vectype, NULL); + if (direct_internal_fn_supported_p (IFN_EXTRACT_LAST, vectype, + OPTIMIZE_FOR_SPEED)) + vect_record_loop_mask (loop_vinfo, + &LOOP_VINFO_MASKS (loop_vinfo), + 1, vectype, NULL); + else if (convert_optab_handler (vec_extract_optab, + TYPE_MODE (vectype), + TYPE_MODE (TREE_TYPE (vectype))) + != CODE_FOR_nothing) + vect_record_loop_len (loop_vinfo, + &LOOP_VINFO_LENS (loop_vinfo), + 1, vectype, 1); + else + { + if (dump_enabled_p ()) + dump_printf_loc ( + MSG_MISSED_OPTIMIZATION, vect_location, + "can't operate on partial vectors " + "because the target doesn't support extract " + "last reduction.\n"); + LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false; + } } } /* ??? Enable for loop costing as well. */ @@ -10336,7 +10345,9 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info, gimple *vec_stmt; if (slp_node) { - gcc_assert (!loop_vinfo || !LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)); + gcc_assert (!loop_vinfo + || (!LOOP_VINFO_FULLY_MASKED_P (loop_vinfo) + && !LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))); /* Get the correct slp vectorized stmt. */ vec_lhs = SLP_TREE_VEC_DEFS (slp_node)[vec_entry]; @@ -10380,7 +10391,42 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info, gimple_seq stmts = NULL; tree new_tree; - if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)) + if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo)) + { + /* Emit: + + SCALAR_RES = VEC_EXTRACT + + where VEC_LHS is the vectorized live-out result and MASK is + the loop mask for the final iteration. */ + gcc_assert (ncopies == 1 && !slp_node); + gimple_seq tem = NULL; + gimple_stmt_iterator gsi = gsi_last (tem); + tree len + = vect_get_loop_len (loop_vinfo, &gsi, + &LOOP_VINFO_LENS (loop_vinfo), + 1, vectype, 0, 0); + + /* BIAS - 1. */ + signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo); + tree bias_minus_one + = int_const_binop (MINUS_EXPR, + build_int_cst (TREE_TYPE (len), biasval), + build_one_cst (TREE_TYPE (len))); + + /* LAST_INDEX = LEN + (BIAS - 1). */ + tree last_index = gimple_build (&stmts, PLUS_EXPR, TREE_TYPE (len), + len, bias_minus_one); + + /* SCALAR_RES = VEC_EXTRACT . */ + tree scalar_res + = gimple_build (&stmts, CFN_VEC_EXTRACT, TREE_TYPE (vectype), + vec_lhs_phi, last_index); + + /* Convert the extracted vector element to the scalar type. */ + new_tree = gimple_convert (&stmts, lhs_type, scalar_res); + } + else if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)) { /* Emit: