From patchwork Thu Oct 19 11:46:58 2023
X-Patchwork-Submitter: Richard Biener
X-Patchwork-Id: 1851606
Date: Thu, 19 Oct 2023 11:46:58 +0000 (UTC)
From: Richard Biener
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 1/2] Refactor x86 vectorized gather path
Message-Id: <20231019114713.7CCC6385841D@sourceware.org>

The following moves the builtin-decl gather vectorization path alongside
the internal-function and emulated gather vectorization paths,
simplifying the existing function down to generating the call and the
conversions required for the actual argument types.  This exposes that
path's unique support for offset or data vectors with two times the
number of lanes.  It also makes the code path handle SLP in principle
(but SLP build needs adjustments for this, patch coming).

Bootstrapped and tested on x86_64-unknown-linux-gnu, will push.

Richard.

	* tree-vect-stmts.cc (vect_build_gather_load_calls): Rename
	to ...
	(vect_build_one_gather_load_call): ... this.  Refactor, inline
	widening/narrowing support ...
	(vectorizable_load): ... here, do gather vectorization with
	builtin decls alongside other gather vectorization.
---
 gcc/tree-vect-stmts.cc | 406 ++++++++++++++++++-----------------------
 1 file changed, 179 insertions(+), 227 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index e5ff44c25f1..ee5f56bbbda 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2595,268 +2595,99 @@ vect_build_zero_merge_argument (vec_info *vinfo,
 /* Build a gather load call while vectorizing STMT_INFO.
Insert new instructions before GSI and add them to VEC_STMT. GS_INFO describes the gather load operation. If the load is conditional, MASK is the - unvectorized condition and MASK_DT is its definition type, otherwise - MASK is null. */ + vectorized condition, otherwise MASK is null. PTR is the base + pointer and OFFSET is the vectorized offset. */ -static void -vect_build_gather_load_calls (vec_info *vinfo, stmt_vec_info stmt_info, - gimple_stmt_iterator *gsi, - gimple **vec_stmt, - gather_scatter_info *gs_info, - tree mask, - stmt_vector_for_cost *cost_vec) +static gimple * +vect_build_one_gather_load_call (vec_info *vinfo, stmt_vec_info stmt_info, + gimple_stmt_iterator *gsi, + gather_scatter_info *gs_info, + tree ptr, tree offset, tree mask) { - loop_vec_info loop_vinfo = dyn_cast (vinfo); - class loop *loop = LOOP_VINFO_LOOP (loop_vinfo); tree vectype = STMT_VINFO_VECTYPE (stmt_info); - poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype); - int ncopies = vect_get_num_copies (loop_vinfo, vectype); - edge pe = loop_preheader_edge (loop); - enum { NARROW, NONE, WIDEN } modifier; - poly_uint64 gather_off_nunits - = TYPE_VECTOR_SUBPARTS (gs_info->offset_vectype); - - /* FIXME: Keep the previous costing way in vect_model_load_cost by costing - N scalar loads, but it should be tweaked to use target specific costs - on related gather load calls. 
*/ - if (cost_vec) - { - unsigned int assumed_nunits = vect_nunits_for_cost (vectype); - unsigned int inside_cost; - inside_cost = record_stmt_cost (cost_vec, ncopies * assumed_nunits, - scalar_load, stmt_info, 0, vect_body); - if (dump_enabled_p ()) - dump_printf_loc (MSG_NOTE, vect_location, - "vect_model_load_cost: inside_cost = %d, " - "prologue_cost = 0 .\n", - inside_cost); - return; - } - tree arglist = TYPE_ARG_TYPES (TREE_TYPE (gs_info->decl)); tree rettype = TREE_TYPE (TREE_TYPE (gs_info->decl)); tree srctype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist); - tree ptrtype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist); + /* ptrtype */ arglist = TREE_CHAIN (arglist); tree idxtype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist); tree masktype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist); tree scaletype = TREE_VALUE (arglist); - tree real_masktype = masktype; + tree var; gcc_checking_assert (types_compatible_p (srctype, rettype) && (!mask || TREE_CODE (masktype) == INTEGER_TYPE || types_compatible_p (srctype, masktype))); - if (mask) - masktype = truth_type_for (srctype); - - tree mask_halftype = masktype; - tree perm_mask = NULL_TREE; - tree mask_perm_mask = NULL_TREE; - if (known_eq (nunits, gather_off_nunits)) - modifier = NONE; - else if (known_eq (nunits * 2, gather_off_nunits)) - { - modifier = WIDEN; - /* Currently widening gathers and scatters are only supported for - fixed-length vectors. 
*/ - int count = gather_off_nunits.to_constant (); - vec_perm_builder sel (count, count, 1); - for (int i = 0; i < count; ++i) - sel.quick_push (i | (count / 2)); - - vec_perm_indices indices (sel, 1, count); - perm_mask = vect_gen_perm_mask_checked (gs_info->offset_vectype, - indices); - } - else if (known_eq (nunits, gather_off_nunits * 2)) + tree op = offset; + if (!useless_type_conversion_p (idxtype, TREE_TYPE (op))) { - modifier = NARROW; - - /* Currently narrowing gathers and scatters are only supported for - fixed-length vectors. */ - int count = nunits.to_constant (); - vec_perm_builder sel (count, count, 1); - sel.quick_grow (count); - for (int i = 0; i < count; ++i) - sel[i] = i < count / 2 ? i : i + count / 2; - vec_perm_indices indices (sel, 2, count); - perm_mask = vect_gen_perm_mask_checked (vectype, indices); - - ncopies *= 2; - - if (mask && VECTOR_TYPE_P (real_masktype)) - { - for (int i = 0; i < count; ++i) - sel[i] = i | (count / 2); - indices.new_vector (sel, 2, count); - mask_perm_mask = vect_gen_perm_mask_checked (masktype, indices); - } - else if (mask) - mask_halftype = truth_type_for (gs_info->offset_vectype); - } - else - gcc_unreachable (); - - tree scalar_dest = gimple_get_lhs (stmt_info->stmt); - tree vec_dest = vect_create_destination_var (scalar_dest, vectype); - - tree ptr = fold_convert (ptrtype, gs_info->base); - if (!is_gimple_min_invariant (ptr)) - { - gimple_seq seq; - ptr = force_gimple_operand (ptr, &seq, true, NULL_TREE); - basic_block new_bb = gsi_insert_seq_on_edge_immediate (pe, seq); - gcc_assert (!new_bb); + gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op)), + TYPE_VECTOR_SUBPARTS (idxtype))); + var = vect_get_new_ssa_name (idxtype, vect_simple_var); + op = build1 (VIEW_CONVERT_EXPR, idxtype, op); + gassign *new_stmt = gimple_build_assign (var, VIEW_CONVERT_EXPR, op); + vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi); + op = var; } - tree scale = build_int_cst (scaletype, gs_info->scale); - - tree 
vec_oprnd0 = NULL_TREE; - tree vec_mask = NULL_TREE; tree src_op = NULL_TREE; tree mask_op = NULL_TREE; - tree prev_res = NULL_TREE; - - if (!mask) - { - src_op = vect_build_zero_merge_argument (vinfo, stmt_info, rettype); - mask_op = vect_build_all_ones_mask (vinfo, stmt_info, masktype); - } - - auto_vec vec_oprnds0; - auto_vec vec_masks; - vect_get_vec_defs_for_operand (vinfo, stmt_info, - modifier == WIDEN ? ncopies / 2 : ncopies, - gs_info->offset, &vec_oprnds0); if (mask) - vect_get_vec_defs_for_operand (vinfo, stmt_info, - modifier == NARROW ? ncopies / 2 : ncopies, - mask, &vec_masks, masktype); - for (int j = 0; j < ncopies; ++j) { - tree op, var; - if (modifier == WIDEN && (j & 1)) - op = permute_vec_elements (vinfo, vec_oprnd0, vec_oprnd0, - perm_mask, stmt_info, gsi); - else - op = vec_oprnd0 = vec_oprnds0[modifier == WIDEN ? j / 2 : j]; - - if (!useless_type_conversion_p (idxtype, TREE_TYPE (op))) - { - gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op)), - TYPE_VECTOR_SUBPARTS (idxtype))); - var = vect_get_new_ssa_name (idxtype, vect_simple_var); - op = build1 (VIEW_CONVERT_EXPR, idxtype, op); - gassign *new_stmt = gimple_build_assign (var, VIEW_CONVERT_EXPR, op); - vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi); - op = var; - } - - if (mask) + if (!useless_type_conversion_p (masktype, TREE_TYPE (mask))) { - if (mask_perm_mask && (j & 1)) - mask_op = permute_vec_elements (vinfo, mask_op, mask_op, - mask_perm_mask, stmt_info, gsi); - else - { - if (modifier == NARROW) - { - if ((j & 1) == 0) - vec_mask = vec_masks[j / 2]; - } - else - vec_mask = vec_masks[j]; - - mask_op = vec_mask; - if (!useless_type_conversion_p (masktype, TREE_TYPE (vec_mask))) - { - poly_uint64 sub1 = TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask_op)); - poly_uint64 sub2 = TYPE_VECTOR_SUBPARTS (masktype); - gcc_assert (known_eq (sub1, sub2)); - var = vect_get_new_ssa_name (masktype, vect_simple_var); - mask_op = build1 (VIEW_CONVERT_EXPR, masktype, mask_op); - 
gassign *new_stmt - = gimple_build_assign (var, VIEW_CONVERT_EXPR, mask_op); - vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi); - mask_op = var; - } - } - if (modifier == NARROW && !VECTOR_TYPE_P (real_masktype)) - { - var = vect_get_new_ssa_name (mask_halftype, vect_simple_var); - gassign *new_stmt - = gimple_build_assign (var, (j & 1) ? VEC_UNPACK_HI_EXPR - : VEC_UNPACK_LO_EXPR, - mask_op); - vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi); - mask_op = var; - } - src_op = mask_op; - } - - tree mask_arg = mask_op; - if (masktype != real_masktype) - { - tree utype, optype = TREE_TYPE (mask_op); - if (VECTOR_TYPE_P (real_masktype) - || TYPE_MODE (real_masktype) == TYPE_MODE (optype)) - utype = real_masktype; + tree utype, optype = TREE_TYPE (mask); + if (VECTOR_TYPE_P (masktype) + || TYPE_MODE (masktype) == TYPE_MODE (optype)) + utype = masktype; else utype = lang_hooks.types.type_for_mode (TYPE_MODE (optype), 1); var = vect_get_new_ssa_name (utype, vect_scalar_var); - mask_arg = build1 (VIEW_CONVERT_EXPR, utype, mask_op); + tree mask_arg = build1 (VIEW_CONVERT_EXPR, utype, mask); gassign *new_stmt - = gimple_build_assign (var, VIEW_CONVERT_EXPR, mask_arg); + = gimple_build_assign (var, VIEW_CONVERT_EXPR, mask_arg); vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi); mask_arg = var; - if (!useless_type_conversion_p (real_masktype, utype)) + if (!useless_type_conversion_p (masktype, utype)) { gcc_assert (TYPE_PRECISION (utype) - <= TYPE_PRECISION (real_masktype)); - var = vect_get_new_ssa_name (real_masktype, vect_scalar_var); + <= TYPE_PRECISION (masktype)); + var = vect_get_new_ssa_name (masktype, vect_scalar_var); new_stmt = gimple_build_assign (var, NOP_EXPR, mask_arg); vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi); mask_arg = var; } src_op = build_zero_cst (srctype); - } - gimple *new_stmt = gimple_build_call (gs_info->decl, 5, src_op, ptr, op, - mask_arg, scale); - - if (!useless_type_conversion_p (vectype, 
rettype)) - { - gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (vectype), - TYPE_VECTOR_SUBPARTS (rettype))); - op = vect_get_new_ssa_name (rettype, vect_simple_var); - gimple_call_set_lhs (new_stmt, op); - vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi); - var = make_ssa_name (vec_dest); - op = build1 (VIEW_CONVERT_EXPR, vectype, op); - new_stmt = gimple_build_assign (var, VIEW_CONVERT_EXPR, op); - vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi); + mask_op = mask_arg; } else { - var = make_ssa_name (vec_dest, new_stmt); - gimple_call_set_lhs (new_stmt, var); - vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi); + src_op = mask; + mask_op = mask; } + } + else + { + src_op = vect_build_zero_merge_argument (vinfo, stmt_info, rettype); + mask_op = vect_build_all_ones_mask (vinfo, stmt_info, masktype); + } - if (modifier == NARROW) - { - if ((j & 1) == 0) - { - prev_res = var; - continue; - } - var = permute_vec_elements (vinfo, prev_res, var, perm_mask, - stmt_info, gsi); - new_stmt = SSA_NAME_DEF_STMT (var); - } + tree scale = build_int_cst (scaletype, gs_info->scale); + gimple *new_stmt = gimple_build_call (gs_info->decl, 5, src_op, ptr, op, + mask_op, scale); - STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt); + if (!useless_type_conversion_p (vectype, rettype)) + { + gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (vectype), + TYPE_VECTOR_SUBPARTS (rettype))); + op = vect_get_new_ssa_name (rettype, vect_simple_var); + gimple_call_set_lhs (new_stmt, op); + vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi); + op = build1 (VIEW_CONVERT_EXPR, vectype, op); + new_stmt = gimple_build_assign (NULL_TREE, VIEW_CONVERT_EXPR, op); } - *vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0]; + + return new_stmt; } /* Build a scatter store call while vectorizing STMT_INFO. 
Insert new @@ -10112,13 +9943,6 @@ vectorizable_load (vec_info *vinfo, dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_info), *first_dr_info = NULL; ensure_base_align (dr_info); - if (memory_access_type == VMAT_GATHER_SCATTER && gs_info.decl) - { - vect_build_gather_load_calls (vinfo, stmt_info, gsi, vec_stmt, &gs_info, - mask, cost_vec); - return true; - } - if (memory_access_type == VMAT_INVARIANT) { gcc_assert (!grouped_load && !mask && !bb_vinfo); @@ -11016,6 +10840,134 @@ vectorizable_load (vec_info *vinfo, new_stmt = call; data_ref = NULL_TREE; } + else if (gs_info.decl) + { + /* The builtin decls path for gather is legacy, x86 only. */ + gcc_assert (!final_len && nunits.is_constant ()); + if (costing_p) + { + unsigned int cnunits = vect_nunits_for_cost (vectype); + inside_cost + = record_stmt_cost (cost_vec, cnunits, scalar_load, + stmt_info, 0, vect_body); + continue; + } + poly_uint64 offset_nunits + = TYPE_VECTOR_SUBPARTS (gs_info.offset_vectype); + if (known_eq (nunits, offset_nunits)) + { + new_stmt = vect_build_one_gather_load_call + (vinfo, stmt_info, gsi, &gs_info, + dataref_ptr, vec_offsets[vec_num * j + i], + final_mask); + data_ref = NULL_TREE; + } + else if (known_eq (nunits, offset_nunits * 2)) + { + /* We have a offset vector with half the number of + lanes but the builtins will produce full vectype + data with just the lower lanes filled. */ + new_stmt = vect_build_one_gather_load_call + (vinfo, stmt_info, gsi, &gs_info, + dataref_ptr, vec_offsets[2 * vec_num * j + 2 * i], + final_mask); + tree low = make_ssa_name (vectype); + gimple_set_lhs (new_stmt, low); + vect_finish_stmt_generation (vinfo, stmt_info, + new_stmt, gsi); + + /* now put upper half of final_mask in final_mask low. 
*/ + if (final_mask + && !SCALAR_INT_MODE_P + (TYPE_MODE (TREE_TYPE (final_mask)))) + { + int count = nunits.to_constant (); + vec_perm_builder sel (count, count, 1); + sel.quick_grow (count); + for (int i = 0; i < count; ++i) + sel[i] = i | (count / 2); + vec_perm_indices indices (sel, 2, count); + tree perm_mask = vect_gen_perm_mask_checked + (TREE_TYPE (final_mask), indices); + new_stmt = gimple_build_assign (NULL_TREE, + VEC_PERM_EXPR, + final_mask, + final_mask, + perm_mask); + final_mask = make_ssa_name (TREE_TYPE (final_mask)); + gimple_set_lhs (new_stmt, final_mask); + vect_finish_stmt_generation (vinfo, stmt_info, + new_stmt, gsi); + } + else if (final_mask) + { + new_stmt = gimple_build_assign (NULL_TREE, + VEC_UNPACK_HI_EXPR, + final_mask); + final_mask = make_ssa_name + (truth_type_for (gs_info.offset_vectype)); + gimple_set_lhs (new_stmt, final_mask); + vect_finish_stmt_generation (vinfo, stmt_info, + new_stmt, gsi); + } + + new_stmt = vect_build_one_gather_load_call + (vinfo, stmt_info, gsi, &gs_info, + dataref_ptr, + vec_offsets[2 * vec_num * j + 2 * i + 1], + final_mask); + tree high = make_ssa_name (vectype); + gimple_set_lhs (new_stmt, high); + vect_finish_stmt_generation (vinfo, stmt_info, + new_stmt, gsi); + + /* compose low + high. */ + int count = nunits.to_constant (); + vec_perm_builder sel (count, count, 1); + sel.quick_grow (count); + for (int i = 0; i < count; ++i) + sel[i] = i < count / 2 ? i : i + count / 2; + vec_perm_indices indices (sel, 2, count); + tree perm_mask + = vect_gen_perm_mask_checked (vectype, indices); + new_stmt = gimple_build_assign (NULL_TREE, + VEC_PERM_EXPR, + low, high, perm_mask); + data_ref = NULL_TREE; + } + else if (known_eq (nunits * 2, offset_nunits)) + { + /* We have a offset vector with double the number of + lanes. Select the low/high part accordingly. 
*/ + vec_offset = vec_offsets[(vec_num * j + i) / 2]; + if ((vec_num * j + i) & 1) + { + int count = offset_nunits.to_constant (); + vec_perm_builder sel (count, count, 1); + sel.quick_grow (count); + for (int i = 0; i < count; ++i) + sel[i] = i | (count / 2); + vec_perm_indices indices (sel, 2, count); + tree perm_mask = vect_gen_perm_mask_checked + (TREE_TYPE (vec_offset), indices); + new_stmt = gimple_build_assign (NULL_TREE, + VEC_PERM_EXPR, + vec_offset, + vec_offset, + perm_mask); + vec_offset = make_ssa_name (TREE_TYPE (vec_offset)); + gimple_set_lhs (new_stmt, vec_offset); + vect_finish_stmt_generation (vinfo, stmt_info, + new_stmt, gsi); + } + new_stmt = vect_build_one_gather_load_call + (vinfo, stmt_info, gsi, &gs_info, + dataref_ptr, vec_offset, final_mask); + data_ref = NULL_TREE; + } + else + gcc_unreachable (); + } else { /* Emulated gather-scatter. */

From patchwork Thu Oct 19 11:47:14 2023
X-Patchwork-Submitter: Richard Biener
X-Patchwork-Id: 1851607
Date: Thu, 19 Oct 2023 11:47:14 +0000 (UTC)
From: Richard Biener
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 2/2] tree-optimization/111131 - SLP for non-IFN gathers
Message-Id: <20231019114751.C65893857718@sourceware.org>

The following implements SLP vectorization support for gathers without
relying on IFNs being pattern detected (and supported by the target).
That includes support for emulated gathers but also the legacy x86
builtin path.

Bootstrapped and tested on x86_64-unknown-linux-gnu, will push.

Richard.

	PR tree-optimization/111131
	* tree-vect-loop.cc (update_epilogue_loop_vinfo): Make sure
	to update all gather/scatter stmt DRs, not only those that
	eventually got VMAT_GATHER_SCATTER set.
* tree-vect-slp.cc (_slp_oprnd_info::first_gs_info): Add. (vect_get_and_check_slp_defs): Handle gathers/scatters, adding the offset as SLP operand and comparing base and scale. (vect_build_slp_tree_1): Handle gathers. (vect_build_slp_tree_2): Likewise. * gcc.dg/vect/vect-gather-1.c: Now expected to vectorize everywhere. * gcc.dg/vect/vect-gather-2.c: Expected to not SLP anywhere. Massage the scale case to more reliably produce a different one. Scan for the specific messages. * gcc.dg/vect/vect-gather-3.c: Masked gather is also supported for AVX2, but not emulated. * gcc.dg/vect/vect-gather-4.c: Expected to not SLP anywhere. Massage to more properly ensure this. * gcc.dg/vect/tsvc/vect-tsvc-s353.c: Expect to vectorize everywhere. --- .../gcc.dg/vect/tsvc/vect-tsvc-s353.c | 2 +- gcc/testsuite/gcc.dg/vect/vect-gather-1.c | 2 +- gcc/testsuite/gcc.dg/vect/vect-gather-2.c | 13 ++++-- gcc/testsuite/gcc.dg/vect/vect-gather-3.c | 2 +- gcc/testsuite/gcc.dg/vect/vect-gather-4.c | 6 +-- gcc/tree-vect-loop.cc | 6 ++- gcc/tree-vect-slp.cc | 45 +++++++++++++++++-- 7 files changed, 61 insertions(+), 15 deletions(-) diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s353.c b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s353.c index 98ba7522471..2c4fa3f5991 100644 --- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s353.c +++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s353.c @@ -44,4 +44,4 @@ int main (int argc, char **argv) return 0; } -/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { ! 
riscv_v } } } } */ +/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-1.c b/gcc/testsuite/gcc.dg/vect/vect-gather-1.c index e3bbf5c0bf8..5f6640d9ab6 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-gather-1.c +++ b/gcc/testsuite/gcc.dg/vect/vect-gather-1.c @@ -58,4 +58,4 @@ main (void) return 0; } -/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" vect { target vect_gather_load_ifn } } } */ +/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" vect } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-2.c b/gcc/testsuite/gcc.dg/vect/vect-gather-2.c index a1f6ba458a9..4c23b808333 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-gather-2.c +++ b/gcc/testsuite/gcc.dg/vect/vect-gather-2.c @@ -8,6 +8,7 @@ f1 (int *restrict y, int *restrict x1, int *restrict x2, { for (int i = 0; i < N; ++i) { + /* Different base. */ y[i * 2] = x1[indices[i * 2]] + 1; y[i * 2 + 1] = x2[indices[i * 2 + 1]] + 2; } @@ -18,8 +19,9 @@ f2 (int *restrict y, int *restrict x, int *restrict indices) { for (int i = 0; i < N; ++i) { - y[i * 2] = x[indices[i * 2]] + 1; - y[i * 2 + 1] = x[indices[i * 2 + 1] * 2] + 2; + /* Different scale. */ + y[i * 2] = *(int *)((char *)x + (__UINTPTR_TYPE__)indices[i * 2] * 4) + 1; + y[i * 2 + 1] = *(int *)((char *)x + (__UINTPTR_TYPE__)indices[i * 2 + 1] * 2) + 2; } } @@ -28,9 +30,12 @@ f3 (int *restrict y, int *restrict x, int *restrict indices) { for (int i = 0; i < N; ++i) { + /* Different type. */ y[i * 2] = x[indices[i * 2]] + 1; - y[i * 2 + 1] = x[(unsigned int) indices[i * 2 + 1]] + 2; + y[i * 2 + 1] = x[((unsigned int *) indices)[i * 2 + 1]] + 2; } } -/* { dg-final { scan-tree-dump-not "Loop contains only SLP stmts" vect { target vect_gather_load_ifn } } } */ +/* { dg-final { scan-tree-dump-not "Loop contains only SLP stmts" vect } } */ +/* { dg-final { scan-tree-dump "different gather base" vect { target { ! 
vect_gather_load_ifn } } } } */ +/* { dg-final { scan-tree-dump "different gather scale" vect { target { ! vect_gather_load_ifn } } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-3.c b/gcc/testsuite/gcc.dg/vect/vect-gather-3.c index adfef3bf407..30ba6789e03 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-gather-3.c +++ b/gcc/testsuite/gcc.dg/vect/vect-gather-3.c @@ -62,4 +62,4 @@ main (void) return 0; } -/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" vect { target { vect_gather_load_ifn && vect_masked_load } } } } */ +/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" vect { target { { vect_gather_load_ifn || avx2 } && vect_masked_load } } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-4.c b/gcc/testsuite/gcc.dg/vect/vect-gather-4.c index ee2e4e4999a..1ce63e69199 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-gather-4.c +++ b/gcc/testsuite/gcc.dg/vect/vect-gather-4.c @@ -39,10 +39,10 @@ f3 (int *restrict y, int *restrict x, int *restrict indices) y[i * 2] = (indices[i * 2] < N * 2 ? x[indices[i * 2]] + 1 : 1); - y[i * 2 + 1] = (indices[i * 2 + 1] < N * 2 - ? x[(unsigned int) indices[i * 2 + 1]] + 2 + y[i * 2 + 1] = (((unsigned int *)indices)[i * 2 + 1] < N * 2 + ? x[((unsigned int *) indices)[i * 2 + 1]] + 2 : 2); } } -/* { dg-final { scan-tree-dump-not "Loop contains only SLP stmts" vect { target vect_gather_load_ifn } } } */ +/* { dg-final { scan-tree-dump-not "Loop contains only SLP stmts" vect } } */ diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index ebab1953b9c..8877ebde246 100644 --- a/gcc/tree-vect-loop.cc +++ b/gcc/tree-vect-loop.cc @@ -11362,8 +11362,7 @@ update_epilogue_loop_vinfo (class loop *epilogue, tree advance) updated offset we set using ADVANCE. Instead we have to make sure the reference in the data references point to the corresponding copy of the original in the epilogue. 
*/ - if (STMT_VINFO_MEMORY_ACCESS_TYPE (vect_stmt_to_vectorize (stmt_vinfo)) - == VMAT_GATHER_SCATTER) + if (STMT_VINFO_GATHER_SCATTER_P (vect_stmt_to_vectorize (stmt_vinfo))) { DR_REF (dr) = simplify_replace_tree (DR_REF (dr), NULL_TREE, NULL_TREE, @@ -11372,6 +11371,9 @@ update_epilogue_loop_vinfo (class loop *epilogue, tree advance) = simplify_replace_tree (DR_BASE_ADDRESS (dr), NULL_TREE, NULL_TREE, &find_in_mapping, &mapping); } + else + gcc_assert (STMT_VINFO_MEMORY_ACCESS_TYPE (vect_stmt_to_vectorize (stmt_vinfo)) + != VMAT_GATHER_SCATTER); DR_STMT (dr) = STMT_VINFO_STMT (stmt_vinfo); stmt_vinfo->dr_aux.stmt = stmt_vinfo; /* The vector size of the epilogue is smaller than that of the main loop diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc index d081999a763..8efff2e912d 100644 --- a/gcc/tree-vect-slp.cc +++ b/gcc/tree-vect-slp.cc @@ -283,10 +283,11 @@ typedef struct _slp_oprnd_info vec ops; /* Information about the first statement, its vector def-type, type, the operand itself in case it's constant, and an indication if it's a pattern - stmt. */ + stmt and gather/scatter info. */ tree first_op_type; enum vect_def_type first_dt; bool any_pattern; + gather_scatter_info first_gs_info; } *slp_oprnd_info; @@ -609,6 +610,7 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned char swap, unsigned int i, number_of_oprnds; enum vect_def_type dt = vect_uninitialized_def; slp_oprnd_info oprnd_info; + gather_scatter_info gs_info; unsigned int commutative_op = -1U; bool first = stmt_num == 0; @@ -660,6 +662,19 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned char swap, oprnd_info = (*oprnds_info)[i]; + if (STMT_VINFO_GATHER_SCATTER_P (stmt_info)) + { + gcc_assert (number_of_oprnds == 1); + if (!is_a (vinfo) + || !vect_check_gather_scatter (stmt_info, + as_a (vinfo), + first ? &oprnd_info->first_gs_info + : &gs_info)) + return -1; + + oprnd = first ? 
oprnd_info->first_gs_info.offset : gs_info.offset; + } + stmt_vec_info def_stmt_info; if (!vect_is_simple_use (oprnd, vinfo, &dts[i], &def_stmt_info)) { @@ -792,6 +807,25 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned char swap, return 1; } + if (STMT_VINFO_GATHER_SCATTER_P (stmt_info)) + { + if (!operand_equal_p (oprnd_info->first_gs_info.base, + gs_info.base)) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "Build SLP failed: different gather base\n"); + return 1; + } + if (oprnd_info->first_gs_info.scale != gs_info.scale) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "Build SLP failed: different gather scale\n"); + return 1; + } + } + /* Not first stmt of the group, check that the def-stmt/s match the def-stmt/s of the first stmt. Allow different definition types for reduction chains: the first stmt must be a @@ -1235,6 +1269,9 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap, || rhs_code == INDIRECT_REF || rhs_code == COMPONENT_REF || rhs_code == MEM_REF))) + || (ldst_p + && (STMT_VINFO_GATHER_SCATTER_P (stmt_info) + != STMT_VINFO_GATHER_SCATTER_P (first_stmt_info))) || first_stmt_ldst_p != ldst_p || first_stmt_phi_p != phi_p) { @@ -1357,12 +1394,12 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap, if (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info)) && rhs_code != CFN_GATHER_LOAD && rhs_code != CFN_MASK_GATHER_LOAD + && !STMT_VINFO_GATHER_SCATTER_P (stmt_info) /* Not grouped loads are handled as externals for BB vectorization. For loop vectorization we can handle splats the same we handle single element interleaving. */ && (is_a (vinfo) - || stmt_info != first_stmt_info - || STMT_VINFO_GATHER_SCATTER_P (stmt_info))) + || stmt_info != first_stmt_info)) { /* Not grouped load. 
*/ if (dump_enabled_p ()) @@ -1858,6 +1895,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node, gcc_assert (gimple_call_internal_p (stmt, IFN_MASK_LOAD) || gimple_call_internal_p (stmt, IFN_GATHER_LOAD) || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD)); + else if (STMT_VINFO_GATHER_SCATTER_P (stmt_info)) + gcc_assert (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info))); else { *max_nunits = this_max_nunits;