From patchwork Wed Nov 8 15:02:54 2023
X-Patchwork-Submitter: Richard Biener
X-Patchwork-Id: 1861639
Date: Wed, 8 Nov 2023 16:02:54 +0100 (CET)
From: Richard Biener
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 1/4] Fix SLP of masked loads
Message-Id: <20231108150254.8F9F9133F5@imap2.suse-dmz.suse.de>

The following adjusts things to use the correct mask operand for the SLP
of masked loads and gathers.  Test coverage is from runtime failures of
i386-specific AVX512 tests when enabling single-lane SLP.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

	* tree-vect-stmts.cc (vectorizable_load): Use the correct
	vectorized mask operand.
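
As a reading aid (not part of the patch), the indexing change can be
pictured with a small standalone sketch.  It assumes the vectorized masks
sit in one flat array with an entry per (copy j, vector i) pair, which is
what the new vec_masks[vec_num * j + i] index suggests.  The names
flat_index and visited_indices are hypothetical helpers, and nothing here
is GCC code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: position of the entry for copy J, vector I in a
// flat array that stores VEC_NUM entries per copy (an assumed layout).
std::size_t
flat_index (std::size_t vec_num, std::size_t j, std::size_t i)
{
  return vec_num * j + i;
}

// Hypothetical helper: record which array entry each (j, i) iteration
// would read.  With PER_VECTOR false this mimics indexing by the copy
// number alone; with PER_VECTOR true it mimics the flattened index.
std::vector<std::size_t>
visited_indices (std::size_t ncopies, std::size_t vec_num, bool per_vector)
{
  std::vector<std::size_t> out;
  for (std::size_t j = 0; j < ncopies; ++j)
    for (std::size_t i = 0; i < vec_num; ++i)
      out.push_back (per_vector ? flat_index (vec_num, j, i) : j);
  return out;
}
```

In this sketch, two copies of two vectors each read entries 0, 0, 1, 1
when indexed by the copy number alone and entries 0, 1, 2, 3 with the
flattened index.  Both forms agree whenever vec_num is 1.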
---
 gcc/tree-vect-stmts.cc | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 65883e04ad7..096a857f2dd 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -10920,9 +10920,6 @@ vectorizable_load (vec_info *vinfo,
                                            gsi, stmt_info, bump);
             }

-          if (mask && !costing_p)
-            vec_mask = vec_masks[j];
-
           gimple *new_stmt = NULL;
           for (i = 0; i < vec_num; i++)
             {
@@ -10931,6 +10928,8 @@ vectorizable_load (vec_info *vinfo,
               tree bias = NULL_TREE;
               if (!costing_p)
                 {
+                  if (mask)
+                    vec_mask = vec_masks[vec_num * j + i];
                   if (loop_masks)
                     final_mask
                       = vect_get_loop_mask (loop_vinfo, gsi, loop_masks,
@@ -11285,8 +11284,6 @@ vectorizable_load (vec_info *vinfo,
                                           at_loop, offset, &dummy, gsi,
                                           &ptr_incr, simd_lane_access_p,
                                           bump);
-              if (mask)
-                vec_mask = vec_masks[0];
             }
           else if (!costing_p)
             {
@@ -11297,8 +11294,6 @@ vectorizable_load (vec_info *vinfo,
               else
                 dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, ptr_incr, gsi,
                                                stmt_info, bump);
-              if (mask)
-                vec_mask = vec_masks[j];
             }

           if (grouped_load || slp_perm)
@@ -11312,6 +11307,8 @@ vectorizable_load (vec_info *vinfo,
               tree bias = NULL_TREE;
               if (!costing_p)
                 {
+                  if (mask)
+                    vec_mask = vec_masks[vec_num * j + i];
                   if (loop_masks)
                     final_mask = vect_get_loop_mask (loop_vinfo, gsi, loop_masks,
                                                      vec_num * ncopies, vectype,

From patchwork Wed Nov 8 15:03:09 2023
X-Patchwork-Submitter: Richard Biener
X-Patchwork-Id: 1861640
Date: Wed, 8 Nov 2023 16:03:09 +0100 (CET)
From: Richard Biener
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 2/4] TLC to vect_check_store_rhs and vect_slp_child_index_for_operand
Message-Id: <20231108150309.85118133F5@imap2.suse-dmz.suse.de>

This prepares us for the SLP of scatters.  We have to tell
vect_slp_child_index_for_operand whether we are dealing with a
scatter/gather stmt, so this adds an argument similar to the one we
have for vect_get_operand_map.  This also refactors
vect_check_store_rhs to get the actual rhs and the associated SLP node
instead of leaving that to the caller.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

	* tree-vectorizer.h (vect_slp_child_index_for_operand): Add
	gather_scatter_p argument.
	* tree-vect-slp.cc (vect_slp_child_index_for_operand): Likewise.
	Pass it on.
	* tree-vect-stmts.cc (vect_check_store_rhs): Turn the rhs
	argument into an output, also output the SLP node associated
	with it.
	(vectorizable_simd_clone_call): Adjust.
	(vectorizable_store): Likewise.
	(vectorizable_load): Likewise.
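
As a reading aid (not part of the patch), here is a standalone sketch of
why the child-index lookup needs to know about gather/scatter.  It
assumes an operand map is an array whose first element is the number of
SLP children and whose remaining elements are statement operand numbers
in child order, which is what the loop bound 1 + opmap[0] in the patch
suggests.  The two maps, get_operand_map and the "return i - 1" match
step are hypothetical illustrations, not copied from GCC.

```cpp
// Hypothetical operand maps.  Element 0 is the number of SLP children;
// the remaining elements give the statement operand number for each
// child, in child order.
static const int plain_map[] = { 2, 3, 1 };
static const int gather_scatter_map[] = { 3, 0, 3, 1 };

// Hypothetical stand-in for the map selection: a gather/scatter
// statement gets a different map than a plain one.
const int *
get_operand_map (bool gather_scatter_p)
{
  return gather_scatter_p ? gather_scatter_map : plain_map;
}

// Return the SLP child index for statement operand OP, or -1 if the
// selected map has no child for it (assumed behaviour for this sketch).
int
child_index_for_operand (int op, bool gather_scatter_p)
{
  const int *opmap = get_operand_map (gather_scatter_p);
  for (int i = 1; i < 1 + opmap[0]; ++i)
    if (opmap[i] == op)
      return i - 1;
  return -1;
}
```

With these made-up maps, operand 3 is child 0 for a plain statement but
child 1 for a gather/scatter statement, so a lookup that always consults
the plain map would return the wrong child.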
---
 gcc/tree-vect-slp.cc   |  5 ++--
 gcc/tree-vect-stmts.cc | 52 ++++++++++++++++++++++--------------------
 gcc/tree-vectorizer.h  |  2 +-
 3 files changed, 31 insertions(+), 28 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 13137ede8d4..176aaf270f4 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -589,9 +589,10 @@ vect_get_operand_map (const gimple *stmt, bool gather_scatter_p = false,
 /* Return the SLP node child index for operand OP of STMT.  */

 int
-vect_slp_child_index_for_operand (const gimple *stmt, int op)
+vect_slp_child_index_for_operand (const gimple *stmt, int op,
+                                  bool gather_scatter_p)
 {
-  const int *opmap = vect_get_operand_map (stmt);
+  const int *opmap = vect_get_operand_map (stmt, gather_scatter_p);
   if (!opmap)
     return op;
   for (int i = 1; i < 1 + opmap[0]; ++i)
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 096a857f2dd..61e23b29516 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2486,42 +2486,33 @@ vect_check_scalar_mask (vec_info *vinfo, stmt_vec_info stmt_info,
   return true;
 }

-/* Return true if stored value RHS is suitable for vectorizing store
-   statement STMT_INFO.  When returning true, store the type of the
-   definition in *RHS_DT_OUT, the type of the vectorized store value in
+/* Return true if stored value is suitable for vectorizing store
+   statement STMT_INFO.  When returning true, store the scalar stored
+   in *RHS and *RHS_NODE, the type of the definition in *RHS_DT_OUT,
+   the type of the vectorized store value in
    *RHS_VECTYPE_OUT and the type of the store in *VLS_TYPE_OUT.  */

 static bool
 vect_check_store_rhs (vec_info *vinfo, stmt_vec_info stmt_info,
-                      slp_tree slp_node, tree rhs,
+                      slp_tree slp_node, tree *rhs, slp_tree *rhs_node,
                       vect_def_type *rhs_dt_out, tree *rhs_vectype_out,
                       vec_load_store_type *vls_type_out)
 {
-  /* In the case this is a store from a constant make sure
-     native_encode_expr can handle it.  */
-  if (CONSTANT_CLASS_P (rhs) && native_encode_expr (rhs, NULL, 64) == 0)
-    {
-      if (dump_enabled_p ())
-        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                         "cannot encode constant as a byte sequence.\n");
-      return false;
-    }
-
   int op_no = 0;
   if (gcall *call = dyn_cast <gcall *> (stmt_info->stmt))
     {
       if (gimple_call_internal_p (call)
           && internal_store_fn_p (gimple_call_internal_fn (call)))
         op_no = internal_fn_stored_value_index (gimple_call_internal_fn (call));
-      if (slp_node)
-        op_no = vect_slp_child_index_for_operand (call, op_no);
     }
+  if (slp_node)
+    op_no = vect_slp_child_index_for_operand
+              (stmt_info->stmt, op_no, STMT_VINFO_GATHER_SCATTER_P (stmt_info));

   enum vect_def_type rhs_dt;
   tree rhs_vectype;
-  slp_tree slp_op;
   if (!vect_is_simple_use (vinfo, stmt_info, slp_node, op_no,
-                           &rhs, &slp_op, &rhs_dt, &rhs_vectype))
+                           rhs, rhs_node, &rhs_dt, &rhs_vectype))
     {
       if (dump_enabled_p ())
         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -2529,6 +2520,16 @@ vect_check_store_rhs (vec_info *vinfo, stmt_vec_info stmt_info,
       return false;
     }

+  /* In the case this is a store from a constant make sure
+     native_encode_expr can handle it.  */
+  if (CONSTANT_CLASS_P (*rhs) && native_encode_expr (*rhs, NULL, 64) == 0)
+    {
+      if (dump_enabled_p ())
+        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                         "cannot encode constant as a byte sequence.\n");
+      return false;
+    }
+
   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
   if (rhs_vectype && !useless_type_conversion_p (vectype, rhs_vectype))
     {
@@ -4052,7 +4053,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,

       int op_no = i + masked_call_offset;
       if (slp_node)
-        op_no = vect_slp_child_index_for_operand (stmt, op_no);
+        op_no = vect_slp_child_index_for_operand (stmt, op_no, false);
       if (!vect_is_simple_use (vinfo, stmt_info, slp_node,
                                op_no, &op, &slp_op[i],
                                &thisarginfo.dt, &thisarginfo.vectype)
@@ -8173,7 +8174,6 @@ vectorizable_store (vec_info *vinfo,
                     stmt_vector_for_cost *cost_vec)
 {
   tree data_ref;
-  tree op;
   tree vec_oprnd = NULL_TREE;
   tree elem_type;
   loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
@@ -8236,15 +8236,14 @@ vectorizable_store (vec_info *vinfo,

       int mask_index = internal_fn_mask_index (ifn);
       if (mask_index >= 0 && slp_node)
-        mask_index = vect_slp_child_index_for_operand (call, mask_index);
+        mask_index = vect_slp_child_index_for_operand
+                    (call, mask_index, STMT_VINFO_GATHER_SCATTER_P (stmt_info));
       if (mask_index >= 0
           && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
                                       &mask, NULL, &mask_dt, &mask_vectype))
         return false;
     }

-  op = vect_get_store_rhs (stmt_info);
-
   /* Cannot have hybrid store SLP -- that would mean storing to the
      same location twice.  */
   gcc_assert (slp == PURE_SLP_STMT (stmt_info));
@@ -8279,8 +8278,10 @@ vectorizable_store (vec_info *vinfo,
       return false;
     }

+  tree op;
+  slp_tree op_node;
   if (!vect_check_store_rhs (vinfo, stmt_info, slp_node,
-                             op, &rhs_dt, &rhs_vectype, &vls_type))
+                             &op, &op_node, &rhs_dt, &rhs_vectype, &vls_type))
     return false;

   elem_type = TREE_TYPE (vectype);
@@ -9855,7 +9856,8 @@ vectorizable_load (vec_info *vinfo,

       mask_index = internal_fn_mask_index (ifn);
       if (mask_index >= 0 && slp_node)
-        mask_index = vect_slp_child_index_for_operand (call, mask_index);
+        mask_index = vect_slp_child_index_for_operand
+                    (call, mask_index, STMT_VINFO_GATHER_SCATTER_P (stmt_info));
       if (mask_index >= 0
           && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
                                       &mask, &slp_op, &mask_dt, &mask_vectype))
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index d2ddc2e4ad5..e4d7ab4567c 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -2462,7 +2462,7 @@ extern int vect_get_place_in_interleaving_chain (stmt_vec_info, stmt_vec_info);
 extern slp_tree vect_create_new_slp_node (unsigned, tree_code);
 extern void vect_free_slp_tree (slp_tree);
 extern bool compatible_calls_p (gcall *, gcall *);
-extern int vect_slp_child_index_for_operand (const gimple *, int op);
+extern int vect_slp_child_index_for_operand (const gimple *, int op, bool);

 /* In tree-vect-patterns.cc.  */
 extern void

From patchwork Wed Nov 8 15:03:24 2023
X-Patchwork-Submitter: Richard Biener
X-Patchwork-Id: 1861641
Date: Wed, 8 Nov 2023 16:03:24 +0100 (CET)
From: Richard Biener
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 3/4] Fix SLP of emulated gathers
Message-Id: <20231108150324.E6D3A133F5@imap2.suse-dmz.suse.de>

The following fixes an error in the SLP of emulated gathers,
discovered by x86-specific tests when enabling single-lane SLP.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

	* tree-vect-stmts.cc (vectorizable_load): Adjust offset
	vector gathering for SLP of emulated gathers.
---
 gcc/tree-vect-stmts.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 61e23b29516..913a4fb08ed 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -11163,7 +11163,7 @@ vectorizable_load (vec_info *vinfo,
                          than the data vector for now.  */
                       unsigned HOST_WIDE_INT factor
                         = const_offset_nunits / const_nunits;
-                      vec_offset = vec_offsets[j / factor];
+                      vec_offset = vec_offsets[(vec_num * j + i) / factor];
                       unsigned elt_offset = (j % factor) * const_nunits;
                       tree idx_type = TREE_TYPE (TREE_TYPE (vec_offset));
                       tree scale = size_int (gs_info.scale);

From patchwork Wed Nov 8 15:03:36 2023
X-Patchwork-Submitter: Richard Biener
X-Patchwork-Id: 1861642
Date: Wed, 8 Nov 2023 16:03:36 +0100 (CET)
From: Richard Biener
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 4/4] Refactor x86 decl based scatter vectorization, prepare SLP
Message-Id: <20231108150336.96BCC133F5@imap2.suse-dmz.suse.de>

The following refactors the x86 decl-based scatter vectorization
similar to what I did to the gather path.  This prepares scatters for
SLP as well, mainly single-lane since there are multiple missing bits
to support multi-lane scatters.

Tested extensively on the SLP-only branch which has the ability to
force SLP even for single lanes.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

	PR tree-optimization/111133
	* tree-vect-stmts.cc (vect_build_scatter_store_calls): Remove
	and refactor to ...
	(vect_build_one_scatter_store_call): ... this new function.
	(vectorizable_store): Use vect_check_scalar_mask to record
	the SLP node for the mask operand.  Code generate scatters
	with builtin decls from the main scatter vectorization
	path and prepare that for SLP.
---
 gcc/tree-vect-stmts.cc | 683 ++++++++++++++++++++---------------------
 1 file changed, 326 insertions(+), 357 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 913a4fb08ed..f41b4825a6a 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2703,238 +2703,87 @@ vect_build_one_gather_load_call (vec_info *vinfo, stmt_vec_info stmt_info,
 }

 /* Build a scatter store call while vectorizing STMT_INFO.  Insert new
-   instructions before GSI and add them to VEC_STMT.  GS_INFO describes
-   the scatter store operation.  If the store is conditional, MASK is the
-   unvectorized condition, otherwise MASK is null.  */
+   instructions before GSI.  GS_INFO describes the scatter store operation.
+   PTR is the base pointer, OFFSET the vectorized offsets and OPRND the
+   vectorized data to store.
+   If the store is conditional, MASK is the vectorized condition, otherwise
+   MASK is null.  */

-static void
-vect_build_scatter_store_calls (vec_info *vinfo, stmt_vec_info stmt_info,
-                                gimple_stmt_iterator *gsi, gimple **vec_stmt,
-                                gather_scatter_info *gs_info, tree mask,
-                                stmt_vector_for_cost *cost_vec)
+static gimple *
+vect_build_one_scatter_store_call (vec_info *vinfo, stmt_vec_info stmt_info,
+                                   gimple_stmt_iterator *gsi,
+                                   gather_scatter_info *gs_info,
+                                   tree ptr, tree offset, tree oprnd, tree mask)
 {
-  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
-  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
-  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
-  int ncopies = vect_get_num_copies (loop_vinfo, vectype);
-  enum { NARROW, NONE, WIDEN } modifier;
-  poly_uint64 scatter_off_nunits
-    = TYPE_VECTOR_SUBPARTS (gs_info->offset_vectype);
-
-  /* FIXME: Keep the previous costing way in vect_model_store_cost by
-     costing N scalar stores, but it should be tweaked to use target
-     specific costs on related scatter store calls.  */
-  if (cost_vec)
-    {
-      tree op = vect_get_store_rhs (stmt_info);
-      enum vect_def_type dt;
-      gcc_assert (vect_is_simple_use (op, vinfo, &dt));
-      unsigned int inside_cost, prologue_cost = 0;
-      if (dt == vect_constant_def || dt == vect_external_def)
-        prologue_cost += record_stmt_cost (cost_vec, 1, scalar_to_vec,
-                                           stmt_info, 0, vect_prologue);
-      unsigned int assumed_nunits = vect_nunits_for_cost (vectype);
-      inside_cost = record_stmt_cost (cost_vec, ncopies * assumed_nunits,
-                                      scalar_store, stmt_info, 0, vect_body);
-
-      if (dump_enabled_p ())
-        dump_printf_loc (MSG_NOTE, vect_location,
-                         "vect_model_store_cost: inside_cost = %d, "
-                         "prologue_cost = %d .\n",
-                         inside_cost, prologue_cost);
-      return;
-    }
-
-  tree perm_mask = NULL_TREE, mask_halfvectype = NULL_TREE;
-  if (known_eq (nunits, scatter_off_nunits))
-    modifier = NONE;
-  else if (known_eq (nunits * 2, scatter_off_nunits))
-    {
-      modifier = WIDEN;
-
-      /* Currently gathers and scatters are only supported for
-         fixed-length vectors.  */
-      unsigned int count = scatter_off_nunits.to_constant ();
-      vec_perm_builder sel (count, count, 1);
-      for (unsigned i = 0; i < (unsigned int) count; ++i)
-        sel.quick_push (i | (count / 2));
-
-      vec_perm_indices indices (sel, 1, count);
-      perm_mask = vect_gen_perm_mask_checked (gs_info->offset_vectype, indices);
-      gcc_assert (perm_mask != NULL_TREE);
-    }
-  else if (known_eq (nunits, scatter_off_nunits * 2))
-    {
-      modifier = NARROW;
-
-      /* Currently gathers and scatters are only supported for
-         fixed-length vectors.  */
-      unsigned int count = nunits.to_constant ();
-      vec_perm_builder sel (count, count, 1);
-      for (unsigned i = 0; i < (unsigned int) count; ++i)
-        sel.quick_push (i | (count / 2));
-
-      vec_perm_indices indices (sel, 2, count);
-      perm_mask = vect_gen_perm_mask_checked (vectype, indices);
-      gcc_assert (perm_mask != NULL_TREE);
-      ncopies *= 2;
-
-      if (mask)
-        mask_halfvectype = truth_type_for (gs_info->offset_vectype);
-    }
-  else
-    gcc_unreachable ();
-
   tree rettype = TREE_TYPE (TREE_TYPE (gs_info->decl));
   tree arglist = TYPE_ARG_TYPES (TREE_TYPE (gs_info->decl));
-  tree ptrtype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist);
+  /* tree ptrtype = TREE_VALUE (arglist); */ arglist = TREE_CHAIN (arglist);
   tree masktype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist);
   tree idxtype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist);
   tree srctype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist);
   tree scaletype = TREE_VALUE (arglist);
-
   gcc_checking_assert (TREE_CODE (masktype) == INTEGER_TYPE
                        && TREE_CODE (rettype) == VOID_TYPE);

-  tree ptr = fold_convert (ptrtype, gs_info->base);
-  if (!is_gimple_min_invariant (ptr))
+  tree mask_arg = NULL_TREE;
+  if (mask)
     {
-      gimple_seq seq;
-      ptr = force_gimple_operand (ptr, &seq, true, NULL_TREE);
-      class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
-      edge pe = loop_preheader_edge (loop);
-      basic_block new_bb = gsi_insert_seq_on_edge_immediate (pe, seq);
-      gcc_assert (!new_bb);
+      mask_arg = mask;
+
tree optype = TREE_TYPE (mask_arg); + tree utype; + if (TYPE_MODE (masktype) == TYPE_MODE (optype)) + utype = masktype; + else + utype = lang_hooks.types.type_for_mode (TYPE_MODE (optype), 1); + tree var = vect_get_new_ssa_name (utype, vect_scalar_var); + mask_arg = build1 (VIEW_CONVERT_EXPR, utype, mask_arg); + gassign *new_stmt + = gimple_build_assign (var, VIEW_CONVERT_EXPR, mask_arg); + vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi); + mask_arg = var; + if (!useless_type_conversion_p (masktype, utype)) + { + gcc_assert (TYPE_PRECISION (utype) <= TYPE_PRECISION (masktype)); + tree var = vect_get_new_ssa_name (masktype, vect_scalar_var); + new_stmt = gimple_build_assign (var, NOP_EXPR, mask_arg); + vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi); + mask_arg = var; + } } - - tree mask_arg = NULL_TREE; - if (mask == NULL_TREE) + else { mask_arg = build_int_cst (masktype, -1); mask_arg = vect_init_vector (vinfo, stmt_info, mask_arg, masktype, NULL); } - tree scale = build_int_cst (scaletype, gs_info->scale); - - auto_vec vec_oprnds0; - auto_vec vec_oprnds1; - auto_vec vec_masks; - if (mask) + tree src = oprnd; + if (!useless_type_conversion_p (srctype, TREE_TYPE (src))) { - tree mask_vectype = truth_type_for (vectype); - vect_get_vec_defs_for_operand (vinfo, stmt_info, - modifier == NARROW ? ncopies / 2 : ncopies, - mask, &vec_masks, mask_vectype); + gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (src)), + TYPE_VECTOR_SUBPARTS (srctype))); + tree var = vect_get_new_ssa_name (srctype, vect_simple_var); + src = build1 (VIEW_CONVERT_EXPR, srctype, src); + gassign *new_stmt = gimple_build_assign (var, VIEW_CONVERT_EXPR, src); + vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi); + src = var; } - vect_get_vec_defs_for_operand (vinfo, stmt_info, - modifier == WIDEN ? 
ncopies / 2 : ncopies, - gs_info->offset, &vec_oprnds0); - tree op = vect_get_store_rhs (stmt_info); - vect_get_vec_defs_for_operand (vinfo, stmt_info, - modifier == NARROW ? ncopies / 2 : ncopies, op, - &vec_oprnds1); - tree vec_oprnd0 = NULL_TREE, vec_oprnd1 = NULL_TREE; - tree mask_op = NULL_TREE; - tree src, vec_mask; - for (int j = 0; j < ncopies; ++j) + tree op = offset; + if (!useless_type_conversion_p (idxtype, TREE_TYPE (op))) { - if (modifier == WIDEN) - { - if (j & 1) - op = permute_vec_elements (vinfo, vec_oprnd0, vec_oprnd0, perm_mask, - stmt_info, gsi); - else - op = vec_oprnd0 = vec_oprnds0[j / 2]; - src = vec_oprnd1 = vec_oprnds1[j]; - if (mask) - mask_op = vec_mask = vec_masks[j]; - } - else if (modifier == NARROW) - { - if (j & 1) - src = permute_vec_elements (vinfo, vec_oprnd1, vec_oprnd1, - perm_mask, stmt_info, gsi); - else - src = vec_oprnd1 = vec_oprnds1[j / 2]; - op = vec_oprnd0 = vec_oprnds0[j]; - if (mask) - mask_op = vec_mask = vec_masks[j / 2]; - } - else - { - op = vec_oprnd0 = vec_oprnds0[j]; - src = vec_oprnd1 = vec_oprnds1[j]; - if (mask) - mask_op = vec_mask = vec_masks[j]; - } - - if (!useless_type_conversion_p (srctype, TREE_TYPE (src))) - { - gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (src)), - TYPE_VECTOR_SUBPARTS (srctype))); - tree var = vect_get_new_ssa_name (srctype, vect_simple_var); - src = build1 (VIEW_CONVERT_EXPR, srctype, src); - gassign *new_stmt = gimple_build_assign (var, VIEW_CONVERT_EXPR, src); - vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi); - src = var; - } - - if (!useless_type_conversion_p (idxtype, TREE_TYPE (op))) - { - gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op)), - TYPE_VECTOR_SUBPARTS (idxtype))); - tree var = vect_get_new_ssa_name (idxtype, vect_simple_var); - op = build1 (VIEW_CONVERT_EXPR, idxtype, op); - gassign *new_stmt = gimple_build_assign (var, VIEW_CONVERT_EXPR, op); - vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi); - op = var; - } - - if 
(mask) - { - tree utype; - mask_arg = mask_op; - if (modifier == NARROW) - { - tree var - = vect_get_new_ssa_name (mask_halfvectype, vect_simple_var); - gassign *new_stmt - = gimple_build_assign (var, - (j & 1) ? VEC_UNPACK_HI_EXPR - : VEC_UNPACK_LO_EXPR, - mask_op); - vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi); - mask_arg = var; - } - tree optype = TREE_TYPE (mask_arg); - if (TYPE_MODE (masktype) == TYPE_MODE (optype)) - utype = masktype; - else - utype = lang_hooks.types.type_for_mode (TYPE_MODE (optype), 1); - tree var = vect_get_new_ssa_name (utype, vect_scalar_var); - mask_arg = build1 (VIEW_CONVERT_EXPR, utype, mask_arg); - gassign *new_stmt - = gimple_build_assign (var, VIEW_CONVERT_EXPR, mask_arg); - vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi); - mask_arg = var; - if (!useless_type_conversion_p (masktype, utype)) - { - gcc_assert (TYPE_PRECISION (utype) <= TYPE_PRECISION (masktype)); - tree var = vect_get_new_ssa_name (masktype, vect_scalar_var); - new_stmt = gimple_build_assign (var, NOP_EXPR, mask_arg); - vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi); - mask_arg = var; - } - } - - gcall *new_stmt - = gimple_build_call (gs_info->decl, 5, ptr, mask_arg, op, src, scale); + gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op)), + TYPE_VECTOR_SUBPARTS (idxtype))); + tree var = vect_get_new_ssa_name (idxtype, vect_simple_var); + op = build1 (VIEW_CONVERT_EXPR, idxtype, op); + gassign *new_stmt = gimple_build_assign (var, VIEW_CONVERT_EXPR, op); vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi); - - STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt); + op = var; } - *vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0]; + + tree scale = build_int_cst (scaletype, gs_info->scale); + gcall *new_stmt + = gimple_build_call (gs_info->decl, 5, ptr, mask_arg, op, src, scale); + return new_stmt; } /* Prepare the base and offset in GS_INFO for vectorization. 
@@ -8209,6 +8058,7 @@ vectorizable_store (vec_info *vinfo,
   /* Is vectorizable store? */
 
   tree mask = NULL_TREE, mask_vectype = NULL_TREE;
+  slp_tree mask_node = NULL;
   if (gassign *assign = dyn_cast <gassign *> (stmt_info->stmt))
     {
       tree scalar_dest = gimple_assign_lhs (assign);
@@ -8240,7 +8090,8 @@ vectorizable_store (vec_info *vinfo,
                     (call, mask_index, STMT_VINFO_GATHER_SCATTER_P (stmt_info));
       if (mask_index >= 0
           && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
-                                      &mask, NULL, &mask_dt, &mask_vectype))
+                                      &mask, &mask_node, &mask_dt,
+                                      &mask_vectype))
         return false;
     }
 
@@ -8409,13 +8260,7 @@ vectorizable_store (vec_info *vinfo,
 
   ensure_base_align (dr_info);
 
-  if (memory_access_type == VMAT_GATHER_SCATTER && gs_info.decl)
-    {
-      vect_build_scatter_store_calls (vinfo, stmt_info, gsi, vec_stmt, &gs_info,
-                                      mask, cost_vec);
-      return true;
-    }
-  else if (STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) >= 3)
+  if (STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) >= 3)
     {
       gcc_assert (memory_access_type == VMAT_CONTIGUOUS);
       gcc_assert (!slp);
@@ -9052,7 +8897,7 @@ vectorizable_store (vec_info *vinfo,
 
   if (memory_access_type == VMAT_GATHER_SCATTER)
     {
-      gcc_assert (!slp && !grouped_store);
+      gcc_assert (!grouped_store);
       auto_vec<tree> vec_offsets;
       unsigned int inside_cost = 0, prologue_cost = 0;
       for (j = 0; j < ncopies; j++)
@@ -9068,22 +8913,22 @@ vectorizable_store (vec_info *vinfo,
                   /* Since the store is not grouped, DR_GROUP_SIZE is 1, and
                      DR_CHAIN is of size 1.
*/ gcc_assert (group_size == 1); - op = vect_get_store_rhs (first_stmt_info); - vect_get_vec_defs_for_operand (vinfo, first_stmt_info, - ncopies, op, gvec_oprnds[0]); - vec_oprnd = (*gvec_oprnds[0])[0]; - dr_chain.quick_push (vec_oprnd); + if (slp_node) + vect_get_slp_defs (op_node, gvec_oprnds[0]); + else + vect_get_vec_defs_for_operand (vinfo, first_stmt_info, + ncopies, op, gvec_oprnds[0]); if (mask) { - vect_get_vec_defs_for_operand (vinfo, stmt_info, ncopies, - mask, &vec_masks, - mask_vectype); - vec_mask = vec_masks[0]; + if (slp_node) + vect_get_slp_defs (mask_node, &vec_masks); + else + vect_get_vec_defs_for_operand (vinfo, stmt_info, + ncopies, + mask, &vec_masks, + mask_vectype); } - /* We should have catched mismatched types earlier. */ - gcc_assert ( - useless_type_conversion_p (vectype, TREE_TYPE (vec_oprnd))); if (STMT_VINFO_GATHER_SCATTER_P (stmt_info)) vect_get_gather_scatter_ops (loop_vinfo, loop, stmt_info, slp_node, &gs_info, @@ -9099,156 +8944,280 @@ vectorizable_store (vec_info *vinfo, else if (!costing_p) { gcc_assert (!LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo)); - vec_oprnd = (*gvec_oprnds[0])[j]; - dr_chain[0] = vec_oprnd; - if (mask) - vec_mask = vec_masks[j]; if (!STMT_VINFO_GATHER_SCATTER_P (stmt_info)) dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, ptr_incr, gsi, stmt_info, bump); } new_stmt = NULL; - unsigned HOST_WIDE_INT align; - tree final_mask = NULL_TREE; - tree final_len = NULL_TREE; - tree bias = NULL_TREE; - if (!costing_p) - { - if (loop_masks) - final_mask = vect_get_loop_mask (loop_vinfo, gsi, loop_masks, - ncopies, vectype, j); - if (vec_mask) - final_mask = prepare_vec_mask (loop_vinfo, mask_vectype, - final_mask, vec_mask, gsi); - } - - if (gs_info.ifn != IFN_LAST) + for (i = 0; i < vec_num; ++i) { - if (costing_p) + if (!costing_p) { - unsigned int cnunits = vect_nunits_for_cost (vectype); - inside_cost - += record_stmt_cost (cost_vec, cnunits, scalar_store, - stmt_info, 0, vect_body); - continue; + vec_oprnd = 
(*gvec_oprnds[0])[vec_num * j + i]; + if (mask) + vec_mask = vec_masks[vec_num * j + i]; + /* We should have catched mismatched types earlier. */ + gcc_assert (useless_type_conversion_p (vectype, + TREE_TYPE (vec_oprnd))); + } + unsigned HOST_WIDE_INT align; + tree final_mask = NULL_TREE; + tree final_len = NULL_TREE; + tree bias = NULL_TREE; + if (!costing_p) + { + if (loop_masks) + final_mask = vect_get_loop_mask (loop_vinfo, gsi, + loop_masks, ncopies, + vectype, j); + if (vec_mask) + final_mask = prepare_vec_mask (loop_vinfo, mask_vectype, + final_mask, vec_mask, gsi); } - if (STMT_VINFO_GATHER_SCATTER_P (stmt_info)) - vec_offset = vec_offsets[j]; - tree scale = size_int (gs_info.scale); - - if (gs_info.ifn == IFN_MASK_LEN_SCATTER_STORE) + if (gs_info.ifn != IFN_LAST) { - if (loop_lens) - final_len = vect_get_loop_len (loop_vinfo, gsi, loop_lens, - ncopies, vectype, j, 1); + if (costing_p) + { + unsigned int cnunits = vect_nunits_for_cost (vectype); + inside_cost + += record_stmt_cost (cost_vec, cnunits, scalar_store, + stmt_info, 0, vect_body); + continue; + } + + if (STMT_VINFO_GATHER_SCATTER_P (stmt_info)) + vec_offset = vec_offsets[vec_num * j + i]; + tree scale = size_int (gs_info.scale); + + if (gs_info.ifn == IFN_MASK_LEN_SCATTER_STORE) + { + if (loop_lens) + final_len = vect_get_loop_len (loop_vinfo, gsi, + loop_lens, ncopies, + vectype, j, 1); + else + final_len = size_int (TYPE_VECTOR_SUBPARTS (vectype)); + signed char biasval + = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo); + bias = build_int_cst (intQI_type_node, biasval); + if (!final_mask) + { + mask_vectype = truth_type_for (vectype); + final_mask = build_minus_one_cst (mask_vectype); + } + } + + gcall *call; + if (final_len && final_mask) + call = gimple_build_call_internal + (IFN_MASK_LEN_SCATTER_STORE, 7, dataref_ptr, + vec_offset, scale, vec_oprnd, final_mask, + final_len, bias); + else if (final_mask) + call = gimple_build_call_internal + (IFN_MASK_SCATTER_STORE, 5, dataref_ptr, + 
vec_offset, scale, vec_oprnd, final_mask); else - final_len = build_int_cst (sizetype, - TYPE_VECTOR_SUBPARTS (vectype)); - signed char biasval - = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo); - bias = build_int_cst (intQI_type_node, biasval); - if (!final_mask) + call = gimple_build_call_internal (IFN_SCATTER_STORE, 4, + dataref_ptr, vec_offset, + scale, vec_oprnd); + gimple_call_set_nothrow (call, true); + vect_finish_stmt_generation (vinfo, stmt_info, call, gsi); + new_stmt = call; + } + else if (gs_info.decl) + { + /* The builtin decls path for scatter is legacy, x86 only. */ + gcc_assert (nunits.is_constant () + && (!final_mask + || SCALAR_INT_MODE_P + (TYPE_MODE (TREE_TYPE (final_mask))))); + if (costing_p) { - mask_vectype = truth_type_for (vectype); - final_mask = build_minus_one_cst (mask_vectype); + unsigned int cnunits = vect_nunits_for_cost (vectype); + inside_cost + += record_stmt_cost (cost_vec, cnunits, scalar_store, + stmt_info, 0, vect_body); + continue; } + poly_uint64 offset_nunits + = TYPE_VECTOR_SUBPARTS (gs_info.offset_vectype); + if (known_eq (nunits, offset_nunits)) + { + new_stmt = vect_build_one_scatter_store_call + (vinfo, stmt_info, gsi, &gs_info, + dataref_ptr, vec_offsets[vec_num * j + i], + vec_oprnd, final_mask); + vect_finish_stmt_generation (vinfo, stmt_info, + new_stmt, gsi); + } + else if (known_eq (nunits, offset_nunits * 2)) + { + /* We have a offset vector with half the number of + lanes but the builtins will store full vectype + data from the lower lanes. 
*/ + new_stmt = vect_build_one_scatter_store_call + (vinfo, stmt_info, gsi, &gs_info, + dataref_ptr, + vec_offsets[2 * vec_num * j + 2 * i], + vec_oprnd, final_mask); + vect_finish_stmt_generation (vinfo, stmt_info, + new_stmt, gsi); + int count = nunits.to_constant (); + vec_perm_builder sel (count, count, 1); + sel.quick_grow (count); + for (int i = 0; i < count; ++i) + sel[i] = i | (count / 2); + vec_perm_indices indices (sel, 2, count); + tree perm_mask + = vect_gen_perm_mask_checked (vectype, indices); + new_stmt = gimple_build_assign (NULL_TREE, VEC_PERM_EXPR, + vec_oprnd, vec_oprnd, + perm_mask); + vec_oprnd = make_ssa_name (vectype); + gimple_set_lhs (new_stmt, vec_oprnd); + vect_finish_stmt_generation (vinfo, stmt_info, + new_stmt, gsi); + if (final_mask) + { + new_stmt = gimple_build_assign (NULL_TREE, + VEC_UNPACK_HI_EXPR, + final_mask); + final_mask = make_ssa_name + (truth_type_for (gs_info.offset_vectype)); + gimple_set_lhs (new_stmt, final_mask); + vect_finish_stmt_generation (vinfo, stmt_info, + new_stmt, gsi); + } + new_stmt = vect_build_one_scatter_store_call + (vinfo, stmt_info, gsi, &gs_info, + dataref_ptr, + vec_offsets[2 * vec_num * j + 2 * i + 1], + vec_oprnd, final_mask); + vect_finish_stmt_generation (vinfo, stmt_info, + new_stmt, gsi); + } + else if (known_eq (nunits * 2, offset_nunits)) + { + /* We have a offset vector with double the number of + lanes. Select the low/high part accordingly. 
*/ + vec_offset = vec_offsets[(vec_num * j + i) / 2]; + if ((vec_num * j + i) & 1) + { + int count = offset_nunits.to_constant (); + vec_perm_builder sel (count, count, 1); + sel.quick_grow (count); + for (int i = 0; i < count; ++i) + sel[i] = i | (count / 2); + vec_perm_indices indices (sel, 2, count); + tree perm_mask = vect_gen_perm_mask_checked + (TREE_TYPE (vec_offset), indices); + new_stmt = gimple_build_assign (NULL_TREE, + VEC_PERM_EXPR, + vec_offset, + vec_offset, + perm_mask); + vec_offset = make_ssa_name (TREE_TYPE (vec_offset)); + gimple_set_lhs (new_stmt, vec_offset); + vect_finish_stmt_generation (vinfo, stmt_info, + new_stmt, gsi); + } + new_stmt = vect_build_one_scatter_store_call + (vinfo, stmt_info, gsi, &gs_info, + dataref_ptr, vec_offset, + vec_oprnd, final_mask); + vect_finish_stmt_generation (vinfo, stmt_info, + new_stmt, gsi); + } + else + gcc_unreachable (); } - - gcall *call; - if (final_len && final_mask) - call = gimple_build_call_internal (IFN_MASK_LEN_SCATTER_STORE, - 7, dataref_ptr, vec_offset, - scale, vec_oprnd, final_mask, - final_len, bias); - else if (final_mask) - call - = gimple_build_call_internal (IFN_MASK_SCATTER_STORE, 5, - dataref_ptr, vec_offset, scale, - vec_oprnd, final_mask); else - call = gimple_build_call_internal (IFN_SCATTER_STORE, 4, - dataref_ptr, vec_offset, - scale, vec_oprnd); - gimple_call_set_nothrow (call, true); - vect_finish_stmt_generation (vinfo, stmt_info, call, gsi); - new_stmt = call; - } - else - { - /* Emulated scatter. */ - gcc_assert (!final_mask); - if (costing_p) { - unsigned int cnunits = vect_nunits_for_cost (vectype); - /* For emulated scatter N offset vector element extracts - (we assume the scalar scaling and ptr + offset add is - consumed by the load). */ - inside_cost - += record_stmt_cost (cost_vec, cnunits, vec_to_scalar, - stmt_info, 0, vect_body); - /* N scalar stores plus extracting the elements. 
*/ - inside_cost - += record_stmt_cost (cost_vec, cnunits, vec_to_scalar, - stmt_info, 0, vect_body); - inside_cost - += record_stmt_cost (cost_vec, cnunits, scalar_store, - stmt_info, 0, vect_body); - continue; - } + /* Emulated scatter. */ + gcc_assert (!final_mask); + if (costing_p) + { + unsigned int cnunits = vect_nunits_for_cost (vectype); + /* For emulated scatter N offset vector element extracts + (we assume the scalar scaling and ptr + offset add is + consumed by the load). */ + inside_cost + += record_stmt_cost (cost_vec, cnunits, vec_to_scalar, + stmt_info, 0, vect_body); + /* N scalar stores plus extracting the elements. */ + inside_cost + += record_stmt_cost (cost_vec, cnunits, vec_to_scalar, + stmt_info, 0, vect_body); + inside_cost + += record_stmt_cost (cost_vec, cnunits, scalar_store, + stmt_info, 0, vect_body); + continue; + } - unsigned HOST_WIDE_INT const_nunits = nunits.to_constant (); - unsigned HOST_WIDE_INT const_offset_nunits - = TYPE_VECTOR_SUBPARTS (gs_info.offset_vectype).to_constant (); - vec *ctor_elts; - vec_alloc (ctor_elts, const_nunits); - gimple_seq stmts = NULL; - tree elt_type = TREE_TYPE (vectype); - unsigned HOST_WIDE_INT elt_size - = tree_to_uhwi (TYPE_SIZE (elt_type)); - /* We support offset vectors with more elements - than the data vector for now. */ - unsigned HOST_WIDE_INT factor - = const_offset_nunits / const_nunits; - vec_offset = vec_offsets[j / factor]; - unsigned elt_offset = (j % factor) * const_nunits; - tree idx_type = TREE_TYPE (TREE_TYPE (vec_offset)); - tree scale = size_int (gs_info.scale); - align = get_object_alignment (DR_REF (first_dr_info->dr)); - tree ltype = build_aligned_type (TREE_TYPE (vectype), align); - for (unsigned k = 0; k < const_nunits; ++k) - { - /* Compute the offsetted pointer. 
*/ - tree boff = size_binop (MULT_EXPR, TYPE_SIZE (idx_type), - bitsize_int (k + elt_offset)); - tree idx - = gimple_build (&stmts, BIT_FIELD_REF, idx_type, vec_offset, - TYPE_SIZE (idx_type), boff); - idx = gimple_convert (&stmts, sizetype, idx); - idx = gimple_build (&stmts, MULT_EXPR, sizetype, idx, scale); - tree ptr - = gimple_build (&stmts, PLUS_EXPR, TREE_TYPE (dataref_ptr), - dataref_ptr, idx); - ptr = gimple_convert (&stmts, ptr_type_node, ptr); - /* Extract the element to be stored. */ - tree elt - = gimple_build (&stmts, BIT_FIELD_REF, TREE_TYPE (vectype), - vec_oprnd, TYPE_SIZE (elt_type), - bitsize_int (k * elt_size)); - gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT); - stmts = NULL; - tree ref - = build2 (MEM_REF, ltype, ptr, build_int_cst (ref_type, 0)); - new_stmt = gimple_build_assign (ref, elt); - vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi); + unsigned HOST_WIDE_INT const_nunits = nunits.to_constant (); + unsigned HOST_WIDE_INT const_offset_nunits + = TYPE_VECTOR_SUBPARTS (gs_info.offset_vectype).to_constant (); + vec *ctor_elts; + vec_alloc (ctor_elts, const_nunits); + gimple_seq stmts = NULL; + tree elt_type = TREE_TYPE (vectype); + unsigned HOST_WIDE_INT elt_size + = tree_to_uhwi (TYPE_SIZE (elt_type)); + /* We support offset vectors with more elements + than the data vector for now. */ + unsigned HOST_WIDE_INT factor + = const_offset_nunits / const_nunits; + vec_offset = vec_offsets[(vec_num * j + i) / factor]; + unsigned elt_offset = (j % factor) * const_nunits; + tree idx_type = TREE_TYPE (TREE_TYPE (vec_offset)); + tree scale = size_int (gs_info.scale); + align = get_object_alignment (DR_REF (first_dr_info->dr)); + tree ltype = build_aligned_type (TREE_TYPE (vectype), align); + for (unsigned k = 0; k < const_nunits; ++k) + { + /* Compute the offsetted pointer. 
*/ + tree boff = size_binop (MULT_EXPR, TYPE_SIZE (idx_type), + bitsize_int (k + elt_offset)); + tree idx + = gimple_build (&stmts, BIT_FIELD_REF, idx_type, + vec_offset, TYPE_SIZE (idx_type), boff); + idx = gimple_convert (&stmts, sizetype, idx); + idx = gimple_build (&stmts, MULT_EXPR, sizetype, + idx, scale); + tree ptr + = gimple_build (&stmts, PLUS_EXPR, + TREE_TYPE (dataref_ptr), + dataref_ptr, idx); + ptr = gimple_convert (&stmts, ptr_type_node, ptr); + /* Extract the element to be stored. */ + tree elt + = gimple_build (&stmts, BIT_FIELD_REF, + TREE_TYPE (vectype), + vec_oprnd, TYPE_SIZE (elt_type), + bitsize_int (k * elt_size)); + gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT); + stmts = NULL; + tree ref + = build2 (MEM_REF, ltype, ptr, + build_int_cst (ref_type, 0)); + new_stmt = gimple_build_assign (ref, elt); + vect_finish_stmt_generation (vinfo, stmt_info, + new_stmt, gsi); + } + if (slp) + slp_node->push_vec_def (new_stmt); } } - if (j == 0) - *vec_stmt = new_stmt; - STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt); + if (!slp && !costing_p) + STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt); } + if (!slp && !costing_p) + *vec_stmt = STMT_VINFO_VEC_STMTS (stmt_info)[0]; + if (costing_p && dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, "vect_model_store_cost: inside_cost = %d, "