From patchwork Thu Nov 8 11:02:44 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Renlin Li X-Patchwork-Id: 994761 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-489332-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=foss.arm.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="pQ6YfpkN"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 42rL3l2gcnz9sBZ for ; Thu, 8 Nov 2018 22:02:59 +1100 (AEDT) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :subject:to:message-id:date:mime-version:content-type; q=dns; s= default; b=J7cp1wzJzH2sIkiwMY8XwV4R7jwjsqZJF5rl1l3AhR+8AMVyT7H4Z RdzbmvDZRsEl5fHwEzoHaG8MNzdR3X9+ZBMmVpkzDlq7PqvhWAvGlznaPzYDCKBW iasI0RmbMQbSVQ/0H0+80yD5Dz+QorsFT5/syQ954NvHAGlja0Yp6I= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :subject:to:message-id:date:mime-version:content-type; s= default; bh=r1QFDxSVR3PoFLnhyVYYnKfboXc=; b=pQ6YfpkNUaQ3CRi9aism AYJwjjHV5J3PFCuwWc7txbI9n9MBExwjeN14T+qThVdAnygt5cDMzy07IE0BQee7 sDLXRVEhe0pz3wPSNUg/h6VJdxquDT4NYY8IjRvKXfhDjZu1wh4M981m50RpJMKA UOB9r3as7MWHh+xWuIecDGQ= Received: (qmail 3680 invoked by alias); 8 Nov 2018 11:02:51 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 3664 invoked by uid 89); 8 Nov 2018 11:02:50 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-25.6 required=5.0 tests=BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_LAZY_DOMAIN_SECURITY, KAM_LOTSOFHASH autolearn=ham version=3.3.2 spammy=sk:vect_tr, producers X-HELO: foss.arm.com Received: from usa-sjc-mx-foss1.foss.arm.com (HELO foss.arm.com) (217.140.101.70) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 08 Nov 2018 11:02:48 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 1243C80D; Thu, 8 Nov 2018 03:02:47 -0800 (PST) Received: from [10.2.206.82] (e109742-lin.cambridge.arm.com [10.2.206.82]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 21DAE3F5CF; Thu, 8 Nov 2018 03:02:45 -0800 (PST) From: Renlin Li Subject: [RFC][PATCH]Merge VEC_COND_EXPR into MASK_STORE after loop vectorization To: "gcc-patches@gcc.gnu.org" , Richard Sandiford , Ramana Radhakrishnan , James Greenhalgh , Richard Biener Message-ID: Date: Thu, 8 Nov 2018 11:02:44 +0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.7.0 MIME-Version: 1.0 X-IsSubscribed: yes Hi all, When allow-store-data-races is enabled, ifcvt would prefer to generated conditional select and unconditional store to convert certain if statement into: _ifc_1 = val _ifc_2 = A[i] val = cond? _ifc_1 : _ifc_2 A[i] = val When the loop got vectorized, this pattern will be turned into MASK_LOAD, VEC_COND_EXPR and MASK_STORE. This could be improved. The change here add a post processing function to combine the VEC_COND_EXPR expression into MASK_STORE, and delete related dead code. I am a little bit conservative here. I didn't change the default behavior of ifcvt to always generate MASK_STORE, although it should be safe in all cases (allow or dis-allow store data race). MASK_STORE might not well handled in vectorization pass compared with conditional select. It might be too early and aggressive to do that in ifcvt. And the performance of MASK_STORE might not good for some platforms. (We could add --param or target hook to differentiate this ifcvt behavior on different platforms) Another reason I did not do that in ifcvt is the data reference analysis. To create a MASK_STORE, a pointer is created as the first argument to the internal function call. If the pointer is created out of array references, e.g. x = &A[i], data reference analysis could not properly analysis the relationship between MEM_REF (x) and ARRAY_REF (A, i). This will create a versioned loop beside the vectorized one. (I have hacks to look through the MEM_REF, and restore the reference back to ARRAY_REF (A, i). Maybe we could do analysis on lowered address expression? I saw we have gimple laddress pass to aid the vectorizer) The approach here comes a little bit late, on the condition that vector MASK_STORE is generated by loop vectorizer. In this case, it is definitely beneficial to do the code transformation. Any thoughts on the best way to fix the issue? This patch has been tested with aarch64-none-elf, no regressions. Regards, Renlin gcc/ChangeLog: 2018-11-08 Renlin Li * tree-vectorizer.h (combine_sel_mask_store): Declare new function. * tree-vect-loop.c (combine_sel_mask_store): Define new function. * tree-vectorizer.c (vectorize_loops): Call it. gcc/testsuite/ChangeLog: 2018-11-08 Renlin Li * gcc.target/aarch64/sve/combine_vcond_mask_store_1.c: New. diff --git a/gcc/testsuite/gcc.target/aarch64/sve/combine_vcond_mask_store_1.c b/gcc/testsuite/gcc.target/aarch64/sve/combine_vcond_mask_store_1.c new file mode 100644 index 0000000000000000000000000000000000000000..64f6b7b00f58ee45bd4a2f91c1a9404911f1a09f --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/combine_vcond_mask_store_1.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ftree-vectorize --param allow-store-data-races=1 -fdump-tree-vect-details" } */ + +void test () +{ + static int array[100]; + for (unsigned i = 1; i < 16; ++i) + { + int a = array[i]; + if (a & 1) + array[i] = a + 1; + if (array[i] > 10) + array[i] = a + 2; + } +} + +/* { dg-final { scan-tree-dump-times "Combining VEC_COND_EXPR and MASK_STORE" 1 "vect" } } */ diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index 177b284e9c617a41c33d1387ba5afbed51d8ed00..9e1a167d03ea5bf640e58b3426d42b4e3c74da56 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -8539,6 +8539,166 @@ vect_transform_loop (loop_vec_info loop_vinfo) return epilogue; } +/* + When allow-store-data-races=1, if-conversion will convert certain if + statements into: + A[i] = cond ? val : A[i]. + If the loop is successfully vectorized, + MASK_LOAD + VEC_COND_EXPR + MASK_STORE will be generated. + + This pattern could be combined into a single MASK_STORE with new mask. + The new mask is the combination of original mask and the value selection mask + in VEC_COND_EXPR. + + After the transformation, the MASK_LOAD and VEC_COND_EXPR might be dead. */ + +void +combine_sel_mask_store (struct loop *loop) +{ + basic_block *bbs = get_loop_body (loop); + unsigned nbbs = loop->num_nodes; + unsigned i; + basic_block bb; + gimple_stmt_iterator gsi; + + vect_location = find_loop_location (loop); + for (i = 0; i < nbbs; i++) + { + bb = bbs[i]; + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); + gsi_next (&gsi)) + { + gimple *mask_store = gsi_stmt (gsi); + if (!gimple_call_internal_p (mask_store, IFN_MASK_STORE)) + continue; + + /* + X = MASK_LOAD (PTR, -, MASK) + VAL = ... + Y = VEC_COND (cond, VAL, X) + MASK_STORE (PTR, -, MASK, Y) + */ + tree vec_op = gimple_call_arg (mask_store, 3); + tree store_mask = gimple_call_arg (mask_store, 2); + if (TREE_CODE (vec_op) == SSA_NAME) + { + gimple *def = SSA_NAME_DEF_STMT (vec_op); + gassign *assign = dyn_cast (def); + if (!assign || gimple_assign_rhs_code (assign) != VEC_COND_EXPR) + continue; + + tree sel_cond = gimple_assign_rhs1 (assign); + tree true_val = gimple_assign_rhs2 (assign); + tree false_val = gimple_assign_rhs3 (assign); + gimple *mask_load = NULL; + + /* A[i] = cond ? val : A[i] */ + if (TREE_CODE (false_val) == SSA_NAME) + { + gimple *def = SSA_NAME_DEF_STMT (false_val); + if (gimple_call_internal_p (def, IFN_MASK_LOAD)) + mask_load = def; + } + /* A[i] = cond ? A[i] : val + Transform into: + A[i] = !cond ? val : A[i] */ + if (mask_load == NULL && TREE_CODE (true_val) == SSA_NAME) + { + gimple *def = SSA_NAME_DEF_STMT (true_val); + if (gimple_call_internal_p (def, IFN_MASK_LOAD)) + { + enum tree_code code = TREE_CODE (sel_cond); + tree op_type = TREE_TYPE (TREE_OPERAND (sel_cond, 0)); + code = invert_tree_comparison (code, HONOR_NANS (op_type)); + if (code == ERROR_MARK) + continue; + sel_cond = build2_loc (EXPR_LOCATION (sel_cond), code, + TREE_TYPE (sel_cond), + TREE_OPERAND (sel_cond, 0), + TREE_OPERAND (sel_cond, 1)); + mask_load = def; + true_val = false_val; + } + } + + /* The pair must be in the same basic block, use the same mask, + and access the same memory. */ + if (mask_load == NULL || + gimple_bb (mask_store) != gimple_bb (mask_load) || + store_mask != gimple_call_arg (mask_load, 2) || + gimple_vuse (mask_store) != gimple_vuse (mask_load)) + continue; + + auto_vec refs; + opt_result res + = find_data_references_in_stmt (loop, mask_store, &refs); + if (!res) + continue; + data_reference_p dr_a = refs.pop (); + res = find_data_references_in_stmt (loop, mask_load, &refs); + if (!res) + continue; + data_reference_p dr_b = refs.pop (); + + if (!same_data_refs (dr_a, dr_b)) + continue; + + /* If the data reference is the same, they are accessing the + same memory location. Merge the pattern. */ + tree sel_mask + = force_gimple_operand_gsi (&gsi, unshare_expr (sel_cond), + true, NULL_TREE, + true, GSI_SAME_STMT); + + tree and_mask = make_temp_ssa_name (TREE_TYPE (store_mask), + NULL, "vec_mask_and"); + gimple *and_stmt = gimple_build_assign (and_mask, BIT_AND_EXPR, + sel_mask, store_mask); + gsi_insert_before (&gsi, and_stmt, GSI_SAME_STMT); + + gcall *new_stmt + = gimple_build_call_internal (IFN_MASK_STORE, 4, + gimple_call_arg (mask_store, 0), + gimple_call_arg (mask_store, 1), + and_mask, true_val); + gimple_call_set_nothrow (new_stmt, true); + gsi_replace (&gsi, new_stmt, true); + + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "Combining VEC_COND_EXPR and MASK_STORE:\n" + "%G%G", assign, new_stmt); + + /* Remove dead statements. */ + if (has_zero_uses (vec_op)) + { + auto_vec worklist; + worklist.safe_push (vec_op); + while (!worklist.is_empty ()) + { + tree val = worklist.pop (); + if (TREE_CODE (val) == SSA_NAME + && has_zero_uses (val)) + { + ssa_op_iter i; + tree op; + gimple *def = SSA_NAME_DEF_STMT (val); + + FOR_EACH_SSA_TREE_OPERAND (op, def, i, SSA_OP_USE) + worklist.safe_push (op); + + gimple_stmt_iterator gsi = gsi_for_stmt (def); + gsi_remove (&gsi, true); + } + } + } + } + } + } + + free (bbs); +} + /* The code below is trying to perform simple optimization - revert if-conversion for masked stores, i.e. if the mask of a store is zero do not perform it and all stored value producers also if possible. diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index 988456808318dabd0058f6b0d038f8c272e75c6b..ea661ec609df56f96cb54c3f2996646c05870667 100644 --- a/gcc/tree-vectorizer.h +++ b/gcc/tree-vectorizer.h @@ -1510,6 +1510,7 @@ extern void vect_get_store_cost (stmt_vec_info, int, extern bool vect_supportable_shift (enum tree_code, tree); extern tree vect_gen_perm_mask_any (tree, const vec_perm_indices &); extern tree vect_gen_perm_mask_checked (tree, const vec_perm_indices &); +extern void combine_sel_mask_store (struct loop*); extern void optimize_mask_stores (struct loop*); extern gcall *vect_gen_while (tree, tree, tree); extern tree vect_gen_while_not (gimple_seq *, tree, tree, tree); diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c index 12bf0fcd5bde4b889fb74342c4e7dd52327efa57..a1145cdfbeb826669071dd077420908ba67bdecc 100644 --- a/gcc/tree-vectorizer.c +++ b/gcc/tree-vectorizer.c @@ -1143,6 +1143,7 @@ vectorize_loops (void) loop_vinfo = (loop_vec_info) loop->aux; has_mask_store = LOOP_VINFO_HAS_MASK_STORE (loop_vinfo); delete loop_vinfo; + combine_sel_mask_store (loop); if (has_mask_store && targetm.vectorize.empty_mask_is_expensive (IFN_MASK_STORE)) optimize_mask_stores (loop);