Message ID | CAHFci2_8F+6E5o9O-L22FymKFncBtmYkQEwk3mhN9teD5s+ktg@mail.gmail.com |
---|---|
State | New |
Headers | show |
On Wed, May 18, 2016 at 5:06 PM, Bin.Cheng <amker.cheng@gmail.com> wrote: > On Tue, May 17, 2016 at 12:08 PM, Richard Biener > <richard.guenther@gmail.com> wrote: >> On Mon, May 16, 2016 at 10:09 AM, Bin.Cheng <amker.cheng@gmail.com> wrote: >>> On Fri, May 13, 2016 at 5:53 PM, Richard Biener >>> <richard.guenther@gmail.com> wrote: >>>> On May 13, 2016 6:02:27 PM GMT+02:00, Bin Cheng <Bin.Cheng@arm.com> wrote: >>>>>Hi, >>>>>As PR69848 reported, GCC vectorizer now generates comparison outside of >>>>>VEC_COND_EXPR for COND_REDUCTION case, as below: >>>>> >>>>> _20 = vect__1.6_8 != { 0, 0, 0, 0 }; >>>>> vect_c_2.8_16 = VEC_COND_EXPR <_20, { 0, 0, 0, 0 }, vect_c_2.7_13>; >>>>> _21 = VEC_COND_EXPR <_20, ivtmp_17, _19>; >>>>> >>>>>This results in inefficient expanding. With IR like: >>>>> >>>>>vect_c_2.8_16 = VEC_COND_EXPR <vect__1.6_8 != { 0, 0, 0, 0 }, { 0, 0, >>>>>0, 0 }, vect_c_2.7_13>; >>>>> _21 = VEC_COND_EXPR <vect__1.6_8 != { 0, 0, 0, 0 }, ivtmp_17, _19>; >>>>> >>>>>We can do: >>>>>1) Expanding time optimization, for example, reverting comparison >>>>>operator by switching VEC_COND_EXPR operands. This is useful when >>>>>backend only supports some comparison operators. >>>>>2) For backend not supporting vcond_mask patterns, saving one LT_EXPR >>>>>instruction which introduced by expand_vec_cond_expr. >>>>> >>>>>This patch fixes this by propagating comparison into VEC_COND_EXPR even >>>>>if it's used multiple times. For now, GCC does single_use_only >>>>>propagation. Ideally, we may duplicate the comparison before each use >>>>>statement just before expanding, so that TER can successfully backtrack >>>>>it from each VEC_COND_EXPR. Unfortunately I didn't find a good pass to >>>>>do this. Tree-vect-generic.c looks like a good candidate, but it's so >>>>>early that following CSE could undo the transform. Another possible >>>>>fix is to generate comparison inside VEC_COND_EXPR directly in function >>>>>vectorizable_reduction. >>>> >>>> I prefer this for now. >>> Hi Richard, you mean this patch, or the possible fix before your comment? >> >> The possible fix before my comment - make the vectorizer generate VEC_COND_EXPRs >> with embedded comparison. > Hi, > Here is updated patch doing that. It's definitely clearer than the > original version. > Bootstrap and test on x86_64. Also checked the expanding time > optimization still happens. Is it OK? Yes. Thanks, Richard. > Thanks, > bin >> >> Thanks, >> Richard. >>
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index d673c67..67053af 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -6159,21 +6159,14 @@ vectorizable_reduction (gimple *stmt, gimple_stmt_iterator *gsi, Finally, we update the phi (NEW_PHI_TREE) to take the value of the new cond_expr (INDEX_COND_EXPR). */ - /* Turn the condition from vec_stmt into an ssa name. */ - gimple_stmt_iterator vec_stmt_gsi = gsi_for_stmt (*vec_stmt); - tree ccompare = gimple_assign_rhs1 (*vec_stmt); - tree ccompare_name = make_ssa_name (TREE_TYPE (ccompare)); - gimple *ccompare_stmt = gimple_build_assign (ccompare_name, - ccompare); - gsi_insert_before (&vec_stmt_gsi, ccompare_stmt, GSI_SAME_STMT); - gimple_assign_set_rhs1 (*vec_stmt, ccompare_name); - update_stmt (*vec_stmt); + /* Duplicate the condition from vec_stmt. */ + tree ccompare = unshare_expr (gimple_assign_rhs1 (*vec_stmt)); /* Create a conditional, where the condition is taken from vec_stmt - (CCOMPARE_NAME), then is the induction index (INDEX_BEFORE_INCR) - and else is the phi (NEW_PHI_TREE). */ + (CCOMPARE), then is the induction index (INDEX_BEFORE_INCR) and + else is the phi (NEW_PHI_TREE). */ tree index_cond_expr = build3 (VEC_COND_EXPR, cr_index_vector_type, - ccompare_name, indx_before_incr, + ccompare, indx_before_incr, new_phi_tree); cond_name = make_ssa_name (cr_index_vector_type); gimple *index_condition = gimple_build_assign (cond_name,