Message ID | 7a698dd9-fe80-ed74-bed1-ab9b8141e3b6@gmail.com |
---|---|
State | New |
Headers | show |
Series | [committed] Reduce vector comparison of uniform vectors to a scalar comparison | expand |
On Fri, Aug 27, 2021 at 9:31 PM Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > > I was working some aspects of our port's vector support and stumbled > across a bit of silly code. We were comparing two vectors that were > both uniform. > > We can simplify a comparison of uniform vectors to a comparison of a > representative element from each. That in turn may expose const/copy > propagation opportunities or even jump threading opportunities. > > And that's precisely what this patch does inside DOM. At the start of > statement processing we see if we can reduce a vector comparison to a > scalar comparison. > > You can see this in pr95308.C on x86. As we enter dom3 we have: > > > ;; basic block 2, loop depth 0 > ;; pred: ENTRY > e.1_1 = e; > _27 = f_26(D) != 0; > _163 = e.1_1 != 0; > _164 = _27 & _163; > _196 = _164 ? -1 : 0; > vect_cst__194 = {_196, _196, _196, _196, _196, _196, _196, _196}; > > [ ... ] > > ;; basic block 8, loop depth 1 > ;; pred: 7 > ;; 6 > if (vect_cst__194 == { 0, 0, 0, 0, 0, 0, 0, 0 }) > goto <bb 10>; [100.00%] > else > goto <bb 9>; [20.00%] > ;; succ: 10 > ;; 9 > > > There's another similar comparison later. We transform the comparison into: > > ;; basic block 8, loop depth 1 > ;; pred: 7 > ;; 6 > if (_196 == 0) > goto <bb 13>; [100.00%] > else > goto <bb 9>; [20.00%] > ;; succ: 13 > ;; 9 > > There's nice secondary effects that show up after the transformation as > well. It's an interesting case indeed, but I think a match.pd rule would have been better to address this. Sth like (for cmp (eq ne) (simplify (cmp vec_same_elem_p@0 vec_same_elem_p@1) (if (INTEGRAL_TYPE_P (type) && (TREE_CODE (type) == BOOLEAN_TYPE || TYPE_PRECISION (type) == 1)) (cmp { uniform_vector_p (@0); } { uniform_vector_p (@1); })))) should do the trick. That makes the transform available to more places (and DOM should also pick it up this way). Richard. > > Bootstrapped and regression tested on x86_64. It'll go through the > rest of the public targets as well as our internal port over the next > 24-48 hrs. > > > Committed to the trunk, > > Jeff
On 8/30/2021 12:51 AM, Richard Biener wrote: > On Fri, Aug 27, 2021 at 9:31 PM Jeff Law via Gcc-patches > <gcc-patches@gcc.gnu.org> wrote: >> >> I was working some aspects of our port's vector support and stumbled >> across a bit of silly code. We were comparing two vectors that were >> both uniform. >> >> We can simplify a comparison of uniform vectors to a comparison of a >> representative element from each. That in turn may expose const/copy >> propagation opportunities or even jump threading opportunities. >> >> And that's precisely what this patch does inside DOM. At the start of >> statement processing we see if we can reduce a vector comparison to a >> scalar comparison. >> >> You can see this in pr95308.C on x86. As we enter dom3 we have: >> >> >> ;; basic block 2, loop depth 0 >> ;; pred: ENTRY >> e.1_1 = e; >> _27 = f_26(D) != 0; >> _163 = e.1_1 != 0; >> _164 = _27 & _163; >> _196 = _164 ? -1 : 0; >> vect_cst__194 = {_196, _196, _196, _196, _196, _196, _196, _196}; >> >> [ ... ] >> >> ;; basic block 8, loop depth 1 >> ;; pred: 7 >> ;; 6 >> if (vect_cst__194 == { 0, 0, 0, 0, 0, 0, 0, 0 }) >> goto <bb 10>; [100.00%] >> else >> goto <bb 9>; [20.00%] >> ;; succ: 10 >> ;; 9 >> >> >> There's another similar comparison later. We transform the comparison into: >> >> ;; basic block 8, loop depth 1 >> ;; pred: 7 >> ;; 6 >> if (_196 == 0) >> goto <bb 13>; [100.00%] >> else >> goto <bb 9>; [20.00%] >> ;; succ: 13 >> ;; 9 >> >> There's nice secondary effects that show up after the transformation as >> well. > It's an interesting case indeed, but I think a match.pd rule would have > been better to address this. Sth like > > (for cmp (eq ne) > (simplify > (cmp vec_same_elem_p@0 vec_same_elem_p@1) > (if (INTEGRAL_TYPE_P (type) > && (TREE_CODE (type) == BOOLEAN_TYPE > || TYPE_PRECISION (type) == 1)) > (cmp { uniform_vector_p (@0); } { uniform_vector_p (@1); })))) > > should do the trick. That makes the transform available to more places > (and DOM should also pick it up this way). Good point. I'll give that a whirl. jeff
diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c index a0df492df10..a5245b33de6 100644 --- a/gcc/tree-ssa-dom.c +++ b/gcc/tree-ssa-dom.c @@ -1891,6 +1891,66 @@ dom_opt_dom_walker::test_for_singularity (gimple *stmt, } } +/* If STMT is a comparison of two uniform vectors reduce it to a comparison + of scalar objects, otherwise leave STMT unchanged. */ + +static void +reduce_vector_comparison_to_scalar_comparison (gimple *stmt) +{ + if (gimple_code (stmt) == GIMPLE_COND) + { + tree lhs = gimple_cond_lhs (stmt); + tree rhs = gimple_cond_rhs (stmt); + + /* We may have a vector comparison where both arms are uniform + vectors. If so, we can simplify the vector comparison down + to a scalar comparison. */ + if (TREE_CODE (TREE_TYPE (lhs)) == VECTOR_TYPE + && TREE_CODE (TREE_TYPE (rhs)) == VECTOR_TYPE) + { + /* If either operand is an SSA_NAME, then look back to its + defining statement to try and get at a suitable source. */ + if (TREE_CODE (rhs) == SSA_NAME) + { + gimple *def_stmt = SSA_NAME_DEF_STMT (rhs); + if (gimple_assign_single_p (def_stmt)) + rhs = gimple_assign_rhs1 (def_stmt); + } + + if (TREE_CODE (lhs) == SSA_NAME) + { + gimple *def_stmt = SSA_NAME_DEF_STMT (lhs); + if (gimple_assign_single_p (def_stmt)) + lhs = gimple_assign_rhs1 (def_stmt); + } + + /* Now see if they are both uniform vectors and if so replace + the vector comparison with a scalar comparison. */ + tree rhs_elem = rhs ? uniform_vector_p (rhs) : NULL_TREE; + tree lhs_elem = lhs ? uniform_vector_p (lhs) : NULL_TREE; + if (rhs_elem && lhs_elem) + { + if (dump_file && dump_flags & TDF_DETAILS) + { + fprintf (dump_file, "Reducing vector comparison: "); + print_gimple_stmt (dump_file, stmt, 0); + } + + gimple_cond_set_rhs (as_a <gcond *>(stmt), rhs_elem); + gimple_cond_set_lhs (as_a <gcond *>(stmt), lhs_elem); + gimple_set_modified (stmt, true); + + if (dump_file && dump_flags & TDF_DETAILS) + { + fprintf (dump_file, "To scalar equivalent: "); + print_gimple_stmt (dump_file, stmt, 0); + fprintf (dump_file, "\n"); + } + } + } + } +} + /* Optimize the statement in block BB pointed to by iterator SI. We try to perform some simplistic global redundancy elimination and @@ -1933,6 +1993,11 @@ dom_opt_dom_walker::optimize_stmt (basic_block bb, gimple_stmt_iterator *si, update_stmt_if_modified (stmt); opt_stats.num_stmts++; + /* STMT may be a comparison of uniform vectors that we can simplify + down to a comparison of scalars. Do that transformation first + so that all the scalar optimizations from here onward apply. */ + reduce_vector_comparison_to_scalar_comparison (stmt); + /* Const/copy propagate into USES, VUSES and the RHS of VDEFs. */ cprop_into_stmt (stmt, m_evrp_range_analyzer);