middle-end/114070 - VEC_COND_EXPR folding

Message ID 20240229083505.9ACA41329E@imap2.dmz-prg2.suse.org
State New
Series middle-end/114070 - VEC_COND_EXPR folding

Commit Message

Richard Biener Feb. 29, 2024, 8:35 a.m. UTC
The following amends the PR114070 fix to optimistically allow
the folding when we cannot expand the current vec_cond using
vcond_mask and we're still before vector lowering.  This leaves
a small window between vectorization and lowering where we could
break vec_conds that can be expanded via vcond{,u,eq}.  Most
susceptible is the loop unrolling pass, which applies VN and thus
possibly folding to the unrolled body of a vectorized loop.
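
For readers less familiar with the pattern, the folding in question
rewrites (c ? a : b) op d into c ? (a op d) : (b op d).  The following
is a minimal lane-wise sketch in plain C, not GCC code: the lane count,
the choice of "<" as the comparison, and all function names are
assumptions made for illustration.  It shows that both forms compute
the same lanes, while the select in the folded form operates on
comparison results rather than on the original int lanes, which, as I
read the patch, is why a different vec_cond has to be expandable.

```c
#include <assert.h>
#include <stdbool.h>

#define LANES 4   /* assumed lane count, for illustration only */

/* Original form, (c ? a : b) < d: the select works on int lanes.  */
static void
compare_of_select (const bool *c, const int *a, const int *b,
                   const int *d, bool *out)
{
  for (int i = 0; i < LANES; i++)
    {
      int sel = c[i] ? a[i] : b[i];
      out[i] = sel < d[i];
    }
}

/* Folded form, c ? (a < d) : (b < d): the select now picks between
   two comparison results, i.e. between mask lanes.  */
static void
select_of_compare (const bool *c, const int *a, const int *b,
                   const int *d, bool *out)
{
  for (int i = 0; i < LANES; i++)
    {
      bool lhs = a[i] < d[i];
      bool rhs = b[i] < d[i];
      out[i] = c[i] ? lhs : rhs;
    }
}

/* Returns true when both forms agree on every lane.  */
static bool
forms_agree (const bool *c, const int *a, const int *b, const int *d)
{
  bool x[LANES], y[LANES];
  compare_of_select (c, a, b, d, x);
  select_of_compare (c, a, b, d, y);
  for (int i = 0; i < LANES; i++)
    if (x[i] != y[i])
      return false;
  return true;
}

/* Small usage example with made-up lane values.  */
static void
check_example (void)
{
  const bool c[LANES] = { true, false, true, false };
  const int a[LANES] = { 1, 2, 3, 4 };
  const int b[LANES] = { 5, 6, 7, 8 };
  const int d[LANES] = { 3, 3, 3, 9 };
  assert (forms_agree (c, a, b, d));
}
```

The two forms are semantically equivalent; the question the patch
deals with is only whether the target can expand the resulting
vec_cond.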

This gets back the folding for targets that cannot do vectorization.
It doesn't get back the folding for x86 with AVX512, for example,
since that can handle the original IL but not the folded IL, for
which it is missing some vcond_mask expanders.
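
The guard in the patch can be modelled as a plain boolean function.
This is only a sketch: the function and parameter names are
hypothetical, and each parameter stands in for one of the predicates
used in match.pd (folded_expandable for expand_vec_cond_expr_p on
"type", original_expandable for the same query on TREE_TYPE (@1),
before_lowering for optimize_vectors_before_lowering_p ()).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the patched condition; not GCC code.  */
static bool
fold_allowed (bool op_is_comparison, bool result_type_matches,
              bool folded_expandable, bool before_lowering,
              bool original_expandable)
{
  return !op_is_comparison
         || result_type_matches
         || folded_expandable
         /* New with this patch: fold optimistically if the current
            vec_cond is not expandable anyway and lowering still runs.  */
         || (before_lowering && !original_expandable);
}

/* The scenarios from the description, for a comparison op whose
   result type differs from the vec_cond operand type.  */
static void
check_scenarios (void)
{
  /* Neither form expandable, before lowering: folding comes back.  */
  assert (fold_allowed (true, false, false, true, false));
  /* Original expandable, folded not (the AVX512 case described
     above): folding stays disabled.  */
  assert (!fold_allowed (true, false, false, true, true));
  /* After lowering nothing can fix things up: no folding.  */
  assert (!fold_allowed (true, false, false, false, false));
}
```

Non-comparison ops are unaffected by the new clause, since the first
disjunct already allows them.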

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

As said, for stage1 I want to move vector lowering before vectorization.
While I'm not entirely happy with this patch, it pushes us in the
right direction: getting vcond_mask and vcmp{,u,eq} patterns
implemented.  We could use canonicalize_math_p () to close the
vectorizer -> vector lowering gap but this only works when that
pass is run (not with -Og or when disabled).  We could add a new
PROP_vectorizer_il and disable the folding if the vectorizer ran.

Or we could simply live with the regression.

Any preferences?

Thanks,
Richard.

	PR middle-end/114070
	* match.pd ((c ? a : b) op d  -->  c ? (a op d) : (b op d)):
	Allow the folding if before lowering and the current IL
	isn't supported with vcond_mask.
---
 gcc/match.pd | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

Comments

Jeff Law March 3, 2024, 5:02 p.m. UTC | #1
On 2/29/24 01:35, Richard Biener wrote:
> The following amends the PR114070 fix to optimistically allow
> the folding when we cannot expand the current vec_cond using
> vcond_mask and we're still before vector lowering.  This leaves
> a small window between vectorization and lowering where we could
> break vec_conds that can be expanded via vcond{,u,eq}.  Most
> susceptible is the loop unrolling pass, which applies VN and thus
> possibly folding to the unrolled body of a vectorized loop.
> 
> This gets back the folding for targets that cannot do vectorization.
> It doesn't get back the folding for x86 with AVX512, for example,
> since that can handle the original IL but not the folded IL, for
> which it is missing some vcond_mask expanders.
> 
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> 
> As said, for stage1 I want to move vector lowering before vectorization.
> While I'm not entirely happy with this patch, it pushes us in the
> right direction: getting vcond_mask and vcmp{,u,eq} patterns
> implemented.  We could use canonicalize_math_p () to close the
> vectorizer -> vector lowering gap but this only works when that
> pass is run (not with -Og or when disabled).  We could add a new
> PROP_vectorizer_il and disable the folding if the vectorizer ran.
> 
> Or we could simply live with the regression.
> 
> Any preferences?
Not really.  As I think I said, I consider the regression insignificant
and I could certainly live with it.

jeff

Patch

diff --git a/gcc/match.pd b/gcc/match.pd
index f3fffd8dec2..4edba7c84fb 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -5153,7 +5153,13 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (op (vec_cond:s @0 @1 @2) (vec_cond:s @0 @3 @4))
   (if (TREE_CODE_CLASS (op) != tcc_comparison
        || types_match (type, TREE_TYPE (@1))
-       || expand_vec_cond_expr_p (type, TREE_TYPE (@0), ERROR_MARK))
+       || expand_vec_cond_expr_p (type, TREE_TYPE (@0), ERROR_MARK)
+       || (optimize_vectors_before_lowering_p ()
+	   /* The following is optimistic on the side of non-support, we are
+	      missing the legacy vcond{,u,eq} cases.  Do this only when
+	      lowering will be able to fix it up.  */
+	   && !expand_vec_cond_expr_p (TREE_TYPE (@1),
+				       TREE_TYPE (@0), ERROR_MARK)))
    (vec_cond @0 (op! @1 @3) (op! @2 @4))))
 
 /* (c ? a : b) op d  -->  c ? (a op d) : (b op d) */
@@ -5161,13 +5167,19 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (op (vec_cond:s @0 @1 @2) @3)
   (if (TREE_CODE_CLASS (op) != tcc_comparison
        || types_match (type, TREE_TYPE (@1))
-       || expand_vec_cond_expr_p (type, TREE_TYPE (@0), ERROR_MARK))
+       || expand_vec_cond_expr_p (type, TREE_TYPE (@0), ERROR_MARK)
+       || (optimize_vectors_before_lowering_p ()
+	   && !expand_vec_cond_expr_p (TREE_TYPE (@1),
+				       TREE_TYPE (@0), ERROR_MARK)))
    (vec_cond @0 (op! @1 @3) (op! @2 @3))))
  (simplify
   (op @3 (vec_cond:s @0 @1 @2))
   (if (TREE_CODE_CLASS (op) != tcc_comparison
        || types_match (type, TREE_TYPE (@1))
-       || expand_vec_cond_expr_p (type, TREE_TYPE (@0), ERROR_MARK))
+       || expand_vec_cond_expr_p (type, TREE_TYPE (@0), ERROR_MARK)
+       || (optimize_vectors_before_lowering_p ()
+	   && !expand_vec_cond_expr_p (TREE_TYPE (@1),
+				       TREE_TYPE (@0), ERROR_MARK)))
    (vec_cond @0 (op! @3 @1) (op! @3 @2)))))
 
 #if GIMPLE