diff mbox series

forwprop: Tweak choice of VEC_PERM_EXPR filler [PR92822]

Message ID mpttv4hgc3k.fsf@arm.com
State New
Headers show
Series forwprop: Tweak choice of VEC_PERM_EXPR filler [PR92822] | expand

Commit Message

Richard Sandiford Jan. 27, 2020, 3:20 p.m. UTC
For the 2s failures in the PR, we have a V4SF VEC_PERM_EXPR in
which the first two elements are duplicates of one element and
the other two are don't-care:

    v4sf_b = VEC_PERM_EXPR <v4sf_a, v4sf_a, { 1, 1, ?, ? }>;

The heuristic was to extend this with a blend:

    v4sf_b = VEC_PERM_EXPR <v4sf_a, v4sf_a, { 1, 1, 2, 3 }>;

but it seems better to extend a partial duplicate to a full duplicate:

    v4sf_b = VEC_PERM_EXPR <v4sf_a, v4sf_a, { 1, 1, 1, 1 }>;

Obviously this is still just a heuristic though.

I wondered whether to restrict this to two elements or more
but couldn't find any examples in which it made a difference.
Either way should be fine for the purposes of fixing this PR.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  Richard mentioned
in the PR that he had a different fix in mind, but since I'd tested
this overnight, I thought I might as well post it anyway as a possible
belt-and-braces fix.  OK to install?

Richard


2020-01-27  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	PR tree-optimization/92822
	* tree-ssa-forwprop.c (simplify_vector_constructor): When filling
	out the don't-care elements of a vector whose significant elements
	are duplicates, make the don't-care elements duplicates too.
---
 gcc/tree-ssa-forwprop.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

Comments

Richard Biener Jan. 28, 2020, 8:46 a.m. UTC | #1
On Mon, 27 Jan 2020, Richard Sandiford wrote:

> For the 2s failures in the PR, we have a V4SF VEC_PERM_EXPR in
> which the first two elements are duplicates of one element and
> the other two are don't-care:
> 
>     v4sf_b = VEC_PERM_EXPR <v4sf_a, v4sf_a, { 1, 1, ?, ? }>;
> 
> The heuristic was to extend this with a blend:
> 
>     v4sf_b = VEC_PERM_EXPR <v4sf_a, v4sf_a, { 1, 1, 2, 3 }>;
> 
> but it seems better to extend a partial duplicate to a full duplicate:
> 
>     v4sf_b = VEC_PERM_EXPR <v4sf_a, v4sf_a, { 1, 1, 1, 1 }>;
> 
> Obviously this is still just a heuristic though.
> 
> I wondered whether to restrict this to two elements or more
> but couldn't find any examples in which it made a difference.
> Either way should be fine for the purposes of fixing this PR.
> 
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  Richard mentioned
> in the PR that he had a different fix in mind, but since I'd tested
> this overnight, I thought I might as well post it anyway as a possible
> belt-and-braces fix.  OK to install?

OK.

It's a sound heuristic - IIRC I played with some form of this
(not sure if I tried to restrict it to all-same elts) and for
some cases it was worse on x86.  On x86 there's a splat but
obviously not from a random vector lane so if one wants to use
that it's an extract (move to lane zero) and then splat.

Anyway, if we have particular cases that regress with this
we want testsuite coverage.  [not sure how difficult it is
to cover all cases up to N elements with some scripting...  looking
over the assembly diff would be not fun in any case...]

Thanks,
Richard.

> Richard
> 
> 
> 2020-01-27  Richard Sandiford  <richard.sandiford@arm.com>
> 
> gcc/
> 	PR tree-optimization/92822
> 	* tree-ssa-forwprop.c (simplify_vector_constructor): When filling
> 	out the don't-care elements of a vector whose significant elements
> 	are duplicates, make the don't-care elements duplicates too.
> ---
>  gcc/tree-ssa-forwprop.c | 22 ++++++++++++++++------
>  1 file changed, 16 insertions(+), 6 deletions(-)
> 
> diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
> index d63e87c8a5b..5203891950a 100644
> --- a/gcc/tree-ssa-forwprop.c
> +++ b/gcc/tree-ssa-forwprop.c
> @@ -2455,16 +2455,26 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
>  	 it and its source indexes to make the permutation supported.
>  	 For now it mimics a blend.  */
>        vec_perm_builder sel (refnelts, refnelts, 1);
> +      bool all_same_p = true;
>        for (i = 0; i < elts.length (); ++i)
> -	sel.quick_push (elts[i].second + elts[i].first * refnelts);
> +	{
> +	  sel.quick_push (elts[i].second + elts[i].first * refnelts);
> +	  all_same_p &= known_eq (sel[i], sel[0]);
> +	}
>        /* And fill the tail with "something".  It's really don't care,
>           and ideally we'd allow VEC_PERM to have a smaller destination
> -	 vector.  As heuristic try to preserve a uniform orig[0] which
> -	 facilitates later pattern-matching VEC_PERM_EXPR to a
> -	 BIT_INSERT_EXPR.  */
> +	 vector.  As a heuristic:
> +
> +	 (a) if what we have so far duplicates a single element, make the
> +	     tail do the same
> +
> +	 (b) otherwise preserve a uniform orig[0].  This facilitates
> +	     later pattern-matching of VEC_PERM_EXPR to a BIT_INSERT_EXPR.  */
>        for (; i < refnelts; ++i)
> -	sel.quick_push ((elts[0].second == 0 && elts[0].first == 0
> -			 ? 0 : refnelts) + i);
> +	sel.quick_push (all_same_p
> +			? sel[0]
> +			: (elts[0].second == 0 && elts[0].first == 0
> +			   ? 0 : refnelts) + i);
>        vec_perm_indices indices (sel, orig[1] ? 2 : 1, refnelts);
>        if (!can_vec_perm_const_p (TYPE_MODE (perm_type), indices))
>  	return false;
diff mbox series

Patch

diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
index d63e87c8a5b..5203891950a 100644
--- a/gcc/tree-ssa-forwprop.c
+++ b/gcc/tree-ssa-forwprop.c
@@ -2455,16 +2455,26 @@  simplify_vector_constructor (gimple_stmt_iterator *gsi)
 	 it and its source indexes to make the permutation supported.
 	 For now it mimics a blend.  */
       vec_perm_builder sel (refnelts, refnelts, 1);
+      bool all_same_p = true;
       for (i = 0; i < elts.length (); ++i)
-	sel.quick_push (elts[i].second + elts[i].first * refnelts);
+	{
+	  sel.quick_push (elts[i].second + elts[i].first * refnelts);
+	  all_same_p &= known_eq (sel[i], sel[0]);
+	}
       /* And fill the tail with "something".  It's really don't care,
          and ideally we'd allow VEC_PERM to have a smaller destination
-	 vector.  As heuristic try to preserve a uniform orig[0] which
-	 facilitates later pattern-matching VEC_PERM_EXPR to a
-	 BIT_INSERT_EXPR.  */
+	 vector.  As a heuristic:
+
+	 (a) if what we have so far duplicates a single element, make the
+	     tail do the same
+
+	 (b) otherwise preserve a uniform orig[0].  This facilitates
+	     later pattern-matching of VEC_PERM_EXPR to a BIT_INSERT_EXPR.  */
       for (; i < refnelts; ++i)
-	sel.quick_push ((elts[0].second == 0 && elts[0].first == 0
-			 ? 0 : refnelts) + i);
+	sel.quick_push (all_same_p
+			? sel[0]
+			: (elts[0].second == 0 && elts[0].first == 0
+			   ? 0 : refnelts) + i);
       vec_perm_indices indices (sel, orig[1] ? 2 : 1, refnelts);
       if (!can_vec_perm_const_p (TYPE_MODE (perm_type), indices))
 	return false;