diff mbox

[rs6000] Fix vec_construct vectorization cost to be somewhat more accurate

Message ID 1470847926.5480.10.camel@oc8801110288.ibm.com
State New
Headers show

Commit Message

Bill Schmidt Aug. 10, 2016, 4:52 p.m. UTC
Sorry for the long delay on getting back to this.  I took a look at the
suggested test cases with the cost model available, and did some SPEC
testing to validate the model.  I found that it is still important to
model the 4xfloat case separately to account for conversion from 64-bit
to 32-bit in our internal representation; correcting the calculation
there actually results in no net change, but the commentary is now
better.  The default cost of N-1 that Richard added in his patch applies
well to POWER also, except for V2DI and V2DF modes (N=2) where this
tends to undercount the cost and encourage unprofitable SLP
vectorization in some cases.  Establishing a minimum cost of 2 avoids
test suite regressions and produces acceptable SPEC results.  (So rather
than using N instead of N-1 as in the previous version of this patch,
I'm using N-1 with a floor of 2.)

I looked through gcc.dg/vect/slp-4[35].c with the cost model enabled,
and the results are sensible with these changes.  SPEC results were all
in the noise range.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
regressions.  Ok for trunk?

Thanks,
Bill


2016-08-10  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost):
	Correct costs for vec_construct.






On Mon, 2016-07-18 at 14:29 +0200, Richard Biener wrote:
> On Mon, Jul 18, 2016 at 1:56 PM, Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
> > Hi Bill,
> >
> > On Fri, Jul 15, 2016 at 08:55:08AM -0500, Bill Schmidt wrote:
> >> This patch is a follow-up to Richard's patch of
> >> https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00584.html.  The cost of a
> >> vec_construct (initialization of an N-way vector by N scalars) is too low,
> >> which can cause too-aggressive vectorization in particular for N=8 or
> >> higher.  Richard changed the default cost to N-1, which is generally
> >> sensible.  For powerpc I am going with a slightly higher cost of N, which
> >> will keep us from being less conservative than the previous values when N=2.
> >
> >> In any case, the purpose of this patch is simply to avoid vectorizing
> >> things we shouldn't when we've undercounted the cost of a vec_construct.
> >> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> >> regressions (hence the vectorization decisions in the test suite have
> >> not changed).  Is this ok for trunk?
> >
> > Do you also have a testcase where it does matter?  It would be good to
> > add that, then.  Or is it fixing a regression?
> >
> > I know nothing about the cost model, so someone else will have to review,
> > or I can just say "okay" ;-)
> 
> You can maybe look at gcc.dg/vect/slp-4[35].c (and run it with the cost model
> enabled).
> 
> Richard.
> 
> >
> > Segher
>

Comments

Segher Boessenkool Aug. 11, 2016, 10:40 p.m. UTC | #1
On Wed, Aug 10, 2016 at 11:52:06AM -0500, Bill Schmidt wrote:
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> regressions.  Ok for trunk?

Okay.  Thanks,


Segher


> 2016-08-10  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> 
> 	* config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost):
> 	Correct costs for vec_construct.
diff mbox

Patch

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 239310)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -5266,16 +5266,20 @@  rs6000_builtin_vectorization_cost (enum vect_cost_
         return 2;
 
       case vec_construct:
-	elements = TYPE_VECTOR_SUBPARTS (vectype);
+	/* This is a rough approximation assuming non-constant elements
+	   constructed into a vector via element insertion.  FIXME:
+	   vec_construct is not granular enough for uniformly good
+	   decisions.  If the initialization is a splat, this is
+	   cheaper than we estimate.  Improve this someday.  */
 	elem_type = TREE_TYPE (vectype);
 	/* 32-bit vectors loaded into registers are stored as double
-	   precision, so we need n/2 converts in addition to the usual
-	   n/2 merges to construct a vector of short floats from them.  */
+	   precision, so we need 2 permutes, 2 converts, and 1 merge
+	   to construct a vector of short floats from them.  */
 	if (SCALAR_FLOAT_TYPE_P (elem_type)
 	    && TYPE_PRECISION (elem_type) == 32)
-	  return elements + 1;
+	  return 5;
 	else
-	  return elements / 2 + 1;
+	  return max (2, TYPE_VECTOR_SUBPARTS (vectype) - 1);
 
       default:
         gcc_unreachable ();