diff mbox

[rs6000] Fix vec_construct vectorization cost to be somewhat more accurate

Message ID 1b21afb4-a971-a95d-1084-53948c9c7f4c@linux.vnet.ibm.com
State New
Headers show

Commit Message

Bill Schmidt July 15, 2016, 1:55 p.m. UTC
Hi,

This patch is a follow-up to Richard's patch of
https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00584.html.  The cost of a
vec_construct (initialization of an N-way vector by N scalars) is too low,
which can cause too-aggressive vectorization in particular for N=8 or
higher.  Richard changed the default cost to N-1, which is generally
sensible.  For powerpc I am going with a slightly higher cost of N, which
will keep us from being less conservative than the previous values when N=2.

The whole cost model for powerpc needs more work (in particular we need
to distinguish among processor models), but that's beyond the scope of
this patch.  One thing that I've called out in the comments is that a
vec_construct can have wildly different costs depending on the scalar
elements.  If they are all the same small constant, then we only need
a single splat-immediate instruction; but for V4SF the cost is potentially
higher because of the need to do converts.  For the splat case, we might
want to teach the vectorizer in general to estimate the cost as just
a vector_stmt rather than a vec_construct, but that requires some target
knowledge of which constants can be duplicated with a splat-immediate.

In any case, the purpose of this patch is simply to avoid vectorizing
things we shouldn't when we've undercounted the cost of a vec_construct.
Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
regressions (hence the vectorization decisions in the test suite have
not changed).  Is this ok for trunk?

Thanks,
Bill


2016-07-15  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost):
	Improve vec_construct estimate.

Comments

Richard Biener July 18, 2016, 11:12 a.m. UTC | #1
On Fri, Jul 15, 2016 at 3:55 PM, Bill Schmidt
<wschmidt@linux.vnet.ibm.com> wrote:
> Hi,
>
> This patch is a follow-up to Richard's patch of
> https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00584.html.  The cost of a
> vec_construct (initialization of an N-way vector by N scalars) is too low,
> which can cause too-aggressive vectorization in particular for N=8 or
> higher.  Richard changed the default cost to N-1, which is generally
> sensible.  For powerpc I am going with a slightly higher cost of N, which
> will keep us from being less conservative than the previous values when N=2.
>
> The whole cost model for powerpc needs more work (in particular we need
> to distinguish among processor models), but that's beyond the scope of
> this patch.  One thing that I've called out in the comments is that a
> vec_construct can have wildly different costs depending on the scalar
> elements.  If they are all the same small constant, then we only need
> a single splat-immediate instruction; but for V4SF the cost is potentially
> higher because of the need to do converts.  For the splat case, we might
> want to teach the vectorizer in general to estimate the cost as just
> a vector_stmt rather than a vec_construct, but that requires some target
> knowledge of which constants can be duplicated with a splat-immediate.
>
> In any case, the purpose of this patch is simply to avoid vectorizing
> things we shouldn't when we've undercounted the cost of a vec_construct.
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> regressions (hence the vectorization decisions in the test suite have
> not changed).  Is this ok for trunk?

Note that most of the vectorizer testsuite is running with -fno-vect-cost-model,
only the costmodel tests are running with the cost model enabled.

Richard.

> Thanks,
> Bill
>
>
> 2016-07-15  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost):
>         Improve vec_construct estimate.
>
>
> Index: gcc/config/rs6000/rs6000.c
> ===================================================================
> --- gcc/config/rs6000/rs6000.c  (revision 238312)
> +++ gcc/config/rs6000/rs6000.c  (working copy)
> @@ -5138,7 +5138,6 @@ rs6000_builtin_vectorization_cost (enum vect_cost_
>                                     tree vectype, int misalign)
>  {
>    unsigned elements;
> -  tree elem_type;
>
>    switch (type_of_cost)
>      {
> @@ -5245,16 +5244,16 @@ rs6000_builtin_vectorization_cost (enum vect_cost_
>          return 2;
>
>        case vec_construct:
> -       elements = TYPE_VECTOR_SUBPARTS (vectype);
> -       elem_type = TREE_TYPE (vectype);
> -       /* 32-bit vectors loaded into registers are stored as double
> -          precision, so we need n/2 converts in addition to the usual
> -          n/2 merges to construct a vector of short floats from them.  */
> -       if (SCALAR_FLOAT_TYPE_P (elem_type)
> -           && TYPE_PRECISION (elem_type) == 32)
> -         return elements + 1;
> -       else
> -         return elements / 2 + 1;
> +       /* This is a rough approximation assuming non-constant elements
> +          constructed into a vector via element insertion.  FIXME:
> +          vec_construct is not granular enough for uniformly good
> +          decisions.  If the initialization is a splat, this is
> +          cheaper than we estimate.  If we want to form four SF
> +          values into a vector, it's more expensive (we need to
> +          copy the four elements into two vector registers,
> +          perform two conversions to single precision, and merge
> +          the two result vectors).  Improve this someday.  */
> +       return TYPE_VECTOR_SUBPARTS (vectype);
>
>        default:
>          gcc_unreachable ();
>
Segher Boessenkool July 18, 2016, 11:56 a.m. UTC | #2
Hi Bill,

On Fri, Jul 15, 2016 at 08:55:08AM -0500, Bill Schmidt wrote:
> This patch is a follow-up to Richard's patch of
> https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00584.html.  The cost of a
> vec_construct (initialization of an N-way vector by N scalars) is too low,
> which can cause too-aggressive vectorization in particular for N=8 or
> higher.  Richard changed the default cost to N-1, which is generally
> sensible.  For powerpc I am going with a slightly higher cost of N, which
> will keep us from being less conservative than the previous values when N=2.

> In any case, the purpose of this patch is simply to avoid vectorizing
> things we shouldn't when we've undercounted the cost of a vec_construct.
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> regressions (hence the vectorization decisions in the test suite have
> not changed).  Is this ok for trunk?

Do you also have a testcase where it does matter?  It would be good to
add that, then.  Or is it fixing a regression?

I know nothing about the cost model, so someone else will have to review,
or I can just say "okay" ;-)


Segher
Richard Biener July 18, 2016, 12:29 p.m. UTC | #3
On Mon, Jul 18, 2016 at 1:56 PM, Segher Boessenkool
<segher@kernel.crashing.org> wrote:
> Hi Bill,
>
> On Fri, Jul 15, 2016 at 08:55:08AM -0500, Bill Schmidt wrote:
>> This patch is a follow-up to Richard's patch of
>> https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00584.html.  The cost of a
>> vec_construct (initialization of an N-way vector by N scalars) is too low,
>> which can cause too-aggressive vectorization in particular for N=8 or
>> higher.  Richard changed the default cost to N-1, which is generally
>> sensible.  For powerpc I am going with a slightly higher cost of N, which
>> will keep us from being less conservative than the previous values when N=2.
>
>> In any case, the purpose of this patch is simply to avoid vectorizing
>> things we shouldn't when we've undercounted the cost of a vec_construct.
>> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
>> regressions (hence the vectorization decisions in the test suite have
>> not changed).  Is this ok for trunk?
>
> Do you also have a testcase where it does matter?  It would be good to
> add that, then.  Or is it fixing a regression?
>
> I know nothing about the cost model, so someone else will have to review,
> or I can just say "okay" ;-)

You can maybe look at gcc.dg/vect/slp-4[35].c (and run it with the cost model
enabled).

Richard.

>
> Segher
diff mbox

Patch

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 238312)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -5138,7 +5138,6 @@  rs6000_builtin_vectorization_cost (enum vect_cost_
                                    tree vectype, int misalign)
 {
   unsigned elements;
-  tree elem_type;
 
   switch (type_of_cost)
     {
@@ -5245,16 +5244,16 @@  rs6000_builtin_vectorization_cost (enum vect_cost_
         return 2;
 
       case vec_construct:
-	elements = TYPE_VECTOR_SUBPARTS (vectype);
-	elem_type = TREE_TYPE (vectype);
-	/* 32-bit vectors loaded into registers are stored as double
-	   precision, so we need n/2 converts in addition to the usual
-	   n/2 merges to construct a vector of short floats from them.  */
-	if (SCALAR_FLOAT_TYPE_P (elem_type)
-	    && TYPE_PRECISION (elem_type) == 32)
-	  return elements + 1;
-	else
-	  return elements / 2 + 1;
+	/* This is a rough approximation assuming non-constant elements
+	   constructed into a vector via element insertion.  FIXME:
+	   vec_construct is not granular enough for uniformly good
+	   decisions.  If the initialization is a splat, this is
+	   cheaper than we estimate.  If we want to form four SF
+	   values into a vector, it's more expensive (we need to
+	   copy the four elements into two vector registers,
+	   perform two conversions to single precision, and merge
+	   the two result vectors).  Improve this someday.  */
+	return TYPE_VECTOR_SUBPARTS (vectype);
 
       default:
         gcc_unreachable ();