diff mbox series

[SLP] SLP vectorization: vectorize vector constructors

Message ID 6b66b279-eba2-80be-58f0-f72dc2316fcb@arm.com
State New
Headers show
Series [SLP] SLP vectorization: vectorize vector constructors | expand

Commit Message

Joel Hutton Oct. 30, 2019, 1:08 p.m. UTC
On 15/10/2019 13:11, Richard Biener wrote:
 >>  > You miss to check that CONSTRUCTOR_NELTS == TYPE_VECTOR_SUBPARTS
 >>  > (we can have omitted trailing zeros).
 >
 > ^^^
 >
 > I don't see this being handled?  You give up on non-SSA names
 > but not on the omitted trailing zeros.

I had thought checking the number of vectors produced would work. I've 
added that check.
I'm slightly confused about what should be done for non-SSA names. 
There's no scalar stmt to gather for a constant in a vector constructor.

 >
 > You build a CONSTRUCTOR of vectors, thus
 >
 > _orig_ssa_1 = { vect_part1_2, vect_part2_3, ... };
I've added code to do this, and a testcase which triggers it.

 >
 > +
 > +         if (constructor)
 > +           {
 > +             SLP_INSTANCE_ROOT_STMT (new_instance) = stmt_info->stmt;
 > +           }
 > +         else
 > +           SLP_INSTANCE_ROOT_STMT (new_instance) = NULL;
 > +
 >
 > too much vertical space, no {} around single-stmt if clauses
Fixed.

 >
 >
 > @@ -2725,6 +2760,10 @@ vect_bb_slp_scalar_cost (basic_block bb,
 >                 stmt_vec_info use_stmt_info = vinfo->lookup_stmt
 > (use_stmt);
 >                 if (!use_stmt_info || !PURE_SLP_STMT (use_stmt_info))
 >                   {
 > +                   /* Check this is not a constructor that will be
 > vectorized
 > +                      away.  */
 > +                   if (BB_VINFO_GROUPED_STORES (vinfo).contains
 > (use_stmt_info))
 > +                       continue;
 >
 > hmm, so why not set the slp type on SLP_INSTANCE_ROOT_STMT instead?
 > In theory the stmt should be part of the SLP tree itself but that's
 > probably too awkward to be made work at the moment ;)
I did try this, but it was indeed very awkward to be made to work.

 >
 > vect_ssa_use_outside_bb and vect_slp_check_for_constructors are new
 > functions which need comments.
Fixed. I've also taken the 'vectorize the root' out into a separate 
function.

 >
 > +  /* For vector constructors, the same SSA name must be used to 
maintain
 > data
 > +     flow into other basic blocks.  */
 > +  if (instance->root == node && SLP_INSTANCE_ROOT_STMT (instance)
 > +      && SLP_TREE_NUMBER_OF_VEC_STMTS (node) == 1
 > +      && SLP_TREE_VEC_STMTS (node).exists ())
 > +    {
 >
 > it should read
 >
 >   /* Vectorize the instance root.  */
 >
 > and be in vect_schedule_slp after the vect_schedule_slp_instance.
 > As said above you need to handle SLP_TREE_NUMBER_OF_VEC_STMTS > 1,
 > you also cannot simply do "nothing" here, "failing" vectorization
 > (well, you probably can - DCE will remove the vectorized code - but
 > at least if there were calls in the SLP tree they will be mangled
 > by vectorization so the scalar code is wrecked).  SO it should be
 >
 >  if (SLP_INSTANCE_ROOT_STMT (instance))
 >    .. you may not fail to replace the scalar root stmt here ..
 >
So what should be done in the case that CONSTRUCTOR_NELTS < 
TYPE_VECTOR_SUBPARTS?

 > +         if (CONSTRUCTOR_NELTS (rhs) == 0)
 > +           vectorizable = false;
 > +
 >
 > if you use continue; you can elide the 'vectorizable' variable.
Done.

 >
 > +         if (!vect_ssa_use_outside_bb (gimple_assign_lhs (stmt)))
 > +           vectorizable = false;
 > +
 >
 > why's that?  no comments that clarify ;)  The vector may be
 > used as argument to a call or as source of a store.  So I'd simply
 > remove this check (and the function).

Done. The thinking was that if the vector was used as a source of a 
store the SLP tree would be built from the grouped store instead. Will 
it not cause problems if both the grouped store and the vector 
constructor are used to build SLP trees?



2019-10-10  Joel Hutton  <Joel.Hutton@arm.com>

         * expr.c (store_constructor): Add case for constructor of vectors.
         * tree-vect-slp.c (vect_analyze_slp_instance): Add case for vector constructors.
         (vect_bb_slp_scalar_cost): Likewise.
         (vect_slp_check_for_constructors): New function.
         (vect_slp_analyze_bb_1): Add check for vector constructors.
         (vect_schedule_slp_instance): Add case to fixup vector constructor stmt.
         (vectorize_slp_instance_root_stmt): New function
         * tree-vectorizer.h (SLP_INSTANCE_ROOT_STMT): New field.

gcc/testsuite/ChangeLog:

2019-10-10  Joel Hutton  <Joel.Hutton@arm.com>

         * gcc.dg/vect/bb-slp-40.c: New test.
         * gcc.dg/vect/bb-slp-41.c: New test.

Comments

Richard Biener Oct. 30, 2019, 1:49 p.m. UTC | #1
On Wed, 30 Oct 2019, Joel Hutton wrote:

> On 15/10/2019 13:11, Richard Biener wrote:
>  >>  > You miss to check that CONSTRUCTOR_NELTS == TYPE_VECTOR_SUBPARTS
>  >>  > (we can have omitted trailing zeros).
>  >
>  > ^^^
>  >
>  > I don't see this being handled?  You give up on non-SSA names
>  > but not on the omitted trailing zeros.
> 
> I had thought checking the number of vectors produced would work.

But that's too late to fail.

> I've added that check.
> I'm slightly confused about what should be done for non-SSA names. 
> There's no scalar stmt to gather for a constant in a vector constructor.

For now simply give up.  What could be done is "compact" the
operands to vectorize to only contain SSA names, try to SLP
match and vectorize them and then combine the vectorized result
with the constant elements.

Not worth doing I guess unless it's sth regular like

 { a, b, c, d, 0, 0, 0, 0 }

or so.  But this can be handled as followup if necessary.

>  >
>  > You build a CONSTRUCTOR of vectors, thus
>  >
>  > _orig_ssa_1 = { vect_part1_2, vect_part2_3, ... };
> I've added code to do this, and a testcase which triggers it.

Great.

>  >
>  > +
>  > +         if (constructor)
>  > +           {
>  > +             SLP_INSTANCE_ROOT_STMT (new_instance) = stmt_info->stmt;
>  > +           }
>  > +         else
>  > +           SLP_INSTANCE_ROOT_STMT (new_instance) = NULL;
>  > +
>  >
>  > too much vertical space, no {} around single-stmt if clauses
> Fixed.
> 
>  >
>  >
>  > @@ -2725,6 +2760,10 @@ vect_bb_slp_scalar_cost (basic_block bb,
>  >                 stmt_vec_info use_stmt_info = vinfo->lookup_stmt
>  > (use_stmt);
>  >                 if (!use_stmt_info || !PURE_SLP_STMT (use_stmt_info))
>  >                   {
>  > +                   /* Check this is not a constructor that will be
>  > vectorized
>  > +                      away.  */
>  > +                   if (BB_VINFO_GROUPED_STORES (vinfo).contains
>  > (use_stmt_info))
>  > +                       continue;
>  >
>  > hmm, so why not set the slp type on SLP_INSTANCE_ROOT_STMT instead?
>  > In theory the stmt should be part of the SLP tree itself but that's
>  > probably too awkward to be made work at the moment ;)
> I did try this, but it was indeed very awkward to be made to work.
>
>  >
>  > vect_ssa_use_outside_bb and vect_slp_check_for_constructors are new
>  > functions which need comments.
> Fixed. I've also taken the 'vectorize the root' out into a separate 
> function.
> 
>  >
>  > +  /* For vector constructors, the same SSA name must be used to 
> maintain
>  > data
>  > +     flow into other basic blocks.  */
>  > +  if (instance->root == node && SLP_INSTANCE_ROOT_STMT (instance)
>  > +      && SLP_TREE_NUMBER_OF_VEC_STMTS (node) == 1
>  > +      && SLP_TREE_VEC_STMTS (node).exists ())
>  > +    {
>  >
>  > it should read
>  >
>  >   /* Vectorize the instance root.  */
>  >
>  > and be in vect_schedule_slp after the vect_schedule_slp_instance.
>  > As said above you need to handle SLP_TREE_NUMBER_OF_VEC_STMTS > 1,
>  > you also cannot simply do "nothing" here, "failing" vectorization
>  > (well, you probably can - DCE will remove the vectorized code - but
>  > at least if there were calls in the SLP tree they will be mangled
>  > by vectorization so the scalar code is wrecked).  SO it should be
>  >
>  >  if (SLP_INSTANCE_ROOT_STMT (instance))
>  >    .. you may not fail to replace the scalar root stmt here ..
>  >
> So what should be done in the case that CONSTRUCTOR_NELTS < 
> TYPE_VECTOR_SUBPARTS?

You have to fail SLP discovery, not code generation.  The check
you did above should have done this.
 
>  > +         if (CONSTRUCTOR_NELTS (rhs) == 0)
>  > +           vectorizable = false;
>  > +
>  >
>  > if you use continue; you can elide the 'vectorizable' variable.
> Done.
> 
>  >
>  > +         if (!vect_ssa_use_outside_bb (gimple_assign_lhs (stmt)))
>  > +           vectorizable = false;
>  > +
>  >
>  > why's that?  no comments that clarify ;)  The vector may be
>  > used as argument to a call or as source of a store.  So I'd simply
>  > remove this check (and the function).
> 
> Done. The thinking was that if the vector was used as a source of a 
> store the SLP tree would be built from the grouped store instead. Will 
> it not cause problems if both the grouped store and the vector 
> constructor are used to build SLP trees?
> 
> 
> 
> 2019-10-10  Joel Hutton  <Joel.Hutton@arm.com>
> 
>          * expr.c (store_constructor): Add case for constructor of vectors.

Why do you need this?  The vectorizer already creates such CTORs.  Any
testcase that you can show?

>          * tree-vect-slp.c (vect_analyze_slp_instance): Add case for vector constructors.
>          (vect_bb_slp_scalar_cost): Likewise.
>          (vect_slp_check_for_constructors): New function.
>          (vect_slp_analyze_bb_1): Add check for vector constructors.
>          (vect_schedule_slp_instance): Add case to fixup vector constructor stmt.
>          (vectorize_slp_instance_root_stmt): New function
>          * tree-vectorizer.h (SLP_INSTANCE_ROOT_STMT): New field.

+         SLP_INSTANCE_ROOT_STMT (new_instance) = constructor ? 
stmt_info->stmt\
+                                                 : NULL;

SLP_INSTANCE_ROOT_STMT should be a stmt_vec_info I guess...

@@ -2801,6 +2830,10 @@ vect_bb_slp_scalar_cost (basic_block bb,
                stmt_vec_info use_stmt_info = vinfo->lookup_stmt 
(use_stmt);
                if (!use_stmt_info || !PURE_SLP_STMT (use_stmt_info))
                  {
+                   /* Check this is not a constructor that will be 
vectorized
+                      away.  */
+                   if (BB_VINFO_GROUPED_STORES (vinfo).contains 
(use_stmt_info))
+                       continue;
                    (*life)[i] = true;

... which you then simply mark as PURE_SLP_STMT where we call
vect_mark_slp_stmts in vect_slp_analyze_bb_1.

I see you have the TYPE_VECTOR_SUBPARTS check still at transform
stage in vectorize_slp_instance_root_stmt, please simply move
it to vect_slp_check_for_constructors or to vect_analyze_slp_instance
where you have the other rejects (non-SSA names in the ctor).

That is, vectorize_slp_instance_root_stmt may not fail.

+bool
+vectorize_slp_instance_root_stmt (slp_tree node, slp_instance instance)
+{
+

extra newline

+  if (SLP_TREE_NUMBER_OF_VEC_STMTS (node) == 1)

the stmt replacing can be commonized between == 1 and > 1 cases.

   FOR_EACH_VEC_ELT (slp_instances, i, instance)
     {
+      slp_tree node = SLP_INSTANCE_TREE (instance);
       /* Schedule the tree of INSTANCE.  */
-      vect_schedule_slp_instance (SLP_INSTANCE_TREE (instance),
+      vect_schedule_slp_instance (node,
                                  instance, bst_map);
+
+      /* Vectorize the instance root.  */
+      if (instance->root == node && SLP_INSTANCE_ROOT_STMT (instance)
+         && SLP_TREE_VEC_STMTS (node).exists ())
+       if (!vectorize_slp_instance_root_stmt (node, instance))
+         {

instance->root == node is always true.  Likewise
SLP_TREE_VEC_STMTS (node).exists ().

@@ -4061,15 +4201,42 @@ vect_schedule_slp (vec_info *vinfo)
       if (is_a <loop_vec_info> (vinfo))

the ChangeLog doesn't mention this.  I guess this isn't necessary
unless you fail vectorizing late which you shouldn't.

diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 
56be28b0cc5a77412f996e70636b08d5b615813e..9f8419e4208b7d438ace41892022f93ebcadd019 
100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -151,6 +151,10 @@ public:
   /* The root of SLP tree.  */
   slp_tree root;

+  /* For vector constructors, the constructor stmt that the SLP tree is 
built
+     from, NULL otherwise.  */
+  gimple *root_stmt;

as said, make this a stmt_vec_info

Thanks,
Richard.


> gcc/testsuite/ChangeLog:
> 
> 2019-10-10  Joel Hutton  <Joel.Hutton@arm.com>
> 
>          * gcc.dg/vect/bb-slp-40.c: New test.
>          * gcc.dg/vect/bb-slp-41.c: New test.
> 
>
Joel Hutton Oct. 30, 2019, 2:19 p.m. UTC | #2
On 30/10/2019 13:49, Richard Biener wrote:
> Why do you need this?  The vectorizer already creates such CTORs.  Any
> testcase that you can show?

typedef long v2di __attribute__((vector_size(16)));
v2di v;
void
foo()
{
   v = (v2di){v[1], v[0]};
}


>>           * tree-vect-slp.c (vect_analyze_slp_instance): Add case for vector constructors.
>>           (vect_bb_slp_scalar_cost): Likewise.
>>           (vect_slp_check_for_constructors): New function.
>>           (vect_slp_analyze_bb_1): Add check for vector constructors.
>>           (vect_schedule_slp_instance): Add case to fixup vector constructor stmt.
>>           (vectorize_slp_instance_root_stmt): New function
>>           * tree-vectorizer.h (SLP_INSTANCE_ROOT_STMT): New field.
> +         SLP_INSTANCE_ROOT_STMT (new_instance) = constructor ?
> stmt_info->stmt\
> +                                                 : NULL;
>
> SLP_INSTANCE_ROOT_STMT should be a stmt_vec_info I guess...
>
> @@ -2801,6 +2830,10 @@ vect_bb_slp_scalar_cost (basic_block bb,
>                  stmt_vec_info use_stmt_info = vinfo->lookup_stmt
> (use_stmt);
>                  if (!use_stmt_info || !PURE_SLP_STMT (use_stmt_info))
>                    {
> +                   /* Check this is not a constructor that will be
> vectorized
> +                      away.  */
> +                   if (BB_VINFO_GROUPED_STORES (vinfo).contains
> (use_stmt_info))
> +                       continue;
>                      (*life)[i] = true;
>
> ... which you then simply mark as PURE_SLP_STMT where we call
> vect_mark_slp_stmts in vect_slp_analyze_bb_1.
>
> I see you have the TYPE_VECTOR_SUBPARTS check still at transform
> stage in vectorize_slp_instance_root_stmt, please simply move
> it to vect_slp_check_for_constructors or to vect_analyze_slp_instance
> where you have the other rejects (non-SSA names in the ctor).
If the check is in vect_slp_check_for_constructors, which vector is used 
as the input to tYPE_VECTOR_SUBPARTS, the lhs of the constructor?
> That is, vectorize_slp_instance_root_stmt may not fail.
>
> +bool
> +vectorize_slp_instance_root_stmt (slp_tree node, slp_instance instance)
> +{
> +
>
> extra newline
>
> +  if (SLP_TREE_NUMBER_OF_VEC_STMTS (node) == 1)
>
> the stmt replacing can be commonized between == 1 and > 1 cases.
>
>     FOR_EACH_VEC_ELT (slp_instances, i, instance)
>       {
> +      slp_tree node = SLP_INSTANCE_TREE (instance);
>         /* Schedule the tree of INSTANCE.  */
> -      vect_schedule_slp_instance (SLP_INSTANCE_TREE (instance),
> +      vect_schedule_slp_instance (node,
>                                    instance, bst_map);
> +
> +      /* Vectorize the instance root.  */
> +      if (instance->root == node && SLP_INSTANCE_ROOT_STMT (instance)
> +         && SLP_TREE_VEC_STMTS (node).exists ())
> +       if (!vectorize_slp_instance_root_stmt (node, instance))
> +         {
>
> instance->root == node is always true.  Likewise
> SLP_TREE_VEC_STMTS (node).exists ().
>
> @@ -4061,15 +4201,42 @@ vect_schedule_slp (vec_info *vinfo)
>         if (is_a <loop_vec_info> (vinfo))
>
> the ChangeLog doesn't mention this.  I guess this isn't necessary
> unless you fail vectorizing late which you shouldn't.
>
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index
> 56be28b0cc5a77412f996e70636b08d5b615813e..9f8419e4208b7d438ace41892022f93ebcadd019
> 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -151,6 +151,10 @@ public:
>     /* The root of SLP tree.  */
>     slp_tree root;
>
> +  /* For vector constructors, the constructor stmt that the SLP tree is
> built
> +     from, NULL otherwise.  */
> +  gimple *root_stmt;
>
> as said, make this a stmt_vec_info
>
> Thanks,
> Richard.
>
>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-10-10  Joel Hutton  <Joel.Hutton@arm.com>
>>
>>           * gcc.dg/vect/bb-slp-40.c: New test.
>>           * gcc.dg/vect/bb-slp-41.c: New test.
>>
>>
Richard Biener Oct. 30, 2019, 2:51 p.m. UTC | #3
On Wed, 30 Oct 2019, Joel Hutton wrote:

> On 30/10/2019 13:49, Richard Biener wrote:
> > Why do you need this?  The vectorizer already creates such CTORs.  Any
> > testcase that you can show?
> 
> typedef long v2di __attribute__((vector_size(16)));
> v2di v;
> void
> foo()
> {
>    v = (v2di){v[1], v[0]};
> }

Huh.  On what architecture?  Is that for a V2DI constructor of
V1DI vectors maybe?  On x86 the code doesn't trigger.

The condition was indeed to check for vector mode elements so
maybe it should simply read

  if (VECTOR_MODE_P (emode))

?  eltmode is always scalar.

> 
> >>           * tree-vect-slp.c (vect_analyze_slp_instance): Add case for vector constructors.
> >>           (vect_bb_slp_scalar_cost): Likewise.
> >>           (vect_slp_check_for_constructors): New function.
> >>           (vect_slp_analyze_bb_1): Add check for vector constructors.
> >>           (vect_schedule_slp_instance): Add case to fixup vector constructor stmt.
> >>           (vectorize_slp_instance_root_stmt): New function
> >>           * tree-vectorizer.h (SLP_INSTANCE_ROOT_STMT): New field.
> > +         SLP_INSTANCE_ROOT_STMT (new_instance) = constructor ?
> > stmt_info->stmt\
> > +                                                 : NULL;
> >
> > SLP_INSTANCE_ROOT_STMT should be a stmt_vec_info I guess...
> >
> > @@ -2801,6 +2830,10 @@ vect_bb_slp_scalar_cost (basic_block bb,
> >                  stmt_vec_info use_stmt_info = vinfo->lookup_stmt
> > (use_stmt);
> >                  if (!use_stmt_info || !PURE_SLP_STMT (use_stmt_info))
> >                    {
> > +                   /* Check this is not a constructor that will be
> > vectorized
> > +                      away.  */
> > +                   if (BB_VINFO_GROUPED_STORES (vinfo).contains
> > (use_stmt_info))
> > +                       continue;
> >                      (*life)[i] = true;
> >
> > ... which you then simply mark as PURE_SLP_STMT where we call
> > vect_mark_slp_stmts in vect_slp_analyze_bb_1.
> >
> > I see you have the TYPE_VECTOR_SUBPARTS check still at transform
> > stage in vectorize_slp_instance_root_stmt, please simply move
> > it to vect_slp_check_for_constructors or to vect_analyze_slp_instance
> > where you have the other rejects (non-SSA names in the ctor).
> If the check is in vect_slp_check_for_constructors, which vector is used 
> as the input to tYPE_VECTOR_SUBPARTS, the lhs of the constructor?

The type of the constructor itself.

> > That is, vectorize_slp_instance_root_stmt may not fail.
> >
> > +bool
> > +vectorize_slp_instance_root_stmt (slp_tree node, slp_instance instance)
> > +{
> > +
> >
> > extra newline
> >
> > +  if (SLP_TREE_NUMBER_OF_VEC_STMTS (node) == 1)
> >
> > the stmt replacing can be commonized between == 1 and > 1 cases.
> >
> >     FOR_EACH_VEC_ELT (slp_instances, i, instance)
> >       {
> > +      slp_tree node = SLP_INSTANCE_TREE (instance);
> >         /* Schedule the tree of INSTANCE.  */
> > -      vect_schedule_slp_instance (SLP_INSTANCE_TREE (instance),
> > +      vect_schedule_slp_instance (node,
> >                                    instance, bst_map);
> > +
> > +      /* Vectorize the instance root.  */
> > +      if (instance->root == node && SLP_INSTANCE_ROOT_STMT (instance)
> > +         && SLP_TREE_VEC_STMTS (node).exists ())
> > +       if (!vectorize_slp_instance_root_stmt (node, instance))
> > +         {
> >
> > instance->root == node is always true.  Likewise
> > SLP_TREE_VEC_STMTS (node).exists ().
> >
> > @@ -4061,15 +4201,42 @@ vect_schedule_slp (vec_info *vinfo)
> >         if (is_a <loop_vec_info> (vinfo))
> >
> > the ChangeLog doesn't mention this.  I guess this isn't necessary
> > unless you fail vectorizing late which you shouldn't.
> >
> > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > index
> > 56be28b0cc5a77412f996e70636b08d5b615813e..9f8419e4208b7d438ace41892022f93ebcadd019
> > 100644
> > --- a/gcc/tree-vectorizer.h
> > +++ b/gcc/tree-vectorizer.h
> > @@ -151,6 +151,10 @@ public:
> >     /* The root of SLP tree.  */
> >     slp_tree root;
> >
> > +  /* For vector constructors, the constructor stmt that the SLP tree is
> > built
> > +     from, NULL otherwise.  */
> > +  gimple *root_stmt;
> >
> > as said, make this a stmt_vec_info
> >
> > Thanks,
> > Richard.
> >
> >
> >> gcc/testsuite/ChangeLog:
> >>
> >> 2019-10-10  Joel Hutton  <Joel.Hutton@arm.com>
> >>
> >>           * gcc.dg/vect/bb-slp-40.c: New test.
> >>           * gcc.dg/vect/bb-slp-41.c: New test.
> >>
> >>
>
Joel Hutton Oct. 30, 2019, 3:30 p.m. UTC | #4
On 30/10/2019 14:51, Richard Biener wrote:
> On Wed, 30 Oct 2019, Joel Hutton wrote:
>
>> On 30/10/2019 13:49, Richard Biener wrote:
>>> Why do you need this?  The vectorizer already creates such CTORs.  Any
>>> testcase that you can show?
>> typedef long v2di __attribute__((vector_size(16)));
>> v2di v;
>> void
>> foo()
>> {
>>     v = (v2di){v[1], v[0]};
>> }
> Huh.  On what architecture?  Is that for a V2DI constructor of
> V1DI vectors maybe?  On x86 the code doesn't trigger.

On aarch64.

emode = E_DImode

eltmode = E_DImode


> The condition was indeed to check for vector mode elements so
> maybe it should simply read
>
>    if (VECTOR_MODE_P (emode))
>
> ?  eltmode is always scalar.
I'll try that, thanks.
Joel Hutton Nov. 1, 2019, 9:03 a.m. UTC | #5
On 30/10/2019 13:49, Richard Biener wrote:
 >>
 >>          * expr.c (store_constructor): Add case for constructor of 
vectors.
 > Why do you need this?  The vectorizer already creates such CTORs.  Any
 > testcase that you can show?
 >
 > typedef long v2di __attribute__((vector_size(16)));
 > v2di v;
 > void
 > foo()
 > {
 >    v = (v2di){v[1], v[0]};
 > }
There is a special case for single element vectors, where the vector 
mode and
the element mode are the same.
I've changed this slightly, as your solution of using VECTOR_MODE_P didn't
work for my case where:
   emode = E_DImode
   eltmode = E_DImode
On aarch64. As E_DImode is not a vector mode.
Based on some checks I found in verify_gimple, I set the type of the 
replacement
constructor to the same as the original constructor, as verify_gimple 
seems to
expect flattened types. i.e. a vector of vectors of ints should have 
type vector
of ints.


 > What could be done is "compact" the
 > operands to vectorize to only contain SSA names, try to SLP
 > match and vectorize them and then combine the vectorized result
 > with the constant elements.
 >
 > Not worth doing I guess unless it's sth regular like
 >
 >  { a, b, c, d, 0, 0, 0, 0 }
 >
 > or so.  But this can be handled as followup if necessary.
 >
I agree, it should be possible to support this in future patches.


 > +         SLP_INSTANCE_ROOT_STMT (new_instance) = constructor ?
 > stmt_info->stmt\
 > +                                                 : NULL;
 >
 > SLP_INSTANCE_ROOT_STMT should be a stmt_vec_info I guess...
 >
 > @@ -2801,6 +2830,10 @@ vect_bb_slp_scalar_cost (basic_block bb,
 >                 stmt_vec_info use_stmt_info = vinfo->lookup_stmt
 > (use_stmt);
 >                 if (!use_stmt_info || !PURE_SLP_STMT (use_stmt_info))
 >                   {
 > +                   /* Check this is not a constructor that will be
 > vectorized
 > +                      away.  */
 > +                   if (BB_VINFO_GROUPED_STORES (vinfo).contains
 > (use_stmt_info))
 > +                       continue;
 >                     (*life)[i] = true;
 >
 > ... which you then simply mark as PURE_SLP_STMT where we call
 > vect_mark_slp_stmts in vect_slp_analyze_bb_1.
Done.


 > I see you have the TYPE_VECTOR_SUBPARTS check still at transform
 > stage in vectorize_slp_instance_root_stmt, please simply move
 > it to vect_slp_check_for_constructors or to vect_analyze_slp_instance
 > where you have the other rejects (non-SSA names in the ctor).
 >
Done.


 >
 > +bool
 > +vectorize_slp_instance_root_stmt (slp_tree node, slp_instance instance)
 > +{
 > +
 >
 > extra newline
Fixed.


 > +  /* For vector constructors, the constructor stmt that the SLP tree is
 > built
 > +     from, NULL otherwise.  */
 > +  gimple *root_stmt;
 >
 > as said, make this a stmt_vec_info
Done.


 > +  if (SLP_TREE_NUMBER_OF_VEC_STMTS (node) == 1)
 >
 > the stmt replacing can be commonized between == 1 and > 1 cases.
Done.


 > +
 > +      /* Vectorize the instance root.  */
 > +      if (instance->root == node && SLP_INSTANCE_ROOT_STMT (instance)
 > +         && SLP_TREE_VEC_STMTS (node).exists ())
 > +       if (!vectorize_slp_instance_root_stmt (node, instance))
 > +         {
 >
 > instance->root == node is always true.  Likewise
 > SLP_TREE_VEC_STMTS (node).exists ().
Done.


 > @@ -4061,15 +4201,42 @@ vect_schedule_slp (vec_info *vinfo)
 >        if (is_a <loop_vec_info> (vinfo))
 >
 > the ChangeLog doesn't mention this.  I guess this isn't necessary
 > unless you fail vectorizing late which you shouldn't.
 >
It's necessary to avoid:

         removing stmts twice for constructors of the form {_1,_1,_1,_1}
         removing stmts that define ssa names used elsewhere (which 
previously wasn't a consideration because the scalar_stmts were stores 
to memory, not assignments to ssa names)

Updated changelog (and patch)

2019-10-31  Joel Hutton  <Joel.Hutton@arm.com>

         * expr.c (store_constructor): Modify to handle single element 
vectors.
         * tree-vect-slp.c (vect_analyze_slp_instance): Add case for 
vector constructors.
         (vect_slp_check_for_constructors): New function.
         (vect_slp_analyze_bb_1): Call new function to check for vector 
constructors.
         (vectorize_slp_instance_root_stmt): New function.
         (vect_schedule_slp): Call new function to vectorize root stmt 
of vector constructors.
         * tree-vectorizer.h (SLP_INSTANCE_ROOT_STMT): New field.

gcc/testsuite/ChangeLog:

2019-10-31  Joel Hutton  <Joel.Hutton@arm.com>

         * gcc.dg/vect/bb-slp-40.c: New test.
         * gcc.dg/vect/bb-slp-41.c: New test.
Richard Biener Nov. 4, 2019, 9:49 a.m. UTC | #6
On Fri, 1 Nov 2019, Joel Hutton wrote:

> On 30/10/2019 13:49, Richard Biener wrote:
>  >>
>  >>          * expr.c (store_constructor): Add case for constructor of 
> vectors.
>  > Why do you need this?  The vectorizer already creates such CTORs.  Any
>  > testcase that you can show?
>  >
>  > typedef long v2di __attribute__((vector_size(16)));
>  > v2di v;
>  > void
>  > foo()
>  > {
>  >    v = (v2di){v[1], v[0]};
>  > }
> There is a special case for single element vectors, where the vector 
> mode and
> the element mode are the same.
> I've changed this slightly, as your solution of using VECTOR_MODE_P didn't
> work for my case where:
>    emode = E_DImode
>    eltmode = E_DImode
> On aarch64. As E_DImode is not a vector mode.
> Based on some checks I found in verify_gimple, I set the type of the 
> replacement
> constructor to the same as the original constructor, as verify_gimple 
> seems to
> expect flattened types. i.e. a vector of vectors of ints should have 
> type vector
> of ints.

Huh.  On aarch64 for the above testcase I see mode V2DImode and
emode = eltmode = DImode.  That's just a regular CTOR with
non-vector elements so not sure why you need any adjustment at all
here?

It looks like your vectorization of the CTOR introduces a
V2DImode CTOR of vector(1) long elements which incidentially
have DImode.  That's because somehow V2DI vectorization isn't
profitable but V1DI one is.  In the end it's a noop transform
but the testcase shows that for V2DI vectorization we fail
to cost the CTOR in the scalar cost computation (you have to
include the pattern root there I guess).

Technically we feed valid GIMPLE to the expander so we indeed
shouldn't ICE.  So I'd like to have the earlier keying on
vec_vec_init_p match the later assert we run into, thus
sth like

Index: gcc/expr.c
===================================================================
--- gcc/expr.c  (revision 277765)
+++ gcc/expr.c  (working copy)
@@ -6809,6 +6809,7 @@ store_constructor (tree exp, rtx target,
            && n_elts.is_constant (&const_n_elts))
          {
            machine_mode emode = eltmode;
+           bool vector_typed_elts_p = false;
 
            if (CONSTRUCTOR_NELTS (exp)
                && (TREE_CODE (TREE_TYPE (CONSTRUCTOR_ELT (exp, 
0)->value))
@@ -6819,13 +6820,14 @@ store_constructor (tree exp, rtx target,
                                      * TYPE_VECTOR_SUBPARTS (etype),
                                      n_elts));
                emode = TYPE_MODE (etype);
+               vector_typed_elts_p = true;
              }
            icode = convert_optab_handler (vec_init_optab, mode, emode);
            if (icode != CODE_FOR_nothing)
              {
                unsigned int n = const_n_elts;
 
-               if (emode != eltmode)
+               if (vector_typed_elts_p)
                  {
                    n = CONSTRUCTOR_NELTS (exp);
                    vec_vec_init_p = true;

> 
>  > What could be done is "compact" the
>  > operands to vectorize to only contain SSA names, try to SLP
>  > match and vectorize them and then combine the vectorized result
>  > with the constant elements.
>  >
>  > Not worth doing I guess unless it's sth regular like
>  >
>  >  { a, b, c, d, 0, 0, 0, 0 }
>  >
>  > or so.  But this can be handled as followup if necessary.
>  >
> I agree, it should be possible to support this in future patches.
> 
> 
>  > +         SLP_INSTANCE_ROOT_STMT (new_instance) = constructor ?
>  > stmt_info->stmt\
>  > +                                                 : NULL;
>  >
>  > SLP_INSTANCE_ROOT_STMT should be a stmt_vec_info I guess...
>  >
>  > @@ -2801,6 +2830,10 @@ vect_bb_slp_scalar_cost (basic_block bb,
>  >                 stmt_vec_info use_stmt_info = vinfo->lookup_stmt
>  > (use_stmt);
>  >                 if (!use_stmt_info || !PURE_SLP_STMT (use_stmt_info))
>  >                   {
>  > +                   /* Check this is not a constructor that will be
>  > vectorized
>  > +                      away.  */
>  > +                   if (BB_VINFO_GROUPED_STORES (vinfo).contains
>  > (use_stmt_info))
>  > +                       continue;
>  >                     (*life)[i] = true;
>  >
>  > ... which you then simply mark as PURE_SLP_STMT where we call
>  > vect_mark_slp_stmts in vect_slp_analyze_bb_1.
> Done.
> 
> 
>  > I see you have the TYPE_VECTOR_SUBPARTS check still at transform
>  > stage in vectorize_slp_instance_root_stmt, please simply move
>  > it to vect_slp_check_for_constructors or to vect_analyze_slp_instance
>  > where you have the other rejects (non-SSA names in the ctor).
>  >
> Done.
> 
> 
>  >
>  > +bool
>  > +vectorize_slp_instance_root_stmt (slp_tree node, slp_instance instance)
>  > +{
>  > +
>  >
>  > extra newline
> Fixed.
> 
> 
>  > +  /* For vector constructors, the constructor stmt that the SLP tree is
>  > built
>  > +     from, NULL otherwise.  */
>  > +  gimple *root_stmt;
>  >
>  > as said, make this a stmt_vec_info
> Done.
> 
> 
>  > +  if (SLP_TREE_NUMBER_OF_VEC_STMTS (node) == 1)
>  >
>  > the stmt replacing can be commonized between == 1 and > 1 cases.
> Done.
> 
> 
>  > +
>  > +      /* Vectorize the instance root.  */
>  > +      if (instance->root == node && SLP_INSTANCE_ROOT_STMT (instance)
>  > +         && SLP_TREE_VEC_STMTS (node).exists ())
>  > +       if (!vectorize_slp_instance_root_stmt (node, instance))
>  > +         {
>  >
>  > instance->root == node is always true.  Likewise
>  > SLP_TREE_VEC_STMTS (node).exists ().
> Done.
> 
> 
>  > @@ -4061,15 +4201,42 @@ vect_schedule_slp (vec_info *vinfo)
>  >        if (is_a <loop_vec_info> (vinfo))
>  >
>  > the ChangeLog doesn't mention this.  I guess this isn't necessary
>  > unless you fail vectorizing late which you shouldn't.
>  >
> It's necessary to avoid:
> 
>          removing stmts twice for constructors of the form {_1,_1,_1,_1}
>          removing stmts that define ssa names used elsewhere (which 
> previously wasn't a consideration because the scalar_stmts were stores 
> to memory, not assignments to ssa names)

OK, but the code you are patching is just supposed to remove stores
from the final scalar stmt seeds - it doesn't expect any loads there
which is probably what happens.  I'd instead add a

   /* Do not CSE the stmts feeding a CTOR, they might have uses
      outside of the vectorized stmts.  */
   if (SLP_INSTANCE_ROOT_STMT (instance))
     continue;

before the loop over SLP_TREE_SCALAR_STMTS.

> Updated changelog (and patch)

+         if (!subparts.is_constant () || !(subparts.to_constant ()
+                                           == CONSTRUCTOR_NELTS (rhs)))
+           continue;

can be better written as if (maybe_ne (subparts, CONSTRUCTOR_NELTS (rhs)))

+/* Vectorize the instance root.  Return success or failure.  */
+
+void

since it's now void the comment has to be adjusted.

+      vect_schedule_slp_instance (node,
                                  instance, bst_map);

this now fits in one line.

OK with those changes.

Thanks,
Richard.

> 2019-10-31  Joel Hutton  <Joel.Hutton@arm.com>
> 
>          * expr.c (store_constructor): Modify to handle single element 
> vectors.
>          * tree-vect-slp.c (vect_analyze_slp_instance): Add case for 
> vector constructors.
>          (vect_slp_check_for_constructors): New function.
>          (vect_slp_analyze_bb_1): Call new function to check for vector 
> constructors.
>          (vectorize_slp_instance_root_stmt): New function.
>          (vect_schedule_slp): Call new function to vectorize root stmt 
> of vector constructors.
>          * tree-vectorizer.h (SLP_INSTANCE_ROOT_STMT): New field.
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-10-31  Joel Hutton  <Joel.Hutton@arm.com>
> 
>          * gcc.dg/vect/bb-slp-40.c: New test.
>          * gcc.dg/vect/bb-slp-41.c: New test.
> 
>
Joel Hutton Nov. 4, 2019, 3:30 p.m. UTC | #7
First copy bounced from mailing list.
 > > On 30/10/2019 13:49, Richard Biener wrote:
 > >  >>
 > >  >>          * expr.c (store_constructor): Add case for constructor of
 > > vectors.
 > >  > Why do you need this?  The vectorizer already creates such 
CTORs.  Any
 > >  > testcase that you can show?
 > >  >
 > >  > typedef long v2di __attribute__((vector_size(16)));
 > >  > v2di v;
 > >  > void
 > >  > foo()
 > >  > {
 > >  >    v = (v2di){v[1], v[0]};
 > >  > }
 > > There is a special case for single element vectors, where the vector
 > > mode and
 > > the element mode are the same.
 > > I've changed this slightly, as your solution of using VECTOR_MODE_P 
didn't
 > > work for my case where:
 > >    emode = E_DImode
 > >    eltmode = E_DImode
 > > On aarch64. As E_DImode is not a vector mode.
 > > Based on some checks I found in verify_gimple, I set the type of the
 > > replacement
 > > constructor to the same as the original constructor, as verify_gimple
 > > seems to
 > > expect flattened types. i.e. a vector of vectors of ints should have
 > > type vector
 > > of ints.
 >
 > Huh.  On aarch64 for the above testcase I see mode V2DImode and
 > emode = eltmode = DImode.  That's just a regular CTOR with
 > non-vector elements so not sure why you need any adjustment at all
 > here?
 >
 > It looks like your vectorization of the CTOR introduces a
 > V2DImode CTOR of vector(1) long elements which incidentially
 > have DImode.  That's because somehow V2DI vectorization isn't
 > profitable but V1DI one is.  In the end it's a noop transform
 > but the testcase shows that for V2DI vectorization we fail
 > to cost the CTOR in the scalar cost computation (you have to
 > include the pattern root there I guess).
 >
Yes, sorry, I was unclear about this, it's the new constructor that the 
change is needed for.

 > Technically we feed valid GIMPLE to the expander so we indeed
 > shouldn't ICE.  So I'd like to have the earlier keying on
 > vec_vec_init_p match the later assert we run into, thus
 > sth like
 >
 > Index: gcc/expr.c
 > ===================================================================
 > --- gcc/expr.c  (revision 277765)
 > +++ gcc/expr.c  (working copy)
 > @@ -6809,6 +6809,7 @@ store_constructor (tree exp, rtx target,
 >             && n_elts.is_constant (&const_n_elts))
 >           {
 >             machine_mode emode = eltmode;
 > +           bool vector_typed_elts_p = false;
 >
 >             if (CONSTRUCTOR_NELTS (exp)
 >                 && (TREE_CODE (TREE_TYPE (CONSTRUCTOR_ELT (exp,
 > 0)->value))
 > @@ -6819,13 +6820,14 @@ store_constructor (tree exp, rtx target,
 >                                       * TYPE_VECTOR_SUBPARTS (etype),
 >                                       n_elts));
 >                 emode = TYPE_MODE (etype);
 > +               vector_typed_elts_p = true;
 >               }
 >             icode = convert_optab_handler (vec_init_optab, mode, emode);
 >             if (icode != CODE_FOR_nothing)
 >               {
 >                 unsigned int n = const_n_elts;
 >
 > -               if (emode != eltmode)
 > +               if (vector_typed_elts_p)
 >                   {
 >                     n = CONSTRUCTOR_NELTS (exp);
 >                     vec_vec_init_p = true;
 >
 >
I've used this, thank you.

 > >  > @@ -4061,15 +4201,42 @@ vect_schedule_slp (vec_info *vinfo)
 > >  >        if (is_a <loop_vec_info> (vinfo))
 > >  >
 > >  > the ChangeLog doesn't mention this.  I guess this isn't necessary
 > >  > unless you fail vectorizing late which you shouldn't.
 > >  >
 > > It's necessary to avoid:
 > >
 > >          removing stmts twice for constructors of the form 
{_1,_1,_1,_1}
 > >          removing stmts that define ssa names used elsewhere (which
 > > previously wasn't a consideration because the scalar_stmts were stores
 > > to memory, not assignments to ssa names)
 >
 > OK, but the code you are patching is just supposed to remove stores
 > from the final scalar stmt seeds - it doesn't expect any loads there
 > which is probably what happens.  I'd instead add a
 >
 >    /* Do not CSE the stmts feeding a CTOR, they might have uses
 >       outside of the vectorized stmts.  */
 >    if (SLP_INSTANCE_ROOT_STMT (instance))
 >      continue;
 >
 > before the loop over SLP_TREE_SCALAR_STMTS.
Done.

 > +         if (!subparts.is_constant () || !(subparts.to_constant ()
 > +                                           == CONSTRUCTOR_NELTS (rhs)))
 > +           continue;
 >
 > can be better written as if (maybe_ne (subparts, CONSTRUCTOR_NELTS 
(rhs)))
Done.

 > +/* Vectorize the instance root.  Return success or failure. */
 > +
 > +void
 >
 > since it's now void the comment has to be adjusted.
Done.

 > +      vect_schedule_slp_instance (node,
 >                                   instance, bst_map);
 >
 > this now fits in one line.
Done.

 > OK with those changes.
Tested and bootstrapped on aarch64.
OK?

2019-11-04  Joel Hutton  <Joel.Hutton@arm.com>

          * expr.c (store_constructor): Modify to handle single element
vectors.
          * tree-vect-slp.c (vect_analyze_slp_instance): Add case for
vector constructors.
          (vect_slp_check_for_constructors): New function.
          (vect_slp_analyze_bb_1): Call new function to check for vector
constructors.
          (vectorize_slp_instance_root_stmt): New function.
          (vect_schedule_slp): Call new function to vectorize root stmt
of vector constructors.
          * tree-vectorizer.h (SLP_INSTANCE_ROOT_STMT): New field.

gcc/testsuite/ChangeLog:

2019-11-04  Joel Hutton  <Joel.Hutton@arm.com>

          * gcc.dg/vect/bb-slp-40.c: New test.
          * gcc.dg/vect/bb-slp-41.c: New test.
Richard Biener Nov. 4, 2019, 4:27 p.m. UTC | #8
On November 4, 2019 4:30:57 PM GMT+01:00, Joel Hutton <Joel.Hutton@arm.com> wrote:
>First copy bounced from mailing list.
> > > On 30/10/2019 13:49, Richard Biener wrote:
> > >  >>
>> >  >>          * expr.c (store_constructor): Add case for constructor
>of
> > > vectors.
> > >  > Why do you need this?  The vectorizer already creates such 
>CTORs.  Any
> > >  > testcase that you can show?
> > >  >
> > >  > typedef long v2di __attribute__((vector_size(16)));
> > >  > v2di v;
> > >  > void
> > >  > foo()
> > >  > {
> > >  >    v = (v2di){v[1], v[0]};
> > >  > }
>> > There is a special case for single element vectors, where the
>vector
> > > mode and
> > > the element mode are the same.
>> > I've changed this slightly, as your solution of using VECTOR_MODE_P
>
>didn't
> > > work for my case where:
> > >    emode = E_DImode
> > >    eltmode = E_DImode
> > > On aarch64. As E_DImode is not a vector mode.
>> > Based on some checks I found in verify_gimple, I set the type of
>the
> > > replacement
>> > constructor to the same as the original constructor, as
>verify_gimple
> > > seems to
>> > expect flattened types. i.e. a vector of vectors of ints should
>have
> > > type vector
> > > of ints.
> >
> > Huh.  On aarch64 for the above testcase I see mode V2DImode and
> > emode = eltmode = DImode.  That's just a regular CTOR with
> > non-vector elements so not sure why you need any adjustment at all
> > here?
> >
> > It looks like your vectorization of the CTOR introduces a
> > V2DImode CTOR of vector(1) long elements which incidentially
> > have DImode.  That's because somehow V2DI vectorization isn't
> > profitable but V1DI one is.  In the end it's a noop transform
> > but the testcase shows that for V2DI vectorization we fail
> > to cost the CTOR in the scalar cost computation (you have to
> > include the pattern root there I guess).
> >
>Yes, sorry, I was unclear about this, it's the new constructor that the
>
>change is needed for.
>
> > Technically we feed valid GIMPLE to the expander so we indeed
> > shouldn't ICE.  So I'd like to have the earlier keying on
> > vec_vec_init_p match the later assert we run into, thus
> > sth like
> >
> > Index: gcc/expr.c
> > ===================================================================
> > --- gcc/expr.c  (revision 277765)
> > +++ gcc/expr.c  (working copy)
> > @@ -6809,6 +6809,7 @@ store_constructor (tree exp, rtx target,
> >             && n_elts.is_constant (&const_n_elts))
> >           {
> >             machine_mode emode = eltmode;
> > +           bool vector_typed_elts_p = false;
> >
> >             if (CONSTRUCTOR_NELTS (exp)
> >                 && (TREE_CODE (TREE_TYPE (CONSTRUCTOR_ELT (exp,
> > 0)->value))
> > @@ -6819,13 +6820,14 @@ store_constructor (tree exp, rtx target,
>>                                       * TYPE_VECTOR_SUBPARTS (etype),
> >                                       n_elts));
> >                 emode = TYPE_MODE (etype);
> > +               vector_typed_elts_p = true;
> >               }
>>             icode = convert_optab_handler (vec_init_optab, mode,
>emode);
> >             if (icode != CODE_FOR_nothing)
> >               {
> >                 unsigned int n = const_n_elts;
> >
> > -               if (emode != eltmode)
> > +               if (vector_typed_elts_p)
> >                   {
> >                     n = CONSTRUCTOR_NELTS (exp);
> >                     vec_vec_init_p = true;
> >
> >
>I've used this, thank you.
>
> > >  > @@ -4061,15 +4201,42 @@ vect_schedule_slp (vec_info *vinfo)
> > >  >        if (is_a <loop_vec_info> (vinfo))
> > >  >
>> >  > the ChangeLog doesn't mention this.  I guess this isn't
>necessary
> > >  > unless you fail vectorizing late which you shouldn't.
> > >  >
> > > It's necessary to avoid:
> > >
> > >          removing stmts twice for constructors of the form 
>{_1,_1,_1,_1}
>> >          removing stmts that define ssa names used elsewhere (which
>> > previously wasn't a consideration because the scalar_stmts were
>stores
> > > to memory, not assignments to ssa names)
> >
> > OK, but the code you are patching is just supposed to remove stores
> > from the final scalar stmt seeds - it doesn't expect any loads there
> > which is probably what happens.  I'd instead add a
> >
> >    /* Do not CSE the stmts feeding a CTOR, they might have uses
> >       outside of the vectorized stmts.  */
> >    if (SLP_INSTANCE_ROOT_STMT (instance))
> >      continue;
> >
> > before the loop over SLP_TREE_SCALAR_STMTS.
>Done.
>
> > +         if (!subparts.is_constant () || !(subparts.to_constant ()
>> +                                           == CONSTRUCTOR_NELTS
>(rhs)))
> > +           continue;
> >
> > can be better written as if (maybe_ne (subparts, CONSTRUCTOR_NELTS 
>(rhs)))
>Done.
>
> > +/* Vectorize the instance root.  Return success or failure. */
> > +
> > +void
> >
> > since it's now void the comment has to be adjusted.
>Done.
>
> > +      vect_schedule_slp_instance (node,
> >                                   instance, bst_map);
> >
> > this now fits in one line.
>Done.
>
> > OK with those changes.
>Tested and bootstrapped on aarch64.
>OK?

Ok. 

Thanks, 
Richard. 

>2019-11-04  Joel Hutton  <Joel.Hutton@arm.com>
>
>          * expr.c (store_constructor): Modify to handle single element
>vectors.
>          * tree-vect-slp.c (vect_analyze_slp_instance): Add case for
>vector constructors.
>          (vect_slp_check_for_constructors): New function.
>         (vect_slp_analyze_bb_1): Call new function to check for vector
>constructors.
>          (vectorize_slp_instance_root_stmt): New function.
>          (vect_schedule_slp): Call new function to vectorize root stmt
>of vector constructors.
>          * tree-vectorizer.h (SLP_INSTANCE_ROOT_STMT): New field.
>
>gcc/testsuite/ChangeLog:
>
>2019-11-04  Joel Hutton  <Joel.Hutton@arm.com>
>
>          * gcc.dg/vect/bb-slp-40.c: New test.
>          * gcc.dg/vect/bb-slp-41.c: New test.
diff mbox series

Patch

From 902510bd498acfc9e30636f8267b57027bc63254 Mon Sep 17 00:00:00 2001
From: Joel Hutton <Joel.Hutton@arm.com>
Date: Tue, 22 Oct 2019 10:05:19 +0100
Subject: [PATCH] SLP Vectorization: Vectorize vector constructors

---
 gcc/expr.c                            |   5 +-
 gcc/testsuite/gcc.dg/vect/bb-slp-40.c |  34 +++++
 gcc/testsuite/gcc.dg/vect/bb-slp-41.c |  61 +++++++++
 gcc/tree-vect-slp.c                   | 171 +++++++++++++++++++++++++-
 gcc/tree-vectorizer.h                 |   5 +
 5 files changed, 273 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-40.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-41.c

diff --git a/gcc/expr.c b/gcc/expr.c
index 476c6865f20828fc68f455e70d4874eaabd9d08d..ea4705e47c7e8424cca5e19319ef21db415f496a 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -6825,7 +6825,10 @@  store_constructor (tree exp, rtx target, int cleared, poly_int64 size,
 	      {
 		unsigned int n = const_n_elts;
 
-		if (emode != eltmode)
+		if (emode != eltmode
+		    || (TREE_CODE (TREE_TYPE (exp)) == VECTOR_TYPE
+			&& TREE_CODE (TREE_TYPE (CONSTRUCTOR_ELT(exp, 0)->value))
+			   == VECTOR_TYPE))
 		  {
 		    n = CONSTRUCTOR_NELTS (exp);
 		    vec_vec_init_p = true;
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-40.c b/gcc/testsuite/gcc.dg/vect/bb-slp-40.c
new file mode 100644
index 0000000000000000000000000000000000000000..a1dd372184623f34f8f2825aa5da50dc70c98084
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-40.c
@@ -0,0 +1,34 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-slp-all" } */
+/* { dg-require-effective-target vect_int } */
+
+char g_d[1024], g_s1[1024], g_s2[1024];
+void foo(void)
+{
+    char *d = g_d, *s1 = g_s1, *s2 = g_s2;
+
+    for ( int y = 0; y < 128; y++ )
+    {
+      d[0 ] = s1[0 ] + s2[0 ];
+      d[1 ] = s1[1 ] + s2[1 ];
+      d[2 ] = s1[2 ] + s2[2 ];
+      d[3 ] = s1[3 ] + s2[3 ];
+      d[4 ] = s1[4 ] + s2[4 ];
+      d[5 ] = s1[5 ] + s2[5 ];
+      d[6 ] = s1[6 ] + s2[6 ];
+      d[7 ] = s1[7 ] + s2[7 ];
+      d[8 ] = s1[8 ] + s2[8 ];
+      d[9 ] = s1[9 ] + s2[9 ];
+      d[10] = s1[10] + s2[10];
+      d[11] = s1[11] + s2[11];
+      d[12] = s1[12] + s2[12];
+      d[13] = s1[13] + s2[13];
+      d[14] = s1[14] + s2[14];
+      d[15] = s1[15] + s2[15];
+      d += 16;
+    }
+}
+
+/* See that we vectorize an SLP instance.  */
+/* { dg-final { scan-tree-dump-times "Found vectorizable constructor" 1 "slp1" } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "slp1" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-41.c b/gcc/testsuite/gcc.dg/vect/bb-slp-41.c
new file mode 100644
index 0000000000000000000000000000000000000000..618d1d0ef3f6da93b53df62296620d589ad2e21a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-41.c
@@ -0,0 +1,61 @@ 
+/* { dg-do run } */
+/* { dg-options "-O3 -fdump-tree-slp-all -fno-vect-cost-model" } */
+/* { dg-require-effective-target vect_int } */
+
+#define ARR_SIZE 1000
+
+void foo (int *a, int *b)
+{
+  int i;
+  for (i = 0; i < (ARR_SIZE - 2); ++i)
+    a[i] = b[0] + b[1] + b[i+1] + b[i+2];
+}
+
+void bar (int *a, int *b)
+{
+  int i;
+  for (i = 0; i < (ARR_SIZE - 2); ++i)
+  {
+    a[i] = b[0];
+  }
+  for (i = 0; i < (ARR_SIZE - 2); ++i)
+  {
+    a[i] = a[i] + b[1];
+  }
+  for (i = 0; i < (ARR_SIZE - 2); ++i)
+  {
+    a[i] = a[i] + b[i+1];
+  }
+  for (i = 0; i < (ARR_SIZE - 2); ++i)
+  {
+    a[i] = a[i] + b[i+2];
+  }
+}
+
+int main ()
+{
+  int a1[ARR_SIZE];
+  int a2[ARR_SIZE];
+  int b[ARR_SIZE];
+  int i;
+
+  for (i = 0; i < ARR_SIZE; i++)
+  {
+    a1[i] = 0;
+    a2[i] = 0;
+    b[i]  = i;
+  }
+
+  foo (a1, b);
+  bar (a2, b);
+
+  for (i = 0; i < ARR_SIZE; i++)
+    if (a1[i] != a2[i])
+      return 1;
+
+  return 0;
+
+}
+/* See that we vectorize an SLP instance.  */
+/* { dg-final { scan-tree-dump-times "Found vectorizable constructor" 4 "slp1" } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "slp1" } } */
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index e1061ede061751f5fd6c55edb56671a854e7456e..be7bb27899da06d22ddc4833b3e90a2f8c000177 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -1983,6 +1983,7 @@  vect_analyze_slp_instance (vec_info *vinfo,
   unsigned int i;
   struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
   vec<stmt_vec_info> scalar_stmts;
+  bool constructor = false;
 
   if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
     {
@@ -1996,6 +1997,13 @@  vect_analyze_slp_instance (vec_info *vinfo,
       vectype = STMT_VINFO_VECTYPE (stmt_info);
       group_size = REDUC_GROUP_SIZE (stmt_info);
     }
+  else if (is_gimple_assign (stmt_info->stmt)
+	    && gimple_assign_rhs_code (stmt_info->stmt) == CONSTRUCTOR)
+    {
+      vectype = TREE_TYPE (gimple_assign_rhs1 (stmt_info->stmt));
+      group_size = CONSTRUCTOR_NELTS (gimple_assign_rhs1 (stmt_info->stmt));
+      constructor = true;
+    }
   else
     {
       gcc_assert (is_a <loop_vec_info> (vinfo));
@@ -2042,6 +2050,25 @@  vect_analyze_slp_instance (vec_info *vinfo,
       STMT_VINFO_REDUC_DEF (vect_orig_stmt (stmt_info))
 	= STMT_VINFO_REDUC_DEF (vect_orig_stmt (scalar_stmts.last ()));
     }
+  else if (constructor)
+    {
+      tree rhs = gimple_assign_rhs1 (stmt_info->stmt);
+      tree val;
+      FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (rhs), i, val)
+	{
+	  if (TREE_CODE (val) == SSA_NAME)
+	    {
+	      gimple* def = SSA_NAME_DEF_STMT (val);
+	      stmt_vec_info def_info = vinfo->lookup_stmt (def);
+	      /* Value is defined in another basic block.  */
+	      if (!def_info)
+		return false;
+	      scalar_stmts.safe_push (def_info);
+	    }
+	  else
+	    return false;
+	}
+    }
   else
     {
       /* Collect reduction statements.  */
@@ -2099,6 +2126,8 @@  vect_analyze_slp_instance (vec_info *vinfo,
 	  SLP_INSTANCE_GROUP_SIZE (new_instance) = group_size;
 	  SLP_INSTANCE_UNROLLING_FACTOR (new_instance) = unrolling_factor;
 	  SLP_INSTANCE_LOADS (new_instance) = vNULL;
+	  SLP_INSTANCE_ROOT_STMT (new_instance) = constructor ? stmt_info->stmt\
+						  : NULL;
 	  vect_gather_slp_loads (new_instance, node);
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_NOTE, vect_location,
@@ -2801,6 +2830,10 @@  vect_bb_slp_scalar_cost (basic_block bb,
 		stmt_vec_info use_stmt_info = vinfo->lookup_stmt (use_stmt);
 		if (!use_stmt_info || !PURE_SLP_STMT (use_stmt_info))
 		  {
+		    /* Check this is not a constructor that will be vectorized
+		       away.  */
+		    if (BB_VINFO_GROUPED_STORES (vinfo).contains (use_stmt_info))
+			continue;
 		    (*life)[i] = true;
 		    BREAK_FROM_IMM_USE_STMT (use_iter);
 		  }
@@ -2912,6 +2945,38 @@  vect_bb_vectorization_profitable_p (bb_vec_info bb_vinfo)
   return true;
 }
 
+/* Find any vectorizable constructors, and add them to the grouped_store
+   array.  */
+
+static void
+vect_slp_check_for_constructors (bb_vec_info bb_vinfo)
+{
+  gimple_stmt_iterator gsi;
+
+  for (gsi = bb_vinfo->region_begin;
+      gsi_stmt (gsi) != gsi_stmt (bb_vinfo->region_end); gsi_next (&gsi))
+    {
+      gimple *stmt = gsi_stmt (gsi);
+
+      if (is_gimple_assign (stmt)
+	  && gimple_assign_rhs_code (stmt) == CONSTRUCTOR
+	  && TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME
+	  && TREE_CODE (TREE_TYPE (gimple_assign_lhs (stmt))) == VECTOR_TYPE)
+	{
+	  tree rhs = gimple_assign_rhs1 (stmt);
+
+	  if (CONSTRUCTOR_NELTS (rhs) == 0)
+	    continue;
+
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "Found vectorizable constructor: %G\n", stmt);
+	  stmt_vec_info stmt_info = bb_vinfo->lookup_stmt (stmt);
+	  BB_VINFO_GROUPED_STORES (bb_vinfo).safe_push (stmt_info);
+	}
+    }
+}
+
 /* Check if the region described by BB_VINFO can be vectorized, returning
    true if so.  When returning false, set FATAL to true if the same failure
    would prevent vectorization at other vector sizes, false if it is still
@@ -2959,6 +3024,8 @@  vect_slp_analyze_bb_1 (bb_vec_info bb_vinfo, int n_stmts, bool &fatal)
       return false;
     }
 
+  vect_slp_check_for_constructors (bb_vinfo);
+
   /* If there are no grouped stores in the region there is no need
      to continue with pattern recog as vect_analyze_slp will fail
      anyway.  */
@@ -4022,6 +4089,68 @@  vect_remove_slp_scalar_calls (slp_tree node)
   vect_remove_slp_scalar_calls (node, visited);
 }
 
+/* Vectorize the instance root.  Return success or failure.  */
+
+bool
+vectorize_slp_instance_root_stmt (slp_tree node, slp_instance instance)
+{
+
+  if (SLP_TREE_NUMBER_OF_VEC_STMTS (node) == 1)
+    {
+      stmt_vec_info child_stmt_info;
+      int j;
+      tree constructor;
+      constructor = gimple_assign_rhs1 (SLP_INSTANCE_ROOT_STMT (instance));
+
+      FOR_EACH_VEC_ELT (SLP_TREE_VEC_STMTS (node), j, child_stmt_info)
+	{
+	  tree vect_lhs = gimple_get_lhs (child_stmt_info->stmt);
+	  gassign *rstmt
+	    = gimple_build_assign (gimple_get_lhs (instance->root_stmt),
+		gimple_get_lhs (child_stmt_info->stmt));
+	  gimple_stmt_iterator rgsi = gsi_for_stmt (instance->root_stmt);
+	  if (TYPE_VECTOR_SUBPARTS (TREE_TYPE (vect_lhs)).is_constant ()
+	      && (CONSTRUCTOR_NELTS (constructor)
+		  == TYPE_VECTOR_SUBPARTS (TREE_TYPE (vect_lhs)).to_constant ()))
+	    {
+	      gsi_replace (&rgsi, rstmt, true);
+	      SLP_INSTANCE_ROOT_STMT (instance) = NULL;
+	    }
+	  break;
+	}
+    }
+  else if (SLP_TREE_NUMBER_OF_VEC_STMTS (node) > 1)
+    {
+      int nelts = SLP_TREE_NUMBER_OF_VEC_STMTS (node);
+      stmt_vec_info child_stmt_info;
+      int j;
+      vec<constructor_elt, va_gc> *v;
+      vec_alloc (v, nelts);
+      FOR_EACH_VEC_ELT (SLP_TREE_VEC_STMTS (node), j, child_stmt_info)
+	{
+	  CONSTRUCTOR_APPEND_ELT (v,
+				  NULL_TREE,
+				  gimple_get_lhs (child_stmt_info->stmt));
+	}
+      tree type = TREE_TYPE (gimple_assign_rhs1 (instance->root_stmt));
+      tree r_constructor = build_constructor (type, v);
+      gassign *rstmt
+	= gimple_build_assign (gimple_get_lhs (instance->root_stmt),
+	    r_constructor);
+      gimple_stmt_iterator rgsi = gsi_for_stmt (instance->root_stmt);
+      gsi_replace (&rgsi, rstmt, true);
+      SLP_INSTANCE_ROOT_STMT (instance) = NULL;
+    }
+  else
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location,
+	    "vectorization of constructor failed.\n");
+      return false;
+    }
+  return true;
+}
+
 /* Generate vector code for all SLP instances in the loop/basic block.  */
 
 void
@@ -4036,9 +4165,20 @@  vect_schedule_slp (vec_info *vinfo)
   slp_instances = vinfo->slp_instances;
   FOR_EACH_VEC_ELT (slp_instances, i, instance)
     {
+      slp_tree node = SLP_INSTANCE_TREE (instance);
       /* Schedule the tree of INSTANCE.  */
-      vect_schedule_slp_instance (SLP_INSTANCE_TREE (instance),
+      vect_schedule_slp_instance (node,
 				  instance, bst_map);
+
+      /* Vectorize the instance root.  */
+      if (instance->root == node && SLP_INSTANCE_ROOT_STMT (instance)
+	  && SLP_TREE_VEC_STMTS (node).exists ())
+	if (!vectorize_slp_instance_root_stmt (node, instance))
+	  {
+	    delete bst_map;
+	    return;
+	  }
+
       if (dump_enabled_p ())
 	dump_printf_loc (MSG_NOTE, vect_location,
                          "vectorizing stmts using SLP.\n");
@@ -4061,15 +4201,42 @@  vect_schedule_slp (vec_info *vinfo)
       if (is_a <loop_vec_info> (vinfo))
 	vect_remove_slp_scalar_calls (root);
 
+      auto_vec<stmt_vec_info> removed;
+
       for (j = 0; SLP_TREE_SCALAR_STMTS (root).iterate (j, &store_info)
                   && j < SLP_INSTANCE_GROUP_SIZE (instance); j++)
         {
 	  if (!STMT_VINFO_DATA_REF (store_info))
 	    break;
 
+	  if (removed.contains (store_info))
+	    continue;
+
 	  store_info = vect_orig_stmt (store_info);
+	  tree lhs = gimple_get_lhs (store_info->stmt);
+	  bool used = false;
+	  /* We must ensure we don't remove def stmts for ssa names used
+	     elsewhere.  */
+	  if (TREE_CODE (lhs) == SSA_NAME)
+	    {
+	      gimple *use_stmt;
+	      imm_use_iterator use_iter;
+
+	      FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, lhs)
+		{
+		  if (use_stmt != store_info->stmt)
+		  {
+		    used = true;
+		    BREAK_FROM_IMM_USE_STMT (use_iter);
+		  }
+		}
+	    }
+
 	  /* Free the attached stmt_vec_info and remove the stmt.  */
-	  vinfo->remove_stmt (store_info);
+
+	  if (!used)
+	    vinfo->remove_stmt (store_info);
+	  removed.safe_push (store_info);
         }
     }
 }
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 56be28b0cc5a77412f996e70636b08d5b615813e..9f8419e4208b7d438ace41892022f93ebcadd019 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -151,6 +151,10 @@  public:
   /* The root of SLP tree.  */
   slp_tree root;
 
+  /* For vector constructors, the constructor stmt that the SLP tree is built
+     from, NULL otherwise.  */
+  gimple *root_stmt;
+
   /* Size of groups of scalar stmts that will be replaced by SIMD stmt/s.  */
   unsigned int group_size;
 
@@ -170,6 +174,7 @@  public:
 #define SLP_INSTANCE_GROUP_SIZE(S)               (S)->group_size
 #define SLP_INSTANCE_UNROLLING_FACTOR(S)         (S)->unrolling_factor
 #define SLP_INSTANCE_LOADS(S)                    (S)->loads
+#define SLP_INSTANCE_ROOT_STMT(S)                (S)->root_stmt
 
 #define SLP_TREE_CHILDREN(S)                     (S)->children
 #define SLP_TREE_SCALAR_STMTS(S)                 (S)->stmts
-- 
2.17.1