diff mbox series

[v2,5/16] middle-end: Add shared machinery for matching patterns involving complex numbers.

Message ID 20200925142837.GA16579@arm.com
State New
Headers show
Series middle-end Add support for SLP vectorization of complex number instructions. | expand

Commit Message

Tamar Christina Sept. 25, 2020, 2:28 p.m. UTC
Hi All,

This patch adds shared machinery for detecting patterns having to do with
complex number operations.  The class ComplexPattern provides helpers for
matching and ultimately undoing the permutation in the tree by rebuilding the
graph.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-slp-patterns.c (complex_operation_t,class ComplexPattern):
	New.

--

Comments

Richard Biener Sept. 28, 2020, 1:21 p.m. UTC | #1
On Fri, 25 Sep 2020, Tamar Christina wrote:

> Hi All,
> 
> This patch adds shared machinery for detecting patterns having to do with
> complex number operations.  The class ComplexPattern provides helpers for
> matching and ultimately undoing the permutation in the tree by rebuilding the
> graph.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?

I think you want to change all this to not look at individual
stmts:

+    vect_match_expression_p (slp_tree node, tree_code code, int base, int 
idx,
+                            stmt_vec_info *op1, stmt_vec_info *op2)
+    {
+
+      vec<stmt_vec_info> scalar_stmts = SLP_TREE_SCALAR_STMTS (node);
+

_all_ lanes are supposed to match the operation in SLP_TREE_REPRESENTATIVE
there's no need to do any operation-matching on lane granularity.

The only thing making a difference here is VEC_PERM_EXPR nodes where
again - there's no need to look at (eventually non-existant!)
SLP_TREE_SCALAR_STMTS but its SLP_TREE_REPRESENTATIVE.

+      vec<slp_tree> children = SLP_TREE_CHILDREN (node);
+
+      /* If it's a VEC_PERM_EXPR we need to look one deeper.  
VEC_PERM_EXPR
+        only have one entry.  So pick on.  */
+      if (node->code == VEC_PERM_EXPR)
+       children = SLP_TREE_CHILDREN (children.last ());

that's too simplistic ;)

+         *op1 = SLP_TREE_SCALAR_STMTS (children[0])[n];

please make op1,op2 slp_tree, not stmt_vec_info.

Where do you look at SLP_TREE_LANE_PERMUTATION at all?  I think
the result of

+    vect_detect_pair_op (int base, slp_tree node1, int offset1, slp_tree 
node2,
+                        int offset2, vec<stmt_vec_info> *ops)

could be simply the SLP_TREE_LANE_PERMUTATION? plus its two
child nodes?

In the ComplexAddPattern patch I see

+      /* Correct the arguments after matching.  */
+      std::swap (this->m_vects[1], this->m_vects[3]);

how's that necessary?  The replacement SLP node should end up
with just a SLP_TREE_REPRESENTATIVE (the IFN function call).
That is, the only thing necessary is verification / matching of the 
appropriate "interleaving" scheme imposed by SLP_TREE_LANE_PERMUTATION.
I guess things would go wrong if instead of effectively blending
two vectors we'd interleave two smaller vector type vectors?  Say
a 4-way add and a 4-way sub interleaved into a 8-way addsub, using
V4SF and V8SF vectors?

Going to stop looking at the series at this point, one further note
is that all of the Complex*Patterns seem to share the initial
match and thus redundantly do a vect_detect_pair_op repeatedly
on each node for each pattern?  I wonder if it could be
commoned into a single pattern, they all seem to share
the initial mixed plus/minus, then we have the multiplication
on one or both operand cases.

Richard.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-vect-slp-patterns.c (complex_operation_t,class ComplexPattern):
> 	New.
> 
>
Tamar Christina Sept. 28, 2020, 4:06 p.m. UTC | #2
Hi Richi,

> -----Original Message-----
> From: rguenther@c653.arch.suse.de <rguenther@c653.arch.suse.de> On
> Behalf Of Richard Biener
> Sent: Monday, September 28, 2020 2:22 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; ook@ucw.cz
> Subject: Re: [PATCH v2 5/16]middle-end: Add shared machinery for matching
> patterns involving complex numbers.
> 
> On Fri, 25 Sep 2020, Tamar Christina wrote:
> 
> > Hi All,
> >
> > This patch adds shared machinery for detecting patterns having to do
> > with complex number operations.  The class ComplexPattern provides
> > helpers for matching and ultimately undoing the permutation in the
> > tree by rebuilding the graph.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> 
> I think you want to change all this to not look at individual
> stmts:
> 
> +    vect_match_expression_p (slp_tree node, tree_code code, int base,
> + int
> idx,
> +                            stmt_vec_info *op1, stmt_vec_info *op2)
> +    {
> +
> +      vec<stmt_vec_info> scalar_stmts = SLP_TREE_SCALAR_STMTS (node);
> +
> 
> _all_ lanes are supposed to match the operation in
> SLP_TREE_REPRESENTATIVE there's no need to do any operation-matching
> on lane granularity.
> 

That's fair, that flexibility seems like it indeed won't work since the statements are vectorized based on
SLP_TREE_REPRESENTATIVE anyway. So I'll simplify it.

> The only thing making a difference here is VEC_PERM_EXPR nodes where
> again - there's no need to look at (eventually non-existant!)
> SLP_TREE_SCALAR_STMTS but its SLP_TREE_REPRESENTATIVE.
> 
> +      vec<slp_tree> children = SLP_TREE_CHILDREN (node);
> +
> +      /* If it's a VEC_PERM_EXPR we need to look one deeper.
> VEC_PERM_EXPR
> +        only have one entry.  So pick on.  */
> +      if (node->code == VEC_PERM_EXPR)
> +       children = SLP_TREE_CHILDREN (children.last ());
> 
> that's too simplistic ;)
> 
> +         *op1 = SLP_TREE_SCALAR_STMTS (children[0])[n];
> 
> please make op1,op2 slp_tree, not stmt_vec_info.
> 
> Where do you look at SLP_TREE_LANE_PERMUTATION at all?  I think the
> result of
> 

Here I have to admit that I have/am a bit confused as to the relation between the different permute fields.
LOAD permute is quite straight forward, LANE permute I think are shuffles/explicit permutes. 

But then I am lost as to the purpose of a VEC_PERM_EXPR nodes. Is it just a marker to indicate that some node below
has a load permute somewhere?

> +    vect_detect_pair_op (int base, slp_tree node1, int offset1,
> + slp_tree
> node2,
> +                        int offset2, vec<stmt_vec_info> *ops)
> 
> could be simply the SLP_TREE_LANE_PERMUTATION? plus its two child
> nodes?

Right, if I understood correctly, on the two_operands case the lane permute
would tell me whether it's + or -, and in the case of the non- two_operands cases
I probably want to check that it's vNULL since any permute in the order changes
how the instruction accepts the inputs?

> 
> In the ComplexAddPattern patch I see
> 
> +      /* Correct the arguments after matching.  */
> +      std::swap (this->m_vects[1], this->m_vects[3]);
> 
> how's that necessary?  The replacement SLP node should end up with just a
> SLP_TREE_REPRESENTATIVE (the IFN function call).
> That is, the only thing necessary is verification / matching of the appropriate
> "interleaving" scheme imposed by SLP_TREE_LANE_PERMUTATION.

But the order or the statements are important as those decide the LOAD_PERMUTATION
that build_slp_tree creates. 

So in this case the original statement is

       stmt 0 _39 = _37 + _12;
       stmt 1 _6 = _38 - _36;

{_12, _36} result in a LOAD_PERMUTATION of {1, 0} because of how the addition is done.
So to undo the LOAD_PERMUTE it has to build the new children using

       stmt 0 _39 = {_37, _36};
       stmt 1 _6 = {_38, _12};

So the scalar statements are purely a re-ordering to get it to rebuild the children correctly.

Now ADD is simplistic, the more complicated cases like MUL and FMA use this property to
Also rebuild the tree removing unneeded statements.  This method returns these and stores
them in the PatternMatch class, so I don't have to ask for them again later when replacing the
nodes.  Even for SLP_TREE_REPRESENTATIVE don't the arguments in the IFN call need to be
in such way that they both in the same place in the load chain?

> I guess things would go wrong if instead of effectively blending two vectors
> we'd interleave two smaller vector type vectors?  Say a 4-way add and a 4-
> way sub interleaved into a 8-way addsub, using V4SF and V8SF vectors?
> 

At this stage yes, most likely, but it should be rejected at the validate level.

What is also currently rejected is when some of the definitions are external,
which I think I should be able to handle. I'll fix that up with the rest of the changes.


Thanks for the feedback!

Regards,
Tamar

> Going to stop looking at the series at this point, one further note is that all of
> the Complex*Patterns seem to share the initial match and thus redundantly
> do a vect_detect_pair_op repeatedly on each node for each pattern?  I
> wonder if it could be commoned into a single pattern, they all seem to share
> the initial mixed plus/minus, then we have the multiplication on one or both
> operand cases.
> 
> Richard.
> 
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	* tree-vect-slp-patterns.c (complex_operation_t,class
> ComplexPattern):
> > 	New.
> >
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> Nuernberg, Germany; GF: Felix Imend
Richard Biener Sept. 29, 2020, 6:32 a.m. UTC | #3
On Mon, 28 Sep 2020, Tamar Christina wrote:

> Hi Richi,
> 
> > -----Original Message-----
> > From: rguenther@c653.arch.suse.de <rguenther@c653.arch.suse.de> On
> > Behalf Of Richard Biener
> > Sent: Monday, September 28, 2020 2:22 PM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; ook@ucw.cz
> > Subject: Re: [PATCH v2 5/16]middle-end: Add shared machinery for matching
> > patterns involving complex numbers.
> > 
> > On Fri, 25 Sep 2020, Tamar Christina wrote:
> > 
> > > Hi All,
> > >
> > > This patch adds shared machinery for detecting patterns having to do
> > > with complex number operations.  The class ComplexPattern provides
> > > helpers for matching and ultimately undoing the permutation in the
> > > tree by rebuilding the graph.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >
> > > Ok for master?
> > 
> > I think you want to change all this to not look at individual
> > stmts:
> > 
> > +    vect_match_expression_p (slp_tree node, tree_code code, int base,
> > + int
> > idx,
> > +                            stmt_vec_info *op1, stmt_vec_info *op2)
> > +    {
> > +
> > +      vec<stmt_vec_info> scalar_stmts = SLP_TREE_SCALAR_STMTS (node);
> > +
> > 
> > _all_ lanes are supposed to match the operation in
> > SLP_TREE_REPRESENTATIVE there's no need to do any operation-matching
> > on lane granularity.
> > 
> 
> That's fair, that flexibility seems like it indeed won't work since the statements are vectorized based on
> SLP_TREE_REPRESENTATIVE anyway. So I'll simplify it.
> 
> > The only thing making a difference here is VEC_PERM_EXPR nodes where
> > again - there's no need to look at (eventually non-existant!)
> > SLP_TREE_SCALAR_STMTS but its SLP_TREE_REPRESENTATIVE.
> > 
> > +      vec<slp_tree> children = SLP_TREE_CHILDREN (node);
> > +
> > +      /* If it's a VEC_PERM_EXPR we need to look one deeper.
> > VEC_PERM_EXPR
> > +        only have one entry.  So pick on.  */
> > +      if (node->code == VEC_PERM_EXPR)
> > +       children = SLP_TREE_CHILDREN (children.last ());
> > 
> > that's too simplistic ;)
> > 
> > +         *op1 = SLP_TREE_SCALAR_STMTS (children[0])[n];
> > 
> > please make op1,op2 slp_tree, not stmt_vec_info.
> > 
> > Where do you look at SLP_TREE_LANE_PERMUTATION at all?  I think the
> > result of
> > 
> 
> Here I have to admit that I have/am a bit confused as to the relation 
> between the different permute fields. LOAD permute is quite straight 
> forward, LANE permute I think are shuffles/explicit permutes.
> 
> But then I am lost as to the purpose of a VEC_PERM_EXPR nodes. Is it 
> just a marker to indicate that some node below has a load permute 
> somewhere?

I've introduced VEC_PERM_EXPR for the lack of an isolated way to
perform a permute operation.  Before we've only had the chance
to specify a permutation at loads.  I'm still working on the
load side - one common optimization there for example is to
move a common load permutation across isomorphic operations,
like a plus of two same permuted loads can be done as a
permute of the plus of two unpermuted loads.

Now, the way VEC_PERM SLP nodes are designed also allows them
to be used to mediate between SLP sub-graphs using for example
different vector sizes.  That's because they semantically represent
a concat of the VEC_PERM children lanes and then applying a
selection mask as to which lanes to output.  The comment of
vectorizable_slp_permutation has the most detail here
(it doesn't yet fully implement all cases since VEC_PERM is
currently only used for the two_operators case).

The SLP_TREE_LANE_PERMUTATION specifies { child-nr, lane-nr }
pairs (so it enumerates the 'scalar' output lanes).

I think for all complex ops you are looking for
{ { 0, 2*i + 1 }, { 0, 2*i }, ... } or { { 0, 0 }, { 0, 1 }, .. }
kind of vectors, blended according to mix plus/minus appropriately?

> > +    vect_detect_pair_op (int base, slp_tree node1, int offset1,
> > + slp_tree
> > node2,
> > +                        int offset2, vec<stmt_vec_info> *ops)
> > 
> > could be simply the SLP_TREE_LANE_PERMUTATION? plus its two child
> > nodes?
> 
> Right, if I understood correctly, on the two_operands case the lane 
> permute would tell me whether it's + or -, and in the case of the non- 
> two_operands cases I probably want to check that it's vNULL since any 
> permute in the order changes how the instruction accepts the inputs?

In the vectorized code you can think of the permute nodes producing
a vector permute instruction, shuffling lanes around.  If that changes
any of the semantics then you need to see how to deal with it.

> > 
> > In the ComplexAddPattern patch I see
> > 
> > +      /* Correct the arguments after matching.  */
> > +      std::swap (this->m_vects[1], this->m_vects[3]);
> > 
> > how's that necessary?  The replacement SLP node should end up with just a
> > SLP_TREE_REPRESENTATIVE (the IFN function call).
> > That is, the only thing necessary is verification / matching of the appropriate
> > "interleaving" scheme imposed by SLP_TREE_LANE_PERMUTATION.
> 
> But the order or the statements are important as those decide the 
> LOAD_PERMUTATION that build_slp_tree creates.
> 
> So in this case the original statement is
> 
>        stmt 0 _39 = _37 + _12;
>        stmt 1 _6 = _38 - _36;
> 
> {_12, _36} result in a LOAD_PERMUTATION of {1, 0} because of how the 
> addition is done. So to undo the LOAD_PERMUTE it has to build the new 
> children using
> 
>        stmt 0 _39 = {_37, _36};
>        stmt 1 _6 = {_38, _12};
> 
> So the scalar statements are purely a re-ordering to get it to rebuild 
> the children correctly.

Hmm, but you're looking at the permute node here which has the
add and the subtract as children.  Only that eventually has
{_12, _36} as child which may be a load containing a permute.
If you need {_12, _36} swapped you shoulnd't change the scalar
stmts array in some (which?) SLP node - instead you could
insert a VEC_PERM SLP node applying the permute (the not yet
written phase to optimize permutes should then eventually merge
this with a LOAD_PERMUTATION).

> Now ADD is simplistic, the more complicated cases like MUL and FMA use 
> this property to Also rebuild the tree removing unneeded statements.  
> This method returns these and stores them in the PatternMatch class, so 
> I don't have to ask for them again later when replacing the nodes.  
> Even for SLP_TREE_REPRESENTATIVE don't the arguments in the IFN call 
> need to be in such way that they both in the same place in the load 
> chain?

The GIMPLE stmts just serve as a "pattern" for the actual operation,
the vectorized stmts are now constructed entirely from the SLP
node where for the case of the IFN call the first argument for the 
vectorized IFN is picked from its SLP node first childs vectorized
defs.  That's as opposed to what we needed to do in the past where
we tried to match up the scalar stmts operand with the appropriate
scalar lane def.

> > I guess things would go wrong if instead of effectively blending two 
> > vectors we'd interleave two smaller vector type vectors?  Say a 4-way 
> > add and a 4- way sub interleaved into a 8-way addsub, using V4SF and 
> > V8SF vectors?
> > 
> 
> At this stage yes, most likely, but it should be rejected at the 
> validate level.
> 
> What is also currently rejected is when some of the definitions are 
> external, which I think I should be able to handle. I'll fix that up 
> with the rest of the changes.
> 
> 
> Thanks for the feedback!

You're welcome.

Richard.

> Regards,
> Tamar
> 
> > Going to stop looking at the series at this point, one further note is that all of
> > the Complex*Patterns seem to share the initial match and thus redundantly
> > do a vect_detect_pair_op repeatedly on each node for each pattern?  I
> > wonder if it could be commoned into a single pattern, they all seem to share
> > the initial mixed plus/minus, then we have the multiplication on one or both
> > operand cases.
> > 
> > Richard.
> > 
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > > 	* tree-vect-slp-patterns.c (complex_operation_t,class
> > ComplexPattern):
> > > 	New.
> > >
> > >
> > 
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> > Nuernberg, Germany; GF: Felix Imend
>
Tamar Christina Nov. 3, 2020, 3:06 p.m. UTC | #4
Hi All,

This is a respin containing the requested changes.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-slp-patterns.c (vect_match_expression_p,
	vect_check_lane_permute, vect_detect_pair_op,
	vect_mark_stmts_as_in_pattern, class complex_pattern,
	complex_pattern::validate_p, class complex_operations_pattern,
	complex_operations_pattern::matches,
	complex_operations_pattern::get_name,
	complex_operations_pattern::build,
	complex_operations_pattern::validate_p,
	complex_operations_pattern::get_arity,
	complex_operations_pattern::is_optab_supported_p,
	complex_operations_pattern::get_ifn): New.
	(slp_patterns): Add complex_operations_pattern.

> -----Original Message-----
> From: Gcc-patches <gcc-patches-bounces@gcc.gnu.org> On Behalf Of Tamar
> Christina
> Sent: Monday, September 28, 2020 5:06 PM
> To: Richard Biener <rguenther@suse.de>
> Cc: nd <nd@arm.com>; gcc-patches@gcc.gnu.org; ook@ucw.cz
> Subject: RE: [PATCH v2 5/16]middle-end: Add shared machinery for matching
> patterns involving complex numbers.
> 
> Hi Richi,
> 
> > -----Original Message-----
> > From: rguenther@c653.arch.suse.de <rguenther@c653.arch.suse.de> On
> > Behalf Of Richard Biener
> > Sent: Monday, September 28, 2020 2:22 PM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; ook@ucw.cz
> > Subject: Re: [PATCH v2 5/16]middle-end: Add shared machinery for
> > matching patterns involving complex numbers.
> >
> > On Fri, 25 Sep 2020, Tamar Christina wrote:
> >
> > > Hi All,
> > >
> > > This patch adds shared machinery for detecting patterns having to do
> > > with complex number operations.  The class ComplexPattern provides
> > > helpers for matching and ultimately undoing the permutation in the
> > > tree by rebuilding the graph.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >
> > > Ok for master?
> >
> > I think you want to change all this to not look at individual
> > stmts:
> >
> > +    vect_match_expression_p (slp_tree node, tree_code code, int base,
> > + int
> > idx,
> > +                            stmt_vec_info *op1, stmt_vec_info *op2)
> > +    {
> > +
> > +      vec<stmt_vec_info> scalar_stmts = SLP_TREE_SCALAR_STMTS (node);
> > +
> >
> > _all_ lanes are supposed to match the operation in
> > SLP_TREE_REPRESENTATIVE there's no need to do any operation-matching
> > on lane granularity.
> >
> 
> That's fair, that flexibility seems like it indeed won't work since the
> statements are vectorized based on SLP_TREE_REPRESENTATIVE anyway. So
> I'll simplify it.
> 
> > The only thing making a difference here is VEC_PERM_EXPR nodes where
> > again - there's no need to look at (eventually non-existant!)
> > SLP_TREE_SCALAR_STMTS but its SLP_TREE_REPRESENTATIVE.
> >
> > +      vec<slp_tree> children = SLP_TREE_CHILDREN (node);
> > +
> > +      /* If it's a VEC_PERM_EXPR we need to look one deeper.
> > VEC_PERM_EXPR
> > +        only have one entry.  So pick on.  */
> > +      if (node->code == VEC_PERM_EXPR)
> > +       children = SLP_TREE_CHILDREN (children.last ());
> >
> > that's too simplistic ;)
> >
> > +         *op1 = SLP_TREE_SCALAR_STMTS (children[0])[n];
> >
> > please make op1,op2 slp_tree, not stmt_vec_info.
> >
> > Where do you look at SLP_TREE_LANE_PERMUTATION at all?  I think the
> > result of
> >
> 
> Here I have to admit that I have/am a bit confused as to the relation
> between the different permute fields.
> LOAD permute is quite straight forward, LANE permute I think are
> shuffles/explicit permutes.
> 
> But then I am lost as to the purpose of a VEC_PERM_EXPR nodes. Is it just a
> marker to indicate that some node below has a load permute somewhere?
> 
> > +    vect_detect_pair_op (int base, slp_tree node1, int offset1,
> > + slp_tree
> > node2,
> > +                        int offset2, vec<stmt_vec_info> *ops)
> >
> > could be simply the SLP_TREE_LANE_PERMUTATION? plus its two child
> > nodes?
> 
> Right, if I understood correctly, on the two_operands case the lane permute
> would tell me whether it's + or -, and in the case of the non- two_operands
> cases I probably want to check that it's vNULL since any permute in the order
> changes how the instruction accepts the inputs?
> 
> >
> > In the ComplexAddPattern patch I see
> >
> > +      /* Correct the arguments after matching.  */
> > +      std::swap (this->m_vects[1], this->m_vects[3]);
> >
> > how's that necessary?  The replacement SLP node should end up with
> > just a SLP_TREE_REPRESENTATIVE (the IFN function call).
> > That is, the only thing necessary is verification / matching of the
> > appropriate "interleaving" scheme imposed by
> SLP_TREE_LANE_PERMUTATION.
> 
> But the order or the statements are important as those decide the
> LOAD_PERMUTATION that build_slp_tree creates.
> 
> So in this case the original statement is
> 
>        stmt 0 _39 = _37 + _12;
>        stmt 1 _6 = _38 - _36;
> 
> {_12, _36} result in a LOAD_PERMUTATION of {1, 0} because of how the
> addition is done.
> So to undo the LOAD_PERMUTE it has to build the new children using
> 
>        stmt 0 _39 = {_37, _36};
>        stmt 1 _6 = {_38, _12};
> 
> So the scalar statements are purely a re-ordering to get it to rebuild the
> children correctly.
> 
> Now ADD is simplistic, the more complicated cases like MUL and FMA use this
> property to Also rebuild the tree removing unneeded statements.  This
> method returns these and stores them in the PatternMatch class, so I don't
> have to ask for them again later when replacing the nodes.  Even for
> SLP_TREE_REPRESENTATIVE don't the arguments in the IFN call need to be in
> such way that they both in the same place in the load chain?
> 
> > I guess things would go wrong if instead of effectively blending two
> > vectors we'd interleave two smaller vector type vectors?  Say a 4-way
> > add and a 4- way sub interleaved into a 8-way addsub, using V4SF and V8SF
> vectors?
> >
> 
> At this stage yes, most likely, but it should be rejected at the validate level.
> 
> What is also currently rejected is when some of the definitions are external,
> which I think I should be able to handle. I'll fix that up with the rest of the
> changes.
> 
> 
> Thanks for the feedback!
> 
> Regards,
> Tamar
> 
> > Going to stop looking at the series at this point, one further note is
> > that all of the Complex*Patterns seem to share the initial match and
> > thus redundantly do a vect_detect_pair_op repeatedly on each node for
> > each pattern?  I wonder if it could be commoned into a single pattern,
> > they all seem to share the initial mixed plus/minus, then we have the
> > multiplication on one or both operand cases.
> >
> > Richard.
> >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > > 	* tree-vect-slp-patterns.c (complex_operation_t,class
> > ComplexPattern):
> > > 	New.
> > >
> > >
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> > Nuernberg, Germany; GF: Felix Imend
diff mbox series

Patch

diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
index f605f68d2a14c4bf4941f97b7c1d57f6acb5ffb1..6453a5b1b6464dba833adc2c2a194db5e712bb79 100644
--- a/gcc/tree-vect-slp-patterns.c
+++ b/gcc/tree-vect-slp-patterns.c
@@ -134,6 +134,19 @@  along with GCC; see the file COPYING3.  If not see
   To add a new pattern, implement the VectPattern class and add the type to
   slp_patterns.  */
 
+/* The COMPLEX_OPERATION enum denotes the possible pair of operations that can
+   be matched when looking for expressions that we are interested matching for
+   complex numbers addition and mla.  */
+
+typedef enum _complex_operation {
+  PLUS_PLUS,
+  MINUS_PLUS,
+  PLUS_MINUS,
+  MULT_MULT,
+  NEG_NEG,
+  CMPLX_NONE
+} complex_operation_t;
+
 /* VectSimplePatternMatch holds contextual information about a single match
    found in the SLP tree.  The use of the class is to allow you to defer
    performing any modifications to the SLP tree until they are to be done.  By
@@ -298,6 +311,358 @@  class VectSimplePatternMatch : public VectPatternMatch
     }
 };
 
+/* The ComplexPattern class contains common code for pattern matchers that work
+   on complex numbers.  These provide functionality to allow de-construction and
+   validation of sequences depicting/transforming REAL and IMAG pairs.  */
+
+class ComplexPattern : public VectPattern
+{
+  protected:
+    /* Current list of arguments that were found during the current invocation
+       of the pattern matcher.  */
+    vec<stmt_vec_info> m_vects;
+
+    /* Representative statement for the current match being performed.  */
+    stmt_vec_info m_stmt_info;
+
+    /* A list of all arguments found between all invocations of the current
+       pattern matcher.  */
+    vec<vec<stmt_vec_info>> m_defs;
+
+    /* Checks to see of the expression EXPR is a gimple assign with code CODE
+       and if this is the case the two operands of EXPR is returned in OP1 and
+       OP2.
+
+       If the matching and extraction is successful TRUE is returned otherwise
+       FALSE in which case the value of OP1 and OP2 will not have been touched.
+    */
+
+    bool
+    vect_match_expression_p (slp_tree node, tree_code code, int base, int idx,
+			     stmt_vec_info *op1, stmt_vec_info *op2)
+    {
+
+      vec<stmt_vec_info> scalar_stmts = SLP_TREE_SCALAR_STMTS (node);
+
+      /* Calculate the index of the statement in the node to inspect.  */
+      int n = base + idx;
+      if (scalar_stmts.length () < (unsigned)n) // can use group_size
+	return false;
+
+      gimple* expr = STMT_VINFO_STMT (scalar_stmts[n]);
+      if (!is_gimple_assign (expr)
+	  || gimple_expr_code (expr) != code)
+	return false;
+
+      vec<slp_tree> children = SLP_TREE_CHILDREN (node);
+
+      /* If it's a VEC_PERM_EXPR we need to look one deeper.  VEC_PERM_EXPR
+	 only have one entry.  So pick on.  */
+      if (node->code == VEC_PERM_EXPR)
+	children = SLP_TREE_CHILDREN (children.last ());
+
+      if (children.length () != (op2 ? 2 : 1))
+	return false;
+
+      if (op1)
+	{
+	  if (SLP_TREE_DEF_TYPE (children[0]) != vect_internal_def)
+	    return false;
+	  *op1 = SLP_TREE_SCALAR_STMTS (children[0])[n];
+	}
+
+      if (op2)
+	{
+	  if (SLP_TREE_DEF_TYPE (children[1]) != vect_internal_def)
+	    return false;
+	  *op2 = SLP_TREE_SCALAR_STMTS (children[1])[n];
+	}
+
+      return true;
+    }
+
+    /* This function will match two gimple expressions STMT_0 and STMT_1 in
+       parallel and returns the pair operation that represents the two
+       expressions in the two statements.  The statements are located in NODE1
+       and NODE2 at offset base + offset1 and base + offset2 respectively.
+
+       If match is successful then the corresponding complex_operation is
+       returned and the arguments to the two matched operations are returned in
+       OPS.
+
+       If unsuccessful then CMPLX_NONE is returned and OPS is untouched.
+
+       e.g. the following gimple statements
+
+       stmt 0 _39 = _37 + _12;
+       stmt 1 _6 = _38 - _36;
+
+       will return PLUS_MINUS along with OPS containing {_37, _12, _38, _36}.
+    */
+
+    complex_operation_t
+    vect_detect_pair_op (int base, slp_tree node1, int offset1, slp_tree node2,
+			 int offset2, vec<stmt_vec_info> *ops)
+    {
+      stmt_vec_info op1 = NULL, op2 = NULL, op3 = NULL, op4 = NULL;
+      complex_operation_t result = CMPLX_NONE;
+      #define CHECK_FOR(x, y, z)                                            \
+	(vect_match_expression_p (node1, x, base, offset1, &op1,            \
+				  z ? &op2 : NULL)                          \
+	 && vect_match_expression_p (node2, y, base, offset2, &op3,         \
+				     z ? &op4 : NULL))
+
+      if (CHECK_FOR (MINUS_EXPR, PLUS_EXPR, true))
+	result = MINUS_PLUS;
+      else if (CHECK_FOR (PLUS_EXPR, MINUS_EXPR, true))
+	result = PLUS_MINUS;
+      else if (CHECK_FOR (PLUS_EXPR, PLUS_EXPR, true))
+	result = PLUS_PLUS;
+      else if (CHECK_FOR (MULT_EXPR, MULT_EXPR, true))
+	result = MULT_MULT;
+      else if (CHECK_FOR (NEGATE_EXPR, NEGATE_EXPR, false))
+	result = NEG_NEG;
+      #undef CHECK_FOR
+
+      if (result != CMPLX_NONE && ops != NULL)
+	{
+	  ops->create (4);
+	  ops->quick_push (op1);
+	  ops->quick_push (op2);
+	  ops->quick_push (op3);
+	  ops->quick_push (op4);
+	}
+      return result;
+    }
+
+    /* Overload of vect_detect_pair_op where the statements are assumed to be
+       one after the other.  This inspects node[base] and node[base+1].  */
+
+    complex_operation_t
+    vect_detect_pair_op (int base, slp_tree node, vec<stmt_vec_info> *ops)
+    {
+      return vect_detect_pair_op (base, node, 0, node, 1, ops);
+    }
+
+    /* Create the intermediate states that are needed and generate a new match
+       object with the information.  */
+
+    bool
+    store_results ()
+    {
+      this->m_defs.safe_push (this->m_vects);
+      save_match ();
+      return true;
+    }
+
+    /* This function marks every statement that is being replaced during the
+       the pattern matching as PURE.  Normally when replacing a statement due
+       to a pattern we add the statement to the STMT_VINFO_PATTERN_DEF_SEQ of
+       the pattern that is replacing them.  In this case however this won't
+       work as when doing the replacement we are changing the nodes that are
+       used by the statements.  This means that when vectorized the SSA chain
+       is different than in the BB.
+
+       Declaring the statements as part of the sequence will then cause SSA
+       verification to fail as we may refer to statements that were not in the
+       original USE-DEF chain of the statement we are replacing.
+
+       The downside of this approach is that the statements will still be
+       seen as relevant and so we will still generate code for them and they
+       will be in the output, unconnected until DSE.  We could mark them as
+       irrelevant, but that is only safe if there are no more uses of the node
+       in the SLP graph (So perhaps this should be done in free_slp_tree
+       instead of here.  */
+
+    static void
+    vect_mark_stmts_as_in_pattern (hash_set<slp_tree> *cache, vec<stmt_vec_info> orig,
+				   slp_tree node)
+    {
+       if (cache->contains (node))
+	  return;
+
+       unsigned i;
+       stmt_vec_info stmt_info;
+       slp_tree child;
+
+       cache->add (node);
+
+       FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info)
+	 {
+	    if (gimple_assign_load_p (STMT_VINFO_STMT (stmt_info)))
+	      return;
+
+	    STMT_SLP_TYPE (stmt_info) = pure_slp;
+	 }
+
+       FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
+	 vect_mark_stmts_as_in_pattern (cache, orig, child);
+    }
+
+  protected:
+    ComplexPattern (slp_tree node, vec_info *vinfo)
+      : VectPattern (node, vinfo)
+    { }
+
+    /* Create and store a new VectPatternMatch object with the current match
+       that was found.  */
+
+    void save_match ()
+    {
+      tree type = gimple_expr_type (STMT_VINFO_STMT (this->m_stmt_info));
+      tree vectype = get_vectype_for_scalar_type (this->m_vinfo, type,
+						  this->m_node);
+      VectPatternMatch *match
+	= new VectSimplePatternMatch (this->m_arity, this->m_defs.last (),
+				      this->m_last_ifn, this->m_vinfo,
+				      this->m_last_idx, this->m_node, type,
+				      vectype, this->m_num_args);
+      this->m_matches.safe_push (match);
+    }
+
+  public:
+
+    /* Check to see if all loads rooted in ROOT are linear.  Linearity is
+       defined as having no gaps between values loaded.  */
+    static bool
+    linear_loads_p (slp_tree root)
+    {
+      if (!root)
+	return false;
+
+      unsigned i;
+
+      if (SLP_TREE_LOAD_PERMUTATION (root).exists ())
+	{
+	  vec<unsigned> loads = SLP_TREE_LOAD_PERMUTATION (root);
+	  unsigned leader = loads[0];
+	  unsigned load;
+	  FOR_EACH_VEC_ELT_FROM (loads, i, load, 1)
+	  if (load != ++leader)
+	    return false;
+	}
+
+      slp_tree child;
+      FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (root), i, child)
+	if (!linear_loads_p (child))
+	  return false;
+
+      return true;
+    }
+
+    /* The post transform and validation function for the complex number
+       patterns.  This will re-arrange the tree and re-organize the nodes such
+       that they can be used by the complex number instructions that are to be
+       created.  It does this by doing the following steps:
+
+       1) It looks up the definition nodes of each statement in DEFS which are
+	  the new arguments to be used in the new patterns that will be created.
+	  From this new SLP trees are created by calling vect_build_slp_tree
+	  with the statements in the order we expect them to be.  A majority of
+	  these will be found in the cache and so this call will be fast.
+
+       2) After the new trees are created we check to see if all of them are
+	  linear.  If they are not linear we abort and undo the bookkeeping
+	  information that vect_build_slp_tree created for them.
+
+       3) The children of NODE are replaced with the new set of nodes we
+	  created.
+
+       4) The new sub-tree rooted in NODE are marked as relevant.
+
+       This sequence of operations does an implicit re-ordering nodes.  After
+       which DSE can remove the unused nodes.  e.g. it will undo as much of the
+       permutes as it possibly can.  This is required such that pattern
+       matchers running on the newly created statements match the correct
+       operations.  */
+
+    bool validate_p (poly_uint64 *max_nunits, bool *matches,
+		     unsigned *npermutes, unsigned *tree_size,
+		     scalar_stmts_to_slp_tree_map_t * bst_map)
+    {
+      int group_size = SLP_TREE_SCALAR_STMTS (this->m_node).length ();
+      auto_vec<slp_tree> nodes;
+      auto_vec<stmt_vec_info> stmts;
+      stmts.create (0);
+      stmts.safe_grow_cleared (group_size);
+      nodes.create (0);
+      nodes.safe_grow_cleared (this->m_num_args);
+      slp_tree tmp = NULL;
+      vec<stmt_vec_info> iters = SLP_TREE_SCALAR_STMTS (this->m_node);
+
+      VectPatternMatch *match;
+      unsigned int i, count;
+      int idx = -1;
+      hash_set<slp_tree> *visited = new hash_set<slp_tree> ();
+
+      for (idx = 0; idx < this->m_num_args; idx++)
+	{
+	  count = 0;
+	  FOR_EACH_VEC_ELT (this->m_matches, i, match)
+	  {
+	    vec<stmt_vec_info> def = this->m_defs[i];
+	    for (int x = 0; x < this->m_arity; x++)
+	      {
+		stmt_vec_info op = def[idx + (x * this->m_num_args)];
+		stmts[count++] = op;
+	      }
+	  }
+
+ 	  /* We need top copy the statements in case the node is not in the
+	     cache.  But if it is in the cache we leak?  */
+	  vec<stmt_vec_info> new_stmts = stmts.copy ();
+	  tmp = vect_build_slp_tree (this->m_vinfo, new_stmts, group_size,
+				     max_nunits, matches, npermutes, tree_size,
+				     bst_map);
+
+	  gimple *info
+	    = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (this->m_node));
+	  if (!tmp)
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "Could not build new SLP tree for %G\n", info);
+
+	      goto graceful_exit;
+	    }
+
+	  nodes[idx] = tmp;
+	  visited->add (tmp);
+
+	  if (!linear_loads_p (tmp))
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "Loads could not be made linear %G\n", info);
+
+	      goto graceful_exit;
+	    }
+	}
+
+      /* Mark all statements that are unused between the new and old nodes as in
+	 a pattern.  */
+      FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (this->m_node), i, tmp)
+	vect_mark_stmts_as_in_pattern (visited, iters, tmp);
+
+      delete visited;
+
+      SLP_TREE_CHILDREN (this->m_node).truncate (0);
+      SLP_TREE_CHILDREN (this->m_node).safe_splice (nodes);
+
+      return true;
+
+graceful_exit:
+
+      delete visited;
+
+      FOR_EACH_VEC_ELT (nodes, i, tmp)
+        if (tmp)
+	  vect_free_slp_tree (tmp);
+
+      return false;
+    }
+};
+
 #define SLP_PATTERN(x) &x::create
 VectPatternDecl slp_patterns[]
 {