diff mbox series

[2/2] middle-end : Add support for complex pattern detection for Add, Mul, FMA and FMS

Message ID 20201114143905.GA24636@arm.com
State New
Headers show
Series [1/2] middle-end : Initial scaffolding and definitions for SLP patttern matches | expand

Commit Message

Tamar Christina Nov. 14, 2020, 2:39 p.m. UTC
Hi All,

This patch series adds support for SLP vectorization of complex instructions [1].

These instructions exist only in their vector forms and require you to recognize
two statements in parallel.  Complex operations usually require a permute due to
the fact that the real and imaginary numbers are stored intermixed but these vector
instructions expect this and no longer need the compiler to generate a permute.

The pass is able to insert additional permutes inside the SLP tree in order to
change the data flow such that the new instructions can be used.  These permutes
are then optimized out later during optimize_slp.

These instructions also support rotations along the Argand plane, as such the operands
have to be re-ordered to coincide with their load group.

For now, this patch only adds support for:

  * Complex Addition with rotation of 90 and 270.

  Addition with rotation of the second argument around the Argand plane.
    Supported rotations are 90 and 180.

    c = a + (b * I) and c = a + (b * I * I * I)

  * Complex Multiplication and Multiplication where one operand is conjucated.

  Complex multiplication and Conjucate Complex multiplication of the second
     parameter.

    c = a * b and c = a * conj (b)

  * Complex FMA and FMA where one operand is conjucated.
  * Complex FMS and FMS where one operand is conjucated.

  Complex FMLA, Conjucate FMLA of the second parameter and FMLS.

    c += a * b, c += a * conj (b), c -= a * b and c -= a * conj (b)

  For the conjucate cases it supports under fast-math that the operands that is
  being conjucated be flipped by flipping the arguments to the optab.  This
  allows it to support c = conj (a) * b and c += conj (a) * b.

  where a, b and c are complex numbers.
  
Complex dot-product is not currently supported in this patch set as build_slp fails
for it.  This will be provided as a future patch.
  
These are supported for both integer and floating point and as such these don't look
for real or imaginary pairs but instead rely on the early lowering of complex
numbers by GCC and canonization of the operations such that it just recognizes any
instruction sequence matching the operations requested.

To be safe when the it is not sure it can support the operation or if it finds
something it does not understand it backs off.  This is done by looking at the
order of the loads below the nodes.  Anything that changes the order like a
VEC_PERM is taken into account.  The order of the loads define which operation
is being done.   For instance a complex add with conjucate always has the second
loads in non-contiguous order.   Because this information is queried often it is
heavily cached and because we never modify nodes in place we can cache the
operation through all SLP instances.

When SLP failes (e.g. due to costing) we dissolve the changes and restore the
previous relevancies of the stmts that were in a pattern before.

[1] https://developer.arm.com/documentation/ddi0487/fc/

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master? (working on moving tests to middend IFN tests instead).

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-slp-patterns.c: New file.
	* Makefile.in: Use it.
	* doc/passes.texi: Document it.
	* internal-fn.def (COMPLEX_ADD_ROT90, COMPLEX_ADD_ROT270,
	COMPLEX_MUL, COMPLEX_MUL_CONJ, COMPLEX_FMA, COMPLEX_FMA_CONJ):
	COMPLEX_FMS, COMPLEX_FMS_CONJ): New.
	* optabs.def (cadd90_optab, cadd270_optab, cmul_optab, cmul_conj_optab,
	cmla_optab, cmla_conj_optab, cmls_optab, cmls_conj_optab): New.
	* doc/md.texi: Document them

-- inline copy of patch --



--

Comments

Richard Biener Nov. 16, 2020, 3:11 p.m. UTC | #1
On Sat, 14 Nov 2020, Tamar Christina wrote:

> Hi All,
> 
> This patch series adds support for SLP vectorization of complex instructions [1].
> 
> These instructions exist only in their vector forms and require you to recognize
> two statements in parallel.  Complex operations usually require a permute due to
> the fact that the real and imaginary numbers are stored intermixed but these vector
> instructions expect this and no longer need the compiler to generate a permute.
> 
> The pass is able to insert additional permutes inside the SLP tree in order to
> change the data flow such that the new instructions can be used.  These permutes
> are then optimized out later during optimize_slp.
> 
> These instructions also support rotations along the Argand plane, as such the operands
> have to be re-ordered to coincide with their load group.
> 
> For now, this patch only adds support for:
> 
>   * Complex Addition with rotation of 90 and 270.
> 
>   Addition with rotation of the second argument around the Argand plane.
>     Supported rotations are 90 and 180.
> 
>     c = a + (b * I) and c = a + (b * I * I * I)
> 
>   * Complex Multiplication and Multiplication where one operand is conjucated.
> 
>   Complex multiplication and Conjucate Complex multiplication of the second
>      parameter.
> 
>     c = a * b and c = a * conj (b)
> 
>   * Complex FMA and FMA where one operand is conjucated.
>   * Complex FMS and FMS where one operand is conjucated.
> 
>   Complex FMLA, Conjucate FMLA of the second parameter and FMLS.
> 
>     c += a * b, c += a * conj (b), c -= a * b and c -= a * conj (b)
> 
>   For the conjucate cases it supports under fast-math that the operands that is
>   being conjucated be flipped by flipping the arguments to the optab.  This
>   allows it to support c = conj (a) * b and c += conj (a) * b.
> 
>   where a, b and c are complex numbers.
>   
> Complex dot-product is not currently supported in this patch set as build_slp fails
> for it.  This will be provided as a future patch.
>   
> These are supported for both integer and floating point and as such these don't look
> for real or imaginary pairs but instead rely on the early lowering of complex
> numbers by GCC and canonization of the operations such that it just recognizes any
> instruction sequence matching the operations requested.
> 
> To be safe when the it is not sure it can support the operation or if it finds
> something it does not understand it backs off.  This is done by looking at the
> order of the loads below the nodes.  Anything that changes the order like a
> VEC_PERM is taken into account.  The order of the loads define which operation
> is being done.   For instance a complex add with conjucate always has the second
> loads in non-contiguous order.   Because this information is queried often it is
> heavily cached and because we never modify nodes in place we can cache the
> operation through all SLP instances.
> 
> When SLP failes (e.g. due to costing) we dissolve the changes and restore the
> previous relevancies of the stmts that were in a pattern before.
> 
> [1] https://developer.arm.com/documentation/ddi0487/fc/
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master? (working on moving tests to middend IFN tests instead).
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree-vect-slp-patterns.c: New file.
> 	* Makefile.in: Use it.
> 	* doc/passes.texi: Document it.
> 	* internal-fn.def (COMPLEX_ADD_ROT90, COMPLEX_ADD_ROT270,
> 	COMPLEX_MUL, COMPLEX_MUL_CONJ, COMPLEX_FMA, COMPLEX_FMA_CONJ):
> 	COMPLEX_FMS, COMPLEX_FMS_CONJ): New.
> 	* optabs.def (cadd90_optab, cadd270_optab, cmul_optab, cmul_conj_optab,
> 	cmla_optab, cmla_conj_optab, cmls_optab, cmls_conj_optab): New.
> 	* doc/md.texi: Document them
> 
> -- inline copy of patch --
> 
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 273654cfa2525099ed0d9adc2f9085c8408b096f..4b556bdbbc2f989d894ef2dbe894318858eaa7aa 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1646,6 +1646,7 @@ OBJS = \
>  	tree-vect-loop.o \
>  	tree-vect-loop-manip.o \
>  	tree-vect-slp.o \
> +	tree-vect-slp-patterns.o \
>  	tree-vectorizer.o \
>  	tree-vector-builder.o \
>  	tree-vrp.o \
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 813875b973bc73920f1195c4897ce3491ca713c2..68792e4c3eeca2bce0112fc28eba38700e66dc00 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -1493,7 +1493,7 @@ can be described by a series of letters for each operand.  The overall
>  constraint for an operand is made from the letters for this operand
>  from the first alternative, a comma, the letters for this operand from
>  the second alternative, a comma, and so on until the last alternative.
> -All operands for a single instruction must have the same number of 
> +All operands for a single instruction must have the same number of
>  alternatives.
>  @ifset INTERNALS
>  Here is how it is done for fullword logical-or on the 68000:
> @@ -1558,17 +1558,17 @@ the first, 1 for the second alternative, etc.).  @xref{Output Statement}.
>  @end ifset
>  @ifclear INTERNALS
>  
> -So the first alternative for the 68000's logical-or could be written as 
> -@code{"+m" (output) : "ir" (input)}.  The second could be @code{"+r" 
> -(output): "irm" (input)}.  However, the fact that two memory locations 
> -cannot be used in a single instruction prevents simply using @code{"+rm" 
> -(output) : "irm" (input)}.  Using multi-alternatives, this might be 
> +So the first alternative for the 68000's logical-or could be written as
> +@code{"+m" (output) : "ir" (input)}.  The second could be @code{"+r"
> +(output): "irm" (input)}.  However, the fact that two memory locations
> +cannot be used in a single instruction prevents simply using @code{"+rm"
> +(output) : "irm" (input)}.  Using multi-alternatives, this might be
>  written as @code{"+m,r" (output) : "ir,irm" (input)}.  This describes
> -all the available alternatives to the compiler, allowing it to choose 
> +all the available alternatives to the compiler, allowing it to choose
>  the most efficient one for the current conditions.
>  
> -There is no way within the template to determine which alternative was 
> -chosen.  However you may be able to wrap your @code{asm} statements with 
> +There is no way within the template to determine which alternative was
> +chosen.  However you may be able to wrap your @code{asm} statements with
>  builtins such as @code{__builtin_constant_p} to achieve the desired results.
>  @end ifclear
>  
> @@ -3062,7 +3062,7 @@ instruction taking only the upper 16-bits of a
>  16-bits being 0.
>  
>  @item L
> -Integer that is valid as an immediate operand for a 
> +Integer that is valid as an immediate operand for a
>  shift instruction. Range 0 to 31.
>  
>  @item M
> @@ -3076,7 +3076,7 @@ Integer that is valid as an immediate operand for
>  a custom instruction opcode. Range 0 to 255.
>  
>  @item P
> -An immediate operand for R2 andchi/andci instructions. 
> +An immediate operand for R2 andchi/andci instructions.
>  
>  @item S
>  Matches immediates which are addresses in the small
> @@ -6132,6 +6132,97 @@ floating-point mode.
>  
>  This pattern is not allowed to @code{FAIL}.
>  
> +@cindex @code{cadd@var{m}@var{n}3} instruction pattern
> +@item @samp{cadd@var{m}@var{n}3}
> +Perform a vector addition of complex numbers in operand 1 with operand 2
> +rotated by @var{m} degrees around the argand plane and storing the result in
> +operand 0.  The instruction must perform the operation on data that in GCC lane
> +order where the 0th lane holds the real part and the the imaginary part in lane
> +1.

So I'm not sure I'm happy with defining this in terms of 'complex 
numbers'.  At least we nowhere specify how complex numbers are laid out in
vector registers.  For V4DF it could be any of {real0,imag0,real1,imag1},
{real0,real1, imag0,imag1}, {imag0,real0,imag1,real1} or
{imag0,imag1, real0,real1}.  While it's easy to give names to semantics
this way I'd prefer to define this in terms of even/odd lanes of a
vector.  Maybe it can refer to the aarch64 example of complex number
instructions implementing such scheme.

So at _least_ we have to define the layout of complex components
in the vector modes (preferably in a target independent way), say
even lanes are real parts and odd lanes are imaginary parts?

> +
> +The operation is only supported for vector modes @var{n} and with
> +rotations @var{m} of 90 or 270.
> +
> +This pattern is not allowed to @code{FAIL}.
> +
> +@cindex @code{cmla@var{m}4} instruction pattern
> +@item @samp{cmla@var{m}4}
> +Perform a vector floating point multiply and accumulate of complex numbers
> +in operand 0, operand 1 and operand 2.
> +
> +The instruction must perform the operation on data that in GCC lane
> +order where the 0th lane holds the real part and the the imaginary part in lane
> +1.
> +
> +The operation is only supported for vector modes @var{m}.
> +
> +This pattern is not allowed to @code{FAIL}.
> +
> +@cindex @code{cmla_conj@var{m}4} instruction pattern
> +@item @samp{cmla_conj@var{m}4}
> +Perform a vector floating point multiply and accumulate of complex numbers
> +in operand 0, operand 1 and the conjucate of operand 2.
> +
> +The instruction must perform the operation on data that in GCC lane
> +order where the 0th lane holds the real part and the the imaginary part in lane
> +1.
> +
> +The operation is only supported for vector modes @var{m}.
> +
> +This pattern is not allowed to @code{FAIL}.
> +
> +@cindex @code{cmls@var{m}4} instruction pattern
> +@item @samp{cmls@var{m}4}
> +Perform a vector floating point multiply and subtract of complex numbers
> +in operand 0, operand 1 and operand 2.
> +
> +The instruction must perform the operation on data that in GCC lane
> +order where the 0th lane holds the real part and the the imaginary part in lane
> +1.
> +
> +The operation is only supported for vector modes @var{m}.
> +
> +This pattern is not allowed to @code{FAIL}.
> +
> +@cindex @code{cmls_conj@var{m}4} instruction pattern
> +@item @samp{cmls_conj@var{m}4}
> +Perform a vector floating point multiply and subtract of complex numbers
> +in operand 0, operand 1 and the conjucate of operand 2.
> +
> +The instruction must perform the operation on data that in GCC lane
> +order where the 0th lane holds the real part and the the imaginary part in lane
> +1.
> +
> +The operation is only supported for vector modes @var{m}.
> +
> +This pattern is not allowed to @code{FAIL}.
> +
> +@cindex @code{cmul@var{m}4} instruction pattern
> +@item @samp{cmul@var{m}4}
> +Perform a vector floating point multiplication of complex numbers in operand 0
> +and operand 1.
> +
> +The instruction must perform the operation on data that in GCC lane
> +order where the 0th lane holds the real part and the the imaginary part in lane
> +1.
> +
> +The operation is only supported for vector modes @var{m}.
> +
> +This pattern is not allowed to @code{FAIL}.
> +
> +@cindex @code{cmul_conj@var{m}4} instruction pattern
> +@item @samp{cmul_conj@var{m}4}
> +Perform a vector floating point multiplication of complex numbers in operand 0
> +and the conjucate of operand 1.
> +
> +The instruction must perform the operation on data that in GCC lane
> +order where the 0th lane holds the real part and the the imaginary part in lane
> +1.
> +
> +The operation is only supported for vector modes @var{m}.
> +
> +This pattern is not allowed to @code{FAIL}.
> +
>  @cindex @code{ffs@var{m}2} instruction pattern
>  @item @samp{ffs@var{m}2}
>  Store into operand 0 one plus the index of the least significant 1-bit
> @@ -7409,7 +7500,7 @@ If this pattern is not defined, then a @code{memory_barrier} pattern
>  will be emitted, followed by a store of the value to the memory operand.
>  
>  @cindex @code{atomic_compare_and_swap@var{mode}} instruction pattern
> -@item @samp{atomic_compare_and_swap@var{mode}} 
> +@item @samp{atomic_compare_and_swap@var{mode}}
>  This pattern, if defined, emits code for an atomic compare-and-swap
>  operation with memory model semantics.  Operand 2 is the memory on which
>  the atomic operation is performed.  Operand 0 is an output operand which
> @@ -7424,7 +7515,7 @@ if the operation fails.
>  
>  If memory referred to in operand 2 contains the value in operand 3, then
>  operand 4 is stored in memory pointed to by operand 2 and fencing based on
> -the memory model in operand 6 is issued.  
> +the memory model in operand 6 is issued.
>  
>  If memory referred to in operand 2 does not contain the value in operand 3,
>  then fencing based on the memory model in operand 7 is issued.
> @@ -7500,9 +7591,9 @@ none of these are available a compare-and-swap loop will be used.
>  @itemx @samp{atomic_fetch_or@var{mode}}, @samp{atomic_fetch_and@var{mode}}
>  @itemx @samp{atomic_fetch_xor@var{mode}}, @samp{atomic_fetch_nand@var{mode}}
>  These patterns emit code for an atomic operation on memory with memory
> -model semantics, and return the original value. Operand 0 is an output 
> -operand which contains the value of the memory location before the 
> -operation was performed.  Operand 1 is the memory on which the atomic 
> +model semantics, and return the original value. Operand 0 is an output
> +operand which contains the value of the memory location before the
> +operation was performed.  Operand 1 is the memory on which the atomic
>  operation is performed.  Operand 2 is the second operand to the binary
>  operator.  Operand 3 is the memory model to be used by the operation.
>  
> @@ -7859,7 +7950,7 @@ simplified) from the PDP-11 target:
>          (plus:HI (match_dup 0)
>                (const_int -1)))]
>    ""
> -  
> +
>    @{
>      if (which_alternative == 0)
>        return "sob %0,%l1";
> diff --git a/gcc/doc/passes.texi b/gcc/doc/passes.texi
> index a5ae4143a8c1293e674b499120372ee5fe5c412b..c86df5cd843084a5b7933ef99a23386891a7b0c1 100644
> --- a/gcc/doc/passes.texi
> +++ b/gcc/doc/passes.texi
> @@ -709,7 +709,8 @@ loop.
>  The pass is implemented in @file{tree-vectorizer.c} (the main driver),
>  @file{tree-vect-loop.c} and @file{tree-vect-loop-manip.c} (loop specific parts
>  and general loop utilities), @file{tree-vect-slp} (loop-aware SLP
> -functionality), @file{tree-vect-stmts.c} and @file{tree-vect-data-refs.c}.
> +functionality), @file{tree-vect-stmts.c}, @file{tree-vect-data-refs.c} and
> +@file{tree-vect-slp-patterns.c} containing the SLP pattern matcher.
>  Analysis of data references is in @file{tree-data-ref.c}.
>  
>  SLP Vectorization.  This pass performs vectorization of straight-line code. The
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index 310d37aa53819791b5df1683afca831f08e5892a..acb7d9f3bdc757437d5492a652144ba31c2ef702 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -277,12 +277,21 @@ DEF_INTERNAL_FLT_FN (SCALB, ECF_CONST, scalb, binary)
>  DEF_INTERNAL_FLT_FLOATN_FN (FMIN, ECF_CONST, fmin, binary)
>  DEF_INTERNAL_FLT_FLOATN_FN (FMAX, ECF_CONST, fmax, binary)
>  DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONST, xorsign, binary)
> +DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT90, ECF_CONST, cadd90, binary)
> +DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary)
> +DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL, ECF_CONST, cmul, binary)
> +DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL_CONJ, ECF_CONST, cmul_conj, binary)
> +
>  
>  /* FP scales.  */
>  DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
>  
>  /* Ternary math functions.  */
>  DEF_INTERNAL_FLT_FLOATN_FN (FMA, ECF_CONST, fma, ternary)
> +DEF_INTERNAL_OPTAB_FN (COMPLEX_FMA, ECF_CONST, cmla, ternary)
> +DEF_INTERNAL_OPTAB_FN (COMPLEX_FMA_CONJ, ECF_CONST, cmla_conj, ternary)
> +DEF_INTERNAL_OPTAB_FN (COMPLEX_FMS, ECF_CONST, cmls, ternary)
> +DEF_INTERNAL_OPTAB_FN (COMPLEX_FMS_CONJ, ECF_CONST, cmls_conj, ternary)
>  
>  /* Unary integer ops.  */
>  DEF_INTERNAL_INT_FN (CLRSB, ECF_CONST | ECF_NOTHROW, clrsb, unary)
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 78409aa14537d259bf90277751aac00d452a0d3f..19db9c00896cd08adfd20a01669990bbbebd79f1 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -290,6 +290,14 @@ OPTAB_D (atan_optab, "atan$a2")
>  OPTAB_D (atanh_optab, "atanh$a2")
>  OPTAB_D (copysign_optab, "copysign$F$a3")
>  OPTAB_D (xorsign_optab, "xorsign$F$a3")
> +OPTAB_D (cadd90_optab, "cadd90$a3")
> +OPTAB_D (cadd270_optab, "cadd270$a3")
> +OPTAB_D (cmul_optab, "cmul$a3")
> +OPTAB_D (cmul_conj_optab, "cmul_conj$a3")
> +OPTAB_D (cmla_optab, "cmla$a4")
> +OPTAB_D (cmla_conj_optab, "cmla_conj$a4")
> +OPTAB_D (cmls_optab, "cmls$a4")
> +OPTAB_D (cmls_conj_optab, "cmls_conj$a4")
>  OPTAB_D (cos_optab, "cos$a2")
>  OPTAB_D (cosh_optab, "cosh$a2")
>  OPTAB_D (exp10_optab, "exp10$a2")
> diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..7a2c64aeb9f212cd3ee18adf110e6ac6696249fd
> --- /dev/null
> +++ b/gcc/tree-vect-slp-patterns.c
> @@ -0,0 +1,1844 @@
> +/* SLP - Pattern matcher on SLP trees
> +   Copyright (C) 2020 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "target.h"
> +#include "rtl.h"
> +#include "tree.h"
> +#include "gimple.h"
> +#include "tree-pass.h"
> +#include "ssa.h"
> +#include "optabs-tree.h"
> +#include "insn-config.h"
> +#include "recog.h"		/* FIXME: for insn_data */
> +#include "fold-const.h"
> +#include "stor-layout.h"
> +#include "gimple-iterator.h"
> +#include "cfgloop.h"
> +#include "tree-vectorizer.h"
> +#include "langhooks.h"
> +#include "gimple-walk.h"
> +#include "dbgcnt.h"
> +#include "tree-vector-builder.h"
> +#include "vec-perm-indices.h"
> +#include "gimple-fold.h"
> +#include "internal-fn.h"
> +
> +/* SLP Pattern matching mechanism.
> +
> +  This extension to the SLP vectorizer allows one to transform the generated SLP
> +  tree based on any pattern.  The difference between this and the normal vect
> +  pattern matcher is that unlike the former, this matcher allows you to match
> +  with instructions that do not belong to the same SSA dominator graph.
> +
> +  The only requirement that this pattern matcher has is that you are only
> +  only allowed to either match an entire group or none.
> +
> +  The pattern matcher currently only allows you to perform replacements to
> +  internal functions.
> +
> +  Once the patterns are matched it is one way, these cannot be undone.  It is
> +  currently not supported to match patterns recursively.
> +
> +  To add a new pattern, implement the vect_pattern class and add the type to
> +  slp_patterns.
> +
> +*/
> +
> +/*******************************************************************************
> + * vect_pattern class
> + ******************************************************************************/
> +
> +/* Default implementation of recognize that peforms matching, validation and
> +   replacement of nodes but that can be overriden if required.  */
> +
> +bool
> +vect_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
> +			 vec_info *vinfo)
> +{
> +  if (this->matches (perm_cache))
> +    {
> +      tree vectype = SLP_TREE_VECTYPE (*this->m_node);
> +      if (dump_enabled_p ())
> +	dump_printf_loc (MSG_NOTE, vect_location,
> +			 "Found %s pattern in SLP tree\n",
> +			 this->get_name ());
> +
> +      if (this->is_optab_supported_p (vectype, OPTIMIZE_FOR_SPEED))
> +	{
> +	  if (dump_enabled_p ())
> +	    dump_printf_loc (MSG_NOTE, vect_location,
> +			     "Target supports %s vectorization with mode %T\n",
> +			     internal_fn_name (this->get_ifn ()), vectype);
> +	}
> +      else
> +	{
> +	  if (dump_enabled_p ())
> +	    {
> +	      if (!vectype)
> +		dump_printf_loc (MSG_NOTE, vect_location,
> +				 "Target does not support vector type for %T\n",
> +				 SLP_TREE_DEF_TYPE (*this->m_node));
> +	      else
> +		dump_printf_loc (MSG_NOTE, vect_location,
> +				 "Target does not support %s for vector type "
> +				 "%T\n", internal_fn_name (this->get_ifn ()),
> +				 vectype);
> +	    }
> +          return false;
> +	}
> +    }
> +  else
> +    return false;
> +
> +  if (!this->validate_p (perm_cache))
> +    {
> +      if (dump_enabled_p ())
> +	dump_printf_loc (MSG_NOTE, vect_location,
> +			"transformation for %s not valid due to post "
> +			"condition\n",
> +			internal_fn_name (this->get_ifn ()));
> +      return false;
> +    }
> +
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location, "Creating vec patterns\n");
> +
> +  gcall* call_stmt = this->build (vinfo);
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location, "\t %p stmt: %G",
> +		     *this->m_node, call_stmt);
> +
> +  return true;
> +}
> +
> +/*******************************************************************************
> + * General helper types
> + ******************************************************************************/
> +
> +/* The COMPLEX_OPERATION enum denotes the possible pair of operations that can
> +   be matched when looking for expressions that we are interested matching for
> +   complex numbers addition and mla.  */
> +
> +typedef enum _complex_operation : unsigned {
> +  PLUS_PLUS,
> +  MINUS_PLUS,
> +  PLUS_MINUS,
> +  MULT_MULT,
> +  CMPLX_NONE
> +} complex_operation_t;
> +
> +/* A simple pair type used for ordering of combine operations.  */
> +
> +typedef struct map_ { int a; int b; } map_t;
> +
> +/*******************************************************************************
> + * General helper functions
> + ******************************************************************************/
> +
> +/* Helper function of linear_loads_p that checks to see if the load permutation
> +   is sequential and in monotonically increasing order of loads with no gaps.
> +*/
> +
> +static inline bool
> +is_linear_load_p (load_permutation_t loads)
> +{
> +  if (loads.length() == 0)
> +    return false;
> +
> +  unsigned leader = loads[0];
> +  unsigned load, i;
> +  FOR_EACH_VEC_ELT_FROM (loads, i, load, 1)
> +    if (load != ++leader)
> +      return false;
> +  return true;
> +}
> +
> +
> +/* Check to see if all loads rooted in ROOT are linear.  Linearity is
> +   defined as having no gaps between values loaded.  */
> +
> +static load_permutation_t
> +linear_loads_p (slp_tree_to_load_perm_map_t *perm_cache, slp_tree root,
> +		bool *linear)
> +{
> +  *linear = false;
> +  if (!root)
> +    return vNULL;
> +
> +  unsigned i;
> +  load_permutation_t loads = vNULL;
> +  load_permutation_t *tmp;
> +
> +  if ((tmp = perm_cache->get (root)) != NULL)
> +    {
> +      *linear = is_linear_load_p (*tmp);
> +      return *tmp;
> +    }
> +
> +  perm_cache->put (root, vNULL);
> +
> +  /* If it's a load node, then just read the load permute.  */
> +  if (SLP_TREE_LOAD_PERMUTATION (root).exists ())
> +    {
> +      loads = SLP_TREE_LOAD_PERMUTATION (root);
> +      perm_cache->put (root, loads);
> +      if (!is_linear_load_p (loads))
> +	return loads;
> +    }
> +  else if (SLP_TREE_DEF_TYPE (root) == vect_external_def)
> +    {
> +       loads.create (SLP_TREE_LANES (root));
> +       tree op;
> +       FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_OPS (root), i, op)
> +	 {
> +	   gimple *defstmt = SSA_NAME_DEF_STMT (op);
> +	   if (!is_gimple_assign (defstmt))
> +	     return vNULL;
> +
> +	   switch (gimple_assign_rhs_code (defstmt))
> +	   {
> +	     case IMAGPART_EXPR:
> +	       loads.safe_push (1);
> +	       break;
> +	     case REALPART_EXPR:
> +	       loads.safe_push (0);
> +	       break;
> +	     default:
> +	       {
> +		 loads.release ();
> +		 return vNULL;
> +	       }
> +	   }
> +	 }
> +
> +       perm_cache->put (root, loads);
> +       if (!is_linear_load_p (loads))
> +	 return loads;
> +    }
> +  else if (SLP_TREE_DEF_TYPE (root) != vect_internal_def)
> +    return vNULL;
> +
> +
> +
> +  auto_vec<load_permutation_t> all_loads;
> +  bool is_perm = SLP_TREE_LANE_PERMUTATION (root).exists ();
> +
> +  slp_tree child;
> +  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (root), i, child)
> +    {
> +      /* Seed it with a value to prevent infinite recursions.  */
> +      loads = linear_loads_p (perm_cache, child, linear);
> +      if ((!*linear && !is_perm) || !loads.exists ())
> +	return loads;
> +
> +      all_loads.safe_push (loads);
> +    }
> +
> +  if (is_perm)
> +    {
> +      lane_permutation_t perm = SLP_TREE_LANE_PERMUTATION (root);
> +      load_permutation_t nloads;
> +      nloads.create (SLP_TREE_LANES (root));
> +      nloads.quick_grow (SLP_TREE_LANES (root));
> +      for (i = 0; i < SLP_TREE_LANES (root); i++)
> +	nloads[i] = all_loads[perm[i].first][perm[i].second];
> +
> +      perm_cache->put (root, nloads);
> +      if (!is_linear_load_p (nloads))
> +	return nloads;
> +      loads = nloads;
> +    }
> +
> +  perm_cache->put (root, loads);
> +  *linear = true;
> +  return loads;
> +}
> +
> +/* Builds a permutation group from the operands in OPS and stores it in BLOCKS.
> +   The group describes how to combine the operators to get a valid linear node.
> +
> +   This is used when combining multiple children from a two_operators node into
> +   one using a lane permute to select the appropriate lane. As an example the
> +   permute { [0 0] [1 4] [2 2] [3 3] [1 4] [5 5] } says the nodes which occur
> +   twice in a group, e.g [0 0] only needs itself to possibly be made linear
> +   whereas [1 4] means to combine the nodes 1 and 4. */
> +
> +static void
> +vect_build_perm_groups (slp_tree_to_load_perm_map_t *perm_cache,
> +			map_t *blocks, vec<slp_tree> ops)
> +{
> +  slp_tree op;
> +  unsigned i;
> +  bool is_linear = false;
> +  unsigned min_eq = -1, max_eq = 0;
> +  unsigned min_idx = 0, max_idx = 0;
> +  FOR_EACH_VEC_ELT (ops, i, op)
> +    {
> +      load_permutation_t perms = linear_loads_p (perm_cache, op, &is_linear);
> +      unsigned x, imin = -1, imax = 0;
> +      for (x = 0; x < perms.length () && !is_linear; x++)
> +	{
> +	  imin = MIN (imin, perms[x]);
> +	  imax = MAX (imax, perms[x]);
> +	}
> +
> +	if (imin != imax || perms.length () == 0 || is_linear)
> +	  blocks[i] = {i, i};
> +	else
> +	  {
> +	    if (imin <= min_eq)
> +	      {
> +		min_eq = imin;
> +		min_idx = i;
> +	      }
> +
> +	    if (imin >= max_eq)
> +	      {
> +		max_eq = imin;
> +		max_idx = i;
> +	      }
> +	  }
> +    }
> +
> +  /* Now fill in the gap.  */
> +  blocks[min_idx] = {min_idx, max_idx};
> +  blocks[max_idx] = {min_idx, max_idx};
> +
> +  if (dump_enabled_p ())
> +    {
> +      dump_printf_loc (MSG_NOTE, vect_location, "pattern group: { ");
> +      for (i = 0; i < ops.length (); i++)
> +	dump_printf (MSG_NOTE,"[%d %d] ", blocks[i].a, blocks[i].b);
> +      dump_printf (MSG_NOTE,"}\n");
> +    }
> +
> +}
> +
> +/* This function attempts to make a node rooted in NODE with parent PARENT
> +   linear.  If the node if already linear than the node itself is returned
> +   in RESULT.
> +
> +   If the node is not linear then a new VEC_PERM_EXPR node is created with a
> +   lane permute that when applied will make the node linear.   If such a
> +   permute cannot be created then FALSE is returned from the function.
> +
> +   Here linearity is defined as having a sequential, monotically increasing
> +   load position inside the load permute generated by the loads reachable from
> +   NODE.  */
> +
> +static bool
> +vect_slp_make_linear (slp_tree_to_load_perm_map_t *perm_cache,
> +		      slp_tree parent, slp_tree node, slp_tree *result)
> +{
> +  bool is_linear = false;
> +  unsigned x, val;
> +  load_permutation_t load_perm = linear_loads_p (perm_cache, node, &is_linear);
> +  if (is_linear)
> +    {
> +      *result = node;
> +      return true;
> +    }
> +
> +  /* Attempt to linearise the permute.  */
> +  vec<std::pair<unsigned, unsigned> > zipped;
> +  zipped.create (load_perm.length ());
> +  FOR_EACH_VEC_ELT (load_perm, x, val)
> +    zipped.quick_push (std::make_pair (val, x));
> +
> +  typedef const std::pair<unsigned, unsigned>* cmp_t;
> +  zipped.qsort ([](const void *a, const void *b) -> int
> +    { return (int)((cmp_t)a)->first - (int)((cmp_t)b)->first; });
> +
> +  /* Verify if we have a linear permute sequence.  */
> +  if (zipped.length () > 0)
> +    {
> +      unsigned leader = zipped[0].first;
> +      for (x = 1; x < zipped.length (); x++)
> +	if(!(is_linear = (zipped[x].first == ++leader)))
> +	  break;
> +    }
> +
> +  if (!is_linear)
> +    {
> +      if (dump_enabled_p ())
> +	dump_printf_loc (MSG_NOTE, vect_location,
> +			"Loads could not be made linear %p\n",
> +			node);
> +      zipped.release ();
> +      return false;
> +  }
> +
> +  for (x = 0; x < zipped.length (); x++)
> +    zipped[x].first = 0;
> +
> +  /* Create the new permute node and store it instead.  */
> +  slp_tree vnode = vect_create_new_slp_node (vNULL, 1);
> +  SLP_TREE_CODE (vnode) = VEC_PERM_EXPR;
> +  SLP_TREE_LANE_PERMUTATION (vnode) = zipped;
> +  SLP_TREE_VECTYPE (vnode) = SLP_TREE_VECTYPE (parent);
> +  SLP_TREE_CHILDREN (vnode).quick_push (node);
> +  SLP_TREE_REF_COUNT (vnode) = 1;
> +  SLP_TREE_LANES (vnode) = SLP_TREE_LANES (node);
> +  SLP_TREE_REPRESENTATIVE (vnode) = SLP_TREE_REPRESENTATIVE (parent);
> +  *result = vnode;
> +  return is_linear;
> +}
> +
> +/* Helper utility to check to see if the permutation PERM is one that can be
> +   used in a node combination operation.  This is defined as the permute not
> +   having all the elements being the same. e.g [0 0].  */
> +
> +static inline bool
> +vect_can_combine_node_p (load_permutation_t perm, bool is_linear)
> +{
> +  if (is_linear)
> +    return false;
> +
> +  unsigned i, x;
> +  FOR_EACH_VEC_ELT (perm, i, x)
> +    if (perm[0] != x)
> +      return false;
> +
> +  return true;
> +}
> +
> +/* This function combines the nodes in MAP together to make a new node using a
> +   lane permute.  The nodes to combine are stored in ENTRIES and the resulting
> +   node is returned in RESULT.
> +
> +   If the nodes are already linear then this function fails and returns FALSE.
> +   Otherwise it returns the new node and TRUE.  */
> +
> +static bool
> +vect_slp_make_combine_linear (slp_tree_to_load_perm_map_t *perm_cache,
> +			      slp_tree parent, vec<slp_tree> entries, map_t map,
> +			      slp_tree *result)
> +{
> +  if (map.a == map.b)
> +    return false;
> +
> +  slp_tree node_a = entries[map.a];
> +  slp_tree node_b = entries[map.b];
> +
> +  bool is_a_linear = false;
> +  bool is_b_linear = false;
> +
> +  load_permutation_t load_perm_a
> +    = linear_loads_p (perm_cache, node_a, &is_a_linear);
> +  if (!vect_can_combine_node_p (load_perm_a, is_a_linear))
> +    return false;
> +
> +  load_permutation_t load_perm_b
> +    = linear_loads_p (perm_cache, node_b, &is_b_linear);
> +  if (!vect_can_combine_node_p (load_perm_b, is_b_linear))
> +    return false;
> +
> +  if (!load_perm_a.exists () || !load_perm_b.exists ())
> +    return false;
> +
> +  /* Now we need to figure which node is first.  */
> +  auto_vec<slp_tree> nodes;
> +  nodes.create (2);
> +  vec<std::pair<unsigned, unsigned> > perm;
> +  perm.create (2);
> +  if (load_perm_a[0] < load_perm_b[0])
> +    {
> +      perm.quick_push (std::make_pair (0, 0));
> +      perm.quick_push (std::make_pair (1, 0));
> +    }
> +  else
> +    {
> +      perm.quick_push (std::make_pair (1, 0));
> +      perm.quick_push (std::make_pair (0, 0));
> +    }
> +
> +  nodes.quick_push (node_a);
> +  nodes.quick_push (node_b);
> +
> +  slp_tree vnode = vect_create_new_slp_node (vNULL, 1);
> +  SLP_TREE_CODE (vnode) = VEC_PERM_EXPR;
> +  SLP_TREE_LANE_PERMUTATION (vnode) = perm;
> +  SLP_TREE_VECTYPE (vnode) = SLP_TREE_VECTYPE (parent);
> +  SLP_TREE_CHILDREN (vnode).safe_splice (nodes);
> +  SLP_TREE_REF_COUNT (vnode) = 1;
> +  SLP_TREE_LANES (vnode) = SLP_TREE_LANES (parent);
> +  SLP_TREE_REPRESENTATIVE (vnode) = SLP_TREE_REPRESENTATIVE (parent);
> +  *result = vnode;
> +  return true;
> +}
> +
> +/* Checks to see of the expression represented by NODE is a gimple assign with
> +   code CODE.  */
> +
> +static inline bool
> +vect_match_expression_p (slp_tree node, tree_code code)
> +{
> +  if (!node
> +      || !SLP_TREE_REPRESENTATIVE (node))
> +    return false;
> +
> +  gimple* expr = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (node));
> +  if (!is_gimple_assign (expr)
> +      || gimple_assign_rhs_code (expr) != code)
> +    return false;
> +
> +  return true;
> +}
> +
> +/* Checks to see if the expression represented by NODE is a call to the internal
> +   function FN.  */
> +
> +static inline bool
> +vect_match_call_p (slp_tree node, internal_fn fn)
> +{
> +  if (!node
> +      || !SLP_TREE_REPRESENTATIVE (node))
> +    return false;
> +
> +  gimple* expr = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (node));
> +  if (!gimple_call_internal_p (expr, fn))
> +    return false;
> +
> +   return true;
> +}
> +
> +/* Checks to see if NODE is a two_operands node whom's operands are calls to the
> +   internal function FN.  */
> +
> +static bool
> +vect_match_two_call_p (slp_tree node, internal_fn fn)
> +{
> +  if (!SLP_TREE_REPRESENTATIVE (node)
> +      || SLP_TREE_CODE (node) != VEC_PERM_EXPR
> +      || SLP_TREE_CHILDREN (node).length () != 2)
> +    return false;
> +
> +  vec<slp_tree> children = SLP_TREE_CHILDREN (node);
> +
> +  return vect_match_call_p (children[0], fn)
> +	 && vect_match_call_p (children[1], fn);
> +}
> +
> +/* Check if the given lane permute in PERMUTES matches an alternating sequence
> +   of {P0 P1 P0 P1 ...}.  This to account for unrolled loops.  Further mode
> +   there resulting permute must be linear.   */
> +
> +static bool
> +vect_check_lane_permute (lane_permutation_t &permutes,
> +			 unsigned p0, unsigned p1)
> +{
> +  if (permutes.length () == 0)
> +    return false;
> +
> +  unsigned val[2] = {p0, p1};
> +  unsigned seed = permutes[0].second;
> +  for (unsigned i = 0; i < permutes.length (); i++)
> +    if (permutes[i].first != val[i % 2]
> +	|| permutes[i].second != seed++)
> +      return false;
> +
> +  return true;
> +}
> +
> +/* This function will match the two gimple expressions representing NODE1 and
> +   NODE2 in parallel and returns the pair operation that represents the two
> +   expressions in the two statements.
> +
> +   If match is successful then the corresponding complex_operation is
> +   returned and the arguments to the two matched operations are returned in OPS.
> +
> +   If TWO_OPERANDS it is expected that the LANES of the parent VEC_PERM select
> +   from the two nodes alternatingly.
> +
> +   If unsuccessful then CMPLX_NONE is returned and OPS is untouched.
> +
> +   e.g. the following gimple statements
> +
> +   stmt 0 _39 = _37 + _12;
> +   stmt 1 _6 = _38 - _36;
> +
> +   will return PLUS_MINUS along with OPS containing {_37, _12, _38, _36}.
> +*/
> +
> +static complex_operation_t
> +vect_detect_pair_op (slp_tree node1, slp_tree node2, lane_permutation_t &lanes,
> +		     bool two_operands = true, vec<slp_tree> *ops = NULL)
> +{
> +  complex_operation_t result = CMPLX_NONE;
> +
> +  if (vect_match_expression_p (node1, MINUS_EXPR)
> +      && vect_match_expression_p (node2, PLUS_EXPR)
> +      && (!two_operands || vect_check_lane_permute (lanes, 0, 1)))
> +    result = MINUS_PLUS;
> +  else if (vect_match_expression_p (node1, PLUS_EXPR)
> +	   && vect_match_expression_p (node2, MINUS_EXPR)
> +	   && (!two_operands || vect_check_lane_permute (lanes, 0, 1)))
> +    result = PLUS_MINUS;
> +  else if (vect_match_expression_p (node1, PLUS_EXPR)
> +	   && vect_match_expression_p (node2, PLUS_EXPR))
> +    result = PLUS_PLUS;
> +  else if (vect_match_expression_p (node1, MULT_EXPR)
> +	   && vect_match_expression_p (node2, MULT_EXPR))
> +    result = MULT_MULT;
> +
> +  if (result != CMPLX_NONE && ops != NULL)
> +    {
> +      ops->create (2);
> +      ops->quick_push (node1);
> +      ops->quick_push (node2);
> +    }
> +  return result;
> +}
> +
> +/* Overload of vect_detect_pair_op that matches against the representative
> +   statements in the children of NODE.  It is expected that NODE has exactly
> +   two children and when TWO_OPERANDS then NODE must be a VEC_PERM.  */
> +
> +static complex_operation_t
> +vect_detect_pair_op (slp_tree node, bool two_operands = true,
> +		     vec<slp_tree> *ops = NULL)
> +{
> +  if (!two_operands && SLP_TREE_CODE (node) == VEC_PERM_EXPR)
> +    return CMPLX_NONE;
> +
> +  if (SLP_TREE_CHILDREN (node).length () != 2)
> +    return CMPLX_NONE;
> +
> +  vec<slp_tree> children = SLP_TREE_CHILDREN (node);
> +  lane_permutation_t &lanes = SLP_TREE_LANE_PERMUTATION (node);
> +
> +  return vect_detect_pair_op (children[0], children[1], lanes, two_operands,
> +			      ops);
> +}
> +
> +/* This function marks every statement that is being replaced during the
> +   the pattern matching as PURE.  Normally when replacing a statement due
> +   to a pattern we add the statement to the STMT_VINFO_PATTERN_DEF_SEQ of
> +   the pattern that is replacing them.  In this case however this won't
> +   work as when doing the replacement we are changing the nodes that are
> +   used by the statements.  This means that when vectorized the SSA chain
> +   is different than in the BB.
> +
> +   Declaring the statements as part of the sequence will then cause SSA
> +   verification to fail as we may refer to statements that were not in the
> +   original USE-DEF chain of the statement we are replacing.
> +
> +   The relevancy of the statements are saved and are later restored if SLP is to
> +   be dissolved.  */
> +
> +static void
> +vect_mark_stmts_as_in_pattern (hash_set<slp_tree> *cache, slp_tree node)
> +{
> +  if (cache->add (node))
> +    return;
> +
> +  unsigned i;
> +  stmt_vec_info stmt_info;
> +  slp_tree child;
> +
> +  FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info)
> +    {
> +      if (gimple_assign_load_p (STMT_VINFO_STMT (stmt_info)))
> +       return;
> +
> +      vect_save_relevancy (stmt_info);
> +      STMT_VINFO_SLP_VECT_ONLY (stmt_info) = true;
> +      STMT_SLP_TYPE (stmt_info) = pure_slp;
> +    }
> +
> +  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
> +    vect_mark_stmts_as_in_pattern (cache, child);
> +}
> +
> +
> +/*******************************************************************************
> + * complex_pattern class
> + ******************************************************************************/
> +
> +/* SLP Complex Numbers pattern matching.
> +
> +  As an example, the following simple loop:
> +
> +    double a[restrict N]; double b[restrict N]; double c[restrict N];
> +
> +    for (int i=0; i < N; i+=2)
> +    {
> +      c[i] = a[i] - b[i+1];
> +      c[i+1] = a[i+1] + b[i];
> +    }
> +
> +  which represents a complex addition on with a rotation of 90* around the
> +  argand plane. i.e. if `a` and `b` were complex numbers then this would be the
> +  same as `a + (b * I)`.
> +
> +  Here the expressions for `c[i]` and `c[i+1]` are independent but have to be
> +  both recognized in order for the pattern to work.  As an SLP tree this is
> +  represented as
> +
> +                +--------------------------------+
> +                |       stmt 0 *_9 = _10;        |
> +                |       stmt 1 *_15 = _16;       |
> +                +--------------------------------+
> +                                |
> +                                |
> +                                v
> +                +--------------------------------+
> +                |     stmt 0 _10 = _4 - _8;      |
> +                |    stmt 1 _16 = _12 + _14;     |
> +                | lane permutation { 0[0] 1[1] } |
> +                +--------------------------------+
> +                            |        |
> +                            |        |
> +                            |        |
> +               +-----+      |        |      +-----+
> +               |     |      |        |      |     |
> +         +-----| { } |<-----+        +----->| { } --------+
> +         |     |     |   +------------------|     |       |
> +         |     +-----+   |                  +-----+       |
> +         |        |      |                                |
> +         |        |      |                                |
> +         |        +------|------------------+             |
> +         |               |                  |             |
> +         v               v                  v             v
> +     +--------------------------+     +--------------------------------+
> +     |     stmt 0 _8 = *_7;     |     |        stmt 0 _4 = *_3;        |
> +     |    stmt 1 _14 = *_13;    |     |       stmt 1 _12 = *_11;       |
> +     | load permutation { 1 0 } |     |    load permutation { 0 1 }    |
> +     +--------------------------+     +--------------------------------+
> +
> +  The pattern matcher allows you to replace both statements 0 and 1 or none at
> +  all.  Because this operation is a two operands operation the actual nodes
> +  being replaced are those in the { } nodes.  The actual scalar statements
> +  themselves are not replaced or used during the matching but instead the
> +  SLP_TREE_REPRESENTATIVE statements are inspected.  You are also allowed to
> +  replace and match on any number of nodes.
> +
> +  Because the pattern matcher matches on the representative statement for the
> +  SLP node the case of two_operators it allows you to match the children of the
> +  node.  This is done using the method `recognize ()`.
> +
> +*/
> +
> +/* The complex_pattern class contains common code for pattern matchers that work
> +   on complex numbers.  These provide functionality to allow de-construction and
> +   validation of sequences depicting/transforming REAL and IMAG pairs.  */
> +
> +class complex_pattern : public vect_pattern
> +{
> +  protected:
> +    vec<slp_tree> *m_nodes = NULL;
> +    complex_pattern (slp_tree *node)
> +      : vect_pattern (node)
> +    { }
> +
> +  public:
> +    bool validate_p (slp_tree_to_load_perm_map_t *);
> +    gcall *build (vec_info *);
> +};
> +
> +/* The post transform and validation function for the complex number
> +   patterns.  This will re-arrange the tree and re-organize the nodes such
> +   that they can be used by the complex number instructions that are to be
> +   created.  It does this by doing the following steps:
> +
> +   For each node that isn't already linear a new permute node is created which
> +   when applied makes the node linear.  If such permutation does not exist then
> +   the function returns FALSE.  For nodes that are already linear no work is
> +   done.   */
> +
> +bool
> +complex_pattern::validate_p (slp_tree_to_load_perm_map_t *perm_cache)
> +{
> +  slp_tree node;
> +  unsigned ix;
> +  auto_vec<slp_tree> workset;
> +  FOR_EACH_VEC_ELT (this->m_ops, ix, node)
> +    {
> +      auto_vec<slp_tree> nodes;
> +      nodes.create (this->m_num_args);
> +      slp_tree tmp = NULL;
> +      auto_vec<slp_tree> new_nodes;
> +
> +      unsigned i;
> +      FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, tmp)
> +	{
> +	  slp_tree vnode = NULL;
> +	  if (vect_slp_make_linear (perm_cache, node, tmp, &vnode))
> +	    {
> +	      nodes.quick_push (vnode);
> +	      if (vnode != tmp)
> +		new_nodes.safe_push (vnode);
> +	    }
> +	  else
> +	    {
> +	      FOR_EACH_VEC_ELT (new_nodes, i, tmp)
> +		delete tmp;
> +
> +	      FOR_EACH_VEC_ELT (workset, i, tmp)
> +		delete tmp;
> +
> +	      nodes.release();
> +	      return false;
> +	    }
> +	}
> +
> +      slp_tree new_node
> +	= vect_create_new_slp_node (SLP_TREE_SCALAR_STMTS (node),
> +				    SLP_TREE_CHILDREN (node).length ());
> +      SLP_TREE_VECTYPE (new_node) = SLP_TREE_VECTYPE (node);
> +      SLP_TREE_LANE_PERMUTATION (new_node) = SLP_TREE_LANE_PERMUTATION (node);
> +      SLP_TREE_CODE (new_node) = SLP_TREE_CODE (node);
> +      SLP_TREE_REF_COUNT (new_node) = SLP_TREE_REF_COUNT (node);
> +      SLP_TREE_REPRESENTATIVE (new_node) = SLP_TREE_REPRESENTATIVE (node);
> +      SLP_TREE_CHILDREN (new_node).safe_splice (nodes);
> +      SLP_TREE_LANES (new_node) = SLP_TREE_LANES (node);
> +
> +      workset.safe_push (new_node);
> +    }
> +
> +  FOR_EACH_VEC_ELT (workset, ix, node)
> +    SLP_TREE_CHILDREN (*this->m_node)[ix] = node;
> +
> +  return true;
> +}
> +
> +
> +/* Create a replacement pattern statement for each node in m_node and inserts
> +   the new statement into m_node as the new representative statement.  The old
> +   statement is marked as being in a pattern defined by the new statement.  The
> +   statement is created as call to internal function IFN with m_num_args
> +   arguments.
> +
> +   Futhermore the new pattern is also added to the vectorization information
> +   structure VINFO and the old statement STMT_INFO is marked as unused while
> +   the new statement is marked as used and the number of SLP uses of the new
> +   statement is incremented.
> +
> +   The newly created SLP nodes are marked as SLP only and will be dissolved
> +   if SLP is aborted.
> +
> +   The newly created gimple call is returned and the BB remains unchanged.
> +
> +   This default method is designed to only match against simple operands where
> +   all the input and output types are the same.
> +*/
> +
> +gcall *
> +complex_pattern::build (vec_info *vinfo)
> +{
> +  stmt_vec_info stmt_info;
> +
> +  auto_vec<tree> args;
> +  args.create (this->m_num_args);
> +  args.quick_grow_cleared (this->m_num_args);
> +  slp_tree node;
> +  unsigned ix;
> +  stmt_vec_info call_stmt_info;
> +  gcall *call_stmt = NULL;
> +
> +  FOR_EACH_VEC_ELT (*this->m_nodes, ix, node)

So elsewhere I see m_nodes assigned from SLP_TREE_CHILDREN
of m_node.  I assume m_node is the VEC_PERM of the two-operator
operation.  I expected we create a call for m_node here, not
for all children?

> +    {
> +      /* Calculate the location of the statement in NODE to replace.  */
> +      stmt_info = SLP_TREE_REPRESENTATIVE (node);
> +      gimple* old_stmt = STMT_VINFO_STMT (stmt_info);
> +      tree type = TREE_TYPE (gimple_get_lhs (old_stmt));
> +
> +      /* Create the argument set for use by gimple_build_call_internal_vec.  */
> +      for (unsigned i = 0; i < this->m_num_args; i++)
> +	{
> +	  tree var = make_temp_ssa_name (type, old_stmt, "_pa");

Why do you need to create N new SSA names (all with the same
definition statement)?  It looks like this new "scalar" stmt
is only a template stored in SLP_TREE_REPRESENTATIVE
and supposedly there needs to be "something" in the actual
arguments it make us not crash later?  Can't you for example
simply push the old_stmt LHS N times?

For VEC_PERM I originally wanted to set SLP_TREE_REPRESENTATIVE
to NULL_TREE but we don't like that obviously.  So long-term
we might want to change SLP_TREE_CODE to a match_code to allow
tree codes and internal-functions and "ignore" the representative
in case it is not ERROR_MARK.

> +	  args[i] = var;
> +	}
> +
> +      /* Create the new pattern statements.  */
> +      call_stmt = gimple_build_call_internal_vec (this->m_ifn, args);
> +      tree var = make_temp_ssa_name (type, call_stmt, "slp_patt");
> +      gimple_call_set_lhs (call_stmt, var);
> +      gimple_set_location (call_stmt, gimple_location (old_stmt));
> +      gimple_call_set_nothrow (call_stmt, true);
> +
> +      /* Adjust the book-keeping for the new and old statements for use during
> +	 SLP.  This is required to get the right VF and statement during SLP
> +	 analysis.  These changes are created after relevancy has been set for
> +	 the nodes as such we need to manually update them.  Any changes will be
> +	 undone if SLP is cancelled.  */
> +      call_stmt_info
> +	= vinfo->add_pattern_stmt (call_stmt, stmt_info);
> +
> +      /* add_pattern_stmt can't be done in vect_mark_pattern_stmts because
> +	 the non-SLP pattern matchers already have added the statement to VINFO
> +	 by the time it is called.  Some of them need to modify the returned
> +	 stmt_info.  vect_mark_pattern_stmts is called by recog_pattern and it
> +	 would increase the size of each pattern with boilerplate code to make
> +	 the call there.  */
> +      vect_mark_pattern_stmts (vinfo, stmt_info, call_stmt,
> +			       SLP_TREE_VECTYPE (node));
> +      STMT_SLP_TYPE (call_stmt_info) = pure_slp;
> +      STMT_VINFO_SLP_VECT_ONLY (call_stmt_info) = true;
> +
> +      /* Since we are replacing all the statements in the group with the same
> +	 thing it doesn't really matter.  So just set it every time a new stmt
> +	 is created.  */
> +      SLP_TREE_REPRESENTATIVE (node) = call_stmt_info;
> +      SLP_TREE_CODE (node) = CALL_EXPR;
> +    }
> +  return call_stmt;
> +}
> +
> +/*******************************************************************************
> + * complex_add_pattern class
> + ******************************************************************************/
> +
> +class complex_add_pattern : public complex_pattern
> +{
> +  protected:
> +    complex_add_pattern (slp_tree *node)
> +      : complex_pattern (node)
> +    {
> +      this->m_num_args = 2;
> +    }
> +
> +  public:
> +    static vect_pattern* create (slp_tree *node)
> +    {
> +       return new complex_add_pattern (node);
> +    }
> +
> +    const char* get_name ()
> +    {
> +      return "Complex Addition";
> +    }
> +
> +    bool matches (slp_tree_to_load_perm_map_t *);
> +    bool matches (complex_operation_t op, slp_tree_to_load_perm_map_t *,
> +		  vec<slp_tree> ops);
> +};
> +
> +/* Pattern matcher for trying to match complex addition pattern in SLP tree.
> +
> +   If no match is found then IFN is set to IFN_LAST.
> +   This function matches the patterns shaped as:
> +
> +   c[i] = a[i] - b[i+1];
> +   c[i+1] = a[i+1] + b[i];
> +
> +   If a match occurred then TRUE is returned, else FALSE.  The initial match is
> +   expected to be in OP1 and the initial match operands in args0.  */
> +
> +bool
> +complex_add_pattern::matches (complex_operation_t op,
> +			      slp_tree_to_load_perm_map_t *perm_cache,
> +			      vec<slp_tree> ops)
> +{
> +  this->m_ifn = IFN_LAST;
> +  this->m_ops.safe_splice (ops);
> +
> +  /* Find the two components.  Rotation in the complex plane will modify
> +     the operations:
> +
> +      * Rotation  0: + +
> +      * Rotation 90: - +
> +      * Rotation 180: - -
> +      * Rotation 270: + -
> +
> +      Rotation 0 and 180 can be handled by normal SIMD code, so we don't need
> +      to care about them here.  */
> +  if (op == MINUS_PLUS)
> +    this->m_ifn = IFN_COMPLEX_ADD_ROT90;
> +  else if (op == PLUS_MINUS)
> +    this->m_ifn = IFN_COMPLEX_ADD_ROT270;
> +
> +  /* verify that there is a permute, otherwise this isn't a pattern we
> +     we support.  */
> +  bool is_linear = false;
> +  bool all_linear = true;
> +  unsigned x;
> +  for (x = 0; x < this->m_ops.length () && all_linear; x++)
> +    {
> +      linear_loads_p (perm_cache, this->m_ops[x], &is_linear);
> +      all_linear = all_linear && is_linear;
> +    }
> +
> +  if (this->m_ifn == IFN_LAST || all_linear)
> +    return false;
> +
> +  this->m_nodes = &SLP_TREE_CHILDREN (*this->m_node);
> +  return true;
> +}
> +
> +bool
> +complex_add_pattern::matches (slp_tree_to_load_perm_map_t *perm_cache)
> +{
> +  complex_operation_t op
> +    = vect_detect_pair_op (*this->m_node, true, &this->m_ops);
> +  return matches (op, perm_cache, this->m_ops);
> +}
> +
> +/*******************************************************************************
> + * complex_mul_pattern
> + ******************************************************************************/
> +
> +/* Helper function of that looks for a match in the CHILDth child of NODE.  The
> +   child used is stored in RES.
> +
> +   If the match is successful then ARGS will contain the operands matched
> +   and the complex_operation_t type is returned.  If match is not successful
> +   then CMPLX_NONE is returned and ARGS is left unmodified.  */
> +
> +static inline complex_operation_t
> +vect_match_call_complex_mla (slp_tree node, unsigned child,
> +			     vec<slp_tree> *args = NULL, slp_tree *res = NULL)
> +{
> +  gcc_assert (child < SLP_TREE_CHILDREN (node).length ());
> +
> +  slp_tree data = SLP_TREE_CHILDREN (node)[child];
> +
> +  if (res)
> +    *res = data;
> +
> +  return vect_detect_pair_op (data, false, args);
> +}
> +
> +/* Check to see if either of the trees in ARGS are a NEGATE_EXPR.  If the first
> +   child (args[0]) is a NEGATE_EXPR then NEG_FIRST_P is set to TRUE.
> +
> +   If a negate is found then the values in ARGS are reordered such that the
> +   negate node is always the second one and the entry is replaced by the child
> +   of the negate node.  */
> +
> +static inline bool
> +vect_normalize_conj_loc (vec<slp_tree> args, bool *neg_first_p = NULL)
> +{
> +  gcc_assert (args.length () == 2);
> +  bool neg_found = false;
> +
> +  if (vect_match_expression_p (args[0], NEGATE_EXPR))
> +    {
> +      std::swap (args[0], args[1]);
> +      neg_found = true;
> +      if (neg_first_p)
> +	*neg_first_p = true;
> +    }
> +  else if (vect_match_expression_p (args[1], NEGATE_EXPR))
> +    {
> +      neg_found = true;
> +      if (neg_first_p)
> +	*neg_first_p = false;
> +    }
> +
> +  if (neg_found)
> +    args[1] = SLP_TREE_CHILDREN (args[1])[0];
> +
> +  return neg_found;
> +}
> +
> +/* Helper function that checks to see if the load permutation PERM is on that
> +   belongs to a duplication operation.  i.e. checks to see if all the values in
> +   PERM are the same.  */
> +
> +static inline bool
> +vect_is_dup (load_permutation_t perm, unsigned idx)
> +{
> +  for (unsigned i = 0; i < perm.length (); i++)
> +    if (perm[i] != idx)
> +     return false;
> +
> +  return true;
> +}
> +
> +/* Helper function that checks to see if LEFT_OP and RIGHT_OP are both MULT_EXPR
> +   nodes but also that they represent an operation that is either a complex
> +   multiplication or a complex multiplication by conjucated value.
> +
> +   Of the negation is expected to be in the first half of the tree (As required
> +   by an FMS pattern) then NEG_FIRST is true.  If the operation is a conjucate
> +   operation then CONJ_FIRST_OPERAND is set to indicate whether the first or
> +   second operand contains the conjucate operation.  */
> +
> +static inline bool
> +vect_validate_multiplication (slp_tree_to_load_perm_map_t *perm_cache,
> +			     vec<slp_tree> left_op, vec<slp_tree> right_op,
> +			     bool neg_first, bool *conj_first_operand)
> +{
> +  /* The presence of a negation indicates that we have either a conjucate or a
> +     rotation.  We need to distinguish which one.  */
> +  *conj_first_operand = false;
> +
> +  load_permutation_t perm;
> +  bool is_linear;
> +  /* Complex conjucates have the negation on the imaginary part of the
> +     number where rotations affect the real component.  So check if the
> +     negation is on a dup of lane 1.  */
> +  perm = linear_loads_p (perm_cache, right_op[1], &is_linear);
> +  if (is_linear || !vect_is_dup (perm, 1))
> +    return false;
> +
> +  /* Check if the conjucate is on the second first or second operand.  The
> +     order of the node with the conjucate value determines this, and the dup
> +     node must be one of lane 0 of the same DR as the neg node.  */
> +  perm = linear_loads_p (perm_cache, left_op[0], &is_linear);
> +  if (is_linear)
> +    {
> +      perm = linear_loads_p (perm_cache, left_op[1], &is_linear);
> +      if (is_linear)
> +	return false;
> +    }
> +  else if (!neg_first)
> +    *conj_first_operand = true;
> +  else
> +    return false;
> +
> +  if (!vect_is_dup (perm, 0))
> +    return false;
> +
> +  return true;
> +}
> +
> +/* Helper function to help distinguish between a conjucate and a rotation in a
> +   complex multiplication.  The operations have similar shapes but the order of
> +   the load permutes are different.  This function returns TRUE when the order
> +   is consistent with a multiplication or multiplication by conjucated
> +   operand but returns FALSE if it's a multiplication by rotated operand.  */
> +
> +static inline bool
> +vect_validate_multiplication (slp_tree_to_load_perm_map_t *perm_cache,
> +			     vec<slp_tree> op, unsigned idx)
> +{
> +  /* The left node is the more common case, test it first.  */
> +  bool is_linear;
> +  load_permutation_t perm = linear_loads_p (perm_cache, op[0], &is_linear);
> +  if (is_linear || !vect_is_dup (perm, idx))
> +    {
> +      perm = linear_loads_p (perm_cache, op[1], &is_linear);
> +      if (is_linear || !vect_is_dup (perm, idx))
> +	return false;
> +    }
> +
> +  return true;
> +}
> +
> +class complex_mul_pattern : public complex_pattern
> +{
> +  protected:
> +    /* Allocate enough space for FMA as well.  */
> +    map_t m_blocks[6] = {};
> +    bool m_inplace = false;
> +    complex_mul_pattern (slp_tree *node)
> +      : complex_pattern (node)
> +    {
> +      this->m_num_args = 2;
> +    }
> +
> +  public:
> +    static vect_pattern* create (slp_tree *node)
> +    {
> +       return new complex_mul_pattern (node);
> +    }
> +
> +    const char* get_name ()
> +    {
> +      return "Complex Multiplication";
> +    }
> +
> +    bool validate_p (slp_tree_to_load_perm_map_t *);
> +    bool matches (slp_tree_to_load_perm_map_t *);
> +    bool matches (complex_operation_t, slp_tree_to_load_perm_map_t *,
> +		  vec<slp_tree>);
> +};
> +
> +/* Pattern matcher for trying to match complex multiply pattern in SLP tree
> +   If the operation matches then IFN is set to the operation it matched
> +   and the arguments to the two replacement statements are put in m_ops.
> +
> +   If no match is found then IFN is set to IFN_LAST and m_ops is unchanged.
> +
> +   This function matches the patterns shaped as:
> +
> +   double ax = (b[i+1] * a[i]);
> +   double bx = (a[i+1] * b[i]);
> +
> +   c[i] = c[i] - ax;
> +   c[i+1] = c[i+1] + bx;
> +
> +   If a match occurred then TRUE is returned, else FALSE.  The initial match is
> +   expected to be in OP1 and the initial match operands in args0.  */
> +
> +bool
> +complex_mul_pattern::matches (complex_operation_t op,
> +			      slp_tree_to_load_perm_map_t *perm_cache,
> +			      vec<slp_tree> /* ops */)
> +{
> +  this->m_ifn = IFN_LAST;
> +
> +  if (op != MINUS_PLUS)
> +    return false;
> +
> +  slp_tree root = *this->m_node;
> +  /* First two nodes must be a multiply.  */
> +  auto_vec<slp_tree> muls;
> +  if (vect_match_call_complex_mla (root, 0) != MULT_MULT
> +      || vect_match_call_complex_mla (root, 1, &muls) != MULT_MULT)
> +    return false;
> +
> +  /* Now operand2+4 may lead to another expression.  */
> +  auto_vec<slp_tree> left_op, right_op;
> +  left_op.safe_splice (SLP_TREE_CHILDREN (muls[0]));
> +  right_op.safe_splice (SLP_TREE_CHILDREN (muls[1]));
> +
> +  bool neg_first;
> +  bool is_neg = vect_normalize_conj_loc (right_op, &neg_first);
> +
> +  if (!is_neg)
> +    {
> +      /* A multiplication needs to multiply agains the real pair, otherwise
> +	 the pattern matches that of FMS.   */
> +      if (!vect_validate_multiplication (perm_cache, left_op, 0))
> +	return false;
> +      this->m_ifn = IFN_COMPLEX_MUL;
> +      this->m_ops.safe_splice (left_op);
> +      this->m_ops.safe_splice (right_op);
> +    }
> +  else if (is_neg)
> +    {
> +      bool conj_first_operand;
> +      if (!vect_validate_multiplication (perm_cache, left_op, right_op,
> +					 neg_first, &conj_first_operand))
> +	return false;
> +
> +      this->m_ifn = IFN_COMPLEX_MUL_CONJ;
> +      /* Check if the conjucate is on the first or second parameter.  */
> +      if (!conj_first_operand)
> +	{
> +	  this->m_ops.safe_push (left_op[1]);
> +	  this->m_ops.safe_push (left_op[0]);
> +	}
> +      else
> +	{
> +	  this->m_ops.safe_push (left_op[0]);
> +	  this->m_ops.safe_push (left_op[1]);
> +	}
> +      this->m_ops.safe_push (right_op[1]);
> +      this->m_ops.safe_push (right_op[0]);
> +    }
> +
> +  vect_build_perm_groups (perm_cache, &this->m_blocks[0], this->m_ops);
> +  this->m_nodes = &SLP_TREE_CHILDREN (*this->m_node);
> +
> +  return true;
> +}
> +
> +bool
> +complex_mul_pattern::matches (slp_tree_to_load_perm_map_t *perm_cache)
> +{
> +  complex_operation_t op
> +    = vect_detect_pair_op (*this->m_node);
> +  return matches (op, perm_cache, this->m_ops);
> +}
> +
> +
> +/* Validates to see if the Complex MUL that we have matched is valid.  This is
> +   done through a combination of making nodes linear and combining nodes.  */
> +
> +bool
> +complex_mul_pattern::validate_p (slp_tree_to_load_perm_map_t *perm_cache)
> +{
> +  slp_tree node;
> +  unsigned ix;
> +  hash_set<slp_tree> cache;
> +  auto_vec<slp_tree> workset;
> +  FOR_EACH_VEC_ELT (*this->m_nodes, ix, node)
> +    {
> +      auto_vec<slp_tree> nodes;
> +      nodes.create (this->m_num_args);
> +      slp_tree tmp = NULL;
> +      auto_vec<slp_tree> new_nodes;
> +
> +      unsigned i;
> +      for (i = 0; i < this->m_num_args; i++)
> +	{
> +	  unsigned index = (ix * this->m_num_args) + i;
> +	  map_t map = this->m_blocks[index];
> +	  slp_tree vnode = NULL;
> +	  bool needs_linear = map.a == map.b;
> +	  tmp = this->m_ops[index];
> +	  cache.add (tmp);
> +	  if (needs_linear
> +	      && vect_slp_make_linear (perm_cache, node, tmp, &vnode))
> +	    {
> +	      nodes.quick_push (vnode);
> +	      if (vnode != tmp)
> +		new_nodes.safe_push (vnode);
> +	    }
> +	  else if (!needs_linear
> +		   && vect_slp_make_combine_linear (perm_cache, node,
> +						    this->m_ops, map, &vnode))
> +	    {
> +	      nodes.quick_push (vnode);
> +	      if (vnode != tmp)
> +		new_nodes.safe_push (vnode);
> +	    }
> +	  else
> +	    {
> +	      if (dump_enabled_p ())
> +		dump_printf_loc (MSG_NOTE, vect_location,
> +				"stmts could not be made %s %p\n",
> +				needs_linear ? "linear" : "linear/combined",
> +				tmp);
> +	      FOR_EACH_VEC_ELT (new_nodes, i, tmp)
> +		delete tmp;
> +
> +	      if (!m_inplace)
> +		FOR_EACH_VEC_ELT (workset, i, tmp)
> +		  delete tmp;
> +
> +	      nodes.release();
> +	      return false;
> +	    }
> +
> +	  vect_mark_stmts_as_in_pattern (&cache, node);
> +	}
> +
> +      if (m_inplace)
> +	{
> +	  workset.safe_splice (nodes);
> +	}
> +      else
> +	{
> +	  slp_tree new_node
> +	    = vect_create_new_slp_node (SLP_TREE_SCALAR_STMTS (node),
> +					SLP_TREE_CHILDREN (node).length ());
> +	  SLP_TREE_VECTYPE (new_node) = SLP_TREE_VECTYPE (node);
> +	  SLP_TREE_LANE_PERMUTATION (new_node)
> +	    = SLP_TREE_LANE_PERMUTATION (node);
> +	  SLP_TREE_CODE (new_node) = SLP_TREE_CODE (node);
> +	  SLP_TREE_REF_COUNT (new_node) = SLP_TREE_REF_COUNT (node);
> +	  SLP_TREE_REPRESENTATIVE (new_node) = SLP_TREE_REPRESENTATIVE (node);
> +	  SLP_TREE_CHILDREN (new_node).safe_splice (nodes);
> +	  SLP_TREE_LANES (new_node) = SLP_TREE_LANES (node);
> +
> +	  workset.safe_push (new_node);
> +	}
> +    }
> +
> +  if (m_inplace)
> +    {
> +      SLP_TREE_CHILDREN (*this->m_node).truncate (0);
> +      SLP_TREE_CHILDREN (*this->m_node).safe_splice (workset);
> +    }
> +  else
> +    {
> +      FOR_EACH_VEC_ELT (workset, ix, node)
> +	SLP_TREE_CHILDREN (*this->m_node)[ix] = node;
> +    }
> +
> +  return true;
> +}
> +
> +/*******************************************************************************
> + * complex_fma_pattern class
> + ******************************************************************************/
> +
> +class complex_fma_pattern : public complex_mul_pattern
> +{
> +  protected:
> +    /* Temporary nodes cache as scope of the class so we don't
> +       have to worry about freeing it. */
> +    auto_vec<slp_tree> m_ncache;
> +    /* List of nodes to mark as no longer being a mul.  The first
> +       one is chosen and the last one is dangling.  */
> +    auto_vec<slp_tree> m_mul_nodes;
> +    complex_fma_pattern (slp_tree *node)
> +      : complex_mul_pattern (node)
> +    {
> +      this->m_num_args = 3;
> +    }
> +
> +  public:
> +    static vect_pattern* create (slp_tree *node)
> +    {
> +       return new complex_fma_pattern (node);
> +    }
> +
> +    const char* get_name ()
> +    {
> +      return "Complex FMA";
> +    }
> +
> +    bool matches (slp_tree_to_load_perm_map_t *);
> +    bool matches (complex_operation_t, slp_tree_to_load_perm_map_t *,
> +		  vec<slp_tree> ops);
> +    bool validate_p (slp_tree_to_load_perm_map_t *);
> +};
> +
> +/* Helper function to "reset" a previously matched node and undo the changes
> +   made enough so that the node is treated as an irrelevant node.  */
> +
> +static inline void
> +vect_slp_reset_pattern (slp_tree node)
> +{
> +  stmt_vec_info stmt_info = vect_orig_stmt (SLP_TREE_REPRESENTATIVE (node));
> +  STMT_VINFO_IN_PATTERN_P (stmt_info) = false;
> +  STMT_SLP_TYPE (stmt_info) = pure_slp;
> +  SLP_TREE_REPRESENTATIVE (node) = stmt_info;
> +}
> +
> +/* Pattern matcher for trying to match complex multiply and accumulate
> +   and multiply and subtract patterns in SLP tree.
> +   If the operation matches then IFN is set to the operation it matched and
> +   the arguments to the two replacement statements are put in m_ops.
> +
> +   If no match is found then IFN is set to IFN_LAST and m_ops is unchanged.
> +
> +   This function matches the patterns shaped as:
> +
> +   double ax = (b[i+1] * a[i]) + (b[i] * a[i]);
> +   double bx = (a[i+1] * b[i]) - (a[i+1] * b[i+1]);
> +
> +   c[i] = c[i] - ax;
> +   c[i+1] = c[i+1] + bx;
> +
> +   If a match occurred then TRUE is returned, else FALSE.  The match is
> +   performed after COMPLEX_MUL which would have done the majority of the work.
> +   This function merely matches an ADD with a COMPLEX_MUL IFN.  The initial
> +   match is expected to be in OP1 and the initial match operands in args0.  */
> +
> +bool
> +complex_fma_pattern::matches (complex_operation_t op1,
> +			      slp_tree_to_load_perm_map_t * /* perm_cache */,
> +			      vec<slp_tree> /* args0 */)
> +{
> +  this->m_ifn = IFN_LAST;
> +
> +  /* Find the two components.  We match Complex MUL first which reduces the
> +     amount of work this pattern has to do.  After that we just match the
> +     head node and we're done.:
> +
> +     * FMA: + +.
> +
> +     We need to ignore the two_operands nodes that may also match.
> +     For that we can check if they have any scalar statements and also
> +     check that it's not a permute node as we're looking for a normal
> +     PLUS_EXPR operation.  */
> +  if (op1 != CMPLX_NONE)
> +    return false;
> +
> +  /* Find the two components.  We match Complex MUL first which reduces the
> +     amount of work this pattern has to do.  After that we just match the
> +     head node and we're done.:
> +
> +   * FMA: + + on a non-two_operands node.  */
> +  if (SLP_TREE_LANE_PERMUTATION (*this->m_node).exists ()
> +      || !SLP_TREE_SCALAR_STMTS (*this->m_node).exists ()
> +      || !vect_match_expression_p (*this->m_node, PLUS_EXPR))
> +    return false;
> +
> +  slp_tree node = SLP_TREE_CHILDREN (*this->m_node)[1];
> +
> +  if (vect_match_two_call_p (node, IFN_COMPLEX_MUL))
> +    this->m_ifn = IFN_COMPLEX_FMA;
> +  else if (vect_match_two_call_p (node, IFN_COMPLEX_MUL_CONJ))
> +    this->m_ifn = IFN_COMPLEX_FMA_CONJ;
> +  else
> +    return false;
> +
> +  this->m_mul_nodes.safe_splice (SLP_TREE_CHILDREN (node));
> +  node = SLP_TREE_CHILDREN (node)[0];
> +  this->m_ops.create (this->m_num_args);
> +
> +  if (this->m_ifn == IFN_COMPLEX_FMA)
> +    {
> +      this->m_ops.quick_push (SLP_TREE_CHILDREN (*this->m_node)[0]);
> +      this->m_ops.quick_push (SLP_TREE_CHILDREN (node)[1]);
> +      this->m_ops.quick_push (SLP_TREE_CHILDREN (node)[0]);
> +    }
> +  else
> +    {
> +      this->m_ops.quick_push (SLP_TREE_CHILDREN (*this->m_node)[0]);
> +      this->m_ops.quick_push (SLP_TREE_CHILDREN (node)[0]);
> +      this->m_ops.quick_push (SLP_TREE_CHILDREN (node)[1]);
> +    }
> +
> +  this->m_ncache.safe_push (*this->m_node);
> +  this->m_nodes = &this->m_ncache;
> +
> +  return true;
> +}
> +
> +bool
> +complex_fma_pattern::matches (slp_tree_to_load_perm_map_t *perm_cache)
> +{
> +  auto_vec<slp_tree> args0;
> +  complex_operation_t op
> +    = vect_detect_pair_op (*this->m_node, true, &args0);
> +  return matches (op, perm_cache, args0);
> +}
> +
> +bool
> +complex_fma_pattern::validate_p (slp_tree_to_load_perm_map_t *perm_cache)
> +{
> +  /* FMA's top level node is + + and so is not two_operator,
> +     we only need one child.  */
> +  slp_tree node;
> +  auto_vec<slp_tree> nodes;
> +  auto_vec<slp_tree> new_nodes;
> +  nodes.create (this->m_num_args);
> +  hash_set<slp_tree> cache;
> +  unsigned ix;
> +  FOR_EACH_VEC_ELT (this->m_ops, ix, node)
> +    {
> +      slp_tree vnode = NULL;
> +      if (vect_slp_make_linear (perm_cache, *this->m_node, node, &vnode))
> +	{
> +	  nodes.quick_push (vnode);
> +	  if (vnode != node)
> +	    new_nodes.safe_push (vnode);
> +	  if (SLP_TREE_CHILDREN (node).length () == 2)
> +	    {
> +	      cache.add(SLP_TREE_CHILDREN (node)[0]);
> +	      cache.add(SLP_TREE_CHILDREN (node)[1]);
> +	    }
> +	  else
> +	    cache.add (node);
> +	}
> +      else
> +	{
> +	  FOR_EACH_VEC_ELT (new_nodes, ix, node)
> +	    delete node;
> +
> +	  nodes.release();
> +	  return false;
> +	}
> +    }
> +
> +  SLP_TREE_CHILDREN (*this->m_node).truncate (0);
> +  SLP_TREE_CHILDREN (*this->m_node).safe_splice (nodes);
> +
> +  /* Ignore the part of the tree that becomes irrelevant for FMA.  */
> +  vect_slp_reset_pattern (this->m_mul_nodes[0]);
> +  vect_slp_reset_pattern (this->m_mul_nodes[1]);
> +  vect_mark_stmts_as_in_pattern (&cache, this->m_mul_nodes[1]);
> +
> +  return true;
> +}
> +
> +
> +/*******************************************************************************
> + * complex_fms_pattern class
> + ******************************************************************************/
> +
> +class complex_fms_pattern : public complex_mul_pattern
> +{
> +  protected:
> +    complex_fms_pattern (slp_tree *node)
> +      : complex_mul_pattern (node)
> +    {
> +      this->m_num_args = 3;
> +    }
> +
> +  public:
> +    static vect_pattern* create (slp_tree *node)
> +    {
> +       return new complex_fms_pattern (node);
> +    }
> +
> +    const char* get_name ()
> +    {
> +      return "Complex FMS";
> +    }
> +
> +    bool matches (slp_tree_to_load_perm_map_t *);
> +    bool matches (complex_operation_t, slp_tree_to_load_perm_map_t *,
> +		  vec<slp_tree>);
> +};
> +
> +/* Pattern matcher for trying to match complex multiply and accumulate
> +   and multiply and subtract patterns in SLP tree.
> +   If the operation matches then IFN is set to the operation it matched and
> +   the arguments to the two replacement statements are put in m_ops.
> +
> +   If no match is found then IFN is set to IFN_LAST and m_ops is unchanged.
> +
> +   This function matches the patterns shaped as:
> +
> +   double ax = (b[i+1] * a[i]) + (b[i] * a[i]);
> +   double bx = (a[i+1] * b[i]) - (a[i+1] * b[i+1]);
> +
> +   c[i] = c[i] - ax;
> +   c[i+1] = c[i+1] + bx;
> +
> +   If a match occurred then TRUE is returned, else FALSE.  The initial match is
> +   expected to be in OP1 and the initial match operands in args0.  */
> +bool
> +complex_fms_pattern::matches (complex_operation_t op1,
> +			      slp_tree_to_load_perm_map_t *perm_cache,
> +			      vec<slp_tree> args0)
> +{
> +  this->m_ifn = IFN_LAST;
> +
> +  /* Find the two components.  We match Complex MUL first which reduces the
> +     amount of work this pattern has to do.  After that we just match the
> +     head node and we're done.:
> +
> +     * FMS: - +.  */
> +  slp_tree child = NULL;
> +
> +  /* We need to ignore the two_operands nodes that may also match,
> +     for that we can check if they have any scalar statements and also
> +     check that it's not a permute node as we're looking for a normal
> +     PLUS_EXPR operation.  */
> +  if (op1 != PLUS_MINUS)
> +    return false;
> +
> +  child = SLP_TREE_CHILDREN (args0[1])[1];
> +  if (vect_detect_pair_op (child) != MINUS_PLUS)
> +    return false;
> +
> +  /* First two nodes must be a multiply.  */
> +  auto_vec<slp_tree> muls;
> +  if (vect_match_call_complex_mla (child, 0) != MULT_MULT
> +      || vect_match_call_complex_mla (child, 1, &muls) != MULT_MULT)
> +    return false;
> +
> +  /* Now operand2+4 may lead to another expression.  */
> +  auto_vec<slp_tree> left_op, right_op;
> +  left_op.safe_splice (SLP_TREE_CHILDREN (muls[1]));
> +  right_op.safe_splice (SLP_TREE_CHILDREN (muls[0]));
> +
> +  bool is_neg = vect_normalize_conj_loc (right_op);
> +
> +  this->m_ops.create (6);
> +  child = SLP_TREE_CHILDREN (args0[0])[0];
> +  this->m_ops.quick_push (child);
> +  if (!is_neg)
> +    {
> +      this->m_ifn = IFN_COMPLEX_FMS;
> +      this->m_ops.quick_push (left_op[1]);
> +      this->m_ops.quick_push (left_op[0]);
> +    }
> +  else if (is_neg)
> +    {
> +      bool conj_first_operand;
> +      if (!vect_validate_multiplication (perm_cache, left_op, right_op, false,
> +					 &conj_first_operand))
> +	return false;
> +
> +      this->m_ifn = IFN_COMPLEX_FMS_CONJ;
> +      /* Check if the conjucate is on the first or second parameter.  */
> +      if (!conj_first_operand)
> +	{
> +	  this->m_ops.quick_push (left_op[1]);
> +	  this->m_ops.quick_push (left_op[0]);
> +	}
> +      else
> +	{
> +	  this->m_ops.quick_push (left_op[0]);
> +	  this->m_ops.quick_push (left_op[1]);
> +	}
> +    }
> +
> +  this->m_ops.quick_push (child);
> +  this->m_ops.quick_push (right_op[1]);
> +  this->m_ops.quick_push (right_op[0]);
> +
> +  vect_build_perm_groups (perm_cache, &this->m_blocks[0], this->m_ops);
> +  this->m_nodes = &SLP_TREE_CHILDREN (*this->m_node);
> +  return true;
> +}
> +
> +bool
> +complex_fms_pattern::matches (slp_tree_to_load_perm_map_t *perm_cache)
> +{
> +  auto_vec<slp_tree> args0;
> +  complex_operation_t op
> +    = vect_detect_pair_op (*this->m_node, true, &args0);
> +  return matches (op, perm_cache, args0);
> +}
> +
> +
> +/*******************************************************************************
> + * complex_operations_pattern class
> + ******************************************************************************/
> +
> +/* This function combines all the existing pattern matchers above into one class
> +   that shares the functionality between them.  The initial match is shared
> +   between all complex operations.  */
> +
> +class complex_operations_pattern : public complex_pattern
> +{
> +  protected:
> +    complex_operations_pattern (slp_tree *node)
> +      : complex_pattern (node)
> +    { }
> +
> +  gcall *build ();
> +
> +  /* The current "active" pattern.  */
> +  vect_pattern *m_patt = NULL;
> +
> +  public:
> +    using complex_pattern::matches;
> +
> +    bool matches (slp_tree_to_load_perm_map_t *perm_cache);
> +    const char* get_name ();
> +    bool validate_p (slp_tree_to_load_perm_map_t *perm_cache);
> +    bool is_optab_supported_p (tree, optimization_type);
> +    internal_fn get_ifn ();
> +    gcall *build (vec_info *);
> +    void reset (slp_tree *);
> +    ~complex_operations_pattern ();
> +
> +    static vect_pattern* create (slp_tree *node)
> +    {
> +       return new complex_operations_pattern (node);
> +    }
> +};
> +
> +
> +/* Clean up the pattern if one existed.  */
> +
> +complex_operations_pattern::~complex_operations_pattern ()
> +{
> +  if (this->m_patt)
> +    delete this->m_patt;
> +  this->m_patt = NULL;
> +}
> +
> +/* Match but do not perform any additional operations on the SLP tree.  If the
> +   match is successful then that pattern is made "active" in the class.  */
> +
> +bool
> +complex_operations_pattern::matches (slp_tree_to_load_perm_map_t *perm_cache)
> +{
> +  complex_operation_t op
> +    = vect_detect_pair_op (*this->m_node, true, &this->m_ops);
> +
> +  /* Check which pattern this may be.  Match longest pattern first.  */
> +  this->m_patt = complex_fma_pattern::create (this->m_node);
> +  if (this->m_patt->matches (op, perm_cache, this->m_ops))
> +    return true;

Hmm, all of these allocations like also

+  for (unsigned x = 0; x < num__slp_patterns; x++)
+    {
+      vect_pattern *pattern = slp_patterns[x] (ref_node);
+      found_p = pattern->recognize (perm_cache, vinfo);
+      delete pattern;
+      found_rec_p = found_p | found_rec_p;
+    }

in the main driver are really due to the C++-ish layout of all this :/
I don't like that very much.  It also highlights when reading
the patch that it's hard to get the difference betwee
recognize (), match () and validate () where all but the first is
kind-of internal implementation details of the complex matchers.

So we could eventually get by with the main driver doing

+  for (unsigned x = 0; x < num__slp_patterns; x++)
+    {
+      vect_pattern *pattern = slp_patterns[x] (ref_node);
       if (pattern)
         found_rec_p = true;
     }

thus merge ->recognize with the static allocator and not do
any heap allocation when nothing is matched?  Then ...

> +  if (op == CMPLX_NONE)
> +    return false;
> +
> +  delete this->m_patt;
> +
> +  this->m_patt = complex_fms_pattern::create (this->m_node);

... this could use static typing and an automatic instance?

The myth that allocation & deallocation is cheap is really that,
a myth.  We're doing 6 allocations & deallocations at this
very toplevel for each SLP node (which means eventually for
each statement in a function).


So why for example would one need to override recognize ()?
I'd argue for nailing it in place and to remove validate_p ()
(should be done by matches?).  At least validate_p for
the complex patterns seem to build SLP nodes which mean
they actually build()?


I think overall the patch looks sensible but it's still quite
hard to follow the flow of an actual pattern match, possibly
because of the strange factoring.

At this point to speed up things can you split the patch
so there's only the very simplest (complex_add for example)
matcher plus requirements?  Then we can push that which
would allow me to actually play with it more easily and
even try out changes?

I'd say split out the optimize_load_redistribution part of
the first patch and merge the complex_add support into
its remains so we have a first patch that does something?

Thanks and sorry for being dense here ... :/  There's pieces
I think we should do better but I still can't lay out a
solution and I guess pointing fingers at the individual pieces
like above will lead to very slow progress only.

Richard.

> +  if (this->m_patt->matches (op, perm_cache, this->m_ops))
> +    return true;
> +
> +  delete this->m_patt;
> +
> +  this->m_patt = complex_mul_pattern::create (this->m_node);
> +  if (this->m_patt->matches (op, perm_cache, this->m_ops))
> +    return true;
> +
> +  delete this->m_patt;
> +
> +  this->m_patt = complex_add_pattern::create (this->m_node);
> +  if (this->m_patt->matches (op, perm_cache, this->m_ops))
> +    return true;
> +
> +  delete this->m_patt;
> +
> +  this->m_patt = NULL;
> +  return false;
> +}
> +
> +/* Friendly name of the operation the pattern matches.  */
> +
> +const char*
> +complex_operations_pattern::get_name ()
> +{
> +  if (!this->m_patt)
> +    return "Complex Operations";
> +
> +  return this->m_patt->get_name ();
> +}
> +
> +/* Only perform the pattern creation part of the matcher.  This creates and
> +   returns the new pattern statement and encapsulates the currently active
> +   pattern.  */
> +
> +gcall *
> +complex_operations_pattern::build (vec_info *vinfo)
> +{
> +  if (!this->m_patt)
> +    return NULL;
> +
> +  return this->m_patt->build (vinfo);
> +}
> +
> +/* Validate the operation the currently active pattern detected.  */
> +
> +bool
> +complex_operations_pattern::validate_p (slp_tree_to_load_perm_map_t *perm_cache)
> +{
> +  if (!this->m_patt)
> +    return false;
> +
> +  return this->m_patt->validate_p (perm_cache);
> +}
> +
> +/* Check if the obtab of the currently active matches is supported.  */
> +
> +bool
> +complex_operations_pattern::is_optab_supported_p (tree vectype,
> +						  optimization_type opt_type)
> +{
> +  if (!vectype || !this->m_patt)
> +    return false;
> +
> +  return this->m_patt->is_optab_supported_p (vectype, opt_type);
> +}
> +
> +/* Return the matched IFN from the current active matcher if any.  */
> +
> +internal_fn
> +complex_operations_pattern::get_ifn ()
> +{
> +  if (!this->m_patt)
> +    return IFN_LAST;
> +
> +  return this->m_patt->get_ifn ();
> +}
> +
> +/*******************************************************************************
> + * Pattern matching definitions
> + ******************************************************************************/
> +
> +#define SLP_PATTERN(x) &x::create
> +vect_pattern_decl_t slp_patterns[]
> +{
> +  /* For least amount of back-tracking and more efficient matching
> +     order patterns from the largest to the smallest.  Especially if they
> +     overlap in what they can detect.  */
> +
> +  /* FMA overlaps with MUL but is the longer sequence.  Because we're in post
> +     order traversal we can't match FMA if included in
> +     complex_operations_pattern so must be checked on it's own.  */
> +  SLP_PATTERN (complex_operations_pattern),
> +};
> +#undef SLP_PATTERN
> +
> +/* Set the number of SLP pattern matchers available.  */
> +size_t num__slp_patterns = sizeof(slp_patterns)/sizeof(vect_pattern_decl_t);
> 
> 
>
Tamar Christina Nov. 16, 2020, 3:54 p.m. UTC | #2
Hi Richi, thanks for the review!

Just a quick comment on one of the questions asked:

> -----Original Message-----
> From: rguenther@c653.arch.suse.de <rguenther@c653.arch.suse.de> On
> Behalf Of Richard Biener
> Sent: Monday, November 16, 2020 3:12 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; ook@ucw.cz;
> hongtao.liu@intel.com
> Subject: Re: [PATCH 2/2] middle-end : Add support for complex pattern
> detection for Add, Mul, FMA and FMS
> 
> On Sat, 14 Nov 2020, Tamar Christina wrote:
> 
> > Hi All,
> >
> > This patch series adds support for SLP vectorization of complex instructions
> [1].
> >
> > These instructions exist only in their vector forms and require you to
> recognize
> > two statements in parallel.  Complex operations usually require a permute
> due to
> > the fact that the real and imaginary numbers are stored intermixed but
> these vector
> > instructions expect this and no longer need the compiler to generate a
> permute.
> >
> > The pass is able to insert additional permutes inside the SLP tree in order to
> > change the data flow such that the new instructions can be used.  These
> permutes
> > are then optimized out later during optimize_slp.
> >
> > These instructions also support rotations along the Argand plane, as such
> the operands
> > have to be re-ordered to coincide with their load group.
> >
> > For now, this patch only adds support for:
> >
> >   * Complex Addition with rotation of 90 and 270.
> >
> >   Addition with rotation of the second argument around the Argand plane.
> >     Supported rotations are 90 and 180.
> >
> >     c = a + (b * I) and c = a + (b * I * I * I)
> >
> >   * Complex Multiplication and Multiplication where one operand is
> conjucated.
> >
> >   Complex multiplication and Conjucate Complex multiplication of the
> second
> >      parameter.
> >
> >     c = a * b and c = a * conj (b)
> >
> >   * Complex FMA and FMA where one operand is conjucated.
> >   * Complex FMS and FMS where one operand is conjucated.
> >
> >   Complex FMLA, Conjucate FMLA of the second parameter and FMLS.
> >
> >     c += a * b, c += a * conj (b), c -= a * b and c -= a * conj (b)
> >
> >   For the conjucate cases it supports under fast-math that the operands
> that is
> >   being conjucated be flipped by flipping the arguments to the optab.  This
> >   allows it to support c = conj (a) * b and c += conj (a) * b.
> >
> >   where a, b and c are complex numbers.
> >
> > Complex dot-product is not currently supported in this patch set as
> build_slp fails
> > for it.  This will be provided as a future patch.
> >
> > These are supported for both integer and floating point and as such these
> don't look
> > for real or imaginary pairs but instead rely on the early lowering of complex
> > numbers by GCC and canonization of the operations such that it just
> recognizes any
> > instruction sequence matching the operations requested.
> >
> > To be safe when the it is not sure it can support the operation or if it finds
> > something it does not understand it backs off.  This is done by looking at
> the
> > order of the loads below the nodes.  Anything that changes the order like a
> > VEC_PERM is taken into account.  The order of the loads define which
> operation
> > is being done.   For instance a complex add with conjucate always has the
> second
> > loads in non-contiguous order.   Because this information is queried often it
> is
> > heavily cached and because we never modify nodes in place we can cache
> the
> > operation through all SLP instances.
> >
> > When SLP failes (e.g. due to costing) we dissolve the changes and restore
> the
> > previous relevancies of the stmts that were in a pattern before.
> >
> > [1] https://developer.arm.com/documentation/ddi0487/fc/
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master? (working on moving tests to middend IFN tests instead).
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	* tree-vect-slp-patterns.c: New file.
> > 	* Makefile.in: Use it.
> > 	* doc/passes.texi: Document it.
> > 	* internal-fn.def (COMPLEX_ADD_ROT90, COMPLEX_ADD_ROT270,
> > 	COMPLEX_MUL, COMPLEX_MUL_CONJ, COMPLEX_FMA,
> COMPLEX_FMA_CONJ):
> > 	COMPLEX_FMS, COMPLEX_FMS_CONJ): New.
> > 	* optabs.def (cadd90_optab, cadd270_optab, cmul_optab,
> cmul_conj_optab,
> > 	cmla_optab, cmla_conj_optab, cmls_optab, cmls_conj_optab): New.
> > 	* doc/md.texi: Document them
> >
> > -- inline copy of patch --
> >
> > diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> > index
> 273654cfa2525099ed0d9adc2f9085c8408b096f..4b556bdbbc2f989d894ef2dbe
> 894318858eaa7aa 100644
> > --- a/gcc/Makefile.in
> > +++ b/gcc/Makefile.in
> > @@ -1646,6 +1646,7 @@ OBJS = \
> >  	tree-vect-loop.o \
> >  	tree-vect-loop-manip.o \
> >  	tree-vect-slp.o \
> > +	tree-vect-slp-patterns.o \
> >  	tree-vectorizer.o \
> >  	tree-vector-builder.o \
> >  	tree-vrp.o \
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > index
> 813875b973bc73920f1195c4897ce3491ca713c2..68792e4c3eeca2bce0112fc28e
> ba38700e66dc00 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -1493,7 +1493,7 @@ can be described by a series of letters for each
> operand.  The overall
> >  constraint for an operand is made from the letters for this operand
> >  from the first alternative, a comma, the letters for this operand from
> >  the second alternative, a comma, and so on until the last alternative.
> > -All operands for a single instruction must have the same number of
> > +All operands for a single instruction must have the same number of
> >  alternatives.
> >  @ifset INTERNALS
> >  Here is how it is done for fullword logical-or on the 68000:
> > @@ -1558,17 +1558,17 @@ the first, 1 for the second alternative, etc.).
> @xref{Output Statement}.
> >  @end ifset
> >  @ifclear INTERNALS
> >
> > -So the first alternative for the 68000's logical-or could be written as
> > -@code{"+m" (output) : "ir" (input)}.  The second could be @code{"+r"
> > -(output): "irm" (input)}.  However, the fact that two memory locations
> > -cannot be used in a single instruction prevents simply using @code{"+rm"
> > -(output) : "irm" (input)}.  Using multi-alternatives, this might be
> > +So the first alternative for the 68000's logical-or could be written as
> > +@code{"+m" (output) : "ir" (input)}.  The second could be @code{"+r"
> > +(output): "irm" (input)}.  However, the fact that two memory locations
> > +cannot be used in a single instruction prevents simply using @code{"+rm"
> > +(output) : "irm" (input)}.  Using multi-alternatives, this might be
> >  written as @code{"+m,r" (output) : "ir,irm" (input)}.  This describes
> > -all the available alternatives to the compiler, allowing it to choose
> > +all the available alternatives to the compiler, allowing it to choose
> >  the most efficient one for the current conditions.
> >
> > -There is no way within the template to determine which alternative was
> > -chosen.  However you may be able to wrap your @code{asm} statements
> with
> > +There is no way within the template to determine which alternative was
> > +chosen.  However you may be able to wrap your @code{asm} statements
> with
> >  builtins such as @code{__builtin_constant_p} to achieve the desired
> results.
> >  @end ifclear
> >
> > @@ -3062,7 +3062,7 @@ instruction taking only the upper 16-bits of a
> >  16-bits being 0.
> >
> >  @item L
> > -Integer that is valid as an immediate operand for a
> > +Integer that is valid as an immediate operand for a
> >  shift instruction. Range 0 to 31.
> >
> >  @item M
> > @@ -3076,7 +3076,7 @@ Integer that is valid as an immediate operand for
> >  a custom instruction opcode. Range 0 to 255.
> >
> >  @item P
> > -An immediate operand for R2 andchi/andci instructions.
> > +An immediate operand for R2 andchi/andci instructions.
> >
> >  @item S
> >  Matches immediates which are addresses in the small
> > @@ -6132,6 +6132,97 @@ floating-point mode.
> >
> >  This pattern is not allowed to @code{FAIL}.
> >
> > +@cindex @code{cadd@var{m}@var{n}3} instruction pattern
> > +@item @samp{cadd@var{m}@var{n}3}
> > +Perform a vector addition of complex numbers in operand 1 with operand
> 2
> > +rotated by @var{m} degrees around the argand plane and storing the
> result in
> > +operand 0.  The instruction must perform the operation on data that in
> GCC lane
> > +order where the 0th lane holds the real part and the the imaginary part in
> lane
> > +1.
> 
> So I'm not sure I'm happy with defining this in terms of 'complex
> numbers'.  At least we nowhere specify how complex numbers are laid out in
> vector registers.  For V4DF it could be any of {real0,imag0,real1,imag1},
> {real0,real1, imag0,imag1}, {imag0,real0,imag1,real1} or
> {imag0,imag1, real0,real1}.  While it's easy to give names to semantics
> this way I'd prefer to define this in terms of even/odd lanes of a
> vector.  Maybe it can refer to the aarch64 example of complex number
> instructions implementing such scheme.
> 
> So at _least_ we have to define the layout of complex components
> in the vector modes (preferably in a target independent way), say
> even lanes are real parts and odd lanes are imaginary parts?
> 
> > +
> > +The operation is only supported for vector modes @var{n} and with
> > +rotations @var{m} of 90 or 270.
> > +
> > +This pattern is not allowed to @code{FAIL}.
> > +
> > +@cindex @code{cmla@var{m}4} instruction pattern
> > +@item @samp{cmla@var{m}4}
> > +Perform a vector floating point multiply and accumulate of complex
> numbers
> > +in operand 0, operand 1 and operand 2.
> > +
> > +The instruction must perform the operation on data that in GCC lane
> > +order where the 0th lane holds the real part and the the imaginary part in
> lane
> > +1.
> > +
> > +The operation is only supported for vector modes @var{m}.
> > +
> > +This pattern is not allowed to @code{FAIL}.
> > +
> > +@cindex @code{cmla_conj@var{m}4} instruction pattern
> > +@item @samp{cmla_conj@var{m}4}
> > +Perform a vector floating point multiply and accumulate of complex
> numbers
> > +in operand 0, operand 1 and the conjucate of operand 2.
> > +
> > +The instruction must perform the operation on data that in GCC lane
> > +order where the 0th lane holds the real part and the the imaginary part in
> lane
> > +1.
> > +
> > +The operation is only supported for vector modes @var{m}.
> > +
> > +This pattern is not allowed to @code{FAIL}.
> > +
> > +@cindex @code{cmls@var{m}4} instruction pattern
> > +@item @samp{cmls@var{m}4}
> > +Perform a vector floating point multiply and subtract of complex numbers
> > +in operand 0, operand 1 and operand 2.
> > +
> > +The instruction must perform the operation on data that in GCC lane
> > +order where the 0th lane holds the real part and the the imaginary part in
> lane
> > +1.
> > +
> > +The operation is only supported for vector modes @var{m}.
> > +
> > +This pattern is not allowed to @code{FAIL}.
> > +
> > +@cindex @code{cmls_conj@var{m}4} instruction pattern
> > +@item @samp{cmls_conj@var{m}4}
> > +Perform a vector floating point multiply and subtract of complex numbers
> > +in operand 0, operand 1 and the conjucate of operand 2.
> > +
> > +The instruction must perform the operation on data that in GCC lane
> > +order where the 0th lane holds the real part and the the imaginary part in
> lane
> > +1.
> > +
> > +The operation is only supported for vector modes @var{m}.
> > +
> > +This pattern is not allowed to @code{FAIL}.
> > +
> > +@cindex @code{cmul@var{m}4} instruction pattern
> > +@item @samp{cmul@var{m}4}
> > +Perform a vector floating point multiplication of complex numbers in
> operand 0
> > +and operand 1.
> > +
> > +The instruction must perform the operation on data that in GCC lane
> > +order where the 0th lane holds the real part and the the imaginary part in
> lane
> > +1.
> > +
> > +The operation is only supported for vector modes @var{m}.
> > +
> > +This pattern is not allowed to @code{FAIL}.
> > +
> > +@cindex @code{cmul_conj@var{m}4} instruction pattern
> > +@item @samp{cmul_conj@var{m}4}
> > +Perform a vector floating point multiplication of complex numbers in
> operand 0
> > +and the conjucate of operand 1.
> > +
> > +The instruction must perform the operation on data that in GCC lane
> > +order where the 0th lane holds the real part and the the imaginary part in
> lane
> > +1.
> > +
> > +The operation is only supported for vector modes @var{m}.
> > +
> > +This pattern is not allowed to @code{FAIL}.
> > +
> >  @cindex @code{ffs@var{m}2} instruction pattern
> >  @item @samp{ffs@var{m}2}
> >  Store into operand 0 one plus the index of the least significant 1-bit
> > @@ -7409,7 +7500,7 @@ If this pattern is not defined, then a
> @code{memory_barrier} pattern
> >  will be emitted, followed by a store of the value to the memory operand.
> >
> >  @cindex @code{atomic_compare_and_swap@var{mode}} instruction
> pattern
> > -@item @samp{atomic_compare_and_swap@var{mode}}
> > +@item @samp{atomic_compare_and_swap@var{mode}}
> >  This pattern, if defined, emits code for an atomic compare-and-swap
> >  operation with memory model semantics.  Operand 2 is the memory on
> which
> >  the atomic operation is performed.  Operand 0 is an output operand which
> > @@ -7424,7 +7515,7 @@ if the operation fails.
> >
> >  If memory referred to in operand 2 contains the value in operand 3, then
> >  operand 4 is stored in memory pointed to by operand 2 and fencing based
> on
> > -the memory model in operand 6 is issued.
> > +the memory model in operand 6 is issued.
> >
> >  If memory referred to in operand 2 does not contain the value in operand
> 3,
> >  then fencing based on the memory model in operand 7 is issued.
> > @@ -7500,9 +7591,9 @@ none of these are available a compare-and-swap
> loop will be used.
> >  @itemx @samp{atomic_fetch_or@var{mode}},
> @samp{atomic_fetch_and@var{mode}}
> >  @itemx @samp{atomic_fetch_xor@var{mode}},
> @samp{atomic_fetch_nand@var{mode}}
> >  These patterns emit code for an atomic operation on memory with
> memory
> > -model semantics, and return the original value. Operand 0 is an output
> > -operand which contains the value of the memory location before the
> > -operation was performed.  Operand 1 is the memory on which the atomic
> > +model semantics, and return the original value. Operand 0 is an output
> > +operand which contains the value of the memory location before the
> > +operation was performed.  Operand 1 is the memory on which the atomic
> >  operation is performed.  Operand 2 is the second operand to the binary
> >  operator.  Operand 3 is the memory model to be used by the operation.
> >
> > @@ -7859,7 +7950,7 @@ simplified) from the PDP-11 target:
> >          (plus:HI (match_dup 0)
> >                (const_int -1)))]
> >    ""
> > -
> > +
> >    @{
> >      if (which_alternative == 0)
> >        return "sob %0,%l1";
> > diff --git a/gcc/doc/passes.texi b/gcc/doc/passes.texi
> > index
> a5ae4143a8c1293e674b499120372ee5fe5c412b..c86df5cd843084a5b7933ef99
> a23386891a7b0c1 100644
> > --- a/gcc/doc/passes.texi
> > +++ b/gcc/doc/passes.texi
> > @@ -709,7 +709,8 @@ loop.
> >  The pass is implemented in @file{tree-vectorizer.c} (the main driver),
> >  @file{tree-vect-loop.c} and @file{tree-vect-loop-manip.c} (loop specific
> parts
> >  and general loop utilities), @file{tree-vect-slp} (loop-aware SLP
> > -functionality), @file{tree-vect-stmts.c} and @file{tree-vect-data-refs.c}.
> > +functionality), @file{tree-vect-stmts.c}, @file{tree-vect-data-refs.c} and
> > +@file{tree-vect-slp-patterns.c} containing the SLP pattern matcher.
> >  Analysis of data references is in @file{tree-data-ref.c}.
> >
> >  SLP Vectorization.  This pass performs vectorization of straight-line code.
> The
> > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > index
> 310d37aa53819791b5df1683afca831f08e5892a..acb7d9f3bdc757437d5492a652
> 144ba31c2ef702 100644
> > --- a/gcc/internal-fn.def
> > +++ b/gcc/internal-fn.def
> > @@ -277,12 +277,21 @@ DEF_INTERNAL_FLT_FN (SCALB, ECF_CONST,
> scalb, binary)
> >  DEF_INTERNAL_FLT_FLOATN_FN (FMIN, ECF_CONST, fmin, binary)
> >  DEF_INTERNAL_FLT_FLOATN_FN (FMAX, ECF_CONST, fmax, binary)
> >  DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONST, xorsign, binary)
> > +DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT90, ECF_CONST, cadd90,
> binary)
> > +DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST,
> cadd270, binary)
> > +DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL, ECF_CONST, cmul, binary)
> > +DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL_CONJ, ECF_CONST,
> cmul_conj, binary)
> > +
> >
> >  /* FP scales.  */
> >  DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
> >
> >  /* Ternary math functions.  */
> >  DEF_INTERNAL_FLT_FLOATN_FN (FMA, ECF_CONST, fma, ternary)
> > +DEF_INTERNAL_OPTAB_FN (COMPLEX_FMA, ECF_CONST, cmla, ternary)
> > +DEF_INTERNAL_OPTAB_FN (COMPLEX_FMA_CONJ, ECF_CONST,
> cmla_conj, ternary)
> > +DEF_INTERNAL_OPTAB_FN (COMPLEX_FMS, ECF_CONST, cmls, ternary)
> > +DEF_INTERNAL_OPTAB_FN (COMPLEX_FMS_CONJ, ECF_CONST,
> cmls_conj, ternary)
> >
> >  /* Unary integer ops.  */
> >  DEF_INTERNAL_INT_FN (CLRSB, ECF_CONST | ECF_NOTHROW, clrsb,
> unary)
> > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > index
> 78409aa14537d259bf90277751aac00d452a0d3f..19db9c00896cd08adfd20a0166
> 9990bbbebd79f1 100644
> > --- a/gcc/optabs.def
> > +++ b/gcc/optabs.def
> > @@ -290,6 +290,14 @@ OPTAB_D (atan_optab, "atan$a2")
> >  OPTAB_D (atanh_optab, "atanh$a2")
> >  OPTAB_D (copysign_optab, "copysign$F$a3")
> >  OPTAB_D (xorsign_optab, "xorsign$F$a3")
> > +OPTAB_D (cadd90_optab, "cadd90$a3")
> > +OPTAB_D (cadd270_optab, "cadd270$a3")
> > +OPTAB_D (cmul_optab, "cmul$a3")
> > +OPTAB_D (cmul_conj_optab, "cmul_conj$a3")
> > +OPTAB_D (cmla_optab, "cmla$a4")
> > +OPTAB_D (cmla_conj_optab, "cmla_conj$a4")
> > +OPTAB_D (cmls_optab, "cmls$a4")
> > +OPTAB_D (cmls_conj_optab, "cmls_conj$a4")
> >  OPTAB_D (cos_optab, "cos$a2")
> >  OPTAB_D (cosh_optab, "cosh$a2")
> >  OPTAB_D (exp10_optab, "exp10$a2")
> > diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
> > new file mode 100644
> > index
> 0000000000000000000000000000000000000000..7a2c64aeb9f212cd3ee18adf11
> 0e6ac6696249fd
> > --- /dev/null
> > +++ b/gcc/tree-vect-slp-patterns.c
> > @@ -0,0 +1,1844 @@
> > +/* SLP - Pattern matcher on SLP trees
> > +   Copyright (C) 2020 Free Software Foundation, Inc.
> > +
> > +This file is part of GCC.
> > +
> > +GCC is free software; you can redistribute it and/or modify it under
> > +the terms of the GNU General Public License as published by the Free
> > +Software Foundation; either version 3, or (at your option) any later
> > +version.
> > +
> > +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> > +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> > +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
> License
> > +for more details.
> > +
> > +You should have received a copy of the GNU General Public License
> > +along with GCC; see the file COPYING3.  If not see
> > +<http://www.gnu.org/licenses/>.  */
> > +
> > +#include "config.h"
> > +#include "system.h"
> > +#include "coretypes.h"
> > +#include "backend.h"
> > +#include "target.h"
> > +#include "rtl.h"
> > +#include "tree.h"
> > +#include "gimple.h"
> > +#include "tree-pass.h"
> > +#include "ssa.h"
> > +#include "optabs-tree.h"
> > +#include "insn-config.h"
> > +#include "recog.h"		/* FIXME: for insn_data */
> > +#include "fold-const.h"
> > +#include "stor-layout.h"
> > +#include "gimple-iterator.h"
> > +#include "cfgloop.h"
> > +#include "tree-vectorizer.h"
> > +#include "langhooks.h"
> > +#include "gimple-walk.h"
> > +#include "dbgcnt.h"
> > +#include "tree-vector-builder.h"
> > +#include "vec-perm-indices.h"
> > +#include "gimple-fold.h"
> > +#include "internal-fn.h"
> > +
> > +/* SLP Pattern matching mechanism.
> > +
> > +  This extension to the SLP vectorizer allows one to transform the
> generated SLP
> > +  tree based on any pattern.  The difference between this and the normal
> vect
> > +  pattern matcher is that unlike the former, this matcher allows you to
> match
> > +  with instructions that do not belong to the same SSA dominator graph.
> > +
> > +  The only requirement that this pattern matcher has is that you are only
> > +  only allowed to either match an entire group or none.
> > +
> > +  The pattern matcher currently only allows you to perform replacements
> to
> > +  internal functions.
> > +
> > +  Once the patterns are matched it is one way, these cannot be undone.  It
> is
> > +  currently not supported to match patterns recursively.
> > +
> > +  To add a new pattern, implement the vect_pattern class and add the
> type to
> > +  slp_patterns.
> > +
> > +*/
> > +
> >
> +/*********************************************************
> **********************
> > + * vect_pattern class
> > +
> **********************************************************
> ********************/
> > +
> > +/* Default implementation of recognize that peforms matching, validation
> and
> > +   replacement of nodes but that can be overriden if required.  */
> > +
> > +bool
> > +vect_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
> > +			 vec_info *vinfo)
> > +{
> > +  if (this->matches (perm_cache))
> > +    {
> > +      tree vectype = SLP_TREE_VECTYPE (*this->m_node);
> > +      if (dump_enabled_p ())
> > +	dump_printf_loc (MSG_NOTE, vect_location,
> > +			 "Found %s pattern in SLP tree\n",
> > +			 this->get_name ());
> > +
> > +      if (this->is_optab_supported_p (vectype, OPTIMIZE_FOR_SPEED))
> > +	{
> > +	  if (dump_enabled_p ())
> > +	    dump_printf_loc (MSG_NOTE, vect_location,
> > +			     "Target supports %s vectorization with
> mode %T\n",
> > +			     internal_fn_name (this->get_ifn ()), vectype);
> > +	}
> > +      else
> > +	{
> > +	  if (dump_enabled_p ())
> > +	    {
> > +	      if (!vectype)
> > +		dump_printf_loc (MSG_NOTE, vect_location,
> > +				 "Target does not support vector type
> for %T\n",
> > +				 SLP_TREE_DEF_TYPE (*this->m_node));
> > +	      else
> > +		dump_printf_loc (MSG_NOTE, vect_location,
> > +				 "Target does not support %s for vector type
> "
> > +				 "%T\n", internal_fn_name (this->get_ifn ()),
> > +				 vectype);
> > +	    }
> > +          return false;
> > +	}
> > +    }
> > +  else
> > +    return false;
> > +
> > +  if (!this->validate_p (perm_cache))
> > +    {
> > +      if (dump_enabled_p ())
> > +	dump_printf_loc (MSG_NOTE, vect_location,
> > +			"transformation for %s not valid due to post "
> > +			"condition\n",
> > +			internal_fn_name (this->get_ifn ()));
> > +      return false;
> > +    }
> > +
> > +  if (dump_enabled_p ())
> > +    dump_printf_loc (MSG_NOTE, vect_location, "Creating vec
> patterns\n");
> > +
> > +  gcall* call_stmt = this->build (vinfo);
> > +  if (dump_enabled_p ())
> > +    dump_printf_loc (MSG_NOTE, vect_location, "\t %p stmt: %G",
> > +		     *this->m_node, call_stmt);
> > +
> > +  return true;
> > +}
> > +
> >
> +/*********************************************************
> **********************
> > + * General helper types
> > +
> **********************************************************
> ********************/
> > +
> > +/* The COMPLEX_OPERATION enum denotes the possible pair of
> operations that can
> > +   be matched when looking for expressions that we are interested
> matching for
> > +   complex numbers addition and mla.  */
> > +
> > +typedef enum _complex_operation : unsigned {
> > +  PLUS_PLUS,
> > +  MINUS_PLUS,
> > +  PLUS_MINUS,
> > +  MULT_MULT,
> > +  CMPLX_NONE
> > +} complex_operation_t;
> > +
> > +/* A simple pair type used for ordering of combine operations.  */
> > +
> > +typedef struct map_ { int a; int b; } map_t;
> > +
> >
> +/*********************************************************
> **********************
> > + * General helper functions
> > +
> **********************************************************
> ********************/
> > +
> > +/* Helper function of linear_loads_p that checks to see if the load
> permutation
> > +   is sequential and in monotonically increasing order of loads with no gaps.
> > +*/
> > +
> > +static inline bool
> > +is_linear_load_p (load_permutation_t loads)
> > +{
> > +  if (loads.length() == 0)
> > +    return false;
> > +
> > +  unsigned leader = loads[0];
> > +  unsigned load, i;
> > +  FOR_EACH_VEC_ELT_FROM (loads, i, load, 1)
> > +    if (load != ++leader)
> > +      return false;
> > +  return true;
> > +}
> > +
> > +
> > +/* Check to see if all loads rooted in ROOT are linear.  Linearity is
> > +   defined as having no gaps between values loaded.  */
> > +
> > +static load_permutation_t
> > +linear_loads_p (slp_tree_to_load_perm_map_t *perm_cache, slp_tree
> root,
> > +		bool *linear)
> > +{
> > +  *linear = false;
> > +  if (!root)
> > +    return vNULL;
> > +
> > +  unsigned i;
> > +  load_permutation_t loads = vNULL;
> > +  load_permutation_t *tmp;
> > +
> > +  if ((tmp = perm_cache->get (root)) != NULL)
> > +    {
> > +      *linear = is_linear_load_p (*tmp);
> > +      return *tmp;
> > +    }
> > +
> > +  perm_cache->put (root, vNULL);
> > +
> > +  /* If it's a load node, then just read the load permute.  */
> > +  if (SLP_TREE_LOAD_PERMUTATION (root).exists ())
> > +    {
> > +      loads = SLP_TREE_LOAD_PERMUTATION (root);
> > +      perm_cache->put (root, loads);
> > +      if (!is_linear_load_p (loads))
> > +	return loads;
> > +    }
> > +  else if (SLP_TREE_DEF_TYPE (root) == vect_external_def)
> > +    {
> > +       loads.create (SLP_TREE_LANES (root));
> > +       tree op;
> > +       FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_OPS (root), i, op)
> > +	 {
> > +	   gimple *defstmt = SSA_NAME_DEF_STMT (op);
> > +	   if (!is_gimple_assign (defstmt))
> > +	     return vNULL;
> > +
> > +	   switch (gimple_assign_rhs_code (defstmt))
> > +	   {
> > +	     case IMAGPART_EXPR:
> > +	       loads.safe_push (1);
> > +	       break;
> > +	     case REALPART_EXPR:
> > +	       loads.safe_push (0);
> > +	       break;
> > +	     default:
> > +	       {
> > +		 loads.release ();
> > +		 return vNULL;
> > +	       }
> > +	   }
> > +	 }
> > +
> > +       perm_cache->put (root, loads);
> > +       if (!is_linear_load_p (loads))
> > +	 return loads;
> > +    }
> > +  else if (SLP_TREE_DEF_TYPE (root) != vect_internal_def)
> > +    return vNULL;
> > +
> > +
> > +
> > +  auto_vec<load_permutation_t> all_loads;
> > +  bool is_perm = SLP_TREE_LANE_PERMUTATION (root).exists ();
> > +
> > +  slp_tree child;
> > +  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (root), i, child)
> > +    {
> > +      /* Seed it with a value to prevent infinite recursions.  */
> > +      loads = linear_loads_p (perm_cache, child, linear);
> > +      if ((!*linear && !is_perm) || !loads.exists ())
> > +	return loads;
> > +
> > +      all_loads.safe_push (loads);
> > +    }
> > +
> > +  if (is_perm)
> > +    {
> > +      lane_permutation_t perm = SLP_TREE_LANE_PERMUTATION (root);
> > +      load_permutation_t nloads;
> > +      nloads.create (SLP_TREE_LANES (root));
> > +      nloads.quick_grow (SLP_TREE_LANES (root));
> > +      for (i = 0; i < SLP_TREE_LANES (root); i++)
> > +	nloads[i] = all_loads[perm[i].first][perm[i].second];
> > +
> > +      perm_cache->put (root, nloads);
> > +      if (!is_linear_load_p (nloads))
> > +	return nloads;
> > +      loads = nloads;
> > +    }
> > +
> > +  perm_cache->put (root, loads);
> > +  *linear = true;
> > +  return loads;
> > +}
> > +
> > +/* Builds a permutation group from the operands in OPS and stores it in
> BLOCKS.
> > +   The group describes how to combine the operators to get a valid linear
> node.
> > +
> > +   This is used when combining multiple children from a two_operators
> node into
> > +   one using a lane permute to select the appropriate lane. As an example
> the
> > +   permute { [0 0] [1 4] [2 2] [3 3] [1 4] [5 5] } says the nodes which occur
> > +   twice in a group, e.g [0 0] only needs itself to possibly be made linear
> > +   whereas [1 4] means to combine the nodes 1 and 4. */
> > +
> > +static void
> > +vect_build_perm_groups (slp_tree_to_load_perm_map_t *perm_cache,
> > +			map_t *blocks, vec<slp_tree> ops)
> > +{
> > +  slp_tree op;
> > +  unsigned i;
> > +  bool is_linear = false;
> > +  unsigned min_eq = -1, max_eq = 0;
> > +  unsigned min_idx = 0, max_idx = 0;
> > +  FOR_EACH_VEC_ELT (ops, i, op)
> > +    {
> > +      load_permutation_t perms = linear_loads_p (perm_cache, op,
> &is_linear);
> > +      unsigned x, imin = -1, imax = 0;
> > +      for (x = 0; x < perms.length () && !is_linear; x++)
> > +	{
> > +	  imin = MIN (imin, perms[x]);
> > +	  imax = MAX (imax, perms[x]);
> > +	}
> > +
> > +	if (imin != imax || perms.length () == 0 || is_linear)
> > +	  blocks[i] = {i, i};
> > +	else
> > +	  {
> > +	    if (imin <= min_eq)
> > +	      {
> > +		min_eq = imin;
> > +		min_idx = i;
> > +	      }
> > +
> > +	    if (imin >= max_eq)
> > +	      {
> > +		max_eq = imin;
> > +		max_idx = i;
> > +	      }
> > +	  }
> > +    }
> > +
> > +  /* Now fill in the gap.  */
> > +  blocks[min_idx] = {min_idx, max_idx};
> > +  blocks[max_idx] = {min_idx, max_idx};
> > +
> > +  if (dump_enabled_p ())
> > +    {
> > +      dump_printf_loc (MSG_NOTE, vect_location, "pattern group: { ");
> > +      for (i = 0; i < ops.length (); i++)
> > +	dump_printf (MSG_NOTE,"[%d %d] ", blocks[i].a, blocks[i].b);
> > +      dump_printf (MSG_NOTE,"}\n");
> > +    }
> > +
> > +}
> > +
> > +/* This function attempts to make a node rooted in NODE with parent
> PARENT
> > +   linear.  If the node if already linear than the node itself is returned
> > +   in RESULT.
> > +
> > +   If the node is not linear then a new VEC_PERM_EXPR node is created
> with a
> > +   lane permute that when applied will make the node linear.   If such a
> > +   permute cannot be created then FALSE is returned from the function.
> > +
> > +   Here linearity is defined as having a sequential, monotically increasing
> > +   load position inside the load permute generated by the loads reachable
> from
> > +   NODE.  */
> > +
> > +static bool
> > +vect_slp_make_linear (slp_tree_to_load_perm_map_t *perm_cache,
> > +		      slp_tree parent, slp_tree node, slp_tree *result)
> > +{
> > +  bool is_linear = false;
> > +  unsigned x, val;
> > +  load_permutation_t load_perm = linear_loads_p (perm_cache, node,
> &is_linear);
> > +  if (is_linear)
> > +    {
> > +      *result = node;
> > +      return true;
> > +    }
> > +
> > +  /* Attempt to linearise the permute.  */
> > +  vec<std::pair<unsigned, unsigned> > zipped;
> > +  zipped.create (load_perm.length ());
> > +  FOR_EACH_VEC_ELT (load_perm, x, val)
> > +    zipped.quick_push (std::make_pair (val, x));
> > +
> > +  typedef const std::pair<unsigned, unsigned>* cmp_t;
> > +  zipped.qsort ([](const void *a, const void *b) -> int
> > +    { return (int)((cmp_t)a)->first - (int)((cmp_t)b)->first; });
> > +
> > +  /* Verify if we have a linear permute sequence.  */
> > +  if (zipped.length () > 0)
> > +    {
> > +      unsigned leader = zipped[0].first;
> > +      for (x = 1; x < zipped.length (); x++)
> > +	if(!(is_linear = (zipped[x].first == ++leader)))
> > +	  break;
> > +    }
> > +
> > +  if (!is_linear)
> > +    {
> > +      if (dump_enabled_p ())
> > +	dump_printf_loc (MSG_NOTE, vect_location,
> > +			"Loads could not be made linear %p\n",
> > +			node);
> > +      zipped.release ();
> > +      return false;
> > +  }
> > +
> > +  for (x = 0; x < zipped.length (); x++)
> > +    zipped[x].first = 0;
> > +
> > +  /* Create the new permute node and store it instead.  */
> > +  slp_tree vnode = vect_create_new_slp_node (vNULL, 1);
> > +  SLP_TREE_CODE (vnode) = VEC_PERM_EXPR;
> > +  SLP_TREE_LANE_PERMUTATION (vnode) = zipped;
> > +  SLP_TREE_VECTYPE (vnode) = SLP_TREE_VECTYPE (parent);
> > +  SLP_TREE_CHILDREN (vnode).quick_push (node);
> > +  SLP_TREE_REF_COUNT (vnode) = 1;
> > +  SLP_TREE_LANES (vnode) = SLP_TREE_LANES (node);
> > +  SLP_TREE_REPRESENTATIVE (vnode) = SLP_TREE_REPRESENTATIVE
> (parent);
> > +  *result = vnode;
> > +  return is_linear;
> > +}
> > +
> > +/* Helper utility to check to see if the permutation PERM is one that can
> be
> > +   used in a node combination operation.  This is defined as the permute
> not
> > +   having all the elements being the same. e.g [0 0].  */
> > +
> > +static inline bool
> > +vect_can_combine_node_p (load_permutation_t perm, bool is_linear)
> > +{
> > +  if (is_linear)
> > +    return false;
> > +
> > +  unsigned i, x;
> > +  FOR_EACH_VEC_ELT (perm, i, x)
> > +    if (perm[0] != x)
> > +      return false;
> > +
> > +  return true;
> > +}
> > +
> > +/* This function combines the nodes in MAP together to make a new
> node using a
> > +   lane permute.  The nodes to combine are stored in ENTRIES and the
> resulting
> > +   node is returned in RESULT.
> > +
> > +   If the nodes are already linear then this function fails and returns FALSE.
> > +   Otherwise it returns the new node and TRUE.  */
> > +
> > +static bool
> > +vect_slp_make_combine_linear (slp_tree_to_load_perm_map_t
> *perm_cache,
> > +			      slp_tree parent, vec<slp_tree> entries, map_t
> map,
> > +			      slp_tree *result)
> > +{
> > +  if (map.a == map.b)
> > +    return false;
> > +
> > +  slp_tree node_a = entries[map.a];
> > +  slp_tree node_b = entries[map.b];
> > +
> > +  bool is_a_linear = false;
> > +  bool is_b_linear = false;
> > +
> > +  load_permutation_t load_perm_a
> > +    = linear_loads_p (perm_cache, node_a, &is_a_linear);
> > +  if (!vect_can_combine_node_p (load_perm_a, is_a_linear))
> > +    return false;
> > +
> > +  load_permutation_t load_perm_b
> > +    = linear_loads_p (perm_cache, node_b, &is_b_linear);
> > +  if (!vect_can_combine_node_p (load_perm_b, is_b_linear))
> > +    return false;
> > +
> > +  if (!load_perm_a.exists () || !load_perm_b.exists ())
> > +    return false;
> > +
> > +  /* Now we need to figure which node is first.  */
> > +  auto_vec<slp_tree> nodes;
> > +  nodes.create (2);
> > +  vec<std::pair<unsigned, unsigned> > perm;
> > +  perm.create (2);
> > +  if (load_perm_a[0] < load_perm_b[0])
> > +    {
> > +      perm.quick_push (std::make_pair (0, 0));
> > +      perm.quick_push (std::make_pair (1, 0));
> > +    }
> > +  else
> > +    {
> > +      perm.quick_push (std::make_pair (1, 0));
> > +      perm.quick_push (std::make_pair (0, 0));
> > +    }
> > +
> > +  nodes.quick_push (node_a);
> > +  nodes.quick_push (node_b);
> > +
> > +  slp_tree vnode = vect_create_new_slp_node (vNULL, 1);
> > +  SLP_TREE_CODE (vnode) = VEC_PERM_EXPR;
> > +  SLP_TREE_LANE_PERMUTATION (vnode) = perm;
> > +  SLP_TREE_VECTYPE (vnode) = SLP_TREE_VECTYPE (parent);
> > +  SLP_TREE_CHILDREN (vnode).safe_splice (nodes);
> > +  SLP_TREE_REF_COUNT (vnode) = 1;
> > +  SLP_TREE_LANES (vnode) = SLP_TREE_LANES (parent);
> > +  SLP_TREE_REPRESENTATIVE (vnode) = SLP_TREE_REPRESENTATIVE
> (parent);
> > +  *result = vnode;
> > +  return true;
> > +}
> > +
> > +/* Checks to see of the expression represented by NODE is a gimple
> assign with
> > +   code CODE.  */
> > +
> > +static inline bool
> > +vect_match_expression_p (slp_tree node, tree_code code)
> > +{
> > +  if (!node
> > +      || !SLP_TREE_REPRESENTATIVE (node))
> > +    return false;
> > +
> > +  gimple* expr = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE
> (node));
> > +  if (!is_gimple_assign (expr)
> > +      || gimple_assign_rhs_code (expr) != code)
> > +    return false;
> > +
> > +  return true;
> > +}
> > +
> > +/* Checks to see if the expression represented by NODE is a call to the
> internal
> > +   function FN.  */
> > +
> > +static inline bool
> > +vect_match_call_p (slp_tree node, internal_fn fn)
> > +{
> > +  if (!node
> > +      || !SLP_TREE_REPRESENTATIVE (node))
> > +    return false;
> > +
> > +  gimple* expr = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE
> (node));
> > +  if (!gimple_call_internal_p (expr, fn))
> > +    return false;
> > +
> > +   return true;
> > +}
> > +
> > +/* Checks to see if NODE is a two_operands node whom's operands are
> calls to the
> > +   internal function FN.  */
> > +
> > +static bool
> > +vect_match_two_call_p (slp_tree node, internal_fn fn)
> > +{
> > +  if (!SLP_TREE_REPRESENTATIVE (node)
> > +      || SLP_TREE_CODE (node) != VEC_PERM_EXPR
> > +      || SLP_TREE_CHILDREN (node).length () != 2)
> > +    return false;
> > +
> > +  vec<slp_tree> children = SLP_TREE_CHILDREN (node);
> > +
> > +  return vect_match_call_p (children[0], fn)
> > +	 && vect_match_call_p (children[1], fn);
> > +}
> > +
> > +/* Check if the given lane permute in PERMUTES matches an alternating
> sequence
> > +   of {P0 P1 P0 P1 ...}.  This to account for unrolled loops.  Further mode
> > +   there resulting permute must be linear.   */
> > +
> > +static bool
> > +vect_check_lane_permute (lane_permutation_t &permutes,
> > +			 unsigned p0, unsigned p1)
> > +{
> > +  if (permutes.length () == 0)
> > +    return false;
> > +
> > +  unsigned val[2] = {p0, p1};
> > +  unsigned seed = permutes[0].second;
> > +  for (unsigned i = 0; i < permutes.length (); i++)
> > +    if (permutes[i].first != val[i % 2]
> > +	|| permutes[i].second != seed++)
> > +      return false;
> > +
> > +  return true;
> > +}
> > +
> > +/* This function will match the two gimple expressions representing
> NODE1 and
> > +   NODE2 in parallel and returns the pair operation that represents the two
> > +   expressions in the two statements.
> > +
> > +   If match is successful then the corresponding complex_operation is
> > +   returned and the arguments to the two matched operations are
> returned in OPS.
> > +
> > +   If TWO_OPERANDS it is expected that the LANES of the parent
> VEC_PERM select
> > +   from the two nodes alternatingly.
> > +
> > +   If unsuccessful then CMPLX_NONE is returned and OPS is untouched.
> > +
> > +   e.g. the following gimple statements
> > +
> > +   stmt 0 _39 = _37 + _12;
> > +   stmt 1 _6 = _38 - _36;
> > +
> > +   will return PLUS_MINUS along with OPS containing {_37, _12, _38, _36}.
> > +*/
> > +
> > +static complex_operation_t
> > +vect_detect_pair_op (slp_tree node1, slp_tree node2,
> lane_permutation_t &lanes,
> > +		     bool two_operands = true, vec<slp_tree> *ops = NULL)
> > +{
> > +  complex_operation_t result = CMPLX_NONE;
> > +
> > +  if (vect_match_expression_p (node1, MINUS_EXPR)
> > +      && vect_match_expression_p (node2, PLUS_EXPR)
> > +      && (!two_operands || vect_check_lane_permute (lanes, 0, 1)))
> > +    result = MINUS_PLUS;
> > +  else if (vect_match_expression_p (node1, PLUS_EXPR)
> > +	   && vect_match_expression_p (node2, MINUS_EXPR)
> > +	   && (!two_operands || vect_check_lane_permute (lanes, 0, 1)))
> > +    result = PLUS_MINUS;
> > +  else if (vect_match_expression_p (node1, PLUS_EXPR)
> > +	   && vect_match_expression_p (node2, PLUS_EXPR))
> > +    result = PLUS_PLUS;
> > +  else if (vect_match_expression_p (node1, MULT_EXPR)
> > +	   && vect_match_expression_p (node2, MULT_EXPR))
> > +    result = MULT_MULT;
> > +
> > +  if (result != CMPLX_NONE && ops != NULL)
> > +    {
> > +      ops->create (2);
> > +      ops->quick_push (node1);
> > +      ops->quick_push (node2);
> > +    }
> > +  return result;
> > +}
> > +
> > +/* Overload of vect_detect_pair_op that matches against the
> representative
> > +   statements in the children of NODE.  It is expected that NODE has
> exactly
> > +   two children and when TWO_OPERANDS then NODE must be a
> VEC_PERM.  */
> > +
> > +static complex_operation_t
> > +vect_detect_pair_op (slp_tree node, bool two_operands = true,
> > +		     vec<slp_tree> *ops = NULL)
> > +{
> > +  if (!two_operands && SLP_TREE_CODE (node) == VEC_PERM_EXPR)
> > +    return CMPLX_NONE;
> > +
> > +  if (SLP_TREE_CHILDREN (node).length () != 2)
> > +    return CMPLX_NONE;
> > +
> > +  vec<slp_tree> children = SLP_TREE_CHILDREN (node);
> > +  lane_permutation_t &lanes = SLP_TREE_LANE_PERMUTATION (node);
> > +
> > +  return vect_detect_pair_op (children[0], children[1], lanes,
> two_operands,
> > +			      ops);
> > +}
> > +
> > +/* This function marks every statement that is being replaced during the
> > +   the pattern matching as PURE.  Normally when replacing a statement
> due
> > +   to a pattern we add the statement to the
> STMT_VINFO_PATTERN_DEF_SEQ of
> > +   the pattern that is replacing them.  In this case however this won't
> > +   work as when doing the replacement we are changing the nodes that
> are
> > +   used by the statements.  This means that when vectorized the SSA chain
> > +   is different than in the BB.
> > +
> > +   Declaring the statements as part of the sequence will then cause SSA
> > +   verification to fail as we may refer to statements that were not in the
> > +   original USE-DEF chain of the statement we are replacing.
> > +
> > +   The relevancy of the statements are saved and are later restored if SLP
> is to
> > +   be dissolved.  */
> > +
> > +static void
> > +vect_mark_stmts_as_in_pattern (hash_set<slp_tree> *cache, slp_tree
> node)
> > +{
> > +  if (cache->add (node))
> > +    return;
> > +
> > +  unsigned i;
> > +  stmt_vec_info stmt_info;
> > +  slp_tree child;
> > +
> > +  FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info)
> > +    {
> > +      if (gimple_assign_load_p (STMT_VINFO_STMT (stmt_info)))
> > +       return;
> > +
> > +      vect_save_relevancy (stmt_info);
> > +      STMT_VINFO_SLP_VECT_ONLY (stmt_info) = true;
> > +      STMT_SLP_TYPE (stmt_info) = pure_slp;
> > +    }
> > +
> > +  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
> > +    vect_mark_stmts_as_in_pattern (cache, child);
> > +}
> > +
> > +
> >
> +/*********************************************************
> **********************
> > + * complex_pattern class
> > +
> **********************************************************
> ********************/
> > +
> > +/* SLP Complex Numbers pattern matching.
> > +
> > +  As an example, the following simple loop:
> > +
> > +    double a[restrict N]; double b[restrict N]; double c[restrict N];
> > +
> > +    for (int i=0; i < N; i+=2)
> > +    {
> > +      c[i] = a[i] - b[i+1];
> > +      c[i+1] = a[i+1] + b[i];
> > +    }
> > +
> > +  which represents a complex addition on with a rotation of 90* around
> the
> > +  argand plane. i.e. if `a` and `b` were complex numbers then this would be
> the
> > +  same as `a + (b * I)`.
> > +
> > +  Here the expressions for `c[i]` and `c[i+1]` are independent but have to
> be
> > +  both recognized in order for the pattern to work.  As an SLP tree this is
> > +  represented as
> > +
> > +                +--------------------------------+
> > +                |       stmt 0 *_9 = _10;        |
> > +                |       stmt 1 *_15 = _16;       |
> > +                +--------------------------------+
> > +                                |
> > +                                |
> > +                                v
> > +                +--------------------------------+
> > +                |     stmt 0 _10 = _4 - _8;      |
> > +                |    stmt 1 _16 = _12 + _14;     |
> > +                | lane permutation { 0[0] 1[1] } |
> > +                +--------------------------------+
> > +                            |        |
> > +                            |        |
> > +                            |        |
> > +               +-----+      |        |      +-----+
> > +               |     |      |        |      |     |
> > +         +-----| { } |<-----+        +----->| { } --------+
> > +         |     |     |   +------------------|     |       |
> > +         |     +-----+   |                  +-----+       |
> > +         |        |      |                                |
> > +         |        |      |                                |
> > +         |        +------|------------------+             |
> > +         |               |                  |             |
> > +         v               v                  v             v
> > +     +--------------------------+     +--------------------------------+
> > +     |     stmt 0 _8 = *_7;     |     |        stmt 0 _4 = *_3;        |
> > +     |    stmt 1 _14 = *_13;    |     |       stmt 1 _12 = *_11;       |
> > +     | load permutation { 1 0 } |     |    load permutation { 0 1 }    |
> > +     +--------------------------+     +--------------------------------+
> > +
> > +  The pattern matcher allows you to replace both statements 0 and 1 or
> none at
> > +  all.  Because this operation is a two operands operation the actual nodes
> > +  being replaced are those in the { } nodes.  The actual scalar statements
> > +  themselves are not replaced or used during the matching but instead the
> > +  SLP_TREE_REPRESENTATIVE statements are inspected.  You are also
> allowed to
> > +  replace and match on any number of nodes.
> > +
> > +  Because the pattern matcher matches on the representative statement
> for the
> > +  SLP node the case of two_operators it allows you to match the children
> of the
> > +  node.  This is done using the method `recognize ()`.
> > +
> > +*/
> > +
> > +/* The complex_pattern class contains common code for pattern
> matchers that work
> > +   on complex numbers.  These provide functionality to allow de-
> construction and
> > +   validation of sequences depicting/transforming REAL and IMAG pairs.  */
> > +
> > +class complex_pattern : public vect_pattern
> > +{
> > +  protected:
> > +    vec<slp_tree> *m_nodes = NULL;
> > +    complex_pattern (slp_tree *node)
> > +      : vect_pattern (node)
> > +    { }
> > +
> > +  public:
> > +    bool validate_p (slp_tree_to_load_perm_map_t *);
> > +    gcall *build (vec_info *);
> > +};
> > +
> > +/* The post transform and validation function for the complex number
> > +   patterns.  This will re-arrange the tree and re-organize the nodes such
> > +   that they can be used by the complex number instructions that are to be
> > +   created.  It does this by doing the following steps:
> > +
> > +   For each node that isn't already linear a new permute node is created
> which
> > +   when applied makes the node linear.  If such permutation does not exist
> then
> > +   the function returns FALSE.  For nodes that are already linear no work is
> > +   done.   */
> > +
> > +bool
> > +complex_pattern::validate_p (slp_tree_to_load_perm_map_t
> *perm_cache)
> > +{
> > +  slp_tree node;
> > +  unsigned ix;
> > +  auto_vec<slp_tree> workset;
> > +  FOR_EACH_VEC_ELT (this->m_ops, ix, node)
> > +    {
> > +      auto_vec<slp_tree> nodes;
> > +      nodes.create (this->m_num_args);
> > +      slp_tree tmp = NULL;
> > +      auto_vec<slp_tree> new_nodes;
> > +
> > +      unsigned i;
> > +      FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, tmp)
> > +	{
> > +	  slp_tree vnode = NULL;
> > +	  if (vect_slp_make_linear (perm_cache, node, tmp, &vnode))
> > +	    {
> > +	      nodes.quick_push (vnode);
> > +	      if (vnode != tmp)
> > +		new_nodes.safe_push (vnode);
> > +	    }
> > +	  else
> > +	    {
> > +	      FOR_EACH_VEC_ELT (new_nodes, i, tmp)
> > +		delete tmp;
> > +
> > +	      FOR_EACH_VEC_ELT (workset, i, tmp)
> > +		delete tmp;
> > +
> > +	      nodes.release();
> > +	      return false;
> > +	    }
> > +	}
> > +
> > +      slp_tree new_node
> > +	= vect_create_new_slp_node (SLP_TREE_SCALAR_STMTS (node),
> > +				    SLP_TREE_CHILDREN (node).length ());
> > +      SLP_TREE_VECTYPE (new_node) = SLP_TREE_VECTYPE (node);
> > +      SLP_TREE_LANE_PERMUTATION (new_node) =
> SLP_TREE_LANE_PERMUTATION (node);
> > +      SLP_TREE_CODE (new_node) = SLP_TREE_CODE (node);
> > +      SLP_TREE_REF_COUNT (new_node) = SLP_TREE_REF_COUNT (node);
> > +      SLP_TREE_REPRESENTATIVE (new_node) =
> SLP_TREE_REPRESENTATIVE (node);
> > +      SLP_TREE_CHILDREN (new_node).safe_splice (nodes);
> > +      SLP_TREE_LANES (new_node) = SLP_TREE_LANES (node);
> > +
> > +      workset.safe_push (new_node);
> > +    }
> > +
> > +  FOR_EACH_VEC_ELT (workset, ix, node)
> > +    SLP_TREE_CHILDREN (*this->m_node)[ix] = node;
> > +
> > +  return true;
> > +}
> > +
> > +
> > +/* Create a replacement pattern statement for each node in m_node and
> inserts
> > +   the new statement into m_node as the new representative statement.
> The old
> > +   statement is marked as being in a pattern defined by the new statement.
> The
> > +   statement is created as call to internal function IFN with m_num_args
> > +   arguments.
> > +
> > +   Futhermore the new pattern is also added to the vectorization
> information
> > +   structure VINFO and the old statement STMT_INFO is marked as unused
> while
> > +   the new statement is marked as used and the number of SLP uses of the
> new
> > +   statement is incremented.
> > +
> > +   The newly created SLP nodes are marked as SLP only and will be
> dissolved
> > +   if SLP is aborted.
> > +
> > +   The newly created gimple call is returned and the BB remains unchanged.
> > +
> > +   This default method is designed to only match against simple operands
> where
> > +   all the input and output types are the same.
> > +*/
> > +
> > +gcall *
> > +complex_pattern::build (vec_info *vinfo)
> > +{
> > +  stmt_vec_info stmt_info;
> > +
> > +  auto_vec<tree> args;
> > +  args.create (this->m_num_args);
> > +  args.quick_grow_cleared (this->m_num_args);
> > +  slp_tree node;
> > +  unsigned ix;
> > +  stmt_vec_info call_stmt_info;
> > +  gcall *call_stmt = NULL;
> > +
> > +  FOR_EACH_VEC_ELT (*this->m_nodes, ix, node)
> 
> So elsewhere I see m_nodes assigned from SLP_TREE_CHILDREN
> of m_node.  I assume m_node is the VEC_PERM of the two-operator
> operation.  I expected we create a call for m_node here, not
> for all children?

It depends on the pattern. For most, particularly those of two_operands nodes,
unless I have mistaken something it's the children of the node that determines
what operation is being performed isn't it?

So for  ADD for instance, the tree is

node 0x49aeae0 (max_nunits=4, refcnt=2)
 stmt 0 REALPART_EXPR <*_10> = _4;
 stmt 1 IMAGPART_EXPR <*_10> = _22;
 children 0x49b0b20
node 0x49b0b20 (max_nunits=4, refcnt=2)
 stmt 0 _4 = _13 - _6;
 stmt 1 _22 = _8 + _12;
 lane permutation { 0[0] 1[1] }
 children 0x49b0430 0x49a73c0
node 0x49b0430 (max_nunits=1, refcnt=1)
 { }
 children 0x49ad660 0x499e400
node 0x49ad660 (max_nunits=4, refcnt=3)
 stmt 0 _13 = REALPART_EXPR <*_3>;
 stmt 1 _12 = IMAGPART_EXPR <*_3>;
 load permutation { 0 1 }
node 0x499e400 (max_nunits=4, refcnt=3)
 stmt 0 _6 = IMAGPART_EXPR <*_5>;
 stmt 1 _8 = REALPART_EXPR <*_5>;
 load permutation { 1 0 }
node 0x49a73c0 (max_nunits=1, refcnt=1)
 { }
 children 0x49ad660 0x499e400

where the VEC_PERM node itself just contains the permute and the children of it
0x49b0430 and 0x49a73c0 contain the operations - and + and from what I understood
the lane permute defined how the input is deinterleaved between the two nodes.

So when replacing them it's those two that I need to change and not VEC_PERM itself
otherwise the node could contain a permute and an operation that needs to be vectorized?

Or did you mean I should dissolve the two_operand node completely at this point? I was trying
to minimize the number of changes done on each replacement, because then it means I can assume
that any information I have already calculated of the children of the node being replaced is still the same
when doing recursive matching.

The tree directly out of the matcher is 

    node 0x49aeae0 (max_nunits=4, refcnt=2)
     stmt 0 REALPART_EXPR <*_10> = _4;
     stmt 1 IMAGPART_EXPR <*_10> = _22;
     children 0x49b0b20
    node 0x49b0b20 (max_nunits=4, refcnt=2)
     stmt 0 _4 = _13 - _6;
     stmt 1 _22 = _8 + _12;
     lane permutation { 0[0] 1[1] }
     children 0x4ade0c0 0x4ade220
    node 0x4ade0c0 (max_nunits=1, refcnt=1)
     { }
     SLP Only pattern:
      slp_patt_41 = .COMPLEX_ADD_ROT90 (_pa_43, _pa_42);
     children 0x49ad660 0x49b60f0
    node 0x49ad660 (max_nunits=4, refcnt=3)
     stmt 0 _13 = REALPART_EXPR <*_3>;
     stmt 1 _12 = IMAGPART_EXPR <*_3>;
     load permutation { 0 1 }
    node 0x49b60f0 (max_nunits=1, refcnt=1)
     { }
     lane permutation { 0[1] 0[0] }
     children 0x499e400
    node 0x499e400 (max_nunits=4, refcnt=3)
     stmt 0 _6 = IMAGPART_EXPR <*_5>;
     stmt 1 _8 = REALPART_EXPR <*_5>;
     load permutation { 1 0 }
    node 0x4ade220 (max_nunits=1, refcnt=1)
     { }
     SLP Only pattern:
      slp_patt_32 = .COMPLEX_ADD_ROT90 (_pa_40, _pa_33);
     children 0x49ad660 0x4ade180
    node 0x4ade180 (max_nunits=1, refcnt=1)
     { }
     lane permutation { 0[1] 0[0] }
     children 0x499e400

The one exception to this is when the node being replaced isn't a two_operands node,
like is the case with FMA.  In that case the node is just an ADD node which itself is replaced.
This is why it's a variable, so I can tell it where to perform the replacement.

Thanks,
Tamar

> 
> > +    {
> > +      /* Calculate the location of the statement in NODE to replace.  */
> > +      stmt_info = SLP_TREE_REPRESENTATIVE (node);
> > +      gimple* old_stmt = STMT_VINFO_STMT (stmt_info);
> > +      tree type = TREE_TYPE (gimple_get_lhs (old_stmt));
> > +
> > +      /* Create the argument set for use by gimple_build_call_internal_vec.
> */
> > +      for (unsigned i = 0; i < this->m_num_args; i++)
> > +	{
> > +	  tree var = make_temp_ssa_name (type, old_stmt, "_pa");
> 
> Why do you need to create N new SSA names (all with the same
> definition statement)?  It looks like this new "scalar" stmt
> is only a template stored in SLP_TREE_REPRESENTATIVE
> and supposedly there needs to be "something" in the actual
> arguments it make us not crash later?  Can't you for example
> simply push the old_stmt LHS N times?
> 
> For VEC_PERM I originally wanted to set SLP_TREE_REPRESENTATIVE
> to NULL_TREE but we don't like that obviously.  So long-term
> we might want to change SLP_TREE_CODE to a match_code to allow
> tree codes and internal-functions and "ignore" the representative
> in case it is not ERROR_MARK.
> 
> > +	  args[i] = var;
> > +	}
> > +
> > +      /* Create the new pattern statements.  */
> > +      call_stmt = gimple_build_call_internal_vec (this->m_ifn, args);
> > +      tree var = make_temp_ssa_name (type, call_stmt, "slp_patt");
> > +      gimple_call_set_lhs (call_stmt, var);
> > +      gimple_set_location (call_stmt, gimple_location (old_stmt));
> > +      gimple_call_set_nothrow (call_stmt, true);
> > +
> > +      /* Adjust the book-keeping for the new and old statements for use
> during
> > +	 SLP.  This is required to get the right VF and statement during SLP
> > +	 analysis.  These changes are created after relevancy has been set for
> > +	 the nodes as such we need to manually update them.  Any changes
> will be
> > +	 undone if SLP is cancelled.  */
> > +      call_stmt_info
> > +	= vinfo->add_pattern_stmt (call_stmt, stmt_info);
> > +
> > +      /* add_pattern_stmt can't be done in vect_mark_pattern_stmts
> because
> > +	 the non-SLP pattern matchers already have added the statement to
> VINFO
> > +	 by the time it is called.  Some of them need to modify the returned
> > +	 stmt_info.  vect_mark_pattern_stmts is called by recog_pattern and
> it
> > +	 would increase the size of each pattern with boilerplate code to
> make
> > +	 the call there.  */
> > +      vect_mark_pattern_stmts (vinfo, stmt_info, call_stmt,
> > +			       SLP_TREE_VECTYPE (node));
> > +      STMT_SLP_TYPE (call_stmt_info) = pure_slp;
> > +      STMT_VINFO_SLP_VECT_ONLY (call_stmt_info) = true;
> > +
> > +      /* Since we are replacing all the statements in the group with the same
> > +	 thing it doesn't really matter.  So just set it every time a new stmt
> > +	 is created.  */
> > +      SLP_TREE_REPRESENTATIVE (node) = call_stmt_info;
> > +      SLP_TREE_CODE (node) = CALL_EXPR;
> > +    }
> > +  return call_stmt;
> > +}
> > +
> >
> +/*********************************************************
> **********************
> > + * complex_add_pattern class
> > +
> **********************************************************
> ********************/
> > +
> > +class complex_add_pattern : public complex_pattern
> > +{
> > +  protected:
> > +    complex_add_pattern (slp_tree *node)
> > +      : complex_pattern (node)
> > +    {
> > +      this->m_num_args = 2;
> > +    }
> > +
> > +  public:
> > +    static vect_pattern* create (slp_tree *node)
> > +    {
> > +       return new complex_add_pattern (node);
> > +    }
> > +
> > +    const char* get_name ()
> > +    {
> > +      return "Complex Addition";
> > +    }
> > +
> > +    bool matches (slp_tree_to_load_perm_map_t *);
> > +    bool matches (complex_operation_t op,
> slp_tree_to_load_perm_map_t *,
> > +		  vec<slp_tree> ops);
> > +};
> > +
> > +/* Pattern matcher for trying to match complex addition pattern in SLP
> tree.
> > +
> > +   If no match is found then IFN is set to IFN_LAST.
> > +   This function matches the patterns shaped as:
> > +
> > +   c[i] = a[i] - b[i+1];
> > +   c[i+1] = a[i+1] + b[i];
> > +
> > +   If a match occurred then TRUE is returned, else FALSE.  The initial match
> is
> > +   expected to be in OP1 and the initial match operands in args0.  */
> > +
> > +bool
> > +complex_add_pattern::matches (complex_operation_t op,
> > +			      slp_tree_to_load_perm_map_t *perm_cache,
> > +			      vec<slp_tree> ops)
> > +{
> > +  this->m_ifn = IFN_LAST;
> > +  this->m_ops.safe_splice (ops);
> > +
> > +  /* Find the two components.  Rotation in the complex plane will modify
> > +     the operations:
> > +
> > +      * Rotation  0: + +
> > +      * Rotation 90: - +
> > +      * Rotation 180: - -
> > +      * Rotation 270: + -
> > +
> > +      Rotation 0 and 180 can be handled by normal SIMD code, so we don't
> need
> > +      to care about them here.  */
> > +  if (op == MINUS_PLUS)
> > +    this->m_ifn = IFN_COMPLEX_ADD_ROT90;
> > +  else if (op == PLUS_MINUS)
> > +    this->m_ifn = IFN_COMPLEX_ADD_ROT270;
> > +
> > +  /* verify that there is a permute, otherwise this isn't a pattern we
> > +     we support.  */
> > +  bool is_linear = false;
> > +  bool all_linear = true;
> > +  unsigned x;
> > +  for (x = 0; x < this->m_ops.length () && all_linear; x++)
> > +    {
> > +      linear_loads_p (perm_cache, this->m_ops[x], &is_linear);
> > +      all_linear = all_linear && is_linear;
> > +    }
> > +
> > +  if (this->m_ifn == IFN_LAST || all_linear)
> > +    return false;
> > +
> > +  this->m_nodes = &SLP_TREE_CHILDREN (*this->m_node);
> > +  return true;
> > +}
> > +
> > +bool
> > +complex_add_pattern::matches (slp_tree_to_load_perm_map_t
> *perm_cache)
> > +{
> > +  complex_operation_t op
> > +    = vect_detect_pair_op (*this->m_node, true, &this->m_ops);
> > +  return matches (op, perm_cache, this->m_ops);
> > +}
> > +
> >
> +/*********************************************************
> **********************
> > + * complex_mul_pattern
> > +
> **********************************************************
> ********************/
> > +
> > +/* Helper function of that looks for a match in the CHILDth child of NODE.
> The
> > +   child used is stored in RES.
> > +
> > +   If the match is successful then ARGS will contain the operands matched
> > +   and the complex_operation_t type is returned.  If match is not
> successful
> > +   then CMPLX_NONE is returned and ARGS is left unmodified.  */
> > +
> > +static inline complex_operation_t
> > +vect_match_call_complex_mla (slp_tree node, unsigned child,
> > +			     vec<slp_tree> *args = NULL, slp_tree *res = NULL)
> > +{
> > +  gcc_assert (child < SLP_TREE_CHILDREN (node).length ());
> > +
> > +  slp_tree data = SLP_TREE_CHILDREN (node)[child];
> > +
> > +  if (res)
> > +    *res = data;
> > +
> > +  return vect_detect_pair_op (data, false, args);
> > +}
> > +
> > +/* Check to see if either of the trees in ARGS are a NEGATE_EXPR.  If the
> first
> > +   child (args[0]) is a NEGATE_EXPR then NEG_FIRST_P is set to TRUE.
> > +
> > +   If a negate is found then the values in ARGS are reordered such that the
> > +   negate node is always the second one and the entry is replaced by the
> child
> > +   of the negate node.  */
> > +
> > +static inline bool
> > +vect_normalize_conj_loc (vec<slp_tree> args, bool *neg_first_p = NULL)
> > +{
> > +  gcc_assert (args.length () == 2);
> > +  bool neg_found = false;
> > +
> > +  if (vect_match_expression_p (args[0], NEGATE_EXPR))
> > +    {
> > +      std::swap (args[0], args[1]);
> > +      neg_found = true;
> > +      if (neg_first_p)
> > +	*neg_first_p = true;
> > +    }
> > +  else if (vect_match_expression_p (args[1], NEGATE_EXPR))
> > +    {
> > +      neg_found = true;
> > +      if (neg_first_p)
> > +	*neg_first_p = false;
> > +    }
> > +
> > +  if (neg_found)
> > +    args[1] = SLP_TREE_CHILDREN (args[1])[0];
> > +
> > +  return neg_found;
> > +}
> > +
> > +/* Helper function that checks to see if the load permutation PERM is on
> that
> > +   belongs to a duplication operation.  i.e. checks to see if all the values in
> > +   PERM are the same.  */
> > +
> > +static inline bool
> > +vect_is_dup (load_permutation_t perm, unsigned idx)
> > +{
> > +  for (unsigned i = 0; i < perm.length (); i++)
> > +    if (perm[i] != idx)
> > +     return false;
> > +
> > +  return true;
> > +}
> > +
> > +/* Helper function that checks to see if LEFT_OP and RIGHT_OP are both
> MULT_EXPR
> > +   nodes but also that they represent an operation that is either a complex
> > +   multiplication or a complex multiplication by conjucated value.
> > +
> > +   Of the negation is expected to be in the first half of the tree (As
> required
> > +   by an FMS pattern) then NEG_FIRST is true.  If the operation is a
> conjucate
> > +   operation then CONJ_FIRST_OPERAND is set to indicate whether the
> first or
> > +   second operand contains the conjucate operation.  */
> > +
> > +static inline bool
> > +vect_validate_multiplication (slp_tree_to_load_perm_map_t
> *perm_cache,
> > +			     vec<slp_tree> left_op, vec<slp_tree> right_op,
> > +			     bool neg_first, bool *conj_first_operand)
> > +{
> > +  /* The presence of a negation indicates that we have either a conjucate
> or a
> > +     rotation.  We need to distinguish which one.  */
> > +  *conj_first_operand = false;
> > +
> > +  load_permutation_t perm;
> > +  bool is_linear;
> > +  /* Complex conjucates have the negation on the imaginary part of the
> > +     number where rotations affect the real component.  So check if the
> > +     negation is on a dup of lane 1.  */
> > +  perm = linear_loads_p (perm_cache, right_op[1], &is_linear);
> > +  if (is_linear || !vect_is_dup (perm, 1))
> > +    return false;
> > +
> > +  /* Check if the conjucate is on the second first or second operand.  The
> > +     order of the node with the conjucate value determines this, and the
> dup
> > +     node must be one of lane 0 of the same DR as the neg node.  */
> > +  perm = linear_loads_p (perm_cache, left_op[0], &is_linear);
> > +  if (is_linear)
> > +    {
> > +      perm = linear_loads_p (perm_cache, left_op[1], &is_linear);
> > +      if (is_linear)
> > +	return false;
> > +    }
> > +  else if (!neg_first)
> > +    *conj_first_operand = true;
> > +  else
> > +    return false;
> > +
> > +  if (!vect_is_dup (perm, 0))
> > +    return false;
> > +
> > +  return true;
> > +}
> > +
> > +/* Helper function to help distinguish between a conjucate and a rotation
> in a
> > +   complex multiplication.  The operations have similar shapes but the
> order of
> > +   the load permutes are different.  This function returns TRUE when the
> order
> > +   is consistent with a multiplication or multiplication by conjucated
> > +   operand but returns FALSE if it's a multiplication by rotated operand.  */
> > +
> > +static inline bool
> > +vect_validate_multiplication (slp_tree_to_load_perm_map_t
> *perm_cache,
> > +			     vec<slp_tree> op, unsigned idx)
> > +{
> > +  /* The left node is the more common case, test it first.  */
> > +  bool is_linear;
> > +  load_permutation_t perm = linear_loads_p (perm_cache, op[0],
> &is_linear);
> > +  if (is_linear || !vect_is_dup (perm, idx))
> > +    {
> > +      perm = linear_loads_p (perm_cache, op[1], &is_linear);
> > +      if (is_linear || !vect_is_dup (perm, idx))
> > +	return false;
> > +    }
> > +
> > +  return true;
> > +}
> > +
> > +class complex_mul_pattern : public complex_pattern
> > +{
> > +  protected:
> > +    /* Allocate enough space for FMA as well.  */
> > +    map_t m_blocks[6] = {};
> > +    bool m_inplace = false;
> > +    complex_mul_pattern (slp_tree *node)
> > +      : complex_pattern (node)
> > +    {
> > +      this->m_num_args = 2;
> > +    }
> > +
> > +  public:
> > +    static vect_pattern* create (slp_tree *node)
> > +    {
> > +       return new complex_mul_pattern (node);
> > +    }
> > +
> > +    const char* get_name ()
> > +    {
> > +      return "Complex Multiplication";
> > +    }
> > +
> > +    bool validate_p (slp_tree_to_load_perm_map_t *);
> > +    bool matches (slp_tree_to_load_perm_map_t *);
> > +    bool matches (complex_operation_t, slp_tree_to_load_perm_map_t *,
> > +		  vec<slp_tree>);
> > +};
> > +
> > +/* Pattern matcher for trying to match complex multiply pattern in SLP
> tree
> > +   If the operation matches then IFN is set to the operation it matched
> > +   and the arguments to the two replacement statements are put in
> m_ops.
> > +
> > +   If no match is found then IFN is set to IFN_LAST and m_ops is unchanged.
> > +
> > +   This function matches the patterns shaped as:
> > +
> > +   double ax = (b[i+1] * a[i]);
> > +   double bx = (a[i+1] * b[i]);
> > +
> > +   c[i] = c[i] - ax;
> > +   c[i+1] = c[i+1] + bx;
> > +
> > +   If a match occurred then TRUE is returned, else FALSE.  The initial match
> is
> > +   expected to be in OP1 and the initial match operands in args0.  */
> > +
> > +bool
> > +complex_mul_pattern::matches (complex_operation_t op,
> > +			      slp_tree_to_load_perm_map_t *perm_cache,
> > +			      vec<slp_tree> /* ops */)
> > +{
> > +  this->m_ifn = IFN_LAST;
> > +
> > +  if (op != MINUS_PLUS)
> > +    return false;
> > +
> > +  slp_tree root = *this->m_node;
> > +  /* First two nodes must be a multiply.  */
> > +  auto_vec<slp_tree> muls;
> > +  if (vect_match_call_complex_mla (root, 0) != MULT_MULT
> > +      || vect_match_call_complex_mla (root, 1, &muls) != MULT_MULT)
> > +    return false;
> > +
> > +  /* Now operand2+4 may lead to another expression.  */
> > +  auto_vec<slp_tree> left_op, right_op;
> > +  left_op.safe_splice (SLP_TREE_CHILDREN (muls[0]));
> > +  right_op.safe_splice (SLP_TREE_CHILDREN (muls[1]));
> > +
> > +  bool neg_first;
> > +  bool is_neg = vect_normalize_conj_loc (right_op, &neg_first);
> > +
> > +  if (!is_neg)
> > +    {
> > +      /* A multiplication needs to multiply agains the real pair, otherwise
> > +	 the pattern matches that of FMS.   */
> > +      if (!vect_validate_multiplication (perm_cache, left_op, 0))
> > +	return false;
> > +      this->m_ifn = IFN_COMPLEX_MUL;
> > +      this->m_ops.safe_splice (left_op);
> > +      this->m_ops.safe_splice (right_op);
> > +    }
> > +  else if (is_neg)
> > +    {
> > +      bool conj_first_operand;
> > +      if (!vect_validate_multiplication (perm_cache, left_op, right_op,
> > +					 neg_first, &conj_first_operand))
> > +	return false;
> > +
> > +      this->m_ifn = IFN_COMPLEX_MUL_CONJ;
> > +      /* Check if the conjucate is on the first or second parameter.  */
> > +      if (!conj_first_operand)
> > +	{
> > +	  this->m_ops.safe_push (left_op[1]);
> > +	  this->m_ops.safe_push (left_op[0]);
> > +	}
> > +      else
> > +	{
> > +	  this->m_ops.safe_push (left_op[0]);
> > +	  this->m_ops.safe_push (left_op[1]);
> > +	}
> > +      this->m_ops.safe_push (right_op[1]);
> > +      this->m_ops.safe_push (right_op[0]);
> > +    }
> > +
> > +  vect_build_perm_groups (perm_cache, &this->m_blocks[0], this-
> >m_ops);
> > +  this->m_nodes = &SLP_TREE_CHILDREN (*this->m_node);
> > +
> > +  return true;
> > +}
> > +
> > +bool
> > +complex_mul_pattern::matches (slp_tree_to_load_perm_map_t
> *perm_cache)
> > +{
> > +  complex_operation_t op
> > +    = vect_detect_pair_op (*this->m_node);
> > +  return matches (op, perm_cache, this->m_ops);
> > +}
> > +
> > +
> > +/* Validates to see if the Complex MUL that we have matched is valid.
> This is
> > +   done through a combination of making nodes linear and combining
> nodes.  */
> > +
> > +bool
> > +complex_mul_pattern::validate_p (slp_tree_to_load_perm_map_t
> *perm_cache)
> > +{
> > +  slp_tree node;
> > +  unsigned ix;
> > +  hash_set<slp_tree> cache;
> > +  auto_vec<slp_tree> workset;
> > +  FOR_EACH_VEC_ELT (*this->m_nodes, ix, node)
> > +    {
> > +      auto_vec<slp_tree> nodes;
> > +      nodes.create (this->m_num_args);
> > +      slp_tree tmp = NULL;
> > +      auto_vec<slp_tree> new_nodes;
> > +
> > +      unsigned i;
> > +      for (i = 0; i < this->m_num_args; i++)
> > +	{
> > +	  unsigned index = (ix * this->m_num_args) + i;
> > +	  map_t map = this->m_blocks[index];
> > +	  slp_tree vnode = NULL;
> > +	  bool needs_linear = map.a == map.b;
> > +	  tmp = this->m_ops[index];
> > +	  cache.add (tmp);
> > +	  if (needs_linear
> > +	      && vect_slp_make_linear (perm_cache, node, tmp, &vnode))
> > +	    {
> > +	      nodes.quick_push (vnode);
> > +	      if (vnode != tmp)
> > +		new_nodes.safe_push (vnode);
> > +	    }
> > +	  else if (!needs_linear
> > +		   && vect_slp_make_combine_linear (perm_cache, node,
> > +						    this->m_ops, map,
> &vnode))
> > +	    {
> > +	      nodes.quick_push (vnode);
> > +	      if (vnode != tmp)
> > +		new_nodes.safe_push (vnode);
> > +	    }
> > +	  else
> > +	    {
> > +	      if (dump_enabled_p ())
> > +		dump_printf_loc (MSG_NOTE, vect_location,
> > +				"stmts could not be made %s %p\n",
> > +				needs_linear ? "linear" : "linear/combined",
> > +				tmp);
> > +	      FOR_EACH_VEC_ELT (new_nodes, i, tmp)
> > +		delete tmp;
> > +
> > +	      if (!m_inplace)
> > +		FOR_EACH_VEC_ELT (workset, i, tmp)
> > +		  delete tmp;
> > +
> > +	      nodes.release();
> > +	      return false;
> > +	    }
> > +
> > +	  vect_mark_stmts_as_in_pattern (&cache, node);
> > +	}
> > +
> > +      if (m_inplace)
> > +	{
> > +	  workset.safe_splice (nodes);
> > +	}
> > +      else
> > +	{
> > +	  slp_tree new_node
> > +	    = vect_create_new_slp_node (SLP_TREE_SCALAR_STMTS (node),
> > +					SLP_TREE_CHILDREN (node).length
> ());
> > +	  SLP_TREE_VECTYPE (new_node) = SLP_TREE_VECTYPE (node);
> > +	  SLP_TREE_LANE_PERMUTATION (new_node)
> > +	    = SLP_TREE_LANE_PERMUTATION (node);
> > +	  SLP_TREE_CODE (new_node) = SLP_TREE_CODE (node);
> > +	  SLP_TREE_REF_COUNT (new_node) = SLP_TREE_REF_COUNT
> (node);
> > +	  SLP_TREE_REPRESENTATIVE (new_node) =
> SLP_TREE_REPRESENTATIVE (node);
> > +	  SLP_TREE_CHILDREN (new_node).safe_splice (nodes);
> > +	  SLP_TREE_LANES (new_node) = SLP_TREE_LANES (node);
> > +
> > +	  workset.safe_push (new_node);
> > +	}
> > +    }
> > +
> > +  if (m_inplace)
> > +    {
> > +      SLP_TREE_CHILDREN (*this->m_node).truncate (0);
> > +      SLP_TREE_CHILDREN (*this->m_node).safe_splice (workset);
> > +    }
> > +  else
> > +    {
> > +      FOR_EACH_VEC_ELT (workset, ix, node)
> > +	SLP_TREE_CHILDREN (*this->m_node)[ix] = node;
> > +    }
> > +
> > +  return true;
> > +}
> > +
> >
> +/*********************************************************
> **********************
> > + * complex_fma_pattern class
> > +
> **********************************************************
> ********************/
> > +
> > +class complex_fma_pattern : public complex_mul_pattern
> > +{
> > +  protected:
> > +    /* Temporary nodes cache as scope of the class so we don't
> > +       have to worry about freeing it. */
> > +    auto_vec<slp_tree> m_ncache;
> > +    /* List of nodes to mark as no longer being a mul.  The first
> > +       one is chosen and the last one is dangling.  */
> > +    auto_vec<slp_tree> m_mul_nodes;
> > +    complex_fma_pattern (slp_tree *node)
> > +      : complex_mul_pattern (node)
> > +    {
> > +      this->m_num_args = 3;
> > +    }
> > +
> > +  public:
> > +    static vect_pattern* create (slp_tree *node)
> > +    {
> > +       return new complex_fma_pattern (node);
> > +    }
> > +
> > +    const char* get_name ()
> > +    {
> > +      return "Complex FMA";
> > +    }
> > +
> > +    bool matches (slp_tree_to_load_perm_map_t *);
> > +    bool matches (complex_operation_t, slp_tree_to_load_perm_map_t *,
> > +		  vec<slp_tree> ops);
> > +    bool validate_p (slp_tree_to_load_perm_map_t *);
> > +};
> > +
> > +/* Helper function to "reset" a previously matched node and undo the
> changes
> > +   made enough so that the node is treated as an irrelevant node.  */
> > +
> > +static inline void
> > +vect_slp_reset_pattern (slp_tree node)
> > +{
> > +  stmt_vec_info stmt_info = vect_orig_stmt (SLP_TREE_REPRESENTATIVE
> (node));
> > +  STMT_VINFO_IN_PATTERN_P (stmt_info) = false;
> > +  STMT_SLP_TYPE (stmt_info) = pure_slp;
> > +  SLP_TREE_REPRESENTATIVE (node) = stmt_info;
> > +}
> > +
> > +/* Pattern matcher for trying to match complex multiply and accumulate
> > +   and multiply and subtract patterns in SLP tree.
> > +   If the operation matches then IFN is set to the operation it matched and
> > +   the arguments to the two replacement statements are put in m_ops.
> > +
> > +   If no match is found then IFN is set to IFN_LAST and m_ops is unchanged.
> > +
> > +   This function matches the patterns shaped as:
> > +
> > +   double ax = (b[i+1] * a[i]) + (b[i] * a[i]);
> > +   double bx = (a[i+1] * b[i]) - (a[i+1] * b[i+1]);
> > +
> > +   c[i] = c[i] - ax;
> > +   c[i+1] = c[i+1] + bx;
> > +
> > +   If a match occurred then TRUE is returned, else FALSE.  The match is
> > +   performed after COMPLEX_MUL which would have done the majority of
> the work.
> > +   This function merely matches an ADD with a COMPLEX_MUL IFN.  The
> initial
> > +   match is expected to be in OP1 and the initial match operands in args0.
> */
> > +
> > +bool
> > +complex_fma_pattern::matches (complex_operation_t op1,
> > +			      slp_tree_to_load_perm_map_t * /* perm_cache
> */,
> > +			      vec<slp_tree> /* args0 */)
> > +{
> > +  this->m_ifn = IFN_LAST;
> > +
> > +  /* Find the two components.  We match Complex MUL first which
> reduces the
> > +     amount of work this pattern has to do.  After that we just match the
> > +     head node and we're done.:
> > +
> > +     * FMA: + +.
> > +
> > +     We need to ignore the two_operands nodes that may also match.
> > +     For that we can check if they have any scalar statements and also
> > +     check that it's not a permute node as we're looking for a normal
> > +     PLUS_EXPR operation.  */
> > +  if (op1 != CMPLX_NONE)
> > +    return false;
> > +
> > +  /* Find the two components.  We match Complex MUL first which
> reduces the
> > +     amount of work this pattern has to do.  After that we just match the
> > +     head node and we're done.:
> > +
> > +   * FMA: + + on a non-two_operands node.  */
> > +  if (SLP_TREE_LANE_PERMUTATION (*this->m_node).exists ()
> > +      || !SLP_TREE_SCALAR_STMTS (*this->m_node).exists ()
> > +      || !vect_match_expression_p (*this->m_node, PLUS_EXPR))
> > +    return false;
> > +
> > +  slp_tree node = SLP_TREE_CHILDREN (*this->m_node)[1];
> > +
> > +  if (vect_match_two_call_p (node, IFN_COMPLEX_MUL))
> > +    this->m_ifn = IFN_COMPLEX_FMA;
> > +  else if (vect_match_two_call_p (node, IFN_COMPLEX_MUL_CONJ))
> > +    this->m_ifn = IFN_COMPLEX_FMA_CONJ;
> > +  else
> > +    return false;
> > +
> > +  this->m_mul_nodes.safe_splice (SLP_TREE_CHILDREN (node));
> > +  node = SLP_TREE_CHILDREN (node)[0];
> > +  this->m_ops.create (this->m_num_args);
> > +
> > +  if (this->m_ifn == IFN_COMPLEX_FMA)
> > +    {
> > +      this->m_ops.quick_push (SLP_TREE_CHILDREN (*this->m_node)[0]);
> > +      this->m_ops.quick_push (SLP_TREE_CHILDREN (node)[1]);
> > +      this->m_ops.quick_push (SLP_TREE_CHILDREN (node)[0]);
> > +    }
> > +  else
> > +    {
> > +      this->m_ops.quick_push (SLP_TREE_CHILDREN (*this->m_node)[0]);
> > +      this->m_ops.quick_push (SLP_TREE_CHILDREN (node)[0]);
> > +      this->m_ops.quick_push (SLP_TREE_CHILDREN (node)[1]);
> > +    }
> > +
> > +  this->m_ncache.safe_push (*this->m_node);
> > +  this->m_nodes = &this->m_ncache;
> > +
> > +  return true;
> > +}
> > +
> > +bool
> > +complex_fma_pattern::matches (slp_tree_to_load_perm_map_t
> *perm_cache)
> > +{
> > +  auto_vec<slp_tree> args0;
> > +  complex_operation_t op
> > +    = vect_detect_pair_op (*this->m_node, true, &args0);
> > +  return matches (op, perm_cache, args0);
> > +}
> > +
> > +bool
> > +complex_fma_pattern::validate_p (slp_tree_to_load_perm_map_t
> *perm_cache)
> > +{
> > +  /* FMA's top level node is + + and so is not two_operator,
> > +     we only need one child.  */
> > +  slp_tree node;
> > +  auto_vec<slp_tree> nodes;
> > +  auto_vec<slp_tree> new_nodes;
> > +  nodes.create (this->m_num_args);
> > +  hash_set<slp_tree> cache;
> > +  unsigned ix;
> > +  FOR_EACH_VEC_ELT (this->m_ops, ix, node)
> > +    {
> > +      slp_tree vnode = NULL;
> > +      if (vect_slp_make_linear (perm_cache, *this->m_node, node,
> &vnode))
> > +	{
> > +	  nodes.quick_push (vnode);
> > +	  if (vnode != node)
> > +	    new_nodes.safe_push (vnode);
> > +	  if (SLP_TREE_CHILDREN (node).length () == 2)
> > +	    {
> > +	      cache.add(SLP_TREE_CHILDREN (node)[0]);
> > +	      cache.add(SLP_TREE_CHILDREN (node)[1]);
> > +	    }
> > +	  else
> > +	    cache.add (node);
> > +	}
> > +      else
> > +	{
> > +	  FOR_EACH_VEC_ELT (new_nodes, ix, node)
> > +	    delete node;
> > +
> > +	  nodes.release();
> > +	  return false;
> > +	}
> > +    }
> > +
> > +  SLP_TREE_CHILDREN (*this->m_node).truncate (0);
> > +  SLP_TREE_CHILDREN (*this->m_node).safe_splice (nodes);
> > +
> > +  /* Ignore the part of the tree that becomes irrelevant for FMA.  */
> > +  vect_slp_reset_pattern (this->m_mul_nodes[0]);
> > +  vect_slp_reset_pattern (this->m_mul_nodes[1]);
> > +  vect_mark_stmts_as_in_pattern (&cache, this->m_mul_nodes[1]);
> > +
> > +  return true;
> > +}
> > +
> > +
> >
> +/*********************************************************
> **********************
> > + * complex_fms_pattern class
> > +
> **********************************************************
> ********************/
> > +
> > +class complex_fms_pattern : public complex_mul_pattern
> > +{
> > +  protected:
> > +    complex_fms_pattern (slp_tree *node)
> > +      : complex_mul_pattern (node)
> > +    {
> > +      this->m_num_args = 3;
> > +    }
> > +
> > +  public:
> > +    static vect_pattern* create (slp_tree *node)
> > +    {
> > +       return new complex_fms_pattern (node);
> > +    }
> > +
> > +    const char* get_name ()
> > +    {
> > +      return "Complex FMS";
> > +    }
> > +
> > +    bool matches (slp_tree_to_load_perm_map_t *);
> > +    bool matches (complex_operation_t, slp_tree_to_load_perm_map_t *,
> > +		  vec<slp_tree>);
> > +};
> > +
> > +/* Pattern matcher for trying to match complex multiply and accumulate
> > +   and multiply and subtract patterns in SLP tree.
> > +   If the operation matches then IFN is set to the operation it matched and
> > +   the arguments to the two replacement statements are put in m_ops.
> > +
> > +   If no match is found then IFN is set to IFN_LAST and m_ops is unchanged.
> > +
> > +   This function matches the patterns shaped as:
> > +
> > +   double ax = (b[i+1] * a[i]) + (b[i] * a[i]);
> > +   double bx = (a[i+1] * b[i]) - (a[i+1] * b[i+1]);
> > +
> > +   c[i] = c[i] - ax;
> > +   c[i+1] = c[i+1] + bx;
> > +
> > +   If a match occurred then TRUE is returned, else FALSE.  The initial match
> is
> > +   expected to be in OP1 and the initial match operands in args0.  */
> > +bool
> > +complex_fms_pattern::matches (complex_operation_t op1,
> > +			      slp_tree_to_load_perm_map_t *perm_cache,
> > +			      vec<slp_tree> args0)
> > +{
> > +  this->m_ifn = IFN_LAST;
> > +
> > +  /* Find the two components.  We match Complex MUL first which
> reduces the
> > +     amount of work this pattern has to do.  After that we just match the
> > +     head node and we're done.:
> > +
> > +     * FMS: - +.  */
> > +  slp_tree child = NULL;
> > +
> > +  /* We need to ignore the two_operands nodes that may also match,
> > +     for that we can check if they have any scalar statements and also
> > +     check that it's not a permute node as we're looking for a normal
> > +     PLUS_EXPR operation.  */
> > +  if (op1 != PLUS_MINUS)
> > +    return false;
> > +
> > +  child = SLP_TREE_CHILDREN (args0[1])[1];
> > +  if (vect_detect_pair_op (child) != MINUS_PLUS)
> > +    return false;
> > +
> > +  /* First two nodes must be a multiply.  */
> > +  auto_vec<slp_tree> muls;
> > +  if (vect_match_call_complex_mla (child, 0) != MULT_MULT
> > +      || vect_match_call_complex_mla (child, 1, &muls) != MULT_MULT)
> > +    return false;
> > +
> > +  /* Now operand2+4 may lead to another expression.  */
> > +  auto_vec<slp_tree> left_op, right_op;
> > +  left_op.safe_splice (SLP_TREE_CHILDREN (muls[1]));
> > +  right_op.safe_splice (SLP_TREE_CHILDREN (muls[0]));
> > +
> > +  bool is_neg = vect_normalize_conj_loc (right_op);
> > +
> > +  this->m_ops.create (6);
> > +  child = SLP_TREE_CHILDREN (args0[0])[0];
> > +  this->m_ops.quick_push (child);
> > +  if (!is_neg)
> > +    {
> > +      this->m_ifn = IFN_COMPLEX_FMS;
> > +      this->m_ops.quick_push (left_op[1]);
> > +      this->m_ops.quick_push (left_op[0]);
> > +    }
> > +  else if (is_neg)
> > +    {
> > +      bool conj_first_operand;
> > +      if (!vect_validate_multiplication (perm_cache, left_op, right_op, false,
> > +					 &conj_first_operand))
> > +	return false;
> > +
> > +      this->m_ifn = IFN_COMPLEX_FMS_CONJ;
> > +      /* Check if the conjucate is on the first or second parameter.  */
> > +      if (!conj_first_operand)
> > +	{
> > +	  this->m_ops.quick_push (left_op[1]);
> > +	  this->m_ops.quick_push (left_op[0]);
> > +	}
> > +      else
> > +	{
> > +	  this->m_ops.quick_push (left_op[0]);
> > +	  this->m_ops.quick_push (left_op[1]);
> > +	}
> > +    }
> > +
> > +  this->m_ops.quick_push (child);
> > +  this->m_ops.quick_push (right_op[1]);
> > +  this->m_ops.quick_push (right_op[0]);
> > +
> > +  vect_build_perm_groups (perm_cache, &this->m_blocks[0], this-
> >m_ops);
> > +  this->m_nodes = &SLP_TREE_CHILDREN (*this->m_node);
> > +  return true;
> > +}
> > +
> > +bool
> > +complex_fms_pattern::matches (slp_tree_to_load_perm_map_t
> *perm_cache)
> > +{
> > +  auto_vec<slp_tree> args0;
> > +  complex_operation_t op
> > +    = vect_detect_pair_op (*this->m_node, true, &args0);
> > +  return matches (op, perm_cache, args0);
> > +}
> > +
> > +
> >
> +/*********************************************************
> **********************
> > + * complex_operations_pattern class
> > +
> **********************************************************
> ********************/
> > +
> > +/* This function combines all the existing pattern matchers above into one
> class
> > +   that shares the functionality between them.  The initial match is shared
> > +   between all complex operations.  */
> > +
> > +class complex_operations_pattern : public complex_pattern
> > +{
> > +  protected:
> > +    complex_operations_pattern (slp_tree *node)
> > +      : complex_pattern (node)
> > +    { }
> > +
> > +  gcall *build ();
> > +
> > +  /* The current "active" pattern.  */
> > +  vect_pattern *m_patt = NULL;
> > +
> > +  public:
> > +    using complex_pattern::matches;
> > +
> > +    bool matches (slp_tree_to_load_perm_map_t *perm_cache);
> > +    const char* get_name ();
> > +    bool validate_p (slp_tree_to_load_perm_map_t *perm_cache);
> > +    bool is_optab_supported_p (tree, optimization_type);
> > +    internal_fn get_ifn ();
> > +    gcall *build (vec_info *);
> > +    void reset (slp_tree *);
> > +    ~complex_operations_pattern ();
> > +
> > +    static vect_pattern* create (slp_tree *node)
> > +    {
> > +       return new complex_operations_pattern (node);
> > +    }
> > +};
> > +
> > +
> > +/* Clean up the pattern if one existed.  */
> > +
> > +complex_operations_pattern::~complex_operations_pattern ()
> > +{
> > +  if (this->m_patt)
> > +    delete this->m_patt;
> > +  this->m_patt = NULL;
> > +}
> > +
> > +/* Match but do not perform any additional operations on the SLP tree.  If
> the
> > +   match is successful then that pattern is made "active" in the class.  */
> > +
> > +bool
> > +complex_operations_pattern::matches (slp_tree_to_load_perm_map_t
> *perm_cache)
> > +{
> > +  complex_operation_t op
> > +    = vect_detect_pair_op (*this->m_node, true, &this->m_ops);
> > +
> > +  /* Check which pattern this may be.  Match longest pattern first.  */
> > +  this->m_patt = complex_fma_pattern::create (this->m_node);
> > +  if (this->m_patt->matches (op, perm_cache, this->m_ops))
> > +    return true;
> 
> Hmm, all of these allocations like also
> 
> +  for (unsigned x = 0; x < num__slp_patterns; x++)
> +    {
> +      vect_pattern *pattern = slp_patterns[x] (ref_node);
> +      found_p = pattern->recognize (perm_cache, vinfo);
> +      delete pattern;
> +      found_rec_p = found_p | found_rec_p;
> +    }
> 
> in the main driver are really due to the C++-ish layout of all this :/
> I don't like that very much.  It also highlights when reading
> the patch that it's hard to get the difference betwee
> recognize (), match () and validate () where all but the first is
> kind-of internal implementation details of the complex matchers.
> 
> So we could eventually get by with the main driver doing
> 
> +  for (unsigned x = 0; x < num__slp_patterns; x++)
> +    {
> +      vect_pattern *pattern = slp_patterns[x] (ref_node);
>        if (pattern)
>          found_rec_p = true;
>      }
> 
> thus merge ->recognize with the static allocator and not do
> any heap allocation when nothing is matched?  Then ...
> 
> > +  if (op == CMPLX_NONE)
> > +    return false;
> > +
> > +  delete this->m_patt;
> > +
> > +  this->m_patt = complex_fms_pattern::create (this->m_node);
> 
> ... this could use static typing and an automatic instance?
> 
> The myth that allocation & deallocation is cheap is really that,
> a myth.  We're doing 6 allocations & deallocations at this
> very toplevel for each SLP node (which means eventually for
> each statement in a function).
> 
> 
> So why for example would one need to override recognize ()?
> I'd argue for nailing it in place and to remove validate_p ()
> (should be done by matches?).  At least validate_p for
> the complex patterns seem to build SLP nodes which mean
> they actually build()?
> 
> 
> I think overall the patch looks sensible but it's still quite
> hard to follow the flow of an actual pattern match, possibly
> because of the strange factoring.
> 
> At this point to speed up things can you split the patch
> so there's only the very simplest (complex_add for example)
> matcher plus requirements?  Then we can push that which
> would allow me to actually play with it more easily and
> even try out changes?
> 
> I'd say split out the optimize_load_redistribution part of
> the first patch and merge the complex_add support into
> its remains so we have a first patch that does something?
> 
> Thanks and sorry for being dense here ... :/  There's pieces
> I think we should do better but I still can't lay out a
> solution and I guess pointing fingers at the individual pieces
> like above will lead to very slow progress only.
> 
> Richard.
> 
> > +  if (this->m_patt->matches (op, perm_cache, this->m_ops))
> > +    return true;
> > +
> > +  delete this->m_patt;
> > +
> > +  this->m_patt = complex_mul_pattern::create (this->m_node);
> > +  if (this->m_patt->matches (op, perm_cache, this->m_ops))
> > +    return true;
> > +
> > +  delete this->m_patt;
> > +
> > +  this->m_patt = complex_add_pattern::create (this->m_node);
> > +  if (this->m_patt->matches (op, perm_cache, this->m_ops))
> > +    return true;
> > +
> > +  delete this->m_patt;
> > +
> > +  this->m_patt = NULL;
> > +  return false;
> > +}
> > +
> > +/* Friendly name of the operation the pattern matches.  */
> > +
> > +const char*
> > +complex_operations_pattern::get_name ()
> > +{
> > +  if (!this->m_patt)
> > +    return "Complex Operations";
> > +
> > +  return this->m_patt->get_name ();
> > +}
> > +
> > +/* Only perform the pattern creation part of the matcher.  This creates
> and
> > +   returns the new pattern statement and encapsulates the currently
> active
> > +   pattern.  */
> > +
> > +gcall *
> > +complex_operations_pattern::build (vec_info *vinfo)
> > +{
> > +  if (!this->m_patt)
> > +    return NULL;
> > +
> > +  return this->m_patt->build (vinfo);
> > +}
> > +
> > +/* Validate the operation the currently active pattern detected.  */
> > +
> > +bool
> > +complex_operations_pattern::validate_p
> (slp_tree_to_load_perm_map_t *perm_cache)
> > +{
> > +  if (!this->m_patt)
> > +    return false;
> > +
> > +  return this->m_patt->validate_p (perm_cache);
> > +}
> > +
> > +/* Check if the obtab of the currently active matches is supported.  */
> > +
> > +bool
> > +complex_operations_pattern::is_optab_supported_p (tree vectype,
> > +						  optimization_type opt_type)
> > +{
> > +  if (!vectype || !this->m_patt)
> > +    return false;
> > +
> > +  return this->m_patt->is_optab_supported_p (vectype, opt_type);
> > +}
> > +
> > +/* Return the matched IFN from the current active matcher if any.  */
> > +
> > +internal_fn
> > +complex_operations_pattern::get_ifn ()
> > +{
> > +  if (!this->m_patt)
> > +    return IFN_LAST;
> > +
> > +  return this->m_patt->get_ifn ();
> > +}
> > +
> >
> +/*********************************************************
> **********************
> > + * Pattern matching definitions
> > +
> **********************************************************
> ********************/
> > +
> > +#define SLP_PATTERN(x) &x::create
> > +vect_pattern_decl_t slp_patterns[]
> > +{
> > +  /* For least amount of back-tracking and more efficient matching
> > +     order patterns from the largest to the smallest.  Especially if they
> > +     overlap in what they can detect.  */
> > +
> > +  /* FMA overlaps with MUL but is the longer sequence.  Because we're in
> post
> > +     order traversal we can't match FMA if included in
> > +     complex_operations_pattern so must be checked on it's own.  */
> > +  SLP_PATTERN (complex_operations_pattern),
> > +};
> > +#undef SLP_PATTERN
> > +
> > +/* Set the number of SLP pattern matchers available.  */
> > +size_t num__slp_patterns =
> sizeof(slp_patterns)/sizeof(vect_pattern_decl_t);
> >
> >
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> Nuernberg,
> Germany; GF: Felix Imend
Richard Biener Nov. 17, 2020, 7:50 a.m. UTC | #3
On Mon, 16 Nov 2020, Tamar Christina wrote:

> Hi Richi, thanks for the review!
> 
> Just a quick comment on one of the questions asked:
> 
> > -----Original Message-----
> > From: rguenther@c653.arch.suse.de <rguenther@c653.arch.suse.de> On
> > Behalf Of Richard Biener
> > Sent: Monday, November 16, 2020 3:12 PM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; ook@ucw.cz;
> > hongtao.liu@intel.com
> > Subject: Re: [PATCH 2/2] middle-end : Add support for complex pattern
> > detection for Add, Mul, FMA and FMS
> > 
> > On Sat, 14 Nov 2020, Tamar Christina wrote:
> > 
> > > Hi All,
> > >
> > > This patch series adds support for SLP vectorization of complex instructions
> > [1].
> > >
> > > These instructions exist only in their vector forms and require you to
> > recognize
> > > two statements in parallel.  Complex operations usually require a permute
> > due to
> > > the fact that the real and imaginary numbers are stored intermixed but
> > these vector
> > > instructions expect this and no longer need the compiler to generate a
> > permute.
> > >
> > > The pass is able to insert additional permutes inside the SLP tree in order to
> > > change the data flow such that the new instructions can be used.  These
> > permutes
> > > are then optimized out later during optimize_slp.
> > >
> > > These instructions also support rotations along the Argand plane, as such
> > the operands
> > > have to be re-ordered to coincide with their load group.
> > >
> > > For now, this patch only adds support for:
> > >
> > >   * Complex Addition with rotation of 90 and 270.
> > >
> > >   Addition with rotation of the second argument around the Argand plane.
> > >     Supported rotations are 90 and 180.
> > >
> > >     c = a + (b * I) and c = a + (b * I * I * I)
> > >
> > >   * Complex Multiplication and Multiplication where one operand is
> > conjucated.
> > >
> > >   Complex multiplication and Conjucate Complex multiplication of the
> > second
> > >      parameter.
> > >
> > >     c = a * b and c = a * conj (b)
> > >
> > >   * Complex FMA and FMA where one operand is conjucated.
> > >   * Complex FMS and FMS where one operand is conjucated.
> > >
> > >   Complex FMLA, Conjucate FMLA of the second parameter and FMLS.
> > >
> > >     c += a * b, c += a * conj (b), c -= a * b and c -= a * conj (b)
> > >
> > >   For the conjucate cases it supports under fast-math that the operands
> > that is
> > >   being conjucated be flipped by flipping the arguments to the optab.  This
> > >   allows it to support c = conj (a) * b and c += conj (a) * b.
> > >
> > >   where a, b and c are complex numbers.
> > >
> > > Complex dot-product is not currently supported in this patch set as
> > build_slp fails
> > > for it.  This will be provided as a future patch.
> > >
> > > These are supported for both integer and floating point and as such these
> > don't look
> > > for real or imaginary pairs but instead rely on the early lowering of complex
> > > numbers by GCC and canonization of the operations such that it just
> > recognizes any
> > > instruction sequence matching the operations requested.
> > >
> > > To be safe when the it is not sure it can support the operation or if it finds
> > > something it does not understand it backs off.  This is done by looking at
> > the
> > > order of the loads below the nodes.  Anything that changes the order like a
> > > VEC_PERM is taken into account.  The order of the loads define which
> > operation
> > > is being done.   For instance a complex add with conjucate always has the
> > second
> > > loads in non-contiguous order.   Because this information is queried often it
> > is
> > > heavily cached and because we never modify nodes in place we can cache
> > the
> > > operation through all SLP instances.
> > >
> > > When SLP failes (e.g. due to costing) we dissolve the changes and restore
> > the
> > > previous relevancies of the stmts that were in a pattern before.
> > >
> > > [1] https://developer.arm.com/documentation/ddi0487/fc/
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >
> > > Ok for master? (working on moving tests to middend IFN tests instead).
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > > 	* tree-vect-slp-patterns.c: New file.
> > > 	* Makefile.in: Use it.
> > > 	* doc/passes.texi: Document it.
> > > 	* internal-fn.def (COMPLEX_ADD_ROT90, COMPLEX_ADD_ROT270,
> > > 	COMPLEX_MUL, COMPLEX_MUL_CONJ, COMPLEX_FMA,
> > COMPLEX_FMA_CONJ):
> > > 	COMPLEX_FMS, COMPLEX_FMS_CONJ): New.
> > > 	* optabs.def (cadd90_optab, cadd270_optab, cmul_optab,
> > cmul_conj_optab,
> > > 	cmla_optab, cmla_conj_optab, cmls_optab, cmls_conj_optab): New.
> > > 	* doc/md.texi: Document them
> > >
> > > -- inline copy of patch --
> > >
> > > diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> > > index
> > 273654cfa2525099ed0d9adc2f9085c8408b096f..4b556bdbbc2f989d894ef2dbe
> > 894318858eaa7aa 100644
> > > --- a/gcc/Makefile.in
> > > +++ b/gcc/Makefile.in
> > > @@ -1646,6 +1646,7 @@ OBJS = \
> > >  	tree-vect-loop.o \
> > >  	tree-vect-loop-manip.o \
> > >  	tree-vect-slp.o \
> > > +	tree-vect-slp-patterns.o \
> > >  	tree-vectorizer.o \
> > >  	tree-vector-builder.o \
> > >  	tree-vrp.o \
> > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > > index
> > 813875b973bc73920f1195c4897ce3491ca713c2..68792e4c3eeca2bce0112fc28e
> > ba38700e66dc00 100644
> > > --- a/gcc/doc/md.texi
> > > +++ b/gcc/doc/md.texi
> > > @@ -1493,7 +1493,7 @@ can be described by a series of letters for each
> > operand.  The overall
> > >  constraint for an operand is made from the letters for this operand
> > >  from the first alternative, a comma, the letters for this operand from
> > >  the second alternative, a comma, and so on until the last alternative.
> > > -All operands for a single instruction must have the same number of
> > > +All operands for a single instruction must have the same number of
> > >  alternatives.
> > >  @ifset INTERNALS
> > >  Here is how it is done for fullword logical-or on the 68000:
> > > @@ -1558,17 +1558,17 @@ the first, 1 for the second alternative, etc.).
> > @xref{Output Statement}.
> > >  @end ifset
> > >  @ifclear INTERNALS
> > >
> > > -So the first alternative for the 68000's logical-or could be written as
> > > -@code{"+m" (output) : "ir" (input)}.  The second could be @code{"+r"
> > > -(output): "irm" (input)}.  However, the fact that two memory locations
> > > -cannot be used in a single instruction prevents simply using @code{"+rm"
> > > -(output) : "irm" (input)}.  Using multi-alternatives, this might be
> > > +So the first alternative for the 68000's logical-or could be written as
> > > +@code{"+m" (output) : "ir" (input)}.  The second could be @code{"+r"
> > > +(output): "irm" (input)}.  However, the fact that two memory locations
> > > +cannot be used in a single instruction prevents simply using @code{"+rm"
> > > +(output) : "irm" (input)}.  Using multi-alternatives, this might be
> > >  written as @code{"+m,r" (output) : "ir,irm" (input)}.  This describes
> > > -all the available alternatives to the compiler, allowing it to choose
> > > +all the available alternatives to the compiler, allowing it to choose
> > >  the most efficient one for the current conditions.
> > >
> > > -There is no way within the template to determine which alternative was
> > > -chosen.  However you may be able to wrap your @code{asm} statements
> > with
> > > +There is no way within the template to determine which alternative was
> > > +chosen.  However you may be able to wrap your @code{asm} statements
> > with
> > >  builtins such as @code{__builtin_constant_p} to achieve the desired
> > results.
> > >  @end ifclear
> > >
> > > @@ -3062,7 +3062,7 @@ instruction taking only the upper 16-bits of a
> > >  16-bits being 0.
> > >
> > >  @item L
> > > -Integer that is valid as an immediate operand for a
> > > +Integer that is valid as an immediate operand for a
> > >  shift instruction. Range 0 to 31.
> > >
> > >  @item M
> > > @@ -3076,7 +3076,7 @@ Integer that is valid as an immediate operand for
> > >  a custom instruction opcode. Range 0 to 255.
> > >
> > >  @item P
> > > -An immediate operand for R2 andchi/andci instructions.
> > > +An immediate operand for R2 andchi/andci instructions.
> > >
> > >  @item S
> > >  Matches immediates which are addresses in the small
> > > @@ -6132,6 +6132,97 @@ floating-point mode.
> > >
> > >  This pattern is not allowed to @code{FAIL}.
> > >
> > > +@cindex @code{cadd@var{m}@var{n}3} instruction pattern
> > > +@item @samp{cadd@var{m}@var{n}3}
> > > +Perform a vector addition of complex numbers in operand 1 with operand
> > 2
> > > +rotated by @var{m} degrees around the argand plane and storing the
> > result in
> > > +operand 0.  The instruction must perform the operation on data that in
> > GCC lane
> > > +order where the 0th lane holds the real part and the the imaginary part in
> > lane
> > > +1.
> > 
> > So I'm not sure I'm happy with defining this in terms of 'complex
> > numbers'.  At least we nowhere specify how complex numbers are laid out in
> > vector registers.  For V4DF it could be any of {real0,imag0,real1,imag1},
> > {real0,real1, imag0,imag1}, {imag0,real0,imag1,real1} or
> > {imag0,imag1, real0,real1}.  While it's easy to give names to semantics
> > this way I'd prefer to define this in terms of even/odd lanes of a
> > vector.  Maybe it can refer to the aarch64 example of complex number
> > instructions implementing such scheme.
> > 
> > So at _least_ we have to define the layout of complex components
> > in the vector modes (preferably in a target independent way), say
> > even lanes are real parts and odd lanes are imaginary parts?
> > 
> > > +
> > > +The operation is only supported for vector modes @var{n} and with
> > > +rotations @var{m} of 90 or 270.
> > > +
> > > +This pattern is not allowed to @code{FAIL}.
> > > +
> > > +@cindex @code{cmla@var{m}4} instruction pattern
> > > +@item @samp{cmla@var{m}4}
> > > +Perform a vector floating point multiply and accumulate of complex
> > numbers
> > > +in operand 0, operand 1 and operand 2.
> > > +
> > > +The instruction must perform the operation on data that in GCC lane
> > > +order where the 0th lane holds the real part and the the imaginary part in
> > lane
> > > +1.
> > > +
> > > +The operation is only supported for vector modes @var{m}.
> > > +
> > > +This pattern is not allowed to @code{FAIL}.
> > > +
> > > +@cindex @code{cmla_conj@var{m}4} instruction pattern
> > > +@item @samp{cmla_conj@var{m}4}
> > > +Perform a vector floating point multiply and accumulate of complex
> > numbers
> > > +in operand 0, operand 1 and the conjucate of operand 2.
> > > +
> > > +The instruction must perform the operation on data that in GCC lane
> > > +order where the 0th lane holds the real part and the the imaginary part in
> > lane
> > > +1.
> > > +
> > > +The operation is only supported for vector modes @var{m}.
> > > +
> > > +This pattern is not allowed to @code{FAIL}.
> > > +
> > > +@cindex @code{cmls@var{m}4} instruction pattern
> > > +@item @samp{cmls@var{m}4}
> > > +Perform a vector floating point multiply and subtract of complex numbers
> > > +in operand 0, operand 1 and operand 2.
> > > +
> > > +The instruction must perform the operation on data that in GCC lane
> > > +order where the 0th lane holds the real part and the the imaginary part in
> > lane
> > > +1.
> > > +
> > > +The operation is only supported for vector modes @var{m}.
> > > +
> > > +This pattern is not allowed to @code{FAIL}.
> > > +
> > > +@cindex @code{cmls_conj@var{m}4} instruction pattern
> > > +@item @samp{cmls_conj@var{m}4}
> > > +Perform a vector floating point multiply and subtract of complex numbers
> > > +in operand 0, operand 1 and the conjucate of operand 2.
> > > +
> > > +The instruction must perform the operation on data that in GCC lane
> > > +order where the 0th lane holds the real part and the the imaginary part in
> > lane
> > > +1.
> > > +
> > > +The operation is only supported for vector modes @var{m}.
> > > +
> > > +This pattern is not allowed to @code{FAIL}.
> > > +
> > > +@cindex @code{cmul@var{m}4} instruction pattern
> > > +@item @samp{cmul@var{m}4}
> > > +Perform a vector floating point multiplication of complex numbers in
> > operand 0
> > > +and operand 1.
> > > +
> > > +The instruction must perform the operation on data that in GCC lane
> > > +order where the 0th lane holds the real part and the the imaginary part in
> > lane
> > > +1.
> > > +
> > > +The operation is only supported for vector modes @var{m}.
> > > +
> > > +This pattern is not allowed to @code{FAIL}.
> > > +
> > > +@cindex @code{cmul_conj@var{m}4} instruction pattern
> > > +@item @samp{cmul_conj@var{m}4}
> > > +Perform a vector floating point multiplication of complex numbers in
> > operand 0
> > > +and the conjucate of operand 1.
> > > +
> > > +The instruction must perform the operation on data that in GCC lane
> > > +order where the 0th lane holds the real part and the the imaginary part in
> > lane
> > > +1.
> > > +
> > > +The operation is only supported for vector modes @var{m}.
> > > +
> > > +This pattern is not allowed to @code{FAIL}.
> > > +
> > >  @cindex @code{ffs@var{m}2} instruction pattern
> > >  @item @samp{ffs@var{m}2}
> > >  Store into operand 0 one plus the index of the least significant 1-bit
> > > @@ -7409,7 +7500,7 @@ If this pattern is not defined, then a
> > @code{memory_barrier} pattern
> > >  will be emitted, followed by a store of the value to the memory operand.
> > >
> > >  @cindex @code{atomic_compare_and_swap@var{mode}} instruction
> > pattern
> > > -@item @samp{atomic_compare_and_swap@var{mode}}
> > > +@item @samp{atomic_compare_and_swap@var{mode}}
> > >  This pattern, if defined, emits code for an atomic compare-and-swap
> > >  operation with memory model semantics.  Operand 2 is the memory on
> > which
> > >  the atomic operation is performed.  Operand 0 is an output operand which
> > > @@ -7424,7 +7515,7 @@ if the operation fails.
> > >
> > >  If memory referred to in operand 2 contains the value in operand 3, then
> > >  operand 4 is stored in memory pointed to by operand 2 and fencing based
> > on
> > > -the memory model in operand 6 is issued.
> > > +the memory model in operand 6 is issued.
> > >
> > >  If memory referred to in operand 2 does not contain the value in operand
> > 3,
> > >  then fencing based on the memory model in operand 7 is issued.
> > > @@ -7500,9 +7591,9 @@ none of these are available a compare-and-swap
> > loop will be used.
> > >  @itemx @samp{atomic_fetch_or@var{mode}},
> > @samp{atomic_fetch_and@var{mode}}
> > >  @itemx @samp{atomic_fetch_xor@var{mode}},
> > @samp{atomic_fetch_nand@var{mode}}
> > >  These patterns emit code for an atomic operation on memory with
> > memory
> > > -model semantics, and return the original value. Operand 0 is an output
> > > -operand which contains the value of the memory location before the
> > > -operation was performed.  Operand 1 is the memory on which the atomic
> > > +model semantics, and return the original value. Operand 0 is an output
> > > +operand which contains the value of the memory location before the
> > > +operation was performed.  Operand 1 is the memory on which the atomic
> > >  operation is performed.  Operand 2 is the second operand to the binary
> > >  operator.  Operand 3 is the memory model to be used by the operation.
> > >
> > > @@ -7859,7 +7950,7 @@ simplified) from the PDP-11 target:
> > >          (plus:HI (match_dup 0)
> > >                (const_int -1)))]
> > >    ""
> > > -
> > > +
> > >    @{
> > >      if (which_alternative == 0)
> > >        return "sob %0,%l1";
> > > diff --git a/gcc/doc/passes.texi b/gcc/doc/passes.texi
> > > index
> > a5ae4143a8c1293e674b499120372ee5fe5c412b..c86df5cd843084a5b7933ef99
> > a23386891a7b0c1 100644
> > > --- a/gcc/doc/passes.texi
> > > +++ b/gcc/doc/passes.texi
> > > @@ -709,7 +709,8 @@ loop.
> > >  The pass is implemented in @file{tree-vectorizer.c} (the main driver),
> > >  @file{tree-vect-loop.c} and @file{tree-vect-loop-manip.c} (loop specific
> > parts
> > >  and general loop utilities), @file{tree-vect-slp} (loop-aware SLP
> > > -functionality), @file{tree-vect-stmts.c} and @file{tree-vect-data-refs.c}.
> > > +functionality), @file{tree-vect-stmts.c}, @file{tree-vect-data-refs.c} and
> > > +@file{tree-vect-slp-patterns.c} containing the SLP pattern matcher.
> > >  Analysis of data references is in @file{tree-data-ref.c}.
> > >
> > >  SLP Vectorization.  This pass performs vectorization of straight-line code.
> > The
> > > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > > index
> > 310d37aa53819791b5df1683afca831f08e5892a..acb7d9f3bdc757437d5492a652
> > 144ba31c2ef702 100644
> > > --- a/gcc/internal-fn.def
> > > +++ b/gcc/internal-fn.def
> > > @@ -277,12 +277,21 @@ DEF_INTERNAL_FLT_FN (SCALB, ECF_CONST,
> > scalb, binary)
> > >  DEF_INTERNAL_FLT_FLOATN_FN (FMIN, ECF_CONST, fmin, binary)
> > >  DEF_INTERNAL_FLT_FLOATN_FN (FMAX, ECF_CONST, fmax, binary)
> > >  DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONST, xorsign, binary)
> > > +DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT90, ECF_CONST, cadd90,
> > binary)
> > > +DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST,
> > cadd270, binary)
> > > +DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL, ECF_CONST, cmul, binary)
> > > +DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL_CONJ, ECF_CONST,
> > cmul_conj, binary)
> > > +
> > >
> > >  /* FP scales.  */
> > >  DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
> > >
> > >  /* Ternary math functions.  */
> > >  DEF_INTERNAL_FLT_FLOATN_FN (FMA, ECF_CONST, fma, ternary)
> > > +DEF_INTERNAL_OPTAB_FN (COMPLEX_FMA, ECF_CONST, cmla, ternary)
> > > +DEF_INTERNAL_OPTAB_FN (COMPLEX_FMA_CONJ, ECF_CONST,
> > cmla_conj, ternary)
> > > +DEF_INTERNAL_OPTAB_FN (COMPLEX_FMS, ECF_CONST, cmls, ternary)
> > > +DEF_INTERNAL_OPTAB_FN (COMPLEX_FMS_CONJ, ECF_CONST,
> > cmls_conj, ternary)
> > >
> > >  /* Unary integer ops.  */
> > >  DEF_INTERNAL_INT_FN (CLRSB, ECF_CONST | ECF_NOTHROW, clrsb,
> > unary)
> > > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > > index
> > 78409aa14537d259bf90277751aac00d452a0d3f..19db9c00896cd08adfd20a0166
> > 9990bbbebd79f1 100644
> > > --- a/gcc/optabs.def
> > > +++ b/gcc/optabs.def
> > > @@ -290,6 +290,14 @@ OPTAB_D (atan_optab, "atan$a2")
> > >  OPTAB_D (atanh_optab, "atanh$a2")
> > >  OPTAB_D (copysign_optab, "copysign$F$a3")
> > >  OPTAB_D (xorsign_optab, "xorsign$F$a3")
> > > +OPTAB_D (cadd90_optab, "cadd90$a3")
> > > +OPTAB_D (cadd270_optab, "cadd270$a3")
> > > +OPTAB_D (cmul_optab, "cmul$a3")
> > > +OPTAB_D (cmul_conj_optab, "cmul_conj$a3")
> > > +OPTAB_D (cmla_optab, "cmla$a4")
> > > +OPTAB_D (cmla_conj_optab, "cmla_conj$a4")
> > > +OPTAB_D (cmls_optab, "cmls$a4")
> > > +OPTAB_D (cmls_conj_optab, "cmls_conj$a4")
> > >  OPTAB_D (cos_optab, "cos$a2")
> > >  OPTAB_D (cosh_optab, "cosh$a2")
> > >  OPTAB_D (exp10_optab, "exp10$a2")
> > > diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
> > > new file mode 100644
> > > index
> > 0000000000000000000000000000000000000000..7a2c64aeb9f212cd3ee18adf11
> > 0e6ac6696249fd
> > > --- /dev/null
> > > +++ b/gcc/tree-vect-slp-patterns.c
> > > @@ -0,0 +1,1844 @@
> > > +/* SLP - Pattern matcher on SLP trees
> > > +   Copyright (C) 2020 Free Software Foundation, Inc.
> > > +
> > > +This file is part of GCC.
> > > +
> > > +GCC is free software; you can redistribute it and/or modify it under
> > > +the terms of the GNU General Public License as published by the Free
> > > +Software Foundation; either version 3, or (at your option) any later
> > > +version.
> > > +
> > > +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> > > +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> > > +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
> > License
> > > +for more details.
> > > +
> > > +You should have received a copy of the GNU General Public License
> > > +along with GCC; see the file COPYING3.  If not see
> > > +<http://www.gnu.org/licenses/>.  */
> > > +
> > > +#include "config.h"
> > > +#include "system.h"
> > > +#include "coretypes.h"
> > > +#include "backend.h"
> > > +#include "target.h"
> > > +#include "rtl.h"
> > > +#include "tree.h"
> > > +#include "gimple.h"
> > > +#include "tree-pass.h"
> > > +#include "ssa.h"
> > > +#include "optabs-tree.h"
> > > +#include "insn-config.h"
> > > +#include "recog.h"		/* FIXME: for insn_data */
> > > +#include "fold-const.h"
> > > +#include "stor-layout.h"
> > > +#include "gimple-iterator.h"
> > > +#include "cfgloop.h"
> > > +#include "tree-vectorizer.h"
> > > +#include "langhooks.h"
> > > +#include "gimple-walk.h"
> > > +#include "dbgcnt.h"
> > > +#include "tree-vector-builder.h"
> > > +#include "vec-perm-indices.h"
> > > +#include "gimple-fold.h"
> > > +#include "internal-fn.h"
> > > +
> > > +/* SLP Pattern matching mechanism.
> > > +
> > > +  This extension to the SLP vectorizer allows one to transform the
> > generated SLP
> > > +  tree based on any pattern.  The difference between this and the normal
> > vect
> > > +  pattern matcher is that unlike the former, this matcher allows you to
> > match
> > > +  with instructions that do not belong to the same SSA dominator graph.
> > > +
> > > +  The only requirement that this pattern matcher has is that you are only
> > > +  only allowed to either match an entire group or none.
> > > +
> > > +  The pattern matcher currently only allows you to perform replacements
> > to
> > > +  internal functions.
> > > +
> > > +  Once the patterns are matched it is one way, these cannot be undone.  It
> > is
> > > +  currently not supported to match patterns recursively.
> > > +
> > > +  To add a new pattern, implement the vect_pattern class and add the
> > type to
> > > +  slp_patterns.
> > > +
> > > +*/
> > > +
> > >
> > +/*********************************************************
> > **********************
> > > + * vect_pattern class
> > > +
> > **********************************************************
> > ********************/
> > > +
> > > +/* Default implementation of recognize that peforms matching, validation
> > and
> > > +   replacement of nodes but that can be overriden if required.  */
> > > +
> > > +bool
> > > +vect_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
> > > +			 vec_info *vinfo)
> > > +{
> > > +  if (this->matches (perm_cache))
> > > +    {
> > > +      tree vectype = SLP_TREE_VECTYPE (*this->m_node);
> > > +      if (dump_enabled_p ())
> > > +	dump_printf_loc (MSG_NOTE, vect_location,
> > > +			 "Found %s pattern in SLP tree\n",
> > > +			 this->get_name ());
> > > +
> > > +      if (this->is_optab_supported_p (vectype, OPTIMIZE_FOR_SPEED))
> > > +	{
> > > +	  if (dump_enabled_p ())
> > > +	    dump_printf_loc (MSG_NOTE, vect_location,
> > > +			     "Target supports %s vectorization with
> > mode %T\n",
> > > +			     internal_fn_name (this->get_ifn ()), vectype);
> > > +	}
> > > +      else
> > > +	{
> > > +	  if (dump_enabled_p ())
> > > +	    {
> > > +	      if (!vectype)
> > > +		dump_printf_loc (MSG_NOTE, vect_location,
> > > +				 "Target does not support vector type
> > for %T\n",
> > > +				 SLP_TREE_DEF_TYPE (*this->m_node));
> > > +	      else
> > > +		dump_printf_loc (MSG_NOTE, vect_location,
> > > +				 "Target does not support %s for vector type
> > "
> > > +				 "%T\n", internal_fn_name (this->get_ifn ()),
> > > +				 vectype);
> > > +	    }
> > > +          return false;
> > > +	}
> > > +    }
> > > +  else
> > > +    return false;
> > > +
> > > +  if (!this->validate_p (perm_cache))
> > > +    {
> > > +      if (dump_enabled_p ())
> > > +	dump_printf_loc (MSG_NOTE, vect_location,
> > > +			"transformation for %s not valid due to post "
> > > +			"condition\n",
> > > +			internal_fn_name (this->get_ifn ()));
> > > +      return false;
> > > +    }
> > > +
> > > +  if (dump_enabled_p ())
> > > +    dump_printf_loc (MSG_NOTE, vect_location, "Creating vec
> > patterns\n");
> > > +
> > > +  gcall* call_stmt = this->build (vinfo);
> > > +  if (dump_enabled_p ())
> > > +    dump_printf_loc (MSG_NOTE, vect_location, "\t %p stmt: %G",
> > > +		     *this->m_node, call_stmt);
> > > +
> > > +  return true;
> > > +}
> > > +
> > >
> > +/*********************************************************
> > **********************
> > > + * General helper types
> > > +
> > **********************************************************
> > ********************/
> > > +
> > > +/* The COMPLEX_OPERATION enum denotes the possible pair of
> > operations that can
> > > +   be matched when looking for expressions that we are interested
> > matching for
> > > +   complex numbers addition and mla.  */
> > > +
> > > +typedef enum _complex_operation : unsigned {
> > > +  PLUS_PLUS,
> > > +  MINUS_PLUS,
> > > +  PLUS_MINUS,
> > > +  MULT_MULT,
> > > +  CMPLX_NONE
> > > +} complex_operation_t;
> > > +
> > > +/* A simple pair type used for ordering of combine operations.  */
> > > +
> > > +typedef struct map_ { int a; int b; } map_t;
> > > +
> > >
> > +/*********************************************************
> > **********************
> > > + * General helper functions
> > > +
> > **********************************************************
> > ********************/
> > > +
> > > +/* Helper function of linear_loads_p that checks to see if the load
> > permutation
> > > +   is sequential and in monotonically increasing order of loads with no gaps.
> > > +*/
> > > +
> > > +static inline bool
> > > +is_linear_load_p (load_permutation_t loads)
> > > +{
> > > +  if (loads.length() == 0)
> > > +    return false;
> > > +
> > > +  unsigned leader = loads[0];
> > > +  unsigned load, i;
> > > +  FOR_EACH_VEC_ELT_FROM (loads, i, load, 1)
> > > +    if (load != ++leader)
> > > +      return false;
> > > +  return true;
> > > +}
> > > +
> > > +
> > > +/* Check to see if all loads rooted in ROOT are linear.  Linearity is
> > > +   defined as having no gaps between values loaded.  */
> > > +
> > > +static load_permutation_t
> > > +linear_loads_p (slp_tree_to_load_perm_map_t *perm_cache, slp_tree
> > root,
> > > +		bool *linear)
> > > +{
> > > +  *linear = false;
> > > +  if (!root)
> > > +    return vNULL;
> > > +
> > > +  unsigned i;
> > > +  load_permutation_t loads = vNULL;
> > > +  load_permutation_t *tmp;
> > > +
> > > +  if ((tmp = perm_cache->get (root)) != NULL)
> > > +    {
> > > +      *linear = is_linear_load_p (*tmp);
> > > +      return *tmp;
> > > +    }
> > > +
> > > +  perm_cache->put (root, vNULL);
> > > +
> > > +  /* If it's a load node, then just read the load permute.  */
> > > +  if (SLP_TREE_LOAD_PERMUTATION (root).exists ())
> > > +    {
> > > +      loads = SLP_TREE_LOAD_PERMUTATION (root);
> > > +      perm_cache->put (root, loads);
> > > +      if (!is_linear_load_p (loads))
> > > +	return loads;
> > > +    }
> > > +  else if (SLP_TREE_DEF_TYPE (root) == vect_external_def)
> > > +    {
> > > +       loads.create (SLP_TREE_LANES (root));
> > > +       tree op;
> > > +       FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_OPS (root), i, op)
> > > +	 {
> > > +	   gimple *defstmt = SSA_NAME_DEF_STMT (op);
> > > +	   if (!is_gimple_assign (defstmt))
> > > +	     return vNULL;
> > > +
> > > +	   switch (gimple_assign_rhs_code (defstmt))
> > > +	   {
> > > +	     case IMAGPART_EXPR:
> > > +	       loads.safe_push (1);
> > > +	       break;
> > > +	     case REALPART_EXPR:
> > > +	       loads.safe_push (0);
> > > +	       break;
> > > +	     default:
> > > +	       {
> > > +		 loads.release ();
> > > +		 return vNULL;
> > > +	       }
> > > +	   }
> > > +	 }
> > > +
> > > +       perm_cache->put (root, loads);
> > > +       if (!is_linear_load_p (loads))
> > > +	 return loads;
> > > +    }
> > > +  else if (SLP_TREE_DEF_TYPE (root) != vect_internal_def)
> > > +    return vNULL;
> > > +
> > > +
> > > +
> > > +  auto_vec<load_permutation_t> all_loads;
> > > +  bool is_perm = SLP_TREE_LANE_PERMUTATION (root).exists ();
> > > +
> > > +  slp_tree child;
> > > +  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (root), i, child)
> > > +    {
> > > +      /* Seed it with a value to prevent infinite recursions.  */
> > > +      loads = linear_loads_p (perm_cache, child, linear);
> > > +      if ((!*linear && !is_perm) || !loads.exists ())
> > > +	return loads;
> > > +
> > > +      all_loads.safe_push (loads);
> > > +    }
> > > +
> > > +  if (is_perm)
> > > +    {
> > > +      lane_permutation_t perm = SLP_TREE_LANE_PERMUTATION (root);
> > > +      load_permutation_t nloads;
> > > +      nloads.create (SLP_TREE_LANES (root));
> > > +      nloads.quick_grow (SLP_TREE_LANES (root));
> > > +      for (i = 0; i < SLP_TREE_LANES (root); i++)
> > > +	nloads[i] = all_loads[perm[i].first][perm[i].second];
> > > +
> > > +      perm_cache->put (root, nloads);
> > > +      if (!is_linear_load_p (nloads))
> > > +	return nloads;
> > > +      loads = nloads;
> > > +    }
> > > +
> > > +  perm_cache->put (root, loads);
> > > +  *linear = true;
> > > +  return loads;
> > > +}
> > > +
> > > +/* Builds a permutation group from the operands in OPS and stores it in
> > BLOCKS.
> > > +   The group describes how to combine the operators to get a valid linear
> > node.
> > > +
> > > +   This is used when combining multiple children from a two_operators
> > node into
> > > +   one using a lane permute to select the appropriate lane. As an example
> > the
> > > +   permute { [0 0] [1 4] [2 2] [3 3] [1 4] [5 5] } says the nodes which occur
> > > +   twice in a group, e.g [0 0] only needs itself to possibly be made linear
> > > +   whereas [1 4] means to combine the nodes 1 and 4. */
> > > +
> > > +static void
> > > +vect_build_perm_groups (slp_tree_to_load_perm_map_t *perm_cache,
> > > +			map_t *blocks, vec<slp_tree> ops)
> > > +{
> > > +  slp_tree op;
> > > +  unsigned i;
> > > +  bool is_linear = false;
> > > +  unsigned min_eq = -1, max_eq = 0;
> > > +  unsigned min_idx = 0, max_idx = 0;
> > > +  FOR_EACH_VEC_ELT (ops, i, op)
> > > +    {
> > > +      load_permutation_t perms = linear_loads_p (perm_cache, op,
> > &is_linear);
> > > +      unsigned x, imin = -1, imax = 0;
> > > +      for (x = 0; x < perms.length () && !is_linear; x++)
> > > +	{
> > > +	  imin = MIN (imin, perms[x]);
> > > +	  imax = MAX (imax, perms[x]);
> > > +	}
> > > +
> > > +	if (imin != imax || perms.length () == 0 || is_linear)
> > > +	  blocks[i] = {i, i};
> > > +	else
> > > +	  {
> > > +	    if (imin <= min_eq)
> > > +	      {
> > > +		min_eq = imin;
> > > +		min_idx = i;
> > > +	      }
> > > +
> > > +	    if (imin >= max_eq)
> > > +	      {
> > > +		max_eq = imin;
> > > +		max_idx = i;
> > > +	      }
> > > +	  }
> > > +    }
> > > +
> > > +  /* Now fill in the gap.  */
> > > +  blocks[min_idx] = {min_idx, max_idx};
> > > +  blocks[max_idx] = {min_idx, max_idx};
> > > +
> > > +  if (dump_enabled_p ())
> > > +    {
> > > +      dump_printf_loc (MSG_NOTE, vect_location, "pattern group: { ");
> > > +      for (i = 0; i < ops.length (); i++)
> > > +	dump_printf (MSG_NOTE,"[%d %d] ", blocks[i].a, blocks[i].b);
> > > +      dump_printf (MSG_NOTE,"}\n");
> > > +    }
> > > +
> > > +}
> > > +
> > > +/* This function attempts to make a node rooted in NODE with parent
> > PARENT
> > > +   linear.  If the node if already linear than the node itself is returned
> > > +   in RESULT.
> > > +
> > > +   If the node is not linear then a new VEC_PERM_EXPR node is created
> > with a
> > > +   lane permute that when applied will make the node linear.   If such a
> > > +   permute cannot be created then FALSE is returned from the function.
> > > +
> > > +   Here linearity is defined as having a sequential, monotically increasing
> > > +   load position inside the load permute generated by the loads reachable
> > from
> > > +   NODE.  */
> > > +
> > > +static bool
> > > +vect_slp_make_linear (slp_tree_to_load_perm_map_t *perm_cache,
> > > +		      slp_tree parent, slp_tree node, slp_tree *result)
> > > +{
> > > +  bool is_linear = false;
> > > +  unsigned x, val;
> > > +  load_permutation_t load_perm = linear_loads_p (perm_cache, node,
> > &is_linear);
> > > +  if (is_linear)
> > > +    {
> > > +      *result = node;
> > > +      return true;
> > > +    }
> > > +
> > > +  /* Attempt to linearise the permute.  */
> > > +  vec<std::pair<unsigned, unsigned> > zipped;
> > > +  zipped.create (load_perm.length ());
> > > +  FOR_EACH_VEC_ELT (load_perm, x, val)
> > > +    zipped.quick_push (std::make_pair (val, x));
> > > +
> > > +  typedef const std::pair<unsigned, unsigned>* cmp_t;
> > > +  zipped.qsort ([](const void *a, const void *b) -> int
> > > +    { return (int)((cmp_t)a)->first - (int)((cmp_t)b)->first; });
> > > +
> > > +  /* Verify if we have a linear permute sequence.  */
> > > +  if (zipped.length () > 0)
> > > +    {
> > > +      unsigned leader = zipped[0].first;
> > > +      for (x = 1; x < zipped.length (); x++)
> > > +	if(!(is_linear = (zipped[x].first == ++leader)))
> > > +	  break;
> > > +    }
> > > +
> > > +  if (!is_linear)
> > > +    {
> > > +      if (dump_enabled_p ())
> > > +	dump_printf_loc (MSG_NOTE, vect_location,
> > > +			"Loads could not be made linear %p\n",
> > > +			node);
> > > +      zipped.release ();
> > > +      return false;
> > > +  }
> > > +
> > > +  for (x = 0; x < zipped.length (); x++)
> > > +    zipped[x].first = 0;
> > > +
> > > +  /* Create the new permute node and store it instead.  */
> > > +  slp_tree vnode = vect_create_new_slp_node (vNULL, 1);
> > > +  SLP_TREE_CODE (vnode) = VEC_PERM_EXPR;
> > > +  SLP_TREE_LANE_PERMUTATION (vnode) = zipped;
> > > +  SLP_TREE_VECTYPE (vnode) = SLP_TREE_VECTYPE (parent);
> > > +  SLP_TREE_CHILDREN (vnode).quick_push (node);
> > > +  SLP_TREE_REF_COUNT (vnode) = 1;
> > > +  SLP_TREE_LANES (vnode) = SLP_TREE_LANES (node);
> > > +  SLP_TREE_REPRESENTATIVE (vnode) = SLP_TREE_REPRESENTATIVE
> > (parent);
> > > +  *result = vnode;
> > > +  return is_linear;
> > > +}
> > > +
> > > +/* Helper utility to check to see if the permutation PERM is one that can
> > be
> > > +   used in a node combination operation.  This is defined as the permute
> > not
> > > +   having all the elements being the same. e.g [0 0].  */
> > > +
> > > +static inline bool
> > > +vect_can_combine_node_p (load_permutation_t perm, bool is_linear)
> > > +{
> > > +  if (is_linear)
> > > +    return false;
> > > +
> > > +  unsigned i, x;
> > > +  FOR_EACH_VEC_ELT (perm, i, x)
> > > +    if (perm[0] != x)
> > > +      return false;
> > > +
> > > +  return true;
> > > +}
> > > +
> > > +/* This function combines the nodes in MAP together to make a new
> > node using a
> > > +   lane permute.  The nodes to combine are stored in ENTRIES and the
> > resulting
> > > +   node is returned in RESULT.
> > > +
> > > +   If the nodes are already linear then this function fails and returns FALSE.
> > > +   Otherwise it returns the new node and TRUE.  */
> > > +
> > > +static bool
> > > +vect_slp_make_combine_linear (slp_tree_to_load_perm_map_t
> > *perm_cache,
> > > +			      slp_tree parent, vec<slp_tree> entries, map_t
> > map,
> > > +			      slp_tree *result)
> > > +{
> > > +  if (map.a == map.b)
> > > +    return false;
> > > +
> > > +  slp_tree node_a = entries[map.a];
> > > +  slp_tree node_b = entries[map.b];
> > > +
> > > +  bool is_a_linear = false;
> > > +  bool is_b_linear = false;
> > > +
> > > +  load_permutation_t load_perm_a
> > > +    = linear_loads_p (perm_cache, node_a, &is_a_linear);
> > > +  if (!vect_can_combine_node_p (load_perm_a, is_a_linear))
> > > +    return false;
> > > +
> > > +  load_permutation_t load_perm_b
> > > +    = linear_loads_p (perm_cache, node_b, &is_b_linear);
> > > +  if (!vect_can_combine_node_p (load_perm_b, is_b_linear))
> > > +    return false;
> > > +
> > > +  if (!load_perm_a.exists () || !load_perm_b.exists ())
> > > +    return false;
> > > +
> > > +  /* Now we need to figure which node is first.  */
> > > +  auto_vec<slp_tree> nodes;
> > > +  nodes.create (2);
> > > +  vec<std::pair<unsigned, unsigned> > perm;
> > > +  perm.create (2);
> > > +  if (load_perm_a[0] < load_perm_b[0])
> > > +    {
> > > +      perm.quick_push (std::make_pair (0, 0));
> > > +      perm.quick_push (std::make_pair (1, 0));
> > > +    }
> > > +  else
> > > +    {
> > > +      perm.quick_push (std::make_pair (1, 0));
> > > +      perm.quick_push (std::make_pair (0, 0));
> > > +    }
> > > +
> > > +  nodes.quick_push (node_a);
> > > +  nodes.quick_push (node_b);
> > > +
> > > +  slp_tree vnode = vect_create_new_slp_node (vNULL, 1);
> > > +  SLP_TREE_CODE (vnode) = VEC_PERM_EXPR;
> > > +  SLP_TREE_LANE_PERMUTATION (vnode) = perm;
> > > +  SLP_TREE_VECTYPE (vnode) = SLP_TREE_VECTYPE (parent);
> > > +  SLP_TREE_CHILDREN (vnode).safe_splice (nodes);
> > > +  SLP_TREE_REF_COUNT (vnode) = 1;
> > > +  SLP_TREE_LANES (vnode) = SLP_TREE_LANES (parent);
> > > +  SLP_TREE_REPRESENTATIVE (vnode) = SLP_TREE_REPRESENTATIVE
> > (parent);
> > > +  *result = vnode;
> > > +  return true;
> > > +}
> > > +
> > > +/* Checks to see of the expression represented by NODE is a gimple
> > assign with
> > > +   code CODE.  */
> > > +
> > > +static inline bool
> > > +vect_match_expression_p (slp_tree node, tree_code code)
> > > +{
> > > +  if (!node
> > > +      || !SLP_TREE_REPRESENTATIVE (node))
> > > +    return false;
> > > +
> > > +  gimple* expr = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE
> > (node));
> > > +  if (!is_gimple_assign (expr)
> > > +      || gimple_assign_rhs_code (expr) != code)
> > > +    return false;
> > > +
> > > +  return true;
> > > +}
> > > +
> > > +/* Checks to see if the expression represented by NODE is a call to the
> > internal
> > > +   function FN.  */
> > > +
> > > +static inline bool
> > > +vect_match_call_p (slp_tree node, internal_fn fn)
> > > +{
> > > +  if (!node
> > > +      || !SLP_TREE_REPRESENTATIVE (node))
> > > +    return false;
> > > +
> > > +  gimple* expr = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE
> > (node));
> > > +  if (!gimple_call_internal_p (expr, fn))
> > > +    return false;
> > > +
> > > +   return true;
> > > +}
> > > +
> > > +/* Checks to see if NODE is a two_operands node whom's operands are
> > calls to the
> > > +   internal function FN.  */
> > > +
> > > +static bool
> > > +vect_match_two_call_p (slp_tree node, internal_fn fn)
> > > +{
> > > +  if (!SLP_TREE_REPRESENTATIVE (node)
> > > +      || SLP_TREE_CODE (node) != VEC_PERM_EXPR
> > > +      || SLP_TREE_CHILDREN (node).length () != 2)
> > > +    return false;
> > > +
> > > +  vec<slp_tree> children = SLP_TREE_CHILDREN (node);
> > > +
> > > +  return vect_match_call_p (children[0], fn)
> > > +	 && vect_match_call_p (children[1], fn);
> > > +}
> > > +
> > > +/* Check if the given lane permute in PERMUTES matches an alternating
> > sequence
> > > +   of {P0 P1 P0 P1 ...}.  This to account for unrolled loops.  Further mode
> > > +   there resulting permute must be linear.   */
> > > +
> > > +static bool
> > > +vect_check_lane_permute (lane_permutation_t &permutes,
> > > +			 unsigned p0, unsigned p1)
> > > +{
> > > +  if (permutes.length () == 0)
> > > +    return false;
> > > +
> > > +  unsigned val[2] = {p0, p1};
> > > +  unsigned seed = permutes[0].second;
> > > +  for (unsigned i = 0; i < permutes.length (); i++)
> > > +    if (permutes[i].first != val[i % 2]
> > > +	|| permutes[i].second != seed++)
> > > +      return false;
> > > +
> > > +  return true;
> > > +}
> > > +
> > > +/* This function will match the two gimple expressions representing
> > NODE1 and
> > > +   NODE2 in parallel and returns the pair operation that represents the two
> > > +   expressions in the two statements.
> > > +
> > > +   If match is successful then the corresponding complex_operation is
> > > +   returned and the arguments to the two matched operations are
> > returned in OPS.
> > > +
> > > +   If TWO_OPERANDS it is expected that the LANES of the parent
> > VEC_PERM select
> > > +   from the two nodes alternatingly.
> > > +
> > > +   If unsuccessful then CMPLX_NONE is returned and OPS is untouched.
> > > +
> > > +   e.g. the following gimple statements
> > > +
> > > +   stmt 0 _39 = _37 + _12;
> > > +   stmt 1 _6 = _38 - _36;
> > > +
> > > +   will return PLUS_MINUS along with OPS containing {_37, _12, _38, _36}.
> > > +*/
> > > +
> > > +static complex_operation_t
> > > +vect_detect_pair_op (slp_tree node1, slp_tree node2,
> > lane_permutation_t &lanes,
> > > +		     bool two_operands = true, vec<slp_tree> *ops = NULL)
> > > +{
> > > +  complex_operation_t result = CMPLX_NONE;
> > > +
> > > +  if (vect_match_expression_p (node1, MINUS_EXPR)
> > > +      && vect_match_expression_p (node2, PLUS_EXPR)
> > > +      && (!two_operands || vect_check_lane_permute (lanes, 0, 1)))
> > > +    result = MINUS_PLUS;
> > > +  else if (vect_match_expression_p (node1, PLUS_EXPR)
> > > +	   && vect_match_expression_p (node2, MINUS_EXPR)
> > > +	   && (!two_operands || vect_check_lane_permute (lanes, 0, 1)))
> > > +    result = PLUS_MINUS;
> > > +  else if (vect_match_expression_p (node1, PLUS_EXPR)
> > > +	   && vect_match_expression_p (node2, PLUS_EXPR))
> > > +    result = PLUS_PLUS;
> > > +  else if (vect_match_expression_p (node1, MULT_EXPR)
> > > +	   && vect_match_expression_p (node2, MULT_EXPR))
> > > +    result = MULT_MULT;
> > > +
> > > +  if (result != CMPLX_NONE && ops != NULL)
> > > +    {
> > > +      ops->create (2);
> > > +      ops->quick_push (node1);
> > > +      ops->quick_push (node2);
> > > +    }
> > > +  return result;
> > > +}
> > > +
> > > +/* Overload of vect_detect_pair_op that matches against the
> > representative
> > > +   statements in the children of NODE.  It is expected that NODE has
> > exactly
> > > +   two children and when TWO_OPERANDS then NODE must be a
> > VEC_PERM.  */
> > > +
> > > +static complex_operation_t
> > > +vect_detect_pair_op (slp_tree node, bool two_operands = true,
> > > +		     vec<slp_tree> *ops = NULL)
> > > +{
> > > +  if (!two_operands && SLP_TREE_CODE (node) == VEC_PERM_EXPR)
> > > +    return CMPLX_NONE;
> > > +
> > > +  if (SLP_TREE_CHILDREN (node).length () != 2)
> > > +    return CMPLX_NONE;
> > > +
> > > +  vec<slp_tree> children = SLP_TREE_CHILDREN (node);
> > > +  lane_permutation_t &lanes = SLP_TREE_LANE_PERMUTATION (node);
> > > +
> > > +  return vect_detect_pair_op (children[0], children[1], lanes,
> > two_operands,
> > > +			      ops);
> > > +}
> > > +
> > > +/* This function marks every statement that is being replaced during the
> > > +   the pattern matching as PURE.  Normally when replacing a statement
> > due
> > > +   to a pattern we add the statement to the
> > STMT_VINFO_PATTERN_DEF_SEQ of
> > > +   the pattern that is replacing them.  In this case however this won't
> > > +   work as when doing the replacement we are changing the nodes that
> > are
> > > +   used by the statements.  This means that when vectorized the SSA chain
> > > +   is different than in the BB.
> > > +
> > > +   Declaring the statements as part of the sequence will then cause SSA
> > > +   verification to fail as we may refer to statements that were not in the
> > > +   original USE-DEF chain of the statement we are replacing.
> > > +
> > > +   The relevancy of the statements are saved and are later restored if SLP
> > is to
> > > +   be dissolved.  */
> > > +
> > > +static void
> > > +vect_mark_stmts_as_in_pattern (hash_set<slp_tree> *cache, slp_tree
> > node)
> > > +{
> > > +  if (cache->add (node))
> > > +    return;
> > > +
> > > +  unsigned i;
> > > +  stmt_vec_info stmt_info;
> > > +  slp_tree child;
> > > +
> > > +  FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info)
> > > +    {
> > > +      if (gimple_assign_load_p (STMT_VINFO_STMT (stmt_info)))
> > > +       return;
> > > +
> > > +      vect_save_relevancy (stmt_info);
> > > +      STMT_VINFO_SLP_VECT_ONLY (stmt_info) = true;
> > > +      STMT_SLP_TYPE (stmt_info) = pure_slp;
> > > +    }
> > > +
> > > +  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
> > > +    vect_mark_stmts_as_in_pattern (cache, child);
> > > +}
> > > +
> > > +
> > >
> > +/*********************************************************
> > **********************
> > > + * complex_pattern class
> > > +
> > **********************************************************
> > ********************/
> > > +
> > > +/* SLP Complex Numbers pattern matching.
> > > +
> > > +  As an example, the following simple loop:
> > > +
> > > +    double a[restrict N]; double b[restrict N]; double c[restrict N];
> > > +
> > > +    for (int i=0; i < N; i+=2)
> > > +    {
> > > +      c[i] = a[i] - b[i+1];
> > > +      c[i+1] = a[i+1] + b[i];
> > > +    }
> > > +
> > > +  which represents a complex addition on with a rotation of 90* around
> > the
> > > +  argand plane. i.e. if `a` and `b` were complex numbers then this would be
> > the
> > > +  same as `a + (b * I)`.
> > > +
> > > +  Here the expressions for `c[i]` and `c[i+1]` are independent but have to
> > be
> > > +  both recognized in order for the pattern to work.  As an SLP tree this is
> > > +  represented as
> > > +
> > > +                +--------------------------------+
> > > +                |       stmt 0 *_9 = _10;        |
> > > +                |       stmt 1 *_15 = _16;       |
> > > +                +--------------------------------+
> > > +                                |
> > > +                                |
> > > +                                v
> > > +                +--------------------------------+
> > > +                |     stmt 0 _10 = _4 - _8;      |
> > > +                |    stmt 1 _16 = _12 + _14;     |
> > > +                | lane permutation { 0[0] 1[1] } |
> > > +                +--------------------------------+
> > > +                            |        |
> > > +                            |        |
> > > +                            |        |
> > > +               +-----+      |        |      +-----+
> > > +               |     |      |        |      |     |
> > > +         +-----| { } |<-----+        +----->| { } --------+
> > > +         |     |     |   +------------------|     |       |
> > > +         |     +-----+   |                  +-----+       |
> > > +         |        |      |                                |
> > > +         |        |      |                                |
> > > +         |        +------|------------------+             |
> > > +         |               |                  |             |
> > > +         v               v                  v             v
> > > +     +--------------------------+     +--------------------------------+
> > > +     |     stmt 0 _8 = *_7;     |     |        stmt 0 _4 = *_3;        |
> > > +     |    stmt 1 _14 = *_13;    |     |       stmt 1 _12 = *_11;       |
> > > +     | load permutation { 1 0 } |     |    load permutation { 0 1 }    |
> > > +     +--------------------------+     +--------------------------------+
> > > +
> > > +  The pattern matcher allows you to replace both statements 0 and 1 or
> > none at
> > > +  all.  Because this operation is a two operands operation the actual nodes
> > > +  being replaced are those in the { } nodes.  The actual scalar statements
> > > +  themselves are not replaced or used during the matching but instead the
> > > +  SLP_TREE_REPRESENTATIVE statements are inspected.  You are also
> > allowed to
> > > +  replace and match on any number of nodes.
> > > +
> > > +  Because the pattern matcher matches on the representative statement
> > for the
> > > +  SLP node the case of two_operators it allows you to match the children
> > of the
> > > +  node.  This is done using the method `recognize ()`.
> > > +
> > > +*/
> > > +
> > > +/* The complex_pattern class contains common code for pattern
> > matchers that work
> > > +   on complex numbers.  These provide functionality to allow de-
> > construction and
> > > +   validation of sequences depicting/transforming REAL and IMAG pairs.  */
> > > +
> > > +class complex_pattern : public vect_pattern
> > > +{
> > > +  protected:
> > > +    vec<slp_tree> *m_nodes = NULL;
> > > +    complex_pattern (slp_tree *node)
> > > +      : vect_pattern (node)
> > > +    { }
> > > +
> > > +  public:
> > > +    bool validate_p (slp_tree_to_load_perm_map_t *);
> > > +    gcall *build (vec_info *);
> > > +};
> > > +
> > > +/* The post transform and validation function for the complex number
> > > +   patterns.  This will re-arrange the tree and re-organize the nodes such
> > > +   that they can be used by the complex number instructions that are to be
> > > +   created.  It does this by doing the following steps:
> > > +
> > > +   For each node that isn't already linear a new permute node is created
> > which
> > > +   when applied makes the node linear.  If such permutation does not exist
> > then
> > > +   the function returns FALSE.  For nodes that are already linear no work is
> > > +   done.   */
> > > +
> > > +bool
> > > +complex_pattern::validate_p (slp_tree_to_load_perm_map_t
> > *perm_cache)
> > > +{
> > > +  slp_tree node;
> > > +  unsigned ix;
> > > +  auto_vec<slp_tree> workset;
> > > +  FOR_EACH_VEC_ELT (this->m_ops, ix, node)
> > > +    {
> > > +      auto_vec<slp_tree> nodes;
> > > +      nodes.create (this->m_num_args);
> > > +      slp_tree tmp = NULL;
> > > +      auto_vec<slp_tree> new_nodes;
> > > +
> > > +      unsigned i;
> > > +      FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, tmp)
> > > +	{
> > > +	  slp_tree vnode = NULL;
> > > +	  if (vect_slp_make_linear (perm_cache, node, tmp, &vnode))
> > > +	    {
> > > +	      nodes.quick_push (vnode);
> > > +	      if (vnode != tmp)
> > > +		new_nodes.safe_push (vnode);
> > > +	    }
> > > +	  else
> > > +	    {
> > > +	      FOR_EACH_VEC_ELT (new_nodes, i, tmp)
> > > +		delete tmp;
> > > +
> > > +	      FOR_EACH_VEC_ELT (workset, i, tmp)
> > > +		delete tmp;
> > > +
> > > +	      nodes.release();
> > > +	      return false;
> > > +	    }
> > > +	}
> > > +
> > > +      slp_tree new_node
> > > +	= vect_create_new_slp_node (SLP_TREE_SCALAR_STMTS (node),
> > > +				    SLP_TREE_CHILDREN (node).length ());
> > > +      SLP_TREE_VECTYPE (new_node) = SLP_TREE_VECTYPE (node);
> > > +      SLP_TREE_LANE_PERMUTATION (new_node) =
> > SLP_TREE_LANE_PERMUTATION (node);
> > > +      SLP_TREE_CODE (new_node) = SLP_TREE_CODE (node);
> > > +      SLP_TREE_REF_COUNT (new_node) = SLP_TREE_REF_COUNT (node);
> > > +      SLP_TREE_REPRESENTATIVE (new_node) =
> > SLP_TREE_REPRESENTATIVE (node);
> > > +      SLP_TREE_CHILDREN (new_node).safe_splice (nodes);
> > > +      SLP_TREE_LANES (new_node) = SLP_TREE_LANES (node);
> > > +
> > > +      workset.safe_push (new_node);
> > > +    }
> > > +
> > > +  FOR_EACH_VEC_ELT (workset, ix, node)
> > > +    SLP_TREE_CHILDREN (*this->m_node)[ix] = node;
> > > +
> > > +  return true;
> > > +}
> > > +
> > > +
> > > +/* Create a replacement pattern statement for each node in m_node and
> > inserts
> > > +   the new statement into m_node as the new representative statement.
> > The old
> > > +   statement is marked as being in a pattern defined by the new statement.
> > The
> > > +   statement is created as call to internal function IFN with m_num_args
> > > +   arguments.
> > > +
> > > +   Futhermore the new pattern is also added to the vectorization
> > information
> > > +   structure VINFO and the old statement STMT_INFO is marked as unused
> > while
> > > +   the new statement is marked as used and the number of SLP uses of the
> > new
> > > +   statement is incremented.
> > > +
> > > +   The newly created SLP nodes are marked as SLP only and will be
> > dissolved
> > > +   if SLP is aborted.
> > > +
> > > +   The newly created gimple call is returned and the BB remains unchanged.
> > > +
> > > +   This default method is designed to only match against simple operands
> > where
> > > +   all the input and output types are the same.
> > > +*/
> > > +
> > > +gcall *
> > > +complex_pattern::build (vec_info *vinfo)
> > > +{
> > > +  stmt_vec_info stmt_info;
> > > +
> > > +  auto_vec<tree> args;
> > > +  args.create (this->m_num_args);
> > > +  args.quick_grow_cleared (this->m_num_args);
> > > +  slp_tree node;
> > > +  unsigned ix;
> > > +  stmt_vec_info call_stmt_info;
> > > +  gcall *call_stmt = NULL;
> > > +
> > > +  FOR_EACH_VEC_ELT (*this->m_nodes, ix, node)
> > 
> > So elsewhere I see m_nodes assigned from SLP_TREE_CHILDREN
> > of m_node.  I assume m_node is the VEC_PERM of the two-operator
> > operation.  I expected we create a call for m_node here, not
> > for all children?
> 
> It depends on the pattern. For most, particularly those of two_operands nodes,
> unless I have mistaken something it's the children of the node that determines
> what operation is being performed isn't it?

Yes.

> So for  ADD for instance, the tree is
> 
> node 0x49aeae0 (max_nunits=4, refcnt=2)
>  stmt 0 REALPART_EXPR <*_10> = _4;
>  stmt 1 IMAGPART_EXPR <*_10> = _22;
>  children 0x49b0b20
> node 0x49b0b20 (max_nunits=4, refcnt=2)
>  stmt 0 _4 = _13 - _6;
>  stmt 1 _22 = _8 + _12;
>  lane permutation { 0[0] 1[1] }
>  children 0x49b0430 0x49a73c0
> node 0x49b0430 (max_nunits=1, refcnt=1)
>  { }
>  children 0x49ad660 0x499e400
> node 0x49ad660 (max_nunits=4, refcnt=3)
>  stmt 0 _13 = REALPART_EXPR <*_3>;
>  stmt 1 _12 = IMAGPART_EXPR <*_3>;
>  load permutation { 0 1 }
> node 0x499e400 (max_nunits=4, refcnt=3)
>  stmt 0 _6 = IMAGPART_EXPR <*_5>;
>  stmt 1 _8 = REALPART_EXPR <*_5>;
>  load permutation { 1 0 }
> node 0x49a73c0 (max_nunits=1, refcnt=1)
>  { }
>  children 0x49ad660 0x499e400
> 
> where the VEC_PERM node itself just contains the permute and the children of it
> 0x49b0430 and 0x49a73c0 contain the operations - and + and from what I understood
> the lane permute defined how the input is deinterleaved between the two nodes.
> 
> So when replacing them it's those two that I need to change and not VEC_PERM itself
> otherwise the node could contain a permute and an operation that needs to be vectorized?

Ah.  So I expected the complex patterns to match a mixed plus/minus 
operation.  For example on x86 there's addsubpd which (for V2DF)
performs { a0 + b0, a1 - b1 } so this includes the blend between
the two SLP operation nodes.  For such an "exact match" I'd have
expected you replace the permute node with the internal function call.

So you say you even match when the permute doesn't exactly match the
pattern but it instead would match the internal function call
plus a following permute?
 
> Or did you mean I should dissolve the two_operand node completely at this point? I was trying
> to minimize the number of changes done on each replacement, because then it means I can assume
> that any information I have already calculated of the children of the node being replaced is still the same
> when doing recursive matching.
> 
> The tree directly out of the matcher is 
> 
>     node 0x49aeae0 (max_nunits=4, refcnt=2)
>      stmt 0 REALPART_EXPR <*_10> = _4;
>      stmt 1 IMAGPART_EXPR <*_10> = _22;
>      children 0x49b0b20
>     node 0x49b0b20 (max_nunits=4, refcnt=2)
>      stmt 0 _4 = _13 - _6;
>      stmt 1 _22 = _8 + _12;
>      lane permutation { 0[0] 1[1] }
>      children 0x4ade0c0 0x4ade220
>     node 0x4ade0c0 (max_nunits=1, refcnt=1)
>      { }
>      SLP Only pattern:
>       slp_patt_41 = .COMPLEX_ADD_ROT90 (_pa_43, _pa_42);
>      children 0x49ad660 0x49b60f0
>     node 0x49ad660 (max_nunits=4, refcnt=3)
>      stmt 0 _13 = REALPART_EXPR <*_3>;
>      stmt 1 _12 = IMAGPART_EXPR <*_3>;
>      load permutation { 0 1 }
>     node 0x49b60f0 (max_nunits=1, refcnt=1)
>      { }
>      lane permutation { 0[1] 0[0] }
>      children 0x499e400
>     node 0x499e400 (max_nunits=4, refcnt=3)
>      stmt 0 _6 = IMAGPART_EXPR <*_5>;
>      stmt 1 _8 = REALPART_EXPR <*_5>;
>      load permutation { 1 0 }
>     node 0x4ade220 (max_nunits=1, refcnt=1)
>      { }
>      SLP Only pattern:
>       slp_patt_32 = .COMPLEX_ADD_ROT90 (_pa_40, _pa_33);
>      children 0x49ad660 0x4ade180
>     node 0x4ade180 (max_nunits=1, refcnt=1)
>      { }
>      lane permutation { 0[1] 0[0] }
>      children 0x499e400

So you turned the first child of the 0x49b0b20 permute node

> node 0x49b0430 (max_nunits=1, refcnt=1)
>  { }
>  children 0x49ad660 0x499e400

into

>     node 0x4ade0c0 (max_nunits=1, refcnt=1)
>      { }
>      SLP Only pattern:
>       slp_patt_41 = .COMPLEX_ADD_ROT90 (_pa_43, _pa_42);
>      children 0x49ad660 0x49b60f0
>     node 0x49b60f0 (max_nunits=1, refcnt=1)
>      { }
>      lane permutation { 0[1] 0[0] }
>      children 0x499e400

that's indeed not sth I expected - where are the .COMPLEX_ADD_ROT90
nodes merged again?  Because isn't the whole exercise so we have
to cost it only once compared to the two-operator dance which
costs two operations and a blend?

So yes, I expected the three nodes, permute and two children, to be
pattern-recognized and merged into one.  If there's a remaining
permute to be done I would either have expected the patter to not
match or the permute stuck in front.

I'm not sure what you mean about the information on the children
when doing recursive matching - as I read the current patch we're doing
postorder processing so the children are not matched again and when
matching the parents we wouldn't see the children covered by the
pattern?

Thanks,
Richard.

> The one exception to this is when the node being replaced isn't a two_operands node,
> like is the case with FMA.  In that case the node is just an ADD node which itself is replaced.
> This is why it's a variable, so I can tell it where to perform the replacement.
> 
> Thanks,
> Tamar
> 
> > 
> > > +    {
> > > +      /* Calculate the location of the statement in NODE to replace.  */
> > > +      stmt_info = SLP_TREE_REPRESENTATIVE (node);
> > > +      gimple* old_stmt = STMT_VINFO_STMT (stmt_info);
> > > +      tree type = TREE_TYPE (gimple_get_lhs (old_stmt));
> > > +
> > > +      /* Create the argument set for use by gimple_build_call_internal_vec.
> > */
> > > +      for (unsigned i = 0; i < this->m_num_args; i++)
> > > +	{
> > > +	  tree var = make_temp_ssa_name (type, old_stmt, "_pa");
> > 
> > Why do you need to create N new SSA names (all with the same
> > definition statement)?  It looks like this new "scalar" stmt
> > is only a template stored in SLP_TREE_REPRESENTATIVE
> > and supposedly there needs to be "something" in the actual
> > arguments it make us not crash later?  Can't you for example
> > simply push the old_stmt LHS N times?
> > 
> > For VEC_PERM I originally wanted to set SLP_TREE_REPRESENTATIVE
> > to NULL_TREE but we don't like that obviously.  So long-term
> > we might want to change SLP_TREE_CODE to a match_code to allow
> > tree codes and internal-functions and "ignore" the representative
> > in case it is not ERROR_MARK.
> > 
> > > +	  args[i] = var;
> > > +	}
> > > +
> > > +      /* Create the new pattern statements.  */
> > > +      call_stmt = gimple_build_call_internal_vec (this->m_ifn, args);
> > > +      tree var = make_temp_ssa_name (type, call_stmt, "slp_patt");
> > > +      gimple_call_set_lhs (call_stmt, var);
> > > +      gimple_set_location (call_stmt, gimple_location (old_stmt));
> > > +      gimple_call_set_nothrow (call_stmt, true);
> > > +
> > > +      /* Adjust the book-keeping for the new and old statements for use
> > during
> > > +	 SLP.  This is required to get the right VF and statement during SLP
> > > +	 analysis.  These changes are created after relevancy has been set for
> > > +	 the nodes as such we need to manually update them.  Any changes
> > will be
> > > +	 undone if SLP is cancelled.  */
> > > +      call_stmt_info
> > > +	= vinfo->add_pattern_stmt (call_stmt, stmt_info);
> > > +
> > > +      /* add_pattern_stmt can't be done in vect_mark_pattern_stmts
> > because
> > > +	 the non-SLP pattern matchers already have added the statement to
> > VINFO
> > > +	 by the time it is called.  Some of them need to modify the returned
> > > +	 stmt_info.  vect_mark_pattern_stmts is called by recog_pattern and
> > it
> > > +	 would increase the size of each pattern with boilerplate code to
> > make
> > > +	 the call there.  */
> > > +      vect_mark_pattern_stmts (vinfo, stmt_info, call_stmt,
> > > +			       SLP_TREE_VECTYPE (node));
> > > +      STMT_SLP_TYPE (call_stmt_info) = pure_slp;
> > > +      STMT_VINFO_SLP_VECT_ONLY (call_stmt_info) = true;
> > > +
> > > +      /* Since we are replacing all the statements in the group with the same
> > > +	 thing it doesn't really matter.  So just set it every time a new stmt
> > > +	 is created.  */
> > > +      SLP_TREE_REPRESENTATIVE (node) = call_stmt_info;
> > > +      SLP_TREE_CODE (node) = CALL_EXPR;
> > > +    }
> > > +  return call_stmt;
> > > +}
> > > +
> > >
> > +/*********************************************************
> > **********************
> > > + * complex_add_pattern class
> > > +
> > **********************************************************
> > ********************/
> > > +
> > > +class complex_add_pattern : public complex_pattern
> > > +{
> > > +  protected:
> > > +    complex_add_pattern (slp_tree *node)
> > > +      : complex_pattern (node)
> > > +    {
> > > +      this->m_num_args = 2;
> > > +    }
> > > +
> > > +  public:
> > > +    static vect_pattern* create (slp_tree *node)
> > > +    {
> > > +       return new complex_add_pattern (node);
> > > +    }
> > > +
> > > +    const char* get_name ()
> > > +    {
> > > +      return "Complex Addition";
> > > +    }
> > > +
> > > +    bool matches (slp_tree_to_load_perm_map_t *);
> > > +    bool matches (complex_operation_t op,
> > slp_tree_to_load_perm_map_t *,
> > > +		  vec<slp_tree> ops);
> > > +};
> > > +
> > > +/* Pattern matcher for trying to match complex addition pattern in SLP
> > tree.
> > > +
> > > +   If no match is found then IFN is set to IFN_LAST.
> > > +   This function matches the patterns shaped as:
> > > +
> > > +   c[i] = a[i] - b[i+1];
> > > +   c[i+1] = a[i+1] + b[i];
> > > +
> > > +   If a match occurred then TRUE is returned, else FALSE.  The initial match
> > is
> > > +   expected to be in OP1 and the initial match operands in args0.  */
> > > +
> > > +bool
> > > +complex_add_pattern::matches (complex_operation_t op,
> > > +			      slp_tree_to_load_perm_map_t *perm_cache,
> > > +			      vec<slp_tree> ops)
> > > +{
> > > +  this->m_ifn = IFN_LAST;
> > > +  this->m_ops.safe_splice (ops);
> > > +
> > > +  /* Find the two components.  Rotation in the complex plane will modify
> > > +     the operations:
> > > +
> > > +      * Rotation  0: + +
> > > +      * Rotation 90: - +
> > > +      * Rotation 180: - -
> > > +      * Rotation 270: + -
> > > +
> > > +      Rotation 0 and 180 can be handled by normal SIMD code, so we don't
> > need
> > > +      to care about them here.  */
> > > +  if (op == MINUS_PLUS)
> > > +    this->m_ifn = IFN_COMPLEX_ADD_ROT90;
> > > +  else if (op == PLUS_MINUS)
> > > +    this->m_ifn = IFN_COMPLEX_ADD_ROT270;
> > > +
> > > +  /* verify that there is a permute, otherwise this isn't a pattern we
> > > +     we support.  */
> > > +  bool is_linear = false;
> > > +  bool all_linear = true;
> > > +  unsigned x;
> > > +  for (x = 0; x < this->m_ops.length () && all_linear; x++)
> > > +    {
> > > +      linear_loads_p (perm_cache, this->m_ops[x], &is_linear);
> > > +      all_linear = all_linear && is_linear;
> > > +    }
> > > +
> > > +  if (this->m_ifn == IFN_LAST || all_linear)
> > > +    return false;
> > > +
> > > +  this->m_nodes = &SLP_TREE_CHILDREN (*this->m_node);
> > > +  return true;
> > > +}
> > > +
> > > +bool
> > > +complex_add_pattern::matches (slp_tree_to_load_perm_map_t
> > *perm_cache)
> > > +{
> > > +  complex_operation_t op
> > > +    = vect_detect_pair_op (*this->m_node, true, &this->m_ops);
> > > +  return matches (op, perm_cache, this->m_ops);
> > > +}
> > > +
> > >
> > +/*********************************************************
> > **********************
> > > + * complex_mul_pattern
> > > +
> > **********************************************************
> > ********************/
> > > +
> > > +/* Helper function of that looks for a match in the CHILDth child of NODE.
> > The
> > > +   child used is stored in RES.
> > > +
> > > +   If the match is successful then ARGS will contain the operands matched
> > > +   and the complex_operation_t type is returned.  If match is not
> > successful
> > > +   then CMPLX_NONE is returned and ARGS is left unmodified.  */
> > > +
> > > +static inline complex_operation_t
> > > +vect_match_call_complex_mla (slp_tree node, unsigned child,
> > > +			     vec<slp_tree> *args = NULL, slp_tree *res = NULL)
> > > +{
> > > +  gcc_assert (child < SLP_TREE_CHILDREN (node).length ());
> > > +
> > > +  slp_tree data = SLP_TREE_CHILDREN (node)[child];
> > > +
> > > +  if (res)
> > > +    *res = data;
> > > +
> > > +  return vect_detect_pair_op (data, false, args);
> > > +}
> > > +
> > > +/* Check to see if either of the trees in ARGS are a NEGATE_EXPR.  If the
> > first
> > > +   child (args[0]) is a NEGATE_EXPR then NEG_FIRST_P is set to TRUE.
> > > +
> > > +   If a negate is found then the values in ARGS are reordered such that the
> > > +   negate node is always the second one and the entry is replaced by the
> > child
> > > +   of the negate node.  */
> > > +
> > > +static inline bool
> > > +vect_normalize_conj_loc (vec<slp_tree> args, bool *neg_first_p = NULL)
> > > +{
> > > +  gcc_assert (args.length () == 2);
> > > +  bool neg_found = false;
> > > +
> > > +  if (vect_match_expression_p (args[0], NEGATE_EXPR))
> > > +    {
> > > +      std::swap (args[0], args[1]);
> > > +      neg_found = true;
> > > +      if (neg_first_p)
> > > +	*neg_first_p = true;
> > > +    }
> > > +  else if (vect_match_expression_p (args[1], NEGATE_EXPR))
> > > +    {
> > > +      neg_found = true;
> > > +      if (neg_first_p)
> > > +	*neg_first_p = false;
> > > +    }
> > > +
> > > +  if (neg_found)
> > > +    args[1] = SLP_TREE_CHILDREN (args[1])[0];
> > > +
> > > +  return neg_found;
> > > +}
> > > +
> > > +/* Helper function that checks to see if the load permutation PERM is on
> > that
> > > +   belongs to a duplication operation.  i.e. checks to see if all the values in
> > > +   PERM are the same.  */
> > > +
> > > +static inline bool
> > > +vect_is_dup (load_permutation_t perm, unsigned idx)
> > > +{
> > > +  for (unsigned i = 0; i < perm.length (); i++)
> > > +    if (perm[i] != idx)
> > > +     return false;
> > > +
> > > +  return true;
> > > +}
> > > +
> > > +/* Helper function that checks to see if LEFT_OP and RIGHT_OP are both
> > MULT_EXPR
> > > +   nodes but also that they represent an operation that is either a complex
> > > +   multiplication or a complex multiplication by conjucated value.
> > > +
> > > +   Of the negation is expected to be in the first half of the tree (As
> > required
> > > +   by an FMS pattern) then NEG_FIRST is true.  If the operation is a
> > conjucate
> > > +   operation then CONJ_FIRST_OPERAND is set to indicate whether the
> > first or
> > > +   second operand contains the conjucate operation.  */
> > > +
> > > +static inline bool
> > > +vect_validate_multiplication (slp_tree_to_load_perm_map_t
> > *perm_cache,
> > > +			     vec<slp_tree> left_op, vec<slp_tree> right_op,
> > > +			     bool neg_first, bool *conj_first_operand)
> > > +{
> > > +  /* The presence of a negation indicates that we have either a conjucate
> > or a
> > > +     rotation.  We need to distinguish which one.  */
> > > +  *conj_first_operand = false;
> > > +
> > > +  load_permutation_t perm;
> > > +  bool is_linear;
> > > +  /* Complex conjucates have the negation on the imaginary part of the
> > > +     number where rotations affect the real component.  So check if the
> > > +     negation is on a dup of lane 1.  */
> > > +  perm = linear_loads_p (perm_cache, right_op[1], &is_linear);
> > > +  if (is_linear || !vect_is_dup (perm, 1))
> > > +    return false;
> > > +
> > > +  /* Check if the conjucate is on the second first or second operand.  The
> > > +     order of the node with the conjucate value determines this, and the
> > dup
> > > +     node must be one of lane 0 of the same DR as the neg node.  */
> > > +  perm = linear_loads_p (perm_cache, left_op[0], &is_linear);
> > > +  if (is_linear)
> > > +    {
> > > +      perm = linear_loads_p (perm_cache, left_op[1], &is_linear);
> > > +      if (is_linear)
> > > +	return false;
> > > +    }
> > > +  else if (!neg_first)
> > > +    *conj_first_operand = true;
> > > +  else
> > > +    return false;
> > > +
> > > +  if (!vect_is_dup (perm, 0))
> > > +    return false;
> > > +
> > > +  return true;
> > > +}
> > > +
> > > +/* Helper function to help distinguish between a conjucate and a rotation
> > in a
> > > +   complex multiplication.  The operations have similar shapes but the
> > order of
> > > +   the load permutes are different.  This function returns TRUE when the
> > order
> > > +   is consistent with a multiplication or multiplication by conjucated
> > > +   operand but returns FALSE if it's a multiplication by rotated operand.  */
> > > +
> > > +static inline bool
> > > +vect_validate_multiplication (slp_tree_to_load_perm_map_t
> > *perm_cache,
> > > +			     vec<slp_tree> op, unsigned idx)
> > > +{
> > > +  /* The left node is the more common case, test it first.  */
> > > +  bool is_linear;
> > > +  load_permutation_t perm = linear_loads_p (perm_cache, op[0],
> > &is_linear);
> > > +  if (is_linear || !vect_is_dup (perm, idx))
> > > +    {
> > > +      perm = linear_loads_p (perm_cache, op[1], &is_linear);
> > > +      if (is_linear || !vect_is_dup (perm, idx))
> > > +	return false;
> > > +    }
> > > +
> > > +  return true;
> > > +}
> > > +
> > > +class complex_mul_pattern : public complex_pattern
> > > +{
> > > +  protected:
> > > +    /* Allocate enough space for FMA as well.  */
> > > +    map_t m_blocks[6] = {};
> > > +    bool m_inplace = false;
> > > +    complex_mul_pattern (slp_tree *node)
> > > +      : complex_pattern (node)
> > > +    {
> > > +      this->m_num_args = 2;
> > > +    }
> > > +
> > > +  public:
> > > +    static vect_pattern* create (slp_tree *node)
> > > +    {
> > > +       return new complex_mul_pattern (node);
> > > +    }
> > > +
> > > +    const char* get_name ()
> > > +    {
> > > +      return "Complex Multiplication";
> > > +    }
> > > +
> > > +    bool validate_p (slp_tree_to_load_perm_map_t *);
> > > +    bool matches (slp_tree_to_load_perm_map_t *);
> > > +    bool matches (complex_operation_t, slp_tree_to_load_perm_map_t *,
> > > +		  vec<slp_tree>);
> > > +};
> > > +
> > > +/* Pattern matcher for trying to match complex multiply pattern in SLP
> > tree
> > > +   If the operation matches then IFN is set to the operation it matched
> > > +   and the arguments to the two replacement statements are put in
> > m_ops.
> > > +
> > > +   If no match is found then IFN is set to IFN_LAST and m_ops is unchanged.
> > > +
> > > +   This function matches the patterns shaped as:
> > > +
> > > +   double ax = (b[i+1] * a[i]);
> > > +   double bx = (a[i+1] * b[i]);
> > > +
> > > +   c[i] = c[i] - ax;
> > > +   c[i+1] = c[i+1] + bx;
> > > +
> > > +   If a match occurred then TRUE is returned, else FALSE.  The initial match
> > is
> > > +   expected to be in OP1 and the initial match operands in args0.  */
> > > +
> > > +bool
> > > +complex_mul_pattern::matches (complex_operation_t op,
> > > +			      slp_tree_to_load_perm_map_t *perm_cache,
> > > +			      vec<slp_tree> /* ops */)
> > > +{
> > > +  this->m_ifn = IFN_LAST;
> > > +
> > > +  if (op != MINUS_PLUS)
> > > +    return false;
> > > +
> > > +  slp_tree root = *this->m_node;
> > > +  /* First two nodes must be a multiply.  */
> > > +  auto_vec<slp_tree> muls;
> > > +  if (vect_match_call_complex_mla (root, 0) != MULT_MULT
> > > +      || vect_match_call_complex_mla (root, 1, &muls) != MULT_MULT)
> > > +    return false;
> > > +
> > > +  /* Now operand2+4 may lead to another expression.  */
> > > +  auto_vec<slp_tree> left_op, right_op;
> > > +  left_op.safe_splice (SLP_TREE_CHILDREN (muls[0]));
> > > +  right_op.safe_splice (SLP_TREE_CHILDREN (muls[1]));
> > > +
> > > +  bool neg_first;
> > > +  bool is_neg = vect_normalize_conj_loc (right_op, &neg_first);
> > > +
> > > +  if (!is_neg)
> > > +    {
> > > +      /* A multiplication needs to multiply agains the real pair, otherwise
> > > +	 the pattern matches that of FMS.   */
> > > +      if (!vect_validate_multiplication (perm_cache, left_op, 0))
> > > +	return false;
> > > +      this->m_ifn = IFN_COMPLEX_MUL;
> > > +      this->m_ops.safe_splice (left_op);
> > > +      this->m_ops.safe_splice (right_op);
> > > +    }
> > > +  else if (is_neg)
> > > +    {
> > > +      bool conj_first_operand;
> > > +      if (!vect_validate_multiplication (perm_cache, left_op, right_op,
> > > +					 neg_first, &conj_first_operand))
> > > +	return false;
> > > +
> > > +      this->m_ifn = IFN_COMPLEX_MUL_CONJ;
> > > +      /* Check if the conjucate is on the first or second parameter.  */
> > > +      if (!conj_first_operand)
> > > +	{
> > > +	  this->m_ops.safe_push (left_op[1]);
> > > +	  this->m_ops.safe_push (left_op[0]);
> > > +	}
> > > +      else
> > > +	{
> > > +	  this->m_ops.safe_push (left_op[0]);
> > > +	  this->m_ops.safe_push (left_op[1]);
> > > +	}
> > > +      this->m_ops.safe_push (right_op[1]);
> > > +      this->m_ops.safe_push (right_op[0]);
> > > +    }
> > > +
> > > +  vect_build_perm_groups (perm_cache, &this->m_blocks[0], this-
> > >m_ops);
> > > +  this->m_nodes = &SLP_TREE_CHILDREN (*this->m_node);
> > > +
> > > +  return true;
> > > +}
> > > +
> > > +bool
> > > +complex_mul_pattern::matches (slp_tree_to_load_perm_map_t
> > *perm_cache)
> > > +{
> > > +  complex_operation_t op
> > > +    = vect_detect_pair_op (*this->m_node);
> > > +  return matches (op, perm_cache, this->m_ops);
> > > +}
> > > +
> > > +
> > > +/* Validates to see if the Complex MUL that we have matched is valid.
> > This is
> > > +   done through a combination of making nodes linear and combining
> > nodes.  */
> > > +
> > > +bool
> > > +complex_mul_pattern::validate_p (slp_tree_to_load_perm_map_t
> > *perm_cache)
> > > +{
> > > +  slp_tree node;
> > > +  unsigned ix;
> > > +  hash_set<slp_tree> cache;
> > > +  auto_vec<slp_tree> workset;
> > > +  FOR_EACH_VEC_ELT (*this->m_nodes, ix, node)
> > > +    {
> > > +      auto_vec<slp_tree> nodes;
> > > +      nodes.create (this->m_num_args);
> > > +      slp_tree tmp = NULL;
> > > +      auto_vec<slp_tree> new_nodes;
> > > +
> > > +      unsigned i;
> > > +      for (i = 0; i < this->m_num_args; i++)
> > > +	{
> > > +	  unsigned index = (ix * this->m_num_args) + i;
> > > +	  map_t map = this->m_blocks[index];
> > > +	  slp_tree vnode = NULL;
> > > +	  bool needs_linear = map.a == map.b;
> > > +	  tmp = this->m_ops[index];
> > > +	  cache.add (tmp);
> > > +	  if (needs_linear
> > > +	      && vect_slp_make_linear (perm_cache, node, tmp, &vnode))
> > > +	    {
> > > +	      nodes.quick_push (vnode);
> > > +	      if (vnode != tmp)
> > > +		new_nodes.safe_push (vnode);
> > > +	    }
> > > +	  else if (!needs_linear
> > > +		   && vect_slp_make_combine_linear (perm_cache, node,
> > > +						    this->m_ops, map,
> > &vnode))
> > > +	    {
> > > +	      nodes.quick_push (vnode);
> > > +	      if (vnode != tmp)
> > > +		new_nodes.safe_push (vnode);
> > > +	    }
> > > +	  else
> > > +	    {
> > > +	      if (dump_enabled_p ())
> > > +		dump_printf_loc (MSG_NOTE, vect_location,
> > > +				"stmts could not be made %s %p\n",
> > > +				needs_linear ? "linear" : "linear/combined",
> > > +				tmp);
> > > +	      FOR_EACH_VEC_ELT (new_nodes, i, tmp)
> > > +		delete tmp;
> > > +
> > > +	      if (!m_inplace)
> > > +		FOR_EACH_VEC_ELT (workset, i, tmp)
> > > +		  delete tmp;
> > > +
> > > +	      nodes.release();
> > > +	      return false;
> > > +	    }
> > > +
> > > +	  vect_mark_stmts_as_in_pattern (&cache, node);
> > > +	}
> > > +
> > > +      if (m_inplace)
> > > +	{
> > > +	  workset.safe_splice (nodes);
> > > +	}
> > > +      else
> > > +	{
> > > +	  slp_tree new_node
> > > +	    = vect_create_new_slp_node (SLP_TREE_SCALAR_STMTS (node),
> > > +					SLP_TREE_CHILDREN (node).length
> > ());
> > > +	  SLP_TREE_VECTYPE (new_node) = SLP_TREE_VECTYPE (node);
> > > +	  SLP_TREE_LANE_PERMUTATION (new_node)
> > > +	    = SLP_TREE_LANE_PERMUTATION (node);
> > > +	  SLP_TREE_CODE (new_node) = SLP_TREE_CODE (node);
> > > +	  SLP_TREE_REF_COUNT (new_node) = SLP_TREE_REF_COUNT
> > (node);
> > > +	  SLP_TREE_REPRESENTATIVE (new_node) =
> > SLP_TREE_REPRESENTATIVE (node);
> > > +	  SLP_TREE_CHILDREN (new_node).safe_splice (nodes);
> > > +	  SLP_TREE_LANES (new_node) = SLP_TREE_LANES (node);
> > > +
> > > +	  workset.safe_push (new_node);
> > > +	}
> > > +    }
> > > +
> > > +  if (m_inplace)
> > > +    {
> > > +      SLP_TREE_CHILDREN (*this->m_node).truncate (0);
> > > +      SLP_TREE_CHILDREN (*this->m_node).safe_splice (workset);
> > > +    }
> > > +  else
> > > +    {
> > > +      FOR_EACH_VEC_ELT (workset, ix, node)
> > > +	SLP_TREE_CHILDREN (*this->m_node)[ix] = node;
> > > +    }
> > > +
> > > +  return true;
> > > +}
> > > +
> > >
> > +/*********************************************************
> > **********************
> > > + * complex_fma_pattern class
> > > +
> > **********************************************************
> > ********************/
> > > +
> > > +class complex_fma_pattern : public complex_mul_pattern
> > > +{
> > > +  protected:
> > > +    /* Temporary nodes cache as scope of the class so we don't
> > > +       have to worry about freeing it. */
> > > +    auto_vec<slp_tree> m_ncache;
> > > +    /* List of nodes to mark as no longer being a mul.  The first
> > > +       one is chosen and the last one is dangling.  */
> > > +    auto_vec<slp_tree> m_mul_nodes;
> > > +    complex_fma_pattern (slp_tree *node)
> > > +      : complex_mul_pattern (node)
> > > +    {
> > > +      this->m_num_args = 3;
> > > +    }
> > > +
> > > +  public:
> > > +    static vect_pattern* create (slp_tree *node)
> > > +    {
> > > +       return new complex_fma_pattern (node);
> > > +    }
> > > +
> > > +    const char* get_name ()
> > > +    {
> > > +      return "Complex FMA";
> > > +    }
> > > +
> > > +    bool matches (slp_tree_to_load_perm_map_t *);
> > > +    bool matches (complex_operation_t, slp_tree_to_load_perm_map_t *,
> > > +		  vec<slp_tree> ops);
> > > +    bool validate_p (slp_tree_to_load_perm_map_t *);
> > > +};
> > > +
> > > +/* Helper function to "reset" a previously matched node and undo the
> > changes
> > > +   made enough so that the node is treated as an irrelevant node.  */
> > > +
> > > +static inline void
> > > +vect_slp_reset_pattern (slp_tree node)
> > > +{
> > > +  stmt_vec_info stmt_info = vect_orig_stmt (SLP_TREE_REPRESENTATIVE
> > (node));
> > > +  STMT_VINFO_IN_PATTERN_P (stmt_info) = false;
> > > +  STMT_SLP_TYPE (stmt_info) = pure_slp;
> > > +  SLP_TREE_REPRESENTATIVE (node) = stmt_info;
> > > +}
> > > +
> > > +/* Pattern matcher for trying to match complex multiply and accumulate
> > > +   and multiply and subtract patterns in SLP tree.
> > > +   If the operation matches then IFN is set to the operation it matched and
> > > +   the arguments to the two replacement statements are put in m_ops.
> > > +
> > > +   If no match is found then IFN is set to IFN_LAST and m_ops is unchanged.
> > > +
> > > +   This function matches the patterns shaped as:
> > > +
> > > +   double ax = (b[i+1] * a[i]) + (b[i] * a[i]);
> > > +   double bx = (a[i+1] * b[i]) - (a[i+1] * b[i+1]);
> > > +
> > > +   c[i] = c[i] - ax;
> > > +   c[i+1] = c[i+1] + bx;
> > > +
> > > +   If a match occurred then TRUE is returned, else FALSE.  The match is
> > > +   performed after COMPLEX_MUL which would have done the majority of
> > the work.
> > > +   This function merely matches an ADD with a COMPLEX_MUL IFN.  The
> > initial
> > > +   match is expected to be in OP1 and the initial match operands in args0.
> > */
> > > +
> > > +bool
> > > +complex_fma_pattern::matches (complex_operation_t op1,
> > > +			      slp_tree_to_load_perm_map_t * /* perm_cache
> > */,
> > > +			      vec<slp_tree> /* args0 */)
> > > +{
> > > +  this->m_ifn = IFN_LAST;
> > > +
> > > +  /* Find the two components.  We match Complex MUL first which
> > reduces the
> > > +     amount of work this pattern has to do.  After that we just match the
> > > +     head node and we're done.:
> > > +
> > > +     * FMA: + +.
> > > +
> > > +     We need to ignore the two_operands nodes that may also match.
> > > +     For that we can check if they have any scalar statements and also
> > > +     check that it's not a permute node as we're looking for a normal
> > > +     PLUS_EXPR operation.  */
> > > +  if (op1 != CMPLX_NONE)
> > > +    return false;
> > > +
> > > +  /* Find the two components.  We match Complex MUL first which
> > reduces the
> > > +     amount of work this pattern has to do.  After that we just match the
> > > +     head node and we're done.:
> > > +
> > > +   * FMA: + + on a non-two_operands node.  */
> > > +  if (SLP_TREE_LANE_PERMUTATION (*this->m_node).exists ()
> > > +      || !SLP_TREE_SCALAR_STMTS (*this->m_node).exists ()
> > > +      || !vect_match_expression_p (*this->m_node, PLUS_EXPR))
> > > +    return false;
> > > +
> > > +  slp_tree node = SLP_TREE_CHILDREN (*this->m_node)[1];
> > > +
> > > +  if (vect_match_two_call_p (node, IFN_COMPLEX_MUL))
> > > +    this->m_ifn = IFN_COMPLEX_FMA;
> > > +  else if (vect_match_two_call_p (node, IFN_COMPLEX_MUL_CONJ))
> > > +    this->m_ifn = IFN_COMPLEX_FMA_CONJ;
> > > +  else
> > > +    return false;
> > > +
> > > +  this->m_mul_nodes.safe_splice (SLP_TREE_CHILDREN (node));
> > > +  node = SLP_TREE_CHILDREN (node)[0];
> > > +  this->m_ops.create (this->m_num_args);
> > > +
> > > +  if (this->m_ifn == IFN_COMPLEX_FMA)
> > > +    {
> > > +      this->m_ops.quick_push (SLP_TREE_CHILDREN (*this->m_node)[0]);
> > > +      this->m_ops.quick_push (SLP_TREE_CHILDREN (node)[1]);
> > > +      this->m_ops.quick_push (SLP_TREE_CHILDREN (node)[0]);
> > > +    }
> > > +  else
> > > +    {
> > > +      this->m_ops.quick_push (SLP_TREE_CHILDREN (*this->m_node)[0]);
> > > +      this->m_ops.quick_push (SLP_TREE_CHILDREN (node)[0]);
> > > +      this->m_ops.quick_push (SLP_TREE_CHILDREN (node)[1]);
> > > +    }
> > > +
> > > +  this->m_ncache.safe_push (*this->m_node);
> > > +  this->m_nodes = &this->m_ncache;
> > > +
> > > +  return true;
> > > +}
> > > +
> > > +bool
> > > +complex_fma_pattern::matches (slp_tree_to_load_perm_map_t
> > *perm_cache)
> > > +{
> > > +  auto_vec<slp_tree> args0;
> > > +  complex_operation_t op
> > > +    = vect_detect_pair_op (*this->m_node, true, &args0);
> > > +  return matches (op, perm_cache, args0);
> > > +}
> > > +
> > > +bool
> > > +complex_fma_pattern::validate_p (slp_tree_to_load_perm_map_t
> > *perm_cache)
> > > +{
> > > +  /* FMA's top level node is + + and so is not two_operator,
> > > +     we only need one child.  */
> > > +  slp_tree node;
> > > +  auto_vec<slp_tree> nodes;
> > > +  auto_vec<slp_tree> new_nodes;
> > > +  nodes.create (this->m_num_args);
> > > +  hash_set<slp_tree> cache;
> > > +  unsigned ix;
> > > +  FOR_EACH_VEC_ELT (this->m_ops, ix, node)
> > > +    {
> > > +      slp_tree vnode = NULL;
> > > +      if (vect_slp_make_linear (perm_cache, *this->m_node, node,
> > &vnode))
> > > +	{
> > > +	  nodes.quick_push (vnode);
> > > +	  if (vnode != node)
> > > +	    new_nodes.safe_push (vnode);
> > > +	  if (SLP_TREE_CHILDREN (node).length () == 2)
> > > +	    {
> > > +	      cache.add(SLP_TREE_CHILDREN (node)[0]);
> > > +	      cache.add(SLP_TREE_CHILDREN (node)[1]);
> > > +	    }
> > > +	  else
> > > +	    cache.add (node);
> > > +	}
> > > +      else
> > > +	{
> > > +	  FOR_EACH_VEC_ELT (new_nodes, ix, node)
> > > +	    delete node;
> > > +
> > > +	  nodes.release();
> > > +	  return false;
> > > +	}
> > > +    }
> > > +
> > > +  SLP_TREE_CHILDREN (*this->m_node).truncate (0);
> > > +  SLP_TREE_CHILDREN (*this->m_node).safe_splice (nodes);
> > > +
> > > +  /* Ignore the part of the tree that becomes irrelevant for FMA.  */
> > > +  vect_slp_reset_pattern (this->m_mul_nodes[0]);
> > > +  vect_slp_reset_pattern (this->m_mul_nodes[1]);
> > > +  vect_mark_stmts_as_in_pattern (&cache, this->m_mul_nodes[1]);
> > > +
> > > +  return true;
> > > +}
> > > +
> > > +
> > >
> > +/*********************************************************
> > **********************
> > > + * complex_fms_pattern class
> > > +
> > **********************************************************
> > ********************/
> > > +
> > > +class complex_fms_pattern : public complex_mul_pattern
> > > +{
> > > +  protected:
> > > +    complex_fms_pattern (slp_tree *node)
> > > +      : complex_mul_pattern (node)
> > > +    {
> > > +      this->m_num_args = 3;
> > > +    }
> > > +
> > > +  public:
> > > +    static vect_pattern* create (slp_tree *node)
> > > +    {
> > > +       return new complex_fms_pattern (node);
> > > +    }
> > > +
> > > +    const char* get_name ()
> > > +    {
> > > +      return "Complex FMS";
> > > +    }
> > > +
> > > +    bool matches (slp_tree_to_load_perm_map_t *);
> > > +    bool matches (complex_operation_t, slp_tree_to_load_perm_map_t *,
> > > +		  vec<slp_tree>);
> > > +};
> > > +
> > > +/* Pattern matcher for trying to match complex multiply and accumulate
> > > +   and multiply and subtract patterns in SLP tree.
> > > +   If the operation matches then IFN is set to the operation it matched and
> > > +   the arguments to the two replacement statements are put in m_ops.
> > > +
> > > +   If no match is found then IFN is set to IFN_LAST and m_ops is unchanged.
> > > +
> > > +   This function matches the patterns shaped as:
> > > +
> > > +   double ax = (b[i+1] * a[i]) + (b[i] * a[i]);
> > > +   double bx = (a[i+1] * b[i]) - (a[i+1] * b[i+1]);
> > > +
> > > +   c[i] = c[i] - ax;
> > > +   c[i+1] = c[i+1] + bx;
> > > +
> > > +   If a match occurred then TRUE is returned, else FALSE.  The initial match
> > is
> > > +   expected to be in OP1 and the initial match operands in args0.  */
> > > +bool
> > > +complex_fms_pattern::matches (complex_operation_t op1,
> > > +			      slp_tree_to_load_perm_map_t *perm_cache,
> > > +			      vec<slp_tree> args0)
> > > +{
> > > +  this->m_ifn = IFN_LAST;
> > > +
> > > +  /* Find the two components.  We match Complex MUL first which
> > reduces the
> > > +     amount of work this pattern has to do.  After that we just match the
> > > +     head node and we're done.:
> > > +
> > > +     * FMS: - +.  */
> > > +  slp_tree child = NULL;
> > > +
> > > +  /* We need to ignore the two_operands nodes that may also match,
> > > +     for that we can check if they have any scalar statements and also
> > > +     check that it's not a permute node as we're looking for a normal
> > > +     PLUS_EXPR operation.  */
> > > +  if (op1 != PLUS_MINUS)
> > > +    return false;
> > > +
> > > +  child = SLP_TREE_CHILDREN (args0[1])[1];
> > > +  if (vect_detect_pair_op (child) != MINUS_PLUS)
> > > +    return false;
> > > +
> > > +  /* First two nodes must be a multiply.  */
> > > +  auto_vec<slp_tree> muls;
> > > +  if (vect_match_call_complex_mla (child, 0) != MULT_MULT
> > > +      || vect_match_call_complex_mla (child, 1, &muls) != MULT_MULT)
> > > +    return false;
> > > +
> > > +  /* Now operand2+4 may lead to another expression.  */
> > > +  auto_vec<slp_tree> left_op, right_op;
> > > +  left_op.safe_splice (SLP_TREE_CHILDREN (muls[1]));
> > > +  right_op.safe_splice (SLP_TREE_CHILDREN (muls[0]));
> > > +
> > > +  bool is_neg = vect_normalize_conj_loc (right_op);
> > > +
> > > +  this->m_ops.create (6);
> > > +  child = SLP_TREE_CHILDREN (args0[0])[0];
> > > +  this->m_ops.quick_push (child);
> > > +  if (!is_neg)
> > > +    {
> > > +      this->m_ifn = IFN_COMPLEX_FMS;
> > > +      this->m_ops.quick_push (left_op[1]);
> > > +      this->m_ops.quick_push (left_op[0]);
> > > +    }
> > > +  else if (is_neg)
> > > +    {
> > > +      bool conj_first_operand;
> > > +      if (!vect_validate_multiplication (perm_cache, left_op, right_op, false,
> > > +					 &conj_first_operand))
> > > +	return false;
> > > +
> > > +      this->m_ifn = IFN_COMPLEX_FMS_CONJ;
> > > +      /* Check if the conjucate is on the first or second parameter.  */
> > > +      if (!conj_first_operand)
> > > +	{
> > > +	  this->m_ops.quick_push (left_op[1]);
> > > +	  this->m_ops.quick_push (left_op[0]);
> > > +	}
> > > +      else
> > > +	{
> > > +	  this->m_ops.quick_push (left_op[0]);
> > > +	  this->m_ops.quick_push (left_op[1]);
> > > +	}
> > > +    }
> > > +
> > > +  this->m_ops.quick_push (child);
> > > +  this->m_ops.quick_push (right_op[1]);
> > > +  this->m_ops.quick_push (right_op[0]);
> > > +
> > > +  vect_build_perm_groups (perm_cache, &this->m_blocks[0], this-
> > >m_ops);
> > > +  this->m_nodes = &SLP_TREE_CHILDREN (*this->m_node);
> > > +  return true;
> > > +}
> > > +
> > > +bool
> > > +complex_fms_pattern::matches (slp_tree_to_load_perm_map_t
> > *perm_cache)
> > > +{
> > > +  auto_vec<slp_tree> args0;
> > > +  complex_operation_t op
> > > +    = vect_detect_pair_op (*this->m_node, true, &args0);
> > > +  return matches (op, perm_cache, args0);
> > > +}
> > > +
> > > +
> > >
> > +/*********************************************************
> > **********************
> > > + * complex_operations_pattern class
> > > +
> > **********************************************************
> > ********************/
> > > +
> > > +/* This function combines all the existing pattern matchers above into one
> > class
> > > +   that shares the functionality between them.  The initial match is shared
> > > +   between all complex operations.  */
> > > +
> > > +class complex_operations_pattern : public complex_pattern
> > > +{
> > > +  protected:
> > > +    complex_operations_pattern (slp_tree *node)
> > > +      : complex_pattern (node)
> > > +    { }
> > > +
> > > +  gcall *build ();
> > > +
> > > +  /* The current "active" pattern.  */
> > > +  vect_pattern *m_patt = NULL;
> > > +
> > > +  public:
> > > +    using complex_pattern::matches;
> > > +
> > > +    bool matches (slp_tree_to_load_perm_map_t *perm_cache);
> > > +    const char* get_name ();
> > > +    bool validate_p (slp_tree_to_load_perm_map_t *perm_cache);
> > > +    bool is_optab_supported_p (tree, optimization_type);
> > > +    internal_fn get_ifn ();
> > > +    gcall *build (vec_info *);
> > > +    void reset (slp_tree *);
> > > +    ~complex_operations_pattern ();
> > > +
> > > +    static vect_pattern* create (slp_tree *node)
> > > +    {
> > > +       return new complex_operations_pattern (node);
> > > +    }
> > > +};
> > > +
> > > +
> > > +/* Clean up the pattern if one existed.  */
> > > +
> > > +complex_operations_pattern::~complex_operations_pattern ()
> > > +{
> > > +  if (this->m_patt)
> > > +    delete this->m_patt;
> > > +  this->m_patt = NULL;
> > > +}
> > > +
> > > +/* Match but do not perform any additional operations on the SLP tree.  If
> > the
> > > +   match is successful then that pattern is made "active" in the class.  */
> > > +
> > > +bool
> > > +complex_operations_pattern::matches (slp_tree_to_load_perm_map_t
> > *perm_cache)
> > > +{
> > > +  complex_operation_t op
> > > +    = vect_detect_pair_op (*this->m_node, true, &this->m_ops);
> > > +
> > > +  /* Check which pattern this may be.  Match longest pattern first.  */
> > > +  this->m_patt = complex_fma_pattern::create (this->m_node);
> > > +  if (this->m_patt->matches (op, perm_cache, this->m_ops))
> > > +    return true;
> > 
> > Hmm, all of these allocations like also
> > 
> > +  for (unsigned x = 0; x < num__slp_patterns; x++)
> > +    {
> > +      vect_pattern *pattern = slp_patterns[x] (ref_node);
> > +      found_p = pattern->recognize (perm_cache, vinfo);
> > +      delete pattern;
> > +      found_rec_p = found_p | found_rec_p;
> > +    }
> > 
> > in the main driver are really due to the C++-ish layout of all this :/
> > I don't like that very much.  It also highlights when reading
> > the patch that it's hard to get the difference betwee
> > recognize (), match () and validate () where all but the first is
> > kind-of internal implementation details of the complex matchers.
> > 
> > So we could eventually get by with the main driver doing
> > 
> > +  for (unsigned x = 0; x < num__slp_patterns; x++)
> > +    {
> > +      vect_pattern *pattern = slp_patterns[x] (ref_node);
> >        if (pattern)
> >          found_rec_p = true;
> >      }
> > 
> > thus merge ->recognize with the static allocator and not do
> > any heap allocation when nothing is matched?  Then ...
> > 
> > > +  if (op == CMPLX_NONE)
> > > +    return false;
> > > +
> > > +  delete this->m_patt;
> > > +
> > > +  this->m_patt = complex_fms_pattern::create (this->m_node);
> > 
> > ... this could use static typing and an automatic instance?
> > 
> > The myth that allocation & deallocation is cheap is really that,
> > a myth.  We're doing 6 allocations & deallocations at this
> > very toplevel for each SLP node (which means eventually for
> > each statement in a function).
> > 
> > 
> > So why for example would one need to override recognize ()?
> > I'd argue for nailing it in place and to remove validate_p ()
> > (should be done by matches?).  At least validate_p for
> > the complex patterns seem to build SLP nodes which mean
> > they actually build()?
> > 
> > 
> > I think overall the patch looks sensible but it's still quite
> > hard to follow the flow of an actual pattern match, possibly
> > because of the strange factoring.
> > 
> > At this point to speed up things can you split the patch
> > so there's only the very simplest (complex_add for example)
> > matcher plus requirements?  Then we can push that which
> > would allow me to actually play with it more easily and
> > even try out changes?
> > 
> > I'd say split out the optimize_load_redistribution part of
> > the first patch and merge the complex_add support into
> > its remains so we have a first patch that does something?
> > 
> > Thanks and sorry for being dense here ... :/  There's pieces
> > I think we should do better but I still can't lay out a
> > solution and I guess pointing fingers at the individual pieces
> > like above will lead to very slow progress only.
> > 
> > Richard.
> > 
> > > +  if (this->m_patt->matches (op, perm_cache, this->m_ops))
> > > +    return true;
> > > +
> > > +  delete this->m_patt;
> > > +
> > > +  this->m_patt = complex_mul_pattern::create (this->m_node);
> > > +  if (this->m_patt->matches (op, perm_cache, this->m_ops))
> > > +    return true;
> > > +
> > > +  delete this->m_patt;
> > > +
> > > +  this->m_patt = complex_add_pattern::create (this->m_node);
> > > +  if (this->m_patt->matches (op, perm_cache, this->m_ops))
> > > +    return true;
> > > +
> > > +  delete this->m_patt;
> > > +
> > > +  this->m_patt = NULL;
> > > +  return false;
> > > +}
> > > +
> > > +/* Friendly name of the operation the pattern matches.  */
> > > +
> > > +const char*
> > > +complex_operations_pattern::get_name ()
> > > +{
> > > +  if (!this->m_patt)
> > > +    return "Complex Operations";
> > > +
> > > +  return this->m_patt->get_name ();
> > > +}
> > > +
> > > +/* Only perform the pattern creation part of the matcher.  This creates
> > and
> > > +   returns the new pattern statement and encapsulates the currently
> > active
> > > +   pattern.  */
> > > +
> > > +gcall *
> > > +complex_operations_pattern::build (vec_info *vinfo)
> > > +{
> > > +  if (!this->m_patt)
> > > +    return NULL;
> > > +
> > > +  return this->m_patt->build (vinfo);
> > > +}
> > > +
> > > +/* Validate the operation the currently active pattern detected.  */
> > > +
> > > +bool
> > > +complex_operations_pattern::validate_p
> > (slp_tree_to_load_perm_map_t *perm_cache)
> > > +{
> > > +  if (!this->m_patt)
> > > +    return false;
> > > +
> > > +  return this->m_patt->validate_p (perm_cache);
> > > +}
> > > +
> > > +/* Check if the obtab of the currently active matches is supported.  */
> > > +
> > > +bool
> > > +complex_operations_pattern::is_optab_supported_p (tree vectype,
> > > +						  optimization_type opt_type)
> > > +{
> > > +  if (!vectype || !this->m_patt)
> > > +    return false;
> > > +
> > > +  return this->m_patt->is_optab_supported_p (vectype, opt_type);
> > > +}
> > > +
> > > +/* Return the matched IFN from the current active matcher if any.  */
> > > +
> > > +internal_fn
> > > +complex_operations_pattern::get_ifn ()
> > > +{
> > > +  if (!this->m_patt)
> > > +    return IFN_LAST;
> > > +
> > > +  return this->m_patt->get_ifn ();
> > > +}
> > > +
> > >
> > +/*********************************************************
> > **********************
> > > + * Pattern matching definitions
> > > +
> > **********************************************************
> > ********************/
> > > +
> > > +#define SLP_PATTERN(x) &x::create
> > > +vect_pattern_decl_t slp_patterns[]
> > > +{
> > > +  /* For least amount of back-tracking and more efficient matching
> > > +     order patterns from the largest to the smallest.  Especially if they
> > > +     overlap in what they can detect.  */
> > > +
> > > +  /* FMA overlaps with MUL but is the longer sequence.  Because we're in
> > post
> > > +     order traversal we can't match FMA if included in
> > > +     complex_operations_pattern so must be checked on it's own.  */
> > > +  SLP_PATTERN (complex_operations_pattern),
> > > +};
> > > +#undef SLP_PATTERN
> > > +
> > > +/* Set the number of SLP pattern matchers available.  */
> > > +size_t num__slp_patterns =
> > sizeof(slp_patterns)/sizeof(vect_pattern_decl_t);
> > >
> > >
> > >
> > 
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> > Nuernberg,
> > Germany; GF: Felix Imend
>
diff mbox series

Patch

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 273654cfa2525099ed0d9adc2f9085c8408b096f..4b556bdbbc2f989d894ef2dbe894318858eaa7aa 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1646,6 +1646,7 @@  OBJS = \
 	tree-vect-loop.o \
 	tree-vect-loop-manip.o \
 	tree-vect-slp.o \
+	tree-vect-slp-patterns.o \
 	tree-vectorizer.o \
 	tree-vector-builder.o \
 	tree-vrp.o \
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 813875b973bc73920f1195c4897ce3491ca713c2..68792e4c3eeca2bce0112fc28eba38700e66dc00 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -1493,7 +1493,7 @@  can be described by a series of letters for each operand.  The overall
 constraint for an operand is made from the letters for this operand
 from the first alternative, a comma, the letters for this operand from
 the second alternative, a comma, and so on until the last alternative.
-All operands for a single instruction must have the same number of 
+All operands for a single instruction must have the same number of
 alternatives.
 @ifset INTERNALS
 Here is how it is done for fullword logical-or on the 68000:
@@ -1558,17 +1558,17 @@  the first, 1 for the second alternative, etc.).  @xref{Output Statement}.
 @end ifset
 @ifclear INTERNALS
 
-So the first alternative for the 68000's logical-or could be written as 
-@code{"+m" (output) : "ir" (input)}.  The second could be @code{"+r" 
-(output): "irm" (input)}.  However, the fact that two memory locations 
-cannot be used in a single instruction prevents simply using @code{"+rm" 
-(output) : "irm" (input)}.  Using multi-alternatives, this might be 
+So the first alternative for the 68000's logical-or could be written as
+@code{"+m" (output) : "ir" (input)}.  The second could be @code{"+r"
+(output): "irm" (input)}.  However, the fact that two memory locations
+cannot be used in a single instruction prevents simply using @code{"+rm"
+(output) : "irm" (input)}.  Using multi-alternatives, this might be
 written as @code{"+m,r" (output) : "ir,irm" (input)}.  This describes
-all the available alternatives to the compiler, allowing it to choose 
+all the available alternatives to the compiler, allowing it to choose
 the most efficient one for the current conditions.
 
-There is no way within the template to determine which alternative was 
-chosen.  However you may be able to wrap your @code{asm} statements with 
+There is no way within the template to determine which alternative was
+chosen.  However you may be able to wrap your @code{asm} statements with
 builtins such as @code{__builtin_constant_p} to achieve the desired results.
 @end ifclear
 
@@ -3062,7 +3062,7 @@  instruction taking only the upper 16-bits of a
 16-bits being 0.
 
 @item L
-Integer that is valid as an immediate operand for a 
+Integer that is valid as an immediate operand for a
 shift instruction. Range 0 to 31.
 
 @item M
@@ -3076,7 +3076,7 @@  Integer that is valid as an immediate operand for
 a custom instruction opcode. Range 0 to 255.
 
 @item P
-An immediate operand for R2 andchi/andci instructions. 
+An immediate operand for R2 andchi/andci instructions.
 
 @item S
 Matches immediates which are addresses in the small
@@ -6132,6 +6132,97 @@  floating-point mode.
 
 This pattern is not allowed to @code{FAIL}.
 
+@cindex @code{cadd@var{m}@var{n}3} instruction pattern
+@item @samp{cadd@var{m}@var{n}3}
+Perform a vector addition of complex numbers in operand 1 with operand 2
+rotated by @var{m} degrees around the argand plane and storing the result in
+operand 0.  The instruction must perform the operation on data that in GCC lane
+order where the 0th lane holds the real part and the the imaginary part in lane
+1.
+
+The operation is only supported for vector modes @var{n} and with
+rotations @var{m} of 90 or 270.
+
+This pattern is not allowed to @code{FAIL}.
+
+@cindex @code{cmla@var{m}4} instruction pattern
+@item @samp{cmla@var{m}4}
+Perform a vector floating point multiply and accumulate of complex numbers
+in operand 0, operand 1 and operand 2.
+
+The instruction must perform the operation on data that in GCC lane
+order where the 0th lane holds the real part and the the imaginary part in lane
+1.
+
+The operation is only supported for vector modes @var{m}.
+
+This pattern is not allowed to @code{FAIL}.
+
+@cindex @code{cmla_conj@var{m}4} instruction pattern
+@item @samp{cmla_conj@var{m}4}
+Perform a vector floating point multiply and accumulate of complex numbers
+in operand 0, operand 1 and the conjucate of operand 2.
+
+The instruction must perform the operation on data that in GCC lane
+order where the 0th lane holds the real part and the the imaginary part in lane
+1.
+
+The operation is only supported for vector modes @var{m}.
+
+This pattern is not allowed to @code{FAIL}.
+
+@cindex @code{cmls@var{m}4} instruction pattern
+@item @samp{cmls@var{m}4}
+Perform a vector floating point multiply and subtract of complex numbers
+in operand 0, operand 1 and operand 2.
+
+The instruction must perform the operation on data that in GCC lane
+order where the 0th lane holds the real part and the the imaginary part in lane
+1.
+
+The operation is only supported for vector modes @var{m}.
+
+This pattern is not allowed to @code{FAIL}.
+
+@cindex @code{cmls_conj@var{m}4} instruction pattern
+@item @samp{cmls_conj@var{m}4}
+Perform a vector floating point multiply and subtract of complex numbers
+in operand 0, operand 1 and the conjucate of operand 2.
+
+The instruction must perform the operation on data that in GCC lane
+order where the 0th lane holds the real part and the the imaginary part in lane
+1.
+
+The operation is only supported for vector modes @var{m}.
+
+This pattern is not allowed to @code{FAIL}.
+
+@cindex @code{cmul@var{m}4} instruction pattern
+@item @samp{cmul@var{m}4}
+Perform a vector floating point multiplication of complex numbers in operand 0
+and operand 1.
+
+The instruction must perform the operation on data that in GCC lane
+order where the 0th lane holds the real part and the the imaginary part in lane
+1.
+
+The operation is only supported for vector modes @var{m}.
+
+This pattern is not allowed to @code{FAIL}.
+
+@cindex @code{cmul_conj@var{m}4} instruction pattern
+@item @samp{cmul_conj@var{m}4}
+Perform a vector floating point multiplication of complex numbers in operand 0
+and the conjucate of operand 1.
+
+The instruction must perform the operation on data that in GCC lane
+order where the 0th lane holds the real part and the the imaginary part in lane
+1.
+
+The operation is only supported for vector modes @var{m}.
+
+This pattern is not allowed to @code{FAIL}.
+
 @cindex @code{ffs@var{m}2} instruction pattern
 @item @samp{ffs@var{m}2}
 Store into operand 0 one plus the index of the least significant 1-bit
@@ -7409,7 +7500,7 @@  If this pattern is not defined, then a @code{memory_barrier} pattern
 will be emitted, followed by a store of the value to the memory operand.
 
 @cindex @code{atomic_compare_and_swap@var{mode}} instruction pattern
-@item @samp{atomic_compare_and_swap@var{mode}} 
+@item @samp{atomic_compare_and_swap@var{mode}}
 This pattern, if defined, emits code for an atomic compare-and-swap
 operation with memory model semantics.  Operand 2 is the memory on which
 the atomic operation is performed.  Operand 0 is an output operand which
@@ -7424,7 +7515,7 @@  if the operation fails.
 
 If memory referred to in operand 2 contains the value in operand 3, then
 operand 4 is stored in memory pointed to by operand 2 and fencing based on
-the memory model in operand 6 is issued.  
+the memory model in operand 6 is issued.
 
 If memory referred to in operand 2 does not contain the value in operand 3,
 then fencing based on the memory model in operand 7 is issued.
@@ -7500,9 +7591,9 @@  none of these are available a compare-and-swap loop will be used.
 @itemx @samp{atomic_fetch_or@var{mode}}, @samp{atomic_fetch_and@var{mode}}
 @itemx @samp{atomic_fetch_xor@var{mode}}, @samp{atomic_fetch_nand@var{mode}}
 These patterns emit code for an atomic operation on memory with memory
-model semantics, and return the original value. Operand 0 is an output 
-operand which contains the value of the memory location before the 
-operation was performed.  Operand 1 is the memory on which the atomic 
+model semantics, and return the original value. Operand 0 is an output
+operand which contains the value of the memory location before the
+operation was performed.  Operand 1 is the memory on which the atomic
 operation is performed.  Operand 2 is the second operand to the binary
 operator.  Operand 3 is the memory model to be used by the operation.
 
@@ -7859,7 +7950,7 @@  simplified) from the PDP-11 target:
         (plus:HI (match_dup 0)
               (const_int -1)))]
   ""
-  
+
   @{
     if (which_alternative == 0)
       return "sob %0,%l1";
diff --git a/gcc/doc/passes.texi b/gcc/doc/passes.texi
index a5ae4143a8c1293e674b499120372ee5fe5c412b..c86df5cd843084a5b7933ef99a23386891a7b0c1 100644
--- a/gcc/doc/passes.texi
+++ b/gcc/doc/passes.texi
@@ -709,7 +709,8 @@  loop.
 The pass is implemented in @file{tree-vectorizer.c} (the main driver),
 @file{tree-vect-loop.c} and @file{tree-vect-loop-manip.c} (loop specific parts
 and general loop utilities), @file{tree-vect-slp} (loop-aware SLP
-functionality), @file{tree-vect-stmts.c} and @file{tree-vect-data-refs.c}.
+functionality), @file{tree-vect-stmts.c}, @file{tree-vect-data-refs.c} and
+@file{tree-vect-slp-patterns.c} containing the SLP pattern matcher.
 Analysis of data references is in @file{tree-data-ref.c}.
 
 SLP Vectorization.  This pass performs vectorization of straight-line code. The
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 310d37aa53819791b5df1683afca831f08e5892a..acb7d9f3bdc757437d5492a652144ba31c2ef702 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -277,12 +277,21 @@  DEF_INTERNAL_FLT_FN (SCALB, ECF_CONST, scalb, binary)
 DEF_INTERNAL_FLT_FLOATN_FN (FMIN, ECF_CONST, fmin, binary)
 DEF_INTERNAL_FLT_FLOATN_FN (FMAX, ECF_CONST, fmax, binary)
 DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONST, xorsign, binary)
+DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT90, ECF_CONST, cadd90, binary)
+DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary)
+DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL, ECF_CONST, cmul, binary)
+DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL_CONJ, ECF_CONST, cmul_conj, binary)
+
 
 /* FP scales.  */
 DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
 
 /* Ternary math functions.  */
 DEF_INTERNAL_FLT_FLOATN_FN (FMA, ECF_CONST, fma, ternary)
+DEF_INTERNAL_OPTAB_FN (COMPLEX_FMA, ECF_CONST, cmla, ternary)
+DEF_INTERNAL_OPTAB_FN (COMPLEX_FMA_CONJ, ECF_CONST, cmla_conj, ternary)
+DEF_INTERNAL_OPTAB_FN (COMPLEX_FMS, ECF_CONST, cmls, ternary)
+DEF_INTERNAL_OPTAB_FN (COMPLEX_FMS_CONJ, ECF_CONST, cmls_conj, ternary)
 
 /* Unary integer ops.  */
 DEF_INTERNAL_INT_FN (CLRSB, ECF_CONST | ECF_NOTHROW, clrsb, unary)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 78409aa14537d259bf90277751aac00d452a0d3f..19db9c00896cd08adfd20a01669990bbbebd79f1 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -290,6 +290,14 @@  OPTAB_D (atan_optab, "atan$a2")
 OPTAB_D (atanh_optab, "atanh$a2")
 OPTAB_D (copysign_optab, "copysign$F$a3")
 OPTAB_D (xorsign_optab, "xorsign$F$a3")
+OPTAB_D (cadd90_optab, "cadd90$a3")
+OPTAB_D (cadd270_optab, "cadd270$a3")
+OPTAB_D (cmul_optab, "cmul$a3")
+OPTAB_D (cmul_conj_optab, "cmul_conj$a3")
+OPTAB_D (cmla_optab, "cmla$a4")
+OPTAB_D (cmla_conj_optab, "cmla_conj$a4")
+OPTAB_D (cmls_optab, "cmls$a4")
+OPTAB_D (cmls_conj_optab, "cmls_conj$a4")
 OPTAB_D (cos_optab, "cos$a2")
 OPTAB_D (cosh_optab, "cosh$a2")
 OPTAB_D (exp10_optab, "exp10$a2")
diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
new file mode 100644
index 0000000000000000000000000000000000000000..7a2c64aeb9f212cd3ee18adf110e6ac6696249fd
--- /dev/null
+++ b/gcc/tree-vect-slp-patterns.c
@@ -0,0 +1,1844 @@ 
+/* SLP - Pattern matcher on SLP trees
+   Copyright (C) 2020 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "target.h"
+#include "rtl.h"
+#include "tree.h"
+#include "gimple.h"
+#include "tree-pass.h"
+#include "ssa.h"
+#include "optabs-tree.h"
+#include "insn-config.h"
+#include "recog.h"		/* FIXME: for insn_data */
+#include "fold-const.h"
+#include "stor-layout.h"
+#include "gimple-iterator.h"
+#include "cfgloop.h"
+#include "tree-vectorizer.h"
+#include "langhooks.h"
+#include "gimple-walk.h"
+#include "dbgcnt.h"
+#include "tree-vector-builder.h"
+#include "vec-perm-indices.h"
+#include "gimple-fold.h"
+#include "internal-fn.h"
+
+/* SLP Pattern matching mechanism.
+
+  This extension to the SLP vectorizer allows one to transform the generated SLP
+  tree based on any pattern.  The difference between this and the normal vect
+  pattern matcher is that unlike the former, this matcher allows you to match
+  with instructions that do not belong to the same SSA dominator graph.
+
+  The only requirement that this pattern matcher has is that you are only
+  only allowed to either match an entire group or none.
+
+  The pattern matcher currently only allows you to perform replacements to
+  internal functions.
+
+  Once the patterns are matched it is one way, these cannot be undone.  It is
+  currently not supported to match patterns recursively.
+
+  To add a new pattern, implement the vect_pattern class and add the type to
+  slp_patterns.
+
+*/
+
+/*******************************************************************************
+ * vect_pattern class
+ ******************************************************************************/
+
+/* Default implementation of recognize that peforms matching, validation and
+   replacement of nodes but that can be overriden if required.  */
+
+bool
+vect_pattern::recognize (slp_tree_to_load_perm_map_t *perm_cache,
+			 vec_info *vinfo)
+{
+  if (this->matches (perm_cache))
+    {
+      tree vectype = SLP_TREE_VECTYPE (*this->m_node);
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location,
+			 "Found %s pattern in SLP tree\n",
+			 this->get_name ());
+
+      if (this->is_optab_supported_p (vectype, OPTIMIZE_FOR_SPEED))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "Target supports %s vectorization with mode %T\n",
+			     internal_fn_name (this->get_ifn ()), vectype);
+	}
+      else
+	{
+	  if (dump_enabled_p ())
+	    {
+	      if (!vectype)
+		dump_printf_loc (MSG_NOTE, vect_location,
+				 "Target does not support vector type for %T\n",
+				 SLP_TREE_DEF_TYPE (*this->m_node));
+	      else
+		dump_printf_loc (MSG_NOTE, vect_location,
+				 "Target does not support %s for vector type "
+				 "%T\n", internal_fn_name (this->get_ifn ()),
+				 vectype);
+	    }
+          return false;
+	}
+    }
+  else
+    return false;
+
+  if (!this->validate_p (perm_cache))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location,
+			"transformation for %s not valid due to post "
+			"condition\n",
+			internal_fn_name (this->get_ifn ()));
+      return false;
+    }
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location, "Creating vec patterns\n");
+
+  gcall* call_stmt = this->build (vinfo);
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location, "\t %p stmt: %G",
+		     *this->m_node, call_stmt);
+
+  return true;
+}
+
+/*******************************************************************************
+ * General helper types
+ ******************************************************************************/
+
+/* The COMPLEX_OPERATION enum denotes the possible pair of operations that can
+   be matched when looking for expressions that we are interested matching for
+   complex numbers addition and mla.  */
+
+typedef enum _complex_operation : unsigned {
+  PLUS_PLUS,
+  MINUS_PLUS,
+  PLUS_MINUS,
+  MULT_MULT,
+  CMPLX_NONE
+} complex_operation_t;
+
+/* A simple pair type used for ordering of combine operations.  */
+
+typedef struct map_ { int a; int b; } map_t;
+
+/*******************************************************************************
+ * General helper functions
+ ******************************************************************************/
+
+/* Helper function of linear_loads_p that checks to see if the load permutation
+   is sequential and in monotonically increasing order of loads with no gaps.
+*/
+
+static inline bool
+is_linear_load_p (load_permutation_t loads)
+{
+  if (loads.length() == 0)
+    return false;
+
+  unsigned leader = loads[0];
+  unsigned load, i;
+  FOR_EACH_VEC_ELT_FROM (loads, i, load, 1)
+    if (load != ++leader)
+      return false;
+  return true;
+}
+
+
+/* Check to see if all loads rooted in ROOT are linear.  Linearity is
+   defined as having no gaps between values loaded.  */
+
+static load_permutation_t
+linear_loads_p (slp_tree_to_load_perm_map_t *perm_cache, slp_tree root,
+		bool *linear)
+{
+  *linear = false;
+  if (!root)
+    return vNULL;
+
+  unsigned i;
+  load_permutation_t loads = vNULL;
+  load_permutation_t *tmp;
+
+  if ((tmp = perm_cache->get (root)) != NULL)
+    {
+      *linear = is_linear_load_p (*tmp);
+      return *tmp;
+    }
+
+  perm_cache->put (root, vNULL);
+
+  /* If it's a load node, then just read the load permute.  */
+  if (SLP_TREE_LOAD_PERMUTATION (root).exists ())
+    {
+      loads = SLP_TREE_LOAD_PERMUTATION (root);
+      perm_cache->put (root, loads);
+      if (!is_linear_load_p (loads))
+	return loads;
+    }
+  else if (SLP_TREE_DEF_TYPE (root) == vect_external_def)
+    {
+       loads.create (SLP_TREE_LANES (root));
+       tree op;
+       FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_OPS (root), i, op)
+	 {
+	   gimple *defstmt = SSA_NAME_DEF_STMT (op);
+	   if (!is_gimple_assign (defstmt))
+	     return vNULL;
+
+	   switch (gimple_assign_rhs_code (defstmt))
+	   {
+	     case IMAGPART_EXPR:
+	       loads.safe_push (1);
+	       break;
+	     case REALPART_EXPR:
+	       loads.safe_push (0);
+	       break;
+	     default:
+	       {
+		 loads.release ();
+		 return vNULL;
+	       }
+	   }
+	 }
+
+       perm_cache->put (root, loads);
+       if (!is_linear_load_p (loads))
+	 return loads;
+    }
+  else if (SLP_TREE_DEF_TYPE (root) != vect_internal_def)
+    return vNULL;
+
+
+
+  auto_vec<load_permutation_t> all_loads;
+  bool is_perm = SLP_TREE_LANE_PERMUTATION (root).exists ();
+
+  slp_tree child;
+  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (root), i, child)
+    {
+      /* Seed it with a value to prevent infinite recursions.  */
+      loads = linear_loads_p (perm_cache, child, linear);
+      if ((!*linear && !is_perm) || !loads.exists ())
+	return loads;
+
+      all_loads.safe_push (loads);
+    }
+
+  if (is_perm)
+    {
+      lane_permutation_t perm = SLP_TREE_LANE_PERMUTATION (root);
+      load_permutation_t nloads;
+      nloads.create (SLP_TREE_LANES (root));
+      nloads.quick_grow (SLP_TREE_LANES (root));
+      for (i = 0; i < SLP_TREE_LANES (root); i++)
+	nloads[i] = all_loads[perm[i].first][perm[i].second];
+
+      perm_cache->put (root, nloads);
+      if (!is_linear_load_p (nloads))
+	return nloads;
+      loads = nloads;
+    }
+
+  perm_cache->put (root, loads);
+  *linear = true;
+  return loads;
+}
+
+/* Builds a permutation group from the operands in OPS and stores it in BLOCKS.
+   The group describes how to combine the operators to get a valid linear node.
+
+   This is used when combining multiple children from a two_operators node into
+   one using a lane permute to select the appropriate lane. As an example the
+   permute { [0 0] [1 4] [2 2] [3 3] [1 4] [5 5] } says the nodes which occur
+   twice in a group, e.g [0 0] only needs itself to possibly be made linear
+   whereas [1 4] means to combine the nodes 1 and 4. */
+
+static void
+vect_build_perm_groups (slp_tree_to_load_perm_map_t *perm_cache,
+			map_t *blocks, vec<slp_tree> ops)
+{
+  slp_tree op;
+  unsigned i;
+  bool is_linear = false;
+  unsigned min_eq = -1, max_eq = 0;
+  unsigned min_idx = 0, max_idx = 0;
+  FOR_EACH_VEC_ELT (ops, i, op)
+    {
+      load_permutation_t perms = linear_loads_p (perm_cache, op, &is_linear);
+      unsigned x, imin = -1, imax = 0;
+      for (x = 0; x < perms.length () && !is_linear; x++)
+	{
+	  imin = MIN (imin, perms[x]);
+	  imax = MAX (imax, perms[x]);
+	}
+
+	if (imin != imax || perms.length () == 0 || is_linear)
+	  blocks[i] = {i, i};
+	else
+	  {
+	    if (imin <= min_eq)
+	      {
+		min_eq = imin;
+		min_idx = i;
+	      }
+
+	    if (imin >= max_eq)
+	      {
+		max_eq = imin;
+		max_idx = i;
+	      }
+	  }
+    }
+
+  /* Now fill in the gap.  */
+  blocks[min_idx] = {min_idx, max_idx};
+  blocks[max_idx] = {min_idx, max_idx};
+
+  if (dump_enabled_p ())
+    {
+      dump_printf_loc (MSG_NOTE, vect_location, "pattern group: { ");
+      for (i = 0; i < ops.length (); i++)
+	dump_printf (MSG_NOTE,"[%d %d] ", blocks[i].a, blocks[i].b);
+      dump_printf (MSG_NOTE,"}\n");
+    }
+
+}
+
+/* This function attempts to make a node rooted in NODE with parent PARENT
+   linear.  If the node if already linear than the node itself is returned
+   in RESULT.
+
+   If the node is not linear then a new VEC_PERM_EXPR node is created with a
+   lane permute that when applied will make the node linear.   If such a
+   permute cannot be created then FALSE is returned from the function.
+
+   Here linearity is defined as having a sequential, monotically increasing
+   load position inside the load permute generated by the loads reachable from
+   NODE.  */
+
+static bool
+vect_slp_make_linear (slp_tree_to_load_perm_map_t *perm_cache,
+		      slp_tree parent, slp_tree node, slp_tree *result)
+{
+  bool is_linear = false;
+  unsigned x, val;
+  load_permutation_t load_perm = linear_loads_p (perm_cache, node, &is_linear);
+  if (is_linear)
+    {
+      *result = node;
+      return true;
+    }
+
+  /* Attempt to linearise the permute.  */
+  vec<std::pair<unsigned, unsigned> > zipped;
+  zipped.create (load_perm.length ());
+  FOR_EACH_VEC_ELT (load_perm, x, val)
+    zipped.quick_push (std::make_pair (val, x));
+
+  typedef const std::pair<unsigned, unsigned>* cmp_t;
+  zipped.qsort ([](const void *a, const void *b) -> int
+    { return (int)((cmp_t)a)->first - (int)((cmp_t)b)->first; });
+
+  /* Verify if we have a linear permute sequence.  */
+  if (zipped.length () > 0)
+    {
+      unsigned leader = zipped[0].first;
+      for (x = 1; x < zipped.length (); x++)
+	if(!(is_linear = (zipped[x].first == ++leader)))
+	  break;
+    }
+
+  if (!is_linear)
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location,
+			"Loads could not be made linear %p\n",
+			node);
+      zipped.release ();
+      return false;
+  }
+
+  for (x = 0; x < zipped.length (); x++)
+    zipped[x].first = 0;
+
+  /* Create the new permute node and store it instead.  */
+  slp_tree vnode = vect_create_new_slp_node (vNULL, 1);
+  SLP_TREE_CODE (vnode) = VEC_PERM_EXPR;
+  SLP_TREE_LANE_PERMUTATION (vnode) = zipped;
+  SLP_TREE_VECTYPE (vnode) = SLP_TREE_VECTYPE (parent);
+  SLP_TREE_CHILDREN (vnode).quick_push (node);
+  SLP_TREE_REF_COUNT (vnode) = 1;
+  SLP_TREE_LANES (vnode) = SLP_TREE_LANES (node);
+  SLP_TREE_REPRESENTATIVE (vnode) = SLP_TREE_REPRESENTATIVE (parent);
+  *result = vnode;
+  return is_linear;
+}
+
+/* Helper utility to check to see if the permutation PERM is one that can be
+   used in a node combination operation.  This is defined as the permute not
+   having all the elements being the same. e.g [0 0].  */
+
+static inline bool
+vect_can_combine_node_p (load_permutation_t perm, bool is_linear)
+{
+  if (is_linear)
+    return false;
+
+  unsigned i, x;
+  FOR_EACH_VEC_ELT (perm, i, x)
+    if (perm[0] != x)
+      return false;
+
+  return true;
+}
+
+/* This function combines the nodes in MAP together to make a new node using a
+   lane permute.  The nodes to combine are stored in ENTRIES and the resulting
+   node is returned in RESULT.
+
+   If the nodes are already linear then this function fails and returns FALSE.
+   Otherwise it returns the new node and TRUE.  */
+
+static bool
+vect_slp_make_combine_linear (slp_tree_to_load_perm_map_t *perm_cache,
+			      slp_tree parent, vec<slp_tree> entries, map_t map,
+			      slp_tree *result)
+{
+  if (map.a == map.b)
+    return false;
+
+  slp_tree node_a = entries[map.a];
+  slp_tree node_b = entries[map.b];
+
+  bool is_a_linear = false;
+  bool is_b_linear = false;
+
+  load_permutation_t load_perm_a
+    = linear_loads_p (perm_cache, node_a, &is_a_linear);
+  if (!vect_can_combine_node_p (load_perm_a, is_a_linear))
+    return false;
+
+  load_permutation_t load_perm_b
+    = linear_loads_p (perm_cache, node_b, &is_b_linear);
+  if (!vect_can_combine_node_p (load_perm_b, is_b_linear))
+    return false;
+
+  if (!load_perm_a.exists () || !load_perm_b.exists ())
+    return false;
+
+  /* Now we need to figure which node is first.  */
+  auto_vec<slp_tree> nodes;
+  nodes.create (2);
+  vec<std::pair<unsigned, unsigned> > perm;
+  perm.create (2);
+  if (load_perm_a[0] < load_perm_b[0])
+    {
+      perm.quick_push (std::make_pair (0, 0));
+      perm.quick_push (std::make_pair (1, 0));
+    }
+  else
+    {
+      perm.quick_push (std::make_pair (1, 0));
+      perm.quick_push (std::make_pair (0, 0));
+    }
+
+  nodes.quick_push (node_a);
+  nodes.quick_push (node_b);
+
+  slp_tree vnode = vect_create_new_slp_node (vNULL, 1);
+  SLP_TREE_CODE (vnode) = VEC_PERM_EXPR;
+  SLP_TREE_LANE_PERMUTATION (vnode) = perm;
+  SLP_TREE_VECTYPE (vnode) = SLP_TREE_VECTYPE (parent);
+  SLP_TREE_CHILDREN (vnode).safe_splice (nodes);
+  SLP_TREE_REF_COUNT (vnode) = 1;
+  SLP_TREE_LANES (vnode) = SLP_TREE_LANES (parent);
+  SLP_TREE_REPRESENTATIVE (vnode) = SLP_TREE_REPRESENTATIVE (parent);
+  *result = vnode;
+  return true;
+}
+
+/* Checks to see of the expression represented by NODE is a gimple assign with
+   code CODE.  */
+
+static inline bool
+vect_match_expression_p (slp_tree node, tree_code code)
+{
+  if (!node
+      || !SLP_TREE_REPRESENTATIVE (node))
+    return false;
+
+  gimple* expr = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (node));
+  if (!is_gimple_assign (expr)
+      || gimple_assign_rhs_code (expr) != code)
+    return false;
+
+  return true;
+}
+
+/* Checks to see if the expression represented by NODE is a call to the internal
+   function FN.  */
+
+static inline bool
+vect_match_call_p (slp_tree node, internal_fn fn)
+{
+  if (!node
+      || !SLP_TREE_REPRESENTATIVE (node))
+    return false;
+
+  gimple* expr = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (node));
+  if (!gimple_call_internal_p (expr, fn))
+    return false;
+
+   return true;
+}
+
+/* Checks to see if NODE is a two_operands node whom's operands are calls to the
+   internal function FN.  */
+
+static bool
+vect_match_two_call_p (slp_tree node, internal_fn fn)
+{
+  if (!SLP_TREE_REPRESENTATIVE (node)
+      || SLP_TREE_CODE (node) != VEC_PERM_EXPR
+      || SLP_TREE_CHILDREN (node).length () != 2)
+    return false;
+
+  vec<slp_tree> children = SLP_TREE_CHILDREN (node);
+
+  return vect_match_call_p (children[0], fn)
+	 && vect_match_call_p (children[1], fn);
+}
+
+/* Check if the given lane permute in PERMUTES matches an alternating sequence
+   of {P0 P1 P0 P1 ...}.  This to account for unrolled loops.  Further mode
+   there resulting permute must be linear.   */
+
+static bool
+vect_check_lane_permute (lane_permutation_t &permutes,
+			 unsigned p0, unsigned p1)
+{
+  if (permutes.length () == 0)
+    return false;
+
+  unsigned val[2] = {p0, p1};
+  unsigned seed = permutes[0].second;
+  for (unsigned i = 0; i < permutes.length (); i++)
+    if (permutes[i].first != val[i % 2]
+	|| permutes[i].second != seed++)
+      return false;
+
+  return true;
+}
+
+/* This function will match the two gimple expressions representing NODE1 and
+   NODE2 in parallel and returns the pair operation that represents the two
+   expressions in the two statements.
+
+   If match is successful then the corresponding complex_operation is
+   returned and the arguments to the two matched operations are returned in OPS.
+
+   If TWO_OPERANDS it is expected that the LANES of the parent VEC_PERM select
+   from the two nodes alternatingly.
+
+   If unsuccessful then CMPLX_NONE is returned and OPS is untouched.
+
+   e.g. the following gimple statements
+
+   stmt 0 _39 = _37 + _12;
+   stmt 1 _6 = _38 - _36;
+
+   will return PLUS_MINUS along with OPS containing {_37, _12, _38, _36}.
+*/
+
+static complex_operation_t
+vect_detect_pair_op (slp_tree node1, slp_tree node2, lane_permutation_t &lanes,
+		     bool two_operands = true, vec<slp_tree> *ops = NULL)
+{
+  complex_operation_t result = CMPLX_NONE;
+
+  if (vect_match_expression_p (node1, MINUS_EXPR)
+      && vect_match_expression_p (node2, PLUS_EXPR)
+      && (!two_operands || vect_check_lane_permute (lanes, 0, 1)))
+    result = MINUS_PLUS;
+  else if (vect_match_expression_p (node1, PLUS_EXPR)
+	   && vect_match_expression_p (node2, MINUS_EXPR)
+	   && (!two_operands || vect_check_lane_permute (lanes, 0, 1)))
+    result = PLUS_MINUS;
+  else if (vect_match_expression_p (node1, PLUS_EXPR)
+	   && vect_match_expression_p (node2, PLUS_EXPR))
+    result = PLUS_PLUS;
+  else if (vect_match_expression_p (node1, MULT_EXPR)
+	   && vect_match_expression_p (node2, MULT_EXPR))
+    result = MULT_MULT;
+
+  if (result != CMPLX_NONE && ops != NULL)
+    {
+      ops->create (2);
+      ops->quick_push (node1);
+      ops->quick_push (node2);
+    }
+  return result;
+}
+
+/* Overload of vect_detect_pair_op that matches against the representative
+   statements in the children of NODE.  It is expected that NODE has exactly
+   two children and when TWO_OPERANDS then NODE must be a VEC_PERM.  */
+
+static complex_operation_t
+vect_detect_pair_op (slp_tree node, bool two_operands = true,
+		     vec<slp_tree> *ops = NULL)
+{
+  if (!two_operands && SLP_TREE_CODE (node) == VEC_PERM_EXPR)
+    return CMPLX_NONE;
+
+  if (SLP_TREE_CHILDREN (node).length () != 2)
+    return CMPLX_NONE;
+
+  vec<slp_tree> children = SLP_TREE_CHILDREN (node);
+  lane_permutation_t &lanes = SLP_TREE_LANE_PERMUTATION (node);
+
+  return vect_detect_pair_op (children[0], children[1], lanes, two_operands,
+			      ops);
+}
+
+/* This function marks every statement that is being replaced during the
+   the pattern matching as PURE.  Normally when replacing a statement due
+   to a pattern we add the statement to the STMT_VINFO_PATTERN_DEF_SEQ of
+   the pattern that is replacing them.  In this case however this won't
+   work as when doing the replacement we are changing the nodes that are
+   used by the statements.  This means that when vectorized the SSA chain
+   is different than in the BB.
+
+   Declaring the statements as part of the sequence will then cause SSA
+   verification to fail as we may refer to statements that were not in the
+   original USE-DEF chain of the statement we are replacing.
+
+   The relevancy of the statements are saved and are later restored if SLP is to
+   be dissolved.  */
+
+static void
+vect_mark_stmts_as_in_pattern (hash_set<slp_tree> *cache, slp_tree node)
+{
+  if (cache->add (node))
+    return;
+
+  unsigned i;
+  stmt_vec_info stmt_info;
+  slp_tree child;
+
+  FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info)
+    {
+      if (gimple_assign_load_p (STMT_VINFO_STMT (stmt_info)))
+       return;
+
+      vect_save_relevancy (stmt_info);
+      STMT_VINFO_SLP_VECT_ONLY (stmt_info) = true;
+      STMT_SLP_TYPE (stmt_info) = pure_slp;
+    }
+
+  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
+    vect_mark_stmts_as_in_pattern (cache, child);
+}
+
+
+/*******************************************************************************
+ * complex_pattern class
+ ******************************************************************************/
+
+/* SLP Complex Numbers pattern matching.
+
+  As an example, the following simple loop:
+
+    double a[restrict N]; double b[restrict N]; double c[restrict N];
+
+    for (int i=0; i < N; i+=2)
+    {
+      c[i] = a[i] - b[i+1];
+      c[i+1] = a[i+1] + b[i];
+    }
+
+  which represents a complex addition on with a rotation of 90* around the
+  argand plane. i.e. if `a` and `b` were complex numbers then this would be the
+  same as `a + (b * I)`.
+
+  Here the expressions for `c[i]` and `c[i+1]` are independent but have to be
+  both recognized in order for the pattern to work.  As an SLP tree this is
+  represented as
+
+                +--------------------------------+
+                |       stmt 0 *_9 = _10;        |
+                |       stmt 1 *_15 = _16;       |
+                +--------------------------------+
+                                |
+                                |
+                                v
+                +--------------------------------+
+                |     stmt 0 _10 = _4 - _8;      |
+                |    stmt 1 _16 = _12 + _14;     |
+                | lane permutation { 0[0] 1[1] } |
+                +--------------------------------+
+                            |        |
+                            |        |
+                            |        |
+               +-----+      |        |      +-----+
+               |     |      |        |      |     |
+         +-----| { } |<-----+        +----->| { } --------+
+         |     |     |   +------------------|     |       |
+         |     +-----+   |                  +-----+       |
+         |        |      |                                |
+         |        |      |                                |
+         |        +------|------------------+             |
+         |               |                  |             |
+         v               v                  v             v
+     +--------------------------+     +--------------------------------+
+     |     stmt 0 _8 = *_7;     |     |        stmt 0 _4 = *_3;        |
+     |    stmt 1 _14 = *_13;    |     |       stmt 1 _12 = *_11;       |
+     | load permutation { 1 0 } |     |    load permutation { 0 1 }    |
+     +--------------------------+     +--------------------------------+
+
+  The pattern matcher allows you to replace both statements 0 and 1 or none at
+  all.  Because this operation is a two operands operation the actual nodes
+  being replaced are those in the { } nodes.  The actual scalar statements
+  themselves are not replaced or used during the matching but instead the
+  SLP_TREE_REPRESENTATIVE statements are inspected.  You are also allowed to
+  replace and match on any number of nodes.
+
+  Because the pattern matcher matches on the representative statement for the
+  SLP node the case of two_operators it allows you to match the children of the
+  node.  This is done using the method `recognize ()`.
+
+*/
+
+/* The complex_pattern class contains common code for pattern matchers that work
+   on complex numbers.  These provide functionality to allow de-construction and
+   validation of sequences depicting/transforming REAL and IMAG pairs.  */
+
+class complex_pattern : public vect_pattern
+{
+  protected:
+    vec<slp_tree> *m_nodes = NULL;
+    complex_pattern (slp_tree *node)
+      : vect_pattern (node)
+    { }
+
+  public:
+    bool validate_p (slp_tree_to_load_perm_map_t *);
+    gcall *build (vec_info *);
+};
+
+/* The post transform and validation function for the complex number
+   patterns.  This will re-arrange the tree and re-organize the nodes such
+   that they can be used by the complex number instructions that are to be
+   created.  It does this by doing the following steps:
+
+   For each node that isn't already linear a new permute node is created which
+   when applied makes the node linear.  If such permutation does not exist then
+   the function returns FALSE.  For nodes that are already linear no work is
+   done.   */
+
+bool
+complex_pattern::validate_p (slp_tree_to_load_perm_map_t *perm_cache)
+{
+  slp_tree node;
+  unsigned ix;
+  auto_vec<slp_tree> workset;
+  FOR_EACH_VEC_ELT (this->m_ops, ix, node)
+    {
+      auto_vec<slp_tree> nodes;
+      nodes.create (this->m_num_args);
+      slp_tree tmp = NULL;
+      auto_vec<slp_tree> new_nodes;
+
+      unsigned i;
+      FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, tmp)
+	{
+	  slp_tree vnode = NULL;
+	  if (vect_slp_make_linear (perm_cache, node, tmp, &vnode))
+	    {
+	      nodes.quick_push (vnode);
+	      if (vnode != tmp)
+		new_nodes.safe_push (vnode);
+	    }
+	  else
+	    {
+	      FOR_EACH_VEC_ELT (new_nodes, i, tmp)
+		delete tmp;
+
+	      FOR_EACH_VEC_ELT (workset, i, tmp)
+		delete tmp;
+
+	      nodes.release();
+	      return false;
+	    }
+	}
+
+      slp_tree new_node
+	= vect_create_new_slp_node (SLP_TREE_SCALAR_STMTS (node),
+				    SLP_TREE_CHILDREN (node).length ());
+      SLP_TREE_VECTYPE (new_node) = SLP_TREE_VECTYPE (node);
+      SLP_TREE_LANE_PERMUTATION (new_node) = SLP_TREE_LANE_PERMUTATION (node);
+      SLP_TREE_CODE (new_node) = SLP_TREE_CODE (node);
+      SLP_TREE_REF_COUNT (new_node) = SLP_TREE_REF_COUNT (node);
+      SLP_TREE_REPRESENTATIVE (new_node) = SLP_TREE_REPRESENTATIVE (node);
+      SLP_TREE_CHILDREN (new_node).safe_splice (nodes);
+      SLP_TREE_LANES (new_node) = SLP_TREE_LANES (node);
+
+      workset.safe_push (new_node);
+    }
+
+  FOR_EACH_VEC_ELT (workset, ix, node)
+    SLP_TREE_CHILDREN (*this->m_node)[ix] = node;
+
+  return true;
+}
+
+
+/* Create a replacement pattern statement for each node in m_node and inserts
+   the new statement into m_node as the new representative statement.  The old
+   statement is marked as being in a pattern defined by the new statement.  The
+   statement is created as call to internal function IFN with m_num_args
+   arguments.
+
+   Futhermore the new pattern is also added to the vectorization information
+   structure VINFO and the old statement STMT_INFO is marked as unused while
+   the new statement is marked as used and the number of SLP uses of the new
+   statement is incremented.
+
+   The newly created SLP nodes are marked as SLP only and will be dissolved
+   if SLP is aborted.
+
+   The newly created gimple call is returned and the BB remains unchanged.
+
+   This default method is designed to only match against simple operands where
+   all the input and output types are the same.
+*/
+
+gcall *
+complex_pattern::build (vec_info *vinfo)
+{
+  stmt_vec_info stmt_info;
+
+  auto_vec<tree> args;
+  args.create (this->m_num_args);
+  args.quick_grow_cleared (this->m_num_args);
+  slp_tree node;
+  unsigned ix;
+  stmt_vec_info call_stmt_info;
+  gcall *call_stmt = NULL;
+
+  FOR_EACH_VEC_ELT (*this->m_nodes, ix, node)
+    {
+      /* Calculate the location of the statement in NODE to replace.  */
+      stmt_info = SLP_TREE_REPRESENTATIVE (node);
+      gimple* old_stmt = STMT_VINFO_STMT (stmt_info);
+      tree type = TREE_TYPE (gimple_get_lhs (old_stmt));
+
+      /* Create the argument set for use by gimple_build_call_internal_vec.  */
+      for (unsigned i = 0; i < this->m_num_args; i++)
+	{
+	  tree var = make_temp_ssa_name (type, old_stmt, "_pa");
+	  args[i] = var;
+	}
+
+      /* Create the new pattern statements.  */
+      call_stmt = gimple_build_call_internal_vec (this->m_ifn, args);
+      tree var = make_temp_ssa_name (type, call_stmt, "slp_patt");
+      gimple_call_set_lhs (call_stmt, var);
+      gimple_set_location (call_stmt, gimple_location (old_stmt));
+      gimple_call_set_nothrow (call_stmt, true);
+
+      /* Adjust the book-keeping for the new and old statements for use during
+	 SLP.  This is required to get the right VF and statement during SLP
+	 analysis.  These changes are created after relevancy has been set for
+	 the nodes as such we need to manually update them.  Any changes will be
+	 undone if SLP is cancelled.  */
+      call_stmt_info
+	= vinfo->add_pattern_stmt (call_stmt, stmt_info);
+
+      /* add_pattern_stmt can't be done in vect_mark_pattern_stmts because
+	 the non-SLP pattern matchers already have added the statement to VINFO
+	 by the time it is called.  Some of them need to modify the returned
+	 stmt_info.  vect_mark_pattern_stmts is called by recog_pattern and it
+	 would increase the size of each pattern with boilerplate code to make
+	 the call there.  */
+      vect_mark_pattern_stmts (vinfo, stmt_info, call_stmt,
+			       SLP_TREE_VECTYPE (node));
+      STMT_SLP_TYPE (call_stmt_info) = pure_slp;
+      STMT_VINFO_SLP_VECT_ONLY (call_stmt_info) = true;
+
+      /* Since we are replacing all the statements in the group with the same
+	 thing it doesn't really matter.  So just set it every time a new stmt
+	 is created.  */
+      SLP_TREE_REPRESENTATIVE (node) = call_stmt_info;
+      SLP_TREE_CODE (node) = CALL_EXPR;
+    }
+  return call_stmt;
+}
+
+/*******************************************************************************
+ * complex_add_pattern class
+ ******************************************************************************/
+
+class complex_add_pattern : public complex_pattern
+{
+  protected:
+    complex_add_pattern (slp_tree *node)
+      : complex_pattern (node)
+    {
+      this->m_num_args = 2;
+    }
+
+  public:
+    static vect_pattern* create (slp_tree *node)
+    {
+       return new complex_add_pattern (node);
+    }
+
+    const char* get_name ()
+    {
+      return "Complex Addition";
+    }
+
+    bool matches (slp_tree_to_load_perm_map_t *);
+    bool matches (complex_operation_t op, slp_tree_to_load_perm_map_t *,
+		  vec<slp_tree> ops);
+};
+
+/* Pattern matcher for trying to match complex addition pattern in SLP tree.
+
+   If no match is found then IFN is set to IFN_LAST.
+   This function matches the patterns shaped as:
+
+   c[i] = a[i] - b[i+1];
+   c[i+1] = a[i+1] + b[i];
+
+   If a match occurred then TRUE is returned, else FALSE.  The initial match is
+   expected to be in OP1 and the initial match operands in args0.  */
+
+bool
+complex_add_pattern::matches (complex_operation_t op,
+			      slp_tree_to_load_perm_map_t *perm_cache,
+			      vec<slp_tree> ops)
+{
+  this->m_ifn = IFN_LAST;
+  this->m_ops.safe_splice (ops);
+
+  /* Find the two components.  Rotation in the complex plane will modify
+     the operations:
+
+      * Rotation  0: + +
+      * Rotation 90: - +
+      * Rotation 180: - -
+      * Rotation 270: + -
+
+      Rotation 0 and 180 can be handled by normal SIMD code, so we don't need
+      to care about them here.  */
+  if (op == MINUS_PLUS)
+    this->m_ifn = IFN_COMPLEX_ADD_ROT90;
+  else if (op == PLUS_MINUS)
+    this->m_ifn = IFN_COMPLEX_ADD_ROT270;
+
+  /* verify that there is a permute, otherwise this isn't a pattern we
+     we support.  */
+  bool is_linear = false;
+  bool all_linear = true;
+  unsigned x;
+  for (x = 0; x < this->m_ops.length () && all_linear; x++)
+    {
+      linear_loads_p (perm_cache, this->m_ops[x], &is_linear);
+      all_linear = all_linear && is_linear;
+    }
+
+  if (this->m_ifn == IFN_LAST || all_linear)
+    return false;
+
+  this->m_nodes = &SLP_TREE_CHILDREN (*this->m_node);
+  return true;
+}
+
+bool
+complex_add_pattern::matches (slp_tree_to_load_perm_map_t *perm_cache)
+{
+  complex_operation_t op
+    = vect_detect_pair_op (*this->m_node, true, &this->m_ops);
+  return matches (op, perm_cache, this->m_ops);
+}
+
+/*******************************************************************************
+ * complex_mul_pattern
+ ******************************************************************************/
+
+/* Helper function of that looks for a match in the CHILDth child of NODE.  The
+   child used is stored in RES.
+
+   If the match is successful then ARGS will contain the operands matched
+   and the complex_operation_t type is returned.  If match is not successful
+   then CMPLX_NONE is returned and ARGS is left unmodified.  */
+
+static inline complex_operation_t
+vect_match_call_complex_mla (slp_tree node, unsigned child,
+			     vec<slp_tree> *args = NULL, slp_tree *res = NULL)
+{
+  gcc_assert (child < SLP_TREE_CHILDREN (node).length ());
+
+  slp_tree data = SLP_TREE_CHILDREN (node)[child];
+
+  if (res)
+    *res = data;
+
+  return vect_detect_pair_op (data, false, args);
+}
+
+/* Check to see if either of the trees in ARGS are a NEGATE_EXPR.  If the first
+   child (args[0]) is a NEGATE_EXPR then NEG_FIRST_P is set to TRUE.
+
+   If a negate is found then the values in ARGS are reordered such that the
+   negate node is always the second one and the entry is replaced by the child
+   of the negate node.  */
+
+static inline bool
+vect_normalize_conj_loc (vec<slp_tree> args, bool *neg_first_p = NULL)
+{
+  gcc_assert (args.length () == 2);
+  bool neg_found = false;
+
+  if (vect_match_expression_p (args[0], NEGATE_EXPR))
+    {
+      std::swap (args[0], args[1]);
+      neg_found = true;
+      if (neg_first_p)
+	*neg_first_p = true;
+    }
+  else if (vect_match_expression_p (args[1], NEGATE_EXPR))
+    {
+      neg_found = true;
+      if (neg_first_p)
+	*neg_first_p = false;
+    }
+
+  if (neg_found)
+    args[1] = SLP_TREE_CHILDREN (args[1])[0];
+
+  return neg_found;
+}
+
+/* Helper function that checks to see if the load permutation PERM is on that
+   belongs to a duplication operation.  i.e. checks to see if all the values in
+   PERM are the same.  */
+
+static inline bool
+vect_is_dup (load_permutation_t perm, unsigned idx)
+{
+  for (unsigned i = 0; i < perm.length (); i++)
+    if (perm[i] != idx)
+     return false;
+
+  return true;
+}
+
+/* Helper function that checks to see if LEFT_OP and RIGHT_OP are both MULT_EXPR
+   nodes but also that they represent an operation that is either a complex
+   multiplication or a complex multiplication by conjucated value.
+
+   Of the negation is expected to be in the first half of the tree (As required
+   by an FMS pattern) then NEG_FIRST is true.  If the operation is a conjucate
+   operation then CONJ_FIRST_OPERAND is set to indicate whether the first or
+   second operand contains the conjucate operation.  */
+
+static inline bool
+vect_validate_multiplication (slp_tree_to_load_perm_map_t *perm_cache,
+			     vec<slp_tree> left_op, vec<slp_tree> right_op,
+			     bool neg_first, bool *conj_first_operand)
+{
+  /* The presence of a negation indicates that we have either a conjucate or a
+     rotation.  We need to distinguish which one.  */
+  *conj_first_operand = false;
+
+  load_permutation_t perm;
+  bool is_linear;
+  /* Complex conjucates have the negation on the imaginary part of the
+     number where rotations affect the real component.  So check if the
+     negation is on a dup of lane 1.  */
+  perm = linear_loads_p (perm_cache, right_op[1], &is_linear);
+  if (is_linear || !vect_is_dup (perm, 1))
+    return false;
+
+  /* Check if the conjucate is on the second first or second operand.  The
+     order of the node with the conjucate value determines this, and the dup
+     node must be one of lane 0 of the same DR as the neg node.  */
+  perm = linear_loads_p (perm_cache, left_op[0], &is_linear);
+  if (is_linear)
+    {
+      perm = linear_loads_p (perm_cache, left_op[1], &is_linear);
+      if (is_linear)
+	return false;
+    }
+  else if (!neg_first)
+    *conj_first_operand = true;
+  else
+    return false;
+
+  if (!vect_is_dup (perm, 0))
+    return false;
+
+  return true;
+}
+
+/* Helper function to help distinguish between a conjucate and a rotation in a
+   complex multiplication.  The operations have similar shapes but the order of
+   the load permutes are different.  This function returns TRUE when the order
+   is consistent with a multiplication or multiplication by conjucated
+   operand but returns FALSE if it's a multiplication by rotated operand.  */
+
+static inline bool
+vect_validate_multiplication (slp_tree_to_load_perm_map_t *perm_cache,
+			     vec<slp_tree> op, unsigned idx)
+{
+  /* The left node is the more common case, test it first.  */
+  bool is_linear;
+  load_permutation_t perm = linear_loads_p (perm_cache, op[0], &is_linear);
+  if (is_linear || !vect_is_dup (perm, idx))
+    {
+      perm = linear_loads_p (perm_cache, op[1], &is_linear);
+      if (is_linear || !vect_is_dup (perm, idx))
+	return false;
+    }
+
+  return true;
+}
+
+class complex_mul_pattern : public complex_pattern
+{
+  protected:
+    /* Allocate enough space for FMA as well.  */
+    map_t m_blocks[6] = {};
+    bool m_inplace = false;
+    complex_mul_pattern (slp_tree *node)
+      : complex_pattern (node)
+    {
+      this->m_num_args = 2;
+    }
+
+  public:
+    static vect_pattern* create (slp_tree *node)
+    {
+       return new complex_mul_pattern (node);
+    }
+
+    const char* get_name ()
+    {
+      return "Complex Multiplication";
+    }
+
+    bool validate_p (slp_tree_to_load_perm_map_t *);
+    bool matches (slp_tree_to_load_perm_map_t *);
+    bool matches (complex_operation_t, slp_tree_to_load_perm_map_t *,
+		  vec<slp_tree>);
+};
+
+/* Pattern matcher for trying to match complex multiply pattern in SLP tree
+   If the operation matches then IFN is set to the operation it matched
+   and the arguments to the two replacement statements are put in m_ops.
+
+   If no match is found then IFN is set to IFN_LAST and m_ops is unchanged.
+
+   This function matches the patterns shaped as:
+
+   double ax = (b[i+1] * a[i]);
+   double bx = (a[i+1] * b[i]);
+
+   c[i] = c[i] - ax;
+   c[i+1] = c[i+1] + bx;
+
+   If a match occurred then TRUE is returned, else FALSE.  The initial match is
+   expected to be in OP1 and the initial match operands in args0.  */
+
+bool
+complex_mul_pattern::matches (complex_operation_t op,
+			      slp_tree_to_load_perm_map_t *perm_cache,
+			      vec<slp_tree> /* ops */)
+{
+  this->m_ifn = IFN_LAST;
+
+  if (op != MINUS_PLUS)
+    return false;
+
+  slp_tree root = *this->m_node;
+  /* First two nodes must be a multiply.  */
+  auto_vec<slp_tree> muls;
+  if (vect_match_call_complex_mla (root, 0) != MULT_MULT
+      || vect_match_call_complex_mla (root, 1, &muls) != MULT_MULT)
+    return false;
+
+  /* Now operand2+4 may lead to another expression.  */
+  auto_vec<slp_tree> left_op, right_op;
+  left_op.safe_splice (SLP_TREE_CHILDREN (muls[0]));
+  right_op.safe_splice (SLP_TREE_CHILDREN (muls[1]));
+
+  bool neg_first;
+  bool is_neg = vect_normalize_conj_loc (right_op, &neg_first);
+
+  if (!is_neg)
+    {
+      /* A multiplication needs to multiply agains the real pair, otherwise
+	 the pattern matches that of FMS.   */
+      if (!vect_validate_multiplication (perm_cache, left_op, 0))
+	return false;
+      this->m_ifn = IFN_COMPLEX_MUL;
+      this->m_ops.safe_splice (left_op);
+      this->m_ops.safe_splice (right_op);
+    }
+  else if (is_neg)
+    {
+      bool conj_first_operand;
+      if (!vect_validate_multiplication (perm_cache, left_op, right_op,
+					 neg_first, &conj_first_operand))
+	return false;
+
+      this->m_ifn = IFN_COMPLEX_MUL_CONJ;
+      /* Check if the conjucate is on the first or second parameter.  */
+      if (!conj_first_operand)
+	{
+	  this->m_ops.safe_push (left_op[1]);
+	  this->m_ops.safe_push (left_op[0]);
+	}
+      else
+	{
+	  this->m_ops.safe_push (left_op[0]);
+	  this->m_ops.safe_push (left_op[1]);
+	}
+      this->m_ops.safe_push (right_op[1]);
+      this->m_ops.safe_push (right_op[0]);
+    }
+
+  vect_build_perm_groups (perm_cache, &this->m_blocks[0], this->m_ops);
+  this->m_nodes = &SLP_TREE_CHILDREN (*this->m_node);
+
+  return true;
+}
+
+bool
+complex_mul_pattern::matches (slp_tree_to_load_perm_map_t *perm_cache)
+{
+  complex_operation_t op
+    = vect_detect_pair_op (*this->m_node);
+  return matches (op, perm_cache, this->m_ops);
+}
+
+
+/* Validates to see if the Complex MUL that we have matched is valid.  This is
+   done through a combination of making nodes linear and combining nodes.  */
+
+bool
+complex_mul_pattern::validate_p (slp_tree_to_load_perm_map_t *perm_cache)
+{
+  slp_tree node;
+  unsigned ix;
+  hash_set<slp_tree> cache;
+  auto_vec<slp_tree> workset;
+  FOR_EACH_VEC_ELT (*this->m_nodes, ix, node)
+    {
+      auto_vec<slp_tree> nodes;
+      nodes.create (this->m_num_args);
+      slp_tree tmp = NULL;
+      auto_vec<slp_tree> new_nodes;
+
+      unsigned i;
+      for (i = 0; i < this->m_num_args; i++)
+	{
+	  unsigned index = (ix * this->m_num_args) + i;
+	  map_t map = this->m_blocks[index];
+	  slp_tree vnode = NULL;
+	  bool needs_linear = map.a == map.b;
+	  tmp = this->m_ops[index];
+	  cache.add (tmp);
+	  if (needs_linear
+	      && vect_slp_make_linear (perm_cache, node, tmp, &vnode))
+	    {
+	      nodes.quick_push (vnode);
+	      if (vnode != tmp)
+		new_nodes.safe_push (vnode);
+	    }
+	  else if (!needs_linear
+		   && vect_slp_make_combine_linear (perm_cache, node,
+						    this->m_ops, map, &vnode))
+	    {
+	      nodes.quick_push (vnode);
+	      if (vnode != tmp)
+		new_nodes.safe_push (vnode);
+	    }
+	  else
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_NOTE, vect_location,
+				"stmts could not be made %s %p\n",
+				needs_linear ? "linear" : "linear/combined",
+				tmp);
+	      FOR_EACH_VEC_ELT (new_nodes, i, tmp)
+		delete tmp;
+
+	      if (!m_inplace)
+		FOR_EACH_VEC_ELT (workset, i, tmp)
+		  delete tmp;
+
+	      nodes.release();
+	      return false;
+	    }
+
+	  vect_mark_stmts_as_in_pattern (&cache, node);
+	}
+
+      if (m_inplace)
+	{
+	  workset.safe_splice (nodes);
+	}
+      else
+	{
+	  slp_tree new_node
+	    = vect_create_new_slp_node (SLP_TREE_SCALAR_STMTS (node),
+					SLP_TREE_CHILDREN (node).length ());
+	  SLP_TREE_VECTYPE (new_node) = SLP_TREE_VECTYPE (node);
+	  SLP_TREE_LANE_PERMUTATION (new_node)
+	    = SLP_TREE_LANE_PERMUTATION (node);
+	  SLP_TREE_CODE (new_node) = SLP_TREE_CODE (node);
+	  SLP_TREE_REF_COUNT (new_node) = SLP_TREE_REF_COUNT (node);
+	  SLP_TREE_REPRESENTATIVE (new_node) = SLP_TREE_REPRESENTATIVE (node);
+	  SLP_TREE_CHILDREN (new_node).safe_splice (nodes);
+	  SLP_TREE_LANES (new_node) = SLP_TREE_LANES (node);
+
+	  workset.safe_push (new_node);
+	}
+    }
+
+  if (m_inplace)
+    {
+      SLP_TREE_CHILDREN (*this->m_node).truncate (0);
+      SLP_TREE_CHILDREN (*this->m_node).safe_splice (workset);
+    }
+  else
+    {
+      FOR_EACH_VEC_ELT (workset, ix, node)
+	SLP_TREE_CHILDREN (*this->m_node)[ix] = node;
+    }
+
+  return true;
+}
+
+/*******************************************************************************
+ * complex_fma_pattern class
+ ******************************************************************************/
+
+class complex_fma_pattern : public complex_mul_pattern
+{
+  protected:
+    /* Temporary nodes cache as scope of the class so we don't
+       have to worry about freeing it. */
+    auto_vec<slp_tree> m_ncache;
+    /* List of nodes to mark as no longer being a mul.  The first
+       one is chosen and the last one is dangling.  */
+    auto_vec<slp_tree> m_mul_nodes;
+    complex_fma_pattern (slp_tree *node)
+      : complex_mul_pattern (node)
+    {
+      this->m_num_args = 3;
+    }
+
+  public:
+    static vect_pattern* create (slp_tree *node)
+    {
+       return new complex_fma_pattern (node);
+    }
+
+    const char* get_name ()
+    {
+      return "Complex FMA";
+    }
+
+    bool matches (slp_tree_to_load_perm_map_t *);
+    bool matches (complex_operation_t, slp_tree_to_load_perm_map_t *,
+		  vec<slp_tree> ops);
+    bool validate_p (slp_tree_to_load_perm_map_t *);
+};
+
+/* Helper function to "reset" a previously matched node and undo the changes
+   made enough so that the node is treated as an irrelevant node.  */
+
+static inline void
+vect_slp_reset_pattern (slp_tree node)
+{
+  stmt_vec_info stmt_info = vect_orig_stmt (SLP_TREE_REPRESENTATIVE (node));
+  STMT_VINFO_IN_PATTERN_P (stmt_info) = false;
+  STMT_SLP_TYPE (stmt_info) = pure_slp;
+  SLP_TREE_REPRESENTATIVE (node) = stmt_info;
+}
+
+/* Pattern matcher for trying to match complex multiply and accumulate
+   and multiply and subtract patterns in SLP tree.
+   If the operation matches then IFN is set to the operation it matched and
+   the arguments to the two replacement statements are put in m_ops.
+
+   If no match is found then IFN is set to IFN_LAST and m_ops is unchanged.
+
+   This function matches the patterns shaped as:
+
+   double ax = (b[i+1] * a[i]) + (b[i] * a[i]);
+   double bx = (a[i+1] * b[i]) - (a[i+1] * b[i+1]);
+
+   c[i] = c[i] - ax;
+   c[i+1] = c[i+1] + bx;
+
+   If a match occurred then TRUE is returned, else FALSE.  The match is
+   performed after COMPLEX_MUL which would have done the majority of the work.
+   This function merely matches an ADD with a COMPLEX_MUL IFN.  The initial
+   match is expected to be in OP1 and the initial match operands in args0.  */
+
+bool
+complex_fma_pattern::matches (complex_operation_t op1,
+			      slp_tree_to_load_perm_map_t * /* perm_cache */,
+			      vec<slp_tree> /* args0 */)
+{
+  this->m_ifn = IFN_LAST;
+
+  /* Find the two components.  We match Complex MUL first which reduces the
+     amount of work this pattern has to do.  After that we just match the
+     head node and we're done.:
+
+     * FMA: + +.
+
+     We need to ignore the two_operands nodes that may also match.
+     For that we can check if they have any scalar statements and also
+     check that it's not a permute node as we're looking for a normal
+     PLUS_EXPR operation.  */
+  if (op1 != CMPLX_NONE)
+    return false;
+
+  /* Find the two components.  We match Complex MUL first which reduces the
+     amount of work this pattern has to do.  After that we just match the
+     head node and we're done.:
+
+   * FMA: + + on a non-two_operands node.  */
+  if (SLP_TREE_LANE_PERMUTATION (*this->m_node).exists ()
+      || !SLP_TREE_SCALAR_STMTS (*this->m_node).exists ()
+      || !vect_match_expression_p (*this->m_node, PLUS_EXPR))
+    return false;
+
+  slp_tree node = SLP_TREE_CHILDREN (*this->m_node)[1];
+
+  if (vect_match_two_call_p (node, IFN_COMPLEX_MUL))
+    this->m_ifn = IFN_COMPLEX_FMA;
+  else if (vect_match_two_call_p (node, IFN_COMPLEX_MUL_CONJ))
+    this->m_ifn = IFN_COMPLEX_FMA_CONJ;
+  else
+    return false;
+
+  this->m_mul_nodes.safe_splice (SLP_TREE_CHILDREN (node));
+  node = SLP_TREE_CHILDREN (node)[0];
+  this->m_ops.create (this->m_num_args);
+
+  if (this->m_ifn == IFN_COMPLEX_FMA)
+    {
+      this->m_ops.quick_push (SLP_TREE_CHILDREN (*this->m_node)[0]);
+      this->m_ops.quick_push (SLP_TREE_CHILDREN (node)[1]);
+      this->m_ops.quick_push (SLP_TREE_CHILDREN (node)[0]);
+    }
+  else
+    {
+      this->m_ops.quick_push (SLP_TREE_CHILDREN (*this->m_node)[0]);
+      this->m_ops.quick_push (SLP_TREE_CHILDREN (node)[0]);
+      this->m_ops.quick_push (SLP_TREE_CHILDREN (node)[1]);
+    }
+
+  this->m_ncache.safe_push (*this->m_node);
+  this->m_nodes = &this->m_ncache;
+
+  return true;
+}
+
+bool
+complex_fma_pattern::matches (slp_tree_to_load_perm_map_t *perm_cache)
+{
+  auto_vec<slp_tree> args0;
+  complex_operation_t op
+    = vect_detect_pair_op (*this->m_node, true, &args0);
+  return matches (op, perm_cache, args0);
+}
+
+bool
+complex_fma_pattern::validate_p (slp_tree_to_load_perm_map_t *perm_cache)
+{
+  /* FMA's top level node is + + and so is not two_operator,
+     we only need one child.  */
+  slp_tree node;
+  auto_vec<slp_tree> nodes;
+  auto_vec<slp_tree> new_nodes;
+  nodes.create (this->m_num_args);
+  hash_set<slp_tree> cache;
+  unsigned ix;
+  FOR_EACH_VEC_ELT (this->m_ops, ix, node)
+    {
+      slp_tree vnode = NULL;
+      if (vect_slp_make_linear (perm_cache, *this->m_node, node, &vnode))
+	{
+	  nodes.quick_push (vnode);
+	  if (vnode != node)
+	    new_nodes.safe_push (vnode);
+	  if (SLP_TREE_CHILDREN (node).length () == 2)
+	    {
+	      cache.add(SLP_TREE_CHILDREN (node)[0]);
+	      cache.add(SLP_TREE_CHILDREN (node)[1]);
+	    }
+	  else
+	    cache.add (node);
+	}
+      else
+	{
+	  FOR_EACH_VEC_ELT (new_nodes, ix, node)
+	    delete node;
+
+	  nodes.release();
+	  return false;
+	}
+    }
+
+  SLP_TREE_CHILDREN (*this->m_node).truncate (0);
+  SLP_TREE_CHILDREN (*this->m_node).safe_splice (nodes);
+
+  /* Ignore the part of the tree that becomes irrelevant for FMA.  */
+  vect_slp_reset_pattern (this->m_mul_nodes[0]);
+  vect_slp_reset_pattern (this->m_mul_nodes[1]);
+  vect_mark_stmts_as_in_pattern (&cache, this->m_mul_nodes[1]);
+
+  return true;
+}
+
+
+/*******************************************************************************
+ * complex_fms_pattern class
+ ******************************************************************************/
+
+class complex_fms_pattern : public complex_mul_pattern
+{
+  protected:
+    complex_fms_pattern (slp_tree *node)
+      : complex_mul_pattern (node)
+    {
+      this->m_num_args = 3;
+    }
+
+  public:
+    static vect_pattern* create (slp_tree *node)
+    {
+       return new complex_fms_pattern (node);
+    }
+
+    const char* get_name ()
+    {
+      return "Complex FMS";
+    }
+
+    bool matches (slp_tree_to_load_perm_map_t *);
+    bool matches (complex_operation_t, slp_tree_to_load_perm_map_t *,
+		  vec<slp_tree>);
+};
+
+/* Pattern matcher for trying to match complex multiply and accumulate
+   and multiply and subtract patterns in SLP tree.
+   If the operation matches then IFN is set to the operation it matched and
+   the arguments to the two replacement statements are put in m_ops.
+
+   If no match is found then IFN is set to IFN_LAST and m_ops is unchanged.
+
+   This function matches the patterns shaped as:
+
+   double ax = (b[i+1] * a[i]) + (b[i] * a[i]);
+   double bx = (a[i+1] * b[i]) - (a[i+1] * b[i+1]);
+
+   c[i] = c[i] - ax;
+   c[i+1] = c[i+1] + bx;
+
+   If a match occurred then TRUE is returned, else FALSE.  The initial match is
+   expected to be in OP1 and the initial match operands in args0.  */
+bool
+complex_fms_pattern::matches (complex_operation_t op1,
+			      slp_tree_to_load_perm_map_t *perm_cache,
+			      vec<slp_tree> args0)
+{
+  this->m_ifn = IFN_LAST;
+
+  /* Find the two components.  We match Complex MUL first which reduces the
+     amount of work this pattern has to do.  After that we just match the
+     head node and we're done.:
+
+     * FMS: - +.  */
+  slp_tree child = NULL;
+
+  /* We need to ignore the two_operands nodes that may also match,
+     for that we can check if they have any scalar statements and also
+     check that it's not a permute node as we're looking for a normal
+     PLUS_EXPR operation.  */
+  if (op1 != PLUS_MINUS)
+    return false;
+
+  child = SLP_TREE_CHILDREN (args0[1])[1];
+  if (vect_detect_pair_op (child) != MINUS_PLUS)
+    return false;
+
+  /* First two nodes must be a multiply.  */
+  auto_vec<slp_tree> muls;
+  if (vect_match_call_complex_mla (child, 0) != MULT_MULT
+      || vect_match_call_complex_mla (child, 1, &muls) != MULT_MULT)
+    return false;
+
+  /* Now operand2+4 may lead to another expression.  */
+  auto_vec<slp_tree> left_op, right_op;
+  left_op.safe_splice (SLP_TREE_CHILDREN (muls[1]));
+  right_op.safe_splice (SLP_TREE_CHILDREN (muls[0]));
+
+  bool is_neg = vect_normalize_conj_loc (right_op);
+
+  this->m_ops.create (6);
+  child = SLP_TREE_CHILDREN (args0[0])[0];
+  this->m_ops.quick_push (child);
+  if (!is_neg)
+    {
+      this->m_ifn = IFN_COMPLEX_FMS;
+      this->m_ops.quick_push (left_op[1]);
+      this->m_ops.quick_push (left_op[0]);
+    }
+  else if (is_neg)
+    {
+      bool conj_first_operand;
+      if (!vect_validate_multiplication (perm_cache, left_op, right_op, false,
+					 &conj_first_operand))
+	return false;
+
+      this->m_ifn = IFN_COMPLEX_FMS_CONJ;
+      /* Check if the conjucate is on the first or second parameter.  */
+      if (!conj_first_operand)
+	{
+	  this->m_ops.quick_push (left_op[1]);
+	  this->m_ops.quick_push (left_op[0]);
+	}
+      else
+	{
+	  this->m_ops.quick_push (left_op[0]);
+	  this->m_ops.quick_push (left_op[1]);
+	}
+    }
+
+  this->m_ops.quick_push (child);
+  this->m_ops.quick_push (right_op[1]);
+  this->m_ops.quick_push (right_op[0]);
+
+  vect_build_perm_groups (perm_cache, &this->m_blocks[0], this->m_ops);
+  this->m_nodes = &SLP_TREE_CHILDREN (*this->m_node);
+  return true;
+}
+
+bool
+complex_fms_pattern::matches (slp_tree_to_load_perm_map_t *perm_cache)
+{
+  auto_vec<slp_tree> args0;
+  complex_operation_t op
+    = vect_detect_pair_op (*this->m_node, true, &args0);
+  return matches (op, perm_cache, args0);
+}
+
+
+/*******************************************************************************
+ * complex_operations_pattern class
+ ******************************************************************************/
+
+/* This function combines all the existing pattern matchers above into one class
+   that shares the functionality between them.  The initial match is shared
+   between all complex operations.  */
+
+class complex_operations_pattern : public complex_pattern
+{
+  protected:
+    complex_operations_pattern (slp_tree *node)
+      : complex_pattern (node)
+    { }
+
+  gcall *build ();
+
+  /* The current "active" pattern.  */
+  vect_pattern *m_patt = NULL;
+
+  public:
+    using complex_pattern::matches;
+
+    bool matches (slp_tree_to_load_perm_map_t *perm_cache);
+    const char* get_name ();
+    bool validate_p (slp_tree_to_load_perm_map_t *perm_cache);
+    bool is_optab_supported_p (tree, optimization_type);
+    internal_fn get_ifn ();
+    gcall *build (vec_info *);
+    void reset (slp_tree *);
+    ~complex_operations_pattern ();
+
+    static vect_pattern* create (slp_tree *node)
+    {
+       return new complex_operations_pattern (node);
+    }
+};
+
+
+/* Clean up the pattern if one existed.  */
+
+complex_operations_pattern::~complex_operations_pattern ()
+{
+  if (this->m_patt)
+    delete this->m_patt;
+  this->m_patt = NULL;
+}
+
+/* Match but do not perform any additional operations on the SLP tree.  If the
+   match is successful then that pattern is made "active" in the class.  */
+
+bool
+complex_operations_pattern::matches (slp_tree_to_load_perm_map_t *perm_cache)
+{
+  complex_operation_t op
+    = vect_detect_pair_op (*this->m_node, true, &this->m_ops);
+
+  /* Check which pattern this may be.  Match longest pattern first.  */
+  this->m_patt = complex_fma_pattern::create (this->m_node);
+  if (this->m_patt->matches (op, perm_cache, this->m_ops))
+    return true;
+
+  if (op == CMPLX_NONE)
+    return false;
+
+  delete this->m_patt;
+
+  this->m_patt = complex_fms_pattern::create (this->m_node);
+  if (this->m_patt->matches (op, perm_cache, this->m_ops))
+    return true;
+
+  delete this->m_patt;
+
+  this->m_patt = complex_mul_pattern::create (this->m_node);
+  if (this->m_patt->matches (op, perm_cache, this->m_ops))
+    return true;
+
+  delete this->m_patt;
+
+  this->m_patt = complex_add_pattern::create (this->m_node);
+  if (this->m_patt->matches (op, perm_cache, this->m_ops))
+    return true;
+
+  delete this->m_patt;
+
+  this->m_patt = NULL;
+  return false;
+}
+
+/* Friendly name of the operation the pattern matches.  */
+
+const char*
+complex_operations_pattern::get_name ()
+{
+  if (!this->m_patt)
+    return "Complex Operations";
+
+  return this->m_patt->get_name ();
+}
+
+/* Only perform the pattern creation part of the matcher.  This creates and
+   returns the new pattern statement and encapsulates the currently active
+   pattern.  */
+
+gcall *
+complex_operations_pattern::build (vec_info *vinfo)
+{
+  if (!this->m_patt)
+    return NULL;
+
+  return this->m_patt->build (vinfo);
+}
+
+/* Validate the operation the currently active pattern detected.  */
+
+bool
+complex_operations_pattern::validate_p (slp_tree_to_load_perm_map_t *perm_cache)
+{
+  if (!this->m_patt)
+    return false;
+
+  return this->m_patt->validate_p (perm_cache);
+}
+
+/* Check if the obtab of the currently active matches is supported.  */
+
+bool
+complex_operations_pattern::is_optab_supported_p (tree vectype,
+						  optimization_type opt_type)
+{
+  if (!vectype || !this->m_patt)
+    return false;
+
+  return this->m_patt->is_optab_supported_p (vectype, opt_type);
+}
+
+/* Return the matched IFN from the current active matcher if any.  */
+
+internal_fn
+complex_operations_pattern::get_ifn ()
+{
+  if (!this->m_patt)
+    return IFN_LAST;
+
+  return this->m_patt->get_ifn ();
+}
+
+/*******************************************************************************
+ * Pattern matching definitions
+ ******************************************************************************/
+
+#define SLP_PATTERN(x) &x::create
+vect_pattern_decl_t slp_patterns[]
+{
+  /* For least amount of back-tracking and more efficient matching
+     order patterns from the largest to the smallest.  Especially if they
+     overlap in what they can detect.  */
+
+  /* FMA overlaps with MUL but is the longer sequence.  Because we're in post
+     order traversal we can't match FMA if included in
+     complex_operations_pattern so must be checked on it's own.  */
+  SLP_PATTERN (complex_operations_pattern),
+};
+#undef SLP_PATTERN
+
+/* Set the number of SLP pattern matchers available.  */
+size_t num__slp_patterns = sizeof(slp_patterns)/sizeof(vect_pattern_decl_t);