
[v2,6/16] middle-end Add Complex Addition with rotation detection

Message ID 20200925142856.GA17824@arm.com
State New
Series middle-end Add support for SLP vectorization of complex number instructions.

Commit Message

Tamar Christina Sept. 25, 2020, 2:28 p.m. UTC
Hi All,

This patch adds pattern detection for the following operation:

  Addition with rotation of the second argument around the Argand plane.
    Supported rotations are 90 and 270.

    c = a + (b * I) and c = a + (b * I * I * I)

  where a, b and c are complex numbers.
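
For example, the 90 degree case corresponds to a scalar loop of the
following shape (an illustrative sketch, not taken from the testsuite):

  #include <complex.h>

  /* c = a + b * I: addition with the second operand rotated by 90 degrees.
     At the lane level this becomes c_re = a_re - b_im and c_im = a_im + b_re,
     which is the statement shape the SLP pattern matcher looks for.  */
  void
  add_rot90 (_Complex float *restrict c, const _Complex float *restrict a,
             const _Complex float *restrict b, int n)
  {
    for (int i = 0; i < n; i++)
      c[i] = a[i] + b[i] * I;
  }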

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* doc/md.texi: Document optabs.
	* internal-fn.def (COMPLEX_ADD_ROT90, COMPLEX_ADD_ROT270): New.
	* optabs.def (cadd90_optab, cadd270_optab): New.
	* tree-vect-slp-patterns.c (class ComplexAddPattern): New.
	(slp_patterns): Add ComplexAddPattern.

--

Comments

Richard Sandiford Sept. 29, 2020, 10:02 a.m. UTC | #1
Tamar Christina <tamar.christina@arm.com> writes:
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 2b46286943778e16d95b15def4299bcbf8db7eb8..71e226505b2619d10982b59a4ebbed73a70f29be 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -6132,6 +6132,17 @@ floating-point mode.
>  
>  This pattern is not allowed to @code{FAIL}.
>  
> +@cindex @code{cadd@var{m}@var{n}3} instruction pattern
> +@item @samp{cadd@var{m}@var{n}3}
> +Perform a vector addition of complex numbers in operand 1 with operand 2
> +rotated by @var{m} degrees around the Argand plane, storing the result in
> +operand 0.  The instruction must perform the operation on data loaded
> +contiguously into the vectors.

Nitpicking, sorry, but I think it would be better to describe the
layout directly rather than in terms of loads, since the preceding
operation might not be a load.

I guess the main question is: what representation do we expect for
big-endian?  A normal Advanced SIMD LDR would give this (for floats):

             MEMORY
   +-----+-----+-----+-----+
   | r0  | i0  | r1  | i1  |
   +-----+-----+-----+-----+
   |  0  |  1  |  2  |  3  |   array numbering
   +-----+-----+-----+-----+
      V     V     V     V      Advanced SIMD LDR
   +-----+-----+-----+-----+
   | r0  | i0  | r1  | i1  |
   +-----+-----+-----+-----+
   |  0  |  1  |  2  |  3  |   GCC lane numbering
   +-----+-----+-----+-----+
   |  3  |  2  |  1  |  0  |   Arm lane numbering
   +-----+-----+-----+-----+
  MSB       REGISTER      LSB

but the FC* instructions put the imaginary parts in the more
significant lane, so the pairs of elements above would need
to be reversed:

             MEMORY
   +-----+-----+-----+-----+
   | r0  | i0  | r1  | i1  |
   +-----+-----+-----+-----+
   |  0  |  1  |  2  |  3  |   array numbering
   +-----+-----+-----+-----+
       \   /       \   /
        \ /         \ /
         X           X         Load and permute
        / \         / \
       /   \       /   \
   +-----+-----+-----+-----+
   | i0  | r0  | i1  | r1  |
   +-----+-----+-----+-----+
   |  0  |  1  |  2  |  3  |   GCC lane numbering
   +-----+-----+-----+-----+
   |  3  |  2  |  1  |  0  |   Arm lane numbering
   +-----+-----+-----+-----+
  MSB       REGISTER      LSB

(Or the whole vector could be reversed.)

We might decide that it just isn't worth doing this for Advanced SIMD.
But should the semantics of the optab be that:

(1) GCC lane number 0 holds a real part, or
(2) the least significant lane holds a real part?

With (1), it would be up to the target to hide the permute above.
With (2), the vectoriser would need to introduce the permute itself.

I'm not sure there's a perfect answer even for Arm targets.  (2) matches
the Advanced SIMD semantics.  But for SVE, the register layout follows
LD1 rather than LDR, and the GCC and architectural lane numbering match up.
(1) would therefore be better than (2) for SVE (and so no permute would be
needed for either endianness on SVE).

> +The operation is only supported for vector modes @var{n} and rotations
> +@var{m} of 90 or 270.
> +
> +This pattern is not allowed to @code{FAIL}.
> +
>  @cindex @code{ffs@var{m}2} instruction pattern
>  @item @samp{ffs@var{m}2}
>  Store into operand 0 one plus the index of the least significant 1-bit
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index 13e60828fcf5db6c5f15aae2bacd4cf04029e430..956a65a338c157b51de7e78a3fb005b5af78ef31 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -275,6 +275,8 @@ DEF_INTERNAL_FLT_FN (SCALB, ECF_CONST, scalb, binary)
>  DEF_INTERNAL_FLT_FLOATN_FN (FMIN, ECF_CONST, fmin, binary)
>  DEF_INTERNAL_FLT_FLOATN_FN (FMAX, ECF_CONST, fmax, binary)
>  DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONST, xorsign, binary)
> +DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT90, ECF_CONST, cadd90, binary)
> +DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary)
>  
>  /* FP scales.  */
>  DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 78409aa14537d259bf90277751aac00d452a0d3f..2bb0bf857977035bf562a77f5f6848e80edf936d 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -290,6 +290,8 @@ OPTAB_D (atan_optab, "atan$a2")
>  OPTAB_D (atanh_optab, "atanh$a2")
>  OPTAB_D (copysign_optab, "copysign$F$a3")
>  OPTAB_D (xorsign_optab, "xorsign$F$a3")
> +OPTAB_D (cadd90_optab, "cadd90$a3")
> +OPTAB_D (cadd270_optab, "cadd270$a3")
>  OPTAB_D (cos_optab, "cos$a2")
>  OPTAB_D (cosh_optab, "cosh$a2")
>  OPTAB_D (exp10_optab, "exp10$a2")
> diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
> index 6453a5b1b6464dba833adc2c2a194db5e712bb79..b2b0ac62e9a69145470f41d2bac736dd970be735 100644
> --- a/gcc/tree-vect-slp-patterns.c
> +++ b/gcc/tree-vect-slp-patterns.c
> @@ -663,12 +663,94 @@ graceful_exit:
>      }
>  };
>  
> +class ComplexAddPattern : public ComplexPattern

Another nitpick, sorry, but type names should be lower case rather than
CamelCase.

Thanks,
Richard
Richard Biener Sept. 29, 2020, 10:44 a.m. UTC | #2
On Tue, 29 Sep 2020, Richard Sandiford wrote:

> Tamar Christina <tamar.christina@arm.com> writes:
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > index 2b46286943778e16d95b15def4299bcbf8db7eb8..71e226505b2619d10982b59a4ebbed73a70f29be 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -6132,6 +6132,17 @@ floating-point mode.
> >  
> >  This pattern is not allowed to @code{FAIL}.
> >  
> > +@cindex @code{cadd@var{m}@var{n}3} instruction pattern
> > +@item @samp{cadd@var{m}@var{n}3}
> > +Perform a vector addition of complex numbers in operand 1 with operand 2
> > +rotated by @var{m} degrees around the Argand plane, storing the result in
> > +operand 0.  The instruction must perform the operation on data loaded
> > +contiguously into the vectors.
> 
> Nitpicking, sorry, but I think it would be better to describe the
> layout directly rather than in terms of loads, since the preceding
> operation might not be a load.

So if we're at that, and since GCC vectors do not have complex
components, can we formulate this in terms that avoid 'complex'?
Isn't this an add of one vector to a vector with adjacent
lanes swapped and possibly negated?  Mentioning that this would
match a complex add in case the lanes happen to match up with
complex real/imag parts is OK, but the pattern should work
equally well if there are no complex numbers involved?
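
For illustration, the 90 degree rotation written out with GCC's generic
vector extensions is exactly that (a sketch only, not part of the patch):

  typedef float v4sf __attribute__ ((vector_size (16)));
  typedef int v4si __attribute__ ((vector_size (16)));

  /* a plus b with adjacent lanes swapped and the even (real) lanes of the
     swapped vector negated -- no complex types involved.  */
  v4sf
  cadd90_generic (v4sf a, v4sf b)
  {
    const v4si swap = { 1, 0, 3, 2 };
    v4sf t = __builtin_shuffle (b, swap);           /* { b1, b0, b3, b2 }  */
    const v4sf sign = { -1.0f, 1.0f, -1.0f, 1.0f };
    return a + t * sign;                            /* { a0 - b1, a1 + b0, ... }  */
  }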

> I guess the main question is: what representation do we expect for
> big-endian?  A normal Advanced SIMD LDR would give this (for floats):
> 
>              MEMORY
>    +-----+-----+-----+-----+
>    | r0  | i0  | r1  | i1  |
>    +-----+-----+-----+-----+
>    |  0  |  1  |  2  |  3  |   array numbering
>    +-----+-----+-----+-----+
>       V     V     V     V      Advanced SIMD LDR
>    +-----+-----+-----+-----+
>    | r0  | i0  | r1  | i1  |
>    +-----+-----+-----+-----+
>    |  0  |  1  |  2  |  3  |   GCC lane numbering
>    +-----+-----+-----+-----+
>    |  3  |  2  |  1  |  0  |   Arm lane numbering
>    +-----+-----+-----+-----+
>   MSB       REGISTER      LSB
> 
> but the FC* instructions put the imaginary parts in the more
> significant lane, so the pairs of elements above would need
> to be reversed:
> 
>              MEMORY
>    +-----+-----+-----+-----+
>    | r0  | i0  | r1  | i1  |
>    +-----+-----+-----+-----+
>    |  0  |  1  |  2  |  3  |   array numbering
>    +-----+-----+-----+-----+
>        \   /       \   /
>         \ /         \ /
>          X           X         Load and permute
>         / \         / \
>        /   \       /   \
>    +-----+-----+-----+-----+
>    | i0  | r0  | i1  | r1  |
>    +-----+-----+-----+-----+
>    |  0  |  1  |  2  |  3  |   GCC lane numbering
>    +-----+-----+-----+-----+
>    |  3  |  2  |  1  |  0  |   Arm lane numbering
>    +-----+-----+-----+-----+
>   MSB       REGISTER      LSB
> 
> (Or the whole vector could be reversed.)
> 
> We might decide that it just isn't worth doing this for Advanced SIMD.
> But should the semantics of the optab be that:
> 
> (1) GCC lane number 0 holds a real part, or
> (2) the least significant lane holds a real part?
> 
> With (1), it would be up to the target to hide the permute above.
> With (2), the vectoriser would need to introduce the permute itself.
> 
> I'm not sure there's a perfect answer even for Arm targets.  (2) matches
> the Advanced SIMD semantics.  But for SVE, the register layout follows
> LD1 rather than LDR, and the GCC and architectural lane numbering match up.
> (1) would therefore be better than (2) for SVE (and so no permute would be
> needed for either endianness on SVE).
> 
> > +The operation is only supported for vector modes @var{n} and with
> > +rotations @var{m} of 90 or 270.
> > +
> > +This pattern is not allowed to @code{FAIL}.
> > +
> >  @cindex @code{ffs@var{m}2} instruction pattern
> >  @item @samp{ffs@var{m}2}
> >  Store into operand 0 one plus the index of the least significant 1-bit
> > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > index 13e60828fcf5db6c5f15aae2bacd4cf04029e430..956a65a338c157b51de7e78a3fb005b5af78ef31 100644
> > --- a/gcc/internal-fn.def
> > +++ b/gcc/internal-fn.def
> > @@ -275,6 +275,8 @@ DEF_INTERNAL_FLT_FN (SCALB, ECF_CONST, scalb, binary)
> >  DEF_INTERNAL_FLT_FLOATN_FN (FMIN, ECF_CONST, fmin, binary)
> >  DEF_INTERNAL_FLT_FLOATN_FN (FMAX, ECF_CONST, fmax, binary)
> >  DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONST, xorsign, binary)
> > +DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT90, ECF_CONST, cadd90, binary)
> > +DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary)
> >  
> >  /* FP scales.  */
> >  DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
> > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > index 78409aa14537d259bf90277751aac00d452a0d3f..2bb0bf857977035bf562a77f5f6848e80edf936d 100644
> > --- a/gcc/optabs.def
> > +++ b/gcc/optabs.def
> > @@ -290,6 +290,8 @@ OPTAB_D (atan_optab, "atan$a2")
> >  OPTAB_D (atanh_optab, "atanh$a2")
> >  OPTAB_D (copysign_optab, "copysign$F$a3")
> >  OPTAB_D (xorsign_optab, "xorsign$F$a3")
> > +OPTAB_D (cadd90_optab, "cadd90$a3")
> > +OPTAB_D (cadd270_optab, "cadd270$a3")
> >  OPTAB_D (cos_optab, "cos$a2")
> >  OPTAB_D (cosh_optab, "cosh$a2")
> >  OPTAB_D (exp10_optab, "exp10$a2")
> > diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
> > index 6453a5b1b6464dba833adc2c2a194db5e712bb79..b2b0ac62e9a69145470f41d2bac736dd970be735 100644
> > --- a/gcc/tree-vect-slp-patterns.c
> > +++ b/gcc/tree-vect-slp-patterns.c
> > @@ -663,12 +663,94 @@ graceful_exit:
> >      }
> >  };
> >  
> > +class ComplexAddPattern : public ComplexPattern
> 
> Another nitpick, sorry, but type names should be lower case rather than
> CamelCase.
> 
> Thanks,
> Richard
>
Tamar Christina Nov. 3, 2020, 3:06 p.m. UTC | #3
Hi All,

Here is a respin with the requested changes.

I just realized I haven't updated the documentation yet but will do
so soon since I'm sure there will be feedback :)

Thanks,
Tamar

gcc/ChangeLog:

	* doc/md.texi: Document optabs.
	* internal-fn.def (COMPLEX_ADD_ROT90, COMPLEX_ADD_ROT270): New.
	* optabs.def (cadd90_optab, cadd270_optab): New.
	* tree-vect-slp-patterns.c (linear_loads_p, vect_slp_make_linear,
	class complex_add_pattern, complex_add_pattern::matches): New.
	(complex_operations_pattern::matches): Add complex_add_pattern.


Patch

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 2b46286943778e16d95b15def4299bcbf8db7eb8..71e226505b2619d10982b59a4ebbed73a70f29be 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -6132,6 +6132,17 @@  floating-point mode.
 
 This pattern is not allowed to @code{FAIL}.
 
+@cindex @code{cadd@var{m}@var{n}3} instruction pattern
+@item @samp{cadd@var{m}@var{n}3}
+Perform a vector addition of complex numbers in operand 1 with operand 2
+rotated by @var{m} degrees around the Argand plane, storing the result in
+operand 0.  The instruction must perform the operation on data loaded
+contiguously into the vectors.
+The operation is only supported for vector modes @var{n} and rotations
+@var{m} of 90 or 270.
+
+This pattern is not allowed to @code{FAIL}.
+
 @cindex @code{ffs@var{m}2} instruction pattern
 @item @samp{ffs@var{m}2}
 Store into operand 0 one plus the index of the least significant 1-bit
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 13e60828fcf5db6c5f15aae2bacd4cf04029e430..956a65a338c157b51de7e78a3fb005b5af78ef31 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -275,6 +275,8 @@  DEF_INTERNAL_FLT_FN (SCALB, ECF_CONST, scalb, binary)
 DEF_INTERNAL_FLT_FLOATN_FN (FMIN, ECF_CONST, fmin, binary)
 DEF_INTERNAL_FLT_FLOATN_FN (FMAX, ECF_CONST, fmax, binary)
 DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONST, xorsign, binary)
+DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT90, ECF_CONST, cadd90, binary)
+DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary)
 
 /* FP scales.  */
 DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 78409aa14537d259bf90277751aac00d452a0d3f..2bb0bf857977035bf562a77f5f6848e80edf936d 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -290,6 +290,8 @@  OPTAB_D (atan_optab, "atan$a2")
 OPTAB_D (atanh_optab, "atanh$a2")
 OPTAB_D (copysign_optab, "copysign$F$a3")
 OPTAB_D (xorsign_optab, "xorsign$F$a3")
+OPTAB_D (cadd90_optab, "cadd90$a3")
+OPTAB_D (cadd270_optab, "cadd270$a3")
 OPTAB_D (cos_optab, "cos$a2")
 OPTAB_D (cosh_optab, "cosh$a2")
 OPTAB_D (exp10_optab, "exp10$a2")
diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
index 6453a5b1b6464dba833adc2c2a194db5e712bb79..b2b0ac62e9a69145470f41d2bac736dd970be735 100644
--- a/gcc/tree-vect-slp-patterns.c
+++ b/gcc/tree-vect-slp-patterns.c
@@ -663,12 +663,94 @@  graceful_exit:
     }
 };
 
+class ComplexAddPattern : public ComplexPattern
+{
+  protected:
+    ComplexAddPattern (slp_tree node, vec_info *vinfo)
+      : ComplexPattern (node, vinfo)
+    {
+      this->m_arity = 2;
+      this->m_num_args = 2;
+      this->m_vects.create (0);
+      this->m_defs.create (0);
+    }
+
+  public:
+    ~ComplexAddPattern ()
+    {
+      this->m_vects.release ();
+      this->m_defs.release ();
+    }
+
+    static VectPattern* create (slp_tree node, vec_info *vinfo)
+    {
+       return new ComplexAddPattern (node, vinfo);
+    }
+
+    const char* get_name ()
+    {
+      return "Complex Addition";
+    }
+
+    /* Pattern matcher for trying to match a complex addition pattern in an SLP
+       tree using the N statements found in the node starting at position IDX.
+       If the operation matches then IFN is set to the operation it matched and
+       the arguments to the two replacement statements are put in VECTS.
+
+       If no match is found then IFN is set to IFN_LAST.
+
+       This function matches the patterns shaped as:
+
+         c[i] = a[i] - b[i+1];
+         c[i+1] = a[i+1] + b[i];
+
+       If a match occurred then TRUE is returned, else FALSE.  */
+
+    bool matches (stmt_vec_info *stmts, int idx)
+    {
+      this->m_last_ifn = IFN_LAST;
+      int base = idx - (this->m_arity - 1);
+      this->m_last_idx = idx;
+      this->m_stmt_info = stmts[0];
+
+      complex_operation_t op
+	= vect_detect_pair_op (base, this->m_node, &this->m_vects);
+
+      /* Find the two components.  Rotation in the complex plane will modify
+	 the operations:
+
+	 * Rotation  0: + +
+	 * Rotation 90: - +
+	 * Rotation 180: - -
+	 * Rotation 270: + -
+
+	Rotation 0 and 180 can be handled by normal SIMD code, so we don't need
+	to care about them here.  */
+      if (op == MINUS_PLUS)
+	this->m_last_ifn = IFN_COMPLEX_ADD_ROT90;
+      else if (op == PLUS_MINUS)
+	this->m_last_ifn = IFN_COMPLEX_ADD_ROT270;
+
+      if (this->m_last_ifn == IFN_LAST)
+	return false;
+
+      /* Correct the arguments after matching.  */
+      std::swap (this->m_vects[1], this->m_vects[3]);
+
+      /* If the two operands are the same, we don't have a permute. In such a case
+	 there is no advantage in doing the replacement.  */
+      return store_results ();
+    }
+};
+
 #define SLP_PATTERN(x) &x::create
 VectPatternDecl slp_patterns[]
 {
   /* For least amount of back-tracking and more efficient matching
      order patterns from the largest to the smallest.  Especially if they
      overlap in what they can detect.  */
+
+  SLP_PATTERN (ComplexAddPattern),
 };
 #undef SLP_PATTERN
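
For reference, the lane-level forms of the two rotations that matches ()
classifies, written as scalar loops over interleaved real/imaginary data
(a sketch; the real part is assumed to sit in the even-numbered element):

  /* 90 degree rotation: the MINUS_PLUS shape, IFN_COMPLEX_ADD_ROT90.  */
  void
  cadd_rot90_lanes (float *restrict c, const float *restrict a,
                    const float *restrict b, int n)
  {
    for (int i = 0; i < n; i += 2)
      {
        c[i]     = a[i]     - b[i + 1];   /* real: a_re - b_im  */
        c[i + 1] = a[i + 1] + b[i];       /* imag: a_im + b_re  */
      }
  }

  /* 270 degree rotation: the PLUS_MINUS shape, IFN_COMPLEX_ADD_ROT270.  */
  void
  cadd_rot270_lanes (float *restrict c, const float *restrict a,
                     const float *restrict b, int n)
  {
    for (int i = 0; i < n; i += 2)
      {
        c[i]     = a[i]     + b[i + 1];   /* real: a_re + b_im  */
        c[i + 1] = a[i + 1] - b[i];       /* imag: a_im - b_re  */
      }
  }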