[v2,0/16,RFC,AArch64/Arm/SVE/SVE2/MVE] middle-end Add support for SLP vectorization of complex number instructions.

Message ID 20200925142704.GA9928@arm.com

Tamar Christina Sept. 25, 2020, 2:27 p.m. UTC
Hi All,

This patch series adds support for SLP vectorization of complex instructions [1].

These instructions exist only in their vector forms and require you to recognize
two statements in parallel.  Complex operations usually require a permute because
the real and imaginary parts are stored interleaved, but these vector instructions
expect that layout, so the compiler no longer needs to generate a permute.
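
As an illustration only (this is not code from the patch, and the function name
is made up), a complex multiply over interleaved storage looks roughly like this
once written out on the real and imaginary parts.  Each output lane mixes even
and odd input elements, which is what normally forces a permute when vectorizing:

```c
/* Multiply n complex numbers stored interleaved as {re, im, re, im, ...}.
   Hypothetical helper, written the way the scalar code looks after the
   complex arithmetic has been expanded into real and imaginary parts.  */
void cmul_interleaved(const float *a, const float *b, float *c, int n)
{
    for (int i = 0; i < n; i++) {
        float ar = a[2 * i], ai = a[2 * i + 1];
        float br = b[2 * i], bi = b[2 * i + 1];

        /* Real lane: reads both the even (re) and odd (im) elements.  */
        c[2 * i] = ar * br - ai * bi;
        /* Imaginary lane: again reads both even and odd elements.  */
        c[2 * i + 1] = ar * bi + ai * br;
    }
}
```

For example, with a = {1, 2, 3, 4} and b = {5, 6, 7, 8} (two complex numbers
each) the result is c = {-7, 16, -11, 52}.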

For this reason the pass also re-orders the loads in the SLP tree such that they
become contiguous and no longer need the permutes.  The Basic Blocks are left
untouched such that the scalar loop will still correctly issue permutes.

The instructions also support rotations in the Argand plane; as such, the operands
have to be re-ordered to coincide with their load group.
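
To make the rotation concrete, here is a small scalar sketch.  It is not taken
from the patch; the function name and the quarter-turn encoding are assumptions
chosen for illustration.  It adds b to a after rotating b by a multiple of 90
degrees.  For the odd quarter turns the real result lane reads the imaginary
element of b and vice versa, which is one way to see why the operand order has
to line up with the load group:

```c
/* c = a + rotate(b, quarter_turns * 90 degrees) for n complex numbers stored
   interleaved as {re, im, re, im, ...}.  Hypothetical helper; the rotation
   encoding (0..3 quarter turns) is an assumption for this example.  */
void cadd_rot(const float *a, const float *b, float *c, int n,
              int quarter_turns)
{
    for (int i = 0; i < n; i++) {
        float br = b[2 * i], bi = b[2 * i + 1];
        float rr, ri;

        switch (quarter_turns & 3) {
        case 0:  rr =  br; ri =  bi; break;  /*   0 degrees: b        */
        case 1:  rr = -bi; ri =  br; break;  /*  90 degrees: b * i    */
        case 2:  rr = -br; ri = -bi; break;  /* 180 degrees: b * -1   */
        default: rr =  bi; ri = -br; break;  /* 270 degrees: b * -i   */
        }

        c[2 * i] = a[2 * i] + rr;
        c[2 * i + 1] = a[2 * i + 1] + ri;
    }
}
```

With a = {1, 2} and b = {3, 4}, a rotation of 0 gives {4, 6}, 90 gives {-3, 5},
180 gives {-2, -2} and 270 gives {5, -1}.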

For now, this patch only adds support for:

  * Complex Addition with rotation of 0 and 180.
  * Complex Multiplication and Multiplication where one operand is conjugated.
  * Complex FMA and FMA where one operand is conjugated.
  * Complex FMS and FMS where one operand is conjugated.
  
Complex dot-product is not currently supported in this patch set as build_slp fails
for it.  This will be provided as a future patch.
  
These are supported for both integer and floating point.  As such, the pass does
not look for real or imaginary pairs but instead relies on the early lowering of
complex numbers by GCC and on canonicalization of the operations, so that it
recognizes any instruction sequence matching the operations requested.
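
As a rough sketch of what "any instruction sequence matching the operations"
can mean for integers, where no complex type is involved, the two loops below
spell out a conjugated multiply and an FMA directly on interleaved int arrays.
The function names are hypothetical, and whether these exact source forms are
matched by the series is an assumption, not something verified against the
patches:

```c
/* c = a * conj(b) on n complex integers stored as {re, im, re, im, ...}.
   Hypothetical example; no complex type appears anywhere.  */
void cmul_conj_int(const int *a, const int *b, int *c, int n)
{
    for (int i = 0; i < n; i++) {
        int ar = a[2 * i], ai = a[2 * i + 1];
        int br = b[2 * i], bi = b[2 * i + 1];

        c[2 * i] = ar * br + ai * bi;      /* real part */
        c[2 * i + 1] = ai * br - ar * bi;  /* imaginary part */
    }
}

/* acc += a * b (complex FMA) on the same interleaved layout.  */
void cfma_int(const int *a, const int *b, int *acc, int n)
{
    for (int i = 0; i < n; i++) {
        int ar = a[2 * i], ai = a[2 * i + 1];
        int br = b[2 * i], bi = b[2 * i + 1];

        acc[2 * i] += ar * br - ai * bi;      /* real part */
        acc[2 * i + 1] += ar * bi + ai * br;  /* imaginary part */
    }
}
```

With a = {1, 2} and b = {3, 4}, cmul_conj_int produces {11, 2}, and cfma_int
applied to an accumulator of {10, 20} leaves it at {5, 30}.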

To be safe, when it is not sure it can support the operation, or if it finds
something it does not understand, it backs off.

This patch is an RFC and I am looking for feedback on the approach.  In particular,
this series has one problem, which occurs when it is decided that SLP is not viable
and that the normal loop vectorizer is to be used.

In this case I dissolve the changes but the compiler crashes because the use of
the pattern matcher essentially undoes two_operands.  This means that the number of
copies needed differs between using the patterns and not using them.  When using
the patterns the two operands become the same and so are treated as manually
unrolled loops.  The problem is that nunits has already been decided along with
the unroll factor, so when the dissolved statements are then analyzed they fail.
This is also the reason why I cannot analyze both the pattern and original
statements initially.

The relevant places in the source code have comments describing the problem.

[1] https://developer.arm.com/documentation/ddi0487/fc/

Thanks,
Tamar

--

Richard Biener Sept. 28, 2020, 11:55 a.m. UTC | #1
On Fri, 25 Sep 2020, Tamar Christina wrote:

> Hi All,
> 
> This patch series adds support for SLP vectorization of complex instructions [1].
> 
> These instructions exist only in their vector forms and require you to recognize
> two statements in parallel.  Complex operations usually require a permute because
> the real and imaginary parts are stored interleaved, but these vector instructions
> expect that layout, so the compiler no longer needs to generate a permute.
> 
> For this reason the pass also re-orders the loads in the SLP tree such that they
> become contiguous and no longer need the permutes.  The Basic Blocks are left
> untouched such that the scalar loop will still correctly issue permutes.
> 
> The instructions also support rotations in the Argand plane; as such, the operands
> have to be re-ordered to coincide with their load group.
> 
> For now, this patch only adds support for:
> 
>   * Complex Addition with rotation of 0 and 180.
>   * Complex Multiplication and Multiplication where one operand is conjugated.
>   * Complex FMA and FMA where one operand is conjugated.
>   * Complex FMS and FMS where one operand is conjugated.
>   
> Complex dot-product is not currently supported in this patch set as build_slp fails
> for it.  This will be provided as a future patch.
>   
> These are supported for both integer and floating point.  As such, the pass does
> not look for real or imaginary pairs but instead relies on the early lowering of
> complex numbers by GCC and on canonicalization of the operations, so that it
> recognizes any instruction sequence matching the operations requested.
> 
> To be safe, when it is not sure it can support the operation, or if it finds
> something it does not understand, it backs off.
> 
> This patch is an RFC and I am looking for feedback on the approach.  In particular,
> this series has one problem, which occurs when it is decided that SLP is not viable
> and that the normal loop vectorizer is to be used.
> 
> In this case I dissolve the changes but the compiler crashes because the use of
> the pattern matcher essentially undoes two_operands.  This means that the number of
> copies needed differs between using the patterns and not using them.  When using
> the patterns the two operands become the same and so are treated as manually
> unrolled loops.  The problem is that nunits has already been decided along with
> the unroll factor, so when the dissolved statements are then analyzed they fail.
> This is also the reason why I cannot analyze both the pattern and original
> statements initially.

That's the same as with "regular" patterns btw.: if vectorizing the
pattern fails, vectorization fails; we never re-consider, and we also
have no way to choose from multiple patterns.

The way "regular" patterns make this a non-issue is that they try
to only convert things that are likely unhandled/suboptimal and
most likely vectorizable.

That said - the solution to the ICE is to _not_ dissolve the changes and
instead make vectorization fail.

Richard.

> The relevant places in the source code have comments describing the problem.
> 
> [1] https://developer.arm.com/documentation/ddi0487/fc/
> 
> Thanks,
> Tamar