
[9/9,Arm] Add ACLE intrinsics for complex multiplication and addition

Message ID 20181211154646.GA29649@arm.com
State New

Commit Message

Tamar Christina Dec. 11, 2018, 3:46 p.m. UTC
Hi All,

This patch adds NEON intrinsics and tests for the Armv8.3-a complex
multiplication and addition instructions with a rotation in the Argand plane.

The instructions are documented in the ArmARM[1] and the intrinsics specification
will be published on the Arm website [2].

The lane versions of these instructions are special in that they always select a pair:
using index 0 means selecting lanes 0 and 1.  Because of this, the range check for the
intrinsics requires special handling.
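
As an illustration (a minimal usage sketch, not part of the patch, assuming
the arm_neon.h intrinsics added below): a float16x4_t holds two complex
values, so the only valid pair indices are 0 and 1.

  #include <arm_neon.h>

  /* Hypothetical example: each index selects a (real, imaginary) pair,
     so index 0 reads lanes 0 and 1 of b, and index 1 reads lanes 2 and 3.
     An index of 2 or more is rejected by the range check.  */
  float16x4_t
  foo (float16x4_t r, float16x4_t a, float16x4_t b)
  {
    r = vcmla_lane_f16 (r, a, b, 0);
    r = vcmla_rot90_lane_f16 (r, a, b, 1);
    return r;
  }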

On Arm, in order to implement some of the lane intrinsics we use the structure of the
register file.  The lane variant of these instructions always selects a D register, but the data
itself can be stored in Q registers.  This means that for single precision complex numbers you are
only allowed to select D[0], but using the register file layout you can get the range 0-1 for lane
indices by selecting between Dn[0] and Dn+1[0].

The same reasoning applies to half float complex numbers, except there the D register indexes can
be 0 or 1, so you have a total range of 4 elements (for a V8HF).
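
As a concrete sketch of this mapping (again a hypothetical example, assuming
the intrinsics added below): the single precision laneq forms take a Q
register operand, and neon_vcmla_lane_prepare_operands resolves index 1 to
lane 0 of the second D register of that Q register.

  #include <arm_neon.h>

  /* Hypothetical example: if b is allocated to q1 (= d2:d3), index 0
     selects the complex pair in d2[0] and index 1 the pair in d3[0].  */
  float32x2_t
  bar (float32x2_t r, float32x2_t a, float32x4_t b)
  {
    return vcmla_laneq_f32 (r, a, b, 1);
  }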


[1] https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
[2] https://developer.arm.com/docs/101028/latest

Bootstrapped and regtested on arm-none-gnueabihf with no issues.

Ok for trunk?

Thanks,
Tamar

gcc/ChangeLog:

2018-12-11  Tamar Christina  <tamar.christina@arm.com>

	* config/arm/arm-builtins.c
	(enum arm_type_qualifiers): Add qualifier_lane_pair_index.
	(MAC_LANE_PAIR_QUALIFIERS): New.
	(arm_expand_builtin_args): Use it.
	(arm_expand_builtin_1): Likewise.
	* config/arm/arm-protos.h (neon_vcmla_lane_prepare_operands): New.
	* config/arm/arm.c (neon_vcmla_lane_prepare_operands): New.
	* config/arm/arm-c.c (arm_cpu_builtins): Add __ARM_FEATURE_COMPLEX.
	* config/arm/arm_neon.h:
	(vcadd_rot90_f16): New.
	(vcaddq_rot90_f16): New.
	(vcadd_rot270_f16): New.
	(vcaddq_rot270_f16): New.
	(vcmla_f16): New.
	(vcmlaq_f16): New.
	(vcmla_lane_f16): New.
	(vcmla_laneq_f16): New.
	(vcmlaq_lane_f16): New.
	(vcmlaq_laneq_f16): New.
	(vcmla_rot90_f16): New.
	(vcmlaq_rot90_f16): New.
	(vcmla_rot90_lane_f16): New.
	(vcmla_rot90_laneq_f16): New.
	(vcmlaq_rot90_lane_f16): New.
	(vcmlaq_rot90_laneq_f16): New.
	(vcmla_rot180_f16): New.
	(vcmlaq_rot180_f16): New.
	(vcmla_rot180_lane_f16): New.
	(vcmla_rot180_laneq_f16): New.
	(vcmlaq_rot180_lane_f16): New.
	(vcmlaq_rot180_laneq_f16): New.
	(vcmla_rot270_f16): New.
	(vcmlaq_rot270_f16): New.
	(vcmla_rot270_lane_f16): New.
	(vcmla_rot270_laneq_f16): New.
	(vcmlaq_rot270_lane_f16): New.
	(vcmlaq_rot270_laneq_f16): New.
	(vcadd_rot90_f32): New.
	(vcaddq_rot90_f32): New.
	(vcadd_rot270_f32): New.
	(vcaddq_rot270_f32): New.
	(vcmla_f32): New.
	(vcmlaq_f32): New.
	(vcmla_lane_f32): New.
	(vcmla_laneq_f32): New.
	(vcmlaq_lane_f32): New.
	(vcmlaq_laneq_f32): New.
	(vcmla_rot90_f32): New.
	(vcmlaq_rot90_f32): New.
	(vcmla_rot90_lane_f32): New.
	(vcmla_rot90_laneq_f32): New.
	(vcmlaq_rot90_lane_f32): New.
	(vcmlaq_rot90_laneq_f32): New.
	(vcmla_rot180_f32): New.
	(vcmlaq_rot180_f32): New.
	(vcmla_rot180_lane_f32): New.
	(vcmla_rot180_laneq_f32): New.
	(vcmlaq_rot180_lane_f32): New.
	(vcmlaq_rot180_laneq_f32): New.
	(vcmla_rot270_f32): New.
	(vcmlaq_rot270_f32): New.
	(vcmla_rot270_lane_f32): New.
	(vcmla_rot270_laneq_f32): New.
	(vcmlaq_rot270_lane_f32): New.
	(vcmlaq_rot270_laneq_f32): New.
	* config/arm/arm_neon_builtins.def (vcadd90, vcadd270, vcmla0, vcmla90,
	vcmla180, vcmla270, vcmla_lane0, vcmla_lane90, vcmla_lane180, vcmla_lane270,
	vcmla_laneq0, vcmla_laneq90, vcmla_laneq180, vcmla_laneq270,
	vcmlaq_lane0, vcmlaq_lane90, vcmlaq_lane180, vcmlaq_lane270): New.
	* config/arm/neon.md (neon_vcmla_lane<rot><mode>,
	neon_vcmla_laneq<rot><mode>, neon_vcmlaq_lane<rot><mode>): New.

gcc/testsuite/ChangeLog:

2018-12-11  Tamar Christina  <tamar.christina@arm.com>

	* gcc.target/aarch64/advsimd-intrinsics/vector-complex.c: Add AArch32 regexpr.
	* gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c: Likewise.

--

Comments

Tamar Christina Dec. 21, 2018, 10:57 a.m. UTC | #1
Ping

> -----Original Message-----
> From: gcc-patches-owner@gcc.gnu.org <gcc-patches-owner@gcc.gnu.org>
> On Behalf Of Tamar Christina
> Sent: Tuesday, December 11, 2018 15:47
> To: gcc-patches@gcc.gnu.org
> Cc: nd <nd@arm.com>; Ramana Radhakrishnan
> <Ramana.Radhakrishnan@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; nickc@redhat.com; Kyrylo Tkachov
> <Kyrylo.Tkachov@arm.com>
> Subject: [PATCH 9/9][GCC][Arm] Add ACLE intrinsics for complex
> multiplication and addition
> 
> [quoted patch description and ChangeLog snipped]
Kyrill Tkachov Dec. 21, 2018, 11:40 a.m. UTC | #2
Hi Tamar,

On 11/12/18 15:46, Tamar Christina wrote:
> Hi All,
>
> [patch description snipped]
>
> Ok for trunk?
>

Ok.
Thanks,
Kyrill

Tamar Christina Dec. 21, 2018, 6:03 p.m. UTC | #3
Hi All,

I have made a trivial change in the patch and will assume the OK still applies.

I have also changed the tests from compile tests to assemble tests.
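
For reference, a sketch of the resulting test header (all directives except
the dg-do line are taken from the diff below; the committed files may
differ):

  /* { dg-do assemble } */
  /* { dg-require-effective-target arm_v8_3a_complex_neon_ok } */
  /* { dg-add-options arm_v8_3a_complex_neon } */
  /* { dg-additional-options "-O2 -save-temps" } */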

Kind Regards,
Tamar

The 12/21/2018 11:40, Kyrill Tkachov wrote:
> Hi Tamar,
> 
> [quoted patch description snipped]
> 
> Ok.
> Thanks,
> Kyrill

--
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 563ca51dcd0d63046d2bf577ca86d5f70a466bcf..1c7eac4b9eae55b76687b9239a2d71f31cc7b8d9 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -82,7 +82,10 @@ enum arm_type_qualifiers
   /* A void pointer.  */
   qualifier_void_pointer = 0x800,
   /* A const void pointer.  */
-  qualifier_const_void_pointer = 0x802
+  qualifier_const_void_pointer = 0x802,
+  /* Lane indices selected in pairs - must be within range of previous
+     argument = a vector.  */
+  qualifier_lane_pair_index = 0x1000
 };
 
 /*  The qualifier_internal allows generation of a unary builtin from
@@ -144,6 +147,13 @@ arm_mac_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
       qualifier_none, qualifier_lane_index };
 #define MAC_LANE_QUALIFIERS (arm_mac_lane_qualifiers)
 
+/* T (T, T, T, lane pair index).  */
+static enum arm_type_qualifiers
+arm_mac_lane_pair_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_none,
+      qualifier_none, qualifier_lane_pair_index };
+#define MAC_LANE_PAIR_QUALIFIERS (arm_mac_lane_pair_qualifiers)
+
 /* unsigned T (unsigned T, unsigned T, unsigend T, lane index).  */
 static enum arm_type_qualifiers
 arm_umac_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
@@ -2129,6 +2139,7 @@ typedef enum {
   ARG_BUILTIN_CONSTANT,
   ARG_BUILTIN_LANE_INDEX,
   ARG_BUILTIN_STRUCT_LOAD_STORE_LANE_INDEX,
+  ARG_BUILTIN_LANE_PAIR_INDEX,
   ARG_BUILTIN_NEON_MEMORY,
   ARG_BUILTIN_MEMORY,
   ARG_BUILTIN_STOP
@@ -2266,6 +2277,19 @@ arm_expand_builtin_args (rtx target, machine_mode map_mode, int fcode,
 		  machine_mode vmode = mode[argc - 1];
 		  neon_lane_bounds (op[argc], 0, GET_MODE_NUNITS (vmode), exp);
 		}
+	      /* If the lane index isn't a constant then error out.  */
+	      goto constant_arg;
+
+	    case ARG_BUILTIN_LANE_PAIR_INDEX:
+	      /* Previous argument must be a vector, which this indexes. The
+		 indexing will always select i and i+1 out of the vector, which
+		 puts a limit on i.  */
+	      gcc_assert (argc > 0);
+	      if (CONST_INT_P (op[argc]))
+		{
+		  machine_mode vmode = mode[argc - 1];
+		  neon_lane_bounds (op[argc], 0, GET_MODE_NUNITS (vmode) / 2, exp);
+		}
 	      /* If the lane index isn't a constant then the next
 		 case will error.  */
 	      /* Fall through.  */
@@ -2427,6 +2451,8 @@ arm_expand_builtin_1 (int fcode, tree exp, rtx target,
 
       if (d->qualifiers[qualifiers_k] & qualifier_lane_index)
 	args[k] = ARG_BUILTIN_LANE_INDEX;
+      else if (d->qualifiers[qualifiers_k] & qualifier_lane_pair_index)
+	args[k] = ARG_BUILTIN_LANE_PAIR_INDEX;
       else if (d->qualifiers[qualifiers_k] & qualifier_struct_load_store_lane_index)
 	args[k] = ARG_BUILTIN_STRUCT_LOAD_STORE_LANE_INDEX;
       else if (d->qualifiers[qualifiers_k] & qualifier_immediate)
diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index 4471f7914cf282c516a142174f9913e491558b44..89afc65572f3cdc98fff15afb78ef3af602c5b72 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -76,6 +76,7 @@ arm_cpu_builtins (struct cpp_reader* pfile)
 
   def_or_undef_macro (pfile, "__ARM_FEATURE_CRC32", TARGET_CRC32);
   def_or_undef_macro (pfile, "__ARM_FEATURE_DOTPROD", TARGET_DOTPROD);
+  def_or_undef_macro (pfile, "__ARM_FEATURE_COMPLEX", TARGET_COMPLEX);
   def_or_undef_macro (pfile, "__ARM_32BIT_STATE", TARGET_32BIT);
 
   cpp_undef (pfile, "__ARM_FEATURE_CMSE");
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index cea98669111d318954e9f6102db74172e675304b..f6fec824e68020794a58b94157e064e70b60c456 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -109,6 +109,8 @@ extern int arm_coproc_mem_operand (rtx, bool);
 extern int neon_vector_mem_operand (rtx, int, bool);
 extern int neon_struct_mem_operand (rtx);
 
+extern rtx *neon_vcmla_lane_prepare_operands (machine_mode, rtx *);
+
 extern int tls_mentioned_p (rtx);
 extern int symbol_mentioned_p (rtx);
 extern int label_mentioned_p (rtx);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index cbcbeeb6e076bb8f632e5c31dd751937af4514f5..20059df4fecf591534f0981727de6e7a4823b83a 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -12680,6 +12680,44 @@ neon_struct_mem_operand (rtx op)
   return FALSE;
 }
 
+/* Prepares the operands for the VCMLA by lane instruction such that the right
+   register number is selected.  This instruction is special in that it always
+   requires a D register, however there is a choice to be made between Dn[0],
+   Dn[1], D(n+1)[0], and D(n+1)[1] depending on the mode of the registers and
+   the PATTERNMODE of the insn.
+
+   The VCMLA by lane function always selects two values. For instance given D0
+   and a V2SF, the only valid index is 0 as the values in S0 and S1 will be
+   used by the instruction.  However given V4SF then index 0 and 1 are valid as
+   D0[0] or D1[0] are both valid.
+
+   This function centralizes that information based on OPERANDS; OPERANDS[3]
+   will be changed from a REG into a CONST_INT RTX and OPERANDS[4] will be
+   updated to contain the right index.  */
+
+rtx *
+neon_vcmla_lane_prepare_operands (machine_mode patternmode, rtx *operands)
+{
+  int lane = NEON_ENDIAN_LANE_N (patternmode, INTVAL (operands[4]));
+  machine_mode constmode = SImode;
+  machine_mode mode = GET_MODE (operands[3]);
+  int regno = REGNO (operands[3]);
+  regno = ((regno - FIRST_VFP_REGNUM) >> 1);
+  if (lane > 0 && lane >= GET_MODE_NUNITS (mode) / 4)
+    {
+      operands[3] = gen_int_mode (regno + 1, constmode);
+      operands[4]
+	= gen_int_mode (lane - GET_MODE_NUNITS (mode) / 4, constmode);
+    }
+  else
+    {
+      operands[3] = gen_int_mode (regno, constmode);
+      operands[4] = gen_int_mode (lane, constmode);
+    }
+  return operands;
+}
+
+
 /* Return true if X is a register that will be eliminated later on.  */
 int
 arm_eliminable_register (rtx x)
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 6213a4aa0dabec756441523eee870e11485bb1c7..bb3acd20ff3ba6782b1be4363047f62fbb1779e8 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -18307,6 +18307,445 @@ vfmlsl_laneq_high_u32 (float32x2_t __r, float16x4_t __a, float16x8_t __b,
 #pragma GCC pop_options
 #endif
 
+/* AdvSIMD Complex numbers intrinsics.  */
+#if __ARM_ARCH >= 8
+#pragma GCC push_options
+#pragma GCC target(("arch=armv8.3-a"))
+
+
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+#pragma GCC push_options
+#pragma GCC target(("arch=armv8.3-a+fp16"))
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcadd_rot90_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vcadd90v4hf (__a, __b);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcaddq_rot90_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_neon_vcadd90v8hf (__a, __b);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcadd_rot270_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vcadd270v4hf (__a, __b);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcaddq_rot270_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_neon_vcadd270v8hf (__a, __b);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_f16 (float16x4_t __r, float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vcmla0v4hf (__r, __a, __b);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_f16 (float16x8_t __r, float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_neon_vcmla0v8hf (__r, __a, __b);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_lane_f16 (float16x4_t __r, float16x4_t __a, float16x4_t __b,
+		const int __index)
+{
+  return __builtin_neon_vcmla_lane0v4hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_laneq_f16 (float16x4_t __r, float16x4_t __a, float16x8_t __b,
+		 const int __index)
+{
+  return __builtin_neon_vcmla_laneq0v4hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_lane_f16 (float16x8_t __r, float16x8_t __a, float16x4_t __b,
+		 const int __index)
+{
+  return __builtin_neon_vcmlaq_lane0v8hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_laneq_f16 (float16x8_t __r, float16x8_t __a, float16x8_t __b,
+		  const int __index)
+{
+  return __builtin_neon_vcmla_lane0v8hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot90_f16 (float16x4_t __r, float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vcmla90v4hf (__r, __a, __b);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot90_f16 (float16x8_t __r, float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_neon_vcmla90v8hf (__r, __a, __b);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot90_lane_f16 (float16x4_t __r, float16x4_t __a, float16x4_t __b,
+		      const int __index)
+{
+  return __builtin_neon_vcmla_lane90v4hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot90_laneq_f16 (float16x4_t __r, float16x4_t __a, float16x8_t __b,
+		       const int __index)
+{
+  return __builtin_neon_vcmla_laneq90v4hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot90_lane_f16 (float16x8_t __r, float16x8_t __a, float16x4_t __b,
+		       const int __index)
+{
+  return __builtin_neon_vcmlaq_lane90v8hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot90_laneq_f16 (float16x8_t __r, float16x8_t __a, float16x8_t __b,
+			const int __index)
+{
+  return __builtin_neon_vcmla_lane90v8hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot180_f16 (float16x4_t __r, float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vcmla180v4hf (__r, __a, __b);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot180_f16 (float16x8_t __r, float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_neon_vcmla180v8hf (__r, __a, __b);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot180_lane_f16 (float16x4_t __r, float16x4_t __a, float16x4_t __b,
+		       const int __index)
+{
+  return __builtin_neon_vcmla_lane180v4hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot180_laneq_f16 (float16x4_t __r, float16x4_t __a, float16x8_t __b,
+			const int __index)
+{
+  return __builtin_neon_vcmla_laneq180v4hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot180_lane_f16 (float16x8_t __r, float16x8_t __a, float16x4_t __b,
+			const int __index)
+{
+  return __builtin_neon_vcmlaq_lane180v8hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot180_laneq_f16 (float16x8_t __r, float16x8_t __a, float16x8_t __b,
+			 const int __index)
+{
+  return __builtin_neon_vcmla_lane180v8hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot270_f16 (float16x4_t __r, float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vcmla270v4hf (__r, __a, __b);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot270_f16 (float16x8_t __r, float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_neon_vcmla270v8hf (__r, __a, __b);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot270_lane_f16 (float16x4_t __r, float16x4_t __a, float16x4_t __b,
+		       const int __index)
+{
+  return __builtin_neon_vcmla_lane270v4hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot270_laneq_f16 (float16x4_t __r, float16x4_t __a, float16x8_t __b,
+			const int __index)
+{
+  return __builtin_neon_vcmla_laneq270v4hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot270_lane_f16 (float16x8_t __r, float16x8_t __a, float16x4_t __b,
+			const int __index)
+{
+  return __builtin_neon_vcmlaq_lane270v8hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot270_laneq_f16 (float16x8_t __r, float16x8_t __a, float16x8_t __b,
+			 const int __index)
+{
+  return __builtin_neon_vcmla_lane270v8hf (__r, __a, __b, __index);
+}
+
+#pragma GCC pop_options
+#endif
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcadd_rot90_f32 (float32x2_t __a, float32x2_t __b)
+{
+  return __builtin_neon_vcadd90v2sf (__a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcaddq_rot90_f32 (float32x4_t __a, float32x4_t __b)
+{
+  return __builtin_neon_vcadd90v4sf (__a, __b);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcadd_rot270_f32 (float32x2_t __a, float32x2_t __b)
+{
+  return __builtin_neon_vcadd270v2sf (__a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcaddq_rot270_f32 (float32x4_t __a, float32x4_t __b)
+{
+  return __builtin_neon_vcadd270v4sf (__a, __b);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_f32 (float32x2_t __r, float32x2_t __a, float32x2_t __b)
+{
+  return __builtin_neon_vcmla0v2sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b)
+{
+  return __builtin_neon_vcmla0v4sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_lane_f32 (float32x2_t __r, float32x2_t __a, float32x2_t __b,
+		const int __index)
+{
+  return __builtin_neon_vcmla_lane0v2sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_laneq_f32 (float32x2_t __r, float32x2_t __a, float32x4_t __b,
+		 const int __index)
+{
+  return __builtin_neon_vcmla_laneq0v2sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_lane_f32 (float32x4_t __r, float32x4_t __a, float32x2_t __b,
+		 const int __index)
+{
+  return __builtin_neon_vcmlaq_lane0v4sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_laneq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b,
+		  const int __index)
+{
+  return __builtin_neon_vcmla_lane0v4sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot90_f32 (float32x2_t __r, float32x2_t __a, float32x2_t __b)
+{
+  return __builtin_neon_vcmla90v2sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot90_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b)
+{
+  return __builtin_neon_vcmla90v4sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot90_lane_f32 (float32x2_t __r, float32x2_t __a, float32x2_t __b,
+		      const int __index)
+{
+  return __builtin_neon_vcmla_lane90v2sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot90_laneq_f32 (float32x2_t __r, float32x2_t __a, float32x4_t __b,
+		       const int __index)
+{
+  return __builtin_neon_vcmla_laneq90v2sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot90_lane_f32 (float32x4_t __r, float32x4_t __a, float32x2_t __b,
+		       const int __index)
+{
+  return __builtin_neon_vcmlaq_lane90v4sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot90_laneq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b,
+			const int __index)
+{
+  return __builtin_neon_vcmla_lane90v4sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot180_f32 (float32x2_t __r, float32x2_t __a, float32x2_t __b)
+{
+  return __builtin_neon_vcmla180v2sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot180_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b)
+{
+  return __builtin_neon_vcmla180v4sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot180_lane_f32 (float32x2_t __r, float32x2_t __a, float32x2_t __b,
+		       const int __index)
+{
+  return __builtin_neon_vcmla_lane180v2sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot180_laneq_f32 (float32x2_t __r, float32x2_t __a, float32x4_t __b,
+			const int __index)
+{
+  return __builtin_neon_vcmla_laneq180v2sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot180_lane_f32 (float32x4_t __r, float32x4_t __a, float32x2_t __b,
+			const int __index)
+{
+  return __builtin_neon_vcmlaq_lane180v4sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot180_laneq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b,
+			 const int __index)
+{
+  return __builtin_neon_vcmla_lane180v4sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot270_f32 (float32x2_t __r, float32x2_t __a, float32x2_t __b)
+{
+  return __builtin_neon_vcmla270v2sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot270_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b)
+{
+  return __builtin_neon_vcmla270v4sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot270_lane_f32 (float32x2_t __r, float32x2_t __a, float32x2_t __b,
+		       const int __index)
+{
+  return __builtin_neon_vcmla_lane270v2sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot270_laneq_f32 (float32x2_t __r, float32x2_t __a, float32x4_t __b,
+			const int __index)
+{
+  return __builtin_neon_vcmla_laneq270v2sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot270_lane_f32 (float32x4_t __r, float32x4_t __a, float32x2_t __b,
+			const int __index)
+{
+  return __builtin_neon_vcmlaq_lane270v4sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot270_laneq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b,
+			 const int __index)
+{
+  return __builtin_neon_vcmla_lane270v4sf (__r, __a, __b, __index);
+}
+
+#pragma GCC pop_options
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def
index 6ec293324fb879d9528ad6cc998d8a893f2cbaab..dcccc84940a9214d6795b4384e84de8150f2273d 100644
--- a/gcc/config/arm/arm_neon_builtins.def
+++ b/gcc/config/arm/arm_neon_builtins.def
@@ -351,3 +351,25 @@ VAR2 (TERNOP, sdot, v8qi, v16qi)
 VAR2 (UTERNOP, udot, v8qi, v16qi)
 VAR2 (MAC_LANE, sdot_lane, v8qi, v16qi)
 VAR2 (UMAC_LANE, udot_lane, v8qi, v16qi)
+
+VAR4 (BINOP, vcadd90, v4hf, v2sf, v8hf, v4sf)
+VAR4 (BINOP, vcadd270, v4hf, v2sf, v8hf, v4sf)
+VAR4 (TERNOP, vcmla0, v2sf, v4sf, v4hf, v8hf)
+VAR4 (TERNOP, vcmla90, v2sf, v4sf, v4hf, v8hf)
+VAR4 (TERNOP, vcmla180, v2sf, v4sf, v4hf, v8hf)
+VAR4 (TERNOP, vcmla270, v2sf, v4sf, v4hf, v8hf)
+
+VAR4 (MAC_LANE_PAIR, vcmla_lane0, v2sf, v4hf, v8hf, v4sf)
+VAR4 (MAC_LANE_PAIR, vcmla_lane90, v2sf, v4hf, v8hf, v4sf)
+VAR4 (MAC_LANE_PAIR, vcmla_lane180, v2sf, v4hf, v8hf, v4sf)
+VAR4 (MAC_LANE_PAIR, vcmla_lane270, v2sf, v4hf, v8hf, v4sf)
+
+VAR2 (MAC_LANE_PAIR, vcmla_laneq0, v2sf, v4hf)
+VAR2 (MAC_LANE_PAIR, vcmla_laneq90, v2sf, v4hf)
+VAR2 (MAC_LANE_PAIR, vcmla_laneq180, v2sf, v4hf)
+VAR2 (MAC_LANE_PAIR, vcmla_laneq270, v2sf, v4hf)
+
+VAR2 (MAC_LANE_PAIR, vcmlaq_lane0, v4sf, v8hf)
+VAR2 (MAC_LANE_PAIR, vcmlaq_lane90, v4sf, v8hf)
+VAR2 (MAC_LANE_PAIR, vcmlaq_lane180, v4sf, v8hf)
+VAR2 (MAC_LANE_PAIR, vcmlaq_lane270, v4sf, v8hf)
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index f50075bf5ffb6be6db1975087da0b468ab05a8a2..795d7e0b9f4aca4a9f5eba61b7fce2ceb7f006fb 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -3463,6 +3463,51 @@
   [(set_attr "type" "neon_fcmla")]
 )
 
+(define_insn "neon_vcmla_lane<rot><mode>"
+  [(set (match_operand:VF 0 "s_register_operand" "=w")
+	(plus:VF (match_operand:VF 1 "s_register_operand" "0")
+		 (unspec:VF [(match_operand:VF 2 "s_register_operand" "w")
+			     (match_operand:VF 3 "s_register_operand" "<VF_constraint>")
+			     (match_operand:SI 4 "const_int_operand" "n")]
+			     VCMLA)))]
+  "TARGET_COMPLEX"
+  {
+    operands = neon_vcmla_lane_prepare_operands (<MODE>mode, operands);
+    return "vcmla.<V_s_elem>\t%<V_reg>0, %<V_reg>2, d%c3[%c4], #<rot>";
+  }
+  [(set_attr "type" "neon_fcmla")]
+)
+
+(define_insn "neon_vcmla_laneq<rot><mode>"
+  [(set (match_operand:VDF 0 "s_register_operand" "=w")
+	(plus:VDF (match_operand:VDF 1 "s_register_operand" "0")
+		  (unspec:VDF [(match_operand:VDF 2 "s_register_operand" "w")
+			      (match_operand:<V_DOUBLE> 3 "s_register_operand" "<VF_constraint>")
+			      (match_operand:SI 4 "const_int_operand" "n")]
+			      VCMLA)))]
+  "TARGET_COMPLEX"
+  {
+    operands = neon_vcmla_lane_prepare_operands (<MODE>mode, operands);
+    return "vcmla.<V_s_elem>\t%<V_reg>0, %<V_reg>2, d%c3[%c4], #<rot>";
+  }
+  [(set_attr "type" "neon_fcmla")]
+)
+
+(define_insn "neon_vcmlaq_lane<rot><mode>"
+  [(set (match_operand:VQ_HSF 0 "s_register_operand" "=w")
+	(plus:VQ_HSF (match_operand:VQ_HSF 1 "s_register_operand" "0")
+		 (unspec:VQ_HSF [(match_operand:VQ_HSF 2 "s_register_operand" "w")
+				 (match_operand:<V_HALF> 3 "s_register_operand" "<VF_constraint>")
+				 (match_operand:SI 4 "const_int_operand" "n")]
+				 VCMLA)))]
+  "TARGET_COMPLEX"
+  {
+    operands = neon_vcmla_lane_prepare_operands (<MODE>mode, operands);
+    return "vcmla.<V_s_elem>\t%<V_reg>0, %<V_reg>2, d%c3[%c4], #<rot>";
+  }
+  [(set_attr "type" "neon_fcmla")]
+)
+
 ;; The complex mla operations always need to expand to two instructions.
 ;; The first operation does half the computation and the second does the
 ;; remainder.  Because of this, expand early.
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vector-complex.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vector-complex.c
index b7c999333ed3a7aa9708bca3a0510ba754b7e4d4..1428cbe3f695f082ccae91dfb32ab92461561891 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vector-complex.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vector-complex.c
@@ -1,5 +1,4 @@
-/* { dg-skip-if "" { arm-*-* } } */
-/* { dg-do assemble } */
+/* { dg-do compile } */
 /* { dg-require-effective-target arm_v8_3a_complex_neon_ok } */
 /* { dg-add-options arm_v8_3a_complex_neon }  */
 /* { dg-additional-options "-O2 -save-temps" } */
@@ -249,3 +248,24 @@ test_vcmlaq_rot270_laneq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b)
 /* { dg-final { scan-assembler-times {fcmla\tv[0-9]+.4s, v[0-9]+.4s, v[0-9]+.s\[1\], #270} 1 { target { aarch64*-*-* } } } } */
 /* { dg-final { scan-assembler-times {fcmla\tv[0-9]+.4s, v[0-9]+.4s, v[0-9]+.s\[1\], #90} 1 { target { aarch64*-*-* } } } } */
 /* { dg-final { scan-assembler-times {dup\td[0-9]+, v[0-9]+.d\[1\]} 4 { target { aarch64*-*-* } } } } */
+
+/* { dg-final { scan-assembler-times {vcadd.f32\td[0-9]+, d[0-9]+, d[0-9]+, #90} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcadd.f32\td[0-9]+, d[0-9]+, d[0-9]+, #270} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcadd.f32\tq[0-9]+, q[0-9]+, q[0-9]+, #90} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcadd.f32\tq[0-9]+, q[0-9]+, q[0-9]+, #270} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\td[0-9]+, d[0-9]+, d[0-9]+, #0} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\td[0-9]+, d[0-9]+, d[0-9]+, #180} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\td[0-9]+, d[0-9]+, d[0-9]+, #270} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\td[0-9]+, d[0-9]+, d[0-9]+, #90} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\td[0-9]+, d[0-9]+, d[0-9]+\[0\], #0} 2 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\td[0-9]+, d[0-9]+, d[0-9]+\[0\], #180} 2 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\td[0-9]+, d[0-9]+, d[0-9]+\[0\], #270} 2 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\td[0-9]+, d[0-9]+, d[0-9]+\[0\], #90} 2 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\tq[0-9]+, q[0-9]+, d[0-9]+\[0\], #0} 2 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\tq[0-9]+, q[0-9]+, d[0-9]+\[0\], #180} 2 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\tq[0-9]+, q[0-9]+, d[0-9]+\[0\], #270} 2 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\tq[0-9]+, q[0-9]+, d[0-9]+\[0\], #90} 2 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\tq[0-9]+, q[0-9]+, q[0-9]+, #0} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\tq[0-9]+, q[0-9]+, q[0-9]+, #180} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\tq[0-9]+, q[0-9]+, q[0-9]+, #270} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\tq[0-9]+, q[0-9]+, q[0-9]+, #90} 1 { target { arm*-*-* } } } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c
index dbcebcbfba67172de25bb3ab743270cacf7c9f96..99754b67e4b4f62561a2c094a59bb70d6af4f31a 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c
@@ -1,5 +1,4 @@
-/* { dg-skip-if "" { arm-*-* } } */
-/* { dg-do assemble } */
+/* { dg-do compile } */
 /* { dg-require-effective-target arm_v8_3a_complex_neon_ok } */
 /* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok } */
 /* { dg-add-options arm_v8_3a_complex_neon } */
@@ -304,3 +303,32 @@ test_vcmlaq_rot270_laneq_f16_2 (float16x8_t __r, float16x8_t __a, float16x8_t __
 /* { dg-final { scan-assembler-times {fcmla\tv[0-9]+.8h, v[0-9]+.8h, v[0-9]+.h\[3\], #180} 1 { target { aarch64*-*-* } } } } */
 /* { dg-final { scan-assembler-times {fcmla\tv[0-9]+.8h, v[0-9]+.8h, v[0-9]+.h\[3\], #270} 1 { target { aarch64*-*-* } } } } */
 /* { dg-final { scan-assembler-times {fcmla\tv[0-9]+.8h, v[0-9]+.8h, v[0-9]+.h\[3\], #90} 1 { target { aarch64*-*-* } } } } */
+
+/* { dg-final { scan-assembler-times {vcadd.f16\td[0-9]+, d[0-9]+, d[0-9]+, #90} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcadd.f16\td[0-9]+, d[0-9]+, d[0-9]+, #270} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcadd.f16\tq[0-9]+, q[0-9]+, q[0-9]+, #90} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcadd.f16\tq[0-9]+, q[0-9]+, q[0-9]+, #270} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+, #0} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+\[0\], #0} 3 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+\[0\], #180} 3 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+\[0\], #270} 3 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+\[0\], #90} 3 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+\[1\], #0} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+\[1\], #180} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+\[1\], #270} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+, #180} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+\[1\], #90} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+, #270} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+, #90} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, d[0-9]+\[0\], #0} 3 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, d[0-9]+\[0\], #180} 3 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, d[0-9]+\[0\], #270} 3 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, d[0-9]+\[0\], #90} 3 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, d[0-9]+\[1\], #0} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, d[0-9]+\[1\], #180} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, d[0-9]+\[1\], #270} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, d[0-9]+\[1\], #90} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, q[0-9]+, #0} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, q[0-9]+, #180} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, q[0-9]+, #270} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, q[0-9]+, #90} 1 { target { arm*-*-* } } } } */
Tamar Christina Jan. 10, 2019, 3:43 a.m. UTC | #4
Hi Kyrill,

Committed with the addition of a few trivial defines and iterators that were missing due to
the patch being split.

Thanks,
Tamar

-----Original Message-----
From: Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> 
Sent: Friday, December 21, 2018 11:40 AM
To: Tamar Christina <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org
Cc: nd <nd@arm.com>; Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>; Richard Earnshaw <Richard.Earnshaw@arm.com>; nickc@redhat.com
Subject: Re: [PATCH 9/9][GCC][Arm] Add ACLE intrinsics for complex multiplication and addition

Hi Tamar,

On 11/12/18 15:46, Tamar Christina wrote:
> Hi All,
>
> [patch description snipped]
>
> Ok for trunk?
>

Ok.
Thanks,
Kyrill

> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> 2018-12-11  Tamar Christina  <tamar.christina@arm.com>
>
>         * config/arm/arm-builtins.c
>         (enum arm_type_qualifiers): Add qualifier_lane_pair_index.
>         (MAC_LANE_PAIR_QUALIFIERS): New.
>         (arm_expand_builtin_args): Use it.
>         (arm_expand_builtin_1): Likewise.
>         * config/arm/arm-protos.h (neon_vcmla_lane_prepare_operands): New.
>         * config/arm/arm.c (neon_vcmla_lane_prepare_operands): New.
>         * config/arm/arm-c.c (arm_cpu_builtins): Add __ARM_FEATURE_COMPLEX.
>         * config/arm/arm_neon.h:
>         (vcadd_rot90_f16): New.
>         (vcaddq_rot90_f16): New.
>         (vcadd_rot270_f16): New.
>         (vcaddq_rot270_f16): New.
>         (vcmla_f16): New.
>         (vcmlaq_f16): New.
>         (vcmla_lane_f16): New.
>         (vcmla_laneq_f16): New.
>         (vcmlaq_lane_f16): New.
>         (vcmlaq_laneq_f16): New.
>         (vcmla_rot90_f16): New.
>         (vcmlaq_rot90_f16): New.
>         (vcmla_rot90_lane_f16): New.
>         (vcmla_rot90_laneq_f16): New.
>         (vcmlaq_rot90_lane_f16): New.
>         (vcmlaq_rot90_laneq_f16): New.
>         (vcmla_rot180_f16): New.
>         (vcmlaq_rot180_f16): New.
>         (vcmla_rot180_lane_f16): New.
>         (vcmla_rot180_laneq_f16): New.
>         (vcmlaq_rot180_lane_f16): New.
>         (vcmlaq_rot180_laneq_f16): New.
>         (vcmla_rot270_f16): New.
>         (vcmlaq_rot270_f16): New.
>         (vcmla_rot270_lane_f16): New.
>         (vcmla_rot270_laneq_f16): New.
>         (vcmlaq_rot270_lane_f16): New.
>         (vcmlaq_rot270_laneq_f16): New.
>         (vcadd_rot90_f32): New.
>         (vcaddq_rot90_f32): New.
>         (vcadd_rot270_f32): New.
>         (vcaddq_rot270_f32): New.
>         (vcmla_f32): New.
>         (vcmlaq_f32): New.
>         (vcmla_lane_f32): New.
>         (vcmla_laneq_f32): New.
>         (vcmlaq_lane_f32): New.
>         (vcmlaq_laneq_f32): New.
>         (vcmla_rot90_f32): New.
>         (vcmlaq_rot90_f32): New.
>         (vcmla_rot90_lane_f32): New.
>         (vcmla_rot90_laneq_f32): New.
>         (vcmlaq_rot90_lane_f32): New.
>         (vcmlaq_rot90_laneq_f32): New.
>         (vcmla_rot180_f32): New.
>         (vcmlaq_rot180_f32): New.
>         (vcmla_rot180_lane_f32): New.
>         (vcmla_rot180_laneq_f32): New.
>         (vcmlaq_rot180_lane_f32): New.
>         (vcmlaq_rot180_laneq_f32): New.
>         (vcmla_rot270_f32): New.
>         (vcmlaq_rot270_f32): New.
>         (vcmla_rot270_lane_f32): New.
>         (vcmla_rot270_laneq_f32): New.
>         (vcmlaq_rot270_lane_f32): New.
>         (vcmlaq_rot270_laneq_f32): New.
>         * config/arm/arm_neon_builtins.def (vcadd90, vcadd270, vcmla0, vcmla90,
>         vcmla180, vcmla270, vcmla_lane0, vcmla_lane90, vcmla_lane180, vcmla_lane270,
>         vcmla_laneq0, vcmla_laneq90, vcmla_laneq180, vcmla_laneq270,
>         vcmlaq_lane0, vcmlaq_lane90, vcmlaq_lane180, vcmlaq_lane270): New.
>         * config/arm/neon.md (neon_vcmla_lane<rot><mode>,
>         neon_vcmla_laneq<rot><mode>, neon_vcmlaq_lane<rot><mode>): New.
>
> gcc/testsuite/ChangeLog:
>
> 2018-12-11  Tamar Christina  <tamar.christina@arm.com>
>
>         * gcc.target/aarch64/advsimd-intrinsics/vector-complex.c: Add AArch32 regexpr.
>         * gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c: Likewise.
>
> --
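
To make the pair-wise lane indexing described above concrete, here is a
minimal usage sketch (the wrapper name and the register assignment mentioned
below are made up for illustration).  With a float32x4_t second operand the
lane-pair index may be 0 or 1; index 1 selects the second complex pair, which
the backend encodes as D(n+1)[0]:

#include <arm_neon.h>

/* Illustration only: `fma_second_complex' is a hypothetical wrapper.
   `b' holds two complex numbers: lanes {0,1} and lanes {2,3}.  */
float32x2_t
fma_second_complex (float32x2_t acc, float32x2_t a, float32x4_t b)
{
  /* The lane-pair index must be a compile-time constant; only 0 and 1
     are valid here.  Index 1 selects the second complex number, i.e.
     the D(n+1)[0] pair of the Q register holding `b'.  */
  return vcmla_laneq_f32 (acc, a, b, 1);
}

With the plain vcmla_lane_f32 variant the second operand is a float32x2_t
(a single complex number), so 0 is the only valid index, while the f16 laneq
forms allow indices 0-3 by the same register-file trick.  If, say, `b' is
allocated to q1 (d2/d3), neon_vcmla_lane_prepare_operands in the patch below
rewrites index 1 into d3[0].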
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 8ea000aca1931ca571fe3e2f8931760e7f7ce295..f646ab537fcdac54a3eaf0f1fa403698e29ef005 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -82,7 +82,10 @@ enum arm_type_qualifiers
   /* A void pointer.  */
   qualifier_void_pointer = 0x800,
   /* A const void pointer.  */
-  qualifier_const_void_pointer = 0x802
+  qualifier_const_void_pointer = 0x802,
+  /* Lane indices selected in pairs - must be within range of previous
+     argument = a vector.  */
+  qualifier_lane_pair_index = 0x1000
 };
 
 /*  The qualifier_internal allows generation of a unary builtin from
@@ -144,6 +147,13 @@ arm_mac_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
       qualifier_none, qualifier_lane_index };
 #define MAC_LANE_QUALIFIERS (arm_mac_lane_qualifiers)
 
+/* T (T, T, T, lane pair index).  */
+static enum arm_type_qualifiers
+arm_mac_lane_pair_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_none,
+      qualifier_none, qualifier_lane_pair_index };
+#define MAC_LANE_PAIR_QUALIFIERS (arm_mac_lane_pair_qualifiers)
+
 /* unsigned T (unsigned T, unsigned T, unsigend T, lane index).  */
 static enum arm_type_qualifiers
 arm_umac_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
@@ -2129,6 +2139,7 @@ typedef enum {
   ARG_BUILTIN_CONSTANT,
   ARG_BUILTIN_LANE_INDEX,
   ARG_BUILTIN_STRUCT_LOAD_STORE_LANE_INDEX,
+  ARG_BUILTIN_LANE_PAIR_INDEX,
   ARG_BUILTIN_NEON_MEMORY,
   ARG_BUILTIN_MEMORY,
   ARG_BUILTIN_STOP
@@ -2266,6 +2277,19 @@ arm_expand_builtin_args (rtx target, machine_mode map_mode, int fcode,
 		  machine_mode vmode = mode[argc - 1];
 		  neon_lane_bounds (op[argc], 0, GET_MODE_NUNITS (vmode), exp);
 		}
+	      /* If the lane index isn't a constant then error out.  */
+	      goto constant_arg;
+
+	    case ARG_BUILTIN_LANE_PAIR_INDEX:
+	      /* Previous argument must be a vector, which this indexes. The
+		 indexing will always select i and i+1 out of the vector, which
+		 puts a limit on i.  */
+	      gcc_assert (argc > 0);
+	      if (CONST_INT_P (op[argc]))
+		{
+		  machine_mode vmode = mode[argc - 1];
+		  neon_lane_bounds (op[argc], 0, GET_MODE_NUNITS (vmode) / 2, exp);
+		}
 	      /* If the lane index isn't a constant then the next
 		 case will error.  */
 	      /* Fall through.  */
@@ -2427,6 +2451,8 @@ arm_expand_builtin_1 (int fcode, tree exp, rtx target,
 
       if (d->qualifiers[qualifiers_k] & qualifier_lane_index)
 	args[k] = ARG_BUILTIN_LANE_INDEX;
+      else if (d->qualifiers[qualifiers_k] & qualifier_lane_pair_index)
+	args[k] = ARG_BUILTIN_LANE_PAIR_INDEX;
       else if (d->qualifiers[qualifiers_k] & qualifier_struct_load_store_lane_index)
 	args[k] = ARG_BUILTIN_STRUCT_LOAD_STORE_LANE_INDEX;
       else if (d->qualifiers[qualifiers_k] & qualifier_immediate)
diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index 89119c3b894fc3949ef3bf46ec0671a7927775fa..26784dfbaaee7e6ed28cd0586b85cada4ce7c45f 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -76,6 +76,7 @@ arm_cpu_builtins (struct cpp_reader* pfile)
 
   def_or_undef_macro (pfile, "__ARM_FEATURE_CRC32", TARGET_CRC32);
   def_or_undef_macro (pfile, "__ARM_FEATURE_DOTPROD", TARGET_DOTPROD);
+  def_or_undef_macro (pfile, "__ARM_FEATURE_COMPLEX", TARGET_COMPLEX);
   def_or_undef_macro (pfile, "__ARM_32BIT_STATE", TARGET_32BIT);
 
   cpp_undef (pfile, "__ARM_FEATURE_CMSE");
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index cda13a2ebb80e1a29ace0c8dcce854a5329e5dab..2bc43019864ef70ed1bf1e725bad7437cf9b11d8 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -109,6 +109,8 @@ extern int arm_coproc_mem_operand (rtx, bool);
 extern int neon_vector_mem_operand (rtx, int, bool);
 extern int neon_struct_mem_operand (rtx);
 
+extern rtx *neon_vcmla_lane_prepare_operands (machine_mode, rtx *);
+
 extern int tls_mentioned_p (rtx);
 extern int symbol_mentioned_p (rtx);
 extern int label_mentioned_p (rtx);
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 7acbce7653afac3b064a025e07cc2842f9f24311..f40c61973d54dbbfc16d5d2cfd8c2b2f3c802339 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -220,6 +220,9 @@ extern tree arm_fp16_type_node;
 					isa_bit_dotprod)		\
 			&& arm_arch8_2)
 
+/* Supports the Armv8.3-a Complex number AdvSIMD extensions.  */
+#define TARGET_COMPLEX (TARGET_NEON && arm_arch8_3)
+
 /* FPU supports the floating point FP16 instructions for ARMv8.2-A
    and later.  */
 #define TARGET_VFP_FP16INST \
@@ -442,6 +445,12 @@ extern int arm_arch8_1;
 /* Nonzero if this chip supports the ARM Architecture 8.2 extensions.  */
 extern int arm_arch8_2;
 
+/* Nonzero if this chip supports the ARM Architecture 8.3 extensions.  */
+extern int arm_arch8_3;
+
+/* Nonzero if this chip supports the ARM Architecture 8.4 extensions.  */
+extern int arm_arch8_4;
+
 /* Nonzero if this chip supports the FP16 instructions extension of ARM
    Architecture 8.2.  */
 extern int arm_fp16_inst;
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 3419b6bd0f8497f56a9916d63d5ad60baf479d34..cb5e7215e813dc922d606662df3fdc5040fd3524 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -895,6 +895,12 @@ int arm_arch8_1 = 0;
 /* Nonzero if this chip supports the ARM Architecture 8.2 extensions.  */
 int arm_arch8_2 = 0;
 
+/* Nonzero if this chip supports the ARM Architecture 8.3 extensions.  */
+int arm_arch8_3 = 0;
+
+/* Nonzero if this chip supports the ARM Architecture 8.4 extensions.  */
+int arm_arch8_4 = 0;
+
 /* Nonzero if this chip supports the FP16 instructions extension of ARM
    Architecture 8.2.  */
 int arm_fp16_inst = 0;
@@ -3649,6 +3655,8 @@ arm_option_reconfigure_globals (void)
   arm_arch8 = bitmap_bit_p (arm_active_target.isa, isa_bit_armv8);
   arm_arch8_1 = bitmap_bit_p (arm_active_target.isa, isa_bit_armv8_1);
   arm_arch8_2 = bitmap_bit_p (arm_active_target.isa, isa_bit_armv8_2);
+  arm_arch8_3 = bitmap_bit_p (arm_active_target.isa, isa_bit_armv8_3);
+  arm_arch8_4 = bitmap_bit_p (arm_active_target.isa, isa_bit_armv8_4);
   arm_arch_thumb1 = bitmap_bit_p (arm_active_target.isa, isa_bit_thumb);
   arm_arch_thumb2 = bitmap_bit_p (arm_active_target.isa, isa_bit_thumb2);
   arm_arch_xscale = bitmap_bit_p (arm_active_target.isa, isa_bit_xscale);
@@ -12713,6 +12721,44 @@ neon_struct_mem_operand (rtx op)
   return FALSE;
 }
 
+/* Prepare the operands for the VCMLA by-lane instruction such that the
+   right register number is selected.  This instruction is special in that
+   it always requires a D register, however there is a choice to be made
+   between Dn[0], Dn[1], D(n+1)[0] and D(n+1)[1] depending on the mode of
+   the registers and the PATTERNMODE of the insn.
+
+   The VCMLA by-lane instruction always selects a pair of values.  For
+   instance, given D0 and a V2SF, the only valid index is 0, as the values
+   in S0 and S1 will be used by the instruction.  However, given a V4SF,
+   indices 0 and 1 are valid, as either D0[0] or D1[0] may be selected.
+
+   This function centralizes that information based on OPERANDS:
+   OPERANDS[3] will be changed from a REG into a CONST_INT RTX holding the
+   register number to print, and OPERANDS[4] will be updated to contain
+   the right lane index.  */
+
+rtx *
+neon_vcmla_lane_prepare_operands (machine_mode patternmode, rtx *operands)
+{
+  int lane = NEON_ENDIAN_LANE_N (patternmode, INTVAL (operands[4]));
+  machine_mode constmode = SImode;
+  machine_mode mode = GET_MODE (operands[3]);
+  int regno = REGNO (operands[3]);
+  regno = ((regno - FIRST_VFP_REGNUM) >> 1);
+  if (lane > 0 && lane >= GET_MODE_NUNITS (mode) / 4)
+    {
+      operands[3] = gen_int_mode (regno + 1, constmode);
+      operands[4]
+	= gen_int_mode (lane - GET_MODE_NUNITS (mode) / 4, constmode);
+    }
+  else
+    {
+      operands[3] = gen_int_mode (regno, constmode);
+      operands[4] = gen_int_mode (lane, constmode);
+    }
+  return operands;
+}
+
+
 /* Return true if X is a register that will be eliminated later on.  */
 int
 arm_eliminable_register (rtx x)
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 274bad92d6c3cff2260867cbdc1581b6aa0e30dc..3cc2179ddee2a33f170c62ee58c0399b1bcbfd99 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -18307,6 +18307,445 @@ vfmlsl_laneq_high_u32 (float32x2_t __r, float16x4_t __a, float16x8_t __b,
 #pragma GCC pop_options
 #endif
 
+/* AdvSIMD Complex numbers intrinsics.  */
+#if __ARM_ARCH >= 8
+#pragma GCC push_options
+#pragma GCC target(("arch=armv8.3-a"))
+
+
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+#pragma GCC push_options
+#pragma GCC target(("+fp16"))
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcadd_rot90_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vcadd90v4hf (__a, __b);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcaddq_rot90_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_neon_vcadd90v8hf (__a, __b);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcadd_rot270_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vcadd270v4hf (__a, __b);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcaddq_rot270_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_neon_vcadd270v8hf (__a, __b);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_f16 (float16x4_t __r, float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vcmla0v4hf (__r, __a, __b);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_f16 (float16x8_t __r, float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_neon_vcmla0v8hf (__r, __a, __b);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_lane_f16 (float16x4_t __r, float16x4_t __a, float16x4_t __b,
+		const int __index)
+{
+  return __builtin_neon_vcmla_lane0v4hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_laneq_f16 (float16x4_t __r, float16x4_t __a, float16x8_t __b,
+		 const int __index)
+{
+  return __builtin_neon_vcmla_laneq0v4hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_lane_f16 (float16x8_t __r, float16x8_t __a, float16x4_t __b,
+		 const int __index)
+{
+  return __builtin_neon_vcmlaq_lane0v8hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_laneq_f16 (float16x8_t __r, float16x8_t __a, float16x8_t __b,
+		  const int __index)
+{
+  return __builtin_neon_vcmla_lane0v8hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot90_f16 (float16x4_t __r, float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vcmla90v4hf (__r, __a, __b);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot90_f16 (float16x8_t __r, float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_neon_vcmla90v8hf (__r, __a, __b);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot90_lane_f16 (float16x4_t __r, float16x4_t __a, float16x4_t __b,
+		      const int __index)
+{
+  return __builtin_neon_vcmla_lane90v4hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot90_laneq_f16 (float16x4_t __r, float16x4_t __a, float16x8_t __b,
+		       const int __index)
+{
+  return __builtin_neon_vcmla_laneq90v4hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot90_lane_f16 (float16x8_t __r, float16x8_t __a, float16x4_t __b,
+		       const int __index)
+{
+  return __builtin_neon_vcmlaq_lane90v8hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot90_laneq_f16 (float16x8_t __r, float16x8_t __a, float16x8_t __b,
+			const int __index)
+{
+  return __builtin_neon_vcmla_lane90v8hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot180_f16 (float16x4_t __r, float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vcmla180v4hf (__r, __a, __b);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot180_f16 (float16x8_t __r, float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_neon_vcmla180v8hf (__r, __a, __b);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot180_lane_f16 (float16x4_t __r, float16x4_t __a, float16x4_t __b,
+		       const int __index)
+{
+  return __builtin_neon_vcmla_lane180v4hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot180_laneq_f16 (float16x4_t __r, float16x4_t __a, float16x8_t __b,
+			const int __index)
+{
+  return __builtin_neon_vcmla_laneq180v4hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot180_lane_f16 (float16x8_t __r, float16x8_t __a, float16x4_t __b,
+			const int __index)
+{
+  return __builtin_neon_vcmlaq_lane180v8hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot180_laneq_f16 (float16x8_t __r, float16x8_t __a, float16x8_t __b,
+			 const int __index)
+{
+  return __builtin_neon_vcmla_lane180v8hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot270_f16 (float16x4_t __r, float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vcmla270v4hf (__r, __a, __b);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot270_f16 (float16x8_t __r, float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_neon_vcmla270v8hf (__r, __a, __b);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot270_lane_f16 (float16x4_t __r, float16x4_t __a, float16x4_t __b,
+		       const int __index)
+{
+  return __builtin_neon_vcmla_lane270v4hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot270_laneq_f16 (float16x4_t __r, float16x4_t __a, float16x8_t __b,
+			const int __index)
+{
+  return __builtin_neon_vcmla_laneq270v4hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot270_lane_f16 (float16x8_t __r, float16x8_t __a, float16x4_t __b,
+			const int __index)
+{
+  return __builtin_neon_vcmlaq_lane270v8hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot270_laneq_f16 (float16x8_t __r, float16x8_t __a, float16x8_t __b,
+			 const int __index)
+{
+  return __builtin_neon_vcmla_lane270v8hf (__r, __a, __b, __index);
+}
+
+#pragma GCC pop_options
+#endif
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcadd_rot90_f32 (float32x2_t __a, float32x2_t __b)
+{
+  return __builtin_neon_vcadd90v2sf (__a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcaddq_rot90_f32 (float32x4_t __a, float32x4_t __b)
+{
+  return __builtin_neon_vcadd90v4sf (__a, __b);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcadd_rot270_f32 (float32x2_t __a, float32x2_t __b)
+{
+  return __builtin_neon_vcadd270v2sf (__a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcaddq_rot270_f32 (float32x4_t __a, float32x4_t __b)
+{
+  return __builtin_neon_vcadd270v4sf (__a, __b);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_f32 (float32x2_t __r, float32x2_t __a, float32x2_t __b)
+{
+  return __builtin_neon_vcmla0v2sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b)
+{
+  return __builtin_neon_vcmla0v4sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_lane_f32 (float32x2_t __r, float32x2_t __a, float32x2_t __b,
+		const int __index)
+{
+  return __builtin_neon_vcmla_lane0v2sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_laneq_f32 (float32x2_t __r, float32x2_t __a, float32x4_t __b,
+		 const int __index)
+{
+  return __builtin_neon_vcmla_laneq0v2sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_lane_f32 (float32x4_t __r, float32x4_t __a, float32x2_t __b,
+		 const int __index)
+{
+  return __builtin_neon_vcmlaq_lane0v4sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_laneq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b,
+		  const int __index)
+{
+  return __builtin_neon_vcmla_lane0v4sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot90_f32 (float32x2_t __r, float32x2_t __a, float32x2_t __b)
+{
+  return __builtin_neon_vcmla90v2sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot90_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b)
+{
+  return __builtin_neon_vcmla90v4sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot90_lane_f32 (float32x2_t __r, float32x2_t __a, float32x2_t __b,
+		      const int __index)
+{
+  return __builtin_neon_vcmla_lane90v2sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot90_laneq_f32 (float32x2_t __r, float32x2_t __a, float32x4_t __b,
+		       const int __index)
+{
+  return __builtin_neon_vcmla_laneq90v2sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot90_lane_f32 (float32x4_t __r, float32x4_t __a, float32x2_t __b,
+		       const int __index)
+{
+  return __builtin_neon_vcmlaq_lane90v4sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot90_laneq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b,
+			const int __index)
+{
+  return __builtin_neon_vcmla_lane90v4sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot180_f32 (float32x2_t __r, float32x2_t __a, float32x2_t __b)
+{
+  return __builtin_neon_vcmla180v2sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot180_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b)
+{
+  return __builtin_neon_vcmla180v4sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot180_lane_f32 (float32x2_t __r, float32x2_t __a, float32x2_t __b,
+		       const int __index)
+{
+  return __builtin_neon_vcmla_lane180v2sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot180_laneq_f32 (float32x2_t __r, float32x2_t __a, float32x4_t __b,
+			const int __index)
+{
+  return __builtin_neon_vcmla_laneq180v2sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot180_lane_f32 (float32x4_t __r, float32x4_t __a, float32x2_t __b,
+			const int __index)
+{
+  return __builtin_neon_vcmlaq_lane180v4sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot180_laneq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b,
+			 const int __index)
+{
+  return __builtin_neon_vcmla_lane180v4sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot270_f32 (float32x2_t __r, float32x2_t __a, float32x2_t __b)
+{
+  return __builtin_neon_vcmla270v2sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot270_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b)
+{
+  return __builtin_neon_vcmla270v4sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot270_lane_f32 (float32x2_t __r, float32x2_t __a, float32x2_t __b,
+		       const int __index)
+{
+  return __builtin_neon_vcmla_lane270v2sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot270_laneq_f32 (float32x2_t __r, float32x2_t __a, float32x4_t __b,
+			const int __index)
+{
+  return __builtin_neon_vcmla_laneq270v2sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot270_lane_f32 (float32x4_t __r, float32x4_t __a, float32x2_t __b,
+			const int __index)
+{
+  return __builtin_neon_vcmlaq_lane270v4sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot270_laneq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b,
+			 const int __index)
+{
+  return __builtin_neon_vcmla_lane270v4sf (__r, __a, __b, __index);
+}
+
+#pragma GCC pop_options
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def
index e0b2e7fe68edab3fd6cab28978e760fbc5e7744c..bcccf93f7fa2750e9006e5856efecbec0fb331b9 100644
--- a/gcc/config/arm/arm_neon_builtins.def
+++ b/gcc/config/arm/arm_neon_builtins.def
@@ -351,3 +351,25 @@ VAR2 (TERNOP, sdot, v8qi, v16qi)
 VAR2 (UTERNOP, udot, v8qi, v16qi)
 VAR2 (MAC_LANE, sdot_lane, v8qi, v16qi)
 VAR2 (UMAC_LANE, udot_lane, v8qi, v16qi)
+
+VAR4 (BINOP, vcadd90, v4hf, v2sf, v8hf, v4sf)
+VAR4 (BINOP, vcadd270, v4hf, v2sf, v8hf, v4sf)
+VAR4 (TERNOP, vcmla0, v2sf, v4sf, v4hf, v8hf)
+VAR4 (TERNOP, vcmla90, v2sf, v4sf, v4hf, v8hf)
+VAR4 (TERNOP, vcmla180, v2sf, v4sf, v4hf, v8hf)
+VAR4 (TERNOP, vcmla270, v2sf, v4sf, v4hf, v8hf)
+
+VAR4 (MAC_LANE_PAIR, vcmla_lane0, v2sf, v4hf, v8hf, v4sf)
+VAR4 (MAC_LANE_PAIR, vcmla_lane90, v2sf, v4hf, v8hf, v4sf)
+VAR4 (MAC_LANE_PAIR, vcmla_lane180, v2sf, v4hf, v8hf, v4sf)
+VAR4 (MAC_LANE_PAIR, vcmla_lane270, v2sf, v4hf, v8hf, v4sf)
+
+VAR2 (MAC_LANE_PAIR, vcmla_laneq0, v2sf, v4hf)
+VAR2 (MAC_LANE_PAIR, vcmla_laneq90, v2sf, v4hf)
+VAR2 (MAC_LANE_PAIR, vcmla_laneq180, v2sf, v4hf)
+VAR2 (MAC_LANE_PAIR, vcmla_laneq270, v2sf, v4hf)
+
+VAR2 (MAC_LANE_PAIR, vcmlaq_lane0, v4sf, v8hf)
+VAR2 (MAC_LANE_PAIR, vcmlaq_lane90, v4sf, v8hf)
+VAR2 (MAC_LANE_PAIR, vcmlaq_lane180, v4sf, v8hf)
+VAR2 (MAC_LANE_PAIR, vcmlaq_lane270, v4sf, v8hf)
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 5f46895d5c76bf2bd7e49a4cc7579ac2e8902bdc..c33e572c3e89c3dc5848bd6b825d618481247558 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -123,6 +123,13 @@
 (define_mode_iterator VF [(V4HF "TARGET_NEON_FP16INST")
 			   (V8HF "TARGET_NEON_FP16INST") V2SF V4SF])
 
+;; Double vector modes.
+(define_mode_iterator VDF [V2SF V4HF])
+
+;; Quad vector Float modes with half/single elements.
+(define_mode_iterator VQ_HSF [V8HF V4SF])
+
+
 ;; All supported vector modes (except those with 64-bit integer elements).
 (define_mode_iterator VDQW [V8QI V16QI V4HI V8HI V2SI V4SI V2SF V4SF])
 
@@ -423,6 +430,9 @@
 
 (define_int_iterator VFMLHALVES [UNSPEC_VFML_LO UNSPEC_VFML_HI])
 
+(define_int_iterator VCADD [UNSPEC_VCADD90 UNSPEC_VCADD270])
+(define_int_iterator VCMLA [UNSPEC_VCMLA UNSPEC_VCMLA90 UNSPEC_VCMLA180 UNSPEC_VCMLA270])
+
 ;;----------------------------------------------------------------------------
 ;; Mode attributes
 ;;----------------------------------------------------------------------------
@@ -741,7 +751,7 @@
 (define_mode_attr F_constraint [(SF "t") (DF "w")])
 (define_mode_attr vfp_type [(SF "s") (DF "d")])
 (define_mode_attr vfp_double_cond [(SF "") (DF "&& TARGET_VFP_DOUBLE")])
-(define_mode_attr VF_constraint [(V2SF "t") (V4SF "w")])
+(define_mode_attr VF_constraint [(V4HF "t") (V8HF "t") (V2SF "t") (V4SF "w")])
 
 ;; Mode attribute used to build the "type" attribute.
 (define_mode_attr q [(V8QI "") (V16QI "_q")
@@ -989,6 +999,13 @@
                           (UNSPEC_SHA1SU0 "V4SI") (UNSPEC_SHA256H "V4SI")
                           (UNSPEC_SHA256H2 "V4SI") (UNSPEC_SHA256SU1 "V4SI")])
 
+(define_int_attr rot [(UNSPEC_VCADD90 "90")
+		      (UNSPEC_VCADD270 "270")
+		      (UNSPEC_VCMLA "0")
+		      (UNSPEC_VCMLA90 "90")
+		      (UNSPEC_VCMLA180 "180")
+		      (UNSPEC_VCMLA270 "270")])
+
 ;; Both kinds of return insn.
 (define_code_iterator RETURNS [return simple_return])
 (define_code_attr return_str [(return "") (simple_return "simple_")])
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 6263cd43ab0480edf8da770e2eb035dd59fb1ac8..6f8e7c1cffd2751c1ee7e03ded0410ad3c09c13f 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -3457,6 +3457,80 @@
   DONE;
 })
 
+
+;; The vcadd and vcmla patterns are made UNSPECs explicitly because their
+;; use needs to guarantee that the source vectors are contiguous.  It
+;; would be wrong to describe the operation without being able to describe
+;; the permute that is also required, but even if that were done the
+;; permute would have been created as a LOAD_LANES, which means the values
+;; in the registers are in the wrong order.
+(define_insn "neon_vcadd<rot><mode>"
+  [(set (match_operand:VF 0 "register_operand" "=w")
+	(unspec:VF [(match_operand:VF 1 "register_operand" "w")
+		    (match_operand:VF 2 "register_operand" "w")]
+		    VCADD))]
+  "TARGET_COMPLEX"
+  "vcadd.<V_s_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2, #<rot>"
+  [(set_attr "type" "neon_fcadd")]
+)
+
+(define_insn "neon_vcmla<rot><mode>"
+  [(set (match_operand:VF 0 "register_operand" "=w")
+	(plus:VF (match_operand:VF 1 "register_operand" "0")
+		 (unspec:VF [(match_operand:VF 2 "register_operand" "w")
+			     (match_operand:VF 3 "register_operand" "w")]
+			     VCMLA)))]
+  "TARGET_COMPLEX"
+  "vcmla.<V_s_elem>\t%<V_reg>0, %<V_reg>2, %<V_reg>3, #<rot>"
+  [(set_attr "type" "neon_fcmla")]
+)
+
+(define_insn "neon_vcmla_lane<rot><mode>"
+  [(set (match_operand:VF 0 "s_register_operand" "=w")
+	(plus:VF (match_operand:VF 1 "s_register_operand" "0")
+		 (unspec:VF [(match_operand:VF 2 "s_register_operand" "w")
+			     (match_operand:VF 3 "s_register_operand" "<VF_constraint>")
+			     (match_operand:SI 4 "const_int_operand" "n")]
+			     VCMLA)))]
+  "TARGET_COMPLEX"
+  {
+    operands = neon_vcmla_lane_prepare_operands (<MODE>mode, operands);
+    return "vcmla.<V_s_elem>\t%<V_reg>0, %<V_reg>2, d%c3[%c4], #<rot>";
+  }
+  [(set_attr "type" "neon_fcmla")]
+)
+
+(define_insn "neon_vcmla_laneq<rot><mode>"
+  [(set (match_operand:VDF 0 "s_register_operand" "=w")
+	(plus:VDF (match_operand:VDF 1 "s_register_operand" "0")
+		  (unspec:VDF [(match_operand:VDF 2 "s_register_operand" "w")
+			      (match_operand:<V_DOUBLE> 3 "s_register_operand" "<VF_constraint>")
+			      (match_operand:SI 4 "const_int_operand" "n")]
+			      VCMLA)))]
+  "TARGET_COMPLEX"
+  {
+    operands = neon_vcmla_lane_prepare_operands (<MODE>mode, operands);
+    return "vcmla.<V_s_elem>\t%<V_reg>0, %<V_reg>2, d%c3[%c4], #<rot>";
+  }
+  [(set_attr "type" "neon_fcmla")]
+)
+
+(define_insn "neon_vcmlaq_lane<rot><mode>"
+  [(set (match_operand:VQ_HSF 0 "s_register_operand" "=w")
+	(plus:VQ_HSF (match_operand:VQ_HSF 1 "s_register_operand" "0")
+		 (unspec:VQ_HSF [(match_operand:VQ_HSF 2 "s_register_operand" "w")
+				 (match_operand:<V_HALF> 3 "s_register_operand" "<VF_constraint>")
+				 (match_operand:SI 4 "const_int_operand" "n")]
+				 VCMLA)))]
+  "TARGET_COMPLEX"
+  {
+    operands = neon_vcmla_lane_prepare_operands (<MODE>mode, operands);
+    return "vcmla.<V_s_elem>\t%<V_reg>0, %<V_reg>2, d%c3[%c4], #<rot>";
+  }
+  [(set_attr "type" "neon_fcmla")]
+)
+
+
 ;; These instructions map to the __builtins for the Dot Product operations.
 (define_insn "neon_<sup>dot<vsi2qi>"
   [(set (match_operand:VCVTI 0 "register_operand" "=w")
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 05e89ff0bed3999356fc2f402b394c3d2904e6d0..174bcc5e3d5e1123cb1c1a595f5003884840aea8 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -418,4 +418,10 @@
   UNSPEC_DOT_U
   UNSPEC_VFML_LO
   UNSPEC_VFML_HI
+  UNSPEC_VCADD90
+  UNSPEC_VCADD270
+  UNSPEC_VCMLA
+  UNSPEC_VCMLA90
+  UNSPEC_VCMLA180
+  UNSPEC_VCMLA270
 ])
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vector-complex.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vector-complex.c
index b7c999333ed3a7aa9708bca3a0510ba754b7e4d4..1428cbe3f695f082ccae91dfb32ab92461561891 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vector-complex.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vector-complex.c
@@ -1,5 +1,4 @@
-/* { dg-skip-if "" { arm-*-* } } */
-/* { dg-do assemble } */
+/* { dg-do compile } */
 /* { dg-require-effective-target arm_v8_3a_complex_neon_ok } */
 /* { dg-add-options arm_v8_3a_complex_neon }  */
 /* { dg-additional-options "-O2 -save-temps" } */
@@ -249,3 +248,22 @@ test_vcmlaq_rot270_laneq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b)
 /* { dg-final { scan-assembler-times {fcmla\tv[0-9]+.4s, v[0-9]+.4s, v[0-9]+.s\[1\], #270} 1 { target { aarch64*-*-* } } } } */
 /* { dg-final { scan-assembler-times {fcmla\tv[0-9]+.4s, v[0-9]+.4s, v[0-9]+.s\[1\], #90} 1 { target { aarch64*-*-* } } } } */
 /* { dg-final { scan-assembler-times {dup\td[0-9]+, v[0-9]+.d\[1\]} 4 { target { aarch64*-*-* } } } } */
+
+/* { dg-final { scan-assembler-times {vcadd.f32\td[0-9]+, d[0-9]+, d[0-9]+, #90} 2 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcadd.f32\tq[0-9]+, q[0-9]+, q[0-9]+, #90} 2 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\td[0-9]+, d[0-9]+, d[0-9]+, #0} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\td[0-9]+, d[0-9]+, d[0-9]+, #180} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\td[0-9]+, d[0-9]+, d[0-9]+, #270} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\td[0-9]+, d[0-9]+, d[0-9]+, #90} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\td[0-9]+, d[0-9]+, d[0-9]+\[0\], #0} 2 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\td[0-9]+, d[0-9]+, d[0-9]+\[0\], #180} 2 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\td[0-9]+, d[0-9]+, d[0-9]+\[0\], #270} 2 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\td[0-9]+, d[0-9]+, d[0-9]+\[0\], #90} 2 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\tq[0-9]+, q[0-9]+, d[0-9]+\[0\], #0} 2 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\tq[0-9]+, q[0-9]+, d[0-9]+\[0\], #180} 2 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\tq[0-9]+, q[0-9]+, d[0-9]+\[0\], #270} 2 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\tq[0-9]+, q[0-9]+, d[0-9]+\[0\], #90} 2 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\tq[0-9]+, q[0-9]+, q[0-9]+, #0} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\tq[0-9]+, q[0-9]+, q[0-9]+, #180} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\tq[0-9]+, q[0-9]+, q[0-9]+, #270} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\tq[0-9]+, q[0-9]+, q[0-9]+, #90} 1 { target { arm*-*-* } } } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c
index dbcebcbfba67172de25bb3ab743270cacf7c9f96..99754b67e4b4f62561a2c094a59bb70d6af4f31a 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c
@@ -1,5 +1,4 @@
-/* { dg-skip-if "" { arm-*-* } } */
-/* { dg-do assemble } */
+/* { dg-do compile } */
 /* { dg-require-effective-target arm_v8_3a_complex_neon_ok } */
 /* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok } */
 /* { dg-add-options arm_v8_3a_complex_neon } */
@@ -304,3 +303,30 @@ test_vcmlaq_rot270_laneq_f16_2 (float16x8_t __r, float16x8_t __a, float16x8_t __
 /* { dg-final { scan-assembler-times {fcmla\tv[0-9]+.8h, v[0-9]+.8h, v[0-9]+.h\[3\], #180} 1 { target { aarch64*-*-* } } } } */
 /* { dg-final { scan-assembler-times {fcmla\tv[0-9]+.8h, v[0-9]+.8h, v[0-9]+.h\[3\], #270} 1 { target { aarch64*-*-* } } } } */
 /* { dg-final { scan-assembler-times {fcmla\tv[0-9]+.8h, v[0-9]+.8h, v[0-9]+.h\[3\], #90} 1 { target { aarch64*-*-* } } } } */
+
+/* { dg-final { scan-assembler-times {vcadd.f16\td[0-9]+, d[0-9]+, d[0-9]+, #90} 2 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcadd.f16\tq[0-9]+, q[0-9]+, q[0-9]+, #90} 2 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+, #0} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+\[0\], #0} 3 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+\[0\], #180} 3 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+\[0\], #270} 3 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+\[0\], #90} 3 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+\[1\], #0} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+\[1\], #180} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+\[1\], #270} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+, #180} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+\[1\], #90} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+, #270} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+, #90} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, d[0-9]+\[0\], #0} 3 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, d[0-9]+\[0\], #180} 3 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, d[0-9]+\[0\], #270} 3 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, d[0-9]+\[0\], #90} 3 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, d[0-9]+\[1\], #0} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, d[0-9]+\[1\], #180} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, d[0-9]+\[1\], #270} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, d[0-9]+\[1\], #90} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, q[0-9]+, #0} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, q[0-9]+, #180} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, q[0-9]+, #270} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, q[0-9]+, #90} 1 { target { arm*-*-* } } } } */
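
As a usage note on the semantics these tests exercise: a full complex
multiply-accumulate is the standard two-rotation idiom (a sketch, not part of
the patch itself), with rotation #0 accumulating the real-operand partial
products and #90 the imaginary-operand ones:

#include <arm_neon.h>

/* Sketch: acc += a * b for a pair of complex floats, built from two
   vcmla rotations.  */
float32x2_t
cmla (float32x2_t acc, float32x2_t a, float32x2_t b)
{
  acc = vcmla_f32 (acc, a, b);        /* acc.re += a.re*b.re; acc.im += a.re*b.im  */
  acc = vcmla_rot90_f32 (acc, a, b);  /* acc.re -= a.im*b.im; acc.im += a.im*b.re  */
  return acc;
}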
Christophe Lyon Jan. 10, 2019, 3:35 p.m. UTC | #5
Hi Tamar,


On Thu, 10 Jan 2019 at 04:44, Tamar Christina <Tamar.Christina@arm.com> wrote:
>
> Hi Kyrill,
>
> Committed with the addition of a few trivial defines and iterators that
> were missing due to the patch being split.
>
> Thanks,
> Tamar
>
> -----Original Message-----
> From: Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>
> Sent: Friday, December 21, 2018 11:40 AM
> To: Tamar Christina <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org
> Cc: nd <nd@arm.com>; Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>; Richard Earnshaw <Richard.Earnshaw@arm.com>; nickc@redhat.com
> Subject: Re: [PATCH 9/9][GCC][Arm] Add ACLE intrinsics for complex mutliplication and addition
>
> Hi Tamar,
>
> On 11/12/18 15:46, Tamar Christina wrote:
> > [...]
>
> Ok.
> Thanks,
> Kyrill
>
> > [...]
>

Since r267796, I've noticed a regression on aarch64:
FAIL: gcc.target/aarch64/pr68674.c (test for excess errors)
Excess errors:
/home/tcwg-buildslave/workspace/tcwg-buildfarm_0/_build/builds/aarch64-unknown-linux-gnu/aarch64-unknown-linux-gnu/gcc.git~master_rev_9ccac37030d1cce880d7df7a5716fb56f89a67f6-stage2/gcc/include/arm_neon.h:33361:10: error: incompatible types when returning type 'int' but 'float16x4_t' was expected
/home/tcwg-buildslave/workspace/tcwg-buildfarm_0/_build/builds/aarch64-unknown-linux-gnu/aarch64-unknown-linux-gnu/gcc.git~master_rev_9ccac37030d1cce880d7df7a5716fb56f89a67f6-stage2/gcc/include/arm_neon.h:33385:10: error: incompatible types when returning type 'int' but 'float16x4_t' was expected
/home/tcwg-buildslave/workspace/tcwg-buildfarm_0/_build/builds/aarch64-unknown-linux-gnu/aarch64-unknown-linux-gnu/gcc.git~master_rev_9ccac37030d1cce880d7df7a5716fb56f89a67f6-stage2/gcc/include/arm_neon.h:33423:10: error: incompatible types when returning type 'int' but 'float16x4_t' was expected
/home/tcwg-buildslave/workspace/tcwg-buildfarm_0/_build/builds/aarch64-unknown-linux-gnu/aarch64-unknown-linux-gnu/gcc.git~master_rev_9ccac37030d1cce880d7df7a5716fb56f89a67f6-stage2/gcc/include/arm_neon.h:33477:10: error: incompatible types when returning type 'int' but 'float16x4_t' was expected
/home/tcwg-buildslave/workspace/tcwg-buildfarm_0/_build/builds/aarch64-unknown-linux-gnu/aarch64-unknown-linux-gnu/gcc.git~master_rev_9ccac37030d1cce880d7df7a5716fb56f89a67f6-stage2/gcc/include/arm_neon.h:33595:10: error: incompatible types when returning type 'int' but 'float32x2_t' was expected
/home/tcwg-buildslave/workspace/tcwg-buildfarm_0/_build/builds/aarch64-unknown-linux-gnu/aarch64-unknown-linux-gnu/gcc.git~master_rev_9ccac37030d1cce880d7df7a5716fb56f89a67f6-stage2/gcc/include/arm_neon.h:33648:10: error: incompatible types when returning type 'int' but 'float32x2_t' was expected
/home/tcwg-buildslave/workspace/tcwg-buildfarm_0/_build/builds/aarch64-unknown-linux-gnu/aarch64-unknown-linux-gnu/gcc.git~master_rev_9ccac37030d1cce880d7df7a5716fb56f89a67f6-stage2/gcc/include/arm_neon.h:33701:10: error: incompatible types when returning type 'int' but 'float32x2_t' was expected
/home/tcwg-buildslave/workspace/tcwg-buildfarm_0/_build/builds/aarch64-unknown-linux-gnu/aarch64-unknown-linux-gnu/gcc.git~master_rev_9ccac37030d1cce880d7df7a5716fb56f89a67f6-stage2/gcc/include/arm_neon.h:33754:10: error: incompatible types when returning type 'int' but 'float32x2_t' was expected

I'm surprised you didn't see this during validation?
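
Errors of this shape generally mean the wrapper's return expression ended up
with type int -- for example because the builtin it calls was never registered
for the target, so the call is treated as an implicitly-declared function
returning int.  A reduced sketch, with a deliberately hypothetical builtin
name:

__extension__ extern __inline float16x4_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vfoo_f16 (float16x4_t a, float16x4_t b)
{
  /* If this builtin name does not exist for the target, the call is
     implicitly declared as returning int, and the return statement then
     fails with "incompatible types when returning type 'int' but
     'float16x4_t' was expected".  */
  return __builtin_neon_vfoo_typo_v4hf (a, b);
}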
Tamar Christina Jan. 10, 2019, 3:41 p.m. UTC | #6
Hi Christoph,

It was introduced in a small refactoring, after which I only retested the testcases I added, which don't trigger the issue.

In any case it's a trivial fix and I'll submit a patch in a bit.

Tamar
Christophe Lyon Jan. 11, 2019, 10:02 a.m. UTC | #7
Hi Tamar,


On Thu, 10 Jan 2019 at 16:41, Tamar Christina <Tamar.Christina@arm.com> wrote:
> [...]

I've noticed other problems on arm-none-linux-gnueabihf:
FAIL: gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c   -O0
 (test for excess errors)
Excess errors:
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabihf/gcc3/gcc/include/arm_neon.h:18323:10:
error: this builtin is not supported for this target
[....]
The testcase is compiled with -mfp16-format=ieee -march=armv8.3-a -O2
-march=armv8.3-a+fp16


In addition, guess what, some scan-assembler-times directives fail on
big-endian...
on armeb-none-linux-gnueabihf :
gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0  :
vcmla.f32\\td[0-9]+, d[0-9]+, d[0-9]+\\[0\\], #0 found 1 times
FAIL: gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0
scan-assembler-times vcmla.f32\\td[0-9]+, d[0-9]+, d[0-9]+\\[0\\], #0
2
gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0  :
vcmla.f32\\td[0-9]+, d[0-9]+, d[0-9]+\\[0\\], #180 found 1 times
FAIL: gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0
scan-assembler-times vcmla.f32\\td[0-9]+, d[0-9]+, d[0-9]+\\[0\\],
#180 2
gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0  :
vcmla.f32\\td[0-9]+, d[0-9]+, d[0-9]+\\[0\\], #270 found 1 times
FAIL: gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0
scan-assembler-times vcmla.f32\\td[0-9]+, d[0-9]+, d[0-9]+\\[0\\],
#270 2
gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0  :
vcmla.f32\\td[0-9]+, d[0-9]+, d[0-9]+\\[0\\], #90 found 1 times
FAIL: gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0
scan-assembler-times vcmla.f32\\td[0-9]+, d[0-9]+, d[0-9]+\\[0\\], #90
2
gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0  :
vcmla.f32\\tq[0-9]+, q[0-9]+, d[0-9]+\\[0\\], #0 found 0 times
FAIL: gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0
scan-assembler-times vcmla.f32\\tq[0-9]+, q[0-9]+, d[0-9]+\\[0\\], #0
2
gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0  :
vcmla.f32\\tq[0-9]+, q[0-9]+, d[0-9]+\\[0\\], #180 found 0 times
FAIL: gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0
scan-assembler-times vcmla.f32\\tq[0-9]+, q[0-9]+, d[0-9]+\\[0\\],
#180 2
gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0  :
vcmla.f32\\tq[0-9]+, q[0-9]+, d[0-9]+\\[0\\], #270 found 0 times
FAIL: gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0
scan-assembler-times vcmla.f32\\tq[0-9]+, q[0-9]+, d[0-9]+\\[0\\],
#270 2
gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0  :
vcmla.f32\\tq[0-9]+, q[0-9]+, d[0-9]+\\[0\\], #90 found 0 times
FAIL: gcc.target/aarch64/advsimd-intrinsics/vector-complex.c   -O0
scan-assembler-times vcmla.f32\\tq[0-9]+, q[0-9]+, d[0-9]+\\[0\\], #90
2

On aarch64_be, I'm seeing ICEs:
/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vector-complex.c:
In function 'test_vcmla_laneq_f32':
/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vector-complex.c:78:1:
internal compiler error: Segmentation fault
0xc3967f crash_signal
        /gcc/toplev.c:326
0xa70718 mark_jump_label_1
        /gcc/jump.c:1087
0xa707fb mark_jump_label_1
        /gcc/jump.c:1212
0xa707fb mark_jump_label_1
        /gcc/jump.c:1212
0xa70c62 mark_all_labels
        /gcc/jump.c:332
0xa70c62 rebuild_jump_labels_1
        /gcc/jump.c:74
0x78c6af execute
        /gcc/cfgexpand.c:6549
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

and similar for vector-complex_f16.c

Maybe you've already fixed this later in the series?

Happy new year :)

Christophe
Tamar Christina Jan. 11, 2019, 10:37 a.m. UTC | #8
Hi Christophe,

The arm one is a testism; I have a validated patch that I will commit soon.
The aarch64 one is a big-endian lane ordering issue; I had completely forgotten to test big-endian.
A patch for that is going through validation now.

I will submit the aarch64 one soon. Sorry for the mess; splitting the patches off from the remainder of the series
had some casualties.  These should be the last.

Thanks, and happy new year to you too!

Kind Regards,
Tamar

-----Original Message-----
From: Christophe Lyon <christophe.lyon@linaro.org> 
Sent: Friday, January 11, 2019 10:02 AM
To: Tamar Christina <Tamar.Christina@arm.com>
Cc: Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>; gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>; Richard Earnshaw <Richard.Earnshaw@arm.com>; nickc@redhat.com
Subject: Re: [PATCH 9/9][GCC][Arm] Add ACLE intrinsics for complex mutliplication and addition

Hi Tamar,


On Thu, 10 Jan 2019 at 16:41, Tamar Christina <Tamar.Christina@arm.com> wrote:
>
> Hi Christophe,
>
> It was introduced in a small refactoring after which I only retested the testcases I added, which don't trigger the issue.
>
> In any case it's a trivial fix and I'll submit a patch in a bit.
>
> Tamar
>
> ________________________________________
> From: Christophe Lyon <christophe.lyon@linaro.org>
> Sent: Thursday, January 10, 2019 3:35:18 PM
> To: Tamar Christina
> Cc: Kyrill Tkachov; gcc-patches@gcc.gnu.org; nd; Ramana Radhakrishnan; 
> Richard Earnshaw; nickc@redhat.com
> Subject: Re: [PATCH 9/9][GCC][Arm] Add ACLE intrinsics for complex 
> mutliplication and addition
>
> Hi Tamar,
>
>
> On Thu, 10 Jan 2019 at 04:44, Tamar Christina <Tamar.Christina@arm.com> wrote:
> >
> > Hi Kyrill,
> >
> > Committed with the addition of a few trivial defines and iterators
> > that were missing due to the patch being split.
> >
> > Thanks,
> > Tamar
> >
> > -----Original Message-----
> > From: Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>
> > Sent: Friday, December 21, 2018 11:40 AM
> > To: Tamar Christina <Tamar.Christina@arm.com>; 
> > gcc-patches@gcc.gnu.org
> > Cc: nd <nd@arm.com>; Ramana Radhakrishnan 
> > <Ramana.Radhakrishnan@arm.com>; Richard Earnshaw 
> > <Richard.Earnshaw@arm.com>; nickc@redhat.com
> > Subject: Re: [PATCH 9/9][GCC][Arm] Add ACLE intrinsics for complex 
> > mutliplication and addition
> >
> > Hi Tamar,
> >
> > On 11/12/18 15:46, Tamar Christina wrote:
> > > [....]
> > >
> >
> > Ok.
> > Thanks,
> > Kyrill
> >
> > > Thanks,
> > > Tamar
> > > [....]
>
> [....]

[....]
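
A minimal sketch of what the new intrinsics compute may help when reading
the patch below (written for this review, not part of the patch; it assumes
a little-endian Armv8.3-a target with the complex extension enabled):

#include <arm_neon.h>

/* acc += a * b for complex numbers stored as interleaved (real, imag)
   pairs.  The #0 rotation accumulates a.re*b.re into the real part and
   a.re*b.im into the imaginary part; the #90 rotation then accumulates
   -a.im*b.im and a.im*b.re, completing the complex product.  */
float32x2_t
complex_mla (float32x2_t acc, float32x2_t a, float32x2_t b)
{
  acc = vcmla_f32 (acc, a, b);
  return vcmla_rot90_f32 (acc, a, b);
}

The lane variants follow the same scheme but broadcast a single (real, imag)
pair of the second multiplicand, selected by the constant index.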
diff mbox series

Patch

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 563ca51dcd0d63046d2bf577ca86d5f70a466bcf..1c7eac4b9eae55b76687b9239a2d71f31cc7b8d9 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -82,7 +82,10 @@  enum arm_type_qualifiers
   /* A void pointer.  */
   qualifier_void_pointer = 0x800,
   /* A const void pointer.  */
-  qualifier_const_void_pointer = 0x802
+  qualifier_const_void_pointer = 0x802,
+  /* Lane indices selected in pairs - must be within range of previous
+     argument = a vector.  */
+  qualifier_lane_pair_index = 0x1000
 };
 
 /*  The qualifier_internal allows generation of a unary builtin from
@@ -144,6 +147,13 @@  arm_mac_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
       qualifier_none, qualifier_lane_index };
 #define MAC_LANE_QUALIFIERS (arm_mac_lane_qualifiers)
 
+/* T (T, T, T, lane pair index).  */
+static enum arm_type_qualifiers
+arm_mac_lane_pair_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_none,
+      qualifier_none, qualifier_lane_pair_index };
+#define MAC_LANE_PAIR_QUALIFIERS (arm_mac_lane_pair_qualifiers)
+
 /* unsigned T (unsigned T, unsigned T, unsigend T, lane index).  */
 static enum arm_type_qualifiers
 arm_umac_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
@@ -2129,6 +2139,7 @@  typedef enum {
   ARG_BUILTIN_CONSTANT,
   ARG_BUILTIN_LANE_INDEX,
   ARG_BUILTIN_STRUCT_LOAD_STORE_LANE_INDEX,
+  ARG_BUILTIN_LANE_PAIR_INDEX,
   ARG_BUILTIN_NEON_MEMORY,
   ARG_BUILTIN_MEMORY,
   ARG_BUILTIN_STOP
@@ -2266,6 +2277,19 @@  arm_expand_builtin_args (rtx target, machine_mode map_mode, int fcode,
 		  machine_mode vmode = mode[argc - 1];
 		  neon_lane_bounds (op[argc], 0, GET_MODE_NUNITS (vmode), exp);
 		}
+	      /* If the lane index isn't a constant then error out.  */
+	      goto constant_arg;
+
+	    case ARG_BUILTIN_LANE_PAIR_INDEX:
+	      /* Previous argument must be a vector, which this indexes. The
+		 indexing will always select i and i+1 out of the vector, which
+		 puts a limit on i.  */
+	      gcc_assert (argc > 0);
+	      if (CONST_INT_P (op[argc]))
+		{
+		  machine_mode vmode = mode[argc - 1];
+		  neon_lane_bounds (op[argc], 0, GET_MODE_NUNITS (vmode) / 2, exp);
+		}
 	      /* If the lane index isn't a constant then the next
 		 case will error.  */
 	      /* Fall through.  */
@@ -2427,6 +2451,8 @@  arm_expand_builtin_1 (int fcode, tree exp, rtx target,
 
       if (d->qualifiers[qualifiers_k] & qualifier_lane_index)
 	args[k] = ARG_BUILTIN_LANE_INDEX;
+      else if (d->qualifiers[qualifiers_k] & qualifier_lane_pair_index)
+	args[k] = ARG_BUILTIN_LANE_PAIR_INDEX;
       else if (d->qualifiers[qualifiers_k] & qualifier_struct_load_store_lane_index)
 	args[k] = ARG_BUILTIN_STRUCT_LOAD_STORE_LANE_INDEX;
       else if (d->qualifiers[qualifiers_k] & qualifier_immediate)
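
As a gloss on the ARG_BUILTIN_LANE_PAIR_INDEX check above: each index selects
one (real, imag) pair, so the valid range is half the element count.  A
hypothetical stand-alone model of that rule (lane_pair_index_in_range is
invented for illustration; it is not a GCC function):

/* A vector of NUNITS elements holds NUNITS / 2 complex pairs, so valid
   pair indices are 0 .. NUNITS / 2 - 1.  E.g. V4HF allows indices 0 and
   1, while V2SF allows only 0.  */
static int
lane_pair_index_in_range (int nunits, int index)
{
  return index >= 0 && index < nunits / 2;
}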
diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index 4471f7914cf282c516a142174f9913e491558b44..89afc65572f3cdc98fff15afb78ef3af602c5b72 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -76,6 +76,7 @@  arm_cpu_builtins (struct cpp_reader* pfile)
 
   def_or_undef_macro (pfile, "__ARM_FEATURE_CRC32", TARGET_CRC32);
   def_or_undef_macro (pfile, "__ARM_FEATURE_DOTPROD", TARGET_DOTPROD);
+  def_or_undef_macro (pfile, "__ARM_FEATURE_COMPLEX", TARGET_COMPLEX);
   def_or_undef_macro (pfile, "__ARM_32BIT_STATE", TARGET_32BIT);
 
   cpp_undef (pfile, "__ARM_FEATURE_CMSE");
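
The new __ARM_FEATURE_COMPLEX macro gives user code a portable guard for
these intrinsics.  A minimal usage sketch (my reading of the usual ACLE
feature-macro convention, not taken from the patch):

#include <arm_neon.h>

float32x2_t
add_rot90 (float32x2_t a, float32x2_t b)
{
#if defined (__ARM_FEATURE_COMPLEX)
  return vcadd_rot90_f32 (a, b);   /* Single vcadd.f32 ..., #90.  */
#else
  /* Fallback: rotate b by 90 degrees, (re, im) -> (-im, re), then add.  */
  float32x2_t rot = vset_lane_f32 (-vget_lane_f32 (b, 1),
                                   vrev64_f32 (b), 0);
  return vadd_f32 (a, rot);
#endif
}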
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index cea98669111d318954e9f6102db74172e675304b..f6fec824e68020794a58b94157e064e70b60c456 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -109,6 +109,8 @@  extern int arm_coproc_mem_operand (rtx, bool);
 extern int neon_vector_mem_operand (rtx, int, bool);
 extern int neon_struct_mem_operand (rtx);
 
+extern rtx *neon_vcmla_lane_prepare_operands (machine_mode, rtx *);
+
 extern int tls_mentioned_p (rtx);
 extern int symbol_mentioned_p (rtx);
 extern int label_mentioned_p (rtx);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index cbcbeeb6e076bb8f632e5c31dd751937af4514f5..f1df6e585c4f8ceac0478d8cb9cd91bdc283f323 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -12680,6 +12680,44 @@  neon_struct_mem_operand (rtx op)
   return FALSE;
 }
 
+/* Prepares the operands for the VCMLA by lane instruction such that the right
+   register number is selected.  This instruction is special in that it always
+   requires a D register, however there is a choice to be made between Dn[0],
+   Dn[1], D(n+1)[0], and D(n+1)[1] depending on the mode of the registers and
+   the PATTERNMODE of the insn.
+
+   The VCMLA by lane function always selects two values.  For instance, given
+   D0 and a V2SF, the only valid index is 0, as the values in S0 and S1 will
+   be used by the instruction.  However, given a V4SF, indices 0 and 1 are
+   both valid, as they select D0[0] and D1[0] respectively.
+
+   This function centralizes that information based on OPERANDS: OPERANDS[3]
+   will be changed from a REG into a CONST_INT RTX and OPERANDS[4] will be
+   updated to contain the right index.  */
+
+rtx *
+neon_vcmla_lane_prepare_operands (machine_mode patternmode, rtx *operands)
+{
+  int lane = NEON_ENDIAN_LANE_N (patternmode, INTVAL (operands[4]));
+  machine_mode constmode = SImode;
+  machine_mode mode = GET_MODE (operands[3]);
+  int regno = REGNO (operands[3]);
+  regno = ((regno - FIRST_VFP_REGNUM) >> 1);
+  if (lane > 0 && lane >= GET_MODE_NUNITS (mode) / 4)
+    {
+      operands[3] = gen_int_mode (regno + 1, constmode);
+      operands[4]
+	= gen_int_mode (lane - GET_MODE_NUNITS (mode) / 4, constmode);
+    }
+  else
+    {
+      operands[3] = gen_int_mode (regno, constmode);
+      operands[4] = gen_int_mode (lane, constmode);
+    }
+  return operands;
+}
+
+
 /* Return true if X is a register that will be eliminated later on.  */
 int
 arm_eliminable_register (rtx x)
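
To make the index arithmetic above concrete, here is a hypothetical
stand-alone model using plain ints instead of RTXes (split_vcmla_lane is
invented for illustration).  Each D register holds NUNITS / 4 complex pairs
of the containing mode, so pair indices at or above that boundary move the
selection to the next D register:

struct dreg_lane { int dreg; int lane; };

/* Given the D register number REGNO of the indexed operand and a pair
   index LANE, pick the D register that is actually encoded and the index
   within it.  The LANE > 0 guard keeps index 0 in REGNO itself when
   NUNITS / 4 is 0, as for V2SF.  */
static struct dreg_lane
split_vcmla_lane (int regno, int lane, int nunits)
{
  struct dreg_lane r = { regno, lane };
  if (lane > 0 && lane >= nunits / 4)
    {
      r.dreg = regno + 1;
      r.lane = lane - nunits / 4;
    }
  return r;
}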
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 6213a4aa0dabec756441523eee870e11485bb1c7..bb3acd20ff3ba6782b1be4363047f62fbb1779e8 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -18307,6 +18307,445 @@  vfmlsl_laneq_high_u32 (float32x2_t __r, float16x4_t __a, float16x8_t __b,
 #pragma GCC pop_options
 #endif
 
+/* AdvSIMD Complex numbers intrinsics.  */
+#if __ARM_ARCH >= 8
+#pragma GCC push_options
+#pragma GCC target(("arch=armv8.3-a"))
+
+
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+#pragma GCC push_options
+#pragma GCC target(("+fp16"))
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcadd_rot90_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vcadd90v4hf (__a, __b);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcaddq_rot90_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_neon_vcadd90v8hf (__a, __b);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcadd_rot270_f16 (float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vcadd270v4hf (__a, __b);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcaddq_rot270_f16 (float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_neon_vcadd270v8hf (__a, __b);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_f16 (float16x4_t __r, float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vcmla0v4hf (__r, __a, __b);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_f16 (float16x8_t __r, float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_neon_vcmla0v8hf (__r, __a, __b);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_lane_f16 (float16x4_t __r, float16x4_t __a, float16x4_t __b,
+		const int __index)
+{
+  return __builtin_neon_vcmla_lane0v4hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_laneq_f16 (float16x4_t __r, float16x4_t __a, float16x8_t __b,
+		 const int __index)
+{
+  return __builtin_neon_vcmla_laneq0v4hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_lane_f16 (float16x8_t __r, float16x8_t __a, float16x4_t __b,
+		 const int __index)
+{
+  return __builtin_neon_vcmlaq_lane0v8hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_laneq_f16 (float16x8_t __r, float16x8_t __a, float16x8_t __b,
+		  const int __index)
+{
+  return __builtin_neon_vcmla_lane0v8hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot90_f16 (float16x4_t __r, float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vcmla90v4hf (__r, __a, __b);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot90_f16 (float16x8_t __r, float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_neon_vcmla90v8hf (__r, __a, __b);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot90_lane_f16 (float16x4_t __r, float16x4_t __a, float16x4_t __b,
+		      const int __index)
+{
+  return __builtin_neon_vcmla_lane90v4hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot90_laneq_f16 (float16x4_t __r, float16x4_t __a, float16x8_t __b,
+		       const int __index)
+{
+  return __builtin_neon_vcmla_laneq90v4hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot90_lane_f16 (float16x8_t __r, float16x8_t __a, float16x4_t __b,
+		       const int __index)
+{
+  return __builtin_neon_vcmlaq_lane90v8hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot90_laneq_f16 (float16x8_t __r, float16x8_t __a, float16x8_t __b,
+			const int __index)
+{
+  return __builtin_neon_vcmla_lane90v8hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot180_f16 (float16x4_t __r, float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vcmla180v4hf (__r, __a, __b);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot180_f16 (float16x8_t __r, float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_neon_vcmla180v8hf (__r, __a, __b);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot180_lane_f16 (float16x4_t __r, float16x4_t __a, float16x4_t __b,
+		       const int __index)
+{
+  return __builtin_neon_vcmla_lane180v4hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot180_laneq_f16 (float16x4_t __r, float16x4_t __a, float16x8_t __b,
+			const int __index)
+{
+  return __builtin_neon_vcmla_laneq180v4hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot180_lane_f16 (float16x8_t __r, float16x8_t __a, float16x4_t __b,
+			const int __index)
+{
+  return __builtin_neon_vcmlaq_lane180v8hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot180_laneq_f16 (float16x8_t __r, float16x8_t __a, float16x8_t __b,
+			 const int __index)
+{
+  return __builtin_neon_vcmla_lane180v8hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot270_f16 (float16x4_t __r, float16x4_t __a, float16x4_t __b)
+{
+  return __builtin_neon_vcmla270v4hf (__r, __a, __b);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot270_f16 (float16x8_t __r, float16x8_t __a, float16x8_t __b)
+{
+  return __builtin_neon_vcmla270v8hf (__r, __a, __b);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot270_lane_f16 (float16x4_t __r, float16x4_t __a, float16x4_t __b,
+		       const int __index)
+{
+  return __builtin_neon_vcmla_lane270v4hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot270_laneq_f16 (float16x4_t __r, float16x4_t __a, float16x8_t __b,
+			const int __index)
+{
+  return __builtin_neon_vcmla_laneq270v4hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot270_lane_f16 (float16x8_t __r, float16x8_t __a, float16x4_t __b,
+			const int __index)
+{
+  return __builtin_neon_vcmlaq_lane270v8hf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float16x8_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot270_laneq_f16 (float16x8_t __r, float16x8_t __a, float16x8_t __b,
+			 const int __index)
+{
+  return __builtin_neon_vcmla_lane270v8hf (__r, __a, __b, __index);
+}
+
+#pragma GCC pop_options
+#endif
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcadd_rot90_f32 (float32x2_t __a, float32x2_t __b)
+{
+  return __builtin_neon_vcadd90v2sf (__a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcaddq_rot90_f32 (float32x4_t __a, float32x4_t __b)
+{
+  return __builtin_neon_vcadd90v4sf (__a, __b);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcadd_rot270_f32 (float32x2_t __a, float32x2_t __b)
+{
+  return __builtin_neon_vcadd270v2sf (__a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcaddq_rot270_f32 (float32x4_t __a, float32x4_t __b)
+{
+  return __builtin_neon_vcadd270v4sf (__a, __b);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_f32 (float32x2_t __r, float32x2_t __a, float32x2_t __b)
+{
+  return __builtin_neon_vcmla0v2sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b)
+{
+  return __builtin_neon_vcmla0v4sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_lane_f32 (float32x2_t __r, float32x2_t __a, float32x2_t __b,
+		const int __index)
+{
+  return __builtin_neon_vcmla_lane0v2sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_laneq_f32 (float32x2_t __r, float32x2_t __a, float32x4_t __b,
+		 const int __index)
+{
+  return __builtin_neon_vcmla_laneq0v2sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_lane_f32 (float32x4_t __r, float32x4_t __a, float32x2_t __b,
+		 const int __index)
+{
+  return __builtin_neon_vcmlaq_lane0v4sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_laneq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b,
+		  const int __index)
+{
+  return __builtin_neon_vcmla_lane0v4sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot90_f32 (float32x2_t __r, float32x2_t __a, float32x2_t __b)
+{
+  return __builtin_neon_vcmla90v2sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot90_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b)
+{
+  return __builtin_neon_vcmla90v4sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot90_lane_f32 (float32x2_t __r, float32x2_t __a, float32x2_t __b,
+		      const int __index)
+{
+  return __builtin_neon_vcmla_lane90v2sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot90_laneq_f32 (float32x2_t __r, float32x2_t __a, float32x4_t __b,
+		       const int __index)
+{
+  return __builtin_neon_vcmla_laneq90v2sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot90_lane_f32 (float32x4_t __r, float32x4_t __a, float32x2_t __b,
+		       const int __index)
+{
+  return __builtin_neon_vcmlaq_lane90v4sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot90_laneq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b,
+			const int __index)
+{
+  return __builtin_neon_vcmla_lane90v4sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot180_f32 (float32x2_t __r, float32x2_t __a, float32x2_t __b)
+{
+  return __builtin_neon_vcmla180v2sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot180_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b)
+{
+  return __builtin_neon_vcmla180v4sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot180_lane_f32 (float32x2_t __r, float32x2_t __a, float32x2_t __b,
+		       const int __index)
+{
+  return __builtin_neon_vcmla_lane180v2sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot180_laneq_f32 (float32x2_t __r, float32x2_t __a, float32x4_t __b,
+			const int __index)
+{
+  return __builtin_neon_vcmla_laneq180v2sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot180_lane_f32 (float32x4_t __r, float32x4_t __a, float32x2_t __b,
+			const int __index)
+{
+  return __builtin_neon_vcmlaq_lane180v4sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot180_laneq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b,
+			 const int __index)
+{
+  return __builtin_neon_vcmla_lane180v4sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot270_f32 (float32x2_t __r, float32x2_t __a, float32x2_t __b)
+{
+  return __builtin_neon_vcmla270v2sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot270_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b)
+{
+  return __builtin_neon_vcmla270v4sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot270_lane_f32 (float32x2_t __r, float32x2_t __a, float32x2_t __b,
+		       const int __index)
+{
+  return __builtin_neon_vcmla_lane270v2sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmla_rot270_laneq_f32 (float32x2_t __r, float32x2_t __a, float32x4_t __b,
+			const int __index)
+{
+  return __builtin_neon_vcmla_laneq270v2sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot270_lane_f32 (float32x4_t __r, float32x4_t __a, float32x2_t __b,
+			const int __index)
+{
+  return __builtin_neon_vcmlaq_lane270v4sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vcmlaq_rot270_laneq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b,
+			 const int __index)
+{
+  return __builtin_neon_vcmla_lane270v4sf (__r, __a, __b, __index);
+}
+
+#pragma GCC pop_options
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def
index 6ec293324fb879d9528ad6cc998d8a893f2cbaab..dcccc84940a9214d6795b4384e84de8150f2273d 100644
--- a/gcc/config/arm/arm_neon_builtins.def
+++ b/gcc/config/arm/arm_neon_builtins.def
@@ -351,3 +351,25 @@  VAR2 (TERNOP, sdot, v8qi, v16qi)
 VAR2 (UTERNOP, udot, v8qi, v16qi)
 VAR2 (MAC_LANE, sdot_lane, v8qi, v16qi)
 VAR2 (UMAC_LANE, udot_lane, v8qi, v16qi)
+
+VAR4 (BINOP, vcadd90, v4hf, v2sf, v8hf, v4sf)
+VAR4 (BINOP, vcadd270, v4hf, v2sf, v8hf, v4sf)
+VAR4 (TERNOP, vcmla0, v2sf, v4sf, v4hf, v8hf)
+VAR4 (TERNOP, vcmla90, v2sf, v4sf, v4hf, v8hf)
+VAR4 (TERNOP, vcmla180, v2sf, v4sf, v4hf, v8hf)
+VAR4 (TERNOP, vcmla270, v2sf, v4sf, v4hf, v8hf)
+
+VAR4 (MAC_LANE_PAIR, vcmla_lane0, v2sf, v4hf, v8hf, v4sf)
+VAR4 (MAC_LANE_PAIR, vcmla_lane90, v2sf, v4hf, v8hf, v4sf)
+VAR4 (MAC_LANE_PAIR, vcmla_lane180, v2sf, v4hf, v8hf, v4sf)
+VAR4 (MAC_LANE_PAIR, vcmla_lane270, v2sf, v4hf, v8hf, v4sf)
+
+VAR2 (MAC_LANE_PAIR, vcmla_laneq0, v2sf, v4hf)
+VAR2 (MAC_LANE_PAIR, vcmla_laneq90, v2sf, v4hf)
+VAR2 (MAC_LANE_PAIR, vcmla_laneq180, v2sf, v4hf)
+VAR2 (MAC_LANE_PAIR, vcmla_laneq270, v2sf, v4hf)
+
+VAR2 (MAC_LANE_PAIR, vcmlaq_lane0, v4sf, v8hf)
+VAR2 (MAC_LANE_PAIR, vcmlaq_lane90, v4sf, v8hf)
+VAR2 (MAC_LANE_PAIR, vcmlaq_lane180, v4sf, v8hf)
+VAR2 (MAC_LANE_PAIR, vcmlaq_lane270, v4sf, v8hf)
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index f50075bf5ffb6be6db1975087da0b468ab05a8a2..795d7e0b9f4aca4a9f5eba61b7fce2ceb7f006fb 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -3463,6 +3463,51 @@ 
   [(set_attr "type" "neon_fcmla")]
 )
 
+(define_insn "neon_vcmla_lane<rot><mode>"
+  [(set (match_operand:VF 0 "s_register_operand" "=w")
+	(plus:VF (match_operand:VF 1 "s_register_operand" "0")
+		 (unspec:VF [(match_operand:VF 2 "s_register_operand" "w")
+			     (match_operand:VF 3 "s_register_operand" "<VF_constraint>")
+			     (match_operand:SI 4 "const_int_operand" "n")]
+			     VCMLA)))]
+  "TARGET_COMPLEX"
+  {
+    operands = neon_vcmla_lane_prepare_operands (<MODE>mode, operands);
+    return "vcmla.<V_s_elem>\t%<V_reg>0, %<V_reg>2, d%c3[%c4], #<rot>";
+  }
+  [(set_attr "type" "neon_fcmla")]
+)
+
+(define_insn "neon_vcmla_laneq<rot><mode>"
+  [(set (match_operand:VDF 0 "s_register_operand" "=w")
+	(plus:VDF (match_operand:VDF 1 "s_register_operand" "0")
+		  (unspec:VDF [(match_operand:VDF 2 "s_register_operand" "w")
+			      (match_operand:<V_DOUBLE> 3 "s_register_operand" "<VF_constraint>")
+			      (match_operand:SI 4 "const_int_operand" "n")]
+			      VCMLA)))]
+  "TARGET_COMPLEX"
+  {
+    operands = neon_vcmla_lane_prepare_operands (<MODE>mode, operands);
+    return "vcmla.<V_s_elem>\t%<V_reg>0, %<V_reg>2, d%c3[%c4], #<rot>";
+  }
+  [(set_attr "type" "neon_fcmla")]
+)
+
+(define_insn "neon_vcmlaq_lane<rot><mode>"
+  [(set (match_operand:VQ_HSF 0 "s_register_operand" "=w")
+	(plus:VQ_HSF (match_operand:VQ_HSF 1 "s_register_operand" "0")
+		 (unspec:VQ_HSF [(match_operand:VQ_HSF 2 "s_register_operand" "w")
+				 (match_operand:<V_HALF> 3 "s_register_operand" "<VF_constraint>")
+				 (match_operand:SI 4 "const_int_operand" "n")]
+				 VCMLA)))]
+  "TARGET_COMPLEX"
+  {
+    operands = neon_vcmla_lane_prepare_operands (<MODE>mode, operands);
+    return "vcmla.<V_s_elem>\t%<V_reg>0, %<V_reg>2, d%c3[%c4], #<rot>";
+  }
+  [(set_attr "type" "neon_fcmla")]
+)
+
 ;; The complex mla operations always need to expand to two instructions.
 ;; The first operation does half the computation and the second does the
 ;; remainder.  Because of this, expand early.
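
As an illustration of what the lane patterns above emit (exact register
numbers depend on allocation), a source-level example and the kind of
instruction the testsuite changes below scan for:

#include <arm_neon.h>

/* Expected to assemble to a single instruction along the lines of
     vcmla.f32  d0, d1, d2[0], #90
   via the neon_vcmla_lane<rot><mode> pattern.  */
float32x2_t
rot90_lane (float32x2_t r, float32x2_t a, float32x2_t b)
{
  return vcmla_rot90_lane_f32 (r, a, b, 0);
}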
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vector-complex.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vector-complex.c
index c5c6c905284214dfabdf289789e10e5d2ee2a1a9..514a4fe600a2ddb38db8a96e09feecc45a424c01 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vector-complex.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vector-complex.c
@@ -1,4 +1,3 @@ 
-/* { dg-skip-if "" { arm-*-* } } */
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_v8_3a_complex_neon_ok } */
 /* { dg-add-options arm_v8_3a_complex_neon }  */
@@ -257,3 +256,22 @@  test_vcmlaq_rot270_laneq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b)
 /* { dg-final { scan-assembler-times {fcmla\tv[0-9]+\.4s, v[0-9]+\.4s, v[0-9]+\.s\[1\], #270} 1 { target { aarch64*-*-* } } } } */
 /* { dg-final { scan-assembler-times {fcmla\tv[0-9]+\.4s, v[0-9]+\.4s, v[0-9]+\.s\[1\], #90} 1 { target { aarch64*-*-* } } } } */
 
+/* { dg-final { scan-assembler-times {vcadd.f32\td[0-9]+, d[0-9]+, d[0-9]+, #90} 2 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcadd.f32\tq[0-9]+, q[0-9]+, q[0-9]+, #90} 2 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\td[0-9]+, d[0-9]+, d[0-9]+, #0} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\td[0-9]+, d[0-9]+, d[0-9]+, #180} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\td[0-9]+, d[0-9]+, d[0-9]+, #270} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\td[0-9]+, d[0-9]+, d[0-9]+, #90} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\td[0-9]+, d[0-9]+, d[0-9]+\[0\], #0} 2 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\td[0-9]+, d[0-9]+, d[0-9]+\[0\], #180} 2 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\td[0-9]+, d[0-9]+, d[0-9]+\[0\], #270} 2 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\td[0-9]+, d[0-9]+, d[0-9]+\[0\], #90} 2 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\tq[0-9]+, q[0-9]+, d[0-9]+\[0\], #0} 2 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\tq[0-9]+, q[0-9]+, d[0-9]+\[0\], #180} 2 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\tq[0-9]+, q[0-9]+, d[0-9]+\[0\], #270} 2 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\tq[0-9]+, q[0-9]+, d[0-9]+\[0\], #90} 2 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\tq[0-9]+, q[0-9]+, q[0-9]+, #0} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\tq[0-9]+, q[0-9]+, q[0-9]+, #180} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\tq[0-9]+, q[0-9]+, q[0-9]+, #270} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f32\tq[0-9]+, q[0-9]+, q[0-9]+, #90} 1 { target { arm*-*-* } } } } */
+
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c
index ab62f03a213f303f2a4427ce7254f05f077c1ab7..ba9a66ed54af7a4ea108fea5fff12fb59f6c5706 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c
@@ -1,4 +1,3 @@ 
-/* { dg-skip-if "" { arm-*-* } } */
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_v8_3a_complex_neon_ok } */
 /* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok } */
@@ -308,3 +307,29 @@  test_vcmlaq_rot270_laneq_f16_2 (float16x8_t __r, float16x8_t __a, float16x8_t __
 /* { dg-final { scan-assembler-times {fcmla\tv[0-9]+\.8h, v[0-9]+\.8h, v[0-9]+\.h\[3\], #270} 1 { target { aarch64*-*-* } } } } */
 /* { dg-final { scan-assembler-times {fcmla\tv[0-9]+\.8h, v[0-9]+\.8h, v[0-9]+\.h\[3\], #90} 1 { target { aarch64*-*-* } } } } */
 
+/* { dg-final { scan-assembler-times {vcadd.f16\td[0-9]+, d[0-9]+, d[0-9]+, #90} 2 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcadd.f16\tq[0-9]+, q[0-9]+, q[0-9]+, #90} 2 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+, #0} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+\[0\], #0} 3 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+\[0\], #180} 3 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+\[0\], #270} 3 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+\[0\], #90} 3 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+\[1\], #0} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+\[1\], #180} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+\[1\], #270} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+, #180} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+\[1\], #90} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+, #270} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\td[0-9]+, d[0-9]+, d[0-9]+, #90} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, d[0-9]+\[0\], #0} 3 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, d[0-9]+\[0\], #180} 3 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, d[0-9]+\[0\], #270} 3 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, d[0-9]+\[0\], #90} 3 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, d[0-9]+\[1\], #0} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, d[0-9]+\[1\], #180} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, d[0-9]+\[1\], #270} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, d[0-9]+\[1\], #90} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, q[0-9]+, #0} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, q[0-9]+, #180} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, q[0-9]+, #270} 1 { target { arm*-*-* } } } } */
+/* { dg-final { scan-assembler-times {vcmla.f16\tq[0-9]+, q[0-9]+, q[0-9]+, #90} 1 { target { arm*-*-* } } } } */