diff mbox

[AArch64] Fix some saturating math NEON intrinsics types

Message ID 539EFE90.7030809@arm.com
State New
Headers show

Commit Message

Kyrylo Tkachov June 16, 2014, 2:26 p.m. UTC
Hi all,

I noticed that a few saturating math intrinsics in arm_neon.h for 
aarch64 have the wrong types, i.e. not what's mandated by the ACLE spec.

This patch fixes that by adjusting the types of the builtin functions 
that those intrinsics map to (and in the process cleaning up the VCON 
iterator) and adding tests for the affected intrinsics.

I realise it's quite big, but the changes are mostly uniform.

Bootstrapped and tested aarch64-none-linux-gnu.

Ok for trunk?

Thanks,
Kyrill

2014-06-16  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * config/aarch64/iterators.md (VCOND): Handle SI and HI modes.
     Update comments.
     (VCONQ): Make comment more helpful.
     (VCON): Delete.
     * config/aarch64/aarch64-simd.md
     (aarch64_sqdmulh_lane<mode>):
     Use VCOND for operands 2.  Update lane checking and flipping logic.
     (aarch64_sqrdmulh_lane<mode>): Likewise.
     (aarch64_sq<r>dmulh_lane<mode>_internal): Likewise.
     (aarch64_sqdmull2<mode>): Remove VCON, use VQ_HSI mode iterator.
     (aarch64_sqdml<SBINQOPS:as>l_lane<mode>_internal, VD_HSI): Change mode
     attribute of operand 3 to VCOND.
     (aarch64_sqdml<SBINQOPS:as>l_lane<mode>_internal, SD_HSI): Likewise.
     (aarch64_sqdml<SBINQOPS:as>l2_lane<mode>_internal): Likewise.
     (aarch64_sqdmull_lane<mode>_internal, VD_HSI): Likewise.
     (aarch64_sqdmull_lane<mode>_internal, SD_HSI): Likewise.
     (aarch64_sqdmull2_lane<mode>_internal): Likewise.
     (aarch64_sqdml<SBINQOPS:as>l_laneq<mode>_internal, VD_HSI: New
     define_insn.
     (aarch64_sqdml<SBINQOPS:as>l_laneq<mode>_internal, SD_HSI): Likewise.
     (aarch64_sqdml<SBINQOPS:as>l2_laneq<mode>_internal): Likewise.
     (aarch64_sqdmull_laneq<mode>_internal, VD_HSI): Likewise.
     (aarch64_sqdmull_laneq<mode>_internal, SD_HSI): Likewise.
     (aarch64_sqdmull2_laneq<mode>_internal): Likewise.
     (aarch64_sqdmlal_lane<mode>): Change mode attribute of penultimate
     operand to VCOND.  Update lane flipping and bounds checking logic.
     (aarch64_sqdmlal2_lane<mode>): Likewise.
     (aarch64_sqdmlsl_lane<mode>): Likewise.
     (aarch64_sqdmull_lane<mode>): Likewise.
     (aarch64_sqdmull2_lane<mode>): Likewise.
     (aarch64_sqdmlal_laneq<mode>):
     Replace VCON usage with VCONQ.
     Emit aarch64_sqdmlal_laneq<mode>_internal insn.
     (aarch64_sqdmlal2_laneq<mode>): Emit
     aarch64_sqdmlal2_laneq<mode>_internal insn.
     Replace VCON with VCONQ.
     (aarch64_sqdmlsl2_lane<mode>): Replace VCON with VCONQ.
     (aarch64_sqdmlsl2_laneq<mode>): Likewise.
     (aarch64_sqdmull_laneq<mode>): Emit
     aarch64_sqdmull_laneq<mode>_internal insn.
     Replace VCON with VCONQ.
     (aarch64_sqdmull2_laneq<mode>): Emit
     aarch64_sqdmull2_laneq<mode>_internal insn.
     (aarch64_sqdmlsl_laneq<mode>): Replace VCON usage with VCONQ.
     * config/aarch64/arm_neon.h (vqdmlal_high_lane_s16): Change type
     of 3rd argument to int16x4_t.
     (vqdmlalh_lane_s16): Likewise.
     (vqdmlslh_lane_s16): Likewise.
     (vqdmull_high_lane_s16): Likewise.
     (vqdmullh_lane_s16): Change type of 2nd argument to int16x4_t.
     (vqdmlal_lane_s16): Don't create temporary int16x8_t value.
     (vqdmlsl_lane_s16): Likewise.
     (vqdmull_lane_s16): Don't create temporary int16x8_t value.
     (vqdmlal_high_lane_s32): Change type 3rd argument to int32x2_t.
     (vqdmlals_lane_s32): Likewise.
     (vqdmlsls_lane_s32): Likewise.
     (vqdmull_high_lane_s32): Change type 2nd argument to int32x2_t.
     (vqdmulls_lane_s32): Likewise.
     (vqdmlal_lane_s32): Don't create temporary int32x4_t value.
     (vqdmlsl_lane_s32): Likewise.
     (vqdmull_lane_s32): Don't create temporary int32x4_t value.
     (vqdmulhh_lane_s16): Change type of second argument to int16x4_t.
     (vqrdmulhh_lane_s16): Likewise.
     (vqdmlsl_high_lane_s16): Likewise.
     (vqdmulhs_lane_s32): Change type of second argument to int32x2_t.
     (vqdmlsl_high_lane_s32): Likewise.
     (vqrdmulhs_lane_s32): Likewise.

2014-06-16  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * gcc.target/aarch64/simd/vqdmulhh_lane_s16.c: New test.
     * gcc.target/aarch64/simd/vqdmulhs_lane_s32.c: Likewise.
     * gcc.target/aarch64/simd/vqrdmulhh_lane_s16.c: Likewise.
     * gcc.target/aarch64/simd/vqrdmulhs_lane_s32.c: Likewise.
     * gcc.target/aarch64/simd/vqdmlal_high_lane_s16.c: New test.
     * gcc.target/aarch64/simd/vqdmlal_high_lane_s32.c: Likewise.
     * gcc.target/aarch64/simd/vqdmlal_high_laneq_s16.c: Likewise.
     * gcc.target/aarch64/simd/vqdmlal_high_laneq_s32.c: Likewise.
     * gcc.target/aarch64/simd/vqdmlal_lane_s16.c: Likewise.
     * gcc.target/aarch64/simd/vqdmlal_lane_s32.c: Likewise.
     * gcc.target/aarch64/simd/vqdmlal_laneq_s16.c: Likewise.
     * gcc.target/aarch64/simd/vqdmlal_laneq_s32.c: Likewise.
     * gcc.target/aarch64/simd/vqdmlalh_lane_s16.c: Likewise.
     * gcc.target/aarch64/simd/vqdmlals_lane_s32.c: Likewise.
     * gcc.target/aarch64/simd/vqdmlsl_high_lane_s16.c: Likewise.
     * gcc.target/aarch64/simd/vqdmlsl_high_lane_s32.c: Likewise.
     * gcc.target/aarch64/simd/vqdmlsl_high_laneq_s16.c: Likewise.
     * gcc.target/aarch64/simd/vqdmlsl_high_laneq_s32.c: Likewise.
     * gcc.target/aarch64/simd/vqdmlsl_lane_s16.c: Likewise.
     * gcc.target/aarch64/simd/vqdmlsl_lane_s32.c: Likewise.
     * gcc.target/aarch64/simd/vqdmlsl_laneq_s32.c: Likewise.
     * gcc.target/aarch64/simd/vqdmlslh_lane_s16.c: Likewise.
     * gcc.target/aarch64/simd/vqdmlsls_lane_s32.c: Likewise.
     * gcc.target/aarch64/simd/vqdmulh_laneq_s16.c: Likewise.
     * gcc.target/aarch64/simd/vqdmulh_laneq_s32.c: Likewise.
     * gcc.target/aarch64/simd/vqdmulhq_laneq_s16.c: Likewise.
     * gcc.target/aarch64/simd/vqdmulhq_laneq_s32.c: Likewise.
     * gcc.target/aarch64/simd/vqdmull_high_lane_s16.c: Likewise.
     * gcc.target/aarch64/simd/vqdmull_high_lane_s32.c: Likewise.
     * gcc.target/aarch64/simd/vqdmull_high_laneq_s16.c: Likewise.
     * gcc.target/aarch64/simd/vqdmull_high_laneq_s32.c: Likewise.
     * gcc.target/aarch64/simd/vqdmull_lane_s16.c: Likewise.
     * gcc.target/aarch64/simd/vqdmull_lane_s32.c: Likewise.
     * gcc.target/aarch64/simd/vqdmull_laneq_s16.c: Likewise.
     * gcc.target/aarch64/simd/vqdmull_laneq_s32.c: Likewise.
     * gcc.target/aarch64/simd/vqdmullh_lane_s16.c: Likewise.
     * gcc.target/aarch64/simd/vqdmulls_lane_s32.c: Likewise.
     * gcc.target/aarch64/simd/vqrdmulh_laneq_s16.c: Likewise.
     * gcc.target/aarch64/simd/vqrdmulh_laneq_s32.c: Likewise.
     * gcc.target/aarch64/simd/vqrdmulhq_laneq_s16.c: Likewise.
     * gcc.target/aarch64/simd/vqrdmulhq_laneq_s32.c: Likewise.
     * gcc.target/aarch64/vector_intrinsics.c: Simplify arm_neon.h include.
     (test_vqdmlal_high_lane_s16): Fix parameter type.
     (test_vqdmlal_high_lane_s32): Likewise.
     (test_vqdmull_high_lane_s16): Likewise.
     (test_vqdmull_high_lane_s32): Likewise.
     (test_vqdmlsl_high_lane_s32): Likewise.
     (test_vqdmlsl_high_lane_s16): Likewise.
     * gcc.target/aarch64/scalar_intrinsics.c (test_vqdmlalh_lane_s16):
     Fix argument type.
     (test_vqdmlals_lane_s32): Likewise.
     (test_vqdmlslh_lane_s16): Likewise.
     (test_vqdmlsls_lane_s32): Likewise.
     (test_vqdmulhh_lane_s16): Likewise.
     (test_vqdmulhs_lane_s32): Likewise.
     (test_vqdmullh_lane_s16): Likewise.
     (test_vqdmulls_lane_s32): Likewise.
     (test_vqrdmulhh_lane_s16): Likewise.
     (test_vqrdmulhs_lane_s32): Likewise.

Comments

Marcus Shawcroft June 20, 2014, 8:41 a.m. UTC | #1
On 16 June 2014 15:26, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
> Hi all,
>
> I noticed that a few saturating math intrinsics in arm_neon.h for aarch64
> have the wrong types, i.e. not what's mandated by the ACLE spec.
>
> This patch fixes that by adjusting the types of the builtin functions that
> those intrinsics map to (and in the process cleaning up the VCON iterator)
> and adding tests for the affected intrinsics.
>
> I realise it's quite big, but the changes are mostly uniform.
>
> Bootstrapped and tested aarch64-none-linux-gnu.
>
> Ok for trunk?

OK, can you prepare a 4.9 backport?
Cheers
/Marcus
Kyrylo Tkachov June 20, 2014, 2:14 p.m. UTC | #2
On 20/06/14 09:41, Marcus Shawcroft wrote:
> On 16 June 2014 15:26, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
>> Hi all,
>>
>> I noticed that a few saturating math intrinsics in arm_neon.h for aarch64
>> have the wrong types, i.e. not what's mandated by the ACLE spec.
>>
>> This patch fixes that by adjusting the types of the builtin functions that
>> those intrinsics map to (and in the process cleaning up the VCON iterator)
>> and adding tests for the affected intrinsics.
>>
>> I realise it's quite big, but the changes are mostly uniform.
>>
>> Bootstrapped and tested aarch64-none-linux-gnu.
>>
>> Ok for trunk?
> OK, can you prepare a 4.9 backport?

Sure, but it depends on 
https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00779.html.
Is it ok to backport that one as well?

It passes regtest on aarch64-none-elf and aarch64_be-none-elf.

Kyrill

> Cheers
> /Marcus
>
Marcus Shawcroft June 23, 2014, 8:26 a.m. UTC | #3
On 20 June 2014 15:14, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:

> Sure, but it depends on
> https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00779.html.
> Is it ok to backport that one as well?

This can be backported as well.
/Marcus
Kyrylo Tkachov June 23, 2014, 10:52 a.m. UTC | #4
On 23/06/14 09:26, Marcus Shawcroft wrote:
> On 20 June 2014 15:14, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
>
>> Sure, but it depends on
>> https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00779.html.
>> Is it ok to backport that one as well?
> This can be backported as well.
> /Marcus

Thanks, I've backported to 4.9 the above mentioned 
https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00779.html patch as r211889.

Kyrill
Kyrylo Tkachov June 23, 2014, 10:59 a.m. UTC | #5
On 23/06/14 11:52, Kyrill Tkachov wrote:
> On 23/06/14 09:26, Marcus Shawcroft wrote:
>> On 20 June 2014 15:14, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
>>
>>> Sure, but it depends on
>>> https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00779.html.
>>> Is it ok to backport that one as well?
>> This can be backported as well.
>> /Marcus
> Thanks, I've backported to 4.9 the above mentioned
> https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00779.html patch as r211889.

The backport for this patch itself is in testing...

> Kyrill
>
>
>
diff mbox

Patch

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 42bfd3e..7fd7094 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2752,12 +2752,12 @@ 
 (define_expand "aarch64_sqdmulh_lane<mode>"
   [(match_operand:SD_HSI 0 "register_operand" "")
    (match_operand:SD_HSI 1 "register_operand" "")
-   (match_operand:<VCONQ> 2 "register_operand" "")
+   (match_operand:<VCOND> 2 "register_operand" "")
    (match_operand:SI 3 "immediate_operand" "")]
   "TARGET_SIMD"
   {
-    aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (<VCONQ>mode));
-    operands[3] = GEN_INT (ENDIAN_LANE_N (<VCONQ>mode, INTVAL (operands[3])));
+    aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (<VCOND>mode));
+    operands[3] = GEN_INT (ENDIAN_LANE_N (<VCOND>mode, INTVAL (operands[3])));
     emit_insn (gen_aarch64_sqdmulh_lane<mode>_internal (operands[0],
                                                         operands[1],
                                                         operands[2],
@@ -2769,12 +2769,12 @@ 
 (define_expand "aarch64_sqrdmulh_lane<mode>"
   [(match_operand:SD_HSI 0 "register_operand" "")
    (match_operand:SD_HSI 1 "register_operand" "")
-   (match_operand:<VCONQ> 2 "register_operand" "")
+   (match_operand:<VCOND> 2 "register_operand" "")
    (match_operand:SI 3 "immediate_operand" "")]
   "TARGET_SIMD"
   {
-    aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (<VCONQ>mode));
-    operands[3] = GEN_INT (ENDIAN_LANE_N (<VCONQ>mode, INTVAL (operands[3])));
+    aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (<VCOND>mode));
+    operands[3] = GEN_INT (ENDIAN_LANE_N (<VCOND>mode, INTVAL (operands[3])));
     emit_insn (gen_aarch64_sqrdmulh_lane<mode>_internal (operands[0],
                                                          operands[1],
                                                          operands[2],
@@ -2788,12 +2788,12 @@ 
         (unspec:SD_HSI
 	  [(match_operand:SD_HSI 1 "register_operand" "w")
            (vec_select:<VEL>
-             (match_operand:<VCONQ> 2 "register_operand" "<vwx>")
+             (match_operand:<VCOND> 2 "register_operand" "<vwx>")
              (parallel [(match_operand:SI 3 "immediate_operand" "i")]))]
 	 VQDMULH))]
   "TARGET_SIMD"
   "*
-   operands[3] = GEN_INT (ENDIAN_LANE_N (<VCONQ>mode, INTVAL (operands[3])));
+   operands[3] = GEN_INT (ENDIAN_LANE_N (<VCOND>mode, INTVAL (operands[3])));
    return \"sq<r>dmulh\\t%<v>0, %<v>1, %2.<v>[%3]\";"
   [(set_attr "type" "neon_sat_mul_<Vetype>_scalar<q>")]
 )
@@ -2829,7 +2829,31 @@ 
 	      (sign_extend:<VWIDE>
 		(vec_duplicate:VD_HSI
 		  (vec_select:<VEL>
-		    (match_operand:<VCON> 3 "register_operand" "<vwx>")
+		    (match_operand:<VCOND> 3 "register_operand" "<vwx>")
+		    (parallel [(match_operand:SI 4 "immediate_operand" "i")])))
+              ))
+	    (const_int 1))))]
+  "TARGET_SIMD"
+  {
+    operands[4] = GEN_INT (ENDIAN_LANE_N (<VCOND>mode, INTVAL (operands[4])));
+    return
+      "sqdml<SBINQOPS:as>l\\t%<vw2>0<Vmwtype>, %<v>2<Vmtype>, %3.<Vetype>[%4]";
+  }
+  [(set_attr "type" "neon_sat_mla_<Vetype>_scalar_long")]
+)
+
+(define_insn "aarch64_sqdml<SBINQOPS:as>l_laneq<mode>_internal"
+  [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
+        (SBINQOPS:<VWIDE>
+	  (match_operand:<VWIDE> 1 "register_operand" "0")
+	  (ss_ashift:<VWIDE>
+	    (mult:<VWIDE>
+	      (sign_extend:<VWIDE>
+		(match_operand:VD_HSI 2 "register_operand" "w"))
+	      (sign_extend:<VWIDE>
+		(vec_duplicate:VD_HSI
+		  (vec_select:<VEL>
+		    (match_operand:<VCONQ> 3 "register_operand" "<vwx>")
 		    (parallel [(match_operand:SI 4 "immediate_operand" "i")])))
               ))
 	    (const_int 1))))]
@@ -2852,7 +2876,30 @@ 
 		(match_operand:SD_HSI 2 "register_operand" "w"))
 	      (sign_extend:<VWIDE>
 		(vec_select:<VEL>
-		  (match_operand:<VCON> 3 "register_operand" "<vwx>")
+		  (match_operand:<VCOND> 3 "register_operand" "<vwx>")
+		  (parallel [(match_operand:SI 4 "immediate_operand" "i")])))
+              )
+	    (const_int 1))))]
+  "TARGET_SIMD"
+  {
+    operands[4] = GEN_INT (ENDIAN_LANE_N (<VCOND>mode, INTVAL (operands[4])));
+    return
+      "sqdml<SBINQOPS:as>l\\t%<vw2>0<Vmwtype>, %<v>2<Vmtype>, %3.<Vetype>[%4]";
+  }
+  [(set_attr "type" "neon_sat_mla_<Vetype>_scalar_long")]
+)
+
+(define_insn "aarch64_sqdml<SBINQOPS:as>l_laneq<mode>_internal"
+  [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
+        (SBINQOPS:<VWIDE>
+	  (match_operand:<VWIDE> 1 "register_operand" "0")
+	  (ss_ashift:<VWIDE>
+	    (mult:<VWIDE>
+	      (sign_extend:<VWIDE>
+		(match_operand:SD_HSI 2 "register_operand" "w"))
+	      (sign_extend:<VWIDE>
+		(vec_select:<VEL>
+		  (match_operand:<VCONQ> 3 "register_operand" "<vwx>")
 		  (parallel [(match_operand:SI 4 "immediate_operand" "i")])))
               )
 	    (const_int 1))))]
@@ -2869,12 +2916,12 @@ 
   [(match_operand:<VWIDE> 0 "register_operand" "=w")
    (match_operand:<VWIDE> 1 "register_operand" "0")
    (match_operand:VSD_HSI 2 "register_operand" "w")
-   (match_operand:<VCON> 3 "register_operand" "<vwx>")
+   (match_operand:<VCOND> 3 "register_operand" "<vwx>")
    (match_operand:SI 4 "immediate_operand" "i")]
   "TARGET_SIMD"
 {
-  aarch64_simd_lane_bounds (operands[4], 0, GET_MODE_NUNITS (<VCON>mode) / 2);
-  operands[4] = GEN_INT (ENDIAN_LANE_N (<VCONQ>mode, INTVAL (operands[4])));
+  aarch64_simd_lane_bounds (operands[4], 0, GET_MODE_NUNITS (<VCOND>mode));
+  operands[4] = GEN_INT (ENDIAN_LANE_N (<VCOND>mode, INTVAL (operands[4])));
   emit_insn (gen_aarch64_sqdmlal_lane<mode>_internal (operands[0], operands[1],
 						      operands[2], operands[3],
 						      operands[4]));
@@ -2885,13 +2932,13 @@ 
   [(match_operand:<VWIDE> 0 "register_operand" "=w")
    (match_operand:<VWIDE> 1 "register_operand" "0")
    (match_operand:VSD_HSI 2 "register_operand" "w")
-   (match_operand:<VCON> 3 "register_operand" "<vwx>")
+   (match_operand:<VCONQ> 3 "register_operand" "<vwx>")
    (match_operand:SI 4 "immediate_operand" "i")]
   "TARGET_SIMD"
 {
-  aarch64_simd_lane_bounds (operands[4], 0, GET_MODE_NUNITS (<VCON>mode));
-  operands[4] = GEN_INT (ENDIAN_LANE_N (<VCON>mode, INTVAL (operands[4])));
-  emit_insn (gen_aarch64_sqdmlal_lane<mode>_internal (operands[0], operands[1],
+  aarch64_simd_lane_bounds (operands[4], 0, GET_MODE_NUNITS (<VCONQ>mode));
+  operands[4] = GEN_INT (ENDIAN_LANE_N (<VCONQ>mode, INTVAL (operands[4])));
+  emit_insn (gen_aarch64_sqdmlal_laneq<mode>_internal (operands[0], operands[1],
 						      operands[2], operands[3],
 						      operands[4]));
   DONE;
@@ -2901,12 +2948,12 @@ 
   [(match_operand:<VWIDE> 0 "register_operand" "=w")
    (match_operand:<VWIDE> 1 "register_operand" "0")
    (match_operand:VSD_HSI 2 "register_operand" "w")
-   (match_operand:<VCON> 3 "register_operand" "<vwx>")
+   (match_operand:<VCOND> 3 "register_operand" "<vwx>")
    (match_operand:SI 4 "immediate_operand" "i")]
   "TARGET_SIMD"
 {
-  aarch64_simd_lane_bounds (operands[4], 0, GET_MODE_NUNITS (<VCON>mode) / 2);
-  operands[4] = GEN_INT (ENDIAN_LANE_N (<VCON>mode, INTVAL (operands[4])));
+  aarch64_simd_lane_bounds (operands[4], 0, GET_MODE_NUNITS (<VCOND>mode));
+  operands[4] = GEN_INT (ENDIAN_LANE_N (<VCOND>mode, INTVAL (operands[4])));
   emit_insn (gen_aarch64_sqdmlsl_lane<mode>_internal (operands[0], operands[1],
 						      operands[2], operands[3],
 						      operands[4]));
@@ -2917,13 +2964,13 @@ 
   [(match_operand:<VWIDE> 0 "register_operand" "=w")
    (match_operand:<VWIDE> 1 "register_operand" "0")
    (match_operand:VSD_HSI 2 "register_operand" "w")
-   (match_operand:<VCON> 3 "register_operand" "<vwx>")
+   (match_operand:<VCONQ> 3 "register_operand" "<vwx>")
    (match_operand:SI 4 "immediate_operand" "i")]
   "TARGET_SIMD"
 {
-  aarch64_simd_lane_bounds (operands[4], 0, GET_MODE_NUNITS (<VCON>mode));
-  operands[4] = GEN_INT (ENDIAN_LANE_N (<VCON>mode, INTVAL (operands[4])));
-  emit_insn (gen_aarch64_sqdmlsl_lane<mode>_internal (operands[0], operands[1],
+  aarch64_simd_lane_bounds (operands[4], 0, GET_MODE_NUNITS (<VCONQ>mode));
+  operands[4] = GEN_INT (ENDIAN_LANE_N (<VCONQ>mode, INTVAL (operands[4])));
+  emit_insn (gen_aarch64_sqdmlsl_laneq<mode>_internal (operands[0], operands[1],
 						      operands[2], operands[3],
 						      operands[4]));
   DONE;
@@ -3011,7 +3058,33 @@ 
 		(sign_extend:<VWIDE>
                   (vec_duplicate:<VHALF>
 		    (vec_select:<VEL>
-		      (match_operand:<VCON> 3 "register_operand" "<vwx>")
+		      (match_operand:<VCOND> 3 "register_operand" "<vwx>")
+		      (parallel [(match_operand:SI 4 "immediate_operand" "i")])
+		    ))))
+	      (const_int 1))))]
+  "TARGET_SIMD"
+  {
+    operands[4] = GEN_INT (ENDIAN_LANE_N (<VCOND>mode, INTVAL (operands[4])));
+    return
+     "sqdml<SBINQOPS:as>l2\\t%<vw2>0<Vmwtype>, %<v>2<Vmtype>, %3.<Vetype>[%4]";
+  }
+  [(set_attr "type" "neon_sat_mla_<Vetype>_scalar_long")]
+)
+
+(define_insn "aarch64_sqdml<SBINQOPS:as>l2_laneq<mode>_internal"
+  [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
+        (SBINQOPS:<VWIDE>
+	  (match_operand:<VWIDE> 1 "register_operand" "0")
+	  (ss_ashift:<VWIDE>
+	      (mult:<VWIDE>
+		(sign_extend:<VWIDE>
+                  (vec_select:<VHALF>
+                    (match_operand:VQ_HSI 2 "register_operand" "w")
+                    (match_operand:VQ_HSI 5 "vect_par_cnst_hi_half" "")))
+		(sign_extend:<VWIDE>
+                  (vec_duplicate:<VHALF>
+		    (vec_select:<VEL>
+		      (match_operand:<VCONQ> 3 "register_operand" "<vwx>")
 		      (parallel [(match_operand:SI 4 "immediate_operand" "i")])
 		    ))))
 	      (const_int 1))))]
@@ -3028,13 +3101,13 @@ 
   [(match_operand:<VWIDE> 0 "register_operand" "=w")
    (match_operand:<VWIDE> 1 "register_operand" "w")
    (match_operand:VQ_HSI 2 "register_operand" "w")
-   (match_operand:<VCON> 3 "register_operand" "<vwx>")
+   (match_operand:<VCOND> 3 "register_operand" "<vwx>")
    (match_operand:SI 4 "immediate_operand" "i")]
   "TARGET_SIMD"
 {
   rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, true);
-  aarch64_simd_lane_bounds (operands[4], 0, GET_MODE_NUNITS (<MODE>mode) / 2);
-  operands[4] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[4])));
+  aarch64_simd_lane_bounds (operands[4], 0, GET_MODE_NUNITS (<VCOND>mode));
+  operands[4] = GEN_INT (ENDIAN_LANE_N (<VCOND>mode, INTVAL (operands[4])));
   emit_insn (gen_aarch64_sqdmlal2_lane<mode>_internal (operands[0], operands[1],
 						       operands[2], operands[3],
 						       operands[4], p));
@@ -3045,14 +3118,14 @@ 
   [(match_operand:<VWIDE> 0 "register_operand" "=w")
    (match_operand:<VWIDE> 1 "register_operand" "w")
    (match_operand:VQ_HSI 2 "register_operand" "w")
-   (match_operand:<VCON> 3 "register_operand" "<vwx>")
+   (match_operand:<VCONQ> 3 "register_operand" "<vwx>")
    (match_operand:SI 4 "immediate_operand" "i")]
   "TARGET_SIMD"
 {
   rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, true);
-  aarch64_simd_lane_bounds (operands[4], 0, GET_MODE_NUNITS (<MODE>mode));
-  operands[4] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[4])));
-  emit_insn (gen_aarch64_sqdmlal2_lane<mode>_internal (operands[0], operands[1],
+  aarch64_simd_lane_bounds (operands[4], 0, GET_MODE_NUNITS (<VCONQ>mode));
+  operands[4] = GEN_INT (ENDIAN_LANE_N (<VCONQ>mode, INTVAL (operands[4])));
+  emit_insn (gen_aarch64_sqdmlal2_laneq<mode>_internal (operands[0], operands[1],
 						       operands[2], operands[3],
 						       operands[4], p));
   DONE;
@@ -3062,13 +3135,13 @@ 
   [(match_operand:<VWIDE> 0 "register_operand" "=w")
    (match_operand:<VWIDE> 1 "register_operand" "w")
    (match_operand:VQ_HSI 2 "register_operand" "w")
-   (match_operand:<VCON> 3 "register_operand" "<vwx>")
+   (match_operand:<VCOND> 3 "register_operand" "<vwx>")
    (match_operand:SI 4 "immediate_operand" "i")]
   "TARGET_SIMD"
 {
   rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, true);
-  aarch64_simd_lane_bounds (operands[4], 0, GET_MODE_NUNITS (<MODE>mode) / 2);
-  operands[4] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[4])));
+  aarch64_simd_lane_bounds (operands[4], 0, GET_MODE_NUNITS (<VCOND>mode));
+  operands[4] = GEN_INT (ENDIAN_LANE_N (<VCOND>mode, INTVAL (operands[4])));
   emit_insn (gen_aarch64_sqdmlsl2_lane<mode>_internal (operands[0], operands[1],
 						       operands[2], operands[3],
 						       operands[4], p));
@@ -3079,14 +3152,14 @@ 
   [(match_operand:<VWIDE> 0 "register_operand" "=w")
    (match_operand:<VWIDE> 1 "register_operand" "w")
    (match_operand:VQ_HSI 2 "register_operand" "w")
-   (match_operand:<VCON> 3 "register_operand" "<vwx>")
+   (match_operand:<VCONQ> 3 "register_operand" "<vwx>")
    (match_operand:SI 4 "immediate_operand" "i")]
   "TARGET_SIMD"
 {
   rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, true);
-  aarch64_simd_lane_bounds (operands[4], 0, GET_MODE_NUNITS (<MODE>mode));
-  operands[4] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[4])));
-  emit_insn (gen_aarch64_sqdmlsl2_lane<mode>_internal (operands[0], operands[1],
+  aarch64_simd_lane_bounds (operands[4], 0, GET_MODE_NUNITS (<VCONQ>mode));
+  operands[4] = GEN_INT (ENDIAN_LANE_N (<VCONQ>mode, INTVAL (operands[4])));
+  emit_insn (gen_aarch64_sqdmlsl2_laneq<mode>_internal (operands[0], operands[1],
 						       operands[2], operands[3],
 						       operands[4], p));
   DONE;
@@ -3166,7 +3239,28 @@ 
 	       (sign_extend:<VWIDE>
                  (vec_duplicate:VD_HSI
                    (vec_select:<VEL>
-		     (match_operand:<VCON> 2 "register_operand" "<vwx>")
+		     (match_operand:<VCOND> 2 "register_operand" "<vwx>")
+		     (parallel [(match_operand:SI 3 "immediate_operand" "i")])))
+	       ))
+	     (const_int 1)))]
+  "TARGET_SIMD"
+  {
+    operands[3] = GEN_INT (ENDIAN_LANE_N (<VCOND>mode, INTVAL (operands[3])));
+    return "sqdmull\\t%<vw2>0<Vmwtype>, %<v>1<Vmtype>, %2.<Vetype>[%3]";
+  }
+  [(set_attr "type" "neon_sat_mul_<Vetype>_scalar_long")]
+)
+
+(define_insn "aarch64_sqdmull_laneq<mode>_internal"
+  [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
+        (ss_ashift:<VWIDE>
+	     (mult:<VWIDE>
+	       (sign_extend:<VWIDE>
+		 (match_operand:VD_HSI 1 "register_operand" "w"))
+	       (sign_extend:<VWIDE>
+                 (vec_duplicate:VD_HSI
+                   (vec_select:<VEL>
+		     (match_operand:<VCONQ> 2 "register_operand" "<vwx>")
 		     (parallel [(match_operand:SI 3 "immediate_operand" "i")])))
 	       ))
 	     (const_int 1)))]
@@ -3186,7 +3280,27 @@ 
 		 (match_operand:SD_HSI 1 "register_operand" "w"))
 	       (sign_extend:<VWIDE>
                  (vec_select:<VEL>
-		   (match_operand:<VCON> 2 "register_operand" "<vwx>")
+		   (match_operand:<VCOND> 2 "register_operand" "<vwx>")
+		   (parallel [(match_operand:SI 3 "immediate_operand" "i")]))
+	       ))
+	     (const_int 1)))]
+  "TARGET_SIMD"
+  {
+    operands[3] = GEN_INT (ENDIAN_LANE_N (<VCOND>mode, INTVAL (operands[3])));
+    return "sqdmull\\t%<vw2>0<Vmwtype>, %<v>1<Vmtype>, %2.<Vetype>[%3]";
+  }
+  [(set_attr "type" "neon_sat_mul_<Vetype>_scalar_long")]
+)
+
+(define_insn "aarch64_sqdmull_laneq<mode>_internal"
+  [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
+        (ss_ashift:<VWIDE>
+	     (mult:<VWIDE>
+	       (sign_extend:<VWIDE>
+		 (match_operand:SD_HSI 1 "register_operand" "w"))
+	       (sign_extend:<VWIDE>
+                 (vec_select:<VEL>
+		   (match_operand:<VCONQ> 2 "register_operand" "<vwx>")
 		   (parallel [(match_operand:SI 3 "immediate_operand" "i")]))
 	       ))
 	     (const_int 1)))]
@@ -3201,12 +3315,12 @@ 
 (define_expand "aarch64_sqdmull_lane<mode>"
   [(match_operand:<VWIDE> 0 "register_operand" "=w")
    (match_operand:VSD_HSI 1 "register_operand" "w")
-   (match_operand:<VCON> 2 "register_operand" "<vwx>")
+   (match_operand:<VCOND> 2 "register_operand" "<vwx>")
    (match_operand:SI 3 "immediate_operand" "i")]
   "TARGET_SIMD"
 {
-  aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (<VCON>mode) / 2);
-  operands[3] = GEN_INT (ENDIAN_LANE_N (<VCON>mode, INTVAL (operands[3])));
+  aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (<VCOND>mode));
+  operands[3] = GEN_INT (ENDIAN_LANE_N (<VCOND>mode, INTVAL (operands[3])));
   emit_insn (gen_aarch64_sqdmull_lane<mode>_internal (operands[0], operands[1],
 						      operands[2], operands[3]));
   DONE;
@@ -3215,13 +3329,13 @@ 
 (define_expand "aarch64_sqdmull_laneq<mode>"
   [(match_operand:<VWIDE> 0 "register_operand" "=w")
    (match_operand:VD_HSI 1 "register_operand" "w")
-   (match_operand:<VCON> 2 "register_operand" "<vwx>")
+   (match_operand:<VCONQ> 2 "register_operand" "<vwx>")
    (match_operand:SI 3 "immediate_operand" "i")]
   "TARGET_SIMD"
 {
-  aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (<VCON>mode));
+  aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (<VCONQ>mode));
   operands[3] = GEN_INT (ENDIAN_LANE_N (<VCONQ>mode, INTVAL (operands[3])));
-  emit_insn (gen_aarch64_sqdmull_lane<mode>_internal
+  emit_insn (gen_aarch64_sqdmull_laneq<mode>_internal
 	       (operands[0], operands[1], operands[2], operands[3]));
   DONE;
 })
@@ -3270,7 +3384,7 @@ 
 (define_expand "aarch64_sqdmull2<mode>"
   [(match_operand:<VWIDE> 0 "register_operand" "=w")
    (match_operand:VQ_HSI 1 "register_operand" "w")
-   (match_operand:<VCON> 2 "register_operand" "w")]
+   (match_operand:VQ_HSI 2 "register_operand" "w")]
   "TARGET_SIMD"
 {
   rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, true);
@@ -3292,7 +3406,30 @@ 
 	       (sign_extend:<VWIDE>
                  (vec_duplicate:<VHALF>
                    (vec_select:<VEL>
-		     (match_operand:<VCON> 2 "register_operand" "<vwx>")
+		     (match_operand:<VCOND> 2 "register_operand" "<vwx>")
+		     (parallel [(match_operand:SI 3 "immediate_operand" "i")])))
+	       ))
+	     (const_int 1)))]
+  "TARGET_SIMD"
+  {
+    operands[3] = GEN_INT (ENDIAN_LANE_N (<VCOND>mode, INTVAL (operands[3])));
+    return "sqdmull2\\t%<vw2>0<Vmwtype>, %<v>1<Vmtype>, %2.<Vetype>[%3]";
+  }
+  [(set_attr "type" "neon_sat_mul_<Vetype>_scalar_long")]
+)
+
+(define_insn "aarch64_sqdmull2_laneq<mode>_internal"
+  [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
+        (ss_ashift:<VWIDE>
+	     (mult:<VWIDE>
+	       (sign_extend:<VWIDE>
+		 (vec_select:<VHALF>
+                   (match_operand:VQ_HSI 1 "register_operand" "w")
+                   (match_operand:VQ_HSI 4 "vect_par_cnst_hi_half" "")))
+	       (sign_extend:<VWIDE>
+                 (vec_duplicate:<VHALF>
+                   (vec_select:<VEL>
+		     (match_operand:<VCONQ> 2 "register_operand" "<vwx>")
 		     (parallel [(match_operand:SI 3 "immediate_operand" "i")])))
 	       ))
 	     (const_int 1)))]
@@ -3307,13 +3444,13 @@ 
 (define_expand "aarch64_sqdmull2_lane<mode>"
   [(match_operand:<VWIDE> 0 "register_operand" "=w")
    (match_operand:VQ_HSI 1 "register_operand" "w")
-   (match_operand:<VCON> 2 "register_operand" "<vwx>")
+   (match_operand:<VCOND> 2 "register_operand" "<vwx>")
    (match_operand:SI 3 "immediate_operand" "i")]
   "TARGET_SIMD"
 {
   rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, true);
-  aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (<MODE>mode) / 2);
-  operands[3] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[3])));
+  aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (<VCOND>mode));
+  operands[3] = GEN_INT (ENDIAN_LANE_N (<VCOND>mode, INTVAL (operands[3])));
   emit_insn (gen_aarch64_sqdmull2_lane<mode>_internal (operands[0], operands[1],
 						       operands[2], operands[3],
 						       p));
@@ -3323,14 +3460,14 @@ 
 (define_expand "aarch64_sqdmull2_laneq<mode>"
   [(match_operand:<VWIDE> 0 "register_operand" "=w")
    (match_operand:VQ_HSI 1 "register_operand" "w")
-   (match_operand:<VCON> 2 "register_operand" "<vwx>")
+   (match_operand:<VCONQ> 2 "register_operand" "<vwx>")
    (match_operand:SI 3 "immediate_operand" "i")]
   "TARGET_SIMD"
 {
   rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, true);
-  aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (<MODE>mode));
-  operands[3] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[3])));
-  emit_insn (gen_aarch64_sqdmull2_lane<mode>_internal (operands[0], operands[1],
+  aarch64_simd_lane_bounds (operands[3], 0, GET_MODE_NUNITS (<VCONQ>mode));
+  operands[3] = GEN_INT (ENDIAN_LANE_N (<VCONQ>mode, INTVAL (operands[3])));
+  emit_insn (gen_aarch64_sqdmull2_laneq<mode>_internal (operands[0], operands[1],
 						       operands[2], operands[3],
 						       p));
   DONE;
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index b1b78f9..3ed8a98 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -19219,7 +19219,7 @@  vqdmlal_high_s16 (int32x4_t __a, int16x8_t __b, int16x8_t __c)
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
-vqdmlal_high_lane_s16 (int32x4_t __a, int16x8_t __b, int16x8_t __c,
+vqdmlal_high_lane_s16 (int32x4_t __a, int16x8_t __b, int16x4_t __c,
 		       int const __d)
 {
   return __builtin_aarch64_sqdmlal2_lanev8hi (__a, __b, __c, __d);
@@ -19241,8 +19241,7 @@  vqdmlal_high_n_s16 (int32x4_t __a, int16x8_t __b, int16_t __c)
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vqdmlal_lane_s16 (int32x4_t __a, int16x4_t __b, int16x4_t __c, int const __d)
 {
-  int16x8_t __tmp = vcombine_s16 (__c, vcreate_s16 (__AARCH64_INT64_C (0)));
-  return __builtin_aarch64_sqdmlal_lanev4hi (__a, __b, __tmp, __d);
+  return __builtin_aarch64_sqdmlal_lanev4hi (__a, __b, __c, __d);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
@@ -19270,7 +19269,7 @@  vqdmlal_high_s32 (int64x2_t __a, int32x4_t __b, int32x4_t __c)
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
-vqdmlal_high_lane_s32 (int64x2_t __a, int32x4_t __b, int32x4_t __c,
+vqdmlal_high_lane_s32 (int64x2_t __a, int32x4_t __b, int32x2_t __c,
 		       int const __d)
 {
   return __builtin_aarch64_sqdmlal2_lanev4si (__a, __b, __c, __d);
@@ -19292,8 +19291,7 @@  vqdmlal_high_n_s32 (int64x2_t __a, int32x4_t __b, int32_t __c)
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vqdmlal_lane_s32 (int64x2_t __a, int32x2_t __b, int32x2_t __c, int const __d)
 {
-  int32x4_t __tmp = vcombine_s32 (__c, vcreate_s32 (__AARCH64_INT64_C (0)));
-  return __builtin_aarch64_sqdmlal_lanev2si (__a, __b, __tmp, __d);
+  return __builtin_aarch64_sqdmlal_lanev2si (__a, __b, __c, __d);
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
@@ -19315,7 +19313,7 @@  vqdmlalh_s16 (int32x1_t __a, int16x1_t __b, int16x1_t __c)
 }
 
 __extension__ static __inline int32x1_t __attribute__ ((__always_inline__))
-vqdmlalh_lane_s16 (int32x1_t __a, int16x1_t __b, int16x8_t __c, const int __d)
+vqdmlalh_lane_s16 (int32x1_t __a, int16x1_t __b, int16x4_t __c, const int __d)
 {
   return __builtin_aarch64_sqdmlal_lanehi (__a, __b, __c, __d);
 }
@@ -19327,7 +19325,7 @@  vqdmlals_s32 (int64x1_t __a, int32x1_t __b, int32x1_t __c)
 }
 
 __extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
-vqdmlals_lane_s32 (int64x1_t __a, int32x1_t __b, int32x4_t __c, const int __d)
+vqdmlals_lane_s32 (int64x1_t __a, int32x1_t __b, int32x2_t __c, const int __d)
 {
   return __builtin_aarch64_sqdmlal_lanesi (__a, __b, __c, __d);
 }
@@ -19347,7 +19345,7 @@  vqdmlsl_high_s16 (int32x4_t __a, int16x8_t __b, int16x8_t __c)
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
-vqdmlsl_high_lane_s16 (int32x4_t __a, int16x8_t __b, int16x8_t __c,
+vqdmlsl_high_lane_s16 (int32x4_t __a, int16x8_t __b, int16x4_t __c,
 		       int const __d)
 {
   return __builtin_aarch64_sqdmlsl2_lanev8hi (__a, __b, __c, __d);
@@ -19369,8 +19367,7 @@  vqdmlsl_high_n_s16 (int32x4_t __a, int16x8_t __b, int16_t __c)
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vqdmlsl_lane_s16 (int32x4_t __a, int16x4_t __b, int16x4_t __c, int const __d)
 {
-  int16x8_t __tmp = vcombine_s16 (__c, vcreate_s16 (__AARCH64_INT64_C (0)));
-  return __builtin_aarch64_sqdmlsl_lanev4hi (__a, __b, __tmp, __d);
+  return __builtin_aarch64_sqdmlsl_lanev4hi (__a, __b, __c, __d);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
@@ -19398,7 +19395,7 @@  vqdmlsl_high_s32 (int64x2_t __a, int32x4_t __b, int32x4_t __c)
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
-vqdmlsl_high_lane_s32 (int64x2_t __a, int32x4_t __b, int32x4_t __c,
+vqdmlsl_high_lane_s32 (int64x2_t __a, int32x4_t __b, int32x2_t __c,
 		       int const __d)
 {
   return __builtin_aarch64_sqdmlsl2_lanev4si (__a, __b, __c, __d);
@@ -19420,8 +19417,7 @@  vqdmlsl_high_n_s32 (int64x2_t __a, int32x4_t __b, int32_t __c)
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vqdmlsl_lane_s32 (int64x2_t __a, int32x2_t __b, int32x2_t __c, int const __d)
 {
-  int32x4_t __tmp = vcombine_s32 (__c, vcreate_s32 (__AARCH64_INT64_C (0)));
-  return __builtin_aarch64_sqdmlsl_lanev2si (__a, __b, __tmp, __d);
+  return __builtin_aarch64_sqdmlsl_lanev2si (__a, __b, __c, __d);
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
@@ -19443,7 +19439,7 @@  vqdmlslh_s16 (int32x1_t __a, int16x1_t __b, int16x1_t __c)
 }
 
 __extension__ static __inline int32x1_t __attribute__ ((__always_inline__))
-vqdmlslh_lane_s16 (int32x1_t __a, int16x1_t __b, int16x8_t __c, const int __d)
+vqdmlslh_lane_s16 (int32x1_t __a, int16x1_t __b, int16x4_t __c, const int __d)
 {
   return __builtin_aarch64_sqdmlsl_lanehi (__a, __b, __c, __d);
 }
@@ -19455,7 +19451,7 @@  vqdmlsls_s32 (int64x1_t __a, int32x1_t __b, int32x1_t __c)
 }
 
 __extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
-vqdmlsls_lane_s32 (int64x1_t __a, int32x1_t __b, int32x4_t __c, const int __d)
+vqdmlsls_lane_s32 (int64x1_t __a, int32x1_t __b, int32x2_t __c, const int __d)
 {
   return __builtin_aarch64_sqdmlsl_lanesi (__a, __b, __c, __d);
 }
@@ -19493,7 +19489,7 @@  vqdmulhh_s16 (int16x1_t __a, int16x1_t __b)
 }
 
 __extension__ static __inline int16x1_t __attribute__ ((__always_inline__))
-vqdmulhh_lane_s16 (int16x1_t __a, int16x8_t __b, const int __c)
+vqdmulhh_lane_s16 (int16x1_t __a, int16x4_t __b, const int __c)
 {
   return __builtin_aarch64_sqdmulh_lanehi (__a, __b, __c);
 }
@@ -19505,7 +19501,7 @@  vqdmulhs_s32 (int32x1_t __a, int32x1_t __b)
 }
 
 __extension__ static __inline int32x1_t __attribute__ ((__always_inline__))
-vqdmulhs_lane_s32 (int32x1_t __a, int32x4_t __b, const int __c)
+vqdmulhs_lane_s32 (int32x1_t __a, int32x2_t __b, const int __c)
 {
   return __builtin_aarch64_sqdmulh_lanesi (__a, __b, __c);
 }
@@ -19525,7 +19521,7 @@  vqdmull_high_s16 (int16x8_t __a, int16x8_t __b)
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
-vqdmull_high_lane_s16 (int16x8_t __a, int16x8_t __b, int const __c)
+vqdmull_high_lane_s16 (int16x8_t __a, int16x4_t __b, int const __c)
 {
   return __builtin_aarch64_sqdmull2_lanev8hi (__a, __b,__c);
 }
@@ -19545,8 +19541,7 @@  vqdmull_high_n_s16 (int16x8_t __a, int16_t __b)
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
 vqdmull_lane_s16 (int16x4_t __a, int16x4_t __b, int const __c)
 {
-  int16x8_t __tmp = vcombine_s16 (__b, vcreate_s16 (__AARCH64_INT64_C (0)));
-  return __builtin_aarch64_sqdmull_lanev4hi (__a, __tmp, __c);
+  return __builtin_aarch64_sqdmull_lanev4hi (__a, __b, __c);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
@@ -19574,7 +19569,7 @@  vqdmull_high_s32 (int32x4_t __a, int32x4_t __b)
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
-vqdmull_high_lane_s32 (int32x4_t __a, int32x4_t __b, int const __c)
+vqdmull_high_lane_s32 (int32x4_t __a, int32x2_t __b, int const __c)
 {
   return __builtin_aarch64_sqdmull2_lanev4si (__a, __b, __c);
 }
@@ -19594,8 +19589,7 @@  vqdmull_high_n_s32 (int32x4_t __a, int32_t __b)
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vqdmull_lane_s32 (int32x2_t __a, int32x2_t __b, int const __c)
 {
-  int32x4_t __tmp = vcombine_s32 (__b, vcreate_s32 (__AARCH64_INT64_C (0)));
-  return __builtin_aarch64_sqdmull_lanev2si (__a, __tmp, __c);
+  return __builtin_aarch64_sqdmull_lanev2si (__a, __b, __c);
 }
 
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
@@ -19617,7 +19611,7 @@  vqdmullh_s16 (int16x1_t __a, int16x1_t __b)
 }
 
 __extension__ static __inline int32x1_t __attribute__ ((__always_inline__))
-vqdmullh_lane_s16 (int16x1_t __a, int16x8_t __b, const int __c)
+vqdmullh_lane_s16 (int16x1_t __a, int16x4_t __b, const int __c)
 {
   return __builtin_aarch64_sqdmull_lanehi (__a, __b, __c);
 }
@@ -19629,7 +19623,7 @@  vqdmulls_s32 (int32x1_t __a, int32x1_t __b)
 }
 
 __extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
-vqdmulls_lane_s32 (int32x1_t __a, int32x4_t __b, const int __c)
+vqdmulls_lane_s32 (int32x1_t __a, int32x2_t __b, const int __c)
 {
   return __builtin_aarch64_sqdmull_lanesi (__a, __b, __c);
 }
@@ -19811,7 +19805,7 @@  vqrdmulhh_s16 (int16x1_t __a, int16x1_t __b)
 }
 
 __extension__ static __inline int16x1_t __attribute__ ((__always_inline__))
-vqrdmulhh_lane_s16 (int16x1_t __a, int16x8_t __b, const int __c)
+vqrdmulhh_lane_s16 (int16x1_t __a, int16x4_t __b, const int __c)
 {
   return __builtin_aarch64_sqrdmulh_lanehi (__a, __b, __c);
 }
@@ -19823,7 +19817,7 @@  vqrdmulhs_s32 (int32x1_t __a, int32x1_t __b)
 }
 
 __extension__ static __inline int32x1_t __attribute__ ((__always_inline__))
-vqrdmulhs_lane_s32 (int32x1_t __a, int32x4_t __b, const int __c)
+vqrdmulhs_lane_s32 (int32x1_t __a, int32x2_t __b, const int __c)
 {
   return __builtin_aarch64_sqrdmulh_lanesi (__a, __b, __c);
 }
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index bf7b683..5c304bf 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -410,14 +410,15 @@ 
 			(SI   "SI") (HI   "HI")
 			(QI   "QI")])
 
-;; Define container mode for lane selection.
-(define_mode_attr VCOND [(V4HI "V4HI") (V8HI "V4HI")
+;; 64-bit container modes the inner or scalar source mode.
+(define_mode_attr VCOND [(HI "V4HI") (SI "V2SI")
+			 (V4HI "V4HI") (V8HI "V4HI")
 			 (V2SI "V2SI") (V4SI "V2SI")
 			 (DI   "DI") (V2DI "DI")
 			 (V2SF "V2SF") (V4SF "V2SF")
 			 (V2DF "DF")])
 
-;; Define container mode for lane selection.
+;; 128-bit container modes the inner or scalar source mode.
 (define_mode_attr VCONQ [(V8QI "V16QI") (V16QI "V16QI")
 			 (V4HI "V8HI") (V8HI "V8HI")
 			 (V2SI "V4SI") (V4SI "V4SI")
@@ -426,15 +427,6 @@ 
 			 (V2DF "V2DF") (SI   "V4SI")
 			 (HI   "V8HI") (QI   "V16QI")])
 
-;; Define container mode for lane selection.
-(define_mode_attr VCON [(V8QI "V16QI") (V16QI "V16QI")
-			(V4HI "V8HI") (V8HI "V8HI")
-			(V2SI "V4SI") (V4SI "V4SI")
-			(DI   "V2DI") (V2DI "V2DI")
-			(V2SF "V4SF") (V4SF "V4SF")
-			(V2DF "V2DF") (SI   "V4SI")
-			(HI   "V8HI") (QI   "V16QI")])
-
 ;; Half modes of all vector modes.
 (define_mode_attr VHALF [(V8QI "V4QI")  (V16QI "V8QI")
 			 (V4HI "V2HI")  (V8HI  "V4HI")
diff --git a/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c b/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c
index aa041cc..782f6d1 100644
--- a/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c
+++ b/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c
@@ -387,7 +387,7 @@  test_vqdmlalh_s16 (int32x1_t a, int16x1_t b, int16x1_t c)
 /* { dg-final { scan-assembler-times "\\tsqdmlal\\ts\[0-9\]+, h\[0-9\]+, v" 1 } } */
 
 int32x1_t
-test_vqdmlalh_lane_s16 (int32x1_t a, int16x1_t b, int16x8_t c)
+test_vqdmlalh_lane_s16 (int32x1_t a, int16x1_t b, int16x4_t c)
 {
   return vqdmlalh_lane_s16 (a, b, c, 3);
 }
@@ -403,7 +403,7 @@  test_vqdmlals_s32 (int64x1_t a, int32x1_t b, int32x1_t c)
 /* { dg-final { scan-assembler-times "\\tsqdmlal\\td\[0-9\]+, s\[0-9\]+, v" 1 } } */
 
 int64x1_t
-test_vqdmlals_lane_s32 (int64x1_t a, int32x1_t b, int32x4_t c)
+test_vqdmlals_lane_s32 (int64x1_t a, int32x1_t b, int32x2_t c)
 {
   return vqdmlals_lane_s32 (a, b, c, 1);
 }
@@ -419,7 +419,7 @@  test_vqdmlslh_s16 (int32x1_t a, int16x1_t b, int16x1_t c)
 /* { dg-final { scan-assembler-times "\\tsqdmlsl\\ts\[0-9\]+, h\[0-9\]+, v" 1 } } */
 
 int32x1_t
-test_vqdmlslh_lane_s16 (int32x1_t a, int16x1_t b, int16x8_t c)
+test_vqdmlslh_lane_s16 (int32x1_t a, int16x1_t b, int16x4_t c)
 {
   return vqdmlslh_lane_s16 (a, b, c, 3);
 }
@@ -435,7 +435,7 @@  test_vqdmlsls_s32 (int64x1_t a, int32x1_t b, int32x1_t c)
 /* { dg-final { scan-assembler-times "\\tsqdmlsl\\td\[0-9\]+, s\[0-9\]+, v" 1 } } */
 
 int64x1_t
-test_vqdmlsls_lane_s32 (int64x1_t a, int32x1_t b, int32x4_t c)
+test_vqdmlsls_lane_s32 (int64x1_t a, int32x1_t b, int32x2_t c)
 {
   return vqdmlsls_lane_s32 (a, b, c, 1);
 }
@@ -451,7 +451,7 @@  test_vqdmulhh_s16 (int16x1_t a, int16x1_t b)
 /* { dg-final { scan-assembler-times "\\tsqdmulh\\th\[0-9\]+, h\[0-9\]+, v" 1 } } */
 
 int16x1_t
-test_vqdmulhh_lane_s16 (int16x1_t a, int16x8_t b)
+test_vqdmulhh_lane_s16 (int16x1_t a, int16x4_t b)
 {
   return vqdmulhh_lane_s16 (a, b, 3);
 }
@@ -467,9 +467,9 @@  test_vqdmulhs_s32 (int32x1_t a, int32x1_t b)
 /* { dg-final { scan-assembler-times "\\tsqdmulh\\ts\[0-9\]+, s\[0-9\]+, v" 1 } } */
 
 int32x1_t
-test_vqdmulhs_lane_s32 (int32x1_t a, int32x4_t b)
+test_vqdmulhs_lane_s32 (int32x1_t a, int32x2_t b)
 {
-  return vqdmulhs_lane_s32 (a, b, 3);
+  return vqdmulhs_lane_s32 (a, b, 1);
 }
 
 /* { dg-final { scan-assembler-times "\\tsqdmull\\ts\[0-9\]+, h\[0-9\]+, h\[0-9\]+" 1 } } */
@@ -483,7 +483,7 @@  test_vqdmullh_s16 (int16x1_t a, int16x1_t b)
 /* { dg-final { scan-assembler-times "\\tsqdmull\\ts\[0-9\]+, h\[0-9\]+, v" 1 } } */
 
 int32x1_t
-test_vqdmullh_lane_s16 (int16x1_t a, int16x8_t b)
+test_vqdmullh_lane_s16 (int16x1_t a, int16x4_t b)
 {
   return vqdmullh_lane_s16 (a, b, 3);
 }
@@ -499,7 +499,7 @@  test_vqdmulls_s32 (int32x1_t a, int32x1_t b)
 /* { dg-final { scan-assembler-times "\\tsqdmull\\td\[0-9\]+, s\[0-9\]+, v" 1 } } */
 
 int64x1_t
-test_vqdmulls_lane_s32 (int32x1_t a, int32x4_t b)
+test_vqdmulls_lane_s32 (int32x1_t a, int32x2_t b)
 {
   return vqdmulls_lane_s32 (a, b, 1);
 }
@@ -515,9 +515,9 @@  test_vqrdmulhh_s16 (int16x1_t a, int16x1_t b)
 /* { dg-final { scan-assembler-times "\\tsqrdmulh\\th\[0-9\]+, h\[0-9\]+, v" 1 } } */
 
 int16x1_t
-test_vqrdmulhh_lane_s16 (int16x1_t a, int16x8_t b)
+test_vqrdmulhh_lane_s16 (int16x1_t a, int16x4_t b)
 {
-  return vqrdmulhh_lane_s16 (a, b, 6);
+  return vqrdmulhh_lane_s16 (a, b, 3);
 }
 
 /* { dg-final { scan-assembler-times "\\tsqrdmulh\\ts\[0-9\]+, s\[0-9\]+, s\[0-9\]+" 1 } } */
@@ -531,9 +531,9 @@  test_vqrdmulhs_s32 (int32x1_t a, int32x1_t b)
 /* { dg-final { scan-assembler-times "\\tsqrdmulh\\ts\[0-9\]+, s\[0-9\]+, v" 1 } } */
 
 int32x1_t
-test_vqrdmulhs_lane_s32 (int32x1_t a, int32x4_t b)
+test_vqrdmulhs_lane_s32 (int32x1_t a, int32x2_t b)
 {
-  return vqrdmulhs_lane_s32 (a, b, 2);
+  return vqrdmulhs_lane_s32 (a, b, 1);
 }
 
 /* { dg-final { scan-assembler-times "\\tsuqadd\\tb\[0-9\]+" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmlal_high_lane_s16.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlal_high_lane_s16.c
new file mode 100644
index 0000000..5ab189e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlal_high_lane_s16.c
@@ -0,0 +1,15 @@ 
+/* Test the vqdmlal_high_lane_s16 AArch64 SIMD intrinsic.  */
+
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O3 -fno-inline" } */
+
+#include "arm_neon.h"
+
+int32x4_t
+t_vqdmlal_high_lane_s16 (int32x4_t a, int16x8_t b, int16x4_t c)
+{
+  return vqdmlal_high_lane_s16 (a, b, c, 0);
+}
+
+/* { dg-final { scan-assembler-times "sqdmlal2\[ \t\]+\[vV\]\[0-9\]+\.4\[sS\], ?\[vV\]\[0-9\]+\.8\[hH\], ?\[vV\]\[0-9\]+\.\[hH\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmlal_high_lane_s32.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlal_high_lane_s32.c
new file mode 100644
index 0000000..ad39d81
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlal_high_lane_s32.c
@@ -0,0 +1,15 @@ 
+/* Test the vqdmlal_high_lane_s32 AArch64 SIMD intrinsic.  */
+
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O3 -fno-inline" } */
+
+#include "arm_neon.h"
+
+int64x2_t
+t_vqdmlal_high_lane_s32 (int64x2_t a, int32x4_t b, int32x2_t c)
+{
+  return vqdmlal_high_lane_s32 (a, b, c, 0);
+}
+
+/* { dg-final { scan-assembler-times "sqdmlal2\[ \t\]+\[vV\]\[0-9\]+\.2\[dD\], ?\[vV\]\[0-9\]+\.4\[sS\], ?\[vV\]\[0-9\]+\.\[sS\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmlal_high_laneq_s16.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlal_high_laneq_s16.c
new file mode 100644
index 0000000..5cb2e4a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlal_high_laneq_s16.c
@@ -0,0 +1,15 @@ 
+/* Test the vqdmlal_high_laneq_s16 AArch64 SIMD intrinsic.  */
+
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O3 -fno-inline" } */
+
+#include "arm_neon.h"
+
+int32x4_t
+t_vqdmlal_high_laneq_s16 (int32x4_t a, int16x8_t b, int16x8_t c)
+{
+  return vqdmlal_high_laneq_s16 (a, b, c, 0);
+}
+
+/* { dg-final { scan-assembler-times "sqdmlal2\[ \t\]+\[vV\]\[0-9\]+\.4\[sS\], ?\[vV\]\[0-9\]+\.8\[hH\], ?\[vV\]\[0-9\]+\.\[hH\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmlal_high_laneq_s32.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlal_high_laneq_s32.c
new file mode 100644
index 0000000..981e5f1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlal_high_laneq_s32.c
@@ -0,0 +1,15 @@ 
+/* Test the vqdmlal_high_laneq_s32 AArch64 SIMD intrinsic.  */
+
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O3 -fno-inline" } */
+
+#include "arm_neon.h"
+
+int64x2_t
+t_vqdmlal_high_laneq_s32 (int64x2_t a, int32x4_t b, int32x4_t c)
+{
+  return vqdmlal_high_laneq_s32 (a, b, c, 0);
+}
+
+/* { dg-final { scan-assembler-times "sqdmlal2\[ \t\]+\[vV\]\[0-9\]+\.2\[dD\], ?\[vV\]\[0-9\]+\.4\[sS\], ?\[vV\]\[0-9\]+\.\[sS\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmlal_lane_s16.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlal_lane_s16.c
new file mode 100644
index 0000000..33ea6f9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlal_lane_s16.c
@@ -0,0 +1,15 @@ 
+/* Test the vqdmlal_lane_s16 AArch64 SIMD intrinsic.  */
+
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O3 -fno-inline" } */
+
+#include "arm_neon.h"
+
+int32x4_t
+t_vqdmlal_lane_s16 (int32x4_t a, int16x4_t b, int16x4_t c)
+{
+  return vqdmlal_lane_s16 (a, b, c, 0);
+}
+
+/* { dg-final { scan-assembler-times "sqdmlal\[ \t\]+\[vV\]\[0-9\]+\.4\[sS\], ?\[vV\]\[0-9\]+\.4\[hH\], ?\[vV\]\[0-9\]+\.\[hH\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmlal_lane_s32.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlal_lane_s32.c
new file mode 100644
index 0000000..e2590b4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlal_lane_s32.c
@@ -0,0 +1,15 @@ 
+/* Test the vqdmlal_lane_s32 AArch64 SIMD intrinsic.  */
+
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O3 -fno-inline" } */
+
+#include "arm_neon.h"
+
+int64x2_t
+t_vqdmlal_lane_s32 (int64x2_t a, int32x2_t b, int32x2_t c)
+{
+  return vqdmlal_lane_s32 (a, b, c, 0);
+}
+
+/* { dg-final { scan-assembler-times "sqdmlal\[ \t\]+\[vV\]\[0-9\]+\.2\[dD\], ?\[vV\]\[0-9\]+\.2\[sS\], ?\[vV\]\[0-9\]+\.\[sS\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmlal_laneq_s16.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlal_laneq_s16.c
new file mode 100644
index 0000000..fdc8c8d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlal_laneq_s16.c
@@ -0,0 +1,15 @@ 
+/* Test the vqdmlal_laneq_s16 AArch64 SIMD intrinsic.  */
+
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O3 -fno-inline" } */
+
+#include "arm_neon.h"
+
+int32x4_t
+t_vqdmlal_laneq_s16 (int32x4_t a, int16x4_t b, int16x8_t c)
+{
+  return vqdmlal_laneq_s16 (a, b, c, 0);
+}
+
+/* { dg-final { scan-assembler-times "sqdmlal\[ \t\]+\[vV\]\[0-9\]+\.4\[sS\], ?\[vV\]\[0-9\]+\.4\[hH\], ?\[vV\]\[0-9\]+\.\[hH\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmlal_laneq_s32.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlal_laneq_s32.c
new file mode 100644
index 0000000..c16a846
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlal_laneq_s32.c
@@ -0,0 +1,15 @@ 
+/* Test the vqdmlal_laneq_s32 AArch64 SIMD intrinsic.  */
+
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O3 -fno-inline" } */
+
+#include "arm_neon.h"
+
+int64x2_t
+t_vqdmlal_laneq_s32 (int64x2_t a, int32x2_t b, int32x4_t c)
+{
+  return vqdmlal_laneq_s32 (a, b, c, 0);
+}
+
+/* { dg-final { scan-assembler-times "sqdmlal\[ \t\]+\[vV\]\[0-9\]+\.2\[dD\], ?\[vV\]\[0-9\]+\.2\[sS\], ?\[vV\]\[0-9\]+\.\[sS\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmlalh_lane_s16.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlalh_lane_s16.c
new file mode 100644
index 0000000..954b69d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlalh_lane_s16.c
@@ -0,0 +1,15 @@ 
+/* Test the vqdmlalh_lane_s16 AArch64 SIMD intrinsic.  */
+
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O3 -fno-inline" } */
+
+#include "arm_neon.h"
+
+int32x1_t
+t_vqdmlalh_lane_s16 (int32x1_t a, int16x1_t b, int16x4_t c)
+{
+  return vqdmlalh_lane_s16 (a, b, c, 0);
+}
+
+/* { dg-final { scan-assembler-times "sqdmlal\[ \t\]+\[sS\]\[0-9\]+, ?\[hH\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[hH\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmlals_lane_s32.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlals_lane_s32.c
new file mode 100644
index 0000000..e7a6b6a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlals_lane_s32.c
@@ -0,0 +1,15 @@ 
+/* Test the vqdmlals_lane_s32 AArch64 SIMD intrinsic.  */
+
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O3 -fno-inline" } */
+
+#include "arm_neon.h"
+
+int64x1_t
+t_vqdmlals_lane_s32 (int64x1_t a, int32x1_t b, int32x2_t c)
+{
+  return vqdmlals_lane_s32 (a, b, c, 0);
+}
+
+/* { dg-final { scan-assembler-times "sqdmlal\[ \t\]+\[dD\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[sS\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmlsl_high_lane_s16.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlsl_high_lane_s16.c
new file mode 100644
index 0000000..b17e51a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlsl_high_lane_s16.c
@@ -0,0 +1,15 @@ 
+/* Test the vqdmlsl_high_lane_s16 AArch64 SIMD intrinsic.  */
+
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O3 -fno-inline" } */
+
+#include "arm_neon.h"
+
+int32x4_t
+t_vqdmlsl_high_lane_s16 (int32x4_t a, int16x8_t b, int16x4_t c)
+{
+  return vqdmlsl_high_lane_s16 (a, b, c, 0);
+}
+
+/* { dg-final { scan-assembler-times "sqdmlsl2\[ \t\]+\[vV\]\[0-9\]+\.4\[sS\], ?\[vV\]\[0-9\]+\.8\[hH\], ?\[vV\]\[0-9\]+\.\[hH\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmlsl_high_lane_s32.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlsl_high_lane_s32.c
new file mode 100644
index 0000000..ba399b4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlsl_high_lane_s32.c
@@ -0,0 +1,15 @@ 
+/* Test the vqdmlsl_high_lane_s32 AArch64 SIMD intrinsic.  */
+
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O3 -fno-inline" } */
+
+#include "arm_neon.h"
+
+int64x2_t
+t_vqdmlsl_high_lane_s32 (int64x2_t a, int32x4_t b, int32x2_t c)
+{
+  return vqdmlsl_high_lane_s32 (a, b, c, 0);
+}
+
+/* { dg-final { scan-assembler-times "sqdmlsl2\[ \t\]+\[vV\]\[0-9\]+\.2\[dD\], ?\[vV\]\[0-9\]+\.4\[sS\], ?\[vV\]\[0-9\]+\.\[sS\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmlsl_high_laneq_s16.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlsl_high_laneq_s16.c
new file mode 100644
index 0000000..eab4732
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlsl_high_laneq_s16.c
@@ -0,0 +1,15 @@ 
+/* Test the vqdmlsl_high_laneq_s16 AArch64 SIMD intrinsic.  */
+
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O3 -fno-inline" } */
+
+#include "arm_neon.h"
+
+int32x4_t
+t_vqdmlsl_high_laneq_s16 (int32x4_t a, int16x8_t b, int16x8_t c)
+{
+  return vqdmlsl_high_laneq_s16 (a, b, c, 0);
+}
+
+/* { dg-final { scan-assembler-times "sqdmlsl2\[ \t\]+\[vV\]\[0-9\]+\.4\[sS\], ?\[vV\]\[0-9\]+\.8\[hH\], ?\[vV\]\[0-9\]+\.\[hH\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmlsl_high_laneq_s32.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlsl_high_laneq_s32.c
new file mode 100644
index 0000000..b926b1a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlsl_high_laneq_s32.c
@@ -0,0 +1,15 @@ 
+/* Test the vqdmlsl_high_laneq_s32 AArch64 SIMD intrinsic.  */
+
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O3 -fno-inline" } */
+
+#include "arm_neon.h"
+
+int64x2_t
+t_vqdmlsl_high_laneq_s32 (int64x2_t a, int32x4_t b, int32x4_t c)
+{
+  return vqdmlsl_high_laneq_s32 (a, b, c, 0);
+}
+
+/* { dg-final { scan-assembler-times "sqdmlsl2\[ \t\]+\[vV\]\[0-9\]+\.2\[dD\], ?\[vV\]\[0-9\]+\.4\[sS\], ?\[vV\]\[0-9\]+\.\[sS\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmlsl_lane_s16.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlsl_lane_s16.c
new file mode 100644
index 0000000..9ca903c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlsl_lane_s16.c
@@ -0,0 +1,15 @@ 
+/* Test the vqdmlsl_lane_s16 AArch64 SIMD intrinsic.  */
+
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O3 -fno-inline" } */
+
+#include "arm_neon.h"
+
+int32x4_t
+t_vqdmlsl_lane_s16 (int32x4_t a, int16x4_t b, int16x4_t c)
+{
+  return vqdmlsl_lane_s16 (a, b, c, 0);
+}
+
+/* { dg-final { scan-assembler-times "sqdmlsl\[ \t\]+\[vV\]\[0-9\]+\.4\[sS\], ?\[vV\]\[0-9\]+\.4\[hH\], ?\[vV\]\[0-9\]+\.\[hH\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmlsl_lane_s32.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlsl_lane_s32.c
new file mode 100644
index 0000000..5c13fe4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlsl_lane_s32.c
@@ -0,0 +1,15 @@ 
+/* Test the vqdmlsl_lane_s32 AArch64 SIMD intrinsic.  */
+
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O3 -fno-inline" } */
+
+#include "arm_neon.h"
+
+int64x2_t
+t_vqdmlsl_lane_s32 (int64x2_t a, int32x2_t b, int32x2_t c)
+{
+  return vqdmlsl_lane_s32 (a, b, c, 0);
+}
+
+/* { dg-final { scan-assembler-times "sqdmlsl\[ \t\]+\[vV\]\[0-9\]+\.2\[dD\], ?\[vV\]\[0-9\]+\.2\[sS\], ?\[vV\]\[0-9\]+\.\[sS\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmlsl_laneq_s32.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlsl_laneq_s32.c
new file mode 100644
index 0000000..4538995
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlsl_laneq_s32.c
@@ -0,0 +1,15 @@ 
+/* Test the vqdmlsl_laneq_s32 AArch64 SIMD intrinsic.  */
+
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O3 -fno-inline" } */
+
+#include "arm_neon.h"
+
+int64x2_t
+t_vqdmlsl_lane_s32 (int64x2_t a, int32x2_t b, int32x4_t c)
+{
+  return vqdmlsl_laneq_s32 (a, b, c, 0);
+}
+
+/* { dg-final { scan-assembler-times "sqdmlsl\[ \t\]+\[vV\]\[0-9\]+\.2\[dD\], ?\[vV\]\[0-9\]+\.2\[sS\], ?\[vV\]\[0-9\]+\.\[sS\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmlslh_lane_s16.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlslh_lane_s16.c
new file mode 100644
index 0000000..ccebe54
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlslh_lane_s16.c
@@ -0,0 +1,15 @@ 
+/* Test the vqdmlslh_lane_s16 AArch64 SIMD intrinsic.  */
+
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O3 -fno-inline" } */
+
+#include "arm_neon.h"
+
+int32x1_t
+t_vqdmlslh_lane_s16 (int32x1_t a, int16x1_t b, int16x4_t c)
+{
+  return vqdmlslh_lane_s16 (a, b, c, 0);
+}
+
+/* { dg-final { scan-assembler-times "sqdmlsl\[ \t\]+\[sS\]\[0-9\]+, ?\[hH\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[hH\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmlsls_lane_s32.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlsls_lane_s32.c
new file mode 100644
index 0000000..a72aacf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmlsls_lane_s32.c
@@ -0,0 +1,15 @@ 
+/* Test the vqdmlsls_lane_s32 AArch64 SIMD intrinsic.  */
+
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O3 -fno-inline" } */
+
+#include "arm_neon.h"
+
+int64x1_t
+t_vqdmlsls_lane_s32 (int64x1_t a, int32x1_t b, int32x2_t c)
+{
+  return vqdmlsls_lane_s32 (a, b, c, 0);
+}
+
+/* { dg-final { scan-assembler-times "sqdmlsl\[ \t\]+\[dD\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[sS\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmulh_laneq_s16.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmulh_laneq_s16.c
new file mode 100644
index 0000000..23255d5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmulh_laneq_s16.c
@@ -0,0 +1,15 @@ 
+/* Test the vqdmulh_laneq_s16 AArch64 SIMD intrinsic.  */
+
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O3 -fno-inline" } */
+
+#include "arm_neon.h"
+
+int16x4_t
+t_vqdmulh_laneq_s16 (int16x4_t a, int16x8_t b)
+{
+  return vqdmulh_laneq_s16 (a, b, 0);
+}
+
+/* { dg-final { scan-assembler-times "sqdmulh\[ \t\]+\[vV\]\[0-9\]+\.4\[hH\], ?\[vV\]\[0-9\]+\.4\[hH\], ?\[vV\]\[0-9\]+\.\[hH\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmulh_laneq_s32.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmulh_laneq_s32.c
new file mode 100644
index 0000000..2aac35d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmulh_laneq_s32.c
@@ -0,0 +1,15 @@ 
+/* Test the vqdmulh_laneq_s32 AArch64 SIMD intrinsic.  */
+
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O3 -fno-inline" } */
+
+#include "arm_neon.h"
+
+int32x2_t
+t_vqdmulh_laneq_s32 (int32x2_t a, int32x4_t b)
+{
+  return vqdmulh_laneq_s32 (a, b, 0);
+}
+
+/* { dg-final { scan-assembler-times "sqdmulh\[ \t\]+\[vV\]\[0-9\]+\.2\[sS\], ?\[vV\]\[0-9\]+\.2\[sS\], ?\[vV\]\[0-9\]+\.\[sS\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmulhh_lane_s16.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmulhh_lane_s16.c
new file mode 100644
index 0000000..4779e36
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmulhh_lane_s16.c
@@ -0,0 +1,36 @@ 
+/* Test the vqdmulhh_lane_s16 AArch64 SIMD intrinsic.  */
+
+/* { dg-do run } */
+/* { dg-options "-save-temps -O3 -fno-inline" } */
+
+#include "arm_neon.h"
+#include <stdio.h>
+
+extern void abort (void);
+
+int
+main (void)
+{
+  int16_t arg1;
+  int16x4_t arg2;
+  int16_t result;
+  int16_t actual;
+  int16_t expected;
+
+  arg1 = -32768;
+  arg2 = vcreate_s16 (0x0000ffff2489e398ULL);
+  actual = vqdmulhh_lane_s16 (arg1, arg2, 2);
+  expected = 1;
+
+  if (expected != actual)
+    {
+      fprintf (stderr, "Expected: %xd, got %xd\n", expected, actual);
+      abort ();
+    }
+
+  return 0;
+}
+
+
+/* { dg-final { scan-assembler-times "sqdmulh\[ \t\]+\[hH\]\[0-9\]+, ?\[hH\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[hH\]\\\[2\\\]\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmulhq_laneq_s16.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmulhq_laneq_s16.c
new file mode 100644
index 0000000..ff654b5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmulhq_laneq_s16.c
@@ -0,0 +1,15 @@ 
+/* Test the vqdmulhq_laneq_s16 AArch64 SIMD intrinsic.  */
+
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O3 -fno-inline" } */
+
+#include "arm_neon.h"
+
+int16x8_t
+t_vqdmulhq_laneq_s16 (int16x8_t a, int16x8_t b)
+{
+  return vqdmulhq_laneq_s16 (a, b, 0);
+}
+
+/* { dg-final { scan-assembler-times "sqdmulh\[ \t\]+\[vV\]\[0-9\]+\.8\[hH\], ?\[vV\]\[0-9\]+\.8\[hH\], ?\[vV\]\[0-9\]+\.\[hH\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmulhq_laneq_s32.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmulhq_laneq_s32.c
new file mode 100644
index 0000000..bc88245
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmulhq_laneq_s32.c
@@ -0,0 +1,15 @@ 
+/* Test the vqdmulhq_laneq_s32 AArch64 SIMD intrinsic.  */
+
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O3 -fno-inline" } */
+
+#include "arm_neon.h"
+
+int32x4_t
+t_vqdmulhq_laneq_s32 (int32x4_t a, int32x4_t b)
+{
+  return vqdmulhq_laneq_s32 (a, b, 0);
+}
+
+/* { dg-final { scan-assembler-times "sqdmulh\[ \t\]+\[vV\]\[0-9\]+\.4\[sS\], ?\[vV\]\[0-9\]+\.4\[sS\], ?\[vV\]\[0-9\]+\.\[sS\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmulhs_lane_s32.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmulhs_lane_s32.c
new file mode 100644
index 0000000..9c27f5f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmulhs_lane_s32.c
@@ -0,0 +1,34 @@ 
+/* Test the vqdmulhs_lane_s32 AArch64 SIMD intrinsic.  */
+
+/* { dg-do run } */
+/* { dg-options "-save-temps -O3 -fno-inline" } */
+
+#include "arm_neon.h"
+#include <stdio.h>
+
+extern void abort (void);
+
+int
+main (void)
+{
+  int32_t arg1;
+  int32x2_t arg2;
+  int32_t result;
+  int32_t actual;
+  int32_t expected;
+
+  arg1 = 57336;
+  arg2 = vcreate_s32 (0x55897fff7fff0000ULL);
+  actual = vqdmulhs_lane_s32 (arg1, arg2, 0);
+  expected = 57334;
+
+  if (expected != actual)
+    {
+      fprintf (stderr, "Expected: %xd, got %xd\n", expected, actual);
+      abort ();
+    }
+
+  return 0;
+}
+/* { dg-final { scan-assembler-times "sqdmulh\[ \t\]+\[sS\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[sS\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmull_high_lane_s16.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmull_high_lane_s16.c
new file mode 100644
index 0000000..0bfad16
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmull_high_lane_s16.c
@@ -0,0 +1,15 @@ 
+/* Test the vqdmull_high_lane_s16 AArch64 SIMD intrinsic.  */
+
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O3 -fno-inline" } */
+
+#include "arm_neon.h"
+
+int32x4_t
+t_vqdmull_high_lane_s16 (int16x8_t a, int16x4_t b)
+{
+  return vqdmull_high_lane_s16 (a, b, 0);
+}
+
+/* { dg-final { scan-assembler-times "sqdmull2\[ \t\]+\[vV\]\[0-9\]+\.4\[sS\], ?\[vV\]\[0-9\]+\.8\[hH\], ?\[vV\]\[0-9\]+\.\[hH\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmull_high_lane_s32.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmull_high_lane_s32.c
new file mode 100644
index 0000000..94227ad
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmull_high_lane_s32.c
@@ -0,0 +1,15 @@ 
+/* Test the vqdmull_high_lane_s32 AArch64 SIMD intrinsic.  */
+
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O3 -fno-inline" } */
+
+#include "arm_neon.h"
+
+int64x2_t
+t_vqdmull_high_lane_s32 (int32x4_t a, int32x2_t b)
+{
+  return vqdmull_high_lane_s32 (a, b, 0);
+}
+
+/* { dg-final { scan-assembler-times "sqdmull2\[ \t\]+\[vV\]\[0-9\]+\.2\[dD\], ?\[vV\]\[0-9\]+\.4\[sS\], ?\[vV\]\[0-9\]+\.\[sS\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmull_high_laneq_s16.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmull_high_laneq_s16.c
new file mode 100644
index 0000000..393ad98
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmull_high_laneq_s16.c
@@ -0,0 +1,15 @@ 
+/* Test the vqdmull_high_laneq_s16 AArch64 SIMD intrinsic.  */
+
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O3 -fno-inline" } */
+
+#include "arm_neon.h"
+
+int32x4_t
+t_vqdmull_high_laneq_s16 (int16x8_t a, int16x8_t b)
+{
+  return vqdmull_high_laneq_s16 (a, b, 0);
+}
+
+/* { dg-final { scan-assembler-times "sqdmull2\[ \t\]+\[vV\]\[0-9\]+\.4\[sS\], ?\[vV\]\[0-9\]+\.8\[hH\], ?\[vV\]\[0-9\]+\.\[hH\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmull_high_laneq_s32.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmull_high_laneq_s32.c
new file mode 100644
index 0000000..0a3f48f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmull_high_laneq_s32.c
@@ -0,0 +1,15 @@ 
+/* Test the vqdmull_high_laneq_s32 AArch64 SIMD intrinsic.  */
+
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O3 -fno-inline" } */
+
+#include "arm_neon.h"
+
+int64x2_t
+t_vqdmull_high_laneq_s32 (int32x4_t a, int32x4_t b)
+{
+  return vqdmull_high_laneq_s32 (a, b, 0);
+}
+
+/* { dg-final { scan-assembler-times "sqdmull2\[ \t\]+\[vV\]\[0-9\]+\.2\[dD\], ?\[vV\]\[0-9\]+\.4\[sS\], ?\[vV\]\[0-9\]+\.\[sS\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmull_lane_s16.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmull_lane_s16.c
new file mode 100644
index 0000000..39f7262
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmull_lane_s16.c
@@ -0,0 +1,15 @@ 
+/* Test the vqdmull_lane_s16 AArch64 SIMD intrinsic.  */
+
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O3 -fno-inline" } */
+
+#include "arm_neon.h"
+
+int32x4_t
+t_vqdmull_lane_s16 (int16x4_t a, int16x4_t b)
+{
+  return vqdmull_lane_s16 (a, b, 0);
+}
+
+/* { dg-final { scan-assembler-times "sqdmull\[ \t\]+\[vV\]\[0-9\]+\.4\[sS\], ?\[vV\]\[0-9\]+\.4\[hH\], ?\[vV\]\[0-9\]+\.\[hH\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmull_lane_s32.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmull_lane_s32.c
new file mode 100644
index 0000000..8ae7d7e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmull_lane_s32.c
@@ -0,0 +1,15 @@ 
+/* Test the vqdmull_lane_s32 AArch64 SIMD intrinsic.  */
+
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O3 -fno-inline" } */
+
+#include "arm_neon.h"
+
+int64x2_t
+t_vqdmull_lane_s32 (int32x2_t a, int32x2_t b)
+{
+  return vqdmull_lane_s32 (a, b, 0);
+}
+
+/* { dg-final { scan-assembler-times "sqdmull\[ \t\]+\[vV\]\[0-9\]+\.2\[dD\], ?\[vV\]\[0-9\]+\.2\[sS\], ?\[vV\]\[0-9\]+\.\[sS\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmull_laneq_s16.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmull_laneq_s16.c
new file mode 100644
index 0000000..3d87352
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmull_laneq_s16.c
@@ -0,0 +1,15 @@ 
+/* Test the vqdmull_laneq_s16 AArch64 SIMD intrinsic.  */
+
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O3 -fno-inline" } */
+
+#include "arm_neon.h"
+
+int32x4_t
+t_vqdmull_laneq_s16 (int16x4_t a, int16x8_t b)
+{
+  return vqdmull_laneq_s16 (a, b, 0);
+}
+
+/* { dg-final { scan-assembler-times "sqdmull\[ \t\]+\[vV\]\[0-9\]+\.4\[sS\], ?\[vV\]\[0-9\]+\.4\[hH\], ?\[vV\]\[0-9\]+\.\[hH\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmull_laneq_s32.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmull_laneq_s32.c
new file mode 100644
index 0000000..bc35d14
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmull_laneq_s32.c
@@ -0,0 +1,15 @@ 
+/* Test the vqdmull_laneq_s32 AArch64 SIMD intrinsic.  */
+
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O3 -fno-inline" } */
+
+#include "arm_neon.h"
+
+int64x2_t
+t_vqdmull_laneq_s32 (int32x2_t a, int32x4_t b)
+{
+  return vqdmull_laneq_s32 (a, b, 0);
+}
+
+/* { dg-final { scan-assembler-times "sqdmull\[ \t\]+\[vV\]\[0-9\]+\.2\[dD\], ?\[vV\]\[0-9\]+\.2\[sS\], ?\[vV\]\[0-9\]+\.\[sS\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmullh_lane_s16.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmullh_lane_s16.c
new file mode 100644
index 0000000..94dd0c6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmullh_lane_s16.c
@@ -0,0 +1,15 @@ 
+/* Test the vqdmullh_lane_s16 AArch64 SIMD intrinsic.  */
+
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O3 -fno-inline" } */
+
+#include "arm_neon.h"
+
+int32x1_t
+t_vqdmullh_lane_s16 (int16x1_t a, int16x4_t b)
+{
+  return vqdmullh_lane_s16 (a, b, 0);
+}
+
+/* { dg-final { scan-assembler-times "sqdmull\[ \t\]+\[sS\]\[0-9\]+, ?\[hH\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[hH\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmulls_lane_s32.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmulls_lane_s32.c
new file mode 100644
index 0000000..9ac7ee7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmulls_lane_s32.c
@@ -0,0 +1,15 @@ 
+/* Test the vqdmulls_lane_s32 AArch64 SIMD intrinsic.  */
+
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O3 -fno-inline" } */
+
+#include "arm_neon.h"
+
+int64x1_t
+t_vqdmulls_lane_s32 (int32x1_t a, int32x2_t b)
+{
+  return vqdmulls_lane_s32 (a, b, 0);
+}
+
+/* { dg-final { scan-assembler-times "sqdmull\[ \t\]+\[dD\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[sS\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqrdmulh_laneq_s16.c b/gcc/testsuite/gcc.target/aarch64/simd/vqrdmulh_laneq_s16.c
new file mode 100644
index 0000000..7d03619
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqrdmulh_laneq_s16.c
@@ -0,0 +1,15 @@ 
+/* Test the vqrdmulh_laneq_s16 AArch64 SIMD intrinsic.  */
+
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O3 -fno-inline" } */
+
+#include "arm_neon.h"
+
+int16x4_t
+t_vqrdmulh_laneq_s16 (int16x4_t a, int16x8_t b)
+{
+  return vqrdmulh_laneq_s16 (a, b, 0);
+}
+
+/* { dg-final { scan-assembler-times "sqrdmulh\[ \t\]+\[vV\]\[0-9\]+\.4\[hH\], ?\[vV\]\[0-9\]+\.4\[hH\], ?\[vV\]\[0-9\]+\.\[hH\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqrdmulh_laneq_s32.c b/gcc/testsuite/gcc.target/aarch64/simd/vqrdmulh_laneq_s32.c
new file mode 100644
index 0000000..ac5da9c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqrdmulh_laneq_s32.c
@@ -0,0 +1,15 @@ 
+/* Test the vqrdmulh_laneq_s32 AArch64 SIMD intrinsic.  */
+
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O3 -fno-inline" } */
+
+#include "arm_neon.h"
+
+int32x2_t
+t_vqrdmulh_laneq_s32 (int32x2_t a, int32x4_t b)
+{
+  return vqrdmulh_laneq_s32 (a, b, 0);
+}
+
+/* { dg-final { scan-assembler-times "sqrdmulh\[ \t\]+\[vV\]\[0-9\]+\.2\[sS\], ?\[vV\]\[0-9\]+\.2\[sS\], ?\[vV\]\[0-9\]+\.\[sS\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqrdmulhh_lane_s16.c b/gcc/testsuite/gcc.target/aarch64/simd/vqrdmulhh_lane_s16.c
new file mode 100644
index 0000000..afa3e36
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqrdmulhh_lane_s16.c
@@ -0,0 +1,35 @@ 
+/* Test the vqrdmulhh_lane_s16 AArch64 SIMD intrinsic.  */
+
+/* { dg-do run } */
+/* { dg-options "-save-temps -O3 -fno-inline" } */
+
+#include "arm_neon.h"
+#include <stdio.h>
+
+extern void abort (void);
+
+int
+main (void)
+{
+  int16_t arg1;
+  int16x4_t arg2;
+  int16_t result;
+  int16_t actual;
+  int16_t expected;
+
+  arg1 = -32768;
+  arg2 = vcreate_s16 (0xd78e000005d78000ULL);
+  actual = vqrdmulhh_lane_s16 (arg1, arg2, 3);
+  expected = 10354;
+
+  if (expected != actual)
+    {
+      fprintf (stderr, "Expected: %xd, got %xd\n", expected, actual);
+      abort ();
+    }
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times "sqrdmulh\[ \t\]+\[hH\]\[0-9\]+, ?\[hH\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[hH\]\\\[3\\\]\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqrdmulhq_laneq_s16.c b/gcc/testsuite/gcc.target/aarch64/simd/vqrdmulhq_laneq_s16.c
new file mode 100644
index 0000000..ec5434b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqrdmulhq_laneq_s16.c
@@ -0,0 +1,15 @@ 
+/* Test the vqrdmulhq_laneq_s16 AArch64 SIMD intrinsic.  */
+
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O3 -fno-inline" } */
+
+#include "arm_neon.h"
+
+int16x8_t
+t_vqrdmulhq_laneq_s16 (int16x8_t a, int16x8_t b)
+{
+  return vqrdmulhq_laneq_s16 (a, b, 0);
+}
+
+/* { dg-final { scan-assembler-times "sqrdmulh\[ \t\]+\[vV\]\[0-9\]+\.8\[hH\], ?\[vV\]\[0-9\]+\.8\[hH\], ?\[vV\]\[0-9\]+\.\[hH\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqrdmulhq_laneq_s32.c b/gcc/testsuite/gcc.target/aarch64/simd/vqrdmulhq_laneq_s32.c
new file mode 100644
index 0000000..b2013f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqrdmulhq_laneq_s32.c
@@ -0,0 +1,15 @@ 
+/* Test the vqrdmulhq_laneq_s32 AArch64 SIMD intrinsic.  */
+
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O3 -fno-inline" } */
+
+#include "arm_neon.h"
+
+int32x4_t
+t_vqrdmulhq_laneq_s32 (int32x4_t a, int32x4_t b)
+{
+  return vqrdmulhq_laneq_s32 (a, b, 0);
+}
+
+/* { dg-final { scan-assembler-times "sqrdmulh\[ \t\]+\[vV\]\[0-9\]+\.4\[sS\], ?\[vV\]\[0-9\]+\.4\[sS\], ?\[vV\]\[0-9\]+\.\[sS\]\\\[0\\\]\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqrdmulhs_lane_s32.c b/gcc/testsuite/gcc.target/aarch64/simd/vqrdmulhs_lane_s32.c
new file mode 100644
index 0000000..83d2ba2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vqrdmulhs_lane_s32.c
@@ -0,0 +1,35 @@ 
+/* Test the vqrdmulhs_lane_s32 AArch64 SIMD intrinsic.  */
+
+/* { dg-do run } */
+/* { dg-options "-save-temps -O3 -fno-inline" } */
+
+#include "arm_neon.h"
+#include <stdio.h>
+
+extern void abort (void);
+
+int
+main (void)
+{
+  int32_t arg1;
+  int32x2_t arg2;
+  int32_t result;
+  int32_t actual;
+  int32_t expected;
+
+  arg1 = -2099281921;
+  arg2 = vcreate_s32 (0x000080007fff0000ULL);
+  actual = vqrdmulhs_lane_s32 (arg1, arg2, 1);
+  expected = -32033;
+
+  if (expected != actual)
+    {
+      fprintf (stderr, "Expected: %xd, got %xd\n", expected, actual);
+      abort ();
+    }
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times "sqrdmulh\[ \t\]+\[sS\]\[0-9\]+, ?\[sS\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[sS\]\\\[1\\\]\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/vector_intrinsics.c b/gcc/testsuite/gcc.target/aarch64/vector_intrinsics.c
index affb8a8..52b0496 100644
--- a/gcc/testsuite/gcc.target/aarch64/vector_intrinsics.c
+++ b/gcc/testsuite/gcc.target/aarch64/vector_intrinsics.c
@@ -1,7 +1,7 @@ 
 /* { dg-do compile } */
 /* { dg-options "-O2" } */
 
-#include "../../../config/aarch64/arm_neon.h"
+#include "arm_neon.h"
 
 
 /* { dg-final { scan-assembler-times "\\tfmax\\tv\[0-9\]+\.2s, v\[0-9\].2s, v\[0-9\].2s" 1 } } */
@@ -305,7 +305,7 @@  test_vqdmlal_high_s16 (int32x4_t __a, int16x8_t __b, int16x8_t __c)
 /* { dg-final { scan-assembler-times "\\tsqdmlal2\\tv\[0-9\]+\.4s, v\[0-9\]+\.8h, v\[0-9\]+\.h" 3 } } */
 
 int32x4_t
-test_vqdmlal_high_lane_s16 (int32x4_t a, int16x8_t b, int16x8_t c)
+test_vqdmlal_high_lane_s16 (int32x4_t a, int16x8_t b, int16x4_t c)
 {
   return vqdmlal_high_lane_s16 (a, b, c, 3);
 }
@@ -361,7 +361,7 @@  test_vqdmlal_high_s32 (int64x2_t __a, int32x4_t __b, int32x4_t __c)
 /* { dg-final { scan-assembler-times "\\tsqdmlal2\\tv\[0-9\]+\.2d, v\[0-9\]+\.4s, v\[0-9\]+\.s" 3 } } */
 
 int64x2_t
-test_vqdmlal_high_lane_s32 (int64x2_t __a, int32x4_t __b, int32x4_t __c)
+test_vqdmlal_high_lane_s32 (int64x2_t __a, int32x4_t __b, int32x2_t __c)
 {
   return vqdmlal_high_lane_s32 (__a, __b, __c, 1);
 }
@@ -417,7 +417,7 @@  test_vqdmlsl_high_s16 (int32x4_t __a, int16x8_t __b, int16x8_t __c)
 /* { dg-final { scan-assembler-times "\\tsqdmlsl2\\tv\[0-9\]+\.4s, v\[0-9\]+\.8h, v\[0-9\]+\.h" 3 } } */
 
 int32x4_t
-test_vqdmlsl_high_lane_s16 (int32x4_t a, int16x8_t b, int16x8_t c)
+test_vqdmlsl_high_lane_s16 (int32x4_t a, int16x8_t b, int16x4_t c)
 {
   return vqdmlsl_high_lane_s16 (a, b, c, 3);
 }
@@ -473,7 +473,7 @@  test_vqdmlsl_high_s32 (int64x2_t __a, int32x4_t __b, int32x4_t __c)
 /* { dg-final { scan-assembler-times "\\tsqdmlsl2\\tv\[0-9\]+\.2d, v\[0-9\]+\.4s, v\[0-9\]+\.s" 3 } } */
 
 int64x2_t
-test_vqdmlsl_high_lane_s32 (int64x2_t __a, int32x4_t __b, int32x4_t __c)
+test_vqdmlsl_high_lane_s32 (int64x2_t __a, int32x4_t __b, int32x2_t __c)
 {
   return vqdmlsl_high_lane_s32 (__a, __b, __c, 1);
 }
@@ -529,7 +529,7 @@  test_vqdmull_high_s16 (int16x8_t __a, int16x8_t __b)
 /* { dg-final { scan-assembler-times "\\tsqdmull2\\tv\[0-9\]+\.4s, v\[0-9\]+\.8h, v\[0-9\]+\.h" 3 } } */
 
 int32x4_t
-test_vqdmull_high_lane_s16 (int16x8_t a, int16x8_t b)
+test_vqdmull_high_lane_s16 (int16x8_t a, int16x4_t b)
 {
   return vqdmull_high_lane_s16 (a, b, 3);
 }
@@ -585,7 +585,7 @@  test_vqdmull_high_s32 (int32x4_t __a, int32x4_t __b)
 /* { dg-final { scan-assembler-times "\\tsqdmull2\\tv\[0-9\]+\.2d, v\[0-9\]+\.4s, v\[0-9\]+\.s" 3 } } */
 
 int64x2_t
-test_vqdmull_high_lane_s32 (int32x4_t __a, int32x4_t __b)
+test_vqdmull_high_lane_s32 (int32x4_t __a, int32x2_t __b)
 {
   return vqdmull_high_lane_s32 (__a, __b, 1);
 }