[AArch64] Fix vqtb[lx][234] on big-endian

Message ID CAKdteOYo8LvG82w3v9WCw3_DHBrUBYdVjD6oCp5N7cjeC6=S7Q@mail.gmail.com
State New

Commit Message

Christophe Lyon Nov. 6, 2015, 1:49 p.m. UTC
Hi,

As mentioned by James a few weeks ago, the vqtb[lx][234] intrinsics
are failing on aarch64_be.

The attached patch fixes them, and rewrites them using new builtins
instead of inline assembly.
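
To illustrate the shape of the change, here is one intrinsic before
and after (both versions taken from the patch below). The old
implementation bounces the table through memory with ld1 and
hard-codes v16/v17:

__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
vqtbl2_s8 (int8x16x2_t tab, uint8x8_t idx)
{
  int8x8_t result;
  __asm__ ("ld1 {v16.16b, v17.16b}, %1\n\t"
	   "tbl %0.8b, {v16.16b, v17.16b}, %2.8b\n\t"
	   :"=w"(result)
	   :"Q"(tab),"w"(idx)
	   :"memory", "v16", "v17");
  return result;
}

The new implementation packs the two q-registers into an OI-mode
value and calls a builtin, leaving register allocation and the
big-endian lane handling to the compiler:

__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
vqtbl2_s8 (int8x16x2_t tab, uint8x8_t idx)
{
  __builtin_aarch64_simd_oi __o;
  __o = __builtin_aarch64_set_qregoiv16qi (__o, tab.val[0], 0);
  __o = __builtin_aarch64_set_qregoiv16qi (__o, tab.val[1], 1);
  return __builtin_aarch64_tbl3v8qi (__o, (int8x8_t)idx);
}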

I wondered about the names of the new builtins; I hope I got them
right: qtbl3, qtbl4, qtbx3, and qtbx4, each with v8qi and v16qi modes.

I have modified the existing aarch64_tbl3v8qi and aarch64_tbx4v8qi to
use <mode> and share the code with the v16qi variants.

In arm_neon.h, I moved the rewritten intrinsics to the bottom of the
file, in alphabetical order. Although the comment there says "Start of
optimal implementations in approved order", the existing ones really
do seem to be in alphabetical order.

And I added a new testcase, skipped for arm* targets.

This has been tested on aarch64-none-elf and aarch64_be-none-elf
targets, using the Foundation model.

OK?

Christophe.
2015-11-06  Christophe Lyon  <christophe.lyon@linaro.org>

	gcc/testsuite/
	* gcc.target/aarch64/advsimd-intrinsics/vqtbX.c: New test.

	gcc/
	* config/aarch64/aarch64-simd-builtins.def: Update builtins
	tables: add tbl3v16qi, qtbl[34]*, tbx4v16qi, qtbx[34]*.
	* config/aarch64/aarch64-simd.md (aarch64_tbl3v8qi): Rename to...
	(aarch64_tbl3<mode>): ... this, which supports v16qi too.
	(aarch64_tbx4v8qi): Rename to...
	(aarch64_tbx4<mode>): ... this.
	(aarch64_qtbl3<mode>): New pattern.
	(aarch64_qtbx3<mode>): New pattern.
	(aarch64_qtbl4<mode>): New pattern.
	(aarch64_qtbx4<mode>): New pattern.
	* config/aarch64/arm_neon.h (vqtbl2_s8, vqtbl2_u8, vqtbl2_p8)
	(vqtbl2q_s8, vqtbl2q_u8, vqtbl2q_p8, vqtbl3_s8, vqtbl3_u8)
	(vqtbl3_p8, vqtbl3q_s8, vqtbl3q_u8, vqtbl3q_p8, vqtbl4_s8)
	(vqtbl4_u8, vqtbl4_p8, vqtbl4q_s8, vqtbl4q_u8, vqtbl4q_p8)
	(vqtbx2_s8, vqtbx2_u8, vqtbx2_p8, vqtbx2q_s8, vqtbx2q_u8)
	(vqtbx2q_p8, vqtbx3_s8, vqtbx3_u8, vqtbx3_p8, vqtbx3q_s8)
	(vqtbx3q_u8, vqtbx3q_p8, vqtbx4_s8, vqtbx4_u8, vqtbx4_p8)
	(vqtbx4q_s8, vqtbx4q_u8, vqtbx4q_p8): Rewrite using builtin
	functions.
commit dedb311cc98bccd1633b77b60362e97dc8b9ce51
Author: Christophe Lyon <christophe.lyon@linaro.org>
Date:   Thu Nov 5 22:40:09 2015 +0100

    [AArch64] Fix vqtb[lx]X[q] on big-endian.

Comments

James Greenhalgh Nov. 6, 2015, 5:03 p.m. UTC | #1
On Fri, Nov 06, 2015 at 02:49:38PM +0100, Christophe Lyon wrote:
> [original message snipped]

Hi Christophe,

Thanks for this. With this patch I think we can finally say that
aarch64_be Neon intrinsics are in as good a state as aarch64 Neon
intrinsics. On our internal testsuite the pass rate is now equivalent
between the two. I'm very grateful for your work in this area!

This patch is OK for trunk.

Thanks again,
James

Christophe Lyon Nov. 6, 2015, 8:37 p.m. UTC | #2
On 6 November 2015 at 18:03, James Greenhalgh <james.greenhalgh@arm.com> wrote:
> On Fri, Nov 06, 2015 at 02:49:38PM +0100, Christophe Lyon wrote:
>> [original message snipped]
>
> Hi Christophe,
>
> Thanks for this. With this patch I think we can finally say that
> aarch64_be Neon intrinsics are in as good a state as aarch64 Neon
> intrinsics. On our internal testsuite the pass rate is now equivalent
> between the two. I'm very grateful for your work in this area!

Thanks for the quick review, committed as r229886.

We are still missing tests for many of the ARMv8 intrinsics. Writing
them would be a significant effort, and apparently not worth it since
you say your internal testsuite is now clean.

Actually, you say the pass rate is equivalent on little and
big-endian: does that mean it is not 100%?


James Greenhalgh Nov. 7, 2015, 8:44 a.m. UTC | #3
On Fri, Nov 06, 2015 at 09:37:17PM +0100, Christophe Lyon wrote:
> On 6 November 2015 at 18:03, James Greenhalgh <james.greenhalgh@arm.com> wrote:
> > On Fri, Nov 06, 2015 at 02:49:38PM +0100, Christophe Lyon wrote:
> >> [original message snipped]
> >
> > Hi Christophe,
> >
> > Thanks for this. With this patch I think we can finally say that
> > aarch64_be Neon intrinsics are in as good a state as aarch64 Neon
> > intrinsics. On our internal testsuite the pass rate is now equivalent
> > between the two. I'm very grateful for your work in this area!
> 
> Thanks for the quick review, committed as r229886.
> 
> We are still missing tests for many of the ARMv8 intrinsics. Writing
> them would be a significant effort, and apparently not worth it since
> you say your internal testsuite is now clean.

The internal testsuite is of no use to the rest of the community and is
unlikely to be feasible to submit upstream, so I wouldn't write off
extending the (excellent) set of GCC tests you've been adding so far
as "not worth it". Certainly they were a big help for the big-endian
work.

> Actually, you say the pass rate is equivalent on little and
> big-endian: does that mean it is not 100%?

Yes, I picked my words carefully :-)

The remaining failures are missing intrinsics and conformance issues when
the intrinsics are combined and folded. For an idea of what is missing,
take a look at the LLVM test-suite I pointed you at a few weeks ago:

    <llvm-testsuite>/SingleSource/UnitTests/Vector/AArch64/aarch64_neon_intrinsics.c

I'll try to get some of the "folding" examples into the upstream
Bugzilla - generally they are issues where the semantics of the
intrinsic are well defined for signed overflow, but our use of C
constructs means the midend considers signed overflow undefined, and
performs more aggressive optimisation.
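
As a hypothetical illustration (not one of the actual failing
intrinsics), the pattern is roughly this, in plain C:

#include <limits.h>

/* A saturating add written in C, handling positive overflow only to
   keep the sketch short.  The target instruction is well defined on
   overflow, but "a + b" on signed ints is undefined behaviour in C,
   so the midend may assume it cannot wrap, fold the check below to
   "always false", and delete the saturation path.  */
int
saturating_add (int a, int b)
{
  int sum = a + b;
  if (b > 0 && sum < a)	/* Overflow check the optimizer may remove.  */
    return INT_MAX;
  return sum;
}

Compiled with optimisation (and without -fwrapv), the compiler is
entitled to drop the "sum < a" test entirely, because b > 0 implies
a + b > a whenever the addition does not invoke undefined behaviour.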

Thanks,
James

Patch

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 654e963..594fc33 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -407,8 +407,26 @@ 
   VAR1 (BINOPP, crypto_pmull, 0, di)
   VAR1 (BINOPP, crypto_pmull, 0, v2di)
 
-  /* Implemented by aarch64_tbl3v8qi.  */
+  /* Implemented by aarch64_tbl3<mode>.  */
   VAR1 (BINOP, tbl3, 0, v8qi)
+  VAR1 (BINOP, tbl3, 0, v16qi)
 
-  /* Implemented by aarch64_tbx4v8qi.  */
+  /* Implemented by aarch64_qtbl3<mode>.  */
+  VAR1 (BINOP, qtbl3, 0, v8qi)
+  VAR1 (BINOP, qtbl3, 0, v16qi)
+
+  /* Implemented by aarch64_qtbl4<mode>.  */
+  VAR1 (BINOP, qtbl4, 0, v8qi)
+  VAR1 (BINOP, qtbl4, 0, v16qi)
+
+  /* Implemented by aarch64_tbx4<mode>.  */
   VAR1 (TERNOP, tbx4, 0, v8qi)
+  VAR1 (TERNOP, tbx4, 0, v16qi)
+
+  /* Implemented by aarch64_qtbx3<mode>.  */
+  VAR1 (TERNOP, qtbx3, 0, v8qi)
+  VAR1 (TERNOP, qtbx3, 0, v16qi)
+
+  /* Implemented by aarch64_qtbx4<mode>.  */
+  VAR1 (TERNOP, qtbx4, 0, v8qi)
+  VAR1 (TERNOP, qtbx4, 0, v16qi)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 65a2b6f..f330300 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4777,24 +4777,70 @@ 
   [(set_attr "type" "neon_tbl2_q")]
 )
 
-(define_insn "aarch64_tbl3v8qi"
-  [(set (match_operand:V8QI 0 "register_operand" "=w")
-	(unspec:V8QI [(match_operand:OI 1 "register_operand" "w")
-		      (match_operand:V8QI 2 "register_operand" "w")]
+(define_insn "aarch64_tbl3<mode>"
+  [(set (match_operand:VB 0 "register_operand" "=w")
+	(unspec:VB [(match_operand:OI 1 "register_operand" "w")
+		      (match_operand:VB 2 "register_operand" "w")]
 		      UNSPEC_TBL))]
   "TARGET_SIMD"
-  "tbl\\t%S0.8b, {%S1.16b - %T1.16b}, %S2.8b"
+  "tbl\\t%S0.<Vbtype>, {%S1.16b - %T1.16b}, %S2.<Vbtype>"
   [(set_attr "type" "neon_tbl3")]
 )
 
-(define_insn "aarch64_tbx4v8qi"
-  [(set (match_operand:V8QI 0 "register_operand" "=w")
-	(unspec:V8QI [(match_operand:V8QI 1 "register_operand" "0")
+(define_insn "aarch64_tbx4<mode>"
+  [(set (match_operand:VB 0 "register_operand" "=w")
+	(unspec:VB [(match_operand:VB 1 "register_operand" "0")
 		      (match_operand:OI 2 "register_operand" "w")
-		      (match_operand:V8QI 3 "register_operand" "w")]
+		      (match_operand:VB 3 "register_operand" "w")]
+		      UNSPEC_TBX))]
+  "TARGET_SIMD"
+  "tbx\\t%S0.<Vbtype>, {%S2.16b - %T2.16b}, %S3.<Vbtype>"
+  [(set_attr "type" "neon_tbl4")]
+)
+
+;; Three source registers.
+
+(define_insn "aarch64_qtbl3<mode>"
+  [(set (match_operand:VB 0 "register_operand" "=w")
+	(unspec:VB [(match_operand:CI 1 "register_operand" "w")
+		      (match_operand:VB 2 "register_operand" "w")]
+		      UNSPEC_TBL))]
+  "TARGET_SIMD"
+  "tbl\\t%S0.<Vbtype>, {%S1.16b - %U1.16b}, %S2.<Vbtype>"
+  [(set_attr "type" "neon_tbl3")]
+)
+
+(define_insn "aarch64_qtbx3<mode>"
+  [(set (match_operand:VB 0 "register_operand" "=w")
+	(unspec:VB [(match_operand:VB 1 "register_operand" "0")
+		      (match_operand:CI 2 "register_operand" "w")
+		      (match_operand:VB 3 "register_operand" "w")]
+		      UNSPEC_TBX))]
+  "TARGET_SIMD"
+  "tbx\\t%S0.<Vbtype>, {%S2.16b - %U2.16b}, %S3.<Vbtype>"
+  [(set_attr "type" "neon_tbl3")]
+)
+
+;; Four source registers.
+
+(define_insn "aarch64_qtbl4<mode>"
+  [(set (match_operand:VB 0 "register_operand" "=w")
+	(unspec:VB [(match_operand:XI 1 "register_operand" "w")
+		      (match_operand:VB 2 "register_operand" "w")]
+		      UNSPEC_TBL))]
+  "TARGET_SIMD"
+  "tbl\\t%S0.<Vbtype>, {%S1.16b - %V1.16b}, %S2.<Vbtype>"
+  [(set_attr "type" "neon_tbl4")]
+)
+
+(define_insn "aarch64_qtbx4<mode>"
+  [(set (match_operand:VB 0 "register_operand" "=w")
+	(unspec:VB [(match_operand:VB 1 "register_operand" "0")
+		      (match_operand:XI 2 "register_operand" "w")
+		      (match_operand:VB 3 "register_operand" "w")]
 		      UNSPEC_TBX))]
   "TARGET_SIMD"
-  "tbx\\t%S0.8b, {%S2.16b - %T2.16b}, %S3.8b"
+  "tbx\\t%S0.<Vbtype>, {%S2.16b - %V2.16b}, %S3.<Vbtype>"
   [(set_attr "type" "neon_tbl4")]
 )
 
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index e186348..039e777 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -10691,224 +10691,6 @@  vqtbl1q_u8 (uint8x16_t a, uint8x16_t b)
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
-vqtbl2_s8 (int8x16x2_t tab, uint8x8_t idx)
-{
-  int8x8_t result;
-  __asm__ ("ld1 {v16.16b, v17.16b}, %1\n\t"
-	   "tbl %0.8b, {v16.16b, v17.16b}, %2.8b\n\t"
-	   :"=w"(result)
-	   :"Q"(tab),"w"(idx)
-	   :"memory", "v16", "v17");
-  return result;
-}
-
-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
-vqtbl2_u8 (uint8x16x2_t tab, uint8x8_t idx)
-{
-  uint8x8_t result;
-  __asm__ ("ld1 {v16.16b, v17.16b}, %1\n\t"
-	   "tbl %0.8b, {v16.16b, v17.16b}, %2.8b\n\t"
-	   :"=w"(result)
-	   :"Q"(tab),"w"(idx)
-	   :"memory", "v16", "v17");
-  return result;
-}
-
-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
-vqtbl2_p8 (poly8x16x2_t tab, uint8x8_t idx)
-{
-  poly8x8_t result;
-  __asm__ ("ld1 {v16.16b, v17.16b}, %1\n\t"
-	   "tbl %0.8b, {v16.16b, v17.16b}, %2.8b\n\t"
-	   :"=w"(result)
-	   :"Q"(tab),"w"(idx)
-	   :"memory", "v16", "v17");
-  return result;
-}
-
-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
-vqtbl2q_s8 (int8x16x2_t tab, uint8x16_t idx)
-{
-  int8x16_t result;
-  __asm__ ("ld1 {v16.16b, v17.16b}, %1\n\t"
-	   "tbl %0.16b, {v16.16b, v17.16b}, %2.16b\n\t"
-	   :"=w"(result)
-	   :"Q"(tab),"w"(idx)
-	   :"memory", "v16", "v17");
-  return result;
-}
-
-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
-vqtbl2q_u8 (uint8x16x2_t tab, uint8x16_t idx)
-{
-  uint8x16_t result;
-  __asm__ ("ld1 {v16.16b, v17.16b}, %1\n\t"
-	   "tbl %0.16b, {v16.16b, v17.16b}, %2.16b\n\t"
-	   :"=w"(result)
-	   :"Q"(tab),"w"(idx)
-	   :"memory", "v16", "v17");
-  return result;
-}
-
-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
-vqtbl2q_p8 (poly8x16x2_t tab, uint8x16_t idx)
-{
-  poly8x16_t result;
-  __asm__ ("ld1 {v16.16b, v17.16b}, %1\n\t"
-	   "tbl %0.16b, {v16.16b, v17.16b}, %2.16b\n\t"
-	   :"=w"(result)
-	   :"Q"(tab),"w"(idx)
-	   :"memory", "v16", "v17");
-  return result;
-}
-
-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
-vqtbl3_s8 (int8x16x3_t tab, uint8x8_t idx)
-{
-  int8x8_t result;
-  __asm__ ("ld1 {v16.16b - v18.16b}, %1\n\t"
-	   "tbl %0.8b, {v16.16b - v18.16b}, %2.8b\n\t"
-	   :"=w"(result)
-	   :"Q"(tab),"w"(idx)
-	   :"memory", "v16", "v17", "v18");
-  return result;
-}
-
-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
-vqtbl3_u8 (uint8x16x3_t tab, uint8x8_t idx)
-{
-  uint8x8_t result;
-  __asm__ ("ld1 {v16.16b - v18.16b}, %1\n\t"
-	   "tbl %0.8b, {v16.16b - v18.16b}, %2.8b\n\t"
-	   :"=w"(result)
-	   :"Q"(tab),"w"(idx)
-	   :"memory", "v16", "v17", "v18");
-  return result;
-}
-
-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
-vqtbl3_p8 (poly8x16x3_t tab, uint8x8_t idx)
-{
-  poly8x8_t result;
-  __asm__ ("ld1 {v16.16b - v18.16b}, %1\n\t"
-	   "tbl %0.8b, {v16.16b - v18.16b}, %2.8b\n\t"
-	   :"=w"(result)
-	   :"Q"(tab),"w"(idx)
-	   :"memory", "v16", "v17", "v18");
-  return result;
-}
-
-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
-vqtbl3q_s8 (int8x16x3_t tab, uint8x16_t idx)
-{
-  int8x16_t result;
-  __asm__ ("ld1 {v16.16b - v18.16b}, %1\n\t"
-	   "tbl %0.16b, {v16.16b - v18.16b}, %2.16b\n\t"
-	   :"=w"(result)
-	   :"Q"(tab),"w"(idx)
-	   :"memory", "v16", "v17", "v18");
-  return result;
-}
-
-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
-vqtbl3q_u8 (uint8x16x3_t tab, uint8x16_t idx)
-{
-  uint8x16_t result;
-  __asm__ ("ld1 {v16.16b - v18.16b}, %1\n\t"
-	   "tbl %0.16b, {v16.16b - v18.16b}, %2.16b\n\t"
-	   :"=w"(result)
-	   :"Q"(tab),"w"(idx)
-	   :"memory", "v16", "v17", "v18");
-  return result;
-}
-
-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
-vqtbl3q_p8 (poly8x16x3_t tab, uint8x16_t idx)
-{
-  poly8x16_t result;
-  __asm__ ("ld1 {v16.16b - v18.16b}, %1\n\t"
-	   "tbl %0.16b, {v16.16b - v18.16b}, %2.16b\n\t"
-	   :"=w"(result)
-	   :"Q"(tab),"w"(idx)
-	   :"memory", "v16", "v17", "v18");
-  return result;
-}
-
-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
-vqtbl4_s8 (int8x16x4_t tab, uint8x8_t idx)
-{
-  int8x8_t result;
-  __asm__ ("ld1 {v16.16b - v19.16b}, %1\n\t"
-	   "tbl %0.8b, {v16.16b - v19.16b}, %2.8b\n\t"
-	   :"=w"(result)
-	   :"Q"(tab),"w"(idx)
-	   :"memory", "v16", "v17", "v18", "v19");
-  return result;
-}
-
-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
-vqtbl4_u8 (uint8x16x4_t tab, uint8x8_t idx)
-{
-  uint8x8_t result;
-  __asm__ ("ld1 {v16.16b - v19.16b}, %1\n\t"
-	   "tbl %0.8b, {v16.16b - v19.16b}, %2.8b\n\t"
-	   :"=w"(result)
-	   :"Q"(tab),"w"(idx)
-	   :"memory", "v16", "v17", "v18", "v19");
-  return result;
-}
-
-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
-vqtbl4_p8 (poly8x16x4_t tab, uint8x8_t idx)
-{
-  poly8x8_t result;
-  __asm__ ("ld1 {v16.16b - v19.16b}, %1\n\t"
-	   "tbl %0.8b, {v16.16b - v19.16b}, %2.8b\n\t"
-	   :"=w"(result)
-	   :"Q"(tab),"w"(idx)
-	   :"memory", "v16", "v17", "v18", "v19");
-  return result;
-}
-
-
-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
-vqtbl4q_s8 (int8x16x4_t tab, uint8x16_t idx)
-{
-  int8x16_t result;
-  __asm__ ("ld1 {v16.16b - v19.16b}, %1\n\t"
-	   "tbl %0.16b, {v16.16b - v19.16b}, %2.16b\n\t"
-	   :"=w"(result)
-	   :"Q"(tab),"w"(idx)
-	   :"memory", "v16", "v17", "v18", "v19");
-  return result;
-}
-
-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
-vqtbl4q_u8 (uint8x16x4_t tab, uint8x16_t idx)
-{
-  uint8x16_t result;
-  __asm__ ("ld1 {v16.16b - v19.16b}, %1\n\t"
-	   "tbl %0.16b, {v16.16b - v19.16b}, %2.16b\n\t"
-	   :"=w"(result)
-	   :"Q"(tab),"w"(idx)
-	   :"memory", "v16", "v17", "v18", "v19");
-  return result;
-}
-
-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
-vqtbl4q_p8 (poly8x16x4_t tab, uint8x16_t idx)
-{
-  poly8x16_t result;
-  __asm__ ("ld1 {v16.16b - v19.16b}, %1\n\t"
-	   "tbl %0.16b, {v16.16b - v19.16b}, %2.16b\n\t"
-	   :"=w"(result)
-	   :"Q"(tab),"w"(idx)
-	   :"memory", "v16", "v17", "v18", "v19");
-  return result;
-}
-
-
-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vqtbx1_s8 (int8x8_t r, int8x16_t tab, uint8x8_t idx)
 {
   int8x8_t result = r;
@@ -10974,227 +10756,6 @@  vqtbx1q_p8 (poly8x16_t r, poly8x16_t tab, uint8x16_t idx)
   return result;
 }
 
-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
-vqtbx2_s8 (int8x8_t r, int8x16x2_t tab, uint8x8_t idx)
-{
-  int8x8_t result = r;
-  __asm__ ("ld1 {v16.16b, v17.16b}, %1\n\t"
-	   "tbx %0.8b, {v16.16b, v17.16b}, %2.8b\n\t"
-	   :"+w"(result)
-	   :"Q"(tab),"w"(idx)
-	   :"memory", "v16", "v17");
-  return result;
-}
-
-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
-vqtbx2_u8 (uint8x8_t r, uint8x16x2_t tab, uint8x8_t idx)
-{
-  uint8x8_t result = r;
-  __asm__ ("ld1 {v16.16b, v17.16b}, %1\n\t"
-	   "tbx %0.8b, {v16.16b, v17.16b}, %2.8b\n\t"
-	   :"+w"(result)
-	   :"Q"(tab),"w"(idx)
-	   :"memory", "v16", "v17");
-  return result;
-}
-
-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
-vqtbx2_p8 (poly8x8_t r, poly8x16x2_t tab, uint8x8_t idx)
-{
-  poly8x8_t result = r;
-  __asm__ ("ld1 {v16.16b, v17.16b}, %1\n\t"
-	   "tbx %0.8b, {v16.16b, v17.16b}, %2.8b\n\t"
-	   :"+w"(result)
-	   :"Q"(tab),"w"(idx)
-	   :"memory", "v16", "v17");
-  return result;
-}
-
-
-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
-vqtbx2q_s8 (int8x16_t r, int8x16x2_t tab, uint8x16_t idx)
-{
-  int8x16_t result = r;
-  __asm__ ("ld1 {v16.16b, v17.16b}, %1\n\t"
-	   "tbx %0.16b, {v16.16b, v17.16b}, %2.16b\n\t"
-	   :"+w"(result)
-	   :"Q"(tab),"w"(idx)
-	   :"memory", "v16", "v17");
-  return result;
-}
-
-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
-vqtbx2q_u8 (uint8x16_t r, uint8x16x2_t tab, uint8x16_t idx)
-{
-  uint8x16_t result = r;
-  __asm__ ("ld1 {v16.16b, v17.16b}, %1\n\t"
-	   "tbx %0.16b, {v16.16b, v17.16b}, %2.16b\n\t"
-	   :"+w"(result)
-	   :"Q"(tab),"w"(idx)
-	   :"memory", "v16", "v17");
-  return result;
-}
-
-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
-vqtbx2q_p8 (poly8x16_t r, poly8x16x2_t tab, uint8x16_t idx)
-{
-  poly8x16_t result = r;
-  __asm__ ("ld1 {v16.16b, v17.16b}, %1\n\t"
-	   "tbx %0.16b, {v16.16b, v17.16b}, %2.16b\n\t"
-	   :"+w"(result)
-	   :"Q"(tab),"w"(idx)
-	   :"memory", "v16", "v17");
-  return result;
-}
-
-
-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
-vqtbx3_s8 (int8x8_t r, int8x16x3_t tab, uint8x8_t idx)
-{
-  int8x8_t result = r;
-  __asm__ ("ld1 {v16.16b - v18.16b}, %1\n\t"
-	   "tbx %0.8b, {v16.16b - v18.16b}, %2.8b\n\t"
-	   :"+w"(result)
-	   :"Q"(tab),"w"(idx)
-	   :"memory", "v16", "v17", "v18");
-  return result;
-}
-
-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
-vqtbx3_u8 (uint8x8_t r, uint8x16x3_t tab, uint8x8_t idx)
-{
-  uint8x8_t result = r;
-  __asm__ ("ld1 {v16.16b - v18.16b}, %1\n\t"
-	   "tbx %0.8b, {v16.16b - v18.16b}, %2.8b\n\t"
-	   :"+w"(result)
-	   :"Q"(tab),"w"(idx)
-	   :"memory", "v16", "v17", "v18");
-  return result;
-}
-
-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
-vqtbx3_p8 (poly8x8_t r, poly8x16x3_t tab, uint8x8_t idx)
-{
-  poly8x8_t result = r;
-  __asm__ ("ld1 {v16.16b - v18.16b}, %1\n\t"
-	   "tbx %0.8b, {v16.16b - v18.16b}, %2.8b\n\t"
-	   :"+w"(result)
-	   :"Q"(tab),"w"(idx)
-	   :"memory", "v16", "v17", "v18");
-  return result;
-}
-
-
-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
-vqtbx3q_s8 (int8x16_t r, int8x16x3_t tab, uint8x16_t idx)
-{
-  int8x16_t result = r;
-  __asm__ ("ld1 {v16.16b - v18.16b}, %1\n\t"
-	   "tbx %0.16b, {v16.16b - v18.16b}, %2.16b\n\t"
-	   :"+w"(result)
-	   :"Q"(tab),"w"(idx)
-	   :"memory", "v16", "v17", "v18");
-  return result;
-}
-
-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
-vqtbx3q_u8 (uint8x16_t r, uint8x16x3_t tab, uint8x16_t idx)
-{
-  uint8x16_t result = r;
-  __asm__ ("ld1 {v16.16b - v18.16b}, %1\n\t"
-	   "tbx %0.16b, {v16.16b - v18.16b}, %2.16b\n\t"
-	   :"+w"(result)
-	   :"Q"(tab),"w"(idx)
-	   :"memory", "v16", "v17", "v18");
-  return result;
-}
-
-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
-vqtbx3q_p8 (poly8x16_t r, poly8x16x3_t tab, uint8x16_t idx)
-{
-  poly8x16_t result = r;
-  __asm__ ("ld1 {v16.16b - v18.16b}, %1\n\t"
-	   "tbx %0.16b, {v16.16b - v18.16b}, %2.16b\n\t"
-	   :"+w"(result)
-	   :"Q"(tab),"w"(idx)
-	   :"memory", "v16", "v17", "v18");
-  return result;
-}
-
-
-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
-vqtbx4_s8 (int8x8_t r, int8x16x4_t tab, uint8x8_t idx)
-{
-  int8x8_t result = r;
-  __asm__ ("ld1 {v16.16b - v19.16b}, %1\n\t"
-	   "tbx %0.8b, {v16.16b - v19.16b}, %2.8b\n\t"
-	   :"+w"(result)
-	   :"Q"(tab),"w"(idx)
-	   :"memory", "v16", "v17", "v18", "v19");
-  return result;
-}
-
-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
-vqtbx4_u8 (uint8x8_t r, uint8x16x4_t tab, uint8x8_t idx)
-{
-  uint8x8_t result = r;
-  __asm__ ("ld1 {v16.16b - v19.16b}, %1\n\t"
-	   "tbx %0.8b, {v16.16b - v19.16b}, %2.8b\n\t"
-	   :"+w"(result)
-	   :"Q"(tab),"w"(idx)
-	   :"memory", "v16", "v17", "v18", "v19");
-  return result;
-}
-
-__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
-vqtbx4_p8 (poly8x8_t r, poly8x16x4_t tab, uint8x8_t idx)
-{
-  poly8x8_t result = r;
-  __asm__ ("ld1 {v16.16b - v19.16b}, %1\n\t"
-	   "tbx %0.8b, {v16.16b - v19.16b}, %2.8b\n\t"
-	   :"+w"(result)
-	   :"Q"(tab),"w"(idx)
-	   :"memory", "v16", "v17", "v18", "v19");
-  return result;
-}
-
-
-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
-vqtbx4q_s8 (int8x16_t r, int8x16x4_t tab, uint8x16_t idx)
-{
-  int8x16_t result = r;
-  __asm__ ("ld1 {v16.16b - v19.16b}, %1\n\t"
-	   "tbx %0.16b, {v16.16b - v19.16b}, %2.16b\n\t"
-	   :"+w"(result)
-	   :"Q"(tab),"w"(idx)
-	   :"memory", "v16", "v17", "v18", "v19");
-  return result;
-}
-
-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
-vqtbx4q_u8 (uint8x16_t r, uint8x16x4_t tab, uint8x16_t idx)
-{
-  uint8x16_t result = r;
-  __asm__ ("ld1 {v16.16b - v19.16b}, %1\n\t"
-	   "tbx %0.16b, {v16.16b - v19.16b}, %2.16b\n\t"
-	   :"+w"(result)
-	   :"Q"(tab),"w"(idx)
-	   :"memory", "v16", "v17", "v18", "v19");
-  return result;
-}
-
-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
-vqtbx4q_p8 (poly8x16_t r, poly8x16x4_t tab, uint8x16_t idx)
-{
-  poly8x16_t result = r;
-  __asm__ ("ld1 {v16.16b - v19.16b}, %1\n\t"
-	   "tbx %0.16b, {v16.16b - v19.16b}, %2.16b\n\t"
-	   :"+w"(result)
-	   :"Q"(tab),"w"(idx)
-	   :"memory", "v16", "v17", "v18", "v19");
-  return result;
-}
-
 /* V7 legacy table intrinsics.  */
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
@@ -20745,6 +20306,389 @@  vqsubd_u64 (uint64_t __a, uint64_t __b)
   return __builtin_aarch64_uqsubdi_uuu (__a, __b);
 }
 
+/* vqtbl2 */
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vqtbl2_s8 (int8x16x2_t tab, uint8x8_t idx)
+{
+  __builtin_aarch64_simd_oi __o;
+  __o = __builtin_aarch64_set_qregoiv16qi (__o, tab.val[0], 0);
+  __o = __builtin_aarch64_set_qregoiv16qi (__o, tab.val[1], 1);
+  return __builtin_aarch64_tbl3v8qi (__o, (int8x8_t)idx);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vqtbl2_u8 (uint8x16x2_t tab, uint8x8_t idx)
+{
+  __builtin_aarch64_simd_oi __o;
+  __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[0], 0);
+  __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[1], 1);
+  return (uint8x8_t)__builtin_aarch64_tbl3v8qi (__o, (int8x8_t)idx);
+}
+
+__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+vqtbl2_p8 (poly8x16x2_t tab, uint8x8_t idx)
+{
+  __builtin_aarch64_simd_oi __o;
+  __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[0], 0);
+  __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[1], 1);
+  return (poly8x8_t)__builtin_aarch64_tbl3v8qi (__o, (int8x8_t)idx);
+}
+
+__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+vqtbl2q_s8 (int8x16x2_t tab, uint8x16_t idx)
+{
+  __builtin_aarch64_simd_oi __o;
+  __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[0], 0);
+  __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[1], 1);
+  return __builtin_aarch64_tbl3v16qi (__o, (int8x16_t)idx);
+}
+
+__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+vqtbl2q_u8 (uint8x16x2_t tab, uint8x16_t idx)
+{
+  __builtin_aarch64_simd_oi __o;
+  __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[0], 0);
+  __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[1], 1);
+  return (uint8x16_t)__builtin_aarch64_tbl3v16qi (__o, (int8x16_t)idx);
+}
+
+__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+vqtbl2q_p8 (poly8x16x2_t tab, uint8x16_t idx)
+{
+  __builtin_aarch64_simd_oi __o;
+  __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[0], 0);
+  __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[1], 1);
+  return (poly8x16_t)__builtin_aarch64_tbl3v16qi (__o, (int8x16_t)idx);
+}
+
+/* vqtbl3 */
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vqtbl3_s8 (int8x16x3_t tab, uint8x8_t idx)
+{
+  __builtin_aarch64_simd_ci __o;
+  __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[0], 0);
+  __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[1], 1);
+  __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[2], 2);
+  return __builtin_aarch64_qtbl3v8qi (__o, (int8x8_t)idx);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vqtbl3_u8 (uint8x16x3_t tab, uint8x8_t idx)
+{
+  __builtin_aarch64_simd_ci __o;
+  __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[0], 0);
+  __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[1], 1);
+  __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[2], 2);
+  return (uint8x8_t)__builtin_aarch64_qtbl3v8qi (__o, (int8x8_t)idx);
+}
+
+__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+vqtbl3_p8 (poly8x16x3_t tab, uint8x8_t idx)
+{
+  __builtin_aarch64_simd_ci __o;
+  __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[0], 0);
+  __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[1], 1);
+  __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[2], 2);
+  return (poly8x8_t)__builtin_aarch64_qtbl3v8qi (__o, (int8x8_t)idx);
+}
+
+__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+vqtbl3q_s8 (int8x16x3_t tab, uint8x16_t idx)
+{
+  __builtin_aarch64_simd_ci __o;
+  __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[0], 0);
+  __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[1], 1);
+  __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[2], 2);
+  return __builtin_aarch64_qtbl3v16qi (__o, (int8x16_t)idx);
+}
+
+__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+vqtbl3q_u8 (uint8x16x3_t tab, uint8x16_t idx)
+{
+  __builtin_aarch64_simd_ci __o;
+  __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[0], 0);
+  __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[1], 1);
+  __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[2], 2);
+  return (uint8x16_t)__builtin_aarch64_qtbl3v16qi (__o, (int8x16_t)idx);
+}
+
+__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+vqtbl3q_p8 (poly8x16x3_t tab, uint8x16_t idx)
+{
+  __builtin_aarch64_simd_ci __o;
+  __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[0], 0);
+  __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[1], 1);
+  __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[2], 2);
+  return (poly8x16_t)__builtin_aarch64_qtbl3v16qi (__o, (int8x16_t)idx);
+}
+
+/* vqtbl4 */
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vqtbl4_s8 (int8x16x4_t tab, uint8x8_t idx)
+{
+  __builtin_aarch64_simd_xi __o;
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[0], 0);
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[1], 1);
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[2], 2);
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[3], 3);
+  return __builtin_aarch64_qtbl4v8qi (__o, (int8x8_t)idx);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vqtbl4_u8 (uint8x16x4_t tab, uint8x8_t idx)
+{
+  __builtin_aarch64_simd_xi __o;
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[0], 0);
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[1], 1);
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[2], 2);
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[3], 3);
+  return (uint8x8_t)__builtin_aarch64_qtbl4v8qi (__o, (int8x8_t)idx);
+}
+
+__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+vqtbl4_p8 (poly8x16x4_t tab, uint8x8_t idx)
+{
+  __builtin_aarch64_simd_xi __o;
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[0], 0);
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[1], 1);
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[2], 2);
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[3], 3);
+  return (poly8x8_t)__builtin_aarch64_qtbl4v8qi (__o, (int8x8_t)idx);
+}
+
+__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+vqtbl4q_s8 (int8x16x4_t tab, uint8x16_t idx)
+{
+  __builtin_aarch64_simd_xi __o;
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[0], 0);
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[1], 1);
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[2], 2);
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[3], 3);
+  return __builtin_aarch64_qtbl4v16qi (__o, (int8x16_t)idx);
+}
+
+__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+vqtbl4q_u8 (uint8x16x4_t tab, uint8x16_t idx)
+{
+  __builtin_aarch64_simd_xi __o;
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[0], 0);
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[1], 1);
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[2], 2);
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[3], 3);
+  return (uint8x16_t)__builtin_aarch64_qtbl4v16qi (__o, (int8x16_t)idx);
+}
+
+__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+vqtbl4q_p8 (poly8x16x4_t tab, uint8x16_t idx)
+{
+  __builtin_aarch64_simd_xi __o;
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[0], 0);
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[1], 1);
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[2], 2);
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[3], 3);
+  return (poly8x16_t)__builtin_aarch64_qtbl4v16qi (__o, (int8x16_t)idx);
+}
+
+
+/* vqtbx2 */
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vqtbx2_s8 (int8x8_t r, int8x16x2_t tab, uint8x8_t idx)
+{
+  __builtin_aarch64_simd_oi __o;
+  __o = __builtin_aarch64_set_qregoiv16qi (__o, tab.val[0], 0);
+  __o = __builtin_aarch64_set_qregoiv16qi (__o, tab.val[1], 1);
+  return __builtin_aarch64_tbx4v8qi (r, __o, (int8x8_t)idx);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vqtbx2_u8 (uint8x8_t r, uint8x16x2_t tab, uint8x8_t idx)
+{
+  __builtin_aarch64_simd_oi __o;
+  __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[0], 0);
+  __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[1], 1);
+  return (uint8x8_t)__builtin_aarch64_tbx4v8qi ((int8x8_t)r, __o,
+						(int8x8_t)idx);
+}
+
+__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+vqtbx2_p8 (poly8x8_t r, poly8x16x2_t tab, uint8x8_t idx)
+{
+  __builtin_aarch64_simd_oi __o;
+  __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[0], 0);
+  __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[1], 1);
+  return (poly8x8_t)__builtin_aarch64_tbx4v8qi ((int8x8_t)r, __o,
+						(int8x8_t)idx);
+}
+
+__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+vqtbx2q_s8 (int8x16_t r, int8x16x2_t tab, uint8x16_t idx)
+{
+  __builtin_aarch64_simd_oi __o;
+  __o = __builtin_aarch64_set_qregoiv16qi (__o, tab.val[0], 0);
+  __o = __builtin_aarch64_set_qregoiv16qi (__o, tab.val[1], 1);
+  return __builtin_aarch64_tbx4v16qi (r, __o, (int8x16_t)idx);
+}
+
+__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+vqtbx2q_u8 (uint8x16_t r, uint8x16x2_t tab, uint8x16_t idx)
+{
+  __builtin_aarch64_simd_oi __o;
+  __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[0], 0);
+  __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[1], 1);
+  return (uint8x16_t)__builtin_aarch64_tbx4v16qi ((int8x16_t)r, __o,
+						  (int8x16_t)idx);
+}
+
+__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+vqtbx2q_p8 (poly8x16_t r, poly8x16x2_t tab, uint8x16_t idx)
+{
+  __builtin_aarch64_simd_oi __o;
+  __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[0], 0);
+  __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t)tab.val[1], 1);
+  return (poly8x16_t)__builtin_aarch64_tbx4v16qi ((int8x16_t)r, __o,
+						  (int8x16_t)idx);
+}
+
+/* vqtbx3 */
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vqtbx3_s8 (int8x8_t r, int8x16x3_t tab, uint8x8_t idx)
+{
+  __builtin_aarch64_simd_ci __o;
+  __o = __builtin_aarch64_set_qregciv16qi (__o, tab.val[0], 0);
+  __o = __builtin_aarch64_set_qregciv16qi (__o, tab.val[1], 1);
+  __o = __builtin_aarch64_set_qregciv16qi (__o, tab.val[2], 2);
+  return __builtin_aarch64_qtbx3v8qi (r, __o, (int8x8_t)idx);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vqtbx3_u8 (uint8x8_t r, uint8x16x3_t tab, uint8x8_t idx)
+{
+  __builtin_aarch64_simd_ci __o;
+  __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[0], 0);
+  __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[1], 1);
+  __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[2], 2);
+  return (uint8x8_t)__builtin_aarch64_qtbx3v8qi ((int8x8_t)r, __o,
+						 (int8x8_t)idx);
+}
+
+__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+vqtbx3_p8 (poly8x8_t r, poly8x16x3_t tab, uint8x8_t idx)
+{
+  __builtin_aarch64_simd_ci __o;
+  __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[0], 0);
+  __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[1], 1);
+  __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[2], 2);
+  return (poly8x8_t)__builtin_aarch64_qtbx3v8qi ((int8x8_t)r, __o,
+						 (int8x8_t)idx);
+}
+
+__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+vqtbx3q_s8 (int8x16_t r, int8x16x3_t tab, uint8x16_t idx)
+{
+  __builtin_aarch64_simd_ci __o;
+  __o = __builtin_aarch64_set_qregciv16qi (__o, tab.val[0], 0);
+  __o = __builtin_aarch64_set_qregciv16qi (__o, tab.val[1], 1);
+  __o = __builtin_aarch64_set_qregciv16qi (__o, tab.val[2], 2);
+  return __builtin_aarch64_qtbx3v16qi (r, __o, (int8x16_t)idx);
+}
+
+__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+vqtbx3q_u8 (uint8x16_t r, uint8x16x3_t tab, uint8x16_t idx)
+{
+  __builtin_aarch64_simd_ci __o;
+  __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[0], 0);
+  __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[1], 1);
+  __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[2], 2);
+  return (uint8x16_t)__builtin_aarch64_qtbx3v16qi ((int8x16_t)r, __o,
+						   (int8x16_t)idx);
+}
+
+__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+vqtbx3q_p8 (poly8x16_t r, poly8x16x3_t tab, uint8x16_t idx)
+{
+  __builtin_aarch64_simd_ci __o;
+  __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[0], 0);
+  __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[1], 1);
+  __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t)tab.val[2], 2);
+  return (poly8x16_t)__builtin_aarch64_qtbx3v16qi ((int8x16_t)r, __o,
+						   (int8x16_t)idx);
+}
+
+/* vqtbx4 */
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vqtbx4_s8 (int8x8_t r, int8x16x4_t tab, uint8x8_t idx)
+{
+  __builtin_aarch64_simd_xi __o;
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, tab.val[0], 0);
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, tab.val[1], 1);
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, tab.val[2], 2);
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, tab.val[3], 3);
+  return __builtin_aarch64_qtbx4v8qi (r, __o, (int8x8_t)idx);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vqtbx4_u8 (uint8x8_t r, uint8x16x4_t tab, uint8x8_t idx)
+{
+  __builtin_aarch64_simd_xi __o;
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[0], 0);
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[1], 1);
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[2], 2);
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[3], 3);
+  return (uint8x8_t)__builtin_aarch64_qtbx4v8qi ((int8x8_t)r, __o,
+						 (int8x8_t)idx);
+}
+
+__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+vqtbx4_p8 (poly8x8_t r, poly8x16x4_t tab, uint8x8_t idx)
+{
+  __builtin_aarch64_simd_xi __o;
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[0], 0);
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[1], 1);
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[2], 2);
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[3], 3);
+  return (poly8x8_t)__builtin_aarch64_qtbx4v8qi ((int8x8_t)r, __o,
+						 (int8x8_t)idx);
+}
+
+__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+vqtbx4q_s8 (int8x16_t r, int8x16x4_t tab, uint8x16_t idx)
+{
+  __builtin_aarch64_simd_xi __o;
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, tab.val[0], 0);
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, tab.val[1], 1);
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, tab.val[2], 2);
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, tab.val[3], 3);
+  return __builtin_aarch64_qtbx4v16qi (r, __o, (int8x16_t)idx);
+}
+
+__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+vqtbx4q_u8 (uint8x16_t r, uint8x16x4_t tab, uint8x16_t idx)
+{
+  __builtin_aarch64_simd_xi __o;
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[0], 0);
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[1], 1);
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[2], 2);
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[3], 3);
+  return (uint8x16_t)__builtin_aarch64_qtbx4v16qi ((int8x16_t)r, __o,
+						   (int8x16_t)idx);
+}
+
+__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+vqtbx4q_p8 (poly8x16_t r, poly8x16x4_t tab, uint8x16_t idx)
+{
+  __builtin_aarch64_simd_xi __o;
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[0], 0);
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[1], 1);
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[2], 2);
+  __o = __builtin_aarch64_set_qregxiv16qi (__o, (int8x16_t)tab.val[3], 3);
+  return (poly8x16_t)__builtin_aarch64_qtbx4v16qi ((int8x16_t)r, __o,
+						   (int8x16_t)idx);
+}
+
 /* vrbit  */
 
 __extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqtbX.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqtbX.c
new file mode 100644
index 0000000..129ceaf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqtbX.c
@@ -0,0 +1,519 @@ 
+/* { dg-skip-if "" { arm*-*-* } } */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected results for vqtbl1.  */
+VECT_VAR_DECL(expected_vqtbl1,int,8,8) [] = { 0xfb, 0xf3, 0xf3, 0xf3,
+					      0x0, 0x0, 0xf3, 0xf3 };
+VECT_VAR_DECL(expected_vqtbl1,uint,8,8) [] = { 0xfb, 0xf3, 0xf3, 0xf3,
+					       0x0, 0x0, 0xf3, 0xf3 };
+VECT_VAR_DECL(expected_vqtbl1,poly,8,8) [] = { 0xfb, 0xf3, 0xf3, 0xf3,
+					       0x0, 0x0, 0xf3, 0xf3 };
+
+/* Expected results for vqtbl2.  */
+VECT_VAR_DECL(expected_vqtbl2,int,8,8) [] = { 0x5, 0xf5, 0xf5, 0xf5,
+					      0xfa, 0x0, 0xf5, 0xf5 };
+VECT_VAR_DECL(expected_vqtbl2,uint,8,8) [] = { 0x5, 0xf5, 0xf5, 0xf5,
+					       0xfa, 0x0, 0xf5, 0xf5 };
+VECT_VAR_DECL(expected_vqtbl2,poly,8,8) [] = { 0x5, 0xf5, 0xf5, 0xf5,
+					       0xfa, 0x0, 0xf5, 0xf5 };
+
+/* Expected results for vqtbl3.  */
+VECT_VAR_DECL(expected_vqtbl3,int,8,8) [] = { 0xf, 0xf7, 0xf7, 0xf7,
+					      0xfe, 0xb, 0xf7, 0xf7 };
+VECT_VAR_DECL(expected_vqtbl3,uint,8,8) [] = { 0xf, 0xf7, 0xf7, 0xf7,
+					       0xfe, 0xb, 0xf7, 0xf7 };
+VECT_VAR_DECL(expected_vqtbl3,poly,8,8) [] = { 0xf, 0xf7, 0xf7, 0xf7,
+					       0xfe, 0xb, 0xf7, 0xf7 };
+
+/* Expected results for vqtbl4.  */
+VECT_VAR_DECL(expected_vqtbl4,int,8,8) [] = { 0x19, 0xf9, 0xf9, 0xf9,
+					      0x2, 0x13, 0xf9, 0xf9 };
+VECT_VAR_DECL(expected_vqtbl4,uint,8,8) [] = { 0x19, 0xf9, 0xf9, 0xf9,
+					       0x2, 0x13, 0xf9, 0xf9 };
+VECT_VAR_DECL(expected_vqtbl4,poly,8,8) [] = { 0x19, 0xf9, 0xf9, 0xf9,
+					       0x2, 0x13, 0xf9, 0xf9 };
+
+/* Expected results for vqtbx1.  */
+VECT_VAR_DECL(expected_vqtbx1,int,8,8) [] = { 0xfb, 0xf3, 0xf3, 0xf3,
+					      0x33, 0x33, 0xf3, 0xf3 };
+VECT_VAR_DECL(expected_vqtbx1,uint,8,8) [] = { 0xfb, 0xf3, 0xf3, 0xf3,
+					       0xcc, 0xcc, 0xf3, 0xf3 };
+VECT_VAR_DECL(expected_vqtbx1,poly,8,8) [] = { 0xfb, 0xf3, 0xf3, 0xf3,
+					       0xcc, 0xcc, 0xf3, 0xf3 };
+
+/* Expected results for vqtbx2.  */
+VECT_VAR_DECL(expected_vqtbx2,int,8,8) [] = { 0x5, 0xf5, 0xf5, 0xf5,
+					      0xfa, 0x33, 0xf5, 0xf5 };
+VECT_VAR_DECL(expected_vqtbx2,uint,8,8) [] = { 0x5, 0xf5, 0xf5, 0xf5,
+					       0xfa, 0xcc, 0xf5, 0xf5 };
+VECT_VAR_DECL(expected_vqtbx2,poly,8,8) [] = { 0x5, 0xf5, 0xf5, 0xf5,
+					       0xfa, 0xcc, 0xf5, 0xf5 };
+
+/* Expected results for vqtbx3.  */
+VECT_VAR_DECL(expected_vqtbx3,int,8,8) [] = { 0xf, 0xf7, 0xf7, 0xf7,
+					      0xfe, 0xb, 0xf7, 0xf7 };
+VECT_VAR_DECL(expected_vqtbx3,uint,8,8) [] = { 0xf, 0xf7, 0xf7, 0xf7,
+					       0xfe, 0xb, 0xf7, 0xf7 };
+VECT_VAR_DECL(expected_vqtbx3,poly,8,8) [] = { 0xf, 0xf7, 0xf7, 0xf7,
+					       0xfe, 0xb, 0xf7, 0xf7 };
+
+/* Expected results for vqtbx4.  */
+VECT_VAR_DECL(expected_vqtbx4,int,8,8) [] = { 0x19, 0xf9, 0xf9, 0xf9,
+					      0x2, 0x13, 0xf9, 0xf9 };
+VECT_VAR_DECL(expected_vqtbx4,uint,8,8) [] = { 0x19, 0xf9, 0xf9, 0xf9,
+					       0x2, 0x13, 0xf9, 0xf9 };
+VECT_VAR_DECL(expected_vqtbx4,poly,8,8) [] = { 0x19, 0xf9, 0xf9, 0xf9,
+					       0x2, 0x13, 0xf9, 0xf9 };
+
+/* Expected results for vqtbl1q.  */
+VECT_VAR_DECL(expected_vqtbl1q,int,8,16) [] = { 0xfb, 0xf3, 0xf3, 0xf3,
+						0x0, 0x0, 0xf3, 0xf3,
+						0xf3, 0xf3, 0xf3, 0xf3,
+						0xf3, 0xf3, 0xf3, 0xf3 };
+VECT_VAR_DECL(expected_vqtbl1q,uint,8,16) [] = { 0xfb, 0xf3, 0xf3, 0xf3,
+						 0x0, 0x0, 0xf3, 0xf3,
+						 0xf3, 0xf3, 0xf3, 0xf3,
+						 0xf3, 0xf3, 0xf3, 0xf3 };
+VECT_VAR_DECL(expected_vqtbl1q,poly,8,16) [] = { 0xfb, 0xf3, 0xf3, 0xf3,
+						 0x0, 0x0, 0xf3, 0xf3,
+						 0xf3, 0xf3, 0xf3, 0xf3,
+						 0xf3, 0xf3, 0xf3, 0xf3 };
+
+/* Expected results for vqtbl2q.  */
+VECT_VAR_DECL(expected_vqtbl2q,int,8,16) [] = { 0x5, 0xf5, 0xf5, 0xf5,
+						0xfa, 0x0, 0xf5, 0xf5,
+						0xf5, 0xf5, 0xf5, 0xf5,
+						0xf5, 0xf5, 0xf5, 0xf5 };
+VECT_VAR_DECL(expected_vqtbl2q,uint,8,16) [] = { 0x5, 0xf5, 0xf5, 0xf5,
+						 0xfa, 0x0, 0xf5, 0xf5,
+						 0xf5, 0xf5, 0xf5, 0xf5,
+						 0xf5, 0xf5, 0xf5, 0xf5 };
+VECT_VAR_DECL(expected_vqtbl2q,poly,8,16) [] = { 0x5, 0xf5, 0xf5, 0xf5,
+						 0xfa, 0x0, 0xf5, 0xf5,
+						 0xf5, 0xf5, 0xf5, 0xf5,
+						 0xf5, 0xf5, 0xf5, 0xf5 };
+
+/* Expected results for vqtbl3q.  */
+VECT_VAR_DECL(expected_vqtbl3q,int,8,16) [] = { 0xf, 0xf7, 0xf7, 0xf7,
+						0xfe, 0xb, 0xf7, 0xf7,
+						0xf7, 0xf7, 0xf7, 0xf7,
+						0xf7, 0xf7, 0xf7, 0xf7 };
+VECT_VAR_DECL(expected_vqtbl3q,uint,8,16) [] = { 0xf, 0xf7, 0xf7, 0xf7,
+						 0xfe, 0xb, 0xf7, 0xf7,
+						 0xf7, 0xf7, 0xf7, 0xf7,
+						 0xf7, 0xf7, 0xf7, 0xf7 };
+VECT_VAR_DECL(expected_vqtbl3q,poly,8,16) [] = { 0xf, 0xf7, 0xf7, 0xf7,
+						 0xfe, 0xb, 0xf7, 0xf7,
+						 0xf7, 0xf7, 0xf7, 0xf7,
+						 0xf7, 0xf7, 0xf7, 0xf7 };
+
+/* Expected results for vqtbl4q.  */
+VECT_VAR_DECL(expected_vqtbl4q,int,8,16) [] = { 0x19, 0xf9, 0xf9, 0xf9,
+						0x2, 0x13, 0xf9, 0xf9,
+						0xf9, 0xf9, 0xf9, 0xf9,
+						0xf9, 0xf9, 0xf9, 0xf9 };
+VECT_VAR_DECL(expected_vqtbl4q,uint,8,16) [] = { 0x19, 0xf9, 0xf9, 0xf9,
+						 0x2, 0x13, 0xf9, 0xf9,
+						 0xf9, 0xf9, 0xf9, 0xf9,
+						 0xf9, 0xf9, 0xf9, 0xf9 };
+VECT_VAR_DECL(expected_vqtbl4q,poly,8,16) [] = { 0x19, 0xf9, 0xf9, 0xf9,
+						 0x2, 0x13, 0xf9, 0xf9,
+						 0xf9, 0xf9, 0xf9, 0xf9,
+						 0xf9, 0xf9, 0xf9, 0xf9 };
+
+/* Expected results for vqtbx1q.  */
+VECT_VAR_DECL(expected_vqtbx1q,int,8,16) [] = { 0xfb, 0xf3, 0xf3, 0xf3,
+						0x33, 0x33, 0xf3, 0xf3,
+						0xf3, 0xf3, 0xf3, 0xf3,
+						0xf3, 0xf3, 0xf3, 0xf3 };
+VECT_VAR_DECL(expected_vqtbx1q,uint,8,16) [] = { 0xfb, 0xf3, 0xf3, 0xf3,
+						 0xcc, 0xcc, 0xf3, 0xf3,
+						 0xf3, 0xf3, 0xf3, 0xf3,
+						 0xf3, 0xf3, 0xf3, 0xf3 };
+VECT_VAR_DECL(expected_vqtbx1q,poly,8,16) [] = { 0xfb, 0xf3, 0xf3, 0xf3,
+						 0xcc, 0xcc, 0xf3, 0xf3,
+						 0xf3, 0xf3, 0xf3, 0xf3,
+						 0xf3, 0xf3, 0xf3, 0xf3 };
+
+/* Expected results for vqtbx2q.  */
+VECT_VAR_DECL(expected_vqtbx2q,int,8,16) [] = { 0x5, 0xf5, 0xf5, 0xf5,
+						0xfa, 0x33, 0xf5, 0xf5,
+						0xf5, 0xf5, 0xf5, 0xf5,
+						0xf5, 0xf5, 0xf5, 0xf5 };
+VECT_VAR_DECL(expected_vqtbx2q,uint,8,16) [] = { 0x5, 0xf5, 0xf5, 0xf5,
+						 0xfa, 0xcc, 0xf5, 0xf5,
+						 0xf5, 0xf5, 0xf5, 0xf5,
+						 0xf5, 0xf5, 0xf5, 0xf5 };
+VECT_VAR_DECL(expected_vqtbx2q,poly,8,16) [] = { 0x5, 0xf5, 0xf5, 0xf5,
+						 0xfa, 0xcc, 0xf5, 0xf5,
+						 0xf5, 0xf5, 0xf5, 0xf5,
+						 0xf5, 0xf5, 0xf5, 0xf5 };
+
+/* Expected results for vqtbx3q.  */
+VECT_VAR_DECL(expected_vqtbx3q,int,8,16) [] = { 0xf, 0xf7, 0xf7, 0xf7,
+						0xfe, 0xb, 0xf7, 0xf7,
+						0xf7, 0xf7, 0xf7, 0xf7,
+						0xf7, 0xf7, 0xf7, 0xf7 };
+VECT_VAR_DECL(expected_vqtbx3q,uint,8,16) [] = { 0xf, 0xf7, 0xf7, 0xf7,
+						 0xfe, 0xb, 0xf7, 0xf7,
+						 0xf7, 0xf7, 0xf7, 0xf7,
+						 0xf7, 0xf7, 0xf7, 0xf7 };
+VECT_VAR_DECL(expected_vqtbx3q,poly,8,16) [] = { 0xf, 0xf7, 0xf7, 0xf7,
+						 0xfe, 0xb, 0xf7, 0xf7,
+						 0xf7, 0xf7, 0xf7, 0xf7,
+						 0xf7, 0xf7, 0xf7, 0xf7 };
+
+/* Expected results for vqtbx4q.  */
+VECT_VAR_DECL(expected_vqtbx4q,int,8,16) [] = { 0x19, 0xf9, 0xf9, 0xf9,
+						0x2, 0x13, 0xf9, 0xf9,
+						0xf9, 0xf9, 0xf9, 0xf9,
+						0xf9, 0xf9, 0xf9, 0xf9 };
+VECT_VAR_DECL(expected_vqtbx4q,uint,8,16) [] = { 0x19, 0xf9, 0xf9, 0xf9,
+						 0x2, 0x13, 0xf9, 0xf9,
+						 0xf9, 0xf9, 0xf9, 0xf9,
+						 0xf9, 0xf9, 0xf9, 0xf9 };
+VECT_VAR_DECL(expected_vqtbx4q,poly,8,16) [] = { 0x19, 0xf9, 0xf9, 0xf9,
+						 0x2, 0x13, 0xf9, 0xf9,
+						 0xf9, 0xf9, 0xf9, 0xf9,
+						 0xf9, 0xf9, 0xf9, 0xf9 };
+
+void exec_vqtbX (void)
+{
+  int i;
+
+  /* In this case, input variables are arrays of vectors.  */
+#define DECL_VQTBX(T1, W, N, X)						\
+  VECT_ARRAY_TYPE(T1, W, N, X) VECT_ARRAY_VAR(table_vector, T1, W, N, X)
+
+  /* The vqtbl1 variant is different from vqtbl{2,3,4} because it takes a
+     vector as 1st param, instead of an array of vectors.  */
+#define TEST_VQTBL1(T1, T2, T3, W, N1, N2)		\
+  VECT_VAR(table_vector, T1, W, N2) =			\
+    vld1##q_##T2##W((T1##W##_t *)lookup_table);		\
+							\
+  VECT_VAR(vector_res, T1, W, N1) =			\
+    vqtbl1_##T2##W(VECT_VAR(table_vector, T1, W, N2),	\
+		   VECT_VAR(vector, T3, W, N1));	\
+  vst1_##T2##W(VECT_VAR(result, T1, W, N1),		\
+	       VECT_VAR(vector_res, T1, W, N1));
+
+#define TEST_VQTBL1Q(T1, T2, T3, W, N1, N2)		\
+  VECT_VAR(table_vector, T1, W, N2) =			\
+    vld1##q_##T2##W((T1##W##_t *)lookup_table);		\
+							\
+  VECT_VAR(vector_res, T1, W, N1) =			\
+    vqtbl1q_##T2##W(VECT_VAR(table_vector, T1, W, N2),	\
+		    VECT_VAR(vector, T3, W, N1));	\
+  vst1q_##T2##W(VECT_VAR(result, T1, W, N1),		\
+	       VECT_VAR(vector_res, T1, W, N1));
+
+#define TEST_VQTBLX(T1, T2, T3, W, N1, N2, X)				\
+  VECT_ARRAY_VAR(table_vector, T1, W, N2, X) =				\
+    vld##X##q_##T2##W((T1##W##_t *)lookup_table);			\
+									\
+  VECT_VAR(vector_res, T1, W, N1) =					\
+    vqtbl##X##_##T2##W(VECT_ARRAY_VAR(table_vector, T1, W, N2, X),	\
+		       VECT_VAR(vector, T3, W, N1));			\
+  vst1_##T2##W(VECT_VAR(result, T1, W, N1),				\
+		VECT_VAR(vector_res, T1, W, N1));
+
+#define TEST_VQTBLXQ(T1, T2, T3, W, N1, N2, X)				\
+  VECT_ARRAY_VAR(table_vector, T1, W, N2, X) =				\
+    vld##X##q_##T2##W((T1##W##_t *)lookup_table);			\
+									\
+  VECT_VAR(vector_res, T1, W, N1) =					\
+    vqtbl##X##q_##T2##W(VECT_ARRAY_VAR(table_vector, T1, W, N2, X),	\
+			VECT_VAR(vector, T3, W, N1));			\
+  vst1q_##T2##W(VECT_VAR(result, T1, W, N1),				\
+		VECT_VAR(vector_res, T1, W, N1));
+
+  /* We need to define a lookup table large enough.  */
+  int8_t lookup_table[4*16];
+
+  /* For vqtblX.  */
+  DECL_VARIABLE(vector, int, 8, 8);
+  DECL_VARIABLE(vector, uint, 8, 8);
+  DECL_VARIABLE(vector, poly, 8, 8);
+  DECL_VARIABLE(vector_res, int, 8, 8);
+  DECL_VARIABLE(vector_res, uint, 8, 8);
+  DECL_VARIABLE(vector_res, poly, 8, 8);
+
+  /* For vqtblXq.  */
+  DECL_VARIABLE(vector, int, 8, 16);
+  DECL_VARIABLE(vector, uint, 8, 16);
+  DECL_VARIABLE(vector, poly, 8, 16);
+  DECL_VARIABLE(vector_res, int, 8, 16);
+  DECL_VARIABLE(vector_res, uint, 8, 16);
+  DECL_VARIABLE(vector_res, poly, 8, 16);
+
+  /* For vqtbl1 and vqtbx1.  */
+  DECL_VARIABLE(table_vector, int, 8, 16);
+  DECL_VARIABLE(table_vector, uint, 8, 16);
+  DECL_VARIABLE(table_vector, poly, 8, 16);
+
+  /* For vqtbxX.  */
+  DECL_VARIABLE(default_vector, int, 8, 8);
+  DECL_VARIABLE(default_vector, uint, 8, 8);
+  DECL_VARIABLE(default_vector, poly, 8, 8);
+
+  /* For vqtbxXq.  */
+  DECL_VARIABLE(default_vector, int, 8, 16);
+  DECL_VARIABLE(default_vector, uint, 8, 16);
+  DECL_VARIABLE(default_vector, poly, 8, 16);
+
+  /* We only need the 8-bit variants.  */
+#define DECL_ALL_VQTBLX(X)			\
+  DECL_VQTBX(int, 8, 16, X);			\
+  DECL_VQTBX(uint, 8, 16, X);			\
+  DECL_VQTBX(poly, 8, 16, X)
+
+#define TEST_ALL_VQTBL1()			\
+  TEST_VQTBL1(int, s, uint, 8, 8, 16);		\
+  TEST_VQTBL1(uint, u, uint, 8, 8, 16);		\
+  TEST_VQTBL1(poly, p, uint, 8, 8, 16);		\
+  TEST_VQTBL1Q(int, s, uint, 8, 16, 16);	\
+  TEST_VQTBL1Q(uint, u, uint, 8, 16, 16);	\
+  TEST_VQTBL1Q(poly, p, uint, 8, 16, 16)
+
+#define TEST_ALL_VQTBLX(X)			\
+  TEST_VQTBLX(int, s, uint, 8, 8, 16, X);	\
+  TEST_VQTBLX(uint, u, uint, 8, 8, 16, X);	\
+  TEST_VQTBLX(poly, p, uint, 8, 8, 16, X);	\
+  TEST_VQTBLXQ(int, s, uint, 8, 16, 16, X);	\
+  TEST_VQTBLXQ(uint, u, uint, 8, 16, 16, X);	\
+  TEST_VQTBLXQ(poly, p, uint, 8, 16, 16, X)
+
+  /* Declare the temporary buffers / variables.  */
+  DECL_ALL_VQTBLX(2);
+  DECL_ALL_VQTBLX(3);
+  DECL_ALL_VQTBLX(4);
+
+  /* Fill the lookup table.  */
+  for (i = 0; i < 4 * 16; i++)
+    lookup_table[i] = i - 15;
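+  /* The table entries are thus the distinct values -15..48.  */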
+
+  /* Choose the init value arbitrarily; it will be used as a table
+     index.  */
+  VDUP(vector, , uint, u, 8, 8, 2);
+  VDUP(vector, q, uint, u, 8, 16, 2);
+
+  /* To ensure coverage, add some indexes larger than 8, 16 and 32
+     respectively: lane 0 (index 10), lane 4 (index 20) and lane 5
+     (index 40).  */
+  VSET_LANE(vector, , uint, u, 8, 8, 0, 10);
+  VSET_LANE(vector, , uint, u, 8, 8, 4, 20);
+  VSET_LANE(vector, , uint, u, 8, 8, 5, 40);
+
+  VSET_LANE(vector, q, uint, u, 8, 16, 0, 10);
+  VSET_LANE(vector, q, uint, u, 8, 16, 4, 20);
+  VSET_LANE(vector, q, uint, u, 8, 16, 5, 40);
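+
+  /* An out-of-range index makes vqtblX write 0 to the corresponding
+     lane, while vqtbxX leaves the default value in it.  */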
+
+  /* Check vqtbl1.  */
+  clean_results ();
+#define TEST_MSG "VQTBL1"
+  TEST_ALL_VQTBL1();
+
+  CHECK(TEST_MSG, int, 8, 8, PRIx8, expected_vqtbl1, "");
+  CHECK(TEST_MSG, uint, 8, 8, PRIx8, expected_vqtbl1, "");
+  CHECK(TEST_MSG, poly, 8, 8, PRIx8, expected_vqtbl1, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VQTBL1Q"
+  CHECK(TEST_MSG, int, 8, 16, PRIx8, expected_vqtbl1q, "");
+  CHECK(TEST_MSG, uint, 8, 16, PRIx8, expected_vqtbl1q, "");
+  CHECK(TEST_MSG, poly, 8, 16, PRIx8, expected_vqtbl1q, "");
+
+  /* Check vqtbl2.  */
+  clean_results ();
+#undef TEST_MSG
+#define TEST_MSG "VQTBL2"
+  TEST_ALL_VQTBLX(2);
+
+  CHECK(TEST_MSG, int, 8, 8, PRIx8, expected_vqtbl2, "");
+  CHECK(TEST_MSG, uint, 8, 8, PRIx8, expected_vqtbl2, "");
+  CHECK(TEST_MSG, poly, 8, 8, PRIx8, expected_vqtbl2, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VQTBL2Q"
+  CHECK(TEST_MSG, int, 8, 16, PRIx8, expected_vqtbl2q, "");
+  CHECK(TEST_MSG, uint, 8, 16, PRIx8, expected_vqtbl2q, "");
+  CHECK(TEST_MSG, poly, 8, 16, PRIx8, expected_vqtbl2q, "");
+
+  /* Check vqtbl3.  */
+  clean_results ();
+#undef TEST_MSG
+#define TEST_MSG "VQTBL3"
+  TEST_ALL_VQTBLX(3);
+
+  CHECK(TEST_MSG, int, 8, 8, PRIx8, expected_vqtbl3, "");
+  CHECK(TEST_MSG, uint, 8, 8, PRIx8, expected_vqtbl3, "");
+  CHECK(TEST_MSG, poly, 8, 8, PRIx8, expected_vqtbl3, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VQTBL3Q"
+  CHECK(TEST_MSG, int, 8, 16, PRIx8, expected_vqtbl3q, "");
+  CHECK(TEST_MSG, uint, 8, 16, PRIx8, expected_vqtbl3q, "");
+  CHECK(TEST_MSG, poly, 8, 16, PRIx8, expected_vqtbl3q, "");
+
+  /* Check vqtbl4.  */
+  clean_results ();
+#undef TEST_MSG
+#define TEST_MSG "VQTBL4"
+  TEST_ALL_VQTBLX(4);
+
+  CHECK(TEST_MSG, int, 8, 8, PRIx8, expected_vqtbl4, "");
+  CHECK(TEST_MSG, uint, 8, 8, PRIx8, expected_vqtbl4, "");
+  CHECK(TEST_MSG, poly, 8, 8, PRIx8, expected_vqtbl4, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VQTBL4Q"
+  CHECK(TEST_MSG, int, 8, 16, PRIx8, expected_vqtbl4q, "");
+  CHECK(TEST_MSG, uint, 8, 16, PRIx8, expected_vqtbl4q, "");
+  CHECK(TEST_MSG, poly, 8, 16, PRIx8, expected_vqtbl4q, "");
+
+  /* Now test VQTBX.  */
+
+  /* The vqtbx1 variant is different from vqtbx{2,3,4} because its
+     table (2nd param) is a single vector instead of an array of
+     vectors.  */
+#define TEST_VQTBX1(T1, T2, T3, W, N1, N2)		\
+  VECT_VAR(table_vector, T1, W, N2) =			\
+    vld1##q_##T2##W((T1##W##_t *)lookup_table);		\
+							\
+  VECT_VAR(vector_res, T1, W, N1) =			\
+    vqtbx1_##T2##W(VECT_VAR(default_vector, T1, W, N1),	\
+		   VECT_VAR(table_vector, T1, W, N2),	\
+		   VECT_VAR(vector, T3, W, N1));	\
+  vst1_##T2##W(VECT_VAR(result, T1, W, N1),		\
+	       VECT_VAR(vector_res, T1, W, N1));
+
+#define TEST_VQTBX1Q(T1, T2, T3, W, N1, N2)		\
+  VECT_VAR(table_vector, T1, W, N2) =			\
+    vld1##q_##T2##W((T1##W##_t *)lookup_table);		\
+							\
+  VECT_VAR(vector_res, T1, W, N1) =			\
+    vqtbx1q_##T2##W(VECT_VAR(default_vector, T1, W, N1),\
+		    VECT_VAR(table_vector, T1, W, N2),	\
+		    VECT_VAR(vector, T3, W, N1));	\
+  vst1q_##T2##W(VECT_VAR(result, T1, W, N1),		\
+	       VECT_VAR(vector_res, T1, W, N1));
+
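+  /* For vqtbx{2,3,4}, the table is an array of X 128-bit vectors and
+     the default vector supplies the lanes whose index is out of
+     range.  */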
+#define TEST_VQTBXX(T1, T2, T3, W, N1, N2, X)				\
+  VECT_ARRAY_VAR(table_vector, T1, W, N2, X) =				\
+    vld##X##q_##T2##W((T1##W##_t *)lookup_table);			\
+									\
+  VECT_VAR(vector_res, T1, W, N1) =					\
+    vqtbx##X##_##T2##W(VECT_VAR(default_vector, T1, W, N1),		\
+			VECT_ARRAY_VAR(table_vector, T1, W, N2, X),	\
+			VECT_VAR(vector, T3, W, N1));			\
+  vst1_##T2##W(VECT_VAR(result, T1, W, N1),				\
+		VECT_VAR(vector_res, T1, W, N1));
+
+#define TEST_VQTBXXQ(T1, T2, T3, W, N1, N2, X)				\
+  VECT_ARRAY_VAR(table_vector, T1, W, N2, X) =				\
+    vld##X##q_##T2##W((T1##W##_t *)lookup_table);			\
+									\
+  VECT_VAR(vector_res, T1, W, N1) =					\
+    vqtbx##X##q_##T2##W(VECT_VAR(default_vector, T1, W, N1),		\
+			VECT_ARRAY_VAR(table_vector, T1, W, N2, X),	\
+			VECT_VAR(vector, T3, W, N1));			\
+  vst1q_##T2##W(VECT_VAR(result, T1, W, N1),				\
+		VECT_VAR(vector_res, T1, W, N1));
+
+#define TEST_ALL_VQTBX1()			\
+  TEST_VQTBX1(int, s, uint, 8, 8, 16);		\
+  TEST_VQTBX1(uint, u, uint, 8, 8, 16);		\
+  TEST_VQTBX1(poly, p, uint, 8, 8, 16);		\
+  TEST_VQTBX1Q(int, s, uint, 8, 16, 16);	\
+  TEST_VQTBX1Q(uint, u, uint, 8, 16, 16);	\
+  TEST_VQTBX1Q(poly, p, uint, 8, 16, 16)
+
+#define TEST_ALL_VQTBXX(X)			\
+  TEST_VQTBXX(int, s, uint, 8, 8, 16, X);	\
+  TEST_VQTBXX(uint, u, uint, 8, 8, 16, X);	\
+  TEST_VQTBXX(poly, p, uint, 8, 8, 16, X);	\
+  TEST_VQTBXXQ(int, s, uint, 8, 16, 16, X);	\
+  TEST_VQTBXXQ(uint, u, uint, 8, 16, 16, X);	\
+  TEST_VQTBXXQ(poly, p, uint, 8, 16, 16, X)
+
+  /* Choose the init values arbitrarily; they will be used as default
+     values.  */
+  VDUP(default_vector, , int, s, 8, 8, 0x33);
+  VDUP(default_vector, , uint, u, 8, 8, 0xCC);
+  VDUP(default_vector, , poly, p, 8, 8, 0xCC);
+  VDUP(default_vector, q, int, s, 8, 16, 0x33);
+  VDUP(default_vector, q, uint, u, 8, 16, 0xCC);
+  VDUP(default_vector, q, poly, p, 8, 16, 0xCC);
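+
+  /* These defaults show up unchanged in the expected vqtbx1 and
+     vqtbx2 results wherever the index is out of range.  */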
+
+  /* Check vqtbx1.  */
+  clean_results ();
+#undef TEST_MSG
+#define TEST_MSG "VQTBX1"
+  TEST_ALL_VQTBX1();
+
+  CHECK(TEST_MSG, int, 8, 8, PRIx8, expected_vqtbx1, "");
+  CHECK(TEST_MSG, uint, 8, 8, PRIx8, expected_vqtbx1, "");
+  CHECK(TEST_MSG, poly, 8, 8, PRIx8, expected_vqtbx1, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VQTBX1Q"
+  CHECK(TEST_MSG, int, 8, 16, PRIx8, expected_vqtbx1q, "");
+  CHECK(TEST_MSG, uint, 8, 16, PRIx8, expected_vqtbx1q, "");
+  CHECK(TEST_MSG, poly, 8, 16, PRIx8, expected_vqtbx1q, "");
+
+  /* Check vqtbx2.  */
+  clean_results ();
+#undef TEST_MSG
+#define TEST_MSG "VQTBX2"
+  TEST_ALL_VQTBXX(2);
+
+  CHECK(TEST_MSG, int, 8, 8, PRIx8, expected_vqtbx2, "");
+  CHECK(TEST_MSG, uint, 8, 8, PRIx8, expected_vqtbx2, "");
+  CHECK(TEST_MSG, poly, 8, 8, PRIx8, expected_vqtbx2, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VQTBX2Q"
+  CHECK(TEST_MSG, int, 8, 16, PRIx8, expected_vqtbx2q, "");
+  CHECK(TEST_MSG, uint, 8, 16, PRIx8, expected_vqtbx2q, "");
+  CHECK(TEST_MSG, poly, 8, 16, PRIx8, expected_vqtbx2q, "");
+
+  /* Check vqtbx3.  */
+  clean_results ();
+#undef TEST_MSG
+#define TEST_MSG "VQTBX3"
+  TEST_ALL_VQTBXX(3);
+
+  CHECK(TEST_MSG, int, 8, 8, PRIx8, expected_vqtbx3, "");
+  CHECK(TEST_MSG, uint, 8, 8, PRIx8, expected_vqtbx3, "");
+  CHECK(TEST_MSG, poly, 8, 8, PRIx8, expected_vqtbx3, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VQTBX3Q"
+  CHECK(TEST_MSG, int, 8, 16, PRIx8, expected_vqtbx3q, "");
+  CHECK(TEST_MSG, uint, 8, 16, PRIx8, expected_vqtbx3q, "");
+  CHECK(TEST_MSG, poly, 8, 16, PRIx8, expected_vqtbx3q, "");
+
+  /* Check vqtbx4.  */
+  clean_results ();
+#undef TEST_MSG
+#define TEST_MSG "VQTBX4"
+  TEST_ALL_VQTBXX(4);
+
+  CHECK(TEST_MSG, int, 8, 8, PRIx8, expected_vqtbx4, "");
+  CHECK(TEST_MSG, uint, 8, 8, PRIx8, expected_vqtbx4, "");
+  CHECK(TEST_MSG, poly, 8, 8, PRIx8, expected_vqtbx4, "");
+
+#undef TEST_MSG
+#define TEST_MSG "VQTBX4Q"
+  CHECK(TEST_MSG, int, 8, 16, PRIx8, expected_vqtbx4q, "");
+  CHECK(TEST_MSG, uint, 8, 16, PRIx8, expected_vqtbx4q, "");
+  CHECK(TEST_MSG, poly, 8, 16, PRIx8, expected_vqtbx4q, "");
+}
+
+int main (void)
+{
+  exec_vqtbX ();
+  return 0;
+}
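
For reference, the new test can be run in isolation through the
advsimd-intrinsics harness with something like the following (a sketch,
assuming a built gcc tree; adjust RUNTESTFLAGS for your setup):

  make check-gcc RUNTESTFLAGS="advsimd-intrinsics.exp=vqtbX.c"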