[ARM,1/3] Implement crypto intrinsics in AArch32 ARMv8-A - add functionality

Message ID 529F46E7.1060309@arm.com
State New

Commit Message

Kyrylo Tkachov Dec. 4, 2013, 3:14 p.m. UTC
Hi all,

These are the changes to the arm backend that add support for the intrinsics.
neon.ml is modified to support the new poly64_t and poly128_t types and all 
their associated intrinsics.
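
As a quick illustration of what this exposes to users: the new polynomial
element and vector types are only visible when the crypto extension is enabled,
so code guards them with the feature macro. A minimal usage sketch (variable
names are illustrative):

  #include <arm_neon.h>

  #ifdef __ARM_FEATURE_CRYPTO
  /* New 64-bit and 128-bit polynomial types added by this patch.  */
  poly64_t   scalar_p64;   /* single 64-bit polynomial element */
  poly64x1_t dreg_p64;     /* one poly64 lane in a D register  */
  poly64x2_t qreg_p64;     /* two poly64 lanes in a Q register */
  poly128_t  scalar_p128;  /* 128-bit polynomial value         */
  #endif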

Technically, arm_neon.h is generated from the ML scripts, but I'm including it 
in the patch here for easier review (although the diff tool did quite a poor job 
of finding the minimal diff :( )

We create new builtins that map down to the instructions, and the intrinsics use 
those builtins (our usual mechanism for arm_neon.h intrinsics).
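
Concretely, each intrinsic in the regenerated arm_neon.h is a thin always-inline
wrapper around one of these builtins. A sketch of the shape, assuming crypto.def
declares an "aese" entry (so the builtin is named __builtin_arm_crypto_aese):

  #ifdef __ARM_FEATURE_CRYPTO
  __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
  vaeseq_u8 (uint8x16_t __data, uint8x16_t __key)
  {
    /* Expands through the crypto_aese pattern to a single AESE instruction.  */
    return __builtin_arm_crypto_aese (__data, __key);
  }
  #endif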

A new .md file, crypto.md, is added; it contains the patterns for the 
instructions. A new reinterpret pattern is added to neon.md, and the existing 
reinterpret patterns are updated to handle reinterprets to and from the new 
poly128_t type.
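
For example, the 64x64->128-bit polynomial multiply produces a poly128_t
(TImode) value, which the new reinterprets can then view as an ordinary vector
type. A hedged usage sketch, assuming the regenerated header provides the usual
ACLE names vmull_p64 and vreinterpretq_u8_p128:

  #include <arm_neon.h>

  #ifdef __ARM_FEATURE_CRYPTO
  /* Carry-less multiply of two 64-bit polynomials (VMULL.P64), then
     reinterpret the 128-bit product as a byte vector.  */
  static uint8x16_t
  clmul_bytes (poly64_t a, poly64_t b)
  {
    poly128_t product = vmull_p64 (a, b);
    return vreinterpretq_u8_p128 (product);
  }
  #endif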

Ok for trunk?

Thanks,
Kyrill

2013-12-04  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * config/arm/arm.c (enum arm_builtins): Add crypto builtins.
     (arm_init_neon_builtins): Handle crypto builtins.
     (bdesc_2arg): Likewise.
     (bdesc_1arg): Likewise.
     (bdesc_3arg): New table.
     (arm_expand_ternop_builtin): New function.
     (arm_expand_unop_builtin): Handle sha1h explicitly.
     (arm_expand_builtin): Handle ternary builtins.
     * config/arm/arm.h (TARGET_CPU_CPP_BUILTINS):
     Define __ARM_FEATURE_CRYPTO.
     * config/arm/arm.md: Include crypto.md.
     (is_neon_type): Add crypto types.
     * config/arm/arm_neon_builtins.def: Add TImode reinterprets.
     * config/arm/crypto.def: New.
     * config/arm/crypto.md: Likewise.
     * config/arm/iterators.md (CRYPTO_UNARY): New int iterator.
     (CRYPTO_BINARY): Likewise.
     (CRYPTO_TERNARY): Likewise.
     (CRYPTO_SELECTING): Likewise.
     (crypto_pattern): New int attribute.
     (crypto_size_sfx): Likewise.
     (crypto_mode): Likewise.
     (crypto_type): Likewise.
     * config/arm/neon-gen.ml: Handle poly64_t and poly128_t types.
     Handle crypto intrinsics.
     * config/arm/neon.ml: Add support for poly64 and poly128 types
     and intrinsics. Define crypto intrinsics.
     * config/arm/neon.md (neon_vreinterpretti<mode>): New pattern.
     (neon_vreinterpretv16qi<mode>): Use VQXMOV mode iterator.
     (neon_vreinterpretv8hi<mode>): Likewise.
     (neon_vreinterpretv4si<mode>): Likewise.
     (neon_vreinterpretv4sf<mode>): Likewise.
     (neon_vreinterpretv2di<mode>): Likewise.
     * config/arm/types.md (type): Add crypto_aes, crypto_sha1_xor,
     crypto_sha1_fast, crypto_sha1_slow, crypto_sha256_fast,
     crypto_sha256_slow.
     * config/arm/unspecs.md (UNSPEC_AESD, UNSPEC_AESE, UNSPEC_AESIMC,
     UNSPEC_AESMC, UNSPEC_SHA1C, UNSPEC_SHA1M, UNSPEC_SHA1P, UNSPEC_SHA1H,
     UNSPEC_SHA1SU0, UNSPEC_SHA1SU1, UNSPEC_SHA256H, UNSPEC_SHA256H2,
     UNSPEC_SHA256SU0, UNSPEC_SHA256SU1, VMULLP64): Define.
     * config/arm/arm_neon.h: Regenerate.

Comments

Richard Earnshaw Dec. 19, 2013, 3:01 p.m. UTC | #1
On 04/12/13 15:14, Kyrill Tkachov wrote:
> Hi all,
> 
> These are the changes to the arm backend that add support for the intrinsics.
> neon.ml is modified to support the new poly64_t and poly128_t types and all 
> their associated intrinsics.
> 
> Technically, arm_neon.h is generated from the ML scripts, but I'm including it 
> in the patch here for easier review (although the diff tool did quite a poor job 
> of finding the minimal diff :( )
> 
> We create new builtins that map down to the instructions, and the intrinsics use 
> those builtins (our usual mechanism for arm_neon.h intrinsics).
> 
> A new .md file, crypto.md, is added; it contains the patterns for the 
> instructions. A new reinterpret pattern is added to neon.md, and the existing 
> reinterpret patterns are updated to handle reinterprets to and from the new 
> poly128_t type.
> 
> Ok for trunk?
> 
> Thanks,
> Kyrill
> 
> 2013-12-04  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> 
>      * config/arm/arm.c (enum arm_builtins): Add crypto builtins.
>      (arm_init_neon_builtins): Handle crypto builtins.
>      (bdesc_2arg): Likewise.
>      (bdesc_1arg): Likewise.
>      (bdesc_3arg): New table.
>      (arm_expand_ternop_builtin): New function.
>      (arm_expand_unop_builtin): Handle sha1h explicitly.
>      (arm_expand_builtin): Handle ternary builtins.
>      * config/arm/arm.h (TARGET_CPU_CPP_BUILTINS):
>      Define __ARM_FEATURE_CRYPTO.
>      * config/arm/arm.md: Include crypto.md.
>      (is_neon_type): Add crypto types.
>      * config/arm/arm_neon_builtins.def: Add TImode reinterprets.
>      * config/arm/crypto.def: New.
>      * config/arm/crypto.md: Likewise.
>      * config/arm/iterators.md (CRYPTO_UNARY): New int iterator.
>      (CRYPTO_BINARY): Likewise.
>      (CRYPTO_TERNARY): Likewise.
>      (CRYPTO_SELECTING): Likewise.
>      (crypto_pattern): New int attribute.
>      (crypto_size_sfx): Likewise.
>      (crypto_mode): Likewise.
>      (crypto_type): Likewise.
>      * config/arm/neon-gen.ml: Handle poly64_t and poly128_t types.
>      Handle crypto intrinsics.
>      * config/arm/neon.ml: Add support for poly64 and poly128 types
>      and intrinsics. Define crypto intrinsics.
>      * config/arm/neon.md (neon_vreinterpretti<mode>): New pattern.
>      (neon_vreinterpretv16qi<mode>): Use VQXMOV mode iterator.
>      (neon_vreinterpretv8hi<mode>): Likewise.
>      (neon_vreinterpretv4si<mode>): Likewise.
>      (neon_vreinterpretv4sf<mode>): Likewise.
>      (neon_vreinterpretv2di<mode>): Likewise.
>      * config/arm/types.md (type): Add crypto_aes, crypto_sha1_xor,
>      crypto_sha1_fast, crypto_sha1_slow, crypto_sha256_fast,
>      crypto_sha256_slow.
>      * config/arm/unspecs.md (UNSPEC_AESD, UNSPEC_AESE, UNSPEC_AESIMC,
>      UNSPEC_AESMC, UNSPEC_SHA1C, UNSPEC_SHA1M, UNSPEC_SHA1P, UNSPEC_SHA1H,
>      UNSPEC_SHA1SU0, UNSPEC_SHA1SU1, UNSPEC_SHA256H, UNSPEC_SHA256H2,
>      UNSPEC_SHA256SU0, UNSPEC_SHA256SU1, VMULLP64): Define.
>      * config/arm/arm_neon.h: Regenerate.
> 
> 
OK.

R.

Patch

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 73bd37d..73b7a96 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -23142,6 +23142,23 @@  enum arm_builtins
   ARM_BUILTIN_CRC32CH,
   ARM_BUILTIN_CRC32CW,
 
+#undef CRYPTO1
+#undef CRYPTO2
+#undef CRYPTO3
+
+#define CRYPTO1(L, U, M1, M2) \
+  ARM_BUILTIN_CRYPTO_##U,
+#define CRYPTO2(L, U, M1, M2, M3) \
+  ARM_BUILTIN_CRYPTO_##U,
+#define CRYPTO3(L, U, M1, M2, M3, M4) \
+  ARM_BUILTIN_CRYPTO_##U,
+
+#include "crypto.def"
+
+#undef CRYPTO1
+#undef CRYPTO2
+#undef CRYPTO3
+
 #include "arm_neon_builtins.def"
 
   ,ARM_BUILTIN_MAX
@@ -23163,6 +23180,9 @@  enum arm_builtins
 
 static GTY(()) tree arm_builtin_decls[ARM_BUILTIN_MAX];
 
+#define NUM_DREG_TYPES 5
+#define NUM_QREG_TYPES 6
+
 static void
 arm_init_neon_builtins (void)
 {
@@ -23176,6 +23196,7 @@  arm_init_neon_builtins (void)
   tree neon_polyHI_type_node;
   tree neon_intSI_type_node;
   tree neon_intDI_type_node;
+  tree neon_intUTI_type_node;
   tree neon_float_type_node;
 
   tree intQI_pointer_node;
@@ -23238,9 +23259,9 @@  arm_init_neon_builtins (void)
   tree void_ftype_pv4sf_v4sf_v4sf;
   tree void_ftype_pv2di_v2di_v2di;
 
-  tree reinterp_ftype_dreg[5][5];
-  tree reinterp_ftype_qreg[5][5];
-  tree dreg_types[5], qreg_types[5];
+  tree reinterp_ftype_dreg[NUM_DREG_TYPES][NUM_DREG_TYPES];
+  tree reinterp_ftype_qreg[NUM_QREG_TYPES][NUM_QREG_TYPES];
+  tree dreg_types[NUM_DREG_TYPES], qreg_types[NUM_QREG_TYPES];
 
   /* Create distinguished type nodes for NEON vector element types,
      and pointers to values of such types, so we can detect them later.  */
@@ -23330,6 +23351,8 @@  arm_init_neon_builtins (void)
   intUHI_type_node = make_unsigned_type (GET_MODE_PRECISION (HImode));
   intUSI_type_node = make_unsigned_type (GET_MODE_PRECISION (SImode));
   intUDI_type_node = make_unsigned_type (GET_MODE_PRECISION (DImode));
+  neon_intUTI_type_node = make_unsigned_type (GET_MODE_PRECISION (TImode));
+
 
   (*lang_hooks.types.register_builtin_type) (intUQI_type_node,
 					     "__builtin_neon_uqi");
@@ -23339,6 +23362,10 @@  arm_init_neon_builtins (void)
 					     "__builtin_neon_usi");
   (*lang_hooks.types.register_builtin_type) (intUDI_type_node,
 					     "__builtin_neon_udi");
+  (*lang_hooks.types.register_builtin_type) (intUDI_type_node,
+					     "__builtin_neon_poly64");
+  (*lang_hooks.types.register_builtin_type) (neon_intUTI_type_node,
+					     "__builtin_neon_poly128");
 
   /* Opaque integer types for structures of vectors.  */
   intEI_type_node = make_signed_type (GET_MODE_PRECISION (EImode));
@@ -23400,6 +23427,80 @@  arm_init_neon_builtins (void)
     build_function_type_list (void_type_node, V2DI_pointer_node, V2DI_type_node,
 			      V2DI_type_node, NULL);
 
+  if (TARGET_CRYPTO && TARGET_HARD_FLOAT)
+  {
+    tree V4USI_type_node =
+      build_vector_type_for_mode (intUSI_type_node, V4SImode);
+
+    tree V16UQI_type_node =
+      build_vector_type_for_mode (intUQI_type_node, V16QImode);
+
+    tree v16uqi_ftype_v16uqi
+      = build_function_type_list (V16UQI_type_node, V16UQI_type_node, NULL_TREE);
+
+    tree v16uqi_ftype_v16uqi_v16uqi
+      = build_function_type_list (V16UQI_type_node, V16UQI_type_node,
+                                  V16UQI_type_node, NULL_TREE);
+
+    tree v4usi_ftype_v4usi
+      = build_function_type_list (V4USI_type_node, V4USI_type_node, NULL_TREE);
+
+    tree v4usi_ftype_v4usi_v4usi
+      = build_function_type_list (V4USI_type_node, V4USI_type_node,
+                                  V4USI_type_node, NULL_TREE);
+
+    tree v4usi_ftype_v4usi_v4usi_v4usi
+      = build_function_type_list (V4USI_type_node, V4USI_type_node,
+                                  V4USI_type_node, V4USI_type_node, NULL_TREE);
+
+    tree uti_ftype_udi_udi
+      = build_function_type_list (neon_intUTI_type_node, intUDI_type_node,
+                                  intUDI_type_node, NULL_TREE);
+
+    #undef CRYPTO1
+    #undef CRYPTO2
+    #undef CRYPTO3
+    #undef C
+    #undef N
+    #undef CF
+    #undef FT1
+    #undef FT2
+    #undef FT3
+
+    #define C(U) \
+      ARM_BUILTIN_CRYPTO_##U
+    #define N(L) \
+      "__builtin_arm_crypto_"#L
+    #define FT1(R, A) \
+      R##_ftype_##A
+    #define FT2(R, A1, A2) \
+      R##_ftype_##A1##_##A2
+    #define FT3(R, A1, A2, A3) \
+      R##_ftype_##A1##_##A2##_##A3
+    #define CRYPTO1(L, U, R, A) \
+      arm_builtin_decls[C (U)] = add_builtin_function (N (L), FT1 (R, A), \
+                                                       C (U), BUILT_IN_MD, \
+                                                       NULL, NULL_TREE);
+    #define CRYPTO2(L, U, R, A1, A2) \
+      arm_builtin_decls[C (U)] = add_builtin_function (N (L), FT2 (R, A1, A2), \
+                                                       C (U), BUILT_IN_MD, \
+                                                       NULL, NULL_TREE);
+
+    #define CRYPTO3(L, U, R, A1, A2, A3) \
+      arm_builtin_decls[C (U)] = add_builtin_function (N (L), FT3 (R, A1, A2, A3), \
+                                                       C (U), BUILT_IN_MD, \
+                                                       NULL, NULL_TREE);
+    #include "crypto.def"
+
+    #undef CRYPTO1
+    #undef CRYPTO2
+    #undef CRYPTO3
+    #undef C
+    #undef N
+    #undef FT1
+    #undef FT2
+    #undef FT3
+  }
   dreg_types[0] = V8QI_type_node;
   dreg_types[1] = V4HI_type_node;
   dreg_types[2] = V2SI_type_node;
@@ -23411,14 +23512,17 @@  arm_init_neon_builtins (void)
   qreg_types[2] = V4SI_type_node;
   qreg_types[3] = V4SF_type_node;
   qreg_types[4] = V2DI_type_node;
+  qreg_types[5] = neon_intUTI_type_node;
 
-  for (i = 0; i < 5; i++)
+  for (i = 0; i < NUM_QREG_TYPES; i++)
     {
       int j;
-      for (j = 0; j < 5; j++)
+      for (j = 0; j < NUM_QREG_TYPES; j++)
         {
-          reinterp_ftype_dreg[i][j]
-            = build_function_type_list (dreg_types[i], dreg_types[j], NULL);
+          if (i < NUM_DREG_TYPES && j < NUM_DREG_TYPES)
+            reinterp_ftype_dreg[i][j]
+              = build_function_type_list (dreg_types[i], dreg_types[j], NULL);
+
           reinterp_ftype_qreg[i][j]
             = build_function_type_list (qreg_types[i], qreg_types[j], NULL);
         }
@@ -23633,10 +23737,14 @@  arm_init_neon_builtins (void)
 
 	case NEON_REINTERP:
 	  {
-	    /* We iterate over 5 doubleword types, then 5 quadword
-	       types. V4HF is not a type used in reinterpret, so we translate
+	    /* We iterate over NUM_DREG_TYPES doubleword types,
+	       then NUM_QREG_TYPES quadword  types.
+	       V4HF is not a type used in reinterpret, so we translate
 	       d->mode to the correct index in reinterp_ftype_dreg.  */
-	    int rhs = (d->mode - ((d->mode > T_V4HF) ? 1 : 0)) % 5;
+	    bool qreg_p
+	      = GET_MODE_SIZE (insn_data[d->code].operand[0].mode) > 8;
+	    int rhs = (d->mode - ((!qreg_p && (d->mode > T_V4HF)) ? 1 : 0))
+	              % NUM_QREG_TYPES;
 	    switch (insn_data[d->code].operand[0].mode)
 	      {
 	      case V8QImode: ftype = reinterp_ftype_dreg[0][rhs]; break;
@@ -23649,6 +23757,7 @@  arm_init_neon_builtins (void)
 	      case V4SImode: ftype = reinterp_ftype_qreg[2][rhs]; break;
 	      case V4SFmode: ftype = reinterp_ftype_qreg[3][rhs]; break;
 	      case V2DImode: ftype = reinterp_ftype_qreg[4][rhs]; break;
+	      case TImode: ftype = reinterp_ftype_qreg[5][rhs]; break;
 	      default: gcc_unreachable ();
 	      }
 	  }
@@ -23699,6 +23808,9 @@  arm_init_neon_builtins (void)
     }
 }
 
+#undef NUM_DREG_TYPES
+#undef NUM_QREG_TYPES
+
 #define def_mbuiltin(MASK, NAME, TYPE, CODE)				\
   do									\
     {									\
@@ -23838,6 +23950,22 @@  static const struct builtin_description bdesc_2arg[] =
    CRC32_BUILTIN (crc32ch, CRC32CH)
    CRC32_BUILTIN (crc32cw, CRC32CW)
 #undef CRC32_BUILTIN
+
+
+#define CRYPTO_BUILTIN(L, U) \
+  {0, CODE_FOR_crypto_##L, "__builtin_arm_crypto_"#L, ARM_BUILTIN_CRYPTO_##U, \
+   UNKNOWN, 0},
+#undef CRYPTO1
+#undef CRYPTO2
+#undef CRYPTO3
+#define CRYPTO2(L, U, R, A1, A2) CRYPTO_BUILTIN (L, U)
+#define CRYPTO1(L, U, R, A)
+#define CRYPTO3(L, U, R, A1, A2, A3)
+#include "crypto.def"
+#undef CRYPTO1
+#undef CRYPTO2
+#undef CRYPTO3
+
 };
 
 static const struct builtin_description bdesc_1arg[] =
@@ -23866,8 +23994,28 @@  static const struct builtin_description bdesc_1arg[] =
   IWMMXT_BUILTIN (tbcstv8qi, "tbcstb", TBCSTB)
   IWMMXT_BUILTIN (tbcstv4hi, "tbcsth", TBCSTH)
   IWMMXT_BUILTIN (tbcstv2si, "tbcstw", TBCSTW)
+
+#define CRYPTO1(L, U, R, A) CRYPTO_BUILTIN (L, U)
+#define CRYPTO2(L, U, R, A1, A2)
+#define CRYPTO3(L, U, R, A1, A2, A3)
+#include "crypto.def"
+#undef CRYPTO1
+#undef CRYPTO2
+#undef CRYPTO3
 };
 
+static const struct builtin_description bdesc_3arg[] =
+{
+#define CRYPTO3(L, U, R, A1, A2, A3) CRYPTO_BUILTIN (L, U)
+#define CRYPTO1(L, U, R, A)
+#define CRYPTO2(L, U, R, A1, A2)
+#include "crypto.def"
+#undef CRYPTO1
+#undef CRYPTO2
+#undef CRYPTO3
+ };
+#undef CRYPTO_BUILTIN
+
 /* Set up all the iWMMXt builtins.  This is not called if
    TARGET_IWMMXT is zero.  */
 
@@ -24399,6 +24547,73 @@  safe_vector_operand (rtx x, enum machine_mode mode)
   return x;
 }
 
+/* Function to expand ternary builtins.  */
+static rtx
+arm_expand_ternop_builtin (enum insn_code icode,
+                           tree exp, rtx target)
+{
+  rtx pat;
+  tree arg0 = CALL_EXPR_ARG (exp, 0);
+  tree arg1 = CALL_EXPR_ARG (exp, 1);
+  tree arg2 = CALL_EXPR_ARG (exp, 2);
+
+  rtx op0 = expand_normal (arg0);
+  rtx op1 = expand_normal (arg1);
+  rtx op2 = expand_normal (arg2);
+  rtx op3 = NULL_RTX;
+
+  /* The sha1c, sha1p, sha1m crypto builtins require a different vec_select
+     lane operand depending on endianness.  */
+  bool builtin_sha1cpm_p = false;
+
+  if (insn_data[icode].n_operands == 5)
+    {
+      gcc_assert (icode == CODE_FOR_crypto_sha1c
+                  || icode == CODE_FOR_crypto_sha1p
+                  || icode == CODE_FOR_crypto_sha1m);
+      builtin_sha1cpm_p = true;
+    }
+  enum machine_mode tmode = insn_data[icode].operand[0].mode;
+  enum machine_mode mode0 = insn_data[icode].operand[1].mode;
+  enum machine_mode mode1 = insn_data[icode].operand[2].mode;
+  enum machine_mode mode2 = insn_data[icode].operand[3].mode;
+
+
+  if (VECTOR_MODE_P (mode0))
+    op0 = safe_vector_operand (op0, mode0);
+  if (VECTOR_MODE_P (mode1))
+    op1 = safe_vector_operand (op1, mode1);
+  if (VECTOR_MODE_P (mode2))
+    op2 = safe_vector_operand (op2, mode2);
+
+  if (! target
+      || GET_MODE (target) != tmode
+      || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
+    target = gen_reg_rtx (tmode);
+
+  gcc_assert ((GET_MODE (op0) == mode0 || GET_MODE (op0) == VOIDmode)
+	      && (GET_MODE (op1) == mode1 || GET_MODE (op1) == VOIDmode)
+	      && (GET_MODE (op2) == mode2 || GET_MODE (op2) == VOIDmode));
+
+  if (! (*insn_data[icode].operand[1].predicate) (op0, mode0))
+    op0 = copy_to_mode_reg (mode0, op0);
+  if (! (*insn_data[icode].operand[2].predicate) (op1, mode1))
+    op1 = copy_to_mode_reg (mode1, op1);
+  if (! (*insn_data[icode].operand[3].predicate) (op2, mode2))
+    op2 = copy_to_mode_reg (mode2, op2);
+  if (builtin_sha1cpm_p)
+    op3 = GEN_INT (TARGET_BIG_END ? 1 : 0);
+
+  if (builtin_sha1cpm_p)
+    pat = GEN_FCN (icode) (target, op0, op1, op2, op3);
+  else
+    pat = GEN_FCN (icode) (target, op0, op1, op2);
+  if (! pat)
+    return 0;
+  emit_insn (pat);
+  return target;
+}
+
 /* Subroutine of arm_expand_builtin to take care of binop insns.  */
 
 static rtx
@@ -24448,8 +24663,16 @@  arm_expand_unop_builtin (enum insn_code icode,
   rtx pat;
   tree arg0 = CALL_EXPR_ARG (exp, 0);
   rtx op0 = expand_normal (arg0);
+  rtx op1 = NULL_RTX;
   enum machine_mode tmode = insn_data[icode].operand[0].mode;
   enum machine_mode mode0 = insn_data[icode].operand[1].mode;
+  bool builtin_sha1h_p = false;
+
+  if (insn_data[icode].n_operands == 3)
+    {
+      gcc_assert (icode == CODE_FOR_crypto_sha1h);
+      builtin_sha1h_p = true;
+    }
 
   if (! target
       || GET_MODE (target) != tmode
@@ -24465,8 +24688,13 @@  arm_expand_unop_builtin (enum insn_code icode,
       if (! (*insn_data[icode].operand[1].predicate) (op0, mode0))
 	op0 = copy_to_mode_reg (mode0, op0);
     }
+  if (builtin_sha1h_p)
+    op1 = GEN_INT (TARGET_BIG_END ? 1 : 0);
 
-  pat = GEN_FCN (icode) (target, op0);
+  if (builtin_sha1h_p)
+    pat = GEN_FCN (icode) (target, op0, op1);
+  else
+    pat = GEN_FCN (icode) (target, op0);
   if (! pat)
     return 0;
   emit_insn (pat);
@@ -25424,6 +25652,10 @@  arm_expand_builtin (tree exp,
     if (d->code == (const enum arm_builtins) fcode)
       return arm_expand_unop_builtin (d->icode, exp, target, 0);
 
+  for (i = 0, d = bdesc_3arg; i < ARRAY_SIZE (bdesc_3arg); i++, d++)
+    if (d->code == (const enum arm_builtins) fcode)
+      return arm_expand_ternop_builtin (d->icode, exp, target);
+
   /* @@@ Should really do something sensible here.  */
   return NULL_RTX;
 }
@@ -29083,6 +29315,7 @@  static arm_mangle_map_entry arm_mangle_map[] = {
   { V2SFmode,  "__builtin_neon_sf",     "18__simd64_float32_t" },
   { V8QImode,  "__builtin_neon_poly8",  "16__simd64_poly8_t" },
   { V4HImode,  "__builtin_neon_poly16", "17__simd64_poly16_t" },
+
   /* 128-bit containerized types.  */
   { V16QImode, "__builtin_neon_qi",     "16__simd128_int8_t" },
   { V16QImode, "__builtin_neon_uqi",    "17__simd128_uint8_t" },
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index cfc63bc..2e216e6 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -49,6 +49,8 @@  extern char arm_arch_name[];
            builtin_define ("__ARM_FEATURE_QBIT");	\
         if (TARGET_ARM_SAT)				\
            builtin_define ("__ARM_FEATURE_SAT");	\
+        if (TARGET_CRYPTO)				\
+	   builtin_define ("__ARM_FEATURE_CRYPTO");	\
 	if (unaligned_access)				\
 	  builtin_define ("__ARM_FEATURE_UNALIGNED");	\
 	if (TARGET_CRC32)				\
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 1692a13..05051ef 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -293,7 +293,7 @@ 
           neon_ext, neon_ext_q, neon_rbit, neon_rbit_q,\
           neon_rev, neon_rev_q, neon_mul_b, neon_mul_b_q, neon_mul_h,\
           neon_mul_h_q, neon_mul_s, neon_mul_s_q, neon_mul_b_long,\
-          neon_mul_h_long, neon_mul_s_long, neon_mul_h_scalar,\
+          neon_mul_h_long, neon_mul_s_long, neon_mul_d_long, neon_mul_h_scalar,\
           neon_mul_h_scalar_q, neon_mul_s_scalar, neon_mul_s_scalar_q,\
           neon_mul_h_scalar_long, neon_mul_s_scalar_long, neon_sat_mul_b,\
           neon_sat_mul_b_q, neon_sat_mul_h, neon_sat_mul_h_q,\
@@ -355,7 +355,9 @@ 
           neon_fp_mla_s_scalar, neon_fp_mla_s_scalar_q, neon_fp_mla_d,\
           neon_fp_mla_d_q, neon_fp_mla_d_scalar_q, neon_fp_sqrt_s,\
           neon_fp_sqrt_s_q, neon_fp_sqrt_d, neon_fp_sqrt_d_q,\
-          neon_fp_div_s, neon_fp_div_s_q, neon_fp_div_d, neon_fp_div_d_q")
+          neon_fp_div_s, neon_fp_div_s_q, neon_fp_div_d, neon_fp_div_d_q, crypto_aes,\
+          crypto_sha1_xor, crypto_sha1_fast, crypto_sha1_slow, crypto_sha256_fast,\
+          crypto_sha256_slow")
         (const_string "yes")
         (const_string "no")))
 
@@ -12900,6 +12902,8 @@ 
 (include "thumb2.md")
 ;; Neon patterns
 (include "neon.md")
+;; Crypto patterns
+(include "crypto.md")
 ;; Synchronization Primitives
 (include "sync.md")
 ;; Fixed-point patterns
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index e23d03b..59ef22c 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -42,10 +42,13 @@  typedef __builtin_neon_qi int8x8_t	__attribute__ ((__vector_size__ (8)));
 typedef __builtin_neon_hi int16x4_t	__attribute__ ((__vector_size__ (8)));
 typedef __builtin_neon_si int32x2_t	__attribute__ ((__vector_size__ (8)));
 typedef __builtin_neon_di int64x1_t;
-typedef __builtin_neon_sf float32x2_t	__attribute__ ((__vector_size__ (8)));
 typedef __builtin_neon_hf float16x4_t	__attribute__ ((__vector_size__ (8)));
+typedef __builtin_neon_sf float32x2_t	__attribute__ ((__vector_size__ (8)));
 typedef __builtin_neon_poly8 poly8x8_t	__attribute__ ((__vector_size__ (8)));
 typedef __builtin_neon_poly16 poly16x4_t	__attribute__ ((__vector_size__ (8)));
+#ifdef __ARM_FEATURE_CRYPTO
+typedef __builtin_neon_poly64 poly64x1_t;
+#endif
 typedef __builtin_neon_uqi uint8x8_t	__attribute__ ((__vector_size__ (8)));
 typedef __builtin_neon_uhi uint16x4_t	__attribute__ ((__vector_size__ (8)));
 typedef __builtin_neon_usi uint32x2_t	__attribute__ ((__vector_size__ (8)));
@@ -57,6 +60,9 @@  typedef __builtin_neon_di int64x2_t	__attribute__ ((__vector_size__ (16)));
 typedef __builtin_neon_sf float32x4_t	__attribute__ ((__vector_size__ (16)));
 typedef __builtin_neon_poly8 poly8x16_t	__attribute__ ((__vector_size__ (16)));
 typedef __builtin_neon_poly16 poly16x8_t	__attribute__ ((__vector_size__ (16)));
+#ifdef __ARM_FEATURE_CRYPTO
+typedef __builtin_neon_poly64 poly64x2_t	__attribute__ ((__vector_size__ (16)));
+#endif
 typedef __builtin_neon_uqi uint8x16_t	__attribute__ ((__vector_size__ (16)));
 typedef __builtin_neon_uhi uint16x8_t	__attribute__ ((__vector_size__ (16)));
 typedef __builtin_neon_usi uint32x4_t	__attribute__ ((__vector_size__ (16)));
@@ -65,6 +71,10 @@  typedef __builtin_neon_udi uint64x2_t	__attribute__ ((__vector_size__ (16)));
 typedef float float32_t;
 typedef __builtin_neon_poly8 poly8_t;
 typedef __builtin_neon_poly16 poly16_t;
+#ifdef __ARM_FEATURE_CRYPTO
+typedef __builtin_neon_poly64 poly64_t;
+typedef __builtin_neon_poly128 poly128_t;
+#endif
 
 typedef struct int8x8x2_t
 {
@@ -176,6 +186,22 @@  typedef struct poly16x8x2_t
   poly16x8_t val[2];
 } poly16x8x2_t;
 
+#ifdef __ARM_FEATURE_CRYPTO
+typedef struct poly64x1x2_t
+{
+  poly64x1_t val[2];
+} poly64x1x2_t;
+#endif
+
+
+#ifdef __ARM_FEATURE_CRYPTO
+typedef struct poly64x2x2_t
+{
+  poly64x2_t val[2];
+} poly64x2x2_t;
+#endif
+
+
 typedef struct int8x8x3_t
 {
   int8x8_t val[3];
@@ -286,6 +312,22 @@  typedef struct poly16x8x3_t
   poly16x8_t val[3];
 } poly16x8x3_t;
 
+#ifdef __ARM_FEATURE_CRYPTO
+typedef struct poly64x1x3_t
+{
+  poly64x1_t val[3];
+} poly64x1x3_t;
+#endif
+
+
+#ifdef __ARM_FEATURE_CRYPTO
+typedef struct poly64x2x3_t
+{
+  poly64x2_t val[3];
+} poly64x2x3_t;
+#endif
+
+
 typedef struct int8x8x4_t
 {
   int8x8_t val[4];
@@ -396,6 +438,22 @@  typedef struct poly16x8x4_t
   poly16x8_t val[4];
 } poly16x8x4_t;
 
+#ifdef __ARM_FEATURE_CRYPTO
+typedef struct poly64x1x4_t
+{
+  poly64x1_t val[4];
+} poly64x1x4_t;
+#endif
+
+
+#ifdef __ARM_FEATURE_CRYPTO
+typedef struct poly64x2x4_t
+{
+  poly64x2_t val[4];
+} poly64x2x4_t;
+#endif
+
+
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vadd_s8 (int8x8_t __a, int8x8_t __b)
@@ -4361,6 +4419,14 @@  vrsraq_n_u64 (uint64x2_t __a, uint64x2_t __b, const int __c)
   return (uint64x2_t)__builtin_neon_vsra_nv2di ((int64x2_t) __a, (int64x2_t) __b, __c, 4);
 }
 
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x1_t __attribute__ ((__always_inline__))
+vsri_n_p64 (poly64x1_t __a, poly64x1_t __b, const int __c)
+{
+  return (poly64x1_t)__builtin_neon_vsri_ndi (__a, __b, __c);
+}
+
+#endif
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vsri_n_s8 (int8x8_t __a, int8x8_t __b, const int __c)
 {
@@ -4421,6 +4487,14 @@  vsri_n_p16 (poly16x4_t __a, poly16x4_t __b, const int __c)
   return (poly16x4_t)__builtin_neon_vsri_nv4hi ((int16x4_t) __a, (int16x4_t) __b, __c);
 }
 
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x2_t __attribute__ ((__always_inline__))
+vsriq_n_p64 (poly64x2_t __a, poly64x2_t __b, const int __c)
+{
+  return (poly64x2_t)__builtin_neon_vsri_nv2di ((int64x2_t) __a, (int64x2_t) __b, __c);
+}
+
+#endif
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vsriq_n_s8 (int8x16_t __a, int8x16_t __b, const int __c)
 {
@@ -4481,6 +4555,14 @@  vsriq_n_p16 (poly16x8_t __a, poly16x8_t __b, const int __c)
   return (poly16x8_t)__builtin_neon_vsri_nv8hi ((int16x8_t) __a, (int16x8_t) __b, __c);
 }
 
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x1_t __attribute__ ((__always_inline__))
+vsli_n_p64 (poly64x1_t __a, poly64x1_t __b, const int __c)
+{
+  return (poly64x1_t)__builtin_neon_vsli_ndi (__a, __b, __c);
+}
+
+#endif
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vsli_n_s8 (int8x8_t __a, int8x8_t __b, const int __c)
 {
@@ -4541,6 +4623,14 @@  vsli_n_p16 (poly16x4_t __a, poly16x4_t __b, const int __c)
   return (poly16x4_t)__builtin_neon_vsli_nv4hi ((int16x4_t) __a, (int16x4_t) __b, __c);
 }
 
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x2_t __attribute__ ((__always_inline__))
+vsliq_n_p64 (poly64x2_t __a, poly64x2_t __b, const int __c)
+{
+  return (poly64x2_t)__builtin_neon_vsli_nv2di ((int64x2_t) __a, (int64x2_t) __b, __c);
+}
+
+#endif
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vsliq_n_s8 (int8x16_t __a, int8x16_t __b, const int __c)
 {
@@ -5309,6 +5399,14 @@  vsetq_lane_u64 (uint64_t __a, uint64x2_t __b, const int __c)
   return (uint64x2_t)__builtin_neon_vset_lanev2di ((__builtin_neon_di) __a, (int64x2_t) __b, __c);
 }
 
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x1_t __attribute__ ((__always_inline__))
+vcreate_p64 (uint64_t __a)
+{
+  return (poly64x1_t)__builtin_neon_vcreatedi ((__builtin_neon_di) __a);
+}
+
+#endif
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vcreate_s8 (uint64_t __a)
 {
@@ -5429,6 +5527,14 @@  vdup_n_p16 (poly16_t __a)
   return (poly16x4_t)__builtin_neon_vdup_nv4hi ((__builtin_neon_hi) __a);
 }
 
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x1_t __attribute__ ((__always_inline__))
+vdup_n_p64 (poly64_t __a)
+{
+  return (poly64x1_t)__builtin_neon_vdup_ndi ((__builtin_neon_di) __a);
+}
+
+#endif
 __extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
 vdup_n_s64 (int64_t __a)
 {
@@ -5441,6 +5547,14 @@  vdup_n_u64 (uint64_t __a)
   return (uint64x1_t)__builtin_neon_vdup_ndi ((__builtin_neon_di) __a);
 }
 
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x2_t __attribute__ ((__always_inline__))
+vdupq_n_p64 (poly64_t __a)
+{
+  return (poly64x2_t)__builtin_neon_vdup_nv2di ((__builtin_neon_di) __a);
+}
+
+#endif
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vdupq_n_s8 (int8_t __a)
 {
@@ -5693,6 +5807,14 @@  vdup_lane_p16 (poly16x4_t __a, const int __b)
   return (poly16x4_t)__builtin_neon_vdup_lanev4hi ((int16x4_t) __a, __b);
 }
 
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x1_t __attribute__ ((__always_inline__))
+vdup_lane_p64 (poly64x1_t __a, const int __b)
+{
+  return (poly64x1_t)__builtin_neon_vdup_lanedi (__a, __b);
+}
+
+#endif
 __extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
 vdup_lane_s64 (int64x1_t __a, const int __b)
 {
@@ -5759,6 +5881,14 @@  vdupq_lane_p16 (poly16x4_t __a, const int __b)
   return (poly16x8_t)__builtin_neon_vdup_lanev8hi ((int16x4_t) __a, __b);
 }
 
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x2_t __attribute__ ((__always_inline__))
+vdupq_lane_p64 (poly64x1_t __a, const int __b)
+{
+  return (poly64x2_t)__builtin_neon_vdup_lanev2di (__a, __b);
+}
+
+#endif
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vdupq_lane_s64 (int64x1_t __a, const int __b)
 {
@@ -5771,6 +5901,14 @@  vdupq_lane_u64 (uint64x1_t __a, const int __b)
   return (uint64x2_t)__builtin_neon_vdup_lanev2di ((int64x1_t) __a, __b);
 }
 
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x2_t __attribute__ ((__always_inline__))
+vcombine_p64 (poly64x1_t __a, poly64x1_t __b)
+{
+  return (poly64x2_t)__builtin_neon_vcombinedi (__a, __b);
+}
+
+#endif
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vcombine_s8 (int8x8_t __a, int8x8_t __b)
 {
@@ -5837,6 +5975,14 @@  vcombine_p16 (poly16x4_t __a, poly16x4_t __b)
   return (poly16x8_t)__builtin_neon_vcombinev4hi ((int16x4_t) __a, (int16x4_t) __b);
 }
 
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x1_t __attribute__ ((__always_inline__))
+vget_high_p64 (poly64x2_t __a)
+{
+  return (poly64x1_t)__builtin_neon_vget_highv2di ((int64x2_t) __a);
+}
+
+#endif
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vget_high_s8 (int8x16_t __a)
 {
@@ -5957,6 +6103,14 @@  vget_low_p16 (poly16x8_t __a)
   return (poly16x4_t)__builtin_neon_vget_lowv8hi ((int16x8_t) __a);
 }
 
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x1_t __attribute__ ((__always_inline__))
+vget_low_p64 (poly64x2_t __a)
+{
+  return (poly64x1_t)__builtin_neon_vget_lowv2di ((int64x2_t) __a);
+}
+
+#endif
 __extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
 vget_low_s64 (int64x2_t __a)
 {
@@ -7041,6 +7195,14 @@  vqdmlsl_n_s32 (int64x2_t __a, int32x2_t __b, int32_t __c)
   return (int64x2_t)__builtin_neon_vqdmlsl_nv2si (__a, __b, (__builtin_neon_si) __c, 1);
 }
 
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x1_t __attribute__ ((__always_inline__))
+vext_p64 (poly64x1_t __a, poly64x1_t __b, const int __c)
+{
+  return (poly64x1_t)__builtin_neon_vextdi (__a, __b, __c);
+}
+
+#endif
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vext_s8 (int8x8_t __a, int8x8_t __b, const int __c)
 {
@@ -7107,6 +7269,14 @@  vext_p16 (poly16x4_t __a, poly16x4_t __b, const int __c)
   return (poly16x4_t)__builtin_neon_vextv4hi ((int16x4_t) __a, (int16x4_t) __b, __c);
 }
 
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x2_t __attribute__ ((__always_inline__))
+vextq_p64 (poly64x2_t __a, poly64x2_t __b, const int __c)
+{
+  return (poly64x2_t)__builtin_neon_vextv2di ((int64x2_t) __a, (int64x2_t) __b, __c);
+}
+
+#endif
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vextq_s8 (int8x16_t __a, int8x16_t __b, const int __c)
 {
@@ -7389,6 +7559,14 @@  vrev16q_p8 (poly8x16_t __a)
   return (poly8x16_t) __builtin_shuffle (__a, (uint8x16_t) { 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14 });
 }
 
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x1_t __attribute__ ((__always_inline__))
+vbsl_p64 (uint64x1_t __a, poly64x1_t __b, poly64x1_t __c)
+{
+  return (poly64x1_t)__builtin_neon_vbsldi ((int64x1_t) __a, __b, __c);
+}
+
+#endif
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vbsl_s8 (uint8x8_t __a, int8x8_t __b, int8x8_t __c)
 {
@@ -7455,6 +7633,14 @@  vbsl_p16 (uint16x4_t __a, poly16x4_t __b, poly16x4_t __c)
   return (poly16x4_t)__builtin_neon_vbslv4hi ((int16x4_t) __a, (int16x4_t) __b, (int16x4_t) __c);
 }
 
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x2_t __attribute__ ((__always_inline__))
+vbslq_p64 (uint64x2_t __a, poly64x2_t __b, poly64x2_t __c)
+{
+  return (poly64x2_t)__builtin_neon_vbslv2di ((int64x2_t) __a, (int64x2_t) __b, (int64x2_t) __c);
+}
+
+#endif
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vbslq_s8 (uint8x16_t __a, int8x16_t __b, int8x16_t __c)
 {
@@ -8007,6 +8193,14 @@  vuzpq_p16 (poly16x8_t __a, poly16x8_t __b)
   return __rv;
 }
 
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x1_t __attribute__ ((__always_inline__))
+vld1_p64 (const poly64_t * __a)
+{
+  return (poly64x1_t)__builtin_neon_vld1di ((const __builtin_neon_di *) __a);
+}
+
+#endif
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vld1_s8 (const int8_t * __a)
 {
@@ -8073,6 +8267,14 @@  vld1_p16 (const poly16_t * __a)
   return (poly16x4_t)__builtin_neon_vld1v4hi ((const __builtin_neon_hi *) __a);
 }
 
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x2_t __attribute__ ((__always_inline__))
+vld1q_p64 (const poly64_t * __a)
+{
+  return (poly64x2_t)__builtin_neon_vld1v2di ((const __builtin_neon_di *) __a);
+}
+
+#endif
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vld1q_s8 (const int8_t * __a)
 {
@@ -8193,6 +8395,14 @@  vld1_lane_p16 (const poly16_t * __a, poly16x4_t __b, const int __c)
   return (poly16x4_t)__builtin_neon_vld1_lanev4hi ((const __builtin_neon_hi *) __a, (int16x4_t) __b, __c);
 }
 
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x1_t __attribute__ ((__always_inline__))
+vld1_lane_p64 (const poly64_t * __a, poly64x1_t __b, const int __c)
+{
+  return (poly64x1_t)__builtin_neon_vld1_lanedi ((const __builtin_neon_di *) __a, __b, __c);
+}
+
+#endif
 __extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
 vld1_lane_s64 (const int64_t * __a, int64x1_t __b, const int __c)
 {
@@ -8259,6 +8469,14 @@  vld1q_lane_p16 (const poly16_t * __a, poly16x8_t __b, const int __c)
   return (poly16x8_t)__builtin_neon_vld1_lanev8hi ((const __builtin_neon_hi *) __a, (int16x8_t) __b, __c);
 }
 
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x2_t __attribute__ ((__always_inline__))
+vld1q_lane_p64 (const poly64_t * __a, poly64x2_t __b, const int __c)
+{
+  return (poly64x2_t)__builtin_neon_vld1_lanev2di ((const __builtin_neon_di *) __a, (int64x2_t) __b, __c);
+}
+
+#endif
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vld1q_lane_s64 (const int64_t * __a, int64x2_t __b, const int __c)
 {
@@ -8325,6 +8543,14 @@  vld1_dup_p16 (const poly16_t * __a)
   return (poly16x4_t)__builtin_neon_vld1_dupv4hi ((const __builtin_neon_hi *) __a);
 }
 
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x1_t __attribute__ ((__always_inline__))
+vld1_dup_p64 (const poly64_t * __a)
+{
+  return (poly64x1_t)__builtin_neon_vld1_dupdi ((const __builtin_neon_di *) __a);
+}
+
+#endif
 __extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
 vld1_dup_s64 (const int64_t * __a)
 {
@@ -8391,6 +8617,14 @@  vld1q_dup_p16 (const poly16_t * __a)
   return (poly16x8_t)__builtin_neon_vld1_dupv8hi ((const __builtin_neon_hi *) __a);
 }
 
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x2_t __attribute__ ((__always_inline__))
+vld1q_dup_p64 (const poly64_t * __a)
+{
+  return (poly64x2_t)__builtin_neon_vld1_dupv2di ((const __builtin_neon_di *) __a);
+}
+
+#endif
 __extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
 vld1q_dup_s64 (const int64_t * __a)
 {
@@ -8403,6 +8637,14 @@  vld1q_dup_u64 (const uint64_t * __a)
   return (uint64x2_t)__builtin_neon_vld1_dupv2di ((const __builtin_neon_di *) __a);
 }
 
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vst1_p64 (poly64_t * __a, poly64x1_t __b)
+{
+  __builtin_neon_vst1di ((__builtin_neon_di *) __a, __b);
+}
+
+#endif
 __extension__ static __inline void __attribute__ ((__always_inline__))
 vst1_s8 (int8_t * __a, int8x8_t __b)
 {
@@ -8469,6 +8711,14 @@  vst1_p16 (poly16_t * __a, poly16x4_t __b)
   __builtin_neon_vst1v4hi ((__builtin_neon_hi *) __a, (int16x4_t) __b);
 }
 
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vst1q_p64 (poly64_t * __a, poly64x2_t __b)
+{
+  __builtin_neon_vst1v2di ((__builtin_neon_di *) __a, (int64x2_t) __b);
+}
+
+#endif
 __extension__ static __inline void __attribute__ ((__always_inline__))
 vst1q_s8 (int8_t * __a, int8x16_t __b)
 {
@@ -8589,6 +8839,14 @@  vst1_lane_p16 (poly16_t * __a, poly16x4_t __b, const int __c)
   __builtin_neon_vst1_lanev4hi ((__builtin_neon_hi *) __a, (int16x4_t) __b, __c);
 }
 
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vst1_lane_p64 (poly64_t * __a, poly64x1_t __b, const int __c)
+{
+  __builtin_neon_vst1_lanedi ((__builtin_neon_di *) __a, __b, __c);
+}
+
+#endif
 __extension__ static __inline void __attribute__ ((__always_inline__))
 vst1_lane_s64 (int64_t * __a, int64x1_t __b, const int __c)
 {
@@ -8655,6 +8913,14 @@  vst1q_lane_p16 (poly16_t * __a, poly16x8_t __b, const int __c)
   __builtin_neon_vst1_lanev8hi ((__builtin_neon_hi *) __a, (int16x8_t) __b, __c);
 }
 
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vst1q_lane_p64 (poly64_t * __a, poly64x2_t __b, const int __c)
+{
+  __builtin_neon_vst1_lanev2di ((__builtin_neon_di *) __a, (int64x2_t) __b, __c);
+}
+
+#endif
 __extension__ static __inline void __attribute__ ((__always_inline__))
 vst1q_lane_s64 (int64_t * __a, int64x2_t __b, const int __c)
 {
@@ -8739,6 +9005,16 @@  vld2_p16 (const poly16_t * __a)
   return __rv.__i;
 }
 
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x1x2_t __attribute__ ((__always_inline__))
+vld2_p64 (const poly64_t * __a)
+{
+  union { poly64x1x2_t __i; __builtin_neon_ti __o; } __rv;
+  __rv.__o = __builtin_neon_vld2di ((const __builtin_neon_di *) __a);
+  return __rv.__i;
+}
+
+#endif
 __extension__ static __inline int64x1x2_t __attribute__ ((__always_inline__))
 vld2_s64 (const int64_t * __a)
 {
@@ -9034,6 +9310,16 @@  vld2_dup_p16 (const poly16_t * __a)
   return __rv.__i;
 }
 
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x1x2_t __attribute__ ((__always_inline__))
+vld2_dup_p64 (const poly64_t * __a)
+{
+  union { poly64x1x2_t __i; __builtin_neon_ti __o; } __rv;
+  __rv.__o = __builtin_neon_vld2_dupdi ((const __builtin_neon_di *) __a);
+  return __rv.__i;
+}
+
+#endif
 __extension__ static __inline int64x1x2_t __attribute__ ((__always_inline__))
 vld2_dup_s64 (const int64_t * __a)
 {
@@ -9113,6 +9399,15 @@  vst2_p16 (poly16_t * __a, poly16x4x2_t __b)
   __builtin_neon_vst2v4hi ((__builtin_neon_hi *) __a, __bu.__o);
 }
 
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vst2_p64 (poly64_t * __a, poly64x1x2_t __b)
+{
+  union { poly64x1x2_t __i; __builtin_neon_ti __o; } __bu = { __b };
+  __builtin_neon_vst2di ((__builtin_neon_di *) __a, __bu.__o);
+}
+
+#endif
 __extension__ static __inline void __attribute__ ((__always_inline__))
 vst2_s64 (int64_t * __a, int64x1x2_t __b)
 {
@@ -9367,6 +9662,16 @@  vld3_p16 (const poly16_t * __a)
   return __rv.__i;
 }
 
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x1x3_t __attribute__ ((__always_inline__))
+vld3_p64 (const poly64_t * __a)
+{
+  union { poly64x1x3_t __i; __builtin_neon_ei __o; } __rv;
+  __rv.__o = __builtin_neon_vld3di ((const __builtin_neon_di *) __a);
+  return __rv.__i;
+}
+
+#endif
 __extension__ static __inline int64x1x3_t __attribute__ ((__always_inline__))
 vld3_s64 (const int64_t * __a)
 {
@@ -9662,6 +9967,16 @@  vld3_dup_p16 (const poly16_t * __a)
   return __rv.__i;
 }
 
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x1x3_t __attribute__ ((__always_inline__))
+vld3_dup_p64 (const poly64_t * __a)
+{
+  union { poly64x1x3_t __i; __builtin_neon_ei __o; } __rv;
+  __rv.__o = __builtin_neon_vld3_dupdi ((const __builtin_neon_di *) __a);
+  return __rv.__i;
+}
+
+#endif
 __extension__ static __inline int64x1x3_t __attribute__ ((__always_inline__))
 vld3_dup_s64 (const int64_t * __a)
 {
@@ -9741,6 +10056,15 @@  vst3_p16 (poly16_t * __a, poly16x4x3_t __b)
   __builtin_neon_vst3v4hi ((__builtin_neon_hi *) __a, __bu.__o);
 }
 
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vst3_p64 (poly64_t * __a, poly64x1x3_t __b)
+{
+  union { poly64x1x3_t __i; __builtin_neon_ei __o; } __bu = { __b };
+  __builtin_neon_vst3di ((__builtin_neon_di *) __a, __bu.__o);
+}
+
+#endif
 __extension__ static __inline void __attribute__ ((__always_inline__))
 vst3_s64 (int64_t * __a, int64x1x3_t __b)
 {
@@ -9995,6 +10319,16 @@  vld4_p16 (const poly16_t * __a)
   return __rv.__i;
 }
 
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x1x4_t __attribute__ ((__always_inline__))
+vld4_p64 (const poly64_t * __a)
+{
+  union { poly64x1x4_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld4di ((const __builtin_neon_di *) __a);
+  return __rv.__i;
+}
+
+#endif
 __extension__ static __inline int64x1x4_t __attribute__ ((__always_inline__))
 vld4_s64 (const int64_t * __a)
 {
@@ -10290,6 +10624,16 @@  vld4_dup_p16 (const poly16_t * __a)
   return __rv.__i;
 }
 
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x1x4_t __attribute__ ((__always_inline__))
+vld4_dup_p64 (const poly64_t * __a)
+{
+  union { poly64x1x4_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld4_dupdi ((const __builtin_neon_di *) __a);
+  return __rv.__i;
+}
+
+#endif
 __extension__ static __inline int64x1x4_t __attribute__ ((__always_inline__))
 vld4_dup_s64 (const int64_t * __a)
 {
@@ -10369,6 +10713,15 @@  vst4_p16 (poly16_t * __a, poly16x4x4_t __b)
   __builtin_neon_vst4v4hi ((__builtin_neon_hi *) __a, __bu.__o);
 }
 
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vst4_p64 (poly64_t * __a, poly64x1x4_t __b)
+{
+  union { poly64x1x4_t __i; __builtin_neon_oi __o; } __bu = { __b };
+  __builtin_neon_vst4di ((__builtin_neon_di *) __a, __bu.__o);
+}
+
+#endif
 __extension__ static __inline void __attribute__ ((__always_inline__))
 vst4_s64 (int64_t * __a, int64x1x4_t __b)
 {
@@ -11033,23 +11386,25 @@  vornq_u64 (uint64x2_t __a, uint64x2_t __b)
 
 
 __extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
-vreinterpret_p8_s8 (int8x8_t __a)
+vreinterpret_p8_p16 (poly16x4_t __a)
 {
-  return (poly8x8_t)__builtin_neon_vreinterpretv8qiv8qi (__a);
+  return (poly8x8_t)__builtin_neon_vreinterpretv8qiv4hi ((int16x4_t) __a);
 }
 
 __extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
-vreinterpret_p8_s16 (int16x4_t __a)
+vreinterpret_p8_f32 (float32x2_t __a)
 {
-  return (poly8x8_t)__builtin_neon_vreinterpretv8qiv4hi (__a);
+  return (poly8x8_t)__builtin_neon_vreinterpretv8qiv2sf (__a);
 }
 
+#ifdef __ARM_FEATURE_CRYPTO
 __extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
-vreinterpret_p8_s32 (int32x2_t __a)
+vreinterpret_p8_p64 (poly64x1_t __a)
 {
-  return (poly8x8_t)__builtin_neon_vreinterpretv8qiv2si (__a);
+  return (poly8x8_t)__builtin_neon_vreinterpretv8qidi (__a);
 }
 
+#endif
 __extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
 vreinterpret_p8_s64 (int64x1_t __a)
 {
@@ -11057,99 +11412,77 @@  vreinterpret_p8_s64 (int64x1_t __a)
 }
 
 __extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
-vreinterpret_p8_f32 (float32x2_t __a)
+vreinterpret_p8_u64 (uint64x1_t __a)
 {
-  return (poly8x8_t)__builtin_neon_vreinterpretv8qiv2sf (__a);
+  return (poly8x8_t)__builtin_neon_vreinterpretv8qidi ((int64x1_t) __a);
 }
 
 __extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
-vreinterpret_p8_u8 (uint8x8_t __a)
+vreinterpret_p8_s8 (int8x8_t __a)
 {
-  return (poly8x8_t)__builtin_neon_vreinterpretv8qiv8qi ((int8x8_t) __a);
+  return (poly8x8_t)__builtin_neon_vreinterpretv8qiv8qi (__a);
 }
 
 __extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
-vreinterpret_p8_u16 (uint16x4_t __a)
+vreinterpret_p8_s16 (int16x4_t __a)
 {
-  return (poly8x8_t)__builtin_neon_vreinterpretv8qiv4hi ((int16x4_t) __a);
+  return (poly8x8_t)__builtin_neon_vreinterpretv8qiv4hi (__a);
 }
 
 __extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
-vreinterpret_p8_u32 (uint32x2_t __a)
+vreinterpret_p8_s32 (int32x2_t __a)
 {
-  return (poly8x8_t)__builtin_neon_vreinterpretv8qiv2si ((int32x2_t) __a);
+  return (poly8x8_t)__builtin_neon_vreinterpretv8qiv2si (__a);
 }
 
 __extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
-vreinterpret_p8_u64 (uint64x1_t __a)
+vreinterpret_p8_u8 (uint8x8_t __a)
 {
-  return (poly8x8_t)__builtin_neon_vreinterpretv8qidi ((int64x1_t) __a);
+  return (poly8x8_t)__builtin_neon_vreinterpretv8qiv8qi ((int8x8_t) __a);
 }
 
 __extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
-vreinterpret_p8_p16 (poly16x4_t __a)
+vreinterpret_p8_u16 (uint16x4_t __a)
 {
   return (poly8x8_t)__builtin_neon_vreinterpretv8qiv4hi ((int16x4_t) __a);
 }
 
-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
-vreinterpretq_p8_s8 (int8x16_t __a)
+__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+vreinterpret_p8_u32 (uint32x2_t __a)
 {
-  return (poly8x16_t)__builtin_neon_vreinterpretv16qiv16qi (__a);
+  return (poly8x8_t)__builtin_neon_vreinterpretv8qiv2si ((int32x2_t) __a);
 }
 
-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
-vreinterpretq_p8_s16 (int16x8_t __a)
+__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
+vreinterpret_p16_p8 (poly8x8_t __a)
 {
-  return (poly8x16_t)__builtin_neon_vreinterpretv16qiv8hi (__a);
+  return (poly16x4_t)__builtin_neon_vreinterpretv4hiv8qi ((int8x8_t) __a);
 }
 
-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
-vreinterpretq_p8_s32 (int32x4_t __a)
+__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
+vreinterpret_p16_f32 (float32x2_t __a)
 {
-  return (poly8x16_t)__builtin_neon_vreinterpretv16qiv4si (__a);
+  return (poly16x4_t)__builtin_neon_vreinterpretv4hiv2sf (__a);
 }
 
-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
-vreinterpretq_p8_s64 (int64x2_t __a)
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
+vreinterpret_p16_p64 (poly64x1_t __a)
 {
-  return (poly8x16_t)__builtin_neon_vreinterpretv16qiv2di (__a);
+  return (poly16x4_t)__builtin_neon_vreinterpretv4hidi (__a);
 }
 
-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
-vreinterpretq_p8_f32 (float32x4_t __a)
-{
-  return (poly8x16_t)__builtin_neon_vreinterpretv16qiv4sf (__a);
-}
-
-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
-vreinterpretq_p8_u8 (uint8x16_t __a)
-{
-  return (poly8x16_t)__builtin_neon_vreinterpretv16qiv16qi ((int8x16_t) __a);
-}
-
-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
-vreinterpretq_p8_u16 (uint16x8_t __a)
-{
-  return (poly8x16_t)__builtin_neon_vreinterpretv16qiv8hi ((int16x8_t) __a);
-}
-
-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
-vreinterpretq_p8_u32 (uint32x4_t __a)
-{
-  return (poly8x16_t)__builtin_neon_vreinterpretv16qiv4si ((int32x4_t) __a);
-}
-
-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
-vreinterpretq_p8_u64 (uint64x2_t __a)
+#endif
+__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
+vreinterpret_p16_s64 (int64x1_t __a)
 {
-  return (poly8x16_t)__builtin_neon_vreinterpretv16qiv2di ((int64x2_t) __a);
+  return (poly16x4_t)__builtin_neon_vreinterpretv4hidi (__a);
 }
 
-__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
-vreinterpretq_p8_p16 (poly16x8_t __a)
+__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
+vreinterpret_p16_u64 (uint64x1_t __a)
 {
-  return (poly8x16_t)__builtin_neon_vreinterpretv16qiv8hi ((int16x8_t) __a);
+  return (poly16x4_t)__builtin_neon_vreinterpretv4hidi ((int64x1_t) __a);
 }
 
 __extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
@@ -11171,18 +11504,6 @@  vreinterpret_p16_s32 (int32x2_t __a)
 }
 
 __extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
-vreinterpret_p16_s64 (int64x1_t __a)
-{
-  return (poly16x4_t)__builtin_neon_vreinterpretv4hidi (__a);
-}
-
-__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
-vreinterpret_p16_f32 (float32x2_t __a)
-{
-  return (poly16x4_t)__builtin_neon_vreinterpretv4hiv2sf (__a);
-}
-
-__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
 vreinterpret_p16_u8 (uint8x8_t __a)
 {
   return (poly16x4_t)__builtin_neon_vreinterpretv4hiv8qi ((int8x8_t) __a);
@@ -11200,76 +11521,36 @@  vreinterpret_p16_u32 (uint32x2_t __a)
   return (poly16x4_t)__builtin_neon_vreinterpretv4hiv2si ((int32x2_t) __a);
 }
 
-__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
-vreinterpret_p16_u64 (uint64x1_t __a)
-{
-  return (poly16x4_t)__builtin_neon_vreinterpretv4hidi ((int64x1_t) __a);
-}
-
-__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
-vreinterpret_p16_p8 (poly8x8_t __a)
-{
-  return (poly16x4_t)__builtin_neon_vreinterpretv4hiv8qi ((int8x8_t) __a);
-}
-
-__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
-vreinterpretq_p16_s8 (int8x16_t __a)
-{
-  return (poly16x8_t)__builtin_neon_vreinterpretv8hiv16qi (__a);
-}
-
-__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
-vreinterpretq_p16_s16 (int16x8_t __a)
-{
-  return (poly16x8_t)__builtin_neon_vreinterpretv8hiv8hi (__a);
-}
-
-__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
-vreinterpretq_p16_s32 (int32x4_t __a)
-{
-  return (poly16x8_t)__builtin_neon_vreinterpretv8hiv4si (__a);
-}
-
-__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
-vreinterpretq_p16_s64 (int64x2_t __a)
-{
-  return (poly16x8_t)__builtin_neon_vreinterpretv8hiv2di (__a);
-}
-
-__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
-vreinterpretq_p16_f32 (float32x4_t __a)
-{
-  return (poly16x8_t)__builtin_neon_vreinterpretv8hiv4sf (__a);
-}
-
-__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
-vreinterpretq_p16_u8 (uint8x16_t __a)
+__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+vreinterpret_f32_p8 (poly8x8_t __a)
 {
-  return (poly16x8_t)__builtin_neon_vreinterpretv8hiv16qi ((int8x16_t) __a);
+  return (float32x2_t)__builtin_neon_vreinterpretv2sfv8qi ((int8x8_t) __a);
 }
 
-__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
-vreinterpretq_p16_u16 (uint16x8_t __a)
+__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+vreinterpret_f32_p16 (poly16x4_t __a)
 {
-  return (poly16x8_t)__builtin_neon_vreinterpretv8hiv8hi ((int16x8_t) __a);
+  return (float32x2_t)__builtin_neon_vreinterpretv2sfv4hi ((int16x4_t) __a);
 }
 
-__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
-vreinterpretq_p16_u32 (uint32x4_t __a)
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+vreinterpret_f32_p64 (poly64x1_t __a)
 {
-  return (poly16x8_t)__builtin_neon_vreinterpretv8hiv4si ((int32x4_t) __a);
+  return (float32x2_t)__builtin_neon_vreinterpretv2sfdi (__a);
 }
 
-__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
-vreinterpretq_p16_u64 (uint64x2_t __a)
+#endif
+__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+vreinterpret_f32_s64 (int64x1_t __a)
 {
-  return (poly16x8_t)__builtin_neon_vreinterpretv8hiv2di ((int64x2_t) __a);
+  return (float32x2_t)__builtin_neon_vreinterpretv2sfdi (__a);
 }
 
-__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
-vreinterpretq_p16_p8 (poly8x16_t __a)
+__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+vreinterpret_f32_u64 (uint64x1_t __a)
 {
-  return (poly16x8_t)__builtin_neon_vreinterpretv8hiv16qi ((int8x16_t) __a);
+  return (float32x2_t)__builtin_neon_vreinterpretv2sfdi ((int64x1_t) __a);
 }
 
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
@@ -11291,12 +11572,6 @@  vreinterpret_f32_s32 (int32x2_t __a)
 }
 
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
-vreinterpret_f32_s64 (int64x1_t __a)
-{
-  return (float32x2_t)__builtin_neon_vreinterpretv2sfdi (__a);
-}
-
-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vreinterpret_f32_u8 (uint8x8_t __a)
 {
   return (float32x2_t)__builtin_neon_vreinterpretv2sfv8qi ((int8x8_t) __a);
@@ -11314,82 +11589,124 @@  vreinterpret_f32_u32 (uint32x2_t __a)
   return (float32x2_t)__builtin_neon_vreinterpretv2sfv2si ((int32x2_t) __a);
 }
 
-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
-vreinterpret_f32_u64 (uint64x1_t __a)
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x1_t __attribute__ ((__always_inline__))
+vreinterpret_p64_p8 (poly8x8_t __a)
 {
-  return (float32x2_t)__builtin_neon_vreinterpretv2sfdi ((int64x1_t) __a);
+  return (poly64x1_t)__builtin_neon_vreinterpretdiv8qi ((int8x8_t) __a);
 }
 
-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
-vreinterpret_f32_p8 (poly8x8_t __a)
+#endif
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x1_t __attribute__ ((__always_inline__))
+vreinterpret_p64_p16 (poly16x4_t __a)
 {
-  return (float32x2_t)__builtin_neon_vreinterpretv2sfv8qi ((int8x8_t) __a);
+  return (poly64x1_t)__builtin_neon_vreinterpretdiv4hi ((int16x4_t) __a);
 }
 
-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
-vreinterpret_f32_p16 (poly16x4_t __a)
+#endif
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x1_t __attribute__ ((__always_inline__))
+vreinterpret_p64_f32 (float32x2_t __a)
 {
-  return (float32x2_t)__builtin_neon_vreinterpretv2sfv4hi ((int16x4_t) __a);
+  return (poly64x1_t)__builtin_neon_vreinterpretdiv2sf (__a);
 }
 
-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
-vreinterpretq_f32_s8 (int8x16_t __a)
+#endif
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x1_t __attribute__ ((__always_inline__))
+vreinterpret_p64_s64 (int64x1_t __a)
 {
-  return (float32x4_t)__builtin_neon_vreinterpretv4sfv16qi (__a);
+  return (poly64x1_t)__builtin_neon_vreinterpretdidi (__a);
 }
 
-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
-vreinterpretq_f32_s16 (int16x8_t __a)
+#endif
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x1_t __attribute__ ((__always_inline__))
+vreinterpret_p64_u64 (uint64x1_t __a)
 {
-  return (float32x4_t)__builtin_neon_vreinterpretv4sfv8hi (__a);
+  return (poly64x1_t)__builtin_neon_vreinterpretdidi ((int64x1_t) __a);
 }
 
-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
-vreinterpretq_f32_s32 (int32x4_t __a)
+#endif
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x1_t __attribute__ ((__always_inline__))
+vreinterpret_p64_s8 (int8x8_t __a)
 {
-  return (float32x4_t)__builtin_neon_vreinterpretv4sfv4si (__a);
+  return (poly64x1_t)__builtin_neon_vreinterpretdiv8qi (__a);
 }
 
-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
-vreinterpretq_f32_s64 (int64x2_t __a)
+#endif
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x1_t __attribute__ ((__always_inline__))
+vreinterpret_p64_s16 (int16x4_t __a)
 {
-  return (float32x4_t)__builtin_neon_vreinterpretv4sfv2di (__a);
+  return (poly64x1_t)__builtin_neon_vreinterpretdiv4hi (__a);
 }
 
-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
-vreinterpretq_f32_u8 (uint8x16_t __a)
+#endif
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x1_t __attribute__ ((__always_inline__))
+vreinterpret_p64_s32 (int32x2_t __a)
 {
-  return (float32x4_t)__builtin_neon_vreinterpretv4sfv16qi ((int8x16_t) __a);
+  return (poly64x1_t)__builtin_neon_vreinterpretdiv2si (__a);
 }
 
-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
-vreinterpretq_f32_u16 (uint16x8_t __a)
+#endif
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x1_t __attribute__ ((__always_inline__))
+vreinterpret_p64_u8 (uint8x8_t __a)
 {
-  return (float32x4_t)__builtin_neon_vreinterpretv4sfv8hi ((int16x8_t) __a);
+  return (poly64x1_t)__builtin_neon_vreinterpretdiv8qi ((int8x8_t) __a);
 }
 
-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
-vreinterpretq_f32_u32 (uint32x4_t __a)
+#endif
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x1_t __attribute__ ((__always_inline__))
+vreinterpret_p64_u16 (uint16x4_t __a)
 {
-  return (float32x4_t)__builtin_neon_vreinterpretv4sfv4si ((int32x4_t) __a);
+  return (poly64x1_t)__builtin_neon_vreinterpretdiv4hi ((int16x4_t) __a);
 }
 
-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
-vreinterpretq_f32_u64 (uint64x2_t __a)
+#endif
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x1_t __attribute__ ((__always_inline__))
+vreinterpret_p64_u32 (uint32x2_t __a)
 {
-  return (float32x4_t)__builtin_neon_vreinterpretv4sfv2di ((int64x2_t) __a);
+  return (poly64x1_t)__builtin_neon_vreinterpretdiv2si ((int32x2_t) __a);
 }
 
-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
-vreinterpretq_f32_p8 (poly8x16_t __a)
+#endif
+__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+vreinterpret_s64_p8 (poly8x8_t __a)
 {
-  return (float32x4_t)__builtin_neon_vreinterpretv4sfv16qi ((int8x16_t) __a);
+  return (int64x1_t)__builtin_neon_vreinterpretdiv8qi ((int8x8_t) __a);
 }
 
-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
-vreinterpretq_f32_p16 (poly16x8_t __a)
+__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+vreinterpret_s64_p16 (poly16x4_t __a)
 {
-  return (float32x4_t)__builtin_neon_vreinterpretv4sfv8hi ((int16x8_t) __a);
+  return (int64x1_t)__builtin_neon_vreinterpretdiv4hi ((int16x4_t) __a);
+}
+
+__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+vreinterpret_s64_f32 (float32x2_t __a)
+{
+  return (int64x1_t)__builtin_neon_vreinterpretdiv2sf (__a);
+}
+
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+vreinterpret_s64_p64 (poly64x1_t __a)
+{
+  return (int64x1_t)__builtin_neon_vreinterpretdidi (__a);
+}
+
+#endif
+__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+vreinterpret_s64_u64 (uint64x1_t __a)
+{
+  return (int64x1_t)__builtin_neon_vreinterpretdidi ((int64x1_t) __a);
 }
 
 __extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
@@ -11411,12 +11728,6 @@  vreinterpret_s64_s32 (int32x2_t __a)
 }
 
 __extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
-vreinterpret_s64_f32 (float32x2_t __a)
-{
-  return (int64x1_t)__builtin_neon_vreinterpretdiv2sf (__a);
-}
-
-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
 vreinterpret_s64_u8 (uint8x8_t __a)
 {
   return (int64x1_t)__builtin_neon_vreinterpretdiv8qi ((int8x8_t) __a);
@@ -11434,550 +11745,1204 @@  vreinterpret_s64_u32 (uint32x2_t __a)
   return (int64x1_t)__builtin_neon_vreinterpretdiv2si ((int32x2_t) __a);
 }
 
-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
-vreinterpret_s64_u64 (uint64x1_t __a)
+__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+vreinterpret_u64_p8 (poly8x8_t __a)
 {
-  return (int64x1_t)__builtin_neon_vreinterpretdidi ((int64x1_t) __a);
+  return (uint64x1_t)__builtin_neon_vreinterpretdiv8qi ((int8x8_t) __a);
 }
 
-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
-vreinterpret_s64_p8 (poly8x8_t __a)
+__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+vreinterpret_u64_p16 (poly16x4_t __a)
 {
-  return (int64x1_t)__builtin_neon_vreinterpretdiv8qi ((int8x8_t) __a);
+  return (uint64x1_t)__builtin_neon_vreinterpretdiv4hi ((int16x4_t) __a);
 }
 
-__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
-vreinterpret_s64_p16 (poly16x4_t __a)
+__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+vreinterpret_u64_f32 (float32x2_t __a)
 {
-  return (int64x1_t)__builtin_neon_vreinterpretdiv4hi ((int16x4_t) __a);
+  return (uint64x1_t)__builtin_neon_vreinterpretdiv2sf (__a);
 }
 
-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
-vreinterpretq_s64_s8 (int8x16_t __a)
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+vreinterpret_u64_p64 (poly64x1_t __a)
 {
-  return (int64x2_t)__builtin_neon_vreinterpretv2div16qi (__a);
+  return (uint64x1_t)__builtin_neon_vreinterpretdidi (__a);
 }
 
-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
-vreinterpretq_s64_s16 (int16x8_t __a)
+#endif
+__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+vreinterpret_u64_s64 (int64x1_t __a)
 {
-  return (int64x2_t)__builtin_neon_vreinterpretv2div8hi (__a);
+  return (uint64x1_t)__builtin_neon_vreinterpretdidi (__a);
 }
 
-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
-vreinterpretq_s64_s32 (int32x4_t __a)
+__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+vreinterpret_u64_s8 (int8x8_t __a)
 {
-  return (int64x2_t)__builtin_neon_vreinterpretv2div4si (__a);
+  return (uint64x1_t)__builtin_neon_vreinterpretdiv8qi (__a);
 }
 
-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
-vreinterpretq_s64_f32 (float32x4_t __a)
+__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+vreinterpret_u64_s16 (int16x4_t __a)
 {
-  return (int64x2_t)__builtin_neon_vreinterpretv2div4sf (__a);
+  return (uint64x1_t)__builtin_neon_vreinterpretdiv4hi (__a);
 }
 
-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
-vreinterpretq_s64_u8 (uint8x16_t __a)
+__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+vreinterpret_u64_s32 (int32x2_t __a)
 {
-  return (int64x2_t)__builtin_neon_vreinterpretv2div16qi ((int8x16_t) __a);
+  return (uint64x1_t)__builtin_neon_vreinterpretdiv2si (__a);
 }
 
-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
-vreinterpretq_s64_u16 (uint16x8_t __a)
+__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+vreinterpret_u64_u8 (uint8x8_t __a)
 {
-  return (int64x2_t)__builtin_neon_vreinterpretv2div8hi ((int16x8_t) __a);
+  return (uint64x1_t)__builtin_neon_vreinterpretdiv8qi ((int8x8_t) __a);
 }
 
-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
-vreinterpretq_s64_u32 (uint32x4_t __a)
+__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+vreinterpret_u64_u16 (uint16x4_t __a)
+{
+  return (uint64x1_t)__builtin_neon_vreinterpretdiv4hi ((int16x4_t) __a);
+}
+
+__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+vreinterpret_u64_u32 (uint32x2_t __a)
+{
+  return (uint64x1_t)__builtin_neon_vreinterpretdiv2si ((int32x2_t) __a);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vreinterpret_s8_p8 (poly8x8_t __a)
+{
+  return (int8x8_t)__builtin_neon_vreinterpretv8qiv8qi ((int8x8_t) __a);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vreinterpret_s8_p16 (poly16x4_t __a)
+{
+  return (int8x8_t)__builtin_neon_vreinterpretv8qiv4hi ((int16x4_t) __a);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vreinterpret_s8_f32 (float32x2_t __a)
+{
+  return (int8x8_t)__builtin_neon_vreinterpretv8qiv2sf (__a);
+}
+
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vreinterpret_s8_p64 (poly64x1_t __a)
+{
+  return (int8x8_t)__builtin_neon_vreinterpretv8qidi (__a);
+}
+
+#endif
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vreinterpret_s8_s64 (int64x1_t __a)
+{
+  return (int8x8_t)__builtin_neon_vreinterpretv8qidi (__a);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vreinterpret_s8_u64 (uint64x1_t __a)
+{
+  return (int8x8_t)__builtin_neon_vreinterpretv8qidi ((int64x1_t) __a);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vreinterpret_s8_s16 (int16x4_t __a)
+{
+  return (int8x8_t)__builtin_neon_vreinterpretv8qiv4hi (__a);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vreinterpret_s8_s32 (int32x2_t __a)
+{
+  return (int8x8_t)__builtin_neon_vreinterpretv8qiv2si (__a);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vreinterpret_s8_u8 (uint8x8_t __a)
+{
+  return (int8x8_t)__builtin_neon_vreinterpretv8qiv8qi ((int8x8_t) __a);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vreinterpret_s8_u16 (uint16x4_t __a)
+{
+  return (int8x8_t)__builtin_neon_vreinterpretv8qiv4hi ((int16x4_t) __a);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vreinterpret_s8_u32 (uint32x2_t __a)
+{
+  return (int8x8_t)__builtin_neon_vreinterpretv8qiv2si ((int32x2_t) __a);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vreinterpret_s16_p8 (poly8x8_t __a)
+{
+  return (int16x4_t)__builtin_neon_vreinterpretv4hiv8qi ((int8x8_t) __a);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vreinterpret_s16_p16 (poly16x4_t __a)
+{
+  return (int16x4_t)__builtin_neon_vreinterpretv4hiv4hi ((int16x4_t) __a);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vreinterpret_s16_f32 (float32x2_t __a)
+{
+  return (int16x4_t)__builtin_neon_vreinterpretv4hiv2sf (__a);
+}
+
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vreinterpret_s16_p64 (poly64x1_t __a)
+{
+  return (int16x4_t)__builtin_neon_vreinterpretv4hidi (__a);
+}
+
+#endif
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vreinterpret_s16_s64 (int64x1_t __a)
+{
+  return (int16x4_t)__builtin_neon_vreinterpretv4hidi (__a);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vreinterpret_s16_u64 (uint64x1_t __a)
+{
+  return (int16x4_t)__builtin_neon_vreinterpretv4hidi ((int64x1_t) __a);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vreinterpret_s16_s8 (int8x8_t __a)
+{
+  return (int16x4_t)__builtin_neon_vreinterpretv4hiv8qi (__a);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vreinterpret_s16_s32 (int32x2_t __a)
+{
+  return (int16x4_t)__builtin_neon_vreinterpretv4hiv2si (__a);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vreinterpret_s16_u8 (uint8x8_t __a)
+{
+  return (int16x4_t)__builtin_neon_vreinterpretv4hiv8qi ((int8x8_t) __a);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vreinterpret_s16_u16 (uint16x4_t __a)
+{
+  return (int16x4_t)__builtin_neon_vreinterpretv4hiv4hi ((int16x4_t) __a);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vreinterpret_s16_u32 (uint32x2_t __a)
+{
+  return (int16x4_t)__builtin_neon_vreinterpretv4hiv2si ((int32x2_t) __a);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vreinterpret_s32_p8 (poly8x8_t __a)
+{
+  return (int32x2_t)__builtin_neon_vreinterpretv2siv8qi ((int8x8_t) __a);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vreinterpret_s32_p16 (poly16x4_t __a)
+{
+  return (int32x2_t)__builtin_neon_vreinterpretv2siv4hi ((int16x4_t) __a);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vreinterpret_s32_f32 (float32x2_t __a)
+{
+  return (int32x2_t)__builtin_neon_vreinterpretv2siv2sf (__a);
+}
+
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vreinterpret_s32_p64 (poly64x1_t __a)
+{
+  return (int32x2_t)__builtin_neon_vreinterpretv2sidi (__a);
+}
+
+#endif
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vreinterpret_s32_s64 (int64x1_t __a)
+{
+  return (int32x2_t)__builtin_neon_vreinterpretv2sidi (__a);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vreinterpret_s32_u64 (uint64x1_t __a)
+{
+  return (int32x2_t)__builtin_neon_vreinterpretv2sidi ((int64x1_t) __a);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vreinterpret_s32_s8 (int8x8_t __a)
+{
+  return (int32x2_t)__builtin_neon_vreinterpretv2siv8qi (__a);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vreinterpret_s32_s16 (int16x4_t __a)
+{
+  return (int32x2_t)__builtin_neon_vreinterpretv2siv4hi (__a);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vreinterpret_s32_u8 (uint8x8_t __a)
+{
+  return (int32x2_t)__builtin_neon_vreinterpretv2siv8qi ((int8x8_t) __a);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vreinterpret_s32_u16 (uint16x4_t __a)
+{
+  return (int32x2_t)__builtin_neon_vreinterpretv2siv4hi ((int16x4_t) __a);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vreinterpret_s32_u32 (uint32x2_t __a)
+{
+  return (int32x2_t)__builtin_neon_vreinterpretv2siv2si ((int32x2_t) __a);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vreinterpret_u8_p8 (poly8x8_t __a)
+{
+  return (uint8x8_t)__builtin_neon_vreinterpretv8qiv8qi ((int8x8_t) __a);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vreinterpret_u8_p16 (poly16x4_t __a)
+{
+  return (uint8x8_t)__builtin_neon_vreinterpretv8qiv4hi ((int16x4_t) __a);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vreinterpret_u8_f32 (float32x2_t __a)
+{
+  return (uint8x8_t)__builtin_neon_vreinterpretv8qiv2sf (__a);
+}
+
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vreinterpret_u8_p64 (poly64x1_t __a)
+{
+  return (uint8x8_t)__builtin_neon_vreinterpretv8qidi (__a);
+}
+
+#endif
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vreinterpret_u8_s64 (int64x1_t __a)
+{
+  return (uint8x8_t)__builtin_neon_vreinterpretv8qidi (__a);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vreinterpret_u8_u64 (uint64x1_t __a)
+{
+  return (uint8x8_t)__builtin_neon_vreinterpretv8qidi ((int64x1_t) __a);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vreinterpret_u8_s8 (int8x8_t __a)
+{
+  return (uint8x8_t)__builtin_neon_vreinterpretv8qiv8qi (__a);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vreinterpret_u8_s16 (int16x4_t __a)
+{
+  return (uint8x8_t)__builtin_neon_vreinterpretv8qiv4hi (__a);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vreinterpret_u8_s32 (int32x2_t __a)
+{
+  return (uint8x8_t)__builtin_neon_vreinterpretv8qiv2si (__a);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vreinterpret_u8_u16 (uint16x4_t __a)
+{
+  return (uint8x8_t)__builtin_neon_vreinterpretv8qiv4hi ((int16x4_t) __a);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vreinterpret_u8_u32 (uint32x2_t __a)
+{
+  return (uint8x8_t)__builtin_neon_vreinterpretv8qiv2si ((int32x2_t) __a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vreinterpret_u16_p8 (poly8x8_t __a)
+{
+  return (uint16x4_t)__builtin_neon_vreinterpretv4hiv8qi ((int8x8_t) __a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vreinterpret_u16_p16 (poly16x4_t __a)
+{
+  return (uint16x4_t)__builtin_neon_vreinterpretv4hiv4hi ((int16x4_t) __a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vreinterpret_u16_f32 (float32x2_t __a)
+{
+  return (uint16x4_t)__builtin_neon_vreinterpretv4hiv2sf (__a);
+}
+
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vreinterpret_u16_p64 (poly64x1_t __a)
+{
+  return (uint16x4_t)__builtin_neon_vreinterpretv4hidi (__a);
+}
+
+#endif
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vreinterpret_u16_s64 (int64x1_t __a)
+{
+  return (uint16x4_t)__builtin_neon_vreinterpretv4hidi (__a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vreinterpret_u16_u64 (uint64x1_t __a)
+{
+  return (uint16x4_t)__builtin_neon_vreinterpretv4hidi ((int64x1_t) __a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vreinterpret_u16_s8 (int8x8_t __a)
+{
+  return (uint16x4_t)__builtin_neon_vreinterpretv4hiv8qi (__a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vreinterpret_u16_s16 (int16x4_t __a)
+{
+  return (uint16x4_t)__builtin_neon_vreinterpretv4hiv4hi (__a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vreinterpret_u16_s32 (int32x2_t __a)
+{
+  return (uint16x4_t)__builtin_neon_vreinterpretv4hiv2si (__a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vreinterpret_u16_u8 (uint8x8_t __a)
+{
+  return (uint16x4_t)__builtin_neon_vreinterpretv4hiv8qi ((int8x8_t) __a);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vreinterpret_u16_u32 (uint32x2_t __a)
+{
+  return (uint16x4_t)__builtin_neon_vreinterpretv4hiv2si ((int32x2_t) __a);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+vreinterpret_u32_p8 (poly8x8_t __a)
+{
+  return (uint32x2_t)__builtin_neon_vreinterpretv2siv8qi ((int8x8_t) __a);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+vreinterpret_u32_p16 (poly16x4_t __a)
+{
+  return (uint32x2_t)__builtin_neon_vreinterpretv2siv4hi ((int16x4_t) __a);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+vreinterpret_u32_f32 (float32x2_t __a)
+{
+  return (uint32x2_t)__builtin_neon_vreinterpretv2siv2sf (__a);
+}
+
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+vreinterpret_u32_p64 (poly64x1_t __a)
+{
+  return (uint32x2_t)__builtin_neon_vreinterpretv2sidi (__a);
+}
+
+#endif
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+vreinterpret_u32_s64 (int64x1_t __a)
+{
+  return (uint32x2_t)__builtin_neon_vreinterpretv2sidi (__a);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+vreinterpret_u32_u64 (uint64x1_t __a)
+{
+  return (uint32x2_t)__builtin_neon_vreinterpretv2sidi ((int64x1_t) __a);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+vreinterpret_u32_s8 (int8x8_t __a)
+{
+  return (uint32x2_t)__builtin_neon_vreinterpretv2siv8qi (__a);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+vreinterpret_u32_s16 (int16x4_t __a)
+{
+  return (uint32x2_t)__builtin_neon_vreinterpretv2siv4hi (__a);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+vreinterpret_u32_s32 (int32x2_t __a)
+{
+  return (uint32x2_t)__builtin_neon_vreinterpretv2siv2si (__a);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+vreinterpret_u32_u8 (uint8x8_t __a)
+{
+  return (uint32x2_t)__builtin_neon_vreinterpretv2siv8qi ((int8x8_t) __a);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+vreinterpret_u32_u16 (uint16x4_t __a)
+{
+  return (uint32x2_t)__builtin_neon_vreinterpretv2siv4hi ((int16x4_t) __a);
+}
+
+__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+vreinterpretq_p8_p16 (poly16x8_t __a)
+{
+  return (poly8x16_t)__builtin_neon_vreinterpretv16qiv8hi ((int16x8_t) __a);
+}
+
+__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+vreinterpretq_p8_f32 (float32x4_t __a)
+{
+  return (poly8x16_t)__builtin_neon_vreinterpretv16qiv4sf (__a);
+}
+
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+vreinterpretq_p8_p64 (poly64x2_t __a)
+{
+  return (poly8x16_t)__builtin_neon_vreinterpretv16qiv2di ((int64x2_t) __a);
+}
+
+#endif
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+vreinterpretq_p8_p128 (poly128_t __a)
+{
+  return (poly8x16_t)__builtin_neon_vreinterpretv16qiti ((__builtin_neon_ti) __a);
+}
+
+#endif
+__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+vreinterpretq_p8_s64 (int64x2_t __a)
+{
+  return (poly8x16_t)__builtin_neon_vreinterpretv16qiv2di (__a);
+}
+
+__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+vreinterpretq_p8_u64 (uint64x2_t __a)
+{
+  return (poly8x16_t)__builtin_neon_vreinterpretv16qiv2di ((int64x2_t) __a);
+}
+
+__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+vreinterpretq_p8_s8 (int8x16_t __a)
+{
+  return (poly8x16_t)__builtin_neon_vreinterpretv16qiv16qi (__a);
+}
+
+__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+vreinterpretq_p8_s16 (int16x8_t __a)
+{
+  return (poly8x16_t)__builtin_neon_vreinterpretv16qiv8hi (__a);
+}
+
+__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+vreinterpretq_p8_s32 (int32x4_t __a)
+{
+  return (poly8x16_t)__builtin_neon_vreinterpretv16qiv4si (__a);
+}
+
+__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+vreinterpretq_p8_u8 (uint8x16_t __a)
+{
+  return (poly8x16_t)__builtin_neon_vreinterpretv16qiv16qi ((int8x16_t) __a);
+}
+
+__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+vreinterpretq_p8_u16 (uint16x8_t __a)
+{
+  return (poly8x16_t)__builtin_neon_vreinterpretv16qiv8hi ((int16x8_t) __a);
+}
+
+__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+vreinterpretq_p8_u32 (uint32x4_t __a)
+{
+  return (poly8x16_t)__builtin_neon_vreinterpretv16qiv4si ((int32x4_t) __a);
+}
+
+__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_p16_p8 (poly8x16_t __a)
+{
+  return (poly16x8_t)__builtin_neon_vreinterpretv8hiv16qi ((int8x16_t) __a);
+}
+
+__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_p16_f32 (float32x4_t __a)
+{
+  return (poly16x8_t)__builtin_neon_vreinterpretv8hiv4sf (__a);
+}
+
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_p16_p64 (poly64x2_t __a)
+{
+  return (poly16x8_t)__builtin_neon_vreinterpretv8hiv2di ((int64x2_t) __a);
+}
+
+#endif
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_p16_p128 (poly128_t __a)
+{
+  return (poly16x8_t)__builtin_neon_vreinterpretv8hiti ((__builtin_neon_ti) __a);
+}
+
+#endif
+__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_p16_s64 (int64x2_t __a)
+{
+  return (poly16x8_t)__builtin_neon_vreinterpretv8hiv2di (__a);
+}
+
+__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_p16_u64 (uint64x2_t __a)
+{
+  return (poly16x8_t)__builtin_neon_vreinterpretv8hiv2di ((int64x2_t) __a);
+}
+
+__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_p16_s8 (int8x16_t __a)
+{
+  return (poly16x8_t)__builtin_neon_vreinterpretv8hiv16qi (__a);
+}
+
+__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_p16_s16 (int16x8_t __a)
+{
+  return (poly16x8_t)__builtin_neon_vreinterpretv8hiv8hi (__a);
+}
+
+__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_p16_s32 (int32x4_t __a)
 {
-  return (int64x2_t)__builtin_neon_vreinterpretv2div4si ((int32x4_t) __a);
+  return (poly16x8_t)__builtin_neon_vreinterpretv8hiv4si (__a);
 }
 
-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
-vreinterpretq_s64_u64 (uint64x2_t __a)
+__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_p16_u8 (uint8x16_t __a)
 {
-  return (int64x2_t)__builtin_neon_vreinterpretv2div2di ((int64x2_t) __a);
+  return (poly16x8_t)__builtin_neon_vreinterpretv8hiv16qi ((int8x16_t) __a);
 }
 
-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
-vreinterpretq_s64_p8 (poly8x16_t __a)
+__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_p16_u16 (uint16x8_t __a)
 {
-  return (int64x2_t)__builtin_neon_vreinterpretv2div16qi ((int8x16_t) __a);
+  return (poly16x8_t)__builtin_neon_vreinterpretv8hiv8hi ((int16x8_t) __a);
 }
 
-__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
-vreinterpretq_s64_p16 (poly16x8_t __a)
+__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_p16_u32 (uint32x4_t __a)
 {
-  return (int64x2_t)__builtin_neon_vreinterpretv2div8hi ((int16x8_t) __a);
+  return (poly16x8_t)__builtin_neon_vreinterpretv8hiv4si ((int32x4_t) __a);
 }
 
-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
-vreinterpret_u64_s8 (int8x8_t __a)
+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+vreinterpretq_f32_p8 (poly8x16_t __a)
 {
-  return (uint64x1_t)__builtin_neon_vreinterpretdiv8qi (__a);
+  return (float32x4_t)__builtin_neon_vreinterpretv4sfv16qi ((int8x16_t) __a);
 }
 
-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
-vreinterpret_u64_s16 (int16x4_t __a)
+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+vreinterpretq_f32_p16 (poly16x8_t __a)
 {
-  return (uint64x1_t)__builtin_neon_vreinterpretdiv4hi (__a);
+  return (float32x4_t)__builtin_neon_vreinterpretv4sfv8hi ((int16x8_t) __a);
 }
 
-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
-vreinterpret_u64_s32 (int32x2_t __a)
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+vreinterpretq_f32_p64 (poly64x2_t __a)
 {
-  return (uint64x1_t)__builtin_neon_vreinterpretdiv2si (__a);
+  return (float32x4_t)__builtin_neon_vreinterpretv4sfv2di ((int64x2_t) __a);
 }
 
-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
-vreinterpret_u64_s64 (int64x1_t __a)
+#endif
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+vreinterpretq_f32_p128 (poly128_t __a)
 {
-  return (uint64x1_t)__builtin_neon_vreinterpretdidi (__a);
+  return (float32x4_t)__builtin_neon_vreinterpretv4sfti ((__builtin_neon_ti) __a);
 }
 
-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
-vreinterpret_u64_f32 (float32x2_t __a)
+#endif
+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+vreinterpretq_f32_s64 (int64x2_t __a)
 {
-  return (uint64x1_t)__builtin_neon_vreinterpretdiv2sf (__a);
+  return (float32x4_t)__builtin_neon_vreinterpretv4sfv2di (__a);
 }
 
-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
-vreinterpret_u64_u8 (uint8x8_t __a)
+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+vreinterpretq_f32_u64 (uint64x2_t __a)
 {
-  return (uint64x1_t)__builtin_neon_vreinterpretdiv8qi ((int8x8_t) __a);
+  return (float32x4_t)__builtin_neon_vreinterpretv4sfv2di ((int64x2_t) __a);
 }
 
-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
-vreinterpret_u64_u16 (uint16x4_t __a)
+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+vreinterpretq_f32_s8 (int8x16_t __a)
 {
-  return (uint64x1_t)__builtin_neon_vreinterpretdiv4hi ((int16x4_t) __a);
+  return (float32x4_t)__builtin_neon_vreinterpretv4sfv16qi (__a);
 }
 
-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
-vreinterpret_u64_u32 (uint32x2_t __a)
+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+vreinterpretq_f32_s16 (int16x8_t __a)
 {
-  return (uint64x1_t)__builtin_neon_vreinterpretdiv2si ((int32x2_t) __a);
+  return (float32x4_t)__builtin_neon_vreinterpretv4sfv8hi (__a);
 }
 
-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
-vreinterpret_u64_p8 (poly8x8_t __a)
+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+vreinterpretq_f32_s32 (int32x4_t __a)
 {
-  return (uint64x1_t)__builtin_neon_vreinterpretdiv8qi ((int8x8_t) __a);
+  return (float32x4_t)__builtin_neon_vreinterpretv4sfv4si (__a);
 }
 
-__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
-vreinterpret_u64_p16 (poly16x4_t __a)
+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+vreinterpretq_f32_u8 (uint8x16_t __a)
 {
-  return (uint64x1_t)__builtin_neon_vreinterpretdiv4hi ((int16x4_t) __a);
+  return (float32x4_t)__builtin_neon_vreinterpretv4sfv16qi ((int8x16_t) __a);
 }
 
-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
-vreinterpretq_u64_s8 (int8x16_t __a)
+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+vreinterpretq_f32_u16 (uint16x8_t __a)
 {
-  return (uint64x2_t)__builtin_neon_vreinterpretv2div16qi (__a);
+  return (float32x4_t)__builtin_neon_vreinterpretv4sfv8hi ((int16x8_t) __a);
 }
 
-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
-vreinterpretq_u64_s16 (int16x8_t __a)
+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+vreinterpretq_f32_u32 (uint32x4_t __a)
 {
-  return (uint64x2_t)__builtin_neon_vreinterpretv2div8hi (__a);
+  return (float32x4_t)__builtin_neon_vreinterpretv4sfv4si ((int32x4_t) __a);
 }
 
-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
-vreinterpretq_u64_s32 (int32x4_t __a)
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x2_t __attribute__ ((__always_inline__))
+vreinterpretq_p64_p8 (poly8x16_t __a)
 {
-  return (uint64x2_t)__builtin_neon_vreinterpretv2div4si (__a);
+  return (poly64x2_t)__builtin_neon_vreinterpretv2div16qi ((int8x16_t) __a);
 }
 
-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
-vreinterpretq_u64_s64 (int64x2_t __a)
+#endif
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x2_t __attribute__ ((__always_inline__))
+vreinterpretq_p64_p16 (poly16x8_t __a)
 {
-  return (uint64x2_t)__builtin_neon_vreinterpretv2div2di (__a);
+  return (poly64x2_t)__builtin_neon_vreinterpretv2div8hi ((int16x8_t) __a);
 }
 
-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
-vreinterpretq_u64_f32 (float32x4_t __a)
+#endif
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x2_t __attribute__ ((__always_inline__))
+vreinterpretq_p64_f32 (float32x4_t __a)
 {
-  return (uint64x2_t)__builtin_neon_vreinterpretv2div4sf (__a);
+  return (poly64x2_t)__builtin_neon_vreinterpretv2div4sf (__a);
 }
 
-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
-vreinterpretq_u64_u8 (uint8x16_t __a)
+#endif
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x2_t __attribute__ ((__always_inline__))
+vreinterpretq_p64_p128 (poly128_t __a)
 {
-  return (uint64x2_t)__builtin_neon_vreinterpretv2div16qi ((int8x16_t) __a);
+  return (poly64x2_t)__builtin_neon_vreinterpretv2diti ((__builtin_neon_ti) __a);
 }
 
-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
-vreinterpretq_u64_u16 (uint16x8_t __a)
+#endif
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x2_t __attribute__ ((__always_inline__))
+vreinterpretq_p64_s64 (int64x2_t __a)
 {
-  return (uint64x2_t)__builtin_neon_vreinterpretv2div8hi ((int16x8_t) __a);
+  return (poly64x2_t)__builtin_neon_vreinterpretv2div2di (__a);
 }
 
-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
-vreinterpretq_u64_u32 (uint32x4_t __a)
+#endif
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x2_t __attribute__ ((__always_inline__))
+vreinterpretq_p64_u64 (uint64x2_t __a)
 {
-  return (uint64x2_t)__builtin_neon_vreinterpretv2div4si ((int32x4_t) __a);
+  return (poly64x2_t)__builtin_neon_vreinterpretv2div2di ((int64x2_t) __a);
 }
 
-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
-vreinterpretq_u64_p8 (poly8x16_t __a)
+#endif
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x2_t __attribute__ ((__always_inline__))
+vreinterpretq_p64_s8 (int8x16_t __a)
 {
-  return (uint64x2_t)__builtin_neon_vreinterpretv2div16qi ((int8x16_t) __a);
+  return (poly64x2_t)__builtin_neon_vreinterpretv2div16qi (__a);
 }
 
-__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
-vreinterpretq_u64_p16 (poly16x8_t __a)
+#endif
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x2_t __attribute__ ((__always_inline__))
+vreinterpretq_p64_s16 (int16x8_t __a)
 {
-  return (uint64x2_t)__builtin_neon_vreinterpretv2div8hi ((int16x8_t) __a);
+  return (poly64x2_t)__builtin_neon_vreinterpretv2div8hi (__a);
 }
 
-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
-vreinterpret_s8_s16 (int16x4_t __a)
+#endif
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x2_t __attribute__ ((__always_inline__))
+vreinterpretq_p64_s32 (int32x4_t __a)
 {
-  return (int8x8_t)__builtin_neon_vreinterpretv8qiv4hi (__a);
+  return (poly64x2_t)__builtin_neon_vreinterpretv2div4si (__a);
 }
 
-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
-vreinterpret_s8_s32 (int32x2_t __a)
+#endif
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x2_t __attribute__ ((__always_inline__))
+vreinterpretq_p64_u8 (uint8x16_t __a)
 {
-  return (int8x8_t)__builtin_neon_vreinterpretv8qiv2si (__a);
+  return (poly64x2_t)__builtin_neon_vreinterpretv2div16qi ((int8x16_t) __a);
 }
 
-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
-vreinterpret_s8_s64 (int64x1_t __a)
+#endif
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x2_t __attribute__ ((__always_inline__))
+vreinterpretq_p64_u16 (uint16x8_t __a)
 {
-  return (int8x8_t)__builtin_neon_vreinterpretv8qidi (__a);
+  return (poly64x2_t)__builtin_neon_vreinterpretv2div8hi ((int16x8_t) __a);
 }
 
-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
-vreinterpret_s8_f32 (float32x2_t __a)
+#endif
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly64x2_t __attribute__ ((__always_inline__))
+vreinterpretq_p64_u32 (uint32x4_t __a)
 {
-  return (int8x8_t)__builtin_neon_vreinterpretv8qiv2sf (__a);
+  return (poly64x2_t)__builtin_neon_vreinterpretv2div4si ((int32x4_t) __a);
 }
 
-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
-vreinterpret_s8_u8 (uint8x8_t __a)
+#endif
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly128_t __attribute__ ((__always_inline__))
+vreinterpretq_p128_p8 (poly8x16_t __a)
 {
-  return (int8x8_t)__builtin_neon_vreinterpretv8qiv8qi ((int8x8_t) __a);
+  return (poly128_t)__builtin_neon_vreinterprettiv16qi ((int8x16_t) __a);
 }
 
-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
-vreinterpret_s8_u16 (uint16x4_t __a)
+#endif
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly128_t __attribute__ ((__always_inline__))
+vreinterpretq_p128_p16 (poly16x8_t __a)
 {
-  return (int8x8_t)__builtin_neon_vreinterpretv8qiv4hi ((int16x4_t) __a);
+  return (poly128_t)__builtin_neon_vreinterprettiv8hi ((int16x8_t) __a);
 }
 
-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
-vreinterpret_s8_u32 (uint32x2_t __a)
+#endif
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly128_t __attribute__ ((__always_inline__))
+vreinterpretq_p128_f32 (float32x4_t __a)
 {
-  return (int8x8_t)__builtin_neon_vreinterpretv8qiv2si ((int32x2_t) __a);
+  return (poly128_t)__builtin_neon_vreinterprettiv4sf (__a);
 }
 
-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
-vreinterpret_s8_u64 (uint64x1_t __a)
+#endif
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly128_t __attribute__ ((__always_inline__))
+vreinterpretq_p128_p64 (poly64x2_t __a)
 {
-  return (int8x8_t)__builtin_neon_vreinterpretv8qidi ((int64x1_t) __a);
+  return (poly128_t)__builtin_neon_vreinterprettiv2di ((int64x2_t) __a);
 }
 
-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
-vreinterpret_s8_p8 (poly8x8_t __a)
+#endif
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly128_t __attribute__ ((__always_inline__))
+vreinterpretq_p128_s64 (int64x2_t __a)
 {
-  return (int8x8_t)__builtin_neon_vreinterpretv8qiv8qi ((int8x8_t) __a);
+  return (poly128_t)__builtin_neon_vreinterprettiv2di (__a);
 }
 
-__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
-vreinterpret_s8_p16 (poly16x4_t __a)
+#endif
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly128_t __attribute__ ((__always_inline__))
+vreinterpretq_p128_u64 (uint64x2_t __a)
 {
-  return (int8x8_t)__builtin_neon_vreinterpretv8qiv4hi ((int16x4_t) __a);
+  return (poly128_t)__builtin_neon_vreinterprettiv2di ((int64x2_t) __a);
 }
 
-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
-vreinterpretq_s8_s16 (int16x8_t __a)
+#endif
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly128_t __attribute__ ((__always_inline__))
+vreinterpretq_p128_s8 (int8x16_t __a)
 {
-  return (int8x16_t)__builtin_neon_vreinterpretv16qiv8hi (__a);
+  return (poly128_t)__builtin_neon_vreinterprettiv16qi (__a);
 }
 
-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
-vreinterpretq_s8_s32 (int32x4_t __a)
+#endif
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly128_t __attribute__ ((__always_inline__))
+vreinterpretq_p128_s16 (int16x8_t __a)
 {
-  return (int8x16_t)__builtin_neon_vreinterpretv16qiv4si (__a);
+  return (poly128_t)__builtin_neon_vreinterprettiv8hi (__a);
 }
 
-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
-vreinterpretq_s8_s64 (int64x2_t __a)
+#endif
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly128_t __attribute__ ((__always_inline__))
+vreinterpretq_p128_s32 (int32x4_t __a)
 {
-  return (int8x16_t)__builtin_neon_vreinterpretv16qiv2di (__a);
+  return (poly128_t)__builtin_neon_vreinterprettiv4si (__a);
 }
 
-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
-vreinterpretq_s8_f32 (float32x4_t __a)
+#endif
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly128_t __attribute__ ((__always_inline__))
+vreinterpretq_p128_u8 (uint8x16_t __a)
 {
-  return (int8x16_t)__builtin_neon_vreinterpretv16qiv4sf (__a);
+  return (poly128_t)__builtin_neon_vreinterprettiv16qi ((int8x16_t) __a);
 }
 
-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
-vreinterpretq_s8_u8 (uint8x16_t __a)
+#endif
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly128_t __attribute__ ((__always_inline__))
+vreinterpretq_p128_u16 (uint16x8_t __a)
 {
-  return (int8x16_t)__builtin_neon_vreinterpretv16qiv16qi ((int8x16_t) __a);
+  return (poly128_t)__builtin_neon_vreinterprettiv8hi ((int16x8_t) __a);
 }
 
-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
-vreinterpretq_s8_u16 (uint16x8_t __a)
+#endif
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline poly128_t __attribute__ ((__always_inline__))
+vreinterpretq_p128_u32 (uint32x4_t __a)
 {
-  return (int8x16_t)__builtin_neon_vreinterpretv16qiv8hi ((int16x8_t) __a);
+  return (poly128_t)__builtin_neon_vreinterprettiv4si ((int32x4_t) __a);
 }
 
-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
-vreinterpretq_s8_u32 (uint32x4_t __a)
+#endif
+__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+vreinterpretq_s64_p8 (poly8x16_t __a)
 {
-  return (int8x16_t)__builtin_neon_vreinterpretv16qiv4si ((int32x4_t) __a);
+  return (int64x2_t)__builtin_neon_vreinterpretv2div16qi ((int8x16_t) __a);
 }
 
-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
-vreinterpretq_s8_u64 (uint64x2_t __a)
+__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+vreinterpretq_s64_p16 (poly16x8_t __a)
 {
-  return (int8x16_t)__builtin_neon_vreinterpretv16qiv2di ((int64x2_t) __a);
+  return (int64x2_t)__builtin_neon_vreinterpretv2div8hi ((int16x8_t) __a);
 }
 
-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
-vreinterpretq_s8_p8 (poly8x16_t __a)
+__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+vreinterpretq_s64_f32 (float32x4_t __a)
 {
-  return (int8x16_t)__builtin_neon_vreinterpretv16qiv16qi ((int8x16_t) __a);
+  return (int64x2_t)__builtin_neon_vreinterpretv2div4sf (__a);
 }
 
-__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
-vreinterpretq_s8_p16 (poly16x8_t __a)
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+vreinterpretq_s64_p64 (poly64x2_t __a)
 {
-  return (int8x16_t)__builtin_neon_vreinterpretv16qiv8hi ((int16x8_t) __a);
+  return (int64x2_t)__builtin_neon_vreinterpretv2div2di ((int64x2_t) __a);
 }
 
-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
-vreinterpret_s16_s8 (int8x8_t __a)
+#endif
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+vreinterpretq_s64_p128 (poly128_t __a)
 {
-  return (int16x4_t)__builtin_neon_vreinterpretv4hiv8qi (__a);
+  return (int64x2_t)__builtin_neon_vreinterpretv2diti ((__builtin_neon_ti) __a);
 }
 
-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
-vreinterpret_s16_s32 (int32x2_t __a)
+#endif
+__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+vreinterpretq_s64_u64 (uint64x2_t __a)
+{
+  return (int64x2_t)__builtin_neon_vreinterpretv2div2di ((int64x2_t) __a);
+}
+
+__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+vreinterpretq_s64_s8 (int8x16_t __a)
+{
+  return (int64x2_t)__builtin_neon_vreinterpretv2div16qi (__a);
+}
+
+__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+vreinterpretq_s64_s16 (int16x8_t __a)
+{
+  return (int64x2_t)__builtin_neon_vreinterpretv2div8hi (__a);
+}
+
+__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+vreinterpretq_s64_s32 (int32x4_t __a)
+{
+  return (int64x2_t)__builtin_neon_vreinterpretv2div4si (__a);
+}
+
+__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+vreinterpretq_s64_u8 (uint8x16_t __a)
+{
+  return (int64x2_t)__builtin_neon_vreinterpretv2div16qi ((int8x16_t) __a);
+}
+
+__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+vreinterpretq_s64_u16 (uint16x8_t __a)
+{
+  return (int64x2_t)__builtin_neon_vreinterpretv2div8hi ((int16x8_t) __a);
+}
+
+__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+vreinterpretq_s64_u32 (uint32x4_t __a)
 {
-  return (int16x4_t)__builtin_neon_vreinterpretv4hiv2si (__a);
+  return (int64x2_t)__builtin_neon_vreinterpretv2div4si ((int32x4_t) __a);
 }
 
-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
-vreinterpret_s16_s64 (int64x1_t __a)
+__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+vreinterpretq_u64_p8 (poly8x16_t __a)
 {
-  return (int16x4_t)__builtin_neon_vreinterpretv4hidi (__a);
+  return (uint64x2_t)__builtin_neon_vreinterpretv2div16qi ((int8x16_t) __a);
 }
 
-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
-vreinterpret_s16_f32 (float32x2_t __a)
+__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+vreinterpretq_u64_p16 (poly16x8_t __a)
 {
-  return (int16x4_t)__builtin_neon_vreinterpretv4hiv2sf (__a);
+  return (uint64x2_t)__builtin_neon_vreinterpretv2div8hi ((int16x8_t) __a);
 }
 
-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
-vreinterpret_s16_u8 (uint8x8_t __a)
+__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+vreinterpretq_u64_f32 (float32x4_t __a)
 {
-  return (int16x4_t)__builtin_neon_vreinterpretv4hiv8qi ((int8x8_t) __a);
+  return (uint64x2_t)__builtin_neon_vreinterpretv2div4sf (__a);
 }
 
-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
-vreinterpret_s16_u16 (uint16x4_t __a)
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+vreinterpretq_u64_p64 (poly64x2_t __a)
 {
-  return (int16x4_t)__builtin_neon_vreinterpretv4hiv4hi ((int16x4_t) __a);
+  return (uint64x2_t)__builtin_neon_vreinterpretv2div2di ((int64x2_t) __a);
 }
 
-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
-vreinterpret_s16_u32 (uint32x2_t __a)
+#endif
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+vreinterpretq_u64_p128 (poly128_t __a)
 {
-  return (int16x4_t)__builtin_neon_vreinterpretv4hiv2si ((int32x2_t) __a);
+  return (uint64x2_t)__builtin_neon_vreinterpretv2diti ((__builtin_neon_ti) __a);
 }
 
-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
-vreinterpret_s16_u64 (uint64x1_t __a)
+#endif
+__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+vreinterpretq_u64_s64 (int64x2_t __a)
 {
-  return (int16x4_t)__builtin_neon_vreinterpretv4hidi ((int64x1_t) __a);
+  return (uint64x2_t)__builtin_neon_vreinterpretv2div2di (__a);
 }
 
-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
-vreinterpret_s16_p8 (poly8x8_t __a)
+__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+vreinterpretq_u64_s8 (int8x16_t __a)
 {
-  return (int16x4_t)__builtin_neon_vreinterpretv4hiv8qi ((int8x8_t) __a);
+  return (uint64x2_t)__builtin_neon_vreinterpretv2div16qi (__a);
 }
 
-__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
-vreinterpret_s16_p16 (poly16x4_t __a)
+__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+vreinterpretq_u64_s16 (int16x8_t __a)
 {
-  return (int16x4_t)__builtin_neon_vreinterpretv4hiv4hi ((int16x4_t) __a);
+  return (uint64x2_t)__builtin_neon_vreinterpretv2div8hi (__a);
 }
 
-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
-vreinterpretq_s16_s8 (int8x16_t __a)
+__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+vreinterpretq_u64_s32 (int32x4_t __a)
 {
-  return (int16x8_t)__builtin_neon_vreinterpretv8hiv16qi (__a);
+  return (uint64x2_t)__builtin_neon_vreinterpretv2div4si (__a);
 }
 
-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
-vreinterpretq_s16_s32 (int32x4_t __a)
+__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+vreinterpretq_u64_u8 (uint8x16_t __a)
 {
-  return (int16x8_t)__builtin_neon_vreinterpretv8hiv4si (__a);
+  return (uint64x2_t)__builtin_neon_vreinterpretv2div16qi ((int8x16_t) __a);
 }
 
-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
-vreinterpretq_s16_s64 (int64x2_t __a)
+__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+vreinterpretq_u64_u16 (uint16x8_t __a)
 {
-  return (int16x8_t)__builtin_neon_vreinterpretv8hiv2di (__a);
+  return (uint64x2_t)__builtin_neon_vreinterpretv2div8hi ((int16x8_t) __a);
 }
 
-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
-vreinterpretq_s16_f32 (float32x4_t __a)
+__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+vreinterpretq_u64_u32 (uint32x4_t __a)
 {
-  return (int16x8_t)__builtin_neon_vreinterpretv8hiv4sf (__a);
+  return (uint64x2_t)__builtin_neon_vreinterpretv2div4si ((int32x4_t) __a);
 }
 
-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
-vreinterpretq_s16_u8 (uint8x16_t __a)
+__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+vreinterpretq_s8_p8 (poly8x16_t __a)
 {
-  return (int16x8_t)__builtin_neon_vreinterpretv8hiv16qi ((int8x16_t) __a);
+  return (int8x16_t)__builtin_neon_vreinterpretv16qiv16qi ((int8x16_t) __a);
 }
 
-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
-vreinterpretq_s16_u16 (uint16x8_t __a)
+__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+vreinterpretq_s8_p16 (poly16x8_t __a)
 {
-  return (int16x8_t)__builtin_neon_vreinterpretv8hiv8hi ((int16x8_t) __a);
+  return (int8x16_t)__builtin_neon_vreinterpretv16qiv8hi ((int16x8_t) __a);
 }
 
-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
-vreinterpretq_s16_u32 (uint32x4_t __a)
+__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+vreinterpretq_s8_f32 (float32x4_t __a)
 {
-  return (int16x8_t)__builtin_neon_vreinterpretv8hiv4si ((int32x4_t) __a);
+  return (int8x16_t)__builtin_neon_vreinterpretv16qiv4sf (__a);
 }
 
-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
-vreinterpretq_s16_u64 (uint64x2_t __a)
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+vreinterpretq_s8_p64 (poly64x2_t __a)
 {
-  return (int16x8_t)__builtin_neon_vreinterpretv8hiv2di ((int64x2_t) __a);
+  return (int8x16_t)__builtin_neon_vreinterpretv16qiv2di ((int64x2_t) __a);
 }
 
-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
-vreinterpretq_s16_p8 (poly8x16_t __a)
+#endif
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+vreinterpretq_s8_p128 (poly128_t __a)
 {
-  return (int16x8_t)__builtin_neon_vreinterpretv8hiv16qi ((int8x16_t) __a);
+  return (int8x16_t)__builtin_neon_vreinterpretv16qiti ((__builtin_neon_ti) __a);
 }
 
-__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
-vreinterpretq_s16_p16 (poly16x8_t __a)
+#endif
+__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+vreinterpretq_s8_s64 (int64x2_t __a)
 {
-  return (int16x8_t)__builtin_neon_vreinterpretv8hiv8hi ((int16x8_t) __a);
+  return (int8x16_t)__builtin_neon_vreinterpretv16qiv2di (__a);
 }
 
-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
-vreinterpret_s32_s8 (int8x8_t __a)
+__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+vreinterpretq_s8_u64 (uint64x2_t __a)
 {
-  return (int32x2_t)__builtin_neon_vreinterpretv2siv8qi (__a);
+  return (int8x16_t)__builtin_neon_vreinterpretv16qiv2di ((int64x2_t) __a);
 }
 
-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
-vreinterpret_s32_s16 (int16x4_t __a)
+__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+vreinterpretq_s8_s16 (int16x8_t __a)
 {
-  return (int32x2_t)__builtin_neon_vreinterpretv2siv4hi (__a);
+  return (int8x16_t)__builtin_neon_vreinterpretv16qiv8hi (__a);
 }
 
-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
-vreinterpret_s32_s64 (int64x1_t __a)
+__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+vreinterpretq_s8_s32 (int32x4_t __a)
 {
-  return (int32x2_t)__builtin_neon_vreinterpretv2sidi (__a);
+  return (int8x16_t)__builtin_neon_vreinterpretv16qiv4si (__a);
 }
 
-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
-vreinterpret_s32_f32 (float32x2_t __a)
+__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+vreinterpretq_s8_u8 (uint8x16_t __a)
 {
-  return (int32x2_t)__builtin_neon_vreinterpretv2siv2sf (__a);
+  return (int8x16_t)__builtin_neon_vreinterpretv16qiv16qi ((int8x16_t) __a);
 }
 
-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
-vreinterpret_s32_u8 (uint8x8_t __a)
+__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+vreinterpretq_s8_u16 (uint16x8_t __a)
 {
-  return (int32x2_t)__builtin_neon_vreinterpretv2siv8qi ((int8x8_t) __a);
+  return (int8x16_t)__builtin_neon_vreinterpretv16qiv8hi ((int16x8_t) __a);
 }
 
-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
-vreinterpret_s32_u16 (uint16x4_t __a)
+__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+vreinterpretq_s8_u32 (uint32x4_t __a)
 {
-  return (int32x2_t)__builtin_neon_vreinterpretv2siv4hi ((int16x4_t) __a);
+  return (int8x16_t)__builtin_neon_vreinterpretv16qiv4si ((int32x4_t) __a);
 }
 
-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
-vreinterpret_s32_u32 (uint32x2_t __a)
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_s16_p8 (poly8x16_t __a)
 {
-  return (int32x2_t)__builtin_neon_vreinterpretv2siv2si ((int32x2_t) __a);
+  return (int16x8_t)__builtin_neon_vreinterpretv8hiv16qi ((int8x16_t) __a);
 }
 
-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
-vreinterpret_s32_u64 (uint64x1_t __a)
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_s16_p16 (poly16x8_t __a)
 {
-  return (int32x2_t)__builtin_neon_vreinterpretv2sidi ((int64x1_t) __a);
+  return (int16x8_t)__builtin_neon_vreinterpretv8hiv8hi ((int16x8_t) __a);
 }
 
-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
-vreinterpret_s32_p8 (poly8x8_t __a)
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_s16_f32 (float32x4_t __a)
 {
-  return (int32x2_t)__builtin_neon_vreinterpretv2siv8qi ((int8x8_t) __a);
+  return (int16x8_t)__builtin_neon_vreinterpretv8hiv4sf (__a);
 }
 
-__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
-vreinterpret_s32_p16 (poly16x4_t __a)
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_s16_p64 (poly64x2_t __a)
 {
-  return (int32x2_t)__builtin_neon_vreinterpretv2siv4hi ((int16x4_t) __a);
+  return (int16x8_t)__builtin_neon_vreinterpretv8hiv2di ((int64x2_t) __a);
 }
 
-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
-vreinterpretq_s32_s8 (int8x16_t __a)
+#endif
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_s16_p128 (poly128_t __a)
 {
-  return (int32x4_t)__builtin_neon_vreinterpretv4siv16qi (__a);
+  return (int16x8_t)__builtin_neon_vreinterpretv8hiti ((__builtin_neon_ti) __a);
 }
 
-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
-vreinterpretq_s32_s16 (int16x8_t __a)
+#endif
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_s16_s64 (int64x2_t __a)
 {
-  return (int32x4_t)__builtin_neon_vreinterpretv4siv8hi (__a);
+  return (int16x8_t)__builtin_neon_vreinterpretv8hiv2di (__a);
 }
 
-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
-vreinterpretq_s32_s64 (int64x2_t __a)
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_s16_u64 (uint64x2_t __a)
 {
-  return (int32x4_t)__builtin_neon_vreinterpretv4siv2di (__a);
+  return (int16x8_t)__builtin_neon_vreinterpretv8hiv2di ((int64x2_t) __a);
 }
 
-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
-vreinterpretq_s32_f32 (float32x4_t __a)
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_s16_s8 (int8x16_t __a)
 {
-  return (int32x4_t)__builtin_neon_vreinterpretv4siv4sf (__a);
+  return (int16x8_t)__builtin_neon_vreinterpretv8hiv16qi (__a);
 }
 
-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
-vreinterpretq_s32_u8 (uint8x16_t __a)
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_s16_s32 (int32x4_t __a)
 {
-  return (int32x4_t)__builtin_neon_vreinterpretv4siv16qi ((int8x16_t) __a);
+  return (int16x8_t)__builtin_neon_vreinterpretv8hiv4si (__a);
 }
 
-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
-vreinterpretq_s32_u16 (uint16x8_t __a)
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_s16_u8 (uint8x16_t __a)
 {
-  return (int32x4_t)__builtin_neon_vreinterpretv4siv8hi ((int16x8_t) __a);
+  return (int16x8_t)__builtin_neon_vreinterpretv8hiv16qi ((int8x16_t) __a);
 }
 
-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
-vreinterpretq_s32_u32 (uint32x4_t __a)
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_s16_u16 (uint16x8_t __a)
 {
-  return (int32x4_t)__builtin_neon_vreinterpretv4siv4si ((int32x4_t) __a);
+  return (int16x8_t)__builtin_neon_vreinterpretv8hiv8hi ((int16x8_t) __a);
 }
 
-__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
-vreinterpretq_s32_u64 (uint64x2_t __a)
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_s16_u32 (uint32x4_t __a)
 {
-  return (int32x4_t)__builtin_neon_vreinterpretv4siv2di ((int64x2_t) __a);
+  return (int16x8_t)__builtin_neon_vreinterpretv8hiv4si ((int32x4_t) __a);
 }
 
 __extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
@@ -11992,88 +12957,80 @@  vreinterpretq_s32_p16 (poly16x8_t __a)
   return (int32x4_t)__builtin_neon_vreinterpretv4siv8hi ((int16x8_t) __a);
 }
 
-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
-vreinterpret_u8_s8 (int8x8_t __a)
-{
-  return (uint8x8_t)__builtin_neon_vreinterpretv8qiv8qi (__a);
-}
-
-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
-vreinterpret_u8_s16 (int16x4_t __a)
-{
-  return (uint8x8_t)__builtin_neon_vreinterpretv8qiv4hi (__a);
-}
-
-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
-vreinterpret_u8_s32 (int32x2_t __a)
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vreinterpretq_s32_f32 (float32x4_t __a)
 {
-  return (uint8x8_t)__builtin_neon_vreinterpretv8qiv2si (__a);
+  return (int32x4_t)__builtin_neon_vreinterpretv4siv4sf (__a);
 }
 
-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
-vreinterpret_u8_s64 (int64x1_t __a)
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vreinterpretq_s32_p64 (poly64x2_t __a)
 {
-  return (uint8x8_t)__builtin_neon_vreinterpretv8qidi (__a);
+  return (int32x4_t)__builtin_neon_vreinterpretv4siv2di ((int64x2_t) __a);
 }
 
-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
-vreinterpret_u8_f32 (float32x2_t __a)
+#endif
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vreinterpretq_s32_p128 (poly128_t __a)
 {
-  return (uint8x8_t)__builtin_neon_vreinterpretv8qiv2sf (__a);
+  return (int32x4_t)__builtin_neon_vreinterpretv4siti ((__builtin_neon_ti) __a);
 }
 
-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
-vreinterpret_u8_u16 (uint16x4_t __a)
+#endif
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vreinterpretq_s32_s64 (int64x2_t __a)
 {
-  return (uint8x8_t)__builtin_neon_vreinterpretv8qiv4hi ((int16x4_t) __a);
+  return (int32x4_t)__builtin_neon_vreinterpretv4siv2di (__a);
 }
 
-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
-vreinterpret_u8_u32 (uint32x2_t __a)
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vreinterpretq_s32_u64 (uint64x2_t __a)
 {
-  return (uint8x8_t)__builtin_neon_vreinterpretv8qiv2si ((int32x2_t) __a);
+  return (int32x4_t)__builtin_neon_vreinterpretv4siv2di ((int64x2_t) __a);
 }
 
-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
-vreinterpret_u8_u64 (uint64x1_t __a)
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vreinterpretq_s32_s8 (int8x16_t __a)
 {
-  return (uint8x8_t)__builtin_neon_vreinterpretv8qidi ((int64x1_t) __a);
+  return (int32x4_t)__builtin_neon_vreinterpretv4siv16qi (__a);
 }
 
-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
-vreinterpret_u8_p8 (poly8x8_t __a)
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vreinterpretq_s32_s16 (int16x8_t __a)
 {
-  return (uint8x8_t)__builtin_neon_vreinterpretv8qiv8qi ((int8x8_t) __a);
+  return (int32x4_t)__builtin_neon_vreinterpretv4siv8hi (__a);
 }
 
-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
-vreinterpret_u8_p16 (poly16x4_t __a)
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vreinterpretq_s32_u8 (uint8x16_t __a)
 {
-  return (uint8x8_t)__builtin_neon_vreinterpretv8qiv4hi ((int16x4_t) __a);
+  return (int32x4_t)__builtin_neon_vreinterpretv4siv16qi ((int8x16_t) __a);
 }
 
-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
-vreinterpretq_u8_s8 (int8x16_t __a)
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vreinterpretq_s32_u16 (uint16x8_t __a)
 {
-  return (uint8x16_t)__builtin_neon_vreinterpretv16qiv16qi (__a);
+  return (int32x4_t)__builtin_neon_vreinterpretv4siv8hi ((int16x8_t) __a);
 }
 
-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
-vreinterpretq_u8_s16 (int16x8_t __a)
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vreinterpretq_s32_u32 (uint32x4_t __a)
 {
-  return (uint8x16_t)__builtin_neon_vreinterpretv16qiv8hi (__a);
+  return (int32x4_t)__builtin_neon_vreinterpretv4siv4si ((int32x4_t) __a);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
-vreinterpretq_u8_s32 (int32x4_t __a)
+vreinterpretq_u8_p8 (poly8x16_t __a)
 {
-  return (uint8x16_t)__builtin_neon_vreinterpretv16qiv4si (__a);
+  return (uint8x16_t)__builtin_neon_vreinterpretv16qiv16qi ((int8x16_t) __a);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
-vreinterpretq_u8_s64 (int64x2_t __a)
+vreinterpretq_u8_p16 (poly16x8_t __a)
 {
-  return (uint8x16_t)__builtin_neon_vreinterpretv16qiv2di (__a);
+  return (uint8x16_t)__builtin_neon_vreinterpretv16qiv8hi ((int16x8_t) __a);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
@@ -12082,16 +13039,26 @@  vreinterpretq_u8_f32 (float32x4_t __a)
   return (uint8x16_t)__builtin_neon_vreinterpretv16qiv4sf (__a);
 }
 
+#ifdef __ARM_FEATURE_CRYPTO
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
-vreinterpretq_u8_u16 (uint16x8_t __a)
+vreinterpretq_u8_p64 (poly64x2_t __a)
 {
-  return (uint8x16_t)__builtin_neon_vreinterpretv16qiv8hi ((int16x8_t) __a);
+  return (uint8x16_t)__builtin_neon_vreinterpretv16qiv2di ((int64x2_t) __a);
 }
 
+#endif
+#ifdef __ARM_FEATURE_CRYPTO
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
-vreinterpretq_u8_u32 (uint32x4_t __a)
+vreinterpretq_u8_p128 (poly128_t __a)
 {
-  return (uint8x16_t)__builtin_neon_vreinterpretv16qiv4si ((int32x4_t) __a);
+  return (uint8x16_t)__builtin_neon_vreinterpretv16qiti ((__builtin_neon_ti) __a);
+}
+
+#endif
+__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+vreinterpretq_u8_s64 (int64x2_t __a)
+{
+  return (uint8x16_t)__builtin_neon_vreinterpretv16qiv2di (__a);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
@@ -12101,75 +13068,79 @@  vreinterpretq_u8_u64 (uint64x2_t __a)
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
-vreinterpretq_u8_p8 (poly8x16_t __a)
+vreinterpretq_u8_s8 (int8x16_t __a)
 {
-  return (uint8x16_t)__builtin_neon_vreinterpretv16qiv16qi ((int8x16_t) __a);
+  return (uint8x16_t)__builtin_neon_vreinterpretv16qiv16qi (__a);
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
-vreinterpretq_u8_p16 (poly16x8_t __a)
+vreinterpretq_u8_s16 (int16x8_t __a)
 {
-  return (uint8x16_t)__builtin_neon_vreinterpretv16qiv8hi ((int16x8_t) __a);
+  return (uint8x16_t)__builtin_neon_vreinterpretv16qiv8hi (__a);
 }
 
-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
-vreinterpret_u16_s8 (int8x8_t __a)
+__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+vreinterpretq_u8_s32 (int32x4_t __a)
 {
-  return (uint16x4_t)__builtin_neon_vreinterpretv4hiv8qi (__a);
+  return (uint8x16_t)__builtin_neon_vreinterpretv16qiv4si (__a);
 }
 
-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
-vreinterpret_u16_s16 (int16x4_t __a)
+__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+vreinterpretq_u8_u16 (uint16x8_t __a)
 {
-  return (uint16x4_t)__builtin_neon_vreinterpretv4hiv4hi (__a);
+  return (uint8x16_t)__builtin_neon_vreinterpretv16qiv8hi ((int16x8_t) __a);
 }
 
-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
-vreinterpret_u16_s32 (int32x2_t __a)
+__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+vreinterpretq_u8_u32 (uint32x4_t __a)
 {
-  return (uint16x4_t)__builtin_neon_vreinterpretv4hiv2si (__a);
+  return (uint8x16_t)__builtin_neon_vreinterpretv16qiv4si ((int32x4_t) __a);
 }
 
-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
-vreinterpret_u16_s64 (int64x1_t __a)
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_u16_p8 (poly8x16_t __a)
 {
-  return (uint16x4_t)__builtin_neon_vreinterpretv4hidi (__a);
+  return (uint16x8_t)__builtin_neon_vreinterpretv8hiv16qi ((int8x16_t) __a);
 }
 
-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
-vreinterpret_u16_f32 (float32x2_t __a)
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_u16_p16 (poly16x8_t __a)
 {
-  return (uint16x4_t)__builtin_neon_vreinterpretv4hiv2sf (__a);
+  return (uint16x8_t)__builtin_neon_vreinterpretv8hiv8hi ((int16x8_t) __a);
 }
 
-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
-vreinterpret_u16_u8 (uint8x8_t __a)
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_u16_f32 (float32x4_t __a)
 {
-  return (uint16x4_t)__builtin_neon_vreinterpretv4hiv8qi ((int8x8_t) __a);
+  return (uint16x8_t)__builtin_neon_vreinterpretv8hiv4sf (__a);
 }
 
-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
-vreinterpret_u16_u32 (uint32x2_t __a)
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_u16_p64 (poly64x2_t __a)
 {
-  return (uint16x4_t)__builtin_neon_vreinterpretv4hiv2si ((int32x2_t) __a);
+  return (uint16x8_t)__builtin_neon_vreinterpretv8hiv2di ((int64x2_t) __a);
 }
 
-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
-vreinterpret_u16_u64 (uint64x1_t __a)
+#endif
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_u16_p128 (poly128_t __a)
 {
-  return (uint16x4_t)__builtin_neon_vreinterpretv4hidi ((int64x1_t) __a);
+  return (uint16x8_t)__builtin_neon_vreinterpretv8hiti ((__builtin_neon_ti) __a);
 }
 
-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
-vreinterpret_u16_p8 (poly8x8_t __a)
+#endif
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_u16_s64 (int64x2_t __a)
 {
-  return (uint16x4_t)__builtin_neon_vreinterpretv4hiv8qi ((int8x8_t) __a);
+  return (uint16x8_t)__builtin_neon_vreinterpretv8hiv2di (__a);
 }
 
-__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
-vreinterpret_u16_p16 (poly16x4_t __a)
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_u16_u64 (uint64x2_t __a)
 {
-  return (uint16x4_t)__builtin_neon_vreinterpretv4hiv4hi ((int16x4_t) __a);
+  return (uint16x8_t)__builtin_neon_vreinterpretv8hiv2di ((int64x2_t) __a);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
@@ -12191,167 +13162,231 @@  vreinterpretq_u16_s32 (int32x4_t __a)
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
-vreinterpretq_u16_s64 (int64x2_t __a)
+vreinterpretq_u16_u8 (uint8x16_t __a)
 {
-  return (uint16x8_t)__builtin_neon_vreinterpretv8hiv2di (__a);
+  return (uint16x8_t)__builtin_neon_vreinterpretv8hiv16qi ((int8x16_t) __a);
 }
 
 __extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
-vreinterpretq_u16_f32 (float32x4_t __a)
+vreinterpretq_u16_u32 (uint32x4_t __a)
 {
-  return (uint16x8_t)__builtin_neon_vreinterpretv8hiv4sf (__a);
+  return (uint16x8_t)__builtin_neon_vreinterpretv8hiv4si ((int32x4_t) __a);
 }
 
-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
-vreinterpretq_u16_u8 (uint8x16_t __a)
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+vreinterpretq_u32_p8 (poly8x16_t __a)
 {
-  return (uint16x8_t)__builtin_neon_vreinterpretv8hiv16qi ((int8x16_t) __a);
+  return (uint32x4_t)__builtin_neon_vreinterpretv4siv16qi ((int8x16_t) __a);
 }
 
-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
-vreinterpretq_u16_u32 (uint32x4_t __a)
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+vreinterpretq_u32_p16 (poly16x8_t __a)
 {
-  return (uint16x8_t)__builtin_neon_vreinterpretv8hiv4si ((int32x4_t) __a);
+  return (uint32x4_t)__builtin_neon_vreinterpretv4siv8hi ((int16x8_t) __a);
 }
 
-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
-vreinterpretq_u16_u64 (uint64x2_t __a)
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+vreinterpretq_u32_f32 (float32x4_t __a)
 {
-  return (uint16x8_t)__builtin_neon_vreinterpretv8hiv2di ((int64x2_t) __a);
+  return (uint32x4_t)__builtin_neon_vreinterpretv4siv4sf (__a);
 }
 
-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
-vreinterpretq_u16_p8 (poly8x16_t __a)
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+vreinterpretq_u32_p64 (poly64x2_t __a)
 {
-  return (uint16x8_t)__builtin_neon_vreinterpretv8hiv16qi ((int8x16_t) __a);
+  return (uint32x4_t)__builtin_neon_vreinterpretv4siv2di ((int64x2_t) __a);
 }
 
-__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
-vreinterpretq_u16_p16 (poly16x8_t __a)
+#endif
+#ifdef __ARM_FEATURE_CRYPTO
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+vreinterpretq_u32_p128 (poly128_t __a)
 {
-  return (uint16x8_t)__builtin_neon_vreinterpretv8hiv8hi ((int16x8_t) __a);
+  return (uint32x4_t)__builtin_neon_vreinterpretv4siti ((__builtin_neon_ti) __a);
 }
 
-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
-vreinterpret_u32_s8 (int8x8_t __a)
+#endif
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+vreinterpretq_u32_s64 (int64x2_t __a)
 {
-  return (uint32x2_t)__builtin_neon_vreinterpretv2siv8qi (__a);
+  return (uint32x4_t)__builtin_neon_vreinterpretv4siv2di (__a);
 }
 
-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
-vreinterpret_u32_s16 (int16x4_t __a)
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+vreinterpretq_u32_u64 (uint64x2_t __a)
 {
-  return (uint32x2_t)__builtin_neon_vreinterpretv2siv4hi (__a);
+  return (uint32x4_t)__builtin_neon_vreinterpretv4siv2di ((int64x2_t) __a);
 }
 
-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
-vreinterpret_u32_s32 (int32x2_t __a)
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+vreinterpretq_u32_s8 (int8x16_t __a)
 {
-  return (uint32x2_t)__builtin_neon_vreinterpretv2siv2si (__a);
+  return (uint32x4_t)__builtin_neon_vreinterpretv4siv16qi (__a);
 }
 
-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
-vreinterpret_u32_s64 (int64x1_t __a)
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+vreinterpretq_u32_s16 (int16x8_t __a)
 {
-  return (uint32x2_t)__builtin_neon_vreinterpretv2sidi (__a);
+  return (uint32x4_t)__builtin_neon_vreinterpretv4siv8hi (__a);
 }
 
-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
-vreinterpret_u32_f32 (float32x2_t __a)
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+vreinterpretq_u32_s32 (int32x4_t __a)
 {
-  return (uint32x2_t)__builtin_neon_vreinterpretv2siv2sf (__a);
+  return (uint32x4_t)__builtin_neon_vreinterpretv4siv4si (__a);
 }
 
-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
-vreinterpret_u32_u8 (uint8x8_t __a)
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+vreinterpretq_u32_u8 (uint8x16_t __a)
 {
-  return (uint32x2_t)__builtin_neon_vreinterpretv2siv8qi ((int8x8_t) __a);
+  return (uint32x4_t)__builtin_neon_vreinterpretv4siv16qi ((int8x16_t) __a);
 }
 
-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
-vreinterpret_u32_u16 (uint16x4_t __a)
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+vreinterpretq_u32_u16 (uint16x8_t __a)
 {
-  return (uint32x2_t)__builtin_neon_vreinterpretv2siv4hi ((int16x4_t) __a);
+  return (uint32x4_t)__builtin_neon_vreinterpretv4siv8hi ((int16x8_t) __a);
 }
 
-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
-vreinterpret_u32_u64 (uint64x1_t __a)
+
+#ifdef __ARM_FEATURE_CRYPTO
+
+__extension__ static __inline poly128_t __attribute__ ((__always_inline__))
+vldrq_p128 (poly128_t const * __ptr)
 {
-  return (uint32x2_t)__builtin_neon_vreinterpretv2sidi ((int64x1_t) __a);
+#ifdef __ARM_BIG_ENDIAN
+  poly64_t* __ptmp = (poly64_t*) __ptr;
+  poly64_t __d0 = vld1_p64 (__ptmp);
+  poly64_t __d1 = vld1_p64 (__ptmp + 1);
+  return vreinterpretq_p128_p64 (vcombine_p64 (__d1, __d0));
+#else
+  return vreinterpretq_p128_p64 (vld1q_p64 ((poly64_t*) __ptr));
+#endif
 }
 
-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
-vreinterpret_u32_p8 (poly8x8_t __a)
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vstrq_p128 (poly128_t * __ptr, poly128_t __val)
 {
-  return (uint32x2_t)__builtin_neon_vreinterpretv2siv8qi ((int8x8_t) __a);
+#ifdef __ARM_BIG_ENDIAN
+  poly64x2_t __tmp = vreinterpretq_p64_p128 (__val);
+  poly64_t __d0 = vget_high_p64 (__tmp);
+  poly64_t __d1 = vget_low_p64 (__tmp);
+  vst1q_p64 ((poly64_t*) __ptr, vcombine_p64 (__d0, __d1));
+#else
+  vst1q_p64 ((poly64_t*) __ptr, vreinterpretq_p64_p128 (__val));
+#endif
 }
 
-__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
-vreinterpret_u32_p16 (poly16x4_t __a)
+__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+vaeseq_u8 (uint8x16_t __data, uint8x16_t __key)
 {
-  return (uint32x2_t)__builtin_neon_vreinterpretv2siv4hi ((int16x4_t) __a);
+  return __builtin_arm_crypto_aese (__data, __key);
 }
 
-__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
-vreinterpretq_u32_s8 (int8x16_t __a)
+__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+vaesdq_u8 (uint8x16_t __data, uint8x16_t __key)
 {
-  return (uint32x4_t)__builtin_neon_vreinterpretv4siv16qi (__a);
+  return __builtin_arm_crypto_aesd (__data, __key);
+}
+
+__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+vaesmcq_u8 (uint8x16_t __data)
+{
+  return __builtin_arm_crypto_aesmc (__data);
+}
+
+__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+vaesimcq_u8 (uint8x16_t __data)
+{
+  return __builtin_arm_crypto_aesimc (__data);
+}
+
+__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+vsha1h_u32 (uint32_t __hash_e)
+{
+  uint32x4_t __t = vdupq_n_u32 (0);
+  __t = vsetq_lane_u32 (__hash_e, __t, 0);
+  __t = __builtin_arm_crypto_sha1h (__t);
+  return vgetq_lane_u32 (__t, 0);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
-vreinterpretq_u32_s16 (int16x8_t __a)
+vsha1cq_u32 (uint32x4_t __hash_abcd, uint32_t __hash_e, uint32x4_t __wk)
 {
-  return (uint32x4_t)__builtin_neon_vreinterpretv4siv8hi (__a);
+  uint32x4_t __t = vdupq_n_u32 (0);
+  __t = vsetq_lane_u32 (__hash_e, __t, 0);
+  return __builtin_arm_crypto_sha1c (__hash_abcd, __t, __wk);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
-vreinterpretq_u32_s32 (int32x4_t __a)
+vsha1pq_u32 (uint32x4_t __hash_abcd, uint32_t __hash_e, uint32x4_t __wk)
 {
-  return (uint32x4_t)__builtin_neon_vreinterpretv4siv4si (__a);
+  uint32x4_t __t = vdupq_n_u32 (0);
+  __t = vsetq_lane_u32 (__hash_e, __t, 0);
+  return __builtin_arm_crypto_sha1p (__hash_abcd, __t, __wk);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
-vreinterpretq_u32_s64 (int64x2_t __a)
+vsha1mq_u32 (uint32x4_t __hash_abcd, uint32_t __hash_e, uint32x4_t __wk)
 {
-  return (uint32x4_t)__builtin_neon_vreinterpretv4siv2di (__a);
+  uint32x4_t __t = vdupq_n_u32 (0);
+  __t = vsetq_lane_u32 (__hash_e, __t, 0);
+  return __builtin_arm_crypto_sha1m (__hash_abcd, __t, __wk);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
-vreinterpretq_u32_f32 (float32x4_t __a)
+vsha1su0q_u32 (uint32x4_t __w0_3, uint32x4_t __w4_7, uint32x4_t __w8_11)
 {
-  return (uint32x4_t)__builtin_neon_vreinterpretv4siv4sf (__a);
+  return __builtin_arm_crypto_sha1su0 (__w0_3, __w4_7, __w8_11);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
-vreinterpretq_u32_u8 (uint8x16_t __a)
+vsha1su1q_u32 (uint32x4_t __tw0_3, uint32x4_t __w12_15)
 {
-  return (uint32x4_t)__builtin_neon_vreinterpretv4siv16qi ((int8x16_t) __a);
+  return __builtin_arm_crypto_sha1su1 (__tw0_3, __w12_15);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
-vreinterpretq_u32_u16 (uint16x8_t __a)
+vsha256hq_u32 (uint32x4_t __hash_abcd, uint32x4_t __hash_efgh, uint32x4_t __wk)
 {
-  return (uint32x4_t)__builtin_neon_vreinterpretv4siv8hi ((int16x8_t) __a);
+  return __builtin_arm_crypto_sha256h (__hash_abcd, __hash_efgh, __wk);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
-vreinterpretq_u32_u64 (uint64x2_t __a)
+vsha256h2q_u32 (uint32x4_t __hash_abcd, uint32x4_t __hash_efgh, uint32x4_t __wk)
 {
-  return (uint32x4_t)__builtin_neon_vreinterpretv4siv2di ((int64x2_t) __a);
+  return __builtin_arm_crypto_sha256h2 (__hash_abcd, __hash_efgh, __wk);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
-vreinterpretq_u32_p8 (poly8x16_t __a)
+vsha256su0q_u32 (uint32x4_t __w0_3, uint32x4_t __w4_7)
 {
-  return (uint32x4_t)__builtin_neon_vreinterpretv4siv16qi ((int8x16_t) __a);
+  return __builtin_arm_crypto_sha256su0 (__w0_3, __w4_7);
 }
 
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
-vreinterpretq_u32_p16 (poly16x8_t __a)
+vsha256su1q_u32 (uint32x4_t __tw0_3, uint32x4_t __w8_11, uint32x4_t __w12_15)
 {
-  return (uint32x4_t)__builtin_neon_vreinterpretv4siv8hi ((int16x8_t) __a);
+  return __builtin_arm_crypto_sha256su1 (__tw0_3, __w8_11, __w12_15);
 }
 
+__extension__ static __inline poly128_t __attribute__ ((__always_inline__))
+vmull_p64 (poly64_t __a, poly64_t __b)
+{
+  return (poly128_t) __builtin_arm_crypto_vmullp64 ((uint64_t) __a, (uint64_t) __b);
+}
+
+__extension__ static __inline poly128_t __attribute__ ((__always_inline__))
+vmull_high_p64 (poly64x2_t __a, poly64x2_t __b)
+{
+  poly64_t __t1 = vget_high_p64 (__a);
+  poly64_t __t2 = vget_high_p64 (__b);
+
+  return (poly128_t) __builtin_arm_crypto_vmullp64 ((uint64_t) __t1, (uint64_t) __t2);
+}
+
+#endif
 #ifdef __cplusplus
 }
 #endif
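
To illustrate how the new intrinsics are used from C (assuming __ARM_FEATURE_CRYPTO
is defined, i.e. a crypto-capable -mfpu), something along these lines should work:

    #include <arm_neon.h>

    /* One SHA-256 hash-update step using the new intrinsic.  */
    uint32x4_t
    sha256_step (uint32x4_t abcd, uint32x4_t efgh, uint32x4_t wk)
    {
      return vsha256hq_u32 (abcd, efgh, wk);
    }

    /* 64x64 -> 128-bit polynomial (carry-less) multiply.  */
    poly128_t
    clmul (poly64_t a, poly64_t b)
    {
      return vmull_p64 (a, b);
    }
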
diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def
index 92f1d7a..5fa17bd 100644
--- a/gcc/config/arm/arm_neon_builtins.def
+++ b/gcc/config/arm/arm_neon_builtins.def
@@ -158,11 +158,12 @@  VAR5 (REINTERP, vreinterpretv4hi, v8qi, v4hi, v2si, v2sf, di),
 VAR5 (REINTERP, vreinterpretv2si, v8qi, v4hi, v2si, v2sf, di),
 VAR5 (REINTERP, vreinterpretv2sf, v8qi, v4hi, v2si, v2sf, di),
 VAR5 (REINTERP, vreinterpretdi, v8qi, v4hi, v2si, v2sf, di),
-VAR5 (REINTERP, vreinterpretv16qi, v16qi, v8hi, v4si, v4sf, v2di),
-VAR5 (REINTERP, vreinterpretv8hi, v16qi, v8hi, v4si, v4sf, v2di),
-VAR5 (REINTERP, vreinterpretv4si, v16qi, v8hi, v4si, v4sf, v2di),
-VAR5 (REINTERP, vreinterpretv4sf, v16qi, v8hi, v4si, v4sf, v2di),
-VAR5 (REINTERP, vreinterpretv2di, v16qi, v8hi, v4si, v4sf, v2di),
+VAR6 (REINTERP, vreinterpretv16qi, v16qi, v8hi, v4si, v4sf, v2di, ti),
+VAR6 (REINTERP, vreinterpretv8hi, v16qi, v8hi, v4si, v4sf, v2di, ti),
+VAR6 (REINTERP, vreinterpretv4si, v16qi, v8hi, v4si, v4sf, v2di, ti),
+VAR6 (REINTERP, vreinterpretv4sf, v16qi, v8hi, v4si, v4sf, v2di, ti),
+VAR6 (REINTERP, vreinterpretv2di, v16qi, v8hi, v4si, v4sf, v2di, ti),
+VAR6 (REINTERP, vreinterpretti, v16qi, v8hi, v4si, v4sf, v2di, ti),
 VAR10 (LOAD1, vld1,
          v8qi, v4hi, v2si, v2sf, di, v16qi, v8hi, v4si, v4sf, v2di),
 VAR10 (LOAD1LANE, vld1_lane,
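
The new "ti" entries here are what provide the TImode reinterpret builtins
(e.g. __builtin_neon_vreinterpretv4siti used above) that back the poly128_t
vreinterpret intrinsics in arm_neon.h.
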
diff --git a/gcc/config/arm/crypto.def b/gcc/config/arm/crypto.def
new file mode 100644
index 0000000..0b3a395
--- /dev/null
+++ b/gcc/config/arm/crypto.def
@@ -0,0 +1,35 @@ 
+/* Cryptographic instruction builtin definitions.
+   Copyright (C) 2013
+   Free Software Foundation, Inc.
+   Contributed by ARM Ltd.
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+CRYPTO2 (aesd, AESD, v16uqi, v16uqi, v16uqi)
+CRYPTO2 (aese, AESE, v16uqi, v16uqi, v16uqi)
+CRYPTO1 (aesimc, AESIMC, v16uqi, v16uqi)
+CRYPTO1 (aesmc, AESMC, v16uqi, v16uqi)
+CRYPTO1 (sha1h, SHA1H, v4usi, v4usi)
+CRYPTO2 (sha1su1, SHA1SU1, v4usi, v4usi, v4usi)
+CRYPTO2 (sha256su0, SHA256SU0, v4usi, v4usi, v4usi)
+CRYPTO3 (sha1c, SHA1C, v4usi, v4usi, v4usi, v4usi)
+CRYPTO3 (sha1m, SHA1M, v4usi, v4usi, v4usi, v4usi)
+CRYPTO3 (sha1p, SHA1P, v4usi, v4usi, v4usi, v4usi)
+CRYPTO3 (sha1su0, SHA1SU0, v4usi, v4usi, v4usi, v4usi)
+CRYPTO3 (sha256h, SHA256H, v4usi, v4usi, v4usi, v4usi)
+CRYPTO3 (sha256h2, SHA256H2, v4usi, v4usi, v4usi, v4usi)
+CRYPTO3 (sha256su1, SHA256SU1, v4usi, v4usi, v4usi, v4usi)
+CRYPTO2 (vmullp64, VMULLP64, uti, udi, udi)
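
Each CRYPTOn entry names the builtin, the instruction it corresponds to and
the (return, argument ...) modes. The arm_neon.h wrappers above are the
intended user interface; purely to show what the table produces, the aese
entry corresponds to a builtin usable as:

    #include <arm_neon.h>

    /* Equivalent to vaeseq_u8: one AES single-round encryption step.  */
    uint8x16_t
    aes_round (uint8x16_t state, uint8x16_t key)
    {
      return __builtin_arm_crypto_aese (state, key);
    }
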
diff --git a/gcc/config/arm/crypto.md b/gcc/config/arm/crypto.md
new file mode 100644
index 0000000..e365cf8
--- /dev/null
+++ b/gcc/config/arm/crypto.md
@@ -0,0 +1,86 @@ 
+;; ARMv8-A crypto patterns.
+;; Copyright (C) 2013 Free Software Foundation, Inc.
+;; Contributed by ARM Ltd.
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published
+;; by the Free Software Foundation; either version 3, or (at your
+;; option) any later version.
+
+;; GCC is distributed in the hope that it will be useful, but WITHOUT
+;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+;; or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+;; License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+(define_insn "crypto_<crypto_pattern>"
+  [(set (match_operand:<crypto_mode> 0 "register_operand" "=w")
+        (unspec:<crypto_mode> [(match_operand:<crypto_mode> 1
+                       "register_operand" "w")]
+         CRYPTO_UNARY))]
+  "TARGET_CRYPTO"
+  "<crypto_pattern>.<crypto_size_sfx>\\t%q0, %q1"
+  [(set_attr "type" "<crypto_type>")]
+)
+
+(define_insn "crypto_<crypto_pattern>"
+  [(set (match_operand:<crypto_mode> 0 "register_operand" "=w")
+        (unspec:<crypto_mode> [(match_operand:<crypto_mode> 1 "register_operand" "0")
+                      (match_operand:<crypto_mode> 2 "register_operand" "w")]
+         CRYPTO_BINARY))]
+  "TARGET_CRYPTO"
+  "<crypto_pattern>.<crypto_size_sfx>\\t%q0, %q2"
+  [(set_attr "type" "<crypto_type>")]
+)
+
+(define_insn "crypto_<crypto_pattern>"
+  [(set (match_operand:<crypto_mode> 0 "register_operand" "=w")
+        (unspec:<crypto_mode> [(match_operand:<crypto_mode> 1 "register_operand" "0")
+                      (match_operand:<crypto_mode> 2 "register_operand" "w")
+                      (match_operand:<crypto_mode> 3 "register_operand" "w")]
+         CRYPTO_TERNARY))]
+  "TARGET_CRYPTO"
+  "<crypto_pattern>.<crypto_size_sfx>\\t%q0, %q2, %q3"
+  [(set_attr "type" "<crypto_type>")]
+)
+
+(define_insn "crypto_sha1h"
+  [(set (match_operand:V4SI 0 "register_operand" "=w")
+        (zero_extend:V4SI
+          (unspec:SI [(vec_select:SI
+                        (match_operand:V4SI 1 "register_operand" "w")
+                        (parallel [(match_operand:SI 2 "immediate_operand" "i")]))]
+           UNSPEC_SHA1H)))]
+  "TARGET_CRYPTO"
+  "sha1h.32\\t%q0, %q1"
+  [(set_attr "type" "crypto_sha1_fast")]
+)
+
+(define_insn "crypto_vmullp64"
+  [(set (match_operand:TI 0 "register_operand" "=w")
+        (unspec:TI [(match_operand:DI 1 "register_operand" "w")
+                    (match_operand:DI 2 "register_operand" "w")]
+         UNSPEC_VMULLP64))]
+  "TARGET_CRYPTO"
+  "vmull.p64\\t%q0, %P1, %P2"
+  [(set_attr "type" "neon_mul_d_long")]
+)
+
+(define_insn "crypto_<crypto_pattern>"
+  [(set (match_operand:V4SI 0 "register_operand" "=w")
+        (unspec:<crypto_mode>
+                     [(match_operand:<crypto_mode> 1 "register_operand" "0")
+                      (vec_select:SI
+                        (match_operand:<crypto_mode> 2 "register_operand" "w")
+                        (parallel [(match_operand:SI 4 "immediate_operand" "i")]))
+                      (match_operand:<crypto_mode> 3 "register_operand" "w")]
+         CRYPTO_SELECTING))]
+  "TARGET_CRYPTO"
+  "<crypto_pattern>.<crypto_size_sfx>\\t%q0, %q2, %q3"
+  [(set_attr "type" "<crypto_type>")]
+)
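
One detail worth calling out: the CRYPTO_SELECTING pattern takes the scalar
SHA1 hash_e operand through lane 0 of a Q register (hence the vec_select on
operand 2), which is why the vsha1cq_u32/vsha1pq_u32/vsha1mq_u32 wrappers in
arm_neon.h build a zeroed vector and insert the scalar into lane 0 first.
For example (illustrative only):

    #include <arm_neon.h>

    /* SHA1 "choose" round update; E travels in lane 0 internally.  */
    uint32x4_t
    sha1_choose_round (uint32x4_t abcd, uint32_t e, uint32x4_t wk)
    {
      return vsha1cq_u32 (abcd, e, wk);
    }
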
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index ff5462c..299c959 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -204,6 +204,17 @@ 
 (define_int_iterator CRC [UNSPEC_CRC32B UNSPEC_CRC32H UNSPEC_CRC32W
                           UNSPEC_CRC32CB UNSPEC_CRC32CH UNSPEC_CRC32CW])
 
+(define_int_iterator CRYPTO_UNARY [UNSPEC_AESMC UNSPEC_AESIMC])
+
+(define_int_iterator CRYPTO_BINARY [UNSPEC_AESD UNSPEC_AESE
+                                    UNSPEC_SHA1SU1 UNSPEC_SHA256SU0])
+
+(define_int_iterator CRYPTO_TERNARY [UNSPEC_SHA1SU0 UNSPEC_SHA256H
+                                     UNSPEC_SHA256H2 UNSPEC_SHA256SU1])
+
+(define_int_iterator CRYPTO_SELECTING [UNSPEC_SHA1C UNSPEC_SHA1M
+                                       UNSPEC_SHA1P])
+
 ;;----------------------------------------------------------------------------
 ;; Mode attributes
 ;;----------------------------------------------------------------------------
@@ -530,6 +541,40 @@ 
                         (UNSPEC_CRC32W "SI") (UNSPEC_CRC32CB "QI")
                         (UNSPEC_CRC32CH "HI") (UNSPEC_CRC32CW "SI")])
 
+(define_int_attr crypto_pattern [(UNSPEC_SHA1H "sha1h") (UNSPEC_AESMC "aesmc")
+                          (UNSPEC_AESIMC "aesimc") (UNSPEC_AESD "aesd")
+                          (UNSPEC_AESE "aese") (UNSPEC_SHA1SU1 "sha1su1")
+                          (UNSPEC_SHA256SU0 "sha256su0") (UNSPEC_SHA1C "sha1c")
+                          (UNSPEC_SHA1M "sha1m") (UNSPEC_SHA1P "sha1p")
+                          (UNSPEC_SHA1SU0 "sha1su0") (UNSPEC_SHA256H "sha256h")
+                          (UNSPEC_SHA256H2 "sha256h2")
+                          (UNSPEC_SHA256SU1 "sha256su1")])
+
+(define_int_attr crypto_type
+ [(UNSPEC_AESE "crypto_aes") (UNSPEC_AESD "crypto_aes")
+ (UNSPEC_AESMC "crypto_aes") (UNSPEC_AESIMC "crypto_aes")
+ (UNSPEC_SHA1C "crypto_sha1_slow") (UNSPEC_SHA1P "crypto_sha1_slow")
+ (UNSPEC_SHA1M "crypto_sha1_slow") (UNSPEC_SHA1SU1 "crypto_sha1_fast")
+ (UNSPEC_SHA1SU0 "crypto_sha1_xor") (UNSPEC_SHA256H "crypto_sha256_slow")
+ (UNSPEC_SHA256H2 "crypto_sha256_slow") (UNSPEC_SHA256SU0 "crypto_sha256_fast")
+ (UNSPEC_SHA256SU1 "crypto_sha256_slow")])
+
+(define_int_attr crypto_size_sfx [(UNSPEC_SHA1H "32") (UNSPEC_AESMC "8")
+                          (UNSPEC_AESIMC "8") (UNSPEC_AESD "8")
+                          (UNSPEC_AESE "8") (UNSPEC_SHA1SU1 "32")
+                          (UNSPEC_SHA256SU0 "32") (UNSPEC_SHA1C "32")
+                          (UNSPEC_SHA1M "32") (UNSPEC_SHA1P "32")
+                          (UNSPEC_SHA1SU0 "32") (UNSPEC_SHA256H "32")
+                          (UNSPEC_SHA256H2 "32") (UNSPEC_SHA256SU1 "32")])
+
+(define_int_attr crypto_mode [(UNSPEC_SHA1H "V4SI") (UNSPEC_AESMC "V16QI")
+                          (UNSPEC_AESIMC "V16QI") (UNSPEC_AESD "V16QI")
+                          (UNSPEC_AESE "V16QI") (UNSPEC_SHA1SU1 "V4SI")
+                          (UNSPEC_SHA256SU0 "V4SI") (UNSPEC_SHA1C "V4SI")
+                          (UNSPEC_SHA1M "V4SI") (UNSPEC_SHA1P "V4SI")
+                          (UNSPEC_SHA1SU0 "V4SI") (UNSPEC_SHA256H "V4SI")
+                          (UNSPEC_SHA256H2 "V4SI") (UNSPEC_SHA256SU1 "V4SI")])
+
 ;; Both kinds of return insn.
 (define_code_iterator returns [return simple_return])
 (define_code_attr return_str [(return "") (simple_return "simple_")])
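
Taking UNSPEC_AESE as an example of how these combine: crypto_pattern gives
"aese", crypto_size_sfx gives "8", crypto_mode gives V16QI and crypto_type
gives crypto_aes, so the generic binary pattern in crypto.md instantiates as
a crypto_aese insn emitting "aese.8 %q0, %q2" with type crypto_aes.
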
diff --git a/gcc/config/arm/neon-gen.ml b/gcc/config/arm/neon-gen.ml
index 948b162..e5da658 100644
--- a/gcc/config/arm/neon-gen.ml
+++ b/gcc/config/arm/neon-gen.ml
@@ -114,6 +114,7 @@  let rec signed_ctype = function
   | T_uint32x4 -> T_int32x4
   | T_uint64x1 -> T_int64x1
   | T_uint64x2 -> T_int64x2
+  | T_poly64x2 -> T_int64x2
   (* Cast to types defined by mode in arm.c, not random types pulled in from
      the <stdint.h> header in use. This fixes incompatible pointer errors when
      compiling with C++.  *)
@@ -125,6 +126,8 @@  let rec signed_ctype = function
   | T_float32 -> T_floatSF
   | T_poly8 -> T_intQI
   | T_poly16 -> T_intHI
+  | T_poly64 -> T_intDI
+  | T_poly128 -> T_intTI
   | T_arrayof (n, elt) -> T_arrayof (n, signed_ctype elt)
   | T_ptrto elt -> T_ptrto (signed_ctype elt)
   | T_const elt -> T_const (signed_ctype elt)
@@ -362,80 +365,96 @@  let print_ops ops =
      abase : "ARM" base name for the type (i.e. int in int8x8_t).
      esize : element size.
      enum : element count.
+     alevel: architecture level at which available.
 *)
 
+type fpulevel = CRYPTO | ALL
+
 let deftypes () =
   let typeinfo = [
     (* Doubleword vector types.  *)
-    "__builtin_neon_qi", "int", 8, 8;
-    "__builtin_neon_hi", "int", 16, 4;
-    "__builtin_neon_si", "int", 32, 2;
-    "__builtin_neon_di", "int", 64, 1;
-    "__builtin_neon_hf", "float", 16, 4;
-    "__builtin_neon_sf", "float", 32, 2;
-    "__builtin_neon_poly8", "poly", 8, 8;
-    "__builtin_neon_poly16", "poly", 16, 4;
-    "__builtin_neon_uqi", "uint", 8, 8;
-    "__builtin_neon_uhi", "uint", 16, 4;
-    "__builtin_neon_usi", "uint", 32, 2;
-    "__builtin_neon_udi", "uint", 64, 1;
+    "__builtin_neon_qi", "int", 8, 8, ALL;
+    "__builtin_neon_hi", "int", 16, 4, ALL;
+    "__builtin_neon_si", "int", 32, 2, ALL;
+    "__builtin_neon_di", "int", 64, 1, ALL;
+    "__builtin_neon_hf", "float", 16, 4, ALL;
+    "__builtin_neon_sf", "float", 32, 2, ALL;
+    "__builtin_neon_poly8", "poly", 8, 8, ALL;
+    "__builtin_neon_poly16", "poly", 16, 4, ALL;
+    "__builtin_neon_poly64", "poly", 64, 1, CRYPTO;
+    "__builtin_neon_uqi", "uint", 8, 8, ALL;
+    "__builtin_neon_uhi", "uint", 16, 4, ALL;
+    "__builtin_neon_usi", "uint", 32, 2, ALL;
+    "__builtin_neon_udi", "uint", 64, 1, ALL;
 
     (* Quadword vector types.  *)
-    "__builtin_neon_qi", "int", 8, 16;
-    "__builtin_neon_hi", "int", 16, 8;
-    "__builtin_neon_si", "int", 32, 4;
-    "__builtin_neon_di", "int", 64, 2;
-    "__builtin_neon_sf", "float", 32, 4;
-    "__builtin_neon_poly8", "poly", 8, 16;
-    "__builtin_neon_poly16", "poly", 16, 8;
-    "__builtin_neon_uqi", "uint", 8, 16;
-    "__builtin_neon_uhi", "uint", 16, 8;
-    "__builtin_neon_usi", "uint", 32, 4;
-    "__builtin_neon_udi", "uint", 64, 2
+    "__builtin_neon_qi", "int", 8, 16, ALL;
+    "__builtin_neon_hi", "int", 16, 8, ALL;
+    "__builtin_neon_si", "int", 32, 4, ALL;
+    "__builtin_neon_di", "int", 64, 2, ALL;
+    "__builtin_neon_sf", "float", 32, 4, ALL;
+    "__builtin_neon_poly8", "poly", 8, 16, ALL;
+    "__builtin_neon_poly16", "poly", 16, 8, ALL;
+    "__builtin_neon_poly64", "poly", 64, 2, CRYPTO;
+    "__builtin_neon_uqi", "uint", 8, 16, ALL;
+    "__builtin_neon_uhi", "uint", 16, 8, ALL;
+    "__builtin_neon_usi", "uint", 32, 4, ALL;
+    "__builtin_neon_udi", "uint", 64, 2, ALL
   ] in
   List.iter
-    (fun (cbase, abase, esize, enum) ->
+    (fun (cbase, abase, esize, enum, fpulevel) ->
       let attr =
         match enum with
           1 -> ""
         | _ -> Printf.sprintf "\t__attribute__ ((__vector_size__ (%d)))"
                               (esize * enum / 8) in
-      Format.printf "typedef %s %s%dx%d_t%s;@\n" cbase abase esize enum attr)
+      if fpulevel == CRYPTO then
+        Format.printf "#ifdef __ARM_FEATURE_CRYPTO\n";
+      Format.printf "typedef %s %s%dx%d_t%s;@\n" cbase abase esize enum attr;
+      if fpulevel == CRYPTO then
+        Format.printf "#endif\n";)
     typeinfo;
   Format.print_newline ();
   (* Extra types not in <stdint.h>.  *)
   Format.printf "typedef float float32_t;\n";
   Format.printf "typedef __builtin_neon_poly8 poly8_t;\n";
-  Format.printf "typedef __builtin_neon_poly16 poly16_t;\n"
+  Format.printf "typedef __builtin_neon_poly16 poly16_t;\n";
+  Format.printf "#ifdef __ARM_FEATURE_CRYPTO\n";
+  Format.printf "typedef __builtin_neon_poly64 poly64_t;\n";
+  Format.printf "typedef __builtin_neon_poly128 poly128_t;\n";
+  Format.printf "#endif\n"
 
-(* Output structs containing arrays, for load & store instructions etc.  *)
+(* Output structs containing arrays, for load & store instructions etc.
+   poly128_t is deliberately not included here because it has no array types
+   defined for it.  *)
 
 let arrtypes () =
   let typeinfo = [
-    "int", 8;    "int", 16;
-    "int", 32;   "int", 64;
-    "uint", 8;   "uint", 16;
-    "uint", 32;  "uint", 64;
-    "float", 32; "poly", 8;
-    "poly", 16
+    "int", 8, ALL;    "int", 16, ALL;
+    "int", 32, ALL;   "int", 64, ALL;
+    "uint", 8, ALL;   "uint", 16, ALL;
+    "uint", 32, ALL;  "uint", 64, ALL;
+    "float", 32, ALL; "poly", 8, ALL;
+    "poly", 16, ALL; "poly", 64, CRYPTO
   ] in
-  let writestruct elname elsize regsize arrsize =
+  let writestruct elname elsize regsize arrsize fpulevel =
     let elnum = regsize / elsize in
     let structname =
       Printf.sprintf "%s%dx%dx%d_t" elname elsize elnum arrsize in
     let sfmt = start_function () in
-    Format.printf "typedef struct %s" structname;
+    Format.printf "%stypedef struct %s"
+      (if fpulevel == CRYPTO then "#ifdef __ARM_FEATURE_CRYPTO\n" else "") structname;
     open_braceblock sfmt;
     Format.printf "%s%dx%d_t val[%d];" elname elsize elnum arrsize;
     close_braceblock sfmt;
-    Format.printf " %s;" structname;
+    Format.printf " %s;%s" structname (if fpulevel == CRYPTO then "\n#endif\n" else "");
     end_function sfmt;
   in
     for n = 2 to 4 do
       List.iter
-        (fun (elname, elsize) ->
-          writestruct elname elsize 64 n;
-          writestruct elname elsize 128 n)
+        (fun (elname, elsize, alevel) ->
+          writestruct elname elsize 64 n alevel;
+          writestruct elname elsize 128 n alevel)
         typeinfo
     done
 
@@ -491,6 +510,8 @@  let _ =
   print_ops ops;
   Format.print_newline ();
   print_ops reinterp;
+  print_ops reinterpq;
+  Format.printf "%s" crypto_intrinsics;
   print_lines [
 "#ifdef __cplusplus";
 "}";
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index b2ac45e..5e9b441 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -4259,9 +4259,19 @@ 
   DONE;
 })
 
+(define_expand "neon_vreinterpretti<mode>"
+  [(match_operand:TI 0 "s_register_operand" "")
+   (match_operand:VQXMOV 1 "s_register_operand" "")]
+  "TARGET_NEON"
+{
+  neon_reinterpret (operands[0], operands[1]);
+  DONE;
+})
+
+
 (define_expand "neon_vreinterpretv16qi<mode>"
   [(match_operand:V16QI 0 "s_register_operand" "")
-   (match_operand:VQX 1 "s_register_operand" "")]
+   (match_operand:VQXMOV 1 "s_register_operand" "")]
   "TARGET_NEON"
 {
   neon_reinterpret (operands[0], operands[1]);
@@ -4270,7 +4280,7 @@ 
 
 (define_expand "neon_vreinterpretv8hi<mode>"
   [(match_operand:V8HI 0 "s_register_operand" "")
-   (match_operand:VQX 1 "s_register_operand" "")]
+   (match_operand:VQXMOV 1 "s_register_operand" "")]
   "TARGET_NEON"
 {
   neon_reinterpret (operands[0], operands[1]);
@@ -4279,7 +4289,7 @@ 
 
 (define_expand "neon_vreinterpretv4si<mode>"
   [(match_operand:V4SI 0 "s_register_operand" "")
-   (match_operand:VQX 1 "s_register_operand" "")]
+   (match_operand:VQXMOV 1 "s_register_operand" "")]
   "TARGET_NEON"
 {
   neon_reinterpret (operands[0], operands[1]);
@@ -4288,7 +4298,7 @@ 
 
 (define_expand "neon_vreinterpretv4sf<mode>"
   [(match_operand:V4SF 0 "s_register_operand" "")
-   (match_operand:VQX 1 "s_register_operand" "")]
+   (match_operand:VQXMOV 1 "s_register_operand" "")]
   "TARGET_NEON"
 {
   neon_reinterpret (operands[0], operands[1]);
@@ -4297,7 +4307,7 @@ 
 
 (define_expand "neon_vreinterpretv2di<mode>"
   [(match_operand:V2DI 0 "s_register_operand" "")
-   (match_operand:VQX 1 "s_register_operand" "")]
+   (match_operand:VQXMOV 1 "s_register_operand" "")]
   "TARGET_NEON"
 {
   neon_reinterpret (operands[0], operands[1]);
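
The switch from VQX to VQXMOV above, together with the new
neon_vreinterpretti<mode> expander, is what lets these reinterpret expands
also be instantiated for the TImode value underlying poly128_t (presumably
because VQXMOV, unlike VQX, covers TImode). At the source level the
conversion is purely a retyping; a minimal sketch, assuming
__ARM_FEATURE_CRYPTO:

    #include <arm_neon.h>

    /* No data movement is expected here beyond what register
       allocation requires; only the static type changes.  */
    poly128_t
    repack (poly64x2_t v)
    {
      return vreinterpretq_p128_p64 (v);
    }
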
diff --git a/gcc/config/arm/neon.ml b/gcc/config/arm/neon.ml
index ca9a4c0..968c171 100644
--- a/gcc/config/arm/neon.ml
+++ b/gcc/config/arm/neon.ml
@@ -22,7 +22,7 @@ 
 
 (* Shorthand types for vector elements.  *)
 type elts = S8 | S16 | S32 | S64 | F16 | F32 | U8 | U16 | U32 | U64 | P8 | P16
-          | I8 | I16 | I32 | I64 | B8 | B16 | B32 | B64 | Conv of elts * elts
+          | P64 | P128 | I8 | I16 | I32 | I64 | B8 | B16 | B32 | B64 | Conv of elts * elts
           | Cast of elts * elts | NoElts
 
 type eltclass = Signed | Unsigned | Float | Poly | Int | Bits
@@ -47,13 +47,15 @@  type vectype = T_int8x8    | T_int8x16
              | T_uint8     | T_uint16
              | T_uint32    | T_uint64
              | T_poly8     | T_poly16
+             | T_poly64    | T_poly64x1
+             | T_poly64x2  | T_poly128
              | T_float16   | T_float32
              | T_arrayof of int * vectype
              | T_ptrto of vectype | T_const of vectype
              | T_void      | T_intQI
              | T_intHI     | T_intSI
-             | T_intDI     | T_floatHF
-             | T_floatSF
+             | T_intDI     | T_intTI
+             | T_floatHF   | T_floatSF
 
 (* The meanings of the following are:
      TImode : "Tetra", two registers (four words).
@@ -96,7 +98,7 @@  type arity = Arity0 of vectype
            | Arity4 of vectype * vectype * vectype * vectype * vectype
 
 type vecmode = V8QI | V4HI | V4HF |V2SI | V2SF | DI
-             | V16QI | V8HI | V4SI | V4SF | V2DI
+             | V16QI | V8HI | V4SI | V4SF | V2DI | TI
              | QI | HI | SI | SF
 
 type opcode =
@@ -299,7 +301,8 @@  let rec elt_width = function
     S8 | U8 | P8 | I8 | B8 -> 8
   | S16 | U16 | P16 | I16 | B16 | F16 -> 16
   | S32 | F32 | U32 | I32 | B32 -> 32
-  | S64 | U64 | I64 | B64 -> 64
+  | S64 | U64 | P64 | I64 | B64 -> 64
+  | P128 -> 128
   | Conv (a, b) ->
       let wa = elt_width a and wb = elt_width b in
       if wa = wb then wa else raise (MixedMode (a, b))
@@ -309,7 +312,7 @@  let rec elt_width = function
 let rec elt_class = function
     S8 | S16 | S32 | S64 -> Signed
   | U8 | U16 | U32 | U64 -> Unsigned
-  | P8 | P16 -> Poly
+  | P8 | P16 | P64 | P128 -> Poly
   | F16 | F32 -> Float
   | I8 | I16 | I32 | I64 -> Int
   | B8 | B16 | B32 | B64 -> Bits
@@ -330,6 +333,8 @@  let elt_of_class_width c w =
   | Unsigned, 64 -> U64
   | Poly, 8 -> P8
   | Poly, 16 -> P16
+  | Poly, 64 -> P64
+  | Poly, 128 -> P128
   | Int, 8 -> I8
   | Int, 16 -> I16
   | Int, 32 -> I32
@@ -402,7 +407,7 @@  let rec mode_of_elt ?argpos elt shape =
     Float | ConvClass(_, Float) -> true | _ -> false in
   let idx =
     match elt_width elt with
-      8 -> 0 | 16 -> 1 | 32 -> 2 | 64 -> 3
+      8 -> 0 | 16 -> 1 | 32 -> 2 | 64 -> 3 | 128 -> 4
     | _ -> failwith "Bad element width"
   in match shape with
     All (_, Dreg) | By_scalar Dreg | Pair_result Dreg | Unary_scalar Dreg
@@ -413,7 +418,7 @@  let rec mode_of_elt ?argpos elt shape =
         [| V8QI; V4HI; V2SI; DI |].(idx)
   | All (_, Qreg) | By_scalar Qreg | Pair_result Qreg | Unary_scalar Qreg
   | Binary_imm Qreg | Long_noreg Qreg | Wide_noreg Qreg ->
-      [| V16QI; V8HI; if flt then V4SF else V4SI; V2DI |].(idx)
+      [| V16QI; V8HI; if flt then V4SF else V4SI; V2DI; TI|].(idx)
   | All (_, (Corereg | PtrTo _ | CstPtrTo _)) ->
       [| QI; HI; if flt then SF else SI; DI |].(idx)
   | Long | Wide | Wide_lane | Wide_scalar
@@ -474,6 +479,8 @@  let type_for_elt shape elt no =
         | U16 -> T_uint16x4
         | U32 -> T_uint32x2
         | U64 -> T_uint64x1
+        | P64 -> T_poly64x1
+        | P128 -> T_poly128
         | F16 -> T_float16x4
         | F32 -> T_float32x2
         | P8 -> T_poly8x8
@@ -493,6 +500,8 @@  let type_for_elt shape elt no =
         | F32 -> T_float32x4
         | P8 -> T_poly8x16
         | P16 -> T_poly16x8
+        | P64 -> T_poly64x2
+        | P128 -> T_poly128
         | _ -> failwith "Bad elt type for Qreg"
         end
     | Corereg ->
@@ -507,6 +516,8 @@  let type_for_elt shape elt no =
         | U64 -> T_uint64
         | P8 -> T_poly8
         | P16 -> T_poly16
+        | P64 -> T_poly64
+        | P128 -> T_poly128
         | F32 -> T_float32
         | _ -> failwith "Bad elt type for Corereg"
         end
@@ -527,10 +538,10 @@  let type_for_elt shape elt no =
 let vectype_size = function
     T_int8x8 | T_int16x4 | T_int32x2 | T_int64x1
   | T_uint8x8 | T_uint16x4 | T_uint32x2 | T_uint64x1
-  | T_float32x2 | T_poly8x8 | T_poly16x4 | T_float16x4 -> 64
+  | T_float32x2 | T_poly8x8 | T_poly64x1 | T_poly16x4 | T_float16x4 -> 64
   | T_int8x16 | T_int16x8 | T_int32x4 | T_int64x2
   | T_uint8x16 | T_uint16x8  | T_uint32x4  | T_uint64x2
-  | T_float32x4 | T_poly8x16 | T_poly16x8 -> 128
+  | T_float32x4 | T_poly8x16 | T_poly64x2 | T_poly16x8 -> 128
   | _ -> raise Not_found
 
 let inttype_for_array num elttype =
@@ -1041,14 +1052,22 @@  let ops =
       "vRsraQ_n", shift_right_acc, su_8_64;
 
     (* Vector shift right and insert.  *)
+    Vsri, [Requires_feature "CRYPTO"], Use_operands [| Dreg; Dreg; Immed |], "vsri_n", shift_insert,
+      [P64];
     Vsri, [], Use_operands [| Dreg; Dreg; Immed |], "vsri_n", shift_insert,
       P8 :: P16 :: su_8_64;
+    Vsri, [Requires_feature "CRYPTO"], Use_operands [| Qreg; Qreg; Immed |], "vsriQ_n", shift_insert,
+      [P64];
     Vsri, [], Use_operands [| Qreg; Qreg; Immed |], "vsriQ_n", shift_insert,
       P8 :: P16 :: su_8_64;
 
     (* Vector shift left and insert.  *)
+    Vsli, [Requires_feature "CRYPTO"], Use_operands [| Dreg; Dreg; Immed |], "vsli_n", shift_insert,
+      [P64];
     Vsli, [], Use_operands [| Dreg; Dreg; Immed |], "vsli_n", shift_insert,
       P8 :: P16 :: su_8_64;
+    Vsli, [Requires_feature "CRYPTO"], Use_operands [| Qreg; Qreg; Immed |], "vsliQ_n", shift_insert,
+      [P64];
     Vsli, [], Use_operands [| Qreg; Qreg; Immed |], "vsliQ_n", shift_insert,
       P8 :: P16 :: su_8_64;
 
@@ -1135,6 +1154,11 @@  let ops =
 
     (* Create vector from literal bit pattern.  *)
     Vcreate,
+      [Requires_feature "CRYPTO"; No_op], (* Not really, but it can yield various things that are too
+                                   hard for the test generator at this time.  *)
+      Use_operands [| Dreg; Corereg |], "vcreate", create_vector,
+      [P64];
+    Vcreate,
       [No_op], (* Not really, but it can yield various things that are too
                   hard for the test generator at this time.  *)
       Use_operands [| Dreg; Corereg |], "vcreate", create_vector,
@@ -1148,12 +1172,25 @@  let ops =
       Use_operands [| Dreg; Corereg |], "vdup_n", bits_1,
       pf_su_8_32;
     Vdup_n,
+      [No_op; Requires_feature "CRYPTO";
+       Instruction_name ["vmov"];
+       Disassembles_as [Use_operands [| Dreg; Corereg; Corereg |]]],
+      Use_operands [| Dreg; Corereg |], "vdup_n", notype_1,
+      [P64];
+    Vdup_n,
       [No_op;
        Instruction_name ["vmov"];
        Disassembles_as [Use_operands [| Dreg; Corereg; Corereg |]]],
       Use_operands [| Dreg; Corereg |], "vdup_n", notype_1,
       [S64; U64];
     Vdup_n,
+      [No_op; Requires_feature "CRYPTO";
+       Disassembles_as [Use_operands [| Qreg;
+                                        Alternatives [ Corereg;
+                                                       Element_of_dreg ] |]]],
+      Use_operands [| Qreg; Corereg |], "vdupQ_n", bits_1,
+      [P64];
+    Vdup_n,
       [Disassembles_as [Use_operands [| Qreg;
                                         Alternatives [ Corereg;
                                                        Element_of_dreg ] |]]],
@@ -1206,21 +1243,33 @@  let ops =
       [Disassembles_as [Use_operands [| Dreg; Element_of_dreg |]]],
       Unary_scalar Dreg, "vdup_lane", bits_2, pf_su_8_32;
     Vdup_lane,
+      [No_op; Requires_feature "CRYPTO"; Const_valuator (fun _ -> 0)],
+      Unary_scalar Dreg, "vdup_lane", bits_2, [P64];
+    Vdup_lane,
       [No_op; Const_valuator (fun _ -> 0)],
       Unary_scalar Dreg, "vdup_lane", bits_2, [S64; U64];
     Vdup_lane,
       [Disassembles_as [Use_operands [| Qreg; Element_of_dreg |]]],
       Unary_scalar Qreg, "vdupQ_lane", bits_2, pf_su_8_32;
     Vdup_lane,
+      [No_op; Requires_feature "CRYPTO"; Const_valuator (fun _ -> 0)],
+      Unary_scalar Qreg, "vdupQ_lane", bits_2, [P64];
+    Vdup_lane,
       [No_op; Const_valuator (fun _ -> 0)],
       Unary_scalar Qreg, "vdupQ_lane", bits_2, [S64; U64];
 
     (* Combining vectors.  *)
+    Vcombine, [Requires_feature "CRYPTO"; No_op],
+      Use_operands [| Qreg; Dreg; Dreg |], "vcombine", notype_2,
+      [P64];
     Vcombine, [No_op],
       Use_operands [| Qreg; Dreg; Dreg |], "vcombine", notype_2,
       pf_su_8_64;
 
     (* Splitting vectors.  *)
+    Vget_high, [Requires_feature "CRYPTO"; No_op],
+      Use_operands [| Dreg; Qreg |], "vget_high",
+      notype_1, [P64];
     Vget_high, [No_op],
       Use_operands [| Dreg; Qreg |], "vget_high",
       notype_1, pf_su_8_64;
@@ -1229,7 +1278,10 @@  let ops =
 	       Fixed_vector_reg],
       Use_operands [| Dreg; Qreg |], "vget_low",
       notype_1, pf_su_8_32;
-     Vget_low, [No_op],
+    Vget_low, [Requires_feature "CRYPTO"; No_op],
+      Use_operands [| Dreg; Qreg |], "vget_low",
+      notype_1, [P64];
+    Vget_low, [No_op],
       Use_operands [| Dreg; Qreg |], "vget_low",
       notype_1, [S64; U64];
 
@@ -1412,9 +1464,15 @@  let ops =
       [S16; S32];
 
     (* Vector extract.  *)
+    Vext, [Requires_feature "CRYPTO"; Const_valuator (fun _ -> 0)],
+      Use_operands [| Dreg; Dreg; Dreg; Immed |], "vext", extend,
+      [P64];
     Vext, [Const_valuator (fun _ -> 0)],
       Use_operands [| Dreg; Dreg; Dreg; Immed |], "vext", extend,
       pf_su_8_64;
+    Vext, [Requires_feature "CRYPTO"; Const_valuator (fun _ -> 0)],
+      Use_operands [| Qreg; Qreg; Qreg; Immed |], "vextQ", extend,
+      [P64];
     Vext, [Const_valuator (fun _ -> 0)],
       Use_operands [| Qreg; Qreg; Qreg; Immed |], "vextQ", extend,
       pf_su_8_64;
@@ -1435,11 +1493,21 @@  let ops =
 
     (* Bit selection.  *)
     Vbsl,
+      [Requires_feature "CRYPTO"; Instruction_name ["vbsl"; "vbit"; "vbif"];
+       Disassembles_as [Use_operands [| Dreg; Dreg; Dreg |]]],
+      Use_operands [| Dreg; Dreg; Dreg; Dreg |], "vbsl", bit_select,
+      [P64];
+    Vbsl,
       [Instruction_name ["vbsl"; "vbit"; "vbif"];
        Disassembles_as [Use_operands [| Dreg; Dreg; Dreg |]]],
       Use_operands [| Dreg; Dreg; Dreg; Dreg |], "vbsl", bit_select,
       pf_su_8_64;
     Vbsl,
+      [Requires_feature "CRYPTO"; Instruction_name ["vbsl"; "vbit"; "vbif"];
+       Disassembles_as [Use_operands [| Qreg; Qreg; Qreg |]]],
+      Use_operands [| Qreg; Qreg; Qreg; Qreg |], "vbslQ", bit_select,
+      [P64];
+    Vbsl,
       [Instruction_name ["vbsl"; "vbit"; "vbif"];
        Disassembles_as [Use_operands [| Qreg; Qreg; Qreg |]]],
       Use_operands [| Qreg; Qreg; Qreg; Qreg |], "vbslQ", bit_select,
@@ -1461,10 +1529,21 @@  let ops =
 
     (* Element/structure loads.  VLD1 variants.  *)
     Vldx 1,
+      [Requires_feature "CRYPTO";
+       Disassembles_as [Use_operands [| VecArray (1, Dreg);
+                                        CstPtrTo Corereg |]]],
+      Use_operands [| Dreg; CstPtrTo Corereg |], "vld1", bits_1,
+      [P64];
+    Vldx 1,
       [Disassembles_as [Use_operands [| VecArray (1, Dreg);
                                         CstPtrTo Corereg |]]],
       Use_operands [| Dreg; CstPtrTo Corereg |], "vld1", bits_1,
       pf_su_8_64;
+    Vldx 1, [Requires_feature "CRYPTO";
+             Disassembles_as [Use_operands [| VecArray (2, Dreg);
+					      CstPtrTo Corereg |]]],
+      Use_operands [| Qreg; CstPtrTo Corereg |], "vld1Q", bits_1,
+      [P64];
     Vldx 1, [Disassembles_as [Use_operands [| VecArray (2, Dreg);
 					      CstPtrTo Corereg |]]],
       Use_operands [| Qreg; CstPtrTo Corereg |], "vld1Q", bits_1,
@@ -1476,6 +1555,13 @@  let ops =
       Use_operands [| Dreg; CstPtrTo Corereg; Dreg; Immed |],
       "vld1_lane", bits_3, pf_su_8_32;
     Vldx_lane 1,
+      [Requires_feature "CRYPTO";
+       Disassembles_as [Use_operands [| VecArray (1, Dreg);
+                                        CstPtrTo Corereg |]];
+       Const_valuator (fun _ -> 0)],
+      Use_operands [| Dreg; CstPtrTo Corereg; Dreg; Immed |],
+      "vld1_lane", bits_3, [P64];
+    Vldx_lane 1,
       [Disassembles_as [Use_operands [| VecArray (1, Dreg);
                                         CstPtrTo Corereg |]];
        Const_valuator (fun _ -> 0)],
@@ -1487,6 +1573,12 @@  let ops =
       Use_operands [| Qreg; CstPtrTo Corereg; Qreg; Immed |],
       "vld1Q_lane", bits_3, pf_su_8_32;
     Vldx_lane 1,
+      [Requires_feature "CRYPTO";
+       Disassembles_as [Use_operands [| VecArray (1, Dreg);
+                                        CstPtrTo Corereg |]]],
+      Use_operands [| Qreg; CstPtrTo Corereg; Qreg; Immed |],
+      "vld1Q_lane", bits_3, [P64];
+    Vldx_lane 1,
       [Disassembles_as [Use_operands [| VecArray (1, Dreg);
                                         CstPtrTo Corereg |]]],
       Use_operands [| Qreg; CstPtrTo Corereg; Qreg; Immed |],
@@ -1498,6 +1590,12 @@  let ops =
       Use_operands [| Dreg; CstPtrTo Corereg |], "vld1_dup",
       bits_1, pf_su_8_32;
     Vldx_dup 1,
+      [Requires_feature "CRYPTO";
+       Disassembles_as [Use_operands [| VecArray (1, Dreg);
+                                        CstPtrTo Corereg |]]],
+      Use_operands [| Dreg; CstPtrTo Corereg |], "vld1_dup",
+      bits_1, [P64];
+    Vldx_dup 1,
       [Disassembles_as [Use_operands [| VecArray (1, Dreg);
                                         CstPtrTo Corereg |]]],
       Use_operands [| Dreg; CstPtrTo Corereg |], "vld1_dup",
@@ -1510,16 +1608,32 @@  let ops =
     (* Treated identically to vld1_dup above as we now
        do a single load followed by a duplicate.  *)
     Vldx_dup 1,
+      [Requires_feature "CRYPTO";
+       Disassembles_as [Use_operands [| VecArray (1, Dreg);
+                                        CstPtrTo Corereg |]]],
+      Use_operands [| Qreg; CstPtrTo Corereg |], "vld1Q_dup",
+      bits_1, [P64];
+    Vldx_dup 1,
       [Disassembles_as [Use_operands [| VecArray (1, Dreg);
                                         CstPtrTo Corereg |]]],
       Use_operands [| Qreg; CstPtrTo Corereg |], "vld1Q_dup",
       bits_1, [S64; U64];
 
     (* VST1 variants.  *)
+    Vstx 1, [Requires_feature "CRYPTO";
+             Disassembles_as [Use_operands [| VecArray (1, Dreg);
+                                              PtrTo Corereg |]]],
+      Use_operands [| PtrTo Corereg; Dreg |], "vst1",
+      store_1, [P64];
     Vstx 1, [Disassembles_as [Use_operands [| VecArray (1, Dreg);
                                               PtrTo Corereg |]]],
       Use_operands [| PtrTo Corereg; Dreg |], "vst1",
       store_1, pf_su_8_64;
+    Vstx 1, [Requires_feature "CRYPTO";
+             Disassembles_as [Use_operands [| VecArray (2, Dreg);
+					      PtrTo Corereg |]]],
+      Use_operands [| PtrTo Corereg; Qreg |], "vst1Q",
+      store_1, [P64];
     Vstx 1, [Disassembles_as [Use_operands [| VecArray (2, Dreg);
 					      PtrTo Corereg |]]],
       Use_operands [| PtrTo Corereg; Qreg |], "vst1Q",
@@ -1531,6 +1645,13 @@  let ops =
       Use_operands [| PtrTo Corereg; Dreg; Immed |],
       "vst1_lane", store_3, pf_su_8_32;
     Vstx_lane 1,
+      [Requires_feature "CRYPTO";
+       Disassembles_as [Use_operands [| VecArray (1, Dreg);
+                                        CstPtrTo Corereg |]];
+       Const_valuator (fun _ -> 0)],
+      Use_operands [| PtrTo Corereg; Dreg; Immed |],
+      "vst1_lane", store_3, [P64];
+    Vstx_lane 1,
       [Disassembles_as [Use_operands [| VecArray (1, Dreg);
                                         CstPtrTo Corereg |]];
        Const_valuator (fun _ -> 0)],
@@ -1542,6 +1663,12 @@  let ops =
       Use_operands [| PtrTo Corereg; Qreg; Immed |],
       "vst1Q_lane", store_3, pf_su_8_32;
     Vstx_lane 1,
+      [Requires_feature "CRYPTO";
+       Disassembles_as [Use_operands [| VecArray (1, Dreg);
+                                        CstPtrTo Corereg |]]],
+      Use_operands [| PtrTo Corereg; Qreg; Immed |],
+      "vst1Q_lane", store_3, [P64];
+    Vstx_lane 1,
       [Disassembles_as [Use_operands [| VecArray (1, Dreg);
                                         CstPtrTo Corereg |]]],
       Use_operands [| PtrTo Corereg; Qreg; Immed |],
@@ -1550,6 +1677,9 @@  let ops =
     (* VLD2 variants.  *)
     Vldx 2, [], Use_operands [| VecArray (2, Dreg); CstPtrTo Corereg |],
       "vld2", bits_1, pf_su_8_32;
+    Vldx 2, [Requires_feature "CRYPTO"; Instruction_name ["vld1"]],
+       Use_operands [| VecArray (2, Dreg); CstPtrTo Corereg |],
+      "vld2", bits_1, [P64];
     Vldx 2, [Instruction_name ["vld1"]],
        Use_operands [| VecArray (2, Dreg); CstPtrTo Corereg |],
       "vld2", bits_1, [S64; U64];
@@ -1581,6 +1711,12 @@  let ops =
       Use_operands [| VecArray (2, Dreg); CstPtrTo Corereg |],
       "vld2_dup", bits_1, pf_su_8_32;
     Vldx_dup 2,
+      [Requires_feature "CRYPTO";
+       Instruction_name ["vld1"]; Disassembles_as [Use_operands
+        [| VecArray (2, Dreg); CstPtrTo Corereg |]]],
+      Use_operands [| VecArray (2, Dreg); CstPtrTo Corereg |],
+      "vld2_dup", bits_1, [P64];
+    Vldx_dup 2,
       [Instruction_name ["vld1"]; Disassembles_as [Use_operands
         [| VecArray (2, Dreg); CstPtrTo Corereg |]]],
       Use_operands [| VecArray (2, Dreg); CstPtrTo Corereg |],
@@ -1591,6 +1727,12 @@  let ops =
                                               PtrTo Corereg |]]],
       Use_operands [| PtrTo Corereg; VecArray (2, Dreg) |], "vst2",
       store_1, pf_su_8_32;
+    Vstx 2, [Requires_feature "CRYPTO";
+             Disassembles_as [Use_operands [| VecArray (2, Dreg);
+                                              PtrTo Corereg |]];
+             Instruction_name ["vst1"]],
+      Use_operands [| PtrTo Corereg; VecArray (2, Dreg) |], "vst2",
+      store_1, [P64];
     Vstx 2, [Disassembles_as [Use_operands [| VecArray (2, Dreg);
                                               PtrTo Corereg |]];
              Instruction_name ["vst1"]],
@@ -1619,6 +1761,9 @@  let ops =
     (* VLD3 variants.  *)
     Vldx 3, [], Use_operands [| VecArray (3, Dreg); CstPtrTo Corereg |],
       "vld3", bits_1, pf_su_8_32;
+    Vldx 3, [Requires_feature "CRYPTO"; Instruction_name ["vld1"]],
+      Use_operands [| VecArray (3, Dreg); CstPtrTo Corereg |],
+      "vld3", bits_1, [P64];
     Vldx 3, [Instruction_name ["vld1"]],
       Use_operands [| VecArray (3, Dreg); CstPtrTo Corereg |],
       "vld3", bits_1, [S64; U64];
@@ -1650,6 +1795,12 @@  let ops =
       Use_operands [| VecArray (3, Dreg); CstPtrTo Corereg |],
       "vld3_dup", bits_1, pf_su_8_32;
     Vldx_dup 3,
+      [Requires_feature "CRYPTO";
+       Instruction_name ["vld1"]; Disassembles_as [Use_operands
+        [| VecArray (3, Dreg); CstPtrTo Corereg |]]],
+      Use_operands [| VecArray (3, Dreg); CstPtrTo Corereg |],
+      "vld3_dup", bits_1, [P64];
+    Vldx_dup 3,
       [Instruction_name ["vld1"]; Disassembles_as [Use_operands
         [| VecArray (3, Dreg); CstPtrTo Corereg |]]],
       Use_operands [| VecArray (3, Dreg); CstPtrTo Corereg |],
@@ -1660,6 +1811,12 @@  let ops =
                                               PtrTo Corereg |]]],
       Use_operands [| PtrTo Corereg; VecArray (3, Dreg) |], "vst3",
       store_1, pf_su_8_32;
+    Vstx 3, [Requires_feature "CRYPTO";
+             Disassembles_as [Use_operands [| VecArray (4, Dreg);
+                                              PtrTo Corereg |]];
+             Instruction_name ["vst1"]],
+      Use_operands [| PtrTo Corereg; VecArray (3, Dreg) |], "vst3",
+      store_1, [P64];
     Vstx 3, [Disassembles_as [Use_operands [| VecArray (4, Dreg);
                                               PtrTo Corereg |]];
              Instruction_name ["vst1"]],
@@ -1688,6 +1845,9 @@  let ops =
     (* VLD4/VST4 variants.  *)
     Vldx 4, [], Use_operands [| VecArray (4, Dreg); CstPtrTo Corereg |],
       "vld4", bits_1, pf_su_8_32;
+    Vldx 4, [Requires_feature "CRYPTO"; Instruction_name ["vld1"]],
+      Use_operands [| VecArray (4, Dreg); CstPtrTo Corereg |],
+      "vld4", bits_1, [P64];
     Vldx 4, [Instruction_name ["vld1"]],
       Use_operands [| VecArray (4, Dreg); CstPtrTo Corereg |],
       "vld4", bits_1, [S64; U64];
@@ -1719,6 +1879,12 @@  let ops =
       Use_operands [| VecArray (4, Dreg); CstPtrTo Corereg |],
       "vld4_dup", bits_1, pf_su_8_32;
     Vldx_dup 4,
+      [Requires_feature "CRYPTO";
+       Instruction_name ["vld1"]; Disassembles_as [Use_operands
+        [| VecArray (4, Dreg); CstPtrTo Corereg |]]],
+      Use_operands [| VecArray (4, Dreg); CstPtrTo Corereg |],
+      "vld4_dup", bits_1, [P64];
+    Vldx_dup 4,
       [Instruction_name ["vld1"]; Disassembles_as [Use_operands
         [| VecArray (4, Dreg); CstPtrTo Corereg |]]],
       Use_operands [| VecArray (4, Dreg); CstPtrTo Corereg |],
@@ -1728,6 +1894,12 @@  let ops =
                                               PtrTo Corereg |]]],
       Use_operands [| PtrTo Corereg; VecArray (4, Dreg) |], "vst4",
       store_1, pf_su_8_32;
+    Vstx 4, [Requires_feature "CRYPTO";
+             Disassembles_as [Use_operands [| VecArray (4, Dreg);
+                                              PtrTo Corereg |]];
+             Instruction_name ["vst1"]],
+      Use_operands [| PtrTo Corereg; VecArray (4, Dreg) |], "vst4",
+      store_1, [P64];
     Vstx 4, [Disassembles_as [Use_operands [| VecArray (4, Dreg);
                                               PtrTo Corereg |]];
              Instruction_name ["vst1"]],
@@ -1779,26 +1951,32 @@  let ops =
     Vorn, [], All (3, Qreg), "vornQ", notype_2, su_8_64;
   ]
 
+let type_in_crypto_only t
+  = (t == P64) or (t == P128)
+
+let cross_product s1 s2
+  = List.filter (fun (e, e') -> e <> e')
+                (List.concat (List.map (fun e1 -> List.map (fun e2 -> (e1,e2)) s1) s2))
+
 let reinterp =
-  let elems = P8 :: P16 :: F32 :: su_8_64 in
-  List.fold_right
-    (fun convto acc ->
-      let types = List.fold_right
-        (fun convfrom acc ->
-          if convfrom <> convto then
-            Cast (convto, convfrom) :: acc
-          else
-            acc)
-        elems
-        []
-      in
-        let dconv = Vreinterp, [No_op], Use_operands [| Dreg; Dreg |],
-                      "vreinterpret", conv_1, types
-        and qconv = Vreinterp, [No_op], Use_operands [| Qreg; Qreg |],
-		      "vreinterpretQ", conv_1, types in
-        dconv :: qconv :: acc)
-    elems
-    []
+  let elems = P8 :: P16 :: F32 :: P64 :: su_8_64 in
+  let casts = cross_product elems elems in
+  List.map
+    (fun (convto, convfrom) ->
+       Vreinterp, (if (type_in_crypto_only convto) or (type_in_crypto_only convfrom)
+                   then [Requires_feature "CRYPTO"] else []) @ [No_op], Use_operands [| Dreg; Dreg |],
+                   "vreinterpret", conv_1, [Cast (convto, convfrom)])
+    casts
+
+let reinterpq =
+  let elems = P8 :: P16 :: F32 :: P64 :: P128 :: su_8_64 in
+  let casts = cross_product elems elems in
+  List.map
+    (fun (convto, convfrom) ->
+       Vreinterp, (if (type_in_crypto_only convto) or (type_in_crypto_only convfrom)
+                   then [Requires_feature "CRYPTO"] else []) @ [No_op], Use_operands [| Qreg; Qreg |],
+                   "vreinterpretQ", conv_1, [Cast (convto, convfrom)])
+    casts
 
 (* Output routines.  *)
 
@@ -1808,6 +1986,7 @@  let rec string_of_elt = function
   | I8 -> "i8" | I16 -> "i16" | I32 -> "i32" | I64 -> "i64"
   | B8 -> "8" | B16 -> "16" | B32 -> "32" | B64 -> "64"
   | F16 -> "f16" | F32 -> "f32" | P8 -> "p8" | P16 -> "p16"
+  | P64 -> "p64" | P128 -> "p128"
   | Conv (a, b) | Cast (a, b) -> string_of_elt a ^ "_" ^ string_of_elt b
   | NoElts -> failwith "No elts"
 
@@ -1851,6 +2030,10 @@  let string_of_vectype vt =
   | T_uint64 -> affix "uint64"
   | T_poly8 -> affix "poly8"
   | T_poly16 -> affix "poly16"
+  | T_poly64 -> affix "poly64"
+  | T_poly64x1 -> affix "poly64x1"
+  | T_poly64x2 -> affix "poly64x2"
+  | T_poly128 -> affix "poly128"
   | T_float16 -> affix "float16"
   | T_float32 -> affix "float32"
   | T_immediate _ -> "const int"
@@ -1859,6 +2042,7 @@  let string_of_vectype vt =
   | T_intHI -> "__builtin_neon_hi"
   | T_intSI -> "__builtin_neon_si"
   | T_intDI -> "__builtin_neon_di"
+  | T_intTI -> "__builtin_neon_ti"
   | T_floatHF -> "__builtin_neon_hf"
   | T_floatSF -> "__builtin_neon_sf"
   | T_arrayof (num, base) ->
@@ -1884,7 +2068,7 @@  let string_of_mode = function
     V8QI -> "v8qi" | V4HI -> "v4hi" | V4HF  -> "v4hf"  | V2SI -> "v2si"
   | V2SF -> "v2sf" | DI   -> "di"   | V16QI -> "v16qi" | V8HI -> "v8hi"
   | V4SI -> "v4si" | V4SF -> "v4sf" | V2DI  -> "v2di"  | QI   -> "qi"
-  | HI -> "hi" | SI -> "si" | SF -> "sf"
+  | HI -> "hi" | SI -> "si" | SF -> "sf" | TI -> "ti"
 
 (* Use uppercase chars for letters which form part of the intrinsic name, but
    should be omitted from the builtin name (the info is passed in an extra
@@ -1991,3 +2175,146 @@  let analyze_all_shapes features shape f =
     | _ -> assert false
   with Not_found -> [f shape]
 
+(* The crypto intrinsics have unconventional shapes and are not numerous
+   enough to be worth the trouble of encoding in the generator.  We
+   implement them explicitly here.  *)
+let crypto_intrinsics =
+"
+#ifdef __ARM_FEATURE_CRYPTO
+
+__extension__ static __inline poly128_t __attribute__ ((__always_inline__))
+vldrq_p128 (poly128_t const * __ptr)
+{
+#ifdef __ARM_BIG_ENDIAN
+  poly64_t* __ptmp = (poly64_t*) __ptr;
+  poly64_t __d0 = vld1_p64 (__ptmp);
+  poly64_t __d1 = vld1_p64 (__ptmp + 1);
+  return vreinterpretq_p128_p64 (vcombine_p64 (__d1, __d0));
+#else
+  return vreinterpretq_p128_p64 (vld1q_p64 ((poly64_t*) __ptr));
+#endif
+}
+
+__extension__ static __inline void __attribute__ ((__always_inline__))
+vstrq_p128 (poly128_t * __ptr, poly128_t __val)
+{
+#ifdef __ARM_BIG_ENDIAN
+  poly64x2_t __tmp = vreinterpretq_p64_p128 (__val);
+  poly64_t __d0 = vget_high_p64 (__tmp);
+  poly64_t __d1 = vget_low_p64 (__tmp);
+  vst1q_p64 ((poly64_t*) __ptr, vcombine_p64 (__d0, __d1));
+#else
+  vst1q_p64 ((poly64_t*) __ptr, vreinterpretq_p64_p128 (__val));
+#endif
+}
+
+__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+vaeseq_u8 (uint8x16_t __data, uint8x16_t __key)
+{
+  return __builtin_arm_crypto_aese (__data, __key);
+}
+
+__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+vaesdq_u8 (uint8x16_t __data, uint8x16_t __key)
+{
+  return __builtin_arm_crypto_aesd (__data, __key);
+}
+
+__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+vaesmcq_u8 (uint8x16_t __data)
+{
+  return __builtin_arm_crypto_aesmc (__data);
+}
+
+__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+vaesimcq_u8 (uint8x16_t __data)
+{
+  return __builtin_arm_crypto_aesimc (__data);
+}
+
+__extension__ static __inline uint32_t __attribute__ ((__always_inline__))
+vsha1h_u32 (uint32_t __hash_e)
+{
+  uint32x4_t __t = vdupq_n_u32 (0);
+  __t = vsetq_lane_u32 (__hash_e, __t, 0);
+  __t = __builtin_arm_crypto_sha1h (__t);
+  return vgetq_lane_u32 (__t, 0);
+}
+
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+vsha1cq_u32 (uint32x4_t __hash_abcd, uint32_t __hash_e, uint32x4_t __wk)
+{
+  uint32x4_t __t = vdupq_n_u32 (0);
+  __t = vsetq_lane_u32 (__hash_e, __t, 0);
+  return __builtin_arm_crypto_sha1c (__hash_abcd, __t, __wk);
+}
+
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+vsha1pq_u32 (uint32x4_t __hash_abcd, uint32_t __hash_e, uint32x4_t __wk)
+{
+  uint32x4_t __t = vdupq_n_u32 (0);
+  __t = vsetq_lane_u32 (__hash_e, __t, 0);
+  return __builtin_arm_crypto_sha1p (__hash_abcd, __t, __wk);
+}
+
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+vsha1mq_u32 (uint32x4_t __hash_abcd, uint32_t __hash_e, uint32x4_t __wk)
+{
+  uint32x4_t __t = vdupq_n_u32 (0);
+  __t = vsetq_lane_u32 (__hash_e, __t, 0);
+  return __builtin_arm_crypto_sha1m (__hash_abcd, __t, __wk);
+}
+
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+vsha1su0q_u32 (uint32x4_t __w0_3, uint32x4_t __w4_7, uint32x4_t __w8_11)
+{
+  return __builtin_arm_crypto_sha1su0 (__w0_3, __w4_7, __w8_11);
+}
+
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+vsha1su1q_u32 (uint32x4_t __tw0_3, uint32x4_t __w12_15)
+{
+  return __builtin_arm_crypto_sha1su1 (__tw0_3, __w12_15);
+}
+
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+vsha256hq_u32 (uint32x4_t __hash_abcd, uint32x4_t __hash_efgh, uint32x4_t __wk)
+{
+  return __builtin_arm_crypto_sha256h (__hash_abcd, __hash_efgh, __wk);
+}
+
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+vsha256h2q_u32 (uint32x4_t __hash_abcd, uint32x4_t __hash_efgh, uint32x4_t __wk)
+{
+  return __builtin_arm_crypto_sha256h2 (__hash_abcd, __hash_efgh, __wk);
+}
+
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+vsha256su0q_u32 (uint32x4_t __w0_3, uint32x4_t __w4_7)
+{
+  return __builtin_arm_crypto_sha256su0 (__w0_3, __w4_7);
+}
+
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+vsha256su1q_u32 (uint32x4_t __tw0_3, uint32x4_t __w8_11, uint32x4_t __w12_15)
+{
+  return __builtin_arm_crypto_sha256su1 (__tw0_3, __w8_11, __w12_15);
+}
+
+__extension__ static __inline poly128_t __attribute__ ((__always_inline__))
+vmull_p64 (poly64_t __a, poly64_t __b)
+{
+  return (poly128_t) __builtin_arm_crypto_vmullp64 ((uint64_t) __a, (uint64_t) __b);
+}
+
+__extension__ static __inline poly128_t __attribute__ ((__always_inline__))
+vmull_high_p64 (poly64x2_t __a, poly64x2_t __b)
+{
+  poly64_t __t1 = vget_high_p64 (__a);
+  poly64_t __t2 = vget_high_p64 (__b);
+
+  return (poly128_t) __builtin_arm_crypto_vmullp64 ((uint64_t) __t1, (uint64_t) __t2);
+}
+
+#endif
+"
diff --git a/gcc/config/arm/types.md b/gcc/config/arm/types.md
index b505be3..e51e63d 100644
--- a/gcc/config/arm/types.md
+++ b/gcc/config/arm/types.md
@@ -519,6 +519,12 @@ 
 ; neon_fp_div_s_q
 ; neon_fp_div_d
 ; neon_fp_div_d_q
+; crypto_aes
+; crypto_sha1_xor
+; crypto_sha1_fast
+; crypto_sha1_slow
+; crypto_sha256_fast
+; crypto_sha256_slow
 
 (define_attr "type"
  "adc_imm,\
@@ -822,6 +828,7 @@ 
   neon_mul_b_long,\
   neon_mul_h_long,\
   neon_mul_s_long,\
+  neon_mul_d_long,\
   neon_mul_h_scalar,\
   neon_mul_h_scalar_q,\
   neon_mul_s_scalar,\
@@ -1036,7 +1043,14 @@ 
   neon_fp_div_s,\
   neon_fp_div_s_q,\
   neon_fp_div_d,\
-  neon_fp_div_d_q"
+  neon_fp_div_d_q,\
+\
+  crypto_aes,\
+  crypto_sha1_xor,\
+  crypto_sha1_fast,\
+  crypto_sha1_slow,\
+  crypto_sha256_fast,\
+  crypto_sha256_slow"
    (const_string "untyped"))
 
 ; Is this an (integer side) multiply with a 32-bit (or smaller) result?
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index f8faba3..af4b832 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -155,6 +155,21 @@ 
   UNSPEC_CRC32CB
   UNSPEC_CRC32CH
   UNSPEC_CRC32CW
+  UNSPEC_AESD
+  UNSPEC_AESE
+  UNSPEC_AESIMC
+  UNSPEC_AESMC
+  UNSPEC_SHA1C
+  UNSPEC_SHA1M
+  UNSPEC_SHA1P
+  UNSPEC_SHA1H
+  UNSPEC_SHA1SU0
+  UNSPEC_SHA1SU1
+  UNSPEC_SHA256H
+  UNSPEC_SHA256H2
+  UNSPEC_SHA256SU0
+  UNSPEC_SHA256SU1
+  UNSPEC_VMULLP64
   UNSPEC_LOAD_COUNT
   UNSPEC_VABD
   UNSPEC_VABDL