From patchwork Thu Dec 14 08:29:33 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sameera Deshpande X-Patchwork-Id: 848399 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-469173-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="q85vbFEV"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3yy6F14Sdzz9sR8 for ; Thu, 14 Dec 2017 19:29:56 +1100 (AEDT) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :mime-version:from:date:message-id:subject:to:cc:content-type; q=dns; s=default; b=lPJQr7IEaWixgLIPNLpIp2UobrCns8h6fiN26BI0NLF Q01YmaoNUpo74t/+NcUglL4gnZHEH1/J4VsdF5YPV5a6cVtXFtG8QwAC96kY7aMf xo6UWcMaG7IEoj2Psk8+NNuH4wdJXw6QB2hogfTjMpFVn3My/Kx51qa9yepgylpo = DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :mime-version:from:date:message-id:subject:to:cc:content-type; s=default; bh=K1CAv1YYl4Rz1l1b0D4qEpe3trc=; b=q85vbFEVwSdK1sq+0 1lQTwb4shZGBJKg80WKSJNydzLrY8P1V95DdLGCR2VbOpu9kiP5xz8Tx1WpMClXS G9RT1l7xYjo3vwK8wH5DW+ZNsYKp1UTJRCPhXBHO2xja6g2CMm9SyrAPW4QUt9ye y32ojBENiQYxbbguZLJPlqzDww= Received: (qmail 61752 invoked by alias); 14 Dec 2017 08:29:43 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 60420 invoked by uid 89); 14 Dec 2017 08:29:43 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-26.9 required=5.0 tests=BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, RCVD_IN_DNSWL_NONE, SPF_PASS autolearn=ham version=3.3.2 spammy= X-HELO: mail-wr0-f176.google.com Received: from mail-wr0-f176.google.com (HELO mail-wr0-f176.google.com) (209.85.128.176) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 14 Dec 2017 08:29:36 +0000 Received: by mail-wr0-f176.google.com with SMTP id o2so4406561wro.5 for ; Thu, 14 Dec 2017 00:29:36 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:from:date:message-id:subject:to:cc; bh=fFGw4e2AguOfsifBL0EJYq9HuGvOdAKMcnGvOf4S7H8=; b=F0biJbAjHWB5iq+Sk8P/hUAo+f7s7nJ+lYg7jLTuw+DZukR4IGSSxKINl2s6vZ3yuZ qc1KDFi9Xx6c9B5xke7FXpkgkerDmJGzMa4J0AiR10tU16iFgLP8BG3OPdoKmxdFfTku RYmT91jvo738eIhFDiEgllhBZL6SrFOuURoIObnPP4pY61cm/eSvm4Y31BmLlIu0LLTw LvujpNHnUAyTyg2Gv2hDyUVLmjgByHUxlRteCLBDaHaM0NrEKp9HyPUQ6QQtZlgus1gI GGQPEOvvTIPX+79DDn+o6q2UmjPt6mPwIgJJTT1X6/g0eg11eAo8avyOFaMA1EkwO8xU PwHw== X-Gm-Message-State: AKGB3mJUHImYT66y+IXQj1d4ihvYPez2k8Hf2QUaBRzPE5XvFfbh5oVl SnwquX6FefrVuSTlSr/LQv7fb9QdemmKUsNkm9+lTsu3 X-Google-Smtp-Source: ACJfBou7WY7SpPCpqI2ZAddEoDdiv9eNTwKCWfxNjI1Xq7PYB4hyu4iaf02DEsWZnyxMunOr7I5M87PS4To4yLP9sfI= X-Received: by 10.223.178.130 with SMTP id g2mr4734411wrd.129.1513240174169; Thu, 14 Dec 2017 00:29:34 -0800 (PST) MIME-Version: 1.0 Received: by 10.223.150.79 with HTTP; Thu, 14 Dec 2017 00:29:33 -0800 (PST) From: Sameera Deshpande Date: Thu, 14 Dec 2017 13:59:33 +0530 Message-ID: Subject: [AARCH64] Neon vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics To: gcc-patches@gcc.gnu.org Cc: james.greenhalgh@arm.com, Richard Earnshaw Hi! Please find attached the patch implementing vld1_*_x3, vst1_*_x2 and vst1_*_x3 intrinsics as defined by Neon document. Ok for trunk? - Thanks and regards, Sameera D. gcc/Changelog: 2017-11-14 Sameera Deshpande * config/aarch64/aarch64-simd-builtins.def (ld1x3): New. (st1x2): Likewise. (st1x3): Likewise. * config/aarch64/aarch64-simd.md (aarch64_ld1x3): New pattern. (aarch64_ld1_x3_): Likewise (aarch64_st1x2): Likewise (aarch64_st1_x2_): Likewise (aarch64_st1x3): Likewise (aarch64_st1_x3_): Likewise * config/aarch64/arm_neon.h (vld1_u8_x3): New function. (vld1_s8_x3): Likewise. (vld1_u16_x3): Likewise. (vld1_s16_x3): Likewise. (vld1_u32_x3): Likewise. (vld1_s32_x3): Likewise. (vld1_u64_x3): Likewise. (vld1_s64_x3): Likewise. (vld1_fp16_x3): Likewise. (vld1_f32_x3): Likewise. (vld1_f64_x3): Likewise. (vld1_p8_x3): Likewise. (vld1_p16_x3): Likewise. (vld1_p64_x3): Likewise. (vld1q_u8_x3): Likewise. (vld1q_s8_x3): Likewise. (vld1q_u16_x3): Likewise. (vld1q_s16_x3): Likewise. (vld1q_u32_x3): Likewise. (vld1q_s32_x3): Likewise. (vld1q_u64_x3): Likewise. (vld1q_s64_x3): Likewise. (vld1q_f16_x3): Likewise. (vld1q_f32_x3): Likewise. (vld1q_f64_x3): Likewise. (vld1q_p8_x3): Likewise. (vld1q_p16_x3): Likewise. (vld1q_p64_x3): Likewise. (vst1_s64_x2): Likewise. (vst1_u64_x2): Likewise. (vst1_f64_x2): Likewise. (vst1_s8_x2): Likewise. (vst1_p8_x2): Likewise. (vst1_s16_x2): Likewise. (vst1_p16_x2): Likewise. (vst1_s32_x2): Likewise. (vst1_u8_x2): Likewise. (vst1_u16_x2): Likewise. (vst1_u32_x2): Likewise. (vst1_f16_x2): Likewise. (vst1_f32_x2): Likewise. (vst1_p64_x2): Likewise. (vst1q_s8_x2): Likewise. (vst1q_p8_x2): Likewise. (vst1q_s16_x2): Likewise. (vst1q_p16_x2): Likewise. (vst1q_s32_x2): Likewise. (vst1q_s64_x2): Likewise. (vst1q_u8_x2): Likewise. (vst1q_u16_x2): Likewise. (vst1q_u32_x2): Likewise. (vst1q_u64_x2): Likewise. (vst1q_f16_x2): Likewise. (vst1q_f32_x2): Likewise. (vst1q_f64_x2): Likewise. (vst1q_p64_x2): Likewise. (vst1_s64_x3): Likewise. (vst1_u64_x3): Likewise. (vst1_f64_x3): Likewise. (vst1_s8_x3): Likewise. (vst1_p8_x3): Likewise. (vst1_s16_x3): Likewise. (vst1_p16_x3): Likewise. (vst1_s32_x3): Likewise. (vst1_u8_x3): Likewise. (vst1_u16_x3): Likewise. (vst1_u32_x3): Likewise. (vst1_f16_x3): Likewise. (vst1_f32_x3): Likewise. (vst1_p64_x3): Likewise. (vst1q_s8_x3): Likewise. (vst1q_p8_x3): Likewise. (vst1q_s16_x3): Likewise. (vst1q_p16_x3): Likewise. (vst1q_s32_x3): Likewise. (vst1q_s64_x3): Likewise. (vst1q_u8_x3): Likewise. (vst1q_u16_x3): Likewise. (vst1q_u32_x3): Likewise. (vst1q_u64_x3): Likewise. (vst1q_f16_x3): Likewise. (vst1q_f32_x3): Likewise. (vst1q_f64_x3): Likewise. (vst1q_p64_x3): Likewise. diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def index 52d01342372..fa623e90017 100644 --- a/gcc/config/aarch64/aarch64-simd-builtins.def +++ b/gcc/config/aarch64/aarch64-simd-builtins.def @@ -441,6 +441,15 @@ BUILTIN_VALL_F16 (STORE1, st1, 0) VAR1(STORE1P, st1, 0, v2di) + /* Implemented by aarch64_ld1x3. */ + BUILTIN_VALLDIF (LOADSTRUCT, ld1x3, 0) + + /* Implemented by aarch64_st1x2. */ + BUILTIN_VALLDIF (STORESTRUCT, st1x2, 0) + + /* Implemented by aarch64_st1x3. */ + BUILTIN_VALLDIF (STORESTRUCT, st1x3, 0) + /* Implemented by fma4. */ BUILTIN_VHSDF (TERNOP, fma, 4) VAR1 (TERNOP, fma, 4, hf) diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 4fd34c18f95..852bcf0c16a 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -5038,6 +5038,70 @@ } }) + +(define_expand "aarch64_ld1x3" + [(match_operand:CI 0 "register_operand" "=w") + (match_operand:DI 1 "register_operand" "r") + (unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + "TARGET_SIMD" +{ + rtx mem = gen_rtx_MEM (CImode, operands[1]); + emit_insn (gen_aarch64_ld1_x3_ (operands[0], mem)); + DONE; +}) + +(define_insn "aarch64_ld1_x3_" + [(set (match_operand:CI 0 "register_operand" "=w") + (unspec:CI + [(match_operand:CI 1 "aarch64_simd_struct_operand" "Utv") + (unspec:VALLDIF [(const_int 3)] UNSPEC_VSTRUCTDUMMY)] UNSPEC_LD1))] + "TARGET_SIMD" + "ld1\\t{%S0. - %U0.}, %1" + [(set_attr "type" "neon_load1_3reg")] +) + +(define_expand "aarch64_st1x2" + [(match_operand:DI 0 "register_operand" "") + (match_operand:OI 1 "register_operand" "") + (unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + "TARGET_SIMD" +{ + rtx mem = gen_rtx_MEM (OImode, operands[0]); + emit_insn (gen_aarch64_st1_x2_ (mem, operands[1])); + DONE; +}) + +(define_insn "aarch64_st1_x2_" + [(set (unspec:OI + [(match_operand:OI 0 "aarch64_simd_struct_operand" "=Utv") + (unspec:VALLDIF [(const_int 2)] UNSPEC_VSTRUCTDUMMY)] UNSPEC_ST1) + (match_operand:OI 1 "register_operand" "w"))] + "TARGET_SIMD" + "st1\\t{%S1. - %T1.}, %0" + [(set_attr "type" "neon_store1_2reg")] +) + +(define_expand "aarch64_st1x3" + [(match_operand:DI 0 "register_operand" "") + (match_operand:CI 1 "register_operand" "") + (unspec:VALLDIF [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + "TARGET_SIMD" +{ + rtx mem = gen_rtx_MEM (CImode, operands[0]); + emit_insn (gen_aarch64_st1_x3_ (mem, operands[1])); + DONE; +}) + +(define_insn "aarch64_st1_x3_" + [(set (unspec:CI + [(match_operand:CI 0 "aarch64_simd_struct_operand" "=Utv") + (unspec:VALLDIF [(const_int 3)] UNSPEC_VSTRUCTDUMMY)] UNSPEC_ST1) + (match_operand:CI 1 "register_operand" "w"))] + "TARGET_SIMD" + "st1\\t{%S1. - %U1.}, %0" + [(set_attr "type" "neon_store1_3reg")] +) + (define_insn "*aarch64_mov" [(set (match_operand:VSTRUCT 0 "aarch64_simd_nonimmediate_operand" "=w,Utv,w") (match_operand:VSTRUCT 1 "aarch64_simd_general_operand" " w,w,Utv"))] diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index 96e740f91a7..81fed6c852f 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -17145,6 +17145,374 @@ vld1_u64 (const uint64_t *a) return (uint64x1_t) {*a}; } +/* vld1x3 */ + +__extension__ extern __inline uint8x8x3_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vld1_u8_x3 (const uint8_t *__a) +{ + uint8x8x3_t __i; + __builtin_aarch64_simd_ci __o; + __o = (__builtin_aarch64_simd_ci)__builtin_aarch64_ld1x3v8qi ((const __builtin_aarch64_simd_qi *) __a); + __i.val[0] = (uint8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 0); + __i.val[1] = (uint8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 1); + __i.val[2] = (uint8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 2); + return __i; +} + +__extension__ extern __inline int8x8x3_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vld1_s8_x3 (const uint8_t *__a) +{ + int8x8x3_t __i; + __builtin_aarch64_simd_ci __o; + __o = __builtin_aarch64_ld1x3v8qi ((const __builtin_aarch64_simd_qi *) __a); + __i.val[0] = (int8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 0); + __i.val[1] = (int8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 1); + __i.val[2] = (int8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 2); + return __i; +} + +__extension__ extern __inline uint16x4x3_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vld1_u16_x3 (const uint16_t *__a) +{ + uint16x4x3_t __i; + __builtin_aarch64_simd_ci __o; + __o = __builtin_aarch64_ld1x3v4hi ((const __builtin_aarch64_simd_hi *) __a); + __i.val[0] = (uint16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 0); + __i.val[1] = (uint16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 1); + __i.val[2] = (uint16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 2); + return __i; +} + +__extension__ extern __inline int16x4x3_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vld1_s16_x3 (const int16_t *__a) +{ + int16x4x3_t __i; + __builtin_aarch64_simd_ci __o; + __o = __builtin_aarch64_ld1x3v4hi ((const __builtin_aarch64_simd_hi *) __a); + __i.val[0] = (int16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 0); + __i.val[1] = (int16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 1); + __i.val[2] = (int16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 2); + return __i; +} + +__extension__ extern __inline uint32x2x3_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vld1_u32_x3 (const uint32_t *__a) +{ + uint32x2x3_t __i; + __builtin_aarch64_simd_ci __o; + __o = __builtin_aarch64_ld1x3v2si ((const __builtin_aarch64_simd_si *) __a); + __i.val[0] = (uint32x2_t) __builtin_aarch64_get_dregciv2si (__o, 0); + __i.val[1] = (uint32x2_t) __builtin_aarch64_get_dregciv2si (__o, 1); + __i.val[2] = (uint32x2_t) __builtin_aarch64_get_dregciv2si (__o, 2); + return __i; +} + +__extension__ extern __inline int32x2x3_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vld1_s32_x3 (const uint32_t *__a) +{ + int32x2x3_t __i; + __builtin_aarch64_simd_ci __o; + __o = __builtin_aarch64_ld1x3v2si ((const __builtin_aarch64_simd_si *) __a); + __i.val[0] = (int32x2_t) __builtin_aarch64_get_dregciv2si (__o, 0); + __i.val[1] = (int32x2_t) __builtin_aarch64_get_dregciv2si (__o, 1); + __i.val[2] = (int32x2_t) __builtin_aarch64_get_dregciv2si (__o, 2); + return __i; +} + +__extension__ extern __inline uint64x1x3_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vld1_u64_x3 (const uint64_t *__a) +{ + uint64x1x3_t __i; + __builtin_aarch64_simd_ci __o; + __o = __builtin_aarch64_ld1x3di ((const __builtin_aarch64_simd_di *) __a); + __i.val[0] = ((uint64x1_t *)__o)[0]; + __i.val[1] = ((uint64x1_t *)__o)[1]; + __i.val[2] = ((uint64x1_t *)__o)[2]; + return __i; +} + +__extension__ extern __inline int64x1x3_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vld1_s64_x3 (const int64_t *__a) +{ + int64x1x3_t __i; + __builtin_aarch64_simd_ci __o; + __o = __builtin_aarch64_ld1x3di ((const __builtin_aarch64_simd_di *) __a); + __i.val[0] = ((int64x1_t *)__o)[0]; + __i.val[1] = ((int64x1_t *)__o)[1]; + __i.val[2] = ((int64x1_t *)__o)[2]; + + return __i; +} + +__extension__ extern __inline float16x4x3_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vld1_fp16_x3 (const float16_t *__a) +{ + float16x4x3_t __i; + __builtin_aarch64_simd_ci __o; + __o = __builtin_aarch64_ld1x3v4hf ((const __builtin_aarch64_simd_hf *) __a); + __i.val[0] = (float16x4_t) __builtin_aarch64_get_dregciv4hf (__o, 0); + __i.val[1] = (float16x4_t) __builtin_aarch64_get_dregciv4hf (__o, 1); + __i.val[2] = (float16x4_t) __builtin_aarch64_get_dregciv4hf (__o, 2); + return __i; +} + +__extension__ extern __inline float32x2x3_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vld1_f32_x3 (const float32_t *__a) +{ + float32x2x3_t __i; + __builtin_aarch64_simd_ci __o; + __o = __builtin_aarch64_ld1x3v2sf ((const __builtin_aarch64_simd_sf *) __a); + __i.val[0] = (float32x2_t) __builtin_aarch64_get_dregciv2sf (__o, 0); + __i.val[1] = (float32x2_t) __builtin_aarch64_get_dregciv2sf (__o, 1); + __i.val[2] = (float32x2_t) __builtin_aarch64_get_dregciv2sf (__o, 2); + return __i; +} + +__extension__ extern __inline float64x1x3_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vld1_f64_x3 (const float64_t *__a) +{ + float64x1x3_t __i; + __builtin_aarch64_simd_ci __o; + __o = __builtin_aarch64_ld1x3df ((const __builtin_aarch64_simd_df *) __a); + __i.val[0] = ((float64x1_t *)__o)[0]; + __i.val[1] = ((float64x1_t *)__o)[1]; + __i.val[2] = ((float64x1_t *)__o)[2]; + return __i; +} + +__extension__ extern __inline poly8x8x3_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vld1_p8_x3 (const poly8_t *__a) +{ + poly8x8x3_t __i; + __builtin_aarch64_simd_ci __o; + __o = __builtin_aarch64_ld1x3v8qi ((const __builtin_aarch64_simd_qi *) __a); + __i.val[0] = (poly8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 0); + __i.val[1] = (poly8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 1); + __i.val[2] = (poly8x8_t) __builtin_aarch64_get_dregciv8qi (__o, 2); + return __i; +} + +__extension__ extern __inline poly16x4x3_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vld1_p16_x3 (const poly16_t *__a) +{ + poly16x4x3_t __i; + __builtin_aarch64_simd_ci __o; + __o = __builtin_aarch64_ld1x3v4hi ((const __builtin_aarch64_simd_hi *) __a); + __i.val[0] = (poly16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 0); + __i.val[1] = (poly16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 1); + __i.val[2] = (poly16x4_t) __builtin_aarch64_get_dregciv4hi (__o, 2); + return __i; +} + +__extension__ extern __inline poly64x1x3_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vld1_p64_x3 (const poly64_t *__a) +{ + poly64x1x3_t __i; + __builtin_aarch64_simd_ci __o; + __o = __builtin_aarch64_ld1x3di ((const __builtin_aarch64_simd_di *) __a); + __i.val[0] = ((poly64x1_t *)__o)[0]; + __i.val[1] = ((poly64x1_t *)__o)[1]; + __i.val[2] = ((poly64x1_t *)__o)[2]; + +return __i; +} + +__extension__ extern __inline uint8x16x3_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vld1q_u8_x3 (const uint8_t *__a) +{ + uint8x16x3_t __i; + __builtin_aarch64_simd_ci __o; + __o = __builtin_aarch64_ld1x3v16qi ((const __builtin_aarch64_simd_qi *) __a); + __i.val[0] = (uint8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 0); + __i.val[1] = (uint8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 1); + __i.val[2] = (uint8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 2); + return __i; +} + +__extension__ extern __inline int8x16x3_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vld1q_s8_x3 (const int8_t *__a) +{ + int8x16x3_t __i; + __builtin_aarch64_simd_ci __o; + __o = __builtin_aarch64_ld1x3v16qi ((const __builtin_aarch64_simd_qi *) __a); + __i.val[0] = (int8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 0); + __i.val[1] = (int8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 1); + __i.val[2] = (int8x16_t) __builtin_aarch64_get_qregciv16qi (__o, 2); + return __i; +} + +__extension__ extern __inline uint16x8x3_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vld1q_u16_x3 (const uint16_t *__a) +{ + uint16x8x3_t __i; + __builtin_aarch64_simd_ci __o; + __o = __builtin_aarch64_ld1x3v8hi ((const __builtin_aarch64_simd_hi *) __a); + __i.val[0] = (uint16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 0); + __i.val[1] = (uint16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 1); + __i.val[2] = (uint16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 2); + return __i; +} + +__extension__ extern __inline int16x8x3_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vld1q_s16_x3 (const int16_t *__a) +{ + int16x8x3_t __i; + __builtin_aarch64_simd_ci __o; + __o = __builtin_aarch64_ld1x3v8hi ((const __builtin_aarch64_simd_hi *) __a); + __i.val[0] = (int16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 0); + __i.val[1] = (int16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 1); + __i.val[2] = (int16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 2); + return __i; +} + +__extension__ extern __inline uint32x4x3_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vld1q_u32_x3 (const uint32_t *__a) +{ + uint32x4x3_t __i; + __builtin_aarch64_simd_ci __o; + __o = __builtin_aarch64_ld1x3v4si ((const __builtin_aarch64_simd_si *) __a); + __i.val[0] = (uint32x4_t) __builtin_aarch64_get_qregciv4si (__o, 0); + __i.val[1] = (uint32x4_t) __builtin_aarch64_get_qregciv4si (__o, 1); + __i.val[2] = (uint32x4_t) __builtin_aarch64_get_qregciv4si (__o, 2); + return __i; +} + +__extension__ extern __inline int32x4x3_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vld1q_s32_x3 (const int32_t *__a) +{ + int32x4x3_t __i; + __builtin_aarch64_simd_ci __o; + __o = __builtin_aarch64_ld1x3v4si ((const __builtin_aarch64_simd_si *) __a); + __i.val[0] = (int32x4_t) __builtin_aarch64_get_qregciv4si (__o, 0); + __i.val[1] = (int32x4_t) __builtin_aarch64_get_qregciv4si (__o, 1); + __i.val[2] = (int32x4_t) __builtin_aarch64_get_qregciv4si (__o, 2); + return __i; +} + +__extension__ extern __inline uint64x2x3_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vld1q_u64_x3 (const uint64_t *__a) +{ + uint64x2x3_t __i; + __builtin_aarch64_simd_ci __o; + __o = __builtin_aarch64_ld1x3v2di ((const __builtin_aarch64_simd_di *) __a); + __i.val[0] = (uint64x2_t) __builtin_aarch64_get_qregciv2di (__o, 0); + __i.val[1] = (uint64x2_t) __builtin_aarch64_get_qregciv2di (__o, 1); + __i.val[2] = (uint64x2_t) __builtin_aarch64_get_qregciv2di (__o, 2); + return __i; +} + +__extension__ extern __inline int64x2x3_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vld1q_s64_x3 (const int64_t *__a) +{ + int64x2x3_t __i; + __builtin_aarch64_simd_ci __o; + __o = __builtin_aarch64_ld1x3v2di ((const __builtin_aarch64_simd_di *) __a); + __i.val[0] = (int64x2_t) __builtin_aarch64_get_qregciv2di (__o, 0); + __i.val[1] = (int64x2_t) __builtin_aarch64_get_qregciv2di (__o, 1); + __i.val[2] = (int64x2_t) __builtin_aarch64_get_qregciv2di (__o, 2); + return __i; +} + +__extension__ extern __inline float16x8x3_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vld1q_f16_x3 (const float16_t *__a) +{ + float16x8x3_t __i; + __builtin_aarch64_simd_ci __o; + __o = __builtin_aarch64_ld1x3v8hf ((const __builtin_aarch64_simd_hf *) __a); + __i.val[0] = (float16x8_t) __builtin_aarch64_get_qregciv8hf (__o, 0); + __i.val[1] = (float16x8_t) __builtin_aarch64_get_qregciv8hf (__o, 1); + __i.val[2] = (float16x8_t) __builtin_aarch64_get_qregciv8hf (__o, 2); + return __i; +} + +__extension__ extern __inline float32x4x3_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vld1q_f32_x3 (const float32_t *__a) +{ + float32x4x3_t __i; + __builtin_aarch64_simd_ci __o; + __o = __builtin_aarch64_ld1x3v4sf ((const __builtin_aarch64_simd_sf *) __a); + __i.val[0] = (float32x4_t) __builtin_aarch64_get_qregciv4sf (__o, 0); + __i.val[1] = (float32x4_t) __builtin_aarch64_get_qregciv4sf (__o, 1); + __i.val[2] = (float32x4_t) __builtin_aarch64_get_qregciv4sf (__o, 2); + return __i; +} + +__extension__ extern __inline float64x2x3_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vld1q_f64_x3 (const float64_t *__a) +{ + float64x2x3_t __i; + __builtin_aarch64_simd_ci __o; + __o = __builtin_aarch64_ld1x3v2df ((const __builtin_aarch64_simd_df *) __a); + __i.val[0] = (float64x2_t) __builtin_aarch64_get_qregciv2df (__o, 0); + __i.val[1] = (float64x2_t) __builtin_aarch64_get_qregciv2df (__o, 1); + __i.val[2] = (float64x2_t) __builtin_aarch64_get_qregciv2df (__o, 2); + return __i; +} + +__extension__ extern __inline poly8x16x3_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vld1q_p8_x3 (const poly8_t *__a) +{ + poly8x16x3_t __i; + __builtin_aarch64_simd_ci __o; + __o = __builtin_aarch64_ld1x3v8hi ((const __builtin_aarch64_simd_hi *) __a); + __i.val[0] = (poly8x16_t) __builtin_aarch64_get_qregciv8hi (__o, 0); + __i.val[1] = (poly8x16_t) __builtin_aarch64_get_qregciv8hi (__o, 1); + __i.val[2] = (poly8x16_t) __builtin_aarch64_get_qregciv8hi (__o, 2); + return __i; +} + +__extension__ extern __inline poly16x8x3_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vld1q_p16_x3 (const poly16_t *__a) +{ + poly16x8x3_t __i; + __builtin_aarch64_simd_ci __o; + __o = __builtin_aarch64_ld1x3v8hi ((const __builtin_aarch64_simd_hi *) __a); + __i.val[0] = (poly16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 0); + __i.val[1] = (poly16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 1); + __i.val[2] = (poly16x8_t) __builtin_aarch64_get_qregciv8hi (__o, 2); + return __i; +} + +__extension__ extern __inline poly64x2x3_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vld1q_p64_x3 (const poly64_t *__a) +{ + poly64x2x3_t __i; + __builtin_aarch64_simd_ci __o; + __o = __builtin_aarch64_ld1x3v2di ((const __builtin_aarch64_simd_di *) __a); + __i.val[0] = (poly64x2_t) __builtin_aarch64_get_qregciv2di (__o, 0); + __i.val[1] = (poly64x2_t) __builtin_aarch64_get_qregciv2di (__o, 1); + __i.val[2] = (poly64x2_t) __builtin_aarch64_get_qregciv2di (__o, 2); + return __i; +} + /* vld1q */ __extension__ extern __inline float16x8_t @@ -27161,6 +27529,706 @@ vst1q_lane_u64 (uint64_t *__a, uint64x2_t __b, const int __lane) *__a = __aarch64_vget_lane_any (__b, __lane); } +/* vst1x2 */ + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1_s64_x2 (int64_t * __a, int64x1x2_t val) +{ + __builtin_aarch64_simd_oi __o; + int64x2x2_t temp; + temp.val[0] = vcombine_s64 (val.val[0], vcreate_s64 (__AARCH64_INT64_C (0))); + temp.val[1] = vcombine_s64 (val.val[1], vcreate_s64 (__AARCH64_INT64_C (0))); + __o = __builtin_aarch64_set_qregoiv2di (__o, (int64x2_t) temp.val[0], 0); + __o = __builtin_aarch64_set_qregoiv2di (__o, (int64x2_t) temp.val[1], 1); + __builtin_aarch64_st1x2di ((__builtin_aarch64_simd_di *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1_u64_x2 (uint64_t * __a, uint64x1x2_t val) +{ + __builtin_aarch64_simd_oi __o; + uint64x2x2_t temp; + temp.val[0] = vcombine_u64 (val.val[0], vcreate_u64 (__AARCH64_UINT64_C (0))); + temp.val[1] = vcombine_u64 (val.val[1], vcreate_u64 (__AARCH64_UINT64_C (0))); + __o = __builtin_aarch64_set_qregoiv2di (__o, (int64x2_t) temp.val[0], 0); + __o = __builtin_aarch64_set_qregoiv2di (__o, (int64x2_t) temp.val[1], 1); + __builtin_aarch64_st1x2di ((__builtin_aarch64_simd_di *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1_f64_x2 (float64_t * __a, float64x1x2_t val) +{ + __builtin_aarch64_simd_oi __o; + float64x2x2_t temp; + temp.val[0] = vcombine_f64 (val.val[0], vcreate_f64 (__AARCH64_UINT64_C (0))); + temp.val[1] = vcombine_f64 (val.val[1], vcreate_f64 (__AARCH64_UINT64_C (0))); + __o = __builtin_aarch64_set_qregoiv2df (__o, (float64x2_t) temp.val[0], 0); + __o = __builtin_aarch64_set_qregoiv2df (__o, (float64x2_t) temp.val[1], 1); + __builtin_aarch64_st1x2df ((__builtin_aarch64_simd_df *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1_s8_x2 (int8_t * __a, int8x8x2_t val) +{ + __builtin_aarch64_simd_oi __o; + int8x16x2_t temp; + temp.val[0] = vcombine_s8 (val.val[0], vcreate_s8 (__AARCH64_INT64_C (0))); + temp.val[1] = vcombine_s8 (val.val[1], vcreate_s8 (__AARCH64_INT64_C (0))); + __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) temp.val[0], 0); + __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) temp.val[1], 1); + __builtin_aarch64_st1x2v8qi ((__builtin_aarch64_simd_qi *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1_p8_x2 (poly8_t * __a, poly8x8x2_t val) +{ + __builtin_aarch64_simd_oi __o; + poly8x16x2_t temp; + temp.val[0] = vcombine_p8 (val.val[0], vcreate_p8 (__AARCH64_UINT64_C (0))); + temp.val[1] = vcombine_p8 (val.val[1], vcreate_p8 (__AARCH64_UINT64_C (0))); + __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) temp.val[0], 0); + __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) temp.val[1], 1); + __builtin_aarch64_st1x2v8qi ((__builtin_aarch64_simd_qi *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1_s16_x2 (int16_t * __a, int16x4x2_t val) +{ + __builtin_aarch64_simd_oi __o; + int16x8x2_t temp; + temp.val[0] = vcombine_s16 (val.val[0], vcreate_s16 (__AARCH64_INT64_C (0))); + temp.val[1] = vcombine_s16 (val.val[1], vcreate_s16 (__AARCH64_INT64_C (0))); + __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) temp.val[0], 0); + __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) temp.val[1], 1); + __builtin_aarch64_st1x2v4hi ((__builtin_aarch64_simd_hi *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1_p16_x2 (poly16_t * __a, poly16x4x2_t val) +{ + __builtin_aarch64_simd_oi __o; + poly16x8x2_t temp; + temp.val[0] = vcombine_p16 (val.val[0], vcreate_p16 (__AARCH64_UINT64_C (0))); + temp.val[1] = vcombine_p16 (val.val[1], vcreate_p16 (__AARCH64_UINT64_C (0))); + __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) temp.val[0], 0); + __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) temp.val[1], 1); + __builtin_aarch64_st1x2v4hi ((__builtin_aarch64_simd_hi *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1_s32_x2 (int32_t * __a, int32x2x2_t val) +{ + __builtin_aarch64_simd_oi __o; + int32x4x2_t temp; + temp.val[0] = vcombine_s32 (val.val[0], vcreate_s32 (__AARCH64_INT64_C (0))); + temp.val[1] = vcombine_s32 (val.val[1], vcreate_s32 (__AARCH64_INT64_C (0))); + __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) temp.val[0], 0); + __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) temp.val[1], 1); + __builtin_aarch64_st1x2v2si ((__builtin_aarch64_simd_si *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1_u8_x2 (uint8_t * __a, uint8x8x2_t val) +{ + __builtin_aarch64_simd_oi __o; + uint8x16x2_t temp; + temp.val[0] = vcombine_u8 (val.val[0], vcreate_u8 (__AARCH64_UINT64_C (0))); + temp.val[1] = vcombine_u8 (val.val[1], vcreate_u8 (__AARCH64_UINT64_C (0))); + __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) temp.val[0], 0); + __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) temp.val[1], 1); + __builtin_aarch64_st1x2v8qi ((__builtin_aarch64_simd_qi *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1_u16_x2 (uint16_t * __a, uint16x4x2_t val) +{ + __builtin_aarch64_simd_oi __o; + uint16x8x2_t temp; + temp.val[0] = vcombine_u16 (val.val[0], vcreate_u16 (__AARCH64_UINT64_C (0))); + temp.val[1] = vcombine_u16 (val.val[1], vcreate_u16 (__AARCH64_UINT64_C (0))); + __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) temp.val[0], 0); + __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) temp.val[1], 1); + __builtin_aarch64_st1x2v4hi ((__builtin_aarch64_simd_hi *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1_u32_x2 (uint32_t * __a, uint32x2x2_t val) +{ + __builtin_aarch64_simd_oi __o; + uint32x4x2_t temp; + temp.val[0] = vcombine_u32 (val.val[0], vcreate_u32 (__AARCH64_UINT64_C (0))); + temp.val[1] = vcombine_u32 (val.val[1], vcreate_u32 (__AARCH64_UINT64_C (0))); + __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) temp.val[0], 0); + __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) temp.val[1], 1); + __builtin_aarch64_st1x2v2si ((__builtin_aarch64_simd_si *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1_f16_x2 (float16_t * __a, float16x4x2_t val) +{ + __builtin_aarch64_simd_oi __o; + float16x8x2_t temp; + temp.val[0] = vcombine_f16 (val.val[0], vcreate_f16 (__AARCH64_UINT64_C (0))); + temp.val[1] = vcombine_f16 (val.val[1], vcreate_f16 (__AARCH64_UINT64_C (0))); + __o = __builtin_aarch64_set_qregoiv8hf (__o, temp.val[0], 0); + __o = __builtin_aarch64_set_qregoiv8hf (__o, temp.val[1], 1); + __builtin_aarch64_st1x2v4hf (__a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1_f32_x2 (float32_t * __a, float32x2x2_t val) +{ + __builtin_aarch64_simd_oi __o; + float32x4x2_t temp; + temp.val[0] = vcombine_f32 (val.val[0], vcreate_f32 (__AARCH64_UINT64_C (0))); + temp.val[1] = vcombine_f32 (val.val[1], vcreate_f32 (__AARCH64_UINT64_C (0))); + __o = __builtin_aarch64_set_qregoiv4sf (__o, (float32x4_t) temp.val[0], 0); + __o = __builtin_aarch64_set_qregoiv4sf (__o, (float32x4_t) temp.val[1], 1); + __builtin_aarch64_st1x2v2sf ((__builtin_aarch64_simd_sf *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1_p64_x2 (poly64_t * __a, poly64x1x2_t val) +{ + __builtin_aarch64_simd_oi __o; + poly64x2x2_t temp; + temp.val[0] = vcombine_p64 (val.val[0], vcreate_p64 (__AARCH64_UINT64_C (0))); + temp.val[1] = vcombine_p64 (val.val[1], vcreate_p64 (__AARCH64_UINT64_C (0))); + __o = __builtin_aarch64_set_qregoiv2di_ssps (__o, + (poly64x2_t) temp.val[0], 0); + __o = __builtin_aarch64_set_qregoiv2di_ssps (__o, + (poly64x2_t) temp.val[1], 1); + __builtin_aarch64_st1x2di ((__builtin_aarch64_simd_di *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1q_s8_x2 (int8_t * __a, int8x16x2_t val) +{ + __builtin_aarch64_simd_oi __o; + __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) val.val[0], 0); + __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) val.val[1], 1); + __builtin_aarch64_st1x2v16qi ((__builtin_aarch64_simd_qi *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1q_p8_x2 (poly8_t * __a, poly8x16x2_t val) +{ + __builtin_aarch64_simd_oi __o; + __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) val.val[0], 0); + __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) val.val[1], 1); + __builtin_aarch64_st1x2v16qi ((__builtin_aarch64_simd_qi *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1q_s16_x2 (int16_t * __a, int16x8x2_t val) +{ + __builtin_aarch64_simd_oi __o; + __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) val.val[0], 0); + __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) val.val[1], 1); + __builtin_aarch64_st1x2v8hi ((__builtin_aarch64_simd_hi *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1q_p16_x2 (poly16_t * __a, poly16x8x2_t val) +{ + __builtin_aarch64_simd_oi __o; + __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) val.val[0], 0); + __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) val.val[1], 1); + __builtin_aarch64_st1x2v8hi ((__builtin_aarch64_simd_hi *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1q_s32_x2 (int32_t * __a, int32x4x2_t val) +{ + __builtin_aarch64_simd_oi __o; + __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) val.val[0], 0); + __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) val.val[1], 1); + __builtin_aarch64_st1x2v4si ((__builtin_aarch64_simd_si *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1q_s64_x2 (int64_t * __a, int64x2x2_t val) +{ + __builtin_aarch64_simd_oi __o; + __o = __builtin_aarch64_set_qregoiv2di (__o, (int64x2_t) val.val[0], 0); + __o = __builtin_aarch64_set_qregoiv2di (__o, (int64x2_t) val.val[1], 1); + __builtin_aarch64_st1x2v2di ((__builtin_aarch64_simd_di *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1q_u8_x2 (uint8_t * __a, uint8x16x2_t val) +{ + __builtin_aarch64_simd_oi __o; + __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) val.val[0], 0); + __o = __builtin_aarch64_set_qregoiv16qi (__o, (int8x16_t) val.val[1], 1); + __builtin_aarch64_st1x2v16qi ((__builtin_aarch64_simd_qi *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1q_u16_x2 (uint16_t * __a, uint16x8x2_t val) +{ + __builtin_aarch64_simd_oi __o; + __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) val.val[0], 0); + __o = __builtin_aarch64_set_qregoiv8hi (__o, (int16x8_t) val.val[1], 1); + __builtin_aarch64_st1x2v8hi ((__builtin_aarch64_simd_hi *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1q_u32_x2 (uint32_t * __a, uint32x4x2_t val) +{ + __builtin_aarch64_simd_oi __o; + __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) val.val[0], 0); + __o = __builtin_aarch64_set_qregoiv4si (__o, (int32x4_t) val.val[1], 1); + __builtin_aarch64_st1x2v4si ((__builtin_aarch64_simd_si *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1q_u64_x2 (uint64_t * __a, uint64x2x2_t val) +{ + __builtin_aarch64_simd_oi __o; + __o = __builtin_aarch64_set_qregoiv2di (__o, (int64x2_t) val.val[0], 0); + __o = __builtin_aarch64_set_qregoiv2di (__o, (int64x2_t) val.val[1], 1); + __builtin_aarch64_st1x2v2di ((__builtin_aarch64_simd_di *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1q_f16_x2 (float16_t * __a, float16x8x2_t val) +{ + __builtin_aarch64_simd_oi __o; + __o = __builtin_aarch64_set_qregoiv8hf (__o, val.val[0], 0); + __o = __builtin_aarch64_set_qregoiv8hf (__o, val.val[1], 1); + __builtin_aarch64_st1x2v8hf (__a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1q_f32_x2 (float32_t * __a, float32x4x2_t val) +{ + __builtin_aarch64_simd_oi __o; + __o = __builtin_aarch64_set_qregoiv4sf (__o, (float32x4_t) val.val[0], 0); + __o = __builtin_aarch64_set_qregoiv4sf (__o, (float32x4_t) val.val[1], 1); + __builtin_aarch64_st1x2v4sf ((__builtin_aarch64_simd_sf *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1q_f64_x2 (float64_t * __a, float64x2x2_t val) +{ + __builtin_aarch64_simd_oi __o; + __o = __builtin_aarch64_set_qregoiv2df (__o, (float64x2_t) val.val[0], 0); + __o = __builtin_aarch64_set_qregoiv2df (__o, (float64x2_t) val.val[1], 1); + __builtin_aarch64_st1x2v2df ((__builtin_aarch64_simd_df *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1q_p64_x2 (poly64_t * __a, poly64x2x2_t val) +{ + __builtin_aarch64_simd_oi __o; + __o = __builtin_aarch64_set_qregoiv2di_ssps (__o, + (poly64x2_t) val.val[0], 0); + __o = __builtin_aarch64_set_qregoiv2di_ssps (__o, + (poly64x2_t) val.val[1], 1); + __builtin_aarch64_st1x2v2di ((__builtin_aarch64_simd_di *) __a, __o); +} + +/* vst1x3 */ + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1_s64_x3 (int64_t * __a, int64x1x3_t val) +{ + __builtin_aarch64_simd_ci __o; + int64x2x3_t temp; + temp.val[0] = vcombine_s64 (val.val[0], vcreate_s64 (__AARCH64_INT64_C (0))); + temp.val[1] = vcombine_s64 (val.val[1], vcreate_s64 (__AARCH64_INT64_C (0))); + temp.val[2] = vcombine_s64 (val.val[2], vcreate_s64 (__AARCH64_INT64_C (0))); + __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) temp.val[0], 0); + __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) temp.val[1], 1); + __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) temp.val[2], 2); + __builtin_aarch64_st1x3di ((__builtin_aarch64_simd_di *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1_u64_x3 (uint64_t * __a, uint64x1x3_t val) +{ + __builtin_aarch64_simd_ci __o; + uint64x2x3_t temp; + temp.val[0] = vcombine_u64 (val.val[0], vcreate_u64 (__AARCH64_UINT64_C (0))); + temp.val[1] = vcombine_u64 (val.val[1], vcreate_u64 (__AARCH64_UINT64_C (0))); + temp.val[2] = vcombine_u64 (val.val[2], vcreate_u64 (__AARCH64_UINT64_C (0))); + __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) temp.val[0], 0); + __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) temp.val[1], 1); + __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) temp.val[2], 2); + __builtin_aarch64_st1x3di ((__builtin_aarch64_simd_di *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1_f64_x3 (float64_t * __a, float64x1x3_t val) +{ + __builtin_aarch64_simd_ci __o; + float64x2x3_t temp; + temp.val[0] = vcombine_f64 (val.val[0], vcreate_f64 (__AARCH64_UINT64_C (0))); + temp.val[1] = vcombine_f64 (val.val[1], vcreate_f64 (__AARCH64_UINT64_C (0))); + temp.val[2] = vcombine_f64 (val.val[2], vcreate_f64 (__AARCH64_UINT64_C (0))); + __o = __builtin_aarch64_set_qregciv2df (__o, (float64x2_t) temp.val[0], 0); + __o = __builtin_aarch64_set_qregciv2df (__o, (float64x2_t) temp.val[1], 1); + __o = __builtin_aarch64_set_qregciv2df (__o, (float64x2_t) temp.val[2], 2); + __builtin_aarch64_st1x3df ((__builtin_aarch64_simd_df *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1_s8_x3 (int8_t * __a, int8x8x3_t val) +{ + __builtin_aarch64_simd_ci __o; + int8x16x3_t temp; + temp.val[0] = vcombine_s8 (val.val[0], vcreate_s8 (__AARCH64_INT64_C (0))); + temp.val[1] = vcombine_s8 (val.val[1], vcreate_s8 (__AARCH64_INT64_C (0))); + temp.val[2] = vcombine_s8 (val.val[2], vcreate_s8 (__AARCH64_INT64_C (0))); + __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) temp.val[0], 0); + __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) temp.val[1], 1); + __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) temp.val[2], 2); + __builtin_aarch64_st1x3v8qi ((__builtin_aarch64_simd_qi *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1_p8_x3 (poly8_t * __a, poly8x8x3_t val) +{ + __builtin_aarch64_simd_ci __o; + poly8x16x3_t temp; + temp.val[0] = vcombine_p8 (val.val[0], vcreate_p8 (__AARCH64_UINT64_C (0))); + temp.val[1] = vcombine_p8 (val.val[1], vcreate_p8 (__AARCH64_UINT64_C (0))); + temp.val[2] = vcombine_p8 (val.val[2], vcreate_p8 (__AARCH64_UINT64_C (0))); + __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) temp.val[0], 0); + __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) temp.val[1], 1); + __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) temp.val[2], 2); + __builtin_aarch64_st1x3v8qi ((__builtin_aarch64_simd_qi *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1_s16_x3 (int16_t * __a, int16x4x3_t val) +{ + __builtin_aarch64_simd_ci __o; + int16x8x3_t temp; + temp.val[0] = vcombine_s16 (val.val[0], vcreate_s16 (__AARCH64_INT64_C (0))); + temp.val[1] = vcombine_s16 (val.val[1], vcreate_s16 (__AARCH64_INT64_C (0))); + temp.val[2] = vcombine_s16 (val.val[2], vcreate_s16 (__AARCH64_INT64_C (0))); + __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) temp.val[0], 0); + __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) temp.val[1], 1); + __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) temp.val[2], 2); + __builtin_aarch64_st1x3v4hi ((__builtin_aarch64_simd_hi *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1_p16_x3 (poly16_t * __a, poly16x4x3_t val) +{ + __builtin_aarch64_simd_ci __o; + poly16x8x3_t temp; + temp.val[0] = vcombine_p16 (val.val[0], vcreate_p16 (__AARCH64_UINT64_C (0))); + temp.val[1] = vcombine_p16 (val.val[1], vcreate_p16 (__AARCH64_UINT64_C (0))); + temp.val[2] = vcombine_p16 (val.val[2], vcreate_p16 (__AARCH64_UINT64_C (0))); + __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) temp.val[0], 0); + __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) temp.val[1], 1); + __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) temp.val[2], 2); + __builtin_aarch64_st1x3v4hi ((__builtin_aarch64_simd_hi *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1_s32_x3 (int32_t * __a, int32x2x3_t val) +{ + __builtin_aarch64_simd_ci __o; + int32x4x3_t temp; + temp.val[0] = vcombine_s32 (val.val[0], vcreate_s32 (__AARCH64_INT64_C (0))); + temp.val[1] = vcombine_s32 (val.val[1], vcreate_s32 (__AARCH64_INT64_C (0))); + temp.val[2] = vcombine_s32 (val.val[2], vcreate_s32 (__AARCH64_INT64_C (0))); + __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) temp.val[0], 0); + __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) temp.val[1], 1); + __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) temp.val[2], 2); + __builtin_aarch64_st1x3v2si ((__builtin_aarch64_simd_si *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1_u8_x3 (uint8_t * __a, uint8x8x3_t val) +{ + __builtin_aarch64_simd_ci __o; + uint8x16x3_t temp; + temp.val[0] = vcombine_u8 (val.val[0], vcreate_u8 (__AARCH64_UINT64_C (0))); + temp.val[1] = vcombine_u8 (val.val[1], vcreate_u8 (__AARCH64_UINT64_C (0))); + temp.val[2] = vcombine_u8 (val.val[2], vcreate_u8 (__AARCH64_UINT64_C (0))); + __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) temp.val[0], 0); + __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) temp.val[1], 1); + __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) temp.val[2], 2); + __builtin_aarch64_st1x3v8qi ((__builtin_aarch64_simd_qi *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1_u16_x3 (uint16_t * __a, uint16x4x3_t val) +{ + __builtin_aarch64_simd_ci __o; + uint16x8x3_t temp; + temp.val[0] = vcombine_u16 (val.val[0], vcreate_u16 (__AARCH64_UINT64_C (0))); + temp.val[1] = vcombine_u16 (val.val[1], vcreate_u16 (__AARCH64_UINT64_C (0))); + temp.val[2] = vcombine_u16 (val.val[2], vcreate_u16 (__AARCH64_UINT64_C (0))); + __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) temp.val[0], 0); + __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) temp.val[1], 1); + __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) temp.val[2], 2); + __builtin_aarch64_st1x3v4hi ((__builtin_aarch64_simd_hi *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1_u32_x3 (uint32_t * __a, uint32x2x3_t val) +{ + __builtin_aarch64_simd_ci __o; + uint32x4x3_t temp; + temp.val[0] = vcombine_u32 (val.val[0], vcreate_u32 (__AARCH64_UINT64_C (0))); + temp.val[1] = vcombine_u32 (val.val[1], vcreate_u32 (__AARCH64_UINT64_C (0))); + temp.val[2] = vcombine_u32 (val.val[2], vcreate_u32 (__AARCH64_UINT64_C (0))); + __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) temp.val[0], 0); + __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) temp.val[1], 1); + __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) temp.val[2], 2); + __builtin_aarch64_st1x3v2si ((__builtin_aarch64_simd_si *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1_f16_x3 (float16_t * __a, float16x4x3_t val) +{ + __builtin_aarch64_simd_ci __o; + float16x8x3_t temp; + temp.val[0] = vcombine_f16 (val.val[0], vcreate_f16 (__AARCH64_UINT64_C (0))); + temp.val[1] = vcombine_f16 (val.val[1], vcreate_f16 (__AARCH64_UINT64_C (0))); + temp.val[2] = vcombine_f16 (val.val[2], vcreate_f16 (__AARCH64_UINT64_C (0))); + __o = __builtin_aarch64_set_qregciv8hf (__o, (float16x8_t) temp.val[0], 0); + __o = __builtin_aarch64_set_qregciv8hf (__o, (float16x8_t) temp.val[1], 1); + __o = __builtin_aarch64_set_qregciv8hf (__o, (float16x8_t) temp.val[2], 2); + __builtin_aarch64_st1x3v4hf ((__builtin_aarch64_simd_hf *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1_f32_x3 (float32_t * __a, float32x2x3_t val) +{ + __builtin_aarch64_simd_ci __o; + float32x4x3_t temp; + temp.val[0] = vcombine_f32 (val.val[0], vcreate_f32 (__AARCH64_UINT64_C (0))); + temp.val[1] = vcombine_f32 (val.val[1], vcreate_f32 (__AARCH64_UINT64_C (0))); + temp.val[2] = vcombine_f32 (val.val[2], vcreate_f32 (__AARCH64_UINT64_C (0))); + __o = __builtin_aarch64_set_qregciv4sf (__o, (float32x4_t) temp.val[0], 0); + __o = __builtin_aarch64_set_qregciv4sf (__o, (float32x4_t) temp.val[1], 1); + __o = __builtin_aarch64_set_qregciv4sf (__o, (float32x4_t) temp.val[2], 2); + __builtin_aarch64_st1x3v2sf ((__builtin_aarch64_simd_sf *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1_p64_x3 (poly64_t * __a, poly64x1x3_t val) +{ + __builtin_aarch64_simd_ci __o; + poly64x2x3_t temp; + temp.val[0] = vcombine_p64 (val.val[0], vcreate_p64 (__AARCH64_UINT64_C (0))); + temp.val[1] = vcombine_p64 (val.val[1], vcreate_p64 (__AARCH64_UINT64_C (0))); + temp.val[2] = vcombine_p64 (val.val[2], vcreate_p64 (__AARCH64_UINT64_C (0))); + __o = __builtin_aarch64_set_qregciv2di_ssps (__o, + (poly64x2_t) temp.val[0], 0); + __o = __builtin_aarch64_set_qregciv2di_ssps (__o, + (poly64x2_t) temp.val[1], 1); + __o = __builtin_aarch64_set_qregciv2di_ssps (__o, + (poly64x2_t) temp.val[2], 2); + __builtin_aarch64_st1x3di ((__builtin_aarch64_simd_di *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1q_s8_x3 (int8_t * __a, int8x16x3_t val) +{ + __builtin_aarch64_simd_ci __o; + __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) val.val[0], 0); + __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) val.val[1], 1); + __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) val.val[2], 2); + __builtin_aarch64_st1x3v16qi ((__builtin_aarch64_simd_qi *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1q_p8_x3 (poly8_t * __a, poly8x16x3_t val) +{ + __builtin_aarch64_simd_ci __o; + __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) val.val[0], 0); + __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) val.val[1], 1); + __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) val.val[2], 2); + __builtin_aarch64_st1x3v16qi ((__builtin_aarch64_simd_qi *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1q_s16_x3 (int16_t * __a, int16x8x3_t val) +{ + __builtin_aarch64_simd_ci __o; + __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) val.val[0], 0); + __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) val.val[1], 1); + __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) val.val[2], 2); + __builtin_aarch64_st1x3v8hi ((__builtin_aarch64_simd_hi *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1q_p16_x3 (poly16_t * __a, poly16x8x3_t val) +{ + __builtin_aarch64_simd_ci __o; + __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) val.val[0], 0); + __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) val.val[1], 1); + __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) val.val[2], 2); + __builtin_aarch64_st1x3v8hi ((__builtin_aarch64_simd_hi *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1q_s32_x3 (int32_t * __a, int32x4x3_t val) +{ + __builtin_aarch64_simd_ci __o; + __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) val.val[0], 0); + __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) val.val[1], 1); + __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) val.val[2], 2); + __builtin_aarch64_st1x3v4si ((__builtin_aarch64_simd_si *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1q_s64_x3 (int64_t * __a, int64x2x3_t val) +{ + __builtin_aarch64_simd_ci __o; + __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) val.val[0], 0); + __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) val.val[1], 1); + __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) val.val[2], 2); + __builtin_aarch64_st1x3v2di ((__builtin_aarch64_simd_di *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1q_u8_x3 (uint8_t * __a, uint8x16x3_t val) +{ + __builtin_aarch64_simd_ci __o; + __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) val.val[0], 0); + __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) val.val[1], 1); + __o = __builtin_aarch64_set_qregciv16qi (__o, (int8x16_t) val.val[2], 2); + __builtin_aarch64_st1x3v16qi ((__builtin_aarch64_simd_qi *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1q_u16_x3 (uint16_t * __a, uint16x8x3_t val) +{ + __builtin_aarch64_simd_ci __o; + __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) val.val[0], 0); + __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) val.val[1], 1); + __o = __builtin_aarch64_set_qregciv8hi (__o, (int16x8_t) val.val[2], 2); + __builtin_aarch64_st1x3v8hi ((__builtin_aarch64_simd_hi *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1q_u32_x3 (uint32_t * __a, uint32x4x3_t val) +{ + __builtin_aarch64_simd_ci __o; + __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) val.val[0], 0); + __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) val.val[1], 1); + __o = __builtin_aarch64_set_qregciv4si (__o, (int32x4_t) val.val[2], 2); + __builtin_aarch64_st1x3v4si ((__builtin_aarch64_simd_si *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1q_u64_x3 (uint64_t * __a, uint64x2x3_t val) +{ + __builtin_aarch64_simd_ci __o; + __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) val.val[0], 0); + __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) val.val[1], 1); + __o = __builtin_aarch64_set_qregciv2di (__o, (int64x2_t) val.val[2], 2); + __builtin_aarch64_st1x3v2di ((__builtin_aarch64_simd_di *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1q_f16_x3 (float16_t * __a, float16x8x3_t val) +{ + __builtin_aarch64_simd_ci __o; + __o = __builtin_aarch64_set_qregciv8hf (__o, (float16x8_t) val.val[0], 0); + __o = __builtin_aarch64_set_qregciv8hf (__o, (float16x8_t) val.val[1], 1); + __o = __builtin_aarch64_set_qregciv8hf (__o, (float16x8_t) val.val[2], 2); + __builtin_aarch64_st1x3v8hf ((__builtin_aarch64_simd_hf *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1q_f32_x3 (float32_t * __a, float32x4x3_t val) +{ + __builtin_aarch64_simd_ci __o; + __o = __builtin_aarch64_set_qregciv4sf (__o, (float32x4_t) val.val[0], 0); + __o = __builtin_aarch64_set_qregciv4sf (__o, (float32x4_t) val.val[1], 1); + __o = __builtin_aarch64_set_qregciv4sf (__o, (float32x4_t) val.val[2], 2); + __builtin_aarch64_st1x3v4sf ((__builtin_aarch64_simd_sf *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1q_f64_x3 (float64_t * __a, float64x2x3_t val) +{ + __builtin_aarch64_simd_ci __o; + __o = __builtin_aarch64_set_qregciv2df (__o, (float64x2_t) val.val[0], 0); + __o = __builtin_aarch64_set_qregciv2df (__o, (float64x2_t) val.val[1], 1); + __o = __builtin_aarch64_set_qregciv2df (__o, (float64x2_t) val.val[2], 2); + __builtin_aarch64_st1x3v2df ((__builtin_aarch64_simd_df *) __a, __o); +} + +__extension__ extern __inline void +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vst1q_p64_x3 (poly64_t * __a, poly64x2x3_t val) +{ + __builtin_aarch64_simd_ci __o; + __o = __builtin_aarch64_set_qregciv2di_ssps (__o, + (poly64x2_t) val.val[0], 0); + __o = __builtin_aarch64_set_qregciv2di_ssps (__o, + (poly64x2_t) val.val[1], 1); + __o = __builtin_aarch64_set_qregciv2di_ssps (__o, + (poly64x2_t) val.val[2], 2); + __builtin_aarch64_st1x3v2di ((__builtin_aarch64_simd_di *) __a, __o); +} + /* vstn */ __extension__ extern __inline void