From patchwork Fri Sep 13 18:35:15 2013
X-Patchwork-Submitter: James Greenhalgh
X-Patchwork-Id: 274854
From: James Greenhalgh <james.greenhalgh@arm.com>
To: gcc-patches@gcc.gnu.org
Cc: marcus.shawcroft@arm.com
Subject: [AArch64] Implement vset_lane intrinsics in C
Date: Fri, 13 Sep 2013 19:35:15 +0100
Message-Id: <1379097315-27647-1-git-send-email-james.greenhalgh@arm.com>

Hi,

The vset_lane_<8,16,32,64> intrinsics are currently written using
assembler, but can be expressed just as easily in C.  As I expect we will
want to compose these intrinsics efficiently, I have added them as macros,
just as was done with the vget_lane intrinsics.

Regression tested on aarch64-none-elf, with a new testcase added to ensure
these intrinsics generate the expected instructions.

OK?

Thanks,
James

---
gcc/

2013-09-13  James Greenhalgh  <james.greenhalgh@arm.com>

	* config/aarch64/arm_neon.h (__aarch64_vset_lane_any): New.
	(__aarch64_vset_lane_<8,16,32,64>): Likewise.
	(vset_lane_<8,16,32,64>): Use new macros.

gcc/testsuite/

2013-09-13  James Greenhalgh  <james.greenhalgh@arm.com>

	* gcc.target/aarch64/vect_set_lane_1.c: New.
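(For reference, here is a minimal sketch of the C idiom the new
__aarch64_vset_lane_any macro builds on.  It is illustrative only, not part
of the patch; the function name is made up and it assumes an AArch64 target
with GCC's vector-subscripting extension.)

/* Illustrative only, not part of the patch.  */
#include <arm_neon.h>

int32x2_t
set_lane_1 (int32_t value, int32x2_t vector)
{
  /* Same effect as vset_lane_s32 (value, vector, 1): assign the lane
     directly and return the updated vector.  The testcase below expects
     this shape of code to compile to a single "ins v0.s[1], w0" at -O3.  */
  vector[1] = value;
  return vector;
}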
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index cb58602..6335ddf 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -508,6 +508,58 @@ typedef struct poly16x8x4_t
 #define __aarch64_vgetq_lane_u64(__a, __b) \
   __aarch64_vget_lane_any (v2di, (uint64_t), (int64x2_t), __a, __b)

+/* __aarch64_vset_lane internal macros.  */
+#define __aarch64_vset_lane_any(__source, __v, __index) \
+  (__v[__index] = __source, __v)
+
+#define __aarch64_vset_lane_f32(__source, __v, __index) \
+  __aarch64_vset_lane_any (__source, __v, __index)
+#define __aarch64_vset_lane_f64(__source, __v, __index) (__source)
+#define __aarch64_vset_lane_p8(__source, __v, __index) \
+  __aarch64_vset_lane_any (__source, __v, __index)
+#define __aarch64_vset_lane_p16(__source, __v, __index) \
+  __aarch64_vset_lane_any (__source, __v, __index)
+#define __aarch64_vset_lane_s8(__source, __v, __index) \
+  __aarch64_vset_lane_any (__source, __v, __index)
+#define __aarch64_vset_lane_s16(__source, __v, __index) \
+  __aarch64_vset_lane_any (__source, __v, __index)
+#define __aarch64_vset_lane_s32(__source, __v, __index) \
+  __aarch64_vset_lane_any (__source, __v, __index)
+#define __aarch64_vset_lane_s64(__source, __v, __index) (__source)
+#define __aarch64_vset_lane_u8(__source, __v, __index) \
+  __aarch64_vset_lane_any (__source, __v, __index)
+#define __aarch64_vset_lane_u16(__source, __v, __index) \
+  __aarch64_vset_lane_any (__source, __v, __index)
+#define __aarch64_vset_lane_u32(__source, __v, __index) \
+  __aarch64_vset_lane_any (__source, __v, __index)
+#define __aarch64_vset_lane_u64(__source, __v, __index) (__source)
+
+/* __aarch64_vset_laneq internal macros.  */
+#define __aarch64_vsetq_lane_f32(__source, __v, __index) \
+  __aarch64_vset_lane_any (__source, __v, __index)
+#define __aarch64_vsetq_lane_f64(__source, __v, __index) \
+  __aarch64_vset_lane_any (__source, __v, __index)
+#define __aarch64_vsetq_lane_p8(__source, __v, __index) \
+  __aarch64_vset_lane_any (__source, __v, __index)
+#define __aarch64_vsetq_lane_p16(__source, __v, __index) \
+  __aarch64_vset_lane_any (__source, __v, __index)
+#define __aarch64_vsetq_lane_s8(__source, __v, __index) \
+  __aarch64_vset_lane_any (__source, __v, __index)
+#define __aarch64_vsetq_lane_s16(__source, __v, __index) \
+  __aarch64_vset_lane_any (__source, __v, __index)
+#define __aarch64_vsetq_lane_s32(__source, __v, __index) \
+  __aarch64_vset_lane_any (__source, __v, __index)
+#define __aarch64_vsetq_lane_s64(__source, __v, __index) \
+  __aarch64_vset_lane_any (__source, __v, __index)
+#define __aarch64_vsetq_lane_u8(__source, __v, __index) \
+  __aarch64_vset_lane_any (__source, __v, __index)
+#define __aarch64_vsetq_lane_u16(__source, __v, __index) \
+  __aarch64_vset_lane_any (__source, __v, __index)
+#define __aarch64_vsetq_lane_u32(__source, __v, __index) \
+  __aarch64_vset_lane_any (__source, __v, __index)
+#define __aarch64_vsetq_lane_u64(__source, __v, __index) \
+  __aarch64_vset_lane_any (__source, __v, __index)
+
 /* __aarch64_vdup_lane internal macros.  */
 #define __aarch64_vdup_lane_any(__size, __q1, __q2, __a, __b) \
   vdup##__q1##_n_##__size (__aarch64_vget##__q2##_lane_##__size (__a, __b))
@@ -3969,6 +4021,154 @@ vreinterpretq_u32_p16 (poly16x8_t __a)
   return (uint32x4_t) __builtin_aarch64_reinterpretv4siv8hi ((int16x8_t) __a);
 }

+/* vset_lane.  */
+
+__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+vset_lane_f32 (float32_t __a, float32x2_t __v, const int __index)
+{
+  return __aarch64_vset_lane_f32 (__a, __v, __index);
+}
+
+__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+vset_lane_f64 (float64_t __a, float64x1_t __v, const int __index)
+{
+  return __aarch64_vset_lane_f64 (__a, __v, __index);
+}
+
+__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+vset_lane_p8 (poly8_t __a, poly8x8_t __v, const int __index)
+{
+  return __aarch64_vset_lane_p8 (__a, __v, __index);
+}
+
+__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
+vset_lane_p16 (poly16_t __a, poly16x4_t __v, const int __index)
+{
+  return __aarch64_vset_lane_p16 (__a, __v, __index);
+}
+
+__extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
+vset_lane_s8 (int8_t __a, int8x8_t __v, const int __index)
+{
+  return __aarch64_vset_lane_s8 (__a, __v, __index);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vset_lane_s16 (int16_t __a, int16x4_t __v, const int __index)
+{
+  return __aarch64_vset_lane_s16 (__a, __v, __index);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vset_lane_s32 (int32_t __a, int32x2_t __v, const int __index)
+{
+  return __aarch64_vset_lane_s32 (__a, __v, __index);
+}
+
+__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+vset_lane_s64 (int64_t __a, int64x1_t __v, const int __index)
+{
+  return __aarch64_vset_lane_s64 (__a, __v, __index);
+}
+
+__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
+vset_lane_u8 (uint8_t __a, uint8x8_t __v, const int __index)
+{
+  return __aarch64_vset_lane_u8 (__a, __v, __index);
+}
+
+__extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
+vset_lane_u16 (uint16_t __a, uint16x4_t __v, const int __index)
+{
+  return __aarch64_vset_lane_u16 (__a, __v, __index);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+vset_lane_u32 (uint32_t __a, uint32x2_t __v, const int __index)
+{
+  return __aarch64_vset_lane_u32 (__a, __v, __index);
+}
+
+__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__))
+vset_lane_u64 (uint64_t __a, uint64x1_t __v, const int __index)
+{
+  return __aarch64_vset_lane_u64 (__a, __v, __index);
+}
+
+/* vsetq_lane.  */
+
+__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
+vsetq_lane_f32 (float32_t __a, float32x4_t __v, const int __index)
+{
+  return __aarch64_vsetq_lane_f32 (__a, __v, __index);
+}
+
+__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+vsetq_lane_f64 (float64_t __a, float64x2_t __v, const int __index)
+{
+  return __aarch64_vsetq_lane_f64 (__a, __v, __index);
+}
+
+__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+vsetq_lane_p8 (poly8_t __a, poly8x16_t __v, const int __index)
+{
+  return __aarch64_vsetq_lane_p8 (__a, __v, __index);
+}
+
+__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+vsetq_lane_p16 (poly16_t __a, poly16x8_t __v, const int __index)
+{
+  return __aarch64_vsetq_lane_p16 (__a, __v, __index);
+}
+
+__extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
+vsetq_lane_s8 (int8_t __a, int8x16_t __v, const int __index)
+{
+  return __aarch64_vsetq_lane_s8 (__a, __v, __index);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vsetq_lane_s16 (int16_t __a, int16x8_t __v, const int __index)
+{
+  return __aarch64_vsetq_lane_s16 (__a, __v, __index);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vsetq_lane_s32 (int32_t __a, int32x4_t __v, const int __index)
+{
+  return __aarch64_vsetq_lane_s32 (__a, __v, __index);
+}
+
+__extension__ static __inline int64x2_t __attribute__ ((__always_inline__))
+vsetq_lane_s64 (int64_t __a, int64x2_t __v, const int __index)
+{
+  return __aarch64_vsetq_lane_s64 (__a, __v, __index);
+}
+
+__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
+vsetq_lane_u8 (uint8_t __a, uint8x16_t __v, const int __index)
+{
+  return __aarch64_vsetq_lane_u8 (__a, __v, __index);
+}
+
+__extension__ static __inline uint16x8_t __attribute__ ((__always_inline__))
+vsetq_lane_u16 (uint16_t __a, uint16x8_t __v, const int __index)
+{
+  return __aarch64_vsetq_lane_u16 (__a, __v, __index);
+}
+
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+vsetq_lane_u32 (uint32_t __a, uint32x4_t __v, const int __index)
+{
+  return __aarch64_vsetq_lane_u32 (__a, __v, __index);
+}
+
+__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+vsetq_lane_u64 (uint64_t __a, uint64x2_t __v, const int __index)
+{
+  return __aarch64_vsetq_lane_u64 (__a, __v, __index);
+}
+
 #define __GET_LOW(__TYPE) \
   uint64x2_t tmp = vreinterpretq_u64_##__TYPE (__a); \
   uint64_t lo = vgetq_lane_u64 (tmp, 0); \
@@ -12192,318 +12392,6 @@ vrsubhn_u64 (uint64x2_t a, uint64x2_t b)
   return result;
 }

-#define vset_lane_f32(a, b, c) \
-  __extension__ \
-    ({ \
-       float32x2_t b_ = (b); \
-       float32_t a_ = (a); \
-       float32x2_t result; \
-       __asm__ ("ins %0.s[%3], %w1" \
-                : "=w"(result) \
-                : "r"(a_), "0"(b_), "i"(c) \
-                : /* No clobbers */); \
-       result; \
-     })
-
-#define vset_lane_f64(a, b, c) \
-  __extension__ \
-    ({ \
-       float64x1_t b_ = (b); \
-       float64_t a_ = (a); \
-       float64x1_t result; \
-       __asm__ ("ins %0.d[%3], %x1" \
-                : "=w"(result) \
-                : "r"(a_), "0"(b_), "i"(c) \
-                : /* No clobbers */); \
-       result; \
-     })
-
-#define vset_lane_p8(a, b, c) \
-  __extension__ \
-    ({ \
-       poly8x8_t b_ = (b); \
-       poly8_t a_ = (a); \
-       poly8x8_t result; \
-       __asm__ ("ins %0.b[%3], %w1" \
-                : "=w"(result) \
-                : "r"(a_), "0"(b_), "i"(c) \
-                : /* No clobbers */); \
-       result; \
-     })
-
-#define vset_lane_p16(a, b, c) \
-  __extension__ \
-    ({ \
-       poly16x4_t b_ = (b); \
-       poly16_t a_ = (a); \
-       poly16x4_t result; \
-       __asm__ ("ins %0.h[%3], %w1" \
-                : "=w"(result) \
-                : "r"(a_), "0"(b_), "i"(c) \
-                : /* No clobbers */); \
-       result; \
-     })
-
-#define vset_lane_s8(a, b, c) \
-  __extension__ \
-    ({ \
-       int8x8_t b_ = (b); \
-       int8_t a_ = (a); \
-       int8x8_t result; \
-       __asm__ ("ins %0.b[%3], %w1" \
-                : "=w"(result) \
-                : "r"(a_), "0"(b_), "i"(c) \
-                : /* No clobbers */); \
-       result; \
-     })
-
-#define vset_lane_s16(a, b, c) \
-  __extension__ \
-    ({ \
-       int16x4_t b_ = (b); \
-       int16_t a_ = (a); \
-       int16x4_t result; \
-       __asm__ ("ins %0.h[%3], %w1" \
-                : "=w"(result) \
-                : "r"(a_), "0"(b_), "i"(c) \
-                : /* No clobbers */); \
-       result; \
-     })
-
-#define vset_lane_s32(a, b, c) \
-  __extension__ \
-    ({ \
-       int32x2_t b_ = (b); \
-       int32_t a_ = (a); \
-       int32x2_t result; \
-       __asm__ ("ins %0.s[%3], %w1" \
-                : "=w"(result) \
-                : "r"(a_), "0"(b_), "i"(c) \
-                : /* No clobbers */); \
-       result; \
-     })
-
-#define vset_lane_s64(a, b, c) \
-  __extension__ \
-    ({ \
-       int64x1_t b_ = (b); \
-       int64_t a_ = (a); \
-       int64x1_t result; \
-       __asm__ ("ins %0.d[%3], %x1" \
-                : "=w"(result) \
-                : "r"(a_), "0"(b_), "i"(c) \
-                : /* No clobbers */); \
-       result; \
-     })
-
-#define vset_lane_u8(a, b, c) \
-  __extension__ \
-    ({ \
-       uint8x8_t b_ = (b); \
-       uint8_t a_ = (a); \
-       uint8x8_t result; \
-       __asm__ ("ins %0.b[%3], %w1" \
-                : "=w"(result) \
-                : "r"(a_), "0"(b_), "i"(c) \
-                : /* No clobbers */); \
-       result; \
-     })
-
-#define vset_lane_u16(a, b, c) \
-  __extension__ \
-    ({ \
-       uint16x4_t b_ = (b); \
-       uint16_t a_ = (a); \
-       uint16x4_t result; \
-       __asm__ ("ins %0.h[%3], %w1" \
-                : "=w"(result) \
-                : "r"(a_), "0"(b_), "i"(c) \
-                : /* No clobbers */); \
-       result; \
-     })
-
-#define vset_lane_u32(a, b, c) \
-  __extension__ \
-    ({ \
-       uint32x2_t b_ = (b); \
-       uint32_t a_ = (a); \
-       uint32x2_t result; \
-       __asm__ ("ins %0.s[%3], %w1" \
-                : "=w"(result) \
-                : "r"(a_), "0"(b_), "i"(c) \
-                : /* No clobbers */); \
-       result; \
-     })
-
-#define vset_lane_u64(a, b, c) \
-  __extension__ \
-    ({ \
-       uint64x1_t b_ = (b); \
-       uint64_t a_ = (a); \
-       uint64x1_t result; \
-       __asm__ ("ins %0.d[%3], %x1" \
-                : "=w"(result) \
-                : "r"(a_), "0"(b_), "i"(c) \
-                : /* No clobbers */); \
-       result; \
-     })
-
-#define vsetq_lane_f32(a, b, c) \
-  __extension__ \
-    ({ \
-       float32x4_t b_ = (b); \
-       float32_t a_ = (a); \
-       float32x4_t result; \
-       __asm__ ("ins %0.s[%3], %w1" \
-                : "=w"(result) \
-                : "r"(a_), "0"(b_), "i"(c) \
-                : /* No clobbers */); \
-       result; \
-     })
-
-#define vsetq_lane_f64(a, b, c) \
-  __extension__ \
-    ({ \
-       float64x2_t b_ = (b); \
-       float64_t a_ = (a); \
-       float64x2_t result; \
-       __asm__ ("ins %0.d[%3], %x1" \
-                : "=w"(result) \
-                : "r"(a_), "0"(b_), "i"(c) \
-                : /* No clobbers */); \
-       result; \
-     })
-
-#define vsetq_lane_p8(a, b, c) \
-  __extension__ \
-    ({ \
-       poly8x16_t b_ = (b); \
-       poly8_t a_ = (a); \
-       poly8x16_t result; \
-       __asm__ ("ins %0.b[%3], %w1" \
-                : "=w"(result) \
-                : "r"(a_), "0"(b_), "i"(c) \
-                : /* No clobbers */); \
-       result; \
-     })
-
-#define vsetq_lane_p16(a, b, c) \
-  __extension__ \
-    ({ \
-       poly16x8_t b_ = (b); \
-       poly16_t a_ = (a); \
-       poly16x8_t result; \
-       __asm__ ("ins %0.h[%3], %w1" \
-                : "=w"(result) \
-                : "r"(a_), "0"(b_), "i"(c) \
-                : /* No clobbers */); \
-       result; \
-     })
-
-#define vsetq_lane_s8(a, b, c) \
-  __extension__ \
-    ({ \
-       int8x16_t b_ = (b); \
-       int8_t a_ = (a); \
-       int8x16_t result; \
-       __asm__ ("ins %0.b[%3], %w1" \
-                : "=w"(result) \
-                : "r"(a_), "0"(b_), "i"(c) \
-                : /* No clobbers */); \
-       result; \
-     })
-
-#define vsetq_lane_s16(a, b, c) \
-  __extension__ \
-    ({ \
-       int16x8_t b_ = (b); \
-       int16_t a_ = (a); \
-       int16x8_t result; \
-       __asm__ ("ins %0.h[%3], %w1" \
-                : "=w"(result) \
-                : "r"(a_), "0"(b_), "i"(c) \
-                : /* No clobbers */); \
-       result; \
-     })
-
-#define vsetq_lane_s32(a, b, c) \
-  __extension__ \
-    ({ \
-       int32x4_t b_ = (b); \
-       int32_t a_ = (a); \
-       int32x4_t result; \
-       __asm__ ("ins %0.s[%3], %w1" \
-                : "=w"(result) \
-                : "r"(a_), "0"(b_), "i"(c) \
-                : /* No clobbers */); \
-       result; \
-     })
-
-#define vsetq_lane_s64(a, b, c) \
-  __extension__ \
-    ({ \
-       int64x2_t b_ = (b); \
-       int64_t a_ = (a); \
-       int64x2_t result; \
-       __asm__ ("ins %0.d[%3], %x1" \
-                : "=w"(result) \
-                : "r"(a_), "0"(b_), "i"(c) \
-                : /* No clobbers */); \
-       result; \
-     })
-
-#define vsetq_lane_u8(a, b, c) \
-  __extension__ \
-    ({ \
-       uint8x16_t b_ = (b); \
-       uint8_t a_ = (a); \
-       uint8x16_t result; \
-       __asm__ ("ins %0.b[%3], %w1" \
-                : "=w"(result) \
-                : "r"(a_), "0"(b_), "i"(c) \
-                : /* No clobbers */); \
-       result; \
-     })
-
-#define vsetq_lane_u16(a, b, c) \
-  __extension__ \
-    ({ \
-       uint16x8_t b_ = (b); \
-       uint16_t a_ = (a); \
-       uint16x8_t result; \
-       __asm__ ("ins %0.h[%3], %w1" \
-                : "=w"(result) \
-                : "r"(a_), "0"(b_), "i"(c) \
-                : /* No clobbers */); \
-       result; \
-     })
-
-#define vsetq_lane_u32(a, b, c) \
-  __extension__ \
-    ({ \
-       uint32x4_t b_ = (b); \
-       uint32_t a_ = (a); \
-       uint32x4_t result; \
-       __asm__ ("ins %0.s[%3], %w1" \
-                : "=w"(result) \
-                : "r"(a_), "0"(b_), "i"(c) \
-                : /* No clobbers */); \
-       result; \
-     })
-
-#define vsetq_lane_u64(a, b, c) \
-  __extension__ \
-    ({ \
-       uint64x2_t b_ = (b); \
-       uint64_t a_ = (a); \
-       uint64x2_t result; \
-       __asm__ ("ins %0.d[%3], %x1" \
-                : "=w"(result) \
-                : "r"(a_), "0"(b_), "i"(c) \
-                : /* No clobbers */); \
-       result; \
-     })
-
 #define vshrn_high_n_s16(a, b, c) \
   __extension__ \
     ({ \
@@ -25537,6 +25425,33 @@ __INTERLEAVE_LIST (zip)
 #undef __aarch64_vgetq_lane_u32
 #undef __aarch64_vgetq_lane_u64

+#undef __aarch64_vset_lane_any
+#undef __aarch64_vset_lane_f32
+#undef __aarch64_vset_lane_f64
+#undef __aarch64_vset_lane_p8
+#undef __aarch64_vset_lane_p16
+#undef __aarch64_vset_lane_s8
+#undef __aarch64_vset_lane_s16
+#undef __aarch64_vset_lane_s32
+#undef __aarch64_vset_lane_s64
+#undef __aarch64_vset_lane_u8
+#undef __aarch64_vset_lane_u16
+#undef __aarch64_vset_lane_u32
+#undef __aarch64_vset_lane_u64
+
+#undef __aarch64_vsetq_lane_f32
+#undef __aarch64_vsetq_lane_f64
+#undef __aarch64_vsetq_lane_p8
+#undef __aarch64_vsetq_lane_p16
+#undef __aarch64_vsetq_lane_s8
+#undef __aarch64_vsetq_lane_s16
+#undef __aarch64_vsetq_lane_s32
+#undef __aarch64_vsetq_lane_s64
+#undef __aarch64_vsetq_lane_u8
+#undef __aarch64_vsetq_lane_u16
+#undef __aarch64_vsetq_lane_u32
+#undef __aarch64_vsetq_lane_u64
+
 #undef __aarch64_vdup_lane_any
 #undef __aarch64_vdup_lane_f32
 #undef __aarch64_vdup_lane_f64
diff --git a/gcc/testsuite/gcc.target/aarch64/vect_set_lane_1.c b/gcc/testsuite/gcc.target/aarch64/vect_set_lane_1.c
new file mode 100644
index 0000000..800ffce
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect_set_lane_1.c
@@ -0,0 +1,57 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+#include "arm_neon.h"
+
+#define BUILD_TEST(TYPE, INNER_TYPE, Q, SUFFIX, INDEX) \
+TYPE \
+test_set##Q##_lane_##SUFFIX (INNER_TYPE a, TYPE v) \
+{ \
+  return vset##Q##_lane_##SUFFIX (a, v, INDEX); \
+}
+
+/* vset_lane.  */
+BUILD_TEST (poly8x8_t, poly8_t, , p8, 7)
+BUILD_TEST (int8x8_t, int8_t, , s8, 7)
+BUILD_TEST (uint8x8_t, uint8_t, , u8, 7)
+/* { dg-final { scan-assembler-times "ins\\tv0.b\\\[7\\\], w0" 3 } } */
+BUILD_TEST (poly16x4_t, poly16_t, , p16, 3)
+BUILD_TEST (int16x4_t, int16_t, , s16, 3)
+BUILD_TEST (uint16x4_t, uint16_t, , u16, 3)
+/* { dg-final { scan-assembler-times "ins\\tv0.h\\\[3\\\], w0" 3 } } */
+BUILD_TEST (int32x2_t, int32_t, , s32, 1)
+BUILD_TEST (uint32x2_t, uint32_t, , u32, 1)
+/* { dg-final { scan-assembler-times "ins\\tv0.s\\\[1\\\], w0" 2 } } */
+BUILD_TEST (int64x1_t, int64_t, , s64, 0)
+BUILD_TEST (uint64x1_t, uint64_t, , u64, 0)
+/* Nothing to do.  */
+
+/* vsetq_lane.  */
+
+BUILD_TEST (poly8x16_t, poly8_t, q, p8, 15)
+BUILD_TEST (int8x16_t, int8_t, q, s8, 15)
+BUILD_TEST (uint8x16_t, uint8_t, q, u8, 15)
+/* { dg-final { scan-assembler-times "ins\\tv0.b\\\[15\\\], w0" 3 } } */
+BUILD_TEST (poly16x8_t, poly16_t, q, p16, 7)
+BUILD_TEST (int16x8_t, int16_t, q, s16, 7)
+BUILD_TEST (uint16x8_t, uint16_t, q, u16, 7)
+/* { dg-final { scan-assembler-times "ins\\tv0.h\\\[7\\\], w0" 3 } } */
+BUILD_TEST (int32x4_t, int32_t, q, s32, 3)
+BUILD_TEST (uint32x4_t, uint32_t, q, u32, 3)
+/* { dg-final { scan-assembler-times "ins\\tv0.s\\\[3\\\], w0" 2 } } */
+BUILD_TEST (int64x2_t, int64_t, q, s64, 1)
+BUILD_TEST (uint64x2_t, uint64_t, q, u64, 1)
+/* { dg-final { scan-assembler-times "ins\\tv0.d\\\[1\\\], x0" 2 } } */
+
+/* Float versions are slightly different as their scalar value
+   will be in v0 rather than w0.  */
+BUILD_TEST (float32x2_t, float32_t, , f32, 1)
+/* { dg-final { scan-assembler-times "ins\\tv1.s\\\[1\\\], v0.s\\\[0\\\]" 1 } } */
+BUILD_TEST (float64x1_t, float64_t, , f64, 0)
+/* Nothing to do.  */
+BUILD_TEST (float32x4_t, float32_t, q, f32, 3)
+/* { dg-final { scan-assembler-times "ins\\tv1.s\\\[3\\\], v0.s\\\[0\\\]" 1 } } */
+BUILD_TEST (float64x2_t, float64_t, q, f64, 1)
+/* { dg-final { scan-assembler-times "ins\\tv1.d\\\[1\\\], v0.d\\\[0\\\]" 1 } } */
+
+/* { dg-final { cleanup-saved-temps } } */
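
(As an aside, and purely for illustration: the "compose these intrinsics
efficiently" point in the cover letter is about user code like the sketch
below, where vget_lane feeds vset_lane.  With both implemented in C the
composition is visible to the compiler; whether it folds to a single
lane-to-lane ins depends on the optimisers, so treat this as motivation
rather than a guarantee.  The function name is made up.)

#include <arm_neon.h>

/* Copy lane 0 of src into lane 2 of dst, staying in vector registers.  */
int16x4_t
copy_lane (int16x4_t src, int16x4_t dst)
{
  return vset_lane_s16 (vget_lane_s16 (src, 0), dst, 2);
}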