From patchwork Tue May 24 08:23:53 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jiong Wang X-Patchwork-Id: 625538 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3rDT5X3M4yz9snk for ; Tue, 24 May 2016 18:26:24 +1000 (AEST) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b=FYut1yjw; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :subject:to:references:message-id:date:mime-version:in-reply-to :content-type; q=dns; s=default; b=mWBqavu32jgoi1dmZYqdyvya/1jP+ Rj9kGIO+0FptnXu8dio9IS3aHj4d/aVg5m9zU1vP7A7fMxTPkQwKS3OfTa5K2UaV XxORuXBX12nn1Xs8AzMu09SGIy/hjC8y7yY4AEFjY1SP4ajFv+oXruRGt5SxKjNI gxhxufqKRhJ5Lo= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :subject:to:references:message-id:date:mime-version:in-reply-to :content-type; s=default; bh=rImeVEvKnZTL+ELvFahG7nWbTGw=; b=FYu t1yjwBJCh82L1Jt/b4P9e8RhXxqh2P0ynol29ZZ7f/l6tF8Pny25a20PxyEXtrSx Ltd3rw+u/C62VQKct2CGwxHirBvVh6rAvlfANmst6MyGdD8xvDd6472rzhJt6Ygd Kirn6bITRt6ADja92iiJxRfncL3VA9jvLnq795jY= Received: (qmail 52727 invoked by alias); 24 May 2016 08:24:23 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 52683 invoked by uid 89); 24 May 2016 08:24:21 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.3 required=5.0 tests=BAYES_00, KAM_LAZY_DOMAIN_SECURITY, RP_MATCHES_RCVD autolearn=ham version=3.3.2 spammy=dfmode, DFmode X-HELO: foss.arm.com Received: from foss.arm.com (HELO foss.arm.com) (217.140.101.70) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 24 May 2016 08:23:56 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 4C361435 for ; Tue, 24 May 2016 01:24:18 -0700 (PDT) Received: from [10.2.206.198] (e104437-lin.cambridge.arm.com [10.2.206.198]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 4CCE43F5C4 for ; Tue, 24 May 2016 01:23:55 -0700 (PDT) From: Jiong Wang Subject: [AArch64, 4/6] Reimplement frsqrts intrinsics To: GCC Patches References: <57430251.6060902@foss.arm.com> <57430271.3070504@foss.arm.com> <5743029C.60208@foss.arm.com> <574302DA.6090803@foss.arm.com> Message-ID: <57440F99.2060204@foss.arm.com> Date: Tue, 24 May 2016 09:23:53 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.7.2 MIME-Version: 1.0 In-Reply-To: <574302DA.6090803@foss.arm.com> X-IsSubscribed: yes Similar as [3/6], these intrinsics were implemented before the instruction pattern "aarch64_rsqrts" added, that these intrinsics were implemented through inline assembly. This mirgrate the implementation to builtin. gcc/ 2016-05-23 Jiong Wang * config/aarch64/aarch64-builtins.def (rsqrts): New builtins for modes VALLF. * config/aarch64/aarch64-simd.md (aarch64_rsqrts_3): Rename to "aarch64_rsqrts". * config/aarch64/aarch64.c (get_rsqrts_type): Update gen* name. * config/aarch64/arm_neon.h (vrsqrtss_f32): Remove inline assembly. Use builtin. (vrsqrtsd_f64): Likewise. (vrsqrts_f32): Likewise. (vrsqrtsq_f32): Likewise. (vrsqrtsq_f64): Likewise. From ea271deeb19e3a1e611cbc1ddf3abfec06388958 Mon Sep 17 00:00:00 2001 From: "Jiong.Wang" Date: Mon, 23 May 2016 12:12:33 +0100 Subject: [PATCH 4/6] 4 --- gcc/config/aarch64/aarch64-builtins.def | 3 ++ gcc/config/aarch64/aarch64-simd.md | 2 +- gcc/config/aarch64/aarch64.c | 10 ++-- gcc/config/aarch64/arm_neon.h | 87 ++++++++++++--------------------- 4 files changed, 41 insertions(+), 61 deletions(-) diff --git a/gcc/config/aarch64/aarch64-builtins.def b/gcc/config/aarch64/aarch64-builtins.def index 32bcd06..1955d17 100644 --- a/gcc/config/aarch64/aarch64-builtins.def +++ b/gcc/config/aarch64/aarch64-builtins.def @@ -462,3 +462,6 @@ /* Implemented by aarch64_rsqrte. */ BUILTIN_VALLF (UNOP, rsqrte, 0) + + /* Implemented by aarch64_rsqrts. */ + BUILTIN_VALLF (BINOP, rsqrts, 0) diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index c34d21e..cca6c1b 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -390,7 +390,7 @@ "frsqrte\\t%0, %1" [(set_attr "type" "neon_fp_rsqrte_")]) -(define_insn "aarch64_rsqrts_3" +(define_insn "aarch64_rsqrts" [(set (match_operand:VALLF 0 "register_operand" "=w") (unspec:VALLF [(match_operand:VALLF 1 "register_operand" "w") (match_operand:VALLF 2 "register_operand" "w")] diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 18a8c1e..ba71d2a 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -7377,11 +7377,11 @@ get_rsqrts_type (machine_mode mode) { switch (mode) { - case DFmode: return gen_aarch64_rsqrts_df3; - case SFmode: return gen_aarch64_rsqrts_sf3; - case V2DFmode: return gen_aarch64_rsqrts_v2df3; - case V2SFmode: return gen_aarch64_rsqrts_v2sf3; - case V4SFmode: return gen_aarch64_rsqrts_v4sf3; + case DFmode: return gen_aarch64_rsqrtsdf; + case SFmode: return gen_aarch64_rsqrtssf; + case V2DFmode: return gen_aarch64_rsqrtsv2df; + case V2SFmode: return gen_aarch64_rsqrtsv2sf; + case V4SFmode: return gen_aarch64_rsqrtsv4sf; default: gcc_unreachable (); } } diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index be48a5e..1971373 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -9196,61 +9196,6 @@ vrsqrteq_u32 (uint32x4_t a) return result; } -__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) -vrsqrts_f32 (float32x2_t a, float32x2_t b) -{ - float32x2_t result; - __asm__ ("frsqrts %0.2s,%1.2s,%2.2s" - : "=w"(result) - : "w"(a), "w"(b) - : /* No clobbers */); - return result; -} - -__extension__ static __inline float64_t __attribute__ ((__always_inline__)) -vrsqrtsd_f64 (float64_t a, float64_t b) -{ - float64_t result; - __asm__ ("frsqrts %d0,%d1,%d2" - : "=w"(result) - : "w"(a), "w"(b) - : /* No clobbers */); - return result; -} - -__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) -vrsqrtsq_f32 (float32x4_t a, float32x4_t b) -{ - float32x4_t result; - __asm__ ("frsqrts %0.4s,%1.4s,%2.4s" - : "=w"(result) - : "w"(a), "w"(b) - : /* No clobbers */); - return result; -} - -__extension__ static __inline float64x2_t __attribute__ ((__always_inline__)) -vrsqrtsq_f64 (float64x2_t a, float64x2_t b) -{ - float64x2_t result; - __asm__ ("frsqrts %0.2d,%1.2d,%2.2d" - : "=w"(result) - : "w"(a), "w"(b) - : /* No clobbers */); - return result; -} - -__extension__ static __inline float32_t __attribute__ ((__always_inline__)) -vrsqrtss_f32 (float32_t a, float32_t b) -{ - float32_t result; - __asm__ ("frsqrts %s0,%s1,%s2" - : "=w"(result) - : "w"(a), "w"(b) - : /* No clobbers */); - return result; -} - #define vshrn_high_n_s16(a, b, c) \ __extension__ \ ({ \ @@ -21481,6 +21426,38 @@ vrsqrteq_f64 (float64x2_t a) return __builtin_aarch64_rsqrtev2df (a); } +/* vrsqrts. */ + +__extension__ static __inline float32_t __attribute__ ((__always_inline__)) +vrsqrtss_f32 (float32_t a, float32_t b) +{ + return __builtin_aarch64_rsqrtssf (a, b); +} + +__extension__ static __inline float64_t __attribute__ ((__always_inline__)) +vrsqrtsd_f64 (float64_t a, float64_t b) +{ + return __builtin_aarch64_rsqrtsdf (a, b); +} + +__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) +vrsqrts_f32 (float32x2_t a, float32x2_t b) +{ + return __builtin_aarch64_rsqrtsv2sf (a, b); +} + +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vrsqrtsq_f32 (float32x4_t a, float32x4_t b) +{ + return __builtin_aarch64_rsqrtsv4sf (a, b); +} + +__extension__ static __inline float64x2_t __attribute__ ((__always_inline__)) +vrsqrtsq_f64 (float64x2_t a, float64x2_t b) +{ + return __builtin_aarch64_rsqrtsv2df (a, b); +} + /* vrsra */ __extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) -- 1.9.1