From patchwork Fri Feb 21 10:29:54 2020
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 1242030
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Subject: [committed] aarch64: Add SVE support for -mlow-precision-div
Date: Fri, 21 Feb 2020 10:29:54 +0000

SVE was missing support for -mlow-precision-div, which meant that
-march=armv8.2-a+sve -mlow-precision-div could cause a performance
regression compared to -march=armv8.2-a -mlow-precision-div.

I ended up doing this much later than originally intended, sorry...

Tested on aarch64-linux-gnu and aarch64_be-elf, pushed.

Richard


2020-02-21  Richard Sandiford

gcc/
	* config/aarch64/aarch64.c (aarch64_emit_mult): New function.
	(aarch64_emit_approx_div): Add SVE support.  Use aarch64_emit_mult
	instead of emitting multiplication instructions directly.
	* config/aarch64/iterators.md (SVE_COND_FP_BINARY_OPTAB): New iterator.
	* config/aarch64/aarch64-sve.md (div3, @aarch64_frecpe)
	(@aarch64_frecps): New expanders.

gcc/testsuite/
	* gcc.target/aarch64/sve/recip_1.c: New test.
	* gcc.target/aarch64/sve/recip_1_run.c: Likewise.
	* gcc.target/aarch64/sve/recip_2.c: Likewise.
	* gcc.target/aarch64/sve/recip_2_run.c: Likewise.
---
 gcc/config/aarch64/aarch64-sve.md             | 44 ++++++++++++++++++-
 gcc/config/aarch64/aarch64.c                  | 29 ++++++++++--
 gcc/config/aarch64/iterators.md               | 11 +++++
 .../gcc.target/aarch64/sve/recip_1.c          | 27 ++++++++++++
 .../gcc.target/aarch64/sve/recip_1_run.c      | 27 ++++++++++++
 .../gcc.target/aarch64/sve/recip_2.c          | 27 ++++++++++++
 .../gcc.target/aarch64/sve/recip_2_run.c      | 30 +++++++++++++
 7 files changed, 191 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/recip_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/recip_1_run.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/recip_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/recip_2_run.c

diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index fa3852992e1..e3b1da89c1a 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -99,6 +99,7 @@
 ;; ---- [FP] Subtraction
 ;; ---- [FP] Absolute difference
 ;; ---- [FP] Multiplication
+;; ---- [FP] Division
 ;; ---- [FP] Binary logical operations
 ;; ---- [FP] Sign copying
 ;; ---- [FP] Maximum and minimum
@@ -4719,7 +4720,7 @@ (define_expand "3"
            (const_int SVE_RELAXED_GP)
            (match_operand:SVE_FULL_F 1 "")
            (match_operand:SVE_FULL_F 2 "")]
-          SVE_COND_FP_BINARY))]
+          SVE_COND_FP_BINARY_OPTAB))]
   "TARGET_SVE"
   {
     operands[3] = aarch64_ptrue_reg (mode);
@@ -5455,6 +5456,47 @@ (define_insn "@aarch64_mul_lane_"
   "fmul\t%0., %1., %2.[%3]"
 )
 
+;; -------------------------------------------------------------------------
+;; ---- [FP] Division
+;; -------------------------------------------------------------------------
+;; The patterns in this section are synthetic.
+;; -------------------------------------------------------------------------
+
+(define_expand "div3"
+  [(set (match_operand:SVE_FULL_F 0 "register_operand")
+        (unspec:SVE_FULL_F
+          [(match_dup 3)
+           (const_int SVE_RELAXED_GP)
+           (match_operand:SVE_FULL_F 1 "nonmemory_operand")
+           (match_operand:SVE_FULL_F 2 "register_operand")]
+          UNSPEC_COND_FDIV))]
+  "TARGET_SVE"
+  {
+    if (aarch64_emit_approx_div (operands[0], operands[1], operands[2]))
+      DONE;
+
+    operands[1] = force_reg (mode, operands[1]);
+    operands[3] = aarch64_ptrue_reg (mode);
+  }
+)
+
+(define_expand "@aarch64_frecpe"
+  [(set (match_operand:SVE_FULL_F 0 "register_operand")
+        (unspec:SVE_FULL_F
+          [(match_operand:SVE_FULL_F 1 "register_operand")]
+          UNSPEC_FRECPE))]
+  "TARGET_SVE"
+)
+
+(define_expand "@aarch64_frecps"
+  [(set (match_operand:SVE_FULL_F 0 "register_operand")
+        (unspec:SVE_FULL_F
+          [(match_operand:SVE_FULL_F 1 "register_operand")
+           (match_operand:SVE_FULL_F 2 "register_operand")]
+          UNSPEC_FRECPS))]
+  "TARGET_SVE"
+)
+
 ;; -------------------------------------------------------------------------
 ;; ---- [FP] Binary logical operations
 ;; -------------------------------------------------------------------------
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 0acaa06b91c..c1bbc4917c7 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -12739,6 +12739,25 @@ aarch64_builtin_reciprocal (tree fndecl)
   gcc_unreachable ();
 }
 
+/* Emit code to perform the floating-point operation:
+
+     DST = SRC1 * SRC2
+
+   where all three operands are already known to be registers.
+   If the operation is an SVE one, PTRUE is a suitable all-true
+   predicate.  */
+
+static void
+aarch64_emit_mult (rtx dst, rtx ptrue, rtx src1, rtx src2)
+{
+  if (ptrue)
+    emit_insn (gen_aarch64_pred (UNSPEC_COND_FMUL, GET_MODE (dst),
+                                 dst, ptrue, src1, src2,
+                                 gen_int_mode (SVE_RELAXED_GP, SImode)));
+  else
+    emit_set_insn (dst, gen_rtx_MULT (GET_MODE (dst), src1, src2));
+}
+
 /* Emit instruction sequence to compute either the approximate square root
    or its approximate reciprocal, depending on the flag RECP, and return
    whether the sequence was emitted or not.  */
@@ -12857,6 +12876,10 @@ aarch64_emit_approx_div (rtx quo, rtx num, rtx den)
   if (!TARGET_SIMD && VECTOR_MODE_P (mode))
     return false;
 
+  rtx pg = NULL_RTX;
+  if (aarch64_sve_mode_p (mode))
+    pg = aarch64_ptrue_reg (aarch64_sve_pred_mode (mode));
+
   /* Estimate the approximate reciprocal.  */
   rtx xrcp = gen_reg_rtx (mode);
   emit_insn (gen_aarch64_frecpe (mode, xrcp, den));
@@ -12876,7 +12899,7 @@ aarch64_emit_approx_div (rtx quo, rtx num, rtx den)
       emit_insn (gen_aarch64_frecps (mode, xtmp, xrcp, den));
 
       if (iterations > 0)
-        emit_set_insn (xrcp, gen_rtx_MULT (mode, xrcp, xtmp));
+        aarch64_emit_mult (xrcp, pg, xrcp, xtmp);
     }
 
   if (num != CONST1_RTX (mode))
@@ -12884,11 +12907,11 @@ aarch64_emit_approx_div (rtx quo, rtx num, rtx den)
       /* As the approximate reciprocal of DEN is already calculated, only
          calculate the approximate division when NUM is not 1.0.  */
       rtx xnum = force_reg (mode, num);
-      emit_set_insn (xrcp, gen_rtx_MULT (mode, xrcp, xnum));
+      aarch64_emit_mult (xrcp, pg, xrcp, xnum);
     }
 
   /* Finalize the approximation.  */
-  emit_set_insn (quo, gen_rtx_MULT (mode, xrcp, xtmp));
+  aarch64_emit_mult (quo, pg, xrcp, xtmp);
 
   return true;
 }
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index d17d79a30da..548ee0f51e8 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -2291,6 +2291,17 @@ (define_int_iterator SVE_COND_FP_BINARY [UNSPEC_COND_FADD
                                          UNSPEC_COND_FMULX
                                          UNSPEC_COND_FSUB])
 
+;; Same as SVE_COND_FP_BINARY, but without codes that have a dedicated
+;; 3 expander.
+(define_int_iterator SVE_COND_FP_BINARY_OPTAB [UNSPEC_COND_FADD
+                                               UNSPEC_COND_FMAX
+                                               UNSPEC_COND_FMAXNM
+                                               UNSPEC_COND_FMIN
+                                               UNSPEC_COND_FMINNM
+                                               UNSPEC_COND_FMUL
+                                               UNSPEC_COND_FMULX
+                                               UNSPEC_COND_FSUB])
+
 (define_int_iterator SVE_COND_FP_BINARY_INT [UNSPEC_COND_FSCALE])
 
 (define_int_iterator SVE_COND_FP_ADD [UNSPEC_COND_FADD])
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/recip_1.c b/gcc/testsuite/gcc.target/aarch64/sve/recip_1.c
new file mode 100644
index 00000000000..c9d470f5c03
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/recip_1.c
@@ -0,0 +1,27 @@
+/* { dg-options "-Ofast -mlow-precision-div" } */
+
+#define DEF_LOOP(TYPE) \
+  void \
+  test_##TYPE (TYPE *x, int n) \
+  { \
+    for (int i = 0; i < n; ++i) \
+      x[i] = (TYPE) 1 / x[i]; \
+  }
+
+#define TEST_ALL(T) \
+  T (_Float16) \
+  T (float) \
+  T (double)
+
+TEST_ALL (DEF_LOOP)
+
+/* { dg-final { scan-assembler-not {\tfrecpe\tz[0-9]+\.h} } } */
+/* { dg-final { scan-assembler-not {\tfrecps\tz[0-9]+\.h} } } */
+
+/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.s} 1 } } */
+/* { dg-final { scan-assembler-times {\tfrecpe\tz[0-9]+\.s} 1 } } */
+/* { dg-final { scan-assembler-times {\tfrecps\tz[0-9]+\.s} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.d} 2 } } */
+/* { dg-final { scan-assembler-times {\tfrecpe\tz[0-9]+\.d} 1 } } */
+/* { dg-final { scan-assembler-times {\tfrecps\tz[0-9]+\.d} 2 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/recip_1_run.c b/gcc/testsuite/gcc.target/aarch64/sve/recip_1_run.c
new file mode 100644
index 00000000000..b232b88530a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/recip_1_run.c
@@ -0,0 +1,27 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-Ofast -mlow-precision-div" } */
+
+#include "recip_1.c"
+
+#define N 77
+
+#define TEST_LOOP(TYPE) \
+  { \
+    TYPE a[N]; \
+    for (int i = 0; i < N; ++i) \
+      a[i] = i + 1; \
+    test_##TYPE (a, N); \
+    for (int i = 0; i < N; ++i) \
+      { \
+        double diff = a[i] - 1.0 / (i + 1); \
+        if (__builtin_fabs (diff) > 0x1.0p-8) \
+          __builtin_abort (); \
+      } \
+  }
+
+int
+main (void)
+{
+  TEST_ALL (TEST_LOOP);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/recip_2.c b/gcc/testsuite/gcc.target/aarch64/sve/recip_2.c
new file mode 100644
index 00000000000..f308a6b7874
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/recip_2.c
@@ -0,0 +1,27 @@
+/* { dg-options "-Ofast -mlow-precision-div" } */
+
+#define DEF_LOOP(TYPE) \
+  void \
+  test_##TYPE (TYPE *restrict x, TYPE *restrict y, int n) \
+  { \
+    for (int i = 0; i < n; ++i) \
+      x[i] /= y[i]; \
+  }
+
+#define TEST_ALL(T) \
+  T (_Float16) \
+  T (float) \
+  T (double)
+
+TEST_ALL (DEF_LOOP)
+
+/* { dg-final { scan-assembler-not {\tfrecpe\tz[0-9]+\.h} } } */
+/* { dg-final { scan-assembler-not {\tfrecps\tz[0-9]+\.h} } } */
+
+/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.s} 2 } } */
+/* { dg-final { scan-assembler-times {\tfrecpe\tz[0-9]+\.s} 1 } } */
+/* { dg-final { scan-assembler-times {\tfrecps\tz[0-9]+\.s} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.d} 3 } } */
+/* { dg-final { scan-assembler-times {\tfrecpe\tz[0-9]+\.d} 1 } } */
+/* { dg-final { scan-assembler-times {\tfrecps\tz[0-9]+\.d} 2 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/recip_2_run.c b/gcc/testsuite/gcc.target/aarch64/sve/recip_2_run.c
new file mode 100644
index 00000000000..25a31e11f55
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/recip_2_run.c
@@ -0,0 +1,30 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-Ofast -mlow-precision-div" } */
+
+#include "recip_2.c"
+
+#define N 77
+
+#define TEST_LOOP(TYPE) \
+  { \
+    TYPE a[N], b[N]; \
+    for (int i = 0; i < N; ++i) \
+      { \
+        a[i] = i + 11; \
+        b[i] = i + 1; \
+      } \
+    test_##TYPE (a, b, N); \
+    for (int i = 0; i < N; ++i) \
+      { \
+        double diff = a[i] - (i + 11.0) / (i + 1); \
+        if (__builtin_fabs (diff) > 0x1.0p-8) \
+          __builtin_abort (); \
+      } \
+  }
+
+int
+main (void)
+{
+  TEST_ALL (TEST_LOOP);
+  return 0;
+}
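The instruction counts that the scan-assembler-times directives expect follow from the order in which aarch64_emit_approx_div emits FRECPE, FRECPS and multiplications. The following is a minimal scalar sketch of that sequence in plain C, not GCC code: struct op_counts, frecpe_model, frecps_model, fmul_model and approx_div are hypothetical helpers. frecpe_model is an assumption as well, since it truncates the exact reciprocal to 8 fraction bits rather than computing the estimate the way the real instruction does. The tests expect one FRECPS for float and two for double under -mlow-precision-div, so the sketch takes the iteration count as a parameter.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical counters, used only to compare the model with the
   scan-assembler-times directives in recip_1.c and recip_2.c.  */
struct op_counts
{
  int frecpe, frecps, fmul;
};

/* Hypothetical stand-in for FRECPE: the exact reciprocal truncated to
   8 fraction bits.  The real instruction computes its estimate
   differently; only its roughly 8-bit accuracy is modelled here.
   DEN is assumed to be finite and nonzero.  */
static double
frecpe_model (double d, struct op_counts *c)
{
  double r = 1.0 / d;
  uint64_t bits;
  memcpy (&bits, &r, sizeof bits);
  /* Keep the sign, the exponent and the top 8 of the 52 fraction bits.  */
  bits &= ~((UINT64_C (1) << 44) - 1);
  memcpy (&r, &bits, sizeof r);
  c->frecpe++;
  return r;
}

/* Model of FRECPS: the Newton-Raphson correction factor 2 - A * B.  */
static double
frecps_model (double a, double b, struct op_counts *c)
{
  c->frecps++;
  return 2.0 - a * b;
}

/* Model of FMUL.  */
static double
fmul_model (double a, double b, struct op_counts *c)
{
  c->fmul++;
  return a * b;
}

/* Follows the order of operations in aarch64_emit_approx_div: refine the
   estimate ITERATIONS times, but fold the last correction factor into the
   final multiplication instead of applying it to XRCP.  */
static double
approx_div (double num, double den, int iterations, struct op_counts *c)
{
  double xrcp = frecpe_model (den, c);
  double xtmp = 1.0;
  while (iterations--)
    {
      xtmp = frecps_model (xrcp, den, c);
      if (iterations > 0)
        xrcp = fmul_model (xrcp, xtmp, c);
    }
  /* Only multiply by NUM when it is not 1.0.  */
  if (num != 1.0)
    xrcp = fmul_model (xrcp, num, c);
  return fmul_model (xrcp, xtmp, c);
}
```

Each FRECPS step is one Newton-Raphson iteration, x' = x * (2 - d * x), which squares the relative error of the estimate; dropping a step under -mlow-precision-div therefore trades accuracy for one fewer FRECPS and FMUL. In this model, approx_div (1.0, 3.0, 1, &c) records one FRECPE, one FRECPS and one FMUL, matching the float counts in recip_1.c, while approx_div (87.0, 77.0, 2, &c) records one FRECPE, two FRECPS and three FMULs, matching the double counts in recip_2.c.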