From patchwork Fri May 27 22:57:23 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Evandro Menezes
X-Patchwork-Id: 627374
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Subject: Re: [PATCH 1/3][AArch64] Add more choices for the reciprocal square root approximation
To: James Greenhalgh
References: <57212B7D.9000807@samsung.com> <20160525101553.GB9511@arm.com>
Cc: GCC Patches, Wilco Dijkstra, Andrew Pinski, "philipp.tomsich@theobroma-systems.com", Benedikt Huber, nd@arm.com
From: Evandro Menezes
Message-id: <5748D0D3.2000001@samsung.com>
Date: Fri, 27 May 2016 17:57:23 -0500
In-reply-to: <20160525101553.GB9511@arm.com>

On 05/25/16 05:15, James Greenhalgh wrote:
> On Wed, Apr 27, 2016 at 04:13:33PM -0500, Evandro Menezes wrote:
>> gcc/
>>     * config/aarch64/aarch64-protos.h
>>     (AARCH64_APPROX_MODE): New macro.
>>     (AARCH64_APPROX_{NONE,SP,DP,DFORM,QFORM,SCALAR,VECTOR,ALL}):
>>     Likewise.
>>     (tune_params): New member "approx_rsqrt_modes".
>>     * config/aarch64/aarch64-tuning-flags.def
>>     (AARCH64_EXTRA_TUNE_APPROX_RSQRT): Remove macro.
>>     * config/aarch64/aarch64.c
>>     (generic_tunings): New member "approx_rsqrt_modes".
>>     (cortexa35_tunings): Likewise.
>>     (cortexa53_tunings): Likewise.
>>     (cortexa57_tunings): Likewise.
>>     (cortexa72_tunings): Likewise.
>>     (exynosm1_tunings): Likewise.
>>     (thunderx_tunings): Likewise.
>>     (xgene1_tunings): Likewise.
>>     (use_rsqrt_p): New argument for the mode and use new member from
>>     "tune_params".
>>     (aarch64_builtin_reciprocal): Devise mode from builtin.
>>     (aarch64_optab_supported_p): New argument for the mode.
>>     * doc/invoke.texi (-mlow-precision-recip-sqrt): Reword description.
>>
>> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
>> index f22a31c..50f1d24 100644
>> --- a/gcc/config/aarch64/aarch64-protos.h
>> +++ b/gcc/config/aarch64/aarch64-protos.h
>> @@ -178,6 +178,32 @@ struct cpu_branch_cost
>>    const int unpredictable; /* Unpredictable branch or optimizing for speed.  */
>>  };
>>
>> +/* Control approximate alternatives to certain FP operators.  */
>> +#define AARCH64_APPROX_MODE(MODE) \
>> +  ((MIN_MODE_FLOAT <= (MODE) && (MODE) <= MAX_MODE_FLOAT) \
>> +   ? (1 << ((MODE) - MIN_MODE_FLOAT)) \
>> +   : (MIN_MODE_VECTOR_FLOAT <= (MODE) && (MODE) <= MAX_MODE_VECTOR_FLOAT) \
>> +     ? (1 << ((MODE) - MIN_MODE_VECTOR_FLOAT \
>> +              + MAX_MODE_FLOAT - MIN_MODE_FLOAT + 1)) \
>> +     : (0))
>> +#define AARCH64_APPROX_NONE (0)
>> +#define AARCH64_APPROX_SP (AARCH64_APPROX_MODE (SFmode) \
>> +                           | AARCH64_APPROX_MODE (V2SFmode) \
>> +                           | AARCH64_APPROX_MODE (V4SFmode))
>> +#define AARCH64_APPROX_DP (AARCH64_APPROX_MODE (DFmode) \
>> +                           | AARCH64_APPROX_MODE (V2DFmode))
>> +#define AARCH64_APPROX_DFORM (AARCH64_APPROX_MODE (SFmode) \
>> +                              | AARCH64_APPROX_MODE (DFmode) \
>> +                              | AARCH64_APPROX_MODE (V2SFmode))
>> +#define AARCH64_APPROX_QFORM (AARCH64_APPROX_MODE (V4SFmode) \
>> +                              | AARCH64_APPROX_MODE (V2DFmode))
>> +#define AARCH64_APPROX_SCALAR (AARCH64_APPROX_MODE (SFmode) \
>> +                               | AARCH64_APPROX_MODE (DFmode))
>> +#define AARCH64_APPROX_VECTOR (AARCH64_APPROX_MODE (V2SFmode) \
>> +                               | AARCH64_APPROX_MODE (V4SFmode) \
>> +                               | AARCH64_APPROX_MODE (V2DFmode))
>> +#define AARCH64_APPROX_ALL (-1)
>> +
> Thanks for providing these various subsets, but I think they are
> unnecessary for the final submission. From what I can see, only
> AARCH64_APPROX_ALL and AARCH64_APPROX_NONE are used. Please remove the
> rest, they are easy enough to add back if a subtarget wants them.

OK

>>  struct tune_params
>>  {
>>    const struct cpu_cost_table *insn_extra_cost;
>> @@ -218,6 +244,7 @@ struct tune_params
>>    } autoprefetcher_model;
>>
>>    unsigned int extra_tuning_flags;
>> +  unsigned int approx_rsqrt_modes;
> As we're going to add a few of these, let's follow the approach for some
> of the other costs (e.g. branch costs, vector costs) and bury them in a
> structure of their own.

OK

>>  };
>>
>>  #define AARCH64_FUSION_PAIR(x, name) \
>> diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def
>> index 7e45a0c..048c2a3 100644
>> --- a/gcc/config/aarch64/aarch64-tuning-flags.def
>> +++ b/gcc/config/aarch64/aarch64-tuning-flags.def
>> @@ -29,5 +29,3 @@
>>       AARCH64_TUNE_ to give an enum name.  */
>>
>>  AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS)
>> -AARCH64_EXTRA_TUNING_OPTION ("approx_rsqrt", APPROX_RSQRT)
>> -
> Did you want to add another way to tune these by command line (not
> necessary now, but as a follow-up)? See how instruction fusion is
> handled by the -moverride code for an example.

I prefer your suggestion a la mode of RS6000, something like -mapprox=.

Thank you,

From 86d7690632d03ec85fd69bfaef8e89c0542518ad Mon Sep 17 00:00:00 2001
From: Evandro Menezes
Date: Thu, 3 Mar 2016 18:13:46 -0600
Subject: [PATCH 1/3] [AArch64] Add more choices for the reciprocal square root approximation

Allow a target to prefer such operation depending on the operation mode.

2016-03-03  Evandro Menezes

gcc/
    * config/aarch64/aarch64-protos.h
    (AARCH64_APPROX_MODE): New macro.
    (AARCH64_APPROX_{NONE,ALL}): Likewise.
    (cpu_approx_modes): New structure.
    (tune_params): New member "approx_modes".
    * config/aarch64/aarch64-tuning-flags.def
    (AARCH64_EXTRA_TUNE_APPROX_RSQRT): Remove macro.
    * config/aarch64/aarch64.c
    ({generic,exynosm1,xgene1}_approx_modes): New core
    "cpu_approx_modes" structures.
    (generic_tunings): New member "approx_modes".
    (cortexa35_tunings): Likewise.
    (cortexa53_tunings): Likewise.
    (cortexa57_tunings): Likewise.
    (cortexa72_tunings): Likewise.
    (exynosm1_tunings): Likewise.
    (thunderx_tunings): Likewise.
    (xgene1_tunings): Likewise.
    (use_rsqrt_p): New argument for the mode and use new member from
    "tune_params".
    (aarch64_builtin_reciprocal): Devise mode from builtin.
    (aarch64_optab_supported_p): New argument for the mode.
    * doc/invoke.texi (-mlow-precision-recip-sqrt): Reword description.
---
 gcc/config/aarch64/aarch64-protos.h         | 19 ++++++++++
 gcc/config/aarch64/aarch64-tuning-flags.def |  2 -
 gcc/config/aarch64/aarch64.c                | 57 ++++++++++++++++++++++-------
 gcc/doc/invoke.texi                         |  2 +-
 4 files changed, 63 insertions(+), 17 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index f22a31c..6156281 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -178,6 +178,23 @@ struct cpu_branch_cost
   const int unpredictable; /* Unpredictable branch or optimizing for speed.  */
 };
 
+/* Control approximate alternatives to certain FP operators.  */
+#define AARCH64_APPROX_MODE(MODE) \
+  ((MIN_MODE_FLOAT <= (MODE) && (MODE) <= MAX_MODE_FLOAT) \
+   ? (1 << ((MODE) - MIN_MODE_FLOAT)) \
+   : (MIN_MODE_VECTOR_FLOAT <= (MODE) && (MODE) <= MAX_MODE_VECTOR_FLOAT) \
+     ? (1 << ((MODE) - MIN_MODE_VECTOR_FLOAT \
+              + MAX_MODE_FLOAT - MIN_MODE_FLOAT + 1)) \
+     : (0))
+#define AARCH64_APPROX_NONE (0)
+#define AARCH64_APPROX_ALL (-1)
+
+/* Allowed modes for approximations.  */
+struct cpu_approx_modes
+{
+  const unsigned int recip_sqrt; /* Reciprocal square root.  */
+};
+
 struct tune_params
 {
   const struct cpu_cost_table *insn_extra_cost;
@@ -218,6 +235,8 @@ struct tune_params
   } autoprefetcher_model;
 
   unsigned int extra_tuning_flags;
+
+  const struct cpu_approx_modes *approx_modes;
 };
 
 #define AARCH64_FUSION_PAIR(x, name) \
diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def
index 7e45a0c..048c2a3 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -29,5 +29,3 @@
      AARCH64_TUNE_ to give an enum name.  */
 
 AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS)
-AARCH64_EXTRA_TUNING_OPTION ("approx_rsqrt", APPROX_RSQRT)
-
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 9995494..e532cfc 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -38,6 +38,7 @@
 #include "recog.h"
 #include "diagnostic.h"
 #include "insn-attr.h"
+#include "insn-modes.h"
 #include "alias.h"
 #include "fold-const.h"
 #include "stor-layout.h"
@@ -393,6 +394,24 @@ static const struct cpu_branch_cost cortexa57_branch_cost =
   3 /* Unpredictable.  */
 };
 
+/* Generic approximation modes.  */
+static const cpu_approx_modes generic_approx_modes =
+{
+  AARCH64_APPROX_NONE /* recip_sqrt  */
+};
+
+/* Approximation modes for Exynos M1.  */
+static const cpu_approx_modes exynosm1_approx_modes =
+{
+  AARCH64_APPROX_ALL /* recip_sqrt  */
+};
+
+/* Approximation modes for Xgene1.  */
+static const cpu_approx_modes xgene1_approx_modes =
+{
+  AARCH64_APPROX_ALL /* recip_sqrt  */
+};
+
 static const struct tune_params generic_tunings =
 {
   &cortexa57_extra_costs,
@@ -414,7 +433,8 @@ static const struct tune_params generic_tunings =
   0, /* max_case_values.  */
   0, /* cache_line_size.  */
   tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NONE) /* tune_flags.  */
+  (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
+  &generic_approx_modes /* approx_modes.  */
 };
 
 static const struct tune_params cortexa35_tunings =
@@ -439,7 +459,8 @@ static const struct tune_params cortexa35_tunings =
   0, /* max_case_values.  */
   0, /* cache_line_size.  */
   tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NONE) /* tune_flags.  */
+  (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
+  &generic_approx_modes /* approx_modes.  */
 };
 
 static const struct tune_params cortexa53_tunings =
@@ -464,7 +485,8 @@ static const struct tune_params cortexa53_tunings =
   0, /* max_case_values.  */
   0, /* cache_line_size.  */
   tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NONE) /* tune_flags.  */
+  (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
+  &generic_approx_modes /* approx_modes.  */
 };
 
 static const struct tune_params cortexa57_tunings =
@@ -489,7 +511,8 @@ static const struct tune_params cortexa57_tunings =
   0, /* max_case_values.  */
   0, /* cache_line_size.  */
   tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS) /* tune_flags.  */
+  (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS), /* tune_flags.  */
+  &generic_approx_modes /* approx_modes.  */
 };
 
 static const struct tune_params cortexa72_tunings =
@@ -514,7 +537,8 @@ static const struct tune_params cortexa72_tunings =
   0, /* max_case_values.  */
   0, /* cache_line_size.  */
   tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NONE) /* tune_flags.  */
+  (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
+  &generic_approx_modes /* approx_modes.  */
 };
 
 static const struct tune_params exynosm1_tunings =
@@ -538,7 +562,8 @@ static const struct tune_params exynosm1_tunings =
   48, /* max_case_values.  */
   64, /* cache_line_size.  */
   tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_APPROX_RSQRT) /* tune_flags.  */
+  (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
+  &exynosm1_approx_modes /* approx_modes.  */
 };
 
 static const struct tune_params thunderx_tunings =
@@ -562,7 +587,8 @@ static const struct tune_params thunderx_tunings =
   0, /* max_case_values.  */
   0, /* cache_line_size.  */
   tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NONE) /* tune_flags.  */
+  (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
+  &generic_approx_modes /* approx_modes.  */
 };
 
 static const struct tune_params xgene1_tunings =
@@ -586,7 +612,8 @@ static const struct tune_params xgene1_tunings =
   0, /* max_case_values.  */
   0, /* cache_line_size.  */
   tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_APPROX_RSQRT) /* tune_flags.  */
+  (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
+  &xgene1_approx_modes /* approx_modes.  */
 };
 
 /* Support for fine-grained override of the tuning structures.  */
@@ -7452,12 +7479,12 @@ aarch64_memory_move_cost (machine_mode mode ATTRIBUTE_UNUSED,
    to optimize 1.0/sqrt.  */
 
 static bool
-use_rsqrt_p (void)
+use_rsqrt_p (machine_mode mode)
 {
   return (!flag_trapping_math
           && flag_unsafe_math_optimizations
-          && ((aarch64_tune_params.extra_tuning_flags
-               & AARCH64_EXTRA_TUNE_APPROX_RSQRT)
+          && ((aarch64_tune_params.approx_modes->recip_sqrt
+               & AARCH64_APPROX_MODE (mode))
               || flag_mrecip_low_precision_sqrt));
 }
 
@@ -7467,7 +7494,9 @@ use_rsqrt_p (void)
 static tree
 aarch64_builtin_reciprocal (tree fndecl)
 {
-  if (!use_rsqrt_p ())
+  machine_mode mode = TYPE_MODE (TREE_TYPE (fndecl));
+
+  if (!use_rsqrt_p (mode))
     return NULL_TREE;
   return aarch64_builtin_rsqrt (DECL_FUNCTION_CODE (fndecl));
 }
@@ -13889,13 +13918,13 @@ aarch64_promoted_type (const_tree t)
 /* Implement the TARGET_OPTAB_SUPPORTED_P hook.  */
 
 static bool
-aarch64_optab_supported_p (int op, machine_mode, machine_mode,
+aarch64_optab_supported_p (int op, machine_mode mode1, machine_mode,
                            optimization_type opt_type)
 {
   switch (op)
     {
     case rsqrt_optab:
-      return opt_type == OPTIMIZE_FOR_SPEED && use_rsqrt_p ();
+      return opt_type == OPTIMIZE_FOR_SPEED && use_rsqrt_p (mode1);
 
     default:
       return true;
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index f1ac257..4340b08 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -12939,7 +12939,7 @@ corresponding flag to the linker.
 When calculating the reciprocal square root approximation,
 uses one less step than otherwise, thus reducing latency and precision.
 This is only relevant if @option{-ffast-math} enables the reciprocal square root
-approximation, which in turn depends on the target processor.
+approximation.
 
 @item -march=@var{name}
 @opindex march
-- 
2.6.3
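
For readers following the mode-mask logic, here is a minimal standalone sketch of how AARCH64_APPROX_MODE assigns one bit per floating-point mode and how a per-core mask is then tested. It is not part of the patch. Every demo_/DEMO_ name is hypothetical, and the enum ordering is an assumption: the real machine_mode values and the MIN_/MAX_ bounds are generated by GCC. The sketch also shifts an unsigned 1 rather than the plain 1 used in the patch, and it models only the mode test, not the flag checks in use_rsqrt_p.

```c
/* Hypothetical stand-in for GCC's machine_mode enumeration.  The real
   values and range bounds are generated by GCC; this ordering is an
   assumption made only for the sketch.  */
enum demo_mode
{
  DEMO_QImode,   /* Integer mode: outside both float ranges.  */
  DEMO_SFmode,
  DEMO_DFmode,
  DEMO_V2SFmode,
  DEMO_V4SFmode,
  DEMO_V2DFmode
};

#define DEMO_MIN_MODE_FLOAT DEMO_SFmode
#define DEMO_MAX_MODE_FLOAT DEMO_DFmode
#define DEMO_MIN_MODE_VECTOR_FLOAT DEMO_V2SFmode
#define DEMO_MAX_MODE_VECTOR_FLOAT DEMO_V2DFmode

/* Same shape as AARCH64_APPROX_MODE: scalar float modes take the low
   bits, vector float modes take the bits right above them, and any
   other mode maps to 0.  */
#define DEMO_APPROX_MODE(MODE) \
  ((DEMO_MIN_MODE_FLOAT <= (MODE) && (MODE) <= DEMO_MAX_MODE_FLOAT) \
   ? (1u << ((MODE) - DEMO_MIN_MODE_FLOAT)) \
   : (DEMO_MIN_MODE_VECTOR_FLOAT <= (MODE) \
      && (MODE) <= DEMO_MAX_MODE_VECTOR_FLOAT) \
     ? (1u << ((MODE) - DEMO_MIN_MODE_VECTOR_FLOAT \
               + DEMO_MAX_MODE_FLOAT - DEMO_MIN_MODE_FLOAT + 1)) \
     : 0u)

#define DEMO_APPROX_NONE 0u
#define DEMO_APPROX_ALL (~0u)

/* Mirrors the role of cpu_approx_modes: one mask per approximated
   operation.  */
struct demo_approx_modes
{
  unsigned int recip_sqrt; /* Reciprocal square root.  */
};

/* Models only the mode test: is the approximation allowed for MODE
   under the given per-core masks?  */
int
demo_rsqrt_allowed_p (const struct demo_approx_modes *modes,
                      enum demo_mode mode)
{
  return (modes->recip_sqrt & DEMO_APPROX_MODE (mode)) != 0;
}
```

With this layout a core can list individual modes, for example DEMO_APPROX_MODE (DEMO_SFmode) | DEMO_APPROX_MODE (DEMO_V4SFmode), while DEMO_APPROX_ALL still rejects non-float modes because the macro yields 0 for them.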