From patchwork Thu Mar 17 22:12:44 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Evandro Menezes X-Patchwork-Id: 599253 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3qR2fy2q0gz9s9N for ; Fri, 18 Mar 2016 09:13:14 +1100 (AEDT) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b=P+i/kEeK; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :subject:to:cc:message-id:date:mime-version:content-type; q=dns; s=default; b=A9oRtyWe586yOOMmwRk8ggHu/dOfVfC7xn8XjtwgX01TqGZ7zp Hw51HuvGVf1N3xBo50ToJyBiSETze4tjUzqxxxnLt+VbMRKLffqwQAUJQ28sjuXE HGsIigsWu2XjYJSgXDK979YnsQuyKe8hAMSafXFhctjIljTSXABLvgsNA= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :subject:to:cc:message-id:date:mime-version:content-type; s= default; bh=TyL77zZqNwEjldbA/mB+rIevRH8=; b=P+i/kEeKKAOWgMZtsqJd lulA1dbkWLlc+R7Y5CWQibq6eQ/2FVSteKZ6iY3wmR+8GaX4nC0j5tYSdRhKdVh8 7MSwiMYaLUkJZ3S6zl3UPigYnZPH6CQrHneXdXHJLinfgncqEv1XYp8CjQtpR3K6 x4g7MCD79eNKN/lRL6+5a2w= Received: (qmail 81275 invoked by alias); 17 Mar 2016 22:12:58 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 81260 invoked by uid 89); 17 Mar 2016 22:12:58 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=0.9 required=5.0 tests=AWL, BAYES_50, KAM_LAZY_DOMAIN_SECURITY, RP_MATCHES_RCVD, T_HDRS_LCASE, T_MANY_HDRS_LCASE autolearn=no version=3.3.2 spammy=get_mode_inner, A57, a57, UD:aarch64_tune_params.extra_tuning_flags X-HELO: usmailout3.samsung.com Received: from mailout3.w2.samsung.com (HELO usmailout3.samsung.com) (211.189.100.13) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted) ESMTPS; Thu, 17 Mar 2016 22:12:47 +0000 Received: from uscpsbgm1.samsung.com (u114.gpu85.samsung.co.kr [203.254.195.114]) by usmailout3.samsung.com (Oracle Communications Messaging Server 7.0.5.31.0 64bit (built May 5 2014)) with ESMTP id <0O4700K71F19AY60@usmailout3.samsung.com> for gcc-patches@gcc.gnu.org; Thu, 17 Mar 2016 18:12:45 -0400 (EDT) Received: from ussync3.samsung.com ( [203.254.195.83]) by uscpsbgm1.samsung.com (USCPMTA) with SMTP id 90.46.04845.DDB2BE65; Thu, 17 Mar 2016 18:12:45 -0400 (EDT) Received: from [172.31.207.194] ([105.140.31.10]) by ussync3.samsung.com (Oracle Communications Messaging Server 7.0.5.31.0 64bit (built May 5 2014)) with ESMTPA id <0O4700L8JF18CZ40@ussync3.samsung.com>; Thu, 17 Mar 2016 18:12:45 -0400 (EDT) From: Evandro Menezes Subject: [AArch64] Add precision choices for the reciprocal square root approximation To: GCC Patches Cc: James Greenhalgh , Wilco Dijkstra , Andrew Pinski Message-id: <56EB2BDC.30209@samsung.com> Date: Thu, 17 Mar 2016 17:12:44 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1 MIME-version: 1.0 Content-type: multipart/mixed; boundary=------------060904060004090109050007 X-IsSubscribed: yes Add precision choices for the reciprocal square root approximation Allow a target to prefer such operation depending on the FP precision. gcc/ * config/aarch64/aarch64-protos.h (AARCH64_EXTRA_TUNE_APPROX_RSQRT): New macro. * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_APPROX_RSQRT_DF): New mask. (AARCH64_EXTRA_TUNE_APPROX_RSQRT_SF): Likewise. * config/aarch64/aarch64.c (use_rsqrt_p): New argument for the mode. (aarch64_builtin_reciprocal): Devise mode from builtin. (aarch64_optab_supported_p): New argument for the mode. This patch allows a target to choose for which FP precision the reciprocal square root approximation is used. For example, though this approximation is improves the performance noticeably for DF on A57, for SF, not so much, if at all. Feedback appreciated. Thank you, From 95581aefcf324233c3603f4d8232ee18c5836f8a Mon Sep 17 00:00:00 2001 From: Evandro Menezes Date: Thu, 17 Mar 2016 17:00:03 -0500 Subject: [PATCH] Add precision choices for the reciprocal square root approximation Allow a target to prefer such operation depending on the FP precision. gcc/ * config/aarch64/aarch64-protos.h (AARCH64_EXTRA_TUNE_APPROX_RSQRT): New macro. * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_APPROX_RSQRT_DF): New mask. (AARCH64_EXTRA_TUNE_APPROX_RSQRT_SF): Likewise. * config/aarch64/aarch64.c (use_rsqrt_p): New argument for the mode. (aarch64_builtin_reciprocal): Devise mode from builtin. (aarch64_optab_supported_p): New argument for the mode. --- gcc/config/aarch64/aarch64-protos.h | 3 +++ gcc/config/aarch64/aarch64-tuning-flags.def | 3 ++- gcc/config/aarch64/aarch64.c | 23 +++++++++++++++-------- 3 files changed, 20 insertions(+), 9 deletions(-) diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index dced209..58e5d73 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -263,6 +263,9 @@ enum aarch64_extra_tuning_flags }; #undef AARCH64_EXTRA_TUNING_OPTION +#define AARCH64_EXTRA_TUNE_APPROX_RSQRT \ + (AARCH64_EXTRA_TUNE_APPROX_RSQRT_DF | AARCH64_EXTRA_TUNE_APPROX_RSQRT_SF) + extern struct tune_params aarch64_tune_params; HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned); diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def index 7e45a0c..57d9588 100644 --- a/gcc/config/aarch64/aarch64-tuning-flags.def +++ b/gcc/config/aarch64/aarch64-tuning-flags.def @@ -29,5 +29,6 @@ AARCH64_TUNE_ to give an enum name. */ AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS) -AARCH64_EXTRA_TUNING_OPTION ("approx_rsqrt", APPROX_RSQRT) +AARCH64_EXTRA_TUNING_OPTION ("approx_rsqrt", APPROX_RSQRT_DF) +AARCH64_EXTRA_TUNING_OPTION ("approx_rsqrtf", APPROX_RSQRT_SF) diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 12e498d..e651123 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -7440,12 +7440,16 @@ aarch64_memory_move_cost (machine_mode mode ATTRIBUTE_UNUSED, to optimize 1.0/sqrt. */ static bool -use_rsqrt_p (void) +use_rsqrt_p (machine_mode mode) { return (!flag_trapping_math && flag_unsafe_math_optimizations - && ((aarch64_tune_params.extra_tuning_flags - & AARCH64_EXTRA_TUNE_APPROX_RSQRT) + && ((GET_MODE_INNER (mode) == SFmode + && (aarch64_tune_params.extra_tuning_flags + & AARCH64_EXTRA_TUNE_APPROX_RSQRT_SF)) + || (GET_MODE_INNER (mode) == DFmode + && (aarch64_tune_params.extra_tuning_flags + & AARCH64_EXTRA_TUNE_APPROX_RSQRT_DF)) || flag_mrecip_low_precision_sqrt)); } @@ -7455,9 +7459,12 @@ use_rsqrt_p (void) static tree aarch64_builtin_reciprocal (tree fndecl) { - if (!use_rsqrt_p ()) - return NULL_TREE; - return aarch64_builtin_rsqrt (DECL_FUNCTION_CODE (fndecl)); + machine_mode mode = TYPE_MODE (TREE_TYPE (fndecl)); + + if (use_rsqrt_p (mode)) + return aarch64_builtin_rsqrt (DECL_FUNCTION_CODE (fndecl)); + + return NULL_TREE; } typedef rtx (*rsqrte_type) (rtx, rtx); @@ -13952,13 +13959,13 @@ aarch64_promoted_type (const_tree t) /* Implement the TARGET_OPTAB_SUPPORTED_P hook. */ static bool -aarch64_optab_supported_p (int op, machine_mode, machine_mode, +aarch64_optab_supported_p (int op, machine_mode mode1, machine_mode, optimization_type opt_type) { switch (op) { case rsqrt_optab: - return opt_type == OPTIMIZE_FOR_SPEED && use_rsqrt_p (); + return opt_type == OPTIMIZE_FOR_SPEED && use_rsqrt_p (mode1); default: return true; -- 1.9.1