From patchwork Tue Mar 12 09:50:04 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Sandiford X-Patchwork-Id: 1055265 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-497743-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=arm.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="tgtQFDJ8"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 44JVZj6TCvz9s7h for ; Tue, 12 Mar 2019 20:50:19 +1100 (AEDT) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:references:date:in-reply-to:message-id :mime-version:content-type; q=dns; s=default; b=JXcly60H/LeyXSS/ kmwKlQSsAinb75vRcc51ROBCsrpmdPZoDhJ9nuaZYong1IMAQ9319OcnfDvtv56K 4zkSvSS7fOIn7wzuSiq5txtN52UqE7yaLQN/MVuuRr/WUVnrAkhX2+p3Rd3Wbt1o 2nm2P/UKRBl83BMY3QVCJ4oGb0Q= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:references:date:in-reply-to:message-id :mime-version:content-type; s=default; bh=7iwCmFznqKiaHMqzQpDrfc Wehck=; b=tgtQFDJ88syMK3dh7ye+tWs0IqVg7HiLUAiCT+RaoQVjxwOGSGAdAm rdu8cAF9ekivRQoS+jqZhDGYlXNqB1tNBSwwF5l2VJbbw59mg/tVZ/iUJBluIyz9 Dif0U+1r7mUkw0XgOdf5wUosWm/JN1nOwAT08WuT1tnk9hhvcCS4s= Received: (qmail 114634 invoked by alias); 12 Mar 2019 09:50:11 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 114624 invoked by uid 89); 12 Mar 2019 09:50:11 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-8.7 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_ASCII_DIVIDERS, SPF_PASS autolearn=ham version=3.3.1 spammy=optimisation, 2926 X-HELO: foss.arm.com Received: from foss.arm.com (HELO foss.arm.com) (217.140.101.70) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 12 Mar 2019 09:50:09 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id A1E6AA78; Tue, 12 Mar 2019 02:50:07 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.39]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 89DE83F59C; Tue, 12 Mar 2019 02:50:06 -0700 (PDT) From: Richard Sandiford To: Steve Ellcey Mail-Followup-To: Steve Ellcey , "gcc-patches\@gcc.gnu.org" , "Marcus.Shawcroft\@arm.com" , "james.greenhalgh\@arm.com" , "Richard.Earnshaw\@arm.com" , richard.sandiford@arm.com Cc: "gcc-patches\@gcc.gnu.org" , "Marcus.Shawcroft\@arm.com" , "james.greenhalgh\@arm.com" , "Richard.Earnshaw\@arm.com" Subject: New target hook for function scratch registers (PR 89628) References: <0966cb22817a24bf206cbeb9272bb4d441cf9528.camel@marvell.com> <8f7c7e37c841eb9e0f40e1091a72f5e30011d3e3.camel@marvell.com> Date: Tue, 12 Mar 2019 09:50:04 +0000 In-Reply-To: <8f7c7e37c841eb9e0f40e1091a72f5e30011d3e3.camel@marvell.com> (Steve Ellcey's message of "Mon, 11 Mar 2019 16:10:15 +0000") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux) MIME-Version: 1.0 Steve Ellcey writes: > Richard, > > I don't necessarily disagree with anything in your comments and long > term I think that is the right direction, but I wonder if that level of > change is appropriate for GCC Stage 4 which is where we are now. Your > changes would require fixes in shared code, whereas setting > REG_ALLOC_ORDER only affects Aarch64 and seems like a safer change. I guess it's weighing invasiveness in terms of lines of code/location of code vs. invasiveness in terms of the effect the code has. Changing REG_ALLOC_ORDER affects all functinos that use significant numbers of SIMD registers, which could have unexpected knock-on effects like performance regressions or exposing latent bugs. It would also make the RA try V24 before V16 for callers of vector PCS functions, even though they should prefer V16. Keeping the change specific to callees that use the new vector PCS feature seems more conservative from that point of view. > I am not sure how long it would take me to implement something along > the lines of what you are suggesting. OK, in that case, I thought I'd give a go. This patch adds a new target hook that says which registers a function can use without saving them in the prologue and restoring them in the epilogue. By default this is call_used_reg_set. The patch only makes IRA use the hook, and in that context it's just an (important) optimisation. But I think we need the same thing for passes like regrename, where it'll be a correctness issue. I'll follow on with patches there if this one is OK. Tested on aarch64-linux-gnu (with and without SVE) and x86_64-linux-gnu. OK to install? Richard 2019-03-12 Richard Sandiford gcc/ PR target/89628 * target.def (function_scratch_regs): New hook. * doc/tm.texi.in (TARGET_FUNCTION_SCRATCH_REGS): New placeholder. * doc/tm.texi: Regenerate. * targhooks.h (default_function_scratch_regs): Declare. * targhooks.c (default_function_scratch_regs): New function. * emit-rtl.h (rtl_data::scratch_regs): New member variable. * function.c (init_dummy_function_start): Initialize it. (init_function_start): Likewise. * ira-color.c (calculate_saved_nregs): Use it instead of call_used_reg_set. * config/aarch64/aarch64.c (aarch64_function_scratch_regs): New function. (TARGET_FUNCTION_SCRATCH_REGS): Redefine. gcc/testsuite/ PR target/89628 * gcc.target/aarch64/pr89628.c: New test. Index: gcc/target.def =================================================================== --- gcc/target.def 2019-03-08 18:14:26.113008680 +0000 +++ gcc/target.def 2019-03-12 09:34:25.447913041 +0000 @@ -5763,6 +5763,18 @@ The default version of this hook always default_hard_regno_scratch_ok) DEFHOOK +(function_scratch_regs, + "This hook sets @var{regs} to the set of registers that function @var{fn}\n\ +can use without having to save them in the prologue and restore them in the\n\ +epilogue.\n\ +\n\ +By default this set is the same as @code{call_used_reg_set}. Targets only\n\ +need to override the hook if some functions implement alternative ABIs that\n\ +save more registers or fewer registers than normal.", + void, (struct function *fn, HARD_REG_SET *regs), + default_function_scratch_regs) + +DEFHOOK (hard_regno_call_part_clobbered, "This hook should return true if @var{regno} is partly call-saved and\n\ partly call-clobbered, and if a value of mode @var{mode} would be partly\n\ Index: gcc/doc/tm.texi.in =================================================================== --- gcc/doc/tm.texi.in 2019-03-08 18:14:25.577010718 +0000 +++ gcc/doc/tm.texi.in 2019-03-12 09:34:25.443913054 +0000 @@ -1705,6 +1705,11 @@ of @code{CALL_USED_REGISTERS}. @cindex call-used register @cindex call-clobbered register @cindex call-saved register +@hook TARGET_FUNCTION_SCRATCH_REGS + +@cindex call-used register +@cindex call-clobbered register +@cindex call-saved register @hook TARGET_HARD_REGNO_CALL_PART_CLOBBERED @hook TARGET_REMOVE_EXTRA_CALL_PRESERVED_REGS Index: gcc/doc/tm.texi =================================================================== --- gcc/doc/tm.texi 2019-03-08 18:14:25.573010734 +0000 +++ gcc/doc/tm.texi 2019-03-12 09:34:25.443913054 +0000 @@ -1894,6 +1894,19 @@ of @code{CALL_USED_REGISTERS}. @cindex call-used register @cindex call-clobbered register @cindex call-saved register +@deftypefn {Target Hook} void TARGET_FUNCTION_SCRATCH_REGS (struct function *@var{fn}, HARD_REG_SET *@var{regs}) +This hook sets @var{regs} to the set of registers that function @var{fn} +can use without having to save them in the prologue and restore them in the +epilogue. + +By default this set is the same as @code{call_used_reg_set}. Targets only +need to override the hook if some functions implement alternative ABIs that +save more registers or fewer registers than normal. +@end deftypefn + +@cindex call-used register +@cindex call-clobbered register +@cindex call-saved register @deftypefn {Target Hook} bool TARGET_HARD_REGNO_CALL_PART_CLOBBERED (rtx_insn *@var{insn}, unsigned int @var{regno}, machine_mode @var{mode}) This hook should return true if @var{regno} is partly call-saved and partly call-clobbered, and if a value of mode @var{mode} would be partly Index: gcc/targhooks.h =================================================================== --- gcc/targhooks.h 2019-03-08 18:14:25.849009683 +0000 +++ gcc/targhooks.h 2019-03-12 09:34:25.447913041 +0000 @@ -286,5 +286,6 @@ extern bool speculation_safe_value_not_n extern rtx default_speculation_safe_value (machine_mode, rtx, rtx, rtx); extern void default_remove_extra_call_preserved_regs (rtx_insn *, HARD_REG_SET *); +extern void default_function_scratch_regs (struct function *, HARD_REG_SET *); #endif /* GCC_TARGHOOKS_H */ Index: gcc/targhooks.c =================================================================== --- gcc/targhooks.c 2019-03-08 18:14:25.845009699 +0000 +++ gcc/targhooks.c 2019-03-12 09:34:25.447913041 +0000 @@ -2379,4 +2379,10 @@ default_remove_extra_call_preserved_regs { } +void +default_function_scratch_regs (struct function *, HARD_REG_SET *regs) +{ + COPY_HARD_REG_SET (*regs, call_used_reg_set); +} + #include "gt-targhooks.h" Index: gcc/emit-rtl.h =================================================================== --- gcc/emit-rtl.h 2019-03-08 18:15:33.700751752 +0000 +++ gcc/emit-rtl.h 2019-03-12 09:34:25.443913054 +0000 @@ -292,6 +292,9 @@ struct GTY(()) rtl_data { sets them. */ HARD_REG_SET asm_clobbers; + /* Registers that the function can use without having to save them first. */ + HARD_REG_SET scratch_regs; + /* The highest address seen during shorten_branches. */ int max_insn_address; }; Index: gcc/function.c =================================================================== --- gcc/function.c 2019-03-08 18:15:33.660751905 +0000 +++ gcc/function.c 2019-03-12 09:34:25.447913041 +0000 @@ -4875,6 +4875,7 @@ init_dummy_function_start (void) { push_dummy_function (false); prepare_function_start (); + COPY_HARD_REG_SET (crtl->scratch_regs, call_used_reg_set); } /* Generate RTL for the start of the function SUBR (a FUNCTION_DECL tree node) @@ -4888,6 +4889,8 @@ init_function_start (tree subr) initialize_rtl (); prepare_function_start (); + targetm.function_scratch_regs (DECL_STRUCT_FUNCTION (subr), + &crtl->scratch_regs); decide_function_section (subr); /* Warn if this value is an aggregate type, Index: gcc/ira-color.c =================================================================== --- gcc/ira-color.c 2019-03-08 18:15:26.824777889 +0000 +++ gcc/ira-color.c 2019-03-12 09:34:25.447913041 +0000 @@ -1660,7 +1660,7 @@ calculate_saved_nregs (int hard_regno, m ira_assert (hard_regno >= 0); for (i = hard_regno_nregs (hard_regno, mode) - 1; i >= 0; i--) if (!allocated_hardreg_p[hard_regno + i] - && !TEST_HARD_REG_BIT (call_used_reg_set, hard_regno + i) + && !TEST_HARD_REG_BIT (crtl->scratch_regs, hard_regno + i) && !LOCAL_REGNO (hard_regno + i)) nregs++; return nregs; Index: gcc/config/aarch64/aarch64.c =================================================================== --- gcc/config/aarch64/aarch64.c 2019-03-08 18:15:38.228734542 +0000 +++ gcc/config/aarch64/aarch64.c 2019-03-12 09:34:25.427913117 +0000 @@ -1706,6 +1706,19 @@ aarch64_simd_call_p (rtx_insn *insn) return aarch64_simd_decl_p (fndecl); } +/* Implement TARGET_FUNCTION_SCRATCH_REGS. Vector PCS functions have + fewer available scratch registers than normal functions do. */ + +static void +aarch64_function_scratch_regs (function *fn, HARD_REG_SET *scratch_regs) +{ + default_function_scratch_regs (fn, scratch_regs); + if (aarch64_simd_decl_p (fn->decl)) + for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++) + if (FP_SIMD_SAVED_REGNUM_P (regno)) + CLEAR_HARD_REG_BIT (*scratch_regs, regno); +} + /* Implement TARGET_REMOVE_EXTRA_CALL_PRESERVED_REGS. If INSN calls a function that uses the SIMD ABI, take advantage of the extra call-preserved registers that the ABI provides. */ @@ -19208,6 +19221,9 @@ #define TARGET_MODES_TIEABLE_P aarch64_m #define TARGET_HARD_REGNO_CALL_PART_CLOBBERED \ aarch64_hard_regno_call_part_clobbered +#undef TARGET_FUNCTION_SCRATCH_REGS +#define TARGET_FUNCTION_SCRATCH_REGS aarch64_function_scratch_regs + #undef TARGET_REMOVE_EXTRA_CALL_PRESERVED_REGS #define TARGET_REMOVE_EXTRA_CALL_PRESERVED_REGS \ aarch64_remove_extra_call_preserved_regs Index: gcc/testsuite/gcc.target/aarch64/pr89628.c =================================================================== --- /dev/null 2019-03-08 11:40:14.606883727 +0000 +++ gcc/testsuite/gcc.target/aarch64/pr89628.c 2019-03-12 09:34:25.447913041 +0000 @@ -0,0 +1,25 @@ +/* { dg-options "-O2" } */ + +typedef __Float32x4_t vec; + +__attribute__((aarch64_vector_pcs)) +vec f(vec a0, vec a1, vec a2, vec a3, vec a4, vec a5, vec a6, vec a7) +{ + vec t0, t1, t2, t3, t4, t5, t6, t7, s0, s1, s2, s3; + t0 = a0 - a7; + t1 = a1 - a6; + t2 = a2 - a5; + t3 = a3 - a4; + t4 = a4 - a3; + t5 = a5 - a2; + t6 = a6 - a1; + t7 = a7 - a0; + s0 = t0 * t1; + s1 = t2 * t3; + s2 = t4 * t5; + s3 = t6 * t7; + return s0 * s1 * s2 * s3 * a0 * a1 * a2 * a3 * a4 * a5 * a6 * a7; +} + +/* { dg-final { scan-assembler-not '\tldr\t' } } */ +/* { dg-final { scan-assembler-not '\tstr\t' } } */