From patchwork Mon Jan 16 15:00:48 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Wilco Dijkstra X-Patchwork-Id: 715795 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3v2Gdj11p8z9stc for ; Tue, 17 Jan 2017 02:01:12 +1100 (AEDT) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="xO7lF7wP"; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:references:in-reply-to :content-type:content-transfer-encoding:mime-version; q=dns; s= default; b=oELesFLJ513BVNVme6zBBVACqwLbdHElzNAkz3kg84mHAs1e1UYRW RB1J51WcX0vVa6Hjtt8B4hs7Igi57cBzqG+BqiB/1pUjJo8mhJCzMtEyKZnV142Z cAW/jEgdiUMnk2EB+RCw4tCGpl5xK8ov0ANghheIygf+QwhGsM6Ln8= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:references:in-reply-to :content-type:content-transfer-encoding:mime-version; s=default; bh=WhCf6lkKpjDh+T9lsaGBUQ3MRME=; b=xO7lF7wPC5CI6PX2mkmJT0Wan9/o j4j0xZ/qYCj3vlpMtMaNNnQr5hiX0D0OQDxxSKsbkCox/9TpP/Cxy1q3jnsn3iNv K1uE0UQAZt4UqPbRAPeuhVzTKQ46ObLilGxY4o527Dfu5K/SysKbU6d9riRN7wbb wAgVkMK0uKORKZY= Received: (qmail 94969 invoked by alias); 16 Jan 2017 15:01:03 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 94952 invoked by uid 89); 16 Jan 2017 15:01:02 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.7 required=5.0 tests=AWL, BAYES_00, KAM_LOTSOFHASH, RCVD_IN_DNSWL_NONE, SPF_HELO_PASS, SPF_PASS autolearn=no version=3.3.2 spammy=eh2, pmode, match_operand, Pmode X-HELO: EUR01-DB5-obe.outbound.protection.outlook.com Received: from mail-db5eur01on0048.outbound.protection.outlook.com (HELO EUR01-DB5-obe.outbound.protection.outlook.com) (104.47.2.48) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 16 Jan 2017 15:00:52 +0000 Received: from AM5PR0802MB2610.eurprd08.prod.outlook.com (10.175.46.18) by AM5PR0801MB2082.eurprd08.prod.outlook.com (10.168.158.148) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384_P384) id 15.1.845.12; Mon, 16 Jan 2017 15:00:48 +0000 Received: from AM5PR0802MB2610.eurprd08.prod.outlook.com ([10.175.46.18]) by AM5PR0802MB2610.eurprd08.prod.outlook.com ([10.175.46.18]) with mapi id 15.01.0845.014; Mon, 16 Jan 2017 15:00:49 +0000 From: Wilco Dijkstra To: James Greenhalgh CC: Ramana Radhakrishnan , GCC Patches , nd Subject: Re: [PATCH][AArch64 - v4] Simplify eh_return implementation Date: Mon, 16 Jan 2017 15:00:48 +0000 Message-ID: References: <20170113175441.GA38433@arm.com> , <20170116110506.GB38422@arm.com> In-Reply-To: <20170116110506.GB38422@arm.com> authentication-results: spf=none (sender IP is ) smtp.mailfrom=Wilco.Dijkstra@arm.com; x-ms-office365-filtering-correlation-id: 00e02d7f-10d6-46cb-5a32-08d43e207516 x-microsoft-antispam: UriScan:; BCL:0; PCL:0; RULEID:(22001); SRVR:AM5PR0801MB2082; x-microsoft-exchange-diagnostics: 1; AM5PR0801MB2082; 7:WOhcO29nK0mCtw+4sPzfyNCq89wE4AIvqyA3QK128qVIUN3OU86GFM8jlL9QThN1y51gmvr6tXmZC+QqXECMEOuEm8zwonJpivBQ4/gU9dO1Cm1tIb0OLAXwEmzKaMWSUZVhonaS+5uhXyf9iLxZ/sYKuJCnxibot+YRKpYuWKyfkU8g/vrnwIGzdtDBPz5HjzlyRTmHidmi+UfcxNaLlk7bj02J21NFKteoWRsy0KxpQ7AUG3wRAwfR7nCwmYBbVRGcgCkYWYA+HI+A48iSXGGlcpH8NiEwTGYLtzv+5E1Z/CyUQgaOe5VLI2SQPjxyfl3oD2uKE7kQQDzSF6dNRu98TFgrOFMcr1UelrWJCaQrW30B0nb/cVh1HZZwNYNY8jwUiNWEcl+eXicO77be1zkbGT1/ZhbWlWzoA9gbEKgVZ6CNKfKSZhPgof3byYBC25lBF3iM8SSZeOMsF3PK0w== nodisclaimer: True x-microsoft-antispam-prvs: x-exchange-antispam-report-test: UriScan:(180628864354917); x-exchange-antispam-report-cfa-test: BCL:0; PCL:0; RULEID:(6040375)(601004)(2401047)(5005006)(8121501046)(10201501046)(3002001)(6055026)(6041248)(20161123555025)(20161123560025)(20161123564025)(20161123562025)(6072148); SRVR:AM5PR0801MB2082; BCL:0; PCL:0; RULEID:; SRVR:AM5PR0801MB2082; x-forefront-prvs: 01894AD3B8 x-forefront-antispam-report: SFV:NSPM; SFS:(10009020)(6009001)(7916002)(39860400002)(39450400003)(39410400002)(39840400002)(39850400002)(189002)(54534003)(377424004)(199003)(450100001)(5660300001)(102836003)(110136003)(6436002)(7696004)(122556002)(93886004)(66066001)(6116002)(189998001)(55016002)(3846002)(76176999)(229853002)(4326007)(38730400001)(3280700002)(25786008)(54356999)(50986999)(2906002)(99286003)(77096006)(27001)(6506006)(54906002)(101416001)(3660700001)(8936002)(6636002)(575784001)(92566002)(7736002)(97736004)(33656002)(8676002)(2900100001)(81166006)(81156014)(74316002)(305945005)(68736007)(106356001)(86362001)(9686003)(105586002)(106116001)(2950100002); DIR:OUT; SFP:1101; SCL:1; SRVR:AM5PR0801MB2082; H:AM5PR0802MB2610.eurprd08.prod.outlook.com; FPR:; SPF:None; PTR:InfoNoRecords; MX:1; A:1; LANG:en; received-spf: None (protection.outlook.com: arm.com does not designate permitted sender hosts) spamdiagnosticoutput: 1:99 spamdiagnosticmetadata: NSPM MIME-Version: 1.0 X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-originalarrivaltime: 16 Jan 2017 15:00:48.7975 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-Transport-CrossTenantHeadersStamped: AM5PR0801MB2082 Here is the updated version: This patch simplifies the handling of the EH return value. We force the use of the frame pointer so the return location is always at FP + 8. This means we can emit a simple volatile access in EH_RETURN_HANDLER_RTX without needing md patterns, splitters and frame offset calculations. The new implementation also fixes various bugs in aarch64_final_eh_return_addr, which does not work with -fomit-frame-pointer, alloca or outgoing arguments. Bootstrap OK, GCC Regression OK, OK for trunk? Would it be useful to backport this to GCC6.x? ChangeLog: 2017-01-16 Wilco Dijkstra PR77455 gcc/ * config/aarch64/aarch64.md (eh_return): Remove pattern and splitter. * config/aarch64/aarch64.h (AARCH64_EH_STACKADJ_REGNUM): Remove. (EH_RETURN_HANDLER_RTX): New define. * config/aarch64/aarch64.c (aarch64_frame_pointer_required): Force frame pointer in EH return functions. (aarch64_expand_epilogue): Add barrier for eh_return. (aarch64_final_eh_return_addr): Remove. (aarch64_eh_return_handler_rtx): New function. * config/aarch64/aarch64-protos.h (aarch64_final_eh_return_addr): Remove. (aarch64_eh_return_handler_rtx): New prototype. testsuite/ * gcc.target/aarch64/eh_return.c: New test. diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 29a3bd71151aa4fb7c6728f0fb52e2f3f233f41d..602f54f934a9b62e782154e7f19cafe3b1bf0481 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -358,7 +358,7 @@ int aarch64_hard_regno_mode_ok (unsigned, machine_mode); int aarch64_hard_regno_nregs (unsigned, machine_mode); int aarch64_uxt_size (int, HOST_WIDE_INT); int aarch64_vec_fpconst_pow_of_2 (rtx); -rtx aarch64_final_eh_return_addr (void); +rtx aarch64_eh_return_handler_rtx (void); rtx aarch64_mask_from_zextract_ops (rtx, rtx); const char *aarch64_output_move_struct (rtx *operands); rtx aarch64_return_addr (int, rtx); diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index 5aee9eb5b42d200343484ac4d18d650d26027078..924376223f52e9f614ea2b4bc12bd24d2ab2720a 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -400,9 +400,9 @@ extern unsigned aarch64_architecture_version; #define ASM_DECLARE_FUNCTION_NAME(STR, NAME, DECL) \ aarch64_declare_function_name (STR, NAME, DECL) -/* The register that holds the return address in exception handlers. */ -#define AARCH64_EH_STACKADJ_REGNUM (R0_REGNUM + 4) -#define EH_RETURN_STACKADJ_RTX gen_rtx_REG (Pmode, AARCH64_EH_STACKADJ_REGNUM) +/* For EH returns X4 contains the stack adjustment. */ +#define EH_RETURN_STACKADJ_RTX gen_rtx_REG (Pmode, R4_REGNUM) +#define EH_RETURN_HANDLER_RTX aarch64_eh_return_handler_rtx () /* Don't use __builtin_setjmp until we've defined it. */ #undef DONT_USE_BUILTIN_SETJMP diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 4e5fad912ba24ffeca36e2d601b67e09c005bb54..d2585dc6eb14fc82c952ca8dbeda445d249fdb9b 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -2762,6 +2762,10 @@ aarch64_frame_pointer_required (void) && (!crtl->is_leaf || df_regs_ever_live_p (LR_REGNUM))) return true; + /* Force a frame pointer for EH returns so the return address is at FP+8. */ + if (crtl->calls_eh_return) + return true; + return false; } @@ -3623,7 +3627,8 @@ aarch64_expand_epilogue (bool for_sibcall) + cfun->machine->frame.saved_varargs_size) != 0; /* Emit a barrier to prevent loads from a deallocated stack. */ - if (final_adjust > crtl->outgoing_args_size || cfun->calls_alloca) + if (final_adjust > crtl->outgoing_args_size || cfun->calls_alloca + || crtl->calls_eh_return) { emit_insn (gen_stack_tie (stack_pointer_rtx, stack_pointer_rtx)); need_barrier_p = false; @@ -3691,52 +3696,40 @@ aarch64_expand_epilogue (bool for_sibcall) emit_jump_insn (ret_rtx); } -/* Return the place to copy the exception unwinding return address to. - This will probably be a stack slot, but could (in theory be the - return register). */ -rtx -aarch64_final_eh_return_addr (void) -{ - HOST_WIDE_INT fp_offset; - - aarch64_layout_frame (); +/* Implement EH_RETURN_HANDLER_RTX. EH returns need to either return + normally or return to a previous frame after unwinding. - fp_offset = cfun->machine->frame.frame_size - - cfun->machine->frame.hard_fp_offset; + An EH return uses a single shared return sequence. The epilogue is + exactly like a normal epilogue except that it has an extra input + register (EH_RETURN_STACKADJ_RTX) which contains the stack adjustment + that must be applied after the frame has been destroyed. An extra label + is inserted before the epilogue which initializes this register to zero, + and this is the entry point for a normal return. - if (cfun->machine->frame.reg_offset[LR_REGNUM] < 0) - return gen_rtx_REG (DImode, LR_REGNUM); + An actual EH return updates the return address, initializes the stack + adjustment and jumps directly into the epilogue (bypassing the zeroing + of the adjustment). Since the return address is typically saved on the + stack when a function makes a call, the saved LR must be updated outside + the epilogue. - /* DSE and CSELIB do not detect an alias between sp+k1 and fp+k2. This can - result in a store to save LR introduced by builtin_eh_return () being - incorrectly deleted because the alias is not detected. - So in the calculation of the address to copy the exception unwinding - return address to, we note 2 cases. - If FP is needed and the fp_offset is 0, it means that SP = FP and hence - we return a SP-relative location since all the addresses are SP-relative - in this case. This prevents the store from being optimized away. - If the fp_offset is not 0, then the addresses will be FP-relative and - therefore we return a FP-relative location. */ + This poses problems as the store is generated well before the epilogue, + so the offset of LR is not known yet. Also optimizations will remove the + store as it appears dead, even after the epilogue is generated (as the + base or offset for loading LR is different in many cases). - if (frame_pointer_needed) - { - if (fp_offset) - return gen_frame_mem (DImode, - plus_constant (Pmode, hard_frame_pointer_rtx, UNITS_PER_WORD)); - else - return gen_frame_mem (DImode, - plus_constant (Pmode, stack_pointer_rtx, UNITS_PER_WORD)); - } - - /* If FP is not needed, we calculate the location of LR, which would be - at the top of the saved registers block. */ + To avoid these problems this implementation forces the frame pointer + in eh_return functions so that the location of LR is fixed and known early. + It also marks the store volatile, so no optimization is permitted to + remove the store. */ +rtx +aarch64_eh_return_handler_rtx (void) +{ + rtx tmp = gen_frame_mem (Pmode, + plus_constant (Pmode, hard_frame_pointer_rtx, UNITS_PER_WORD)); - return gen_frame_mem (DImode, - plus_constant (Pmode, - stack_pointer_rtx, - fp_offset - + cfun->machine->frame.saved_regs_size - - 2 * UNITS_PER_WORD)); + /* Mark the store volatile, so no optimization is permitted to remove it. */ + MEM_VOLATILE_P (tmp) = true; + return tmp; } /* Output code to add DELTA to the first argument, and then jump diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index bde42317f1bfd91a9d38a4cfa94d4cedd5246003..b3be106ac6ff1fb77bdc05cb4b0e57102787b87c 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -592,25 +592,6 @@ (define_insn "simple_return" [(set_attr "type" "branch")] ) -(define_insn "eh_return" - [(unspec_volatile [(match_operand:DI 0 "register_operand" "r")] - UNSPECV_EH_RETURN)] - "" - "#" - [(set_attr "type" "branch")] - -) - -(define_split - [(unspec_volatile [(match_operand:DI 0 "register_operand" "")] - UNSPECV_EH_RETURN)] - "reload_completed" - [(set (match_dup 1) (match_dup 0))] - { - operands[1] = aarch64_final_eh_return_addr (); - } -) - (define_insn "*cb1" [(set (pc) (if_then_else (EQL (match_operand:GPI 0 "register_operand" "r") (const_int 0)) diff --git a/gcc/testsuite/gcc.target/aarch64/eh_return.c b/gcc/testsuite/gcc.target/aarch64/eh_return.c new file mode 100644 index 0000000000000000000000000000000000000000..32179488085ed4c84aa77007565c875d09c4197c --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/eh_return.c @@ -0,0 +1,82 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -fno-inline" } */ + +#include +#include + +int val, test, failed; + +int main (void); + +void +eh0 (void *p) +{ + val = (int)(long)p & 7; + if (val) + abort (); +} + +void +eh1 (void *p, int x) +{ + void *q = __builtin_alloca (x); + eh0 (q); + __builtin_eh_return (0, p); +} + +void +eh2a (int a,int b,int c,int d,int e,int f,int g,int h, void *p) +{ + val = a + b + c + d + e + f + g + h + (int)(long)p & 7; +} + +void +eh2 (void *p) +{ + eh2a (val, val, val, val, val, val, val, val, p); + __builtin_eh_return (0, p); +} + + +void +continuation (void) +{ + test++; + main (); +} + +void +fail (void) +{ + failed = 1; + printf ("failed\n"); + continuation (); +} + +void +do_test1 (void) +{ + if (!val) + eh1 (continuation, 100); + fail (); +} + +void +do_test2 (void) +{ + if (!val) + eh2 (continuation); + fail (); +} + +int +main (void) +{ + if (test == 0) + do_test1 (); + if (test == 1) + do_test2 (); + if (failed || test != 2) + exit (1); + exit (0); +}