From patchwork Mon Apr 15 17:19:12 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Greta Yorsh
X-Patchwork-Id: 236657
From: "Greta Yorsh"
To: "GCC Patches"
Cc: "Richard Earnshaw", "Ramana Radhakrishnan"
Subject: [PATCH, ARM] Prologue/epilogue using STRD/LDRD in ARM mode
Date: Mon, 15 Apr 2013 18:19:12 +0100
Message-ID: <000d01ce39fd$598f5d80$0cae1880$@yorsh@arm.com>

Generate prologue/epilogue using STRD/LDRD in ARM mode, when the tuning
flag prefer_ldrd_strd is set (for example, for Cortex-A15).

The previous version of this patch was posted for review here:
http://gcc.gnu.org/ml/gcc-patches/2012-10/msg00995.html

The new version includes the following improvements:

(1) For the prologue, it generates STRD whenever possible; otherwise it
generates single-word stores, instead of STM. This allows offset
addressing with STRD, instead of the writeback on every store that the
previous version of this patch used. The epilogue is handled similarly,
with LDRD and single-word loads. To allow the epilogue to return by
loading directly into PC, a separate stack update instruction is emitted
before the final load into PC.

(2) The previous version of this patch caused an ICE in
arm_emit_strd_push when gcc was called with the
"-fno-omit-frame-pointer -mapcs-frame" command-line options. This is
fixed in the attached patch, where arm_emit_strd_push is not called when
TARGET_APCS_FRAME holds (the epilogue already has a similar condition).

(3) The previous version of the patch generated incorrect return
sequences for interrupt functions. This version fixes that by using the
original LDM epilogues for interrupt functions. There is no need to
change the tests gcc.target/arm/interrupt-*.c.

(4) Takes assert statements out of the loop, addressing a comment made
about a related patch, also relevant here.

(5) Improves dwarf info generation.

No regression on qemu for arm-none-eabi cortex-a15.

Bootstrap successful on A15 TC2.

Spec2k shows a slight overall performance improvement (less than 1%) on
Cortex-A15 TC2. Out of 26 benchmarks, 4 show a regression of 2.5% or less
(benchmarks 186, 254, 255, 178). The other benchmarks show improvements
or no change. Size increases overall by 1.4%. There is no clear
correlation between performance and size increase.

Ok for trunk?

Thanks,
Greta

ChangeLog

gcc/

2013-04-15  Greta Yorsh

	* config/arm/arm.c (emit_multi_reg_push): New declaration
	for an existing function.
	(arm_emit_strd_push): New function.
	(arm_expand_prologue): Used here.
	(arm_emit_ldrd_pop): New function.
	(arm_expand_epilogue): Used here.
	(arm_get_frame_offsets): Update condition.
	(arm_emit_multi_reg_pop): Add a special case for load of a single
	register with writeback.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 982487e..833d092 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -173,6 +173,7 @@ static rtx arm_expand_builtin (tree, rtx, rtx, enum machine_mode, int);
 static tree arm_builtin_decl (unsigned, bool);
 static void emit_constant_insn (rtx cond, rtx pattern);
 static rtx emit_set_insn (rtx, rtx);
+static rtx emit_multi_reg_push (unsigned long);
 static int arm_arg_partial_bytes (cumulative_args_t, enum machine_mode,
                                   tree, bool);
 static rtx arm_function_arg (cumulative_args_t, enum machine_mode,
@@ -16690,6 +16691,148 @@ thumb2_emit_strd_push (unsigned long saved_regs_mask)
   return;
 }
 
+/* STRD in ARM mode requires consecutive registers.  This function emits STRD
+   whenever possible, otherwise it emits single-word stores.  The first store
+   also allocates stack space for all saved registers, using writeback with
+   pre-indexed addressing mode.  All other stores use offset addressing.  If no
+   STRD can be emitted, this function emits a sequence of single-word stores,
+   and not an STM as before, because single-word stores provide more freedom
+   in scheduling and can be turned into an STM by peephole optimizations.  */
+static void
+arm_emit_strd_push (unsigned long saved_regs_mask)
+{
+  int num_regs = 0;
+  int i, j, dwarf_index = 0;
+  int offset = 0;
+  rtx dwarf = NULL_RTX;
+  rtx insn = NULL_RTX;
+  rtx tmp, mem;
+
+  /* TODO: More efficient code can be emitted by changing the
+     layout, e.g., first push all pairs that can use STRD to keep the
+     stack aligned, and then push all other registers.  */
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+    if (saved_regs_mask & (1 << i))
+      num_regs++;
+
+  gcc_assert (!(saved_regs_mask & (1 << SP_REGNUM)));
+  gcc_assert (!(saved_regs_mask & (1 << PC_REGNUM)));
+  gcc_assert (num_regs > 0);
+
+  /* Create sequence for DWARF info.  */
+  dwarf = gen_rtx_SEQUENCE (VOIDmode, rtvec_alloc (num_regs + 1));
+
+  /* For dwarf info, we generate explicit stack update.  */
+  tmp = gen_rtx_SET (VOIDmode,
+                     stack_pointer_rtx,
+                     plus_constant (Pmode, stack_pointer_rtx, -4 * num_regs));
+  RTX_FRAME_RELATED_P (tmp) = 1;
+  XVECEXP (dwarf, 0, dwarf_index++) = tmp;
+
+  /* Save registers.  */
+  offset = - 4 * num_regs;
+  j = 0;
+  while (j <= LAST_ARM_REGNUM)
+    if (saved_regs_mask & (1 << j))
+      {
+        if ((j % 2 == 0)
+            && (saved_regs_mask & (1 << (j + 1))))
+          {
+            /* Current register and next register form register pair for
+               which STRD can be generated.  */
+            if (offset < 0)
+              {
+                /* Allocate stack space for all saved registers.  */
+                tmp = plus_constant (Pmode, stack_pointer_rtx, offset);
+                tmp = gen_rtx_PRE_MODIFY (Pmode, stack_pointer_rtx, tmp);
+                mem = gen_frame_mem (DImode, tmp);
+                offset = 0;
+              }
+            else if (offset > 0)
+              mem = gen_frame_mem (DImode,
+                                   plus_constant (Pmode,
+                                                  stack_pointer_rtx,
+                                                  offset));
+            else
+              mem = gen_frame_mem (DImode, stack_pointer_rtx);
+
+            tmp = gen_rtx_SET (DImode, mem, gen_rtx_REG (DImode, j));
+            RTX_FRAME_RELATED_P (tmp) = 1;
+            tmp = emit_insn (tmp);
+
+            /* Record the first store insn.  */
+            if (dwarf_index == 1)
+              insn = tmp;
+
+            /* Generate dwarf info.  */
+            mem = gen_frame_mem (SImode,
+                                 plus_constant (Pmode,
+                                                stack_pointer_rtx,
+                                                offset));
+            tmp = gen_rtx_SET (SImode, mem, gen_rtx_REG (SImode, j));
+            RTX_FRAME_RELATED_P (tmp) = 1;
+            XVECEXP (dwarf, 0, dwarf_index++) = tmp;
+
+            mem = gen_frame_mem (SImode,
+                                 plus_constant (Pmode,
+                                                stack_pointer_rtx,
+                                                offset + 4));
+            tmp = gen_rtx_SET (SImode, mem, gen_rtx_REG (SImode, j + 1));
+            RTX_FRAME_RELATED_P (tmp) = 1;
+            XVECEXP (dwarf, 0, dwarf_index++) = tmp;
+
+            offset += 8;
+            j += 2;
+          }
+        else
+          {
+            /* Emit a single word store.  */
+            if (offset < 0)
+              {
+                /* Allocate stack space for all saved registers.  */
+                tmp = plus_constant (Pmode, stack_pointer_rtx, offset);
+                tmp = gen_rtx_PRE_MODIFY (Pmode, stack_pointer_rtx, tmp);
+                mem = gen_frame_mem (SImode, tmp);
+                offset = 0;
+              }
+            else if (offset > 0)
+              mem = gen_frame_mem (SImode,
+                                   plus_constant (Pmode,
+                                                  stack_pointer_rtx,
+                                                  offset));
+            else
+              mem = gen_frame_mem (SImode, stack_pointer_rtx);
+
+            tmp = gen_rtx_SET (SImode, mem, gen_rtx_REG (SImode, j));
+            RTX_FRAME_RELATED_P (tmp) = 1;
+            tmp = emit_insn (tmp);
+
+            /* Record the first store insn.  */
+            if (dwarf_index == 1)
+              insn = tmp;
+
+            /* Generate dwarf info.  */
+            mem = gen_frame_mem (SImode,
+                                 plus_constant (Pmode,
+                                                stack_pointer_rtx,
+                                                offset));
+            tmp = gen_rtx_SET (SImode, mem, gen_rtx_REG (SImode, j));
+            RTX_FRAME_RELATED_P (tmp) = 1;
+            XVECEXP (dwarf, 0, dwarf_index++) = tmp;
+
+            offset += 4;
+            j += 1;
+          }
+      }
+    else
+      j++;
+
+  /* Attach dwarf info to the first insn we generate.  */
+  gcc_assert (insn != NULL_RTX);
+  add_reg_note (insn, REG_FRAME_RELATED_EXPR, dwarf);
+  RTX_FRAME_RELATED_P (insn) = 1;
+}
+
 /* Generate and emit an insn that we will recognize as a push_multi.
    Unfortunately, since this insn does not reflect very well the actual
    semantics of the operation, we need to annotate the insn for the benefit
@@ -16889,6 +17032,17 @@ arm_emit_multi_reg_pop (unsigned long saved_regs_mask)
     if (saved_regs_mask & (1 << i))
       {
         reg = gen_rtx_REG (SImode, i);
+        if ((num_regs == 1) && emit_update && !return_in_pc)
+          {
+            /* Emit single load with writeback.  */
+            tmp = gen_frame_mem (SImode,
+                                 gen_rtx_POST_INC (Pmode,
+                                                   stack_pointer_rtx));
+            tmp = emit_insn (gen_rtx_SET (VOIDmode, reg, tmp));
+            REG_NOTES (tmp) = alloc_reg_note (REG_CFA_RESTORE, reg, dwarf);
+            return;
+          }
+
         tmp = gen_rtx_SET (VOIDmode,
                            reg,
                            gen_frame_mem
@@ -17120,6 +17274,129 @@ thumb2_emit_ldrd_pop (unsigned long saved_regs_mask)
   return;
 }
 
+/* LDRD in ARM mode needs consecutive registers as operands.  This function
+   emits LDRD whenever possible, otherwise it emits single-word loads.  It uses
+   offset addressing and then generates one separate stack update.  This provides
+   more scheduling freedom, compared to writeback on every load.  However,
+   if the function returns using load into PC directly
+   (i.e., if PC is in SAVED_REGS_MASK), the stack needs to be updated
+   before the last load.  TODO: Add a peephole optimization to recognize
+   the new epilogue sequence as an LDM instruction whenever possible.  TODO: Add
+   peephole optimization to merge the load at stack-offset zero
+   with the stack update instruction using load with writeback
+   in post-index addressing mode.  */
+static void
+arm_emit_ldrd_pop (unsigned long saved_regs_mask)
+{
+  int j = 0;
+  int offset = 0;
+  rtx par = NULL_RTX;
+  rtx dwarf = NULL_RTX;
+  rtx tmp, mem;
+
+  /* Restore saved registers.  */
+  gcc_assert (!((saved_regs_mask & (1 << SP_REGNUM))));
+  j = 0;
+  while (j <= LAST_ARM_REGNUM)
+    if (saved_regs_mask & (1 << j))
+      {
+        if ((j % 2) == 0
+            && (saved_regs_mask & (1 << (j + 1)))
+            && (j + 1) != PC_REGNUM)
+          {
+            /* Current register and next register form register pair for which
+               LDRD can be generated.  PC is always the last register popped, and
+               we handle it separately.  */
+            if (offset > 0)
+              mem = gen_frame_mem (DImode,
+                                   plus_constant (Pmode,
+                                                  stack_pointer_rtx,
+                                                  offset));
+            else
+              mem = gen_frame_mem (DImode, stack_pointer_rtx);
+
+            tmp = gen_rtx_SET (DImode, gen_rtx_REG (DImode, j), mem);
+            RTX_FRAME_RELATED_P (tmp) = 1;
+            tmp = emit_insn (tmp);
+
+            /* Generate dwarf info.  */
+
+            dwarf = alloc_reg_note (REG_CFA_RESTORE,
+                                    gen_rtx_REG (SImode, j),
+                                    NULL_RTX);
+            dwarf = alloc_reg_note (REG_CFA_RESTORE,
+                                    gen_rtx_REG (SImode, j + 1),
+                                    dwarf);
+
+            REG_NOTES (tmp) = dwarf;
+
+            offset += 8;
+            j += 2;
+          }
+        else if (j != PC_REGNUM)
+          {
+            /* Emit a single word load.  */
+            if (offset > 0)
+              mem = gen_frame_mem (SImode,
+                                   plus_constant (Pmode,
+                                                  stack_pointer_rtx,
+                                                  offset));
+            else
+              mem = gen_frame_mem (SImode, stack_pointer_rtx);
+
+            tmp = gen_rtx_SET (SImode, gen_rtx_REG (SImode, j), mem);
+            RTX_FRAME_RELATED_P (tmp) = 1;
+            tmp = emit_insn (tmp);
+
+            /* Generate dwarf info.  */
+            REG_NOTES (tmp) = alloc_reg_note (REG_CFA_RESTORE,
+                                              gen_rtx_REG (SImode, j),
+                                              NULL_RTX);
+
+            offset += 4;
+            j += 1;
+          }
+        else /* j == PC_REGNUM */
+          j++;
+      }
+    else
+      j++;
+
+  /* Update the stack.  */
+  if (offset > 0)
+    {
+      tmp = gen_rtx_SET (Pmode,
+                         stack_pointer_rtx,
+                         plus_constant (Pmode,
+                                        stack_pointer_rtx,
+                                        offset));
+      RTX_FRAME_RELATED_P (tmp) = 1;
+      emit_insn (tmp);
+      offset = 0;
+    }
+
+  if (saved_regs_mask & (1 << PC_REGNUM))
+    {
+      /* Only PC is to be popped.  */
+      par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (2));
+      XVECEXP (par, 0, 0) = ret_rtx;
+      tmp = gen_rtx_SET (SImode,
+                         gen_rtx_REG (SImode, PC_REGNUM),
+                         gen_frame_mem (SImode,
+                                        gen_rtx_POST_INC (SImode,
+                                                          stack_pointer_rtx)));
+      RTX_FRAME_RELATED_P (tmp) = 1;
+      XVECEXP (par, 0, 1) = tmp;
+      par = emit_jump_insn (par);
+
+      /* Generate dwarf info.  */
+      dwarf = alloc_reg_note (REG_CFA_RESTORE,
+                              gen_rtx_REG (SImode, PC_REGNUM),
+                              NULL_RTX);
+      REG_NOTES (par) = dwarf;
+    }
+}
+
 /* Calculate the size of the return value that is passed in registers.  */
 static unsigned
 arm_size_return_regs (void)
@@ -17329,9 +17606,10 @@ arm_get_frame_offsets (void)
               /* If it is safe to use r3, then do so.  This sometimes
                  generates better code on Thumb-2 by avoiding the need to
                  use 32-bit push/pop instructions.  */
-              if (! any_sibcall_uses_r3 ()
+              if (! any_sibcall_uses_r3 ()
                   && arm_size_return_regs () <= 12
-                  && (offsets->saved_regs_mask & (1 << 3)) == 0)
+                  && (offsets->saved_regs_mask & (1 << 3)) == 0
+                  && (TARGET_THUMB2 || !current_tune->prefer_ldrd_strd))
                 {
                   reg = 3;
                 }
@@ -17763,6 +18041,12 @@ arm_expand_prologue (void)
             {
               thumb2_emit_strd_push (live_regs_mask);
             }
+          else if (TARGET_ARM
+                   && !TARGET_APCS_FRAME
+                   && !IS_INTERRUPT (func_type))
+            {
+              arm_emit_strd_push (live_regs_mask);
+            }
           else
             {
               insn = emit_multi_reg_push (live_regs_mask);
@@ -23922,6 +24206,8 @@ arm_expand_epilogue (bool really_return)
         {
           if (TARGET_THUMB2)
             thumb2_emit_ldrd_pop (saved_regs_mask);
+          else if (TARGET_ARM && !IS_INTERRUPT (func_type))
+            arm_emit_ldrd_pop (saved_regs_mask);
           else
             arm_emit_multi_reg_pop (saved_regs_mask);
         }
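
For readers who want to see which stores the new prologue code would pick
for a given register mask, the pairing loop in arm_emit_strd_push can be
modelled outside the compiler. The following is only a rough standalone
sketch under stated assumptions: plan_strd_push and struct push_plan are
hypothetical names that are not part of the patch, the register constants
are redefined locally, and the printed mnemonics are illustrative text,
whereas the real function emits RTL and dwarf notes.

```c
#include <assert.h>
#include <stdio.h>

/* Local stand-ins for the arm.c constants (assumed values: r0..r15).  */
#define LAST_ARM_REGNUM 15
#define SP_REGNUM 13
#define PC_REGNUM 15

/* Hypothetical summary type for this sketch; not part of the patch.  */
struct push_plan
{
  int strd_count;   /* number of doubleword stores */
  int str_count;    /* number of single-word stores */
  int frame_bytes;  /* stack space allocated by the first store */
};

/* Model of the pairing loop: walk the mask from r0 upwards, use STRD for
   an even register whose successor is also saved, and a single-word STR
   otherwise.  Prints one line per store to OUT unless OUT is NULL.  */
static struct push_plan
plan_strd_push (unsigned long saved_regs_mask, FILE *out)
{
  struct push_plan plan = { 0, 0, 0 };
  int num_regs = 0;
  int offset;
  int j = 0;

  for (int i = 0; i <= LAST_ARM_REGNUM; i++)
    if (saved_regs_mask & (1UL << i))
      num_regs++;

  /* Same preconditions as the asserts in the patch.  */
  assert (!(saved_regs_mask & (1UL << SP_REGNUM)));
  assert (!(saved_regs_mask & (1UL << PC_REGNUM)));
  assert (num_regs > 0);

  plan.frame_bytes = 4 * num_regs;
  offset = -4 * num_regs;   /* negative until the first store allocates */

  while (j <= LAST_ARM_REGNUM)
    {
      if (!(saved_regs_mask & (1UL << j)))
        {
          j++;
          continue;
        }

      int pair = (j % 2 == 0) && (saved_regs_mask & (1UL << (j + 1))) != 0;
      char addr[32];

      if (offset < 0)
        {
          /* First store: pre-indexed writeback allocates the whole area.  */
          snprintf (addr, sizeof addr, "[sp, #%d]!", offset);
          offset = 0;
        }
      else
        snprintf (addr, sizeof addr, "[sp, #%d]", offset);

      if (pair)
        {
          if (out)
            fprintf (out, "strd r%d, r%d, %s\n", j, j + 1, addr);
          plan.strd_count++;
          offset += 8;
          j += 2;
        }
      else
        {
          if (out)
            fprintf (out, "str r%d, %s\n", j, addr);
          plan.str_count++;
          offset += 4;
          j += 1;
        }
    }

  return plan;
}
```

Calling plan_strd_push (0x4070, stdout), i.e. with r4, r5, r6 and r14
saved, should report one STRD for the r4/r5 pair with writeback of -16,
followed by single-word stores of r6 and r14 at offsets 8 and 12. Note
that r6 is not paired because r7 is not in the mask, which is the case
the TODO about changing the stack layout refers to.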