From patchwork Mon Dec 29 23:38:10 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Maciej W. Rozycki"
X-Patchwork-Id: 424492
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Date: Mon, 29 Dec 2014 23:38:10 +0000
From: "Maciej W. Rozycki"
CC: Catherine Moore, Eric Christopher, Matthew Fortune
Subject: [PATCH] MIPS16/GCC: Optimise `__call_stub_' call stubs
User-Agent: Alpine 1.10 (DEB 962 2008-03-14)

On Wed, 19 Nov 2014, Maciej W. Rozycki wrote:

>  I have a second optimisation to make here too, but that triggers a
> surprising bug in GNU LD where BFD code meant to discard unused stubs
> appears not to work at all.  So that'll have to be fixed first and it
> also means the other optimisation is unsafe to include in 5.0.  I plan
> to post it shortly anyway for discussion, once I have the linker bug
> fixed.

 For posterity -- optimise plain call `__call_stub_' MIPS16 stubs (where
no FP value is returned) that just jump to (tail-call) the actual
standard MIPS function.  There is no need to jump via a register here
as we know that:

1. By definition the jump target is going to be standard MIPS code (no
   need to relax to JALX ever).

2. We are not linked into PIC code, as PIC code uses libgcc.a's indirect
   stubs instead, so $25 doesn't have to be valid on function entry.

3. If the target function has been compiled to PIC code, then a PIC stub
   will be prepended to the function by LD to load $25 on entry, as
   usual with non-PIC code.

 This shortens the stub from code like:

Disassembly of section .mips16.call.callee_af7:

00000000 <__call_stub_callee_af7>:
   0:   3c190000        lui     t9,0x0
                        0: R_MIPS_HI16  callee_af7
   4:   27390000        addiu   t9,t9,0
                        4: R_MIPS_LO16  callee_af7
   8:   44846000        mtc1    a0,$f12
   c:   44877000        mtc1    a3,$f14
  10:   44867800        mtc1    a2,$f15
  14:   03200008        jr      t9
  18:   00000000        nop

(taken from the `pr23324' test case at `-O0' which also implies `-O1'
for GAS, i.e. no branch swapping and hence the delay-slot NOP) to code
like:

Disassembly of section .mips16.call.callee_af7:

00000000 <__call_stub_callee_af7>:
   0:   44846000        mtc1    a0,$f12
   4:   44877000        mtc1    a3,$f14
   8:   44867800        mtc1    a2,$f15
   c:   08000000        j       0 <__call_stub_callee_af7>
                        c: R_MIPS_26    callee_af7
  10:   00000000        nop

and also helps branch prediction (instruction prefetching at the target
of the jump) where available by avoiding an indirect jump.

 As noted in the previous message cited above, this however triggers a
BFD bug, which I tracked down to a missing feature: call stubs are meant
to be discarded where not needed -- which is where the actual function
called is MIPS16 code -- but that has never been implemented where the
actual function called is local (the symbol referred to binds locally).
This is because the global symbol hash is used internally by MIPS BFD
linker code to check which MIPS16 stubs have to stay and which ought to
be discarded.  Consequently any such stubs associated with local symbols
are left untouched and get through to linker output, wasting storage and
runtime memory space too.

 With the direct jump in place and the actual function being MIPS16
code, the linker then fails, as it cannot relax the jump associated with
the R_MIPS_26 relocation, and bails out:

mips-linux-gnu-ld: pr23324.o: .mips16.call.callee_af7+0xc: Unsupported jump between ISA modes; consider recompiling with interlinking enabled.
mips-linux-gnu-ld: final link failed: Bad value

even though the stub will obviously never execute.

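
 To illustrate why a plain J cannot be fixed up the way a JAL can, here
is a minimal sketch in C.  It is not BFD's actual code: the helper names
`jump_target' and `can_switch_isa_mode' are hypothetical, and only the
opcode values and the target computation are taken from the standard
MIPS32 instruction encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Major opcodes (bits 31..26) of the standard MIPS absolute jumps.  */
enum { OP_J = 0x02, OP_JAL = 0x03, OP_JALX = 0x1d };

/* Target of an absolute jump at address PC: the 26-bit index shifted
   left by two, combined with the top four bits of the delay-slot
   address.  */
static uint32_t
jump_target (uint32_t pc, uint32_t insn)
{
  assert ((pc & 3) == 0);  /* Standard MIPS code is word-aligned.  */
  return ((pc + 4) & 0xf0000000u) | ((insn & 0x03ffffffu) << 2);
}

/* Hypothetical predicate: nonzero if INSN can reach code in the other
   ISA mode, either as it stands (JALX) or after being rewritten from
   JAL to JALX.  J has no mode-switching counterpart.  */
static int
can_switch_isa_mode (uint32_t insn)
{
  uint32_t op = insn >> 26;
  return op == OP_JAL || op == OP_JALX;
}
```

For the optimised stub above, the word 0x08000000 at offset 0xc decodes
to a plain J, for which the predicate returns 0, whereas a JAL
(0x0c000000) would return 1.
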
 With the indirect jump currently produced, the useless stub makes its
way to linker output successfully, with the mode switch taken into
account in the HI16/LO16 relocations associated with the LUI/ADDIU
instruction pair.

 Therefore the BFD issue needs to be fixed before this optimisation can
be made.  Right now I cannot dive into implementing the missing bit
noted above, so I'm just sharing this change so that it can be used in
the future when BFD has been corrected.

2014-12-29  Maciej W. Rozycki

        gcc/
        * config/mips/mips.c (mips16_build_call_stub): Emit a direct
        jump (and omit the address load) rather than a jump-register
        instruction in the tail-call case.

  Maciej

gcc-mips16-call-stub-j.patch
Index: gcc-fsf-trunk-quilt/gcc/config/mips/mips.c
===================================================================
--- gcc-fsf-trunk-quilt.orig/gcc/config/mips/mips.c	2014-11-18 23:33:10.917768628 +0000
+++ gcc-fsf-trunk-quilt/gcc/config/mips/mips.c	2014-11-18 23:33:32.417976370 +0000
@@ -6957,19 +6957,6 @@ mips16_build_call_stub (rtx retval, rtx
                    reg_names[GP_REG_FIRST + 18],
                    reg_names[RETURN_ADDR_REGNUM]);
         }
-      else
-        {
-          /* Load the address of the MIPS16 function into $25.  Do this
-             first so that targets with coprocessor interlocks can use
-             an MFC1 to fill the delay slot.  */
-          if (TARGET_EXPLICIT_RELOCS)
-            {
-              output_asm_insn ("lui\t%^,%%hi(%0)", &fn);
-              output_asm_insn ("addiu\t%^,%^,%%lo(%0)", &fn);
-            }
-          else
-            output_asm_insn ("la\t%^,%0", &fn);
-        }
 
       /* Move the arguments from general registers to floating-point
          registers.  */
@@ -7037,10 +7024,7 @@ mips16_build_call_stub (rtx retval, rtx
           fprintf (asm_out_file, "\t.cfi_endproc\n");
         }
       else
-        {
-          /* Jump to the previously-loaded address.  */
-          output_asm_insn ("jr\t%^", NULL);
-        }
+        output_asm_insn (MIPS_CALL ("j", &fn, 0, -1), &fn);
 
 #ifdef ASM_DECLARE_FUNCTION_SIZE
       ASM_DECLARE_FUNCTION_SIZE (asm_out_file, stubname, stubdecl);