Date: Mon, 29 Jul 2013 18:33:15 +0100
From: "Maciej W. Rozycki"
CC: Richard Sandiford
Subject: [PATCH] libgcc/MIPS: Fill in delay slots of (some) MIPS16 call stubs

Hi,

A shortcoming of older versions of GAS prevents branch swapping if the instruction to be reordered into a branch delay slot immediately follows the delay slot of another branch. This happens to hit some MIPS16 call stubs, e.g. (from libgcc.a):

00000000 <__mips16_call_stub_sf_0>:
   0:   03e09021        move    s2,ra
   4:   0040f809        jalr    v0
   8:   0040c821        move    t9,v0
   c:   44020000        mfc1    v0,$f0
  10:   02400008        jr      s2
  14:   00000000        nop

The shortcoming has recently been lifted, but I gather GCC generally wants to (and does) schedule delay slots manually elsewhere, so why not do so here as well.

The piece of code above is generated from libgcc/config/mips/mips16.S with a macro called DELAYf(), meant for pieces of code that read from an FPR. There is a complementary macro called DELAYt(), for writing to an FPR, which does schedule the delay slot manually. The reason for this arrangement is, I believe, the possibility that a read from CP1 requires another instruction to complete before the value read is available in the destination GPR (a coprocessor move delay slot).
I believe the only legacy MIPS processors that implemented the MIPS16 ASE in its original variant (i.e. with no compact jumps, no SAVE/RESTORE, and no sign/zero-extension instructions) were LSI's TinyRISC cores. It is unclear to me from the TinyRISC documentation whether these cores suffered from the coprocessor move delay slot. They featured a short three-stage pipeline with a bypass that made data from memory loads available to the immediately following instruction if needed, in parallel with the destination register write-back, to avoid load delay slots. Unfortunately the documentation does not say whether such a bypass was available for coprocessor moves, even though these instructions are said to have the very same pipeline stages as memory loads. It is therefore safest to assume that coprocessor move delay slots were required.

OTOH no modern MIPS architecture processor requires coprocessor move delay slots (the requirement was already lifted with the legacy MIPS IV ISA), hence the current arrangement incurs unnecessary text space consumption and a performance hit for all modern targets. This is all the more so because in many cases the next instruction executed after the branch delay slot will not access the GPR anyway, and thus will not cause a pipeline stall even on a less efficient implementation of the architecture.

This change therefore enables manual delay-slot scheduling of move-from-CP1 instructions whenever the stubs are built for the MIPS IV or a newer ISA. It makes the stub above look like this:

00000000 <__mips16_call_stub_sf_0>:
   0:   03e09021        move    s2,ra
   4:   0040f809        jalr    v0
   8:   0040c821        move    t9,v0
   c:   02400008        jr      s2
  10:   44020000        mfc1    v0,$f0

These stubs are, I believe, not really covered by our testing, because they require a mixed standard-MIPS/MIPS16 environment. I have therefore verified by inspection that the libgcc.a object code is still correct after this change, i.e.
no change at all with current GAS (that otherwise schedules these move-from-CP1 instructions into the following jump's delay slot automatically) and the expected improved code with old GAS (that otherwise inserts a NOP into that delay slot instead).

OK to apply?

2013-07-29  Maciej W. Rozycki

	libgcc/
	* config/mips/mips16.S (DELAYf): Alias to DELAYt for the MIPS IV
	ISA and up.

  Maciej

gcc-mips16-stub-delay-slot.patch
Index: gcc-fsf-trunk-quilt/libgcc/config/mips/mips16.S
===================================================================
--- gcc-fsf-trunk-quilt.orig/libgcc/config/mips/mips16.S	2013-03-27 15:20:54.000000000 +0000
+++ gcc-fsf-trunk-quilt/libgcc/config/mips/mips16.S	2013-07-13 02:40:38.300930313 +0100
@@ -89,8 +89,13 @@ see the files COPYING3 and COPYING.RUNTI
 	OPCODE, OP2;			\
 	.set	reorder
 
+#if __mips >= 4
+/* Coprocessor moves are interlocked from the MIPS IV ISA up.  */
+#define DELAYf(T, OPCODE, OP2) DELAYt (T, OPCODE, OP2)
+#else
 /* Use "OPCODE. OP2" and jump to T.  */
 #define DELAYf(T, OPCODE, OP2) OPCODE, OP2; jr T
+#endif
 
 /* MOVE_SF_BYTE0(D)
 	Move the first single-precision floating-point argument between