From patchwork Wed Nov 19 13:05:19 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Maciej W. Rozycki" <macro@codesourcery.com>
X-Patchwork-Id: 412379
Return-Path: 
 <gcc-patches-return-385077-incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
	bits)) (No client certificate requested)
	by ozlabs.org (Postfix) with ESMTPS id EC24E140170
	for <incoming@patchwork.ozlabs.org>;
	Thu, 20 Nov 2014 00:05:45 +1100 (AEDT)
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:date
	:from:to:cc:subject:message-id:mime-version:content-type; q=dns;
	s=default; b=IwkZq3qBzSOTJd+4ei5TI/uETsKJoZ3ackb8IrKd9cHIPBMh40
	wm95GcrcfbNU3H8Qpoj0wD0tbeqx31D+z3Uuj/U3eXDdAD4NTO39qYIDFB7fJeq7
	/xMWvEqjWouwOMwD7P0baP4m4Ru6VXZYJuwbg29VqR+ZKjXwQdERmi3Og=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:date
	:from:to:cc:subject:message-id:mime-version:content-type; s=
	default; bh=Midi9fPMtlY6TX76KwIYOhpjfmc=; b=wbceGSKVfYXRQORWOaZk
	NjkUc6cpk3qqPywE15mqV3O1nyoJ5kW4a+D1g7TlPDXNUnObs7JUsWkN+0+G1BWk
	QgjbBqHPUkROwevkgZoo2/UGo80YznV15Fiv1fUuOuM9ecYY77Ci4pxCyPS3cFET
	qIS2x0iwjEcU+s97SsLXazA=
Received: (qmail 17999 invoked by alias); 19 Nov 2014 13:05:37 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: 
 <mailto:gcc-patches-unsubscribe-incoming=patchwork.ozlabs.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 17984 invoked by uid 89); 19 Nov 2014 13:05:36 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.0 required=5.0 tests=AWL, BAYES_00,
	RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.2
X-HELO: relay1.mentorg.com
Received: from relay1.mentorg.com (HELO relay1.mentorg.com) (192.94.38.131)
	by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with
	ESMTP; Wed, 19 Nov 2014 13:05:29 +0000
Received: from nat-ies.mentorg.com ([192.94.31.2]
	helo=SVR-IES-FEM-01.mgc.mentorg.com)	by relay1.mentorg.com
	with esmtp id 1Xr4wn-0001ed-ER from Maciej_Rozycki@mentor.com
	; Wed, 19 Nov 2014 05:05:25 -0800
Received: from localhost (137.202.0.76) by SVR-IES-FEM-01.mgc.mentorg.com
	(137.202.0.104) with Microsoft SMTP Server (TLS) id
	14.3.181.6; Wed, 19 Nov 2014 13:05:23 +0000
Date: Wed, 19 Nov 2014 13:05:19 +0000
From: "Maciej W. Rozycki" <macro@codesourcery.com>
To: <gcc-patches@gcc.gnu.org>
CC: Catherine Moore <clm@codesourcery.com>,
	Eric Christopher	<echristo@gmail.com>,
	Matthew Fortune <matthew.fortune@imgtec.com>
Subject: [PATCH] MIPS16/GCC: Optimise `__call_stub_fp_' call/return stubs
Message-ID: <alpine.DEB.1.10.1411191209550.2881@tp.orcam.me.uk>
User-Agent: Alpine 1.10 (DEB 962 2008-03-14)
MIME-Version: 1.0

Hi,

 It has come to my attention that we create suboptimal code for the 
`__call_stub_fp_' variant of the MIPS16 call stubs.  These stubs are 
used for outgoing calls made from MIPS16 code to standard MIPS code that 
return floating-point results and may also pass floating-point 
arguments.

 This can be illustrated with this small program:

$ cat foo.c
double bar (double d);

int main (int argc, char **argv)
{
  return bar (argc);
}
$ mips-linux-gnu-gcc -O2 -mips16 -c foo.c
$ mips-linux-gnu-objdump -dr foo.o

foo.o:     file format elf32-tradbigmips

Disassembly of section .mips16.call.fp.bar:

00000000 <__call_stub_fp_bar>:
   0:	44856000 	mtc1	a1,$f12
   4:	44846800 	mtc1	a0,$f13
   8:	03e09021 	move	s2,ra
   c:	0c000000 	jal	0 <__call_stub_fp_bar>
			c: R_MIPS_26	bar
  10:	00000000 	nop
  14:	44030000 	mfc1	v1,$f0
  18:	02400008 	jr	s2
  1c:	44020800 	mfc1	v0,$f1

Disassembly of section .text.startup:

00000000 <main>:
   0:	f100 64c4 	save	32,ra,s2
   4:	1800 0000 	jal	0 <main>
			4: R_MIPS16_26	__mips16_floatsidf
   8:	6500      	nop
   a:	67a3      	move	a1,v1
   c:	1800 0000 	jal	0 <main>
			c: R_MIPS16_26	bar
  10:	6782      	move	a0,v0
  12:	67a3      	move	a1,v1
  14:	1800 0000 	jal	0 <main>
			14: R_MIPS16_26	__mips16_fix_truncdfsi
  18:	6782      	move	a0,v0
  1a:	f100 6444 	restore	32,ra,s2
  1e:	e8a0      	jrc	ra

-- as you can see in `__call_stub_fp_bar' above the jump delay slot 
remains empty because the move to $s2 instruction cannot be scheduled 
there due to a data dependency on $ra.

 However there is no need to save $ra last and since the MIPS IV ISA 
there are no coprocessor move delay slots so the last MTC1 instruction 
could be scheduled there instead. 

 So here is a change to avoid this empty delay slot.  With this change 
in place we get this code instead:

$ mips-linux-gnu-objdump -dr foo.o

foo.o:     file format elf32-tradbigmips

Disassembly of section .mips16.call.fp.bar:

00000000 <__call_stub_fp_bar>:
   0:	03e09021 	move	s2,ra
   4:	44856000 	mtc1	a1,$f12
   8:	0c000000 	jal	0 <__call_stub_fp_bar>
			8: R_MIPS_26	bar
   c:	44846800 	mtc1	a0,$f13
  10:	44030000 	mfc1	v1,$f0
  14:	02400008 	jr	s2
  18:	44020800 	mfc1	v0,$f1

Disassembly of section .text.startup:

00000000 <main>:
   0:	f100 64c4 	save	32,ra,s2
   4:	1800 0000 	jal	0 <main>
			4: R_MIPS16_26	__mips16_floatsidf
   8:	6500      	nop
   a:	67a3      	move	a1,v1
   c:	1800 0000 	jal	0 <main>
			c: R_MIPS16_26	bar
  10:	6782      	move	a0,v0
  12:	67a3      	move	a1,v1
  14:	1800 0000 	jal	0 <main>
			14: R_MIPS16_26	__mips16_fix_truncdfsi
  18:	6782      	move	a0,v0
  1a:	f100 6444 	restore	32,ra,s2
  1e:	e8a0      	jrc	ra

-- as you can see the last MTC1 instruction has now been placed in the 
delay slot.

 For ISAs that do have a coprocessor move delay slot there is no gain, 
but no loss either, both delay slots remain empty due to data 
dependencies (coprocessor move delay slots):

$ mips-linux-gnu-gcc -O2 -mips1 -mips16 -c foo.c
$ mips-linux-gnu-objdump -dr foo.o

foo.o:     file format elf32-tradbigmips

Disassembly of section .mips16.call.fp.bar:

00000000 <__call_stub_fp_bar>:
   0:	03e09021 	move	s2,ra
   4:	44856000 	mtc1	a1,$f12
   8:	44846800 	mtc1	a0,$f13
   c:	0c000000 	jal	0 <__call_stub_fp_bar>
			c: R_MIPS_26	bar
  10:	00000000 	nop
  14:	44030000 	mfc1	v1,$f0
  18:	44020800 	mfc1	v0,$f1
  1c:	02400008 	jr	s2
  20:	00000000 	nop

Disassembly of section .text.startup:

00000000 <main>:
   0:	63fc      	addiu	sp,-32
   2:	6772      	move	v1,s2
   4:	6207      	sw	ra,28(sp)
   6:	1800 0000 	jal	0 <main>
			6: R_MIPS16_26	__mips16_floatsidf
   a:	d306      	sw	v1,24(sp)
   c:	67a3      	move	a1,v1
   e:	1800 0000 	jal	0 <main>
			e: R_MIPS16_26	bar
  12:	6782      	move	a0,v0
  14:	67a3      	move	a1,v1
  16:	1800 0000 	jal	0 <main>
			16: R_MIPS16_26	__mips16_fix_truncdfsi
  1a:	6782      	move	a0,v0
  1c:	9606      	lw	a2,24(sp)
  1e:	9707      	lw	a3,28(sp)
  20:	6556      	move	s2,a2
  22:	ef00      	jr	a3
  24:	6304      	addiu	sp,32
  26:	6500      	nop

-- though I think this consideration is actually academic as no legacy 
MIPS16 processors have ever been made that had an FPU I believe (delay 
slots do not matter with software FPU emulation).  The original MIPS16 
implementation, the LSI's TinyRISC cores such as the TR4101 CPU did 
provide a COPx interface including possibly an FPU, but to the best of 
my knowledge none was actually made.  See also the discussion around the 
previous iteration of these optimisations here:

http://gcc.gnu.org/ml/gcc-patches/2013-07/msg01414.html

 This change has been regression-tested with the mips-linux-gnu target 
and these multilibs:

-EB
-EB -mips16
-EB -mmicromips
-EB -mabi=n32
-EB -mabi=64

and the -EL variants of same, with no changes in results.

 FWIW there is no need to make a corresponding change to GDB to 
interpret the modified stub as the debugger ignores and skips over the 
move to $s2 in interpreting stub's code, the instruction can be 
anywhere.

 I have a second optimisation to make here too, but that triggers a 
surprising bug in GNU LD where BFD code meant to discard unused stubs 
appears not to work at all.  So that'll have to be fixed first and it 
also means the other optimisation is unsafe to include in 5.0.  I plan 
to post it shortly anyway for discussion, once I have the linker bug 
fixed.

 Meanwhile, OK to apply this change?

2014-11-19  Maciej W. Rozycki  <macro@codesourcery.com>

	gcc/
	* config/mips/mips.c (mips16_build_call_stub): Move the save of
	the return address in $18 ahead of passing arguments to FPRs.

  Maciej

gcc-mips16-call-fp-stub-s2.patch

Index: gcc-fsf-trunk-quilt/gcc/config/mips/mips.c
===================================================================
--- gcc-fsf-trunk-quilt.orig/gcc/config/mips/mips.c	2014-11-18 13:28:13.508975765 +0000
+++ gcc-fsf-trunk-quilt/gcc/config/mips/mips.c	2014-11-18 23:33:01.418180571 +0000
@@ -6945,6 +6945,17 @@ mips16_build_call_stub (rtx retval, rtx 
 	  /* "Save" $sp in itself so we don't use the fake CFA.
 	     This is: DW_CFA_val_expression r29, { DW_OP_reg29 }.  */
 	  fprintf (asm_out_file, "\t.cfi_escape 0x16,29,1,0x6d\n");
+
+	  /* Save the return address in $18.  The stub's caller knows
+	     that $18 might be clobbered, even though $18 is usually
+	     a call-saved register.
+
+	     Do it early on in case the last move to a floating-point
+	     register can be scheduled into the delay slot of the
+	     call we are about to make.  */
+	  fprintf (asm_out_file, "\tmove\t%s,%s\n",
+		   reg_names[GP_REG_FIRST + 18],
+		   reg_names[RETURN_ADDR_REGNUM]);
 	}
       else
 	{
@@ -6966,11 +6977,7 @@ mips16_build_call_stub (rtx retval, rtx 
 
       if (fp_ret_p)
 	{
-	  /* Save the return address in $18 and call the non-MIPS16 function.
-	     The stub's caller knows that $18 might be clobbered, even though
-	     $18 is usually a call-saved register.  */
-	  fprintf (asm_out_file, "\tmove\t%s,%s\n",
-		   reg_names[GP_REG_FIRST + 18], reg_names[RETURN_ADDR_REGNUM]);
+	  /* Now call the non-MIPS16 function.  */
 	  output_asm_insn (MIPS_CALL ("jal", &fn, 0, -1), &fn);
 	  fprintf (asm_out_file, "\t.cfi_register 31,18\n");