
MIPS16/GCC: Optimise `__call_stub_' call stubs

Message ID alpine.DEB.1.10.1412262352130.19155@tp.orcam.me.uk
State New

Commit Message

Maciej W. Rozycki Dec. 29, 2014, 11:38 p.m. UTC
On Wed, 19 Nov 2014, Maciej W. Rozycki wrote:

>  I have a second optimisation to make here too, but that triggers a 
> surprising bug in GNU LD where BFD code meant to discard unused stubs 
> appears not to work at all.  So that'll have to be fixed first and it 
> also means the other optimisation is unsafe to include in 5.0.  I plan 
> to post it shortly anyway for discussion, once I have the linker bug 
> fixed.

 For posterity -- optimise plain `__call_stub_' MIPS16 call stubs (where 
no FP value is returned) that just jump to (tail-call) the actual standard 
MIPS function.  There is no need to jump via a register here, as we know 
that:

1. By definition the jump target is going to be standard MIPS code (no 
   need to relax to JALX ever).

2. We are not linked into PIC code, as PIC code uses libgcc.a's indirect 
   stubs instead, so $25 doesn't have to be valid on the function's entry.

3. If the target function has been compiled to PIC code, then LD will 
   prepend a PIC stub to the function to load $25 on entry, as it usually 
   does for calls from non-PIC code.

This shortens the stub from code like:

Disassembly of section .mips16.call.callee_af7:

00000000 <__call_stub_callee_af7>:
   0:	3c190000 	lui	t9,0x0
			0: R_MIPS_HI16	callee_af7
   4:	27390000 	addiu	t9,t9,0
			4: R_MIPS_LO16	callee_af7
   8:	44846000 	mtc1	a0,$f12
   c:	44877000 	mtc1	a3,$f14
  10:	44867800 	mtc1	a2,$f15
  14:	03200008 	jr	t9
  18:	00000000 	nop

(taken from the `pr23324' test case at `-O0' which also implies `-O1' for 
GAS, i.e. no branch swapping and hence the delay-slot NOP) to code like:

Disassembly of section .mips16.call.callee_af7:

00000000 <__call_stub_callee_af7>:
   0:	44846000 	mtc1	a0,$f12
   4:	44877000 	mtc1	a3,$f14
   8:	44867800 	mtc1	a2,$f15
   c:	08000000 	j	0 <__call_stub_callee_af7>
			c: R_MIPS_26	callee_af7
  10:	00000000 	nop

and also helps branch prediction (instruction prefetching at the target of 
the jump) where available by avoiding an indirect jump.

 As noted in the previous message cited above, this however triggers a BFD 
bug, which I tracked down to a missing feature: call stubs are meant to be 
discarded where not needed -- that is, where the actual function called is 
MIPS16 code -- but that has never been implemented for the case where the 
actual function called is local (the symbol referred to binds locally).  
This is because MIPS BFD linker code uses the global symbol hash internally 
to check which MIPS16 stubs have to stay and which ought to be discarded.
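A toy model may make the failure mode clearer.  Everything below (`struct global_sym', `lookup_global', `stub_discarded', `stub_should_be_discarded') is hypothetical and deliberately simplified; it is not BFD's code or data layout, only an illustration of the behaviour described above: a stub is discarded only when its target is found in the global table as MIPS16 code, so a stub for a local MIPS16 function never is.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a global symbol hash entry (hypothetical).  */
struct global_sym
{
  const char *name;
  bool is_mips16;   /* the function itself is MIPS16 code */
};

/* Linear search standing in for the global symbol hash lookup.
   Local symbols never appear in this table.  */
static const struct global_sym *
lookup_global (const struct global_sym *tab, size_t n, const char *name)
{
  assert (tab != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (strcmp (tab[i].name, name) == 0)
      return &tab[i];
  return NULL;
}

/* Behaviour described above: a call stub is discarded only if its
   target is found in the global table and is MIPS16 code.  */
static bool
stub_discarded (const struct global_sym *tab, size_t n, const char *target)
{
  const struct global_sym *h = lookup_global (tab, n, target);
  return h != NULL && h->is_mips16;
}

/* Intended behaviour: discard whenever the target is MIPS16 code,
   whether its symbol is global or binds locally.  */
static bool
stub_should_be_discarded (bool target_is_mips16)
{
  return target_is_mips16;
}
```

In this model a local `callee_af7' is absent from the table, so its stub survives even when the function is MIPS16 code and `stub_should_be_discarded' says it should go.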

 Consequently any such stubs associated with local symbols are left 
untouched and get through to linker output, wasting storage and runtime 
memory space too.  When the actual function is MIPS16 code, the linker 
then fails as it cannot relax the jump associated with the R_MIPS_26 
relocation and bails out:

mips-linux-gnu-ld: pr23324.o: .mips16.call.callee_af7+0xc: Unsupported jump between ISA modes; consider recompiling with interlinking enabled.
mips-linux-gnu-ld: final link failed: Bad value

even though the stub will obviously never execute.  With the indirect jump 
currently produced, the useless stub makes its way to linker output 
successfully, as the mode switch is taken into account in the HI16/LO16 
relocations associated with the LUI/ADDIU instruction pair.
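The difference between the two stub forms comes down to the ISA bit, the lowest address bit, which is set for MIPS16 code.  The LUI/ADDIU pair rebuilds the full 32-bit address, ISA bit included, and JR switches mode according to bit 0, whereas J's 26-bit field only holds address bits 27..2 and J never changes mode.  The helpers below are hypothetical and assume 32-bit addresses; `hi16' and `lo16' follow the usual %hi/%lo definitions, where %hi is biased by 0x8000 to compensate for ADDIU sign-extending its immediate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ISA_BIT 1u   /* address bit 0: set for MIPS16 code */

/* %hi: biased by 0x8000 because ADDIU sign-extends its immediate.  */
static uint32_t
hi16 (uint32_t value)
{
  return ((value + 0x8000u) >> 16) & 0xffffu;
}

/* %lo: the low 16 bits, ISA bit included.  */
static uint32_t
lo16 (uint32_t value)
{
  return value & 0xffffu;
}

/* Value left in $25 by `lui $25,HI' followed by `addiu $25,$25,LO'.  */
static uint32_t
lui_addiu (uint32_t hi, uint32_t lo)
{
  assert (hi <= 0xffffu && lo <= 0xffffu);
  /* Sign-extend LO without implementation-defined conversions.  */
  uint32_t lo_ext = (lo ^ 0x8000u) - 0x8000u;
  return (hi << 16) + lo_ext;
}

/* J has no room for bit 0 and never switches ISA mode, so an R_MIPS_26
   relocation against a MIPS16 target cannot be resolved for `j'.  */
static bool
j_target_ok (uint32_t target)
{
  return (target & ISA_BIT) == 0;
}
```

For a MIPS16 function at 0x00408000 the pair rebuilds 0x00408001 exactly, so `jr' enters MIPS16 mode, while `j' has no way to express the switch.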

 Therefore the BFD issue needs to be fixed before this optimisation can be 
made.  Right now I cannot dive into implementing the missing bit noted 
above, so I'm just sharing this change so that it can be used in the 
future, once BFD has been corrected.

2014-12-29  Maciej W. Rozycki  <macro@codesourcery.com>

	gcc/
	* config/mips/mips.c (mips16_build_call_stub): Emit a direct
	jump (and omit the address load) rather than a jump-register
	instruction in the tail-call case.

  Maciej

gcc-mips16-call-stub-j.patch

Patch

Index: gcc-fsf-trunk-quilt/gcc/config/mips/mips.c
===================================================================
--- gcc-fsf-trunk-quilt.orig/gcc/config/mips/mips.c	2014-11-18 23:33:10.917768628 +0000
+++ gcc-fsf-trunk-quilt/gcc/config/mips/mips.c	2014-11-18 23:33:32.417976370 +0000
@@ -6957,19 +6957,6 @@  mips16_build_call_stub (rtx retval, rtx 
 		   reg_names[GP_REG_FIRST + 18],
 		   reg_names[RETURN_ADDR_REGNUM]);
 	}
-      else
-	{
-	  /* Load the address of the MIPS16 function into $25.  Do this
-	     first so that targets with coprocessor interlocks can use
-	     an MFC1 to fill the delay slot.  */
-	  if (TARGET_EXPLICIT_RELOCS)
-	    {
-	      output_asm_insn ("lui\t%^,%%hi(%0)", &fn);
-	      output_asm_insn ("addiu\t%^,%^,%%lo(%0)", &fn);
-	    }
-	  else
-	    output_asm_insn ("la\t%^,%0", &fn);
-	}
 
       /* Move the arguments from general registers to floating-point
 	 registers.  */
@@ -7037,10 +7024,7 @@  mips16_build_call_stub (rtx retval, rtx 
 	  fprintf (asm_out_file, "\t.cfi_endproc\n");
 	}
       else
-	{
-	  /* Jump to the previously-loaded address.  */
-	  output_asm_insn ("jr\t%^", NULL);
-	}
+	output_asm_insn (MIPS_CALL ("j", &fn, 0, -1), &fn);
 
 #ifdef ASM_DECLARE_FUNCTION_SIZE
       ASM_DECLARE_FUNCTION_SIZE (asm_out_file, stubname, stubdecl);