libgcc/MIPS: Fill in delay slots of (some) MIPS16 call stubs
diff mbox

Message ID alpine.DEB.1.10.1307291809530.32382@tp.orcam.me.uk
State Accepted
Headers show

Commit Message

Maciej W. Rozycki July 29, 2013, 5:33 p.m. UTC
Hi,

 A shortcoming of older versions of GAS makes branch swapping not happen 
if the instruction to be reordered into a branch delay slot immediately 
follows a delay slot of another branch.  This happens to hit some MIPS16 
call stubs, e.g. (from libgcc.a):

00000000 <__mips16_call_stub_sf_0>:
   0:	03e09021 	move	s2,ra
   4:	0040f809 	jalr	v0
   8:	0040c821 	move	t9,v0
   c:	44020000 	mfc1	v0,$f0
  10:	02400008 	jr	s2
  14:	00000000 	nop

The shortcoming has been recently lifted, but I gather GCC generally wants 
to (and does) schedule delay slots elsewhere manually, so why not to do so 
here as well.

 The piece of code above is generated from libgcc/config/mips/mips16.S 
with a macro called DELAYf() meant for pieces that read from an FPR.  
There's a complementing macro called DELAYt() to write an FPR that does 
schedule the delay slot manually.  The reason for such an arrangement is I 
believe a possibility that a read from CP1 may require another instruction 
to complete before the value read is available in the destination GPR (a 
coprocessor move delay slot).

 I believe the only legacy MIPS processors that implemented the MIPS16 ASE 
in its original variation (i.e. with no compact jumps, no SAVE/RESTORE, 
and no extend instructions) were the LSI's TinyRISC cores.  It's unclear 
to me from TinyRISC documentation whether these cores suffered from the 
coprocessor move delay slot.  They featured a short three-stage pipeline 
that had a bypass implemented to make data from memory loads available to 
the immediately following instruction if needed, in parallel to the 
destination register write back, to avoid load delay slots.  
Unfortunately documentation does not mention whether such a bypass was 
available for coprocessor moves or not, even though the instructions are 
said to have the very same pipeline stages as memory moves.  It is 
therefore safe to assume coprocessor move delay slots were required.

 OTOH no modern MIPS architecture processor requires coprocessor move 
delay slots (they were lifted with the MIPS IV ISA legacy ISA already), 
hence the current arrangement incurs unnecessary text space consumption 
and a performance hit for all the modern targets.  Especially as in many 
cases the cases the next instruction executed after the branch delay slot 
will not access the GPR anyway and thus will not cause any potential 
pipeline stall even with any less efficient architecture implementations.

 This change therefore enables manual delay-slot scheduling of 
move-from-CP1 instructions whenever the stubs are built for the MIPS IV or 
a newer ISA. It makes the stub above look like this:

00000000 <__mips16_call_stub_sf_0>:
   0:   03e09021        move    s2,ra
   4:   0040f809        jalr    v0
   8:   0040c821        move    t9,v0
   c:   02400008        jr      s2
  10:   44020000        mfc1    v0,$f0

 These stubs are I believe not really covered in our testing, because they 
require a mixed standard-MIPS/MIPS16 environment.  I have therefore 
verified libgcc.a object code by inspection to be still correct after this 
change, i.e. no change at all with current GAS (that otherwise schedules 
these move-from-CP1 instructions into the following jump's delay slot 
automatically) and the expected improved code with old GAS (that otherwise 
inserts a NOP into that delay slot instead).

 OK to apply?

2013-07-29  Maciej W. Rozycki  <macro@codesourcery.com>

	libgcc/
	* config/mips/mips16.S (DELAYf): Alias to DELAYt for the MIPS IV 
	ISA and up.

  Maciej

gcc-mips16-stub-delay-slot.patch

Comments

Richard Sandiford July 29, 2013, 7:19 p.m. UTC | #1
"Maciej W. Rozycki" <macro@codesourcery.com> writes:
>  I believe the only legacy MIPS processors that implemented the MIPS16 ASE 
> in its original variation (i.e. with no compact jumps, no SAVE/RESTORE, 
> and no extend instructions) were the LSI's TinyRISC cores.

Ah, hadn't realised that the original version had no EXTEND instruction.
Which TinyRISC iteration was that?  The LR4102 had it AFAIK.

FWIW GCC and binutils always assume that EXTEND is available and just
make a distinction between "original MIPS16" (which sounds like it might
not be as original as I'd thought) and "MIPS16e".  The VR4121 and VR4131
were also in the "original MIPS16" category (no compact jumps, SAVE/RESTORE,
etc.).  But they also had no FPU.

>  These stubs are I believe not really covered in our testing, because they 
> require a mixed standard-MIPS/MIPS16 environment.

What's the barrier to testing a mixed environment?  The normal *-linux-gnu
configurations have no MIPS16 multilibs, so you should be able to test it
on a plain mips-linux-gnu configuration using --target_flags unix/-mips16.
FWIW I've been using the mips64-linux-gnu equivalent
(--target_flags unix/-mabi=32/-mips16) without problems.  

Or if you don't want to test on GNU/Linux, you should be able to use something
like mips64-elf configured with whichever --with-arch= you like (and an
appropriate simulator).  Long time since I tried that though.  I prefer
testing on GNU/Linux because it also covers libgcc.so symbol visibility
and has better libgfortran support.

> 	libgcc/
> 	* config/mips/mips16.S (DELAYf): Alias to DELAYt for the MIPS IV 
> 	ISA and up.

OK, thanks, but please do run it through the testsuite first.

Richard
Richard Sandiford July 29, 2013, 7:31 p.m. UTC | #2
Richard Sandiford <rdsandiford@googlemail.com> writes:
> "Maciej W. Rozycki" <macro@codesourcery.com> writes:
>>  I believe the only legacy MIPS processors that implemented the MIPS16 ASE 
>> in its original variation (i.e. with no compact jumps, no SAVE/RESTORE, 
>> and no extend instructions) were the LSI's TinyRISC cores.
>
> Ah, hadn't realised that the original version had no EXTEND instruction.
> Which TinyRISC iteration was that?  The LR4102 had it AFAIK.

*sigh*.  I only just realised you meant the _register_ extend instructions
(SEB, etc.).  Sorry about that.

Richard
Maciej W. Rozycki July 29, 2013, 9:20 p.m. UTC | #3
On Mon, 29 Jul 2013, Richard Sandiford wrote:

> >  I believe the only legacy MIPS processors that implemented the MIPS16 ASE 
> > in its original variation (i.e. with no compact jumps, no SAVE/RESTORE, 
> > and no extend instructions) were the LSI's TinyRISC cores.
> 
> Ah, hadn't realised that the original version had no EXTEND instruction.
> Which TinyRISC iteration was that?  The LR4102 had it AFAIK.
> 
> FWIW GCC and binutils always assume that EXTEND is available and just
> make a distinction between "original MIPS16" (which sounds like it might
> not be as original as I'd thought) and "MIPS16e".  The VR4121 and VR4131
> were also in the "original MIPS16" category (no compact jumps, SAVE/RESTORE,
> etc.).  But they also had no FPU.

 I meant the sign- and zero-extend instructions indeed, not the EXTEND 
opcode (not really a distinct instruction, more like a prefix), sorry for 
not being clear enough.  Thanks for the reminder of the NEC silicon.

> >  These stubs are I believe not really covered in our testing, because they 
> > require a mixed standard-MIPS/MIPS16 environment.
> 
> What's the barrier to testing a mixed environment?  The normal *-linux-gnu
> configurations have no MIPS16 multilibs, so you should be able to test it
> on a plain mips-linux-gnu configuration using --target_flags unix/-mips16.
> FWIW I've been using the mips64-linux-gnu equivalent
> (--target_flags unix/-mabi=32/-mips16) without problems.  
> 
> Or if you don't want to test on GNU/Linux, you should be able to use something
> like mips64-elf configured with whichever --with-arch= you like (and an
> appropriate simulator).  Long time since I tried that though.  I prefer
> testing on GNU/Linux because it also covers libgcc.so symbol visibility
> and has better libgfortran support.

 We don't have specific coverage, but something in the testsuite might 
happen to pull one or more of these thunks indeed.

> > 	libgcc/
> > 	* config/mips/mips16.S (DELAYf): Alias to DELAYt for the MIPS IV 
> > 	ISA and up.
> 
> OK, thanks, but please do run it through the testsuite first.

 I'll see if I can do it -- the MIPS/Linux tree I used for recent MIPS32r2 
MADD.fmt testing has no MIPS16 multilibs configured, so it might happen to 
just work with -mips16 passed as an extra option (otherwise MIPS16 libs 
would be automagically picked instead).  I'll check if binaries executeed 
really pulled any of the thunks concerned.

  Maciej
Maciej W. Rozycki July 30, 2013, 6:36 p.m. UTC | #4
On Mon, 29 Jul 2013, Maciej W. Rozycki wrote:

> > >  These stubs are I believe not really covered in our testing, because they 
> > > require a mixed standard-MIPS/MIPS16 environment.
> > 
> > What's the barrier to testing a mixed environment?  The normal *-linux-gnu
> > configurations have no MIPS16 multilibs, so you should be able to test it
> > on a plain mips-linux-gnu configuration using --target_flags unix/-mips16.
> > FWIW I've been using the mips64-linux-gnu equivalent
> > (--target_flags unix/-mabi=32/-mips16) without problems.  
> > 
> > Or if you don't want to test on GNU/Linux, you should be able to use something
> > like mips64-elf configured with whichever --with-arch= you like (and an
> > appropriate simulator).  Long time since I tried that though.  I prefer
> > testing on GNU/Linux because it also covers libgcc.so symbol visibility
> > and has better libgfortran support.
> 
>  We don't have specific coverage, but something in the testsuite might 
> happen to pull one or more of these thunks indeed.

 It is as I feared, the coverage is sparse.  The only call/return stubs 
pulled across the testsuite are:

__mips16_call_stub_sf_5
__mips16_call_stub_sc_0
__mips16_call_stub_df_0
__mips16_call_stub_df_1
__mips16_call_stub_df_2
__mips16_call_stub_df_10

(there are some call stubs too and some PIC stubs, but they all use 
different code).  Of these only the first suffers from the old GAS 
shortcoming:

00000000 <__mips16_call_stub_sf_5>:
   0:	03e09021 	move	s2,ra
   4:	44846000 	mtc1	a0,$f12
   8:	44857000 	mtc1	a1,$f14
   c:	0040f809 	jalr	v0
  10:	0040c821 	move	t9,v0
  14:	44020000 	mfc1	v0,$f0
  18:	02400008 	jr	s2
  1c:	00000000 	nop

as only stubs that return a single float result are affected (ones that 
return a single complex or a double result are not, because they use more 
than one CP1 move in their epilogues and old GAS is capable enough to 
schedule the last move into the JR's delay slot itself).

> > > 	libgcc/
> > > 	* config/mips/mips16.S (DELAYf): Alias to DELAYt for the MIPS IV 
> > > 	ISA and up.
> > 
> > OK, thanks, but please do run it through the testsuite first.
> 
>  I'll see if I can do it -- the MIPS/Linux tree I used for recent MIPS32r2 
> MADD.fmt testing has no MIPS16 multilibs configured, so it might happen to 
> just work with -mips16 passed as an extra option (otherwise MIPS16 libs 
> would be automagically picked instead).  I'll check if binaries executeed 
> really pulled any of the thunks concerned.

 No regressions seen, applied now.  Thanks for your review.

  Maciej

Patch
diff mbox

Index: gcc-fsf-trunk-quilt/libgcc/config/mips/mips16.S
===================================================================
--- gcc-fsf-trunk-quilt.orig/libgcc/config/mips/mips16.S	2013-03-27 15:20:54.000000000 +0000
+++ gcc-fsf-trunk-quilt/libgcc/config/mips/mips16.S	2013-07-13 02:40:38.300930313 +0100
@@ -89,8 +89,13 @@  see the files COPYING3 and COPYING.RUNTI
 	OPCODE, OP2;				\
 	.set	reorder
 
+#if __mips >= 4
+/* Coprocessor moves are interlocked from the MIPS IV ISA up.  */
+#define DELAYf(T, OPCODE, OP2) DELAYt (T, OPCODE, OP2)
+#else
 /* Use "OPCODE. OP2" and jump to T.  */
 #define DELAYf(T, OPCODE, OP2) OPCODE, OP2; jr T
+#endif
 
 /* MOVE_SF_BYTE0(D)
 	Move the first single-precision floating-point argument between