diff mbox

If using branch likelies in MIPS sync code fill the delay slot with a nop

Message ID 0DA23CC379F5F945ACB41CF394B9827720F331A8@LEMAIL01.le.imgtec.org
State New
Headers show

Commit Message

Andrew Bennett Nov. 18, 2014, 1:37 p.m. UTC
Hi,

The atomic-compare-exchange-3.c and atomic-op-3.c tests are failing when
using the -mfix-r10000 option.  This is due to the fact that the delay 
slot of the branch instruction that checks if the atomic operation was
not successful can be filled with an operation that returns the output 
result.  For the branch likely case this operation can not go in the 
delay slot because it wont execute when the atomic operation was 
successful and therefore the wrong result will be returned.  This patch 
forces a nop to be placed in the delay slot if a branch likely is used.  
The fix is as simple as possible; it does cause a second nop to be introduced 
after the branch instruction in the branch likely case.  As this option is 
rarely used, it makes more sense to keep the code readable than trying to 
optimise it.

The atomic tests now pass successfully.  The ChangeLog and patch are below.

Ok to commit?

Many thanks,



Andrew



gcc/
	* config/mips/mips.c (mips_process_sync_loop): Place a nop in the 
	delay slot of the branch likely instruction.

Comments

Maciej W. Rozycki Nov. 18, 2014, 1:48 p.m. UTC | #1
On Tue, 18 Nov 2014, Andrew Bennett wrote:

> The atomic-compare-exchange-3.c and atomic-op-3.c tests are failing when
> using the -mfix-r10000 option.  This is due to the fact that the delay 
> slot of the branch instruction that checks if the atomic operation was
> not successful can be filled with an operation that returns the output 
> result.  For the branch likely case this operation can not go in the 
> delay slot because it wont execute when the atomic operation was 
> successful and therefore the wrong result will be returned.  This patch 
> forces a nop to be placed in the delay slot if a branch likely is used.  
> The fix is as simple as possible; it does cause a second nop to be introduced 
> after the branch instruction in the branch likely case.  As this option is 
> rarely used, it makes more sense to keep the code readable than trying to 
> optimise it.

 Can you please be a bit more elaborate on how the wrong code sequence 
looks like, why it is produced like that and how your fix changes it?  
Without seeing examples of code generated it is very hard to imagine 
what is really going on here.

  Maciej
Andrew Bennett Nov. 18, 2014, 2:34 p.m. UTC | #2
> -----Original Message-----
> From: Maciej W. Rozycki [mailto:macro@codesourcery.com]
> Sent: 18 November 2014 13:48
> To: Andrew Bennett
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] If using branch likelies in MIPS sync code fill the delay
> slot with a nop
> 
> On Tue, 18 Nov 2014, Andrew Bennett wrote:
> 
> > The atomic-compare-exchange-3.c and atomic-op-3.c tests are failing when
> > using the -mfix-r10000 option.  This is due to the fact that the delay
> > slot of the branch instruction that checks if the atomic operation was
> > not successful can be filled with an operation that returns the output
> > result.  For the branch likely case this operation can not go in the
> > delay slot because it wont execute when the atomic operation was
> > successful and therefore the wrong result will be returned.  This patch
> > forces a nop to be placed in the delay slot if a branch likely is used.
> > The fix is as simple as possible; it does cause a second nop to be
> introduced
> > after the branch instruction in the branch likely case.  As this option is
> > rarely used, it makes more sense to keep the code readable than trying to
> > optimise it.
> 
>  Can you please be a bit more elaborate on how the wrong code sequence
> looks like, why it is produced like that and how your fix changes it?
> Without seeing examples of code generated it is very hard to imagine
> what is really going on here.

Ok if we compile the following cut-down atomic-op-3.c test case with
-mfix-r10000.

extern void abort(void);

int v, count, res;
int
main ()
{
  v = 0;
  count = 1;

  if (__atomic_add_fetch (&v, count, __ATOMIC_RELAXED) != 1)
    abort ();

  return 0;
}

The following assembly code is produced for the atomic operation:

        .set    noat
        sync     # 15   sync_new_addsi/2        [length = 24]
1:
        ll      $3,0($4)
        addu    $1,$3,$2
        sc      $1,0($4)
        beql    $1,$0,1b
        addu    $3,$3,$2
        sync
        .set    at


Notice here that the addu is in the delay slot of the branch likely instruction.
This is computing the value that exists in the memory location after the atomic 
operation has completed.  Because it is a branch likely instruction
this will only run when the atomic operation fails, and hence when it is
successful it will not return the correct value.


The offending code is in mips_process_sync_loop:

mips_multi_add_insn ("beq%?\t%0,%.,1b", at, NULL);
 
  /* if (INSN1 != MOVE && INSN1 != LI) NEWVAL = $TMP3 [delay slot].  */
  if (insn1 != SYNC_INSN1_MOVE && insn1 != SYNC_INSN1_LI && tmp3 != newval)
    {
      mips_multi_copy_insn (tmp3_insn);
      mips_multi_set_operand (mips_multi_last_index (), 0, newval);
    }
  else if (!(required_oldval && cmp))
    mips_multi_add_insn ("nop", NULL);
  
  /* CMP = 1 -- either standalone or in a delay slot.  */
  if (required_oldval && cmp)
    mips_multi_add_insn ("li\t%0,1", cmp, NULL);


For the branch likely case the delay slot could be filled either with the
operation to compute the value that exists in memory after the atomic operation
has completed; an operation to return if the atomic operation is successful 
or not; or a nop.  The first two operations should not be in the delay slot 
of the branch if it is a branch likely because they will only run if the atomic 
operation was unsuccessful.

My fix places a nop in the delay slot of the branch likely instruction
by using the %~ output operation.  This then causes the sync code for the 
previous example to be correct:

        .set    noat
        sync     # 15   sync_new_addsi/2        [length = 24]
1:
        ll      $3,0($4)
        addu    $1,$3,$2
        sc      $1,0($4)
        beql    $1,$0,1b
        nop
        addu    $3,$3,$2
        sync
        .set    at



Regards,



Andrew
Matthew Fortune Nov. 18, 2014, 2:42 p.m. UTC | #3
> The atomic-compare-exchange-3.c and atomic-op-3.c tests are failing when
> using the -mfix-r10000 option.  This is due to the fact that the delay
> slot of the branch instruction that checks if the atomic operation was
> not successful can be filled with an operation that returns the output
> result.  For the branch likely case this operation can not go in the
> delay slot because it wont execute when the atomic operation was
> successful and therefore the wrong result will be returned.  This patch
> forces a nop to be placed in the delay slot if a branch likely is used.
> The fix is as simple as possible; it does cause a second nop to be
> introduced
> after the branch instruction in the branch likely case.  As this option is
> rarely used, it makes more sense to keep the code readable than trying to
> optimise it.
> 
> The atomic tests now pass successfully.  The ChangeLog and patch are
> below.
> 
> Ok to commit?

I'm OK with just making the fix-r10000 case safe rather than also
complicating the normal code path to remove the normal delay slot NOP in
this special case.  I'd like to see what Catherine thinks though.  To
remove the second NOP I believe we would have to add a !TARGET_FIX_R10000
to the condition of the normal delay slot NOP. This seems a bit convoluted
when the real reason is because branch likely is in use.  To make use of
the mips_branch_likely flag it would have to be set earlier in:
mips_output_sync_loop and also get set in mips_sync_loop_insns.

If Catherine thinks getting rid of the second NOP is important enough then
please base it on mips_branch_likely and fix the callers.

> gcc/
> 	* config/mips/mips.c (mips_process_sync_loop): Place a nop in the
> 	delay slot of the branch likely instruction.
> 
> 
> diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
> index 02268f3..6dd3728 100644
> --- a/gcc/config/mips/mips.c
> +++ b/gcc/config/mips/mips.c
> @@ -12997,7 +12997,12 @@ mips_process_sync_loop (rtx_insn *insn, rtx
> *operands)
>       This will sometimes be a delayed branch; see the write code below
>       for details.  */
>    mips_multi_add_insn (is_64bit_p ? "scd\t%0,%1" : "sc\t%0,%1", at, mem,
> NULL);
> -  mips_multi_add_insn ("beq%?\t%0,%.,1b", at, NULL);
> +
> +  /* We can not put the NEWVAL = $TMP3 or CMP = 1 operations in the delay
> slot
> +     of the branch if it is a branch likely because they will not execute
> when
> +     the atomic operation is successful, so place a NOP in there instead.
> */
> +

I found the comment hard to digest, perhaps:

/* When using branch likely (-mfix-r10000), the delay slot instruction will
   be annulled on false.  The normal delay slot instructions calculate the
   overall result of the atomic operation and must not be annulled.
   Unconditionally use a NOP instead for the branch likely case.  */

> +  mips_multi_add_insn ("beq%?\t%0,%.,1b%~", at, NULL);
> 
>    /* if (INSN1 != MOVE && INSN1 != LI) NEWVAL = $TMP3 [delay slot].  */
>    if (insn1 != SYNC_INSN1_MOVE && insn1 != SYNC_INSN1_LI && tmp3 !=
> newval)

Matthew
Maciej W. Rozycki Nov. 18, 2014, 2:56 p.m. UTC | #4
On Tue, 18 Nov 2014, Andrew Bennett wrote:

> My fix places a nop in the delay slot of the branch likely instruction
> by using the %~ output operation.  This then causes the sync code for the 
> previous example to be correct:
> 
>         .set    noat
>         sync     # 15   sync_new_addsi/2        [length = 24]
> 1:
>         ll      $3,0($4)
>         addu    $1,$3,$2
>         sc      $1,0($4)
>         beql    $1,$0,1b
>         nop
>         addu    $3,$3,$2
>         sync
>         .set    at

 OK, this does look to me like the correct way to address the issue, but 
where is the second NOP that you previously mentioned?  I fail to see it 
here and this code can't be made any better, there isn't anything you 
could possibly schedule into the delay slot as there is nothing else to 
do in this loop.

  Maciej
Andrew Bennett Nov. 18, 2014, 3:18 p.m. UTC | #5
>  OK, this does look to me like the correct way to address the issue, but
> where is the second NOP that you previously mentioned?  I fail to see it
> here and this code can't be made any better, there isn't anything you
> could possibly schedule into the delay slot as there is nothing else to
> do in this loop.

The following testcase shows this occurring.

short v, count, ret;

int
main ()
{
  v = 0;
  count = 0;

  __atomic_exchange_n (&v, count + 1, __ATOMIC_RELAXED);

  return 0;
}

Produces (for the atomic operation):

       .set    noat
        sync
1:
        ll      $3,0($5)
        and     $1,$3,$4
        bne     $1,$7,2f
        and     $1,$3,$6
        or      $1,$1,$8
        sc      $1,0($5)
        beql    $1,$0,1b
        nop
        nop
        sync
2:
        .set    at


Regards,



Andrew
Maciej W. Rozycki Nov. 18, 2014, 4:05 p.m. UTC | #6
On Tue, 18 Nov 2014, Andrew Bennett wrote:

> Produces (for the atomic operation):
> 
>        .set    noat
>         sync
> 1:
>         ll      $3,0($5)
>         and     $1,$3,$4
>         bne     $1,$7,2f
>         and     $1,$3,$6
>         or      $1,$1,$8
>         sc      $1,0($5)
>         beql    $1,$0,1b
>         nop
>         nop
>         sync
> 2:
>         .set    at

 With a quick look at `mips_process_sync_loop' it looks to me the
other NOP is produced from here:

  else if (!(required_oldval && cmp))
    mips_multi_add_insn ("nop", NULL);

-- correct?  If so, then can't you just make it:

  else if (!(required_oldval && cmp) && !TARGET_FIX_R10000)
    mips_multi_add_insn ("nop", NULL);

instead?  It appears plain and simple, and if you're concerned about it 
being unobvious to the casual reader, then add a one-line comment or 
suchlike.  It's not like TARGET_FIX_R10000 is going to imply anything 
else about branches in the future (and r6 code won't run on an R10k 
anyway).

  Maciej
Andrew Bennett Nov. 18, 2014, 4:13 p.m. UTC | #7
> From: Maciej W. Rozycki [mailto:macro@codesourcery.com]
> 
> On Tue, 18 Nov 2014, Andrew Bennett wrote:
> 
> > Produces (for the atomic operation):
> >
> >        .set    noat
> >         sync
> > 1:
> >         ll      $3,0($5)
> >         and     $1,$3,$4
> >         bne     $1,$7,2f
> >         and     $1,$3,$6
> >         or      $1,$1,$8
> >         sc      $1,0($5)
> >         beql    $1,$0,1b
> >         nop
> >         nop
> >         sync
> > 2:
> >         .set    at
> 
>  With a quick look at `mips_process_sync_loop' it looks to me the
> other NOP is produced from here:
> 
>   else if (!(required_oldval && cmp))
>     mips_multi_add_insn ("nop", NULL);
> 
> -- correct?  If so, then can't you just make it:

Correct.

> 
>   else if (!(required_oldval && cmp) && !TARGET_FIX_R10000)
>     mips_multi_add_insn ("nop", NULL);
> 
> instead?  It appears plain and simple, and if you're concerned about it
> being unobvious to the casual reader, then add a one-line comment or
> suchlike.  It's not like TARGET_FIX_R10000 is going to imply anything
> else about branches in the future (and r6 code won't run on an R10k
> anyway).

True.  Actually this is what my original patch had in it, but Matthew and I
decided to leave it out for the initial submission.  I am happy to add
it back in, and provide a comment to explain what is happening. 
 
Catherine are you happy with this?


Regards,


Andrew
Matthew Fortune Nov. 18, 2014, 8:14 p.m. UTC | #8
> > From: Maciej W. Rozycki [mailto:macro@codesourcery.com]
> >
> > On Tue, 18 Nov 2014, Andrew Bennett wrote:
> >
> > > Produces (for the atomic operation):
> > >
> > >        .set    noat
> > >         sync
> > > 1:
> > >         ll      $3,0($5)
> > >         and     $1,$3,$4
> > >         bne     $1,$7,2f
> > >         and     $1,$3,$6
> > >         or      $1,$1,$8
> > >         sc      $1,0($5)
> > >         beql    $1,$0,1b
> > >         nop
> > >         nop
> > >         sync
> > > 2:
> > >         .set    at
> >
> >  With a quick look at `mips_process_sync_loop' it looks to me the
> > other NOP is produced from here:
> >
> >   else if (!(required_oldval && cmp))
> >     mips_multi_add_insn ("nop", NULL);
> >
> > -- correct?  If so, then can't you just make it:
> 
> Correct.
> 
> >
> >   else if (!(required_oldval && cmp) && !TARGET_FIX_R10000)
> >     mips_multi_add_insn ("nop", NULL);
> >
> > instead?  It appears plain and simple, and if you're concerned about
> > it being unobvious to the casual reader, then add a one-line comment
> > or suchlike.  It's not like TARGET_FIX_R10000 is going to imply
> > anything else about branches in the future (and r6 code won't run on
> > an R10k anyway).

I'm afraid I disagree about it being plain and simple even with a comment.
The amount of code in the backend that makes assumptions based on derived
variables is very high and it makes the code hard to read (especially w.r.t.
branches).  I'm OK with adding the code to avoid the extra NOP but have it
based on mips_branch_likely and fix the callers so that it is set
appropriately.

Thanks,
Matthew
Maciej W. Rozycki Nov. 19, 2014, 12:53 a.m. UTC | #9
On Tue, 18 Nov 2014, Matthew Fortune wrote:

> > >  With a quick look at `mips_process_sync_loop' it looks to me the
> > > other NOP is produced from here:
> > >
> > >   else if (!(required_oldval && cmp))
> > >     mips_multi_add_insn ("nop", NULL);
> > >
> > > -- correct?  If so, then can't you just make it:
> > 
> > Correct.
> > 
> > >
> > >   else if (!(required_oldval && cmp) && !TARGET_FIX_R10000)
> > >     mips_multi_add_insn ("nop", NULL);
> > >
> > > instead?  It appears plain and simple, and if you're concerned about
> > > it being unobvious to the casual reader, then add a one-line comment
> > > or suchlike.  It's not like TARGET_FIX_R10000 is going to imply
> > > anything else about branches in the future (and r6 code won't run on
> > > an R10k anyway).
> 
> I'm afraid I disagree about it being plain and simple even with a comment.
> The amount of code in the backend that makes assumptions based on derived
> variables is very high and it makes the code hard to read (especially w.r.t.
> branches).  I'm OK with adding the code to avoid the extra NOP but have it
> based on mips_branch_likely and fix the callers so that it is set
> appropriately.

 Sure, fine with me too.  It looks to me it'll be more natural even to 
have `mips_branch_likely' preset on entering `mips_process_sync_loop'.

  Maciej
Moore, Catherine Nov. 19, 2014, 1:32 p.m. UTC | #10
> -----Original Message-----
> From: Matthew Fortune [mailto:Matthew.Fortune@imgtec.com]
> Sent: Tuesday, November 18, 2014 9:42 AM
> To: Andrew Bennett; gcc-patches@gcc.gnu.org
> Cc: Moore, Catherine; Rozycki, Maciej
> Subject: RE: [PATCH] If using branch likelies in MIPS sync code fill the delay
> slot with a nop
> 
> > The atomic-compare-exchange-3.c and atomic-op-3.c tests are failing
> > when using the -mfix-r10000 option.  This is due to the fact that the
> > delay slot of the branch instruction that checks if the atomic
> > operation was not successful can be filled with an operation that
> > returns the output result.  For the branch likely case this operation
> > can not go in the delay slot because it wont execute when the atomic
> > operation was successful and therefore the wrong result will be
> > returned.  This patch forces a nop to be placed in the delay slot if a branch
> likely is used.
> > The fix is as simple as possible; it does cause a second nop to be
> > introduced after the branch instruction in the branch likely case.  As
> > this option is rarely used, it makes more sense to keep the code
> > readable than trying to optimise it.
> >
> > The atomic tests now pass successfully.  The ChangeLog and patch are
> > below.
> >
> > Ok to commit?
> 
> I'm OK with just making the fix-r10000 case safe rather than also complicating
> the normal code path to remove the normal delay slot NOP in this special
> case.  I'd like to see what Catherine thinks though.  To remove the second
> NOP I believe we would have to add a !TARGET_FIX_R10000 to the condition
> of the normal delay slot NOP. This seems a bit convoluted when the real
> reason is because branch likely is in use.  To make use of the
> mips_branch_likely flag it would have to be set earlier in:
> mips_output_sync_loop and also get set in mips_sync_loop_insns.
> 
> If Catherine thinks getting rid of the second NOP is important enough then
> please base it on mips_branch_likely and fix the callers.
> 
  Yes, removing the second NOP is worth the additional effort.
diff mbox

Patch

diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 02268f3..6dd3728 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -12997,7 +12997,12 @@  mips_process_sync_loop (rtx_insn *insn, rtx *operands)
      This will sometimes be a delayed branch; see the write code below
      for details.  */
   mips_multi_add_insn (is_64bit_p ? "scd\t%0,%1" : "sc\t%0,%1", at, mem, NULL);
-  mips_multi_add_insn ("beq%?\t%0,%.,1b", at, NULL);
+
+  /* We can not put the NEWVAL = $TMP3 or CMP = 1 operations in the delay slot
+     of the branch if it is a branch likely because they will not execute when
+     the atomic operation is successful, so place a NOP in there instead.  */
+
+  mips_multi_add_insn ("beq%?\t%0,%.,1b%~", at, NULL);
 
   /* if (INSN1 != MOVE && INSN1 != LI) NEWVAL = $TMP3 [delay slot].  */
   if (insn1 != SYNC_INSN1_MOVE && insn1 != SYNC_INSN1_LI && tmp3 != newval)