diff mbox

RTL cprop vs. fixed hard regs

Message ID 20150116094227.GV23768@bubble.grove.modra.org
State New
Headers show

Commit Message

Alan Modra Jan. 16, 2015, 9:42 a.m. UTC
https://lists.ozlabs.org/pipermail/linuxppc-dev/2014-December/123776.html
shows gcc-5 miscompiling a powerpc64 linux kernel.  The executive
summary is that the rs6000 backend has a bug in its RTL description of
indirect calls.  We specify a parallel containing both the actual call
and an action that happens after the call, the restore of r2.  The
restore is simply a memory load:
            (set (reg:DI 2 2)
                (mem/v/c:DI (plus:DI (reg/f:DI 1 1)
                        (const_int 40 [0x28])) [0  S8 A8]))
This leads to cprop concluding that it is valid to replace the
reference to r1 with another register having the same value before the
call.  Unfortunately, sometimes a call-clobbered register is chosen.

OK, so we need to fix this in the rs6000 backend, but it occurs to me
that cprop also has a bug here.  It shouldn't be touching fixed hard
registers.  Bootstrapped and regression tested powerpc64-linux.  OK
for mainline?

	* cprop.c (do_local_cprop): Disallow replacement of fixed
	hard registers.

Comments

Jeff Law Jan. 16, 2015, 4:35 p.m. UTC | #1
On 01/16/15 02:42, Alan Modra wrote:
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2014-December/123776.html
> shows gcc-5 miscompiling a powerpc64 linux kernel.  The executive
> summary is that the rs6000 backend has a bug in its RTL description of
> indirect calls.  We specify a parallel containing both the actual call
> and an action that happens after the call, the restore of r2.  The
> restore is simply a memory load:
>              (set (reg:DI 2 2)
>                  (mem/v/c:DI (plus:DI (reg/f:DI 1 1)
>                          (const_int 40 [0x28])) [0  S8 A8]))
> This leads to cprop concluding that it is valid to replace the
> reference to r1 with another register having the same value before the
> call.  Unfortunately, sometimes a call-clobbered register is chosen.
>
> OK, so we need to fix this in the rs6000 backend, but it occurs to me
> that cprop also has a bug here.  It shouldn't be touching fixed hard
> registers.  Bootstrapped and regression tested powerpc64-linux.  OK
> for mainline?
>
> 	* cprop.c (do_local_cprop): Disallow replacement of fixed
> 	hard registers.
OK.  Extra credit for a testcase, ppc specific is obviously OK.

jeff
Segher Boessenkool Jan. 16, 2015, 5:03 p.m. UTC | #2
On Fri, Jan 16, 2015 at 08:12:27PM +1030, Alan Modra wrote:
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2014-December/123776.html
> shows gcc-5 miscompiling a powerpc64 linux kernel.  The executive
> summary is that the rs6000 backend has a bug in its RTL description of
> indirect calls.  We specify a parallel containing both the actual call
> and an action that happens after the call, the restore of r2.  The
> restore is simply a memory load:
>             (set (reg:DI 2 2)
>                 (mem/v/c:DI (plus:DI (reg/f:DI 1 1)
>                         (const_int 40 [0x28])) [0  S8 A8]))
> This leads to cprop concluding that it is valid to replace the
> reference to r1 with another register having the same value before the
> call.  Unfortunately, sometimes a call-clobbered register is chosen.
> 
> OK, so we need to fix this in the rs6000 backend, but it occurs to me
> that cprop also has a bug here.  It shouldn't be touching fixed hard
> registers.

Why not?  It cannot allocate a fixed reg to a pseudo, but other than
that there is nothing special about fixed regs; the transform is
perfectly valid as far as I see.

It isn't a desirable transform in this case, but that is not true for
fixed regs in general (just because the stack pointer is live everywhere).


Segher
Alan Modra Jan. 17, 2015, 12:37 a.m. UTC | #3
On Fri, Jan 16, 2015 at 11:03:24AM -0600, Segher Boessenkool wrote:
> On Fri, Jan 16, 2015 at 08:12:27PM +1030, Alan Modra wrote:
> > OK, so we need to fix this in the rs6000 backend, but it occurs to me
> > that cprop also has a bug here.  It shouldn't be touching fixed hard
> > registers.
> 
> Why not?  It cannot allocate a fixed reg to a pseudo, but other than
> that there is nothing special about fixed regs; the transform is
> perfectly valid as far as I see.

I didn't say that copying to a pseudo and using that was invalid..
The bug I see is a mis-optimisation.  Also, the asm operands case that
do_local_cprop already rules out changing is very similar to fixed
regs.  Would you argue that changing asm operands is also valid?  :)

> It isn't a desirable transform in this case, but that is not true for
> fixed regs in general (just because the stack pointer is live everywhere).

What's the point in extending the lifetime of some pseudo when you
know the original fixed register is available everywhere?  Do you have
some concrete example in mind where this "optimisation" is beneficial?

Some ports even include pc in fixed_regs.  So there are obvious
examples where regs in fixed_regs change behind the compiler's back.
Naive users might even expect to see the "current" value of those
regs.  (Again, I'm not saying that it is invalid if gcc substituted an
older value.)
Alan Modra Jan. 17, 2015, 12:46 a.m. UTC | #4
On Fri, Jan 16, 2015 at 09:35:16AM -0700, Jeff Law wrote:
> On 01/16/15 02:42, Alan Modra wrote:
> >	* cprop.c (do_local_cprop): Disallow replacement of fixed
> >	hard registers.
> OK.  Extra credit for a testcase, ppc specific is obviously OK.

Thanks.  Committed revision 219786.  I'll see if I can come up with a
reasonable testcase.
Segher Boessenkool Jan. 17, 2015, 2:09 a.m. UTC | #5
On Sat, Jan 17, 2015 at 11:07:12AM +1030, Alan Modra wrote:
> On Fri, Jan 16, 2015 at 11:03:24AM -0600, Segher Boessenkool wrote:
> > On Fri, Jan 16, 2015 at 08:12:27PM +1030, Alan Modra wrote:
> > > OK, so we need to fix this in the rs6000 backend, but it occurs to me
> > > that cprop also has a bug here.  It shouldn't be touching fixed hard
> > > registers.
> > 
> > Why not?  It cannot allocate a fixed reg to a pseudo, but other than
> > that there is nothing special about fixed regs; the transform is
> > perfectly valid as far as I see.
> 
> I didn't say that copying to a pseudo and using that was invalid..
> The bug I see is a mis-optimisation.

Ah, okay, good :-)

This same mis-optimisation would happen if r1 was just some regular
non-fixed register, hrm.  Maybe something else in cprop needs some
tuning up?

> Also, the asm operands case that
> do_local_cprop already rules out changing is very similar to fixed
> regs.  Would you argue that changing asm operands is also valid?  :)

A fixed reg in an asm_operands is a hard reg; a hard reg in an asm_operands
(before reload) is a register asm variable.  And we had better not change
register variable asm arguments, since that is what we promise not to do
with register variables.  The case is not similar at all.

> > It isn't a desirable transform in this case, but that is not true for
> > fixed regs in general (just because the stack pointer is live everywhere).
> 
> What's the point in extending the lifetime of some pseudo when you
> know the original fixed register is available everywhere?

That is my point: _if_ you know it is live all the time, or if there is no
advantage to shortening the lifetime of the value in that fixed reg, then
yes we should not propagate that value.  But that is not true for all
fixed regs.

> Do you have
> some concrete example in mind where this "optimisation" is beneficial?

The CA_REG in rs6000 is a fixed register.  It isn't a terribly good
example because it cannot be propagated anyway, for other reasons; but
it will hopefully help explain my point.  So please pretend we can copy
it to GPRs :-)  [ The situation with the T bit on SH is similar, but I
don't know the details there well enough. ]

There is only one CA_REG.  It is used in quite a few sequences.  It
contains a totally different value every time.  Because there is only
one such register the instruction sequences around it cannot be reordered
very well.

Propagating the value in such a not-so-very-fixed fixed reg helps reduce
the lifetimes of the values in those regs, helps reordering, combining,
scheduling, performance in general.

If you are only concerned about the stack pointer, you could just check
for that?  But please add a comment in any case, saying why you exclude
it (and ideally don't lump it in with tests that are needed for
correctness).

Cheers,


Segher
Alan Modra Jan. 17, 2015, 2:44 a.m. UTC | #6
On Fri, Jan 16, 2015 at 08:09:51PM -0600, Segher Boessenkool wrote:
> On Sat, Jan 17, 2015 at 11:07:12AM +1030, Alan Modra wrote:
> > On Fri, Jan 16, 2015 at 11:03:24AM -0600, Segher Boessenkool wrote:
> > > On Fri, Jan 16, 2015 at 08:12:27PM +1030, Alan Modra wrote:
> > > > OK, so we need to fix this in the rs6000 backend, but it occurs to me
> > > > that cprop also has a bug here.  It shouldn't be touching fixed hard
> > > > registers.
> > > 
> > > Why not?  It cannot allocate a fixed reg to a pseudo, but other than
> > > that there is nothing special about fixed regs; the transform is
> > > perfectly valid as far as I see.
> > 
> > I didn't say that copying to a pseudo and using that was invalid..
> > The bug I see is a mis-optimisation.
> 
> Ah, okay, good :-)
> 
> This same mis-optimisation would happen if r1 was just some regular
> non-fixed register, hrm.  Maybe something else in cprop needs some
> tuning up?

Well, if the original pseudo register dies earlier as a result of
substituting a copy then you've gained.

> > Also, the asm operands case that
> > do_local_cprop already rules out changing is very similar to fixed
> > regs.  Would you argue that changing asm operands is also valid?  :)
> 
> A fixed reg in an asm_operands is a hard reg; a hard reg in an asm_operands
> (before reload) is a register asm variable.  And we had better not change
> register variable asm arguments, since that is what we promise not to do
> with register variables.  The case is not similar at all.

The similarity I see is that we have a hard reg that is a register asm
variable here too.  How else do you get a copy from a fixed hard reg
to a pseudo?  Hrrm, maybe some backend code does that sort of thing.

> > > It isn't a desirable transform in this case, but that is not true for
> > > fixed regs in general (just because the stack pointer is live everywhere).
> > 
> > What's the point in extending the lifetime of some pseudo when you
> > know the original fixed register is available everywhere?
> 
> That is my point: _if_ you know it is live all the time, or if there is no
> advantage to shortening the lifetime of the value in that fixed reg, then
> yes we should not propagate that value.  But that is not true for all
> fixed regs.
> 
> > Do you have
> > some concrete example in mind where this "optimisation" is beneficial?
> 
> The CA_REG in rs6000 is a fixed register.  It isn't a terribly good
> example because it cannot be propagated anyway, for other reasons; but
> it will hopefully help explain my point.  So please pretend we can copy
> it to GPRs :-)  [ The situation with the T bit on SH is similar, but I
> don't know the details there well enough. ]
> 
> There is only one CA_REG.  It is used in quite a few sequences.  It
> contains a totally different value every time.  Because there is only
> one such register the instruction sequences around it cannot be reordered
> very well.
> 
> Propagating the value in such a not-so-very-fixed fixed reg helps reduce
> the lifetimes of the values in those regs, helps reordering, combining,
> scheduling, performance in general.
> 
> If you are only concerned about the stack pointer, you could just check
> for that?  But please add a comment in any case, saying why you exclude
> it (and ideally don't lump it in with tests that are needed for
> correctness).

No, I don't want to special-case sp.  That's horrible.  If the patch I
just committed is wrong, I'll revert it.
Richard Biener Jan. 17, 2015, 12:12 p.m. UTC | #7
On January 17, 2015 1:37:12 AM CET, Alan Modra <amodra@gmail.com> wrote:
>On Fri, Jan 16, 2015 at 11:03:24AM -0600, Segher Boessenkool wrote:
>> On Fri, Jan 16, 2015 at 08:12:27PM +1030, Alan Modra wrote:
>> > OK, so we need to fix this in the rs6000 backend, but it occurs to
>me
>> > that cprop also has a bug here.  It shouldn't be touching fixed
>hard
>> > registers.
>> 
>> Why not?  It cannot allocate a fixed reg to a pseudo, but other than
>> that there is nothing special about fixed regs; the transform is
>> perfectly valid as far as I see.
>
>I didn't say that copying to a pseudo and using that was invalid..
>The bug I see is a mis-optimisation.  Also, the asm operands case that
>do_local_cprop already rules out changing is very similar to fixed
>regs.  Would you argue that changing asm operands is also valid?  :)
>
>> It isn't a desirable transform in this case, but that is not true for
>> fixed regs in general (just because the stack pointer is live
>everywhere).
>
>What's the point in extending the lifetime of some pseudo when you
>know the original fixed register is available everywhere?  Do you have
>some concrete example in mind where this "optimisation" is beneficial?
>
>Some ports even include pc in fixed_regs.  So there are obvious
>examples where regs in fixed_regs change behind the compiler's back.
>Naive users might even expect to see the "current" value of those
>regs.  (Again, I'm not saying that it is invalid if gcc substituted an
>older value.)

Just to add, we avoid doing this on the GIMPLE level as well.

Richard.
diff mbox

Patch

Index: gcc/cprop.c
===================================================================
--- gcc/cprop.c	(revision 219662)
+++ gcc/cprop.c	(working copy)
@@ -1189,10 +1189,12 @@  do_local_cprop (rtx x, rtx_insn *insn)
   rtx newreg = NULL, newcnst = NULL;
 
   /* Rule out USE instructions and ASM statements as we don't want to
-     change the hard registers mentioned.  */
+     change the hard registers mentioned, and don't change fixed hard
+     registers.  */
   if (REG_P (x)
       && (REGNO (x) >= FIRST_PSEUDO_REGISTER
           || (GET_CODE (PATTERN (insn)) != USE
+	      && !fixed_regs[REGNO (x)]
 	      && asm_noperands (PATTERN (insn)) < 0)))
     {
       cselib_val *val = cselib_lookup (x, GET_MODE (x), 0, VOIDmode);