diff mbox

Fix pr69012 ICE on building libgfortran for mips

Message ID HE1PR07MB0905CFE292AF5D8E25959E93E4FD0@HE1PR07MB0905.eurprd07.prod.outlook.com
State New
Headers show

Commit Message

Bernd Edlinger Dec. 30, 2015, 8:21 p.m. UTC
Hi,

On 30.12.2015 15:31, Richard Sandiford wrote:
> I think the problem is deeper than that though. The instructions that 
> are triggering the ICE are only generated by the prologue, so this 
> means that we're trying to lay out the frame again after the prologue 
> has been generated, whereas it really needs to be fixed by then. (And 
> even if recalculating it is a no-op, the operation is still too 
> expensive to be repeated lightly.) Query functions like 
> rtx_addr_can_trap_p(_1) shouldn't really be changing or recalculating 
> the frame layout or other global state. I think we need to find a 
> different way of getting the information. Maybe reload/LRA should use 
> its own structures to calculate the range of "safe" stack and hfp 
> offsets, then store them in crtl for rtx_addr_can_trap_p to use. AIUI, 
> before reload, the only non-trapping uses of sp should be for outgoing 
> arguments, so we can use a test based on the cumulative outgoing 
> arguments size. I don't think the hfp should be used at all before 
> reload, so we could conservatively return -1 for that case. Thanks, 
> Richard 


Yes, I agree, it _should_ be a no-op, but given the complexity of 
mips_compute_frame_info it is probably better to use cached values after 
reload completed.

Before reload_completed, rtx_addr_can_trap_p is just trying to do an 
initial guess on the stack layout, it may well be possible that some 
important information is still missing here.  We will never call the 
target callback from here, before reload_completed.  It is however not 
completely impossible that rtx_addr_can_trap_p is called while 
lra/reload is already renaming the registers, see PR66614.

Of course the required information is already cached in 
cfun->machine->frame, but rtx_addr_can_trap_p can't use the information 
without a target callback.  And I just want to avoid to invent a new 
callback for that.

I looked at how other targets handle this situation and found that most 
of them either have a more simple frame layout function or they keep the 
previously cached values if they are ever called after reload_completed.

I found, previously mips_compute_frame_info was always called with 
reload_completed=false, either from mips_frame_pointer_required or from 
mips_initial_elimination_offset.  Reload seems to call the 
initial_elimination_offset repeatedly and seems to _expect_ the values 
to change from time to time.  However it does not seem to expect the 
result from mips_frame_pointer_required to change after reload/LRA has 
started.  But as it seems, that may be possible for TARGET_MIPS16 if the 
frame size happens to be very close to 0x7FFF. Well that could be 
another story.

Anyways when mips_compute_frame_info is called with 
reload_completed=true, we can as well use the cached values, as they 
were immediately before reload_completed.

So what do you think of my new version of the patch?


Thanks
Bernd.
2015-12-30  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	PR target/69012
	* config/mips/mips.c (mips_compute_frame_info): Don't change the
	frame info after reload completed.

Comments

Matthew Fortune Jan. 5, 2016, 9:09 a.m. UTC | #1
Bernd Edlinger <bernd.edlinger@hotmail.de> writes:
> Hi,
> 
> On 30.12.2015 15:31, Richard Sandiford wrote:
> > I think the problem is deeper than that though. The instructions that
> > are triggering the ICE are only generated by the prologue, so this
> > means that we're trying to lay out the frame again after the prologue
> > has been generated, whereas it really needs to be fixed by then. (And
> > even if recalculating it is a no-op, the operation is still too
> > expensive to be repeated lightly.) Query functions like
> > rtx_addr_can_trap_p(_1) shouldn't really be changing or recalculating
> > the frame layout or other global state. I think we need to find a
> > different way of getting the information. Maybe reload/LRA should use
> > its own structures to calculate the range of "safe" stack and hfp
> > offsets, then store them in crtl for rtx_addr_can_trap_p to use. AIUI,
> > before reload, the only non-trapping uses of sp should be for outgoing
> > arguments, so we can use a test based on the cumulative outgoing
> > arguments size. I don't think the hfp should be used at all before
> > reload, so we could conservatively return -1 for that case. Thanks,
> > Richard
> 
> 
> Yes, I agree, it _should_ be a no-op, but given the complexity of
> mips_compute_frame_info it is probably better to use cached values after
> reload completed.
> 
> Before reload_completed, rtx_addr_can_trap_p is just trying to do an
> initial guess on the stack layout, it may well be possible that some
> important information is still missing here.  We will never call the
> target callback from here, before reload_completed.  It is however not
> completely impossible that rtx_addr_can_trap_p is called while
> lra/reload is already renaming the registers, see PR66614.
> 
> Of course the required information is already cached in
> cfun->machine->frame, but rtx_addr_can_trap_p can't use the information
> without a target callback.  And I just want to avoid to invent a new
> callback for that.
> 
> I looked at how other targets handle this situation and found that most
> of them either have a more simple frame layout function or they keep the
> previously cached values if they are ever called after reload_completed.
> 
> I found, previously mips_compute_frame_info was always called with
> reload_completed=false, either from mips_frame_pointer_required or from
> mips_initial_elimination_offset.  Reload seems to call the
> initial_elimination_offset repeatedly and seems to _expect_ the values
> to change from time to time.  However it does not seem to expect the
> result from mips_frame_pointer_required to change after reload/LRA has
> started.  But as it seems, that may be possible for TARGET_MIPS16 if the
> frame size happens to be very close to 0x7FFF. Well that could be
> another story.
> 
> Anyways when mips_compute_frame_info is called with
> reload_completed=true, we can as well use the cached values, as they
> were immediately before reload_completed.
> 
> So what do you think of my new version of the patch?

Hi Bernd,

I don't see any problem with this change from a MIPS backend perspective.
As to whether rtx_addr_can_trap_p should be using initial_elimination_offset
then I don't have any particular opinion.

Richard: Any objections to using this fix so we can solve the PR leaving
debate on the original rtx_addr_can_trap_p as a separate issue?

Thanks,
Matthew
Richard Sandiford Jan. 5, 2016, 8:41 p.m. UTC | #2
Matthew Fortune <Matthew.Fortune@imgtec.com> writes:
> Bernd Edlinger <bernd.edlinger@hotmail.de> writes:
>> On 30.12.2015 15:31, Richard Sandiford wrote:
>> > I think the problem is deeper than that though. The instructions that
>> > are triggering the ICE are only generated by the prologue, so this
>> > means that we're trying to lay out the frame again after the prologue
>> > has been generated, whereas it really needs to be fixed by then. (And
>> > even if recalculating it is a no-op, the operation is still too
>> > expensive to be repeated lightly.) Query functions like
>> > rtx_addr_can_trap_p(_1) shouldn't really be changing or recalculating
>> > the frame layout or other global state. I think we need to find a
>> > different way of getting the information. Maybe reload/LRA should use
>> > its own structures to calculate the range of "safe" stack and hfp
>> > offsets, then store them in crtl for rtx_addr_can_trap_p to use. AIUI,
>> > before reload, the only non-trapping uses of sp should be for outgoing
>> > arguments, so we can use a test based on the cumulative outgoing
>> > arguments size. I don't think the hfp should be used at all before
>> > reload, so we could conservatively return -1 for that case. Thanks,
>> > Richard
>> 
>> 
>> Yes, I agree, it _should_ be a no-op, but given the complexity of
>> mips_compute_frame_info it is probably better to use cached values after
>> reload completed.
>> 
>> Before reload_completed, rtx_addr_can_trap_p is just trying to do an
>> initial guess on the stack layout, it may well be possible that some
>> important information is still missing here.  We will never call the
>> target callback from here, before reload_completed.  It is however not
>> completely impossible that rtx_addr_can_trap_p is called while
>> lra/reload is already renaming the registers, see PR66614.

It sounds like we could then have the same kind of problem when
reload_in_progress though.  I.e. we'd have a predicate function like
rtx_addr_can_trap_p changing global state.  I realise the symptoms
wouldn't be the same as in this PR because the instructions that trigger
the ICE are part of the prologue, but it might show up in other ways.

E.g. LRA (like reload) loops until it reaches a stable stack layout.
If one iteration of the loop does something that changes the stack
layout (and so requires another go-round), calling rtx_addr_can_trap_p
before the loop restarts would alter the global state unexpectedly.
That might trigger an ICE.

If we need rtx_addr_can_trap_p to be as accurate as possible (rather
than conservative) during LRA/reload, I suppose we should calculate
the crtl fields I mentioned during each LRA/reload phase, rather than
at the end.

>> Of course the required information is already cached in
>> cfun->machine->frame, but rtx_addr_can_trap_p can't use the information
>> without a target callback.  And I just want to avoid to invent a new
>> callback for that.

Right.  I don't think we need or want a new callback.  My point was
that reload and LRA already compute the offsets you need, so it seems
better to cache them there than to try to recalculate them in rtlanal.c.

>> I looked at how other targets handle this situation and found that most
>> of them either have a more simple frame layout function or they keep the
>> previously cached values if they are ever called after reload_completed.
>> 
>> I found, previously mips_compute_frame_info was always called with
>> reload_completed=false, either from mips_frame_pointer_required or from
>> mips_initial_elimination_offset.  Reload seems to call the
>> initial_elimination_offset repeatedly and seems to _expect_ the values
>> to change from time to time.

Yeah.  This happens if the first reload attempt fails because there aren't
enough free registers or enough spill slots.  We then update the frame
layout to account for the new spill slots and call-saved register save
slots, then try again.

>> However it does not seem to expect the
>> result from mips_frame_pointer_required to change after reload/LRA has
>> started.  But as it seems, that may be possible for TARGET_MIPS16 if the
>> frame size happens to be very close to 0x7FFF. Well that could be
>> another story.

What kind of case were you thinking of?

>> Anyways when mips_compute_frame_info is called with
>> reload_completed=true, we can as well use the cached values, as they
>> were immediately before reload_completed.
>> 
>> So what do you think of my new version of the patch?
>
> Hi Bernd,
>
> I don't see any problem with this change from a MIPS backend perspective.
> As to whether rtx_addr_can_trap_p should be using initial_elimination_offset
> then I don't have any particular opinion.
>
> Richard: Any objections to using this fix so we can solve the PR leaving
> debate on the original rtx_addr_can_trap_p as a separate issue?

Well, it still feels to me like the patch is papering over the problem.
Like Bernd says, we didn't previously call mips_compute_frame_info with
reload_completed, and I don't think we should be treating the new
behaviour as normal.

Thanks,
Richard
Bernd Edlinger Jan. 6, 2016, 8:52 a.m. UTC | #3
On 05.01.2016 21:41, Richard Sandiford wrote:
> Matthew Fortune <Matthew.Fortune@imgtec.com> writes:
>> Bernd Edlinger <bernd.edlinger@hotmail.de> writes:
>>> Yes, I agree, it _should_ be a no-op, but given the complexity of
>>> mips_compute_frame_info it is probably better to use cached values after
>>> reload completed.
>>>
>>> Before reload_completed, rtx_addr_can_trap_p is just trying to do an
>>> initial guess on the stack layout, it may well be possible that some
>>> important information is still missing here.  We will never call the
>>> target callback from here, before reload_completed.  It is however not
>>> completely impossible that rtx_addr_can_trap_p is called while
>>> lra/reload is already renaming the registers, see PR66614.
> It sounds like we could then have the same kind of problem when
> reload_in_progress though.  I.e. we'd have a predicate function like
> rtx_addr_can_trap_p changing global state.  I realise the symptoms
> wouldn't be the same as in this PR because the instructions that trigger
> the ICE are part of the prologue, but it might show up in other ways.

No, not really.  The logic in rtx_addr_can_trap_p is only dependent on 
reload_completed, when reload_completed is false, it uses a simplified 
stack layout, that only takes crtl->outgoing_args_size and 
get_frame_size () into account.  Only when reload_completed is true, it 
tries to get more accurate information from the 
INITIAL_ELIMINATION_OFFSET, which can easily skip the frame re-layout if 
reload_completed is true.

> E.g. LRA (like reload) loops until it reaches a stable stack layout.
> If one iteration of the loop does something that changes the stack
> layout (and so requires another go-round), calling rtx_addr_can_trap_p
> before the loop restarts would alter the global state unexpectedly.
> That might trigger an ICE.

It is impossible that it can happen again.

>
> If we need rtx_addr_can_trap_p to be as accurate as possible (rather
> than conservative) during LRA/reload, I suppose we should calculate
> the crtl fields I mentioned during each LRA/reload phase, rather than
> at the end.

Sure rtx_addr_can_trap_p is not always 100% accurate, but the data that 
is missing is not only the initial_elimination_offset, as it was used by 
reload, which may be already available with your patch while 
reload_in_progress (but the PR describes a case where the index register 
and the offset does not match, so that would not be fixed. Other than 
these I never observed false positives in the reload pass, so I assume 
that the simplified frame layout is sufficiently accurate though).

What could be improved is this, but it is not really simple: Actually 
most registers keep the same offset relative to each other, argp, fp, 
hfp are constant, but sp is something different. Sometimes, sp can keep 
the same value in the whole body, but even without alloca, sp can become 
different than what the function started with.  For instance we push 
parameters for a function call, and then reload knows that, and can 
still use [sp+x] to access the next value, but rtx_addr_can_trap_p does 
not have access to the insn, and thus can not know that sp is now 
different than at the beginning of the function.  I think the reg_notes 
on the insns should have the necessary data, but from 
rtx_addr_can_trap_p it is impossible to get to the corresponding insn.  
Maybe reload could add some annotations to the sp-relative offsets, to 
make it possible for rtx_addr_can_trap_p to find out, if there is a 
constant offset from the current sp to the initial sp.

>>> Of course the required information is already cached in
>>> cfun->machine->frame, but rtx_addr_can_trap_p can't use the information
>>> without a target callback.  And I just want to avoid to invent a new
>>> callback for that.
> Right.  I don't think we need or want a new callback.  My point was
> that reload and LRA already compute the offsets you need, so it seems
> better to cache them there than to try to recalculate them in rtlanal.c.
>
>>> I looked at how other targets handle this situation and found that most
>>> of them either have a more simple frame layout function or they keep the
>>> previously cached values if they are ever called after reload_completed.
>>>
>>> I found, previously mips_compute_frame_info was always called with
>>> reload_completed=false, either from mips_frame_pointer_required or from
>>> mips_initial_elimination_offset.  Reload seems to call the
>>> initial_elimination_offset repeatedly and seems to _expect_ the values
>>> to change from time to time.
> Yeah.  This happens if the first reload attempt fails because there aren't
> enough free registers or enough spill slots.  We then update the frame
> layout to account for the new spill slots and call-saved register save
> slots, then try again.
>
>>> However it does not seem to expect the
>>> result from mips_frame_pointer_required to change after reload/LRA has
>>> started.  But as it seems, that may be possible for TARGET_MIPS16 if the
>>> frame size happens to be very close to 0x7FFF. Well that could be
>>> another story.
> What kind of case were you thinking of?

I thougt, what if we have a function where the initial frame without any 
spill slots is exactly 0x7FFC bytes, and we have a case where the result 
of mips_frame_pointer_required matters, then it would return false 
initially, but then it tourns out, that we have to update the frame 
layout, and this time mips_frame_pointer_required would return true, but 
it is not called again, right?


Thanks,
Bernd.

>>> Anyways when mips_compute_frame_info is called with
>>> reload_completed=true, we can as well use the cached values, as they
>>> were immediately before reload_completed.
>>>
>>> So what do you think of my new version of the patch?
>> Hi Bernd,
>>
>> I don't see any problem with this change from a MIPS backend perspective.
>> As to whether rtx_addr_can_trap_p should be using initial_elimination_offset
>> then I don't have any particular opinion.
>>
>> Richard: Any objections to using this fix so we can solve the PR leaving
>> debate on the original rtx_addr_can_trap_p as a separate issue?
> Well, it still feels to me like the patch is papering over the problem.
> Like Bernd says, we didn't previously call mips_compute_frame_info with
> reload_completed, and I don't think we should be treating the new
> behaviour as normal.
>
> Thanks,
> Richard
diff mbox

Patch

Index: gcc/config/mips/mips.c
===================================================================
--- gcc/config/mips/mips.c	(revision 231954)
+++ gcc/config/mips/mips.c	(working copy)
@@ -10321,6 +10321,10 @@  mips_compute_frame_info (void)
   HOST_WIDE_INT offset, size;
   unsigned int regno, i;
 
+  /* Don't change the frame info after reload completed.  */
+  if (reload_completed)
+    return;
+
   /* Set this function's interrupt properties.  */
   if (mips_interrupt_type_p (TREE_TYPE (current_function_decl)))
     {