diff mbox series

Fix LRA/reload issue with -fnon-call-exceptions

Message ID 5055239.m081rdh9Qn@polaris
State New
Headers show
Series Fix LRA/reload issue with -fnon-call-exceptions | expand

Commit Message

Eric Botcazou Feb. 15, 2019, 10:04 a.m. UTC
Hi,

this is a regression present on all active branches since the controversial 
get_initial_register_offset stuff was added to rtlanal.c some time ago, and 
visible in the testsuite on PowerPC/Linux under the form of gnat.dg/opt73.adb 
timing out at run time.

The problem is that the compiler generates code that doesn't save the frame 
pointer before clobbering it, because rs6000_stack_info computes a wrong final 
(post-reload) stack layout.  The scenario is as follows: LRA decides to use 
the frame pointer, sets reload_completed to 1 at the end and then does:

  /* We've possibly turned single trapping insn into multiple ones.  */
  if (cfun->can_throw_non_call_exceptions)
    {
      auto_sbitmap blocks (last_basic_block_for_fn (cfun));
      bitmap_ones (blocks);
      find_many_sub_basic_blocks (blocks);
    }

But find_many_sub_basic_blocks calls control_flow_insn_p, which in turn can 
call rtx_addr_can_trap_p_1, which can call get_initial_register_offset, which 
uses INITIAL_ELIMINATION_OFFSET, IOW rs6000_initial_elimination_offset, which 
calls rs6000_stack_info.  But at this point the DF information hasn't been 
updated so the frame pointer isn't detected as live by df_regs_ever_live_p.

You may think that the fix is just to set reload_completed to 1 after the 
above code in lra, but that's not sufficient because the same issue can arise 
from the do_reload function:

  if (optimize)
    cleanup_cfg (CLEANUP_EXPENSIVE);

when checking is enabled, because cleanup_cfg can calls control_flow_insn_p 
and then eventually rtx_addr_can_trap_p_1.  In other words, we would need 
to set reload_completed to 1 only after the above code, which is very late.
As a matter of fact, that's not possible for old reload itself because of:

  /* We must set reload_completed now since the cleanup_subreg_operands call
     below will re-recognize each insn and reload may have generated insns
     which are only valid during and after reload.  */
  reload_completed = 1;

So, barring the removal of the get_initial_register_offset stuff, the only 
simple fix is probably to prevent it from calling into the back-end too early, 
for example with the attached fixlet.  Tested on x86-64 and PowerPC/Linux.

Thoughts?  Where do we want to fix this?


	* rtlanal.c (get_initial_register_offset): Fall back to the raw estimate
	as long as the epilogue isn't completed.

Comments

Eric Botcazou Feb. 19, 2019, 9:26 a.m. UTC | #1
> So, barring the removal of the get_initial_register_offset stuff, the only
> simple fix is probably to prevent it from calling into the back-end too
> early, for example with the attached fixlet.  Tested on x86-64 and
> PowerPC/Linux.
> 
> Thoughts?  Where do we want to fix this?
> 
> 
> 	* rtlanal.c (get_initial_register_offset): Fall back to the raw estimate
> 	as long as the epilogue isn't completed.

I have installed it on mainline only for now.
Richard Sandiford Feb. 19, 2019, 12:47 p.m. UTC | #2
Eric Botcazou <ebotcazou@adacore.com> writes:
> Hi,
>
> this is a regression present on all active branches since the controversial 
> get_initial_register_offset stuff was added to rtlanal.c some time ago, and 
> visible in the testsuite on PowerPC/Linux under the form of gnat.dg/opt73.adb 
> timing out at run time.
>
> The problem is that the compiler generates code that doesn't save the frame 
> pointer before clobbering it, because rs6000_stack_info computes a wrong final 
> (post-reload) stack layout.  The scenario is as follows: LRA decides to use 
> the frame pointer, sets reload_completed to 1 at the end and then does:
>
>   /* We've possibly turned single trapping insn into multiple ones.  */
>   if (cfun->can_throw_non_call_exceptions)
>     {
>       auto_sbitmap blocks (last_basic_block_for_fn (cfun));
>       bitmap_ones (blocks);
>       find_many_sub_basic_blocks (blocks);
>     }
>
> But find_many_sub_basic_blocks calls control_flow_insn_p, which in turn can 
> call rtx_addr_can_trap_p_1, which can call get_initial_register_offset, which 
> uses INITIAL_ELIMINATION_OFFSET, IOW rs6000_initial_elimination_offset, which 
> calls rs6000_stack_info.  But at this point the DF information hasn't been 
> updated so the frame pointer isn't detected as live by df_regs_ever_live_p.

Doesn't the frame have to be laid out correctly during the final reload
subpass anyway, in order to get the right elimination choices and
elimination offsets?  If so, I'm not sure what the " && info->reload_completed"
check in rs6000_stack_info achieves.  It seems like that's at least partly
responsible for the problem.

> You may think that the fix is just to set reload_completed to 1 after the 
> above code in lra, but that's not sufficient because the same issue can arise 
> from the do_reload function:
>
>   if (optimize)
>     cleanup_cfg (CLEANUP_EXPENSIVE);
>
> when checking is enabled, because cleanup_cfg can calls control_flow_insn_p 
> and then eventually rtx_addr_can_trap_p_1.  In other words, we would need 
> to set reload_completed to 1 only after the above code, which is very late.
> As a matter of fact, that's not possible for old reload itself because of:
>
>   /* We must set reload_completed now since the cleanup_subreg_operands call
>      below will re-recognize each insn and reload may have generated insns
>      which are only valid during and after reload.  */
>   reload_completed = 1;
>
> So, barring the removal of the get_initial_register_offset stuff, the only 
> simple fix is probably to prevent it from calling into the back-end too early, 
> for example with the attached fixlet.  Tested on x86-64 and PowerPC/Linux.
>
> Thoughts?  Where do we want to fix this?
>
>
> 	* rtlanal.c (get_initial_register_offset): Fall back to the raw estimate
> 	as long as the epilogue isn't completed.

I realise you've already applied it and that you saw it as a hack too,
but this seems like a bit too much.  IMO a cleaner fix would be to define
TARGET_COMPUTE_FRAME_LAYOUT for rs6000.

I guess that approach means that TARGET_COMPUTE_FRAME_LAYOUT isn't really
optional though.

Thanks,
Richard
Eric Botcazou Feb. 19, 2019, 5:15 p.m. UTC | #3
> Doesn't the frame have to be laid out correctly during the final reload
> subpass anyway, in order to get the right elimination choices and
> elimination offsets?  If so, I'm not sure what the " &&
> info->reload_completed" check in rs6000_stack_info achieves.  It seems like
> that's at least partly responsible for the problem.

Clearly a comment would be in order.  I'm not sure it's responsible for the 
problem, unless you mean that it's papering over the underlying issue, in 
which case you're probably right.  This underlying issue is probably that:

  /* Calculate which registers need to be saved & save area size.  */
  info->first_gp_reg_save = first_reg_to_save ();

overlooks the specific status of the frame pointer, i.e. it doesn't test 
frame_pointer_needed explicitly, which I think is pretty much mandatory.
The rest of the code apparently works correctly despite this though.

> I realise you've already applied it and that you saw it as a hack too,
> but this seems like a bit too much.  IMO a cleaner fix would be to define
> TARGET_COMPUTE_FRAME_LAYOUT for rs6000.

Well, it's a hack on top of a big kludge (the get_initial_register_offset 
stuff in rtlanal.c) so it can be viewed as a safety net too. :-)

> I guess that approach means that TARGET_COMPUTE_FRAME_LAYOUT isn't really
> optional though.

IMO another workaround for the underlying issue.
Eric Botcazou Feb. 26, 2019, 11:10 a.m. UTC | #4
> 	* rtlanal.c (get_initial_register_offset): Fall back to the raw estimate
> 	as long as the epilogue isn't completed.

I have backported it onto the 8 branch, where it fixes the failure (timeout) 
of gnat.dg/opt73.adb for PowerPC/Linux, after testing it on this platform.
Eric Botcazou April 6, 2019, 9:44 p.m. UTC | #5
> I have backported it onto the 8 branch, where it fixes the failure (timeout)
> of gnat.dg/opt73.adb for PowerPC/Linux, after testing it on this platform.

And now onto the 7 branch, after testing on x86-64/Linux and PowerPC/Linux.
diff mbox series

Patch

Index: rtlanal.c
===================================================================
--- rtlanal.c	(revision 268849)
+++ rtlanal.c	(working copy)
@@ -359,10 +359,10 @@  get_initial_register_offset (int from, i
   if (to == from)
     return 0;
 
-  /* It is not safe to call INITIAL_ELIMINATION_OFFSET
-     before the reload pass.  We need to give at least
-     an estimation for the resulting frame size.  */
-  if (! reload_completed)
+  /* It is not safe to call INITIAL_ELIMINATION_OFFSET before the epilogue
+     is completed, but we need to give at least an estimate for the stack
+     pointer based on the frame size.  */
+  if (!epilogue_completed)
     {
       offset1 = crtl->outgoing_args_size + get_frame_size ();
 #if !STACK_GROWS_DOWNWARD