diff mbox series

Fix dangling references in thunks at -O0

Message ID 2568702.eKRIyvcb7p@fomalhaut
State New
Headers show
Series Fix dangling references in thunks at -O0 | expand

Commit Message

Eric Botcazou Sept. 14, 2020, 7:44 a.m. UTC
Hi,

when a thunk cannot be emitted in assembly directly, cgraph_node::expand_thunk 
generates regular GIMPLE code but unconditionally forces a tail call to the 
target of the thunk.  That's theoretically OK because the thunk essentially 
forwards its parameters to the target, but in practice the RTL expander can 
spill parameters passed by reference on the stack, see assign_parm_setup_reg:

  /* If we were passed a pointer but the actual value can safely live
     in a register, retrieve it and use it directly.  */
  if (data->arg.pass_by_reference && TYPE_MODE (TREE_TYPE (parm)) != BLKmode)
    {
      /* We can't use nominal_mode, because it will have been set to
	 Pmode above.  We must use the actual mode of the parm.  */
      if (use_register_for_decl (parm))
	{
	  parmreg = gen_reg_rtx (TYPE_MODE (TREE_TYPE (parm)));
	  mark_user_reg (parmreg);
	}
      else
	{
	  int align = STACK_SLOT_ALIGNMENT (TREE_TYPE (parm),
				    TYPE_MODE (TREE_TYPE (parm)),
			    TYPE_ALIGN (TREE_TYPE (parm)));
	  parmreg
	    = assign_stack_local (TYPE_MODE (TREE_TYPE (parm)),
		  GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (parm))),
				  align);
	  set_mem_attributes (parmreg, parm, 1);
	}

use_register_for_decl always return false at -O0 so, in this case, the thunk 
will pass an address within its frame to its target, so it cannot use a tail 
call to invoke it.

Tested on x86_64-suse-linux, OK for the mainline?


2020-09-14  Eric Botcazou  <ebotcazou@adacore.com>

	* cgraphunit.c (cgraph_node::expand_thunk): Force a tail call only
	when optimizing.


2020-09-14  Eric Botcazou  <ebotcazou@adacore.com>

	* gnat.dg/thunk1.adb: New test.
	* gnat.dg/thunk1_pkg1.ads: New helper.
	* gnat.dg/thunk1_pkg2.ads: Likewise.
	* gnat.dg/thunk1_pkg2.adb: Likewise.

Comments

Richard Biener Sept. 14, 2020, 7:53 a.m. UTC | #1
On Mon, Sep 14, 2020 at 9:46 AM Eric Botcazou <botcazou@adacore.com> wrote:
>
> Hi,
>
> when a thunk cannot be emitted in assembly directly, cgraph_node::expand_thunk
> generates regular GIMPLE code but unconditionally forces a tail call to the
> target of the thunk.  That's theoretically OK because the thunk essentially
> forwards its parameters to the target, but in practice the RTL expander can
> spill parameters passed by reference on the stack, see assign_parm_setup_reg:
>
>   /* If we were passed a pointer but the actual value can safely live
>      in a register, retrieve it and use it directly.  */
>   if (data->arg.pass_by_reference && TYPE_MODE (TREE_TYPE (parm)) != BLKmode)
>     {
>       /* We can't use nominal_mode, because it will have been set to
>          Pmode above.  We must use the actual mode of the parm.  */
>       if (use_register_for_decl (parm))
>         {
>           parmreg = gen_reg_rtx (TYPE_MODE (TREE_TYPE (parm)));
>           mark_user_reg (parmreg);
>         }
>       else
>         {
>           int align = STACK_SLOT_ALIGNMENT (TREE_TYPE (parm),
>                                     TYPE_MODE (TREE_TYPE (parm)),
>                             TYPE_ALIGN (TREE_TYPE (parm)));
>           parmreg
>             = assign_stack_local (TYPE_MODE (TREE_TYPE (parm)),
>                   GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (parm))),
>                                   align);
>           set_mem_attributes (parmreg, parm, 1);
>         }
>
> use_register_for_decl always return false at -O0 so, in this case, the thunk
> will pass an address within its frame to its target, so it cannot use a tail
> call to invoke it.
>
> Tested on x86_64-suse-linux, OK for the mainline?

ISTR the tailcall flag is only a hint and RTL expansion can decide to
not tailcall based on targets.  So to me it looks like a missed disqualification
on the RTL expansion side.  Or do we, besides from this very single spot,
simply never tailcall at -O0 and thus never hit this latent issue?

How does this change the debug experience at -O0 when GIMPLE thunks
are used?

Thanks,
Richard.

>
> 2020-09-14  Eric Botcazou  <ebotcazou@adacore.com>
>
>         * cgraphunit.c (cgraph_node::expand_thunk): Force a tail call only
>         when optimizing.
>
>
> 2020-09-14  Eric Botcazou  <ebotcazou@adacore.com>
>
>         * gnat.dg/thunk1.adb: New test.
>         * gnat.dg/thunk1_pkg1.ads: New helper.
>         * gnat.dg/thunk1_pkg2.ads: Likewise.
>         * gnat.dg/thunk1_pkg2.adb: Likewise.
>
> --
> Eric Botcazou
Eric Botcazou Sept. 14, 2020, 8:36 a.m. UTC | #2
> ISTR the tailcall flag is only a hint and RTL expansion can decide to
> not tailcall based on targets.  So to me it looks like a missed
> disqualification on the RTL expansion side.

That's IMO debatable: the GIMPLE tailcall optimization pass e.g. does:

  /* Make sure the tail invocation of this function does not indirectly
     refer to local variables.  (Passing variables directly by value
     is OK.)  */
  FOR_EACH_LOCAL_DECL (cfun, idx, var)
    {
      if (TREE_CODE (var) != PARM_DECL
	  && auto_var_in_fn_p (var, cfun->decl)
	  && may_be_aliased (var)
	  && (ref_maybe_used_by_stmt_p (call, var)
	      || call_may_clobber_ref_p (call, var)))

Do you want to replace it with something at the RTL level too?  I don't think 
that you can disqualify things at the RTL as easily as at the GIMPLE level.

> Or do we, besides from this very single spot, simply never tailcall at -O0
> and thus never hit this latent issue?

Presumably yes, tail calling is an optimization like the others.

> How does this change the debug experience at -O0 when GIMPLE thunks
> are used?

In Ada this doesn't change much since thunks have line debug info.
Richard Biener Sept. 14, 2020, 9:54 a.m. UTC | #3
On Mon, Sep 14, 2020 at 10:36 AM Eric Botcazou <botcazou@adacore.com> wrote:
>
> > ISTR the tailcall flag is only a hint and RTL expansion can decide to
> > not tailcall based on targets.  So to me it looks like a missed
> > disqualification on the RTL expansion side.
>
> That's IMO debatable: the GIMPLE tailcall optimization pass e.g. does:
>
>   /* Make sure the tail invocation of this function does not indirectly
>      refer to local variables.  (Passing variables directly by value
>      is OK.)  */
>   FOR_EACH_LOCAL_DECL (cfun, idx, var)
>     {
>       if (TREE_CODE (var) != PARM_DECL
>           && auto_var_in_fn_p (var, cfun->decl)
>           && may_be_aliased (var)
>           && (ref_maybe_used_by_stmt_p (call, var)
>               || call_may_clobber_ref_p (call, var)))
>
> Do you want to replace it with something at the RTL level too?  I don't think
> that you can disqualify things at the RTL as easily as at the GIMPLE level.

No, so you're right - validity analysis is split.  Still the cause you reference
comes from RTL expansion which means RTL expansion should possibly
do the disqualification here.  The question is what makes the case you
run into at -O0 impossible to trigger with -O1+?

Maybe we can also avoid this spilling at -O0?

> > Or do we, besides from this very single spot, simply never tailcall at -O0
> > and thus never hit this latent issue?
>
> Presumably yes, tail calling is an optimization like the others.

Yeah, but the asm thunk tail-calls also at -O0.  I guess tail-calling is forced
here because gimple analysis sometimes would not mark the call.

> > How does this change the debug experience at -O0 when GIMPLE thunks
> > are used?
>
> In Ada this doesn't change much since thunks have line debug info.

I see, so as long as you then don't see extra frames it should be fine
with regard to debug info.

I'm not against the patch but let's leave it a bit for others to comment.

Richard.

> --
> Eric Botcazou
>
>
Eric Botcazou Sept. 14, 2020, 10:56 a.m. UTC | #4
> No, so you're right - validity analysis is split.  Still the cause you
> reference comes from RTL expansion which means RTL expansion should
> possibly do the disqualification here.  The question is what makes the case
> you run into at -O0 impossible to trigger with -O1+?

The key test in use_register_for_decl is:

  if (optimize)
    return true;

I think that all the preceding tests that return false cannot trigger in this 
context - a parameter passed by invisible reference - so, by returning true 
here, you're pretty much covered I think.

> Maybe we can also avoid this spilling at -O0?

The code wants to retrieve the parameter at all optimization levels.  The 
question is: register or stack slot?  It uses use_register_for_decl to decide 
as in other circumstances, but so you want to always force a register?

Note that, since cgraph_node::expand_thunk has expanded the thunk in GIMPLE, 
the function is no longer a thunk for the middle-end (cfun->is_thunk false).

That being said, I can add another bit to cfun, e.g. is_gimple_thunk, and 
force the use of a register above only in case the bit is true.

> Yeah, but the asm thunk tail-calls also at -O0.  I guess tail-calling is
> forced here because gimple analysis sometimes would not mark the call.

The low-level assembly thunks simply support the most simple thunks, see the 
condition in expand_thunk.  Moreover targets can opt out as they wish.
Eric Botcazou Sept. 14, 2020, 11:05 a.m. UTC | #5
> That being said, I can add another bit to cfun, e.g. is_gimple_thunk, and
> force the use of a register above only in case the bit is true.

In fact I can probably reuse cfun->tail_call_marked for this purpose.
Richard Biener Sept. 14, 2020, 11:27 a.m. UTC | #6
On Mon, Sep 14, 2020 at 12:56 PM Eric Botcazou <botcazou@adacore.com> wrote:
>
> > No, so you're right - validity analysis is split.  Still the cause you
> > reference comes from RTL expansion which means RTL expansion should
> > possibly do the disqualification here.  The question is what makes the case
> > you run into at -O0 impossible to trigger with -O1+?
>
> The key test in use_register_for_decl is:
>
>   if (optimize)
>     return true;
>
> I think that all the preceding tests that return false cannot trigger in this
> context - a parameter passed by invisible reference - so, by returning true
> here, you're pretty much covered I think.
>
> > Maybe we can also avoid this spilling at -O0?
>
> The code wants to retrieve the parameter at all optimization levels.  The
> question is: register or stack slot?  It uses use_register_for_decl to decide
> as in other circumstances, but so you want to always force a register?
>
> Note that, since cgraph_node::expand_thunk has expanded the thunk in GIMPLE,
> the function is no longer a thunk for the middle-end (cfun->is_thunk false).
>
> That being said, I can add another bit to cfun, e.g. is_gimple_thunk, and
> force the use of a register above only in case the bit is true.

      /* We don't set DECL_IGNORED_P for the function_result_decl.  */
      if (optimize)
        return true;
      /* We don't set DECL_REGISTER for the function_result_decl.  */
      return false;

or maybe set (and check) DECL_REGISTER?  But the above certainly
looks like we can pick as we wish.

> > Yeah, but the asm thunk tail-calls also at -O0.  I guess tail-calling is
> > forced here because gimple analysis sometimes would not mark the call.
>
> The low-level assembly thunks simply support the most simple thunks, see the
> condition in expand_thunk.  Moreover targets can opt out as they wish.
>
> --
> Eric Botcazou
>
>
Eric Botcazou Sept. 14, 2020, 11:31 a.m. UTC | #7
> In fact I can probably reuse cfun->tail_call_marked for this purpose.

Like so.

	* cgraphunit.c (cgraph_node::expand_thunk): Make sure to set
 	cfun->tail_call_marked when forcing a tail call.
	* function.c (assign_parm_setup_reg): Always use a register to
	retrieve a parameter passed by reference if cfun->tail_call_marked.
Richard Biener Sept. 14, 2020, 1:40 p.m. UTC | #8
On Mon, Sep 14, 2020 at 1:31 PM Eric Botcazou <botcazou@adacore.com> wrote:
>
> > In fact I can probably reuse cfun->tail_call_marked for this purpose.
>
> Like so.

Looks reasonable to me.

Richard.
>
>         * cgraphunit.c (cgraph_node::expand_thunk): Make sure to set
>         cfun->tail_call_marked when forcing a tail call.
>         * function.c (assign_parm_setup_reg): Always use a register to
>         retrieve a parameter passed by reference if cfun->tail_call_marked.
>
> --
> Eric Botcazou
Joseph Myers Sept. 16, 2020, 9:17 p.m. UTC | #9
This introduces an ICE building the glibc testsuite for alpha (bisected), 
s390 and sparc (symptoms appear the same, not bisected to confirm the 
exact revision).  See bug 97078.
Eric Botcazou Sept. 17, 2020, 11:01 a.m. UTC | #10
> This introduces an ICE building the glibc testsuite for alpha (bisected),
> s390 and sparc (symptoms appear the same, not bisected to confirm the
> exact revision).  See bug 97078.

Fixed thus, tested on x86_64-suse-linux, applied on mainline as obvious.


	PR middle-end/97078
	* function.c (use_register_for_decl): Test cfun->tail_call_marked
	for a parameter here instead of...
	(assign_parm_setup_reg): ...here.

	* gcc.dg/pr97078.c: New test.
diff mbox series

Patch

diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index 26d3995a0c0..fe8552e3cf2 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -2170,8 +2170,11 @@  cgraph_node::expand_thunk (bool output_asm_thunks, bool force_gimple_thunk)
 		  bsi = gsi_last_bb (return_bb);
 		}
 	    }
+
+	  /* We can use a tail call, but only when optimizing to be sure that
+	     the RTL expander doesn't spill parameters passed by reference.  */
 	  else
-	    gimple_call_set_tail (call, true);
+	    gimple_call_set_tail (call, optimize);
 
 	  /* Build return value.  */
 	  if (!DECL_BY_REFERENCE (resdecl))
@@ -2183,7 +2186,7 @@  cgraph_node::expand_thunk (bool output_asm_thunks, bool force_gimple_thunk)
 	}
       else
 	{
-	  gimple_call_set_tail (call, true);
+	  gimple_call_set_tail (call, optimize);
 	  remove_edge (single_succ_edge (bb));
 	}