Message ID | 2568702.eKRIyvcb7p@fomalhaut |
---|---|
State | New |
Headers | show |
Series | Fix dangling references in thunks at -O0 | expand |
On Mon, Sep 14, 2020 at 9:46 AM Eric Botcazou <botcazou@adacore.com> wrote: > > Hi, > > when a thunk cannot be emitted in assembly directly, cgraph_node::expand_thunk > generates regular GIMPLE code but unconditionally forces a tail call to the > target of the thunk. That's theoretically OK because the thunk essentially > forwards its parameters to the target, but in practice the RTL expander can > spill parameters passed by reference on the stack, see assign_parm_setup_reg: > > /* If we were passed a pointer but the actual value can safely live > in a register, retrieve it and use it directly. */ > if (data->arg.pass_by_reference && TYPE_MODE (TREE_TYPE (parm)) != BLKmode) > { > /* We can't use nominal_mode, because it will have been set to > Pmode above. We must use the actual mode of the parm. */ > if (use_register_for_decl (parm)) > { > parmreg = gen_reg_rtx (TYPE_MODE (TREE_TYPE (parm))); > mark_user_reg (parmreg); > } > else > { > int align = STACK_SLOT_ALIGNMENT (TREE_TYPE (parm), > TYPE_MODE (TREE_TYPE (parm)), > TYPE_ALIGN (TREE_TYPE (parm))); > parmreg > = assign_stack_local (TYPE_MODE (TREE_TYPE (parm)), > GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (parm))), > align); > set_mem_attributes (parmreg, parm, 1); > } > > use_register_for_decl always return false at -O0 so, in this case, the thunk > will pass an address within its frame to its target, so it cannot use a tail > call to invoke it. > > Tested on x86_64-suse-linux, OK for the mainline? ISTR the tailcall flag is only a hint and RTL expansion can decide to not tailcall based on targets. So to me it looks like a missed disqualification on the RTL expansion side. Or do we, besides from this very single spot, simply never tailcall at -O0 and thus never hit this latent issue? How does this change the debug experience at -O0 when GIMPLE thunks are used? Thanks, Richard. > > 2020-09-14 Eric Botcazou <ebotcazou@adacore.com> > > * cgraphunit.c (cgraph_node::expand_thunk): Force a tail call only > when optimizing. > > > 2020-09-14 Eric Botcazou <ebotcazou@adacore.com> > > * gnat.dg/thunk1.adb: New test. > * gnat.dg/thunk1_pkg1.ads: New helper. > * gnat.dg/thunk1_pkg2.ads: Likewise. > * gnat.dg/thunk1_pkg2.adb: Likewise. > > -- > Eric Botcazou
> ISTR the tailcall flag is only a hint and RTL expansion can decide to > not tailcall based on targets. So to me it looks like a missed > disqualification on the RTL expansion side. That's IMO debatable: the GIMPLE tailcall optimization pass e.g. does: /* Make sure the tail invocation of this function does not indirectly refer to local variables. (Passing variables directly by value is OK.) */ FOR_EACH_LOCAL_DECL (cfun, idx, var) { if (TREE_CODE (var) != PARM_DECL && auto_var_in_fn_p (var, cfun->decl) && may_be_aliased (var) && (ref_maybe_used_by_stmt_p (call, var) || call_may_clobber_ref_p (call, var))) Do you want to replace it with something at the RTL level too? I don't think that you can disqualify things at the RTL as easily as at the GIMPLE level. > Or do we, besides from this very single spot, simply never tailcall at -O0 > and thus never hit this latent issue? Presumably yes, tail calling is an optimization like the others. > How does this change the debug experience at -O0 when GIMPLE thunks > are used? In Ada this doesn't change much since thunks have line debug info.
On Mon, Sep 14, 2020 at 10:36 AM Eric Botcazou <botcazou@adacore.com> wrote: > > > ISTR the tailcall flag is only a hint and RTL expansion can decide to > > not tailcall based on targets. So to me it looks like a missed > > disqualification on the RTL expansion side. > > That's IMO debatable: the GIMPLE tailcall optimization pass e.g. does: > > /* Make sure the tail invocation of this function does not indirectly > refer to local variables. (Passing variables directly by value > is OK.) */ > FOR_EACH_LOCAL_DECL (cfun, idx, var) > { > if (TREE_CODE (var) != PARM_DECL > && auto_var_in_fn_p (var, cfun->decl) > && may_be_aliased (var) > && (ref_maybe_used_by_stmt_p (call, var) > || call_may_clobber_ref_p (call, var))) > > Do you want to replace it with something at the RTL level too? I don't think > that you can disqualify things at the RTL as easily as at the GIMPLE level. No, so you're right - validity analysis is split. Still the cause you reference comes from RTL expansion which means RTL expansion should possibly do the disqualification here. The question is what makes the case you run into at -O0 impossible to trigger with -O1+? Maybe we can also avoid this spilling at -O0? > > Or do we, besides from this very single spot, simply never tailcall at -O0 > > and thus never hit this latent issue? > > Presumably yes, tail calling is an optimization like the others. Yeah, but the asm thunk tail-calls also at -O0. I guess tail-calling is forced here because gimple analysis sometimes would not mark the call. > > How does this change the debug experience at -O0 when GIMPLE thunks > > are used? > > In Ada this doesn't change much since thunks have line debug info. I see, so as long as you then don't see extra frames it should be fine with regard to debug info. I'm not against the patch but let's leave it a bit for others to comment. Richard. > -- > Eric Botcazou > >
> No, so you're right - validity analysis is split. Still the cause you > reference comes from RTL expansion which means RTL expansion should > possibly do the disqualification here. The question is what makes the case > you run into at -O0 impossible to trigger with -O1+? The key test in use_register_for_decl is: if (optimize) return true; I think that all the preceding tests that return false cannot trigger in this context - a parameter passed by invisible reference - so, by returning true here, you're pretty much covered I think. > Maybe we can also avoid this spilling at -O0? The code wants to retrieve the parameter at all optimization levels. The question is: register or stack slot? It uses use_register_for_decl to decide as in other circumstances, but so you want to always force a register? Note that, since cgraph_node::expand_thunk has expanded the thunk in GIMPLE, the function is no longer a thunk for the middle-end (cfun->is_thunk false). That being said, I can add another bit to cfun, e.g. is_gimple_thunk, and force the use of a register above only in case the bit is true. > Yeah, but the asm thunk tail-calls also at -O0. I guess tail-calling is > forced here because gimple analysis sometimes would not mark the call. The low-level assembly thunks simply support the most simple thunks, see the condition in expand_thunk. Moreover targets can opt out as they wish.
> That being said, I can add another bit to cfun, e.g. is_gimple_thunk, and > force the use of a register above only in case the bit is true. In fact I can probably reuse cfun->tail_call_marked for this purpose.
On Mon, Sep 14, 2020 at 12:56 PM Eric Botcazou <botcazou@adacore.com> wrote: > > > No, so you're right - validity analysis is split. Still the cause you > > reference comes from RTL expansion which means RTL expansion should > > possibly do the disqualification here. The question is what makes the case > > you run into at -O0 impossible to trigger with -O1+? > > The key test in use_register_for_decl is: > > if (optimize) > return true; > > I think that all the preceding tests that return false cannot trigger in this > context - a parameter passed by invisible reference - so, by returning true > here, you're pretty much covered I think. > > > Maybe we can also avoid this spilling at -O0? > > The code wants to retrieve the parameter at all optimization levels. The > question is: register or stack slot? It uses use_register_for_decl to decide > as in other circumstances, but so you want to always force a register? > > Note that, since cgraph_node::expand_thunk has expanded the thunk in GIMPLE, > the function is no longer a thunk for the middle-end (cfun->is_thunk false). > > That being said, I can add another bit to cfun, e.g. is_gimple_thunk, and > force the use of a register above only in case the bit is true. /* We don't set DECL_IGNORED_P for the function_result_decl. */ if (optimize) return true; /* We don't set DECL_REGISTER for the function_result_decl. */ return false; or maybe set (and check) DECL_REGISTER? But the above certainly looks like we can pick as we wish. > > Yeah, but the asm thunk tail-calls also at -O0. I guess tail-calling is > > forced here because gimple analysis sometimes would not mark the call. > > The low-level assembly thunks simply support the most simple thunks, see the > condition in expand_thunk. Moreover targets can opt out as they wish. > > -- > Eric Botcazou > >
> In fact I can probably reuse cfun->tail_call_marked for this purpose.
Like so.
* cgraphunit.c (cgraph_node::expand_thunk): Make sure to set
cfun->tail_call_marked when forcing a tail call.
* function.c (assign_parm_setup_reg): Always use a register to
retrieve a parameter passed by reference if cfun->tail_call_marked.
On Mon, Sep 14, 2020 at 1:31 PM Eric Botcazou <botcazou@adacore.com> wrote: > > > In fact I can probably reuse cfun->tail_call_marked for this purpose. > > Like so. Looks reasonable to me. Richard. > > * cgraphunit.c (cgraph_node::expand_thunk): Make sure to set > cfun->tail_call_marked when forcing a tail call. > * function.c (assign_parm_setup_reg): Always use a register to > retrieve a parameter passed by reference if cfun->tail_call_marked. > > -- > Eric Botcazou
This introduces an ICE building the glibc testsuite for alpha (bisected), s390 and sparc (symptoms appear the same, not bisected to confirm the exact revision). See bug 97078.
> This introduces an ICE building the glibc testsuite for alpha (bisected), > s390 and sparc (symptoms appear the same, not bisected to confirm the > exact revision). See bug 97078. Fixed thus, tested on x86_64-suse-linux, applied on mainline as obvious. PR middle-end/97078 * function.c (use_register_for_decl): Test cfun->tail_call_marked for a parameter here instead of... (assign_parm_setup_reg): ...here. * gcc.dg/pr97078.c: New test.
diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c index 26d3995a0c0..fe8552e3cf2 100644 --- a/gcc/cgraphunit.c +++ b/gcc/cgraphunit.c @@ -2170,8 +2170,11 @@ cgraph_node::expand_thunk (bool output_asm_thunks, bool force_gimple_thunk) bsi = gsi_last_bb (return_bb); } } + + /* We can use a tail call, but only when optimizing to be sure that + the RTL expander doesn't spill parameters passed by reference. */ else - gimple_call_set_tail (call, true); + gimple_call_set_tail (call, optimize); /* Build return value. */ if (!DECL_BY_REFERENCE (resdecl)) @@ -2183,7 +2186,7 @@ cgraph_node::expand_thunk (bool output_asm_thunks, bool force_gimple_thunk) } else { - gimple_call_set_tail (call, true); + gimple_call_set_tail (call, optimize); remove_edge (single_succ_edge (bb)); }