| Message ID | AANLkTikWGWLJR6nQYie6dMpQ6zk7mz3VJq1Acy71OgpU@mail.gmail.com |
|---|---|
| State | New |
> The cgraphunit.c change looks ok in this regard, can you explain
> the predict.c change? They do look independent, splitting up

The predict.c change looks wrong to me.  pass_profile is done as part of
the early passes.  The only case where it can be run on a clone of an
already processed function is when early optimizations produce a clone
and that clone is early optimized again.

We clone as part of ipa-sra and partial inlining, but those expect the
clone not to be re-processed, and do things like calling
compute_inline_parameters.  We could re-process clones if the result
seemed beneficial, but I can not think of cases where this makes a big
difference (the changes of partial inlining are rather simple; with
ipa-sra there is some chance that the new function body would optimize
better, but not too grand either).

I will need to go through the thread.  What was the reason for splitting
the lowering passes in two?  This way we will process the whole program
twice, which is a bit slower since it leads to poorer locality.

Honza
On Feb 7, 2011, Richard Guenther <richard.guenther@gmail.com> wrote:

> The referenced_var_lookup interface change is ok.

Ok, I'll split it out and check it in.

>   tree old_cur_fun_decl = current_function_decl;
> +  referenced_var_iterator rvi;
> +
> +  gcc_checking_assert (gimple_referenced_vars (cfun));
> +
>   current_function_decl = decl;
>   push_cfun (DECL_STRUCT_FUNCTION (decl));
>
> looks like only the assert after pushing cfun is required?

Doh!  Thanks.

> Can we get away w/o cfun pushing here now?

Maybe.  I'll have a look.

> Index: gcc/tree-inline.c
> ===================================================================
> --- gcc/tree-inline.c.orig      2011-02-07 09:23:32.407058212 -0200
> +++ gcc/tree-inline.c   2011-02-07 09:25:20.376780499 -0200
> @@ -317,7 +317,11 @@ remap_decl (tree decl, copy_body_data *i
>            || TREE_CODE (t) == RESULT_DECL || TREE_CODE (t) == PARM_DECL))
>      {
>        get_var_ann (t);
> -      add_referenced_var (t);
> +      if (TREE_CODE (decl) != VAR_DECL
> +          || !gimple_referenced_vars (DECL_STRUCT_FUNCTION (id->src_fn))
> +          || referenced_var_lookup (DECL_STRUCT_FUNCTION (id->src_fn),
> +                                    DECL_UID (decl)))
> +        add_referenced_var (t);
>
> get_var_ann is redundant with add_referenced_var.

Yeah, will drop and unminimize the patch.

> Do we really run into the
> !gimple_referenced_vars case here?

Some earlier version of the patch did.  Maybe not any more.

> @@ -4753,6 +4757,13 @@ copy_decl_for_dup_finish (copy_body_data
>       new function.  */
>    DECL_CONTEXT (copy) = id->dst_fn;
>
> +  if (TREE_CODE (decl) == VAR_DECL
> +      && gimple_referenced_vars (cfun)
> +      && gimple_referenced_vars (DECL_STRUCT_FUNCTION (id->src_fn))
> +      && referenced_var_lookup (DECL_STRUCT_FUNCTION (id->src_fn),
> +                                DECL_UID (decl)))
> +    add_referenced_var (copy);
>
> Why only for VAR_DECL decls and not to-var-decl translated PARM_DECLs?

Yeah, PARM_DECLs and RESULT_DECLs get unconditional add_referenced_var
elsewhere.

> I wanted to ask you about

>> That isn't enough.  We often attempt to estimate the stack size of a
>> function before we so much as put it in SSA form (let alone compute
>> referenced_vars), while processing other functions.

> anyway, as I don't see why we do that (nor where).

What I noticed was that the trivial patch that moved
pass_inline_parameters down and asserted gimple_referenced_vars (cfun)
failed the assertion.  I didn't investigate much; I just figured it made
sense for one function to look at the stack size estimates of its
potentially-inlined callees to estimate its own, and since the call
graph may contain cycles, we can't ensure we'll always process the
callee before the caller.  If my "figuring" is right, I don't see how to
fix this, but I'll go back to the drawing board and investigate further.

> The cgraphunit.c change looks ok in this regard, can you explain
> the predict.c change?

We split a function after profile-guessing, and initialize_cfun copied
the PROFILE_GUESSED status from the original function to the split.
Then, when we got to running the various passes over the newly-created
version, we failed the assertion that profile_status != PROFILE_GUESSED
in gimple_predict_edge.  I didn't try to find out why this didn't happen
before the pass rearrangement.  I figured it made sense that the pass
rearrangement enabled some more inlining/versioning, exposing this kind
of situation, just like it exposed the need for cleanups after early
inlining to avoid SSA verify failures because formerly-throwing call
stmts ended up known not to throw.
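The callee-before-caller ordering problem Alexandre describes is, abstractly, a topological-sort problem. The sketch below is not GCC code (the graph and function names are invented for illustration); it just shows that once the call graph contains a cycle, no processing order can guarantee every callee is visited before its callers:

```python
def topo_order(call_graph):
    """Try to order functions so every callee precedes its callers.

    call_graph maps each function name to the set of functions it calls.
    Returns a list in callee-first order, or None when a cycle makes
    such an order impossible.
    """
    order, done, in_progress = [], set(), set()

    def visit(fn):
        if fn in done:
            return True
        if fn in in_progress:      # back edge: a recursion cycle
            return False
        in_progress.add(fn)
        for callee in call_graph[fn]:
            if not visit(callee):
                return False
        in_progress.discard(fn)
        done.add(fn)
        order.append(fn)
        return True

    for fn in call_graph:
        if not visit(fn):
            return None
    return order

# An acyclic call graph admits a callee-first order ...
acyclic = {"main": {"helper"}, "helper": set()}
print(topo_order(acyclic))        # ['helper', 'main']

# ... but mutual recursion defeats any such ordering, so some caller's
# stack size would have to be estimated before its callee's is known.
cyclic = {"even": {"odd"}, "odd": {"even"}}
print(topo_order(cyclic))         # None
```

This is exactly why "just process callees first" cannot work as a general fix: for the cyclic case, whatever the pass manager does, at least one caller is reached while one of its callees is still unprocessed.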
On Feb 7, 2011, Jan Hubicka <hubicka@ucw.cz> wrote:

>> The cgraphunit.c change looks ok in this regard, can you explain
>> the predict.c change? They do look independent, splitting up

> The predict.c change looks wrong to me.  pass_profile is done as part
> of the early passes.  The only case where it can be run on a clone of
> an already processed function is when early optimizations produce a
> clone and that clone is early optimized again.

Maybe it was the split that enabled the second part of the early passes
to run for newly-generated functions.  So I won't investigate, because
we're going to (try to) go back to a single early pass.

> What was the reason for splitting the lowering passes in two?

It looked like we had to compute referenced_vars for all functions
before computing inline parameters for any, because we could attempt to
estimate the stack size of functions that hadn't gone through
referenced_vars yet.
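Honza's locality concern and the constraint Alexandre describes pull in opposite directions. The schematic below (invented pass and function names, not GCC's pass manager) contrasts the two schedules:

```python
def function_at_a_time(functions, passes):
    """Run every pass on one function before touching the next.

    Good cache locality: each function body stays hot while all
    of its passes run over it.
    """
    return [(p, fn) for fn in functions for p in passes]

def pass_at_a_time(functions, passes):
    """Run one pass over all functions before starting the next pass.

    Needed when a pass (like inline-parameter estimation) may consult
    data that an earlier pass must first produce for *other* functions.
    """
    return [(p, fn) for p in passes for fn in functions]

fns = ["f", "g"]
ps = ["build_ssa", "inline_parameters"]

# Pass-at-a-time guarantees build_ssa has run on g before
# inline_parameters ever runs on f (which may inspect g):
s = pass_at_a_time(fns, ps)
assert s.index(("build_ssa", "g")) < s.index(("inline_parameters", "f"))

# Function-at-a-time gives no such guarantee:
s = function_at_a_time(fns, ps)
assert s.index(("build_ssa", "g")) > s.index(("inline_parameters", "f"))
```

The pass-at-a-time schedule is what splitting the lowering passes in two buys, at the cost Honza points out: the program is walked once per pass group, so each function's body is brought into cache repeatedly.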
On Mon, Feb 7, 2011 at 11:30 PM, Alexandre Oliva <aoliva@redhat.com> wrote:
> On Feb 7, 2011, Richard Guenther <richard.guenther@gmail.com> wrote:
>
> [snip patch review details]
>
> What I noticed was that the trivial patch that moved
> pass_inline_parameters down and asserted gimple_referenced_vars (cfun)
> failed the assertion.  I didn't investigate much; I just figured it
> made sense for one function to look at the stack size estimates of its
> potentially-inlined callees to estimate its own, and since the call
> graph may contain cycles, we can't ensure we'll always process the
> callee before the caller.  If my "figuring" is right, I don't see how
> to fix this, but I'll go back to the drawing board and investigate
> further.

Yes, we can have cycles and no referenced vars in the callee.  But
we won't inline that callee (as it isn't in SSA form), so we can just
as well skip looking at its estimated stack frame size.

>> The cgraphunit.c change looks ok in this regard, can you explain
>> the predict.c change?
>
> We split a function after profile-guessing, and initialize_cfun copied
> the PROFILE_GUESSED status from the original function to the split.
> Then, when we got to running the various passes over the newly-created
> version, we failed the assertion that profile_status !=
> PROFILE_GUESSED in gimple_predict_edge.  I didn't try to find out why
> this didn't happen before the pass rearrangement.  I figured it made
> sense that the pass rearrangement enabled some more
> inlining/versioning, exposing this kind of situation, just like it
> exposed the need for cleanups after early inlining to avoid SSA verify
> failures because formerly-throwing call stmts ended up known not to
> throw.

I think you are now feeding more functions into re-optimization
(that's another reason I want to avoid the pass reorg for stage4).

Richard.
> Yes, we can have cycles and no referenced vars in the callee.  But
> we won't inline that callee (as it isn't in SSA form), so we can just
> as well skip looking at its estimated stack frame size.

Checking the in_ssa_p tests in ipa-inline, I think we check them early
enough (before that, we check only cgraph properties: whether the edge
is already inlined, whether inlining would be recursive, etc.).
Functions checking inline parameters (check_inline_limits etc.) are
called only after the in_ssa_p check is done, so we should be safe here.
(With the exception of disregard inline limits: you can move that check
later, but any value of that flag is OK since we won't do anything with
the call.)

Honza
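Richard's suggestion above — ignore the frame estimate of any callee that is not yet in SSA form, since such a callee cannot be inlined anyway — can be sketched with a toy model. All names here are hypothetical stand-ins, not GCC's actual ipa-inline code:

```python
from dataclasses import dataclass, field

@dataclass
class Fn:
    """A toy stand-in for a cgraph node (hypothetical, not GCC's)."""
    name: str
    own_frame: int               # bytes used by the function itself
    in_ssa: bool = False         # only SSA-form bodies can be inlined
    callees: list = field(default_factory=list)

def estimate_stack_size(fn):
    """Own frame plus the largest frame among *inlinable* callees.

    Callees not yet in SSA form are skipped entirely: they will not
    be inlined, so their (possibly unavailable) estimate is irrelevant,
    and no callee-before-caller processing order is required for them.
    """
    inlinable = (estimate_stack_size(c) for c in fn.callees if c.in_ssa)
    return fn.own_frame + max(inlinable, default=0)

leaf = Fn("leaf", own_frame=64, in_ssa=True)
raw  = Fn("raw", own_frame=4096, in_ssa=False)   # not lowered yet
main = Fn("main", own_frame=32, in_ssa=True, callees=[leaf, raw])

print(estimate_stack_size(main))   # 96: 'raw' is ignored, not 4128
```

This mirrors the design choice the thread converges on: rather than reordering whole pass groups so every callee's data exists first, the estimator simply declines to consult callees whose data cannot exist yet.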
Index: gcc/tree-inline.c
===================================================================
--- gcc/tree-inline.c.orig      2011-02-07 09:23:32.407058212 -0200
+++ gcc/tree-inline.c   2011-02-07 09:25:20.376780499 -0200
@@ -317,7 +317,11 @@ remap_decl (tree decl, copy_body_data *i
           || TREE_CODE (t) == RESULT_DECL || TREE_CODE (t) == PARM_DECL))
     {
       get_var_ann (t);
-      add_referenced_var (t);
+      if (TREE_CODE (decl) != VAR_DECL
+          || !gimple_referenced_vars (DECL_STRUCT_FUNCTION (id->src_fn))
+          || referenced_var_lookup (DECL_STRUCT_FUNCTION (id->src_fn),
+                                    DECL_UID (decl)))
+        add_referenced_var (t);

get_var_ann is redundant with add_referenced_var.  It probably isn't
needed for debug vars, is it?  id-> contains struct function pointers,
use them, not DECL_STRUCT_FUNCTION.  Do we really run into the
!gimple_referenced_vars case here?  I think we can't (cfun is in SSA
form and we do not inline non-SSA fns into SSA fns).

@@ -4753,6 +4757,13 @@ copy_decl_for_dup_finish (copy_body_data
      new function.  */
   DECL_CONTEXT (copy) = id->dst_fn;

+  if (TREE_CODE (decl) == VAR_DECL
+      && gimple_referenced_vars (cfun)
+      && gimple_referenced_vars (DECL_STRUCT_FUNCTION (id->src_fn))
+      && referenced_var_lookup (DECL_STRUCT_FUNCTION (id->src_fn),
+                                DECL_UID (decl)))
+    add_referenced_var (copy);

Why only for VAR_DECL decls and not to-var-decl translated PARM_DECLs?
Please use the same (single) check as the other piece,
gimple_in_ssa_p (cfun).

I wanted to ask you about

> That isn't enough.  We often attempt to estimate the stack size of a
> function before we so much as put it in SSA form (let alone compute
> referenced_vars), while processing other functions.

anyway, as I don't see why we do that (nor where).  And I'd rather fix
that than do the pass re-org you did (which is not optimal for cache
locality reasons).  Honza correctly noted that we never inline non-SSA
fns into SSA fns, but maybe we'd query estimates in the wrong order (and
we should fix that, instead of your reorg, I think).
Thus,

Index: gcc/passes.c
===================================================================
--- gcc/passes.c        (revision 169878)
+++ gcc/passes.c        (working copy)
@@ -729,7 +729,6 @@ init_optimization_passes (void)
   NEXT_PASS (pass_build_cfg);
   NEXT_PASS (pass_warn_function_return);
   NEXT_PASS (pass_build_cgraph_edges);
-  NEXT_PASS (pass_inline_parameters);
   *p = NULL;

   /* Interprocedural optimization passes.  */
@@ -753,6 +752,7 @@ init_optimization_passes (void)
      inline functions to be inlined even at -O0.  This
      does not happen during the first early inline pass.  */
   NEXT_PASS (pass_rebuild_cgraph_edges);
+  NEXT_PASS (pass_inline_parameters);
   NEXT_PASS (pass_early_inline);
   NEXT_PASS (pass_all_early_optimizations);
   {