diff mbox

[PR,debug/47106] account used vars only once

Message ID AANLkTikWGWLJR6nQYie6dMpQ6zk7mz3VJq1Acy71OgpU@mail.gmail.com
State New
Headers show

Commit Message

Richard Biener Feb. 7, 2011, 1:39 p.m. UTC
On Mon, Feb 7, 2011 at 1:31 PM, Alexandre Oliva <aoliva@redhat.com> wrote:
> On Feb  4, 2011, Alexandre Oliva <aoliva@redhat.com> wrote:
>
>> That isn't enough.  We often attempt to estimate the stack size of a
>> function before we as much as put it in SSA form (let alone compute
>> referenced_vras), while processing other functions.
>
>> if we switch to referenced_vars only, we have to figure out how to
>> rearrange the passes so that we make reasonable estimates.
>
> Like this.  This has completed bootstrap and is almost done building
> target libs on amd64-linux-gnu.  Ok to install if it passes regression
> testing?
>
> I've used a patch to report differences in the computation of stack
> size, and in nearly all cases the size goes down, but there are
> exceptions in which it goes up by 4 or 8 bytes.  I'll run a full
> bootstrap with that patch once I'm done with testing and summarize the
> results.

The referenced_var_lookup interface change is ok.

   tree old_cur_fun_decl = current_function_decl;
+  referenced_var_iterator rvi;
+
+  gcc_checking_assert (gimple_referenced_vars (cfun));
+
   current_function_decl = decl;
   push_cfun (DECL_STRUCT_FUNCTION (decl));

-  init_vars_expansion ();
-
-  FOR_EACH_LOCAL_DECL (cfun, ix, var)
-    {
-      /* TREE_USED marks local variables that do not appear in lexical
-        blocks.  We don't want to expand those that do twice.  */
-      if (TREE_USED (var))
-        size += expand_one_var (var, true, false);
-    }
-  size += account_used_vars_for_block (outer_block, true);
+  gcc_checking_assert (gimple_referenced_vars (cfun));
+  FOR_EACH_REFERENCED_VAR (var, rvi)
+    size += expand_one_var (var, true, false);

looks like only the assert after pushing cfun is required?  Can we get
away w/o cfun pushing here now?



should really just work.  If it doesn't then there is a bug (somewhere).

The cgraphunit.c change looks ok in this regard, can you explain
the predict.c change?  They do look independent, splitting up
commits at this stage would be nice.

Thanks,
Richard.


>
>
> --
> Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
> You must be the change you wish to see in the world. -- Gandhi
> Be Free! -- http://FSFLA.org/   FSF Latin America board member
> Free Software Evangelist      Red Hat Brazil Compiler Engineer
>
>

Comments

Jan Hubicka Feb. 7, 2011, 4 p.m. UTC | #1
> The cgraphunit.c change looks ok in this regard, can you explain
> the predict.c change?  They do look independent, splitting up

predict.c change looks wrong to me.  pass_profile is done as part of early
passes.  Only case it can be run on clone of already processed function is
the case where early optimizations would produce an clone and that clone
would be early optimized again.

We clone as part of ipa-sra and partial inlining, but that one expects clone to
not be re-processed and do thinks like calling compute_inline_parameters.

We can re-process them if the result seems benefical, but I can not think
of cases where this maes big difference (changes of partial inlining are
rather simple, with ipa-sra there is some chance that the new function body
would optimize better, but not too grand either)

I will need to go through the thread.  What was reason for splitting lowering
passes to two?  This way we will process the whole program twice that is bit
slower since it leads to poorer locality.

Honza
Alexandre Oliva Feb. 7, 2011, 10:30 p.m. UTC | #2
On Feb  7, 2011, Richard Guenther <richard.guenther@gmail.com> wrote:

> The referenced_var_lookup interface change is ok.

Ok, I'll split it out and check it in.

>    tree old_cur_fun_decl = current_function_decl;
> +  referenced_var_iterator rvi;
> +
> +  gcc_checking_assert (gimple_referenced_vars (cfun));
> +
>    current_function_decl = decl;
>    push_cfun (DECL_STRUCT_FUNCTION (decl));

> looks like only the assert after pushing cfun is required?

Doh!  Thanks.

> Can we get away w/o cfun pushing here now?

Maybe.  I'll have a look.

> Index: gcc/tree-inline.c
> ===================================================================
> --- gcc/tree-inline.c.orig      2011-02-07 09:23:32.407058212 -0200
> +++ gcc/tree-inline.c   2011-02-07 09:25:20.376780499 -0200
> @@ -317,7 +317,11 @@ remap_decl (tree decl, copy_body_data *i
>               || TREE_CODE (t) == RESULT_DECL || TREE_CODE (t) == PARM_DECL))
>         {
>           get_var_ann (t);
> -         add_referenced_var (t);
> +         if (TREE_CODE (decl) != VAR_DECL
> +             || !gimple_referenced_vars (DECL_STRUCT_FUNCTION (id->src_fn))
> +             || referenced_var_lookup (DECL_STRUCT_FUNCTION (id->src_fn),
> +                                       DECL_UID (decl)))
> +           add_referenced_var (t);

> get_var_ann is redundant with add_referenced_var.

Yeah, will drop and unminimize the patch.

> Do we really run into the
> !gimple_referenced_vars case here?

Some earlier version of the patch did.  Maybe not any more.

> @@ -4753,6 +4757,13 @@ copy_decl_for_dup_finish (copy_body_data
>         new function.  */
>      DECL_CONTEXT (copy) = id->dst_fn;

> +  if (TREE_CODE (decl) == VAR_DECL
> +      && gimple_referenced_vars (cfun)
> +      && gimple_referenced_vars (DECL_STRUCT_FUNCTION (id->src_fn))
> +      && referenced_var_lookup (DECL_STRUCT_FUNCTION (id->src_fn),
> +                               DECL_UID (decl)))
> +    add_referenced_var (copy);

> Why only for VAR_DECL decls and not to-var-decl translated PARM_DECLs?

Yeah, PARM_DECLs and RESULT_DECLs get unconditional add_referenced_var
elsewhere.

> I wanted to ask you about

>> That isn't enough.  We often attempt to estimate the stack size of a
>> function before we as much as put it in SSA form (let alone compute
>> referenced_vras), while processing other functions.

> anyway, as I don't see why we do that (nor where).

What I noticed was that the trivial patch that moved
pass_inline_parameters down and asserted gimple_referenced_vars(cfun)
failed the assertion.  I didn't investigate much, I just figured it made
sense for one function to look at stack size estimates of its
potentially-inlined callees to estimate its own, and since the call
graph may contain cycles, we can't ensure we'll always process callee
before caller.  If my “figuring” is right, I don't see how to fix this,
but I'll go back to the drawing board and investigate further.

> The cgraphunit.c change looks ok in this regard, can you explain
> the predict.c change?

We split a function after profile-guessing, and initialize_cfun copied
the PROFILE_GUESSED status from the original function to the split.
Then, when we got to running the various passes over the newly-created
version, we failed the assertion that profile_status != PROFILE_GUESSED
in gimple_predict_edge.  I didn't try to find out why this didn't happen
before the pass rearrangement.  I figured it made sense that the pass
rearrangement enabled some more inlining/versioning, exposing this kind
of situation just like it did expose the need for cleanups after early
inlining to avoid getting SSA verify failures because formerly-throwing
call stmts ended up known not to throw.
Alexandre Oliva Feb. 7, 2011, 10:37 p.m. UTC | #3
On Feb  7, 2011, Jan Hubicka <hubicka@ucw.cz> wrote:

>> The cgraphunit.c change looks ok in this regard, can you explain
>> the predict.c change?  They do look independent, splitting up

> predict.c change looks wrong to me.  pass_profile is done as part of early
> passes.  Only case it can be run on clone of already processed function is
> the case where early optimizations would produce an clone and that clone
> would be early optimized again.

Maybe it was the split that enabled the second part of the early passes
to run for nearly-generated functions.  So I won't investigate, because
we're going to (try to) go back to a single early pass.

> What was reason for splitting lowering
> passes to two?

It looked like we had to compute referenced_vars for all functions
before computing inline parameters for any, because we could attempt to
estimate the stack size of functions that hadn't gone through
referenced_vars yet.
Richard Biener Feb. 8, 2011, 11:10 a.m. UTC | #4
On Mon, Feb 7, 2011 at 11:30 PM, Alexandre Oliva <aoliva@redhat.com> wrote:
> On Feb  7, 2011, Richard Guenther <richard.guenther@gmail.com> wrote:
>
>> The referenced_var_lookup interface change is ok.
>
> Ok, I'll split it out and check it in.
>
>>    tree old_cur_fun_decl = current_function_decl;
>> +  referenced_var_iterator rvi;
>> +
>> +  gcc_checking_assert (gimple_referenced_vars (cfun));
>> +
>>    current_function_decl = decl;
>>    push_cfun (DECL_STRUCT_FUNCTION (decl));
>
>> looks like only the assert after pushing cfun is required?
>
> Doh!  Thanks.
>
>> Can we get away w/o cfun pushing here now?
>
> Maybe.  I'll have a look.
>
>> Index: gcc/tree-inline.c
>> ===================================================================
>> --- gcc/tree-inline.c.orig      2011-02-07 09:23:32.407058212 -0200
>> +++ gcc/tree-inline.c   2011-02-07 09:25:20.376780499 -0200
>> @@ -317,7 +317,11 @@ remap_decl (tree decl, copy_body_data *i
>>               || TREE_CODE (t) == RESULT_DECL || TREE_CODE (t) == PARM_DECL))
>>         {
>>           get_var_ann (t);
>> -         add_referenced_var (t);
>> +         if (TREE_CODE (decl) != VAR_DECL
>> +             || !gimple_referenced_vars (DECL_STRUCT_FUNCTION (id->src_fn))
>> +             || referenced_var_lookup (DECL_STRUCT_FUNCTION (id->src_fn),
>> +                                       DECL_UID (decl)))
>> +           add_referenced_var (t);
>
>> get_var_ann is redundant with add_referenced_var.
>
> Yeah, will drop and unminimize the patch.
>
>> Do we really run into the
>> !gimple_referenced_vars case here?
>
> Some earlier version of the patch did.  Maybe not any more.
>
>> @@ -4753,6 +4757,13 @@ copy_decl_for_dup_finish (copy_body_data
>>         new function.  */
>>      DECL_CONTEXT (copy) = id->dst_fn;
>
>> +  if (TREE_CODE (decl) == VAR_DECL
>> +      && gimple_referenced_vars (cfun)
>> +      && gimple_referenced_vars (DECL_STRUCT_FUNCTION (id->src_fn))
>> +      && referenced_var_lookup (DECL_STRUCT_FUNCTION (id->src_fn),
>> +                               DECL_UID (decl)))
>> +    add_referenced_var (copy);
>
>> Why only for VAR_DECL decls and not to-var-decl translated PARM_DECLs?
>
> Yeah, PARM_DECLs and RESULT_DECLs get unconditional add_referenced_var
> elsewhere.
>
>> I wanted to ask you about
>
>>> That isn't enough.  We often attempt to estimate the stack size of a
>>> function before we as much as put it in SSA form (let alone compute
>>> referenced_vras), while processing other functions.
>
>> anyway, as I don't see why we do that (nor where).
>
> What I noticed was that the trivial patch that moved
> pass_inline_parameters down and asserted gimple_referenced_vars(cfun)
> failed the assertion.  I didn't investigate much, I just figured it made
> sense for one function to look at stack size estimates of its
> potentially-inlined callees to estimate its own, and since the call
> graph may contain cycles, we can't ensure we'll always process callee
> before caller.  If my “figuring” is right, I don't see how to fix this,
> but I'll go back to the drawing board and investigate further.

Yes, we can have cycles and no referenced vars in the callee.  But
we won't inline that callee (as it isn't in SSA form), so we can just
as well skip looking at its estimated stack frame size.

>> The cgraphunit.c change looks ok in this regard, can you explain
>> the predict.c change?
>
> We split a function after profile-guessing, and initialize_cfun copied
> the PROFILE_GUESSED status from the original function to the split.
> Then, when we got to running the various passes over the newly-created
> version, we failed the assertion that profile_status != PROFILE_GUESSED
> in gimple_predict_edge.  I didn't try to find out why this didn't happen
> before the pass rearrangement.  I figured it made sense that the pass
> rearrangement enabled some more inlining/versioning, exposing this kind
> of situation just like it did expose the need for cleanups after early
> inlining to avoid getting SSA verify failures because formerly-throwing
> call stmts ended up known not to throw.

I think you are now feeding more functions into re-optimization
(that's another reason I want to avoid the pass reorg for stage4).

Richard.

>
> --
> Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
> You must be the change you wish to see in the world. -- Gandhi
> Be Free! -- http://FSFLA.org/   FSF Latin America board member
> Free Software Evangelist      Red Hat Brazil Compiler Engineer
>
Jan Hubicka Feb. 8, 2011, 11:30 a.m. UTC | #5
> Yes, we can have cycles and no referenced vars in the callee.  But
> we won't inline that callee (as it isn't in SSA form), so we can just
> as well skip looking at its estimated stack frame size.

Checking in_ssa_p tests in ipa-inline, I think we check them early enough
(before we check only cgraph properties - whether edge is inlined or inlining
is recursive etc.).

Functions checking inline parameters (check_inline_limits etc.) are done only
after in_ssa_p check is done, so we should be safe here.  (with exception of
disregard inline limits, you can move that check later but any value of that
flag is OK since we won't do anything with the call)

Honza
> 
> >> The cgraphunit.c change looks ok in this regard, can you explain
> >> the predict.c change?
> >
> > We split a function after profile-guessing, and initialize_cfun copied
> > the PROFILE_GUESSED status from the original function to the split.
> > Then, when we got to running the various passes over the newly-created
> > version, we failed the assertion that profile_status != PROFILE_GUESSED
> > in gimple_predict_edge.  I didn't try to find out why this didn't happen
> > before the pass rearrangement.  I figured it made sense that the pass
> > rearrangement enabled some more inlining/versioning, exposing this kind
> > of situation just like it did expose the need for cleanups after early
> > inlining to avoid getting SSA verify failures because formerly-throwing
> > call stmts ended up known not to throw.
> 
> I think you are now feeding more functions into re-optimization
> (that's another reason I want to avoid the pass reorg for stage4).
> 
> Richard.
> 
> >
> > --
> > Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
> > You must be the change you wish to see in the world. -- Gandhi
> > Be Free! -- http://FSFLA.org/   FSF Latin America board member
> > Free Software Evangelist      Red Hat Brazil Compiler Engineer
> >
diff mbox

Patch

Index: gcc/tree-inline.c
===================================================================
--- gcc/tree-inline.c.orig      2011-02-07 09:23:32.407058212 -0200
+++ gcc/tree-inline.c   2011-02-07 09:25:20.376780499 -0200
@@ -317,7 +317,11 @@  remap_decl (tree decl, copy_body_data *i
              || TREE_CODE (t) == RESULT_DECL || TREE_CODE (t) == PARM_DECL))
        {
          get_var_ann (t);
-         add_referenced_var (t);
+         if (TREE_CODE (decl) != VAR_DECL
+             || !gimple_referenced_vars (DECL_STRUCT_FUNCTION (id->src_fn))
+             || referenced_var_lookup (DECL_STRUCT_FUNCTION (id->src_fn),
+                                       DECL_UID (decl)))
+           add_referenced_var (t);

get_var_ann is redundant with add_referenced_var.  It probably isn't
needed for debug vars, is it?  id-> contains struct function pointers,
use them, not
DECL_STRUCT_FUNCTION.  Do we really run into the
!gimple_referenced_vars case here?  I think we can't (cfun is in SSA form
and we do not inline non-SSA fns into SSA fns).

@@ -4753,6 +4757,13 @@  copy_decl_for_dup_finish (copy_body_data
        new function.  */
     DECL_CONTEXT (copy) = id->dst_fn;

+  if (TREE_CODE (decl) == VAR_DECL
+      && gimple_referenced_vars (cfun)
+      && gimple_referenced_vars (DECL_STRUCT_FUNCTION (id->src_fn))
+      && referenced_var_lookup (DECL_STRUCT_FUNCTION (id->src_fn),
+                               DECL_UID (decl)))
+    add_referenced_var (copy);

Why only for VAR_DECL decls and not to-var-decl translated PARM_DECLs?
Please use the same (single) check as the other piece, gimple_in_ssa_p (cfun).

I wanted to ask you about

> That isn't enough.  We often attempt to estimate the stack size of a
> function before we as much as put it in SSA form (let alone compute
> referenced_vras), while processing other functions.

anyway, as I don't see why we do that (nor where).  And I'd rather
fix that than doing the pass re-org you did (which are not optimal
for cache locality reasons).  Honza correctly noted that we never
inline non-SSA fns into SSA fns, but maybe we'd query estimates
in wrong order (and we should fix that, instead of your reorg I think).

Thus,

Index: gcc/passes.c
===================================================================
--- gcc/passes.c        (revision 169878)
+++ gcc/passes.c        (working copy)
@@ -729,7 +729,6 @@  init_optimization_passes (void)
   NEXT_PASS (pass_build_cfg);
   NEXT_PASS (pass_warn_function_return);
   NEXT_PASS (pass_build_cgraph_edges);
-  NEXT_PASS (pass_inline_parameters);
   *p = NULL;

   /* Interprocedural optimization passes.  */
@@ -753,6 +752,7 @@  init_optimization_passes (void)
         inline functions to be inlined even at -O0.  This does not
         happen during the first early inline pass.  */
       NEXT_PASS (pass_rebuild_cgraph_edges);
+      NEXT_PASS (pass_inline_parameters);
       NEXT_PASS (pass_early_inline);
       NEXT_PASS (pass_all_early_optimizations);
        {