diff mbox

Fix PR37448 (and dups?)

Message ID alpine.LSU.2.11.1602191024470.31547@t29.fhfr.qr
State New
Headers show

Commit Message

Richard Biener Feb. 19, 2016, 9:31 a.m. UTC
On Thu, 18 Feb 2016, Richard Biener wrote:

> On Thu, 18 Feb 2016, Jan Hubicka wrote:
> 
> > > 
> > > This fixes the IPA inline analysis quadraticness that arises when we
> > > inline functions into the same function many times via
> > > inline_to_all_callers as we do in the testcase for PR37448.  In that
> > > case updating the overall summary of the caller is done for each
> > > inlined call instead of just once.
> > > 
> > > The solution is to keep track of which nodes we inlined to and
> > > delay the summary update.
> > > 
> > > Honza, I believe this is safe as the callers summary should only
> > > matter for further inline decisions and not this one (inlining
> > > into all callers).  Double-checking is appreciated - there should
> > > be no code changes by this patch (I verified this on the PR37448
> > > testcase).
> > 
> > I think it is not the case if one function contains multiple calls
> > and eventually hits large function growth.  In that case we inline
> > just some callers and not the others.
> 
> Are you sure?  I thought we compute that up-front ... at least
> we make sure we can_inline_edge_p all calls before even trying
> to inline all calls - that looks somewhat redundant then if it
> can happen that we give up anyway.  Ah, so can_inline_edge_p has
> 
>   /* Check if caller growth allows the inlining.  */
>   else if (!DECL_DISREGARD_INLINE_LIMITS (callee->decl)
>            && !disregard_limits
>            && !lookup_attribute ("flatten",
>                                  DECL_ATTRIBUTES (caller->decl))
>            && !caller_growth_limits (e))
>     inlinable = false;
> 
> and we check that from when we do the inlining and upfront from
> want_inline_function_to_all_callers_p ... we also give up
> if we inline a recursive function it seems.
> 
> > For next stage1 I plan to finish the overahaul to sreals that will let
> > me to make update_overall_summary incremental (i.e. O(size_of_inlined_function)
> > and not O(size_of_inlined_function+size_of_function_inlined_to)) that will
> > also solve these issues.
> 
> Hopefully.
> 
> > One can probably do a pesimistic estimate on code size of the function 
> > (i.e. add the size of inlined function) and only if that hits the large 
> > function growth do the exact calculation. Or we can just decide that it 
> > is OK to ignore large function growht in this partiuclar case.
> 
> Ignoring it is probably not a good idea and will just lead to a different
> degenerate case.  As we update the summary afterwards it's probably
> ok to do incremental (pessimistic) updates on the size anyway?  In
> inline_merge_summary I mean?  Or should I open-code that into
> inline_call if ! update_overall_summary?

So like below - I update self-size by the growth estimate which is
supposed to match, this should keep the overall function growth
limits working, not sure about the stack stuff but that doesn't seem
to be updated by update_overall_summaries either.

This also fixes a similar issue in early inlining where we can then
trivially delay update_overall_summaries.  I remember seeing early
inline analysis behave similarly quadratic.

It leaves alone recursive and IPA small function inlining as those
update overall unit size as well - I first thought about somehow
making overall summary update lazy, but that looks quite complicated
and IIRC the round-off errors issue was when re-computing the
key for the fibheap after doing one inlining, right?

I hope you can get to use sreals early in next stage1 though I wonder
how that will avoid all corner-cases.

Bootstrap & regtest running on x86_64-unknown-linux-gnu.  Ok for trunk
and branch?

Thanks,
Richard.

2016-02-18  Richard Biener  <rguenther@suse.de>

	PR ipa/37448
	* ipa-inline-transform.c (inline_call): When not updating
	overall summaries adjust self size by the growth estimate.
	* ipa-inline.c (inline_to_all_callers_1): Add to the callers
	hash-set, do not update overall summaries here.  Renamed from ...
	(inline_to_all_callers): ... this which is now wrapping the
	above and performing delayed overall summary update.
	(early_inline_small_functions): Delay updating of the overall
	summary.

Comments

Richard Biener Feb. 19, 2016, 10:05 a.m. UTC | #1
On Fri, 19 Feb 2016, Richard Biener wrote:

> On Thu, 18 Feb 2016, Richard Biener wrote:
> 
> > On Thu, 18 Feb 2016, Jan Hubicka wrote:
> > 
> > > > 
> > > > This fixes the IPA inline analysis quadraticness that arises when we
> > > > inline functions into the same function many times via
> > > > inline_to_all_callers as we do in the testcase for PR37448.  In that
> > > > case updating the overall summary of the caller is done for each
> > > > inlined call instead of just once.
> > > > 
> > > > The solution is to keep track of which nodes we inlined to and
> > > > delay the summary update.
> > > > 
> > > > Honza, I believe this is safe as the callers summary should only
> > > > matter for further inline decisions and not this one (inlining
> > > > into all callers).  Double-checking is appreciated - there should
> > > > be no code changes by this patch (I verified this on the PR37448
> > > > testcase).
> > > 
> > > I think it is not the case if one function contains multiple calls
> > > and eventually hits large function growth.  In that case we inline
> > > just some callers and not the others.
> > 
> > Are you sure?  I thought we compute that up-front ... at least
> > we make sure we can_inline_edge_p all calls before even trying
> > to inline all calls - that looks somewhat redundant then if it
> > can happen that we give up anyway.  Ah, so can_inline_edge_p has
> > 
> >   /* Check if caller growth allows the inlining.  */
> >   else if (!DECL_DISREGARD_INLINE_LIMITS (callee->decl)
> >            && !disregard_limits
> >            && !lookup_attribute ("flatten",
> >                                  DECL_ATTRIBUTES (caller->decl))
> >            && !caller_growth_limits (e))
> >     inlinable = false;
> > 
> > and we check that from when we do the inlining and upfront from
> > want_inline_function_to_all_callers_p ... we also give up
> > if we inline a recursive function it seems.
> > 
> > > For next stage1 I plan to finish the overahaul to sreals that will let
> > > me to make update_overall_summary incremental (i.e. O(size_of_inlined_function)
> > > and not O(size_of_inlined_function+size_of_function_inlined_to)) that will
> > > also solve these issues.
> > 
> > Hopefully.
> > 
> > > One can probably do a pesimistic estimate on code size of the function 
> > > (i.e. add the size of inlined function) and only if that hits the large 
> > > function growth do the exact calculation. Or we can just decide that it 
> > > is OK to ignore large function growht in this partiuclar case.
> > 
> > Ignoring it is probably not a good idea and will just lead to a different
> > degenerate case.  As we update the summary afterwards it's probably
> > ok to do incremental (pessimistic) updates on the size anyway?  In
> > inline_merge_summary I mean?  Or should I open-code that into
> > inline_call if ! update_overall_summary?
> 
> So like below - I update self-size by the growth estimate which is
> supposed to match, this should keep the overall function growth
> limits working, not sure about the stack stuff but that doesn't seem
> to be updated by update_overall_summaries either.
> 
> This also fixes a similar issue in early inlining where we can then
> trivially delay update_overall_summaries.  I remember seeing early
> inline analysis behave similarly quadratic.
> 
> It leaves alone recursive and IPA small function inlining as those
> update overall unit size as well - I first thought about somehow
> making overall summary update lazy, but that looks quite complicated
> and IIRC the round-off errors issue was when re-computing the
> key for the fibheap after doing one inlining, right?

So like having a ->need_update flag in the summaries and in


...
          inline_call (edge, true, &new_indirect_edges, &overall_size, 
true);
          add_new_edges_to_heap (&edge_heap, new_indirect_edges);

          reset_edge_caches (edge->callee->function_symbol ());

          update_callee_keys (&edge_heap, where, updated_nodes);
        }
..

update caller/callee keys with the "estimate" but set a ->need_key_update
flag on them.  When getting the new min key

  while (!edge_heap.empty ())
    {
      int old_size = overall_size;
      struct cgraph_node *where, *callee;
      sreal badness = edge_heap.min_key ();

and ->need_update is set re-compute summaries and re-compute the
key (re-inserting it if it isn't still the minimum).

Of course making the incremental updates just work would be best.

> I hope you can get to use sreals early in next stage1 though I wonder
> how that will avoid all corner-cases.
> 
> Bootstrap & regtest running on x86_64-unknown-linux-gnu.  Ok for trunk
> and branch?
> 
> Thanks,
> Richard.
> 
> 2016-02-18  Richard Biener  <rguenther@suse.de>
> 
> 	PR ipa/37448
> 	* ipa-inline-transform.c (inline_call): When not updating
> 	overall summaries adjust self size by the growth estimate.
> 	* ipa-inline.c (inline_to_all_callers_1): Add to the callers
> 	hash-set, do not update overall summaries here.  Renamed from ...
> 	(inline_to_all_callers): ... this which is now wrapping the
> 	above and performing delayed overall summary update.
> 	(early_inline_small_functions): Delay updating of the overall
> 	summary.
> 
> Index: gcc/ipa-inline-transform.c
> ===================================================================
> --- gcc/ipa-inline-transform.c	(revision 233498)
> +++ gcc/ipa-inline-transform.c	(working copy)
> @@ -300,10 +300,12 @@ inline_call (struct cgraph_edge *e, bool
>    struct cgraph_edge *curr = e;
>    struct cgraph_node *callee = e->callee->ultimate_alias_target ();
>    bool new_edges_found = false;
> +  int estimated_growth;
>  
> +  if (! update_overall_summary)
> +    estimated_growth = estimate_edge_growth (e);
>    /* This is used only for assert bellow.  */
>  #if 0
> -  int estimated_growth = estimate_edge_growth (e);
>    bool predicated = inline_edge_summary (e)->predicate != NULL;
>  #endif
>  
> @@ -373,7 +375,13 @@ inline_call (struct cgraph_edge *e, bool
>      new_edges_found = ipa_propagate_indirect_call_infos (curr, new_edges);
>    check_speculations (e->callee);
>    if (update_overall_summary)
> -   inline_update_overall_summary (to);
> +    inline_update_overall_summary (to);
> +  else
> +    /* Update self size by the estimate so overall function growth limits
> +       work for further inlining into this function.  Before inlining
> +       the function we inlined to again we expect the caller to update
> +       the overall summary.  */
> +    inline_summaries->get (to)->size += estimated_growth;
>    new_size = inline_summaries->get (to)->size;
>  
>    if (callee->calls_comdat_local)
> Index: gcc/ipa-inline.c
> ===================================================================
> --- gcc/ipa-inline.c	(revision 233498)
> +++ gcc/ipa-inline.c	(working copy)
> @@ -2163,7 +2163,8 @@ flatten_function (struct cgraph_node *no
>     recursion.  */
>  
>  static bool
> -inline_to_all_callers (struct cgraph_node *node, void *data)
> +inline_to_all_callers_1 (struct cgraph_node *node, void *data,
> +			 hash_set<cgraph_node *> *callers)
>  {
>    int *num_calls = (int *)data;
>    bool callee_removed = false;
> @@ -2193,7 +2194,10 @@ inline_to_all_callers (struct cgraph_nod
>  		   inline_summaries->get (node->callers->caller)->size);
>  	}
>  
> -      inline_call (node->callers, true, NULL, NULL, true, &callee_removed);
> +      /* Remember which callers we inlined to, delaying updating the
> +	 overall summary.  */
> +      callers->add (node->callers->caller);
> +      inline_call (node->callers, true, NULL, NULL, false, &callee_removed);
>        if (dump_file)
>  	fprintf (dump_file,
>  		 " Inlined into %s which now has %i size\n",
> @@ -2211,6 +2215,23 @@ inline_to_all_callers (struct cgraph_nod
>    return false;
>  }
>  
> +/* Wrapper around inline_to_all_callers_1 doing delayed overall summary
> +   update.  */
> +
> +static bool
> +inline_to_all_callers (struct cgraph_node *node, void *data)
> +{
> +  hash_set<cgraph_node *> callers;
> +  bool res = inline_to_all_callers_1 (node, data, &callers);
> +  /* Perform the delayed update of the overall summary of all callers
> +     processed.  This avoids quadratic behavior in the cases where
> +     we have a lot of calls to the same function.  */
> +  for (hash_set<cgraph_node *>::iterator i = callers.begin ();
> +       i != callers.end (); ++i)
> +    inline_update_overall_summary (*i);
> +  return res;
> +}
> +
>  /* Output overall time estimate.  */
>  static void
>  dump_overall_stats (void)
> @@ -2590,10 +2611,13 @@ early_inline_small_functions (struct cgr
>  	fprintf (dump_file, " Inlining %s into %s.\n",
>  		 xstrdup_for_dump (callee->name ()),
>  		 xstrdup_for_dump (e->caller->name ()));
> -      inline_call (e, true, NULL, NULL, true);
> +      inline_call (e, true, NULL, NULL, false);
>        inlined = true;
>      }
>  
> +  if (inlined)
> +    inline_update_overall_summary (node);
> +
>    return inlined;
>  }
>  
>
Jan Hubicka Feb. 19, 2016, 3:59 p.m. UTC | #2
> > 
> > Are you sure?  I thought we compute that up-front ... at least
> > we make sure we can_inline_edge_p all calls before even trying
> > to inline all calls - that looks somewhat redundant then if it
> > can happen that we give up anyway.  Ah, so can_inline_edge_p has
> > 
> >   /* Check if caller growth allows the inlining.  */
> >   else if (!DECL_DISREGARD_INLINE_LIMITS (callee->decl)
> >            && !disregard_limits
> >            && !lookup_attribute ("flatten",
> >                                  DECL_ATTRIBUTES (caller->decl))
> >            && !caller_growth_limits (e))
> >     inlinable = false;
> > 
> > and we check that from when we do the inlining and upfront from

> > want_inline_function_to_all_callers_p ... we also give up
> > if we inline a recursive function it seems.
Yep, there is early check for recursion and sizes. But you can still hit the
size limits later. If a function A calls 1000 times function B, each of the
calls individually may be inlinable, but not all together.
> > 
> > > For next stage1 I plan to finish the overahaul to sreals that will let
> > > me to make update_overall_summary incremental (i.e. O(size_of_inlined_function)
> > > and not O(size_of_inlined_function+size_of_function_inlined_to)) that will
> > > also solve these issues.
> > 
> > Hopefully.
> > 
> > > One can probably do a pesimistic estimate on code size of the function 
> > > (i.e. add the size of inlined function) and only if that hits the large 
> > > function growth do the exact calculation. Or we can just decide that it 
> > > is OK to ignore large function growht in this partiuclar case.
> > 
> > Ignoring it is probably not a good idea and will just lead to a different
> > degenerate case.  As we update the summary afterwards it's probably
> > ok to do incremental (pessimistic) updates on the size anyway?  In
> > inline_merge_summary I mean?  Or should I open-code that into
> > inline_call if ! update_overall_summary?
> 
> So like below - I update self-size by the growth estimate which is
> supposed to match, this should keep the overall function growth
> limits working, not sure about the stack stuff but that doesn't seem
> to be updated by update_overall_summaries either.

Not really, the sizes are also computed in fixed point math, because we estimate probabilities
when insn is going to be removed. There are some cases wher estimate_growth is not as precise
as the real thing, you can see the #if 0 out asstert in ipa-inline-transform abou tthat.
Well, for inline functions called once, I suppose precision of these details is not as important,
so we may just go with the patch. 
I will finish the sreal conversion next stage1 (I got stuck on this patch series on gengtype change
to allow sreal inline_summary which I do not think ever got reviewed) and do the incremental update.
So if you fell this PR is important, go ahead with the patch.

Honza
> 
> This also fixes a similar issue in early inlining where we can then
> trivially delay update_overall_summaries.  I remember seeing early
> inline analysis behave similarly quadratic.
> 
> It leaves alone recursive and IPA small function inlining as those
> update overall unit size as well - I first thought about somehow
> making overall summary update lazy, but that looks quite complicated
> and IIRC the round-off errors issue was when re-computing the
> key for the fibheap after doing one inlining, right?
> 
> I hope you can get to use sreals early in next stage1 though I wonder
> how that will avoid all corner-cases.
> 
> Bootstrap & regtest running on x86_64-unknown-linux-gnu.  Ok for trunk
> and branch?
> 
> Thanks,
> Richard.
> 
> 2016-02-18  Richard Biener  <rguenther@suse.de>
> 
> 	PR ipa/37448
> 	* ipa-inline-transform.c (inline_call): When not updating
> 	overall summaries adjust self size by the growth estimate.
> 	* ipa-inline.c (inline_to_all_callers_1): Add to the callers
> 	hash-set, do not update overall summaries here.  Renamed from ...
> 	(inline_to_all_callers): ... this which is now wrapping the
> 	above and performing delayed overall summary update.
> 	(early_inline_small_functions): Delay updating of the overall
> 	summary.
> 
> Index: gcc/ipa-inline-transform.c
> ===================================================================
> --- gcc/ipa-inline-transform.c	(revision 233498)
> +++ gcc/ipa-inline-transform.c	(working copy)
> @@ -300,10 +300,12 @@ inline_call (struct cgraph_edge *e, bool
>    struct cgraph_edge *curr = e;
>    struct cgraph_node *callee = e->callee->ultimate_alias_target ();
>    bool new_edges_found = false;
> +  int estimated_growth;
>  
> +  if (! update_overall_summary)
> +    estimated_growth = estimate_edge_growth (e);
>    /* This is used only for assert bellow.  */
>  #if 0
> -  int estimated_growth = estimate_edge_growth (e);
>    bool predicated = inline_edge_summary (e)->predicate != NULL;
>  #endif
>  
> @@ -373,7 +375,13 @@ inline_call (struct cgraph_edge *e, bool
>      new_edges_found = ipa_propagate_indirect_call_infos (curr, new_edges);
>    check_speculations (e->callee);
>    if (update_overall_summary)
> -   inline_update_overall_summary (to);
> +    inline_update_overall_summary (to);
> +  else
> +    /* Update self size by the estimate so overall function growth limits
> +       work for further inlining into this function.  Before inlining
> +       the function we inlined to again we expect the caller to update
> +       the overall summary.  */
> +    inline_summaries->get (to)->size += estimated_growth;
>    new_size = inline_summaries->get (to)->size;
>  
>    if (callee->calls_comdat_local)
> Index: gcc/ipa-inline.c
> ===================================================================
> --- gcc/ipa-inline.c	(revision 233498)
> +++ gcc/ipa-inline.c	(working copy)
> @@ -2163,7 +2163,8 @@ flatten_function (struct cgraph_node *no
>     recursion.  */
>  
>  static bool
> -inline_to_all_callers (struct cgraph_node *node, void *data)
> +inline_to_all_callers_1 (struct cgraph_node *node, void *data,
> +			 hash_set<cgraph_node *> *callers)
>  {
>    int *num_calls = (int *)data;
>    bool callee_removed = false;
> @@ -2193,7 +2194,10 @@ inline_to_all_callers (struct cgraph_nod
>  		   inline_summaries->get (node->callers->caller)->size);
>  	}
>  
> -      inline_call (node->callers, true, NULL, NULL, true, &callee_removed);
> +      /* Remember which callers we inlined to, delaying updating the
> +	 overall summary.  */
> +      callers->add (node->callers->caller);
> +      inline_call (node->callers, true, NULL, NULL, false, &callee_removed);
>        if (dump_file)
>  	fprintf (dump_file,
>  		 " Inlined into %s which now has %i size\n",
> @@ -2211,6 +2215,23 @@ inline_to_all_callers (struct cgraph_nod
>    return false;
>  }
>  
> +/* Wrapper around inline_to_all_callers_1 doing delayed overall summary
> +   update.  */
> +
> +static bool
> +inline_to_all_callers (struct cgraph_node *node, void *data)
> +{
> +  hash_set<cgraph_node *> callers;
> +  bool res = inline_to_all_callers_1 (node, data, &callers);
> +  /* Perform the delayed update of the overall summary of all callers
> +     processed.  This avoids quadratic behavior in the cases where
> +     we have a lot of calls to the same function.  */
> +  for (hash_set<cgraph_node *>::iterator i = callers.begin ();
> +       i != callers.end (); ++i)
> +    inline_update_overall_summary (*i);
> +  return res;
> +}
> +
>  /* Output overall time estimate.  */
>  static void
>  dump_overall_stats (void)
> @@ -2590,10 +2611,13 @@ early_inline_small_functions (struct cgr
>  	fprintf (dump_file, " Inlining %s into %s.\n",
>  		 xstrdup_for_dump (callee->name ()),
>  		 xstrdup_for_dump (e->caller->name ()));
> -      inline_call (e, true, NULL, NULL, true);
> +      inline_call (e, true, NULL, NULL, false);
>        inlined = true;
>      }
>  
> +  if (inlined)
> +    inline_update_overall_summary (node);
> +
>    return inlined;
>  }
>
Richard Biener Feb. 22, 2016, 9:39 a.m. UTC | #3
On Fri, 19 Feb 2016, Jan Hubicka wrote:

> > > 
> > > Are you sure?  I thought we compute that up-front ... at least
> > > we make sure we can_inline_edge_p all calls before even trying
> > > to inline all calls - that looks somewhat redundant then if it
> > > can happen that we give up anyway.  Ah, so can_inline_edge_p has
> > > 
> > >   /* Check if caller growth allows the inlining.  */
> > >   else if (!DECL_DISREGARD_INLINE_LIMITS (callee->decl)
> > >            && !disregard_limits
> > >            && !lookup_attribute ("flatten",
> > >                                  DECL_ATTRIBUTES (caller->decl))
> > >            && !caller_growth_limits (e))
> > >     inlinable = false;
> > > 
> > > and we check that from when we do the inlining and upfront from
> 
> > > want_inline_function_to_all_callers_p ... we also give up
> > > if we inline a recursive function it seems.
> Yep, there is early check for recursion and sizes. But you can still hit the
> size limits later. If a function A calls 1000 times function B, each of the
> calls individually may be inlinable, but not all together.
> > > 
> > > > For next stage1 I plan to finish the overahaul to sreals that will let
> > > > me to make update_overall_summary incremental (i.e. O(size_of_inlined_function)
> > > > and not O(size_of_inlined_function+size_of_function_inlined_to)) that will
> > > > also solve these issues.
> > > 
> > > Hopefully.
> > > 
> > > > One can probably do a pesimistic estimate on code size of the function 
> > > > (i.e. add the size of inlined function) and only if that hits the large 
> > > > function growth do the exact calculation. Or we can just decide that it 
> > > > is OK to ignore large function growht in this partiuclar case.
> > > 
> > > Ignoring it is probably not a good idea and will just lead to a different
> > > degenerate case.  As we update the summary afterwards it's probably
> > > ok to do incremental (pessimistic) updates on the size anyway?  In
> > > inline_merge_summary I mean?  Or should I open-code that into
> > > inline_call if ! update_overall_summary?
> > 
> > So like below - I update self-size by the growth estimate which is
> > supposed to match, this should keep the overall function growth
> > limits working, not sure about the stack stuff but that doesn't seem
> > to be updated by update_overall_summaries either.
> 
> Not really, the sizes are also computed in fixed point math, because we 
> estimate probabilities when insn is going to be removed. There are some 
> cases wher estimate_growth is not as precise as the real thing, you can 
> see the #if 0 out asstert in ipa-inline-transform abou tthat. Well, for 
> inline functions called once, I suppose precision of these details is 
> not as important, so we may just go with the patch. I will finish the 
> sreal conversion next stage1 (I got stuck on this patch series on 
> gengtype change to allow sreal inline_summary which I do not think ever 
> got reviewed) and do the incremental update. So if you fell this PR is 
> important, go ahead with the patch.

I installed it on trunk.  It's a goal to have no quadratic algorithms
in -O1 (yeah, out-of-SSA and RA are some known left-overs here).

Richard.

> Honza
> > 
> > This also fixes a similar issue in early inlining where we can then
> > trivially delay update_overall_summaries.  I remember seeing early
> > inline analysis behave similarly quadratic.
> > 
> > It leaves alone recursive and IPA small function inlining as those
> > update overall unit size as well - I first thought about somehow
> > making overall summary update lazy, but that looks quite complicated
> > and IIRC the round-off errors issue was when re-computing the
> > key for the fibheap after doing one inlining, right?
> > 
> > I hope you can get to use sreals early in next stage1 though I wonder
> > how that will avoid all corner-cases.
> > 
> > Bootstrap & regtest running on x86_64-unknown-linux-gnu.  Ok for trunk
> > and branch?
> > 
> > Thanks,
> > Richard.
> > 
> > 2016-02-18  Richard Biener  <rguenther@suse.de>
> > 
> > 	PR ipa/37448
> > 	* ipa-inline-transform.c (inline_call): When not updating
> > 	overall summaries adjust self size by the growth estimate.
> > 	* ipa-inline.c (inline_to_all_callers_1): Add to the callers
> > 	hash-set, do not update overall summaries here.  Renamed from ...
> > 	(inline_to_all_callers): ... this which is now wrapping the
> > 	above and performing delayed overall summary update.
> > 	(early_inline_small_functions): Delay updating of the overall
> > 	summary.
> > 
> > Index: gcc/ipa-inline-transform.c
> > ===================================================================
> > --- gcc/ipa-inline-transform.c	(revision 233498)
> > +++ gcc/ipa-inline-transform.c	(working copy)
> > @@ -300,10 +300,12 @@ inline_call (struct cgraph_edge *e, bool
> >    struct cgraph_edge *curr = e;
> >    struct cgraph_node *callee = e->callee->ultimate_alias_target ();
> >    bool new_edges_found = false;
> > +  int estimated_growth;
> >  
> > +  if (! update_overall_summary)
> > +    estimated_growth = estimate_edge_growth (e);
> >    /* This is used only for assert bellow.  */
> >  #if 0
> > -  int estimated_growth = estimate_edge_growth (e);
> >    bool predicated = inline_edge_summary (e)->predicate != NULL;
> >  #endif
> >  
> > @@ -373,7 +375,13 @@ inline_call (struct cgraph_edge *e, bool
> >      new_edges_found = ipa_propagate_indirect_call_infos (curr, new_edges);
> >    check_speculations (e->callee);
> >    if (update_overall_summary)
> > -   inline_update_overall_summary (to);
> > +    inline_update_overall_summary (to);
> > +  else
> > +    /* Update self size by the estimate so overall function growth limits
> > +       work for further inlining into this function.  Before inlining
> > +       the function we inlined to again we expect the caller to update
> > +       the overall summary.  */
> > +    inline_summaries->get (to)->size += estimated_growth;
> >    new_size = inline_summaries->get (to)->size;
> >  
> >    if (callee->calls_comdat_local)
> > Index: gcc/ipa-inline.c
> > ===================================================================
> > --- gcc/ipa-inline.c	(revision 233498)
> > +++ gcc/ipa-inline.c	(working copy)
> > @@ -2163,7 +2163,8 @@ flatten_function (struct cgraph_node *no
> >     recursion.  */
> >  
> >  static bool
> > -inline_to_all_callers (struct cgraph_node *node, void *data)
> > +inline_to_all_callers_1 (struct cgraph_node *node, void *data,
> > +			 hash_set<cgraph_node *> *callers)
> >  {
> >    int *num_calls = (int *)data;
> >    bool callee_removed = false;
> > @@ -2193,7 +2194,10 @@ inline_to_all_callers (struct cgraph_nod
> >  		   inline_summaries->get (node->callers->caller)->size);
> >  	}
> >  
> > -      inline_call (node->callers, true, NULL, NULL, true, &callee_removed);
> > +      /* Remember which callers we inlined to, delaying updating the
> > +	 overall summary.  */
> > +      callers->add (node->callers->caller);
> > +      inline_call (node->callers, true, NULL, NULL, false, &callee_removed);
> >        if (dump_file)
> >  	fprintf (dump_file,
> >  		 " Inlined into %s which now has %i size\n",
> > @@ -2211,6 +2215,23 @@ inline_to_all_callers (struct cgraph_nod
> >    return false;
> >  }
> >  
> > +/* Wrapper around inline_to_all_callers_1 doing delayed overall summary
> > +   update.  */
> > +
> > +static bool
> > +inline_to_all_callers (struct cgraph_node *node, void *data)
> > +{
> > +  hash_set<cgraph_node *> callers;
> > +  bool res = inline_to_all_callers_1 (node, data, &callers);
> > +  /* Perform the delayed update of the overall summary of all callers
> > +     processed.  This avoids quadratic behavior in the cases where
> > +     we have a lot of calls to the same function.  */
> > +  for (hash_set<cgraph_node *>::iterator i = callers.begin ();
> > +       i != callers.end (); ++i)
> > +    inline_update_overall_summary (*i);
> > +  return res;
> > +}
> > +
> >  /* Output overall time estimate.  */
> >  static void
> >  dump_overall_stats (void)
> > @@ -2590,10 +2611,13 @@ early_inline_small_functions (struct cgr
> >  	fprintf (dump_file, " Inlining %s into %s.\n",
> >  		 xstrdup_for_dump (callee->name ()),
> >  		 xstrdup_for_dump (e->caller->name ()));
> > -      inline_call (e, true, NULL, NULL, true);
> > +      inline_call (e, true, NULL, NULL, false);
> >        inlined = true;
> >      }
> >  
> > +  if (inlined)
> > +    inline_update_overall_summary (node);
> > +
> >    return inlined;
> >  }
> >  
> 
>
diff mbox

Patch

Index: gcc/ipa-inline-transform.c
===================================================================
--- gcc/ipa-inline-transform.c	(revision 233498)
+++ gcc/ipa-inline-transform.c	(working copy)
@@ -300,10 +300,12 @@  inline_call (struct cgraph_edge *e, bool
   struct cgraph_edge *curr = e;
   struct cgraph_node *callee = e->callee->ultimate_alias_target ();
   bool new_edges_found = false;
+  int estimated_growth;
 
+  if (! update_overall_summary)
+    estimated_growth = estimate_edge_growth (e);
   /* This is used only for assert bellow.  */
 #if 0
-  int estimated_growth = estimate_edge_growth (e);
   bool predicated = inline_edge_summary (e)->predicate != NULL;
 #endif
 
@@ -373,7 +375,13 @@  inline_call (struct cgraph_edge *e, bool
     new_edges_found = ipa_propagate_indirect_call_infos (curr, new_edges);
   check_speculations (e->callee);
   if (update_overall_summary)
-   inline_update_overall_summary (to);
+    inline_update_overall_summary (to);
+  else
+    /* Update self size by the estimate so overall function growth limits
+       work for further inlining into this function.  Before inlining
+       the function we inlined to again we expect the caller to update
+       the overall summary.  */
+    inline_summaries->get (to)->size += estimated_growth;
   new_size = inline_summaries->get (to)->size;
 
   if (callee->calls_comdat_local)
Index: gcc/ipa-inline.c
===================================================================
--- gcc/ipa-inline.c	(revision 233498)
+++ gcc/ipa-inline.c	(working copy)
@@ -2163,7 +2163,8 @@  flatten_function (struct cgraph_node *no
    recursion.  */
 
 static bool
-inline_to_all_callers (struct cgraph_node *node, void *data)
+inline_to_all_callers_1 (struct cgraph_node *node, void *data,
+			 hash_set<cgraph_node *> *callers)
 {
   int *num_calls = (int *)data;
   bool callee_removed = false;
@@ -2193,7 +2194,10 @@  inline_to_all_callers (struct cgraph_nod
 		   inline_summaries->get (node->callers->caller)->size);
 	}
 
-      inline_call (node->callers, true, NULL, NULL, true, &callee_removed);
+      /* Remember which callers we inlined to, delaying updating the
+	 overall summary.  */
+      callers->add (node->callers->caller);
+      inline_call (node->callers, true, NULL, NULL, false, &callee_removed);
       if (dump_file)
 	fprintf (dump_file,
 		 " Inlined into %s which now has %i size\n",
@@ -2211,6 +2215,23 @@  inline_to_all_callers (struct cgraph_nod
   return false;
 }
 
+/* Wrapper around inline_to_all_callers_1 doing delayed overall summary
+   update.  */
+
+static bool
+inline_to_all_callers (struct cgraph_node *node, void *data)
+{
+  hash_set<cgraph_node *> callers;
+  bool res = inline_to_all_callers_1 (node, data, &callers);
+  /* Perform the delayed update of the overall summary of all callers
+     processed.  This avoids quadratic behavior in the cases where
+     we have a lot of calls to the same function.  */
+  for (hash_set<cgraph_node *>::iterator i = callers.begin ();
+       i != callers.end (); ++i)
+    inline_update_overall_summary (*i);
+  return res;
+}
+
 /* Output overall time estimate.  */
 static void
 dump_overall_stats (void)
@@ -2590,10 +2611,13 @@  early_inline_small_functions (struct cgr
 	fprintf (dump_file, " Inlining %s into %s.\n",
 		 xstrdup_for_dump (callee->name ()),
 		 xstrdup_for_dump (e->caller->name ()));
-      inline_call (e, true, NULL, NULL, true);
+      inline_call (e, true, NULL, NULL, false);
       inlined = true;
     }
 
+  if (inlined)
+    inline_update_overall_summary (node);
+
   return inlined;
 }