Message ID | CAO2gOZWAsJcV_tM_0pYyin3-1FQAqGmYX17QjD-b9QbpdczBqg@mail.gmail.com |
---|---|
State | New |
Headers | show |
The patch is ok if performance test passes. For a complete fix, Is it better to tune down PARAM_EARLY_INLINE_INSNS from 11 to a small value for autoFDO or use a different parameter? David On Sun, Jun 2, 2013 at 9:19 PM, Dehao Chen <dehao@google.com> wrote: > I've updated the patch to check it at ipa-inline: > > Index: gcc/ipa-inline.c > =================================================================== > --- gcc/ipa-inline.c (revision 199593) > +++ gcc/ipa-inline.c (working copy) > @@ -434,6 +434,16 @@ want_early_inline_function_p (struct cgraph_edge * > > if (growth <= PARAM_VALUE (PARAM_EARLY_INLINING_INSNS_ANY)) > ; > + else if (flag_auto_profile) > + { > + if (dump_file) > + fprintf (dump_file, " will not early inline: %s/%i->%s/%i, " > + "call is cold in profiling and code would grow by %i\n", > + xstrdup (cgraph_node_name (e->caller)), e->caller->uid, > + xstrdup (cgraph_node_name (callee)), callee->uid, > + growth); > + want_inline = false; > + } > else if (!cgraph_maybe_hot_edge_p (e)) > { > if (dump_file) > > Thanks, > Dehao > > On Sun, Jun 2, 2013 at 9:08 PM, Xinliang David Li <davidxl@google.com> wrote: >> If the purpose of the fix is to filter early inlinings with code >> growth in autoFDO, the proposed fix is the wrong way to do -- it >> changes the meaning of cgraph_maybe_hot_edge_p. >> >> David >> >> On Sun, Jun 2, 2013 at 7:25 PM, Dehao Chen <dehao@google.com> wrote: >>> On Sun, Jun 2, 2013 at 7:14 PM, Xinliang David Li <davidxl@google.com> wrote: >>>> >>>> auto profile info is not available yet in early inlining, why would >>>> this change make any difference? >>> >>> Because the check of PARAM_EARLY_INLINING_INSNS is after the check of >>> cgraph_maybe_hot_edge_p in early inline. If >>> cgraph_maybe_hot_edge_p fails, the early inline will not happen even >>> if growth is less than PARAM_EARLY_INLINING_INSNS. >>> >>>> >>>> Can you just reset the max_iters to a >>>> higher value for autoFDO? >>> >>> We could do that, but it could still lead to some code bloat because >>> recursive inlines can happen for at most, say 10, iterations. >>> >>> Dehao >>> >>>> >>>> David >>>> >>>> On Sun, Jun 2, 2013 at 6:21 PM, Dehao Chen <dehao@google.com> wrote: >>>> > The patch was committed to google-4_8, but it causes problem because >>>> > einline sets PARAM_EARLY_INLINING_INSNS = 11. This will cause >>>> > recursive inlining at einline stage (e.g. main->foo, foo->bar, >>>> > bar->foo) when autofdo is enabled. >>>> > >>>> > The following patch can fix the problem by doing more targetted early inlining: >>>> > >>>> > Index: gcc/predict.c >>>> > =================================================================== >>>> > --- gcc/predict.c (revision 199593) >>>> > +++ gcc/predict.c (working copy) >>>> > @@ -175,6 +175,8 @@ cgraph_maybe_hot_edge_p (struct cgraph_edge *edge) >>>> > && !maybe_hot_count_p (NULL, >>>> > edge->count)) >>>> > return false; >>>> > + if (flag_auto_profile) >>>> > + return false; >>>> > if (edge->caller->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED >>>> > || (edge->callee >>>> > && edge->callee->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED)) >>>> > >>>> > Performance testing on-going... >>>> > >>>> > Dehao >>>> > >>>> > On Wed, May 29, 2013 at 3:44 PM, Dehao Chen <dehao@google.com> wrote: >>>> >> OK, I'll commit the early inline part. >>>> >> >>>> >> Dehao >>>> >> >>>> >> On Wed, May 29, 2013 at 10:00 AM, Xinliang David Li <davidxl@google.com> wrote: >>>> >>> The early inlining part is ok. The tracer optimization should be >>>> >>> revisited -- we should have more fine grain control on it (for >>>> >>> instance, based on FDO summary -- but that should be common to >>>> >>> FDO/LIPO). >>>> >>> >>>> >>> David >>>> >>> >>>> >>> On Wed, May 29, 2013 at 9:39 AM, Dehao Chen <dehao@google.com> wrote: >>>> >>>> In gcc4-8, the max einline iterations are restricted to 1. For >>>> >>>> AutoFDO, this is bad because early inline is not size restricted. This >>>> >>>> patch allows einline to do multiple iterations in AutoFDO. It also >>>> >>>> enables tracer optimization in AutoFDO. >>>> >>>> >>>> >>>> Bootstrapped and passed regression test. >>>> >>>> >>>> >>>> OK for googel-4_8? >>>> >>>> >>>> >>>> Thanks, >>>> >>>> Dehao >>>> >>>> >>>> >>>> Index: gcc/ipa-inline.c >>>> >>>> =================================================================== >>>> >>>> --- gcc/ipa-inline.c (revision 199416) >>>> >>>> +++ gcc/ipa-inline.c (working copy) >>>> >>>> @@ -2161,7 +2161,8 @@ early_inliner (void) >>>> >>>> { >>>> >>>> /* We iterate incremental inlining to get trivial cases of indirect >>>> >>>> inlining. */ >>>> >>>> - while (iterations < PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS) >>>> >>>> + while ((flag_auto_profile >>>> >>>> + || iterations < PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS)) >>>> >>>> && early_inline_small_functions (node)) >>>> >>>> { >>>> >>>> timevar_push (TV_INTEGRATION); >>>> >>>> Index: gcc/opts.c >>>> >>>> =================================================================== >>>> >>>> --- gcc/opts.c (revision 199416) >>>> >>>> +++ gcc/opts.c (working copy) >>>> >>>> @@ -1644,6 +1644,8 @@ common_handle_option (struct gcc_options *opts, >>>> >>>> opts->x_flag_peel_loops = value; >>>> >>>> if (!opts_set->x_flag_value_profile_transformations) >>>> >>>> opts->x_flag_value_profile_transformations = value; >>>> >>>> + if (!opts_set->x_flag_tracer) >>>> >>>> + opts->x_flag_tracer = value; >>>> >>>> if (!opts_set->x_flag_inline_functions) >>>> >>>> opts->x_flag_inline_functions = value; >>>> >>>> if (!opts_set->x_flag_ipa_cp)
The performance testing is ok. But it does not solve the problem: for some recursive function calls, the size growth is calculated as 0. So I think we may want to just fall back to the 4.7 to limit the iterations to 10 for AutoFDO enabled build? Dehao On Sun, Jun 2, 2013 at 9:36 PM, Xinliang David Li <davidxl@google.com> wrote: > The patch is ok if performance test passes. For a complete fix, Is it > better to tune down PARAM_EARLY_INLINE_INSNS from 11 to a small value > for autoFDO or use a different parameter? > > David > > On Sun, Jun 2, 2013 at 9:19 PM, Dehao Chen <dehao@google.com> wrote: >> I've updated the patch to check it at ipa-inline: >> >> Index: gcc/ipa-inline.c >> =================================================================== >> --- gcc/ipa-inline.c (revision 199593) >> +++ gcc/ipa-inline.c (working copy) >> @@ -434,6 +434,16 @@ want_early_inline_function_p (struct cgraph_edge * >> >> if (growth <= PARAM_VALUE (PARAM_EARLY_INLINING_INSNS_ANY)) >> ; >> + else if (flag_auto_profile) >> + { >> + if (dump_file) >> + fprintf (dump_file, " will not early inline: %s/%i->%s/%i, " >> + "call is cold in profiling and code would grow by %i\n", >> + xstrdup (cgraph_node_name (e->caller)), e->caller->uid, >> + xstrdup (cgraph_node_name (callee)), callee->uid, >> + growth); >> + want_inline = false; >> + } >> else if (!cgraph_maybe_hot_edge_p (e)) >> { >> if (dump_file) >> >> Thanks, >> Dehao >> >> On Sun, Jun 2, 2013 at 9:08 PM, Xinliang David Li <davidxl@google.com> wrote: >>> If the purpose of the fix is to filter early inlinings with code >>> growth in autoFDO, the proposed fix is the wrong way to do -- it >>> changes the meaning of cgraph_maybe_hot_edge_p. >>> >>> David >>> >>> On Sun, Jun 2, 2013 at 7:25 PM, Dehao Chen <dehao@google.com> wrote: >>>> On Sun, Jun 2, 2013 at 7:14 PM, Xinliang David Li <davidxl@google.com> wrote: >>>>> >>>>> auto profile info is not available yet in early inlining, why would >>>>> this change make any difference? >>>> >>>> Because the check of PARAM_EARLY_INLINING_INSNS is after the check of >>>> cgraph_maybe_hot_edge_p in early inline. If >>>> cgraph_maybe_hot_edge_p fails, the early inline will not happen even >>>> if growth is less than PARAM_EARLY_INLINING_INSNS. >>>> >>>>> >>>>> Can you just reset the max_iters to a >>>>> higher value for autoFDO? >>>> >>>> We could do that, but it could still lead to some code bloat because >>>> recursive inlines can happen for at most, say 10, iterations. >>>> >>>> Dehao >>>> >>>>> >>>>> David >>>>> >>>>> On Sun, Jun 2, 2013 at 6:21 PM, Dehao Chen <dehao@google.com> wrote: >>>>> > The patch was committed to google-4_8, but it causes problem because >>>>> > einline sets PARAM_EARLY_INLINING_INSNS = 11. This will cause >>>>> > recursive inlining at einline stage (e.g. main->foo, foo->bar, >>>>> > bar->foo) when autofdo is enabled. >>>>> > >>>>> > The following patch can fix the problem by doing more targetted early inlining: >>>>> > >>>>> > Index: gcc/predict.c >>>>> > =================================================================== >>>>> > --- gcc/predict.c (revision 199593) >>>>> > +++ gcc/predict.c (working copy) >>>>> > @@ -175,6 +175,8 @@ cgraph_maybe_hot_edge_p (struct cgraph_edge *edge) >>>>> > && !maybe_hot_count_p (NULL, >>>>> > edge->count)) >>>>> > return false; >>>>> > + if (flag_auto_profile) >>>>> > + return false; >>>>> > if (edge->caller->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED >>>>> > || (edge->callee >>>>> > && edge->callee->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED)) >>>>> > >>>>> > Performance testing on-going... >>>>> > >>>>> > Dehao >>>>> > >>>>> > On Wed, May 29, 2013 at 3:44 PM, Dehao Chen <dehao@google.com> wrote: >>>>> >> OK, I'll commit the early inline part. >>>>> >> >>>>> >> Dehao >>>>> >> >>>>> >> On Wed, May 29, 2013 at 10:00 AM, Xinliang David Li <davidxl@google.com> wrote: >>>>> >>> The early inlining part is ok. The tracer optimization should be >>>>> >>> revisited -- we should have more fine grain control on it (for >>>>> >>> instance, based on FDO summary -- but that should be common to >>>>> >>> FDO/LIPO). >>>>> >>> >>>>> >>> David >>>>> >>> >>>>> >>> On Wed, May 29, 2013 at 9:39 AM, Dehao Chen <dehao@google.com> wrote: >>>>> >>>> In gcc4-8, the max einline iterations are restricted to 1. For >>>>> >>>> AutoFDO, this is bad because early inline is not size restricted. This >>>>> >>>> patch allows einline to do multiple iterations in AutoFDO. It also >>>>> >>>> enables tracer optimization in AutoFDO. >>>>> >>>> >>>>> >>>> Bootstrapped and passed regression test. >>>>> >>>> >>>>> >>>> OK for googel-4_8? >>>>> >>>> >>>>> >>>> Thanks, >>>>> >>>> Dehao >>>>> >>>> >>>>> >>>> Index: gcc/ipa-inline.c >>>>> >>>> =================================================================== >>>>> >>>> --- gcc/ipa-inline.c (revision 199416) >>>>> >>>> +++ gcc/ipa-inline.c (working copy) >>>>> >>>> @@ -2161,7 +2161,8 @@ early_inliner (void) >>>>> >>>> { >>>>> >>>> /* We iterate incremental inlining to get trivial cases of indirect >>>>> >>>> inlining. */ >>>>> >>>> - while (iterations < PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS) >>>>> >>>> + while ((flag_auto_profile >>>>> >>>> + || iterations < PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS)) >>>>> >>>> && early_inline_small_functions (node)) >>>>> >>>> { >>>>> >>>> timevar_push (TV_INTEGRATION); >>>>> >>>> Index: gcc/opts.c >>>>> >>>> =================================================================== >>>>> >>>> --- gcc/opts.c (revision 199416) >>>>> >>>> +++ gcc/opts.c (working copy) >>>>> >>>> @@ -1644,6 +1644,8 @@ common_handle_option (struct gcc_options *opts, >>>>> >>>> opts->x_flag_peel_loops = value; >>>>> >>>> if (!opts_set->x_flag_value_profile_transformations) >>>>> >>>> opts->x_flag_value_profile_transformations = value; >>>>> >>>> + if (!opts_set->x_flag_tracer) >>>>> >>>> + opts->x_flag_tracer = value; >>>>> >>>> if (!opts_set->x_flag_inline_functions) >>>>> >>>> opts->x_flag_inline_functions = value; >>>>> >>>> if (!opts_set->x_flag_ipa_cp)
Index: gcc/ipa-inline.c =================================================================== --- gcc/ipa-inline.c (revision 199593) +++ gcc/ipa-inline.c (working copy) @@ -434,6 +434,16 @@ want_early_inline_function_p (struct cgraph_edge * if (growth <= PARAM_VALUE (PARAM_EARLY_INLINING_INSNS_ANY)) ; + else if (flag_auto_profile) + { + if (dump_file) + fprintf (dump_file, " will not early inline: %s/%i->%s/%i, " + "call is cold in profiling and code would grow by %i\n", + xstrdup (cgraph_node_name (e->caller)), e->caller->uid, + xstrdup (cgraph_node_name (callee)), callee->uid, + growth); + want_inline = false; + } else if (!cgraph_maybe_hot_edge_p (e)) { if (dump_file)