diff mbox

[GOOGLE] Unrestrict early inline restrictions for AutoFDO

Message ID CAO2gOZWAsJcV_tM_0pYyin3-1FQAqGmYX17QjD-b9QbpdczBqg@mail.gmail.com
State New
Headers show

Commit Message

Dehao Chen June 3, 2013, 4:19 a.m. UTC
I've updated the patch to check it at ipa-inline:


Thanks,
Dehao

On Sun, Jun 2, 2013 at 9:08 PM, Xinliang David Li <davidxl@google.com> wrote:
> If the purpose of the fix is to filter early inlinings with code
> growth in autoFDO, the proposed fix is the wrong way to do -- it
> changes the meaning of cgraph_maybe_hot_edge_p.
>
> David
>
> On Sun, Jun 2, 2013 at 7:25 PM, Dehao Chen <dehao@google.com> wrote:
>> On Sun, Jun 2, 2013 at 7:14 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>
>>> auto profile info is not available yet in early inlining, why would
>>> this change make any difference?
>>
>> Because the check of PARAM_EARLY_INLINING_INSNS is after the check of
>> cgraph_maybe_hot_edge_p in early inline. If
>> cgraph_maybe_hot_edge_p fails, the early inline will not happen even
>> if growth is less than PARAM_EARLY_INLINING_INSNS.
>>
>>>
>>> Can you just reset the max_iters to a
>>> higher value for autoFDO?
>>
>> We could do that, but it could still lead to some code bloat because
>> recursive inlines can happen for at most, say 10, iterations.
>>
>> Dehao
>>
>>>
>>> David
>>>
>>> On Sun, Jun 2, 2013 at 6:21 PM, Dehao Chen <dehao@google.com> wrote:
>>> > The patch was committed to google-4_8, but it causes problem because
>>> > einline sets PARAM_EARLY_INLINING_INSNS = 11. This will cause
>>> > recursive inlining at einline stage (e.g. main->foo, foo->bar,
>>> > bar->foo) when autofdo is enabled.
>>> >
>>> > The following patch can fix the problem by doing more targetted early inlining:
>>> >
>>> > Index: gcc/predict.c
>>> > ===================================================================
>>> > --- gcc/predict.c (revision 199593)
>>> > +++ gcc/predict.c (working copy)
>>> > @@ -175,6 +175,8 @@ cgraph_maybe_hot_edge_p (struct cgraph_edge *edge)
>>> >        && !maybe_hot_count_p (NULL,
>>> >                               edge->count))
>>> >      return false;
>>> > +  if (flag_auto_profile)
>>> > +    return false;
>>> >    if (edge->caller->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED
>>> >        || (edge->callee
>>> >    && edge->callee->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED))
>>> >
>>> > Performance testing on-going...
>>> >
>>> > Dehao
>>> >
>>> > On Wed, May 29, 2013 at 3:44 PM, Dehao Chen <dehao@google.com> wrote:
>>> >> OK, I'll commit the early inline part.
>>> >>
>>> >> Dehao
>>> >>
>>> >> On Wed, May 29, 2013 at 10:00 AM, Xinliang David Li <davidxl@google.com> wrote:
>>> >>> The early inlining part is ok. The tracer optimization should be
>>> >>> revisited -- we should have more fine grain control on it (for
>>> >>> instance, based on FDO summary -- but that should be common to
>>> >>> FDO/LIPO).
>>> >>>
>>> >>> David
>>> >>>
>>> >>> On Wed, May 29, 2013 at 9:39 AM, Dehao Chen <dehao@google.com> wrote:
>>> >>>> In gcc4-8, the max einline iterations are restricted to 1. For
>>> >>>> AutoFDO, this is bad because early inline is not size restricted. This
>>> >>>> patch allows einline to do multiple iterations in AutoFDO. It also
>>> >>>> enables tracer optimization in AutoFDO.
>>> >>>>
>>> >>>> Bootstrapped and passed regression test.
>>> >>>>
>>> >>>> OK for googel-4_8?
>>> >>>>
>>> >>>> Thanks,
>>> >>>> Dehao
>>> >>>>
>>> >>>> Index: gcc/ipa-inline.c
>>> >>>> ===================================================================
>>> >>>> --- gcc/ipa-inline.c (revision 199416)
>>> >>>> +++ gcc/ipa-inline.c (working copy)
>>> >>>> @@ -2161,7 +2161,8 @@ early_inliner (void)
>>> >>>>      {
>>> >>>>        /* We iterate incremental inlining to get trivial cases of indirect
>>> >>>>   inlining.  */
>>> >>>> -      while (iterations < PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS)
>>> >>>> +      while ((flag_auto_profile
>>> >>>> +      || iterations < PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS))
>>> >>>>       && early_inline_small_functions (node))
>>> >>>>   {
>>> >>>>    timevar_push (TV_INTEGRATION);
>>> >>>> Index: gcc/opts.c
>>> >>>> ===================================================================
>>> >>>> --- gcc/opts.c (revision 199416)
>>> >>>> +++ gcc/opts.c (working copy)
>>> >>>> @@ -1644,6 +1644,8 @@ common_handle_option (struct gcc_options *opts,
>>> >>>>   opts->x_flag_peel_loops = value;
>>> >>>>        if (!opts_set->x_flag_value_profile_transformations)
>>> >>>>   opts->x_flag_value_profile_transformations = value;
>>> >>>> +      if (!opts_set->x_flag_tracer)
>>> >>>> + opts->x_flag_tracer = value;
>>> >>>>        if (!opts_set->x_flag_inline_functions)
>>> >>>>   opts->x_flag_inline_functions = value;
>>> >>>>        if (!opts_set->x_flag_ipa_cp)

Comments

Xinliang David Li June 3, 2013, 4:36 a.m. UTC | #1
The patch is ok if performance test passes.  For a complete fix, Is it
better to tune down PARAM_EARLY_INLINE_INSNS from 11 to a small value
for autoFDO or use a different parameter?

David

On Sun, Jun 2, 2013 at 9:19 PM, Dehao Chen <dehao@google.com> wrote:
> I've updated the patch to check it at ipa-inline:
>
> Index: gcc/ipa-inline.c
> ===================================================================
> --- gcc/ipa-inline.c (revision 199593)
> +++ gcc/ipa-inline.c (working copy)
> @@ -434,6 +434,16 @@ want_early_inline_function_p (struct cgraph_edge *
>
>        if (growth <= PARAM_VALUE (PARAM_EARLY_INLINING_INSNS_ANY))
>   ;
> +      else if (flag_auto_profile)
> + {
> +  if (dump_file)
> +    fprintf (dump_file, "  will not early inline: %s/%i->%s/%i, "
> +     "call is cold in profiling and code would grow by %i\n",
> +     xstrdup (cgraph_node_name (e->caller)), e->caller->uid,
> +     xstrdup (cgraph_node_name (callee)), callee->uid,
> +     growth);
> +    want_inline = false;
> + }
>        else if (!cgraph_maybe_hot_edge_p (e))
>   {
>    if (dump_file)
>
> Thanks,
> Dehao
>
> On Sun, Jun 2, 2013 at 9:08 PM, Xinliang David Li <davidxl@google.com> wrote:
>> If the purpose of the fix is to filter early inlinings with code
>> growth in autoFDO, the proposed fix is the wrong way to do -- it
>> changes the meaning of cgraph_maybe_hot_edge_p.
>>
>> David
>>
>> On Sun, Jun 2, 2013 at 7:25 PM, Dehao Chen <dehao@google.com> wrote:
>>> On Sun, Jun 2, 2013 at 7:14 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>>
>>>> auto profile info is not available yet in early inlining, why would
>>>> this change make any difference?
>>>
>>> Because the check of PARAM_EARLY_INLINING_INSNS is after the check of
>>> cgraph_maybe_hot_edge_p in early inline. If
>>> cgraph_maybe_hot_edge_p fails, the early inline will not happen even
>>> if growth is less than PARAM_EARLY_INLINING_INSNS.
>>>
>>>>
>>>> Can you just reset the max_iters to a
>>>> higher value for autoFDO?
>>>
>>> We could do that, but it could still lead to some code bloat because
>>> recursive inlines can happen for at most, say 10, iterations.
>>>
>>> Dehao
>>>
>>>>
>>>> David
>>>>
>>>> On Sun, Jun 2, 2013 at 6:21 PM, Dehao Chen <dehao@google.com> wrote:
>>>> > The patch was committed to google-4_8, but it causes problem because
>>>> > einline sets PARAM_EARLY_INLINING_INSNS = 11. This will cause
>>>> > recursive inlining at einline stage (e.g. main->foo, foo->bar,
>>>> > bar->foo) when autofdo is enabled.
>>>> >
>>>> > The following patch can fix the problem by doing more targetted early inlining:
>>>> >
>>>> > Index: gcc/predict.c
>>>> > ===================================================================
>>>> > --- gcc/predict.c (revision 199593)
>>>> > +++ gcc/predict.c (working copy)
>>>> > @@ -175,6 +175,8 @@ cgraph_maybe_hot_edge_p (struct cgraph_edge *edge)
>>>> >        && !maybe_hot_count_p (NULL,
>>>> >                               edge->count))
>>>> >      return false;
>>>> > +  if (flag_auto_profile)
>>>> > +    return false;
>>>> >    if (edge->caller->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED
>>>> >        || (edge->callee
>>>> >    && edge->callee->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED))
>>>> >
>>>> > Performance testing on-going...
>>>> >
>>>> > Dehao
>>>> >
>>>> > On Wed, May 29, 2013 at 3:44 PM, Dehao Chen <dehao@google.com> wrote:
>>>> >> OK, I'll commit the early inline part.
>>>> >>
>>>> >> Dehao
>>>> >>
>>>> >> On Wed, May 29, 2013 at 10:00 AM, Xinliang David Li <davidxl@google.com> wrote:
>>>> >>> The early inlining part is ok. The tracer optimization should be
>>>> >>> revisited -- we should have more fine grain control on it (for
>>>> >>> instance, based on FDO summary -- but that should be common to
>>>> >>> FDO/LIPO).
>>>> >>>
>>>> >>> David
>>>> >>>
>>>> >>> On Wed, May 29, 2013 at 9:39 AM, Dehao Chen <dehao@google.com> wrote:
>>>> >>>> In gcc4-8, the max einline iterations are restricted to 1. For
>>>> >>>> AutoFDO, this is bad because early inline is not size restricted. This
>>>> >>>> patch allows einline to do multiple iterations in AutoFDO. It also
>>>> >>>> enables tracer optimization in AutoFDO.
>>>> >>>>
>>>> >>>> Bootstrapped and passed regression test.
>>>> >>>>
>>>> >>>> OK for googel-4_8?
>>>> >>>>
>>>> >>>> Thanks,
>>>> >>>> Dehao
>>>> >>>>
>>>> >>>> Index: gcc/ipa-inline.c
>>>> >>>> ===================================================================
>>>> >>>> --- gcc/ipa-inline.c (revision 199416)
>>>> >>>> +++ gcc/ipa-inline.c (working copy)
>>>> >>>> @@ -2161,7 +2161,8 @@ early_inliner (void)
>>>> >>>>      {
>>>> >>>>        /* We iterate incremental inlining to get trivial cases of indirect
>>>> >>>>   inlining.  */
>>>> >>>> -      while (iterations < PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS)
>>>> >>>> +      while ((flag_auto_profile
>>>> >>>> +      || iterations < PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS))
>>>> >>>>       && early_inline_small_functions (node))
>>>> >>>>   {
>>>> >>>>    timevar_push (TV_INTEGRATION);
>>>> >>>> Index: gcc/opts.c
>>>> >>>> ===================================================================
>>>> >>>> --- gcc/opts.c (revision 199416)
>>>> >>>> +++ gcc/opts.c (working copy)
>>>> >>>> @@ -1644,6 +1644,8 @@ common_handle_option (struct gcc_options *opts,
>>>> >>>>   opts->x_flag_peel_loops = value;
>>>> >>>>        if (!opts_set->x_flag_value_profile_transformations)
>>>> >>>>   opts->x_flag_value_profile_transformations = value;
>>>> >>>> +      if (!opts_set->x_flag_tracer)
>>>> >>>> + opts->x_flag_tracer = value;
>>>> >>>>        if (!opts_set->x_flag_inline_functions)
>>>> >>>>   opts->x_flag_inline_functions = value;
>>>> >>>>        if (!opts_set->x_flag_ipa_cp)
Dehao Chen June 3, 2013, 3:47 p.m. UTC | #2
The performance testing is ok. But it does not solve the problem: for
some recursive function calls, the size growth is calculated as 0. So
I think we may want to just fall back to the 4.7 to limit the
iterations to 10 for AutoFDO enabled build?

Dehao

On Sun, Jun 2, 2013 at 9:36 PM, Xinliang David Li <davidxl@google.com> wrote:
> The patch is ok if performance test passes.  For a complete fix, Is it
> better to tune down PARAM_EARLY_INLINE_INSNS from 11 to a small value
> for autoFDO or use a different parameter?
>
> David
>
> On Sun, Jun 2, 2013 at 9:19 PM, Dehao Chen <dehao@google.com> wrote:
>> I've updated the patch to check it at ipa-inline:
>>
>> Index: gcc/ipa-inline.c
>> ===================================================================
>> --- gcc/ipa-inline.c (revision 199593)
>> +++ gcc/ipa-inline.c (working copy)
>> @@ -434,6 +434,16 @@ want_early_inline_function_p (struct cgraph_edge *
>>
>>        if (growth <= PARAM_VALUE (PARAM_EARLY_INLINING_INSNS_ANY))
>>   ;
>> +      else if (flag_auto_profile)
>> + {
>> +  if (dump_file)
>> +    fprintf (dump_file, "  will not early inline: %s/%i->%s/%i, "
>> +     "call is cold in profiling and code would grow by %i\n",
>> +     xstrdup (cgraph_node_name (e->caller)), e->caller->uid,
>> +     xstrdup (cgraph_node_name (callee)), callee->uid,
>> +     growth);
>> +    want_inline = false;
>> + }
>>        else if (!cgraph_maybe_hot_edge_p (e))
>>   {
>>    if (dump_file)
>>
>> Thanks,
>> Dehao
>>
>> On Sun, Jun 2, 2013 at 9:08 PM, Xinliang David Li <davidxl@google.com> wrote:
>>> If the purpose of the fix is to filter early inlinings with code
>>> growth in autoFDO, the proposed fix is the wrong way to do -- it
>>> changes the meaning of cgraph_maybe_hot_edge_p.
>>>
>>> David
>>>
>>> On Sun, Jun 2, 2013 at 7:25 PM, Dehao Chen <dehao@google.com> wrote:
>>>> On Sun, Jun 2, 2013 at 7:14 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>>>
>>>>> auto profile info is not available yet in early inlining, why would
>>>>> this change make any difference?
>>>>
>>>> Because the check of PARAM_EARLY_INLINING_INSNS is after the check of
>>>> cgraph_maybe_hot_edge_p in early inline. If
>>>> cgraph_maybe_hot_edge_p fails, the early inline will not happen even
>>>> if growth is less than PARAM_EARLY_INLINING_INSNS.
>>>>
>>>>>
>>>>> Can you just reset the max_iters to a
>>>>> higher value for autoFDO?
>>>>
>>>> We could do that, but it could still lead to some code bloat because
>>>> recursive inlines can happen for at most, say 10, iterations.
>>>>
>>>> Dehao
>>>>
>>>>>
>>>>> David
>>>>>
>>>>> On Sun, Jun 2, 2013 at 6:21 PM, Dehao Chen <dehao@google.com> wrote:
>>>>> > The patch was committed to google-4_8, but it causes problem because
>>>>> > einline sets PARAM_EARLY_INLINING_INSNS = 11. This will cause
>>>>> > recursive inlining at einline stage (e.g. main->foo, foo->bar,
>>>>> > bar->foo) when autofdo is enabled.
>>>>> >
>>>>> > The following patch can fix the problem by doing more targetted early inlining:
>>>>> >
>>>>> > Index: gcc/predict.c
>>>>> > ===================================================================
>>>>> > --- gcc/predict.c (revision 199593)
>>>>> > +++ gcc/predict.c (working copy)
>>>>> > @@ -175,6 +175,8 @@ cgraph_maybe_hot_edge_p (struct cgraph_edge *edge)
>>>>> >        && !maybe_hot_count_p (NULL,
>>>>> >                               edge->count))
>>>>> >      return false;
>>>>> > +  if (flag_auto_profile)
>>>>> > +    return false;
>>>>> >    if (edge->caller->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED
>>>>> >        || (edge->callee
>>>>> >    && edge->callee->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED))
>>>>> >
>>>>> > Performance testing on-going...
>>>>> >
>>>>> > Dehao
>>>>> >
>>>>> > On Wed, May 29, 2013 at 3:44 PM, Dehao Chen <dehao@google.com> wrote:
>>>>> >> OK, I'll commit the early inline part.
>>>>> >>
>>>>> >> Dehao
>>>>> >>
>>>>> >> On Wed, May 29, 2013 at 10:00 AM, Xinliang David Li <davidxl@google.com> wrote:
>>>>> >>> The early inlining part is ok. The tracer optimization should be
>>>>> >>> revisited -- we should have more fine grain control on it (for
>>>>> >>> instance, based on FDO summary -- but that should be common to
>>>>> >>> FDO/LIPO).
>>>>> >>>
>>>>> >>> David
>>>>> >>>
>>>>> >>> On Wed, May 29, 2013 at 9:39 AM, Dehao Chen <dehao@google.com> wrote:
>>>>> >>>> In gcc4-8, the max einline iterations are restricted to 1. For
>>>>> >>>> AutoFDO, this is bad because early inline is not size restricted. This
>>>>> >>>> patch allows einline to do multiple iterations in AutoFDO. It also
>>>>> >>>> enables tracer optimization in AutoFDO.
>>>>> >>>>
>>>>> >>>> Bootstrapped and passed regression test.
>>>>> >>>>
>>>>> >>>> OK for googel-4_8?
>>>>> >>>>
>>>>> >>>> Thanks,
>>>>> >>>> Dehao
>>>>> >>>>
>>>>> >>>> Index: gcc/ipa-inline.c
>>>>> >>>> ===================================================================
>>>>> >>>> --- gcc/ipa-inline.c (revision 199416)
>>>>> >>>> +++ gcc/ipa-inline.c (working copy)
>>>>> >>>> @@ -2161,7 +2161,8 @@ early_inliner (void)
>>>>> >>>>      {
>>>>> >>>>        /* We iterate incremental inlining to get trivial cases of indirect
>>>>> >>>>   inlining.  */
>>>>> >>>> -      while (iterations < PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS)
>>>>> >>>> +      while ((flag_auto_profile
>>>>> >>>> +      || iterations < PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS))
>>>>> >>>>       && early_inline_small_functions (node))
>>>>> >>>>   {
>>>>> >>>>    timevar_push (TV_INTEGRATION);
>>>>> >>>> Index: gcc/opts.c
>>>>> >>>> ===================================================================
>>>>> >>>> --- gcc/opts.c (revision 199416)
>>>>> >>>> +++ gcc/opts.c (working copy)
>>>>> >>>> @@ -1644,6 +1644,8 @@ common_handle_option (struct gcc_options *opts,
>>>>> >>>>   opts->x_flag_peel_loops = value;
>>>>> >>>>        if (!opts_set->x_flag_value_profile_transformations)
>>>>> >>>>   opts->x_flag_value_profile_transformations = value;
>>>>> >>>> +      if (!opts_set->x_flag_tracer)
>>>>> >>>> + opts->x_flag_tracer = value;
>>>>> >>>>        if (!opts_set->x_flag_inline_functions)
>>>>> >>>>   opts->x_flag_inline_functions = value;
>>>>> >>>>        if (!opts_set->x_flag_ipa_cp)
diff mbox

Patch

Index: gcc/ipa-inline.c
===================================================================
--- gcc/ipa-inline.c (revision 199593)
+++ gcc/ipa-inline.c (working copy)
@@ -434,6 +434,16 @@  want_early_inline_function_p (struct cgraph_edge *

       if (growth <= PARAM_VALUE (PARAM_EARLY_INLINING_INSNS_ANY))
  ;
+      else if (flag_auto_profile)
+ {
+  if (dump_file)
+    fprintf (dump_file, "  will not early inline: %s/%i->%s/%i, "
+     "call is cold in profiling and code would grow by %i\n",
+     xstrdup (cgraph_node_name (e->caller)), e->caller->uid,
+     xstrdup (cgraph_node_name (callee)), callee->uid,
+     growth);
+    want_inline = false;
+ }
       else if (!cgraph_maybe_hot_edge_p (e))
  {
   if (dump_file)