Patchwork [GOOGLE] Unrestrict early inline restrictions for AutoFDO

login
register
mail settings
Submitter Dehao Chen
Date June 3, 2013, 1:21 a.m.
Message ID <CAO2gOZW3v87az_UJ2LTmnGaRnsdbBKNxEak3YzC=ByhgBJTgug@mail.gmail.com>
Download mbox | patch
Permalink /patch/248151/
State New
Headers show

Comments

Dehao Chen - June 3, 2013, 1:21 a.m.
The patch was committed to google-4_8, but it causes problem because
einline sets PARAM_EARLY_INLINING_INSNS = 11. This will cause
recursive inlining at einline stage (e.g. main->foo, foo->bar,
bar->foo) when autofdo is enabled.

The following patch can fix the problem by doing more targetted early inlining:


Performance testing on-going...

Dehao

On Wed, May 29, 2013 at 3:44 PM, Dehao Chen <dehao@google.com> wrote:
> OK, I'll commit the early inline part.
>
> Dehao
>
> On Wed, May 29, 2013 at 10:00 AM, Xinliang David Li <davidxl@google.com> wrote:
>> The early inlining part is ok. The tracer optimization should be
>> revisited -- we should have more fine grain control on it (for
>> instance, based on FDO summary -- but that should be common to
>> FDO/LIPO).
>>
>> David
>>
>> On Wed, May 29, 2013 at 9:39 AM, Dehao Chen <dehao@google.com> wrote:
>>> In gcc4-8, the max einline iterations are restricted to 1. For
>>> AutoFDO, this is bad because early inline is not size restricted. This
>>> patch allows einline to do multiple iterations in AutoFDO. It also
>>> enables tracer optimization in AutoFDO.
>>>
>>> Bootstrapped and passed regression test.
>>>
>>> OK for googel-4_8?
>>>
>>> Thanks,
>>> Dehao
>>>
>>> Index: gcc/ipa-inline.c
>>> ===================================================================
>>> --- gcc/ipa-inline.c (revision 199416)
>>> +++ gcc/ipa-inline.c (working copy)
>>> @@ -2161,7 +2161,8 @@ early_inliner (void)
>>>      {
>>>        /* We iterate incremental inlining to get trivial cases of indirect
>>>   inlining.  */
>>> -      while (iterations < PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS)
>>> +      while ((flag_auto_profile
>>> +      || iterations < PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS))
>>>       && early_inline_small_functions (node))
>>>   {
>>>    timevar_push (TV_INTEGRATION);
>>> Index: gcc/opts.c
>>> ===================================================================
>>> --- gcc/opts.c (revision 199416)
>>> +++ gcc/opts.c (working copy)
>>> @@ -1644,6 +1644,8 @@ common_handle_option (struct gcc_options *opts,
>>>   opts->x_flag_peel_loops = value;
>>>        if (!opts_set->x_flag_value_profile_transformations)
>>>   opts->x_flag_value_profile_transformations = value;
>>> +      if (!opts_set->x_flag_tracer)
>>> + opts->x_flag_tracer = value;
>>>        if (!opts_set->x_flag_inline_functions)
>>>   opts->x_flag_inline_functions = value;
>>>        if (!opts_set->x_flag_ipa_cp)
Xinliang David Li - June 3, 2013, 2:14 a.m.
auto profile info is not available yet in early inlining, why would
this change make any difference? Can you just reset the max_iters to a
higher value for autoFDO?

David

On Sun, Jun 2, 2013 at 6:21 PM, Dehao Chen <dehao@google.com> wrote:
> The patch was committed to google-4_8, but it causes problem because
> einline sets PARAM_EARLY_INLINING_INSNS = 11. This will cause
> recursive inlining at einline stage (e.g. main->foo, foo->bar,
> bar->foo) when autofdo is enabled.
>
> The following patch can fix the problem by doing more targetted early inlining:
>
> Index: gcc/predict.c
> ===================================================================
> --- gcc/predict.c (revision 199593)
> +++ gcc/predict.c (working copy)
> @@ -175,6 +175,8 @@ cgraph_maybe_hot_edge_p (struct cgraph_edge *edge)
>        && !maybe_hot_count_p (NULL,
>                               edge->count))
>      return false;
> +  if (flag_auto_profile)
> +    return false;
>    if (edge->caller->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED
>        || (edge->callee
>    && edge->callee->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED))
>
> Performance testing on-going...
>
> Dehao
>
> On Wed, May 29, 2013 at 3:44 PM, Dehao Chen <dehao@google.com> wrote:
>> OK, I'll commit the early inline part.
>>
>> Dehao
>>
>> On Wed, May 29, 2013 at 10:00 AM, Xinliang David Li <davidxl@google.com> wrote:
>>> The early inlining part is ok. The tracer optimization should be
>>> revisited -- we should have more fine grain control on it (for
>>> instance, based on FDO summary -- but that should be common to
>>> FDO/LIPO).
>>>
>>> David
>>>
>>> On Wed, May 29, 2013 at 9:39 AM, Dehao Chen <dehao@google.com> wrote:
>>>> In gcc4-8, the max einline iterations are restricted to 1. For
>>>> AutoFDO, this is bad because early inline is not size restricted. This
>>>> patch allows einline to do multiple iterations in AutoFDO. It also
>>>> enables tracer optimization in AutoFDO.
>>>>
>>>> Bootstrapped and passed regression test.
>>>>
>>>> OK for googel-4_8?
>>>>
>>>> Thanks,
>>>> Dehao
>>>>
>>>> Index: gcc/ipa-inline.c
>>>> ===================================================================
>>>> --- gcc/ipa-inline.c (revision 199416)
>>>> +++ gcc/ipa-inline.c (working copy)
>>>> @@ -2161,7 +2161,8 @@ early_inliner (void)
>>>>      {
>>>>        /* We iterate incremental inlining to get trivial cases of indirect
>>>>   inlining.  */
>>>> -      while (iterations < PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS)
>>>> +      while ((flag_auto_profile
>>>> +      || iterations < PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS))
>>>>       && early_inline_small_functions (node))
>>>>   {
>>>>    timevar_push (TV_INTEGRATION);
>>>> Index: gcc/opts.c
>>>> ===================================================================
>>>> --- gcc/opts.c (revision 199416)
>>>> +++ gcc/opts.c (working copy)
>>>> @@ -1644,6 +1644,8 @@ common_handle_option (struct gcc_options *opts,
>>>>   opts->x_flag_peel_loops = value;
>>>>        if (!opts_set->x_flag_value_profile_transformations)
>>>>   opts->x_flag_value_profile_transformations = value;
>>>> +      if (!opts_set->x_flag_tracer)
>>>> + opts->x_flag_tracer = value;
>>>>        if (!opts_set->x_flag_inline_functions)
>>>>   opts->x_flag_inline_functions = value;
>>>>        if (!opts_set->x_flag_ipa_cp)
Dehao Chen - June 3, 2013, 2:25 a.m.
On Sun, Jun 2, 2013 at 7:14 PM, Xinliang David Li <davidxl@google.com> wrote:
>
> auto profile info is not available yet in early inlining, why would
> this change make any difference?

Because the check of PARAM_EARLY_INLINING_INSNS is after the check of
cgraph_maybe_hot_edge_p in early inline. If
cgraph_maybe_hot_edge_p fails, the early inline will not happen even
if growth is less than PARAM_EARLY_INLINING_INSNS.

>
> Can you just reset the max_iters to a
> higher value for autoFDO?

We could do that, but it could still lead to some code bloat because
recursive inlines can happen for at most, say 10, iterations.

Dehao

>
> David
>
> On Sun, Jun 2, 2013 at 6:21 PM, Dehao Chen <dehao@google.com> wrote:
> > The patch was committed to google-4_8, but it causes problem because
> > einline sets PARAM_EARLY_INLINING_INSNS = 11. This will cause
> > recursive inlining at einline stage (e.g. main->foo, foo->bar,
> > bar->foo) when autofdo is enabled.
> >
> > The following patch can fix the problem by doing more targetted early inlining:
> >
> > Index: gcc/predict.c
> > ===================================================================
> > --- gcc/predict.c (revision 199593)
> > +++ gcc/predict.c (working copy)
> > @@ -175,6 +175,8 @@ cgraph_maybe_hot_edge_p (struct cgraph_edge *edge)
> >        && !maybe_hot_count_p (NULL,
> >                               edge->count))
> >      return false;
> > +  if (flag_auto_profile)
> > +    return false;
> >    if (edge->caller->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED
> >        || (edge->callee
> >    && edge->callee->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED))
> >
> > Performance testing on-going...
> >
> > Dehao
> >
> > On Wed, May 29, 2013 at 3:44 PM, Dehao Chen <dehao@google.com> wrote:
> >> OK, I'll commit the early inline part.
> >>
> >> Dehao
> >>
> >> On Wed, May 29, 2013 at 10:00 AM, Xinliang David Li <davidxl@google.com> wrote:
> >>> The early inlining part is ok. The tracer optimization should be
> >>> revisited -- we should have more fine grain control on it (for
> >>> instance, based on FDO summary -- but that should be common to
> >>> FDO/LIPO).
> >>>
> >>> David
> >>>
> >>> On Wed, May 29, 2013 at 9:39 AM, Dehao Chen <dehao@google.com> wrote:
> >>>> In gcc4-8, the max einline iterations are restricted to 1. For
> >>>> AutoFDO, this is bad because early inline is not size restricted. This
> >>>> patch allows einline to do multiple iterations in AutoFDO. It also
> >>>> enables tracer optimization in AutoFDO.
> >>>>
> >>>> Bootstrapped and passed regression test.
> >>>>
> >>>> OK for googel-4_8?
> >>>>
> >>>> Thanks,
> >>>> Dehao
> >>>>
> >>>> Index: gcc/ipa-inline.c
> >>>> ===================================================================
> >>>> --- gcc/ipa-inline.c (revision 199416)
> >>>> +++ gcc/ipa-inline.c (working copy)
> >>>> @@ -2161,7 +2161,8 @@ early_inliner (void)
> >>>>      {
> >>>>        /* We iterate incremental inlining to get trivial cases of indirect
> >>>>   inlining.  */
> >>>> -      while (iterations < PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS)
> >>>> +      while ((flag_auto_profile
> >>>> +      || iterations < PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS))
> >>>>       && early_inline_small_functions (node))
> >>>>   {
> >>>>    timevar_push (TV_INTEGRATION);
> >>>> Index: gcc/opts.c
> >>>> ===================================================================
> >>>> --- gcc/opts.c (revision 199416)
> >>>> +++ gcc/opts.c (working copy)
> >>>> @@ -1644,6 +1644,8 @@ common_handle_option (struct gcc_options *opts,
> >>>>   opts->x_flag_peel_loops = value;
> >>>>        if (!opts_set->x_flag_value_profile_transformations)
> >>>>   opts->x_flag_value_profile_transformations = value;
> >>>> +      if (!opts_set->x_flag_tracer)
> >>>> + opts->x_flag_tracer = value;
> >>>>        if (!opts_set->x_flag_inline_functions)
> >>>>   opts->x_flag_inline_functions = value;
> >>>>        if (!opts_set->x_flag_ipa_cp)
Xinliang David Li - June 3, 2013, 4:08 a.m.
If the purpose of the fix is to filter early inlinings with code
growth in autoFDO, the proposed fix is the wrong way to do -- it
changes the meaning of cgraph_maybe_hot_edge_p.

David

On Sun, Jun 2, 2013 at 7:25 PM, Dehao Chen <dehao@google.com> wrote:
> On Sun, Jun 2, 2013 at 7:14 PM, Xinliang David Li <davidxl@google.com> wrote:
>>
>> auto profile info is not available yet in early inlining, why would
>> this change make any difference?
>
> Because the check of PARAM_EARLY_INLINING_INSNS is after the check of
> cgraph_maybe_hot_edge_p in early inline. If
> cgraph_maybe_hot_edge_p fails, the early inline will not happen even
> if growth is less than PARAM_EARLY_INLINING_INSNS.
>
>>
>> Can you just reset the max_iters to a
>> higher value for autoFDO?
>
> We could do that, but it could still lead to some code bloat because
> recursive inlines can happen for at most, say 10, iterations.
>
> Dehao
>
>>
>> David
>>
>> On Sun, Jun 2, 2013 at 6:21 PM, Dehao Chen <dehao@google.com> wrote:
>> > The patch was committed to google-4_8, but it causes problem because
>> > einline sets PARAM_EARLY_INLINING_INSNS = 11. This will cause
>> > recursive inlining at einline stage (e.g. main->foo, foo->bar,
>> > bar->foo) when autofdo is enabled.
>> >
>> > The following patch can fix the problem by doing more targetted early inlining:
>> >
>> > Index: gcc/predict.c
>> > ===================================================================
>> > --- gcc/predict.c (revision 199593)
>> > +++ gcc/predict.c (working copy)
>> > @@ -175,6 +175,8 @@ cgraph_maybe_hot_edge_p (struct cgraph_edge *edge)
>> >        && !maybe_hot_count_p (NULL,
>> >                               edge->count))
>> >      return false;
>> > +  if (flag_auto_profile)
>> > +    return false;
>> >    if (edge->caller->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED
>> >        || (edge->callee
>> >    && edge->callee->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED))
>> >
>> > Performance testing on-going...
>> >
>> > Dehao
>> >
>> > On Wed, May 29, 2013 at 3:44 PM, Dehao Chen <dehao@google.com> wrote:
>> >> OK, I'll commit the early inline part.
>> >>
>> >> Dehao
>> >>
>> >> On Wed, May 29, 2013 at 10:00 AM, Xinliang David Li <davidxl@google.com> wrote:
>> >>> The early inlining part is ok. The tracer optimization should be
>> >>> revisited -- we should have more fine grain control on it (for
>> >>> instance, based on FDO summary -- but that should be common to
>> >>> FDO/LIPO).
>> >>>
>> >>> David
>> >>>
>> >>> On Wed, May 29, 2013 at 9:39 AM, Dehao Chen <dehao@google.com> wrote:
>> >>>> In gcc4-8, the max einline iterations are restricted to 1. For
>> >>>> AutoFDO, this is bad because early inline is not size restricted. This
>> >>>> patch allows einline to do multiple iterations in AutoFDO. It also
>> >>>> enables tracer optimization in AutoFDO.
>> >>>>
>> >>>> Bootstrapped and passed regression test.
>> >>>>
>> >>>> OK for googel-4_8?
>> >>>>
>> >>>> Thanks,
>> >>>> Dehao
>> >>>>
>> >>>> Index: gcc/ipa-inline.c
>> >>>> ===================================================================
>> >>>> --- gcc/ipa-inline.c (revision 199416)
>> >>>> +++ gcc/ipa-inline.c (working copy)
>> >>>> @@ -2161,7 +2161,8 @@ early_inliner (void)
>> >>>>      {
>> >>>>        /* We iterate incremental inlining to get trivial cases of indirect
>> >>>>   inlining.  */
>> >>>> -      while (iterations < PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS)
>> >>>> +      while ((flag_auto_profile
>> >>>> +      || iterations < PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS))
>> >>>>       && early_inline_small_functions (node))
>> >>>>   {
>> >>>>    timevar_push (TV_INTEGRATION);
>> >>>> Index: gcc/opts.c
>> >>>> ===================================================================
>> >>>> --- gcc/opts.c (revision 199416)
>> >>>> +++ gcc/opts.c (working copy)
>> >>>> @@ -1644,6 +1644,8 @@ common_handle_option (struct gcc_options *opts,
>> >>>>   opts->x_flag_peel_loops = value;
>> >>>>        if (!opts_set->x_flag_value_profile_transformations)
>> >>>>   opts->x_flag_value_profile_transformations = value;
>> >>>> +      if (!opts_set->x_flag_tracer)
>> >>>> + opts->x_flag_tracer = value;
>> >>>>        if (!opts_set->x_flag_inline_functions)
>> >>>>   opts->x_flag_inline_functions = value;
>> >>>>        if (!opts_set->x_flag_ipa_cp)

Patch

Index: gcc/predict.c
===================================================================
--- gcc/predict.c (revision 199593)
+++ gcc/predict.c (working copy)
@@ -175,6 +175,8 @@  cgraph_maybe_hot_edge_p (struct cgraph_edge *edge)
       && !maybe_hot_count_p (NULL,
                              edge->count))
     return false;
+  if (flag_auto_profile)
+    return false;
   if (edge->caller->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED
       || (edge->callee
   && edge->callee->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED))