diff mbox

[GOOGLE] Fix AutoFDO size issue

Message ID CAO2gOZUhpbGL02wQbTi9tp1SByj-NkDfhCB3r_QK-WOVCLcMiQ@mail.gmail.com
State New
Headers show

Commit Message

Dehao Chen Nov. 17, 2014, 8:47 p.m. UTC
The patch was updated to ignore comdat einline tuning for AutoFDO.
Performance testing is green.

OK for google-4_9?

Thanks,
Dehao


On Thu, Nov 13, 2014 at 3:42 PM, Dehao Chen <dehao@google.com> wrote:
> We do not do sophisticated recursive call detection in einline phase.
> It only happens in ipa-inline phase.
>
> Dehao
>
> On Thu, Nov 13, 2014 at 3:18 PM, Xinliang David Li <davidxl@google.com> wrote:
>> On Thu, Nov 13, 2014 at 2:57 PM, Dehao Chen <dehao@google.com> wrote:
>>> IIRC, AutoFDO the actual iteration for AutoFDO is mostly <3. But it
>>> should not harm to set max iter as 10.
>>>
>>> On Thu, Nov 13, 2014 at 2:51 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>> After inline summary is recomputed, the large code growth problem will
>>>> also be better controlled, right?
>>>
>>> For this case, recomputing inline summary does not help because the
>>> code was bloated in first einline phase.
>>
>> For recursive inlining, the inline summary for the cloned edges need
>> to be updated to prevent the growth?
>>
>> david
>>
>>>
>>> Dehao
>>>
>>>>
>>>> David
>>>>
>>>> On Thu, Nov 13, 2014 at 2:48 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>>> Is there a need to have 10 iterations of early inline for autofdo?
>>>>>
>>>>> David
>>>>>
>>>>> On Thu, Nov 13, 2014 at 2:25 PM, Dehao Chen <dehao@google.com> wrote:
>>>>>> In AutoFDO, we increase einline iterations. This could lead to
>>>>>> extensive code bloat if we have recursive calls like:
>>>>>>
>>>>>> dtor() {
>>>>>>   destroy(node);
>>>>>> }
>>>>>>
>>>>>> destroy(node) {
>>>>>>   destroy(left)
>>>>>>   destroy(right)
>>>>>> }
>>>>>>
>>>>>> In this case, the size growth will be around 8 which is smaller than
>>>>>> threshold (11). However, if we allow this to happen for 2 iterations,
>>>>>> it will expand the size by 1024X. To fix this problem, we want to set
>>>>>> a much smaller threshold in the AutoFDO case. This is because AutoFDO
>>>>>> do not not rely on aggressive einline to gain more profile context.
>>>>>>
>>>>>> And also, in AutoFDO pass, after we processed a function, we need to
>>>>>> recompute inline parameters because rebuild_cgraph_edges will zero out
>>>>>> all inline parameters.
>>>>>>
>>>>>> The patch is attached below, bootstrapped and perf test on-going. OK
>>>>>> for google-4_9?
>>>>>>
>>>>>> Thanks,
>>>>>> Dehao
>>>>>>
>>>>>> Index: gcc/auto-profile.c
>>>>>> ===================================================================
>>>>>> --- gcc/auto-profile.c (revision 217523)
>>>>>> +++ gcc/auto-profile.c (working copy)
>>>>>> @@ -1771,6 +1771,7 @@ auto_profile (void)
>>>>>>        free_dominance_info (CDI_DOMINATORS);
>>>>>>        free_dominance_info (CDI_POST_DOMINATORS);
>>>>>>        rebuild_cgraph_edges ();
>>>>>> +      compute_inline_parameters (cgraph_get_node
>>>>>> (current_function_decl), true);
>>>>>>        pop_cfun ();
>>>>>>      }
>>>>>>
>>>>>> Index: gcc/opts.c
>>>>>> ===================================================================
>>>>>> --- gcc/opts.c (revision 217523)
>>>>>> +++ gcc/opts.c (working copy)
>>>>>> @@ -1853,6 +1853,12 @@ common_handle_option (struct gcc_options *opts,
>>>>>>        maybe_set_param_value (
>>>>>>   PARAM_EARLY_INLINER_MAX_ITERATIONS, 10,
>>>>>>   opts->x_param_values, opts_set->x_param_values);
>>>>>> +      maybe_set_param_value (
>>>>>> +        PARAM_EARLY_INLINING_INSNS, 4,
>>>>>> +        opts->x_param_values, opts_set->x_param_values);
>>>>>> +      maybe_set_param_value (
>>>>>> +        PARAM_EARLY_INLINING_INSNS_COMDAT, 4,
>>>>>> +        opts->x_param_values, opts_set->x_param_values);
>>>>>>        value = true;
>>>>>>      /* No break here - do -fauto-profile processing. */
>>>>>>      case OPT_fauto_profile:

Comments

Xinliang David Li Nov. 17, 2014, 10:22 p.m. UTC | #1
Ok for now as a workraround, but this is probably not a long term fix.

David


On Mon, Nov 17, 2014 at 12:47 PM, Dehao Chen <dehao@google.com> wrote:
> The patch was updated to ignore comdat einline tuning for AutoFDO.
> Performance testing is green.
>
> OK for google-4_9?
>
> Thanks,
> Dehao
>
> Index: gcc/auto-profile.c
> ===================================================================
> --- gcc/auto-profile.c (revision 217523)
> +++ gcc/auto-profile.c (working copy)
> @@ -1771,6 +1771,7 @@ auto_profile (void)
>        free_dominance_info (CDI_DOMINATORS);
>        free_dominance_info (CDI_POST_DOMINATORS);
>        rebuild_cgraph_edges ();
> +      compute_inline_parameters (cgraph_get_node
> (current_function_decl), true);
>        pop_cfun ();
>      }
>
> Index: gcc/ipa-inline.c
> ===================================================================
> --- gcc/ipa-inline.c (revision 217523)
> +++ gcc/ipa-inline.c (working copy)
> @@ -501,7 +501,7 @@ want_early_inline_function_p (struct cgraph_edge *
>       growth);
>    want_inline = false;
>   }
> -      else if (DECL_COMDAT (callee->decl)
> +      else if (!flag_auto_profile && DECL_COMDAT (callee->decl)
>                 && growth <= PARAM_VALUE (PARAM_EARLY_INLINING_INSNS_COMDAT))
>          ;
>        else if ((n = num_calls (callee)) != 0
>
> On Thu, Nov 13, 2014 at 3:42 PM, Dehao Chen <dehao@google.com> wrote:
>> We do not do sophisticated recursive call detection in einline phase.
>> It only happens in ipa-inline phase.
>>
>> Dehao
>>
>> On Thu, Nov 13, 2014 at 3:18 PM, Xinliang David Li <davidxl@google.com> wrote:
>>> On Thu, Nov 13, 2014 at 2:57 PM, Dehao Chen <dehao@google.com> wrote:
>>>> IIRC, AutoFDO the actual iteration for AutoFDO is mostly <3. But it
>>>> should not harm to set max iter as 10.
>>>>
>>>> On Thu, Nov 13, 2014 at 2:51 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>>> After inline summary is recomputed, the large code growth problem will
>>>>> also be better controlled, right?
>>>>
>>>> For this case, recomputing inline summary does not help because the
>>>> code was bloated in first einline phase.
>>>
>>> For recursive inlining, the inline summary for the cloned edges need
>>> to be updated to prevent the growth?
>>>
>>> david
>>>
>>>>
>>>> Dehao
>>>>
>>>>>
>>>>> David
>>>>>
>>>>> On Thu, Nov 13, 2014 at 2:48 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>>>> Is there a need to have 10 iterations of early inline for autofdo?
>>>>>>
>>>>>> David
>>>>>>
>>>>>> On Thu, Nov 13, 2014 at 2:25 PM, Dehao Chen <dehao@google.com> wrote:
>>>>>>> In AutoFDO, we increase einline iterations. This could lead to
>>>>>>> extensive code bloat if we have recursive calls like:
>>>>>>>
>>>>>>> dtor() {
>>>>>>>   destroy(node);
>>>>>>> }
>>>>>>>
>>>>>>> destroy(node) {
>>>>>>>   destroy(left)
>>>>>>>   destroy(right)
>>>>>>> }
>>>>>>>
>>>>>>> In this case, the size growth will be around 8 which is smaller than
>>>>>>> threshold (11). However, if we allow this to happen for 2 iterations,
>>>>>>> it will expand the size by 1024X. To fix this problem, we want to set
>>>>>>> a much smaller threshold in the AutoFDO case. This is because AutoFDO
>>>>>>> do not not rely on aggressive einline to gain more profile context.
>>>>>>>
>>>>>>> And also, in AutoFDO pass, after we processed a function, we need to
>>>>>>> recompute inline parameters because rebuild_cgraph_edges will zero out
>>>>>>> all inline parameters.
>>>>>>>
>>>>>>> The patch is attached below, bootstrapped and perf test on-going. OK
>>>>>>> for google-4_9?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Dehao
>>>>>>>
>>>>>>> Index: gcc/auto-profile.c
>>>>>>> ===================================================================
>>>>>>> --- gcc/auto-profile.c (revision 217523)
>>>>>>> +++ gcc/auto-profile.c (working copy)
>>>>>>> @@ -1771,6 +1771,7 @@ auto_profile (void)
>>>>>>>        free_dominance_info (CDI_DOMINATORS);
>>>>>>>        free_dominance_info (CDI_POST_DOMINATORS);
>>>>>>>        rebuild_cgraph_edges ();
>>>>>>> +      compute_inline_parameters (cgraph_get_node
>>>>>>> (current_function_decl), true);
>>>>>>>        pop_cfun ();
>>>>>>>      }
>>>>>>>
>>>>>>> Index: gcc/opts.c
>>>>>>> ===================================================================
>>>>>>> --- gcc/opts.c (revision 217523)
>>>>>>> +++ gcc/opts.c (working copy)
>>>>>>> @@ -1853,6 +1853,12 @@ common_handle_option (struct gcc_options *opts,
>>>>>>>        maybe_set_param_value (
>>>>>>>   PARAM_EARLY_INLINER_MAX_ITERATIONS, 10,
>>>>>>>   opts->x_param_values, opts_set->x_param_values);
>>>>>>> +      maybe_set_param_value (
>>>>>>> +        PARAM_EARLY_INLINING_INSNS, 4,
>>>>>>> +        opts->x_param_values, opts_set->x_param_values);
>>>>>>> +      maybe_set_param_value (
>>>>>>> +        PARAM_EARLY_INLINING_INSNS_COMDAT, 4,
>>>>>>> +        opts->x_param_values, opts_set->x_param_values);
>>>>>>>        value = true;
>>>>>>>      /* No break here - do -fauto-profile processing. */
>>>>>>>      case OPT_fauto_profile:
Andi Kleen Nov. 18, 2014, 12:58 a.m. UTC | #2
Xinliang David Li <davidxl@google.com> writes:

> Ok for now as a workraround, but this is probably not a long term fix.

Is the workaround needed for the mainline autofdo version too?

-Andi
>
> David
>
>
> On Mon, Nov 17, 2014 at 12:47 PM, Dehao Chen <dehao@google.com> wrote:
>> The patch was updated to ignore comdat einline tuning for AutoFDO.
>> Performance testing is green.
Dehao Chen Nov. 18, 2014, 1:17 a.m. UTC | #3
There are actually two patches needed to port to mainline. I'll send
out the patch tomorrow.

Dehao

On Mon, Nov 17, 2014 at 4:58 PM, Andi Kleen <andi@firstfloor.org> wrote:
> Xinliang David Li <davidxl@google.com> writes:
>
>> Ok for now as a workraround, but this is probably not a long term fix.
>
> Is the workaround needed for the mainline autofdo version too?
>
> -Andi
>>
>> David
>>
>>
>> On Mon, Nov 17, 2014 at 12:47 PM, Dehao Chen <dehao@google.com> wrote:
>>> The patch was updated to ignore comdat einline tuning for AutoFDO.
>>> Performance testing is green.
diff mbox

Patch

Index: gcc/auto-profile.c
===================================================================
--- gcc/auto-profile.c (revision 217523)
+++ gcc/auto-profile.c (working copy)
@@ -1771,6 +1771,7 @@  auto_profile (void)
       free_dominance_info (CDI_DOMINATORS);
       free_dominance_info (CDI_POST_DOMINATORS);
       rebuild_cgraph_edges ();
+      compute_inline_parameters (cgraph_get_node
(current_function_decl), true);
       pop_cfun ();
     }

Index: gcc/ipa-inline.c
===================================================================
--- gcc/ipa-inline.c (revision 217523)
+++ gcc/ipa-inline.c (working copy)
@@ -501,7 +501,7 @@  want_early_inline_function_p (struct cgraph_edge *
      growth);
   want_inline = false;
  }
-      else if (DECL_COMDAT (callee->decl)
+      else if (!flag_auto_profile && DECL_COMDAT (callee->decl)
                && growth <= PARAM_VALUE (PARAM_EARLY_INLINING_INSNS_COMDAT))
         ;
       else if ((n = num_calls (callee)) != 0