diff mbox

[GOOGLE] Fix AutoFDO size issue

Message ID CAO2gOZV8=MEHb_Vz6+ZNrUGe0dsyfG3F1hYd+mr2gyFHSCeG1Q@mail.gmail.com
State New
Headers show

Commit Message

Dehao Chen Nov. 13, 2014, 10:25 p.m. UTC
In AutoFDO, we increase einline iterations. This could lead to
extensive code bloat if we have recursive calls like:

dtor() {
  destroy(node);
}

destroy(node) {
  destroy(left)
  destroy(right)
}

In this case, the size growth will be around 8 which is smaller than
threshold (11). However, if we allow this to happen for 2 iterations,
it will expand the size by 1024X. To fix this problem, we want to set
a much smaller threshold in the AutoFDO case. This is because AutoFDO
do not not rely on aggressive einline to gain more profile context.

And also, in AutoFDO pass, after we processed a function, we need to
recompute inline parameters because rebuild_cgraph_edges will zero out
all inline parameters.

The patch is attached below, bootstrapped and perf test on-going. OK
for google-4_9?

Thanks,
Dehao

Comments

Xinliang David Li Nov. 13, 2014, 10:48 p.m. UTC | #1
Is there a need to have 10 iterations of early inline for autofdo?

David

On Thu, Nov 13, 2014 at 2:25 PM, Dehao Chen <dehao@google.com> wrote:
> In AutoFDO, we increase einline iterations. This could lead to
> extensive code bloat if we have recursive calls like:
>
> dtor() {
>   destroy(node);
> }
>
> destroy(node) {
>   destroy(left)
>   destroy(right)
> }
>
> In this case, the size growth will be around 8 which is smaller than
> threshold (11). However, if we allow this to happen for 2 iterations,
> it will expand the size by 1024X. To fix this problem, we want to set
> a much smaller threshold in the AutoFDO case. This is because AutoFDO
> do not not rely on aggressive einline to gain more profile context.
>
> And also, in AutoFDO pass, after we processed a function, we need to
> recompute inline parameters because rebuild_cgraph_edges will zero out
> all inline parameters.
>
> The patch is attached below, bootstrapped and perf test on-going. OK
> for google-4_9?
>
> Thanks,
> Dehao
>
> Index: gcc/auto-profile.c
> ===================================================================
> --- gcc/auto-profile.c (revision 217523)
> +++ gcc/auto-profile.c (working copy)
> @@ -1771,6 +1771,7 @@ auto_profile (void)
>        free_dominance_info (CDI_DOMINATORS);
>        free_dominance_info (CDI_POST_DOMINATORS);
>        rebuild_cgraph_edges ();
> +      compute_inline_parameters (cgraph_get_node
> (current_function_decl), true);
>        pop_cfun ();
>      }
>
> Index: gcc/opts.c
> ===================================================================
> --- gcc/opts.c (revision 217523)
> +++ gcc/opts.c (working copy)
> @@ -1853,6 +1853,12 @@ common_handle_option (struct gcc_options *opts,
>        maybe_set_param_value (
>   PARAM_EARLY_INLINER_MAX_ITERATIONS, 10,
>   opts->x_param_values, opts_set->x_param_values);
> +      maybe_set_param_value (
> +        PARAM_EARLY_INLINING_INSNS, 4,
> +        opts->x_param_values, opts_set->x_param_values);
> +      maybe_set_param_value (
> +        PARAM_EARLY_INLINING_INSNS_COMDAT, 4,
> +        opts->x_param_values, opts_set->x_param_values);
>        value = true;
>      /* No break here - do -fauto-profile processing. */
>      case OPT_fauto_profile:
Xinliang David Li Nov. 13, 2014, 10:51 p.m. UTC | #2
After inline summary is recomputed, the large code growth problem will
also be better controlled, right?

David

On Thu, Nov 13, 2014 at 2:48 PM, Xinliang David Li <davidxl@google.com> wrote:
> Is there a need to have 10 iterations of early inline for autofdo?
>
> David
>
> On Thu, Nov 13, 2014 at 2:25 PM, Dehao Chen <dehao@google.com> wrote:
>> In AutoFDO, we increase einline iterations. This could lead to
>> extensive code bloat if we have recursive calls like:
>>
>> dtor() {
>>   destroy(node);
>> }
>>
>> destroy(node) {
>>   destroy(left)
>>   destroy(right)
>> }
>>
>> In this case, the size growth will be around 8 which is smaller than
>> threshold (11). However, if we allow this to happen for 2 iterations,
>> it will expand the size by 1024X. To fix this problem, we want to set
>> a much smaller threshold in the AutoFDO case. This is because AutoFDO
>> do not not rely on aggressive einline to gain more profile context.
>>
>> And also, in AutoFDO pass, after we processed a function, we need to
>> recompute inline parameters because rebuild_cgraph_edges will zero out
>> all inline parameters.
>>
>> The patch is attached below, bootstrapped and perf test on-going. OK
>> for google-4_9?
>>
>> Thanks,
>> Dehao
>>
>> Index: gcc/auto-profile.c
>> ===================================================================
>> --- gcc/auto-profile.c (revision 217523)
>> +++ gcc/auto-profile.c (working copy)
>> @@ -1771,6 +1771,7 @@ auto_profile (void)
>>        free_dominance_info (CDI_DOMINATORS);
>>        free_dominance_info (CDI_POST_DOMINATORS);
>>        rebuild_cgraph_edges ();
>> +      compute_inline_parameters (cgraph_get_node
>> (current_function_decl), true);
>>        pop_cfun ();
>>      }
>>
>> Index: gcc/opts.c
>> ===================================================================
>> --- gcc/opts.c (revision 217523)
>> +++ gcc/opts.c (working copy)
>> @@ -1853,6 +1853,12 @@ common_handle_option (struct gcc_options *opts,
>>        maybe_set_param_value (
>>   PARAM_EARLY_INLINER_MAX_ITERATIONS, 10,
>>   opts->x_param_values, opts_set->x_param_values);
>> +      maybe_set_param_value (
>> +        PARAM_EARLY_INLINING_INSNS, 4,
>> +        opts->x_param_values, opts_set->x_param_values);
>> +      maybe_set_param_value (
>> +        PARAM_EARLY_INLINING_INSNS_COMDAT, 4,
>> +        opts->x_param_values, opts_set->x_param_values);
>>        value = true;
>>      /* No break here - do -fauto-profile processing. */
>>      case OPT_fauto_profile:
Dehao Chen Nov. 13, 2014, 10:57 p.m. UTC | #3
IIRC, AutoFDO the actual iteration for AutoFDO is mostly <3. But it
should not harm to set max iter as 10.

On Thu, Nov 13, 2014 at 2:51 PM, Xinliang David Li <davidxl@google.com> wrote:
> After inline summary is recomputed, the large code growth problem will
> also be better controlled, right?

For this case, recomputing inline summary does not help because the
code was bloated in first einline phase.

Dehao

>
> David
>
> On Thu, Nov 13, 2014 at 2:48 PM, Xinliang David Li <davidxl@google.com> wrote:
>> Is there a need to have 10 iterations of early inline for autofdo?
>>
>> David
>>
>> On Thu, Nov 13, 2014 at 2:25 PM, Dehao Chen <dehao@google.com> wrote:
>>> In AutoFDO, we increase einline iterations. This could lead to
>>> extensive code bloat if we have recursive calls like:
>>>
>>> dtor() {
>>>   destroy(node);
>>> }
>>>
>>> destroy(node) {
>>>   destroy(left)
>>>   destroy(right)
>>> }
>>>
>>> In this case, the size growth will be around 8 which is smaller than
>>> threshold (11). However, if we allow this to happen for 2 iterations,
>>> it will expand the size by 1024X. To fix this problem, we want to set
>>> a much smaller threshold in the AutoFDO case. This is because AutoFDO
>>> do not not rely on aggressive einline to gain more profile context.
>>>
>>> And also, in AutoFDO pass, after we processed a function, we need to
>>> recompute inline parameters because rebuild_cgraph_edges will zero out
>>> all inline parameters.
>>>
>>> The patch is attached below, bootstrapped and perf test on-going. OK
>>> for google-4_9?
>>>
>>> Thanks,
>>> Dehao
>>>
>>> Index: gcc/auto-profile.c
>>> ===================================================================
>>> --- gcc/auto-profile.c (revision 217523)
>>> +++ gcc/auto-profile.c (working copy)
>>> @@ -1771,6 +1771,7 @@ auto_profile (void)
>>>        free_dominance_info (CDI_DOMINATORS);
>>>        free_dominance_info (CDI_POST_DOMINATORS);
>>>        rebuild_cgraph_edges ();
>>> +      compute_inline_parameters (cgraph_get_node
>>> (current_function_decl), true);
>>>        pop_cfun ();
>>>      }
>>>
>>> Index: gcc/opts.c
>>> ===================================================================
>>> --- gcc/opts.c (revision 217523)
>>> +++ gcc/opts.c (working copy)
>>> @@ -1853,6 +1853,12 @@ common_handle_option (struct gcc_options *opts,
>>>        maybe_set_param_value (
>>>   PARAM_EARLY_INLINER_MAX_ITERATIONS, 10,
>>>   opts->x_param_values, opts_set->x_param_values);
>>> +      maybe_set_param_value (
>>> +        PARAM_EARLY_INLINING_INSNS, 4,
>>> +        opts->x_param_values, opts_set->x_param_values);
>>> +      maybe_set_param_value (
>>> +        PARAM_EARLY_INLINING_INSNS_COMDAT, 4,
>>> +        opts->x_param_values, opts_set->x_param_values);
>>>        value = true;
>>>      /* No break here - do -fauto-profile processing. */
>>>      case OPT_fauto_profile:
Xinliang David Li Nov. 13, 2014, 11:18 p.m. UTC | #4
On Thu, Nov 13, 2014 at 2:57 PM, Dehao Chen <dehao@google.com> wrote:
> IIRC, AutoFDO the actual iteration for AutoFDO is mostly <3. But it
> should not harm to set max iter as 10.
>
> On Thu, Nov 13, 2014 at 2:51 PM, Xinliang David Li <davidxl@google.com> wrote:
>> After inline summary is recomputed, the large code growth problem will
>> also be better controlled, right?
>
> For this case, recomputing inline summary does not help because the
> code was bloated in first einline phase.

For recursive inlining, the inline summary for the cloned edges need
to be updated to prevent the growth?

david

>
> Dehao
>
>>
>> David
>>
>> On Thu, Nov 13, 2014 at 2:48 PM, Xinliang David Li <davidxl@google.com> wrote:
>>> Is there a need to have 10 iterations of early inline for autofdo?
>>>
>>> David
>>>
>>> On Thu, Nov 13, 2014 at 2:25 PM, Dehao Chen <dehao@google.com> wrote:
>>>> In AutoFDO, we increase einline iterations. This could lead to
>>>> extensive code bloat if we have recursive calls like:
>>>>
>>>> dtor() {
>>>>   destroy(node);
>>>> }
>>>>
>>>> destroy(node) {
>>>>   destroy(left)
>>>>   destroy(right)
>>>> }
>>>>
>>>> In this case, the size growth will be around 8 which is smaller than
>>>> threshold (11). However, if we allow this to happen for 2 iterations,
>>>> it will expand the size by 1024X. To fix this problem, we want to set
>>>> a much smaller threshold in the AutoFDO case. This is because AutoFDO
>>>> do not not rely on aggressive einline to gain more profile context.
>>>>
>>>> And also, in AutoFDO pass, after we processed a function, we need to
>>>> recompute inline parameters because rebuild_cgraph_edges will zero out
>>>> all inline parameters.
>>>>
>>>> The patch is attached below, bootstrapped and perf test on-going. OK
>>>> for google-4_9?
>>>>
>>>> Thanks,
>>>> Dehao
>>>>
>>>> Index: gcc/auto-profile.c
>>>> ===================================================================
>>>> --- gcc/auto-profile.c (revision 217523)
>>>> +++ gcc/auto-profile.c (working copy)
>>>> @@ -1771,6 +1771,7 @@ auto_profile (void)
>>>>        free_dominance_info (CDI_DOMINATORS);
>>>>        free_dominance_info (CDI_POST_DOMINATORS);
>>>>        rebuild_cgraph_edges ();
>>>> +      compute_inline_parameters (cgraph_get_node
>>>> (current_function_decl), true);
>>>>        pop_cfun ();
>>>>      }
>>>>
>>>> Index: gcc/opts.c
>>>> ===================================================================
>>>> --- gcc/opts.c (revision 217523)
>>>> +++ gcc/opts.c (working copy)
>>>> @@ -1853,6 +1853,12 @@ common_handle_option (struct gcc_options *opts,
>>>>        maybe_set_param_value (
>>>>   PARAM_EARLY_INLINER_MAX_ITERATIONS, 10,
>>>>   opts->x_param_values, opts_set->x_param_values);
>>>> +      maybe_set_param_value (
>>>> +        PARAM_EARLY_INLINING_INSNS, 4,
>>>> +        opts->x_param_values, opts_set->x_param_values);
>>>> +      maybe_set_param_value (
>>>> +        PARAM_EARLY_INLINING_INSNS_COMDAT, 4,
>>>> +        opts->x_param_values, opts_set->x_param_values);
>>>>        value = true;
>>>>      /* No break here - do -fauto-profile processing. */
>>>>      case OPT_fauto_profile:
Dehao Chen Nov. 13, 2014, 11:42 p.m. UTC | #5
We do not do sophisticated recursive call detection in einline phase.
It only happens in ipa-inline phase.

Dehao

On Thu, Nov 13, 2014 at 3:18 PM, Xinliang David Li <davidxl@google.com> wrote:
> On Thu, Nov 13, 2014 at 2:57 PM, Dehao Chen <dehao@google.com> wrote:
>> IIRC, AutoFDO the actual iteration for AutoFDO is mostly <3. But it
>> should not harm to set max iter as 10.
>>
>> On Thu, Nov 13, 2014 at 2:51 PM, Xinliang David Li <davidxl@google.com> wrote:
>>> After inline summary is recomputed, the large code growth problem will
>>> also be better controlled, right?
>>
>> For this case, recomputing inline summary does not help because the
>> code was bloated in first einline phase.
>
> For recursive inlining, the inline summary for the cloned edges need
> to be updated to prevent the growth?
>
> david
>
>>
>> Dehao
>>
>>>
>>> David
>>>
>>> On Thu, Nov 13, 2014 at 2:48 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>> Is there a need to have 10 iterations of early inline for autofdo?
>>>>
>>>> David
>>>>
>>>> On Thu, Nov 13, 2014 at 2:25 PM, Dehao Chen <dehao@google.com> wrote:
>>>>> In AutoFDO, we increase einline iterations. This could lead to
>>>>> extensive code bloat if we have recursive calls like:
>>>>>
>>>>> dtor() {
>>>>>   destroy(node);
>>>>> }
>>>>>
>>>>> destroy(node) {
>>>>>   destroy(left)
>>>>>   destroy(right)
>>>>> }
>>>>>
>>>>> In this case, the size growth will be around 8 which is smaller than
>>>>> threshold (11). However, if we allow this to happen for 2 iterations,
>>>>> it will expand the size by 1024X. To fix this problem, we want to set
>>>>> a much smaller threshold in the AutoFDO case. This is because AutoFDO
>>>>> do not not rely on aggressive einline to gain more profile context.
>>>>>
>>>>> And also, in AutoFDO pass, after we processed a function, we need to
>>>>> recompute inline parameters because rebuild_cgraph_edges will zero out
>>>>> all inline parameters.
>>>>>
>>>>> The patch is attached below, bootstrapped and perf test on-going. OK
>>>>> for google-4_9?
>>>>>
>>>>> Thanks,
>>>>> Dehao
>>>>>
>>>>> Index: gcc/auto-profile.c
>>>>> ===================================================================
>>>>> --- gcc/auto-profile.c (revision 217523)
>>>>> +++ gcc/auto-profile.c (working copy)
>>>>> @@ -1771,6 +1771,7 @@ auto_profile (void)
>>>>>        free_dominance_info (CDI_DOMINATORS);
>>>>>        free_dominance_info (CDI_POST_DOMINATORS);
>>>>>        rebuild_cgraph_edges ();
>>>>> +      compute_inline_parameters (cgraph_get_node
>>>>> (current_function_decl), true);
>>>>>        pop_cfun ();
>>>>>      }
>>>>>
>>>>> Index: gcc/opts.c
>>>>> ===================================================================
>>>>> --- gcc/opts.c (revision 217523)
>>>>> +++ gcc/opts.c (working copy)
>>>>> @@ -1853,6 +1853,12 @@ common_handle_option (struct gcc_options *opts,
>>>>>        maybe_set_param_value (
>>>>>   PARAM_EARLY_INLINER_MAX_ITERATIONS, 10,
>>>>>   opts->x_param_values, opts_set->x_param_values);
>>>>> +      maybe_set_param_value (
>>>>> +        PARAM_EARLY_INLINING_INSNS, 4,
>>>>> +        opts->x_param_values, opts_set->x_param_values);
>>>>> +      maybe_set_param_value (
>>>>> +        PARAM_EARLY_INLINING_INSNS_COMDAT, 4,
>>>>> +        opts->x_param_values, opts_set->x_param_values);
>>>>>        value = true;
>>>>>      /* No break here - do -fauto-profile processing. */
>>>>>      case OPT_fauto_profile:
diff mbox

Patch

Index: gcc/auto-profile.c
===================================================================
--- gcc/auto-profile.c (revision 217523)
+++ gcc/auto-profile.c (working copy)
@@ -1771,6 +1771,7 @@  auto_profile (void)
       free_dominance_info (CDI_DOMINATORS);
       free_dominance_info (CDI_POST_DOMINATORS);
       rebuild_cgraph_edges ();
+      compute_inline_parameters (cgraph_get_node
(current_function_decl), true);
       pop_cfun ();
     }

Index: gcc/opts.c
===================================================================
--- gcc/opts.c (revision 217523)
+++ gcc/opts.c (working copy)
@@ -1853,6 +1853,12 @@  common_handle_option (struct gcc_options *opts,
       maybe_set_param_value (
  PARAM_EARLY_INLINER_MAX_ITERATIONS, 10,
  opts->x_param_values, opts_set->x_param_values);
+      maybe_set_param_value (
+        PARAM_EARLY_INLINING_INSNS, 4,
+        opts->x_param_values, opts_set->x_param_values);
+      maybe_set_param_value (
+        PARAM_EARLY_INLINING_INSNS_COMDAT, 4,
+        opts->x_param_values, opts_set->x_param_values);
       value = true;
     /* No break here - do -fauto-profile processing. */
     case OPT_fauto_profile: