diff mbox

decide edge's hotness when there is profile info

Message ID CAAe5K+VJ70+Sj_J-uwcr+W6THPZXe5cy-hwtBCpJXms2=nPimg@mail.gmail.com
State New
Headers show

Commit Message

Teresa Johnson Oct. 15, 2013, 2:59 p.m. UTC
On Mon, Oct 14, 2013 at 3:26 PM, Dehao Chen <dehao@google.com> wrote:
> On Mon, Oct 14, 2013 at 2:50 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>> For my test case, the entire inline instance is optimized away, so
>>> there is no info about it in the profile. I can do some fixup in the
>>> rebuild_cgraph_edge though.
>>
>> Yep, I understand that.  In this case we should turn PROFILE_READ to PROFILE_GUESSED
>> and guess the profile when we detect this (i.e. we have edges with non-0 counts into
>> functions with 0 profile).  That should prvent these from getting UNLIKELY_EXECUTED
>> and they will be inlined normal way.
>
> Oh, actually in AutoFDO, only functions with samples will be marked as
> PROFILE_READ. Others will all be marked as PROFILE_GUESSED.

Here is Honza's patch that he was referring to:


Which is discussed in this email:
http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01185.html

For COMDATs I need to extend this to do it a little later to do it
recursively to catch the case of COMDATs feeding other COMDATs and I
need to do some other handling to compute counts from the frequencies
when inlining. I have been meaning to work on this for awhile but
finally am getting to it this week. (Here's the last message from a
later thread that forked off the above one:
http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01907.html)

In the meantime, perhaps Honza's patch will suffice?

Teresa

>
> Dehao
>
>>
>> Honza
>>>
>>> Dehao
>>>
>>> On Mon, Oct 14, 2013 at 2:27 PM, Xinliang David Li <davidxl@google.com> wrote:
>>> > Is it possible to update the callee node summary after profile
>>> > annotate (using information from inline instances which are not
>>> > inlined in early inline)?
>>> >
>>> > David
>>> >
>>> > On Mon, Oct 14, 2013 at 2:18 PM, Dehao Chen <dehao@google.com> wrote:
>>> >> On Mon, Oct 14, 2013 at 12:49 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>> >>>> Not for instrumented FDO (not as I know of). But for AutoFDO, this
>>> >>>> could be a potential risk because some callee is marked unlikely
>>> >>>> executed simply because they are inlined and eliminated in the O2
>>> >>>> binary. But in ipa-inline it will not get inlined because the edge is
>>> >>>> not hot from cgraph_maybe_hot_edge_p (because callee is
>>> >>>> UNLIKELY_EXECUTED), while the edge->count is actually hot.
>>> >>>
>>> >>> Can't you prevent setting calle to UNLIKELY_EXECUTED in these cases instead?
>>> >>> It seems that having profile set incorrectly will lead to other problems later, too.
>>> >>> We discussed similar problem with Teresa about the missing profiles for comdat,
>>> >>> basically one should detect these cases as profile being lost and go with guessed
>>> >>> profile.  (I believe patch for that was posted, too, and so far it seems best approach
>>> >>> to this issue)
>>> >>
>>> >> The current AutoFDO implementation will take all functions that do not
>>> >> have have profile as normally executed, thus use guessed profile for
>>> >> it. This is like using profile for truly hot functions, and using O2
>>> >> for other functions. This works fine. However, it leads to larger code
>>> >> size (approximately 10%~20% larger than FDO).
>>> >>
>>> >> I'd like to introduce another mode for users who care about both
>>> >> performance and code size, and can be sure that profile is
>>> >> representative. In this mode, we will mark all functions without
>>> >> sample as "unlikely executed". However, because AutoFDO use debug info
>>> >> (of optimized code) to represent profile, it's possible that some hot
>>> >> functions (say foo) are inlined and fully eliminated into another hot
>>> >> function (say bar). So in the profile, bar is cold, and because the
>>> >> profile for foo::bar is eliminated, bar will not be inlined into foo
>>> >> before the profile annotation. However, after profile annotate, we can
>>> >> infer from the bb count that foo->bar is hot, thus it should be
>>> >> inlined in ipa-inline phase. However, because bar itself is marked
>>> >> UNLIKELY_EXECUTED, it will not be inlined.
>>> >>
>>> >> One possible workaround would be that during rebuild_cgraph_edges, if
>>> >> we find an edge's callee is unlikely executed, add the edge count to
>>> >> the callee's count and recalculate callee's frequency.
>>> >>
>>> >> Dehao
>>> >>
>>> >>>
>>> >>> Honza

Comments

Dehao Chen Oct. 15, 2013, 3:40 p.m. UTC | #1
Thanks for the pointer to Honza's patch. The patch does exactly what I
need. But it only resides in the instrumentation based FDO path. Can
we move the code to more common place (like rebuild_cgraph_edges)?

Thanks,
Dehao

On Tue, Oct 15, 2013 at 7:59 AM, Teresa Johnson <tejohnson@google.com> wrote:
> On Mon, Oct 14, 2013 at 3:26 PM, Dehao Chen <dehao@google.com> wrote:
>> On Mon, Oct 14, 2013 at 2:50 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>>> For my test case, the entire inline instance is optimized away, so
>>>> there is no info about it in the profile. I can do some fixup in the
>>>> rebuild_cgraph_edge though.
>>>
>>> Yep, I understand that.  In this case we should turn PROFILE_READ to PROFILE_GUESSED
>>> and guess the profile when we detect this (i.e. we have edges with non-0 counts into
>>> functions with 0 profile).  That should prvent these from getting UNLIKELY_EXECUTED
>>> and they will be inlined normal way.
>>
>> Oh, actually in AutoFDO, only functions with samples will be marked as
>> PROFILE_READ. Others will all be marked as PROFILE_GUESSED.
>
> Here is Honza's patch that he was referring to:
>
> Index: tree-profile.c
> ===================================================================
> --- tree-profile.c (revision 201838)
> +++ tree-profile.c (working copy)
> @@ -604,6 +604,34 @@
>
>        pop_cfun ();
>      }
> +  /* See if 0 count function has non-0 count callers.  In this case we
> +     lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
> +  FOR_EACH_DEFINED_FUNCTION (node)
> +    {
> +      struct cgraph_edge *e;
> +      bool called = false;
> +      if (node->count)
> + continue;
> +      for (e = node->callers; e; e = e->next_caller)
> + {
> +  if (e->count)
> +    called = true;
> +  if (cgraph_maybe_hot_edge_p (e))
> +    break;
> + }
> +      if (e || called
> +  && profile_status_for_function
> +      (DECL_STRUCT_FUNCTION (node->symbol.decl)) == PROFILE_READ)
> + {
> +  if (dump_file)
> +    fprintf (stderr, "Dropping 0 profile for %s/%i.%s based on calls.\n",
> +     cgraph_node_name (node), node->symbol.order,
> +     e ? "function is hot" : "function is normal");
> +  profile_status_for_function (DECL_STRUCT_FUNCTION (node->symbol.decl))
> +    = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
> +  node->frequency = e ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;
> + }
> +    }
>
>    del_node_map();
>    return 0;
> Index: predict.c
> ===================================================================
> --- predict.c (revision 201838)
> +++ predict.c (working copy)
> @@ -2715,6 +2715,9 @@
>    gcov_type count_max, true_count_max = 0;
>    basic_block bb;
>
> +  if (!ENTRY_BLOCK_PTR->count)
> +    return 0;
> +
>    FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb)
>      true_count_max = MAX (bb->count, true_count_max);
>
>
> Which is discussed in this email:
> http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01185.html
>
> For COMDATs I need to extend this to do it a little later to do it
> recursively to catch the case of COMDATs feeding other COMDATs and I
> need to do some other handling to compute counts from the frequencies
> when inlining. I have been meaning to work on this for awhile but
> finally am getting to it this week. (Here's the last message from a
> later thread that forked off the above one:
> http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01907.html)
>
> In the meantime, perhaps Honza's patch will suffice?
>
> Teresa
>
>>
>> Dehao
>>
>>>
>>> Honza
>>>>
>>>> Dehao
>>>>
>>>> On Mon, Oct 14, 2013 at 2:27 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>> > Is it possible to update the callee node summary after profile
>>>> > annotate (using information from inline instances which are not
>>>> > inlined in early inline)?
>>>> >
>>>> > David
>>>> >
>>>> > On Mon, Oct 14, 2013 at 2:18 PM, Dehao Chen <dehao@google.com> wrote:
>>>> >> On Mon, Oct 14, 2013 at 12:49 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>>> >>>> Not for instrumented FDO (not as I know of). But for AutoFDO, this
>>>> >>>> could be a potential risk because some callee is marked unlikely
>>>> >>>> executed simply because they are inlined and eliminated in the O2
>>>> >>>> binary. But in ipa-inline it will not get inlined because the edge is
>>>> >>>> not hot from cgraph_maybe_hot_edge_p (because callee is
>>>> >>>> UNLIKELY_EXECUTED), while the edge->count is actually hot.
>>>> >>>
>>>> >>> Can't you prevent setting calle to UNLIKELY_EXECUTED in these cases instead?
>>>> >>> It seems that having profile set incorrectly will lead to other problems later, too.
>>>> >>> We discussed similar problem with Teresa about the missing profiles for comdat,
>>>> >>> basically one should detect these cases as profile being lost and go with guessed
>>>> >>> profile.  (I believe patch for that was posted, too, and so far it seems best approach
>>>> >>> to this issue)
>>>> >>
>>>> >> The current AutoFDO implementation will take all functions that do not
>>>> >> have have profile as normally executed, thus use guessed profile for
>>>> >> it. This is like using profile for truly hot functions, and using O2
>>>> >> for other functions. This works fine. However, it leads to larger code
>>>> >> size (approximately 10%~20% larger than FDO).
>>>> >>
>>>> >> I'd like to introduce another mode for users who care about both
>>>> >> performance and code size, and can be sure that profile is
>>>> >> representative. In this mode, we will mark all functions without
>>>> >> sample as "unlikely executed". However, because AutoFDO use debug info
>>>> >> (of optimized code) to represent profile, it's possible that some hot
>>>> >> functions (say foo) are inlined and fully eliminated into another hot
>>>> >> function (say bar). So in the profile, bar is cold, and because the
>>>> >> profile for foo::bar is eliminated, bar will not be inlined into foo
>>>> >> before the profile annotation. However, after profile annotate, we can
>>>> >> infer from the bb count that foo->bar is hot, thus it should be
>>>> >> inlined in ipa-inline phase. However, because bar itself is marked
>>>> >> UNLIKELY_EXECUTED, it will not be inlined.
>>>> >>
>>>> >> One possible workaround would be that during rebuild_cgraph_edges, if
>>>> >> we find an edge's callee is unlikely executed, add the edge count to
>>>> >> the callee's count and recalculate callee's frequency.
>>>> >>
>>>> >> Dehao
>>>> >>
>>>> >>>
>>>> >>> Honza
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
Teresa Johnson Oct. 15, 2013, 3:42 p.m. UTC | #2
I'm planning to move it to ipa_profile (pass ipa-profile_estimate) and
doing it iteratively. Would that location work?

Teresa

On Tue, Oct 15, 2013 at 8:40 AM, Dehao Chen <dehao@google.com> wrote:
> Thanks for the pointer to Honza's patch. The patch does exactly what I
> need. But it only resides in the instrumentation based FDO path. Can
> we move the code to more common place (like rebuild_cgraph_edges)?
>
> Thanks,
> Dehao
>
> On Tue, Oct 15, 2013 at 7:59 AM, Teresa Johnson <tejohnson@google.com> wrote:
>> On Mon, Oct 14, 2013 at 3:26 PM, Dehao Chen <dehao@google.com> wrote:
>>> On Mon, Oct 14, 2013 at 2:50 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>>>> For my test case, the entire inline instance is optimized away, so
>>>>> there is no info about it in the profile. I can do some fixup in the
>>>>> rebuild_cgraph_edge though.
>>>>
>>>> Yep, I understand that.  In this case we should turn PROFILE_READ to PROFILE_GUESSED
>>>> and guess the profile when we detect this (i.e. we have edges with non-0 counts into
>>>> functions with 0 profile).  That should prvent these from getting UNLIKELY_EXECUTED
>>>> and they will be inlined normal way.
>>>
>>> Oh, actually in AutoFDO, only functions with samples will be marked as
>>> PROFILE_READ. Others will all be marked as PROFILE_GUESSED.
>>
>> Here is Honza's patch that he was referring to:
>>
>> Index: tree-profile.c
>> ===================================================================
>> --- tree-profile.c (revision 201838)
>> +++ tree-profile.c (working copy)
>> @@ -604,6 +604,34 @@
>>
>>        pop_cfun ();
>>      }
>> +  /* See if 0 count function has non-0 count callers.  In this case we
>> +     lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
>> +  FOR_EACH_DEFINED_FUNCTION (node)
>> +    {
>> +      struct cgraph_edge *e;
>> +      bool called = false;
>> +      if (node->count)
>> + continue;
>> +      for (e = node->callers; e; e = e->next_caller)
>> + {
>> +  if (e->count)
>> +    called = true;
>> +  if (cgraph_maybe_hot_edge_p (e))
>> +    break;
>> + }
>> +      if (e || called
>> +  && profile_status_for_function
>> +      (DECL_STRUCT_FUNCTION (node->symbol.decl)) == PROFILE_READ)
>> + {
>> +  if (dump_file)
>> +    fprintf (stderr, "Dropping 0 profile for %s/%i.%s based on calls.\n",
>> +     cgraph_node_name (node), node->symbol.order,
>> +     e ? "function is hot" : "function is normal");
>> +  profile_status_for_function (DECL_STRUCT_FUNCTION (node->symbol.decl))
>> +    = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
>> +  node->frequency = e ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;
>> + }
>> +    }
>>
>>    del_node_map();
>>    return 0;
>> Index: predict.c
>> ===================================================================
>> --- predict.c (revision 201838)
>> +++ predict.c (working copy)
>> @@ -2715,6 +2715,9 @@
>>    gcov_type count_max, true_count_max = 0;
>>    basic_block bb;
>>
>> +  if (!ENTRY_BLOCK_PTR->count)
>> +    return 0;
>> +
>>    FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb)
>>      true_count_max = MAX (bb->count, true_count_max);
>>
>>
>> Which is discussed in this email:
>> http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01185.html
>>
>> For COMDATs I need to extend this to do it a little later to do it
>> recursively to catch the case of COMDATs feeding other COMDATs and I
>> need to do some other handling to compute counts from the frequencies
>> when inlining. I have been meaning to work on this for awhile but
>> finally am getting to it this week. (Here's the last message from a
>> later thread that forked off the above one:
>> http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01907.html)
>>
>> In the meantime, perhaps Honza's patch will suffice?
>>
>> Teresa
>>
>>>
>>> Dehao
>>>
>>>>
>>>> Honza
>>>>>
>>>>> Dehao
>>>>>
>>>>> On Mon, Oct 14, 2013 at 2:27 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>>> > Is it possible to update the callee node summary after profile
>>>>> > annotate (using information from inline instances which are not
>>>>> > inlined in early inline)?
>>>>> >
>>>>> > David
>>>>> >
>>>>> > On Mon, Oct 14, 2013 at 2:18 PM, Dehao Chen <dehao@google.com> wrote:
>>>>> >> On Mon, Oct 14, 2013 at 12:49 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>>>> >>>> Not for instrumented FDO (not as I know of). But for AutoFDO, this
>>>>> >>>> could be a potential risk because some callee is marked unlikely
>>>>> >>>> executed simply because they are inlined and eliminated in the O2
>>>>> >>>> binary. But in ipa-inline it will not get inlined because the edge is
>>>>> >>>> not hot from cgraph_maybe_hot_edge_p (because callee is
>>>>> >>>> UNLIKELY_EXECUTED), while the edge->count is actually hot.
>>>>> >>>
>>>>> >>> Can't you prevent setting calle to UNLIKELY_EXECUTED in these cases instead?
>>>>> >>> It seems that having profile set incorrectly will lead to other problems later, too.
>>>>> >>> We discussed similar problem with Teresa about the missing profiles for comdat,
>>>>> >>> basically one should detect these cases as profile being lost and go with guessed
>>>>> >>> profile.  (I believe patch for that was posted, too, and so far it seems best approach
>>>>> >>> to this issue)
>>>>> >>
>>>>> >> The current AutoFDO implementation will take all functions that do not
>>>>> >> have have profile as normally executed, thus use guessed profile for
>>>>> >> it. This is like using profile for truly hot functions, and using O2
>>>>> >> for other functions. This works fine. However, it leads to larger code
>>>>> >> size (approximately 10%~20% larger than FDO).
>>>>> >>
>>>>> >> I'd like to introduce another mode for users who care about both
>>>>> >> performance and code size, and can be sure that profile is
>>>>> >> representative. In this mode, we will mark all functions without
>>>>> >> sample as "unlikely executed". However, because AutoFDO use debug info
>>>>> >> (of optimized code) to represent profile, it's possible that some hot
>>>>> >> functions (say foo) are inlined and fully eliminated into another hot
>>>>> >> function (say bar). So in the profile, bar is cold, and because the
>>>>> >> profile for foo::bar is eliminated, bar will not be inlined into foo
>>>>> >> before the profile annotation. However, after profile annotate, we can
>>>>> >> infer from the bb count that foo->bar is hot, thus it should be
>>>>> >> inlined in ipa-inline phase. However, because bar itself is marked
>>>>> >> UNLIKELY_EXECUTED, it will not be inlined.
>>>>> >>
>>>>> >> One possible workaround would be that during rebuild_cgraph_edges, if
>>>>> >> we find an edge's callee is unlikely executed, add the edge count to
>>>>> >> the callee's count and recalculate callee's frequency.
>>>>> >>
>>>>> >> Dehao
>>>>> >>
>>>>> >>>
>>>>> >>> Honza
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
Dehao Chen Oct. 15, 2013, 4:34 p.m. UTC | #3
Yes, that would work. So let's discard this patch because the fix for
comdat can also fix this problem.

Thanks,
Dehao

On Tue, Oct 15, 2013 at 8:42 AM, Teresa Johnson <tejohnson@google.com> wrote:
> I'm planning to move it to ipa_profile (pass ipa-profile_estimate) and
> doing it iteratively. Would that location work?
>
> Teresa
>
> On Tue, Oct 15, 2013 at 8:40 AM, Dehao Chen <dehao@google.com> wrote:
>> Thanks for the pointer to Honza's patch. The patch does exactly what I
>> need. But it only resides in the instrumentation based FDO path. Can
>> we move the code to more common place (like rebuild_cgraph_edges)?
>>
>> Thanks,
>> Dehao
>>
>> On Tue, Oct 15, 2013 at 7:59 AM, Teresa Johnson <tejohnson@google.com> wrote:
>>> On Mon, Oct 14, 2013 at 3:26 PM, Dehao Chen <dehao@google.com> wrote:
>>>> On Mon, Oct 14, 2013 at 2:50 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>>>>> For my test case, the entire inline instance is optimized away, so
>>>>>> there is no info about it in the profile. I can do some fixup in the
>>>>>> rebuild_cgraph_edge though.
>>>>>
>>>>> Yep, I understand that.  In this case we should turn PROFILE_READ to PROFILE_GUESSED
>>>>> and guess the profile when we detect this (i.e. we have edges with non-0 counts into
>>>>> functions with 0 profile).  That should prvent these from getting UNLIKELY_EXECUTED
>>>>> and they will be inlined normal way.
>>>>
>>>> Oh, actually in AutoFDO, only functions with samples will be marked as
>>>> PROFILE_READ. Others will all be marked as PROFILE_GUESSED.
>>>
>>> Here is Honza's patch that he was referring to:
>>>
>>> Index: tree-profile.c
>>> ===================================================================
>>> --- tree-profile.c (revision 201838)
>>> +++ tree-profile.c (working copy)
>>> @@ -604,6 +604,34 @@
>>>
>>>        pop_cfun ();
>>>      }
>>> +  /* See if 0 count function has non-0 count callers.  In this case we
>>> +     lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
>>> +  FOR_EACH_DEFINED_FUNCTION (node)
>>> +    {
>>> +      struct cgraph_edge *e;
>>> +      bool called = false;
>>> +      if (node->count)
>>> + continue;
>>> +      for (e = node->callers; e; e = e->next_caller)
>>> + {
>>> +  if (e->count)
>>> +    called = true;
>>> +  if (cgraph_maybe_hot_edge_p (e))
>>> +    break;
>>> + }
>>> +      if (e || called
>>> +  && profile_status_for_function
>>> +      (DECL_STRUCT_FUNCTION (node->symbol.decl)) == PROFILE_READ)
>>> + {
>>> +  if (dump_file)
>>> +    fprintf (stderr, "Dropping 0 profile for %s/%i.%s based on calls.\n",
>>> +     cgraph_node_name (node), node->symbol.order,
>>> +     e ? "function is hot" : "function is normal");
>>> +  profile_status_for_function (DECL_STRUCT_FUNCTION (node->symbol.decl))
>>> +    = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
>>> +  node->frequency = e ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;
>>> + }
>>> +    }
>>>
>>>    del_node_map();
>>>    return 0;
>>> Index: predict.c
>>> ===================================================================
>>> --- predict.c (revision 201838)
>>> +++ predict.c (working copy)
>>> @@ -2715,6 +2715,9 @@
>>>    gcov_type count_max, true_count_max = 0;
>>>    basic_block bb;
>>>
>>> +  if (!ENTRY_BLOCK_PTR->count)
>>> +    return 0;
>>> +
>>>    FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb)
>>>      true_count_max = MAX (bb->count, true_count_max);
>>>
>>>
>>> Which is discussed in this email:
>>> http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01185.html
>>>
>>> For COMDATs I need to extend this to do it a little later to do it
>>> recursively to catch the case of COMDATs feeding other COMDATs and I
>>> need to do some other handling to compute counts from the frequencies
>>> when inlining. I have been meaning to work on this for awhile but
>>> finally am getting to it this week. (Here's the last message from a
>>> later thread that forked off the above one:
>>> http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01907.html)
>>>
>>> In the meantime, perhaps Honza's patch will suffice?
>>>
>>> Teresa
>>>
>>>>
>>>> Dehao
>>>>
>>>>>
>>>>> Honza
>>>>>>
>>>>>> Dehao
>>>>>>
>>>>>> On Mon, Oct 14, 2013 at 2:27 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>>>> > Is it possible to update the callee node summary after profile
>>>>>> > annotate (using information from inline instances which are not
>>>>>> > inlined in early inline)?
>>>>>> >
>>>>>> > David
>>>>>> >
>>>>>> > On Mon, Oct 14, 2013 at 2:18 PM, Dehao Chen <dehao@google.com> wrote:
>>>>>> >> On Mon, Oct 14, 2013 at 12:49 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>>>>> >>>> Not for instrumented FDO (not as I know of). But for AutoFDO, this
>>>>>> >>>> could be a potential risk because some callee is marked unlikely
>>>>>> >>>> executed simply because they are inlined and eliminated in the O2
>>>>>> >>>> binary. But in ipa-inline it will not get inlined because the edge is
>>>>>> >>>> not hot from cgraph_maybe_hot_edge_p (because callee is
>>>>>> >>>> UNLIKELY_EXECUTED), while the edge->count is actually hot.
>>>>>> >>>
>>>>>> >>> Can't you prevent setting calle to UNLIKELY_EXECUTED in these cases instead?
>>>>>> >>> It seems that having profile set incorrectly will lead to other problems later, too.
>>>>>> >>> We discussed similar problem with Teresa about the missing profiles for comdat,
>>>>>> >>> basically one should detect these cases as profile being lost and go with guessed
>>>>>> >>> profile.  (I believe patch for that was posted, too, and so far it seems best approach
>>>>>> >>> to this issue)
>>>>>> >>
>>>>>> >> The current AutoFDO implementation will take all functions that do not
>>>>>> >> have have profile as normally executed, thus use guessed profile for
>>>>>> >> it. This is like using profile for truly hot functions, and using O2
>>>>>> >> for other functions. This works fine. However, it leads to larger code
>>>>>> >> size (approximately 10%~20% larger than FDO).
>>>>>> >>
>>>>>> >> I'd like to introduce another mode for users who care about both
>>>>>> >> performance and code size, and can be sure that profile is
>>>>>> >> representative. In this mode, we will mark all functions without
>>>>>> >> sample as "unlikely executed". However, because AutoFDO use debug info
>>>>>> >> (of optimized code) to represent profile, it's possible that some hot
>>>>>> >> functions (say foo) are inlined and fully eliminated into another hot
>>>>>> >> function (say bar). So in the profile, bar is cold, and because the
>>>>>> >> profile for foo::bar is eliminated, bar will not be inlined into foo
>>>>>> >> before the profile annotation. However, after profile annotate, we can
>>>>>> >> infer from the bb count that foo->bar is hot, thus it should be
>>>>>> >> inlined in ipa-inline phase. However, because bar itself is marked
>>>>>> >> UNLIKELY_EXECUTED, it will not be inlined.
>>>>>> >>
>>>>>> >> One possible workaround would be that during rebuild_cgraph_edges, if
>>>>>> >> we find an edge's callee is unlikely executed, add the edge count to
>>>>>> >> the callee's count and recalculate callee's frequency.
>>>>>> >>
>>>>>> >> Dehao
>>>>>> >>
>>>>>> >>>
>>>>>> >>> Honza
>>>
>>>
>>>
>>> --
>>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
Jan Hubicka Oct. 15, 2013, 9:03 p.m. UTC | #4
> Yes, that would work. So let's discard this patch because the fix for
> comdat can also fix this problem.

Unforutnately ipa-profile-estimate is an IPA pass and as such it does
not have access to profile_status_to_function.
You can probably just factor this out into a function that can be called
and for normal FDO we call it where the loop stis now and for auto-FDO we can
probably have another invocation from before early passes where auto-FDO is collected.
> >>> +      if (node->count)
> >>> + continue;
Also here we should sum the counts and consider function non unlikely executed
in the same way as probably_never_executed does.

I can prepare updated patch, but i am currently travelling, so i would not
be disapointed if you beat me ;)

Honza
Teresa Johnson Oct. 15, 2013, 9:22 p.m. UTC | #5
On Tue, Oct 15, 2013 at 2:03 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> Yes, that would work. So let's discard this patch because the fix for
>> comdat can also fix this problem.
>
> Unforutnately ipa-profile-estimate is an IPA pass and as such it does
> not have access to profile_status_to_function.

I see. I was coding that up today but hadn't tested it yet.

> You can probably just factor this out into a function that can be called
> and for normal FDO we call it where the loop stis now and for auto-FDO we can
> probably have another invocation from before early passes where auto-FDO is collected.

Ok, let's go with that approach for now. It won't address the 0 count
COMDAT calling another 0 count COMDAT problem, but I will probably
just find a way to deal with this when inlining.

>> >>> +      if (node->count)
>> >>> + continue;
> Also here we should sum the counts and consider function non unlikely executed
> in the same way as probably_never_executed does.

I assume you mean by doing the same comparison to the number of
profile->runs. Yes, this makes sense.

>
> I can prepare updated patch, but i am currently travelling, so i would not
> be disapointed if you beat me ;)

I'm working on it, and I think based on Dehao's needs I am going to
split up the patch into two phases, the one that does just the part
you had sent a patch for (making sure 0 count routines with non-zero
calls are marked guessed and have their node frequency set
appropriately), and a subsequent one to do the count application when
we inline a 0-count routine into a non-zero callsite. I'll shoot for
getting this ready by tomorrow.

BTW, in your original patch you are checking for both e->count or
cgraph_maybe_hot_edge_p(e), but AFAICT the call to
cgraph_maybe_hot_edge_p will never return true when e->count is zero.
When there is a profile it will return false via maybe_hot_count_p
since e->count == 0. When there is no profile it will return false
when the callee has NODE_FREQUENCY_UNLIKELY_EXECUTED. So I think just
checking for e->count >0 is sufficient here.

Thanks,
Teresa

>
> Honza
Jan Hubicka Oct. 15, 2013, 9:57 p.m. UTC | #6
> On Tue, Oct 15, 2013 at 2:03 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
> >> Yes, that would work. So let's discard this patch because the fix for
> >> comdat can also fix this problem.
> >
> > Unforutnately ipa-profile-estimate is an IPA pass and as such it does
> > not have access to profile_status_to_function.
> 
> I see. I was coding that up today but hadn't tested it yet.
> 
> > You can probably just factor this out into a function that can be called
> > and for normal FDO we call it where the loop stis now and for auto-FDO we can
> > probably have another invocation from before early passes where auto-FDO is collected.
> 
> Ok, let's go with that approach for now. It won't address the 0 count
> COMDAT calling another 0 count COMDAT problem, but I will probably
> just find a way to deal with this when inlining.

You can still propagate, since tree-profile is an simple-ipa pass.
> 
> >> >>> +      if (node->count)
> >> >>> + continue;
> > Also here we should sum the counts and consider function non unlikely executed
> > in the same way as probably_never_executed does.
> 
> I assume you mean by doing the same comparison to the number of
> profile->runs. Yes, this makes sense.

Yes.
> 
> >
> > I can prepare updated patch, but i am currently travelling, so i would not
> > be disapointed if you beat me ;)
> 
> I'm working on it, and I think based on Dehao's needs I am going to
> split up the patch into two phases, the one that does just the part
> you had sent a patch for (making sure 0 count routines with non-zero
> calls are marked guessed and have their node frequency set
> appropriately), and a subsequent one to do the count application when
> we inline a 0-count routine into a non-zero callsite. I'll shoot for
> getting this ready by tomorrow.
> 
> BTW, in your original patch you are checking for both e->count or
> cgraph_maybe_hot_edge_p(e), but AFAICT the call to
> cgraph_maybe_hot_edge_p will never return true when e->count is zero.
> When there is a profile it will return false via maybe_hot_count_p
> since e->count == 0. When there is no profile it will return false
> when the callee has NODE_FREQUENCY_UNLIKELY_EXECUTED. So I think just
> checking for e->count >0 is sufficient here.

I think I was checking caller count here (that is read) and the code
was supposed to make functoin with hot caller to be hot...

Honza
> 
> Thanks,
> Teresa
> 
> >
> > Honza
> 
> 
> 
> -- 
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
Teresa Johnson Oct. 15, 2013, 10:39 p.m. UTC | #7
On Tue, Oct 15, 2013 at 2:57 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> On Tue, Oct 15, 2013 at 2:03 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> >> Yes, that would work. So let's discard this patch because the fix for
>> >> comdat can also fix this problem.
>> >
>> > Unforutnately ipa-profile-estimate is an IPA pass and as such it does
>> > not have access to profile_status_to_function.
>>
>> I see. I was coding that up today but hadn't tested it yet.
>>
>> > You can probably just factor this out into a function that can be called
>> > and for normal FDO we call it where the loop stis now and for auto-FDO we can
>> > probably have another invocation from before early passes where auto-FDO is collected.
>>
>> Ok, let's go with that approach for now. It won't address the 0 count
>> COMDAT calling another 0 count COMDAT problem, but I will probably
>> just find a way to deal with this when inlining.
>
> You can still propagate, since tree-profile is an simple-ipa pass.

Ok, I think I will leave that for the second patch, since I need to do
some testing on the effects - i.e. the propagation I had in mind would
make any 0-count routine called by a routine that was dropped to
guessed to also be dropped to guessed (and no longer unlikely). It may
be too aggressive, I need to check.

>>
>> >> >>> +      if (node->count)
>> >> >>> + continue;
>> > Also here we should sum the counts and consider function non unlikely executed
>> > in the same way as probably_never_executed does.
>>
>> I assume you mean by doing the same comparison to the number of
>> profile->runs. Yes, this makes sense.
>
> Yes.
>>
>> >
>> > I can prepare updated patch, but i am currently travelling, so i would not
>> > be disapointed if you beat me ;)
>>
>> I'm working on it, and I think based on Dehao's needs I am going to
>> split up the patch into two phases, the one that does just the part
>> you had sent a patch for (making sure 0 count routines with non-zero
>> calls are marked guessed and have their node frequency set
>> appropriately), and a subsequent one to do the count application when
>> we inline a 0-count routine into a non-zero callsite. I'll shoot for
>> getting this ready by tomorrow.
>>
>> BTW, in your original patch you are checking for both e->count or
>> cgraph_maybe_hot_edge_p(e), but AFAICT the call to
>> cgraph_maybe_hot_edge_p will never return true when e->count is zero.
>> When there is a profile it will return false via maybe_hot_count_p
>> since e->count == 0. When there is no profile it will return false
>> when the callee has NODE_FREQUENCY_UNLIKELY_EXECUTED. So I think just
>> checking for e->count >0 is sufficient here.
>
> I think I was checking caller count here (that is read) and the code
> was supposed to make functoin with hot caller to be hot...

The code in your patch was just checking the edge count, not the
caller count. cgraph_maybe_hot_edge_p also doesn't check the caller
count, just the edge count. Do we want to make all functions with hot
callers also be hot, even when the edge count is not hot? This may be
to aggressive. I was simply going to make the code check e->count.

Teresa

>
> Honza
>>
>> Thanks,
>> Teresa
>>
>> >
>> > Honza
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
diff mbox

Patch

Index: tree-profile.c
===================================================================
--- tree-profile.c (revision 201838)
+++ tree-profile.c (working copy)
@@ -604,6 +604,34 @@ 

       pop_cfun ();
     }
+  /* See if 0 count function has non-0 count callers.  In this case we
+     lost some profile.  Drop its function profile to PROFILE_GUESSED.  */
+  FOR_EACH_DEFINED_FUNCTION (node)
+    {
+      struct cgraph_edge *e;
+      bool called = false;
+      if (node->count)
+ continue;
+      for (e = node->callers; e; e = e->next_caller)
+ {
+  if (e->count)
+    called = true;
+  if (cgraph_maybe_hot_edge_p (e))
+    break;
+ }
+      if (e || called
+  && profile_status_for_function
+      (DECL_STRUCT_FUNCTION (node->symbol.decl)) == PROFILE_READ)
+ {
+  if (dump_file)
+    fprintf (stderr, "Dropping 0 profile for %s/%i.%s based on calls.\n",
+     cgraph_node_name (node), node->symbol.order,
+     e ? "function is hot" : "function is normal");
+  profile_status_for_function (DECL_STRUCT_FUNCTION (node->symbol.decl))
+    = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
+  node->frequency = e ? NODE_FREQUENCY_HOT : NODE_FREQUENCY_NORMAL;
+ }
+    }

   del_node_map();
   return 0;
Index: predict.c
===================================================================
--- predict.c (revision 201838)
+++ predict.c (working copy)
@@ -2715,6 +2715,9 @@ 
   gcov_type count_max, true_count_max = 0;
   basic_block bb;

+  if (!ENTRY_BLOCK_PTR->count)
+    return 0;
+
   FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb)
     true_count_max = MAX (bb->count, true_count_max);