Message ID | CAO2gOZVX2QuoUGRy6uRiT4A+=xdeCXmq04tF0Qr1kqezP-y5-A@mail.gmail.com |
---|---|
State | New |
Headers | show |
Looks like there is some inconsistency between edge hotness and callee frequency? David On Mon, Oct 14, 2013 at 9:08 AM, Dehao Chen <dehao@google.com> wrote: > This patch forces to use profile info to check if an edge is hot when > profile is available. > > Bootstrapped and passed regression tests. > > OK for trunk? > > Thanks, > Dehao > > gcc/ChangeLog: > 2013-10-14 Dehao Chen <dehao@google.com> > * predict.c(cgraph_maybe_hot_edge_p): Decide edge's hotness from profile. > > Index: gcc/predict.c > =================================================================== > --- gcc/predict.c (revision 203568) > +++ gcc/predict.c (working copy) > @@ -185,10 +185,8 @@ maybe_hot_bb_p (struct function *fun, const_basic_ > bool > cgraph_maybe_hot_edge_p (struct cgraph_edge *edge) > { > - if (profile_info && flag_branch_probabilities > - && !maybe_hot_count_p (NULL, > - edge->count)) > - return false; > + if (profile_info && flag_branch_probabilities) > + return maybe_hot_count_p (NULL, edge->count); > if (edge->caller->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED > || (edge->callee > && edge->callee->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED))
Not for instrumented FDO (not as I know of). But for AutoFDO, this could be a potential risk because some callee is marked unlikely executed simply because they are inlined and eliminated in the O2 binary. But in ipa-inline it will not get inlined because the edge is not hot from cgraph_maybe_hot_edge_p (because callee is UNLIKELY_EXECUTED), while the edge->count is actually hot. Dehao On Mon, Oct 14, 2013 at 9:13 AM, Xinliang David Li <davidxl@google.com> wrote: > Looks like there is some inconsistency between edge hotness and callee > frequency? > > David > > On Mon, Oct 14, 2013 at 9:08 AM, Dehao Chen <dehao@google.com> wrote: >> This patch forces to use profile info to check if an edge is hot when >> profile is available. >> >> Bootstrapped and passed regression tests. >> >> OK for trunk? >> >> Thanks, >> Dehao >> >> gcc/ChangeLog: >> 2013-10-14 Dehao Chen <dehao@google.com> >> * predict.c(cgraph_maybe_hot_edge_p): Decide edge's hotness from profile. >> >> Index: gcc/predict.c >> =================================================================== >> --- gcc/predict.c (revision 203568) >> +++ gcc/predict.c (working copy) >> @@ -185,10 +185,8 @@ maybe_hot_bb_p (struct function *fun, const_basic_ >> bool >> cgraph_maybe_hot_edge_p (struct cgraph_edge *edge) >> { >> - if (profile_info && flag_branch_probabilities >> - && !maybe_hot_count_p (NULL, >> - edge->count)) >> - return false; >> + if (profile_info && flag_branch_probabilities) >> + return maybe_hot_count_p (NULL, edge->count); >> if (edge->caller->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED >> || (edge->callee >> && edge->callee->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED))
> Not for instrumented FDO (not as I know of). But for AutoFDO, this > could be a potential risk because some callee is marked unlikely > executed simply because they are inlined and eliminated in the O2 > binary. But in ipa-inline it will not get inlined because the edge is > not hot from cgraph_maybe_hot_edge_p (because callee is > UNLIKELY_EXECUTED), while the edge->count is actually hot. Can't you prevent setting calle to UNLIKELY_EXECUTED in these cases instead? It seems that having profile set incorrectly will lead to other problems later, too. We discussed similar problem with Teresa about the missing profiles for comdat, basically one should detect these cases as profile being lost and go with guessed profile. (I believe patch for that was posted, too, and so far it seems best approach to this issue) Honza
On Mon, Oct 14, 2013 at 12:49 PM, Jan Hubicka <hubicka@ucw.cz> wrote: >> Not for instrumented FDO (not as I know of). But for AutoFDO, this >> could be a potential risk because some callee is marked unlikely >> executed simply because they are inlined and eliminated in the O2 >> binary. But in ipa-inline it will not get inlined because the edge is >> not hot from cgraph_maybe_hot_edge_p (because callee is >> UNLIKELY_EXECUTED), while the edge->count is actually hot. > > Can't you prevent setting calle to UNLIKELY_EXECUTED in these cases instead? > It seems that having profile set incorrectly will lead to other problems later, too. > We discussed similar problem with Teresa about the missing profiles for comdat, > basically one should detect these cases as profile being lost and go with guessed > profile. (I believe patch for that was posted, too, and so far it seems best approach > to this issue) The current AutoFDO implementation will take all functions that do not have have profile as normally executed, thus use guessed profile for it. This is like using profile for truly hot functions, and using O2 for other functions. This works fine. However, it leads to larger code size (approximately 10%~20% larger than FDO). I'd like to introduce another mode for users who care about both performance and code size, and can be sure that profile is representative. In this mode, we will mark all functions without sample as "unlikely executed". However, because AutoFDO use debug info (of optimized code) to represent profile, it's possible that some hot functions (say foo) are inlined and fully eliminated into another hot function (say bar). So in the profile, bar is cold, and because the profile for foo::bar is eliminated, bar will not be inlined into foo before the profile annotation. However, after profile annotate, we can infer from the bb count that foo->bar is hot, thus it should be inlined in ipa-inline phase. However, because bar itself is marked UNLIKELY_EXECUTED, it will not be inlined. One possible workaround would be that during rebuild_cgraph_edges, if we find an edge's callee is unlikely executed, add the edge count to the callee's count and recalculate callee's frequency. Dehao > > Honza
Is it possible to update the callee node summary after profile annotate (using information from inline instances which are not inlined in early inline)? David On Mon, Oct 14, 2013 at 2:18 PM, Dehao Chen <dehao@google.com> wrote: > On Mon, Oct 14, 2013 at 12:49 PM, Jan Hubicka <hubicka@ucw.cz> wrote: >>> Not for instrumented FDO (not as I know of). But for AutoFDO, this >>> could be a potential risk because some callee is marked unlikely >>> executed simply because they are inlined and eliminated in the O2 >>> binary. But in ipa-inline it will not get inlined because the edge is >>> not hot from cgraph_maybe_hot_edge_p (because callee is >>> UNLIKELY_EXECUTED), while the edge->count is actually hot. >> >> Can't you prevent setting calle to UNLIKELY_EXECUTED in these cases instead? >> It seems that having profile set incorrectly will lead to other problems later, too. >> We discussed similar problem with Teresa about the missing profiles for comdat, >> basically one should detect these cases as profile being lost and go with guessed >> profile. (I believe patch for that was posted, too, and so far it seems best approach >> to this issue) > > The current AutoFDO implementation will take all functions that do not > have have profile as normally executed, thus use guessed profile for > it. This is like using profile for truly hot functions, and using O2 > for other functions. This works fine. However, it leads to larger code > size (approximately 10%~20% larger than FDO). > > I'd like to introduce another mode for users who care about both > performance and code size, and can be sure that profile is > representative. In this mode, we will mark all functions without > sample as "unlikely executed". However, because AutoFDO use debug info > (of optimized code) to represent profile, it's possible that some hot > functions (say foo) are inlined and fully eliminated into another hot > function (say bar). So in the profile, bar is cold, and because the > profile for foo::bar is eliminated, bar will not be inlined into foo > before the profile annotation. However, after profile annotate, we can > infer from the bb count that foo->bar is hot, thus it should be > inlined in ipa-inline phase. However, because bar itself is marked > UNLIKELY_EXECUTED, it will not be inlined. > > One possible workaround would be that during rebuild_cgraph_edges, if > we find an edge's callee is unlikely executed, add the edge count to > the callee's count and recalculate callee's frequency. > > Dehao > >> >> Honza
For my test case, the entire inline instance is optimized away, so there is no info about it in the profile. I can do some fixup in the rebuild_cgraph_edge though. Dehao On Mon, Oct 14, 2013 at 2:27 PM, Xinliang David Li <davidxl@google.com> wrote: > Is it possible to update the callee node summary after profile > annotate (using information from inline instances which are not > inlined in early inline)? > > David > > On Mon, Oct 14, 2013 at 2:18 PM, Dehao Chen <dehao@google.com> wrote: >> On Mon, Oct 14, 2013 at 12:49 PM, Jan Hubicka <hubicka@ucw.cz> wrote: >>>> Not for instrumented FDO (not as I know of). But for AutoFDO, this >>>> could be a potential risk because some callee is marked unlikely >>>> executed simply because they are inlined and eliminated in the O2 >>>> binary. But in ipa-inline it will not get inlined because the edge is >>>> not hot from cgraph_maybe_hot_edge_p (because callee is >>>> UNLIKELY_EXECUTED), while the edge->count is actually hot. >>> >>> Can't you prevent setting calle to UNLIKELY_EXECUTED in these cases instead? >>> It seems that having profile set incorrectly will lead to other problems later, too. >>> We discussed similar problem with Teresa about the missing profiles for comdat, >>> basically one should detect these cases as profile being lost and go with guessed >>> profile. (I believe patch for that was posted, too, and so far it seems best approach >>> to this issue) >> >> The current AutoFDO implementation will take all functions that do not >> have have profile as normally executed, thus use guessed profile for >> it. This is like using profile for truly hot functions, and using O2 >> for other functions. This works fine. However, it leads to larger code >> size (approximately 10%~20% larger than FDO). >> >> I'd like to introduce another mode for users who care about both >> performance and code size, and can be sure that profile is >> representative. In this mode, we will mark all functions without >> sample as "unlikely executed". However, because AutoFDO use debug info >> (of optimized code) to represent profile, it's possible that some hot >> functions (say foo) are inlined and fully eliminated into another hot >> function (say bar). So in the profile, bar is cold, and because the >> profile for foo::bar is eliminated, bar will not be inlined into foo >> before the profile annotation. However, after profile annotate, we can >> infer from the bb count that foo->bar is hot, thus it should be >> inlined in ipa-inline phase. However, because bar itself is marked >> UNLIKELY_EXECUTED, it will not be inlined. >> >> One possible workaround would be that during rebuild_cgraph_edges, if >> we find an edge's callee is unlikely executed, add the edge count to >> the callee's count and recalculate callee's frequency. >> >> Dehao >> >>> >>> Honza
> For my test case, the entire inline instance is optimized away, so > there is no info about it in the profile. I can do some fixup in the > rebuild_cgraph_edge though. Yep, I understand that. In this case we should turn PROFILE_READ to PROFILE_GUESSED and guess the profile when we detect this (i.e. we have edges with non-0 counts into functions with 0 profile). That should prvent these from getting UNLIKELY_EXECUTED and they will be inlined normal way. Honza > > Dehao > > On Mon, Oct 14, 2013 at 2:27 PM, Xinliang David Li <davidxl@google.com> wrote: > > Is it possible to update the callee node summary after profile > > annotate (using information from inline instances which are not > > inlined in early inline)? > > > > David > > > > On Mon, Oct 14, 2013 at 2:18 PM, Dehao Chen <dehao@google.com> wrote: > >> On Mon, Oct 14, 2013 at 12:49 PM, Jan Hubicka <hubicka@ucw.cz> wrote: > >>>> Not for instrumented FDO (not as I know of). But for AutoFDO, this > >>>> could be a potential risk because some callee is marked unlikely > >>>> executed simply because they are inlined and eliminated in the O2 > >>>> binary. But in ipa-inline it will not get inlined because the edge is > >>>> not hot from cgraph_maybe_hot_edge_p (because callee is > >>>> UNLIKELY_EXECUTED), while the edge->count is actually hot. > >>> > >>> Can't you prevent setting calle to UNLIKELY_EXECUTED in these cases instead? > >>> It seems that having profile set incorrectly will lead to other problems later, too. > >>> We discussed similar problem with Teresa about the missing profiles for comdat, > >>> basically one should detect these cases as profile being lost and go with guessed > >>> profile. (I believe patch for that was posted, too, and so far it seems best approach > >>> to this issue) > >> > >> The current AutoFDO implementation will take all functions that do not > >> have have profile as normally executed, thus use guessed profile for > >> it. This is like using profile for truly hot functions, and using O2 > >> for other functions. This works fine. However, it leads to larger code > >> size (approximately 10%~20% larger than FDO). > >> > >> I'd like to introduce another mode for users who care about both > >> performance and code size, and can be sure that profile is > >> representative. In this mode, we will mark all functions without > >> sample as "unlikely executed". However, because AutoFDO use debug info > >> (of optimized code) to represent profile, it's possible that some hot > >> functions (say foo) are inlined and fully eliminated into another hot > >> function (say bar). So in the profile, bar is cold, and because the > >> profile for foo::bar is eliminated, bar will not be inlined into foo > >> before the profile annotation. However, after profile annotate, we can > >> infer from the bb count that foo->bar is hot, thus it should be > >> inlined in ipa-inline phase. However, because bar itself is marked > >> UNLIKELY_EXECUTED, it will not be inlined. > >> > >> One possible workaround would be that during rebuild_cgraph_edges, if > >> we find an edge's callee is unlikely executed, add the edge count to > >> the callee's count and recalculate callee's frequency. > >> > >> Dehao > >> > >>> > >>> Honza
On Mon, Oct 14, 2013 at 2:34 PM, Dehao Chen <dehao@google.com> wrote: > For my test case, the entire inline instance is optimized away, do you mean there is no out of line instance for the target function in the profile binary? David > so > there is no info about it in the profile. I can do some fixup in the > rebuild_cgraph_edge though. > > Dehao > > On Mon, Oct 14, 2013 at 2:27 PM, Xinliang David Li <davidxl@google.com> wrote: >> Is it possible to update the callee node summary after profile >> annotate (using information from inline instances which are not >> inlined in early inline)? >> >> David >> >> On Mon, Oct 14, 2013 at 2:18 PM, Dehao Chen <dehao@google.com> wrote: >>> On Mon, Oct 14, 2013 at 12:49 PM, Jan Hubicka <hubicka@ucw.cz> wrote: >>>>> Not for instrumented FDO (not as I know of). But for AutoFDO, this >>>>> could be a potential risk because some callee is marked unlikely >>>>> executed simply because they are inlined and eliminated in the O2 >>>>> binary. But in ipa-inline it will not get inlined because the edge is >>>>> not hot from cgraph_maybe_hot_edge_p (because callee is >>>>> UNLIKELY_EXECUTED), while the edge->count is actually hot. >>>> >>>> Can't you prevent setting calle to UNLIKELY_EXECUTED in these cases instead? >>>> It seems that having profile set incorrectly will lead to other problems later, too. >>>> We discussed similar problem with Teresa about the missing profiles for comdat, >>>> basically one should detect these cases as profile being lost and go with guessed >>>> profile. (I believe patch for that was posted, too, and so far it seems best approach >>>> to this issue) >>> >>> The current AutoFDO implementation will take all functions that do not >>> have have profile as normally executed, thus use guessed profile for >>> it. This is like using profile for truly hot functions, and using O2 >>> for other functions. This works fine. However, it leads to larger code >>> size (approximately 10%~20% larger than FDO). >>> >>> I'd like to introduce another mode for users who care about both >>> performance and code size, and can be sure that profile is >>> representative. In this mode, we will mark all functions without >>> sample as "unlikely executed". However, because AutoFDO use debug info >>> (of optimized code) to represent profile, it's possible that some hot >>> functions (say foo) are inlined and fully eliminated into another hot >>> function (say bar). So in the profile, bar is cold, and because the >>> profile for foo::bar is eliminated, bar will not be inlined into foo >>> before the profile annotation. However, after profile annotate, we can >>> infer from the bb count that foo->bar is hot, thus it should be >>> inlined in ipa-inline phase. However, because bar itself is marked >>> UNLIKELY_EXECUTED, it will not be inlined. >>> >>> One possible workaround would be that during rebuild_cgraph_edges, if >>> we find an edge's callee is unlikely executed, add the edge count to >>> the callee's count and recalculate callee's frequency. >>> >>> Dehao >>> >>>> >>>> Honza
On Mon, Oct 14, 2013 at 3:04 PM, Xinliang David Li <davidxl@google.com> wrote: > On Mon, Oct 14, 2013 at 2:34 PM, Dehao Chen <dehao@google.com> wrote: >> For my test case, the entire inline instance is optimized away, > > do you mean there is no out of line instance for the target function > in the profile binary? Yes, and there is no inline instance either. Dehao > > David > >> so >> there is no info about it in the profile. I can do some fixup in the >> rebuild_cgraph_edge though. >> >> Dehao >> >> On Mon, Oct 14, 2013 at 2:27 PM, Xinliang David Li <davidxl@google.com> wrote: >>> Is it possible to update the callee node summary after profile >>> annotate (using information from inline instances which are not >>> inlined in early inline)? >>> >>> David >>> >>> On Mon, Oct 14, 2013 at 2:18 PM, Dehao Chen <dehao@google.com> wrote: >>>> On Mon, Oct 14, 2013 at 12:49 PM, Jan Hubicka <hubicka@ucw.cz> wrote: >>>>>> Not for instrumented FDO (not as I know of). But for AutoFDO, this >>>>>> could be a potential risk because some callee is marked unlikely >>>>>> executed simply because they are inlined and eliminated in the O2 >>>>>> binary. But in ipa-inline it will not get inlined because the edge is >>>>>> not hot from cgraph_maybe_hot_edge_p (because callee is >>>>>> UNLIKELY_EXECUTED), while the edge->count is actually hot. >>>>> >>>>> Can't you prevent setting calle to UNLIKELY_EXECUTED in these cases instead? >>>>> It seems that having profile set incorrectly will lead to other problems later, too. >>>>> We discussed similar problem with Teresa about the missing profiles for comdat, >>>>> basically one should detect these cases as profile being lost and go with guessed >>>>> profile. (I believe patch for that was posted, too, and so far it seems best approach >>>>> to this issue) >>>> >>>> The current AutoFDO implementation will take all functions that do not >>>> have have profile as normally executed, thus use guessed profile for >>>> it. This is like using profile for truly hot functions, and using O2 >>>> for other functions. This works fine. However, it leads to larger code >>>> size (approximately 10%~20% larger than FDO). >>>> >>>> I'd like to introduce another mode for users who care about both >>>> performance and code size, and can be sure that profile is >>>> representative. In this mode, we will mark all functions without >>>> sample as "unlikely executed". However, because AutoFDO use debug info >>>> (of optimized code) to represent profile, it's possible that some hot >>>> functions (say foo) are inlined and fully eliminated into another hot >>>> function (say bar). So in the profile, bar is cold, and because the >>>> profile for foo::bar is eliminated, bar will not be inlined into foo >>>> before the profile annotation. However, after profile annotate, we can >>>> infer from the bb count that foo->bar is hot, thus it should be >>>> inlined in ipa-inline phase. However, because bar itself is marked >>>> UNLIKELY_EXECUTED, it will not be inlined. >>>> >>>> One possible workaround would be that during rebuild_cgraph_edges, if >>>> we find an edge's callee is unlikely executed, add the edge count to >>>> the callee's count and recalculate callee's frequency. >>>> >>>> Dehao >>>> >>>>> >>>>> Honza
On Mon, Oct 14, 2013 at 2:50 PM, Jan Hubicka <hubicka@ucw.cz> wrote: >> For my test case, the entire inline instance is optimized away, so >> there is no info about it in the profile. I can do some fixup in the >> rebuild_cgraph_edge though. > > Yep, I understand that. In this case we should turn PROFILE_READ to PROFILE_GUESSED > and guess the profile when we detect this (i.e. we have edges with non-0 counts into > functions with 0 profile). That should prvent these from getting UNLIKELY_EXECUTED > and they will be inlined normal way. Oh, actually in AutoFDO, only functions with samples will be marked as PROFILE_READ. Others will all be marked as PROFILE_GUESSED. Dehao > > Honza >> >> Dehao >> >> On Mon, Oct 14, 2013 at 2:27 PM, Xinliang David Li <davidxl@google.com> wrote: >> > Is it possible to update the callee node summary after profile >> > annotate (using information from inline instances which are not >> > inlined in early inline)? >> > >> > David >> > >> > On Mon, Oct 14, 2013 at 2:18 PM, Dehao Chen <dehao@google.com> wrote: >> >> On Mon, Oct 14, 2013 at 12:49 PM, Jan Hubicka <hubicka@ucw.cz> wrote: >> >>>> Not for instrumented FDO (not as I know of). But for AutoFDO, this >> >>>> could be a potential risk because some callee is marked unlikely >> >>>> executed simply because they are inlined and eliminated in the O2 >> >>>> binary. But in ipa-inline it will not get inlined because the edge is >> >>>> not hot from cgraph_maybe_hot_edge_p (because callee is >> >>>> UNLIKELY_EXECUTED), while the edge->count is actually hot. >> >>> >> >>> Can't you prevent setting calle to UNLIKELY_EXECUTED in these cases instead? >> >>> It seems that having profile set incorrectly will lead to other problems later, too. >> >>> We discussed similar problem with Teresa about the missing profiles for comdat, >> >>> basically one should detect these cases as profile being lost and go with guessed >> >>> profile. (I believe patch for that was posted, too, and so far it seems best approach >> >>> to this issue) >> >> >> >> The current AutoFDO implementation will take all functions that do not >> >> have have profile as normally executed, thus use guessed profile for >> >> it. This is like using profile for truly hot functions, and using O2 >> >> for other functions. This works fine. However, it leads to larger code >> >> size (approximately 10%~20% larger than FDO). >> >> >> >> I'd like to introduce another mode for users who care about both >> >> performance and code size, and can be sure that profile is >> >> representative. In this mode, we will mark all functions without >> >> sample as "unlikely executed". However, because AutoFDO use debug info >> >> (of optimized code) to represent profile, it's possible that some hot >> >> functions (say foo) are inlined and fully eliminated into another hot >> >> function (say bar). So in the profile, bar is cold, and because the >> >> profile for foo::bar is eliminated, bar will not be inlined into foo >> >> before the profile annotation. However, after profile annotate, we can >> >> infer from the bb count that foo->bar is hot, thus it should be >> >> inlined in ipa-inline phase. However, because bar itself is marked >> >> UNLIKELY_EXECUTED, it will not be inlined. >> >> >> >> One possible workaround would be that during rebuild_cgraph_edges, if >> >> we find an edge's callee is unlikely executed, add the edge count to >> >> the callee's count and recalculate callee's frequency. >> >> >> >> Dehao >> >> >> >>> >> >>> Honza
Index: gcc/predict.c =================================================================== --- gcc/predict.c (revision 203568) +++ gcc/predict.c (working copy) @@ -185,10 +185,8 @@ maybe_hot_bb_p (struct function *fun, const_basic_ bool cgraph_maybe_hot_edge_p (struct cgraph_edge *edge) { - if (profile_info && flag_branch_probabilities - && !maybe_hot_count_p (NULL, - edge->count)) - return false; + if (profile_info && flag_branch_probabilities) + return maybe_hot_count_p (NULL, edge->count); if (edge->caller->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED || (edge->callee && edge->callee->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED))