diff mbox series

IPA: merge profiles more sensitively

Message ID 0d58d21e-2608-3c6a-bdd1-348851bc7782@suse.cz
State New
Headers show
Series IPA: merge profiles more sensitively | expand

Commit Message

Martin Liška Oct. 7, 2020, 11:18 a.m. UTC
During WPA we merge 2 functions where one is built with -O3
-fprofile-use while the other one uses -O0. We end up with:

prevailing node:

BitwiseCast (const double aFrom)
{
   long unsigned int temp;
   long unsigned int D.4528;

   <bb 2> :
   BitwiseCast (aFrom_2(D), &temp);
   _4 = temp;
   temp ={v} {CLOBBER};

   <bb 3> :
<L0>:
   return _4;

}

merged node:
BitwiseCast (const double aFrom)
{
   <bb 2> [count: 1509]:
   _3 = VIEW_CONVERT_EXPR<long unsigned int>(aFrom_2(D));
   return _3;
}

What happens is that we copy dst->count = src->count.ipa() but
we skip merging of BBs (a different CFG). Then we end up in situation
where profile of a ENTRY_BB_FOR_FN == uninitialized while
dst->count.quality is precise.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

	PR ipa/97295
	* ipa-utils.c (ipa_merge_profiles):
---
  gcc/ipa-utils.c | 8 ++++----
  1 file changed, 4 insertions(+), 4 deletions(-)

Comments

Martin Liška Oct. 7, 2020, 11:20 a.m. UTC | #1
On 10/7/20 1:18 PM, Martin Liška wrote:
>      * ipa-utils.c (ipa_merge_profiles):

I forgot to fill up the ChangeLog entry. Fixed.

Martin
Martin Liška Oct. 7, 2020, 11:20 a.m. UTC | #2
On 10/7/20 1:18 PM, Martin Liška wrote:
>      * ipa-utils.c (ipa_merge_profiles):

I forgot to fill up the ChangeLog entry. Fixed.

Martin
Jan Hubicka Oct. 7, 2020, 11:23 a.m. UTC | #3
> During WPA we merge 2 functions where one is built with -O3
> -fprofile-use while the other one uses -O0. We end up with:
> 
> prevailing node:
> 
> BitwiseCast (const double aFrom)
> {
>   long unsigned int temp;
>   long unsigned int D.4528;
> 
>   <bb 2> :
>   BitwiseCast (aFrom_2(D), &temp);
>   _4 = temp;
>   temp ={v} {CLOBBER};
> 
>   <bb 3> :
> <L0>:
>   return _4;
> 
> }
> 
> merged node:
> BitwiseCast (const double aFrom)
> {
>   <bb 2> [count: 1509]:
>   _3 = VIEW_CONVERT_EXPR<long unsigned int>(aFrom_2(D));
>   return _3;
> }
> 
> What happens is that we copy dst->count = src->count.ipa() but
> we skip merging of BBs (a different CFG). Then we end up in situation
> where profile of a ENTRY_BB_FOR_FN == uninitialized while
> dst->count.quality is precise.

In usual case there of diverging CFGs both functions have non-trivial
profile, one is ipa and other is guessed (so count.initialized_p returns
false)

In this case we do want to copy the count which will lead in scaling the
local profile by the known entry frequency.  This is arranged by
fixup_cfg that does this kind of transitions.

Is this the bug that riggers Firefox ICE?
What goes wrong later?

Honza
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
> Thanks,
> Martin
> 
> gcc/ChangeLog:
> 
> 	PR ipa/97295
> 	* ipa-utils.c (ipa_merge_profiles):
> ---
>  gcc/ipa-utils.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/ipa-utils.c b/gcc/ipa-utils.c
> index 23e7f714306..fe2149069be 100644
> --- a/gcc/ipa-utils.c
> +++ b/gcc/ipa-utils.c
> @@ -437,10 +437,7 @@ ipa_merge_profiles (struct cgraph_node *dst,
>    else if (dst->count.ipa ().initialized_p ())
>      ;
>    else if (src->count.ipa ().initialized_p ())
> -    {
> -      copy_counts = true;
> -      dst->count = src->count.ipa ();
> -    }
> +    copy_counts = true;
>    /* If no updating needed return early.  */
>    if (dst->count == orig_count)
> @@ -620,6 +617,9 @@ ipa_merge_profiles (struct cgraph_node *dst,
>        bool dstscale = !copy_counts
>  		      && dstnum.initialized_p () && !(dstnum == dstden);
> +      if (copy_counts)
> +	dst->count = src->count.ipa ();
> +
>        /* TODO: merge also statement histograms.  */
>        FOR_ALL_BB_FN (srcbb, srccfun)
>  	{
> -- 
> 2.28.0
>
Martin Liška Oct. 7, 2020, 11:56 a.m. UTC | #4
On 10/7/20 1:23 PM, Jan Hubicka wrote:
>> During WPA we merge 2 functions where one is built with -O3
>> -fprofile-use while the other one uses -O0. We end up with:
>>
>> prevailing node:
>>
>> BitwiseCast (const double aFrom)
>> {
>>    long unsigned int temp;
>>    long unsigned int D.4528;
>>
>>    <bb 2> :
>>    BitwiseCast (aFrom_2(D), &temp);
>>    _4 = temp;
>>    temp ={v} {CLOBBER};
>>
>>    <bb 3> :
>> <L0>:
>>    return _4;
>>
>> }
>>
>> merged node:
>> BitwiseCast (const double aFrom)
>> {
>>    <bb 2> [count: 1509]:
>>    _3 = VIEW_CONVERT_EXPR<long unsigned int>(aFrom_2(D));
>>    return _3;
>> }
>>
>> What happens is that we copy dst->count = src->count.ipa() but
>> we skip merging of BBs (a different CFG). Then we end up in situation
>> where profile of a ENTRY_BB_FOR_FN == uninitialized while
>> dst->count.quality is precise.
> 
> In usual case there of diverging CFGs both functions have non-trivial
> profile, one is ipa and other is guessed (so count.initialized_p returns
> false)

In this case one TU is compiled with -O0, so we miss profile at all
(if I see correctly).

> 
> In this case we do want to copy the count which will lead in scaling the
> local profile by the known entry frequency.  This is arranged by
> fixup_cfg that does this kind of transitions.

All right, here we end up with

   profile_count num = node->count;
   profile_count den = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;

(gdb) p num.debug()
1509 (precise)
$7 = void
(gdb) p den.debug()
1509 (precise)
$8 = void

   bool scale = num.initialized_p () && !(num == den);

and so scale == false;
Thus no scaling happens and e.g. BB 2 misses profile (due to -O0):

(gdb) p debug_bb(bb)
<bb 2> :
BitwiseCast (aFrom_2(D), &temp);
_4 = temp;
temp ={v} {CLOBBER};

and we later ICE here:

Breakpoint 1, profile_count::to_frequency (this=0x7ffff73f8058, fun=0x7ffff73ee000) at ../../gcc/profile-count.c:273
273	  gcc_assert (REG_BR_PROB_BASE == BB_FREQ_MAX
(gdb) bt
#0  profile_count::to_frequency (this=0x7ffff73f8058, fun=0x7ffff73ee000) at ../../gcc/profile-count.c:273
#1  0x0000000000d72c4d in regstat_bb_compute_ri (bb=0x7ffff73f8000, live=0x2839f60) at ../../gcc/regstat.c:200
#2  0x0000000000d73005 in regstat_compute_ri () at ../../gcc/regstat.c:253
#3  0x0000000000be1a7d in ira (f=<optimized out>) at ../../gcc/ira.c:5294
#4  (anonymous namespace)::pass_ira::execute (this=<optimized out>) at ../../gcc/ira.c:5666
#5  0x0000000000d23b45 in execute_one_pass (pass=0x26ef850) at ../../gcc/passes.c:2502
#6  0x0000000000d23e6a in execute_pass_list_1 (pass=0x26ef850) at ../../gcc/passes.c:2590
#7  0x0000000000d23e9b in execute_pass_list_1 (pass=0x26ee5d0) at ../../gcc/passes.c:2591
#8  0x0000000000d23ef3 in execute_pass_list (fn=0x7ffff73ee000, pass=0x26ea890) at ../../gcc/passes.c:2601
#9  0x000000000090a7d7 in cgraph_node::expand (this=0x7ffff73ede10) at ../../gcc/cgraphunit.c:2300
#10 0x000000000090b232 in output_in_order () at ../../gcc/cgraphunit.c:2578
#11 0x000000000090b869 in symbol_table::compile (this=0x7ffff75d6100) at ../../gcc/cgraphunit.c:2819
#12 0x000000000083df7d in lto_main () at ../../gcc/lto/lto.c:653
#13 0x0000000000deda92 in compile_file () at ../../gcc/toplev.c:458
#14 0x0000000000df0c7a in do_compile () at ../../gcc/toplev.c:2278
#15 0x0000000000df0f63 in toplev::main (this=0x7fffffffd9fe, argc=19, argv=0x26c26f0) at ../../gcc/toplev.c:2417
#16 0x00000000008196af in main (argc=19, argv=0x7fffffffdb08) at ../../gcc/main.c:39

So what do you suggest to fix it :P?

> 
> Is this the bug that riggers Firefox ICE?
> What goes wrong later?
> 
> Honza
>>
>> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>>
>> Ready to be installed?
>> Thanks,
>> Martin
>>
>> gcc/ChangeLog:
>>
>> 	PR ipa/97295
>> 	* ipa-utils.c (ipa_merge_profiles):
>> ---
>>   gcc/ipa-utils.c | 8 ++++----
>>   1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/gcc/ipa-utils.c b/gcc/ipa-utils.c
>> index 23e7f714306..fe2149069be 100644
>> --- a/gcc/ipa-utils.c
>> +++ b/gcc/ipa-utils.c
>> @@ -437,10 +437,7 @@ ipa_merge_profiles (struct cgraph_node *dst,
>>     else if (dst->count.ipa ().initialized_p ())
>>       ;
>>     else if (src->count.ipa ().initialized_p ())
>> -    {
>> -      copy_counts = true;
>> -      dst->count = src->count.ipa ();
>> -    }
>> +    copy_counts = true;
>>     /* If no updating needed return early.  */
>>     if (dst->count == orig_count)
>> @@ -620,6 +617,9 @@ ipa_merge_profiles (struct cgraph_node *dst,
>>         bool dstscale = !copy_counts
>>   		      && dstnum.initialized_p () && !(dstnum == dstden);
>> +      if (copy_counts)
>> +	dst->count = src->count.ipa ();
>> +
>>         /* TODO: merge also statement histograms.  */
>>         FOR_ALL_BB_FN (srcbb, srccfun)
>>   	{
>> -- 
>> 2.28.0
>>
Martin Liška Oct. 9, 2020, 11:06 a.m. UTC | #5
On 10/7/20 1:56 PM, Martin Liška wrote:
> So what do you suggest to fix it :P?

Note that we have a nice reduced test-case here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97295#c6

It's really a mixture of -O1 -fprofile-use and -O0.

Martin
diff mbox series

Patch

diff --git a/gcc/ipa-utils.c b/gcc/ipa-utils.c
index 23e7f714306..fe2149069be 100644
--- a/gcc/ipa-utils.c
+++ b/gcc/ipa-utils.c
@@ -437,10 +437,7 @@  ipa_merge_profiles (struct cgraph_node *dst,
    else if (dst->count.ipa ().initialized_p ())
      ;
    else if (src->count.ipa ().initialized_p ())
-    {
-      copy_counts = true;
-      dst->count = src->count.ipa ();
-    }
+    copy_counts = true;
  
    /* If no updating needed return early.  */
    if (dst->count == orig_count)
@@ -620,6 +617,9 @@  ipa_merge_profiles (struct cgraph_node *dst,
        bool dstscale = !copy_counts
  		      && dstnum.initialized_p () && !(dstnum == dstden);
  
+      if (copy_counts)
+	dst->count = src->count.ipa ();
+
        /* TODO: merge also statement histograms.  */
        FOR_ALL_BB_FN (srcbb, srccfun)
  	{