[2/3] Fix incorrect loop exit edge probability [PR103270]

Message ID	20211208055416.1415283-3-luoxhu@linux.ibm.com
State	New
Headers	Return-Path: <gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org>
	DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 3CA823858C60
	To: gcc-patches@gcc.gnu.org
	Subject: [PATCH 2/3] Fix incorrect loop exit edge probability [PR103270]
	Date: Tue, 7 Dec 2021 23:54:15 -0600
	Message-Id: <20211208055416.1415283-3-luoxhu@linux.ibm.com>
	In-Reply-To: <20211208055416.1415283-1-luoxhu@linux.ibm.com>
	References: <20211208055416.1415283-1-luoxhu@linux.ibm.com>
	MIME-Version: 1.0
	Content-Transfer-Encoding: 8bit
	Precedence: list
	From: Xionghu Luo via Gcc-patches <gcc-patches@gcc.gnu.org>
	Reply-To: Xionghu Luo <luoxhu@linux.ibm.com>
	Cc: segher@kernel.crashing.org, Xionghu Luo <luoxhu@linux.ibm.com>, hubicka@kam.mff.cuni.cz, wschmidt@linux.ibm.com, linkw@gcc.gnu.org, dje.gcc@gmail.com
	Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org
	Sender: "Gcc-patches" <gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org>
Series	Dependency patches for hoist LIM code to cold loop
	[0/3] Dependency patches for hoist LIM code to cold loop
	[1/3] loop-invariant: Don't move cold bb instructions to preheader in RTL
	[2/3] Fix incorrect loop exit edge probability [PR103270]
	[3/3] Fix loop split incorrect count and probability

Xionghu Luo Dec. 8, 2021, 5:54 a.m. UTC

r12-4526 cancelled jump threading paths that rotate loops.  This exposes
an issue in the profile_estimate pass: in predict_extra_loop_exits, an
outer loop's exit edge is marked as an extra exit of the inner loop and
given an incorrect prediction, so a hot inner loop eventually becomes
cold after later optimizations.  This patch adds a loop check when
searching for extra exit edges, to avoid unexpected predict_edge calls
from predict_paths_for_bb.
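
The loop check described above can be illustrated with a toy model.  This
is only a sketch under stated assumptions: toy_loop, toy_bb and both
functions are hypothetical stand-ins, not GCC's class loop, basic_block or
predict_paths_for_bb.  It shows the idea of refusing to attach a prediction
to a block that lies outside the loop being analysed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a node of the loop tree.  */
struct toy_loop {
  const struct toy_loop *outer;        /* enclosing loop, NULL at top level */
};

/* Hypothetical stand-in for a basic block.  */
struct toy_bb {
  const struct toy_loop *loop_father;  /* innermost loop containing the block */
  int predictions;                     /* predictions attached so far */
};

/* Return true if BB belongs to LOOP or to a loop nested inside LOOP.  */
static bool
toy_bb_inside_loop_p (const struct toy_loop *loop, const struct toy_bb *bb)
{
  for (const struct toy_loop *l = bb->loop_father; l != NULL; l = l->outer)
    if (l == loop)
      return true;
  return false;
}

/* Attach an "extra loop exit" prediction to BB, but only when BB is inside
   IN_LOOP.  A NULL IN_LOOP keeps the unrestricted behaviour.  Returns true
   if a prediction was attached.  */
static bool
toy_predict_extra_exit (struct toy_bb *bb, const struct toy_loop *in_loop)
{
  assert (bb != NULL);
  if (in_loop != NULL && !toy_bb_inside_loop_p (in_loop, bb))
    return false;
  bb->predictions++;
  return true;
}
```

With an inner loop nested in an outer loop, a block that belongs only to
the outer loop is skipped when the inner loop is passed as IN_LOOP, and is
still predicted when IN_LOOP is NULL.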

Regression tested on P8LE, OK for master?

gcc/ChangeLog:

	PR middle-end/103270
	* predict.c (predict_extra_loop_exits): Add loop parameter.
	(predict_loops): Call with loop argument.

gcc/testsuite/ChangeLog:

	PR middle-end/103270
	* gcc.dg/pr103270.c: New test.
---
 gcc/predict.c                   | 10 ++++++----
 gcc/testsuite/gcc.dg/pr103270.c | 19 +++++++++++++++++++
 2 files changed, 25 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr103270.c

Jeff Law Dec. 8, 2021, 11:28 p.m. UTC | #1

On 12/7/2021 10:54 PM, Xionghu Luo via Gcc-patches wrote:
> r12-4526 cancelled jump threading paths that rotate loops.  This exposes
> an issue in the profile_estimate pass: in predict_extra_loop_exits, an
> outer loop's exit edge is marked as an extra exit of the inner loop and
> given an incorrect prediction, so a hot inner loop eventually becomes
> cold after later optimizations.  This patch adds a loop check when
> searching for extra exit edges, to avoid unexpected predict_edge calls
> from predict_paths_for_bb.
>
> Regression tested on P8LE, OK for master?
>
> gcc/ChangeLog:
>
> 	PR middle-end/103270
> 	* predict.c (predict_extra_loop_exits): Add loop parameter.
> 	(predict_loops): Call with loop argument.
>
> gcc/testsuite/ChangeLog:
>
> 	PR middle-end/103270
> 	* gcc.dg/pr103270.c: New test.
OK
jeff

Jan Hubicka Dec. 13, 2021, 9:25 a.m. UTC | #2

> r12-4526 cancelled jump threading paths that rotate loops.  This exposes
> an issue in the profile_estimate pass: in predict_extra_loop_exits, an
> outer loop's exit edge is marked as an extra exit of the inner loop and
> given an incorrect prediction, so a hot inner loop eventually becomes
> cold after later optimizations.  This patch adds a loop check when
> searching for extra exit edges, to avoid unexpected predict_edge calls
> from predict_paths_for_bb.
> 
> Regression tested on P8LE, OK for master?
> 
> gcc/ChangeLog:
> 
> 	PR middle-end/103270
> 	* predict.c (predict_extra_loop_exits): Add loop parameter.
> 	(predict_loops): Call with loop argument.

With changes to branch predictors it is useful to re-test their
effectiveness on SPEC and see if their hit rates still match
reality.  You can do it by building SPEC with -fprofile-generate,
training it, then rebuilding with -fprofile-use
-fdump-tree-ipa-profile-details and running contrib/analyze_brprob.py,
which collects info on how they work.

This patch looks good to me, but it would be nice to have things reality
checked (and since we did not do the stats for some time, there may be
surprises) so if you could run the specs and post results of
analyze_brprob, it would be great.  I will also try to get to that soon,
but currently I am a bit swamped by other problems I noticed on clang
builds.

Thanks a lot for working on profile fixes - I am trying now to get
things into shape.  With Martin we added basic testing infrastructure
for keeping track of profile updates and I am trying to see how it works
in practice now.  Hopefully it will make it easier to judge profile
updating patches. I would welcome a list of patches I should look at.

I will write separate mail on this.
Honza
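
The hit-rate measurement described above can be sketched as follows.  This
is a minimal illustration, not the code of contrib/analyze_brprob.py: the
struct and function names are hypothetical.  It assumes that coverage is
the total execution count of the branches a heuristic matched, that hit
rate is the share of those executions that went in the predicted direction,
and that the second HITRATE figure is the best rate any static prediction
could reach on the same branches.

```c
#include <assert.h>
#include <stddef.h>

/* One branch matched by a heuristic (hypothetical layout).  */
struct branch_sample {
  long long count;  /* executions of the branch in the training run */
  long long hits;   /* executions that went in the predicted direction */
};

struct heuristic_stats {
  long long coverage;   /* sum of execution counts */
  double hitrate;       /* percent of executions predicted correctly */
  double best_hitrate;  /* percent reachable by always picking the
                           more frequent direction of each branch */
};

/* Summarize N branch samples for one heuristic.  */
static struct heuristic_stats
summarize (const struct branch_sample *b, size_t n)
{
  long long count = 0, hits = 0, fits = 0;
  for (size_t i = 0; i < n; i++)
    {
      assert (b[i].hits >= 0 && b[i].hits <= b[i].count);
      long long misses = b[i].count - b[i].hits;
      count += b[i].count;
      hits += b[i].hits;
      fits += b[i].hits > misses ? b[i].hits : misses;
    }

  struct heuristic_stats s = { count, 0.0, 0.0 };
  if (count > 0)
    {
      s.hitrate = 100.0 * (double) hits / (double) count;
      s.best_hitrate = 100.0 * (double) fits / (double) count;
    }
  return s;
}
```

A large gap between the two rates would suggest the heuristic often
predicts the wrong direction on branches that are themselves quite biased.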
> 
> gcc/testsuite/ChangeLog:
> 
> 	PR middle-end/103270
> 	* gcc.dg/pr103270.c: New test.
> ---
>  gcc/predict.c                   | 10 ++++++----
>  gcc/testsuite/gcc.dg/pr103270.c | 19 +++++++++++++++++++
>  2 files changed, 25 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr103270.c
> 
> diff --git a/gcc/predict.c b/gcc/predict.c
> index 3cb4e3c0eb5..5b6e0cf722b 100644
> --- a/gcc/predict.c
> +++ b/gcc/predict.c
> @@ -1859,7 +1859,7 @@ predict_iv_comparison (class loop *loop, basic_block bb,
>     exits to predict them using PRED_LOOP_EXTRA_EXIT.  */
>  
>  static void
> -predict_extra_loop_exits (edge exit_edge)
> +predict_extra_loop_exits (class loop *loop, edge exit_edge)
>  {
>    unsigned i;
>    bool check_value_one;
> @@ -1912,12 +1912,14 @@ predict_extra_loop_exits (edge exit_edge)
>  	continue;
>        if (EDGE_COUNT (e->src->succs) != 1)
>  	{
> -	  predict_paths_leading_to_edge (e, PRED_LOOP_EXTRA_EXIT, NOT_TAKEN);
> +	  predict_paths_leading_to_edge (e, PRED_LOOP_EXTRA_EXIT, NOT_TAKEN,
> +					 loop);
>  	  continue;
>  	}
>  
>        FOR_EACH_EDGE (e1, ei, e->src->preds)
> -	predict_paths_leading_to_edge (e1, PRED_LOOP_EXTRA_EXIT, NOT_TAKEN);
> +	predict_paths_leading_to_edge (e1, PRED_LOOP_EXTRA_EXIT, NOT_TAKEN,
> +				       loop);
>      }
>  }
>  
> @@ -2008,7 +2010,7 @@ predict_loops (void)
>  			 ex->src->index, ex->dest->index);
>  	      continue;
>  	    }
> -	  predict_extra_loop_exits (ex);
> +	  predict_extra_loop_exits (loop, ex);
>  
>  	  if (number_of_iterations_exit (loop, ex, &niter_desc, false, false))
>  	    niter = niter_desc.niter;
> diff --git a/gcc/testsuite/gcc.dg/pr103270.c b/gcc/testsuite/gcc.dg/pr103270.c
> new file mode 100644
> index 00000000000..819310e360e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr103270.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-profile_estimate" } */
> +
> +void test(int a, int* i)
> +{
> +  for (; a < 5; ++a)
> +    {
> +      int b = 0;
> +      int c = 0;
> +      for (; b != -11; b--)
> +	for (int d = 0; d ==0; d++)
> +	  {
> +	    *i += c & a;
> +	    c = b;
> +	  }
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-not "extra loop exit heuristics of edge\[^:\]*:" "profile_estimate"} } */
> -- 
> 2.25.1
>
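
To make the shape of the loop nest in pr103270.c easier to see, here is an
instrumented copy of test() that counts how often the innermost body runs.
The counter and the function name count_body_executions are additions for
illustration only; the counts describe the source-level behaviour, not what
GCC's profile estimation computes.

```c
#include <assert.h>

/* Instrumented copy of test() from pr103270.c.  Returns the number of
   times the innermost loop body executes.  */
static long
count_body_executions (int a, int *i)
{
  long executions = 0;
  for (; a < 5; ++a)
    {
      int b = 0;
      int c = 0;
      /* Runs for b = 0, -1, ..., -10: 11 iterations per outer iteration.  */
      for (; b != -11; b--)
        /* Runs exactly once: d is 0 on entry and 1 after the increment.  */
        for (int d = 0; d == 0; d++)
          {
            *i += c & a;
            c = b;
            executions++;
          }
    }
  return executions;
}
```

Starting from a = 0 the body runs 5 * 11 = 55 times, and with a >= 5 it
never runs.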

Xionghu Luo Dec. 14, 2021, 9:27 a.m. UTC | #3

On 2021/12/13 17:25, Jan Hubicka wrote:
>> r12-4526 cancelled jump threading paths that rotate loops.  This exposes
>> an issue in the profile_estimate pass: in predict_extra_loop_exits, an
>> outer loop's exit edge is marked as an extra exit of the inner loop and
>> given an incorrect prediction, so a hot inner loop eventually becomes
>> cold after later optimizations.  This patch adds a loop check when
>> searching for extra exit edges, to avoid unexpected predict_edge calls
>> from predict_paths_for_bb.
>>
>> Regression tested on P8LE, OK for master?
>>
>> gcc/ChangeLog:
>>
>> 	PR middle-end/103270
>> 	* predict.c (predict_extra_loop_exits): Add loop parameter.
>> 	(predict_loops): Call with loop argument.
> 
> With changes to branch predictors it is useful to re-test their
> effectiveness on SPEC and see if their hit rates still match
> reality.  You can do it by building SPEC with -fprofile-generate,
> training it, then rebuilding with -fprofile-use
> -fdump-tree-ipa-profile-details and running contrib/analyze_brprob.py,
> which collects info on how they work.
> 
> This patch looks good to me, but it would be nice to have things reality
> checked (and since we did not do the stats for some time, there may be
> surprises) so if you could run the specs and post results of
> analyze_brprob, it would be great.  I will also try to get to that soon,
> but currently I am bit swamped by other problems I noticed on clang
> builds.
> 
> Thanks a lot for working on profile fixes - I am trying now to get
> things into shape.  With Martin we added basic testing infrastructure
> for keeping track of profile updates and I am trying to see how it works
> in practice now.  Hopefully it will make it easier to judge on profile
> updating patches. I would welcome list of patches I should look at.
> 
> I will write separate mail on this.
> Honza


With the patch, analyze_brprob.py outputs the data below for a PGO build.
There is no verification code in the script, so how do I check whether it
is correct?  Should I run it again without the patch and compare the
"extra loop exit" row?


./contrib/analyze_brprob.py ~/workspace/tests/spec2017/dump_file_all
HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
noreturn call                                   1   0.0%      100.00%   50.00% /  50.00%              2     2.00   0.0%                     100%:1
Fortran zero-sized array                        3   0.0%       66.67%   41.71% /  60.50%            362   362.00   0.0%                     100%:3
loop iv compare                                16   0.0%       93.75%   98.26% /  98.76%         279847  279.85k   0.0%                     93%:4
__builtin_expect                               35   0.0%       97.14%   78.09% /  78.35%       17079558   17.08M   0.0%
loop guard with recursion                      45   0.1%       86.67%   85.13% /  85.14%     6722424412    6.72G   1.3%                     74%:4
extra loop exit                                80   0.1%       58.75%   81.49% /  89.21%      438470261  438.47M   0.1%                     86%:3
guess loop iv compare                         235   0.3%       80.85%   52.83% /  73.97%      148558247  148.56M   0.0%                     47%:3
negative return                               241   0.3%       71.37%   25.33% /  92.61%      250402383  250.40M   0.0%                     69%:2
loop exit with recursion                      315   0.4%       74.60%   85.07% /  85.71%     9403136858    9.40G   1.8%                     59%:4
const return                                  320   0.4%       51.88%   90.45% /  95.63%      925341727  925.34M   0.2%                     76%:5
indirect call                                 377   0.5%       51.46%   84.72% /  91.14%     2133772848    2.13G   0.4%                     69%:1
polymorphic call                              410   0.5%       44.15%   31.26% /  79.37%     3272688244    3.27G   0.6%                     53%:2
recursive call                                506   0.7%       39.53%   44.97% /  83.92%     1211036806    1.21G   0.2%                     10%:1
goto                                          618   0.8%       64.24%   65.37% /  83.57%      702446178  702.45M   0.1%                     20%:1
null return                                   800   1.1%       64.62%   56.59% /  77.70%      603952067  603.95M   0.1%                     28%:2
continue                                      956   1.3%       63.70%   65.65% /  79.97%     3780303799    3.78G   0.7%                     52%:3
loop guard                                   1177   1.6%       56.33%   42.54% /  80.32%     7373601457    7.37G   1.4%                     50%:2
opcode values positive (on trees)            2020   2.7%       62.38%   64.16% /  84.44%    31695571761   31.70G   6.0%                     21%:2
loop exit                                    3293   4.4%       76.19%   87.18% /  88.35%    50377138963   50.38G   9.6%                     18%:1
loop iterations                              4761   6.3%       99.98%   84.27% /  84.27%    73463634555   73.46G  13.9%
pointer (on trees)                           8076  10.7%       56.23%   69.36% /  83.15%    12322099991   12.32G   2.3%
call                                        11396  15.1%       64.14%   74.13% /  89.82%    25197949198   25.20G   4.8%                     34%:1
opcode values nonequal (on trees)           12237  16.3%       70.70%   70.86% /  83.54%    36638772333   36.64G   6.9%
guessed loop iterations                     16760  22.3%       99.78%   91.49% /  91.49%   162952747918  162.95G  30.9%

HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
no prediction                               12730  16.9%       39.29%   33.32% /  79.93%   121106031835  121.11G  23.0%
first match                                 25261  33.6%       92.17%   88.33% /  88.98%   296652487962  296.65G  56.3%
DS theory                                   28333  37.7%       63.03%   72.05% /  85.00%   109563734005  109.56G  20.8%
combined                                    75232 100.0%       73.17%   72.32% /  86.08%   527351738575  527.35G 100.0%

Loop count: 37870
  avg. # of iter: 8444.77
  median # of iter: 7.00
  avg. (1% cutoff) # of iter: 174.68
  avg. (5% cutoff) # of iter: 55.14
  avg. (10% cutoff) # of iter: 35.21
  avg. (20% cutoff) # of iter: 26.23
  avg. (30% cutoff) # of iter: 21.70



Xionghu Luo Dec. 15, 2021, 6:40 a.m. UTC | #4

On 2021/12/14 17:27, Xionghu Luo via Gcc-patches wrote:
> 
> 
> On 2021/12/13 17:25, Jan Hubicka wrote:
>>> r12-4526 cancelled jump threading paths that rotate loops.  This exposes
>>> an issue in the profile_estimate pass: in predict_extra_loop_exits, an
>>> outer loop's exit edge is marked as an extra exit of the inner loop and
>>> given an incorrect prediction, so a hot inner loop eventually becomes
>>> cold after later optimizations.  This patch adds a loop check when
>>> searching for extra exit edges, to avoid unexpected predict_edge calls
>>> from predict_paths_for_bb.
>>>
>>> Regression tested on P8LE, OK for master?
>>>
>>> gcc/ChangeLog:
>>>
>>> 	PR middle-end/103270
>>> 	* predict.c (predict_extra_loop_exits): Add loop parameter.
>>> 	(predict_loops): Call with loop argument.
>>
>> With changes to branch predictors it is useful to re-test their
>> effectiveness on SPEC and see if their hit rates still match
>> reality.  You can do it by building SPEC with -fprofile-generate,
>> training it, then rebuilding with -fprofile-use
>> -fdump-tree-ipa-profile-details and running contrib/analyze_brprob.py,
>> which collects info on how they work.
>>
>> This patch looks good to me, but it would be nice to have things reality
>> checked (and since we did not do the stats for some time, there may be
>> surprises) so if you could run the specs and post results of
>> analyze_brprob, it would be great.  I will also try to get to that soon,
>> but currently I am bit swamped by other problems I noticed on clang
>> builds.
>>
>> Thanks a lot for working on profile fixes - I am trying now to get
>> things into shape.  With Martin we added basic testing infrastructure
>> for keeping track of profile updates and I am trying to see how it works
>> in practice now.  Hopefully it will make it easier to judge on profile
>> updating patches. I would welcome list of patches I should look at.
>>
>> I will write separate mail on this.
>> Honza
> 
> 
> With the patch, analyze_brprob.py outputs the data below for a PGO build.
> There is no verification code in the script, so how do I check whether it
> is correct?  Should I run it again without the patch and compare the
> "extra loop exit" row?
> 
> 
> ./contrib/analyze_brprob.py ~/workspace/tests/spec2017/dump_file_all
> HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
> noreturn call                                   1   0.0%      100.00%   50.00% /  50.00%              2     2.00   0.0%                     100%:1
> Fortran zero-sized array                        3   0.0%       66.67%   41.71% /  60.50%            362   362.00   0.0%                     100%:3
> loop iv compare                                16   0.0%       93.75%   98.26% /  98.76%         279847  279.85k   0.0%                     93%:4
> __builtin_expect                               35   0.0%       97.14%   78.09% /  78.35%       17079558   17.08M   0.0%
> loop guard with recursion                      45   0.1%       86.67%   85.13% /  85.14%     6722424412    6.72G   1.3%                     74%:4
> extra loop exit                                80   0.1%       58.75%   81.49% /  89.21%      438470261  438.47M   0.1%                     86%:3
> guess loop iv compare                         235   0.3%       80.85%   52.83% /  73.97%      148558247  148.56M   0.0%                     47%:3
> negative return                               241   0.3%       71.37%   25.33% /  92.61%      250402383  250.40M   0.0%                     69%:2
> loop exit with recursion                      315   0.4%       74.60%   85.07% /  85.71%     9403136858    9.40G   1.8%                     59%:4
> const return                                  320   0.4%       51.88%   90.45% /  95.63%      925341727  925.34M   0.2%                     76%:5
> indirect call                                 377   0.5%       51.46%   84.72% /  91.14%     2133772848    2.13G   0.4%                     69%:1
> polymorphic call                              410   0.5%       44.15%   31.26% /  79.37%     3272688244    3.27G   0.6%                     53%:2
> recursive call                                506   0.7%       39.53%   44.97% /  83.92%     1211036806    1.21G   0.2%                     10%:1
> goto                                          618   0.8%       64.24%   65.37% /  83.57%      702446178  702.45M   0.1%                     20%:1
> null return                                   800   1.1%       64.62%   56.59% /  77.70%      603952067  603.95M   0.1%                     28%:2
> continue                                      956   1.3%       63.70%   65.65% /  79.97%     3780303799    3.78G   0.7%                     52%:3
> loop guard                                   1177   1.6%       56.33%   42.54% /  80.32%     7373601457    7.37G   1.4%                     50%:2
> opcode values positive (on trees)            2020   2.7%       62.38%   64.16% /  84.44%    31695571761   31.70G   6.0%                     21%:2
> loop exit                                    3293   4.4%       76.19%   87.18% /  88.35%    50377138963   50.38G   9.6%                     18%:1
> loop iterations                              4761   6.3%       99.98%   84.27% /  84.27%    73463634555   73.46G  13.9%
> pointer (on trees)                           8076  10.7%       56.23%   69.36% /  83.15%    12322099991   12.32G   2.3%
> call                                        11396  15.1%       64.14%   74.13% /  89.82%    25197949198   25.20G   4.8%                     34%:1
> opcode values nonequal (on trees)           12237  16.3%       70.70%   70.86% /  83.54%    36638772333   36.64G   6.9%
> guessed loop iterations                     16760  22.3%       99.78%   91.49% /  91.49%   162952747918  162.95G  30.9%
> 
> HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
> no prediction                               12730  16.9%       39.29%   33.32% /  79.93%   121106031835  121.11G  23.0%
> first match                                 25261  33.6%       92.17%   88.33% /  88.98%   296652487962  296.65G  56.3%
> DS theory                                   28333  37.7%       63.03%   72.05% /  85.00%   109563734005  109.56G  20.8%
> combined                                    75232 100.0%       73.17%   72.32% /  86.08%   527351738575  527.35G 100.0%
> 
> Loop count: 37870
>   avg. # of iter: 8444.77
>   median # of iter: 7.00
>   avg. (1% cutoff) # of iter: 174.68
>   avg. (5% cutoff) # of iter: 55.14
>   avg. (10% cutoff) # of iter: 35.21
>   avg. (20% cutoff) # of iter: 26.23
>   avg. (30% cutoff) # of iter: 21.70

This is the output collected without the patch.  As can be seen, there is
no difference in the "extra loop exit" row.  But the issue should still be
fixed.


./contrib/analyze_brprob_spec.py ~/workspace/tests/spec2017/

benchspec
HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
noreturn call                                   1   0.0%      100.00%   50.00% /  50.00%              2     2.00   0.0%                     100%:1
Fortran zero-sized array                        3   0.0%       66.67%   41.71% /  60.50%            362   362.00   0.0%                     100%:3
loop iv compare                                16   0.0%       93.75%   98.26% /  98.76%         279847  279.85k   0.0%                     93%:4
__builtin_expect                               35   0.0%       97.14%   78.09% /  78.35%       17079558   17.08M   0.0%
loop guard with recursion                      45   0.1%       86.67%   85.13% /  85.14%     6722424412    6.72G   1.3%                     74%:4
extra loop exit                                80   0.1%       58.75%   81.49% /  89.21%      438470261  438.47M   0.1%                     86%:3
guess loop iv compare                         235   0.3%       80.85%   52.83% /  73.97%      148558247  148.56M   0.0%                     47%:3
negative return                               241   0.3%       71.37%   25.33% /  92.61%      250402383  250.40M   0.0%                     69%:2
loop exit with recursion                      315   0.4%       74.60%   85.07% /  85.71%     9403136858    9.40G   1.8%                     59%:4
const return                                  320   0.4%       51.88%   90.45% /  95.63%      925341727  925.34M   0.2%                     76%:5
indirect call                                 377   0.5%       51.46%   84.72% /  91.14%     2133772848    2.13G   0.4%                     69%:1
polymorphic call                              410   0.5%       44.15%   31.26% /  79.37%     3272688238    3.27G   0.6%                     53%:2
recursive call                                506   0.7%       39.53%   44.97% /  83.92%     1211036806    1.21G   0.2%                     10%:1
goto                                          618   0.8%       64.24%   65.37% /  83.57%      702446178  702.45M   0.1%                     20%:1
null return                                   800   1.1%       64.62%   56.59% /  77.70%      603952067  603.95M   0.1%                     28%:2
continue                                      956   1.3%       63.70%   65.65% /  79.97%     3780303795    3.78G   0.7%                     52%:3
loop guard                                   1178   1.6%       56.37%   42.54% /  80.32%     7373601533    7.37G   1.4%                     50%:2
opcode values positive (on trees)            2020   2.7%       62.38%   64.16% /  84.44%    31695571761   31.70G   5.9%                     21%:2
loop exit                                    3293   4.4%       76.19%   87.18% /  88.35%    50377138963   50.38G   9.4%                     18%:1
loop iterations                              4772   6.3%       99.98%   84.27% /  84.27%    74045982111   74.05G  13.8%
pointer (on trees)                           8076  10.7%       56.23%   69.36% /  83.15%    12322099991   12.32G   2.3%
call                                        11396  15.1%       64.14%   74.13% /  89.82%    25197949198   25.20G   4.7%                     34%:1
opcode values nonequal (on trees)           12240  16.2%       70.71%   70.86% /  83.54%    36638772682   36.64G   6.9%
guessed loop iterations                     16854  22.4%       99.78%   91.21% /  91.22%   169765264401  169.77G  31.7%

HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
no prediction                               12731  16.9%       39.30%   33.32% /  79.93%   121106031963  121.11G  22.6%
first match                                 25366  33.7%       92.20%   88.24% /  88.88%   304047352001  304.05G  56.9%
DS theory                                   28337  37.6%       63.03%   72.05% /  85.00%   109563734430  109.56G  20.5%
combined                                    75342 100.0%       73.21%   72.49% /  86.06%   534746603167  534.75G 100.0%

Loop count: 38058
  avg. # of iter: 8403.32
  median # of iter: 7.00
  avg. (1% cutoff) # of iter: 173.72
  avg. (5% cutoff) # of iter: 54.90
  avg. (10% cutoff) # of iter: 35.20
  avg. (20% cutoff) # of iter: 26.35
  avg. (30% cutoff) # of iter: 21.87
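
Comparing one row between the two runs can also be done mechanically.  The
sketch below is an assumption-laden illustration, not part of the contrib
scripts: struct brprob_row and both functions are hypothetical, and the
sscanf format assumes the column layout shown in the tables above
(branches, relative share, branch hit rate, hit rate / best hit rate,
coverage).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Leading numeric columns of one table row (hypothetical layout).  */
struct brprob_row {
  long branches;
  double branches_rel;   /* percent */
  double br_hitrate;     /* percent */
  double hitrate;        /* percent */
  double best_hitrate;   /* percent */
  long long coverage;
};

/* Parse LINE if it is the row for heuristic NAME.  Returns 1 on success,
   0 if the line belongs to another heuristic or does not match.  */
static int
parse_brprob_row (const char *line, const char *name, struct brprob_row *out)
{
  size_t len = strlen (name);
  if (strncmp (line, name, len) != 0)
    return 0;
  /* A longer name sharing the prefix (e.g. "loop exit with recursion")
     fails here because the first field is not a number.  */
  int n = sscanf (line + len, " %ld %lf%% %lf%% %lf%% / %lf%% %lld",
                  &out->branches, &out->branches_rel, &out->br_hitrate,
                  &out->hitrate, &out->best_hitrate, &out->coverage);
  return n == 6;
}

/* Return 1 if the counts and hit rates of two rows agree.  */
static int
same_row (const struct brprob_row *a, const struct brprob_row *b)
{
  double d1 = a->br_hitrate - b->br_hitrate;
  double d2 = a->hitrate - b->hitrate;
  double d3 = a->best_hitrate - b->best_hitrate;
  return a->branches == b->branches
         && a->coverage == b->coverage
         && d1 > -0.005 && d1 < 0.005
         && d2 > -0.005 && d2 < 0.005
         && d3 > -0.005 && d3 < 0.005;
}
```

Applied to the two tables in this thread, the "extra loop exit" rows agree
(80 branches, 58.75%, 81.49% / 89.21%, coverage 438470261), while the
"loop guard" rows differ (1177 versus 1178 branches).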

Jan Hubicka Dec. 16, 2021, 11:18 a.m. UTC | #5

> > 
> > 
> > ./contrib/analyze_brprob.py ~/workspace/tests/spec2017/dump_file_all
> > HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
> > noreturn call                                   1   0.0%      100.00%   50.00% /  50.00%              2     2.00   0.0%                     100%:1
> > Fortran zero-sized array                        3   0.0%       66.67%   41.71% /  60.50%            362   362.00   0.0%                     100%:3
> > loop iv compare                                16   0.0%       93.75%   98.26% /  98.76%         279847  279.85k   0.0%                     93%:4
> > __builtin_expect                               35   0.0%       97.14%   78.09% /  78.35%       17079558   17.08M   0.0%
> > loop guard with recursion                      45   0.1%       86.67%   85.13% /  85.14%     6722424412    6.72G   1.3%                     74%:4
> > extra loop exit                                80   0.1%       58.75%   81.49% /  89.21%      438470261  438.47M   0.1%                     86%:3
> > guess loop iv compare                         235   0.3%       80.85%   52.83% /  73.97%      148558247  148.56M   0.0%                     47%:3
> > negative return                               241   0.3%       71.37%   25.33% /  92.61%      250402383  250.40M   0.0%                     69%:2
> > loop exit with recursion                      315   0.4%       74.60%   85.07% /  85.71%     9403136858    9.40G   1.8%                     59%:4
> > const return                                  320   0.4%       51.88%   90.45% /  95.63%      925341727  925.34M   0.2%                     76%:5
> > indirect call                                 377   0.5%       51.46%   84.72% /  91.14%     2133772848    2.13G   0.4%                     69%:1
> > polymorphic call                              410   0.5%       44.15%   31.26% /  79.37%     3272688244    3.27G   0.6%                     53%:2
> > recursive call                                506   0.7%       39.53%   44.97% /  83.92%     1211036806    1.21G   0.2%                     10%:1
> > goto                                          618   0.8%       64.24%   65.37% /  83.57%      702446178  702.45M   0.1%                     20%:1
> > null return                                   800   1.1%       64.62%   56.59% /  77.70%      603952067  603.95M   0.1%                     28%:2
> > continue                                      956   1.3%       63.70%   65.65% /  79.97%     3780303799    3.78G   0.7%                     52%:3
> > loop guard                                   1177   1.6%       56.33%   42.54% /  80.32%     7373601457    7.37G   1.4%                     50%:2
> > opcode values positive (on trees)            2020   2.7%       62.38%   64.16% /  84.44%    31695571761   31.70G   6.0%                     21%:2
> > loop exit                                    3293   4.4%       76.19%   87.18% /  88.35%    50377138963   50.38G   9.6%                     18%:1
> > loop iterations                              4761   6.3%       99.98%   84.27% /  84.27%    73463634555   73.46G  13.9%
> > pointer (on trees)                           8076  10.7%       56.23%   69.36% /  83.15%    12322099991   12.32G   2.3%
> > call                                        11396  15.1%       64.14%   74.13% /  89.82%    25197949198   25.20G   4.8%                     34%:1
> > opcode values nonequal (on trees)           12237  16.3%       70.70%   70.86% /  83.54%    36638772333   36.64G   6.9%
> > guessed loop iterations                     16760  22.3%       99.78%   91.49% /  91.49%   162952747918  162.95G  30.9%
> > 
> > HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
> > no prediction                               12730  16.9%       39.29%   33.32% /  79.93%   121106031835  121.11G  23.0%
> > first match                                 25261  33.6%       92.17%   88.33% /  88.98%   296652487962  296.65G  56.3%
> > DS theory                                   28333  37.7%       63.03%   72.05% /  85.00%   109563734005  109.56G  20.8%
> > combined                                    75232 100.0%       73.17%   72.32% /  86.08%   527351738575  527.35G 100.0%
> > 
> > Loop count: 37870
> >   avg. # of iter: 8444.77
> >   median # of iter: 7.00
> >   avg. (1% cutoff) # of iter: 174.68
> >   avg. (5% cutoff) # of iter: 55.14
> >   avg. (10% cutoff) # of iter: 35.21
> >   avg. (20% cutoff) # of iter: 26.23
> >   avg. (30% cutoff) # of iter: 21.70
> 
> This is the output collected without the patch. As can be seen, there is no difference for "extra loop exit",
> but the issue should still be fixed.
> 
> 
> ./contrib/analyze_brprob_spec.py ~/workspace/tests/spec2017/
> 
> benchspec
> HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
> noreturn call                                   1   0.0%      100.00%   50.00% /  50.00%              2     2.00   0.0%                     100%:1
> Fortran zero-sized array                        3   0.0%       66.67%   41.71% /  60.50%            362   362.00   0.0%                     100%:3
> loop iv compare                                16   0.0%       93.75%   98.26% /  98.76%         279847  279.85k   0.0%                     93%:4
> __builtin_expect                               35   0.0%       97.14%   78.09% /  78.35%       17079558   17.08M   0.0%
> loop guard with recursion                      45   0.1%       86.67%   85.13% /  85.14%     6722424412    6.72G   1.3%                     74%:4
> extra loop exit                                80   0.1%       58.75%   81.49% /  89.21%      438470261  438.47M   0.1%                     86%:3
> guess loop iv compare                         235   0.3%       80.85%   52.83% /  73.97%      148558247  148.56M   0.0%                     47%:3
> negative return                               241   0.3%       71.37%   25.33% /  92.61%      250402383  250.40M   0.0%                     69%:2
> loop exit with recursion                      315   0.4%       74.60%   85.07% /  85.71%     9403136858    9.40G   1.8%                     59%:4
> const return                                  320   0.4%       51.88%   90.45% /  95.63%      925341727  925.34M   0.2%                     76%:5
> indirect call                                 377   0.5%       51.46%   84.72% /  91.14%     2133772848    2.13G   0.4%                     69%:1
> polymorphic call                              410   0.5%       44.15%   31.26% /  79.37%     3272688238    3.27G   0.6%                     53%:2
> recursive call                                506   0.7%       39.53%   44.97% /  83.92%     1211036806    1.21G   0.2%                     10%:1
> goto                                          618   0.8%       64.24%   65.37% /  83.57%      702446178  702.45M   0.1%                     20%:1
> null return                                   800   1.1%       64.62%   56.59% /  77.70%      603952067  603.95M   0.1%                     28%:2
> continue                                      956   1.3%       63.70%   65.65% /  79.97%     3780303795    3.78G   0.7%                     52%:3
> loop guard                                   1178   1.6%       56.37%   42.54% /  80.32%     7373601533    7.37G   1.4%                     50%:2
> opcode values positive (on trees)            2020   2.7%       62.38%   64.16% /  84.44%    31695571761   31.70G   5.9%                     21%:2
> loop exit                                    3293   4.4%       76.19%   87.18% /  88.35%    50377138963   50.38G   9.4%                     18%:1
> loop iterations                              4772   6.3%       99.98%   84.27% /  84.27%    74045982111   74.05G  13.8%
> pointer (on trees)                           8076  10.7%       56.23%   69.36% /  83.15%    12322099991   12.32G   2.3%
> call                                        11396  15.1%       64.14%   74.13% /  89.82%    25197949198   25.20G   4.7%                     34%:1
> opcode values nonequal (on trees)           12240  16.2%       70.71%   70.86% /  83.54%    36638772682   36.64G   6.9%
> guessed loop iterations                     16854  22.4%       99.78%   91.21% /  91.22%   169765264401  169.77G  31.7%
> 
> HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
> no prediction                               12731  16.9%       39.30%   33.32% /  79.93%   121106031963  121.11G  22.6%
> first match                                 25366  33.7%       92.20%   88.24% /  88.88%   304047352001  304.05G  56.9%
> DS theory                                   28337  37.6%       63.03%   72.05% /  85.00%   109563734430  109.56G  20.5%
> combined                                    75342 100.0%       73.21%   72.49% /  86.06%   534746603167  534.75G 100.0%

Thank you.  So it seems that the problem does not trigger in SPEC, but I
was also wondering whether our current predict.def values are anywhere
near reality.

The table reads as follows:
 - BRANCHES is the number of branches the heuristic hit on (extra loop
   exit has only 80, so we do not have very good statistics on it).
 - HITRATE is the probability that the branch goes in the predicted
   direction during the train run.
   The value after / is what a perfect predictor would reach
   (one that predicts each branch in the direction that dominates
   during train).
   Extra loop exit is 81% out of 89%, so it is pretty close to optimum.
 - COVERAGE is how many times the predicted branch was executed.
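
To make the two HITRATE columns concrete, here is a minimal sketch of
how they can be derived from per-branch counts.  It is not the code in
analyze_brprob.py; the function name and the (hits, executed) input
format are assumptions made for illustration only.

```python
def hitrate_summary(branches):
    """Return (hitrate, perfect_hitrate) for one heuristic.

    `branches` is a list of (hits, executed) pairs, one per branch:
    `executed` is how often the branch ran in the train run and `hits`
    is how often it went in the predicted direction.  (Hypothetical
    input format, chosen for this example.)
    """
    executed = sum(n for _, n in branches)
    hits = sum(h for h, _ in branches)
    # A perfect static predictor picks, for every branch, whichever
    # direction dominated during the train run.
    best = sum(max(h, n - h) for h, n in branches)
    return hits / executed, best / executed


if __name__ == "__main__":
    # One branch predicted well, one predicted in the wrong direction.
    rate, perfect = hitrate_summary([(90, 100), (20, 100)])
    print(f"{100 * rate:.2f}% / {100 * perfect:.2f}%")
```

The gap between the two numbers is the headroom a better static
prediction could still recover for that heuristic.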

In general the idea is that for most heuristics (those which cannot
determine an exact value the way loop iterations can) the HITRATE values
can be put into predict.def so that the Dempster-Shafer formula (DS
theory) combines the hypotheses somewhat realistically (it assumes that
all the predictors are statistically independent, which they are not).
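
As a rough illustration of that combination step, the sketch below
applies the two-outcome Dempster-Shafer rule to a list of probabilities.
It is a simplified model under the independence assumption mentioned
above, not a copy of the predict.c code; the function name and the use
of plain floats are assumptions.

```python
def ds_combine(probabilities):
    """Combine per-heuristic probabilities, treating them as independent."""
    combined = 0.5  # no evidence yet: both directions equally likely
    for p in probabilities:
        # Weight of "everything so far says taken" versus
        # "everything so far says not taken".
        agree = combined * p
        disagree = (1.0 - combined) * (1.0 - p)
        total = agree + disagree
        # Fully contradictory evidence (0 against 1) leaves nothing to
        # go on, so fall back to an even split.
        combined = 0.5 if total == 0.0 else agree / total
    return combined


if __name__ == "__main__":
    print(ds_combine([0.67]))        # a single predictor is kept as-is
    print(ds_combine([0.67, 0.67]))  # two agreeing predictors reinforce
    print(ds_combine([0.67, 0.33]))  # opposite predictors cancel out
```

Two agreeing 67% predictors come out at roughly 80%, which is why the
values in predict.def matter: an overconfident entry gets amplified
whenever another heuristic points the same way.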

We have HITRATE 67 for extra loop exit, which is a bit off from what we
have in the measured data, but I think our predict.def is still based on
spec2006 numbers.

So the patch is OK.  Perhaps we could experiment with updating
predict.def (the values do drift even when run across the same benchmark
suite, since early optimizations change; this stage1 I think the
threading work definitely affects the situation substantially).

Honza
> 
> Loop count: 38058
>   avg. # of iter: 8403.32
>   median # of iter: 7.00
>   avg. (1% cutoff) # of iter: 173.72
>   avg. (5% cutoff) # of iter: 54.90
>   avg. (10% cutoff) # of iter: 35.20
>   avg. (20% cutoff) # of iter: 26.35
>   avg. (30% cutoff) # of iter: 21.87
> 
> 
> -- 
> Thanks,
> Xionghu

Xionghu Luo Dec. 21, 2021, 3:56 a.m. UTC | #6

On 2021/12/16 19:18, Jan Hubicka wrote:
>>>
>>>
>>> ./contrib/analyze_brprob.py ~/workspace/tests/spec2017/dump_file_all
>>> HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
>>> noreturn call                                   1   0.0%      100.00%   50.00% /  50.00%              2     2.00   0.0%                     100%:1
>>> Fortran zero-sized array                        3   0.0%       66.67%   41.71% /  60.50%            362   362.00   0.0%                     100%:3
>>> loop iv compare                                16   0.0%       93.75%   98.26% /  98.76%         279847  279.85k   0.0%                     93%:4
>>> __builtin_expect                               35   0.0%       97.14%   78.09% /  78.35%       17079558   17.08M   0.0%
>>> loop guard with recursion                      45   0.1%       86.67%   85.13% /  85.14%     6722424412    6.72G   1.3%                     74%:4
>>> extra loop exit                                80   0.1%       58.75%   81.49% /  89.21%      438470261  438.47M   0.1%                     86%:3
>>> guess loop iv compare                         235   0.3%       80.85%   52.83% /  73.97%      148558247  148.56M   0.0%                     47%:3
>>> negative return                               241   0.3%       71.37%   25.33% /  92.61%      250402383  250.40M   0.0%                     69%:2
>>> loop exit with recursion                      315   0.4%       74.60%   85.07% /  85.71%     9403136858    9.40G   1.8%                     59%:4
>>> const return                                  320   0.4%       51.88%   90.45% /  95.63%      925341727  925.34M   0.2%                     76%:5
>>> indirect call                                 377   0.5%       51.46%   84.72% /  91.14%     2133772848    2.13G   0.4%                     69%:1
>>> polymorphic call                              410   0.5%       44.15%   31.26% /  79.37%     3272688244    3.27G   0.6%                     53%:2
>>> recursive call                                506   0.7%       39.53%   44.97% /  83.92%     1211036806    1.21G   0.2%                     10%:1
>>> goto                                          618   0.8%       64.24%   65.37% /  83.57%      702446178  702.45M   0.1%                     20%:1
>>> null return                                   800   1.1%       64.62%   56.59% /  77.70%      603952067  603.95M   0.1%                     28%:2
>>> continue                                      956   1.3%       63.70%   65.65% /  79.97%     3780303799    3.78G   0.7%                     52%:3
>>> loop guard                                   1177   1.6%       56.33%   42.54% /  80.32%     7373601457    7.37G   1.4%                     50%:2
>>> opcode values positive (on trees)            2020   2.7%       62.38%   64.16% /  84.44%    31695571761   31.70G   6.0%                     21%:2
>>> loop exit                                    3293   4.4%       76.19%   87.18% /  88.35%    50377138963   50.38G   9.6%                     18%:1
>>> loop iterations                              4761   6.3%       99.98%   84.27% /  84.27%    73463634555   73.46G  13.9%
>>> pointer (on trees)                           8076  10.7%       56.23%   69.36% /  83.15%    12322099991   12.32G   2.3%
>>> call                                        11396  15.1%       64.14%   74.13% /  89.82%    25197949198   25.20G   4.8%                     34%:1
>>> opcode values nonequal (on trees)           12237  16.3%       70.70%   70.86% /  83.54%    36638772333   36.64G   6.9%
>>> guessed loop iterations                     16760  22.3%       99.78%   91.49% /  91.49%   162952747918  162.95G  30.9%
>>>
>>> HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
>>> no prediction                               12730  16.9%       39.29%   33.32% /  79.93%   121106031835  121.11G  23.0%
>>> first match                                 25261  33.6%       92.17%   88.33% /  88.98%   296652487962  296.65G  56.3%
>>> DS theory                                   28333  37.7%       63.03%   72.05% /  85.00%   109563734005  109.56G  20.8%
>>> combined                                    75232 100.0%       73.17%   72.32% /  86.08%   527351738575  527.35G 100.0%
>>>
>>> Loop count: 37870
>>>   avg. # of iter: 8444.77
>>>   median # of iter: 7.00
>>>   avg. (1% cutoff) # of iter: 174.68
>>>   avg. (5% cutoff) # of iter: 55.14
>>>   avg. (10% cutoff) # of iter: 35.21
>>>   avg. (20% cutoff) # of iter: 26.23
>>>   avg. (30% cutoff) # of iter: 21.70
>>
>> This is the output collected without the patch. As can be seen, there is no difference for "extra loop exit",
>> but the issue should still be fixed.
>>
>>
>> ./contrib/analyze_brprob_spec.py ~/workspace/tests/spec2017/
>>
>> benchspec
>> HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
>> noreturn call                                   1   0.0%      100.00%   50.00% /  50.00%              2     2.00   0.0%                     100%:1
>> Fortran zero-sized array                        3   0.0%       66.67%   41.71% /  60.50%            362   362.00   0.0%                     100%:3
>> loop iv compare                                16   0.0%       93.75%   98.26% /  98.76%         279847  279.85k   0.0%                     93%:4
>> __builtin_expect                               35   0.0%       97.14%   78.09% /  78.35%       17079558   17.08M   0.0%
>> loop guard with recursion                      45   0.1%       86.67%   85.13% /  85.14%     6722424412    6.72G   1.3%                     74%:4
>> extra loop exit                                80   0.1%       58.75%   81.49% /  89.21%      438470261  438.47M   0.1%                     86%:3
>> guess loop iv compare                         235   0.3%       80.85%   52.83% /  73.97%      148558247  148.56M   0.0%                     47%:3
>> negative return                               241   0.3%       71.37%   25.33% /  92.61%      250402383  250.40M   0.0%                     69%:2
>> loop exit with recursion                      315   0.4%       74.60%   85.07% /  85.71%     9403136858    9.40G   1.8%                     59%:4
>> const return                                  320   0.4%       51.88%   90.45% /  95.63%      925341727  925.34M   0.2%                     76%:5
>> indirect call                                 377   0.5%       51.46%   84.72% /  91.14%     2133772848    2.13G   0.4%                     69%:1
>> polymorphic call                              410   0.5%       44.15%   31.26% /  79.37%     3272688238    3.27G   0.6%                     53%:2
>> recursive call                                506   0.7%       39.53%   44.97% /  83.92%     1211036806    1.21G   0.2%                     10%:1
>> goto                                          618   0.8%       64.24%   65.37% /  83.57%      702446178  702.45M   0.1%                     20%:1
>> null return                                   800   1.1%       64.62%   56.59% /  77.70%      603952067  603.95M   0.1%                     28%:2
>> continue                                      956   1.3%       63.70%   65.65% /  79.97%     3780303795    3.78G   0.7%                     52%:3
>> loop guard                                   1178   1.6%       56.37%   42.54% /  80.32%     7373601533    7.37G   1.4%                     50%:2
>> opcode values positive (on trees)            2020   2.7%       62.38%   64.16% /  84.44%    31695571761   31.70G   5.9%                     21%:2
>> loop exit                                    3293   4.4%       76.19%   87.18% /  88.35%    50377138963   50.38G   9.4%                     18%:1
>> loop iterations                              4772   6.3%       99.98%   84.27% /  84.27%    74045982111   74.05G  13.8%
>> pointer (on trees)                           8076  10.7%       56.23%   69.36% /  83.15%    12322099991   12.32G   2.3%
>> call                                        11396  15.1%       64.14%   74.13% /  89.82%    25197949198   25.20G   4.7%                     34%:1
>> opcode values nonequal (on trees)           12240  16.2%       70.71%   70.86% /  83.54%    36638772682   36.64G   6.9%
>> guessed loop iterations                     16854  22.4%       99.78%   91.21% /  91.22%   169765264401  169.77G  31.7%
>>
>> HEURISTICS                               BRANCHES  (REL)  BR. HITRATE            HITRATE       COVERAGE COVERAGE  (REL)  predict.def  (REL) HOT branches (>10%)
>> no prediction                               12731  16.9%       39.30%   33.32% /  79.93%   121106031963  121.11G  22.6%
>> first match                                 25366  33.7%       92.20%   88.24% /  88.88%   304047352001  304.05G  56.9%
>> DS theory                                   28337  37.6%       63.03%   72.05% /  85.00%   109563734430  109.56G  20.5%
>> combined                                    75342 100.0%       73.21%   72.49% /  86.06%   534746603167  534.75G 100.0%
> 
> Thank you.  So it seems that the problem does not trigger in SPEC, but I
> was also wondering whether our current predict.def values are anywhere
> near reality.
> 
> The table reads as follows:
>  - BRANCHES is the number of branches the heuristic hit on (extra loop
>    exit has only 80, so we do not have very good statistics on it).
>  - HITRATE is the probability that the branch goes in the predicted
>    direction during the train run.
>    The value after / is what a perfect predictor would reach
>    (one that predicts each branch in the direction that dominates
>    during train).
>    Extra loop exit is 81% out of 89%, so it is pretty close to optimum.
>  - COVERAGE is how many times the predicted branch was executed.
> 
> In general the idea is that for most heuristics (those which cannot
> determine an exact value the way loop iterations can) the HITRATE values
> can be put into predict.def so that the Dempster-Shafer formula (DS
> theory) combines the hypotheses somewhat realistically (it assumes that
> all the predictors are statistically independent, which they are not).
> 
> We have HITRATE 67 for extra loop exit, which is a bit off from what we
> have in the measured data, but I think our predict.def is still based on
> spec2006 numbers.
> 
> So the patch is OK.  Perhaps we could experiment with updating
> predict.def (the values do drift even when run across the same benchmark
> suite, since early optimizations change; this stage1 I think the
> threading work definitely affects the situation substantially).

Thanks, committed to r12-6085.

> 
> Honza
>>
>> Loop count: 38058
>>   avg. # of iter: 8403.32
>>   median # of iter: 7.00
>>   avg. (1% cutoff) # of iter: 173.72
>>   avg. (5% cutoff) # of iter: 54.90
>>   avg. (10% cutoff) # of iter: 35.20
>>   avg. (20% cutoff) # of iter: 26.35
>>   avg. (30% cutoff) # of iter: 21.87
>>
>>
>> -- 
>> Thanks,
>> Xionghu
