Increase precision of static profiles

Message ID 20171117195307.GC15395@kam.mff.cuni.cz
State New

Commit Message

Jan Hubicka Nov. 17, 2017, 7:53 p.m. UTC
Hi,
this patch makes the static profile use the range 0...2^30 rather than
0...10000.  This is safe now because profile counts take care of
possible overflow when the profile accumulates to too high a value after
inlining.
There are two testcases that need adjusting. dump-2.c simply checks
for a specific counter value, which is now different. pr77445-2
now gets one extra mismatch reported. The mismatch was present before
too, but due to low precision it was not visible.
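
To illustrate the precision argument, here is a minimal standalone sketch. It is
not code from the patch. The constants n_bits, max_count and the 2^30 scale are
taken from the diff, while scale_frequency and saturating_scale are hypothetical
helpers invented for this example and do not exist in GCC. The sketch shows how
two rare blocks that both round to 0 on the 0...10000 scale stay distinguishable
on the 0...2^30 scale, and how a cap at max_count leaves room for later scaling.

```cpp
#include <cstdint>

// Values taken from the patch: profile_count::n_bits is 61 and the new
// frequency scale is 1 << (n_bits / 2), i.e. 2^30.  The old scale is the
// 0...10000 range mentioned in the message.
constexpr int n_bits = 61;
constexpr uint64_t old_scale = 10000;
constexpr uint64_t new_scale = (uint64_t) 1 << (n_bits / 2);
constexpr uint64_t max_count = ((uint64_t) 1 << n_bits) - 2;

// Hypothetical helper (not a GCC function): map the relative frequency
// num/den (den != 0, num <= den) onto 0...scale, rounding to nearest.
// Assumes num * scale fits in 64 bits, which holds for the small inputs here.
constexpr uint64_t
scale_frequency (uint64_t num, uint64_t den, uint64_t scale)
{
  return (num * scale + den / 2) / den;
}

// Hypothetical helper (not a GCC function): multiply a count by a factor,
// capping the result at max_count instead of wrapping around.
constexpr uint64_t
saturating_scale (uint64_t count, uint64_t factor)
{
  return factor != 0 && count > max_count / factor ? max_count
						    : count * factor;
}

// On the old scale, blocks executed once per 30000 and once per 60000
// entries both round down to 0 and can no longer be told apart.
static_assert (scale_frequency (1, 30000, old_scale) == 0, "old scale");
static_assert (scale_frequency (1, 60000, old_scale) == 0, "old scale");

// On the new scale they keep distinct, nonzero values.
static_assert (scale_frequency (1, 30000, new_scale) == 35791, "new scale");
static_assert (scale_frequency (1, 60000, new_scale) == 17896, "new scale");

// Starting at 2^30 leaves 31 bits of headroom below 2^61; scaling beyond
// that hits the cap rather than overflowing.
static_assert (saturating_scale (new_scale, 1000) == 1073741824000ULL,
	       "fits below max_count");
static_assert (saturating_scale (new_scale, (uint64_t) 1 << 31) == max_count,
	       "capped at max_count");
```

The checks are compile-time only, so building the file with any C++11 compiler
is enough to verify them.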

Bootstrapped/regtested x86_64-linux, committed.

Honza

	* predict.c (determine_unlikely_bbs): Set cgraph node count to 0
	when entry block was promoted unlikely.
	(estimate_bb_frequencies): Increase frequency scale.
	* profile-count.h (profile_count): Export precision info.
	* gcc.dg/tree-ssa/dump-2.c: Fixup template for profile precision
	changes.
	* gcc.dg/tree-ssa/pr77445-2.c: Fixup template for profile precision
	changes.

Comments

Tom de Vries Nov. 19, 2017, 10:01 a.m. UTC | #1
On 11/17/2017 08:53 PM, Jan Hubicka wrote:
> Hi,
> this patch makes the static profile use the range 0...2^30 rather than
> 0...10000.  This is safe now because profile counts take care of
> possible overflow when the profile accumulates to too high a value after
> inlining.
> There are two testcases that need adjusting. dump-2.c simply checks
> for a specific counter value, which is now different. pr77445-2
> now gets one extra mismatch reported. The mismatch was present before
> too, but due to low precision it was not visible.
> 
> Bootstrapped/regtested x86_64-linux, committed.
> 
> Honza
> 
> 	* predict.c (determine_unlikely_bbs): Set cgraph node count to 0
> 	when entry block was promoted unlikely.
> 	(estimate_bb_frequencies): Increase frequency scale.
> 	* profile-count.h (profile_count): Export precision info.
> 	* gcc.dg/tree-ssa/dump-2.c: Fixup template for profile precision
> 	changes.
> 	* gcc.dg/tree-ssa/pr77445-2.c: Fixup template for profile precision
> 	changes.

Hi,

this caused PR 83043 - "FAIL: libgomp.graphite/force-parallel-1.c 
scan-tree-dump-times graphite "2 loops carried no dependency" 1 (found 0 
times)" ( https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83043 ).

Thanks,
- Tom
Christophe Lyon Nov. 20, 2017, 8:35 p.m. UTC | #2
On 19 November 2017 at 11:01, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 11/17/2017 08:53 PM, Jan Hubicka wrote:
>>
>> Hi,
>> this patch makes the static profile use the range 0...2^30 rather than
>> 0...10000.  This is safe now because profile counts take care of
>> possible overflow when the profile accumulates to too high a value after
>> inlining.
>> There are two testcases that need adjusting. dump-2.c simply checks
>> for a specific counter value, which is now different. pr77445-2
>> now gets one extra mismatch reported. The mismatch was present before
>> too, but due to low precision it was not visible.
>>
>> Bootstrapped/regtested x86_64-linux, committed.
>>
>> Honza
>>
>>         * predict.c (determine_unlikely_bbs): Set cgraph node count to 0
>>         when entry block was promoted unlikely.
>>         (estimate_bb_frequencies): Increase frequency scale.
>>         * profile-count.h (profile_count): Export precision info.
>>         * gcc.dg/tree-ssa/dump-2.c: Fixup template for profile precision
>>         changes.
>>         * gcc.dg/tree-ssa/pr77445-2.c: Fixup template for profile precision
>>         changes.
>
>
> Hi,
>
> this caused PR 83043 - "FAIL: libgomp.graphite/force-parallel-1.c
> scan-tree-dump-times graphite "2 loops carried no dependency" 1 (found 0
> times)" ( https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83043 ).
>

I have also reported PR 83081
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83081

FAIL: gcc.dg/pr80218.c scan-rtl-dump-not ira "Invalid sum"
on arm-none-linux-gnueabi

This might be a duplicate.

Christophe

> Thanks,
> - Tom

Patch

Index: predict.c
===================================================================
--- predict.c	(revision 254884)
+++ predict.c	(working copy)
@@ -3542,6 +3542,8 @@  determine_unlikely_bbs ()
 			bb->index, e->dest->index);
 	     e->probability = profile_probability::never ();
 	   }
+  if (ENTRY_BLOCK_PTR_FOR_FN (cfun)->count == profile_count::zero ())
+    cgraph_node::get (current_function_decl)->count = profile_count::zero ();
 }
 
 /* Estimate and propagate basic block frequencies using the given branch
@@ -3565,7 +3567,11 @@  estimate_bb_frequencies (bool force)
         {
 	  real_values_initialized = 1;
 	  real_br_prob_base = REG_BR_PROB_BASE;
-	  real_bb_freq_max = BB_FREQ_MAX;
+	  /* Scaling frequencies up to maximal profile count may result in
+	     frequent overflows especially when inlining loops.
+	     Small scaling results in unnecessary precision loss.  Stay in
+	     the half of the (exponential) range.  */
+	  real_bb_freq_max = (uint64_t)1 << (profile_count::n_bits / 2);
 	  real_one_half = sreal (1, -1);
 	  real_inv_br_prob_base = sreal (1) / real_br_prob_base;
 	  real_almost_one = sreal (1) - real_inv_br_prob_base;
@@ -3610,6 +3616,8 @@  estimate_bb_frequencies (bool force)
 	  freq_max = BLOCK_INFO (bb)->frequency;
 
       freq_max = real_bb_freq_max / freq_max;
+      if (freq_max < 16)
+	freq_max = 16;
       cfun->cfg->count_max = profile_count::uninitialized ();
       FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR_FOR_FN (cfun), NULL, next_bb)
 	{
Index: profile-count.h
===================================================================
--- profile-count.h	(revision 254884)
+++ profile-count.h	(working copy)
@@ -605,11 +605,13 @@  class sreal;
 
 class GTY(()) profile_count
 {
+public:
   /* Use 62bit to hold basic block counters.  Should be at least
      64bit.  Although a counter cannot be negative, we use a signed
      type to hold various extra stages.  */
 
   static const int n_bits = 61;
+private:
   static const uint64_t max_count = ((uint64_t) 1 << n_bits) - 2;
   static const uint64_t uninitialized_count = ((uint64_t) 1 << n_bits) - 1;
 
Index: testsuite/gcc.dg/tree-ssa/dump-2.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/dump-2.c	(revision 254884)
+++ testsuite/gcc.dg/tree-ssa/dump-2.c	(working copy)
@@ -6,4 +6,4 @@  int f(void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "<bb \[0-9\]> \\\[local count: 10000\\\]:" "optimized" } } */
+/* { dg-final { scan-tree-dump "<bb \[0-9\]> \\\[local count: " "optimized" } } */
Index: testsuite/gcc.dg/tree-ssa/pr77445-2.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/pr77445-2.c	(revision 254884)
+++ testsuite/gcc.dg/tree-ssa/pr77445-2.c	(working copy)
@@ -120,7 +120,7 @@  enum STATES FMS( u8 **in , u32 *transiti
    profile estimation stage. But the number of inconsistencies should not
    increase much.  */
 /* { dg-final { scan-tree-dump "Jumps threaded: 1\[1-9\]" "thread1" } } */
-/* { dg-final { scan-tree-dump-times "Invalid sum" 2 "thread1" } } */
+/* { dg-final { scan-tree-dump-times "Invalid sum" 3 "thread1" } } */
 /* { dg-final { scan-tree-dump-not "not considered" "thread1" } } */
 /* { dg-final { scan-tree-dump-not "not considered" "thread2" } } */
 /* { dg-final { scan-tree-dump-not "not considered" "thread3" } } */