Message ID | 20171117195307.GC15395@kam.mff.cuni.cz |
---|---|
State | New |
Series | Increase precision of static profiles |
On 11/17/2017 08:53 PM, Jan Hubicka wrote:
> Hi,
> this patch makes static profile to be in range 0...2^30 rather than
> 0...10000.  This is safe now as profile-counts are taking care of
> possible overflow when the profile ends up accumulating too high after
> inlining.
> There are two testcases that need adjusting.  dump-2.c simply checks
> for specific value of counter that is now different.  pr77445-2
> now gets one extra mismatch reported.  The mismatch was present before
> too but due to low precision it was not visible.
>
> Bootstrapped/regtested x86_64-linux, committed.
>
> Honza
>
>   * predict.c (determine_unlikely_bbs): Set cgraph node count to 0
>   when entry block was promoted unlikely.
>   (estimate_bb_frequencies): Increase frequency scale.
>   * profile-count.h (profile_count): Export precision info.
>   * gcc.dg/tree-ssa/dump-2.c: Fixup template for profile precision
>   changes.
>   * gcc.dg/tree-ssa/pr77445-2.c: Fixup template for profile precision
>   changes.

Hi,

this caused PR 83043 - "FAIL: libgomp.graphite/force-parallel-1.c
scan-tree-dump-times graphite "2 loops carried no dependency" 1 (found 0
times)" ( https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83043 ).

Thanks,
- Tom

On 19 November 2017 at 11:01, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 11/17/2017 08:53 PM, Jan Hubicka wrote:
>>
>> Hi,
>> this patch makes static profile to be in range 0...2^30 rather than
>> 0...10000.  This is safe now as profile-counts are taking care of
>> possible overflow when the profile ends up accumulating too high after
>> inlining.
>> There are two testcases that need adjusting.  dump-2.c simply checks
>> for specific value of counter that is now different.  pr77445-2
>> now gets one extra mismatch reported.  The mismatch was present before
>> too but due to low precision it was not visible.
>>
>> Bootstrapped/regtested x86_64-linux, committed.
>>
>> Honza
>>
>>   * predict.c (determine_unlikely_bbs): Set cgraph node count to 0
>>   when entry block was promoted unlikely.
>>   (estimate_bb_frequencies): Increase frequency scale.
>>   * profile-count.h (profile_count): Export precision info.
>>   * gcc.dg/tree-ssa/dump-2.c: Fixup template for profile precision
>>   changes.
>>   * gcc.dg/tree-ssa/pr77445-2.c: Fixup template for profile
>>   precision changes.
>
> Hi,
>
> this caused PR 83043 - "FAIL: libgomp.graphite/force-parallel-1.c
> scan-tree-dump-times graphite "2 loops carried no dependency" 1 (found 0
> times)" ( https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83043 ).
>

I have also reported PR 83081
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83081
FAIL: gcc.dg/pr80218.c scan-rtl-dump-not ira "Invalid sum"
on arm-none-linux-gnueabi

This might be a duplicate.

Christophe

> Thanks,
> - Tom

Index: predict.c
===================================================================
--- predict.c (revision 254884)
+++ predict.c (working copy)
@@ -3542,6 +3542,8 @@ determine_unlikely_bbs ()
                    bb->index, e->dest->index);
           e->probability = profile_probability::never ();
         }
+  if (ENTRY_BLOCK_PTR_FOR_FN (cfun)->count == profile_count::zero ())
+    cgraph_node::get (current_function_decl)->count = profile_count::zero ();
 }
 
 /* Estimate and propagate basic block frequencies using the given branch
@@ -3565,7 +3567,11 @@ estimate_bb_frequencies (bool force)
         {
           real_values_initialized = 1;
           real_br_prob_base = REG_BR_PROB_BASE;
-          real_bb_freq_max = BB_FREQ_MAX;
+          /* Scaling frequencies up to maximal profile count may result in
+             frequent overflows especially when inlining loops.
+             Small scalling results in unnecesary precision loss.  Stay in
+             the half of the (exponential) range.  */
+          real_bb_freq_max = (uint64_t)1 << (profile_count::n_bits / 2);
           real_one_half = sreal (1, -1);
           real_inv_br_prob_base = sreal (1) / real_br_prob_base;
           real_almost_one = sreal (1) - real_inv_br_prob_base;
@@ -3610,6 +3616,8 @@ estimate_bb_frequencies (bool force)
               freq_max = BLOCK_INFO (bb)->frequency;
 
           freq_max = real_bb_freq_max / freq_max;
+          if (freq_max < 16)
+            freq_max = 16;
           cfun->cfg->count_max = profile_count::uninitialized ();
           FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR_FOR_FN (cfun), NULL, next_bb)
             {
Index: profile-count.h
===================================================================
--- profile-count.h (revision 254884)
+++ profile-count.h (working copy)
@@ -605,11 +605,13 @@ class sreal;
 
 class GTY(()) profile_count
 {
+public:
   /* Use 62bit to hold basic block counters.  Should be at least
      64bit.  Although a counter cannot be negative, we use a signed
      type to hold various extra stages.  */
 
   static const int n_bits = 61;
+private:
   static const uint64_t max_count = ((uint64_t) 1 << n_bits) - 2;
   static const uint64_t uninitialized_count = ((uint64_t) 1 << n_bits) - 1;
 
Index: testsuite/gcc.dg/tree-ssa/dump-2.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/dump-2.c (revision 254884)
+++ testsuite/gcc.dg/tree-ssa/dump-2.c (working copy)
@@ -6,4 +6,4 @@ int f(void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "<bb \[0-9\]> \\\[local count: 10000\\\]:" "optimized" } } */
+/* { dg-final { scan-tree-dump "<bb \[0-9\]> \\\[local count: " "optimized" } } */
Index: testsuite/gcc.dg/tree-ssa/pr77445-2.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/pr77445-2.c (revision 254884)
+++ testsuite/gcc.dg/tree-ssa/pr77445-2.c (working copy)
@@ -120,7 +120,7 @@ enum STATES FMS( u8 **in , u32 *transiti
    profile estimation stage. But the number of inconsistencies should not
    increase much.  */
 /* { dg-final { scan-tree-dump "Jumps threaded: 1\[1-9\]" "thread1" } } */
-/* { dg-final { scan-tree-dump-times "Invalid sum" 2 "thread1" } } */
+/* { dg-final { scan-tree-dump-times "Invalid sum" 3 "thread1" } } */
 /* { dg-final { scan-tree-dump-not "not considered" "thread1" } } */
 /* { dg-final { scan-tree-dump-not "not considered" "thread2" } } */
 /* { dg-final { scan-tree-dump-not "not considered" "thread3" } } */
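
To make the precision argument concrete, here is a minimal standalone sketch of the arithmetic. It is not GCC code. Only `n_bits = 61`, the old range of 10000 and the new scale `1 << (n_bits / 2)` come from the patch and its description; the helpers `scale_frequency` and `check_scaling_examples`, the round-to-nearest rule and the sample probability of 1/100000 are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Values taken from the patch and its description.
constexpr int n_bits = 61;                                   // profile_count::n_bits
constexpr uint64_t old_scale = 10000;                        // old range 0...10000
constexpr uint64_t new_scale = uint64_t{1} << (n_bits / 2);  // 1 << 30

// Hypothetical helper (not part of GCC): scale the relative frequency
// num/den to an integer count, rounding to nearest.  Assumes num * scale
// fits in 64 bits, which holds for the small values used here.
constexpr uint64_t scale_frequency(uint64_t num, uint64_t den, uint64_t scale)
{
  return (num * scale + den / 2) / den;
}

// Integer division: 61 / 2 == 30, so the new scale is 2^30.
static_assert(new_scale == 1073741824ULL, "new scale is 2^30");

// Hypothetical self-check for a block executed once per 100000 entries.
void check_scaling_examples()
{
  // With the old scale the block rounds down to a count of 0.
  assert(scale_frequency(1, 100000, old_scale) == 0);
  // With the new scale the same block keeps a nonzero count.
  assert(scale_frequency(1, 100000, new_scale) == 10737);
}
```

Under these assumptions a rarely executed block is indistinguishable from a never-executed one at the old scale, which fits the observation in the patch description that the extra pr77445-2 mismatch existed before but was hidden by low precision.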