| Message ID | nycvar.YFH.7.76.1910071049030.5566@zhemvz.fhfr.qr |
|---|---|
| State | New |
| Series | Fix PR91975, tame PRE some more |
On Mon, 7 Oct 2019, Richard Biener wrote:

> The following tries to address the issue that PRE is quite happy
> to introduce new IVs in loops just because it can compute some
> constant value on the loop entry edge.  In principle there's
> already code that should work against that, but it simply searches
> for an optimize_edge_for_speed_p () edge.  That still considers
> the loop entry edge to be worth optimizing because it ends
> up as maybe_hot_edge_p (e) for -O2, which compares the edge count
> against the entry block count.  For PRE we want something more
> local (comparing to the destination block count).
>
> Now for the simple testcases this shouldn't make a difference,
> but hot/cold uses PARAM_VALUE (HOT_BB_FREQUENCY_FRACTION), which
> isn't the same as profile_probability's likely or very_likely...
>
> Still, one key point of the patch is that we compare the sum of the
> edge counts where the value is available (and thus the redundancy
> elimination happens) with the count we have to insert with, rather
> than looking for a single optimize_edge_for_speed_p edge.
>
> For that I've used
>
>   if (avail_count < block->count.apply_probability
>                       (profile_probability::unlikely ()))
>
> so we block insertion if the redundancies would overall be "unlikely".
>
> I'm also not sure why maybe_hot_count_p uses HOT_BB_FREQUENCY_FRACTION
> while there exists HOT_BB_COUNT_FRACTION (with a ten-fold larger
> default value) that seems to match better for scaling a profile count?
>
> Honza?

Honza - does this sound sensible?  Can you comment on the
maybe_hot_count_p param use?  I wanted to avoid specifically looking
to avoid inserting on backedges and instead improve the
optimize_edge_for_speed_p heuristic.

> Bootstrap & regtest running on x86-64-unknown-linux-gnu.
>
> Does the above predicate look sane, or am I on the wrong track with
> using the destination block count here (I realize even the "locally
> cold" entries into the block might be quite hot globally)?
>
> For a 1:1 translation of the existing code to sth using the
> original predicate but summing over edges I could use
> !maybe_hot_count_p (cfun, avail_count)?  But then we're back to
> PRE doing the unwanted insertions.  Changing maybe_hot_count_p
> to use HOT_BB_COUNT_FRACTION doesn't make any difference there
> (obviously).
>
> Thanks,
> Richard.
>
> 2019-10-06  Richard Biener  <rguenther@suse.de>
>
>     PR tree-optimization/91975
>     * tree-ssa-pre.c (do_pre_regular_insertion): Adjust
>     profitability check to use the sum of all edge counts the
>     value is available on and check against unlikely execution
>     of the block.
>
>     * gcc.dg/tree-ssa/ldist-39.c: New testcase.
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-39.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-39.c
> new file mode 100644
> index 00000000000..a63548979ea
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-39.c
> @@ -0,0 +1,43 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-ldist-details" } */
> +
> +#define T int
> +
> +const T a[] = { 0, 1, 2, 3, 4, 5, 6, 7 };
> +T b[sizeof a / sizeof *a];
> +
> +void f0 (void)
> +{
> +  const T *s = a;
> +  T *d = b;
> +  for (unsigned i = 0; i != sizeof a / sizeof *a; ++i)
> +    d[i] = s[i];
> +}
> +
> +void g0 (void)
> +{
> +  const T *s = a;
> +  T *d = b;
> +  for (unsigned i = 0; i != sizeof a / sizeof *a; ++i)
> +    *d++ = *s++;
> +}
> +
> +extern const T c[sizeof a / sizeof *a];
> +
> +void f1 (void)
> +{
> +  const T *s = c;
> +  T *d = b;
> +  for (unsigned i = 0; i != sizeof a / sizeof *a; ++i)
> +    d[i] = s[i];
> +}
> +
> +void g1 (void)
> +{
> +  const T *s = c;
> +  T *d = b;
> +  for (unsigned i = 0; i != sizeof a / sizeof *a; ++i)
> +    *d++ = *s++;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "generated memcpy" 4 "ldist" } } */
> diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
> index c618601a184..af49ba388c1 100644
> --- a/gcc/tree-ssa-pre.c
> +++ b/gcc/tree-ssa-pre.c
> @@ -3195,7 +3195,7 @@ do_pre_regular_insertion (basic_block block, basic_block dom)
>        pre_expr eprime = NULL;
>        edge_iterator ei;
>        pre_expr edoubleprime = NULL;
> -      bool do_insertion = false;
> +      profile_count avail_count = profile_count::zero ();
>
>        val = get_expr_value_id (expr);
>        if (bitmap_set_contains_value (PHI_GEN (block), val))
> @@ -3250,10 +3250,7 @@ do_pre_regular_insertion (basic_block block, basic_block dom)
>         {
>           avail[pred->dest_idx] = edoubleprime;
>           by_some = true;
> -         /* We want to perform insertions to remove a redundancy on
> -            a path in the CFG we want to optimize for speed.  */
> -         if (optimize_edge_for_speed_p (pred))
> -           do_insertion = true;
> +         avail_count += pred->count ();
>           if (first_s == NULL)
>             first_s = edoubleprime;
>           else if (!pre_expr_d::equal (first_s, edoubleprime))
> @@ -3266,7 +3263,11 @@ do_pre_regular_insertion (basic_block block, basic_block dom)
>          partially redundant.  */
>       if (!cant_insert && !all_same && by_some)
>        {
> -        if (!do_insertion)
> +        /* We want to perform insertions to remove a redundancy on
> +           a path in the CFG that is somewhat likely.  Avoid inserting
> +           when we'd only remove a redundancy on unlikely paths.  */
> +        if (avail_count < block->count.apply_probability
> +                            (profile_probability::unlikely ()))
>          {
>            if (dump_file && (dump_flags & TDF_DETAILS))
>              {