
Fix PR91975, tame PRE some more

Message ID nycvar.YFH.7.76.1910071049030.5566@zhemvz.fhfr.qr
State New
Series Fix PR91975, tame PRE some more

Commit Message

Richard Biener Oct. 7, 2019, 9:20 a.m. UTC
The following tries to address the issue that PRE is quite happy
to introduce new IVs in loops just because it can compute some
constant value on the loop entry edge.  In principle there's
already code that should guard against that, but it simply searches
for an optimize_edge_for_speed_p () edge.  That still considers
the loop entry edge to be worth optimizing because it ends
up satisfying maybe_hot_edge_p (e) at -O2, which compares the edge count
against the entry block count.  For PRE we want something more
local (comparing to the destination block count).
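
To make that concrete, here's a minimal C sketch of the affected
pattern (the same shape as the testcase added below):

    /* Every a[i] is a known constant, so PRE can compute the loaded
       value on the loop entry edge; the PHI it then inserts acts as
       a new IV in the loop, which for this PR keeps loop distribution
       from turning the copy loop into a memcpy.  */
    const int a[] = { 0, 1, 2, 3, 4, 5, 6, 7 };
    int b[sizeof a / sizeof *a];

    void copy (void)
    {
      for (unsigned i = 0; i != sizeof a / sizeof *a; ++i)
        b[i] = a[i];
    }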

Now for the simple testcases this shouldn't make a difference
but hot/cold classification uses PARAM_VALUE (HOT_BB_FREQUENCY_FRACTION),
which isn't the same as profile_probability's likely or very_likely...

Still, one key aspect of the patch is that we compare the sum of the
edge counts where the value is available (and thus where the redundancy
elimination happens) against the count of the block we'd insert for,
rather than looking for a single optimize_edge_for_speed_p edge.

For that I've used

    if (avail_count < block->count.apply_probability
                                    (profile_probability::unlikely ()))

so we block insertion if the redundancies would overall be "unlikely".
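
As a back-of-the-envelope model of that check (plain integers and a
1/5 fraction stand in for GCC's profile_count machinery and
profile_probability::unlikely (); both are assumptions for
illustration only):

    typedef unsigned long long count_t;

    /* Mirrors avail_count < block->count.apply_probability
       (profile_probability::unlikely ()): block the insertion when
       the edges the value is available on cover too small a share
       of the block's executions.  */
    static int
    insertion_profitable_p (count_t avail_count, count_t block_count)
    {
      return avail_count >= block_count / 5;
    }

With block_count == 1000 this model would block the insertion
whenever the available edges sum to fewer than 200 executions.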

I'm also not sure why maybe_hot_count_p uses HOT_BB_FREQUENCY_FRACTION
while there exists HOT_BB_COUNT_FRACTION (with a ten-fold larger
default value), which seems a better match for scaling a profile_count?

Honza?

Bootstrap & regtest running on x86-64-unknown-linux-gnu.

Does the above predicate look sane, or am I on the wrong track in
using the destination block count here?  (I realize even the "locally
cold" entries into the block might be quite hot globally.)

For a 1:1 translation of the existing code to something using the
original predicate but summing over edges, I could use
!maybe_hot_count_p (cfun, avail_count)?  But then we're back to
PRE doing the unwanted insertions.  Changing maybe_hot_count_p
to use HOT_BB_COUNT_FRACTION doesn't make any difference there
(obviously).

Thanks,
Richard.

2019-10-06  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/91975
	* tree-ssa-pre.c (do_pre_regular_insertion): Adjust
	profitability check to use the sum of all edge counts the
	value is available on and check against unlikely execution
	of the block.

	* gcc.dg/tree-ssa/ldist-39.c: New testcase.

Comments

Richard Biener Oct. 16, 2019, 9:34 a.m. UTC | #1
On Mon, 7 Oct 2019, Richard Biener wrote:

> 
> The following tries to address the issue that PRE is quite happy
> to introduce new IVs in loops just because it can compute some
> constant value on the loop entry edge.  In principle there's
> already code that should guard against that, but it simply searches
> for an optimize_edge_for_speed_p () edge.  That still considers
> the loop entry edge to be worth optimizing because it ends
> up satisfying maybe_hot_edge_p (e) at -O2, which compares the edge count
> against the entry block count.  For PRE we want something more
> local (comparing to the destination block count).
> 
> Now for the simple testcases this shouldn't make a difference
> but hot/cold classification uses PARAM_VALUE (HOT_BB_FREQUENCY_FRACTION),
> which isn't the same as profile_probability's likely or very_likely...
> 
> Still, one key aspect of the patch is that we compare the sum of the
> edge counts where the value is available (and thus where the redundancy
> elimination happens) against the count of the block we'd insert for,
> rather than looking for a single optimize_edge_for_speed_p edge.
> 
> For that I've used
> 
>     if (avail_count < block->count.apply_probability
>                                     (profile_probability::unlikely ()))
> 
> so we block insertion if the redundancies would overall be "unlikely".
> 
> I'm also not sure why maybe_hot_count_p uses HOT_BB_FREQUENCY_FRACTION
> while there exists HOT_BB_COUNT_FRACTION (with a ten-fold larger
> default value), which seems a better match for scaling a profile_count?
> 
> Honza?

Honza - does this sound sensible?  Can you comment on the
maybe_hot_count_p param use?

Rather than adding a check that specifically avoids inserting on
backedges, I wanted to improve the optimize_edge_for_speed_p
heuristic.

> Bootstrap & regtest running on x86-64-unknown-linux-gnu.
> 
> Does the above predicate look sane, or am I on the wrong track in
> using the destination block count here?  (I realize even the "locally
> cold" entries into the block might be quite hot globally.)
> 
> For a 1:1 translation of the existing code to something using the
> original predicate but summing over edges, I could use
> !maybe_hot_count_p (cfun, avail_count)?  But then we're back to
> PRE doing the unwanted insertions.  Changing maybe_hot_count_p
> to use HOT_BB_COUNT_FRACTION doesn't make any difference there
> (obviously).
> 
> Thanks,
> Richard.
> 
> 2019-10-06  Richard Biener  <rguenther@suse.de>
> 
> 	PR tree-optimization/91975
> 	* tree-ssa-pre.c (do_pre_regular_insertion): Adjust
> 	profitability check to use the sum of all edge counts the
> 	value is available on and check against unlikely execution
> 	of the block.
> 
> 	* gcc.dg/tree-ssa/ldist-39.c: New testcase.
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-39.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-39.c
> new file mode 100644
> index 00000000000..a63548979ea
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-39.c
> @@ -0,0 +1,43 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-ldist-details" } */
> +
> +#define T int
> +
> +const T a[] = { 0, 1, 2, 3, 4, 5, 6, 7 };
> +T b[sizeof a / sizeof *a];
> +
> +void f0 (void)
> +{
> +  const T *s = a;
> +  T *d = b;
> +  for (unsigned i = 0; i != sizeof a / sizeof *a; ++i)
> +    d[i] = s[i];
> +}
> +
> +void g0 (void)
> +{
> +  const T *s = a;
> +  T *d = b;
> +  for (unsigned i = 0; i != sizeof a / sizeof *a; ++i)
> +    *d++ = *s++;
> +}
> +
> +extern const T c[sizeof a / sizeof *a];
> +
> +void f1 (void)
> +{
> +  const T *s = c;
> +  T *d = b;
> +  for (unsigned i = 0; i != sizeof a / sizeof *a; ++i)
> +    d[i] = s[i];
> +}
> +
> +void g1 (void)
> +{
> +  const T *s = c;
> +  T *d = b;
> +  for (unsigned i = 0; i != sizeof a / sizeof *a; ++i)
> +    *d++ = *s++;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "generated memcpy" 4 "ldist" } } */
> diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
> index c618601a184..af49ba388c1 100644
> --- a/gcc/tree-ssa-pre.c
> +++ b/gcc/tree-ssa-pre.c
> @@ -3195,7 +3195,7 @@ do_pre_regular_insertion (basic_block block, basic_block dom)
>  	  pre_expr eprime = NULL;
>  	  edge_iterator ei;
>  	  pre_expr edoubleprime = NULL;
> -	  bool do_insertion = false;
> +	  profile_count avail_count = profile_count::zero ();
>  
>  	  val = get_expr_value_id (expr);
>  	  if (bitmap_set_contains_value (PHI_GEN (block), val))
> @@ -3250,10 +3250,7 @@ do_pre_regular_insertion (basic_block block, basic_block dom)
>  		{
>  		  avail[pred->dest_idx] = edoubleprime;
>  		  by_some = true;
> -		  /* We want to perform insertions to remove a redundancy on
> -		     a path in the CFG we want to optimize for speed.  */
> -		  if (optimize_edge_for_speed_p (pred))
> -		    do_insertion = true;
> +		  avail_count += pred->count ();
>  		  if (first_s == NULL)
>  		    first_s = edoubleprime;
>  		  else if (!pre_expr_d::equal (first_s, edoubleprime))
> @@ -3266,7 +3263,11 @@ do_pre_regular_insertion (basic_block block, basic_block dom)
>  	     partially redundant.  */
>  	  if (!cant_insert && !all_same && by_some)
>  	    {
> -	      if (!do_insertion)
> +	      /* We want to perform insertions to remove a redundancy on
> +		 a path in the CFG that is somewhat likely.  Avoid inserting
> +		 when we'd only remove a redundancy on unlikely paths.  */
> +	      if (avail_count < block->count.apply_probability
> +					    (profile_probability::unlikely ()))
>  		{
>  		  if (dump_file && (dump_flags & TDF_DETAILS))
>  		    {
>

Patch

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-39.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-39.c
new file mode 100644
index 00000000000..a63548979ea
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-39.c
@@ -0,0 +1,43 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ldist-details" } */
+
+#define T int
+
+const T a[] = { 0, 1, 2, 3, 4, 5, 6, 7 };
+T b[sizeof a / sizeof *a];
+
+void f0 (void)
+{
+  const T *s = a;
+  T *d = b;
+  for (unsigned i = 0; i != sizeof a / sizeof *a; ++i)
+    d[i] = s[i];
+}
+
+void g0 (void)
+{
+  const T *s = a;
+  T *d = b;
+  for (unsigned i = 0; i != sizeof a / sizeof *a; ++i)
+    *d++ = *s++;
+}
+
+extern const T c[sizeof a / sizeof *a];
+
+void f1 (void)
+{
+  const T *s = c;
+  T *d = b;
+  for (unsigned i = 0; i != sizeof a / sizeof *a; ++i)
+    d[i] = s[i];
+}
+
+void g1 (void)
+{
+  const T *s = c;
+  T *d = b;
+  for (unsigned i = 0; i != sizeof a / sizeof *a; ++i)
+    *d++ = *s++;
+}
+
+/* { dg-final { scan-tree-dump-times "generated memcpy" 4 "ldist" } } */
diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
index c618601a184..af49ba388c1 100644
--- a/gcc/tree-ssa-pre.c
+++ b/gcc/tree-ssa-pre.c
@@ -3195,7 +3195,7 @@ do_pre_regular_insertion (basic_block block, basic_block dom)
 	  pre_expr eprime = NULL;
 	  edge_iterator ei;
 	  pre_expr edoubleprime = NULL;
-	  bool do_insertion = false;
+	  profile_count avail_count = profile_count::zero ();
 
 	  val = get_expr_value_id (expr);
 	  if (bitmap_set_contains_value (PHI_GEN (block), val))
@@ -3250,10 +3250,7 @@ do_pre_regular_insertion (basic_block block, basic_block dom)
 		{
 		  avail[pred->dest_idx] = edoubleprime;
 		  by_some = true;
-		  /* We want to perform insertions to remove a redundancy on
-		     a path in the CFG we want to optimize for speed.  */
-		  if (optimize_edge_for_speed_p (pred))
-		    do_insertion = true;
+		  avail_count += pred->count ();
 		  if (first_s == NULL)
 		    first_s = edoubleprime;
 		  else if (!pre_expr_d::equal (first_s, edoubleprime))
@@ -3266,7 +3263,11 @@ do_pre_regular_insertion (basic_block block, basic_block dom)
 	     partially redundant.  */
 	  if (!cant_insert && !all_same && by_some)
 	    {
-	      if (!do_insertion)
+	      /* We want to perform insertions to remove a redundancy on
+		 a path in the CFG that is somewhat likely.  Avoid inserting
+		 when we'd only remove a redundancy on unlikely paths.  */
+	      if (avail_count < block->count.apply_probability
+					    (profile_probability::unlikely ()))
 		{
 		  if (dump_file && (dump_flags & TDF_DETAILS))
 		    {