Patchwork [PR44576] : Don't do further analysis if we could know prefetching is not benefitial

login
register
mail settings
Submitter Fang, Changpeng
Date July 9, 2010, 1:15 a.m.
Message ID <D4C76825A6780047854A11E93CDE84D02F7769@SAUSEXMBP01.amd.com>
Download mbox | patch
Permalink /patch/58318/
State New
Headers show

Comments

Fang, Changpeng - July 9, 2010, 1:15 a.m.
Hi,

Attached is the patch to fix bug 44576.

Some prefetch analysis (e.g.  miss rate and dependence computations for reuse analysis) has
been known to be expansive. We should avoid un-necessary analysis whenever possible.

In this patch, we split the cost model implementing function (is_loop_prefetching_profitable) into
three small functions (trip_count_to_ahead_ratio_too_small_p, mem_ref_count_reasonable_p and
insn_to_prefetch_ratio_too_small_p), with each one called at the earliest possible time.  In
this way, we can reduce the compilation time.

In this patch, we also introduce a new parameter, PREFETCH_MAX_MEM_REFS_PER_LOOP.
We give up prefetching if the number of memory references in a loop is above this threshold.
This is based on both compilation time and memory consumption considerations. We found
that the memory allocation and freeing of data dependence relation (for reuse analysis) will become
the bottleneck if there is too many memory references in the loop.

This patch passed bootstrapping and completely fixed bug 44576.

Is it OK for the trunk?

Thanks,

Changpeng
Christian Borntraeger - July 9, 2010, 7:34 a.m.
Am Freitag 09 Juli 2010, 03:15:26 schrieb Fang, Changpeng:
> +  /* FIXME: the time should be weighted by the probabilities of the blocks in
> +     the loop body.  */
> +  time = tree_num_loop_insns (loop, &eni_time_weights);
> +  ahead = (PREFETCH_LATENCY + time - 1) / time;

Here I sometimes get a division by zero.
Zdenek Dvorak - July 9, 2010, 7:53 a.m.
Hi,

> Some prefetch analysis (e.g.  miss rate and dependence computations for reuse analysis) has
> been known to be expansive. We should avoid un-necessary analysis whenever possible.
> 
> In this patch, we split the cost model implementing function (is_loop_prefetching_profitable) into
> three small functions (trip_count_to_ahead_ratio_too_small_p, mem_ref_count_reasonable_p and
> insn_to_prefetch_ratio_too_small_p), with each one called at the earliest possible time.  In
> this way, we can reduce the compilation time.
> 
> In this patch, we also introduce a new parameter, PREFETCH_MAX_MEM_REFS_PER_LOOP.
> We give up prefetching if the number of memory references in a loop is above this threshold.
> This is based on both compilation time and memory consumption considerations. We found
> that the memory allocation and freeing of data dependence relation (for reuse analysis) will become
> the bottleneck if there is too many memory references in the loop.
> 
> This patch passed bootstrapping and completely fixed bug 44576.
> 
> Is it OK for the trunk?

OK.

Zdenek
Fang, Changpeng - July 9, 2010, 5:06 p.m.
Am Freitag 09 Juli 2010, 03:15:26 schrieb Fang, Changpeng:
>> +  /* FIXME: the time should be weighted by the probabilities of the blocks in
>> +     the loop body.  */
>> +  time = tree_num_loop_insns (loop, &eni_time_weights);
>> +  ahead = (PREFETCH_LATENCY + time - 1) / time;

>Here I sometimes get a division by zero.

That should not happen (time couldn't be zero). Would you please double check and give a testcase to show the problem
if possible?

Thanks,

Changpeng
Christian Borntraeger - July 11, 2010, 5:26 a.m.
Am Freitag 09 Juli 2010, 19:06:55 schrieb Fang, Changpeng:
> 
> Am Freitag 09 Juli 2010, 03:15:26 schrieb Fang, Changpeng:
> >> +  /* FIXME: the time should be weighted by the probabilities of the blocks in
> >> +     the loop body.  */
> >> +  time = tree_num_loop_insns (loop, &eni_time_weights);
> >> +  ahead = (PREFETCH_LATENCY + time - 1) / time;
> 
> >Here I sometimes get a division by zero.
> 
> That should not happen (time couldn't be zero). Would you please double check and give a testcase to show the problem
> if possible?

dependence.c from gcc (spec version) fails in such a way.
Since you added a check I guess you dont need a testcase any more?

Christian
Sebastian Pop - July 11, 2010, 10:52 p.m.
On Sun, Jul 11, 2010 at 00:26, Christian Borntraeger
<borntraeger@de.ibm.com> wrote:
> Am Freitag 09 Juli 2010, 19:06:55 schrieb Fang, Changpeng:
>>
>> Am Freitag 09 Juli 2010, 03:15:26 schrieb Fang, Changpeng:
>> >> +  /* FIXME: the time should be weighted by the probabilities of the blocks in
>> >> +     the loop body.  */
>> >> +  time = tree_num_loop_insns (loop, &eni_time_weights);
>> >> +  ahead = (PREFETCH_LATENCY + time - 1) / time;
>>
>> >Here I sometimes get a division by zero.
>>
>> That should not happen (time couldn't be zero). Would you please double check and give a testcase to show the problem
>> if possible?
>
> dependence.c from gcc (spec version) fails in such a way.
> Since you added a check I guess you dont need a testcase any more?
>

I think that testcases are always useful.  Please use delta as
described in http://gcc.gnu.org/wiki/A_guide_to_testcase_reduction
to reduce the ICE that you see, then add the reduced testcase for
this problem to the testsuite.

Thanks,
Sebastian
Christian Borntraeger - July 12, 2010, 10:29 a.m.
Am Montag 12 Juli 2010, 00:52:10 schrieb Sebastian Pop:
> I think that testcases are always useful.  Please use delta as
> described in http://gcc.gnu.org/wiki/A_guide_to_testcase_reduction
> to reduce the ICE that you see, then add the reduced testcase for
> this problem to the testsuite.

Ok. Something like the following?

2010-07-11  Christian Borntraeger  <borntraeger.de.ibm.com>

        * gcc.dg/tree-ssa/prefetch-empty.c: New test.

Index: gcc/testsuite/gcc.dg/prefetch-empty.c
===================================================================
*** /dev/null
--- gcc/testsuite/gcc.dg/tree-ssa/prefetch-empty.c
***************
*** 0 ****
--- 1,24 ----
+ /* This test case causes tree_num_loop_insns to return 0.  This has created
+    problems when this value was used as divisor.  */
+ /* { dg-do compile } */
+ /* { dg-options "-c -O1 -fprefetch-loop-arrays -march=core2" { target { i?86-*-* x86_64-*-* } } } */
+ /* { dg-options "-c -O1 -fprefetch-loop-arrays -march=z10" { target { s390x-*-* } } } */
+ 
+ 
+ typedef union tree_node *tree;
+ enum tree_code
+ {
+   E1
+ };
+ union tree_node
+ {
+   enum tree_code code:8;
+ };
+ void
+ build_def_use (tree exp)
+ {
+   while (exp)
+     switch (((enum tree_code) (exp)->code))
+       {
+       }
+ }
Jakub Jelinek - July 12, 2010, 10:33 a.m.
On Mon, Jul 12, 2010 at 12:29:46PM +0200, Christian Borntraeger wrote:
> Am Montag 12 Juli 2010, 00:52:10 schrieb Sebastian Pop:
> Index: gcc/testsuite/gcc.dg/prefetch-empty.c
> ===================================================================
> *** /dev/null
> --- gcc/testsuite/gcc.dg/tree-ssa/prefetch-empty.c
> ***************
> *** 0 ****
> --- 1,24 ----
> + /* This test case causes tree_num_loop_insns to return 0.  This has created
> +    problems when this value was used as divisor.  */
> + /* { dg-do compile } */
> + /* { dg-options "-c -O1 -fprefetch-loop-arrays -march=core2" { target { i?86-*-* x86_64-*-* } } } */
> + /* { dg-options "-c -O1 -fprefetch-loop-arrays -march=z10" { target { s390x-*-* } } } */

-c doesn't belong into dg-options.

	Jakub
Christian Borntraeger - July 12, 2010, 10:48 a.m.
> -c doesn't belong into dg-options.

Thanks. Here is an updated version.


2010-07-11  Christian Borntraeger  <borntraeger.de.ibm.com>

        * gcc.dg/tree-ssa/prefetch-empty.c: New test.

Index: gcc/testsuite/gcc.dg/tree-ssa/prefetch-empty.c
===================================================================
*** /dev/null
--- gcc/testsuite/gcc.dg/tree-ssa/prefetch-empty.c
***************
*** 0 ****
--- 1,24 ----
+ /* This test case causes tree_num_loop_insns to return 0.  This has created
+    problems when this value was used as divisor.  */
+ /* { dg-do compile } */
+ /* { dg-options "-O1 -fprefetch-loop-arrays -march=core2" { target { i?86-*-* x86_64-*-* } } } */
+ /* { dg-options "-O1 -fprefetch-loop-arrays -march=z10" { target { s390x-*-* } } } */
+ 
+ 
+ typedef union tree_node *tree;
+ enum tree_code
+ {
+   E1
+ };
+ union tree_node
+ {
+   enum tree_code code:8;
+ };
+ void
+ build_def_use (tree exp)
+ {
+   while (exp)
+     switch (((enum tree_code) (exp)->code))
+       {
+       }
+ }

Patch

From 8d878775b85c961ab02076c7deffebcc38cd35b5 Mon Sep 17 00:00:00 2001
From: Changpeng Fang <chfang@pathscale.(none)>
Date: Thu, 8 Jul 2010 13:32:11 -0700
Subject: [PATCH] pr44576 Avoid un-necessary prefetch analysis by distributing the cost models

	* tree-ssa-loop-prefetch.c (trip_count_to_ahead_ratio_too_small_p): New.
	Pull out from is_loop_prefetching_profitable to implement the trip count
	to ahead ratio heuristic.
        (mem_ref_count_reasonable_p): New.  Pull out from is_loop_prefetching_profitable
	to implement the instruction to memory reference ratio heuristic.  Also consider
	not reasonable if the memory reference count is above a threshold (to avoid
	explosive compilation time.
	(insn_to_prefetch_ratio_too_small_p): New.  Pull out from is_loop_prefetching_profitable
	to implement the instruction to prefetch ratio heuristic.
	(is_loop_prefetching_profitable): Removed.
	(loop_prefetch_arrays): Distribute the cost analysis across the function to
	allow early exit of the prefetch analysis.  is_loop_prefetching_profitable is
	splitted into three functions, with each one called as early as possible.
	(PREFETCH_MAX_MEM_REFS_PER_LOOP): New.  Threshold above which the number of
	memory references in a loop is considered too many.
---
 gcc/tree-ssa-loop-prefetch.c |  145 ++++++++++++++++++++++++++++-------------
 1 files changed, 99 insertions(+), 46 deletions(-)

diff --git a/gcc/tree-ssa-loop-prefetch.c b/gcc/tree-ssa-loop-prefetch.c
index c3e90d2..9f2bbba 100644
--- a/gcc/tree-ssa-loop-prefetch.c
+++ b/gcc/tree-ssa-loop-prefetch.c
@@ -109,18 +109,23 @@  along with GCC; see the file COPYING3.  If not see
       prefetch instructions with guards in cases where 5) was not sufficient
       to satisfy the constraints?
 
-   The function is_loop_prefetching_profitable() implements a cost model
-   to determine if prefetching is profitable for a given loop. The cost
-   model has two heuristcs:
-   1. A heuristic that determines whether the given loop has enough CPU
-      ops that can be overlapped with cache missing memory ops.
-      If not, the loop won't benefit from prefetching. This is implemented
-      by requirung the ratio between the instruction count and the mem ref
-      count to be above a certain minimum.
-   2. A heuristic that disables prefetching in a loop with an unknown trip
-      count if the prefetching cost is above a certain limit. The relative
-      prefetching cost is estimated by taking the ratio between the
-      prefetch count and the total intruction count (this models the I-cache
+   A cost model is implemented to determine whether or not prefetching is
+   profitable for a given loop.  The cost model has three heuristics:
+   1. Function trip_count_to_ahead_ratio_too_small_p implements a heuristic
+      that determines whether or not the loop has too few iterations
+      (compared to ahead).  Prefetching is not likely to be beneficial if the
+      trip count to ahead ratio is below a certain minimum.
+   
+   2. Function mem_ref_count_reasonable_p implements a heuristic that determines
+      whether the given loop has enough CPU ops that can be overlapped with cache
+      missing memory ops.  If not, the loop won't benefit from prefetching.  In the
+      implementation, prefetching is not considered beneficial if the ratio between
+      the instruction count and the mem ref count is below a certain minimum.
+
+   3. Function insn_to_prefetch_ratio_too_small_p implements a heuristic that
+      disables prefetching in a loop if the prefetching cost is above a certain
+      limit.  The relative prefetching cost is estimated by taking the ratio between
+      the prefetch count and the total intruction count (this models the I-cache
       cost).
    The limits used in these heuristics are defined as parameters with
    reasonable default values. Machine-specific default values will be
@@ -237,6 +242,14 @@  struct mem_ref_group
 #define PREFETCH_MOD_TO_UNROLL_FACTOR_RATIO 4
 #endif
 
+/* Some of the prefetch computations have quadratic complexity.  We want to
+   avoid huge compile times and, therefore, want to limit the amount of
+   memory references per loop where we consider prefetching.  */
+
+#ifndef PREFETCH_MAX_MEM_REFS_PER_LOOP
+#define PREFETCH_MAX_MEM_REFS_PER_LOOP 200
+#endif
+
 /* The memory reference.  */
 
 struct mem_ref
@@ -1640,24 +1653,51 @@  determine_loop_nest_reuse (struct loop *loop, struct mem_ref_group *refs,
     }
 }
 
-/* Do a cost-benefit analysis to determine if prefetching is profitable
-   for the current loop given the following parameters:
+/* Determine whether or not the trip count to ahead ratio is too small based
+   on prefitablility consideration.
    AHEAD: the iteration ahead distance,
-   EST_NITER: the estimated trip count,
+   EST_NITER: the estimated trip count.  */
+
+static bool
+trip_count_to_ahead_ratio_too_small_p (unsigned ahead, HOST_WIDE_INT est_niter)
+{
+  /* Assume trip count to ahead ratio is big enough if the trip count could not
+     be estimated at compile time.  */
+  if (est_niter < 0)
+    return false;
+
+  if (est_niter < (HOST_WIDE_INT) (TRIP_COUNT_TO_AHEAD_RATIO * ahead))
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file,
+		 "Not prefetching -- loop estimated to roll only %d times\n",
+		 (int) est_niter);
+      return true;
+    }
+
+  return false;
+}
+
+/* Determine whether or not the number of memory references in the loop is
+   reasonable based on the profitablity and compilation time considerations.
    NINSNS: estimated number of instructions in the loop,
-   PREFETCH_COUNT: an estimate of the number of prefetches
    MEM_REF_COUNT: total number of memory references in the loop.  */
 
 static bool
-is_loop_prefetching_profitable (unsigned ahead, HOST_WIDE_INT est_niter,
-				unsigned ninsns, unsigned prefetch_count,
-				unsigned mem_ref_count, unsigned unroll_factor)
+mem_ref_count_reasonable_p (unsigned ninsns, unsigned mem_ref_count)
 {
-  int insn_to_mem_ratio, insn_to_prefetch_ratio;
+  int insn_to_mem_ratio;
 
   if (mem_ref_count == 0)
     return false;
 
+  /* Miss rate computation (is_miss_rate_acceptable) and dependence analysis
+     (compute_all_dependences) have high costs based on quadratic complexity.
+     To avoid huge compilation time, we give up prefetching if mem_ref_count
+     is too large.  */
+  if (mem_ref_count > PREFETCH_MAX_MEM_REFS_PER_LOOP)
+    return false;
+
   /* Prefetching improves performance by overlapping cache missing
      memory accesses with CPU operations.  If the loop does not have
      enough CPU operations to overlap with memory operations, prefetching
@@ -1678,6 +1718,21 @@  is_loop_prefetching_profitable (unsigned ahead, HOST_WIDE_INT est_niter,
       return false;
     }
 
+  return true;
+}
+
+/* Determine whether or not the instruction to prefetch ratio in the loop is
+   too small based on the profitablity consideration.
+   NINSNS: estimated number of instructions in the loop,
+   PREFETCH_COUNT: an estimate of the number of prefetches,
+   UNROLL_FACTOR:  the factor to unroll the loop if prefetching.  */
+
+static bool
+insn_to_prefetch_ratio_too_small_p (unsigned ninsns, unsigned prefetch_count,
+                                     unsigned unroll_factor)
+{
+  int insn_to_prefetch_ratio;
+
   /* Prefetching most likely causes performance degradation when the instruction
      to prefetch ratio is too small.  Too many prefetch instructions in a loop
      may reduce the I-cache performance.
@@ -1697,23 +1752,10 @@  is_loop_prefetching_profitable (unsigned ahead, HOST_WIDE_INT est_niter,
         fprintf (dump_file,
 		 "Not prefetching -- instruction to prefetch ratio (%d) too small\n",
 		 insn_to_prefetch_ratio);
-      return false;
+      return true;
     }
 
-  /* Could not do further estimation if the trip count is unknown.  Just assume
-     prefetching is profitable. Too aggressive???  */
-  if (est_niter < 0)
-    return true;
-
-  if (est_niter < (HOST_WIDE_INT) (TRIP_COUNT_TO_AHEAD_RATIO * ahead))
-    {
-      if (dump_file && (dump_flags & TDF_DETAILS))
-	fprintf (dump_file,
-		 "Not prefetching -- loop estimated to roll only %d times\n",
-		 (int) est_niter);
-      return false;
-    }
-  return true;
+  return false;
 }
 
 
@@ -1738,9 +1780,27 @@  loop_prefetch_arrays (struct loop *loop)
       return false;
     }
 
+  /* FIXME: the time should be weighted by the probabilities of the blocks in
+     the loop body.  */
+  time = tree_num_loop_insns (loop, &eni_time_weights);
+  ahead = (PREFETCH_LATENCY + time - 1) / time;
+  est_niter = estimated_loop_iterations_int (loop, false);
+
+  /* Prefetching is not likely to be profitable if the trip count to ahead
+     ratio is too small.  */
+  if (trip_count_to_ahead_ratio_too_small_p (ahead, est_niter))
+    return false;
+
+  ninsns = tree_num_loop_insns (loop, &eni_size_weights);
+
   /* Step 1: gather the memory references.  */
   refs = gather_memory_references (loop, &no_other_refs, &mem_ref_count);
 
+  /* Give up prefetching if the number of memory references in the loop is
+     not reasonable based on profitablity and compilation time considerations.  */
+  if (!mem_ref_count_reasonable_p (ninsns, mem_ref_count))
+    goto fail;
+
   /* Step 2: estimate the reuse effects.  */
   prune_by_reuse (refs);
 
@@ -1749,15 +1809,7 @@  loop_prefetch_arrays (struct loop *loop)
 
   determine_loop_nest_reuse (loop, refs, no_other_refs);
 
-  /* Step 3: determine the ahead and unroll factor.  */
-
-  /* FIXME: the time should be weighted by the probabilities of the blocks in
-     the loop body.  */
-  time = tree_num_loop_insns (loop, &eni_time_weights);
-  ahead = (PREFETCH_LATENCY + time - 1) / time;
-  est_niter = estimated_loop_iterations_int (loop, false);
-
-  ninsns = tree_num_loop_insns (loop, &eni_size_weights);
+  /* Step 3: determine unroll factor.  */
   unroll_factor = determine_unroll_factor (loop, refs, ninsns, &desc,
 					   est_niter);
 
@@ -1773,8 +1825,9 @@  loop_prefetch_arrays (struct loop *loop)
 	     ahead, unroll_factor, est_niter,
 	     ninsns, mem_ref_count, prefetch_count);
 
-  if (!is_loop_prefetching_profitable (ahead, est_niter, ninsns, prefetch_count,
-				       mem_ref_count, unroll_factor))
+  /* Prefetching is not likely to be profitable if the instruction to prefetch
+     ratio is too small.  */
+  if (insn_to_prefetch_ratio_too_small_p (ninsns, prefetch_count, unroll_factor))
     goto fail;
 
   mark_nontemporal_stores (loop, refs);
-- 
1.6.3.3