From patchwork Fri Jul 9 01:15:26 2010 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Subject: [PR44576] : Don't do further analysis if we could know prefetching is not benefitial From: "Fang, Changpeng" X-Patchwork-Id: 58318 Message-Id: To: Zdenek Dvorak Cc: Richard Guenther , Sebastian Pop , Christian Borntraeger , "gcc-patches@gcc.gnu.org" , "uweigand@de.ibm.com" Date: Thu, 8 Jul 2010 20:15:26 -0500 Hi, Attached is the patch to fix bug 44576. Some prefetch analysis (e.g. miss rate and dependence computations for reuse analysis) has been known to be expansive. We should avoid un-necessary analysis whenever possible. In this patch, we split the cost model implementing function (is_loop_prefetching_profitable) into three small functions (trip_count_to_ahead_ratio_too_small_p, mem_ref_count_reasonable_p and insn_to_prefetch_ratio_too_small_p), with each one called at the earliest possible time. In this way, we can reduce the compilation time. In this patch, we also introduce a new parameter, PREFETCH_MAX_MEM_REFS_PER_LOOP. We give up prefetching if the number of memory references in a loop is above this threshold. This is based on both compilation time and memory consumption considerations. We found that the memory allocation and freeing of data dependence relation (for reuse analysis) will become the bottleneck if there is too many memory references in the loop. This patch passed bootstrapping and completely fixed bug 44576. Is it OK for the trunk? Thanks, Changpeng >From 8d878775b85c961ab02076c7deffebcc38cd35b5 Mon Sep 17 00:00:00 2001 From: Changpeng Fang Date: Thu, 8 Jul 2010 13:32:11 -0700 Subject: [PATCH] pr44576 Avoid un-necessary prefetch analysis by distributing the cost models * tree-ssa-loop-prefetch.c (trip_count_to_ahead_ratio_too_small_p): New. Pull out from is_loop_prefetching_profitable to implement the trip count to ahead ratio heuristic. (mem_ref_count_reasonable_p): New. Pull out from is_loop_prefetching_profitable to implement the instruction to memory reference ratio heuristic. Also consider not reasonable if the memory reference count is above a threshold (to avoid explosive compilation time. (insn_to_prefetch_ratio_too_small_p): New. Pull out from is_loop_prefetching_profitable to implement the instruction to prefetch ratio heuristic. (is_loop_prefetching_profitable): Removed. (loop_prefetch_arrays): Distribute the cost analysis across the function to allow early exit of the prefetch analysis. is_loop_prefetching_profitable is splitted into three functions, with each one called as early as possible. (PREFETCH_MAX_MEM_REFS_PER_LOOP): New. Threshold above which the number of memory references in a loop is considered too many. --- gcc/tree-ssa-loop-prefetch.c | 145 ++++++++++++++++++++++++++++------------- 1 files changed, 99 insertions(+), 46 deletions(-) diff --git a/gcc/tree-ssa-loop-prefetch.c b/gcc/tree-ssa-loop-prefetch.c index c3e90d2..9f2bbba 100644 --- a/gcc/tree-ssa-loop-prefetch.c +++ b/gcc/tree-ssa-loop-prefetch.c @@ -109,18 +109,23 @@ along with GCC; see the file COPYING3. If not see prefetch instructions with guards in cases where 5) was not sufficient to satisfy the constraints? - The function is_loop_prefetching_profitable() implements a cost model - to determine if prefetching is profitable for a given loop. The cost - model has two heuristcs: - 1. A heuristic that determines whether the given loop has enough CPU - ops that can be overlapped with cache missing memory ops. - If not, the loop won't benefit from prefetching. This is implemented - by requirung the ratio between the instruction count and the mem ref - count to be above a certain minimum. - 2. A heuristic that disables prefetching in a loop with an unknown trip - count if the prefetching cost is above a certain limit. The relative - prefetching cost is estimated by taking the ratio between the - prefetch count and the total intruction count (this models the I-cache + A cost model is implemented to determine whether or not prefetching is + profitable for a given loop. The cost model has three heuristics: + 1. Function trip_count_to_ahead_ratio_too_small_p implements a heuristic + that determines whether or not the loop has too few iterations + (compared to ahead). Prefetching is not likely to be beneficial if the + trip count to ahead ratio is below a certain minimum. + + 2. Function mem_ref_count_reasonable_p implements a heuristic that determines + whether the given loop has enough CPU ops that can be overlapped with cache + missing memory ops. If not, the loop won't benefit from prefetching. In the + implementation, prefetching is not considered beneficial if the ratio between + the instruction count and the mem ref count is below a certain minimum. + + 3. Function insn_to_prefetch_ratio_too_small_p implements a heuristic that + disables prefetching in a loop if the prefetching cost is above a certain + limit. The relative prefetching cost is estimated by taking the ratio between + the prefetch count and the total intruction count (this models the I-cache cost). The limits used in these heuristics are defined as parameters with reasonable default values. Machine-specific default values will be @@ -237,6 +242,14 @@ struct mem_ref_group #define PREFETCH_MOD_TO_UNROLL_FACTOR_RATIO 4 #endif +/* Some of the prefetch computations have quadratic complexity. We want to + avoid huge compile times and, therefore, want to limit the amount of + memory references per loop where we consider prefetching. */ + +#ifndef PREFETCH_MAX_MEM_REFS_PER_LOOP +#define PREFETCH_MAX_MEM_REFS_PER_LOOP 200 +#endif + /* The memory reference. */ struct mem_ref @@ -1640,24 +1653,51 @@ determine_loop_nest_reuse (struct loop *loop, struct mem_ref_group *refs, } } -/* Do a cost-benefit analysis to determine if prefetching is profitable - for the current loop given the following parameters: +/* Determine whether or not the trip count to ahead ratio is too small based + on prefitablility consideration. AHEAD: the iteration ahead distance, - EST_NITER: the estimated trip count, + EST_NITER: the estimated trip count. */ + +static bool +trip_count_to_ahead_ratio_too_small_p (unsigned ahead, HOST_WIDE_INT est_niter) +{ + /* Assume trip count to ahead ratio is big enough if the trip count could not + be estimated at compile time. */ + if (est_niter < 0) + return false; + + if (est_niter < (HOST_WIDE_INT) (TRIP_COUNT_TO_AHEAD_RATIO * ahead)) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, + "Not prefetching -- loop estimated to roll only %d times\n", + (int) est_niter); + return true; + } + + return false; +} + +/* Determine whether or not the number of memory references in the loop is + reasonable based on the profitablity and compilation time considerations. NINSNS: estimated number of instructions in the loop, - PREFETCH_COUNT: an estimate of the number of prefetches MEM_REF_COUNT: total number of memory references in the loop. */ static bool -is_loop_prefetching_profitable (unsigned ahead, HOST_WIDE_INT est_niter, - unsigned ninsns, unsigned prefetch_count, - unsigned mem_ref_count, unsigned unroll_factor) +mem_ref_count_reasonable_p (unsigned ninsns, unsigned mem_ref_count) { - int insn_to_mem_ratio, insn_to_prefetch_ratio; + int insn_to_mem_ratio; if (mem_ref_count == 0) return false; + /* Miss rate computation (is_miss_rate_acceptable) and dependence analysis + (compute_all_dependences) have high costs based on quadratic complexity. + To avoid huge compilation time, we give up prefetching if mem_ref_count + is too large. */ + if (mem_ref_count > PREFETCH_MAX_MEM_REFS_PER_LOOP) + return false; + /* Prefetching improves performance by overlapping cache missing memory accesses with CPU operations. If the loop does not have enough CPU operations to overlap with memory operations, prefetching @@ -1678,6 +1718,21 @@ is_loop_prefetching_profitable (unsigned ahead, HOST_WIDE_INT est_niter, return false; } + return true; +} + +/* Determine whether or not the instruction to prefetch ratio in the loop is + too small based on the profitablity consideration. + NINSNS: estimated number of instructions in the loop, + PREFETCH_COUNT: an estimate of the number of prefetches, + UNROLL_FACTOR: the factor to unroll the loop if prefetching. */ + +static bool +insn_to_prefetch_ratio_too_small_p (unsigned ninsns, unsigned prefetch_count, + unsigned unroll_factor) +{ + int insn_to_prefetch_ratio; + /* Prefetching most likely causes performance degradation when the instruction to prefetch ratio is too small. Too many prefetch instructions in a loop may reduce the I-cache performance. @@ -1697,23 +1752,10 @@ is_loop_prefetching_profitable (unsigned ahead, HOST_WIDE_INT est_niter, fprintf (dump_file, "Not prefetching -- instruction to prefetch ratio (%d) too small\n", insn_to_prefetch_ratio); - return false; + return true; } - /* Could not do further estimation if the trip count is unknown. Just assume - prefetching is profitable. Too aggressive??? */ - if (est_niter < 0) - return true; - - if (est_niter < (HOST_WIDE_INT) (TRIP_COUNT_TO_AHEAD_RATIO * ahead)) - { - if (dump_file && (dump_flags & TDF_DETAILS)) - fprintf (dump_file, - "Not prefetching -- loop estimated to roll only %d times\n", - (int) est_niter); - return false; - } - return true; + return false; } @@ -1738,9 +1780,27 @@ loop_prefetch_arrays (struct loop *loop) return false; } + /* FIXME: the time should be weighted by the probabilities of the blocks in + the loop body. */ + time = tree_num_loop_insns (loop, &eni_time_weights); + ahead = (PREFETCH_LATENCY + time - 1) / time; + est_niter = estimated_loop_iterations_int (loop, false); + + /* Prefetching is not likely to be profitable if the trip count to ahead + ratio is too small. */ + if (trip_count_to_ahead_ratio_too_small_p (ahead, est_niter)) + return false; + + ninsns = tree_num_loop_insns (loop, &eni_size_weights); + /* Step 1: gather the memory references. */ refs = gather_memory_references (loop, &no_other_refs, &mem_ref_count); + /* Give up prefetching if the number of memory references in the loop is + not reasonable based on profitablity and compilation time considerations. */ + if (!mem_ref_count_reasonable_p (ninsns, mem_ref_count)) + goto fail; + /* Step 2: estimate the reuse effects. */ prune_by_reuse (refs); @@ -1749,15 +1809,7 @@ loop_prefetch_arrays (struct loop *loop) determine_loop_nest_reuse (loop, refs, no_other_refs); - /* Step 3: determine the ahead and unroll factor. */ - - /* FIXME: the time should be weighted by the probabilities of the blocks in - the loop body. */ - time = tree_num_loop_insns (loop, &eni_time_weights); - ahead = (PREFETCH_LATENCY + time - 1) / time; - est_niter = estimated_loop_iterations_int (loop, false); - - ninsns = tree_num_loop_insns (loop, &eni_size_weights); + /* Step 3: determine unroll factor. */ unroll_factor = determine_unroll_factor (loop, refs, ninsns, &desc, est_niter); @@ -1773,8 +1825,9 @@ loop_prefetch_arrays (struct loop *loop) ahead, unroll_factor, est_niter, ninsns, mem_ref_count, prefetch_count); - if (!is_loop_prefetching_profitable (ahead, est_niter, ninsns, prefetch_count, - mem_ref_count, unroll_factor)) + /* Prefetching is not likely to be profitable if the instruction to prefetch + ratio is too small. */ + if (insn_to_prefetch_ratio_too_small_p (ninsns, prefetch_count, unroll_factor)) goto fail; mark_nontemporal_stores (loop, refs); -- 1.6.3.3