[PR,44576]: improvement in compute_miss_rate for prefetching loop arrays

Submitted by Fang, Changpeng on June 29, 2010, 12:01 a.m.

Details

Message ID D4C76825A6780047854A11E93CDE84D02F7757@SAUSEXMBP01.amd.com
State New

Commit Message

Fang, Changpeng June 29, 2010, 12:01 a.m.
Hi,

Attached is a patch that partially fixes bug 44576: huge compile time for
testsuite/gfortran.dg/zero_sized_1.f90 with prefetching + peeling.

This patch avoids a useless computation of the miss rate: if delta (the address difference) is greater than or
equal to the cache line size, the two references can never hit the same cache line, so every access is a miss.

This patch reduces the compile time of the test case from 5m30s to 1m20s on an amd-linux64 system.
Note that without -fprefetch-loop-arrays the compile time on the same system is 30s, and I am still
working on reducing the complexity of the reuse analysis and miss rate computation.

The patch passed bootstrap and regression testing.

Is this patch OK to commit?

Thanks,

Changpeng
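The argument that delta >= cache line size always means two different lines can be written out as a short derivation. This is a sketch under stated assumptions: addresses are non-negative byte offsets, the line size L is positive, and the symbols a_1, a_2, k and Delta are introduced here for illustration only.

```latex
\begin{aligned}
&\text{Let } L = \text{cache line size},\quad a_2 = a_1 + \Delta,\quad \Delta \ge L,\quad k = \lfloor a_1 / L \rfloor. \\
&\text{Then } a_1 \ge kL \;\Rightarrow\; a_2 = a_1 + \Delta \ge kL + L = (k+1)L \\
&\Rightarrow\; \lfloor a_2 / L \rfloor \ge k + 1 > k = \lfloor a_1 / L \rfloor .
\end{aligned}
```

Since this holds for every alignment and every iteration, an exhaustive enumeration would count every position as a miss, which is why returning 1000 (out of 1000) directly gives the same answer.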

Comments

Richard Guenther June 29, 2010, 9:33 a.m.
On Tue, Jun 29, 2010 at 2:01 AM, Fang, Changpeng <Changpeng.Fang@amd.com> wrote:
> Is this patch OK to commit?

Ok.

Thanks,
Richard.

Zdenek Dvorak June 29, 2010, 4:40 p.m.
Hi,

> Is this patch OK to commit?

yes,

Zdenek
Sebastian Pop July 2, 2010, 4:42 p.m.
On Tue, Jun 29, 2010 at 04:33, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Tue, Jun 29, 2010 at 2:01 AM, Fang, Changpeng <Changpeng.Fang@amd.com> wrote:
>> Is this patch OK to commit?
>
> Ok.
>

Committed r161727


From b29f8edf2b1a068ab7271746e8c621446e342dc1 Mon Sep 17 00:00:00 2001
From: Changpeng Fang <chfang@pathscale.(none)>
Date: Mon, 28 Jun 2010 10:23:36 -0700
Subject: [PATCH 4/4] pr 44576: miss rate computation improvement for prefetching loop arrays.

	* tree-ssa-loop-prefetch.c (compute_miss_rate): Return 1000 (out of 1000)
	for miss rate if the address difference is greater than or equal to the
	cache line size (the two references will never hit the same cache
	line).
---
 gcc/tree-ssa-loop-prefetch.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/gcc/tree-ssa-loop-prefetch.c b/gcc/tree-ssa-loop-prefetch.c
index 548c3e4..27e2b42 100644
--- a/gcc/tree-ssa-loop-prefetch.c
+++ b/gcc/tree-ssa-loop-prefetch.c
@@ -654,6 +654,11 @@  compute_miss_rate (unsigned HOST_WIDE_INT cache_line_size,
   int total_positions, miss_positions, miss_rate;
   int address1, address2, cache_line1, cache_line2;
 
+  /* It always misses if delta is greater than or equal to the cache
+     line size.  */ 
+  if (delta >= cache_line_size)
+    return 1000;
+
   total_positions = 0;
   miss_positions = 0;
 
-- 
1.6.3.3