Patchwork [PR,44576] : improvement in compute_miss_rate for prefetching loop arrays

Submitter Fang, Changpeng
Date June 29, 2010, 12:01 a.m.
Message ID <D4C76825A6780047854A11E93CDE84D02F7757@SAUSEXMBP01.amd.com>
Permalink /patch/57215/
State New

Comments

Fang, Changpeng - June 29, 2010, 12:01 a.m.
Hi,

Attached is the patch that partially fixes bug 44576: testsuite/gfortran.dg/zero_sized_1.f90 takes a huge compile
time with prefetching + peeling.

This patch avoids a useless computation of the miss rate: if delta (the address difference) is greater than or equal to
the cache line size, the two references can never hit the same cache line, so they always miss.
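
To illustrate the reasoning, here is a minimal standalone sketch of the idea. It is not the GCC source: the function names, the parameter names and the use of plain unsigned integers are assumptions made for this example. The brute-force routine counts, over every alignment and iteration, how often address a and address a + delta land in different cache lines; the second routine adds the early return this patch proposes.

```c
#include <assert.h>

/* Hypothetical brute-force model (not the GCC implementation): returns the
   per-mille rate at which addresses A and A + DELTA fall in different
   cache lines, over all alignments and iterations.  */
static int
miss_rate_brute_force (unsigned line_size, unsigned step, unsigned delta,
                       unsigned iters, unsigned align_unit)
{
  unsigned total = 0, misses = 0;

  assert (line_size > 0 && iters > 0 && align_unit > 0);

  /* Every possible alignment of the first reference within its line.  */
  for (unsigned align = 0; align < line_size; align += align_unit)
    /* Every distinct iteration.  */
    for (unsigned iter = 0; iter < iters; iter++)
      {
        unsigned address1 = align + step * iter;
        unsigned address2 = address1 + delta;

        total++;
        if (address1 / line_size != address2 / line_size)
          misses++;
      }

  return (int) (1000 * misses / total);
}

/* Same result, but skips the loops when the answer is known in advance.  */
static int
miss_rate_with_shortcut (unsigned line_size, unsigned step, unsigned delta,
                         unsigned iters, unsigned align_unit)
{
  /* A gap of at least one full line always crosses a line boundary.  */
  if (delta >= line_size)
    return 1000;

  return miss_rate_brute_force (line_size, step, delta, iters, align_unit);
}
```

With a 64-byte line and delta = 64, both routines return 1000, but the second one does so without running the nested loops. With delta = 32, step = 0, a single iteration and a 4-byte alignment unit, half of the 16 alignments cross a line boundary, so the rate is 500.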

This patch reduces the compile time of the test case from 5m30'' to 1m20'' on an amd-linux64 system.
Note that without -fprefetch-loop-arrays, the compile time on the same system is 30'', and I am still
working on reducing the complexity of reuse analysis and miss rate computation.

The patch passed bootstrapping and regression testing.

Is this patch OK to commit?

Thanks,

Changpeng
Richard Guenther - June 29, 2010, 9:33 a.m.
On Tue, Jun 29, 2010 at 2:01 AM, Fang, Changpeng <Changpeng.Fang@amd.com> wrote:
> Is this patch OK to commit?

Ok.

Thanks,
Richard.

Zdenek Dvorak - June 29, 2010, 4:40 p.m.
Hi,

> Is this patch OK to commit?

yes,

Zdenek
Sebastian Pop - July 2, 2010, 4:42 p.m.
On Tue, Jun 29, 2010 at 04:33, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Tue, Jun 29, 2010 at 2:01 AM, Fang, Changpeng <Changpeng.Fang@amd.com> wrote:
>> Is this patch OK to commit?
>
> Ok.
>

Committed r161727

Patch

From b29f8edf2b1a068ab7271746e8c621446e342dc1 Mon Sep 17 00:00:00 2001
From: Changpeng Fang <chfang@pathscale.(none)>
Date: Mon, 28 Jun 2010 10:23:36 -0700
Subject: [PATCH 4/4] pr 44576: miss rate computation improvement for prefetching loop arrays.

	* tree-ssa-loop-prefetch.c (compute_miss_rate): Return 1000 (out of 1000)
	for miss rate if the address difference is greater than or equal to the
	cache line size (the two references will never hit the same cache
	line).
---
 gcc/tree-ssa-loop-prefetch.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/gcc/tree-ssa-loop-prefetch.c b/gcc/tree-ssa-loop-prefetch.c
index 548c3e4..27e2b42 100644
--- a/gcc/tree-ssa-loop-prefetch.c
+++ b/gcc/tree-ssa-loop-prefetch.c
@@ -654,6 +654,11 @@  compute_miss_rate (unsigned HOST_WIDE_INT cache_line_size,
   int total_positions, miss_positions, miss_rate;
   int address1, address2, cache_line1, cache_line2;
 
+  /* It always misses if delta is greater than or equal to the cache
+     line size.  */ 
+  if (delta >= cache_line_size)
+    return 1000;
+
   total_positions = 0;
   miss_positions = 0;
 
-- 
1.6.3.3