
PR44185 Fix new prefetch test failures - second

Message ID D4C76825A6780047854A11E93CDE84D02F772D@SAUSEXMBP01.amd.com
State New

Commit Message

Fang, Changpeng June 10, 2010, 12:13 a.m. UTC
Hi,

>> Attached is the patch to fix PR 44185: new prefetch failures. After the previous fix
>> of gcc.dg/tree-ssa/prefetch-7.c, new failures occur because the number of non-temporal
>> stores in the assembler and .optimized files differs across architectures. The
>> reason is that the unroll factor differs (which is why the original test case limited the unroll
>> factor to 1).
>>
>> In this patch, we don't count the exact number of non-temporal stores in the assembler and
>> .optimized files. Instead, we just scan for the existence.
>>
>> Is it OK for the trunk?
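
To illustrate the unroll-factor dependence described above, here is a hand-written sketch. It is not compiler output; the function names and the factor of 4 are assumptions chosen for illustration. One store in the source becomes several store statements once the loop body is unrolled, so an exact count of non-temporal stores in a dump depends on the factor the target picks.

```c
#include <stddef.h>

/* Hypothetical illustration: the loop as written in the test. */
void clear_rolled(int *a, size_t n)
{
  size_t i;

  for (i = 0; i < n; i++)
    a[i] = 0;              /* one store statement in the loop body */
}

/* The same loop unrolled by hand with an assumed factor of 4. */
void clear_unrolled4(int *a, size_t n)
{
  size_t i = 0;

  for (; i + 4 <= n; i += 4)
    {
      a[i] = 0;            /* four store statements per iteration */
      a[i + 1] = 0;
      a[i + 2] = 0;
      a[i + 3] = 0;
    }

  for (; i < n; i++)       /* remainder loop for leftover elements */
    a[i] = 0;
}
```

Both functions clear the array; only the number of store statements in the generated body differs, which is what an exact-count scan would trip over.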

>yes, overall; but, with this change, we would not recognize if some change
>breaks the optimization for just one of the loops.  I guess the test needs to
>be split to several testcases, one for each loop,
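
The trade-off between counting and scanning for existence can be sketched in plain C. This is a simplification: the real directives match regular expressions, while the helpers below (count_matches and has_match, both hypothetical names) use plain substring search, and the two assembler excerpts are made up for the example.

```c
#include <stddef.h>
#include <string.h>

/* Count non-overlapping occurrences of PAT in TEXT (substring match only). */
size_t count_matches(const char *text, const char *pat)
{
  size_t n = 0;
  size_t len = strlen(pat);
  const char *p;

  if (len == 0)
    return 0;

  for (p = strstr(text, pat); p != NULL; p = strstr(p + len, pat))
    n++;

  return n;
}

/* Return nonzero if PAT occurs at least once in TEXT. */
int has_match(const char *text, const char *pat)
{
  return strstr(text, pat) != NULL;
}

/* Made-up assembler excerpts for an assumed unroll factor of 1 and of 4. */
const char asm_unroll1[] =
  "\tmovnti\t%eax, (%edx)\n"
  "\tmfence\n";

const char asm_unroll4[] =
  "\tmovnti\t%eax, (%edx)\n"
  "\tmovnti\t%eax, 4(%edx)\n"
  "\tmovnti\t%eax, 8(%edx)\n"
  "\tmovnti\t%eax, 12(%edx)\n"
  "\tmfence\n";
```

An exact count of "movnti" gives a different answer for the two excerpts, while the existence check and the "mfence" count agree. With one loop per file, the existence check is still enough to notice when the optimization stops firing for that loop.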

Attached is a patch that splits the tests as follows:

prefetch-7.c: the loops that generate non-temporal stores are moved out.  As a result, no loop in
prefetch-7.c should generate a non-temporal store now.

prefetch-8.c: contains a loop that should generate one non-temporal store (before unrolling).

prefetch-9.c: contains a loop that should generate one non-temporal store and one non-temporal prefetch.

Is this OK?
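
For readers unfamiliar with what the new tests scan for, here is a hand-written sketch of the shape expected from the prefetch-8.c loop: non-temporal stores (movnti) in the loop, followed by a single mfence. It uses the SSE2 intrinsics _mm_stream_si32 and _mm_mfence from <emmintrin.h>; the function name fill_zero_nt is hypothetical, and this is not the code the aprefetch pass emits. Without SSE2 it falls back to ordinary stores.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: zero DST[0..N-1], using non-temporal stores
   when SSE2 is available.  */
void fill_zero_nt(int *dst, size_t n)
{
  size_t i;

  assert(dst != NULL || n == 0);

  for (i = 0; i < n; i++)
    {
#if defined(__SSE2__)
      _mm_stream_si32(&dst[i], 0);   /* movnti: store bypassing the cache */
#else
      dst[i] = 0;                    /* fallback: ordinary store */
#endif
    }

#if defined(__SSE2__)
  _mm_mfence();                      /* one fence after the loop */
#endif
}
```

The fence sits after the loop rather than inside it, which is why its count stays at 1 regardless of how many movnti instructions unrolling produces.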

Thanks,

Changpeng

Comments

Zdenek Dvorak June 10, 2010, 7:07 a.m. UTC | #1
Hi,

OK,

Zdenek
Sebastian Pop June 10, 2010, 5:55 p.m. UTC | #2
Committed r160566.

Sebastian Pop
--
AMD / Open Source Compiler Engineering / GNU Tools


Patch

From 5545109a5d8da47e9f71658a282623fb9e4e8a52 Mon Sep 17 00:00:00 2001
From: Changpeng Fang <chfang@houghton.(none)>
Date: Wed, 9 Jun 2010 16:54:21 -0700
Subject: [PATCH] pr44185: fix prefetch test failures

	* gcc.dg/tree-ssa/prefetch-7.c: Take the loops that generate
	non-temporal stores out of the test to form new test cases. As
	a result, no non-temporal store should be generated in this test.

	* gcc.dg/tree-ssa/prefetch-8.c: New. Test from the original prefetch-7.c
	that generates one non-temporal store.

	* gcc.dg/tree-ssa/prefetch-9.c: New. Test from the original prefetch-7.c
	that generates one non-temporal store and one non-temporal prefetch.
---
 gcc/testsuite/gcc.dg/tree-ssa/prefetch-7.c |   22 ++++--------------
 gcc/testsuite/gcc.dg/tree-ssa/prefetch-8.c |   28 ++++++++++++++++++++++++
 gcc/testsuite/gcc.dg/tree-ssa/prefetch-9.c |   32 ++++++++++++++++++++++++++++
 3 files changed, 65 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/prefetch-8.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/prefetch-9.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/prefetch-7.c b/gcc/testsuite/gcc.dg/tree-ssa/prefetch-7.c
index 3b9e19f..9e453a7 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/prefetch-7.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/prefetch-7.c
@@ -5,20 +5,12 @@ 
 /* { dg-options "-O2 -fprefetch-loop-arrays -march=athlon -msse2 -mfpmath=sse --param simultaneous-prefetches=100 -fdump-tree-aprefetch-details -fdump-tree-optimized" } */
 
 #define K 1000000
-int a[K], b[K];
+int a[K];
 
 void test(int *p)
 {
   unsigned i;
 
-  /* Nontemporal store should be used for a.  */
-  for (i = 0; i < K; i++)
-    a[i] = 0;
-
-  /* Nontemporal store should be used for a, nontemporal prefetch for b.  */
-  for (i = 0; i < K; i++)
-    a[i] = b[i];
-
   /* Nontemporal store should not be used here (only write and read temporal
      prefetches).  */
   for (i = 0; i < K - 10000; i++)
@@ -44,18 +36,14 @@  void test(int *p)
 }
 
 /* { dg-final { scan-tree-dump-times "Issued prefetch" 5 "aprefetch" } } */
-/* { dg-final { scan-tree-dump-times "Issued nontemporal prefetch" 3 "aprefetch" } } */
-/* { dg-final { scan-tree-dump-times "a nontemporal store" 2 "aprefetch" } } */
+/* { dg-final { scan-tree-dump-times "Issued nontemporal prefetch" 2 "aprefetch" } } */
+/* { dg-final { scan-tree-dump-times "a nontemporal store" 0 "aprefetch" } } */
 
-/* { dg-final { scan-tree-dump-times "builtin_prefetch" 8 "optimized" } } */
-/* { dg-final { scan-tree-dump-times "=\\{nt\\}" 18 "optimized" } } */
-/* { dg-final { scan-tree-dump-times "__builtin_ia32_mfence" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "builtin_prefetch" 7 "optimized" } } */
 
 /* { dg-final { scan-assembler-times "prefetchw" 5 } } */
 /* { dg-final { scan-assembler-times "prefetcht" 1 } } */
-/* { dg-final { scan-assembler-times "prefetchnta" 2 } } */
-/* { dg-final { scan-assembler-times "movnti" 18 } } */
-/* { dg-final { scan-assembler-times "mfence" 2 } } */
+/* { dg-final { scan-assembler-times "prefetchnta" 1 } } */
 
 /* { dg-final { cleanup-tree-dump "aprefetch" } } */
 /* { dg-final { cleanup-tree-dump "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/prefetch-8.c b/gcc/testsuite/gcc.dg/tree-ssa/prefetch-8.c
new file mode 100644
index 0000000..a05d552
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/prefetch-8.c
@@ -0,0 +1,28 @@ 
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-require-effective-target ilp32 } */
+/* { dg-require-effective-target sse2 } */
+/* { dg-skip-if "" { i?86-*-* x86_64-*-* } { "-march=*" } { "-march=athlon" } } */
+/* { dg-options "-O2 -fprefetch-loop-arrays -march=athlon -msse2 -mfpmath=sse --param simultaneous-prefetches=100 -fdump-tree-aprefetch-details -fdump-tree-optimized" } */
+
+#define K 1000000
+int a[K];
+
+void test()
+{
+  unsigned i;
+
+  /* Nontemporal store should be used for a.  */
+  for (i = 0; i < K; i++)
+    a[i] = 0;
+}
+
+/* { dg-final { scan-tree-dump-times "a nontemporal store" 1 "aprefetch" } } */
+
+/* { dg-final { scan-tree-dump "=\\{nt\\}" "optimized" } } */
+/* { dg-final { scan-tree-dump-times "__builtin_ia32_mfence" 1 "optimized" } } */
+
+/* { dg-final { scan-assembler "movnti" } } */
+/* { dg-final { scan-assembler-times "mfence" 1 } } */
+
+/* { dg-final { cleanup-tree-dump "aprefetch" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/prefetch-9.c b/gcc/testsuite/gcc.dg/tree-ssa/prefetch-9.c
new file mode 100644
index 0000000..eb22a66
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/prefetch-9.c
@@ -0,0 +1,32 @@ 
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-require-effective-target ilp32 } */
+/* { dg-require-effective-target sse2 } */
+/* { dg-skip-if "" { i?86-*-* x86_64-*-* } { "-march=*" } { "-march=athlon" } } */
+/* { dg-options "-O2 -fprefetch-loop-arrays -march=athlon -msse2 -mfpmath=sse --param simultaneous-prefetches=100 -fdump-tree-aprefetch-details -fdump-tree-optimized" } */
+
+#define K 1000000
+int a[K], b[K];
+
+void test()
+{
+  unsigned i;
+
+  /* Nontemporal store should be used for a, nontemporal prefetch for b.  */
+  for (i = 0; i < K; i++)
+    a[i] = b[i];
+
+}
+
+/* { dg-final { scan-tree-dump-times "Issued nontemporal prefetch" 1 "aprefetch" } } */
+/* { dg-final { scan-tree-dump-times "a nontemporal store" 1 "aprefetch" } } */
+
+/* { dg-final { scan-tree-dump-times "builtin_prefetch" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump "=\\{nt\\}" "optimized" } } */
+/* { dg-final { scan-tree-dump-times "__builtin_ia32_mfence" 1 "optimized" } } */
+
+/* { dg-final { scan-assembler-times "prefetchnta" 1 } } */
+/* { dg-final { scan-assembler "movnti" } } */
+/* { dg-final { scan-assembler-times "mfence" 1 } } */
+
+/* { dg-final { cleanup-tree-dump "aprefetch" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
-- 
1.6.3.3