Message ID | 7794A52CE4D579448B959EED7DD0A472138F1C2A@sausexdag03.amd.com |
---|---|
State | New |
Headers | show |
On Mon, Oct 8, 2012 at 12:01 PM, Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com> wrote: > Hi Richard, > > I have incorporated your comments. > >> Yes, call dump_mem_ref then, instead of repeating parts of its body. > > Reference object is not yet created at the place we check for invariance. It is still a tree expression. I created a common function and used at all places to dump the "step", "base" and "delta" values of memory reference being analyzed. > > Please find the modified patch attached. > > GCC regression "make check -k" passes with x86_64-unknown-linux-gnu. I presume also bootstrapped. Ok. Thanks, Richard. > Regards, > Venkat. > > -----Original Message----- > From: Richard Guenther [mailto:richard.guenther@gmail.com] > Sent: Thursday, October 04, 2012 6:26 PM > To: Kumar, Venkataramanan > Cc: Richard Guenther; gcc-patches@gcc.gnu.org > Subject: Re: [Patch] Fix PR53397 > > On Tue, Oct 2, 2012 at 6:40 PM, Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com> wrote: >> Hi Richi, >> >> (Snip) >>> + (!cst_and_fits_in_hwi (step)) >>> + { >>> + if( loop->inner != NULL) >>> + { >>> + if (dump_file && (dump_flags & TDF_DETAILS)) >>> + { >>> + fprintf (dump_file, "Reference %p:\n", (void *) ref); >>> + fprintf (dump_file, "(base " ); >>> + print_generic_expr (dump_file, base, TDF_SLIM); >>> + fprintf (dump_file, ", step "); >>> + print_generic_expr (dump_file, step, TDF_TREE); >>> + fprintf (dump_file, ")\n"); >> >> No need to repeat this - all references are dumped when we gather them. >> (Snip) >> >> The dumping happens at "record_ref" which is called after these statements to record these references. >> >> When the step is invariant we return from the function without recording the references. >> >> so I thought of dumping the references here. >> >> Is there a cleaner way to dump the references at one place? > > Yes, call dump_mem_ref then, instead of repeating parts of its body. > > Richard. > >> Regards, >> Venkat. >> >> >> >> -----Original Message----- >> From: Richard Guenther [mailto:rguenther@suse.de] >> Sent: Tuesday, October 02, 2012 5:42 PM >> To: Kumar, Venkataramanan >> Cc: gcc-patches@gcc.gnu.org >> Subject: Re: [Patch] Fix PR53397 >> >> On Mon, 1 Oct 2012, venkataramanan.kumar@amd.com wrote: >> >>> Hi, >>> >>> The below patch fixes the FFT/Scimark regression caused by useless >>> prefetch generation. >>> >>> This fix tries to make prefetch less aggressive by prefetching arrays >>> in the inner loop, when the step is invariant in the entire loop nest. >>> >>> GCC currently tries to prefetch invariant steps when they are in the >>> inner loop. But does not check if the step is variant in outer loops. >>> >>> In the scimark FFT case, the trip count of the inner loop varies by a >>> non constant step, which is invariant in the inner loop. >>> But the step variable is varying in outer loop. This makes inner loop >>> trip count small (at run time varies sometimes as small as 1 >>> iteration) >>> >>> Prefetching ahead x iteration when the inner loop trip count is >>> smaller than x leads to useless prefetches. >>> >>> Flag used: -O3 -march=amdfam10 >>> >>> Before >>> ** ** >>> ** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark ** >>> ** for details. (Results can be submitted to pozo@nist.gov) ** >>> ** ** >>> Using 2.00 seconds min time per kenel. >>> Composite Score: 550.50 >>> FFT Mflops: 38.66 (N=1024) >>> SOR Mflops: 617.61 (100 x 100) >>> MonteCarlo: Mflops: 173.74 >>> Sparse matmult Mflops: 675.63 (N=1000, nz=5000) >>> LU Mflops: 1246.88 (M=100, N=100) >>> >>> >>> After >>> ** ** >>> ** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark ** >>> ** for details. (Results can be submitted to pozo@nist.gov) ** >>> ** ** >>> Using 2.00 seconds min time per kenel. >>> Composite Score: 639.20 >>> FFT Mflops: 479.19 (N=1024) >>> SOR Mflops: 617.61 (100 x 100) >>> MonteCarlo: Mflops: 173.18 >>> Sparse matmult Mflops: 679.13 (N=1000, nz=5000) >>> LU Mflops: 1246.88 (M=100, N=100) >>> >>> GCC regression "make check -k" passes with x86_64-unknown-linux-gnu >>> New tests that PASS: >>> >>> gcc.dg/pr53397-1.c scan-assembler prefetcht0 gcc.dg/pr53397-1.c >>> scan-tree-dump aprefetch "Issued prefetch" >>> gcc.dg/pr53397-1.c (test for excess errors) gcc.dg/pr53397-2.c >>> scan-tree-dump aprefetch "loop variant step" >>> gcc.dg/pr53397-2.c scan-tree-dump aprefetch "Not prefetching" >>> gcc.dg/pr53397-2.c (test for excess errors) >>> >>> >>> Checked CPU2006 and polyhedron on latest AMD processor, no regressions noted. >>> >>> Ok to commit in trunk? >>> >>> regards, >>> Venkat >>> >>> gcc/ChangeLog >>> +2012-10-01 Venkataramanan Kumar <venkataramanan.kumar@amd.com> >>> + >>> + * tree-ssa-loop-prefetch.c (gather_memory_references_ref):$ >>> + Perform non constant step prefetching in inner loop, only $ >>> + when it is invariant in the entire loop nest. $ >>> + * testsuite/gcc.dg/pr53397-1.c: New test case $ >>> + Checks we are prefecthing for loop invariant steps$ >>> + * testsuite/gcc.dg/pr53397-2.c: New test case$ >>> + Checks we are not prefecthing for loop variant steps >>> + >>> >>> >>> Index: gcc/testsuite/gcc.dg/pr53397-1.c >>> =================================================================== >>> --- gcc/testsuite/gcc.dg/pr53397-1.c (revision 0) >>> +++ gcc/testsuite/gcc.dg/pr53397-1.c (revision 0) >>> @@ -0,0 +1,28 @@ >>> +/* Prefetching when the step is loop invariant. */ >>> + >>> +/* { dg-do compile } */ >>> +/* { dg-options "-O3 -fprefetch-loop-arrays >>> +-fdump-tree-aprefetch-details --param min-insn-to-prefetch-ratio=3 >>> +--param simultaneous-prefetches=10 -fdump-tree-aprefetch-details" } >>> +*/ >>> + >>> + >>> +double data[16384]; >>> +void prefetch_when_non_constant_step_is_invariant(int step, int n) { >>> + int a; >>> + int b; >>> + for (a = 1; a < step; a++) { >>> + for (b = 0; b < n; b += 2 * step) { >>> + >>> + int i = 2*(b + a); >>> + int j = 2*(b + a + step); >>> + >>> + >>> + data[j] = data[i]; >>> + data[j+1] = data[i+1]; >>> + } >>> + } >>> +} >>> + >>> +/* { dg-final { scan-tree-dump "Issued prefetch" "aprefetch" } } */ >>> +/* { dg-final { scan-assembler "prefetcht0" } } */ >> >> This (and the case below) needs to be adjusted to only run on the appropriate hardware. See for example gcc.dg/tree-ssa/prefetch-8.c for how to do this. >> >>> +/* { dg-final { cleanup-tree-dump "aprefetch" } } */ >>> Index: gcc/testsuite/gcc.dg/pr53397-2.c >>> =================================================================== >>> --- gcc/testsuite/gcc.dg/pr53397-2.c (revision 0) >>> +++ gcc/testsuite/gcc.dg/pr53397-2.c (revision 0) >>> @@ -0,0 +1,29 @@ >>> +/* Not prefetching when the step is loop variant. */ >>> + >>> +/* { dg-do compile } */ >>> +/* { dg-options "-O3 -fprefetch-loop-arrays >>> +-fdump-tree-aprefetch-details --param min-insn-to-prefetch-ratio=3 >>> +--param simultaneous-prefetches=10 -fdump-tree-aprefetch-details" } >>> +*/ >>> + >>> + >>> +double data[16384]; >>> +void donot_prefetch_when_non_constant_step_is_variant(int step, int >>> +n) { >>> + int a; >>> + int b; >>> + for (a = 1; a < step; a++,step*=2) { >>> + for (b = 0; b < n; b += 2 * step) { >>> + >>> + int i = 2*(b + a); >>> + int j = 2*(b + a + step); >>> + >>> + >>> + data[j] = data[i]; >>> + data[j+1] = data[i+1]; >>> + } >>> + } >>> +} >>> + >>> +/* { dg-final { scan-tree-dump "Not prefetching" "aprefetch" } } */ >>> +/* { dg-final { scan-tree-dump "loop variant step" "aprefetch" } } >>> +*/ >>> + >>> +/* { dg-final { cleanup-tree-dump "aprefetch" } } */ >>> + >>> Index: gcc/tree-ssa-loop-prefetch.c >>> =================================================================== >>> --- gcc/tree-ssa-loop-prefetch.c (revision 191642) >>> +++ gcc/tree-ssa-loop-prefetch.c (working copy) >>> @@ -523,6 +523,7 @@ >>> tree base, step; >>> HOST_WIDE_INT delta; >>> struct mem_ref_group *agrp; >>> + loop_p ploop; >>> >>> if (get_base_address (ref) == NULL) >>> return false; >>> @@ -537,10 +538,50 @@ >>> if (may_be_nonaddressable_p (base)) >>> return false; >>> >>> - /* Limit non-constant step prefetching only to the innermost loops. >>> */ >>> - if (!cst_and_fits_in_hwi (step) && loop->inner != NULL) >>> - return false; >>> + /* Limit non-constant step prefetching only to the innermost loops and >>> + only when the step is invariant in the entire loop nest. */ if >>> + (!cst_and_fits_in_hwi (step)) >>> + { >>> + if( loop->inner != NULL) >>> + { >>> + if (dump_file && (dump_flags & TDF_DETAILS)) >>> + { >>> + fprintf (dump_file, "Reference %p:\n", (void *) ref); >>> + fprintf (dump_file, "(base " ); >>> + print_generic_expr (dump_file, base, TDF_SLIM); >>> + fprintf (dump_file, ", step "); >>> + print_generic_expr (dump_file, step, TDF_TREE); >>> + fprintf (dump_file, ")\n"); >> >> No need to repeat this - all references are dumped when we gather them. >> >>> + fprintf (dump_file, "Ignoring %p, non-constant step prefetching\ >>> + is limited to inner most loops \n",(void *) ref); >>> + } >>> + return false; >>> + } >>> + else >>> + { >>> + ploop = loop; >>> >>> + while (loop_outer (ploop)) >>> + { >>> + if (!expr_invariant_in_loop_p (ploop , step)) >>> + { >>> + if (dump_file && (dump_flags & TDF_DETAILS)) >>> + { >>> + fprintf (dump_file, "Reference %p:\n", (void *) ref); >>> + fprintf (dump_file, "(base " ); >>> + print_generic_expr (dump_file, base, TDF_SLIM); >>> + fprintf (dump_file, ", step "); >>> + print_generic_expr (dump_file, step, TDF_TREE); >>> + fprintf (dump_file, ")\n"); >> >> Likewise. >> >>> + fprintf (dump_file, "Not prefetching, ignoring %p due to loop variant step\n",(void *) ref); >>> + } >>> + return false; >>> + } >>> + ploop = loop_outer (ploop); >>> + } >> >> Instead of the loop checking expr_invariant_in_loop_p for each sub-loop you can query expr_invariant_in_loop_p once, for the outermost loop. >> >> Please add a new function to cfgloop.h to access this outermost loop: >> >> /* Returns the outermost loop of the loop nest that contains LOOP. */ >> >> static inline struct loop * >> loop_outermost (const struct loop *loop) { >> unsigned n = VEC_length (loop_p, loop->superloops); >> >> if (n <= 1) >> return loop; >> >> return VEC_index (loop_p, loop->superloops, 1); } >> >> Thanks, >> Richard. >> >> >
Hi! On Mon, Oct 08, 2012 at 10:01:53AM +0000, Kumar, Venkataramanan wrote: Both the testcases fail on i686-linux unfortunately: Excess errors: /usr/src/gcc/gcc/testsuite/gcc.dg/pr53397-1.c:1:0: warning: -fprefetch-loop-arrays not supported for this target (try -march switches) [enabled by default] Guess you need to add -msse2 to dg-options (for x86_64 it doesn't change anything, but for i686 it does). Furthermore, as the testcases are limited to x86_64/i686, I wonder why the tests didn't go into gcc/testsuite/gcc.target/i386/ directory instead. > >> --- gcc/testsuite/gcc.dg/pr53397-1.c (revision 0) > >> +++ gcc/testsuite/gcc.dg/pr53397-1.c (revision 0) > >> @@ -0,0 +1,28 @@ > >> +/* Prefetching when the step is loop invariant. */ > >> + > >> +/* { dg-do compile } */ > >> +/* { dg-options "-O3 -fprefetch-loop-arrays -fdump-tree-aprefetch-details --param min-insn-to-prefetch-ratio=3 --param simultaneous-prefetches=10 -fdump-tree-aprefetch-details" } > >> +*/ > >> + > >> + > >> +double data[16384]; > >> +void prefetch_when_non_constant_step_is_invariant(int step, int n) { > >> + int a; > >> + int b; > >> + for (a = 1; a < step; a++) { > >> + for (b = 0; b < n; b += 2 * step) { > >> + > >> + int i = 2*(b + a); > >> + int j = 2*(b + a + step); > >> + > >> + > >> + data[j] = data[i]; > >> + data[j+1] = data[i+1]; > >> + } > >> + } > >> +} > >> + > >> +/* { dg-final { scan-tree-dump "Issued prefetch" "aprefetch" } } */ > >> +/* { dg-final { scan-assembler "prefetcht0" } } */ > > > > This (and the case below) needs to be adjusted to only run on the appropriate hardware. See for example gcc.dg/tree-ssa/prefetch-8.c for how to do this. > > > >> +/* { dg-final { cleanup-tree-dump "aprefetch" } } */ > >> --- gcc/testsuite/gcc.dg/pr53397-2.c (revision 0) > >> +++ gcc/testsuite/gcc.dg/pr53397-2.c (revision 0) > >> @@ -0,0 +1,29 @@ > >> +/* Not prefetching when the step is loop variant. */ > >> + > >> +/* { dg-do compile } */ > >> +/* { dg-options "-O3 -fprefetch-loop-arrays -fdump-tree-aprefetch-details --param min-insn-to-prefetch-ratio=3 --param simultaneous-prefetches=10 -fdump-tree-aprefetch-details" } > >> +*/ > >> + > >> + > >> +double data[16384]; > >> +void donot_prefetch_when_non_constant_step_is_variant(int step, int > >> +n) { > >> + int a; > >> + int b; > >> + for (a = 1; a < step; a++,step*=2) { > >> + for (b = 0; b < n; b += 2 * step) { > >> + > >> + int i = 2*(b + a); > >> + int j = 2*(b + a + step); > >> + > >> + > >> + data[j] = data[i]; > >> + data[j+1] = data[i+1]; > >> + } > >> + } > >> +} > >> + > >> +/* { dg-final { scan-tree-dump "Not prefetching" "aprefetch" } } */ > >> +/* { dg-final { scan-tree-dump "loop variant step" "aprefetch" } } > >> +*/ > >> + > >> +/* { dg-final { cleanup-tree-dump "aprefetch" } } */ > >> + Jakub
Index: gcc/testsuite/gcc.dg/pr53397-1.c =================================================================== --- gcc/testsuite/gcc.dg/pr53397-1.c (revision 0) +++ gcc/testsuite/gcc.dg/pr53397-1.c (revision 0) @@ -0,0 +1,28 @@ +/* Prefetching when the step is loop invariant. */ +/* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */ +/* { dg-require-effective-target sse2 } */ +/* { dg-options "-O3 -fprefetch-loop-arrays -fdump-tree-aprefetch-details --param min-insn-to-prefetch-ratio=3 --param simultaneous-prefetches=10 -fdump-tree-aprefetch-details" } */ + + +double data[16384]; +void prefetch_when_non_constant_step_is_invariant(int step, int n) +{ + int a; + int b; + for (a = 1; a < step; a++) { + for (b = 0; b < n; b += 2 * step) { + + int i = 2*(b + a); + int j = 2*(b + a + step); + + + data[j] = data[i]; + data[j+1] = data[i+1]; + } + } +} + +/* { dg-final { scan-tree-dump "Issued prefetch" "aprefetch" } } */ +/* { dg-final { scan-assembler "prefetcht0" } } */ + +/* { dg-final { cleanup-tree-dump "aprefetch" } } */ Index: gcc/testsuite/gcc.dg/pr53397-2.c =================================================================== --- gcc/testsuite/gcc.dg/pr53397-2.c (revision 0) +++ gcc/testsuite/gcc.dg/pr53397-2.c (revision 0) @@ -0,0 +1,28 @@ +/* Not prefetching when the step is loop variant. */ +/* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */ +/* { dg-require-effective-target sse2 } */ +/* { dg-options "-O3 -fprefetch-loop-arrays -fdump-tree-aprefetch-details --param min-insn-to-prefetch-ratio=3 --param simultaneous-prefetches=10 -fdump-tree-aprefetch-details" } */ + +double data[16384]; +void donot_prefetch_when_non_constant_step_is_variant(int step, int n) +{ + int a; + int b; + for (a = 1; a < step; a++,step*=2) { + for (b = 0; b < n; b += 2 * step) { + + int i = 2*(b + a); + int j = 2*(b + a + step); + + + data[j] = data[i]; + data[j+1] = data[i+1]; + } + } +} + +/* { dg-final { scan-tree-dump "Not prefetching" "aprefetch" } } */ +/* { dg-final { scan-tree-dump "loop variant step" "aprefetch" } } */ + +/* { dg-final { cleanup-tree-dump "aprefetch" } } */ + Index: gcc/tree-ssa-loop-prefetch.c =================================================================== --- gcc/tree-ssa-loop-prefetch.c (revision 191642) +++ gcc/tree-ssa-loop-prefetch.c (working copy) @@ -278,29 +278,37 @@ nontemporal one. */ }; -/* Dumps information about reference REF to FILE. */ - +/* Dumps information about memory reference */ static void -dump_mem_ref (FILE *file, struct mem_ref *ref) +dump_mem_details (FILE *file, tree base, tree step, + HOST_WIDE_INT delta, bool write_p) { - fprintf (file, "Reference %p:\n", (void *) ref); - - fprintf (file, " group %p (base ", (void *) ref->group); - print_generic_expr (file, ref->group->base, TDF_SLIM); + fprintf (file, "(base "); + print_generic_expr (file, base, TDF_SLIM); fprintf (file, ", step "); - if (cst_and_fits_in_hwi (ref->group->step)) - fprintf (file, HOST_WIDE_INT_PRINT_DEC, int_cst_value (ref->group->step)); + if (cst_and_fits_in_hwi (step)) + fprintf (file, HOST_WIDE_INT_PRINT_DEC, int_cst_value (step)); else - print_generic_expr (file, ref->group->step, TDF_TREE); + print_generic_expr (file, step, TDF_TREE); fprintf (file, ")\n"); - fprintf (file, " delta "); - fprintf (file, HOST_WIDE_INT_PRINT_DEC, ref->delta); + fprintf (file, HOST_WIDE_INT_PRINT_DEC, delta); fprintf (file, "\n"); + fprintf (file, " %s\n", write_p ? "write" : "read"); + fprintf (file, "\n"); +} - fprintf (file, " %s\n", ref->write_p ? "write" : "read"); +/* Dumps information about reference REF to FILE. */ - fprintf (file, "\n"); +static void +dump_mem_ref (FILE *file, struct mem_ref *ref) +{ + fprintf (file, "Reference %p:\n", (void *) ref); + + fprintf (file, " group %p ", (void *) ref->group); + + dump_mem_details (file, ref->group->base, ref->group->step, ref->delta, + ref->write_p); } /* Finds a group with BASE and STEP in GROUPS, or creates one if it does not @@ -537,9 +545,44 @@ if (may_be_nonaddressable_p (base)) return false; - /* Limit non-constant step prefetching only to the innermost loops. */ - if (!cst_and_fits_in_hwi (step) && loop->inner != NULL) - return false; + /* Limit non-constant step prefetching only to the innermost loops and + only when the step is loop invariant in the entire loop nest. */ + if (!cst_and_fits_in_hwi (step)) + { + if (loop->inner != NULL) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Memory expression %p\n",(void *) ref ); + print_generic_expr (dump_file, ref, TDF_TREE); + fprintf (dump_file,":"); + dump_mem_details( dump_file, base, step, delta, write_p); + fprintf (dump_file, + "Ignoring %p, non-constant step prefetching is " + "limited to inner most loops \n", + (void *) ref); + } + return false; + } + else + { + if (!expr_invariant_in_loop_p (loop_outermost (loop), step)) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Memory expression %p\n",(void *) ref ); + print_generic_expr (dump_file, ref, TDF_TREE); + fprintf (dump_file,":"); + dump_mem_details(dump_file, base, step, delta, write_p); + fprintf (dump_file, + "Not prefetching, ignoring %p due to " + "loop variant step\n", + (void *) ref); + } + return false; + } + } + } /* Now we know that REF = &BASE + STEP * iter + DELTA, where DELTA and STEP are integer constants. */ Index: gcc/cfgloop.h =================================================================== --- gcc/cfgloop.h (revision 191642) +++ gcc/cfgloop.h (working copy) @@ -714,4 +714,18 @@ extern void move_loop_invariants (void); extern bool finite_loop_p (struct loop *); +/* Returns the outermost loop of the loop nest that contains LOOP.*/ +static inline struct loop * +loop_outermost (struct loop *loop) +{ + + unsigned n = VEC_length (loop_p, loop->superloops); + + if (n <= 1) + return loop; + + return VEC_index (loop_p, loop->superloops, 1); +} + + #endif /* GCC_CFGLOOP_H */