Patchwork Fix PR53397

login
register
mail settings
Submitter venkataramanan.kumar@amd.com
Date Oct. 8, 2012, 10:01 a.m.
Message ID <7794A52CE4D579448B959EED7DD0A472138F1C2A@sausexdag03.amd.com>
Download mbox | patch
Permalink /patch/189968/
State New
Headers show

Comments

venkataramanan.kumar@amd.com - Oct. 8, 2012, 10:01 a.m.
Hi Richard,

I have incorporated your comments. 

> Yes, call dump_mem_ref then, instead of repeating parts of its body.

Reference object  is not yet created at the place we check for invariance. It is still a tree expression.  I created a common function and used at all places to dump the "step", "base" and "delta" values of  memory reference being analyzed.

Please find the modified patch attached.

GCC regression "make check -k" passes with x86_64-unknown-linux-gnu.

Regards,
Venkat.

-----Original Message-----
From: Richard Guenther [mailto:richard.guenther@gmail.com] 
Sent: Thursday, October 04, 2012 6:26 PM
To: Kumar, Venkataramanan
Cc: Richard Guenther; gcc-patches@gcc.gnu.org
Subject: Re: [Patch] Fix PR53397

On Tue, Oct 2, 2012 at 6:40 PM, Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com> wrote:
> Hi Richi,
>
> (Snip)
>> + (!cst_and_fits_in_hwi (step))
>> +    {
>> +      if( loop->inner != NULL)
>> +        {
>> +          if (dump_file && (dump_flags & TDF_DETAILS))
>> +            {
>> +              fprintf (dump_file, "Reference %p:\n", (void *) ref);
>> +              fprintf (dump_file, "(base " );
>> +              print_generic_expr (dump_file, base, TDF_SLIM);
>> +              fprintf (dump_file, ", step ");
>> +              print_generic_expr (dump_file, step, TDF_TREE);
>> +              fprintf (dump_file, ")\n");
>
> No need to repeat this - all references are dumped when we gather them.
> (Snip)
>
> The dumping happens at "record_ref" which is called after these statements to record these references.
>
> When the step is invariant  we return from the function without recording the references.
>
>  so I thought of dumping the references here.
>
> Is there a cleaner way to dump the references at one place?

Yes, call dump_mem_ref then, instead of repeating parts of its body.

Richard.

> Regards,
> Venkat.
>
>
>
> -----Original Message-----
> From: Richard Guenther [mailto:rguenther@suse.de]
> Sent: Tuesday, October 02, 2012 5:42 PM
> To: Kumar, Venkataramanan
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [Patch] Fix PR53397
>
> On Mon, 1 Oct 2012, venkataramanan.kumar@amd.com wrote:
>
>> Hi,
>>
>> The below patch fixes the FFT/Scimark regression caused by useless 
>> prefetch generation.
>>
>> This fix tries to make prefetch less aggressive by prefetching arrays 
>> in the inner loop, when the step is invariant in the entire loop nest.
>>
>> GCC currently tries to prefetch invariant steps when they are in the 
>> inner loop. But does not check if the step is variant in outer loops.
>>
>> In the scimark FFT case, the trip count of the inner loop varies by a 
>> non constant step, which is invariant in the inner loop.
>> But the step variable is varying in outer loop. This makes inner loop 
>> trip count small (at run time varies sometimes as small as 1
>> iteration)
>>
>> Prefetching ahead x iteration when the inner loop trip count is 
>> smaller than x leads to useless prefetches.
>>
>> Flag used: -O3 -march=amdfam10
>>
>> Before
>> **                                                              **
>> ** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
>> ** for details. (Results can be submitted to pozo@nist.gov)     **
>> **                                                              **
>> Using       2.00 seconds min time per kenel.
>> Composite Score:          550.50
>> FFT             Mflops:    38.66    (N=1024)
>> SOR             Mflops:   617.61    (100 x 100)
>> MonteCarlo:     Mflops:   173.74
>> Sparse matmult  Mflops:   675.63    (N=1000, nz=5000)
>> LU              Mflops:  1246.88    (M=100, N=100)
>>
>>
>> After
>> **                                                              **
>> ** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
>> ** for details. (Results can be submitted to pozo@nist.gov)     **
>> **                                                              **
>> Using       2.00 seconds min time per kenel.
>> Composite Score:          639.20
>> FFT             Mflops:   479.19    (N=1024)
>> SOR             Mflops:   617.61    (100 x 100)
>> MonteCarlo:     Mflops:   173.18
>> Sparse matmult  Mflops:   679.13    (N=1000, nz=5000)
>> LU              Mflops:  1246.88    (M=100, N=100)
>>
>> GCC regression "make check -k" passes with x86_64-unknown-linux-gnu 
>> New tests that PASS:
>>
>> gcc.dg/pr53397-1.c scan-assembler prefetcht0 gcc.dg/pr53397-1.c 
>> scan-tree-dump aprefetch "Issued prefetch"
>> gcc.dg/pr53397-1.c (test for excess errors) gcc.dg/pr53397-2.c 
>> scan-tree-dump aprefetch "loop variant step"
>> gcc.dg/pr53397-2.c scan-tree-dump aprefetch "Not prefetching"
>> gcc.dg/pr53397-2.c (test for excess errors)
>>
>>
>> Checked CPU2006 and polyhedron on latest AMD processor, no regressions noted.
>>
>> Ok to commit in trunk?
>>
>> regards,
>> Venkat
>>
>> gcc/ChangeLog
>> +2012-10-01  Venkataramanan Kumar  <venkataramanan.kumar@amd.com>
>> +
>> +   * tree-ssa-loop-prefetch.c (gather_memory_references_ref):$
>> +   Perform non constant step prefetching in inner loop, only $
>> +   when it is invariant in the entire loop nest.  $
>> +   * testsuite/gcc.dg/pr53397-1.c: New test case $
>> +   Checks we are prefecthing for loop invariant steps$
>> +   * testsuite/gcc.dg/pr53397-2.c: New test case$
>> +   Checks we are not prefecthing for loop variant steps
>> +
>>
>>
>> Index: gcc/testsuite/gcc.dg/pr53397-1.c 
>> ===================================================================
>> --- gcc/testsuite/gcc.dg/pr53397-1.c  (revision 0)
>> +++ gcc/testsuite/gcc.dg/pr53397-1.c  (revision 0)
>> @@ -0,0 +1,28 @@
>> +/* Prefetching when the step is loop invariant.  */
>> +
>> +/* { dg-do compile } */
>> +/* { dg-options "-O3 -fprefetch-loop-arrays 
>> +-fdump-tree-aprefetch-details --param min-insn-to-prefetch-ratio=3 
>> +--param simultaneous-prefetches=10 -fdump-tree-aprefetch-details" } 
>> +*/
>> +
>> +
>> +double data[16384];
>> +void prefetch_when_non_constant_step_is_invariant(int step, int n) {
>> +     int a;
>> +     int b;
>> +     for (a = 1; a < step; a++) {
>> +        for (b = 0; b < n; b += 2 * step) {
>> +
>> +          int i = 2*(b + a);
>> +          int j = 2*(b + a + step);
>> +
>> +
>> +          data[j]   = data[i];
>> +          data[j+1] = data[i+1];
>> +        }
>> +     }
>> +}
>> +
>> +/* { dg-final { scan-tree-dump "Issued prefetch" "aprefetch" } } */
>> +/* { dg-final { scan-assembler "prefetcht0" } } */
>
> This (and the case below) needs to be adjusted to only run on the appropriate hardware.  See for example gcc.dg/tree-ssa/prefetch-8.c for how to do this.
>
>> +/* { dg-final { cleanup-tree-dump "aprefetch" } } */
>> Index: gcc/testsuite/gcc.dg/pr53397-2.c 
>> ===================================================================
>> --- gcc/testsuite/gcc.dg/pr53397-2.c  (revision 0)
>> +++ gcc/testsuite/gcc.dg/pr53397-2.c  (revision 0)
>> @@ -0,0 +1,29 @@
>> +/* Not prefetching when the step is loop variant.  */
>> +
>> +/* { dg-do compile } */
>> +/* { dg-options "-O3 -fprefetch-loop-arrays 
>> +-fdump-tree-aprefetch-details --param min-insn-to-prefetch-ratio=3 
>> +--param simultaneous-prefetches=10 -fdump-tree-aprefetch-details" } 
>> +*/
>> +
>> +
>> +double data[16384];
>> +void donot_prefetch_when_non_constant_step_is_variant(int step, int
>> +n) {
>> +     int a;
>> +     int b;
>> +     for (a = 1; a < step; a++,step*=2) {
>> +        for (b = 0; b < n; b += 2 * step) {
>> +
>> +          int i = 2*(b + a);
>> +          int j = 2*(b + a + step);
>> +
>> +
>> +          data[j]   = data[i];
>> +          data[j+1] = data[i+1];
>> +        }
>> +     }
>> +}
>> +
>> +/* { dg-final { scan-tree-dump "Not prefetching" "aprefetch" } } */
>> +/* { dg-final { scan-tree-dump "loop variant step" "aprefetch" } } 
>> +*/
>> +
>> +/* { dg-final { cleanup-tree-dump "aprefetch" } } */
>> +
>> Index: gcc/tree-ssa-loop-prefetch.c
>> ===================================================================
>> --- gcc/tree-ssa-loop-prefetch.c      (revision 191642)
>> +++ gcc/tree-ssa-loop-prefetch.c      (working copy)
>> @@ -523,6 +523,7 @@
>>    tree base, step;
>>    HOST_WIDE_INT delta;
>>    struct mem_ref_group *agrp;
>> +  loop_p ploop;
>>
>>    if (get_base_address (ref) == NULL)
>>      return false;
>> @@ -537,10 +538,50 @@
>>    if (may_be_nonaddressable_p (base))
>>      return false;
>>
>> -  /* Limit non-constant step prefetching only to the innermost loops.
>> */
>> -  if (!cst_and_fits_in_hwi (step) && loop->inner != NULL)
>> -    return false;
>> +  /* Limit non-constant step prefetching only to the innermost loops and
>> +     only when the step is invariant in the entire loop nest. */  if 
>> + (!cst_and_fits_in_hwi (step))
>> +    {
>> +      if( loop->inner != NULL)
>> +        {
>> +          if (dump_file && (dump_flags & TDF_DETAILS))
>> +            {
>> +              fprintf (dump_file, "Reference %p:\n", (void *) ref);
>> +              fprintf (dump_file, "(base " );
>> +              print_generic_expr (dump_file, base, TDF_SLIM);
>> +              fprintf (dump_file, ", step ");
>> +              print_generic_expr (dump_file, step, TDF_TREE);
>> +              fprintf (dump_file, ")\n");
>
> No need to repeat this - all references are dumped when we gather them.
>
>> +              fprintf (dump_file, "Ignoring %p, non-constant step prefetching\
>> +                is limited to inner most loops \n",(void *) ref);
>> +            }
>> +            return false;
>> +         }
>> +      else
>> +        {
>> +          ploop = loop;
>>
>> +          while (loop_outer (ploop))
>> +            {
>> +              if (!expr_invariant_in_loop_p (ploop , step))
>> +                {
>> +                  if (dump_file && (dump_flags & TDF_DETAILS))
>> +                    {
>> +                      fprintf (dump_file, "Reference %p:\n", (void *) ref);
>> +                      fprintf (dump_file, "(base " );
>> +                      print_generic_expr (dump_file, base, TDF_SLIM);
>> +                      fprintf (dump_file, ", step ");
>> +                      print_generic_expr (dump_file, step, TDF_TREE);
>> +                      fprintf (dump_file, ")\n");
>
> Likewise.
>
>> +                      fprintf (dump_file, "Not prefetching, ignoring %p due to loop variant step\n",(void *) ref);
>> +                    }
>> +                      return false;
>> +                }
>> +                ploop = loop_outer (ploop);
>> +            }
>
> Instead of the loop checking expr_invariant_in_loop_p for each sub-loop you can query expr_invariant_in_loop_p once, for the outermost loop.
>
> Please add a new function to cfgloop.h to access this outermost loop:
>
> /* Returns the outermost loop of the loop nest that contains LOOP.  */
>
> static inline struct loop *
> loop_outermost (const struct loop *loop) {
>   unsigned n = VEC_length (loop_p, loop->superloops);
>
>   if (n <= 1)
>     return loop;
>
>   return VEC_index (loop_p, loop->superloops, 1); }
>
> Thanks,
> Richard.
>
>
Richard Guenther - Oct. 8, 2012, 10:16 a.m.
On Mon, Oct 8, 2012 at 12:01 PM, Kumar, Venkataramanan
<Venkataramanan.Kumar@amd.com> wrote:
> Hi Richard,
>
> I have incorporated your comments.
>
>> Yes, call dump_mem_ref then, instead of repeating parts of its body.
>
> Reference object  is not yet created at the place we check for invariance. It is still a tree expression.  I created a common function and used at all places to dump the "step", "base" and "delta" values of  memory reference being analyzed.
>
> Please find the modified patch attached.
>
> GCC regression "make check -k" passes with x86_64-unknown-linux-gnu.

I presume also bootstrapped.

Ok.

Thanks,
Richard.

> Regards,
> Venkat.
>
> -----Original Message-----
> From: Richard Guenther [mailto:richard.guenther@gmail.com]
> Sent: Thursday, October 04, 2012 6:26 PM
> To: Kumar, Venkataramanan
> Cc: Richard Guenther; gcc-patches@gcc.gnu.org
> Subject: Re: [Patch] Fix PR53397
>
> On Tue, Oct 2, 2012 at 6:40 PM, Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com> wrote:
>> Hi Richi,
>>
>> (Snip)
>>> + (!cst_and_fits_in_hwi (step))
>>> +    {
>>> +      if( loop->inner != NULL)
>>> +        {
>>> +          if (dump_file && (dump_flags & TDF_DETAILS))
>>> +            {
>>> +              fprintf (dump_file, "Reference %p:\n", (void *) ref);
>>> +              fprintf (dump_file, "(base " );
>>> +              print_generic_expr (dump_file, base, TDF_SLIM);
>>> +              fprintf (dump_file, ", step ");
>>> +              print_generic_expr (dump_file, step, TDF_TREE);
>>> +              fprintf (dump_file, ")\n");
>>
>> No need to repeat this - all references are dumped when we gather them.
>> (Snip)
>>
>> The dumping happens at "record_ref" which is called after these statements to record these references.
>>
>> When the step is invariant  we return from the function without recording the references.
>>
>>  so I thought of dumping the references here.
>>
>> Is there a cleaner way to dump the references at one place?
>
> Yes, call dump_mem_ref then, instead of repeating parts of its body.
>
> Richard.
>
>> Regards,
>> Venkat.
>>
>>
>>
>> -----Original Message-----
>> From: Richard Guenther [mailto:rguenther@suse.de]
>> Sent: Tuesday, October 02, 2012 5:42 PM
>> To: Kumar, Venkataramanan
>> Cc: gcc-patches@gcc.gnu.org
>> Subject: Re: [Patch] Fix PR53397
>>
>> On Mon, 1 Oct 2012, venkataramanan.kumar@amd.com wrote:
>>
>>> Hi,
>>>
>>> The below patch fixes the FFT/Scimark regression caused by useless
>>> prefetch generation.
>>>
>>> This fix tries to make prefetch less aggressive by prefetching arrays
>>> in the inner loop, when the step is invariant in the entire loop nest.
>>>
>>> GCC currently tries to prefetch invariant steps when they are in the
>>> inner loop. But does not check if the step is variant in outer loops.
>>>
>>> In the scimark FFT case, the trip count of the inner loop varies by a
>>> non constant step, which is invariant in the inner loop.
>>> But the step variable is varying in outer loop. This makes inner loop
>>> trip count small (at run time varies sometimes as small as 1
>>> iteration)
>>>
>>> Prefetching ahead x iteration when the inner loop trip count is
>>> smaller than x leads to useless prefetches.
>>>
>>> Flag used: -O3 -march=amdfam10
>>>
>>> Before
>>> **                                                              **
>>> ** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
>>> ** for details. (Results can be submitted to pozo@nist.gov)     **
>>> **                                                              **
>>> Using       2.00 seconds min time per kenel.
>>> Composite Score:          550.50
>>> FFT             Mflops:    38.66    (N=1024)
>>> SOR             Mflops:   617.61    (100 x 100)
>>> MonteCarlo:     Mflops:   173.74
>>> Sparse matmult  Mflops:   675.63    (N=1000, nz=5000)
>>> LU              Mflops:  1246.88    (M=100, N=100)
>>>
>>>
>>> After
>>> **                                                              **
>>> ** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
>>> ** for details. (Results can be submitted to pozo@nist.gov)     **
>>> **                                                              **
>>> Using       2.00 seconds min time per kenel.
>>> Composite Score:          639.20
>>> FFT             Mflops:   479.19    (N=1024)
>>> SOR             Mflops:   617.61    (100 x 100)
>>> MonteCarlo:     Mflops:   173.18
>>> Sparse matmult  Mflops:   679.13    (N=1000, nz=5000)
>>> LU              Mflops:  1246.88    (M=100, N=100)
>>>
>>> GCC regression "make check -k" passes with x86_64-unknown-linux-gnu
>>> New tests that PASS:
>>>
>>> gcc.dg/pr53397-1.c scan-assembler prefetcht0 gcc.dg/pr53397-1.c
>>> scan-tree-dump aprefetch "Issued prefetch"
>>> gcc.dg/pr53397-1.c (test for excess errors) gcc.dg/pr53397-2.c
>>> scan-tree-dump aprefetch "loop variant step"
>>> gcc.dg/pr53397-2.c scan-tree-dump aprefetch "Not prefetching"
>>> gcc.dg/pr53397-2.c (test for excess errors)
>>>
>>>
>>> Checked CPU2006 and polyhedron on latest AMD processor, no regressions noted.
>>>
>>> Ok to commit in trunk?
>>>
>>> regards,
>>> Venkat
>>>
>>> gcc/ChangeLog
>>> +2012-10-01  Venkataramanan Kumar  <venkataramanan.kumar@amd.com>
>>> +
>>> +   * tree-ssa-loop-prefetch.c (gather_memory_references_ref):$
>>> +   Perform non constant step prefetching in inner loop, only $
>>> +   when it is invariant in the entire loop nest.  $
>>> +   * testsuite/gcc.dg/pr53397-1.c: New test case $
>>> +   Checks we are prefecthing for loop invariant steps$
>>> +   * testsuite/gcc.dg/pr53397-2.c: New test case$
>>> +   Checks we are not prefecthing for loop variant steps
>>> +
>>>
>>>
>>> Index: gcc/testsuite/gcc.dg/pr53397-1.c
>>> ===================================================================
>>> --- gcc/testsuite/gcc.dg/pr53397-1.c  (revision 0)
>>> +++ gcc/testsuite/gcc.dg/pr53397-1.c  (revision 0)
>>> @@ -0,0 +1,28 @@
>>> +/* Prefetching when the step is loop invariant.  */
>>> +
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O3 -fprefetch-loop-arrays
>>> +-fdump-tree-aprefetch-details --param min-insn-to-prefetch-ratio=3
>>> +--param simultaneous-prefetches=10 -fdump-tree-aprefetch-details" }
>>> +*/
>>> +
>>> +
>>> +double data[16384];
>>> +void prefetch_when_non_constant_step_is_invariant(int step, int n) {
>>> +     int a;
>>> +     int b;
>>> +     for (a = 1; a < step; a++) {
>>> +        for (b = 0; b < n; b += 2 * step) {
>>> +
>>> +          int i = 2*(b + a);
>>> +          int j = 2*(b + a + step);
>>> +
>>> +
>>> +          data[j]   = data[i];
>>> +          data[j+1] = data[i+1];
>>> +        }
>>> +     }
>>> +}
>>> +
>>> +/* { dg-final { scan-tree-dump "Issued prefetch" "aprefetch" } } */
>>> +/* { dg-final { scan-assembler "prefetcht0" } } */
>>
>> This (and the case below) needs to be adjusted to only run on the appropriate hardware.  See for example gcc.dg/tree-ssa/prefetch-8.c for how to do this.
>>
>>> +/* { dg-final { cleanup-tree-dump "aprefetch" } } */
>>> Index: gcc/testsuite/gcc.dg/pr53397-2.c
>>> ===================================================================
>>> --- gcc/testsuite/gcc.dg/pr53397-2.c  (revision 0)
>>> +++ gcc/testsuite/gcc.dg/pr53397-2.c  (revision 0)
>>> @@ -0,0 +1,29 @@
>>> +/* Not prefetching when the step is loop variant.  */
>>> +
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O3 -fprefetch-loop-arrays
>>> +-fdump-tree-aprefetch-details --param min-insn-to-prefetch-ratio=3
>>> +--param simultaneous-prefetches=10 -fdump-tree-aprefetch-details" }
>>> +*/
>>> +
>>> +
>>> +double data[16384];
>>> +void donot_prefetch_when_non_constant_step_is_variant(int step, int
>>> +n) {
>>> +     int a;
>>> +     int b;
>>> +     for (a = 1; a < step; a++,step*=2) {
>>> +        for (b = 0; b < n; b += 2 * step) {
>>> +
>>> +          int i = 2*(b + a);
>>> +          int j = 2*(b + a + step);
>>> +
>>> +
>>> +          data[j]   = data[i];
>>> +          data[j+1] = data[i+1];
>>> +        }
>>> +     }
>>> +}
>>> +
>>> +/* { dg-final { scan-tree-dump "Not prefetching" "aprefetch" } } */
>>> +/* { dg-final { scan-tree-dump "loop variant step" "aprefetch" } }
>>> +*/
>>> +
>>> +/* { dg-final { cleanup-tree-dump "aprefetch" } } */
>>> +
>>> Index: gcc/tree-ssa-loop-prefetch.c
>>> ===================================================================
>>> --- gcc/tree-ssa-loop-prefetch.c      (revision 191642)
>>> +++ gcc/tree-ssa-loop-prefetch.c      (working copy)
>>> @@ -523,6 +523,7 @@
>>>    tree base, step;
>>>    HOST_WIDE_INT delta;
>>>    struct mem_ref_group *agrp;
>>> +  loop_p ploop;
>>>
>>>    if (get_base_address (ref) == NULL)
>>>      return false;
>>> @@ -537,10 +538,50 @@
>>>    if (may_be_nonaddressable_p (base))
>>>      return false;
>>>
>>> -  /* Limit non-constant step prefetching only to the innermost loops.
>>> */
>>> -  if (!cst_and_fits_in_hwi (step) && loop->inner != NULL)
>>> -    return false;
>>> +  /* Limit non-constant step prefetching only to the innermost loops and
>>> +     only when the step is invariant in the entire loop nest. */  if
>>> + (!cst_and_fits_in_hwi (step))
>>> +    {
>>> +      if( loop->inner != NULL)
>>> +        {
>>> +          if (dump_file && (dump_flags & TDF_DETAILS))
>>> +            {
>>> +              fprintf (dump_file, "Reference %p:\n", (void *) ref);
>>> +              fprintf (dump_file, "(base " );
>>> +              print_generic_expr (dump_file, base, TDF_SLIM);
>>> +              fprintf (dump_file, ", step ");
>>> +              print_generic_expr (dump_file, step, TDF_TREE);
>>> +              fprintf (dump_file, ")\n");
>>
>> No need to repeat this - all references are dumped when we gather them.
>>
>>> +              fprintf (dump_file, "Ignoring %p, non-constant step prefetching\
>>> +                is limited to inner most loops \n",(void *) ref);
>>> +            }
>>> +            return false;
>>> +         }
>>> +      else
>>> +        {
>>> +          ploop = loop;
>>>
>>> +          while (loop_outer (ploop))
>>> +            {
>>> +              if (!expr_invariant_in_loop_p (ploop , step))
>>> +                {
>>> +                  if (dump_file && (dump_flags & TDF_DETAILS))
>>> +                    {
>>> +                      fprintf (dump_file, "Reference %p:\n", (void *) ref);
>>> +                      fprintf (dump_file, "(base " );
>>> +                      print_generic_expr (dump_file, base, TDF_SLIM);
>>> +                      fprintf (dump_file, ", step ");
>>> +                      print_generic_expr (dump_file, step, TDF_TREE);
>>> +                      fprintf (dump_file, ")\n");
>>
>> Likewise.
>>
>>> +                      fprintf (dump_file, "Not prefetching, ignoring %p due to loop variant step\n",(void *) ref);
>>> +                    }
>>> +                      return false;
>>> +                }
>>> +                ploop = loop_outer (ploop);
>>> +            }
>>
>> Instead of the loop checking expr_invariant_in_loop_p for each sub-loop you can query expr_invariant_in_loop_p once, for the outermost loop.
>>
>> Please add a new function to cfgloop.h to access this outermost loop:
>>
>> /* Returns the outermost loop of the loop nest that contains LOOP.  */
>>
>> static inline struct loop *
>> loop_outermost (const struct loop *loop) {
>>   unsigned n = VEC_length (loop_p, loop->superloops);
>>
>>   if (n <= 1)
>>     return loop;
>>
>>   return VEC_index (loop_p, loop->superloops, 1); }
>>
>> Thanks,
>> Richard.
>>
>>
>
Jakub Jelinek - Oct. 10, 2012, 6:35 a.m.
Hi!

On Mon, Oct 08, 2012 at 10:01:53AM +0000, Kumar, Venkataramanan wrote:

Both the testcases fail on i686-linux unfortunately:
Excess errors:
/usr/src/gcc/gcc/testsuite/gcc.dg/pr53397-1.c:1:0: warning: -fprefetch-loop-arrays not supported for this target (try -march switches) [enabled by default]

Guess you need to add -msse2 to dg-options (for x86_64 it doesn't
change anything, but for i686 it does).  Furthermore, as the testcases
are limited to x86_64/i686, I wonder why the tests didn't go into
gcc/testsuite/gcc.target/i386/ directory instead.

> >> --- gcc/testsuite/gcc.dg/pr53397-1.c  (revision 0)
> >> +++ gcc/testsuite/gcc.dg/pr53397-1.c  (revision 0)
> >> @@ -0,0 +1,28 @@
> >> +/* Prefetching when the step is loop invariant.  */
> >> +
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-O3 -fprefetch-loop-arrays -fdump-tree-aprefetch-details --param min-insn-to-prefetch-ratio=3 --param simultaneous-prefetches=10 -fdump-tree-aprefetch-details" } 
> >> +*/
> >> +
> >> +
> >> +double data[16384];
> >> +void prefetch_when_non_constant_step_is_invariant(int step, int n) {
> >> +     int a;
> >> +     int b;
> >> +     for (a = 1; a < step; a++) {
> >> +        for (b = 0; b < n; b += 2 * step) {
> >> +
> >> +          int i = 2*(b + a);
> >> +          int j = 2*(b + a + step);
> >> +
> >> +
> >> +          data[j]   = data[i];
> >> +          data[j+1] = data[i+1];
> >> +        }
> >> +     }
> >> +}
> >> +
> >> +/* { dg-final { scan-tree-dump "Issued prefetch" "aprefetch" } } */
> >> +/* { dg-final { scan-assembler "prefetcht0" } } */
> >
> > This (and the case below) needs to be adjusted to only run on the appropriate hardware.  See for example gcc.dg/tree-ssa/prefetch-8.c for how to do this.
> >
> >> +/* { dg-final { cleanup-tree-dump "aprefetch" } } */
> >> --- gcc/testsuite/gcc.dg/pr53397-2.c  (revision 0)
> >> +++ gcc/testsuite/gcc.dg/pr53397-2.c  (revision 0)
> >> @@ -0,0 +1,29 @@
> >> +/* Not prefetching when the step is loop variant.  */
> >> +
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-O3 -fprefetch-loop-arrays -fdump-tree-aprefetch-details --param min-insn-to-prefetch-ratio=3 --param simultaneous-prefetches=10 -fdump-tree-aprefetch-details" } 
> >> +*/
> >> +
> >> +
> >> +double data[16384];
> >> +void donot_prefetch_when_non_constant_step_is_variant(int step, int
> >> +n) {
> >> +     int a;
> >> +     int b;
> >> +     for (a = 1; a < step; a++,step*=2) {
> >> +        for (b = 0; b < n; b += 2 * step) {
> >> +
> >> +          int i = 2*(b + a);
> >> +          int j = 2*(b + a + step);
> >> +
> >> +
> >> +          data[j]   = data[i];
> >> +          data[j+1] = data[i+1];
> >> +        }
> >> +     }
> >> +}
> >> +
> >> +/* { dg-final { scan-tree-dump "Not prefetching" "aprefetch" } } */
> >> +/* { dg-final { scan-tree-dump "loop variant step" "aprefetch" } } 
> >> +*/
> >> +
> >> +/* { dg-final { cleanup-tree-dump "aprefetch" } } */
> >> +

	Jakub

Patch

Index: gcc/testsuite/gcc.dg/pr53397-1.c
===================================================================
--- gcc/testsuite/gcc.dg/pr53397-1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/pr53397-1.c	(revision 0)
@@ -0,0 +1,28 @@ 
+/* Prefetching when the step is loop invariant.  */
+/* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-require-effective-target sse2 } */
+/* { dg-options "-O3 -fprefetch-loop-arrays -fdump-tree-aprefetch-details --param min-insn-to-prefetch-ratio=3 --param simultaneous-prefetches=10 -fdump-tree-aprefetch-details" } */
+
+
+double data[16384];
+void prefetch_when_non_constant_step_is_invariant(int step, int n)
+{
+     int a;
+     int b;
+     for (a = 1; a < step; a++) {
+        for (b = 0; b < n; b += 2 * step) {
+
+          int i = 2*(b + a);
+          int j = 2*(b + a + step);
+
+
+          data[j]   = data[i];
+          data[j+1] = data[i+1];
+        }
+     }
+}
+
+/* { dg-final { scan-tree-dump "Issued prefetch" "aprefetch" } } */
+/* { dg-final { scan-assembler "prefetcht0" } } */
+
+/* { dg-final { cleanup-tree-dump "aprefetch" } } */
Index: gcc/testsuite/gcc.dg/pr53397-2.c
===================================================================
--- gcc/testsuite/gcc.dg/pr53397-2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/pr53397-2.c	(revision 0)
@@ -0,0 +1,28 @@ 
+/* Not prefetching when the step is loop variant.  */
+/* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-require-effective-target sse2 } */
+/* { dg-options "-O3 -fprefetch-loop-arrays -fdump-tree-aprefetch-details --param min-insn-to-prefetch-ratio=3 --param simultaneous-prefetches=10 -fdump-tree-aprefetch-details" } */
+
+double data[16384];
+void donot_prefetch_when_non_constant_step_is_variant(int step, int n)
+{ 
+     int a;
+     int b;
+     for (a = 1; a < step; a++,step*=2) {
+        for (b = 0; b < n; b += 2 * step) {
+
+          int i = 2*(b + a);
+          int j = 2*(b + a + step);
+
+
+          data[j]   = data[i];
+          data[j+1] = data[i+1];
+        }
+     } 
+}
+
+/* { dg-final { scan-tree-dump "Not prefetching" "aprefetch" } } */
+/* { dg-final { scan-tree-dump "loop variant step" "aprefetch" } } */
+
+/* { dg-final { cleanup-tree-dump "aprefetch" } } */
+
Index: gcc/tree-ssa-loop-prefetch.c
===================================================================
--- gcc/tree-ssa-loop-prefetch.c	(revision 191642)
+++ gcc/tree-ssa-loop-prefetch.c	(working copy)
@@ -278,29 +278,37 @@ 
 				   nontemporal one.  */
 };
 
-/* Dumps information about reference REF to FILE.  */
-
+/* Dumps information about memory reference */
 static void
-dump_mem_ref (FILE *file, struct mem_ref *ref)
+dump_mem_details (FILE *file, tree base, tree step,
+	    HOST_WIDE_INT delta, bool write_p) 
 {
-  fprintf (file, "Reference %p:\n", (void *) ref);
-
-  fprintf (file, "  group %p (base ", (void *) ref->group);
-  print_generic_expr (file, ref->group->base, TDF_SLIM);
+  fprintf (file, "(base ");
+  print_generic_expr (file, base, TDF_SLIM);
   fprintf (file, ", step ");
-  if (cst_and_fits_in_hwi (ref->group->step))
-    fprintf (file, HOST_WIDE_INT_PRINT_DEC, int_cst_value (ref->group->step));
+  if (cst_and_fits_in_hwi (step))
+    fprintf (file, HOST_WIDE_INT_PRINT_DEC, int_cst_value (step));
   else
-    print_generic_expr (file, ref->group->step, TDF_TREE);
+    print_generic_expr (file, step, TDF_TREE);
   fprintf (file, ")\n");
-
   fprintf (file, "  delta ");
-  fprintf (file, HOST_WIDE_INT_PRINT_DEC, ref->delta);
+  fprintf (file, HOST_WIDE_INT_PRINT_DEC, delta);
   fprintf (file, "\n");
+  fprintf (file, "  %s\n", write_p ? "write" : "read");
+  fprintf (file, "\n");
+}
 
-  fprintf (file, "  %s\n", ref->write_p ? "write" : "read");
+/* Dumps information about reference REF to FILE.  */
 
-  fprintf (file, "\n");
+static void
+dump_mem_ref (FILE *file, struct mem_ref *ref)
+{
+  fprintf (file, "Reference %p:\n", (void *) ref);
+
+  fprintf (file, "  group %p ", (void *) ref->group);
+
+  dump_mem_details (file, ref->group->base, ref->group->step, ref->delta,
+                   ref->write_p);
 }
 
 /* Finds a group with BASE and STEP in GROUPS, or creates one if it does not
@@ -537,9 +545,44 @@ 
   if (may_be_nonaddressable_p (base))
     return false;
 
-  /* Limit non-constant step prefetching only to the innermost loops.  */
-  if (!cst_and_fits_in_hwi (step) && loop->inner != NULL)
-    return false;
+  /* Limit non-constant step prefetching only to the innermost loops and 
+     only when the step is loop invariant in the entire loop nest. */
+  if (!cst_and_fits_in_hwi (step))
+    {
+      if (loop->inner != NULL)
+        {
+          if (dump_file && (dump_flags & TDF_DETAILS))
+            {
+              fprintf (dump_file, "Memory expression %p\n",(void *) ref ); 
+              print_generic_expr (dump_file, ref, TDF_TREE); 
+              fprintf (dump_file,":");
+              dump_mem_details( dump_file, base, step, delta, write_p);              
+              fprintf (dump_file, 
+                       "Ignoring %p, non-constant step prefetching is "
+                       "limited to inner most loops \n", 
+                       (void *) ref);
+            }
+            return false;    
+         }
+      else
+        {
+          if (!expr_invariant_in_loop_p (loop_outermost (loop), step))
+          {
+            if (dump_file && (dump_flags & TDF_DETAILS))
+              {
+                fprintf (dump_file, "Memory expression %p\n",(void *) ref );
+                print_generic_expr (dump_file, ref, TDF_TREE);
+                fprintf (dump_file,":");
+                dump_mem_details(dump_file, base, step, delta, write_p);
+                fprintf (dump_file, 
+                         "Not prefetching, ignoring %p due to "
+                         "loop variant step\n",
+                         (void *) ref);
+              }
+              return false;                 
+            }
+        }
+    }
 
   /* Now we know that REF = &BASE + STEP * iter + DELTA, where DELTA and STEP
      are integer constants.  */
Index: gcc/cfgloop.h
===================================================================
--- gcc/cfgloop.h	(revision 191642)
+++ gcc/cfgloop.h	(working copy)
@@ -714,4 +714,18 @@ 
 extern void move_loop_invariants (void);
 extern bool finite_loop_p (struct loop *);
 
+/* Returns the outermost loop of the loop nest that contains LOOP.*/
+static inline struct loop *
+loop_outermost (struct loop *loop)
+{
+  
+  unsigned n = VEC_length (loop_p, loop->superloops);
+
+  if (n <= 1)
+    return loop;
+
+  return VEC_index (loop_p, loop->superloops, 1);
+}
+
+
 #endif /* GCC_CFGLOOP_H */