Message ID | 20240508075457.1243150-1-stefansf@linux.ibm.com |
---|---|
State | New |
Series | tree-ssa-loop-prefetch.cc: Honour -fno-unroll-loops |
On Wed, May 8, 2024 at 9:56 AM Stefan Schulze Frielinghaus <stefansf@linux.ibm.com> wrote:
>
> On s390 the following tests fail
>
> FAIL: gcc.dg/vect/pr109011-1.c -flto -ffat-lto-objects scan-tree-dump-times optimized " = .CLZ \\\\(vect" 1
> FAIL: gcc.dg/vect/pr109011-1.c -flto -ffat-lto-objects scan-tree-dump-times optimized " = .POPCOUNT \\\\(vect" 1
> FAIL: gcc.dg/vect/pr109011-1.c scan-tree-dump-times optimized " = .CLZ \\\\(vect" 1
> FAIL: gcc.dg/vect/pr109011-1.c scan-tree-dump-times optimized " = .POPCOUNT \\\\(vect" 1
> FAIL: gcc.dg/vect/pr109011-2.c -flto -ffat-lto-objects scan-tree-dump-times optimized " = .CTZ \\\\(vect" 2
> FAIL: gcc.dg/vect/pr109011-2.c -flto -ffat-lto-objects scan-tree-dump-times optimized " = .POPCOUNT \\\\(vect" 1
> FAIL: gcc.dg/vect/pr109011-2.c scan-tree-dump-times optimized " = .CTZ \\\\(vect" 2
> FAIL: gcc.dg/vect/pr109011-2.c scan-tree-dump-times optimized " = .POPCOUNT \\\\(vect" 1
> FAIL: gcc.dg/vect/pr109011-4.c -flto -ffat-lto-objects scan-tree-dump-times optimized " = .CTZ \\\\(vect" 2
> FAIL: gcc.dg/vect/pr109011-4.c -flto -ffat-lto-objects scan-tree-dump-times optimized " = .POPCOUNT \\\\(vect" 1
> FAIL: gcc.dg/vect/pr109011-4.c scan-tree-dump-times optimized " = .CTZ \\\\(vect" 2
> FAIL: gcc.dg/vect/pr109011-4.c scan-tree-dump-times optimized " = .POPCOUNT \\\\(vect" 1
>
> because aprefetch unrolls loops even if -fno-unroll-loops is used.
> Accordingly, the scan patterns match more than one time.
>
> Could also be fixed by using -fno-prefetch-loop-arrays for the tests.
> Though, I tend to prefer if aprefetch honours -fno-unroll-loops. Any
> preferences?
>
> Bootstrapped and regtested on x86_64 and s390. Ok for mainline?

OK.

Richard.

> gcc/ChangeLog:
>
>         * tree-ssa-loop-prefetch.cc (determine_unroll_factor): Honour
>         -fno-unroll-loops.
> ---
>  gcc/tree-ssa-loop-prefetch.cc | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/gcc/tree-ssa-loop-prefetch.cc b/gcc/tree-ssa-loop-prefetch.cc
> index 70073cc4fe4..bb5d5dec779 100644
> --- a/gcc/tree-ssa-loop-prefetch.cc
> +++ b/gcc/tree-ssa-loop-prefetch.cc
> @@ -1401,6 +1401,10 @@ determine_unroll_factor (class loop *loop, struct mem_ref_group *refs,
>    struct mem_ref_group *agp;
>    struct mem_ref *ref;
>
> +  /* Bail out early in case we must not unroll loops.  */
> +  if (!flag_unroll_loops)
> +    return 1;
> +
>    /* First check whether the loop is not too large to unroll.  We ignore
>       PARAM_MAX_UNROLL_TIMES, because for small loops, it prevented us
>       from unrolling them enough to make exactly one cache line covered by each
> --
> 2.44.0
>
diff --git a/gcc/tree-ssa-loop-prefetch.cc b/gcc/tree-ssa-loop-prefetch.cc
index 70073cc4fe4..bb5d5dec779 100644
--- a/gcc/tree-ssa-loop-prefetch.cc
+++ b/gcc/tree-ssa-loop-prefetch.cc
@@ -1401,6 +1401,10 @@ determine_unroll_factor (class loop *loop, struct mem_ref_group *refs,
   struct mem_ref_group *agp;
   struct mem_ref *ref;
 
+  /* Bail out early in case we must not unroll loops.  */
+  if (!flag_unroll_loops)
+    return 1;
+
   /* First check whether the loop is not too large to unroll.  We ignore
      PARAM_MAX_UNROLL_TIMES, because for small loops, it prevented us
      from unrolling them enough to make exactly one cache line covered by each
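
For readers who want to see the effect of the new guard in isolation, here is a minimal standalone sketch. It is not GCC code: `Options`, `model_unroll_factor`, and the size-cap arithmetic are hypothetical stand-ins invented for illustration. Only the early `return 1` when unrolling is disabled mirrors the patch.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the compiler's option state.  In GCC the
// patch tests the global flag_unroll_loops instead.
struct Options {
  bool unroll_loops;
};

// Simplified, assumed model of an unroll-factor decision.  The size cap
// below is an illustration only and does not reproduce the real logic of
// determine_unroll_factor.
unsigned model_unroll_factor(const Options &opts, unsigned loop_insns,
                             unsigned max_unrolled_insns,
                             unsigned wanted_factor) {
  // Mirrors the patch: bail out early when loops must not be unrolled.
  if (!opts.unroll_loops)
    return 1;

  // Assumed cap: do not let the unrolled body exceed max_unrolled_insns.
  if (loop_insns == 0)
    return 1;
  unsigned size_limit = max_unrolled_insns / loop_insns;
  if (size_limit <= 1)
    return 1;

  // Never return less than 1 (a factor of 1 means "do not unroll").
  return std::max(1u, std::min(wanted_factor, size_limit));
}

// Small self-check exercising the three paths above.
void self_check() {
  assert(model_unroll_factor(Options{false}, 10, 200, 4) == 1);  // guard
  assert(model_unroll_factor(Options{true}, 10, 200, 4) == 4);   // allowed
  assert(model_unroll_factor(Options{true}, 150, 200, 4) == 1);  // too large
}
```

With the guard in place, the model returns a factor of 1 whenever unrolling is disabled, regardless of loop size or the factor the prefetcher would otherwise want. That is the behaviour the failing scan patterns depend on.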