
Fix PR88440, enable mem* detection at -O[2s]

Message ID alpine.LSU.2.20.1905221138520.10704@zhemvz.fhfr.qr
State New
Series Fix PR88440, enable mem* detection at -O[2s]

Commit Message

Richard Biener May 22, 2019, 9:40 a.m. UTC
This enables -ftree-loop-distribute-patterns at -O[2s] and also
arranges for cold loops to still be processed, but only for
pattern recognition, to save code size.
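"mem* detection" refers to loop distribution recognizing loops whose only effect is to fill or copy memory and replacing them with library calls. The sketch below illustrates the idea only; the function names are made up for this example, and whether GCC actually rewrites a given loop depends on the details of the pass and the target. It pairs each loop with the hand-written call the pass would conceptually emit and checks that both leave memory in the same state.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A zeroing loop: with -ftree-loop-distribute-patterns in effect, GCC's
   loop distribution pass may replace this loop with a memset call.  */
void clear_loop (int *a, size_t n)
{
  for (size_t i = 0; i < n; i++)
    a[i] = 0;
}

/* A copy loop: a candidate for replacement with memcpy (restrict tells
   the compiler the two arrays do not overlap).  */
void copy_loop (int *restrict dst, const int *restrict src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = src[i];
}

/* Hand-written equivalents of the library calls the pass would emit.  */
void clear_call (int *a, size_t n)
{
  memset (a, 0, n * sizeof *a);
}

void copy_call (int *restrict dst, const int *restrict src, size_t n)
{
  memcpy (dst, src, n * sizeof *dst);
}

/* Check that each loop and its library-call form produce the same memory.  */
void check_equivalence (void)
{
  int a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
  int b[8] = {1, 2, 3, 4, 5, 6, 7, 8};
  clear_loop (a, 8);
  clear_call (b, 8);
  assert (memcmp (a, b, sizeof a) == 0);

  int src[5] = {9, 8, 7, 6, 5};
  int d1[5] = {0}, d2[5] = {0};
  copy_loop (d1, src, 5);
  copy_call (d2, src, 5);
  assert (memcmp (d1, d2, sizeof d1) == 0);
}
```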

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Martin has done extensive compile-time testing on SPEC,
identifying only a single regression, which I'll look into.

Richard.

2019-05-22  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/88440
	* opts.c (default_options_table): Enable -ftree-loop-distribute-patterns
	at -O[2s]+.
	* tree-loop-distribution.c (generate_memset_builtin): Fold the
	generated call.
	(generate_memcpy_builtin): Likewise.
	(distribute_loop): Pass in whether to only distribute patterns.
	(prepare_perfect_loop_nest): Also allow size optimization.
	(pass_loop_distribution::execute): When optimizing a loop
	nest for size allow pattern replacement.

	* gcc.dg/tree-ssa/ldist-37.c: New testcase.
	* gcc.dg/tree-ssa/ldist-38.c: Likewise.
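The distribute_loop and pass_loop_distribution::execute entries boil down to a few boolean conditions. Below is a standalone C model of that gating as it appears in the committed diff. The struct and function names are hypothetical stand-ins, not GCC APIs, and the real pass checks further conditions (for example, that the iteration count is known) that are omitted here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the GCC state the patch consults.  */
struct ldist_inputs
{
  bool flag_distribution;        /* -ftree-loop-distribution (-O3+) */
  bool flag_distribute_patterns; /* -ftree-loop-distribute-patterns (-O2+, -Os) */
  bool single_exit;              /* loop has exactly one exit edge */
  bool hot;                      /* optimize_loop_for_speed_p (loop) */
};

/* Skip test in pass_loop_distribution::execute: multi-exit loops are
   always skipped; cold loops are skipped only when pattern detection
   is disabled.  */
bool loop_is_considered (struct ldist_inputs in)
{
  if (!in.single_exit)
    return false;
  return in.flag_distribute_patterns || in.hot;
}

/* The only_patterns_p argument passed to distribute_loop.  */
bool only_patterns (struct ldist_inputs in)
{
  return !in.hot || !in.flag_distribution;
}

/* Early exit in distribute_loop: in patterns-only mode, a loop with no
   memset/memcpy-like partition is left untouched.  */
bool bail_out_early (bool only_patterns_p, bool any_builtin)
{
  return only_patterns_p && !any_builtin;
}

/* A few rows of the resulting decision table.  */
void check_gating_table (void)
{
  struct ldist_inputs o2_hot = { false, true, true, true };
  struct ldist_inputs o3_hot = { true, true, true, true };
  struct ldist_inputs o3_cold = { true, true, true, false };

  assert (loop_is_considered (o2_hot) && only_patterns (o2_hot));
  assert (loop_is_considered (o3_hot) && !only_patterns (o3_hot));
  assert (loop_is_considered (o3_cold) && only_patterns (o3_cold));
}
```

In words: at -O2 and -Os every single-exit loop is a candidate, but only for library-call replacement; splitting a loop into several loops still requires -ftree-loop-distribution and a hot loop. The diff additionally rejects, in patterns-only mode, any result that would leave more than one loop body, which this sketch does not model.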

Comments

Richard Biener May 23, 2019, 11:32 a.m. UTC | #1
On Wed, 22 May 2019, Richard Biener wrote:

> 
> This enables -ftree-loop-distribute-patterns at -O[2s] and also
> arranges cold loops to be still processed but for pattern
> recognition to save code-size.
> 
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> 
> Martin has done extensive compile-time testing on SPEC
> identifying only a single regression I'll have a look into.

The compile-time regression is caused by the complexity
heuristic in LRA no longer choosing the "simple" algorithms,
with the LIVE dataflow problem in particular being awfully slow.

Unsurprisingly, testing has also revealed lots of testsuite
fallout, which I dealt with in the patch as committed below.
Sorry for any further fallout on other targets (which I do
expect).

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2019-05-23  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/88440
	* opts.c (default_options_table): Enable -ftree-loop-distribute-patterns
	at -O[2s]+.
	* tree-loop-distribution.c (generate_memset_builtin): Fold the
	generated call.
	(generate_memcpy_builtin): Likewise.
	(distribute_loop): Pass in whether to only distribute patterns.
	(prepare_perfect_loop_nest): Also allow size optimization.
	(pass_loop_distribution::execute): When optimizing a loop
	nest for size allow pattern replacement.

	* gcc.dg/tree-ssa/ldist-37.c: New testcase.
	* gcc.dg/tree-ssa/ldist-38.c: Likewise.
	* gcc.dg/vect/vect.exp: Add -fno-tree-loop-distribute-patterns.
	* gcc.dg/tree-ssa/ldist-37.c: Adjust.
	* gcc.dg/tree-ssa/ldist-38.c: Likewise.
	* g++.dg/tree-ssa/pr78847.C: Likewise.
	* gcc.dg/autopar/pr39500-1.c: Likewise.
	* gcc.dg/autopar/reduc-1char.c: Likewise.
	* gcc.dg/autopar/reduc-7.c: Likewise.
	* gcc.dg/tree-ssa/ivopts-lt-2.c: Likewise.
	* gcc.dg/tree-ssa/ivopts-lt.c: Likewise.
	* gcc.dg/tree-ssa/predcom-dse-1.c: Likewise.
	* gcc.dg/tree-ssa/predcom-dse-2.c: Likewise.
	* gcc.dg/tree-ssa/predcom-dse-3.c: Likewise.
	* gcc.dg/tree-ssa/predcom-dse-4.c: Likewise.
	* gcc.dg/tree-ssa/prefetch-7.c: Likewise.
	* gcc.dg/tree-ssa/prefetch-8.c: Likewise.
	* gcc.dg/tree-ssa/prefetch-9.c: Likewise.
	* gcc.dg/tree-ssa/scev-11.c: Likewise.
	* gcc.dg/vect/costmodel/i386/costmodel-vect-31.c: Likewise.
	* gcc.dg/vect/costmodel/i386/costmodel-vect-33.c: Likewise.
	* gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c: Likewise.
	* gcc.dg/vect/costmodel/x86_64/costmodel-vect-33.c: Likewise.
	* gcc.target/i386/pr30970.c: Likewise.
	* gcc.target/i386/vect-double-1.c: Likewise.
	* gcc.target/i386/vect-double-2.c: Likewise.
	* gcc.dg/tree-ssa/gen-vect-2.c: Likewise.
	* gcc.dg/tree-ssa/gen-vect-26.c: Likewise.
	* gcc.dg/tree-ssa/gen-vect-28.c: Likewise.
	* gcc.dg/tree-ssa/gen-vect-32.c: Likewise.
	* gfortran.dg/vect/vect-5.f90: Likewise.
	* gfortran.dg/vect/vect-8.f90: Likewise.

Index: gcc/opts.c
===================================================================
--- gcc/opts.c	(revision 271513)
+++ gcc/opts.c	(working copy)
@@ -550,7 +550,7 @@ static const struct default_options defa
     { OPT_LEVELS_3_PLUS, OPT_fpredictive_commoning, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_fsplit_loops, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_fsplit_paths, NULL, 1 },
-    { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
+    { OPT_LEVELS_2_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribution, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_ftree_loop_vectorize, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_ftree_partial_pre, NULL, 1 },
Index: gcc/testsuite/g++.dg/tree-ssa/pr78847.C
===================================================================
--- gcc/testsuite/g++.dg/tree-ssa/pr78847.C	(revision 271513)
+++ gcc/testsuite/g++.dg/tree-ssa/pr78847.C	(working copy)
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target c++14 } */
-/* { dg-options "-O3 -fdump-tree-ldist" } */
+/* { dg-options "-O3 -fdump-tree-ldist-optimized" } */
 
 #include <stddef.h>
 #include <cstring>
@@ -23,4 +23,4 @@ void testWithLoopValue(const Foo foo, si
       buf_[ptr++] = c;
 }
 
-/* { dg-final { scan-tree-dump "memcpy\[^\n\r\]*, 9\\);" "ldist" } } */
+/* { dg-final { scan-tree-dump "split to 0 loops and 1 library calls" "ldist" } } */
Index: gcc/testsuite/gcc.dg/autopar/pr39500-1.c
===================================================================
--- gcc/testsuite/gcc.dg/autopar/pr39500-1.c	(revision 271513)
+++ gcc/testsuite/gcc.dg/autopar/pr39500-1.c	(working copy)
@@ -1,7 +1,7 @@
 /* pr39500: autopar fails to parallel */
 /* origin: nemokingdom@gmail.com(LiFeng) */
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops2-details" } */
+/* { dg-options "-O2 -fno-tree-loop-distribute-patterns -ftree-parallelize-loops=4 -fdump-tree-parloops2-details" } */
 
 void abort (void);
 
Index: gcc/testsuite/gcc.dg/autopar/reduc-1char.c
===================================================================
--- gcc/testsuite/gcc.dg/autopar/reduc-1char.c	(revision 271513)
+++ gcc/testsuite/gcc.dg/autopar/reduc-1char.c	(working copy)
@@ -61,5 +61,5 @@ int main (void)
 
 
 /* { dg-final { scan-tree-dump-times "Detected reduction" 3 "parloops2" } } */
-/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 4 "parloops2" } } */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops2" } } */
 
Index: gcc/testsuite/gcc.dg/autopar/reduc-7.c
===================================================================
--- gcc/testsuite/gcc.dg/autopar/reduc-7.c	(revision 271513)
+++ gcc/testsuite/gcc.dg/autopar/reduc-7.c	(working copy)
@@ -85,5 +85,5 @@ int main (void)
 
 
 /* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops2" } } */
-/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops2" } } */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloops2" } } */
 
Index: gcc/testsuite/gcc.dg/tree-ssa/gen-vect-2.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/gen-vect-2.c	(revision 271513)
+++ gcc/testsuite/gcc.dg/tree-ssa/gen-vect-2.c	(working copy)
@@ -1,6 +1,6 @@
 /* { dg-do run { target vect_cmdline_needed } } */
-/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details -fvect-cost-model=dynamic" } */
-/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details -fvect-cost-model=dynamic -mno-sse" { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-O2 -fno-tree-loop-distribute-patterns -ftree-vectorize -fdump-tree-vect-details -fvect-cost-model=dynamic" } */
+/* { dg-additional-options "-mno-sse" { target { i?86-*-* x86_64-*-* } } } */
 
 #include <stdlib.h>
 
Index: gcc/testsuite/gcc.dg/tree-ssa/gen-vect-26.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/gen-vect-26.c	(revision 271513)
+++ gcc/testsuite/gcc.dg/tree-ssa/gen-vect-26.c	(working copy)
@@ -1,6 +1,6 @@
 /* { dg-do run { target vect_cmdline_needed } } */
-/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details -fvect-cost-model=dynamic" } */
-/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details -fvect-cost-model=dynamic -mno-sse" { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-O2 -fno-tree-loop-distribute-patterns -ftree-vectorize -fdump-tree-vect-details -fvect-cost-model=dynamic" } */
+/* { dg-additional-options "-mno-sse" { target { i?86-*-* x86_64-*-* } } } */
 
 #include <stdlib.h>
 
Index: gcc/testsuite/gcc.dg/tree-ssa/gen-vect-28.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/gen-vect-28.c	(revision 271513)
+++ gcc/testsuite/gcc.dg/tree-ssa/gen-vect-28.c	(working copy)
@@ -1,6 +1,6 @@
 /* { dg-do run { target vect_cmdline_needed } } */
-/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details -fvect-cost-model=dynamic" } */
-/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details -fvect-cost-model=dynamic -mno-sse" { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-O2 -fno-tree-loop-distribute-patterns -ftree-vectorize -fdump-tree-vect-details -fvect-cost-model=dynamic" } */
+/* { dg-additional-options "-mno-sse" { target { i?86-*-* x86_64-*-* } } } */
 
 #include <stdlib.h>
 
Index: gcc/testsuite/gcc.dg/tree-ssa/gen-vect-32.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/gen-vect-32.c	(revision 271513)
+++ gcc/testsuite/gcc.dg/tree-ssa/gen-vect-32.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do run { target vect_cmdline_needed } } */
-/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details -fno-vect-cost-model" } */
+/* { dg-options "-O2 -fno-tree-loop-distribute-patterns -ftree-vectorize -fdump-tree-vect-details -fno-vect-cost-model" } */
 /* { dg-additional-options "-mno-sse" { target { i?86-*-* x86_64-*-* } } } */
 
 #include <stdlib.h>
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt-2.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt-2.c	(revision 271513)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt-2.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-ivopts" } */
+/* { dg-options "-O2 -fno-tree-loop-distribute-patterns -fdump-tree-ivopts" } */
 /* { dg-skip-if "PR68644" { hppa*-*-* powerpc*-*-* } } */
 
 void
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c	(revision 271513)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-ivopts" } */
+/* { dg-options "-O2 -fno-tree-loop-distribute-patterns -fdump-tree-ivopts" } */
 /* { dg-require-effective-target stdint_types } */
 
 #include "stdint.h"
Index: gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-1.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-1.c	(revision 271513)
+++ gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-1.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-O2 -fno-inline -fpredictive-commoning -fdump-tree-pcom-details" } */
+/* { dg-options "-O2 -fno-inline -fno-tree-loop-distribute-patterns -fpredictive-commoning -fdump-tree-pcom-details" } */
 
 int arr[105] = {2, 3, 5, 7, 11};
 int result0[10] = {2, 3, 5, 7, 11};
Index: gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-2.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-2.c	(revision 271513)
+++ gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-2.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-O2 -fno-inline -fpredictive-commoning -fdump-tree-pcom-details" } */
+/* { dg-options "-O2 -fno-inline -fno-tree-loop-distribute-patterns -fpredictive-commoning -fdump-tree-pcom-details" } */
 
 int arr[105] = {2, 3, 5, 7, 11};
 int result0[10] = {2, 3, 5, 7, 11};
Index: gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-3.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-3.c	(revision 271513)
+++ gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-3.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-O2 -fno-inline -fpredictive-commoning -fdump-tree-pcom-details" } */
+/* { dg-options "-O2 -fno-inline -fno-tree-loop-distribute-patterns -fpredictive-commoning -fdump-tree-pcom-details" } */
 
 int arr1[105] = {2, 3, 5, 7, 11, 13, 0};
 int arr2[105] = {2, 3, 5, 7, 11, 13, 0};
Index: gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-4.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-4.c	(revision 271513)
+++ gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-4.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-O2 -fno-inline -fpredictive-commoning -fdump-tree-pcom-details" } */
+/* { dg-options "-O2 -fno-inline -fno-tree-loop-distribute-patterns -fpredictive-commoning -fdump-tree-pcom-details" } */
 
 int arr[105] = {2, 3, 5, 7, 11};
 int result0[10] = {2, 3, 5, 7, 11};
Index: gcc/testsuite/gcc.dg/tree-ssa/prefetch-7.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/prefetch-7.c	(revision 271513)
+++ gcc/testsuite/gcc.dg/tree-ssa/prefetch-7.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
-/* { dg-options "-O2 -fprefetch-loop-arrays -march=amdfam10 --param simultaneous-prefetches=100 -fdump-tree-aprefetch-details -fdump-tree-optimized" } */
+/* { dg-options "-O2 -fno-tree-loop-distribute-patterns -fprefetch-loop-arrays -march=amdfam10 --param simultaneous-prefetches=100 -fdump-tree-aprefetch-details -fdump-tree-optimized" } */
 
 #define K 1000000
 int a[K];
Index: gcc/testsuite/gcc.dg/tree-ssa/prefetch-8.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/prefetch-8.c	(revision 271513)
+++ gcc/testsuite/gcc.dg/tree-ssa/prefetch-8.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
-/* { dg-options "-O2 -fprefetch-loop-arrays -march=amdfam10 --param simultaneous-prefetches=100 -fdump-tree-aprefetch-details -fdump-tree-optimized" } */
+/* { dg-options "-O2 -fno-tree-loop-distribute-patterns -fprefetch-loop-arrays -march=amdfam10 --param simultaneous-prefetches=100 -fdump-tree-aprefetch-details -fdump-tree-optimized" } */
 
 #define K 1000000
 int a[K];
Index: gcc/testsuite/gcc.dg/tree-ssa/prefetch-9.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/prefetch-9.c	(revision 271513)
+++ gcc/testsuite/gcc.dg/tree-ssa/prefetch-9.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
-/* { dg-options "-O2 -fprefetch-loop-arrays -march=amdfam10 --param simultaneous-prefetches=100 -fdump-tree-aprefetch-details -fdump-tree-optimized" } */
+/* { dg-options "-O2 -fno-tree-loop-distribute-patterns -fprefetch-loop-arrays -march=amdfam10 --param simultaneous-prefetches=100 -fdump-tree-aprefetch-details -fdump-tree-optimized" } */
 
 #define K 1000000
 int a[K], b[K];
Index: gcc/testsuite/gcc.dg/tree-ssa/scev-11.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/scev-11.c	(revision 271513)
+++ gcc/testsuite/gcc.dg/tree-ssa/scev-11.c	(working copy)
@@ -15,7 +15,7 @@ foo (int n)
     {
       unsigned char uc = (unsigned char)i;
       a[i] = i;
-      b[uc] = 0;
+      b[uc] = 1;
     }
 
   bar (a);
Index: gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-31.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-31.c	(revision 271513)
+++ gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-31.c	(working copy)
@@ -1,4 +1,5 @@
 /* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-fno-tree-loop-distribute-patterns" } */
 
 #include <stdarg.h>
 #include "../../tree-vect.h"
Index: gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-33.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-33.c	(revision 271513)
+++ gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-33.c	(working copy)
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-fno-tree-loop-distribute-patterns" } */
 
 #include <stdarg.h>
 #include "../../tree-vect.h"
Index: gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c	(revision 271513)
+++ gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c	(working copy)
@@ -1,4 +1,5 @@
 /* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-fno-tree-loop-distribute-patterns" } */
 
 #include <stdarg.h>
 #include "../../tree-vect.h"
Index: gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-33.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-33.c	(revision 271513)
+++ gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-33.c	(working copy)
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-fno-tree-loop-distribute-patterns" } */
 
 #include <stdarg.h>
 #include "../../tree-vect.h"
Index: gcc/testsuite/gcc.dg/vect/vect.exp
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect.exp	(revision 271513)
+++ gcc/testsuite/gcc.dg/vect/vect.exp	(working copy)
@@ -45,7 +45,7 @@ if ![check_vect_support_and_set_flags] {
 }
 
 # These flags are used for all targets.
-lappend DEFAULT_VECTCFLAGS "-ftree-vectorize" "-fno-vect-cost-model" "-fno-common"
+lappend DEFAULT_VECTCFLAGS "-ftree-vectorize" "-fno-tree-loop-distribute-patterns" "-fno-vect-cost-model" "-fno-common"
 
 # Initialize `dg'.
 dg-init
Index: gcc/testsuite/gcc.target/i386/pr30970.c
===================================================================
--- gcc/testsuite/gcc.target/i386/pr30970.c	(revision 271513)
+++ gcc/testsuite/gcc.target/i386/pr30970.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile }
-/* { dg-options "-msse2 -O2 -ftree-vectorize -mtune=generic" } */
+/* { dg-options "-msse2 -O2 -fno-tree-loop-distribute-patterns -ftree-vectorize -mtune=generic" } */
 
 #define N 256
 int b[N];
Index: gcc/testsuite/gcc.target/i386/vect-double-1.c
===================================================================
--- gcc/testsuite/gcc.target/i386/vect-double-1.c	(revision 271513)
+++ gcc/testsuite/gcc.target/i386/vect-double-1.c	(working copy)
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-skip-if "" { *-*-* } { "-march=*" } { "-march=core2" } } */
-/* { dg-options "-O2 -ftree-vectorize -mfpmath=sse -march=core2 -fdump-tree-vect-stats" } */
+/* { dg-options "-O2 -ftree-vectorize -fno-tree-loop-distribute-patterns -mfpmath=sse -march=core2 -fdump-tree-vect-stats" } */
 /* { dg-add-options bind_pic_locally } */
 
 extern void abort (void);
Index: gcc/testsuite/gcc.target/i386/vect-double-2.c
===================================================================
--- gcc/testsuite/gcc.target/i386/vect-double-2.c	(revision 271513)
+++ gcc/testsuite/gcc.target/i386/vect-double-2.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize -mfpmath=sse -msse2 -mtune=atom -fdump-tree-vect-stats" } */
+/* { dg-options "-O2 -ftree-vectorize -fno-tree-loop-distribute-patterns -mfpmath=sse -msse2 -mtune=atom -fdump-tree-vect-stats" } */
 
 extern void abort (void);
 
Index: gcc/testsuite/gfortran.dg/vect/vect-5.f90
===================================================================
--- gcc/testsuite/gfortran.dg/vect/vect-5.f90	(revision 271513)
+++ gcc/testsuite/gfortran.dg/vect/vect-5.f90	(working copy)
@@ -1,5 +1,5 @@
 ! { dg-require-effective-target vect_int }
-! { dg-additional-options "--param vect-max-peeling-for-alignment=0" }
+! { dg-additional-options "-fno-tree-loop-distribute-patterns --param vect-max-peeling-for-alignment=0" }
 
         Subroutine foo (N, M)
         Integer N
Index: gcc/testsuite/gfortran.dg/vect/vect-8.f90
===================================================================
--- gcc/testsuite/gfortran.dg/vect/vect-8.f90	(revision 271513)
+++ gcc/testsuite/gfortran.dg/vect/vect-8.f90	(working copy)
@@ -1,6 +1,6 @@
 ! { dg-do compile }
 ! { dg-require-effective-target vect_double }
-! { dg-additional-options "-finline-matmul-limit=0" }
+! { dg-additional-options "-fno-tree-loop-distribute-patterns -finline-matmul-limit=0" }
 
 module lfk_prec
  integer, parameter :: dp=kind(1.d0)
Index: gcc/tree-loop-distribution.c
===================================================================
--- gcc/tree-loop-distribution.c	(revision 271513)
+++ gcc/tree-loop-distribution.c	(working copy)
@@ -115,6 +115,7 @@ along with GCC; see the file COPYING3.
 #include "params.h"
 #include "tree-vectorizer.h"
 #include "tree-eh.h"
+#include "gimple-fold.h"
 
 
 #define MAX_DATAREFS_NUM \
@@ -1028,6 +1029,7 @@ generate_memset_builtin (struct loop *lo
   fn = build_fold_addr_expr (builtin_decl_implicit (BUILT_IN_MEMSET));
   fn_call = gimple_build_call (fn, 3, mem, val, nb_bytes);
   gsi_insert_after (&gsi, fn_call, GSI_CONTINUE_LINKING);
+  fold_stmt (&gsi);
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
@@ -1071,6 +1073,7 @@ generate_memcpy_builtin (struct loop *lo
   fn = build_fold_addr_expr (builtin_decl_implicit (kind));
   fn_call = gimple_build_call (fn, 3, dest, src, nb_bytes);
   gsi_insert_after (&gsi, fn_call, GSI_CONTINUE_LINKING);
+  fold_stmt (&gsi);
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
@@ -2769,7 +2772,8 @@ finalize_partitions (struct loop *loop,
 
 static int
 distribute_loop (struct loop *loop, vec<gimple *> stmts,
-		 control_dependences *cd, int *nb_calls, bool *destroy_p)
+		 control_dependences *cd, int *nb_calls, bool *destroy_p,
+		 bool only_patterns_p)
 {
   ddrs_table = new hash_table<ddr_hasher> (389);
   struct graph *rdg;
@@ -2843,7 +2847,7 @@ distribute_loop (struct loop *loop, vec<
 
   /* If we are only distributing patterns but did not detect any,
      simply bail out.  */
-  if (!flag_tree_loop_distribution
+  if (only_patterns_p
       && !any_builtin)
     {
       nbp = 0;
@@ -2855,7 +2859,7 @@ distribute_loop (struct loop *loop, vec<
      a loop into pieces, separated by builtin calls.  That is, we
      only want no or a single loop body remaining.  */
   struct partition *into;
-  if (!flag_tree_loop_distribution)
+  if (only_patterns_p)
     {
       for (i = 0; partitions.iterate (i, &into); ++i)
 	if (!partition_builtin_p (into))
@@ -3085,7 +3089,6 @@ prepare_perfect_loop_nest (struct loop *
 	 && loop_outer (outer)
 	 && outer->inner == loop && loop->next == NULL
 	 && single_exit (outer)
-	 && optimize_loop_for_speed_p (outer)
 	 && !chrec_contains_symbols_defined_in_loop (niters, outer->num)
 	 && (niters = number_of_latch_executions (outer)) != NULL_TREE
 	 && niters != chrec_dont_know)
@@ -3139,9 +3142,11 @@ pass_loop_distribution::execute (functio
      walking to innermost loops.  */
   FOR_EACH_LOOP (loop, LI_ONLY_INNERMOST)
     {
-      /* Don't distribute multiple exit edges loop, or cold loop.  */
+      /* Don't distribute multiple exit edges loop, or cold loop when
+         not doing pattern detection.  */
       if (!single_exit (loop)
-	  || !optimize_loop_for_speed_p (loop))
+	  || (!flag_tree_loop_distribute_patterns
+	      && !optimize_loop_for_speed_p (loop)))
 	continue;
 
       /* Don't distribute loop if niters is unknown.  */
@@ -3169,9 +3174,10 @@ pass_loop_distribution::execute (functio
 
 	  bool destroy_p;
 	  int nb_generated_loops, nb_generated_calls;
-	  nb_generated_loops = distribute_loop (loop, work_list, cd,
-						&nb_generated_calls,
-						&destroy_p);
+	  nb_generated_loops
+	    = distribute_loop (loop, work_list, cd, &nb_generated_calls,
+			       &destroy_p, (!optimize_loop_for_speed_p (loop)
+					    || !flag_tree_loop_distribution));
 	  if (destroy_p)
 	    loops_to_be_destroyed.safe_push (loop);
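Among the test changes shown, scev-11.c is the only one that edits the test's code rather than its directives: the stored constant goes from 0 to 1. Presumably this keeps `b[uc] = 0` from being recognized as a memset now that pattern detection runs at -O2, since memset can only reproduce an element value whose bytes are all identical. Zero qualifies for any element type; an int of 1 (bytes 01 00 00 00 on a little-endian target) does not. The sketch below illustrates that byte-level criterion for int. `value_is_memset_pattern` is a hypothetical helper, not GCC's own check, which works on the compiler's internal constant representation.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if every byte of VALUE's object
   representation is the same, i.e. if a loop storing VALUE into each
   element of an int array could be replaced by
   memset (array, byte, n * sizeof (int)).  */
bool value_is_memset_pattern (int value)
{
  unsigned char bytes[sizeof value];
  memcpy (bytes, &value, sizeof value);
  for (size_t i = 1; i < sizeof value; i++)
    if (bytes[i] != bytes[0])
      return false;
  return true;
}

/* Demonstrate the equivalence for a value that does qualify.  */
void check_memset_equivalence (void)
{
  int via_loop[4], via_memset[4];
  for (int i = 0; i < 4; i++)
    via_loop[i] = -1;              /* every byte is 0xff (two's complement) */
  memset (via_memset, 0xff, sizeof via_memset);
  assert (memcmp (via_loop, via_memset, sizeof via_loop) == 0);
}
```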
Christophe Lyon May 27, 2019, 6:42 a.m. UTC | #2
On Thu, 23 May 2019 at 13:32, Richard Biener <rguenther@suse.de> wrote:
>
> On Wed, 22 May 2019, Richard Biener wrote:
>
> >
> > This enables -ftree-loop-distribute-patterns at -O[2s] and also
> > arranges cold loops to be still processed but for pattern
> > recognition to save code-size.
> >
> > Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> >
> > Martin has done extensive compile-time testing on SPEC
> > identifying only a single regression I'll have a look into.
>
> The reason for the compile-time regression is the complexity
> heuristic in LRA no longer choosing "simple" algorithms and
> the LIVE problem in particular being awfully slow.
>
> Unsurprisingly testing has also revealed loads of testsuite
> fallout which I deal with in the patch as committed below.
> Sorry for any further fallout on other targets (which I do
> expect).
>

Hi Richard,

Indeed, git bisect pointed me to this commit when checking
the regressions on arm & aarch64 reported at:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/271588/report-build-info.html

Since I'm a bit late in reporting, I'm not sure whether you've already fixed them
(I didn't notice any follow-ups).
Looking at this patch, it seems adding -fno-tree-loop-distribute-patterns to
dg-options is the standard way of fixing the regressions?

Christophe

> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
>
> Richard.
>
> +/* { dg-options "-O2 -fno-tree-loop-distribute-patterns -ftree-vectorize -fdump-tree-vect-details -fvect-cost-model=dynamic" } */
> +/* { dg-additional-options "-mno-sse" { target { i?86-*-* x86_64-*-* } } } */
>
>  #include <stdlib.h>
>
> Index: gcc/testsuite/gcc.dg/tree-ssa/gen-vect-32.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/gen-vect-32.c (revision 271513)
> +++ gcc/testsuite/gcc.dg/tree-ssa/gen-vect-32.c (working copy)
> @@ -1,5 +1,5 @@
>  /* { dg-do run { target vect_cmdline_needed } } */
> -/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details -fno-vect-cost-model" } */
> +/* { dg-options "-O2 -fno-tree-loop-distribute-patterns -ftree-vectorize -fdump-tree-vect-details -fno-vect-cost-model" } */
>  /* { dg-additional-options "-mno-sse" { target { i?86-*-* x86_64-*-* } } } */
>
>  #include <stdlib.h>
> Index: gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt-2.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt-2.c (revision 271513)
> +++ gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt-2.c (working copy)
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-tree-ivopts" } */
> +/* { dg-options "-O2 -fno-tree-loop-distribute-patterns -fdump-tree-ivopts" } */
>  /* { dg-skip-if "PR68644" { hppa*-*-* powerpc*-*-* } } */
>
>  void
> Index: gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c   (revision 271513)
> +++ gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c   (working copy)
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-tree-ivopts" } */
> +/* { dg-options "-O2 -fno-tree-loop-distribute-patterns -fdump-tree-ivopts" } */
>  /* { dg-require-effective-target stdint_types } */
>
>  #include "stdint.h"
> Index: gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-1.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-1.c       (revision 271513)
> +++ gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-1.c       (working copy)
> @@ -1,5 +1,5 @@
>  /* { dg-do run } */
> -/* { dg-options "-O2 -fno-inline -fpredictive-commoning -fdump-tree-pcom-details" } */
> +/* { dg-options "-O2 -fno-inline -fno-tree-loop-distribute-patterns -fpredictive-commoning -fdump-tree-pcom-details" } */
>
>  int arr[105] = {2, 3, 5, 7, 11};
>  int result0[10] = {2, 3, 5, 7, 11};
> Index: gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-2.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-2.c       (revision 271513)
> +++ gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-2.c       (working copy)
> @@ -1,5 +1,5 @@
>  /* { dg-do run } */
> -/* { dg-options "-O2 -fno-inline -fpredictive-commoning -fdump-tree-pcom-details" } */
> +/* { dg-options "-O2 -fno-inline -fno-tree-loop-distribute-patterns -fpredictive-commoning -fdump-tree-pcom-details" } */
>
>  int arr[105] = {2, 3, 5, 7, 11};
>  int result0[10] = {2, 3, 5, 7, 11};
> Index: gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-3.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-3.c       (revision 271513)
> +++ gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-3.c       (working copy)
> @@ -1,5 +1,5 @@
>  /* { dg-do run } */
> -/* { dg-options "-O2 -fno-inline -fpredictive-commoning -fdump-tree-pcom-details" } */
> +/* { dg-options "-O2 -fno-inline -fno-tree-loop-distribute-patterns -fpredictive-commoning -fdump-tree-pcom-details" } */
>
>  int arr1[105] = {2, 3, 5, 7, 11, 13, 0};
>  int arr2[105] = {2, 3, 5, 7, 11, 13, 0};
> Index: gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-4.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-4.c       (revision 271513)
> +++ gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-4.c       (working copy)
> @@ -1,5 +1,5 @@
>  /* { dg-do run } */
> -/* { dg-options "-O2 -fno-inline -fpredictive-commoning -fdump-tree-pcom-details" } */
> +/* { dg-options "-O2 -fno-inline -fno-tree-loop-distribute-patterns -fpredictive-commoning -fdump-tree-pcom-details" } */
>
>  int arr[105] = {2, 3, 5, 7, 11};
>  int result0[10] = {2, 3, 5, 7, 11};
> Index: gcc/testsuite/gcc.dg/tree-ssa/prefetch-7.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/prefetch-7.c  (revision 271513)
> +++ gcc/testsuite/gcc.dg/tree-ssa/prefetch-7.c  (working copy)
> @@ -1,5 +1,5 @@
>  /* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
> -/* { dg-options "-O2 -fprefetch-loop-arrays -march=amdfam10 --param simultaneous-prefetches=100 -fdump-tree-aprefetch-details -fdump-tree-optimized" } */
> +/* { dg-options "-O2 -fno-tree-loop-distribute-patterns -fprefetch-loop-arrays -march=amdfam10 --param simultaneous-prefetches=100 -fdump-tree-aprefetch-details -fdump-tree-optimized" } */
>
>  #define K 1000000
>  int a[K];
> Index: gcc/testsuite/gcc.dg/tree-ssa/prefetch-8.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/prefetch-8.c  (revision 271513)
> +++ gcc/testsuite/gcc.dg/tree-ssa/prefetch-8.c  (working copy)
> @@ -1,5 +1,5 @@
>  /* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
> -/* { dg-options "-O2 -fprefetch-loop-arrays -march=amdfam10 --param simultaneous-prefetches=100 -fdump-tree-aprefetch-details -fdump-tree-optimized" } */
> +/* { dg-options "-O2 -fno-tree-loop-distribute-patterns -fprefetch-loop-arrays -march=amdfam10 --param simultaneous-prefetches=100 -fdump-tree-aprefetch-details -fdump-tree-optimized" } */
>
>  #define K 1000000
>  int a[K];
> Index: gcc/testsuite/gcc.dg/tree-ssa/prefetch-9.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/prefetch-9.c  (revision 271513)
> +++ gcc/testsuite/gcc.dg/tree-ssa/prefetch-9.c  (working copy)
> @@ -1,5 +1,5 @@
>  /* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
> -/* { dg-options "-O2 -fprefetch-loop-arrays -march=amdfam10 --param simultaneous-prefetches=100 -fdump-tree-aprefetch-details -fdump-tree-optimized" } */
> +/* { dg-options "-O2 -fno-tree-loop-distribute-patterns -fprefetch-loop-arrays -march=amdfam10 --param simultaneous-prefetches=100 -fdump-tree-aprefetch-details -fdump-tree-optimized" } */
>
>  #define K 1000000
>  int a[K], b[K];
> Index: gcc/testsuite/gcc.dg/tree-ssa/scev-11.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/scev-11.c     (revision 271513)
> +++ gcc/testsuite/gcc.dg/tree-ssa/scev-11.c     (working copy)
> @@ -15,7 +15,7 @@ foo (int n)
>      {
>        unsigned char uc = (unsigned char)i;
>        a[i] = i;
> -      b[uc] = 0;
> +      b[uc] = 1;
>      }
>
>    bar (a);
> Index: gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-31.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-31.c        (revision 271513)
> +++ gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-31.c        (working copy)
> @@ -1,4 +1,5 @@
>  /* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-fno-tree-loop-distribute-patterns" } */
>
>  #include <stdarg.h>
>  #include "../../tree-vect.h"
> Index: gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-33.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-33.c        (revision 271513)
> +++ gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-33.c        (working copy)
> @@ -1,5 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-fno-tree-loop-distribute-patterns" } */
>
>  #include <stdarg.h>
>  #include "../../tree-vect.h"
> Index: gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c      (revision 271513)
> +++ gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c      (working copy)
> @@ -1,4 +1,5 @@
>  /* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-fno-tree-loop-distribute-patterns" } */
>
>  #include <stdarg.h>
>  #include "../../tree-vect.h"
> Index: gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-33.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-33.c      (revision 271513)
> +++ gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-33.c      (working copy)
> @@ -1,5 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-fno-tree-loop-distribute-patterns" } */
>
>  #include <stdarg.h>
>  #include "../../tree-vect.h"
> Index: gcc/testsuite/gcc.dg/vect/vect.exp
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect.exp  (revision 271513)
> +++ gcc/testsuite/gcc.dg/vect/vect.exp  (working copy)
> @@ -45,7 +45,7 @@ if ![check_vect_support_and_set_flags] {
>  }
>
>  # These flags are used for all targets.
> -lappend DEFAULT_VECTCFLAGS "-ftree-vectorize" "-fno-vect-cost-model" "-fno-common"
> +lappend DEFAULT_VECTCFLAGS "-ftree-vectorize" "-fno-tree-loop-distribute-patterns" "-fno-vect-cost-model" "-fno-common"
>
>  # Initialize `dg'.
>  dg-init
> Index: gcc/testsuite/gcc.target/i386/pr30970.c
> ===================================================================
> --- gcc/testsuite/gcc.target/i386/pr30970.c     (revision 271513)
> +++ gcc/testsuite/gcc.target/i386/pr30970.c     (working copy)
> @@ -1,5 +1,5 @@
>  /* { dg-do compile }
> -/* { dg-options "-msse2 -O2 -ftree-vectorize -mtune=generic" } */
> +/* { dg-options "-msse2 -O2 -fno-tree-loop-distribute-patterns -ftree-vectorize -mtune=generic" } */
>
>  #define N 256
>  int b[N];
> Index: gcc/testsuite/gcc.target/i386/vect-double-1.c
> ===================================================================
> --- gcc/testsuite/gcc.target/i386/vect-double-1.c       (revision 271513)
> +++ gcc/testsuite/gcc.target/i386/vect-double-1.c       (working copy)
> @@ -1,6 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-skip-if "" { *-*-* } { "-march=*" } { "-march=core2" } } */
> -/* { dg-options "-O2 -ftree-vectorize -mfpmath=sse -march=core2 -fdump-tree-vect-stats" } */
> +/* { dg-options "-O2 -ftree-vectorize -fno-tree-loop-distribute-patterns -mfpmath=sse -march=core2 -fdump-tree-vect-stats" } */
>  /* { dg-add-options bind_pic_locally } */
>
>  extern void abort (void);
> Index: gcc/testsuite/gcc.target/i386/vect-double-2.c
> ===================================================================
> --- gcc/testsuite/gcc.target/i386/vect-double-2.c       (revision 271513)
> +++ gcc/testsuite/gcc.target/i386/vect-double-2.c       (working copy)
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -ftree-vectorize -mfpmath=sse -msse2 -mtune=atom -fdump-tree-vect-stats" } */
> +/* { dg-options "-O2 -ftree-vectorize -fno-tree-loop-distribute-patterns -mfpmath=sse -msse2 -mtune=atom -fdump-tree-vect-stats" } */
>
>  extern void abort (void);
>
> Index: gcc/testsuite/gfortran.dg/vect/vect-5.f90
> ===================================================================
> --- gcc/testsuite/gfortran.dg/vect/vect-5.f90   (revision 271513)
> +++ gcc/testsuite/gfortran.dg/vect/vect-5.f90   (working copy)
> @@ -1,5 +1,5 @@
>  ! { dg-require-effective-target vect_int }
> -! { dg-additional-options "--param vect-max-peeling-for-alignment=0" }
> +! { dg-additional-options "-fno-tree-loop-distribute-patterns --param vect-max-peeling-for-alignment=0" }
>
>          Subroutine foo (N, M)
>          Integer N
> Index: gcc/testsuite/gfortran.dg/vect/vect-8.f90
> ===================================================================
> --- gcc/testsuite/gfortran.dg/vect/vect-8.f90   (revision 271513)
> +++ gcc/testsuite/gfortran.dg/vect/vect-8.f90   (working copy)
> @@ -1,6 +1,6 @@
>  ! { dg-do compile }
>  ! { dg-require-effective-target vect_double }
> -! { dg-additional-options "-finline-matmul-limit=0" }
> +! { dg-additional-options "-fno-tree-loop-distribute-patterns -finline-matmul-limit=0" }
>
>  module lfk_prec
>   integer, parameter :: dp=kind(1.d0)
> Index: gcc/tree-loop-distribution.c
> ===================================================================
> --- gcc/tree-loop-distribution.c        (revision 271513)
> +++ gcc/tree-loop-distribution.c        (working copy)
> @@ -115,6 +115,7 @@ along with GCC; see the file COPYING3.
>  #include "params.h"
>  #include "tree-vectorizer.h"
>  #include "tree-eh.h"
> +#include "gimple-fold.h"
>
>
>  #define MAX_DATAREFS_NUM \
> @@ -1028,6 +1029,7 @@ generate_memset_builtin (struct loop *lo
>    fn = build_fold_addr_expr (builtin_decl_implicit (BUILT_IN_MEMSET));
>    fn_call = gimple_build_call (fn, 3, mem, val, nb_bytes);
>    gsi_insert_after (&gsi, fn_call, GSI_CONTINUE_LINKING);
> +  fold_stmt (&gsi);
>
>    if (dump_file && (dump_flags & TDF_DETAILS))
>      {
> @@ -1071,6 +1073,7 @@ generate_memcpy_builtin (struct loop *lo
>    fn = build_fold_addr_expr (builtin_decl_implicit (kind));
>    fn_call = gimple_build_call (fn, 3, dest, src, nb_bytes);
>    gsi_insert_after (&gsi, fn_call, GSI_CONTINUE_LINKING);
> +  fold_stmt (&gsi);
>
>    if (dump_file && (dump_flags & TDF_DETAILS))
>      {
> @@ -2769,7 +2772,8 @@ finalize_partitions (struct loop *loop,
>
>  static int
>  distribute_loop (struct loop *loop, vec<gimple *> stmts,
> -                control_dependences *cd, int *nb_calls, bool *destroy_p)
> +                control_dependences *cd, int *nb_calls, bool *destroy_p,
> +                bool only_patterns_p)
>  {
>    ddrs_table = new hash_table<ddr_hasher> (389);
>    struct graph *rdg;
> @@ -2843,7 +2847,7 @@ distribute_loop (struct loop *loop, vec<
>
>    /* If we are only distributing patterns but did not detect any,
>       simply bail out.  */
> -  if (!flag_tree_loop_distribution
> +  if (only_patterns_p
>        && !any_builtin)
>      {
>        nbp = 0;
> @@ -2855,7 +2859,7 @@ distribute_loop (struct loop *loop, vec<
>       a loop into pieces, separated by builtin calls.  That is, we
>       only want no or a single loop body remaining.  */
>    struct partition *into;
> -  if (!flag_tree_loop_distribution)
> +  if (only_patterns_p)
>      {
>        for (i = 0; partitions.iterate (i, &into); ++i)
>         if (!partition_builtin_p (into))
> @@ -3085,7 +3089,6 @@ prepare_perfect_loop_nest (struct loop *
>          && loop_outer (outer)
>          && outer->inner == loop && loop->next == NULL
>          && single_exit (outer)
> -        && optimize_loop_for_speed_p (outer)
>          && !chrec_contains_symbols_defined_in_loop (niters, outer->num)
>          && (niters = number_of_latch_executions (outer)) != NULL_TREE
>          && niters != chrec_dont_know)
> @@ -3139,9 +3142,11 @@ pass_loop_distribution::execute (functio
>       walking to innermost loops.  */
>    FOR_EACH_LOOP (loop, LI_ONLY_INNERMOST)
>      {
> -      /* Don't distribute multiple exit edges loop, or cold loop.  */
> +      /* Don't distribute multiple exit edges loop, or cold loop when
> +         not doing pattern detection.  */
>        if (!single_exit (loop)
> -         || !optimize_loop_for_speed_p (loop))
> +         || (!flag_tree_loop_distribute_patterns
> +             && !optimize_loop_for_speed_p (loop)))
>         continue;
>
>        /* Don't distribute loop if niters is unknown.  */
> @@ -3169,9 +3174,10 @@ pass_loop_distribution::execute (functio
>
>           bool destroy_p;
>           int nb_generated_loops, nb_generated_calls;
> -         nb_generated_loops = distribute_loop (loop, work_list, cd,
> -                                               &nb_generated_calls,
> -                                               &destroy_p);
> +         nb_generated_loops
> +           = distribute_loop (loop, work_list, cd, &nb_generated_calls,
> +                              &destroy_p, (!optimize_loop_for_speed_p (loop)
> +                                           || !flag_tree_loop_distribution));
>           if (destroy_p)
>             loops_to_be_destroyed.safe_push (loop);
>
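The execute () and distribute_loop () hunks above change which loops the pass visits and in which mode. As a rough reading aid, here is a minimal sketch (not code from the patch) that models the per-loop decision as a plain C function. The struct, enum and function names are hypothetical stand-ins for single_exit (), optimize_loop_for_speed_p (), flag_tree_loop_distribute_patterns and flag_tree_loop_distribution. It also assumes that the pass gate requires at least one of the two flags, and it ignores the later checks, such as the one for an unknown iteration count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the two loop properties the pass checks.  */
struct loop_info
{
  bool single_exit;        /* stands in for single_exit (loop) != NULL */
  bool optimize_for_speed; /* stands in for optimize_loop_for_speed_p (loop) */
};

enum ldist_action { LDIST_SKIP, LDIST_PATTERNS_ONLY, LDIST_FULL };

/* Models the per-loop decision in pass_loop_distribution::execute after
   the patch.  FLAG_PATTERNS and FLAG_DISTRIBUTION stand in for
   -ftree-loop-distribute-patterns and -ftree-loop-distribution.  */
enum ldist_action
ldist_action_for (struct loop_info loop, bool flag_patterns,
                  bool flag_distribution)
{
  /* Assumed pass gate: nothing runs unless one of the flags is set.  */
  if (!flag_patterns && !flag_distribution)
    return LDIST_SKIP;

  /* Multiple-exit loops are never distributed; cold loops are skipped
     only when pattern detection is disabled.  */
  if (!loop.single_exit
      || (!flag_patterns && !loop.optimize_for_speed))
    return LDIST_SKIP;

  /* Mirrors the only_patterns_p argument passed to distribute_loop:
     !optimize_loop_for_speed_p (loop) || !flag_tree_loop_distribution.  */
  if (!loop.optimize_for_speed || !flag_distribution)
    return LDIST_PATTERNS_ONLY;

  return LDIST_FULL;
}
```

Under this model, -O2 and -Os (patterns on, distribution off) examine every single-exit loop in patterns-only mode, hot or cold. -O3 fully distributes hot loops and only replaces patterns in cold ones. In patterns-only mode, distribute_loop bails out when it finds no builtin partition, so a loop either becomes library calls plus at most one remaining loop, or is left alone.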
Richard Biener May 27, 2019, 7:26 a.m. UTC | #3
On Mon, 27 May 2019, Christophe Lyon wrote:

> On Thu, 23 May 2019 at 13:32, Richard Biener <rguenther@suse.de> wrote:
> >
> > On Wed, 22 May 2019, Richard Biener wrote:
> >
> > >
> > > This enables -ftree-loop-distribute-patterns at -O[2s] and also
> > > arranges cold loops to be still processed but for pattern
> > > recognition to save code-size.
> > >
> > > Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> > >
> > > Martin has done extensive compile-time testing on SPEC
> > > identifying only a single regression I'll have a look into.
> >
> > The reason for the compile-time regression is the complexity
> > heuristic in LRA no longer choosing "simple" algorithms and
> > the LIVE problem in particular being awfully slow.
> >
> > Unsurprisingly testing has also revealed loads of testsuite
> > fallout which I deal with in the patch as committed below.
> > Sorry for any further fallout on other targets (which I do
> > expect).
> >
> 
> Hi Richard,
> 
> Indeed git bisect pointed me to this commit when checking
> the regressions on arm & aarch64 reported at:
> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/271588/report-build-info.html
> 
> Since I'm a bit late in reporting, I'm not sure whether you've already fixed them.
> (I didn't notice follow-ups)
> Looking at this patch, it seems adding -fno-tree-loop-distribute-patterns to
> dg-options is the standard way of fixing the regressions?

Yes.  As I wrote above, I did expect some target-specific fallout and
hoped target maintainers would fix that.
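For illustration only, here is a minimal sketch of the kind of testcase affected and of the fix used throughout the committed patch. The file, the arrays a and b, and the functions clear_a, copy_a_to_b and loops_match_library_calls are hypothetical. The two loops are the shapes that pattern detection may now turn into memset and memcpy calls at -O2, which would hide them from the loop passes a test is trying to exercise. Adding -fno-tree-loop-distribute-patterns to dg-options (or to dg-additional-options, as in the vect tests) keeps them as loops.

```c
/* Hypothetical testcase in the style of the committed fixes.  */
/* { dg-do compile } */
/* { dg-options "-O2 -fno-tree-loop-distribute-patterns" } */

#include <assert.h>
#include <string.h>

#define N 16

int a[N], b[N];

/* Zeroing loop: without the option, ldist may replace it with
   memset (a, 0, sizeof a).  */
void
clear_a (void)
{
  for (int i = 0; i < N; i++)
    a[i] = 0;
}

/* Copy between distinct arrays: a candidate for memcpy (b, a, sizeof a).  */
void
copy_a_to_b (void)
{
  for (int i = 0; i < N; i++)
    b[i] = a[i];
}

/* Returns 1 if both loops give the same result as the library calls
   they may be replaced with.  */
int
loops_match_library_calls (void)
{
  int ref_a[N], ref_b[N];

  for (int i = 0; i < N; i++)
    a[i] = b[i] = i + 1;
  clear_a ();
  memset (ref_a, 0, sizeof ref_a);
  if (memcmp (a, ref_a, sizeof a) != 0)
    return 0;

  for (int i = 0; i < N; i++)
    a[i] = 3 * i;
  copy_a_to_b ();
  memcpy (ref_b, a, sizeof a);
  return memcmp (b, ref_b, sizeof b) == 0;
}
```

The runtime check only confirms that the loops and the library calls compute the same result. Whether a given target actually emits the calls depends on that target's cost decisions, which is why the fallout differs between targets.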

Richard.

> Christophe
> 
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
> >
> > Richard.
> >
> > 2019-05-23  Richard Biener  <rguenther@suse.de>
> >
> >         PR tree-optimization/88440
> >         * opts.c (default_options_table): Enable -ftree-loop-distribute-patterns
> >         at -O[2s]+.
> >         * tree-loop-distribution.c (generate_memset_builtin): Fold the
> >         generated call.
> >         (generate_memcpy_builtin): Likewise.
> >         (distribute_loop): Pass in whether to only distribute patterns.
> >         (prepare_perfect_loop_nest): Also allow size optimization.
> >         (pass_loop_distribution::execute): When optimizing a loop
> >         nest for size allow pattern replacement.
> >
> >         * gcc.dg/tree-ssa/ldist-37.c: New testcase.
> >         * gcc.dg/tree-ssa/ldist-38.c: Likewise.
> >         * gcc.dg/vect/vect.exp: Add -fno-tree-loop-distribute-patterns.
> >         * gcc.dg/tree-ssa/ldist-37.c: Adjust.
> >         * gcc.dg/tree-ssa/ldist-38.c: Likewise.
> >         * g++.dg/tree-ssa/pr78847.C: Likewise.
> >         * gcc.dg/autopar/pr39500-1.c: Likewise.
> >         * gcc.dg/autopar/reduc-1char.c: Likewise.
> >         * gcc.dg/autopar/reduc-7.c: Likewise.
> >         * gcc.dg/tree-ssa/ivopts-lt-2.c: Likewise.
> >         * gcc.dg/tree-ssa/ivopts-lt.c: Likewise.
> >         * gcc.dg/tree-ssa/predcom-dse-1.c: Likewise.
> >         * gcc.dg/tree-ssa/predcom-dse-2.c: Likewise.
> >         * gcc.dg/tree-ssa/predcom-dse-3.c: Likewise.
> >         * gcc.dg/tree-ssa/predcom-dse-4.c: Likewise.
> >         * gcc.dg/tree-ssa/prefetch-7.c: Likewise.
> >         * gcc.dg/tree-ssa/prefetch-8.c: Likewise.
> >         * gcc.dg/tree-ssa/prefetch-9.c: Likewise.
> >         * gcc.dg/tree-ssa/scev-11.c: Likewise.
> >         * gcc.dg/vect/costmodel/i386/costmodel-vect-31.c: Likewise.
> >         * gcc.dg/vect/costmodel/i386/costmodel-vect-33.c: Likewise.
> >         * gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c: Likewise.
> >         * gcc.dg/vect/costmodel/x86_64/costmodel-vect-33.c: Likewise.
> >         * gcc.target/i386/pr30970.c: Likewise.
> >         * gcc.target/i386/vect-double-1.c: Likewise.
> >         * gcc.target/i386/vect-double-2.c: Likewise.
> >         * gcc.dg/tree-ssa/gen-vect-2.c: Likewise.
> >         * gcc.dg/tree-ssa/gen-vect-26.c: Likewise.
> >         * gcc.dg/tree-ssa/gen-vect-28.c: Likewise.
> >         * gcc.dg/tree-ssa/gen-vect-32.c: Likewise.
> >         * gfortran.dg/vect/vect-5.f90: Likewise.
> >         * gfortran.dg/vect/vect-8.f90: Likewise.
> >
> > Index: gcc/opts.c
> > ===================================================================
> > --- gcc/opts.c  (revision 271513)
> > +++ gcc/opts.c  (working copy)
> > @@ -550,7 +550,7 @@ static const struct default_options defa
> >      { OPT_LEVELS_3_PLUS, OPT_fpredictive_commoning, NULL, 1 },
> >      { OPT_LEVELS_3_PLUS, OPT_fsplit_loops, NULL, 1 },
> >      { OPT_LEVELS_3_PLUS, OPT_fsplit_paths, NULL, 1 },
> > -    { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
> > +    { OPT_LEVELS_2_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
> >      { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribution, NULL, 1 },
> >      { OPT_LEVELS_3_PLUS, OPT_ftree_loop_vectorize, NULL, 1 },
> >      { OPT_LEVELS_3_PLUS, OPT_ftree_partial_pre, NULL, 1 },
> > Index: gcc/testsuite/g++.dg/tree-ssa/pr78847.C
> > ===================================================================
> > --- gcc/testsuite/g++.dg/tree-ssa/pr78847.C     (revision 271513)
> > +++ gcc/testsuite/g++.dg/tree-ssa/pr78847.C     (working copy)
> > @@ -1,6 +1,6 @@
> >  /* { dg-do compile } */
> >  /* { dg-require-effective-target c++14 } */
> > -/* { dg-options "-O3 -fdump-tree-ldist" } */
> > +/* { dg-options "-O3 -fdump-tree-ldist-optimized" } */
> >
> >  #include <stddef.h>
> >  #include <cstring>
> > @@ -23,4 +23,4 @@ void testWithLoopValue(const Foo foo, si
> >        buf_[ptr++] = c;
> >  }
> >
> > -/* { dg-final { scan-tree-dump "memcpy\[^\n\r\]*, 9\\);" "ldist" } } */
> > +/* { dg-final { scan-tree-dump "split to 0 loops and 1 library calls" "ldist" } } */
> > Index: gcc/testsuite/gcc.dg/autopar/pr39500-1.c
> > ===================================================================
> > --- gcc/testsuite/gcc.dg/autopar/pr39500-1.c    (revision 271513)
> > +++ gcc/testsuite/gcc.dg/autopar/pr39500-1.c    (working copy)
> > @@ -1,7 +1,7 @@
> >  /* pr39500: autopar fails to parallel */
> >  /* origin: nemokingdom@gmail.com(LiFeng) */
> >  /* { dg-do compile } */
> > -/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops2-details" } */
> > +/* { dg-options "-O2 -fno-tree-loop-distribute-patterns -ftree-parallelize-loops=4 -fdump-tree-parloops2-details" } */
> >
> >  void abort (void);
> >
> > Index: gcc/testsuite/gcc.dg/autopar/reduc-1char.c
> > ===================================================================
> > --- gcc/testsuite/gcc.dg/autopar/reduc-1char.c  (revision 271513)
> > +++ gcc/testsuite/gcc.dg/autopar/reduc-1char.c  (working copy)
> > @@ -61,5 +61,5 @@ int main (void)
> >
> >
> >  /* { dg-final { scan-tree-dump-times "Detected reduction" 3 "parloops2" } } */
> > -/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 4 "parloops2" } } */
> > +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops2" } } */
> >
> > Index: gcc/testsuite/gcc.dg/autopar/reduc-7.c
> > ===================================================================
> > --- gcc/testsuite/gcc.dg/autopar/reduc-7.c      (revision 271513)
> > +++ gcc/testsuite/gcc.dg/autopar/reduc-7.c      (working copy)
> > @@ -85,5 +85,5 @@ int main (void)
> >
> >
> >  /* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops2" } } */
> > -/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops2" } } */
> > +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloops2" } } */
> >
> > Index: gcc/testsuite/gcc.dg/tree-ssa/gen-vect-2.c
> > ===================================================================
> > --- gcc/testsuite/gcc.dg/tree-ssa/gen-vect-2.c  (revision 271513)
> > +++ gcc/testsuite/gcc.dg/tree-ssa/gen-vect-2.c  (working copy)
> > @@ -1,6 +1,6 @@
> >  /* { dg-do run { target vect_cmdline_needed } } */
> > -/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details -fvect-cost-model=dynamic" } */
> > -/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details -fvect-cost-model=dynamic -mno-sse" { target { i?86-*-* x86_64-*-* } } } */
> > +/* { dg-options "-O2 -fno-tree-loop-distribute-patterns -ftree-vectorize -fdump-tree-vect-details -fvect-cost-model=dynamic" } */
> > +/* { dg-additional-options "-mno-sse" { target { i?86-*-* x86_64-*-* } } } */
> >
> >  #include <stdlib.h>
> >
> > Index: gcc/testsuite/gcc.dg/tree-ssa/gen-vect-26.c
> > ===================================================================
> > --- gcc/testsuite/gcc.dg/tree-ssa/gen-vect-26.c (revision 271513)
> > +++ gcc/testsuite/gcc.dg/tree-ssa/gen-vect-26.c (working copy)
> > @@ -1,6 +1,6 @@
> >  /* { dg-do run { target vect_cmdline_needed } } */
> > -/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details -fvect-cost-model=dynamic" } */
> > -/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details -fvect-cost-model=dynamic -mno-sse" { target { i?86-*-* x86_64-*-* } } } */
> > +/* { dg-options "-O2 -fno-tree-loop-distribute-patterns -ftree-vectorize -fdump-tree-vect-details -fvect-cost-model=dynamic" } */
> > +/* { dg-additional-options "-mno-sse" { target { i?86-*-* x86_64-*-* } } } */
> >
> >  #include <stdlib.h>
> >
> > Index: gcc/testsuite/gcc.dg/tree-ssa/gen-vect-28.c
> > ===================================================================
> > --- gcc/testsuite/gcc.dg/tree-ssa/gen-vect-28.c (revision 271513)
> > +++ gcc/testsuite/gcc.dg/tree-ssa/gen-vect-28.c (working copy)
> > @@ -1,6 +1,6 @@
> >  /* { dg-do run { target vect_cmdline_needed } } */
> > -/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details -fvect-cost-model=dynamic" } */
> > -/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details -fvect-cost-model=dynamic -mno-sse" { target { i?86-*-* x86_64-*-* } } } */
> > +/* { dg-options "-O2 -fno-tree-loop-distribute-patterns -ftree-vectorize -fdump-tree-vect-details -fvect-cost-model=dynamic" } */
> > +/* { dg-additional-options "-mno-sse" { target { i?86-*-* x86_64-*-* } } } */
> >
> >  #include <stdlib.h>
> >
> > Index: gcc/testsuite/gcc.dg/tree-ssa/gen-vect-32.c
> > ===================================================================
> > --- gcc/testsuite/gcc.dg/tree-ssa/gen-vect-32.c (revision 271513)
> > +++ gcc/testsuite/gcc.dg/tree-ssa/gen-vect-32.c (working copy)
> > @@ -1,5 +1,5 @@
> >  /* { dg-do run { target vect_cmdline_needed } } */
> > -/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details -fno-vect-cost-model" } */
> > +/* { dg-options "-O2 -fno-tree-loop-distribute-patterns -ftree-vectorize -fdump-tree-vect-details -fno-vect-cost-model" } */
> >  /* { dg-additional-options "-mno-sse" { target { i?86-*-* x86_64-*-* } } } */
> >
> >  #include <stdlib.h>
> > Index: gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt-2.c
> > ===================================================================
> > --- gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt-2.c (revision 271513)
> > +++ gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt-2.c (working copy)
> > @@ -1,5 +1,5 @@
> >  /* { dg-do compile } */
> > -/* { dg-options "-O2 -fdump-tree-ivopts" } */
> > +/* { dg-options "-O2 -fno-tree-loop-distribute-patterns -fdump-tree-ivopts" } */
> >  /* { dg-skip-if "PR68644" { hppa*-*-* powerpc*-*-* } } */
> >
> >  void
> > Index: gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c
> > ===================================================================
> > --- gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c   (revision 271513)
> > +++ gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c   (working copy)
> > @@ -1,5 +1,5 @@
> >  /* { dg-do compile } */
> > -/* { dg-options "-O2 -fdump-tree-ivopts" } */
> > +/* { dg-options "-O2 -fno-tree-loop-distribute-patterns -fdump-tree-ivopts" } */
> >  /* { dg-require-effective-target stdint_types } */
> >
> >  #include "stdint.h"
> > Index: gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-1.c
> > ===================================================================
> > --- gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-1.c       (revision 271513)
> > +++ gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-1.c       (working copy)
> > @@ -1,5 +1,5 @@
> >  /* { dg-do run } */
> > -/* { dg-options "-O2 -fno-inline -fpredictive-commoning -fdump-tree-pcom-details" } */
> > +/* { dg-options "-O2 -fno-inline -fno-tree-loop-distribute-patterns -fpredictive-commoning -fdump-tree-pcom-details" } */
> >
> >  int arr[105] = {2, 3, 5, 7, 11};
> >  int result0[10] = {2, 3, 5, 7, 11};
> > Index: gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-2.c
> > ===================================================================
> > --- gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-2.c       (revision 271513)
> > +++ gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-2.c       (working copy)
> > @@ -1,5 +1,5 @@
> >  /* { dg-do run } */
> > -/* { dg-options "-O2 -fno-inline -fpredictive-commoning -fdump-tree-pcom-details" } */
> > +/* { dg-options "-O2 -fno-inline -fno-tree-loop-distribute-patterns -fpredictive-commoning -fdump-tree-pcom-details" } */
> >
> >  int arr[105] = {2, 3, 5, 7, 11};
> >  int result0[10] = {2, 3, 5, 7, 11};
> > Index: gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-3.c
> > ===================================================================
> > --- gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-3.c       (revision 271513)
> > +++ gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-3.c       (working copy)
> > @@ -1,5 +1,5 @@
> >  /* { dg-do run } */
> > -/* { dg-options "-O2 -fno-inline -fpredictive-commoning -fdump-tree-pcom-details" } */
> > +/* { dg-options "-O2 -fno-inline -fno-tree-loop-distribute-patterns -fpredictive-commoning -fdump-tree-pcom-details" } */
> >
> >  int arr1[105] = {2, 3, 5, 7, 11, 13, 0};
> >  int arr2[105] = {2, 3, 5, 7, 11, 13, 0};
> > Index: gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-4.c
> > ===================================================================
> > --- gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-4.c       (revision 271513)
> > +++ gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-4.c       (working copy)
> > @@ -1,5 +1,5 @@
> >  /* { dg-do run } */
> > -/* { dg-options "-O2 -fno-inline -fpredictive-commoning -fdump-tree-pcom-details" } */
> > +/* { dg-options "-O2 -fno-inline -fno-tree-loop-distribute-patterns -fpredictive-commoning -fdump-tree-pcom-details" } */
> >
> >  int arr[105] = {2, 3, 5, 7, 11};
> >  int result0[10] = {2, 3, 5, 7, 11};
> > Index: gcc/testsuite/gcc.dg/tree-ssa/prefetch-7.c
> > ===================================================================
> > --- gcc/testsuite/gcc.dg/tree-ssa/prefetch-7.c  (revision 271513)
> > +++ gcc/testsuite/gcc.dg/tree-ssa/prefetch-7.c  (working copy)
> > @@ -1,5 +1,5 @@
> >  /* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
> > -/* { dg-options "-O2 -fprefetch-loop-arrays -march=amdfam10 --param simultaneous-prefetches=100 -fdump-tree-aprefetch-details -fdump-tree-optimized" } */
> > +/* { dg-options "-O2 -fno-tree-loop-distribute-patterns -fprefetch-loop-arrays -march=amdfam10 --param simultaneous-prefetches=100 -fdump-tree-aprefetch-details -fdump-tree-optimized" } */
> >
> >  #define K 1000000
> >  int a[K];
> > Index: gcc/testsuite/gcc.dg/tree-ssa/prefetch-8.c
> > ===================================================================
> > --- gcc/testsuite/gcc.dg/tree-ssa/prefetch-8.c  (revision 271513)
> > +++ gcc/testsuite/gcc.dg/tree-ssa/prefetch-8.c  (working copy)
> > @@ -1,5 +1,5 @@
> >  /* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
> > -/* { dg-options "-O2 -fprefetch-loop-arrays -march=amdfam10 --param simultaneous-prefetches=100 -fdump-tree-aprefetch-details -fdump-tree-optimized" } */
> > +/* { dg-options "-O2 -fno-tree-loop-distribute-patterns -fprefetch-loop-arrays -march=amdfam10 --param simultaneous-prefetches=100 -fdump-tree-aprefetch-details -fdump-tree-optimized" } */
> >
> >  #define K 1000000
> >  int a[K];
> > Index: gcc/testsuite/gcc.dg/tree-ssa/prefetch-9.c
> > ===================================================================
> > --- gcc/testsuite/gcc.dg/tree-ssa/prefetch-9.c  (revision 271513)
> > +++ gcc/testsuite/gcc.dg/tree-ssa/prefetch-9.c  (working copy)
> > @@ -1,5 +1,5 @@
> >  /* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
> > -/* { dg-options "-O2 -fprefetch-loop-arrays -march=amdfam10 --param simultaneous-prefetches=100 -fdump-tree-aprefetch-details -fdump-tree-optimized" } */
> > +/* { dg-options "-O2 -fno-tree-loop-distribute-patterns -fprefetch-loop-arrays -march=amdfam10 --param simultaneous-prefetches=100 -fdump-tree-aprefetch-details -fdump-tree-optimized" } */
> >
> >  #define K 1000000
> >  int a[K], b[K];
> > Index: gcc/testsuite/gcc.dg/tree-ssa/scev-11.c
> > ===================================================================
> > --- gcc/testsuite/gcc.dg/tree-ssa/scev-11.c     (revision 271513)
> > +++ gcc/testsuite/gcc.dg/tree-ssa/scev-11.c     (working copy)
> > @@ -15,7 +15,7 @@ foo (int n)
> >      {
> >        unsigned char uc = (unsigned char)i;
> >        a[i] = i;
> > -      b[uc] = 0;
> > +      b[uc] = 1;
> >      }
> >
> >    bar (a);
> > Index: gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-31.c
> > ===================================================================
> > --- gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-31.c        (revision 271513)
> > +++ gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-31.c        (working copy)
> > @@ -1,4 +1,5 @@
> >  /* { dg-require-effective-target vect_int } */
> > +/* { dg-additional-options "-fno-tree-loop-distribute-patterns" } */
> >
> >  #include <stdarg.h>
> >  #include "../../tree-vect.h"
> > Index: gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-33.c
> > ===================================================================
> > --- gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-33.c        (revision 271513)
> > +++ gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-33.c        (working copy)
> > @@ -1,5 +1,6 @@
> >  /* { dg-do compile } */
> >  /* { dg-require-effective-target vect_int } */
> > +/* { dg-additional-options "-fno-tree-loop-distribute-patterns" } */
> >
> >  #include <stdarg.h>
> >  #include "../../tree-vect.h"
> > Index: gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c
> > ===================================================================
> > --- gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c      (revision 271513)
> > +++ gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c      (working copy)
> > @@ -1,4 +1,5 @@
> >  /* { dg-require-effective-target vect_int } */
> > +/* { dg-additional-options "-fno-tree-loop-distribute-patterns" } */
> >
> >  #include <stdarg.h>
> >  #include "../../tree-vect.h"
> > Index: gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-33.c
> > ===================================================================
> > --- gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-33.c      (revision 271513)
> > +++ gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-33.c      (working copy)
> > @@ -1,5 +1,6 @@
> >  /* { dg-do compile } */
> >  /* { dg-require-effective-target vect_int } */
> > +/* { dg-additional-options "-fno-tree-loop-distribute-patterns" } */
> >
> >  #include <stdarg.h>
> >  #include "../../tree-vect.h"
> > Index: gcc/testsuite/gcc.dg/vect/vect.exp
> > ===================================================================
> > --- gcc/testsuite/gcc.dg/vect/vect.exp  (revision 271513)
> > +++ gcc/testsuite/gcc.dg/vect/vect.exp  (working copy)
> > @@ -45,7 +45,7 @@ if ![check_vect_support_and_set_flags] {
> >  }
> >
> >  # These flags are used for all targets.
> > -lappend DEFAULT_VECTCFLAGS "-ftree-vectorize" "-fno-vect-cost-model" "-fno-common"
> > +lappend DEFAULT_VECTCFLAGS "-ftree-vectorize" "-fno-tree-loop-distribute-patterns" "-fno-vect-cost-model" "-fno-common"
> >
> >  # Initialize `dg'.
> >  dg-init
> > Index: gcc/testsuite/gcc.target/i386/pr30970.c
> > ===================================================================
> > --- gcc/testsuite/gcc.target/i386/pr30970.c     (revision 271513)
> > +++ gcc/testsuite/gcc.target/i386/pr30970.c     (working copy)
> > @@ -1,5 +1,5 @@
> >  /* { dg-do compile }
> > -/* { dg-options "-msse2 -O2 -ftree-vectorize -mtune=generic" } */
> > +/* { dg-options "-msse2 -O2 -fno-tree-loop-distribute-patterns -ftree-vectorize -mtune=generic" } */
> >
> >  #define N 256
> >  int b[N];
> > Index: gcc/testsuite/gcc.target/i386/vect-double-1.c
> > ===================================================================
> > --- gcc/testsuite/gcc.target/i386/vect-double-1.c       (revision 271513)
> > +++ gcc/testsuite/gcc.target/i386/vect-double-1.c       (working copy)
> > @@ -1,6 +1,6 @@
> >  /* { dg-do compile } */
> >  /* { dg-skip-if "" { *-*-* } { "-march=*" } { "-march=core2" } } */
> > -/* { dg-options "-O2 -ftree-vectorize -mfpmath=sse -march=core2 -fdump-tree-vect-stats" } */
> > +/* { dg-options "-O2 -ftree-vectorize -fno-tree-loop-distribute-patterns -mfpmath=sse -march=core2 -fdump-tree-vect-stats" } */
> >  /* { dg-add-options bind_pic_locally } */
> >
> >  extern void abort (void);
> > Index: gcc/testsuite/gcc.target/i386/vect-double-2.c
> > ===================================================================
> > --- gcc/testsuite/gcc.target/i386/vect-double-2.c       (revision 271513)
> > +++ gcc/testsuite/gcc.target/i386/vect-double-2.c       (working copy)
> > @@ -1,5 +1,5 @@
> >  /* { dg-do compile } */
> > -/* { dg-options "-O2 -ftree-vectorize -mfpmath=sse -msse2 -mtune=atom -fdump-tree-vect-stats" } */
> > +/* { dg-options "-O2 -ftree-vectorize -fno-tree-loop-distribute-patterns -mfpmath=sse -msse2 -mtune=atom -fdump-tree-vect-stats" } */
> >
> >  extern void abort (void);
> >
> > Index: gcc/testsuite/gfortran.dg/vect/vect-5.f90
> > ===================================================================
> > --- gcc/testsuite/gfortran.dg/vect/vect-5.f90   (revision 271513)
> > +++ gcc/testsuite/gfortran.dg/vect/vect-5.f90   (working copy)
> > @@ -1,5 +1,5 @@
> >  ! { dg-require-effective-target vect_int }
> > -! { dg-additional-options "--param vect-max-peeling-for-alignment=0" }
> > +! { dg-additional-options "-fno-tree-loop-distribute-patterns --param vect-max-peeling-for-alignment=0" }
> >
> >          Subroutine foo (N, M)
> >          Integer N
> > Index: gcc/testsuite/gfortran.dg/vect/vect-8.f90
> > ===================================================================
> > --- gcc/testsuite/gfortran.dg/vect/vect-8.f90   (revision 271513)
> > +++ gcc/testsuite/gfortran.dg/vect/vect-8.f90   (working copy)
> > @@ -1,6 +1,6 @@
> >  ! { dg-do compile }
> >  ! { dg-require-effective-target vect_double }
> > -! { dg-additional-options "-finline-matmul-limit=0" }
> > +! { dg-additional-options "-fno-tree-loop-distribute-patterns -finline-matmul-limit=0" }
> >
> >  module lfk_prec
> >   integer, parameter :: dp=kind(1.d0)
> > Index: gcc/tree-loop-distribution.c
> > ===================================================================
> > --- gcc/tree-loop-distribution.c        (revision 271513)
> > +++ gcc/tree-loop-distribution.c        (working copy)
> > @@ -115,6 +115,7 @@ along with GCC; see the file COPYING3.
> >  #include "params.h"
> >  #include "tree-vectorizer.h"
> >  #include "tree-eh.h"
> > +#include "gimple-fold.h"
> >
> >
> >  #define MAX_DATAREFS_NUM \
> > @@ -1028,6 +1029,7 @@ generate_memset_builtin (struct loop *lo
> >    fn = build_fold_addr_expr (builtin_decl_implicit (BUILT_IN_MEMSET));
> >    fn_call = gimple_build_call (fn, 3, mem, val, nb_bytes);
> >    gsi_insert_after (&gsi, fn_call, GSI_CONTINUE_LINKING);
> > +  fold_stmt (&gsi);
> >
> >    if (dump_file && (dump_flags & TDF_DETAILS))
> >      {
> > @@ -1071,6 +1073,7 @@ generate_memcpy_builtin (struct loop *lo
> >    fn = build_fold_addr_expr (builtin_decl_implicit (kind));
> >    fn_call = gimple_build_call (fn, 3, dest, src, nb_bytes);
> >    gsi_insert_after (&gsi, fn_call, GSI_CONTINUE_LINKING);
> > +  fold_stmt (&gsi);
> >
> >    if (dump_file && (dump_flags & TDF_DETAILS))
> >      {
> > @@ -2769,7 +2772,8 @@ finalize_partitions (struct loop *loop,
> >
> >  static int
> >  distribute_loop (struct loop *loop, vec<gimple *> stmts,
> > -                control_dependences *cd, int *nb_calls, bool *destroy_p)
> > +                control_dependences *cd, int *nb_calls, bool *destroy_p,
> > +                bool only_patterns_p)
> >  {
> >    ddrs_table = new hash_table<ddr_hasher> (389);
> >    struct graph *rdg;
> > @@ -2843,7 +2847,7 @@ distribute_loop (struct loop *loop, vec<
> >
> >    /* If we are only distributing patterns but did not detect any,
> >       simply bail out.  */
> > -  if (!flag_tree_loop_distribution
> > +  if (only_patterns_p
> >        && !any_builtin)
> >      {
> >        nbp = 0;
> > @@ -2855,7 +2859,7 @@ distribute_loop (struct loop *loop, vec<
> >       a loop into pieces, separated by builtin calls.  That is, we
> >       only want no or a single loop body remaining.  */
> >    struct partition *into;
> > -  if (!flag_tree_loop_distribution)
> > +  if (only_patterns_p)
> >      {
> >        for (i = 0; partitions.iterate (i, &into); ++i)
> >         if (!partition_builtin_p (into))
> > @@ -3085,7 +3089,6 @@ prepare_perfect_loop_nest (struct loop *
> >          && loop_outer (outer)
> >          && outer->inner == loop && loop->next == NULL
> >          && single_exit (outer)
> > -        && optimize_loop_for_speed_p (outer)
> >          && !chrec_contains_symbols_defined_in_loop (niters, outer->num)
> >          && (niters = number_of_latch_executions (outer)) != NULL_TREE
> >          && niters != chrec_dont_know)
> > @@ -3139,9 +3142,11 @@ pass_loop_distribution::execute (functio
> >       walking to innermost loops.  */
> >    FOR_EACH_LOOP (loop, LI_ONLY_INNERMOST)
> >      {
> > -      /* Don't distribute multiple exit edges loop, or cold loop.  */
> > +      /* Don't distribute multiple exit edges loop, or cold loop when
> > +         not doing pattern detection.  */
> >        if (!single_exit (loop)
> > -         || !optimize_loop_for_speed_p (loop))
> > +         || (!flag_tree_loop_distribute_patterns
> > +             && !optimize_loop_for_speed_p (loop)))
> >         continue;
> >
> >        /* Don't distribute loop if niters is unknown.  */
> > @@ -3169,9 +3174,10 @@ pass_loop_distribution::execute (functio
> >
> >           bool destroy_p;
> >           int nb_generated_loops, nb_generated_calls;
> > -         nb_generated_loops = distribute_loop (loop, work_list, cd,
> > -                                               &nb_generated_calls,
> > -                                               &destroy_p);
> > +         nb_generated_loops
> > +           = distribute_loop (loop, work_list, cd, &nb_generated_calls,
> > +                              &destroy_p, (!optimize_loop_for_speed_p (loop)
> > +                                           || !flag_tree_loop_distribution));
> >           if (destroy_p)
> >             loops_to_be_destroyed.safe_push (loop);
> >
>
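The two tree-loop-distribution.c hunks in the patch quoted above change which innermost loops are visited and how far each one may be split. The following is a minimal C model of that per-loop decision, not GCC code: the enum and `ldist_decide` are hypothetical names, and the booleans stand in for `single_exit (loop)`, `optimize_loop_for_speed_p (loop)`, `flag_tree_loop_distribution` and `flag_tree_loop_distribute_patterns`. It assumes the pass only runs when at least one of the two flags is set, and it ignores the later niters and data-reference checks.

```c
#include <stdbool.h>

/* Hypothetical model of the per-loop decision made in
   pass_loop_distribution::execute after this patch.  The parameters
   stand in for single_exit (loop), optimize_loop_for_speed_p (loop),
   flag_tree_loop_distribution and flag_tree_loop_distribute_patterns.
   The later niters and data-reference checks are not modeled.  */
enum ldist_action
{
  LDIST_SKIP,          /* loop is not processed at all */
  LDIST_PATTERNS_ONLY, /* distribute_loop called with only_patterns_p */
  LDIST_FULL           /* full distribution allowed */
};

enum ldist_action
ldist_decide (bool single_exit_p, bool hot_p,
              bool flag_ldist, bool flag_patterns)
{
  /* Multiple-exit loops are always skipped; cold loops are skipped
     only when pattern detection is disabled.  */
  if (!single_exit_p || (!flag_patterns && !hot_p))
    return LDIST_SKIP;

  /* Same expression the patch passes as the new only_patterns_p
     argument of distribute_loop.  */
  bool only_patterns_p = !hot_p || !flag_ldist;
  return only_patterns_p ? LDIST_PATTERNS_ONLY : LDIST_FULL;
}
```

Under the new -O2 defaults (distribution off, pattern detection on) every single-exit loop maps to LDIST_PATTERNS_ONLY. At -O3 hot loops get LDIST_FULL while cold loops still get LDIST_PATTERNS_ONLY, which matches the commit message's "cold loops to be still processed but for pattern recognition to save code-size".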
Christophe Lyon May 27, 2019, 1:38 p.m. UTC | #4
On Mon, 27 May 2019 at 09:26, Richard Biener <rguenther@suse.de> wrote:
>
> On Mon, 27 May 2019, Christophe Lyon wrote:
>
> > On Thu, 23 May 2019 at 13:32, Richard Biener <rguenther@suse.de> wrote:
> > >
> > > On Wed, 22 May 2019, Richard Biener wrote:
> > >
> > > >
> > > > This enables -ftree-loop-distribute-patterns at -O[2s] and also
> > > > arranges cold loops to be still processed but for pattern
> > > > recognition to save code-size.
> > > >
> > > > Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> > > >
> > > > Martin has done extensive compile-time testing on SPEC
> > > > identifying only a single regression I'll have a look into.
> > >
> > > The reason for the compile-time regression is the complexity
> > > heuristic in LRA no longer choosing "simple" algorithms and
> > > the LIVE problem in particular being awfully slow.
> > >
> > > Unsurprisingly testing has also revealed loads of testsuite
> > > fallout which I deal with in the patch as committed below.
> > > Sorry for any further fallout on other targets (which I do
> > > expect).
> > >
> >
> > Hi Richard,
> >
> > Indeed git bisect pointed me to this commit when checking
> > the regressions on arm & aarch64 reported at:
> > http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/271588/report-build-info.html
> >
> > Since I'm a bit late in reporting, I'm not sure whether you've already fixed them.
> > (I didn't notice follow-ups)
> > Looking at this patch, it seems adding -fno-tree-loop-distribute-patterns to
> > dg-options is the standard way of fixing the regressions?
>
> Yes.  As I wrote above I did expect some target specific fallout and
> hoped target maintainers would fix that.
>

OK, I've committed the attached patch as r271662.

Christophe

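For context on why so many tests needed the opt-out: -ftree-loop-distribute-patterns looks for loops whose stores form a pattern that a library call such as memset can implement, and it now does so at -O2 and -Os. The C sketch below is illustrative and not taken from the testsuite. `zero_ints` is the kind of loop that may become a memset call; `fill_ones` presumably cannot, because the int value 1 is not a repeated byte, and we assume that is why scev-11.c switches its store from 0 to 1. `int_has_all_bytes_same` is a hypothetical helper expressing that repeated-byte condition, not GCC's own check. The leading comment shows the DejaGnu directive the patches use to keep such loops as loops.

```c
/* { dg-additional-options "-fno-tree-loop-distribute-patterns" } */
/* The directive above is only meaningful inside the GCC testsuite;
   for an ordinary compile it is just a comment.  */
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A store loop of this shape is a memset candidate for
   -ftree-loop-distribute-patterns.  */
void
zero_ints (int *a, size_t n)
{
  for (size_t i = 0; i < n; i++)
    a[i] = 0;
}

/* Storing the int value 1 is not a repeated-byte pattern, so memset
   cannot implement this loop (assumed rationale for scev-11.c).  */
void
fill_ones (int *a, size_t n)
{
  for (size_t i = 0; i < n; i++)
    a[i] = 1;
}

/* Hypothetical helper: true if every byte of V's object representation
   is identical, i.e. a loop storing V could be expressed as memset.  */
bool
int_has_all_bytes_same (int v)
{
  unsigned char bytes[sizeof v];
  memcpy (bytes, &v, sizeof v);
  for (size_t i = 1; i < sizeof v; i++)
    if (bytes[i] != bytes[0])
      return false;
  return true;
}
```

Values such as 0 and -1 pass the check on two's-complement targets, while 1 does not, whatever the byte order.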
> Richard.
>
> > Christophe
> >
> > > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
> > >
> > > Richard.
> > >
> > > 2019-05-23  Richard Biener  <rguenther@suse.de>
> > >
> > >         PR tree-optimization/88440
> > >         * opts.c (default_options_table): Enable -ftree-loop-distribute-patterns
> > >         at -O[2s]+.
> > >         * tree-loop-distribution.c (generate_memset_builtin): Fold the
> > >         generated call.
> > >         (generate_memcpy_builtin): Likewise.
> > >         (distribute_loop): Pass in whether to only distribute patterns.
> > >         (prepare_perfect_loop_nest): Also allow size optimization.
> > >         (pass_loop_distribution::execute): When optimizing a loop
> > >         nest for size allow pattern replacement.
> > >
> > >         * gcc.dg/tree-ssa/ldist-37.c: New testcase.
> > >         * gcc.dg/tree-ssa/ldist-38.c: Likewise.
> > >         * gcc.dg/vect/vect.exp: Add -fno-tree-loop-distribute-patterns.
> > >         * gcc.dg/tree-ssa/ldist-37.c: Adjust.
> > >         * gcc.dg/tree-ssa/ldist-38.c: Likewise.
> > >         * g++.dg/tree-ssa/pr78847.C: Likewise.
> > >         * gcc.dg/autopar/pr39500-1.c: Likewise.
> > >         * gcc.dg/autopar/reduc-1char.c: Likewise.
> > >         * gcc.dg/autopar/reduc-7.c: Likewise.
> > >         * gcc.dg/tree-ssa/ivopts-lt-2.c: Likewise.
> > >         * gcc.dg/tree-ssa/ivopts-lt.c: Likewise.
> > >         * gcc.dg/tree-ssa/predcom-dse-1.c: Likewise.
> > >         * gcc.dg/tree-ssa/predcom-dse-2.c: Likewise.
> > >         * gcc.dg/tree-ssa/predcom-dse-3.c: Likewise.
> > >         * gcc.dg/tree-ssa/predcom-dse-4.c: Likewise.
> > >         * gcc.dg/tree-ssa/prefetch-7.c: Likewise.
> > >         * gcc.dg/tree-ssa/prefetch-8.c: Likewise.
> > >         * gcc.dg/tree-ssa/prefetch-9.c: Likewise.
> > >         * gcc.dg/tree-ssa/scev-11.c: Likewise.
> > >         * gcc.dg/vect/costmodel/i386/costmodel-vect-31.c: Likewise.
> > >         * gcc.dg/vect/costmodel/i386/costmodel-vect-33.c: Likewise.
> > >         * gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c: Likewise.
> > >         * gcc.dg/vect/costmodel/x86_64/costmodel-vect-33.c: Likewise.
> > >         * gcc.target/i386/pr30970.c: Likewise.
> > >         * gcc.target/i386/vect-double-1.c: Likewise.
> > >         * gcc.target/i386/vect-double-2.c: Likewise.
> > >         * gcc.dg/tree-ssa/gen-vect-2.c: Likewise.
> > >         * gcc.dg/tree-ssa/gen-vect-26.c: Likewise.
> > >         * gcc.dg/tree-ssa/gen-vect-28.c: Likewise.
> > >         * gcc.dg/tree-ssa/gen-vect-32.c: Likewise.
> > >         * gfortran.dg/vect/vect-5.f90: Likewise.
> > >         * gfortran.dg/vect/vect-8.f90: Likewise.
> > >
> > > Index: gcc/opts.c
> > > ===================================================================
> > > --- gcc/opts.c  (revision 271513)
> > > +++ gcc/opts.c  (working copy)
> > > @@ -550,7 +550,7 @@ static const struct default_options defa
> > >      { OPT_LEVELS_3_PLUS, OPT_fpredictive_commoning, NULL, 1 },
> > >      { OPT_LEVELS_3_PLUS, OPT_fsplit_loops, NULL, 1 },
> > >      { OPT_LEVELS_3_PLUS, OPT_fsplit_paths, NULL, 1 },
> > > -    { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
> > > +    { OPT_LEVELS_2_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
> > >      { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribution, NULL, 1 },
> > >      { OPT_LEVELS_3_PLUS, OPT_ftree_loop_vectorize, NULL, 1 },
> > >      { OPT_LEVELS_3_PLUS, OPT_ftree_partial_pre, NULL, 1 },
> > > Index: gcc/testsuite/g++.dg/tree-ssa/pr78847.C
> > > ===================================================================
> > > --- gcc/testsuite/g++.dg/tree-ssa/pr78847.C     (revision 271513)
> > > +++ gcc/testsuite/g++.dg/tree-ssa/pr78847.C     (working copy)
> > > @@ -1,6 +1,6 @@
> > >  /* { dg-do compile } */
> > >  /* { dg-require-effective-target c++14 } */
> > > -/* { dg-options "-O3 -fdump-tree-ldist" } */
> > > +/* { dg-options "-O3 -fdump-tree-ldist-optimized" } */
> > >
> > >  #include <stddef.h>
> > >  #include <cstring>
> > > @@ -23,4 +23,4 @@ void testWithLoopValue(const Foo foo, si
> > >        buf_[ptr++] = c;
> > >  }
> > >
> > > -/* { dg-final { scan-tree-dump "memcpy\[^\n\r\]*, 9\\);" "ldist" } } */
> > > +/* { dg-final { scan-tree-dump "split to 0 loops and 1 library calls" "ldist" } } */
> > > Index: gcc/testsuite/gcc.dg/autopar/pr39500-1.c
> > > ===================================================================
> > > --- gcc/testsuite/gcc.dg/autopar/pr39500-1.c    (revision 271513)
> > > +++ gcc/testsuite/gcc.dg/autopar/pr39500-1.c    (working copy)
> > > @@ -1,7 +1,7 @@
> > >  /* pr39500: autopar fails to parallel */
> > >  /* origin: nemokingdom@gmail.com(LiFeng) */
> > >  /* { dg-do compile } */
> > > -/* { dg-options "-O2 -ftree-parallelize-loops=4 -fdump-tree-parloops2-details" } */
> > > +/* { dg-options "-O2 -fno-tree-loop-distribute-patterns -ftree-parallelize-loops=4 -fdump-tree-parloops2-details" } */
> > >
> > >  void abort (void);
> > >
> > > Index: gcc/testsuite/gcc.dg/autopar/reduc-1char.c
> > > ===================================================================
> > > --- gcc/testsuite/gcc.dg/autopar/reduc-1char.c  (revision 271513)
> > > +++ gcc/testsuite/gcc.dg/autopar/reduc-1char.c  (working copy)
> > > @@ -61,5 +61,5 @@ int main (void)
> > >
> > >
> > >  /* { dg-final { scan-tree-dump-times "Detected reduction" 3 "parloops2" } } */
> > > -/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 4 "parloops2" } } */
> > > +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops2" } } */
> > >
> > > Index: gcc/testsuite/gcc.dg/autopar/reduc-7.c
> > > ===================================================================
> > > --- gcc/testsuite/gcc.dg/autopar/reduc-7.c      (revision 271513)
> > > +++ gcc/testsuite/gcc.dg/autopar/reduc-7.c      (working copy)
> > > @@ -85,5 +85,5 @@ int main (void)
> > >
> > >
> > >  /* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops2" } } */
> > > -/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops2" } } */
> > > +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloops2" } } */
> > >
> > > Index: gcc/testsuite/gcc.dg/tree-ssa/gen-vect-2.c
> > > ===================================================================
> > > --- gcc/testsuite/gcc.dg/tree-ssa/gen-vect-2.c  (revision 271513)
> > > +++ gcc/testsuite/gcc.dg/tree-ssa/gen-vect-2.c  (working copy)
> > > @@ -1,6 +1,6 @@
> > >  /* { dg-do run { target vect_cmdline_needed } } */
> > > -/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details -fvect-cost-model=dynamic" } */
> > > -/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details -fvect-cost-model=dynamic -mno-sse" { target { i?86-*-* x86_64-*-* } } } */
> > > +/* { dg-options "-O2 -fno-tree-loop-distribute-patterns -ftree-vectorize -fdump-tree-vect-details -fvect-cost-model=dynamic" } */
> > > +/* { dg-additional-options "-mno-sse" { target { i?86-*-* x86_64-*-* } } } */
> > >
> > >  #include <stdlib.h>
> > >
> > > Index: gcc/testsuite/gcc.dg/tree-ssa/gen-vect-26.c
> > > ===================================================================
> > > --- gcc/testsuite/gcc.dg/tree-ssa/gen-vect-26.c (revision 271513)
> > > +++ gcc/testsuite/gcc.dg/tree-ssa/gen-vect-26.c (working copy)
> > > @@ -1,6 +1,6 @@
> > >  /* { dg-do run { target vect_cmdline_needed } } */
> > > -/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details -fvect-cost-model=dynamic" } */
> > > -/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details -fvect-cost-model=dynamic -mno-sse" { target { i?86-*-* x86_64-*-* } } } */
> > > +/* { dg-options "-O2 -fno-tree-loop-distribute-patterns -ftree-vectorize -fdump-tree-vect-details -fvect-cost-model=dynamic" } */
> > > +/* { dg-additional-options "-mno-sse" { target { i?86-*-* x86_64-*-* } } } */
> > >
> > >  #include <stdlib.h>
> > >
> > > Index: gcc/testsuite/gcc.dg/tree-ssa/gen-vect-28.c
> > > ===================================================================
> > > --- gcc/testsuite/gcc.dg/tree-ssa/gen-vect-28.c (revision 271513)
> > > +++ gcc/testsuite/gcc.dg/tree-ssa/gen-vect-28.c (working copy)
> > > @@ -1,6 +1,6 @@
> > >  /* { dg-do run { target vect_cmdline_needed } } */
> > > -/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details -fvect-cost-model=dynamic" } */
> > > -/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details -fvect-cost-model=dynamic -mno-sse" { target { i?86-*-* x86_64-*-* } } } */
> > > +/* { dg-options "-O2 -fno-tree-loop-distribute-patterns -ftree-vectorize -fdump-tree-vect-details -fvect-cost-model=dynamic" } */
> > > +/* { dg-additional-options "-mno-sse" { target { i?86-*-* x86_64-*-* } } } */
> > >
> > >  #include <stdlib.h>
> > >
> > > Index: gcc/testsuite/gcc.dg/tree-ssa/gen-vect-32.c
> > > ===================================================================
> > > --- gcc/testsuite/gcc.dg/tree-ssa/gen-vect-32.c (revision 271513)
> > > +++ gcc/testsuite/gcc.dg/tree-ssa/gen-vect-32.c (working copy)
> > > @@ -1,5 +1,5 @@
> > >  /* { dg-do run { target vect_cmdline_needed } } */
> > > -/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details -fno-vect-cost-model" } */
> > > +/* { dg-options "-O2 -fno-tree-loop-distribute-patterns -ftree-vectorize -fdump-tree-vect-details -fno-vect-cost-model" } */
> > >  /* { dg-additional-options "-mno-sse" { target { i?86-*-* x86_64-*-* } } } */
> > >
> > >  #include <stdlib.h>
> > > Index: gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt-2.c
> > > ===================================================================
> > > --- gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt-2.c (revision 271513)
> > > +++ gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt-2.c (working copy)
> > > @@ -1,5 +1,5 @@
> > >  /* { dg-do compile } */
> > > -/* { dg-options "-O2 -fdump-tree-ivopts" } */
> > > +/* { dg-options "-O2 -fno-tree-loop-distribute-patterns -fdump-tree-ivopts" } */
> > >  /* { dg-skip-if "PR68644" { hppa*-*-* powerpc*-*-* } } */
> > >
> > >  void
> > > Index: gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c
> > > ===================================================================
> > > --- gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c   (revision 271513)
> > > +++ gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c   (working copy)
> > > @@ -1,5 +1,5 @@
> > >  /* { dg-do compile } */
> > > -/* { dg-options "-O2 -fdump-tree-ivopts" } */
> > > +/* { dg-options "-O2 -fno-tree-loop-distribute-patterns -fdump-tree-ivopts" } */
> > >  /* { dg-require-effective-target stdint_types } */
> > >
> > >  #include "stdint.h"
> > > Index: gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-1.c
> > > ===================================================================
> > > --- gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-1.c       (revision 271513)
> > > +++ gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-1.c       (working copy)
> > > @@ -1,5 +1,5 @@
> > >  /* { dg-do run } */
> > > -/* { dg-options "-O2 -fno-inline -fpredictive-commoning -fdump-tree-pcom-details" } */
> > > +/* { dg-options "-O2 -fno-inline -fno-tree-loop-distribute-patterns -fpredictive-commoning -fdump-tree-pcom-details" } */
> > >
> > >  int arr[105] = {2, 3, 5, 7, 11};
> > >  int result0[10] = {2, 3, 5, 7, 11};
> > > Index: gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-2.c
> > > ===================================================================
> > > --- gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-2.c       (revision 271513)
> > > +++ gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-2.c       (working copy)
> > > @@ -1,5 +1,5 @@
> > >  /* { dg-do run } */
> > > -/* { dg-options "-O2 -fno-inline -fpredictive-commoning -fdump-tree-pcom-details" } */
> > > +/* { dg-options "-O2 -fno-inline -fno-tree-loop-distribute-patterns -fpredictive-commoning -fdump-tree-pcom-details" } */
> > >
> > >  int arr[105] = {2, 3, 5, 7, 11};
> > >  int result0[10] = {2, 3, 5, 7, 11};
> > > Index: gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-3.c
> > > ===================================================================
> > > --- gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-3.c       (revision 271513)
> > > +++ gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-3.c       (working copy)
> > > @@ -1,5 +1,5 @@
> > >  /* { dg-do run } */
> > > -/* { dg-options "-O2 -fno-inline -fpredictive-commoning -fdump-tree-pcom-details" } */
> > > +/* { dg-options "-O2 -fno-inline -fno-tree-loop-distribute-patterns -fpredictive-commoning -fdump-tree-pcom-details" } */
> > >
> > >  int arr1[105] = {2, 3, 5, 7, 11, 13, 0};
> > >  int arr2[105] = {2, 3, 5, 7, 11, 13, 0};
> > > Index: gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-4.c
> > > ===================================================================
> > > --- gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-4.c       (revision 271513)
> > > +++ gcc/testsuite/gcc.dg/tree-ssa/predcom-dse-4.c       (working copy)
> > > @@ -1,5 +1,5 @@
> > >  /* { dg-do run } */
> > > -/* { dg-options "-O2 -fno-inline -fpredictive-commoning -fdump-tree-pcom-details" } */
> > > +/* { dg-options "-O2 -fno-inline -fno-tree-loop-distribute-patterns -fpredictive-commoning -fdump-tree-pcom-details" } */
> > >
> > >  int arr[105] = {2, 3, 5, 7, 11};
> > >  int result0[10] = {2, 3, 5, 7, 11};
> > > Index: gcc/testsuite/gcc.dg/tree-ssa/prefetch-7.c
> > > ===================================================================
> > > --- gcc/testsuite/gcc.dg/tree-ssa/prefetch-7.c  (revision 271513)
> > > +++ gcc/testsuite/gcc.dg/tree-ssa/prefetch-7.c  (working copy)
> > > @@ -1,5 +1,5 @@
> > >  /* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
> > > -/* { dg-options "-O2 -fprefetch-loop-arrays -march=amdfam10 --param simultaneous-prefetches=100 -fdump-tree-aprefetch-details -fdump-tree-optimized" } */
> > > +/* { dg-options "-O2 -fno-tree-loop-distribute-patterns -fprefetch-loop-arrays -march=amdfam10 --param simultaneous-prefetches=100 -fdump-tree-aprefetch-details -fdump-tree-optimized" } */
> > >
> > >  #define K 1000000
> > >  int a[K];
> > > Index: gcc/testsuite/gcc.dg/tree-ssa/prefetch-8.c
> > > ===================================================================
> > > --- gcc/testsuite/gcc.dg/tree-ssa/prefetch-8.c  (revision 271513)
> > > +++ gcc/testsuite/gcc.dg/tree-ssa/prefetch-8.c  (working copy)
> > > @@ -1,5 +1,5 @@
> > >  /* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
> > > -/* { dg-options "-O2 -fprefetch-loop-arrays -march=amdfam10 --param simultaneous-prefetches=100 -fdump-tree-aprefetch-details -fdump-tree-optimized" } */
> > > +/* { dg-options "-O2 -fno-tree-loop-distribute-patterns -fprefetch-loop-arrays -march=amdfam10 --param simultaneous-prefetches=100 -fdump-tree-aprefetch-details -fdump-tree-optimized" } */
> > >
> > >  #define K 1000000
> > >  int a[K];
> > > Index: gcc/testsuite/gcc.dg/tree-ssa/prefetch-9.c
> > > ===================================================================
> > > --- gcc/testsuite/gcc.dg/tree-ssa/prefetch-9.c  (revision 271513)
> > > +++ gcc/testsuite/gcc.dg/tree-ssa/prefetch-9.c  (working copy)
> > > @@ -1,5 +1,5 @@
> > >  /* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
> > > -/* { dg-options "-O2 -fprefetch-loop-arrays -march=amdfam10 --param simultaneous-prefetches=100 -fdump-tree-aprefetch-details -fdump-tree-optimized" } */
> > > +/* { dg-options "-O2 -fno-tree-loop-distribute-patterns -fprefetch-loop-arrays -march=amdfam10 --param simultaneous-prefetches=100 -fdump-tree-aprefetch-details -fdump-tree-optimized" } */
> > >
> > >  #define K 1000000
> > >  int a[K], b[K];
> > > Index: gcc/testsuite/gcc.dg/tree-ssa/scev-11.c
> > > ===================================================================
> > > --- gcc/testsuite/gcc.dg/tree-ssa/scev-11.c     (revision 271513)
> > > +++ gcc/testsuite/gcc.dg/tree-ssa/scev-11.c     (working copy)
> > > @@ -15,7 +15,7 @@ foo (int n)
> > >      {
> > >        unsigned char uc = (unsigned char)i;
> > >        a[i] = i;
> > > -      b[uc] = 0;
> > > +      b[uc] = 1;
> > >      }
> > >
> > >    bar (a);
> > > Index: gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-31.c
> > > ===================================================================
> > > --- gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-31.c        (revision 271513)
> > > +++ gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-31.c        (working copy)
> > > @@ -1,4 +1,5 @@
> > >  /* { dg-require-effective-target vect_int } */
> > > +/* { dg-additional-options "-fno-tree-loop-distribute-patterns" } */
> > >
> > >  #include <stdarg.h>
> > >  #include "../../tree-vect.h"
> > > Index: gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-33.c
> > > ===================================================================
> > > --- gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-33.c        (revision 271513)
> > > +++ gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-33.c        (working copy)
> > > @@ -1,5 +1,6 @@
> > >  /* { dg-do compile } */
> > >  /* { dg-require-effective-target vect_int } */
> > > +/* { dg-additional-options "-fno-tree-loop-distribute-patterns" } */
> > >
> > >  #include <stdarg.h>
> > >  #include "../../tree-vect.h"
> > > Index: gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c
> > > ===================================================================
> > > --- gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c      (revision 271513)
> > > +++ gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c      (working copy)
> > > @@ -1,4 +1,5 @@
> > >  /* { dg-require-effective-target vect_int } */
> > > +/* { dg-additional-options "-fno-tree-loop-distribute-patterns" } */
> > >
> > >  #include <stdarg.h>
> > >  #include "../../tree-vect.h"
> > > Index: gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-33.c
> > > ===================================================================
> > > --- gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-33.c      (revision 271513)
> > > +++ gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-33.c      (working copy)
> > > @@ -1,5 +1,6 @@
> > >  /* { dg-do compile } */
> > >  /* { dg-require-effective-target vect_int } */
> > > +/* { dg-additional-options "-fno-tree-loop-distribute-patterns" } */
> > >
> > >  #include <stdarg.h>
> > >  #include "../../tree-vect.h"
> > > Index: gcc/testsuite/gcc.dg/vect/vect.exp
> > > ===================================================================
> > > --- gcc/testsuite/gcc.dg/vect/vect.exp  (revision 271513)
> > > +++ gcc/testsuite/gcc.dg/vect/vect.exp  (working copy)
> > > @@ -45,7 +45,7 @@ if ![check_vect_support_and_set_flags] {
> > >  }
> > >
> > >  # These flags are used for all targets.
> > > -lappend DEFAULT_VECTCFLAGS "-ftree-vectorize" "-fno-vect-cost-model" "-fno-common"
> > > +lappend DEFAULT_VECTCFLAGS "-ftree-vectorize" "-fno-tree-loop-distribute-patterns" "-fno-vect-cost-model" "-fno-common"
> > >
> > >  # Initialize `dg'.
> > >  dg-init
> > > Index: gcc/testsuite/gcc.target/i386/pr30970.c
> > > ===================================================================
> > > --- gcc/testsuite/gcc.target/i386/pr30970.c     (revision 271513)
> > > +++ gcc/testsuite/gcc.target/i386/pr30970.c     (working copy)
> > > @@ -1,5 +1,5 @@
> > >  /* { dg-do compile }
> > > -/* { dg-options "-msse2 -O2 -ftree-vectorize -mtune=generic" } */
> > > +/* { dg-options "-msse2 -O2 -fno-tree-loop-distribute-patterns -ftree-vectorize -mtune=generic" } */
> > >
> > >  #define N 256
> > >  int b[N];
> > > Index: gcc/testsuite/gcc.target/i386/vect-double-1.c
> > > ===================================================================
> > > --- gcc/testsuite/gcc.target/i386/vect-double-1.c       (revision 271513)
> > > +++ gcc/testsuite/gcc.target/i386/vect-double-1.c       (working copy)
> > > @@ -1,6 +1,6 @@
> > >  /* { dg-do compile } */
> > >  /* { dg-skip-if "" { *-*-* } { "-march=*" } { "-march=core2" } } */
> > > -/* { dg-options "-O2 -ftree-vectorize -mfpmath=sse -march=core2 -fdump-tree-vect-stats" } */
> > > +/* { dg-options "-O2 -ftree-vectorize -fno-tree-loop-distribute-patterns -mfpmath=sse -march=core2 -fdump-tree-vect-stats" } */
> > >  /* { dg-add-options bind_pic_locally } */
> > >
> > >  extern void abort (void);
> > > Index: gcc/testsuite/gcc.target/i386/vect-double-2.c
> > > ===================================================================
> > > --- gcc/testsuite/gcc.target/i386/vect-double-2.c       (revision 271513)
> > > +++ gcc/testsuite/gcc.target/i386/vect-double-2.c       (working copy)
> > > @@ -1,5 +1,5 @@
> > >  /* { dg-do compile } */
> > > -/* { dg-options "-O2 -ftree-vectorize -mfpmath=sse -msse2 -mtune=atom -fdump-tree-vect-stats" } */
> > > +/* { dg-options "-O2 -ftree-vectorize -fno-tree-loop-distribute-patterns -mfpmath=sse -msse2 -mtune=atom -fdump-tree-vect-stats" } */
> > >
> > >  extern void abort (void);
> > >
> > > Index: gcc/testsuite/gfortran.dg/vect/vect-5.f90
> > > ===================================================================
> > > --- gcc/testsuite/gfortran.dg/vect/vect-5.f90   (revision 271513)
> > > +++ gcc/testsuite/gfortran.dg/vect/vect-5.f90   (working copy)
> > > @@ -1,5 +1,5 @@
> > >  ! { dg-require-effective-target vect_int }
> > > -! { dg-additional-options "--param vect-max-peeling-for-alignment=0" }
> > > +! { dg-additional-options "-fno-tree-loop-distribute-patterns --param vect-max-peeling-for-alignment=0" }
> > >
> > >          Subroutine foo (N, M)
> > >          Integer N
> > > Index: gcc/testsuite/gfortran.dg/vect/vect-8.f90
> > > ===================================================================
> > > --- gcc/testsuite/gfortran.dg/vect/vect-8.f90   (revision 271513)
> > > +++ gcc/testsuite/gfortran.dg/vect/vect-8.f90   (working copy)
> > > @@ -1,6 +1,6 @@
> > >  ! { dg-do compile }
> > >  ! { dg-require-effective-target vect_double }
> > > -! { dg-additional-options "-finline-matmul-limit=0" }
> > > +! { dg-additional-options "-fno-tree-loop-distribute-patterns -finline-matmul-limit=0" }
> > >
> > >  module lfk_prec
> > >   integer, parameter :: dp=kind(1.d0)
> > > Index: gcc/tree-loop-distribution.c
> > > ===================================================================
> > > --- gcc/tree-loop-distribution.c        (revision 271513)
> > > +++ gcc/tree-loop-distribution.c        (working copy)
> > > @@ -115,6 +115,7 @@ along with GCC; see the file COPYING3.
> > >  #include "params.h"
> > >  #include "tree-vectorizer.h"
> > >  #include "tree-eh.h"
> > > +#include "gimple-fold.h"
> > >
> > >
> > >  #define MAX_DATAREFS_NUM \
> > > @@ -1028,6 +1029,7 @@ generate_memset_builtin (struct loop *lo
> > >    fn = build_fold_addr_expr (builtin_decl_implicit (BUILT_IN_MEMSET));
> > >    fn_call = gimple_build_call (fn, 3, mem, val, nb_bytes);
> > >    gsi_insert_after (&gsi, fn_call, GSI_CONTINUE_LINKING);
> > > +  fold_stmt (&gsi);
> > >
> > >    if (dump_file && (dump_flags & TDF_DETAILS))
> > >      {
> > > @@ -1071,6 +1073,7 @@ generate_memcpy_builtin (struct loop *lo
> > >    fn = build_fold_addr_expr (builtin_decl_implicit (kind));
> > >    fn_call = gimple_build_call (fn, 3, dest, src, nb_bytes);
> > >    gsi_insert_after (&gsi, fn_call, GSI_CONTINUE_LINKING);
> > > +  fold_stmt (&gsi);
> > >
> > >    if (dump_file && (dump_flags & TDF_DETAILS))
> > >      {
> > > @@ -2769,7 +2772,8 @@ finalize_partitions (struct loop *loop,
> > >
> > >  static int
> > >  distribute_loop (struct loop *loop, vec<gimple *> stmts,
> > > -                control_dependences *cd, int *nb_calls, bool *destroy_p)
> > > +                control_dependences *cd, int *nb_calls, bool *destroy_p,
> > > +                bool only_patterns_p)
> > >  {
> > >    ddrs_table = new hash_table<ddr_hasher> (389);
> > >    struct graph *rdg;
> > > @@ -2843,7 +2847,7 @@ distribute_loop (struct loop *loop, vec<
> > >
> > >    /* If we are only distributing patterns but did not detect any,
> > >       simply bail out.  */
> > > -  if (!flag_tree_loop_distribution
> > > +  if (only_patterns_p
> > >        && !any_builtin)
> > >      {
> > >        nbp = 0;
> > > @@ -2855,7 +2859,7 @@ distribute_loop (struct loop *loop, vec<
> > >       a loop into pieces, separated by builtin calls.  That is, we
> > >       only want no or a single loop body remaining.  */
> > >    struct partition *into;
> > > -  if (!flag_tree_loop_distribution)
> > > +  if (only_patterns_p)
> > >      {
> > >        for (i = 0; partitions.iterate (i, &into); ++i)
> > >         if (!partition_builtin_p (into))
> > > @@ -3085,7 +3089,6 @@ prepare_perfect_loop_nest (struct loop *
> > >          && loop_outer (outer)
> > >          && outer->inner == loop && loop->next == NULL
> > >          && single_exit (outer)
> > > -        && optimize_loop_for_speed_p (outer)
> > >          && !chrec_contains_symbols_defined_in_loop (niters, outer->num)
> > >          && (niters = number_of_latch_executions (outer)) != NULL_TREE
> > >          && niters != chrec_dont_know)
> > > @@ -3139,9 +3142,11 @@ pass_loop_distribution::execute (functio
> > >       walking to innermost loops.  */
> > >    FOR_EACH_LOOP (loop, LI_ONLY_INNERMOST)
> > >      {
> > > -      /* Don't distribute multiple exit edges loop, or cold loop.  */
> > > +      /* Don't distribute multiple exit edges loop, or cold loop when
> > > +         not doing pattern detection.  */
> > >        if (!single_exit (loop)
> > > -         || !optimize_loop_for_speed_p (loop))
> > > +         || (!flag_tree_loop_distribute_patterns
> > > +             && !optimize_loop_for_speed_p (loop)))
> > >         continue;
> > >
> > >        /* Don't distribute loop if niters is unknown.  */
> > > @@ -3169,9 +3174,10 @@ pass_loop_distribution::execute (functio
> > >
> > >           bool destroy_p;
> > >           int nb_generated_loops, nb_generated_calls;
> > > -         nb_generated_loops = distribute_loop (loop, work_list, cd,
> > > -                                               &nb_generated_calls,
> > > -                                               &destroy_p);
> > > +         nb_generated_loops
> > > +           = distribute_loop (loop, work_list, cd, &nb_generated_calls,
> > > +                              &destroy_p, (!optimize_loop_for_speed_p (loop)
> > > +                                           || !flag_tree_loop_distribution));
> > >           if (destroy_p)
> > >             loops_to_be_destroyed.safe_push (loop);
> > >
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany;
> GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG Nürnberg)
gcc/testsuite/ChangeLog:

2019-05-27  Christophe Lyon  <christophe.lyon@linaro.org>

	PR tree-optimization/88440
	gcc/testsuite/
	* gcc.target/aarch64/sve/index_offset_1.c: Add -fno-tree-loop-distribute-patterns.
	* gcc.target/aarch64/sve/single_1.c: Likewise.
	* gcc.target/aarch64/sve/single_2.c: Likewise.
	* gcc.target/aarch64/sve/single_3.c: Likewise.
	* gcc.target/aarch64/sve/single_4.c: Likewise.
	* gcc.target/aarch64/sve/vec_init_1.c: Likewise.
	* gcc.target/aarch64/vect-fmovd-zero.c: Likewise.
	* gcc.target/aarch64/vect-fmovf-zero.c: Likewise.
	* gcc.target/arm/ivopts.c: Likewise.
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/index_offset_1.c b/gcc/testsuite/gcc.target/aarch64/sve/index_offset_1.c
index 31d46aa..a26be32 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/index_offset_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/index_offset_1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=256" } */
+/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=256 -fno-tree-loop-distribute-patterns" } */
 
 #define SIZE (15 * 8 + 3)
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/single_1.c b/gcc/testsuite/gcc.target/aarch64/sve/single_1.c
index 11b88ae..e3a8409 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/single_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/single_1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize -fopenmp-simd -msve-vector-bits=256" } */
+/* { dg-options "-O2 -ftree-vectorize -fopenmp-simd -msve-vector-bits=256 -fno-tree-loop-distribute-patterns" } */
 
 #ifndef N
 #define N 32
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/single_2.c b/gcc/testsuite/gcc.target/aarch64/sve/single_2.c
index 1fbf489..195ee20 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/single_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/single_2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize -fopenmp-simd -msve-vector-bits=512" } */
+/* { dg-options "-O2 -ftree-vectorize -fopenmp-simd -msve-vector-bits=512 -fno-tree-loop-distribute-patterns" } */
 
 #define N 64
 #include "single_1.c"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/single_3.c b/gcc/testsuite/gcc.target/aarch64/sve/single_3.c
index a3688b6..e031276 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/single_3.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/single_3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize -fopenmp-simd -msve-vector-bits=1024" } */
+/* { dg-options "-O2 -ftree-vectorize -fopenmp-simd -msve-vector-bits=1024 -fno-tree-loop-distribute-patterns" } */
 
 #define N 128
 #include "single_1.c"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/single_4.c b/gcc/testsuite/gcc.target/aarch64/sve/single_4.c
index 08965d3..01ff7f6 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/single_4.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/single_4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize -fopenmp-simd -msve-vector-bits=2048" } */
+/* { dg-options "-O2 -ftree-vectorize -fopenmp-simd -msve-vector-bits=2048 -fno-tree-loop-distribute-patterns" } */
 
 #define N 256
 #include "single_1.c"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vec_init_1.c b/gcc/testsuite/gcc.target/aarch64/sve/vec_init_1.c
index 6042606..1624ab1 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/vec_init_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/vec_init_1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize" } */
+/* { dg-options "-O2 -ftree-vectorize -fno-tree-loop-distribute-patterns" } */
 
 #include <stdint.h>
 
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-fmovd-zero.c b/gcc/testsuite/gcc.target/aarch64/vect-fmovd-zero.c
index c987f5f..a51aa33 100644
--- a/gcc/testsuite/gcc.target/aarch64/vect-fmovd-zero.c
+++ b/gcc/testsuite/gcc.target/aarch64/vect-fmovd-zero.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-all -fno-vect-cost-model" } */
+/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-all -fno-vect-cost-model -fno-tree-loop-distribute-patterns" } */
 
 #pragma GCC target "+nosve"
 
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-fmovf-zero.c b/gcc/testsuite/gcc.target/aarch64/vect-fmovf-zero.c
index 22a0535..8dfd26b 100644
--- a/gcc/testsuite/gcc.target/aarch64/vect-fmovf-zero.c
+++ b/gcc/testsuite/gcc.target/aarch64/vect-fmovf-zero.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-all -fno-vect-cost-model" } */
+/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-all -fno-vect-cost-model -fno-tree-loop-distribute-patterns" } */
 
 #pragma GCC target "+nosve"
 
diff --git a/gcc/testsuite/gcc.target/arm/ivopts.c b/gcc/testsuite/gcc.target/arm/ivopts.c
index 2bb6cc4..5d27240 100644
--- a/gcc/testsuite/gcc.target/arm/ivopts.c
+++ b/gcc/testsuite/gcc.target/arm/ivopts.c
@@ -1,5 +1,5 @@
 /* { dg-do assemble } */
-/* { dg-options "-Os -fdump-tree-ivopts -save-temps" } */
+/* { dg-options "-Os -fdump-tree-ivopts -save-temps -fno-tree-loop-distribute-patterns" } */
 
 void
 tr5 (short array[], int n)
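Many of the testsuite adjustments in this thread follow from one change. With -ftree-loop-distribute-patterns now enabled at -O2, a loop that stores a constant into consecutive elements can be replaced by a memset call before the vectorizer sees it. One plausible reading of the scev-11.c change (b[uc] = 0 becoming b[uc] = 1) is that it avoids this replacement. memset writes a single repeated byte value, so a store can only become a memset when every byte of the stored constant is the same. The sketch below checks that property at run time. It assumes int elements, because the element type of b is not shown in this excerpt. The function names are hypothetical and are not GCC internals.

```c
#include <stddef.h>
#include <string.h>

/* Store 0 into every element.  Every byte of (int) 0 is 0x00, so this
   loop has the same effect as memset (b, 0, n * sizeof *b).  */
void zero_loop (int *b, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    b[i] = 0;
}

/* Store 1 into every element.  (int) 1 has one 0x01 byte and the rest
   0x00, so no single memset byte value reproduces this loop.  */
void one_loop (int *b, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    b[i] = 1;
}

/* Hypothetical helper: return nonzero if running LOOP over a small
   buffer leaves exactly the bytes that memset with BYTE would.  */
int matches_memset (void (*loop) (int *, size_t), unsigned char byte)
{
  int via_loop[16], via_memset[16];
  loop (via_loop, 16);
  memset (via_memset, byte, sizeof via_memset);
  return memcmp (via_loop, via_memset, sizeof via_loop) == 0;
}
```

matches_memset (zero_loop, 0) holds. No byte value in 0..255 makes one_loop match, whatever the host's endianness.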

Patch

Index: gcc/opts.c
===================================================================
--- gcc/opts.c	(revision 271463)
+++ gcc/opts.c	(working copy)
@@ -550,7 +550,7 @@  static const struct default_options defa
     { OPT_LEVELS_3_PLUS, OPT_fpredictive_commoning, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_fsplit_loops, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_fsplit_paths, NULL, 1 },
-    { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
+    { OPT_LEVELS_2_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribution, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_ftree_loop_vectorize, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_ftree_partial_pre, NULL, 1 },
Index: gcc/tree-loop-distribution.c
===================================================================
--- gcc/tree-loop-distribution.c	(revision 271463)
+++ gcc/tree-loop-distribution.c	(working copy)
@@ -115,6 +115,7 @@  along with GCC; see the file COPYING3.
 #include "params.h"
 #include "tree-vectorizer.h"
 #include "tree-eh.h"
+#include "gimple-fold.h"
 
 
 #define MAX_DATAREFS_NUM \
@@ -1028,6 +1029,7 @@  generate_memset_builtin (struct loop *lo
   fn = build_fold_addr_expr (builtin_decl_implicit (BUILT_IN_MEMSET));
   fn_call = gimple_build_call (fn, 3, mem, val, nb_bytes);
   gsi_insert_after (&gsi, fn_call, GSI_CONTINUE_LINKING);
+  fold_stmt (&gsi);
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
@@ -1071,6 +1073,7 @@  generate_memcpy_builtin (struct loop *lo
   fn = build_fold_addr_expr (builtin_decl_implicit (kind));
   fn_call = gimple_build_call (fn, 3, dest, src, nb_bytes);
   gsi_insert_after (&gsi, fn_call, GSI_CONTINUE_LINKING);
+  fold_stmt (&gsi);
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
@@ -2769,7 +2772,8 @@  finalize_partitions (struct loop *loop,
 
 static int
 distribute_loop (struct loop *loop, vec<gimple *> stmts,
-		 control_dependences *cd, int *nb_calls, bool *destroy_p)
+		 control_dependences *cd, int *nb_calls, bool *destroy_p,
+		 bool only_patterns_p)
 {
   ddrs_table = new hash_table<ddr_hasher> (389);
   struct graph *rdg;
@@ -2843,7 +2847,7 @@  distribute_loop (struct loop *loop, vec<
 
   /* If we are only distributing patterns but did not detect any,
      simply bail out.  */
-  if (!flag_tree_loop_distribution
+  if (only_patterns_p
       && !any_builtin)
     {
       nbp = 0;
@@ -2855,7 +2859,7 @@  distribute_loop (struct loop *loop, vec<
      a loop into pieces, separated by builtin calls.  That is, we
      only want no or a single loop body remaining.  */
   struct partition *into;
-  if (!flag_tree_loop_distribution)
+  if (only_patterns_p)
     {
       for (i = 0; partitions.iterate (i, &into); ++i)
 	if (!partition_builtin_p (into))
@@ -3085,7 +3089,6 @@  prepare_perfect_loop_nest (struct loop *
 	 && loop_outer (outer)
 	 && outer->inner == loop && loop->next == NULL
 	 && single_exit (outer)
-	 && optimize_loop_for_speed_p (outer)
 	 && !chrec_contains_symbols_defined_in_loop (niters, outer->num)
 	 && (niters = number_of_latch_executions (outer)) != NULL_TREE
 	 && niters != chrec_dont_know)
@@ -3139,9 +3142,11 @@  pass_loop_distribution::execute (functio
      walking to innermost loops.  */
   FOR_EACH_LOOP (loop, LI_ONLY_INNERMOST)
     {
-      /* Don't distribute multiple exit edges loop, or cold loop.  */
+      /* Don't distribute multiple exit edges loop, or cold loop when
+         not doing pattern detection.  */
       if (!single_exit (loop)
-	  || !optimize_loop_for_speed_p (loop))
+	  || (!flag_tree_loop_distribute_patterns
+	      && !optimize_loop_for_speed_p (loop)))
 	continue;
 
       /* Don't distribute loop if niters is unknown.  */
@@ -3169,9 +3174,10 @@  pass_loop_distribution::execute (functio
 
 	  bool destroy_p;
 	  int nb_generated_loops, nb_generated_calls;
-	  nb_generated_loops = distribute_loop (loop, work_list, cd,
-						&nb_generated_calls,
-						&destroy_p);
+	  nb_generated_loops
+	    = distribute_loop (loop, work_list, cd, &nb_generated_calls,
+			       &destroy_p, (!optimize_loop_for_speed_p (loop)
+					    || !flag_tree_loop_distribution));
 	  if (destroy_p)
 	    loops_to_be_destroyed.safe_push (loop);
 
Index: gcc/testsuite/gcc.dg/tree-ssa/ldist-37.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ldist-37.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/tree-ssa/ldist-37.c	(working copy)
@@ -0,0 +1,10 @@ 
+/* { dg-do compile } */
+/* { dg-options "-Os -fdump-tree-ldist-optimized" } */
+
+void foo(char* restrict dst, const char* buf)
+{
+  for (int i=0; i<8; ++i)
+    *dst++ = *buf++;
+}
+
+/* { dg-final { scan-tree-dump "split to 0 loops and 1 library calls" "ldist" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ldist-38.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ldist-38.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/tree-ssa/ldist-38.c	(working copy)
@@ -0,0 +1,10 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ldist-optimized" } */
+
+void foo(char* restrict dst, const char* buf)
+{
+  for (int i=0; i<8; ++i)
+    *dst++ = *buf++;
+}
+
+/* { dg-final { scan-tree-dump "split to 0 loops and 1 library calls" "ldist" } } */
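The net effect of the pass_loop_distribution::execute and distribute_loop hunks reduces to two boolean conditions. The following is a minimal model and not GCC code. struct loop_props, struct ldist_flags and both functions are hypothetical. The model also ignores the pass's other checks, such as the unknown-iteration-count bail-out.

```c
#include <stdbool.h>

/* Hypothetical summary of what the pass queries for one innermost loop.  */
struct loop_props
{
  bool single_exit;     /* single_exit (loop) != NULL */
  bool optimize_speed;  /* optimize_loop_for_speed_p (loop) */
};

/* Hypothetical flag state.  After the opts.c hunk, -O2 and -Os enable
   distribute_patterns but not distribution; -O3 enables both.  */
struct ldist_flags
{
  bool distribute_patterns;  /* -ftree-loop-distribute-patterns */
  bool distribution;         /* -ftree-loop-distribution */
};

/* Models the patched early "continue" in execute: true means the loop
   is not considered at all.  Cold loops are now skipped only when
   pattern detection is disabled.  */
bool skip_loop (struct loop_props l, struct ldist_flags f)
{
  return !l.single_exit
         || (!f.distribute_patterns && !l.optimize_speed);
}

/* Models the new only_patterns_p argument to distribute_loop: cold loops,
   or any loop when full distribution is off, may only be replaced by
   builtin calls, never split into several loops.  */
bool only_patterns (struct loop_props l, struct ldist_flags f)
{
  return !l.optimize_speed || !f.distribution;
}
```

With the -O2/-Os defaults, a cold loop is no longer skipped, but distribute_loop is restricted to pattern replacement. This is the size-friendly mode that ldist-37.c exercises at -Os. At -O3, a hot loop may still be fully distributed.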